Lessons Hard Won
By the time the academicians began to ruminate about OT in the late '70s, a lot of crappy software had been written, enough that the patterns common in that crappy software had become painfully obvious. This led to a lot of papers and letters like Dijkstra's classic, "Go To Statement Considered Harmful," in 1968.[10] Those patterns represented the rather painfully learned lessons of software development (some of which were just touched on).
Global Data
One basic problem was that persistent state variables were recording the state of the system, and they were getting changed in unexpected ways or at unexpected times when they were globally accessible. This led to errors when code was written assuming a particular state prevailed and that code was later executed when that state did not prevail.
In practice such problems were difficult to diagnose because they tended to be intermittent. There was also a maintenance problem because it was difficult to find all the places where the data was written when, say, some new condition had to be added. So unrelated maintenance tended to introduce new bugs. This problem was so severe that by the late '70s every 3GL IDE had some sort of Find/Browse facility so that you could easily search for every place a variable was accessed. While convenient for the developer, such facilities didn't really help manage the state data properly. The problem was so pronounced that a functional programming paradigm was developed in parallel to the OO paradigm, largely to eliminate the need for persistent state data.
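The failure mode described above can be sketched in a few lines. This is only an illustration, not code from any real system: the names `valve_open`, `open_valve`, `run_diagnostics`, and `pump_fluid` are hypothetical. One routine assumes a particular state prevails, and an unrelated routine added during maintenance quietly changes that state.

```python
# Hypothetical example: a globally accessible state variable that any code can change.
valve_open = False  # persistent state, writable from anywhere in the program


def open_valve():
    global valve_open
    valve_open = True


def run_diagnostics():
    # Imagine this write was added during unrelated maintenance.
    # It silently resets state that other code depends on.
    global valve_open
    valve_open = False


def pump_fluid(liters):
    # Written on the assumption that the valve is open whenever this is called.
    if not valve_open:
        raise RuntimeError("pump started with valve closed")
    return liters
```

Calling `open_valve()` and then `pump_fluid(5)` works. Slip a call to `run_diagnostics()` between them and `pump_fluid` fails, even though neither `open_valve` nor `pump_fluid` was touched. Whether the failure shows up depends on call order, which is why such bugs tend to look intermittent.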
Elephantine Program Units
Once upon a time a fellow asked a colleague why he was getting a particular compiler error. The colleague hadn't seen the message before, but it sounded like some compiler's stack had overflowed, so the colleague took a glance at the code. The procedure where the error occurred was 1800 statements long, and there were several procedures of similar size in the file. Printing the compiler listing used nearly half a box of paper! Since the compiler was doing file-level optimizations, it had too much to think about and committed seppuku.
As soon as the colleague saw the length of the routine he suggested that the basic solution was Don't Do That. At that point the code author indignantly lectured him for ten minutes on how screwed up the compiler was because it didn't allow him to code in the most readable fashion—like nested switch statements that extend for ten pages are readable! The sad thing about this anecdote was that it occurred around 1985 when people were supposed to know better.
Huge program modules led to loss of cohesion. The modules did too many things, and those things tended to have relationships that were all too intimate within the module. That tended to make them more difficult to maintain because changes to one function were not isolated from the other functions in the module. Another common problem was mixed levels of abstraction. Such modules tended to mix high-level and low-level functionality. This obscured the high-level processing with low-level details, a classic forest-and-trees problem.
Perhaps the worst problem of all, though, was that the larger the module, the more interactions it tended to have with other modules, so the opportunities for side effects were greatly increased. All modern approaches to more maintainable code advocate limiting the scope where side effects can occur. The simplest way to limit scope is to keep modules small.[11]
Software Structure
The word architecture started to appear in the software literature in the '70s because people were coming to understand that a software application has a skeleton just like a building or an aardvark. The painful lesson about structure is that when it has to be changed, it usually requires a great deal of work. That's because so much other stuff in the application hangs off that structure.
There are four common symptoms that indicate structural changes are being made to an application:
- Lots of small changes are made.
- The changes are spread across many of the program units.
- Changes are difficult to do (e.g., hierarchies have to be redesigned).
- The changes themselves tend to introduce more errors and rework than normal.
Such changes are commonly called Shotgun Refactoring, and in the Hacker Era there was a whole lot of it.
Lack of Cohesion
One correlation that became apparent early on was that changes were being made to many modules when the requirements changed. One reason was that fundamental structure was being modified. The other was lack of cohesion. Functionality was spread across multiple modules, so when the requirements for that functionality changed, all the modules had to be touched. Lack of cohesion makes it difficult to figure out how to modify a program because the relevant functionality is not localized. Because different modules implemented different parts of a given functionality, those implementations tended to depend on one another to do certain things in particular orders. Although it was usually easy to recognize poor cohesion after the fact, there were no systematic practices that would guarantee good cohesion when the application was originally developed.
Coupling
There have been entire books written about coupling and how to deal with it, so the subject is beyond the practical scope of this book. At the executive summary level, coupling describes the frequency, nature, and direction of intimacy between program elements. The notion of logical coupling grew out of trying to reconcile several different observations:
- Spaghetti code is difficult to maintain.
- Bleeding cohesion across modules resulted in implementation dependencies between them.
- If a client knows a service intimately, it often has to change when the service changes.
- If a service knows a client intimately, it often has to change when the client changes.
- The need to change two elements at once depends directly on the nature of access between them.
- The need to change two elements at once is directly proportional to the intimacy of access between them.
- Bidirectional intimacy is usually worse than unidirectional intimacy.
So the notion of coupling arose to describe dependencies between modules resulting from interactions or collaborations between program units. Alas, for a program to work properly, its elements must interact in some fashion. Consequently, developers were faced with a variation on the Three Laws of Thermodynamics: (1) You can't win; (2) You can't even break even; and (3) You can't quit playing.
When thinking about coupling it is important to distinguish between logical and physical coupling. Logical coupling describes how program elements are related to one another based on the roles they play within the problem solution. Physical coupling describes what they need to know about each other to interact from a compiler's perspective. Logical coupling in some fashion is unavoidable because solving the problem depends upon it. But it can be minimized with good design practice.
Physical coupling, though, is almost entirely due to the practical issues of implementing 3GL-type systems within the constraints of hardware computational models. Suppose one module processes data passed to it from another module via a procedure call. For the compiler to write correct machine code in the receiving module, it must know how that data was implemented by the caller (e.g., integer versus floating point). Otherwise, the compiler cannot use the right ALU instructions to process it. So the compiler needs to know about the implementation of the calling module, which is a physical dependency on that implementation that is dictated by the particular hardware. The most obvious manifestation of this sort of coupling is compile-the-world, where many modules in a large application must be recompiled even though only one had a simple change.
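A rough way to see why the receiving side must know the caller's representation is to look at the raw bytes. The sketch below uses Python's standard `struct` module as a stand-in for what a compiler does when it chooses machine instructions. The specific integer is chosen only for illustration: its 32-bit pattern happens to be the IEEE 754 encoding of 1.0.

```python
import struct

# The "caller" stores a value as a 32-bit little-endian integer.
raw = struct.pack("<i", 1065353216)

# A "receiver" that knows the representation recovers the original value.
as_int = struct.unpack("<i", raw)[0]

# A receiver that wrongly assumes a 32-bit float reads the same four bytes
# as a completely different number.
as_float = struct.unpack("<f", raw)[0]

print(as_int)    # 1065353216
print(as_float)  # 1.0
```

The bytes are identical in both cases. Only the receiver's knowledge of the sender's implementation determines whether they are read correctly, and that knowledge is the physical dependency.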
Unfortunately, the literature has been concerned primarily with the frequency of access. Only in the late '80s did the deep thinkers start worrying about the direction of coupling. They noticed that when the graph of module dependencies was a directed graph without cycles (a directed acyclic graph), the program tended to be much more maintainable. They also noticed that very large applications were becoming infeasible to maintain because compile times and configuration management were becoming major headaches. The challenge lies in minimizing or controlling the physical dependencies among program elements. The techniques for doing this are generally known as dependency management.
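Checking whether a module dependency graph is free of cycles is straightforward to automate. The following is a minimal sketch, assuming Python 3.9 or later for the standard `graphlib` module. The helper `build_order` and the module names `ui`, `orders`, and `storage` are hypothetical and exist only for this example.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(dependencies):
    """Return modules ordered so each comes after the modules it depends on.

    Returns None if the dependency graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


# Hypothetical module names; each key maps to the set of modules it depends on.
layered = {"ui": {"orders"}, "orders": {"storage"}, "storage": set()}
tangled = {"ui": {"orders"}, "orders": {"storage"}, "storage": {"ui"}}

print(build_order(layered))  # ['storage', 'orders', 'ui']
print(build_order(tangled))  # None
```

In the layered case there is a valid build order, and a change to `ui` cannot ripple down into `storage`. In the tangled case every module depends, directly or indirectly, on every other module, so no such order exists.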
To date the literature has rarely addressed the nature of coupling. Since the nature of coupling plays a major role in justifying some MBD practices, it is worth identifying here a basic way of classifying the nature of coupling (i.e., the degree of intimacy), in order of increasing intimacy:
- Message identifier alone. This is a pure message with no data and no behavior. The opportunities for foot shooting are quite limited, but even this pristine form can cause problems if the message goes to the wrong place or is presented at the wrong time.
- Data by value. This form is still pretty benign because there is no way to distress the sender of the message and the receiver has full control over what is done with the data. It is somewhat worse than a pure message because the data may no longer be correct in the program context when the message is processed. This is usually only a problem in asynchronous or distributed environments when delays are possible between sending and processing the message. It can also be a problem in threaded applications where parallel processing is possible.
- Data by reference. Here we have the added problem of the receiver being able to modify data in the sender's implementation without the sender's knowledge. Data integrity now becomes a major issue. In fact, passing data by reference compounds the problems in parallel processing environments because the receiver (or anyone else to whom the sender passed the data by reference) can change it while the sender is using it.
- Behavior by value. This curious situation arises in modern programs that pass applets in messages. This is similar to data-by-reference except that it is the receiver who can be affected in unexpected ways. The reason is that the applet is in no way constrained in the things that it does. When the receiver invokes the behavior, it has effectively no control over the side effects, and it cannot predict them even though their number is potentially unlimited. If you don't think applets are a problem, ask anyone who deals with web site security.
- Behavior by reference. Though some languages from previous eras could do this by passing pointers to functions, it was very rare. Alas, in OOPLs it can be done trivially by passing object references. Like FORTRAN's assigned GOTO, it probably seemed like a good idea at the time, but it turned out badly. Today we realize this is absolutely the worst form of coupling, and it opens a huge Pandora's Box of problems. In addition to invoking behavior with potential for all sorts of side effects, the receiver can also change the object's knowledge without the sender knowing. Since the class' entire public interface is exposed, the receiver is free to invoke any aspect of the object, even those aspects that would cause a problem for the sender.
But worst of all, the encapsulation of the sender has been totally trashed. The reason is that the object whose reference is passed is part of the sender's implementation.[12] By passing it, a bay window has been opened into the implementation of the sender that invites bugs and presents a maintenance nightmare. (In OO terms, it defeats implementation hiding, which is a fundamental OO practice.) This means the maintainer must track down every place the reference is used and verify that any change to the way it is used won't break anything. Even then, we can't be sure that someone won't change the receiver during another bout of maintenance to access the instance incorrectly. And even if everything just works, one still has to rebuild the receiver because of the physical coupling.
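Three of the forms above can be contrasted in a short sketch. Two caveats apply. First, Python always passes object references, so "data by value" is emulated here with an explicit deep copy. Second, the `Account` class and the `receive_*` functions are hypothetical names invented for the illustration.

```python
import copy


class Account:
    """Hypothetical object that is part of the sender's implementation."""

    def __init__(self, balance):
        self.balance = balance

    def close(self):
        self.balance = 0


def receive_by_value(readings):
    # The receiver works on its own copy; the sender's data is untouched.
    readings.append(99)
    return readings


def receive_by_reference(readings):
    # The receiver shares the sender's list and changes it in place.
    readings.append(99)


def receive_object(account):
    # The receiver can call anything on the public interface, including
    # operations the sender never intended it to use.
    account.close()


sender_readings = [1, 2, 3]
receive_by_value(copy.deepcopy(sender_readings))
print(sender_readings)  # [1, 2, 3]

receive_by_reference(sender_readings)
print(sender_readings)  # [1, 2, 3, 99]

sender_account = Account(100)
receive_object(sender_account)
print(sender_account.balance)  # 0
```

In the last two cases the sender's own state changed without the sender doing anything. With the object reference, the receiver was not even limited to modifying data: it could invoke any behavior the class exposes.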
Because it is such a broad topic, space precludes dealing with coupling in this book. But it is important to note that the notion of coupling formalizes a flock of maintainability issues, and understanding coupling is fundamental to good program development whether you are doing OO development or not.