That One Thing: Reduce Coupling for More Scalable and Sustainable Software
How many times have you heard someone say, “I make a change and all the tests break” or “Testing doesn’t work because tests are brittle”?
These kinds of comments regularly surface in software development circles. They usually accompany explanations of why quality isn’t what it should be or why tests shouldn’t be written. These statements share an underlying cause. If you read the title of the article, you know what it is: coupling.
Coupling is simply the computer science word for dependencies. It is the degree to which code modules rely on other code modules. Larry Constantine invented it as a metric over 40 years ago for Structured Design. With a little adaptation to object-orientation and newer language concepts, it’s as relevant today as it was then.
Coupling impacts our work on many levels, but we’ll focus on the micro level: the coupling of tests to production code, which sits squarely in the realm of our day-to-day applied software craftsmanship.
Necessary Coupling
It’s easy to overlook the ways in which our software is coupled. It tends to be more obvious in strongly and statically typed languages than in weakly or dynamically typed ones. (Note: By “strongly typed” I mean that data has a strong sense of type rather than that strict type-safety is enforced). Thus, it’s easier to see it in C++ and Java (strong and static) than it might be in Ruby (strongish and dynamic) and JavaScript (weakish and dynamic), which in turn show coupling more easily than Bourne Shell (weak and dynamic).
For this exploration, let’s assume an object-oriented context, because it gives us a fairly complete set of the types of coupling available. We’ll use tests for many of the examples, since coupling is more often overlooked in tests than in application code.
Let’s start with the most apparent forms of coupling in Java.
```java
@Test
public void testCalculateResult() {
    Something sut = new Something();
    Argument input = new Argument();
    Result result = sut.calculateResult(input);
    assertNotNull(result);
}
```
Some forms of coupling are unavoidable in tests. This test depends on
- the class it is testing (Something),
- the constructor for the class,
- the method it is testing (calculateResult),
- the type of the argument to the method (Argument),
- the constructor to the argument type, and
- the return type of the method (Result).
Are you surprised that we have six points of coupling in a fairly simple test? And that’s without testing anything meaningful about result using getters or setting any attributes of input through constructor arguments or setters.
So why does this matter? Your test can now break because of changes to any of three classes: Something, Argument, and Result. In this case, they’re all part of the interface, so unless the interface is simplified to reduce the number of participating types, this coupling is what we have.
The Chains That Bind
Here is another tragically common scenario, if you’ll forgive the melodramatic characterization, this time in JavaScript.
```javascript
it('gets what it gets', function() {
    expect(chain.getLock().getKey()).toBe(theRightKey);
});
```
Obviously, we’re dependent on chain and its type and its getLock method. But wait! The getLock method returns a type with a getKey method that in turn returns a type, which we assume to be the same type as theRightKey. We’ve now added three more couplings that aren’t inherently part of the module we’re testing.
Methods like getLock are commonly lazy wrappers that expose an implementation type (more on that in a moment), which implies we’re testing implementation rather than intent. Without static typing, we’re also making assumptions about the type returned and the presence of the subsequent getKey method. We’re transitively coupling ourselves to the representation of the lock.
So what can we do about this? I would either just verify that the lock was the expected lock, perhaps with something that tested the deep equality necessary to verify the key, or I would add a getKey method to chain that hid the implementation detail of the lock. Either way, the test would no longer be coupled to the implementation, and the interface would no longer force its clients to couple to the implementation.
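The second option can be sketched in Java for illustration. Chain, Lock, and Key are hypothetical stand-ins for the types implied by the JavaScript example above; the point is only that Chain exposes getKey directly and keeps its Lock private.

```java
import java.util.Objects;

// Hypothetical value type standing in for the article's key.
final class Key {
    private final String cut;

    Key(String cut) {
        this.cut = cut;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && ((Key) o).cut.equals(cut);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cut);
    }
}

// Hypothetical implementation detail that clients should not need to see.
final class Lock {
    private final Key key;

    Lock(Key key) {
        this.key = key;
    }

    Key getKey() {
        return key;
    }
}

// Hypothetical class under test: it hides the lock entirely.
final class Chain {
    private final Lock lock;

    Chain(Lock lock) {
        this.lock = lock;
    }

    // Delegates to the lock, so callers write chain.getKey()
    // instead of chain.getLock().getKey().
    Key getKey() {
        return lock.getKey();
    }
}
```

A test can then assert on chain.getKey() alone, and the lock can be restructured or replaced without the test noticing.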
Contain Yourself
Going back to Java, let’s look a little more closely at the problems with implementation types.
```java
@Test
public void testGetCustomers() {
    ArrayList<String> customers = sut.getCustomers();
    ...
}
```
Most Java programmers know you shouldn’t use concrete types in your interfaces; declaring the interface type instead avoids coupling callers to the implementation. So what if we changed ArrayList to just be List?
```java
@Test
public void testGetCustomers() {
    List<String> customers = sut.getCustomers();
    ...
}
```
We’ve improved things, but is it good enough? Is it important to the test that this is a list? A list is an implementation decision predicated on the need for ordered storage. If the test will sort it anyway, verify set-wise equality, or simply iterate over the members, the test (and perhaps the return type of getCustomers) can be changed to use Collection or even Iterable instead of List. This would also provide resilience should you later decide that the customers were better represented with some form of Set or other Collection.
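A minimal sketch of the looser version, assuming a hypothetical CustomerDirectory as the class under test: getCustomers returns a Collection, and the check compares set-wise, so it holds no matter which concrete collection backs the result.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical class under test. It stores customers in a LinkedHashSet,
// but callers only ever see a Collection, so that choice can change freely.
final class CustomerDirectory {
    private final Set<String> customers = new LinkedHashSet<>();

    void add(String name) {
        customers.add(name);
    }

    Collection<String> getCustomers() {
        return Collections.unmodifiableCollection(customers);
    }
}

// Hypothetical test helper: compares set-wise, ignoring order, and also
// checks the size so that duplicates in the result are not masked.
final class CustomerAssertions {
    static boolean containsExactly(Collection<String> actual, Set<String> expected) {
        return actual.size() == expected.size()
                && new HashSet<>(actual).equals(expected);
    }
}
```

In a JUnit test, a similar order-insensitive check can be written as assertEquals against a HashSet copy of the returned collection.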
On the one hand, this is all basic algorithmics. On the other, I’ve seen many developers painstakingly choose their internal representation, give little thought to their return types, and write their tests unquestioningly.
To Kill A Mocking Test
I often hear developers talking casually about testing code via “mocks.” Sometimes this decodes as meaning any form of test double. Other times this means they are using a self-proclaimed mocking framework to test. Yet other times they are ardent practitioners of the Mockist arts or unquestioning FactoryGirl users.
First of all, know your test doubles. A dummy simply holds a place. A stub returns a canned value. A spy records interactions for later verification. A mock verifies as many interactions as possible during execution. A fake substitutes a lighter-weight implementation for a heavier one. Just as not all tests written with JUnit are unit tests, not all test doubles created with a mocking framework are mocks. Most mocking frameworks can create any type of test double; some “mocking frameworks” can’t really create mocks by this definition.
The key to choosing test doubles is to realize that each type incurs a different degree of coupling. In general, prefer no double over a dummy over a stub over a spy over a mock. Fakes are different and outside of the hierarchy, more commonly a tool of necessity than of choice.
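Because fakes sit outside that hierarchy, a short sketch may help distinguish them. CustomerRepository and InMemoryCustomerRepository are hypothetical names; the idea is that a fake is a working, lighter-weight implementation (here a HashMap standing in for a database) rather than a script of canned responses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface that production code would implement against a database.
interface CustomerRepository {
    void save(String id, String name);

    Optional<String> findName(String id);
}

// A fake: genuinely stores and retrieves data, just in memory.
// Tests use it through the same interface without scripting individual calls.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void save(String id, String name) {
        rows.put(id, name);
    }

    @Override
    public Optional<String> findName(String id) {
        return Optional.ofNullable(rows.get(id));
    }
}
```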
At the extreme end of the Mockist spectrum, you have those for whom every testing problem deserves a mock. Arlo Belshee wrote an excellent blog post called “The No Mocks Book,” with some follow-up from Steve Freeman and others, that highlights some of the more absurd consequences of a mocks-only approach, such as over-engineering.
Let’s look at the more subtle differences between the other test doubles. In general, a double takes the place of some element of the interface or implementation of the software under test. Leveraging that observation with our perspective from the earlier examples, we can analyze the remaining doubles.
The most common use of a dummy is to fill a parameter in an argument list that is syntactically required but otherwise unnecessary for the test. In that case, a dummy uses a type that the interface already requires. Its construction can incur additional coupling, but it is otherwise benign. If the system already has null-object implementations, they make great dummies.
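A minimal sketch of a null object serving as a dummy. AuditLog, NullAuditLog, and Invoice are hypothetical: Invoice’s constructor demands an AuditLog, but the behavior under test (the total) never touches it.

```java
// Hypothetical collaborator that the class under test requires.
interface AuditLog {
    void record(String event);
}

// Null-object implementation: accepts events and does nothing with them.
// If production code already ships one, it doubles as a dummy in tests.
final class NullAuditLog implements AuditLog {
    @Override
    public void record(String event) {
        // Intentionally empty.
    }
}

// Hypothetical class under test. totalInCents() never touches the log,
// but the constructor still requires one.
final class Invoice {
    private final AuditLog log;
    private final long[] lineAmountsInCents;

    Invoice(AuditLog log, long... lineAmountsInCents) {
        this.log = log;
        this.lineAmountsInCents = lineAmountsInCents.clone();
    }

    void close() {
        log.record("invoice closed");
    }

    long totalInCents() {
        long sum = 0;
        for (long amount : lineAmountsInCents) {
            sum += amount;
        }
        return sum;
    }
}
```

The test couples only to the AuditLog type, which the interface already requires, and to NullAuditLog’s constructor.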
A stub fills a place like a dummy, but also provides canned responses to certain interactions. Interactions imply methods, and responses imply return types and their possible construction beyond the needs of a dummy. These three additional points of coupling (the methods, their return types, and the construction of return values) make a stub a slightly more binding form of test double.
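A hand-rolled stub, sketched with hypothetical TaxTable, PriceCalculator, and FixedRateTaxTable types. Writing FixedRateTaxTable forces the test to know the ratePercent method and its int return type, which a dummy would not.

```java
// Hypothetical dependency of the class under test.
interface TaxTable {
    int ratePercent(String region);
}

// Hypothetical class under test.
final class PriceCalculator {
    private final TaxTable taxes;

    PriceCalculator(TaxTable taxes) {
        this.taxes = taxes;
    }

    long grossCents(long netCents, String region) {
        return netCents + netCents * taxes.ratePercent(region) / 100;
    }
}

// A stub: fills the TaxTable slot and returns the same canned answer
// for every region.
final class FixedRateTaxTable implements TaxTable {
    private final int cannedPercent;

    FixedRateTaxTable(int cannedPercent) {
        this.cannedPercent = cannedPercent;
    }

    @Override
    public int ratePercent(String region) {
        return cannedPercent;
    }
}
```

With a canned 20 percent rate, a net of 1000 cents should come back as 1200. If ratePercent returned a richer type, the stub would also have to construct instances of it.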
Spies record interactions for later verification. Interactions also imply methods. Recording method invocations suggests storage corresponding to the parameters, their types, and possibly other attributes such as invocation counts and information about known side effects. Spies also tend to be used in clusters, multiplying the degree of coupling.
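A hand-rolled spy, sketched with hypothetical Notifier, RecordingNotifier, and OrderService types. Notice that the spy’s storage mirrors the parameters of send, which is where its extra coupling comes from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collaborator.
interface Notifier {
    void send(String recipient, String message);
}

// A spy: records every call so the test can inspect them afterwards.
final class RecordingNotifier implements Notifier {
    private final List<String> calls = new ArrayList<>();

    @Override
    public void send(String recipient, String message) {
        calls.add(recipient + ": " + message);
    }

    List<String> calls() {
        return Collections.unmodifiableList(calls);
    }

    int callCount() {
        return calls.size();
    }
}

// Hypothetical class under test.
final class OrderService {
    private final Notifier notifier;

    OrderService(Notifier notifier) {
        this.notifier = notifier;
    }

    void place(String customer, String item) {
        notifier.send(customer, "Order placed: " + item);
    }
}
```

Verification happens after the fact, by asking the spy what it saw; a second spied collaborator would add its own method signatures and storage to the test’s dependencies.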
Finally, mocks have all of the characteristics of spies, plus they encapsulate specific expectations during execution. Note that there are things a mock can’t verify on the fly until it knows the test is finished. For example, a mock can easily trigger an assertion when the test invokes a method more times than expected, but it can’t know the method was invoked fewer times than expected until you explicitly tell it the test is done.
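A hand-rolled mock that illustrates this asymmetry, written without any framework. Mailer and MockMailer are hypothetical; real mocking frameworks differ in how they express expectations and verification.

```java
// Hypothetical collaborator.
interface Mailer {
    void send(String recipient);
}

// A mock: it carries its expectations and checks them as calls arrive.
final class MockMailer implements Mailer {
    private final String expectedRecipient;
    private final int expectedCalls;
    private int actualCalls;

    MockMailer(String expectedRecipient, int expectedCalls) {
        this.expectedRecipient = expectedRecipient;
        this.expectedCalls = expectedCalls;
    }

    @Override
    public void send(String recipient) {
        actualCalls++;
        if (!expectedRecipient.equals(recipient)) {
            throw new AssertionError("unexpected recipient: " + recipient);
        }
        // Too many calls can be detected immediately, during execution.
        if (actualCalls > expectedCalls) {
            throw new AssertionError("send() called " + actualCalls
                    + " times, expected " + expectedCalls);
        }
    }

    // Too few calls can only be detected once the test says it is done.
    void verify() {
        if (actualCalls != expectedCalls) {
            throw new AssertionError("send() called " + actualCalls
                    + " times, expected " + expectedCalls);
        }
    }
}
```

A test that forgets to call verify() will silently pass when send() is never called at all.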
Another, more insidious form of coupling comes with mocks. The poster child for mocks is the purely behavioral method, such as a method whose only job is to orchestrate calls to other methods. The sneaky coupling lies in the binding to behavior, something that compilers and syntax checkers don’t understand. This raw coupling to the implementation, rather than to the intent, comes hand-in-hand with many uses of mocks.
Hang Loose
Hopefully, I’ve started you toward a thorough understanding of how coupling impacts your tests and the options for minimizing and avoiding it. Like most aspects of craftsmanship, intentional practice builds the skills that you can apply more effortlessly in your day-to-day routine. Take some time to really ponder how you can loosen the coupling in your tests. You will find your tests become easier to write and less brittle to maintain. The same principles will improve the design of your software and hopefully extend its useful and maintainable life.