Goals of Test Automation
We all come to test automation with some notion of why having automated tests would be a "good thing." Here are some high-level objectives that might apply:
- Tests should help us improve quality.
- Tests should help us understand the SUT.
- Tests should reduce (and not introduce) risk.
- Tests should be easy to run.
- Tests should be easy to write and maintain.
- Tests should require minimal maintenance as the system evolves around them.
The first three objectives demonstrate the value provided by the tests, whereas the last three objectives focus on the characteristics of the tests themselves. Most of these objectives can be decomposed into more concrete (and measurable) goals. I have given these short catchy names so that I can refer to them as motivators of specific principles or patterns.
Tests Should Help Us Improve Quality
The traditional reason given for doing testing is for quality assurance (QA). What, precisely, do we mean by this? What is quality? Traditional definitions distinguish two main categories of quality based on the following questions: (1) Is the software built correctly? and (2) Have we built the right software?
Goal: Tests as Specification
If we are doing test-driven development or test-first development, the tests give us a way to capture what the SUT should be doing before we start building it. They enable us to specify the behavior in various scenarios captured in a form that we can then execute (essentially an "executable specification"). To ensure that we are "building the right software," we must ensure that our tests reflect how the SUT will actually be used. This effort can be facilitated by developing user interface mockups that capture just enough detail about how the application appears and behaves so that we can write our tests.
The very act of thinking through various scenarios in enough detail to turn them into tests helps us identify those areas where the requirements are ambiguous or self-contradictory. Such analysis improves the quality of the specification, which improves the quality of the software so specified.
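As a minimal sketch of an "executable specification," the tests below state two scenarios for a hypothetical ShoppingCart. The class, its methods, and the pricing rule are assumptions invented for illustration; in test-first development the test class would be written before the cart existed.

```python
import unittest


class ShoppingCart:
    """Hypothetical SUT, invented for this sketch."""

    def __init__(self):
        self._lines = []

    def add(self, unit_price_cents, quantity=1):
        self._lines.append((unit_price_cents, quantity))

    def total_cents(self):
        return sum(price * quantity for price, quantity in self._lines)


class ShoppingCartSpecification(unittest.TestCase):
    """Each test name states one scenario from the (assumed) requirements."""

    def test_new_cart_has_zero_total(self):
        self.assertEqual(ShoppingCart().total_cents(), 0)

    def test_total_is_price_times_quantity_summed_over_lines(self):
        cart = ShoppingCart()
        cart.add(unit_price_cents=250, quantity=2)
        cart.add(unit_price_cents=100)
        self.assertEqual(cart.total_cents(), 600)
```

Saved in a module, these tests can be run with python -m unittest. Writing the second scenario forces a decision the requirements might have left open, such as whether quantity defaults to one.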
Goal: Bug Repellent
Yes, tests find bugs—but that really isn't what automated testing is about. Automated testing tries to prevent bugs from being introduced. Think of automated tests as "bug repellent" that keeps nasty little bugs from crawling back into our software after we have made sure it doesn't contain any bugs. Wherever we have regression tests, we won't have bugs because the tests will point the bugs out before we even check in our code. (We are running all the tests before every check-in, aren't we?)
Goal: Defect Localization
Mistakes happen! Of course, some mistakes are much more expensive to prevent than to fix. Suppose a bug does slip through somehow and shows up in the Integration Build [SCM]. If our unit tests are fairly small (i.e., we test only a single behavior in each one), we should be able to pinpoint the bug quickly based on which test fails. This specificity is one of the major advantages that unit tests enjoy over customer tests. The customer tests tell us that some behavior expected by the customer isn't working; the unit tests tell us why. We call this phenomenon Defect Localization. If a customer test fails but no unit tests fail, it indicates a Missing Unit Test (see Production Bugs on page 268).
All of these benefits are wonderful—but we cannot achieve them if we don't write tests for all possible scenarios that each unit of software needs to cover. Nor will we realize these benefits if the tests themselves contain bugs. Clearly, it is crucial that we keep the tests as simple as possible so that they can be easily seen to be correct. While writing unit tests for our unit tests is not a practical solution, we can—and should—write unit tests for any Test Utility Method (page 599) to which we delegate complex algorithms needed by the test methods.
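One way to apply the last point is sketched below. The helper assert_sorted_ascending is a hypothetical Test Utility Method (a Custom Assertion containing a loop), so it gets unit tests of its own before any Test Method relies on it.

```python
import unittest


def assert_sorted_ascending(values):
    """Hypothetical Test Utility Method: a Custom Assertion with real logic in it."""
    for index in range(1, len(values)):
        if values[index - 1] > values[index]:
            raise AssertionError(
                f"not sorted at index {index}: "
                f"{values[index - 1]!r} > {values[index]!r}"
            )


class AssertSortedAscendingTest(unittest.TestCase):
    """Tests for the utility itself, so Test Methods can delegate to it safely."""

    def test_accepts_sorted_values_including_duplicates(self):
        assert_sorted_ascending([1, 2, 2, 5])

    def test_accepts_empty_list(self):
        assert_sorted_ascending([])

    def test_accepts_single_element_list(self):
        assert_sorted_ascending([7])

    def test_rejects_values_out_of_order(self):
        with self.assertRaises(AssertionError):
            assert_sorted_ascending([1, 3, 2])
```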
Tests Should Help Us Understand the SUT
Repelling bugs isn't the only thing the tests can do for us. They can also show the test reader how the code is supposed to work. Black box component tests are, in effect, describing the requirements of that software component.
Goal: Tests as Documentation
Without automated tests, we would need to pore over the SUT code trying to answer the question, "What should be the result if . . . ?" With automated tests, we simply use the corresponding Tests as Documentation; they tell us what the result should be (recall that a Self-Checking Test states the expected outcome in one or more assertions). If we want to know how the system does something, we can turn on the debugger, run the test, and single-step through the code to see how it works. In this sense, the automated tests act as a form of documentation for the SUT.
Tests Should Reduce (and Not Introduce) Risk
As mentioned earlier, tests should improve the quality of our software by helping us better document the requirements and prevent bugs from creeping in during incremental development. This is certainly one form of risk reduction. Other forms of risk reduction involve verifying the software's behavior in the "impossible" circumstances that cannot be induced when doing traditional customer testing of the entire application as a black box. It is a very useful exercise to review all of the project's risks and brainstorm about which kinds of risks could be at least partially mitigated through the use of Fully Automated Tests.
Goal: Tests as Safety Net
When working on legacy code, I always feel nervous. By definition, legacy code doesn't have a suite of automated regression tests. Changing this kind of code is risky because we never know what we might break, and we have no way of knowing whether we have broken something! As a consequence, we must work very slowly and carefully, doing a lot of manual analysis before making any changes.
When working with code that has a regression test suite, by contrast, we can work much more quickly. We can adopt a more experimental style of changing the software: "I wonder what would happen if I changed this? Which tests fail? Interesting! So that's what this parameter is for." In this way, the automated tests act as a safety net that allows us to take chances.
The effectiveness of the safety net is determined by how completely our tests verify the behavior of the system. Missing tests are like holes in the safety net. Incomplete assertions are like broken strands. Each gap in the safety net can let bugs of various sizes through.
The effectiveness of the safety net is amplified by the version-control capabilities of modern software development environments. A source code repository [SCM] such as CVS, Subversion, or SourceSafe lets us roll back our changes to a known point if our tests suggest that the current set of changes is affecting the code too extensively. The built-in "undo" or "local history" features of the IDE let us turn the clock back 5 seconds, 5 minutes, or even 5 hours.
Goal: Do No Harm
Naturally, there is a flip side to this discussion: How might automated tests introduce risk? We must be careful not to introduce new kinds of problems into the SUT as a result of doing automated testing. The Keep Test Logic Out of Production Code principle directs us to avoid putting test-specific hooks into the SUT. It is certainly desirable to design the system for testability, but any test-specific code should be plugged in by the test and only in the test environment; it should not exist in the SUT when it is in production.
Another form of risk is believing that some code is reliable because it has been thoroughly tested when, in fact, it has not. A common mistake made by developers new to the use of Test Doubles (page 522) is replacing too much of the SUT with a Test Double. This leads to another important principle: Don't Modify the SUT. That is, we must be clear about which SUT we are testing and avoid replacing the parts we are testing with test-specific logic (Figure 3.3).
Figure 3.3 A range of tests, each with its own SUT. An application, component, or unit is only the SUT with respect to a specific set of tests. The "Unit1 SUT" plays the role of DOC (part of the fixture) to the "Unit2 Test" and is part of the "Comp1 SUT."
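The sketch below shows one way to respect both principles. All names are hypothetical: InvoiceCalculator is the SUT and stays real, while only its depended-on component (a tax rate provider) is replaced by a Test Double that is defined in test code and plugged in by the test.

```python
import unittest


class InvoiceCalculator:
    """Hypothetical SUT: the logic under test stays real."""

    def __init__(self, tax_rate_provider):
        self._tax_rate_provider = tax_rate_provider  # depended-on component (DOC)

    def total_cents(self, net_cents, region):
        rate_percent = self._tax_rate_provider.rate_percent_for(region)
        return net_cents + net_cents * rate_percent // 100


class FixedTaxRateStub:
    """Test Double for the DOC only; it lives in test code, not production code."""

    def __init__(self, rate_percent):
        self._rate_percent = rate_percent

    def rate_percent_for(self, region):
        return self._rate_percent


class InvoiceCalculatorTest(unittest.TestCase):
    def test_total_adds_tax_at_the_rate_supplied_by_the_provider(self):
        calculator = InvoiceCalculator(FixedTaxRateStub(rate_percent=10))
        total = calculator.total_cents(net_cents=2000, region="anywhere")
        self.assertEqual(total, 2200)
```

Had the test stubbed total_cents itself, it would pass without exercising any of the code it claims to verify.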
Tests Should Be Easy to Run
Most software developers just want to write code; testing is simply a necessary evil in our line of work. Automated tests provide a nice safety net so that we can write code more quickly, but we will run the automated tests frequently only if they are really easy to run.
What makes tests easy to run? Four specific goals answer this question:
- They must be Fully Automated Tests so they can be run without any effort.
- They must be Self-Checking Tests so they can detect and report any errors without manual inspection.
- They must be Repeatable Tests so they can be run multiple times with the same result.
- Ideally, each test should be an Independent Test that can be run by itself.
With these four goals satisfied, one click of a button (or keyboard shortcut) is all it should take to get the valuable feedback the tests provide. Let's look at these goals in a bit more detail.
Goal: Fully Automated Test
A test that can be run without any Manual Intervention (page 250) is a Fully Automated Test. Satisfying this criterion is a prerequisite to meeting many of the other goals. Yes, it is possible to write Fully Automated Tests that don't check the results and that can be run only once. The main() program that runs the code and directs print statements to the console is a good example of such a test. I consider these two aspects of test automation to be so important in making tests easy to run that I have made them separate goals: Self-Checking Test and Repeatable Test.
Goal: Self-Checking Test
A Self-Checking Test has encoded within it everything that the test needs to verify that the actual outcome matches the expected outcome. Self-Checking Tests apply the Hollywood principle ("Don't call us; we'll call you") to running tests. That is, the Test Runner (page 377) "calls us" only when a test did not pass; as a consequence, a clean test run requires zero manual effort. Many members of the xUnit family provide a Graphical Test Runner (see Test Runner) that uses a green bar to signal that everything is "A-okay"; a red bar indicates that a test has failed and warrants further investigation.
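To make the contrast concrete, here is a small sketch built around a hypothetical celsius_to_fahrenheit function. The manual_check function plays the role of the print-to-console program: it runs without intervention but leaves the judgment to a human. The test class encodes the expected outcome in assertions, so the Test Runner can do the judging.

```python
import unittest


def celsius_to_fahrenheit(celsius):
    """Hypothetical SUT used only for illustration."""
    return celsius * 9 / 5 + 32


def manual_check():
    """Not self-checking: a person must read the output and judge it."""
    print(celsius_to_fahrenheit(100))


class CelsiusToFahrenheitTest(unittest.TestCase):
    """Self-checking: the expected outcome is encoded in each assertion."""

    def test_boiling_point_of_water(self):
        self.assertEqual(celsius_to_fahrenheit(100), 212)

    def test_freezing_point_of_water(self):
        self.assertEqual(celsius_to_fahrenheit(0), 32)
```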
Goal: Repeatable Test
A Repeatable Test can be run many times in a row and will produce exactly the same results without any human intervention between runs. Unrepeatable Tests (see Erratic Test on page 228) increase the overhead of running tests significantly. This outcome is very undesirable because we want all developers to be able to run the tests very frequently—as often as after every "save." Unrepeatable Tests can be run only once before whoever is running the tests must perform a Manual Intervention. Just as bad are Nondeterministic Tests (see Erratic Test) that produce different results at different times; they force us to spend lots of time chasing down failing tests. The power of the red bar diminishes significantly when we see it regularly without good reason. All too soon, we begin ignoring the red bar, assuming that it will go away if we wait long enough. Once this happens, we have lost a lot of the value of our automated tests, because the feedback indicating that we have introduced a bug and should fix it right away disappears. The longer we wait, the more effort it takes to find the source of the failing test.
Tests that run only in memory and that use only local variables or fields are usually repeatable without us expending any additional effort. Unrepeatable Tests usually come about because we are using a Shared Fixture (page 317) of some sort (this definition includes any persistence of data implemented within the SUT). In such a case, we must ensure that our tests are "self-cleaning" as well. When cleaning is necessary, the most consistent and foolproof strategy is to use a generic Automated Teardown (page 503) mechanism. Although it is possible to write teardown code for each test, this approach can result in Erratic Tests when it is not implemented correctly in every test.
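The following is a rough approximation of a "self-cleaning" test, not a full generic Automated Teardown mechanism. It assumes a hypothetical append_line function that persists data to a file, and it relies on the addCleanup hook of Python's unittest to register the cleanup once, at the moment the resource is created, rather than repeating teardown code in each test.

```python
import os
import shutil
import tempfile
import unittest


def append_line(path, text):
    """Hypothetical SUT that persists data: it appends one line to a file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text + "\n")


class AppendLineTest(unittest.TestCase):
    def setUp(self):
        # Fresh fixture per test: a private directory that no other test uses.
        self.workdir = tempfile.mkdtemp()
        # Register cleanup immediately so it runs even if the test fails.
        self.addCleanup(shutil.rmtree, self.workdir, ignore_errors=True)
        self.log_path = os.path.join(self.workdir, "events.log")

    def test_first_append_creates_a_file_with_one_line(self):
        append_line(self.log_path, "started")
        with open(self.log_path, encoding="utf-8") as handle:
            self.assertEqual(handle.read().splitlines(), ["started"])
```

Because every run starts from an empty directory and removes it afterward, the test gives the same result on the first run and on the hundredth. Had it appended to a fixed path without cleaning up, the second run would find two lines and fail.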
Tests Should Be Easy to Write and Maintain
Coding is a fundamentally difficult activity because we must keep a lot of information in our heads as we work. When we are writing tests, we should stay focused on the testing rather than on the coding of the tests. This means that tests must be simple: simple to read and simple to write. They need to be simple to read and understand because testing the automated tests themselves is a complicated endeavor. They can be tested properly only by introducing the very bugs that they are intended to detect into the SUT; this is hard to do in an automated way, so it is usually done only once (if at all), when the test is first written. For these reasons, we need to rely on our eyes to catch any problems that creep into the tests, and that means we must keep the tests simple enough to read quickly.
Of course, if we are changing the behavior of part of the system, we should expect a small number of tests to be affected by our modifications. We want to Minimize Test Overlap so that only a few tests are affected by any one change. Contrary to popular opinion, having more tests pass through the same code doesn't improve the quality of the code if most of the tests do exactly the same thing.
Tests become complicated for two reasons:
- We try to verify too much functionality in a single test.
- Too large an "expressiveness gap" separates the test scripting language (e.g., Java) and the before/after relationships between domain concepts that we are trying to express in the test.
Goal: Simple Tests
To avoid "biting off more than they can chew," our tests should be small and test one thing at a time. Keeping tests simple is particularly important during test-driven development because code is written to pass one test at a time and we want each test to introduce only one new bit of behavior into the SUT. We should strive to Verify One Condition per Test by creating a separate Test Method (page 348) for each unique combination of pre-test state and input. Each Test Method should drive the SUT through a single code path.
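As an illustration, consider a hypothetical Account class whose withdraw method has two code paths. The sketch below gives each path its own Test Method, with the pre-test state and input stated in the test itself; the class and its rules are assumptions made for this example.

```python
import unittest


class Account:
    """Hypothetical SUT with two code paths in withdraw()."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents

    def withdraw(self, amount_cents):
        if amount_cents > self.balance_cents:
            raise ValueError("insufficient funds")
        self.balance_cents -= amount_cents


class AccountWithdrawTest(unittest.TestCase):
    def test_withdraw_within_balance_reduces_balance(self):
        account = Account(balance_cents=500)
        account.withdraw(200)
        self.assertEqual(account.balance_cents, 300)

    def test_withdraw_beyond_balance_is_rejected(self):
        account = Account(balance_cents=500)
        with self.assertRaises(ValueError):
            account.withdraw(501)
```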
The major exception to the mandate to keep Test Methods short occurs with customer tests that express real usage scenarios of the application. Such extended tests offer a useful way to document how a potential user of the software would go about using it; if these interactions involve long sequences of steps, the Test Methods should reflect this reality.
Goal: Expressive Tests
The "expressiveness gap" can be addressed by building up a library of Test Utility Methods that constitute a domain-specific testing language. Such a collection of methods allows test automaters to express the concepts that they wish to test without having to translate their thoughts into much more detailed code. Creation Methods (page 415) and Custom Assertions (page 474) are good examples of the building blocks that make up such a Higher-Level Language.
The key to solving this dilemma is avoiding duplication within tests. The DRY principle—"Don't repeat yourself"—of the Pragmatic Programmers (http://www.pragmaticprogrammer.com) should be applied to test code in the same way it is applied to production code. There is, however, a counterforce at play. Because the tests should Communicate Intent, it is best to keep the core test logic in each Test Method so it can be seen in one place. Nevertheless, this idea doesn't preclude moving a lot of supporting code into Test Utility Methods, where it needs to be modified in only one place if it is affected by a change in the SUT.
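A minimal sketch of these building blocks follows. Customer and price_after_discount are hypothetical, as are the helper names; the point is only that the Creation Method hides fields the test does not care about and the Custom Assertion states the expectation in domain terms, while the core logic of the test stays visible in the Test Method.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical domain object with more fields than most tests care about."""
    name: str
    email: str
    country: str
    discount_percent: int


def price_after_discount(customer, list_price_cents):
    """Hypothetical SUT."""
    return list_price_cents * (100 - customer.discount_percent) // 100


class PricingTest(unittest.TestCase):
    # Creation Method: hides irrelevant fields, exposes the one that matters.
    def create_customer_with_discount(self, discount_percent):
        return Customer("any name", "any@example.com", "CA", discount_percent)

    # Custom Assertion: states the expectation in domain terms.
    def assert_price(self, customer, list_price_cents, expected_cents):
        actual = price_after_discount(customer, list_price_cents)
        self.assertEqual(
            actual, expected_cents,
            f"price at {customer.discount_percent}% discount",
        )

    def test_ten_percent_discount_is_applied_to_list_price(self):
        customer = self.create_customer_with_discount(10)
        self.assert_price(customer, list_price_cents=2000, expected_cents=1800)
```

If Customer later gains a required field, only the Creation Method needs to change.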
Goal: Separation of Concerns
Separation of Concerns applies in two dimensions: (1) We want to keep test code separate from our production code (Keep Test Logic Out of Production Code) and (2) we want each test to focus on a single concern (Test Concerns Separately) to avoid Obscure Tests (page 186). A good example of what not to do is testing the business logic in the same tests as the user interface, because it involves testing two concerns at the same time. If either concern is modified (e.g., the user interface changes), all the tests would need to be modified as well. Testing one concern at a time may require separating the logic into different components. This is a key aspect of design for testability, a consideration that is explored further in Chapter 11, Using Test Doubles.
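The second dimension can be sketched as follows, assuming a hypothetical shipping rule and a hypothetical formatting helper. Because the business rule and the presentation live in separate functions, each can be tested on its own, and a change to how amounts are displayed touches only the formatting test.

```python
import unittest


def shipping_cost_cents(weight_grams):
    """Hypothetical business rule, kept free of any presentation code."""
    return 500 if weight_grams <= 1000 else 900


def format_cents(amount_cents):
    """Hypothetical presentation helper, kept free of any business rules."""
    return f"${amount_cents // 100}.{amount_cents % 100:02d}"


class ShippingCostTest(unittest.TestCase):
    def test_parcel_at_weight_limit_ships_at_base_rate(self):
        self.assertEqual(shipping_cost_cents(1000), 500)

    def test_parcel_over_weight_limit_ships_at_heavy_rate(self):
        self.assertEqual(shipping_cost_cents(1001), 900)


class FormatCentsTest(unittest.TestCase):
    def test_amount_is_rendered_as_dollars_and_two_digit_cents(self):
        self.assertEqual(format_cents(905), "$9.05")
```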
Tests Should Require Minimal Maintenance as the System Evolves Around Them
Change is a fact of life. Indeed, we write automated tests mostly to make change easier, so we should strive to ensure that our tests don't inadvertently make change more difficult.
Suppose we want to change the signature of some method on a class. When we add a new parameter, suddenly 50 tests no longer compile. Does that result encourage us to make the change? Probably not. To counter this problem, we introduce a new method with the parameter and arrange to have the old method call the new method, defaulting the missing parameter to some value. Now all of the tests compile but 30 of them still fail! Are the tests helping us make the change?
Goal: Robust Test
Inevitably, we will want to make many kinds of changes to the code as a project unfolds and its requirements evolve. For this reason, we want to write our tests in such a way that the number of tests affected by any one change is quite small. That means we need to minimize overlap between tests. We also need to ensure that changes to the test environment don't affect our tests; we do this by isolating the SUT from the environment as much as possible. This results in much more Robust Tests.
We should strive to Verify One Condition per Test. Ideally, only one kind of change should cause a test to require maintenance. System changes that affect fixture setup or teardown code can be encapsulated behind Test Utility Methods to further reduce the number of tests directly affected by the change.
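One common way to isolate the SUT from its environment is to pass environmental inputs in rather than have the SUT fetch them. The sketch below assumes a hypothetical greeting function that receives the current time as an argument instead of reading the system clock, so the tests give the same result at any hour on any machine.

```python
import unittest
from datetime import datetime


def greeting(now):
    """Hypothetical SUT: the caller supplies the time, so tests control it."""
    return "Good morning" if now.hour < 12 else "Good afternoon"


class GreetingTest(unittest.TestCase):
    def test_morning_greeting_before_noon(self):
        self.assertEqual(greeting(datetime(2024, 1, 15, 9, 0)), "Good morning")

    def test_afternoon_greeting_from_noon_onward(self):
        self.assertEqual(greeting(datetime(2024, 1, 15, 12, 0)), "Good afternoon")
```

In production the caller would pass datetime.now(); that single call site is the only place that depends on the real clock.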