
Test-Driven Development from a Conventional Software Testing Perspective, Part 2

Date: Apr 21, 2006


Now that Jonathan Kohl had had some experience working with a test-driven development (TDD) expert, he needed to try TDD on his own. In part 2 of this series, he discusses his trial-and-error effort at learning TDD skills.

After my experience of test-driven development (TDD) immersion described in part 1 of this series, I was ready to take the next step in my learning. I had acquired some basics on how to do TDD from an expert practitioner, but realized that I still had much more to learn.

As my TDD teacher said, "Practice makes perfect." I needed to do more programming, but in a strict TDD way, so I dabbled here and there as I wrote test automation code in Ruby. I grew comfortable with the Ruby Test::Unit automated unit test framework and practiced writing a test and then just enough code to make that test pass. I was ready to take my TDD practice to the next level, so when I had an opportunity to do more test automation work, I jumped at the chance. After all, test automation is software development, so for a software tester who does test automation work, it seemed like a great place to apply TDD and learn more.
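For readers who haven't seen Test::Unit, here is a minimal sketch of the cycle I was practicing; the method and test names are made up for illustration. The test is written first and fails (red bar) until just enough production code exists to make it pass (green bar).

require 'test/unit'

# Hypothetical example of the basic red/green cycle: write the test first,
# watch it fail, then write just enough code to make it pass.
class TestNormalize < Test::Unit::TestCase
  def test_normalize_strips_surrounding_whitespace
    # Red bar until normalize is defined and does the right thing.
    assert_equal "hello", normalize("  hello  ")
  end
end

# Just enough production code to turn the bar green.
def normalize(text)
  text.strip
end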

TDD and Conventional Testing Work

After working as a tester alongside a TDD developer, I decided to use TDD on a test automation project myself. My task was to program a testing library that other testers would use to make their test automation work easier and less susceptible to product changes.

I started off with a spike: experimental code written to build a proof of concept and then be thrown away. I found that I couldn't just start coding completely cold with a test, as some of my TDD friends do; the spike gave me a feel for the environment and the related libraries I needed to use, along with a basic design and ideas for the first tests. Once I had learned enough from it, I got rid of that code and started afresh.

To start developing my custom library, I wrote a test, confidently naming a method in the yet-to-be-developed production code. I ran the test and got a red bar. The error message told me the method couldn't be found, so I wrote the method and added the necessary include statement so that the automated test harness could find it. The test failed again, but this time it failed on the assertion, not because the method couldn't be found.

I was on a roll. I added more code to my method and presto! When I ran the test, it passed with a green bar. Remembering the "do an opposite assertion" trick I’d learned from my developer friend, I added an assertion that did the opposite. This was a simple method, and it returned a Boolean as a result, so my assertions were "assert this is true" and "assert this is false." Something happened, though: Both passed, when they shouldn’t have. I had a false positive on my hands, which was even more serious than a test failure.

Some investigation showed me a fatal flaw in my new method. It was returning something of the wrong type, but my test harness was interpreting it as a Boolean. I changed my tests so they would catch this problem more easily, changed my method, and the tests passed correctly. I then created some simple test data so my tests would run quickly and not use hard-coded values, and I reran the tests. I found a couple of failures because the test data exposed weaknesses in my code. In short order, I took care of these weaknesses and added some new tests for the trouble spots.
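Here is a hedged reconstruction of that kind of false positive, with a made-up predicate standing in for my library method. Test::Unit's plain assert only checks truthiness, so a method that returns the wrong type can still slip through; comparing the result against true and false explicitly is what catches it.

require 'test/unit'

# Hypothetical predicate standing in for my library method. The flaw:
# it returns the result of =~ (a match position or nil), not a Boolean.
def valid_record?(line)
  line =~ /\A\d+,\w+\z/
end

class TestValidRecord < Test::Unit::TestCase
  # The loose "opposite assertion" pair I started with. assert only checks
  # truthiness, so both pass even though the return type is wrong.
  def test_loose_assertions
    assert valid_record?("42,widget")
    assert !valid_record?("not a record")
  end

  # The tightened assertions that expose the problem: these fail until the
  # method is changed to return a real true or false
  # (for example, !(line =~ /\A\d+,\w+\z/).nil?).
  def test_strict_assertions
    assert_equal true,  valid_record?("42,widget")
    assert_equal false, valid_record?("not a record")
  end
end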

Continuing on this path, I soon had a handful of methods, but it didn't make sense to leave them as a loose collection. They were getting awkward to call, and a couple of groupings had emerged. It made more sense for these methods to belong to objects, so I created two classes, one for each group. I added setup and teardown methods to my automated unit test suite to create the new objects, and then called their methods in my unit tests.
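A rough sketch of what that reorganization might look like, with hypothetical class and method names: setup gives each test fresh objects, and teardown is where anything the tests create gets released.

require 'test/unit'

# Hypothetical grouping: the loose methods moved into two small classes.
class RecordParser
  def parse(line)
    line.split(",")
  end
end

class RecordValidator
  def valid?(fields)
    fields.size == 2
  end
end

class TestRecordTools < Test::Unit::TestCase
  # setup runs before each test, so every test gets fresh objects.
  def setup
    @parser    = RecordParser.new
    @validator = RecordValidator.new
  end

  # teardown runs after each test; nothing to clean up here, but this is
  # where temporary files or fixtures would be released.
  def teardown
    @parser    = nil
    @validator = nil
  end

  def test_parse_splits_fields
    assert_equal ["42", "widget"], @parser.parse("42,widget")
  end

  def test_validator_accepts_two_fields
    assert_equal true, @validator.valid?(["42", "widget"])
  end
end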

Soon all my tests passed again, after a couple of failures revealed some errors. Running the automated unit tests gave me confidence, and I could change code like this fearlessly. I renamed a lot of my methods so other people would find them easier to use, and refactored regularly now that I had a better understanding of the emerging design.

A Lesson Relearned

After a couple of days of developing this library, it was time for a demo. Showing the testers how the library worked using the xUnit framework within my IDE was a snap. They had a lot of questions, most of which could be answered by walking them through a particular test I had written. One question threw me off, though: "This looks fine in your development environment, but how will it work when I use it with my tests?" It dawned on me that I had been so enamored with my automated unit tests and so busy programming that I had forgotten to run any functional tests. Oops.

I wasn't sure what would happen, so I wrote some code that would rely on this library (as the testers would when using it), gathered some real production data that the testers used, and ran a functional test. It failed. My interface into the library didn't work very well. It worked fine for the xUnit framework with my mocked-up data, but didn't stand up to real-world use. My own words came back to haunt me: "Don't rely completely on automated unit tests; be sure to run functional tests in a system as close to production as possible." A design's testability can only be proven by actually testing after development, investigating the results of those tests, and adjusting the tests based on what the results reveal.

I needed to do a bit of redesign for functional testability, so I created a suite of functional tests. They took a lot longer to run, but my automated unit tests provided a safety net so I could make changes with ease. Before long, I had reworked the code so that all the automated unit tests passed, as did the functional tests. The library now had the added bonus of being testable in both situations, and was ready for a production trial. I demonstrated the new functional tests for the testers and added application programming interface (API) documentation to the library so testers could start using it. I felt sheepish that I had become so in love with my "green bar" that I had forgotten I was a tester for a while.
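As a rough illustration (not the actual project code), a functional test in this style drives the library the way a tester would and feeds it real exported data rather than mocked-up fixtures. The library file, class names, and data file here are all hypothetical.

require 'test/unit'
require 'record_tools'   # hypothetical library file (on the load path) with RecordParser and RecordValidator

class TestLibraryFunctional < Test::Unit::TestCase
  # Hypothetical export of real production records, refreshed from a recent build.
  PRODUCTION_SAMPLE = "data/production_records.csv"

  # Slower, data-driven check that exercises the library end to end,
  # the way the testers' own automation would.
  def test_handles_real_production_export
    parser    = RecordParser.new
    validator = RecordValidator.new

    File.foreach(PRODUCTION_SAMPLE) do |line|
      fields = parser.parse(line.chomp)
      assert_equal true, validator.valid?(fields),
                   "Library rejected a real production record: #{line.inspect}"
    end
  end
end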

Testing too much in one context (in my case, the code context) can skew our views. There are many contexts in which we can test, and each gives us a different view of the software and provides useful information. Through this process, I came to realize how easy it is to develop a false sense of security when the focus of testing is too narrow. I had quickly fallen into a trap of narrow thinking about testing, and once again had learned a lesson.

Once the library was ready for production, I paired with a tester who helped me run through some manual functional tests. When we were satisfied with the results, I released the test library for use by the rest of the team. I also released the tests internally—both the TDD-derived automated unit tests and the functional tests. Internal users could then use the tests as product documentation to get up to speed using the library, and use them as tests in the test environment if they needed to change the library or verify a change in the environment.

From TDD to Legacy Code

I then moved on to a new project that involved enhancing a homebrewed test automation framework. There was one catch: It didn't have any unit tests. While I could start developing right away, I was quickly gripped by the fear my developer friends had told me about. I would make one change, and then I'd have to run a lot of unit and functional tests to make sure that I hadn't introduced bugs in other areas of the automation code.

My progress slowed to a crawl: I would make one small change and find that it caused a problem somewhere else in the automation stack. Another programming tester and I kept frustrating each other by making changes that caused bugs in each other's projects. Since we didn't have a suite of automated tests covering the entire test application, we were limited to the ad hoc unit and functional testing we did prior to code check-in. Invariably, each of us would miss a set of tests in an unfamiliar area and cause problems for the other. I realized that this automated test development looked exactly like any other legacy software development project that isn't easily testable.

On the other hand, the unit-tested custom library I had developed earlier was working quite well. Testers found a bug during use, so when the bug report came in, I worked with another tester to fix it. We first added a unit test that reproduced the error in the bug report, and then added the code to make the test pass. We then ran the entire suite of automated unit tests, and with that green bar in place, we were confident in the code. We ran the released functional tests again for good measure (I'd learned my lesson) and checked in the change with confidence. The simple, testable library was much easier to adapt to change than the large, complex test automation stack.
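In Test::Unit terms, that fix-with-a-test-first step might look something like the following sketch; the bug, class, and method are hypothetical stand-ins. The test is written to fail the same way the bug report describes, and only then is the library changed until the whole suite is green again.

require 'test/unit'

# Hypothetical regression test, added before the fix so it fails the way the
# bug report describes. RecordParser stands in for the real library class.
class TestRecordParserRegression < Test::Unit::TestCase
  def setup
    @parser = RecordParser.new
  end

  # Bug report: fields with surrounding spaces were not being cleaned up.
  def test_fields_are_stripped_of_whitespace
    assert_equal ["42", "widget"], @parser.parse("42, widget")
  end
end

# The fix, made only after watching the new test fail. In the real project
# this change would live in the library file, not alongside the test.
class RecordParser
  def parse(line)
    line.split(",").map { |field| field.strip }
  end
end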

This experience reinforced a testing paradox I had discovered when I first started doing test automation work: "Who tests the tests?" or "Who watches the watchmen?" Test automation is software development, and as such is subject to the same problems as any other kind of programming activity. It benefits from testing as well, but there’s a huge dependency that causes problems—the application we’re testing. Now we have two applications to worry about: the production software and the test software. Production software can stand on its own, but the test software requires the production software in order to function at all.

The test software gives us confidence in the production software, but what happens when the test software itself becomes complex? My first inclination was to write tests for our test automation stack, but I quickly found that those tests could get complex as well, which led me into a situation that felt recursive, like those Russian stacking dolls (matryoshka): open one doll and a slightly smaller doll is inside, with another doll inside that one, and so on. I could imagine tests to test the tests to test the tests. Where do we draw the line on testing tests?

As another developer pointed out to me, there's a symbiosis between the tests and the code. Jerry Weinberg has described a symmetry between test code and production code: each tests the other. One developer friend told me that if our test automation code is so complex that it needs tests of its own, we should take a serious look at our automation stack. This is a difficult problem, and I have often jumped into test automation without thinking about the design and the potential consequences of those early design decisions.

Lessons Learned

I learned a lot from using TDD on a test automation project. I learned that unit tests might be a good idea for a testing library that automated tests depend on: it's another software service we rely on, and it's subject to change as the requirements of the production software change. In fact, the automated unit tests for my custom test library revealed a specification error in the production software when I got test data from a new build. When the specification was changed at the customer's request, I simply added a new unit test and updated my test data to reflect it. The automation code that relied on this library didn't notice a thing, so I had written a good interface. TDD helped me develop a good, testable interface that was simple to maintain. Testers also used my automated unit tests as another form of documentation for the API; they would frequently skip my wordy prose (the published API documentation) and just look at the tests in the unit test folder as examples of how to use the interface.

I also learned that trying to implement TDD within legacy code is a daunting task. Working on a new project done with TDD from the beginning was much easier than dealing with a legacy project. I learned firsthand about the fear TDD developers express when they have to change code without automated unit tests as a safety net. I had a new appreciation for those who try to introduce TDD into established codebases.

This process reinforced lessons from my pair-testing sessions with a developer. I was reminded not to get too caught up in thoughtlessly generating tests, but to keep the design at the forefront of my mind. I was also reminded that it's easy to lean too heavily on the automated unit tests and develop a false sense of security; using other test techniques with real data in a real system was important, and it complemented the automated unit tests I was relying on. I now had a sense of what TDD developers go through and how they think when developing software. My firsthand TDD experience, while very simple and basic, was a great learning experience. I had a better understanding of the struggles and pitfalls that programmers face, and of how to develop software in a different way. I could relate to TDD developers more easily after trying it on my own, and I had a wealth of new testing ideas to apply in other areas of my work.

Armed with understanding and fresh ideas about testing, I was ready to reflect on my experience. I was feeling confident about the practice, but I needed to think of what could go wrong to balance my enthusiasm. Continue to part 3 of this series to read my thoughts and conclusions about TDD and testing.
