Driving Your Development with Acceptance Tests: An Interview with Markus Gärtner
One of Europe's premier software consultants, Markus Gärtner is also an instructor at the Miagi-Do School of Software Testing. He will give his first international keynote address at Agile-Testing Days 2012 in Berlin, is just finishing up the tour for his first book, and now we have him here on InformIT. Markus is a software tester, developer, and trainer for it-agile GmbH, and an advocate for acceptance test-driven development (ATDD), a development method in which customers specify their requirements by example, so precisely that developers can write tests before creating the production code.
With ATDD, when the code is complete, the programmers can demonstrate it's complete by running the tests. They also know "the code is done, because the automated tests run."
Markus' recommendations are a little more nuanced—he also recommends exploratory testing, for example—as he explains in detail in his book ATDD by Example: A Practical Guide to Acceptance Test-Driven Development, which shows what ATDD is really like, with specific code examples.
I asked Markus to tell us what it was like to transition to ATDD, share any hints that could help us avoid bumps in the road, describe what ATDD looks like in practice, and explain what he saw coming as the next great advancement in software quality.
Matt Heusser: When we talk about things like automated tests, I get a little worried. What is our goal here? Is it to eliminate manual tests? What does ATDD do for us?
Markus Gärtner: Like any tool, automation should support our activities as much as possible. There are certain activities where automation doesn't help at all, and there are other activities where automation brings us great relief. Since reading Harry Collins' book Tacit and Explicit Knowledge, I like to compare test automation to the process of baking bread. Certain actions all lead to the same results when we execute them; Collins calls these mimeomorphic actions. When baking bread, kneading the dough is a mimeomorphic action. You can automate such actions, but you must not forget the decisions surrounding those actions that are not repeatable and will need an engaged, thinking human. When it comes to bread-baking automation, such actions might include coming up with a recipe for the bread, selecting the ingredients in the supermarket, deciding how thin to slice the bread once it's been baked, and choosing whether to cover a slice with peanut butter or strawberry jam. Collins calls these polimorphic actions, that is, actions that can lead to varying results when repeated.
Thinking about that with our eyes focused on test automation, software development teams have to make the right tradeoffs: which particular types of tests to automate, and which to leave as tests that an engaged human brain executes. Testing quadrants (see Figure 1) are a tool to help you come up with the right balance between the different types of tests. Michael Bolton's distinction between testing and checking goes in this direction as well. But the bigger lesson you can learn from the testing quadrants is actually that you need to involve testers, programmers, and business experts when coming up with a balanced strategy for your efforts in the course of the iteration.
Regarding ATDD, another goal is involved. One of the main goals for ATDD is to "trawl for requirements," as Mike Cohn puts it. This approach, when applied well, helps the team to come up with tested requirements, and it helps us to find the right things to build. We'll still need to explore these functions with an engaged human brain at the point when these testable requirements indicate that we might be ready to ship the product.
Matt: Tell me more about how ATDD works in practice. Could you explain what happens—the life of a story, maybe? Perhaps you could use an example from a real client company?
Markus: ATDD doesn't need to start with a story right away. More often than not, it starts even before the story—sometimes it starts with a poor story, and sometimes there is already a quite okay story to get started. While working on the acceptance criteria for an existing story, the team often identifies another story derived from the current story, or they find out that the story needs to be cut, or it isn't ready for the iteration. I remember endless client workshops where we entered with a product backlog, only to find out that we didn't have a clue about what to build at all, and afterwards we ended up with a completely new backlog.
At least three different roles in building the thing must get together to reach a common understanding of the story. The three roles include a minimum of one tester, one programmer, and one business expert (like the product owner in Scrum). They discuss the story and derive acceptance criteria for it based on their discussion. Ron Jeffries' three Cs for user stories refer to card, conversation, and confirmation. The confirmation part is the interesting thing here, and that's derived from the conversation.
When the "three amigos," as George Dinwiddie calls them, arrive at a basic set of acceptance criteria, before the story enters an iteration, testers will work on refining these examples. Usually there will be too few examples for the team to be able to start working on it. With their knowledge of boundaries and edge cases and their critical thinking skills, testers help to flesh out the story to the degree that the team needs. This still happens before work starts on the story. Sometimes this goes on in parallel with the iteration itself, but then it might prove a risk. One client I knew had the problem that programmers and testers weren't talking enough. The programmers were pretty surprised when the examples had grown, and they hadn't yet thought about many edge cases. At first, this can be very unexpected. Over time, I expect teams to get rid of these sorts of surprises, and start on the common understanding even earlier.
In the iteration, programmers start to program the story and extend the system with the new functionality. Testers and programmers also work on automating the examples as automated checks. There they'll stay as close as possible to the original discussion and the acceptance criteria they defined. They'll work together when necessary, but that does not mean that testers need to program. I once worked with a client that had educated nurses as their testers. It was pretty clear that none of the nurses would ever learn how to program. But the team did have some testers who were inclined toward programming, and they were eager to learn how to automate the examples, with help from the programming team. In the end, they became the automation toolsmiths. This was a good way to solve that problem as a whole development team.
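As a minimal sketch of what "automating the examples as automated checks" can look like, the following Python keeps the workshop examples as plain data rows and runs each one against the production rule. It is not taken from the interview or the book: the shipping-fee rule, the threshold values, and the names `Example`, `shipping_fee`, and `run_examples` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Example:
    """One row agreed on in a specification workshop (hypothetical data)."""
    description: str
    order_total: Decimal
    expected_fee: Decimal


def shipping_fee(order_total: Decimal) -> Decimal:
    """Hypothetical production rule: orders of 50.00 or more ship free."""
    if order_total >= Decimal("50.00"):
        return Decimal("0.00")
    return Decimal("4.95")


# The examples stay close to how the team phrased them, including the
# boundary cases a tester would typically add during refinement.
EXAMPLES = [
    Example("small order pays the flat fee", Decimal("10.00"), Decimal("4.95")),
    Example("just below the threshold", Decimal("49.99"), Decimal("4.95")),
    Example("exactly at the threshold", Decimal("50.00"), Decimal("0.00")),
    Example("large order ships free", Decimal("120.00"), Decimal("0.00")),
]


def run_examples(examples):
    """Return the descriptions of all examples whose check fails."""
    return [
        ex.description
        for ex in examples
        if shipping_fee(ex.order_total) != ex.expected_fee
    ]


if __name__ == "__main__":
    failures = run_examples(EXAMPLES)
    print("all examples pass" if not failures else f"failing: {failures}")
```

Keeping the examples as data, separate from the code that executes them, is one way to let business experts read and extend the rows without touching the automation itself.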
Finally, the story makes it into the final product. The team decides which of the automated examples to keep, and they might show the results of the automated tests to their business representative. One client of mine had invited customers from another country, and I presented our examples to them in a one-hour meeting. Afterward I had the impression that the customer actually wanted to spend money to buy our suite of automated tests. I found out later that they had troubles similar to ours with understanding the system, and they didn't have a suite of automated tests that could help them to understand the dependencies for the various decisions involved in building the next features.
Matt: It sounds great in theory, and I have a long-term client that has seen a great deal of success with ATDD. Yet, for every success, there seem to be several failures. Why do you think that is—and what can we do about it?
Markus: Back in 2010, I was taking a closer look at the books on this topic. Even with the focus on outside-in development, ATDD wasn't very new at that time; I traced back books on it to at least 2005. One thing that puzzled me, though, was the number of books that tried to explain the approach by focusing on its principles. Through Alistair Cockburn, I had a concept of the Shu Ha Ri model for learning. On the Shu level, someone who is new to a particular skill doesn't learn by applying principles. Normally that's a Ha-level understanding, reached through years of practice. Instead, a Shu-level learner needs clear advice and a working set of things to do to get started. None of the books I explored described these things clearly—or I found the discussions to be too outdated.
That's when I decided to close the gap. I had read Kent Beck's Test-Driven Development by Example, which consists of three parts: Parts 1 and 2 show a working example of how to apply the approach, and Part 3 describes some of the principles behind it. I found this structure compelling, so I asked whether Kent thought writing about it was a good idea, and whether he would support me. A year and a half later, I submitted the final revision following review comments to Addison-Wesley, applying the structure to ATDD.
As it stands, ATDD by Example: A Practical Guide to Acceptance Test-Driven Development gives two different ways of working with ATDD. One approach is side-by-side development, with testers and programmers working in parallel; the other approach describes an outside-in development technique that derives the application domain model from the acceptance criteria. With these two sketches, Shu-level learners can get started in their own companies, over time reaching a set of principles that work for them. I also provide some guidance to trade off different factors in the third part of the book. The whole mix helps to avoid the failures you mentioned.
Matt: What do you think are some of the common failure modes in ATDD? Is it in the initial adoption? Do companies get things wrong? Or is it more long-term issues? I know many companies in which the test suite started out great, but quickly became slow and clunky.
Markus: From what I have seen at clients' offices, I could conclude that neither testers nor programmers pay much attention to the nature of test automation. In the end, test automation is software development. To me, that means rigorously applying techniques such as SOLID design principles, TDD, and refactoring to my test automation code. I strive toward taking more care with my automation code than with the production code; I want to keep that steady loop of quick feedback as vital as possible. Also, I become pretty annoyed when it takes longer to express what I want to test than it would take to write down the code. This is usually a sign of obfuscated design, a problem with my automation code, or both.
When the client's problem is failed test automation, I often find long test methods. Writing and then copying-and-pasting long test methods are the worst things I've seen, and they're usually sort of a starting point for test automation—on whatever level—to fail. Next is coming up with test helpers, but (despite their weird conditional logic) not unit-testing the helpers on their own. Ultimately, we want to reuse them, but reusing a component comes at a price—and that price is transferring the knowledge of that component to the next person. I once worked on a project where we learned this fact the hard way over the course of roughly two years. The solution was to start from scratch, but most projects can't do that. We were lucky enough to be able to do so, and within 18 weeks we replaced a failed test automation approach that had "grown" for about a year. Of course, we were able to put a year of learning into it as well. Two years later, the components we grew and unit-tested were still in widespread use in that company.
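To illustrate the point about unit-testing test helpers on their own, here is a small sketch in Python. The helper `parse_amount` and the cell formats it accepts are hypothetical, not something described in the interview; the idea is only that a helper with conditional logic gets its own focused tests, so the next person to reuse it can see what it does.

```python
import unittest
from decimal import Decimal


def parse_amount(cell: str) -> Decimal:
    """Hypothetical test helper: turn a cell from an example table into a Decimal.

    The branches below are the kind of conditional logic that deserves
    unit tests of its own, independent of any acceptance test using it.
    """
    text = cell.strip().lower()
    if text in ("", "free"):
        return Decimal("0.00")
    if text.endswith("eur"):
        text = text[:-3].strip()
    # Accept a decimal comma as well as a decimal point.
    return Decimal(text.replace(",", "."))


class ParseAmountTest(unittest.TestCase):
    def test_blank_and_free_mean_zero(self):
        self.assertEqual(parse_amount(""), Decimal("0.00"))
        self.assertEqual(parse_amount(" Free "), Decimal("0.00"))

    def test_currency_suffix_is_ignored(self):
        self.assertEqual(parse_amount("4.95 EUR"), Decimal("4.95"))

    def test_decimal_comma_is_accepted(self):
        self.assertEqual(parse_amount("4,95"), Decimal("4.95"))


if __name__ == "__main__":
    unittest.main()
```

The tests double as documentation of the helper, which lowers the cost of transferring knowledge about it to whoever reuses it next.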
For me, it boils down to keeping an eye on what you do, reflecting on your worst problems, and fixing them. If you don't have such an empirical process in place, you'll probably run down that rabbit hole too fast, unable to back up. In places where test automation fails, I suspect that something else isn't working right—maybe retrospectives not coming up with proper actions to cope with the biggest problem, a disassembled team, or management incentives that are counter-productive. I try to keep an eye on those things as well.
Matt: These acceptance tests—do you recommend that they drive the entire application, including the graphical user interface ("end-to-end" tests), or should they get behind the GUI to test the business logic? Or is it some of both? How do companies decide?
Markus: In my experience, a single answer won't address all the problems. In software development, this shouldn't be too surprising, since we face tradeoff decisions in our daily work. Tradeoffs might involve binding a unit test too tightly to your code, so that you can't change the code without changing the unit test. Or tying the code to a particular framework that will help you come up with the data bindings for your database, but will become a nightmare to test automatically. And then there's the "vi versus emacs" discussion, but don't get me started on that topic.
I recommend taking into account what your codebase currently does. How hard is it to automate checks for your domain logic? If there's no direct entry point, you might have to automate tests on the GUI level only. I once worked with a client that had three different tiers of servers. The middle tier wasn't well known, but there were several interfaces between the GUI and the middle-tier server. While refactoring the GUI to fit a model-view-presenter pattern, we were able to discover and rebuild the middle-tier API, thereby enabling more tests to go behind the GUI.
Usually I prefer to run as many tests as possible behind the GUI. GUIs come with unpleasant behavior such as asynchronous thread-handling, where automation has to wait, and it becomes slow and brittle pretty quickly. Whenever possible, I strive to automate behind the GUI level; once the business logic is automated, I can decide how to close the gap between the GUI and the business logic. That could mean running fewer tests manually, or automating fewer slow tests on the GUI level, but not investing too much money and time into it. In the end, we have to maintain all that stuff that we automated, right?
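A rough sketch of automating behind the GUI, assuming a model-view-presenter split like the one mentioned above: the check talks to the presenter and substitutes a fake view for the real window. The class names, the `show_discount` method, and the discount rule are all hypothetical and serve only as an illustration in Python.

```python
class DiscountPresenter:
    """Hypothetical presenter: holds the business rule, knows no GUI toolkit."""

    def __init__(self, view):
        self.view = view

    def quantity_entered(self, quantity: int) -> None:
        # Hypothetical rule: 10 percent off for ten items or more.
        percent = 10 if quantity >= 10 else 0
        self.view.show_discount(percent)


class FakeView:
    """Stands in for the real window and records what would be displayed."""

    def __init__(self):
        self.shown = []

    def show_discount(self, percent: int) -> None:
        self.shown.append(percent)


def discount_shown_for(quantity: int) -> int:
    """Drive the presenter directly, with no window, thread, or waiting."""
    view = FakeView()
    DiscountPresenter(view).quantity_entered(quantity)
    return view.shown[-1]


if __name__ == "__main__":
    for quantity in (9, 10):
        print(quantity, "items:", discount_shown_for(quantity), "percent off")
```

Because nothing here waits on a rendered widget, such checks tend to stay fast and stable; a handful of slower GUI-level tests or manual sessions can then cover the remaining gap between the view and the logic.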
Matt: Part of the theme I'm picking up here is that testers and product owners can learn some programming aspects. Maybe they learn a very powerful domain-specific language; maybe they learn a scripting language. Do you find many non-technical people willing to learn these tools? Do you encounter much resistance? If so, what do you do about it? You told us earlier about a group of educated nurses; can you give us a few more examples of how companies deal with this problem?
Markus: Well, the fact that I have seen business experts use something like Cucumber, and nurses learn some table structure, doesn't mean that they were capable of doing good automation. One crucial ingredient to me is the ability to program and find abstractions. More often than not, this means putting programmers, testers, and business experts in touch with each other. I think this is a magic ingredient, somehow, and maybe we should focus more on that. Behavior-driven development (BDD) tries to do that, but neglects the thought that you cannot automate every test.
Regarding resistance, I think it doesn't pay off to have customers write examples alone, but many companies try to do that. "Oh, we just have the business expert sit alone in the corner, write some examples in a tool she doesn't really know," and then they find out three months later that their approach isn't working. The crucial ingredient to a working approach—whatever name you pick—is to have programmers, testers, and business experts work together on the examples. I provide an example for that in Chapter 1 of my book, during the workshop, where a lot of stuff happens between the lines. While at clients' offices, I try to grasp the emotions of the situation and help people see things from different angles, and not run too far into technical details where that's unnecessary. Most of the resistance comes from boring collaboration where people don't listen to each other, or they don't respect the different focuses and try to understand the other parties at the table.
Even then, there are surprises. I saw a team adopt ATDD within one day. I entered the team room to give a workshop on ATDD; by the end of the day, this team had pulled a story from their backlog, come up with examples for it, and tried out three different automation tools for it. One smaller group even managed not only to automate the examples, but to implement the whole story, all within one day. This team was pretty far ahead when it comes to technical excellence. That's why they were able to work in such an advanced manner. At other locations I have seen test automation run by business experts alone. We helped with getting automation started there, and half a year later they had tripled the number of examples without any further help. I also have helped a company get started with ATDD and test automation at the behind-the-GUI level. The approach was introduced very slowly. It finally took off once this particular team decided to refactor the middle layer of their system and could see the benefits of a safety net of automated checks for their system in that period of time.
These are just three examples. I think there are countless other ways out there to discover.
Matt: If companies want to adopt a more examples-driven way of working, where would you suggest they start?
Markus: My short suggestion would be to start wherever they are currently. I actually find it hard not to start with wherever companies are. Things are the way they are because they got that way, as Jerry Weinberg taught me. That said, I don't see value in completely changing everything in a company. Instead I try to see the context of the team. Who do they need to work with? What are the particular skills in the team? What would a "plan zero" for this particular group look like? Usually that means that I fit the elements of workshops, backlog grooming, examples, and test automation onto the team, and they make their own experiences. Over time, they should be able to reflect on their approach and tune it to fit their particular needs.
Of course, this approach fails in companies that don't develop the necessary self-reflection skills or that put obstacles in the way of team learning and responsibility. But I think this goes for any new approach in software development—and that's probably why Agile approaches embrace retrospectives on a short timescale such as every two to four weeks. As Elisabeth Hendrickson taught me, "Empirical evidence trumps speculation." Iterations deliver the empirical evidence that teams need to make adaptations ("pivots," to take a more recent buzzword from the Lean Startup folks) and make the hard decisions after collecting some experience. Every successful team I've seen experiments and tries to learn from it. Where people stopped doing that, I wouldn't give too much hope to getting started with anything new.
But I digress from your original question. Start wherever you are, learn what works for you, and then make adaptations as needed. If you want to use a basic set of practices to get started with a more examples-driven way of working, start with specification workshops, and invest some of your slack to come up with an automation approach that works for you. Then start to collect experience with your picks. That's it. No magic necessary.
Matt: Thank you for participating. Where can we go for more?
Markus: Shameless plug: People might want to visit my blog for news and additions on ATDD by Example: A Practical Guide to Acceptance Test-Driven Development. I am also working on some GitHub repositories that can help people take a closer look into more code to come up with a tool decision more easily. I've started to run Testautomation Coderetreats in Germany. And, of course, there are lots of articles to read, workshops to attend, and conferences where I'll be speaking.
Harry Collins, Tacit and Explicit Knowledge. University of Chicago Press, 2010.
Michael Bolton, "Acceptance Tests: Let's Change the Title, Too," 2010.
Mike Cohn, User Stories Applied: For Agile Software Development. Addison-Wesley, 2004.
Ron Jeffries, Extreme Programming Adventures in C#. Microsoft Press, 2004.
George Dinwiddie, "The Three Amigos: All for One and One for All." StickyMinds.com, Nov. 29, 2011.
Kent Beck, Test-Driven Development by Example. Addison-Wesley, 2002.