InformIT

Exploratory Testing on Agile Teams

Date: Nov 18, 2005


Jonathan Kohl relates an intriguing experience with a slippery bug that convinced his team of the value of exploratory testing: simultaneous test design, execution, and learning.

An Elusive Bug

It was near the end of the day, and I was working as a tester on an agile team building a high-profile web application. Two teams using Extreme Programming and Scrum were working on separate but interdependent web applications. While I was the lead tester on one of the teams, I was also providing testing guidance to the other team, and it was common for people from that team—particularly the developers—to come and ask me questions. Our team's day had been productive, and I was hoping to finish the afternoon with some quiet time at my desk to catch up on phone messages, email, and maybe some more test automation work before I went home.

Just as I settled in, one of the developers from the other agile team walked up to my desk. I could tell immediately that something was wrong: Jason was an upbeat, talented developer with a lot of experience in Extreme Programming and test-driven development, and a wonderfully positive energy. As he approached, his shoulders were slumped, he was nervously stroking his chin, and he looked worried. He practically whispered to me, "Jonathan, I need your help." I dragged a chair to my workstation so we could pair up to solve the problem.

Jason explained that his team had a sporadic bug that had become a high-profile problem. Because the developers couldn't repeat it, they didn't think they could afford to spend time on it, and it was put on the back burner. Somewhere along the line, the bug had slipped through the cracks—until it cropped up in a demo of the software to a senior manager, putting the team's credibility at stake.

It turned out that I had originally found the bug while developing an automated test to generate transactions on our system, which was dependent on the system the other team was working on. The area of the system where the bug was occurring wasn't part of our application, but our application relied on this service, so I had filed a bug report and passed it on to their team. I hadn't heard anything about it in a while, and thought they were on top of it. Jason asked what I thought he should do. I told him that the bug wasn't sporadic for me, and I pulled up the automated test I had developed, ran the script, and demonstrated the failure. He broke out into a grin, thanked me, and rushed back to his computer with the stack trace.

"That was easy," I thought with relief.

Two minutes later, Jason came back to my desk, stack trace in hand. "I can't repeat it," he said. I started the test, and we watched it play back on my monitor. When it hit the problem area in the application, we got the same error with a stack trace. He ran back to his computer, and pulled in a pair partner to help. They tried again, several times, but without success. Since we were together in an open work environment, I pulled up a chair and peered over their shoulders as they struggled to reproduce the bug, realizing that it wasn't as straightforward as it appeared.

Back at my workstation, I ran my automated test again and the application crashed. Then I tried it manually several times, and realized I couldn't repeat it reliably. What had changed since I had written the test? The automated test used fixed test data and ran through the workflow step by step, doing the same thing every time. The error occurred in the fifth step of an involved workflow that takes a tester several minutes to reach in the application, so there were several potential distractions and points of variance. I realized that I couldn't repeat it manually anymore. To simplify, I created a new test by modifying the automated test so that it ran until it reached the area of the program where I was having problems, at which point I could take over by hand.
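
Such a partially automated test can be sketched roughly as follows. Everything here is a hypothetical placeholder (the helper functions, the workflow steps, and the data), not the actual script from the project; the point is simply that the script drives the first few steps automatically and then hands control to the tester at the screen under investigation.

```python
# A minimal sketch of a partially automated test fixture: automation drives
# the workflow up to the step where the sporadic failure appears, then stops
# so the tester can explore that screen by hand. All names and data below
# are hypothetical placeholders, not the real application or harness.

def start_session():
    # Placeholder: log in and open the application under test.
    print("Session started")

def complete_step(step_number, data):
    # Placeholder: fill in and submit one step of the workflow.
    print(f"Step {step_number} completed with {data}")

# Test data copied verbatim from the older automated test (illustrative values).
SCRIPTED_DATA = {
    1: {"account": "TEST-001"},
    2: {"amount": "250.00"},
    3: {"category": "transfer"},
    4: {"reference": "AUTOGEN-42"},
}

def run_fixture():
    start_session()
    for step, data in sorted(SCRIPTED_DATA.items()):
        complete_step(step, data)
    # Hand control to the human tester at the fifth step.
    input("Automation stopped before step 5 -- explore manually, then press Enter.")

if __name__ == "__main__":
    run_fixture()
```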

Jason came back and asked how things were going. I showed him the different results achieved when running the fully automated versus manual tests. He asked for the automated test that caused the problem, and went back and ran it with his pair partner. A few minutes later, he shouted over, "Got it!" They could also reproduce it every time with the automated test: "We're going to put some tracing on this to see where it's causing the problem while you work on getting a repeatable case manually." He hoped that even without a manual test case, they could use the automated test to find the source of the problem.

At this point, I was frustrated. Within minutes, my testing efforts had moved from a seemingly repeatable bug to a sporadic bug. It was time to challenge a few assumptions. The problem occurred after the fifth step in the workflow, which contained a screen that took several pieces of test data as inputs. Using my partially automated test, I started designing and implementing tests with different data inputs. I found that if I changed one particular input to a different type of data in the automated test, I stopped getting the failure. Having an inkling of what might be causing this problem, I returned to Jason, who was now working alone. He was running the test, looking at tracing information and growing more and more frustrated. "Something's wrong. I'm getting an error in an area of the application that isn't related to our code at all." I shared my hunch on the test data, so he tried it out using the test data I described, but didn't get a failure.

I reran my tests manually, and still couldn't repeat the failure. The automated test still managed to uncover the problem. I went back to the drawing board, and analyzed where we were with regard to risk, the time we had to work, and test ideas I had already generated. I started thinking about test ideas I had missed and factors that might be affecting the outcome. The biggest difference between the automated test and the manual tests I was running without success: me. I must have been doing something differently.

I realized that I might be varying my test data as I ran through the workflow. I decided to run the test completely manually, but to use the test data from the automated test, copying and pasting it directly from the test file. This strategy reproduced the bug consistently. As I looked at the test data in the script, I remembered something important: Two fields on the fifth page required data that was generated for us by another group. That generated test data was evaluated by a system with strict parameters, limiting what could be entered on that page. Back when I wrote the automated test, only one type of test data was working; there were problems with the other kinds. Eventually, as each of those test data items was fixed, everyone on the team started to vary the data used in tests, and I had developed a habit of using a particular data set for manual testing that was different from the one in the older automated test.

Another look at the test data in the automated test revealed a second habit: The automated test entered data in an optional field, which I sometimes left blank in my manual testing. If I entered data in the optional field while using one particular data type in the required field, the error occurred, regardless of whether I varied the rest of the data on the page. I had stopped being able to repeat the bug because I was rushing through the workflow and forgetting something important about my tests when I got to the screen that had revealed the problem. My partially automated script helped keep me focused on the area that needed my attention and interaction. It also helped me test more efficiently; I could now repeat the error very quickly using both automated and manual tests.
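
Once the two suspect factors came into focus (which generated data set fills the required field, and whether the optional field is blank), a small harness can enumerate the combinations so none are forgotten mid-session. The sketch below assumes a caller-supplied submit function and uses illustrative values; it is not the project's actual code.

```python
from itertools import product

# Hypothetical factors behind the failure on the fifth screen; the values
# are illustrative stand-ins for the real generated test data.
REQUIRED_FIELD_DATA = ["generated-set-A", "generated-set-B"]
OPTIONAL_FIELD_DATA = ["", "some optional text"]

def explore_combinations(submit):
    # Run every combination through a caller-supplied submit(required, optional)
    # function and report which ones fail, so the tester doesn't have to
    # remember which variations have already been tried.
    for required, optional in product(REQUIRED_FIELD_DATA, OPTIONAL_FIELD_DATA):
        try:
            submit(required, optional)
            outcome = "passed"
        except Exception as error:  # the failure we were chasing
            outcome = f"FAILED: {error}"
        print(f"required={required!r:<20} optional={optional!r:<22} -> {outcome}")
```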

I went back to Jason and explained what was happening. "I'm using the same test data you're using and that code is unit-tested like you wouldn't believe," he said. "I can't figure out why your automated test is causing the data to run into an area of the program it isn't supposed to run into." I told him I could now repeat the error manually on command, so we paired at my computer. When I copied and pasted in the data I thought was problematic, he stared at it, and asked to see it again. He asked me to come back to his computer, and compared the test data from my automated test to the unit tests he was running. He copied each one and pasted them in a text editor. They were different. Only slightly different, but different. He ran an automated unit test with the test data I provided, and it failed. We had found our culprit. Now we worked on piecing together the puzzle, and as we talked about the failure, new information came to light.

It turned out that the administrators of a central system we all depended on had created two kinds of test data for one data category. One was the generic set that most people in the company used when testing the system. The other set was part of another project that had not been publicized, for security reasons. This new set of data had been released only to Jason's team, and they had written their unit tests using it. Because both data sets were referred to by the same name, we assumed that we were using the same data. The automated tests on the project I was working on hadn't been provided this new test data yet. Each of our automated test suites was using different test data, and each of us had missed a crucial set of test cases because of the communication error. If this sporadic bug had made it out into the field, it would have been catastrophic. It would not have been sporadic in production; a large percentage of transactions in the system would have failed. The source of our problem was a combination of communication problems and unexamined assumptions. Our testing that afternoon helped reveal not only the bug, but also these other problems within our teams.

Analysis

What we were doing in this story is called exploratory testing—simultaneous test design, execution, and learning. It's the opposite of scripted testing (pre-scripted manual tests or automated tests). I combined purely manual exploratory testing with interactive automated testing to narrow down the cause of the problem. (I call using the computer as a tool in testing "computer-assisted testing," a term I picked up from Cem Kaner.) I would run part of the test in an automated fashion, watching the script run on my screen, and then take over by hand to do exploratory testing at the point in the application where I wanted to focus. Gerard Meszaros describes this type of partial automation as creating a "test fixture" for exploratory testing.

Jason, the developer in my story, was also doing exploratory testing. When he was running automated unit tests, changing the inputs based on results and analyzing the code, he was doing automated testing interactively. This interactive nature of testing is a key component of exploratory testing. When tests are prerecorded as procedural test scripts, automated unit tests, or automated acceptance tests, they run without an interactive element. Most automated tests run without any human interaction. Many pre-scripted manual tests don't have much interaction, either. An important part of exploratory testing is that a tester is able to change behavior based on new information provided by the program. The tester can redesign test cases and execute them immediately based on new information. Pre-scripted test cases often discourage this behavior, and machines running automated tests aren't intelligent, so they can't change and adapt without being programmed by a human.

Exploratory Testing on Agile Projects Can Be a Good Fit

Disciplined exploratory testing is an effective way to gather and provide feedback to the team. Test-driven development has popularized the notion of writing all tests first on agile projects, and because exploratory testing is often done after the fact on software features or products, agilists often question its value. The test-driven approach can work well for software development, but front-loading all test development can narrow the classes of tests that get run against the software. Writing all tests first can also force testing efforts to be predictive, rather than adaptive.

An application is greater than the sum of its parts. For example, a seemingly insignificant event at the user interface layer may cause a catastrophic error at the code level. User interfaces are notoriously hard to test, and exploratory testers have a lot of experience dealing with them. A good complement to code-level testing is exploratory testing at the user interface (or any other testable interface).

User actions on deployed software are made in a social environment by an unpredictable, cognitive human. These variable actions are motivated by intuition and driven by tacit knowledge, and are difficult for others to repeat. Sometimes my job as an exploratory tester is to track down the idiosyncrasies of a particular user who has uncovered something the rest of us can't repeat. Often, a kind of chaos-theory effect happens at the user interface, and this user has the right recipe to cause a unique failure. Repeating the failure accurately requires not only the right version of the source code and a test system deployed the same way, but also knowledge of what a certain user does at a particular time. These actions are almost impossible to predict, and just as difficult or impossible to automate up front.

Exploratory Testing Is Highly Adaptive

Exploratory testing allows for learning by adapting test designs as new information is discovered. Jason and I adapted our tests based on the application information we were getting, and this approach helped us narrow down the cause of a sporadic bug quickly. If we had pre-scripted our tests, we wouldn't have found it within the hour we had. Nor would our test design have predicted the human element: our own assumptions and those of the various teams. This was something we observed, and we adapted our testing to confirm what we thought was occurring.

James Bach has said that scripted testing emphasizes things like predictability and decidability, while exploratory testing relies on adaptability, credibility, and learning. We solve testing problems over the course of a project by learning, gathering more information, and utilizing the skills of diverse team members. We don't try to solve all the testing problems up front. We try to minimize documentation (this doesn't necessarily mean no documentation), and maximize the number of tests we can create and execute. As Martin Fowler points out, agile projects tend to be adaptive rather than predictive. Due to the interactivity of exploratory testing and a reliance on rich feedback, skilled exploratory testing works well with an adaptive development model. Conversely, designing all tests "test-first" can influence all test design efforts to be predictive, potentially cutting off interactive, adaptive test design and execution.

Learning is an important aspect of exploratory testing. Agile teams also value learning. As Kent Beck writes in Extreme Programming Explained, "What is it that we want to get out of code? The most important thing is learning. The way I learn is to have a thought, then test it out to see if it is a good thought. Code is the best way I know how to do this." A good exploratory tester feels similarly about testing. The best way to test out a thought is to use the software. We design tests, run them, and observe behavior in an application, often simultaneously. We inquire, make discoveries, and rerun the tests to confirm or contradict our ideas. We constantly challenge our testing ideas and learn about the software we're testing, and about our own mental models we're using to develop tests. Tests developed and executed in this way help provide all kinds of information about a product—not just bug reports.

Exploratory testing is also a practice of rapid software testing, which is taught by James Bach and Michael Bolton. Rapid software testing focuses on testing effectively and providing feedback when the tester has barely sufficient information and is under extreme time pressure. This strategy is useful for agile projects, where things can change quickly, especially when iterations may be only two weeks long. The tests we develop and automate can only tell us what the code is doing according to the tests we thought of at the time. Dealing with change can be difficult; our tests might be confirming the wrong behavior. When asked on the Agile Testing mailing list why agile teams should do exploratory testing, Ward Cunningham answered: "Because an agile development project can accept new and unanticipated functionality so fast, it is impossible to reason out the consequences of every decision ahead of time. In other words, agile programs are more subject to unintended consequences of choices simply because choices happen so much faster. This is where exploratory testing saves the day. Because the program always runs, it is always ready to be explored."

When to Apply Exploratory Testing

Recently, exploratory testing has gained more exposure in the agile world. Some proponents have focused on using it as an end-of-iteration ritual in which the whole team and the customer are involved. That is a good idea, but I've used exploratory testing much more broadly than that on agile projects: throughout development, from the first moment I have something to test until we deliver the software.

The feedback I've had from other team members is that they love good exploratory testing when it's done by a skilled tester. Skilled test-driven developers have literally told me, "We love your exploratory testing." Exploratory testing gives them rapid feedback, generates test ideas, and supplies more confidence in what the developers are delivering and the customer is receiving. Thinking about testing from an exploratory point of view also helps provide feedback using diverse testing techniques. Sometimes the off-the-wall testing ideas find that tricky bug or expose potentially costly hidden assumptions. Most importantly, the interactive nature of testing in this way makes use of the most powerful tool we have at our disposal: the human mind.

Developers frequently ask me to do exploratory testing during development. They might be partway through a story and want some feedback on usability, or on whether the code they've developed matches the intent of the story. They'll ask again after they've developed a story: "Make it break. We want to find out where we have missed tests." At other times, I'll install a build taken from a continuous integration environment into something close to production and run integration or system tests. These exploratory testing sessions frequently help us find flaws that the unit tests couldn't catch. One particularly useful area has been tracking down sporadic bugs, as I described earlier in this article. Agile teams can have unique problems with sporadic bugs because the team is moving so quickly and the code is changing, and the developers often don't have time to investigate them thoroughly. Testers think about these kinds of problems all the time, and a disciplined exploratory tester can help gather information on these sorts of bugs.

I have also done exploratory testing at the end of an iteration prior to releasing software, and during pair testing sessions with testers, developers, and customers. Sometimes a developer and I will do exploratory testing during test-driven development: once the developer has a design in place with tests passing, we pair and run tests interactively against a testable interface in the code. We keep the ones we want to repeat, recording them as automated unit tests. On several projects, I've found exploratory testing to be a good complement to agile testing practices such as test-driven development and user acceptance testing.
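
As a sketch of what "keeping" such a test might look like, here is a pair of exploratory checks recorded as plain unit tests. The function under test and its behavior are hypothetical, standing in for whatever testable interface the pair was exercising during the session.

```python
import unittest

# Hypothetical function under test, standing in for the testable interface
# the pair was exercising during the exploratory session.
def normalize_reference(value: str) -> str:
    return value.strip().upper()

class KeptExploratoryTests(unittest.TestCase):
    # Exploratory checks we wanted to repeat, recorded as unit tests so they
    # run with the rest of the suite from now on.

    def test_leading_whitespace_is_removed(self):
        self.assertEqual(normalize_reference("  ref-42"), "REF-42")

    def test_already_normalized_value_is_unchanged(self):
        self.assertEqual(normalize_reference("REF-42"), "REF-42")

if __name__ == "__main__":
    unittest.main()
```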

Exploratory testing can be done in a highly disciplined way, drawing on the intuition and reasoning of a skilled practitioner. In the example in this article, exploratory testing helped identify hidden assumptions and provided good information on areas we had missed when testing during development. The philosophies behind skilled exploratory testing and agile projects tend to be congruent. If you're working on an agile team, I encourage you to learn more about applying exploratory testing throughout development, from beginning to end. There is enormous potential for diverse testing on agile projects; knowing more about exploratory testing will help you tap into more of that potential.

For More Information

For more on exploratory testing, consult the following sites.

James Bach's site has a wealth of exploratory testing information:

Documenting exploratory testing:

Cem Kaner coined the term. His site has a lot of information as well:

I've written about working with developers:

For classes on rapid software testing, you have a couple of options:
