How many times have you been asked, "How do you know that automated test is actually testing what you think it is?" I've been asked that question a lot, and I have even asked it myself on occasion. It's one of the classic issues with test automation: If no human is involved, how do you really know that your tests are testing what you told them to test? What if there's a bug in your test code or in the test tool? What if the test wasn't implemented properly? What if a change in the test code in one place causes code to fail in another? Maybe the test code is outdated due to changes in the application and is no longer testing anything.
You get the point.
This article takes two approaches to these questions. First, we'll look at some methods of validating that a test is in fact testing what it's supposed to be testing. Then we'll consider what you can do if you look under the hood and find problems, and how some of these issues might be avoided. We'll focus on traditional scripted regression tests and basic performance tests.
Making Sure That Your Tests Are Actually Testing
I know of a few ways to validate automated tests. This is by no means an exhaustive list; these are just methods I've used in the past that have worked (or failed) for me. They may not be practical in your context, or they may not work for your software, but it's at least a place to start.
The first option is the good old-fashioned code review. In a review, an automated test (functional or performance) is examined for accuracy by walking through the code and the log files, and by actually watching the test execute. You might even run the test in debug mode (if that's an option with your tool) and step through the code as it executes. This technique gives you the most visibility into what the script is actually doing, and it also might be easier than trying to read the code without really knowing what it's supposed to be doing.
Manual reviews require a certain level of familiarity with the automation tools, languages, any frameworks developed, and the intention of the test script being reviewed. That combination of knowledge can be hard to find. And if you do have staff who can perform the review, most likely they won't have time to do it because they're responsible for implementing new tests for the project team. In addition, this is a slow (and sometimes painful) process that requires a significant investment of person-hours before you gain any real confidence in the suite of tests.
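A review goes faster when the log already says what each test claims to check. The Python sketch below is one hypothetical way to get there: `CheckRecorder` and `audited` are made-up names, not features of any particular test tool, and the list of results is a stand-in for a real call into the application. Each check is logged with a plain-language description, and a test that finishes without performing any check is flagged, which catches the "outdated test that no longer tests anything" case.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("test-review")


class CheckRecorder:
    """Collects every check a test performs so a reviewer can read them in the log."""

    def __init__(self, test_name):
        self.test_name = test_name
        self.checks = []

    def check(self, description, condition):
        # Record and log what the test claims to verify, then fail like an assert.
        self.checks.append(description)
        outcome = "ok" if condition else "FAILED"
        log.info("%s: CHECK %s -> %s", self.test_name, description, outcome)
        if not condition:
            raise AssertionError(f"{self.test_name}: {description}")


def audited(test_func):
    """Run a test with a CheckRecorder and warn when it passes without checking anything."""

    @functools.wraps(test_func)
    def wrapper():
        recorder = CheckRecorder(test_func.__name__)
        test_func(recorder)
        if not recorder.checks:
            log.warning("%s: passed but performed no checks", test_func.__name__)
        return recorder.checks

    return wrapper


@audited
def search_returns_results(rec):
    results = ["apple", "apricot"]  # stand-in for a real call into the application
    rec.check("search for 'ap' returns at least one result", len(results) > 0)


@audited
def outdated_search_check(rec):
    pass  # the check was deleted when the page changed; the test still "passes"


search_returns_results()  # logs one CHECK line
outdated_search_check()   # logs a warning: no checks performed
```

The reviewer still has to judge whether each logged description matches the test's intent, but the log now shows that intent directly instead of leaving it buried in the code.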
Introducing Changes to the System Under Test
Another approach you can try is to introduce specific changes into the system under test (sometimes called error seeding). For example, if I know I have a series of automated tests that test the search feature on my web site, I might change the order in which results are returned and then run the scripts to ensure that they fail when they test that feature. With this approach, if the scripts pass, they're not testing what they should be testing. For performance tests, I might reconfigure my web server and execute the tests again, looking for a change in the performance results.
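Here is a small Python sketch of error seeding, under deliberately simplified assumptions: `search` stands in for the real search feature, `seeded_search` is the same function with a planted fault (results returned in reverse order), and the two checks are hypothetical tests. Any test that still passes against the seeded version is a "survivor" whose purpose deserves a closer look. Mutation-testing tools automate the same idea at the source-code level; this sketch just does it by hand.

```python
def search(catalog, query):
    """Stand-in for the application's search: matching items, best match first."""
    matches = [item for item in catalog if query in item]
    # "Best match" here simply means shortest title; a real ranking would differ.
    return sorted(matches, key=len)


def seeded_search(catalog, query):
    """Deliberately seeded fault: same matches, but in reverse order."""
    return list(reversed(search(catalog, query)))


CATALOG = ["tea", "green tea", "iced green tea", "coffee"]


def check_ranking(search_fn):
    assert search_fn(CATALOG, "tea") == ["tea", "green tea", "iced green tea"]


def check_empty_query(search_fn):
    # Checks a boundary condition on the search field, not result order.
    assert len(search_fn(CATALOG, "")) == len(CATALOG)


def run_against(search_fn, tests):
    """Return the names of the tests that pass against the given implementation."""
    passed = []
    for test in tests:
        try:
            test(search_fn)
            passed.append(test.__name__)
        except AssertionError:
            pass
    return passed


TESTS = [check_ranking, check_empty_query]
baseline = run_against(search, TESTS)          # both tests pass on the real code
survivors = run_against(seeded_search, TESTS)  # tests that missed the seeded fault
print(survivors)
```

Here `survivors` contains only `check_empty_query`, because that test looks at a boundary condition on the search field rather than at result order; it isn't wrong, but it also can't vouch for ranking.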
While giving you better coverage than manual reviews, this method is not as reliable. In the examples I just described, some tests might fail with a different order on search results, but some tests might still pass because they don't check order but rather boundary conditions on the search field, or some other aspect that's not related to results. For the performance-testing example, you may change a server setting that has no noticeable impact on system performance, even though you expect it to (I've witnessed this). This means you still need to fall back on manual reviews for those tests that passed when you thought they should have failed.
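For the performance case, it helps to decide up front what counts as a noticeable change, so that a server setting with no real impact shows up as exactly that. A minimal Python sketch, assuming you have a handful of response times from runs before and after the configuration change: it compares the medians against a relative threshold. The function name, the 10% threshold, and the sample timings are all made up for illustration; in practice the threshold must be larger than your normal run-to-run variation.

```python
import statistics


def noticeable_change(before, after, threshold=0.10):
    """Compare median response times (seconds) from two runs of the same test.

    Returns (changed, relative_change). `threshold` is a hypothetical 10% cutoff;
    it should exceed the run-to-run noise you normally see.
    """
    baseline = statistics.median(before)
    candidate = statistics.median(after)
    relative = (candidate - baseline) / baseline
    return abs(relative) >= threshold, relative


# Made-up sample timings, not real measurements.
before = [1.02, 0.98, 1.00, 1.05, 0.97]
after_minor_tweak = [1.01, 0.99, 1.03, 0.98, 1.00]
after_real_change = [1.30, 1.25, 1.40, 1.28, 1.35]

print(noticeable_change(before, after_minor_tweak))  # no noticeable change
print(noticeable_change(before, after_real_change))  # roughly 30% slower
```

If a change you expected to matter lands under the threshold, either the setting really had no effect or the tests aren't exercising it, and those are the tests to review by hand.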
In addition, now you're not taking just one person off normal project tasks for review, but several. Someone needs to add bugs to the code. Someone needs to build and deploy it (let's not talk about configuration-management issues). Someone needs to adjust the web server. Then you still have the tester, who needs to execute and validate the tests. While this is probably faster than manual reviews, it's most likely much more costly and a lot less reliable.
Spinning Off a Short Side Project to Verify Something
In this approach, you simply validate your automated tests by duplicating them using some other tool or implementation model. This is more of a macro approach to the problem, while the other two have been more micro in nature. I've only seen this done with performance tests, but I can imagine a couple of scenarios where it might make sense to try it for functional automation. This method is probably the most costly, as it's a clear duplication of work that has already been done, but sometimes this is the fastest way to gain confidence in something.
In his two-part series on performance tool comparisons (Part 1, Part 2), Suresh Nageswaran shows an example of what this might look like. In one of my current projects, we're trying to do a similar exercise by verifying page-load times using both the Mercury and Rational performance test tools. Often, you only need to duplicate some small portion of the overall set of tests and then work under the assumption that your implementation model is correct. The drawbacks to this approach are the cost of owning multiple tools and the effort of developing equivalent skills across those tools and/or implementation models.
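Once part of the suite is duplicated in a second tool, the comparison itself is simple bookkeeping. The Python sketch below assumes you have already exported each tool's page-load times into a dictionary mapping page path to seconds; it does not read any real tool's result format, and the function name, the 15% tolerance, and the sample numbers are hypothetical. It reports pages that only one tool measured (a sign the two scripts don't cover the same ground) and pages where the tools disagree by more than the tolerance.

```python
def compare_tools(tool_a, tool_b, tolerance=0.15):
    """Cross-check page-load times (seconds) that two tools reported for the same pages.

    Returns (pages_measured_by_only_one_tool, disagreements), where disagreements maps
    a page to (tool_a_time, tool_b_time) when the two differ by more than `tolerance`
    relative to the smaller measurement. Times are assumed to be positive.
    """
    only_one = sorted(set(tool_a) ^ set(tool_b))
    disagreements = {}
    for page in sorted(set(tool_a) & set(tool_b)):
        a, b = tool_a[page], tool_b[page]
        if abs(a - b) / min(a, b) > tolerance:
            disagreements[page] = (a, b)
    return only_one, disagreements


# Hypothetical results, already exported from each tool into a page -> seconds dict.
tool_a = {"/home": 1.20, "/search": 2.10, "/checkout": 3.00}
tool_b = {"/home": 1.25, "/search": 3.40, "/login": 0.80}

only_one, disagreements = compare_tools(tool_a, tool_b)
print(only_one)       # ['/checkout', '/login']
print(disagreements)  # {'/search': (2.1, 3.4)}
```

Pages on either list are where the two implementations need a closer look; agreement on the rest is what justifies assuming your implementation model is correct.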
Overlapping Your Testing
This approach is probably my favorite, but it's difficult to find a context in which it can be implemented. In this approach, you simply attempt to audit one type of testing (in this case, functional or performance test automation) by using another type of testing. For example, I like to audit my performance tests whenever I run my functional tests. You can read about a specific example of this in my article "Gathering Performance Information While Executing Everyday Automated Tests." Similarly, you can sometimes do the same thing by having your performance tests validate some sort of functional requirement (but this can be a bit more difficult and may be less practical). More traditionally, for projects in which I've planned our testing, I've tried to build in a small overlap between the manual test effort and the automated test effort.
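As a rough illustration of gathering performance information from inside a functional test, here is a Python sketch under stated assumptions: `timed_step`, `slow_steps`, the step names, and the per-step budgets are all hypothetical, and `time.sleep` stands in for driving the real application. It is one simple way to do this, not necessarily the technique described in the article mentioned above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, timings):
    """Time one step of a functional test and append the duration (seconds) to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)


def slow_steps(timings, budgets):
    """Return the names of steps whose slowest recorded duration exceeds its budget."""
    return sorted(
        name
        for name, durations in timings.items()
        if name in budgets and max(durations) > budgets[name]
    )


def login_functional_test(timings):
    """A functional test that records step timings as a side effect."""
    with timed_step("open login page", timings):
        time.sleep(0.01)  # stand-in for loading the page
    with timed_step("submit credentials", timings):
        time.sleep(0.05)  # stand-in for submitting the form
        page_text = "Welcome back"  # stand-in for reading the resulting page
    # The functional check is still the test's main job.
    assert "Welcome" in page_text


timings = {}
login_functional_test(timings)
print(slow_steps(timings, {"open login page": 1.0, "submit credentials": 0.02}))
# prints ['submit credentials'], since that step slept for at least 0.05 s
```

The timing rides along with a test you already run, so the functional suite doubles as a light audit of your performance expectations at very little extra cost.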
The costs for this approach are typically the lowest, as you're just tacking an extra task or two onto an existing effort. This means that no one stops what they're doing. It's your classic case of scope creep, only now you're attempting to use it to your advantage in testing. The disadvantage to this approach is that you now have all the problems related to scope creep. The original task now takes a little longer to finish, and someone has to use and maintain the overlapped functionality you developed. Don't get too carried away with overlapping, as each time you do it you're introducing inefficiency into the test project.