Reducing the Time and Cost of Software Testing
Although much attention is generally paid to the cost of software development, and much excitement is generated by technologies that offer development productivity improvements, the cost and productivity of software testing are often ignored or simply accepted as “that is what it costs and how long it takes.” This is ironic, given that the cost and time of software testing are often comparable to the time and cost of developing the software.
Beizer reported that “half the labor expended to develop a working program is typically spent on testing activities.”
IDT conducted its own survey regarding software testing, described in more detail in the book “Implementing Automated Software Testing,” which included over 700 responses. One of the questions we asked was what percentage of the overall program schedule was spent on testing. Forty-six percent of the responses said 30% to 50% was spent on testing, and another 19% of the responses said 50% to 75% was spent on testing.
The cost and time associated with software testing represent a significant portion of a project’s overall cost and schedule; therefore, improvements that increase testing productivity and reduce labor hours can have a measurable impact. The key considerations in assessing that impact are:
- What areas and how much of the test program can be automated?
- What is the expected reduction in test time and schedule?
- What is the expected increase in test coverage and impact on quality?
- Are there other mitigating factors to consider?
How much of the test program can be automated? Not all the tests for a project can or should be automated. The best candidates for AST are the tests that are repeated most often and are the most labor-intensive. Tests that are run only once or infrequently are not high-payoff candidates for automation unless the context warrants it, for example, when the test is difficult, or cost- or time-prohibitive, to run manually. Further, test cases that change with each delivery are also unlikely to be high-payoff candidates if the automated tests must change each time as well. The article “Automation Index” (http://r.nbrmail.com/Resources/3920/11732/1950/21981918/onlineversion.axd) provides a summary of our approach to determining what to automate, and the book “Implementing Automated Software Testing” discusses “whether and what to automate” in more detail.
Our experience has been that 40% to 60% of the tests for most projects can and should be automated. An initial top-level assessment should be developed for each project. As you gain experience in implementing AST for more projects, you will continue to refine your ability to accurately gauge the degree to which you can apply AST to the test program, based on your own historical data.
What is the expected reduction in test time and schedule?
Before any test is automated, an ROI calculation should take place; the details of calculating ROI are discussed in “Implementing Automated Software Testing.” Additionally, the areas that are the best candidates for automation should be considered, such as automated test planning and development; test data generation; test execution and results analysis; error status and monitoring; and report creation, as discussed in the next sections. Our experience has shown that AST can have a big impact on reducing a project’s schedule during the test execution phase. Activities during this phase typically include test execution, test result analysis, error correction, and test reporting, and automating them can save considerable time.
Our experience also has shown that during initial test automation implementation—that is, during the test automation development phase—there will be an initial time increase.
Automated Test Planning and Development—Initial Test Effort Increase
Automated testing initially adds a level of complexity to the testing effort. Before the decision is made to introduce an automated test tool, many peculiarities need to be considered, which are discussed throughout this book. For example, a review of the planned application under test (AUT) or system under test (SUT) needs to be conducted to determine whether it is compatible with the test tool. Sometimes no tool on the market meets the automation needs, and test software and frameworks have to be developed in-house. Additionally, the availability of sample data to support automated tests needs to be reviewed. The kinds and variations of data that will be required need to be outlined, and a plan for acquiring and/or developing sample data needs to be constructed; this is discussed in the next section. Consideration needs to be given to the modularity and reuse of test scripts. Automated testing represents its own kind of development effort, complete with its own mini development lifecycle. The planning required to support a test development lifecycle, operating in parallel with an application development effort, adds to the test planning effort.
In the past, the development of test procedures was a slow, expensive, and labor-intensive process. When a software requirement or a software module changed, a test engineer often had to redevelop existing test procedures and create new test procedures from scratch. Using test management and automated testing tool capability for generating or modifying test procedures takes a fraction of the time of manual-intensive processes.
Test Data Generation—Test Effort/Schedule Decrease
The use of test data generation tools also contributes to the reduction of the test effort. An effective test strategy requires careful acquisition and preparation of test data. Functional testing can suffer if test data is poor, and, conversely, good data can help improve functional testing. Good test data can be structured to improve understanding and testability. The content of the data, correctly chosen, can reduce maintenance efforts and allow flexibility. Preparation of the data can help to focus the business where requirements are vague.
When available, a data dictionary and detailed design documentation can be useful in identifying sample data. In addition to providing data element names, the data dictionary may provide data structures, cardinality, usage rules, and other useful information. Design documentation, particularly database schemas, can also help identify how the application interacts with data, as well as what relationships there are between data elements.
Due to the sheer number of possibilities, it is usually not possible to test 100% of the combinations and variations of all inputs and outputs to verify that the application’s functional and nonfunctional requirements have been met. However, automated testing can help, and various test design techniques are available to narrow down the large set of data input and output combinations and variations. One such technique is “data flow coverage,” which incorporates the flow of data into the selection of test procedure steps. It helps identify test paths that satisfy some characteristic of the data flows across all applicable paths. Other techniques, such as boundary condition testing, are covered in “Implementing Automated Software Testing” and various other testing publications.
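As an illustration of boundary condition testing, the sketch below picks the classic boundary inputs (just below, at, and just above each limit) for one integer field and labels whether each should be accepted. It is a minimal example under stated assumptions: the field has an inclusive valid range, and the function names and the 1 to 100 range are hypothetical.

```python
def boundary_values(low: int, high: int) -> list[int]:
    """Return boundary-value test inputs for an inclusive integer range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Just below, at, and just above each limit; the set removes duplicates for tiny ranges.
    candidates = {low - 1, low, low + 1, high - 1, high, high + 1}
    return sorted(candidates)


def expected_outcomes(low: int, high: int) -> dict[int, bool]:
    """Map each boundary input to whether the application should accept it."""
    return {value: low <= value <= high for value in boundary_values(low, high)}


if __name__ == "__main__":
    # Hypothetical requirement: a quantity field accepts 1..100 inclusive.
    for value, accepted in expected_outcomes(1, 100).items():
        print(value, "valid" if accepted else "invalid")
```

Six inputs per range replace the hundred possible valid values while still probing the places where off-by-one errors typically hide.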
Test data is required that helps ensure that each system-level requirement is tested and verified. A review of test data requirements should address several data concerns, including those listed below, and addressing them in a manual fashion is infeasible. This is where automated test data generation pays off.
The test team must consider the volume or size of the database records needed to support tests. They need to identify whether ten records within a database or particular table are sufficient or 10,000 records are necessary. Early lifecycle tests, such as unit or build verification tests, should use small, handcrafted databases, which offer maximum control and minimal disturbances. As the test effort progresses through the different phases and types of tests, the size of the database should increase to a size that is appropriate for the particular tests. For example, performance and volume tests are not meaningful when the production environment database contains 1,000,000 records, but the tests are performed against a database containing only 100 records.
Test engineers need to investigate the variation of the data values (e.g., 10,000 different accounts and a number of different types of accounts). A well-designed test will incorporate variations of test data, and tests for which all the data is similar will produce limited results. For example, tests may need to consider that some accounts may have negative balances, and some have balances in the low range ($100s), moderate range ($1,000s), high range ($100,000s), and very high range ($10,000,000s). Tests must also reflect data that represents an average range. In the case of accounts at a bank, customer accounts might be classified in several ways, including savings, checking, loans, student, joint, and business. All data categories need to be considered.
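A data generator can guarantee this kind of variation instead of leaving it to chance. The following minimal sketch assumes the account types named above and balance bands loosely based on the ranges in the text; the exact band limits, the record layout, and the function names are hypothetical. It first emits one record for every type/band combination, then fills the remaining volume randomly, using a fixed seed so the data set can be regenerated exactly.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical categories and balance bands (dollars); adjust to the real domain.
ACCOUNT_TYPES = ["savings", "checking", "loan", "student", "joint", "business"]
BALANCE_BANDS = {
    "negative": (-5_000, -1),
    "low": (100, 999),                      # $100s
    "moderate": (1_000, 9_999),             # $1,000s
    "high": (100_000, 999_999),             # $100,000s
    "very_high": (10_000_000, 99_999_999),  # $10,000,000s
}


@dataclass(frozen=True)
class Account:
    account_id: int
    account_type: str
    band: str
    balance: int


def generate_accounts(count: int, seed: int = 42) -> list[Account]:
    """Generate `count` accounts that cover every type/balance-band combination."""
    combos = list(itertools.product(ACCOUNT_TYPES, BALANCE_BANDS))
    if count < len(combos):
        raise ValueError(f"need at least {len(combos)} records to cover all combinations")
    rng = random.Random(seed)  # fixed seed makes the data set reproducible
    # Every combination appears at least once; the remaining records are random picks.
    plan = combos + [rng.choice(combos) for _ in range(count - len(combos))]
    accounts = []
    for account_id, (account_type, band) in enumerate(plan, start=1):
        low, high = BALANCE_BANDS[band]
        accounts.append(Account(account_id, account_type, band, rng.randint(low, high)))
    return accounts


if __name__ == "__main__":
    for account in generate_accounts(10_000)[:3]:
        print(account)
```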
The test team needs to investigate the relevance of the data values. The scope of test data is pertinent to the accuracy, relevance, and completeness of the data. For example, when testing the queries used to identify the various kinds of bank accounts that have a balance due greater than $100, not only should there be numerous accounts meeting this criterion, but the tests also need to reflect additional data, such as reason codes, contact histories, and account owner demographic data. Including the complete set of test data enables the test procedure to fully validate and exercise the system and supports the evaluation of results. The test engineer also needs to verify that a record returned by this query is included for the specific purpose (e.g., more than 90 days past due) and not because of a missing or inappropriate value. Another key point is that simulating various paths of business logic and/or end-user rights and privileges requires different classes of data.
- Test execution data integrity
Another test data consideration for the test team involves the need to maintain data integrity while performing tests. The test team needs to be able to segregate data, modify selected data, and return the database to its initial state throughout test operations. The test team needs to make sure that when several test engineers are performing tests at the same time, one test won’t adversely affect the data required for another test. For example, if one test team member is modifying data values while another is running a query, the result of the query may not be as expected, since the records are being changed by the other tester. To prevent one tester’s test execution from affecting another tester’s outcomes, assign separate testing tasks, asking each tester to focus on a specific area of the functionality that does not overlap with other testers. Automated test scripts can streamline this by maintaining data integrity automatically, for example by returning the data to its initial state before each run.
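One way to keep testers from disturbing each other’s data is to give every test its own copy of a known baseline. The sketch below illustrates the idea with Python’s built-in `sqlite3` module and in-memory databases; the schema and the helper names are hypothetical, and a production setup might instead use per-test transactions that are rolled back, separate schemas, or database snapshots.

```python
import sqlite3


def build_baseline() -> sqlite3.Connection:
    """Create the known-good baseline data set (a tiny hypothetical schema)."""
    baseline = sqlite3.connect(":memory:")
    baseline.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    baseline.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 250), (3, -40)])
    baseline.commit()
    return baseline


def fresh_copy(baseline: sqlite3.Connection) -> sqlite3.Connection:
    """Return a private copy of the baseline so one test cannot disturb another."""
    private = sqlite3.connect(":memory:")
    baseline.backup(private)  # sqlite3.Connection.backup, available since Python 3.7
    return private


def balance(conn: sqlite3.Connection, account_id: int) -> int:
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()
    return row[0]


if __name__ == "__main__":
    baseline = build_baseline()
    tester_a = fresh_copy(baseline)
    tester_a.execute("UPDATE accounts SET balance = 0 WHERE id = 1")  # tester A modifies data
    tester_a.commit()
    tester_b = fresh_copy(baseline)  # tester B's queries still see the original values
    print(balance(tester_a, 1), balance(tester_b, 1), balance(baseline, 1))  # 0 100 100
```

Because the baseline itself is never modified, "returning the database to its initial state" reduces to taking another copy.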
Data sets should be created that reflect specific “conditions” in the domain of the application, meaning a certain pattern of data that can be arrived at only by performing one or many operations on the data over time. For example, financial systems commonly perform a year-end closeout. Storing data in the year-end condition enables the test team to test the system in a year-end closeout state without having to enter the data for the entire year. Having this data already created and ready to use for testing simplifies test activities, since it is simply a matter of loading this test data, rather than performing many operations to get the data into the year-end closeout state. Automated testing tools can help here, also.
As part of the process for identifying test data requirements, it is beneficial to develop a matrix listing the test procedures in one column and the test data requirements in another. As mentioned, while a small data subset is good enough for functional testing, a production-size database is required for performance testing. It can take a long time to acquire production-size data, sometimes as long as several months if done manually. Automated testing allows test databases to be populated quickly.
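For performance testing, synthetic rows can be bulk-loaded in seconds rather than acquired over months. This minimal sketch assumes an SQLite database, an initially empty hypothetical `accounts` table, and uniformly random balances; a realistic production-size data set would need to mirror the actual schema and data distributions.

```python
import random
import sqlite3


def populate_accounts(conn: sqlite3.Connection, count: int, seed: int = 7) -> int:
    """Bulk-load `count` synthetic rows into an empty hypothetical accounts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    rng = random.Random(seed)
    # A generator keeps memory flat even for production-size volumes.
    rows = ((i, rng.randint(-5_000, 10_000_000)) for i in range(1, count + 1))
    with conn:  # one transaction is far faster than committing row by row
        conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Scale `count` toward production volume (e.g. 1_000_000) for performance tests.
    print(populate_accounts(conn, 100_000))  # 100000
```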
Once test data requirements are outlined, the test team also needs to plan the means of obtaining, generating, or developing the test data and of refreshing the test database to an original state, to enable all testing activities, including regression testing. An automated approach is needed.
Data usually needs to be prepared prior to being used for testing. Data preparation may involve the processing of raw data or text files, consistency checks, and an in-depth analysis of data elements, which includes defining data to test case mapping criteria, clarifying data element definitions, confirming primary keys, and defining acceptable data parameters. The test team needs to obtain and modify any test databases necessary to exercise software applications and develop environment setup scripts and testbed scripts. Ideally, already existing customer or system data is available that includes realistic combinations and variations of data scenarios (it is assumed that the data has been cleaned to remove any proprietary or personal information). Customer data can also include some combinations or usage patterns that the test team did not consider, so having real customer data available during testing can be a useful reality check for the application.
Generating test data is tedious, time-consuming, and error-prone if done manually.
Test Execution—Test Effort/Schedule Decrease
Before test execution takes place, entrance criteria must be met. The entrance criteria describe when a testing team is ready to start testing a specific build, and their verification should be automated wherever possible. For example, to accept a software build during system testing, various criteria should be met, most of which can be verified automatically. Some examples follow; pick the ones that apply to your environment:
- All unit and integration tests must be executed successfully.
- The software must build without any issues.
- The build must pass a smoke test that verifies previously working functionality is still working.
- The build must have accompanying documentation (“release notes”) that describes what is new in the build and what has been changed.
- Defects must be updated to “retest” status for the new build.
- The source code must be stored in a version control system.
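The entrance criteria above lend themselves to a simple automated gate. In the sketch below, each criterion is a named check that returns True or False; the `build` dictionary and its keys are hypothetical stand-ins for facts that would, in practice, come from the build system, CI server, version control, and defect tracker.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[], bool]  # returns True when the criterion is satisfied


def verify_entrance_criteria(criteria: list[Criterion]) -> list[str]:
    """Run every check and return the names of the criteria that are not met."""
    unmet = []
    for criterion in criteria:
        try:
            satisfied = bool(criterion.check())
        except Exception:  # a check that crashes is treated as unmet
            satisfied = False
        if not satisfied:
            unmet.append(criterion.name)
    return unmet


if __name__ == "__main__":
    # Hypothetical facts about a candidate build.
    build = {
        "unit_and_integration_passed": True,
        "compiled_cleanly": True,
        "smoke_test_passed": False,
        "release_notes": "release-notes.txt",
        "fixed_defects_not_in_retest": 0,
        "commit_id": "a1b2c3d",
    }
    criteria = [
        Criterion("unit and integration tests pass", lambda: build["unit_and_integration_passed"]),
        Criterion("build succeeds", lambda: build["compiled_cleanly"]),
        Criterion("smoke test passes", lambda: build["smoke_test_passed"]),
        Criterion("release notes present", lambda: bool(build["release_notes"])),
        Criterion("fixed defects set to retest", lambda: build["fixed_defects_not_in_retest"] == 0),
        Criterion("source under version control", lambda: bool(build["commit_id"])),
    ]
    unmet = verify_entrance_criteria(criteria)
    print("accept build" if not unmet else f"reject build: {unmet}")
```

Reporting every unmet criterion, rather than stopping at the first, tells the development team everything that must be fixed before the build is resubmitted.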
Once the entrance criteria have been verified, test execution can take place. Manual test execution is labor-intensive and error-prone. A test tool or in-house-developed automation framework allows test scripts to be played back at execution time with minimal manual intervention. With the proper setup, and in the ideal world, the test engineer needs only to kick off the script, and the tool executes unattended. The tests compare expected results to actual results and report the respective outcomes. The tests can be executed as many times as necessary and can be set up to start at a specified time. This ease of use and flexibility allows the test engineer to focus on other priority tasks.
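The core of unattended execution is comparing expected to actual results and recording the outcome of each test. The following is a deliberately small sketch, not a substitute for a real test framework; the `AutomatedCheck` structure and the `monthly_interest` function under test are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AutomatedCheck:
    name: str
    action: Callable[..., Any]  # the operation under test
    args: tuple
    expected: Any


def run_unattended(checks: list[AutomatedCheck]) -> dict[str, str]:
    """Execute each check, compare actual to expected, and record the outcome."""
    outcomes = {}
    for check in checks:
        try:
            actual = check.action(*check.args)
        except Exception as exc:  # an unexpected exception is reported, not fatal to the run
            outcomes[check.name] = f"ERROR {exc!r}"
            continue
        if actual == check.expected:
            outcomes[check.name] = "PASS"
        else:
            outcomes[check.name] = f"FAIL expected={check.expected!r} actual={actual!r}"
    return outcomes


def monthly_interest(balance_cents: int, annual_rate_percent: int) -> int:
    """Hypothetical function under test: monthly interest, rounded down to whole cents."""
    return balance_cents * annual_rate_percent // 1200


if __name__ == "__main__":
    suite = [
        AutomatedCheck("interest on $1,200", monthly_interest, (120_000, 5), 500),
        AutomatedCheck("zero balance", monthly_interest, (0, 5), 0),
    ]
    for name, outcome in run_unattended(suite).items():
        print(f"{name}: {outcome}")
```

Catching exceptions per check lets one broken test be reported while the rest of the suite keeps running, which matters when nobody is watching the run.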
Today’s automated test tools allow for the selection and execution of a specific test procedure with the click of an icon. With modern automated test procedure (case) generators, test procedure creation and revision times are greatly reduced in comparison to manual test methods, sometimes taking only a matter of seconds.
Test Results Analysis—Test Effort/Schedule Decrease
Automated test tools generally have some kind of test result report mechanism and are capable of maintaining test log information. Some tools produce color-coded results, where green output might indicate that the test passed and red output indicates that the test failed. This kind of test log output improves the ease of test analysis. Most tools also allow for comparison of the failed data to the original data, pointing out the differences automatically, again supporting the ease of test output analysis. In-house-developed test tools can differentiate a pass versus fail in various ways.
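Pointing out differences automatically can be done with the standard library’s `difflib` module. This sketch assumes the expected (baseline) and actual outputs are available as text; the function name and the sample record format are hypothetical.

```python
import difflib


def describe_differences(expected: str, actual: str) -> str:
    """Return a unified diff of expected vs. actual output; an empty string means they match."""
    diff = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",  # splitlines() already removed the newlines
    )
    return "\n".join(diff)


if __name__ == "__main__":
    baseline_output = "id=1\nbalance=100\nstatus=open\n"
    run_output = "id=1\nbalance=90\nstatus=open\n"
    print(describe_differences(baseline_output, run_output) or "PASS: outputs match")
```

Lines prefixed with `-` come from the expected output and lines prefixed with `+` from the actual output, so the analyst sees only what changed.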
To be most productive, verification of exit criteria, like verification of entrance criteria, also should be automated. Exit criteria describe conditions that show the software has been adequately tested. Testing resources are finite, the test budget and number of test engineers allocated to the test program are limited, deadlines approach quickly, and therefore the scope of the test effort must have its limits as well. The test plan must indicate when testing is complete. However, when exit criteria are stated in ambiguous or poorly defined terms, the test team will not be able to determine the point at which the test effort is complete. Testing can go on forever.
The test completion criteria might include a statement that all defined test procedures, which are based on requirements, must be executed successfully without any significant problems, meaning all high-priority defects must have been fixed by development and verified through regression testing by a member of the test team. This, in addition to all of the other suggested practices discussed throughout this book, will provide a high level of confidence that the system meets all requirements without major flaws.
As a simplified example, the exit criteria for an application might include one or more of the following statements.
- Test procedures have been executed in order to determine that the system meets the specified functional and nonfunctional requirements.
- The system is acceptable, provided that all levels 1, 2, and 3 (showstoppers, urgent, and high-priority) software problem reports, documented as a result of testing, have been resolved.
- The system is acceptable, provided that all levels 1 and 2 (showstoppers, urgent) software problem reports have been resolved.
- The system is acceptable, provided that all levels 1 and 2 (showstoppers, urgent) software problem reports, documented as a result of testing, have been resolved, and that 90% of level 3 problem reports have been resolved.
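Exit criteria stated this precisely can be evaluated automatically. The sketch below encodes the last statement (all level 1 and 2 reports resolved, and at least 90% of level 3 reports resolved); the input format of `(level, resolved)` pairs and the function name are assumptions.

```python
from typing import Iterable


def meets_exit_criteria(
    reports: Iterable[tuple[int, bool]],
    must_resolve: tuple[int, ...] = (1, 2),
    partial_level: int = 3,
    partial_ratio: float = 0.90,
) -> bool:
    """reports: hypothetical (level, resolved) pairs for problem reports found in testing."""
    reports = list(reports)
    # Every report at a must-resolve level (showstoppers, urgent) has to be resolved.
    if any(level in must_resolve and not resolved for level, resolved in reports):
        return False
    # At least `partial_ratio` of the partial-level reports must be resolved.
    partial = [resolved for level, resolved in reports if level == partial_level]
    if not partial:
        return True
    return sum(partial) / len(partial) >= partial_ratio


if __name__ == "__main__":
    reports = [(1, True), (2, True)] + [(3, True)] * 9 + [(3, False)]
    print("exit criteria met" if meets_exit_criteria(reports) else "keep testing")
```

Passing `must_resolve=(1, 2, 3)` expresses the statement that requires levels 1, 2, and 3 to be resolved, and `partial_ratio=0.0` expresses the one that requires only levels 1 and 2.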
Developers also need to be aware of the system acceptance criteria. The test team needs to communicate the list of entrance and exit criteria to the development staff early on, prior to submitting the test plan for approval. Testing entrance and exit criteria for the organization should be standardized, where possible, and based upon criteria that have been proven on several projects.
It may be determined that the system can ship with some defects that will be addressed in a later release or a patch. Before going into production, test results analysis can help to identify the defects that need to be fixed versus those whose correction can be deferred. For example, some defects may be reclassified as enhancements, and then addressed as part of a later software release. The project or software development manager, together with the other change control board members, will likely determine whether to fix a defect or risk shipping a software product with the defect.
Additional metrics have to be evaluated as part of the exit criteria (for additional discussions of test metrics, see “Implementing Automated Software Testing”). For example:
- What is the rate of defects discovered in previously working functionality during regression tests, meaning defect fixes are breaking previously working functionality?
- How often do defect corrections, meaning defects thought to be fixed, fail the retest?
- What is the newly opened defect rate (on average)? The defect open rate should decline as the testing phase goes on. If this is not the case, it is an indication of bigger problems that need to be analyzed.
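These metrics are simple ratios and trends once the counts are available from the defect tracker. In the sketch below, the denominators are assumptions (regression defects per fix delivered, failed retests per retest run), a least-squares slope over weekly counts stands in for “the defect open rate should decline,” and the function names are hypothetical.

```python
def ratio(part: int, whole: int) -> float:
    """Safe ratio: 0.0 when there is nothing to measure yet."""
    return part / whole if whole else 0.0


def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values over equally spaced periods (e.g. weeks)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to compute a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def exit_metrics(regression_defects: int, fixes_delivered: int,
                 retests_failed: int, retests_run: int,
                 weekly_new_defects: list[int]) -> dict:
    slope = trend_slope(weekly_new_defects)
    return {
        "regression_defect_rate": ratio(regression_defects, fixes_delivered),
        "retest_failure_rate": ratio(retests_failed, retests_run),
        "open_rate_slope": slope,
        "open_rate_declining": slope < 0,  # a non-negative slope warrants investigation
    }


if __name__ == "__main__":
    print(exit_metrics(3, 60, 6, 60, [30, 25, 18, 12, 7]))
```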
Testing can be considered complete when the application/product is acceptable enough to ship or to go live in its current state, meeting the exit criteria, even though there are most likely more defects than those that have been discovered.
Another way to determine software quality and whether the exit criteria have been met is reliability modeling, discussed in the section “Impacting Software Quality” later in this excerpt.
Once the official software build has met exit criteria, the software will only be as successful as it is useful to the customers. Therefore, it is important that user acceptance testing is factored into the testing plan.
It is important that the test team establish quality guidelines for the completion and release of software. Automating this effort will pay off.
Error Status/Correction Monitoring—Test Effort/Schedule Decrease
There are automated tools on the market that allow for automatic documentation of defects with minimal manual intervention after a test script has discovered a defect. The information that is documented by the tool can include the identification of the script that produced the defect/error, identification of the test cycle that was being run, a description of the defect/error, and the date and time when the error occurred.
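The information listed above maps naturally onto a small record that a test script can fill in when a comparison fails. This is a sketch only: the field and function names are hypothetical, and submitting the record to a particular defect-tracking tool would go through that tool’s own API, which is not shown.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class DefectRecord:
    script: str       # the script that detected the problem
    test_cycle: str   # the test cycle that was being run
    description: str
    occurred_at: str  # ISO 8601 timestamp of the failure


def record_from_failure(script: str, test_cycle: str, expected: Any, actual: Any,
                        when: Optional[datetime] = None) -> DefectRecord:
    """Build a defect record automatically from a failed expected/actual comparison."""
    when = when or datetime.now(timezone.utc)
    description = f"Expected {expected!r} but got {actual!r}"
    return DefectRecord(script, test_cycle, description, when.isoformat())


if __name__ == "__main__":
    # Hypothetical script and cycle names.
    print(asdict(record_from_failure("test_transfer_funds", "cycle-3", 100, 90)))
```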
The defect-tracking lifecycle is a critical aspect of error status/correction monitoring as part of the test execution program. It is important that an adequate defect-tracking tool be evaluated and selected for your system environment. Once the tool has been acquired or developed in-house, it is just as important that a defect-tracking lifecycle be instituted, documented, and communicated. All stakeholders need to understand the flow of a defect from the time it is identified until it is resolved. Suppose your process is to retest a defect only when it is in retest status, but the development manager doesn’t follow this process or is not aware of it. How would you know which defects to retest? This is a perfect example of a process that would benefit from automation.
Test engineers need to document the details of a defect and the steps necessary to re-create it, or simply reference a test procedure, if those steps were followed, in order to assist the development team’s defect correction activities. Defects are commonly classified based on priority, and higher-priority defects get resolved first. Test engineers need to participate in change control boards, if applicable, to review and discuss outstanding defect reports. Once the identified defects have been corrected and the corrections have been released in a new software build, test engineers are informed of this in the defect-tracking tool, where the defect status is set to “retest,” for example. Testers can then focus on retesting defects identified as such. Ideally the fixed defect also contains a description of the fix and what other areas of the system could have been affected by it. Testers can then also focus on retesting those potentially affected areas.
Each test team needs to perform defect reporting by following a defined process.
The process should describe how to evaluate unexpected system behavior. Sometimes a test can produce a false negative, where the system behaves correctly but the test is wrong. Or the test (especially when a testing tool is used) can produce a false positive, where the test is passed but the system has a problem. The tester needs to be equipped with specific diagnostic capabilities to determine the correctness of the output. See the discussion on Gray Box Testing in “Implementing Automated Software Testing” for additional ideas on how to approach this.
- Defect entry
Assuming the tester’s diagnosis determines that the unexpected system behavior is a defect (not a false positive, false negative, or duplicate, etc.), typically, a software problem report, or defect, is entered into the defect-tracking tool by a member of the testing team or a member of another team that is tasked to report defects.
The automated process should also define how to handle a recurring issue, that is, an issue that has been identified before, was fixed, but has now reappeared. Should the “old” defect be reopened, or should a new defect be entered? We usually suggest that the old defect be reviewed for the implemented fix, but that a new defect be entered with a reference to the “closed, but related” defect. This reference will give an indication of how often defects are reintroduced after they have been fixed, which may point to a configuration problem or an issue on the development side.
Once the defect has been corrected, or deemed to be a duplicate, or not a defect, it can be closed. Don’t allow anyone but the test manager to delete defects. This ensures that all defects and their history are properly tracked and not inadvertently, or intentionally, deleted by another staff member.
It may be useful to have a rule regarding partially corrected defects: If a defect is only partially fixed, it cannot be closed as fixed. Usually this should not be the case; defect reports should be as detailed as possible, and only one issue should be documented at a time. If multiple problems are encountered, multiple defects should be entered, regardless of how similar the problems may be.
Once the defect has been corrected and unit-tested to the satisfaction of the software development team, the corrected software code should be checked in using the software configuration management tool. At some point an acceptable number of defects will have been corrected, or in some cases a single critical defect, and a new software build is created and given to the testing team.
When using a defect-tracking tool, the test team needs to define and document the defect lifecycle model, also called the defect workflow. In some organizations the configuration management group or process engineering group is responsible for the defect workflow; in other organizations it is the test team’s responsibility. Regardless of responsibility, automating this process will save time and increase efficiency.
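A documented defect workflow can also be enforced in code, so that illegal status changes are rejected instead of relying on everyone remembering the process. The states and transitions below are a hypothetical lifecycle assembled from the practices described above (retest status, closing duplicates, new linked defects for recurrences, deletion restricted to the test manager); adapt them to your organization’s workflow.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lifecycle; adapt states and transitions to your documented workflow.
ALLOWED_TRANSITIONS = {
    "new": {"open", "closed"},     # "closed" directly for duplicates or not-a-defect
    "open": {"fixed"},
    "fixed": {"retest"},           # set once the fix ships in a new build
    "retest": {"closed", "open"},  # a failed retest sends the defect back to development
    "closed": set(),               # recurrences are entered as new, linked defects
}


@dataclass
class Defect:
    defect_id: int
    summary: str
    status: str = "new"
    related_to: Optional[int] = None
    history: list = field(default_factory=list)

    def move_to(self, new_status: str) -> None:
        if new_status not in ALLOWED_TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.history.append((self.status, new_status))
        self.status = new_status


def awaiting_retest(defects: list[Defect]) -> list[Defect]:
    """Testers retest only defects whose status is 'retest'."""
    return [d for d in defects if d.status == "retest"]


def log_recurrence(closed: Defect, new_id: int) -> Defect:
    """Enter a recurring issue as a new defect that references the closed one."""
    if closed.status != "closed":
        raise ValueError("only closed defects can recur")
    return Defect(new_id, f"Recurrence of #{closed.defect_id}: {closed.summary}",
                  related_to=closed.defect_id)


def delete_defect(tracker: dict[int, Defect], defect_id: int, role: str) -> None:
    """Only the test manager may delete defects, so defect history is not lost."""
    if role != "test_manager":
        raise PermissionError("only the test manager may delete defects")
    del tracker[defect_id]
```

Keeping the transition table as data makes the workflow itself reviewable and easy to change when the organization revises its process.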
Report Creation—Test Effort/Schedule Decrease
Many automated test tools have built-in report writers, which allow users to create and customize reports tailored to their specific needs. Even those test tools that don’t have built-in report writers might allow for import or export of relevant data in a desired format, making it simple to integrate the test tool output data with databases that allow report creation.
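Where a tool can only export raw results, a few lines of code can turn them into a report that a database or spreadsheet can import. This sketch uses the standard `csv` module and assumes outcomes are strings beginning with PASS, FAIL, or ERROR; that format and the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def results_to_csv(results: dict[str, str]) -> str:
    """Export per-test outcomes as CSV text, sorted by test name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["test", "outcome"])
    for name, outcome in sorted(results.items()):
        writer.writerow([name, outcome])
    return buffer.getvalue()


def summarize(results: dict[str, str]) -> Counter:
    """Count outcomes by their first word (PASS, FAIL, ERROR)."""
    return Counter(outcome.split()[0] for outcome in results.values())


if __name__ == "__main__":
    sample = {"login": "PASS", "transfer": "FAIL expected=100 actual=90"}
    print(results_to_csv(sample))
    print(summarize(sample))
```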
Other Mitigating Factors to Consider
In our experience implementing automated testing for customers, AST has delivered a significant payoff and reduction in test time for applications that consistently undergo regression testing as part of each delivery. We find that projects often have a standard set of test procedures to repeat with each delivery, and in addition, these same tests need to be run across multiple configurations. As an example, a set of test procedures that required three days to execute and was run monthly was automated and now runs in less than an hour. The project benefits from this reduced test time each time the tests are rerun.
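The arithmetic behind this example can be made explicit. Assuming, hypothetically, 8-hour working days and 12 monthly runs per year (the text gives only “three days” and “less than an hour”), the execution time compares as follows; this counts execution time only, not the AST development and maintenance effort.

```latex
\begin{aligned}
T_{\text{manual}} &= 3~\text{days} \times 8~\tfrac{\text{h}}{\text{day}} = 24~\text{h per run}\\
T_{\text{auto}} &< 1~\text{h per run}\\
\text{Annual reduction} &> 12 \times (24 - 1)~\text{h} = 276~\text{h}\\
\text{Relative reduction per run} &> \frac{24 - 1}{24} \approx 95.8\%
\end{aligned}
```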
A likely challenge you will find is locating accurate data on exactly how long it took to manually run tests, how many people were really needed, and how often the tests were actually run. Some programs undoubtedly have kept records where this information is either easily obtainable or derived, but most programs do not. The best alternative we have found is to meet with the test team and reconstruct the information as best as possible. Teams generally have a pretty good understanding of how many people were part of the testing effort for a given delivery and how many calendar days were required for testing. Given these parameters, you can develop estimates for time and effort associated with manual testing.
Tracking the actual reduction in test hours versus what was previously required when the tests were run manually is another good measure on which to develop historical data for your organization and programs. This type of historical data will be invaluable to you when estimating the impact of AST on future projects.
Projects may have unique requirements or test procedures that may not result in AST delivering the theoretical reduction in schedule. Organizations that have new test programs, less mature test processes, and/or are completely new to AST also may not experience the theoretical reduction in schedule. Each project team should assess any mitigating factors that need to be considered. A comparison of what was achieved regarding reduction in hours versus estimated hours and the mitigating factors that resulted in the deviation is another helpful metric to track and then apply to future AST projects.
For a preliminary estimate, a sound approach is to review how much of the test program AST applies to, determine the savings that could be realized after deducting the AST development or tool cost and other tangential factors, and then account for any mitigating circumstances. This yields a preliminary projection of the time and schedule savings, or quality increase, when AST is applied to a particular test program. While this is a simplified ROI summary (a detailed discussion of projecting the ROI from AST is provided in “Implementing Automated Software Testing”), having an approach to estimating the savings from AST early on is generally necessary to plan the project and will help answer the question “Why automate?”
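The preliminary estimate described above can be captured in a small calculation. The sketch below is a simplified model, not the detailed ROI method from “Implementing Automated Software Testing”: all parameter names are hypothetical, savings are expressed in labor hours, and mitigating factors are folded into a single discount multiplier.

```python
from dataclasses import dataclass


@dataclass
class EstimateInputs:
    manual_hours_per_cycle: float     # labor to run the full test program manually once
    automatable_fraction: float       # share of that labor AST can take over (e.g. 0.4 to 0.6)
    automated_hours_per_cycle: float  # residual labor for the automated portion
    cycles: int                       # test cycles expected over the planning horizon
    ast_development_hours: float      # one-time cost to develop or acquire and set up AST
    maintenance_hours_per_cycle: float = 0.0
    mitigation_factor: float = 1.0    # < 1.0 discounts savings for immature processes, etc.


def savings_per_cycle(p: EstimateInputs) -> float:
    """Hours saved in one cycle after residual and maintenance effort, discounted."""
    automated_portion = p.manual_hours_per_cycle * p.automatable_fraction
    net = automated_portion - p.automated_hours_per_cycle - p.maintenance_hours_per_cycle
    return net * p.mitigation_factor


def projected_net_savings(p: EstimateInputs) -> float:
    """Preliminary net labor-hour savings over the horizon (positive means AST pays off)."""
    return savings_per_cycle(p) * p.cycles - p.ast_development_hours


def break_even_cycles(p: EstimateInputs) -> float:
    """Number of cycles needed to recover the AST development cost."""
    per_cycle = savings_per_cycle(p)
    return p.ast_development_hours / per_cycle if per_cycle > 0 else float("inf")


if __name__ == "__main__":
    p = EstimateInputs(240, 0.5, 10, 12, 400, maintenance_hours_per_cycle=10, mitigation_factor=0.8)
    print(projected_net_savings(p), break_even_cycles(p))
```

With these purely illustrative inputs (240 manual hours per cycle, half of it automatable, 10 residual and 10 maintenance hours per cycle, 12 cycles, 400 development hours, and a 0.8 mitigation factor), the model projects a net saving of about 560 hours and a break-even point after about 5 cycles. Comparing such projections with the actual results is exactly the historical data the preceding paragraphs recommend tracking.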