When I first think of triage, I envision old TV episodes of MASH, a sitcom about a mobile Army hospital near the front lines of the Korean War. Injured soldiers would be delivered by truck or helicopter, and the doctors would have to perform quick assessments and make life-and-death decisions about who would get assistance first.
On the surface, the decision-making seems chaotic. However, through experience and rules of thumb, the teams are able to make quick decisions and focus their energy where it will have the maximum impact. The most critical patients are quickly sent for surgery. Those with serious but less life-threatening wounds are stabilized so they can wait for further treatment. The patients with the least serious wounds may wait quite a while. They may be in pain and discomfort, but it is their condition that determines their level of treatment.
Defect triage has essentially the same characteristics. For a given set of defects, you need to determine how to act on them and proceed with repairs. The more serious defects, those with the highest priorities, receive action first, usually because of their importance or because they are blocking further project activity. From a software project perspective, let’s establish the following definition for the concept of triage:
- Triage is the act of analyzing, understanding, and categorizing a specific set of defects, in preparation for making a repair scheduling decision.
Once you understand the importance of each individual defect, you must weigh it against all of the other known defects; then you must weigh the full set of defects against your team’s capacity to repair and test them. These decisions feed directly into your team’s repair workflow. They bring to light many factors that have to be considered, all of which affect your efficiency and your overall work quality. This process, change control, is central to the book. By definition,
- Change control is the act of analyzing, planning, and scheduling the repair of multiple sets of defects and grouping them into packages for testing. It leads to an orchestrated tightening of change rules over time, which ultimately leads to a final release state. The compass for change control is a predefined set of release criteria.
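The two definitions above can be illustrated with a minimal sketch. It is not a method prescribed by the book: the `Defect` fields, the numeric priority scale (1 is most serious), the rule that blocking defects go first, and the hours-based capacity limit are all assumptions chosen for illustration. The sketch orders defects (triage) and then groups them into repair packages that fit an assumed team capacity (a simplified stand-in for change control).

```python
from dataclasses import dataclass


@dataclass
class Defect:
    ident: str
    priority: int        # hypothetical scale: 1 = most serious
    blocking: bool       # True if it blocks further project activity
    repair_hours: float  # assumed estimate to repair and test


def triage(defects):
    """Order defects: blocking ones first, then by ascending priority."""
    return sorted(defects, key=lambda d: (not d.blocking, d.priority))


def plan_packages(defects, capacity_hours):
    """Group triaged defects into packages within the capacity limit."""
    packages, current, used = [], [], 0.0
    for d in triage(defects):
        # Start a new package when the next defect would exceed capacity.
        if current and used + d.repair_hours > capacity_hours:
            packages.append(current)
            current, used = [], 0.0
        current.append(d)
        used += d.repair_hours
    if current:
        packages.append(current)
    return packages


# Illustrative data only; the identifiers and estimates are invented.
defects = [
    Defect("D-1", priority=3, blocking=False, repair_hours=4),
    Defect("D-2", priority=1, blocking=False, repair_hours=6),
    Defect("D-3", priority=2, blocking=True, repair_hours=5),
    Defect("D-4", priority=2, blocking=False, repair_hours=3),
]
packages = plan_packages(defects, capacity_hours=10)
package_ids = [[d.ident for d in p] for p in packages]
print(package_ids)  # [['D-3'], ['D-2', 'D-4'], ['D-1']]
```

In practice the ordering rule would be a team judgment rather than a fixed sort key, and the release criteria mentioned in the definition would decide which packages are admitted at all as the change rules tighten.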
Both of these concepts, triage and change control, recur throughout the central themes of the book.
Dynamics of Software Development
In his classic text, Dynamics of Software Development [McCarthy, 1995, p. 155], Jim McCarthy has two maxims that apply to the endgame. They help to underscore the most important points of endgame management. The first covers triage:
- #51: Triage ruthlessly
- The question is not how perfect the software is; rather, it’s how good the judgment of the team is in determining which imperfections to remediate.
His second maxim is related to change management:
- #52: Don’t shake the Jell-O
- Develop in your team a horror of changing the product. You have to get to ship quality for only one moment, but you have to coerce that moment into being. You do that by focusing all of your energy on stabilization, on reducing the rate and the number of bug fixes and eliminating regressions.
McCarthy obviously understands the endgame. Triage and change management are central to a successful exit from the endgame. They require your team to exhibit tremendous discipline and insight in making the right decisions to guide the evolution and stabilization of the product. And stability, like gelatin, is a fragile thing: once you achieve it, even for an instant, you should ship the product.
As I mentioned, the endgame clock starts ticking with the first release of a software product to an external party for testing.
In agile and Extreme Programming methodologies, the endgame is typically associated with activities such as these:
- one of the early iterations is passed to a customer
- acceptance tests are under way
- the team is fielding external defect reports for repair
In the more iterative or phased models—for example, Spiral, RAD, or RUP—the endgame begins when the first phased release or prototype is introduced for external testing.
And in the classic waterfall models, it begins when the product is totally complete and introduced to test.
No matter how you arrive at the endgame, or what methodology is applied, the flow of work is essentially the same. Everything centers on defects that need to be repaired. I view the triage and repair process as an overlay on whatever methodology you’re already using. Figure 1.1 illustrates the steps each defect follows.
Figure 1.1: Endgame Defect Repair Workflow.