An Introduction to Catastrophe Disentanglement for Software Projects
- May 4, 2006
In Spencer Johnson’s Who Moved My Cheese?, the little people keep coming back to where the cheese used to be even though it’s not there anymore. It’s a natural tendency to continue doing what we did before even when, to an outside observer, it no longer makes sense. This behavior is quite common when software projects get into trouble. We keep plodding away at the project hoping that the problems will go away and the "cheese" will miraculously reappear. In all too many cases, it doesn’t.
Just as the smart thing to do when a ball of twine seems hopelessly entangled is to stop whatever we are doing with it (otherwise, the tangle gets worse), so it is with a disastrous project; the longer we keep at it, the worse it gets. At some point, we need to halt all activity and reassess what we are doing.
Disastrous software projects, or catastrophes, are projects that are completely out of control in one or more of the following aspects: schedule, budget, or quality. They are by no means rare; 44% of surveyed development organizations report that they have had software projects cancelled or abandoned due to significant overruns, and 15% say that it has happened to more than 10% of their projects (see Figure 1.1).
But obviously, not every overrun or quality problem means a project is out of control, so at what point should we define a software project as a catastrophe? What are the criteria for taking the drastic step of halting all activities, and how do we go about reassessing the project? Most importantly, how do we go about getting the project moving again? The answers to these questions are the essence of the concept of catastrophe disentanglement.
One of the best-known attempts to disentangle a multi-hundred-million-dollar catastrophe ended recently, more than a decade after it began. In August 2005, the plug was finally pulled on the infamous Denver airport baggage handling system, in a scene reminiscent of HAL’s demise in the memorable Kubrick space odyssey movie.1 This was a project that had gained notoriety for costing one million dollars a day in delays. One of the interesting questions about the Denver project is why the repeated efforts to save it did not succeed.
Of all the problems that plagued the project, probably the most formidable was the project’s unachievable goals. It is unlikely that anyone associated with the project could have brought about a significant change to the goals because the project’s extravagant functionality had, in fact, become part of its main attraction. But the ability to define achievable goals is a cornerstone of any catastrophe disentanglement process, without which the process cannot succeed, and that is one of the main reasons the Denver system could not be disentangled.
As indicated by the above survey data, cases like the Denver project are not rare (although few are as extreme). Most development organizations know this even without seeing the survey data. This frustrating reality was expressed in a famous quote from Martin Cobb of the Canadian Treasury Board: "We know why projects fail, we know how to prevent their failure -- so why do they still fail?"
Cobb’s quote highlights the conventional approach of software engineering. The objective of existing software engineering practices is to prevent the occurrence of software catastrophes—that is, to prevent the project from spiraling out of control. As such, the practices have an important role to play in software development. However, more than five decades of experience show that despite these methods, software catastrophes will continue to be around for a while.
When a software project is out of control, there is no PMI, IEEE, SEI, or ISO rescue process to follow because these organizations offer preventive, rather than corrective, solutions. But is such a project necessarily doomed? Will it inevitably collapse in failure? The following chapters will show that this is far from inevitable.
This book fills the void for corrective solutions in software engineering. It deals with projects that are already in serious trouble. In fact, this book is less concerned with how we got into trouble; it is more concerned with how we get out.
1.1 Overview of the Catastrophe Disentanglement Process
Before the first step in disentangling a project can be taken, we must establish that the whole process is necessary. This means deciding that the project, as it is currently proceeding, has little chance of success unless drastic measures are taken.
Many software organizations have difficulty making this decision, and some avoid it entirely. In fact, there is a general tendency to let troubled projects carry on far too long before appropriate action is taken. Keil uses the term "runaways" to describe software projects that continue to absorb valuable resources without ever reaching their objective. Keil’s runaways are, in effect, undiagnosed catastrophes that went on unchecked for much too long. Indeed, the ability to save a project usually depends on how early in the schedule a catastrophe is diagnosed. Furthermore, organizations that permit a runaway project to continue are wasting valuable resources. This reality is well demonstrated in the following case.
1.1.1 A Case Study
The FINALIST case, described next, demonstrates how difficult it is to acknowledge that a project is in serious trouble, even when the problem is obvious to almost anyone looking in from the outside. It is an interesting case because it is by no means unique; it demonstrates just how easy it is to become committed to a failing path.
After the year 2000 passed, and the software prophets of doom faded away, a Canadian software company found itself with almost no customers for one of its small business units. The unit’s main expertise was in supporting Cobol programs (where many of the bug-2000 problems were expected to be), and suddenly there wasn’t enough Cobol work to support it.
So the company decided to rewrite one of its core products, FINALIST, a large financial analysis system, but it chose to write it again in Cobol in order to retain the company’s unique expertise for solving bug-2000 problems (which it still thought would materialize). The new project, appropriately named FINALIST2, was given a 30-month schedule and a team of 14 developers, eight of whom were veteran Cobol programmers.
At the beginning of the second year of the project, two Cobol programmers retired and, soon after, three more moved to another company. With only three veteran Cobol programmers left, the FINALIST2 project began to experience serious problems and schedule delays. The company’s management repeatedly resisted calls to reevaluate the project and attempted to get it back on track by conducting frequent reviews, adding more people to the team, providing incentives, and eventually, by extending the schedule.
Finally, 28 months into the project, a consultant was brought in, and his first recommendation was to halt the project immediately. This drastic advice was based on the conclusion that little or no meaningful progress was being made and the project, as it was defined, would probably never be completed. There were not enough experienced Cobol programmers around to do the work, and it was unlikely that new ones would be hired. Furthermore, it was unlikely that the new recruits would become sufficiently proficient in Cobol within any reasonable time frame.
The final recommendation was to either restart the project in a modern programming language or to cancel it entirely.
One of the key points in this case is that management failed to notice that what was once a strength (Cobol) had ceased to be one—a classic example of "who moved my cheese." This failure was clearly fostered by a strong desire to preserve Cobol expertise within the company, but it was also the result of a natural reluctance to acknowledge a mistake (resistance to reevaluate the project). These two factors obscured the solution. And so management attempted to fix almost everything (process, team, schedule) except the problem itself.
This case illustrates the difficulties decision makers have in accepting the need for drastic measures and is reminiscent of a gambler who cannot get up and walk away. First, there is the natural tendency to put off making the difficult decision in the hope that conventional methods will eventually get the project back on track. A second difficulty involves over-commitment to previous decisions, prompting the investment of more resources to avoid admitting mistakes (this is known as escalation).
Troubled projects are never a surprise, though, and even those most committed to a failing path know that something is severely wrong. But how severe is "severely wrong"? How can we know that it is time for drastic measures? Ideally, there would be a decision algorithm (a kind of software breathalyzer) to which managers could subject their projects, and which would make the decision for them.
1.1.2 Deciding to Rescue a Project
There is no perfect breathalyzer for catastrophes. However, although it is difficult to make a completely objective decision about a project, there are methods that remove much of the subjectivity from the decision. These methods involve an in-depth evaluation of the project and require significant effort. Unlike status reports or regular progress reviews, they are not designed to be applied at regular intervals throughout the development cycle. The process prescribed by these methods is to be applied only when we suspect that a project may be in serious trouble, but we are unsure whether it requires life-saving surgery.
The procedure is based on the evaluation of three basic project areas:
- Schedule
- Budget
- Quality
The procedure examines whether serious problems have existed for quite a while in any of these project areas and whether the situation is getting worse, not better. Any one of these areas can trigger a catastrophe decision, but when this happens, it is not unusual for serious problems to exist in all three. Chapter 2, "When Is a Project a Catastrophe?," covers this subject in detail and also discusses the tricky question of what quality is (the definition will be based on the level of product defects and the degree to which customers or users are satisfied with the product).
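The check described above (a serious problem that has persisted for a while and is getting worse, in any one area) can be illustrated with a minimal Python sketch. This is not the evaluation procedure from Chapter 2: the 0 to 10 severity scale, the `serious` threshold, the `min_periods` window, and all function names are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AreaHistory:
    """Severity scores for one project area, oldest first.

    The 0-10 scale (higher = worse) is an assumption for this sketch.
    """
    name: str
    scores: list[float]


def is_persistent_and_worsening(history, serious=6.0, min_periods=3):
    """True if the area has been serious for min_periods reviews and is getting worse."""
    recent = history.scores[-min_periods:]
    if len(recent) < min_periods:
        return False  # not enough history to call the problem long-standing
    persistent = all(score >= serious for score in recent)
    worsening = recent[-1] > recent[0]
    return persistent and worsening


def catastrophe_triggers(areas, serious=6.0, min_periods=3):
    """Names of the areas that would, on their own, trigger a catastrophe decision."""
    return [a.name for a in areas
            if is_persistent_and_worsening(a, serious, min_periods)]


if __name__ == "__main__":
    areas = [
        AreaHistory("schedule", [4, 6, 7, 8]),  # serious and still deteriorating
        AreaHistory("budget", [5, 5, 6, 6]),    # only recently serious
        AreaHistory("quality", [7, 7, 6, 6]),   # serious, but not getting worse
    ]
    print(catastrophe_triggers(areas))
```

Any single area returned by `catastrophe_triggers` is enough to raise the question; the sketch deliberately leaves the actual decision to people, since no such check removes all subjectivity.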
Once the decision has been made that a project is indeed a catastrophe, the options become clearer: save it or lose it. This is the time for the ten-step disentanglement process.
1.1.3 The Disentanglement Process
The disentanglement process is designed to rescue a seriously troubled project, provided it can establish business or strategic justification for doing so. The process is built around two main figures: the initiating manager (who initiates the process and oversees its implementation) and the project evaluator (who leads and implements the disentanglement process). The initiating manager is an insider, a senior manager in the organization that owns the project. The project evaluator is an outsider, a seasoned professional, reliable, and impartial.
The catastrophe disentanglement process consists of the following ten steps:
1. Stop: Halt all project development activities and assign the team to support the disentanglement effort.
2. Assign an evaluator: Recruit an external professional to lead the disentanglement process.
3. Evaluate project status: Establish the true status of the project.
4. Evaluate the team: Identify team problems that may have contributed to the project's failure.
5. Define minimum goals: Reduce the project to the smallest size that achieves only the most essential goals.
6. Determine whether minimum goals can be achieved: Analyze the feasibility of the minimum goals and determine whether they can reasonably be expected to be achieved.
7. Rebuild the team: Based on the new project goals, rebuild a competent project team in preparation for restarting the project.
8. Perform risk analysis: Consider the new goals and the rebuilt team, identify risks in the project, and prepare contingency plans to deal with them.
9. Revise the plan: Produce a new high-level project plan that includes a reasonable schedule based on professionally prepared estimates.
10. Install an early warning system: Put a system in place that will ensure that the project does not slip back into catastrophe mode.
There are three main reports generated by the project evaluator during the disentanglement process:
- Step 4: The team overview document. This document contains a summary of the project team evaluation. It is used as input to step 7 ("rebuild the team"). The overview includes the main sources of information, the list of interviews, the reasoning that led to any significant findings, and any problems or incompatibilities that arose during the evaluation.
- Step 6: The midway report. This document is generated midway through the disentanglement process after establishing the feasibility of the minimized goals. It provides senior management and other key stakeholders with a formal update on the progress of the disentanglement process. The report documents all major decisions, evaluations, and conclusions that produced the new reduced-scope project. It also includes summaries of the discussions that led to agreement among the key stakeholders.
- At the end of the disentanglement process: The final report. Producing this report is the project evaluator's last task. The report summarizes all information collected and generated, all decisions made, and all major project documents produced, and it lists all problems that were resolved or left unresolved. This report is produced even if the disentanglement process does not succeed or if the project is cancelled.
The sequence of the disentanglement steps is organized according to the logical flow described in Table 1.1. It is important to complete the steps in this sequence (though parts of the steps may overlap). The following points demonstrate why the sequence is important:
- There will not be enough information to propose new goals until the project has been evaluated (this includes both the project status and the team).
- There will not be enough information to rebuild the team until the new project goals have been established.
- There will not be enough information for a new plan (schedule and estimates) until the new project goals have been established, the team has been rebuilt, and the risks have been identified.
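The ordering constraints in the points above can be written down as a small prerequisite map and checked mechanically. The following Python sketch is only an illustration: the `PREREQS` mapping is one reading of those three points (for example, it treats "new project goals have been established" as steps 5 and 6 together), and the names `PREREQS` and `order_violations` are hypothetical.

```python
# Step numbers follow the ten-step list: 3 and 4 are the two evaluations,
# 5 and 6 define and validate the minimum goals, 7 rebuilds the team,
# 8 is the risk analysis, and 9 revises the plan.
PREREQS = {
    5: {3, 4},        # new goals need the project status and team evaluations
    7: {5, 6},        # rebuilding the team needs the new, feasible goals
    9: {5, 6, 7, 8},  # the new plan needs goals, the rebuilt team, and the risks
}


def order_violations(order, prereqs=PREREQS):
    """Return (step, prerequisite) pairs where the step is scheduled too early."""
    position = {step: index for index, step in enumerate(order)}
    return [
        (step, dep)
        for step, deps in sorted(prereqs.items())
        for dep in sorted(deps)
        if position[dep] > position[step]
    ]


if __name__ == "__main__":
    prescribed = list(range(1, 11))
    print(order_violations(prescribed))    # the prescribed sequence has no violations

    plan_before_risks = [1, 2, 3, 4, 5, 6, 7, 9, 8, 10]
    print(order_violations(plan_before_risks))  # planning ahead of risk analysis is flagged
```

The check only looks at the order in which steps appear, so it says nothing about how much neighbouring steps may overlap in practice.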
Table 1.1 Logical Flow of the Ten Disentanglement Steps
- Launch the process: steps 1, 2
- Evaluate the status: steps 3, 4
- Steps 5, 6, 7
- Prepare to resume: steps 8, 9, 10
Each one of these steps is described in detail in the following chapters. Their success is strongly dependent on the cooperation of all involved parties and the active involvement of the project team. But the main precondition for success is the support of the organization’s senior management. As we shall see in the following chapters, without effective management support, the process will fail at almost every step.
The entire process should take no more than two weeks to complete (see the disentanglement timeline in Figure 13.1 of Chapter 13, "Epilogue: Putting the Final Pieces in Place"). This also represents the maximum amount of time that the project will remain halted.2