Home > Store

Toward Zero Defect Programming

Register your product to gain access to bonus material or receive a coupon.

Toward Zero Defect Programming


  • Sorry, this book is no longer in print.
Not for Sale



Allan M. Stavely is a faculty member in the Computer Science Department at New Mexico Tech, where he has taught for more than twenty years. He has also been a visiting staff member at IBM Research in San Jose, California; at Hewlett-Packard Labs in Bristol, England; and at IBM UK Laboratories at Hursley Park, Winchester, England. Much of his teaching, and most of his research, have focused on software engineering, formal methods, and improved methods of programming in general.


  • Copyright 1999
  • Dimensions: 6-1/2" x 9-1/2"
  • Pages: 256
  • Edition: 1st
  • Book
  • ISBN-10: 0-201-38595-3
  • ISBN-13: 978-0-201-38595-3

Read a paper by the author describing a course taught using this book.

Toward Zero-Defect Programming describes current methods for writing (nearly) bug-free programs. These methods are based on practices developed at IBM and elsewhere under the name Cleanroom Software Engineering. The successful application of these methods in commercial projects over the past fifteen years has produced defect rates that are, at least, an order of magnitude lower than industry averages. Remarkably, this reduction in defects comes at no net cost; on the contrary, it is often accompanied by increased productivity and shorter overall development time!

In a concise and well-illustrated presentation, Stavely shows how these methods can be applied in three key areas of software development:

  1. specification
  2. verification
  3. testing.

Requiring formal specifications, and requiring that the code be proved mathematically to agree with the specifications, forces software engineers to program more simply and more clearly, eliminating many defects as a consequence. Performing verification as part of a team process uncovers additional defects and encourages careful examination of the program for efficiency and other quality aspects. Testing the program, to compensate for human fallibility in the preceding steps, catches (nearly) all remaining bugs.

  • Explains Cleanroom methods for developing high-quality software.
  • Encourages a team process for program verification.
  • Illustrates the importance of incremental development.

The author departs somewhat from IBM Cleanroom methods in simplifying the procedures that readers must learn. His aim is to make specification and verification readily accessible to any student or practitioner who can write well-structured programs. No great mathematical sophistication is assumed. Although the book’s examples are written in a number of programming languages to explain different points, the largest number are in C; therefore, a prior knowledge of C is useful.



Stavely: High-Quality Software Through Semiformal Specification and Verification

High-Quality Software Through Semiformal Specification and Verification

Allan M. Stavely
Computer Science Department
New Mexico Tech
Socorro, NM 87801

A version of this paper was presented at the 12th Conference on Software Engineering Education and Training (CSEE&T'99), March 22--24, 1999, New Orleans, Louisiana, USA.


Since 1993 I have been teaching the most effective software engineering course I have ever taught, as measured by the quality of the programs that students can produce at the end of it. The core of the course is semiformal specification and verification, using techniques adapted from the method called "Cleanroom Software Engineering" [Mills], [Linger].

Yes, we really do verify all of our programs, even programs of realistic size that do real work --- especially those. By avoiding excessive formality while retaining the discipline of mathematical reasoning, we are able to do verification with reasonable effort and apply it to real programs; the techniques seem to scale up to large programs very naturally. And we use the verification, not as an end in itself, but as a tool for detecting and removing defects.

In what follows, I will first describe the principles and techniques on which the course is based. I will then describe the course and how I teach it, and finish by presenting results from student projects and from real development projects done using the methods of the course.


The Cleanroom method, developed by the late Harlan Mills and his colleagues at IBM and elsewhere, attempts to do for software what cleanroom fabrication does for semiconductors: to achieve quality by keeping defects out during fabrication. In semiconductors, dirt or dust that is allowed to contaminate a chip as it is being made cannot possibly be removed later. But we try to do the equivalent when we write programs that are full of bugs, and then attempt to remove them all using debugging.

The Cleanroom method, then, uses a number of techniques to develop software carefully, in a well-controlled way, so as to avoid or eliminate as many defects as possible before the software is ever executed. Elements of the method are:

  • specification of all components of the software at all levels;
  • stepwise refinement using constructs called "box structures";
  • verification of all components by the development team;
  • statistical quality control by independent certification testing;
  • no unit testing, no execution at all prior to certification testing.

Cleanroom practitioners have shown their methods to be very effective in a number of real-world projects at IBM and other organizations ([Linger] p. 56, [Hausler] p. 92, [Trammell]) and by a controlled experiment in an academic environment [Selby]. The levels of quality obtained using Cleanroom methods have been impressive. Estimates of industry averages for defect density vary widely, but some published estimates that I have seen for defects per thousand non-comment lines of code are

  • 10 to 50 in delivered code produced in the U.S. ([DeMarco] p. 200)
  • "in the vicinity of at least 4 or 5 ... at delivery" ([Jones] p. 168)
  • 10 to 30 left after unit testing ([Cobb] p. 47)
  • 50 after compilation and before testing; 10 left after unit testing ([Humphrey] p. 213)

In contrast, the average for Cleanroom-developed code in 17 real-world projects from 1987 to 1994, a total of almost a million lines of code, was 2.3 defects per thousand lines of code measured from the time of first execution in testing ([Linger] p. 56). This is clearly much better than the industry average, whichever estimate is used for the latter. The testing then removed most of these defects; many of the Cleanroom-developed systems showed no errors at all when put into production use.

Cleanroom practitioners also claim that their methods cost nothing in terms of effort, schedule and cost, and often even increase productivity and shorten development time: the extra time spent in specification and verification is more than recovered in greatly reduced time spend in debugging and defect correction. Again, the data from real projects supports these claims ([Linger] p. 56).

The right level of formality

Journal papers on Cleanroom [Mills], [Selby], [Cobb] began to appear in the late 1980s, presenting the method as described in the previous section. I read them with great interest. I found the concept appealing, and if the early reports of successful Cleanroom projects were any indication, this was a technology well worth paying attention to.

I was especially eager to see how verification was done in Cleanroom. I had taught, and occasionally actually done, program verification using traditional methods (Hoare's axioms, for example; see [Hoare] and, for example, [Baber]) for many years. However, as sympathetic as I was to the idea, I had to admit that this kind of verification was tedious and time-consuming, and not practical for everyday use. It could be done on small programs, as academic exercises, and occasionally on a few larger programs, in cases in which unusual safety or security requirements justified the substantial extra effort. But I had trouble arguing that all programming projects in the real world should be done using verification. My students didn't believe it and, frankly, neither did I.

Yet Mills and his colleagues claimed to be performing verification routinely on typical real-world applications, tens of thousands of lines long! Furthermore, they claimed a decrease in overall effort when they used verification.

Unfortunately, the papers didn't explain how the verification was done. They gave tantalizing hints in statements like these ([Mills] p. 22):

The method ... used in Cleanroom development, called functional verification, is quite different than the method of axiomatic verification usually taught in universities. It is based on functional semantics and on the reduction of software verification to ordinary mathematical reasoning about sets and functions as directly as possible.

But nothing that had been published at that time described the process in enough detail that I could learn to do it myself.

Then in June of 1992 I had the great good fortune to attend a workshop on Cleanroom methods. It was organized specifically for undergraduate faculty, sponsored by NSF, and conducted by members of IBM's Cleanroom Software Technology Center. In fact, this "workshop" was a condensed version of IBM's in-house training course.

I learned that "functional semantics" means that program statements are treated as functions from one program state to another. These functions are described by giving the new values of any variables whose values are changed, as functions of the values of variables in the original state. This is a more direct way of describing computations than the assertions used in axiomatic verification, which state facts about values of variables rather than stating what the values actually are.

Specifications of programs and parts of programs, then, are written as "intended functions", and verification means showing that each piece computes what its intended function says that it should compute. The intended functions may contain mathematical formulas, programming-language expressions, English, or any other notation that is appropriate.

Intended functions are written for every part of the code, down to the level of each control construct and nontrivial statement. This discipline forces the programmer to document every part of the code and to think very carefully about each step. Furthermore, it places a high premium on clear and simple intended functions and code, and code that obviously matches its intended function. Many errors are prevented in this way alone.

Here is a simple example of code annotated with intended functions, as it might appear in a C program:

 1	[ report := description of each item in selected_items ]
 2	{
 3		itemlist L = selected_items;
 5		[ report := empty ]
 6		reset(report);
 8		[ report, L:=report || description of each item in L, empty ]
 9		while (L != NULL)
10		{
11			[ report,L := report || description of first item in L,
12				all but first item in L ]
13			put_description(report, L->first);
14			L = L->rest;
15		}
16	}

The intended functions are the parts in square brackets; each one is the specification for the code immediately below it. (The operator "||" denotes concatenation.)

Verification is done following the structure of the control constructs. In this case, the top level is the sequence of the statement on line 3, the computation on lines 5 and 6, and the computation on lines 8 through 16. We must show that the composition of the function computed by line 3 (which is obvious), the function on line 5, and the function on line 8 does what the intended function on line 1 specifies; we don't even look at the code below the functions in this step. By substitution of selected_items for L and "empty" for report in line 8, the composition is

[report, L := empty || description of each item in selected_items, empty ]

which (since the effect on L is not visible outside the braces on lines 2 and 16) agrees with line 1.

To verify that the while-loop computes its intended function, we would ask:

  1. Does the loop always terminate? From the context of the rest of the program, we would see that an itemlist is a linear linked list terminated by NULL, so that repeatedly computing

     [ L := all but first item in L ]
    will eventually leave L containing NULL, if it is not initially NULL.
  2. When the loop condition is false, does doing nothing compute the loop's intended function? Yes, because when L is NULL the set "each item in L" is empty.

  3. When the loop condition is true, does executing the loop body, followed by computing the loop's intended function in the resulting state, compute the loop's intended function from the state before the loop is entered? By substituting the results of the function on line 11 into the function on line 8, we obtain

    	[ report, L := (report || description of first item in L)
                           	       	   || description of each item in (all but first item in L,
                     	    	empty ]
    which reduces to
    	[ report, L := report description of each item in L, empty ]

    Notice how we can do this reasoning just by manipulating the English phrases in simple ways; we don't need to formalize what an "item" or a "description" is, or even spend time thinking about what the terms mean.

The remaining steps are to show that the statements on line 6 and lines 13--14 compute their respective intended functions. To see what each procedure call computes, we would look at the intended functions for the corresponding procedure definitions and substitute the values of the arguments. The rest is trivial.

Cleanroom-style verification is done, not by a lone programmer endlessly pushing symbols around on paper, but by a group of developers discussing each correctness issue among themselves, as a group of mathematicians might do. This is done in a review meeting, led by the program's author and including several other team members.

The author of the code would present the correctness arguments very much as I have just done in the example. The other members of the team might raise questions or suggest ways of improving the code, and would be watching for mistakes in the code, the intended functions, and the correctness arguments. Each part is considered verified when all members agree that it is correct.

The team members do not do detailed written proofs of every point. If a part of the program obviously does what its intended function says, the team members note that fact and go on. In more complicated cases, they might spend a bit more time discussing why a particular section is correct, perhaps drawing pictures or writing and simplifying expressions on a whiteboard. They will do detailed derivations and proofs, using the full power of mathematics, only when absolutely necessary.

This, then, is the way that verification is made really practical: it is done at the appropriate level of formality, and no more formally than is necessary. It is not desirable to prove every point using full formality, right down to the level of mathematical axioms. In fact, in most cases it is counterproductive and a waste of time. After all, even mathematicians don't do it in all their work --- why should programmers?

Furthermore, the real purpose of the verification is not to establish that the code is a mathematically perfect object. On the contrary, the verifiers assume that there may be defects in the code, and they watch carefully for places where the verification fails. This is the way that defects are detected. The value of the verification is not in the proofs themselves; it is in the bugs that are found and removed in the process.

If a bug fix is obvious, the correction is made on the spot and the verification continues; otherwise, the specification and code are sent back to the author for rework. The review process also encourages careful examination of the program with respect to all aspects of quality, much as in other kinds of inspection meetings [Fagan]. If the reviewers conclude that any part of the code is lacking in some aspect of quality, that part is sent back for rework.

After all of the defects found in the review are fixed and the program is successfully verified, it is not assumed to be perfect even then. That is because the verifiers are human, and may make mistakes and overlook defects. Therefore, the program is tested thoroughly. It is not too unusual to find a few bugs. Fortunately, it happens that most such bugs are simple oversights that are found very early in testing, and are easy to find and fix. They are seldom the deep or subtle algorithm flaws that can escape detection until they cause mysterious failures in production use much later. It turns out that the latter are the kinds of defects that the verification is particularly good at exposing.

The testing serves another purpose that is even more important: as quality control. If too many bugs are found in testing, it is a sign that the development, and the verification in particular, were not done carefully enough. The code is sent back to the developers to be redone, and steps are taken to tighten up the development process. Data from the testing can even be used to draw statistical inferences about the reliability of the final product.

Thus, the Cleanroom approach to specification and verification, even though solidly based on mathematics and formal methods, is a very pragmatic one. These are its key characteristics: (1) using the "functional" style, (2) verifying for the purpose of finding bugs, (3) verifying by group interaction, (4) testing to complement the verification, and, probably most importantly, (5) using as much formality as necessary, and no more.

The course

I immediately returned to my college and began developing a course based on these principles. I offered it for the first time in the spring semester of 1993, and have taught it five times in all (as of the end of 1998).

The course is listed as Computer Science 427, "Zero-Defect Software Development". I mean the title to be provocative. I explain to the students that we can't really achieve zero defects consistently, even by using Cleanroom methods. But we can come a lot closer than most people do, and "zero defects" should be our goal.

Most students who have taken the course have been computer science undergraduates in their third or fourth year. A few have been graduate students, and a few have been programming professionals who are not full-time students.

The programming prerequisite for the course is the third course in our sequence, which happens to be systems programming; this is not for any specific systems programming topics, but just for a certain level of programming maturity and experience. Actually, most students who take the course have more programming training and experience than this. An additional prerequisite has been a first course in discrete mathematics; this requirement is now redundant, since our students are now required to take such a course in their first year.

The textbook is my own [Stavely]. I found that no previous book on Cleanroom was suitable for a course such as mine. As often happens, the book evolved from a collection of notes distributed to the students in place of a textbook.

I have modified and adapted the standard Cleanroom methods somewhat for the course. In most cases, the adaptations have been simplifications. One important omission is the use of "box structures": I find that my students can do successful specification and verification of the well-structured programs that they already know how to write.

Specification and verification are the core topics in the course. I also cover Cleanroom-style testing, which is done based on patterns of expected usage rather than code coverage, and the incremental development which is another standard part of the Cleanroom process. I omit statistical inference from the results of testing, because I consider it a more advanced topic and don't want to limit my course to students who have the necessary background in statistics.

I usually spend the first week giving an introduction to the Cleanroom process, followed by the concepts of functions and intended functions. This requires introducing a bit of notation, like the notation used in the example above, but not much. The next topic is the theory of verification, including the correctness questions that must be answered for each of the standard control constructs. I use plenty of examples, of course; wherever possible these are realistic sections of code in a real programming language such as C. I spend a lot of time on how to write good intended functions, because this is critical for later success. Intended functions must express accurately what you intend to specify, they must be clear and unambiguous, they must be as simple as possible without omitting anything important, and they must be at the right level of formality.

By the third week the students should be ready for a real exercise, the writing and verification of a very small program. One that I often assign at this point is one that counts characters, words and lines in a stream of ASCII text (like the Unix utility wc). It turns out that this assignment is deceptively tricky: a student must choose whether to count beginnings of words or ends of words, and what to do about possible special cases at the beginning and end of the input. A poor combination of choices, or any other aspect of the code that is complicated or obscure, will force the student's intended functions to be complicated, and then the verification will be long and tedious. Students see very quickly that writing verifiable code means working hard to make the code as clean and simple as it can be.

Once the students think that they have a successful solution to the exercise, I display a few of their programs to the rest of the class. (I find that students are usually willing to volunteer their programs for this purpose.) First we examine the intended functions to see if they look adequate. Often they will not be, and I and the other students make suggestions for improving them. But if they appear to be, I then attempt a few steps of the verification. Often this will fail, because the intended functions and the code really don't match. Usually the code actually works, but the intended functions don't accurately express what the code should be doing. The students learn a lot from seeing exactly why their early attempts are not quite right.

Specification and verification are skills that require practice to master, and so I normally assign two more programs that are a bit more complicated than this one, but not much. Meanwhile, I lecture on the protocol for group verification reviews. By the third assignment, I ask the students to verify their own programs in groups of three or four people. I circulate around the room, listen in, and mediate or give hints where necessary. By this time most students are writing programs and intended functions that really are verifiable, so the reviews usually go reasonably smoothly.

Once a group claims that a program is verified, I test it, playing the role of an independent testing team, and tell the group of any errors that I find. Occasionally I see a program or section of code that obviously doesn't verify, and then I ask questions. It almost always happens that the group didn't really verify it --- they merely inspected it to see if it "looked right". This is not acceptable procedure, of course, and it is important to watch out for it.

I also encourage the students to keep records of defects found in their programs and whether they are found in review, compilation or test, as well as other data such as code size and time spent in each activity of program development. Once students are in the habit of doing this, they can use the data to track their progress and measure their performance against their expectations, as in the Personal Software Process [Humphrey].

By this time we are at or just beyond the middle of the semester. At this point I assign a larger programming project for the students to do in teams, due at the end of the semester. The first three times, the class was small enough (no more than eight students) that I made the whole class a team. When enrollment is larger, the class is broken into teams of three or four people. I try to choose a project that will result in about a thousand lines of code (not counting blank lines and lines containing only comments or intended functions), developed in at least two increments.

Much of the project work is done outside of class time. Meanwhile, I cover the rest of the lecture material. This includes Cleanroom-style testing and incremental development, and special techniques for specifying and verifying such things as data structures and object-oriented programs, and recursion and programs in functional programming languages. If time permits, I may introduce other topics, including other formal methods in software development.

For further details on the material covered in the course, please see [Stavely].

Results: the class projects

One good measure of the effectiveness of the course is the quality of the students' final projects. Here are the results.

Each project was intended to be a real application program constructed to meet a real need; two of the five are still being used. The projects were a program to generate and fill in forms in the TeX markup language, an electronic mail handler for a subset of the SMTP protocol, a program to trace a network of HTML pages on local and remote machines and report broken links, a suite of programs to gather statistics on web usage from log files, and a program to generate stereotyped HTML pages from (almost) plain text files.

The first three projects produced programs of just under 1,000 lines each; the defect densities were 7.3, 6.5 and 2.1 per thousand lines respectively, measured from first execution. These results are not quite as good (on average) as those from industry Cleanroom projects, but are, in my opinion, quite impressive for undergraduates with no industrial programming experience and no previous exposure to formal methods.

The fourth and fifth times the course was offered, the results of the projects cannot be compared directly with those of the first three times, because the conditions were different. The fourth time, I allowed the students to use a programming language (Perl) which many of them did not know, because they wanted to learn it while doing the project. This was a mistake. The students let too many errors go undetected in verification because of their unfamiliarity with the language. The resulting bug density, over 20 per thousand lines, was higher than it should have been, and didn't impress the students with the effectiveness of the method as well as it should have. The project was successful even so: the bugs were easy to find and fix, and the programs were soon running with no errors, as we normally expect. Still, I should have insisted that the students do their projects in a language that they knew very thoroughly.

The fifth time, a few students were not fully committed to the course, didn't do all of the assignments, and never really mastered the techniques. A few others didn't complete their projects because of unusual external demands on their time. However, of the increments of the project that were actually completed, most had bug densities after verification of less than ten per thousand lines. For some the numbers were much lower, and one student's first increment of 501 lines had no bugs at all.

Thus, except for a few anomalies, the students' final projects have been very successful. The record shows that, under normal circumstances, any competent student, applying the methods taught in the course with a reasonable amount of care, should be able to contruct software containing ten bugs per thousand lines or less by the end of the course. Again, this is before the code is ever executed, and most of these bugs will be removed in testing. By current real-world standards, this is an impressive level of quality, especially for students.

Results: outside the class

Being in the academic world and not in industry, my students and I seldom have an opportunity to undertake really large software development projects, such as those of tens or hundreds of thousands of lines that have been reported in the Cleanroom literature. However, graduates of the course and I have constructed several real applications of modest size using the methods that I teach. Here are the results.

To gain experience in the methods that I was teaching, in 1995 I conducted a small-scale development project to produce a real application program for a real client. The client was the New Mexico Tech registrar, who needed a program to detect time conflicts and missing information of various kinds in a proposed class schedule for the college.

The development went very smoothly. The resulting program was 478 nonblank, non-comment lines in the C language. The development took 35 person-hours, broken down as follows (in hours and minutes):

requirements gathering and writing specification document 8:25
design and coding 9:10
verification review (4:05 elapsed time, 2 people) 8:10
removal of defects caught during review, and reverification 4:15
compiling and testing 5:00
debugging 0

The number of participants in a verification review is typically three to eight. However, it happened that I had only one other trained person available to help me: John Shipman, an early graduate of the course. Thus we used a verification team of only two people.

In the verification reviews, we caught four bugs. We overlooked two, both of which I happened to notice while reading the code again after the verification. Perhaps a larger verification team would have been caught one or both of these. Still, the bug density after verification was only 4.2 per thousand lines. No bugs at all were found in testing. The program has now been in production use for three years, with no malfunctions ever observed.

I also helped Mr. Shipman to verify parts of five programs which he wrote for his consulting and contracting business (Zoological Data Processing, Socorro, NM) and for the New Mexico Tech Computer Center. The verified parts totalled 3,022 nonblank, non-comment lines in the Icon and Python languages. In all, 29 bugs were left after verification, for a bug density of 9.6 per thousand lines at first execution. This number is somewhat misleading, however: 13 of these bugs were actually mistakes in types of variables, number and types of arguments to functions, etc. Icon and Python do not have strong typing; in languages with strong typing, these errors would have been detected by the compiler. When these errors are subtracted, the bug density was 5.3 per thousand lines. Again, this was with a verification team of only two people.

More recently, another graduate of the course, Alex Kent, developed a program using the methods of the course in his capacity as programmer for the New Mexico Tech Computer Center. The program was a printer-control system, written to replace an unreliable system already in place. The resulting code was 2,062 nonblank, non-comment lines of C.

Mr. Kent first tried to eliminate as many bugs as possibly by himself using semiformal verification. He found and removed 17 bugs in this way. He then used a three-person group to verify the resulting code in the normal way; they detected 11 further bugs. Five more bugs were detected in testing. As of this writing, the program has been in production use on one printer for about a week, and has processed close to a thousand printer jobs with no errors. Assuming that no bugs remain, the bug density after the one-person verification was 7.6 per thousand lines, which the three-person verification reduced to 2.4 per thousand lines and the testing reduced to zero.

All of these projects followed the expected pattern for programs developed in the Cleanroom style. Development went smoothly; there were no unpleasant surprises, all-night debugging sessions, or periods of panic. Bug density after verification was low, of the expected order of magnitude. Testing detected most or all of the remaining bugs, and most of those bugs were easy to diagnose and fix. The programs were soon up and running, and they kept running with few or no malfunctions after that.


There is no magic in the techniques that I teach in this course --- in fact, in hindsight, the whole method seems less a high-tech breakthrough than an application of good common sense. As is often the case, we owe this "common sense" to the brilliant insights and hard work of those who came before us! In any case, the method works, reliably and cost-effectively. And it can be taught, even to undergraduates.

These techniques give us one way of constructing software that can truly be said to be engineered, not just cranked out by ad-hoc coding and debugging that we try to keep under control through project management, as in much of what has been called "software engineering". Methods like these --- methods that are firmly grounded in fundamentals, that are adapted to the needs of the real world while retaining their essence, and that really do work in practice --- are what engineering is all about. Perhaps some day, when our field is truly mature, methods with these properties will be accepted as the core of software engineering education.


[Baber] Robert Laurence Baber. Error-Free Software: Know-how and Know-why of Program Correctness. Chichester, England: Wiley, 1991.
[Cobb] Richard H. Cobb and Harlan D. Mills. "Engineering software under statistical quality control." IEEE Software 7, 6 (November 1990), 44-54.
[DeMarco] Tom DeMarco. Controlling Software Projects: Management Measurement & Estimation. Englewood Cliffs, N.J.: Prentice-Hall, 1982.
[Fagan] Michael E. Fagan."Advances in software inspections." IEEE Transactions on Software Engineering , vol. SE-12, no. 7 (July 1986), 744-751.
[Hausler] P. A. Hausler, R. C. Linger and C. J. Trammell. "Adopting Cleanroom software engineering with a phased approach." IBM Systems Journal , vol. 33, no. 1, 1994, 89-109.
[Hoare] C. A. R. Hoare. "An axiomatic basis for computer programming." Commun. ACM 12, 10, October 1969, 576--583.
[Humphrey] Watts S. Humphrey. Introduction to the Personal Software Process. Reading, Mass.: Addison-Wesley, 1997.
[Jones] Capers Jones. Programming Productivity. New York: McGraw-Hill, 1986.
[Linger] Richard C. Linger. "Cleanroom process model." IEEE Software 11, 2 (March 1994), 50-58.
[Mills] Harlan D. Mills, Michael Dyer and Richard C. Linger. "Cleanroom software engineering." IEEE Software 4, 5 (September 1987), 19-24.
[Selby] Richard W. Selby, Victor R. Basili and F. Terry Baker. "Cleanroom software development: an empirical evaluation." IEEE Transactions on Software Engineering, vol, SE-13, no. 9 (September 1987), 1027-1037.
[Stavely] Allan M. Stavely. Toward Zero-Defect Programming. Reading, Mass.: Addison Wesley Longman, 1999.
[Trammell] Carmen J. Trammell, Leon H. Binder and Catherine E. Snyder. "The automated production control system: a case study in Cleanroom Software Engineering." ACM Transactions on Software Engineering and Methodology 1, 1 (January 1992), 81-94.

Return to Book Page

Sample Content

Table of Contents

(Chapters 2 - 11 contain Exercises and all chapters conclude with Notes.)

1. Introduction.

The Problem: Bugs.

The Cleanroom Method.

About This Book.

2. The Functions Computed by Programs.

Computations: States and Functions.

Representation of Functions: Concurrent Assignments.

Conditional Concurrent Assignments.

Local Variables.

Specification Using Intended Functions.

Other Notation Conventions.

Writing Intended Functions.

3. Verification.

The Structured Control Constructs.

Placement of Intended Functions.

The Substitution Principle.

Sequences of Statements.

Trace Tables.


Conditional Trace Tables.

4. Verification of Iterations.


Proving Termination.

Initialized Loops.

Writing Intended Functions for Loops in Isolation.

Other Forms of Indefinite Iteration.

5. Programming with Intended Functions.

A Pascal Program: Length of the Longest Line.

A C Program: Counting Letters and Digits.

An Icon Routine: Uncompressing a String.

A Study in Abstraction: The Registrar’s Program.

Keeping Things Simple.

6. Verification Reviews.

Why Verification Reviews Are Necessary.

Verification Reviews in the Cleanroom Process.

How Verification Reviews Are Done.

Example: Another Routine from the Registrar’s Programs.

Example: A Routine from a Test-data Generator.

Discussion of the Examples.

7. Definite Iteration.

Definite Iteration Over Sequences.

Sequence Variables.

Other Sequence Expressions.

Ranges of Integers.

Other Data Structures.

The Iteration Mechanisms.

Sets and Sequences in Program Design.

8. Data Abstraction and Object-oriented Programs.

Data Abstraction and Encapsulation.

The Abstraction Function.

Data Invariants.

Object-oriented Programs.

9. Recursion and Functional Languages.

Recursive Routines.


Mutual Recursion.

Functional Languages.

10. Testing.

The Role of Testing.

Usage-based Testing.

Test-data Generators.

Other Forms of Testing.

11. Incremental Development.

Developing a Program in Increments.

Example: Rehearsal Scheduling.

12. Where Do We Go From Here?

Other Parts of the Cleanroom Process.

Other Formal Methods.

What Have We Accomplished?

Prospects for the Future.

Hints for Selected Exercises.


Index. 0201385953T04062001


This is a book for programmers who want to write programs with no bugs, or at least with as few bugs as is humanly possible with current technology.

Programmers can do this by using a particular kind of specification and verification, which I will present in the book. By verification, I mean proving, mathematically, that a program agrees with its specification.

Don't panic. You will see that this does not require great mathematical sophistication, only knowledge of a few particular techniques and ways of doing things (and then, of course, practice). There is no rigorous mathematical formalism in this book. In fact, the specifications and proofs are usually only semiformal and are sometimes quite informal indeed.

Why verify programs? Because it is a very effective way of detecting bugs. We often find that, as we attempt to verify a program, that our efforts fail because the program is not, in fact, correct. When this happens, we have found a defect. Then, of course, we fix it.

This will be our fundamental assumption: The purpose of verification is to eliminate defects.

This is a very pragmatic view. Others may take the 'all-or-nothing' position that verification is a mathematical exercise that produces mathematically perfect objects and is worthless if it can't do that, but we won't say any such thing here. You won't see any claims in this book that verified programs cannot possibly contain any defects, do not need to be tested, and can be trusted unconditionally. On the contrary, we cannot always detect all defects in a program using verification: we are human, and we make mistakes. Thus, verified programs still need to be tested carefully, and I will present methods for doing this as well.

The methods presented in this book are based on a set of interrelated techniques developed at IBM in the late 1970s and early 1980s, under the name Cleanroom Software Engineering. IBM has used them on a number of substantial projects since then, with impressive success, and since the late 1980s their use has been spreading to other companies and organizations.

These methods work. Cleanroom methods can reduce the number of defects in software by an order of magnitude or more. The extra time spent in specifying and verifying is more than made up in reduced debugging and reworking time, so using these methods costs nothing extra in terms of effort or schedule time. And the methods scale up well to large systems. You don't need to take my word for it: the results have been reported in the open literature. You will see some of the data, with literature citations, in Chapter 1. And I, along with my colleagues and students, get similar results.

There is no magic here. Our methods require programmers to work carefully and systematically, to check their work, and above all to understand their programs very thoroughly. But these are things that they should be doing anyway, even if many donit. You can think of methods as merely tools that help programmers to do these things.

This book presents my adaptation of Cleanroom methods and should not be taken as an official statement of what Cleanroom Software Engineering is. I do not speak for the Cleanroom community, which in any case has dispersed and expanded far beyond the original group at IBM. In my presentation, I have chosen to emphasize the parts of the Cleanroom process that seem to me to be the most valuable (other Cleanroom practitioners may well disagree), and to deemphasize or omit others. I have also introduced a few new ideas of my own.

The book was written with classroom use in mind and is based on a course that I have been teaching since 1993. Besides presenting the fundamentals of specification, programming, verification, and testing in the Cleanroom style, I have included extra explanations, hints, and examples covering parts of the process that students seem to find difficult at first. Mine is a one-semester undergraduate course for computer science students, but a similar course could be taught at the graduate level. The book is also suitable for an intensive short course in an industrial or continuing-education setting.

Programming professionals will also find this book useful for self-study. You will see in Chapter 6, however, that the recommended way to do the verification is in small groups, in review meetings, rather than on your own. Thus, if you intend to study this book outside a course, you will get the best results if you can work with at least two or three other people who already know these methods or who would like to study them with you.

As a minimum, the reader should be familiar with the basics of algorithms and data structures, as taught in the usual first course in the subject, and with the rudiments of discrete mathematics (sets, relations, functions, and so on). The reader should also, of course, have enough experience in programming and debugging to appreciate how valuable zero-defect programming would be if we could do it!

I also assume the ability to read and understand the structure, at least in general terms, of programs written in languages in the Algol family (including Pascal, C, Modula, Icon, and so on). In Section 9.4 I also use the functional languages Scheme and ML, but that section can be omitted by readers who have no familiarity at all with functional languages. Similarly, in Section 8.4 I use the object-oriented features of C++ and Python languages in my examples, but that section can be skipped by readers who have no familiarity at all with object-oriented programming or languages.

The methods of this book can be applied to programs in a wide variety of languages. To demonstrate this, extended examples of programs, parts of programs, and program development will be presented using the following languages: C, C++, Icon, ML, Pascal, Python, and Scheme. Particular features of these languages are explained where they are used, so a detailed knowledge of all these languages is not required. However, more of the examples are written in C than in the other languages, so familiarity with C would be helpful.

In many places in the text, short fragments of programming-language code are used to illustrate the ideas being presented. Most of these fragments are written in a generic, 'ordinary' procedural programming language: except for trivial details such as where the semicolons are, the language of each fragment might be Pascal, or Modula, or Turning, or Icon, or Ada, or any of a dozen others. The syntax may vary, but the meaning should always be apparent. When an example is meant to be in a particular programming language, I will say so.

Literature citations and comments on related work appear in the Notes sections at the end of each chapter.

My most important source does not appear in those citations. I received my first training in Cleanroom methods at a one-week intensive workshop at the Rochester (New York) Institute of Technology in the summer of 1992. The workshop was sponsored by the (U.S.) National Science Foundation. It was a condensed version of IBM's in-house training course and was conducted by Philip Hausler, Mark Pleszkoch, and Steve Rosen, of the IBM Cleanroom Software Technology Center. I am most grateful to them and to IBM, NSF, and RIT for making the workshop possible. Although I do not cite this workshop in the Notes in specific chapters of the book, the workshop was a primary source for much of the material, particularly the material of Chapters 2 through 6 and Chapters 10 and 11.

I am also most grateful to friends and colleagues who have made valuable contributions to this book through consultation, suggestions, and corrections: Michael Deck, John Duncan, Doug Dunston, Suzanne Flandreau, Alex Kent, Ray Piworunas, Steve Powell, John Shipman (for help on more occasions than I can count), and Laurie Williams; also to Peter Gordon at Addison Wesley Longman, for support and suggestions; and to all of the students in my classes over the past five years, for helping me to debug my presentation of this material. Many thanks to all of you.

Cleanroom methods are the best that I have found for producing high-quality software, and experience has shown that they can take us a long way toward the goal of zero defects. I am confident that they will continue to be developed and refined and reasonably sure that different but even better methods will eventually be developed. Meanwhile, Cleanroom methods have proven their worth and are ready for practical use today, and I hope that the presentation in this book of my version of them will help to promote their use.

In any case, we can begin by changing our attitude toward bugs: they are not inevitable, and they are not acceptable. Zero defects is the goal that we should strive for. We will not always be able to achieve it, but we should try.

A. M. S.
Socorro, New Mexico





(Please note major corrections are in red, minor corrections in black)

page vii, line 2 of paragraph 4: "often find that, as we attempt to verify a program, that..." should be: "often find, as we attempt to verify a program, that..."

page viii, line 14: "You can think of methods..." should be: "You can think of our methods..."

page ix, line 8: "of C++ and Python languages..." should be: "of the C++ and Python languages..."

page ix, a little over halfway down: "... or Modula, or Turning, or Icon, ...": "Turning" should be "Turing".

page 8, quotation at the bottom, first line: "Studies have shown that for every new large-scale..." should be: "Studies have shown that for every six new

page 15, paragraph 3 (not counting the display), line 4: "... conditional assignment is selected to determine..." should be: "... concurrent assignment is selected
to determine..."

page 26. line 11: "... as we will see.)" should be: "... as we will see)."

page 27, line 18--19: "... it is entirely in English and contains no mathematical notation" should be: "... it is almost entirely in English and contains very little
mathematical notation".

page 50, the last full line: "... proceed as if this were an ordinary truth table," should be: "... proceed as if this were an ordinary trace table,"

page 65, point 3 in the second list of three points:
"3. After doing S2, if B is false..."
should be:
"3. After doing S1, if B is false..."

page 72, the fifth line from the bottom:
| true --> I ]
should be:
| true --> maxlength, input := maximum of maxlength and length of first line in input, empty ]

page 80, 82, 83: "isletter(c)" should be "isalpha(c)" throughout.

page 81, third paragraph: replace the whole paragraph with the following:

If we like, we can cause our counters to be initialized to zero by default by declaring them as static. If we do, the explicit initialization is redundant, and can
be deleted; the declaration of the variables, by itself, computes this initialization. It is a good idea to put the intended function

[ letters, digits, others := 0, 0, 0 ]

on the declaration itself, to make explicit the fact that we depend on this initialization. Even if c received a default initial value, we would not put such an
intended function on its declaration, because we do not care what its initial value is.

Then, on page 83, change:

int letters, digits, others; /* (counts of these in input) */
int c; /* (input buffer: a character or EOF) */


static int letters, digits, others; /* (counts of these in input) */
int c; /* (input buffer: a character or EOF) */

page 84, the third equation defining run-length-decoding, at the top of the page: put a period after "or a character not part of a run-length sequence", and
insert after that:
"run-length-decoding of an empty string is an empty string."

page 87, the first code section in the middle of the page:
result ||:= copies(s[i+1], ord(s[i+2])
should read:
result ||:= copies(s[i+1], ord(s[i+2]))

This correction appears again on page 87, in the short code section that follows, and in Figures 5.5 and 5.6 on pages 88 and 89.

page 112, the line numbered 18 in Figure 6.2: "if (L->rest == NULL)" should be "if (L->next == NULL)"

page 123, the line just below the program segment at the top: "The syntax every c := !! &input() means..." should be: "The syntax every c := !!
&input means..."

On page 126, the third line:
for x in letters
should be
for x in letters:

page 136, line 6 of the first paragraph of section 7.6: the word "programming" is misspelled.

page 157, line 3 of the section "Implementation:" in the code: "This is an expensive operation, should be infrequent..." should be: "This is an expensive
operation, but should be infrequent...".

page 157, in Figure 8.1: the last four lines of the figure should be:
Bigstring bigstringEmpty();
void bigstringMakeEmpty(Bigstring b);
void bigstringAppend(Bigstring b, char c);
char * bigstringToString(Bigstring b);

page 190, line 2: "he found that..." should be "they found that..."

page 191, line 4 of the first paragraph of section 10.3: "... and the possible values..." would be clearer and more grammatical if it said "... and on the possible

page 230, under [John84], and page 231, under [Mill84]: "Fla.." should be "Fla." in both places.

Submit Errata

More Information

Unlimited one-month access with your purchase
Free Safari Membership