An Interview with Watts Humphrey, Part 26: Catastrophic Software Failures and the Limits of Testing

In this transcript of an oral history, Grady Booch interviews SEI Fellow Watts Humphrey. In Part 26, Humphrey discusses the software failures of the Therac-25 and the V-22 Osprey, why testing catches less than 1% of all scenarios, and why good software is like a symphony, where one bad line of code -- or one bad musician -- can ruin the entire piece.

This interview was provided courtesy of the Computer History Museum.

See the entire interview.

Initial PSP Use

Humphrey: So a lot of people were very helpful with my programming problems. But basically coming up with the PSP and the PSP idea and how to do it, by and large pretty much just about everybody I was working with at SEI and elsewhere thought I was kind of a nut. I mean “what in the world is the point of doing that?” was sort of the reaction that I was getting. And remember I’d gone through and written the programs, and then I gave a talk down in Texas to the people at TI. I did talks to a bunch of places trying to get people interested in trying the PSP. A couple of graduate students at CMU, who were PhD students, they expressed interest in it, and so I got them copies of the process and said, “Here -- why don't you use that to write some programs,” and they couldn't. I mean, they used bits of it that they thought would work and they didn't bother with the others. They didn't have the discipline, they didn't have the data. I mean, they were bright guys, but they really didn't have the dedication I did. And, of course, without a tool, it's not that hard to gather the data, but it's hard to get programmers to gather data without some kind of support tool.

There was a group -- a Siemens Research lab in Princeton -- and one of the engineers there had heard about what I was doing. He called me, he was interested. So I went down and talked to them. And they were very interested. They wanted to do it. I talked to the team, they were excited about it. The director of the lab actually told the team that his top priority was to have them do it this way and that the schedule and everything else was secondary. And I thought this was a level of management support I really needed. The guy was great, I thought it was marvelous.

And so I went home all encouraged that I'll get these guys to actually use it. I'd given them the process, I'd described it to them. And so I kept calling like every week: where do you stand, how are you doing? Oh we're doing a prototype here, we don't have time to do it yet. And they never did. They never got to it. They were in too big a hurry to do other stuff. What was interesting was as part of that, I'd actually taken time to sit down with each of the engineers on the team. There were five engineers, and they had a process guy there. He was sort of a technology guy but he was sort of acting as the process guy I guess. But in any event, he and I would sit down and interview each of the engineers individually on the team. And what I did was I wanted to find out what process are you using personally. And they didn't know how to answer that question, so I'd kind of lead them through it.

I said, okay, when you get an assignment to write a module of code, say 2,000 lines of code, what do you get? So they'd tell me, and then say here's how I'd start. I'd say well, what do you do next? So they'd start to describe that. So they would describe what they did, and I'd lead them step by step, and I wrote down on the board what they were saying. And every so often they'd say, here's what they did next. And I'd say, "Well don't you have to do this too?" And so they, "Oh yeah, well I forgot that." And so we built the process for each of these five people. I don't know if it's in my notebooks or not. But I wrote it on the board. I'd write it where they could see it. But in any event, they were all -- the five of them on one team -- totally different. It was amazing. I mean, one young guy came right out of school, seemed like a real sharp guy, but he basically started coding in the middle and he didn't do any design work at all. He just sort of had this idea of what it was and how it was going to work, and so he started coding in the middle of this big thing, and he'd sort of build it.

And at the other extreme, there was an engineer -- I think he was actually from Germany -- and he told me that he went through his process, which was a marvelous process, very disciplined. He didn't have data but he really did a very careful job, and he said he'd never had a defect found in any of his programs. I mean, this guy was proud of what he was doing. He was producing, really, masterpieces of programming. And I thought to myself, we've got these masterpieces going in with this junk. It was just staggering to me that we had this enormous range of the quality of the code going into one system these people were building. And it was sad. And my point is, we see that everywhere. Nobody knows who's using what process or what quality they're producing. So we get beautiful code by some people and junk by others, and the junk will kill the program. Just kill it. And so they run into all these problems, months of testing. Well, their product -- they never could get it to work. And the reason is the junk, there was just too much of it, and they could never get it fixed. So, that's kind of sad.

And so the individual process is a critical piece of this, and that's why I went all the way to the PSP. And so, as I say, I never could find anybody who would use it and so I was really very frustrated. I think I may have mentioned I was at a conference in Berlin, a process conference where Peter Feiler and I gave a paper on terminology and process. It's one of the technical reports we got. But while I was there, I talked to Dr. Mahdevi who is the professor who suggested that I teach a course, and I wrote that textbook I mentioned. And I had a marvelous time because I was going to these various symphonies, and I remember it was interesting sitting in the audience watching a symphony play because, as I said, I went to a concert every night when I was in Berlin, it was a thrilling experience. I was there for what must have been almost a week and they have such a rich place, their symphony hall is gorgeous, you know -- round -- and you can sit sort of behind the symphony.

Booch: And the time you were there that's when the wall had just gone down so the dynamic was amazing.

Humphrey: Oh yes, the wall had come down, I walked over there and around it, was looking at the whole area.

Booch: And you saw the piles of rubble in East Berlin. That was amazing to see.

Defective Software

Humphrey: Oh exactly, exactly. It was a thrilling experience. But in any event I was looking at that and it's amazing, by watching how symphonies work what kind of feeling you get from the dynamics of doing beautiful work. And that's the software issue. We really need symphonic teams. The hacker business is so sad because, just like in a symphony, any individual instrument can destroy the whole effect, and that's exactly true of software. Any individual piece of code can destroy the whole thing. And that's the problem. I'm trying to remember the name of the medical instrument, remember that killed a bunch of people, the Therac-25 machine. And that was a trivial error in an error recovery program. I mean it wasn't something that normally gets used. It was off in the side somewhere. I remember an interesting sideline -- earlier at IBM on OS 360 we had performance problems.

Booch: The Therac-25, that was the machine.

Humphrey: That's what it was, the Therac-25. And that was an error recovery program that had a defect in it, and it missed getting a reset and I believe killed half a dozen people. And so we get those problems. And exactly the same thing with the V-22 Osprey. Do you know the story about the V-22?

Booch: I do but the people listening in might not so why don't you relay.

Humphrey: Well, on the V-22 Osprey, I actually went out and talked to the executives of the company that built that system, and I was talking about the quality of their software. They bridled, they said we have very high quality software. I said, the fact that it's killed 13 marines is a good measure of quality. They didn't buy that. But in any event, the V-22 Osprey is this tilt-wing aircraft that you can fly as a regular airplane and then as you're coming in to land you tilt the wings up and so the propellers are pointed up instead of forward and it lands as a helicopter. I mean it's an enormous technical achievement to build that thing.

And, of course, one of the issues that they had was what happens when the hydraulics fail while you're tilting the wing? You've got a whole hydraulic system that does that, and so they put in a whole electronic backup system in case you have a hydraulic failure -- an electronic backup system that will fly the airplane electronically. It turned out in this particular case, they were in a test flight with a bunch of marines in the aircraft, they were coming in to land, and as they were tilting the wings to bring it in, the hydraulics failed.

And so the backup system took over, the software that controlled the electronics system took over, and the software had an error in it that essentially crossed the controls. And of course a pilot can figure that out if he's got a little time, but when you're coming in to land that's a little hard, and they crashed and killed everybody. And the point is, I run into this in all kinds of things, the number of possibilities that have to be tested, and this is what the executive was telling me, "Oh, we tested it exhaustively." And I agreed they tested exhaustively but exhaustive testing won't find all the defects. People don't know that. They don't understand that. And let me branch a little piece on here. When you think about a big program, big complex system program, 2 million lines of code something like that, and you run exhaustive tests, what percentage of all the possibilities do you think you've tested? Any idea?

Booch: Oh it's going to be an embarrassingly small number probably in the less than 20, 30% would be my guess. What would you say?

Humphrey: You're way off. Way off. I typically ask people and I get back numbers 50%, 30%, that kind of thing. I asked the people at Microsoft, the Windows people, what they thought. And then we chatted about it a bit and they said about 1%.

Booch: Oh my goodness.

Humphrey: And my reaction is they're high by several orders of magnitude. And let me explain the reason why: the conditions that actually affect testing. I mean, testing will only test a specific set of conditions and the conditions that will affect testing include, for instance, how many job streams are running, what the configuration is for the system at that time, all kinds of things. And also what the operator's doing, what the hardware conditions are. So you can have a hardware failure, you could have data errors, you can have operator errors, you can have just an enormous range of things. And you make a list of all the variations, and then by the way you need different data values too. And so you got different data values. So if you go through and actually see what are the conditions -- I did this on a program with 57 lines of code that I'd written for the PSP. I went through and analyzed exactly how many test cases I'd have to run to exhaustively test it. I didn't worry about different data values yet. I assumed I would classify those and I could come up with that and I never did go back and do that. But it was like 268 test cases for 57 lines of code. I mean, it's extraordinary. And that's true. So people can talk about automated testing and that sort of thing, but the number of possibilities is so extraordinary you literally couldn't do a comprehensive test in the lifetime of the universe today.

Booch: So in effect there's a combinatorial explosion due to the number of possible states.

Humphrey: Exactly. And you look at all the number of possible ways things can connect, I mean it's extraordinary. And so when people have this enormous faith in testing it's vastly misplaced. And so the quality problem is really severe. And that's the issue that I was getting at. My sense was if you didn't deal with quality exhaustively at the very beginning, at the smallest unique level of the program, you will never solve the problem anywhere else. And that you can do. And so I found, and this is what I was after finding out with the PSP prior to my quality study, could I produce defect-free programming? And my contention that a program was defect-free, if I wrote the program, I had a design, I went through it, I produced a comprehensive test and if I wrote the program and I compiled it without error and I ran all the tests without error, then I figured I probably had a pretty good program. Now that was without error the first time I ran the tests. So I'm now treating testing as not a way to find defects, it's a way to verify the quality of the work I've done.
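
The combinatorial argument in the exchange above can be sketched in a few lines of Python. This is only an illustration: the condition names and counts are hypothetical, chosen to show how quickly independent conditions multiply, and are not figures from the interview. The test rate and the age of the universe (roughly 13.8 billion years) are likewise stated assumptions.

```python
from math import prod

# Hypothetical condition categories and how many distinct values each can
# take. These names and counts are illustrative assumptions only.
CONDITIONS = {
    "job_streams": 8,
    "system_configurations": 50,
    "operator_actions": 20,
    "hardware_states": 10,
    "data_value_classes": 100,
}

# Assumed figures for the "lifetime of the universe" comparison.
AGE_OF_UNIVERSE_SECONDS = 435 * 10**15  # about 13.8 billion years
TESTS_PER_SECOND = 10**9                # an optimistic, made-up test rate


def count_test_cases(conditions):
    """Number of combinations if every condition varies independently."""
    return prod(conditions.values())


def coverage_fraction(tests_run, conditions):
    """Fraction of all combinations exercised by `tests_run` distinct tests."""
    return tests_run / count_test_cases(conditions)


def binary_conditions_needed(tests_per_second, seconds):
    """Smallest n for which 2**n combinations exceed the test budget."""
    budget = int(tests_per_second * seconds)
    # For a non-negative integer b, b.bit_length() is the smallest n
    # with 2**n > b.
    return budget.bit_length()


if __name__ == "__main__":
    total = count_test_cases(CONDITIONS)
    print(f"combinations: {total:,}")
    print(f"coverage of 10,000 tests: {coverage_fraction(10_000, CONDITIONS):.4%}")
    n = binary_conditions_needed(TESTS_PER_SECOND, AGE_OF_UNIVERSE_SECONDS)
    print(f"independent on/off conditions needed to exceed the budget: {n}")
```

With these made-up counts the product is 8,000,000 combinations, so a suite of 10,000 distinct tests would exercise about 0.125% of them. Under the assumed rate, fewer than a hundred independent on/off conditions are enough to produce more combinations than could be run in the assumed age of the universe.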
