Assessing Product Quality
Another popular definition of quality is "absence of defects" (Crosby, 1979). In this section we put forward an argument that this definition is also in accord with the one we have provided and that the perspective it provides will assist in the construction of a bridge that will lead us toward a model that relates internal measures of software to external characteristics and to the measures of process quality.
We mentioned that the fitness-for-purpose argument holds that all external attributes of software relate to how the system is seen to satisfy some need. As such, any measure of an external quality attribute would be a measure of the expectation that the software must do what it is supposed to do. Thus we can say that if a product does not do what it is supposed to do, it has failed in some respect. We therefore define a failure (for the moment) as an instance of lack of fitness-for-purpose.
Similarly, we said that fitness-of-form argues that internal characteristics of the software must have been designed satisfactorily. In other words, the software must also be what it is supposed to be. If the software does not satisfy this condition, we can say it has failed, or has a failure. This new notion necessitates the separation of two concepts: failure due to function on the one hand, and failure due to form on the other. We therefore define the former as operational failure and the latter as structural failure, in accord with the definitions of operational and structural failure in all other fields of engineering (Ditlevsen & Madsen, 1996).
Products fail because they are defective, or because they develop defects in the course of operation. Software, however, does not wear (Pressman, 1995): given no change in requirements, it does not develop defects as a result of utilization, unlike a mechanical device that eventually yields to metal fatigue. Therefore it is only the former category of failures (those designed into the product) that is of importance in the case of software. It is logical to assume that failures designed into the product all result from some omission, mistake, or error by someone in the development team. Therefore, the concepts of defects and failures are very closely linked, and it is important to manage defects in the course of software development if high quality is desired.
In the next section we first provide an overview of the current situation of software engineering in terms of our ability to manage defects, and then build a model that relates the three perspectives of importance mentioned in our definition of quality.
Defect Management as a Means of Ensuring Quality
Although the situation has improved recently, there has traditionally been substantial confusion with respect to defect prevention and defect removal terminology. Beizer (1991) equated the terms bug, glitch, error, slip, fault, and oversight, among others, as referring to some shortcoming in software. Ghezzi et al. (1992), although distinguishing between defects and failures, equated defects with faults, errors, and bugs. Similarly, a number of authors (Ghezzi et al., 1992; Hetzel, 1988; Humphrey, 1995) explicitly stressed the need to distinguish between debugging and testing, implying the existence of confusion between the two terms. Hetzel (1988) stated that "a number of early papers on 'testing' actually concern 'debugging'" (p. 4).
Even when testing is distinguished from debugging, different authors have defined the activity differently. Hetzel (1988) defined testing as "the process of establishing confidence that a program or system does what it is supposed to" (p. 4). On the other hand, Myers (1979), in a still widely cited work, defined testing as "the process of executing a program or system with the intent of finding errors" (p. 7). This view, the opposite of Hetzel's, supports the famous assertion by Dijkstra (in Dahl et al., 1972) that program testing can be used to show the presence of bugs, but never their absence. A discussion of the differences between the two definitions just given, together with Hetzel's defense of his own, is provided in Hetzel (1988). That argument, which revolves around the assertion that Myers's definition is too narrow and implies application of testing only after a program has been written, in fact highlights another point of confusion. Hetzel, in attempting to mount a defense of his view, confuses (a) product testing with product acceptance and (b) defect identification (e.g., black box testing) with defect propagation prevention (e.g., design inspection).
Similarly, in an older work that remains very influential, Goodenough and Gerhart (1975) defined a "successful" test as one that does not uncover any deficiencies, whereas Myers (1979) and others (e.g., Beizer, 1991) define a successful test in exactly the opposite way: as one that uncovers at least one deficiency.
Under these conditions it is necessary to explicitly enumerate the definitions that pertain to our subsequent work. Effort has been expended to arrive at a consistent, popularly recognized set of such definitions. The set presented next is derived to be consistent with current prevalent definitions, particularly those of Ghezzi et al. (1992), Humphrey (1995), and Sommerville (1995). The wording and the taxonomy presented based on these definitions are, however, my own.
We start by restating that a software program can be viewed as three distinct entities, with the distinctions between them regarded as critical:
The source code. When viewed as source code, a program may be regarded as a very precise (formal), yet intermediate model of the requirements developed through progressive refinement of an initially ambiguous and less formal model of a possible solution (design model). This is done by constraining the modeling syntax and the ultimate transfiguration into a modeling language that is executable (the target programming language). As such, source code can be deemed as a formal model of the requirements, but it is often popularly referred to as software, as in "I am writing software."
The executable. Probably the most nebulous of the three manifestations, the executable is another formal and intermediate model of the requirements translated into a syntax that can be executed on the target machine architecture. At this stage it might be considered an integral and indistinguishable part of the system architecture that is to ultimately provide an execution of the solution sought. This too is often referred to as software; for example, "Give me a copy of that software," or "Let's load the software."
The execution. This is the solution provided through running of the program-executable code on some hardware. In other words, the execution is a model of some virtually constructed environment created to satisfy the intended requirements (hopefully the one understood in common between the producer and the client). Arguably, this virtual environment is the true software product. This too is referred to as software, as in "Our software would not let us do that."
With this observation of the multiplicity of software in mind, we now attempt to provide definitions for some common terms in this area.
We mentioned earlier that software product quality is best discerned through the external product-quality attributes such as functionality and reliability. It is therefore through these attributes that we recognize when quality expectations are not being met by a particular product. We term such shortcomings failures.
A failure is said to have occurred when the software product does not satisfy expectations (Ghezzi et al., 1992; Sommerville, 1995). This might take the form of an incorrect result, nonprovision of a result, presentation of an unexpected system state, or a user perception of a poor interface.
Not all failures are directly related to software source code or an executable. In fact, what was just described is best termed system failure, with software failure being a specific case. For example, a failure observed as the presentation of an erroneous result might be traced to the executable. Another failure caused by an interruption in the power supply is not. In our discussion, we only consider the failures of the former type. An exceptional case must be highlighted in that the way the software is to behave under certain specific environmentally caused failures (e.g., power interruption) might be specified. Lack of fulfillment of such requirement should still be considered an omission (lack of functionality).
In fact, software failure itself may be divided into two categories: operational failure, when the software does not do what it is supposed to do with regard to some external quality attribute, and structural failure, when the software is not what it is expected to be. An example of the former might be a division by zero resulting in a reliability failure. An example of the latter might be a maintainability failure, such as the nonprovision of a class header. Note, however, that a structural failure of this kind is manifest in the source code, but does not propagate to the executable and beyond.
A fault is an incorrect state entered by a program executable (Ghezzi et al., 1992) or an incorrect transformation undergone; for example, the result of a division by zero with the exception not being handled. Faults are characteristics of the executable code and only result in failure when they are executed with a valid set of data causing transition from a valid state or resulting in the provision of incorrect output. Faults do not necessarily cause software failure, which is also impacted by how the software is used. Reliability (related directly to failures) is a function of executable faults as well as how the system is executed. The complex relationship between executable faults and execution reliability has been studied by a number of investigators (e.g., Littlewood, 1990; Mills et al., 1987).
Defect
A defect is an imperfection in the software engineering work product that requires rectification (Humphrey, 1995). Some of these defects cause the generation of an executable that contains a fault (e.g., some code that sets the value of a variable x to zero for y = 3 when the intention is to make x = y).
A defect that causes the generation of a fault is called a bug. This implies that some defects do not generate faults. One such defect might be the provision of a comment on the wrong line or in the wrong section of the source code. A bug is also sometimes called an error (e.g., in the context of compiler reports). This is not the case in our terminology, as the term has a separate meaning described next.
In our terminology, an error or a mistake is a commission or omission at any stage of the software process by a software engineer that results in a defect (Humphrey, 1995).
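The chain from defect to bug to fault to failure can be illustrated with a small Python sketch built on the x = y example given earlier. The function names `copy_value` and `observe` are hypothetical and introduced only for illustration; the sketch assumes that the expectation on the execution is simply x == y.

```python
def copy_value(y):
    """Intended behaviour: return x such that x == y for every y."""
    if y == 3:
        # This comment sits on the wrong line: a defect that generates
        # no fault in the executable, and is therefore not a bug.
        x = 0  # bug: a defect that generates a fault when y == 3
    else:
        x = y
    return x


def observe(y):
    """Return True when the execution meets the expectation x == y."""
    return copy_value(y) == y


# The bug is always present in the source code, but a failure is observed
# only for the executions that actually reach the faulty state.
results = {y: observe(y) for y in (1, 2, 3, 4)}
print(results)  # {1: True, 2: True, 3: False, 4: True}
```

The error, in the sense defined here, is the engineer's mistake in writing the special case for y = 3; it is not visible in the code itself, only its resulting defect is.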
Operational Failure in Software
Operational failures in software can manifest themselves in the form of shortcomings in the external quality characteristics of software that deal with what the system must do. The important ones we mentioned were functionality, reliability, and usability. Maintainability and some other attributes not included in this work, such as reusability, mainly deal with structural failure.
We define functionality as the extent to which the software product under utilization accurately implements the operations required of it as defined in the requirements specification. This is in line with the definition provided by ISO 9126. In other words, functionality is the degree of faithfulness in the implementation of requirements, either stated or implied (Younessi & Grant, 1995). Missing functionality therefore naturally detracts from operational quality.
A precise and universal measure of functionality is not available, but functionality is usually estimated as the coverage demonstrated during acceptance and in-field utilization of the functional behavior of the system in comparison to the specified requirements. A highly functional system therefore has a high proportion of its specified required functions demonstrably implemented.
To estimate the degree of functionality, therefore, one needs two artifacts: the specification and the final product. There is also a tacit assumption that the specification itself is complete and reflects all the requirements explicitly. Unfortunately this assumption is not safe, as it has been shown that a large portion of problems encountered during system acceptance and subsequently is due to incomplete specification of requirements; that is, the existence of tacit requirements (Lauesen & Younessi, 1998).
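As a rough illustration of the coverage estimate, the following Python sketch compares the two artifacts as sets of function names. The function `functional_coverage` and all the function names in the example data are hypothetical, and the sketch shares the unsafe assumption noted above: it can only count requirements that were written down.

```python
def functional_coverage(specified, demonstrated):
    """Fraction of specified functions that were demonstrably implemented."""
    specified = set(specified)
    if not specified:
        raise ValueError("the specification lists no required functions")
    # Functions demonstrated but never specified do not raise the estimate.
    return len(specified & set(demonstrated)) / len(specified)


# Hypothetical specification and acceptance-test record.
spec = {"create_order", "cancel_order", "print_invoice", "query_stock"}
shown = {"create_order", "cancel_order", "query_stock", "export_csv"}

coverage = functional_coverage(spec, shown)
missing = sorted(spec - shown)

print(coverage)  # 0.75
print(missing)   # ['print_invoice']
```

A tacit requirement that never reached `spec` would leave the estimate unchanged, which is exactly why the completeness assumption matters.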
Putting the issue of completeness aside for the moment (we discuss later that these failures can also be traced back to defects, not necessarily in code but instead requirements or architectural design), we concentrate on the issue of the implementation and its extent and accuracy of coverage of the explicitly stated requirements.
Correctness is a very significant relationship between functionality and an internal characteristic of the software product. To determine whether a software product is functionally defective, one usually assesses it against its specification. Deviations from the specification then indicate the defect areas. These are usually one of two types.
Defects of omission are those deviations from the specification that lead to some intended function not being implemented; for example, a software product cannot display the result of a calculation or query due to the omission of a print function, although the specification requires it.
Defects of commission are those deviations from the specification that, although functionally implemented, fail to operate properly. They provide incorrect or unexpected results despite the provision of valid inputs; for example, the print function from the previous example has been implemented but prints the address of x rather than its value.
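The two types can be contrasted in a short Python sketch based on the print example. The three product classes, the `report` operation, and the `classify` helper are all hypothetical; Python has no direct notion of a variable's address, so `hex(id(x))` stands in for "the address of x".

```python
class ProductWithOmission:
    """The specification requires a report function, but none was written."""


class ProductWithCommission:
    def report(self, x):
        # Implemented, but reports an address-like identity instead of the value.
        return f"result = {hex(id(x))}"


class ConformingProduct:
    def report(self, x):
        return f"result = {x}"


def classify(product, function_name, argument, expected):
    """Compare one specified function against the product's behaviour."""
    operation = getattr(product, function_name, None)
    if operation is None:
        return "omission"
    return "conforms" if operation(argument) == expected else "commission"


for product in (ProductWithOmission(), ProductWithCommission(), ConformingProduct()):
    print(type(product).__name__, classify(product, "report", 42, "result = 42"))
```

In both defective cases the valid input 42 is supplied; the difference lies in whether the required function is absent or present but wrong.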
The relationship specified between the external quality attribute of functionality and the internal software characteristic of correctness implies that omissions and commissions do introduce defects into the software that can manifest themselves as shortcomings in functionality, or to use a better term, functional failure. These defects we term defects of functionality or F-type defects. Removing F-type defects should logically improve the functionality of the software.
Reliability
Reliability has been defined as the fundamental statistical measure of failure-free operation (Sommerville, 1995).
Assuming functionality as given, the failure-free operation referred to in this definition differs from that in the omission and commission defects already mentioned. We introduce the term transitional failure to define those failures to meet the specification when the system makes a transition from one state to another. Under such circumstances, although the operation or functionality required has not been omitted from implementation, the system still (often only under certain circumstances) produces unexpected results as a consequence of making the transition into an invalid state. For example, we have a transitional failure when a statement makes a division by zero calculation when given certain values of a variable x although the basic formula for the calculation is present and in accordance with the specification.
It is important to distinguish between functional failures of commission when dealing with functionality and those of the transitional type discussed here. The distinction is that although the former produces a wrong result, the state of the system after the production of such a result is expected to be valid and correct (e.g., printing the address rather than the value of a variable). In the case of transitional failure, the result may or may not be correct (it usually isn't), but the state after the operation is unexpected, invalid, and incorrect (e.g., dividing by zero when attempting to print the value of a variable).
Reliability can then be expressed as the probability that a software product will not produce a transitional failure when utilized in its operating environment for a given period of time.
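One way to make this probability concrete is sketched below in Python. It rests on assumptions that are not part of the definition: the operating environment is modeled as a discrete operational profile over input values, the "period of time" is modeled as a fixed number of runs, and runs are treated as independent. The names `average_load`, `fails_transitionally`, and `profile` are hypothetical.

```python
from fractions import Fraction


def average_load(total, x):
    # R-type defect: the formula matches the specification, but nothing
    # guards against x == 0, which drives the program into an invalid state.
    return total / x


def fails_transitionally(operation, *args):
    """Return True if running the operation ends in an unhandled division by zero."""
    try:
        operation(*args)
    except ZeroDivisionError:
        return True
    return False


# Assumed operational profile: how often each value of x occurs in use.
profile = {0: Fraction(1, 100), 1: Fraction(49, 100), 2: Fraction(50, 100)}

# Probability that a single run is free of transitional failure.
per_run = sum(
    p for x, p in profile.items()
    if not fails_transitionally(average_load, 10, x)
)

# Probability of no transitional failure over a period of 20 independent runs.
runs = 20
reliability = per_run ** runs

print(per_run)             # 99/100
print(float(reliability))
```

The same fault yields a different reliability under a different profile, which reflects the earlier point that reliability depends on how the system is executed as well as on the faults in the executable.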
Based on this, we can define R-type defects as those that lead to transitional failures or failures in reliability. Many authors and practitioners include the F-type defects of commission in the same group as R-type defects, but we maintain their separation.
Usability is the extent to which the product is convenient and practical to use. This property of software products is usually evaluated by conducting a usability evaluation against a usability specification often organized in terms of some attributes (e.g., learnability, adaptability, etc.; Dix et al., 1993; Preece et al., 1995; Shneiderman, 1992). The extent of conformance to such a specification is therefore taken as the usability of the system. In this sense, evaluating usability is similar to evaluating functionality. The implication is that omissions and commissions of certain types do introduce defects into the software that can manifest themselves as usability shortcomings. These defects are defects of usability or U-type defects. Removing U-type defects should logically improve the usability of the software.
Structural Failure in Software
Structural failures in software can manifest themselves as shortcomings in the external quality characteristics of software that deal with what the system must be. The important one we mentioned was maintainability. We also provided reasons why some other attributes that relate to structural failures, such as reusability, have not been included in this study.
Maintainability is defined as the degree to which software can be corrected, adapted, and enhanced to fit an altered set of requirements (Basili, 1990; Ghezzi et al., 1992; Lano & Haughton, 1994; Lientz & Swanson, 1980).
In the context of software product quality, Sommerville (1995) highlighted that the aim in measuring maintainability is not to arrive at a cost of making a particular change or to predict whether or not a particular component will have to be maintained. As the preceding definition suggests, the aim is to determine the ease with which the software program may be refitted into a new situation. Sommerville, among others, also asserted that all product-quality-focused definitions and measures of maintainability are based on the assumption that maintainability of a program is related directly to its complexity of design. On the other hand, it can also be argued that maintainability relates to the quality of the supporting documentation.
These considerations suggest that preventing, or identifying and removing, those design defects in software that would lead to maintainability problems (what we call M-type defects) would improve the maintainability of the product, at least in an absolute sense. Identification and removal of defects (in the sense of our definition) in the documentation can also be argued to have the same effect.
Other Potential Attributes
Despite what was presented earlier regarding the adequacy of the preceding attributes of product quality, one might still wish to add other attributes (e.g., reusability). Doing so is consistent with our framework and what will proceed from it in terms of the future models that are to be built in the context of this work. What needs to be done, however, is the following:
Arrive at a concise definition for the attribute to be added.
Ensure relative orthogonality with existing attributes. In fact, recognition of a defect as more than one type is generally permissible (e.g., a denominator that may be evaluated to zero when this is not part of the specification is both a reliability defect and a functionality one). This permissibility requires a proviso that the defect classification scheme supporting the model recognizes this fact and the measurement model knows how to deal with such overlaps. In our work here, such nonorthogonality is permitted.
Define the types of defects that might contribute to failures impacting the newly introduced attribute within a defect classification scheme that forms the basis of defect management activities relating to the process by which the product is built or evaluated (e.g., E-type defects for reusability).
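The steps above can be sketched as a small defect classification scheme in Python. This is only an illustration under stated assumptions: the `DefectType` enumeration, the sample defect records, and `count_by_type` are hypothetical. A flag enumeration is used so that one defect may carry more than one type, as the permitted nonorthogonality requires.

```python
from collections import Counter
from enum import Flag, auto


class DefectType(Flag):
    F = auto()  # functionality
    R = auto()  # reliability
    U = auto()  # usability
    M = auto()  # maintainability
    E = auto()  # reusability, the newly added attribute


# Hypothetical defect log; one record may belong to several types.
defects = {
    "unguarded denominator": DefectType.F | DefectType.R,
    "missing class header": DefectType.M,
    "print function not implemented": DefectType.F,
}


def count_by_type(records):
    """Count defects per type; an overlapping defect is counted once per type."""
    counts = Counter()
    for kinds in records.values():
        for kind in DefectType:
            if kind in kinds:
                counts[kind.name] += 1
    return counts


counts = count_by_type(defects)
print(dict(counts))                       # {'F': 2, 'R': 1, 'M': 1}
print(sum(counts.values()), len(defects)) # 4 3
```

The per-type counts sum to more than the number of defects, so any measurement model built on such a scheme has to decide explicitly how overlapping defects are weighted.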