 Reliability and Fault Tolerance of RAMs

1.6 Reliability Prediction Procedures

In many applications, reliability specifications drive product design, so accurate reliability predictions are very important. In this section, we review six well-known reliability prediction mechanisms that were proposed in the 1990s. The authors used a 64-kb DRAM to illustrate the reliability calculations and provide insight into the manner in which various parameters influence the predicted failure rate. The six reliability prediction mechanisms reviewed are:

• Mil-Hdbk-217E

• Bellcore (now, Telcordia) Reliability Prediction Procedure for Electronic Equipment (Bellcore RPP)

• Nippon Telegraph and Telephone Corporation Standard Reliability Table for Semiconductor Devices (NTT procedure)

• British Telecom Handbook of Reliability Data for Components Used in Telecommunication Systems (British Telecom HRD4)

• French National Center for Telecommunication Studies Procedure (CNET procedure)

• Siemens Reliability and Quality Specification Failure Rates of Components (Siemens procedure)

A simple and reliable failure-rate model in common use is the constant failure-rate model. In this model, the failure rate $\lambda$ is a constant, and the reliability at time $t$ is given as:

$$R(t) = e^{-\lambda t}.$$
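As a minimal sketch of the constant failure-rate model, the snippet below evaluates the reliability for a rate quoted in FITs. It assumes the usual convention that 1 FIT is one failure per 10^9 device-hours. The function name `reliability` and the 100-FIT example value are illustrative and do not come from the procedures reviewed here.

```python
import math

HOURS_PER_FIT_UNIT = 1e9  # assumption: 1 FIT = 1 failure per 1e9 device-hours


def reliability(fits: float, hours: float) -> float:
    """R(t) = exp(-lambda * t) for a constant failure rate given in FITs."""
    lam = fits / HOURS_PER_FIT_UNIT  # failures per hour
    return math.exp(-lam * hours)


# Illustrative values only: a 100-FIT part over ten years of continuous use.
ten_years_hours = 10 * 365 * 24
print(reliability(100.0, ten_years_hours))
```

Because the rate is constant, the same function applies at any point in the useful life of the part; it does not model infant mortality or wear-out.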

The failure rate depends on various parameters, such as the physical and operating characteristics of the circuit and the environment. With the assumption of a constant failure rate, the failure rate of any higher-level assembly consisting of components with independent and uncorrelated failure characteristics is the sum of the failure rates of those components. If $R_1(t) = e^{-\lambda_1 t}$ and $R_2(t) = e^{-\lambda_2 t}$ are the reliability functions for two such components that are assumed to have independent failure characteristics, then the combined reliability will be $R(t) = R_1(t)R_2(t)$ (recall that reliability denotes a probability). Therefore, $R(t) = e^{-(\lambda_1 + \lambda_2)t}$ is the combined reliability, and the failure rate of the assembly is given by $\lambda_1 + \lambda_2$.
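The additivity argument above can be checked numerically. The following is a sketch under the same independence assumption; the helper names and the two per-hour rates are hypothetical values chosen for illustration.

```python
import math


def series_failure_rate(rates):
    """Failure rate of an assembly of independent constant-rate components."""
    return sum(rates)


def series_reliability(rates, t):
    """Product of the component reliabilities exp(-rate * t)."""
    return math.prod(math.exp(-r * t) for r in rates)


rates = [2e-7, 5e-7]  # assumed per-hour failure rates, for illustration only
t = 10_000.0          # hours

combined = series_reliability(rates, t)
# The product of the exponentials equals exp(-(sum of rates) * t), up to rounding.
assert math.isclose(combined, math.exp(-series_failure_rate(rates) * t))
```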

Table 1.2. Models for failure rate of microelectronic devices; courtesy © 1992 IEEE

Table 1.2 gives the models for each of the reliability prediction procedures for ICs. The Mil-Hdbk-217 and CNET procedures each provide two kinds of models: one based on parts count analysis and one based on stress analysis. The parts count model (called simplified in CNET) assumes typical operating parameters for a component and does not rely on knowledge of their values. The stress model requires a precise analysis of all the parameters influencing the failure rate.

The notations used in Table 1.2 are explained below:

1. PQ: quality factor, determined by post-manufacture inspection and test

2. C1 and C2: failure rate constants, C1 depending on circuit complexity and technology, and C2 depending on the packaging type and pin count

3. Pt, PB, Ps: CNET model parameters, which depend on the circuit technology and function, the packaging technology, and the package pin count, respectively

4. PT and Pt: temperature acceleration factors, which depend on the steady-state operating temperature of the device

5. PV, PU, PS: voltage stress factors, which depend on the ratio of the applied voltage to the rated voltage of the device

6. PE: environmental factor, which depends on the operating environment of the device; note, however, that this term does not address vibration, shock, or temperature

7. PL: device or process learning factor, which depends on the time for which the device has been in production

8. λb, λG, λa: λb denotes the base failure rate, which depends on the device complexity and technology; λG and λa are the generic and average failure rates, respectively, assuming average operating conditions (note: average and generic are synonymous in the above definition)

Using the above models, the failure rate (in FITs) for a 64-kb DRAM chip was computed. The assumptions for the operating environment were as follows:

• There exists a ground-benign environment, which is an ideal environment having controlled temperature and humidity and negligible environmental stress.

• There is a ceramic (hermetic) encapsulation in dual-in-line (DIP) packages with 16 pins.

• Devices have survived the infant mortality period.

• Devices were manufactured to good specifications with proper qualification programs and adequate manufacturing controls.

• Power dissipation is 250 mW, typical for components of this size. Note: Power dissipation is used to estimate junction temperature and temperature factor.

Table 1.3 gives the computed values of the failure rate for the 64-kb DRAM. For the Mil-Hdbk-217 stress model, the ambient temperature is assumed to be 40°C and the power dissipation is 250 mW. These values lead to a device junction temperature of 47.5°C. The Mil-Hdbk-217 parts count model assumes an ambient temperature of 30°C and a junction temperature of 45°C. The very low failure rate predicted by the British Telecom HRD4 procedure is due to its choice of the base year for calculating the base failure rate $\lambda_b$. In this procedure, $\lambda_b$ includes a time parameter that reflects the steady improvement (over time) in reliability with state-of-the-art manufacturing technology. For MOS DRAMs, this procedure uses the formula $\lambda_b = (22/t)\,B^{5/t}$, and for MOS SRAMs, it uses the formula $\lambda_b = (43/t)\,B^{5/t}$. For example, for a 64-kb DRAM, this formula gives a base failure rate of 8.1 FITs, with base year 1965 and current year 1992.
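The junction-temperature figure quoted for the Mil-Hdbk-217 stress model is consistent with the common steady-state estimate T_j = T_a + θ·P. The sketch below assumes that relation. The text does not state the thermal resistance θ; the value of 30 °C/W is inferred here from the quoted 40°C, 250 mW, and 47.5°C, and the function name is illustrative.

```python
def junction_temperature(ambient_c: float, power_w: float,
                         theta_c_per_w: float) -> float:
    """Steady-state estimate of junction temperature: T_j = T_a + theta * P."""
    return ambient_c + theta_c_per_w * power_w


# Assumption: 30 degC/W is back-calculated from the figures quoted in the
# text (40 degC ambient, 0.25 W, 47.5 degC junction); it is not given there.
THETA_ASSUMED = 30.0

print(junction_temperature(40.0, 0.250, THETA_ASSUMED))  # 47.5
```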

Table 1.3. Predicted failure rates for a 64-kb DRAM; courtesy © 1992 IEEE

Since the early 1970s, failure rates for microelectronic devices have decreased by approximately 50% every three years. Most of the failure rate expressions in Table 1.2 consist of a base failure rate modified by several P factors. In some cases, the resultant failure rate is given by the sum of two failure rates (with each modified by its P factor), the first failure rate corresponding to the circuit complexity and technology, and the second one corresponding to the packaging technology used. In the Mil-Hdbk-217 model, for instance, C1 and C2 correspond to these two components of the failure rate. This model observes that the technology complexity factor C1 is roughly proportional to the square root of the complexity (in terms of the number of bits for RAMs). For small DRAMs (less than 16 kb), C1 = 25, and for DRAMs 256 kb to 1 Mb in size, C1 = 200. The package complexity factor C2 for the Mil-Hdbk-217 model is as shown in Table 1.4 (with Np denoting the number of functional pins).
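The halving trend mentioned at the start of this discussion can be written as a simple exponential scaling. This is only a sketch of that rule of thumb; the function name and the 100-FIT starting value are hypothetical.

```python
def projected_failure_rate(rate_now: float, years_elapsed: float,
                           halving_period_years: float = 3.0) -> float:
    """Scale a failure rate by 0.5 for every halving period that elapses."""
    return rate_now * 0.5 ** (years_elapsed / halving_period_years)


# Illustrative: two halving periods (six years) cut the rate to one quarter.
print(projected_failure_rate(100.0, 6.0))  # 25.0
```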

For the CNET model, C1 is assumed to behave as follows (with B being the number of bits in the device):

• $C_1 = 5B^{0.4}$, for $B < 100$

• $C_1 = 12.5B^{0.2}$, for ROMs and EPROMs with $B < 100$

• $C_1 = 100(B/1000)^{0.5}$, for $B \geq 100$
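The piecewise CNET expression for C1 can be sketched as a small function. Two readings are assumed here: the trailing numbers in each expression are exponents, and the last case applies for B of 100 bits or more, which makes the three expressions agree at B = 100. The function name and the `is_rom` flag are illustrative.

```python
def cnet_c1(bits: int, is_rom: bool = False) -> float:
    """CNET complexity factor C1 as a function of the bit count B.

    Assumes the third case covers B >= 100 and that 0.4, 0.2 and 0.5
    are exponents.
    """
    if bits < 100:
        if is_rom:  # ROMs and EPROMs
            return 12.5 * bits ** 0.2
        return 5.0 * bits ** 0.4
    return 100.0 * (bits / 1000.0) ** 0.5


# A 64-kb DRAM has 64 * 1024 = 65,536 bits.
print(round(cnet_c1(64 * 1024)))  # 810
```

Under these assumptions the 64-kb case evaluates to roughly 810, the C1 value quoted for the CNET model.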

Table 1.4. Mil-Hdbk-217 package complexity factor C2; courtesy © 1992 IEEE

For a 64-kb DRAM, the calculated C1 with the CNET model is seen to be 810 FITs. The expression for C2 with this model is given as $7B^{0.2}$. Also, for the CNET model, the technology function factor Pt, by which C1 is multiplied, has a value of 3.5 for CMOS DRAM. For this model, C2 is multiplied by the package technology factor PB and by the package pin count factor Ps (see the explanations of the notations in Table 1.2, given earlier). PB is 1 for all hermetic packages and for all non-CMOS (i.e., bipolar, NMOS, PMOS) circuits in ground-benign packages. It has a value of 3 for all other packages for non-CMOS circuits and in case of a ground-benign environment with CMOS circuits. For other environments with CMOS circuits, this parameter has a value of 6. The parameter Ps has a value of 1 for packages with fewer than 24 pins and up to 4 for packages with more than 63 pins.

For the NTT model, the base failure rate $\lambda_b$ for a B-bit RAM is given as $kB^{0.25}$ if $B < 16{,}384$ and $kB^{0.5}$ for larger devices. For a 64-kb DRAM, $k = 0.337$ and the value of $\lambda_b$ is $0.337 \times (65{,}536)^{0.5} = 86.3$ FITs with this model.
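The NTT calculation above can be reproduced with a short sketch. The function name is illustrative, and k is passed in as a parameter because the text gives its value only for the 64-kb DRAM case.

```python
def ntt_base_failure_rate(bits: int, k: float) -> float:
    """NTT base failure rate: k * B**0.25 below 16,384 bits, else k * B**0.5."""
    exponent = 0.25 if bits < 16_384 else 0.5
    return k * bits ** exponent


# 64-kb DRAM: B = 65,536 bits and k = 0.337, as quoted in the text.
print(round(ntt_base_failure_rate(65_536, 0.337), 1))  # 86.3
```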

Most of the reliability prediction models above rely on a quality factor PQ. This quality factor represents the relationship between the quality of components and the failure rate. In each model, several levels of quality are designated and a value of PQ is attached to each level. Some of these quality requirements come from the customer (e.g., the U.S. military) and others are included in the manufacturer's own quality assurance practices. For example, in the case of Mil-Hdbk-217 and the accompanying quality and reliability requirements set forth in Mil-M-38510, seven quality levels (S, S_1, B, B_1, B_2, D, and D_1) are recognized. The S and S_1 quality levels are intended for devices with very stringent reliability requirements, such as those used on life-supporting systems and expensive satellites. For such devices, 100% inspection or testing is required, and an entire lot is rejected if more than a certain fraction, typically 5 to 10%, of the devices fail. Class B devices have a more relaxed reliability requirement and require only 160 hours of burn-in testing, as opposed to 240 hours of burn-in at 125°C for Class S devices. Classes B_1 and B_2 allow further slack in the test procedures. Class D is used for hermetically sealed devices with the manufacturer's own quality assurance and normal screening procedures. Class D_1 is the category for nonhermetic parts that are used commercially. Some of these parts can be included in Class D if they have passed a burn-in test. Similarly, the other reliability prediction procedures described have their own quality levels. Although the Siemens procedure does not specify a quality factor, it uses failure-rate values that reflect appropriate quality assurance controls by the component manufacturers.