Reliability and Fault Tolerance of RAMs
- Introduction
- Impact of Scaling on Reliability
- Defects, Faults, Errors, and Reliability
- Reliability and Quality Testing and Measurement
- Reliability Characterization
- Reliability Prediction Procedures
- Reliability Simulation Tools
- Mechanisms for Permanent Device Failure
- Safeguarding Against Failures
- Concluding Remarks
CHAPTER OVERVIEW
- Scaling and reliability
- Supply voltage and power constraints
- Threshold voltage control
- Gate oxide reliability
- Hot carriers, latchup, soft errors, electrostatic discharge, and electrical overstress
- Metallization reliability (contamination, electromigration, and stress migration)
- Defects, faults, errors, and reliability
- Reliability testing and measurement
- Reliability characterization
- Reliability prediction
- Reliability simulation tools
- Mechanisms for permanent device failure
- Safeguarding against failure
1.1 Introduction
Since the mid-1990s, the demand for reliability and fault tolerance in semiconductor memories has increased tremendously. This demand has been fueled largely by the revolutionary growth of system-on-a-chip (SoC) and embedded controller architectures. These architectures require large amounts of on-chip SRAM, DRAM, and flash memory, integrated with digital (i.e., ASIC or FPGA), analog, or mixed-signal technologies. As the minimum feature size of MOS transistors in very deep submicron technologies is scaled down toward 0.05 μm, embedded memories have become very vulnerable to even minor process variations, resulting in low manufacturing yield and reduced reliability. These process variations give rise to various parametric faults, such as substrate and gate oxide leakage currents and threshold voltage shifts. Sometimes these phenomena allow a memory device to pass manufacturing tests but cause it to fail later during field use. Such permanent, or hard, errors are therefore the result of process technology variations and scaling-related weaknesses that cause layout defects and leakage currents in memory devices.
RAM reliability has also been a major concern for designers of integrated circuits used in mission-critical space and real-time applications, including those operating in harsh, high-radiation environments. In such environments, single-event effects (SEE) may occur. These are caused, for example, by heavy-ion or alpha-particle strikes, neutron strikes, and ground-level cosmic radiation, and they can upset stored data. Such upsets are known as single-event upsets (SEU) or soft errors. Other single-event effects, such as single-event gate rupture (SEGR), single-event hard error (SHE), and single-event burnout (SEB), may have destructive effects on memory devices. Field maintenance of memory devices in space and real-time applications may be prohibitively expensive. This warrants an automated built-in self-repair mechanism, system-level error correction, or a proactive circuit-hardening or technology-hardening approach to reduce the risk of such problems.
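The paragraph above names system-level error correction as one defense against soft errors. The following is a minimal sketch of how such correction undoes a single flipped bit. The chapter does not prescribe a particular code. A Hamming(7,4) single-error-correcting code is used here only because it is small; practical memories commonly use wider SEC-DED codes. The function names `encode` and `decode` are hypothetical.

```python
# Illustrative Hamming(7,4) single-error-correcting code (an assumed example,
# not a scheme taken from this chapter). It shows how ECC can repair one bit
# flipped by a single-event upset.
# Codeword layout, 1-indexed positions: p1 p2 d1 p3 d2 d3 d4

def encode(data):
    """Encode 4 data bits (list of 0/1) into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(word):
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the flipped bit
    if syndrome:
        w[syndrome - 1] ^= 1  # flip the upset bit back
    return [w[2], w[4], w[5], w[6]], syndrome

stored = encode([1, 0, 1, 1])
upset = list(stored)
upset[4] ^= 1  # simulate an SEU flipping position 5
print(*decode(upset))  # prints [1, 0, 1, 1] 5
```

A code of this kind corrects any single-bit upset in a word. It cannot correct multiple-bit upsets within the same word, which is one reason stronger codes or hardening approaches are also considered.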
Therefore, reliability and fault tolerance of RAMs are important during both manufacturing and field use. Moreover, from the manufacturing test standpoint, quality and reliability testing early in the lifetime of the device is as crucial as functional testing.
Functional testing differs from quality and reliability testing. In functional testing, the expected behavior of a fault-free device can be characterized precisely by a binary bit or a bit vector. Quality and reliability testing, by contrast, relies on statistical characterization of a set of measurements of observed behavior, accompanied by economic decision making. The decision-making process involves forecasting manufacturing cost, operational effectiveness, warranty cost, marketing strategy, and logistical support. Together, these steps determine the intrinsic quality and reliability of the manufactured device. In this chapter, we describe the parameters affecting intrinsic quality and reliability and explain how to measure and test them.
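The contrast above can be made concrete with a small sketch. A functional test reduces to an exact comparison against an expected bit vector. A reliability measurement instead yields a statistical estimate from many devices and hours. The sketch assumes a constant failure rate (exponential lifetime model) and expresses the rate in FIT (failures per 10^9 device-hours). The function names and the test numbers are hypothetical, not data from this chapter. A real qualification would usually also report a confidence bound (for example, chi-square based) rather than only a point estimate.

```python
# Hypothetical helpers contrasting functional and reliability testing.

def functional_pass(read_back, expected):
    """Functional test: a fault-free device must reproduce the expected bits exactly."""
    return read_back == expected

def failure_rate_fit(failures, devices, hours):
    """Point estimate of a constant failure rate in FIT (failures per 1e9 device-hours)."""
    device_hours = devices * hours
    return failures * 1e9 / device_hours

def mttf_hours(fit):
    """MTTF = 1 / lambda, valid only under the constant-failure-rate assumption."""
    return 1e9 / fit

# Hypothetical life test: 1000 devices stressed for 1000 hours, 3 failures observed.
print(functional_pass([1, 0, 1, 1], [1, 0, 1, 1]))  # prints True
rate = failure_rate_fit(3, 1000, 1000.0)
print(f"{rate:.0f} FIT, MTTF about {mttf_hours(rate):.0f} h")  # prints 3000 FIT, MTTF about 333333 h
```

The functional check returns a single yes/no verdict. The reliability estimate, by contrast, depends on sample size and test duration, which is why it must be interpreted statistically.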
Table 1.1. Speculative CMOS technology scaling trends compiled from NTRS and ITRS data
| | 1997 | 1999 | 2002 | 2005 | 2008 |
|---|---|---|---|---|---|
| Channel length (μm) | 0.25 | 0.18 | 0.13 | 0.10 | 0.07 |
| Memory (bits per chip) | 256M | 1G | 4G | 16G | 64G |
| Transistors per microprocessor | 11M | 21M | 76M | 200M | 520M |
| Power supply voltage VDD (V) | 1.8-2.5 | 1.5-1.8 | 1.2-1.5 | 0.9-1.2 | 0.6-0.9 |
| Oxide thickness (nm) | 8 | 6.5 | 5.5 | 5 | 4.5 |
| DRAM chip area (mm²) | 280 | 420 | 640 | 960 | 1400 |
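The trends in Table 1.1 are easier to compare as per-generation ratios. The short sketch below divides each column by the previous one, using values copied from the table. It assumes the memory entries are binary multiples (1M = 2^20 bits), which the table does not state. `ratios` is a hypothetical helper name.

```python
# Per-generation scaling ratios computed from Table 1.1 entries.
# Assumption: memory sizes are binary multiples (1M = 2**20 bits).
years = [1997, 1999, 2002, 2005, 2008]
channel_um = [0.25, 0.18, 0.13, 0.10, 0.07]
memory_bits = [256 * 2**20, 2**30, 4 * 2**30, 16 * 2**30, 64 * 2**30]
dram_area_mm2 = [280, 420, 640, 960, 1400]

def ratios(values):
    """Return each entry divided by the entry before it."""
    return [b / a for a, b in zip(values, values[1:])]

steps = zip(years, years[1:], ratios(channel_um), ratios(memory_bits), ratios(dram_area_mm2))
for y0, y1, r_len, r_mem, r_area in steps:
    print(f"{y0}->{y1}: channel length x{r_len:.2f}, "
          f"bits/chip x{r_mem:.0f}, DRAM area x{r_area:.2f}")
```

The output shows channel length shrinking by a factor of roughly 0.70 to 0.77 per column, bits per chip quadrupling, and DRAM chip area growing by about 1.5 times.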