
Reliability and Fault Tolerance of RAMs

Learn how advanced techniques are allowing engineers to improve RAM reliability without compromising performance, cost, or space requirements.
This chapter is from the book

CHAPTER OVERVIEW

  • Scaling and reliability

    • Supply voltage and power constraints

    • Threshold voltage control

    • Gate oxide reliability

    • Hot carriers, latchup, soft errors, electrostatic discharge, and electrical overstress

    • Metallization reliability (contamination, electromigration, and stress migration)

  • Defects, faults, errors, and reliability

  • Reliability testing and measurement

  • Reliability characterization

  • Reliability prediction

  • Reliability simulation tools

  • Mechanisms for permanent device failure

  • Safeguarding against failure

1.1 Introduction

Since the mid-1990s, the demand for reliability and fault tolerance in semiconductor memories has increased tremendously. This demand has been fueled largely by the revolutionary growth of system-on-a-chip (SoC) and embedded controller architectures that require large amounts of on-chip SRAM, DRAM, and flash memories, integrated with digital (e.g., ASIC or FPGA), analog, or mixed-signal technologies. As minimum feature sizes of MOS transistors for very deep submicron technologies are constantly being scaled down toward 0.05 μm, embedded memories have become very vulnerable to even minor process variations, resulting in low manufacturing yield and reliability. Associated with these process variations are various parametric faults, such as substrate and gate oxide leakage currents and threshold voltage shifts. Sometimes, these phenomena allow a memory device to pass manufacturing tests but cause it to fail later during field use. These permanent, or hard, errors are therefore the result of process technology variations and scaling-related weaknesses that cause layout defects and leakage currents in memory devices.

RAM reliability has also been a major concern among designers of integrated circuits used in mission-critical space and real-time applications, including those used in harsh, high-radiation environments. In such environments, single-event effects (SEE), such as heavy-ion or alpha-particle strikes, neutron strikes, and ground-level cosmic radiation, may occur and upset stored data; these upsets are known as single-event upsets (SEU) or soft errors. Other single-event effects, such as single-event gate rupture (SEGR), single-event hard error (SHE), and single-event burnout (SEB), may have destructive effects on memory devices. The cost of field maintenance of memory devices used in space and real-time applications may be too high, warranting an automated built-in self-repair mechanism, system-level error correction, or a proactive circuit-hardening or technology-hardening approach to reduce the risk of such problems.

Therefore, reliability and fault tolerance of RAMs are important during both manufacture and field use. Also, from the manufacturing test standpoint, quality and reliability testing early in the lifetime of the device are as crucial as functional testing.

Unlike functional testing, where the expected behavior of a fault-free device can be characterized precisely by a binary bit or a bit vector, quality and reliability testing relies on statistical characterization of a set of measurements of observed behavior, accompanied by economic decision making. The decision-making process involves forecasting manufacturing cost, operational effectiveness, warranty cost, marketing strategy, and logistical support. This entire process leads to a determination of the intrinsic quality and reliability of the manufactured device. In this chapter, we describe the parameters that affect intrinsic quality and reliability, and how to measure and test them.

Table 1.1. Speculative CMOS technology scaling trends compiled from NTRS and ITRS data

 

| Parameter | 1997 | 1999 | 2002 | 2005 | 2008 |
|---|---|---|---|---|---|
| Channel length (μm) | 0.25 | 0.18 | 0.13 | 0.10 | 0.07 |
| Memory (bits per chip) | 256M | 1G | 4G | 16G | 64G |
| Transistors per microprocessor | 11M | 21M | 76M | 200M | 520M |
| Power supply voltage VDD (V) | 1.8-2.5 | 1.5-1.8 | 1.2-1.5 | 0.9-1.2 | 0.6-0.9 |
| Oxide thickness (nm) | 8 | 6.5 | 5.5 | 5 | 4.5 |
| DRAM chip area (mm²) | 280 | 420 | 640 | 960 | 1400 |
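
One way to read the trends in Table 1.1 is as ratios between successive technology generations. The short sketch below is illustrative only and is not part of the original chapter: it copies three rows of the table and computes how channel length, memory capacity, and bit density (bits per mm² of DRAM chip area) change from one column to the next. The helper names `generation_ratios` and `bits_per_mm2` are hypothetical, and reading "M" and "G" as binary multiples (2^20 and 2^30) is an assumption.

```python
# Values copied from Table 1.1 (speculative NTRS/ITRS projections).
years = [1997, 1999, 2002, 2005, 2008]
channel_length_um = [0.25, 0.18, 0.13, 0.10, 0.07]
# Assumption: "M" and "G" are read as binary multiples (2**20, 2**30).
memory_bits = [256 * 2**20, 2**30, 4 * 2**30, 16 * 2**30, 64 * 2**30]
dram_area_mm2 = [280, 420, 640, 960, 1400]


def generation_ratios(values):
    """Return the ratio of each entry to the entry before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


def bits_per_mm2(bits, areas):
    """Return bit density (bits per square millimetre) for each generation."""
    return [b / a for b, a in zip(bits, areas)]


length_ratios = generation_ratios(channel_length_um)
memory_ratios = generation_ratios(memory_bits)
density_ratios = generation_ratios(bits_per_mm2(memory_bits, dram_area_mm2))

# Print one summary line per generation step.
for i in range(len(years) - 1):
    print(
        f"{years[i]} -> {years[i + 1]}: "
        f"channel length x{length_ratios[i]:.2f}, "
        f"memory x{memory_ratios[i]:.0f}, "
        f"bit density x{density_ratios[i]:.2f}"
    )
```

Under these assumptions, the projected memory capacity quadruples at every step while the chip area grows by only about 1.5 times, so most of the capacity gain has to come from packing more bits into each square millimetre as the channel length shrinks.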

