1.8 Mechanisms for Permanent Device Failure
Physical and chemical mechanisms underlying permanent device failure (or hard errors) can be broadly divided into (1) chip- and assembly-related (or technology-related), (2) design-related, and (3) environment-related. The first kind of hard failure can be caused by defects in the processing technology, dielectric breakdown, electromigration, and stress migration. Design-related failures can be caused by poor circuit design or by a poor design methodology for correcting signal integrity and other problems. Environment-related damage can arise in various ways: human mishandling or poorly grounded equipment, electrostatic discharge (ESD), or single-event effects. Some of these factors, such as ESD, have already been reviewed. Single-event effects occur when highly energetic particles present in space or terrestrial environments strike sensitive regions of a microelectronic circuit. In addition to causing soft errors, which are temporary and nondestructive, some of these single events may be destructive and cause permanent damage to memory devices. Examples of such phenomena are single-event latchup (SEL), single-event burnout (SEB), single-event hard error (SHE), and single-event gate rupture (SEGR).
1.8.1 Chip- and assembly-related failures
These failures can be caused by the dopant, the oxide, the metallization, etchant residue, or the chip assembly, or by any combination of these. Dopant-related failures may come about because of mobile ions (both positive and negative, such as sodium and halide ions) trapped in or released from the oxide or the plastic package. The accompanying movement of surface charge at the Si/SiO2 interface, often accelerated by the bias voltage, may shift the threshold voltage of a transistor or cause high leakage current. Worse, the accumulated charge may turn on a parasitic MOS transistor. Another example of ionic movement occurs in intermetal planarization layers that use carbon-based spin-on glass containing organic compounds: hydrogen ions produced by the interaction of plasma nitride with these organic compounds may cause field inversion.
Contamination of the oxide layer degrades its dielectric properties, including its breakdown voltage. Moreover, the electric field across the oxide accelerates its breakdown: the field causes charge to be trapped in the oxide, and this trapped charge sets up a built-in electric field. Oxide breakdown may occur if the built-in field exceeds a critical value.
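As a rough illustration of the "field exceeds a critical value" criterion, the sketch below computes the applied oxide field as V/t_ox and adds an optional built-in field supplied by the caller to stand in for the trapped-charge contribution. Treating the two as simply additive is a simplification, and the 10 MV/cm critical field is an assumed order-of-magnitude value for SiO2, not a figure from the text; the function names are hypothetical.

```python
# Hypothetical first-order check of whether the field across a gate oxide
# exceeds an assumed critical (breakdown) field.

CRITICAL_FIELD_MV_PER_CM = 10.0  # assumed order-of-magnitude value for SiO2


def oxide_field_mv_per_cm(voltage_v: float, t_ox_nm: float,
                          built_in_mv_per_cm: float = 0.0) -> float:
    """Applied field V/t_ox (in MV/cm) plus an optional built-in field.

    The built-in term stands in for the field set up by trapped charge; adding
    it directly to the applied field is a simplification for illustration.
    """
    if t_ox_nm <= 0:
        raise ValueError("oxide thickness must be positive")
    t_ox_cm = t_ox_nm * 1e-7             # 1 nm = 1e-7 cm
    applied = voltage_v / t_ox_cm / 1e6  # V/cm -> MV/cm
    return applied + built_in_mv_per_cm


def at_risk_of_breakdown(voltage_v: float, t_ox_nm: float,
                         built_in_mv_per_cm: float = 0.0,
                         critical: float = CRITICAL_FIELD_MV_PER_CM) -> bool:
    """True if the (simplified) total oxide field exceeds the critical field."""
    return oxide_field_mv_per_cm(voltage_v, t_ox_nm, built_in_mv_per_cm) > critical


if __name__ == "__main__":
    # About 6 MV/cm from a 1.2 V bias across a 2 nm oxide alone.
    print(oxide_field_mv_per_cm(1.2, 2.0))
    # The same bias plus an assumed 5 MV/cm built-in field crosses 10 MV/cm.
    print(at_risk_of_breakdown(1.2, 2.0, built_in_mv_per_cm=5.0))
```

The point of the example is the one the text makes: a bias that is safe on its own can push the oxide past its critical field once trapped charge adds to it.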
A poor ohmic contact between Al and Si is another source of such failure. Ohmic contacts are created by alloying aluminum with silicon below 577°C, the Al-Si eutectic temperature. At temperatures of around 400 to 450°C, silicon remains solid but slowly diffuses into the aluminum, so the Al/Si interface moves into the silicon. Adding 1 to 2% silicon to the aluminum helps keep junction silicon from dissolving into the aluminum. However, overalloying in the shallow junctions required in VLSI circuits may lead to junction shorting. To reduce junction shorting, advanced processes use a barrier metal or compound such as Ti/TiN. Another cause of metal failure is the formation of microscopic cracks or voids in contacts and vias. Often, these cracks result from steep oxide 'steps' on which too little metal is deposited. Semiconductor memory manufacturers try to ensure adequate metal step coverage by monitoring lot samples with scanning electron microscopes (SEMs). Such cracks are primarily a problem with 0.35-μm and older technologies. For more recent technologies (with feature sizes as low as 0.12 μm), plug materials such as Ti/TiN and tungsten used in contacts and vias offer electrical as well as structural benefits.
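Step-coverage monitoring of the kind described above can be sketched as a simple screen over SEM cross-section measurements. Here step coverage is taken as the thinnest metal measured on the oxide step divided by the metal thickness on the flat region; the `SemCrossSection` record and the 0.5 acceptance limit are hypothetical placeholders, not values from the text.

```python
from dataclasses import dataclass

MIN_STEP_COVERAGE = 0.5  # hypothetical acceptance limit, not from the text


@dataclass
class SemCrossSection:
    """Hypothetical record of one SEM cross-section measurement."""
    sample_id: str
    flat_thickness_nm: float  # metal thickness on the flat field region
    step_thickness_nm: float  # thinnest metal measured on the oxide step


def step_coverage(s: SemCrossSection) -> float:
    """Step coverage = thinnest film on the step / film on the flat region."""
    if s.flat_thickness_nm <= 0:
        raise ValueError("flat thickness must be positive")
    return s.step_thickness_nm / s.flat_thickness_nm


def flag_poor_coverage(samples, limit: float = MIN_STEP_COVERAGE):
    """Return the IDs of samples whose step coverage falls below the limit."""
    return [s.sample_id for s in samples if step_coverage(s) < limit]


if __name__ == "__main__":
    lot = [
        SemCrossSection("A", flat_thickness_nm=500, step_thickness_nm=350),
        SemCrossSection("B", flat_thickness_nm=500, step_thickness_nm=150),
    ]
    print(flag_poor_coverage(lot))  # sample B thins to 30% on the step
```

In practice the limit would come from the process qualification data; the sketch only shows the ratio being checked.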
Insufficient cleaning and surface passivation after metal etching may leave a small amount of etchant residue. The plastic package or the oxide may also contain chemicals such as phosphorus or halides, which can lead to chemical corrosion of the metal lines. Furthermore, in the presence of moisture, electrolytic corrosion may occur. Elaborate post-etch cleaning and passivation techniques may be required to avoid metal corrosion.
Another source of chip failure lies in the chip assembly, including the mounting and wire-bonding mechanisms and the packaging material used. Poor process control, improper handling, contamination in solder balls or bumps, and mechanical and thermal stresses across the interface between the chip and the assembly may lead to voids and cracks in the bonding. These stresses are often due to a mismatch in the thermal expansion coefficients of the silicon, the die attach material (e.g., gold, eutectic, epoxy), and the plastic or ceramic package. Integrated circuits mounted with a eutectic mixture usually have a better thermal expansion match than those mounted with epoxy; however, the composition of the eutectic is very important in ensuring good thermal conductivity and mechanical strength. Poor thermal conductivity between the back surface of the chip and the assembly, sometimes caused by oxidation of the eutectic material, may also lead to failure.
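The mismatch argument can be made concrete with the free thermal-mismatch strain, |α_a − α_b|·ΔT, between silicon and a candidate attach or package material. In the sketch below, the silicon coefficient is the approximate room-temperature value of about 2.6 ppm/K; the candidate materials and their coefficients are hypothetical placeholders chosen only to show the ranking, and the function names are illustrative.

```python
ALPHA_SI_PER_K = 2.6e-6  # approximate room-temperature CTE of silicon


def mismatch_strain(alpha_a_per_k: float, alpha_b_per_k: float,
                    delta_t_k: float) -> float:
    """Free (unconstrained) thermal-mismatch strain |alpha_a - alpha_b| * dT."""
    return abs(alpha_a_per_k - alpha_b_per_k) * delta_t_k


def rank_by_mismatch(materials: dict, delta_t_k: float) -> list:
    """Sort candidate materials from best to worst CTE match against silicon."""
    return sorted(
        materials,
        key=lambda name: mismatch_strain(materials[name], ALPHA_SI_PER_K, delta_t_k),
    )


if __name__ == "__main__":
    # Hypothetical CTE values for illustration only (per kelvin).
    candidates = {"material_A": 4e-6, "material_B": 20e-6}
    print(rank_by_mismatch(candidates, delta_t_k=100.0))
    print(mismatch_strain(20e-6, ALPHA_SI_PER_K, 100.0))  # about 1.7e-3
```

Because the strain scales with both the CTE difference and the temperature swing, a poorly matched attach material and large thermal cycles compound each other.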
Integrated circuits are usually bonded to their package leads with gold or aluminum wires. The bond may be weakened by cracks, high temperature, mechanical and thermal shock, and stress due to electrical bias. Gold wires are usually bonded by thermocompression, whereas aluminum wires are bonded ultrasonically to bonding pads made of aluminum alloy. When gold is thermally bonded to aluminum, the electrochemical potential difference between aluminum and gold, coupled with high temperature, may lead to the formation of the intermetallic phase AuAl2, popularly known as purple plague because of its color. This gold-aluminum interaction degrades the bonding pads and causes void formation and eventual failure. For aluminum wires bonded to alloyed aluminum pads, good process control of wire diameter, tensile strength, and ductility is important in ensuring bond strength. Often, the strength of these aluminum wires is increased further by alloying them with 1% silicon.
Semiconductor memory ICs are typically packaged in metal, ceramic, or plastic packages. For high reliability, the package should be hermetically sealed. A poorly controlled sealing process and large temperature swings can cause chip failure. Furthermore, the finish and the cleanliness of package leads, and the humidity of the atmosphere, may affect the solderability of the ICs to the memory board. High humidity may cause oxidation and corrosion of IC pins and result in expensive field repairs.
1.8.2 Design-related failures
Occasionally, a chip failure such as a timing problem, crosstalk, or threshold shift can be traced back to poor circuit or layout design. Most CAD tools today are very sophisticated, so the chance of a faulty circuit or layout going unnoticed until the chip goes to mask is very small. Some legacy CAD tools, however, have weaknesses that can surface as timing and other problems after chip fabrication.
The important manufacturability, yield, and reliability criteria are encoded in the design rules accompanying the manufacturing process, and layout design tools enforce these design rules on the chip. As a result, many potential yield and reliability problems are eliminated during design verification. However, some CAD tools that optimize transistor netlists and perform layout purely at the transistor level may fail to consider all the degradation effects in deep-submicron technology, such as hot carrier effects and electromigration (see Chapter 5 for more such phenomena). This is a symptom of an outdated legacy tool being applied to present-day design problems, and it warrants improving or replacing that tool. In other cases, a CAD tool may support reliability-driven design, but the user of the tool or design methodology (such as the chip designer) may not make good trade-offs among reliability, timing, signal integrity, and power based on the customer's specifications.
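One reliability check that a tool may or may not apply is an electromigration screen on interconnect current density, J = I / (w·t). The sketch below flags wire segments whose average current density exceeds a limit; the `Wire` record and the 1e6 A/cm² limit are hypothetical placeholders, since real limits come from the process design rules and depend on metal, temperature, and current waveform.

```python
from dataclasses import dataclass

J_MAX_A_PER_CM2 = 1e6  # hypothetical limit; real limits come from the process


@dataclass
class Wire:
    """Hypothetical description of one interconnect segment."""
    name: str
    current_a: float    # average (DC) current through the segment
    width_um: float
    thickness_um: float


def current_density_a_per_cm2(w: Wire) -> float:
    """J = I / (width * thickness), with dimensions converted from um to cm."""
    area_cm2 = (w.width_um * 1e-4) * (w.thickness_um * 1e-4)  # 1 um = 1e-4 cm
    return abs(w.current_a) / area_cm2


def em_violations(wires, j_max: float = J_MAX_A_PER_CM2) -> list:
    """Names of segments whose current density exceeds the assumed EM limit."""
    return [w.name for w in wires if current_density_a_per_cm2(w) > j_max]


if __name__ == "__main__":
    nets = [
        Wire("narrow", current_a=1e-3, width_um=0.1, thickness_um=0.2),  # ~5e6 A/cm^2
        Wire("wide", current_a=1e-4, width_um=1.0, thickness_um=0.5),    # ~2e4 A/cm^2
    ]
    print(em_violations(nets))
```

A layout tool that sizes wires only for timing would miss the first segment; a reliability-driven flow would widen it or reroute the current.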
1.8.3 Environment-related failures
Failures due to electrostatic discharge and electrical overstress (ESD/EOS) have already been discussed. Such failures are exacerbated by poor human handling and poor grounding of laboratory and other equipment. In addition, radiation-induced single-event effects can be destructive. Destructive single-event-related failures include single-event latchup (SEL), which in some cases may be nondestructive, single-event gate rupture (SEGR), single-event burnout (SEB), and single-event hard error (SHE). We examine them briefly here; some of them are described in greater detail in Chapter 3.
The mechanism for latchup in CMOS devices with a P-N-P-N structure has already been reviewed. This effect, also known as the thyristor effect, turns on the parasitic bipolar transistors (a coupled PNP-NPN pair) inherent in a CMOS device with an NMOS transistor adjacent to a PMOS transistor. Once on, this parasitic structure drives the device into a sustained high-current mode that can eventually destroy it through thermal runaway or metallization failure. A large substrate or well current can trigger latchup, and such a current may be produced by a single-event strike. SEL was observed in tests at the terrestrial level as early as 1979, and was thought to be limited to heavy ions and bulk technologies. As technologies progressed, it was found that some devices fabricated on epitaxial substrates could also exhibit latchup, and moreover, that protons alone could induce latchup in sensitive technologies [2,188,324].
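The regenerative nature of the P-N-P-N structure can be illustrated with the idealized textbook condition that latchup sustains itself when the loop gain β_npn·β_pnp exceeds 1 (shunt resistances and holding-current limits ignored). The sketch follows a trigger current, such as one produced by a particle strike, around the feedback loop; the function names and β values are hypothetical and chosen only for illustration.

```python
def loop_gain(beta_npn: float, beta_pnp: float) -> float:
    """Gain around the NPN-PNP feedback loop (idealized, no shunt resistors)."""
    return beta_npn * beta_pnp


def trace_trigger_current(beta_npn: float, beta_pnp: float,
                          i_trigger_a: float, passes: int = 10) -> list:
    """Follow a trigger current around the loop for a number of passes.

    Each pass: NPN base current -> NPN collector current (x beta_npn), which is
    the PNP base current -> PNP collector current (x beta_pnp), which feeds
    back into the NPN base. Returns the fed-back current after each pass.
    """
    currents = []
    i_base = i_trigger_a
    for _ in range(passes):
        i_base = beta_pnp * (beta_npn * i_base)
        currents.append(i_base)
    return currents


def latches(beta_npn: float, beta_pnp: float) -> bool:
    """Idealized criterion: regeneration when the loop gain exceeds 1."""
    return loop_gain(beta_npn, beta_pnp) > 1.0


if __name__ == "__main__":
    # Hypothetical gains: loop gain 2.5 grows; loop gain 0.6 dies out.
    print(latches(5.0, 0.5), trace_trigger_current(5.0, 0.5, 1e-6, passes=3))
    print(latches(2.0, 0.3), trace_trigger_current(2.0, 0.3, 1e-6, passes=3))
```

In a real device the growth stops when the supply or metallization limits the current, which is where the thermal runaway and metallization failure described above come in.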
Single-event gate rupture (SEGR) refers to dielectric breakdown caused by an energetic particle strike to the gate region of a device, which drives the electric field across the gate insulator above a critical value. Although SEGR has been observed and studied most extensively in power devices, such as double-diffused power MOSFETs with a thick, lightly doped epitaxial region that sustains high voltages without breakdown, it has recently also been investigated in logic and memory ICs. In 1994, Swift et al. observed a new kind of hard error in 4-Mb DRAMs that was consistent with SEGR failure.
Single-event burnout (SEB) due to heavy ions, neutrons, and protons is quite similar to latchup, except that a single parasitic bipolar transistor is involved. For example, in a power MOSFET structure with a lightly doped N-type epitaxial region, the three terminals of the parasitic BJT are the N-source (emitter), the P-body (base), and the N-epitaxial layer (collector). Following an ion strike, the parasitic BJT may turn on, and avalanche multiplication of its collector current can then cause excessive junction heating and eventual device burnout. These single-event effects are described in more detail in Chapter 3.
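Why the collector voltage matters for SEB can be illustrated with Miller's empirical avalanche multiplication factor, M = 1 / (1 − (V/V_BR)^n), which grows without bound as the collector voltage approaches the breakdown voltage. The exponent n is empirical; the default of 4 below is an illustrative assumption, as are the function names and example voltages.

```python
def avalanche_multiplication(v_ce: float, v_breakdown: float, n: float = 4.0) -> float:
    """Miller's empirical factor M = 1 / (1 - (V / V_BR) ** n).

    n is an empirical exponent; 4.0 is only an illustrative default.
    Valid for 0 <= V < V_BR; M diverges as V approaches V_BR.
    """
    if not 0 <= v_ce < v_breakdown:
        raise ValueError("requires 0 <= v_ce < v_breakdown")
    return 1.0 / (1.0 - (v_ce / v_breakdown) ** n)


def multiplied_collector_current(i_c_a: float, v_ce: float,
                                 v_breakdown: float, n: float = 4.0) -> float:
    """Collector current after avalanche multiplication."""
    return avalanche_multiplication(v_ce, v_breakdown, n) * i_c_a


if __name__ == "__main__":
    # Hypothetical 100 V breakdown: M is near 1 at half of V_BR but about 5 at 95%.
    for v in (50.0, 90.0, 95.0):
        print(v, avalanche_multiplication(v, 100.0))
```

The steep rise of M near breakdown is why devices biased close to their rated voltage are the most susceptible: a strike-induced collector current is multiplied many times, heating the junction.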
Single-event hard error (SHE) due to heavy ions, neutrons, and protons is a failure mechanism that causes permanent damage to a memory device, for example, stuck-at bits. SHE may have significant implications for memory scaling and for use in space applications [342, 434]. One cause of SHE is the microdose effect (explained in Chapter 3), first observed in resistive-load SRAM devices. In this effect, a heavy ion deposits charge within the gate oxide of a MOS device. If the ion track and the device gate region are comparable in size, the device may exhibit a significant total-dose response in the form of a large threshold voltage shift. This can lead to device failure and stuck bits (hard errors) in scaled DRAM structures, as predicted by Oldham et al. Swift et al. observed a hard error in 4-Mb DRAMs that deviated somewhat from the microdose effect and has been attributed to SEGR.
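The scaling concern behind the microdose effect can be sketched with the standard first-order oxide-charge relation ΔV_T = −Q_ot / C_ox, where Q_ot is the trapped charge per unit gate area and C_ox = ε_ox / t_ox. The sketch assumes the charge left by one ion is trapped as holes, spread uniformly over the gate, and located at the Si/SiO2 interface (a worst-case centroid); the hole count, gate dimensions, and function name are hypothetical.

```python
Q_E = 1.602e-19           # elementary charge, C
EPS_OX = 3.9 * 8.854e-14  # SiO2 permittivity, F/cm


def microdose_vt_shift(n_trapped_holes: float, gate_w_um: float,
                       gate_l_um: float, t_ox_nm: float) -> float:
    """First-order threshold shift dVt = -Q_ot / C_ox, in volts.

    Assumes the trapped holes are spread uniformly over the gate area and sit
    at the Si/SiO2 interface (worst-case centroid). A negative value means
    the threshold voltage shifts down.
    """
    area_cm2 = (gate_w_um * 1e-4) * (gate_l_um * 1e-4)  # 1 um = 1e-4 cm
    q_ot = Q_E * n_trapped_holes / area_cm2             # C/cm^2
    c_ox = EPS_OX / (t_ox_nm * 1e-7)                    # F/cm^2
    return -q_ot / c_ox


if __name__ == "__main__":
    # Same hypothetical 100 trapped holes in a 5 nm oxide, two gate sizes.
    large = microdose_vt_shift(100, 1.0, 1.0, 5.0)
    small = microdose_vt_shift(100, 0.1, 0.1, 5.0)  # roughly -0.23 V
    print(large, small, small / large)  # shift scales as 1 / gate area
```

The same deposited charge that is negligible in a large gate produces a shift a hundred times larger when the gate area shrinks a hundredfold, which is why the effect matters once the gate becomes comparable in size to the ion track.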