1.4 From Devices to Circuits
The key features of the current generation of silicon CMOS devices include self-aligned ion implantation to dope the source, drain, and gate; metal silicides on silicon surfaces to reduce sheet resistance and improve ohmic contacts; shallow trench isolation (STI) to separate FETs while saving silicon area; and nonuniform (retrograde) channel doping coupled with halo implants to control short-channel effects.
According to the exponential projections of the Semiconductor Industry Association (SIA) roadmap, the following are expected to emerge by the year 2012: dynamic random access memories (DRAMs) with a capacity of 256 gigabits (Gb), microprocessors with 1.3 × 10⁹ logic FETs, a gate lithography of 35 nm (channel lengths at or below 20 nm), across-chip clocks of 3 gigahertz (GHz), a power supply voltage of 0.5 V, and an equivalent gate oxide thickness of less than 1.0 nm. However, realizing these goals will require new technologies.
The ongoing shrinking of silicon MOSFETs complicates device behavior. To maintain desirable transfer characteristics, more specialized doping or more complex structures are required. The lateral dimensions are constrained by gate lithography and lateral doping profiles, whereas the vertical dimensions are restricted by gate insulator tunneling considerations, vertical doping profile abruptness, and, for bulk or partially depleted SOI designs, maximum body doping constraints due to body-to-drain tunneling.
Recently published work on a possible 25-nm bulk design brought many of these issues to the surface. The first design consideration is that the SiO₂ gate insulator would need to be thinner than 1.0 nm, which would cause high tunneling leakage through the oxide. A compromise is to use a thicker oxide/nitride insulator with an equivalent oxide thickness of 1.5 nm, which gives a tunneling leakage of about 1 A/cm². This is possibly the thinnest usable oxide/nitride insulator, because thinner insulators would cause excessive standby power dissipation, thereby restricting the use of dynamic logic.
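To give a feel for the magnitudes involved, the standby power from gate tunneling can be estimated as P = J · A_gate · V_DD. The sketch below assumes a hypothetical chip with 10⁹ FETs, each with a 25 nm × 500 nm gate, at a 1 V supply; these numbers are illustrative assumptions, not data from the 25-nm design discussed above, and the 10 and 100 A/cm² cases stand in for hypothetical thinner insulators.

```python
def gate_leakage_power(j_a_per_cm2: float, gate_area_cm2: float, vdd: float) -> float:
    """Standby power (W) from direct gate tunneling: P = J * A * V."""
    return j_a_per_cm2 * gate_area_cm2 * vdd


NM2_PER_CM2 = 1e14                 # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2
n_fets = 1e9                       # assumed transistor count
gate_area_nm2 = 25 * 500           # assumed gate length x width (nm^2)
total_gate_area = n_fets * gate_area_nm2 / NM2_PER_CM2   # total gate area in cm^2

# ~1 A/cm^2 corresponds to the 1.5 nm EOT case; 10 and 100 A/cm^2 are hypothetical
# thinner, leakier insulators.
for j in (1.0, 10.0, 100.0):
    p = gate_leakage_power(j, total_gate_area, vdd=1.0)
    print(f"J = {j:6.1f} A/cm^2 -> standby gate leakage = {p:.3g} W")
```

Because J enters linearly, every tenfold rise in tunneling current density from a thinner insulator produces a tenfold rise in standby power, independent of whether the circuit is switching.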
Reducing the supply voltage, to limit reliability problems and power dissipation, to about 1 V for high-performance designs and perhaps as low as 0.6 V for low-power circuits requires a low threshold voltage. However, the threshold voltage must be high enough to keep the off-state current within the power budget. Threshold roll-off can be compensated by using super-halo implants (see chapter 2), but the compensation can degrade performance. Applying thin high-permittivity insulators to conventional FETs may not yield significant device scaling, because the body-doping constraints will still limit the depletion depth. One possible measure is to change the device structure so that a gate below the channel replaces the body. Other feasible variations exist: the two gates may be either separate or connected, their work functions may be different or the same, and the current may flow in the x, y, or z direction. Theoretically, the three-dimensional (3D) generalization using a cylindrical gate is the most scalable. These double- or surround-gated structures have the potential for more scaling than conventional FETs. The details of these new features are discussed in section 2.8.3.
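The trade-off between threshold voltage and off-state current can be made concrete with the first-order subthreshold relation I_off ≈ I(V_T) · 10^(−V_T/S), where I(V_T) is the drain current at V_GS = V_T and S is the subthreshold swing in volts per decade. This is a simplified sketch: the 85 mV/decade swing and the 1 µA/µm current at threshold are assumed values, and effects such as drain-induced barrier lowering are ignored.

```python
def off_current(i_at_vt: float, vt: float, swing: float) -> float:
    """First-order subthreshold model: drain current at V_GS = 0.

    i_at_vt: drain current (A/um) at V_GS = V_T (assumed value)
    vt:      threshold voltage (V)
    swing:   subthreshold swing S (V/decade)
    """
    return i_at_vt * 10 ** (-vt / swing)


I_AT_VT = 1e-6   # assumed: 1 uA/um at threshold
SWING = 0.085    # assumed: 85 mV/decade

for vt in (0.45, 0.35, 0.25):
    print(f"Vt = {vt:.2f} V -> Ioff = {off_current(I_AT_VT, vt, SWING):.2e} A/um")

# Each cut in Vt equal to one swing costs one decade of off-state current.
ratio = off_current(I_AT_VT, 0.25, SWING) / off_current(I_AT_VT, 0.35, SWING)
print(f"Lowering Vt from 0.35 V to 0.25 V raises Ioff by {ratio:.1f}x")
```

Because V_T sits in the exponent, the off-state current, and with it the standby power budget, sets a floor on how far V_T can follow the supply voltage down.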
The electromigration (EM) current threshold for aluminum (Al) is 2 × 10⁵ A/cm². Beyond the 0.25-μm device generation, current densities could reach levels that induce EM failure in conventional doped-aluminum conductors. Copper (Cu), on the other hand, offers an EM current threshold of 5 × 10⁶ A/cm², making it a good substitute for Al metallization that overcomes the EM limitation. Cu's strong resistance to EM is primarily due to its high melting point of about 1085 °C, compared with only 660 °C for Al.
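As a first-order screen, the current density J = I/(w · t) in a line can be compared against the EM thresholds quoted above. The 1 mA current and the 0.25 µm × 0.5 µm cross section below are assumed, illustrative values; real EM design rules also depend on temperature, duty cycle, and lifetime targets, which this sketch ignores.

```python
# EM current-density thresholds quoted in the text (A/cm^2).
EM_LIMIT_A_PER_CM2 = {"Al": 2e5, "Cu": 5e6}


def current_density(i_amps: float, width_um: float, thickness_um: float) -> float:
    """Current density (A/cm^2) in a rectangular line; 1 um = 1e-4 cm."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    return i_amps / area_cm2


def em_ok(metal: str, i_amps: float, width_um: float, thickness_um: float) -> bool:
    """True if the line stays at or below the metal's EM threshold."""
    return current_density(i_amps, width_um, thickness_um) <= EM_LIMIT_A_PER_CM2[metal]


# Assumed: 1 mA through a 0.25 um wide, 0.5 um thick line.
j = current_density(1e-3, 0.25, 0.5)
for metal in ("Al", "Cu"):
    verdict = "OK" if em_ok(metal, 1e-3, 0.25, 0.5) else "EM risk"
    print(f"{metal}: J = {j:.2e} A/cm^2 -> {verdict}")
```

With these assumptions the line carries about 8 × 10⁵ A/cm², four times the Al threshold but well below the Cu threshold.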
The continual shrinking of the interconnect cross section leads to higher line resistance, and the smaller pitch raises line-to-line capacitance. Bohr reported that an Al line 0.25 μm wide and longer than 436 μm can contribute more delay than a 0.25-μm gate. Using copper should help to lower interconnect delay and allow further shrinking of the upper interconnect levels, thereby increasing wiring density and reducing the number of metal layers. Copper should also help to drive down processing costs. Because copper cannot easily be dry-etched, copper lines are formed with the damascene (inlaid) approach. This approach offers an additional advantage: the dual damascene process forms both the line and via levels concurrently, requiring approximately 30% fewer steps (and hence lower cost) than the single damascene or subtractive patterning methods.
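The dependence of wire delay on resistivity and length can be sketched with the common 0.38 · R · C approximation for the 50% delay of a distributed RC line. The geometry (0.25 µm × 0.5 µm), the 0.2 fF/µm capacitance, and the use of bulk resistivities are assumptions for illustration; real Cu lines need barrier layers and Al lines are alloyed, so effective resistivities are higher.

```python
# Approximate bulk resistivities (ohm*cm); real interconnect values are higher.
RHO_OHM_CM = {"Al": 2.65e-6, "Cu": 1.68e-6}


def wire_delay(rho_ohm_cm: float, width_um: float, thickness_um: float,
               c_f_per_um: float, length_um: float) -> float:
    """Approximate 50% delay (s) of a distributed RC line: 0.38 * R * C."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    r_total = rho_ohm_cm * (length_um * 1e-4) / area_cm2   # ohms
    c_total = c_f_per_um * length_um                        # farads
    return 0.38 * r_total * c_total


# Assumed geometry: 0.25 um x 0.5 um line with 0.2 fF/um total capacitance.
W, T, C = 0.25, 0.5, 0.2e-15
for metal, rho in RHO_OHM_CM.items():
    for length in (500, 1000):
        d = wire_delay(rho, W, T, C, length)
        print(f"{metal}, {length:4d} um: {d * 1e12:.1f} ps")
```

Because both R and C grow with length, delay grows with the square of length, while switching from Al to Cu scales delay by the resistivity ratio for the same geometry.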
As chip complexity increases, design problems such as chip layout and circuit simulation escalate correspondingly. Today, circuit designers are often required to design large, complex circuits, a task that is becoming increasingly difficult because designers now face not only circuit problems but also process- and device-related issues. Furthermore, they must juggle different design requirements and balance conflicting constraints. First come the multiple levels of abstraction, from the specification of a chip function down to a layout. This process requires considerable work because it covers both "front-end" and "back-end" design activities. Front-end design activities generally include system definition, functional design and simulation, logic design and simulation, and circuit design and simulation. Back-end design activities involve the physical design details, which require little creative work and consist mainly of the mechanical translation of the design into a semiconductor (e.g., silicon). Another important constraint is cost: one must strike a balance between performance and the price paid to achieve it. This constraint, together with the generally short design time, makes integrated circuit design a challenging process. For the physical implementation of complex circuits, three layout methodologies are commonly used: full-custom design, gate arrays, and the standard-cell approach (semicustom design).
Full-custom design is the methodology in which the layout of each function and transistor is fully optimized. Although this approach can minimize the power consumption of circuits, it leads to low design productivity. Therefore, this approach is not recommended for Application-Specific Integrated Circuits (ASICs) and processors.
The second methodology is the gate array approach. Gate arrays consist of prefabricated cells and thus require only personalization steps. Logic gates are formed by wiring together transistors from the continuous array of nMOS and pMOS transistors in the internal cell array, using metallization and contacts. This methodology reduces design cost at the expense of area, power, and performance.
The third methodology is the standard-cell approach. Logic gates and functions are created as digital library cells and compiled into a library. In the library-cell approach, two layout styles generally exist. The first optimizes cell area to save silicon space. The second optimizes cell performance, usually yielding higher speed at the cost of more area. This methodology offers lower cost and higher productivity by speeding up the design process.
1.4.1 Latches and Flip-Flops
Latches and flip-flops are the basic sequential elements used to store logic values, and they are always associated with clocks and clocking networks. The clocking network, which contributes 20 to 40% of the overall power dissipation, is a major obstacle in implementing high-performance systems. This has created a growing need to improve clocked structures. Analysis and previous research [14–16, 42] suggest that low-power design must focus mainly on the off-chip power, the cache, and the latches and flip-flops. The off-chip power cannot be reduced unless the off-chip circuits are optimized. The power dissipation of the cache is dictated by its design style, its size, and its access and coherency-maintenance algorithms.
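Why the clock network weighs so heavily can be seen from the standard dynamic-power relation P = α · C · V_DD² · f, where α is the probability of a 0→1 transition per cycle. A clock net switches every cycle (α = 1), while typical logic switches far less often. The capacitances, the logic activity factor of 0.1, the 1 V supply, and the 1 GHz clock below are all hypothetical values chosen for illustration.

```python
def dynamic_power(c_switched: float, vdd: float, f_hz: float, activity: float) -> float:
    """Dynamic switching power P = alpha * C * Vdd^2 * f.

    activity (alpha) is the probability of a 0->1 transition per clock cycle;
    a clock net toggles every cycle, so alpha = 1 for it.
    """
    return activity * c_switched * vdd ** 2 * f_hz


F_CLK = 1e9     # assumed clock frequency (Hz)
VDD = 1.0       # assumed supply voltage (V)

p_clock = dynamic_power(1.5e-9, VDD, F_CLK, activity=1.0)   # assumed 1.5 nF clock load
p_logic = dynamic_power(35e-9, VDD, F_CLK, activity=0.1)    # assumed 35 nF of logic
share = p_clock / (p_clock + p_logic)
print(f"clock: {p_clock:.2f} W, logic: {p_logic:.2f} W, clock share: {share:.0%}")
```

Even though the assumed clock load is much smaller than the logic capacitance, its activity factor of 1 places its share of dynamic power inside the 20 to 40% range cited above.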
The direction taken by research in a field is a function of the prevailing design philosophy, the requirements imposed on that field by other disciplines, and the shortcomings of existing designs and systems. For latches and flip-flops, the story is no different. Latch and flip-flop designs at any point in time are natural outcomes of the important design requirements of the day and of the primary uses to which they are put. The quality measures of latches and flip-flops have evolved along the same lines and have shaped their design themes. The main features of these themes are functionality, synchronous versus asynchronous operation, area optimization, performance, pipelining, and high-speed/low-power operation.
Apart from ensuring the functionality of the circuit, the design should use the fewest possible transistors to reduce area. A large reduction in transistor count is possible by exploiting the bidirectionality of the MOS transistor, that is, by using the pass-transistor design style. The pass-transistor version of the D flip-flop requires only 12 transistors. Despite this area advantage, pass-transistor designs neither provide full voltage swing nor isolate the outputs from the inputs. One concern associated with clocked flip-flop designs, and one that affects their performance, is the skew that develops between different clock phases. To address this concern, each phase should be routed identically to the others. However, two long parallel wires tend to suffer from crosstalk. Furthermore, when a chip operates above 100 MHz, it is difficult to generate nonoverlapping clocks and to control clock skew properly in a VLSI chip because of statistical variations of the components in the clock distribution path. The difficulty of routing many clock phases has, in many cases, rekindled interest in single-phase clock structures.
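The performance cost of clock skew can be expressed with the standard setup and hold constraints for edge-triggered flip-flops: T ≥ t_CQ + t_logic,max + t_setup + t_skew, and t_CQ,min + t_logic,min ≥ t_hold + t_skew. The delay values below (in ns) are assumptions picked to land near the 100 MHz regime; the function names are hypothetical helpers, not part of any timing tool.

```python
def min_clock_period(t_clk_q: float, t_logic_max: float, t_setup: float, t_skew: float) -> float:
    """Setup constraint: T >= t_cq + t_logic_max + t_setup + t_skew.

    t_skew is the worst-case amount by which the capturing flip-flop's clock
    edge arrives EARLIER than the launching flip-flop's (adverse skew).
    """
    return t_clk_q + t_logic_max + t_setup + t_skew


def hold_ok(t_clk_q_min: float, t_logic_min: float, t_hold: float, t_skew: float) -> bool:
    """Hold constraint: new data must not overwrite the value being captured.

    Here t_skew is the amount by which the capturing clock edge arrives LATER.
    """
    return t_clk_q_min + t_logic_min >= t_hold + t_skew


# Assumed delays in ns.
t_no_skew = min_clock_period(0.5, 8.0, 0.4, 0.0)
t_with_skew = min_clock_period(0.5, 8.0, 0.4, 0.6)
print(f"f_max without skew:     {1e3 / t_no_skew:.0f} MHz")
print(f"f_max with 0.6 ns skew: {1e3 / t_with_skew:.0f} MHz")
print("hold met on a short path:", hold_ok(0.3, 0.2, 0.1, 0.6))
```

Skew thus hurts in both directions: it lengthens the minimum clock period on long paths and can cause hold violations on short ones, which is one reason designers try to minimize the number of clock phases that must be matched.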
Pipelining improves throughput in combinational circuit design. Pipelines are constructed by breaking a large circuit into many stages separated by inserted registers. The objective here is to develop a methodology by which latches and flip-flops (see chapter 5) can be interspersed with logic to yield fast pipeline structures.
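The throughput gain from pipelining, and the limit imposed by register overhead, can be sketched under an idealized assumption: the logic delay splits evenly into N stages and each stage adds a fixed register overhead (clock-to-Q plus setup), with skew ignored. The 8 ns logic delay and 1 ns overhead are hypothetical values.

```python
def pipeline_metrics(t_logic: float, n_stages: int, t_reg: float):
    """Ideal pipelining of a combinational block.

    Assumes t_logic splits evenly into n_stages and each stage adds a fixed
    register overhead t_reg (clock-to-Q plus setup); skew is ignored.
    Returns (clock period, latency, throughput in results per unit time).
    """
    if n_stages < 1:
        raise ValueError("n_stages must be >= 1")
    period = t_logic / n_stages + t_reg
    latency = n_stages * period
    return period, latency, 1.0 / period


for n in (1, 2, 4, 8):
    period, latency, thru = pipeline_metrics(t_logic=8.0, n_stages=n, t_reg=1.0)
    print(f"{n} stage(s): period {period:.2f} ns, latency {latency:.1f} ns, "
          f"throughput {thru:.3f} results/ns")
```

The register overhead is why throughput gains flatten as stages are added: with eight stages the period is 2 ns, so half of each cycle is spent in the registers themselves, while latency keeps growing.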