1. Introduction
Standalone and embedded random-access memories (RAMs) have evolved rapidly, and their high density, low power, and low cost have contributed to improving the affordability and performance of electronic systems such as computers, communication systems, and consumer products. In research and development, the density of standalone RAMs has reached the 4-Gb level for dynamic RAMs (DRAMs) [1] and 72-Mb for static RAMs (SRAMs) [2, 3], along with a reduced RAM cell area, as shown in Figure 1 [4].
Figure 1
In embedded RAMs (e-RAMs), recent developments have focused on high speed under low voltages, exemplified by the 1.5-V, 300-MHz, 16-Mb DRAM macro [5] and the 1.5-V, 1-GHz, 24-Mb L3-SRAM cache [6]. Device miniaturization and the rapidly growing demand for mobile or power-aware systems have resulted in an urgent need to reduce power-supply voltage (VCC) (Figure 2). In standalone RAMs, the standard VCC has been reduced to as low as 1.8 V. In e-RAMs, the voltage has been lowered even more, because it is based on that of the logic circuits in microprocessing units (MPUs) [7], reaching below 1.5 V. In particular, the need for e-RAMs to have low-voltage and small memory cells will become increasingly greater, because they are expected to occupy more than 90% of the area of systems-on-a-chip (SoCs) [8]. Reducing the supply voltage to the region below 1 V, however, places three stringent constraints on design [4]:
- Maintaining a high signal-to-noise ratio (S/N) so that RAM cells operate stably.
- Reducing the leakage currents (especially gate-tunnel current and subthreshold current) in MOSFETs, which increase considerably when the gate-oxide thickness (tOX) and the threshold voltage (VT) are reduced.
- Suppressing speed variations that become prominent at low voltages as a result of design parameter variations.
Figure 2
Unless these problems are solved, RAMs will never be able to operate reliably. In addition, the low-power advantage of CMOS circuits will be lost, and we can envision a scenario in which even CMOS SoCs would suffer from huge dissipations of dc power caused by subthreshold currents, as was the case in the recent bipolar and BiCMOS large-scale integration (LSI) eras.
In particular, reducing subthreshold current is extremely important in RAM circuit design and in random logic LSIs. To the best of our knowledge, the importance of reducing subthreshold currents in low-voltage high-speed room-temperature operation LSIs only became apparent in 1991 [9] as a result of innovative developments with 1.5-V high-speed DRAMs [10, 11]. In addition to the preceding reduction schemes through dynamic substrate control and power switches [12], other key solutions to reduce subthreshold current were proposed in the early 1990s [13–17], although these were all in the standby mode. A solution to reduce subthreshold current in the active mode was presented as early as 1993 using a hypothetical 16-Gb DRAM [18]. Although numerous attempts have subsequently been made in both RAMs and logic LSIs, the problem of reducing subthreshold current in the high-speed active mode remains unsolved, especially in random logic LSIs.
2. Trends and challenges with low-voltage RAMs
There are three major issues in producing low-voltage RAMs—stable RAM-cell operation, reduced leakage currents, and suppression of speed variations that are prominent at a lower voltage. However, developments toward creating a smaller cell and lower power dissipation with the simplest processes possible must also be viewed as major concerns for RAMs, because the three issues are closely related to the degree of device miniaturization and low-voltage operation. The intention of this section is to clarify the issues common to both DRAM and SRAM technology trends. For this discussion, we have mainly assumed the standalone RAM chip shown in Figure 3. The chip comprises a RAM array, iterative circuit blocks such as decoders and drivers, peripheral logic circuits, I/O circuits, and on-chip voltage generators that bridge the supply-voltage gap between the memory cell array and peripheral circuits.
Figure 3
Cell signal charge
The signal charge, QS (QS = CSVDD/2, where CS is the storage capacitance), has been reduced through device miniaturization and voltage lowering, as shown in Figure 4(a) [9, 19]. This reduction destabilizes DRAM-cell operations because of a smaller signal voltage on the data line (DL) in a noisy memory array and larger soft-error rates (SERs). The QS of SRAMs is smaller than that of DRAMs by 1 to 1.5 decades. Thus, the SERs of SRAMs increase rapidly as a result of decreased parasitic CS and the rapid reduction in operating voltage, despite spatial scaling. In contrast, the SERs of DRAMs decrease gradually with device scaling, as shown in Figure 4(b) [20], as a result of the intentionally increased CS and spatial scaling that causes less collection of charges.
Figure 4
The QS is effectively reduced by the ever-increasing necessary VT, VT variation, and VT mismatch under a given VDD. As shown in Figure 5(a), the necessary VT of RAM cells must be increased with greater memory capacity even under ever-lowering VDD. The increase in VT is due to specifications, where the maximum refresh time, tREFmax, required of standalone DRAMs must lengthen with memory capacity, and the data-retention current of SRAMs in power-aware systems must almost be constant. The VT variation slows down the half-VDD DRAM sensing and reduces the available signal charge of SRAM cells. The VT mismatch between cross-coupled/paired MOSFETs in a large number of DRAM sense amplifiers (SAs) and SRAM cells also increases with increased memory capacity and decreased device size, degrading the sensing margin of DRAM cells and the voltage margin of SRAM cells [4].
Figure 5
Unfortunately, even in the absence of extrinsic variations (implant nonuniformity and channel length/width variations), there is an intrinsic VT variation that increases with device scaling as a result of random microscopic fluctuations in dopant atoms in the extremely small channel area. The standard deviation for this intrinsic random VT variation is expressed by
\[
\sigma(V_T) = \frac{q}{C_{OX}} \sqrt{\frac{N_A D}{3LW}} \tag{1}
\]
where q is the electronic charge, COX is the gate-oxide capacitance per unit area, NA is the impurity concentration, D is the depletion layer width under the gate, L is the channel length, and W is the channel width [21]. The standard deviation of VT mismatch (offset voltage), σ(ΔVT), is √2 times σ(VT). The maximum VT mismatch |ΔVT|MAX, however, depends not only on the device parameters, but also on the number of MOSFETs, N, used in the chip. The ratio m = |ΔVT|MAX/σ(ΔVT) increases with N, and its expected value is expressed by
 |
(2) |
The calculated maximum VT mismatch in the n-MOSFETs used in DRAM SAs and SRAM cells is shown in Figure 5(b), where gate areas LW of 9F² (F: feature size) and 2F² are assumed, respectively. The mismatch is doubled with feature-size scaling from 0.35 µm to 0.1 µm. It should be noted that the ΔVT in SRAM cells, as much as 50 mV in a 128-Mb SRAM, is more serious because of larger N and smaller LW. Enlarging MOSFETs to reduce the ΔVT is fatal for a large-capacity SRAM because of the increased SRAM cell area, while it can be done for DRAM SAs without substantially increasing the chip area because only one SA is placed on a pair of DLs.
One method to solve the VT-mismatch problem of DRAM SAs is the mismatch-compensation circuit technique [22, 23], which, however, causes area and access overheads. Therefore, a column-redundancy technique is needed to eliminate a certain percentage of SAs with excessive ΔVT and thereby keep the ratio m' = |ΔVT|'MAX/σ(ΔVT) constant. Here, |ΔVT|'MAX is the maximum ΔVT after application of a redundancy technique. For example, if the ratio of spare columns to normal columns is 1/256 (0.4% of array area penalty), |ΔVT|'MAX is limited to 2.9σ(ΔVT). As a result, the memory capacity limitation is extended by at least three generations, as Figure 5(b) shows. An efficient test method to detect and replace defective SAs (with excessive ΔVT) is also needed. On the other hand, the mismatch of SRAM cells results in random bit defects, which require quite a large number of programmable elements for storing defective addresses (three million for a 32-Mb SRAM with 128-kb spare cells). Thus, an on-chip error-checking and correcting (ECC) circuit is indispensable [24, 25].
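The mismatch statistics above can be checked numerically. The following is a minimal sketch, not the authors' calculation: it assumes the commonly used form σ(VT) = (q/COX)·sqrt(NA·D/(3LW)) for Equation (1), Gaussian-distributed mismatch, and spare columns that replace exactly the SAs in the two tails of the distribution. The function names and the device values are hypothetical.

```python
import math
from statistics import NormalDist

Q = 1.602176634e-19        # electronic charge [C]
EPS_OX = 3.9 * 8.854e-12   # permittivity of SiO2 [F/m]

def sigma_vt(t_ox, n_a, depl, length, width):
    """Intrinsic VT spread, assuming sigma = (q/COX)*sqrt(NA*D/(3*L*W)). SI units."""
    c_ox = EPS_OX / t_ox                      # gate-oxide capacitance per unit area
    return (Q / c_ox) * math.sqrt(n_a * depl / (3.0 * length * width))

def sigma_mismatch(sigma):
    """Mismatch of a MOSFET pair: difference of two independent VTs."""
    return math.sqrt(2.0) * sigma

def m_after_redundancy(spare_ratio):
    """m' such that a Gaussian has P(|x| > m') equal to the replaced fraction."""
    return NormalDist().inv_cdf(1.0 - spare_ratio / 2.0)

# Hypothetical 0.1-um device: tOX = 2 nm, NA = 1e24 m^-3, D = 30 nm, LW = 2F^2
s_vt = sigma_vt(t_ox=2e-9, n_a=1e24, depl=30e-9, length=0.1e-6, width=0.2e-6)
print(f"sigma(VT)       = {s_vt * 1e3:.1f} mV")
print(f"sigma(delta VT) = {sigma_mismatch(s_vt) * 1e3:.1f} mV")
print(f"m' for 1/256 spares = {m_after_redundancy(1 / 256):.2f}")  # about 2.9
```

Under these assumptions the 1/256 spare ratio gives m' close to the 2.9 quoted in the text, and halving both L and W doubles σ(VT).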
Leakage currents
Both subthreshold current and gate-tunnel current greatly affect the operation of RAM cells and peripheral circuits, not only in the standby mode but also in the active mode.
Subthreshold leakage current
In a DRAM cell, a subthreshold leakage current flowing from the cell storage node to the data line shortens the data retention time. In an SRAM, the data retention current of the cell caused by the leakage is dramatically increased, along with decreasing VT, as Figure 6(a) shows [26]. For example, the subthreshold current of a 1-Mb SRAM array reaches as much as 10 A at VT = 0 V and 50°C, although it can be as small as 3 µA at VT = 0.65 V, which corresponds to the maximum retention current acceptable for a standalone SRAM for cellular-phone applications. Here, VT = 0 and 0.65 V are minimum VTs corresponding to nominal VTs of 0.1 V and 0.75 V, respectively, with an assumption of a VT variation of ±0.1 V. Thus, the currents prevent the VT of both DRAM and SRAM cells from scaling, as mentioned above. The leakage current in peripheral circuits, even in the active mode, also becomes huge, as exemplified in Figure 6(b) by a hypothetical 16-Gb DRAM [18]. At present, our main focus is on subthreshold current in the standby mode, because the VT is still too high. For further reductions in VT, however, even numerous circuits, especially the iterative circuit blocks that are inactive during the active period, will start to generate subthreshold currents, causing a huge active current in the chip.
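The two array currents quoted above are consistent with a simple exponential dependence on VT. The sketch below assumes that the array current scales as 10^(−VT/S) with a single subthreshold swing S of 0.1 V/decade; this value of S and the function name are assumptions for illustration.

```python
def scale_leakage(i_ref, vt_ref, vt_new, s=0.1):
    """Scale a subthreshold current from vt_ref to vt_new, assuming I ~ 10**(-VT/S)."""
    return i_ref * 10.0 ** ((vt_ref - vt_new) / s)

# 3 uA at a minimum VT of 0.65 V, rescaled to a minimum VT of 0 V
i_at_zero_vt = scale_leakage(3e-6, vt_ref=0.65, vt_new=0.0)
print(f"{i_at_zero_vt:.1f} A")  # roughly 9.5 A, i.e., of the order of 10 A
```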
Figure 6
Gate-tunnel leakage current
A solution to the issue of gate-tunnel leakage current is also urgently required in designing RAMs for power-aware systems because the gate-oxide thickness, tOX, has been rapidly decreasing, as Figure 7 shows [27]. Recently, MPUs—and thus on-chip SRAM caches—have accelerated the trend to reduce tOX at a rate of ×0.175 over the last ten years, which is almost two times faster than that for standalone DRAMs, and thus, operation of core circuits at less than 1.5 V has become popular. The tOX of standard DRAMs has not been reduced so dramatically as that of MPUs (i.e., SRAMs) because of the need for stable memory-cell operations and low cost. DRAM cells have needed a high operating voltage and thus, a thick-tOX MOSFET for stable operations with word bootstrapping, although a low-voltage—and thus a thin-tOX—MOSFET could be accepted for peripheral circuits. Eventually, a single thick-tOX MOSFET was used throughout the chip to decrease cost. Recently, however, a dual-VDD and dual-tOX device approach similar to that taken with MPUs has become popular in e-DRAMs to achieve higher speeds, exemplified by an 8-Mb e-DRAM with 3.7-ns access (Figure 7) [28], and a 3.3-ns-cycle 6.6-ns-access 16-Mb macro with a dual VDD (1.5/2.5 V) and triple tOX (1.7/2.2/5.2 nm) [5]. Even for standalone DRAMs, the dual-tOX approach would, in the future, be useful for high speed and low power. In this case, the thin tOX of the periphery would follow the International Technology Roadmap for Semiconductors (ITRS) [29], while the thick tOX of memory cells would follow a different path [Figure 5(a) and Figure 7], because it is not scalable, even if devices become increasingly miniaturized, as previously explained. Note that MPU and DRAM performances will slow down, because the pace of the tOX reduction projected by the ITRS [8] will slow down. Moreover, even the ITRS projection cannot be achieved without reducing the rapidly increasing gate-tunnel current developed at a tOX of less than 2–3 nm. 
Unfortunately, however, there have only been a limited number of circuit solutions. For example, the gate leakage current in RAM cells can be suppressed to some extent by reducing the supply voltage [25, 30], and that in peripheral circuits can be suppressed by shutting off the supply path with an inserted thicker-tOX switch [31]. These schemes, however, can be applied only in the standby mode. Since the current in the active mode must also be reduced, development of new gate-dielectric materials with low leakage and high dielectric constant appears to be the most desirable solution.
Figure 7
Speed variations and other issues with peripheral circuits
It is essential to suppress speed variations of peripheral circuits because the degree of speed variation for any given variation in design parameters is increased by lowering VDD, as exemplified by ΔVT/(VDD − VT) [32]. Unfortunately, variations in design parameters such as VT increase with technology scaling, as mentioned previously. The challenge is to instantaneously raise the gate-input voltage, to reduce speed variations through stringent controls of design parameters such as VT, and to control VT or compensate for VT variation through circuit techniques. Power management is an effective way to suppress speed variations, as well as to reduce the power of power-aware systems. Testing methodology that is relevant to leakage currents is also a major area of concern.
3. Low-voltage RAM cells
DRAM cells
One-transistor cells for standalone DRAMs
A smaller cell is the first priority in standalone DRAMs for a given cell-signal voltage (≈ CSVDD/2CD = QS/CD, where CD is the data-line capacitance) of approximately 200 mV read out on each DL. Applying a self-aligned contact to memory cells is essential to reduce the cell area despite the speed penalty inflicted by the increased contact resistance. Leading developments of standalone DRAM cells in research and development are a 6–4F² trench-capacitor vertical-MOSFET cell [33, 34] and a 6F² stacked-capacitor open-DL cell [35]. Here, the open-DL cell necessitates a low-impedance array to suppress inherent array noises [4, 36] generated by imbalances between a pair of DLs, each of which is placed in a different subarray. For standalone DRAMs, as many memory cells as possible must be connected to each DL pair to realize a smaller chip by reducing the overhead area at each DL division, which results in a larger CD. Consequently, a large signal charge, QS, is needed for the necessary signal voltage. Thus, a larger CS is desirable to lower VDD, which has been attained with sophisticated vertical (stacked/trench) capacitors and high dielectric constant (high-k) thin films. The subthreshold current caused by the resulting low VT is cut by the negative word-line (NWL) scheme [4] with a gate offset during nonselected periods, as is discussed in the subsection on circuit applications in Section 4. NWL also reduces the high-level word-line voltage necessary for a full-VDD write operation, enabling the use of a thinner-tOX MOSFET for a given stress voltage [37]. Hence, low-voltage operations with a resulting small subthreshold swing (S-factor) are realized.
One-transistor cells for e-DRAMs
The key to achieving high-performance e-DRAM is to use logic-compatible processes with a non-self-aligned cell contact and a MOS-planar capacitor, together with an extremely small subarray obtained through the multi-divided DL [4]. The resultant increased cell area may be acceptable for e-DRAMs as long as it is significantly smaller than the six-transistor (6-T) full CMOS SRAM cell [7]. In addition, the resulting small CS is offset by the correspondingly small CD, still enabling a sufficient signal voltage. Even increased SERs due to the small CS could be solved by using an ECC [24]. The small subarray coupled with the low contact resistance of cells reduces array-relevant line delays that are major bottlenecks in the access/cycle path. Thus, DRAMs could achieve an even faster access time than SRAMs as a result of the smaller physical size of their subarrays for a given memory capacity. In addition, the small subarray, coupled with circuit techniques such as multi-bank interleaving, pipeline operation, and direct sensing [4], solves the speed problem in the row-cycle of DRAMs. A good example is the so-called 1-T SRAM [38], which incorporated a 1-T DRAM cell with a CS smaller than 10 fF using a single polysilicon planar capacitor and an extensive multi-bank scheme with 128 banks (32 Kb in each) that can operate simultaneously. Somasekhar et al. achieved a row-access frequency higher than 300 MHz for a 0.18-µm, 1.8-V, 2-Mb e-DRAM with a planar capacitor cell [39].
Gain cells
Gain cells such as 3-T and 4-T cells seem to be promising when the supply voltage is reduced to less than 1 V [40]. Figure 8(a) compares areas of various RAM cells. The 1-T cell achieves an area of 8F² when a self-aligned contact, triple polysilicon, and vertical capacitors are used. The cell becomes larger when the contact is replaced by a non-self-aligned contact. The 3-T and 4-T DRAM cells and the 6-T SRAM cell are also shown in the figure. They do not require a special capacitor [7] and they can be fabricated by a logic-compatible process with non-self-aligned contact and single polysilicon. Obviously, in terms of the cell area and simplicity of process, the 3-T cells are attractive compared with 1-T cells and the 6-T cell. Their advantages become more prominent at a lower VDD. Figure 8(b) compares effective cell areas as a function of VDD. Here, the effective cell area is the sum of the actual cell area and the overhead area involved in the DL divisions. Note that even a high-QS 1-T cell requires more DL divisions at a lower VDD to maintain the necessary signal, causing a rapid increase in the effective cell area with decreasing VDD [8, 27]. The lack of gain in the 1-T cell is responsible for the increase. On the other hand, the 3-T, 4-T, and 6-T cells are all gain cells that can develop a sufficient signal voltage without increasing the number of DL divisions, even at a lower VDD, and thus provide a fixed effective cell area that is independent of the VDD. In practice, however, the VDD has a lower limit for each cell. For the 3-T cell, it would be around 0.3 V, assuming a VT for the storage MOSFET of around 0 V, an NWL scheme of VWL = −0.5 V for both read/write lines, and a low VT for the read/write MOSFETs of VT (r) = 0 and VT (w) = 0.3 V. An initial stored voltage (Vstore) of 0.3 V for the cell, and even a decayed Vstore of 0.1 V, can be discriminated because of the gain if an improved sensing scheme is developed.
The detection of and compensation for VT variations and an additional capacitor at the storage node would further improve stability and reliability. For the 4-T cell, the lower limit would be as high as 0.8 V, because the VT of cross-coupled MOSFETs must be higher than 0.8 V to ensure enough tREFmax, and thus the VDD must be higher than this voltage. For the 6-T SRAM cell, it would be around 0.3 V if a raised supply voltage (VDH) (e.g., 0.5 V) were supplied from an on-chip charge pump, as explained in the next subsection. Consequently, the effective cell area of the 3-T cell would be smaller than that of the other cells at a VDD of less than 0.7 V. Note that the small polysilicon vertical-transistor 2-T 5F² cell recently proposed by Nakazato et al. [41] is another example of a gain cell, despite the small current drivability of the transistor.
Figure 8
In any event, in addition to the low junction temperature caused by the ultralow VDD, the wide voltage margin provided by gain cells would enable a sufficient tREFmax. Adjusting the potential profile of the storage node to suppress the pn-leakage current further lengthens the tREFmax and preserves the refresh busy rate, even in larger-memory-capacity DRAMs [4], or it lowers the data retention current in the standby mode. Even if the tREFmax were short, fast e-DRAMs, combined with a small subarray and new architectures, would allow the tREFmax to be drastically shortened, as discussed in the following.
The tREFmax is expressed as tREFmax = n(tRC/γ), where n is the number of refresh cycles, tRC is the RAS cycle time, and γ is the refresh busy rate, defined as γ = n(tRC/tREFmax) [4]. This means that tREFmax can be made smaller by reducing n·tRC or increasing γ. Figure 9 shows an example of tREFmax for a 64-Mb DRAM. There are two cases: the first is for a standalone DRAM where n = 4k (4k refresh cycles) and the second is for an e-DRAM where n = 64. Note that tREFmax can be as short as 0.64 µs for tRC = 1 ns and a refresh busy rate of 10%, while it is 40 ms for a standalone DRAM. Here, a 10% refresh busy rate may be acceptable if refreshes are hidden, as has been done in the 1-T SRAM [38]. One drawback of this scheme is the increased refresh current (IREF), which is expressed as IREF = M·CD·VDD/(2·tREFmax), where M is the memory capacity (i.e., 64 Mb in this example). IREF can increase to as high as 1.3 A in e-DRAMs, while it is as low as 0.32 mA in standalone DRAMs. However, this current may be acceptable for high-performance applications, such as the on-chip cache memories of high-performance MPUs [39].
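The refresh relations above are simple enough to evaluate directly. The sketch below assumes the forms tREFmax = n·tRC/γ and IREF = M·CD·VDD/(2·tREFmax); the function names are illustrative, and the CD and VDD values in the last lines are hypothetical placeholders, not the values behind the currents quoted in the text.

```python
def t_ref_max(n_cycles, t_rc, busy_rate):
    """Maximum refresh time for n refresh cycles of length t_rc at a given busy rate."""
    return n_cycles * t_rc / busy_rate

def refresh_busy_rate(n_cycles, t_rc, t_ref):
    """Fraction of time spent refreshing."""
    return n_cycles * t_rc / t_ref

def refresh_current(m_bits, c_d, v_dd, t_ref):
    """Average refresh current, assuming IREF = M*CD*VDD / (2*tREFmax)."""
    return m_bits * c_d * v_dd / (2.0 * t_ref)

# e-DRAM case from the text: n = 64, tRC = 1 ns, 10% busy rate
t_edram = t_ref_max(64, 1e-9, 0.10)
print(f"tREFmax (e-DRAM) = {t_edram * 1e6:.2f} us")

# Hypothetical CD and VDD, only to show the 1/tREFmax dependence of IREF
m_bits = 64 * 2**20
for t_ref in (40e-3, t_edram):
    i_ref = refresh_current(m_bits, c_d=50e-15, v_dd=1.5, t_ref=t_ref)
    print(f"tREFmax = {t_ref:.2e} s -> IREF = {i_ref:.3e} A")
```

Shortening tREFmax by a given factor raises IREF by the same factor for fixed M, CD, and VDD, which is the drawback noted above.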
Figure 9
SRAM cells
Reducing cell area is the greatest concern in SRAMs, as is suggested by the on-chip, 3-MB, L3 cache [6]. The loadless CMOS, 4-T SRAM [42] shows promise because the cell area is only 56% of that of the 6-T cell. However, it suffers from the data-pattern problem, and it is difficult to accurately control the nonselected word-line voltage to maintain the load current. At the present time, the 6-T cell is the best, despite its large area, because it enables the use of a simple process and design made possible by the wide-voltage margin of the cell. Even in the 6-T cell, however, subthreshold currents and gate-tunnel currents as well as the gate-induced drain leakage (GIDL) increase the retention current with lowering VT and decreasing tOX [43]. Thus, this applies strict limits on how much VT can be reduced. In addition, the soft-error issue is another concern.
To solve this problem, many driving methods and an optimal design for the cell of a small low-voltage cache have been proposed [4, 44]. Recently, a new driving scheme (Figure 10) has been proposed and applied to a 1.5-V, 27-ns access, 6.42 × 8.76 mm², 16-Mb SRAM [25]. The scheme, which lowers the data-line voltage from 1.5 V to 1 V and raises the ground line to 0.5 V at an active-standby mode transition, reduces the total leakage current per cell in the standby mode. At ambient temperature, the measured total current of the conventional scheme is 95 fA. The largest component is the sum of the subthreshold current and GIDL current of the n-MOSFET and p-MOSFET, although the VTs are as large as 0.7 V and 1 V, respectively. The gate-tunnel current of the n-MOSFET is comparable to this component, despite an electrical tOX as thick as 3.7 nm. The new scheme greatly reduces the total current (to 17 fA). Offset source driving (discussed in the subsection on circuit applications in Section 4) of 0.5 V applied to the driver and transfer n-MOSFETs and an electric field relaxation of 0.5 V for all MOSFETs are responsible for the reduction. The reduction is more remarkable at a higher temperature. At 90°C, the total current of the conventional scheme is drastically increased to 1240 fA because of an increase in the subthreshold-current component. Note that GIDL current and gate-tunnel current are insensitive to temperature. The new scheme reduces the total current to 102 fA. To cope with the increased SER caused by the reduced signal charge in the standby mode, an ECC was incorporated with a speed penalty of 3.2 ns and an area penalty of 9.7%, although an additional cell capacitor can also improve the SER [Figure 5(a)] [45, 46].
Figure 10
Figure 11 shows another solution. The cell features a combination of a low-VT transfer MOSFET coupled with an NWL, a boosted power supply (VDH), and high-VT cross-coupled MOSFETs [47, 48]. The NWL increases cell read-current (Icell) without inducing subthreshold current in transfer MOSFETs. The high-VT MOSFETs reduce the subthreshold current. The VDH increases the signal charge, QS, and the drivability of driver MOSFETs against the high VT and VT imbalance. As a result, the cell read-current and the static noise margin (SNM) are dramatically improved, as shown in Figures 12 and 13. The cell read-current increases while SNM decreases as the VT of transfer MOSFETs decreases. However, both the current and SNM increase as the VDH is raised. A usual design condition of Icell ≥ 20 µA and SNM ≥ 100 mV can be realized by VDH − VDD ≥ 100 mV at 1.0-V VDD [Figure 12(a)]. Even at a 0.8-V VDD and the same VDH − VDD, it is realized by a lower VT of the transfer MOSFETs [Figure 12(b)]. Moreover, the cell features a strong immunity against VT imbalance, that is, the ΔVT discussed in the previous section. Figure 13 shows SNM calculated for the worst combination of VT imbalance in a cell. For example, at an imbalance of 100 mV, the lower limit of VDD to achieve an SNM of 100 mV is 0.6 V without boosting (i.e., VDH = VDD). However, it becomes as low as 0.3 V at VDH − VDD = 100 mV. There are no VDD limitations at VDH − VDD = 300 mV. Even for an imbalance as large as 300 mV, the VDD is as low as 0.35 V when VDH is boosted by 300 mV. The power overhead for generating both VDH and the negative word-line voltage is negligible in the active mode. The overhead is only 70 µA, for a total operating current of about 9 mA with assumptions of 128 cells per word line, a 32-b write bus, a VDH − VDD of 300 mV, a 0.5-V negative word-line offset, and 300 MHz at a 1-V VDD.
In the standby mode, however, the generator current becomes larger than the total leakage current of the cell array, calling for a generator-current reduction through circuit techniques that are familiar to DRAM designers [4].
Figure 11
Figure 12
Figure 13
4. Reduction of subthreshold current in peripheral circuits
Reduction scheme concepts
Increasing VT is the best way to reduce the subthreshold current Ileak of a MOSFET, which is expressed by
\[
I_{\mathrm{leak}} \propto W \cdot 10^{\left[\pm V_{GS} - V_T - K\left(\sqrt{2\psi \mp V_{BS}} - \sqrt{2\psi}\right) \pm \lambda V_{DS}\right]/S} \left[1 - \exp\left(\mp\frac{q V_{DS}}{kT}\right)\right] \tag{3}
\]
where the upper signs refer to n-MOSFETs and the lower signs to p-MOSFETs, W is the channel width, VT is the actual threshold voltage (taken as a magnitude), S is the subthreshold swing, K is the body-effect coefficient, 2ψ is the surface potential at strong inversion, and λ is the drain-induced barrier lowering (DIBL) factor [49]. Here, q is the electronic charge, k is the Boltzmann constant, and T is the absolute temperature. Usually Ileak is reduced to 1/10 with a VT increment of only 0.1 V (i.e., S ~ 0.1 V/decade at 100°C). The two ways of obtaining a high-VT MOSFET from a low-actual-VT MOSFET are increasing the doping level of the MOSFET substrate and applying reverse biases. Thus, the selective use of the resulting high-VT MOSFETs in low-actual-VT circuits or the reverse biasing of low-actual-VT circuits decreases circuit subthreshold currents.
Although there have been many attempts to develop reverse-biasing schemes, the basic concepts can still be categorized into the three shown in Table 1:
- (A) Gate-source (VGS) reverse biasing.
- (B) Substrate-source (VBS) reverse biasing.
- (C) Drain-source voltage (VDS) reduction.
Table 1
Here, the VGS reverse biasing scheme can be further categorized as VS-control with a fixed VG (A1) [14, 15] and VG-control with a fixed VS (A2) [13]. The VBS reverse biasing schemes can be categorized as VB-control with a fixed VS (B1) [12, 50] and VS-control with a fixed VB (B2) [51, 52].
The leakage-reduction efficiencies as a function of the offset voltage δ are plotted in Figure 14 using 0.1-µm MOSFET parameters. The reduction efficiency of (A2) is the ratio of Ileak without and with the VGS reverse bias:
\[
r_1 = 10^{\delta/S} \tag{4}
\]
Figure 14
This is quite large because δ has been directly added to the low-actual VT. The reduction efficiency of (B1) is calculated in the same manner:
\[
r_2 = 10^{K\left(\sqrt{2\psi + \delta} - \sqrt{2\psi}\right)/S} \tag{5}
\]
where 2ψ is the surface potential at strong inversion. This is smaller than r1 because of the square-root dependence on δ and the small K. (C) has quite a small reduction efficiency of
\[
r_3 = 10^{\lambda\delta/S} \tag{6}
\]
because of the small λ, unless VDS approaches the thermal voltage (kT/q), where Ileak is drastically reduced by the second factor of Equation (3). Scheme (A1) has the largest reduction efficiency, r1r2r3, because all three effects are combined. (B2) has a reduction efficiency of r2r3, which is larger than that of (B1) because of the additional effect of reducing VDS. Note the inherently small offset voltage that scheme (A) requires for a given leakage reduction. This not only reduces the subthreshold current effectively in low-power mode, but also gives a faster recovery to high-speed mode, as is explained in the next subsection.
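The relative strength of the three mechanisms can be illustrated numerically. This is a minimal sketch that assumes the reduction efficiencies take the forms r1 = 10^(δ/S), r2 = 10^(K(√(2ψ+δ) − √(2ψ))/S), and r3 = 10^(λδ/S), with the second factor of Equation (3) neglected; the parameter values are hypothetical and are not the 0.1-µm parameters used for Figure 14.

```python
import math

def r1(delta, s):
    """VGS reverse bias (A2): the offset adds directly to VT."""
    return 10.0 ** (delta / s)

def r2(delta, s, k, two_psi):
    """VBS reverse bias (B1): VT shift through the body effect."""
    return 10.0 ** (k * (math.sqrt(two_psi + delta) - math.sqrt(two_psi)) / s)

def r3(delta, s, lam):
    """VDS reduction (C): VT shift through DIBL only."""
    return 10.0 ** (lam * delta / s)

# Hypothetical parameters: S [V/decade], K [V^0.5], 2*psi [V], DIBL factor
S, K, TWO_PSI, LAM = 0.1, 0.2, 0.8, 0.05

for delta in (0.1, 0.2, 0.3):
    a2, b1, c = r1(delta, S), r2(delta, S, K, TWO_PSI), r3(delta, S, LAM)
    print(f"delta={delta:.1f} V  A2={a2:8.1f}  B1={b1:5.2f}  C={c:5.2f}  "
          f"A1={a2 * b1 * c:8.1f}  B2={b1 * c:5.2f}")
```

With these assumed parameters a 0.1-V offset gives one decade of reduction for (A2) but well under a factor of two for (B1) and (C), which matches the ordering described in the text.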
The concepts involve two types of biasing, static and dynamic. The former, the so-called dual-VT scheme, statically combines low-VT MOSFETs and the resulting high-VT MOSFETs in core circuits. A CMOS dual-VT scheme [53, 54] in which a low VT is applied only to the critical path occupying a small portion of the core is quite effective in simultaneously achieving high speed and low leakage current, although the basic scheme was originally proposed for an n-MOSFET 5-V 64-Kb DRAM [55]. A difference in VT of 0.1 V reduces the standby subthreshold current to one-fifth its value for a single low VT, although an excessive VT difference might cause a race condition problem between low- and high-VT circuits. The dual-VT scheme is also applied to SRAMs [54, 56]. It was reported that a combination of dual VT and dual VDD achieved a high-speed low-power 1-V e-SRAM [56]. Another application of the dual-VT scheme is a high-VT power switch [12, 14–18] that can cut the subthreshold current of an internal low-VT core in standby mode, as described in the subsection on circuit applications. High-VT MOSFETs can easily be produced in a DRAM [57] by using the internal supply voltages that are required by DRAMs, as explained in the subsection on applications to RAMs. The high VT, however, eventually restricts the lower limit of VDD as the transconductance of the MOSFET degrades at a lower VDD.
The latter changes the VT so that it is low enough in high-speed modes, such as active mode with no reverse bias, while in low-power modes, such as standby mode, it is increased by changing bias conditions, as shown in Table 1.
Circuit applications
This section reviews dynamic biasing schemes based on the above basic concepts, assuming circuits in which all MOSFETs have a low actual VT.
Gate-source self-reverse biasing (A1)
Figure 15(a) is a circuit diagram for self-reverse biasing. It features a low-VT switch p-MOSFET QSP inserted between the source of the MOSFET QP and VDD. The MOSFET QSP stacked to QP is a kind of power switch, working as a source impedance turning on and off during respective active and standby modes. A subthreshold current flowing from QP when QSP and QP are off in the standby mode generates an offset voltage, , on VDL as shown in Figure 15(b), automatically providing a reverse bias to QP so that the current is eventually reduced. This biasing is a combination of VGS reverse biasing, VBS reverse biasing, and VDS reduction, providing the primary effect to VGS reverse biasing and the secondary effect to VBS reverse biasing and VDS reduction, as described above. The gate voltage is VDD, not VDL, to take advantage of the VGS reverse bias. Note that no matter how large the original leak current at QP is, it is eventually confined to the constant current of QSP through the automatic adjustment of the offset voltage . Here, is expressed as VTS VTP + S log(WP/WS), and the current reduction ratio is expressed as 10 /S if secondary effects are neglected [4]. Thus, the reduction is adjustable with , that is, VTS and WS. If VTS is high enough, the current is completely cut off with a larger , creating a perfect switch. A large , however, results in slow recovery time, large charging/discharging current, and spike noise at mode transients. If VTS is low enough, however, becomes smaller (allowing leakage flow), causing an imperfect (leaky) switch, but the above problems are reduced. Moreover, a low-VT switch is favorable to reduce the necessary channel width of QSP, because the increased transconductance can supply the accumulated current of the logic core with a smaller channel width, especially at a lower VDD. Sharing a low-VT switch through iterative circuits in RAMs [Figure 15(c)] is quite effective [14, 15]. 
Because a feature of RAM circuits is that only one of the iterative circuits is active, WS can be comparable to WP with little speed penalty in the active mode, while the offset voltage δ = S log(nWP/WS) in the standby mode for VTS = VTP. Therefore, both leakage and area penalty as a result of adding QSP become negligible with increasing n (i.e., increasing δ). To be more precise, secondary effects must be taken into consideration: The substrate connection of QP to VDD creates substrate reverse bias. The effect of reduced VDS is also added if δ is large (i.e., a small VDS).
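The relations above can be checked numerically. The following is a minimal sketch, assuming the first-order expressions δ = VTS − VTP + S log(nWP/WS) and a remaining-leakage fraction of 10^(−δ/S), with secondary effects (VBS reverse biasing and VDS reduction) neglected. The function names and all numerical values (S = 0.1 V/decade, equal thresholds, equal widths) are hypothetical and chosen only for illustration.

```python
import math


def offset_voltage(vts, vtp, s, wp, ws, n=1):
    """Offset voltage delta (V): delta = VTS - VTP + S*log10(n*WP/WS).

    n is the number of iterative circuits sharing one switch (n = 1 for
    a single stacked pair). Secondary effects are neglected.
    """
    return vts - vtp + s * math.log10(n * wp / ws)


def remaining_leakage_fraction(delta, s):
    """Fraction of the original leakage left per circuit: 10**(-delta/S)."""
    return 10.0 ** (-delta / s)


if __name__ == "__main__":
    S = 0.1  # subthreshold swing in V/decade (assumed value)
    for n in (1, 10, 100, 1000):
        d = offset_voltage(vts=0.2, vtp=0.2, s=S, wp=1.0, ws=1.0, n=n)
        frac = remaining_leakage_fraction(d, S)
        # n * frac is the block leakage relative to one unbiased circuit;
        # with WS = WP it stays near 1, i.e., the switch current.
        print(n, round(d, 3), frac, n * frac)
```

With WS = WP and VTS = VTP, the product of n and the remaining fraction stays close to unity in this model, which is another way of stating that the total leakage of the block is confined to the current of the shared switch regardless of n.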
Figure 15
An extreme case of WS = WP and n = 1 is in the Ileak reduction of series-connected MOSFETs, the so-called stacking effect [58, 59]. This effect can be explained by a combination of self-reverse biasing (A1) and VDS reduction (C), as Figure 16 shows, though (C) is not used alone. The leakage current of QP is reduced through self-reverse biasing, while that of QSP is reduced through reducing VDS. The node-voltage-lowering VM at the connection and the Ileak reduction efficiency are determined by the equilibrium of the two currents and expressed by the crossing point of the two curves. Because the reduction efficiency becomes larger as the number of series MOSFETs becomes larger, the Ileak of NAND gates using series-connected n-MOSFETs is efficiently reduced.
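The equilibrium described above can be illustrated with a small numerical sketch. It assumes a generic textbook subthreshold model with a drain-induced barrier-lowering term, I = I0 · 10^((VGS + η·VDS)/S) · (1 − exp(−VDS/vT)), applied to two identical series n-MOSFETs with both gates at ground, as in a NAND pull-down stack rather than the p-MOSFET pair of Figure 16. The model, the parameter values (S, η, vT, I0), and the function names are assumptions made for illustration and are not taken from the cited references; the body effect is ignored.

```python
import math

S = 0.1             # subthreshold swing, V/decade (assumed)
ETA = 0.1           # DIBL coefficient, V/V (assumed)
V_THERMAL = 0.0259  # thermal voltage kT/q near room temperature, V
I0 = 1.0            # current scale, arbitrary units


def subthreshold_current(vgs, vds):
    """Generic subthreshold current with DIBL (illustrative model only)."""
    return (I0 * 10.0 ** ((vgs + ETA * vds) / S)
            * (1.0 - math.exp(-vds / V_THERMAL)))


def solve_stack(vdd, tol=1e-9):
    """Find the intermediate node voltage VM where both currents are equal.

    Upper device: VGS = -VM (self-reverse bias), VDS = VDD - VM.
    Lower device: VGS = 0, VDS = VM (reduced drain-source voltage).
    Bisection works because the upper current falls and the lower
    current rises as VM increases.
    """
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        vm = 0.5 * (lo + hi)
        if subthreshold_current(-vm, vdd - vm) > subthreshold_current(0.0, vm):
            lo = vm  # upper device still supplies more: node rises
        else:
            hi = vm
    vm = 0.5 * (lo + hi)
    return vm, subthreshold_current(0.0, vm)


if __name__ == "__main__":
    vdd = 1.0
    vm, i_stack = solve_stack(vdd)
    i_single = subthreshold_current(0.0, vdd)
    print("VM =", vm, "reduction factor =", i_single / i_stack)
```

The crossing point found by the bisection corresponds to the intersection of the two current curves; the reduction factor depends strongly on the assumed η and S and should be read only as a qualitative indication.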
Figure 16
Offset gate driving (A2)
Figure 17(a) shows offset gate driving, where the input voltage is “overdriven” by δ. This is difficult to apply to random logic circuits because the logic swing of the output must be smaller than that of the input. However, it is useful to reduce Ileak in bus drivers [13], in power switches that have a low actual VT (Figure 17(b) [60]), and in RAM cells (Figure 17(c) [47, 61]), as was previously explained. Offset gate driving applied to an imperfect switch reduces Ileak in standby, realizing an effectively perfect switch. However, the problems of a perfect switch described above arise.
Figure 17
Substrate (well) driving (B1)
Figure 18(a) shows the circuit for substrate (well) driving, where the substrate voltages of MOSFETs in core circuits change between active and standby modes [12, 50, 62, 63]. Figure 18(b) shows the operating waveforms. This scheme can also be applied to reduce Ileak in power switches (Figure 18(c) [64]).
Offset source driving (B2)
Figure 18(d) [51, 52] has the circuit for offset source driving, with switches QSP and QSN inserted between the MOSFET sources and power supplies. Note that this is quite different from (A1), though both utilize source switches. The input (gate) voltage of (A1), which is the output of the previous stage, is “full swing” (VDD), while that of (B2) is not (i.e., VDL or VSL). This difference results in the large discrepancy in Ileak reduction efficiency, as shown in Figure 14. From this viewpoint, power switches [17] applied to logic circuits can be categorized as (B2). Another application of this scheme is to reduce Ileak in SRAM cells [25, 65], as was discussed earlier.
Figure 18
Comparison
There is a big difference between the two schemes (A) and (B) in mode-transient time, especially recovery (standby-to-active) time. In VGS reverse biasing, the small voltage swing, δ, enables quick recovery (several nanoseconds). In VBS reverse biasing, however, it takes more than 100 ns for recovery when it is applied to a power line, because VBS reverse biasing requires a large VB swing (ΔVB) or VS swing (ΔVS), which is usually more than 1.5 V for a given change in VT (ΔVT). The necessary voltage swing imposes different requirements on substrate driving (B1) and offset source driving (B2). In (B1), the necessary voltage, which is the sum of VDD and ΔVB, is significantly larger than VDD. For example, existing MOSFETs with a body-effect coefficient (K) of 0.2 V^(1/2) require a ΔVB as large as 2.5 V to reduce the current by two decades with a 0.2-V ΔVT. A larger-K MOSFET is needed to reduce the swing. However, this slows down the speed in stacked circuits, such as NAND gates. In contrast, the K value decreases with MOSFET scaling, implying that the necessary ΔVB will continue to increase further in the future owing to a lower K, and there will be a need for a larger ΔVT reflecting the low-VT era. Eventually, this will enhance short-channel effects and increase other leakage currents, such as the GIDL current [66]. A shallow reverse VB setting, or even a forward VB setting in active mode, is also required to effectively increase VT in standby mode, because VT is more sensitive to VB at shallow or forward bias [4]. However, the requirements to suppress VB noise will instead become more stringent. In fact, a connection between the substrate and source every 200 µm [63] to reduce noise has been proposed, despite an area penalty. In addition, problems inherent in LSIs with an on-chip substrate bias (VBB) generator, which DRAM designers have experienced since the late 1970s, may occur even though VDD is low. 
These problems include spike current and CMOS latch-up during power-on and mode transitions, VBB degradation caused by increased substrate current in high-speed modes and screening tests at high stress VDD, and slow recovery time as a result of poor current drivability of the on-chip charge pump.
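The body-effect example in the comparison above (K = 0.2 V^(1/2), a substrate swing of about 2.5 V for a 0.2-V threshold shift) can be reproduced with the standard long-channel body-effect relation ΔVT = K(√(ΔVB + 2ψ) − √(2ψ)). The sketch below assumes a surface-potential term 2ψ = 0.6 V, zero substrate bias in the active mode, and a subthreshold swing of 0.1 V/decade; these values and the function names are assumptions for illustration and do not come from the cited references.

```python
import math


def delta_vt(delta_vb, k, two_psi=0.6):
    """Threshold shift (V) for a reverse substrate swing delta_vb (V).

    Standard body-effect relation, starting from zero substrate bias:
    dVT = K * (sqrt(dVB + 2*psi) - sqrt(2*psi)).
    """
    return k * (math.sqrt(delta_vb + two_psi) - math.sqrt(two_psi))


def required_delta_vb(target_dvt, k, two_psi=0.6):
    """Substrate swing (V) needed to obtain a given threshold shift."""
    root = target_dvt / k + math.sqrt(two_psi)
    return root * root - two_psi


if __name__ == "__main__":
    S = 0.1  # subthreshold swing, V/decade (assumed)
    for k in (0.4, 0.2, 0.1):
        dvb = required_delta_vb(0.2, k)
        decades = delta_vt(dvb, k) / S
        print("K =", k, "needs dVB =", round(dvb, 2), "V for",
              round(decades, 1), "decades")
```

Under these assumptions, K = 0.2 V^(1/2) requires a swing of roughly 2.5 V, in line with the figure quoted in the text, and halving K raises the required swing sharply, which illustrates why a lower K under scaling makes substrate driving harder.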
In offset source driving (B2), the necessary voltages and voltage swing at any node are smaller than VDD. This control becomes ineffective as VDD is lowered owing to a smaller substrate bias. However, the problems described above accompanied by an on-chip VBB generator are not expected.
The energy overhead of offset source driving (B2) through mode transitions is usually larger than that of substrate driving (B1). This is because the parasitic capacitances of source lines (VDL and VSL) are larger than those of substrate lines (VBBP and VBBN), though the necessary voltage swing is smaller, as shown in Figure 14. The parasitic capacitances of VBBP and VBBN consist mainly of junction capacitances between substrate (well) and source/drain of MOSFETs, while those of VDL and VSL include the gate capacitances of on-state MOSFETs as well as junction capacitances. The energy overhead of self-reverse biasing (A1) is quite small because of the small and self-adjusted δ.
Applications to RAMs
Features of RAMs
In the active mode, reducing leakage is extremely difficult because of the limited time to control it. In the standby mode, it is rather easy because there is sufficient time available. Fortunately, however, RAM peripheral circuits favor the reduction of subthreshold current (Ileak) (Figure 19) compared with random logic gates, because of the inherent features of RAMs described in the following. These are exemplified by the modern synchronous DRAM in the figure.
Figure 19
Use of iterative circuit blocks
RAMs consist of multiple iterative circuit blocks with low activation ratios, such as row/column decoders and drivers, each of which has quite a large total-channel width involving subthreshold current. In addition, all circuits in each block, except the selected one, are inactive, even during the active period. This enables Ileak to be controlled simply and effectively with a smaller area penalty than logic LSIs, as shown in Figure 15(c).
Use of input-predictable logic
RAMs are composed of input-predictable circuits, allowing circuit designers to predict all node voltages in the chip and to prepare the most effective subthreshold-current reduction scheme (e.g., VGS self-reverse biasing) in advance. As for input nodes, which are not predictable, the level-fixing input buffer (Figure 20) [15] can force the internal node voltages to be predictable. In standby mode (signal STANDBY is at high level), internal nodes including ai, its complement āi, and the following-stage outputs are forced to be at predetermined levels, irrespective of input node Ai. Similar techniques are applied to logic LSIs, though their node voltages are usually unpredictable because they contain registers or latches to retain internal states. Latches (Figure 21) [59] that fix the output level while retaining the latched data are effective in reducing Ileak in sleep mode. Level-fixing flip-flops [67] combined with self-reverse biasing [15], power switches [60], and level holders [18] enable quick recovery from sleep mode. These techniques, in turn, can be applied to RAM peripheral circuits with registers or latches.
Figure 20
Figure 21
Slow cycle
RAMs feature a slow cycle tRC compared with random logic gates, and this allows each circuit to be active for only a short period within the “long” memory cycle, leaving additional time to control the subthreshold current. This is true for DRAM row circuits, which are slow enough to accept leakage controls. However, the column circuits in modern DRAMs (Figure 19) feature a fast burst cycle and unpredictable circuit operation (every column may be selected during the memory cycle). Therefore, it is difficult to reduce Ileak in column circuits in the active mode. The same difficulty applies to high-speed SRAMs and logic LSIs.
Use of robust circuits
RAMs do not use leakage-sensitive circuits, such as dynamic NOR gates, that require a level keeper to prevent malfunctions caused by leakage [68]. The decoders of modern CMOS DRAMs consist of dynamic (for the row) and static (for the column) NAND gates to reduce the power (Figure 19). NAND decoders discharge only one output node in a selected decoder, while the NOR decoders used in the n-MOS era discharged all output nodes in decoders, except for the selected one.
In contrast, it is difficult to reduce Ileak in random logic circuits because of the noniterative circuit topology, higher activation ratio, unpredictable node states, and faster cycle. Dual static VT [53], the stack effect in NAND gates described above, and circuit reordering [69] are effective to some extent in reducing Ileak in the standby mode of logic LSIs. However, reducing Ileak in random logic circuits in the active mode is more difficult. The only scheme that has been reported thus far is dual static VT, though it has limited reduction efficiency because of the limited VT difference, as previously explained. More effective schemes have yet to be discovered.
Applications to DRAM standby mode
The reduction of subthreshold leakage current applied to iterative circuit blocks, such as a word-driver block, is extremely important in memory design. For example, a low-VT p-MOS switch [QSP in Figure 22(a)] [14, 15] shared with the n word drivers of a 256-Mb DRAM [70] enables the common power line, VDL, to drop by δ as a result of the total subthreshold current flow of nI when the switch is off in standby mode. As it provides each p-MOS driver, Q, with a self-reverse bias, the subthreshold current, I, eventually decreases. Hence, even if an on-chip charge pump for the raised supply VDH necessary for DRAM word-line bootstrapping suffers from poor output-current drivability, the VDH is well regulated. In the active mode, the selected word line is driven after VDL is connected to a supply voltage, VDH, by turning on QSP. Here, the channel width of QSP can be reduced to an extent comparable to that of Q without a speed penalty because of the low activation ratio, 1/n, of the drivers. In a 256-Mb chip, a δ as small as 0.25 V reduced the standby subthreshold current of word drivers and decoders by two decades [Figure 22(b)] without inflicting penalties in terms of speed and area.
Figure 22
Another example is shown in Figure 23(a). This 256-Mb SDRAM [57] with a hierarchical word-line structure utilizes the self-reverse biasing described above combined with “pseudo” multiple static VT using substrate biasing. The circled MOSFETs in the figure are in the subthreshold region during standby mode. Here, self-reverse biasing is applied only to p-MOSFETs (open circles) that produce larger subthreshold current. This is because p-MOSFETs have larger total channel width and larger subthreshold swing due to the buried-channel MOSFET structure. The n-MOSFETs (shaded circles) and the p-MOSFETs in the column decoder have higher VT due to the respective well bias VBB and VDH. By combining both schemes, the total subthreshold leakage current in the power-down/self-refresh mode is reduced to one sixth, as Figure 23(b) shows. The current can be further reduced by applying both schemes to the peripheral circuits.
Figure 23
Applications to DRAM active mode
In the future, with a further reduction in VT, the subthreshold leakage current, IDC, will exceed the capacitive current, IAC, and eventually dominate the total active current, IACT, of the chip [Figure 6(b)], as pointed out as early as 1993 [18, 71]. VGS back-biasing applied to an iterative circuit block, which is divided into m sub-blocks, each consisting of n/m circuits (Figure 24), confines the currents to that of a single selected sub-block [18]. This is because all nonselected sub-blocks have no substantial subthreshold current due to VGS back-biasing (Figure 22) when the switch of the selected sub-block, including the selected word line, is turned on while the others remain off. The above-mentioned multi-static VT also reduces current. The subthreshold currents of low-VT circuits on the critical path are reduced by combining power switches and high-VT level holders (Figure 25) [18, 72]. The power switch goes off just after evaluating the input of the low-VT circuit and holding the evaluated output at the holder. This prevents the output from discharging, allowing the switch to quickly turn on at the necessary time to prepare for the next evaluation. This is a good example of the principle of avoiding large voltage swings with heavily capacitive loads. In fact, it has been reported that these circuits could reduce the active current of a hypothetical 16-Gb DRAM [18, 71] from 1.2 A to 0.1 A (Figure 26), although their effectiveness with an actual chip has not yet been verified.
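The effect of dividing an iterative block into sub-blocks can be estimated with a simple bookkeeping sketch. It assumes that the selected sub-block leaks at the full per-circuit subthreshold current, while each nonselected sub-block is confined to the current of its own (off) switch through self-reverse biasing. The function name, the parameter names, and the numerical values are hypothetical; actual figures depend on the device and layout.

```python
def active_leakage(n, m, i_circuit, i_switch):
    """Estimate active-mode subthreshold current of a partitioned block.

    n         -- total number of iterative circuits (e.g., word drivers)
    m         -- number of sub-blocks, each with n/m circuits
    i_circuit -- subthreshold current of one circuit without reverse bias
    i_switch  -- current to which an off sub-block is confined
                 (the leakage of its shared switch; assumed value)
    """
    if m <= 0 or n % m != 0:
        raise ValueError("n must be a positive multiple of m")
    selected = (n // m) * i_circuit  # only one sub-block has its switch on
    nonselected = (m - 1) * i_switch  # the others are self-reverse biased
    return selected + nonselected


if __name__ == "__main__":
    n, i = 1024, 1.0  # arbitrary units
    print("no partitioning:", n * i)
    for m in (4, 16, 64):
        print("m =", m, "->", active_leakage(n, m, i, i_switch=1.0))
```

In this simple model the first term falls as 1/m while the second grows with m, so an intermediate number of sub-blocks minimizes the estimate; the switching overhead of the sub-block selection is not included.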
Figure 24
Figure 25
Figure 26
5. Speed variations and other issues with peripheral circuits
Other key peripheral circuits are sense amplifiers and low-voltage supporting circuits, such as level shifters, stress-release I/O circuits, and on-chip supply-voltage generators in RAM chips (Figure 3). They play important roles in the stability and speed of RAMs. However, well-known logic-gate blocks in peripheral circuits are also important in terms of suppression of speed variations, as explained earlier. Power management is essential for high-speed, low-power designs of the blocks. Testing methodology that is relevant to leakage currents is also a major area of concern.
Sense amplifiers
Sense amplifiers (SAs) are always slow because they manage a small signal, thus requiring high-speed design achieved by reducing speed variations. The design of SAs [4], which usually have a cross-coupled circuit configuration in terms of low power and small area, can be different for DRAMs and SRAMs. This is because the necessary size, the number in a chip, and the circuit operation are usually different. DRAMs feature a huge number of tiny SAs in a chip, because one SA must be placed at each data line due to refresh requirements. In addition, in the standard mid-point (half-VDD) sensing of DRAMs [4], the SA must operate at the lowest voltage (i.e., half-VDD) in the chip, despite the resulting halved data-line power without a dummy cell and with a low-noise array [4]. As a result, the statistically large VT variations, σ(VT), and low-voltage operation slow down sensing with a wide spread in speed. Increasing the size of SA MOSFETs to reduce σ(VT) and using redundancy and/or ECC to prevent SAs from acquiring an excessively large VT are effective solutions that are similar to those associated with the VT-mismatch issue previously explained in the subsection on cell signal charge in Section 2. In overdrive sensing [73, 74], this problem is solved by applying a higher voltage solely to SA inputs by isolating the data line from the SA or by capacitive coupling. Using additional capacitors may be acceptable in e-DRAMs, where area is of less concern. The recently presented full-VDD (or ground) sensing with a dummy cell [5], which is a revival of the kind of sensing done during the n-MOS DRAM era of the 1970s, solves the problem with a raised voltage (i.e., VDD).
SRAMs have a small number of SAs on a chip, although they must be highly sensitive for a higher speed. Thus, in addition to some of the above solutions for DRAMs, a low-voltage current SA [75] may be acceptable despite the increase in area.
Low-voltage supporting circuits
High-speed level shifters that are proposed for SoCs [76, 77] and bridge the internal low-voltage core and high-voltage I/O circuits could be used for RAMs. Low-cost stress-release I/O circuits [78–80] that manage the high voltage at the interface with a single thin tOX are also important. On-chip supply-voltage generators [4] continue to be essential in the stable operation of RAM cells with high supply voltages and in standardizing the power supply of standalone RAMs. In addition, they reduce subthreshold currents with multi-VT (Figure 23) and speed variations at lower external supply voltages, as discussed below. Key issues are a high efficiency of voltage conversion, a high degree of accuracy in the output voltage, low power during the standby period, and a low cost of implementation [27].
Power management
Power management is a solution to suppress speed variations and further reduce the power dissipation of power-aware systems through static and dynamic control of supply voltages. Power management can also effectively reduce subthreshold currents with VBB control, as mentioned earlier. Many schemes have thus far been proposed. The following subsections give a brief discussion of power-management problems that DRAM designers have experienced, followed by various viewpoints on power-management schemes that have been proposed by logic designers principally for SoCs.
In the past, DRAM designers encountered numerous problems that occurred even in static or quasi-static VBB and VDD. It is well known that the DRAM has been the only large-volume production LSI using a substrate bias that is supplied from an on-chip VBB generator. In the n-MOS DRAM era, when a quasi-static VBB was supplied to the p-type substrate of the whole chip (i.e., both array and periphery), the generator caused instabilities (surge current [55] or a degraded VBB level [4]) at power-on and during burn-in high-voltage stress tests, and shortened the refresh time of cells due to minority-carrier injection to cells [4]. Poor current drivability of the generator consisting of charge pumps, a large substrate current generated from the peripheral circuits, and the substrate structure were mainly responsible for the instabilities. Even so, DRAM designers were fortunate because both the static bias setting of a deep VBB of about −2 V to −3 V and a sufficiently high VT of about 0.5 V allowed stable chip operation with small changes in VT, even with quite large quasi-static VBB variations and VBB noise [4]. In the CMOS era, substrate bias was removed from peripheral circuits primarily to eliminate instabilities caused by the generator and has only been supplied to the array to ensure stable operation.
Even a bump as small as ±10% VDD made dynamic circuits unstable during the n-MOS era. This was due to a charge being trapped at floating nodes when voltage bumps were applied, causing malfunctions at the next cycle. Note that almost all peripheral circuits and DRAM cells were dynamic. Thus, a small diode-connected n-MOS (i.e., level keeper) was connected to the floating nodes of peripheral circuits to allow trapped charges to escape. However, bumps degraded the voltage margin of n-MOS cells, calling for grounded-plate cell capacitors [4] as a partial solution. Even in the CMOS era, memory cells, sensing relevant circuits (such as data-line precharge circuits and sense amplifiers) and row decoders/drivers were still dynamic, while other peripheral circuits have been static. Half-VDD sensing [81] (coupled with a half-VDD cell plate and a boosted word line) has been a circuitry solution because the margins of DRAM cells and the relevant sensing circuits are maintained wide despite voltage bumps. A CMOS feedback level keeper that is familiar to logic designers has been widely used for other dynamic circuits.
Static control of power-supply voltages
Static control is effective in suppressing speed variations of logic circuits while preserving stability of memory cells and memory-cell-relevant circuits. When VBB or internal VDD is statically controlled on the basis of parameter variations, inter-die speed variations can be suppressed, although intra-die speed variations remain unimproved. Negative effects, if any, when supply voltages are controlled statically or quasi-statically could be managed, as memory designers have done thus far. Controlling VBB with an on-chip VBB generator to adjust VT (the basic idea dates back to 1976 [62, 82, 83]) could be widely used to suppress the variations if the previously discussed drawbacks are rectified. Controlling forward VBB, however, is more effective in reducing speed variations [84–86] because the VT–VBB characteristics are more sensitive to VBB [4]. For example, controlling forward VBB reduced VT variations in logic circuits and improved speed of operations by 10% [85]. If a forward VBB is used, however, the requirements to suppress noise become more stringent, calling for a uniform distribution of the forward VBB throughout the chip [27]. Additional current consumption, in the form of bipolar current induced by the forward VBB, is another matter [85] that must be considered.
Control of internal VDD with an on-chip voltage-down converter (i.e., series regulator) [4] seems to be more practical, because the instabilities discussed above are not involved. In fact, a VDD control with both an off-chip buck converter and an internal-delay-detecting circuit [87] reduced the variation between speeds of the worst and best design conditions from five times to ±20% at 0.5 V. However, the use of an on-chip voltage-down converter instead of the buck converter may be more practical because designs of the converter are simpler and have been well established in DRAM designs despite a lower conversion efficiency.
Dynamic control of power-supply voltages
Dynamic control reduces power dissipation and subthreshold currents. However, the problems described above might be compounded and become serious if dynamic control of VDD and/or VBB were applied to RAM chips, because they involve wide and dynamic changes in supply voltages and extremely low VT. Nevertheless, many attempts have been made, although only for the logic blocks of SoCs. Unfortunately, RAM cells and their relevant circuits are incompatible with dynamic controls, and thus they should at least be “quiet.” Moreover, they must operate at a higher VDD. Their inherently small voltage margins are responsible for the requirements for the quiet and higher-VDD operation, as previously explained. Thus, as long as the controls never cause detrimental effects to RAM cells and their relevant circuits, some of them could be applied to parts of peripheral logic circuits (e.g., static circuits) in RAM chips or RAM blocks in SoCs. Note that SRAM blocks using full CMOS SRAM cells may accept dynamic voltage controls to some extent because of wide voltage margins, although care should be taken if dynamic sensing schemes are adopted.
Power switches [88] completely cut leakage currents of internal core circuits, although they incur a long recovery time on heavily capacitive internal power lines, as was explained in the subsection on circuit applications in Section 4. Dynamic voltage scaling (DVS) [89, 90], in which the clock frequency and VDD vary dynamically in response to the computational load, provides reduced energy consumption per process during periods when few computations are performed, while still providing peak performance when required. Note that the highest VDD and lowest VDD that DVS can accept must be determined by the breakdown voltage of MOSFETs and the stability of RAM cells, respectively. This approach, however, becomes less effective in the low-VDD era because the range across which it is possible to vary VDD becomes narrower. In addition, successful operation over a wide range of VDD requires the accurate tracking of all circuit delays. Furthermore, applying DVS would make dynamic circuits (e.g., e-DRAMs) unstable without a level keeper [90], although resultant instabilities depend on the changing rate of VDD and clock frequency.
For partially depleted (PD) SOIs, a wide changing of VDD may cause additional instabilities due to the floating-body effect. DVS does not reduce subthreshold currents; these currents are reduced by elastic-VT CMOS [52], where the clock frequency, VDD, and VBB are all dynamically varied. However, substrate noise may be coupled from the VDD power line when VDD is changed, which is hazardous in an on-chip VBB scheme. The cost and complexity of design are additional problems.
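The trade-off behind DVS discussed above can be sketched numerically. The example assumes that the switching energy per operation scales as VDD² and that gate delay follows the alpha-power-law form VDD/(VDD − VT)^α; both are common first-order models and are not taken from the cited references. The bounds vdd_max (set by MOSFET breakdown) and vdd_min (set by RAM cell stability), the values of VT and α, and the function names are hypothetical.

```python
def relative_delay(vdd, vt=0.3, alpha=1.3):
    """Gate delay in arbitrary units (alpha-power-law approximation)."""
    return vdd / (vdd - vt) ** alpha


def lowest_vdd_for_load(load, vdd_max=1.5, vdd_min=0.8,
                        vt=0.3, alpha=1.3, step=0.01):
    """Lowest supply in [vdd_min, vdd_max] that still meets the load.

    load is the required throughput as a fraction (0 < load <= 1) of the
    peak throughput available at vdd_max.
    """
    allowed_delay = relative_delay(vdd_max, vt, alpha) / load
    steps = int(round((vdd_max - vdd_min) / step))
    for k in range(steps + 1):
        vdd = vdd_min + k * step
        if relative_delay(vdd, vt, alpha) <= allowed_delay:
            return vdd
    return vdd_max  # fall back to the peak-performance supply


def energy_per_operation_ratio(vdd, vdd_max=1.5):
    """Switching energy per operation relative to operation at vdd_max."""
    return (vdd / vdd_max) ** 2


if __name__ == "__main__":
    for load in (1.0, 0.75, 0.5, 0.25):
        v = lowest_vdd_for_load(load)
        print("load", load, "-> VDD", round(v, 2),
              "energy ratio", round(energy_per_operation_ratio(v), 3))
```

Once the load is low enough that the supply reaches vdd_min, no further energy saving per operation is available in this model, which illustrates why DVS loses effectiveness as the usable VDD range narrows. Subthreshold leakage is not modeled here.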
System-level low-power techniques introduced into a SoC would be effective if the problems described above could be solved. For example, ChipOS [91] was introduced to specify the acceptable maximum power and thus, maximum junction temperature. The power of the logic block for each sub-block is managed by controlling the gated clock and power switch to achieve a given power budget. In autonomous decentralized low-power systems [92, 93], the frequency, supply voltage, substrate bias voltage, and power switch of each sub-block are all controlled by the system, according to its supplied processing load, to achieve the minimum power consumption. Even in this scheme, high-speed controls (e.g., for fast wake-up) of subthreshold currents of selected and nonselected sub-blocks would be essential.
Testing
Testing of low-voltage RAMs is problematic. A large subthreshold current makes it difficult to discriminate between defective and non-defective VDD currents (i.e., IDDQ currents), and thereby poses a problem in the IDDQ testing of low-voltage CMOS circuits. IDDQ testing with the application of a reverse VBB [94] is effective when low-temperature measurement and multi-VT design are combined. Lowering VDD only at detection is also important because it dramatically reduces GIDL currents. The unusual temperature dependence of speed (even nullified) at a lower VDD [95, 96] is another concern in speed testing.
6. Future prospects
On the basis of the above, we present future perspectives on low-voltage RAMs in terms of devices and processes, memory cells, peripheral circuits, and architectures.
Devices and processes
Device structure of RAM chips
In the near future, RAMs must unavoidably take at least a dual-tOX, dual-VT, and dual-VDD approach because of different requirements between RAM cells and peripheral circuits, as discussed in the subsections on cell signal charge and leakage currents in Section 2. RAM cells require an ever-higher VT (Figure 5) and thus, a high VDD and thick tOX for stable and reliable operation. In contrast, peripheral circuits (or logic blocks on a SoC) require a low VDD, low VT, and thus, thin tOX for fast and low-power operations, according to ITRS trends [8]. For a higher I/O interface voltage, a triple tOX would be popular.
Low-leakage currents
Even one of the most-advanced schemes (Figure 10) would be less effective for lower-voltage, larger-capacity SRAMs. The resultant total current is as large as 1.6 µA even for a memory capacity as small as 16 Mb—even if a large VT, a thick tOX, and an offset source driving are all combined. Thus, much larger VT and thicker tOX are needed in the future, calling for new devices such as fully depleted (FD) SOIs with a reduced S-factor and new gate-insulator materials. In addition, lowering VDD while keeping the voltage swing the same to preserve the effectiveness of the scheme increases SER to unacceptable levels because of decreased QS in the standby mode, calling for soft-error-immune devices as well as on-chip ECC circuits.
Low voltage and high speed
PD-SOIs [97] have been successfully used for products such as MPUs because they improve the performance of standard digital logic by 20–35% over the comparable bulk process due to reduced capacitance. Major concerns with PD-SOIs, however, are the instabilities [4, 97] caused by the floating body. In particular, the resulting VT variations degrade margins of cells and their relevant circuits, and the degradation is further enhanced at a lower VDD. For SRAMs, some solutions have been proposed. These include reducing the number of cells connected to one column [98] to lower the accumulated subthreshold leakage from nonselected cells. A body contact applied to the paired MOSFETs of a sense amplifier [99] reduces sense-amplifier offset. A body-tied substrate with partial trench isolation [100, 101] is a solution to significantly improve immunity against soft errors while eliminating instabilities. The floating body in DRAMs degrades data retention time in the 1-T DRAM cell [102]. A combination of bulk for the DRAM cell array and an SOI for the peripheral circuits [103] is a solution despite the costly substrate structure.
The use of a dynamic threshold MOSFET (DTMOS) [104], which is built with the body connected to the gate and thus enables a non-floating body, is attractive in terms of low-voltage operation and the suppression of speed variations. The body-to-gate connection limits VDD to less than 0.5 V, even at room temperature, because of the rapid increase in pn-forward current [85]. Within this limit, however, the feature of self-corrective VT [85, 87] that DTMOS provides can suppress speed variations.
Although the concept of DTMOS was originally proposed with PD-SOIs despite the highly resistive body, it has also been realized with bulk MOSFETs with a low-resistive body [87]. Coupled with an internal VDD control, the DTMOS with bulk MOSFETs reduced the delay variation (i.e., delay difference between the worst and best design conditions) to one-fiftieth at 0.5 V. In addition, it realized a drive current three times greater than that of a conventional CMOS, while reducing the subthreshold current by two orders of magnitude.
FD-SOIs are also attractive in low-voltage operation because of the reduced S-factor, a small junction capacitance, small body-bias effects, and a small layout area. Thus, excellent performances [87, 105–107] have been achieved with multi-VT (dual/triple) FD-SOI, despite low voltages (0.3–0.5 V) and still large (0.25-µm) FD-SOI processes. In the 0.1-µm or less era, however, we need to reduce additional VT variations [108], if any, caused by thickness variations of the thin body and to attain multi-VT in specific MOSFETs to reduce subthreshold currents, although uses of special gate materials [109] and gate doping [110] have been proposed. Note that realizing multi-VT through dynamic VBB is impossible with FD-SOIs because of the lack of a body.
Because it seems unlikely that device and process solutions will be developed in time, the pace at which VDD is being lowered should be slowed so that larger MOSFETs are acceptable. Hence, vertical MOSFETs [111] that accept large channel length and tOX without sacrificing density might be effective. Vertical MOSFETs may also reduce RAM cell areas [41]. If the above attempts are unsuccessful, low-temperature bulk CMOS [112] may have a resurgence in the future.
Memory cells
In addition to small, high-speed ECC circuits, new RAM cells such as gain cells are indispensable, as explained in the subsection on DRAM cells in Section 3. In the long run, however, high-speed, high-density nonvolatile RAMs show strong potential for use as low-voltage memories. In particular, leakage-free and soft-error-free structures and the nondestructive read-out and non-charge-based operations that they could provide are attractive in terms of achieving fast cycle times, low power with zero standby power, and stable operation, even at the lower VDD. Simple planar structures, if possible, would cut costs. In this sense, magnetic RAMs (MRAMs) [113] and Ovonic Unified Memories (OUMs) [114] are appealing propositions. In MRAMs, one major challenge remains, namely reducing the magnetic field needed to switch the magnetization of the storage element, while in OUMs, managing the proximity heating of the cell is an issue. In addition, the scalability and stability required to ensure nonvolatility still remain unresolved because development is still in its early stages.
Peripheral circuits and architectures
As far as RAMs are concerned, the subthreshold currents in the active mode could be reduced by improving the above-described CMOS circuits, unless the RAMs are too fast. In high-speed RAMs, such as fast SRAMs or high-speed column-mode DRAMs, however, current reduction is extremely difficult, as discussed in the subsection on applications to RAMs in Section 4. This suggests that a high-speed SoC will suffer from extremely high power dissipation in its random logic gates, because it may remain impossible to control the subthreshold currents of these logic gates at a sufficiently high speed. Hence, the number of gates must be reduced. This implies that new SoC architectures will be required, such as memory-rich SoCs, which effectively reduce the subthreshold current. In addition to new architectures, low-power techniques learned from “old circuits,” such as bipolar, BiCMOS, E/D MOS, capacitive boosting, CML circuits, and even I²L circuits, might be necessary.
7. Summary
This paper reviewed technology trends in low-voltage DRAMs and SRAMs and clarified the challenges facing low-voltage RAMs in terms of cell signal charge, the necessary threshold voltage, VT, and VT variations in the MOSFETs of RAM cells and sense amplifiers, and leakage current. It then discussed developments in conventional RAM cells and emerging cells, such as DRAM gain cells and leakage-immune SRAM cells, from the viewpoints of cell area, operating voltage, and the subthreshold and gate-tunnel currents of MOSFETs. The concepts proposed to date for reducing subthreshold currents, and the features of RAMs relevant to that reduction, were then summarized. After that, the application of these concepts to RAM circuits to reduce subthreshold currents in standby and active modes was discussed, exemplified by DRAMs. The paper then discussed design issues in other peripheral circuits, such as sense amplifiers, I/O circuits, and on-chip power-supply generators, and it investigated the suppression of speed variations and power reductions through power management and testing. With respect to the above, future prospects were considered, with an emphasis on the need for high-speed nonvolatile RAMs, subthreshold-current reduction for high-speed active mode, and memory-rich SoC architectures.
Acknowledgment
The authors would like to thank Dr. D. Hisamoto and Dr. R. Tsuchiya for their valuable discussions on SOI and bulk CMOS characteristics.
**Trademark or registered trademark of Monolithic System Technology Inc. or Ovonyx, Inc.
Received November 21, 2002; accepted for publication March 24, 2003; Internet publication October 30, 2003