IBM Journal of Research and Development

Advanced Silicon Technology   Volume 50, Number 4/5, 2006
DOI: 10.1147/rd.504.0469

Ultralow-voltage, minimum-energy CMOS

by S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K. K. Das, W. Haensch, E. J. Nowak, and D. M. Sylvester

Energy efficiency has become a ubiquitous design requirement for digital circuits. Aggressive supply-voltage scaling has emerged as the most effective way to reduce energy use. In this work, we review circuit behavior at low voltages, specifically in the subthreshold (Vdd < Vth) regime, and suggest new strategies for energy-efficient design. We begin with a study at the device level, and we show that extreme sensitivity to the supply and threshold voltages complicates subthreshold design. The effects of this sensitivity can be minimized through simple device modifications and new device geometries. At the circuit level, we review the energy characteristics of subthreshold logic and SRAM circuits, and demonstrate that energy efficiency relies on the balance between dynamic and leakage energies, with process variability playing a key role in both energy efficiency and robustness. We continue the study of energy-efficient design by broadening our scope to the architectural level. We discuss the energy benefits of techniques such as multiple-threshold CMOS (MTCMOS) and adaptive body biasing (ABB), and we also consider the performance benefits of multiprocessor design at ultralow supply voltages.

1. Introduction

Mobile battery-powered electronic devices have created a growing demand for energy-efficient circuit design. Cellular phones alone represent a large industry and create both an opportunity for innovation and the potential for profitability. Future progress in mobile electronics will depend on the development of inexpensive devices with complex functionality and long battery life. The aim of this paper is to show how devices, circuits, and architectures within this design space may be optimized for minimum energy consumption.

Even in the realm of high-performance microprocessors, power has become a limiting constraint. Traditional scaling of high-performance FETs uses a combination of supply-voltage (Vdd) and threshold-voltage (Vth) reduction to accommodate both performance and power requirements, but the rapid rise of subthreshold and gate leakage has placed limits on this scaling strategy. It is clear that new strategies are necessary to address the power concerns in high-performance designs.

Voltage scaling is the most effective solution to stringent power requirements and has been practically demonstrated in a number of designs. Reduction of the supply voltage (with a fixed threshold voltage) results in a quadratic reduction of dynamic energy at the expense of decreased performance. For many applications, this performance penalty is tolerable. In fact, for a wide range of applications, including sensors and medical devices, a significant performance penalty may be tolerated without compromising the usefulness of the device. High-performance designs may also take advantage of supply-voltage reduction during idle periods when the circuit is performing simple background routines, because performance requirements are relaxed or removed altogether. Regardless of the application, the use of aggressive voltage scaling can lead to considerable energy reductions whenever performance demands are low for a circuit.

This paper explores the limits of minimum-energy CMOS. We show that, for large classes of circuits, minimum energy consumption occurs when the voltage is scaled below the device threshold voltage. In this region, called the subthreshold (sub-Vth) regime, energy consumption can be reduced by 20x relative to standard superthreshold (Vdd > Vth) operation. We use a hierarchical approach to the exploration of sub-Vth design. We begin by discussing the evolution of device behavior as supply voltage is reduced into the sub-Vth regime. We then develop intuition and a methodology for minimum-energy design by considering circuit behavior in the sub-Vth regime. Finally, we use this intuition to discuss how architectural techniques may be used to improve energy efficiency in designs dedicated to low-energy operation as well as designs with both performance and energy requirements.

At the device level, we compare FET sub-Vth and super-Vth characteristics and sensitivities. We find that sub-Vth FET currents are exponentially dependent on Vth and Vdd and that this presents the biggest challenge to device, circuit, and architecture design. In a design that must operate over a wide range of voltages and performance levels, minor tradeoffs, such as lengthening FET channels slightly to reduce Vth variation and making minor Vth adjustments to maintain nominal matching between FETs at low Vdd, should be considered. However, if energy minimization at low Vdd is the critical goal, significant device-optimization studies must be considered. Dual-gate FETs show much promise for future work in energy-optimal design. If combined with low-workfunction metal gates and an increased channel length, dual-gate FETs can help minimize Vth variations and achieve a steep sub-Vth slope, the key parameters for sub-Vth operation.

At the circuit level, we present a simple analytical model for the energy-optimal supply voltage, Vmin. This simple model illustrates the tradeoff between leakage energy and dynamic energy that occurs in energy-optimal circuits. We find that energy efficiency is limited by the rise of leakage and that the designers of energy-efficient circuits should reduce Vmin until it approaches Vdd,limit, the minimum functional voltage. Additionally, we find that heightened sensitivity to the threshold voltage in combination with a low Ion/Ioff ratio results in serious circuit-level robustness concerns when process variation is being considered. Energy efficiency also exhibits a strong sensitivity to threshold variability.

We pay special attention to SRAM arrays. We find that large SRAM arrays have higher Vmin and Vdd,limit values than those used for standard logic. For a standard six-transistor SRAM (6T-SRAM) array, Vdd,limit is shown to be higher than Vmin, suggesting that significant redesign will be necessary to produce robust, energy-efficient SRAM design. The problem is further complicated by process variation, particularly Vth mismatch introduced by random dopant fluctuations. The eight-transistor SRAM (8T-SRAM) cell is presented as a feasible solution.

In the final section of the paper, we discuss energy-efficient architectural techniques. We suggest that techniques such as multi-threshold CMOS (MTCMOS), adaptive body biasing (ABB), and the use of voltage islands can help a design achieve energy optimality by shifting Vmin toward Vdd,limit. Architectural techniques, specifically those that involve multiprocessor design, can ameliorate the performance penalty suffered as a result of low-voltage operation.

The paper is organized as follows. In Section 2 we discuss device-level behavior at sub-Vth and near-Vth voltages. Section 3 includes a discussion of the implications of device-level changes at the circuit level and a general and useful energy model for sub-Vth operation. The complications introduced by SRAM design are given special consideration. Finally, in Section 4 we discuss energy-efficient architectural techniques targeted at both dedicated minimum-energy operation and high-performance operation.

2. Device characteristics at ultralow-voltage operation

As Vdd is reduced to minimize energy per operation, FETs make the transition from superthreshold (super-Vth) operation in strong inversion with large gate overdrives, to near-Vth operation in weak inversion with very small overdrives, and finally into sub-Vth operation. Sub-Vth operation differs from super-Vth operation primarily because the sub-Vth on-current (Ion-sub) depends exponentially on threshold voltage (Vth) and power-supply voltage (Vdd), while the typical super-Vth operation on-current (Ion-super) depends roughly linearly on Vth and Vdd. The Ion-sub exponential sensitivities to Vth and Vdd are captured in the following equation:

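$$I_{on\text{-}sub} = \frac{W}{L_{eff}}\,\mu_{eff}\,C_{ox}\,(m-1)\,v_T^{2}\,\exp\!\left(\frac{V_{dd}-V_{th}}{m\,v_T}\right)\left[1-\exp\!\left(-\frac{V_{dd}}{v_T}\right)\right] \qquad (1)$$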

where vT = kT/q. In these equations, T is temperature, vT is the thermal voltage, k is Boltzmann's constant, q is the charge of an electron, Leff is the effective gate length, μeff is the effective mobility, Cox is the oxide capacitance, W is the gate width, and m is the subthreshold slope factor. On-current is defined in this paper as Ids, when Vgs = Vds = Vdd. It is important to highlight the implicit Vth dependence on Leff in Equation (1) because Ion-sub becomes very sensitive to Leff due to the Vth term. Vth is also dependent on Vds via drain-induced barrier lowering (DIBL), which plays a role in determining the effect Vdd has on Ion-sub. The linear sensitivity of Ion-super to Vth and Vdd for short-channel FETs is captured in the equation

[Equation (2)]

where Rs is the FET source resistance and gmsat is the saturated transconductance, which depends on Leff, Cox, and the carrier saturation velocity. VPO is the pinch-off voltage. The near-Vth Ion sensitivity to Vth and Vdd is bounded by the sub-Vth and super-Vth sensitivities. Figure 1 highlights the differences between super-Vth and sub-Vth current characteristics. Tables 1 and 2 compare key parametric sensitivities of FETs in sub-Vth, near-Vth, and super-Vth operation.
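The exponential sensitivities described for Equation (1) can be explored numerically. The sketch below assumes the standard weak-inversion form, Ion-sub ∝ exp((Vdd − Vth)/(m·vT))·(1 − exp(−Vdd/vT)), with the geometry and mobility terms folded into a single `prefactor`. The function names, the value m = 1.5, and the decision to hold Vth fixed (ignoring DIBL) are illustrative assumptions, not values taken from the paper.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage vT = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def ion_sub(vdd, vth, m=1.5, temp_k=300.0, prefactor=1.0):
    """Sub-Vth on-current with Vgs = Vds = Vdd (arbitrary units).

    prefactor stands in for the geometry, mobility, and capacitance terms;
    m is an assumed subthreshold slope factor. DIBL is ignored, so Vth is
    treated as independent of Vdd.
    """
    vt = thermal_voltage(temp_k)
    return prefactor * math.exp((vdd - vth) / (m * vt)) * (1.0 - math.exp(-vdd / vt))


def ion_ratio_for_vth_shift(vdd, vth, delta_vth, m=1.5):
    """Factor by which Ion-sub drops when Vth rises by delta_vth."""
    return ion_sub(vdd, vth, m) / ion_sub(vdd, vth + delta_vth, m)


def ion_ratio_for_vdd_drop(vdd, vth, delta_vdd, m=1.5):
    """Factor by which Ion-sub drops when Vdd falls by delta_vdd."""
    return ion_sub(vdd, vth, m) / ion_sub(vdd - delta_vdd, vth, m)


if __name__ == "__main__":
    # Operating point loosely modeled on the sub-Vth column of Table 1.
    print("Ion drop for +100 mV Vth:", ion_ratio_for_vth_shift(0.2, 0.27, 0.1))
    print("Ion drop for -100 mV Vdd:", ion_ratio_for_vdd_drop(0.2, 0.27, 0.1))
```

In this simplified model the Vth sensitivity reduces to exp(ΔVth/(m·vT)), so the result depends only on the assumed slope factor. The Vdd sensitivity is slightly larger because the drain term also shrinks as Vdd falls.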

Figure 1


Table 1 Comparison of key sub-Vth, near-Vth, and super-Vth n-FET sensitivities [65-nm technology, room temperature. Effective FET channel length (Lgate) is approximately 35 nm.].

| Parameter | Sub-Vth | Near-Vth | Super-Vth |
|---|---|---|---|
| Vdd | 200 mV | 400 mV | 1 V |
| Vth,sat | 270 mV | 250 mV | 180 mV |
| Ion | ~20 μA/μm | ~80 μA/μm | ~1 mA/μm |
| Sensitivity of Ion to 100-mV Vdd reduction | 18x | 4.6x | 1.20x |
| Sensitivity of Ion to 100-mV Vth increase | 11x | 3.7x | 1.17x |
| Sensitivity of Ioff to 100-mV Vth increase | 16x | 15x | 12x |
| Sensitivity of Ion,n-FET/Ion,p-FET ratio to 100-mV Vth mismatch | 10x | 3.7x | 1.17x |
| Ion/Ioff ratio | 160x | 3,150x | 7,000x |
| Ion/Ioff ratio vs. 100-mV Vth increase | 1.44x | 4.2x | 11x |


Table 2 Comparison of additional key sub-Vth and super-Vth n-FET sensitivities [65-nm technology, room temperature. Effective FET channel length (Lgate) is approximately 35 nm.].

| Parameter | Sub-Vth | Super-Vth |
|---|---|---|
| Sensitivity of Ion to 0.9x inverse sub-Vth slope reduction (at constant Ioff) | ~1.7x | ~1.03x |
| Sensitivity of Ion to 1.3x decrease in Tox (at constant Ioff) | ~1.7x | ~1.23x |
| Sensitivity of Ion to 1.3x increase in L (at constant Vth and slope) | ~0.77x | ~0.94x |
| Sensitivity of Ion/Ioff ratio to 1.3x increase in L (at constant Vth and slope) | ~1x | ~1.22x |
| Sensitivity of Ion to 1.3x increase in mobility | ~1.3x | ~1.05x |
| Sensitivity of Ion/Ioff ratio to 1.3x mobility change (at constant Ioff) | ~1x | ~1.04x |

The exponential sub-Vth Ion sensitivity to Vth drastically affects circuit behavior. First, the circuit delay and power now also depend exponentially on Vth and Vdd. More significantly, current matching between two FETs is exponentially dependent on any difference in Vth. For example, while a reasonable 6σ 100-mV Vth mismatch disturbs the FET current ratios by only approximately 1.17x in super-Vth operation, a similar 100-mV Vth mismatch upsets the current matching by greater than 10x in sub-Vth operation. (We use “x” throughout this paper to indicate “times,” so that, for example, “10x” means a factor of 10 times.) This extreme sensitivity to Vdd and Vth presents the most significant challenge to sub-Vth and near-Vth circuit functionality, and is discussed in later circuit sections.
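The mismatch argument can be made concrete with a small calculation. The sketch below assumes that sub-Vth current changes by one decade for every S millivolts of threshold shift, where S is the inverse sub-Vth slope in mV/decade. The function name and the swing values are hypothetical choices for illustration, not data from the paper.

```python
def sub_vth_current_mismatch(delta_vth_mv, swing_mv_per_decade):
    """Ratio of sub-Vth currents for two FETs whose thresholds differ by delta_vth_mv.

    Assumes an ideal exponential characteristic: one decade of current
    per swing_mv_per_decade of gate-voltage (or threshold) change.
    """
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)


if __name__ == "__main__":
    # Illustrative swings only; a real device's swing depends on the technology.
    for swing in (70.0, 85.0, 100.0):
        ratio = sub_vth_current_mismatch(100.0, swing)
        print(f"S = {swing:.0f} mV/dec -> {ratio:.1f}x mismatch for 100 mV")
```

Under this assumption, a swing of 100 mV/decade gives exactly a 10x mismatch for a 100-mV threshold difference, and any steeper slope (smaller S) gives a larger ratio.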

Product requirements dictate how device optimizations may be used to increase energy efficiency. One product application may have performance restrictions and will therefore be required to operate at high Vdd. In this case, it is likely that the process will be similar to a typical super-Vth process. A multiple-core microprocessor may be an example of such an application, in which the Vdd of each core is varied according to performance needs and power constraints during operation. In this scenario, a few key circuits may require modification to enable low-voltage operation, but the potential to minimize energy is ultimately limited by the high-performance requirements. There may also be some small technology modifications (e.g., small Vth adjustments) that enable low-Vdd operation without a significant high-Vdd performance impact. On the other hand, if the application is aimed solely at low-Vdd operation, the technology and circuits can be optimized to minimize total energy consumption. A number of techniques for addressing these two very different scenarios at the architectural level are discussed in Section 4. In this section, we first consider FET low-Vdd characteristics and sensitivities that have implications for both of these scenarios.

The exponential sensitivity to Vth in sub-Vth and near-Vth operation changes the impact that key device parameters have on FET currents (see Tables 1 and 2). For example, a 10% reduction in inverse sub-Vth slope increases sub-Vth Ion by 1.7x and super-Vth Ion by only 3%. (Sub-Vth slope measures the slope of the drain current with respect to gate voltage and is commonly quoted in its inverse form in mV/decade.) As a result, sub-Vth Ion is much more sensitive to FET gate insulator thickness, tox, because tox plays a critical role in determining the sub-Vth slope. In typical high-performance technologies, FET channels are made as short as possible, and as a consequence sub-Vth slope is suboptimal. Reducing tox improves the sub-Vth slope and significantly increases sub-Vth Ion. The impact of the sub-Vth slope improvement on super-Vth Ion is considerably smaller. In reality, the observed super-Vth Ion increase results from the sublinear saturated transconductance dependence on tox. (Transconductance is an expression of the current-carrying ability of a FET. In general, the larger the transconductance value for a device, the greater the gain it is capable of delivering.) The example in Table 2 shows that a 1.3x reduction in tox improves the sub-Vth Ion by 1.7x and improves the super-Vth Ion by only 1.23x. The authors of [1] show that improved sub-Vth slope makes a significant contribution to the energy savings observed in devices optimized for sub-Vth operation. As we see in Section 3, the leakage reduction resulting from sub-Vth slope improvement provides an attractive strategy for energy minimization.

The impact of FET channel length (Lgate) on sub-Vth Ion is due predominantly to the dependence of Vth and the sub-Vth slope on Lgate. At short channels, Vth decreases and sub-Vth slope degrades as the value of Lgate is reduced because of drain-induced barrier lowering and other short-channel effects. As a consequence, the sub-Vth current increases exponentially at short Lgate values, as shown in Figure 2 for an n-FET in a typical 65-nm technology. Typically, high-performance FETs use Lgate in the region where these parameters vary strongly with length. This creates a considerable challenge for sub-Vth operation because small variations in Lgate values have an enormous impact on Ion. In particular, Lgate linewidth variation leads to significant mismatch in FET drive strengths. This effect can be reduced by increasing values of Lgate (Figure 3 shows Ion variation resulting from linewidth variation as a function of Lgate), but increased gate length degrades super-Vth performance. (The term δL refers to a 3σ variation in linewidth.) On the other hand, sub-Vth performance is not affected as severely as super-Vth performance, because current can be regained with a small reduction of Vth, with no impact on the Ion/Ioff ratio. More importantly, the additional capacitive loading associated with increasing Lgate is considerably smaller for sub-Vth than it is for super-Vth, as shown in Table 3. Sub-Vth operation at longer Lgate values gives the added advantage of a steeper sub-Vth slope. Similar tradeoffs must also be considered with respect to narrow FET channel widths (W). However, the choices of Lgate and W are greatly affected by the circuit application requirements. Gate dimensions have less flexibility in the extreme case in which high performance and high Vdd are at a premium. In this case, the designer's only option is to account correctly for the impact of the Vth and slope variations when predicting sub-Vth and near-Vth circuit behaviors.

Figure 2
Figure 3


Table 3 Sensitivity of sub-Vth vs. super-Vth inverter-chain node capacitance to channel length (130-nm technology).

| Parameter | Sub-Vth | Super-Vth |
|---|---|---|
| Vdd | 200 mV | 1.2 V |
| Total node capacitance at L = 120 nm (fF) | 2.9 | 3.3 |
| Total node capacitance at L = 240 nm (fF) | 3.1 | 4.9 |
| Ratio | 1.1x | 1.5x |

Another point to consider is that p-FET and n-FET thresholds can increase at different rates as Vdd is reduced. This implies that the n–p Ion matching can change drastically, as indicated in Figure 4. Small Vth adjustments can be made to provide the optimal matching for sub-Vth and near-Vth operation; but again, super-Vth operation may deteriorate.

Figure 4

Random channel dopant fluctuation (RDF) is another source of threshold variations that results in FET current mismatch. The 3σ Vth variation (δVth) induced by RDF is inversely proportional to the square root of the channel area (δVth ~ A/(W × L)^(1/2), where A is a constant of 4 in units of mV × μm, W is the FET channel width in μm, and L is the FET channel length in μm) [2]. Table 4 compares Vth mismatch induced by RDF, across-chip channel-length variations (ACLVs), and across-chip rapid thermal anneal (RTA) variations in a 65-nm technology (Lgate ~ 35 nm). At 65 nm, RDF-induced Vth mismatch is comparable to other sources of Vth variability and will dominate in future technologies as channel areas are scaled down. Although the 30–60 mV of Vth variation has a significant impact on super-Vth matching, a similar variation in the sub-Vth region results in a 2–3x variation in Ion. Some relief from current variation can be gained by increasing FET dimensions, but the reduction is less significant than observed when channels are lengthened to reduce the impact of short-channel effects on Vth variation. One possibly useful approach is to provide feedback, at the circuit level, to FET back-gates or wells in order to match thresholds [3]; however, this adds circuit overhead and is impractical for any circuit that depends on the relative strengths of FETs (i.e., “ratioed circuits”). Nonetheless, back-gate or well feedback may enable lower Vdd in a large design if used only with highly sensitive circuits.


Table 4 Sources of random Vth mismatch in 65-nm SOI technology. (ACLV: across-chip channel length variation; RTA: across-chip rapid thermal anneal.)

| Source | ACLV | RTA | Doping fluctuation (L = 35 nm, W = 500 nm) | Doping fluctuation (L = 35 nm, W = 140 nm) |
|---|---|---|---|---|
| Vth mismatch (mV) | 25 | 28 | 30 | 58 |
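The RDF expression δVth ~ A/(W × L)^(1/2) can be checked against the two doping-fluctuation geometries listed in Table 4. The sketch below uses A = 4 mV × μm as stated in the text; the function name is a hypothetical choice.

```python
import math

A_MV_UM = 4.0  # RDF constant from the text, in mV x um


def rdf_delta_vth_mv(width_um, length_um, a_mv_um=A_MV_UM):
    """3-sigma Vth variation from random dopant fluctuation, in mV."""
    return a_mv_um / math.sqrt(width_um * length_um)


if __name__ == "__main__":
    length_um = 0.035  # Lgate ~ 35 nm
    for width_nm in (500, 140):
        dvth = rdf_delta_vth_mv(width_nm / 1000.0, length_um)
        print(f"W = {width_nm} nm, L = 35 nm -> delta Vth ~ {dvth:.1f} mV")
```

The computed values (about 30 mV and 57 mV) are close to the 30 mV and 58 mV entries in Table 4, and the inverse-square-root dependence shows why narrow devices suffer most.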

The important lesson is that Vth and sub-Vth slope variations are the key challenges to sub-Vth and near-Vth circuit designs. If a circuit must operate at both high Vdd and very low Vdd (with a strong emphasis on high-voltage operation), minor tradeoffs should be considered, such as lengthening FET channels slightly to reduce Vth variation and making small Vth adjustments to maintain nominal matching between FETs at low Vdd. However, if energy minimization at low Vdd is of paramount importance, significant device optimization studies must be considered. Significant increases in channel length may be advantageous in order to achieve steep sub-Vth slopes and to minimize Vth and sub-Vth slope variations, keeping in mind that channel capacitance plays a smaller role in sub-Vth operation than in super-Vth operation. The use of mid-gap and quarter-gap metal gates in conjunction with the longer channels should be reconsidered [4]. At short channels, mid-gap metal gates can result in poor sub-Vth slopes, but steep slopes can be achieved at longer channels even with low body-doping concentrations. In this case, the Vth is controlled primarily by the metal workfunction and not body doping; hence, Vth variation due to random doping fluctuations is reduced. Furthermore, mobility increases as the doping concentration is reduced. In the future, advanced dual-gate FINFET [4] and back-gated FET [4] structures will become available. Each of these has different benefits for sub-Vth and near-Vth operation. Recent work has shown that dual-gated FETs are ideal sub-Vth devices because they offer the steepest sub-Vth slope [5]. If combined with mid-gap metal gates at long channels, dual-gated FETs may offer sufficient reduction in Vth variations. Back-gated FETs trade off sub-Vth slope for the possibility of further reduction in Vth variations via back-gate feedback. The ultimate choice in device type will depend on how well Vth can be controlled.

FET-channel resistances are very large for sub-Vth and near-Vth operation, providing optimization possibilities in applications in which minimization of energy is the key goal. Because the channel resistances are high, FET series resistance (Rs) and interconnect resistances can be larger without having an impact on performance. For example, the increase in Rs required to reduce super-Vth Ion by 10% is approximately 100 Ω-μm, while a 10% sub-Vth reduction requires an increase in Rs of approximately 6 kΩ-μm. When considering challenges at the FET level, the high Rs values that can occur with the simplest dual-gate process options may not be a concern. Decreasing the gate overlap of source and drain diffusions in order to reduce Miller capacitance at the cost of larger Rs should be considered. Decreasing the dimensions of interconnect is attractive because capacitances can be reduced at the cost of increased resistance. For a net loaded by wire capacitance, this latter tradeoff could be quite significant. A simple sizing suggests that a 1.25x reduction in active power results when a 4x increase in interconnect resistance is accompanied by a 2x reduction in interconnect capacitance. Of course, all of these options compromise performance at high Vdd.

Reliability and susceptibility to wear-out mechanisms in low-Vdd and high-Vdd operation also differ. Hot-carrier degradation is greatly reduced at low Vdd. However, if circuits must operate at both high and low Vdd, degradation during high-Vdd operation can have a large impact on low-Vdd operation. Negative bias temperature instability (NBTI) and channel hot carrier (CHC) effects that cause Vth shifts will be the major concern. Even if circuits operate only at low voltage, where the NBTI effect is greatly reduced, NBTI is made worse by long standby periods. SRAMs, for example, can have very long standby periods and may be susceptible to NBTI even at low voltage. Thus, it is important to evaluate the impact of these reliability effects at low Vdd. On the other hand, electromigration, which increases interconnect resistance, is less of a concern. Susceptibility to radiation needs consideration, and careful analysis of soft-error rates in sub-Vth logic must be conducted.

In summary, the major challenge in sub-Vth FET design is Vth control. In circuits with high Vdd performance requirements, designers must make small compromises to reduce Vth variation and maintain current matching between FETs. However, if energy minimization at low Vdd is the critical goal, significant device optimization studies must be considered. For minimum-energy CMOS, dual-gate FETs show much promise. If combined with low-workfunction metal gates at long channel lengths, dual-gate FETs will help minimize Vth variation and improve sub-Vth slope, the key parameters for sub-Vth operation.

3. Circuit characteristics at ultralow-voltage operation

The previous section described significant changes in device-level behavior as supply voltage is lowered toward the sub-Vth regime. In particular, the Ion/Ioff ratio is reduced significantly in the sub-Vth region, and devices show an increased sensitivity to the threshold voltage, supply voltage, and sub-Vth slope. At the circuit level, this leads to changes in three areas of concern: noise margins, energy optimality, and sensitivity to process variations. Each of these topics is discussed thoroughly for general CMOS logic, and special consideration is given to the design of SRAM arrays.

CMOS characteristics at the voltage-scaling limit

Most useful designs have maintained a “safe” difference between the values of supply voltage and threshold voltage to guarantee robustness and performance. However, as designers have known for many years, CMOS is a very robust logic family, and in the absence of variability it is unnecessary to maintain a margin between the supply and threshold voltages to guarantee functionality. The supply voltage of a design can therefore be dramatically lowered to limit the dynamic energy consumed. Once the supply voltage drops below the threshold voltage, current takes the form of weak inversion sub-Vth current, which is modeled by Equation (1). Using sub-Vth current to charge and discharge nodal capacitances, a circuit may function at very low voltages. The theoretical lower limit on voltage scaling was first established in [6, 7] as

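$$V_{dd,\mathrm{limit}} = 2\,\frac{kT}{q}\left[1 + \frac{C_{fs}}{C_{ox} + C_{d}}\right]\ln\!\left(2 + \frac{C_{d}}{C_{ox}}\right) \approx 2\,v_T \ln 2 \qquad (3)$$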

where Cfs is the fast surface state capacitance per unit area, Cox is the gate-oxide capacitance per unit area, Cd is the depletion capacitance per unit area, and vT is the thermal voltage kT/q. The second form of Equation (3) is an approximation of the first assuming an ideal MOSFET (a sub-Vth swing of 60 mV/dec at 300 K) with Cfs ≪ Cox and Cd ≪ Cox [6]. An ideal MOSFET can therefore theoretically operate at voltages as low as 36 mV. Sub-Vth swing is generally much higher than 60 mV/dec, so an inverter based on realistic MOSFETs will cease to function at a voltage higher than 36 mV. Furthermore, the result in Equation (3) depends on matching between p-FET and n-FET currents. Proper balancing of pull-up and pull-down networks becomes very difficult when gates have transistor stacks (i.e., series-connected transistors). The relative strengths of the pull-up and pull-down networks are dependent on the values of the inputs to the gate, so the use of stacks raises Vdd,limit well above that of a simple inverter. Table 5 shows simulated Vdd,limit values for several common CMOS gates. Vdd,limit is the lowest voltage for which the voltage transfer characteristic (VTC) has a gain of magnitude at least one when the input voltage Vin equals the output voltage Vout (i.e., |gain| ≥ 1 when Vin = Vout) [6]. Figure 5(a) shows the measured VTC of an inverter at 65 mV with various well biases and confirms that sub-Vth logic functions at room temperature below the previously reported low value of 70 mV [8]. In Figure 5(b), a butterfly curve for a 6T-SRAM cell is shown at a supply voltage of 70 mV, demonstrating that sequential elements also maintain functionality well into the sub-Vth regime. The robustness and energy efficiency of SRAM receive special attention in the subsection below on sub-Vth SRAM design issues.
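The 36-mV figure quoted above can be reproduced with a short calculation. The sketch assumes the commonly cited form of the limit, Vdd,limit = 2·vT·[1 + Cfs/(Cox + Cd)]·ln(2 + Cd/Cox), which reduces to 2·vT·ln 2 for an ideal MOSFET. The function and parameter names are hypothetical, and the non-ideal capacitance ratio in the example is an arbitrary illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def vdd_limit(temp_k=300.0, cfs_over_cox_plus_cd=0.0, cd_over_cox=0.0):
    """Theoretical minimum supply voltage of a CMOS inverter, in volts.

    Both capacitance ratios default to zero, which corresponds to the
    ideal-MOSFET approximation (Cfs << Cox and Cd << Cox).
    """
    v_t = K_BOLTZMANN * temp_k / Q_ELECTRON
    return 2.0 * v_t * (1.0 + cfs_over_cox_plus_cd) * math.log(2.0 + cd_over_cox)


if __name__ == "__main__":
    print(f"Ideal MOSFET at 300 K: {vdd_limit() * 1e3:.1f} mV")
    # Assumed, non-ideal depletion capacitance ratio for comparison.
    print(f"With Cd/Cox = 0.5:     {vdd_limit(cd_over_cox=0.5) * 1e3:.1f} mV")
```

Any nonzero depletion or surface-state capacitance raises the limit, which is consistent with realistic devices failing at voltages above the ideal value.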


Table 5 Lowest functional supply voltage for several common CMOS gates in a 130-nm technology. Vdd,limit increases with the number of inputs because of the imbalance between pull-up and pull-down networks.

| Gate | Vdd,limit (mV) |
|---|---|
| INV | 52 |
| Two-input NAND | 72 |
| Three-input NAND | 87 |
| Four-input NAND | 97 |
| Two-input NOR | 65 |
| Three-input NOR | 74 |
| Four-input NOR | 80 |

Figure 5

Voltage scaling is further limited when complex gates are placed in series. Circuit simulation has been conducted to examine the voltage-scaling limits of typical logic using circuit models of an IBM 65-nm PD-SOI process. For this study, we had to redefine Vdd,limit because we were not considering static isolated gates. Here, we define Vdd,limit as the lowest voltage at which an input signal switching event results in a correct output switching event (the switching threshold is defined as 0.5Vdd). All delay values have been normalized with respect to the delays of the respective gates at Vdd = 1.0 V. Table 6 illustrates Vdd,limit for a wire-load-dominated, representative microprocessor critical path. Such critical paths contain long metal interconnects and are interspersed with optimally sized inverting repeaters. Because of the absence of complex static gates (with stacked devices) in this type of critical path, Vdd,limit is very low. As Table 6 shows, this circuit functions properly at supply voltages as low as 60 mV. Table 6 also shows the Vdd,limit for a logic-dominated, representative microprocessor critical path. This path includes a wide variety of gates including NAND3s, AND-OR-INVERT (AOI) gates, inverters, and transmission-gate-based multiplexers. Multiple instances of each of the mentioned static gates are present in this circuit with varying device sizes and p/n ratios. It is important to mention that this circuit does not have any NOR gates and includes parasitic wire capacitances. Because of the presence of stacked n-FETs and p-FETs in various static gates, this circuit fails to operate below 120 mV.


Table 6 Delays for wire-dominated and logic-dominated representative microprocessor paths with voltage scaling. A delay value indicates that output switching occurred; “fail” indicates that the circuit did not work at the specified voltage. (R and F: rising and falling transitions.)

| Vdd (V) | 1.0 | 0.4 | 0.3 | 0.2 | 0.15 | 0.14 | 0.12 | 0.1 | 0.06 |
|---|---|---|---|---|---|---|---|---|---|
| Wire (R) | 1 | 6 | 27 | 204 | 319 | 503 | 632 | 2,034 | 5,130 |
| Wire (F) | 1 | 6 | 26 | 192 | 300 | 469 | 587 | 1,750 | 4,433 |
| Logic (R) | 1 | 11 | 49 | 333 | 876 | 1,032 | 1,258 | fail | fail |
| Logic (F) | 1 | 8 | 37 | 289 | 925 | 1,189 | 2,110 | fail | fail |
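The redefined Vdd,limit (the lowest voltage at which switching still succeeds) can be read off Table 6 mechanically. The sketch below transcribes the normalized delays from the table and encodes “fail” as `None`; the data layout and function name are hypothetical.

```python
# Supply voltages and normalized delays transcribed from Table 6.
VDD = [1.0, 0.4, 0.3, 0.2, 0.15, 0.14, 0.12, 0.1, 0.06]

DELAYS = {
    "wire_rise":  [1, 6, 27, 204, 319, 503, 632, 2034, 5130],
    "wire_fall":  [1, 6, 26, 192, 300, 469, 587, 1750, 4433],
    "logic_rise": [1, 11, 49, 333, 876, 1032, 1258, None, None],
    "logic_fall": [1, 8, 37, 289, 925, 1189, 2110, None, None],
}


def lowest_functional_vdd(*rows):
    """Lowest Vdd at which every given transition still switches correctly."""
    working = [
        vdd for i, vdd in enumerate(VDD)
        if all(row[i] is not None for row in rows)
    ]
    return min(working)


if __name__ == "__main__":
    wire = lowest_functional_vdd(DELAYS["wire_rise"], DELAYS["wire_fall"])
    logic = lowest_functional_vdd(DELAYS["logic_rise"], DELAYS["logic_fall"])
    print(f"Wire-dominated path:  Vdd,limit <= {wire * 1e3:.0f} mV")
    print(f"Logic-dominated path: Vdd,limit <= {logic * 1e3:.0f} mV")
```

Because the table samples only a few voltages, the result is an upper bound on the true Vdd,limit. It matches the 60-mV and 120-mV values discussed in the text.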

Increasing Vth mismatch between n-FETs and p-FETs further increases Vdd,limit, as is illustrated in Table 7. Two of the rows in the table are labeled “n-FET increase,” and the other two are labeled “p-FET increase,” indicating that the n-FET or p-FET threshold, respectively, has been increased by 100 mV. When the n-FET threshold is increased, circuit failure occurs below 200 mV, while when the p-FET threshold is increased, the circuit is operational at values as low as 150 mV. The simulated circuit has large n-FET stacks and no p-FET stacks (because of the presence of NAND3s and the absence of NORs), so a higher p-FET Vth is more acceptable than a higher n-FET Vth. This explains the lower Vdd,limit for the case in which the p-FET threshold is increased. Although not illustrated in these tables, Vdd,limit is also influenced by the p/n sizing ratios in the various gates of the design.


Table 7 Delays for a logic-dominated microprocessor path with voltage scaling. 100 mV of n-FET/p-FET mismatch is introduced. The rows labeled “n-FET increase” indicate that the n-FET threshold has been increased by 100 mV. (R and F: rising and falling transitions.)

| Vdd (V) | 1.0 | 0.4 | 0.3 | 0.2 | 0.18 | 0.16 | 0.15 | 0.14 | 0.12 |
|---|---|---|---|---|---|---|---|---|---|
| n-FET increase (R) | 1 | 11 | 58 | 381 | fail | fail | fail | fail | fail |
| n-FET increase (F) | 1 | 10 | 54 | 517 | 1,107 | fail | fail | fail | fail |
| p-FET increase (R) | 1 | 13 | 63 | 434 | 646 | 961 | 1,199 | 1,721 | fail |
| p-FET increase (F) | 1 | 9 | 42 | 307 | 460 | 656 | 734 | fail | fail |

CMOS is clearly functional at very low voltages. There is a performance price, however, for low-voltage operation. The sensitivity of delay to both supply voltage and threshold voltage in sub-Vth operation has been alluded to in both Section 2 and Tables 6 and 7. Figure 6 shows inverter delay in a 130-nm process as a function of supply voltage. Simulations of the same inverter also show that a threshold shift of only 50 mV results in a 4x change in delay. These general trends become important in discussions of leakage energy and variability later in Section 3.

Figure 6

Noise susceptibility in sub-Vth logic

Unwanted signals, such as noise, must be addressed in any system of logic, particularly in ultralow-power CMOS. For convenience, we partition the problem into two parts, the circuit noise margin and the noise level in the system. In the first part of this section, we compare the quantitative behavior of the noise margin in sub-Vth logic with that of conventional CMOS. We follow this analysis with a view of noise generation in sub-Vth logic, again comparing the behavioral issues to those found for conventional CMOS.

The noise margin is the difference between a valid output logic level and an input level at which the data of a “victim” circuit will be corrupted. (A victim circuit is one that is subject to noise from an external source.) Thus, for a “high” logic level, the high-state margin is given by VH,margin = VOH − VIH, where VOH is the least positive guaranteed logic output voltage for a valid high state, and VIH is the least positive input voltage required to disturb the logic state of the receiving circuit [see Figure 5(a)]. VL,margin is similarly defined, such that a positive value for VL,margin is sufficient for valid transmission of a “low” logic level. In the discussion that follows, we express these noise margins as fractions of Vdd.

The input levels VIH and VIL can be approximated by the unity-gain points of the receiver in question. For this approximation, the relative input levels increase as Vdd is decreased in the extreme sub-Vth region (Vdd < 100 mV) as long as n-FET- and p-FET-drive levels are carefully balanced. Both VIH and VIL necessarily approach 0.5Vdd as Vdd is decreased toward the low-voltage limit for bi-stability. Figure 7 shows the increase in (Vdd − VOH)/Vdd and VOL/Vdd for a simulated 90-nm-generation CMOS sub-Vth inverter. The reason for the increase is clear: The Ion/Ioff ratio decreases exponentially with Vdd, roughly as exp[Vdd/(mvT)], where m is the “ideality factor,” that is, the subthreshold slope factor described in Section 2. Hence VOL/Vdd, for example, is expected to increase similarly, as exp[−Vdd/(mvT)]. The net effect is that the fractional noise margin in sub-Vth logic is fairly constant as Vdd is reduced from a few hundred mV to 100 mV, below which the increasing values of (Vdd − VOH)/Vdd and VOL/Vdd result in a decreasing fractional noise margin. Figure 8 illustrates how the fractional output and input levels behave from Vdd = 200 mV to Vdd = 45 mV, where operation becomes unstable.
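
As a rough numerical illustration of how quickly the on/off current ratio collapses, the following Python sketch evaluates exp[Vdd/(mvT)] at a few supply voltages. The slope factor m = 1.5 and the temperature of 300 K are illustrative assumptions, not values taken from the simulations described here, and the function name is hypothetical.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K

def on_off_ratio(vdd, m=1.5, temp_k=300.0):
    """Approximate sub-Vth Ion/Ioff as exp(Vdd / (m * vT)).

    m (slope factor) and temp_k are assumed, illustrative values.
    """
    v_t = K_B_OVER_Q * temp_k  # thermal voltage, about 25.9 mV at 300 K
    return math.exp(vdd / (m * v_t))

for vdd in (0.30, 0.20, 0.10, 0.05):
    print(f"Vdd = {vdd * 1e3:5.0f} mV  Ion/Ioff ~ {on_off_ratio(vdd):10.1f}")
```

Under these assumptions, halving Vdd takes the square root of the ratio, which is why the output levels degrade so sharply below roughly 100 mV.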

Figure 7

Figure 8

Noise generation can be approached from a simplified model, shown in Figure 9. Noise is generated by an “offending” path driven by Rdriver (Rdriver may be thought of as the equivalent impedance of a CMOS output stage) with a load consisting of a path directly to ac ground, and a second path with “bad” coupling capacitance to the input of a victim circuit, which in turn has some “good” input capacitance to ac ground. We refer to coupling capacitance as “bad” because it allows noise from one wire to affect another wire. In contrast, we refer to grounded capacitance as “good” because it helps a wire to resist noise. In addition, the gate of the “victim” is driven by Rgood, which, like Rdriver, is the equivalent impedance of a CMOS output stage. Typically, the “bad” coupling capacitance arises from adjacent wires that are parallel to each other. To simplify analysis, we consider two limiting cases: 1) Rgood is much greater than the impedance of Cgood, and 2) Rgood is much less than the impedance of Cgood.

Figure 9

In the first case, noise generation is effectively given by the ratio 2Cbad/(Cgood + Cbad), where the factor of 2 results from a worst-case scenario in which the offending wire and the victim wire switch in opposite directions. This factor of increase is due to the so-called Miller effect. While this ratio is insensitive to the drive characteristics of the transistors, a significant fraction of Cgood can be due to gate capacitance to (ac) ground; this gate capacitance is shown to decrease in sub-Vth operation. Thus, in the case in which Cgate may comprise 30% of the total wire load to ground, the noise coupling may increase by as much as 15% in sub-Vth operation. This provides further impetus to customize the interconnect technology toward using narrow, thin wires, providing lower capacitance at the expense of higher interconnect resistance. In particular, noise “scaling” can be preserved, provided that the interconnect capacitances are reduced in proportion to the effective reduction in gate capacitance.
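
The capacitive-divider argument can be checked with a few lines of Python. This is a minimal sketch with normalized, hypothetical capacitance values; the 50% reduction in gate capacitance is an assumption chosen only for illustration, and the function name is not from the original work.

```python
def coupling_ratio(c_bad, c_wire_gnd, c_gate, miller=2.0):
    """Worst-case capacitive noise coupling, miller * Cbad / (Cgood + Cbad).

    Cgood is split into grounded wire capacitance and gate capacitance.
    """
    c_good = c_wire_gnd + c_gate
    return miller * c_bad / (c_good + c_bad)

# Hypothetical, normalized capacitances: gate capacitance is 30% of Cgood.
c_wire_gnd, c_gate, c_bad = 0.7, 0.3, 0.25

nominal = coupling_ratio(c_bad, c_wire_gnd, c_gate)
# Assume (for illustration only) that sub-Vth operation halves the gate capacitance.
sub_vth = coupling_ratio(c_bad, c_wire_gnd, 0.5 * c_gate)
# Shrinking the wire capacitances by the same factor restores the original ratio.
scaled = coupling_ratio(0.5 * c_bad, 0.5 * c_wire_gnd, 0.5 * c_gate)

print(f"super-Vth coupling:          {nominal:.3f}")
print(f"sub-Vth coupling:            {sub_vth:.3f}")
print(f"relative increase:           {100 * (sub_vth / nominal - 1):.1f}%")
print(f"with scaled interconnect:    {scaled:.3f}")
```

The last line illustrates the point about preserving noise “scaling”: the ratio depends only on relative capacitances, so reducing the interconnect capacitances in proportion to the gate capacitance leaves the coupling unchanged.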

In the second case, in which Rgood is much less than the impedance of Cgood, the noise coupling is given by (ζCbad)^(−1)/[Rgood + (ζCbad)^(−1)]. Note that no factor of 2 is needed because Rgood can be dominant only in a static case. Here we must consider the scaling of ζ, which is given by the inverse of the characteristic rise or fall time of the offending signal. This rise or fall time is in turn set by Rdriver, which increases at the same rate as Rgood as the design is pushed into sub-Vth operation. Thus, the noise coupling is expected to be unchanged in this limit. Consequently, sub-Vth noise coupling may actually be smaller than super-Vth noise coupling if sub-Vth interconnect is optimized for lower capacitance and higher resistance.
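
The invariance argument can be made explicit with a short calculation. As a sketch, assume that the inverse edge time of the offending signal is set by an RC product, ζ ≈ 1/(Rdriver·Cload), where Cload is a hypothetical symbol for the total capacitive load on the offending driver (it is not defined in the text above):

```latex
\frac{(\zeta C_{\mathrm{bad}})^{-1}}{R_{\mathrm{good}} + (\zeta C_{\mathrm{bad}})^{-1}}
  = \frac{1}{1 + \zeta R_{\mathrm{good}} C_{\mathrm{bad}}}
  \approx \frac{1}{1 + \dfrac{R_{\mathrm{good}}}{R_{\mathrm{driver}}}\,
                       \dfrac{C_{\mathrm{bad}}}{C_{\mathrm{load}}}}
```

Under this assumption the expression depends only on the resistance ratio Rgood/Rdriver and the capacitance ratio Cbad/Cload. If both resistances rise by the same factor in sub-Vth operation, the first ratio is unchanged, and only a change in the interconnect capacitances can alter the coupling.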

In conclusion for this section, note that noise margins are largely unchanged in sub-Vth CMOS (to as low as 100 mV), provided that n-FET/p-FET matching can be adequately constrained. This may be non-trivial, particularly in light of stochastic mechanisms such as random dopant fluctuations (RDFs). Below 100 mV, even with perfect matching, the noise margin intrinsically decays until it collapses at the stability limit. Interconnect optimization of capacitance at the expense of resistance is desirable to compensate for reduced gate capacitance as a noise shunt. Otherwise, some extra allowance for noise coupling may be required.

Finding the energy minimum

Optimal power and energy analysis
Though absolute noise margins and delay both degrade in the sub-Vth region, CMOS logic continues to function at very low voltages. To justify these delay and robustness penalties, we now consider the power and energy efficiency when operating in the sub-Vth regime. Power consumption has two components: dynamic and leakage. Thus, the expression for power is

$$P = P_{\mathrm{dyn}} + P_{\mathrm{leak}} = \tfrac{1}{2}\, C_s V_{dd}^{2}\, \alpha f + I_{\mathrm{leak}} V_{dd}, \tag{4}$$

where Cs is the switched capacitance of a single inverter, alpha is an activity factor, f is the clock frequency, and Ileak is the leakage current. The “activity factor” is the average number of transitions on a node per clock cycle.

Both dynamic and leakage power benefit from supply-voltage reduction, and dynamic power will continue to improve quadratically as voltage is scaled to the lowest value that guarantees functionality. Circuit designers have traditionally given power more attention than energy, but energy is generally a more suitable metric when battery life is the overriding priority. We therefore begin with a study of energy and show that, unlike power optimization, energy optimization relies upon a compromise between dynamic and leakage energies. Just as in the case for power, energy comprises dynamic and leakage components:

$$E = E_{\mathrm{dyn}} + E_{\mathrm{leak}} = \tfrac{1}{2}\, C_s V_{dd}^{2}\, \alpha + I_{\mathrm{leak}} V_{dd}\, t_p. \tag{5}$$
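
Equations (4) and (5) translate directly into code. The following Python sketch evaluates both at a single operating point; the numerical values are hypothetical and chosen only to show the bookkeeping, and the function names are illustrative.

```python
def power(c_s, vdd, alpha, f, i_leak):
    """Equation (4): dynamic plus leakage power, in watts."""
    return 0.5 * c_s * vdd**2 * alpha * f + i_leak * vdd

def energy_per_op(c_s, vdd, alpha, i_leak, t_p):
    """Equation (5): dynamic plus leakage energy per operation, in joules."""
    return 0.5 * c_s * vdd**2 * alpha + i_leak * vdd * t_p

# Hypothetical operating point (illustrative values only).
c_s, alpha = 1e-15, 0.1      # 1 fF switched capacitance, 10% activity
vdd, i_leak = 0.3, 1e-10     # 300 mV supply, 100 pA leakage
t_p = 1e-6                   # 1 us delay, so the clock frequency is 1 / t_p

p = power(c_s, vdd, alpha, 1.0 / t_p, i_leak)
e = energy_per_op(c_s, vdd, alpha, i_leak, t_p)
print(f"P = {p:.3e} W, E = {e:.3e} J")
```

When the clock period equals the delay tp, the energy per operation is simply the power multiplied by tp. The leakage term in the energy therefore grows with delay even though the leakage power does not.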

Note that we can safely ignore short-circuit energy in this analysis; the reason is explained below. Typical short-circuit current analysis assumes that direct-path current approaches zero when the supply voltage drops below Vth,n-FET + |Vth,p-FET| because the direct-path current is entirely sub-Vth current below this voltage. Because sub-Vth logic is driven exclusively by sub-Vth current, we redefine short-circuit current as any direct-path current beyond the leakage current present in steady state. A first-order analysis shows that the total short-circuit energy is an approximately quadratic function of Vdd and may be combined with dynamic energy. Assuming a triangular short-circuit current distribution and that Qsc is short-circuit charge, Isc is the peak short-circuit current, and tsc is the total time that short-circuit current exists:

Equation 6(6)

Note that Isc and Ion are assumed to scale identically with Vdd, so their dependencies cancel. As a result, Qsc is linear with Vdd, and short-circuit energy, Esc = QscVdd, is quadratic with Vdd. Simulations show that the quadratic relation fits very well in the super-Vth region, but in the sub-Vth region, Esc actually decreases faster than predicted by the quadratic model. This change in behavior in the sub-Vth region is minimal, though, and can be ignored with only a small penalty. If we assume a quadratic dependence on Vdd, Esc may be modeled using a multiplier in front of dynamic energy because dynamic energy is also quadratically dependent on Vdd. We therefore ignore short-circuit energy without invalidating our analysis.
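
The first-order scaling argument can be written out as follows. This is a sketch of the reasoning only: it assumes that the short-circuit interval tsc is proportional to the switching time Cs·Vdd/Ion, and k is a hypothetical proportionality constant that does not appear in the original analysis.

```latex
Q_{sc} = \tfrac{1}{2}\, I_{sc}\, t_{sc}, \qquad
t_{sc} \approx k\, \frac{C_s V_{dd}}{I_{on}}
\;\;\Longrightarrow\;\;
Q_{sc} \approx \frac{k}{2}\, C_s V_{dd}\, \frac{I_{sc}}{I_{on}} \propto V_{dd},
\qquad
E_{sc} = Q_{sc} V_{dd} \propto V_{dd}^{2}.
```

The factor of one half is the area of the assumed triangular current pulse. The ratio Isc/Ion is treated as independent of Vdd, which is the cancellation referred to above.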

A critical difference between energy and power exists as illustrated by Equations (4) and (5), namely that leakage energy (per operation) is dependent on circuit delay, tp. Figure 6 shows that the delay increases rapidly as the supply voltage scales, particularly when the supply approaches the threshold voltage. Even though Ileak [Equation (1)] decreases with supply voltage, the increase in delay is so dramatic that leakage energy quickly overtakes dynamic energy. Figure 10 shows the average power consumption and the energy consumed per operation for a chain of 50 inverters in an industrial 130-nm technology. Here, an “operation” is the work done in a single clock period. In this example, Vdd is scaled, and Vth is fixed at approximately 400 mV. Power decreases monotonically, while energy shows a minimum caused by the rapid rise in leakage energy at low supply voltages. For the circuit under consideration, the energy-optimal point occurs in the sub-Vth region and yields an energy reduction greater than 20x. This result is confirmed by the authors of [9] and [10], who observe sub-Vth voltages to be optimal for an inverter chain and an FIR filter, respectively, with Vth fixed. However, the threshold voltage does not have to be fixed when scaling the supply voltage. The authors of [11] study the energy benefits of simultaneous Vdd and Vth scaling and find that sub-Vth operation is generally more energy-efficient than super-Vth operation when performance requirements are low. Because we are primarily concerned with minimizing energy, we can accept low performance and can take advantage of sub-Vth operation. Given the results of [9–11], we first consider the optimization of circuits in the sub-Vth region. Because a number of applications are expected to have higher energy-optimal supply voltages, we also consider the implications of near-threshold and super-Vth operation. In both cases, the supply voltage is scaled relative to a fixed threshold voltage.

Figure 10

Inverter chain analysis
Simple analysis of an inverter chain helps build an understanding of the balance between dynamic and leakage energies that occurs at the energy-optimal supply voltage. In [9], an inverter chain with n identical stages and an activity factor of alpha is considered. The energy per switching event of this system is given by

Equation 7(7)

where n is the number of inverter stages, tp is the delay of a single inverter, Ileak is the leakage current of a single inverter, and all other variables are as previously described for Equation (4). It is clear that sub-Vth operation is optimal for many circuits [9, 10], so this analysis assumes sub-Vth operation. In this analysis, the delay of an inverter with a step input voltage, tp,step, is modeled using

Equation 8(8)

This expression is valid for both super-Vth and sub-Vth operation, but in the latter case Ion is modeled using Equation (1). The authors of [9] show that tp in the sub-Vth region can be approximated as

$$t_{p,\mathrm{actual}} = \eta \cdot t_{p,\mathrm{step}}, \tag{9}$$

where η is a technology-dependent parameter that represents delay degradation due to the slope of the input signal. Substituting Equations (8) and (9) into Equation (7) results in the following equation:

Equation 10(10)

The variables Ileak and Ion both take the form of Equation (1) because we are assuming operation in the sub-Vth regime. Consequently, all terms in the Ileak and Ion expressions cancel except for the exponential Vg dependence, and Equation (10) may be further simplified to

Equation 11(11)
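
The chain of substitutions described above can be sketched under explicit assumptions: each of the n stages contributes the dynamic energy of Equation (5), each stage leaks for the full path delay n·tp, and the step delay is proportional to Cs·Vdd/Ion with a hypothetical constant k. These forms are chosen to be consistent with the surrounding text and may differ in constant factors from the expressions in [9].

```latex
\begin{aligned}
E &\approx n\left[\tfrac{1}{2}\,\alpha\, C_s V_{dd}^{2}
      + I_{\mathrm{leak}} V_{dd}\,(n\, t_p)\right], \\
t_p &= \eta\, t_{p,\mathrm{step}}, \qquad
t_{p,\mathrm{step}} \approx k\,\frac{C_s V_{dd}}{I_{on}}, \\
E &\approx n\, C_s V_{dd}^{2}\left[\tfrac{1}{2}\,\alpha
      + k\,\eta\, n\,\frac{I_{\mathrm{leak}}}{I_{on}}\right]
   \approx n\, C_s V_{dd}^{2}\left[\tfrac{1}{2}\,\alpha
      + k\,\eta\, n\, e^{-V_{dd}/(m v_T)}\right].
\end{aligned}
```

The last step uses the sub-Vth current model: with the gate at Vdd for Ion and at zero for Ileak, every factor except the exponential gate-voltage dependence cancels in the ratio. Dividing the bracket by α shows that n and α enter only through the ratio n/α.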

Equation (11) reveals a great deal about the energy dependencies in the sub-Vth voltage regime. For example, the strong dependence of energy on supply voltage (Vdd) is evident. The quadratic voltage term is initially the dominant term in the expression, but as voltage is reduced far into the sub-Vth regime, the exponential dependence on supply voltage (which reflects circuit delay) begins to dominate. Figure 10(b) illustrates the fact that there is a distinct energy minimum with respect to voltage. We can easily find the supply voltage at the minimum by determining the derivative of Equation (11) and setting it equal to zero. The resulting equation is nonlinear and must be solved using numerical methods. The final expression for the supply voltage at the energy minimum, which we denote Vmin, is shown in the following equation:

Equation 12(12)

Vmin does not necessarily correspond to Vdd,limit, described in the subsection on CMOS characteristics at the voltage-scaling limit. In fact, as the subsequent discussion shows, Vmin is usually well above Vdd,limit because of the dominance of leakage.

In Equation (12), Vmin depends only on the number of device stages, the activity factor, and two process-related parameters, η and m. This simple model has great value because switching between technologies requires only the determination of η and m. The accuracy of the model is confirmed in [9].

The importance of logic depth n and activity factor alpha in Equation (12) is obvious. To understand the relationship between these two parameters, we replace the ratio of n to alpha with a single parameter neff. This substitution is valid because logic depth and activity factor affect the energy characteristics of a circuit in very similar ways. A circuit with many stages (large n) will be leaky because the leakage time for each stage is increased. Similarly, a circuit with a low activity factor is more likely to be leakage-dominated because dynamic energy is proportional to the activity factor. In both cases, the circuit will exhibit a higher Vmin.
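
The role of neff can be explored numerically. The Python sketch below is not Equation (12): it assumes a normalized energy of the form E ∝ x²·(0.5 + k·η·neff·exp(−x)), with x = Vdd/(m·vT), and finds the interior minimum by bisection. The constant k, the value η = 2, and the value of m·vT are hypothetical, and the function name is illustrative.

```python
import math

def vmin_over_mvt(n_eff, eta, k=1.0):
    """Return x = Vmin / (m * vT) for the assumed energy model

        E(x) ~ x**2 * (0.5 + k * eta * n_eff * exp(-x)),

    by solving dE/dx = 0 on the branch x > 3 (the local energy minimum).
    """
    a = k * eta * n_eff
    if a <= math.exp(3.0):
        raise ValueError("no interior energy minimum for this k * eta * n_eff")

    def g(x):  # dE/dx = 0  <=>  a * (x - 2) * exp(-x) - 1 = 0
        return a * (x - 2.0) * math.exp(-x) - 1.0

    lo, hi = 3.0, 3.0
    while g(hi) > 0.0:       # expand the upper bound until the root is bracketed
        hi *= 2.0
    for _ in range(200):     # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

M_VT = 1.5 * 0.02585  # assumed slope factor times thermal voltage, in volts
for n_eff in (50, 500, 5000):
    x = vmin_over_mvt(n_eff, eta=2.0)
    print(f"n_eff = {n_eff:5d}  Vmin ~ {1e3 * x * M_VT:6.1f} mV")
```

Because only the product η·neff enters, a chain with n = 10 and an activity factor of 0.1 gives the same result as a chain with n = 100 and an activity factor of 1. Results that land above the threshold voltage fall outside the sub-Vth assumption of this model and should not be trusted.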

To properly understand the effect of neff and Vmin on a circuit, it is important to understand the notion of transistor utility [12], which embraces the idea that all transistors in an energy-efficient design should spend as much time as possible doing useful computation because the circuit consumes wasted energy during idle time. We can nominally assume that dynamic energy is a measure of useful computation (because switching transistors consume dynamic energy) and that leakage energy is the penalty paid for idle time. The goal of an energy-efficient design is therefore to optimize the circuit structure such that the ratio of dynamic energy to leakage energy is maximized. Maximizing the dynamic-to-leakage ratio enables a designer to effectively use supply voltage as a “lever” to decrease dynamic energy consumption and consequently total energy consumption. In other words, a design with high transistor utility generally has a lower Vmin than a similar design with low transistor utility. In the ideal case, Vmin approaches Vdd,limit, the lowest functional voltage.

If we adopt transistor-utility maximization as the goal of a design, it is obvious that a large neff is undesirable. Long paths or paths with low activity increase effective idle time and increase Vmin. Logic depth and activity factor tend to be a function of high-level architectural decisions, so they may not be characteristics that a circuit designer can easily exploit. However, the circuit designer does have the ability to decrease the penalty for idle time. Leakage-reduction techniques such as multiple-threshold CMOS (MTCMOS) [13], input vector control [14], and threshold control via adaptive body biasing (ABB) [15] are all tools that have the potential to lower Vmin and consequently the total energy consumed by the circuit. The leakage problem may also be addressed at the device level by improving the sub-Vth slope. Section 2 pointed to the importance of sub-Vth slope in low-voltage design, and our simple analysis clearly shows that minimizing sub-Vth slope should be a key goal of energy-optimal design.

We now consider whether the previous analysis is valid for a more complex design. Silicon measurements of a simple 8-bit microprocessor with 2-Kb memory in a 130-nm technology are shown in Figure 11. A more detailed description of the architecture may be found in [12]. The form of the curve is identical to that of the inverter chain (thus confirming the validity of the inverter chain analysis), but Vmin is higher than that of the inverter chain. The memory, which has a very low activity factor compared with typical logic, is largely responsible for the higher Vmin. This example suggests that very low circuit activity could push Vmin into the super-Vth regime. The next section investigates this topic.

Figure 11

Energy optimality in the near-Vth and super-Vth regions
As Figure 11 shows, circuit structures with very low activity factors, such as memories, face serious leakage penalties for extremely low-voltage operation. As the leakage energy of a design increases relative to dynamic energy, the optimal supply voltage tends toward higher values. If the threshold voltage is held constant, the optimal supply voltage will likely be near or above the threshold voltage.

We can extend the inverter chain analysis example from the previous section to make some simple but powerful conclusions about operation in the near-Vth and super-Vth regions. Consider an inverter chain with variable activity factor. As the switching activity decreases, the leakage energy of the circuit is unchanged, but the dynamic energy shifts lower. This downward shifting of the dynamic energy curve is illustrated in Figure 12 for a range of alpha values. As the dynamic-energy curve moves relative to the leakage curve, the location of the minimum-energy voltage, Vmin, shifts. When Vmin approaches the threshold voltage, the leakage-energy curve flattens because delay (and consequently leakage) is less sensitive to supply-voltage changes above the threshold voltage. As a consequence, the energy minimum is flattened, so that the choice of supply voltage can deviate slightly from the optimum with only a small energy penalty.

Figure 12

The core problem, however, remains unchanged. The dynamic- and leakage-energy curves still cross over one another and create an energy minimum. This is a key conclusion that is independent of the region of operation: The location of the energy minimum is entirely determined by the way in which the leakage-energy and dynamic-energy curves interact. Regardless of whether a system operates in the sub-Vth or super-Vth regime, all architectural and circuit techniques such as ABB attempt to shift either the leakage-energy or dynamic-energy curve to improve energy efficiency. Figure 13 shows that Vmin behaves similarly over a wide range of neff values. Although Equation (12) was derived for the sub-Vth regime, Vmin continues to exhibit an approximately logarithmic dependence on neff until Vmin reaches approximately 600 mV. Above 600 mV, Vmin is still roughly logarithmically dependent on neff, but the slope of the line becomes steeper. The location of Vmin is dependent upon leakage, so the relative insensitivity of leakage to supply voltage in the super-Vth region forces larger changes in Vmin in order to reach energy optimality (and consequently the slope is greater). Also note that saturation of Vmin occurs at very high neff values because dynamic energy becomes insignificant compared with leakage energy regardless of Vdd.

Figure 13

It is clear that the understanding of alpha, neff, and transistor utility (described earlier) developed for sub-Vth operation is valuable even when Vmin shifts into the near-Vth and super-Vth regions. While the fundamental goal of maximizing transistor utility remains unchanged, circuit design in the near-Vth and super-Vth regions is clearly different from design in the sub-Vth region. Several of the key differences between sub-Vth and super-Vth operation, including delay characteristics and sensitivity to threshold voltage and device mismatch, have already been discussed extensively. In practical applications, circuit design becomes much easier when the supply voltage rises above the threshold voltage. Circuit speeds increase and variability decreases (and noise margins increase in response) as the supply voltage is raised. The real challenge arises when a circuit is optimized so that the energy-optimal supply voltage lies within the uncertain realm of sub-Vth design. The next section shows that one of the most significant of these challenges is “process-related variability,” a term that is defined there.

Variability in the sub-Vth regime

Process-related variations, that is, those variations introduced in manufacturing, have become a significant factor that affects circuit performance, even in the super-Vth regime. As a result, there has been a movement to develop methodologies and tools for dealing with variation in order to eliminate the pessimism usually associated with “corner-based” design schemes—those that assign “worst-case” parameter values to determine resistance to variability in order to ensure that the circuit functions properly under a range of conditions. The consequences of process variation are far more severe when voltage is scaled into the sub-Vth regime because of the exponential dependencies of current, and therefore delay, observed in this regime. In addition to the aforementioned relationship between sub-Vth current and threshold voltage, temperature also plays a critical role in determining delay in the sub-Vth regime. The strong dependence on threshold voltage, in particular, leads to wild fluctuations in both delay and energy as well as a considerable reduction in noise margins.

Threshold voltage variations can be broadly placed in two categories: systematic variations (which include lot-to-lot, wafer-to-wafer, die-to-die, and intra-die spatially correlated variations) and random variations. Systematic variations arise from a variety of sources, including gate-length variations, global doping variations, and temperature variations. A number of techniques have been shown to reduce the effects of systematic variation. In [16] and [17], ABB was shown to reduce both frequency and leakage power variations in test circuits. In [18], ABB was used to reduce variation in sub-Vth logic. Dynamic voltage scaling (DVS) is another technique that may be used to limit variation. DVS is traditionally discussed in the context of dynamic power management, but it may also be used to improve both frequency and power yields [17] by enabling post-silicon tuning of the supply voltage. It is likely that adaptive systems that incorporate several circuit techniques such as ABB and DVS will be necessary to achieve high yields in sub-Vth designs.

In addition to systematic variations, random variations (specifically RDF) account for a significant portion of threshold variation [18]. Random variations present a greater threat than systematic variations to delay, power yield, and energy efficiency because global countermeasures such as ABB and DVS cannot be used to effectively address the problem. Researchers have shown that RDF grows in importance as supply voltage is scaled into the sub-Vth regime [19]. Furthermore, as voltage is reduced, total variation grows significantly and becomes dominated by RDF. RDF depends strongly on channel area [2] and may therefore be controlled with careful gate sizing. The effects of RDF may also be reduced by increasing the number of gates in a path because random fluctuations “average out” over long paths. Proper selection of gate sizes and architecture (which determines logic depth) is very important in the design of robust sub-Vth circuits.
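
The “averaging out” of random fluctuations over long paths can be illustrated with a small Monte Carlo experiment. This Python sketch assumes independent gate delays of the form exp(ΔVth/(m·vT)) with normally distributed ΔVth; the 30-mV standard deviation, the value of m·vT, and the function name are hypothetical.

```python
import math
import random

def path_delay_spread(n_gates, sigma_vth=0.03, m_vt=0.039, trials=5000, seed=0):
    """Monte Carlo estimate of sigma/mean of path delay for n_gates in series.

    Each gate delay is exp(dVth / m_vt) with dVth ~ Normal(0, sigma_vth),
    i.e. independent lognormal gate delays (an assumed sub-Vth model).
    """
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(trials):
        d = sum(math.exp(rng.gauss(0.0, sigma_vth) / m_vt) for _ in range(n_gates))
        total += d
        total_sq += d * d
    mean = total / trials
    var = total_sq / trials - mean * mean
    return math.sqrt(var) / mean

for n in (1, 4, 16, 64):
    print(f"{n:3d} gates: sigma/mean ~ {path_delay_spread(n):.3f}")
```

For independent gates, the relative spread of the path delay falls roughly as the inverse square root of the number of gates, which is the statistical basis for preferring longer paths when robustness is the priority.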

In the subsection on finding the energy minimum, we showed that energy minimization relies on the proper balance of dynamic energy and leakage energy. The strong threshold dependence of leakage makes energy optimization strongly dependent on variability. Researchers have developed statistical models of circuit delay, power, and energy and have shown that variability raises Vmin by as much as 78 mV and is therefore a threat to energy efficiency [19]. The net voltage and energy shifts are in the positive direction because delay has a lognormal distribution if we assume a normal threshold-voltage distribution. A distribution of delays across many designs is therefore skewed toward longer delays. This effective increase in delay raises the relative importance of leakage, which increases the Vmin and the total energy of a design. Larger gates and longer logic paths provide the simplest and most powerful solutions to RDF, but these are, in general, contradictory to the goals of energy minimization. Upsizing increases dynamic energy, while larger logic depths lead to lower transistor utility. Designers must therefore carefully strike a compromise between the minimization of variability and the minimization of worst-case energy.
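
The direction of the shift follows from a standard property of the lognormal distribution: if X is normal with zero mean and standard deviation s, then exp(X) has a median of 1 and a mean of exp(s²/2). The Python sketch below applies this to delay, assuming delay ∝ exp(ΔVth/(m·vT)); the value of m·vT and the function name are hypothetical.

```python
import math

def mean_delay_penalty(sigma_vth, m_vt):
    """Mean / nominal delay when delay = exp(dVth / m_vt), dVth ~ Normal(0, sigma_vth).

    For a lognormal variable exp(X) with X ~ Normal(0, s), the median is 1
    and the mean is exp(s**2 / 2), so the mean always exceeds the nominal value.
    """
    s = sigma_vth / m_vt
    return math.exp(0.5 * s * s)

M_VT = 0.039  # assumed m * vT in volts (m of about 1.5 at room temperature)
for sigma_mv in (10, 20, 30, 40):
    penalty = mean_delay_penalty(sigma_mv * 1e-3, M_VT)
    print(f"sigma(Vth) = {sigma_mv} mV -> mean delay ~ {penalty:.2f} x nominal")
```

Because the average delay exceeds the nominal delay for any nonzero threshold spread, the average leakage energy per operation rises as well, which pushes the energy minimum toward higher supply voltages.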

Although energy efficiency is a serious question in the face of variability, robustness is a more pressing concern. Even a small mismatch between p-FET and n-FET threshold voltages (introduced by either systematic or random variations) causes skewed current ratios and a dramatic reduction in noise margins. Table 7 shows how a systematic threshold mismatch can lead to a considerable increase in Vdd,limit. We must also consider how noise margins are affected when the supply voltage is greater than Vdd,limit. Simulations of an inverter (130-nm-technology node) with Vdd = 200 mV show that noise margins are reduced by 19%, 38%, and 79% given p-FET/n-FET threshold mismatches of 25 mV, 50 mV, and 100 mV, respectively (where a mismatch of 2δ means |Vth,p-FET| = |Vth,p-FET,nominal| − δ and Vth,n-FET = Vth,n-FET,nominal + δ). These reductions are clearly not tolerable, and hence significant effort to control threshold matching is necessary. Static noise margins are of particular importance in SRAM, implying that designers of sub-Vth memories must pay special attention to mismatch problems. SRAM design issues are covered in detail in the next section.

Process-related variability is one of the critical barriers that must be overcome for sub-Vth logic to have widespread industrial use. Global techniques such as ABB and DVS should be employed together with careful selection of design parameters, including logic depth and transistor sizing. Well-designed logic, which accounts for variability from the beginning of the design cycle, fosters high circuit yields while also minimizing energy.

Sub-Vth SRAM design issues

The complications of sub-Vth design have been covered extensively for typical logic. We now make special considerations for SRAM. As a result of reduced activity factors, energy consumption due to leakage is especially important in SRAM caches. Depending on the size of the memory, only a small portion may be active at any given time. As shown in Figure 12, such a condition inevitably increases Vmin because reduced dynamic energy shifts the overall energy minimum. This suggests that for ultralow-power designs, SRAM arrays should be operated at a higher supply voltage than that for logic. Figure 14(a) demonstrates this basic concept for a 65-nm process, where the supply voltage that minimizes energy is above 0.4 V because of a low activity factor. With progressively higher levels in the memory hierarchy that are larger or even lower in the activity factor, Vmin will increase further.

Figure 14

In an SRAM cell, a proper ratio of device strengths must be maintained among the pass-gate, pull-down, and pull-up transistors to ensure functionality under write, read, and standby conditions. When the supply voltage is reduced, acceptable ratios must be maintained: The drive strength of the pass-gate must be greater than that of the pull-up transistor to allow writing of the cell, and the pull-down must be stronger than the pass-gate to avoid accidental flipping of the cell during a read event. In the standby mode, functionality constraints are the same as for logic: The two inverter transfer characteristics must be maintained. In general, the read and write requirements provide more severe constraints on cell functionality at low supply voltages. With proper setting of device threshold voltages and widths, however, cell functionality can be maintained even at extremely low voltages. Measurements of a 6T-SRAM cell shown in Figure 5(b) confirm that bi-stable operation is possible with Vdd as low as 70 mV. In such a case, Vdd,limit can be lower than Vmin, and optimum energy can be achieved.

The rapid increase in random process-related variability in recent technology generations, especially variability due to random dopant fluctuations, can have a severe impact on Vdd,limit for complete SRAM arrays. Such variability is particularly important in SRAM because of the widespread use of minimum-size devices and aggressive technology ground rules. While a single cell with perfectly matched devices can function at very low voltages, from a statistical standpoint, cells with significant threshold-voltage mismatch will exist in a large array. Such threshold-voltage variation can effectively degrade noise margins such that cell functionality is compromised [20, 21]. When variability is considered, Vdd,limit rises dramatically for a complete SRAM array. As an example, Figure 14(a) shows that the variability characteristic of a 5σ cell (as needed to yield an ~1-Mb cache under random variation) increases Vdd,limit to well beyond Vmin for a 65-nm technology. In addition, this increase is expected to become more serious as technology scales [22]. As such, optimal energy consumption cannot be achieved because, in the presence of variability, an SRAM array cannot function at the energy-optimal supply voltage.
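
The link between a 5σ cell and a cache of roughly 1 Mb can be checked with the Gaussian tail probability. This Python sketch assumes that the relevant cell metric is normally distributed and that cells vary independently; both are simplifications, and the function name is hypothetical.

```python
import math

def tail_probability(n_sigma):
    """One-sided probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

cells = 2**20  # roughly 1 Mb, one bit per cell
for n_sigma in (3, 4, 5, 6):
    p = tail_probability(n_sigma)
    print(f"{n_sigma} sigma: P = {p:.2e}, expected outlier cells = {cells * p:.3g}")
```

Under these assumptions, fewer than one cell per megabit is expected to lie beyond 5σ, whereas tens of cells lie beyond 4σ. A design that tolerates only 4σ cells would therefore be expected to contain failing cells in an array of this size.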

Error-correction codes and redundancy are often used to address variability in today's SRAM arrays, but these techniques will most likely be insufficient to completely address the widespread problems expected at low voltages. New circuit techniques must also be used to counter variability. By modifying the traditional 6T-SRAM cell circuit, a more variation-tolerant design can be attained. With the 8T cell, as depicted in Figure 15(b), significantly improved read noise margins can be realized [23]. This improvement occurs because the read-disturb condition [as depicted in Figure 15(a), in which the stored “0” node of the cell is pulled above ground] is eliminated by the introduction of a separate read port in the SRAM cell. With discrete read and write ports in the cell, the device ratio constraints of the traditional 6T-SRAM cell for read and write functionality are removed, which allows for simultaneous improvement of both read and write noise margins. As a result, variability tolerance is greatly enhanced, and Vdd,limit can be reduced to less than Vmin. As shown in Figure 14(b), employment of an 8T-SRAM design can allow for operation at the supply voltage for optimal energy, thus making it a desirable design option for ultralow-power SRAM caches.

Figure 15

4. Architectural choices at ultralow-voltage operation

Power-aware microarchitectures can have a substantial impact on power consumption and can be more valuable than transistor-level remedies in reducing total power. This section covers several techniques that may be used to shift Vmin and tune energy efficiency at the architectural level. The role of multi-core processing in recovering some of the speed penalty paid for sub-Vth operation is also discussed in this section. In order to understand the value of energy-efficient architectural techniques, we first discuss several metrics that are commonly used to describe energy and power efficiency.

Metrics

Common architectural metrics capture the influence of energy-minimizing design on chip logic. Energy- and power-driven design affects area and performance as well as power, so it is important that the chosen benchmarks capture these effects. Below, we describe selected common benchmarks that provide insight into energy use.

  • MIPS per watt: The number of instructions completed by a processor is often described by the millions of instructions per second (“MIPS”) completed at peak load. This benchmark, divided by the power consumption (in watts), describes the power cost of an architecture throughput technique. More effective power- and energy-aware architectures exhibit higher values of MIPS per watt. Supplemental logic circuitry is typically added in high-performance microprocessors to architecturally improve throughput. These performance accelerators add more circuits to execute the same logic and reduce power efficiency. Thus, the total energy cost of an instruction rises as these innovations are added.

  • Energy–delay product (EDP): For a given benchmark logic path, the product of the path delay and its total ac and dc energy consumption provides a measure of the effectiveness of the architecture. Large relative values of EDP indicate increased delays and/or high energy consumption, neither of which is tolerable in a power-constrained machine. EDP is often computed for common benchmarks such as a ring oscillator comprising an inverter driving a fan-out of four additional inverters (“INVFO4”). The utility of EDP is realized when evaluating the design “return” obtained by accepting reduced performance. Active and static power per INVFO4 is often quoted in the literature [24].

  • Transactions per CPU cycle (TPCC): Energy-reduction techniques often have a negative impact on microprocessor throughput; for this reason, the designer must consider changes to the number of transactions per CPU cycle (“TPCC”) that can be retired. TPCC is a measure of microprocessor logic efficacy and is purely an architecture performance metric. Nonetheless, TPCC is a superb bellwether that the system designer can use to assess the overall system impact of power management.
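The first two metrics can be computed directly from measured or simulated quantities. The sketch below is illustrative only: the function names and all numerical values are hypothetical and are not taken from the paper or from [24].

```python
def mips_per_watt(instructions, seconds, watts):
    """Millions of instructions per second, divided by power in watts."""
    mips = instructions / seconds / 1e6
    return mips / watts


def energy_per_instruction(instructions, seconds, watts):
    """Average energy per instruction in joules (power * time / count)."""
    return watts * seconds / instructions


def energy_delay_product(energy_joules, delay_seconds):
    """EDP of a benchmark path: total (ac + dc) energy times path delay."""
    return energy_joules * delay_seconds


if __name__ == "__main__":
    # Two hypothetical operating points for the same workload.
    nominal = dict(instructions=2e9, seconds=1.0, watts=10.0)
    low_vdd = dict(instructions=2e7, seconds=1.0, watts=0.01)
    for name, op in (("nominal", nominal), ("low Vdd", low_vdd)):
        print(name, mips_per_watt(**op), energy_per_instruction(**op))
    # Hypothetical INVFO4 stage: 2 pJ of energy and 1 ns of delay.
    print(energy_delay_product(2e-12, 1e-9))
```

MIPS per watt is the reciprocal of energy per instruction, up to a factor of 10^6, so it rewards low-voltage operation regardless of speed. EDP penalizes the accompanying delay, which is why the two metrics can rank the same pair of operating points differently.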

Each of these metrics offers valuable information to the designer. However, no single metric is best for all applications. Instead, designers must be careful to choose the metric that best describes the particular power, area, and delay requirements of an application. Though different applications may be driven by different metrics, the applications will use very similar energy-reduction techniques. The following two subsections describe some of these architectural techniques.

Semitransparent and structural alterations for power savings

Semitransparent uniprocessor power-management techniques reduce power through consideration of under-utilized resources. Note that we use the term semitransparent to refer to a technique that is managed at the hardware level and that does not require support from the operating system. Adaptive body biasing, or ABB, modulates the voltage of the substrate in order to dynamically change the static-power-vs.-active-performance tradeoff asserted by threshold voltage. This technique was mentioned in Section 3 as a tool that may be used to limit systematic threshold variability in a sub-Vth system. In ABB, machine state requirements are anticipated via instruction-lookahead techniques, and substrate voltages are adjusted to optimize power without dramatically affecting performance [16]. The use of ABB in the super-Vth regime has recently fallen out of favor. In [25], it was shown that the application of a reverse body bias (RBB) becomes less effective with technology scaling. Short-channel effects (SCEs) worsen with shrinking transistor dimensions, so threshold control via ABB becomes less effective. In the sub-Vth regime, this problem is much less severe because SCEs, in particular DIBL, are much reduced. It is important to note that ABB is in accordance with the energy model derived in Section 3. By reducing the standby-mode leakage, ABB lowers the penalty paid for idle time, raises transistor utility, and lowers Vmin.

Alternatively, logic may be partitioned for placement in specific voltage islands with independent supplies that are compiler-controlled [26]. In Section 3, it was observed that different circuit blocks tend toward different energy-optimal supply voltages. Voltage islands are attractive because they enable a designer to target the energy-optimal conditions on a block-by-block basis rather than on a system level. Memory, in particular, is expected to have a much higher Vmin than conventional logic. With memory and logic on different voltage islands, memory would be able to operate at higher voltages to address both functionality and leakage problems, and logic would be allowed to scale voltage more aggressively to yield lower dynamic energy.

Just as different blocks have different Vmin values, different runtime conditions (such as activity factor and temperature) may require the scaling of supply voltages to maintain energy optimality. Fine-grained dynamic voltage scaling (DVS) allows tuning of a design for runtime conditions and therefore holds promise for use in dedicated energy-optimal design. DVS may also be used to achieve combined performance and energy targets [27]. In [28], DVS was used in combination with ABB to effectively minimize power under a performance constraint, and the system was found to be functional at voltage levels as low as 175 mV. Such an architecture has the potential to target energy optimality given both variable runtime conditions and process variations. A DVS system that operates in both sub-Vth and super-Vth regions requires careful design because, as we saw in Section 2, n-FET/p-FET mismatch is a function of supply voltage.
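The dependence of the energy-optimal supply on activity factor can be illustrated with a first-order model. The sketch below is an assumption and is not necessarily the model derived in Section 3. It takes dynamic energy as proportional to activity × Vdd², and leakage energy per operation as proportional to Vdd² × exp(−Vdd/(n·vT)), which follows when gate delay grows exponentially as Vdd drops in the sub-Vth regime. The lumped `leak_weight` parameter, the slope factor `n`, and all numerical values are hypothetical.

```python
import math


def energy_per_op(vdd, activity, leak_weight=50.0, n=1.5, v_thermal=0.026):
    """Normalized energy per operation (in units of switched capacitance).

    dynamic term: activity * Vdd^2
    leakage term: leak_weight * Vdd^2 * exp(-Vdd / (n * v_thermal)),
                  i.e. leakage current integrated over a cycle time that
                  grows exponentially as Vdd falls (sub-Vth assumption).
    """
    leakage = leak_weight * math.exp(-vdd / (n * v_thermal))
    return vdd * vdd * (activity + leakage)


def find_vmin(activity, v_lo=0.10, v_hi=0.60, steps=501):
    """Grid search (1-mV steps by default) for the energy-optimal supply."""
    best_v = v_lo
    best_e = energy_per_op(v_lo, activity)
    for i in range(1, steps):
        v = v_lo + (v_hi - v_lo) * i / (steps - 1)
        e = energy_per_op(v, activity)
        if e < best_e:
            best_v, best_e = v, e
    return best_v


if __name__ == "__main__":
    for activity in (0.2, 0.1, 0.01):
        print(f"activity={activity}: Vmin ~ {find_vmin(activity):.3f} V")
```

In this model, a lower activity factor gives leakage a larger share of the total energy and pushes the optimum toward a higher supply voltage, so a DVS controller would need to track activity at run time to stay near the energy minimum.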

Voltage gating, or multiple-threshold CMOS (MTCMOS), is yet another power-management technique in which large on-board header and footer MOSFETs provide power to specific domains of the microprocessor chip. When the resource in these domains is no longer needed, their supply access is cut by these devices [29</