IBM Journal of Research and Development

IBM System z9   Volume 51, Number 1/2, 2007
DOI: 10.1147/rd.511.0005

Optimization of silicon technology for the IBM System z9

by D. J. Poindexter,
S. R. Stiffler,
P. T. Wu,
P. D. Agnello,
T. Ivers,
S. Narasimha,
T. B. Faure,
J. H. Rankin,
D. A. Grosch,
M. D. Knox,
D. C. Edelstein,
M. Khare,
G. B. Bronner,
H.-J. Nam,
and S. A. Butt

IBM 90-nm silicon-on-insulator (SOI) technology was used for the key chips in the System z9™ processor chipset. Along with system design, optimization of some critical features of this technology enabled the z9™ to achieve double the system performance of the previous generation. These technology improvements included logic and SRAM FET optimization, mask fabrication, lithography and wafer processing, and interconnect technology. Reliability improvements such as SRAM optimization and a burn-in reliability screen are also described.

Introduction

A key goal of the System z9* was to provide twice the system-level performance of the prior zSeries* 990. To achieve this goal, the chip technology and design would have to provide more than 40% improvement in frequency to 1.72 GHz over the predecessor z990 system, with added logic and on-chip memory and without an increase in overall power consumption. The prior-generation z990 system [1] was designed using the IBM 130-nm silicon-on-insulator (SOI) chip technology with eight levels of metal interconnect. To achieve the z9* objectives, the 90-nm SOI technology with ten levels of Cu metallurgy was used, resulting in a 1.4X nominal performance improvement over the 130-nm technology [2]. Several technical innovations were required to enable this performance gain; the most significant were highly strained tensile n-FET liner, advanced gate-patterning techniques, hierarchical back-end-of-line (BEOL) using a low-k insulator, and electrically programmable fuses.

At the center of the z9 system is the multichip module (MCM) containing the processor electronics. The MCM is an advanced 95-mm × 95-mm glass ceramic module that contains sixteen chips interconnected with more than half a kilometer of internal cross-chip wiring. Fifteen of the sixteen chip sites use the 90-nm SOI technology: eight dual-core central processor (CP) chips, four SD chips (L2 data cache, 98 Mb custom), one storage control (SC) chip, and two memory storage control (MSC) chips (L2 to L3 interface). The sixteenth chip is the clock chip, which uses IBM 130-nm bulk CMOS technology. The z9 system, when fully configured as a 54-way system, comprises four MCMs, each containing 16 processor cores (eight dual-core CP chips) running at 1.72 GHz. Figure 1 is a schematic diagram of the MCM.

Figure 1

The scaling of high-performance process technologies from 180 nm to 130 nm to 90 nm produced increased performance and density but also resulted in increased device leakage currents and stability concerns for the SRAM cell. Indeed, in 90-nm technology, leakage currents and their associated power can be comparable to active switching power in typical designs [3]. Given that a) the multichip module has an upper power limit; b) each chip on the MCM affects the system-level performance uniquely; and c) each chip has its own power/performance relationship, there was a requirement to optimize the entire chipset to allow the highest possible system-level performance at an acceptable power level. Table 1 shows the percentage of MCM power used by each of the chip types (note that Figure 1 shows multiple instances of each type of chip). The power supply of the system was also optimized for these purposes, allowing tradeoffs to be made among active power, passive power, and cycle time. System-level reliability requirements dictated that these designs undergo an aggressive burn-in, with increased voltage and temperature (producing up to 400 W per chip), prior to system build. Controlling the leakage currents in the chips under these severe conditions also presented a number of issues which are described in the paper.


Table 1 MCM power dissipation by chip type.

Chip type    Power dissipated (%)
CP           66
SD           36
SC           14
MSC           4

The above combination of challenges drove an aggressive and systematic approach to process window evaluation. Process window evaluation entails understanding the impact of process variations within the specification range on the chip at wafer final test, module, and system test. From a power/performance perspective, device design optimization (to improve the power/performance characteristics of the individual devices), semiconductor and mask technology enhancements (to reduce device-to-device variability across the chips), and device threshold voltage experimentation (to optimize leakage currents at the required performance levels for each of the designs) were all of primary importance. This work was tightly coupled to burn-in characterization and system-level verification, ensuring that the chip-level optimizations produced acceptable characteristics at subsequent levels of assembly.

In this paper we first describe the key features of the 90-nm SOI technology and then discuss the logic and SRAM designs, including optimization of the gate oxide, programmable fuses, and interconnect technology. In the section on optimization of power and performance, we discuss device optimization, including device design, mask design and fabrication, lithography, and wafer processing. Finally we describe the burn-in screen, which ensures high reliability.

90-nm SOI technology

The key 90-nm SOI technology attributes are listed below [4]:

  • CMOS technology on p-SOI substrates.
  • Two gate-oxide thicknesses.
    • Thin oxide is nitrided SiO2 1.12 nm thick.
    • Thick oxide is 2.2 nm thick.
  • FET device types include the following:
    • Regular-threshold-voltage (regular-Vt) n-FET and p-FET (n-FET uses highly stressed liner).
    • Low-threshold-voltage (low-Vt) n-FET and p-FET.
    • High-threshold-voltage (high-Vt) n-FET and p-FET.
    • Thick-oxide n-FET and p-FET.
  • Low-resistance Co-silicided n+ and p+ polysilicon and diffusions.
  • Shallow-trench isolation (STI).
  • Precision resistor.
  • Tungsten local interconnect level for connecting diffusions and polysilicon wires/gates.
  • Ten levels of Cu metallization:
    • 1X (thin wire) 0.25 μm thick, 0.28 μm minimum pitch.
    • 2X, 4X, and 6X (2X = twice the thickness and pitch of the 1X level).
  • Planarized passivation and interlevel dielectrics:
    • Four 1X and two 2X wiring levels insulated with SiCOH (k = 3.0).
    • Two 4X and two 6X wiring levels insulated with fluorinated SiO2 (k = 3.6).
  • Electrically programmable fuses (eFUSEs).
  • C4 terminals.

This technology was practiced in a state-of-the-art, 300-mm fabrication facility (fabricator) [5].

Logic device design

The device technology was first described in 2002 [6]; a cross section is shown in Figure 2(a). In the device design, elements were incorporated to maximize performance while simultaneously controlling short-channel effects and the associated leakage issues. The gate dielectric uses a nitrided oxide measuring 1.12 nm thick. The nitrogen content was tuned to minimize gate leakage and achieve low Tinv (electrical equivalent thickness when the device channel is inverted) with minimal degradation of carrier mobility. The gate electrode and source/drain dopings were tuned for optimal gate polysilicon activation and device short-channel control. An extension offset spacer [Figure 2(b)] was implemented to allow for a high-dose extension implant (to minimize the extrinsic resistance of the device) while retaining low gate-to-extension overlap to minimize the parasitic capacitance [6].

Figure 2

Two separate nitride spacer thicknesses were implemented to independently control the source/drain doping and cobalt silicide proximity from the channel for n-FET and p-FET devices (Figure 3). With the implementation of the thin spacer, a substantial reduction in the source/drain resistance was obtained for the n-FET, leading to improved device characteristics. A thicker spacer was retained on the p-FET in order to independently control the short-channel effect.

Figure 3

The logic device menu offered three separate logic device pairs: the high-Vt, regular-Vt, and low-Vt devices. These device pairs had subthreshold leakage currents in the nA/μm range, tens-of-nA/μm range, and hundreds-of-nA/μm range for the high-Vt, regular-Vt, and low-Vt pairs, respectively (all for nominal channel lengths at 1.0 V and 25°C). In exchange for its higher leakage, the low-Vt device set offered ~10% more drive current than the regular-Vt device set. Similarly, the high-Vt device set delivered about 15% less drive current than the regular-Vt set. All of the device types were utilized in the designs and allowed the design team to actively trade off power and performance. In general, the low-Vt devices were used very judiciously and inserted only in performance-critical paths. In fact, for the chip with the greatest low-Vt usage (CP), the low-Vt device usage was only 3% of the total device width on the chip. In contrast, the high-Vt devices were used extensively to reduce leakage currents in circuits that were not performance-critical. For example, on the CP chip, high-Vt devices accounted for ~30% of the total device width.
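The leverage of a small low-Vt population on total leakage can be illustrated with a short sketch. The per-width leakage values (1, 10, and 100 nA/μm) are order-of-magnitude placeholders for the ranges quoted above, and the 67% regular-Vt share is assumed to be the remainder of the device width; neither is a measured figure from the z9 designs.

```python
# Illustrative only: leakage values are order-of-magnitude placeholders,
# and the regular-Vt width share is assumed to be the remainder.
LEAKAGE_NA_PER_UM = {"high_vt": 1.0, "regular_vt": 10.0, "low_vt": 100.0}
WIDTH_FRACTION = {"high_vt": 0.30, "regular_vt": 0.67, "low_vt": 0.03}


def leakage_share(leakage, width_fraction):
    """Return each device class's share of total subthreshold leakage."""
    weighted = {name: leakage[name] * width_fraction[name] for name in leakage}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


shares = leakage_share(LEAKAGE_NA_PER_UM, WIDTH_FRACTION)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of subthreshold leakage")
```

Under these assumed numbers, the 3% of device width in low-Vt devices would account for roughly 30% of the subthreshold leakage, which is consistent with the judicious use described above.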

SRAM device design

In addition to the logic devices, the technology offered the ability to selectively tune the devices in the SRAM cells. While the technology supported SRAM cells down to 0.99 μm², the cell utilized in System z9, which measured 1.15 μm² in area, was chosen for performance, standby power, and stability. Parametric optimization suggested the use of a high-Vt n-FET and an intermediate-Vt p-FET (having a threshold voltage between the regular and high-Vt values) in the SRAM cell that was implemented in the designs. The same six-FET SRAM cell is used across the four chip types of the z9 90-nm chipset (Figure 1), including the high-performance L1 core arrays on the CP chip and large L2 cache arrays on the SD chip. The CP arrays were designed to operate at twice the frequency of the SD arrays. Array cell stability, performance, and reliability had to be optimized and balanced against one another to meet the z9 application requirements. The 90-nm technology reduced the size of the SRAM cells by ~50% compared with the previous generation. With continued area scaling, SRAM stability has become a critical issue, and the device strengths and threshold voltages were carefully optimized to achieve acceptable stability margins and performance levels. To ensure high reliability, various monitor structures were stressed; the results showed that the key parameters [time-dependent dielectric breakdown, hot-carrier stress, and negative-bias temperature instability (NBTI)] were within the acceptable limits.

The basic SRAM cell design and reliability of device parameters were also determined to be acceptable. A product-like reliability evaluation was then performed on a 5-Mb SRAM at 1.5 V and 140°C. These initial results revealed a serious Vmin degradation after only 2 hours of stress. (Vmin is the lowest voltage at which an SRAM array still functions; it is a key reliability parameter, since Vmin tends to increase over the life of a product due to NBTI [7].) If Vmin exceeds the worst-case (low) operating voltage of the chip, a failure will occur in the field. In this study, the mean Vmin value increased by more than 170 mV under stress, and the standard deviation increased to three times its initial value. Further, some of the modules with low initial Vmin values showed very large shifts, whereas some of the modules with high initial values showed very little shift [Figure 5(a), shown later]. Almost all of these shifts were related to single-cell fails in the module. Since the above results were unacceptable for product reliability, a series of studies was launched to understand the root cause of these Vmin shifts, to develop process changes to fix them, and to model this behavior in order to predict it in future products.

A series of electrical and physical measurements was performed on failing bit cells. These measurements revealed a clear pattern in which the pull-down n-FET (T3 and T4 in Figure 4) showed an increase of two to three orders of magnitude or more in gate leakage current, indicating that the vast majority of the fails were related to gate oxide breakdown in the pull-down n-FET. Physical analysis on the failing bits did not show any additional defect signatures in the SRAM cell. In Figure 4, we hypothesize a gate defect in T4. The n-FET gate leakage allows node A to drift downward and node B to drift upward, thus affecting the read stability margin of the cell and, in turn, Vmin.

Figure 4

Silicon oxide optimization
Silicon oxide integrity in the SRAM bit cell and SRAM FET strengths were the focus of process improvements. A series of experiments was launched to improve gate oxide integrity in the SRAM cell. The key process changes that produced the greatest reduction of such defects were adjustments to the n-FET polysilicon pre-doping implant dose and the gate oxide thickness, and a reduction in plasma charging damage during the middle-of-the-line reactive ion etch (RIE) process. The SRAM device strength, at fixed leakage, was also improved continuously along with logic device performance, which was described earlier. Each process element was evaluated systematically so that the combined impact on post-stress Vmin could be assessed. Optimization of the gate oxide thickness and n-FET polysilicon pre-doping as well as continued reductions in plasma charging enabled a very significant improvement in Vmin behavior. Figures 5(a) and 5(b) show the correlation between initial and post-stress Vmin values on chips before and after the process fixes, respectively. Initial modules showed poor correlation between initial and post-stress Vmin, whereas modules with all of the process fixes showed a very clear correlation between initial and post-stress Vmin, with a predictable constant shift of ~40 mV expected due to NBTI in the p-FET. These data were verified in volume as chips went through the burn-in screen as described below. With these process fixes implemented, the final SRAM reliability met the highest IBM quality standards.
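The screening logic implied by this correlation can be sketched as follows. The ~40 mV shift comes from the text; the module Vmin values, the 900 mV worst-case operating voltage, and the 30 mV outlier tolerance are hypothetical numbers chosen only for illustration, not values from the z9 program.

```python
NBTI_SHIFT_MV = 40.0  # approximate constant shift reported in the text


def classify_module(vmin_initial_mv, vmin_post_mv,
                    v_operating_low_mv=900.0, tolerance_mv=30.0):
    """Classify one module from its initial and post-stress Vmin.

    v_operating_low_mv and tolerance_mv are hypothetical values.
    """
    if vmin_post_mv > v_operating_low_mv:
        return "fail: Vmin above worst-case operating voltage"
    # Shift beyond what the constant NBTI offset would predict.
    excess = vmin_post_mv - (vmin_initial_mv + NBTI_SHIFT_MV)
    if excess > tolerance_mv:
        return "outlier: shift larger than NBTI alone explains"
    return "ok"


# Hypothetical (initial, post-stress) Vmin pairs in mV.
modules = [(760.0, 802.0), (740.0, 910.0), (750.0, 860.0)]
results = [classify_module(v0, v1) for v0, v1 in modules]
print(results)
```

A module that tracks the constant shift is predictable from its initial Vmin alone, which is what makes the post-fix population in Figure 5(b) screenable.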

Figure 5

Programmable fuses
The technology supports an electrically programmable fuse (eFUSE) which operates via an electromigration mechanism, providing high reliability with no additional masks [8]. A programmed fuse is pictured in Figure 6(a). This capability was used extensively, allowing the SRAM arrays to be repaired at various levels in the system build (particularly powerful was repair at the module level following burn-in). In addition to SRAM repair, eFUSEs were used to store key chip and multichip module information which was leveraged in system-level verification and debug.

Figure 6

Interconnect technology

The development of a robust, high-performance BEOL interconnect technology was of critical importance in meeting the demanding power, frequency, and reliability requirements of the z9 processors. Copper metallization, pioneered by IBM in the mid-1990s [9], was extended to ten levels in a hierarchical structure. The specific designs discussed here utilized four “thin-wire” or 1X (the terminology corresponds to the multiplier on thin-wire thickness) levels at 0.25-μm thickness and 0.28-μm minimum pitch, two 2X levels (at twice the thin-wire thickness and pitch), two 4X levels, and two 6X levels; see Figure 6(b). The entire wiring hierarchy can be considered to be embedded in low-dielectric-constant insulators. All 1X and 2X levels are in SiCOH (k = 3.0) dielectric (an organosilicate glass), and the 4X and 6X levels are in fluorinated silicon dioxide [SiO2 (k = 3.6)].
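The wiring hierarchy described above can be written down as a small data structure. This is only a sketch: the text states that pitch doubles for the 2X levels, and the code assumes, as a labeled assumption, that minimum pitch scales with the multiplier for the 4X and 6X levels as well. The class and field names are hypothetical.

```python
from dataclasses import dataclass

THIN_WIRE_THICKNESS_UM = 0.25
THIN_WIRE_MIN_PITCH_UM = 0.28


@dataclass(frozen=True)
class WiringTier:
    multiplier: int   # 1 for 1X, 2 for 2X, and so on
    levels: int       # number of metal levels in this tier
    dielectric: str

    @property
    def thickness_um(self):
        return self.multiplier * THIN_WIRE_THICKNESS_UM

    @property
    def min_pitch_um(self):
        # Assumption: pitch scales with the multiplier for every tier;
        # the text states this explicitly only for the 2X levels.
        return self.multiplier * THIN_WIRE_MIN_PITCH_UM


STACK = [
    WiringTier(1, 4, "SiCOH (k = 3.0)"),
    WiringTier(2, 2, "SiCOH (k = 3.0)"),
    WiringTier(4, 2, "fluorinated SiO2 (k = 3.6)"),
    WiringTier(6, 2, "fluorinated SiO2 (k = 3.6)"),
]

total_levels = sum(tier.levels for tier in STACK)
for tier in STACK:
    print(f"{tier.multiplier}X x{tier.levels}: "
          f"{tier.thickness_um:.2f} um thick, "
          f"{tier.min_pitch_um:.2f} um pitch, {tier.dielectric}")
print("total Cu levels:", total_levels)
```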

Of particular significance was the development of the low-k SiCOH film [10, 11], which has a relative dielectric constant of ~3.0 and an integrated “effective” dielectric constant (including interconnect caps) of ~3.2. SiCOH films are known to be tensile and mechanically relatively weak compared with the SiO2 or fluorinated SiO2 glass films which they replace; thus, they are more susceptible to brittle fracture from thermomechanical mismatches with the Cu or from chip–package interactions [12]. This risk was addressed during the research phase of this project through a careful choice of the film precursors and plasma deposition techniques enabled by a deep fundamental understanding of the materials set. Further optimization of process parameters in the development and manufacturing phases led to a reduction in the film stress to levels that produced a robust BEOL technology immune to stresses induced by chip-to-package interactions.

The inclusion of this low-dielectric-constant material at the 1X and 2X levels led to a ~20% capacitance reduction at those levels without compromising yield or reliability compared with the fluorinated glass dielectric (k ~ 3.6) used in the 130-nm technology node. The high level of wafer-level reliability was due in large part to the extreme care that was exercised in analyzing and improving the adhesion properties at all film interfaces.
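As a rough plausibility check, the capacitance benefit can be estimated by assuming that wire capacitance scales linearly with the dielectric constant at fixed geometry. This first-order estimate ignores cap layers, fringing fields, and any geometry change between nodes, so it is a sketch rather than the method behind the reported figure.

```python
def capacitance_reduction(k_old, k_new):
    """Fractional capacitance reduction if C scales linearly with k."""
    return 1.0 - k_new / k_old


# Bulk dielectric constants quoted in the text.
reduction = capacitance_reduction(k_old=3.6, k_new=3.0)
print(f"{reduction:.1%}")
```

The estimate lands in the same range as the ~20% reported above; the remaining difference would depend on details that this simple scaling leaves out.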

Optimization of power and performance

In 90-nm high-performance technology, the issue of high standby leakage currents is significant and had to be addressed on the entire chipset on the multichip module. The two most challenging chips were the central processor (CP) chip, whose frequency ultimately dictated the system performance, and the L2 data cache (SD) chip with its very large number of SRAM cells (97 Mb) and the associated narrow-width devices.

The primary tools used to measure success or failure in this metric are product-like ring oscillators whose performance/power ratio has been shown to correlate well with product measurements [13]. Figure 7 shows the data of a product-like ring oscillator of inverters with a fan-out of 3. Two regions are shown in this figure. In Region 1, the chip leakage is relatively independent of the ring oscillator delay and is mainly determined by leakage through the gate oxide. In Region 2, the chip leakage rises sharply as the device subthreshold leakage becomes significant and ultimately dominates the chip leakage. Subthreshold leakage is controlled by an intimate coupling between the individual FET device design point (on-current vs. off-current), the optimization of the relative strengths of the n-FET/p-FET device pairs, and the control of these parameters across the chip (and wafer).
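The two-region behavior can be reproduced with a toy model in which gate leakage is constant and subthreshold leakage grows exponentially as the ring oscillator gets faster. Every parameter value below is hypothetical and chosen only to make the shape visible; none is taken from Figure 7.

```python
import math

GATE_LEAKAGE_A = 20.0   # hypothetical, independent of delay
SUB_REF_A = 20.0        # hypothetical subthreshold leakage at REF_DELAY_PS
REF_DELAY_PS = 10.0     # hypothetical ring oscillator stage delay
SLOPE_PS = 0.5          # hypothetical delay change per e-fold of leakage


def subthreshold_leakage_a(ro_delay_ps):
    """Subthreshold leakage rises exponentially as delay shrinks."""
    return SUB_REF_A * math.exp((REF_DELAY_PS - ro_delay_ps) / SLOPE_PS)


def chip_leakage_a(ro_delay_ps):
    """Total leakage: constant gate term plus subthreshold term."""
    return GATE_LEAKAGE_A + subthreshold_leakage_a(ro_delay_ps)


def region(ro_delay_ps):
    """Region 1 if gate leakage dominates, Region 2 otherwise."""
    return 1 if subthreshold_leakage_a(ro_delay_ps) < GATE_LEAKAGE_A else 2


for delay in (12.0, 11.0, 10.0, 9.5):
    print(delay, round(chip_leakage_a(delay), 1), region(delay))
```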

Figure 7

Device optimization

The initial device design points were relatively close to those given in [6], where the leakage currents of the various n-FET/p-FET pairs were matched. Previous experience had shown that the nominal conditions might not be optimal at chip level; thus, the initial attempt at power/performance optimization involved a relatively straightforward analysis of the chip response to the regular-Vt device process window experiment (given the process flow, the high- and low-Vt devices follow the regular-Vt devices). It was observed that the cycle time of the chip was a strong function of the n-FET drive current, while the standby leakage current of the chip was a strong function of the p-FET leakage current. Because of the inherent differences in electron and hole mobility, the n-FET is ~2X stronger than the p-FET for a given FET width and length. Thus, for balanced CMOS circuits the p-FETs are ~2X wider than the n-FETs, and since subthreshold leakage is a function of transistor width, p-FET leakage across the chip can dominate n-FET leakage. As a result, the process was modified to reduce the n-FET threshold voltages (increasing n-FET strength and leakage) while simultaneously increasing the p-FET threshold voltages (reducing p-FET leakage).
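The width argument can be made concrete with a few lines of arithmetic. The 2X width ratio and the matched per-width leakage come from the text; the 0.25 leakage ratio used for the retargeted case is a hypothetical value, not a reported one.

```python
def pfet_leakage_fraction(width_ratio_p_to_n, leak_ratio_p_to_n=1.0):
    """Fraction of subthreshold leakage contributed by p-FETs.

    width_ratio_p_to_n: p-FET width divided by n-FET width.
    leak_ratio_p_to_n: p-FET per-width leakage divided by n-FET
    per-width leakage (1.0 means matched, as in the initial design point).
    """
    p = width_ratio_p_to_n * leak_ratio_p_to_n
    return p / (p + 1.0)


matched = pfet_leakage_fraction(2.0)            # matched per-width leakage
retargeted = pfet_leakage_fraction(2.0, 0.25)   # hypothetical retargeting
print(f"matched: {matched:.2f}, retargeted: {retargeted:.2f}")
```

With matched per-width leakage, the p-FETs carry about two-thirds of the subthreshold leakage simply because they are wider, which is why shifting leakage toward the n-FET buys drive current where the cycle time needs it.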

In addition, given the relatively strong dependence of chip cycle time on n-FET drive current, there was a strong desire to further improve the n-FET characteristics, and the use of a tensile nitride liner was investigated [14]. While a tensile contact liner film enhances electron mobility, it degrades hole mobility, and Ge implantation was used to relax this film over the p-FET devices. Figure 8(a) shows the results of the strained liner implementation on the n-FET drive current. As shown, the results were profound and led to rapid implementation of this technology for the 90-nm SOI technology and the z9 chipset.

Results on the CP chip for the device optimizations outlined above are shown in Figure 8(b). Note the substantial improvement in the chip leakage at a fixed ring oscillator delay. Each data point is the average of 50 chips. Under these conditions, the maximum allowable current for the chip is ~50 A and, as shown, the device improvements allowed substantially faster ring oscillator speeds (~6.5%) to be consistent with this requirement. Improvements of this sort allow recentering of the device to a faster point and subsequent system-level cycle-time improvements at equivalent parametric yields.

Figure 8

Previously, improvements to the device design point were discussed; this work was absolutely necessary, but it was insufficient to attain the desired MCM power/performance goals for this program. The next problem that was addressed was the control of the device design point across the relatively large chips and across the entire 300-mm wafer. Key parameters here are the gate conductor dimension (which includes both the mask and lithographic components) as well as the dopant profiles that determine the device characteristics.

The critical gate conductor image starts its life as a drawn dimension inside the design tool flow. The next stage in its transformation into an actual, on-wafer device is manipulation through a series of optical proximity correction (OPC) steps, designed to take the design data and manipulate it so that the actual on-wafer dimensions are as close as possible to the designed structures. These sets of manipulations are especially critical on this level, since the final on-wafer dimensions are ~40 nm and they are imaged on the wafer surface using 193-nm deep-ultraviolet light. Issues such as line-end foreshortening and through-pitch variations are minimized through the use of OPC algorithms [15]. Following the OPC steps, the printing behavior of the entire chip is simulated and checked against the initial design input, and deviations are highlighted in a series of steps referred to as model-based verification [16].

Mask design and fabrication
Following a successful exit of the verification steps, the manipulated data is used to generate the lithographic mask that is used to expose the wafers in the fabrication facility. Great care must be taken in this critical step, since even if everything else in the fabrication process is absolutely perfect, the chips still suffer from the linewidth variation present on the mask. The gate conductor images are transferred onto the mask through direct write utilizing an electron beam.

Through 130-nm technology and into 90-nm technology, critical masks were fabricated using a raster-scanned gaussian beam (RSG) patterning tool. RSG writers form images by continuously raster-scanning an 80-nm round gaussian beam with very high-speed blanking (320 MHz) [17]. This type of mask-writing tool was utilized in the initial design passes for this program. We note that this type of mask-writing tool is consistent with the specifications outlined by the 2004 International Technology Roadmap for Semiconductors for the 90-nm-technology node. For the newer chip design revisions, a high-speed, vector-shaped-beam (VSB) e-beam system was implemented. VSB systems pass the e-beam through several high-precision shaping apertures to form rectangles of various sizes and image them into the resist at the appropriate position on the mask. The use of shaping apertures on the VSB system resulted in a significant improvement in corner-rounding performance compared with the RSG system.

Another major difference between the two e-beam writers was in the correction scheme used to compensate for the e-beam proximity effect [18]. This effect is caused by backscattering of electrons off the mask substrate surface and non-ideality of the projected beam, resulting in additional undesired electron dose in regions adjacent to other exposed areas up to 25 μm away. If not handled properly, this proximity effect can cause undesirable offsets between sparsely and densely written areas of the mask that would subsequently have a deleterious effect on chip performance. The RSG system uses a two-pass proximity-effect correction method known as GHOST [19], which requires the use of a very high-contrast resist. The VSB system instead uses a real-time proximity-effect correction method that adjusts the local e-beam dose on the basis of the local proximity environment caused by adjacent shapes on the mask. Under this scheme, resist images in very dense gratings receive a lower primary e-beam dose than do isolated images in order to compensate for the unintended backscattering.
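The idea of density-dependent dose correction can be sketched with a common first-order model, in which the energy deposited at a feature edge is proportional to dose × (0.5 + η × density). This is a textbook-style approximation, not the VSB tool's actual algorithm; the backscatter ratio η = 0.5 and the function name are assumptions.

```python
def corrected_dose(pattern_density, eta=0.5, base_dose=1.0):
    """Relative primary dose for a feature in a region of given density.

    Uses a first-order model in which the energy at a feature edge is
    dose * (0.5 + eta * density); the dose is normalized so that a
    50%-dense grating receives base_dose. eta (the backscatter ratio)
    is an assumed value.
    """
    if not 0.0 <= pattern_density <= 1.0:
        raise ValueError("pattern_density must be between 0 and 1")
    return base_dose * (1.0 + eta) / (1.0 + 2.0 * eta * pattern_density)


isolated = corrected_dose(0.0)   # sparse region
dense = corrected_dose(0.5)      # 1:1 line/space grating
print(f"isolated: {isolated:.2f}, dense: {dense:.2f}")
```

In this model the dense grating receives the lower primary dose, and both cases end up with the same edge energy, which is the equalization the correction scheme aims for.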

A summary of the key parameters attainable with each of the mask writers is given in Table 2. As shown, masks fabricated with the VSB writer were significantly improved compared with their earlier counterparts fabricated on an RSG system.


Table 2 Summary of key mask parameters as a function of the mask-writing technology. Note that the mask dimensions are reduced by a factor of 4 when imaged on the wafer.

Parameter                                      RSG writer       VSB writer
Technology node                                130 and 90 nm    90 and 65 nm
Local (1 mm × 1 mm) dimensional uniformity     7 nm             2 nm
Global dimensional uniformity                  10 nm            5 nm
X to Y offset                                  5 nm             3 nm
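Because the mask is imaged at 4X reduction, the Table 2 values can be translated into approximate wafer-level contributions. The sketch below assumes pure geometric reduction and ignores any mask error enhancement in the lithography process, so the results are a lower-bound illustration rather than measured wafer data.

```python
REDUCTION = 4  # mask dimensions are reduced 4X when imaged on the wafer

# Mask-level values from Table 2, in nm.
MASK_SPECS_NM = {
    "local uniformity": {"RSG": 7.0, "VSB": 2.0},
    "global uniformity": {"RSG": 10.0, "VSB": 5.0},
    "X to Y offset": {"RSG": 5.0, "VSB": 3.0},
}


def wafer_scale(mask_nm, reduction=REDUCTION):
    """Geometric wafer-level equivalent of a mask dimension.

    Assumes pure geometric reduction (no mask error enhancement).
    """
    return mask_nm / reduction


wafer_specs = {
    param: {writer: wafer_scale(value) for writer, value in values.items()}
    for param, values in MASK_SPECS_NM.items()
}
for param, values in wafer_specs.items():
    print(f"{param}: RSG {values['RSG']} nm, VSB {values['VSB']} nm on wafer")
```

Against the ~40 nm on-wafer gate dimension mentioned earlier, a 10 nm global mask error corresponds to about 2.5 nm on the wafer under this assumption.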

Finally, the VSB writer has the capability to support correction of dimensional errors as a function of position across the entire reticle field. Basically, this enables the dramatic reduction or elimination of systematic dimensional variation across the entire mask. Sources of these systematic variations include errors caused by both the e-beam writer and subsequent pattern-transfer processes. These variations have been found to be significant under certain conditions.

Lithography and wafer processing
Once we were satisfied with the tolerance control on the gate conductor mask, the challenge moved to the wafer. As discussed previously, the OPC steps are designed such that the design dimensions are accurately imaged on the wafer surface. Once imaged, the gate conductor features are generated by etching the polysilicon regions that are not protected by the photoresist. In general, the etch bias (the dimensional difference between the final etched feature and the initial feature in photoresist) is not constant across the wafer and can lead to a center-to-edge-of-the-wafer variation in the gate conductor dimension. It should be noted that for relatively large chips, this across-wafer signature can yield a significant across-chip variation in the gate conductor dimension and can adversely affect the performance of a chip at fixed power. The gate conductor etch process was optimized to flatten the across-wafer etch bias signature as much as possible. Modifications to the basic etch chemistry (gas flows, pressures, and wafer temperature) were extremely effective in reducing this effect. Further process optimizations involved detailed characterization of the etch bias of the improved process across the wafer and subsequent adjustments to the lithographic dose at the edge regions of the wafer. The exposure map was relatively complex and was optimized to match the etch chamber.

The process optimizations outlined above all concern control of the gate conductor dimension, which is indeed the primary variable that governs device behavior; however, other variables also play an important role. Two other areas of focus were the control of the spacer films and the control of the optical coupling of the rapid thermal processing steps that control the transient local temperature variations in the junction activation steps. The impact of the spacer films on final device control is conceptually simple: tighter control of this dimension reduces the spread in both the extrinsic device resistance and the short-channel behavior of the device. Control of the transient temperatures during the rapid thermal processing steps is more subtle and involves control of the local emissivity of the wafer, which in turn is a function of the local pattern density of both isolation and device structures [20, 21]. To achieve more uniform pattern density, dummy shapes are added to the design data with fill algorithms. It was found that modification to our standard fill algorithms (filling both isolation and gate layers prior to mask build) could have a significant impact on the across-chip device control, and these algorithms were adjusted appropriately for that purpose.

As a result of the power/performance optimizations discussed previously (device design improvements, improvements in the mask dimensional control, and improvements in the across-wafer and across-chip parametric device control), substantial improvements were made to the performance of the processor chip at fixed power, as illustrated in Figure 9. As shown, a benefit in ring oscillator delay was achieved compared with the initial technology in the leakage range of interest (~50 A). These technology improvements yielded an improvement of >5% performance at the system level at fixed power.

Figure 9

Burn-in reliability screen

To achieve the reliability objectives of the z9 system, the silicon chips utilized on the multichip module are subjected to a burn-in process which is an effective product-defect and reliability screen [22]. An effective burn-in process involves the use of elevated temperature and voltage beyond the application conditions to remove early failures from the population. The relationships among the applied voltage, the applied temperature, and the defect acceleration achieved are well established [23]. Figure 10 shows an example of the equivalent power-on hours accelerated by increased burn-in temperature and voltage.
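A commonly used form for such acceleration relationships combines an Arrhenius temperature term with an exponential voltage term. The sketch below uses that generic form; the activation energy, voltage coefficient, and operating conditions are assumed, mechanism-dependent placeholders and are not the values behind Figure 10.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5


def acceleration_factor(t_use_c, t_stress_c, v_use, v_stress,
                        activation_energy_ev=0.7, voltage_gamma_per_v=5.0):
    """Combined temperature (Arrhenius) and voltage (exponential)
    acceleration. activation_energy_ev and voltage_gamma_per_v are
    assumed, mechanism-dependent parameters."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    thermal = math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                       * (1.0 / t_use_k - 1.0 / t_stress_k))
    voltage = math.exp(voltage_gamma_per_v * (v_stress - v_use))
    return thermal * voltage


def equivalent_power_on_hours(burn_in_hours, **conditions):
    """Field hours represented by a given burn-in duration."""
    return burn_in_hours * acceleration_factor(**conditions)


# Hypothetical use and stress conditions.
hours = equivalent_power_on_hours(
    2.0, t_use_c=85.0, t_stress_c=140.0, v_use=1.0, v_stress=1.5)
print(f"2 h of burn-in ~ {hours:.0f} equivalent power-on hours")
```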

Figure 10

Burn-in is typically most effective at accelerating early-life failures. The intent of the burn-in process was to improve the chip reliability, and ultimately the reliability of the z9 system. The use of accelerated conditions on these advanced technologies comes with significant challenges.

The key challenge faced when implementing burn-in for the z9 chipset was power. Burn-in power per chip ranged between 70 and 400 W. In the burn-in processing, a power challenge is synonymous with a thermal challenge [24, 25]. Burn-in power has increased with technology scaling and the power trends of server systems. Burn-in powers have increased by ~400X in the last ten years.

In the system, chip power is generated from both leakage currents and the transistor switching currents associated with running at the nominal system speed. In the burn-in environment, the clock speed is reduced, and transistor leakage currents are the primary power driver. Power at burn-in conditions is generally between 2X and 4X that seen in the system application. What fundamentally drives the high powers are the accelerated application conditions associated with burn-in processing. The relationship between leakage currents and the acceleration factors of burn-in (voltage and temperature) is exponential in nature: small increases in voltage and/or temperature drive nonlinear increases in leakage currents. Leakage currents are also related to channel length. The short-channel transistors that are desirable for system and chip performance also have the highest leakage. In summary, obtaining the system goals of high reliability and maximum performance from a given technology requires a burn-in solution capable of handling high power.

In addition to the raw power levels associated with burn-in, a number of other dimensions of the power/thermal challenge are in play. The first dimension is the variation of power. At the same accelerated voltage and temperature conditions, each individual silicon chip of the same design may exhibit varied power dissipation. This is driven by inherent process fabrication variation, key elements of which were discussed above. The second dimension of burn-in power that presents additional challenges involves power density and thermal feedback. The areas of higher power density on a die become thermal hot spots which are amplified at burn-in conditions. These hot spots can become the limiting factor with respect to the reliability acceleration conditions that can be achieved. Hot spots tend to have higher concentrations of shorter-channel devices. As hot spots reach burn-in conditions, higher leakages are experienced, which in turn drives more power, and this drives higher local temperatures. This localized self-feedback mechanism can progress to an unstable condition. The confluence of process variation, design, and a slightly variable burn-in thermal environment can interact to create a thermal runaway situation. This runaway condition, although extremely low in probability and frequency, is difficult to predict. The z9 chips and modules were thermally modeled to ensure that the burn-in hardware can handle not only the average power dissipation but also the effects of hot spots.
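The self-feedback between leakage power and local temperature can be illustrated with a lumped fixed-point iteration. This is a minimal sketch, not the thermal model used for the z9 chips and modules: it assumes leakage power grows exponentially with temperature and that a single thermal resistance couples the hot spot to the heat sink. All names and numbers are hypothetical.

```python
import math


def leakage_power_w(temp_c, p_ref_w=100.0, t_ref_c=100.0, t_scale_c=40.0):
    """Hypothetical leakage model: power grows exponentially with temperature."""
    return p_ref_w * math.exp((temp_c - t_ref_c) / t_scale_c)


def settle_temperature(t_sink_c, r_th_c_per_w, max_iter=200,
                       tol_c=0.01, t_limit_c=250.0):
    """Iterate T = T_sink + R_th * P(T) until it converges.

    Returns the settled temperature in deg C, or None if the iteration
    exceeds t_limit_c or fails to converge (treated here as runaway).
    """
    temp_c = t_sink_c
    for _ in range(max_iter):
        next_temp_c = t_sink_c + r_th_c_per_w * leakage_power_w(temp_c)
        if next_temp_c > t_limit_c:
            return None  # feedback loop diverged
        if abs(next_temp_c - temp_c) < tol_c:
            return next_temp_c
        temp_c = next_temp_c
    return None


# Same chip model, two hypothetical thermal interfaces.
good_interface = settle_temperature(t_sink_c=60.0, r_th_c_per_w=0.2)
poor_interface = settle_temperature(t_sink_c=60.0, r_th_c_per_w=0.6)
```

With the lower thermal resistance the iteration settles at a stable temperature; with the higher one no stable operating point exists and the function reports runaway. In this toy model, a modest degradation of the thermal path is enough to tip a part from stable to unstable.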

The z9 burn-in methodology is in situ burn-in [26]. This methodology allows for continuous device stress while verifying the expected results from the application of test patterns (i.e., testing). In earlier generations, dynamic burn-in was used, but it provided only limited verification of functionality. Dynamic burn-in allowed more burn-in escapes (field fails not occurring in burn-in) because chip function was not guaranteed. In situ burn-in minimizes burn-in escapes caused by equipment failures, pattern coverage, and product functionality.

The tool interface, implementation, and tool control aspects of the burn-in methodology are handled by the part number program (PNP). The PNP is the key software component that controls the tool set points (e.g., voltage, temperature, and time), pattern applications, and other features.

The z9 PNP is partitioned into two in situ segments. The fundamental purpose of the first segment is reliability defect screening at high-voltage and high-temperature conditions with slower pattern application or clock speeds. The second segment serves as a guard band ahead of final system assembly test and improves the shipped product quality level. This segment uses high clock speed and full test coverage at temperature and voltage conditions that are below burn-in conditions but beyond typical system application conditions.
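The ordering of conditions across the two segments can be written down as a small consistency check. The sketch below is illustrative only: the `Segment` record and `check_two_segment_plan` function are hypothetical names, not part of the actual PNP, and every set point shown is a placeholder rather than a z9 value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical record of the set points for one set of conditions."""
    name: str
    voltage_v: float
    temperature_c: float
    clock_mhz: float


def check_two_segment_plan(screen, guard_band, nominal):
    """Return a list of violated ordering rules (empty if the plan is consistent).

    Rules taken from the text: the screening segment runs at the highest
    voltage and temperature with a slower clock; the guard-band segment runs
    at conditions below burn-in but beyond nominal, with a high clock speed.
    """
    problems = []
    if not (screen.voltage_v > guard_band.voltage_v > nominal.voltage_v):
        problems.append("voltage must satisfy screen > guard band > nominal")
    if not (screen.temperature_c > guard_band.temperature_c > nominal.temperature_c):
        problems.append("temperature must satisfy screen > guard band > nominal")
    if not (guard_band.clock_mhz > screen.clock_mhz):
        problems.append("guard-band clock must be faster than screening clock")
    return problems


# Placeholder values for illustration only.
nominal = Segment("system application", 1.2, 60.0, 1720.0)
screen = Segment("segment 1: defect screen", 1.8, 140.0, 100.0)
guard_band = Segment("segment 2: guard band", 1.4, 90.0, 1720.0)
issues = check_two_segment_plan(screen, guard_band, nominal)
```

A check of this kind would catch a plan in which the two segments were accidentally swapped or a guard-band set point fell inside the nominal operating range.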

Up-front engineering includes power modeling, which is then followed by extensive characterization of early silicon. This level of engineering enables the correct burn-in platform selection by power and capability. The burn-in characterization results are merged into the wafer sorting plan to aid in burn-in power management. This combined strategy, along with the burn-in tool flexibility within and across the platforms, optimizes cost and reliability.

The burn-in hardware solution itself is a system which allows more than a hundred chips to be processed in parallel. This tooling is an advance over prior generations in power-delivery capability as well as thermal management and control [27, 28]. It also represents an improvement in the integration level between the various subsystems. At the heart of the hardware solution is the thermal solution. In this tooling, each chip is brought into physical contact with a water-cooled tin-plated copper heat sink. For the z9, the chips are mounted on temporary ceramic carriers allowing for single-chip module-level burn-in. The tooling heat sink directly contacts the back side of the chip. The heat sinks are resident in heat-sink arrays inside the tooling [28]. To optimize thermal performance, special design consideration was given to the heat-sink design, the thermal interface surface, and the interface medium [29] between the chip and the heat sink.

Thermal control is accomplished by modulating the cooling water flow to each individual chip in the tool [27]. A complex control algorithm based on the system thermal model is utilized to accomplish this control. Chip temperatures are monitored by a heat-sink integral back-side-contact temperature sensor and/or through the use of multiple on-chip thermal sensors included in the chip design. Each chip is individually monitored and controlled. The ability to run on-chip at-speed patterns is also supported by a combination of chip design and tooling specification.
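Per-chip temperature control by flow modulation can be illustrated with a much simpler stand-in for the model-based algorithm described here. The sketch below uses a plain proportional-integral (PI) loop on a lumped first-order thermal model; it is not the control algorithm used in the tool. The heat capacity, conductances, gains, set point, and water temperature are all hypothetical, and chip power is held constant, whereas real leakage power varies with temperature.

```python
def simulate_chip(power_w, set_point_c=140.0, water_c=20.0,
                  seconds=600.0, dt_s=0.1):
    """Simulate one chip under PI control of cooling-water flow.

    Returns the chip temperature (deg C) at the end of the run.
    All parameters are hypothetical and chosen only for illustration.
    """
    heat_capacity_j_per_c = 50.0   # lumped chip + heat-sink heat capacity
    base_conductance = 0.5         # W per deg C to the water at zero flow
    flow_conductance = 4.0         # extra W per deg C at full flow
    kp, ki = 0.05, 0.01            # PI gains on the temperature error

    temp_c = water_c
    integral = 0.0
    for _ in range(int(seconds / dt_s)):
        error_c = temp_c - set_point_c  # positive when the chip is too hot
        # Clamp the integral so it cannot wind up while the valve is saturated.
        integral = min(max(integral + error_c * dt_s, 0.0), 1.0 / ki)
        # Normalized flow command: 0.0 is closed, 1.0 is fully open.
        flow = min(max(kp * error_c + ki * integral, 0.0), 1.0)
        conductance = base_conductance + flow_conductance * flow
        heat_out_w = conductance * (temp_c - water_c)
        temp_c += (power_w - heat_out_w) / heat_capacity_j_per_c * dt_s
    return temp_c


# One controller instance per chip, spanning the 70 to 400 W range in the text.
final_temps = {p: simulate_chip(p) for p in (70.0, 200.0, 400.0)}
```

In this toy model the same gains hold low- and high-power parts at the set point, but only because the plant is idealized. The need to tune a real algorithm across die designs and package types is the harder problem.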

A number of design considerations had to be made to accommodate the high currents and powers required by the chips. Additional design considerations were given to the handling of heat generated by joule heating of all current-carrying paths from the supply out to the design under test. For example, the socket design (Figure 11) included contacts capable of handling high per-contact currents as well as an integral air-cooled heat sink to assist in the removal of waste heat from the burn-in board and contactor. Additional design considerations were made to accommodate manufacturing and cost considerations, including surface mount capability and a socket hardware reuse strategy.

Figure 11
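The joule-heating concern for the current-carrying paths follows from P = I²R. The sketch below is a rough estimate that assumes the supply current divides evenly across identical contacts; the supply voltage, contact count, and contact resistance are hypothetical placeholders, not socket specifications.

```python
def contact_joule_heating(chip_power_w, supply_voltage_v,
                          n_power_contacts, contact_resistance_ohm):
    """Estimate per-contact current (A) and total contact heating (W).

    Assumes even current sharing across identical contacts, which is an
    idealization; real contacts will share current unevenly.
    """
    total_current_a = chip_power_w / supply_voltage_v
    per_contact_current_a = total_current_a / n_power_contacts
    per_contact_heat_w = per_contact_current_a ** 2 * contact_resistance_ohm
    return per_contact_current_a, per_contact_heat_w * n_power_contacts


# Hypothetical case: a 400 W chip (the upper end quoted in the text)
# fed at 1.6 V through 500 contacts of 20 milliohms each.
current_a, heat_w = contact_joule_heating(400.0, 1.6, 500, 0.020)
```

Because the heating scales with the square of the current, halving the number of contacts in this estimate doubles the current per contact and doubles the total waste heat in the contactor.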

The burn-in learning curve for z9 was steep; power management was key. Two major burn-in challenges which required innovation and understanding were the complex thermal control algorithms and the thermal interface. The thermal algorithms included fine-tuning of the embedded thermal model which controls the chip temperature. The thermal interface challenges included an in-depth study and understanding of the relationships between heat sinks and chips as well as the interface medium [28] itself. Key to good thermal conduction was the flatness of the heat-sink surface which contacts the die surface. The flatness/cost tradeoff had to be understood. Fabricating interface surfaces which are extremely flat can be costly and can adversely affect lead times. Equipment that is not flat enough means yield loss, potential burn-in process cycle-time issues, and reduced thermal performance. On the road to understanding flatness, other important capabilities and techniques were developed to enable measurement, correlation, and diagnostics. Translation from optical flatness to tool diagnostic electrical parameters was the final step for ensuring a level of flatness across the entire toolset.

Developing and fine-tuning the thermal control algorithm of the tool for each die design was another significant challenge. The algorithm had to control the temperature to a set point for high- and low-power devices simultaneously. The challenge was eased slightly by the data-collection capability of the toolset. By design, the tool can collect voltage, current, temperature, and other parameters for all parts in the run at a given time. After several rounds of data analysis and algorithm adjustment, a robust thermal control solution emerged. This solution had to handle chips of widely varying powers, different die designs, and different package types, as well as the transients associated with temperature, voltage, and clock speed changes during the burn-in cycle.

Due in large part to the success of the burn-in operation, the z9 multichip module was able to meet its reliability objectives.

Summary

The 90-nm SOI technology features used for the z9 multichip module chipset were reviewed. The necessary optimization and enhancement of this technology for z9 were also discussed. Many of the technology challenges arose from the goal of a 2X system performance increase over the previous generation. This drove the technology and design to a 40% increase in frequency for the MCM. On top of the base SOI technology features, later developments, which included process optimization, computational lithographic advances (OPC), FET device tuning, and mask build optimization, were key to improving delay performance over the nominal technology to meet the performance goals. This increased performance had to be balanced by power considerations, which led to the innovations necessary to successfully burn in the z9 chips. Good defect acceleration in burn-in allowed the z9 to meet its overall reliability goals. Electrically programmable fuses were used after burn-in to replace defective SRAM cells and thus increase yield. Another 90-nm-generation challenge was SRAM reliability. Through successful modeling, careful experimentation, and state-of-the-art failure analysis, the SRAM cells, reduced in area ~50% from the previous technology, were optimized to meet both their performance and reliability goals. Numerous innovations and a systematic approach were required across the spectrum of technology disciplines for the 90-nm chips of the z9 multichip module to ultimately allow the overall system to reach its goals.

Acknowledgments

The authors would like to recognize the many members of the IBM technology manufacturing, development, and research teams, and the z9 chip design, test, and burn-in teams who contributed to this achievement. In particular we would like to acknowledge the following individuals for their outstanding contributions: Richard Rizzolo for his work on system integration bridging back to wafer and module test; Debbie Hamm, Janet Rocque, and Stephen Hennessey for their product engineering work; Jim Crafts and Karre Greene for their work on test development and functional characterization, providing timely feedback to the processing and systems team; Jeff Brown and Ravikumar Ramachandran for process engineering on the early user hardware build; Amy Chan and Raymond Van Roijen for their FET device integration; and Allen Gabor for his work in reducing across-field device variation.

*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.

References

Received April 6, 2006; accepted for publication October 2, 2006; Published online February 13, 2007.

