This special issue of the IBM Journal of Research and Development provides an overview of the Cell Broadband Engine™ (Cell/B.E.) processor, describes its first two implementations, and reports on initial Cell/B.E. processor–based systems from IBM and the first uses of these systems. The Cell/B.E. processor was initially intended for the computer gaming market, and as such, Sony Computer Entertainment, Inc., has shipped millions of PLAYSTATION®3 systems that use this processor. However, when in March 2007 a program to do protein folding was made available on the PLAYSTATION 3 system, within a few days it created the world's largest distributed computer for this application. The Cell/B.E. processor–based system contributed more than twice as much application performance as a vastly larger number of PCs could have delivered. IBM and Mercury Computer Systems, Inc., have begun shipping the first generation of Cell/B.E. processor–based blade servers. IBM and Los Alamos National Laboratory have announced their goal to jointly build a 1-Petaflops supercomputer based on a variant of the Cell/B.E. processor that provides significant improvement in double-precision floating-point performance. Mentor Graphics Corporation has introduced a Cell/B.E. processor–based solution for semiconductor mask processing for the 45-nm-node semiconductor generation, allowing its customers to replace three racks of conventional processors with half a rack of Cell/B.E. processor–based blades. A large number of universities are joining in research based on, or enabled by, the Cell/B.E. processor. The IBM developerWorks® Web site on the Cell/B.E. processor draws a great amount of traffic. An enormous number of articles on the Cell/B.E. processor, systems, and applications have appeared on the Internet and in newspapers. This issue of the Journal is intended to serve as an introduction to the key technical features of the Cell/B.E. processor and early systems and applications based on it.
An introduction to the Cell/B.E. Architecture (CBEA) is provided in the paper by Johns and Brokenshire. This paper describes how the PowerPC Architecture™ has been extended with loosely coupled cooperative off-load processors. In the CBEA, these cooperative processors are the synergistic processor elements (SPEs). In addition, the paper provides a detailed overview of the communication and control mechanisms between the SPEs and the PowerPC Architecture–compliant PowerPC processor elements (PPEs). The paper also provides a brief introduction to the programming environment.
The paper by Shimizu et al. provides an overview of the security mechanisms supported by the CBEA. The Cell/B.E. processor security architecture is based on three key features: the ability to create an isolated execution environment within the SPEs, a hardware-based “root of trust,” and the ability to dynamically load code and data into the isolated environment and authenticate and/or decrypt it on the basis of this hardware root of trust. The combination of these three features introduces a flexible solution that can be used for security, privacy, and digital-rights management without introducing any dependency on the integrity of the system software such as the operating system or hypervisor.
A comprehensive overview of the architecture, microarchitecture, and implementation of the synergistic processor unit (SPU) in the SPEs is given in the paper by Flachs et al. By focusing on mechanisms that co-optimize peak performance and performance per transistor, the SPUs deliver high performance per thread, as well as considerably higher performance per transistor than conventional processor architectures. The SPU is by no means the only mechanism that enhances efficiency over that of conventional architectures. Another key feature is the introduction of the software-managed local store memory, which provides a third software-managed storage resource between the processor registers and shared system memory.
The paper by Riley et al. provides a description of the implementation of the Cell/B.E. processor in 90-nm and 65-nm IBM silicon-on-insulator (SOI) technology. In addition to architectural innovation, the implementations of the CBEA introduce a number of techniques, such as a wide variety of aggressive latch designs, required to meet the very aggressive cycle time objectives. In the 65-nm design, static random access memory (SRAM) arrays have a separate supply voltage to maintain SRAM stability without sacrificing density or performance.
The paper by Chen et al. provides an overview of the performance of the Cell/B.E. processor on a variety of computational kernels from high-performance computing, graphics and media, and security. This paper reports that the Cell/B.E. processor typically achieves an order-of-magnitude better performance than general-purpose processors of the same technology generation when running computationally intensive applications.
The Cell/B.E. processor combines a level of numeric computing power traditionally associated with expensive, high-end supercomputers with commodity, high-volume attributes, which makes it an attractive platform for a new breed of streaming, real-time, interactive digital media, scientific, and engineering applications. This new breed of applications encompasses a wide range of areas such as games, streaming media, medical imaging, video surveillance, three-dimensional and real-time rendering, collaborative engineering design, virtual worlds, military simulation, seismic computing, and financial modeling, among others. Three generations of IBM Cell/B.E. blade servers, including the currently available QS20 product, are designed to address this broad set of applications and emerging market opportunities.
The paper by Nanda et al. on blade servers describes the architecture and design philosophy behind several generations of IBM server products using the Cell/B.E. processor. These blades take advantage of high-volume and open-system technologies such as the IBM BladeCenter® platform, open-system software, and commodity high-speed switches, and they are targeted for high-performance servers ranging from a low-end two-way system to supercomputers containing tens of thousands of blades. In addition to the details of the hardware and software architectures, this paper briefly describes three applications (digital video surveillance, distributed game physics, and soft-body simulation) and their performance on Cell/B.E. blades.
Another key application of Cell/B.E. blades, speech recognition, is presented in depth in the paper by Liu et al. This paper describes an automated speech recognition system including efficient speech decoding algorithms and parallel data implementation of the system on a Cell/B.E. processor. Initial performance measurements indicate that a single blade can recognize speech from thousands of simultaneous voice channels in real time—a channel density that is orders of magnitude greater than the capacity of existing CPU (central processing unit)–based software speech recognizers.
Writing parallel code to extract performance from the various computing units in the Cell/B.E. processor can be a challenging task. The paper by Perez et al. presents a mechanism called CellSs that is designed to ease the burden on the programmer by automatic exploitation of the functional parallelism of a sequential program. The CellSs programming tool focuses on flexibility and is based on a simple annotation of the sequential source code. A source-to-source compiler generates parallel code automatically, and a runtime library helps execute this code on the SPEs. The runtime environment handles task scheduling and data communication among the SPEs. The paper details an implementation of this approach for Cell/B.E. blades and describes several code examples that validate the approach.
The final paper in this issue is not related to Cell/B.E. technology or systems. Black et al. provide a review of a novel process called polymer self-assembly, which is a nontraditional approach to patterning integrated circuit elements at dimensions and densities that are inaccessible to traditional lithographic methods.
H. Peter Hofstee
Distinguished Engineer, Architect STI Design Center IBM Systems and Technology Group

Ashwini K. Nanda
Chief Architect, Cell Systems IBM Systems and Technology Group

Guest Editors

John J. Ritsko
Editor-in-Chief IBM Journal of Research and Development