The SPU was designed with a focus on compiled code
from the beginning, and the early availability of SIMD-optimized
compilers allowed the development of high-performance
graphics and media libraries for the Broadband
Architecture entirely in the C programming language.
A key innovation in the Synergistic
Processor Architecture is the use of scalar
layering to map scalar computation onto a pervasively
data parallel architecture as implemented by the SPU.
In the Synergistic Processor
Architecture, a single unified register file is used to
store scalar and SIMD data, for all data types including
integer, floating point, Boolean values, and
addresses.
All data paths are 128 bits wide, and the
instruction set only provides instructions that operate on
the entire data width. As a result, whether data is scalar
is no longer determined by the register file and instruction
used, but by how the result of a pervasively data parallel
instruction is used.
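The scalar layering idea can be sketched in plain C. This is a
minimal model under stated assumptions, not the SPU implementation:
the type vreg128 and the functions simd_add, scalar_to_reg,
reg_to_scalar and scalar_add are hypothetical names, a register is
modeled as four 32-bit lanes, and the scalar is assumed to live in
lane 0 (the "preferred slot"), a detail not stated in the text above.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit register: four 32-bit lanes. */
typedef struct { int32_t lane[4]; } vreg128;

/* Assumption: a scalar occupies lane 0, the "preferred slot". */
enum { PREFERRED_SLOT = 0 };

/* The only add in this model: it always operates on all four lanes. */
vreg128 simd_add(vreg128 a, vreg128 b) {
    vreg128 r;
    for (int i = 0; i < 4; i++) {
        /* Unsigned arithmetic avoids signed-overflow undefined behavior. */
        r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
    }
    return r;
}

/* Place a scalar in the preferred slot; the other lanes are don't-care
   (zeroed here only to keep the model deterministic). */
vreg128 scalar_to_reg(int32_t x) {
    vreg128 r = { {0, 0, 0, 0} };
    r.lane[PREFERRED_SLOT] = x;
    return r;
}

/* Interpret a register as a scalar by reading only the preferred slot. */
int32_t reg_to_scalar(vreg128 r) {
    return r.lane[PREFERRED_SLOT];
}

/* Scalar add layered on the SIMD add: same instruction, and only the
   use of the result makes it "scalar". */
int32_t scalar_add(int32_t x, int32_t y) {
    return reg_to_scalar(simd_add(scalar_to_reg(x), scalar_to_reg(y)));
}
```

In this model, scalar_add(2, 3) executes the same four-lane add that
a SIMD caller would use; the result is scalar only because the caller
reads a single lane and ignores the rest.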
This approach streamlines the
architecture specification and implementation, and
achieves better compute density than
previous architectures, which supported data parallelism
only as an architecture option. By focusing on the
exploitation of data parallelism as the cornerstone of
the architecture, the new pervasively data parallel
computing (PDPC) paradigm achieves much
greater data processing capabilities and redefines high
performance architecture.
Compiler exploitation was an integral
part of the Cell concept from the beginning. A first
compiler demonstrated the concepts -- such as scalar
layering -- incorporated in Cell and was developed by
members of the original architecture team. The first
Cell SPU compiler was a guiding force
for the definition of the SPU architecture and the SPU programming
environment, and for sample code that exploits the strengths of
the Cell Broadband
Engine Architecture.
The SPU offers a high-performance,
statically scheduled execution engine. Advanced compiler
techniques exploit it to
maximize the use of instruction level parallelism, based
on the bundle architecture, static branch prediction and
prepare-to-branch operations, and compiler-managed
instruction fetch hints.
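How source code can inform static branch prediction can be
illustrated in portable C. This is a minimal sketch, assuming a
GCC-compatible compiler that provides __builtin_expect; the macro
UNLIKELY and the function sum_nonnegative are hypothetical names.
Whether a particular SPU compiler turns this annotation into a
prepare-to-branch hint is an assumption, not something stated above.

```c
#include <stddef.h>

/* Assumption: __builtin_expect is a GCC/Clang builtin. On other
   compilers the macro falls back to the plain condition. */
#if defined(__GNUC__)
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define UNLIKELY(cond) (cond)
#endif

/* Sum the non-negative entries of v. The negative case is marked as
   rare, so a statically scheduling compiler can lay out the common
   path as the straight-line (predicted) path. */
long sum_nonnegative(const int *v, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {
            continue; /* rare path: skip negative entries */
        }
        total += v[i];
    }
    return total;
}
```

The annotation does not change the result of the function; it only
tells the compiler which direction of the branch to favor when it
schedules code and places hints.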
A detailed overview of the Cell SPU
architecture can be found in:
- M. Gschwind, P. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe,
  T. Yamazaki. A novel SIMD architecture for the Cell
  heterogeneous chip-multiprocessor. Hot Chips 17, August 2005.