Michael Gschwind is an IBM Master Inventor, an IBM Power
architect and a logic design and microarchitecture lead for a future
IBM System. He was one of the initiators of, and a leading contributor to,
the Cell Broadband Engine system architecture definition as well as a
lead architect of the Synergistic Processor architecture. During the
definition of the Synergistic Processor architecture, Dr. Gschwind
also developed the first Cell Broadband Engine compiler.
Dr. Gschwind joined the IBM T.J. Watson Research Center, Yorktown
Heights, NY, in 1997. He has held leadership positions in several
seminal projects, including the DAISY dynamic compilation project
where he was responsible for design, test and integration of the DAISY
tree-VLIW architecture, and the BOA project where he was a lead
architect for the BOA high-frequency statically scheduled
architecture, and was a leading contributor to the development of
pioneering dynamic compilation techniques. During the project,
Dr. Gschwind developed key pipeline control techniques used in several
industry-leading microprocessors.
Dr. Gschwind was also a leading contributor to seminal work on
power/performance trade-offs in microprocessor designs which
formalized the futility of the frequency-centric uniprocessor design
approach used in the industry at the time, an insight that had already
guided the design of the Cell Broadband Engine.
Dr. Gschwind is a leader in assessing the impact of future
technologies on architecture and microarchitecture for IBM systems.
Dr. Gschwind has contributed to the development of the Power
Architecture as the lead architect for a next generation SIMD
extension, to the architecture and microarchitecture of several
generations of PowerPC processor cores, and PowerPC-based systems such
as the IBM PowerBlade, as well as to the architecture and
microarchitecture of IBM zSeries systems.
Dr. Gschwind has also contributed to compilers, application
programming environments and application interfaces for IBM zSeries
and pSeries systems under Linux, AIX and z/OS.
Dr. Gschwind's contributions to IBM systems and technology have
been recognized with several corporate awards. In addition to his
contributions to the design and implementation of IBM systems, he is
the author of over 75 papers, covering hardware/software co-design,
compiler technology, multimedia processing, and high-performance
computer architecture, and has received key patents for his inventions
in these areas.
Before joining IBM, Dr. Gschwind made significant contributions to
the field of reconfigurable architectures, developing architecture
extension and evaluation techniques, and pioneering the use of
high-capacity field-programmable gate arrays as an implementation
platform for microprocessors.
In addition to his corporate contributions, Dr. Gschwind has been a
faculty member at Technische Universität Wien, Vienna, Austria, and a
visiting faculty member at Princeton University, where he has taught
classes on advanced computer architecture. Dr. Gschwind received PhD
and MS degrees in computer science from Technische Universität Wien,
Vienna, Austria.
Bruce D'Amora is a Senior Technical Staff Member in the
Emerging Systems Software group at the IBM T.J. Watson Research
Center. His research interests are in 3D rendering and physical
simulation. Mr. D'Amora is currently focusing on the design and
programmability of Cell processor based systems targeted at video game
development and digital media. His previous project was Pervasive 3D
Viewing for Product Data Management for which he developed a 3D
renderer on a network enabled handheld device. Prior to his position
at Watson, Bruce was the Chief Software Architect for the 3D graphics
development group at IBM Austin and the IBM representative on the
OpenGL Architectural Review Board. He has designed and developed
graphics applications, system software, and graphics adapters for 23
years. Mr. D'Amora began his career at IBM Boulder where he was a
developer of IBM's first MCAD application, FastDraft. The application
was targeted specifically at the automotive and aerospace industries. He
subsequently developed one of the first PC CAD applications, CADWrite,
that utilized the Virtual Device Interface (VDI) to access low-level
graphics adapter function. After briefly working on the AN/BSY-1
submarine sonar system, Mr. D'Amora relocated to IBM Austin, TX, to lead
the OpenGL development effort. In 1993, his team produced the first
industry implementation of OpenGL on the initial PowerPC 601
workstation developed by the IBM RS/6000 division. Derivatives of the
OpenGL product became the code base for subsequent IBM graphics
systems. Mr. D'Amora holds a BA in Microbiology and a BS in Applied
Mathematics from the University of Colorado. He also holds an MS in
Computer Science from National Technological University.
Kevin O'Brien has spent the last 24 years at IBM working in the
field of compilation and architecture. Initially, at the IBM Toronto
Lab, he was the lead architect of the TOBEY optimizing backend (used
in IBM's xlc, xlf, and xlC (C++) product compilers) and made many of
the key design decisions for the whole compiler suite, including the C
and Fortran front ends. Mr. O'Brien was a co-architect of the original
intermediate language for TOBEY, XIL, which is still in use in the
compilers that ship today. He was also responsible for the
reassociation optimization in these compilers, which were for a long
time the only commercial compilers to perform this optimization.
On moving to the IBM T.J. Watson Research Center at Yorktown Heights,
Mr. O'Brien became the lead designer of a research project to perform
cache optimizations in the XL compilers. He developed numerous loop
transformation techniques, such as unroll and jam, stripmining, and
loop unswitching, as well as predictive commoning. All of his
techniques were incorporated into, and subsequently shipped in, the
product compilers.
Mr. O'Brien was also the architect of a new intermediate language
(YIL), initially as a research prototype, but later incorporated into
the product compilers. This high-level intermediate language was
introduced to facilitate high-level loop optimizations that were
difficult to perform on the lower-level XIL. This YIL intermediate
language was used in the High Order Transformation phase which shipped
in the product compilers, and formed the precursor of the current
Toronto Portable Optimizer.
Contemporaneously with this work, Mr. O'Brien also developed a
prototype simdizing compiler for a Numeric Intensive Computation
Accelerator architecture being developed by IBM.
Subsequent to this work, Mr. O'Brien was one of the key members of
the Single Program Speculative Multithreading architecture/compiler
co-design project.
In the course of his 17 years at IBM Research, Mr. O'Brien has
focused his efforts on deploying his diverse compiler optimization
skills to the improvement of IBM products. In 1996 he ran a project to
enhance the performance of the IBM Smalltalk product. With the
emergence of Java, Mr. O'Brien applied his dynamic optimization skills
to the design of a dynamic optimizer for a just-in-time compiler for a
JVM for the first Transmeta machine. This was the first JIT to be
written in Java. Many of the techniques developed in this work were
applied in a continuous optimization phase of a binary
translation/emulation project that Mr. O'Brien worked on
subsequently. As part of this project he also developed a threaded
interpreter. He continues to consult on issues of dynamic optimization
in IBM.
For the last 4.5 years Mr. O'Brien has been the project leader for
the IBM prototype compilers for the CBE Processor. In addition to
developing major pieces of the scalar compiler, he has developed
techniques for the single shared memory abstraction and the efficient
management of code and data on the CBE processor. He is currently
investigating memory related optimizations for the CBE.
Mr. O'Brien has received IBM Corporate and Outstanding Technical
Achievement Awards for his work on the XL compilers and holds several
patents on compiler optimization and SPSM. He has also contributed to
the architectures of Power, PowerPC and Cell.
Mr. O'Brien holds a BSc in Theoretical Physics and an MSc in
Astrophysics from Queen Mary College, London, England.
Kathryn O'Brien has worked at IBM for 23 years, 17 of them as
a researcher at the IBM T.J. Watson Research Center, where she has been
involved in several static and dynamic compiler projects. Ms. O'Brien
began her career at the IBM Toronto Lab, where she became the project
leader for the IBM XLF Fortran runtime library for the RS/6000
processor. Subsequently she worked on the TOBEY optimizing backend of
the XL compilers where she was responsible for several
optimizations. She implemented the trap motion optimization in the
current XL product compilers.
On moving to the IBM T.J. Watson Research Center at Yorktown
Heights, Ms. O'Brien started a project to develop automatic
vectorization in the IBM XL compilers targeting the S390 vector
instruction set. This project formed the basis of the later
vectorization, parallelization and memory hierarchy optimization
phases in these compilers, and shipped as the first High Order
Transformation phase in the XL compilers, the precursor to the
current Toronto Portable Optimizer. Her primary interest in this work
was in data dependence analysis and loop distribution.
Later, Ms. O'Brien was one of the lead architects of an
architecture/compiler co-design project for one of the first
speculative multithreading architectures, SPSM (Single Program
Speculative Multithreading). She developed significant portions of the
compiler to target this architecture, and is a co-inventor of the
original SPSM patent.
Throughout her career at IBM Research, Ms. O'Brien has been involved
in a number of other static and dynamic optimization projects. Most
notably, she was one of the key members of a project to build a
dynamically optimizing JIT compiler, written in Java, for a JVM for
the first Transmeta machine. This was the first JIT compiler to be written in
Java. Subsequently Ms. O'Brien worked on a continuous dynamic
optimization project which used a multiprocessor system to monitor and
dynamically re-optimize executing applications.
Most recently, Ms. O'Brien has played a key role in prototyping the
first XL compilers for the Cell architecture, where her specific
interests are in both scalar compilation techniques and in compiler
exploitation of the multiple levels of parallelism through a single
source compiler. She has developed significant portions of the scalar
compiler and was instrumental in the design of the single source
OpenMP compiler.
Ms. O'Brien holds a B.A. degree in History from the Queen's University
of Belfast and an M.A. degree in Anthropology from the University of
London in England.
Alexandre Eichenberger is currently a Research Staff Member
in the Exploratory System Architecture group of the VLSI Systems
department at the IBM T.J. Watson Research Center. My research
interests focus on the interaction between compiler technology and
micro-architecture design.
While at Watson, I have explored and developed compiler support for
many of the micro-architectural tradeoffs present in the Cell
Broadband Engine (Cell BE) processors, primarily on the Synergistic
Processor Element (SPE). Examples of tradeoffs are SPE-specific
scheduling and bundling issues, as well as compiler techniques to
prevent instruction fetch starvation on the SPEs.
More recently, I have been a primary contributor to the automatic
generation of SIMD code targeting the SIMD units found in the Cell
(SPE/VMX), Power (VMX), and BlueGene/L (double-precision
floating-point) architectures, focusing on data alignment and code
generation related issues. Prior work includes unroll-and-pack
approaches and loop-based approaches. The novel approach that I
pioneered combines aspects of both of these approaches. It also
attempts to systematically minimize the impact of data reorganization
due to compile-time or runtime data misalignment, and it can perform
auto-simdization in the presence of data conversion (i.e., conversion
from one data type to another). Auto-simdization can generate such
minimum data reorganization code for the SIMD unit even in the presence of
multiple compile-time or runtime alignments. It also handles induction
variables, private variables, and non-stride-one memory reference
patterns.
Prior to working at IBM, I worked on applying modulo scheduling to
new architectures, such as clustered architectures where the compiler
explicitly handles communication among clusters of functional units. I
also applied modulo scheduling to new domains, such as generating
faster instrumentation code that gathers branch profiles using modulo
schedules distributed throughout the code. In that work, the best
technique reduces the slowdown due to instrumentation by a factor of 10.
I also extended my area of expertise to straight-line code by
investigating scheduling algorithms for superblocks, where the most
efficient algorithm (balance) explicitly delays some branches to
reduce average execution time. This algorithm can reduce the slowdown
relative to the lower bound by a factor of 2.
To achieve a higher degree of performance, superblocks including
multiple predicated execution traces (i.e. hyperblocks) have been
extensively used. However, prior to my work, hyperblocks could not be
optimized to the same extent as single path regions, because some
conditions along one path may prevent useful optimization along some
other paths. I proposed an approach that enables single path
optimizations in hyperblocks by selectively renaming registers and
replicating operations in the hyperblock. Renaming and replication are
performed only when they enable some optimization to break a critical
dependence. Measurements indicate that large speedups are possible,
e.g. up to 66% for wc, 8% for li, and 7% for compress.
I received a Diploma in Computer Science from Eidgenössische
Technische Hochschule, Zürich, Switzerland, in 1991. I studied at the
Computer and Electrical Engineering Department at the University of
Michigan, Ann Arbor, and received M.S. and Ph.D. degrees in
Computer and Electrical Engineering in 1993 and 1996, respectively. I
was an Assistant Professor in the Department of Electrical and Computer
Engineering at North Carolina State University before joining IBM
Research at the IBM T.J. Watson Research Center in 2001. I have
published more than 20 refereed papers in journals and conferences
including MICRO, PACT, PLDI, CGO, and ICS.
Peng Wu is currently a research staff member in the High
Performance Software Environment group at the IBM T.J. Watson Research
Center. She has been part of the Cell compiler team since its inception. In
the early days, Dr. Wu was heavily involved in the prototyping of the
XL compiler backend for the Cell architecture and in defining C/C++
extensions for SPE intrinsics. Since 2003, Dr. Wu has directed her
research efforts toward compilation issues in vectorizing for
SIMD architectures (a.k.a. simdization). She is a primary contributor
to automatic simdization in the Cell compiler. Currently, Dr. Wu leads
the effort to build a general simdization infrastructure that targets
a variety of SIMD units in IBM processor families including the Cell
processor, VMX for PPC970, and Dual FPU unit for BlueGene/L. Her
research interests include program analysis, compilation for SIMD
architectures and multimedia applications, and language and compiler
solutions to achieve better productivity. Peng Wu received her
doctoral degree in Computer Science from the University of Illinois at
Urbana-Champaign in 2001.