- BOA: a second generation DAISY architecture
Tutorial presented at ISCA 2004, München, Germany, June 2004.
[ Abstract and foils ] [ DAISY and BOA bibliography (sorted by date) ]
- Inherently Lower Complexity Architectures using Dynamic Optimization
presented at Workshop on Complexity-Effective Design 2002 held in
conjunction with ISCA-2002, Anchorage, AK, May 2002.
[ Abstract and foils ]
- Precise Exceptions in Dynamic Optimization
presented at 2002 Symposium on Compiler Construction,
Grenoble, France, April 2002
[ Abstract and foils ]
- Optimization and Precise Exceptions in Dynamic Compilation
presented at Workshop on Binary Translation 2000 held in
conjunction with PACT 2000, Philadelphia, PA, October 2000
[ Abstract ]
- Binary Translation and Architecture Convergence Issues for IBM System/390
presented at International Conference on Supercomputing 2000 (ICS '00),
Santa Fe, New Mexico, May 2000.
[ Abstract and foils ]
- Optimizations and Oracle Parallelism with Dynamic Translation
presented at MICRO-32, Haifa, Israel, November 1999
[ Abstract and foils ]
- Execution-based Scheduling for VLIW Architectures
presented at Europar-99, Toulouse, France, September 1999
[ Abstract and foils ]
- An eight-issue tree-VLIW processor for dynamic binary translation
presented at ICCD 98, Austin, TX, October 1998
[ Abstract and foils ]
- DAISY: Dynamic Compilation for 100% Architectural
Compatibility
presented at ISCA-24, Denver, Colorado, June 1997
[ Abstract and foils ]
- BOA: Targeting Multi-Gigahertz with Binary Translation
presented at Binary Translation Workshop 99, Newport
Beach, California, October 1999
[ Abstract and foils ]
- Compiler/architecture interaction in a tree-based VLIW processor
Workshop on interaction between compilers and computer architectures,
1997 High-Performance Computer Architecture Conference (HPCA97),
San Antonio, February 1997.
[ Foils (pdf, 136 KB) ]
- Implementing an experimental VLIW compiler
Workshop on computer architecture education,
1997 High-Performance Computer Architecture Conference (HPCA97),
San Antonio, February 1997.
[ Foils (pdf, 154 KB) ]
- Scalable Instruction Level Parallelism through Tree
Instructions
presented at International Conference on
Supercomputing, Vienna, Austria, July 1997
[ Abstract and foils ]
- The IBM Research VLIW project
RISC in 1995 Symposium,
IBM T.J. Watson Research Center,
Yorktown Heights, NY, November 1995.
[ Abstract and foils ]
- Simulation/evaluation approach for a VLIW processor
Workshop on pre-hardware performance evaluation,
ACM International Symposium on Computer Architecture (ISCA95),
Santa Margherita Ligure, Italy, July 1995.
Updated version presented at
Workshop on performance analysis and its impact on design,
IBM Austin Research Laboratory, Austin, TX, March 1996.
[ Abstract and foils ]
- An integrated approach to architectural simulation, timing and
memory hierarchy evaluation
Workshop on performance analysis and its impact on design,
IBM Austin Research Laboratory, Austin, TX, March 1996.
[ Abstract and foils ]
- Tree-based VLIW architecture,
IBM T.J. Watson Research Center, Yorktown Heights, NY, October 1994.
[ Abstract and foils ]
The IBM Research VLIW project
K. Ebcioglu, IBM T.J. Watson Research Center
The IBM Research VLIW project has been continuing since 1986, focusing on
hardware and compiler techniques for instruction-level parallelism. We will
give an overview of various aspects of the IBM Research VLIW project,
including our VLIW hardware prototype, application of VLIW compilation
techniques to PowerPC superscalar processors, and our third-generation
parallelizing compiler. We will describe novel VLIW architectural features
and compiler techniques for achieving a high degree of instruction-level
parallelism not only in scientific code, but also in sequential-natured
code involving frequent unpredictable branches, pointers, and a large
number of data dependencies. Our VLIW compilation techniques can parallelize
multiple paths in a program directly (unlike trace scheduling), and can
generate variable initiation intervals during software pipelining of loops
with conditional branches (unlike modulo scheduling).
[ Presentation foils (pdf, 45 KB) ]
Simulation/evaluation approach for a VLIW processor
J. Moreno, M. Moudgill, K. Ebcioglu, IBM T.J. Watson Research Center
This presentation describes the approach being used for the simulation
and early evaluation of a processor architecture based on Very-Long
Instruction Word (VLIW) principles. In this architecture, a program
consists of a set of "tree-instructions", each one corresponding to a
multiway branch and multiple operations, all performed simultaneously
(as one VLIW). The representation of the tree-instructions in memory
allows their execution in implementations with varying resources, so
that the program representation is implementation-independent.
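As a rough illustration of the tree-instruction idea described above, the following Python sketch models one tree-instruction as a binary decision tree whose arcs carry operations and whose leaves name the next tree-instruction. The class names (Op, Leaf, Node), the register names, and the rule that every operation reads the register values seen on entry are assumptions made for this sketch; they are not taken from the presentation or from the actual architecture definition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

Regs = Dict[str, int]


@dataclass
class Op:
    """One primitive operation: dest = fn(values of srcs). Hypothetical format."""
    dest: str
    srcs: Tuple[str, ...]
    fn: Callable[..., int]


@dataclass
class Leaf:
    """End of a path: operations on the final arc plus the next instruction label."""
    target: str
    ops: List[Op] = field(default_factory=list)


@dataclass
class Node:
    """Internal node: operations on the arc into the node, then a two-way test."""
    test: Callable[[Regs], bool]
    taken: Union["Node", Leaf]
    not_taken: Union["Node", Leaf]
    ops: List[Op] = field(default_factory=list)


def execute(tree: Union[Node, Leaf], regs: Regs) -> Tuple[Regs, str]:
    """Execute one tree-instruction and return (new registers, next label)."""
    old = dict(regs)              # every test and operation reads entry values
    selected: List[Op] = []
    node = tree
    while isinstance(node, Node):  # resolve the multiway branch
        selected.extend(node.ops)
        node = node.taken if node.test(old) else node.not_taken
    selected.extend(node.ops)
    new = dict(old)
    for op in selected:            # only operations on the chosen path commit
        new[op.dest] = op.fn(*(old[s] for s in op.srcs))
    return new, node.target


# A two-way tree: r3 = r1 + r2 on the root arc, then one of two leaves.
tree = Node(
    test=lambda r: r["r1"] < r["r2"],
    ops=[Op("r3", ("r1", "r2"), lambda a, b: a + b)],
    taken=Leaf("L_then", [Op("r1", ("r3",), lambda a: a + 1)]),
    not_taken=Leaf("L_else", [Op("r2", ("r1",), lambda a: a * 2)]),
)

regs, nxt = execute(tree, {"r1": 1, "r2": 5, "r3": 100})
print(regs, nxt)
```

In this sketch the operation on the taken leaf reads the entry value of r3 (100), not the value written by the root operation, which is one way to model "all performed simultaneously". A narrower implementation could split the same tree into several smaller steps, which is the intuition behind an implementation-independent representation, though the sketch does not attempt that.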
The simulation/evaluation environment consists of:
- A high-level language (C, FORTRAN) compiler, which generates
tree-instructions in a VLIW assembly language.
- A VLIW assembler, which generates VLIW object code.
- A translator from VLIW assembly code into RS/6000 assembly code.
The RS/6000 code simulates the functionality of the VLIW processor
for the specific VLIW program, including instrumentation to collect
execution counts of VLIWs, VLIW profiling information, and generation
of predecoded VLIW execution traces.
- A cycle timer, integrated into the simulation environment, which
is invoked by the simulator on a VLIW by VLIW basis so that the timer
processes execution traces as they are generated.
The presentation will review the basic features of the architecture,
including those which allow the execution of the same VLIW program in
processors with different resources, some of the VLIW compilation
techniques used, trade-offs between compiler and architecture, the role
of the evaluation-simulation environment in this process, and the
requirements imposed on the different tools.
The environment is oriented towards early verification of the VLIW
architecture instead of reflecting a pre-hardware definition of a processor
implementation; however, the extensibility for such a functionality has
been taken into account in the design of the environment. Emphasis has been
placed on the development of an environment which provides reasonably fast
turn-around time from compilation to simulation, so that architecture/compiler
tradeoffs can be analyzed over complete execution runs.
[ Presentation foils (pdf, 70 KB) ]
An integrated approach to architectural simulation, timing and
memory hierarchy evaluation
E. Altman, C.B. Hall, R. Miranda, J. Moreno
Using an integrated, modular approach to simulation and performance
measurement, we have built an environment for early-stage evaluation
of new architectures that achieves a high degree of efficiency and
versatility. Our environment consists of a compiler that generates
code for an experimental architecture, and a separate translator that
maps that code to a simulation executable that is run on an IBM RISC
System/6000. The simulation executable consists of RS/6000 code that
directly emulates the original native code of the experimental
architecture, as opposed to an interpreter using the native code as
input.
Performance measurement capabilities are integrated into the simulation
executable by including a decoded form of the original native code, and
by inserting calls to a generic timer routine into the emulation code.
As the simulation executable emulates each original instruction, it
calls the timer, passing the decoded version of the instruction and an
image of the current machine state.
The timer invoked by the simulation executable consists of two parts,
a processor model and a memory model, each with a clearly defined
interface. This allows a variety of processor and memory models to
be used interchangeably, with the models differing in both the system
configuration they implement, and in the degree of detail and accuracy
involved.
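The structure described above, an emulation loop that calls a timer per instruction, with the timer split into interchangeable processor and memory models, might be sketched in Python as follows. All names here (DecodedInstr, ProcessorModel, MemoryModel, Timer, the toy cache parameters) are hypothetical and chosen only for illustration; the actual interfaces of the environment are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class DecodedInstr:
    """Hypothetical decoded instruction record handed to the timer."""
    opcode: str
    mem_addr: Optional[int] = None  # set only for memory operations


class ProcessorModel(Protocol):
    def issue_cycles(self, instr: DecodedInstr) -> int: ...


class MemoryModel(Protocol):
    def access_cycles(self, addr: int) -> int: ...


class OneCycleProcessor:
    """Coarse processor model: every instruction issues in one cycle."""
    def issue_cycles(self, instr: DecodedInstr) -> int:
        return 1


class FlatMemory:
    """Coarse memory model: a fixed latency for every access."""
    def __init__(self, latency: int) -> None:
        self.latency = latency

    def access_cycles(self, addr: int) -> int:
        return self.latency


class DirectMappedCache:
    """More detailed memory model: a toy direct-mapped cache."""
    def __init__(self, lines: int = 4, line_bytes: int = 16,
                 hit: int = 1, miss: int = 10) -> None:
        self.lines, self.line_bytes = lines, line_bytes
        self.hit, self.miss = hit, miss
        self.tags: List[Optional[int]] = [None] * lines

    def access_cycles(self, addr: int) -> int:
        line = addr // self.line_bytes
        idx = line % self.lines
        if self.tags[idx] == line:
            return self.hit
        self.tags[idx] = line       # fill the line on a miss
        return self.miss


class Timer:
    """Generic timer: delegates to whichever two models it was built with."""
    def __init__(self, proc: ProcessorModel, mem: MemoryModel) -> None:
        self.proc, self.mem, self.cycles = proc, mem, 0

    def tick(self, instr: DecodedInstr, state: Dict[str, int]) -> None:
        # 'state' stands in for the machine-state image; these toy models ignore it.
        self.cycles += self.proc.issue_cycles(instr)
        if instr.mem_addr is not None:
            self.cycles += self.mem.access_cycles(instr.mem_addr)


def run(program: List[DecodedInstr], timer: Timer) -> int:
    """Stand-in for the simulation executable: emulate, then call the timer."""
    state: Dict[str, int] = {}
    for instr in program:
        # Emulation of 'instr' on 'state' would happen here.
        timer.tick(instr, state)
    return timer.cycles


program = [
    DecodedInstr("add"),
    DecodedInstr("load", 0),
    DecodedInstr("load", 4),
    DecodedInstr("load", 64),
    DecodedInstr("add"),
]
flat_cycles = run(program, Timer(OneCycleProcessor(), FlatMemory(3)))
cache_cycles = run(program, Timer(OneCycleProcessor(), DirectMappedCache()))
print(flat_cycles, cache_cycles)
```

The same program and the same emulation loop are timed twice, once with a flat-latency memory and once with a small cache, without touching the loop itself. That is the kind of interchangeability a clearly defined model interface is meant to provide.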
In practice, our timing environment has allowed us to dispense with the
generation of traces and measure the performance of realistic workloads.
Our simulation executables without timer calls typically run only about
14 times slower than the optimized native RS/6000 code for the same
program. Using a timer that models a VLIW processor at the functional
unit level and a memory hierarchy consisting of two levels of cache and
main memory, a full timing run is slower than the simulation executable by
an additional factor of 75.
[ Presentation foils (pdf, 54 KB) ]
Tree-based VLIW architecture
J. Moreno, IBM T.J. Watson Research Center
This talk describes the features of the VLIW architecture currently
under development at IBM T.J. Watson. The architecture, based on
the concept of tree-instructions, includes properties which allow
binary compatibility across an entire family of processor
implementations, ranging from scalar to VLIW, thus recovering the
traditional separation between architecture and implementations.
[ Presentation foils (pdf, 46 KB) ]