Our VLIW compiler has the following goals:
1. Performance of the compiled code
In addition to most of the standard optimizations, our compiler is
built around a software-pipelining scheduler which is an enhanced
version of a previously described
algorithm.
The performance enhancements include:
- heuristics based on modulo-scheduling;
- infinite scheduling window.
Other optimizations are centered on exposing as much instruction-level
parallelism as possible. Memory disambiguation is a very important
issue for these purposes, for which we apply a combination of strategies:
- sophisticated address computation/disambiguation;
- cloning of loops into parallel/sequential versions;
- address-compare buffers.
We also use some loop-rewrite optimizations to expose parallelism
in loops, such as:
- loop unrolling;
- expression-tree rewriting to reduce the minimum-initiation interval
(MII);
- rewriting associative operation trees.
2. Validation
Our compiler is flexible, in the sense that changes
in the target architecture can be quickly accommodated.
A major use of our compiler is as follows:
- propose an architectural change;
- add an optimization to exploit/work around that feature;
- measure the performance impact and thus accept/reject the change.
Architectural features which have been considered include:
- address compare buffers;
- combining dependent operations;
- various flavors of prefetching;
- select (conditional) operations.
The compiler is capable of handling different processor resource models.
For example, various run-time switches allow changing the number and type
of functional units and the size of the register files, as well as
enable/disable various architectural features.
3. Intermediate Form
The dominating factor in achieving the two goals listed above is simply
implementation and debug time. This is reflected in the design
of the compiler in terms of the mechanisms used to support the desired
transformations.
DFGs
The intermediate code representation and data structures are the key
factors in the ability to quickly add optimizations and experiment with
them. We have chosen a modified version of the dependence flow graph (DFG),
a representation originally developed by Dr. Pingali's group at Cornell
University [Pingali et al., Dependence flow graphs: an algebraic
approach to program dependencies, 18th ACM Symp. on Principles of
Progr. Languages, pp.67-78, 1991].
The DFG combines both the control flow and dependence information in a
single coherent form, which provides all the dependence information needed
by most optimizations in an easy-to-analyze form. For instance, the DFG
is an executable representation, so it is well suited to analysis
techniques based on partial evaluation.
The dependence information is fully "pinned," i.e., it is in both
static-single assignment and reverse static-single assignment forms, which
simplifies code motions across blocks. This feature represents our
major change to DFGs, because we do not perform bypassing
(bypassing makes code motion too difficult).
Aliasing
Another advantage in our intermediate form is the representation of
load/store dependences. Anti/Output/Flow dependences provide a mechanism
that enables performing memory anti-aliasing once, and then represent the
result so that all other optimizations can use it.
Intervals
We use only one other global data-structure: a (pseudo-)interval
hierarchy. For reducible graphs, this is the usual interval hierarchy.
For irreducible graphs, certain irreducible portions become
pseudo-intervals (they have multiple entry points from outside the loop).
Edge Split
We maintain the following invariant: there is no control flow edge
linking a basic-block with multiple predecessors to multiple successors. If
such an edge exists in the input, we split it by introducing a dummy basic
block. This turns out to be extremely useful.
Compiler/architecture interaction in a tree-based
VLIW processor (Foils, pdf, 136 KB)
Workshop on interaction between compilers and computer architectures,
1997 High-Performance Computer Architecture Conference (HPCA97),
San Antonio, February 1997.
Implementing an experimental VLIW compiler
(Foils, pdf, 154 KB)
Workshop on computer architecture education,
1997 High-Performance Computer Architecture Conference (HPCA97),
San Antonio, February 1997.