|
|
 |
|
 |
Volume 37, Number 2, 1998
San Francisco Frameworks |
|
Table of contents: HTML ASCII |
|
This article: HTML ASCII |
Copyright info |
 |
 |
 |
 |
| |
|
Profile-directed restructuring of operating system code - References |
 |
by W. J. Schmidt,
R. R. Roediger,
C. S. Mestad,
B. Mendelson,
I. Shavit-Lottem |
 |
 |
 |
Cited references and notes
- K. Roland and A. Dollas, "Predicting and Precluding
Problems with Memory Latency," IEEE Micro 14, No. 4,
59-67 (1994).
- L. A. Belady, "A Study of Replacement Algorithms for
a Virtual-Storage Computer," IBM Systems Journal 5,
No. 2, 78-101 (1966).
- P. J. Denning, "The Working Set Model for Programming
Behavior," Communications of the ACM 11, No. 5,
323-333 (1968).
- P. J. Denning, "Virtual Memory," Computing
Surveys 2, No. 3, 153-189 (September 1970).
- C. V. Ramamoorthy, "The Analytic Design of a
Dynamic Look Ahead and Program Segmenting System for Multiprogrammed
Computers," Proceedings of the ACM National Conference,
ACM Pub. P-66, ACM, New York (1966), pp. 229-239.
- T. C. Lowe, "Automatic Segmentation of Cyclic Program
Structures Based on Connectivity and Processor Timing,"
Communications of the ACM 13, No. 1, 3-9 (January
1970).
- E. W. Ver Hoef, "Automatic Program Segmentation Based
on Boolean Connectivity," Proceedings of AFIPS 1971 SJCC,
AFIPS Press, Montvale, NJ (1971), pp. 491-495.
- J.-L. Baer and R. Caughey, "Segmentation and
Optimization of Programs from Cyclic Structure Analysis," Proceedings
of AFIPS 1972 SJCC, AFIPS Press, Montvale, NJ (1972),
pp. 23-36.
- R. Snyder, "On the Application of a priori
Knowledge of Program Structure to the Performance of Virtual Memory
Computer Systems," Ph.D. thesis, University of Washington, Seattle,
WA 98195 (November 1978).
- D. J. Hatfield and J. Gerald, "Program Restructuring
for Virtual Memory," IBM Systems Journal 3, 168-192
(1971).
- D. Ferrari, "Improving Locality by Critical Working
Sets," Communications of the ACM 17, No. 11, 614-620
(November 1974).
- S. J. Hartley, "Compile-Time Program Restructuring
in Multiprogrammed Virtual Memory Systems," IEEE Transactions on
Software Engineering 14, No. 11, 1640-1644
(November 1988).
- Y. Wu, "Ordering Functions for Improving Memory
Reference Locality in a Shared Memory Multiprocessor System,"
Proceedings of the 25th International Symposium on
Microarchitecture, Portland, OR (December 1992),
pp. 218-221.
- A. J. Smith, "Cache Memories," Computing
Surveys 14, No. 3, 473-530 (1982).
- S. McFarling, "Program Optimization for Instruction
Caches," Proceedings of the Third International Conference on
Architectural Support for Programming Languages and Operating
Systems, Boston, MA (April 1989), pp. 183-191.
- A. Mendelson, S. S. Pinter, and R. Shtokhamer,
"Compile Time Instruction Cache Optimizations," ACM Computer
Architecture News 22, No. 1, 44-51 (March 1994).
- W.-M. Hwu and P. P. Chang, "Achieving High
Instruction Cache Performance with an Optimizing Compiler,"
Proceedings of the 16th Annual International Symposium on Computer
Architecture, Jerusalem, Israel (May-June 1989),
pp. 242-251.
- K. Pettis and R. C. Hansen, "Profile Guided Code
Positioning," Proceedings of the ACM SIGPLAN Conference on
Programming Language Design and Implementation, White Plains, NY
(June 1990), pp. 16-27.
- R. Gupta and C.-H. Chi, "Improving Instruction Cache
Behavior by Reducing Cache Pollution," Proceedings of
Supercomputing '90, New York (November 1990), pp. 82-91.
- J. Ferrante, K. J. Ottenstein, and J. D. Warren,
"The Program Dependence Graph and Its Use in Optimization,"
ACM Transactions on Programming Languages and Systems 9,
No. 3, 319-349 (July 1987).
- R. R. Heisch, "FDPR for AIX Executables,"
AIXpert, No. 4 (August 1994), pp. 16-20.
- R. R. Heisch,
"Trace-Directed Program Restructuring for AIX Executables,"
IBM Journal of Research and
Development 38, No. 5, 595-603 (September 1994).
- I. Nahshon and D. Bernstein, "FDPR--A Post-Pass
Object Code Optimization Tool," Proceedings of the Poster Session of
CC '96--International Conference on Compiler Construction, Sweden
(April 1996), pp. 97-104.
- S. E. Speer, R. Kumar, and C. Partridge, "Improving
UNIX Kernel Performance Using Profile Based Optimization," 1994
Winter USENIX, San Francisco, CA (January 1994),
pp. 181-188.
- J. B. Chen and B. N. Bershad, "The Impact of
Operating System Structure on Memory System Performance,"
Proceedings of the Fourteenth ACM Symposium on Operating Systems
Principles, Asheville, NC (December 1993), pp. 120-133.
- F. G. Soltis, Inside the AS/400, Duke Press,
Loveland, CO (1996).
- M. H. Lipasti, IBM Rochester, MN, personal
communication (August 1995).
- Transaction Processing Performance Council, "TPC
Benchmark C: Standard Specification," Revision 3.2 (August 1996).
Available at
http://www.tpc.org/cspec.html.
- Note that the data may not be completely valid,
since changing the callers or callees for a procedure can change the
behavior of the procedure; in practice, however, treating the data as
valid gives good results.
- D. E. Knuth, The Art of Computer Programming,
Volume 1: Fundamental Algorithms, second edition, section 2.3.4.1,
Addison-Wesley Publishing Co., Reading, MA (1973).
- T. Ball and J. R. Larus, "Optimally Profiling and
Tracing Programs," ACM Transactions on Programming Languages
and Systems 16, No. 4, 1319-1360 (July 1994).
- A. Goldberg, "Reducing Overhead in Counter-Based
Execution Profiling," Technical Report CSL-TR-91-495, Computer Systems
Laboratory, Stanford University, Stanford, CA (October 1991).
- A. D. Samples, Profile-Driven Compilation,
Ph.D. thesis (Rep. UCB/CSD 91/627), Computer Science Department,
University of California, Berkeley, CA (April 1991).
- M. R. Garey and D. S. Johnson, Computers
and Intractability: A Guide to the Theory of NP-Completeness,
W. H. Freeman and Company, New York (1979).
- G. Kirchhoff, Annalen der Physik und
Chemie 72, 497-508 (1847).
- In some cases, this transformation interacts badly
with the instruction prefetching hardware scheme. The actual predicate
we use to determine whether to restart a trace with an isolated block is
therefore more complicated than described here.
- W. Berg, M. Cline, and M. Girou, "Lessons Learned
from the OS/400 OO Project," Communications of the ACM 38,
No. 10, 54-64 (October 1995).
- This same criticism might be leveled at our decision
to only consider intramodular arcs when determining procedure order
within modules. This was a pragmatic decision based on the availability
of intramodular call flow data within the MCA for the module being
translated. Alternatively, the full SLIC call graph could be analyzed
for each module, at the cost of additional compile time and a
shared-resource bottleneck (the SLIC call graph file) during
distributed compiles.
- For more detailed information on AS/400 internals,
see [Reference 26].
- This experiment was performed on a very early
pre-release version of OS/400 Version 3 Release 7, at which time fewer
critical modules were eligible to be compiled through the slicox.
The improvement to TPC-C was roughly 4 percent, and the improvements to
Program Model were between 5 percent and 12 percent in various
configurations. The numbers in
Table 7
should be analyzed in light of these lower performance figures.
|
 |
|
|