IBM Skip to main content
  Home     Products & services     Support & downloads     My account  
  Select a country  
Journals Home  
  Systems Journal  
  ·  Current Issue  
  ·  Recent Issues  
  ·  Papers in Progress  
  ·  Search/Index  
  ·  Orders  
  ·  Description  
  ·  Author's Guide  
Journal of Research
and Development
  Staff  
  Contact Us  
Systems Journal  
Volume 37, Number 2, 1998
San Francisco Frameworks
 Table of contents: arrowHTML arrowASCII   This article: arrowHTML arrowASCII
arrowCopyright info
   

Profile-directed restructuring of operating system code - References

by W. J. Schmidt, R. R. Roediger, C. S. Mestad, B. Mendelson, I. Shavit-Lottem

Cited references and notes

  1. K. Roland and A. Dollas, "Predicting and Precluding Problems with Memory Latency," IEEE Micro 14, No. 4, 59-67 (1994).
  2. L. A. Belady, "A Study of Replacement Algorithms for a Virtual-Storage Computer," IBM Systems Journal 5, No. 2, 78-101 (1966).
  3. P. J. Denning, "The Working Set Model for Programming Behavior," Communications of the ACM 11, No. 5, 323-333 (1968).
  4. P. J. Denning, "Virtual Memory," Computing Surveys 2, No. 3, 153-189 (September 1970).
  5. C. V. Ramamoorthy, "The Analytic Design of a Dynamic Look Ahead and Program Segmenting System for Multiprogrammed Computers," Proceedings of the ACM National Conference, ACM Pub. P-66, ACM, New York (1966), pp. 229-239.
  6. T. C. Lowe, "Automatic Segmentation of Cyclic Program Structures Based on Connectivity and Processor Timing," Communications of the ACM 13, No. 1, 3-9 (January 1970).
  7. E. W. Ver Hoef, "Automatic Program Segmentation Based on Boolean Connectivity," Proceedings of AFIPS 1971 SJCC, AFIPS Press, Montvale, NJ (1971), pp. 491-495.
  8. J.-L. Baer and R. Caughey, "Segmentation and Optimization of Programs from Cyclic Structure Analysis," Proceedings of AFIPS 1972 SJCC, AFIPS Press, Montvale, NJ (1972), pp. 23-36.
  9. R. Snyder, "On the Application of a priori Knowledge of Program Structure to the Performance of Virtual Memory Computer Systems," Ph.D. thesis, University of Washington, Seattle, WA 98195 (November 1978).
  10. D. J. Hatfield and J. Gerald, "Program Restructuring for Virtual Memory," IBM Systems Journal 3, 168-192 (1971).
  11. D. Ferrari, "Improving Locality by Critical Working Sets," Communications of the ACM 17, No. 11, 614-620 (November 1974).
  12. S. J. Hartley, "Compile-Time Program Restructuring in Multiprogrammed Virtual Memory Systems," IEEE Transactions on Software Engineering 14, No. 11, 1640-1644 (November 1988).
  13. Y. Wu, "Ordering Functions for Improving Memory Reference Locality in a Shared Memory Multiprocessor System," Proceedings of the 25th International Symposium on Microarchitecture, Portland, OR (December 1992), pp. 218-221.
  14. A. J. Smith, "Cache Memories," Computing Surveys 14, No. 3, 473-530 (1982).
  15. S. McFarling, "Program Optimization for Instruction Caches," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA (April 1989), pp. 183-191.
  16. A. Mendelson, S. S. Pinter, and R. Shtokhamer, "Compile Time Instruction Cache Optimizations," ACM Computer Architecture News 22, No. 1, 44-51 (March 1994).
  17. W.-M. Hwu and P. P. Chang, "Achieving High Instruction Cache Performance with an Optimizing Compiler," Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel (May-June 1989), pp. 242-251.
  18. K. Pettis and R. C. Hansen, "Profile Guided Code Positioning," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, White Plains, NY (June 1990), pp. 16-27.
  19. R. Gupta and C.-H. Chi, "Improving Instruction Cache Behavior by Reducing Cache Pollution," Proceedings of Supercomputing '90, New York (November 1990), pp. 82-91.
  20. J. Ferrante, K. J. Ottenstein, and J. D. Warren, "The Program Dependence Graph and Its Use in Optimization," ACM Transactions on Programming Languages and Systems 9, No. 3, 319-349 (July 1987).
  21. R. R. Heisch, "FDPR for AIX Executables," AIXpert, No. 4 (August 1994), pp. 16-20.
  22. R. R. Heisch, "Trace-Directed Program Restructuring for AIX Executables," IBM Journal of Research and Development 38, No. 5, 595-603 (September 1994).
  23. I. Nahshon and D. Bernstein, "FDPR--A Post-Pass Object Code Optimization Tool," Proceedings of the Poster Session of CC '96--International Conference on Compiler Construction, Sweden (April 1996), pp. 97-104.
  24. S. E. Speer, R. Kumar, and C. Partridge, "Improving UNIX Kernel Performance Using Profile Based Optimization," 1994 Winter USENIX, San Francisco, CA (January 1994), pp. 181-188.
  25. J. B. Chen and B. N. Bershad, "The Impact of Operating System Structure on Memory System Performance," Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, Asheville, NC (December 1993), pp. 120-133.
  26. F. G. Soltis, Inside the AS/400, Duke Press, Loveland, CO (1996).
  27. M. H. Lipasti, IBM Rochester, MN, personal communication (August 1995).
  28. Transaction Processing Performance Council, "TPC Benchmark C: Standard Specification," Revision 3.2 (August 1996). Available at http://www.tpc.org/cspec.html.
  29. Note that the data may not be completely valid, since changing the callers or callees for a procedure can change the behavior of the procedure; in practice, however, treating the data as valid gives good results.
  30. D. E. Knuth, The Art of Computer Programming, Volume 1: Fundamental Algorithms, second edition, section 2.3.4.1, Addison-Wesley Publishing Co., Reading, MA (1973).
  31. T. Ball and J. R. Larus, "Optimally Profiling and Tracing Programs," ACM Transactions on Programming Languages and Systems 16, No. 4, 1319-1360 (July 1994).
  32. A. Goldberg, "Reducing Overhead in Counter-Based Execution Profiling," Technical Report CSL-TR-91-495, Computer Systems Laboratory, Stanford University, Stanford, CA (October 1991).
  33. A. D. Samples, Profile-Driven Compilation, Ph.D. thesis (Rep. UCB/CSD 91/627), Computer Science Department, University of California, Berkeley, CA (April 1991).
  34. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, New York (1979).
  35. G. Kirchhoff, Annalen der Physik und Chemie 72, 497-508 (1847).
  36. In some cases, this transformation interacts badly with the instruction prefetching hardware scheme. The actual predicate we use to determine whether to restart a trace with an isolated block is therefore more complicated than described here.
  37. W. Berg, M. Cline, and M. Girou, "Lessons Learned from the OS/400 OO Project," Communications of the ACM 38, No. 10, 54-64 (October 1995).
  38. This same criticism might be leveled at our decision to only consider intramodular arcs when determining procedure order within modules. This was a pragmatic decision based on the availability of intramodular call flow data within the MCA for the module being translated. Alternatively, the full SLIC call graph could be analyzed for each module, at the cost of additional compile time and a shared-resource bottleneck (the SLIC call graph file) during distributed compiles.
  39. For more detailed information on AS/400 internals, see [Reference 26].
  40. This experiment was performed on a very early pre-release version of OS/400 Version 3 Release 7, at which time fewer critical modules were eligible to be compiled through the slicox. The improvement to TPC-C was roughly 4 percent, and the improvements to Program Model were between 5 percent and 12 percent in various configurations. The numbers in Table 7 should be analyzed in light of these lower performance figures.