LeProf, a source-level profiling tool

LeProf and its companion post-processing tool, LeProft, profile programs and report on their behavior. LeProf instruments the program dynamically, so it captures the effects of both user code and shared libraries.

LeProf gathers extensive data about the instructions executed by the program, the function call behavior, the behavior of branches, and the data cache behavior (i.e., cache misses). Data is gathered at the following granularities:

  • for the entire program;
  • per function;
  • per line of source code (if the program has been compiled using the option "-g");
  • per PowerPC instruction.
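As a rough illustration of what gathering data at these granularities involves, the sketch below counts executed instructions for the whole program, per function, per source line, and per instruction address. The `TraceRecord` layout and the `aggregate` function are hypothetical names invented for this example; they do not describe LeProf's actual trace or output format.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class TraceRecord(NamedTuple):
    """One executed instruction (hypothetical layout, not LeProf's format)."""
    address: int           # instruction address
    function: str          # name of the enclosing function
    line: Optional[int]    # source line, or None when built without -g


def aggregate(trace: Iterable[TraceRecord]):
    """Count executed instructions at each of the four granularities."""
    total = 0
    per_function = Counter()
    per_line = Counter()
    per_instruction = Counter()
    for rec in trace:
        total += 1
        per_function[rec.function] += 1
        per_instruction[rec.address] += 1
        # Line-level data exists only when debug information is present.
        if rec.line is not None:
            per_line[(rec.function, rec.line)] += 1
    return total, per_function, per_line, per_instruction


# A tiny made-up trace: two instructions in main, three in work.
trace = [
    TraceRecord(0x100, "main", 3),
    TraceRecord(0x104, "main", 3),
    TraceRecord(0x200, "work", 10),
    TraceRecord(0x204, "work", 11),
    TraceRecord(0x200, "work", 10),
]
total, per_function, per_line, per_instruction = aggregate(trace)
print(total, dict(per_function))
```

Keeping one counter per granularity means a single pass over the trace is enough; the line-level counter simply stays empty for programs compiled without -g.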

In addition, the tools generate the function-call graph, weighted by how often each function call is made.
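A weighted call graph of this kind can be sketched as a map from (caller, callee) pairs to call counts. The event format below, a sequence of ("call", name) and ("return", name) tuples, and the `build_call_graph` name are assumptions made for illustration only; how LeProf represents calls internally is not described here.

```python
from collections import Counter


def build_call_graph(events):
    """Return a Counter mapping (caller, callee) to the number of calls.

    `events` is a sequence of ("call", name) / ("return", name) tuples,
    a hypothetical format chosen for this sketch.
    """
    edges = Counter()
    stack = ["<root>"]          # placeholder caller for the entry point
    for kind, name in events:
        if kind == "call":
            edges[(stack[-1], name)] += 1
            stack.append(name)
        elif kind == "return" and len(stack) > 1:
            stack.pop()
    return edges


# Made-up event stream: main calls parse once and work twice;
# each invocation of work calls parse once.
events = [
    ("call", "main"),
    ("call", "parse"), ("return", "parse"),
    ("call", "work"), ("call", "parse"), ("return", "parse"), ("return", "work"),
    ("call", "work"), ("call", "parse"), ("return", "parse"), ("return", "work"),
    ("return", "main"),
]
graph = build_call_graph(events)
for (caller, callee), count in sorted(graph.items()):
    print(caller, "->", callee, count)
```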

The tracing/profiling mechanism relies on Aria, a tool for dynamic instrumentation of programs. Aria instruments the program dynamically and generates an execution trace, which is used as input to a trace analyzer. The trace analyzer collects data about the execution of the program and writes the results to a file. This file can be used as is to extract information about the program, or it can be passed to LeProft to generate summary results at the source code line level. The latter assumes that the program has been compiled with the -g option; note that the xlc family of compilers permits the use of -g with various levels of optimization, including -O2 and -O3.

For example, finding the "hot spots" in the program foo.c is achieved as follows:

  • compile the program using the -g flag (required to gather data at the source code line level)

     xlc -O2 -g -o foo foo.c

  • execute the program under LeProf

     leprof -o foo.stats foo inputs

  • find the source code line from which the most instructions are executed

     leproft total foo.stats

The tool gathers information regarding the data cache behavior of a program by simulating the directory of a data cache memory. Currently, the following configurations are available; the configuration is selected with a run-time option:

     603, 603e, 604:   16k, 4-way, 32-byte line
     604e:             32k, 4-way, 32-byte line
     POWER, RIOS:      64k, 4-way, 128-byte line
     P2SC:             128k, 4-way, 128-byte line
     POWER2, RIOS2:    256k, 4-way, 256-byte line .....
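Simulating only the directory means tracking which line tags are resident in each set, without storing any data. The sketch below shows one way to do that for a set-associative cache. The `CacheDirectory` class is a hypothetical name, LRU replacement is an assumption (the replacement policy used by the tool is not stated here), and "k" is taken to mean 1024 bytes.

```python
from collections import OrderedDict


class CacheDirectory:
    """Tag directory of a set-associative cache; no data is stored.

    LRU replacement is an assumption of this sketch.
    """

    def __init__(self, size_bytes, ways, line_bytes):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered map of resident tags per set, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one data access; return True on a hit."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        tags = self.sets[index]
        if tag in tags:
            tags.move_to_end(tag)        # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(tags) >= self.ways:
            tags.popitem(last=False)     # evict the least recently used tag
        tags[tag] = None
        return False


# Geometry taken from the 604e entry: 32k, 4-way, 32-byte line.
cache = CacheDirectory(32 * 1024, 4, 32)
for address in range(0, 1024, 4):        # sequential 4-byte accesses over 1 KB
    cache.access(address)
print(cache.num_sets, cache.hits, cache.misses)
```

With this geometry the directory has 256 sets, and the sequential sweep misses once per 32-byte line and hits on the remaining accesses to that line.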

See the Publications and Presentations for further information regarding LeProf and LeProft.