Cell Broadband Engine processor performance optimization: Tracing tools implementation and use
by M. Biberstein,
S. Dori-Hacohen,
Y. Harel,
A. Heilper,
B. Mendelson,
U. Shvadron,
E. Treister,
J. Turek,
and M. S. Chang
Optimizing performance on multicore processors is a daunting task
because of the increased importance of such factors as thread
communication, memory contention, and memory access latency.
This paper presents two tools that programmers and performance
analysts can use to understand application performance on the Cell
Broadband Engine® (Cell/B.E.) processor: the Performance
Debugging Tool (PDT) and the Trace Analyzer (TA). PDT traces
user-space events, augmenting them with scheduling data from the
operating system; those traces are then read, analyzed, and
presented visually by the TA. This paper describes the
implementation issues that arise because a common low-overhead
clock shared by all cores, essential for analysis and
visualization, is not available on the Cell/B.E. processor. The TA
employs an offline analysis to align the collected events to a
common time based only on thread-local timestamps, event order,
and context-switch information. We also discuss the overhead of
tracing and its impact on execution and performance analysis. We
illustrate the use of the PDT and TA by analyzing several
significant Cell/B.E. processor workloads, including native code
and higher-level abstractions offered by the Data Communication
and Synchronization services. We show how trace analysis can help
identify performance issues in these workloads and how it can be
used by programmers to spot performance antipatterns (common
programming practices leading to suboptimal performance).
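
As an illustration of the alignment problem described above, the following minimal sketch derives per-thread clock offsets from thread-local timestamps and known cross-thread event order alone. It is not the TA's actual algorithm, which this abstract does not specify; the function name `align_offsets`, the input format, and the constraint-relaxation approach are assumptions made for illustration, and context-switch information is ignored.

```python
def align_offsets(threads, happens_before):
    """Return per-thread offsets that respect every known event ordering.

    Hypothetical helper, not part of PDT or TA.
    threads: iterable of thread ids.
    happens_before: list of ((thread_a, ts_a), (thread_b, ts_b)) pairs, each
    meaning the event at local time ts_a on thread_a is known to precede the
    event at local time ts_b on thread_b (for example, a send and its receive).
    """
    offsets = {t: 0 for t in threads}
    # Relax the constraints ts_a + off_a <= ts_b + off_b repeatedly
    # (Bellman-Ford style). Offsets only grow, so a consistent set of
    # constraints settles within len(offsets) passes.
    for _ in range(max(1, len(offsets))):
        changed = False
        for (ta, tsa), (tb, tsb) in happens_before:
            needed = tsa + offsets[ta] - tsb
            if offsets[tb] < needed:
                offsets[tb] = needed
                changed = True
        if not changed:
            return offsets
    raise ValueError("ordering constraints are inconsistent")


# Thread A sends at local time 100 and thread B receives at local time 40;
# B later sends at local time 50 and A receives at local time 120.
example = align_offsets(
    ["A", "B"],
    [(("A", 100), ("B", 40)), (("B", 50), ("A", 120))],
)
```

Adding each thread's offset to its local timestamps gives one possible common timeline; the constraints usually admit many others, so a real tool would need further information, such as scheduling data, to narrow the choice.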