The reverse-acceleration model for programming petascale hybrid systems
by S. Pakin, M. Lang, and D. J. Kerbyson
Current technology trends favor hybrid architectures, typically
with each node in a cluster containing both general-purpose and
specialized accelerator processors. The typical model for
programming such systems is host-centric: The general-purpose
processor orchestrates the computation, offloading performance-critical
work to the accelerator, and data are communicated only
among general-purpose processors. In this paper, we propose a
radically different hybrid-programming approach, which we call
the reverse-acceleration model. In this model, the accelerators
orchestrate the computation, offloading work that cannot be
accelerated to the general-purpose processors. Data are
communicated among accelerators, not among general-purpose
processors. Our thesis is that the reverse-acceleration model
simplifies porting codes to hybrid systems and facilitates
performance optimization. We present a case study of a legacy
neutron-transport code that we modified to use reverse acceleration
and ran across the full 122,400 cores (general-purpose plus
accelerator) of the Los Alamos National Laboratory Roadrunner
supercomputer. Results indicate a substantial performance
improvement over the unaccelerated version of the code.
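
The inversion of control described above can be illustrated with a minimal sketch. It is not the code used on Roadrunner: threads stand in for processors, and every name (`HostService`, `read_input`, `accelerator_main`, `run`) is hypothetical. The sketch assumes only what the abstract states: the accelerator owns the control flow, and the general-purpose processor acts as a service for work that cannot be accelerated.

```python
from concurrent.futures import ThreadPoolExecutor


class HostService:
    """Hypothetical stand-in for a general-purpose processor.

    It does not drive the computation; it only executes requests
    that the accelerator hands to it.
    """

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def offload(self, fn, *args):
        # Returns a future so the accelerator could overlap its own work.
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown()


def read_input(rank, n):
    # Assumed example of non-acceleratable work (e.g., I/O-like setup).
    return [rank * n + i for i in range(n)]


def accelerator_main(host, rank, n):
    # The accelerator orchestrates: it requests host work when needed...
    data = host.offload(read_input, rank, n).result()
    # ...and keeps the compute kernel for itself.
    return sum(x * x for x in data)


def run(n_accels, n):
    host = HostService()
    try:
        # Partial results are combined without passing through host code,
        # a stand-in for accelerator-to-accelerator communication.
        partials = [accelerator_main(host, r, n) for r in range(n_accels)]
        return sum(partials)
    finally:
        host.shutdown()
```

In a host-centric version of the same sketch, `run` would live on the general-purpose side and call into the accelerator for the kernel; here the call direction is reversed.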