MapReduce for the Cell Broadband Engine Architecture
by M. de Kruijf and K. Sankaralingam
MapReduce is a simple and flexible parallel programming model
proposed by Google for large-scale distributed data processing. In
this paper, we present a design and prototype implementation of
MapReduce for the Cell Broadband Engine® Architecture
(CBEA). The MapReduce model provides a simple machine
abstraction that shields users from parallelization and other
distributed programming complications. The goal of this paper is to
describe the tradeoffs in designing the runtime and to demonstrate
its potential for high performance. We study the basic
characteristics of the MapReduce model and identify three types of
MapReduce applications: map-dominated, partition-dominated,
and sort-dominated. We evaluate our runtime's performance,
scalability, and efficiency for microbenchmarks representing each
of these application types as well as for complete applications. We
find that map-dominated applications map well to the CBEA and
that our prototype sustains high performance on these applications.
For partition-dominated and sort-dominated applications, we
analyze runtime performance, identify sources of inefficiency, and
propose several future enhancements to significantly improve
performance. Overall, we find that the simplicity and efficiency of
the model make it an attractive tool for programming Cell
Broadband Engine processor-based platforms.
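To make the three application types concrete, the sketch below runs the MapReduce phases in sequence: map, partition, sort, and reduce. It is a minimal sequential illustration under stated assumptions, not the CBEA runtime described in this paper, and it models neither parallel execution nor Cell memory management. The function names `map_reduce`, `wc_map`, and `wc_reduce`, the `num_partitions` parameter, and the hash-based partitioning scheme are all hypothetical. Roughly speaking, an application whose `map_fn` is expensive but emits few pairs spends most of its time in the map loop. An application that emits many intermediate pairs shifts more of its work into partitioning and sorting.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(
    inputs: Iterable[Any],
    map_fn: Callable[[Any], Iterable[Pair]],
    reduce_fn: Callable[[Any, List[Any]], Any],
    num_partitions: int = 4,  # hypothetical parameter, not from the paper
) -> Dict[Any, Any]:
    """Sequential sketch of the map, partition, sort, and reduce phases."""
    # Map + partition: each emitted (key, value) pair is routed to a bucket
    # chosen by hashing its key, so all pairs with equal keys share a bucket.
    buckets: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for item in inputs:
        for key, value in map_fn(item):
            buckets[hash(key) % num_partitions].append((key, value))

    results: Dict[Any, Any] = {}
    for bucket in buckets:
        # Sort: order each bucket by key so that equal keys become adjacent.
        bucket.sort(key=itemgetter(0))
        # Reduce: fold every run of equal keys into a single result.
        for key, group in groupby(bucket, key=itemgetter(0)):
            results[key] = reduce_fn(key, [value for _, value in group])
    return results


# Hypothetical word-count application, used only to exercise the sketch.
def wc_map(line: str) -> Iterator[Pair]:
    for word in line.split():
        yield (word, 1)


def wc_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"], wc_map, wc_reduce))
```

In this sketch, keys must be both hashable (for partitioning) and mutually orderable (for sorting). Word count satisfies both requirements because its keys are strings.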