Abstract for
Locality Analysis for Distributed Shared-Memory Multiprocessors

This paper studies the locality analysis problem for shared-memory multiprocessors, a class of parallel machines that has experienced steady and rapid growth in the past few years. The work focuses on estimating the memory performance of a loop nest for a given set of computation and data distributions, assuming a distributed shared-memory multiprocessor model. We discuss how to estimate the total number of cache misses (compulsory, conflict, and capacity misses) and the fraction of those misses that result in local versus remote memory accesses. Our goal is to use this performance estimate to guide automatic and semi-automatic selection of data distributions and loop transformations in programs written for future shared-memory multiprocessors. The paper also includes simulation results that validate our analysis method.
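To make the estimated quantities concrete, the following minimal sketch counts them by brute-force trace simulation for one processor. It is not the analysis method of this paper; it only illustrates the miss categories and the local/remote split. Everything in it is an assumption chosen for illustration: a direct-mapped cache, the conventional classification that compares against a fully associative LRU cache of the same capacity, a block data distribution, and the hypothetical names `classify_misses` and `home_of`.

```python
from collections import OrderedDict

def classify_misses(trace, num_lines, line_size, home_of, proc):
    """Classify the cache misses of processor `proc` over a list of byte addresses."""
    direct = {}            # direct-mapped cache: set index -> resident block
    lru = OrderedDict()    # fully associative LRU cache of the same capacity
    seen = set()           # blocks referenced at least once so far
    counts = dict(compulsory=0, capacity=0, conflict=0, local=0, remote=0)
    for addr in trace:
        block = addr // line_size
        first_touch = block not in seen
        seen.add(block)
        # Update the fully associative reference cache.
        fa_hit = block in lru
        if fa_hit:
            lru.move_to_end(block)
        else:
            lru[block] = True
            if len(lru) > num_lines:
                lru.popitem(last=False)   # evict the least recently used block
        # Update the direct-mapped cache being modeled.
        idx = block % num_lines
        dm_hit = direct.get(idx) == block
        direct[idx] = block
        if dm_hit:
            continue
        if first_touch:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1       # would miss even with full associativity
        else:
            counts["conflict"] += 1       # misses only because of the mapping
        # A miss is serviced by the memory of the processor that owns the block.
        counts["local" if home_of(block) == proc else "remote"] += 1
    return counts

# Assumed machine and data layout (illustrative values only).
ELEM_SIZE, LINE_SIZE, NUM_LINES = 8, 32, 4   # bytes, bytes, cache lines
N, P = 64, 2                                 # array elements, processors

def home_of(block):
    """Owner of a block under a block distribution of N elements over P processors."""
    first_elem = block * (LINE_SIZE // ELEM_SIZE)
    return first_elem // (N // P)

# Processor 0 sweeps elements 0..47 twice, so part of its data is remote.
sweep = [i * ELEM_SIZE for _ in range(2) for i in range(48)]
sweep_counts = classify_misses(sweep, NUM_LINES, LINE_SIZE, home_of, proc=0)

# Two elements whose blocks map to the same cache line are accessed alternately.
ping_pong = [0, 16 * ELEM_SIZE] * 2
ping_pong_counts = classify_misses(ping_pong, NUM_LINES, LINE_SIZE, home_of, proc=0)
```

In this sketch the sweep produces only compulsory and capacity misses, some of them remote because processor 0 reads elements owned by processor 1, while the alternating accesses produce conflict misses that a fully associative cache of the same size would avoid.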