Performance programming for scientific computing

Instructors:

Bowen Alpern and Larry Carter.

Description:

Performance programming seeks to improve performance beyond what is achieved by programming an algorithm in the most expedient manner. The goal is that each processing element be kept as busy as possible doing useful work. This entails satisfying four requirements: breaking problems into independent subproblems that can be executed concurrently, distributing these subproblems appropriately among the processing elements, making sure that the necessary data is close to its processing element, and overlapping communication with computation where possible. To attain high performance, these requirements must be satisfied whether one views "processing elements" as stages of an arithmetic or vector pipeline, functional units of a CPU, processors of a tightly coupled shared-memory multiprocessor, nodes of a distributed-memory supercomputer, or heterogeneous computers on a network. This tutorial presents general techniques for satisfying each of these requirements and illustrates their use at many different levels of application.
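As a small illustration of the third requirement (keeping the necessary data close to its processing element), the sketch below contrasts a straightforward matrix multiplication with a blocked (tiled) one. It is not taken from the course materials: the function names and the `block` parameter are hypothetical, and plain Python will not show the cache benefit. Only the loop structure is the point, which is that each small tile of the operands is reused many times before the computation moves on.

```python
def matmul_naive(a, b):
    """Textbook triple loop: each column of b is streamed through for every row of a."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    `block` is an illustrative tile size (an assumption, not a tuned value);
    in a compiled language it would be chosen so that the tiles fit in cache.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of a and c
        for k0 in range(0, m, block):      # tile of the shared dimension
            for j0 in range(0, p, block):  # tile columns of b and c
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += aik * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_naive(a, b) == matmul_blocked(a, b))  # both orderings give the same product
```

The blocked version performs exactly the same multiplications and additions as the naive one; only the order of the work changes, and that reordering is the kind of transformation performance programming relies on.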

This course will use extended examples, including two-dimensional seismic migration, protein matching, and computational linear algebra (matrix factoring, matrix multiplication, and its degenerate cases). Seismic migration is representative of certain partial differential equation problems, protein matching is a typical dynamic programming application, and linear algebra is ubiquitous. Other examples, particularly from established performance benchmarks, will be introduced to illustrate particular points. While we will survey a large number of topics and techniques, the emphasis will be on mastering conceptual structures and understanding general principles rather than on learning details.

Intended audience:

The course is intended for computational scientists, application developers, and other professionals who need to design, implement, or tune high-performance scientific programs. It should also be of interest to computer scientists who want to develop languages, compilers, operating systems, architectures, and performance monitoring and debugging tools that can better support the needs of the performance programming community.

Level of presentation:

30% beginner; 50% intermediate; 20% advanced.

Outline:

Instructors:

Bowen Alpern received a Ph.D. in Computer Science from Cornell University in 1986. He has been a Research Staff Member at IBM ever since. His research interests include performance programming, visualization of computation and architecture, theoretical models of hierarchical memory and parallelism, distributed and parallel computing, message compression, computational linear algebra, portable high performance, and high-performance Java. He has published more than twenty-five technical papers in Computer Science. He taught a graduate-level course in Performance Programming for the Computer Science Department of Columbia University in 1994.

Larry Carter is a Professor in the Computer Science and Engineering Department of the University of California at San Diego, and a Senior Fellow at the San Diego Supercomputer Center. Dr. Carter received his Ph.D. degree from the University of California at Berkeley in 1974, and worked until 1994 at IBM's T.J. Watson Research Center in the areas of probabilistic algorithms, compilers, VLSI testing, and high-performance computation. His current research interests include scientific computation, performance programming, parallel computation, and machine and system architecture for high-performance computing.

Bowen and Larry developed the matrix multiplication package initially released with the RS/6000 and helped implement the NAS benchmarks on the IBM SP. They are coauthors of a number of papers on performance programming.

History:

This material was initially presented as a short course at the Cornell Theory Center in the summer of 1994. It was taught as a graduate course in the Fall of 1994 for Columbia University at the T.J. Watson Research Center and again in the Spring of 1995 at UCSD. (UCSD will offer it again in the Spring of 1997.) It was presented as a tutorial at Supercomputing '95 and '96. It will be given as a short course at the Eighth SIAM Conference on Parallel Processing for Scientific Computing in March 1997.


