Research

Scalable Parallel Systems

Our work in Scalable Parallel Systems focuses on the technologies needed for the design and effective use of scalable systems such as the IBM PowerParallel SP2 or workstation clusters. The effort was established in 1986 to develop the software support for the Vulcan massively parallel computer prototype. The technology that we developed provided much of the foundation for the SP product line: communication software, parallel I/O, tools, libraries, application codes, etc.

Our current research activity is focused in the following areas:

Scalable computer architectures. What communication models will future architectures support for parallel processes? Our past work has focused on message passing. After contributing to the design and development of the MPL message passing library of native SP communication commands, we contributed to the design of the de facto industry standard MPI (Message Passing Interface) and developed MPI-F, a complete, high-performance implementation of MPI on the SP2. This technology has been incorporated into the SP parallel environment product, which supports both MPI and MPL. We are involved in the MPI2 forum, which is considering extensions to MPI such as Remote Memory Access (put/get). Our architecture work focuses on supporting and exploiting communication primitives that let us support shared-memory programming models efficiently, without the scalability problems of conventional shared-memory support.

Scalable system services. How does one build, atop a distributed operating system, the global system services needed to support tightly coupled parallel applications well? Our past work has focused on parallel I/O. The Vesta parallel file system prototype that was developed in Research provided much of the technology for the PIOFS parallel I/O file system product. We are currently involved in the design and implementation of an MPI-IO Portable Parallel I/O Library. Other ongoing parallel system activities are in the areas of parallel job scheduling and dynamic binding to parallel objects.

Programming environments and tools. The DRMS Distributed Resource Management System allows users dynamic control of their parallel run-time environments, and the UTE Unified Trace Environment is a powerful, versatile tool for understanding parallel program behavior.

Parallel benchmarks and applications. By paying attention to parallel computation as well as parallel I/O, we've created high-performance algorithms and applications in a range of disciplines: molecular dynamics, seismic processing, acoustical modeling, finite element methods, computational fluid dynamics, meteorological modeling, and data mining. We also contributed significantly to the development of record-setting NAS Parallel Benchmarks and Linpack TPP benchmarks for the SP2.


More information about our work and some of our people:
Marc Snir, IBM T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598; snir@watson.ibm.com, phone: 914-945-3204, fax: 914-945-4425.
George Almasi (galmasi@watson.ibm.com) maintains this page. 12 February 1996.