
Spidercast

Spidercast is an ultra-scalable group communication technology: a fully distributed membership and messaging infrastructure built on peer-to-peer and overlay networking technologies.

This infrastructure provides a set of clustering services aimed at resiliency-aware applications, fault tolerance, monitoring, load balancing, resource management, and distributed resource location and discovery. The goal was to support extreme scalability, with a target of one million Linux nodes. We achieved this by combining scalable P2P techniques, partitioning, and hierarchy.
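To make the role of partitioning and hierarchy more concrete, here is a minimal sketch of why combining them keeps each node's membership view small. It is not Spidercast's actual design or API: the hash-based partitioning, the "lowest ID becomes delegate" rule, and all names (`partition_of`, `build_hierarchy`, `view_size`) are assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict


def partition_of(node_id: str, num_partitions: int) -> int:
    """Map a node ID to a partition with a stable hash (illustrative choice)."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_hierarchy(node_ids, num_partitions):
    """Group nodes into partitions and pick one delegate per partition.

    The delegate rule (lexicographically smallest ID) is a hypothetical
    stand-in for whatever election a real system would use.
    """
    partitions = defaultdict(list)
    for node_id in node_ids:
        partitions[partition_of(node_id, num_partitions)].append(node_id)
    delegates = {p: min(members) for p, members in partitions.items()}
    return dict(partitions), delegates


def view_size(node_id, partitions, delegates, num_partitions):
    """Count the peers a node must track under this two-level scheme."""
    p = partition_of(node_id, num_partitions)
    size = len(partitions[p]) - 1          # peers in the node's own partition
    if delegates[p] == node_id:
        size += len(delegates) - 1         # delegates also track other delegates
    return size


nodes = [f"node-{i}" for i in range(10_000)]
partitions, delegates = build_hierarchy(nodes, 100)
largest_view = max(view_size(n, partitions, delegates, 100) for n in nodes)
```

In a flat scheme every node would track all other nodes; in this sketch an ordinary node tracks only its own partition, and a delegate additionally tracks the other delegates, so `largest_view` stays far below the total node count.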

The infrastructure is elastic and works well from small to large scale, enabling a "grow as you go" model. It is a lightweight, simple-to-use library implemented in C/C++, available for dynamic or static linking. The implementation was tested on thousands of nodes.

This project represents an evolution and refinement of the P2P middleware technologies we contributed to IBM's software group products.

Spidercast was developed as part of the HPC Colony-II project.

The goal of the HPC Colony-II project is to enhance the OS of HPC systems with scalable services and interfaces that permit easy application porting and general-purpose clustered computing.

The HPC Colony-II project is a joint research effort with three partners:

  • The Oak Ridge National Laboratory (PI: Terry Jones); Focus areas: Coordinated scheduling, reducing daemon noise
  • University of Illinois at Urbana-Champaign (PI: Laxmikant Kalé); Focus area: Charm++, an object-oriented, portable, parallel language
  • IBM T.J. Watson Research Center and Haifa Research Lab (Yoav Tock and Benny Mandler); Focus areas: Membership, messaging and monitoring infrastructure

Funding for the project was provided by a grant from the U.S. Department of Energy Office of Science.
