Blue Gene/L Job Management

Blue Gene/L is a new member of the IBM Blue Gene family of supercomputers. It is designed jointly by IBM and Lawrence Livermore National Laboratory (LLNL) of the National Nuclear Security Administration (NNSA). Blue Gene/L will be at least 15 times faster (performing 200 trillion operations per second), 15 times more power efficient, and will require about 50 times less space per computation than today's fastest supercomputers. This is mainly due to innovation in hardware design. Blue Gene/L is also part of IBM's research in "autonomic computing", an initiative to design computer systems that are self-healing, self-managing and self-configuring.

The BG/L computer is a tightly coupled machine composed of 2^16 (65,536) compute nodes. The compute nodes are interconnected in a physical 64 x 32 x 32 three-dimensional torus. The main interconnection network allows fast message-passing between compute nodes. Several auxiliary interconnect networks are dedicated to I/O and to fast global operations (e.g., MPI reductions).
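
As a small illustration of the torus addressing described above, the following C sketch computes the six nearest neighbors of a node in a 64 x 32 x 32 torus, with wraparound in every dimension. The coordinate scheme is assumed for illustration only and is not the actual BG/L node numbering.

#include <stdio.h>

/* Torus dimensions as described above (64 x 32 x 32 = 65,536 nodes). */
#define DIM_X 64
#define DIM_Y 32
#define DIM_Z 32

/* A compute node identified by its (x, y, z) coordinates in the torus. */
typedef struct { int x, y, z; } node_t;

/* Wraparound arithmetic: each dimension of a torus is a ring. */
static int wrap(int v, int dim) { return (v + dim) % dim; }

/* Fill 'out' with the six nearest neighbors of node n (+/-1 along each axis). */
static void torus_neighbors(node_t n, node_t out[6]) {
    out[0] = (node_t){ wrap(n.x + 1, DIM_X), n.y, n.z };
    out[1] = (node_t){ wrap(n.x - 1, DIM_X), n.y, n.z };
    out[2] = (node_t){ n.x, wrap(n.y + 1, DIM_Y), n.z };
    out[3] = (node_t){ n.x, wrap(n.y - 1, DIM_Y), n.z };
    out[4] = (node_t){ n.x, n.y, wrap(n.z + 1, DIM_Z) };
    out[5] = (node_t){ n.x, n.y, wrap(n.z - 1, DIM_Z) };
}

int main(void) {
    node_t corner = { 63, 0, 31 };   /* a node on the "edge" of the grid */
    node_t nbrs[6];
    torus_neighbors(corner, nbrs);
    for (int i = 0; i < 6; i++)      /* wraparound links make it a torus */
        printf("(%d, %d, %d)\n", nbrs[i].x, nbrs[i].y, nbrs[i].z);
    return 0;
}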

Each compute node executes at most a single process at a time, without interrupts or context switches. Special I/O nodes serve as bridges between the compute nodes and the outside world (file systems, network). I/O (and other complex) operations are shipped from the compute nodes to the I/O nodes for execution. This has no impact on the programming model: BG/L preserves the MPI programming model on top of the Linux operating system.
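
Because the MPI programming model is preserved, a standard MPI program runs on BG/L unchanged; the printf below is exactly the kind of I/O call that would be shipped from a compute node to an I/O node without the application noticing. The example uses only standard MPI and C library calls; nothing in it is BG/L-specific.

#include <mpi.h>
#include <stdio.h>

/* A standard MPI program: each rank contributes a value, rank 0 prints the
 * global sum.  The printf is an I/O operation that, on BG/L, would be shipped
 * from the compute node to its I/O node without any change to this code. */
int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank + 1;   /* each process owns one value */
    int sum = 0;
    /* Global reduction: the kind of collective operation served by the
     * dedicated global-operations network. */
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %d\n", size, sum);

    MPI_Finalize();
    return 0;
}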

Jobs run inside partitions. A partition is a set of compute nodes connected as a torus or as a mesh. Partitions are created dynamically when a job is launched and destroyed when it completes. Partitions are electrically isolated, so messages related to different jobs are never transmitted over the same wires. Consequently, the partitioning problem is not one of routing messages efficiently to avoid network congestion, but rather one of allocating wires efficiently (e.g., using a minimal number of wires) to keep partition allocation flexible.
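
A real partition allocator must also respect the machine's wiring constraints; the sketch below ignores wiring and only illustrates the combinatorial core of the problem, first-fit placement of a free rectangular block of compute nodes. The names and the first-fit policy are illustrative assumptions, not the actual BG/L allocator.

#include <stdbool.h>
#include <stdio.h>

#define DIM_X 64
#define DIM_Y 32
#define DIM_Z 32

/* busy[x][y][z] marks compute nodes that already belong to some partition. */
static bool busy[DIM_X][DIM_Y][DIM_Z];

/* Check whether the dx*dy*dz block with base corner (bx,by,bz) is free.
 * For simplicity the block is not allowed to wrap around the torus edges. */
static bool block_free(int bx, int by, int bz, int dx, int dy, int dz) {
    if (bx + dx > DIM_X || by + dy > DIM_Y || bz + dz > DIM_Z)
        return false;
    for (int x = bx; x < bx + dx; x++)
        for (int y = by; y < by + dy; y++)
            for (int z = bz; z < bz + dz; z++)
                if (busy[x][y][z])
                    return false;
    return true;
}

/* First-fit allocation of a rectangular partition of shape dx x dy x dz.
 * Returns true and the base corner on success.  A real allocator would also
 * choose among candidate placements to keep the wiring feasible and the
 * remaining free space unfragmented. */
static bool alloc_partition(int dx, int dy, int dz, int *bx, int *by, int *bz) {
    for (int x = 0; x < DIM_X; x++)
        for (int y = 0; y < DIM_Y; y++)
            for (int z = 0; z < DIM_Z; z++)
                if (block_free(x, y, z, dx, dy, dz)) {
                    for (int i = x; i < x + dx; i++)
                        for (int j = y; j < y + dy; j++)
                            for (int k = z; k < z + dz; k++)
                                busy[i][j][k] = true;
                    *bx = x; *by = y; *bz = z;
                    return true;
                }
    return false;   /* no contiguous free block of the requested shape */
}

int main(void) {
    int bx, by, bz;
    if (alloc_partition(8, 8, 8, &bx, &by, &bz))   /* a 512-node partition */
        printf("partition placed at base (%d, %d, %d)\n", bx, by, bz);
    return 0;
}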

The job management system for BG/L is a joint development effort of the IBM Haifa Research Laboratory and the IBM T.J. Watson Research Center. The job management system is responsible for allocating partitions and scheduling jobs to optimally utilize the machine's resources (compute nodes). Once a job is scheduled, the system launches, monitors, signals (if needed), and terminates its execution.

The job management system is composed of four main components:
  • Partition management: creation (wiring) and destruction of partitions.
  • Resource management: monitoring and reporting configuration and runtime status information of all the machine's entities (e.g., I/O and compute nodes).
  • Job control management: launching, monitoring, signaling and terminating jobs on compute nodes.
  • Job scheduling: managing incoming jobs, scheduling them, and allocating partitions for them efficiently on the BG/L machine. This component is based on LoadLeveler, the IBM job management system, which allows users to run more jobs in less time by matching each job's processing needs with the available resources. (The interplay of these components is sketched below.)
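
The following C sketch shows one plausible way these four components could interact over the lifetime of a single job. Every type and function in it is hypothetical; it only mirrors the division of responsibilities listed above, not the real LoadLeveler or BG/L control interfaces.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical job-lifecycle sketch; none of these names are real APIs. */
typedef enum { JOB_QUEUED, JOB_RUNNING, JOB_DONE, JOB_FAILED } job_state_t;

typedef struct {
    int id;
    int nodes_requested;   /* size of the partition the job needs */
    job_state_t state;
} job_t;

/* Partition management: create (wire) and destroy partitions. */
static bool partition_create(int nodes)  { printf("wire %d-node partition\n", nodes); return true; }
static void partition_destroy(int nodes) { printf("free %d-node partition\n", nodes); }

/* Resource management: report whether enough compute nodes are free. */
static bool resources_available(int nodes) { return nodes <= 65536; }

/* Job control management: launch the job and wait for it to finish. */
static bool job_launch_and_monitor(job_t *job) {
    printf("job %d running on %d nodes\n", job->id, job->nodes_requested);
    return true;   /* pretend the job completed successfully */
}

/* Job scheduling: the driver that ties the other components together. */
static void schedule(job_t *job) {
    if (!resources_available(job->nodes_requested)) return;   /* stays queued */
    if (!partition_create(job->nodes_requested))    return;
    job->state = JOB_RUNNING;
    job->state = job_launch_and_monitor(job) ? JOB_DONE : JOB_FAILED;
    partition_destroy(job->nodes_requested);   /* partitions die with the job */
}

int main(void) {
    job_t job = { .id = 1, .nodes_requested = 512, .state = JOB_QUEUED };
    schedule(&job);
    printf("job %d final state: %d\n", job.id, (int)job.state);
    return 0;
}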

The Haifa team is directly responsible for job scheduling and partition allocation. We are also involved in the design of the management components.

With 65,536 compute nodes and 1,024 I/O nodes, BG/L represents a new level of scalability for parallel computers. Job management in such a system is one area where this new level of scalability poses a major research challenge. Current state-of-the-art approaches have had only limited success on systems with more than 1,000 nodes. Our strategy for system management at this scale is based on a hierarchical organization of the machine. We are also developing allocation methods that take into account the particular network topology of BG/L. We will gradually validate our approaches in simulations, then on smaller-scale prototypes, and ultimately on the full-scale production BG/L computer.
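
One way to picture the hierarchical organization is as a two-level tree: each I/O node summarizes the roughly 64 compute nodes it serves (65,536 / 1,024), and the service side aggregates only the 1,024 summaries rather than 65,536 individual reports. The sketch below illustrates that idea for a simple health check; it is an assumption about structure, not the actual BG/L control-system design.

#include <stdio.h>

#define NUM_IO_NODES   1024
#define COMPUTE_PER_IO 64      /* 65,536 compute nodes / 1,024 I/O nodes */

/* Leaf level: each compute node reports a single health flag (1 = OK). */
static int compute_ok(int io, int idx) { (void)io; (void)idx; return 1; }

/* Middle level: an I/O node summarizes the compute nodes it serves, so the
 * top level only ever handles 1,024 summaries instead of 65,536 reports. */
static int io_node_summary(int io) {
    int ok = 0;
    for (int c = 0; c < COMPUTE_PER_IO; c++)
        ok += compute_ok(io, c);
    return ok;
}

/* Top level: the service side aggregates the per-I/O-node summaries. */
int main(void) {
    long total_ok = 0;
    for (int io = 0; io < NUM_IO_NODES; io++)
        total_ok += io_node_summary(io);
    printf("%ld of %d compute nodes healthy\n",
           total_ok, NUM_IO_NODES * COMPUTE_PER_IO);
    return 0;
}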

 
