Skip to main content

IBM Research - Haifa

Job scheduler and allocator for massively parallel processors

Customer

Our customer was the BG/Q scheduler team at IBM Research.

Challenge

The BlueGene series of supercomputers are high-end examples of 'massively-parallel processors' (MPP). The strength of the supercomputer of this type stems from the coherent computation and communication between thousands of relatively simple processors arranged is some three-dimensional structure. Jobs submitted for execution on an MPP may need to queue up until previous jobs leave the machine. Each job in the queue may come with requirements for a specific number of processors, a specific topology of the connectivities between the processors (e.g., 'box', or 'torus'), expected processing time, and more.

The challenge is to allocate each of the queuing jobs on a set of processors, and to schedule the execution of the job. The allocation needs to take into account the job's requirements on size and topology of processors, and scheduling needs to consider the job's expected execution time. In addition to those hard constraints, the allocation and scheduling needs to maximize the utilization of the machine, while minimizing the wait time of the jobs, and avoiding starvation of any job in the queue.

Solution

At each decision point (i.e., a point when some job is done executing and leaves the machine) we examine the current queue of waiting jobs and build the following constraint satisfaction problem: The variables are the three dimensions of the location and the start-processing time of each job in the queue; the respective domains are the three dimensions of the location of each of the processors, plus semi-continuous time units; and the constraints are geometric constraints implied by the topology required by each job, and constraints that avoid overlap between different jobs on the machine (this is where the expected execution time enters the picture). In the case where all topologies are boxes, the latter constraints are no-overlap constraints between four-dimensional boxes, or the so-called diff-4 constraint.

The main strength of this solution compared to other state-of-the-art solutions is that at each allocation time, the constraint problem examines the entire queue of waiting jobs, and decides simultaneously on both the allocation and the schedule of the set of jobs. Such a holistic view can clearly allow for better optimization, however it is too complex for regular methods to handle.

Optimization is done by a simple branch and bound scheme, with a weighted optimization function over three optimization components: maximal utilization, minimal average wait time, and cuttoff on maximal wait time (to avoid starvation).

Achievements

The solution was never incorporated into any of the BlueGene series machines, mainly because it was developed too late for the business to catch on. However, a full fledged prototype has clearly shown the advantage of the solution compared to state-of-the-art algorithms.