Virtual Machine Placement
Customer
Our customer is the Virtualization and Systems Management group at IBM Research. This group focuses on advanced and autonomic management aspects of cloud computing, as a part of the IBM Blue Cloud initiative.
One of its major activities is Advanced Ensemble Management. An ensemble is a self-managed cluster of similar servers with virtualization capabilities. Each ensemble is controlled by a Local Ensemble Manager (LEM).
Challenge
The challenge is to place the virtual machines on a set of actual machines (hosts), taking into account the working mode of the ensemble.
Solution
First, a special algorithm devised by the LEM team determines the minimal and estimated resource requirements (CPU, memory, slots, cores) for each virtual machine. Then, information on the ensemble is translated into location and anti-location constraints between virtual machines and hosts, and into collocation and anti-collocation constraints between groups of virtual machines.
The constraint satisfaction problem (CSP) model comprises:
- Location and anti-location constraints.
- Collocation and anti-collocation constraints.
- Constraints that ensure that the sum of the minimal CPU, memory, and slot demands of the virtual machines placed on a host does not exceed that host's capacity for each of these resources.
- Constraints that ensure that each virtual machine's requirement on the number of cores of its host is met.
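To make the mandatory constraints concrete, here is a minimal Python sketch of a feasibility check for a candidate placement. It is not the LEM team's CSP model: the names `Host`, `VM`, and `is_feasible` are hypothetical, and the sketch assumes that a collocation group must share a single host, that an anti-collocation group must occupy pairwise distinct hosts, and that the core requirement means the host has at least as many cores as the virtual machine asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpu: float
    memory: float
    slots: int
    cores: int


@dataclass(frozen=True)
class VM:
    name: str
    min_cpu: float
    min_memory: float
    min_slots: int
    cores: int  # assumption: the host must have at least this many cores


def is_feasible(placement, vms, hosts, location=None, anti_location=None,
                collocation=(), anti_collocation=()):
    """Check a placement (VM name -> host name) against the mandatory constraints.

    vms and hosts are dicts keyed by name. location / anti_location map a VM
    name to a set of allowed / forbidden host names. collocation and
    anti_collocation are iterables of VM-name groups.
    """
    location = location or {}
    anti_location = anti_location or {}

    # Location: the VM must sit on one of its allowed hosts.
    for vm_name, allowed in location.items():
        if placement[vm_name] not in allowed:
            return False
    # Anti-location: the VM must avoid its forbidden hosts.
    for vm_name, forbidden in anti_location.items():
        if placement[vm_name] in forbidden:
            return False
    # Collocation (assumed meaning): all VMs of a group share one host.
    for group in collocation:
        if len({placement[v] for v in group}) > 1:
            return False
    # Anti-collocation (assumed meaning): all VMs of a group use distinct hosts.
    for group in anti_collocation:
        used = [placement[v] for v in group]
        if len(set(used)) != len(used):
            return False
    # Capacity: summed minimal demands must fit each host; cores checked per VM.
    for host in hosts.values():
        placed = [vms[v] for v, h in placement.items() if h == host.name]
        if sum(v.min_cpu for v in placed) > host.cpu:
            return False
        if sum(v.min_memory for v in placed) > host.memory:
            return False
        if sum(v.min_slots for v in placed) > host.slots:
            return False
        if any(v.cores > host.cores for v in placed):
            return False
    return True
```

A constraint solver would search over placements rather than test one at a time; the check only spells out what each constraint family requires of a solution.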
In addition to the above mandatory constraints, the placement needs to maximize the load balancing score of the ensemble, while minimizing the energy cost and the cost of changes relative to a given previous placement. The relative importance of these goals is determined by the ensemble working mode. Optimization is done by a simple iterative scheme, using a weighted objective function over these three components. The user can also define a tolerance percentage, that is, the allowed distance of an accepted solution from the optimal value.
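The weighted objective and the tolerance can be sketched as follows. This is an illustration under stated assumptions, not the actual scheme: the function names are hypothetical, the three components are combined linearly with weights chosen per working mode, the cost of changes is approximated by the number of virtual machines that moved to a different host, and the tolerance is taken relative to the best known score.

```python
def weighted_score(load_balance, energy_cost, change_cost, weights):
    """Combine the three goals into one value to maximize.

    weights is (w_load_balance, w_energy, w_change); in the text these
    would depend on the ensemble working mode (assumed linear combination).
    """
    w_lb, w_energy, w_change = weights
    return w_lb * load_balance - w_energy * energy_cost - w_change * change_cost


def count_moves(previous, current):
    """Assumed change cost: VMs placed on a different host than before.

    VMs that did not exist in the previous placement are not counted.
    """
    return sum(
        1 for vm, host in current.items()
        if previous.get(vm) not in (None, host)
    )


def within_tolerance(score, best_score, tolerance_pct):
    """True if score is at most tolerance_pct percent below best_score."""
    return score >= best_score - abs(best_score) * tolerance_pct / 100.0
```

With such a check, an iterative search could stop as soon as a candidate's score falls within the user-defined tolerance of the best value it is aiming for, trading a small loss in quality for a shorter run.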
Achievements
The solution has not yet been incorporated into LEM. Preliminary prototype results show the solution to be scalable as well as flexible.
The model was also extended to ensure high availability by providing alternative placements in case of a failure of up to a predefined number of hosts. This extension led to a patent and a paper, which will be presented at ICSDS 2011.
Contact: Odellia Boni (odelliab@il.ibm.com)