Cloud Operating System Technologies
Cloud computing is a disruptive paradigm, enabling new business models and new technological approaches to delivering services over the internet. While basic Infrastructure as a Service (IaaS) capabilities are quickly becoming a commodity, the quality-of-service guarantees typically associated with 'traditional' hosting environments, such as performance, availability, and security, are becoming differentiating factors.
The Cloud Operating System Technologies group pursues a broad research agenda in Advanced IaaS Resource and Workload Management. Our work contributes to future IBM products and offerings related to cloud infrastructure, and also includes academic activities (publications, etc.) and collaboration with R&D institutions in Israel and Europe. From 2008 to 2011, we led the EU-funded project Reservoir (http://reservoir-fp7.eu/), which NESSI designated a flagship EU project in the area of cloud computing. We currently lead the Cloud Hosting activity in FI-WARE (Future Internet Core Platform, http://www.fi-ware.eu/), a large EU-funded project carried out jointly with leading European companies in the telecommunications, IT, and other industries. As part of FI-WARE, we are building an innovative IaaS cloud infrastructure based on OpenStack (http://www.openstack.org/), an emerging open-source project that is widely adopted by the industry and is becoming the de facto standard IaaS platform.
Our technical areas of expertise include:
- Policy-based resource management and optimization. When a large number of workloads share a cloud infrastructure, it is important to balance the needs of the individual workloads (in terms of performance, security, availability, and other nonfunctional properties) against the needs of the infrastructure provider in terms of the total cost of ownership (TCO) of the cloud infrastructure. We combine state-of-the-art optimization techniques with our expertise in virtualization and cloud technologies to deliver a flexible resource placement and optimization engine for the cloud, addressing considerations from different management domains, such as capacity, performance, energy, security, and license management.
- Capacity management with performance guarantees. One of the enablers of the cloud paradigm is the ability to oversubscribe resources, exploiting statistical multiplexing of resource demand across different users and workloads: because workloads rarely peak at the same time, their aggregate demand can be well below the sum of their individual peaks. We have developed a unique methodology for planning and ongoing management of server capacity in a cloud environment, which achieves significant savings in physical capacity without jeopardizing the resource allocation guarantees provided to individual workloads.
- Workload elasticity. Many workloads are dynamic in nature, requiring different amounts of resources at different times, depending on the actual load. When such a workload is hosted in the cloud, the 'pay as you go' model makes it possible to align the cost of hosting it with its actual resource demand. We are developing techniques to identify the resource bottlenecks that affect the performance of workloads running in the cloud, and to adjust resource allocation adaptively so that each workload meets its performance goals without over-provisioning.
- Federated resource and capacity management across clouds. While the above techniques make it possible to build highly efficient and capable clouds, it is often the federation of multiple clouds, private or public, that maximizes the benefit of the cloud paradigm: federation exploits fluctuations in demand across different clouds and enables global optimization without a centralized management and decision-making entity that controls all the clouds. We develop models, techniques, and infrastructure components that allow multiple clouds to leverage each other's capacity and capabilities. We focus on the federation of private clouds, where the overall goal is to optimize resources for the entire organization that owns all the clouds while preserving the autonomy of each individual cloud (e.g., clouds hosted in different organizational or geographic units).
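As a rough illustration of the policy-based placement item, the sketch below uses first-fit decreasing bin packing, a classic heuristic and not the group's optimization engine, combined with one hypothetical policy: VMs that share an anti-affinity tag (an availability constraint) must land on different hosts. The `Host` and `VM` classes, the `place` function, and the capacity units are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Host:
    name: str
    capacity: float  # hypothetical unit, e.g. CPU cores
    used: float = 0.0
    groups: Set[str] = field(default_factory=set)  # anti-affinity tags already on this host


@dataclass
class VM:
    name: str
    demand: float  # same unit as Host.capacity
    anti_affinity: Optional[str] = None  # VMs sharing this tag must run on different hosts


def place(vms: List[VM], hosts: List[Host]) -> Dict[str, str]:
    """First-fit decreasing with an anti-affinity policy.

    Largest VMs are placed first, each on the first host that has enough spare
    capacity and does not already run a VM with the same anti-affinity tag.
    Host objects are updated in place; returns a mapping VM name -> host name.
    """
    placement: Dict[str, str] = {}
    for vm in sorted(vms, key=lambda v: v.demand, reverse=True):
        for host in hosts:
            fits = host.used + vm.demand <= host.capacity
            conflict = vm.anti_affinity is not None and vm.anti_affinity in host.groups
            if fits and not conflict:
                host.used += vm.demand
                if vm.anti_affinity is not None:
                    host.groups.add(vm.anti_affinity)
                placement[vm.name] = host.name
                break
        else:  # no host satisfied both the capacity and the policy constraint
            raise RuntimeError(f"no feasible host for {vm.name}")
    return placement


if __name__ == "__main__":
    hosts = [Host("h1", 16), Host("h2", 16), Host("h3", 16)]
    vms = [VM("db-1", 8, "db"), VM("db-2", 8, "db"), VM("cache", 6),
           VM("web-1", 4), VM("web-2", 4)]
    print(place(vms, hosts))
    # {'db-1': 'h1', 'db-2': 'h2', 'cache': 'h1', 'web-1': 'h2', 'web-2': 'h2'}
```

Packing the largest VMs first tends to leave whole hosts empty (here `h3`), which could then be powered down for energy savings. A fuller engine would weigh performance, security, and licensing policies together, for example through mathematical optimization rather than a single greedy pass.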
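The statistical multiplexing effect that the capacity-management item relies on can be shown with a small calculation. The sketch below compares sizing each workload separately for its own peak with sizing one shared pool for the peak of the combined demand. The demand traces are made-up numbers, the function names are hypothetical, and this is a textbook illustration rather than the group's capacity-planning methodology.

```python
import math
from typing import List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a sample, for 0 < p <= 100."""
    if not values:
        raise ValueError("empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def dedicated_capacity(traces: List[Sequence[float]], p: float) -> float:
    """Capacity needed if each workload is sized separately for its own p-th percentile."""
    return sum(percentile(t, p) for t in traces)


def pooled_capacity(traces: List[Sequence[float]], p: float) -> float:
    """Capacity needed if all workloads share one pool sized for the p-th
    percentile of their combined, time-aligned demand."""
    aggregate = [sum(step) for step in zip(*traces)]
    return percentile(aggregate, p)


# Hypothetical demand traces (CPU cores per 3-hour interval) whose peaks do not coincide.
web = [1, 1, 2, 6, 8, 6, 2, 1]
batch = [8, 6, 2, 1, 1, 1, 2, 6]
analytics = [2, 2, 6, 8, 2, 2, 2, 2]
traces = [web, batch, analytics]

if __name__ == "__main__":
    print(dedicated_capacity(traces, 100))  # 24: sum of the individual peaks
    print(pooled_capacity(traces, 100))     # 15: peak of the aggregate demand
```

With `p=100` the pooled figure can never exceed the dedicated one, because the peak of a sum is at most the sum of the peaks. Sizing for a lower percentile frees more capacity at the cost of occasionally exceeding the reservation, which is the trade-off that per-workload guarantees must bound.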
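One simple way to act on a detected bottleneck for workload elasticity, shown here only as a hedged sketch, is a proportional scaling rule similar to the one used by the Kubernetes Horizontal Pod Autoscaler: scale the number of replicas by the ratio of observed to target utilization of the most heavily used resource. The utilization metrics, the 0.5 target, the replica bounds, and the function names are assumptions for illustration, not a description of the group's techniques.

```python
import math
from typing import Dict


def bottleneck(utilization: Dict[str, float]) -> str:
    """Return the resource with the highest utilization (as a fraction of its allocation)."""
    return max(utilization, key=lambda r: utilization[r])


def desired_replicas(current: int, utilization: Dict[str, float], target: float = 0.5,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Proportional scaling rule driven by the bottleneck resource.

    Scales the replica count so that the most utilized resource would return to
    `target` utilization, clamped to [min_replicas, max_replicas].
    """
    hottest = utilization[bottleneck(utilization)]
    wanted = math.ceil(current * hottest / target)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    load = {"cpu": 0.75, "memory": 0.5, "disk_io": 0.25}
    print(bottleneck(load))           # cpu
    print(desired_replicas(4, load))  # 6 (scale out)
    print(desired_replicas(4, {"cpu": 0.25, "memory": 0.125}))  # 2 (scale in)
```

Keying the decision to the bottleneck resource means a workload constrained by memory or I/O is scaled as well, not only one constrained by CPU. A production controller would also need smoothing or cooldown periods to avoid oscillating between scale-out and scale-in.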
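The capacity-sharing idea behind federation can be sketched as a peer-to-peer exchange in which each cloud keeps control over how much of its own capacity it lends. In the deliberately simple greedy scheme below, with hypothetical class and function names and made-up numbers, an overloaded site asks its peers for help, and each peer grants at most what it has spare after a locally chosen reserve. It is not the group's federation model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cloud:
    name: str
    capacity: int  # hypothetical unit, e.g. VM slots
    load: int
    reserve: int = 0  # slots this cloud keeps for local use and never lends

    def spare(self) -> int:
        return max(0, self.capacity - self.reserve - self.load)

    def accept(self, requested: int) -> int:
        """Local, autonomous decision: grant at most this cloud's own spare capacity."""
        granted = min(requested, self.spare())
        self.load += granted
        return granted


def offload(origin: Cloud, peers: List[Cloud]) -> Dict[str, int]:
    """Shift origin's excess load to peers, asking those with the most spare capacity first.

    No central controller is involved: origin only queries and requests,
    and each peer decides on its own how much to grant.
    """
    excess = max(0, origin.load - origin.capacity)
    moved: Dict[str, int] = {}
    for peer in sorted(peers, key=lambda c: c.spare(), reverse=True):
        if excess == 0:
            break
        granted = peer.accept(excess)
        if granted:
            moved[peer.name] = granted
            excess -= granted
            origin.load -= granted
    return moved


if __name__ == "__main__":
    origin = Cloud("site-a", capacity=100, load=130)
    peers = [Cloud("site-b", capacity=100, load=90),
             Cloud("site-c", capacity=100, load=60, reserve=20)]
    print(offload(origin, peers))  # {'site-c': 20, 'site-b': 10}
    print(origin.load)             # 100
```

Any excess that no peer can absorb stays on the origin (its `load` remains above `capacity`), leaving it to local admission control. The `reserve` field stands in for the autonomy of each individual cloud over its own resources.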