IBM R&D Labs in Israel News

CloudFab: Keeping it together

Research creates IBM management middleware that brings scalability to the cloud

HRL CloudFab team (left to right): Roie Melamed, Yoav Tock, Alexey Roytman, Vita Bortnikov, Gregory Chockler, Gennady Laventman, Eliezer Dekel

When does a group of connected computers become a ‘cloud’? And what kind of technology manages them so you don’t notice which computers are going online, which go offline, and which one is running your application?

Although many of us are familiar with the concept of cloud computing, we still have a tough time explaining it to our friends or children. Electricity is often a good analogy. When electricity first came into use, each company had to build its own power plant, managed by its own engineers. As time went on, central power plants began to provide electricity as a utility, and everyone could draw power according to their own needs. Following a similar line of thinking, it makes sense to have computing available as a utility, delivered via the Internet, just as water and electricity come to us through pipes and wires. Cloud computing is by nature scalable and elastic, enabling people to use exactly the capacity they need, without having to worry about which data center or computers it comes from.


What’s a cloud and what holds it together?

The idea of clustering computers and providing them with group communication is not new. When heavy-duty applications need more power, we bring additional computers online. Clustering enables a group of computers to appear as a single computer, managed in a transparent manner. For example, when we shop online, we aren’t aware of which computer is serving us: the entire group acts as one big computer. In fact, our work is often transferred from one computer to another without our noticing.

To accomplish this feat, the cluster must be able to replicate data, synchronize between computers, and distribute work. Each time a computer is added to the group, replication, synchronization, and distribution become more complex. The complexity of some of the algorithms we use to synchronize grows quadratically with the number of participating computers. Paradoxically, adding more computers to the group means more work for the group. At a certain point, we can’t just keep serving more users by adding more computers.
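
To get a feel for that quadratic growth, here is a minimal sketch in Python. It assumes the simplest possible case, in which every computer synchronizes directly with every other computer; the article does not say which algorithms are involved, so the function name and the full-mesh model are illustrative assumptions only.

```python
def full_mesh_links(n: int) -> int:
    """Pairwise links needed when each of n computers talks to every other one.

    Assumption for illustration: synchronization is all-to-all (a full mesh).
    """
    return n * (n - 1) // 2


# Growing the group tenfold multiplies the number of links roughly a hundredfold.
for n in (10, 100, 1000):
    print(f"{n:>5} computers -> {full_mesh_links(n):>7} links")
```

Under this assumption, the computer that joins a group of n members adds n new links on its own, which is why each addition costs more than the one before it.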

Given this challenge, how do we manage a collection of 1000 computers or more so that they appear to act as a single computer? CloudFab, short for cloud fabric, is the IBM management infrastructure that binds the computers together and provides the scalability needed to turn a group of computers into a cloud of almost unlimited size. Basically, a cloud is similar to a cluster of computers, but where the cluster is limited in size, the cloud can contain many more computers at many different locations.


One fabric to bind them

“Our goals were to create a fabric that was simple to configure, easy to manage, autonomic, and with a low probability of failure,” noted Eliezer Dekel, manager of Distributed Middleware at the IBM Haifa Research Lab where CloudFab was created. “We knew that failure (of computers) is inevitable and we wanted to make sure that applications and users could carry on working as usual.”

Unlike cloud infrastructures developed by Amazon or Google, IBM’s CloudFab has always provided application elasticity. “For example, we can specify a minimum and maximum number of replicas that should be sustained for any application, thereby more closely controlling the quality of service while ensuring that unneeded copies are not being generated,” explained Dekel.
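
The minimum and maximum replica bounds Dekel describes can be pictured with a short Python sketch. This is not CloudFab's interface, which the article does not show; `ReplicaPolicy`, `target`, and `reconcile` are hypothetical names used only to illustrate how such bounds keep the replica count inside a fixed range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaPolicy:
    """Hypothetical per-application bounds on the number of running replicas."""
    min_replicas: int
    max_replicas: int

    def __post_init__(self) -> None:
        if not 1 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 1 <= min_replicas <= max_replicas")

    def target(self, desired: int) -> int:
        """Clamp the replica count suggested by load into the allowed range."""
        return max(self.min_replicas, min(desired, self.max_replicas))


def reconcile(policy: ReplicaPolicy, running: int, desired: int) -> int:
    """Replicas to start (positive) or stop (negative) to reach the target."""
    return policy.target(desired) - running


policy = ReplicaPolicy(min_replicas=2, max_replicas=5)
print(reconcile(policy, running=1, desired=1))  # below the minimum: start one
print(reconcile(policy, running=7, desired=9))  # above the maximum: stop two
```

The lower bound protects quality of service when replicas fail, and the upper bound keeps unneeded copies from being created when demand spikes.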

CloudFab is currently being deployed in the Research Compute Cloud (RC2), and is also planned for integration with the China Research Lab’s work with Jomo. WebSphere Virtual Enterprise (VE) is another product to which part of the CloudFab technology contributes stability and scalability.


Scalable, flexible, and supports virtualization

CloudFab is a cloud system management middleware that is truly scalable. It’s based on peer-to-peer overlay technology, in which not every computer is connected to all other computers. The number of connections required for each computer in CloudFab is relatively low, which helps extend scalability. The overlay ensures that each computer is connected to others through several paths, so that if one path fails another remains. The fabric also supports interest-aware membership, which lets it keep track of which computers are connected, which applications are running, and other important statistics regarding the status of various components.
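
A small Python sketch can show why a sparse overlay still survives failures. The topology here is an assumption made for illustration (a ring in which each computer also links to the peer two positions away); the article does not describe CloudFab's actual overlay, and `build_overlay` and `reachable` are hypothetical helper names.

```python
from collections import deque


def build_overlay(n: int, hops: tuple[int, ...] = (1, 2)) -> dict[int, set[int]]:
    """Link each node i to the nodes i+1 and i+2 around a ring (both directions)."""
    neighbours: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for hop in hops:
            j = (i + hop) % n
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


def reachable(neighbours: dict[int, set[int]], start: int,
              failed: frozenset[int] = frozenset()) -> set[int]:
    """Breadth-first search from a live `start` node, skipping failed nodes."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            if peer not in failed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


overlay = build_overlay(1000)
max_degree = max(len(peers) for peers in overlay.values())
survivors = reachable(overlay, start=0, failed=frozenset({500}))
print(max_degree, len(survivors))
```

In this sketch each of the 1000 computers keeps only four connections, yet when one computer fails every remaining computer can still reach all the others through an alternate path.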

“It’s the fabric that creates the cloud,” notes Dekel. “An innovative fabric that supports scalability and virtualization gives us the freedom to carry out load balancing, failover, and other activities so the application is never dependent on any particular physical element. This is what gives CloudFab its superb flexibility.”

What are the plans for future research? Dekel is already working on a CloudFab blueprint that will scale well beyond 1000 computers, leading the way to autonomous cloud shires and hierarchies of shires that form massive clouds.

Blue Cloud

Since October 2007, IBM has worked with 35 universities, extending cloud services to students and researchers. As of today, IBM has built nine Cloud Labs for business around the world, and will expand the number of labs to 20.


RESERVOIR (Resources and Services Virtualization without Barriers) is a European Union FP7-funded Cloud Computing research project that will enable massive-scale deployment and management of complex IT services across different administrative domains, IT platforms, and geographies.

The IBM Haifa Research Lab plays a key role in the RESERVOIR project, serving as the project coordinator, leading the architecture definition work, and leading or participating in a number of technical work packages.