Cloud Foundations

The Cloud Foundations group develops fundamental technologies for large-scale distributed systems.
One of the group's main focus areas is non-functional requirements (NFRs): we develop solutions for achieving high availability and scalability in large-scale systems, such as those used in hospitals and air-traffic control centers. High availability ensures that these systems run with minimal downtime, and scalability enables them to adapt as they grow. Both characteristics are critical for technology to function in today's world of exponentially increasing information, boosted by cloud technology and industry standards that embrace a grow-as-you-go paradigm.
Specifically, our technologies enable the management, resilience, and robustness of various IBM services in the cloud.

In recent years, we have focused on developing Microservices Fabric, including two open-sourced services deployed to Bluemix.
In addition, we own critical dependencies in Watson Developer Cloud, including the service discovery and distributed coordination services, which earned us an accomplishment award in 2016.
Our group is also a major contributor to the Blockchain Hyperledger Fabric platform, developing the communication layer and ledger distribution and replication.

Our major activities include:


The Hyperledger project is an open-source, industry-backed initiative seeking to develop a large-scale permissioned blockchain infrastructure. To achieve scalability, the Hyperledger Fabric architecture decouples transaction logic execution from transaction ordering, assigning the two activities to separate sets of entities called peers and orderers, respectively.

Our group is responsible for the development and integration of new protocols for scalable replication of the ledger state among the entities in a Hyperledger Fabric network. Our solution employs gossip-based data diffusion to achieve:

  1. Scalability – the ability to support potentially large and dynamic groups of peers and rapid propagation of newly ordered transaction blocks supplied by the orderers.
  2. Security – the ability to guarantee that high levels of security (e.g., the integrity of the block data) are maintained outside the trusted core.
  3. Resilience – the consistent reconciliation of the ledger state over time to support dynamic changes in the peer membership and recover from long-term failures (such as network partitions).
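The three properties above can be illustrated with a small, self-contained simulation. This is a minimal sketch under stated assumptions, not the protocol used in Hyperledger Fabric: the `Peer` class, the `simulate` helper, the fanout of 2, and the use of SHA-256 hash chaining as the integrity check are all hypothetical choices made for illustration. Push gossip stands in for rapid block propagation, the hash check for integrity outside the trusted core, and `reconcile` for the anti-entropy step that lets a peer catch up after a failure.

```python
import hashlib
import random

GENESIS = "genesis"  # assumed placeholder for the hash preceding block 0


def block_hash(number, prev_hash, payload):
    """Digest that binds a block to its position and its predecessor."""
    data = f"{number}|{prev_hash}|{payload}".encode()
    return hashlib.sha256(data).hexdigest()


def make_block(number, prev_hash, payload):
    """Stand-in for an orderer producing the next block."""
    return {"number": number, "prev_hash": prev_hash, "payload": payload,
            "hash": block_hash(number, prev_hash, payload)}


class Peer:
    def __init__(self, name):
        self.name = name
        self.ledger = []    # blocks committed in order
        self.online = True

    def height(self):
        return len(self.ledger)

    def is_valid_next(self, block):
        """Integrity check: the block must extend our chain and hash correctly."""
        prev_hash = self.ledger[-1]["hash"] if self.ledger else GENESIS
        expected = block_hash(block["number"], block["prev_hash"], block["payload"])
        return (block["number"] == self.height()
                and block["prev_hash"] == prev_hash
                and block["hash"] == expected)

    def receive(self, block, peers, rng, fanout=2):
        """Push gossip: commit a new valid block, then forward it to a few peers."""
        if not self.online or not self.is_valid_next(block):
            return  # offline, duplicate, out-of-order, or tampered block
        self.ledger.append(block)
        others = [p for p in peers if p is not self]
        for target in rng.sample(others, min(fanout, len(others))):
            target.receive(block, peers, rng, fanout)

    def reconcile(self, other):
        """Anti-entropy: pull the blocks we are missing from another peer."""
        if not (self.online and other.online):
            return
        for block in other.ledger[self.height():]:
            if not self.is_valid_next(block):
                break
            self.ledger.append(block)


def simulate(num_peers=8, num_blocks=5, seed=7):
    rng = random.Random(seed)
    peers = [Peer(f"peer{i}") for i in range(num_peers)]
    peers[-1].online = False             # one peer is unreachable for a while
    prev_hash = GENESIS
    for number in range(num_blocks):
        block = make_block(number, prev_hash, f"tx-batch-{number}")
        prev_hash = block["hash"]
        peers[0].receive(block, peers, rng)   # orderer hands the block to one peer
    peers[-1].online = True              # the peer comes back
    for _ in range(3):                   # a few random reconciliation rounds
        for peer in peers:
            peer.reconcile(rng.choice([p for p in peers if p is not peer]))
    return peers
```

Because each peer forwards a block to only a few random neighbors, no single node carries the full distribution load, and a peer that missed blocks while offline recovers them later through `reconcile` rather than through retransmission by the sender.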

A new communication layer connects the nodes that make up the blockchain network.


For the past several years, our research focus has been on enabling and extending a microservices runtime fabric.

This workstream began when we assumed responsibility for managing the service discovery components of the Watson Developer Cloud, which gave us a better understanding of the requirements and challenges of operating microservices at scale.

Consequently, we built and operated Bluemix Service Discovery, allowing customers to deploy microservices without having to operate the underlying "plumbing". This effort, along with the Bluemix Service Proxy, was later open-sourced and made available via the Amalgam8 project.

Amalgam8 provides a substrate for enabling and controlling DevOps practices for polyglot applications, including continuous delivery, canary and A/B testing, resiliency testing, and content-aware routing.
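To make canary testing and content-aware routing concrete, here is a minimal sketch of how a routing layer might pick a service version for each request. It is not the Amalgam8 rule format or API: the `choose_version` function, the rule dictionaries, the `X-Beta-Tester` header, and the 90/10 split are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical rules, evaluated in order. The first rule is content-aware
# (it matches on a request header); the second is a weighted canary split.
RULES = [
    {"match_headers": {"X-Beta-Tester": "true"},
     "backends": [{"version": "v2", "weight": 100}]},
    {"match_headers": {},  # an empty match acts as the default rule
     "backends": [{"version": "v1", "weight": 90},
                  {"version": "v2", "weight": 10}]},
]


def choose_version(request_headers, rules=RULES, rng=random):
    """Return the service version that should handle this request."""
    for rule in rules:
        wanted = rule["match_headers"]
        if all(request_headers.get(k) == v for k, v in wanted.items()):
            versions = [b["version"] for b in rule["backends"]]
            weights = [b["weight"] for b in rule["backends"]]
            return rng.choices(versions, weights=weights, k=1)[0]
    raise LookupError("no routing rule matched the request")
```

With rules like these, beta testers always reach the new version, while roughly one in ten of the remaining requests exercises it. Shifting the weights gradually toward `v2` completes the rollout without redeploying any service.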

More recently, Amalgam8 functionality has been extended as part of Istio, a larger community project involving multiple vendors interested in simplifying the development and operation of microservices. Istio addresses additional microservice concerns, such as security, API management, and distributed tracing.

Watson Developer Cloud

We have been part of the development of Watson Developer Cloud since its inception, helping to design and develop the common foundation of services and components that support the development and operation of Watson's cognitive capabilities as a set of modern cloud services.

In our work on Watson Developer Cloud we have been able to address some of the difficult technical issues involved in architecting high-volume, multi-tenant cloud services, and to relieve service teams from having to repeatedly reinvent the wheel when readying their services for the cloud.

Among other contributions, we are responsible for the service discovery service, which enables all Watson services to locate each other dynamically, and for the coordination service that allows distributed service instances to coordinate their actions in a consistent manner. We not only developed and deployed these services, but have also handled their day-to-day operation as well as continuous improvement based on data-driven engineering. In this way, by taking end-to-end responsibility for some of the most critical services, we have had a significant part in enabling Watson's transformation to a set of cloud-native services.
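The idea of dynamic service location can be sketched with a small in-memory registry. This is an illustrative sketch only, not the Watson Developer Cloud service: the `ServiceRegistry` class, its method names, the heartbeat-with-TTL design, and the example service name and endpoints are all assumptions. Instances register an endpoint and renew it periodically, and an instance that stops sending heartbeats drops out of lookup results once its lease expires.

```python
import time


class ServiceRegistry:
    """In-memory registry: endpoints stay visible only while their lease is fresh."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._expiry = {}            # (service, endpoint) -> lease expiry time

    def register(self, service, endpoint):
        """Add an instance, or refresh its lease if it is already known."""
        self._expiry[(service, endpoint)] = self._clock() + self._ttl

    def heartbeat(self, service, endpoint):
        """Renew the lease of a running instance."""
        self.register(service, endpoint)

    def deregister(self, service, endpoint):
        """Remove an instance that is shutting down cleanly."""
        self._expiry.pop((service, endpoint), None)

    def lookup(self, service):
        """Return the endpoints of all instances whose lease has not expired."""
        now = self._clock()
        return sorted(endpoint
                      for (name, endpoint), expiry in self._expiry.items()
                      if name == service and expiry > now)
```

Expiring leases instead of relying on explicit deregistration means that crashed instances disappear on their own, which matters when services are continuously deployed and scaled.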


Michal Malka, Manager, Cloud Foundations, IBM Research - Haifa