IBM Research

Computer Science

Innovation Matters


Distributed & Fault Tolerant Computing

Distribution & Consistency Services (DCS) - Towards Flexible Levels of Availability

Different applications require different levels of replication guarantees in order to provide better service (e.g., high availability, scalability). For some applications it is sufficient to replicate using unreliable multicast technology. These applications will be able to continue service (e.g., on another machine) with some previous copy of the state (order, timing or synchronization are not important). Other applications might require replication with guaranteed delivery and total order. To continue service, these components will need to have the latest computed state. These two kinds of applications represent two points in the spectrum of replication and synchronization requirements. Providing a new customized replication service to each application is costly. The innovative technology developed for DCS promotes a modular approach that allows applications with varied replication requirements to effectively use DCS. Applications can instantiate and dynamically configure a DCS component to fit their needs.
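The two ends of that spectrum can be illustrated with a small sketch. It is not the DCS API: the names `Guarantee`, `Receiver` and `replay` are hypothetical, and the sketch assumes that sequence numbers have already been assigned by some sender or sequencer. It shows only the receiver side: an unreliable configuration hands messages to the application as they arrive, while a totally ordered configuration holds a message back until every earlier one has been delivered.

```python
from dataclasses import dataclass, field
from enum import Enum


class Guarantee(Enum):
    UNRELIABLE = "unreliable"    # deliver whatever arrives, in arrival order
    TOTAL_ORDER = "total_order"  # deliver in sequence order, with no gaps


@dataclass
class Receiver:
    """Hypothetical receiver whose delivery behaviour is set at construction."""
    guarantee: Guarantee
    delivered: list = field(default_factory=list)
    _pending: dict = field(default_factory=dict)  # seq -> payload, held back
    _next_seq: int = 0

    def on_receive(self, seq: int, payload: str) -> None:
        if self.guarantee is Guarantee.UNRELIABLE:
            self.delivered.append(payload)
            return
        # Total order: buffer until all earlier sequence numbers have arrived.
        self._pending[seq] = payload
        while self._next_seq in self._pending:
            self.delivered.append(self._pending.pop(self._next_seq))
            self._next_seq += 1


def replay(guarantee: Guarantee, arrivals: list) -> list:
    """Feed (seq, payload) pairs to a fresh receiver and return what it delivered."""
    receiver = Receiver(guarantee)
    for seq, payload in arrivals:
        receiver.on_receive(seq, payload)
    return receiver.delivered
```

Replaying the same out-of-order arrivals under both settings shows the difference: the unreliable receiver exposes the network's order, while the totally ordered receiver reconstructs the sender's order. A real protocol would also need retransmission of lost messages and agreement on the sequence numbers, both of which this sketch leaves out.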

Figure 1: DCS and its interfaces

There are three parts to a DCS component (see Figure 1): the application interface, the distribution protocol stack (DPS), and the system interface. The application interfaces provide abstractions that allow applications to exploit the raw services provided by the DPS. DCS supports two application interfaces: the membership and synchrony service (MSS) and the data replication service (DRS). These interfaces can be used by a resource manager component and an object replication abstraction component, respectively. The DPS is based on our versatile replication infrastructure (VRI) and is configurable with respect to its delivery guarantees. DCS components should be able to make use of existing client investments in software and hardware. Existing investments can range from a Java Message Service (JMS) component used as a transport layer, through cluster platform Group Communication Services (GCS) capabilities, to special fast interconnects like InfiniBand. The flexible design of the DPS and the system interface allows these investments to be optimized.
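The three-part layering can be sketched as follows. This is an illustration under stated assumptions, not the DCS design: `Transport`, `LoopbackTransport`, `ProtocolStack`, `sequence_layer` and `ReplicationService` are hypothetical names, and the in-memory transport merely stands in for whatever the system interface would wrap (JMS, a cluster GCS, or a fast interconnect).

```python
from typing import Callable, List, Protocol


class Transport(Protocol):
    """System interface (hypothetical): whatever actually moves the bytes."""

    def send(self, data: bytes) -> None: ...


class LoopbackTransport:
    """In-memory stand-in for a real transport; records what was sent."""

    def __init__(self) -> None:
        self.wire: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.wire.append(data)


Layer = Callable[[bytes], bytes]


def sequence_layer() -> Layer:
    """Protocol layer that prefixes each message with a 4-byte sequence number."""
    next_seq = 0

    def apply(data: bytes) -> bytes:
        nonlocal next_seq
        framed = next_seq.to_bytes(4, "big") + data
        next_seq += 1
        return framed

    return apply


class ProtocolStack:
    """Protocol stack: a configurable list of layers on top of a transport."""

    def __init__(self, layers: List[Layer], transport: Transport) -> None:
        self.layers = layers
        self.transport = transport

    def send(self, data: bytes) -> None:
        for layer in self.layers:
            data = layer(data)
        self.transport.send(data)


class ReplicationService:
    """Application interface: hides the stack behind replicate(key, value)."""

    def __init__(self, stack: ProtocolStack) -> None:
        self.stack = stack

    def replicate(self, key: str, value: str) -> None:
        self.stack.send(f"{key}={value}".encode("utf-8"))
```

Because the application only sees `replicate`, the list of layers or the transport can be swapped without changing application code, which is the point of separating the three parts.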

The failure of DPS products to gain wide acceptance is attributed mainly to the complexity of their interfaces and their associated learning curve. There is a definite need for a higher level abstraction to improve the usability of DPS.

There are two main versions of DCS: Core DCS and Data DCS. There is one Core DCS per process and it provides membership services among peer processes. These processes together form a Core Group. A process may be a member in one or more named Core Groups. Applications running on these processes can be members of application groups. Application groups are subsets of a particular named core group. A Data DCS component can be associated with each member of an application group. There can be several Data DCS instances per process. Data DCS instances rely on membership information provided by the Core Group they are associated with.
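The relationship between core groups and application groups can be made concrete with a short sketch. The class and method names are hypothetical and do not reflect the DCS interfaces; the sketch only encodes the rule stated above, that an application group is a subset of its named core group and relies on that core group's membership information.

```python
class CoreGroup:
    """Hypothetical model of a named core group and its application groups."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: set = set()   # process ids in the core group
        self.app_groups: dict = {}  # application group name -> set of process ids

    def join(self, pid: str) -> None:
        self.members.add(pid)

    def join_app_group(self, group: str, pid: str) -> None:
        # An application group may only contain core group members.
        if pid not in self.members:
            raise ValueError(f"{pid} is not a member of core group {self.name}")
        self.app_groups.setdefault(group, set()).add(pid)

    def leave(self, pid: str) -> None:
        # A departure or failure in the core group propagates to every
        # application group, since they depend on core membership.
        self.members.discard(pid)
        for group_members in self.app_groups.values():
            group_members.discard(pid)

    def view(self, group: str) -> frozenset:
        """Current membership of an application group."""
        return frozenset(self.app_groups.get(group, set()))
```

In this model, removing a process from the core group removes it from every application group in one step, so application-level replication never has to run its own membership protocol.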

Figure 2: The Big Picture

WebSphere Application Server release 6.0 and WebSphere Extended Deployment (XD) 5.1 introduce DCS as part of their high availability architecture. The resource manager is the high availability manager and DRS provides the object replication abstraction (see Figure 2). DCS provides a mechanism for distributing information among members with a given quality of service. Failure detection mechanisms that support guaranteed quality of service are an inherent part of DCS and its services. DCS supports the state replication requirements of WebSphere components (e.g., HTTP sessions and stateful beans) as well as the distribution and synchronization of WebSphere artifacts for performance, scalability, and availability.
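The text does not say how DCS detects failures. As one common illustration, a timeout-based heartbeat detector can be sketched as below; the class name, the timeout parameter and the explicit `now` argument are all assumptions made for this example, not a description of the DCS mechanism.

```python
class HeartbeatDetector:
    """Hypothetical detector: a member is suspected once it has been silent too long."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict = {}  # member id -> time of its last heartbeat

    def heartbeat(self, member: str, now: float) -> None:
        self.last_seen[member] = now

    def suspects(self, now: float) -> set:
        # Any member whose last heartbeat is older than the timeout is suspected.
        return {
            member
            for member, seen in self.last_seen.items()
            if now - seen > self.timeout
        }
```

Passing the current time in explicitly keeps the sketch deterministic; a membership layer would typically react to a non-empty suspect set by installing a new view without the suspected members.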

The Haifa Research Lab has a long history of work in the area of high availability. We worked on the clustering support for iSeries, maintained the Cornell Ensemble code, and are very active in high availability research. We also develop high availability storage technology at our labs and are working on issues of end-to-end high availability. Our goal is to make high availability an engineering discipline that can be used wherever it is needed. This is the first time that such technology has been used in a commercial application server; there is nothing comparable in other products. It makes WebSphere resilient and competitive with OS-level and hardware-level cluster setups.

Selected Publications

Eliezer Dekel, Gera Goft, ITRA: Inter-Tier Relationship Architecture for End-to-end QoS, The Journal of Supercomputing, Volume 28, Issue 1 (2004)

Eliezer Dekel, Oleg Frenkel, Gera Goft, Yosef Moatti, Easy: Engineering High Availability QoS in wServices. 157-166, SRDS 2003

Eitan Farchi, Yoel Krasny, Yarden Nir, Automatic Simulation of Network Problems in UDP-Based Java Programs. IPDPS 2004


News and Information

New IBM Software Protects Critical Business Systems From Costly Disruptions

Related News Articles

Innovators Corner
Eliezer Dekel
Researcher

What is the most exciting potential future use for the work you're doing?
This is the culmination of many years of work with WebSphere. DCS and the high availability architecture of WebSphere 6.0 make it feasible and affordable to safeguard Internet business applications against outages that can cost companies as much as $110,000 per minute in lost revenue and productivity.

What is the most interesting part of your research?
DCS is a modular group communication services component that was developed in Java with a focus on performance and scalability. The main challenge was in making a product that is reliable and at the same time has very low latency and high throughput.

What inspired you to go into this field?
I have been involved in parallel and distributed processing research for many years. Work on distributed platforms was a natural evolution of my previous work. Currently, I am looking into the business aspects of HA. I am studying technologies that will allow easy development and deployment of applications with given quality-of-service requirements (with emphasis on resiliency).

What is your favorite invention of all time?
It has to be the Internet. It is not really an invention but an evolution (even though I have seen quite a few people who claim to be its inventors), yet it has to be one of the developments that has had the greatest impact on our world in modern times.


Team Members
Research Team
Eliezer Dekel
Gera Goft
Gabriel Kliot
Yoel Krasny
Alex Krits
Dean Lorenz
Yoav Ossia
Roman Vitenberg
Alan Wecker
Collaborators
RMM:
Gidon Gershinski
Nir Naaman
Avraham Harpaz
Boaz Carmeli

Testing:
Eitan Farchi
Shmuel Ur
Yarden Nir-Buchbinder

WebSphere contacts:
Billy Newport
Gabe Montero


Related Links
Discipline: Computer Science
Research Area: Distributed & Fault Tolerant Computing
Research Site: Haifa
 
