Distribution & Consistency Services (DCS): Towards Flexible Levels of Availability
Different applications require different
levels of replication guarantees in order to provide better service
(e.g., high availability, scalability). For some applications
it is sufficient to replicate using unreliable multicast technology.
These applications can continue service (e.g., on
another machine) from some earlier copy of the state, because order,
timing, and synchronization are not important to them. Other applications
might require replication with guaranteed delivery and total order.
To continue service, these components need the latest
computed state. These two kinds of applications represent two
points in the spectrum of replication and synchronization requirements.
Providing a new, customized replication service for each application
is costly. The innovative technology developed for DCS promotes
a modular approach that allows applications with varied replication
requirements to use DCS effectively. Applications can instantiate
and dynamically configure a DCS component to fit their needs.
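To make the two ends of this spectrum concrete, here is a minimal Java sketch, assuming a single-threaded replica and a sequencer that numbers updates 1, 2, 3, and so on. The names `DeliveryGuarantee` and `Replica` are hypothetical illustrations, not the DCS API. Total order is enforced here with a simple sequence-number hold-back queue, which is one common technique among several.

```java
import java.util.HashMap;
import java.util.Map;

// Demo: the same out-of-order arrival, seen by two differently configured replicas.
class GuaranteeDemo {
    public static void main(String[] args) {
        Replica ordered = new Replica(DeliveryGuarantee.RELIABLE_TOTAL_ORDER);
        Replica bestEffort = new Replica(DeliveryGuarantee.UNRELIABLE);
        for (Replica r : new Replica[] {ordered, bestEffort}) {
            r.receive(2, "x", "b"); // update 2 overtakes update 1 on the network
            r.receive(1, "x", "a");
        }
        System.out.println("total order: x=" + ordered.get("x"));   // x=b (latest state)
        System.out.println("unreliable:  x=" + bestEffort.get("x")); // x=a (an older value)
    }
}

/** Hypothetical configuration knob; not the actual DCS API. */
enum DeliveryGuarantee {
    /** Best effort: updates are applied in arrival order and may be lost or reordered. */
    UNRELIABLE,
    /** Guaranteed delivery in one total order, fixed by sequence numbers from a sequencer. */
    RELIABLE_TOTAL_ORDER
}

/** A replica of key/value state that applies updates according to its configured guarantee. */
class Replica {
    private final DeliveryGuarantee guarantee;
    private final Map<String, String> state = new HashMap<>();
    // Updates that arrived before their turn; used only in total-order mode.
    private final Map<Long, String[]> pending = new HashMap<>();
    private long nextSeq = 1;

    Replica(DeliveryGuarantee guarantee) {
        this.guarantee = guarantee;
    }

    /** Receives update number seq, which sets key to value. */
    void receive(long seq, String key, String value) {
        if (guarantee == DeliveryGuarantee.UNRELIABLE) {
            state.put(key, value); // no ordering: whatever arrives last wins
            return;
        }
        if (seq < nextSeq) {
            return; // duplicate of an update that was already applied
        }
        pending.put(seq, new String[] {key, value});
        // Apply every update whose turn has come; later ones stay buffered.
        while (pending.containsKey(nextSeq)) {
            String[] update = pending.remove(nextSeq);
            state.put(update[0], update[1]);
            nextSeq++;
        }
    }

    String get(String key) {
        return state.get(key);
    }
}
```

The unreliable replica ends up with an older value, which is acceptable for the first class of applications; the total-order replica holds update 2 back until update 1 arrives and so ends with the latest computed state.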

Figure 1 DCS and its interfaces
There are three parts to a DCS
component (see Figure 1): the application interface, the distribution
protocol stack (DPS), and the system interface. The application interfaces
provide abstractions that allow applications to exploit the raw services
provided by the DPS. DCS supports two application interfaces: the membership
and synchrony service (MSS) and the data replication service (DRS).
These interfaces can be used by a resource manager component and
an object replication abstraction component, respectively. The
DPS is based on our versatile replication infrastructure (VRI)
and is configurable with respect to its delivery guarantees. DCS
components should be able to make use of existing client investments
in software and hardware. These investments can range from
a Java Message Service (JMS) component used as a transport layer,
through the Group Communication Services (GCS) capabilities of cluster platforms,
to special fast interconnects like InfiniBand. The flexible design
of the DPS and the system interface allows DCS to make the best use
of these investments.
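The separation between a layered protocol stack and a pluggable transport can be sketched as follows. This is a minimal illustration under stated assumptions, not the VRI or DPS design: `Transport`, `Layer`, `TagLayer`, and `ProtocolStack` are hypothetical names, and an in-memory loopback stands in for a real transport such as JMS or an interconnect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Demo: a two-layer stack running over an in-memory transport.
class StackDemo {
    public static void main(String[] args) {
        ProtocolStack stack = new ProtocolStack(
                List.of(new TagLayer((byte) 1), new TagLayer((byte) 2)),
                new LoopbackTransport(),
                msg -> System.out.println("delivered: " + new String(msg)));
        stack.send("hello".getBytes());
    }
}

/** Hypothetical system interface: all the stack needs from the underlying transport. */
interface Transport {
    void send(byte[] frame);
    void onReceive(Consumer<byte[]> handler);
}

/** In-memory loopback standing in for a real transport (JMS, UDP multicast, ...). */
class LoopbackTransport implements Transport {
    private final List<Consumer<byte[]>> handlers = new ArrayList<>();
    public void send(byte[] frame) {
        for (Consumer<byte[]> h : handlers) h.accept(frame);
    }
    public void onReceive(Consumer<byte[]> handler) {
        handlers.add(handler);
    }
}

/** One protocol layer: transforms frames going down and coming up. */
interface Layer {
    byte[] down(byte[] frame);
    byte[] up(byte[] frame); // returns null to drop the frame
}

/** Toy layer that prepends a one-byte header and drops incoming frames lacking it. */
class TagLayer implements Layer {
    private final byte tag;
    TagLayer(byte tag) { this.tag = tag; }
    public byte[] down(byte[] frame) {
        byte[] out = new byte[frame.length + 1];
        out[0] = tag;
        System.arraycopy(frame, 0, out, 1, frame.length);
        return out;
    }
    public byte[] up(byte[] frame) {
        if (frame.length == 0 || frame[0] != tag) return null;
        return Arrays.copyOfRange(frame, 1, frame.length);
    }
}

/** Layers are listed top to bottom; sends walk down the list, receipts walk back up. */
class ProtocolStack {
    private final List<Layer> layers;
    private final Transport transport;

    ProtocolStack(List<Layer> layers, Transport transport, Consumer<byte[]> application) {
        List<Layer> stack = List.copyOf(layers);
        this.layers = stack;
        this.transport = transport;
        transport.onReceive(frame -> {
            byte[] f = frame;
            for (int i = stack.size() - 1; i >= 0 && f != null; i--) {
                f = stack.get(i).up(f);
            }
            if (f != null) application.accept(f);
        });
    }

    void send(byte[] message) {
        byte[] f = message;
        for (Layer layer : layers) f = layer.down(f);
        transport.send(f);
    }
}
```

Because the stack only talks to the `Transport` interface, swapping the loopback for another implementation leaves the layers untouched, which is the kind of reuse of existing transports the paragraph above describes.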
The failure of DPS products to gain wide acceptance
is attributed mainly to the complexity of their interfaces and
the associated learning curve. There is a clear need for
a higher-level abstraction that makes DPS easier to use.
There are two main versions of DCS: Core DCS
and Data DCS. There is one Core DCS per process, and it provides
membership services among peer processes. These processes together
form a Core Group. A process may be a member of one or more named
Core Groups. Applications running on these processes can be members
of application groups. Application groups are subsets of a particular
named Core Group. A Data DCS component can be associated with
each member of an application group, so there can be several Data
DCS instances per process. Data DCS instances rely on membership
information provided by the Core Group they are associated with.
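The relationship between a Core Group and its application groups can be modeled roughly as below. This is a minimal sketch, assuming that an application group may only admit processes in its Core Group's current view and that it shrinks when the Core Group view shrinks. `CoreGroup`, `ApplicationGroup`, and the group and process names are hypothetical; Data DCS instances are not modeled, but in this picture they would read `ApplicationGroup.view()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Demo: an application group follows the view of the Core Group it belongs to.
class MembershipDemo {
    public static void main(String[] args) {
        CoreGroup core = new CoreGroup("coreGroupA");
        core.join("p1"); core.join("p2"); core.join("p3");
        ApplicationGroup sessions = core.createApplicationGroup("sessions");
        sessions.join("p1"); sessions.join("p2");
        core.leave("p2"); // e.g. the process p2 failed
        System.out.println(core.name + " " + core.view());         // coreGroupA [p1, p3]
        System.out.println(sessions.name + " " + sessions.view()); // sessions [p1]
    }
}

/** Hypothetical membership view of one named Core Group of peer processes. */
class CoreGroup {
    final String name;
    private final Set<String> members = new TreeSet<>();
    private final List<ApplicationGroup> appGroups = new ArrayList<>();

    CoreGroup(String name) { this.name = name; }

    void join(String process) { members.add(process); }

    /** Removes a process and pushes the new view to every application group. */
    void leave(String process) {
        members.remove(process);
        for (ApplicationGroup g : appGroups) g.onCoreViewChange(view());
    }

    Set<String> view() { return Collections.unmodifiableSet(members); }

    ApplicationGroup createApplicationGroup(String groupName) {
        ApplicationGroup g = new ApplicationGroup(this, groupName);
        appGroups.add(g);
        return g;
    }
}

/** Hypothetical application group: always a subset of its Core Group's view. */
class ApplicationGroup {
    final String name;
    private final CoreGroup core;
    private final Set<String> members = new TreeSet<>();

    ApplicationGroup(CoreGroup core, String name) {
        this.core = core;
        this.name = name;
    }

    void join(String process) {
        if (!core.view().contains(process)) {
            throw new IllegalArgumentException(process + " is not in core group " + core.name);
        }
        members.add(process);
    }

    void onCoreViewChange(Set<String> coreView) { members.retainAll(coreView); }

    Set<String> view() { return Collections.unmodifiableSet(members); }
}
```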

Figure 2 The Big Picture
WebSphere Application Server release 6.0 and WebSphere Extended Deployment
(XD) 5.1 introduce DCS as part of their high availability architecture.
In this architecture, the resource manager is the high availability manager, and DRS
provides the object replication abstraction (see Figure 2).
DCS provides a mechanism for communicating information (distribution)
among members with a given quality of service. Failure detection
mechanisms that support and enable the guaranteed quality of service
are an inherent part of DCS and its services. DCS supports the state replication
requirements of WebSphere components (e.g., HTTP sessions
and stateful session beans) as well as the distribution and synchronization
of WebSphere artifacts for performance, scalability, and availability.
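The text does not say how DCS detects failures. One common technique in group communication systems is a heartbeat timeout, sketched below under that assumption: each member periodically announces itself, and a member not heard from within a timeout becomes a suspect. `HeartbeatFailureDetector` is a hypothetical name, and timestamps are passed in explicitly instead of being read from a clock so the logic is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Demo with explicit timestamps (milliseconds) instead of a real clock.
class DetectorDemo {
    public static void main(String[] args) {
        HeartbeatFailureDetector fd = new HeartbeatFailureDetector(1000);
        fd.heartbeat("p1", 0);
        fd.heartbeat("p2", 500);
        System.out.println(fd.suspects(1200)); // [p1]
        fd.heartbeat("p1", 1300);
        System.out.println(fd.suspects(1600)); // [p2]
    }
}

/** Hypothetical timeout-based failure detector (not the DCS implementation). */
class HeartbeatFailureDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();

    HeartbeatFailureDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a heartbeat from member arrived at time nowMillis. */
    void heartbeat(String member, long nowMillis) {
        lastHeard.put(member, nowMillis);
    }

    /** Members whose last heartbeat is older than the timeout at time nowMillis. */
    Set<String> suspects(long nowMillis) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

A timeout detector can wrongly suspect a member that is merely slow, so the timeout trades detection speed against false suspicions; a membership service would typically feed confirmed suspects into a view change such as the Core Group `leave` operation.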
The Haifa Research Lab has a long
history of work in the area of high availability. We worked
with the clustering support for iSeries, maintained the Cornell
Ensemble code, and are very active in high availability
research. We also develop high availability storage technology
at our labs and are working on issues of end-to-end high availability.
Our goal is to make high availability an engineering discipline
that can be applied wherever it is needed. This is the first time
that such a technology has been used in a commercial application server;
there is nothing comparable in other products. It makes WebSphere
resilient and competitive with OS-level and hardware-level cluster
setups.
What is the most exciting potential future use for the work you're doing?
This is the culmination
of many years of work with WebSphere. DCS and the high availability
architecture of WebSphere 6.0 make it feasible and affordable
to safeguard Internet business applications against outages
that can cost companies as much as $110,000 per minute in
lost revenue and productivity.
What is the most interesting part of your research?
DCS is a modular group communication services component
that was developed in Java with a focus on performance and
scalability. The main challenge was in making a product
that is reliable and at the same time has very low latency
and high throughput.
What inspired you to go into this field?
I have been involved in parallel and distributed
processing research for many years. Work on distributed
platforms was a natural evolution of my previous work. Currently,
I am looking into the business aspects of HA. I am studying
technologies that will allow easy development and deployment
of applications with given quality-of-service requirements
(with emphasis on resiliency).
What is your favorite invention of all time?
It has to be the Internet. Strictly speaking it is not an invention
but an evolution (even though I have seen quite
a few people who claim to be its inventors), yet it is surely
one of the developments that have had the greatest impact on
our world in modern times.
Research Team
Eliezer Dekel
Gera Goft
Gabriel Kliot
Yoel Krasny
Alex Krits
Dean Lorenz
Yoav Ossia
Roman Vitenberg
Alan Wecker

Collaborators
RMM:
Gidon Gershinski
Nir Naaman
Avraham Harpaz
Boaz Carmeli
Testing:
Eitan Farchi
Shmuel Ur
Yarden Nir-Buchbinder
WebSphere contacts:
Billy Newport
Gabe Montero