Distributed and Fault Tolerant Computing
Computer Science Brochure
Distributed and fault-tolerant computing at IBM Research combines cutting-edge hardware with novel software architectures for supporting large-scale distributed and highly fault-tolerant applications. Our distributed and fault-tolerant technologies are applicable in a diverse set of application areas, including financial markets, security services, clustered hardware, global computing, and wireless networks. We describe a small subset of current projects below.

Advanced Message Brokering

Messaging systems have long been a staple of large-scale distributed applications. An emerging trend is the use of loosely coupled messaging systems such as publish/subscribe. These systems allow for high performance and fault tolerance without sacrificing scalability. The Gryphon project (http://www.research.ibm.com/gryphon) is building message brokering middleware for advanced publish/subscribe applications. This middleware supports the efficient transfer of streams of events from information providers to information consumers by merging the best features of distributed communications technology and database technology.

Java on Clusters

Clustered systems promise high scalability at a low price. Applications designed for these systems, however, must often be customized to exploit the underlying scalability. This approach complicates development and limits portability. An alternative to customizing an application for a cluster is to leverage a virtual machine. The Cluster JVM (cJVM) project (http://www.haifa.il.ibm.com/projects/systech/cjvm.html) is building a Java Virtual Machine (JVM) that provides a single system image of a traditional JVM while executing on a cluster. The aim of cJVM is to obtain improved scalability by distributing the threads and objects of a multithreaded Java server application among the nodes of the cluster. In this way, cJVM virtualizes the cluster: it supports any pure Java application without requiring any modifications to application code.
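As a rough illustration of what "pure Java application" means here, the following minimal sketch uses only standard Java threads and a shared object. The class name SharedCounterDemo and its run method are hypothetical and are not taken from cJVM; the code calls no cJVM-specific API. On an ordinary JVM all of these threads run in one process. The description above suggests that cJVM could place such threads and objects on different cluster nodes without any change to this source.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class for illustration only; not part of cJVM.
public class SharedCounterDemo {

    // Starts `threads` workers that each increment one shared counter
    // `perThread` times, waits for all of them, and returns the final count.
    public static long run(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();   // shared object used by every worker
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();   // atomic update, safe across threads
                }
            });
            workers[i].start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();                   // wait for the worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));        // prints 4000
    }
}
```

The program relies only on the standard thread and memory semantics of Java, which is the property the single system image is meant to preserve.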
This virtualization is supported by novel object, thread, and memory models that enable an application executing on cJVM to be oblivious to the placement of objects and threads in the cluster. Moreover, being a run-time system, cJVM achieves scalability via a set of speculative optimizations that leverage knowledge of Java semantics, usage patterns, and on-the-fly code analysis.

Middleware for Global Computing

Global computing views the network as a data store as well as a communication system. The TSpaces project aims to provide the underlying infrastructure necessary to support such globally connected information resources. TSpaces is a network communication buffer with database capabilities. It is a central component in the implementation of e-Space, a global community of users, resources, devices, and services. Specifically, TSpaces enables communication between applications and devices in a heterogeneous network by providing group communication services, database services, URL-based file transfer services, and rule-based event notification services. The goal of TSpaces is to make the power and flexibility of large-scale computing infrastructures available to isolated users running devices as simple as a Palm Pilot. (see http://www.almaden.ibm.com/cs/TSpaces)

MultiEnterprise Web Services

In addition to cost-efficient scalability, clustered hardware allows e-business providers to use aggregate resources for servicing a diverse set of customers. The Océano project is an effort to establish a test-bed for prototyping advanced Web-hosting and e-business architectures using mostly commodity hardware. The Océano team is developing system management tools and application architectures that enable high availability and dynamic scalability for delivering networked e-business applications and services.
In particular, team members are designing and prototyping a scalable, self-managing hosting farm, with support for handling peak workloads through the dynamic reassignment of servers. Océano enforces the service level agreements provided to e-businesses over a shared infrastructure, without compromising their performance and security requirements.

Pervasive Device Coordination

The DEAPspace project exploits short-range (~3 meter) wireless technology to support transient networks of nomadic and pervasive devices. The ad hoc coupling of such devices allows for coordinated applications that exceed the capabilities of any single device. Such applications increase the value of portable computing devices by allowing them to share hardware and software resources. Several devices are being targeted, ranging from laser printers and car stereos to PDAs, wristwatch displays, and many other devices that we encounter in our environment. More importantly, however, DEAPspace places heavy emphasis on extensibility for future devices. While only a few devices may be supported today, the extensibility of DEAPspace is intended to ensure future compatibility without requiring modification of current devices. (see http://deapspace.zurich.ibm.com)
Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.