
Cloud Storage

Cloud Content Store (CCS)

This project investigates the architectural challenges of designing a data infrastructure for highly scalable and economical environments. Specifically, our work focuses on the Cloud Content Store (CCS) storage service. CCS is a secure cloud storage service for unstructured data and content depots that is continuously available everywhere, from any device and application. The data infrastructure should support the following requirements:

Replication:  Several replicas of each item are stored in data centers across the cloud.  This is important for:
  • Increased availability
  • Backup and fault tolerance
  • Disaster recovery
  • Consistency controlled by the application
Location awareness: The client obtains the item of interest from the closest data center. This enables:
  • Increased availability
  • Minimal access time
  • Load balancing
Resource utilization: To achieve scalability, cost efficiency, and load balancing, we need to develop efficient algorithms that manage and utilize a large set of heterogeneous resources.
Figure 1: Location awareness, in which the client accesses the replica closest to its location.
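
As a minimal sketch of the location-awareness requirement, the snippet below picks the nearest data center that holds a replica and is currently reachable. The site names, coordinates, and the use of great-circle distance as the measure of "closest" are assumptions made for illustration; the page does not say how CCS measures proximity, and a real deployment might use network latency instead.

```python
import math

# Hypothetical data centers with (latitude, longitude) in degrees.
DATA_CENTERS = {
    "haifa": (32.79, 34.99),
    "dublin": (53.35, -6.26),
    "austin": (30.27, -97.74),
}


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_replica(client_location, replica_sites, available=None):
    """Return the nearest data center that holds a replica and is reachable.

    `available` is an optional set of reachable sites; None means all are up.
    """
    candidates = [dc for dc in replica_sites
                  if available is None or dc in available]
    if not candidates:
        raise LookupError("no reachable replica")
    return min(candidates,
               key=lambda dc: great_circle_km(client_location, DATA_CENTERS[dc]))


london = (51.51, -0.13)
print(closest_replica(london, ["haifa", "dublin"]))
# If the nearest site is down, the client falls back to the next closest one.
print(closest_replica(london, ["haifa", "dublin"], available={"haifa"}))
```

Falling back to the next closest replica when a site is unreachable is what lets the same mechanism serve both minimal access time and increased availability.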

 

CCS features:

Data model: We manipulate objects, which are write-entirely, read-many BLOBs with IDs that are unique within their namespace. A namespace is an abstract container that provides context for its objects and has the following properties:

  • A namespace can grow/shrink on-demand with no arbitrary limits
  • Each namespace is replicated to d data centers, where the replication factor d can be defined independently for each namespace
  • Objects in the same namespace share the same placement
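
The data model above can be sketched roughly as follows. The class name, method names, and the trivial placement policy (take the first d sites in sorted order) are hypothetical and only illustrate the stated properties: a per-namespace replication factor d, whole-object writes, and one placement shared by every object in the namespace.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical in-memory model of a CCS namespace."""
    name: str
    replication_factor: int                      # d, chosen per namespace
    placement: tuple = ()                        # data centers holding replicas
    objects: dict = field(default_factory=dict)  # object ID -> whole BLOB

    def place(self, data_centers):
        """Choose d data centers. Placeholder policy: first d in sorted order."""
        if len(data_centers) < self.replication_factor:
            raise ValueError("not enough data centers for replication factor")
        self.placement = tuple(sorted(data_centers)[: self.replication_factor])

    def put(self, object_id, blob):
        """Write an object in its entirety; partial updates are not modeled."""
        self.objects[object_id] = bytes(blob)

    def get(self, object_id):
        return self.objects[object_id]

    def locations(self, object_id):
        """Every object inherits the placement of its namespace."""
        if object_id not in self.objects:
            raise KeyError(object_id)
        return self.placement


photos = Namespace("photos", replication_factor=2)
photos.place(["dc-east", "dc-west", "dc-north"])
photos.put("img-001", b"\x89PNG...")
photos.put("img-002", b"\xff\xd8...")
print(photos.locations("img-001") == photos.locations("img-002"))
```

Because placement is a property of the namespace rather than of each object, moving or re-replicating a namespace never requires per-object placement metadata.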

Below are the main features supported by CCS.

Automated replica management: We support automated and transparent replica placement and copying throughout the namespace lifecycle. All replica operations are transparent to users and do not disrupt regular I/O traffic.

Multi-master optimistic replication:
All requests can be served from any replica at any time. This model has several advantages:

  1. Full load balancing for updates as well as reads.
  2. All sites are autonomous, so each can continue to serve requests even when disconnected.
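
To make the multi-master idea concrete, here is a minimal sketch in which every replica accepts writes locally and replicas reconcile afterwards. The conflict rule used here (last writer wins, ordered by a logical clock and then by site name) is an assumption chosen for brevity; the page does not describe how CCS resolves conflicting updates, only that consistency is controlled by the application. The `Replica` class and its methods are hypothetical.

```python
class Replica:
    """Hypothetical replica that serves reads and writes without coordination."""

    def __init__(self, site):
        self.site = site
        self.clock = 0      # Lamport-style logical clock
        self.store = {}     # object ID -> ((clock, site), blob)

    def put(self, object_id, blob):
        """Accept a whole-object write locally, even when disconnected."""
        self.clock += 1
        self.store[object_id] = ((self.clock, self.site), blob)

    def get(self, object_id):
        return self.store[object_id][1]

    def sync_from(self, other):
        """Pull versions from `other` that are newer than the local ones."""
        for object_id, (stamp, blob) in other.store.items():
            mine = self.store.get(object_id)
            if mine is None or stamp > mine[0]:
                self.store[object_id] = (stamp, blob)
        self.clock = max(self.clock, other.clock)


def reconcile(a, b):
    """One round of pairwise exchange; afterwards both replicas agree."""
    a.sync_from(b)
    b.sync_from(a)


east, west = Replica("dc-east"), Replica("dc-west")
east.put("report", b"draft")        # written while the sites are disconnected
west.put("logo", b"v1")
west.put("report", b"final")        # concurrent update to the same object
reconcile(east, west)
print(east.get("report"), west.get("report"))
```

Both sites keep serving requests while partitioned, and they converge once they can exchange state, which is the essence of optimistic replication.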

Parallel Disaster Recovery: When a data center fails, we need to restore the degree of replication of the namespaces it contained by choosing a new placement for each lost replica and copying its objects to the new location. To expedite this process, each recovered namespace can be sourced from a different data center (DC) and sent to a different destination.
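
One way to picture parallel recovery is as a planning step that assigns each affected namespace a (source, destination) pair while spreading transfers across the surviving DCs. The sketch below uses a simple greedy least-loaded heuristic; the function name, the input shapes, and the heuristic itself are assumptions for illustration, not the algorithm CCS uses.

```python
from collections import Counter


def plan_recovery(placements, failed_dc, all_dcs):
    """Return a list of (namespace, source_dc, destination_dc) transfers.

    placements: dict mapping namespace name -> collection of DCs holding it.
    Each lost replica is copied from a surviving replica to a DC that does
    not yet hold the namespace, preferring the DCs with the fewest transfers.
    """
    healthy = [dc for dc in all_dcs if dc != failed_dc]
    load = Counter()    # transfers assigned so far, per DC (in or out)
    plan = []
    for ns in sorted(placements):
        sites = set(placements[ns])
        if failed_dc not in sites:
            continue                                   # namespace unaffected
        survivors = sorted(sites - {failed_dc})
        targets = sorted(dc for dc in healthy if dc not in sites)
        if not survivors or not targets:
            raise RuntimeError(f"cannot restore replication for {ns}")
        source = min(survivors, key=lambda dc: load[dc])
        load[source] += 1
        destination = min(targets, key=lambda dc: load[dc])
        load[destination] += 1
        plan.append((ns, source, destination))
    return plan


placements = {
    "ns1": {"A", "B"},
    "ns2": {"A", "C"},
    "ns3": {"B", "C"},
    "ns4": {"A", "D"},
}
for transfer in plan_recovery(placements, "A", ["A", "B", "C", "D", "E"]):
    print(transfer)
```

In this example the three affected namespaces are copied over three distinct source and destination pairs, so the transfers can run concurrently instead of queuing behind a single DC.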

Secure access:
We are developing a new authorization model that will ensure secure access in the cloud environment.

 

Figure 2: Multi-master optimistic replication, in which all replicas are equal and requests can be served from any data center in the cloud.

 
