Project: Caching Managed Storage Services Offload


Storage as Utility: Vision

Outsourcing the hosting and management of storage is becoming an increasingly attractive proposition for many enterprises, for several reasons. First, large up-front storage system acquisition costs are replaced by a pay-as-you-go, pay-per-use, utility-like model. Second, data users can enjoy better storage quality through expert-managed backup, configuration, and maintenance. Third, users can expect the provider to respond quickly to unexpected growth in their capacity or performance requirements. Fourth, risky technology investments are avoided (e.g., the enterprise does not have to worry about whether choosing a Fibre Channel or a Gigabit Ethernet SAN was the right decision). Last but not least, data users can control their overall storage management costs.

IMPLEMENTATION:
Distributed cooperating storage locations

Storage service providers offer such hosting services from centralized storage locations (shared data centers). With a large number of customers, a storage hosting service will need many geographically distributed storage locations to host customers’ data. The collection of these Storage Locations (SLs) can be viewed as a single virtual storage repository, a global storage utility, accessible from any point on the global Internet.
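To make the "single virtual repository" idea concrete, here is a minimal sketch, assuming a hypothetical global directory that records which SLs hold a copy of each volume and resolves a request to the copy with the lowest latency measured from the requesting server. The class `VolumeDirectory`, its methods, and the SL names are illustrative only; the page does not describe how the utility actually locates data.

```python
class VolumeDirectory:
    """Hypothetical global catalog that lets many storage locations (SLs)
    appear to clients as one virtual storage repository."""

    def __init__(self) -> None:
        # volume id -> names of the SLs that hold a copy of that volume
        self.replicas: dict[str, list[str]] = {}

    def add_replica(self, volume: str, sl: str) -> None:
        self.replicas.setdefault(volume, []).append(sl)

    def resolve(self, volume: str, latency_ms: dict[str, float]) -> str:
        """Return the SL holding `volume` with the lowest latency as measured
        by the requesting server; SLs without a measurement are skipped."""
        candidates = [sl for sl in self.replicas.get(volume, []) if sl in latency_ms]
        if not candidates:
            raise KeyError(f"no reachable replica for volume {volume!r}")
        return min(candidates, key=lambda sl: latency_ms[sl])


# Usage: a volume stored in New York and Frankfurt, requested from Europe.
directory = VolumeDirectory()
directory.add_replica("payroll", "sl-nyc")
directory.add_replica("payroll", "sl-fra")
print(directory.resolve("payroll", {"sl-nyc": 85.0, "sl-fra": 12.0}))  # sl-fra
```

The client only names the volume; which physical SL serves it is a decision the utility can change without the client noticing.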

In such an environment, customer servers will be geographically distributed and possibly far from the storage locations. Customer data will include both archival and hot data sets. Archival volumes may require a lot of capacity but little throughput or bandwidth, since they are rarely accessed; hot data sets, on the other hand, may be small but demand high access throughput. Similarly, some data will be mostly read and rarely updated, while other data will be updated more frequently. Customer data also evolves over time, with hot data sets becoming colder as they age. Furthermore, some volumes will be accessed by many sites, possibly spread across the globe, as in a multinational corporation; other volumes will be accessed privately by a single site. Placing data volumes according to their access characteristics, whether they are shared by multiple (geographically distributed) servers, and their performance requirements is therefore a challenging problem.
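The placement considerations above can be sketched as a simple rule-based classifier. This is a minimal illustration under assumed inputs: the `VolumeProfile` fields, the three `Placement` outcomes, and the thresholds `cold_ops_per_gb` and `read_mostly_ratio` are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    ARCHIVE = "archive"      # large and cold: cheap high-capacity SL, distance matters little
    REPLICATE = "replicate"  # read-mostly and shared: keep copies near each reading site
    HOME = "home"            # update-heavy or private: one copy near its primary users


@dataclass
class VolumeProfile:
    size_gb: float
    reads_per_day: float
    writes_per_day: float
    accessing_sites: int  # number of distinct client sites using the volume


def choose_placement(v: VolumeProfile,
                     cold_ops_per_gb: float = 0.1,      # hypothetical threshold
                     read_mostly_ratio: float = 10.0    # hypothetical threshold
                     ) -> Placement:
    ops = v.reads_per_day + v.writes_per_day
    # Rarely touched relative to its size: treat the volume as archival.
    if ops / max(v.size_gb, 1e-9) < cold_ops_per_gb:
        return Placement.ARCHIVE
    # Read-mostly data shared by several sites benefits from nearby replicas.
    if v.accessing_sites > 1 and v.reads_per_day >= read_mostly_ratio * max(v.writes_per_day, 1.0):
        return Placement.REPLICATE
    # Otherwise keep one authoritative copy close to its users, avoiding the
    # cost of keeping frequently updated replicas consistent across sites.
    return Placement.HOME


# Usage with made-up profiles.
print(choose_placement(VolumeProfile(2000, 50, 1, 1)))        # Placement.ARCHIVE
print(choose_placement(VolumeProfile(20, 50000, 100, 12)))    # Placement.REPLICATE
print(choose_placement(VolumeProfile(5, 8000, 6000, 3)))      # Placement.HOME
```

Because data cools over time, re-running such a classifier periodically would let a volume drift toward the archival tier as its access rate drops.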

Such a distributed storage utility can therefore benefit from a Hierarchical Storage Management (HSM) system spanning the sites, which manages data placement based on the characteristics of each data volume. More generally, multiplexing storage resources distributed across the network among a large number of customers promises to reduce the total hardware cost of delivering the performance and availability specified in customer service-level agreements (SLAs), compared with a scheme in which each customer is statically assigned to a single node (storage location). A smart storage utility network that adapts the allocation of customer volumes to nodes and migrates data to balance capacity and throughput usage is likely to minimize total cost while improving performance and availability.
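The cost argument for multiplexing can be checked with a small calculation. If each customer is statically assigned its own node, every node must be sized for that customer's peak demand, so the total provisioned throughput is the sum of the individual peaks. A shared pool only has to cover the peak of the combined demand, and since max_t Σ_i d_i(t) ≤ Σ_i max_t d_i(t), pooling never needs more capacity and needs strictly less whenever customers peak at different times. The sketch assumes equal-length demand samples taken at the same instants; the customer names and MB/s figures are made up.

```python
def provisioning_comparison(demands: dict[str, list[float]]) -> tuple[float, float]:
    """Return (static, pooled) throughput capacity needed for the given
    per-customer demand samples, all taken at the same time instants.

    static: each customer is alone on a node sized for its own peak,
            so the total is the sum of the individual peaks.
    pooled: all customers share one multiplexed pool sized for the peak
            of their combined demand at any single instant.
    """
    series = list(demands.values())
    static = sum(max(s) for s in series)
    pooled = max(sum(instant) for instant in zip(*series))
    return static, pooled


# Usage: hypothetical demand (MB/s) of three customers at four instants.
demands = {
    "customer_a": [100, 20, 30, 20],
    "customer_b": [20, 90, 30, 20],
    "customer_c": [10, 20, 80, 30],
}
print(provisioning_comparison(demands))  # (270, 140)
```

A real provider would still add headroom to honor SLAs, so this only shows the direction of the saving, not its size.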

PROJECT FOCUS

At IBM Research, we are contributing to the architecture and implementation of such a global storage utility. The challenges include addressing the bandwidth and latency problems of remote data access; devising suitable replication, allocation, and load-balancing algorithms; and investigating pricing models and management frameworks for such a storage utility.
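As one illustration of the allocation and load-balancing problem, the sketch below uses a simple greedy heuristic, not the project's algorithm: volumes are placed in decreasing order of throughput demand, each on the least-loaded site that still has room in both capacity and throughput. The `Site` budget fields, the load metric (the more heavily used of the two resources), and the sample numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    capacity_gb: float
    throughput_mbps: float  # hypothetical deliverable throughput budget
    used_gb: float = 0.0
    used_mbps: float = 0.0
    volumes: list[str] = field(default_factory=list)

    def fits(self, size_gb: float, mbps: float) -> bool:
        return (self.used_gb + size_gb <= self.capacity_gb
                and self.used_mbps + mbps <= self.throughput_mbps)

    def load(self) -> float:
        # The more heavily used of the two resources, as a fraction of its budget.
        return max(self.used_gb / self.capacity_gb, self.used_mbps / self.throughput_mbps)


def greedy_allocate(volumes: dict[str, tuple[float, float]], sites: list[Site]) -> dict[str, str]:
    """Assign each volume, given as (size_gb, mbps), to the least-loaded site
    that can hold it, placing the most throughput-hungry volumes first."""
    assignment: dict[str, str] = {}
    for vol, (size, mbps) in sorted(volumes.items(), key=lambda kv: kv[1][1], reverse=True):
        candidates = [s for s in sites if s.fits(size, mbps)]
        if not candidates:
            raise RuntimeError(f"no site can host {vol}")
        best = min(candidates, key=Site.load)
        best.used_gb += size
        best.used_mbps += mbps
        best.volumes.append(vol)
        assignment[vol] = best.name
    return assignment


# Usage: two sites, two hot volumes, one warm volume, one archival volume.
sites = [Site("sl-a", 1000, 400), Site("sl-b", 1000, 400)]
volumes = {"hot1": (10, 300), "hot2": (10, 250), "arch": (900, 5), "warm": (50, 100)}
assignment = greedy_allocate(volumes, sites)
print(assignment)  # {'hot1': 'sl-a', 'hot2': 'sl-b', 'warm': 'sl-b', 'arch': 'sl-a'}
```

The two hot volumes end up on different sites and the large archival volume fills the capacity left where throughput is spare. Migration as demand shifts could be modeled by re-running the allocation and moving only volumes whose site changes.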
