Storage Research
Our activities in the area of storage research are focused on new and advanced features for next-generation storage systems and storage services. Our mission is to work with and enhance IBM's storage products and offerings. We emphasize collaboration with IBM storage technologies conceived and developed in Israel, such as IBM XIV storage systems, IBM ProtecTIER enterprise data deduplication solutions, and IBM Real-time Compression. We strive to work with IBM's Growth Markets unit to deliver storage building blocks that are well suited for hyper-growing markets. Our research areas include cloud storage, compression and deduplication, I/O virtualization, digital preservation, power management for storage systems, and security for storage. We collaborate with European academic and industrial organizations on EU projects, work with standards organizations on storage industry standards, and contribute to open source projects.
As part of IBM's SmartCloud vision, two of our key research focus areas are cloud storage and storage support for compute clouds.
Cloud storage is an emerging market that entails providing storage as an online web-based service, on a per-use basis, via public or private clouds. We are particularly interested in developing enterprise-worthy cloud storage technologies. In our Cloud Storage project, we architect and build the object storage cloud infrastructure needed to deliver highly scalable and cost-competitive storage services for the enterprise that are continuously available everywhere, for any device and application.
- We built Web Storage Service (WSS), which implements SNIA's CDMI industry standard for Cloud Storage. In June 2012, IBM declared in a Statement of Direction:
  "IBM intends to deliver Cloud features ... [including] Web Storage Services, a standards-based object store and API that implements the Cloud Data Management Interface (CDMI) standard from Storage Networking Industry Association (SNIA) to support the implementation of storage cloud services."
- Our research prototype of a Cloud Object Store is a secure, scalable, and geographically-distributed storage service for web applications that primarily targets objects that are write-once, read-many (such as unstructured data of images, movies, or email attachments). It provides rich metadata services for the stored objects; employs storage optimizations such as object-level deduplication, data replication, resiliency and retention; and studies topics such as privacy and data placement in cloud environments.
- We lead a large project called VISION Cloud via the European FP7 program. VISION Cloud introduces an infrastructure for reliable and effective delivery of data-intensive storage services, facilitating the convergence of ICT, media, and telecommunications. IBM builds the underlying optimized object storage for VISION Cloud. The VISION Cloud infrastructure is validated by use cases from Telco, media, healthcare, and enterprise.
- We enhance existing cloud storage services to provide specialized data management services for applications such as data archiving and data preservation.
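The object-level deduplication mentioned for the Cloud Object Store can be illustrated with a minimal sketch. This is not the prototype's implementation; the class `DedupObjectStore` and its methods are hypothetical names, and the sketch assumes write-once objects held in memory and identified by a SHA-256 digest of their content.

```python
import hashlib


class DedupObjectStore:
    """Toy in-memory store: identical object contents are kept only once.

    Hypothetical illustration; assumes write-once objects, so no
    garbage collection of unreferenced content is attempted.
    """

    def __init__(self):
        self._blobs = {}  # content digest -> object bytes
        self._names = {}  # object name -> content digest

    def put(self, name: str, data: bytes) -> str:
        """Store an object and return the digest that identifies its content."""
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        """Return the content stored under the given object name."""
        return self._blobs[self._names[name]]

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())
```

Storing the same attachment under two names consumes its space once, because both names map to the same digest.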
In April 2012, IBM joined the OpenStack open source foundation as a founding member. Our team was the first to make a major IBM contribution to OpenStack, a volume driver for the IBM Storwize and SVC storage systems, contributed to OpenStack's Nova and Cinder. This driver enables clients with IBM storage products to fully participate in OpenStack deployments, and serves as a platform for future functions being developed by IBM Research.
With the explosion of data stored by organizations, data reduction techniques such as deduplication and compression are becoming ever more popular. We are therefore focusing research efforts in this area to assist both IBM customers and IBM storage products in data-reduction optimizations. We work on accurate methods to evaluate how much capacity these data-reduction techniques actually save. We developed the Comprestimator Utility, a rigorous yet extremely efficient tool to accurately estimate the expected compression rate for storage block devices. We also conducted a series of in-depth studies on the security aspects of full-file deduplication in the cloud.
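The general idea of estimating a compression rate without compressing everything can be sketched as follows. This is not the Comprestimator algorithm, whose details are not described here; it is a simple illustration that assumes uniform random sampling of fixed-size chunks and uses zlib as a stand-in compressor. The function name and its parameters are hypothetical.

```python
import random
import zlib


def estimate_compression_ratio(data: bytes, chunk_size: int = 4096,
                               samples: int = 64, seed: int = 0) -> float:
    """Estimate compressed size / original size from a random sample of chunks.

    Illustrative only: each sampled chunk is compressed independently with
    zlib, and the ratio is computed over the sampled chunks alone.
    """
    n_chunks = max(1, (len(data) + chunk_size - 1) // chunk_size)
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    picked = rng.sample(range(n_chunks), min(samples, n_chunks))

    original = 0
    compressed = 0
    for idx in picked:
        chunk = data[idx * chunk_size:(idx + 1) * chunk_size]
        original += len(chunk)
        compressed += len(zlib.compress(chunk))

    # An empty input has nothing to save, so report a ratio of 1.0.
    return compressed / original if original else 1.0
```

Only the sampled chunks are read and compressed, so the cost depends on the sample count rather than on the volume size; the price is a statistical error that shrinks as more chunks are sampled.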
We are active in the area of long-term digital preservation (LTDP), which deals with the preservation of large amounts of heterogeneous data for long periods of time. We are leading ENSURE, a European FP7 project that is researching ways to preserve digital data for enterprises over the long term, with a focus on adopting cloud technologies for this purpose. We participate in the APARSEN Network of Excellence and participated in the FP6 CASPAR project. We are developing Preservation DataStores (PDS), which provide OAIS-based preservation-aware storage services. We are leading the Long Term Retention (LTR) Technical Working Group of the SNIA.
Modern storage platforms are expected to provide a rich set of functions, such as deduplication, encryption, file and web serving, database services, and more. Traditional approaches have expanded the storage system either via deep integration or via an external gateway dedicated to implementing the function. We explore a new approach that uses the Linux Kernel-based Virtual Machine (KVM) hypervisor for this purpose. While KVM provides many benefits, its I/O performance is currently an obstacle to its adoption. Our work shows that by properly using a set of techniques, the overhead of the internal (back-end) block I/O can be made negligible, thus showing the feasibility of using a virtual infrastructure to integrate new functions into a storage controller.
Designing power-aware systems that help reduce the power consumption in next-generation datacenters is a major challenge for the IT industry. Since storage consumes about 40% of the total datacenter energy, we pioneered an activity around power management for storage systems, as part of IBM's energy and environment initiative. In this activity, we study the factors that affect power usage in a storage system, model a storage system and its components (taking power usage into account), and design power-efficient storage algorithms. We specialize in workload-dependent storage power consumption, and have developed a comprehensive study of the tradeoffs between key I/O workloads and their power consumption behaviors. We participated in the GAMES EU project for green active management of energy in IT service centers, where we focused on storage energy management.
Activities
Selected Publications
Below is a selection of our team's recent publications.
- Estimation of Deduplication Ratios in Large Data Sets.
  Danny Harnik, Oded Margalit, Dalit Naor, Dmitry Sotnikov, and Gil Vernik.
  MSST 2012 (Research Track), pp. 1-11
- Adding Advanced Storage Controller Functionality via Low-Overhead Virtualization.
  Muli Ben-Yehuda, Michael Factor, Eran Rom, Avishay Traeger, Eran Borovik, and Ben-Ami Yassour.
  FAST '12: 10th USENIX Conference on File and Storage Technologies
- Proofs of Ownership in Remote Storage Systems.
  Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg.
  ACM Conference on Computer and Communications Security 2011: 491-500
- Side Channels in Cloud Services: Deduplication in Cloud Storage.
  Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg.
  IEEE Security & Privacy 8(6): 40-47 (2010)
- A Cloud Environment for Data-intensive Storage Services.
  Elliot K. Kolodner, et al.
  CloudCom 2011: 357-366
- Secure Access Mechanism for Cloud Storage.
  Danny Harnik, Elliot K. Kolodner, Shahar Ronen, Julian Satran, Alexandra Shulman-Peleg, and Sivan Tal.
  Scalable Computing: Practice and Experience 12(3): (2011)
- Leveraging Disk Drive Acoustic Modes for Power Management.
  D. Chen, G. Goldberg, R. Kahn, R. Kat, and K. Meth.
  MSST 2010 (Research Track)
- Storage Modeling for Power Estimation.
  Miriam Allalouf, Yuriy Arbitman, Michael Factor, Ronen I. Kat, Kalman Z. Meth, and Dalit Naor.
  SYSTOR 2009: 3
- Low Power Mode in Cloud Storage Systems.
  Danny Harnik, Dalit Naor, and Itai Segall.
  IPDPS 2009: 1-8
- Preservation DataStores: New storage paradigm for preservation environments.
  Simona Rabinovici-Cohen, Michael Factor, Dalit Naor, Leeat Ramati, Petra Reshef, Shahar Ronen, Julian Satran, and David L. Giaretta.
  IBM Journal of Research and Development 52(4-5): 389-400 (2008)
- Towards SIRF: self-contained information retention format.
  Simona Rabinovici-Cohen, Mary G. Baker, Roger Cummings, Sam Fineberg, and John Marberg.
  SYSTOR 2011: 15
- Authenticity and Provenance in Long Term Digital Preservation: Modeling and Implementation in Preservation Aware Storage.
  Michael Factor, Ealan Henis, Dalit Naor, Simona Rabinovici-Cohen, Petra Reshef, Shahar Ronen, Giovanni Michetti, and Maria Guercio.
  Workshop on the Theory and Practice of Provenance, in FAST 2009
Past Activities
In the past, we applied Continuous Data Protection (CDP) capabilities to block-based storage to provide storage support for virtual machine availability and to synchronize the storage with the system's state. We also developed the Capability-based Command Security (CbCS) technology, which provides a cryptographic mechanism to enforce access control at the storage device. CbCS was standardized in the T10 technical committee of INCITS. Our group has been involved with emerging storage standards, for example, the 100 Year Archive Task Force, XAM, IEEE P1619, and OSD. Our group also developed and standardized the cornerstone technology of object storage, worked on early iSCSI prototyping and definition, and developed search capability in a file system.
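The idea of enforcing access control cryptographically at the storage device can be sketched with a keyed MAC. This is a generic capability scheme, not the CbCS format standardized in T10; the function names, the field layout, and the `|` separator are assumptions made for illustration. It assumes a secret key shared between a security manager, which issues capabilities, and the storage device, which verifies them.

```python
import hashlib
import hmac


def issue_capability(key: bytes, object_id: str, rights: str, expires: int) -> dict:
    """Security manager side: bind object, rights, and expiry with an HMAC tag.

    Hypothetical layout; a real format would encode fields unambiguously
    rather than joining them with '|'.
    """
    msg = f"{object_id}|{rights}|{expires}".encode()
    tag = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "expires": expires, "tag": tag}


def device_allows(key: bytes, cap: dict, object_id: str, op: str, now: int) -> bool:
    """Storage device side: recompute the tag, then check scope and expiry."""
    msg = f"{cap['object_id']}|{cap['rights']}|{cap['expires']}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, cap["tag"])  # constant-time tag comparison
        and cap["object_id"] == object_id          # capability names this object
        and op in cap["rights"]                    # e.g. "r" within "rw"
        and now < cap["expires"]                   # capability not yet expired
    )
```

Because the device recomputes the tag from the fields it receives, a client that alters the rights or expiry of a capability cannot produce a matching tag without the key.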
