Preservation DataStores and Storlet Engine

Storage Research

Overview

As the volume of digital information continues to grow, we face a paradox. We can read and interpret the Dead Sea Scrolls, written almost 2000 years ago, but we cannot do the same with data written 20 years ago to a 5.25-inch floppy disk. Ironically, as the world becomes digital, we may be entering a digital "Dark Ages" in which business, public, and personal assets are in ever greater danger of being lost. At the same time, the need for long-lived digital information is growing. In addition, compliance legislation such as HIPAA and the Sarbanes-Oxley Act, which requires long-term data viability, has increased the need to study how to preserve data over longer terms.

At the heart of any solution to the preservation problem is a storage component, the permanent location of the information. We argue that, to preserve data economically, the storage must take preservation considerations into account. Preservation DataStores (PDS) is such a storage system: it has built-in support for long-term digital preservation based on the Open Archival Information System (OAIS) reference model. It transforms the logical OAIS Archival Information Package (AIP) into physical storage objects. It performs preservation-related computations inside the storage system via storlets running in a sandbox. It also increases the value of the archived data by processing it and running analytics within the archival storage.
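
To make the two ideas in this paragraph concrete, here is a minimal sketch of mapping a logical AIP to a physical storage object and then running a small computation next to the data. It is not the PDS or Storlet Engine interface: the names `AIP`, `ToyStore`, `run_storlet`, and `fixity_storlet` are hypothetical, the store is an in-memory dictionary, and no real sandboxing is performed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AIP:
    """Simplified, hypothetical stand-in for an OAIS Archival Information Package."""
    aip_id: str
    content: bytes                                      # the content data object
    pdi: Dict[str, str] = field(default_factory=dict)   # preservation description info


class ToyStore:
    """In-memory stand-in for an object store (an assumption, not the PDS API)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.metadata: Dict[str, Dict[str, str]] = {}

    def ingest(self, aip: AIP) -> str:
        """Map the logical AIP to a physical object plus metadata, recording fixity."""
        key = f"aip/{aip.aip_id}"
        digest = hashlib.sha256(aip.content).hexdigest()
        self.objects[key] = aip.content
        self.metadata[key] = dict(aip.pdi, fixity_sha256=digest)
        return key

    def run_storlet(self, key: str,
                    storlet: Callable[[bytes, Dict[str, str]], str]) -> str:
        """Run a computation beside the data; only the small result leaves the store."""
        # A real engine would isolate the storlet in a sandbox; this sketch only
        # hands it the bytes and a copy of the metadata.
        return storlet(self.objects[key], dict(self.metadata[key]))


def fixity_storlet(data: bytes, meta: Dict[str, str]) -> str:
    """Recompute the digest and compare it with the value recorded at ingest."""
    ok = hashlib.sha256(data).hexdigest() == meta["fixity_sha256"]
    return "intact" if ok else "corrupted"


if __name__ == "__main__":
    store = ToyStore()
    key = store.ingest(AIP("scan-001", b"archived record", {"provenance": "clinic A"}))
    print(store.run_storlet(key, fixity_storlet))
```

The design point the sketch illustrates is that the fixity check returns a short status string rather than shipping the whole object to a client, which is the motivation for computing inside the archival storage.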

PDS serves as the storage infrastructure for the EU projects CASPAR, ENSURE, and ForgetIT, where it has been tested with use cases from several domains, including healthcare, clinical trials, finance, science, and personal and organizational information. In the most recent project, ForgetIT, PDS is being integrated with DSpace. Some PDS concepts also feed into standardization efforts through the SNIA Long Term Retention (LTR) technical working group.