Cloud storage generally uses commodity off-the-shelf servers with powerful CPUs as the underlying storage nodes, which serve large data sets accessed from anywhere over the WAN. We leverage the processing capabilities of these storage nodes to execute computational modules, called storlets, close to where the data is stored. The Storlet Engine gives the cloud storage the capability to dynamically upload storlets and execute them in a sandbox that insulates each storlet from the rest of the system and from other storlets. By using storlets, the client benefits from reduced bandwidth (fewer bytes transferred over the WAN), enhanced security (less exposure of sensitive data), cost savings (less infrastructure at the client side), and compliance support (improved provenance tracking).
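To make the bandwidth argument concrete, the following is a minimal sketch of a storlet modeled as a stream-to-stream filter that runs next to the stored object, so that only the filtered bytes would cross the WAN. The names `line_filter_storlet` and `run_near_data`, and the `params` dictionary, are hypothetical and chosen for illustration; they are not the Storlet Engine's actual interface, and the sandbox is not modeled here.

```python
import io


def line_filter_storlet(in_stream, out_stream, params):
    """Hypothetical storlet: copy only the lines that contain params['keyword']."""
    keyword = params["keyword"].encode("utf-8")
    for line in in_stream:
        if keyword in line:
            out_stream.write(line)


def run_near_data(stored_object, storlet, params):
    """Stand-in for the storage node: run the storlet over the object's bytes
    and return only the storlet output (what would be sent to the client)."""
    in_stream = io.BytesIO(stored_object)
    out_stream = io.BytesIO()
    storlet(in_stream, out_stream, params)
    return out_stream.getvalue()


# Synthetic object: 1000 log records, every 100th one marked as an error.
stored = b"".join(
    b"record %d status=%s\n" % (i, b"ERROR" if i % 100 == 0 else b"OK")
    for i in range(1000)
)

result = run_near_data(stored, line_filter_storlet, {"keyword": "ERROR"})
print(len(stored), "bytes stored;", len(result), "bytes returned to the client")
```

In this toy setting the client receives only the ten matching records instead of the whole object, which is the kind of reduction in transferred bytes described above.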
We implemented a Storlet Engine prototype for OpenStack Swift, an open source cloud storage system. The Storlet Engine is tied to the storage system at intercept points, where it identifies requests for storlets. It then executes the requested storlet in a sandbox according to the client request and the client's credentials. The Storlet Engine can reside in the storage interface node (e.g., the proxy node in Swift) and in the local storage data node (e.g., the object node in Swift). The Storlet Engine also provides special service storlets that can be used for more complex flows; for example, the DistributedStorlet service enables running multiple storlets in parallel and combining their results. A Storlets Marketplace can be used as a repository of storlets from different vendors. An application on top of the storage can mash up the various storlets from the marketplace to create its logic and functionality.
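The request flow described above can be sketched roughly as follows: an intercept hook checks whether a request names a storlet, verifies the caller's credentials, and runs the storlet; a second method runs several storlets in parallel and combines their results, in the spirit of the DistributedStorlet service. This is an illustrative toy under stated assumptions, not the prototype's code: `ToyStorletEngine`, its methods, and the request dictionary layout are all hypothetical, and a thread pool provides concurrency only, not the sandbox isolation the real engine relies on.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyStorletEngine:
    """Hypothetical, simplified model of the intercept-and-execute flow."""

    def __init__(self):
        self._storlets = {}  # storlet name -> callable(data, params)
        self._acl = {}       # storlet name -> set of users allowed to run it

    def upload(self, name, func, allowed_users):
        """Register a storlet at run time, with the users allowed to invoke it."""
        self._storlets[name] = func
        self._acl[name] = set(allowed_users)

    def intercept(self, request, data):
        """Called at an intercept point; passes data through if no storlet is requested."""
        name = request.get("storlet")
        if name is None:
            return data
        if request.get("user") not in self._acl.get(name, set()):
            raise PermissionError(f"user may not run storlet {name!r}")
        return self._storlets[name](data, request.get("params", {}))

    def run_distributed(self, request, data, names, combine):
        """Run several storlets in parallel on the same data and combine the results."""
        def run_one(name):
            return self.intercept({**request, "storlet": name}, data)

        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            results = list(pool.map(run_one, names))  # map preserves input order
        return combine(results)


engine = ToyStorletEngine()
engine.upload("count_lines", lambda data, params: data.count(b"\n"), {"alice"})
engine.upload("count_bytes", lambda data, params: len(data), {"alice"})

summary = engine.run_distributed(
    {"user": "alice"},
    b"a\nbb\nccc\n",
    ["count_lines", "count_bytes"],
    combine=lambda results: dict(zip(["lines", "bytes"], results)),
)
print(summary)
```

The same `intercept` method could be called from either the interface node or the data node in this model; where a given storlet should run is a placement decision that the sketch leaves out.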
We developed several storlets with our partners in the EU projects, including a deidentification storlet that deidentifies sensitive medical records and an image transformation storlet that transforms images to sustainable formats. Jointly with Philips, we developed various pathology imaging analysis storlets, such as CellDetectionStorlet, which detects cells in pathology images. The Storlet Engine can also be used on its own, without PDS, for workloads that require data-intensive compute.