Cloud analytics acceleration

Zurich Analytics Cloud (ZAC)

The Zurich Analytics Cloud (ZAC) project is an effort to build a high-performance analytics stack that exploits the full performance of modern, fast I/O devices such as RDMA interconnects and Flash storage. ZAC is based on open interfaces and delivers a set of components that work in concert with well-accepted analytics ecosystems such as Hadoop and Spark.

With ZAC, we aim to directly leverage the excellent distributed computing services these frameworks provide (e.g., workload distribution, scheduling, and support for graph, stream, and interactive workloads) while substantially improving their performance.

Stacks like Hadoop and Spark were architected for commodity off-the-shelf hardware, and their I/O processing is typically highly CPU-intensive. Consequently, these frameworks are a poor fit for upcoming fast I/O devices (e.g., 40 GbE networks or NVM storage): the software I/O path, rather than the device, tends to become the bottleneck.

In ZAC, we are re-architecting network and storage I/O for these systems at the component level.

For instance, Peregrine is a distributed file system specifically designed for efficient I/O on fast Flash and RDMA devices, which makes it suitable as fast working-set storage during data processing. Peregrine fully implements the HDFS interface and can be used from within Hadoop or Spark without requiring modifications to these engines.
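To illustrate what "without requiring modifications" can look like in practice, the following minimal sketch generates a Hadoop configuration fragment that binds a URI scheme to a file system implementation through Hadoop's fs.<scheme>.impl property. The scheme name "peregrine" and the class name org.example.peregrine.PeregrineFileSystem are hypothetical placeholders chosen for illustration; they are not taken from the Peregrine distribution, and an actual deployment may be configured differently.

```python
import xml.etree.ElementTree as ET


def filesystem_binding(scheme: str, impl_class: str) -> ET.Element:
    """Build a Hadoop <configuration> element mapping a URI scheme to a FileSystem class."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    # Hadoop looks up the implementation for <scheme>:// URIs under fs.<scheme>.impl.
    ET.SubElement(prop, "name").text = f"fs.{scheme}.impl"
    ET.SubElement(prop, "value").text = impl_class
    return configuration


# Hypothetical scheme and class name, used only for illustration.
config = filesystem_binding("peregrine", "org.example.peregrine.PeregrineFileSystem")
core_site_xml = ET.tostring(config, encoding="unicode")
print(core_site_xml)
```

With a binding of this kind in core-site.xml and the implementation on the classpath, applications would address data through URIs of the chosen scheme, and the engines themselves would stay unchanged.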

DSA and FlashNet are other pieces of the ZAC stack, providing efficient local and remote Flash access. Again, these components can be used transparently from within existing applications, because Flash access is abstracted in a way that is compatible with standard remote memory access in RDMA.

ZAC aims for a high level of flexibility regarding the high-performance hardware components that are available. Dedicated software modules, developed as part of the project, can substitute for missing hardware components. For example, missing RDMA network adapters can be seamlessly replaced by the SoftiWarp Linux kernel module.
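The fallback decision described above can be sketched as a simple check for registered RDMA devices. This is an illustrative sketch, not part of ZAC: it assumes a Linux host, where the kernel lists RDMA devices under /sys/class/infiniband, and the function names and return labels are hypothetical. Note that a software RDMA device, once configured, appears in the same directory, so the check only tells whether any RDMA device is present at all.

```python
import os
from typing import List, Tuple


def list_rdma_devices(sysfs_dir: str = "/sys/class/infiniband") -> List[str]:
    """Return the names of RDMA devices registered with the kernel (empty if none)."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except (FileNotFoundError, NotADirectoryError):
        # The directory is absent when no RDMA driver is loaded.
        return []


def choose_rdma_provider(sysfs_dir: str = "/sys/class/infiniband") -> Tuple[str, List[str]]:
    """Decide whether a software RDMA module is needed (labels are hypothetical)."""
    devices = list_rdma_devices(sysfs_dir)
    if devices:
        return ("existing-device", devices)
    # No RDMA device registered: a software implementation would have to be loaded.
    return ("software-fallback", [])


if __name__ == "__main__":
    print(choose_rdma_provider())
```

Keeping the sysfs path as a parameter makes the check easy to exercise on machines without RDMA hardware.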