XML Storage Manager

Many new applications use XML not only as a data interchange format, but also as their primary data model. For these applications, the main advantage of XML lies in the flexibility of the data model, which allows variable-schema or even schema-less data to be represented. However, this very flexibility brings new challenges for storage and processing. One approach is to convert the XML data to tables, store them in a relational database, and translate queries to SQL. The alternative is to design a database management system specifically for XML data. As a short-term solution, the first approach is clearly preferable, given the maturity of relational database technology. However, it is becoming increasingly evident that guaranteeing the same level of performance for XML data as for traditional relational data will require a specialized database system. As part of a larger effort across IBM to design a native XML database, we are conducting research on the storage manager component.
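To make the first approach concrete, the following is a minimal sketch of shredding an XML document into a single relational table and translating a simple path query into SQL. It is an illustration only, not the mapping used by any particular product: the `edge` table layout and the helper names `shred`, `path_to_sql`, and `query` are assumptions introduced here, and only plain child-step paths such as `/lib/book/title` are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(conn, xml_string):
    """Store one row per element: (id, parent, ordinal, tag, text).

    The table layout is a hypothetical edge-table-style mapping.
    """
    conn.execute(
        "CREATE TABLE edge (id INTEGER PRIMARY KEY, parent INTEGER, "
        "ordinal INTEGER, tag TEXT, text TEXT)"
    )
    next_id = 0

    def visit(elem, parent, ordinal):
        nonlocal next_id
        node_id = next_id  # ids are assigned in document (preorder) order
        next_id += 1
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (node_id, parent, ordinal, elem.tag, (elem.text or "").strip()),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_string), None, 0)


def path_to_sql(path):
    """Translate a child-step path like /a/b/c into one self-join per step."""
    steps = path.strip("/").split("/")
    tables = [f"edge e{i}" for i in range(len(steps))]
    conds = ["e0.parent IS NULL"]  # the first step must match the root element
    params = []
    for i, tag in enumerate(steps):
        conds.append(f"e{i}.tag = ?")
        params.append(tag)
        if i > 0:
            conds.append(f"e{i}.parent = e{i - 1}.id")
    last = len(steps) - 1
    sql = (
        f"SELECT e{last}.text FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(conds)} ORDER BY e{last}.id"
    )
    return sql, params


def query(conn, path):
    """Run the translated query and return the matching text values."""
    sql, params = path_to_sql(path)
    return [row[0] for row in conn.execute(sql, params)]
```

Note that each path step costs one self-join in this sketch, which hints at why deep navigation can become expensive when XML is mapped onto relational tables.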

The XML storage manager is responsible for the physical layout of the XML data in disk pages and provides direct access to the data. To provide maximum flexibility, we have decided to separate these two functions as much as possible. Thus, we designed a data access interface similar to the Document Object Model (DOM), with operators for XML tree navigation and data extraction. This allows several different physical representations of the XML data to coexist and be accessed seamlessly. We are currently evaluating various representation options, ranging from edge-table-like records to physical pointer-based structures.
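The separation between the access interface and the physical layout can be sketched as follows. This is a simplified in-memory illustration, not the storage manager's actual interface: the class names `XMLStore`, `EdgeTableStore`, and `PointerStore`, the four operators, and the helper `find_text` are all hypothetical, and disk pages are not modeled.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class XMLStore(ABC):
    """Hypothetical DOM-like access interface; node handles are opaque."""

    @abstractmethod
    def root(self): ...

    @abstractmethod
    def children(self, node): ...

    @abstractmethod
    def name(self, node): ...

    @abstractmethod
    def text(self, node): ...


class EdgeTableStore(XMLStore):
    """Edge-table-like layout: one (id, parent, ordinal, tag, text) record per element."""

    def __init__(self, xml_string):
        self.rows = []
        self._load(ET.fromstring(xml_string), None, 0)

    def _load(self, elem, parent, ordinal):
        node_id = len(self.rows)  # record id equals its index in self.rows
        self.rows.append((node_id, parent, ordinal, elem.tag, (elem.text or "").strip()))
        for i, child in enumerate(elem):
            self._load(child, node_id, i)

    def root(self):
        return 0

    def children(self, node):
        kids = [r for r in self.rows if r[1] == node]
        return [r[0] for r in sorted(kids, key=lambda r: r[2])]

    def name(self, node):
        return self.rows[node][3]

    def text(self, node):
        return self.rows[node][4]


class PointerNode:
    def __init__(self, tag, text):
        self.tag, self.text, self.children = tag, text, []


class PointerStore(XMLStore):
    """Pointer-based layout: each node holds direct references to its children."""

    def __init__(self, xml_string):
        self._root = self._build(ET.fromstring(xml_string))

    def _build(self, elem):
        node = PointerNode(elem.tag, (elem.text or "").strip())
        node.children = [self._build(child) for child in elem]
        return node

    def root(self):
        return self._root

    def children(self, node):
        return list(node.children)

    def name(self, node):
        return node.tag

    def text(self, node):
        return node.text


def find_text(store, path):
    """Follow a list of tag names from the root; return the first match's text or None."""
    node = store.root()
    if store.name(node) != path[0]:
        return None
    for tag in path[1:]:
        matches = [c for c in store.children(node) if store.name(c) == tag]
        if not matches:
            return None
        node = matches[0]
    return store.text(node)
```

Because `find_text` uses only the interface operators, the same navigation code runs unchanged over either representation, which is the property the design aims for.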

 

The goals of the XML storage manager are to:

  • Provide support for storing schema-less XML documents.
  • Enable schema-aware storage and optimizations.
  • Efficiently store both document-oriented and data-oriented XML documents.
  • Provide support for typed data.
  • Minimize the book-keeping overhead.
  • Be able to recreate the XML document accurately and efficiently.
  • Efficiently support queries and updates of XML documents.
  • Support concurrency and recovery.
  • Provide support for storing multiple versions of XML documents.