IBM®
Skip to main content
    Country/region [change]    Terms of use
 
 
 
    Home    Products    Services & solutions    Support & downloads    My account    
IBM Research

Computer Science

Innovation Matters


Data Management

Multi Dimensional Clustering (MDC) in DB2

Providing an elegant method for flexible, continuous, and automatic clustering of data along multiple dimensions with applications in business intelligence and transaction processing environments.

Multi Dimensional Clustering (MDC) is a new data layout scheme incorporated in DB2 Universal Database (UDB) Version 8, which allows a relational table to be clustered on one or more orthogonal clustering attributes (or expressions) of a table. Many applications, e.g., Online Analytic Processing (OLAP) and data warehousing, process a table or tables in a database using a multi-dimensional access paradigm. Currently, most database systems can only support organization of a table using a primary clustering index on one dimension. Secondary indexes are created to access the tables when the primary key index is not applicable. Unfortunately, secondary indexes perform many random I/O access against the table for a simple operation such as a range query.

MDC figure  legend

A three-dimensional MDC layout on nation, itemid and year (orderDate)

Our work in MDC addresses this important deficiency in database systems. It allows the robustness and scalability provided by relational database systems, but is able to provide efficiency for multidimensional data access. The above figure indicates a simple MDC configuration on three dimensions based on the following table columns: nation, itemId and a third generated from the year of the orderDate column. The intersection of the dimension values form logical entities called cells, which are, in turn made up of physical entities called blocks. These are accessed using block indexes. A configuration like this would allow efficient access for user queries on the dimensions.

MDC also provides for a continuous and automatic maintenance of the clustering over time, which reduces the need to reorganize the table to regain clustering -- as happens with current technology. MDC also allows the ability to co-exist with existing database features such as row based indexes, table constraints, materialized views, high speed load and mass delete.

These features make MDC useful for a business intelligence environment as well as a transaction processing environment. Many major DB2 clients are realizing significant query performance gains by using MDC in their production systems. As reported in "MDC Performance Customer Examples & Experience" at the DB2 Information Management Technical Conference 2004 in Madrid, Spain by Leslie Cranston, IBM clients like the Canadian Astronomy Data Center (CADC), BrasilTelecom, BankOne and Thomas West have reported significant performance gains in their workloads and benchmarks using MDC. In particular, the talk mentions Thomas West as reporting "The new multidimensional data clustering capability has improved performance of our most complex queries by up to 30 times while removing the need for additional reorganization". MDC has been used in systems as large as 32 terabytes and in major industrial benchmarks like the DB2 10TB TPCH benchmark.

Members of the MDC team have been awarded the IBM Outstanding Innovation Award for their contributions to DB2 V8 for MDC.

Selected Publications

B. Bhattacharjee, S. Padmanabhan, T. Malkemus, T. Lai, L. Cranston, M. Huras, "Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2", Proceedings of the 29th VLDB Conference 2003.

B. Bhattacharjee, L. Cranston, T. Malkemus, S. Padmanabhan, "Boosting Query Performance: Multidimensional Clustering", DB2 Magazine, Quarter 2, 2003.

B. Bhattacharjee, L. Cranston, T. Malkemus, S. Padmanabhan, "The Luster of Data Clustering", eServer Magazine, April 2003.

S. Padmanabhan, B. Bhattacharjee, T. Malkemus, L. Cranston, M. Huras, "Multi- Dimensional Clustering: A New Data Layout Scheme in DB2", Proceedings of the ACM SIGMOD Conference 2003.

Copyright © (2003) by Association for Computing Machinery, Inc. Permission to make digital or hard copies of part of all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

News and Information

M. Gonzales, G. Robinson, The OLAP-Aware Database, DB2 Magazine, Quarter 2, 2003.

Meet The Experts: Matt Huras and Version 8 Enhancements For DB2 on Linux, Unix and Windows, DB2 Developers Domain, 23 July 2002.

Multidimensional Clustering, DB2 Information Center, 2003.

P Gunning, Database Technology Leaps Ahead, DB2 Magazine, Quarter 4, 2002.

R. Welgan, Comparing Query Performance, MDC vs. Non-MDC Tables, DB2 Magazine, Quarter 2, 2003.

S. Gausden, T. Mason, Getting Started With MDC Space Management, The IDUG Solutions Journal, October 2002.

Innovators Corner
Bishwaranjan Bhattacharjee  
Bishwaranjan Bhattacharjee
Researcher

What is the most exciting potential future use for the work you're doing?
The potential usage of MDC in Online Transaction Processing (OLTP) for reorganization avoidance. Reorganizing a table generally means the table cannot be used during that time and any technology which helps avoid it can make a major difference in OLTP environments.

What is the most interesting part of your research?
The Block Index Lookahead (BILA) prefetching technique used for inter block prefetching for block index scans in MDC was the most interesting part of my research. This allowed us to prefetch blocks of data by looking into the block index, identifying the exact blocks which are need by the query and prefetching them. It has many advantages over the existing sequential detection technique for prefetching and has proved to be useful for other scenarios like block index mostly scans.

What inspired you to go into this field?
The impact MDC could have on query processing performance and its ability to prove to be a key differentiator between our products and the competition in the Online Analytic Processing (OLAP) and data warehousing environment was the key motivator for this work.

What is your favorite invention of all time?
Powered Flight
Learn more


Team Members
Research Team
Bishwaranjan Bhattacharjee Ramesh Agarwal Tim Malkemus
Bishwaranjan
Bhattacharjee
Ramesh
Agarwal
Timothy
Malkemus
Former Research Collaborators
arrow Sriram Padmanabhan ( Silicon Valley Lab)

Related Links
arrowDiscipline: Computer Science
arrowResearch Area: Data Management
arrowResearch Site: Watson
arrowProduct Benchmark: TPC Benchmark™H
arrowRelated Products: DB2 Information Management

    About IBMPrivacyContact