
Consider the TPC-D benchmark. The significant dimensions of this
benchmark are:
Several benchmark implementations already use the Time dimension to
range-partition the important tables in TPC-D. In our project, we are
exploring the ability to cluster and partition the tables using more than
one dimension. The expected advantages are better isolation of the data,
so that multidimensional queries can be answered efficiently, and better
manageability of the database.
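To make the single-dimension baseline concrete, here is a minimal Python sketch of range partitioning on time alone. It assumes a toy row layout of (date, nation, id) tuples rather than the real TPC-D schema, and the function names are hypothetical. Partition elimination can use only the date predicate, so every row in each surviving month partition is read even when the query wants just a few nations.

```python
from collections import defaultdict
from datetime import date

def month_key(d):
    """Map a date to its (year, month) partition key."""
    return (d.year, d.month)

def range_partition_by_month(rows):
    """1-D range partitioning: each row goes to the partition for its month."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[month_key(row[0])].append(row)
    return dict(partitions)

def query_1d(partitions, start, end, nations):
    """Prune partitions on the date range only; the nation predicate must
    then be checked row by row inside every surviving partition."""
    lo, hi = month_key(start), month_key(end)
    rows_read = 0
    matches = []
    for key, part in partitions.items():
        if lo <= key <= hi:            # partition elimination on time only
            rows_read += len(part)     # the whole partition is read
            matches.extend(r for r in part
                           if start <= r[0] <= end and r[1] in nations)
    return matches, rows_read

# Toy data: (date, nation, id)
rows = [
    (date(1995, 1, 10), "FRANCE", 1),
    (date(1995, 1, 20), "JAPAN", 2),
    (date(1995, 2, 5), "FRANCE", 3),
    (date(1995, 2, 7), "CHINA", 4),
    (date(1995, 3, 1), "FRANCE", 5),
]
parts = range_partition_by_month(rows)
matches, read = query_1d(parts, date(1995, 1, 1), date(1995, 2, 28), {"FRANCE"})
```

In this toy run four rows are read to return two matches: the time partitioning prunes March, but it cannot prune the JAPAN and CHINA rows that share a month with the wanted FRANCE rows.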
The following diagram provides a view of how such a multidimensional
clustering of the data would help support the complex queries of TPC-D.
In this example, we use a 2-D clustering scheme on months (of the order
dates) and Nations. Given a query that selects a range of dates and some
nations (for example, the nations in a region), the processing strategy is
to use compact indexes to select the subset of cells that lie in the
intersection of the date range and the specified nations. The goal is to
minimize I/O without incurring the maintenance penalties that come with
heavy indexing.
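A minimal Python sketch of this cell-selection strategy follows. It is an illustration under stated assumptions, not the project's implementation: rows are toy (date, nation, id) tuples, cells are keyed by (year, month) and nation, and the "compact indexes" are modeled as per-dimension maps from a dimension value to the set of cell keys, with one entry per cell rather than per row. The class and method names are hypothetical.

```python
from collections import defaultdict
from datetime import date

class Clustered2D:
    """Hypothetical 2-D clustered table: rows are grouped into cells keyed by
    ((year, month), nation); each dimension has a small index from its values
    to cell keys (one entry per cell, not per row)."""

    def __init__(self, rows):
        self.cells = defaultdict(list)      # cell key -> rows in that cell
        self.by_month = defaultdict(set)    # (year, month) -> cell keys
        self.by_nation = defaultdict(set)   # nation -> cell keys
        for row in rows:
            d, nation = row[0], row[1]
            key = ((d.year, d.month), nation)
            self.cells[key].append(row)
            self.by_month[key[0]].add(key)
            self.by_nation[nation].add(key)

    def query(self, start, end, nations):
        lo, hi = (start.year, start.month), (end.year, end.month)
        # Cells whose month falls inside the date range ...
        month_cells = set()
        for month, keys in self.by_month.items():
            if lo <= month <= hi:
                month_cells |= keys
        # ... intersected with the cells of the requested nations.
        nation_cells = set()
        for n in nations:
            nation_cells |= self.by_nation.get(n, set())
        selected = month_cells & nation_cells
        rows_read = 0
        matches = []
        for key in sorted(selected):
            cell = self.cells[key]
            rows_read += len(cell)          # only selected cells are read
            # Still filter by exact date: the range may start/end mid-month.
            matches.extend(r for r in cell if start <= r[0] <= end)
        return matches, rows_read

# Toy data: (date, nation, id)
rows = [
    (date(1995, 1, 10), "FRANCE", 1),
    (date(1995, 1, 20), "JAPAN", 2),
    (date(1995, 2, 5), "FRANCE", 3),
    (date(1995, 2, 7), "CHINA", 4),
    (date(1995, 3, 1), "FRANCE", 5),
]
table = Clustered2D(rows)
matches, read = table.query(date(1995, 1, 1), date(1995, 2, 28), {"FRANCE"})
```

Here only the two qualifying cells are read. Because each index holds one entry per cell, inserting another row into an existing cell leaves both indexes unchanged, which is one way such a scheme can keep index maintenance cheap compared with per-row indexes.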