
We have a strong interest in the functionality and performance
of database engines. For example, the MultiDimensional Clustering project is exploring
new clustering, indexing, and database processing paradigms for data warehouses. We are
also actively pursuing other topics in this general area, such as sorting, aggregation, and
block-oriented operations.

Traditionally, database systems have used sorting techniques that minimize disk I/O, because
sorts were expected to spill to disk. The main focus has been on I/O performance, and
less attention has been paid to internal-memory algorithms. With the large memories of today's
systems, the performance of internal-memory sorting matters as well. We have explored and
implemented a general multiple-datatype sorting algorithm in DB2 using very high-performance
techniques such as radix sorting, distribution counting, and binary key preparation. We are
also exploring bucket-based external sorting for better memory utilization and I/O efficiency.
Ref: SIGMOD 96 Paper
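
The three techniques named above can be illustrated with a small sketch. This is not the DB2 implementation; it is a minimal example that assumes signed 32-bit integer keys, and the names `prepare_key`, `counting_pass`, and `radix_sort` are hypothetical. Binary key preparation turns each value into bytes whose unsigned bytewise order matches the value order, and each radix pass is a stable distribution-counting pass over one key byte.

```python
import struct


def prepare_key(value: int) -> bytes:
    """Binary key preparation (assumed encoding for signed 32-bit integers).

    Biasing by 2**31 is equivalent to flipping the sign bit, so the
    big-endian bytes compare bytewise in the same order as the integers.
    """
    return struct.pack(">I", (value + 0x80000000) & 0xFFFFFFFF)


def counting_pass(records, byte_index):
    """One stable distribution-counting pass on a single key byte."""
    counts = [0] * 256
    for key, _ in records:
        counts[key[byte_index]] += 1

    # Convert the per-byte counts into starting offsets in the output.
    offsets = [0] * 256
    total = 0
    for b in range(256):
        offsets[b] = total
        total += counts[b]

    # Scatter records to their final positions for this pass.
    out = [None] * len(records)
    for rec in records:
        b = rec[0][byte_index]
        out[offsets[b]] = rec
        offsets[b] += 1
    return out


def radix_sort(values):
    """Least-significant-byte-first radix sort over prepared 4-byte keys."""
    records = [(prepare_key(v), v) for v in values]
    for byte_index in reversed(range(4)):
        records = counting_pass(records, byte_index)
    return [v for _, v in records]
```

Because the sort only ever looks at prepared key bytes, other datatypes could in principle be supported by supplying a different key-preparation function while leaving the passes unchanged.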

We have explored a technique for allocating a block of storage, inserting a group of records, and
then applying very high-performance block-oriented operations to perform GROUP BY operations
and aggregations. We have demonstrated that this makes very effective use of current
superscalar CPUs and their large L2 caches, and yields large gains on aggregation-intensive
queries.
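
The block-at-a-time pattern described above might look roughly like the following sketch. The source does not specify which block operations are used, so the choices here are assumptions: a fixed-capacity block, a sort within the block to make equal keys adjacent, and a SUM aggregate. The names `aggregate_block` and `aggregate_in_blocks` and the default block size are hypothetical. Python cannot show the cache and superscalar effects; the sketch only shows the control flow.

```python
def aggregate_block(block, totals):
    """Aggregate one full block in a tight loop (assumed SUM aggregate)."""
    # Sorting the block makes equal group keys adjacent, so each group
    # is summed in one run and merged into the totals once per block.
    block.sort(key=lambda rec: rec[0])
    i = 0
    while i < len(block):
        key = block[i][0]
        run_sum = 0
        while i < len(block) and block[i][0] == key:
            run_sum += block[i][1]
            i += 1
        totals[key] = totals.get(key, 0) + run_sum


def aggregate_in_blocks(rows, block_size=1024):
    """Fill a block with (key, value) records, then aggregate the block."""
    totals = {}
    block = []
    for row in rows:
        block.append(row)
        if len(block) == block_size:
            aggregate_block(block, totals)
            block.clear()
    if block:
        # Aggregate the final, partially filled block.
        aggregate_block(block, totals)
    return totals
```

The design intent is that each block is small enough to stay cache-resident while it is processed, in contrast to updating a large hash table once per incoming record.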

We initiated the parallel database processing effort that led to IBM's DB2 Parallel Edition and
DB2 Universal Database Enterprise-Extended Edition. Within the scope of this effort, we explored
parallelism for almost all aspects of database processing. The shared-nothing architecture and the
function-shipping paradigm form the basis of our work. We have implemented an elegant parallel
query processing architecture and the corresponding query optimization to generate optimized parallel
query plans. We also explored and implemented parallelism for all
utility functions as well as recovery and transaction processing.
Ref: DB2 Parallel Edition
We are interested in follow-up parallel processing issues such as parallel interfaces to applications,
skew handling, and global indexing techniques.
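
As a rough illustration of the shared-nothing, function-shipping idea, the following single-process sketch keeps each partition private to its node and sends the operation to the data, not the data to the operation. It is a toy model and not DB2's architecture: the `Node` and `Cluster` classes, the modulo partitioning of integer keys, and the `local_sum` function are all assumptions made for the example, and no real network or parallel execution is involved.

```python
class Node:
    """A node that owns its rows exclusively (shared-nothing)."""

    def __init__(self):
        self.rows = []

    def run(self, func):
        # The shipped function executes against local data only.
        return func(self.rows)


class Cluster:
    """Hypothetical coordinator that partitions rows and ships functions."""

    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def insert(self, key, value):
        # Assumed partitioning: integer key modulo the number of nodes.
        self.nodes[key % len(self.nodes)].rows.append((key, value))

    def ship(self, func):
        # Send the function to every node and collect the partial results.
        return [node.run(func) for node in self.nodes]


def local_sum(rows):
    """Partial aggregate computed on one node's partition."""
    return sum(value for _, value in rows)


cluster = Cluster(3)
for key, value in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    cluster.insert(key, value)

partials = cluster.ship(local_sum)  # one partial sum per node
total = sum(partials)               # coordinator merges the partials
```

Only the small partial results cross node boundaries. If one key range dominates, a single node receives most of the rows, which is the skew problem mentioned above.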