
This project studies the integration of DB2 with tertiary storage. Large data warehouses requiring terabytes of storage are already proliferating. Projecting this trend forward, it is easy to expect data collections to require hundreds of terabytes or even petabytes of storage. Storage at this scale is better managed by Hierarchical Storage Managers (HSMs). We are already seeing the beginnings of this evolution and the announcements of HSMs for storage management.

In our project, we have interfaced an HSM called HPSS with DB2 using flexible filesystem semantics. We have designed our architecture to be flexible enough to accommodate a variety of current and future needs. The architecture allows SQL-transparent access to tertiary storage through a disk buffer. Tertiary storage is treated as a first-class storage medium on which one can build database tables. Frequently used data are automatically cached in the disk buffer, which can be configured flexibly to suit different application characteristics. The architecture is built around a clear-cut interface that allows data source or filter modules to be added or removed easily, so that new storage media can be integrated with our database engine architecture as they become available. By the same token, it is easy to replace the disk buffer module with a customized or extended one to fit specific application requirements.
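
To make the layering described above concrete, here is a minimal Python sketch of a disk buffer sitting in front of a slower storage module behind a common interface. It is an illustration only, not the project's actual design: the names `DataSource`, `TertiaryStore`, and `DiskBuffer` are hypothetical, the tertiary medium is simulated with an in-memory dictionary, and the least-recently-used eviction and write-through policies are assumptions chosen for simplicity.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class DataSource(ABC):
    """Hypothetical common interface implemented by every storage module."""

    @abstractmethod
    def read_block(self, block_id: int) -> bytes: ...

    @abstractmethod
    def write_block(self, block_id: int, data: bytes) -> None: ...


class TertiaryStore(DataSource):
    """Stand-in for a slow tertiary medium (simulated with a dict)."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0  # counts reads that reach the slow medium

    def read_block(self, block_id):
        self.reads += 1
        return self.blocks[block_id]

    def write_block(self, block_id, data):
        self.blocks[block_id] = data


class DiskBuffer(DataSource):
    """Cache in front of another DataSource (assumed LRU, write-through)."""

    def __init__(self, backing: DataSource, capacity: int):
        self.backing = backing
        self.capacity = capacity
        self.cache = OrderedDict()  # block_id -> data, oldest first

    def read_block(self, block_id):
        if block_id in self.cache:
            # Cache hit: mark the block as most recently used.
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        # Cache miss: fetch from the backing source and keep a copy.
        data = self.backing.read_block(block_id)
        self._store(block_id, data)
        return data

    def write_block(self, block_id, data):
        # Write-through: update the backing source, then the cache.
        self.backing.write_block(block_id, data)
        self._store(block_id, data)

    def _store(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            # Evict the least recently used block.
            self.cache.popitem(last=False)
```

Because `DiskBuffer` implements the same interface as the source it wraps, a caller reads blocks the same way whether or not a buffer is present, and either layer could be swapped for a different implementation without changing the caller.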