Data Management
IBM Research is recognized as a leading innovator in the field of data management. Our history of pioneering work includes E. F. Codd's seminal work on relational algebra, the System R relational database management system prototype (which led to IBM's DB2 database), ARIES transaction recovery and logging, Starburst extensible database technology, DB2® parallel database technology, QBIC® for image querying, and QUEST data mining algorithms. Today, we continue to explore new data management technology in such areas as data warehousing support, object-relational features, digital library support, multimedia content management, and federated databases, as well as the emerging areas of e-commerce, Internet applications, and mobile applications.

Advanced Relational Database Research

In the relational data management area, we are actively exploring several new technologies to improve scalability, functionality, performance, and usability in our DB2 database systems.

For scalability, we are focusing on very large databases, especially for data warehousing. One aspect of scalability is support for complex materialized views in a database. Materialized views precompute partial query results and hence can provide immediate responses to complex queries. We are also working on advanced clustering and indexing support for these large data warehouses to improve query performance and database maintenance operations. Our overall goal for scalability is to enable data warehouses containing terabytes and petabytes of data.

We have been working on supporting object-relational functionality within a Relational Database Management System (RDBMS). We are also investigating new user-defined processing paradigms and new APIs to support data mining.

Usability is another important research topic. Databases have become ubiquitous in many application domains. This has led to an explosion in the number of features and capabilities supported by database systems.
Hence, auto-administration and automatic tuning of database features are becoming very important. As an important first step, our research team has designed and implemented a robust Index Selection Wizard, which automatically suggests useful indexes based on workload patterns and space requirements.

Multimedia Data Management

Image, text, video, and other multimedia data are important features of the Web and Internet generation of applications. Increasingly, these data objects are being stored as first-class objects in a database and are being queried and manipulated like conventional structured data. We have been actively exploring full-fledged data management systems that encompass these nonstandard data types. We have evolved from the original QBIC (Query By Image Content) work, which provided a query facility over images based on image features, to more elaborate methods. For instance, our Spire system is able to extract, transform, and massage features from image and satellite data and build semantically rich object definitions with minimal support from users or experts.

Federated and Distributed Data Management

Managing federated and heterogeneous data sources continues to be an important research topic. Our Garlic project attempts to integrate heterogeneous data using wrappers, and it provides a query processing and query optimization framework over disparate data sources. We are actively engaged in using XML to provide structure to data objects. Our work in this area is spread across several of our research centers.

Database technology has provided a number of important management features that are summarized by the ACID properties: atomicity, consistency, isolation, and durability. Ideally, all the data in the world would reside in such powerful databases. However, the reality is quite different, and the world continues to use file systems and other less robust data stores for a significant volume of data.
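
To make the atomicity property named above concrete, here is a minimal sketch using Python's standard-library sqlite3 module rather than DB2. The accounts table, the sample balances, and the transfer helper are hypothetical names introduced only for illustration; the point is that a failed multi-step update leaves no partial changes behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two hypothetical accounts as one atomic unit."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                # Raising here undoes the debit that was just applied.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Illustrative data only (an in-memory database, not a real system).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
```

After the second call, both balances are exactly what the first transfer left them, which is the guarantee a plain file system does not provide on its own.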
The Datalinks project is tailored to provide a database style of management for files in a file system.

Active and Temporal Database Techniques

Database techniques are being used for the general problem of event management, with the ability to correlate events from various sources and activate rules based on this correlation. The Amit project has created a high-level language and an execution model to correlate events based on temporal characteristics. There are many potential applications in diverse areas, such as e-brokerage, system management, customer relationship management, and real-time monitoring of sensor inputs.
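
As a rough illustration of correlating events by temporal characteristics, the sketch below implements one simple rule in Python: fire when an event of one kind is followed by an event of another kind within a time window. It is not the Amit language or execution model; the Event and WindowRule names, the event kinds, and the sensor sources are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # where the event came from (hypothetical sensor name)
    kind: str         # event type used by the rule
    timestamp: float  # seconds


class WindowRule:
    """Fire when a `first` event is followed by a `second` event within `window` seconds."""

    def __init__(self, first, second, window):
        self.first = first
        self.second = second
        self.window = window
        self._pending = deque()  # unmatched `first` events, oldest on the left

    def feed(self, event):
        """Process one event (in timestamp order); return the pairs it completes."""
        # Discard pending events that are too old to match anything.
        while self._pending and event.timestamp - self._pending[0].timestamp > self.window:
            self._pending.popleft()
        if event.kind == self.first:
            self._pending.append(event)
            return []
        if event.kind == self.second:
            matches = [(earlier, event) for earlier in self._pending]
            self._pending.clear()
            return matches
        return []


# Illustrative event stream from two hypothetical sensors.
events = [
    Event("sensor-1", "temperature_high", 0.0),
    Event("sensor-2", "pressure_high", 4.0),    # within 10 s of the first event
    Event("sensor-1", "temperature_high", 20.0),
    Event("sensor-2", "pressure_high", 40.0),   # too late to correlate
]

rule = WindowRule("temperature_high", "pressure_high", window=10.0)
alerts = [pair for e in events for pair in rule.feed(e)]
```

Here only the first pair of events is correlated, because the second pressure reading arrives outside the 10-second window. A real event-management system would support many rule types and out-of-order events, which this sketch deliberately ignores.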
Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.