IBM Research

Knowledge Discovery and Data Mining
Computer Science Brochure

Online data continues to grow at an explosive pace, driven by the Internet and the widespread use of database technology. This growth has created both an immense opportunity and a need for methodologies of Knowledge Discovery and Data Mining (KDD). KDD is an interdisciplinary area focused on building automated techniques for extracting useful knowledge from data. Research in this area draws principally upon methods from statistics, data management, pattern recognition, and machine learning to deliver advanced techniques for business intelligence. IBM Research has been at the forefront of this area from the very beginning. Key advances in robust and scalable data mining techniques, methods for fast pattern detection in very large databases, text and Web mining, and innovative business intelligence applications have come from our worldwide research laboratories.

A particular focus of our KDD research has been high-performance, scalable data mining techniques for large-scale databases and data repositories. IBM's early lead in this area was established by our invention of association rule and sequential pattern technology for efficiently detecting patterns in large-scale databases. These and other technologies for scalable and parallel data mining, developed as part of this project, provided the original basis and impetus for IBM's flagship data mining products. This theme continues in recent research activities that include automatic subspace clustering, discovery-driven exploration of OLAP data cubes, and fast techniques for pre-computing and maintaining OLAP data.

Another area of investigation focuses on predictive data mining algorithms, systems, and solutions. One long-term effort has centered on rule-based predictive modeling and its integration into data mining frameworks. A recent effort has resulted in new data mining middleware for rule-based probabilistic estimation, which combines machine learning with principles from statistical learning theory and data management for scalable predictive modeling of massive data sets. This technology has been embedded in innovative business intelligence applications for areas such as insurance risk management and targeted retail marketing. Related research continues in areas such as rare-event predictive mining, robust feature selection, ensemble-based and regularization methods for predictive modeling, and support vector machines.

The exploration of machine learning and statistical techniques for new KDD methods and solutions continues to grow across all our research laboratories. This work includes research in text categorization, information extraction from document collections and the Web, item recommendation and personalization, event mining for systems and network management, and business process insight discovery. We continue to emphasize research in KDD techniques for handling massive amounts of data, resulting in new approaches for clustering, predictive modeling, and frequent pattern detection, as well as for the integration of KDD methodologies into database middleware. Many of the data mining techniques that were originally developed and tuned for structured data (e.g., associations, classification, clustering) are now increasingly being used and refined for projects in natural language processing and knowledge management.

A rapidly growing area of KDD research is its application to Internet, mobile, and pervasive computing solutions. Some of our projects are currently developing advanced business intelligence components for B2C personalization and B2B commerce. We are also beginning to investigate uses for KDD in applications for e-marketplaces and mobile commerce, as well as in information mining from multimedia on the Web.

Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.

CS Brochure 2000
