Knowledge Discovery and Data Mining
Computer Science Brochure
Online data continues to grow at an explosive pace, due to the Internet and the widespread use of database technology. This growth has created both an immense opportunity and a need for methodologies of Knowledge Discovery and Data Mining (KDD). An interdisciplinary area, KDD focuses on building automated techniques for extracting useful knowledge from data. Research in this area draws principally upon methods from statistics, data management, pattern recognition, and machine learning to deliver advanced techniques for business intelligence.

IBM Research has been at the forefront of this area from its very beginning. Key advances in robust and scalable data-mining techniques, methods for fast pattern detection in very large databases, text and web mining, as well as innovative business-intelligence applications have come from our worldwide research laboratories.

An area of particular focus in our KDD research has been high-performance, scalable data-mining techniques for large-scale databases and data repositories. IBM's early leadership in this area was established by our invention of association-rule and sequential-patterns technology for efficiently detecting patterns in large-scale databases. These and other technologies for scalable and parallel data mining provided the original basis and impetus for IBM's flagship data-mining products. This theme continues in recent research activities that include automatic subspace clustering, discovery-driven exploration of OLAP (Online Analytical Processing) data cubes, and fast techniques for pre-computing and maintaining OLAP data.

Another area of investigation has focused on predictive data-mining algorithms, systems, and solutions. One long-term effort has centered on rule-based predictive modeling and its integration into data-mining frameworks.
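To make the association-rule idea mentioned above more concrete, the following is a minimal sketch of a textbook level-wise (Apriori-style) search over a handful of shopping baskets. It is an illustration only, not the implementation used in any IBM product; the function names, the sample baskets, and the support and confidence thresholds are all assumptions chosen for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Level-wise search: itemsets of size k are only built from frequent
    itemsets of size k - 1, since a superset cannot be more frequent
    than any of its subsets.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    items = sorted({item for b in baskets for item in b})
    current = [frozenset([i]) for i in items]  # size-1 candidates
    frequent = {}
    while current:
        # Count how many baskets contain each candidate.
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items()
                     if cnt / n >= min_support}
        frequent.update(survivors)
        # Join step: merge survivors into candidates one item larger.
        k = len(next(iter(survivors))) + 1 if survivors else 0
        joined = {a | b for a, b in combinations(survivors, 2)
                  if len(a | b) == k}
        # Prune step: keep a candidate only if all its subsets survived.
        current = [c for c in joined
                   if all(frozenset(s) in survivors
                          for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical sample data: five shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

frequent = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, support, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), support, confidence)
```

The pruning step is what keeps this kind of search tractable on large databases: most candidate itemsets are discarded before their occurrences are ever counted.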
A recent effort has resulted in new data-mining middleware for rule-based probabilistic estimation, which combines machine learning with principles from statistical learning theory and data management for scalable predictive modeling of massive data sets. This technology has been embedded in innovative business-intelligence applications for areas such as insurance risk management and retail targeted marketing. Related research continues in such areas as rare-event predictive mining, robust feature selection, ensemble-based and regularization methods for predictive modeling, and support vector machines.

Our exploration of machine learning and statistical techniques for new KDD methods and solutions includes research activities in text categorization, information extraction from document collections and the Web, item recommendation and personalization, event mining for systems and network management, and business process insight discovery. We continue to emphasize research in KDD techniques for handling massive amounts of data, resulting in new approaches for clustering, predictive modeling, and frequent-pattern detection, as well as for integration of KDD methodologies into database middleware. Many of the data-mining techniques that were originally developed and tuned for structured data (for example, association, classification, clustering) are also now being refined and used for projects in natural language processing and knowledge management.

We have begun to investigate privacy concerns through methods for perturbing original data coupled with novel reconstruction procedures that allow us to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data. We plan to explore the extension of these ideas to association-rule and clustering algorithms.

Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.