Knowledge Discovery and Data Mining
The ongoing rapid growth of online data, driven by the Internet and the widespread use of databases, has created an immense need for Knowledge Discovery and Data Mining (KDD) methodologies. KDD is an interdisciplinary area focused on methodologies for extracting useful knowledge from data. This challenge draws principally upon research in statistics, data management, pattern recognition, and machine learning to deliver advanced business intelligence and Web discovery solutions.

IBM Research has been at the forefront of this exciting new area from the very beginning. Key advances in robust and scalable data mining techniques, methods for fast pattern detection in very large databases, text and Web mining, and innovative business intelligence applications have come from our worldwide research laboratories.

Focused Areas

A key area in our KDD research is high-performance, scalable data mining techniques for large-scale databases and data repositories. IBM's early lead in this area was established at the Almaden Research Center by the Quest project, which invented the association rule and sequential pattern technologies for efficiently detecting patterns in large-scale databases. These and other technologies for scalable and parallel data mining developed as part of this project provided the original basis and impetus for IBM's flagship data mining products. Recent research includes automatic subspace clustering, discovery-driven exploration of OLAP data cubes, and fast techniques for precomputing and maintaining OLAP data. Related work at the Watson Research Center is similarly focused on scalable data mining techniques for high-dimensional data.

Another area is predictive data mining algorithms, systems, and solutions. At the Tokyo Research Laboratory, we are exploring advanced data mining algorithms for geospatial decision support applications.
At Watson, one long-term focus area has been rule-based predictive modeling and its integration into data mining frameworks. This work has resulted in new data mining middleware for rule-based probabilistic estimation (the ProbE framework), which combines machine learning with principles from statistical learning theory and data management. This technology has been embedded in innovative business intelligence applications for areas such as insurance risk management and retail targeted marketing.

Text mining solutions are being actively explored at various IBM research laboratories. Experimental solutions have been prototyped that employ interactive classification and clustering technologies to organize and manage text repositories, such as Lotus Notes e-mail databases. We are applying a combination of natural language processing and scalable pattern detection to discover all frequent patterns with two-level structures. We are also combining linguistic analysis with text mining for problem detection and trend analysis. A number of ongoing research efforts apply text mining to electronic help desks. Work in this area has resulted in novel technologies for automated e-mail categorization and Web-based autoresponse systems, as well as clustering and exemplar generation techniques for high-dimensional call center logs.

New Research Efforts

A rapidly growing area of KDD research is the application of data mining to the Internet. This includes new business intelligence components for Web personalization and electronic commerce, and Web-based knowledge discovery solutions. We are currently investigating uses for KDD in applications for e-marketplaces, as well as information mining from the Web. We have developed the CLEVER search engine technology, which uses Web hyperlink information in innovative ways to detect endorsement of authoritative pages on the Web. We are also working on the problem of textual information overload on the Web.
Our recent efforts in this area include the development of solutions for Web personalization based upon new clustering, indexing, and classification techniques.

Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.