Natural Language Processing
Natural Language Processing at IBM is a dynamic research area spanning a wide range of topics. Over the years, we have pioneered algorithms and systems that have significantly improved the understanding of this field. Our accomplishments include the TQA database query system, the source-channel statistical machine translation paradigm, grammar-free statistical parsing, and mathematical investigations of compositionality in natural language. Currently, we work on theoretical issues of computational linguistics and practical algorithms for text analysis, conversational systems, and machine translation. Our Watson, Almaden, Beijing, Tokyo, and Zurich Research labs are all actively involved in these areas.

Text Analysis and Mining

We develop algorithms to extract useful information from enormous collections of documents, like the Web. Many of our investigations are driven by the question answering paradigm: we envision that a user has a specific question in mind and that there are many valid answers with differing levels of granularity. For example, our traditional information retrieval techniques focus on high-precision retrieval of entire relevant documents. We are actively pursuing retrieval of finer granularity answers, such as paragraphs, individual sentences, key phrases, and summaries.

Given a document, how does a user find related documents and see their connection to the original? One solution is lexical navigation. We have developed a set of techniques to extract named entities (people, places, organizations, domain terms, dates, etc.), along with the named relations (CEO, teacher, etc.) that link them. These relations are represented by lexical networks providing navigation links within the concept space.

Although the amount of information present on the Web and other computer databases is exploding rapidly, an individual's ability to read and absorb it remains essentially fixed.
Because of this, we have developed machine learning algorithms that mine recurring patterns and linguistic associations from this wealth of data. For example, we have developed a very accurate and rapidly trainable linear classifier algorithm, as well as the widely used conditional maximum entropy model. We have also developed a state-of-the-art algorithm for automatic new topic discovery in newswire feeds. As another example, we have enhanced a call center's productivity by analyzing call databases and automatically identifying topics associated with an increased number of calls.

Another area of interest is Web search and text mining of documents for competitive intelligence or sales and marketing purposes. Recent applications include interactive dialog-based autoresponse systems for the Web and text categorization for e-mail autorouting and autoresponse systems. (See Knowledge Discovery and Data Mining on p. 28.)

Conversational Systems

Our research in this area spans an entire spectrum of conversational systems technologies, including multimodal dialog management, natural language generation, and statistical natural language understanding. Our vision is a flexible, free-flow conversational system that places the user in the driver's seat. Simply by speaking naturally in their own words, users will control the application in their own personal style.

We are exploring several techniques to construct both universal and application-specific dialog engines. Our emphasis is on scalable dialog systems capable of simultaneously handling many users, tasks, and input modalities (voice, keyboard, gesture, and mouse). We gauge the quality of our research efforts by prototyping systems for many applications - ranging from stock and mutual fund trading systems to phone banking, air travel reservations, and Web-based shopping - in many languages, including English, French, German, Mandarin, and Spanish. (See Human Computer Interaction on p. 26.)

Machine Translation

Machine translation (MT) technology is increasingly crucial due to the global use of the Internet for information exchange. Our MT research effort has three principal focuses: fully automatic machine translation algorithms, methods of assisting the manual translation process by source-side linguistic annotation, and corpus-based statistical machine translation algorithms specifically tuned for cross-lingual information retrieval.

Several of our translation systems take advantage of the Slot Grammar structure that we have developed to produce more accurate machine translation. We are currently developing broad-coverage Slot Grammar parsers in several European and Asian languages. Our English Slot Grammar parser is also the basis for grammar checking and controlled language checking, where the aim is to help writers produce clearer and more translatable text.

Please contact Paridhi Verma to obtain copies of the Computer Science Brochure.