IBM Haifa researchers help organizations and users search for ideas, not just keywords
IBM Haifa Labs News Center
If we're searching for knowledge and ideas, why do we keep using keywords to look for information? This question has spurred extensive work in text analytics, a powerful technology that lets users tap into "unstructured" information and search for concepts rather than keywords. Unstructured information is found in a variety of sources, including images, audio and video files, blogs, and e-mail.
The Unstructured Information Management Architecture (UIMA), developed by IBM Research and available as open source, can process this unstructured information to uncover the latent meaning, relationships, and relevant facts buried within. UIMA enables software to search and make sense of these disparate forms of data, offering users a more conceptual search.
IBM has made building text analytics applications easier by embedding UIMA into WebSphere Information Integrator OmniFind Edition, the first commercially available software platform for processing content based on the UIMA standard. OmniFind also incorporates retrieval algorithms and capabilities developed at the Haifa Research Lab. These capabilities extend the UIMA platform, functioning as a back end that allows the annotations produced by text analytics to be indexed and queried so the information can be retrieved. "OmniFind takes text analytics one step further and enables easy development of applications that can index, query, and retrieve the knowledge," notes Ronny Lempel, Manager of the Information Retrieval group of the IBM Haifa Lab.
Collections of documents are generally indexed and can then be searched using either a specialized query language or a combination of keywords. Text analytics gives structure to unstructured content by tagging key concepts hidden within text, such as persons, organizations, and events, along with the relationships between them. While conventional search looks document by document, text analytics looks at relationships among multiple types of information. Text analytics can also search for concepts and facts and understand them within their unstructured context. For example, if a user searches for 'leaders of the world', the system will retrieve information on presidents and prime ministers, even though the user-supplied query did not contain these terms.
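The 'leaders of the world' example can be illustrated with a minimal Python sketch. This is not the UIMA or OmniFind API. The lexicon, the query-to-concept mapping, the sample documents, and all function names are hypothetical and chosen only for illustration; a real system would use far richer annotators than a hand-written dictionary. The sketch tags documents with concepts, indexes the tags, and then answers a query by concept rather than by matching its words.

```python
from collections import defaultdict

# Hypothetical lexicon: surface phrase -> concept tag. A real annotator would
# rely on trained models or curated resources, not a hand-written dictionary.
CONCEPT_LEXICON = {
    "president": "world_leader",
    "prime minister": "world_leader",
    "warranty claim": "product_issue",
}

# Hypothetical mapping from a query phrase to the concept it expresses.
QUERY_CONCEPTS = {
    "leaders of the world": "world_leader",
}


def annotate(text):
    """Return the set of concept tags whose surface phrases occur in text."""
    lowered = text.lower()
    return {concept for phrase, concept in CONCEPT_LEXICON.items() if phrase in lowered}


def build_index(docs):
    """Build an inverted index: concept tag -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index


def concept_search(index, query):
    """Look up documents by the concept a query maps to, not by its words."""
    concept = QUERY_CONCEPTS.get(query.lower().strip())
    if concept is None:
        return []
    return sorted(index.get(concept, set()))


# Made-up sample documents; none contains the words "leaders" or "world".
DOCS = {
    "d1": "The president met with trade delegates on Tuesday.",
    "d2": "The prime minister announced a new budget.",
    "d3": "The committee reviewed shipping schedules.",
}

INDEX = build_index(DOCS)
RESULTS = concept_search(INDEX, "leaders of the world")
print(RESULTS)
```

The query shares no keywords with the first two documents, yet both are returned because they carry the same concept tag that the query maps to. The third document has no matching tag and is left out.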
Text analytics is already proving crucial in early warning systems, call centers, and medical applications, where these solutions are being used to discover relationships between various types of information. One company recently used the UIMA platform to develop a text mining solution that enables auto manufacturers to process unstructured information from warranty claims, maintenance records, repair requests, and call center logs. The information gathered is being used to provide early warning of product problems. Another company developed a set of text analytics components to uncover hidden patterns and identify potential criminal or terrorist activity. The solution analyzes data such as field analyst reports, ship manifests, and surveillance transcripts, along with public records, news articles, publications, and financial transactions.
"We're doing a lot of work in Haifa to enable advanced semantic search through knowledge and ideas, rather than keywords," states Lempel. Haifa's semantic search technology has proved itself in a number of international competitions, such as INEX, where it is being used to search and retrieve semi-structured information from XML documents. The group is also actively involved in the W3C XML Query (XQuery) working group's forum on full-text search, where the XML query language developed in Haifa could help shape the world standard for querying semi-structured data over the web.
UIMA's development was advanced through work with the Defense Advanced Research Projects Agency (DARPA), the central research and development organization for the U.S. Department of Defense. Several leading universities and industrial research and development organizations also contributed to UIMA technology. Some of the participating universities, such as Carnegie Mellon, Columbia, Stanford, and the University of Massachusetts, are already using UIMA in courses and research projects. Other organizations actively supporting and using UIMA include Science Applications International Corp., BBN Technologies, the Mayo Clinic, and MITRE Corp.
More than 15 software vendors have announced commercial adoption of UIMA. These companies are expected to deliver UIMA-compliant software, solutions, or services to address various industry- and application-specific requirements.