The Unstructured Information Management Architecture (UIMA) is a software architecture and an associated development framework that support the creation, composition, and deployment of text and multimodal analytics. These analytics are the fundamental building blocks for automatically discovering business-critical knowledge buried in large volumes of unstructured information, such as natural language documents, Web documents, e-mails, chat logs, and speech and video recordings. Applications that stand to benefit include enterprise search, bioinformatics, business intelligence, national security, customer relationship management, and learning and education.
The paper by Ferrucci and Lally provides an overview of UIMA and illustrates the steps involved in building a simple application. It also describes the ways in which UIMA has been widely adopted throughout IBM and beyond. UIMA is the basis of WebSphere® Information Integrator OmniFind™ Edition, an IBM product that provides an enterprise search engine and a platform for building solutions based on text analytics.
As part of the UIMA project, IBM has developed the UIMA Java® Framework and contributed it to the open-source community. The UIMA Software Development Kit has been posted on the alphaWorks™ Web site, from which it has been downloaded by more than 8000 users. UIMA is being used by government institutions such as DARPA, the United States Army, and the Department of Homeland Security. A number of commercial text-analysis vendors have used it to wrap and integrate analytics into commercial products, including UIMA-compliant plug-ins. It is also used for bioinformatics research by the Mayo Clinic College of Medicine and the Sloan Kettering Cancer Institute.
The UIMA Working Group, a consortium of companies and universities committed to the exploration of UIMA, was formed in 2005. As a result of the group's activities, some existing resources (such as the open-source OpenNLP toolkit) were integrated with UIMA. As a member of the group, Carnegie Mellon University became actively involved in adapting UIMA for such tasks as multi-engine machine translation (GALE project) and large-scale annotation for question answering (JAVELIN project). Carnegie Mellon University has also undertaken the creation of an open-source component repository for UIMA.