
UIMA (Unstructured Information Management Architecture)


UIMA: the middleware for text analytics

In document search and text mining systems, we have to analyze unstructured content. Analyzing unstructured data usually involves a variety of natural language technologies, including tokenization, parsing, and named entity extraction. To use each processing module, we need detailed knowledge of its underlying technology, and many components with the same function have been developed. To make components easy to reuse and integrate, IBM Research developed UIMA (Unstructured Information Management Architecture), an infrastructure for building UIM (Unstructured Information Management) applications, and released the UIMA SDK on alphaWorks, IBM's repository of exploratory software. Since 2006, UIMA has been distributed as open-source software through the Apache project.

UIMA defines the data structure that stores both the original information and the extracted information as the CAS (Common Analysis Structure), and it defines the interface of a processing module as the AE (Analysis Engine). If a developer implements a module using this data structure and interface, making it UIMA compliant, other developers can reuse that module on UIMA and integrate it into their own applications.
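To make this division of labor concrete, the following is a minimal conceptual sketch in Python. It is not the Apache UIMA API: the real UIMA SDK is a Java framework with its own CAS and annotator classes, type systems, and XML descriptors. The names `CAS`, `Annotation`, `AnalysisEngine`, `WhitespaceTokenizer`, `CapitalizedNameFinder`, and `run_pipeline` are hypothetical simplifications that only illustrate the idea of a shared data structure plus a common component interface.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Annotation:
    """One piece of extracted information: a typed span over the original text."""
    type_name: str
    begin: int
    end: int


@dataclass
class CAS:
    """Hypothetical stand-in for a CAS: the original text plus all extracted annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, ann: Annotation) -> None:
        self.annotations.append(ann)

    def select(self, type_name: str) -> List[Annotation]:
        # Return every annotation of the given type, in insertion order.
        return [a for a in self.annotations if a.type_name == type_name]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


class AnalysisEngine(Protocol):
    """Hypothetical common interface: every component reads and enriches a CAS."""
    def process(self, cas: CAS) -> None: ...


class WhitespaceTokenizer:
    """Toy tokenizer: adds a Token annotation for each whitespace-separated word."""
    def process(self, cas: CAS) -> None:
        pos = 0
        for word in cas.text.split():
            begin = cas.text.index(word, pos)  # character offset of this word
            end = begin + len(word)
            cas.add(Annotation("Token", begin, end))
            pos = end


class CapitalizedNameFinder:
    """Toy entity extractor: marks capitalized tokens as Name annotations.
    It depends only on Token annotations in the CAS, not on the tokenizer's code."""
    def process(self, cas: CAS) -> None:
        for tok in cas.select("Token"):
            if cas.covered_text(tok)[:1].isupper():
                cas.add(Annotation("Name", tok.begin, tok.end))


def run_pipeline(engines: List[AnalysisEngine], text: str) -> CAS:
    """Create a CAS for the text and pass it through each engine in order."""
    cas = CAS(text)
    for engine in engines:
        engine.process(cas)
    return cas


if __name__ == "__main__":
    cas = run_pipeline([WhitespaceTokenizer(), CapitalizedNameFinder()],
                       "IBM released UIMA in Tokyo")
    print([cas.covered_text(a) for a in cas.select("Name")])
    # → ['IBM', 'UIMA', 'Tokyo']
```

Because the name finder reads only `Token` annotations from the shared CAS, the toy tokenizer could be swapped for any other component that produces `Token` annotations without touching the name finder. This is the kind of reuse and integration that a common data structure and interface are meant to enable.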

IBM Research - Tokyo is building a text mining system on UIMA and developing a base system that can process documents efficiently.