Have you ever typed in a keyword search query, gotten hundreds of thousands of documents back, and wondered how you could ever sift through all those hits, hit list after hit list?
A powerful thing you can do with the results of UIMA analysis is to enable more effective search systems – systems that can more precisely target your intended interest.
Semantic Search is a class of document retrieval that allows the user to exploit the results of UIMA analysis to create much more effective queries, queries that can home in on exactly what you are looking for.
OmniFind search and, on a smaller scale, the semantic search engine included with the UIMA SDK can exploit the additional information from the UIMA CAS to implement more powerful and precise queries.
For example, imagine a user is looking for documents that mention an organization with “center” in its name, but is not sure of the full or precise name of the organization.
A keyword search on “center” would likely produce far too many documents because “center” is a common and ambiguous term. Our semantic search engine supports a query language called XML Fragments. This query language is designed to exploit UIMA’s CAS annotations entered in the search engine’s index. The XML Fragment query, for example,
<organization> center </organization>
will return, at the top of the hit list, only documents in which “center” appears as part of a phrase annotated as an organization by a named-entity recognizer. This hit list will be a much shorter list of documents more precisely matching the user’s interest.
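The matching idea behind such a query can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the search engine's actual implementation: the `Annotation` class, the `annotate` helper, the `matches_fragment` function, and the sample documents are all hypothetical, and annotations are modeled simply as typed character spans over the document text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an indexed annotation: a typed text span."""
    type: str   # e.g. "organization"
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive


def annotate(text, phrase, ann_type):
    """Build an annotation covering the first occurrence of `phrase`."""
    begin = text.index(phrase)
    return Annotation(ann_type, begin, begin + len(phrase))


def matches_fragment(text, annotations, ann_type, keyword):
    """True if `keyword` occurs as a whole word inside a span of `ann_type`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return any(
        a.type == ann_type and pattern.search(text[a.begin:a.end]) is not None
        for a in annotations
    )


# Illustrative documents; the annotations stand in for named-entity output.
d1 = "The Center for Applied Research opened today."
d2 = "Turn left at the shopping center."
d3 = "Fred Center spoke at the meeting."
docs = {
    "d1": (d1, [annotate(d1, "Center for Applied Research", "organization")]),
    "d2": (d2, []),
    "d3": (d3, [annotate(d3, "Fred Center", "person")]),
}

# Rough equivalent of the query <organization> center </organization>
hits = [doc_id for doc_id, (text, anns) in docs.items()
        if matches_fragment(text, anns, "organization", "center")]
print(hits)  # ['d1']
```

All three documents contain the word “center”, so a plain keyword search would return every one of them; restricting the match to organization spans leaves only the first.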
Consider taking this a step further. We can add a relationship recognizer to the UIMA pipeline that annotates mentions of the “CEO of” relationship. We can then configure the CAS Consumer so that it sends these new relationship annotations to the semantic search index as well. With these additional analysis results in the index we can submit queries like
<ceo_of>
<person> center </person>
<organization> center </organization>
</ceo_of>
“Center” is a common word with over 13 different meanings, but this query will zoom in on those documents that contain the word used as the name of a person or in the name of an organization.
Furthermore, it will favor those documents where “Center,” the person, is the “CEO of” an organization that shares the name. The semantic search engine would include as top hits documents with
“…Fred Center, CEO of Center Micros…” or
“…The CEO of Center Systems, Mr. Center…”
By contrast, phrases like “...the center of the circle...” or “...Mr. Center threw the ball to the center of the team…” would not match.[1]
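The ranking behavior described above, where full relationship matches come first and less precise matches rank lower, can be sketched as follows. The scoring scheme here is an assumption made purely for illustration (one point per matching person or organization span, two extra points when both sit inside a single “CEO of” span); it is not the engine's real ranking function, and the `Annotation`, `annotate`, `has_word`, `inside`, and `score` names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an indexed annotation: a typed text span."""
    type: str
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive


def annotate(text, phrase, ann_type):
    """Build an annotation covering the first occurrence of `phrase`."""
    begin = text.index(phrase)
    return Annotation(ann_type, begin, begin + len(phrase))


def has_word(text, ann, word):
    """True if `word` occurs as a whole word inside the annotated span."""
    span = text[ann.begin:ann.end]
    return re.search(rf"\b{re.escape(word)}\b", span, re.IGNORECASE) is not None


def inside(inner, outer):
    """True if `inner` lies entirely within `outer`."""
    return outer.begin <= inner.begin and inner.end <= outer.end


def score(text, anns, word="center"):
    """Assumed scoring: 1 point per matching leaf, +2 for a full nested match."""
    people = [a for a in anns if a.type == "person" and has_word(text, a, word)]
    orgs = [a for a in anns
            if a.type == "organization" and has_word(text, a, word)]
    points = int(bool(people)) + int(bool(orgs))
    for rel in (a for a in anns if a.type == "ceo_of"):
        if any(inside(p, rel) for p in people) and any(inside(o, rel) for o in orgs):
            points += 2
            break
    return points


exact = "Fred Center, CEO of Center Micros, spoke today."
partial = "Mr. Center threw the ball to the center of the team."
miss = "Draw a line through the center of the circle."

corpus = {
    "exact": (exact, [
        annotate(exact, "Fred Center", "person"),
        annotate(exact, "Center Micros", "organization"),
        annotate(exact, "Fred Center, CEO of Center Micros", "ceo_of"),
    ]),
    "partial": (partial, [annotate(partial, "Mr. Center", "person")]),
    "miss": (miss, []),
}

scores = {doc_id: score(text, anns) for doc_id, (text, anns) in corpus.items()}
ranked = sorted((d for d in scores if scores[d] > 0), key=lambda d: -scores[d])
print(ranked)  # ['exact', 'partial']
```

Requiring the full nested match (a score of 4 under this assumed scheme) would drop the partial hit as well, which corresponds to specializing the query so that it returns only exact matches.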
This kind of precision is the power that UIMA plus semantic search can bring to your applications.
[1] The query exactly as shown would also include less precise matches, but rank them lower. The query can be further specialized to return only exact matches such as those suggested here.