Knowledge Management
Information Organization and Retrieval
We are currently working on three aspects of information organization and retrieval.
Taxonomy Maintenance: Traditionally, taxonomies have been built and maintained manually. As they grow in size and complexity, however, they become extremely difficult to maintain and update. Manual maintenance is also cumbersome because taxonomies evolve over time, and nodes may be merged or split. For example, we might initially place "humans" and "apes" under "mammals". Later, when many more mammals have been added, we may decide to add another level within the "mammals" group and place "humans" and "apes" under "primates", which in turn falls under "mammals".
At IRL, we are developing algorithms that automatically populate and maintain taxonomies. Document classification is a simple application of automatic population of taxonomies. Our approach is to define the notion of a state for the taxonomy and to minimize the entropy associated with that state while adding new items and concepts. We use a confidence measure for each insertion (classification) so that a human can intervene whenever the classification is ambiguous.
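To make the idea of entropy-minimizing insertion concrete, here is a minimal sketch in Python. It is not the IRL algorithm: the state measure (size-weighted term entropy per node), the confidence score (relative gap between the best and second-best node), and the names `state_entropy`, `insert`, and `threshold` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the term distribution in a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def state_entropy(taxonomy):
    """Assumed state measure: each node's term entropy, weighted by node size."""
    return sum(sum(counts.values()) * entropy(counts)
               for counts in taxonomy.values())


def insert(taxonomy, doc_terms, threshold=0.1):
    """Try the document in every node and keep the lowest-entropy placement.

    Expects at least two nodes. Returns (best_node, confidence, needs_review).
    The taxonomy is updated only when the insertion is confident enough;
    otherwise it is left unchanged so that a human can decide.
    """
    doc = Counter(doc_terms)
    costs = {}
    for node in taxonomy:
        trial = dict(taxonomy)              # shallow copy of the state
        trial[node] = taxonomy[node] + doc  # hypothetical placement
        costs[node] = state_entropy(trial)
    ranked = sorted(costs, key=costs.get)
    best, runner_up = ranked[0], ranked[1]
    # Assumed confidence: relative entropy gap to the second-best node.
    gap = costs[runner_up] - costs[best]
    confidence = gap / costs[runner_up] if costs[runner_up] > 0 else 0.0
    needs_review = confidence < threshold
    if not needs_review:
        taxonomy[best] = taxonomy[best] + doc
    return best, confidence, needs_review


taxonomy = {
    "primates": Counter(["ape", "human", "primate", "ape", "human"]),
    "rodents": Counter(["mouse", "rat", "rodent", "mouse"]),
}
print(insert(taxonomy, ["ape", "human"]))  # terms already seen under one node
print(insert(taxonomy, ["zebra"]))         # unseen term, likely ambiguous
```

A document whose terms already dominate one node adds little entropy there and a lot elsewhere, so the gap is wide and the insertion goes through. A document with unseen terms raises entropy by a similar amount everywhere, so the gap is narrow and the item is flagged for human review.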
Query Refinement and Disambiguation: To answer a query, the system needs to navigate large hierarchies or taxonomies such as Internet directories, library catalogues, and product catalogues. Query refinement and disambiguation tools determine which part of the hierarchy is relevant to the user's query by seeking relevance feedback. Our approach to query disambiguation is to generate a compact representation of all contexts of the query from all documents that are possibly relevant to it. The user can then choose a particular context, thereby clarifying the query, and the system continues the search within that context.
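The disambiguation loop described above can be sketched as follows. This is an illustrative assumption, not the IRL method: here a "context" is simply a term that occurs within a small window of the query term, the compact representation is the `top_k` most widely shared such terms, and the function names `contexts` and `refine`, the stopword list, and the sample documents are all hypothetical.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "for", "by"}


def contexts(query, documents, window=2, top_k=3):
    """Map each frequent neighbouring term to the documents where it
    appears within `window` tokens of the query term."""
    by_term = defaultdict(set)
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok != query:
                continue
            for neighbour in tokens[max(0, i - window): i + window + 1]:
                if neighbour != query and neighbour not in STOPWORDS:
                    by_term[neighbour].add(doc_id)
    # Compact representation: keep the contexts shared by the most documents
    # (ties broken alphabetically so the result is deterministic).
    ranked = sorted(by_term, key=lambda t: (-len(by_term[t]), t))[:top_k]
    return {term: by_term[term] for term in ranked}


def refine(candidates, chosen_context):
    """Continue the search within the context the user selected."""
    return sorted(candidates[chosen_context])


documents = {
    "d1": "rainforest jaguar habitat",
    "d2": "jaguar rainforest conservation",
    "d3": "jaguar car dealership",
    "d4": "used jaguar car prices",
}
options = contexts("jaguar", documents, top_k=2)
print(sorted(options))         # contexts offered to the user
print(refine(options, "car"))  # documents kept after the user picks "car"
```

Presenting only the few most widely shared contexts keeps the relevance-feedback step short: the user makes a single choice, and the search continues over the much smaller set of documents attached to that context.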