James W. Cooper
IBM Thomas J. Watson Research Center
An expanded version of this discussion was presented in a paper with Roy J. Byrd at ACM Digital Libraries '97 in Philadelphia, July 1997.
What is Lexical Navigation?
We have designed a document search and retrieval system, termed Lexical Navigation, which provides an interface allowing a user to expand or refine a query based on the actual content of the collection. In this work we have designed a client-server system written in Java to allow users to issue queries, have additional terms suggested to them, explore lexical relationships, and view documents based on keywords they contain. Lexical networks containing domain-specific vocabularies and relationships are automatically extracted from the collection and play an important role in this navigation process. The Lexical Navigation methodology constitutes a powerful set of tools for searching large text collections.
The Lexical Navigator Client
Based on the technology described throughout the IRgroup web site, we have developed the LexNav client user interface, which illustrates the behavior and use of the context thesaurus and lexical network. Working with users and other IR experts in the group, we designed LexNav to provide an interface to most of the tools this group has developed. We felt it was important that the user interface look professional but simple, and that it run on common platforms like Windows as well as on some of our developers' Unix workstations.
Accordingly, we developed the interface program in Java, and after some experimentation we decided that it should be presented as a Java applet in a Web browser rather than as a stand-alone application, thus making the entire search system available for testing by users on our company-wide intranet.
Java processes running as applets embedded in Web pages have significant security restrictions to prevent foreign applets downloaded with Web pages from doing mischief to the user's computer system. However, such applets are still allowed to make TCP/IP socket connections back to the HTTP server machine, and this provides the client-server communication pathway.
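As a minimal sketch of that pathway, the client side could look like the code below. The wire format is not described here, so the sketch assumes a simple line-oriented text protocol, and the names `LexNavConnection`, `formatRequest`, and `send` are hypothetical. In an applet, the host passed to `send` would have to be the machine that served the page, since that is the only connection the sandbox permits.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side helper; the command syntax is an assumption.
final class LexNavConnection {

    // Builds a single request line: the command followed by space-separated terms.
    static String formatRequest(String command, List<String> terms) {
        return command + " " + String.join(" ", terms);
    }

    // Sends one request line and collects reply lines until the server closes the stream.
    static List<String> send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(request); // auto-flush is enabled, so the line is sent immediately
            List<String> reply = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                reply.add(line);
            }
            return reply;
        }
    }
}
```

Opening one socket per request keeps the sketch short; a real client might hold the connection open for the whole session instead.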
The Java Server
| Information is retrieved by sending each query to a server process, also written in Java but running on an IBM RS-6000. This server process, in turn, calls any of a number of C programs that query the various components of the LexNav system. Each instance of the server provides a separate connection to the retrieval system and recognizes six commands. Each command calls a program with arguments taken from the command and receives that program's data through standard output. The server instance then formats these data and returns them to the query process on the client workstation. | ![]() |
The commands each server instance supports are:
- call the context thesaurus
- call the relationships graph program
- call the text search engine
- fetch and format the document
- identify each document's keywords and mark them in a copy of the document
- fetch the keyword-marked document
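The dispatch step can be sketched as a table that maps each command to an external program, runs it, and captures its standard output. This is only an illustration under stated assumptions: the command names (`THESAURUS`, `RELATIONS`, and so on), the program paths, and the class name `CommandDispatcher` are all hypothetical, and the real command syntax is not given here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical server-side dispatcher: one entry per supported command.
final class CommandDispatcher {
    private final Map<String, String> programs = new LinkedHashMap<>();

    CommandDispatcher() {
        // Command names and program paths are invented for this sketch.
        programs.put("THESAURUS", "./context_thesaurus");
        programs.put("RELATIONS", "./relationship_graph");
        programs.put("SEARCH", "./text_search");
        programs.put("FETCH", "./fetch_document");
        programs.put("MARK", "./mark_keywords");
        programs.put("FETCHMARKED", "./fetch_marked_document");
    }

    // Turns a request line into the argument vector for the external program.
    List<String> buildCommandLine(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        String program = programs.get(parts[0]);
        if (program == null) {
            throw new IllegalArgumentException("Unknown command: " + parts[0]);
        }
        List<String> argv = new ArrayList<>();
        argv.add(program);
        for (int i = 1; i < parts.length; i++) {
            argv.add(parts[i]);
        }
        return argv;
    }

    // Runs the program and returns everything it wrote to standard output.
    String run(String requestLine) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommandLine(requestLine))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr out of the reply
                .start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return output.toString();
    }
}
```

Passing the arguments as a list rather than a single shell string means query terms are never interpreted by a shell, which matters when the terms come straight from a user.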
The Lexical Navigator Client
| Here we show the main Lexical Navigation interface. You can enter queries in the multi-line text box at the top and then either click on the query button directly to inspect document titles in the right hand list box, or you can click on the "related concepts" button, which calls the context thesaurus server process to return terms related to the query terms. | |
| Here, we show the context thesaurus keyword results of the query "airbus" on the 1988 Wall Street Journal collection. Terms from the context thesaurus are computed to have at least a strong unidirectional relationship to the query terms. | ![]() |
Once LexNav has identified these additional terms, you can add any of them to the query. You can also investigate them further by asking what relationships exist between these (sometimes surprising) terms and other terms you expect to find in this collection.
![]() |
Here, we show these relationships for several of the query terms, which explains why some of the terms were found to be relevant. Two kinds of relationships are shown: named and unnamed. A named relationship shows the type of relation between the terms, and its name appears in the middle column. An unnamed relationship shows only the strength of the relation, represented in this display by one, two or three asterisks. |
![]() |
The interface allows you to add any marked term to the query at this point or to navigate further in relationship space by selecting one or more terms and clicking on the "Search" button. We see the results of further navigation here, where we have moved outward one relationship level to find terms related to the selected terms. Again, you can add these terms to the query or migrate further. |
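The two kinds of relationship and the one-level outward navigation described above can be sketched with a small in-memory network. This is an illustration only: the class and field names are hypothetical, the strength is assumed to be already bucketed into the range 1 to 3, and links are treated as navigable in both directions, which may not match the actual index.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One link in the lexical network. Names are hypothetical.
final class Relation {
    final String from;
    final String to;
    final String name;  // null for an unnamed relationship
    final int strength; // assumed to be bucketed into 1..3

    Relation(String from, String to, String name, int strength) {
        this.from = from;
        this.to = to;
        this.name = name;
        this.strength = strength;
    }

    // Middle-column text: the relation name if present, otherwise one to three asterisks.
    String label() {
        if (name != null) {
            return name;
        }
        int stars = Math.max(1, Math.min(3, strength));
        return "***".substring(0, stars);
    }
}

final class LexicalNetwork {
    private final List<Relation> relations = new ArrayList<>();

    void add(Relation relation) {
        relations.add(relation);
    }

    // Moves outward one level: every term directly linked to a selected term,
    // excluding the selected terms themselves.
    Set<String> expand(Collection<String> selected) {
        Set<String> found = new LinkedHashSet<>();
        for (Relation r : relations) {
            if (selected.contains(r.from) && !selected.contains(r.to)) {
                found.add(r.to);
            }
            if (selected.contains(r.to) && !selected.contains(r.from)) {
                found.add(r.from);
            }
        }
        return found;
    }
}
```

Calling `expand` again on the returned terms corresponds to navigating one more level outward.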
In addition, you can view these relationships graphically by clicking on the "Plot" button, which produces the display shown here for a query on "Oliver North."

Terms in the yellow boxes can be expanded further by double clicking on them to show additional terms, thus allowing the user to discover all of the relationships for each additional term. This represents an additional, powerful way of navigating through concept space: to discover distant, but significant relationships between terms.
The relationships between terms are also represented in this plot, with either the relationship name or the strength of the relationship shown along the line. Should the plot become too cluttered, you can drag any box to a new position. You can also regroup terms into a single box or erase them completely.
Searching for Documents
| We have now narrowed our original query using the context thesaurus and terms from the relationships index and will use this refined query to search the collection. The result of this search is shown here, where right-clicking on any document title in the list will produce a list of the search terms contained in that document. | ![]() |
Clicking on the "View" button produces this display.

This Web page is constructed dynamically by the server. It consists of a frame whose left third is a Java applet and whose right two thirds is an HTML-wrapped version of the Tipster1 format data, with the search terms converted to boldface.
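Converting search terms to boldface while wrapping plain text in HTML could be done along the following lines. This is a sketch, not the server's actual code: `TermHighlighter` and its methods are hypothetical names, matching is assumed to be case-insensitive on whole words, and the Tipster markup itself is not handled.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that HTML-escapes document text and bolds the search terms.
final class TermHighlighter {

    static String boldTerms(String text, List<String> terms) {
        // Build one alternation pattern from the non-empty terms, quoting each literally.
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (term.isEmpty()) {
                continue;
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        if (alternation.length() == 0) {
            return escape(text);
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);

        // Match against the raw text, escaping each piece as it is copied out,
        // so that a term can never match inside an entity such as &amp;.
        Matcher matcher = pattern.matcher(text);
        StringBuilder html = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            html.append(escape(text.substring(last, matcher.start())));
            html.append("<b>").append(escape(matcher.group())).append("</b>");
            last = matcher.end();
        }
        html.append(escape(text.substring(last)));
        return html.toString();
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```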
The list box at the top left is a keyword-in-context display made from a custom Java control. Clicking on any line in this KWIC display moves the document in the right frame to that point. The list box in the lower left contains a list of keywords discovered in that document. Finally, clicking on the "Show Keywords" button at the lower left displays the document with the discovered keywords highlighted.
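A keyword-in-context list of this kind can be built by recording, for each occurrence of a keyword, its character offset and a short window of surrounding text. The sketch below assumes a fixed-width character window and case-insensitive matching; the names `Kwic` and `KwicLine` are hypothetical. The stored offset is what a click handler would need in order to scroll the document frame to that point.

```java
import java.util.ArrayList;
import java.util.List;

// One row of the keyword-in-context display.
final class KwicLine {
    final int offset;  // position of the keyword in the document
    final String text; // keyword with surrounding context

    KwicLine(int offset, String text) {
        this.offset = offset;
        this.text = text;
    }
}

final class Kwic {

    // Returns one line per occurrence of the keyword, with up to `window`
    // characters of context on either side.
    static List<KwicLine> build(String document, String keyword, int window) {
        List<KwicLine> lines = new ArrayList<>();
        if (keyword.isEmpty()) {
            return lines;
        }
        int length = keyword.length();
        for (int at = 0; at + length <= document.length(); at++) {
            // regionMatches compares in place, so offsets always refer to the original text.
            if (document.regionMatches(true, at, keyword, 0, length)) {
                int start = Math.max(0, at - window);
                int end = Math.min(document.length(), at + length + window);
                lines.add(new KwicLine(at, document.substring(start, end).replace('\n', ' ')));
                at += length - 1; // skip past this occurrence
            }
        }
        return lines;
    }
}
```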
In summary, the interface provides a simple way to explore a number of complex relationships. Starting from an original, possibly vague query, it elicits responses from users that help them focus the query on specific areas within a large collection.