CLEVER Search Technology

Clever enables search engines to automatically identify and list the most relevant content sites they have crawled on the web. Currently this has been achieved only through human intervention, on sites like Yahoo!, which manually collect and list high-quality pages on a number of topics.

With current search engines, a user who enters the keyword "fishing" is likely to receive a list of a million pages on today's web. An engine utilizing the Clever algorithm returns a list of thirty recommended pages about fishing in general, filtering out enormous numbers of irrelevant pages (e.g., pages containing "fishing for compliments"), low-usefulness pages (e.g., pages about "fishing in the south of Medford, North Dakota", which would be appropriate responses only to a more specific query), and low-quality pages (e.g., the large number of advertising pages offering the same products for the same prices).

Clever builds on the HITS (Hypertext-Induced Topic Search) algorithm developed at IBM’s Almaden Research Lab in San Jose, CA. HITS finds authoritative sources of information on the web (Authorities), together with sites that feature compilations of such authoritative sources (Hubs). The HITS algorithm first uses a standard text search engine to gather a "root set" of pages matching the query subject. Thereafter, the algorithm uses only the links between these pages to distill the best authorities and hubs. These links capture and organize the effort of millions of individuals independently building web pages. Clever additionally uses the content of the web pages, exploiting not only the link structure (as in HITS) but also the text and other properties of the pages being distilled.
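
The link-only step of HITS described above can be illustrated with a small sketch. It assumes the root set has already been gathered by a text search and is given as a dictionary of links; the function names (hits, normalize), the page names, and the fixed iteration count are hypothetical choices made for illustration, not taken from the Clever system. The sketch covers only the hub and authority computation and does not attempt Clever's additional use of page content.

```python
from math import sqrt

def normalize(scores):
    """Scale scores so their squares sum to 1 (hypothetical helper)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}

def hits(links, iterations=50):
    """Sketch of the HITS hub/authority iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub_scores, authority_scores) as dicts keyed by page.
    """
    # Every page that appears as a source or as a link target.
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores
        # of the pages that link to it.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # A page's hub score is the sum of the authority scores
        # of the pages it links to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, []))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth

# Hypothetical root set: two link collections and three content sites.
links = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB", "siteC"],
    "siteA": [],
    "siteC": ["siteA"],
}

hub, auth = hits(links)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

In this toy graph the page that the most link collections point to ends up with the highest authority score, and the collection pointing to the most authorities ends up as the best hub, which is the mutual reinforcement the paragraph describes.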

future applications