Palm Pirate: Advanced Text Search on Palm Pilot
PDAs have been rising in popularity in the last few years.
Although earlier versions were plagued by memory constraints,
recent improvements in storage technologies have eased these constraints.
In particular, newer Palm devices already have 8 MB of storage, and advanced
storage technologies (such as IBM Microdrive) allow for hundreds of
megabytes of storage on a PDA. In light of these developments, it is only
natural that users are starting to use PDAs as reference tools: containers
for medium- to large-size collections of searchable and browsable documents.
However, PDA CPU technology still lags far behind. In fact, PDA CPUs are
still orders of magnitude slower than workstation CPUs because of battery
and heat considerations (Palm Pilot CPU speeds are in the range of 16-20
MHz). Consequently, as document collections become larger, text search
applications that perform a sequential search over the collection
(e.g., Palm's built-in Find utility) become too slow. Moreover, most
existing PDA search applications lack features that are available in
state-of-the-art search engines:
- Morphological analysis
- Ranking of results
- Indexing for fast retrieval of results
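To illustrate what morphological analysis buys a search engine, here is a minimal sketch that maps inflected forms to a shared base so that a query for "search" also matches "searching". It is a crude suffix stripper, not the system's analyzer: the suffix list and the functions `normalize` and `tokens` are hypothetical, and a real morphological analyzer would rely on a lexicon and language-specific rules.

```python
import re

# Hypothetical suffix list for illustration only; a real morphological
# analyzer would use a lexicon and proper language rules.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(word: str) -> str:
    """Reduce a word to a crude base form so inflected variants match."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters of stem to avoid over-stripping.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def tokens(text: str) -> list[str]:
    """Split text into alphanumeric words and normalize each one."""
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9]+", text)]
```

For example, `tokens("Searching indexed documents")` yields `["search", "index", "document"]`, so documents and queries that use different inflections of the same word end up sharing index terms.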
This demonstration shows a system for fast, advanced, index-based text
search on Palm devices, developed at the IBM Research Lab in Haifa, Israel.
The system consists of two main components:
- An indexing component (that runs on a server or on a user's workstation),
which reads a document collection, creates a full index of it, and
bundles the document collection with the index in Palm database format.
While state-of-the-art systems typically need about 60% of the size of the
original collection to store their indices, our system goes well below this,
requiring only about 25%-30% thanks to various optimizations. The databases
containing the collection and the index are transferred to the Palm device
via a simple synchronization.
- A query component running on the Palm device. This is essentially a Palm
application that enables the user to manage, browse, and search document
collections indexed by the indexing component. A user can select a document
collection and search it by executing a free-text query. The query component
analyses the query and, using the index, returns a ranked list of matching
documents. The user can then browse the documents, and matching words can be
highlighted. The retrieval response time (i.e., the time between query
submission and display of the results) is very fast, averaging 1 second for
a collection of 3000 documents (faster than many desktop search systems).
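The division of labor between the two components can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the abstract does not say which index optimizations or ranking function are used, so the sketch swaps in two common textbook choices, gap-encoded postings stored as variable-byte integers (one way to shrink an index) and TF-IDF scoring. All function names are hypothetical, and the Palm database packaging step is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# --- Indexing component (server/workstation side) -----------------------

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric words; no morphological analysis here."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vbyte_encode(numbers: list[int]) -> bytes:
    """Variable-byte encoding: 7 data bits per byte, most significant
    group first, high bit set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        groups = [n & 0x7F]
        n >>= 7
        while n:
            groups.append(n & 0x7F)
            n >>= 7
        groups.reverse()
        groups[-1] |= 0x80
        out.extend(groups)
    return bytes(out)


def vbyte_decode(data: bytes) -> list[int]:
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # last byte of this number
            numbers.append(n)
            n = 0
    return numbers


def build_index(docs: list[str]) -> dict[str, bytes]:
    """word -> compressed postings of (doc-ID gap, term frequency) pairs."""
    postings: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id, text in enumerate(docs):  # IDs ascend, so lists stay sorted
        for word, tf in Counter(tokenize(text)).items():
            postings[word].append((doc_id, tf))
    index = {}
    for word, entries in postings.items():
        flat, prev = [], 0
        for doc_id, tf in entries:
            flat += [doc_id - prev, tf]  # small gaps usually fit in one byte
            prev = doc_id
        index[word] = vbyte_encode(flat)
    return index

# --- Query component (device side) --------------------------------------

def read_postings(blob: bytes) -> list[tuple[int, int]]:
    """Undo the gap encoding, yielding (doc_id, tf) pairs."""
    nums = vbyte_decode(blob)
    result, doc_id = [], 0
    for gap, tf in zip(nums[0::2], nums[1::2]):
        doc_id += gap
        result.append((doc_id, tf))
    return result


def rank(query: str, index: dict[str, bytes],
         num_docs: int) -> list[tuple[int, float]]:
    """TF-IDF scores for documents reached through the query terms'
    postings; documents containing no query term are never touched."""
    scores: dict[int, float] = defaultdict(float)
    for word in set(tokenize(query)):
        if word not in index:
            continue
        postings = read_postings(index[word])
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += (1 + math.log(tf)) * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For the collection `["Palm search on Palm devices", "Desktop search engines", "Palm Pilot hardware"]`, `rank("palm search", build_index(docs), 3)` orders the documents 0, 1, 2: document 0 contains both terms (and "palm" twice), while documents 1 and 2 each match one term with equal weight. Unlike a sequential scan, the query cost depends on the length of the query terms' postings lists rather than on the total size of the collection.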
We intend to demonstrate the capabilities of our system by indexing all of
the conference abstracts ahead of time and packaging our Palm client together
with the abstract collection and its index. The package will be made available
for download from the conference Web site prior to the conference date.
[Figure: A typical usage of the query component.]
Information Retrieval & Organization group at IBM Research Lab in Haifa