The Pen Technologies team at the IBM Thomas J.
Watson Research Center is part of a larger effort at IBM to build
complete end-to-end pen computing solutions. The team has been working
for a number of years on technologies to enable recognition of
multilingual unconstrained on-line handwriting. This state-of-the-art
recognition technology is a component of IBM’s pen computing
platforms.
What is Pen Computing?
Pen computing as a field broadly includes
computers and applications in which a pen is the main input device. This
field continues to draw a lot of attention from researchers because
there are a number of applications where a pen is the most convenient form
of input. These include:
- Preparing a first draft of a document while
concentrating on content creation.
- Capturing information in meetings in a socially
acceptable way: writing is quieter than typing and creates a
minimal visual barrier.
- Applications that need privacy.
- Entering characters in ideographic languages like
Chinese and Japanese, where the size of the character set makes
keyboard-style input cumbersome or infeasible.
- Entering non-text input such as graphics,
music and gestures.
- Interacting with multi-modal systems.
What is On-line Handwriting Recognition?
Pen computing platforms record handwriting
information as a time-ordered sequence of (x, y) points. The problem of
recognizing writing in this case is referred to as on-line handwriting
recognition, as opposed to off-line handwriting recognition where
handwriting information is captured as an image.
The pen input device on these platforms records
the trajectory of the pen tip on the paper as a sequence of points
(x_t, y_t) sampled over time. The set of points between a pen-down and
the next pen-up is called a stroke. The pressure of the pen tip on the paper may
also be used during recognition.
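As an illustration of the stroke definition above, the following minimal sketch groups a stream of pen samples into strokes. The PenSample record, its field names and the split_into_strokes function are hypothetical names chosen for this example; they are not part of any IBM interface, and a real digitizer may report pen-down and pen-up as separate events rather than as a flag on each sample.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PenSample:
    """One reading from the pen device (hypothetical layout)."""
    x: float
    y: float
    t: float         # time stamp of the sample, in seconds
    pen_down: bool   # True while the pen tip touches the surface


# A stroke is the time-ordered list of (x, y, t) points between
# a pen-down and the next pen-up.
Stroke = List[Tuple[float, float, float]]


def split_into_strokes(samples: List[PenSample]) -> List[Stroke]:
    """Group consecutive pen-down samples into strokes."""
    strokes: List[Stroke] = []
    current: Stroke = []
    for s in samples:
        if s.pen_down:
            current.append((s.x, s.y, s.t))
        elif current:
            # The pen was lifted: close the stroke in progress.
            strokes.append(current)
            current = []
    if current:
        # The input ended while the pen was still down.
        strokes.append(current)
    return strokes
```

Each stroke keeps its time stamps, so the order in which the points were written remains available to later processing steps.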
Just as people cannot read a page of writing as a
single unit (we must look at the individual words), recognition software
cannot transcribe all the ink on a page as a single unit. Instead,
the program breaks your handwriting down into manageable pieces that it
can process into letters and words.
The IBM handwriting recognizer first attempts to
sort and collect the electronic ink into a sequence of strokes belonging
to a word or phrase on a single line. The program resizes the ink,
shrinking large and stretching small writing to make it all roughly the
same size. The recognizer then breaks the strokes into smaller
pieces. This allows the recognizer to examine and work with short
segments that are fairly simple in shape and curvature. The
program can then characterize and label each stroke section more easily.
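The resizing and sectioning steps can be sketched roughly as follows. This is an assumption-laden illustration, not the IBM recognizer's actual method: the text does not say where strokes are cut, so the sketch uses one common choice, cutting wherever the vertical direction of the pen reverses. The function names normalize_height and split_at_vertical_extrema are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def normalize_height(strokes: List[Stroke],
                     target_height: float = 1.0) -> List[Stroke]:
    """Scale all ink on a line uniformly so its height equals target_height."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    if not ys:
        return []
    height = max(ys) - min(ys)
    # Flat ink (height 0) cannot be scaled by height; leave its size alone.
    scale = target_height / height if height > 0 else 1.0
    x0, y0 = min(xs), min(ys)
    return [[((x - x0) * scale, (y - y0) * scale) for x, y in stroke]
            for stroke in strokes]


def split_at_vertical_extrema(stroke: Stroke) -> List[Stroke]:
    """Cut a stroke wherever the vertical direction of motion reverses.

    This cut rule is an assumption made for the example.
    """
    if len(stroke) < 3:
        return [stroke]
    segments: List[Stroke] = []
    start = 0
    prev_dir = 0
    for i in range(1, len(stroke)):
        dy = stroke[i][1] - stroke[i - 1][1]
        direction = (dy > 0) - (dy < 0)   # +1 rising, -1 falling, 0 level
        if direction == 0:
            continue
        if prev_dir != 0 and direction != prev_dir:
            # The point before the reversal ends one segment
            # and also starts the next one.
            segments.append(stroke[start:i])
            start = i - 1
        prev_dir = direction
    segments.append(stroke[start:])
    return segments
```

Under this rule a stroke shaped like an inverted "V" yields two simple segments, an upstroke and a downstroke, that share the point at the peak.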
The preceding steps are all in preparation for the
“pattern recognition” phase – the heart of the transcription
system. Here, the program matches the electronic ink, now resized and
sectioned, to several different character-shape models to find the set of
models that fit best. More than one model may exist for each
character, so the program must attempt many matches for many different
combinations. The stroke segments are grouped and regrouped in
many different possible arrangements to find the combination of grouping
and character models that best fit the writing. To aid in this
process, the program usually relies on a word dictionary to limit the
number of attempted matches, as well as built-in knowledge of how
frequently we use certain words. The different shape models for
each character are mathematical “averages” patterned after many,
many examples of writing from many different people.
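The matching idea can be illustrated with a deliberately simplified sketch. It assumes that the ink has already been grouped into one feature vector per character (the real recognizer searches over many groupings, which is omitted here), that a shape model is a plain average of example feature vectors, and that a word's score combines shape distance with the logarithm of its frequency. The function names, the squared-distance cost and the freq_weight parameter are all assumptions made for this example.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def average_model(examples: List[Vector]) -> List[float]:
    """Build a shape model as the component-wise mean of example vectors."""
    n = len(examples)
    return [sum(column) / n for column in zip(*examples)]


def shape_cost(features: Vector, model: Vector) -> float:
    """Squared distance between observed features and a shape model."""
    return sum((f - m) ** 2 for f, m in zip(features, model))


def recognize(char_features: List[Vector],
              models: Dict[str, List[Vector]],
              word_freq: Dict[str, float],
              freq_weight: float = 1.0) -> Optional[str]:
    """Return the dictionary word that best explains the ink, or None."""
    best_word: Optional[str] = None
    best_score = -math.inf
    for word, freq in word_freq.items():
        # The dictionary limits the search: only words of the right
        # length whose characters all have shape models are tried.
        if len(word) != len(char_features):
            continue
        if not all(ch in models for ch in word):
            continue
        # A character may have several models; use the closest one.
        cost = sum(min(shape_cost(f, m) for m in models[ch])
                   for f, ch in zip(char_features, word))
        # Frequent words get a bonus through the log-frequency term.
        score = -cost + freq_weight * math.log(freq)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

When the ink fits two words about equally well, the frequency term decides between them; when the shapes clearly favor one word, the shape cost dominates.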
The quality and usability of handwriting recognition
technology depend on the following three parameters:
- Vocabulary size: Every recognizer uses a
reference vocabulary to aid recognition and will give you the best
match between what you wrote and the models for the words in the
vocabulary. The smaller the vocabulary, the higher the accuracy.
- Writing style: If you ask people to
write in a specific way (like Graffiti), the variability between
writing styles decreases, which increases the quality of recognition.
Usability, on the other hand, degrades with more constraints.
- Trainability: If a recognizer can
customize the models for a particular writing style, the recognition
accuracy will be higher.
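To make the trainability point concrete, here is a minimal sketch of one possible way to customize a shape model for a writer: each new sample from that writer pulls the model a little toward the writer's own shape. The source does not describe how the IBM recognizer adapts its models, so the update rule, the adapt_model name and the rate parameter are assumptions chosen for illustration.

```python
from typing import List, Sequence


def adapt_model(model: Sequence[float],
                writer_sample: Sequence[float],
                rate: float = 0.2) -> List[float]:
    """Move a shape model a fraction `rate` of the way toward a writer's sample."""
    return [(1 - rate) * m + rate * s for m, s in zip(model, writer_sample)]
```

Applying the update repeatedly with samples from the same writer moves the generic model steadily closer to that writer's style, while a small rate keeps a single unusual sample from distorting it.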