Videotext Extraction and Recognition - Chitra Dorai, IBM Research
Videotext Extraction and Recognition Research
at IBM T.J. Watson Research Center
Contact: Chitra Dorai
Automatic Text Extraction from Video for
Content-Based Annotation and Retrieval
The ongoing proliferation of digital image and video databases has led
to an increasing demand for systems that can query and search large
video databases efficiently and accurately for desired video
clips. Manual annotation of video is extremely time-consuming,
expensive, and unscalable in the face of ever-growing video databases.
Therefore, automatic extraction of video descriptions is desirable in
order to annotate and search large video databases. Text present in
video frames is a valuable source of content information. Text is
abundant in videos with program credits and title sequences. In news
videos, text is often used as captions, and in sports videos game and
player statistics are often superimposed on the frames in textual
form. Video commercials ensure that the product and other shopping
information is presented as readable text. When video text is
automatically extracted, it not only provides keywords for annotation
and search of image and video libraries but also aids in highlighting
events that can then be used to summarize a video. Extracted text
can also be used in video categorization, cataloging of commercials,
logging of key events, and efficient video digest construction.
In 1996, we developed a computational scheme that not only locates the
textual information in video but also extracts it and generates
images with segmented characters that can be supplied directly as
input to any OCR system. Our algorithm combines region segmentation
with feature-based refinement to handle variations in text font size,
style, and gray-level contrast, as well as the complex image
backgrounds in which the text is embedded. We also presented a
mechanism that exploits the temporal persistence of text over multiple
consecutive frames, which can enhance the performance of any video
text extraction system.
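As an illustration of how temporal persistence can help, here is a minimal sketch of one common multi-frame integration idea: taking the per-pixel minimum over aligned frames of the same text region. It assumes bright text on a changing background, and it is not necessarily the mechanism described in our paper. The names `integrate_frames` and `binarize`, the fixed threshold, and the list-of-lists image format are all assumptions made for this example.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def integrate_frames(frames: List[Frame]) -> Frame:
    """Per-pixel minimum over aligned frames showing the same text region.

    Assumption: text is bright and static, so it stays bright in the
    minimum, while a changing background is pulled toward its darkest value.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    for f in frames:
        if len(f) != height or any(len(row) != width for row in f):
            raise ValueError("all frames must have the same size")
    return [
        [min(f[y][x] for f in frames) for x in range(width)]
        for y in range(height)
    ]


def binarize(frame: Frame, threshold: int = 128) -> Frame:
    """Mark pixels at or above the (hypothetical) threshold as text (1)."""
    return [[1 if p >= threshold else 0 for p in row] for row in frame]


if __name__ == "__main__":
    # Three 1x2 frames: the left pixel is persistent bright text,
    # the right pixel is background that changes from frame to frame.
    frames = [[[250, 200]], [[250, 30]], [[250, 220]]]
    print(binarize(integrate_frames(frames)))
```

For dark text on a bright background, the same sketch would use the per-pixel maximum instead.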
Here is the paper: ICPR 1998
End-to-End Videotext Recognition for Multimedia
Content Analysis
In 1999, we tackled the problem of developing a reliable,
general-purpose videotext recognition system. A character recognition
algorithm designed specifically for the low-resolution output of
videotext extractors is needed for better accuracy and for building
robust systems for real-world applications. Additionally, videotext is
usually kept on screen over many frames so that viewers can read it.
We found that a typical text segment persisted on video for roughly
20 frames or more. This redundancy can be exploited for higher
recognition accuracy. We have developed an end-to-end video
character recognition (VCR) system that features new character
attributes emphasizing macro character shapes, a Support Vector
Machine-based character classifier, videotext object synthesis,
font context analysis, and temporal contiguity analysis to
address the issues that confound reliable videotext
recognition. In experiments with real video data, our system
performs well. The individual processing stages in our system can
potentially be used in conjunction with other VCR algorithms being
developed elsewhere to improve their recognition performance as
well.
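To illustrate how the redundancy across frames can raise recognition accuracy, here is a minimal sketch that combines per-frame recognition results by majority vote at each character position. It is a simplified stand-in for illustration, not the temporal contiguity analysis used in our system. The function name `vote_characters` and the rule of discarding readings whose length differs from the most common length are assumptions.

```python
from collections import Counter
from typing import List


def vote_characters(readings: List[str]) -> str:
    """Combine per-frame OCR strings for one text segment by majority vote.

    Assumption: readings of the most common length are already aligned
    character by character; readings of other lengths are discarded.
    """
    if not readings:
        return ""
    common_len = Counter(len(r) for r in readings).most_common(1)[0][0]
    aligned = [r for r in readings if len(r) == common_len]
    # zip(*aligned) yields, for each position, the characters seen there.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*aligned)
    )


if __name__ == "__main__":
    # Hypothetical per-frame outputs for the same on-screen caption.
    readings = ["SCORE", "SC0RE", "SCORE", "5CORE", "SCOR"]
    print(vote_characters(readings))
```

With about 20 frames per text segment, isolated per-frame errors such as confusing "O" with "0" are outvoted as long as most frames are read correctly at that position.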
Here are the papers on various aspects of our video character recognition system:
ICME 2001
ICIP 2001
MMSP 2001
The MPEG-7 Videotext Description Scheme
We collaborated with Nevenka Dimitrova and Lalitha Agnihotri at
Philips Research to design a Multimedia Description Scheme based on
videotext. This MDS is now part of the ISO/IEC MPEG-7 standard.
Papers on this topic:
Signal Processing: Image Communication Journal
ACM Multimedia 2000
Workshop on Standards, Interoperability and Practice
Last modified: Wed Dec 13 15:56:11 2000