Videotext Extraction and Recognition - Chitra Dorai, IBM Research

Videotext Extraction and Recognition Research at IBM T.J. Watson Research Center

Contact: Chitra Dorai

Automatic Text Extraction from Video for Content-Based Annotation and Retrieval

The ongoing proliferation of digital image and video databases has led to an increasing demand for systems that can query and search large video databases efficiently and accurately for desired video clips. Manual annotation of video is extremely time-consuming, expensive, and unscalable in the face of ever-growing video databases. Therefore, automatic extraction of video descriptions is desirable in order to annotate and search large video databases. Text present in video frames is a valuable source of content information. Text is abundant in videos with program credits and title sequences. In news videos, text is often used as captions, and in sports videos, game and player statistics are often superimposed on the frames in textual form. Video commercials ensure that the product and other shopping information is presented as readable text. When video text is automatically extracted, it not only provides keywords for annotation and search of image and video libraries but also helps highlight events, which can then be used to summarize a video. Extracted text can also be used in video categorization, cataloging of commercials, logging of key events, and efficient video digest construction.

In 1996, we developed a computational scheme that not only locates the textual information in video, but also extracts it and generates images with segmented characters that can be directly supplied as input to any OCR system. Our algorithm employs a combination of region segmentation and feature-based refinement techniques to handle the variations in text font size, style, gray level contrast, and complex image backgrounds in which the text is embedded. We also presented a mechanism to exploit the temporal persistence of the text over multiple consecutive frames which can enhance the performance of any video text extraction system.
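
As a rough illustration of what feature-based refinement of segmented regions can look like, the sketch below labels connected foreground regions in a binarized frame and keeps only those whose size, aspect ratio, and fill ratio are plausible for a character. It is a minimal sketch under stated assumptions, not the algorithm from our paper: the names `Region`, `connected_regions`, and `looks_like_character`, the choice of features, and all threshold values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Region:
    x0: int    # left column of bounding box
    y0: int    # top row of bounding box
    x1: int    # right column (inclusive)
    y1: int    # bottom row (inclusive)
    area: int  # number of foreground pixels

    @property
    def width(self):
        return self.x1 - self.x0 + 1

    @property
    def height(self):
        return self.y1 - self.y0 + 1


def connected_regions(binary):
    """Return the 4-connected regions of pixels equal to 1."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            reg = Region(c, r, c, r, 0)
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                reg.area += 1
                reg.x0, reg.x1 = min(reg.x0, x), max(reg.x1, x)
                reg.y0, reg.y1 = min(reg.y0, y), max(reg.y1, y)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(reg)
    return regions


def looks_like_character(reg, min_height=3, max_height=40,
                         max_aspect=2.0, min_fill=0.2):
    """Hypothetical geometric test; thresholds are illustrative only."""
    if not (min_height <= reg.height <= max_height):
        return False  # too small (noise) or too large (background object)
    if reg.width / reg.height > max_aspect:
        return False  # far too wide, e.g. a ruled line
    return reg.area / (reg.width * reg.height) >= min_fill


# Toy binarized frame: an "I"-shaped glyph, one noise pixel, one ruled line.
frame = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]
candidates = connected_regions(frame)
kept = [reg for reg in candidates if looks_like_character(reg)]
```

On this toy frame the labeling step finds three regions, and only the glyph survives the geometric test. In practice the thresholds would have to be tuned to the font sizes and frame resolution at hand.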

Here is the paper: ICPR 1998

End-to-End Videotext Recognition for Multimedia Content Analysis

In 1999, we tackled the problem of developing a reliable, general-purpose videotext recognition system. Better accuracy and robust real-world applications require a character recognition algorithm designed specifically for the low-resolution output of videotext extractors. In addition, videotext is usually kept on screen across many frames so that viewers can read it: we found that a typical text segment persists for at least about 20 frames. This redundancy can be exploited for higher recognition accuracy. We have developed a unique end-to-end video character recognition (VCR) system featuring new character attributes that emphasize macro character shapes, a Support Vector Machine-based character classifier, videotext object synthesis, font context analysis, and temporal contiguity analysis, which together address the issues that confound reliable videotext recognition. In experiments on real video data, our system performs well. The individual processing stages of our system can potentially be used in conjunction with other VCR algorithms being developed elsewhere, and can improve their recognition performance as well.
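
One simple way to see how temporal redundancy can raise recognition accuracy is per-position majority voting over the strings recognized in successive frames of the same text segment. The sketch below is only an illustration of that idea and not the temporal contiguity analysis used in our system; the function name `vote_characters` is hypothetical, and it assumes the per-frame readings are already aligned character by character (readings whose length differs from the most common length are simply dropped).

```python
from collections import Counter


def vote_characters(readings):
    """Combine per-frame readings of one text segment by majority vote.

    readings: list of strings, one per frame, for the same on-screen text.
    Assumes equal-length readings are aligned position by position.
    """
    if not readings:
        return ""
    # Keep only readings of the most common length so positions line up.
    common_len = Counter(len(r) for r in readings).most_common(1)[0][0]
    aligned = [r for r in readings if len(r) == common_len]
    # For each character position, pick the most frequent character.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*aligned)
    )


# Six noisy readings of the same caption from consecutive frames.
readings = ["NEWS", "NEW5", "NEWS", "HEWS", "NEWS", "NEW"]
consensus = vote_characters(readings)
```

Here isolated errors such as "5" for "S" or "H" for "N" are outvoted by the correct readings in the other frames, so `consensus` is "NEWS". A real system would also need to handle insertions and deletions, for example by aligning the readings before voting.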

Here are the papers on various aspects of our video character recognition system:

ICME 2001
ICIP 2001
MMSP 2001

The MPEG-7 Videotext Description Scheme

We collaborated with Nevenka Dimitrova and Lalitha Agnihotri at Philips Research to design a Multimedia Description Scheme (MDS) based on videotext. This MDS is now part of the ISO/IEC MPEG-7 standard.

Papers on this topic:

Signal Processing: Image Communication Journal
ACM Multimedia 2000 Workshop on Standards, Interoperability and Practice


Last modified: Wed Dec 13 15:56:11 2000