Applications
This adventurous research project applies ideas from the statistical analysis
of audio, visual and textual modalities to the semantic analysis of media
content. The aim is to advance the technology available for supervised and
semi-supervised model learning using generic algorithms. The project is a
collaboration between members of the Pervasive Media Management Group and
members of the Audio-Visual Speech Group.
Contact: Milind Naphade, Ching-Yung Lin, Giri Iyengar, Harriet Nock and
Arnon Amir
TREC Video Benchmark
The National Institute of Standards and Technology (NIST) started the TREC
Video Benchmark in 2001. Apart from participating in the benchmark, we are
also actively involved in defining the benchmark and the collaborative tasks
involved. Over the past two years, IBM Research has been among the top
performers across various tasks in the TREC Video Benchmark.
The TREC 2001 benchmark involved a query answering task and a shot boundary
detection task. It comprised 74 semantic queries to be answered over a
19-hour corpus. The IBM system achieved the highest recall on the known-item
queries.
The TREC 2002 benchmark had a query answering task, a shot boundary detection
task and a new concept detection task. The corpus was partitioned into a
25-hour training set, a 40-hour search test set and a 5-hour feature test
set. The query benchmark involved 25 queries, while the concept detection
benchmark involved 10 semantic concepts. IBM topped the concept detection
benchmark with the highest Mean Average Precision and was ranked second in
the query benchmark for the Interactive Track.
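For readers unfamiliar with the metric, Mean Average Precision can be
illustrated with a minimal sketch. It assumes binary relevance judgments and
the usual non-interpolated definition of average precision; it is not the
official NIST scoring procedure, and the function names and shot identifiers
are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked result list.

    Assumption: `relevant` holds every shot judged relevant for the concept,
    including shots the system never returned.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked, start=1):
        if shot_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant shot.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of the per-concept average precision values."""
    scores = [
        average_precision(ranked, truth.get(concept, set()))
        for concept, ranked in runs.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical ranked lists for two concepts, with made-up shot identifiers.
example_runs = {
    "outdoors": ["s1", "s2", "s3", "s4"],
    "face": ["s5", "s6"],
}
example_truth = {
    "outdoors": {"s1", "s3"},
    "face": {"s6"},
}
example_map = mean_average_precision(example_runs, example_truth)
```

A ranked list that places relevant shots nearer the top scores higher, which
is why the metric is commonly used to compare detection and retrieval
systems.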
The TREC 2003 benchmark involved 4 tasks: shot boundary detection, semantic
concept detection (with a lexicon of 17 benchmark concepts), story boundary
detection and the answering of 25 semantic queries. Ideas developed in SLAM
are applied to all tasks. The IBM Research TRECVID system continues to top
the concept detection benchmark and the shot detection benchmark, while
performing very well in the story segmentation and search benchmarks.
The system can be found at http://mp7.watson.ibm.com/.
The IBM MPEG-7 annotation tool is available at http://www.alphaworks.ibm.com/tech/videoannex.
Contact: John R Smith
ARDA Video Analysis and Content Exploitation Project Phase II
The Advanced Research and Development Activity (ARDA) has recently begun
Phase II of the Video Analysis and Content Exploitation Project. This
involves the multimedia semantic content analysis of news video broadcasts
for the extraction of semantic information. The Pervasive Media Management
Group will be involved in this project beginning in 2004. The goal of this
project is to establish the viability of high-level feature detection as a
basis for achieving semantic understanding of video. By performing automatic
detection of thousands of classes of concepts (objects, sites, actions) in
news video using statistical learning and mining, we hope to extract the
semantic threads from multiple broadcast news video sources. This will be
done by accounting for possibly distinct points of view, bias and foreign
languages, and by exploiting common underlying semantics (high-level
features, near-duplicate scenes, commonality of content and structure).
Contact: John R Smith and Milind R. Naphade