SLAM - Semantic Learning and Analysis of Multimedia

Applications

Multimedia Mining Adventurous Research Project Video Project

This adventurous research project applies ideas from the statistical analysis of audio, visual, and textual modalities to the semantic analysis of media content. The aim is to advance the technology available for supervised and semi-supervised model learning using generic algorithms. The project is a collaboration between members of the Pervasive Media Management Group and members of the Audio-Visual Speech Group.

 

Contact: Milind Naphade, Ching-Yung Lin, Giri Iyengar, Harriet Nock and Arnon Amir


TREC Video Benchmark

The National Institute of Standards and Technology (NIST) started the TREC Video Benchmark in 2001. Apart from participating in the benchmark, we are also actively involved in defining the benchmark and its collaborative tasks. Over the past 2 years, IBM Research has been among the top performers across various tasks in the TREC Video Benchmark.

The TREC 2001 benchmark involved a query answering task and a shot boundary detection task. The query task comprised 74 semantic queries to be answered over a 19-hour corpus. The IBM system achieved the highest recall on known-item queries.

The TREC 2002 benchmark had a query answering task, a shot boundary detection task, and a new concept detection task. The corpus was partitioned into a 25-hour training set, a 40-hour search test set, and a 5-hour feature test set. The query benchmark involved 25 queries, while the concept detection benchmark involved 10 semantic concepts. IBM topped the concept detection benchmark with the highest Mean Average Precision and ranked second in the query benchmark for the interactive track.
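For readers unfamiliar with the Mean Average Precision measure mentioned above, the following is a minimal sketch of how it is commonly computed from ranked result lists. It is not the official evaluation procedure used by NIST and makes no assumption about result depth cutoffs; the function names, shot identifiers, and concept labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one ranked list of shot IDs.

    Precision is sampled at each rank where a relevant shot appears and
    the samples are averaged over the total number of relevant shots.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of the per-concept average precision values.

    `runs` maps a concept name to the ranked shots returned for it;
    `truth` maps the same concept name to its set of relevant shots.
    A concept with no returned shots contributes an AP of 0.
    """
    if not truth:
        return 0.0
    total = sum(
        average_precision(runs.get(concept, []), relevant)
        for concept, relevant in truth.items()
    )
    return total / len(truth)


# Hypothetical toy data: two concepts with hand-made ground truth.
example_runs = {
    "outdoors": ["s1", "s2", "s3", "s4"],
    "face": ["a", "b"],
}
example_truth = {
    "outdoors": {"s1", "s3"},
    "face": {"b"},
}
example_map = mean_average_precision(example_runs, example_truth)
```

In this toy case the two per-concept values are (1 + 2/3) / 2 and 1/2, so the mean is 2/3. A system that ranks relevant shots earlier obtains a higher score, which is why the measure is suited to comparing concept detectors.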

 

The TREC 2003 benchmark involved 4 tasks: shot boundary detection, semantic concept detection (with a lexicon of 17 benchmark concepts), story boundary detection, and the answering of 25 semantic queries. Ideas developed in SLAM are applied to all tasks. The IBM Research TRECVID system continues to top the concept detection and shot boundary detection benchmarks, while performing very well in the story segmentation and search benchmarks.

 

The system can be found at http://mp7.watson.ibm.com/.

The IBM MPEG-7 annotation tool can be found at http://www.alphaworks.ibm.com/tech/videoannex.

Contact: John R Smith

 

ARDA Video Analysis and Content Exploitation Project Phase II

The Advanced Research and Development Activity (ARDA) has recently begun Phase II of the Video Analysis and Content Exploitation Project. The project involves multimedia semantic content analysis of news video broadcasts for the extraction of semantic information. The Pervasive Media Management Group will be involved in this project beginning in 2004. The goal is to establish the viability of high-level feature detection as a basis for understanding video semantics. By automatically detecting thousands of classes of concepts (objects, sites, actions) in news video using statistical learning and mining, we hope to extract the semantic threads from multiple broadcast news video sources. This will be done by accounting for possibly distinct points of view, bias, and foreign languages, and by exploiting common underlying semantics (high-level features, near-duplicate scenes, commonality of content and structure).

Contact: John R Smith and Milind R. Naphade