SLAM - Semantic Learning and Analysis of Multimedia

Bridging the gap between features and semantics...

 

Multimedia content is an essential part of information technology. However, the difficulty in filtering, searching, and summarizing video has so far hindered the effective utilization of video databases. Users want to filter and query video by high-level (semantic) concepts, while automatic algorithms can extract only low-level features (e.g., color, texture, shape, amount of motion). Bridging this gap is thus the most challenging problem in video (multimedia) indexing, summarization, retrieval, and filtering.

Recent research in the Pervasive Media Management Group involves mapping low-level features to high-level semantics, where the semantics relate to the concepts, context, and structure in multimedia content. Our approach to learning semantics is based on statistical pattern recognition and machine learning. The framework has three layers: concepts are represented by probabilistic multimedia objects, contextual constraints among these objects by probabilistic graphical models, and the structure involving these models by Markov-chain-based temporal models. This approach models only a finite region of the semantic space. To extend coverage to a broader space, we explore techniques for rare-class classification, interactive learning, and semi-supervised learning. The challenge is to reduce supervision while maintaining acceptable levels of performance. A related research challenge is to expand highly semantic queries onto the lexicon of concepts that is modeled from feature vectors. We have validated several of our techniques using the NIST TREC Video Benchmark.
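
The temporal layer mentioned above can be illustrated with a small sketch. It is not the group's actual model: it assumes a hypothetical two-state Markov chain (concept present or absent) over consecutive shots, treats each per-shot detector score as an observation likelihood, and uses made-up values for the `stay` and `prior` parameters.

```python
def forward_filter(detector_probs, stay=0.9, prior=0.5):
    """Smooth per-shot concept scores with a two-state Markov chain.

    detector_probs: per-shot scores in [0, 1] from some concept detector
                    (hypothetical input; treated here as likelihoods).
    stay:           assumed probability that the state (present/absent)
                    carries over from one shot to the next.
    prior:          assumed probability that the concept is present
                    before the first shot.
    Returns the filtered probability of "present" after each shot.
    """
    belief = prior
    smoothed = []
    for p in detector_probs:
        # Predict: propagate the previous belief through the Markov chain.
        predicted = stay * belief + (1.0 - stay) * (1.0 - belief)
        # Update: weight the prediction by the current shot's evidence.
        num = p * predicted
        den = num + (1.0 - p) * (1.0 - predicted)
        belief = num / den if den > 0 else predicted
        smoothed.append(belief)
    return smoothed


# An isolated weak score (0.4) between confident shots is pulled upward
# by its temporal context instead of flipping the decision.
scores = [0.9, 0.9, 0.4, 0.9]
filtered = forward_filter(scores)
```

With these assumed parameters, the third shot stays above 0.5 after filtering, which is the kind of structural constraint a temporal model contributes on top of independent per-shot detectors.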

 

To enable efficient and scalable modeling of semantic concepts in high-dimensional, low-level media feature spaces, we investigate machine learning algorithms that provide generic, trainable procedures for modeling these concepts. To this end, we attempt to maximize the information gain by analyzing multiple modalities, actively seeking user feedback, and exploiting the semantic context. The aim is to improve detection performance, reduce user input, and scale to the large number of concepts required to cover the semantic space.

 

Once the semantic space spanned by a lexicon of several hundred concepts has been constructed from the models of these concepts, further analysis and mining at the semantic level become feasible. We explore this research direction in our SEMANTRIX project, funded by the Advanced Research and Development Agency. In this two-year project phase, we attempt to apply multi-level concept modeling to the reconstruction of threads and the exploitation of information from multiple video broadcast news sources.

 

Our research in the modeling of semantic concepts is also designed to speed up the annotation of several thousand meta-tags. In this context, our work on active annotation and efficient video annotation places the learning algorithms in the loop during annotation, reducing the time it takes media annotators to richly tag the multimedia content. This is done by exploiting redundancy in the multimedia content and the ability of the learning algorithms to ask the annotators intelligent questions, so that they can concentrate on the items that are difficult to tag automatically.
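
One common way to have a learner "ask intelligent questions" is uncertainty sampling, sketched below. This is a stand-in technique chosen for illustration, not necessarily the method used in the project. The shot identifiers, scores, and the `low`/`high` thresholds are all hypothetical.

```python
def uncertainty(p):
    """Return 1.0 when the model is maximally unsure (p = 0.5), 0.0 when certain."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_for_annotation(scores, budget):
    """Pick the items the annotator should look at first.

    scores: dict mapping item id -> model probability that the concept
            is present (hypothetical detector output).
    budget: number of items the annotator is asked to tag by hand.
    """
    ranked = sorted(scores, key=lambda k: uncertainty(scores[k]), reverse=True)
    return ranked[:budget]


def propagate_labels(scores, low=0.1, high=0.9):
    """Auto-tag only the items the model is confident about.

    Items scoring between the assumed thresholds are left untagged.
    """
    return {k: p >= high for k, p in scores.items() if p <= low or p >= high}


scores = {
    "shot_01": 0.97,
    "shot_02": 0.52,
    "shot_03": 0.08,
    "shot_04": 0.40,
    "shot_05": 0.75,
}
to_ask = select_for_annotation(scores, budget=2)   # least certain shots
auto_tags = propagate_labels(scores)               # confident shots only
```

In this toy run the annotator is asked about `shot_02` and `shot_04`, while `shot_01` and `shot_03` are tagged automatically. After the annotator answers, the model would be retrained and the loop repeated.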

 

As we increase the dimensionality of the semantic feature space, we face the challenges of indexing and retrieval in high dimensions. To alleviate these problems, we are investigating techniques such as anchoring, which reduces the cost of complex similarity measurements by approximating them with surrogate measurements using a small, finite number of anchors that stake out the high-dimensional semantic space.
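
A minimal sketch of the anchoring idea follows, under stated assumptions: Euclidean distance stands in for the "complex" similarity measurement, the anchors are simply the first few items, and the surrogate is the largest per-anchor difference in distance. None of these choices come from the project itself.

```python
import math
import random


def euclidean(a, b):
    """Stand-in for an expensive similarity measurement (assumed to be a metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anchor_embedding(item, anchors, distance=euclidean):
    """Describe an item by its distances to a small, fixed set of anchors."""
    return [distance(item, anchor) for anchor in anchors]


def surrogate_distance(emb_a, emb_b):
    """Cheap approximation computed only from the anchor embeddings.

    For a metric, the triangle inequality gives
    |d(a, k) - d(b, k)| <= d(a, b) for every anchor k,
    so this value never exceeds the true distance.
    """
    return max(abs(x - y) for x, y in zip(emb_a, emb_b))


rng = random.Random(0)
items = [[rng.random() for _ in range(50)] for _ in range(20)]
anchors = items[:4]  # hypothetical anchor choice: the first four items

# The expensive measurement is run once per (item, anchor) pair ...
embeddings = [anchor_embedding(item, anchors) for item in items]

# ... and item-to-item comparisons then use only the short embeddings.
approx = surrogate_distance(embeddings[5], embeddings[9])
exact = euclidean(items[5], items[9])
```

Because the surrogate is a lower bound on the true distance, it can safely prune candidates during retrieval: any item whose surrogate distance already exceeds the search radius cannot be a match, so the expensive measurement is skipped for it.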