Computational Media Aesthetics - Chitra Dorai, IBM Research


With the explosion of online media and media-based services, a key challenge in media management is automating content annotation, indexing, and organization to support efficient access, search, retrieval, and browsing. A major failing of current media annotation systems is the semantic gap: the gap between the richness of meaning and interpretation that users want to associate with their queries when searching and browsing media, and the shallowness of the features (content descriptions) that can actually be computed today. Smeulders et al. (IEEE Trans. PAMI, 2000) draw attention to this when they observe that when "the user seeks semantic similarity, the database can only provide similarity on data processing".

To address this issue and to build innovative content description services based on high-level semantics, I have created, jointly with Svetha Venkatesh, an approach called Computational Media Aesthetics. Inspired by recent work in media aesthetics, we define Computational Media Aesthetics as the algorithmic study of image and aural elements in media, together with the computational analysis of the principles that underlie their use and manipulation, individually or jointly, in the creative art of clarifying, intensifying, and interpreting an event for the audience. The core premise of our approach is that to build tools that automatically understand video, we must be able to interpret the data through its maker's eye. This is described in our recent paper, "Computational Media Aesthetics: Finding Meaning Beautiful". See also our recent edited volume, Media Computing: Computational Media Aesthetics, published in June 2002 by Kluwer Academic Publishers.

Computational Media Aesthetics, grounded in the broad rules and conventions of content creation, uses production knowledge to elucidate the relationships between the many ways in which basic visual and aural elements are manipulated in video and their intended meaning and perceived impact on viewers. It proposes a framework for computationally understanding the dynamic nature of narrative structure and audiovisual imagery in media by analyzing the principles by which audio and visual elements are integrated and sequenced in media productions. In particular, it analyzes videos in terms of film grammar, the set of rules commonly followed when narrating a story, and uses those rules to derive annotations or descriptions of video content effectively. A system built on this principled approach, in which video analysis is guided by the tenets of film grammar, can provide high-level, concept-oriented media descriptions that function across many contexts, and can enhance the quality and richness of the descriptions it derives. Its usefulness in bridging the semantic gap is demonstrated in our paper at COSIGN 2001.

Some of our recent work on extracting expressive elements of film, such as pace and rhythm, applies Computational Media Aesthetics to motion pictures. See Publications for recent papers and Honors for recognition received. This is joint work with Prof. Svetha Venkatesh, Brett Adams, Simon Moncrieff, and Ba Tu Truong.

While our recent work on motion pictures has drawn on film grammar to derive high-level constructs, we believe the same approach will work in other video domains. News, sitcoms, sports, and other genres all have grammars of varying complexity that can be used to capture their crafted structure. Structure exists regardless of the particular media context, but it may not be homogeneous across contexts; it therefore helps to let production knowledge guide media computing.


Last modified: Wed Dec 13 15:56:11 2000