Video Enrichment
Overview
This project seeks to establish technologies that process video
images at the object level, i.e. that process individual objects
within a frame rather than entire frames, as conventional
technologies do.
By manipulating images at the object level, it becomes possible
to analyze each object's movement, speed and position, as well as
the relationships between multiple objects over time. With this
information, images can be interpreted and analyzed, and meanings
can be associated with otherwise unannotated video. The result is
not only content-based search and summarization of video, but also
the ability to combine objects and recreate scenes from different
points of view. Furthermore, it becomes possible to gather
statistical information about an object from the video alone,
opening up a broad range of new applications.
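As an illustration of this kind of object-level analysis, the following sketch (in Python, which is not part of the original project) derives an object's position and speed from a per-frame trajectory of bounding boxes. The trajectory format, function names and frame rate are all assumptions made for illustration, not the project's actual data model.

```python
# Sketch: deriving position and speed from an object's per-frame trajectory.
# The (frame number, bounding box) format is a hypothetical illustration.

def center(box):
    """Center point (x, y) of a bounding box (left, top, right, bottom)."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0)

def speeds(trajectory, fps=30.0):
    """Pixels-per-second speed between consecutive observations.

    trajectory: list of (frame_number, bounding_box) pairs, sorted by frame.
    """
    result = []
    for (f0, b0), (f1, b1) in zip(trajectory, trajectory[1:]):
        (x0, y0), (x1, y1) = center(b0), center(b1)
        dt = (f1 - f0) / fps                       # elapsed time in seconds
        dist = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        result.append(dist / dt)
    return result

# A player object moving steadily to the right, observed every 15 frames.
player = [(0, (10, 50, 30, 90)), (15, (40, 50, 60, 90)), (30, (70, 50, 90, 90))]
print(speeds(player))  # [60.0, 60.0] — uniform motion, so both speeds are equal
```

From such trajectories, the higher-level quantities the text mentions (speed, position, and relationships between objects over time) follow by simple geometry.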
There are several ways in which images can be interpreted, which makes it difficult to extract their meaning automatically. A computer cannot interpret the contents of an image simply by applying some generic process. This is particularly true for images of sports events, which offer very few annotation cues. Interpreting sports images depends on a priori knowledge of the rules, the characteristics of the players, and the playing field. In the Video Enrichment approach, players are defined as objects, and images are interpreted by analyzing the movements of both individual objects and groups of objects. Video images carry information through time that is not always perceptible to the viewer but that can be revealed by this spatio-temporal analysis of objects, making Video Enrichment a valuable tool for Knowledge Management.
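To make the combination of a priori knowledge with multi-object analysis concrete, here is a minimal sketch, not taken from the project itself; the rule, player names and positions are illustrative assumptions. A simple rule of the game ("the player closest to the ball possesses it") turns raw object positions into an interpretable event: a change of the possessing player reads as a pass.

```python
# Sketch: a simple a priori rule for sports video interpretation.
# Rule (assumed for illustration): the player nearest the ball possesses it;
# a change of the possessing player between frames is interpreted as a pass.

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def nearest_player(ball, players):
    """Name of the player object closest to the ball position."""
    return min(players, key=lambda name: dist(ball, players[name]))

def detect_passes(frames):
    """frames: list of (ball_position, {player_name: position}) per sampled frame."""
    events = []
    holder = None
    for i, (ball, players) in enumerate(frames):
        current = nearest_player(ball, players)
        if holder is not None and current != holder:
            events.append((i, holder, current))    # (frame, from, to)
        holder = current
    return events

frames = [
    ((10, 10), {"A": (9, 9), "B": (50, 50)}),    # ball nearest player A
    ((40, 40), {"A": (10, 10), "B": (50, 50)}),  # ball now nearest B: a pass
]
print(detect_passes(frames))  # [(1, 'A', 'B')]
```

The same pattern extends to richer rules: offsides, shots on goal, and so on all reduce to predicates over the spatio-temporal relations of object trajectories.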
The figure below shows the Video Enrichment components developed
at the IBM Japan Tokyo Research Laboratory. First, objects are
obtained by segmenting the images, and information about their
position, movements and relationships is extracted. Images are
then annotated based on this object information, camera movements
and a priori knowledge. Users can use these annotations
not only to search the contents of a video and summarize the
results of a query, but also to obtain statistical information
about an object and to analyze how it changes over time. This
process allows a better, deeper interpretation of the images.
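The search-and-summarization step described above can be sketched as follows; the annotation format, sample data and function names are hypothetical illustrations, not the laboratory's actual implementation. Annotations are time intervals tagged with an object and a label, and a query simply selects the intervals whose tags match.

```python
# Sketch: content-based search over object-level annotations.
# Each annotation is (start_frame, end_frame, object_name, label);
# this format and the sample data are illustrative assumptions.

def search(annotations, obj=None, label=None):
    """Return the annotated intervals matching the given object and/or label."""
    return [a for a in annotations
            if (obj is None or a[2] == obj) and (label is None or a[3] == label)]

def summarize(matches):
    """Total annotated frames, e.g. an object's screen time in a query result."""
    return sum(end - start for start, end, _, _ in matches)

annotations = [
    (0, 120, "player7", "running"),
    (120, 150, "player7", "shooting"),
    (0, 300, "ball", "visible"),
]
print(search(annotations, obj="player7", label="shooting"))  # [(120, 150, 'player7', 'shooting')]
print(summarize(search(annotations, obj="player7")))         # 150 frames in total
```

The statistical queries mentioned in the text (e.g. how much time an object spends on screen, or how its behavior changes over a season of footage) are aggregations of exactly this kind over the annotation set.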
Fruits of this research will be contributed to MPEG-7, the next-generation international standard aimed at multimedia content applications. This project is also part of the "Advanced Research for Multimedia Communication Network" project of the Communications Research Laboratory (CRL) of the Japanese Ministry of Posts and Telecommunications, and of an international project involving CRL and the Electronics and Telecommunications Research Institute (ETRI, Korea). Joint research is also being conducted with Princeton University and Osaka University.
Publications
Related information
Newspaper articles (in Japanese)
Last modified 30 September 1999