
Video enrichment


This project seeks to establish technologies for processing video images at the object level, i.e. processing individual objects in a frame instead of entire frames, as conventional technologies do. By manipulating images at the object level, it is possible to analyze each object's movements, speed and position, as well as the relationships between multiple objects through time. With this information, images can be interpreted and analyzed, yielding meanings that can be associated with unannotated video. The end result will make it possible not only to search and summarize video based on its contents, but also to combine objects and recreate scenes from different points of view. Furthermore, it will make it possible to gather statistical information about an object from the video alone, opening possibilities for a broad range of new applications.

Research items

Images can be interpreted in several ways, which makes it difficult to extract their meanings automatically. A computer cannot interpret the contents of an image simply by applying some generic process. This is particularly true of images of sports events, which have very few annotation cues. Interpreting sports images depends on a priori knowledge of the rules, the characteristics of the players, and the playing field. In the Video Enrichment approach, players are defined as objects, and images are interpreted by analyzing the movements of both individual and multiple objects. Video images carry information through time that is not always perceptible to the user but that can be revealed by this spatio-temporal analysis of the objects, making Video Enrichment a valuable tool for Knowledge Management.
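As a rough illustration of what spatio-temporal analysis of objects can mean in practice, the sketch below derives speed for one object and distance between two objects from tracked positions. The track format, the object names, and the sample values are assumptions made for this example; the project's actual data structures are not described here.

```python
import math

# Hypothetical tracks: object id -> list of (time_s, x_m, y_m) samples.
# Both tracks are assumed to be sampled at the same timestamps.
tracks = {
    "player_a": [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 6.0, 8.0)],
    "player_b": [(0.0, 10.0, 0.0), (1.0, 9.0, 4.0), (2.0, 6.0, 9.0)],
}


def speeds(track):
    """Average speed between each pair of consecutive samples."""
    result = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        result.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return result


def distances(track_a, track_b):
    """Distance between two objects at each shared sample index."""
    return [
        math.hypot(xa - xb, ya - yb)
        for (_, xa, ya), (_, xb, yb) in zip(track_a, track_b)
    ]


speed_a = speeds(tracks["player_a"])
gap_ab = distances(tracks["player_a"], tracks["player_b"])
print("speed of player_a:", speed_a)
print("distance a-b:", gap_ab)
```

A shrinking distance between two objects over successive samples is the kind of relationship that is hard to notice by eye but easy to read off the object data.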

The figure below shows the Video Enrichment components developed at the IBM Japan Tokyo Research Laboratory. First, objects are obtained by segmenting the images, and information about their positions, movements and relationships is extracted. Images are then annotated based on this object information, camera movements and a priori knowledge. Users can use these annotations not only to search the contents of a video and summarize the results of a query, but also to obtain statistical information about an object and to analyze changes over time. This process allows a better, deeper interpretation of the images.

[Figure: Video Enrichment framework]
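The annotation, search and statistics steps described above might be sketched as follows. This is a minimal example under stated assumptions, not the laboratory's implementation: the `Annotation` record, the `ZONES` table standing in for a priori knowledge of the playing field, and the sample positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    frame: int
    object_id: str
    label: str


# Hypothetical a priori knowledge: named rectangular zones of the field,
# given as (x_min, y_min, x_max, y_max).
ZONES = {"goal_area": (0.0, 20.0, 6.0, 40.0)}

# Hypothetical segmentation output: (frame, object_id, x, y).
positions = [
    (0, "p1", 10.0, 30.0),
    (1, "p1", 5.0, 30.0),
    (2, "p1", 3.0, 25.0),
    (1, "p2", 2.0, 50.0),
    (2, "p2", 4.0, 35.0),
]


def annotate(samples):
    """Label every sample whose position falls inside a known zone."""
    notes = []
    for frame, obj, x, y in samples:
        for name, (x0, y0, x1, y1) in ZONES.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                notes.append(Annotation(frame, obj, f"in_{name}"))
    return notes


def search(notes, label, object_id=None):
    """Return the frames carrying a label, optionally for one object."""
    return [
        n.frame
        for n in notes
        if n.label == label and (object_id is None or n.object_id == object_id)
    ]


def frames_per_object(notes, label):
    """Count, per object, how many frames carry the given label."""
    counts = {}
    for n in notes:
        if n.label == label:
            counts[n.object_id] = counts.get(n.object_id, 0) + 1
    return counts


notes = annotate(positions)
print(search(notes, "in_goal_area", "p1"))
print(frames_per_object(notes, "in_goal_area"))
```

Keeping annotations as plain records means the same data can serve a search query, a summary, and per-object statistics without reprocessing the video.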

The results of this research will be contributed to the next-generation international standard MPEG-7, which is aimed at multimedia content applications. This project is also part of the "Advanced Research for Multimedia Communication Network", a project of the Communications Research Laboratory (CRL) of the Japanese Ministry of Posts and Telecommunications, and of an international project involving CRL and the Electronics and Telecommunications Research Institute (ETRI, Korea). Joint research is also being conducted with Princeton University and Osaka University.

Publications Related information

Newspaper articles (in Japanese)

  • "NHK News-7", Apr. 6, 1999
  • "Asahi Shinbun", "Yomiuri Shinbun", "Nihon Keizai Shinbun", "Nikkei Sangyo Shinbun", "Nikkan Kougyo Shinbun", "Denpa Shinbun", Apr. 7, 1999
  • "Nihon Keizai Shinbun", Dec. 22, 1997; "Nikkei Sangyo Shinbun", Dec. 17

Last modified 30 September 1999