Video Semantic Summarization Systems

    The VideoAnnEx annotation tool assists authors in the task of annotating video sequences with MPEG-7 metadata.  Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets.  The annotated descriptions are associated with each video shot and are stored as MPEG-7 descriptions in an output XML file.  VideoAnnEx can also open MPEG-7 files in order to display the annotations for the corresponding video sequence.  The annotation tool also allows customized lexicons to be created, saved, downloaded, and updated.

    VideoAnnEx takes an MPEG video sequence as the required input source.  The tool also requires a corresponding shot segmentation file, where the input video sequence is segmented into smaller units called video shots by detecting the scene cuts, dissolves, and fades.  This shot file can be loaded into the tool from other sources or generated when the input video is first opened.  After VideoAnnEx performs shot detection on a video, the shot file can be saved in MPEG-7 schema for later use.  As an alternative, the shot file can also be generated by the IBM CueVideo Shot Detection Toolkit.  
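
To make the idea of a shot file concrete, the following minimal Python sketch models a shot list and looks up the shot that contains a given frame.  The `Shot` record, its field names, and the sample frame ranges are illustrative assumptions; they do not reflect the actual layout of a CueVideo or MPEG-7 shot file.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    """One video shot: a contiguous run of frames (hypothetical record)."""
    number: int       # shot index, counted from 0
    start_frame: int  # first frame of the shot, inclusive
    end_frame: int    # last frame of the shot, inclusive


def find_shot(shots, frame):
    """Return the shot containing `frame`, or None if no shot covers it.

    Assumes `shots` is sorted by start_frame and that shots do not overlap.
    """
    starts = [s.start_frame for s in shots]
    i = bisect_right(starts, frame) - 1  # last shot starting at or before frame
    if i >= 0 and frame <= shots[i].end_frame:
        return shots[i]
    return None


# Made-up shot boundaries, such as a shot detector might report.
shots = [Shot(0, 0, 119), Shot(1, 120, 310), Shot(2, 311, 450)]
print(find_shot(shots, 200))
```

Keeping the list sorted by start frame allows a binary search, which matters little for a handful of shots but keeps lookups cheap for long videos.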

    The VideoAnnEx annotation tool is divided into four graphical sections, as illustrated in Figure 1.  In the upper right-hand corner of the tool is the Video Playback window with shot information.  In the upper left-hand corner is the Shot Annotation module with a key frame image display.  At the bottom of the tool is the Views Panel, which offers two different previews of the annotation.  A fourth component, not shown in Figure 1, is the Region Annotation pop-up window for specifying annotated regions.  These four sections provide interactivity to assist authors using the annotation tool.

Figure 1: IBM VideoAnnEx Annotation Tool divided into four regions: (1) Video Playback, (2) Shot Annotation, (3) Views Panel, and (4) Region Annotation (not shown).


Table of Contents

   Overview of Graphical User Interface
        Video Playback
        Shot Annotation
        Views Panel
             Frames in the Shot
             Shots in the Video
        Region Annotation
   User's Guide
        Download Software
        Annotate Video
        Annotation Guidelines
        Keyword Vocabulary
        Annotation Tips
   Functions and Features
        File I/O
        Advanced Features
   Appendix
        Example


Overview of Graphical User Interface


    The Video Playback window in the upper right-hand corner displays the opened MPEG video sequence, as shown in Figure 2.  The four playback buttons directly below the video display window are:

  • Play - Play the video in normal real-time mode.
  • FF - Play the video in fast forward mode [display I- and P-frames].
  • FFF - Play the video in super fast forward [display only I-frames].
  • Stop - Pause the video in the current frame.

As the video plays in the display window, the current shot information is shown as well.  This information includes the current shot number, the shot start frame, and the shot end frame.  Note that shot numbering starts at 0.
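
The effect of the playback buttons on which frames are shown can be sketched in a few lines of Python.  This is only an illustration of the frame-type filtering described above: the table name, the function name, and the sample group of pictures are assumptions, and the tool's real decoder logic is not documented here.

```python
# Frame types displayed by each playback button.  Normal playback shows
# every frame type (I, P, and B); FF and FFF follow the button list above.
DISPLAYED_FRAME_TYPES = {
    "Play": {"I", "P", "B"},
    "FF": {"I", "P"},
    "FFF": {"I"},
    "Stop": set(),  # paused: no new frames are displayed
}


def frames_to_display(frame_types, mode):
    """Return the indices of the frames shown in the given playback mode."""
    shown = DISPLAYED_FRAME_TYPES[mode]
    return [i for i, t in enumerate(frame_types) if t in shown]


# A made-up 12-frame group of pictures in display order.
gop = list("IBBPBBPBBPBB")
print(frames_to_display(gop, "FF"))   # [0, 3, 6, 9]
print(frames_to_display(gop, "FFF"))  # [0]
```

Skipping B-frames, and then P-frames as well, is what makes the two fast-forward modes progressively faster.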

Figure 2: Video Playback of the IBM VideoAnnEx Annotation Tool.

    The Shot Annotation module in the upper left-hand corner displays the defined annotation descriptions and the key frame window, as depicted in Figure 3.  As the video plays in the Video Playback window, a key frame image of the current shot is displayed in the Key Frame window.  The key frame is a representative image of the video shot segment, and thus offers an instantaneous recap of the whole video shot.  Consequently, the key frame may give the author immediate assistance in annotating the shot descriptions.  The annotation lexicon is also displayed in the Shot Annotation module.  There are three types of lexicon, as follows:

  • Events - List the action events that can be used to annotate the shots.
  • Static Scene - List the background static scenes that can be used to annotate the shots.
  • Key Objects - List the significant objects that are present in the shots.

In each of the three lexicons, the descriptions are organized in a hierarchical tree structure.  These annotation descriptions have corresponding check boxes for the author to select.  Furthermore, there is a Keywords box for customized annotations.  Once the check boxes have been selected and the keywords typed, the author clicks the <OK> button to advance to the next shot.
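
As a rough illustration of a hierarchical lexicon with check boxes, the sketch below stores a lexicon fragment as a nested dictionary and renders it as an indented check list.  The entries and their nesting are hypothetical examples chosen for this sketch, not the tool's actual lexicon, and the function names are assumptions.

```python
# Hypothetical fragment of a lexicon tree: each key is a description and
# each value holds its child descriptions (an empty dict marks a leaf).
LEXICON = {
    "Key Objects": {
        "Transportation": {"Rocket": {}, "Airplane": {}},
        "Animal": {"Deer": {}},
    }
}


def walk(tree, depth=0):
    """Yield (depth, name) for every node, parents before children."""
    for name, children in tree.items():
        yield depth, name
        yield from walk(children, depth + 1)


def render(tree, checked):
    """Return the tree as indented check-box lines."""
    return [
        "  " * depth + ("[x] " if name in checked else "[ ] ") + name
        for depth, name in walk(tree)
    ]


print("\n".join(render(LEXICON, checked={"Rocket"})))
```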

Figure 3: Shot Annotation of the IBM VideoAnnEx Annotation Tool.

    The Views Panel on the bottom displays two different previews of representative images of the video.  They are:

The Frames in the Shot view shows all the I-frames of the current shot as representative images, as shown in Figure 4.  A maximum of 18 images can be displayed in this view.  This gives the author an instantaneous temporal overview of the video shot without having to play it back.  The <Prev> and <Next> buttons refresh the view panel to show the frames of the previous and next shots in the video sequence.  The author can also double-click on any of the representative images in the panel.  This action designates the selected image as the new key frame for this shot, and the image is then displayed in the Key Frame window.  In this preview mode, if the author clicks the <OK> button on the Shot Annotation window, the video stops playback of the current shot and advances to play the next shot.

Figure 4: Frames in the Shot of the Views Panel in the IBM VideoAnnEx Annotation Tool.

The Shots in the Video view shows the key frame of each shot as a representative image across the entire video, as illustrated in Figure 5.  Below each shot's key frame are the annotated descriptions, if they have already been provided.  The author can peruse the entire video sequence in this view and examine the annotated and unannotated shots.  The <Prev> and <Next> buttons scroll the view panel horizontally, following the temporal order of the shots.  The author can also double-click on any of the representative images in the panel.  This action selects the corresponding shot, so that (1) the shot is displayed in the Video Playback window, (2) its key frame is displayed in the Key Frame window, and (3) its checked descriptions are shown in the Shot Annotation panels.  In this preview mode, if the author clicks the <OK> button on the Shot Annotation window, the video plays back the current shot in FFF mode and then advances to play the next shot in normal playback mode.

Figure 5: Shots in the Video of the Views Panel in the IBM VideoAnnEx Annotation Tool.

    The Region Annotation pop-up window shown in Figure 6 allows the author to associate a rectangular region with a labeled text annotation.  After the text annotations are identified in the Shot Annotation window, each description can be associated with a corresponding region on the selected key frame of that shot.  When the author finishes checking the text annotations and clicks the <OK> button, the Region Annotation window appears.  On the left side of the Region Annotation window is a column of descriptions listed under <Annotation List>.  On the right side is the selected key frame for this shot, along with any rectangular regions.  Each description on the <Annotation List> has at most one corresponding region on the key frame.

Figure 6: Region Annotation of the IBM VideoAnnEx Annotation Tool.

The descriptions under the <Annotation List> may be presented in one of four colors:

  • Black - the corresponding description has not been region annotated.
  • Blue - the corresponding description is currently selected.
  • Gray - the corresponding description has been labeled with a rectangular region.
  • Red - the corresponding description has no applicable region (i.e., the author clicked <N/A>).

The regions on the Key Frame image may be presented in one of two colors:

  • Blue - the region is associated with a description that is not currently selected (i.e., a description shown in Gray).
  • White - the region is associated with the currently selected description (i.e., the description shown in Blue).

When the Region Annotation window pops up, the first description on the <Annotation List> is selected and highlighted in Blue, while the other descriptions are colored Black.  The system then waits for the author to mark the region of the image where the description appears by clicking and dragging a rectangular bounding box around the area of interest.  As soon as a region is designated for one description, the system advances to the next description on the list.  If there is no applicable region on the key frame image, the author clicks the <N/A> button, and the corresponding description appears in Red.  At any time, the author can click any description on the <Annotation List> to make it the current selection.  The description text then appears in Blue and the corresponding region, if any, appears in White.  This also allows the author to modify the region of any description at any time.  For rules regarding region annotation, please refer to the Annotation Guidelines.
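
The color rules above amount to a small state table, sketched here in Python.  The `RegionState` enum and the function names are assumptions made for this illustration; the sketch also assumes, following the text, that the current selection is always shown in Blue regardless of its state.

```python
from enum import Enum


class RegionState(Enum):
    PENDING = "pending"            # not yet region annotated
    LABELED = "labeled"            # has a rectangular region
    NOT_APPLICABLE = "n/a"         # author clicked <N/A>


def description_color(state, is_current):
    """Color of a description in the <Annotation List>."""
    if is_current:
        return "Blue"  # the current selection overrides the state color
    return {
        RegionState.PENDING: "Black",
        RegionState.LABELED: "Gray",
        RegionState.NOT_APPLICABLE: "Red",
    }[state]


def region_color(is_current):
    """Color of a rectangle drawn on the key frame."""
    return "White" if is_current else "Blue"


def list_colors(states, current):
    """Colors of all descriptions, given the index of the current one."""
    return [description_color(s, i == current) for i, s in enumerate(states)]


states = [RegionState.PENDING] * 3
print(list_colors(states, current=0))  # ['Blue', 'Black', 'Black']
states[0] = RegionState.LABELED        # a box is drawn for the first description
print(list_colors(states, current=1))  # ['Gray', 'Blue', 'Black']
```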



User's Guide

Download Software

Download the IBM VideoAnnEx annotation tool at the IBM alphaWorks web site:

                    http://www.alphaworks.ibm.com

 


Annotate Video

1.    Open an MPEG video for annotation.
        > File    > Open
        Select the location of the MPEG-1 or MPEG-2 video file.

2.    After an MPEG video is opened, the annotation lexicon will appear in the Shot Annotation panel.

3.    Play the video sequence on the Video Playback window by selecting the <Play>, <FF>, <FFF>, or <Stop> buttons.

4.    The video will pause playing at the end of the current shot, waiting for the author to enter the annotations.

5.    For the current video shot, identify the annotation by selecting the check boxes on the Shot Annotation module.
        Each shot should have at least one selection from the <Static Scenes> and from the <Key Objects>.
        Annotations for temporal features and actions can be selected from <Events>.
        Furthermore, the author can specify other descriptions in the <Keywords> textbox.
        Multiple entries can be entered for <Keywords>, as long as they are separated by commas.

6.    When the author finishes annotating a shot, click the <OK> button on the Shot Annotation module
        in order to advance to the next shot.

7.    View the annotations by switching to the Shots in the Video Views Panel.

8.    Save the annotations for this video.
        > File    > Save MPEG-7 XML
        Specify the location and filename.
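
Since multiple <Keywords> entries are separated by commas, splitting such an entry could look like the Python sketch below.  The function name is an assumption, and trimming whitespace and dropping empty entries are guesses about sensible behavior rather than documented behavior of the tool.

```python
def parse_keywords(text):
    """Split a comma-separated <Keywords> entry into clean keyword strings."""
    return [k.strip() for k in text.split(",") if k.strip()]


print(parse_keywords("Hoover Dam, Lake ,, Title"))  # ['Hoover Dam', 'Lake', 'Title']
```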


Annotation Guidelines

An important step in using the VideoAnnEx annotation tool is to study the annotation lexicon.  The lexicon is divided into three categories, as displayed in the Shot Annotation module.  When annotating a shot, keep in mind that the shot takes place in some scene, so we suggest annotating the static scene descriptions first.  Next, focus on the key subjects in the scene and identify them with key object descriptions.  Finally, observe the actions performed by these objects and label them with event descriptions.  Some vocabulary is not available in the lexicon; use the Keywords box to annotate additional descriptions.  Keywords may include proper nouns, titles, captions, and other remarks.  At the end of this section, we have compiled a list of Keyword Vocabulary, some sample Keyword Images, and a list of Annotation Tips.

After the text annotations for a shot are specified, the regions corresponding to these descriptions are also recorded.  Here are the guidelines for identifying the regions of interest.  Note that these guidelines are suggestions only and were written with our goal of training video retrieval models in mind.  The guidelines are divided into three parts, corresponding to the three lexicon categories: static scenes, key objects, and events.  Here is the summary:

  • For a static scene annotation, we can inscribe the bounding box within the region of interest, so as to capture the corresponding color and texture features.  [e.g., clouds, water, greenery, desert]
  • For a key object annotation, we can circumscribe the bounding box around the region of interest, so as to capture the corresponding shape, edge, and dominant color.  [e.g., airplane, deer, flag, person, logo]
  • For an event annotation, we do not need to specify any region for the bounding box, since the key objects that performed these actions are already annotated.  [e.g., rocket launch, boat sailing, person speaking]
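
The difference between an inscribed and a circumscribed bounding box can be illustrated with a small Python sketch.  It assumes a region is given as a set of (x, y) pixel coordinates and a box as inclusive corner coordinates; both representations and the function names are choices made for this example, not part of the tool.

```python
def circumscribed_box(pixels):
    """Smallest axis-aligned box (x0, y0, x1, y1), inclusive, containing all pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def is_inscribed(box, pixels):
    """True if every pixel inside the box belongs to the region."""
    x0, y0, x1, y1 = box
    return all(
        (x, y) in pixels
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )


# An L-shaped region: a 4x2 bar with a 2x2 block below its left end.
region = {(x, y) for x in range(4) for y in range(2)}
region |= {(x, y) for x in range(2) for y in range(2, 4)}

outer = circumscribed_box(region)
print(outer)                                # (0, 0, 3, 3)
print(is_inscribed(outer, region))          # False: the box covers pixels outside the region
print(is_inscribed((0, 0, 3, 1), region))   # True: the box lies entirely inside the region
```

An inscribed box contains only pixels of the scene, which suits color and texture features; a circumscribed box contains the whole object, which suits shape and edge features.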

As a final note for our annotation guidelines, remember to always annotate conservatively.
"When in doubt, do not annotate."


Keyword Vocabulary

Here is a listing of the keywords used in our vocabulary for the TREC Video Retrieval Benchmark.  A corresponding set of Keyword Images is listed in the appendix for your reference.
 
Statue of Liberty, US Flag, Corn
John Deere, Antlers
Jupiter, Mars
Apollo, NASA, Discovery
Perseus, Bi-Plane, X-29
Ron Vaughn, Ronald Reagan, Harry Hertz
David J. Nash, Lou Gossett Jr., Lynn Bondurant
Fort, Dome, Northwest
Star Wars, R2D2, 3CPO
Glen Canyon, Grand Canyon, Hoover Dam
Flood, Lake
Monologue, Dialogue, Title




Annotation Tips

1.  On the Key Objects panel, if you click on Rocket, Transportation is implied.  You should not click on both, since that just creates redundant information to be stored in the database.

2.  Many categories are not actually disjoint, but for our purposes we assume that they are.  For example, a Rocket is a form of Transportation as well as a Man-Made Object.  Since we assume that Man-Made Object excludes Transportation, we would click on Rocket and not on Man-Made Object.

3.  Do not mix up descriptions that are in different categories.  Anything you label under Static Scene describes the background, and you must pick the one background that best describes the sequence.  Clicking on Man-Made under Static Scene means that the background is man-made, not the key object in the foreground.  Thus, Man-Made static scenes include Road, Cityscape, and other outdoor places that are not naturally occurring.  Again, notice that although Man-Made scenery may be outdoors, we would not click on Outdoors; we would click on Man-Made instead.

4.  In the Events category:

  • Take-Off/Launch - Use this event for the take-off or launch of a helicopter, hot-air balloon, airplane, rocket, or space shuttle.
  • Landing - Use this event for the landing of any transportation vehicle.  The key object in the key frame will tell us what specifically is landing.
  • Blank - Use this event when the scene is blank, contains no useful information, or has nothing worthwhile to annotate.

5.  In the Key Objects category, note the distinction between a Rocket and a Space Shuttle.  A rocket is a jet engine that propels something through the air.  We will think of rockets as rocket engines, which propel through the air, as well as the casings that surround space shuttles.  A space shuttle is what astronauts actually travel in.  Space shuttles are much bigger and have doors and landing gear.  Here are two pictures of a rocket and a space shuttle.
[Images: Rocket; Space Shuttle]
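
Tips 1 and 2 describe dropping a selection that a more specific one already implies.  A minimal Python sketch of that rule follows; the `PARENT` table is a hypothetical fragment (only the Rocket and Transportation relationship comes from the tips above), and the function names are assumptions.

```python
# Hypothetical child-to-parent links in the Key Objects lexicon.
# Following tip 2, Man-Made Object is treated as excluding Transportation.
PARENT = {
    "Rocket": "Transportation",
    "Space Shuttle": "Transportation",  # assumed placement, for illustration
    "Transportation": None,
    "Man-Made Object": None,
}


def ancestors(term):
    """Return every description implied by selecting `term`."""
    implied = set()
    parent = PARENT.get(term)
    while parent is not None:
        implied.add(parent)
        parent = PARENT.get(parent)
    return implied


def drop_implied(selected):
    """Remove selections that are already implied by a more specific one."""
    implied = set()
    for term in selected:
        implied |= ancestors(term)
    return set(selected) - implied


print(sorted(drop_implied({"Rocket", "Transportation"})))  # ['Rocket']
```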



Functions and Features


File I/O

The main menu functions in the IBM VideoAnnEx annotation tool are for file I/O.  There are a total of 8 functions under the File menu, defined as follows:

  •     Open - Open an MPEG-1 or MPEG-2 video file and the corresponding CueVideo shots file.  If an FRP frame random-access file exists in the same directory, this file is loaded as well; otherwise the FRP file is generated automatically.
  •     Save MPEG-7 XML - Save the video annotation as an MPEG-7 XML file.
  •     Load MPEG-7 XML - Load the video annotation from a specified MPEG-7 XML file.
  •     Save Shot List - Save the new shots list.  The original CueVideo shots file may be modified to include a different key frame for any shot.
  •     Load Shot List - Load an existing shots list, instead of the default one loaded under the <Open> menu.
  •     Save Shot Frames - Save all the frames in the current shot as individual JPEG images under the current directory.
  •     Save Shot I-Frames - Save all the I-frames in the current shot as individual JPEG images under the current directory.
  •     Save All Key Frames - Save all the key frames in the entire video as individual JPEG images under the current directory.
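
The load-or-generate behavior described for <Open> can be sketched in Python as follows.  The `.frp` extension, the assumption that the FRP file shares the video's base name, and the function name are all hypothetical; the tool's real naming convention is not stated here.

```python
from pathlib import Path


def frp_action(video_path):
    """Decide whether the frame random-access file is loaded or generated.

    Assumes (hypothetically) that the FRP file sits next to the video and
    has the same base name with a ".frp" extension.
    """
    frp = Path(video_path).with_suffix(".frp")
    action = "load" if frp.exists() else "generate"
    return action, frp


print(frp_action("news.mpg"))
```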

Advanced Features

The IBM VideoAnnEx annotation tool provides advanced users with additional functions to refine the annotation process.  These features are itemized below by task.

  • Go to a specific shot in the video.
                Go to the Shots in the Video Views Panel to display all the shot key frames in the video.
                Double-click on the shot image in this Views Panel that you would like to go to.
                The new current shot will be played back in the Video Playback window.
                The corresponding key frame will be displayed on the Key Frame window of the Shot Annotation module.
  • Designate a new key frame for this shot.
                Go to the shot in which you want to modify the key frame. [see above]
                Go to the Frames in the Shot Views Panel to display all the representative I-frames in the current shot.
                Double-click on the image in this Views Panel that you would like to designate as the new key frame.
                The new key frame will be displayed on the Key Frame window of the Shot Annotation module.
  • Use a different shots list.
                Open the video sequence.  <File>  <Open>  Specify video filename.
                Designate a different shots list.  <File>  <Load Shot List>  Specify a new shots filename.
  • Modify the annotation for a shot.
                Go to the Shots in the Video Views Panel to display all the shot key frames in the video.
                Go to the shot in which you want to modify the annotation. [see above]
                The annotated descriptions will appear below each shot key frame image in this Views Panel.
                The corresponding descriptions will appear in the annotation lexicon of the Shot Annotation module with check marks.
                Click on the existing check marks of the corresponding boxes for those annotations that you would like to delete.
                Click on the corresponding boxes for those annotations that you would like to add.
                Click <OK> when done modifying this shot.
  • Review the annotation for a video.
                Open the video sequence.  <File>  <Open>  Specify video filename.
                Load shots list.  Use default shot filename or specify a different one. [see above]
                Open annotation descriptions.  <File> <Load MPEG-7 XML>  Specify XML filename.
                Go to the Shots in the Video Views Panel to display all the shot key frames in the video.
                The annotated descriptions will appear below each shot key frame image in this Views Panel.
                The corresponding descriptions will appear in the annotation lexicon of the Shot Annotation module with check marks.
 
 



Example

In this section, we illustrate how to start using the IBM VideoAnnEx annotation tool to generate an MPEG-7 XML description file.  The topics covered include using the basic features of this tool to display the video content, annotate the video sequence, save the annotations, and review the annotations.
 
 

Open a Video Sequence

On Menu, <File>  <Open>
Specify the video filename.

Play the Video Content

On Video Playback, <Play> or <FF> or <FFF>
Pause by clicking <Stop>

View all Frames in the Shot

On Views Panel, <Frames in the Shot>

View all Shots in the Video

On Views Panel, <Shots in the Video>

Study the Annotation Lexicons

On Shot Annotation, scroll up and down the <Events>, <Static Scenes>, and <Key Objects> panels.  Note the hierarchical structures of the annotation lexicons.

Annotate the Shot

On Shot Annotation, click the boxes next to the appropriate annotations that describe the video shot.
Also, type additional descriptions in the <Keywords> box.  When finished with the shot annotation, click the <OK> button.

Check the Shot Annotations

On Views Panel, go to <Shots in the Video>.
The annotations are listed under the key images of each shot.

Save the Annotations

On Menu, <File>  <Save MPEG-7 XML>.
Specify the output XML filename.
 

Load the Annotations

First, the video sequence must be opened.
On Menu, <File>  <Load MPEG-7 XML>.
Specify the XML filename. 

Select New Key Frame for a Shot

Go to the shot whose key frame is to be modified.
On Views Panel, select <Frames in the Shot>.
Double-click on the desired image to designate it as the new key frame for this shot.  The new key frame will be displayed in the <Key Frame> window of the Shot Annotation module.

Modify the Shot Annotations

Go to the shot whose annotations are to be modified.
On Views Panel, go to <Shots in the Video>.
Double-click on the key image of that shot.
The key image will become highlighted.
The corresponding annotations will be displayed on the Shot Annotation windows with marked check boxes.
Modify the annotation by clicking the check boxes.
 

