Video enrichment/Image annotation scheme and search |
Motion recognition is based on changes in an object's shape over time. The color information inside the image region occupied by the object is discarded, leaving a silhouette. This silhouette changes as the object moves, generating patterns in eigenspace that characterize given movements. A motion can therefore be recognized by matching a movement in eigenspace against previously recorded movement patterns.

This process requires computing the changes in an object's continuous movement and mapping them to the eigenspace; at the present stage of development, however, these changes are entered manually. This is done by determining an object's motion during an interval, assigning a motion identifier, and registering the frame numbers at the beginning and end of the movement. Since an object is considered to perform the same motion in every frame within the interval, the data is entered only at the boundaries, where the object changes to a different movement. This is done for all objects over their lifetime in the video. The essential description unit is thus the motion identifier.

The description of an object's movements is called an "Action". Its annotation comprises the motion identifier, the start and end frames, and the object's position observed through time, that is, its trajectory, described as a series of discrete points in the time interval. The position of the object at an arbitrary point in time can be calculated by interpolating the points registered in the "Action".

The example below depicts 20 seconds of a soccer game, showing the movements of the main players (thin black lines) and the trajectory of the ball (thick red line). It was obtained by analyzing scenes of a video from an actual soccer game, extracting the objects, recovering the camera movement parameters, and recreating the movements of each player on the playing field.
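The boundary-only annotation of motions described above can be illustrated with a minimal sketch. It assumes that a motion label is available for every frame and collapses runs of identical labels into intervals; the names `MotionInterval` and `label_intervals` are hypothetical and are not part of the system described here.

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple


class MotionInterval(NamedTuple):
    """One annotated interval (hypothetical record layout)."""
    motion_id: str    # motion identifier, e.g. "Walk"
    start_frame: int  # first frame of the movement
    end_frame: int    # last frame of the movement (inclusive)


def label_intervals(frame_labels: Iterable[str],
                    first_frame: int = 0) -> List[MotionInterval]:
    """Collapse per-frame motion labels into boundary-only intervals."""
    intervals = []
    frame = first_frame
    for motion_id, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        intervals.append(MotionInterval(motion_id, frame, frame + length - 1))
        frame += length
    return intervals


# Six frames starting at frame 100; only the three boundaries are stored.
labels = ["Stay", "Stay", "Walk", "Walk", "Walk", "Run"]
intervals = label_intervals(labels, first_frame=100)
```

Storing only the interval boundaries keeps the annotation compact, since a motion usually spans many frames.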
[Figure: movements of the main players and trajectory of the ball over 20 seconds of a soccer game]

The figure below represents the concept of objects in a time interval. (A) and (B) represent the teams, and Obj. X is the ball. The annotation for the ball's movement is that of an object without a motion identifier.

Action ::= <Action ID> <Time Interval> <Object ID> <Trajectory>

[Figure: objects in a time interval]

Next, an "Interaction" is built to describe the meaning of a scene composed of several objects. Objects pictured in a scene can have different lifetimes and may be performing different actions, but it is their interaction that is used to annotate the scene.

Interaction ::= <Interaction ID> <Time Interval> <Object No> <Object IDs> <Spatial Description>

An "Interaction" describes events such as "pass" (passing a ball) or "goal" (scoring a goal).
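As an illustration of the "Action" record and of how a position at an arbitrary time can be interpolated from its trajectory, here is a minimal sketch. The field names, the `position_at` method, and the use of linear interpolation are assumptions made for this example; the page does not specify the interpolation method or the storage format.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Action:
    """Hypothetical in-memory form of an "Action" annotation."""
    action_id: Optional[str]  # motion identifier; None for the ball
    start_frame: int
    end_frame: int
    object_id: str
    # Discrete trajectory points as (frame, x, y), sorted by frame.
    trajectory: List[Tuple[int, float, float]]

    def position_at(self, frame: int) -> Tuple[float, float]:
        """Linearly interpolate the object's position at the given frame."""
        if not self.start_frame <= frame <= self.end_frame:
            raise ValueError("frame is outside the Action's time interval")
        frames = [f for f, _, _ in self.trajectory]
        i = bisect_right(frames, frame)
        if i == 0:                 # before the first registered point
            return self.trajectory[0][1:]
        if i == len(frames):       # at or after the last registered point
            return self.trajectory[-1][1:]
        f0, x0, y0 = self.trajectory[i - 1]
        f1, x1, y1 = self.trajectory[i]
        t = (frame - f0) / (f1 - f0)
        return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


# The ball is annotated without a motion identifier.
ball = Action(None, 0, 20, "X", [(0, 0.0, 0.0), (10, 5.0, 10.0), (20, 15.0, 10.0)])
midway = ball.position_at(5)   # halfway between the first two points
later = ball.position_at(15)   # halfway between the last two points
```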
For example, a "Through_pass" is described as follows:

    begin iact Through_pass t0 O0 L0
      child_iact 1 Pass t1 O1 L1
      child_act 3 Stay Walk Run t2 o2 L2
      child_act 3 Stay Walk Run t3 o3 L3
    where /* o2 and o3 are defense players */
      get_object_from_GO o4 1 O1
      not_same_team o4 o2
      not_same_team o4 o3
      . . .
      less_than d3 7.0
      less_than d4 7.0
    fill t0 t1 O0 O1 L0 L1
    end

With this description, it is possible to search consistently for a "Through_pass". The results of such a search are used to generate new descriptions as new "Interactions", so that the corresponding scenes can be searched for and retrieved thereafter.

The following picture shows the search screen and the results of a search. The interface is based on a web browser: it sends search queries to a video database server, retrieves the results, and shows the corresponding scenes on the client side.
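To give a rough idea of how the conditions in such a description might be checked against a candidate scene, the sketch below evaluates a list of constraints against a set of variable bindings. The semantics given to `not_same_team` and `less_than`, the `SceneObject` type, and the `satisfies` function are assumptions for illustration only. The values bound to `d3` and `d4` are supplied directly, because the part of the description that defines them is elided.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object record: an identifier and a team label."""
    object_id: str
    team: str


def not_same_team(a: SceneObject, b: SceneObject) -> bool:
    # Assumed meaning: the two objects belong to different teams.
    return a.team != b.team


def less_than(value: float, limit: float) -> bool:
    # Assumed meaning: a bound value is below a numeric limit.
    return value < limit


PREDICATES = {"not_same_team": not_same_team, "less_than": less_than}


def satisfies(constraints: List[Tuple], bindings: Dict[str, object]) -> bool:
    """Return True if every constraint holds under the given bindings."""
    def resolve(arg):
        # Strings are variable names; anything else is a literal value.
        return bindings[arg] if isinstance(arg, str) else arg

    return all(
        PREDICATES[name](*(resolve(a) for a in args))
        for name, *args in constraints
    )


constraints = [
    ("not_same_team", "o4", "o2"),
    ("not_same_team", "o4", "o3"),
    ("less_than", "d3", 7.0),
    ("less_than", "d4", 7.0),
]
bindings = {
    "o2": SceneObject("o2", "B"),
    "o3": SceneObject("o3", "B"),
    "o4": SceneObject("o4", "A"),
    "d3": 5.5,  # illustrative values, not taken from the source
    "d4": 6.0,
}
is_match = satisfies(constraints, bindings)
too_far = satisfies(constraints, dict(bindings, d4=9.0))
```

Keeping the predicates in a lookup table means further conditions can be added without changing the evaluation loop.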
Last modified 30 September 1999