Overview
This system adds indexes that describe the
content of a video and generates the metadata
used when a video digest is produced.
It is very difficult for general-purpose
image processing to automatically locate
all of the index points needed to understand
the video content, so this system supports
manually adding two kinds of indexes. The
manual indexing strategies used are:
- Scene-based indexing
- Event-based indexing
The indexes are described using MPEG-7 and
used as metadata when the video digests are
generated.
Scene-based indexing
Scene-based indexing is suitable for video
content that has a stable scene layout,
since semantic indexes can be added to each
scene. For example, this technique works
well with news programs, documentaries,
and movies.
The system automatically divides the video
into a segment for each scene, and titles
and comments can then be added to each scene
as metadata. Captions, sounds, and keywords
can also be used for indexing.
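A scene-level index entry of this kind can be sketched as a small data record. This is only an illustration; the field names (title, comment, keywords) and the news-program example are assumptions, not the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical scene-level index entry; the scene boundaries come
# from the automatic segmentation step, and the annotations are
# added manually afterwards.
@dataclass
class SceneIndex:
    start_sec: float   # scene start, from automatic segmentation
    end_sec: float     # scene end
    title: str = ""    # manually added title
    comment: str = ""  # manually added comment
    keywords: List[str] = field(default_factory=list)

# Example: annotating two automatically detected scenes of a
# (hypothetical) news program.
scenes = [
    SceneIndex(0.0, 42.5, title="Opening headlines", keywords=["news"]),
    SceneIndex(42.5, 180.0, title="Economy report", keywords=["economy"]),
]
```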
Event-based indexing
For sports videos, indexes associated with
each event in the video are better for
understanding the video content; this is
what distinguishes event-based indexing
from scene-based indexing. Event-based
indexing records the kind of each event
and its time as metadata.
For example, in a soccer video, when a trigger
event occurs, additional information can
be added to the index, such as the names
of the team and player triggering the event.
There are two kinds of trigger events:
single trigger events and multi-trigger
events. A single trigger event creates an
index entry consisting of one trigger event,
while a multi-trigger event records information
involving several related trigger events.
For example, a multi-trigger event could
describe a soccer play involving cooperation
among several soccer players.
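The distinction between the two kinds of trigger events can be sketched with two record types. The type and field names here (TriggerEvent, MultiTriggerEvent, team, player) are illustrative assumptions, as is the soccer example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of a single trigger event: one kind of event
# at one point in time, with optional team/player information.
@dataclass
class TriggerEvent:
    kind: str        # e.g. "pass", "goal"
    time_sec: float
    team: str = ""
    player: str = ""

# A multi-trigger event groups several related trigger events,
# e.g. a play involving cooperation among several players.
@dataclass
class MultiTriggerEvent:
    kind: str
    triggers: List[TriggerEvent] = field(default_factory=list)

# Example: a pass followed by a goal, recorded as one multi-trigger event.
play = MultiTriggerEvent(
    kind="assist-goal",
    triggers=[
        TriggerEvent("pass", 1203.0, team="Home", player="A"),
        TriggerEvent("goal", 1205.5, team="Home", player="B"),
    ],
)
```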
For actual indexing, assigning each trigger
event to a key on the keyboard reduces the
indexing time. Using our current system,
event-based indexing takes only about 1.5
times as long as the actual video. Our next
target is real-time indexing.
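The key-assignment idea can be sketched as a simple mapping from keys to event kinds; pressing a bound key logs an index entry at the current playback time. The specific bindings and the function below are hypothetical, not the system's actual interface.

```python
# Hypothetical key-to-event bindings; the actual assignments used
# by the system are not specified in the source.
KEY_BINDINGS = {
    "g": "goal",
    "s": "shot",
    "c": "corner kick",
    "f": "foul",
}

def key_pressed(key, current_time_sec, log):
    """Record an event index entry when the operator presses a bound key."""
    kind = KEY_BINDINGS.get(key)
    if kind is not None:
        log.append({"kind": kind, "time_sec": current_time_sec})

log = []
key_pressed("g", 1205.5, log)   # bound key: logged
key_pressed("x", 1300.0, log)   # unbound key: ignored
```

Because the operator only presses one key per event while the video plays, this style of input is what keeps the indexing time close to the video's running time.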
MPEG-7 metadata
The index associated with a video is modeled
as an importance function over time. For
scene-based indexing, the length of the
generated video digest is a step function,
because the index uses discrete weights.
For event-based indexing, the index weights
take continuous values, so the digest can
be of arbitrary length. Thresholds are set
according to the digest lengths requested
by the users, and digests of various lengths
can be generated. In this authoring system,
these event indexes and weights are described
using MPEG-7.
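One way threshold-based digest generation could work is sketched below: segments are admitted in order of decreasing weight until the requested digest length is reached. This is a plausible reading of the threshold mechanism, not the system's actual algorithm; the function and parameter names are assumptions.

```python
# Sketch of digest selection from weighted segments, assuming each
# indexed segment carries an importance weight: taking segments in
# decreasing weight order is equivalent to lowering a weight
# threshold until the requested digest length is met.
def select_digest(segments, target_sec):
    """segments: list of (weight, duration_sec) pairs.
    Returns the chosen segments, highest weight first."""
    chosen, total = [], 0.0
    for weight, duration in sorted(segments, key=lambda s: -s[0]):
        if total >= target_sec:
            break
        chosen.append((weight, duration))
        total += duration
    return chosen

# Example: request a digest of about 25 seconds.
segments = [(0.9, 10.0), (0.4, 20.0), (0.7, 15.0)]
digest = select_digest(segments, target_sec=25.0)
```

With continuous event weights, almost any requested length yields a distinct threshold, which is why event-based digests can be of arbitrary length while scene-based digests change in discrete steps.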