Video Scene Detection

Video scene detection is a fundamental video processing step that divides a video into its constituent scenes.

Temporal video scenes form a semantic layer of division above shots. Each scene typically conveys a specific concept or theme that serves as one component of the story the video tells.

This technology allows contextual information to be analyzed at the shot/scene level, and can enable automated and efficient video browsing, indexing and summarization.

Our method first performs shot boundary detection to find the shots in the video. We then extract an audio representation and a visual representation for each shot, combine them by intermediate fusion into a single multimodal representation, and apply optimal sequential grouping to that fused representation to group consecutive shots into scenes.
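
As a rough illustration of the grouping step, the sketch below partitions a sequence of shots into contiguous scenes with dynamic programming. It is a simplified stand-in, not our published formulation. It assumes that intermediate fusion is plain concatenation of per-shot audio and visual vectors, that the cost of a scene is the sum of pairwise Euclidean distances between its shots, and that the number of scenes is given. The names `fuse` and `group_shots` are hypothetical.

```python
from math import dist, inf


def fuse(visual, audio):
    """Assumed fusion: concatenate each shot's visual and audio vectors."""
    return [list(v) + list(a) for v, a in zip(visual, audio)]


def group_shots(features, num_scenes):
    """Split shots into `num_scenes` contiguous groups that minimize the
    summed pairwise distance inside each group (an assumed cost function).
    Returns the index of the first shot of every scene."""
    n = len(features)
    if not 1 <= num_scenes <= n:
        raise ValueError("num_scenes must be between 1 and the number of shots")

    # Pairwise distances between fused shot representations.
    d = [[dist(features[i], features[j]) for j in range(n)] for i in range(n)]

    # cost[i][j]: sum of pairwise distances among shots i..j (inclusive).
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            cost[i][j] = cost[i][j - 1] + sum(d[k][j] for k in range(i, j))

    # best[k][j]: minimal cost of splitting the first j shots into k scenes.
    # back[k][j]: first shot of the last scene in that optimal split.
    best = [[inf] * (n + 1) for _ in range(num_scenes + 1)]
    back = [[0] * (n + 1) for _ in range(num_scenes + 1)]
    best[0][0] = 0.0
    for k in range(1, num_scenes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost[i][j - 1]
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the last scene to the first.
    starts = []
    j = n
    for k in range(num_scenes, 0, -1):
        j = back[k][j]
        starts.append(j)
    return starts[::-1]


# Toy example: five shots with made-up 2-D visual and 1-D audio features.
visual = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0]]
audio = [[0], [0], [1], [1], [2]]
scene_starts = group_shots(fuse(visual, audio), num_scenes=3)
print(scene_starts)  # [0, 2, 4]
```

Because scenes are contiguous runs of shots, the dynamic program only has to choose where each scene begins, which keeps the search exact rather than heuristic. The cubic-time loops are acceptable for the few hundred shots of a typical film but are not optimized.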

For more information, please see our blog post or our publications (detailed below).

We also make the Open Video Scene Detection (OVSD) dataset available. The dataset is built from Creative Commons licensed videos that are freely available for download and use. They are short or full-length movies from a variety of genres, including but not limited to animation, documentary, drama, crime, comedy, and sci-fi. We provide ground truth scene annotations so that scene detection technology can be evaluated.



Daniel Rotman, Video AI Technologies, IBM Research - Haifa