On the Use of Audio Events for Improving Video Scene Segmentation
This work addresses the automatic temporal segmentation of a video into elementary semantic units known as scenes. Its novelty lies in the use of high-level audio information, in the form of audio events, to improve scene segmentation performance. More specifically, the technique builds upon a recent audio-visual scene segmentation approach that constructs multiple scene transition graphs (STGs) that separately exploit information from different modalities. In the extension presented in this work, audio event detection results are introduced into the definition of an audio-based STG, while a visual-based STG is defined independently. The results of these two types of STGs are subsequently combined. Results on broadcast videos demonstrate the usefulness of audio events for scene segmentation and highlight the importance of introducing additional high-level information into scene segmentation algorithms.
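As a rough illustration of the combination step, the sketch below assumes that each modality yields several STGs and that each STG returns the set of shot-boundary indices it regards as scene boundaries. The voting scheme, the names `boundary_votes` and `combine_stgs`, the weights `w_visual` and `w_audio`, and the threshold are all hypothetical choices made for this example; the abstract does not specify how the two types of STGs are actually combined.

```python
from typing import Iterable, List, Set


def boundary_votes(stg_results: Iterable[Set[int]], num_boundaries: int) -> List[float]:
    """For each shot boundary, return the fraction of STGs that mark it as a scene boundary."""
    results = list(stg_results)
    if not results:
        return [0.0] * num_boundaries
    return [
        sum(1 for r in results if b in r) / len(results)
        for b in range(num_boundaries)
    ]


def combine_stgs(
    visual: Iterable[Set[int]],
    audio: Iterable[Set[int]],
    num_boundaries: int,
    w_visual: float = 0.5,   # assumed weight, not from the source
    w_audio: float = 0.5,    # assumed weight, not from the source
    threshold: float = 0.5,  # assumed decision threshold
) -> List[int]:
    """Keep the shot boundaries whose weighted visual/audio vote reaches the threshold."""
    v = boundary_votes(visual, num_boundaries)
    a = boundary_votes(audio, num_boundaries)
    return [
        b for b in range(num_boundaries)
        if w_visual * v[b] + w_audio * a[b] >= threshold
    ]


# Toy input: three visual STGs and two audio STGs over 8 shot boundaries.
visual_stgs = [{2, 5}, {2}, {2, 7}]
audio_stgs = [{2, 7}, {7}]
scene_boundaries = combine_stgs(visual_stgs, audio_stgs, num_boundaries=8)
```

In this toy input, boundary 5 is supported by only one of the three visual STGs and by no audio STG, so it falls below the assumed threshold, whereas boundary 7 is retained mainly because both audio STGs support it.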
Keywords: Video analysis, Scene segmentation, Audio events, Scene transition graph
This work was supported by the European Commission under contracts FP6-045547 VIDI-Video and FP7-248984 GLOCAL.