Scene Change Detection Based on Audio-Visual Analysis and Interaction

  • Conference paper

Multi-Image Analysis

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2032)

Abstract

This paper presents a scene change detection method that analyzes both the auditory and the visual information sources and exploits their inter-relations and temporal coincidence to identify video scenes semantically. Audio analysis segments the audio source into three types of semantic primitives: silence, speech and music. Speech segments are further processed to locate speaker change instants. Video analysis segments the video source into shots while remaining robust to camera pans, zoom-ins/outs and significant object motion. Because segmentation based on a single source is in some cases suboptimal, audio-visual interaction either enhances the single-source findings or extracts higher-level semantic information. The aim is to identify semantically meaningful video scenes by exploiting the temporal correlation of the two sources, based on the observation that semantic changes are accompanied by significant changes in both of them. Experiments have been carried out on a real TV serial sequence containing many different scenes with commercials interspersed among them. The results prove rather promising.
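
The following is a minimal sketch of the coincidence idea summarized above, not the authors' actual algorithm: a shot boundary is promoted to a scene change only when an audio change (silence/speech/music transition or speaker change) occurs nearby in time. The function name, time units and tolerance window are illustrative assumptions.

```python
def fuse_scene_changes(audio_changes, shot_boundaries, window=1.0):
    """Return time instants (seconds) where audio and visual changes coincide.

    audio_changes   -- times of audio-class or speaker changes (hypothetical input)
    shot_boundaries -- times of detected shot cuts (hypothetical input)
    window          -- maximum allowed gap between the two change points (assumed)
    """
    scene_changes = []
    for t_shot in sorted(shot_boundaries):
        # Keep a shot cut as a scene change only if some audio change
        # lies within +/- window seconds of it.
        if any(abs(t_shot - t_audio) <= window for t_audio in audio_changes):
            scene_changes.append(t_shot)
    return scene_changes


if __name__ == "__main__":
    # Toy example: shot cuts at 12.0 s and 47.5 s; an audio change near 47 s only.
    print(fuse_scene_changes(audio_changes=[30.2, 47.1],
                             shot_boundaries=[12.0, 47.5]))
    # -> [47.5]
```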

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsekeridou, S., Krinidis, S., Pitas, I. (2001). Scene Change Detection Based on Audio-Visual Analysis and Interaction. In: Klette, R., Gimel’farb, G., Huang, T. (eds) Multi-Image Analysis. Lecture Notes in Computer Science, vol 2032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45134-X_16

  • DOI: https://doi.org/10.1007/3-540-45134-X_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42122-1

  • Online ISBN: 978-3-540-45134-1

  • eBook Packages: Springer Book Archive
