Scene Change Detection Based on Audio-Visual Analysis and Interaction

  • Conference paper

Multi-Image Analysis

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2032)

Abstract

This paper presents a scene change detection method that analyzes both the auditory and the visual information sources and exploits their inter-relations and temporal coincidence to identify video scenes semantically. Audio analysis segments the audio source into three types of semantic primitives: silence, speech and music. Speech segments are further processed to locate speaker change instants. Video analysis segments the video source into shots while remaining robust to camera pans, zoom-ins/outs and significant object motion. Because segmentation based on a single source is in some cases suboptimal, audio-visual interaction either enhances the single-source findings or extracts higher-level semantic information. The aim is to identify semantically meaningful video scenes by exploiting the temporal correlation of the two sources, based on the observation that semantic changes are accompanied by significant changes in both of them. Experiments have been carried out on a real TV serial sequence containing many different scenes with commercials interspersed among them. The results prove rather promising.
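
The following is a minimal sketch of the coincidence idea summarized above, not the authors' actual algorithm: a shot boundary is promoted to a scene change only when an audio change (silence/speech/music transition or speaker change) occurs nearby in time. The function name, time units and tolerance window are illustrative assumptions.

```python
def fuse_scene_changes(audio_changes, shot_boundaries, window=1.0):
    """Return time instants (seconds) where audio and visual changes coincide.

    audio_changes   -- times of audio-class or speaker changes (hypothetical input)
    shot_boundaries -- times of detected shot cuts (hypothetical input)
    window          -- maximum allowed gap between the two change points (assumed)
    """
    scene_changes = []
    for t_shot in sorted(shot_boundaries):
        # Keep a shot cut as a scene change only if some audio change
        # lies within +/- window seconds of it.
        if any(abs(t_shot - t_audio) <= window for t_audio in audio_changes):
            scene_changes.append(t_shot)
    return scene_changes


if __name__ == "__main__":
    # Toy example: shot cuts at 12.0 s and 47.5 s; an audio change near 47 s only.
    print(fuse_scene_changes(audio_changes=[30.2, 47.1],
                             shot_boundaries=[12.0, 47.5]))
    # -> [47.5]
```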

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tsekeridou, S., Krinidis, S., Pitas, I. (2001). Scene Change Detection Based on Audio-Visual Analysis and Interaction. In: Klette, R., Gimel’farb, G., Huang, T. (eds) Multi-Image Analysis. Lecture Notes in Computer Science, vol 2032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45134-X_16

  • DOI: https://doi.org/10.1007/3-540-45134-X_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42122-1

  • Online ISBN: 978-3-540-45134-1

  • eBook Packages: Springer Book Archive
