Audio-Assisted Scene Segmentation for Story Browsing

  • Yu Cao
  • Wallapak Tavanapong
  • Kihwan Kim
  • JungHwan Oh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2728)


Content-based video retrieval requires an effective scene segmentation technique to divide a long video file into meaningful high-level aggregates of shots called scenes. Each scene is part of a story, and browsing these scenes unfolds the entire story of a film. In this paper, we first investigate recent scene segmentation techniques that follow the visual-audio alignment approach, which segments the visual stream into visual scenes and the audio stream into audio scenes separately, then aligns the two sets of boundaries to produce the final scene boundaries. In contrast, we propose a novel audio-assisted scene segmentation technique that uses audio information to remove false boundaries produced by segmentation on visual information alone. The crux of our technique is a new dissimilarity measure based on the statistical properties of audio features and a concept from information theory. Experimental results on two full-length films with a wide range of camera motion and a complex composition of shots demonstrate that our technique is more effective than the visual-audio alignment techniques.
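The abstract does not give the exact dissimilarity measure, only that it combines statistical properties of audio features with an information-theoretic concept. As a minimal illustrative sketch (our assumption, not the paper's exact method), one could model the audio features on each side of a candidate visual scene boundary as diagonal-covariance Gaussians and compare them with a symmetric Kullback-Leibler divergence: if the audio is statistically continuous across the boundary, the divergence is low and the boundary is discarded as false.

```python
import numpy as np

def symmetric_kl_gaussian(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu1 - mu2) ** 2) / var1 - 1.0)
    return kl12 + kl21

def prune_false_boundaries(features, boundaries, window=50, threshold=1.0):
    """Keep a candidate visual boundary only if the audio on both sides differs.

    features:   (n_frames, n_dims) array of per-frame audio features
                (e.g., MFCC-like vectors -- a hypothetical front end).
    boundaries: frame indices of candidate scene boundaries from visual
                segmentation alone.
    window:     number of frames examined on each side of a boundary.
    threshold:  minimum symmetric KL divergence to accept a boundary;
                an illustrative constant, not a value from the paper.
    """
    kept = []
    for b in boundaries:
        left = features[max(0, b - window):b]
        right = features[b:b + window]
        if len(left) < 2 or len(right) < 2:
            kept.append(b)  # not enough audio context: keep conservatively
            continue
        d = symmetric_kl_gaussian(left.mean(axis=0), left.var(axis=0) + 1e-8,
                                  right.mean(axis=0), right.var(axis=0) + 1e-8)
        if d >= threshold:
            kept.append(b)  # audio changes across the cut: plausible scene break
    return kept
```

In this sketch a boundary that coincides with a real audio change (e.g., new ambient sound or music) survives, while a purely visual cut inside one continuous soundscape is pruned, mirroring the paper's idea of using audio to remove false visual boundaries.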







Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Yu Cao (1)
  • Wallapak Tavanapong (1)
  • Kihwan Kim (1)
  • JungHwan Oh (2)
  1. Department of Computer Science, Iowa State University, Ames, USA
  2. Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, USA
