Audio-visual processing for scene change detection

  • Caterina Saraceno
  • Riccardo Leonardi
Poster Session C: Compression, Hardware & Software, Databases, Neural Networks, Object Recognition & Reconstruction
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1311)


Organizing video databases according to the semantic content of their data is a key issue in multimedia technologies, since it would allow tasks such as indexing and retrieval to work more efficiently.

In an attempt to extract semantic information, efforts have been devoted to segmenting video into shots and extracting, for each shot, information such as a representative video frame. Because a video sequence is constructed from a 2-D projection of a 3-D scene, processing video information alone has shown its limitations, especially for problems such as object identification and object tracking. Furthermore, not all information is contained in the video signal, and more can be achieved by also analyzing the audio signal. The audio signal can be used either to confirm the results obtained by a video processing unit or to acquire information that cannot be extracted from video (such as the presence of music).

This paper presents a technique which combines video and audio information for classification and indexing purposes.
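As an illustration of the idea that audio can confirm the result of a video processing unit, the following is a minimal sketch and not the authors' method. It assumes that per-frame normalized histograms and a mono audio track are already available. The histogram-difference test, the energy-ratio test, and all function names, thresholds, and window sizes are hypothetical choices made only for this example.

```python
from typing import List, Sequence


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def video_cut_candidates(
    histograms: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Return every frame index i whose histogram differs strongly from frame i - 1."""
    return [
        i
        for i in range(1, len(histograms))
        if histogram_difference(histograms[i - 1], histograms[i]) > threshold
    ]


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of audio samples (0.0 if empty)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def audio_confirms(
    audio: Sequence[float], cut_sample: int, window: int, ratio: float
) -> bool:
    """True if the energy just before and just after the cut differ by at least `ratio`."""
    before = mean_energy(audio[max(0, cut_sample - window):cut_sample])
    after = mean_energy(audio[cut_sample:cut_sample + window])
    low, high = min(before, after), max(before, after)
    if high == 0.0:
        return False  # silence on both sides: no audio evidence of a change
    return high >= ratio * max(low, 1e-12)


def detect_scene_changes(
    histograms: Sequence[Sequence[float]],
    audio: Sequence[float],
    samples_per_frame: int,
    hist_threshold: float = 0.5,  # assumed value, for illustration only
    window: int = 4,              # assumed value, in audio samples
    energy_ratio: float = 4.0,    # assumed value, for illustration only
) -> List[int]:
    """Keep only the video cut candidates that the audio track also supports."""
    return [
        i
        for i in video_cut_candidates(histograms, hist_threshold)
        if audio_confirms(audio, i * samples_per_frame, window, energy_ratio)
    ]


if __name__ == "__main__":
    # Six frames with visual cuts at frames 3 and 5; the audio changes only at frame 3.
    frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2 + [[1.0, 0.0]]
    track = [0.01] * 12 + [1.0] * 12  # 4 audio samples per video frame
    print(detect_scene_changes(frames, track, samples_per_frame=4))
```

In this toy input the visual cut at frame 5 is discarded because the audio energy is unchanged around it, while the cut at frame 3 coincides with a jump in audio energy and is kept.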


Audio Signal, Finite State Machine, Scene Change, Video Information, Audio Frame
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Caterina Saraceno (1)
  • Riccardo Leonardi (1)
  1. Signals and Communications Lab., Dept. of Electronics for Automation, School of Engineering, University of Brescia, Italy
