Audio Content Analysis for Understanding Structures of Scene in Video

  • Chan-Mi Kang
  • Joong-Hwan Baek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4113)


In this paper, we propose a system that categorizes audio into seven classes. As classification features, we use the mean and variance of the root mean square (RMS), zero-crossing rate (ZCR), fundamental frequency, and frequency peak, each extracted from every 25 ms frame. In addition to audio content classification, we perform speaker identification on voice sequences that are extracted automatically by our proposed method. The proposed scheme reaches an accuracy of 93.8% in categorizing audio signals and 80% in speaker identification.
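The frame-level feature step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it covers only RMS and ZCR (fundamental frequency and frequency peak are omitted), it assumes non-overlapping 25 ms frames and the population variance, and the function names `frame_signal`, `rms`, `zcr`, and `clip_features` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def frame_signal(samples, sample_rate, frame_ms=25.0):
    """Split samples into non-overlapping frames of frame_ms milliseconds.

    Assumption: no overlap between frames; the abstract only states the
    frame length, not the hop size.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 2:
        raise ValueError("frame must contain at least two samples")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def rms(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def clip_features(samples, sample_rate, frame_ms=25.0):
    """Mean and variance of per-frame RMS and ZCR over one audio clip."""
    frames = frame_signal(samples, sample_rate, frame_ms)
    if not frames:
        raise ValueError("clip is shorter than one frame")
    rms_vals = [rms(f) for f in frames]
    zcr_vals = [zcr(f) for f in frames]
    return {
        "rms_mean": fmean(rms_vals),
        "rms_var": pvariance(rms_vals),  # assumption: population variance
        "zcr_mean": fmean(zcr_vals),
        "zcr_var": pvariance(zcr_vals),
    }


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone sampled at 8 kHz.
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    print(clip_features(tone, sr))
```

The resulting dictionary would form part of the per-clip feature vector passed to a classifier; the abstract does not specify the classifier, so none is shown here.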


Keywords: Root Mean Square; Fundamental Frequency; Gaussian Mixture Model; Audio Signal; Temporal Autocorrelation





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chan-Mi Kang (1)
  • Joong-Hwan Baek (2)

  1. Multimedia Retrieval Lab., School of Electronics and Communication Engineering, Hankuk Aviation University
  2. School of Electronics and Communication Engineering, Hankuk Aviation University
