Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Audio Content Analysis

  • Lie Lu
  • Alan Hanjalic
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1528


Audio information retrieval; Semantic inference in audio


An audio signal is a signal that contains information in the audible frequency range. Audio content analysis refers to a set of theories, algorithms and systems that aim to extract descriptors or metadata related to audio content, and thereby to enable search, retrieval and other user actions on audio signals.
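
To make the notion of a "descriptor" concrete, the following is a minimal sketch, not taken from this entry, of two simple frame-level descriptors commonly used in audio analysis: short-time energy and zero-crossing rate. The function name `frame_descriptors`, the frame and hop sizes, the 8 kHz sampling rate and the synthetic two-tone signal are all assumptions chosen for illustration only.

```python
import math


def frame_descriptors(samples, frame_len=400, hop=200):
    """Return a list of (short-time energy, zero-crossing rate) per frame.

    Hypothetical helper for illustration; frame_len and hop are in samples.
    """
    descriptors = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        descriptors.append((energy, zcr))
    return descriptors


# Assumed test signal: one second of a 100 Hz tone followed by one second
# of a 1000 Hz tone, both sampled at 8 kHz.
SAMPLE_RATE = 8000
low = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
high = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
features = frame_descriptors(low + high)

first_energy, first_zcr = features[0]
last_energy, last_zcr = features[-1]
print(f"first frame: energy={first_energy:.3f}, zcr={first_zcr:.3f}")
print(f"last frame:  energy={last_energy:.3f}, zcr={last_zcr:.3f}")
```

In this sketch the two tones have similar energy but clearly different zero-crossing rates, which shows how even a very simple descriptor can separate segments of a signal. Practical systems typically rely on richer features and learned models.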

Historical Background

Multimedia content analysis has been one of the fastest-growing research directions in recent years. It aims to provide fast, natural, intuitive and personalized content-based access to vast multimedia data collections, and it builds on the synergy of many scientific disciplines, such as signal processing, pattern recognition, machine learning, information retrieval, information theory, natural language processing and psychology. The research initiative, born around the end of the 1980s, has succeeded in inspiring and mobilizing an enormous number of researchers worldwide....

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Microsoft Research Asia, Beijing, China
  2. Delft University of Technology, Delft, The Netherlands

Section editors and affiliations

  • Vincent Oria (1)
  • Shin'ichi Satoh (2)
  1. Dept. of Computer Science, New Jersey Inst. of Technology, Newark, USA
  2. Digital Content and Media Sciences Resea, Multimedia Information Research Division, National Institute of Informatics, Tokyo, Japan