Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Audio Classification

  • Lie Lu
  • Alan Hanjalic
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1032


Synonyms

Audio categorization; Audio indexing; Audio recognition

Definition

Audio classification assigns a segment of an audio signal to one of a set of predefined semantic classes. It is typically realized in two steps: a learning step, which estimates a statistical model of each semantic class, and an inference step, which determines the semantic class whose model best matches the given segment.
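The two steps described above can be illustrated with a minimal sketch. It is not the method of any particular system covered by this entry: it assumes that each audio segment has already been reduced to a short feature vector, that each class is modeled by a diagonal-covariance Gaussian, and that "closest" means highest log-likelihood. The function names (`fit_class_models`, `log_likelihood`, `classify`), the two features, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
# A class model is (per-dimension mean, per-dimension variance).
Model = Tuple[List[float], List[float]]


def fit_class_models(training: Dict[str, List[Vector]],
                     var_floor: float = 1e-6) -> Dict[str, Model]:
    """Learning step: estimate one diagonal Gaussian per semantic class."""
    models: Dict[str, Model] = {}
    for label, vectors in training.items():
        n = len(vectors)
        dims = len(vectors[0])
        means = [sum(v[d] for v in vectors) / n for d in range(dims)]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, var_floor)
            for d in range(dims)
        ]
        models[label] = (means, variances)
    return models


def log_likelihood(x: Vector, model: Model) -> float:
    """Log-density of x under a diagonal Gaussian class model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Inference step: return the class whose model scores x highest."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


# Made-up training data: (zero-crossing rate, short-time energy) per segment.
# These values are illustrative assumptions, not measurements.
training = {
    "speech": [(0.12, 0.30), (0.15, 0.35), (0.10, 0.28), (0.14, 0.33)],
    "music": [(0.05, 0.60), (0.06, 0.65), (0.04, 0.58), (0.07, 0.62)],
    "silence": [(0.01, 0.01), (0.02, 0.02), (0.01, 0.015), (0.015, 0.01)],
}

models = fit_class_models(training)
print(classify((0.13, 0.31), models))
```

In practice, richer class models and features are commonly used. The sketch keeps only the structure of the approach: one model per class, followed by a maximum-score decision.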

Historical Background

Audio classification associates semantic labels with audio signals, and can also be referred to as audio indexing, audio categorization or audio recognition. As such, audio classification plays an important role in facilitating search and retrieval in large-scale audio collections (databases). Semantic labels are used to represent semantic classes or semantic concepts, which can be defined at different abstraction and complexity levels. Typical examples of basic semantic audio classes are speech, music, environmental sounds, and silence, which can be detected rather...

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Microsoft Research Asia, Beijing, China
  2. Delft University of Technology, Delft, The Netherlands

Section editors and affiliations

  • Vincent Oria (1)
  • Shin'ichi Satoh (2)

  1. Dept. of Computer Science, New Jersey Inst. of Technology, Newark, USA
  2. Digital Content and Media Sciences Resea, Multimedia Information Research Division, National Institute of Informatics, Tokyo, Japan