Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Audio Representation

  • Lie Lu
  • Alan Hanjalic
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1442


Synonyms

Audio characterization; Audio feature extraction

Definition


An audio signal is a signal that contains information in the audible frequency range. Audio representation refers to the extraction of audio signal properties, or features, that are representative of the signal's composition (in both the temporal and spectral domains) and of its behavior over time. Feature extraction is typically combined with feature selection, which determines the best set of features for the intended operation on the audio signal.
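As an illustration of the definition above, the following minimal sketch reduces one frame of audio samples to a small feature vector containing two temporal features (zero-crossing rate, short-time energy) and one spectral feature (spectral centroid). The choice of these three features, the function names, and the synthetic 1 kHz test tone are assumptions made for illustration only; the entry does not prescribe a particular feature set. A naive discrete Fourier transform is used so that the sketch depends only on the Python standard library.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Temporal feature: mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_centroid(frame, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz.

    Uses a naive O(N^2) DFT so the sketch needs only the standard library.
    """
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


def extract_features(frame, sample_rate):
    """Reduce one frame of samples to a three-element feature vector."""
    return [
        zero_crossing_rate(frame),
        short_time_energy(frame),
        spectral_centroid(frame, sample_rate),
    ]


# Hypothetical input: one 256-sample frame of a 1 kHz sine sampled at 8 kHz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
features = extract_features(frame, SAMPLE_RATE)
```

Here 256 samples are reduced to three numbers. A feature selection step would then decide which of such candidate features to keep for a given task, for example speech/music discrimination.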

Historical Background

Audio feature extraction typically leads to a strongly reduced representation of the audio signal. Obtaining such a representation can improve the efficiency of audio processing and benefit many applications built on that processing. For example, a compact representation of an audio signal in the form of a fingerprint can enable extremely fast search for a match between this signal and a large-scale audio database for the purpose of audio signal...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Microsoft Research Asia, Beijing, China
  2. Delft University of Technology, Delft, The Netherlands

Section editors and affiliations

  • Vincent Oria (1)
  • Shin'ichi Satoh (2)

  1. Dept. of Computer Science, New Jersey Inst. of Technology, Newark, USA
  2. Digital Content and Media Sciences Resea, Multimedia Information Research Division, National Institute of Informatics, Tokyo, Japan