Features for Audio Classification

Breebaart, Jeroen; McKinney, Martin F.

doi:10.1007/978-94-017-0703-9_6


Part of the book series: Philips Research (PRBS, volume 2)


Abstract

Four audio feature sets are evaluated for their ability to differentiate five audio classes: popular music, classical music, speech, background noise, and crowd noise. The feature sets include low-level signal properties, mel-frequency spectral coefficients, and two new sets based on perceptual models of hearing. The temporal behavior of the features is analyzed and parameterized, and these parameters are included as additional features. Using a standard Gaussian framework for classification, the results show that the temporal behavior of features is important for automatic audio classification. In addition, classification is better, on average, when it is based on features from models of auditory perception rather than on standard features.
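The two ideas in the abstract, summarizing how a feature behaves over time and classifying with Gaussian class models, can be illustrated with a minimal sketch. This is not the chapter's implementation: the toy one-dimensional "frame feature", the mean and standard deviation summary, the diagonal covariance, the class names, and every function name below are assumptions made only for illustration.

```python
import math
import random
from statistics import fmean, pvariance


def temporal_summary(frames):
    """Summarize per-frame feature vectors by the mean and standard deviation
    of each feature over time (an assumed, very simple parameterization)."""
    columns = list(zip(*frames))
    means = [fmean(c) for c in columns]
    stds = [math.sqrt(pvariance(c)) for c in columns]
    return means + stds


def fit_gaussians(examples):
    """Fit one diagonal-covariance Gaussian per class.
    examples maps a label to a list of feature vectors."""
    models = {}
    for label, vectors in examples.items():
        columns = list(zip(*vectors))
        mean = [fmean(c) for c in columns]
        # Small floor keeps the variance strictly positive.
        var = [pvariance(c) + 1e-6 for c in columns]
        models[label] = (mean, var)
    return models


def log_likelihood(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def classify(x, models):
    """Pick the class with the highest likelihood (equal priors assumed)."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))


def toy_clip(rng, spread, n_frames=50):
    """Hypothetical clip: one made-up feature value per frame."""
    return [[rng.gauss(0.0, spread)] for _ in range(n_frames)]


rng = random.Random(0)
# Both toy classes have the same average feature value; only the
# frame-to-frame fluctuation differs.
train = {
    "steady": [temporal_summary(toy_clip(rng, 0.1)) for _ in range(20)],
    "fluctuating": [temporal_summary(toy_clip(rng, 1.0)) for _ in range(20)],
}
models = fit_gaussians(train)

test_clip = temporal_summary(toy_clip(rng, 1.0))
predicted = classify(test_clip, models)
print(predicted)
```

In this toy setup the two classes share the same mean feature value, so the classes are separated mainly by the temporal standard deviation, which is the kind of effect the abstract attributes to temporal feature behavior.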




Editor information

Editors and Affiliations

Philips Research Laboratories Eindhoven, Prof. Holstlaan 4, 5656 AA, Eindhoven, The Netherlands
Wim F. J. Verhaegh, Emile Aarts & Jan Korst


About this chapter

Cite this chapter

Breebaart, J., McKinney, M.F. (2004). Features for Audio Classification. In: Verhaegh, W.F.J., Aarts, E., Korst, J. (eds) Algorithms in Ambient Intelligence. Philips Research, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0703-9_6


DOI: https://doi.org/10.1007/978-94-017-0703-9_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6490-5
Online ISBN: 978-94-017-0703-9
eBook Packages: Springer Book Archive
