
Features for Audio Classification

Chapter in: Algorithms in Ambient Intelligence

Part of the book series: Philips Research (PRBS, volume 2)

Abstract

Four audio feature sets are evaluated for their ability to discriminate among five audio classes: popular music, classical music, speech, background noise, and crowd noise. The feature sets comprise low-level signal properties, mel-frequency spectral coefficients, and two new sets based on perceptual models of hearing. The temporal behavior of each feature is analyzed and parameterized, and these parameters are included as additional features. Classification experiments within a standard Gaussian framework show that the temporal behavior of features is important for automatic audio classification. In addition, classification is, on average, better when it is based on features derived from models of auditory perception than when it is based on standard features.
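The abstract names two technical steps without detail: parameterizing the temporal behavior of each feature, and classifying within a Gaussian framework. The Python sketch below shows one plausible reading of both, under assumptions that are not taken from the chapter: each feature's frame-by-frame trajectory is summarized by its power in a few modulation-frequency bands (the band edges in `MOD_BANDS` are illustrative), and each class is modeled by a single full-covariance Gaussian with equal class priors. The names `temporal_features`, `GaussianClassifier`, `MOD_BANDS`, and the `reg` parameter are hypothetical.

```python
import numpy as np

# Hypothetical modulation-frequency bands in Hz (an assumption, not the
# chapter's values). (0, 0) selects only the DC bin. The upper band needs a
# frame rate of at least 86 Hz; otherwise its mask is empty and it sums to 0.
MOD_BANDS = [(0.0, 0.0), (1.0, 2.0), (3.0, 15.0), (20.0, 43.0)]


def temporal_features(trajectory, frame_rate):
    """Summarize one feature trajectory (one value per analysis frame)
    by the power of its periodogram in each band of MOD_BANDS."""
    x = np.asarray(trajectory, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)          # periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0 / frame_rate)   # bin centers in Hz
    return np.array([
        power[(freqs >= lo) & (freqs <= hi)].sum() for lo, hi in MOD_BANDS
    ])


class GaussianClassifier:
    """One multivariate Gaussian (full covariance) per class; a vector is
    assigned to the class with the highest log-likelihood (equal priors)."""

    def fit(self, X, y, reg=1e-6):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small ridge term keeps the covariance invertible.
            cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            self.params_.append((mean, np.linalg.inv(cov), logdet))
        return self

    def log_likelihood(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        d = X.shape[1]
        scores = []
        for mean, inv_cov, logdet in self.params_:
            diff = X - mean
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
            scores.append(-0.5 * (maha + logdet + d * np.log(2 * np.pi)))
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_likelihood(X), axis=1)]
```

In use, one would compute a frame-level feature over a clip, append the band powers of its trajectory to the clip's static features, and pass the resulting vectors to `GaussianClassifier.fit` and `predict`. Taking the logarithm of the band powers before fitting often brings their distribution closer to Gaussian; that step is omitted here for brevity.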




Copyright information

© 2004 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Breebaart, J., McKinney, M.F. (2004). Features for Audio Classification. In: Verhaegh, W.F.J., Aarts, E., Korst, J. (eds) Algorithms in Ambient Intelligence. Philips Research, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0703-9_6


  • DOI: https://doi.org/10.1007/978-94-017-0703-9_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-6490-5

  • Online ISBN: 978-94-017-0703-9

  • eBook Packages: Springer Book Archive
