Abstract
Four audio feature sets are evaluated on their ability to differentiate five audio classes: popular music, classical music, speech, background noise, and crowd noise. The feature sets include low-level signal properties, mel-frequency spectral coefficients, and two new sets based on perceptual models of hearing. The temporal behavior of the features is analyzed and parameterized, and these parameters are included as additional features. Using a standard Gaussian framework for classification, results show that the temporal behavior of features is important for automatic audio classification. In addition, classification is better, on average, when based on features from models of auditory perception than when based on standard features.
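To illustrate what a Gaussian classification framework can look like in practice, the following is a minimal sketch, not the chapter's actual implementation. It assumes one full-covariance Gaussian per class, equal class priors, and a small diagonal regularization term; the function names (`fit_gaussians`, `log_likelihood`, `classify`), the two class labels, and the synthetic three-dimensional features are all hypothetical and chosen only for the example.

```python
import numpy as np

def fit_gaussians(X, y, reg=1e-6):
    """Estimate one mean vector and one covariance matrix per class."""
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        # Assumption: a small diagonal term keeps the covariance invertible.
        cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        models[label] = (mean, cov)
    return models

def log_likelihood(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + len(x) * np.log(2.0 * np.pi))

def classify(x, models):
    """Pick the class with the highest log-likelihood (equal priors assumed)."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))

# Synthetic stand-in for per-segment feature vectors (not real audio data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, (200, 3)),  # hypothetical "speech" features
    rng.normal(4.0, 1.0, (200, 3)),  # hypothetical "music" features
])
y = np.array(["speech"] * 200 + ["music"] * 200)

models = fit_gaussians(X, y)
print(classify(np.array([4.0, 4.0, 4.0]), models))
```

In a setup like this, parameters describing the temporal behavior of each feature would simply be appended as extra columns of `X` before fitting.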
Copyright information
© 2004 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Breebaart, J., McKinney, M.F. (2004). Features for Audio Classification. In: Verhaegh, W.F.J., Aarts, E., Korst, J. (eds) Algorithms in Ambient Intelligence. Philips Research, vol 2. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-0703-9_6
DOI: https://doi.org/10.1007/978-94-017-0703-9_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6490-5
Online ISBN: 978-94-017-0703-9
eBook Packages: Springer Book Archive