Content based audio classification: a neural network approach

Soft Computing

Abstract

Content-based music genre classification is a key component of next-generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial neural networks (ANNs), specifically multi-layer perceptrons (MLPs), perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which serve as input vectors to a parallel neural architecture that performs the classification. The paper examines a combination of linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCCs), Haar wavelet, Daubechies wavelet, and Symlet coefficients as feature sets for the proposed audio classifier. In addition to the MLP, a Gaussian radial basis function (GRBF) ANN is implemented and analyzed. The proposed architecture achieves a prediction accuracy of 87.3% in determining audio genres, which indicates its effectiveness. A rule-based inference engine (IE) processes the ANN prediction values and produces the final decision.
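The feature-extraction stage described above can be illustrated with a minimal sketch, assuming Hamming-windowed frames. It computes LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, and summarizes a Haar wavelet decomposition as per-level sub-band energies. The function names, the window type, the LPC order, and the use of sub-band energies instead of raw wavelet coefficients are illustrative assumptions, not the paper's exact configuration; the MFCC, Daubechies, and Symlet feature sets are omitted.

```python
import math
from typing import List


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut x into overlapping frames and apply a Hamming window to each
    (the window type is an assumption)."""
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[s + n] * window[n] for n in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]


def autocorr(frame: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame: List[float], order: int) -> List[float]:
    """LPC coefficients a[1..order] via Levinson-Durbin, for the predictor
    x[n] ~ a[1]*x[n-1] + ... + a[order]*x[n-order]."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]                       # current prediction-error energy
    for i in range(1, order + 1):
        if err <= 0.0:
            break                    # silent or perfectly predicted frame
        # Reflection coefficient for step i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def haar_subband_energies(frame: List[float], levels: int) -> List[float]:
    """Energy of the Haar detail coefficients at each level, followed by the
    energy of the final approximation. Assumes len(frame) % 2**levels == 0."""
    s = 1.0 / math.sqrt(2.0)
    approx = list(frame)
    energies = []
    for _ in range(levels):
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) * s for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) * s for i in range(half)]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(c * c for c in approx))
    return energies


def frame_features(frame: List[float], lpc_order: int = 10,
                   wavelet_levels: int = 3) -> List[float]:
    """Hypothetical per-frame input vector: LPC coefficients + Haar energies."""
    return lpc(frame, lpc_order) + haar_subband_energies(frame, wavelet_levels)
```

Each frame's vector would then feed one branch of a parallel network. Because this Haar transform is orthonormal, the sub-band energies of a frame sum to the frame's own energy, which is a convenient sanity check.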

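The decision stage can be sketched in the same spirit: a Gaussian radial basis function layer followed by a linear output layer yields one score per genre, and a small rule base fuses the scores produced by several parallel networks (GRBF or MLP). The voting and tie-breaking rules, the `min_score` threshold, and all function names are hypothetical, since the abstract does not state the inference engine's actual rules; training of the centers, widths, and weights is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def gaussian_rbf_layer(x: Sequence[float], centers: List[Sequence[float]],
                       widths: Sequence[float]) -> List[float]:
    """Hidden activations phi_j = exp(-||x - c_j||^2 / (2 * sigma_j^2))."""
    out = []
    for c, sigma in zip(centers, widths):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out.append(math.exp(-d2 / (2.0 * sigma * sigma)))
    return out


def rbf_scores(x: Sequence[float], centers: List[Sequence[float]],
               widths: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Linear output layer: one score per genre, using weights[genre][unit]."""
    phi = gaussian_rbf_layer(x, centers, widths)
    return [sum(w * p for w, p in zip(row, phi)) for row in weights]


def infer_genre(network_scores: List[Sequence[float]], genres: Sequence[str],
                min_score: float = 0.5) -> str:
    """Hypothetical rule base for fusing parallel networks' outputs:
    1. each network votes for its top genre if that score >= min_score;
    2. the genre with the most votes wins;
    3. ties (or no votes at all) go to the highest summed score."""
    votes: Counter = Counter()
    totals = [0.0] * len(genres)
    for scores in network_scores:
        best = max(range(len(genres)), key=lambda g: scores[g])
        if scores[best] >= min_score:
            votes[best] += 1
        for g, s in enumerate(scores):
            totals[g] += s
    if votes:
        top = max(votes.values())
        candidates = [g for g, v in votes.items() if v == top]
    else:
        candidates = list(range(len(genres)))
    return genres[max(candidates, key=lambda g: totals[g])]
```

Keeping the fusion rules separate from the networks means the rule base can be changed (for example, by weighting networks by their validation accuracy) without retraining any classifier.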
Author information

Correspondence to Vikramjit Mitra.

About this article

Cite this article

Mitra, V., Wang, CJ. Content based audio classification: a neural network approach. Soft Comput 12, 639–646 (2008). https://doi.org/10.1007/s00500-007-0241-4
