Abstract
Content-based music genre classification is a key component of next-generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial neural networks (ANNs), specifically multi-layer perceptrons (MLPs), are implemented to perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which are used as input vectors to a parallel neural architecture that performs the classification. This paper examines a combination of linear predictive coding (LPC) coefficients, mel-frequency cepstral coefficients (MFCCs), and Haar wavelet, Daubechies wavelet, and Symlet coefficients as feature sets for the proposed audio classifier. In addition to the MLP, a Gaussian radial basis function (GRBF) based ANN is also implemented and analyzed. The ANN prediction values are processed by a rule-based inference engine (IE) that presents the final decision. The obtained prediction accuracy of 87.3% in determining the audio genres demonstrates the effectiveness of the proposed architecture.
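As a rough illustration of the feature-extraction stage described above, the following Python sketch frames a signal into windows and computes two of the named feature types per frame: LPC coefficients (via the autocorrelation method) and one level of Haar wavelet coefficients. It is not the authors' implementation. The function names, the Hamming window, the frame length, the hop size, and the LPC order are all assumptions chosen for illustration, and the MFCC, Daubechies, and Symlet features are omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping, Hamming-windowed frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)  # window choice is an assumption
    return np.stack([x[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

def lpc(frame, order):
    """LPC coefficients by the autocorrelation method.

    Solves the normal equations R a = r, so that x[n] is
    approximated by sum_k a[k] * x[n - 1 - k].
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz matrix built from lags 0..order-1
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def haar_level1(frame):
    """One level of the orthonormal Haar wavelet transform."""
    pairs = frame[: len(frame) // 2 * 2].reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def frame_features(frame, lpc_order=8):
    """Hypothetical per-frame feature vector: LPC coefficients plus
    the energies of the Haar approximation and detail bands."""
    approx, detail = haar_level1(frame)
    return np.concatenate([lpc(frame, lpc_order),
                           [np.sum(approx ** 2), np.sum(detail ** 2)]])

# Demo on synthetic noise standing in for an audio clip (assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)
frames = frame_signal(signal, frame_len=512, hop=256)
features = np.stack([frame_features(f) for f in frames])
```

Each row of `features` could then serve as an input vector to one of the classifiers. In a setup like the one the abstract describes, each feature type would presumably feed its own network, with the outputs combined afterwards by the inference engine.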
Cite this article
Mitra, V., Wang, CJ. Content based audio classification: a neural network approach. Soft Comput 12, 639–646 (2008). https://doi.org/10.1007/s00500-007-0241-4
DOI: https://doi.org/10.1007/s00500-007-0241-4