Multimedia Tools and Applications, Volume 63, Issue 1, pp 77–92

An analysis of content-based classification of audio signals using a fuzzy c-means algorithm



Content-based classification of audio signals into broad categories such as speech, music, or speech with noise is the first step before further processing for tasks such as speech recognition, content-based indexing, or surveillance. In this paper, we propose an efficient content-based audio classification approach that classifies audio signals into broad genres using a fuzzy c-means (FCM) algorithm. We analyze different characteristic features of audio signals in the time, frequency, and coefficient domains and select the optimal feature vector by applying a novel analytical scoring method to each feature. We then apply an FCM-based classification scheme to the extracted, normalized optimal feature vector to achieve an efficient classification result. Experimental results demonstrate that the proposed approach outperforms existing state-of-the-art audio classification systems by more than 11% in classification performance.
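As an illustration of the clustering step only, the following is a minimal sketch of the standard (textbook) fuzzy c-means iteration applied to min-max normalized feature vectors. It is not the scheme evaluated in the paper: the feature values are made-up toy numbers, the names `min_max_normalize` and `fuzzy_c_means` are hypothetical, and the feature selection and scoring method mentioned above are not reproduced here.

```python
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] (one common normalization choice)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM; returns (centers, memberships), each membership row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update from squared Euclidean distances to each center.
        new_u = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: assign crisp membership.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy 2-D feature vectors (illustrative values, not data from the paper).
features = [[0.10, 0.20], [0.15, 0.22], [0.12, 0.18],
            [0.90, 0.80], [0.88, 0.85], [0.92, 0.79]]
normalized = min_max_normalize(features)
centers, memberships = fuzzy_c_means(normalized, n_clusters=2)
# Hard label per sample: the cluster with the largest membership.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Unlike hard k-means, each sample keeps a graded membership in every cluster, and the fuzzifier `m` (assumed to be 2.0 here) controls how soft those memberships are.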


Keywords: Audio segmentation and classification; Fuzzy c-means algorithm; Multimedia; Database retrieval



This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011–0017941).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. School of Electrical Engineering, University of Ulsan, Ulsan, South Korea
