
A fusion way of feature extraction for automatic categorization of music genres

Published in Multimedia Tools and Applications

Abstract

Over the past decade, the rise of streaming services has transformed and greatly expanded the music industry. With a plethora of songs available, listeners need recommendation techniques that help them discover genres suited to their taste, which in turn creates a vital need for automatic music genre categorization systems. With this objective, this work introduces a fusion of direct and indirect features for the automatic categorization of music genres. In direct feature extraction (FE), the physical characteristics of music genres are assessed through timbral, chroma, and source-separation-based features. In indirect FE, the tunable Q-wavelet transform and the Teager energy operator are used to explore the non-linear characteristics of music signals. The proposed algorithm is examined on the GTZAN dataset, primarily focusing on a four-class classification problem. The introduced features are tested with multiple machine learning techniques to identify the best one for music genre categorization. A wide neural network classifier with a single fully connected layer delivered the best performance, achieving an overall accuracy of 95.8% and an F1 score of 95.82%. The proposed algorithm also outperforms most state-of-the-art techniques on this dataset.
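To make the fused-feature idea concrete, the sketch below is a minimal, hedged approximation rather than the authors' implementation: it uses librosa for the direct timbral, chroma, and harmonic/percussive source-separation features, implements the Teager energy operator directly, and omits the TQWT-based indirect features, for which no standard Python routine is assumed. The file path, feature summary statistics, and the scikit-learn MLP standing in for the "wide neural network with a single fully connected layer" are illustrative assumptions.

```python
# Hedged sketch of the fusion-feature idea: direct (timbral, chroma, HPSS) features
# plus an indirect non-linear cue via the Teager energy operator (TEO).
# Assumptions: librosa and scikit-learn are installed; "clip.wav" is a placeholder;
# the paper's TQWT stage is NOT reproduced here.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def extract_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr, mono=True)

    # Direct features: timbral (MFCC, spectral centroid, zero-crossing rate) and chroma.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)

    # Source-separation-based cue: harmonic/percussive split (HPSS).
    harmonic, percussive = librosa.effects.hpss(y)

    # Indirect (non-linear) cue: TEO statistics on the raw signal.
    teo = teager_energy(y)

    # Summarise each descriptor by its mean and standard deviation over time.
    parts = [mfcc, centroid, zcr, chroma]
    stats = [np.hstack([p.mean(axis=1), p.std(axis=1)]) for p in parts]
    stats.append(np.array([harmonic.std(), percussive.std(), teo.mean(), teo.std()]))
    return np.hstack(stats)

# A single wide fully connected hidden layer, loosely mirroring the "wide neural
# network" classifier reported in the paper (layer width chosen for illustration only).
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
# X = np.vstack([extract_features(f) for f in clip_paths]); clf.fit(X, labels)
```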




Data availability

The dataset that supports the findings of this study is the GTZAN dataset, which is available on Kaggle.
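For readers locating the data, the following is a hedged example of how the Kaggle copy of GTZAN is commonly organised (one folder per genre, 100 thirty-second clips each) and how a four-genre subset, as studied in the abstract, might be selected. The directory name and the particular four genres are assumptions, not taken from the paper.

```python
# Hedged example: listing a four-genre subset from a local GTZAN copy.
# The root path and the chosen genres are placeholders; the paper does not
# state which four classes it uses, so these are illustrative only.
from pathlib import Path

GTZAN_ROOT = Path("gtzan/genres_original")      # assumed local layout from Kaggle
GENRES = ["classical", "jazz", "metal", "pop"]  # illustrative four-class subset

clips = {g: sorted((GTZAN_ROOT / g).glob("*.wav")) for g in GENRES}
for genre, files in clips.items():
    print(f"{genre}: {len(files)} clips")       # typically 100 clips per genre
```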


Funding

This work did not receive funding from any funding agency.

Author information

Corresponding author

Correspondence to Sachin Taran.

Ethics declarations

Conflict of interest

To the best of our knowledge, the authors have no financial or non-financial conflicts of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sharma, D., Taran, S. & Pandey, A. A fusion way of feature extraction for automatic categorization of music genres. Multimed Tools Appl 82, 25015–25038 (2023). https://doi.org/10.1007/s11042-023-14371-8

