Abstract
Over the past decade, the rise of streaming services has greatly expanded the music industry. With a vast number of songs available, listeners need recommendation techniques that help them discover music genres suited to their taste, which in turn creates a need for automatic music genre categorization systems. To this end, this work introduces a fusion of direct and indirect features for the automatic categorization of music genres. In direct feature extraction (FE), the physical characteristics of music genres are assessed using timbral, chroma, and source separation-based features. In indirect FE, the tunable Q-wavelet transform and the Teager energy operator are used to explore the non-linear characteristics of music signals. The proposed algorithm is evaluated on the GTZAN dataset, focusing primarily on the four-class classification problem. The introduced features are tested with multiple machine learning techniques to identify the best one for music genre categorization. The wide neural network classifier with a single fully connected layer gave the best performance, with an overall accuracy of 95.8% and an F1 score of 95.82%. The proposed algorithm also outperforms most state-of-the-art techniques on this dataset.
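As an illustration of the indirect features mentioned above, the following is a minimal sketch of the discrete Teager energy operator in its standard Teager–Kaiser form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the authors' implementation: the function name `teager_energy` and the test tone parameters are assumptions chosen for illustration, and the abstract does not state how the operator is combined with the tunable Q-wavelet transform sub-bands.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two boundary samples have no neighbours on one side, so the
    output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone (illustrative values, not taken from the paper).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_energy(tone)

# For a pure sinusoid the operator is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is why the operator is often used to track energy changes that a plain squared-amplitude measure would miss.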
Data availability
The dataset that supports the findings of this study is the GTZAN dataset, which is available on Kaggle.
Funding
This work was not funded by any funding agency.
Ethics declarations
Conflict of interest
To the best of our knowledge, this work does not have any financial and/or non-financial conflict of interest.
About this article
Cite this article
Sharma, D., Taran, S. & Pandey, A. A fusion way of feature extraction for automatic categorization of music genres. Multimed Tools Appl 82, 25015–25038 (2023). https://doi.org/10.1007/s11042-023-14371-8
DOI: https://doi.org/10.1007/s11042-023-14371-8