Parallel attention of representation global time–frequency correlation for music genre classification

Published in: Multimedia Tools and Applications

Abstract

Music genre classification (MGC) is an indispensable branch of music information retrieval. With the prevalence of end-to-end learning, research on MGC has made notable breakthroughs. However, the limited receptive field of a convolutional neural network (CNN) cannot capture correlations between the temporal frames sounding at any moment and the sound frequencies of all vibrations in a song. Meanwhile, the time–frequency information carried by different channels is not equally important. To address these problems, we apply dual parallel attention (DPA) in CNN-5 to model global dependencies. First, we propose parallel channel attention (PCA) to build global time–frequency dependencies within a song, and we study the influence of different weighting methods on PCA. Next, we design dual parallel attention, which attends to global time–frequency dependencies in the song and adaptively calibrates the contribution of each channel to the feature map. We then analyze how the number and placement of DPA modules in CNN-5 affect performance and compare DPA with multiple other attention mechanisms. Results on the GTZAN dataset demonstrate that the proposed method achieves a classification accuracy of 91.4%, with DPA delivering the highest performance.
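As a rough illustration of the mechanism described in the abstract (not the authors' implementation: the tensor shapes, the unprojected affinity matrices, and the squeeze-and-excitation-style channel gate are our assumptions), parallel attention over a (channels, frequency, time) feature map might be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def parallel_time_freq_attention(x):
    """Sketch of parallel attention over a (C, F, T) spectrogram feature map.

    Two branches run in parallel: one attends across all T time frames, the
    other across all F frequency bins, so each position can draw on global
    time-frequency context rather than a local CNN receptive field. A
    channel gate then reweights the channels.
    """
    C, F, T = x.shape

    # Time branch: treat each time frame as a token of dimension C*F.
    t_tokens = x.reshape(C * F, T)                      # (C*F, T)
    t_attn = softmax(t_tokens.T @ t_tokens, axis=-1)    # (T, T) frame affinities
    t_out = (t_tokens @ t_attn).reshape(C, F, T)

    # Frequency branch: treat each frequency bin as a token of dimension C*T.
    f_tokens = x.transpose(0, 2, 1).reshape(C * T, F)   # (C*T, F)
    f_attn = softmax(f_tokens.T @ f_tokens, axis=-1)    # (F, F) bin affinities
    f_out = (f_tokens @ f_attn).reshape(C, T, F).transpose(0, 2, 1)

    # Channel recalibration (squeeze-and-excitation style): weight each
    # channel by a sigmoid of its global average response.
    pooled = x.mean(axis=(1, 2))                        # (C,)
    gate = 1.0 / (1.0 + np.exp(-pooled))                # (C,)
    return gate[:, None, None] * (x + t_out + f_out)

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 16))   # (channels, freq bins, time frames)
out = parallel_time_freq_attention(feat)
print(out.shape)                         # (4, 8, 16)
```

The two attention branches and the channel gate all read the same input and are fused additively, which is what makes the design "parallel" rather than a sequential stack of attention layers.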



Notes

  1. In [36], GTZAN is split 9:1 into training and test sets and ten-fold cross-validation is adopted; the average of the ten test results is the final result, which is consistent with our own strategy. We treat the test-set results as final once model training is complete. Unfortunately, we were unable to reproduce the classification accuracy reported in their paper, possibly because it provides insufficient training details.

  2. It is worth noting that Non-local and DANet use code provided by their authors, while FLA and PTS-A are implemented from the descriptions in their papers. For a fair comparison, all attention mechanisms are applied in CNN-5 in the same way.
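The 9:1 ten-fold protocol described in note 1 can be sketched as follows (fold indices only; training CNN-5 with DPA on each split is out of scope, and the 1000-track count is the standard GTZAN size):

```python
import numpy as np

# Shuffle the 1000 GTZAN tracks once and split them into ten folds.
n_tracks = 1000
rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(n_tracks), 10)

# Each fold in turn is the test set (10%); the other nine folds (90%)
# form the training set.
splits = []
for k in range(10):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(10) if j != k])
    splits.append((train_idx, test_idx))

# Ten runs, each training on 900 tracks and testing on the held-out 100;
# the reported accuracy is the mean of the ten test accuracies.
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 900 100
```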

References

  1. Ashraf M et al (2020) A globally regularized joint neural architecture for music classification. IEEE Access 8:220980–220989

  2. Cai X, Zhang H (2022) Music genre classification based on auditory image, spectral and acoustic features. Multimedia Syst 28(3):779–791

  3. Downie JS (2003) Music information retrieval. Ann Rev Inf Sci Technol 37(1):295–340

  4. Fu Z et al (2011) A survey of audio-based music classification and annotation. IEEE Trans Multimedia 13(2):303–319

  5. Gao Y (2020) Research on music audio classification based on deep learning. South China University of Technology, Guangzhou, China

  6. Gardner MW, Dorling S (1998) Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos Environ 32(14–15):2627–2636

  7. Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501

  8. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105

  9. Promane BC (2009) Freddie Mercury and Queen: technologies of genre and the poetics of innovation. University of Western Ontario, School of Graduate and Postdoctoral Studies

  10. Sarikaya R, Hinton GE, Deoras A (2014) Application of deep belief networks for natural language understanding. IEEE/ACM Trans Audio Speech Lang Process 22(4):778–784

  11. Scalvenzi RR, Guido RC, Marranghello N (2019) Wavelet-packets associated with support vector machine are effective for monophone sorting in music signals. Int J Semant Comput 13(3):415–425

  12. Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Trans Speech Audio Process 10(5):293–302

  13. Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. J Big Data 3(1):1–40

  14. Yu Y et al (2020) Deep attention based music genre classification. Neurocomputing 372:84–91

  15. Zhang X et al (2019) Spectrogram-frame linear network and continuous frame sequence for bird sound classification. Ecol Inform 54:101009

  16. Zhang Z et al (2021) Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing 453:896–903

  17. Schedl M, Gómez Gutiérrez E, Urbano J (2014) Music information retrieval: recent developments and applications. Found Trends Inf Retr 8(2–3):127–261

  18. Ndou N, Ajoodha R, Jadhav A (2021) Music genre classification: a review of deep-learning and traditional machine-learning approaches. In: 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). IEEE

  19. Gupta R, Yadav J, Kapoor C (2021) Music information retrieval and intelligent genre classification. In: Proceedings of International Conference on Intelligent Computing, Information and Control Systems. Springer

  20. Pálmason H et al (2017) Music genre classification revisited: an in-depth examination guided by music experts. In: International Symposium on Computer Music Multidisciplinary Research. Springer

  21. Baniya BK, Ghimire D, Lee J (2014) A novel approach of automatic music genre classification based on timbral texture and rhythmic content features. In: 16th International Conference on Advanced Communication Technology. IEEE

  22. Arabi AF, Lu G (2009) Enhanced polyphonic music genre classification using high level features. In: 2009 IEEE International Conference on Signal and Image Processing Applications. IEEE

  23. Saunders C et al (1998) Support vector machine reference manual

  24. Sarkar R, Saha SK (2015) Music genre classification using EMD and pitch based feature. In: 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR). IEEE

  25. Vaswani A et al (2017) Attention is all you need. In: Advances in Neural Information Processing Systems

  26. He K et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  27. Piczak KJ (2015) Environmental sound classification with convolutional neural networks. In: 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE

  28. Himawan I, Towsey M, Roe P (2018) 3D convolution recurrent neural networks for bird sound detection. In: Proceedings of the 3rd Workshop on Detection and Classification of Acoustic Scenes and Events

  29. Kahl S et al (2017) Large-scale bird sound classification using convolutional neural networks. In: CLEF (Working Notes)

  30. Yang B (2008) A study of inverse short-time Fourier transform. In: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE

  31. Zhang W et al (2016) Improved music genre classification with convolutional neural networks. In: Interspeech 2016, pp 3304–3308

  32. Choi K et al (2017) Convolutional recurrent neural networks for music classification. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE

  33. Cho K et al (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078

  34. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  35. Yang H, Zhang W-Q (2019) Music genre classification using duplicated convolutional layers in neural networks. In: Interspeech 2019, pp 3382–3386

  36. Chang P-C, Chen Y-S, Lee C-H (2021) MS-SincResNet: joint learning of 1D and 2D kernels using multi-scale SincNet and ResNet for music genre classification. In: Proceedings of the 2021 International Conference on Multimedia Retrieval, pp 29–36

  37. Choi K et al (2017) Transfer learning for music classification and regression tasks. arXiv preprint arXiv:1703.09179

  38. Srinivasu PN et al (2022) Ambient assistive living for monitoring the physical activity of diabetic adults through body area networks. Mob Inf Syst 2022

  39. Wang X et al (2018) Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  40. Wang H et al (2019) Environmental sound classification with parallel temporal-spectral attention. arXiv preprint arXiv:1912.06808

  41. Huang Z et al (2022) ADFF: attention based deep feature fusion approach for music emotion recognition. arXiv preprint arXiv:2204.05649

  42. Dosovitskiy A et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

  43. Gong Y, Chung Y-A, Glass J (2021) AST: audio spectrogram transformer. arXiv preprint arXiv:2104.01778

  44. Yang L, Zhao H (2021) Sound classification based on multihead attention and support vector machine. Math Probl Eng 2021

  45. Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400

  46. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR

  47. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: ICML

  48. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  49. Zhang P et al (2015) A deep neural network for modeling music. In: Proceedings of the 5th ACM International Conference on Multimedia Retrieval, pp 379–386

  50. Karunakaran N, Arya A (2018) A scalable hybrid classifier for music genre classification using machine learning concepts and Spark. In: 2018 International Conference on Intelligent Autonomous Systems (ICoIAS). IEEE

  51. Fu J et al (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition


Funding

This work was supported by Postgraduate Scientific Research Innovation Project of Hunan Province (CX20210879), Postgraduate Scientific Research Innovation Project of Central South University of Forestry and Technology (CX202102059) and Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015).

Author information

Corresponding author

Correspondence to Aibin Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wen, Z., Chen, A., Zhou, G. et al. Parallel attention of representation global time–frequency correlation for music genre classification. Multimed Tools Appl 83, 10211–10231 (2024). https://doi.org/10.1007/s11042-023-16024-2

