Abstract
Music genre classification (MGC) is an indispensable branch of music information retrieval. With the prevalence of end-to-end learning, research on MGC has made notable breakthroughs. However, the limited receptive field of a convolutional neural network (CNN) cannot capture correlations between the temporal frames and the frequency bands of a song. Meanwhile, the time–frequency information carried by different channels is not equally important. To address these problems, we apply dual parallel attention (DPA) in CNN-5 to focus on global dependencies. First, we propose parallel channel attention (PCA) to build global time–frequency dependencies within a song and study the influence of different weighting methods on PCA. Next, we design dual parallel attention, which focuses on global time–frequency dependencies in the song and adaptively calibrates the contribution of each channel to the feature map. Then, we analyze how the number and position of DPA modules in CNN-5 affect performance and compare DPA with multiple attention mechanisms. The results on the GTZAN dataset demonstrate that the proposed method achieves a classification accuracy of 91.4%, with DPA delivering the highest performance among the compared mechanisms.
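The paper's exact DPA architecture is not reproduced in this excerpt. As a rough illustration only, under assumptions of our own (energy-based branch weights, sigmoid channel gating; the function name and shapes are hypothetical), a parallel time–frequency attention with channel calibration on a spectrogram-like feature map might be sketched as:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_parallel_attention(x):
    """Toy sketch of dual parallel attention on a feature map
    x of shape (channels, freq, time); not the paper's exact design."""
    # Temporal branch: weight each time frame by its global energy.
    time_w = softmax(x.mean(axis=(0, 1)))               # shape (time,)
    # Spectral branch: weight each frequency bin likewise.
    freq_w = softmax(x.mean(axis=(0, 2)))               # shape (freq,)
    # Parallel time-frequency attention: sum of the two attended maps.
    tf_attended = x * freq_w[None, :, None] + x * time_w[None, None, :]
    # Channel branch: global average pool, then sigmoid-gate each channel,
    # loosely analogous to squeeze-and-excitation calibration.
    chan_gate = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))  # (channels,)
    return tf_attended * chan_gate[:, None, None]
```

The two time–frequency branches run in parallel rather than in sequence, which is the structural idea the abstract emphasizes; the channel gate then rescales each channel's contribution to the output feature map.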
Notes
In [36], GTZAN is divided into training and test sets in a 9:1 ratio and ten-fold cross-validation is adopted; the average of the ten test results is taken as the final result, which is consistent with the strategy of our paper. We treat the test-set results as final once model training is complete. Unfortunately, we were unable to reproduce the classification accuracy reported in their paper, possibly because insufficient training details are provided.
It is worth noting that Non-local and DANet use the code provided by their authors, while FLA and PTS-A are implemented from the descriptions in their papers. All attention mechanisms are applied in CNN-5 in the same way for comparison.
References
Ashraf M et al (2020) A Globally Regularized Joint Neural Architecture for Music Classification. IEEE Access 8:220980–220989
Cai X, Zhang H (2022) Music genre classification based on auditory image, spectral and acoustic features. Multimedia Syst 28(3):779–791
Downie JS (2003) Music information retrieval. Ann Rev Inf Sci Technol 37(1):295–340
Fu Z et al (2011) A Survey of Audio-Based Music Classification and Annotation. IEEE Trans Multimedia 13(2):303–319
Gao Y (2020) Research on Music Audio Classification Based on Deep Learning. South China University of Technology Guangzhou, China
Gardner MW, Dorling S (1998) Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos Environ 32(14–15):2627–2636
Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
Promane BC (2009) Freddie mercury and queen: Technologies of genre and the poetics of innovation. University of Western Ontario, School of Graduate and Postdoctoral Studies
Sarikaya R, Hinton GE, Deoras A (2014) Application of deep belief networks for natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(4):778–784
Scalvenzi RR, Guido RC, Marranghello N (2019) Wavelet-packets associated with support vector machine are effective for monophone sorting in music signals. Int. J. Semant. Comput. 13(03):415–425
Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Transactions on speech and audio processing 10(5):293–302
Weiss K, Khoshgoftaar TM, Wang D (2016) A survey of transfer learning. Journal of Big data 3(1):1–40
Yu Y et al (2020) Deep attention based music genre classification. Neurocomputing 372:84–91
Zhang X et al (2019) Spectrogram-frame linear network and continuous frame sequence for bird sound classification. Eco Inform 54:101009
Zhang Z et al (2021) Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing 453:896–903
Schedl M, Gómez Gutiérrez E, Urbano J (2014) Music information retrieval: Recent developments and applications. Found Trends Inf Retr 8(2–3):127–261
Ndou N, Ajoodha R, Jadhav A (2021) Music genre classification: A review of deep-learning and traditional machine-learning approaches. in 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). IEEE
Gupta R, Yadav J, Kapoor C (2021) Music information retrieval and intelligent genre classification. in Proceedings of International Conference on Intelligent Computing, Information and Control Systems. Springer
Pálmason H et al (2017) Music genre classification revisited: An in-depth examination guided by music experts. in International Symposium on Computer Music Multidisciplinary Research. Springer
Baniya BK, Ghimire D, Lee J (2014) A novel approach of automatic music genre classification based on timbral texture and rhythmic content features. in 16th International Conference on Advanced Communication Technology. IEEE
Arabi AF, Lu G (2009) Enhanced polyphonic music genre classification using high level features. in 2009 IEEE International Conference on Signal and Image Processing Applications. IEEE
Saunders C et al (1998) Support vector machine reference manual
Sarkar R, Saha SK (2015) Music genre classification using EMD and pitch based feature. in 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR). IEEE
Vaswani A et al (2017) Attention is all you need. in Advances in neural information processing systems
He K et al (2016) Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition
Piczak KJ (2015) Environmental sound classification with convolutional neural networks. in 2015 IEEE 25th international workshop on machine learning for signal processing (MLSP) IEEE
Himawan I, Towsey M, Roe P (2018) 3D convolution recurrent neural networks for bird sound detection. in Proceedings of the 3rd Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)
Kahl S et al (2017) Large-Scale Bird Sound Classification using Convolutional Neural Networks, in CLEF (working notes)
Yang B (2008) A study of inverse short-time Fourier transform. in 2008 IEEE Int. Conf. Acoust. Speech Signal Process. IEEE
Zhang W et al (2016) Improved Music Genre Classification with Convolutional Neural Networks, in Interspeech 2016. 3304–3308
Choi K et al (2017) Convolutional recurrent neural networks for music classification. in 2017 IEEE Int. Conf. Acoust. Speech Signal Process (ICASSP) IEEE
Cho K et al (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit
Yang H, Zhang W-Q (2019) Music Genre Classification Using Duplicated Convolutional Layers in Neural Networks. in Interspeech 2019, 3382–3386
Chang P-C, Chen Y-S, Lee C-H (2021) MS-SincResNet: Joint Learning of 1D and 2D Kernels Using Multi-scale SincNet and ResNet for Music Genre Classification. in Proceedings of the 2021 Int Conf Multimed Retr, 29–36
Choi K et al (2017) Transfer learning for music classification and regression tasks. arXiv preprint arXiv:1703.09179
Srinivasu PN et al (2022) Ambient Assistive Living for Monitoring the Physical Activity of Diabetic Adults through Body Area Networks. Mob. Inf. Syst 2022
Wang X et al (2018) Non-local neural networks. in Proceedings of the IEEE Conf. Comput. Vis. Pattern Recognit
Wang H et al (2019) Environmental sound classification with parallel temporal-spectral attention. arXiv preprint arXiv:1912.06808
Huang Z et al (2022) ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition. arXiv preprint arXiv:2204.05649
Dosovitskiy A et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Gong Y, Chung Y-A, Glass J (2021) AST: Audio spectrogram transformer. arXiv preprint arXiv:2104.01778
Yang L, Zhao H (2021) Sound Classification Based on Multihead Attention and Support Vector Machine. Math Probl Eng 2021
Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. in International Conference on Machine Learning. PMLR
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. in Icml
Kingma DP, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Zhang P et al (2015) A Deep Neural Network for Modeling Music, in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval 379–386
Karunakaran N, Arya A (2018) A scalable hybrid classifier for music genre classification using machine learning concepts and spark. in 2018 Int Confe Intell Auton Syst (ICoIAS) IEEE
Fu J et al (2019) Dual attention network for scene segmentation. in Proceedings of the IEEE/CVF Conf. Comput. Vis. Pattern Recognit
Funding
This work was supported by Postgraduate Scientific Research Innovation Project of Hunan Province (CX20210879), Postgraduate Scientific Research Innovation Project of Central South University of Forestry and Technology (CX202102059) and Hunan Key Laboratory of Intelligent Logistics Technology (2019TP1015).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wen, Z., Chen, A., Zhou, G. et al. Parallel attention of representation global time–frequency correlation for music genre classification. Multimed Tools Appl 83, 10211–10231 (2024). https://doi.org/10.1007/s11042-023-16024-2