Music genre classification based on fusing audio and lyric information

You Li¹,² (ORCID: orcid.org/0000-0002-9182-9783), Zhihai Zhang², Han Ding¹ & Liang Chang¹

Multimedia Tools and Applications, Volume 82, pages 20157–20176 (2023)
Published: 29 December 2022
doi:10.1007/s11042-022-14252-6

Abstract

Music genre classification (MGC) has a wide range of applications. Traditional MGC methods consider either audio information or lyric information alone, which limits recognition performance. In this paper, we propose a multimodal music genre classification framework that integrates both audio and lyric information. By exploiting the complementarity of the two modalities, music genres can be represented more comprehensively. First, the framework computes the mel-spectrogram of the audio, from which a convolutional neural network extracts audio features. In parallel, BERT is used to obtain a distributed representation of the lyrics. The two modalities are then fused through different strategies, such as feature-level and decision-level fusion. Because the audio channel and the lyric channel converge at markedly different speeds, we start training the two channels asynchronously and assign them different learning rates. A series of experiments verifies the effectiveness of the proposed model: it achieves an F1 score of 0.87 for music genre classification, approximately 4% higher than that of the best baseline in the experiments.
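The first step of the pipeline is the mel-spectrogram. The Notes point to librosa, which the authors presumably used for this; the sketch below only illustrates the core idea in plain Python, assuming the HTK mel formula: build triangular filters spaced evenly on the mel scale and project one power-spectrum frame onto them. The function names and the parameters (`n_mels`, `n_fft`, `sr`) are illustrative assumptions, not the paper's configuration. librosa's default (`htk=False`) uses the Slaney mel formula with area-normalized filters, so its values will differ.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; librosa defaults to the Slaney variant)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr, fmin=0.0, fmax=None):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    fmax = sr / 2.0 if fmax is None else fmax
    n_bins = n_fft // 2 + 1
    bin_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mels + 2 points evenly spaced on the mel scale give the triangle edges
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))  # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_frame(power_spectrum, bank, log=True, eps=1e-10):
    """Project one power-spectrum frame onto the mel filters, optionally log-compressed."""
    energies = [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
    return [math.log(e + eps) for e in energies] if log else energies
```

Applying `mel_frame` to every STFT frame and stacking the results column by column yields a (log-)mel-spectrogram, the 2-D input a CNN can treat as an image.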
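The abstract names two fusion strategies without giving their exact operators. As a minimal sketch under that caveat: feature-level (early) fusion is shown here as concatenating the audio embedding and the lyric embedding before a shared classifier, and decision-level (late) fusion as a weighted average of each channel's class probabilities. The weight `alpha` and all function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def feature_level_fusion(audio_feat, lyric_feat):
    """Early fusion: concatenate the two embeddings into one vector for a shared classifier."""
    return list(audio_feat) + list(lyric_feat)

def decision_level_fusion(audio_logits, lyric_logits, alpha=0.5):
    """Late fusion: weighted average of per-channel class probabilities (alpha weights audio)."""
    pa, pl = softmax(audio_logits), softmax(lyric_logits)
    return [alpha * a + (1 - alpha) * l for a, l in zip(pa, pl)]

def argmax(xs):
    # Index of the largest value, i.e. the predicted genre
    return max(range(len(xs)), key=xs.__getitem__)
```

With audio logits `[2.0, 0.0, 0.0]` and lyric logits `[0.0, 0.0, 3.0]`, equal weighting lets the more confident lyric channel win (class 2), while `alpha=0.9` defers to the audio channel (class 0). Early fusion lets the classifier learn cross-modal interactions; late fusion keeps the channels independent and is easier to train separately.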
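The abstract does not specify the asynchronous-start schedule. One plausible reading, sketched below as an assumption, is that one channel stays frozen (learning rate 0) until its start epoch, after which each channel is updated with its own learning rate. Which channel is delayed, the delay length, and the rates are hypothetical placeholders, and plain SGD stands in for whatever optimizer the authors used.

```python
from dataclasses import dataclass

@dataclass
class ChannelSchedule:
    """Hypothetical per-channel schedule: base learning rate and the epoch training starts."""
    lr: float
    start_epoch: int = 0

    def lr_at(self, epoch: int) -> float:
        # Before its start epoch the channel is frozen (learning rate 0)
        return self.lr if epoch >= self.start_epoch else 0.0

def sgd_step(params, grads, lr):
    """Plain SGD update for one channel's parameters."""
    return [p - lr * g for p, g in zip(params, grads)]

def train_step(epoch, channels, schedules):
    """channels maps name -> (params, grads); each channel uses its own scheduled rate."""
    return {name: sgd_step(params, grads, schedules[name].lr_at(epoch))
            for name, (params, grads) in channels.items()}
```

For example, `{"audio": ChannelSchedule(lr=1e-3), "lyric": ChannelSchedule(lr=2e-5, start_epoch=3)}` trains only the audio channel for the first three epochs and then fine-tunes the lyric channel with a much smaller rate. In a deep learning framework the same effect is usually achieved with per-parameter-group learning rates.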
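The abstract reports an F1 score of 0.87 but does not say how per-genre scores are averaged. The sketch below assumes macro averaging (the unweighted mean of per-class F1), computed in plain Python; other averaging schemes, such as weighting by class frequency, can give different values on imbalanced data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

For `y_true = ["rock", "rock", "pop", "pop"]` and `y_pred = ["rock", "pop", "pop", "pop"]`, rock has F1 = 2/3 and pop has F1 = 4/5, so the macro F1 is 11/15 ≈ 0.733.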

Data Availability

The datasets generated during the current study are available from the corresponding author on reasonable request.

Notes

This dataset is available at https://github.com/MKMaS-GUET/Music-genre-classification.
https://librosa.org/doc/latest/index.html
https://github.com/google-research/bert


Acknowledgements

We thank the editor and anonymous reviewers for their valuable comments and feedback. This work was supported by Guangxi Natural Science Foundations (Nos. 2020GXNSFAA159012 and 2018GXNSFDA281049), National Natural Science Foundation of China (Nos. U1811264, 62062027, 62167002 and 61862013), Innovation Project of GUET Graduate Education (No. 2021YCXS052) and the project of Guangxi Key Laboratory of Trusted Software.

Author information

Authors and Affiliations

Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Jinji Road, Guilin, 541004, Guangxi, China
You Li, Han Ding & Liang Chang
School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Jinji Road, Guilin, 541004, Guangxi, China
You Li & Zhihai Zhang

Corresponding author

Correspondence to Liang Chang.

Ethics declarations

Conflict of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Y., Zhang, Z., Ding, H. et al. Music genre classification based on fusing audio and lyric information. Multimed Tools Appl 82, 20157–20176 (2023). https://doi.org/10.1007/s11042-022-14252-6

Received: 28 April 2022
Revised: 26 August 2022
Accepted: 04 November 2022
Published: 29 December 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11042-022-14252-6

Keywords
