Abstract
Music genre classification (MGC) has a wide range of application scenarios. Traditional MGC methods consider only audio information or only lyric information, which limits recognition performance. In this paper, we propose a multimodal music genre classification framework that integrates both audio and lyric information. By exploiting the complementarity of the two modalities, music genres can be represented more comprehensively. First, the framework extracts the mel-spectrogram of the audio, and a convolutional neural network extracts audio features from it. In parallel, BERT is used to obtain a distributed representation of the lyrics. The two modalities are then fused through different strategies, such as feature-level and decision-level fusion. To address the severe mismatch between the convergence speeds of the audio channel and the lyric channel, we start training the two channels asynchronously and assign them different learning rates. A series of experiments verifies the effectiveness of the proposed model: it achieves an F1 score of 0.87 for music genre classification, approximately 4% higher than that of the best baseline in the experiments.
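The abstract names two fusion strategies and an asynchronous, per-channel learning-rate schedule without giving their exact form. The sketch below illustrates one common reading of each, assuming that each channel has already produced a fixed-length embedding (audio from the CNN, lyrics from BERT) or a per-genre probability vector. Concatenation followed by a single linear layer, the weighted-average late fusion with weight `alpha`, the freeze-until-start-epoch schedule, and all function names and numbers are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-genre scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_level_fusion(audio_feat: Sequence[float],
                         lyric_feat: Sequence[float],
                         weights: Sequence[Sequence[float]],
                         bias: Sequence[float]) -> List[float]:
    """Early fusion (assumed form): concatenate both embeddings, then score
    each genre with one linear layer (one weight row per genre) and a softmax."""
    fused = list(audio_feat) + list(lyric_feat)
    logits = []
    for row, b in zip(weights, bias):
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused feature size")
        logits.append(sum(w * x for w, x in zip(row, fused)) + b)
    return softmax(logits)


def decision_level_fusion(audio_probs: Sequence[float],
                          lyric_probs: Sequence[float],
                          alpha: float = 0.5) -> List[float]:
    """Late fusion (assumed form): weighted average of the class probabilities
    produced by each channel's own classifier; alpha weights the audio channel."""
    if len(audio_probs) != len(lyric_probs):
        raise ValueError("both channels must score the same set of genres")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * p_audio + (1.0 - alpha) * p_lyric
            for p_audio, p_lyric in zip(audio_probs, lyric_probs)]


def channel_learning_rates(epoch: int,
                           start_epoch: Dict[str, int],
                           base_lr: Dict[str, float]) -> Dict[str, float]:
    """Asynchronous start (assumed form): a channel stays frozen (learning
    rate 0) until its own start epoch, then trains with its own rate."""
    return {name: (lr if epoch >= start_epoch[name] else 0.0)
            for name, lr in base_lr.items()}


if __name__ == "__main__":
    # Hypothetical 3-genre example; all numbers are illustrative only.
    audio_probs = [0.6, 0.3, 0.1]
    lyric_probs = [0.2, 0.7, 0.1]
    print("late fusion:", decision_level_fusion(audio_probs, lyric_probs, 0.5))

    audio_emb, lyric_emb = [1.0, 0.0], [0.0, 1.0]
    W = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
    print("early fusion:", feature_level_fusion(audio_emb, lyric_emb, W, [0, 0, 0]))

    # Hypothetical schedule: which channel starts later, and when, is an assumption.
    starts = {"audio": 0, "lyric": 5}
    rates = {"audio": 1e-3, "lyric": 2e-5}
    for ep in (0, 4, 5):
        print(ep, channel_learning_rates(ep, starts, rates))
```

Early fusion lets a single classifier learn interactions between audio and lyric features, while late fusion keeps the two classifiers independent and combines only their outputs. In a deep-learning framework, per-channel learning rates would typically be expressed as optimizer parameter groups (in PyTorch, a list of dicts such as `{"params": ..., "lr": ...}` passed to `torch.optim.Adam`), with a channel that has not yet started either left out of the optimizer or given a rate of 0.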
Data Availability
The datasets generated during the current study are available from the corresponding author on reasonable request.
Acknowledgements
We thank the editor and anonymous reviewers for their valuable comments and feedback. This work was supported by the Guangxi Natural Science Foundation (Nos. 2020GXNSFAA159012 and 2018GXNSFDA281049), the National Natural Science Foundation of China (Nos. U1811264, 62062027, 62167002 and 61862013), the Innovation Project of GUET Graduate Education (No. 2021YCXS052) and the project of the Guangxi Key Laboratory of Trusted Software.
Ethics declarations
Conflict of Interest
The authors have no competing interests to declare that are relevant to the content of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Zhang, Z., Ding, H. et al. Music genre classification based on fusing audio and lyric information. Multimed Tools Appl 82, 20157–20176 (2023). https://doi.org/10.1007/s11042-022-14252-6