
Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM)

Published in: Multimedia Tools and Applications

Abstract

The essence of music is inherently multi-modal, with audio and lyrics going hand in hand. However, relatively little research has studied the intricacies of this multi-modal nature of music and its relation to genres. Our work uses this multi-modality to present Spectro-Lyrical Embeddings for Music (SLEM), a representation that leverages open-source, lightweight, state-of-the-art deep learning vision and language models to encode songs. This work summarises extensive experimentation with over 20 deep learning-based music embeddings on a self-curated, hand-labeled, multi-lingual dataset of 226 recent songs spread over 5 genres. Our aim is to study the effect of varying the relative weight of lyrics and spectrograms in the embeddings on multi-class genre classification, and to show that a simple linear combination of both modalities outperforms either modality alone. Our methods achieve accuracies ranging from 81.08% to 98.60% across genres by applying the K-nearest neighbors algorithm to the multimodal embeddings. We further study the structure of genres in this representational space, including their misclassification patterns, visual clustering with EM-GMM, and the domain-specific meaning of the multi-modal weight for each genre with respect to 'instrumentalness' and 'energy' metadata. SLEM is one of the first end-to-end methods that uses spectro-lyrical embeddings without hand-engineered features.
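The core idea of the abstract, a weighted linear combination of two modality embeddings fed to a K-nearest-neighbors classifier, can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for the actual spectrogram and lyric encoders, and the function name `fuse`, the weight grid, and the data split are all illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for the two modalities: in the paper these would come from
# pretrained vision models (on spectrograms) and sentence encoders (on lyrics).
n_songs, dim = 200, 64
spec_emb = rng.normal(size=(n_songs, dim))   # spectrogram embeddings
lyric_emb = rng.normal(size=(n_songs, dim))  # lyric embeddings
labels = rng.integers(0, 5, size=n_songs)    # 5 genre labels

def fuse(spec, lyric, w):
    """Linear combination of the two modality embeddings, weight w in [0, 1]."""
    return w * spec + (1.0 - w) * lyric

# Sweep the multi-modal weight and score a KNN classifier at each setting.
for w in (0.0, 0.5, 1.0):
    X = fuse(spec_emb, lyric_emb, w)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    print(f"w={w:.1f}  accuracy={knn.score(X_te, y_te):.2f}")
```

In practice the two encoders produce vectors of different dimensionality, so a real pipeline would project them to a common dimension (or concatenate them) before any weighted sum; the sketch sidesteps this by using equal-sized toy vectors.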


Data availability statement

The dataset generated for this research is publicly available in a GitHub repository: https://github.com/aryanmehra1999/SLEM. It is a single, easily downloadable CSV file containing the Spotify-generated metadata for 227 songs along with their lyrics.
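Since the release bundles Spotify metadata and lyrics in one CSV, a table like it can be explored with a few lines of pandas. The column names below ('genre', 'instrumentalness', 'energy', 'lyrics') are hypothetical stand-ins for the real schema, and the inline data is a toy fixture rather than the released file.

```python
import io

import pandas as pd

# Tiny stand-in for the released CSV (hypothetical column names; the real
# file pairs Spotify audio-feature metadata with lyrics and a genre label).
csv_text = (
    "title,genre,instrumentalness,energy,lyrics\n"
    "Song A,pop,0.02,0.81,la la la\n"
    "Song B,metal,0.40,0.95,rage rage\n"
    "Song C,pop,0.01,0.70,hey hey\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Per-genre means of the metadata the paper relates to the multi-modal weight.
print(df.groupby("genre")[["instrumentalness", "energy"]].mean())
```

Swapping the in-memory fixture for `pd.read_csv("<path to the repository CSV>")` would apply the same grouping to the actual dataset.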



Funding

The authors did not receive support from any organization for the submitted work.

Author information


Corresponding author

Correspondence to Pratik Narang.

Ethics declarations

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mehra, A., Mehra, A. & Narang, P. Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM). Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19160-5

