Abstract
The essence of music is inherently multi-modal, with audio and lyrics going hand in hand. However, little research has examined the intricacies of this multi-modal nature of music and its relation to genre. Our work uses this multi-modality to present spectro-lyrical embeddings for music representation (SLEM), leveraging open-sourced, lightweight, state-of-the-art deep learning vision and language models to encode songs. This work summarises extensive experimentation with over 20 deep learning-based music embeddings on a self-curated, hand-labeled, multi-lingual dataset of 226 recent songs spread over 5 genres. Our aim is to study how varying the relative weight of lyrics and spectrograms in the embeddings affects multi-class genre classification. We show that a simple linear combination of both modalities outperforms either modality alone. Our methods achieve accuracies ranging from 81.08% to 98.60% across genres by applying the K-nearest neighbors algorithm to the multimodal embeddings. We further study the structure of genres in this representational space, including their misclassification patterns, visual clustering with EM-GMM, and the domain-specific meaning of the multi-modal weight for each genre with respect to 'instrumentalness' and 'energy' metadata. SLEM presents one of the first end-to-end methods that uses spectro-lyrical embeddings without hand-engineered features.
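The abstract's core idea, a weighted linear combination of spectrogram and lyric embeddings classified with K-nearest neighbors, can be sketched as follows. This is an illustrative sketch with random stand-in embeddings, not the authors' actual pipeline; the concatenation-based weighting scheme, the dimensions, and the value of `w` are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_songs, dim = 60, 8

# Stand-ins for the outputs of a vision encoder (on spectrograms)
# and a language encoder (on lyrics); both are random here.
spec_emb = rng.normal(size=(n_songs, dim))
lyric_emb = rng.normal(size=(n_songs, dim))
genres = rng.integers(0, 5, size=n_songs)  # 5 genre labels

def slem_embedding(spec, lyrics, w):
    """Hypothetical spectro-lyrical embedding: concatenate the two
    modalities, scaling lyrics by w and spectrograms by (1 - w)."""
    return np.hstack([(1 - w) * spec, w * lyrics])

# Equal weighting of the two modalities, then a KNN classifier.
X = slem_embedding(spec_emb, lyric_emb, w=0.5)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, genres)
train_acc = knn.score(X, genres)
```

Sweeping `w` from 0 (spectrograms only) to 1 (lyrics only) and re-fitting the classifier at each step reproduces the kind of modality-weight study the abstract describes.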
Data availability statement
The dataset generated for this research is made publicly available by the authors in a GitHub repository: https://github.com/aryanmehra1999/SLEM. It is a single, easily downloadable CSV file containing Spotify-generated metadata for 227 songs along with their lyrics.
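A minimal sketch of working with a CSV of this shape in pandas. The column names below are hypothetical, chosen only to illustrate the fields described in the data statement (Spotify metadata plus lyrics); consult the repository for the actual schema.

```python
import io
import pandas as pd

# Stand-in for the repository's CSV file, with assumed columns.
csv_text = """title,genre,lyrics,energy,instrumentalness
Song A,pop,la la la,0.81,0.02
Song B,rock,hey hey hey,0.93,0.10
"""

df = pd.read_csv(io.StringIO(csv_text))
high_energy = df[df["energy"] > 0.9]  # filter on a metadata column
```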
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mehra, A., Mehra, A. & Narang, P. Classification and study of music genres with multimodal Spectro-Lyrical Embeddings for Music (SLEM). Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19160-5