Abstract
This paper proposes MusicEmo, a transformer-based intelligent system for music emotion generation and recognition. Creating music that is emotionally resonant while remaining musically cohesive and diverse is a known challenge. MusicEmo addresses it with theme-based conditioning, which trains the transformer to manifest the conditioning sequence as thematic material that appears multiple times in the generated result. The architecture incorporates an emotion vector and an LSTM model for creating symbolic musical sequences that are musically coherent and emotionally resonant. The proposed framework outperforms state-of-the-art approaches on measures of musical consistency and emotional resonance. The transformer-based approach offers a new way of creating music based on emotions and could change how music is created and experienced in the future.
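To make the idea of theme-based conditioning concrete, the following is a minimal sketch of how a conditioning input might be assembled and how theme recurrence in a generated sequence could be checked. It is not the MusicEmo implementation: the token names (`<emo:...>`, `<theme>`, `</theme>`), the note-name tokens, and both helper functions are hypothetical, and exact matching is a simplification, since a trained model may restate a theme in varied form.

```python
from typing import List, Sequence

# Hypothetical delimiter tokens; the paper's actual vocabulary is not given here.
THEME_START = "<theme>"
THEME_END = "</theme>"


def build_conditioned_sequence(
    emotion: str, theme: Sequence[str], body: Sequence[str]
) -> List[str]:
    """Prefix a token sequence with an emotion tag and a delimited theme."""
    return [f"<emo:{emotion}>", THEME_START, *theme, THEME_END, *body]


def count_theme_occurrences(tokens: Sequence[str], theme: Sequence[str]) -> int:
    """Count non-overlapping exact occurrences of `theme` inside `tokens`."""
    n = len(theme)
    if n == 0:
        return 0
    count = 0
    i = 0
    while i + n <= len(tokens):
        if list(tokens[i:i + n]) == list(theme):
            count += 1
            i += n  # skip past the match so occurrences do not overlap
        else:
            i += 1
    return count


# Toy data: a three-note theme and a stand-in for a generated continuation.
theme = ["C4", "E4", "G4"]
body = ["C4", "E4", "G4", "A4", "F4", "C4", "E4", "G4", "D4"]

sequence = build_conditioned_sequence("happy", theme, body)
recurrences = count_theme_occurrences(body, theme)
```

In this toy setup, `sequence` is what a conditioned model would be trained on or prompted with, and `recurrences` is a crude proxy for how often the thematic material reappears in the output.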
About this article
Cite this article
Xin, Y. MusicEmo: transformer-based intelligent approach towards music emotion generation and recognition. J Ambient Intell Human Comput (2024). https://doi.org/10.1007/s12652-024-04811-0
DOI: https://doi.org/10.1007/s12652-024-04811-0