MusicEmo: transformer-based intelligent approach towards music emotion generation and recognition

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

This paper proposes MusicEmo, a transformer-based intelligent system for music emotion generation and recognition. Generating music that is emotionally resonant while remaining musically cohesive and diverse is a long-standing challenge. MusicEmo addresses it with theme-based conditioning: the transformer is trained to manifest the conditioning sequence as thematic material that recurs throughout the generated result. The architecture combines an emotion vector with an LSTM model to produce symbolic musical sequences that are both musically coherent and emotionally resonant. In evaluations of musical consistency and emotional resonance, the proposed framework outperforms state-of-the-art approaches. This transformer-based approach offers an original way of generating music from emotions and could reshape how music is created and experienced.
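The abstract's theme-based conditioning idea, an emotion label plus a thematic fragment prepended to the input, with the theme re-injected so it recurs in the output, can be illustrated with a minimal sketch. This is not the authors' code: the token names, vocabulary, and `reinject_every` parameter are assumptions made purely for illustration.

```python
# Illustrative sketch (assumed names, not the MusicEmo implementation):
# condition a symbolic token sequence on an emotion label and a theme,
# re-injecting the theme so it appears multiple times in the result.

EMOTION_TOKENS = {"happy": "<EMO_HAPPY>", "sad": "<EMO_SAD>"}  # hypothetical vocabulary

def build_conditioned_sequence(emotion, theme, continuation, reinject_every=4):
    """Prepend an emotion token and the theme, then interleave the theme
    into the continuation every `reinject_every` tokens."""
    seq = [EMOTION_TOKENS[emotion]] + list(theme)
    for i, tok in enumerate(continuation):
        if i > 0 and i % reinject_every == 0:
            seq.extend(theme)  # the theme recurs in the generated result
        seq.append(tok)
    return seq

seq = build_conditioned_sequence(
    "happy",
    theme=["C4", "E4"],
    continuation=["G4", "A4", "B4", "C5", "D5"],
)
# The sequence starts with the emotion token, and the theme occurs twice.
```

In the actual model, the emotion token would correspond to the learned emotion vector, and the re-injection would be learned by the transformer rather than hard-coded.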



Author information

Corresponding author

Correspondence to Ying Xin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xin, Y. MusicEmo: transformer-based intelligent approach towards music emotion generation and recognition. J Ambient Intell Human Comput (2024). https://doi.org/10.1007/s12652-024-04811-0

