
Controllable lyrics-to-melody generation

Original Article · Neural Computing and Applications

Abstract

Lyrics-to-melody generation is an interesting and challenging topic in the field of AI music research. Because the correlations between lyrics and melody are difficult to learn, previous methods suffer from low generation quality and a lack of controllability. Controllability enables human interaction with a generative model to produce desired content, which is especially important in music generation tasks aimed at human-centered AI that supports musicians in creative activities. To address these issues, we propose ConL2M, a controllable lyrics-to-melody generation network that generates realistic melodies from lyrics in a user-specified musical style. Our work contains three main novelties: (1) to model the dependencies of music attributes across multiple sequences, inter-branch memory fusion (Memofu) is proposed to enable information flow between the branches of a multi-branch stacked-LSTM architecture; (2) a reference style embedding (RSE) is proposed to improve generation quality as well as control the musical style of the generated melodies; (3) a sequence-level statistical loss (SeqLoss) is proposed to help the model learn sequence-level features of melodies given lyrics. Evaluated with metrics for music quality and controllability, this initial study of controllable lyrics-to-melody generation shows improved generation quality and demonstrates the feasibility of interacting with users to generate melodies in desired musical styles for given lyrics.
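The three mechanisms above are only named here, so the sketch below is a minimal illustration rather than the authors' implementation: it shows one assumed form of an inter-branch memory fusion (Memofu) step for a multi-branch stacked-LSTM generator and one assumed form of a sequence-level statistical loss (SeqLoss). The branch semantics (e.g. pitch, duration, rest, as in the three-branch model of [3]), all dimensions, class and function names, and the fusion rule are illustrative assumptions; the reference style embedding (RSE) is omitted.

    # Minimal sketch (PyTorch), NOT the paper's implementation: assumed forms
    # of Memofu-style inter-branch fusion and a sequence-level statistical loss.
    import torch
    import torch.nn as nn

    class MemofuLSTM(nn.Module):
        """Multi-branch LSTM whose cell states are mixed at each time step,
        letting each music-attribute branch condition the others."""

        def __init__(self, input_dim: int, hidden_dim: int, num_branches: int = 3):
            super().__init__()
            self.cells = nn.ModuleList(
                nn.LSTMCell(input_dim, hidden_dim) for _ in range(num_branches)
            )
            # Learned branch-mixing matrix (assumed form of the fusion).
            self.mix = nn.Parameter(torch.eye(num_branches))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, branches, input_dim)
            batch, time, n, _ = x.shape
            h = x.new_zeros(n, batch, self.cells[0].hidden_size)
            c = torch.zeros_like(h)
            outputs = []
            for t in range(time):
                step = [cell(x[:, t, b], (h[b], c[b]))
                        for b, cell in enumerate(self.cells)]
                h = torch.stack([s[0] for s in step])   # (branches, batch, hidden)
                # Fusion: each branch's next cell state is a softmax-weighted
                # combination of every branch's cell state.
                w = torch.softmax(self.mix, dim=1)      # (branches, branches)
                c = torch.einsum("ij,jbh->ibh", w,
                                 torch.stack([s[1] for s in step]))
                outputs.append(h)
            return torch.stack(outputs, dim=1)          # (branches, time, batch, hidden)

    def seq_loss(generated: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
        """Assumed form of SeqLoss: match per-sequence statistics (mean and
        variance over time) instead of comparing sequences token by token.
        Both inputs: (batch, time)."""
        gen = torch.stack([generated.mean(dim=1), generated.var(dim=1)])
        ref = torch.stack([real.mean(dim=1), real.var(dim=1)])
        return torch.mean((gen - ref) ** 2)

    # Hypothetical usage: 8 lyric-conditioned sequences, 20 steps, 3 branches.
    model = MemofuLSTM(input_dim=64, hidden_dim=128)
    out = model(torch.randn(8, 20, 3, 64))              # shape (3, 20, 8, 128)

Mixing cell states rather than hidden outputs is one plausible reading of "memory fusion"; a statistics-matching loss such as the one above is likewise only one way to penalize sequence-level, rather than token-level, mismatch.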

Data availability

The datasets generated and/or analysed during the current study are available in the repository of [2]: https://github.com/yy1lab/Lyrics-Conditioned-Neural-Melody-Generation.

Notes

  1. https://dreamtonics.com/synthesizerv/.

References

  1. Wiggins GA (2006) A preliminary framework for description, analysis and comparison of creative systems. Knowl Based Syst 19(7):449–458. https://doi.org/10.1016/j.knosys.2006.04.009

  2. Yu Y, Srivastava A, Canales S (2021) Conditional LSTM-GAN for melody generation from lyrics. ACM Trans Multimed Comput Commun Appl 17(1):35:1–35:20. https://doi.org/10.1145/3424116

  3. Srivastava A, Duan W, Shah RR, Wu J, Tang S, Li W, Yu Y (2022) Melody generation from lyrics using three branch conditional LSTM-GAN. In: MultiMedia modeling, pp 569–581. https://doi.org/10.1007/978-3-030-98358-1_45

  4. Sheng Z, Song K, Tan X, Ren Y, Ye W, Zhang S, Qin T (2020) SongMASS: automatic song writing with pre-training and alignment constraint. https://doi.org/10.48550/arXiv.2012.05168

  5. Briot J-P, Hadjeres G, Pachet F-D (2019) Deep learning techniques for music generation—a survey. https://doi.org/10.48550/arXiv.1709.01620

  6. Ji S, Luo J, Yang X (2020) A comprehensive survey on deep music generation: multi-level representations, algorithms, evaluations, and future directions. https://doi.org/10.48550/arXiv.2011.06801

  7. Carnovalini F, Rodà A (2020) Computational creativity and music generation systems: an introduction to the state of the art. Front Artif Intell 3

  8. Choi K, Fazekas G, Sandler M (2016) Text-based LSTM networks for automatic music composition. https://doi.org/10.48550/arXiv.1604.05358

  9. Ackerman M, Loker D (2016) Algorithmic songwriting with ALYSIA. https://doi.org/10.48550/arXiv.1612.01058

  10. Bao H, Huang S, Wei F, Cui L, Wu Y, Tan C, Piao S, Zhou M (2018) Neural melody composition from lyrics. https://doi.org/10.48550/arXiv.1809.04318

  11. Yu Y, Zhang Z, Duan W, Srivastava A, Shah R, Ren Y (2022) Conditional hybrid GAN for melody generation from lyrics. Neural Comput Appl. https://doi.org/10.1007/s00521-022-07863-5

  12. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27

  13. Dong H-W, Hsiao W-Y, Yang L-C, Yang Y-H (2017) MuseGAN: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. https://doi.org/10.48550/arXiv.1709.06298

  14. Kingma DP, Welling M (2014) Auto-encoding variational Bayes. https://doi.org/10.48550/arXiv.1312.6114

  15. Roberts A, Engel J, Raffel C, Hawthorne C, Eck D (2019) A hierarchical latent vector model for learning long-term structure in music. https://doi.org/10.48550/arXiv.1803.05428

  16. Chen K, Wang C-I, Berg-Kirkpatrick T, Dubnov S (2020) Music SketchNet: controllable music generation via factorized representations of pitch and rhythm. https://doi.org/10.48550/arXiv.2008.01291

  17. Wang Z, Wang D, Zhang Y, Xia G (2020) Learning interpretable representation for controllable polyphonic music generation. https://doi.org/10.48550/arXiv.2008.07122

  18. Wu J, Liu X, Hu X, Zhu J (2020) PopMNet: generating structured pop music melodies using neural networks. Artif Intell 286:103303. https://doi.org/10.1016/j.artint.2020.103303

  19. Dai S, Jin Z, Gomes C, Dannenberg RB (2021) Controllable deep melody generation via hierarchical music structure representation. https://doi.org/10.48550/arXiv.2109.00663

  20. Ju Z, Lu P, Tan X, Wang R, Zhang C, Wu S, Zhang K, Li X, Qin T, Liu T-Y (2021) TeleMelody: lyric-to-melody generation with a template-based two-stage method. https://doi.org/10.48550/arXiv.2109.09617

  21. Duan W, Zhang Z, Yu Y, Oyama K (2022) Interpretable melody generation from lyrics with discrete-valued adversarial training. In: Proceedings of the 30th ACM international conference on multimedia, pp 6973–6975. https://doi.org/10.1145/3503161.3547742

  22. Liu P, Qiu X, Huang X (2016) Recurrent neural network for text classification with multi-task learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, pp 2873–2879

  23. Graves A, Mohamed A-R, Hinton G (2013) Speech recognition with deep recurrent neural networks. https://doi.org/10.48550/arXiv.1303.5778

  24. Wang Y, Stanton D, Zhang Y, Skerry-Ryan RJ, Battenberg E, Shor J, Xiao Y, Ren F, Jia Y, Saurous RA (2018) Style tokens: unsupervised style modeling, control and transfer in end-to-end speech synthesis. https://doi.org/10.48550/arXiv.1803.09017

  25. Jang E, Gu S, Poole B (2017) Categorical reparameterization with Gumbel-Softmax. https://doi.org/10.48550/arXiv.1611.01144

  26. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

  27. Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from standard GAN. https://doi.org/10.48550/arXiv.1807.00734

  28. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. https://doi.org/10.48550/arXiv.1310.4546

  29. Zhu Y, Lu S, Zheng L, Guo J, Zhang W, Wang J, Yu Y (2018) Texygen: a benchmarking platform for text generation models. In: The 41st international ACM SIGIR conference on research & development in information retrieval, pp 1097–1100. https://doi.org/10.1145/3209978.3210080

  30. Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. https://doi.org/10.48550/arXiv.1412.6980

Author information

Corresponding author

Correspondence to Yi Yu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Z., Yu, Y. & Takasu, A. Controllable lyrics-to-melody generation. Neural Comput & Applic 35, 19805–19819 (2023). https://doi.org/10.1007/s00521-023-08728-1

