Abstract
Lyrics-to-melody generation is an interesting and challenging topic in AI music research. Because the correlations between lyrics and melody are difficult to learn, previous methods suffer from low generation quality and a lack of controllability. Controllability lets humans interact with a generative model to produce desired content, which is especially important in music generation as a step toward human-centered AI that supports musicians in creative work. To address these issues, we propose ConL2M, a controllable lyrics-to-melody generation network that generates realistic melodies from lyrics in a user-specified musical style. Our work contains three main novelties: (1) inter-branch memory fusion (Memofu) is proposed to model the dependencies of music attributes across multiple sequences by enabling information flow within a multi-branch stacked LSTM architecture; (2) a reference style embedding (RSE) is proposed to improve generation quality and to control the musical style of the generated melodies; (3) a sequence-level statistical loss (SeqLoss) is proposed to help the model learn sequence-level features of melodies given lyrics. Evaluated with metrics for music quality and controllability, this initial study of controllable lyrics-to-melody generation shows improved generation quality and the feasibility of interacting with users to generate melodies in desired musical styles for given lyrics.
Data availability
The datasets generated during and/or analysed during the current study are available in the [2] repository, https://github.com/yy1lab/Lyrics-Conditioned-Neural-Melody-Generation.
References
Wiggins GA (2006) A preliminary framework for description, analysis and comparison of creative systems. Knowl Based Syst 19(7):449–458. https://doi.org/10.1016/j.knosys.2006.04.009
Yu Y, Srivastava A, Canales S (2021) Conditional LSTM-GAN for melody generation from lyrics. ACM Trans Multimed Comput Commun Appl 17(1), Article 35. https://doi.org/10.1145/3424116
Srivastava A, Duan W, Shah RR, Wu J, Tang S, Li W, Yu Y (2022) Melody generation from lyrics using three branch conditional LSTM-GAN. In: MultiMedia modeling, pp 569–581. https://doi.org/10.1007/978-3-030-98358-1_45
Sheng Z, Song K, Tan X, Ren Y, Ye W, Zhang S, Qin T (2020) SongMASS: automatic song writing with pre-training and alignment constraint. https://doi.org/10.48550/arXiv.2012.05168
Briot J-P, Hadjeres G, Pachet F-D (2019) Deep learning techniques for music generation—a survey. https://doi.org/10.48550/arXiv.1709.01620
Ji S, Luo J, Yang X (2020) A comprehensive survey on deep music generation: multi-level representations, algorithms, evaluations, and future directions. https://doi.org/10.48550/arXiv.2011.06801
Carnovalini F, Rodà A (2020) Computational creativity and music generation systems: an introduction to the state of the art. Front Artif Intell 3
Choi K, Fazekas G, Sandler M (2016) Text-based LSTM networks for automatic music composition. https://doi.org/10.48550/arXiv.1604.05358
Ackerman M, Loker D (2016) Algorithmic songwriting with ALYSIA. https://doi.org/10.48550/arXiv.1612.01058
Bao H, Huang S, Wei F, Cui L, Wu Y, Tan C, Piao S, Zhou M (2018) Neural melody composition from lyrics. https://doi.org/10.48550/arXiv.1809.04318
Yu Y, Zhang Z, Duan W, Srivastava A, Shah R, Ren Y (2022) Conditional hybrid GAN for melody generation from lyrics. Neural Comput Appl. https://doi.org/10.1007/s00521-022-07863-5
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27
Dong H-W, Hsiao W-Y, Yang L-C, Yang Y-H (2017) MuseGAN: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. https://doi.org/10.48550/arXiv.1709.06298
Kingma DP, Welling M (2014) Auto-encoding variational Bayes. https://doi.org/10.48550/arXiv.1312.6114
Roberts A, Engel J, Raffel C, Hawthorne C, Eck D (2019) A hierarchical latent vector model for learning long-term structure in music. https://doi.org/10.48550/arXiv.1803.05428
Chen K, Wang C-I, Berg-Kirkpatrick T, Dubnov S (2020) Music SketchNet: controllable music generation via factorized representations of pitch and rhythm. https://doi.org/10.48550/arXiv.2008.01291
Wang Z, Wang D, Zhang Y, Xia G (2020) Learning interpretable representation for controllable polyphonic music generation. https://doi.org/10.48550/arXiv.2008.07122
Wu J, Liu X, Hu X, Zhu J (2020) PopMNet: generating structured pop music melodies using neural networks. Artif Intell 286:103303. https://doi.org/10.1016/j.artint.2020.103303
Dai S, Jin Z, Gomes C, Dannenberg RB (2021) Controllable deep melody generation via hierarchical music structure representation. https://doi.org/10.48550/arXiv.2109.00663
Ju Z, Lu P, Tan X, Wang R, Zhang C, Wu S, Zhang K, Li X, Qin T, Liu T-Y (2021) TeleMelody: lyric-to-melody generation with a template-based two-stage method. https://doi.org/10.48550/arXiv.2109.09617
Duan W, Zhang Z, Yu Y, Oyama K (2022) Interpretable melody generation from lyrics with discrete-valued adversarial training. In: Proceedings of the 30th ACM international conference on multimedia, pp 6973–6975. https://doi.org/10.1145/3503161.3547742
Liu P, Qiu X, Huang X (2016) Recurrent neural network for text classification with multi-task learning. In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, pp 2873–2879
Graves A, Mohamed A-R, Hinton G (2013) Speech recognition with deep recurrent neural networks. https://doi.org/10.48550/arXiv.1303.5778
Wang Y, Stanton D, Zhang Y, Skerry-Ryan RJ, Battenberg E, Shor J, Xiao Y, Ren F, Jia Y, Saurous RA (2018) Style tokens: unsupervised style modeling, control and transfer in end-to-end speech synthesis. https://doi.org/10.48550/arXiv.1803.09017
Jang E, Gu S, Poole B (2017) Categorical reparameterization with Gumbel-Softmax. https://doi.org/10.48550/arXiv.1611.01144
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
Jolicoeur-Martineau A (2018) The relativistic discriminator: a key element missing from standard GAN. https://doi.org/10.48550/arXiv.1807.00734
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. https://doi.org/10.48550/arXiv.1310.4546
Zhu Y, Lu S, Zheng L, Guo J, Zhang W, Wang J, Yu Y (2018) Texygen: a benchmarking platform for text generation models. In: The 41st international ACM SIGIR conference on research & development in information retrieval, pp 1097–1100. https://doi.org/10.1145/3209978.3210080
Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. https://doi.org/10.48550/arXiv.1412.6980
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Z., Yu, Y. & Takasu, A. Controllable lyrics-to-melody generation. Neural Comput & Applic 35, 19805–19819 (2023). https://doi.org/10.1007/s00521-023-08728-1