Music Composition with Deep Learning: A Review

Hernandez-Olivan, Carlos; Beltrán, José R.

doi:10.1007/978-3-031-18444-4_2

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Generating a complex work of art such as a musical composition requires a certain level of creativity, which depends on a variety of factors related to the hierarchy of musical language. Music generation has been tackled with algorithmic methods and, more recently, with deep learning models of the kind used in other fields such as computer vision. In this chapter, we put into context the relationships between AI-based music composition models and human musical composition and creative processes. First, we describe the music composition process. We then give an overview of recent deep learning models for music generation, classifying them according to their relationship with basic principles of music (melody, harmony, and structure) and with the composition processes of instrumentation and orchestration. Classifying music generation models into these categories helps us measure and understand how deep learning models deal with the complexity and hierarchy of music. We try to answer some of the most relevant open questions for this task by analyzing, among other things, the ability of current deep learning models to generate music with creativity and the similarity between AI and human composition processes.

Notes

1. https://magenta.tensorflow.org/music-vae, accessed August 2021.
2. https://colinraffel.com/projects/lmd/, accessed August 2021.
3. https://shunithaviv.github.io/bebopnet/, accessed August 2021.

Acknowledgements

This research has been partially supported by the Spanish Ministry of Science, Innovation and Universities by the RTI2018-096986-B-C31 contract and the Government of Aragon by the AffectiveLab-T60-20R project.

We wish to thank Jürgen Schmidhuber for his suggestions.

Author information

Authors and Affiliations

Department of Electronic Engineering and Communications, University of Zaragoza, Zaragoza, Spain
Carlos Hernandez-Olivan & José R. Beltrán

Authors

Carlos Hernandez-Olivan
José R. Beltrán

Corresponding authors

Correspondence to Carlos Hernandez-Olivan or José R. Beltrán.

Editor information

Editors and Affiliations

Department of Computer Science & Engineering, National Institute of Technology Silchar, Cachar, Assam, India
Anupam Biswas
Department of Media and Culture Studies, Utrecht University, Utrecht, Utrecht, The Netherlands
Emile Wennekes
Multimedia Department, Polish-Japanese Academy of Information Technology, Warsaw, Poland
Alicja Wieczorkowska
Department of Electronics & Communication Engineering, National Institute of Technology Silchar, Cachar, India
Rabul Hussain Laskar

About this chapter

Cite this chapter

Hernandez-Olivan, C., Beltrán, J.R. (2023). Music Composition with Deep Learning: A Review. In: Biswas, A., Wennekes, E., Wieczorkowska, A., Laskar, R.H. (eds) Advances in Speech and Music Technology. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-18444-4_2

DOI: https://doi.org/10.1007/978-3-031-18444-4_2
Published: 23 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18443-7
Online ISBN: 978-3-031-18444-4
eBook Packages: Engineering, Engineering (R0)
