Music Composition with Deep Learning: A Review

Advances in Speech and Music Technology

Abstract

Generating a complex work of art such as a musical composition requires exhibiting a certain level of creativity, which depends on a variety of factors related to the hierarchy of musical language. Music generation has traditionally been tackled with algorithmic methods and is now increasingly approached with deep learning models of the kind also used in other fields such as computer vision. In this chapter, we place into context the existing relationships between AI-based music composition models and human musical composition and creativity processes. First, we describe the music composition process, and then we give an overview of recent deep learning models for music generation, classifying them according to their relationship with some basic musical principles (melody, harmony, and structure) and with composition processes such as instrumentation and orchestration. Classifying music generation models into these categories helps us measure and understand how deep learning models deal with the complexity and hierarchy of music. Finally, we try to answer some of the most relevant open questions for this task, among them whether current deep learning models can generate music with creativity and how similar AI and human composition processes are.

Acknowledgements

This research has been partially supported by the Spanish Ministry of Science, Innovation and Universities through contract RTI2018-096986-B-C31 and by the Government of Aragon through the AffectiveLab-T60-20R project.

We wish to thank Jürgen Schmidhuber for his suggestions.

Author information

Correspondence to Carlos Hernandez-Olivan or José R. Beltrán.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Hernandez-Olivan, C., Beltrán, J.R. (2023). Music Composition with Deep Learning: A Review. In: Biswas, A., Wennekes, E., Wieczorkowska, A., Laskar, R.H. (eds) Advances in Speech and Music Technology. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-18444-4_2

  • DOI: https://doi.org/10.1007/978-3-031-18444-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18443-7

  • Online ISBN: 978-3-031-18444-4

  • eBook Packages: Engineering; Engineering (R0)
