Abstract
Generating a complex work of art such as a musical composition requires a certain level of creativity. This creativity depends on a variety of factors related to the hierarchical structure of musical language. Music generation was first approached with algorithmic methods and, more recently, with deep learning models that are also used in other fields such as computer vision. In this chapter, we place into context the existing relationships between AI-based music composition models and human musical composition and creativity processes. First, we describe the music composition process. We then give an overview of recent deep learning models for music generation, classifying them according to their relationship with some of the basic principles of music: melody, harmony, and structure, as well as the composition processes of instrumentation and orchestration. Classifying music generation models into these categories helps us measure and understand how deep learning models deal with the complexity and hierarchy of music. Finally, we try to answer some of the most relevant open questions for this task, among them whether current deep learning models can generate music with creativity and how similar AI and human composition processes are.
Acknowledgements
This research has been partially supported by the Spanish Ministry of Science, Innovation and Universities through the RTI2018-096986-B-C31 contract and by the Government of Aragon through the AffectiveLab-T60-20R project.
We wish to thank Jürgen Schmidhuber for his suggestions.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hernandez-Olivan, C., Beltrán, J.R. (2023). Music Composition with Deep Learning: A Review. In: Biswas, A., Wennekes, E., Wieczorkowska, A., Laskar, R.H. (eds) Advances in Speech and Music Technology. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-18444-4_2
DOI: https://doi.org/10.1007/978-3-031-18444-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18443-7
Online ISBN: 978-3-031-18444-4
eBook Packages: Engineering, Engineering (R0)