Drawing Music: Using Neural Networks to Compose Descriptive Music from Illustrations

Martín-Gómez, Lucía; Pérez-Marcos, Javier; Rivero, Alfonso José López; Bermúdez, Giovanny Mauricio Tarazona

doi:10.1007/978-3-031-14859-0_3

Lucía Martín-Gómez (ORCID: orcid.org/0000-0003-4424-0527), Javier Pérez-Marcos, Alfonso José López Rivero & Giovanny Mauricio Tarazona Bermúdez

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1430)

Included in the following conference series:

International Conference on Disruptive Technologies, Tech Ethics and Artificial Intelligence

Abstract

The creative capacity of machines is still questioned by researchers and users alike. For this reason, computational creativity focuses not only on developing machines that create artistic content but also on evaluating the content they generate. This work presents a system that composes polyphonic music from a user’s drawings in real time. Our proposal analyzes the film Fantasia, produced by Walt Disney in 1940, and deduces the relationship between its audio and images. As part of system development, an LSTM-based recurrent neural network was trained on MIDI music files to obtain a model. The resulting system generates polyphonic music with expressive timing and dynamics by inferring chords from the user’s drawings. To assess the creative ability of the machine, a Turing test was conducted, and the quality of the connection between drawings and music was measured in a second user test. The performance of the classifiers considered is also discussed.
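As a rough illustration of what "inferring chords from the user's drawings" could look like in code, the following is a minimal sketch and not the authors' method. It swaps the trained classifiers mentioned in the abstract for a hand-written lookup table: the drawing is reduced to a hue histogram and the dominant hue bin selects a chord. The names `HUE_BIN_TO_CHORD`, `hue_histogram`, and `infer_chord`, the six-bin quantization, and the hue-to-chord mapping are all assumptions made for this example.

```python
import colorsys
from collections import Counter

# Hypothetical hue-bin -> chord table, for illustration only.
# The chapter learns this relationship from data; this table is invented.
HUE_BIN_TO_CHORD = {0: "C", 1: "G", 2: "D", 3: "Am", 4: "Em", 5: "F"}


def hue_histogram(pixels, bins=6):
    """Return the normalized hue histogram of (r, g, b) pixels in 0-255."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = Counter()
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Clamp so that a hue of exactly 1.0 still lands in the last bin.
        counts[min(int(hue * bins), bins - 1)] += 1
    total = len(pixels)
    return [counts[i] / total for i in range(bins)]


def infer_chord(pixels):
    """Pick the chord assigned to the most frequent hue bin."""
    hist = hue_histogram(pixels, bins=len(HUE_BIN_TO_CHORD))
    dominant_bin = max(range(len(hist)), key=hist.__getitem__)
    return HUE_BIN_TO_CHORD[dominant_bin]


# A mostly orange drawing with one green-cyan stroke.
drawing = [(255, 128, 0)] * 3 + [(0, 255, 128)]
chord = infer_chord(drawing)
```

In a system like the one described, the inferred chord would then condition the music generator; that step depends on the trained LSTM model and is not sketched here.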

Author information

Authors and Affiliations

Department of Computer Science, Pontifical University of Salamanca, C/ Compañía no. 5, 37002, Salamanca, Spain
Lucía Martín-Gómez, Javier Pérez-Marcos & Alfonso José López Rivero
Department of Engineering, Distrital University Francisco José de Caldas, Cr. 7 # 40B-53, Bogotá, Colombia
Giovanny Mauricio Tarazona Bermúdez

Corresponding author

Correspondence to Lucía Martín-Gómez.

Editor information

Editors and Affiliations

Facultad de Informática, Universidad Pontificia de Salamanca, Salamanca, Spain
Daniel H. de la Iglesia
Faculty of Science, University of Salamanca, Salamanca, Spain
Juan F. de Paz Santana
Facultad de Informática, Universidad Pontificia de Salamanca, Salamanca, Spain
Alfonso J. López Rivero

About this paper

Cite this paper

Martín-Gómez, L., Pérez-Marcos, J., Rivero, A.J.L., Bermúdez, G.M.T. (2023). Drawing Music: Using Neural Networks to Compose Descriptive Music from Illustrations. In: de la Iglesia, D.H., de Paz Santana, J.F., López Rivero, A.J. (eds) New Trends in Disruptive Technologies, Tech Ethics and Artificial Intelligence. DiTTEt 2022. Advances in Intelligent Systems and Computing, vol 1430. Springer, Cham. https://doi.org/10.1007/978-3-031-14859-0_3

DOI: https://doi.org/10.1007/978-3-031-14859-0_3
Published: 28 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14858-3
Online ISBN: 978-3-031-14859-0
eBook Packages: Intelligent Technologies and Robotics (R0)
