Convolutional Neural Networks and Transfer Learning Applied to Automatic Composition of Descriptive Music

  • Lucía Martín-Gómez
  • Javier Pérez-Marcos
  • María Navarro-Cáceres
  • Sara Rodríguez-González
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 801)


Visual and musical arts have been strongly interconnected throughout history. The aim of this work is to compose music on the basis of the visual characteristics of a video. For this purpose, descriptive music is used as a link between image and sound, and a video fragment of the film Fantasia is analyzed in depth. Specifically, convolutional neural networks in combination with transfer learning are applied to extract image descriptors. To establish a relationship between the visual and musical information, Naive Bayes, Support Vector Machine and Random Forest classifiers are applied. The resulting model is subsequently employed to compose descriptive music from a new video. The results of this proposal are compared with those of a previous work in order to evaluate the performance of the classifiers and the quality of the descriptive musical composition.
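The classification step described in the abstract can be illustrated with a minimal sketch. It assumes that each video frame has already been reduced to a numeric descriptor vector (in the paper this comes from a pretrained convolutional network; here the vectors are small hand-written stand-ins) and that each training frame is paired with a note label. The function names, the three-dimensional descriptors and the labels "C4" and "G4" are hypothetical and are not taken from the paper. Only a Gaussian Naive Bayes classifier is shown, written in plain Python; it is not the authors' implementation, and the Support Vector Machine and Random Forest classifiers are omitted.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a prior, per-feature means and per-feature variances for each class."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps variances strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log-posterior under the Gaussian model."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Hypothetical frame descriptors (stand-ins for CNN features) and
# hypothetical note labels; this is not data from the paper.
X_train = [
    [0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [1.0, 0.0, 0.3],  # frames labelled "C4"
    [0.1, 0.9, 0.7], [0.2, 0.8, 0.9], [0.0, 1.0, 0.8],  # frames labelled "G4"
]
y_train = ["C4", "C4", "C4", "G4", "G4", "G4"]

model = fit_gaussian_nb(X_train, y_train)

# Descriptors of frames from a "new video": one predicted note per frame.
new_frames = [[0.85, 0.15, 0.2], [0.1, 0.95, 0.8]]
melody = [predict(model, frame) for frame in new_frames]
print(melody)
```

In this toy setting the predicted sequence of labels plays the role of the composed melody; a real pipeline would replace the hand-written vectors with descriptors extracted from video frames.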


Descriptive music, Automatic composition, Image, Video, Transfer learning, Convolutional neural networks



This work was supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds, under project SURF: Intelligent System for integrated and sustainable management of urban fleets (TIN2015-65515-C4-3-R).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Lucía Martín-Gómez (1, corresponding author)
  • Javier Pérez-Marcos (1)
  • María Navarro-Cáceres (1)
  • Sara Rodríguez-González (1)
  1. BISITE Digital Innovation Hub, University of Salamanca, Edificio Multiusos I+D+i, Salamanca, Spain
