Musical Vision: an interactive bio-inspired sonification tool to convert images into music

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

Musical Vision is a highly flexible, interactive and bio-inspired sonification tool that translates color images into harmonic polyphonic music by mimicking the human visual system in terms of its field of vision and photosensitive sensors. Putting the user at the center of the sonification process, Musical Vision allows the interactive design of fully configurable mappings between the color space and the MIDI instrument and audio pitch spaces, tailoring the music rendering to the needs of the application. Moreover, Musical Vision incorporates a harmonizer capable of introducing the modifications necessary to create melodies using harmonic chords. Above all, Musical Vision is an extremely flexible system that the user can interactively configure to convert an image into a musical piece lasting anywhere from a few seconds to several minutes. Thus, it can be used, for instance, for trans-artistic purposes such as converting a painting into music, for augmenting vision with music, or for learning musical skills such as sol-fa. To evaluate the proposed sonification tool, we conducted a pilot user study in which twelve volunteers were asked to interpret images containing geometric patterns from the music rendered by Musical Vision. Results show that even users with no musical education background were able to achieve nearly 70% accuracy in multiple choice tests after less than 25 minutes of training. Moreover, users with some musical education were able to accurately “draw by ear” the images from no stimuli other than the sonifications.
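The core idea of mapping pixel colors onto musical pitches can be illustrated with a minimal sketch. The mapping below is purely hypothetical (the paper's actual, user-configurable mappings differ): it describes each pixel by its HSV components and quantizes the hue onto a one-octave major scale of MIDI note numbers.

```python
import colorsys

# Hypothetical mapping, not Musical Vision's actual one: one octave of the
# C major scale as MIDI note numbers (C4..B4).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71]

def pixel_to_midi_note(r, g, b):
    """Map an RGB pixel (0-255 per channel) to a MIDI note via its hue."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Quantize the hue (0.0-1.0 around the color wheel) onto scale degrees.
    degree = min(int(h * len(C_MAJOR)), len(C_MAJOR) - 1)
    return C_MAJOR[degree]

print(pixel_to_midi_note(255, 0, 0))  # pure red (hue 0) -> 60 (C4)
```

A real sonification would additionally exploit saturation and value, for instance to drive note velocity or instrument choice, which is the kind of configurability the tool exposes to the user.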



Notes

  1. See Sensory substitution for the blind: a walk in the garden wearing The vOICe at http://www.youtube.com/watch?v=8xRgfaUJkdM (last accessed on October 2018).

  2. In general terms, people are more used to describing colors as a mixture of primary colors, be it subtractive or additive (for instance, in the additive color system, yellow equals red plus green). In contrast, getting users to internalize the description of colors in terms of their HSV components would require a certain amount of training.

  3. See Fig. 3 in Sect. 3.2.2 for details on how these correspondences are made depending on how the user configures the music rendering process.

  4. The decision of which pixels will sound simultaneously is intimately related to the image scanning process, which is described in Sect. 3.2.2.

  5. Please visit https://goo.gl/4l6wfz.

  6. Regarding the Image processing module, the following parameters are presented: visual field number, color information reduction strength, and polyphonic rules (polyphony distribution and simultaneous notes). As for the Music rendering module, the following configuration parameters are provided: the image scanning pattern (referred to as Scan type) and the degree of harmonization applied to each visual field region (in %). Moreover, information about the Vector spaces module is provided in terms of the predefined chord mapping employed, as well as data regarding the Differentiation tools applied and the tempo.

  7. The designed test is available online at http://goo.gl/lWdgrz.
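The configuration parameters enumerated in note 6 could be gathered into a single structure. The sketch below is purely illustrative: every field name and default is hypothetical and not part of Musical Vision's actual interface.

```python
from dataclasses import dataclass, field

# Hypothetical container for the configuration parameters listed in note 6;
# all names and defaults are illustrative, not Musical Vision's actual API.
@dataclass
class SonificationConfig:
    # Image processing module
    visual_field_number: int = 3
    color_reduction_strength: float = 0.5
    simultaneous_notes: int = 4
    # Music rendering module
    scan_type: str = "left-to-right"
    harmonization_pct: dict = field(default_factory=dict)  # per visual field region
    # Vector spaces module and differentiation tools
    chord_mapping: str = "predefined"
    tempo_bpm: int = 120

cfg = SonificationConfig(harmonization_pct={"center": 100, "periphery": 50})
print(cfg.tempo_bpm)  # 120
```

Grouping the parameters this way mirrors the module structure described in the note (image processing, music rendering, vector spaces) rather than any concrete implementation.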


Author information

Correspondence to Xavier Sevillano.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Polo, A., Sevillano, X. Musical Vision: an interactive bio-inspired sonification tool to convert images into music. J Multimodal User Interfaces 13, 231–243 (2019). https://doi.org/10.1007/s12193-018-0280-4

