Abstract
Musical Vision is a highly flexible, interactive, bio-inspired sonification tool that translates color images into harmonic polyphonic music by mimicking the human visual system, both its field of vision and its photosensitive sensors. Putting the user at the center of the sonification process, Musical Vision supports the interactive design of fully configurable mappings between the color space and the MIDI instrument and pitch spaces, so that the musical rendering can be tailored to the needs of each application. Moreover, Musical Vision incorporates a harmonizer capable of introducing the modifications necessary to create melodies built on harmonic chords. Above all, Musical Vision is an extremely flexible system that the user can interactively configure to convert an image into a musical piece lasting anywhere from a few seconds to several minutes. It can therefore be used, for instance, for trans-artistic purposes such as converting a painting into music, for augmenting vision with music, or for learning musical skills such as sol-fa. To evaluate the proposed sonification tool, we conducted a pilot user study in which twelve volunteers were asked to interpret images containing geometric patterns from the music rendered by Musical Vision. Results show that even users with no background in musical education achieved nearly 70% accuracy in multiple-choice tests after less than 25 min of training. Moreover, users with some musical education were able to accurately “draw by ear” the images from no stimuli other than the sonifications.
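The abstract describes configurable mappings from the color space to the MIDI pitch space. As a minimal illustrative sketch (not the tool's actual, user-configurable mapping), one could quantize a pixel's hue onto a diatonic scale of MIDI note numbers; the scale choice and hue binning below are assumptions:

```python
# Illustrative color-to-MIDI sketch: convert an 8-bit RGB pixel to hue,
# then quantize the hue onto a C-major scale of MIDI note numbers.
# This is a hypothetical example, not Musical Vision's actual mapping.
import colorsys

C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI notes C4..B4

def pixel_to_midi(r, g, b):
    """Map an RGB pixel (0..255 per channel) to a MIDI note via its hue."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    index = min(int(h * len(C_MAJOR)), len(C_MAJOR) - 1)
    return C_MAJOR[index]

print(pixel_to_midi(255, 0, 0))  # pure red (hue 0) -> 60 (middle C)
print(pixel_to_midi(0, 255, 0))  # pure green (hue 1/3) -> 64 (E4)
```

In the actual system, as the abstract notes, this mapping is interactively designed by the user rather than fixed.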
Notes
See Sensory substitution for the blind: a walk in the garden wearing The vOICe at http://www.youtube.com/watch?v=8xRgfaUJkdM (last accessed on October 2018).
In general terms, people are more used to describing colors as a mixture of primary colors, whether subtractive or additive (for instance, in the additive color system, yellow equals red plus green). By contrast, getting users to internalize a description of colors in terms of their HSV components would require a certain amount of training.
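The additive mixing mentioned in the note can be verified with a trivial channel-wise sum (a sketch for illustration only; channels are clamped to the 8-bit range):

```python
# Additive (RGB) color mixing: light sources add per channel.
# Combining full-intensity red and green yields yellow.
def add_colors(c1, c2):
    """Channel-wise additive mix of two RGB triples, clamped to 0..255."""
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

RED = (255, 0, 0)
GREEN = (0, 255, 0)
print(add_colors(RED, GREEN))  # -> (255, 255, 0), i.e. yellow
```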
The decision of which pixels will sound simultaneously is intimately related to the image scanning process, which is described in Sect. 3.2.2.
Please visit https://goo.gl/4l6wfz.
Regarding the Image processing module, the following parameters are presented: visual field number, color information reduction strength, and polyphonic rules (polyphony distribution and simultaneous notes). As for the Music rendering module, the configuration parameters provided are the image scanning pattern (referred to as Scan type) and the degree of harmonization applied to each visual field region (in %). Moreover, information about the Vector spaces module is provided in terms of the predefined chord mapping employed, as well as data regarding the Differentiation tools applied and the tempo.
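The parameters enumerated in this note could be grouped into a single configuration record, sketched below; the field names, types, and default values are hypothetical and do not reflect Musical Vision's actual interface:

```python
# Hypothetical configuration record mirroring the parameters listed in the
# note above. Names and defaults are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class SonificationConfig:
    visual_fields: int = 3               # number of visual field regions
    color_reduction: float = 0.5         # color information reduction strength (0..1)
    simultaneous_notes: int = 4          # polyphonic rule: notes sounding at once
    scan_type: str = "left-to-right"     # image scanning pattern (Scan type)
    harmonization_pct: tuple = (100, 50, 0)  # % harmonization per visual field region
    chord_mapping: str = "predefined"    # chord mapping employed
    tempo_bpm: int = 120                 # rendering tempo

cfg = SonificationConfig()
print(cfg.scan_type, cfg.tempo_bpm)
```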
The designed test is available online at http://goo.gl/lWdgrz.
Cite this article
Polo, A., Sevillano, X. Musical Vision: an interactive bio-inspired sonification tool to convert images into music. J Multimodal User Interfaces 13, 231–243 (2019). https://doi.org/10.1007/s12193-018-0280-4