Perceptive vs reflective: spectator interpretation of multimodal artworks

This article presents an empirical investigation into spectator interpretation of multimodal artworks. Specifically, this article explores perception of the artwork in two different sensorial modalities — sight and sound — and the effect of their interaction. We selected four abstract paintings and created four acousmatic musical pieces that were composed to reflect the content of the artwork. We then ran a between-subjects experimental study in three conditions: visual only, music only, and a combination of both. A total of 48 participants completed an online survey in which they were asked to report their interpretations of the shown artworks. Following a thematic analysis on the collected data, we clustered participants’ interpretation into two main categories: reflective and perceptive. The combination increased spectators’ attention to the artworks, affected the temporality of the artworks, and created richer understandings of the multimodal works. The study provides knowledge for the inclusion of multimodal experiences in the presentation of art expanding the possibilities for inter-sensory dialogue in the arts.


Oliver Bramah (obra335@aucklanduni.ac.nz), Sue Adams (s.adams@auckland.ac.nz), Fabio Morreale (f.morreale@auckland.ac.nz)
Waipapa Taumata Rau (University of Auckland), Aotearoa, New Zealand

Introduction
Sight and sound are of vital importance to humans. These senses allow us to interact with and interpret the world around us, creating meaning [1]. As humans, we coordinate information from various sensorial sources in a multimodal manner to create a completed image of the whole [2][3][4]. Many investigations have been performed to explore the way in which the visual and the auditory interact, both in artistic ventures and in academic studies. For instance, contributions from the arts aimed to explore human-world connection by focusing on the utilisation of moving or videographic imagery, though they tended to disregard the combination of static visual artworks and music for artistic purposes [5,6]. Academic studies, on the other hand, have been undertaken to understand the mutual effect that sounds and visuals have on each other, including static [7] and moving visuals [8]. Such studies offer a valuable understanding of the interplay between these two senses. However, the qualitative exploration of how sight and sound in combination affect the interpretation of artworks requires further study.
This article presents an empirical investigation into spectators' interpretation of art when static visual artworks and music combine. A series of four modern abstract paintings were selected and music was composed with the specific purpose of accompanying the paintings. A questionnaire was designed with participants experiencing one of three modalities online: visual only, auditory only, or visual-auditory combined. This article extends the original work published in the proceedings of Audio Mostly 2021 [9], which described the concept of the study and the selection of paintings and composition process. Here we report on the findings of the experimental study.

Background
Given the highly interdisciplinary character of this study, we drew on several disciplinary areas to construct its framework: firstly, literature on sense-making processes in relation to both music and abstract art; secondly, the considerable body of empirical evidence examining the combination of sight and sound; and thirdly, literature describing visual music, a genre of pieces that incorporate sonic and visual aspects.

Sense-making in music and art
The ontological topography of music is complex and contested, yet a unifying characteristic, generally agreed upon throughout history, is music's ability to communicate emotions [10][11][12][13]. Apart from languages themselves, music is seen as one of the most complex and complete systems of communication, though due to the heightened dimensionality of musical signs, as well as a general lack of acquaintance with musical language for many people, communication through music is not exact [10,14]. This is not always an unintentional outcome of musical exploration, as some scholars have made efforts to create what Morreale and colleagues defined as a "Rorschach interface", an interpretively open system onto which spectators can project their own personal meaning [15]. The traditional approach to the analysis of semiotics and sense-making in music has predominantly focused on music's similarities to linguistics, and thus the syntactical and formal aspects of music have been held in the spotlight, though this approach lacks the capacity to describe a complete listening experience [16]. Further, music is also seen as a stimulus for motion, and so in this way, music is not only capable of communicating emotions and information, but is also capable of allowing listeners to enact visual and motive concepts, conveyed by and perceived in music [16]. As such, the interpretation of music varies between listeners, leading them to undertake unique processes of sense-making in order to interpret music.
Exacerbating discrepancies in musical sense-making is the importance of the instance of sense-making, where knowledge is not only created differently by each individual, but knowledge is generated differently in each situation as well [17][18][19]. Sense-making occurs in music at various stages of the artistic process, from conception (composition) to reproduction (performance) and to reception [10]. However, for our study, the most pertinent form of musical sense-making occurs during the individual listener's process of cognitive mediation, where they identify symbolic forms, examine their meanings, and explore their personal relationship to that sound [16]. Our perceptions of music and sound are strongly linked to the physical world and our place within it. Our instinctive reactions to sound are key to understanding and locating the sound in question and establishing our relationship to that sound [4]. As such, when attempting to make sense of sounds, visual cues are of vital importance to building that understanding and placing it within the world and in relation to the listener [1,6].
Similarly to sense-making in music, the sense-making processes associated with abstract and contemporary art are reliant on individual interpretation to create meaning [20][21][22][23]. Music and abstract art share a similar artistic vocabulary, using similar sensorial words to describe their attributes, such as colour, tone, and composition [13]. Another commonality is that our interactions with visual artworks, as with our interactions with sounds, are influenced by our biological and instinctive reactions to the artwork [24]. Due to the similarities in the sense-making processes of musical and visual art forms, it stands to reason that they could interact successfully to expand the sense-making opportunities for the spectator, without causing significant conflict. However, the key difference between the interpretative processes found in musical sense-making and those found in the sense-making of abstract art is the temporality of the two art forms. Music by its nature is fleeting and temporal, whereas visual artworks, such as paintings, are static and tangible, allowing viewers to actively and flexibly engage with the visual work [23,25,26]. As such, in order to blend these two art forms together, the difference in the temporal experience must be considered and adapted to allow both sense-making processes to interweave, rather than conflict.

Sonic-visual interaction
The cross stimulation of sensory modalities, or synaesthesia, is a mixing of senses where associations are formed and presented as manifestations of imaginative thinking, or as double metaphors [27,28]. For example, synaesthetic metaphors are common language features, where a word associated with one sense is used to describe another, such as the bright sound of the violin, or the dark tone of a viola, and are prevalent in different languages and cultures [28]. Inter-sensory association between sound and vision may link tone with colour, or pitch with brightness. For example, Scriabin stated that the key of C minor was red, while Rimsky-Korsakov perceived that it was white [27][28][29]. These audio-visual connections can present neurologically as "real" synaesthesia but can also appear as the result of cultural or learned associations, often referred to as "pseudo" or cultural synaesthesia [30]. Examples of cultural synaesthesia may present themselves where colours are linked with sonic or semantic artefacts because of prior associations and learnings, such as western culture associating black with death, or anger with the colour red. However, the difference between such experiences is overstated, and the lines between the two forms of synaesthesia are blurred [30,31]. A multisensory study by Talsma et al. found that linking sonic and visual stimuli increased the speed with which participants perceived both sensory modalities, facilitating audio-visual integration and leading to greater cognition and understanding [32].
Music has been argued to be essentially synaesthetic, where the "ephemeral, abstract nature of music simply calls for embodiment" [27, p. 286] through the perception of emotions, experiences, and feelings. The link between sight and sound in the human psyche is powerful, having instinctive physiological and cultural connections [4,30]. Leppert reasoned that, "Precisely because musical sound is abstract, intangible, and ethereal – lost as soon as it is gained – the visual experience of its production is crucial to both musicians and audience alike for locating and communicating the place of music within society and culture" [1, p. 7]. Sight also engages spectators in the listening/viewing experience for longer periods of time, delaying wear-out [8]. This is vital to the interpretation of multimodal artworks as we instinctively use one sense to inform the other.
Our natural inclination to combine different senses, expanded by recent developments in technology, has led to a growing artistic genre and field of research into sonic-visual interaction [6,13,33]. An earlier study conducted by Parrott in 1982 sought to evaluate the emotional effects that pieces of romantic music had on early to mid-twentieth century paintings and vice versa [7]. The study evaluated responses to the artworks and music both in isolation and in combination, using a series of scaled questions about emotion, congruence, and participants' opinion of the works to generate quantitative data. Parrott's study predominantly aimed to examine the outcomes of understanding: whether music exacerbated or subverted the intended emotional content of the artworks/pieces shown to the participants. A similar study by Boltz and colleagues in 2009 explicitly looked at the effects that strongly emotional silent videos had on music, both in emotional interpretation and musical cognition [8]. This study used quantitative data to demonstrate that visuals did have an effect on music, with participants' perceptions of the music's qualities changing to align with the visuals presented. The combination of the visuals and sound affected cognition and recognition. However, in neither study was there a qualitative exploration of how music affected the viewing experience, participants' interpretation, nor how participants used the different information to create their nuanced understandings of the artworks.
More recently, an investigation has explored the emotions of figurative (rather than abstract) artworks and pre-existing musical works. The researchers developed a system to identify the physical features of paintings and music, and the relationship between emotions and those features [34]. Using a predefined colour-emotion model to estimate the Arousal Valence (A.V.) value, colour combinations were used when matching the harmonisations of the musical pieces selected. The system enabled the matching of the paintings with music to decipher the artist's emotional intention, on the premise of improving the spectator's understanding of the artwork.

Visual music
The composition of the music for this study was greatly influenced by the paradigms of a genre known as visual music. The term 'visual music' was originally coined to describe the temporal aspects, or temporality, of abstract artworks in the early twentieth century [5,35]. These were artworks where the contents of the artwork, such as shapes, colours, and their relations to one another, were designed to incrementally reveal themselves to the spectator as they scanned over and journeyed through the work. The genre of visual music combines moving imagery and acousmatic music (music composed solely for reproduction via speakers, rather than through live performance). There is a focus on temporality and a conscious avoidance of narrative frameworks.
Practitioners of the visual music genre often explore harmonious interactions between moving visuals and sonic artworks [36]. For example, Autarkeia Aggregatum is a piece of visual music created by Bret Battey where the visuals are synchronised with the music and are continuously evolving [37]. For our study, the compositions, written and produced by the first author, aimed to create similarly congruous relationships between the visual artworks and the sonic artworks. However, the focus on temporality functions differently when pairing music with static visuals. Visual music as a genre relies on moving imagery [5], and so the music and visuals can temporally reflect one another, whereas music set to static visuals cannot. The creation of musical compositions to directly accompany static artworks explores a niche that artistic practice has not yet addressed (see Bramah, Cheng, and Morreale [9] for more information on the compositional process).

Experimental study
An experimental online study with human participants was set up to explore spectators' interpretation when exposed to a combination of music and visual artworks. We used a between-subject study model to gather qualitative and descriptive quantitative data.

Experimental conditions and experimental material
The study comprised three different conditions: the visual artwork only, audio works only, and the combination of visual artworks and audio works (the multimodal or combined condition). The experimental material included four modern and abstract artworks: one by Meri Karako (Traces, 2021, first on the left in Fig. 1) and three by Sylvia Van Nooten (Sun Scripture, 2021, second from the left; Goddess Symbolism [...]

• [...] "This is a painting about the unseen language of the sun".
• Goddess Symbolism Submerged, Sylvia Van Nooten - (1) Embodiment. (2) "Searching through feminine scriptures for wholeness".
• Oblique Translation of Imaginary Text, Sylvia Van Nooten - (1) Curiosity. (2) "A painting about going so deep into the unknown that new languages/synaptic sequences must be created".
The four artworks were chosen to cover different semantic and emotional contents. We opted to use abstract rather than figurative or realistic art, as abstract art is more open to interpretation, with there being no single correct answer as to the meaning of the artwork [22,23]. This meant we could better judge the disparities and changes in participant interpretations between conditions as participants explored different potential meanings. Abstract artwork also encourages viewers to explore the meanings of an artwork with more flexibility than more traditional art forms [23], ideally leading to greater interplay between sonic and visual art forms as the participants explore potential meanings. The musical works were composed and recorded by the first author, a professional composer, for the specific purpose of accompanying these four artworks. The musical pieces were composed in an acousmatic style and were designed to reflect the painting they were to accompany. The pieces were composed prior to receiving the information from the artists, and so were created based on the composer's intuitive reactions to the visual works¹ (see Bramah and Morreale for details about the selection of the artwork, and the composition process [9]).

Research questions
There were two research foci: (1) to investigate how spectators' interpretation of a visual artwork is affected when presented in combination with a music piece and (2) to explore what interpretative approaches participants used and how these approaches guided participants' reception of the artwork. The first question was underpinned by the evidence suggesting that the combined condition leads to a broader and more nuanced understanding of the multimodal artwork, including that sense-making processes for music and abstract art are similar [10,16,20,21]; and because sonic-visual interaction is instinctive, this results in the cross stimulation of sensory modalities [4,30,32]. The second question was entirely exploratory in nature.

Study design
A between-subject study was prepared with participants randomly assigned to one of the three conditions. Thus, no participant encountered the same artwork under more than one condition: a participant shown the visual condition of an artwork would not see the same artwork in the combined condition, nor hear the musical work we associated with it. Once participants had entered the online questionnaire, they would randomly be shown one of the four works relevant to the condition they were assigned and asked a series of questions regarding their responses to and interpretations of that artwork. Participants would repeat this process for the remaining three artworks and answer identical questions for each of the four works in their assigned condition. We included open-ended questions to be qualitatively analysed, such as the following: "In less than ten words, describe what you believe this artwork/music depicts?"; "What do you believe the artist/composer intended to convey in this artwork?"; and "Do you think the combination of musical and visual artworks change the way you viewed or understood the artwork?", the latter question being specific to the combined condition.

¹ The musical works in conjunction with their artworks are available to view at the following links: Traces (https://youtu.be/pC_tcE0G2bE), Sun Scripture (https://youtu.be/4zqueZnbC44), Goddess Symbolism Submerged (https://youtu.be/IVFWL6Jud34), Oblique Translations of Imaginary Text (https://youtu.be/DX5kfJ7o0pk).
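The assignment procedure described in this section can be illustrated with a minimal sketch. It is not the mechanism used in the study, which ran on a survey platform. The function name, the condition labels, and the fixed seed are assumptions introduced purely for illustration.

```python
import random

# Condition labels are shorthand for the three conditions described in the text.
CONDITIONS = ["visual", "audio", "combined"]
ARTWORKS = [
    "Traces",
    "Sun Scripture",
    "Goddess Symbolism Submerged",
    "Oblique Translations of Imaginary Text",
]


def assign_participant(rng):
    """Pick one condition at random and a random presentation order of the four works.

    Hypothetical helper: each participant stays in a single condition, so the
    same artwork is never encountered under two conditions.
    """
    condition = rng.choice(CONDITIONS)
    order = rng.sample(ARTWORKS, k=len(ARTWORKS))  # shuffled copy, no repeats
    return condition, order


rng = random.Random(2021)  # fixed seed only so the sketch is repeatable
condition, order = assign_participant(rng)
```

Because the condition is drawn once per participant, simple random assignment of this kind does not guarantee equal group sizes. Whether the study balanced groups is not stated here.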

Participant selection and sample
The study was conducted online. Participants were invited through advertisements on social media aimed at students and members of artistic (music and art) communities, who we believed to be stakeholders in this research, and thus most likely to participate. The social media advertisements were put out through the authors' personal Facebook pages, the forum of the Composition cohorts of the University of Auckland, and Facebook groups dedicated to abstract art. Being online, our study also had the potential to reach international audiences; however, as we did not collect any demographic information, we are unable to verify this. The study was approved by The University of Auckland Human Participants Ethics Committee on 19/10/2021.
The questionnaire was hosted on Qualtrics and responses were collected between 19/11/2021 and 07/01/2022. Over the course of this period, 48 usable responses were gathered across the three conditions.

Data analysis
A combination of descriptive quantitative and qualitative analysis was performed, with the emphasis on the qualitative data. The extent to which participants' interpretation overlapped with the artists' stated intentions for their own works was descriptively analysed. From a list of nine words,² including those used by the artist, we calculated how frequently participants selected the artists' words in each condition. The nine words were chosen as they covered a variety of different positions on Plutchik's wheel of emotions [38][39][40], and were not overly specific or nuanced. These words were then ranked, based on frequency. Ranking was undertaken for each condition for each painting. For example, in the visual only condition of the third artwork, the word chosen by the artist was selected by 9.4% of participants, which was the 3rd highest occurrence of words chosen and thus ranked 3rd.
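The frequency-and-rank calculation described above can be sketched as follows. This is an illustration, not the authors' analysis script. The function names and the example selections are hypothetical, and the tie-handling rule (a word's rank is one plus the number of words chosen strictly more often) is an assumption, since the text does not state how ties were resolved.

```python
from collections import Counter


def share_of_word(selections, target):
    """Proportion of all selections in one condition that match the target word."""
    if not selections:
        return 0.0
    return selections.count(target) / len(selections)


def rank_of_word(selections, target):
    """Rank of the target word by selection frequency (1 = most frequently chosen).

    Assumed tie rule: rank = 1 + number of words selected strictly more often.
    """
    counts = Counter(selections)
    target_count = counts.get(target, 0)
    return 1 + sum(1 for n in counts.values() if n > target_count)


# Hypothetical selections for one painting in one condition (not study data).
selections = [
    "joy", "joy", "joy",
    "fear", "fear",
    "curiosity", "curiosity", "curiosity", "curiosity",
    "sadness",
]

curiosity_rank = rank_of_word(selections, "curiosity")
fear_rank = rank_of_word(selections, "fear")
fear_share = share_of_word(selections, "fear")
```

Running the same two functions once per painting and per condition would yield a table of ranks comparable across conditions.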
The qualitative data were thematically analysed using an inductive approach [41]. This thematic analysis was conducted by each author independently before performing an inter-rater reliability check to reach consensus. Responses were separated and compared by condition in order to assess whether and how different conditions affected participant interpretation. Initial overall readings of the data were taken and a series of codes were chosen that represented several concepts present within the data. Examples of some codes used were "space", "uncertainty/anxiety", "human figure", and "physical artefact". Each data item was individually examined and codes that reflected the content of the data item were assigned. New codes were iteratively added as more topics and ideas became apparent in the data.
The codes derived from the data were then collated into two main themes, reflective and perceptive, each with two sub-themes: introspective reflective and extrospective reflective, and abstract perceptive and concrete perceptive. Once the codes were collated into themes, they were compared to the relevant data items to discern whether the individual data items did in fact reflect these themes. The data were then analysed based on the four sub-themes to discern what themes participant responses focused on in different conditions, how responses changed between conditions, and what information was being used by participants to build their understanding of the works they were shown. The themes and sub-themes were not exclusive and responses could fit into multiple themes and sub-themes. The prevalence of each theme was analysed by examining how many codes dedicated to each theme and sub-theme were present across the data, as well as how explicitly and persistently participants spoke about these codes.
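The counting part of this prevalence analysis can be illustrated with a short sketch. The code-to-sub-theme mapping below is hypothetical: the four code names in quotation marks come from the text, but their assignment to sub-themes, the extra "relationship" code, and the example responses are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical mapping from codes to sub-themes (not taken from the study).
CODE_TO_SUBTHEME = {
    "physical artefact": "concrete perceptive",
    "human figure": "concrete perceptive",
    "space": "abstract perceptive",
    "uncertainty/anxiety": "introspective reflective",
    "relationship": "extrospective reflective",  # invented code
}

SUBTHEME_TO_THEME = {
    "concrete perceptive": "perceptive",
    "abstract perceptive": "perceptive",
    "introspective reflective": "reflective",
    "extrospective reflective": "reflective",
}


def tally_subthemes(coded_responses):
    """Count sub-theme occurrences per condition.

    A response may carry several codes, so it can contribute to more than one
    sub-theme, mirroring the non-exclusive themes described in the text.
    """
    by_condition = {}
    for condition, codes in coded_responses:
        counts = by_condition.setdefault(condition, Counter())
        for code in codes:
            counts[CODE_TO_SUBTHEME[code]] += 1
    return by_condition


def theme_totals(subtheme_counts):
    """Roll sub-theme counts up into the two main themes."""
    totals = Counter()
    for subtheme, n in subtheme_counts.items():
        totals[SUBTHEME_TO_THEME[subtheme]] += n
    return totals


# Hypothetical coded responses: (condition, codes assigned to one response).
coded = [
    ("visual", ["physical artefact", "relationship"]),
    ("visual", ["human figure"]),
    ("audio", ["uncertainty/anxiety", "space"]),
    ("combined", ["physical artefact", "uncertainty/anxiety"]),
]

by_condition = tally_subthemes(coded)
visual_themes = theme_totals(by_condition["visual"])
```

A tally of this kind covers only how many codes appear. How explicitly and persistently participants spoke about a code remains a matter of qualitative judgement and is not represented here.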

Quantitative descriptive results
Using the set of nine words to describe the painting (including the artists' chosen words), the results suggest that the combined condition may have led to greater alignment between the participants' and the artists' choices of words. For example, in Painting 4, Oblique Translations of Imaginary Text, the artist chose the word "curiosity" to describe the painting. In the visual condition, the word curiosity was ranked 6th by participants; in the audio condition it was ranked 8th; and in the combined condition it was ranked 2nd. Of interest is that the compositions were written prior to knowing the artist's descriptive word, yet when the music was combined with the artwork, participants' word choices aligned more closely with the artist's. Table 1 shows the rankings of the artists' words relative to the other words in the nine-word list.
Given the number of potentially confounding variables relating to word choice and interpretation, the quantitative results were not sufficiently substantive to undertake further statistical analysis.

Qualitative results
The following section explores the different themes apparent in the data and how they presented in participant responses. Following this, we compare the data from each condition. Finally, we present the results regarding participants' feedback on how music affected their interpretations of the multimodal artworks.

Perceptive responses
The perceptive theme included responses that examined items that participants perceived in the work, without necessarily reflecting on the connections within the artwork, or between the spectator and the artwork. These included responses that focused on tangible objects, such as "burnt embers" or "the sun and a hand", which made up the concrete perceptive sub-theme, and intangible observations, such as "the sun's bountiful energy" or "a rat race", which made up the abstract perceptive sub-theme.

Reflective responses
In contrast to perceptive responses, reflective responses focused on deeper-lying aspects of the artwork that participants would have had to consciously reflect upon in order to place them within the artwork. Reflective responses came in two forms, "extrospective" and "introspective". Extrospective reflective responses reflected on things such as the relationships between shapes, shaped figures, or the power or position of a figure they perceived within the artwork, among other things. Some examples include "unseen forces, wairua of the forest" or "a mother mourning her child's death". The participant responses that fit into the introspective reflective sub-theme reflected on the participants' own internal emotions or the deeper emotions of the subjects in the artworks that they saw or heard. Some examples that fit into this sub-theme are the following: "letting go of grief", "the repetitive nature of increasingly intense intrusive thoughts", or "being at peace and happy by oneself".

Comparing conditions
Across the entire data set, the reflective theme was more prevalent than the perceptive theme. The extrospective reflective and concrete perceptive sub-themes were almost equally represented. Though all themes and sub-themes were apparent in all three conditions, the balance between themes differed greatly across the three conditions.

Visual Condition
Beginning with the visual condition, the two main themes were fairly equally represented. In the reflective theme, participants described the artworks in such terms as, "Energetic and outgoing, looking at the viewer with inner strength", showing how the painting exists both in itself (extrospective), as well as how it relates to the viewer (introspective). Responding to Goddess Symbolism Submerged, a participant offered an interpretation: "The figure in the centre appears to be an angel of sorts. The two circular pink motifs on the top left and bottom right look like human brains, where our sense of self is housed (and, furthermore, where our soul could truly reside). These disembodied brains could indicate the journey of souls to the afterlife". This quote features both concrete and abstract perceptions (within the perceptive theme).
However, of the four sub-themes, concrete perceptive was the most prominently featured. Participants described items they saw within the painting, such as "Sun over mountains", "Some kind of channel and an island of barriers", or "A herd of beasts", focusing on the physical attributes depicted in the paintings without necessarily examining their own relationship to them.
The extrospective reflective theme featured quite prominently, due to the number of times participants spoke about or referred to connections or relationships external to themselves, such as "a person consoling person", or "Community cluster", but also included responses such as, "Flow of an organism in an environment", and "Everything is related and they are apart of one another".Here participants appeared to be examining how features of the painting interacted with one another, but not necessarily how they themselves felt about the artwork.
Some responses included both sub-themes (concrete perceptive and extrospective reflective). This is revealed in one participant's response to the painting Oblique Translations of Imaginary Text: "The inter-relatedness of movement and stasis - the small repeated writing-like figures give a feeling of movement and flow to the channel. This flow is blocked by barriers, and so there are no writing figures in the centre island. The way the barriers are red giving it a feeling of violence". The participant uses the physical attributes of the painting to establish how the features within the painting interact with one another. However, the participant does not examine how they connected to and felt about the painting.
The other two sub-themes, abstract perceptive and introspective reflective, featured far less often than the extrospective reflective and concrete perceptive themes did.

Audio Condition
Participants in the audio condition tended to focus their responses around sounds that they picked out of the piece. Some responses that discussed sonic aspects of the work were "The gong combined with what appears to be intakes of breath and airy whispers reminds me of incense or the inhalation of smoke", or "The airy drone combined with the gentle flutes lends it an association to trances and tribalistic rituals. The arrythmic scratching evokes an unknown or exotic percussive element and keeps the music from getting tied too much to a beat. There is also a hint of a heartbeat, perhaps to focus you on your own body or feelings". This demonstrates how participants used tangible aspects of the sonic work to formulate their answers.
The audio condition also resulted in very different experiences to those found in the visual condition, with participants offering more personal responses and reflecting more deeply than in the visual condition. Participants showed a tendency towards the reflective theme, and within that, the two sub-themes, extrospective reflective and introspective reflective, were very closely balanced. Some participants whose responses reflected the extrospective reflective theme explained that they "imagined a form of interrogation, not the physical type seen in spy movies, but the off balance type of questioning done within untrustworthy relationships", or "A group of people gathering together. They don't know each other -yet -but the overall sense is one of peace, safety, acceptance and happiness". As with the visual condition, these responses examine the interpreted relationships within the artwork without examining their connection to it. Others whose responses demonstrated the introspective reflective theme gave interpretations such as "stuck in a crowd, my fault, enjoy the moment", or "I felt like I was in a nightmare trying to escape through a dark creepy tunnel with no end in sight", exploring how they fit into and interacted with the musical composition.
Though many different topics were covered by participants in the audio condition, several stood out. Some of the most common topics spoken about by the participants were places: "This one takes me into a rainforest. I can smell it!", "Running through a dark tunnel trying to escape", and "Picturesque fantasy/fairytale market in a busy village"; and spaces: "An empty expanse somehow filled with tension", "Abandonment in a huge dark space", and "A serene and natural environment", with words such as nature, vastness, darkness, and emptiness commonly used. Many participants also noted that they perceived anxiety and uncertainty in the music, using words such as anxious, scary, tension, confusion, fear, and foreboding. Descriptions included the following: "Had to stop because I got too scared... a horrifying environment", "Something foreboding and unpleasant or dark", "The slight anxiety and pre-grief when you know mourning is coming", or "Playing on the emotions of uncertainty. There is a need to continue, but the fear of the outcome".
In many ways, the following quote, in response to the music for Traces, summarises the overall experiences of participants in the audio condition: "At first I thought I was like a child (from before urbanisation I guess) who hadn't seen much of the world before travelling through a bustling city with lots going on - many things to see, like going through a market, or a crowded plaza and being so filled with joy and wonderment... and then noticing those muffled sounds of people as if I were in a room instead, with those clinking and scraping sounds, hearing the chatter of others through from where I was, which made me think I was in a kitchen but working so happily in a harmonious environment where everyone feels the same, and are joyful and content as they work". We see that the participant first attempts to understand the musical work by spatially locating themselves, saying that they were "like a child" who was in a "bustling city", and subsequently examining their emotional reaction to that. The participant then said that they began to draw upon specific sounds within the texture of the piece, using these sounds as evidence to re-evaluate the spatial aspects of their interpretation, relocating it from a city to a kitchen. From this point, they then began explaining the emotions behind both the location and the activity, all the while placing themselves at the centre of this narrative they had created.

Combined Condition
The themes in the combined condition were similarly balanced to the visual condition, though the reflective theme did appear more frequently than the perceptive theme. Like the visual condition, the most prevalent sub-themes were the extrospective reflective and concrete perceptive themes, with participant responses showing a tendency to focus on physical artefacts or objects, through descriptions such as "War troops planning an attack on each other", "Japanese food, cooking and preparation", or "A shaman conducting a ritual", while still maintaining an interest in the extrospective reflective sub-theme. Descriptions that reflected the extrospective reflective sub-theme included accounts such as "A large argument, political, lots of people", or "The movement and sound of people in a busy city".
However, the key difference to note between the combined condition and the visual condition is that the abstract perceptive and introspective reflective themes were spoken about comparatively more frequently in the combined condition, leading to responses that were more emotionally charged. Some responses that involved such descriptions included "A mother mourning her child's death", "sun greeting, early morning, joyous, awakening", "A mother angel watching over those she loves", or "Insects scurrying, preparing, working together contentedly". These demonstrate how participants are still focusing on tangible depictions within the painting, but are now including more emotional language in their descriptions of the artworks.
It is interesting to note that, in responses to Traces in the combined condition, a large number of participants recounted depictions of animals, such as "A collective of animals going about their business", "mice in a whirlpool", "Fish frantically swimming in a pond", "Mice emptying the kitchen pantry", or "Bird fluffing her wings". While some animals were described in the visual and audio conditions, there were not nearly as many present as there were in the combined condition.

Responses to multimodal works
In the combined condition, we asked participants whether they thought the combination of musical and visual artworks changed the way they viewed or understood the artwork. In answering this question, the majority responded positively. For some, the music affected how they temporally interacted with the art, with one saying, "I think that without the music it would've been a completely different experience. The music draws you in, and allows a moment of pause and contemplation". Others described how they remained engaged with the artwork: "The music amplified the way I viewed and understood the artwork. The music encouraged me to view the artwork longer and think about it more deeply", while another participant explained, "The music helped me to keep watching the artwork; it made the artwork seem playful; I began to see a suggestion of a bird fluffing her wings". The music appears to hold people in space and time to engage with the artwork more meaningfully. Interestingly, this participant used the term "watching the artwork [emphasis added]", a term that would generally be used for observing something that is moving, such as a film, rather than a piece of static art, indicating how the music added motion and evolution. Some responses from participants reflected how the music changed their perception of the temporality of the artwork itself, suggesting that, "It added the visual conception of flow, therefore creating a story that I painted for myself". Participants used the music as a framework to explore the painting through a narrative and changing lens. On describing the painting Traces, one participant stated: "I think the movement in the music and the movement in the painting worked together and gave me the feeling of a group dancing and laughing together". Another participant, describing Sun Scripture, created a narrative: "The circle shapes and yellow colours made me think of the sun rising and the music felt like a morning sun greeting".
In several descriptions provided by the participants, they explained how the features of the music drew them towards physical aspects of the painting. Participants described responses such as "The jarring audio highlighted the sharp edges in the artwork. If softer music were playing, I might have focused more on some of the long, sweeping lines or spirals", or "The faster music lent the sense of movement to the black smudges in the visual". Another described how in Oblique Translations of Imaginary Text, "The music made me think of tools that might make the marks in the art work". Others found the music connected them with abstract features of the painting. For example, participants reflecting on Goddess Symbolism Submerged found the music gave the "impression the artwork had mystical qualities" and "The singing bowl and chanting/humming suggested this was a more contemplative, self-focused image". These examples indicate how sonic aspects of the music focused listeners on both physical and abstract features of the painting.
For others, however, the addition of music alongside the artwork proved a confusing distraction. These participants responded, "Without the music I would have found the blue art peaceful and beautiful, like ocean waves. [The] music doesn't match, and made me confused", or "I think the black marks threw me off. If they were a different colour it would've been a very different reading for me. I was torn between a sinister description - but the up beat music didn't fit that scenario, unless ironically - and fish playing in the pond". For these participants, the music perhaps hindered rather than supported their understanding.

Discussion
This study extends previous knowledge by qualitatively examining how the personal and nuanced interpretations of a visual artwork are affected when in combination with a musical piece, and how participants responded to and interpreted visual static artworks and musical pieces either as a single sensory modality (visual and sound, respectively) or in combination as a multimodal sensory experience. The results of this study have indicated clear differences in the ways in which artworks are interpreted under different conditions, using the frameworks of reflective and perceptive.
When participants were presented with only the painting (the visual condition), the concrete perceptive theme was prevalent, with responses focused on tangible aspects of the paintings. In conjunction with this, the prevalence of the extrospective reflective theme suggests that, in general, participants in the visual condition were relying on what they could find in the painting. This indicates a separation between the spectator and artwork. Participants appeared to be viewing the artwork at an emotional distance, rather than examining their own personal relationship to the artwork and its contents, or how the artwork might relate to their own human experience. This contrasts with the combined (painting and music) and audio (music only) conditions, where participants explored their own emotional connections to the works.
Prior examinations of human listening patterns suggest that sounds are related to spatial experiences, human movements, and emotions in the sounding experience [4,12,42], and our results from the audio condition support this notion. Participants in this condition frequently described the pieces by speaking about places, such as caves or forests, to begin to locate and comprehend the work. However, while participants were developing their understanding of the musical pieces by positioning themselves in the world of the sonic work spatially, they were also emotionally locating themselves within the piece, evaluating their own reactions to explore the meaning of the piece.
In sonic artworks, listeners pick recognisable sounds out of the music to give more specific spatial and/or tangible context to their reactions and responses [6,42]. While this process was apparent throughout our findings, it was most clearly demonstrated by a participant's quote which described being a child venturing into a bustling city, before relocating their interpretation to a harmonious and joyful kitchen environment. This finding demonstrates a type of engagement that was different to the visual condition: participants in the audio condition evaluated the effects of the work on themselves over the course of their experience, rather than examining the work as separate to themselves and looking for meaning that way. This ultimately led to participants offering more personal responses and reflecting more deeply in the audio condition than in the visual condition.
Interestingly, the results from the combined condition more closely resembled the results from the visual condition than those from the audio condition. The similarity in the balance between the themes in the visual and combined conditions would suggest that participants were approaching the combined artworks in a similar way to how they approached the visual artworks. The results revealed that, rather than exploring the spatial aspects of the musical features of the multimodal work, as they did in the audio condition, participants seemed to rely on elements they could find in the painting to explore the meaning and contents of the artwork, reflecting the human inclination to use visuals to embody sounds [27]. However, as shown by the prevalence of emotion and introspective reflection in the combined condition, participants also sought cues from the music to improve their comprehension of the artwork and its meaning [8], as well as to understand how the work related to them and their human experience.
A consistent difference was noted in responses between the visual and combined conditions, illustrating how the cross-stimulation of sensory modalities enhances understanding [27,32]. This was demonstrated clearly by the description of one of the paintings as "Sun over mountains" in the visual only condition, whereas in the combined condition another participant described the painting as "A rising sun with hope, time to find peace and think, smile". Music added both emotional colour and a sense of how the work related to the participant's self. The incorporation of music changed the way that participants interacted with the artwork beyond the initial phase of interpretation, adding deeper layers to their understanding [30].
Multimodal sensory experiences affect spectators' interactions with multimodal artworks because sight and sound are synaesthetically linked when combining the two sensory modalities [1,5,27]. This sensory interaction played out in the combined condition where participants were asked about their reactions to the multimodal nature of the artwork. As Casini explicates, "Synaesthesia thus becomes a mode to overcome the limitations of a specific artistic language and medium in order to absorb others, moving toward a total work of art capable of cross-stimulating our senses" [31, p. 5]. The combination of sight and sound greatly affected how long participants interacted with the artwork, maintaining interest and leading them to spend more time exploring the artwork because of the presence of the music [5,8,27]. There is a risk, however, that multimodal artworks may add complexity, distract or even confuse the spectator [27]. Further, the emotional responses elicited when combining visual and audio can be potentially overwhelming. Casini argues that this aspect of cultural synaesthesia remains unexplored as a way of accessing what is stored in the memory and unconscious [31].
Our study becomes particularly meaningful when considering the recent trend towards including other sensorial modalities in artistic interaction. In the field of interactive music, in particular, there are notable examples aimed at integrating further sensorial modalities in the artworks [43][44][45][46]. Understanding spectators' experiences and being able to anticipate their interpretation of a multimodal artwork becomes paramount for designers, Human-Computer Interaction, and Museum Studies researchers and practitioners [20]. Several scholars have indeed recently focused on understanding spectators' interpretation of interactive systems with a focus on the co-construction of interpretation by users and artists/designers [15,47,48,49]. In an ever-evolving cultural landscape where multimodal and interactive technologies are ubiquitously present in museums and art galleries, our study provides new knowledge to engage the spectators, enhancing their experiences, comprehension, and appreciation of multimodal artworks.

Conclusion
The experiences of spectators interacting with four abstract paintings in combination with acousmatic musical compositions were enhanced through the multimodal presentation of artworks. The combination of these sensory modalities increased spectators' attention to the artworks, with spectators spending more time viewing and interacting with them than with the visual artworks in isolation. The temporality of the artworks was also affected as spectators viewed and explored the works, creating narratives surrounding what they saw and heard. Finally, the combination of visual and audio expanded spectators' comprehension of the works, allowing them to see more detailed and personal meanings within the artwork, creating richer understandings of the multimodal works. Our study affirmed how cultural (or pseudo-) synaesthesia promoted sense-making, enhancing the engagement of the spectator and their comprehension of art. By understanding the synaesthetic mechanisms by which we interpret multimodal artworks, we can expand the possibilities for inter-sensory dialogue in the arts.
This study has explored how sight and sound interact in a two-dimensional digital setting. Future research is required to explore how these two sensory phenomena interact in a three-dimensional setting, such as a physical art gallery or virtual reality gallery, where spectator interaction occurs. Understanding and predicting how spectators interpret artworks and the cognitive and perceptual processes used, as well as exploring how the spectator's experience can be optimised, will guide future practitioners and creators as they endeavour to create multimodal artworks.

Fig. 1 The four artworks used in this study. In order from left to right: Traces by Meri Karako, Sun Scripture, Goddess Symbolism Submerged, and Oblique Translations of Imaginary Text by Sylvia Van Nooten