Emotional Congruence in Video Game Audio
In many regards, video game audio is more challenging than traditional linear soundtracking. A soundtrack can enhance the emotional impact of gameplay, but preserving immersion requires an understanding of the mechanisms at work when listeners respond emotionally to audio.
Video game soundtracking presents a number of unique challenges in contrast to traditional linear soundtracking (e.g., in television or film). Many solutions are in use to address the most common problem, namely creating the soundtrack dynamically in response to gameplay action, but these often approach the problem from the point of view of a particular technique, for example, procedural audio (Collins 2009). One of the major reasons to include a soundtrack is to enhance the player's emotional response, for example, to accentuate danger, success, failure, and other elements of gameplay (Berndt 2011). Depending on the type of game, there may be established musical grammars for conveying such things, and emotional congruence is therefore vitally important in maintaining player immersion (Arsenault 2005). Defining what we mean by “emotion” is important here, as perceptual science often uses a number of near-synonymous terms (including mood and affect). These are sometimes distinguished by the duration and specificity of the feeling; mood, for example, is longer lived (Williams et al. 2014). If a player is in danger, do they feel threatened, excited, afraid, or angry? Are any of these states quantifiable (is there a medium level of fear that might increase as the level of danger in the game increases?), and are they distinct from one another, or do they stand in a linear relationship (if fear increases, will excitement also increase?)? These questions are difficult to answer, but they are of vital importance to the game sound designer who wants to achieve emotionally congruent soundtracking. For our purposes, we will treat affect, mood, and emotion as interchangeable, and rather than referring to affective states or moods, we will refer simply to emotion or emotional responses.
How Is Emotion in Music Measured?
There are established methods for evaluating emotional responses in psychology and the cognitive sciences (Schubert 1999), and these have been adapted to evaluate emotional responses to music. A popular model is the two-dimensional (or circumplex) model of affect (Russell 1980), which plots positivity (valence) on the horizontal axis and activation strength (arousal) on the vertical axis. A player state with high valence and low arousal might thus be described as calm, peaceful, or simply happy. This approach has the advantage of quantifying the emotional response: each axis is a sliding scale, so a player can be assigned a particular emotional coordinate at a given point in the game's narrative. It also allows a fairly direct mapping from the player's state to a soundtrack that matches it. However, this type of model is problematic. Consider a state which is very negative and also very active (low valence and high arousal). How might a player in such a condition be described: afraid, or angry? Both are highly active, negative states, yet they are quite different emotional responses. Three-dimensional models of emotion have therefore been proposed, for example, adding dominance as a third axis, on which fear would sit at the passive (submissive) end and anger at the dominant end (Mehrabian 1996). This three-dimensional model and others like it have also been criticized when adapted for use with musical stimuli, and multidimensional, music-specific models have recently been used (Scherer 1995).
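The afraid/angry ambiguity described above can be made concrete with a small nearest-prototype lookup. This is an illustration only: the label coordinates below are invented for demonstration (they are not values published by Russell or Mehrabian), and matching a player state to its nearest labeled point is our own simplification rather than an established classification method. Using only valence and arousal, afraid and angry land on the same point; adding dominance as a third coordinate separates them.

```python
import math

# Hypothetical prototype coordinates on a -1..1 scale, as
# (valence, arousal, dominance). Illustrative assumptions only.
PROTOTYPES = {
    "calm":    ( 0.6, -0.6,  0.3),
    "excited": ( 0.7,  0.7,  0.4),
    "sad":     (-0.6, -0.5, -0.3),
    "afraid":  (-0.7,  0.7, -0.7),
    "angry":   (-0.7,  0.7,  0.7),
}


def nearest(point: tuple, dims: int) -> list[str]:
    """Return the label(s) closest to `point`, using only the first `dims`
    coordinates (2 = valence/arousal circumplex, 3 = adds dominance)."""
    def dist(label: str) -> float:
        return math.dist(point[:dims], PROTOTYPES[label][:dims])

    best = min(dist(label) for label in PROTOTYPES)
    # Several labels tie when the available dimensions cannot separate them.
    return sorted(label for label in PROTOTYPES if math.isclose(dist(label), best))


player_state = (-0.7, 0.7, -0.6)  # very negative, very active, low dominance
print(nearest(player_state, dims=2))  # ['afraid', 'angry']
print(nearest(player_state, dims=3))  # ['afraid']
```

In a game, the returned label (or the raw coordinate) could then index a library of cues; how that mapping is built is a design decision the models themselves do not settle.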
Challenges and Future Technology
One of the most important issues when considering emotional congruence in video game soundtracking is the distinction between the emotion that the music communicates and the emotion that the player actually feels (Gabrielsson 2002). Imagine that the player is at a point in the narrative which calls for generically happy-sounding music. The audio engine might select a tune resembling circus music, but this particular player has a phobia of the circus, and of clowns in particular. The music may then induce the opposite of the intended emotional state. Similarly, a growing body of evidence suggests that when sad music is played to a listener who is in a similar emotional state, the net effect can be a positive emotional response, owing to an emotional mirroring effect that elicits neurosympathetic responses (Molnar-Szakacs and Overy 2006). Some research suggests that music can be perceived as a sympathetic listener, making people in negative emotional states feel “listened to.” Giving generically sad music to the player at a particular point in the narrative might therefore also be inappropriate. Beyond this, almost everyone has slightly different musical tastes, including preferences for certain genres, performers, and even specific songs (Kreutz et al. 2008). These individual differences are very difficult for the game audio designer to account for, but the greatest challenge remains adapting to nonlinear narrative changes (changes under the control of the player or other game agents). Early solutions such as looping material can become repetitive and ultimately break player immersion. Branching strategies, in which different music cues are multiplexed at narrative breakpoints, can drastically increase the compositional complexity of the audio design (Lipscomb and Zehnder 2004).
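A deliberately simplified count shows why branching strategies grow so quickly. The model below is a hypothetical toy: it assumes that every breakpoint offers the same number of cues and that each branch needs its own newly composed segment, so no material is reused. Real scores share material between branches, so these figures describe only this idealized case.

```python
def branching_cost(choices: int, breakpoints: int) -> tuple[int, int]:
    """Toy model of a fully branching score: every breakpoint offers `choices`
    cues and no segment is shared between branches (both assumptions).
    Returns (distinct musical paths, segments that must be composed)."""
    paths = choices ** breakpoints
    # One new segment per node of the branching tree below the root:
    # choices + choices**2 + ... + choices**breakpoints
    segments = sum(choices ** k for k in range(1, breakpoints + 1))
    return paths, segments


for n in (1, 3, 5):
    print(n, branching_cost(3, n))
# 1 (3, 3)
# 3 (27, 39)
# 5 (243, 363)
```

Even with three cues per breakpoint, five breakpoints imply hundreds of segments in this model; reusing material (as looping does) reduces the count at the cost of variety.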
An alternative might be to apply procedural audio techniques, which have been used with varying degrees of success in game sound effect sequencing and design. For music tasks, however, the computational cost of procedural generation can be large, for example, when long sequences of audio must be pre-rendered for particular narrative sequences with various nonlinear properties, in response to gameplay and the intended emotional state. Such solutions can also require a great deal of compositional complexity (with only limited savings in this regard over branching strategies), but they have been used successfully in some instances. One example is LucasArts' iMuse system (Strank 2013), a dynamic music system which initially used MIDI files, transformed to varying degrees, to imply characters and emotional states. This system was used to accompany graphic adventure games (including the Indiana Jones series and, perhaps most famously, the Monkey Island series). iMuse implemented two now-commonplace solutions, horizontal re-sequencing and vertical re-orchestration, both of which were readily implementable because the soundtrack was represented structurally as MIDI orchestration rather than as a definitive (i.e., recorded and rendered) digital audio file.
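Horizontal re-sequencing and vertical re-orchestration can be sketched in a few lines. This is a hypothetical illustration, not a reconstruction of iMuse: the segment names, the transition table, the layer names, the 0 to 1 "danger" scale, and the fade width are all invented. Horizontal re-sequencing chooses which pre-composed segment plays next at a segment boundary; vertical re-orchestration keeps the current segment playing and changes which layers are audible. In a MIDI-based system the layers would correspond to instrument parts or channels rather than separate audio stems.

```python
# Hypothetical adaptive-music sketch; all names and numbers are invented.

# Horizontal re-sequencing: current segment -> {game state: next segment}.
TRANSITIONS = {
    "explore_a":    {"explore": "explore_b", "combat": "combat_intro"},
    "explore_b":    {"explore": "explore_a", "combat": "combat_intro"},
    "combat_intro": {"explore": "explore_a", "combat": "combat_loop"},
    "combat_loop":  {"explore": "explore_a", "combat": "combat_loop"},
}

# Vertical re-orchestration: danger level (0..1) at which each layer is at full gain.
LAYER_THRESHOLDS = {"pads": 0.0, "percussion": 0.3, "brass": 0.6, "choir": 0.85}
FADE_WIDTH = 0.15  # each layer ramps from silent to full over this span of danger


def resequence(start: str, states: list[str]) -> list[str]:
    """Return the segments played after `start`, choosing the next
    pre-composed segment at each boundary from the current game state."""
    playlist = []
    segment = start
    for state in states:
        segment = TRANSITIONS[segment][state]
        playlist.append(segment)
    return playlist


def reorchestrate(danger: float) -> dict[str, float]:
    """Return a gain (0.0 to 1.0) per layer for the segment already playing.
    A layer starts fading in FADE_WIDTH below its threshold and is at full
    gain from the threshold upward."""
    gains = {}
    for layer, threshold in LAYER_THRESHOLDS.items():
        ramp = (danger - (threshold - FADE_WIDTH)) / FADE_WIDTH
        gains[layer] = round(min(1.0, max(0.0, ramp)), 2)
    return gains


print(resequence("explore_a", ["explore", "combat", "combat", "explore"]))
# ['explore_b', 'combat_intro', 'combat_loop', 'explore_a']
print(reorchestrate(0.5))
# {'pads': 1.0, 'percussion': 1.0, 'brass': 0.33, 'choir': 0.0}
```

The two techniques combine naturally: the transition table decides what plays next, while the layer gains track moment-to-moment intensity within a segment.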
In the future, we might see an optimized solution that combines machine learning approaches to composition with an individual's own selection of music, or that uses biophysiological measures of emotion to manipulate a soundtrack so as to maximize the intended, induced emotional response in each individual gamer. Such solutions sound far-fetched at the time of writing, but given the spread of wearable biosensing technology and the ever-decreasing cost of more complex associated technology (facial recognition, electroencephalography, galvanic skin response), they may well become commercially viable in game audio soon.
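The biofeedback idea can be sketched as a simple closed loop using a proportional controller. Everything here is hypothetical: the normalized 0 to 1 arousal value stands in for whatever a biosensing pipeline (for example, one based on galvanic skin response) might produce, the controller gain is arbitrary, and `simulated_player` is a toy model of a listener whose arousal drifts toward the music's intensity, included only so the loop can run without hardware.

```python
# Hypothetical closed-loop sketch: names, ranges and gains are assumptions.

def update_intensity(intensity: float, target_arousal: float,
                     measured_arousal: float, gain: float = 0.5) -> float:
    """Proportional control step: raise the soundtrack's intensity (0..1)
    when measured arousal is below the intended level, lower it when above."""
    error = target_arousal - measured_arousal
    return min(1.0, max(0.0, intensity + gain * error))


def simulated_player(arousal: float, intensity: float,
                     responsiveness: float = 0.4) -> float:
    """Toy stand-in for a normalized biosensor reading: the player's arousal
    moves part of the way toward the music's current intensity."""
    return arousal + responsiveness * (intensity - arousal)


intensity, arousal = 0.2, 0.1  # starting soundtrack intensity and player arousal
target = 0.7                   # arousal the designer intends at this narrative point
for _ in range(30):
    arousal = simulated_player(arousal, intensity)
    intensity = update_intensity(intensity, target, arousal)

print(f"arousal={arousal:.2f}, intensity={intensity:.2f}")
```

A proportional controller is only one possible design; a real system would also have to cope with noisy, delayed sensor data and with the gap between perceived and felt emotion.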
- Arsenault, D.: Dark waters: spotlight on immersion. Game-On North America 2005 Conference Proceedings, pp. 50–52 (2005)
- Berndt, A.: Diegetic music: new interactive experiences. Game Sound Technology and Player Interaction Concepts and Development, pp. 60–76 (2011)
- Gabrielsson, A.: Emotion perceived and emotion felt: same or different? Music. Sci. 5(1 Suppl), 123–147 (2002)
- Strank, W.: The legacy of iMuse: interactive video game music in the 1990s. Music Game, pp. 81–91 (2013)