Using the Startle Eye-Blink to Measure Affect in Players
The startle eye-blink is part of a non-voluntary response that typically occurs when an individual encounters a sudden and unexpected stimulus, such as a loud noise or increase in light. Modulations of the startle reflex can be used to infer affective processing in players. The response can be elicited using simple auditory, visual, electric, or mechanical stimuli. The magnitude of the startle eye-blink is used to infer the unconscious positive (pleasant) or negative (unpleasant) emotional state of the player. It is frequently used in psychology where variations in the magnitude, latency, and duration of the startle response are used to understand attention, workload, affective processing, and psychopathologies such as schizophrenia. By comparison, there has been limited use of this objective measure for studying games. As such, there are opportunities to adapt this measure to studies of player affect in the context of game design. We provide a review of the concepts of “affect” and “affective computing” as they relate to game design and also explain in detail the use of the startle eye-blink for objectively measuring player affect. Finally, the use of the approach is illustrated in a case study for evaluating a serious game design.
Keywords: Affective processing; Emotion; Startle reflex; Startle eye-blink
The startle response is a non-invasive measure of central nervous activity that typically occurs when an individual encounters a sudden, surprising change in their environment (Blumenthal et al., 2005). It is a reflex reaction that occurs without voluntary control and is characterised by protective body reactions, such as eye-blinks and stiffening of neck muscles. In evolutionary terms, it likely provides a defensive response to threatening stimuli and is associated with the fight or flight response. Elements of the startle response, such as the eye-blink, are modulated with an individual’s emotional state; eye-blinks being larger when individuals are highly aroused and in unpleasant emotional states compared to blinks associated with low arousal and positive affect (Witvliet & Vrana, 1995).
The eye-blink component of the startle response is a reflex transmitted by the facial nerve, which controls a number of facial muscles including those responsible for eyelid closure. Historically, the eye-blink reflex has been observed in studies since 1874 (Dawson, Schell, & Böhmelt, 1999). Variations in the amplitude, duration, onset (latency), and probability of the responses have been used to study a variety of psychological phenomena including attention (Filion, Dawson, & Schell, 1993), workload (Neumann, 2002), affective processing (Witvliet & Vrana, 1995), and psychopathologies such as schizophrenia (Swerdlow, Weber, Qu, Light, & Braff, 2008). Apart from variations in the amplitude of the response, another common protocol involves determining how the startle response is inhibited when an additional stimulus, often referred to as a prepulse, is presented just prior to the startle stimulus (Swerdlow et al., 2008).
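As an illustration of the prepulse protocol, the following minimal sketch computes percent prepulse inhibition from blink magnitudes. The formula is the convention commonly used in the prepulse inhibition literature rather than one stated in this chapter, and the function name, argument names, and units are assumptions made for the example.

```python
def percent_ppi(pulse_alone, prepulse_pulse):
    """Percent prepulse inhibition from two lists of blink magnitudes.

    pulse_alone: magnitudes on trials with the startle stimulus only.
    prepulse_pulse: magnitudes on trials where a prepulse preceded it.
    Both lists are assumed to be in the same (arbitrary) units.
    """
    if not pulse_alone or not prepulse_pulse:
        raise ValueError("both trial types need at least one magnitude")
    mean_alone = sum(pulse_alone) / len(pulse_alone)
    mean_prepulse = sum(prepulse_pulse) / len(prepulse_pulse)
    if mean_alone <= 0:
        raise ValueError("mean pulse-alone magnitude must be positive")
    # 0 means no inhibition; 100 means the blink was fully suppressed.
    return 100.0 * (mean_alone - mean_prepulse) / mean_alone
```

A value near zero would suggest that the prepulse had little effect, while larger values indicate stronger inhibition of the blink.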
In terms of emotion, it is usually changes in the magnitude of the eye-blink that occur with negative or positive affect that are of interest (Witvliet & Vrana, 1995). Typically, affects are described in a two-dimensional model using arousal and valence (Lang, 1995). Arousal might range from sleepy and relaxed to excited or agitated. Valence, on the other hand, describes the pleasant or unpleasant aspect of an affect. For example, negative valence is generated under conditions that invoke fear or anger and are associated with stronger eye-blinks than those related to positive valence, such as those measured in happy or contented states. Negative and positive valence can be measured with the startle eye-blink and may be combined with other physiological measures of arousal, such as heart rate or skin conductance to classify more distinct emotional states within a two-dimensional model of affect (Witvliet & Vrana, 1995).
Importantly for serious game designers, both positive and negative valence have been strongly associated with positive and negative learning effects (Sabourin & Lester, 2014). For example, positive emotional states, such as engagement, joy, and happiness, lead to increased learning (Bless et al., 1996; Kanfer & Ackerman, 1989; Pekrun, Goetz, Titz, & Perry, 2002; Raghunathan & Trope, 2002). By contrast, negative experiences, such as frustration, anger, and boredom, lead to decreased effort, reduced motivation, and disengagement from learning activities (Meyer & Turner, 2006; Pekrun et al., 2002; Ramirez & Dockweiler, 1987; Sabourin, Rowe, Mott, & Lester, 2011). The startle eye-blink suggests itself as a measure that can be used in serious game design to evaluate the affect generated by gameplay. In simple terms, a positive affect should lead to better learning outcomes in serious games.
Although we believe the startle response holds much promise as a tool to support more objective evaluation of game design, much more work still needs to be done to apply this measure and understand its limitations. Therefore, at this stage, the use of the startle reflex measure for evaluating game design in terms of player affect needs to be approached carefully and tested in more studies. In this mood of cautious optimism, this chapter introduces information about how to use the startle response measure and summarises existing technical guidelines related to collecting, analysing and reporting results with the measure. To complement this review, we present in detail a case study in which we used the response to support the design of a serious game intended to assist in psychological counselling.
Understanding how player affect can be manipulated could directly affect the success of many serious games. Fortunately, the serious game community is not alone in considering the role of emotion in usability criteria such as effectiveness. Understanding, detecting, and responding to emotions and affective user responses are issues at the forefront of the design of many modern computer systems. The cross-discipline field of study that interprets and simulates human emotions in terms of system design is known as “Affective Computing”. Thus, we begin the chapter with a discussion of the concept of emotion, the importance of affect in interface design, and common approaches to performing affect detection.
2 Affective Computing
Affective computing concerns the practical development of computer systems that are able to detect and respond to human moods and emotions (Calvo & D’Mello, 2010). These systems might recognize the emotions of humans, respond by expressing an emotion in a way that a human can understand and, most ambitiously, even be able to “feel” in the way humans do (Picard, 1997).
Computer games likewise often have a design goal that includes manipulating human affect. For example, it may be desirable to produce an engaging game, dominated by positive affect that better supports learning or cognitive therapy. In a first-person horror game like Slender: The Eight Pages (Hadley, 2012), the intention may be to produce a negative affect such as fear, if that is the experience desired by the player and the intention of the designer (Coppins, 2014). Thus, a good question for any game designer is “What aspects of games make them enjoyable, addictive or engaging, and how do games, or their interactivity, elicit emotional involvement from players?” This area of enquiry involves an understanding of human emotions and emotional responses.
Typically, more subjective approaches are taken to assess player responses to design choices, yet the startle eye-blink provides a more objective possibility for evaluating these design choices and perhaps even adapting gameplay based on a player’s recognised emotional state. As well as dynamic difficulty balancing, that is, balancing gameplay difficulty with player ability, we might ideally provide dynamic mood balancing, whereby players’ emotional states are balanced with game mechanics.
Unfortunately, the goal of recognising emotional state is extremely challenging. The concept of emotion is difficult to define, let alone measure, and even common moods, feelings, and attitudes vary significantly between individuals, both in how they are experienced and how they are expressed (Calvo & D’Mello, 2010). Indeed, the mechanisms of emotion are still not agreed on. In the next section we review some alternative theories of emotion and consider how they relate to the goal of detecting human affective states in computer games.
2.1 Emotional Theories
Emotions are biologically based action dispositions, theorised to be systematic responses that occur when a highly motivated action is delayed (Lang, Greenwald, Bradley, & Hamm, 1993). It has also been proposed that an emotion is the result of a novel circumstance preventing the completion of behaviour (Hebb, 1949). Essentially, emotions can be considered an involuntary response with primitive origins.
The behaviour of very primitive organisms can be categorised into two distinct categories: a direct approach to appetitive stimuli, and a withdrawal from noxious stimuli (Schneirla, 1959). It is theorised that humans follow the same two directives of behaviour, but elaborate acts, delays, and inhibitions have evolved to facilitate more complex, goal-directed paths to achieve withdrawal or approach (Lang, 1995). Thus, while involuntary and principally biphasic (pleasant or unpleasant), the expression of an emotion is mediated by higher level, goal-directed behaviours. This greatly complicates the measurement and even definition of emotion.
Early, more traditional emotion theories tend to focus on emotion as either a means of expression, a form of embodiment, a type of cognitive appraisal, or a social construct (Calvo & D’Mello, 2010). Not surprisingly, it was Darwin who first considered the evolutionary role of emotion in terms of behaviour (Darwin, 2002). Notably, emotions such as interest, joy, surprise, sadness, anger, disgust, contempt, fear, and shyness are considered to be universally recognised (Izard, 1994). As such, detection of these emotional states frequently underpins facial expression and body recognition systems that try to detect emotions.
In contrast, other traditional emotion theories would argue that emotion is more than just a form of expression, being also accompanied by a distinctly embodied physiological state (James, 1884). Assuming a typical physiological response to standard emotions like joy, anger, and fear implies that common patterns of physiological changes could be used to detect common emotional states. Indeed this assumption underlies the use of many objective systems based on detecting emotions using physiological measures.
While emotional expression is incredibly varied and complex, most theorists endorse an approach to emotion that features three components; “subjective feeling”, “expressive behavior”, and “physiological arousal” (Scherer, 1993). Additionally, some add “motivational state”, “action tendency”, and/or “cognitive processing” (Scherer, 1993). While these multiple emotional components are noted, simpler models to capture the motivational basis of emotion have evolved.
Physiological models usually consider the motivational basis of emotion using a very simple, two-factor model featuring affective valence and arousal (Lang, 1995). This dimensional theory of emotion holds that all emotions can be located on a two-dimensional space, as a function of valence and arousal (Ravaja, Saari, Salminen, Laarni, & Kallinen, 2006). In this two-dimensional model, valence represents a user’s emotional reaction to a stimulus, reflecting the degree to which it is a pleasant or unpleasant experience (positive and negative valence, respectively). Arousal indicates the level of activation associated with the experience, from very excited and energised, to sleepy, calm, and/or disinterested (Ravaja et al., 2006). This frequently used model typically uses the startle eye-blink to measure valence and other physiological indicators, such as heart rate or skin conductance to determine arousal.
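The two-factor model can be sketched as a simple quadrant lookup. This is only an illustration under stated assumptions: the startle magnitude and the arousal indicator are assumed to be already standardised so that zero is a participant's typical level, the sign flip reflects the chapter's statement that larger blinks accompany more negative valence, and the function names and quadrant labels are hypothetical.

```python
def valence_from_startle(z_magnitude):
    """Map a standardised blink magnitude to a valence score.

    Larger blinks are associated with more negative valence, so the
    sign is flipped. The scaling is an assumption for illustration.
    """
    return -z_magnitude


def affect_quadrant(valence, arousal):
    """Place a (valence, arousal) pair in one of four quadrants.

    Both inputs are assumed to be centred on zero; the labels are
    illustrative and not taken from the chapter.
    """
    valence_label = "pleasant" if valence >= 0 else "unpleasant"
    arousal_label = "high-arousal" if arousal >= 0 else "low-arousal"
    return f"{valence_label}, {arousal_label}"
```

For example, a larger-than-usual blink combined with raised skin conductance would fall in the unpleasant, high-arousal quadrant, which is where states such as fear or anger are usually located.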
Although valence and arousal provide the simplest and most commonly used model in affective computing, it has been argued that four dimensions are needed to satisfactorily represent similarities and differences in emotional experience (Fontaine, Scherer, Roesch & Ellsworth, 2007). These four dimensions are: valence, arousal, potency-control, and unpredictability (Fontaine et al., 2007). These four were identified based on the applicability of 144 features, representing six major components of emotion; appraisal of events, psychophysiological changes, motor expressions, action tendencies, subjective experiences, and emotion regulation (Fontaine et al., 2007).
A further, more cognitive-based approach considers emotions as something experienced in relation to the unconscious appraisal of an object or event (Scherer, Schorr, & Johnstone, 2001). This appraisal process may take into account a person’s experience, their goals, and their ability to take action (Dalgleish, Dunn, & Mobbs, 2009). Cognitive approaches to understanding emotions have generally provided the basis of computational models of emotion used in agent-based systems (Reisenzein et al., 2013).
Considering the role that social interaction plays in the world of emotions means that the context of culture (Salovey, 2003) and society (Kemper, 1991) also impact the understanding of emotions. Calvo and D’Mello (2010) point out that this social construct view of emotions is somewhat underrepresented in the study of affective interface design.
More recently, the underlying neural circuitry of emotions has also come under study by neuroscientists, highlighting the complex overlap of emotion and cognition (Dalgleish et al., 2009), where emotion continually interacts with cognitive processes such as remembering, reasoning, goal setting, and planning. This work in neuroscience highlights that some emotional phenomena may act below our normal level of consciousness and that emotions are states that emerge from a complex system of underlying affective processes (Coan, 2010).
A recent, alternative model based on neuroscience emphasises a clear distinction between affective processing and emotion such that affective processing generates emotions (Walla & Panksepp, 2013). This model suggests that affective processing forms the neurophysiological basis for emotions, which are behavioural output and thus not a direct measure of processing itself. For example, behaviours such as facial expressions produced by facial muscle contractions are indicative of an emotion generated by underlying affective processing. Thus, if neural activity within affective processing circuits codes for unpleasant, the generated facial expression is negative. If, on the other hand, the neural activity codes for pleasant, the respective facial expression is positive. One consequence of this model is that affective processing can take place without necessarily generating an emotion in an individual. Another consequence is that a measurement approach such as the startle eye-blink records affective processing as distinct from an emotion that may be experienced and reported by a player.
While this chapter will not consider the various emotional theories in more detail, there are a number of good reviews related to affective computing that are available. These include reviews of detection approaches (Pantic & Rothkrantz, 2003; Sebe, Cohen, & Huang, 2005; Zeng, Pantic, Roisman, & Huang, 2009) and the various emotional theories that underpin this work (Barrett, Mesquita, Ochsner, & Gross, 2007; Dalgleish et al., 2009; Russell, 2003). In the next section we consider how emotions are currently detected for applications of affective computing.
2.2 Detecting Emotion
Computer systems designed to detect and respond to human emotional states must trade off a number of criteria, including reliability, speed, cost, intrusiveness, and validity (Calvo & D’Mello, 2010). As such, a number of different approaches have been tried that focus on replicating human abilities for interpreting facial expressions, speech, body language, or a combination of these signals.
Detecting emotions from facial expressions assumes that standard expressions (Ekman, 1992) are automatically triggered in response to an affective state being experienced. The Facial Action Coding System (Ekman & Friesen, 1978) was developed to standardise the recognition of the common emotions of joy, sadness, surprise, fear, disgust, and anger. These facial expressions are broken down to smaller units of facial motion that can be identified by trained human observers. While this manual decoding process is expensive, there are ongoing efforts to automate this process using a range of algorithmic classifiers such as Bayesian networks (Gunes & Piccardi, 2007), discriminant analysis (McDaniel et al., 2007), and support vector machines (Bartlett et al., 2006). This approach has been used for educational support (McDaniel et al., 2007) both alone and also in combination with other types of physiological sensors (Arroyo et al., 2009). However, while automated techniques continue to improve, they are generally not yet as effective as manually decoded approaches as most fail to operate in real time or take into account the context in which the interactions are occurring (Zeng et al., 2009).
Another promising approach that relies on the innate expression of emotion is to detect changes in body posture or movement that reflect underlying emotional states (Calvo & D’Mello, 2010). Unlike facial expressions, body movement is usually less prone to conscious control and disguise, and so may provide a more reliable channel of information (Ekman & Friesen, 1969a, 1969b). Posture analysis has previously been used to classify interest levels in children during 20 min of serious gameplay (Mota & Picard, 2003). In this experiment nine posture positions (sitting on the edge, leaning forward, leaning forward right, leaning forward left, sitting upright, leaning back, leaning back right, leaning back left, and slumping back) were used to classify three levels of interest (low, medium, high) and the further states of taking a break and being bored. A similar posture detection system, based on measuring the distribution of body pressure in a chair, was used to distinguish boredom, confusion, delight, flow, and frustration from a neutral state while college students used an intelligent tutoring system designed to teach Newtonian mechanics (D’Mello & Graesser, 2009).
The rhythm, stress, and intonation of speech, along with other vocalizations such as sighs and laughter, have been used extensively to try to detect emotional states (Juslin & Scherer, 2005; Russell, Bachorowski, & Fernandez-Dols, 2003; Zeng et al., 2009). These systems tend to focus only on detecting basic emotions, but they do have the advantage of being nonintrusive, low-cost, fast, and suitable for working with spontaneous real-world speech (Calvo & D’Mello, 2010). Semantic emotional cues can also be extracted from text or speech content using associations between words and affective dimensions such as good or bad, active or passive, and strong or weak (Osgood, May, & Miron, 1975). Furthermore, analysis of word counts and structured sets of words such as Wordnet (Strapparava & Valitutti, 2004) and ANEW (Bradley & Lang, 1999) allows for automatic semantic analysis of text to detect affective states. This approach has been extended to sentiment and opinion analysis, in which the views of larger populations are sorted into emotional categories such as good/bad or angry/sad (Pang & Lee, 2008).
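The word-based approach can be illustrated with a minimal lexicon lookup. This sketch makes several assumptions: the tiny lexicon and its scores are invented for the example and are not taken from ANEW or Wordnet, the function name is hypothetical, and real systems handle negation, word forms, and context that are ignored here.

```python
import re

# Toy lexicon with invented valence scores in [-1, 1]; these values are
# placeholders for illustration, not entries from any published word list.
TOY_VALENCE = {"happy": 1.0, "win": 0.5, "bored": -0.5, "angry": -1.0}


def mean_valence(text, lexicon=TOY_VALENCE):
    """Average the valence of the words in `text` found in the lexicon.

    Returns None when no word in the text appears in the lexicon.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    return sum(scores) / len(scores) if scores else None
```

Applied to player chat or post-session comments, a score of this kind gives only a coarse indication of valence and would need validating against other measures.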
Many non-invasive techniques based on measuring physiological signals or brain activity monitoring and brain imaging have been developed in fields such as psychophysiology and neuroscience (Calvo & D’Mello, 2010). Assuming physiological state and brain activity are appropriate measures of affect, all of these approaches suggest promise in terms of providing objective measures of a user’s emotional state. Typical measures include skin conductance (GSR), brain activity (EEG, MRI), heart activity (ECG), and muscle activity (EMG). The specificity of particular patterns of physiology for detecting specific emotions using such measures of the autonomic nervous system (Ekman, Levenson, & Friesen, 1983) needs to be balanced against significant variations that are known to occur between individuals (Andreassi, 2007).
A number of physiology-based systems have been used to categorise different emotions (Alzoubi, Calvo, & Stevens, 2009; Calvo, Brown, & Scheding, 2009; Nasoz, Alvarez, Lisetti, & Finkelstein, 2004; Picard, Vyzas, & Healey, 2001; Vyzas & Picard, 1998). However, the two key dimensions that can be distinguished using physiology are arousal and valence. High levels of arousal are categorised with faster heart rate and other physiology changes that are activated for human actions such as fright, flight, and fight. Valence, by contrast, refers to either positive or negative association of affect, for example, happy and sad feelings. Modulations in the startle reflex are typically used to measure valence.
In terms of game design, we might expect that people actively seek out and purchase games that deliver positive emotional experiences and enjoyment. However, this needs to be considered in light of the player’s intent, as an enjoyable game may be one that intentionally elicits negative emotions. This is due to the possible enriching effect of negative emotions embedded within positive experiences and products (Fokkinga, Desmet, & Hoonhout, 2010). It is therefore possible that games featuring what are putatively negative actions may prompt positive responses (Ravaja et al., 2006). This may be due to the threats within the game appearing as a challenge to the player rather than a real threat, or that the player finds surviving in an environment perceived to be dangerous as rewarding (Ravaja et al., 2006).
Regardless of the positive or negative emotional reaction, computer games, like other affective interfaces, can act as a stimulus for affective processing, which results in associated feelings or emotions in the user. When a serious game designer can nominate desirable affective states that relate to the serious intention of the game, it is feasible to use affective computing tools to evaluate the design. It is, however, important when evaluating a game design that the intention of an event or game scenario has clearly defined expectations around what emotion the designer is trying to elicit in the player.
The time or game state at which player affect is measured is also critical. Affect, or affective processing, is bound in time to the experience of the game world and the resulting emotional effect that this has (Barrett et al., 2007). This suggests other game analytics should be used in conjunction with affective measures so that player affect is carefully correlated with the game state.
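One way to relate affective measures to game state is to label each startle probe with the state that was active when the probe was delivered. The sketch below assumes a hypothetical log format: game states recorded as (start time, state name) pairs sorted by time, and probes recorded as (time, blink magnitude) pairs on the same clock. The function names are likewise illustrative.

```python
from bisect import bisect_right


def state_at(probe_time, state_log):
    """Return the game state active at `probe_time`.

    state_log: list of (start_time, state_name) sorted by start_time.
    Returns None if the probe precedes the first logged state.
    """
    starts = [start for start, _ in state_log]
    index = bisect_right(starts, probe_time) - 1
    return state_log[index][1] if index >= 0 else None


def label_probes(probes, state_log):
    """Attach the active game state to each (time, magnitude) probe."""
    return [(time, magnitude, state_at(time, state_log))
            for time, magnitude in probes]
```

With probes labelled in this way, blink magnitudes can be grouped by game state before any comparison is made, which keeps the affective measure tied to the moment of play it refers to.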
In summary, for interface design, the term affective processing is perhaps a preferable construct to emotion, as the latter is more prone to confusing and arbitrary definitions (Scherer, 2005). Additionally, affect is subconscious and is a more reliable indicator of a person’s core emotional state than self-reported emotion (Filion, Dawson, & Schell, 1998). While subjective ratings from players provide useful information about their perceived emotion, the subconscious nature of affect offers further opportunities for measurement through the collection of physiological data. The startle eye-blink is one such physiological measure that can help to determine the participant’s affective processing, and in particular, measure the valence of a player’s reaction to a startling event. In the next section, we describe previous uses of this measure in game research and provide detailed guidelines for eliciting, recording, and analysing this measure.
3 The Startle Reflex
The concept of a reflex is well known. It is an automatic direct motor response to a stimulus above a certain threshold. Perhaps the most well-known reflex is the knee-jerk (patellar reflex), but there are various other such automatic motor responses, one of which is the so-called eye-blink reflex. When one is startled by, for instance, a gunshot, a bright flash, or a sudden explosion, an involuntary eye-blink is elicited. Although an eye-blink occurring as a startle reflex is an automatic response, its magnitude varies as a function of affective state (Filion et al., 1998). The more positive the current state of affect, the smaller the eye-blink magnitude. The more negative the current affective state, the larger the eye-blink magnitude. This simple relationship is what makes the startle eye-blink a promising measure of affective processing related to any given stimulus, situation, or game being played. In what follows, we provide more detailed background information, including example studies, which demonstrate the potential of this measure for the serious game community.
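Because blink magnitudes differ considerably between individuals, a comparison across conditions is often made on values standardised within each participant. The following sketch assumes trials recorded as (participant, condition, magnitude) tuples and uses z-scores; the data layout, function name, and the choice of z-scores are assumptions for illustration rather than the procedure used in the studies cited here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_z_by_condition(trials):
    """Average within-participant z-scored blink magnitude per condition.

    trials: list of (participant, condition, magnitude) tuples.
    Participants whose magnitudes do not vary are skipped, because
    their z-scores are undefined.
    """
    by_participant = defaultdict(list)
    for participant, _, magnitude in trials:
        by_participant[participant].append(magnitude)
    stats = {p: (mean(ms), pstdev(ms)) for p, ms in by_participant.items()}

    by_condition = defaultdict(list)
    for participant, condition, magnitude in trials:
        mu, sd = stats[participant]
        if sd == 0:
            continue
        by_condition[condition].append((magnitude - mu) / sd)
    return {c: mean(zs) for c, zs in by_condition.items()}
```

Under the relationship described above, a condition with a higher average z-score would be read as eliciting more negative affective processing than one with a lower score.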
3.1 Previous Uses of the Startle Reflex
One of the more interesting applications of the startle reflex has been in the study of people’s responses to commercial products for marketing purposes (Walla, Brenner, & Koller, 2011). This study found significantly reduced eye-blink amplitudes related to “liked” brand names compared to “disliked” brand names. In another marketing study, the startle reflex was used to measure significant differences in preference for bottle shape (Grahl, Greiner, & Walla, 2012). Likewise, the amplitude of the startle response was shown to be stronger when individuals experienced unpleasant versus pleasant odors (Kaviani, Wilson, Checkley, Kumari, & Gray, 1998). Measures of the eye-blink amplitude have also been used to distinguish different affective responses associated with eating different foods (Walla, Richter, Färber, Leodolter, & Bauer, 2010). Compared to eating yoghurt and chocolate, eating ice cream results in the lowest startle responses, or the most positive affect.
In terms of multimedia, a traditional use of the startle reflex in psychology involves grading pleasant versus unpleasant images (Allen, Trinder, & Brennan, 1999; Vrana, Spence, & Lang, 1988). This work typically relates startle results with standardised image libraries such as the International Affective Picture System (Bradley & Lang, 2007) and the Geneva affective picture database (Dan-Glauser & Scherer, 2011). These standard databases are well-correlated with both valence and arousal and can form a useful baseline to study the variations in startle response between individuals.
The startle reflex has also been used to study responses to other media, with the amplitude of the responses shown to be stronger when listening to unpleasant versus pleasant music (Roy, Mailhot, Gosselin, Paquette, & Peretz, 2009). A similar result has been found in emotionally toned film clips (Kaviani, Gray, Checkley, Kumari, & Wilson, 1999), and the response has been used to measure the viewer’s emotional response to television content (Bradley, 2007).
Virtual realities have much in common with computer games and they have also been used in conjunction with the startle response. For example, a study comparing real-world effects with virtual environments used the startle response to determine that participants actively driving through virtual tunnels experienced more negative feelings while in the darker parts of the virtual tunnel (Muehlberger, Wieser, & Pauli, 2008). In a further example, the startle response was used in conjunction with Google Street View to objectively assess affective processing associated with different urban environments (Geiser & Walla, 2011). In this study participants had to virtually walk through six districts of Paris with different median real estate prices. The eye-blink magnitudes of participants were recorded during these walkthroughs. Real estate price was strongly correlated with explicit pleasantness ratings, and the startle measures confirmed affective differences between the most expensive and cheapest districts (Geiser & Walla, 2011). In a further study, a virtual environment viewed from the perspective of the driver of a Humvee was used to examine variations in eye-blink responses in both low-threat and high-threat zones, under immersive and non-immersive conditions, while driving through a virtual Iraqi city (Parsons, Rizzo, Courtney, & Dawson, 2012). The participant’s eye-blink amplitudes increased in the high-threat zone under the high immersion conditions.
Much of the prior research using the startle response in relation to video games examines the tendency for video games to encourage violent behaviour (Wood, Griffiths, Chappell, & Davies, 2004). For example, a recent doctoral dissertation examined the effect of violent video gameplay on modulation of the startle reflex (Elmore, 2012). The study found that participants who played violent video games before being shown unpleasant images elicited lower eye-blink responses (Elmore, 2012). The results were used to support the idea that violent video games desensitize players to violence.
Another related example is the investigation of the effects of violent video games using psychophysical measures such as facial electromyography, skin conductance level, and heart rate (Ravaja, Turpeinen, Saari, Puttonen, & Keltikangas-Järvinen, 2008). In this experiment participants’ real-time emotional responses to playing violent video games were recorded. The study found that all violence within the game either perpetrated by their character or on their character resulted in an increase in arousal. However, violence perpetrated against the player’s character was associated with negative emotion measures, while violence perpetrated by the player’s character was associated with positive emotional measures (Ravaja et al., 2008).
While the use of the startle reflex for studying affect in relation to game design elements has previously been proposed (Lang, 1995; Nacke, 2009; Ravaja & Kivikangas, 2008; Sasse, 2008), few definitive results seem to be reported. In essence, most previously reported research efforts using the startle reflex have attempted to answer the fundamental question of whether video games are “good” or “bad” in terms of influencing future behaviour. While video game designers appear to follow trending design decisions and internal rules, little evidence exists to suggest that developers are using psychophysiological measures such as the startle reflex to make their games more appealing (Wood et al., 2004) or in the case of serious games, more useful.
There are a few exceptions where studies have reported results using the startle reflex to study game design elements. For example, in one study researchers used various physiological indicators such as heart rate, skin conductance, and the startle reflex to gauge the immersion of participants while playing a bespoke level of Half-Life 2 (Grimshaw, Lindley, & Nacke, 2008). The information gathered was associated with the participant’s sense of immersion, with a higher magnitude response indicating that the player was more engaged with the game at the moment of startle pulse (Grimshaw et al., 2008). In a more recent project, the startle response was used to gauge the immersion related to sound on and off conditions in a commercial horror game (Coppins, 2014). As such games are designed to create a sense of fear, it is in theory reasonable to use the startle reflex measure to evaluate how well a negative valence associated with the emotion of fear is generated. Although no significant differences were found in the startle amplitude with the two sound conditions, a significant variation was detected when participants actively played the game as opposed to the situation where they simply watched a replay of the game.
One possible reason for the still limited use of the startle response in the game industry, and in particular in the development of games, is that designs tend to be subjectively evaluated. Arguably, this is also the case in the film industry where the manipulation of emotion in viewers is a well-honed skill. Despite this, there are a number of studies that illustrate why a subconscious measure like the startle reflex may be of use in quantifying players’ responses in serious games.
Principally, affect is subconscious, and thus the startle reflex is a more reliable indicator of a person’s core emotional state than self-reported emotion (Filion et al., 1998). For example, in a study investigating the modulation of the startle reflex in depressed versus healthy populations (Allen et al., 1999), it was found that while the self-reported pleasantness measure related to picture presentations was largely similar, the startle reflex data showed clear differences between depressed and non-depressed participants (Allen et al., 1999). The depressed group did not show the typical finding that pleasant images elicit a significantly reduced startle reflex compared to unpleasant images, which indicates that internally, depressed people responded rather negatively to positive image presentations. Such a discrepancy demonstrates how misleading self-reported data can be, especially when related to affective content.
In another study, psychopaths demonstrated normal self-reported responses to emotional images, whereas they did not show typical startle response enhancement as a consequence of unpleasant image presentation (Patrick, Bradley, & Lang, 1993). Once again, this clinical investigation suggests that the startle reflex may provide important information about the inner state of affect of a person that may be more reliable than any explicit response. In some more industry-related studies, similar discrepancies between explicit and implicit measures of affective processing have also been found (Geiser & Walla, 2011; Grahl et al., 2012). Thus, the startle reflex measure may tell us more about the actual state of affect of a person than the person is able to report themselves. While we do not discount the importance of subjective feedback in game design, we do believe an objective measure like the startle response suggests itself as a useful adjunct that can be used in the analysis of game designs.
3.2 Using the Startle Reflex
EMG measures of eye-blink can also be prone to some noise as the changes to surface potentials are small and external electromagnetic interference is common in most environments. Where precise measures are required, magnetic search coils can be placed on the skin to detect subtle changes in magnetic field associated with electrical activity in muscles (Evinger & Manning, 1993). A disadvantage of this approach is the requirement for even larger, more intrusive sensors than required for EMG.
The measurements used in startle eye-blink studies are normally taken from the orbicularis oculi muscle, a muscle that causes a blink (among other functions). An eye-blink reflex is transmitted by the facial nerve. However, the facial nerve also innervates other key facial muscles that are sometimes studied as part of affect research such as the zygomatic and corrugator supercilii muscles. The zygomatic major and minor muscles are associated with facial expressions involving the lips such as smiling, while the corrugator supercilii muscle, sometimes called the frowning muscle, is associated with wrinkling of the forehead. Positive affect has been shown to increase activity in the zygomatic muscles, while negative emotions cause an increase in activity of the corrugator supercilii (Dimberg, Thunberg, & Elmehed, 2000).
It was in the late 1980s, after many pioneering investigations in rodents, that it was found that humans demonstrate a modulated startle reflex as a function of degree of pleasantness (Vrana et al., 1988). Since then, the magnitude of an eye-blink as a response to loud and short acoustic white noise, containing a broad spectrum of frequencies for about 50 ms at a sound level of 105 dB and with a rapid onset, has been used to study affective valence (Mavratzakis, Molloy, & Walla, 2013; Walla et al., 2011). Guidelines on the use of human startle eye-blink EMG studies provide clear direction and consensus on the appropriate use of the technique (Blumenthal et al., 2005). The process of measuring and further detail about analysing the startle eye-blink is available elsewhere (Blumenthal et al., 2005). However, for convenience, the key steps of that process, preparation, eliciting, processing, analyzing, and reporting, are summarised here.
3.2.1 Preparing for Measurement
The eye-blink startle is usually measured by recording changes in surface potential using two electrodes placed below one of the eyes (see Fig. 18.3). Two electrodes are used to independently measure voltage changes and ensure that noise on either electrode can be accounted for. A sudden change in potential is indicative of the brief electrical signal, called an action potential, that causes contraction of all, or parts of, the orbicularis oculi muscle. The magnitude of the voltage measured is small, in the order of 0–300 μV, so careful preparation is required to ensure a reliable measurement (Blumenthal et al., 2005).
It is vital that the skin is carefully cleaned before placing the electrodes to help reduce impedance to the electrical signal. This can be done by rubbing the skin with gauze and cleaning with soap and water or alcohol. To further reduce impedance, a small amount of electrode gel can be applied to the specific surface of the site of each electrode. However, care must be taken to ensure that the electrode gel does not complete a circuit between the two electrodes. Due to the sensitive nature of skin below the eye, care also needs to be taken that no abrasive materials are used in the preparation and the participant’s eyes are closed so that alcohol fumes do not become a source of irritation (Blumenthal et al., 2005).
The orbicularis oculi surrounds the eye. While the eye-blink response is more precisely discerned on the top of the eye, this is an uncomfortable position for electrode placement and the motion of the upper eyelid can introduce artifacts into the detected signal. The recommended type of electrodes are Ag/AgCl miniature electrodes, smaller than 5 mm, contained in a recessed plastic casing with external diameter of less than 15 mm and filled with electrode gel (Blumenthal et al., 2005).
An isolated ground electrode is typically attached to an electrically inactive site such as the middle of the forehead or temple. One active electrode is typically positioned in line with the center of the pupil while the participant looks directly ahead, and a second about 1–2 cm lateral to the first active electrode. The electrodes can be attached using double-sided adhesive collars. It is important to avoid overlapping of the electrode attachment and that the electrodes are placed to ensure they do not interfere with normal eye movement (Blumenthal et al., 2005). It is advisable to check the signal that is being detected before proceeding by asking the participant to perform a voluntary blink. Where the EMG signal is not clear, it may be necessary to reposition or reapply the electrodes.
The two active electrodes need to allow for the same level of conductance to ensure a consistent measure, and as previously noted, high impedance on either electrode can limit the ability to record an accurate signal. The baseline signal should also be inspected for high levels of background noise. Interference from background power lines and equipment in the 50–60 Hz range can be a common problem in some environments and should be avoided if possible. EMG signals from the two active electrodes are amplified differentially, so noise can be reduced by braiding the cables of the two electrodes together and ensuring they are picking up the same level of noise (Blumenthal et al., 2005). Shielding equipment may also be used or a specialised environment set aside that is free of excessive electromagnetic interference. However, in computer game studies this is not always possible, so as a fallback, a notch filter in the 50–60 Hz range can be used to reduce noise in the signal. The drawback is that such a filter will also reduce the part of the measured EMG signal from the eye-blink response that occurs in this 50–60 Hz frequency range.
Another source of noise in EMG measurement can come from large head and eye movements of the participant. This can be controlled in some experiments where participants can be asked to focus on a stationary point and avoid movement. However, this is more difficult to control with active game interfaces. It may be necessary to monitor participants for such movement during the study. Startle responses corrupted by movement of the electrodes may need to be excluded from the study during the analysis phase.
3.2.2 Eliciting the Startle Response
To measure the magnitude and latency of the eye-blink response, the response must first be elicited. Eye-blinks can be elicited by a range of acoustic, visual, electrical, magnetic, and mechanical stimuli, each of which may create variations in the measured response (Blumenthal et al., 2005). Indeed, variations in the response can be caused by a number of factors, such as the frequency of presentation, the background conditions, the composition of the stimulus, as well as the way it is presented. The most commonly used approach is to use an acoustic startle, and white noise is generally the most effective stimulus. This suggests a sound that consists of broadband noise containing frequencies in the range of 20 Hz to 20 kHz.
The magnitude of response, the speed of onset, as well as the probability of elicitation are increased with higher intensity sounds. The response can be influenced by the intensity of the sound and other properties of the sound envelope such as the rise time and duration (Blumenthal et al., 2005). A typical acoustic stimulus is characterised by a maximum amplitude of 100 dB(A) SPL, a rapid rise time, and a duration of around 50 ms (Blumenthal et al., 2005). In summary, sudden, short, loud sounds are more startling.
Another factor that is known to affect responses to an acoustic startle stimulus is the level and nature of other background sounds. For example, pulsing sounds can inhibit the response, while consistent background noise can help facilitate the startle response (Hoffman & Fleshler, 1963). Indeed, variations in startle response are often studied by using a prepulse sound prior to the pulse of startle sound. The slightly weaker prepulse sound normally inhibits the response to the stronger startling stimulus, with a maximum inhibition typically observed with a 120 ms interval (Graham, 1975). Prepulse inhibition is used in the study of a range of psychological disorders such as schizophrenia (Swerdlow et al., 2008) and conditions that impact on attention (Filion et al., 1993).
An acoustic startle stimulus can be presented either by headphones or loudspeakers. In both cases the intensity of the presentation signal needs to be calibrated using a sound level meter. Properly fitted headphones can ensure a more consistent delivery of the startle stimulus, but can also interfere with other equipment and electrodes. By contrast, the use of loudspeakers may require targeted positioning of the participant between loudspeakers to ensure a consistent presentation of the startle stimulus.
3.2.3 Processing the EMG Signal
The EMG signal related to the eye-blink response oscillates between both positive and negative values around a zero value, in the frequency range of 28–500 Hz (Blumenthal et al., 2005). Because a signal must be sampled at no less than twice its highest frequency of interest, the EMG signal should be recorded at a minimum sampling rate of 1,000 Hz. The time frame of interest in the startle blink is in the order of 0–500 ms. The raw EMG signal, measured on the surface of the skin, is a low voltage signal, typically in the order of a few microvolts (μV), where 1 V is equivalent to 1,000,000 μV.
For analysing the startle response measure, there are a number of key parts of the surface EMG signal that may need to be considered. These include: latency, amplitude, baseline amplitude, peak amplitude, duration, and the integrated EMG (IEMG).
Latency, measured in milliseconds (ms), is the time between the presentation of the startle stimulus and the onset of a significant change in surface EMG that indicates activation of the muscle fibers underlying the active electrodes. The two challenges in recording this signal are ensuring that the timing of the stimulus presentation is synchronised with the raw EMG signal and identifying the onset of the response. The first challenge can be overcome by triggering the presentation stimulus electronically using an output channel from the same recording equipment that is monitoring the EMG signal. Alternatively, the actual audio startle or an externally generated trigger can be fed into the recording device to accurately mark the raw EMG signal with the exact time the startle stimulus is presented.
Amplitude is a measure of the magnitude in microvolts (μV) of the average EMG signal at a point or interval of the EMG signal. For the startle response, this is usually reported as a magnitude of the rectified EMG signal. The rectified signal is the absolute value of the raw signal and so only contains positive values. This is in contrast to the raw EMG signal that oscillates between positive and negative values. The baseline amplitude is a measure of background electrical activity being detected during an interval of muscle inactivity. For the startle response, it is typically calculated as a mean value for a period of around 150 ms just prior to the startle stimulus. This mean calculation should include the positive and negative variations in the EMG signal. The baseline amplitude can be subtracted from other amplitude measures to help quantify EMG activity that is specific to a muscle response.
The peak amplitude, also measured in microvolts (μV), is the maximum amplitude in an interval of the EMG signal. For the startle probe, the interval of interest is typically taken between the onset of the startle response and the return to baseline of the signal. This peak value minus the baseline amplitude is the measure of most interest for inferring the valence associated with affect.
The duration of the startle response would typically be the interval between the response onset and the return to baseline of the signal. The onset and end of the response are often identified by visually inspecting the raw EMG trace. This inspection process can be simplified by using a smoothed EMG signal that is cleaner to inspect for key features. This smoothing can be achieved, for example, by a technique like a moving average filter. Longer time filters create more smoothing, but also tend to lower the observed variations in the signal amplitude. Regardless, the selection of onset and end point requires some experience from the observer and can introduce some subjectivity. This subjectivity can be partially offset by using a group of independent observers to select and reject key features and then combining the results. An alternative is to automate this process by considering the standard deviation of the signal from the EMG baseline. For example, the onset might be selected when the mean of a short interval of the signal exceeds the baseline by two standard deviations. The end of the response might be gauged by an interval where the mean returns to within one standard deviation of the baseline.
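The automated alternative described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated scoring routine: the function name, the 5-sample sliding window, and the reading of "exceeds two standard deviations" as "exceeds the baseline mean plus two standard deviations" are all assumptions made for the example. It expects a rectified signal sampled at a known rate and the sample index at which the stimulus was presented.

```python
from statistics import mean, stdev

def find_onset_and_end(rectified, stim_index, baseline_len=150,
                       window=5, onset_sd=2.0, end_sd=1.0):
    """Return (onset_index, end_index) in samples; either may be None.

    All parameter defaults are illustrative assumptions, not recommendations.
    """
    # Baseline statistics from the interval just before the stimulus.
    baseline = rectified[stim_index - baseline_len:stim_index]
    base_mean = mean(baseline)
    base_sd = stdev(baseline)

    onset = None
    for i in range(stim_index, len(rectified) - window + 1):
        win_mean = mean(rectified[i:i + window])
        if onset is None:
            # Onset: a short window whose mean rises above baseline + 2 SD.
            if win_mean > base_mean + onset_sd * base_sd:
                onset = i
        elif win_mean <= base_mean + end_sd * base_sd:
            # End: a later window whose mean is back within 1 SD of baseline.
            return onset, i
    return onset, None
```

With a 1,000 Hz recording, the latency in milliseconds is simply `onset - stim_index`. Because the window mean rises as soon as the window overlaps the response, the reported onset can lead the true onset by up to `window - 1` samples, which is one reason such output is still worth checking visually.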
While peak amplitude is commonly used to assess startle response, an alternative is to use the area under the curve of the rectified EMG signal for a specified interval, such as the duration of the response. During a startle-blink, not all muscle fibers involved may be activated simultaneously. The measured surface EMG may be a composite of the electrical changes due to multiple contractions occurring in different muscle fibers. The integrated EMG can provide a measure of the force of the combined responses, being dependent on both the magnitude and duration of the response. Because it is the product of an amplitude and a time interval, this integrated value is measured in units of microvolt-seconds (μV·s).
A detectable startle response is not always elicited in response to a startle stimulus. This may be because some individuals do not have a normal response, or because of short- or long-term habituation of the startle response (Valsamis & Schmid, 2011), experimental conditions, or the treatments of interest. Therefore, another calculation that is often reported for a study is the probability that the presentation of the startle stimuli produces an actual startle response. This involves detecting what are called zero or non-responses in relation to the startle stimuli. Zero responses are identified by no significant change in the baseline of the raw EMG signal in a short interval following the stimulus. The onset interval can vary with experimental conditions, but should be identifiable within 20–150 ms of the startle stimulus (Blumenthal et al., 2005). Using a short onset window helps distinguish real responses from background activity and voluntary or spontaneous blinks. To avoid short-term habituation to the response, it is recommended that intervals between startle stimuli are randomised and at least 30–60 s apart (Valsamis & Schmid, 2011).
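As a rough illustration of the integrated EMG, the following Python sketch approximates the area under the rectified signal with the trapezoidal rule. The function name and arguments are hypothetical, the input is assumed to be in microvolts, and the onset and end indices are assumed to have been chosen already. The result is an amplitude multiplied by a time, so it comes out in microvolt-seconds.

```python
def integrated_emg(rectified_uv, onset, end, fs=1000):
    """Approximate area under the rectified EMG between two sample indices.

    rectified_uv: rectified samples in microvolts (assumed units)
    onset, end:   first and last sample index of the response (inclusive)
    fs:           sampling rate in Hz
    Returns the area in microvolt-seconds.
    """
    dt = 1.0 / fs
    segment = rectified_uv[onset:end + 1]
    # Trapezoidal rule: average each adjacent pair, multiply by the time step.
    return sum((a + b) / 2.0 * dt for a, b in zip(segment, segment[1:]))
```

For example, a constant 100 μV response lasting 50 ms gives an area of 100 × 0.05 = 5 μV·s, which shows how a long, low response and a short, high response can yield the same integrated value.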
In general, the processing of the EMG signal can be considered in four distinct steps: amplification, filtering, rectification, and finally either a smoothing or integrating step (Blumenthal et al., 2005).
Amplification of the raw surface EMG is required because it is a low voltage signal. The two closely placed and active skin electrodes, located on the orbital muscles, measure underlying changes in muscle voltage related to the action potentials that signal the muscles to contract. The raw signal from these two active electrodes used to measure the startle response is usually differentially amplified. This requires an isolated AC-amplifier with high input impedance (>100 MΩ), a high common-mode rejection ratio (>100 dB), and low input noise (Blumenthal et al., 2005). Large individual variations in amplitude can occur between participants and also vary with stimulus conditions and trials. The amplification needs to be adjusted to avoid clipping that can occur in the Analog to Digital conversion process, without missing small but significant amplitude changes in the signal. For this reason, it is advised that the highest possible resolution be used in the Analog to Digital conversion process. In the order of 16–24 bits is advised, with values sampled at least 1,000 times per second (1 kHz) (Blumenthal et al., 2005).
Filtering of the raw EMG signal is designed to maximize the signal to noise ratio and allow for better detection of the eye-blink response. Background interference is first removed by filtering out frequencies below 28 Hz and above 500 Hz. The low frequency noise can be due to motion of the electrodes or other biological sources such as eye movements, retina activity, or the contraction of other facial muscles. To remove these low frequency artifacts, a digital high-pass filter with an infinite impulse response and 3 dB cutoff at 28 Hz is recommended (Blumenthal et al., 2005). Higher frequency noise due to electrical instruments and background electromagnetic interference can be removed with a low-pass filter. A low-pass, finite impulse response filter with a roll-off of 24 dB per octave is one recommended configuration (Blumenthal et al., 2005).
Rectification of the filtered signal is achieved by taking the absolute value of each EMG sample. This avoids the cancellation that occurs when averaging a signal that oscillates between positive and negative values. This assumes that the output DC level of the amplifier is centered on zero. During the rectification process, it is also normal to subtract the mean baseline value of the signal. This baseline value, as previously described, can be obtained during a selected pre-stimulus interval, by calculating the mean of the raw EMG values recorded during this interval.
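To make the idea of an infinite impulse response high-pass filter concrete, the sketch below implements the simplest possible case, a first-order recursive filter, in plain Python. It is only an approximation of the recommendation above: its attenuation at 28 Hz is close to, but not exactly, 3 dB, and its roll-off is far gentler than a properly designed filter. The function name and defaults are assumptions, and a real analysis would more likely use a filter designed with a signal processing library.

```python
import math

def high_pass(samples, cutoff_hz=28.0, fs=1000.0):
    """First-order IIR high-pass filter (a discretised RC circuit).

    Illustrative only: cutoff_hz and fs defaults are assumptions.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / fs                           # sampling interval in seconds
    alpha = rc / (rc + dt)
    out = [0.0]
    for i in range(1, len(samples)):
        # Each output depends on the previous output (the recursive part)
        # and on the change in the input, so constant offsets are removed.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out
```

Applied to a slow drift, such as a 2 Hz electrode movement artifact, this filter strongly attenuates the signal, whereas a 200 Hz component typical of muscle activity passes with only a small loss.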
The final step in processing the signal involves the application of smoothing filters and/or the calculation of an area under the curve of the signal amplitude. Various approaches for smoothing are possible and include a simple moving average filter or a variable weight filter if it is desirable to avoid phase shift and multiple peaks in the response (Blumenthal et al., 2005). The calculation of the integral of the signal requires the selection of onset and end points for the response and can be automated with various signal processing techniques, such as a contour following integrator.
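The rectification and smoothing steps can be sketched as follows. This is a simplified Python illustration under stated assumptions: the signal has already been amplified and filtered, the pre-stimulus mean is used only to remove any residual DC offset before rectification, and smoothing is a centred moving average. The function names and the default window sizes are hypothetical.

```python
from statistics import mean

def moving_average(samples, half_width=15):
    """Centred moving average over 2 * half_width + 1 samples.

    The window shrinks near the ends of the signal. Centring the window
    avoids shifting the peak in time.
    """
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half_width)
        hi = min(len(samples), i + half_width + 1)
        out.append(mean(samples[lo:hi]))
    return out

def process_trial(filtered, stim_index, baseline_len=150, half_width=15):
    """Remove DC offset, rectify, and smooth one trial of filtered EMG."""
    # Mean of the raw pre-stimulus values, including positive and negative
    # swings, estimates the DC offset of the recording.
    offset = mean(filtered[stim_index - baseline_len:stim_index])
    # Full-wave rectification: absolute value of each offset-corrected sample.
    rectified = [abs(v - offset) for v in filtered]
    return moving_average(rectified, half_width)
```

A wider window gives a cleaner trace for visual inspection but flattens the peak, so any amplitude read from the smoothed signal depends on the chosen `half_width`.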
3.2.4 Analysing Responses
The final critical step in this process is to analyze the processed signals to identify and quantify the key elements of the startle signal. These include previously discussed values such as the peak amplitude, latency of response, and the probability that a response is elicited after the presentation of the startle stimulus.
The analysis of the processed EMG signal is often performed manually, but might be computer-assisted or fully automated to avoid some subjective bias in the process. The first step of the analysis process involves deciding if each startle response can be discriminated or not. This may not be possible if the signal is contaminated by noise, or if a spontaneous or voluntary eye-blink has occurred around the same time as the stimulus. Movement of the electrodes can sometimes generate artifacts on the EMG signal that exceed the amplitude of any startle response, which prevents reliable identification of the startle response. Furthermore, the startle response should only be elicited within a 20–150 ms time window after the startle stimulus. Thus, a response that occurs outside this onset window should also be rejected. Any rejected trials should be excluded from all further calculations.
Once a response is accepted, it is possible that the response is too small to include and should be categorised as a zero response (non-response). A value of zero is then recorded for the amplitude of this non-response. For any response that is deemed significant, the key characteristics of the response need to be quantified. For example, of those characteristics, the peak amplitude (measured in microvolts) is typically of most interest in startle studies related to measuring affective valence.
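The scoring rules described in this section can be expressed as a small decision procedure. The Python sketch below assumes that each trial has already been reduced to an onset latency (or `None` if no response was detected), a flag for contamination, and a baseline-corrected peak amplitude. The function names, the dictionary keys, and the status labels are all hypothetical.

```python
from statistics import mean

def classify_trial(onset_latency_ms, contaminated, window_ms=(20, 150)):
    """Label a trial as 'rejected', 'nonresponse', or 'valid'."""
    if contaminated:
        return "rejected"      # noise, movement artifact, or another blink
    if onset_latency_ms is None:
        return "nonresponse"   # no detectable change after the stimulus
    if not window_ms[0] <= onset_latency_ms <= window_ms[1]:
        return "rejected"      # onset falls outside the accepted window
    return "valid"

def summarise_trials(trials):
    """Summarise trials given as dicts with 'status' and 'peak_uv' keys."""
    kept = [t for t in trials if t["status"] != "rejected"]
    if not kept:
        return {"n_kept": 0, "response_probability": None,
                "mean_amplitude_uv": None}
    # Non-responses contribute an amplitude of zero; rejected trials
    # are excluded from every calculation.
    amplitudes = [t["peak_uv"] if t["status"] == "valid" else 0.0
                  for t in kept]
    n_valid = sum(1 for t in kept if t["status"] == "valid")
    return {"n_kept": len(kept),
            "response_probability": n_valid / len(kept),
            "mean_amplitude_uv": mean(amplitudes)}
```

Keeping the classification separate from the summary makes it straightforward to report the number of rejected trials and the response probability alongside the amplitude results.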
In the next section, we briefly introduce some possible uses of the startle eye-blink as an analytical tool to support game design. The next section also reports on a preliminary case study that uses the startle eye-blink to measure the player’s affective valence when interacting with three key parts of a serious game. This case study serves to illustrate the use and reporting of a study using the startle eye-blink.
4 The SHADOW Case Study
There are two basic approaches for integrating game analytics into the game design process; one is summative in nature and the other more formative in intent. A summative evaluation is designed to test a clear design hypothesis in a finished game. For example, the study of Coppins (2014) uses the startle reflex to assess the role of sound in eliciting player affect in a commercial horror game. The intention of this study was to better understand how sound is used in a completed game to create immersion. The second approach involves a more formative evaluation during game development and the intention is to guide or refine a game element. This assumes an iterative development approach and implies a less structured use of the startle reflex to measure player affect surrounding some key design elements of the game.
In the SHADOW case study, we describe such a formative use of the startle eye-blink measure to examine the affective response of players to the three key sections of a serious game. The game is being designed to support psychological counselling where the efficacy of the game is dependent on players learning new skills to manage their own behaviour.
4.1 Background to SHADOW
SHADOW, the serious game, is designed to support online psychological counselling of younger adults aged 18–30, of both genders (Hookham, Deady, Kay-Lambkin, & Nesbitt, 2013). The SHADOW game builds on a more traditional web-based counselling tool called SHADE, which was designed as a clinician-assisted intervention program for the treatment of comorbid depression and alcohol or other drug use problems (Kay-Lambkin, Baker, Kelly, & Lewin, 2011).
SHADE, the precursor to SHADOW, is an internet-delivered, evidence-based, psychological treatment that uses the principles of Cognitive Behavioural Therapy (CBT), mindfulness meditation, and motivation enhancement to target these conditions in an integrated way. A major objective of CBT is to identify and challenge the unrealistic beliefs that maintain a person’s problematic patterns of thinking and behaviour (Beck, Rush, Shaw, & Emery, 1979). In combination, “mindfulness” is an important skill, particularly when learning how to cope with negative automatic thoughts that are associated with depression and drinking alcohol. The central idea of mindfulness is not to prevent these thoughts from occurring, but rather to stop these thoughts from setting in and taking control when they are triggered (Segal, Williams, & Teasdale, 2002).
The efficacy of the SHADE intervention program has previously been demonstrated in a large clinical trial (Kay-Lambkin et al., 2011). However, the efficacy of the program requires participants to complete the 10-week program (Kay-Lambkin et al., 2011) and develop the key skills of mindfulness and thought management. These two skills are considered critical to the efficacy of the SHADE program, but the introduction of these skill-training components at the half way stage of the SHADE program also coincides with the time when most participants choose to leave the program.
Seven participants, 5 male and 2 female, within the ages of 18–30 were recruited for the study using posters and word of mouth. The participants in the study were mainly recruited through convenience sampling, and as such consisted primarily of students at the University of Newcastle. Participants were required to have normal or corrected to normal vision. All participants were informed through a participation information statement about the intention and methods to be used in the experiment, including the fact that occasional startling noises would be played while they played the SHADOW game.
The basic premise and gameplay of SHADOW was explained to participants, who were then asked to play the SHADOW game without instruction for 12 min. The game was played in a standard computing set up: seated, with a keyboard and mouse on a desk surface, with a single monitor and sound delivered via speakers. While playing the game, 11 acoustic startle stimuli were presented using sudden acoustic white noise pulses containing frequencies in the range 20 Hz to 20 kHz at 105 dB, with a rise time of less than 10 μs. An Arcam FMJ Amplifier was used to convey the game sound and deliver the startle stimulus. The stimuli were presented at random intervals across the 12 min experiment period, at no less than 30 s apart. The gameplay was also recorded on video to confirm the locations in the game that the player was engaged when each startle stimulus was presented.
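One way to generate a presentation schedule with these properties is sketched below in Python. It is not the procedure used in the case study, which is not described in further detail here; the function name and approach are assumptions. The idea is to draw random offsets from the time left over once the mandatory gaps are reserved, and then add the gaps back in.

```python
import random

def startle_schedule(n=11, duration_s=720.0, min_gap_s=30.0, rng=None):
    """Return n sorted stimulus times (seconds) at least min_gap_s apart."""
    rng = rng or random.Random()
    # Time remaining after reserving the minimum gap between stimuli.
    slack = duration_s - (n - 1) * min_gap_s
    if slack < 0:
        raise ValueError("duration too short for the requested spacing")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n))
    # Shifting the i-th sorted offset by i gaps guarantees the spacing.
    return [t + i * min_gap_s for i, t in enumerate(offsets)]
```

Passing a seeded generator, for example `startle_schedule(rng=random.Random(42))`, makes a schedule reproducible. Note that a purely random schedule does not guarantee that stimuli fall within particular sections of a self-paced game.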
The EMG signal was recorded using an ADInstruments Bio Amp and PowerLab 8/35, in conjunction with Labchart 8 software. It was sampled at 1,000 Hz with a range of 500 μV. A low-pass filter at 50 Hz and a high-pass filter at 0.3 Hz were used. Finally, a 50 Hz notch filter was applied, and the signal was inverted. The EMG response and the acoustic stimulation pulses were recorded in Labchart using Channels 2 and 4, respectively. A macro was used within Labchart to start the recordings simultaneously.
At completion of the experiment, the recorded EMG signals were exported from Labchart in an appropriate format for data analysis in Matlab R2014b (8.4.0.150421) from Mathworks Inc. (2014). The raw EMG signal imported into Matlab was expressed in volts (V) at a sampling interval of 1 ms. The raw signal was converted to (mV) to conform to reporting standards (Blumenthal et al., 2005; International Society of Electrophysiological Kinesiology, 1980). Following this, full wave rectification (absolute value) was applied to the biphasic signal.
A time interval of 150 ms before the time window of each startle acoustic pulse was used to derive a mean EMG baseline for each response. Individual baselines for each response were used to negate the effect that disturbance in the electrode lines might have over the duration of an experiment (Blumenthal et al., 2005; van Bedaf, Heesink, & Geuze, 2014), with 150 ms considered adequate for obtaining a reliable measurement (van Bedaf et al., 2014). The rectified EMG signal was also smoothed using a 30 sample moving average to allow for visual inspection.
The final step was to classify each startle response by visually inspecting the EMG data using the scoring approach detailed by Blumenthal et al. (2005). For each startle impulse, a decision was made as to whether: (a) the baseline period was contaminated with noise or movement artifact, or an involuntary blink, and thus a stimulated blink could not be quantified and the trial should be rejected, or (b) if a response within the 20–150 ms latency window occurred, and if not, the trial should be recorded as a nonresponse, or (c) a valid response occurred in a trial that has not been rejected and is thus scored as valid.
In total, 77 startle responses were recorded; 11 startle responses for each of the 7 subjects. Upon inspection, 14 % (11/77) were rejected due to signal noise or from being contaminated with responses that occurred prior to the startle stimulus. Of the remaining 66 trials, 95.5 % (63/66) elicited a response, with 4.5 % (3/66) being judged as zero responses.
The player self-paced through the game and so the number of startles recorded in the three key sections varied. In total 24 startle responses were recorded in the instruction element, 18 in the scenario screens and 16 in the mindful challenge. Another four of the valid startles occurred in areas of the game that could not be clearly identified as one of the three key components and so these startles were also excluded from the statistical analysis.
The instruction component recorded the highest mean amplitude (M = 168.6, SD = 4,551.9), the scenario screens recorded the next highest mean amplitude (M = 144.6, SD = 11,204.7), and the mindful challenge recorded the lowest mean response amplitude (M = 122.3, SD = 1,510.6). There were large individual variations in the recorded amplitudes and this is reflected in high standard deviations. We assumed unequal variances and then compared the means for the three conditions using pairwise t-tests. Mean peak amplitude (valence) was significantly different for the instruction and mindful conditions; t(37) = 2.03, p = .009*. No significant difference was found between the instruction and the scenario component; t(27) = 2.05, p = .406. No significant difference was found between the mindful and scenario components; t(22) = 2.07, p = .414.
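For readers unfamiliar with comparing two means under unequal variances, the calculation is commonly known as Welch's t-test and can be sketched in Python as follows. The function name is hypothetical, and the code is a generic illustration rather than the analysis script used in this study. It returns the t statistic and the approximate (Welch–Satterthwaite) degrees of freedom; converting these to a p value requires the t distribution, which is available in most statistics packages.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In a design like this one, each list would hold the peak amplitudes recorded in one game component. Because the degrees of freedom depend on both the sample sizes and the variances, they are generally not whole numbers.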
The SHADOW game is being designed to engage players in developing two key skills of the SHADE program, namely, managing negative thoughts and mindfulness. In this study we examined the valence of three key components of the game, the instruction, mindfulness, and scenario elements. The study was intended as a formative evaluation to support the game design process and the intent was to measure the player’s affective response to each of these elements. Our intention is that the most positive valence needs to be associated with the mindfulness challenge, as this is where critical new skills are imparted to players. This assumes that a positive affect supports improved learning, an assumption that is supported by previous research (Bless et al., 1996; Kanfer & Ackerman, 1989; Pekrun et al., 2002; Raghunathan & Trope, 2002).
The results indicate that the lowest amplitudes were recorded in the mindful mode, indicating that this component was associated with the most pleasant affect in players. A significant difference was found between the mindful challenge and the instructions section of the game. While these results are pleasing, some care must be taken when interpreting this outcome in terms of what it means for evaluating the design of the mindful challenge. The SHADOW case study has principally been provided to illustrate the use of the startle eye-blink measure in a game study. It should be remembered that the use of this measure for serious game analytics is still in its infancy. For this reason, we will focus the discussion on the limitations of this study as they relate to the general use of the startle eye-blink for assessing game designs and player affect.
Large individual variations between amplitude measures are common when using the startle response. This suggests larger sample sizes and more startle events are required. Some care also needs to be taken to ensure that startle responses occur both randomly, to avoid habituation, and at the targeted design elements in the game. For example, in this study most startles occurred in the instructions section as participants read about using the game. While the look and feel of the instruction component was consistent with the other game elements, it contained no actual gameplay. By contrast, the fewest startles occurred in the mindful component and this was the design element we were most interested in studying. This is indicative of the challenge of striving for both randomness and predictability in startle reflex experiments.
It is not possible to directly relate peak amplitude to a precise valence. This is particularly true as no real baseline of valence related to individual pleasantness or unpleasantness has been established. This lack of reference to valence is something that could be partially addressed by collecting suitable baseline data for each individual. This could be achieved by using an image library such as the International Affective Picture System (IAPS) that has been well-studied in relation to the startle response (Bradley & Lang, 2007). In future work, we intend to incorporate this feature into the preliminary part of our studies.
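As an illustration of how such baseline data might be used, the sketch below expresses a participant's in-game amplitudes as z-scores relative to that participant's own baseline recordings. This is an assumption about one possible approach and not a procedure used in this study: the function name `standardise_against_baseline`, the choice of z-scores, and all of the values shown are hypothetical.

```python
from statistics import mean, stdev


def standardise_against_baseline(game_amplitudes, baseline_amplitudes):
    """Express in-game amplitudes as z-scores relative to one participant's baseline.

    A positive score means a larger blink than that participant's baseline
    average; a negative score means a smaller blink.
    """
    base_mean = mean(baseline_amplitudes)
    base_sd = stdev(baseline_amplitudes)
    if base_sd == 0:
        raise ValueError("baseline amplitudes have zero variance")
    return [(x - base_mean) / base_sd for x in game_amplitudes]


# Hypothetical values for a single participant (NOT study data).
baseline = [100.0, 110.0, 120.0]   # e.g. blinks recorded while viewing neutral pictures
in_game = [130.0, 100.0]           # blinks recorded during gameplay

print(standardise_against_baseline(in_game, baseline))
```

Standardising within each participant removes differences in absolute blink size between individuals, so that only the relative change from each person's own baseline is compared across game components.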
Indeed, most previous studies of emotional response with the startle reflex have used static images (Bradley & Lang, 2007; Dan-Glauser & Scherer, 2011) or plain text (Witvliet & Vrana, 1995) in carefully controlled laboratory conditions. Games by contrast are highly interactive, partially random, and often dynamic environments. Moreover, one study has shown that active interaction with a virtual environment generates significantly different affect responses compared to purely passive participation in the environment (Muehlberger et al., 2008). This suggests that the level of interactivity may also need to be considered when using the startle reflex measure. In our study we also used a video recorder to record the game play, providing context for when each startle stimulus was presented. Importantly, the startle response quickly adapts to changing grades of pleasantness (Vrana et al., 1988), and although it is also studied in terms of workload (Neumann, 2002) and attention (Filion et al., 1998), it is generally considered independent from cognitive influence. This should make the startle eye-blink an ideal tool to quantify raw affective processing as it occurs while playing a game.
The startle reflex is used to measure valence. In terms of affective computing, most studies include an additional physiological measure such as skin conductance or heart rate to indicate arousal. This is because many consider the motivational basis of emotion using a very simple, two-factor model featuring affective valence and arousal (Lang, 1995). This dimensional theory of emotion holds that all emotions can be located on a two-dimensional space, as a function of valence and arousal (Ravaja et al., 2006).
Manipulating player emotions, whether for serious purposes or just to enhance general game experience, is a key responsibility for game designers. A distinguishing feature of the startle eye-blink measure is that it has been frequently used as a way of objectively evaluating the positive and negative valence associated with a person’s affective state. Importantly, the startle reflex is sensitive to affective processing and not to emotion. This means that it measures raw, basic affective responses, or in other words, the grade of negativity or positivity (or pleasantness) of any stimulus, situation, or environment a subject is exposed to. Eye-blink amplitude is reduced in the case of affective processing coding for pleasantness, whereas it is increased in the case of affective processing coding for unpleasantness (Lang, Bradley, & Cuthbert, 1990, 1998).
Thus, the startle eye-blink suggests itself as an objective tool that can be adapted for assessing a player’s affective response to various aspects of game design. Although the measure is relatively new to the serious games community, it has been well established in the field of psychology. However, in fairness many of these previous studies occur in well-controlled conditions with methods that may require some adaption for use with dynamic gaming environments. As a result, there is much need for further foundational work in applying this approach to game evaluation.
The startle response measure is not without some complexities, both in terms of collecting and processing captured data, and interpreting results. Indeed, the relationship between attention, emotion, and cognitive workload raises some complex issues in terms of game design, player perception, and cognition as well as their emotional state. For example, games relying on negative valence or stressful cognitive workloads to engage players are not easily translated to the common two-dimensional spaces used to explain valence and arousal.
Despite these complexities, we believe the startle reflex provides a useful adjunct to other approaches in assessing subconscious player responses to game elements. We also believe that this approach can successfully be adapted for assessing a player’s emotional response to various aspects of game design and predict that the investigation of non-conscious information processing using the startle eye-blink will soon provide a useful new approach for analysing and improving serious game design.
- Alzoubi, O., Calvo, R. A., & Stevens, R. H. (2009). Classification of EEG for emotion recognition: An adaptive approach. In Proceedings of 22nd Australasian Joint Conference on Artificial Intelligence (pp. 52–61), Melbourne.
- Andreassi, J. J. (2007). Human behaviour and physiological response. New York: Taylor & Francis.
- Arroyo, I., Cooper, D. G., Burleson, W., Woolf, B. P., Muldner, K., & Christopherson, R. (2009). Emotion sensors go to school. Proceedings of Artificial Intelligence in Education, 200, 17–24.
- Bartlett, M. S., Littlewort, G., Frank, M., Lainscsek, C., Fasel, I., & Movellan, J. (2006, April). Fully automatic facial action recognition in spontaneous behavior. In 7th International Conference on Automatic Face and Gesture Recognition, 2006. FGR 2006. (pp. 223–230). IEEE.
- Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.
- Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Technical manual and affective ratings. Gainesville, FL: The Center for Research in Psychophysiology, University of Florida.
- Bradley, M. M., & Lang, P. J. (2007). The international affective picture system (IAPS) in the study of emotion and attention. In J. A. Coan & J. J. B. Allen (Eds.), Handbook of emotion elicitation and assessment (pp. 29–46). New York: Oxford University Press.
- Calvo, R. A., Brown, I., & Scheding, S. (2009). Effect of experimental factors on the recognition of affective mental states through physiological measures. In Proceedings of 22nd Australasian Joint Conference on Artificial Intelligence, Melbourne.
- Coppins, W. (2014). Measuring the effect of sound on the emotional and immersive experience of players in a video game: A case study in the horror genre. Honours thesis, The University of Newcastle, Australia.
- Darwin, C. (2002). Expression of the emotions in man and animals. New York: Oxford University Press.
- Desmet, P., & Hekkert, P. (2007). Framework of product experience. International Journal of Design, 1(1), 2007.
- Ekman, P., & Friesen, W. (1969a). Nonverbal leakage and clues to deception. Psychiatry, 32, 88–106.
- Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto, CA: Consulting Psychologists.
- Elmore, W. R. (2012). The effect of violent video game play on emotion modulation of startle. Doctoral dissertation, University of Missouri–Kansas City, Kansas City, MO.
- Evinger, C., & Manning, K. A. (1993). Pattern of extraocular muscle activation during reflex blinking. Experimental Brain Research, 92(3), 502–506.
- Fokkinga, S. F., Desmet, P. M. A., & Hoonhout, J. (2010). The dark side of enjoyment: Using negative emotions to design for rich user experiences. In Proceedings of the 7th International Conference of Design and Emotion Society. Chicago, IL: Spertus Institute.
- Fontaine, J. R., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotions is not two-dimensional. Psychological Science, 18(12), 1050–1057.
- Grimshaw, M., Lindley, C. A., & Nacke, L. (2008, October). Sound and immersion in the first-person shooter: mixed measurement of the player’s sonic experience. In Proceedings of Audio Mostly Conference, pp. 1–7.
- Hadley, M. J. (2012). Slender: The Eight Pages [PC game]. Parsec Productions.
- Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
- Hookham, G., Deady, M., Kay-Lambkin, F., & Nesbitt, K. (2013). Training for life: Designing a game to engage younger people in a psychological counselling program. Australian Journal of Intelligent Information Processing Systems, 13(3): Special issue on Edutainment 2013.
- International Society of Electrophysiological Kinesiology. (1980). Units, terms and standards in the reporting of EMG research. Carbondale, IL: Southern Illinois University School of Medicine.
- Juslin, P. N., & Scherer, K. R. (2005). Vocal expression of affect. In J. A. Harrigan, R. Rosenthal, & K. R. Scherer (Eds.), The new handbook of methods in nonverbal behavior research (pp. 65–135). Oxford, MA: Oxford University Press.
- Kaviani, H., Wilson, G. D., Checkley, S. A., Kumari, V., & Gray, J. A. (1998). Modulation of the human acoustic startle reflex by pleasant and unpleasant odors. Journal of Psychophysiology, 12, 352–361.
- Kay-Lambkin, F., Baker, A. L., Kelly, B., & Lewin, T. J. (2011). Clinician-assisted computerised versus therapist-delivered treatment for depressive and addictive disorders: A randomised controlled trial. Medical Journal of Australia, 195(3), S44–S50.
- MATLAB and Statistics Toolbox Release R2014b. (2014). Natick, MA: The MathWorks, Inc.
- McDaniel, B. T., D’Mello, S., King, B. G., Chipman, P., Tapp, K., & Graesser, A. C. (2007). Facial features for affective state detection in learning environments. In Proceedings of the 29th Annual Cognitive Science Society (pp. 467–472). Austin, TX: Cognitive Science Society.
- Mota, S., & Picard, R. W. (2003). Automated posture analysis for detecting learner’s interest level. In Computer Vision and Pattern Recognition Workshop, 2003 (Vol. 5, p. 49). IEEE.
- Nacke, L. (2009). Affective Ludology: Scientific measurement of user experience in interactive entertainment. Blekinge Institute of Technology, Game Systems and Interaction Research Laboratory, School of Computing, Blekinge Institute of Technology, Doctoral Dissertation Series No 2009:04.
- Osgood, C. E., May, W. H., & Miron, M. S. (1975). Cross-cultural universals of affective meaning. Urbana, IL: University of Illinois Press.
- Ramirez, O. M., & Dockweiler, C. J. (1987). Mathematics anxiety: A systematic review. In R. Schwarzer, H. M. Ploeg, & C. D. Spielberger (Eds.), Advances in test anxiety research (pp. 157–175). Hillsdale, NJ: Erlbaum.
- Ravaja, N., & Kivikangas, J. M. (2008, August). Psychophysiology of digital game playing: The relationship of self-reported emotions with phasic physiological responses. In Proceedings of Measuring Behavior (pp. 26–29), Maastricht, The Netherlands.
- Sabourin, J., Rowe, J. P., Mott, B. W., & Lester, J. C. (2011). When off-task is on-task: The affective role of off-task behavior in narrative-centered learning environments. In Proceedings of 15th International Conference on Artificial Intelligence in Education (pp. 523–536).
- Salovey, P. (2003). Introduction: Emotion and social processes. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 747–751). Oxford, UK: Oxford University Press.
- Sasse, D. (2008). A framework for psychophysiological data acquisition in digital games. Master’s thesis, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany.
- Scherer, K. R., Schorr, A. E., & Johnstone, T. E. (2001). Appraisal processes in emotion: Theory, methods, research. Oxford, UK: Oxford University Press.
- Schneirla, T. C. (1959). An evolutionary and developmental theory of biphasic processes underlying approach and withdrawal. In M. R. Jones (Ed.), Nebraska symposium on motivation (Vol. 7, pp. 1–43). Lincoln, NE: University of Nebraska Press.
- Sebe, N., Cohen, I., & Huang, T. S. (2005). Multimodal emotion recognition. Handbook of Pattern Recognition and Computer Vision, 4, 387–419.
- Segal, Z., Williams, J. M. G., & Teasdale, J. D. (2002). Mindfulness-based cognitive therapy for depression: A new approach to preventing relapse. New York: Guilford.
- Strapparava, C., & Valitutti, A. (2004). WordNet affect: An affective extension of WordNet. LREC, 4, 1083–1086.
- van Bedaf, L. R., Heesink, L., & Geuze, E. (2014, August 27–29). Pre-processing of electromyography startle data: A novel semi-automatic method. In: Proceedings of Measuring Behavior, Wageningen, The Netherlands.
- Villon, O., & Lisetti, C. (2006). A user-modeling approach to build user’s psycho-physiological maps of emotions using bio-sensors. In Proceedings of IEEE RO-MAN 2006, 15th IEEE International Symposium on Robot and Human Interactive Communication, Session Emotional Cues in Human-Robot Interaction (pp. 269–276).
- Vyzas, E., & Picard, R. W. (1998). Affective pattern classification. In: Proceedings of AAAI Fall Symposium Series: Emotional and Intelligent: The Tangled Knot of Cognition (pp. 176–182), Orlando, FL.
- Walla, P., & Panksepp, J. (2013). Neuroimaging helps to clarify brain affective processing without necessarily clarifying emotions. In K. N. Fountas (Ed.), Novel Frontiers of Advanced Neuroimaging. InTech. ISBN: 978-953-51-0923-5, doi:10.5772/51761.