User Interaction for Interactive Storytelling
User interaction is a central component of Interactive Storytelling, yet it has often been neglected, with precedence usually given to the pursuit of narrative generation techniques. This chapter presents several paradigms of interaction using examples drawn from fully implemented Interactive Storytelling systems and proposes an empirical classification of modes of interaction based on the nature of user involvement. It first provides context for the need for multimodal interaction with the story world and for the influence of the user’s intervention on the dynamics of story generation. Then, the requirements of affective interaction within the context of Interactive Storytelling are covered and illustrated through traditional multimodal interaction, as well as physiological interfaces. Finally, the potential of Brain-Computer Interfaces (BCI) to support Interactive Storytelling is discussed, in particular through the unification of user experience, user input, and affective filmic theories.
Keywords: Multimodal interfaces, Affective interfaces, Physiological computing, Brain-Computer Interfaces
Paradigms and Technology
Interactive Storytelling (IS) assumes, almost by definition, the existence of interaction mechanisms that support user intervention in reaction to observed narrative progression, with the ultimate objective of altering the course of action. A significant fraction of IS research has nonetheless described systems and experiments that are closer to narrative generation than to true IS, in the sense that little user interaction actually took place in real time during the narrative itself. Narrative generation in these cases still supports story variability through the initial parameterization of the system and the role of the initial state of the narrative environment. Still, it is fair to propose that proper IS requires mechanisms for user interaction that are tailored to the visual presentation and staging of the virtual narrative. Interaction can be studied from the perspective of the narrative (what prompts it and what changes it brings about), as much as from the perspective of the interaction modalities themselves. From a narrative perspective, interaction can target every element that can affect the unfolding of the story, but key to the IS experience is the fact that this process should be an integral part of the user experience. In other words, the medium itself should be the user interface, thus precluding the use of menus or any meta-story level which turns the user into a director. The scope and target of interaction are thus determined by the medium itself, which can be divided into characters and objects.
Interaction is irrevocably linked both to the mechanisms of narrative generation and to the visual interface. User influence is integrated by modifying the information available to the narrative generator, which is a representation of the world state at large (from the concrete instantiation of objects to the abstract beliefs, emotions, or motivations of characters). This is why there should be some level of intelligence in the way in which user interactions are interpreted in terms of the real-time changes they impose on the story world: This intelligence is common to the various modalities which can support interaction and should provide a unified approach to the interaction component of IS.
There is little direct physical interaction between the user and characters in IS, unlike in computer games, even though such interaction need not take the form of fighting. This is due to a combination of factors. Firstly, the type of embodiment in a 3D environment often makes physical interaction difficult without imposing constraints on navigation and controls which reintroduce a game-like experience. Secondly, it can reasonably be said that the most compatible physical interactions are those which preserve the user’s visual perspective on the characters and the environment and are compatible with their dual role of spectator and actor. For instance, nonverbal behavior through body language can constitute a modality for interaction, taking different forms depending on the diegetic nature of the interaction. In our virtual Madame Bovary system, the user turning their back on the central character, Emma Bovary, while she is talking is interpreted as a lack of interest and a negative emotional input.
On the other hand, interaction with objects in a virtual world has significant potential since objects themselves can have a strong narrative meaning and act as interaction affordances. Barthes (1966) used the notion of dispatchers to characterize these objects of core narrative importance, although he made reference to a discourse level of presentation in line with the well-known Chekhov’s gun. In the context of Interactive Narratives (INs), there is less scope to attract attention to an object at discourse level as a form of foreshadowing (despite attempts at implementing this feature (Bae and Young 2008)), and narrative objects are most often seen as contextual affordances taking into account the course of action. For instance, objects in the environment that could play the role of a gift can be a target for interaction, whether the user’s action would be to steal the gift (Cavazza et al. 2002) or offer it (Pizzi and Cavazza 2007).
Two modes of physical interaction can be described in this case: object interaction and object-mediated character interaction. Direct object interaction is a major modality of non-diegetic interaction, whether the user operates in god mode or ghost mode. It often involves the removal of an action resource from the environment in an effort to impair potential narrative actions being performed by some character. It should be observed that while the authors implemented this feature as early as 2001, as one potential mechanism for interaction, recent user experiments in immersive INs have confirmed its popularity with experimental subjects (Lugrin et al. 2010).
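The effect of removing an action resource can be illustrated with a minimal sketch. It assumes a simple representation in which each character action lists the objects it requires; the class names, action names, and objects below are hypothetical and are not taken from any of the systems cited in this chapter, whose narrative generators are considerably richer.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class NarrativeAction:
    """A character action and the objects it needs (hypothetical schema)."""
    name: str
    required_objects: FrozenSet[str]


@dataclass
class WorldState:
    """Objects currently available in the story world (illustrative only)."""
    objects: Set[str] = field(default_factory=set)

    def remove_object(self, obj: str) -> None:
        # User intervention: take an action resource out of the environment.
        self.objects.discard(obj)


def applicable_actions(actions: List[NarrativeAction], world: WorldState) -> List[str]:
    """Return the names of actions whose required objects are all present."""
    return [a.name for a in actions if a.required_objects <= world.objects]


# Hypothetical action set, used only to show the mechanism.
ACTIONS = [
    NarrativeAction("offer_gift", frozenset({"flowers"})),
    NarrativeAction("phone_friend", frozenset({"telephone"})),
    NarrativeAction("talk_to_friend", frozenset()),
]

world = WorldState(objects={"flowers", "telephone"})
before = applicable_actions(ACTIONS, world)

world.remove_object("flowers")  # the user removes the potential gift
after = applicable_actions(ACTIONS, world)

print(before)
print(after)
```

In this sketch the intervention never addresses a character directly: it only changes the world state, and the set of actions available to the narrative generator shrinks as a consequence.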
Object-mediated character interaction generally takes place when interaction is diegetic, and a typical case of this would be the handing of objects on stage to other characters while the user is themselves impersonating one of the characters. An example of this can be found in our VR Emma system (Lugrin et al. 2010), which lets the user offer flowers as a gift and also, demonstrating the symmetrical role of object-mediated interaction, accept or decline a gift from a character.
Although some forms of physical interaction with characters have been described, language-based interaction appears to be a more natural form of intervention, even across the various interaction paradigms listed above. When impersonating a character, the user would interact through dialogue, within the technical limitations of state-of-the-art language technologies. This is a case where interaction should be diegetic, which constrains the contents of user utterances, if one assumes a compliant user. (The issue of cooperation is expected to differ greatly according to the interaction paradigms: users acting from within the story are expected to abide by genre conventions because of the diegetic constraint. There is obviously no such expectation when the user intervenes in god mode or ghost mode.) For instance, user-character dialogue in the MRE system (Rickel et al. 2002; Swartout et al. 2006) becomes part of the narrative simulation and is facilitated by the user having to follow the communication style required by the task.
In a mixed-reality narrative, inspired by the James Bond motion picture series, the authors have implemented a different form of multimodal dialogue (Cavazza et al. 2004), based on a combination of expressive gestures and user utterances, where the multimodal emphasis is not so much on reference to the environment as on the nonverbal behavior of the user. These experiments took the diegetic form of interaction to an extreme, where the user themselves becomes part of the visual medium, something that can only be achieved in a mixed-reality installation.
Linguistic input can also support other interaction paradigms, in particular those in which the user is influencing the story from a spectator’s perspective. This is one of the historical interfaces for IS, in which the user shouts advice to the story characters. The authors have implemented this form of input in our first Friends IN (Cavazza et al. 2002) using a simplified spoken grammar with elements of keyword spotting for increased robustness. In this case, user interaction is clearly non-diegetic, as user utterances are not meant to constitute any part of the story itself, simply the medium through which user influence takes place.
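Keyword spotting of this kind can be sketched in a few lines. The example below works on an already transcribed utterance and assumes a small, hand-written table mapping keywords to advice categories; the table, the category names, and the function `spot_advice` are hypothetical illustrations, not the grammar used in the Friends system.

```python
import re
from typing import Optional

# Hypothetical keyword table: each spotted word maps to an advice category
# that a narrative generator could interpret. Not taken from any cited system.
ADVICE_KEYWORDS = {
    "gift": "suggest_gift",
    "flowers": "suggest_gift",
    "phone": "suggest_phone_call",
    "call": "suggest_phone_call",
}


def spot_advice(utterance: str) -> Optional[str]:
    """Return the advice category of the first known keyword, or None.

    Words outside the table (hesitations, fillers, misrecognized words)
    are simply ignored, which is what gives keyword spotting its robustness.
    """
    tokens = re.findall(r"[a-z']+", utterance.lower())
    for token in tokens:
        if token in ADVICE_KEYWORDS:
            return ADVICE_KEYWORDS[token]
    return None


print(spot_advice("Why don't you, um, buy her some flowers?"))
print(spot_advice("Hello there"))
```

The trade-off is the usual one: ignoring everything outside the keyword table tolerates recognition errors, at the cost of missing negation and other structure that a full grammar would capture.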
The Rationale for Affective Interaction with Interactive Narratives
To a large extent, the entertainment experience of narratives is mediated by their ability to elicit emotions. INs which provide an advanced level of visualization naturally share this property with traditional media, although interaction provides an additional mechanism through the feeling or awareness of agency it introduces. With the recent developments in affective interfaces that directly capture a user’s emotional state, it has become possible to envision interactive storytelling systems designed around users’ affective responses. The main criterion for categorizing such affective interactive storytelling systems is whether emotions are captured spontaneously or are part of some sort of acting by the user, as in Cavazza et al. (2003) and Lugrin et al. (2010).
The definition of emotional models is a major challenge for these systems, because of the discrepancy between the basic emotions captured by interface technology and those associated with narration, which tend to be more complex. Physiological computing techniques (Fairclough 2009) have offered new directions to address this problem; before describing them, there is a need to review contemporary theories of affective responses to narratives, which have been developed in the context of traditional media such as film.
Affective Filmic Models
There has been substantial research into the mechanisms by which traditional media, in particular film, elicit emotional responses in their viewers. It is worth noting that the two main competing affective filmic theories reproduce the character/plot dichotomy identified as a defining concept in interactive narrative. Plot-based approaches attribute affective responses to story progression, the shape of the narrative arc, or discourse-level phenomena. For instance, Cheong and Young (2008) have described a suspense model based on story progression. The affective filmic theory of Smith (2003) posits that many filmic elements act as emotional cues, including discourse elements and in particular filmic idioms (e.g., camera positioning and shot structure) and editing. Smith describes affective filmic response as a cyclical process in which emotional cues contribute to building up a specific mood, which in turn renders the viewer more susceptible to further cues. The emotional categories he describes include both traditional ones such as fear and more complex mood descriptions such as depression. One of the elements of Smith’s theory is that emotions can exist without an object or a goal. Conversely, the character-based model assumes that viewers’ emotions are dictated by the relationship they establish with the story characters. The most elaborate character-based theory has been developed by Tan (1996) and explicitly references empathy. This constitutes a higher-level category that does not fully determine lower-level responses, whether dimensional or categorical.
Emotional Speech Recognition
Spoken interaction provides the most natural form of intervention in a narrative; it is compatible with most IS paradigms and is essential to those in which user interaction should be diegetic. However, the state of the art in speech recognition and natural language understanding still imposes many technical limitations. Besides, when user utterances have to be consistent with the story genre, and the genre becomes more literary, they inevitably grow in linguistic complexity, posing additional challenges to automatic processing. In the search for an interaction mechanism that would allow unrestricted linguistic expression, the authors have investigated the use of emotional speech recognition (Vogt et al. 2008). In this context, the authors have shown that emotional speech recognition allowed users to respond to the character Emma in our Madame Bovary system. The limitation imposed by the small set of emotional categories recognized could be offset by a contextual use of these categories, drawing on the emotional categories used in the planning domain itself.
Interactive Narrative and Passive Physiological Sensing
The main interaction paradigm for IN is based on agency: whether as an actor or a spectator, the user influences the unfolding of the story through their intervention. However, monitoring affective states makes it possible to devise new paradigms, such as narrative adaptation, where story progression and pacing respond in real time to the user’s affective response. This requires an integrated approach, in which narrative generation and user input refer to the same emotional model. The authors have developed a prototype inspired by the filmic theory of Smith, and based upon a medical drama as a baseline narrative, whose visual content supports complete 3D narratives (visualized through the UDK® engine) of up to 7 min in duration.
The narratives generated by the system, and the 3D visualizations shown to users, use the notion of Smithian cues (Smith 2003), where the atmosphere of actions is used to prime user responses to important narrative events. The presentation of the narratives to viewers features dramatic visualizations of narrative events such as patients being in critical medical situations or arguments between characters, with different staging styles corresponding to the intended level of affective content.
The authors have adapted Smith’s original mood-cue approach to make emotional description more specific and amenable to physiological sensing. Two types of affective responses have been considered: one determined by the immediate content of scenes and the other by story progression. Physiological input uses both galvanic skin response (GSR), as a measure of arousal, and facial EMG (fEMG), as a measure of valence. The tension and suspense of the core narrative are determined by the intensity of the user’s arousal as measured through GSR, while the system manages the overall pacing of the narrative by inserting a variety of reorderable and recombinable scenes illustrating plot background and character relationships. fEMG signals are used to measure positive and negative responses to categories of actions, which are selectively preferred or avoided in future narrative adaptation. This mapping also takes advantage of the different response times of GSR and fEMG. The generation of cues over time and the processing of the user’s emotional responses reflect the cyclical model of Smith, without, however, resorting to an explicit representation of the user’s mood. A detailed description of the system’s results and evaluations can be found in Gilroy et al. (2012). It shows that users respond both to emotional cues and to the overall story pacing, confirming the initial hypothesis and the system’s ability to process instantaneous responses as well as built-up affective states. There is a certain paradox in this form of “passive interaction,” which removes the user’s agency from Interactive Narrative; however, this approach may precisely constitute a promising alternative when considering interaction over prolonged periods of time or continuous affective input. In the next section, the authors consider affective interaction under the opposite character-based paradigm, based in particular on targeted affective responses, such as empathy and anger.
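The division of labor between the two signals can be illustrated with a minimal sketch. It assumes that arousal has already been normalized to the range 0 to 1 and that valence is a signed score per viewed action; the thresholds, the band names, the action categories, and the class `AffectiveAdapter` are all hypothetical, and the sketch does not reproduce the plan-based system evaluated in Gilroy et al. (2012).

```python
from collections import defaultdict
from typing import Dict, List


class AffectiveAdapter:
    """Toy mapping from physiological readings to narrative adaptation hints."""

    def __init__(self, low: float = 0.3, high: float = 0.7) -> None:
        # Assumed thresholds on normalized arousal; real values would need
        # per-user calibration.
        self.low = low
        self.high = high
        # Accumulated valence per action category (fEMG-like input).
        self.category_valence: Dict[str, float] = defaultdict(float)

    def tension_band(self, arousal: float) -> str:
        """Classify a normalized arousal reading (GSR-like input)."""
        if arousal < self.low:
            return "low"
        if arousal > self.high:
            return "high"
        return "medium"

    def record_valence(self, category: str, valence: float) -> None:
        """Accumulate a signed valence response to one action category."""
        self.category_valence[category] += valence

    def rank_categories(self) -> List[str]:
        """Order categories from most preferred to most avoided."""
        return sorted(
            self.category_valence,
            key=lambda c: self.category_valence[c],
            reverse=True,
        )


adapter = AffectiveAdapter()
adapter.record_valence("argument", -0.5)
adapter.record_valence("medical_emergency", 0.5)
adapter.record_valence("argument", -0.25)

print(adapter.tension_band(0.9))
print(adapter.rank_categories())
```

A generator could consult the arousal band when choosing the staging of the next core event and the category ranking when selecting which optional scenes to insert; how these hints are actually used is a design decision that the sketch leaves open.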
Brain-Computer Interfaces: Empathic and Anger-Based Interaction
While there is no definite answer on which filmic affective theory better accounts for observed phenomena or is more appropriate to the Interactive Narrative context, the above experiments can be analyzed from a system integration perspective. The use of empathy as a unifying concept provides a level of integration of all aspects, from narrative generation to user input mechanisms, which is difficult to match, as it unifies theory and practice. On the other hand, the strict emphasis on characters may not be appropriate to all genres and may limit the ability to take sophisticated editing into account. The latter aspect is, however, one well supported by Smith’s theory and in agreement with the growing interest in editing as a main influencer of affective and cognitive mechanisms alike (Smith et al. 2012). However, owing to the real-time nature of narrative generation, it remains unclear whether Interactive Storytelling will be able to reach a level of editing sophistication comparable to that of the films from which affective filmic theories have been developed.
The Nature of Influence: Local Versus Global
Not enough research has been dedicated to user motivation for interaction during an IN. In those paradigms where user involvement is continuous, it can be assumed that motivation rests within the participation itself, although, as evidenced with Façade (Mateas and Stern 2002), an interest in the story conclusion can also be maintained. This is a case where narrative influence appropriates some aspects of game-related tasks. The situation is more complex in those paradigms in which the user has more freedom in their decision to interfere with the story. The main balance to be maintained is between the immediate impulse to interfere with an ongoing action and a more strategic influence dictated by a global understanding/appreciation of the plot, where a user attempts to modify the overall course of events. Our experience favors a genre-related interpretation. In narrative genres where situational aspects play a major role, such as comedy, the user may be prompted to interfere with ongoing actions for immediate effect, more so than they would in more dramatic genres, where interventions may be increasingly strategic and aimed at changing the overall course of action, possibly even altering the ending.
Influencing the global progression of a narrative also requires models onto which this progression can be mapped. Drama, in particular when inspired by classical stories, tends to revolve around high-level dimensions, values, and morals, a finding which was independently emphasized by Damiano and Lombardo (2009). This is also a genre, or a family of genres, in which intervention may swing the course of action towards certain dimensions or values. The strong polarity of a novel such as Madame Bovary, where the fate of the central character Emma Bovary can be seen as evolving along the duty/pleasure dimension (Mettinger 1994), makes it possible to describe possible endings along that main dimension (Pizzi et al. 2007). More precisely, the authors have posited that this dimension could be applied not only to the story as a whole but also, in a somewhat recursive fashion, to individual sections or chapters.
It is one of the paradoxes of IS research that many published systems incorporate very limited interactivity in the traditional sense, i.e., user intervention in the narrative. This could be explained by a widespread assumption that narrative generation remains the cornerstone of interactive narratives and that existing user interaction techniques are ready to be integrated once progress in narrative generation is deemed sufficient to support complete interactive narratives. The authors have challenged this dominant assumption on several grounds. Firstly, interaction paradigms are fully part of the design of an interactive narrative, as they condition the user experience. They are also dependent on the performance of component interaction technologies, albeit in a complex fashion: while some basic technologies, such as spoken dialogue, may not yet be ready to support unconstrained user interaction, these constraints may be incorporated into the design of interaction paradigms. This suggests that it may be counterproductive to consider narrative generation in isolation.
The authors have illustrated multiple interaction paradigms that can be classified along two main dimensions: one is the level of involvement of the user in the story, which determines the frequency of interventions, and the other is their relation to the visual narrative itself. The richness of possibilities is often underestimated, probably because the use of game engines as the main visual environment has imported their default interaction mode; however, immersive media, whether VR or AR, provide alternative contexts for the relationship between multimodal interaction and narrative visualization.
Finally, it appears that affective interaction and interactive narratives could develop a symbiotic relationship. Interactive narratives constitute a privileged test bed for affective interfaces in that they can implement various sorts of affective feedback loops in which users’ emotional responses can be studied. Similarly, affective input covers a wide spectrum of user interaction paradigms, from passive spectator (Gilroy et al. 2012) to user acting (Lugrin et al. 2010), in plot-based or character-based approaches (Gilroy et al. 2012, 2013).
- G. Aranyi, F. Charles, M. Cavazza, Anger-based BCI using fNIRS neurofeedback, in Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (UIST ’15), ACM, New York, NY, USA, pp. 511–521 (2015). http://dx.doi.org/10.1145/2807442.2807447
- B.-C. Bae, R.M. Young, A use of flashback and foreshadowing for surprise arousal in narrative using a plan-based approach, in Proceedings 1st Joint International Conference on Interactive Digital Storytelling (ICIDS), Springer-Verlag, Berlin, Heidelberg, pp. 156–167 (2008)
- R. Barthes, Introduction à l’analyse structurale des récits, in L’analyse structurale du récit, Communications 8, Seuil, Paris, pp. 7–33 (1966) (in French)
- M. Cavazza, F. Charles, S.J. Mead, Characters in search of an author: AI-based virtual storytelling, in Proceedings 1st International Conference on Virtual Storytelling (ICVS), Springer-Verlag, Berlin, Heidelberg, pp. 145–154 (2001)
- M. Cavazza, F. Charles, S. Mead, Interacting with virtual characters in interactive storytelling, in Proceedings 1st International Joint Conference on Autonomous Agents and MultiAgent Systems (AAMAS), IFAAMAS, Bologna, Italy, pp. 318–325 (2002)
- M. Cavazza, O. Martin, F. Charles, S.J. Mead, X. Marichal, Interacting with virtual agents in mixed reality interactive storytelling, in Proceedings of the 4th International Workshop, IVA 2003, Kloster Irsee, Germany, 15–17 Sept, pp. 231–235 (2003)
- M. Cavazza, G. Aranyi, F. Charles, J. Porteous, S. Gilroy, I. Klovatch, G. Jackont, E. Soreq, N.J. Keynan, A. Cohen, G. Raz, T. Hendler, Towards empathic neurofeedback for interactive storytelling, in Proceedings of 2014 Workshop on Computational Models of Narrative (CMN 2014), 31 July–2 Aug, Quebec City (2014)
- R. Damiano, V. Lombardo, A unified approach for reconciling characters and story in the realm of agency, in Proceedings 1st International Conference on Agents and Artificial Intelligence (ICAART), Porto, Portugal, pp. 430–437 (2009)
- S.W. Gilroy, J. Porteous, F. Charles, M. Cavazza, Exploring passive user interaction for adaptive narratives, in Proceedings of the 2012 ACM International Conference on Intelligent User Interfaces (IUI 2012) (ACM, New York, 2012), pp. 119–128
- S.W. Gilroy, J. Porteous, F. Charles, M. Cavazza, E. Soreq, G. Raz, L. Ikar, A. Or-Borichov, U. Ben-Arie, I. Klovatch, T. Hendler, A brain-computer interface to a plan-based narrative, in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI ’13), ed. by F. Rossi (AAAI Press, Palo Alto, 2013), pp. 1997–2005
- J. Gratch, Why you should buy an emotional planner, in Proceedings 3rd International Conference on Autonomous Agents. Workshop on Emotion-Based Agent Architectures (EBAA), ACM, New York (1999)
- J.-L. Lugrin, M. Cavazza, D. Pizzi, T. Vogt, E. André, Exploring the usability of immersive interactive storytelling, in Proceedings 7th ACM Symposium on Virtual Reality Software and Technology (VRST), ACM, New York, pp. 103–110 (2010)
- A. Mettinger, Aspects of Semantic Opposition in English (Oxford University Press, Oxford, 1994)
- D. Pizzi, M. Cavazza, Affective storytelling based on characters’ feelings, in Proceedings Intelligent Narrative Technologies: Papers from the AAAI Fall Symposium, AAAI Press, Palo Alto (2007)
- D. Pizzi, F. Charles, J.-L. Lugrin, M. Cavazza, Interactive storytelling with literary feelings, in Proceedings 2nd International Conference on Affective Computing and Intelligent Interaction (ACII), Springer-Verlag, Berlin, Heidelberg, pp. 630–641 (2007)
- W. Swartout, J. Gratch, J.R. Hill, E. Hovy, S. Marsella, J. Rickel et al., Toward virtual humans. AI Magazine 27(2), 96 (2006)
- E.S. Tan, Emotion and the Structure of Narrative Film: Film as an Emotion Machine (L. Erlbaum Associates, Mahwah, 1996)
- T. Vogt, E. André, N. Bee, EmoVoice – a framework for online recognition of emotions from voice, in Proceedings 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems, Springer-Verlag, Berlin, Heidelberg, pp. 188–199 (2008)
- N. Zagalo, A. Torres, V. Branco, Passive interactivity, an answer to interactive emotion, in Entertainment Computing (ICEC), Springer-Verlag, Berlin, Heidelberg, pp. 43–52 (2006)