Narrative Balance Management in an Intelligent Biosafety Training Application for Improving User Performance
- Cite this article as:
- Alvarez, N., Sanchez-Ruiz, A., Cavazza, M. et al. Int J Artif Intell Educ (2015) 25: 35. doi:10.1007/s40593-014-0022-z
The use of three-dimensional virtual environments in training applications supports the simulation of complex scenarios and realistic object behaviour. While these environments have the potential to provide an advanced training experience to students, designing and managing a training session in real time is difficult due to the number of parameters to attend to: the timing of events, the difficulty, and the user's actions and their consequences are some examples. For that purpose, we have extended our virtual Bio-safety Lab application, used for training biohazard procedures, with a Narrative Manager. The Narrative Manager controls the simulation by deciding which events will take place and when, thereby controlling the narrative balance of the session. Our hypothesis is that the Narrative Manager allows us to increase the number of tasks for the user to solve and, by balancing difficulty and intensity, keeps the user interested in the training. When evaluating our system we observed that the Narrative Manager effectively introduces more tasks for the user to solve, and is nevertheless perceived by users as more interesting, and no harder, than an identical system without a Narrative Manager. A knowledge test also showed better results in users' interest and learning outcome in the narrative condition.
Keywords: Intelligent tutoring systems · Interactive narrative · Drama managed environment · User interest
Virtual intelligent training environments show great potential for teaching techniques and protocols in an affordable, effective and scalable way (Chen et al. 2008), especially in cases where training in real life proves costly and difficult to manage on a large scale. They have been adapted to increasingly complex application scenarios, and are now able to simulate almost any kind of environment with realistic graphics and accurate object behaviour (Carpenter et al. 2006; van Wyk and de Villiers 2009; Belloc et al. 2012). Experiments with these applications show promising results regarding users' feeling of immersion and presence (Schifter et al. 2012; Wang et al. 2011) and engagement (Charles et al. 2011). In another work (Reger et al. 2011), a virtual reality simulation is used to treat posttraumatic stress disorder in soldiers, obtaining a reduction in their symptoms and showing that virtual environments are effective in cases where real-life rehearsals are not feasible. However, creating new training scenarios and managing the events that happen in such interactive environments is a complex task, because it is necessary to keep the difficulty balanced and pay attention to the user's engagement: if the training is too hard, the user won't be able to progress well. Conversely, if the training is too easy or repetitive, the user may lose interest in the training, affecting the learning outcome.
Some researchers proposed adding a ‘narrative layer’ to training systems (Marsella et al. 2000). They hypothesized that a narrative-centred learning environment would increase both the user's engagement and the learning outcome.
We want to focus on this kind of integration of educational content and narrative, but from a more practical point of view: using narrative techniques in a control module for the training sessions allows us to manage events in the session and present them in a more attractive form to the user. Systems with such direct control over the dramatic flow of the session are rare. Narrative flow has been formally defined as a sequence of events that generates an emotional response in the user, and control of the flow is typically achieved by modelling the degree of conflict the user experiences at each moment (Crawford 2004; Ware and Young 2011).
Our contribution in this paper is to introduce a new drama manager that controls the dramatic flow in a virtual training scenario aimed at practicing biohazard procedures. We developed a virtual Bio-safety Lab (Prendinger et al. 2014, under submission), which serves as a virtual training environment for medical students. This virtual environment targets the training of protocols in response to an accident in the laboratory. These protocols are too dangerous to practice in the real laboratory, and paper tests do not cover all the unexpected problems that can arise while clearing the accident. For this reason, a virtual application (in this case a virtual laboratory) can be a very good solution to this real need in medical teaching. Bio-safety Lab acts as a virtual tutoring system via user task recognition and has already been tested in two field experiments.
The remainder of the paper is organized as follows. The following section reviews previous work on virtual training applications and narrative control, and compares our system to that work. Thereafter, we will introduce our system, which uses a drama manager to control the events in a biohazard training environment. Here we first describe 3D interaction with the Bio-safety Lab and real-time task recognition, and then introduce the narrative techniques used in our training system. Next, we will present an experiment we conducted to investigate the engagement and training effects of our system, and discuss the results. Finally, we conclude our work and indicate future steps.
Virtual training systems vary greatly in their methods. Testing systems, in which users give answers and advance to the next step upon a correct response, have been widely described. For instance, in (van Wyk and de Villiers 2009) an application designed to train accident prevention in a mine makes the user check videos of a scenario and detect dangerous spots, only advancing when the user gives the correct answer. Other examples of these “testing systems” are a computer vision system that checks user gestures against hand-washing procedures (Corato, Frucci, and di Baja 2012) or a tool designed to teach how the modules of an energy plant work (Angelov and Styczynski 2007). These systems allow a high degree of immersion thanks to 3D graphics and videos, but lack flexibility regarding the user's participation in the training scenario.
More advanced training scenarios include the work of Belloc et al. (2012), which implements intelligent objects with realistic behaviours, and the work of Johnson and Rickel (1997), which incorporates an interactive tutor capable of correcting the user or explaining the rationale of certain steps. This additional functionality requires a more complex architecture. For adding complex behaviour to scenario objects, a widely accepted strategy is a layered architecture that translates low-level, physics-related events into high-level events. For this translation, authors use knowledge-based techniques such as rules or ontologies, as in the systems of Lugrin and Cavazza (2007) or Kallmann and Thalmann (2002), which include common-sense behaviour for objects that is difficult to implement with a physics engine alone. Similarly to those works, our Bio-safety Lab includes both intelligent object behaviour and a real-time interactive tutoring system that corrects users' mistakes, in addition to a narrative layer controlling the simulation.
Virtual Training Systems Based on Narrative Control
Previous work suggests that a narrative-centred learning environment increases the engagement of the user (Gee 2003; Shaffer 2006), but testing whether engagement is linked to a positive increase in learning outcome has produced more controversial results. Users of a narrative game-like application for learning French showed no difference in scores compared with users of a version without game mechanics (Hallinen et al. 2009), and including off-task possibilities can even hinder user performance (Rowe et al. 2009). Another experiment showed that adding game mechanics and a narrative does not directly affect performance, yet can improve a user's attitude towards the exercise (Rai et al. 2009). However, the opposite conclusion can be drawn from Carini et al. (2006), whose large-scale experiment showed that engagement is positively linked to learning. The cause of this difference is revealed in a subsequent experiment, where Rowe et al. (2011) showed that engagement can indeed increase knowledge acquisition when the educational content and narrative are tightly and motivationally integrated. In that case, the system used mechanics similar to adventure games for teaching microbiology, requiring the user to investigate diverse places and objects and make notes related to the learning theme. Another example is the STAR framework (Molnar et al. 2012), which, using the same strategy, showed good results in testing an application designed to teach health practices to students. In that application, the user takes on the role of a detective who has to solve the mystery of a poisoning crime, obtaining clues about bad habits that can result in getting sick.
In our Bio-safety Lab, the inclusion of narrative elements is not independent of the other mechanics of interactive training: we use narrative techniques to generate the events in the scenario, giving the user new tasks to solve or helping him. These events directly affect the type and number of training contents the user will experience, so the narrative is tightly integrated with learning. Also, by balancing the difficulty and keeping the user interested, we may induce psychological flow, as stated by Shaffer (2013). This concept describes a state in which the subject has a sense of engaging challenges at a difficulty level appropriate to his capabilities (Nakamura and Csikszentmihalyi 2002). The consequence of this effect is that the user persists in, or returns to, the activity because it is psychologically rewarding, which is in line with our hypothesis.
Aside from the learning benefits of narrative techniques, narrative authoring is an interesting strategy for creating training sessions and has been noted and used by a number of researchers. Examples are the works of Rizzo et al. (2010), who present a clinical diagnosis training system including virtual agents with a scripted backstory and speech recognition for simulating patients, and Carpenter et al. (2006), who proposed an approach that uses branching trees to advance a story in which the user has to make decisions to manage a crisis. However, the user has only limited individual interaction possibilities to select the branch in the story tree, and there is no direct interaction with the environment. Another work (Gordon 2003) describes a protocol to create a branching structure in which training scenarios are constructed like storyboards, and applies it to a web-based application. The approach of Ponder et al. (2003) uses a human as a scenario director or narrator. In this application the narrator controls part of a training session that aims to teach a user how to handle information distribution in a crisis situation, and communicates with the user via voice. Another module uses a tree-based story model to drive the session forward. Similarly, Raybourn et al. (2005) describe a multiplayer simulation system for training army team leaders, directed by a human instructor who controls events in the virtual environment. All of these systems present a well-known problem of the branching approach: it does not scale well when the number of events or possibilities in the story increases. Addressing the authoring of educational interactive stories, Silverman et al. (2003) presented a generic tool for creating interactive drama for educational purposes, but their conclusion was that there are no environments one can turn to for rapid authoring of pedagogically-oriented interactive drama games.
This issue has been confronted by incorporating a drama manager to control the events in the virtual world. For example, the commercial software Giat Virtual Training includes a scenario engine driven by a definition language that represents the training tasks as a series of steps, each step corresponding to an agent inside the system that has a birth, an activity and a death, and communicates with the rest of the agents (Mollet and Arnaldi 2006). The system allows connection with a pedagogical module that controls the training and triggers events in a narrative way, but neither its methods nor its impact on the training were explored in their work. Another drama manager can be found in the application presented by Habonneau et al. (2012), an interactive drama application for teaching how to take care of people with brain damage. The application uses IDtension, a drama manager engine that controls the simulation and does not rely on branching, so it does not have scalability issues. IDtension (Szilas 2007) contains a repository of goal tasks for the scenario characters and makes these characters perform actions that generate or resolve conflicts in the simulation. This solution proved to be very immersive and engaging for the users, allowing them to find themselves in a realistic environment where the unexpected can happen. However, IDtension was not evaluated in terms of learning, and it does not provide tutoring to the user: the system does not balance the difficulty of the session (it only cares about the dramatic impact on the user, not whether he/she can resolve the situation) nor does it warn the user about incorrect actions. Simply put, it is a narrative-only experience without training elements, in which the user interacts with characters, sees what happens and hopefully learns from his/her actions.
Introduction to Bio-Safety Lab
The application implements a Task Recognition Engine, a module that acts as a virtual tutor, recognizing whether or not the user is properly performing the task of clearing the accident, and advising accordingly. The information concerning each task, including not only the correct way to act but also common mistakes, is stored in a knowledge base. In this paper, we present the Narrative Manager, a control module inside the Task Recognition Engine that manages the scenario's events using a narrative model which optimally selects which event to trigger and when.
The training system in the Bio-safety Lab is inspired by Marc Cavazza's Death Kitchen (Lugrin and Cavazza 2006), an application that merges narrative management with complex scenario behaviour, in which the system generates events dynamically. In Death Kitchen the user controls an avatar in a virtual kitchen while the system generates accidents and tries to “kill” him. It merges the narrative with a layered intelligent behaviour system: the high-level events that happen in the scenario are captured by the Causal Engine, an intelligent object-behaviour module that decides the consequences of each event and acts as a reasoner and drama manager. For example, if the user opens a water tap, the Causal Engine can make it work as expected, letting the water run normally, or decide that it is more convenient to break the tap and spray water all around.
This Causal Engine is similar to the reasoner engine contained in the Bio-safety Lab, which captures high-level events and processes them. As in Death Kitchen, in Bio-safety Lab these events are used to decide whether to trigger consequences in a narrative fashion, but the events are also used to perform user task recognition for real-time assistance (a feature Death Kitchen does not have). However, the way the consequences are decided differs. Death Kitchen uses a knowledge database called the “Danger Matrix” (essentially a look-up table) containing the characteristics and requirements of each accident; a planner selects the most dangerous available accident in the table and tries to generate it, creating new events if necessary. This Danger Matrix only allows one constraint (the partial order) for selecting the available events to trigger. In Bio-safety Lab, we select the event that maximizes the impact on the user using a knowledge structure known as Task Trees, which allows different constraints when selecting the event and is also used when recognizing the user's task.
Another difference appears in deciding which events trigger an accident: Death Kitchen uses only two parameters, the distance to the object and the danger level of the accident. This means the system can lead to an identical accident if the user goes to the same area in each session. The system works this way because, by design, Death Kitchen does not require more complex decision making, but in our case we need more complex behaviour in order to optimize the triggering of events. To achieve this optimization, we modulate the level of conflict between the user and the environment, creating difficult events for the user if the current task is too easy for him, and lowering the intensity of the events and providing assistance when the task is too difficult. For example, if the user is at the step of containing the spill, the system can decide whether the spill overflows its boundaries, forcing the user to hurry and re-contain it. If the user is cleaning the spill too easily, the system can knock over another bottle, causing it to fall to the floor and break, creating a new spill (this, of course, can only happen if the bottle is available). On the other hand, if the user is having too much trouble solving his task, the system can help by hinting where the user should go and what he should do. This way we can create more events for the user to solve while keeping him engaged.
Research in the narrative field has established that creating a sense of conflict in users increases their engagement (Gerrig 1993). Conflict has also been well defined and modelled, as seen in Ware and Young (2011), who created a narrative planner that generates stories with conflict and defined a set of dimensions of conflict in order to quantify it. However, the planner does not create conflict in the stories automatically, but only allows it as one of the possible outcomes: it is left to the player to encounter it. Also, they did not use the dimensions of conflict they defined in their work, admitting that it was difficult to find a perfect way to include them and that they were somewhat subjective. Interestingly, they carried out an experiment (Ware et al. 2012) to validate these dimensions by contrasting how human subjects quantify conflict in stories with the algorithm they created to measure it (which includes setting some parameter values manually); the results showed agreement between the two quantifications, establishing that the dimensions can be recognized as qualities of conflict.
Narrative Event Generation in Bio-Safety Lab
In Bio-safety Lab, the system places the user in a virtual laboratory and tells him to perform certain tasks. At some point, while the user is handling lab materials, a bottle containing a human blood sample contaminated with bacteria will fall, creating an infectious spill. The user must clean up this potentially dangerous accident following a standard protocol.
The system has two separate parts: the interactive 3D Environment Module, which controls the virtual world where the user interacts, and the Task Recognition Engine, which is independent of the simulation. The 3D Environment contains a Common-Sense Reasoner that translates low-level events native to the 3D engine into high-level ones, called Task-level events, and sends them to the Task Recognition Engine. This translation is done by consulting a common-sense database containing a set of rules that relate the low-level events to the Task-level ones and also describe the consequences, if any, of each Task-level event. If the Reasoner decides that a consequence derives from a Task-level event, an event-dispatcher sub-module triggers the desired effect. For example, the user can drop a bottle on the floor, generating a physical event (the collision between the object and the floor). The Common-Sense Reasoner then checks the database and finds that the colliding object is fragile, so it should break. This break event will be sent to the Task Recognition Engine, and will also trigger a consequence in the environment: the breaking of the bottle into pieces.
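As an illustration, the translation step can be sketched as a rule lookup. The rule table, event names and object properties below are hypothetical examples of ours, not the actual contents of the common-sense database:

```python
# Hypothetical sketch of the Common-Sense Reasoner's translation step.
# Rule keys, event names and object properties are illustrative only.

COMMON_SENSE_RULES = {
    # (low-level event, object property) -> (task-level event, consequence)
    ("collision_with_floor", "fragile"): ("object_broken", "spawn_broken_pieces"),
    ("collision_with_floor", "liquid_container"): ("spill_created", "spawn_spill"),
    ("tap_opened", "water_source"): ("water_running", "play_water_flow"),
}

def translate(low_level_event, object_properties):
    """Map a physics-level event to a Task-level event plus an optional
    consequence to dispatch back into the 3D environment."""
    for prop in object_properties:
        rule = COMMON_SENSE_RULES.get((low_level_event, prop))
        if rule is not None:
            return rule  # (task-level event, consequence)
    return (None, None)  # no task-level meaning; ignore the event

# Example from the text: a fragile bottle hits the floor and breaks.
event, consequence = translate("collision_with_floor", ["fragile", "glass"])
```

The returned consequence is what the event-dispatcher sub-module would execute in the environment, while the Task-level event is forwarded to the Task Recognition Engine.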
The Task Recognition Engine contains two connected modules: the Task Model Reasoner and the Narrative Manager. The Task Model Reasoner is in charge of providing assistance to the user by supervising his actions and sending him messages whenever he makes a mistake during the current task, and is detailed in depth in Prendinger et al. (2014, under submission). The Narrative Manager also monitors the user and triggers events in the 3D virtual world in order to keep the user engaged in his tasks. In the next subsections, we will describe both in depth. Finally, the Narrative Manager has been developed as an independent module that works by sending and receiving signals from the simulation engine, so it could be incorporated into other training systems provided that a Task Model Database for that domain has been defined.
Task Model Reasoner
The Task Model Reasoner receives messages from the Common-Sense Reasoner whenever a Task-level event occurs there. These messages are used to decide what kind of task the user is performing and whether he is doing it correctly. We use a knowledge-based representation in the form of Task Trees to identify the different steps of the available tasks. This structure can deal with different levels of knowledge abstraction and supports real-time monitoring and assistance, containing information about the possible mistakes that can be made and how to correct them.
The system monitors user actions by receiving task-level (high-level) events from the virtual environment and matching them with the nodes of the Task Tree. To represent correct steps and mistakes, each leaf node is marked either as a correct step in the task or as an error. The OR groups contain the possible documented errors. Error nodes are kept in the same group as their sibling nodes, usually an OR group, because they represent one more option for performing the subtask. The rationale is that an error is a physically plausible, if wrong, way to perform a step of the task, and this placement allows it to be recognized as an error while that step is being performed. Whenever a correct step is matched with a leaf node, it is instantiated in the task tree and the tree is updated from the bottom up, instantiating parent nodes as necessary.
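A minimal sketch of this matching and bottom-up instantiation is given below, under the simplifying assumption that a parent is instantiated once all of its children are (AND semantics only; the OR/SEQ group rules and closures of the real Task Trees are not reproduced). Node and function names are ours, not the system's:

```python
# Sketch of leaf matching and bottom-up instantiation in a task tree.
# Simplification: a parent node is instantiated when all children are.

class Node:
    def __init__(self, name, children=None, is_error=False):
        self.name = name
        self.children = children or []
        self.is_error = is_error          # documented mistake for this step
        self.parent = None
        self.instantiated = False
        for c in self.children:
            c.parent = self

def _find_leaf(node, name):
    if not node.children:
        return node if node.name == name else None
    for child in node.children:
        found = _find_leaf(child, name)
        if found:
            return found
    return None

def match_event(root, event_name):
    """Match a task-level event against a leaf. Correct steps are
    instantiated and propagated upwards; errors are recognized but
    never instantiated."""
    leaf = _find_leaf(root, event_name)
    if leaf is None or leaf.is_error:
        return leaf
    leaf.instantiated = True
    node = leaf
    while node.parent and all(c.instantiated for c in node.parent.children):
        node.parent.instantiated = True
        node = node.parent
    return leaf

# Example: a two-step subtask plus a documented error option.
clean = Node("clean_spill", [Node("put_towels"), Node("disinfect")])
root = Node("handle_spill", [clean, Node("wipe_with_hand", is_error=True)])
match_event(root, "put_towels")
match_event(root, "disinfect")   # clean_spill is now instantiated
```

In this sketch an error leaf is returned to the caller so that an assistance message can be issued, mirroring the behaviour described in the next paragraph.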
If the received event is considered a user mistake, the engine will not instantiate it, but will send an assistance message back to the virtual environment in real time and display it to the user. There are two types of mistakes: actions that are explicitly represented in the task model as incorrect, and actions that are part of the correct procedure but should not be executed yet. In the first case the system explains to the user why he should not perform that action; in the second case the system displays the action the user has to perform before the one just attempted. As an additional feature, the system collects information about user behaviour and reactions to accidents and mistakes, a useful source for posterior analysis identifying the more difficult parts of the training, or for user profiling.
The Narrative Manager uses a set of parameters that describe the dramatic value of events and of the current situation. These parameters are used to decide which event will be triggered next and are based on the dimensions of conflict described by Ware and Young (2011). These dimensions are four: balance, which measures the relative likelihood of each side in the conflict to succeed; directness, a measure of the closeness of the participants, including spatial, emotional and interpersonal distance; intensity, the difference between how likely it is for a participant to succeed and how unlikely it is for them to fail; and resolution, which measures the difference in a participant's situation after a conflict event occurs.
Balance: an integer that describes how good or bad an event is for the user (positive if it makes the task easier, negative if it hinders his or her performance). Each event's Balance value is stored in the Task Tree and assigned by the domain expert who designs the tree. For example, breaking a bottle that contains an infectious sample has a Balance of −6, while giving the user a hint has a Balance of +4. The Narrative Manager also maintains a global value for the accumulated balance of the whole training session, obtained by summing the Balance of the triggered events; we call this parameter Global Balance. Global Balance attenuates slowly over time, tending towards 0. The rationale behind this attenuation is that receiving hints or encountering difficulties affects the user in the short term, but the effect disappears as time goes by: hints are useful only for the immediate task, and problems can be resolved by the user if he/she takes some time to think.
Impact: a positive value that represents the dramatic load an event has for the user when triggered. This value is also stored in each Task Tree node representing an event and is assigned by the domain expert. An event with a Balance value near 0 can still have a high Impact if it has a strong visual or psychological effect. For example, when a toxic spill overflows its borders, it has a Balance of −2, being not very bad for the user but very startling, and so has an Impact of 5.
Intensity: a global positive value maintained by the system that describes how dramatic the current situation should be for the user. Intuitively, when the Intensity is high, the system is likely to trigger a bad event soon. Its value starts at zero and is updated dynamically by the Narrative Manager, which adds or subtracts the Impact value of each triggered event depending on the sign of that event's Balance. For example, after the breaking of a bottle the system decreases its Intensity: the bottle-breaking event has an Impact of 10 and a negative Balance, so the Intensity decreases by 10. This means that the system will not create large-Impact events for some time. Conversely, after sending a hint to the user, the Intensity increases by 4 points, allowing larger events to be triggered in the future.
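The bookkeeping for these parameters can be sketched as follows. The numeric values in the usage example follow the examples in the text (hint: Balance +4, Impact 4; bottle break: Balance −6, Impact 10); the class itself, and the clamping of Intensity at zero, are our assumptions:

```python
# Illustrative bookkeeping for Intensity and Global Balance.
# The class is a hypothetical reconstruction, not the actual module.

class DramaState:
    def __init__(self):
        self.intensity = 0.0       # how dramatic the situation may become
        self.global_balance = 0.0  # accumulated good/bad effect on the user

    def register_event(self, balance, impact):
        # Intensity grows after positive events and shrinks after negative
        # ones; the text describes Intensity as a positive value, so we
        # clamp it at zero here (an assumption on our part).
        delta = impact if balance > 0 else -impact
        self.intensity = max(0.0, self.intensity + delta)
        # Global Balance accumulates the raw Balance values.
        self.global_balance += balance

state = DramaState()
state.register_event(balance=+4, impact=4)   # a hint is sent to the user
state.register_event(balance=-6, impact=10)  # a bottle breaks
```

After these two events the sketch leaves a low Intensity (no large events for a while) and a slightly negative Global Balance, matching the qualitative behaviour described above.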
As we can see, the values of Balance and Impact are static and quite subjective, since they relate to the dramatic quality of an event or the session. The domain expert defines them by judging each event when designing the scenario, and the Narrative Manager then uses these values to calculate the related dynamic parameters (Global Balance and Intensity, which depend not only on the two static parameters but also on the timing and type of user events). Based on these parameters, the Narrative Manager selects the next event to trigger and dynamically models the flow of dramatic intensity for each session, generating conflicts for the users, escalating the tension when necessary, and then relaxing the pace and the difficulty. The Narrative Manager is consulted whenever a high-level event is received by the Task Model Reasoner, or whenever a timer expires (in our system the timer was set to 15 s). The rationale behind calling the Narrative Manager periodically even if the user is not performing actions is that in a narrative, even moments when nothing happens are meaningful: the system sends an empty event so that the intensity can build and the manager can decide whether to trigger an event.
Whenever the Narrative Manager is called, it selects from the Task Trees the event that maximizes Impact, with the current Intensity as an upper bound, and whose Balance is most similar in magnitude but opposite in sign to the current Global Balance. We use Intensity as an upper bound because it represents the maximum dramatic load we allow for the user at one time, thus limiting the events that can happen. Also, the system has to move the Global Balance towards zero, so the desired event should have a Balance similar to the current Global Balance but with the opposite sign. Using both parameters maintains a balanced difficulty while creating events that impact the user dramatically. It is possible that no event meets these conditions because the current Intensity is too low; in that case, no event is triggered.
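This selection rule can be sketched as follows, assuming candidates are (name, Balance, Impact) tuples already filtered by the Task Tree constraints. The tie-breaking order (maximal Impact first, then Balance closest to the opposite of Global Balance) is our reading of the description above, not a confirmed implementation detail:

```python
# Hedged sketch of the Narrative Manager's event selection.
# Candidate tuples and the tie-breaking order are our assumptions.

def select_event(candidates, intensity, global_balance):
    """candidates: list of (name, balance, impact) tuples taken from the
    Task Trees after group and closure constraints are applied."""
    # Intensity is the upper bound on the dramatic load allowed right now.
    allowed = [c for c in candidates if c[2] <= intensity]
    if not allowed:
        return None  # Intensity too low: trigger nothing
    # Prefer the events with maximal Impact.
    best_impact = max(c[2] for c in allowed)
    top = [c for c in allowed if c[2] == best_impact]
    # Desired Balance is the opposite of the current Global Balance,
    # so the session drifts back towards equilibrium.
    target = -global_balance
    return min(top, key=lambda c: abs(c[1] - target))

events = [("overflow", -2, 5), ("break_bottle", -6, 10), ("hint", +4, 4)]
chosen = select_event(events, intensity=8, global_balance=+3)
```

With Intensity 8 the bottle break (Impact 10) is out of reach, so the sketch picks the spill overflow, the highest-Impact event still allowed.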
The event selection is carried out by an engine that traverses the scenario task tree, selects the nodes that maximize these values, and then picks one event from these candidates at random. The selection of nodes follows the rules stated for sibling nodes in the previous section: for example, the Narrative Manager will not execute a node in an OR branch if a sibling node has already been executed, and it respects SEQ branches by executing nodes in order. It also takes the closures in the nodes into account, selecting a node only if its closure has not been triggered. Finally, once the desired event is selected, the Narrative Manager updates the Intensity and Global Balance again and sends the order to execute the event back to the virtual environment.
Finally, each time an event happens (whether performed by the user or triggered by the Narrative Manager), the Narrative Manager keeps the global values of Balance and Intensity updated. Intensity is updated by adding or subtracting the Impact value of the event that just happened, depending on the sign of its Balance (positive for adding, negative for subtracting). The intuition is that after an event with high Impact and negative Balance (difficult for the user), the drama manager does not disturb the user for some time; conversely, after a “good” event, it is more likely that something bad will happen. The current Global Balance is updated by adding the Balance value of the event. If the Narrative Manager receives the empty event, nothing relevant has happened but a significant amount of time has passed, so it increases the Intensity and lets the Global Balance slowly converge to zero. We increase the Intensity because giving the user time helps him/her decide how to complete the task (hence it is beneficial), so after some time an event should trigger. The Balance changes over time because the effects of an event diminish: the balance returns to zero as the situation becomes neither beneficial nor prejudicial to the user. For example, in the case of a negative event, if a bottle breaks the user can use some time to recall the steps of the protocol or to look for materials to clean it. In the case of a positive event, if the user received a hint about what material to use for cleaning the spill, the effect of the hint only lasts until the user finds and uses the object; if after that the user does not know what to do and spends some time thinking, the positive balance decreases.
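The empty-event update might be sketched as below. The step size for Intensity and the decay factor for Global Balance are illustrative assumptions, since the text does not give concrete values:

```python
# Hedged sketch of the update on an empty (timer) event: Intensity grows
# so that something eventually triggers, and Global Balance decays
# towards zero as the effects of past events fade. Step sizes are
# illustrative, not taken from the actual system.

def on_empty_event(intensity, global_balance,
                   intensity_step=2.0, decay=0.8):
    intensity += intensity_step   # idle time builds dramatic tension
    global_balance *= decay       # past events matter less and less
    return intensity, global_balance

i, b = 0.0, -6.0                  # e.g. just after a bottle broke
for _ in range(5):                # five quiet 15-second ticks
    i, b = on_empty_event(i, b)
```

After a few quiet ticks the Intensity has risen enough to permit a new event, while the lingering negative Balance from the accident has mostly faded, which is the qualitative behaviour the paragraph describes.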
The participants read a manual with the protocols for clearing certain accidents in a bio-safety laboratory. The manual contains a brief description of how to clear the accident in the application, as well as a guide with pictures of the laboratory materials and instruments used in the training session. The subjects can consult this manual at any moment during this stage. After reading the manual, they run a tutorial in order to learn how to use the application: the subjects learn how to control their avatar, familiarize themselves with the environment, and learn the use of certain objects that they will have to use later. The goal of this stage is to eliminate possible usability problems and to provide the domain knowledge necessary to perform the tasks in the virtual environment.
The participants are then randomly assigned to one of two groups, the Script condition and the Narrative condition, each running two sessions of the scenario. The participants do not know which group they are in and cannot see each other's screens, so they are blind to the condition. Participants in the Script condition run the training scenario twice in a version of the Bio-safety Lab without the Narrative Manager, while participants in the Narrative condition run it twice in the version with the Narrative Manager, which dynamically triggers events in real time. Both versions contain one pre-scripted accident (spilling human blood infected with bacteria; see Fig. 1 for a graphical depiction of a sample session without the Narrative Manager) soon after starting the application; until then the two scenarios are identical, and we take this instant as the start point of the experiment. A session finishes when the user runs out of time (12 min) or cleans all the existing spills. Both versions also use the Task Model Reasoner to recognize the user's tasks and give a warning when the user makes a mistake. In summary, in the version without the Narrative Manager the only system events are the initial scripted spill accident and the Task Reasoner Module warnings reactive to the user's mistakes, whereas the version with the Narrative Manager additionally includes dynamically generated system events and hints. After the subjects finished, they filled in a Knowledge Test about the spill cleaning protocol, containing questions such as “Is there an incorrect position to handle a spill? Where and why?”, “What is the correct order to cover the spill?” or “Which material should be used for bordering the spill?”. Then the subjects filled in a Perception of Learning questionnaire used to capture their impressions of learning from the simulation.
This information is important and is one of the sources for evaluating training tools (Schumann et al. 2001). After the Perception of Learning questions, they filled in an adapted Perceived Interest Questionnaire (Schraw 1997) to evaluate how interesting the system they ran was. We do not use all the questions of the Perceived Interest Questionnaire because, as it is a generic test for narrative works, not all of them were applicable to our system.
Once the participants completed the questionnaires, they ran the scenario twice again, this time in the other condition. The reason for this second pair of runs with the other version of the Bio-safety Lab is to give the participants the opportunity to compare which one is more interesting. Finally, the participants are given the Interest questionnaire again, plus three additional comparison questions, to evaluate the system they have just run. However, we did not make them perform the Knowledge Test again because the scenario in our experiment was simple enough that, after all the runs, the users would easily remember everything.
Questionnaire used in the experiment, containing questions about recall and perceived interest
I remember the session in the virtual environment completely
I knew what to do next at all times
I noticed warning messages from the system
When reading a warning message from the system, I understood why
When reading a warning or an advice message from the system, I knew what to do next
I noticed differences in the application between sessions
I thought the scenario tried to help me while using the application
I thought the scenario tried to obstruct me while using the application
I remember how to treat a spill of infected blood
I understand the problems that can arise solving a spill of infected blood
I think this application would be very useful in training laboratory procedures
The events in the system were different each time I played
The session was suspenseful
The scenario grabbed my attention
I would like to play again if I had the chance
I thought the session was very interesting
I got caught up in the session without trying to
I would like to know more about the scenario
I liked this session a lot
I thought the first system was more amusing than the other
I thought the first system was more difficult than the other
I thought the first system was more challenging than the other
Results and Discussion
We extracted the subjects' behaviour from the logs that Bio-safety Lab creates, in order to chart how they proceeded to task completion. We chose burn-down charts because they are widely used as a graphical representation of work left to do versus time (Cohn 2005). On the vertical axis we used the distance to the goal as the heuristic range: the number of steps the user still has to take to clean a spill, starting at 23 and finishing at 0. Each dot on the graph represents one action from the user or one event from the system, and when the value reaches zero on the heuristic range axis the subject has cleared the scenario.
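Deriving such a burn-down curve from a session log could look like the following sketch. The log format (timestamped entries carrying the number of steps each event advances, or sets back, the task) and the function name are assumptions for illustration; the Bio-safety Lab's actual log schema is not described at this level of detail.

```python
def burn_down(log, start_distance=23):
    """Return (time, remaining-steps) points for a burn-down plot.

    `log` is a list of (timestamp, delta) pairs, where delta is the number
    of steps toward the goal completed by that user action or system event
    (negative if the event sets the user back). The session is cleared
    when the remaining distance reaches 0.
    """
    points = [(0.0, start_distance)]
    remaining = start_distance
    for t, delta in log:
        remaining = max(0, remaining - delta)
        points.append((t, remaining))
    return points
```

Plotting these points with time on the horizontal axis and the heuristic range on the vertical axis reproduces the kind of task-progression graph discussed below.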
Finally, when comparing the Task Progression graphs from the Narrative and Script conditions, we see that in the Script condition the curve proceeds directly towards the end, contains far fewer events (counting user and scenario events, including warnings and hints), and takes less time to complete. On average, the sessions driven by the Narrative Manager contained 154.29 events, against 76.36 in the sessions without it. It would be interesting to compare against a scripted version with an equivalent number of system events; however, scripting those events would give the version without the Narrative Manager an actual narrative of its own, so we decided to keep only the necessary minimum of scripted events. It would also be interesting to test the Narrative system against one with randomly generated events; we elaborate on this idea in the next section.
Average results of the comparison questions
Average (out of 10)
I think the NC system was more amusing than the other
I think the NC system was more difficult than the other
I think the NC system was more challenging than the other
Average results of the perceived interest questions
With narrative manager
Without narrative manager
T test value
Holm-Bonferroni correction α
1.- The session was suspenseful
2.- The scenario grabbed my attention
3.- I’d like to play again if I had the chance
4.- I thought the session was very interesting
5.- I got caught up in the session without trying to
6.- I’d like to know more about the scenario
7.- I liked this session a lot
Average results in the perception of learning questions
With narrative manager
Without narrative manager
T test value
Holm-Bonferroni correction α
1.- I completely remember the session in the virtual environment
2.- I knew what to do next at all times
3.- I noticed warning messages from the system
4.- When reading a warning message from the system, I understood why
5.- When reading a warning or an advice message from the system, I knew what to do next
6.- I noticed differences in the application between sessions
7.- I thought the scenario tried to help me while using the application
8.- I thought the scenario tried to obstruct me while using the application
9.- I remembered how to treat a spill of infected blood
10.- I understood the problems that can arise when cleaning a spill of infected blood
11.- I thought this application would be very useful in training laboratory procedures
12.- The events in the system were different each time I played
Moreover, the subjective Perception of Learning questions shown in Table 4 also yield better values. We interpret these questions only as indicative of the subjects' training experience, since the knowledge test already serves as an objective metric of learning improvement. The results are better for the Narrative Manager system in all questions. The subjects perceived that they remembered the training session better with the Narrative Manager. They also perceived more help from the system and remembered the messages they received better, as the Narrative condition effectively helps the user and has more warning messages, including proactive hints for the user (the version without the Narrative Manager only has reactive warning messages). Interestingly, however, obstruction from the system is not strongly perceived. The users noticed differences between the two rounds of the scenario, and even though both systems use the same warning messages to correct the user's mistakes in the same way, these messages are better perceived in the Narrative Manager version.
In this paper we have presented a new method for controlling events in a training session, using narrative techniques adapted to a training domain. We applied this method in an intelligent training system for biohazard laboratory procedures based on real-time recognition of the user's tasks. Into this application we integrated a Narrative Manager module which communicates with the task recognition system and uses the same knowledge structure for modeling scenario tasks. The Narrative Manager decides which event will be triggered and when, depending on the user's performance and the difficulty of the situation, which in turn balances the training session. Our hypothesis is that this technique keeps the user interested in the training: as a consequence, the Narrative Manager is able to maximize the number of events and tasks for the user to solve while keeping him/her engaged, leading to more knowledge acquisition. Even though our subjects were computer science students, we believe the results will generalize to other groups, especially those interested in bio-safety procedures.
Our main contribution is the design and integration of the Narrative Manager within a virtual training program, together with the method it uses to control the session. This method is inspired by a narrative conflict model, adapted for training scenarios, and uses a knowledge structure called Task Trees that enables the work to be applied to different training domains. By controlling the level of conflict in the training session, we keep it dynamically balanced for the user and modulate narrative intensity to maintain the user's interest. The Narrative Manager is an independent module, so, with the correct domain definition in the knowledge base, it can potentially be applied to other types of tutors or training scenarios, e.g. evacuation training, law enforcement, manual assembly, or machinery work.
We tested our system in a pilot study which showed that the Narrative Manager keeps the users effectively interested, and we depicted how it models the training sessions. Users train for longer when the session is interesting to them, and they remember the training procedures better. Even though the Narrative Manager presents more tasks for the user to solve, it is not perceived as more difficult, and the users report a better experience of the session. From these facts we conclude that the Narrative Manager has a positive impact on the learning outcome of the training session. We also analyzed a sample of the subjects' sessions and showed how the Narrative Manager shapes the session and its balance over time, resulting in a narrative curve.
As future work, we plan to improve the Narrative Manager so that it can model a training session using a predefined narrative curve as an input template, improving the authoring capacity of the application. With this new capability, the training designer would not only model the system knowledge but could also propose how the training should proceed by giving high-level directions. The Narrative Manager would still decide events and timing, but it would adjust itself to the input template, creating sessions that differ each time but have similar outcomes. We also want to apply our narrative method to different domains in order to demonstrate its versatility, and we would like to run a larger-scale experiment with a more complex training procedure. Such an experiment would allow us to better analyze the learning outcome of our system and to obtain more definite results, confirming or refuting the results obtained here. For that experiment we want to use more control conditions: one with only the necessary minimum of scripted events, as before; another with randomly generated events; and a third without proactive hints. Comparing the results of those systems would be very interesting: in theory, if the events are random, there is a possibility that a course of generated events could create a narrative session, and we would also be able to identify the concrete impact of the hints on users' performance. In that case the results could be similar to those of the Narrative system, so we should analyze with what probability, or under what conditions, such results can be obtained.
This work was supported partly by (1) the Health and Labour Science Research Grants, Research on Emerging and Reemerging Infectious Diseases (Research on development of teaching materials and methodology to evaluate performance to strengthen biorisk management [H20-Shinko-Ippan-009]) from the Ministry of Health, Labour and Welfare of Japan as a contract research, (2) the ‘Global Lab’ Grand Challenge grant from the National Institute of Informatics, (3) Fundação para a Ciência e a Tecnologia, under project PEst-OE/EEI/LA0021/2013, and (4) the Spanish Ministry of Economy and Competitiveness under grant TIN2009-13692-C03-03. The authors would like to thank A. Nakasone and H. Damas for their help with the system development, and E. Gray for his help with the graphical assets.