Exploring spiral narratives with immediate feedback in immersive virtual reality serious games for earthquake emergency training

Various approaches have been used to teach individuals best practice for earthquake emergencies. Among them, Immersive Virtual Reality Serious Games (IVR SGs) have been suggested as an effective tool for emergency training. The notion of IVR SGs is consistent with the concept of problem-based gaming (PBG), where trainees interact with games in a loop of forming a playing strategy, applying the strategy, observing consequences, and reflecting. PBG triggers reflection-on-action, enabling trainees to reform perceptions and establish knowledge after making a response to a scenario. However, in the PBG literature, little effort has been made to let trainees reflect while they are making a response (i.e., reflection-in-action) in a scenario. In addition, trainees cannot adjust their responses and reshape their behaviors according to their reflection-in-action. To overcome these limitations, this study proposes a game mechanism, which integrates spiral narratives with immediate feedback, to underpin reflection-in-action and reflective redo in PBG. An IVR SG training system suited to earthquake emergency training was developed, incorporating the proposed game mechanism. A controlled experiment with 99 university students and staff was conducted. Participants were divided into three groups, with three interventions tested: a spiral narrated IVR SG, a linear narrated IVR SG, and a leaflet. Both narrated IVR SGs were effective in terms of immediate knowledge gain and self-efficacy improvement. Challenges and opportunities for future research are also identified.


Introduction
Earthquakes are one of the natural hazards that heavily impact people's lives, with approximately 100 significant earthquakes causing damage around the world every year [62]. Suitable and timely behavioral responses are essential to save lives and reduce injuries during earthquake emergencies [5]. In recent years, Immersive Virtual Reality Serious Games (IVR SGs) have increasingly gained attention for education and training [27]. IVR offers credible virtual environments that engage users through a high level of immersion [21,56]. In the literature, IVR SGs have been reported to deliver quality training for emergencies [19].
Currently, most IVR SGs for emergency training apply reflection-on-action, in which trainees reflect on their actions after completing an activity. Little attention has been paid to reflection-in-action. Reflection-in-action refers to the reflection that takes place while individuals are carrying out an activity [30]. Individuals need real-time feedback to make sense of confronting situations, adapt themselves, adjust behaviors, and take actions accordingly [30]. Reflection-in-action shapes the output of individuals, in most cases modifying their behavioral responses to current situations [30]. Therefore, facilitating reflection-in-action is important when training individuals in behavioral responses to specific scenarios.
This study aims to investigate the integration of reflection-in-action and reflective redo with an IVR SG training system, targeting earthquakes and post-earthquake evacuation training. Research questions have been raised to guide the present study:
1. How can reflection-in-action and reflective redo be implemented in an IVR SG?
2. How effective is an IVR SG with reflection-in-action and reflective redo in terms of training?
The following research objectives were established in order to address the research questions:
1. To understand IVR SGs for emergency training, storylines and narratives in game-based learning (GBL), and reflection and redo in GBL.
2. To develop an IVR SG for earthquake emergency training with reflection-in-action and reflective redo, using spiral narratives and immediate feedback.
3. To validate the training effectiveness of the proposed IVR SG with reflection-in-action and reflective redo.
A literature review on IVR SGs, storylines, and reflection is outlined first. Following that, a detailed discussion of the design and development of the proposed IVR SG training system is mapped out. Reflective redo and reflection-in-action are realized through two game mechanisms: a spiral narrative to manipulate the storyline of the IVR SG (i.e., redo from the point when an error is made) and immediate feedback to foster reflection-in-action. Finally, an experiment assessing the effectiveness of the IVR SG training system is demonstrated, leading to the main findings and a discussion of further research.

State of the art

IVR SGs and problem-based gaming
IVR SGs enable training to take place in credible virtual environments. As such, IVR SGs have become popular for emergency training [19], covering a range of emergency situations, such as fire [57], aircraft evacuation [10,13], and earthquakes [38]. Smith and Ericson [57] have motivated children to learn about fire evacuation skills using a projection-based IVR SG. Trainees have acknowledged the value of IVR SGs because they provide highly engaging, interactive training environments and experiences. Burigat and Chittaro [10] put trainees in an IVR SG to learn and improve their spatial knowledge associated with aircraft. Results indicated that the IVR SG outperformed the traditional training approach (i.e., safety cards) in terms of teaching spatial knowledge. Through the IVR SG training environment, trainees had the chance to walk in an aircraft and get familiar with its spatial layout. Trainees paid more attention to learning tasks in the IVR SG than when using safety cards. In another aircraft emergency training, trainees learned the appropriate behavioral responses to aircraft emergency landing and evacuation through an IVR SG [13]. Results indicated that trainees who were trained in the IVR SG were able to retain more learned safety knowledge than those trained with safety cards. The possible reason is the high-level emotional arousal induced by realistic simulations (especially during hazardous events) within an IVR SG environment. Regarding earthquake training, Li et al. [38] proposed an IVR SG to teach self-protection skills in an earthquake. Results revealed that the IVR SG performed better than other traditional approaches, including videos and safety manuals. Li et al. [38] suggested that realistic experience and immersive practice could overcome the gap between theory and practice existing in traditional training approaches.
In sum, IVR SGs seem to be a promising and effective training tool for emergency situations. IVR SGs are in line with the concept of problem-based gaming (PBG), where trainees develop skills and acquire knowledge by solving authentic problems through the learning and practice of using games [36]. Figure 1 illustrates the concept of PBG. According to Kiili [36], in the first phase, trainees start to form a strategy to solve the present problems based on their prior knowledge and experience. Next, trainees actively solve the problems and observe the consequences of their responses. Then, trainees reflect on their strategy and performance [6]. Reflection is a vital part of PBG as it constructs cognition and synthesizes knowledge [36]. The trainees' behaviors continuing the game are determined by reflection, where trainees decide whether to stick with the previously formed strategy or alter it based on input from reflection [36].
PBG suggests that the interaction between trainees and games is an ongoing process causing a loop, with a focus on constructing knowledge and improving problem-solving strategies used by trainees continuously in the gameplay. With the current framework of PBG, after making a response, trainees observe consequences, engage in reflection, and carry on solving the next problems. However, the current PBG lacks the possibility for trainees to reflect while they are making a response. Trainees do not have a chance to adjust their responses and reshape their behaviors for the current scenario. In contrast, they only establish perspectives and knowledge with the reflection after the completion of the current scenario. As such, PBG has limited ability to enable reflection-in-action.

Storylines and narratives
Fundamentally, an IVR SG incorporates the concept of game-based learning (GBL), which covers three essential aspects: learning, play, and storylines [29]. A storyline is a game element that presents game content with structured events [32]. In GBL, learners engage with storylines to play and learn. Knowledge is conveyed in context through storylines within game environments [59]. The use of storylines is beneficial to improving the understanding and recall of presented materials [7]. In addition, storylines promote the immersion and motivation of learners [46]. Appealing storylines encourage learners to engage in gameplay, leading to enhanced learning effects [31]. As such, storylines set out a baseline in GBL, with a strong influence on other gaming characteristics, such as interactivity, challenges, and feedback [2].
Storylines consist of a set of scenarios. The way these scenarios are served to learners is referred to as narratives (i.e., storytelling methods) [12]. Narratives are essential to communicate learning content to learners [39]. In principle, there are two types of narratives for GBL: an implicit narrative and an explicit narrative [25]. In the case of implicit narratives, the storylines are not structurally outlined. The presentation of storylines depends on the exploration of learners in game environments. Regarding explicit narratives, storylines are clearly presented to learners. Storylines take the lead in learner activities. Implicit narratives may lead to an increased recall of spatial knowledge, whereas explicit narratives may perform better on transferring factual knowledge [25]. As such, the present study incorporates explicit narratives into the proposed IVR SG training system. With explicit narratives, one possible way to encourage learners to follow the lead of storylines is to use an action-driven method [22]. As a storyline is a set of structured scenarios, the progression from a scenario to the next scenario is driven by the actions taken by learners. In other words, in a scenario of storylines, learners may perform some activities, such as solving problems or completing tasks, to make progress to the next scenarios. Thus, scenarios are exposed to learners in a structured way. Eventually, the entire storyline is disclosed explicitly to learners.
Fig. 1 Problem-based gaming, derived from [36]

Reflection and redo
The term "reflection" has a variety of different interpretations in the literature. A general view refers to reflection as a form of mental process involved in learning and comprehension [42]. There are two categories of reflection: reflection-in-action and reflection-on-action [50]. Reflection-in-action means the reflection occurs while individuals are responding to a situation [30]. With the effort of making sense of confronting situations, individuals reflect on their understandings and make responses accordingly [30]. Reflection-in-action reshapes the perceptions of individuals and guides individuals' follow-on actions [30]. Reflection-on-action is a retrospective process that interprets and analyzes the recalled information on undertaken practices, upon the completion of the entire learning or training. The reflection turns the contemplation of information into knowledge through a postmortem cognitive process [30]. Individuals step back into previous experience and retrieve their memories with the purpose of comprehending situations and gaining knowledge.
In GBL, one effective way to facilitate reflection is using feedback [24]. Feedback is a game feature that directs learners to evaluate their performance, identify knowledge gaps, and obtain correct knowledge by receiving various forms of information [33]. When learners interact with game environments, they generate an effect on the environments, which in turn feeds learners with information to reaffirm or adjust the learners' approaches to the confronting situations [50]. Reflection-in-action occurs when learners receive feedback while they are responding to the game environment. The changes in the game environment impact the perspectives of learners on the surrounding situations. With feedback and reflection, learners develop new perceptions for current situations.
At the end of the reflection, it is possible for learners to adapt themselves to current situations with newly developed perceptions, enabled by reflective redo. Reflective redo allows learners to revisit a scenario and make a different response during a training experience, usually starting from the point when an error is made [53]. With the given feedback, they redo an activity that was not done appropriately, which can lead to a change in behavior and enhance their reflective thoughts to avoid the same mistake being made again [53]. In the literature of GBL, reflective redo has been studied for After Action Review (AAR) [53]. AAR is a retrospective process that provides feedback after training exercises, similar to reflection-on-action [3]. Scoresby and Shelton [53] demonstrate that reflective redo helps learners develop cognition on their actions with retrospective reflection (i.e., AAR) and eventually improve learning.
In the literature, little attention has been paid to the integration of reflective redo with reflection-in-action. This study explores the possible effect of reflection-in-action with the implementation of reflective redo, using an IVR SG training system. Reflective redo is enabled by game mechanisms applied to the narrative of the IVR SG training system. More details of the design and development of the IVR SG training system are outlined in the next section.

The IVR SG training system
The proposed IVR SG training system allows trainees to experience an indoor earthquake and post-earthquake evacuation in an office building setting. Trainees are expected to apply best evacuation practices and learn suitable safety knowledge in relation to behavioral responses. In this section, the design of storylines and narratives is discussed first, which is fundamental to enabling reflective redo. Following that, the use of immediate feedback is discussed, which facilitates reflection-in-action. Lastly, the development and deployment of the IVR SG training system are outlined.

Storylines and narratives
The target trainees of the IVR SG training system are adults, and the case study is the Faculty of Engineering at The University of Auckland, New Zealand (office building area). We followed the guidelines issued by the New Zealand Civil Defence and Emergency Management (NZCDEM) to identify a list of behavioural responses as the training objectives of the IVR SG training system (see Table 1) [45]. The guideline includes a total of 32 recommended behavioural responses, covering an extensive range of scenarios during and after an earthquake. Note that these are recommended behavioral responses, which does not necessarily imply that all of them must be taken into account in every single earthquake emergency. Real earthquakes are highly dynamic and unpredictable hazards, where situations change rapidly. In our case, we selected only the behavioral responses focused on indoor scenarios (at work) that were feasible to implement in our proposed virtual environment (an office setting). Accordingly, several scenarios were developed to form the storyline of the training for trainees to practice and learn knowledge in context [59]. An action-driven approach was applied to drive the storyline. The storyline progressed once trainees had taken actions in scenarios [22]. In our case, trainees had to select options to solve the problems they confronted in the IVR SG as the storyline went on.
Traditionally, storylines are narrated in a linear way. A linear narrative means that storylines are disclosed from the beginning to the end with no use of flashbacks [17]. In the case of IVR SGs, with the integration of an action-driven approach, storylines progress from one scenario to the next after trainees take actions to solve a problem in a scenario (see Fig. 2a). No matter how trainees perform, trainees do not have a chance to revisit the previous scenario and adapt their previous responses. Trainees experience each scenario only once from start to end of the storyline.
In order to support the reflective redo in IVR SGs, we propose a spiral narrative to progress storylines (see Fig. 2b). The progress of storylines depends on the performance of trainees. Trainees make a response to a problem in a scenario. Then trainees receive immediate feedback, indicating whether their responses are appropriate. If the responses are not appropriate, trainees stay in the same scenario after receiving the feedback. Storylines are not progressed from the current scenario to the next scenario until trainees respond correctly to the problem in the current scenario. Trainees can make several attempts to solve a problem in the same scenario. As such, redo from the point when an error is made is enabled.
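The difference between the two narratives can be sketched as a small control loop. The Python sketch below is illustrative only and is not the Unity implementation of the training system; the function `run_storyline`, the `choose` callback, and the option labels are hypothetical names introduced here.

```python
from typing import Callable, List


def run_storyline(correct_options: List[str],
                  choose: Callable[[int, int], str],
                  spiral: bool = True) -> List[int]:
    """Return the number of attempts a trainee spends in each scenario.

    correct_options[i] is the recommended response for scenario i.
    choose(i, attempt) is a hypothetical callback returning the option
    the trainee selects on a given attempt in scenario i.
    """
    attempts_per_scenario = []
    for index, correct in enumerate(correct_options):
        attempt = 1
        # Spiral narrative: stay in the same scenario until the response
        # is the recommended one (redo from the point of the error).
        # Linear narrative: progress after one attempt, right or wrong.
        while choose(index, attempt) != correct and spiral:
            attempt += 1
        attempts_per_scenario.append(attempt)
    return attempts_per_scenario


# Hypothetical trainee: wrong on the first attempt in scenario 1 only.
recommended = ["A", "B", "C"]


def trainee(index: int, attempt: int) -> str:
    if index == 1 and attempt == 1:
        return "X"
    return recommended[index]


print(run_storyline(recommended, trainee, spiral=True))
print(run_storyline(recommended, trainee, spiral=False))
```

In this sketch the spiral loop only terminates when the recommended option is eventually chosen, which mirrors the rule that the storyline does not progress until the trainee responds correctly.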

Immediate feedback
In order to foster reflection, immediate feedback is given to trainees after a specific response is undertaken [36]. Trainees are expected to think over their previous responses based on the feedback and learn from it. In the IVR SG training system, three types of stimulation are deployed with immediate feedback: image-based feedback, audio feedback, and text-based feedback. Image-based feedback means that a green check is shown for a recommended behavioral response and a red cross for a response that is not recommended (see Fig. 3b, c). Audio feedback is a sound effect ("Ding") that is triggered simultaneously with the image-based feedback for a green check. Text-based feedback is presented after image-based feedback (for green checks only) and is a short text explaining the recommended behavioral response for the current problem (or scenario) (see Fig. 3d). Trainees receive positive feedback after a correct response and negative feedback for incorrect responses. The exposure to negative feedback may result in a negative suggestion effect, in which case misinformation could be learned by trainees [9]. As such, the negative suggestion effect can jeopardize the acquisition of correct knowledge. To avoid this effect in our IVR SG training system, we propose varying degrees of stimulation for positive feedback and negative feedback. Intense stimulation is applied for positive feedback, with all three types of stimulation triggered: when trainees choose a recommended behavioral response, a green check pops up with the sound effect triggered at the same time, followed by a textual explanation. For negative feedback, weak stimulation is deployed with image-based feedback only: when trainees make a selection against recommended behavioral responses, a red cross is shown.
The differentiation in stimulation encourages trainees to pay more attention to positive feedback, with little attention to negative feedback that only indicates a response is not recommended. In this way, the message of recommended behavioral responses is emphasized instead of misinformation [9,11].
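The asymmetric stimulation described above can be summarized as a mapping from the correctness of a response to the feedback channels that are triggered. The following is a minimal Python sketch under that reading; the function name and the channel labels are hypothetical and do not come from the training system itself.

```python
from typing import List


def feedback_channels(is_recommended: bool) -> List[str]:
    """List the feedback channels triggered for a response, in order.

    Channel labels are illustrative placeholders.
    """
    if is_recommended:
        # Intense stimulation: green check and sound together,
        # followed by the textual explanation.
        return ["green_check", "ding_sound", "text_explanation"]
    # Weak stimulation: image-based feedback only.
    return ["red_cross"]


print(feedback_channels(True))
print(feedback_channels(False))
```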
Another important factor facilitating effective reflection enabled by immediate feedback is the reflection time after receiving the feedback. Insufficient reflection time could lead to inadequate reflection, which can harm the acquisition of knowledge [49]. Reflection time varies in the literature of IVR SGs, depending on the types of knowledge to be conveyed, such as procedural knowledge or factual knowledge [37]. In an IVR SG study where participants were trained about the behavioral responses to aircraft emergencies, a 7-second reflection time was provided after each immediate feedback [13]. Trainees could utilize this time interval to digest the feedback and trigger reflective thinking over their previous behaviours. At the same time, the IVR SG was paused temporarily without undergoing visual and acoustic simulations. In another IVR SG study where spatial knowledge was taught for evacuation, no reflection time was provided [10]. Effective training outcomes were obtained from both studies. In our case, the knowledge to be acquired by trainees is about the behavioral responses to earthquake emergencies. As such, we provided a 10-second reflection time after text-based feedback, when trainees had completed the current scenario (i.e., they had solved the problem correctly with a recommended behavioral response). Trainees were encouraged to use the reflection time to reflect on their previous performance in response to the confronted problems by reading the text-based feedback thoroughly [13]. The storyline of the IVR SG training system was paused during the reflection time. Once the reflection time was up, the storyline progressed to the next scenario.
Fig. 3 (a) A problem to solve (trainees need to find a place to take cover)

The setup of the IVR SG
A virtual earthquake takes place in the IVR SG training system. An office building of the University of Auckland was selected as the training location. We followed the Building Information Modelling (BIM)-based workflow proposed by Lovreglio et al. [41] to develop virtual environments. BIM-based workflow allows the accurate representation of building layouts and the manipulation of individual objects to permit credible earthquake simulations [20,41]. A basic building model defining the envelope and layout of the selected built environment was developed using Autodesk Revit (see Fig. 4), covering walls, ceilings, floors, doors, and windows. Next, it was imported to Unity for IVR and game features development. At this stage, low-polygon models of furniture and appliances were placed in the model, allowing a fluid IVR experience. A fluid IVR experience can avoid negative symptoms and effects, such as nausea and disorientation, resulting from low frames per second (FPS) [55].
A qualitative approach was applied to model an earthquake and provide trainees with the sense of being in an actual earthquake [41]. The actual performance of objects was not simulated, as the main focus of the IVR SG training system was delivering the knowledge of recommended behavioural responses, rather than structural simulations and analysis. With a qualitative approach, subjective descriptions were one of the data sources for the development of earthquake simulations [23]. In our case, we referred to the New Zealand Modified Mercalli Scale (MMI) to simulate an earthquake and its damage [28]. We selected the description of MMI 6: "Furniture and appliances may move on smooth surfaces, and objects fall from walls and shelves. Glassware and crockery break. Slight non-structural damage to buildings may occur" [28]. The reason to use MMI 6 is that it represents a strong earthquake, enabling trainees to build a clear perceptual picture of what an earthquake looks like; however, no significant structural damage is caused at this level. Structural damage is not necessary in the IVR environment, as the message to be delivered to trainees is the recommended behavioral responses in specific scenarios, in which case, scenarios in structurally damaged buildings are not included by the New Zealand national guidelines, and as such do not help to deliver the intended training outcomes. We continued the development in Unity for earthquake simulation and building damage. We manipulated the movement, orientations, and positions of individual objects in the IVR environment, providing the visual cues of earthquakes and damage. In addition, sound effects were integrated at the same time based on the description of MMI 6. Figure 5 demonstrates the virtual environment and earthquake damage.
The IVR SG training system was developed and deployed with Unity version 2018.2.14f1. A DELL PC workstation was used to run the IVR SG, which was equipped with an Intel Xeon W-2125 processor, an NVidia GeForce RTX 2080 graphics card, and 64 GB RAM. An Oculus Rift VR system enabled the IVR experience, with a head-mounted display (HMD), a remote controller, and two tracking sensors. The video output of the HMD was transmitted to an LED screen, which allowed real-time observation of the IVR experience of trainees.
Latency is an important factor in IVR systems, which could lead to a loss of performance and cybersickness [60]. In order to address latency, we applied a holistic approach, including software design and hardware configuration. Firstly, we used low-polygon models to decrease the rendering demand of the IVR environment, as described in this section. Secondly, we used Occlusion Culling in Unity to disable rendering of objects that are currently not visible to cameras (i.e., participants). Thirdly, we used a high-spec computer to run the IVR SG, as described in this section. Fourthly, we limited the movement of participants to a predefined route, using a waypoint navigation system [47]. Participants clicked a button on their remote controller, and the camera started to move from one waypoint to the next, following the route. In addition, participants remained seated during the entire IVR experience. They only needed to turn their bodies via a swivel chair to face their moving direction. Therefore, participants did not need to pay constant attention to movement and interaction in the IVR environment. Taken together, the first three approaches aimed to reduce the latency itself by increasing frames per second (FPS). The last approach aimed to reduce the impact of latency on participants by involving less interaction.
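The waypoint navigation idea (one button click moves the camera from the current waypoint to the next along a fixed route) can be sketched in a few lines. This Python sketch is not the Unity implementation; the class `WaypointRoute`, the helper `step_towards`, and the default speed and frame time are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def step_towards(position: Point, target: Point,
                 speed: float, dt: float) -> Point:
    """Move towards target by at most speed * dt, without overshooting."""
    delta = [t - p for p, t in zip(position, target)]
    distance = math.sqrt(sum(d * d for d in delta))
    max_step = speed * dt
    if distance <= max_step:
        return target
    scale = max_step / distance
    return tuple(p + d * scale for p, d in zip(position, delta))


class WaypointRoute:
    """Camera restricted to a predefined list of waypoints (hypothetical)."""

    def __init__(self, waypoints: List[Point]) -> None:
        self.waypoints = list(waypoints)
        self.index = 0                      # waypoint the camera rests at
        self.position = self.waypoints[0]

    def move_to_next(self, speed: float = 1.5, dt: float = 1 / 90) -> int:
        """Simulate one button click; return the number of frames used."""
        if self.index + 1 >= len(self.waypoints):
            return 0                        # already at the end of the route
        target = self.waypoints[self.index + 1]
        frames = 0
        while self.position != target:
            self.position = step_towards(self.position, target, speed, dt)
            frames += 1
        self.index += 1
        return frames


route = WaypointRoute([(0.0, 0.0, 0.0), (3.0, 0.0, 4.0)])
print(route.move_to_next(speed=1.0, dt=1.0), route.position)
```

Because the trainee only triggers the move and never steers, the interaction demand stays low, which is the point made in the paragraph above.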

Research methods
To evaluate the proposed IVR SG training system, we conducted an experimental study comparing a version of the IVR SG with redo (a spiral narrative), a version without redo (a linear narrative), and a traditional leaflet (control group). The three approaches will be referred to as Spiral narrative, Linear narrative, and Leaflet. The present study followed a pretest-posttest research design. Prior to the training via each training approach, a pretest measure on the outcome of interest was administered to trainees, followed by the same measure after training (post-test). This section outlines the materials of the research and the information about trainees first. Then the measures to assess the effectiveness of the IVR SG training system are discussed. Finally, the procedure to conduct the experimental study is described.

Materials
The Spiral narrative and Linear narrative groups were treated with the IVR SG training system, using the software and hardware described in Section 3.3. The only between-group difference was the narrative mechanism, in which case Spiral narrative offered reflective redo and Linear narrative did not. Regarding Leaflet, an A4-sized paper was printed with instructions about the expected behavioral responses, as recommended in Table 1.

Trainees
Ninety-nine university students and staff (44 females and 55 males), with ages ranging from 18 to 53 years old (mean = 26.9, standard deviation = 7.74), participated in the experiment. Trainees were recruited by posters, emails, and referrals. Trainees were randomly assigned to three groups, with 33 trainees in each group. Previous experience with earthquake drills and IVR was collected from trainees (see Table 2). No significant differences were revealed between groups based on Kruskal-Wallis tests (earthquake drills, p = 0.593; IVR, p = 0.523).
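A balanced random assignment of 99 trainees to three groups of 33 can be obtained by shuffling the list of trainees and cutting it into equal slices. The sketch below shows one such procedure in Python; it is an assumption about how an assignment of this kind could be produced, not a description of the procedure actually used, and the function name, trainee IDs, and seed are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def assign_groups(trainee_ids: Sequence[str],
                  groups: Sequence[str],
                  seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle trainees and split them into equally sized groups."""
    ids = list(trainee_ids)
    if len(ids) % len(groups) != 0:
        raise ValueError("trainees cannot be split into equal groups")
    random.Random(seed).shuffle(ids)        # seeded for reproducibility
    size = len(ids) // len(groups)
    return {group: ids[i * size:(i + 1) * size]
            for i, group in enumerate(groups)}


ids = [f"T{i:02d}" for i in range(99)]      # placeholder trainee IDs
allocation = assign_groups(
    ids, ["Spiral narrative", "Linear narrative", "Leaflet"])
print({group: len(members) for group, members in allocation.items()})
```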

Measures
The research questions focus on the effectiveness of the IVR SG training system in terms of delivering training outcomes. The training outcomes lie in the enhancement of the safety knowledge of appropriate behavioral responses and the self-efficacy in coping with earthquake emergencies [23]. Following a pretest-posttest research design, a questionnaire (see Sections 4.3.1 and 4.3.2) measuring the safety knowledge and self-efficacy of trainees was administered before and after the execution of the training. In addition, after training, we collected user feedback on engagement, mainly about attention (see Section 4.3.3). We also measured the ease of training, cybersickness, and realism for the IVR groups.

Safety knowledge
Trainees are expected to be able to deal with similar real-life scenarios after training [38]. In our case, the safety knowledge learned through training is the appropriate behavioral responses to an earthquake and post-earthquake evacuation, as recommended by national guidelines (see Table 1). In order to measure the acquisition of safety knowledge, a true-false knowledge test containing ten questions was established (see Table 3). The test aimed to evaluate whether participants understood what the appropriate behavioral responses in earthquake emergencies are. True-false questions have been applied in the literature for knowledge tests [35]. Trainees were instructed to identify true statements only. Possible test scores ranged from 0 to 10, where trainees lost one mark if they missed a true statement or picked a false statement as a true one. Table 3 illustrates the statements and the correct answers to the knowledge test.
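Under the scoring rule above, the score equals the number of statements minus the number of errors, where an error is either a missed true statement or a false statement picked as true. The Python sketch below expresses that rule; the function name is hypothetical and the answer key is a placeholder, not the key from Table 3.

```python
from typing import Sequence


def knowledge_score(answer_key: Sequence[bool],
                    selected: Sequence[bool]) -> int:
    """Score a true-false test in which trainees mark true statements.

    answer_key[i] is True if statement i is true.
    selected[i] is True if the trainee marked statement i as true.
    One mark is lost per missed true statement or wrongly picked
    false statement.
    """
    if len(answer_key) != len(selected):
        raise ValueError("answer key and selection differ in length")
    errors = sum(1 for truth, picked in zip(answer_key, selected)
                 if truth != picked)
    return len(answer_key) - errors


placeholder_key = [True, False] * 5          # NOT the key from Table 3
print(knowledge_score(placeholder_key, placeholder_key))
print(knowledge_score(placeholder_key, [False] * 10))
```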

Self-efficacy
Self-efficacy means the beliefs people hold in their competency to solve problems and overcome difficulties [4]. High-level self-efficacy may result in a change in behavior, which leads to the improvement of performance when dealing with problems and difficulties [58]. The General Self-Efficacy Scale has been suggested to measure self-efficacy in the literature, providing a list of statements [52]. Based on those statements, a six-statement self-efficacy test was developed, focusing mainly on the perceptions towards earthquakes and post-earthquake evacuation: 1. "I know what to do when facing an earthquake"; 2. "I can remain calm when facing an earthquake"; 3. "I have the confidence to deal with an earthquake emergency"; 4. "I can come up with a plan for responses to an earthquake"; 5. "I can handle situations during an earthquake"; 6. "I can think of a solution if I am in trouble during an earthquake." Trainees were asked to rate their levels of agreement to each statement based on a 7-point Likert scale, with −3 for totally disagree and +3 for totally agree. The total score was calculated by finding the sum of the scores for each statement, with a higher score representing a higher level of self-efficacy [52]. In our case, the possible total score ranged from −18 to +18.
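The total self-efficacy score is a plain sum of six ratings, each between −3 and +3, which gives the stated range of −18 to +18. A minimal Python sketch of this calculation follows; the function name and the input validation are additions made here for illustration.

```python
from typing import Sequence


def self_efficacy_total(ratings: Sequence[int]) -> int:
    """Sum six Likert ratings, each an integer from -3 to +3."""
    if len(ratings) != 6:
        raise ValueError("expected ratings for six statements")
    if any(r not in range(-3, 4) for r in ratings):
        raise ValueError("each rating must be an integer from -3 to +3")
    return sum(ratings)


print(self_efficacy_total([3, 3, 3, 3, 3, 3]))
print(self_efficacy_total([0, 1, 2, 3, -1, -2]))
```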

Attention
For IVR groups, we deployed a self-reported questionnaire to assess to what extent the training approaches attracted trainees' attention. The questionnaire included three statements, following the measurements applied by Burigat and Chittaro [10]: 1. "It was easy for me to concentrate on my learning"; 2. "It was easy for me to stay focused on the task"; 3. "I felt the training was fun." Trainees answered the questionnaire based on a 7-point Likert scale, with −3 for totally disagree and +3 for totally agree. The total score was calculated by finding the mean of the three statements for each trainee [10].

Ease of training
In this study, ease of training represents the ease of narratives and feedback to facilitate the learning process in the IVR SG training system. Following the measurements deployed by Chittaro and Sioni [14], we developed a questionnaire for IVR groups, including four statements for trainees to answer: 1. "The training storyline helped me to learn"; 2. "It was easy for me to understand the learning content"; 3. "It was easy for me to learn about what to do during and after earthquakes"; 4. "It was easy for me to remember what I have learned." Trainees were asked to rate their levels of agreement to each statement based on a 7-point Likert scale, with −3 for totally disagree and +3 for totally agree. The total score was calculated by finding the mean of the four statements for each trainee [14].
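Unlike the self-efficacy total, the attention and ease-of-training scores are per-trainee means, so they stay on the original −3 to +3 scale regardless of the number of statements. The following Python sketch illustrates the calculation; the function name is hypothetical and the ratings shown are made-up inputs, not study data.

```python
from typing import Sequence


def scale_mean(ratings: Sequence[int], n_items: int) -> float:
    """Mean of a trainee's Likert ratings (-3 to +3) for one questionnaire."""
    if len(ratings) != n_items:
        raise ValueError(f"expected {n_items} ratings")
    if any(r not in range(-3, 4) for r in ratings):
        raise ValueError("each rating must be an integer from -3 to +3")
    return sum(ratings) / n_items


# Made-up ratings for one trainee (not data from the study).
print(scale_mean([3, 2, 1], n_items=3))       # attention, three statements
print(scale_mean([3, 3, 2, 2], n_items=4))    # ease of training, four statements
```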

Cybersickness
As discussed in Section 3.3, we applied a holistic approach to address latency issues. In order to measure cybersickness, for IVR groups, we used the following statement for trainees to rate: "The VR experience made me dizzy".
Trainees were asked to rate their level of agreement with this statement on a 7-point Likert scale, from −3 (totally disagree) to +3 (totally agree).

Realism
Another aspect of user experience measured in this study was the realism of virtual environments. For IVR groups, we used the following statements for trainees to rate: 1. "The building environment was realistic"; 2. "The VR experience was realistic".
Trainees were asked to rate their level of agreement with each statement on a 7-point Likert scale, from −3 (totally disagree) to +3 (totally agree).

Procedure
This experimental study took place at the University of Auckland, New Zealand. Trainees were informed via a participant information sheet that the experiment involved a visual simulation using an IVR headset. Trainees were randomly assigned to three groups prior to participation. Upon arrival, trainees signed consent forms, giving their consent for their participation and for the collection of data for research analysis. Ethics approval (Protocol No. 016763) was granted by The University of Auckland Human Participants Ethics Committee. Trainees could withdraw their participation at any time without giving any reason. Then, trainees answered a questionnaire that covered demographic information, prior experiences with earthquake drills and IVR, a knowledge test, and a self-efficacy test.
Next, trainees in the IVR groups received an induction about using IVR as well as health and safety instructions. After that, trainees put on an IVR headset and were assisted in getting a clear view of it. Personal glasses were kept on where possible. Once the IVR session started, trainees received a tutorial to familiarize themselves with IVR environments as well as the interaction with problems and immediate feedback. The actual training took place once trainees were comfortable with the controls in the IVR.
Trainees in the leaflet group were trained by reading a leaflet. They were instructed to study the leaflet carefully until they fully understood the content, no matter how long it took.
Upon the completion of training sessions, the trainees of each group answered the knowledge and self-efficacy tests. Lastly, trainees were thanked, and their participation was acknowledged.

Results
The results of knowledge scores and self-efficacy scores are reported in Fig. 6. Wilcoxon signed-rank tests confirmed that knowledge and self-efficacy improved significantly after training within each group. ANCOVA was adopted to compare the post-test scores between groups, with pre-test scores as the covariate and post-test scores as the dependent variable. Results revealed no significant between-group differences for the post-test knowledge scores (F(2,95) = 0.381, p = 0.684, partial η² = 0.008) or the post-test self-efficacy scores (F(2,95) = 0.052, p = 0.949, partial η² = 0.001). Bonferroni tests were applied for follow-on pairwise comparisons, as shown in Table 4.
The results of attention are shown in Fig. 7. Cronbach's alphas were calculated to assess the internal consistency of multiple statements (Linear narrative: 0.868; Spiral narrative: 0.778), suggesting that the three statements were closely related in measuring attention [15]. Lastly, Kruskal-Wallis tests showed no significant difference between groups (p = 0.282). In general, the trainees in both IVR groups reported that they remained engaged with the IVR SG.
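The partial eta squared effect sizes reported with the ANCOVA results can be cross-checked from the F statistics and degrees of freedom alone, using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below is only a consistency check on the reported figures; the function name is hypothetical and the code is not the authors' analysis script.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# Values reported in the text: F(2,95) = 0.381 (knowledge), F(2,95) = 0.052 (self-efficacy).
knowledge_effect = partial_eta_squared(0.381, df_effect=2, df_error=95)
self_efficacy_effect = partial_eta_squared(0.052, df_effect=2, df_error=95)
print(round(knowledge_effect, 3), round(self_efficacy_effect, 3))  # 0.008 0.001
```

Both values round to the effect sizes reported above, so the F statistics, degrees of freedom, and effect sizes are mutually consistent.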
The results of ease of training are reported in box plots (Linear narrative: M = 2.28, SD = 0.710; Spiral narrative: M = 2.33, SD = 0.741), as shown in Fig. 8. Cronbach's alphas were calculated to assess the internal consistency of multiple statements (Linear narrative: 0.871; Spiral narrative: 0.899), suggesting that the four statements were closely related in measuring ease of training. Lastly, Kruskal-Wallis tests showed no significant difference between groups (p = 0.633). In general, the trainees in both IVR groups reported that it was easy to train with the IVR SG.
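For readers unfamiliar with Cronbach's alpha, the following minimal sketch shows the usual formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of statements. The function name and the ratings are hypothetical; the data are invented for illustration and are not the study's responses, and the paper does not state which software was used.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of ratings."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must rate every item")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings: five trainees by four statements (not study data).
ratings = [
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 0, 1, 1],
    [3, 2, 3, 3],
    [0, 1, 0, -1],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

Values close to 1 indicate that the statements vary together across trainees, which is the sense in which the alphas above are read as evidence of internal consistency.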
No trainees quit the experiment because of cybersickness. The ratings of cybersickness are reported in box plots (Linear narrative: M = −0.82, SD = 2.02; Spiral narrative: M = −0.10, SD = 1.94), as shown in Fig. 9. Kruskal-Wallis tests showed no significant difference between groups (p = 0.140). In general, the trainees in both IVR groups indicated that the IVR SG did not cause serious cybersickness issues.

Discussion
Overall, the results point to a positive effect of the proposed IVR SG training system. However, neither IVR SG outperformed the leaflet, and the spiral narrative did not outperform the linear narrative. In this section, we discuss the obtained results.
Fig. 8 The ease of training reported by trainees
Fig. 9 The cybersickness reported by trainees

Linear vs. spiral narrative
The results of the present experiment revealed that the Linear narrative and Spiral narrative were both effective for knowledge gain and self-efficacy improvement. The similarity between the Linear narrative and the Spiral narrative lies in the instructional approach of immediate feedback. With immediate feedback, the trainees in both groups could develop knowledge. This finding is in line with other IVR SG studies in the literature, which report that immediate feedback is an effective pedagogical approach to apply in IVR SGs [10,23]. We did not find a significant difference between the two versions of the IVR SG in terms of immediate training outcomes, attention, or trainees' perceptions of ease of training. The manipulation of narratives did not add extra value to the linear-narrated IVR SG. It is possible that the relatively simple knowledge to be taught weakened possible differences, given that the trainees in both groups were already knowledgeable about the appropriate behavioral responses for earthquakes before training (the pre-test knowledge scores of the Linear and Spiral narrative groups were both 7.52 out of 10). The potential for knowledge improvement might therefore have been limited. We are uncertain how the Linear and Spiral narratives will play out in a larger population with different samples.

IVR SG training vs. leaflet
The three tested approaches were all effective in enhancing earthquake preparedness, as shown by significant improvements in safety knowledge and self-efficacy. However, the two versions of the IVR SG did not outperform the leaflet. This finding is consistent with other IVR SG studies in the literature [13,57], where IVR SGs performed similarly to traditional approaches in increasing knowledge immediately after training. However, we speculate that differences exist in the long term. Chittaro and Buttussi [13] suggest that IVR SGs could lead to better knowledge retention than traditional approaches (in their case, safety cards). In their study, after one week, trainees who were trained with safety cards suffered a significant knowledge loss, while trainees who used the IVR SG maintained their knowledge well. One possible contributor to the retention effect of IVR SGs is the emotional arousal triggered by an engaging and emotive IVR experience, in which case memory is enhanced by emotion [26,34,54]. Traditional approaches, such as safety cards or leaflets, are incapable of arousing intense emotion [13].
Fig. 10 The realism reported by trainees

Challenges and opportunities
Based on the results, repeating a scenario and redoing a response made no difference to the training outcomes. However, we measured only immediate knowledge gain and self-efficacy improvement. It is possible that reflective redo might be influential in other aspects. For instance, a recent IVR SG study shows that repeated exposure to a fire emergency scenario could lessen trainees' anxiety and stress and improve wayfinding performance [40]. The same effect might occur in earthquake emergency situations. As such, we speculate that trainees may develop the self-efficacy and competence needed to balance their capability to perform activities against the difficulty of those activities, which in turn facilitates a state of flow during a training experience [1]. Flow is a state in which trainees are highly engaged in, and capable of managing, the activities at hand [16,43]. Flow plays an important role in GBL by keeping trainees concentrated on learning [1]. The relationship between flow and the use of reflective redo with a spiral narrative in IVR SGs remains unclear, and a knowledge gap exists in the understanding of the impact of reflective redo on trainees' emotional states, mental workload, and cognitive activities. Future research on these topics is therefore suggested, using psychological and physiological measures.
Further to the assessment of cognitive activities, metacognition is one of the activities that has been studied in the literature in connection with reflective redo [53]. Metacognition is defined as "the deliberate conscious control of one's own cognitive actions" [8], in other words, the knowledge that people hold about their thoughts. Scoresby and Shelton [53] investigated the metacognition of trainees who had undertaken reflective redo to gain insights into the impacts of reflection on learning, with a focus on reflection-on-action. Future research can look into the metacognition associated with reflection-in-action, which is prompted by a spiral narrative and reflective redo.
In addition to further investigation of the impacts and effects of reflective redo, there are opportunities to extend the use of reflective redo by improving game mechanisms. One possible way is to integrate situated learning, which suggests that learning in authentic contexts is most effective [61]. Specifically, trainees' initial attempts to solve problems in scenarios expose their knowledge gaps. Trainees can then experience a similar scenario and solve the identical problem, with the context (i.e., social and physical environments) being the only differentiating factor. Thus, reflective redo takes place in a different context. Repeated teaching and practice across a wider range of contexts help trainees transfer knowledge and apply skills to new settings in the future [18,44], which is essential for earthquake safety because it is unpredictable where a person will be when an earthquake strikes.
Another important factor to be investigated in the future is the physiological and psychological effects on participants undergoing training in IVR environments. Adverse effects may occur, as IVR may lead to disorientation, nausea, anxiety, flashbacks, and post-traumatic stress disorder (PTSD) [48,51]. The potential issues raised by an IVR training system must be investigated, and response procedures must be in place before rolling it out to a larger audience.

Conclusions
IVR SGs have been studied for earthquake and emergency training in the literature. However, these IVR studies mainly focused on the prototyping aspect, with little attention to narratives, reflection, and their impacts on training. The present study contributes to the body of knowledge by incorporating reflection-in-action and reflective redo into IVR SGs for earthquake training. The proposed IVR SG features immediate feedback and spiral narratives. With the manipulation of narratives, trainees are allowed to repeat a scenario with immediate feedback to induce reflection-in-action, enabling reflective redo. A controlled experiment was conducted to validate the proposed game mechanisms. An IVR SG with linear narratives and an IVR SG with spiral narratives were compared, with a leaflet approach serving as the control group. Immediate knowledge gain, self-efficacy improvement, attention, ease of training, cybersickness, and realism of the IVR SG were measured. Results support that the IVR SG training system is well aligned with PBG. Trainees found that both narrative approaches made the learning materials easy to understand and facilitated the learning process. The results on training outcomes also suggest that a spiral narrative with immediate feedback is effective in delivering knowledge and improving self-efficacy, as are a linear narrative and a leaflet.
The present study has several limitations, one of them being the lack of a retention test. According to our results, reflective redo and reflection-in-action could have a positive effect on immediate knowledge gain. In addition, with repeated exposure to a scenario, trainees might experience less anxiety and stress, leading to improved self-efficacy immediately after training. IVR SGs are likely to have a long-term impact on memory [13]. Future research can clarify the retention effect on knowledge and self-efficacy with the use of reflective redo in IVR SGs. Another limitation is that the selection of reflection time was arbitrary. Trainees were given 10 s for reflection after receiving text-based feedback. Although we made our choice based on the literature, guidance on reflection time in IVR SGs is scarce. It is unclear whether a 10-second reflection time is sufficient, or so long that it might interrupt the sense of presence. Lastly, the size and diversity of our sample were limited, with most of the trainees being knowledgeable about earthquake safety before the training. Future research can expand the experiment to other countries with different types of trainees in various settings.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.

Data availability Yes.
Code availability No. There are commercial concerns.

Conflict of interest No conflicts of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.