Immersive training of first responder squad leaders in untethered virtual reality

We present the VROnSite platform that supports immersive training of first responder units’ on-site squad leaders. Our training platform is fully immersive, entirely untethered to ease use and provides two means of navigation—abstract and natural walking—to simulate stress and exhaustion, two important factors for decision making. With the platform’s capabilities, we close a gap in prior art for first responder training. Our research is closely interlocked with stakeholders from multiple fire brigades to gather early feedback in an iterative design process. In this paper, we present the system’s design rationale, provide insight into the process of training scenario development and present results of a user study with 41 squad leaders from the firefighting domain. Virtual disaster environments with two different navigation types were evaluated using quantitative and qualitative measures. Participants considered our platform highly suitable for training of decision making in complex first responder scenarios and results show the importance of the provided navigation technologies in this context.


Introduction
Virtual reality (VR) has been developed over the past decades and has been widely employed in various realms. It integrates and combines technologies from the field of 3D computer graphics for real-time rendering, computer vision and sensor fusion for localization (Liu et al. 2017) and motion tracking (Zhang et al. 2013), spatial sound and 3D user interface design. Therefore, a highly capable user interface can be provided to enhance existing applications, such as design, therapy or first and second responder training.
Effective training is a cornerstone of disaster preparedness. Quality, consistency and frequency of training have been shown to impact self-perceived disaster readiness of first responder units (Hsu et al. 2013; Djalali et al. 2014). However, barriers such as time, cost and safety limit the extent to which large groups of responders can be brought up to established standards, particularly related to integrated disaster team response skills and experience. Nowadays, preparedness efforts focus primarily on three conventional training methods: (1) didactic, classroom-based teaching; (2) web-based training that consists primarily of pre-recorded, user-paced presentation material; and (3) real-life drills and tabletop exercises. While all of the above are long-established valid approaches, classroom-based teaching and web-based presentations lack the realism offered by real-life drills. However, real-life drills are often inconsistent because of an inability to vary levels of stressful events and the extent of time and resources required to design, execute and review such drills. The advent of technology-based approaches through VR environments holds significant promise in its ability to bridge the gaps of other established training formats. VR-based systems encompass a wide array of technical capabilities ranging from non-immersive computer-based setups to fully immersive and high-fidelity platforms where participants wear head-mounted displays (HMD) for 3D scene viewing and use 3D input devices (joystick, gamepad) for interaction in controlled environments (Fig. 1).
VR-based training in disaster preparedness has been increasingly recognized over the past two decades (Freeman et al. 2001) as an important adjunct to traditional modalities of real-life drills. Multiple studies (Cone et al. 2011; Kurenov et al. 2009; Wilkerson et al. 2008) have highlighted VR applications in disaster training. The increased practice realism enables responders to gauge their individual and/or team's ability to execute tasks and make decisions under conditions that more closely represent reality. In essence, the immersive environment incorporated in VR-based training not only offers the realism that classroom-based instructive teaching lacks, but also may reduce the time and cost burden of real-life drills and tabletop exercises. Mills et al. estimated that a mass casualty triage training of paramedic students in a real-world simulation is about 13 times more expensive than in VR, while the simulation efficacy has been found near identical (Mills et al. 2019). Recent research has even indicated superior performance in simple search tasks following VR and augmented reality (AR) training of first responders as opposed to traditional classroom and real-world training in an ambulance bus (Koutitas et al. 2019, 2020). Repetition time and the feeling of presence, which is provided in VR, both have positive effects on task performance, enabling the learning situation to be experienced similar to a real context. This in turn helps to promote experiential learning as well as the development of operational and formal thinking by facilitating the exploration of different situations and modalities. These factors are particularly important for training squad leaders, whose major tasks upon arrival on site are (1) PLAN: explore site; assess casualties, hazard sources and own capabilities; decision making, (2) DO: give commands to request support personnel, (3) CHECK: evaluate results of commands and (4) ACT: identify deviations from the plan and adapt actions.
Execution quality of the Plan-Do-Check-Act (PDCA) cycle is heavily influenced by the parameters perspective, locomotion, time pressure, stress and (physical) exhaustion, as depicted in Fig. 2. While in real-life drills, these influencing factors can be mimicked, they require a large time effort for prearrangement of the entire disaster environment and wrap up, which limits the time for the actual training.
To close this gap, we propose the VROnSite immersive virtual reality training platform that is fully mobile to allow for quick setup and ease of use. We thereby aim to decrease the time required for pre-arrangement and wrap-up in order to extend the actual training time. To particularly train on-site squad leaders, VROnSite incorporates the entire PDCA cycle as well as means to simulate all influential parameters, as shown in Fig. 2. The VROnSite training platform was developed in close conjunction with first responder units to iteratively integrate user feedback and ensure real-world training capabilities. This paper extends our previous research findings (Mossel et al. 2017) and presents novel results regarding the platform design rationale, its iterative design methodology and the comprehensive evaluation of the tasks PLAN and DO.

Contribution
To summarize, this work makes the following contributions:
1. Development of highly versatile mobile hardware setups to allow quick deployment of training simulations at various real-world locations. With our setups, a user is immersed into the virtual environment through three-degree-of-freedom head orientation tracking combined with stereoscopic scene viewing as well as free scene navigation. We integrated two different types of locomotion devices: a two-handed gamepad and an omnidirectional treadmill to allow real walking. The latter allows for vestibular and proprioceptive feedback and has been demonstrated as a valid input device to increase presence (Huang 2003; De Luca et al. 2013). 3D scene rendering, visualization and data processing of the two locomotion devices are performed solely on a mobile device that resides inside a head-mounted display.
2. Development of two different single-user training scenarios, which we created in very close collaboration with disaster relief experts. Thus, we integrated all real-world requirements to create training environments that add real value to squad leaders of firefighters, one potential group of later stakeholders.
3. A comprehensive user study focusing on the tasks PLAN and DO, conducted with more than 40 first responder experts to evaluate the developed prototypic training platform in terms of quantitative and qualitative measures.

Related work
Employing VR technology to train first responders and relief units has been an ongoing research topic for over two decades (Stansfield et al. 1999) and has received additional attention with the technological advancements in VR technology during the last years. High demand for cost-efficient and safe training as well as numerous advantages such as repeated training over geographical and organizational divides as well as extended review modalities have led to the development of several academic and commercially available systems. These systems offer various foci from task-focused training to testing of emergency response plans, while technically they provide different degrees of immersion, modes of navigation, number of users, levels of mobility and amount of realism. A comprehensive overview is given in Hsu et al. (2013). Academic examples include the Immersive Video Intelligence Network (IVIN) (Ivin3D 2011), a tool offering 360° building walk-throughs that are visualized on a mobile device's display. The building's interior is produced from photographs and is supposed to enhance the indoor situational awareness of first responder units. It provides neither an immersive setup, natural walking for navigation, nor training functionality. Sebillo et al., on the other hand, present a multi-user AR mobile interface to improve the training efficacy for on-site crisis preparedness activities, which allows navigation by walking and includes training features (Sebillo et al. 2016). However, it lacks immersiveness and visualization capabilities, indicating only points of interest and user positions. Koutitas et al. created an immersive VR and AR system allowing exploration and familiarization with the AmBus ambulance (Koutitas et al. 2019, 2020). Results of this work show that VR and AR training can outperform infrequent traditional training and improve accuracy in certain search tasks.
However, the system is limited to a specific restricted environment and tasks. Another recent immersive approach (Schönauer et al. 2020) uses Mixed Reality for CBRN crisis preparedness training. Its setup features virtual reality to simulate the actual training environment, combined with augmented virtuality to physically integrate the complex equipment items into the simulation. Furthermore, the proposed system supports navigation by real walking, full-body motion capture and multiple users and received positive feedback from domain experts. However, the system itself has very limited mobility due to its complex hardware setup and features only a single scenario. Sportevac (University-Of-Southern-Mississippi 2015) is a desktop-based virtual training scenario simulating the challenges of a stadium evacuation with thousands of avatars, and the Virtual Terrorism Response Academy (Dartmouth-College 2015) is a desktop-based and non-immersive VR environment that aids trainees practicing various terrorism threats such as chemical and biological hazards. Furthermore, there has been active development by industry to offer VR training systems. The system Enhanced Dynamic Geo-Social Environment (EDGE) (Department 2020) is a virtual training platform with the major goal of enhancing first responders' communications and coordination while also making training more efficient and cost-effective. EDGE offers multi-user support in a desktop-based, non-immersive virtual environment using a high-quality game engine for rendering, a standard screen for visualization and keyboard and mouse for navigation. VirtSim (Motion-Reality 2020) employs tracking of users' heads, weapons and full-body motion to offer multi-user, fully immersive training for law enforcement situations. Egocentric stereoscopic 3D scene viewing is provided using standard HMDs that are connected to a user-carried notebook that performs processing, rendering and networking.
Users can navigate in VR by real walking in larger sized physical spaces (20 × 20 m) by employing an outside-in optical tracking system by Vicon that requires a plethora of cameras to cover the tracking volume. The hardware setup is complex and bulky, and the described tracking setup easily costs more than EUR 100,000. Intelligent Decisions, the company behind the immersive VR system DSTS, the Dismounted Soldier Training System (IntelligentDecisions 2020), announced in 2014 the system Medical Simulation, a training environment for first responders. Similar to DSTS, it is likely to offer immersive 3D scene viewing by using a standard HMD, 6-DOF head tracking in larger physical spaces, real walking for navigation, microphone and headset for communication and audio feedback and biosensors to track gaze, blood pressure and heart rate. DSTS uses notebooks that are carried by the users for processing, rendering and networking and is advertised to be set up in 4 h. No information is provided regarding the tracking volume or system costs of either DSTS or Medical Simulation. The Advanced Disaster Management Simulator (ADMS) (ETC-Simulation 2020) offers non-immersive virtual environments for training incident command and disaster management teams at all levels. It provides a large number of modeled 3D environments to train in scenarios that simulate building collapse, plane crash, crowd riot, terrorist attacks or nuclear, biological and chemical hazards. Large projection walls or standard screens are used for visualization, while interaction is performed with a variety of physical input devices, such as keyboard, joystick or driving wheel. XVR-Virtual Reality Training Software for Safety and Security (XVR 2020) offers education, training and assessment of incident commanders from the operational level up to the strategic level.
It provides non-immersive desktop-based training with keyboard interactions and supports multiple users to collaboratively train a scenario in a distributed virtual environment. The system costs are over EUR 10,000. Some of the aforementioned systems have already been integrated into the training routines of state agencies and organizations. Among others, ADMS is used by the New York City Office of Emergency Management, the South Korea National Fire Safety Academy and the Netherlands National Institute for Safety. XVR is employed, e.g., by the Austrian State Fire Brigade School (Landesfeuerwehrstelle).
In particular, the systems ADMS (ETC-Simulation 2020) and XVR (XVR 2020) provide rich simulations to train multiple first responders in various disaster scenarios. However, they target the training of (1) remote control room personnel to train unit tactics and communication and (2) on-site squads to train specific techniques and procedures during disaster relief deployment. Effective training of on-site squad leaders, who could benefit tremendously from immersive VR as outlined before, is not covered by prior art.

System design
As introduced in the motivation, two major requirements need to be fulfilled to overcome current limitations in squad leader training:
1. Realism: all tasks of the PDCA cycle combined with the parameters (perspective, locomotion, time pressure, stress and (physical) exhaustion) must be incorporated.
2. Effectiveness: to save costs and time, a minimum amount of hardware and environmental preconditioning should be required to set up the training system, while the software pipeline should allow the generation of different scenarios.
To meet the two requirements, our system design comprises a head-mounted display (HMD) for first-person stereoscopic viewing combined with two different means for travel (Fig. 8). In Setup 1, we provide a gamepad for straightforward and easy-to-use navigational input, while Setup 2 incorporates an omnidirectional treadmill that allows real walking and the simulation of stress and physical exhaustion. To ease setup and save costs, we employed a mobile HMD, featuring a Samsung GearVR with integrated 3-DOF orientation tracking combined with a Samsung Galaxy S6 mobile device. The mobile device acts as the core processing unit and wirelessly receives all navigational inputs over Bluetooth, either from the gamepad (Steelseries Stratus XL) or the treadmill (Cyberith Virtualizer). To create immersive training scenes, we built upon the VR software framework ARTiFICe (Mossel et al. 2012). As its core module, ARTiFICe uses Unity3D, which is a 3D game engine and editor and natively deploys code and applications on mobile devices running Android or iOS. We extended Unity3D and ARTiFICe with a number of necessary framework extensions and novel modules to interface with all necessary VR hardware and provide a common application layer for single- and future multi-user training. To allow an untethered connection between the treadmill and the mobile device worn by the user in Setup 2, we developed a wireless data transmission module and a Bluetooth Unity plug-in. The transmission module is a C# application and runs on a Windows 7 notebook connected to the Virtualizer via USB. A screenshot is shown in Fig. 3a. Its counterpart on the mobile device is implemented using JNI (Java Native Interface) and integrated into Unity3D as a library. To access the plug-in's functionality within Unity3D and the training application, we implemented a controller script in C#.
Upon start of the VR training, the application provides the user with a menu within the HMD view to select, connect to and use the desired Bluetooth device, as shown in Fig. 3b. The components handle communication in separate threads and optimizations ensure minimal latency of the navigational input.
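To make the wireless transmission of navigational input more concrete, the following is a minimal sketch of how a single treadmill sample could be serialized on the notebook and parsed on the mobile device. The wire format (sequence number, heading, speed), the field names and the helper functions are assumptions made purely for illustration; they are not the format used by the Virtualizer or by our transmission module, and the sketch is written in Python rather than C# or Java for brevity.

```python
import struct

# Hypothetical wire format (an assumption, not the actual protocol):
# little-endian uint16 sequence number, float32 heading in degrees,
# float32 walking speed in metres per second -> 10 bytes per sample.
FRAME = struct.Struct("<Hff")


def encode_sample(seq, heading_deg, speed_mps):
    """Serialize one navigation sample into a fixed-size frame."""
    return FRAME.pack(seq & 0xFFFF, heading_deg % 360.0, speed_mps)


def decode_sample(payload):
    """Parse a frame back into (seq, heading_deg, speed_mps)."""
    seq, heading_deg, speed_mps = FRAME.unpack(payload)
    return seq, heading_deg, speed_mps


def is_newer(seq, last_seq):
    """Return True if seq is ahead of last_seq, allowing uint16 wrap-around.

    A receiver can use this to drop stale samples instead of queueing them,
    which is one simple way to keep input latency low.
    """
    return 0 < ((seq - last_seq) & 0xFFFF) < 0x8000
```

A small fixed-size frame keeps per-sample overhead low, and discarding out-of-date samples on the receiving side avoids a growing backlog when the wireless link stalls briefly.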
Furthermore, we designed and implemented a common application layer that interfaces with the input (Gamepad, Virtualizer) and output devices (Samsung GearVR, headphones), and that loads, displays and runs the training scenarios. Our modular software approach is depicted in Fig. 4. As illustrated, the application layer builds upon Unity3D core functionalities, while integrating the input from the Bluetooth input devices, which in the case of the Virtualizer is fed into Unity using our aforementioned novel JNI Android library. The application layer can load and run multiple scenarios and visualize them within the GearVR HMD. This makes our platform highly versatile and allows good scalability. To ease the creation of different training scenarios, we implemented an editor tool in C# that features different types of landscapes and a 3D model database including houses, cars, hazard sources, personnel, casualty patterns and characters that an operator can insert into the 3D scene by drag and drop. The editor is integrated into Unity3D to benefit from Unity's rich 3D authoring capabilities and its mobile deployment and publishing functionalities.

Training scenario design
In close collaboration with squad leaders of fire brigades, we developed multiple scenarios in an iterative design process enabling training of the tasks PLAN and DO.

Training scenarios
The design process resulted in two training scenarios, comprising the following elements.
1. A small fire in a garden behind a garden shed where gas cylinders are stored. On a nearby patio, a gas grill is located. Spectators are standing at the garden fence as bystanders without interfering with the scene. A screenshot is shown in Fig. 5.

2. A car accident on a street crossing where bystanders interfere with the victims and the squad leader by talking and moving. One victim is trapped behind the steering wheel, and one victim sits in front of their car, dizzy and complaining about a headache. A screenshot is shown in Fig. 6.
These single-user scenarios can be used for multi-user training, as planned for the future, because our application layer provides one common interface for both single- and multi-user training scenarios.

Iterative design process
Together with an expert from the firefighting domain, experienced in command and training of fire brigades, we developed two training scenarios. The scenarios are aligned with the procedures and principles described in the handbook for basic training of firefighters (Österreichischer Bundes Feuerwehr Verband 2011) and have been designed in close collaboration with a larger number of squad leaders of fire departments. Both mimic typical disaster sites while providing a medium amount of complexity, making them sufficient for realistic squad training simulation within the later conducted user study.
We employed the three-step iterative process illustrated in Fig. 7 to define well-aligned situations. First, our expert sketched the ideas and layouts for each scenario. Then we translated the pencil sketches into actual 3D environments, using the software tools Unity3D, SketchUp and Blender for rapid prototyping and modeling. We incrementally tested the scenarios in our VR labs in terms of technical functionality and performance and conducted meetings with our expert to gather early feedback. This feedback was fed into the design approach to adapt and improve the scenarios to meet the desired first responder training objectives. During the implementation of the scenarios, we iteratively evaluated them on a quantitative level and found the need for polygon reduction to ensure rendering at high update rates. To trim the polygon count, we introduced billboards and skyboxes for the visualization of vegetation and geometry at far distance. Furthermore, we employed and designed low-polygon models for avatars, vehicles, buildings and various objects.

Preliminary results
We approached our stakeholders at two separate stages and conducted user studies with different versions of the prototype. Preliminary results from tests with a total of 31 participants are described in the following, while the second test iteration is described in more detail in Sects. 5 and 6. Feedback from the participants was in turn used to advance the development of the scenarios.
For the first iteration of stakeholder feedback, we built an early prototype upon real-world training requirements based on our expert's sketches. It was subsequently demonstrated to the stakeholders. The scenario prototype simulated a small fire and a traffic accident with injured people, as well as animated bystanders. To gather subjective early feedback, we demonstrated and tested the first scenario with hardware Setup 1 with a total of seven fire brigades and a total of 31 participants. Most had leadership positions (91%) and were training others (89%) in their respective organizations and can therefore be considered experts for our use case. Nevertheless, the fact that only 20% of these leaders have more than seven training opportunities per year themselves shows the potential benefit of our system. The participants could explore the scenario at their leisure and were asked for subjective feedback on usability in a post-questionnaire with a four-point Likert scale with the values insufficient (1), barely sufficient (2), adequate (3) and very good (4). Most participants (91%) rated Setup 1 (demonstrated) and Setup 2 (explained) to be very well or adequately suited for training (Setup 1: M = 3.09, SD = 0.5; Setup 2: M = 3.4, SD = 0.64). All but one of the firefighters rated the virtual environment's visual quality as very well or adequately suited for training. During the early presentation, we received very positive and valuable feedback, but also identified challenging requirements. For example, it is important for the firefighters that materials of houses and building structures are visually recognizable. However, our mobile platform can only render a limited number of details. Therefore, in parallel with the iteration of the training scenario design, we optimized the content for performance and the high update rates required for VR.
For the next iteration, we developed the two single-user scenarios described at the beginning of the section, acknowledging the stakeholders' specific feedback and adding further realism through different patterns of injury, audio reactions to the user's presence and bystanders interfering with the victims and the squad leader by talking, shouting and moving. Based on these scenarios, we conducted a comprehensive user study as described in Sect. 5. In future work, we will use feedback gathered during this study for a new design iteration and preparation for a multi-user version of our VR training system.

User study
We conducted a user study to gain insights into the capabilities of the developed training platform to support on-site squad leader training of the PDCA cycle's components PLAN and DO using virtual reality.

Objective
For the user study, we evaluated the platform's usability, the participants' perception as well as their perceived task loads when assessing the two virtual disaster sites (as described in Sect. 3) with two different types of navigational input. Since there is no related work on using immersive VR for on-site squad leader training, we aimed at studying the concept by assessing the following research questions with the conducted user study:
1. Does the training platform enable training of on-site squad leaders? Does it sufficiently support training of decision making?
2. Does the training system sufficiently respond to user interactions with the virtual environment?
3. Do the participants indicate one hardware setup to be better suited for on-site squad leader training?
4. Which factors of the virtual simulation have been found most important for on-site squad leader training?

Apparatus and test environments
We used the proposed training platform from Sect. 3 to conduct the experiment. Photographs of the two setups are shown in Fig. 8. With Setup 1, we used a gamepad to provide straightforward and easy-to-use navigational input, while Setup 2 incorporates an omnidirectional treadmill (ODT), which allows real walking in place and therefore the simulation of stress and physical exhaustion.

Study task
For the study, we immersed each user into two training scenarios. For this purpose, we used the developed scenes garden fire and car accident, as described in Sect. 4. The participant had to explore the scenarios to assess casualties, hazard sources and own capabilities (PLAN) and accordingly plan and give their commands (DO) to request support personnel.

Study design
The study procedure for each user consisted of seven stages: (1) introduction and pre-questionnaire, (2) familiarization with Setup X using a virtual reality test scene, (3) on-site squad leader training in either Scenario 1 or 2, (4) post-questionnaire for Setup X, (5) familiarization with Setup Y using a virtual reality test scene, (6) on-site squad leader training in either Scenario 1 or Scenario 2 and (7) post-questionnaire for Setup Y. The order in which the setups were presented was randomized for all participants; therefore, X and Y in the above description denote either 1 or 2. At stage (1), users were informed about the study and the procedure, followed by filling out a pre-questionnaire. At stage (2, respectively, 5), the user was introduced to the input and output hardware, either Setup 1 or 2, by explanation and demonstration. Next, users had up to 5 minutes to familiarize themselves with the hardware by freely walking in a virtual test environment, which comprised a simple Unity3D scene with some artificial virtual objects, such as a house and a street. As soon as the user felt confident, she or he could start the actual training by walking through a virtual gate within the test environment; this triggered loading of the training scenario and the start of stage (3, respectively, 6). Within the actual training scenario, the user could freely walk to explore and assess the scene, and at any time, the user could communicate a command to request additional supplies. Upon completion of the training scenario, the user had to fill out a post-questionnaire (4, respectively, 7).
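As an illustration of how such a randomized presentation order could be generated, the sketch below draws a setup order per participant. It additionally assumes that each participant encounters each of the two scenarios exactly once and that the scenario order is also shuffled; this pairing, the label strings and the function names are assumptions for illustration and are not taken from the study protocol.

```python
import random

SETUPS = ("Setup 1 (gamepad)", "Setup 2 (treadmill)")
SCENARIOS = ("Scenario 1 (garden fire)", "Scenario 2 (car accident)")


def assign_session(rng):
    """Return the two (setup, scenario) runs of one participant in order.

    Assumption: each setup and each scenario is used exactly once per
    participant, with both orders shuffled independently.
    """
    setups = list(SETUPS)
    scenarios = list(SCENARIOS)
    rng.shuffle(setups)
    rng.shuffle(scenarios)
    return list(zip(setups, scenarios))


def build_schedule(n_participants, seed=0):
    """Build a reproducible schedule for all participants."""
    rng = random.Random(seed)  # fixed seed so the schedule can be re-created
    return [assign_session(rng) for _ in range(n_participants)]


if __name__ == "__main__":
    for number, session in enumerate(build_schedule(41), start=1):
        print(number, session)
```

Seeding the generator makes the schedule reproducible, which helps when the assignment has to be documented alongside the questionnaires.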

Methods
We ran three tests and collected quantitative objective and subjective measures as well as qualitative data (using open-ended questions) with a post-questionnaire. The post-questionnaire's quantitative measures are summarized in Tables 1, 2 and 3. For all subjective quantitative measures, we used a 5-point Likert scale. Furthermore, we encouraged the participants to think aloud during the experiment and visually observed them to examine signs of physical stress, such as sweating or faster breathing.
In Table 3, each factor denoted participants' degree of agreement with "Rate the importance of the following factors in terms of support for on-site squad leader training." Scale levels were: not important (1), slightly important (2), moderately important (3), important (4), very important (5).
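The descriptive statistics reported for such Likert items (mean, standard deviation and median) can be computed as in the following minimal sketch. The example ratings are invented for illustration and are not study data; the use of the sample standard deviation is likewise an assumption.

```python
from statistics import mean, median, stdev

# Scale levels of the importance items, as listed above.
IMPORTANCE_LABELS = {
    1: "not important",
    2: "slightly important",
    3: "moderately important",
    4: "important",
    5: "very important",
}


def summarize(ratings):
    """Return mean, sample standard deviation and median of Likert ratings."""
    for rating in ratings:
        if rating not in IMPORTANCE_LABELS:
            raise ValueError(f"rating outside the 5-point scale: {rating!r}")
    med = median(ratings)
    return {
        "mean": mean(ratings),
        "sd": stdev(ratings),  # sample SD (n - 1 in the denominator)
        "median": med,
        # The median of an even number of ratings may fall between levels.
        "median_label": IMPORTANCE_LABELS.get(med),
    }


if __name__ == "__main__":
    example = [5, 4, 4, 5, 3]  # invented ratings, not study data
    print(summarize(example))
```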

Experimental results
First, we ran exploratory data analyses on the results grouped by Scenario and Setup. Since the descriptive statistics revealed no significant tendencies in the data grouped by Scenario, we combined the results of the independent variable Scenario for further analysis to focus on our main objectives (as given in Sect. 5.1) by studying (1) system usability, (2) influence of navigational input device and (3) important simulation factors for virtual on-site squad leader training. For further analysis, we conducted the study using a repeated measurement design with the independent variable Setup, while the dependent variables were the objective and subjective measures, as denoted in Sect. 5.5. Each setup type was performed and completed by all 41 participants. First, we tested the metric data (training time and path length) for positive correlation as a prerequisite for parametric tests. For training time, data of Gamepad and Virtualizer positively correlate with a Pearson correlation coefficient of r = .356; hence, we analyzed this variable using paired t-tests and employed a 95% confidence interval (p ≤ 0.05). We calculated the effect size using Cohen's d with pooled standard deviation, as our assumption of normality held true with p > 0.05, resulting from the Kolmogorov-Smirnov test. For path length, a Pearson correlation coefficient of r = −.325 indicated a negative correlation; thus, we applied the Wilcoxon signed-rank test with p ≤ 0.05. The remaining dependent variables represent ordinal data and were analyzed using the Wilcoxon signed-rank test with p ≤ 0.05; due to the small sample size, we performed exact tests.
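The parametric part of this analysis can be sketched as follows. The functions compute the Pearson correlation coefficient, the paired t statistic with its degrees of freedom, and Cohen's d with a pooled standard deviation (here taken as the root of the averaged variances of the two conditions, which is one common definition and an assumption about the exact variant used). The example values are invented and are not study data; obtaining the p-value additionally requires the t distribution, which is outside the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired samples."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def cohens_d_pooled(x, y):
    """Cohen's d using the root of the averaged sample variances."""
    pooled_sd = sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled_sd


if __name__ == "__main__":
    # Invented training times in minutes, not study data.
    gamepad = [10, 12, 14, 16]
    treadmill = [9, 12, 12, 15]
    print(pearson_r(gamepad, treadmill))
    print(paired_t(gamepad, treadmill))
    print(cohens_d_pooled(gamepad, treadmill))
```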

Participants
Forty-one (41) participants (100% male) were involved in the experiment, and all 41 successfully finished the experiments. All participants were part of Austrian fire brigades and thus involved in real-life first responder duties on a regular basis. Participants' ages ranged between 19 and 56 years (mean = 36.73, SD = 9.39 years). Thirty-seven participants (90.24%) reported to be deployed as on-site squad leaders, while 34 participants (82.93%) are actively engaged as on-site squad leader educators. We asked the participants to report, on average for a period of one year, the number of exercises that are prepared for them, the number of exercises they participated in and the number of exercises they prepared for others. The details are shown in Table 4.
The participants rated themselves as having average fitness (M = 3.32, SD = 0.123), with extrema at 1 = no fitness and 5 = strong fitness. Thirty participants reported to have no [Pre-Knowledge with Virtual Reality], nine (9) little pre-knowledge and three (3) some pre-knowledge. Fourteen participants reported to have no [Experience using a Gamepad], twelve (12) to have little experience, nine (9) to have some experience, and six (6) to have experience. Forty participants reported to have no [Experience using the Virtualizer], and one (1) reported to have little experience.

Analysis of system's usability
Firstly, we evaluated the proposed training platform regarding its applicability to train on-site squad leaders as well as its ease of use. For this purpose, we evaluated the measures from Table 2 of all participants, on average as well as depending on hardware setup, as shown in Fig. 9a and b. Overall, we achieved a high acceptance of the proposed training platform. On average, the participants rated the system's capabilities to support training of on-site squad leaders ([Support Squad Leader Training]) with M = 4.17, SD = 0.61. We analyzed the rating depending on hardware setup and found that on average participants indicated the setup with the Virtualizer only slightly less suited to support training of on-site squad leaders (M = 4.10, SD = 0.80, Mdn = 4.00) than the hardware setup using the Gamepad (M = 4.24, SD = 0.66, Mdn = 4.00); the difference was not significant (z = −1.107, p = .27). Furthermore, participants indicated a high capability of the system to train decision making ([Training of Decision Making]) with M = 4.29, SD = 0.61. Splitting this rating by setup type, participants found on average using the Virtualizer slightly less suited to train decision making for on-site squad leader training (M = 4.27, SD = 0.81, Mdn = 4.00) than the hardware setup using the Gamepad (M = 4.32, SD = 0.65, Mdn = 4.00); the difference was not significant (z = −.25, p = .81). Moreover, the system overall achieved very high ratings for providing participants with capabilities to obtain a quick overview of the disaster site ([Obtain Quick Overview]), with M = 4.62, SD = 0.48. On average, participants using the Virtualizer found it slightly slower to gain an overview (M = 4.54, SD = 0.75, Mdn = 5.00) than using the Gamepad (M = 4.71, SD = 0.51, Mdn = 5.00); the difference was not significant (z = −1.27, p = .21).
Fig. 9 Means and standard deviation of analyzed dependent variables, split by interaction device
In addition to the factor of obtaining a quick overview, we investigated how quickly participants felt they could [Identify Hazardous Areas and Casualties] with the training platform. Again, overall ratings were very high (M = 4.58, SD = 0.51). Split by setup type, on average, participants using the Virtualizer found it slightly slower to identify dangerous areas and casualties (M = 4.49, SD = 0.75, Mdn = 5.00) than those using the Gamepad (M = 4.68, SD = 0.57, Mdn = 5.00); the difference was not significant, z = −1.24, p = .16. We also investigated how well the participants perceived the system's capabilities to [Request Additional Support], such as staff and material. Again, ratings were high overall (M = 4.51, SD = 0.54). On average, participants using the Virtualizer found it a bit harder to request necessary additional support (M = 4.37, SD = 0.86, Mdn = 5.00) than those using the Gamepad (M = 4.66, SD = 0.53, Mdn = 5.00); the difference was not significant, z = −1.97, p = .41. Next, we asked for the [Degree of Immersion] the platform provides. Overall, the results indicated a high degree of presence (M = 4.60, SD = 0.61). Split by setup, participants reported an equal degree of immersion for both Virtualizer (M = 4.61, SD = 0.63, Mdn = 5.00) and Gamepad (M = 4.61, SD = 0.74, Mdn = 5.00), z = 0.00 as the sum of negative ranks equals the sum of positive ranks, p = 1.0. Overall, the [Acceptance of Virtual Bystanders], such as of their actions, was found sufficient, with M = 3.67, SD = 1.07. On average, participants using the Virtualizer found the virtual bystanders within the training scenarios slightly more engaging (M = 3.76, SD = 1.11, Mdn = 4.00) than those using the Gamepad (M = 3.56, SD = 1.18, Mdn = 4.00); the difference was not significant, z = −1.37, p = .17.
To study the subjective perception of the virtual bystanders further, we investigated the [Realism of Virtual Bystanders]. Overall, the participants found the realism sufficient, with M = 3.13, SD = 0.75. On average, participants using the Virtualizer found the virtual bystanders slightly less realistic (M = 3.05, SD = 0.87, Mdn = 3.00) than those using the Gamepad (M = 3.22, SD = 0.85, Mdn = 3.00); the difference was not significant, z = −1.33, p = .18.
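The z values above, together with the remark that z = 0.00 when the sum of negative ranks equals the sum of positive ranks, are what a Wilcoxon signed-rank test on paired ratings would produce; the section does not name the test, so this is an assumption. The following minimal sketch shows how such a z value and a two-sided p value can be obtained with the normal approximation. The function name and the sample ratings are hypothetical and are not data from the study.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus, z, p) for paired samples x and y.

    Uses the normal approximation with a tie correction; zero
    differences are discarded. Illustrative sketch only.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 0.0, 1.0

    # Rank absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    # Using the smaller rank sum yields z <= 0, as in the reported values.
    z = (min(w_plus, w_minus) - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return w_plus, w_minus, z, p


# Hypothetical 5-point ratings of the same participants in both setups.
virtualizer = [5, 4, 3, 5, 4]
gamepad = [4, 5, 4, 4, 4]
w_plus, w_minus, z, p = wilcoxon_signed_rank(virtualizer, gamepad)
```

In this hypothetical example the positive and negative rank sums are equal, so z is 0 and p is 1, which mirrors the [Degree of Immersion] result.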
Besides perceived factors of the virtual simulation, we analyzed the [System Usability]. Overall, the participants rated the system with a mean score of M = 82.19 (out of 100), SD = 10.96. On average, participants using the Virtualizer scored the system less usable (M = 80.06, SD = 13.87, Mdn = 85.00) than those using the Gamepad (M = 84.33, SD = 13.80, Mdn = 85.00), and this difference was significant, z = −2.31, p = .02. Furthermore, we evaluated the [Training Task Load] perceived by the participants while performing the training. Overall, the participants rated the system with a raw TLX mean score of M = 286.34, SD = 46.83. Split by setup, participants using the Virtualizer rated the training setup with a higher raw TLX score (M = 302.44, SD = 49.23, Mdn = 300.00) than those using the Gamepad (M = 270.24, SD = 49.22, Mdn = 260.00), and this difference was significant, z = −4.67, p < .001.
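To make the two questionnaire scores concrete, the sketch below computes a System Usability Scale score and a raw NASA-TLX sum for one participant. It assumes the standard SUS scoring by Brooke (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is scaled by 2.5 to a 0 to 100 range) and the six standard TLX subscales rated on 100-point scales. The paper only states "normalized sum" and "raw sum", so these details are assumptions, although the reported raw TLX means are consistent with a 0 to 600 range. Function names are hypothetical.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for ten 1-5 responses (standard scoring)."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("SUS responses must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


def raw_tlx_sum(ratings):
    """Return the unweighted sum of six TLX subscale ratings (0-100 each)."""
    if len(ratings) != 6:
        raise ValueError("NASA-TLX has six subscales")
    if any(not 0 <= r <= 100 for r in ratings):
        raise ValueError("TLX ratings must be between 0 and 100")
    return sum(ratings)


# Hypothetical answers of a single participant.
usability = sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1])
task_load = raw_tlx_sum([60, 70, 40, 30, 55, 35])
```

Averaging such per-participant scores over all participants, or per hardware setup, gives means of the kind reported above.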

Analysis of navigational input device
Besides perceptual factors of the simulation, we evaluated the performance and acceptance of the navigational input device, using the measures described in Table 1. On average, participants using the Virtualizer as navigational input device traveled smaller distances ([Path Length]) within the scenario (M = 424.83, SD = 373.60, Mdn = 253.00) than those using the Gamepad (M = 691.71, SD = 492.87, Mdn = 546.00). However, the difference was not significant, z = −1.93, p = .053. Next, we evaluated the time participants required to accomplish the training task ([Training Time]) using the paired t-test. On average, participants using the Virtualizer trained slightly longer (x̄ = 227.83, standard error of the mean σ_x̄ = 13.97) than those using the Gamepad (x̄ = 223.988, σ_x̄ = 11.63). This difference was not significant, t(40) = −.26, p = .79, and represented no effect, d = −0.04.
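As a brief illustration of the paired t-test and effect size reported for [Training Time], the sketch below computes t, the degrees of freedom and Cohen's d from paired measurements. It assumes d is defined as the mean difference divided by the standard deviation of the differences, in which case d = t/√n; the paper does not state which variant of d was used, but the reported values agree with this relation (−0.26/√41 ≈ −0.04). Function names and sample times are hypothetical.

```python
import math
import statistics


def paired_t_and_d(a, b):
    """Return (t, df, d) for paired samples a and b.

    d is the mean of the pairwise differences divided by their
    sample standard deviation (an assumed definition).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff
    return t, n - 1, d


def d_from_t(t, n):
    """Recover the paired effect size from a reported t value and sample size."""
    return t / math.sqrt(n)


# Hypothetical training times in seconds for four participants.
gamepad_times = [210.0, 232.0, 224.0, 236.0]
virtualizer_times = [209.0, 232.0, 222.0, 235.0]
t, df, d = paired_t_and_d(gamepad_times, virtualizer_times)

# Consistency check against the values reported in the text.
reported_d = round(d_from_t(-0.26, 41), 2)
```

With the reported t(40) = −.26 and n = 41, `d_from_t` gives about −0.04, matching the stated effect size.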

Analysis of important factors for squad leader training
Finally, we evaluated the importance of several simulation factors, as described in Table 3. Means of the factor ratings revealed that participants rated all factors on average as important (M ≥ 4.0); the details split by navigation device are illustrated in Fig. 10a and b. Figure 11 illustrates an accumulation of the answers. When analyzing the cardinality of importance of each factor in detail, we can report the following results. Analyzing the importance of the factors for on-site squad leader training depending on hardware setup, no significant differences were found. Next, we ran two second-order studies to analyze the importance of the simulation factors for [Leaders], describing the group of participants who reported being deployed as on-site squad leaders (37 out of 41), and for [Educators], describing the group of participants who are actively engaged as on-site squad leader educators (34 out of 41). For [Leaders], no significant differences were found when analyzing the importance of the factors for on-site squad leader training depending on hardware setup. For [Educators], the simulation factor [Interaction with Virtual Environment] tended to be more important when using the Gamepad (M = 4.29, SD = .52, Mdn = 4.00) than when using the Virtualizer (M = 4.12, SD = .60, Mdn = 4.00); however, the difference was not significant, z = −1.89, p = .058. The [Realism of Bystanders] was more important when using the Gamepad (M = 4.18, SD = .58, Mdn = 4.00) than when using the Virtualizer

Discussion
We conducted the user study to gain insights into the capabilities of the developed training platform to support on-site squad leader training using virtual reality, testing the
Table 1 (continued)
Denoting participants' degree of agreement with "I was able to easily navigate through the virtual environment." Scale levels were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Speed with Nav. device: Denoting participants' degree of agreement with "I was able to quickly navigate through the virtual environment." Scale levels were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Joy to navigate: Denoting participants' degree of agreement with "Rate how much you enjoyed operating the navigation device." Scale levels were: not at all (1), not much (2), neither nor (3), much (4) and very much (5)
Navigation task load: Raw sum of the questionnaire's results measured with the NASA Task Load Index (TLX) (Nasa and Administration 2010), where each question was rated with a 100-point scale, divided into the five scale ranges: very low (1), low (2), average (3), high (4) and very high (5)
Table 2 Measures for system usability evaluation
Training squad leaders: Denoting participants' degree of agreement with "Rate your experience with the training platform in terms of support training of on-site squad leaders." Scale levels were: very poor (1), poor (2), acceptable (3), good (4), very good (5)
Training of decision making: Denoting participants' degree of agreement with "Rate your experience with the training platform in terms of training of decision making." Scale levels were: very poor (1), poor (2), acceptable (3), good (4), very good (5)
Obtain quick overview: Denoting participants' degree of agreement with "I was able to quickly obtain an overview of the disaster site." Scale levels were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Identify hazardous areas: Denoting participants' degree of agreement with "I was able to quickly identify dangerous areas and injured persons." Scale levels were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Request additional support: Denoting participants' degree of agreement with "I was able to request necessary support vehicles and personnel." Scale levels were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Degree of immersion: Denoting participants' degree of agreement with "I had the feeling to be literally inside the disaster site." Scale levels were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Acceptance of bystanders: Denoting participants' degree of agreement with "Rate how much you liked the actions of the virtual bystanders." Scale levels were: not at all (1), not much (2), neither nor (3), much (4) and very much (5)
Realism of bystanders: Denoting participants' degree of agreement with "Rate how much the bystanders' actions match with your real-world experience." Scale levels were: not at all (1), not much (2), neither nor (3), much (4) and very much (5)
System usability: Normalized sum of the questionnaire's results measured with the System Usability Scale (SUS) (Brooke 1996). Scale levels for each of the 10 SUS questions were: strongly disagree (1), disagree (2), neither agree nor disagree (3), agree (4) and strongly agree (5)
Training task load: Raw sum of the questionnaire's results measured with the NASA Task Load Index (TLX) (Nasa and Administration 2010), where each question was rated with a 100-point scale, divided into the five scale ranges: very low (1), low (2), average (3), high (4) and very high (5)
These findings were further supported by feedback through the questionnaire, where nine users stated that they would use the system on a daily basis for training. Furthermore, users commented enthusiastically while thinking aloud right after the training: "I found the system very compelling," "This would be such as an asset in education," "Both scenarios are very common for every day deployment and being forced to walk made it very realistic." The qualitative feedback also pointed to future improvements regarding the visual quality of the 3D simulation, such as "The graphics should be more high quality" and "The viewing was very fast, but the 3D models could have had more details." Regardless of the limitations in 3D rendering quality, participants were found to be very well immersed in the virtual scenarios ([Degree of Immersion]), resulting in a high degree of presence, with no tendencies favoring one setup over the other. These quantitative results were also backed by qualitative feedback, such as "I really had to walk around the whole house like in reality," "How can I open the door of the car?," and "Hello? Can you hear me?" (talking to one of the casualties).
Regarding the system's capabilities for interaction between the participants and the virtual environment, we can report the following results. The quantitative data indicated that participants found the realism of the bystanders and the interactions with them ([Acceptance of Bystanders] and [Realism of Bystanders]) merely sufficient. This is partly reflected by the subjective feedback of 16 participants, such as "The bystanders are too quiet, they need to scream and talk more" and "I would have liked to be able to talk to them." Nevertheless, qualitative feedback indicated that the virtual bystanders' behavior, which responds to participants' actions such as movement, was found engaging. Comments ranged from "The bystanders came so close, so typical" to "They were standing in my way and wanted to help, as always." Furthermore, the system in general and both hardware setups were found to provide very good means to obtain a quick and straightforward overview of the disaster site ([Obtain Quick Overview]), as well as to identify casualties and hazardous areas ([Identify Hazardous Areas]). Participants commented enthusiastically about the possibility to freely walk in space, such as "I could walk as in reality" and "I really needed to go there to evaluate the situation; being able to freely look around was great and I did not need to think how to rotate." Beyond obtaining an overview (corresponding to PLAN), the training platform was found to provide very good means to [Request Additional Support] (corresponding to DO) for both setups; no significant tendencies were found favoring one setup over the other. The qualitative feedback of the participants also showed that the PDCA component PLAN is mapped very realistically by the VROnSite platform.
For communication of the operational commands (DO), the participants reported that in reality they often pass on commands to team members by means of gestures (e.g., slight head and hand movements) rather than verbally. Gestural communication is not provided in the current version of VROnSite; however, participants perceived this as an interesting side effect, as it meant that they had to verbalize their commands exactly and thus think things through more carefully. The qualitative data further revealed a strong demand from the participants for the training platform to let them analyze their commands (CHECK) and adapt them accordingly (ACT). To this end, commands (e.g., opening a broken door with a specific tool) would need to be transformed into virtual actions and integrated at run-time. This highly valuable feedback is subject to our future research.
We furthermore wanted to gain insights into whether one navigation device is better suited to, or subjectively preferred by the participants for, on-site squad leader training. On average, training with the ODT was found to result in a significantly higher task load than with a Gamepad ([Training Task Load]). This finding is furthermore supported by the task load reported for the navigation device ([Navigation Task Load]); again, the ODT was found to result in a significantly higher task load than the Gamepad. These quantitative findings are aligned with our observations during the experiment, as all participants showed signs of increased physical effort (sweating, faster breathing) while using the Virtualizer. Participants using the ODT traveled less ([Path Length]) than with the Gamepad, although not significantly, and reported a significantly slower perceived navigation speed ([Speed with Navigation Device]) compared to the Gamepad. Interestingly, training using the ODT did not result in longer training times compared to the Gamepad. A possible reason could be that the participants have planned their routes in advance in order . It was rather revealed by the qualitative feedback that the increased level of physical stress and exhaustion when using the ODT was found very useful for the training, as it reflects reality well ("Perfect that I needed to walk," "That was pretty exhausting but good to mimic a realistic training"). In addition, participants commented on the good ease of use of the ODT after the entire training, while mostly being skeptical before the training ("I think if I use this two or three times again, then it is really easy to walk and I am used to it," "The Virtualizer was entirely new to me and thus, I found it hard in the beginning. But after some minutes of training, it became easier and more engaging," "With the Gamepad it was very easy to walk, but I missed the decoupled viewing from my body rotation").
Finally, we evaluated the importance of key factors of an immersive simulation for on-site squad leader training. when using the Gamepad compared to the ODT. A possible reason may be that users could focus more on the virtual environment using the Gamepad, as they had no physical effort, and thus no distraction, as with the ODT. Therefore, they were more likely to notice the merely sufficient visual representation of the bystanders. Our results indicate that for on-site squad leader training with a focus on fire brigades, free navigation, sound and interaction were found to be most important to train the assessment of a disaster situation and more important than a photo-realistic, high-quality 3D environment. The realism of the virtual bystanders should nevertheless be improved, as reported by educators for on-site squad leader training, and will be subject to our future research.

Conclusion and outlook
In this paper, we have presented the prototype of a highly versatile mobile virtual reality platform for on-site squad leader training of disaster relief units. This research closes the gap in prior art, which does not provide training of first responder staff at this command level. Our prototype training platform fully immerses users in the virtual training environment by providing three-degree-of-freedom head tracking combined with stereoscopic scene viewing as well as free scene navigation. Our prototype is fully untethered and uses the mobile device within the head-mounted display as core computing unit for stereoscopic rendering and processing of the wireless locomotion input. This input is sent either by a two-handed gamepad, for abstract but straightforward-to-use navigation, or by an omnidirectional treadmill (ODT), to simulate stress and exhaustion by natural walking that further incorporates vestibular and proprioceptive feedback. Our hardware prototype runs an extendable software framework for immersive training that currently provides two different training scenarios for fire brigades. New training environments can be straightforwardly built using our framework and the authoring capabilities of Unity 3D. All training scenarios have been iteratively developed in very close collaboration with disaster relief experts. Thus, we ensured the integration of real-world requirements to create training environments that add real value for the eventual stakeholders.
The hardware and software components have been comprehensively tested in a user study with 41 first responder experts to evaluate the developed prototype training platform in terms of quantitative and qualitative measures. The experimental results of the user study, focusing on the PDCA cycle tasks PLAN and DO, indicated the high acceptance of our platform to support on-site squad leader training and to train decision making in complex first responder scenarios. Participants reported a high degree of presence, with no tendencies favoring one locomotion setup over the other. Quantitative data revealed a higher task load when performing the training tasks with the ODT; however, qualitative feedback revealed that this increased level of physical stress and exhaustion was found very valuable to mimic real-life drills. These findings reveal that engaging squad leader training can already be achieved using a simple and cost-efficient hardware setup (gamepad), while a more expensive setup (ODT) adds another layer of realism by allowing simulation of stress. Overall, navigation was found to be key for the on-site squad leader training. This is also reflected by the results we found when analyzing the subjective importance of the immersive simulation factors [Realism of the 3D Simulation], [Interactions with Virtual Environment], [Realism of the Bystanders], [Sound] and [Free Movement]. Free navigation was found most important by the test subjects, followed by sound and interaction. Although users repeatedly criticized the rendering quality of the virtual environment and the realism of the virtual bystanders, these factors were ranked less important for training of on-site squad leaders. Nevertheless, the results from the group of participants who work as educators for real-life drills indicated the importance of more realistic behavior of the bystanders, which will thus be subject to our future research.
To summarize, our results indicate that for on-site squad leader training with a focus on fire brigades, free navigation, sound and interaction were felt to be most important to train the assessment of a disaster situation. The majority of the participants reported the tremendous added value of the current prototype to mimic real-life drills. However, users were missing the PDCA cycle factors ACT and CHECK within the training prototype. Thus, integration of these two factors into our proposed training framework, by allowing 3D object interactions and translating commands into actions within the virtual simulation, will be subject of our future research. Thereby, we will close the PDCA cycle and can provide a fully fledged immersive on-site squad leader training.
Another important aspect of the proposed system is mobility. It is particularly important for real-world usage that training of virtual scenarios can be performed in any real-world environment without the need for its prior adaptation. In this respect, VR training is advantageous compared to AR training. However, with a view towards on-site crisis preparedness, integration of real equipment items and nonverbal communication between trainees in the multiuser setup planned for the future, AR could be of increased benefit. At present, AR hardware with its limited field of view reduces immersion and hinders visual assessment of a scenario, which is especially important for the exploration phase in the PLAN task. Upcoming AR hardware could nevertheless provide interesting opportunities to integrate real environments and tools into future versions of our training system.