1 Introduction

Using Virtual Environments (VEs) to train professionals is not a new concept. In fact, training professionals is one of the most important tasks for which Virtual Reality (VR) was conceived and used (Lombard and Ditton 1997). The literature presents a vast number of articles that explore the benefits of this technology across professional occupations, ranging from simpler tasks, such as sorting delivery orders into designated shelves (Elbert et al. 2018), to more complex ones, such as surgical procedures (Aïm et al. 2016). VR as a training tool attracts considerable interest in both the literature and industry because of its advantages over traditional training methods, such as reduced costs and risks (Ragan et al. 2015; Schlueter et al. 2017), the ability to simulate any type of scenario, and the ability to deliver the same training consistently over an unlimited number of sessions (Visser et al. 2011). These advantages are valuable for fields such as firefighter training because they speed up the training of new recruits and allow experienced firefighters to repeat training whenever they want, thus maintaining or even improving the skills needed to perform their job without compromising their physical integrity and preparedness.

One of the biggest challenges with training VEs is developing one that actually trains individuals, that is, one that contributes positively to variables of interest such as the transfer of knowledge and/or the acquisition or improvement of a skill. This study’s motivation was to consolidate the training process by exploring how a training VE can be used to leverage Real Environment (RE) training, making the overall training process more efficient. For this purpose, the main goal of this work was to evaluate and compare the effect of a firefighter training VE against RE training. The authors developed a virtual training scenario that replicated the real-world training scenario and exposed participants to both to achieve a proper comparison. The dependent variables assessed were participants’ perception of fatigue and stress, transfer of knowledge, sense of presence, cybersickness, and actual stress, measured through participants’ Heart Rate Variability (HRV). HRV was included to allow a direct and objective analysis of participants’ stress and to assess how close the VE comes to evoking the physiological response shown in a RE. The validity of this measure for physiological stress is discussed in Sect. 3.3 (Measurements).

The chosen exercise is one that firefighters carry out during the training of recruits. Its goal is to teach recruits the procedure they should perform when they are faced with a closed door in an urban fire scenario. This exercise teaches recruits how to assess the fire situation inside the enclosure, how the door opening procedure is performed, and how to position and collaborate with other team members during the door opening procedure. Two reasons led to the choice of this training exercise in particular. The first was the existence of a partnership between the authors and a local fire department, which allowed iterative development and subsequent validation of the training VE. The second was the possibility of this VE being used in the future to complement the training of firefighters. This would benefit firefighters, as the exercise is not performed as frequently as the commanders would like, primarily due to the high costs associated with the required resources.

2 Related work

One of the threats to operationalising a VE for training firefighters or any occupation is the uncertainty of transferring knowledge and/or skills to the RE. To combat this threat, it is crucial to research the ability of a VE to transfer knowledge and skills to the RE and the ability of a VE to be perceptually equivalent to its RE counterpart.

The literature presents a few works that address using VEs for firefighter training and study their effect on users’ performance. To the best of the authors’ knowledge, the first work to appear in the literature was conducted by Bliss et al. (1997). In this work, the authors compared the effectiveness of using a blueprint to navigate inside a building (traditional method) against a VE, presented through an I-glasses Head Mounted Display (HMD), in teaching spatial navigation skills. The results of this study were positive, indicating that the blueprint and the VE were equally effective and that both proved superior to no training. In the same year, Tate et al. (1997) presented a study where a VE, visualised through a Virtual Research VR4 HMD, was used to train firefighters for ship fire situations and to practise missions (mission rehearsal). The results showed that the participants who used the VE to practise missions improved significantly compared to those who did not have access to the VE. In the following years, the works that presented VEs to train firefighters did not report tests with firefighters or users in general; they focused not on the effect of virtual training on reality but on other aspects, such as implementing the VE or the realistic simulation of flames (e.g. Li et al. 2005; Querrec et al. 2003). The next work to study the effect of a VE on firefighter training came almost a decade later (Backlund et al. 2007). This work presented a game-based firefighter training simulator named Sidh, which was developed for Cave Automatic Virtual Environments (CAVE). A CAVE is essentially an empty room in the shape of a cube in which each surface (the walls, floor, and ceiling) may be used as a projection screen to create a highly immersive virtual environment (Cruz-Neira et al. 1992). The goal of Sidh was to train inexperienced firefighters in a task that consisted of entering a burning building and looking for victims. To evaluate the VE, the authors compared the performance of firefighters who used the VE to train the procedure against the performance of firefighters who did not use the VE. The results indicated the system’s usefulness and suggested that using simulators to train firefighters was feasible.

Another work that targeted inexperienced firefighters was Cha et al. (2012). The authors presented a VE, visualised through an HMD (unspecified), which aimed to allow firefighters with little experience to train in simple rescue and evacuation activities. In this work, the authors mentioned positive feedback from a user study, suggesting that conducting a simple firefighter training course using VR technology is possible. Cohen-Hatton and Honey (2015) carried out three experiments to train firefighter commanders where the effect of goal-oriented training on decision-making processes was evaluated. In the three experiments, participants were exposed to different training scenarios and used different equipment, namely immersive (Oculus Rift DK1) and non-immersive (typical monitor). In all experiments, groups that received goal-oriented training were more likely to develop explicit plans rather than move directly to action. Based on the results, the authors stated that firefighter training could be readily altered to promote the explicit formulation of plans that could facilitate the sharing of plans between incident commanders and their teams.

Other related works in the literature are those of Mossel et al. (2017), Ríos Jerez (2022) and Clifford et al. (2018). Mossel et al. (2017) presented a VR system, visualised through a Samsung Gear VR HMD, which aimed to train first aid team leaders such as firefighters. In this work, the authors carried out tests with firefighters. However, the goal of these tests was not to evaluate the effect of the VE on users’ performance but rather to evaluate the usability of the VE. Ríos Jerez (2022) proposed a serious game for firefighter training where users are faced with emergencies and can either interact with firefighter tools to respond to the emergency or interact with virtual agents to delegate functions and collect information about the emergency. Clifford et al. (2018) presented a VR system for aerial firefighters who oversee aerial suppression of wildfires. The authors presented a study with users where the goal was once again not to study the effect of virtual training on reality but to compare the acquisition of situation awareness, the sense of presence and cybersickness between three visualisation methods: a regular television, a 270° projection system and an Oculus Rift CV1 HMD. The results revealed a significant increase in spatial awareness and sense of presence in immersive conditions. They also revealed that although the level of immersion was higher with the HMD, the participants preferred the projection method because it caused less cybersickness. The relationship between these variables and their effect on the VE outcome shows why assessing participants’ sense of presence and cybersickness in the VE is important.

More recently, the authors have published a few works addressing the training of firefighters with VR (Narciso et al. 2019, 2022). Such works have focussed on comparing a number of variables between a real-world firefighting training exercise and a virtual replica of that same exercise as a way to assess and measure the effectiveness of the VE. Although a different training exercise was used in the mentioned works, the variables and experimental methods were very similar to the ones used in this study. Overall, the results of our previous works have been encouraging, suggesting that the VE can transfer knowledge, evoke a sense of presence, and avoid causing cybersickness. Moreover, the HRV analysis shows that participants exhibited signs of stress, a positive result as it demonstrates the ability of the VE to elicit the type of response that is expected in the real world. However, the stress has always been noticeably stronger in the real world. This indicates that the VE is on a promising path but, in its current form, cannot fully replace the real-world training exercise. Despite this, given the need for constant training and the resource requirements that can lead firefighter trainees to skip some exercises during their training, our stance has been that although the VE cannot replace the real environment, it can be used as a complement.

Overall, existing literature seems to favour using VEs for firefighter training. However, the reduced number of works, the different categories of firefighter work, and the reduced number of tests performed with firefighters or users in general mean that there is no clear evidence that the knowledge and/or skills trained in the VE transfer to reality. To contribute to the body of literature on the use of VEs to train firefighters, this work aims to answer the research question “Is a VE that replicates a RE firefighter training exercise perceptually equivalent to it?”. To help answer this question, the following specific goals were defined: (i) compare the perceived fatigue and stress across environments; (ii) compare the transfer of knowledge enabled by the environments; (iii) evaluate the effect of a firefighting training VE on the participants’ sense of presence and cybersickness; and lastly, (iv) compare the effect of the two environments on participants’ physiological stress, analysed through HRV.

3 Method

This paper presents a quasi-experimental study that compares and evaluates the effect of a VE designed to train firefighters. The authors used subjective and objective metrics to measure and compare several variables across the two environments to achieve this goal.

For subjective measures, questionnaires were used. As detailed below, the questionnaires were applied using a pre-test, post-test, or pre-test and post-test methodology, depending on the questionnaire. Their purpose was to evaluate the effect of the environment, virtual and real, on one or more dependent variables. As an objective measure, participants’ HRV was used. This provides a way to measure participants’ stress continuously throughout their exposure to each environment.

3.1 Sample

The initial sample of this study consisted of 12 participants aged between 18 and 47 years old (M = 32.08, SD = 9.31). All participants were firefighters under training. This was necessary because the selected training exercise is part of the firefighter training programme and is not available to the public. Of the twelve firefighters, nine were males, and three were females. A total of nine participants reported having normal vision, while three reported having corrected-to-normal vision. All participants reported normal hearing. Regarding VR experience, eight reported having some type of experience with VR, while four reported never having had any experience with VR. In the HRV analysis, the sample was reduced to eight participants because four participants could not attend the scheduled date for the exercise in the RE. Of these eight, seven were males, and one was female. Their ages ranged between 18 and 48 (M = 32.25, SD = 10.85). Only one had corrected-to-normal vision, and all had normal hearing. The impossibility of all attending on the same date reinforces the need for, and advantage of, a VE that allows participants to carry out the exercise when they want and as many times as they want.

3.2 Materials

A total of four setups were used during this study: Baseline Environment (BE), Habituation Environment (HE), Virtual Environment (VE) and Real Environment (RE). The purpose of each environment and the materials used are presented in Table 1.

Table 1 Purpose of each setup and materials used

A common material in all environments was the VitalJacket (Cunha et al. 2010), a wearable biomonitoring platform that can capture medical-grade ECG exams in real time (Fig. 1). To extract HRV data, ECG analysis software from Bio-Devices S.A. was used (Biodevices 2019). This software was used to detect the “R” peaks of the ECG waveform and to extract the RR intervals (the time between two consecutive “R” peaks in the ECG). The RR intervals were checked for physiological plausibility, as proposed by Clifford et al. (2006), and those that passed were treated as normal-to-normal (NN) intervals. Following the guidelines of the Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (Camm et al. 1996), different HRV time- and spectral-domain parameters were used in this study.
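The step from RR intervals to NN intervals and then to time-domain HRV parameters can be illustrated with a minimal sketch. The plausibility bounds and the 20% beat-to-beat limit in `filter_nn` are illustrative assumptions, not the criteria of Clifford et al. (2006) or of the software used in this study, and the four parameters shown (AVNN, SDNN, RMSSD, pNN50) are standard time-domain measures chosen for illustration rather than a reproduction of the study's parameter set.

```python
import math
import statistics


def filter_nn(rr_ms, low=300.0, high=2000.0, max_change=0.2):
    """Keep RR intervals (in ms) that look physiologically plausible.

    The bounds and the relative-change limit are hypothetical values
    chosen for illustration only.
    """
    nn = []
    for rr in rr_ms:
        if not (low <= rr <= high):
            continue  # outside a plausible heart-rate range
        if nn and abs(rr - nn[-1]) / nn[-1] > max_change:
            continue  # abrupt jump: likely an ectopic beat or missed R peak
        nn.append(rr)
    return nn


def time_domain_hrv(nn_ms):
    """Compute common time-domain HRV parameters from NN intervals (ms)."""
    # Successive differences between consecutive retained intervals.
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    return {
        "AVNN": statistics.fmean(nn_ms),   # mean NN interval
        "SDNN": statistics.stdev(nn_ms),   # sample standard deviation of NN
        "RMSSD": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "pNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

Note that this sketch treats the intervals on either side of a rejected beat as consecutive, a simplification that a production pipeline would likely handle more carefully.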

Fig. 1 VitalJacket and electrodes used in the experiment

The complete firefighter uniform used in the RE consists of boots, pants with braces, an undershirt, a coat, a full facepiece, a helmet, and a Self-Contained Breathing Apparatus (SCBA). For the adapted uniform used in the HE and VE, care was taken to use as much of this equipment as possible. However, some elements were not used, namely boots, full facepiece, helmet, and gloves. The boots were not used for logistical reasons: at the time of the virtual tests, the trainees did not have uniforms assigned to them. The full facepiece and helmet were not used because it was not possible to use them simultaneously with the HMD. To replace the full facepiece, an MSA Advantage 410 half mask was used, which allowed the participant to breathe compressed air from the SCBA cylinder and, at the same time, freed the upper part of the face so that the HMD could be placed (see Fig. 2 for a visual comparison of the two masks). The firefighter gloves were replaced with regular gloves to give participants greater sensitivity when using the hand controllers.

Fig. 2 Full face mask used in the RE (left) and half mask used in the VE (right)

The hand controllers were HTC controllers, and the HMD was an HTC Vive Pro with the Vive Wireless Adapter, which allowed the participant to see and hear the environment while untethered.

The thermoelectric device consists of a 12 V Peltier cell that was placed under the right-hand glove. This device was used to simulate the thermal sensation on the back of the hand, a crucial component of the training exercise. This device was controlled by a custom Arduino-based system developed by Monteiro et al. (2020). For participants’ safety, the maximum temperature applied to the hand was 50°C.

Both the HE and VE were developed using the Unity game engine. Figure 3 shows the participant’s equipment in the RE (left) and in the HE and VE (right).

Fig. 3 Fully equipped participant in the RE (left) and in the HE and VE (right). For illustration purposes, the RE photo was taken during daylight on a different day of the exercise

3.2.1 Training procedure

The chosen training exercise aims to teach firefighter recruits how to open doors in urban or industrial fire situations. Participants learn to assess the fire stage in a closed compartment before entering it as part of the procedure. This is a crucial aspect of firefighting because opening a door can cause fire phenomena that put the firefighter’s physical integrity at risk. The sudden supply of oxygen caused by the opening of a door can cause two fire phenomena: flashover, which consists of an accelerated fire development; and backdraft, which consists of an explosive phenomenon that releases powerful shock waves capable of bursting structures. Firefighters can reduce the occurrence of these phenomena by observing another fire phenomenon known as the neutral plane. The neutral plane phenomenon consists of the separation of the hot zone, where the superheated gases and smoke are found, from the cold zone, where the fresh air and good visibility are found (Fig. 4). Observation of the neutral plane is critical because it allows firefighters to perceive the fire phase, define the attack strategy and reduce the probability of causing a flashover or backdraft.

Fig. 4 Visual demonstration of the neutral plane phenomenon. The yellow bar demonstrates where the neutral plane phenomenon occurs, dividing the upper layer of superheated gases and smoke and the lower layer of oxygen

This procedure is performed by a team of at least two people. In the VE, these elements are the participant and a virtual agent who takes the instructor role, guiding the participant through the exercise. The participant’s role is to identify and report to the instructor the height of the neutral plane, to handle the opening and closing of the door while the instructor observes the neutral plane, to make a fire attack if that is the instructor’s decision, and to close the door after entering the compartment (if the decision is to enter the compartment). The procedure consists of the following phases: positioning, identifying the neutral plane, and door opening and closing.

Participants must observe whether the door opens inwards or outwards in the positioning phase. To do this, the door hinges are used as a clue. If the hinges can be seen, the door opens outwards (towards the participant). Otherwise, it opens inwards (against the participant’s direction). If the door opens outwards, the participant places himself next to the hinges. If not, he places himself next to the handle. Positioning is critical because it means that participants are not directly exposed to the fire as soon as they open the door, thus allowing them to close the door if the fire is at a dangerous stage. In addition, it allows them to be in a favourable position to assist and remove, if necessary, the other element exposed to fire when the door is opened. In the VE, the participant starts in front of the door and has to teleport to one side of the door. Then, the participant uses natural walking to position himself in the correct spot to perform the training procedures.
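The positioning rule described above can be summarised as a small decision function. This is only a sketch of the rule as stated in the text; the names `DoorSwing`, `Position`, `door_swing` and `standing_position` are hypothetical and are not taken from the VE implementation.

```python
from enum import Enum


class DoorSwing(Enum):
    OUTWARDS = "towards the participant"
    INWARDS = "away from the participant"


class Position(Enum):
    HINGE_SIDE = "next to the hinges"
    HANDLE_SIDE = "next to the handle"


def door_swing(hinges_visible: bool) -> DoorSwing:
    """Visible hinges indicate that the door opens outwards."""
    return DoorSwing.OUTWARDS if hinges_visible else DoorSwing.INWARDS


def standing_position(hinges_visible: bool) -> Position:
    """Outward-opening door: stand by the hinges; otherwise by the handle."""
    if door_swing(hinges_visible) is DoorSwing.OUTWARDS:
        return Position.HINGE_SIDE
    return Position.HANDLE_SIDE
```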

After being correctly positioned, the participants must identify the neutral plane’s height. To do this, they remove the glove from the hand closest to the wall and move the back of the hand along the door from the bottom to the top until they feel a gradual increase in temperature. Removing the glove increases their sensitivity to the temperature difference. The back of the hand is used in order to protect the palm, since a burn to the palm would be more harmful to the firefighter’s performance. When the participants detect the temperature increase that marks the neutral plane, they put the glove back on and leave a mark on the door so that the other element has a good approximation of the height of the neutral plane.

The final phase of the procedure is done in synchronisation with the other element. The colleague, in this case the instructor, counts down for the participant to open the door by a few centimetres (just enough for the nozzle to project water into the compartment) and for a short time (2 or 3 s). While the door is open, the instructor observes the fire conditions inside the compartment. Based on the information gathered (fire stage, smoke colour and density), the decision may be to not enter the compartment, to attack the fire, or to enter the compartment directly. If entering the compartment endangers the physical integrity of the firefighters, the door remains closed, and they move to the next compartment. If it is decided to make an attack, the procedure is similar to observation; the difference is that the instructor uses the nozzle to project water into the compartment. If the decision is to enter the compartment directly, participants open the door enough for the instructor to enter first. After the instructor enters, participants also enter the compartment and close the door behind them to prevent fire propagation.

In the HE, the virtual instructor explains and practises the procedure with the participant on three different doors. Figure 5 demonstrates the HE where the participant practises the procedure. Only after performing the procedure on these three doors was the participant “teleported” to the VE that replicates the RE. All the instructions given by the virtual instructor are recordings made with the instructor who administered the RE training, and they replicate the instructions given in the RE.

Fig. 5 Demonstration of the HE where the participant practises the procedure to open a door during an urban or industrial fire

3.3 Measurements

A total of eight instruments were used in this study: six questionnaires, Stress and Fatigue Visual Analogue Scales (VAS) and participants’ HRV. The questionnaires were the following:

  • A socio-demographic questionnaire: used to collect data such as age and previous experience with VR. Applied once, before the VE;

  • Perceived Stress Scale (PSS) (Cohen et al. 1983): a 14-item questionnaire that measures users’ perceived stress based on the frequency at which certain events occurred. Applied twice, before the VE and before the RE;

  • Checklist Individual Strength (CIS) questionnaire (Vercoulen et al. 1994): a 20-item questionnaire that measures participants’ perceived fatigue. Applied twice, before the VE and before the RE;

  • A knowledge questionnaire, a 9-item multiple-choice questionnaire validated by the local firefighting unit commander, measures the transfer of knowledge enabled by the training exercise. Applied four times, before and after the exercise in both environments;

  • Igroup Presence Questionnaire (IPQ) (Schubert et al. 2001): a 14-item questionnaire that measures users’ presence in a VE using four subscales: experienced realism, involvement, spatial presence and overall presence. Applied once after the VE;

  • Simulator Sickness Questionnaire (SSQ) (Kennedy et al. 1993): a 16-item questionnaire that measures users’ cybersickness using four subscales: nausea, oculomotor discomfort, disorientation, and overall cybersickness. Applied four times, before and after the exercise in both environments.

All questionnaires were completed in Portuguese. The PSS questionnaire was previously translated by Cardoso et al. (2002) and the IPQ questionnaire by Vasconcelos-Raposo et al. (2016). The remaining questionnaires were translated using the back-translation method (Brislin 1970; Hambleton and Zenisky 2011), followed by a content validity assessment.

In addition to questionnaires, Visual Analogue Scales (VAS) were used to obtain self-reports of perceived fatigue and stress. VAS are one-item scales where users rate the amount of a given characteristic/attitude on a scale from one to ten (Lesage et al. 2012; Sobel-Fox et al. 2013). Although questionnaires for perceived stress and fatigue were already used (PSS and CIS), the VAS were included because they are a quick and easy way to obtain several measurements in an experiment and to make a before-and-after comparison that assesses the effect of a particular environment (for a more detailed description of the advantages of single-item measures, see Fisher et al. 2016). In addition, the CIS and PSS questionnaires attempt to measure participants’ perceived fatigue and stress over a more persistent timeframe, considering current events in participants’ lives.

Regarding participants’ HRV, a qualitative analysis was conducted to search for signs of physiological stress based on seven parameters extracted from participants’ electrocardiogram (ECG) exams. The trend of each parameter under stress was analysed to identify whether a rise in value corresponded to an increase or decrease in physiological stress. These trends were obtained from the meta-analysis by Castaldo et al. (2015) (N = 758). Table 2 presents the HRV parameters used, briefly describing each and their trend under stress. As a practical example, for the AVNN parameter, the lower the value, the greater the physiological stress evidenced in the participant’s response.

Table 2 HRV parameters used, their description and their trend under stress

3.4 Variables

The independent variable of this study was the environment, which had two conditions: Virtual Environment (VE) and Real Environment (RE). Exceptionally, in the HRV analysis, two additional environments were considered due to their specificities: the Baseline Environment (BE) and the Habituation Environment (HE). Where applicable, a second independent variable, exposure, was also considered. It had two conditions, before and after, which refer to the scores of the dependent variables obtained before and after the exposure to an environment.

The dependent variables of this study were perceived stress, perceived fatigue, transfer of knowledge, sense of presence (composed of the subscales experienced realism, involvement, spatial presence, and overall presence), cybersickness (composed of nausea, oculomotor discomfort, disorientation, and overall cybersickness) and physiological stress measured with HRV.

3.5 Experimental procedure

The experimental procedure was previously submitted to and duly approved by the ethics committee of the Faculty of Engineering of the University of Porto. It was split into two separate moments, with a gap of three to four weeks between them, depending on the participant. The first moment consisted of performing the training in the VE, and the second of performing it in the RE. The VE exercise was done individually because each participant booked a date/time of their convenience. In the VE, the team member accompanying the participant in the exercise was simulated through a virtual character. In the RE, although all participants were told to be at the training centre on the same date and time, the exercise was also carried out in teams of two elements: the participant and the head of training. All participants performed the VE before the RE. This ensured that the order of exposure (VE first) remained the same for every participant, even if some could not attend the RE.

The two moments of the experimental procedure, referred to as the VE and RE procedures, are described in detail below.

3.5.1 VE procedure

Before the experiment, participants read and signed an informed consent form in which they agreed to participate in the study. The first step after that was to put on the VitalJacket: the researcher explained to the participants how and where the electrodes were placed and asked them to move to a private room to put it on in privacy. After putting on the VitalJacket, the participants completed a pre-questionnaire consisting of the socio-demographic, PSS, CIS, VAS of perceived fatigue and stress, cybersickness and knowledge questionnaires, described in Sect. 3.3 (Measurements).

After completing the questionnaire, participants performed the BE. They were instructed to put on Bose QuietComfort 25 headphones and remain seated for twenty minutes while listening to relaxing music. During this period, the researcher left the experimental room so as not to influence the participant’s baseline record.

After the BE, the researcher explained the purpose of each piece of equipment and helped the participants prepare for the habituation environment. Participants donned the pants and jacket of the firefighter’s uniform, put the SCBA on their backs (after opening the cylinders) and put on the MSA half mask. As the half mask differs from the full mask they usually use, participants were given some time to get used to it. After adjusting, participants put on a pair of regular gloves (the researcher placed the Peltier cell under the right glove, in contact with the back of the participant’s right hand) and put on the HTC Vive Pro HMD. At that point, the researcher placed the HMD battery in the firefighter’s jacket pocket, placed the headphones over the participants’ ears, and finally placed the HTC controllers in their hands. The habituation environment began once the participant was fully equipped and, depending on the participant, lasted between fifteen and twenty minutes. At the end of habituation, the participants were automatically teleported to the VE, where they performed the training exercise in an environment that replicates the real-world scenario, without changing or removing any equipment. Depending on the participant, the VE lasted between five and ten minutes. At the end of the VE, the researcher helped participants remove all equipment and asked them to complete the post-questionnaire, which consisted of the VAS of perceived fatigue and stress, IPQ presence, SSQ cybersickness and knowledge questionnaires. The entire procedure lasted approximately one hour and ten minutes. Figure 6 presents an illustration of the VE procedure.

Fig. 6 Illustration of the VE procedure

3.5.2 RE procedure

On the date and time set by the head of training, the study participants went to the training centre already equipped with the firefighter’s uniform. After everyone was present, they were asked to go individually to a private room to put on the VitalJacket and start recording the ECG exam. Once the VitalJackets were on, participants were asked to fill in the pre-questionnaire, composed of the PSS, CIS, VAS of perceived fatigue and stress, and knowledge questionnaires. When everyone had answered the pre-questionnaire, the head of training gave a briefing about the training exercise. After the briefing, all participants put on the SCBAs and waited to be called, one by one, by the head of training to carry out the exercise. After performing the exercise, the participants removed the SCBA. Then, they completed the post-questionnaire in a different room from the participants who had not yet performed the exercise, so as not to influence them. The post-questionnaire consisted of the VAS of perceived stress and fatigue and the knowledge questionnaire. After everyone had completed the post-questionnaire, the head of training gathered all the participants for a debriefing. The exercise lasted approximately 5 min, and the entire procedure approximately 40 min. This was significantly shorter than the VE procedure because no baseline measurement or habituation to the technology was necessary. Figure 7 presents an illustration of the RE procedure.

Fig. 7 Illustration of the RE procedure

Figure 8 demonstrates the beginning of the training exercise in the VE and the RE. In the VE, the instructor waits for the participant to join him to start the exercise. In the RE, the participant has already started the exercise.

Fig. 8 Demonstration of the training exercise in the VE (left) and RE (right)

3.6 Statistical procedures

Different procedures were used to analyse the data collected in the experiment. The results from the PSS and CIS questionnaires, used to evaluate and compare participants’ perceptions of stress and fatigue between environments, were analysed using paired samples T-tests. In addition, their absolute values were used to evaluate the same variables empirically. A cut-off value of 28, half of the maximum value (56), was used in the PSS questionnaire, while a cut-off value of 74, just over half of the maximum value (140), was used in the CIS questionnaire.
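As an illustration of the paired-samples comparison and the cut-off check described above, the following is a minimal sketch using only the Python standard library. The function names and the example scores are hypothetical and are not taken from the study data; only the two cut-off values (28 and 74) come from the text. The sketch returns the t statistic and degrees of freedom, and leaves the p-value to a statistics package. Treating "low" as strictly below the cut-off is an assumption.

```python
import math
from statistics import mean, stdev

PSS_CUTOFF = 28  # from the text: half of the PSS maximum (56)
CIS_CUTOFF = 74  # from the text: just over half of the CIS maximum (140)


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test of second minus first."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def is_low(score, cutoff):
    """Assumption: a score strictly below the cut-off is rated as low."""
    return score < cutoff


# Hypothetical questionnaire totals for four participants (not study data).
ve_scores = [20, 22, 19, 25]
re_scores = [23, 24, 22, 27]
t_value, df = paired_t(ve_scores, re_scores)
print(f"t({df}) = {t_value:.2f}")
print(is_low(mean(re_scores), PSS_CUTOFF))
```

A positive t here means the second sample (RE in this hypothetical example) scored higher on average than the first.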

The results from the VAS, used to evaluate the effect of the environment (virtual vs real), moment (before vs after), and the interaction between the two on participants’ perception of stress and fatigue, were analysed using two two-way repeated-measures ANOVAs. In addition, a two-way repeated-measures ANOVA was used to evaluate the effect of the environment (virtual vs real), moment (before vs after), and the interaction between the two on participants’ knowledge transfer.

Regarding the results from the presence questionnaire, a simple descriptive analysis was made based on the mean values of the questionnaires’ subscales as it was only administered in the VE procedure. Regarding the results of the cybersickness questionnaire, these were analysed through a one-way repeated measures MANOVA and a paired-sample T-test. The first one was used to evaluate all the questionnaire subscales together, whilst the latter evaluated the subscales individually.

All the assumptions for conducting the statistical procedures were verified. Furthermore, before performing the tests mentioned above, outliers were identified by inspecting boxplots and by evaluating the data distribution based on skewness and kurtosis (George and Mallery 2003), and were then removed from the sample.
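The skewness and kurtosis screening can be sketched as follows. This is an illustrative, standard-library-only example using simple moment-based estimators; statistical packages often report bias-corrected versions, so values may differ slightly from those in the Results. The acceptance limit is passed in as a parameter because the text does not state the value used, and the function names are hypothetical.

```python
from statistics import mean


def skewness(values):
    """Moment-based sample skewness (third standardised moment)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis (fourth standardised moment minus 3)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0


def within_limit(values, limit):
    """True if both statistics fall inside [-limit, +limit].

    `limit` is an assumption supplied by the caller; it is not given in the text.
    """
    return abs(skewness(values)) <= limit and abs(excess_kurtosis(values)) <= limit


# Hypothetical scores (not study data).
scores = [1, 2, 3, 4, 5]
print(skewness(scores), excess_kurtosis(scores))
```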

Lastly, to analyse participants’ HRV, non-parametric statistics were used. This decision was motivated by the nature of the data, which varies significantly from person to person, and by the sample size, which the authors consider low for this type of data. Friedman statistical tests were used to evaluate and compare the effect of all four environments on participants’ HRV parameters. When statistically significant results were shown, pairwise T-tests were performed as follow-up tests. In such cases, a Bonferroni correction was applied to adjust the level of statistical significance for multiple comparisons.
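To make the Friedman test and the Bonferroni adjustment concrete, the sketch below computes the Friedman chi-square statistic for a participants-by-environments table and the adjusted significance level for a given number of follow-up comparisons. It uses only the standard library, applies no tie correction, and does not compute p-values. The function names and the example values are hypothetical and not taken from the study; the number of comparisons is left as a parameter because the text does not state it.

```python
def average_ranks(row):
    """Rank one participant's values (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Return (chi_square, df) for rows = participants, columns = conditions."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    chi_square = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi_square, k - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Significance level to apply to each of n_comparisons follow-up tests."""
    return alpha / n_comparisons


# Hypothetical AVNN-like values (ms) in four environments for three participants.
table = [
    [820, 790, 760, 700],
    [900, 880, 850, 810],
    [780, 770, 740, 690],
]
chi_square, df = friedman_statistic(table)
print(chi_square, df, bonferroni_alpha(0.05, 3))
```

In this made-up table every participant shows the same ordering across environments, so the statistic reaches its maximum of n(k − 1).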

The results were interpreted as statistically significant if the p-value was lower than 0.05 and indicative if between 0.05 and 0.10. Data are presented as mean ± standard deviation (M ± SD) unless otherwise stated.
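The interpretation rule and the reporting format can be written down as a short sketch. The thresholds 0.05 and 0.10 come from the text; treating the boundaries as "p < 0.05" and "0.05 ≤ p ≤ 0.10" is an assumption, as are the function names and the two-decimal formatting.

```python
from statistics import mean, stdev


def interpret_p(p, alpha=0.05, indicative_upper=0.10):
    """Label a p-value using the thresholds stated in the text."""
    if p < alpha:
        return "significant"
    if p <= indicative_upper:  # boundary handling is an assumption
        return "indicative"
    return "not significant"


def format_mean_sd(values):
    """Format a sample as M ± SD (sample standard deviation)."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


print(interpret_p(0.007), interpret_p(0.08), interpret_p(0.134))
print(format_mean_sd([1, 2, 3]))
```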

4 Results

This section presents the results of the tests used to measure participants’ perception of stress, perception of fatigue, knowledge transfer, presence, cybersickness, and physiological stress.

4.1 Perception of stress and fatigue

Starting with the perception of stress and fatigue, the results of the PSS and CIS questionnaires are presented. In this analysis, no outliers were identified. Skewness varied from −0.872 to −0.142 and kurtosis from −1.337 to −0.067, indicating a normal data distribution (George and Mallery 2003). The PSS results indicated significant differences between environments in perceived stress, with the stress level in the RE being higher than in the VE. The CIS results indicated no significant differences between environments in perceived fatigue. The results and descriptive statistics are presented in Table 3. Based on the cut-off values defined for the questionnaires, the results indicate that participants rated perceived stress and perceived fatigue as low in both virtual and real environments.

Table 3 Descriptive statistics of the PSS and CIS questionnaires and the paired samples T-tests (N = 12)

Still, on the perception of stress and fatigue, the results of two two-way repeated measures ANOVA are presented. One was performed on the VAS stress data and the other on the VAS fatigue data. No outliers were identified in either analysis. All data were normally distributed according to George and Mallery (2003). In the former, asymmetry ranged from 0.147 to 1.368 and kurtosis from −1.154 to 1.370. In the latter, asymmetry ranged from 0.042 to 0.499 and kurtosis from −1.458 to −0.658. The results of the test performed on the VAS stress data indicated no significant differences in the environment variable (Wilks' Λ = 0.998, F(1, 10) = 0.017, p = 0.9, partial η2 = 0.002), in the moment variable (Wilks' Λ = 0.995, F(1, 10) = 0.048, p = 0.831, partial η2 = 0.005), or in the interaction between the two variables (Wilks' Λ = 0.790, F(1, 10) = 2.66, p = 0.134, partial η2 = 0.210). The results of the test performed on the VAS fatigue data also indicated no significant differences in the environment variable (Wilks' Λ = 0.949, F(1, 10) = 0.535, p = 0.481, partial η2 = 0.051), in the moment variable (Wilks' Λ = 0.959, F(1, 10) = 0.426, p = 0.529, partial η2 = 0.041), or in the interaction between the two variables (Wilks' Λ = 0.949, F(1, 10) = 0.535, p = 0.481, partial η2 = 0.051). The descriptive statistics of the VAS stress data and VAS fatigue data are presented in Table 4.
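For readers who wish to cross-check the effect sizes, partial η2 can be recovered from an F statistic and its degrees of freedom, and for a multivariate test of a single hypothesis dimension (as is the case for the F tests reported here) Wilks' Λ equals one minus partial η2. The sketch below applies these standard relations to the VAS stress interaction reported above; the function names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def wilks_lambda(f_value, df_effect, df_error):
    """Wilks' lambda for a test with one hypothesis dimension (s = 1)."""
    return 1.0 - partial_eta_squared(f_value, df_effect, df_error)


# VAS stress, environment x moment interaction: F(1, 10) = 2.66
eta = partial_eta_squared(2.66, 1, 10)
lam = wilks_lambda(2.66, 1, 10)
print(round(eta, 3), round(lam, 3))
```

The rounded values agree with the reported partial η2 = 0.210 and Wilks' Λ = 0.790.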

Table 4 Descriptive statistics of VAS stress data and VAS fatigue data (N = 12)

4.2 Transfer of knowledge

For the dependent variable transfer of knowledge, a two-way repeated-measures ANOVA was used to evaluate the effect of environment, moment, and their interaction on knowledge transfer. No outliers were identified, and there was a normal distribution of the data, with skewness ranging from 045 to 1.323 and kurtosis from −0.940 to 0.875. The results showed no significant differences in the environment variable (Wilks' Λ = 0.652, F(1, 7) = 4.2, p = 0.08, partial η2 = 0.375) or the moment variable (Wilks' Λ = 0.208, F(1, 7) = 1.84, p = 0.217, partial η2 = 0.208). However, they showed significant differences in the interaction between environment and moment (Wilks' Λ = 0.675, F(1, 7) = 14.54, p = 0.007, partial η2 = 0.675). Follow-up tests, consisting of several one-way repeated measures ANOVA, showed that this significant difference was due to an increase in knowledge in the before vs after comparison of the VE (Wilks’ Λ = 0.289, F(1, 8) = 19.69, p = 0.002) and an increase in knowledge in the before VE vs before RE comparison (Wilks’ Λ = 0.404, F(1, 7) = 10.31, p = 0.015). Descriptive statistics of the knowledge questionnaire results are presented in Table 5.

Table 5 Descriptive statistics of knowledge questionnaire results (N = 12)

4.3 Presence questionnaire

The data from the presence questionnaire (Table 6), only administered after the VE procedure, shows that participants rated the VE with moderately high spatial presence and involvement. As a result, the overall presence was also moderately high.

Table 6 Results from the presence questionnaire applied after exposure to the VE (N = 12)

4.4 Cybersickness questionnaire

Two tests were used to analyse the data obtained from the cybersickness questionnaire: a one-way repeated measures MANOVA, to consider all four scales of the cybersickness variable when evaluating the influence of the VE, and a paired samples T-test, to consider each scale independently when evaluating the influence of the VE. A total of four outliers were detected and removed from the sample, resulting in a sample of eight participants. The data were normally distributed, with asymmetry ranging from −0.277 to 1.440 and kurtosis from −2.069 to 1.031. The results of the one-way repeated measures MANOVA showed an indicative reduction in the moment variable (Wilks' Λ = 0.929, F(1, 7) = 0.53, p = 0.489, partial η2 = 0.071), and no significant differences in cybersickness scales (Wilks' Λ = 0.295, F(3, 5) = 3.98, p = 0.86, partial η2 = 0.705) or in the interaction between moment and cybersickness scales (Wilks' Λ = 0.341, F(3, 5) = 3.23, p = 0.12, partial η2 = 0.659). As for the T-test results, there was a significant reduction in the disorientation scale after performing the exercise in the VE. Table 7 presents the descriptive statistics and the T-test results for the cybersickness data.

Table 7 Descriptive statistics of cybersickness data and T-test results (N = 8)

4.5 HRV analysis

As mentioned in Sect. 3.1—Sample, in the HRV analysis, the sample was reduced to eight participants. Since non-parametric statistics were used, no data distribution was evaluated, nor were outliers assessed. Friedman’s tests indicated significant differences between moments in AVNN, pNN20, and LF parameters and an indicative difference in the parameter HF. Test results for the remaining parameters did not show significant differences. The results of the posthoc test revealed that:

  • The difference obtained in the AVNN parameter was caused by a significant reduction from baseline to virtual (T(0.65) = 2.38, p = 0.001) and from baseline to real (T(0.65) = 1.88, p = 0.022);

  • The difference obtained in pNN20 was caused by a significant reduction from baseline to real (T(0.65) = 1.75, p = 0.04);

  • And lastly, the difference obtained in LF was caused by a significant reduction from baseline to real (T(0.65) = 1.88, p = 0.022).

Descriptive statistics of HRV parameters in different environments and the results of Friedman’s tests are presented in Table 8.

Table 8 Descriptive statistics of HRV parameters in different environments and Friedman test result (N = 8)
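As a reminder of what the two time-domain parameters measure, the following sketch computes AVNN (the mean of the normal-to-normal intervals) and pNN20 (the percentage of successive NN-interval differences greater than 20 ms) from a list of NN intervals in milliseconds. These are the standard definitions of the two parameters; the function names and the example intervals are hypothetical and not taken from the study recordings. The frequency-domain parameters (LF, HF) require spectral analysis and are not covered here.

```python
def avnn(nn_ms):
    """Average of all NN intervals, in milliseconds."""
    if not nn_ms:
        raise ValueError("need at least one NN interval")
    return sum(nn_ms) / len(nn_ms)


def pnn20(nn_ms):
    """Percentage of successive NN-interval differences greater than 20 ms."""
    if len(nn_ms) < 2:
        raise ValueError("need at least two NN intervals")
    diffs = [abs(b - a) for a, b in zip(nn_ms, nn_ms[1:])]
    return 100.0 * sum(d > 20 for d in diffs) / len(diffs)


# Hypothetical NN intervals in ms (not study data).
nn = [800, 810, 850, 845, 900]
print(avnn(nn), pnn20(nn))
```

Lower AVNN corresponds to a higher heart rate, which is why a reduction from baseline is read as a sign of stress in the analysis above.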

5 Discussion

In this section, the authors discuss the results using the same order in which they were presented. Starting with the perception of fatigue and stress, the only significant result was in the perception of stress assessed through the PSS questionnaire. The remaining instruments used to measure the same variables, namely the CIS and the VAS, did not show significant differences in any conditions tested. Analysis of PSS results revealed a significant increase from VE to RE. Despite the significant difference obtained in the statistical test, the cut-off value suggests a low stress level in both environments. In addition, it is important to consider that, as mentioned in Sect. 3.3—Instruments, the PSS questionnaire assesses the perception of stress based on events that occurred in the last two weeks to obtain a more persistent measure of the participant’s stress. Considering this information, the authors believe that the results indicate that participants felt some stress in the weeks before performing the exercise in the RE. However, perceived stress and fatigue were low and similar between environments. Similar results in the perception of fatigue and stress are a positive outcome of this study as they suggest that, at a perceptual level, the VE is equivalent to the RE in such variables. Comparing this result with results from related works is difficult because here, the objective is to obtain similar results, which indicates that the VE is perceived similarly to the RE while the focus in related works is to show the superiority of training in a VE against the training in the RE (Tate et al. 1997) or the superiority in a VE against lack of training (Backlund et al. 2007).

In the dependent variable transfer of knowledge, there was only one significant result: the interaction between environment and moment. As the follow-up tests indicate, this result may be attributed to the increase in knowledge from before to after the VE and to the increase in the same variable from before the VE to before the RE. This result can be explained by participants learning information from the VE and retaining that information until the day of the RE procedure. Although in different variables, this result is coherent with the works of Cha et al. (2012) and Cohen-Hatton and Honey (2015) in the sense that positive feedback was obtained from the training VE. The authors consider this result positive as it indicates that the VE helped participants achieve higher scores in the knowledge test and that this knowledge was retained until the RE.

Furthermore, in informal comments with the firefighter commander, the authors also realised that technical concepts are addressed in the RE. However, the primary focus is on exposing participants to the conditions of an indoor fire. As it is a situation that the recruits have never faced and that involves stress, the commander states that the technical concepts are covered in theoretical classes before and after the RE training to ensure that the recruits understand the phenomena they experienced during the exercise.

The results of the presence questionnaire, applied only in the VE, show that participants rated their overall sense of presence as being moderately high (3.47 out of 5). This is a positive result because it indicates that the scales that constitute the presence variable received good ratings from participants. Of the three evaluated scales, the lowest result was experienced realism. Since the questions related to this scale refer to the similarity between the experienced VE and the real world, what the authors extract from this result is that there is room to improve the VE components that contribute to this scale, such as the 3D models and the simulation of elements such as water and fire. An improvement of such elements could positively impact the realism perceived by the participants, but more tests are required to test this hypothesis. Additionally, it should be taken into consideration that the questionnaire was filled in before participants experienced the RE, which could have affected their perception of similarity.

The results of the cybersickness questionnaire, also applied only in the VE, showed an indicative reduction in the before-after comparison of the VE and a significant reduction in the disorientation scale after performing the exercise in the VE. Although our VE was immersive and we did not use a non-immersive condition, as was the case in Clifford et al. (2018), the authors argue that there would not be a significant difference between them since the results do not show an increase but rather a decrease in cybersickness. The results show a clear tendency towards reducing values after experiencing the VE. Since these scales are used to measure the impact of a VE on cybersickness, a reduction in symptoms indicates that our VE was successful in not provoking cybersickness. Nevertheless, we should note that the movements in the VE were limited, which could have contributed to the low cybersickness scores.

Regarding the HRV data, of the six HRV parameters analysed, three showed significant differences between conditions, and one showed indicative differences (Table 8). The parameters that showed differences, namely AVNN, pNN20, LF and HF, shared the same trend across environments: all had their highest values in the BE, followed by the habituation environment, the VE and, finally, the RE. Although post-hoc tests showed that significant differences occurred in only one or two comparisons, this was a positive result because it suggests that, according to the parameters’ stress trends (Castaldo et al. 2015), the order of environments by stress level matched our expectations: from lowest to highest, baseline, habituation, virtual and real. It also validates our objective approach to measuring participants’ stress in different environments.

When analysing the comparisons where significant differences were obtained, one can see that the parameter that obtained the best results was AVNN. In the AVNN parameter, significant differences between the BE and both VE and RE were shown. Lower values of AVNN were shown in the VE and RE. This is a positive result since a reduction in AVNN values suggests an increase in cardiac sympatho-excitation, characteristic of stress conditions (Tharion et al. 2009). Moreover, we consider this to be a positive result because it suggests that the VE had an impact similar to the RE. In the remaining parameters, post-hoc tests showed significant differences between the BE and the RE but not between the BE and VE. This is a positive result for the RE, but not so much for the VE, as it suggests that its ability to cause stress in the participant is lower than that of the RE. A factor that may have played an important role in this result is the low value of experienced realism reported by users in the presence questionnaire. The authors hypothesise that improving the VE with a focus on this presence scale would positively affect both the questionnaire results and the participants’ stress response. Also, the authors hypothesise that the presence of a virtual trainer in the VE, as opposed to a real trainer in the RE, can be a factor that affects the reported stress. Further tests are needed to verify these hypotheses.

6 Conclusions

This article presented an experiment that aimed to compare virtual training with real training in a set of variables of interest to understand if VR can create a perceptually equivalent response to the real world. Such variables of interest were the transfer of knowledge, sense of presence, cybersickness, perception of fatigue, perception of stress and the actual stress, measured through participants’ HRV. The virtual training was carried out in a VE and the real training in a RE; both presented the same firefighting training exercise. The same assessment instruments as the VE were used in the RE except for the sense of presence and cybersickness questionnaires since these are related to VR-based experiences.

The results from the knowledge questionnaire led us to conclude that the VE had a positive effect on knowledge transfer. The same effect was not visible in the RE because the participants performed the VE first and retained knowledge from it. Consequently, the result of the knowledge questionnaire applied before the RE exercise was already high. Regarding the perception of stress and fatigue, the results led us to conclude that there was some perceived stress in the weeks before the RE exercise. The perception of stress and fatigue immediately before and after the exercise was low in both environments.

Regarding the presence and cybersickness questionnaires, the results of the first led us to conclude that although the realism experienced did not obtain a good result, the feeling of presence, in general, was moderately high. The results of the second suggest that the VE was successful in not causing cybersickness symptoms. Finally, the HRV results led us to conclude that both the virtual and real environments induced signs of stress. However, the stress indicators were weaker in the VE.

To conclude, the authors find the results of this study encouraging. Although the level of stress evidenced by participants’ HRV in the VE cannot be considered similar to that shown in the RE, the remaining results show a positive effect of the VE in the transfer of knowledge, in obtaining a perception of stress and fatigue similar to the RE and in not causing symptoms of cybersickness. A possible limitation of the study is related to the order of exposure to the scenarios, where the VE was always introduced before the RE, which could affect the outcomes. In the future, the authors plan to explore if the order of exposure to the training environments impacts the outcomes and if firefighter recruits that are only exposed to the RE have the same level of knowledge retention shown in this study with the VE. Furthermore, they plan to improve different components of the VE that contribute to a higher experienced realism and conduct further tests to verify if an increase in experienced realism leads to responses closer to those shown in the RE.