Background

Natural and man-made disasters such as floods, mass-accidents, and terrorist attacks are ubiquitous and cause loss of life, human suffering, and infrastructural damage [1, 2]. They create particularly demanding situations for emergency services, as they are unforeseen and usually sudden events that exceed local capacity and resources to rescue and care [1]. During such disasters, medical first responders (MFRs), who are responsible for the initial prehospital care in medical emergencies, play a key role [3, 4]. However, numerous healthcare professionals, including MFRs, perceive their preparedness for the response to disasters as inadequate [5]. As previous research indicates, a higher training frequency and better training quality are associated with increased disaster preparedness [5]. To enhance the overall quality of MFR training, the aim of this review is to provide an overview of scientifically evaluated training methods and to examine whether certain methods seem to be particularly effective. Furthermore, indicators used to evaluate the training effectiveness will be identified so that future research can be guided by existing training evaluation methods. Finally, the emergence of new, immersive technologies, including virtual (VR) and mixed reality [MR; 6], has led to the development of new training programs which are becoming increasingly accessible to educators in the medical sector [7]. Therefore, we will draw particular attention to the role of immersive technologies by providing an additional analysis of how and to what extent VR and MR specifically are included in current disaster training research.

MFRs typically include paramedics and emergency medical technicians [3], but the term may also refer to physicians, ambulance specialist nurses, and trained volunteers depending on a country’s emergency medical service systems [8, 9]. During disasters, MFRs take on a variety of tasks such as the initial scene evaluation, triage, medical care, and the transport of patients [3]. They have to perform those tasks under stressful and challenging conditions, such as difficult access to the disaster site, multiple injured people and disruption in communication systems [10]. In order for MFRs to adapt to these unusual conditions, they require specifically tailored training.

Effective training involves a systematic and goal-oriented execution of exercises for the acquisition or increase of specific competences and skills [11]. The general idea of training is to challenge the current level of performance (e.g., higher intensity, higher difficulty, new content) without being too overwhelming, so that the trainee can adapt and reach a higher performance level [11,12,13,14,15]. However, training resources, including time, budget, and facilities, are usually limited. Therefore, training methods must be not only effective, but also match the resources of the rescue organization.

Despite the necessity of adequately preparing MFRs for disasters, no systematic and up-to-date overview of scientifically evaluated training methods and their effectiveness exists. Ingrassia and colleagues conducted an internet-based search via Google and Bing and identified several disaster management curricula at a postgraduate level with a large variety of methods, e.g., lectures and discussion-based exercises [16]. The trainings’ effectiveness, however, was not evaluated. Assessing studies published between 2000 and 2005, Williams and colleagues [17] concluded that the available evidence had not been sufficient to determine whether disaster training can effectively increase the knowledge and skills of MFRs and in-hospital staff. Because these findings are derived from studies conducted more than 15 years ago, new insights have most likely emerged and new training methods may have been added following recent technological advances.

Two of those new methods are VR and MR. In VR training, users are placed inside a simulated, artificial, three-dimensional environment in which they can interact with their digital surroundings [6]. VR can either be screen-based, using computer monitors, or experienced in more immersive forms: through head-mounted displays or in rooms equipped with several large screens or projections on several walls (i.e., a CAVE system) [6, 18]. In contrast, MR combines the real and virtual worlds and refers to the whole spectrum between reality and VR. MR, for example, includes augmented reality (AR), in which users see their real surroundings supplemented with virtual objects [6]. A specific application in the medical field may be the visual insertion of patient information during practice. Given the rapid development of immersive technology, this review provides an additional analysis of the role of VR and MR training.

Altogether, the following research questions are addressed:

  1. Which current disaster training methods for MFRs have already been scientifically evaluated?

  2. Which effectiveness indicators are used to evaluate MFR disaster training methods?

  3. Based on the findings of the reviewed studies, which methods for MFR disaster training seem to be effective?

  4. How and to what extent are VR and MR used to prepare MFRs for disasters?

Methods

The preregistered (osf.io/yn5v3) systematic literature search was conducted in accordance with the PRISMA guidelines [19].

Search strategy

The search strategy was prepared with the support of a medical information specialist to ensure the appropriateness of the search terms. Using the search engines Web of Science and PubMed, we applied search terms such as health personnel, training, and disaster (see Additional file 1 for the search string). To ensure that the results reflect current training methods, the electronic search was limited to studies published between January 2010 and September 2021. A filter limited our results to studies with a full text in English.
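The date and language limits can be pictured as a simple record filter. The following is a minimal sketch under stated assumptions, not the procedure actually used (the limits were applied through the search engines' own filters); the `Record` fields and the example entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Search window stated in the text: January 2010 to September 2021.
WINDOW_START = date(2010, 1, 1)
WINDOW_END = date(2021, 9, 30)


@dataclass
class Record:
    """Hypothetical bibliographic record; the field names are assumptions."""
    title: str
    published: date
    language: str


def within_search_limits(record: Record) -> bool:
    """Return True if the record is inside the date window and in English."""
    in_window = WINDOW_START <= record.published <= WINDOW_END
    return in_window and record.language.lower() == "english"


# Made-up records, for illustration only.
records = [
    Record("Triage drill evaluation", date(2015, 6, 1), "English"),
    Record("Older disaster course", date(2008, 3, 1), "English"),
    Record("Katastrophentraining", date(2019, 2, 1), "German"),
]
eligible = [r.title for r in records if within_search_limits(r)]
```

Only the first made-up record passes both limits; the second falls outside the date window and the third is not in English.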

Inclusion and exclusion criteria

We included articles that described a training or training session (e.g., drills, lectures, or mixed methods training) conducted to improve the participants’ prehospital disaster response. The training had to address prehospital content, but was allowed to also contain in-hospital topics. Participants had to be MFRs, regardless of whether they were still in training or already had work experience. In addition, to ensure adequate assessments of the effectiveness, we only considered (quasi-)experimental designs in which outcomes were compared to a control or comparison group or, at minimum, to a pre-test of the same group (i.e., randomized controlled trials (RCTs), non-RCTs, and single-group pre-post designs) [20, 21].

We excluded studies that a) did not test the effectiveness of a disaster training for MFRs, b) contained other occupational groups or not sufficiently specified groups (e.g., “others”) in addition to MFRs without reporting separate analyses for the MFRs, c) were not primary studies published in peer-reviewed journals, and d) had no full-text available.

Selection process

The search was conducted on 28th October 2021 and led to 4533 hits (Fig. 1). Duplicates were identified with the software Endnote™ (Version 20.1) and additional visual screening. Two raters (ASB and RW) independently screened the remaining hits and performed the study selection using the web application Rayyan [22]. Discrepancies in the study selection process were resolved by consensus or, if necessary, together with a third rater (YH). Fifty-five studies were included in the review.
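The two-rater selection rule described above can be sketched as a small decision function. This is only an illustration of the stated logic (agreement first, then consensus, then a third rater); the function name and parameters are hypothetical and do not describe how Rayyan works internally.

```python
from typing import Optional


def resolve_decision(rater_a: bool, rater_b: bool,
                     consensus: Optional[bool] = None,
                     third_rater: Optional[bool] = None) -> bool:
    """Combine two independent include/exclude votes into a final decision.

    True means "include". If the two raters disagree, the consensus outcome
    is used when available; otherwise the third rater's vote decides.
    """
    if rater_a == rater_b:
        return rater_a
    if consensus is not None:
        return consensus
    if third_rater is not None:
        return third_rater
    raise ValueError("Discrepancy unresolved: need consensus or a third rater")
```

For example, `resolve_decision(True, False, consensus=False)` excludes the study, while `resolve_decision(False, True, third_rater=True)` includes it.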

Fig. 1 Flow diagram of study selection

Data collection and analysis

Two raters (ASB and RW) extracted the relevant information for each article. Again, discrepancies were resolved by consensus, and when necessary together with the third rater (YH). Whenever studies used multiple methods at different time points we only considered those applied between the pre- and post-measurement. For trainings evaluated in (non-)RCTs without a pre-test, methods must have been applied before the post-test comparison with control groups. Similarly, only effectiveness indicators with sufficient informative value about training success or failure were considered (i.e., indicators used for pre-post comparisons or for comparisons with control groups). To assess the studies’ quality and risk of bias, we used the Joanna Briggs Institute (JBI) critical appraisal checklists for RCTs and quasi‐experimental studies [23]. The JBI tool for quasi-experimental studies contains nine questions and the JBI tool for experimental studies consists of 13 questions (e.g., “Were outcomes measured in a reliable way?”). There are four possible answer options: yes, no, unclear, not applicable. The answer yes indicates quality while no indicates a risk of bias.
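The appraisal logic described above amounts to tallying the answers per study. The sketch below assumes the checklist lengths and answer options stated in the text; the function name and the example answers are hypothetical and do not correspond to any included study.

```python
from collections import Counter

# Answer options and checklist lengths as stated in the text.
VALID_ANSWERS = {"yes", "no", "unclear", "not applicable"}
CHECKLIST_LENGTH = {"quasi-experimental": 9, "rct": 13}


def summarize_appraisal(design: str, answers: list[str]) -> Counter:
    """Tally the checklist answers for one study after basic validation."""
    expected = CHECKLIST_LENGTH[design]
    if len(answers) != expected:
        raise ValueError(f"{design} checklist needs {expected} answers")
    unknown = set(answers) - VALID_ANSWERS
    if unknown:
        raise ValueError(f"Unknown answer options: {sorted(unknown)}")
    return Counter(answers)


# Made-up answers for a single quasi-experimental study (illustration only).
example = ["yes", "yes", "yes", "no", "yes", "unclear", "yes", "yes", "yes"]
summary = summarize_appraisal("quasi-experimental", example)
risk_items = summary["no"]  # number of items indicating a risk of bias
```

Counting the "no" answers per study gives the kind of per-study summary reported in the quality assessment.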

Results

The majority of studies used a single-group pre-post design (k = 35). Other study designs included non-RCTs (k = 6) and RCTs (k = 14), with 15 of these 20 studies containing pre-post testing (see Table 1 for a full overview and Additional file 1: Table 1 for further information). Sample sizes varied widely between studies (range 6–524). Trainings took place on several continents, with the majority conducted in North America (k = 24), followed by Asia (k = 18), Europe (k = 8), and Australia and Africa (both k = 2); for one study, the location was unclear (k = 1). The majority of tested trainings addressed general disaster management or several disaster-related topics (k = 31), followed by triage (k = 14) and trauma management/sonography (k = 3), among others. Furthermore, the time spans of the trainings ranged from one day or less (k = 22) to up to eight months (k = 33).
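As a quick consistency check, the study counts reported above can be summed against the 55 included studies. The sketch uses only the k values given in the text; the dictionary and function names are illustrative. The topic counts are left out because the text lists them only in part.

```python
TOTAL_INCLUDED = 55  # number of studies included in the review

# k values as reported in the text.
designs = {"single-group pre-post": 35, "non-RCT": 6, "RCT": 14}
regions = {"North America": 24, "Asia": 18, "Europe": 8,
           "Australia": 2, "Africa": 2, "unclear": 1}
durations = {"one day or less": 22, "up to eight months": 33}


def sums_to_total(counts: dict[str, int], total: int = TOTAL_INCLUDED) -> bool:
    """Return True if the category counts add up to the number of studies."""
    return sum(counts.values()) == total


checks = {
    "designs": sums_to_total(designs),
    "regions": sums_to_total(regions),
    "durations": sums_to_total(durations),
}
```

All three breakdowns add up to 55, so each study is counted exactly once per category.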

Table 1 Overview of included studies

Research question 1: overview of training methods

The majority of studies reported trainings that contain a combination of several methods, either in the intervention group, control group, or in both (k = 42). Training methods could be categorized into traditional and technology-based methods (Fig. 2). Traditional categories reflect lectures, real-life scenario training (e.g., mass casualty incident simulations with actors or manikins), discussion-based training (including seminars, workshops, in-class games, tabletop exercises), practical skills training (e.g., regional anesthesia), field visits (e.g., the visit of disaster affected sites or riding with the prehospital physician vehicle), and debriefings. In contrast, the technology-based category is composed of computer-based learning (i.e., online learning, educational computer programs), screen-based serious gaming, educational videos, and VR/MR. The term serious gaming refers to computer-based learning that additionally contains game elements, such as cooperation, competition, and stories [24].

Fig. 2 Overview of the distribution of traditional and technology-based training methods

Research question 2: effectiveness indicators

The trainings were evaluated with several effectiveness indicators, including knowledge and performance, but also self-reported measures (Fig. 3). Most frequently, knowledge gain was used as an indicator. Knowledge was mainly assessed with a basic knowledge test on the training content, often in a multiple-choice format. Some studies used an applied knowledge test that consisted of a written test with several patient descriptions which had to be classified into triage categories. Less than one third of the studies used performance as an indicator. Performance assessments were frequently conducted in triage simulations [25,26,27,28,29,30,31,32,33] but also in other contexts, e.g., the management of patients affected by chemical, biological, radiological, nuclear and/or explosive events (CBRNE) [34, 35] and the execution of specific medical procedures [36, 37]. Several of those studies focused on measures that could be determined easily and relatively objectively, including accuracy of triage or treatment decisions [26,27,28,29,30,31,32,33, 38], time needed [28,29,30, 33, 38] or compliance with the correct procedure [30]. In ten studies, raters composed an overall performance score based on several criteria [25, 28, 32, 34,35,36,37,38,39,40], e.g., the evaluation of safety on site [25, 32] and airway/breathing interventions [25, 28]. Three of those studies used already existing assessment instruments, either for treating CBRNE patients [34, 35] or for single patient care in low-resource countries [40]. Only one study used team performance as a measure of effectiveness by letting raters compose an overall team performance score for the management of simulated disaster scenes [32]. All other studies measured individual performance only. Furthermore, several studies used self-reported measures, including preparedness/readiness [39, 41,42,43,44] and (self-)confidence [45,46,47,48,49,50]. In addition to knowledge, performance and self-reported measures, one study compared the level of immersion in VR to real-life scenario training [31].
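Two of the objective performance indicators named above, triage accuracy and time needed, are simple to compute. The following is a minimal sketch, assuming accuracy is the share of patients assigned the reference category and time is averaged per patient; the included studies may have defined these measures differently. The function names, colour categories, and values are hypothetical.

```python
def triage_accuracy(assigned: list[str], reference: list[str]) -> float:
    """Share of patients whose assigned category matches the reference."""
    if not reference or len(assigned) != len(reference):
        raise ValueError("Need equally long, non-empty category lists")
    correct = sum(a == r for a, r in zip(assigned, reference))
    return correct / len(reference)


def mean_time_per_patient(total_seconds: float, n_patients: int) -> float:
    """Average time needed per triaged patient, in seconds."""
    if n_patients <= 0:
        raise ValueError("Number of patients must be positive")
    return total_seconds / n_patients


# Made-up trainee data for illustration only.
assigned = ["red", "yellow", "green", "green", "black"]
reference = ["red", "yellow", "yellow", "green", "black"]
accuracy = triage_accuracy(assigned, reference)   # 4 of 5 correct
avg_time = mean_time_per_patient(150.0, 5)        # 30 seconds per patient
```

Measures of this kind need no rater judgment, which is one reason they can be determined easily and relatively objectively.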

Fig. 3 Effectiveness indicators with the number of articles that used them

Research question 3: effectiveness of training methods

All training methods demonstrated a certain effectiveness, as most studies reported positive or at least partially positive effects of the different methods (see Table 2 for an overview of the methods’ effectiveness).

Table 2 Effectiveness of methods

Lectures were mostly used in combination with other methods and often served the initial theoretical knowledge transfer [46, 51,52,53,54,55,56]. There were three studies in which only lectures occurred between the pre-test and post-test. Two of these evaluated educational refresher sessions and reported a positive impact on knowledge [57] and performance [28]. The third one concluded that lectures led to similar performance but lower knowledge gain and partially lower training satisfaction than the combination of lectures and discussion-based training [33]. Multimethod trainings with lectures showed mixed results regarding knowledge and performance but positive effects on self-reports of preparedness, knowledge, competence, confidence, and self-efficacy.

Real-life scenario training was often similarly or less effective compared to technology-based training. Studies that compared real-life scenario training to either educational videos [58] or VR [31] reported a partially lower impact of real-life practice on knowledge [58] and similar impacts on performance [31] and training satisfaction [31]. In combination with other methods, the training also resulted in similar [29] or slightly lower [25] performance but greater knowledge gain [25] than VR training and lower self-reported competence than serious gaming [24].

Discussion-based learning was often combined with other methods and resulted in mixed knowledge outcomes but at least partially positive effects on performance and self-reports of preparedness, competence, confidence, and self-efficacy. However, two studies reported smaller performance improvements [30] and self-reported competence gain [24] than trainings that contained serious gaming.

Practical skills training was never tested as a sole method. Compared to technology-based training, multimethod training with practical skills exercises always resulted in similar or smaller effects. Compared to trainings that contained computer-based learning instead, trainings containing practical skills exercises led to similar [37] or lower [59] knowledge gain as well as similar performance levels [37] and self-reported learning gains [59]. Furthermore, multimethod training with practical skills exercises resulted in lower performance, self-reported preparedness, and self-reported competence than screen-based VR [39] and lower self-reported competence than serious gaming [24].

Field visits were part of five trainings and varied considerably in their content and length. Evidence suggests positive effects on knowledge, performance, and self-reports of knowledge and competence. One paper compared a visit of a large ambulance bus to VR and MR training and concluded that the visit was less effective in increasing performance [38]. However, trainees only had one hour in the ambulance bus to practice finding essential objects while the VR and MR group could practice as many times as they wanted within one week (at least three times).

Debriefings were only explicitly tested once. The study used drone videos from a real-life scenario training that the trainees had previously undergone [60] and partially confirmed a positive effect on (self-)perception. In combination with other methods, debriefings led to positive outcomes on performance as well as on self-reports of knowledge, confidence, and preparedness. There were mixed findings regarding objectively measured knowledge. Furthermore, compared to computer-based learning, multimethod training with debriefings led to lower knowledge scores and similar self-reported learning gains [59]. It also led to lower self-reported competence than serious gaming [24].

Computer-based learning as a stand-alone method or in combination with other methods led to improvement or partial improvement in knowledge, performance, and self-reports of preparedness, competence, and self-efficacy. Computer-based training resulted in greater knowledge gain and similar self-reported learning gains compared to traditional training [59]. Computer-based learning also led to similar knowledge and performance improvements as practical skills training, both combined with videos [37].

Educational videos usually led to at least partial knowledge gain and performance improvements as well as a partially greater knowledge gain than real-life scenario training [58]. Only one multimethod study did not find an effect on knowledge. Studies also reported positive outcomes on self-reported preparedness and competence.

Serious gaming was only evaluated in two studies [24, 30]. Ma and colleagues reported that game-based teaching resulted in significantly higher self-reported disaster nursing competence than traditional training [24]. Knight and colleagues tested a multimethod training including a lecture and serious gaming within VR [30]. Compared to traditional training, it fostered better triage accuracy and partially better step accuracy. The time needed to triage did not differ between groups.

Research questions 3 and 4: current role and effectiveness of VR and MR

The VR/MR training systems were mostly used for MFR groups with little or no work experience, including students [29, 31, 61,62,63,64], cadets [38] or job starters [25]. Seven studies tested trainings that contained PC-screen-based VR (Fig. 4), although always in combination with other methods [29, 30, 39, 61,62,63,64]. Five of them covered the topic of triage [29, 30, 61,62,63], two decontamination [61, 64], one the management of COVID-19 patients [39], and one general disaster scene management [63]. The virtual scenarios mainly included manmade disasters such as traffic accidents [29], explosions in busy areas [30, 61], building collapse and fire on boats at a seaport [63] while one simulated a major earthquake [62]. Two studies that tested pre- and in-hospital trainings used either scenarios in both settings [39] or only an in-hospital scenario [64].

Fig. 4 Overview of VR/MR studies

During the VR exercises, trainees were able to move their avatar around and perform a variety of interventions, e.g., breathing/airway checks [29, 30, 62]. While the participants usually used a mouse, keyboard and/or a joystick, one screen-based VR system tracked the trainees’ movements with a webcam as they performed decontamination exercises [64]. Training that contained screen-based VR led to mixed findings regarding knowledge but to positive performance and self-efficacy outcomes. Compared to exclusively traditional trainings, training with screen-based VR led to greater knowledge gain [39] and self-reported preparedness [39] as well as partially greater [30, 39] or similar performance levels [29]. Furthermore, the combination with computer-based learning led to greater knowledge gain than computer-based learning alone [61].

Three studies evaluated immersive VR technology [25, 31, 38]. The first one evaluated triage training in a VR CAVE [25]. The scenario was an explosion in an office building. To perform the triage, trainees observed virtual patients to assess their respiratory rate and verbally requested pulse rates. In terms of training effectiveness, the VR exercise resulted in slightly better performance but poorer knowledge scores than real-life scenario training, both in combination with lectures. Two studies evaluated VR training with head-mounted displays in which trainees used controllers to interact with their virtual surroundings [31, 38]. The first one tested a triage training with a car chase and shooting scenario [31]. Participants could click on icons attached to each casualty to gather basic clinical information and allocate triage cards. The other VR training was designed to help MFRs orient themselves in a large ambulance bus by practicing finding essential medical equipment [38]. Both VR systems provided feedback regarding the correctness and time of task execution. Overall, these two VR trainings with head-mounted displays led to similar [31] or greater [38] performance than traditional training and to a similar learning satisfaction [31]. One of those studies, however, indicated a higher immersion level during real-life simulations, which seemed to be caused by the subscale physical demand [31]. The only study using MR compared AR training to VR with head-mounted displays and traditional education [38]. The AR training was identical to the VR ambulance bus training except for the use of an AR headset with transparent lenses. The device projected holograms in the trainees’ field of view. With click gestures, they were able to interact with their environment, for example by opening and closing drawers. The AR training resulted in better performance than traditional training, although the improvement was smaller than with the VR training.

Quality assessment

Overall, the study quality was satisfactory (for a detailed overview see Additional file 1: Tables 2 and 3). For the experimental studies, either none (k = 9) or one question (k = 5) out of 13 were answered with no. For the quasi-experimental studies, usually none (k = 4), one question (k = 23) or two questions (k = 13) were answered with no. There was only one paper for which four out of nine questions were answered in the negative [46]. The higher risk of bias in the quasi-experimental studies was mainly based on question 4, which assesses the control group because a large part of the studies had a single group pre-post design (k = 35). Furthermore, some studies did not have a complete follow-up or a detailed explanation or analysis for the dropout (k = 9).

Discussion

Well-trained MFRs are essential for managing disaster situations with multiple casualties [3, 4]. To ensure that future disaster training is as effective as possible, we conducted this review on scientifically evaluated trainings, which comprised both traditional and technology-based methods. The trainings were evaluated with several different effectiveness indicators, including knowledge, performance, self-reported measures, and immersion. Despite the heterogeneity of methods and outcome measures, some conclusions could be synthesized. While all methods demonstrated effectiveness, the results of this review suggest that technology-based methods often lead to similar or greater training outcomes than exclusively traditional training. Furthermore, we found ten studies that used VR, although usually combined with other methods and often PC-screen-based. Only one study evaluated MR training [38].

Although trends in effectiveness could be identified, the data basis was not sufficient to declare some methods as unequivocally more effective than others. Training methods were often tested in combination, which made it difficult to draw unbiased conclusions about individual methods. Furthermore, the various effectiveness indicators that were used had only limited comparability. Fewer than one-third of the included studies used performance observation as an evaluation tool. Instead, several studies used knowledge tests or self-assessments (e.g., confidence) although these have limited predictive value for actual performance [65,66,67]. Despite the great variety in studies, the available data point to the strength of technology-based methods. Several studies compared technology-based training to training with real-life scenario exercises, which are usually considered the gold standard of disaster training [68]. While these studies suggest the great potential of technology-based methods, there may be a certain degree of bias. Real-life scenario training often served as (part of) the exclusively traditional training for control groups. Therefore, studies that did not find at least an equivalent effect of their newly developed technological methods may not have been published. Instead, the training technology might have been improved and retested until it was similarly or more effective, leading to a publication bias. The same might apply to practical skills training, which was always used in combination with other methods and resulted in similar or lower training effectiveness than trainings that contained technology-based methods.

Generally, the current literature indicates that technology-based methods are well suited to train MFRs for disasters. Given the usually limited resources of MFR organizations, these methods promise to be particularly beneficial. Although initial investment in the technology is required, it can then be used flexibly and repeatedly. Thus, a higher, more individually adapted training frequency can be created than with many traditional methods, especially real-life scenario training.

Current use of VR/MR and its future potential

Seven out of ten studies that tested VR training focused on non-immersive, screen-based VR. The advantage of screen-based VR is that usually no hardware other than normal computer accessories is required. However, more immersive trainings offer greater similarity to experiencing real disaster situations and could therefore be even more useful for preparing MFRs for stressful and unfamiliar situations. Given that high stress can affect the performance of MFRs, training should explicitly address stress responses [69, 70]. Although some of the reviewed trainings contained in-class teaching about dealing with emotions or stress (e.g., [39, 55, 71]), we found no studies that explicitly conducted scenario-based training while assessing and controlling for stress responses. To provide more insight into behavioral changes under stress, future studies should conduct and evaluate explicit disaster training with (continuous) stress measurements to investigate its potential for MFRs. The ongoing improvement of immersive VR and MR technology [72] seems quite promising as it can provide increasingly realistic immersive training scenarios with fewer organizational demands than real-life simulations regarding time and space. Users can experience and practice an almost unlimited number of scenarios in which demands and difficulty levels can be designed as needed [73]. Our results indicate that practical exercises with immersive technology can be conducted nearly everywhere, at any time, and with relatively little preparation, i.e., without setting up a real disaster scene. Furthermore, technical progress in recent years now allows several people to interact within the same virtual environment [74] and treat patients together as in realistic rescue operations.

Future research

Given the heterogeneity of the current literature, future research should further investigate the effectiveness of individual training methods but also systematically assess whether certain combinations work particularly well. Furthermore, training methods and validated training evaluation tools should be developed not only in terms of effectiveness, but also in terms of efficiency as (financial) resources are often limited. The results of this review suggest, for example, that technological methods such as serious gaming and VR are similarly good or better than traditional methods so that complex real-life scenario trainings with actors could be at least partially replaced. There is also initial evidence that lectures, as an easily implemented method, are well suited for refresher sessions. Future research still needs to clarify the usefulness of immersive VR and especially MR as we only found one MR experimental study that matched our inclusion criteria.

The effectiveness-efficiency trade-off also applies to training evaluation. While knowledge tests offer the advantage of being very easy to conduct and evaluate, the transferability of training success to actual operations is unclear. Performance evaluations during (virtual or real-life) scenario training may be more suitable as they are closer to the target behavior of MFRs during disasters. This review has already identified some indicators, including accuracy of decisions, time needed and compliance with the correct procedure. Future research should focus on finding the appropriate performance measures for diverse disaster training contents in terms of resource efficiency, usability, and relevance. New training technologies could also provide further opportunities for performance assessment, e.g., eye-tracking to gain insights into attentional processes. Furthermore, the assessment of team performance has hardly been considered in disaster training research, although MFRs mainly work in teams. Disaster management is a team effort and is often done in ad-hoc teams similar to other domains of acute care medicine [75]. Improved and trained teamwork improves medical performance [76]. Future studies should also assess long-term benefits of the different training methods and their combination as most of the studies we found only conducted pre-post testing within a few days or weeks.

Limitations

Our review has three main limitations. First, we only included studies published in English, so we might have missed relevant studies published in other languages. Second, we only kept studies in which it was either evident that the sample only consisted of MFRs or in which separate analyses for MFRs were provided. This led to the exclusion of some studies with insufficiently specified sample categories such as “others”. However, it is possible that these participants were also MFRs. Third, we decided to include only quasi-experimental and experimental studies. We consider this a strength of this systematic review, as it allowed us to create a better overview of the trainings’ effectiveness. Nevertheless, we cannot draw conclusions about what training methods are generally used in disaster training research and whether new methods have been added without being tested in (quasi-)experiments.

Conclusion

We found several traditional and technology-based training methods. The trainings were mainly evaluated with knowledge tests and self-reported measures, while fewer than one third of the studies also used actual performance measures. For valid and yet inexpensive evaluations, objectively assessable performance measures, such as accuracy, time, and order of certain actions, can be used. In this review, we found that technology-based methods were often similarly or more effective than traditional training. They therefore offer great potential to supplement or at least partially replace traditional training, especially because organizing real-life scenario training, the gold standard, can be costly and time-consuming. Two training technologies that have become increasingly popular and affordable are VR and MR. This review suggests that they have great potential, which is why further assessments of these technologies are required.