Background

Healthcare simulation training is frequently recommended as a strategy for improving patient outcomes and is thus often suggested for training for high-risk situations such as postpartum hemorrhage (PPH) [1,2,3,4,5]. Achieving such improved outcomes, however, requires that a large number of interconnected elements be present, including the effectiveness of the simulation training itself [6, 7]. The current literature provides ample evidence of healthcare simulation training leading to positive learning outcomes [8, 9]. Evidence regarding transfer of learning (i.e., learners’ ability to apply the acquired knowledge and skills in the workplace after training) is, however, still being consolidated [10, 11]. Systematic reviews exploring the effectiveness of simulation training have underscored adherence to evidence-based instructional design guidelines as a condition for achieving such transfer [12, 13].

Instructional design (ID) guidelines are based on sound learning theories and models and present a number of cognitive principles that aim to optimize complex learning and learning transfer [14, 15]. Complex learning concerns the proper integration of knowledge, skills, and attitudes, which is essential for the management of high-risk situations such as PPH [16]. Systematic reviews exploring the impact of simulation training on patient outcomes have already acknowledged the relevance of design features such as variability (clinical variation), repetitive practice of routine aspects, increasing complexity, mastery learning (uniformly high achievement of standards), and the provision of feedback [7, 13, 17, 18].

Among the various ID guidelines available, Merrill’s First Principles of Instruction [19] is a meta-model that provides an overarching summary of available ID guidelines [20] and proposes five key instructional principles for task-centered learning, based on careful analysis of a wide range of cognitive learning models: (1) identification of an authentic problem (since learning is promoted when learners are engaged with real-world problems), (2) activation of prior knowledge as the foundation for new knowledge, (3) demonstration of the task to be learned, (4) application of newly acquired knowledge by learners, and (5) integration or transfer of new knowledge into the learner’s world.

Simulation training has been widely advocated for PPH, the leading cause of maternal mortality worldwide, because most PPH-related deaths are attributable to management failures. To avoid such failures, which include delayed diagnosis, poor communication, and a lack of adequate education and training, simulation training must be effective for both learning and transfer of learning [1, 5, 21,22,23,24].

Applying evidence-based ID guidelines to healthcare simulation training formats should be a priority when aiming to achieve transfer of learning and improve patient outcomes [10, 12]. This is of particular relevance for commonly encountered high-risk situations, such as PPH, in which achievement of adequate complex learning may be essential for maximizing patient safety [22, 25]. We therefore explored the available literature for descriptions of the ID features used in PPH simulation training.

Methods

The present study aimed to explore the extent to which articles in the literature describe simulation training programs for dealing with a high-risk situation—in this case PPH—as adhering to evidence-based ID guidelines.

We invited a panel of healthcare experts to appraise the use of evidence-based ID guidelines in PPH simulation training programs described in the literature by scoring the extent to which their use was described (or noting the absence of such description). We chose a particularly prevalent high-risk situation, PPH, as the training content to be analyzed, on account of its epidemiological importance [4], which has led to the widespread use of PPH simulation training programs. This study formed part of a broader research project on the use of instructional design guidelines in postpartum hemorrhage simulation training, which was submitted to and approved by the Institutional Review Board of the Instituto de Medicina Integral Prof. Fernando Figueira (IMIP), in Recife, Brazil, on March 17, 2012, CAE No. 0034.0.099.000-11.

Participants

The participating raters were healthcare experts with a background in health education and, in particular, the training of health professionals. The raters were identified in two rounds and invited by email to collaborate. In the first round, from June 2015 to August 2015, we contacted authors and co-authors of previously published articles describing PPH simulation trainings. In the second round, from November 2016 to December 2016, we identified authors of abstracts listed in the Abstracts book of the International Association for Medical Education (AMEE) Conference 2016 with topics related to simulation and/or instructional design. Google Scholar profiles and similar webpages were used to locate contact information, confirm the authors’ backgrounds in health education and training, and exclude undergraduate students. The raters contacted were asked to recommend other healthcare experts with a similar background who could also be invited. After both rounds, 98 raters were invited by email, of whom 60 agreed to participate and 40 returned the completed rating scales.

Materials

The rating scale used for the analysis was based on Merrill’s First Principles of Instruction. Table 1 presents the complete list of the 24 rating-scale items, which were divided into the following five subscales: (1) authenticity, (2) activation of prior knowledge, (3) demonstration, (4) application, and (5) integration/transfer. Each item of each subscale was rated on a 5-point Likert scale: 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, and 5 = strongly agree. If the corresponding item had not been described in the article reporting the PPH simulation training, raters could select a “not described” or “not applicable” option.

Table 1 Rating-scale items used for the analysis of articles, based on Merrill’s First Principles of Instruction

The rating scale was pre-tested in a pilot study with seven instruction experts, who confirmed its clarity.

We analyzed PPH simulation training programs as described in articles identified by searching PubMed, ERIC, and Google Scholar for studies published in English between January 2007 and March 2017, using the following keywords: “post-partum hemorrhage” AND (“simulation” OR “simulation training” OR “medical simulation” OR “obstetric simulation”). We included studies retrieved by our keyword search that described simulation training scenario(s) aimed at the complex management of PPH and attended by healthcare professionals. Articles were excluded if they lacked a description of the PPH simulation training itself, provided a secondary analysis of a PPH simulation scenario already described in another included article, or described simulation scenarios intended for the training of specific individual PPH management-related skills. Our search yielded 51 studies; after the exclusion of 19 (10 lacking a description of the PPH simulation training itself, six describing simulation scenarios for the training of specific individual PPH management-related skills, two conference abstracts, and one secondary analysis of a PPH simulation training already described in another included article), the remaining 32 articles were analyzed. These 32 articles were subdivided into the following five subsets to facilitate distribution for scoring by the raters: articles 1–7, 8–14, 15–21, 22–28, and 29–32. Figure 1 presents a flow diagram of the selection of the articles analyzed.

Fig. 1
figure 1

Flow diagram of the selection of the articles/PPH simulation trainings and subset distribution for raters

We prepared information tables for each article to facilitate the raters’ analysis. These contained the following information: article title, publication date, journal and publishing data, abstract, study design as described in the article, number of participants, and instructional aspects of the training. The selected articles were also read carefully, multiple times and in full, in search of any description of the following training aspects of the PPH scenarios: presentation, practice, feedback, and assessment. The relevant text segments were extracted and highlighted in the prepared tables as instructional aspects of the training. The full text of all the articles was also made available for consultation. Some of the raters reported consulting the full text only to confirm the absence of a description of one or more training aspects.

Procedures

Upon agreeing to participate in the study, each rater received, by email, one of the subsets of articles for analysis along with the rating scale; subsets were distributed in a crossover fashion to avoid self-rating (for those who were both raters and authors of included articles). We also provided an instructional guide to the ID model, to ensure that raters were fully informed when assessing the articles, together with detailed instructions on how to fill out the rating scale (each item to be rated for each paper) and the corresponding subset information tables, which were sufficient for the analysis. All raters were invited to consult the authors by email, if necessary, and guidance was provided for any rater who did so. The complete set of instructions provided to raters is appended as an additional file (see Additional file 1).

We distributed the subsets of articles as soon as raters agreed to participate in the study, aiming for an even distribution of final ratings across subsets. We consulted the raters regarding the feasibility of a 6-week deadline for returning the filled-out scales but were flexible about this when necessary. Of the 60 raters who agreed to participate, five declined to participate further after receiving the materials for analysis, and 15 did not reply to subsequent attempts to contact them by email. A final total of 40 raters returned completed rating scales, constituting a response rate of 66.7%. The final numbers of raters scoring each subset of articles were as follows: subset 1–7 (eight raters), subset 8–14 (eight raters), subset 15–21 (seven raters), subset 22–28 (seven raters), and subset 29–32 (ten raters). Consequently, the data consisted of five blocks, each comprising the ratings of Na articles by Nr raters, where Na (number of articles) and Nr (number of raters) varied as indicated above. We chose to invite a large number of raters because we expected substantial variation in the scores given to the articles and wished ultimately to use mean scores as our primary measure.

Statistical analysis

We used SPSS version 23 (IBM, Armonk, NY, USA) and Excel version 16.13.1 (Microsoft, Redmond, WA, USA) for data analysis. The first step of the analysis involved averaging the item-specific scores for each article across all raters. The resulting article-level item scores were used as indicators of the article’s level of observed coverage of the items. Because an item that is not described cannot be observed as covered, “not described”/“not applicable” and missing answers were recoded as “strongly disagree” (= 1) in this aggregation. A resulting score < 3.00 thus indicated “little or no coverage observed.” In the following step, article-level subscale scores were obtained by averaging the corresponding item scores per subscale, thus providing indicators of an article’s level of coverage of Merrill’s First Principles (authenticity, activation of prior knowledge, demonstration, application, integration/transfer). The coverage of the subscales in the current sample of articles was explored by confirming a normal distribution and producing boxplots, M ± SD, and percentiles.
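For transparency, this aggregation can also be expressed as a short script. The following is a minimal sketch in Python/pandas, assuming a hypothetical long-format file ratings.csv with columns article, rater, subscale, item, and score (our actual analysis used SPSS and Excel); it reproduces the recoding and the two-step averaging described above.

```python
import pandas as pd

# Hypothetical input: one row per (article, rater, item) rating, with 'score'
# holding 1-5 or the strings "ND"/"NA" for "not described"/"not applicable".
ratings = pd.read_csv("ratings.csv")  # columns: article, rater, subscale, item, score

# Recode "not described"/"not applicable" and missing answers as
# "strongly disagree" (= 1).
ratings["score"] = pd.to_numeric(ratings["score"], errors="coerce").fillna(1)

# Step 1: article-level item scores, averaging each item over all raters.
item_scores = (ratings
               .groupby(["article", "subscale", "item"])["score"]
               .mean()
               .reset_index())

# Step 2: article-level subscale scores, averaging the item scores per subscale.
subscale_scores = (item_scores
                   .groupby(["article", "subscale"])["score"]
                   .mean()
                   .unstack("subscale"))

# Descriptive statistics (M, SD, percentiles) per subscale.
print(subscale_scores.describe())
```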

Generalizability theory [26] was applied to the original intra-article rater data in order to estimate the interrater reliability (IRR) for each of the five subscales. We calculated the generalizability coefficient (G) as an estimate of reliability. In terms of generalizability theory, each of the five blocks has a so-called a × r design (ratings of Na articles by Nr raters), and variance components (V) for article, rater, and article-rater interaction (Va, Vr, and Var, respectively) were obtained accordingly from each block of data. Taking the average of each component over the five blocks, a generalizability coefficient was calculated using the equation G = Va / (Va + Var/Nr), where Nr is the number of raters. The IRR is consequently higher for a block with more raters; in the case of our data (with unequal Nr over blocks), we thus obtain a range for the IRR over the five blocks. The IRR was calculated as indicated above for each of the five subscales. The resulting IRRs were interpreted by applying the classifications proposed by Hallgren (2012) for intraclass correlation coefficients (ICCs) measuring IRR (of which G is an example).
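The reliability estimation can likewise be sketched in code. The following Python sketch is an illustration rather than our actual SPSS/Excel computation, and the function names are ours: given one block of subscale scores as an articles × raters matrix, it derives the variance components from the classical two-way mean squares (crossed design without replication) and applies the equation G = Va / (Va + Var/Nr) given above.

```python
import numpy as np

def variance_components(block):
    """Variance components (Va, Vr, Var) for one a x r block,
    estimated from the two-way crossed design without replication."""
    x = np.asarray(block, dtype=float)  # shape: (n_articles, n_raters)
    na, nr = x.shape
    grand = x.mean()
    # Mean squares for articles, raters, and the article-rater interaction.
    ms_a = nr * ((x.mean(axis=1) - grand) ** 2).sum() / (na - 1)
    ms_r = na * ((x.mean(axis=0) - grand) ** 2).sum() / (nr - 1)
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + grand
    ms_ar = (resid ** 2).sum() / ((na - 1) * (nr - 1))
    # Expected-mean-square equations; interaction is confounded with error,
    # and negative estimates are truncated at zero.
    v_ar = ms_ar
    v_a = max((ms_a - ms_ar) / nr, 0.0)
    v_r = max((ms_r - ms_ar) / na, 0.0)
    return v_a, v_r, v_ar

def g_coefficient(v_a, v_ar, nr):
    """Relative generalizability coefficient: G = Va / (Va + Var/Nr)."""
    return v_a / (v_a + v_ar / nr)
```

Averaging the components over the five blocks and substituting each block’s Nr into g_coefficient then yields the range of IRRs reported per subscale.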

Results

Descriptive statistics for the subscale scores (5-point Likert) of the sample of articles (N=32) are shown in Table 2, which also provides the relative IRR (generalizability coefficient G) for each subscale.

Table 2 Subscale scores (5-point Likert) of articles (N=32) and relative interrater reliability (IRR) (generalizability coefficient G)

Further information on the selected articles is provided in Tables 3 and 4.

Table 3 Information on articles analyzed (author, year of publication, title, and brief description of the methodology)
Table 4 List of rated articles with full references

For all subscales, the mean scores were lower than 2.68, with more than 75% of the item scores below 3.04 and over 50% below 2.71. These findings indicate that the raters noted a paucity of description of adherence to evidence-based ID guidelines in the large majority of the PPH simulation training programs. For the authenticity, activation of prior knowledge, application, and integration/transfer subscales, the IRR varied between 0.68 and 0.88, which we considered to represent “good to excellent” agreement for the purposes of the present study. The IRR for the demonstration subscale was 0.56–0.65, which is not fully acceptable for our purposes.

Discussion

Our Likert-scale mean scores were below the neutral score of 3 for all subscales, indicating a pervasive lack of description of adherence to the main principles of evidence-based ID guidelines in simulation training for high-risk situations such as PPH. Our findings for four of the subscales (authenticity, integration/transfer, activation of prior knowledge, and application) are particularly noteworthy, as these subscales presented IRR values ranging from good to excellent. The lower IRR found for the demonstration subscale may be the result of incomplete or missing descriptions of the ID features relating to it.

The raters’ overall agreement on the lack of coverage of evidence-based ID guidelines for almost all subscales reveals a lack of adequate description of the use of relevant ID features in PPH simulation training. This raises concern regarding both the appropriate use of relevant ID features and the potentially detrimental effect of their absence on the transfer of learning. Proper description should follow reporting guidelines, and such guidelines should, also for the sake of transfer of learning, present the key elements of evidence-based ID guidelines in detail [27,28,29].

We can only speculate as to the reasons underlying this paucity of adequate description of the use of evidence-based ID guidelines by those who promote simulation. The large body of sound evidence on the potentially detrimental effects on learning and transfer of learning when ID guidelines are not properly taken into account makes it unlikely that this finding can be attributed to a lack of awareness of the issue [7, 18, 30]. Moreover, evidence of positive learning and transfer outcomes when instructional approaches adhere to evidence-based ID guidelines has been produced for areas of content beyond simulation training for high-risk situations, including evidence-based medicine and decision-making [9, 11, 12, 25].

While adequate use of evidence-based instructional features has been shown to be necessary for ensuring the effectiveness of various methods of instruction, including simulation training, faculty development is another crucial factor contributing to the success of simulation [28, 31, 32]. The use of strategies to enhance awareness among faculty members with regard to incorporating innovative designs has thus been acknowledged to contribute to better simulation outcomes and should be promoted [33]. We, therefore, believe that faculty development could further raise awareness regarding the benefits of adequately using and describing the use of relevant evidence-based ID guidelines for effective simulation training outcomes.

One practical implication of our findings may be to recommend the use of a checklist of ID features, based on the items in our rating scale (Table 1), when designing simulation training. Most likely, some adjustments would have to be made, such as varying the number of cases insofar as feasible to accommodate budget and time constraints. Such a checklist would also probably require some tailoring before being applied to simulation training formats with specific goals (e.g., mastery learning). We are aware, however, that such adjustments and tailoring may be particularly challenging, and this may explain the lack of description found.

Our concern with these findings regarding a general tendency not to report adherence to evidence-based ID guidelines is underlined by the results for specific items from the rating scale. For instance, it is worth drawing special attention to the items from the authenticity subscale that specifically refer to exposure to variability, with phrases such as “scenarios differ from each other to the same extent as real-life tasks” and “scenarios are sequenced from simple to complex.” A potential lack of exposure to multiple scenarios may seriously jeopardize simulation training for high-risk situations, since it undermines a core complex learning principle for achieving transfer: exposure to broad clinical variation [15, 34]. When managing a complex high-risk situation such as PPH, healthcare professionals should be able to apply a systematic approach to problem solving so as to properly manage the clinical conditions present. If this ability is to be adequately developed, it relies heavily on exposure to clinical variation [16, 35,36,37].

The influence of the various ID elements described in the rating scale items of each of the subscales (authenticity, activation of prior knowledge, demonstration, application, and integration/transfer) on learning and transfer of learning has frequently been demonstrated [14, 15, 19, 35, 37]. Therefore, even for a subscale with an IRR considered “fair,” such as demonstration, possible neglect of some of its instructional features may compromise the effectiveness of simulation training. For instance, failure to demonstrate the skills to be learned, as highlighted in the items “trainees are given demonstrations of the skills and/or models of the behaviors they are expected to learn” and “trainees receive multiple demonstrations that represent alternative ways of performing the skills that need to be learned,” may also significantly hinder the complex learning and transfer of learning essential for the proper management of high-risk situations, such as postpartum hemorrhage [15, 19, 38].

Our overall findings provide further support for concerns previously raised by systematic reviews regarding simulation training effectiveness and the lack of use of evidence-based ID guidelines [7, 18]. We consider the large number of articles identified and included in our analysis an important strength of our study. It is also worth noting the large number of more recent studies included, which demonstrates growing interest in training healthcare providers for high-risk situations such as PPH [39]. Our findings, however, indicate that even recent simulation studies neglect to describe the use of evidence-based ID guidelines, and it is thus reasonable to infer that they did not use them. This may significantly compromise learning and transfer of learning. Furthermore, such studies indicate a worrying lack of awareness regarding these ID guidelines on the part of those who design such simulation training [1, 40].

We acknowledge that some of the articles analyzed did report adherence to evidence-based ID guidelines in the PPH simulation training described. However, our strategy of using mean scores per subscale may have led to the instructional strengths of some of the simulation trainings being overlooked, and this should be considered a limitation of our study. The analysis of a single simulation training content area (i.e., PPH) may also be seen as a limitation, notwithstanding the high epidemiological prevalence of PPH and its similarity to other high-risk situations. Moreover, training formats that focus more on the deliberate practice of routine aspects of a task (such as Rapid Cycle Deliberate Practice [37]) and less on whole-task practice may require a different set of instructional design guidelines. In terms of the selection of raters, our goal of achieving an adequate number of participants may have led to the inclusion of raters who did not strictly meet Ericsson’s [41, 42] criteria for being considered an expert.

Furthermore, some of our experts may have lacked the PPH knowledge necessary to assess content-related design principles (e.g., authenticity). Finally, our aggregation protocol, which recoded “not described”/“not applicable” answers as “strongly disagree,” may also be seen as a study limitation. We nevertheless believe it is justifiable to consider the lack of reporting of the use of evidence-based ID guidelines as indicating potential disregard of the importance of such guidelines.

Future studies of the use of evidence-based instructional design guidelines in healthcare simulation should include a larger number of content areas for analysis and aim to identify the instructional strengths of specific simulation trainings described in the literature. Likewise, exploratory research designs may contribute to a better understanding of the reasons for the current shortcomings in the adequate description of ID guideline features. The use of alternative rating strategies may also improve interrater reliability.

Conclusion

In conclusion, we highlight the overall paucity of descriptions of the use of evidence-based ID guidelines in simulation training programs for high-risk situations, such as PPH. Encouraging faculty to further promote adequate use and description of these guidelines, particularly when reporting data regarding simulation training programs, may help to improve simulation training effectiveness and transfer of learning.