INTRODUCTION

Challenges in providing sufficient clinical instruction and supervision of invasive procedures have led to an increased use of supplementary multimedia sources, such as instructional videos.1,2 Indeed, free online video-sharing websites have become increasingly popular among medical students and residents.3 This trend will likely continue as video recording equipment and editing software become less expensive, and anyone will easily be able to produce videos and distribute them on the internet. Concerns have been raised regarding this widespread production and use of videos. First, studies assessing videos available on YouTube have identified generally low-quality and even dangerously misleading content.4,5 Second, video design often fails to address pedagogical effectiveness,6 and the principles underlying the design choices are rarely described in studies. Third, studies evaluating the effectiveness of video instruction in preparation for performing an invasive procedure have demonstrated conflicting results.7,8,9,10 The Association of American Medical Colleges (AAMC) Institute for Improving Medical Education reported that research in educational technology is characterized by deficiencies in either methodological approach, conceptual framework, or both.11 That report recommended that researchers investigate the selection, sequencing and presentation of information in instructional videos.

The formats of preparatory interventions (PIs) based on text and video are by definition a one-way presentation of information, without provision of feedback. These limitations underscore the need for accurate and unambiguous choices about the content presented. PIs are often produced by content experts. However, because performing the procedure has become so routine for these experts, they may omit key elements that are essential for novice learners, thus compromising the effectiveness of the intervention for student learning.12

The challenges with respect to pedagogical design can be addressed by two evidence-based principles for optimizing the pedagogical potential of the PI for clinical procedures. First, preparatory interventions including goal orientation have been found to be effective.13 Models of goal setting broadly distinguish two strategies: a process-goal orientation focusing on the strategies used to complete a task, and outcome-goal orientation focusing on the product or end results.14 Presenting task-specific process goals to novices prior to their performance of a task is reported to be more effective than presenting outcome-specific goals,15 which indicates that the use of task-specific process goals may optimize the pedagogical potential of instructional videos intended for invasive procedure preparation.

Second, the content incorporated in the PI should present only the knowledge needed to improve novices’ performance16 and should avoid presentation of extraneous information.17

Lumbar puncture (LP) is a complex procedure18 which is crucial for the diagnosis of serious medical conditions, and frequently must be performed on an urgent basis.19 Junior doctors express uncertainty about performing the LP,20,21 and perform below stakeholder expectations.22 As residents often use video and text to prepare for performing procedures on patients, the effectiveness of different formats should be investigated. Although instructional videos have been reported to improve operator self-confidence,7,23 studies of the correlation between self-confidence and actual clinical performance have produced conflicting results.7,23 Hence, there is a need to explore how different instructional designs affect novice self-confidence, and how their confidence correlates with performance. We hypothesized that the performance of participants who prepared using a goal-setting/learner-centered video would be superior to that of groups using a traditionally designed video or instructional text. The aims of this study were to investigate the effect of three different formats of preparatory instructional material (a video including goal-setting and learner-centered information, a traditionally designed video, and written instructional text) on novice doctors’ LP performance and self-confidence in a simulated clinical setting, and to determine whether the different interventions influenced the relation between self-confidence and performance.

METHODS

Study Design

This study was an interventional, single-blinded randomized controlled trial with three arms. A CONSORT diagram is provided in Figure 1.

Figure 1 CONSORT flow diagram of the study.

Participants

We recruited postgraduate-year 1 (PGY-1) doctors with less than 6 months of postgraduate clinical experience and with no previous experience in performing LP.

Participants were recruited over a 12-month period by email invitation to new graduates from the medical school at the University of Copenhagen and to junior doctors enrolled in first-year postgraduate training in one of the ten hospitals in eastern Denmark. As part of the enrollment, participants used a web-based portal to provide demographic information and to book a date for participation. Throughout the inclusion period, as participants approached their designated date, they were cluster-randomized by LK upon enrollment using Random.org.24

Interventions

For the study, we designed two instructional videos and a written instructional text. As participants in our study were non-English speaking, we considered it fairest to design both videos in the local language and according to local standards for performing LP. Participants prepared individually with one of the interventions for 15 min. Participants were allowed to repeat the video at their discretion within the time limit.

Goal- and Learner-Centered Video (GLV)

The design of the goal- and learner-centered video (GLV) was based on a previous study of learner-requested information and the differences between experts and novices with respect to their approach to performing LP.25 The results demonstrated that expert performers contextualized the procedure by integrating both the technical and non-technical aspects related to the patient, the clinical surroundings, the equipment, and the doctors’ own preparedness to perform. By contrast, novices expressed a fear of causing damage and focused on the outcome goal of getting the sample,25,26 and they requested information about anatomy related to needle insertion.

Based on these results, MJVH, YS and CR designed the video, structured as follows: anatomy and needle insertion, process goals for the procedure, attention to positioning of the patient and communication during the procedure, equipment, the procedure, and problem-solving. The video was 7 min long.

Traditional Video (TV)

The design of the traditional video was inspired by the content and structure of a frequently cited instructional video for the LP procedure from The New England Journal of Medicine (NEJM).27 The NEJM video focuses on the technical aspects of performing the procedure and does not address any process goals.

We developed the TV to include the following elements: indications, contraindications, equipment, positioning, the procedure, and anatomy. No elements of patient communication or strategy for performing the procedure were presented. This video was six minutes long.

The difference between process goals and outcome goals in video design is exemplified by how each video presents the identification of the needle insertion point. The TV instructs the learner to identify the L4 level and subsequently the interspaces between the superior and inferior vertebrae. In contrast, the GLV instructs the learner to palpate the iliac crest, and the L4 level is then identified by bringing the thumbs together. The optimal needle insertion point is identified by palpating the spinous processes in a caudal to sacral direction, maintaining awareness of the patient’s back and the spaces between the vertebrae.

Written Text (WT)

The written text described the steps for performing the procedure with three illustrations,28 and was six pages long (pocket size).

Assessment of Self-Confidence

After the preparatory intervention, the participants rated their self-confidence on a seven-point Likert scale, based on the following question: “How confident do you feel in the procedure you are about to perform?” (1 = very unconfident to 7 = very confident).

Study Setting

We established a hybrid simulated setting. The participants performed the LP individually and were instructed to act as they would in a real clinical setting, including engaging with the patient. We chose to limit participant performance to a single attempt of the procedure to mimic the clinical context where trainees access instructional materials and subsequently perform the procedure on a patient. We provided the participants with authentic medical records including laboratory results and computed tomography (CT) brain results. We created a ward-like setting including a standardized patient (SP), a standardized assistant, and an LP phantom (Kyoto Kagaku Lumbar Puncture Simulator II; Kyoto Kagaku Co., Ltd., Kyoto, Japan). The phantom simulated the lumbar anatomy including landmarks for palpation, lifelike skin and tissue resistance, and the possibility of obtaining a fluid sample. The SP was instructed to express moderate fear and to act uninformed about the procedure. The phantom was introduced once participants had positioned the SP and marked their insertion point(s) on her back. The procedure continued with the SP having the phantom strapped to her back, and thus communication continued during the remainder of the procedure. Participants who failed to obtain fluid had to inform the SP and terminate the session.

Assessment

For assessment of LP performance we used the previously developed Lumbar Puncture Assessment Tool (LumPAT), which has demonstrated acceptable inter-rater reliability and positive correlation with other measures of procedural performance in a similar population of residents.25 The LumPAT is a global rating scale that assesses both technical aspects of performance and non-technical aspects, such as patient communication. The LumPAT scale ranges from 11 to 55 points, with a previously established standard of 44 points required to pass (see online appendix for the LumPAT). In addition, the raters assigned a global judgment score for the entire performance based on a seven-point Likert scale. All procedures were video-recorded. The anonymized videos, as well as the assessment tool, were distributed to the two raters through a web-based system.29 The raters were content experts—consultants and associate professors in anesthesia (RVBJ) and neurology (HT). We further documented whether participants were successful in obtaining liquid from the mannequin.

Statistical Analysis

All continuous variables were evaluated for normal distribution. Normally distributed variables were reported as a mean and standard deviation (SD), and were tested using parametric tests. Non-normally distributed variables were reported with median and interquartile range (IQR), and tested using non-parametric tests.

We compared baseline characteristics among the three groups. Age was compared using one-way analysis of variance (ANOVA), and gender distribution was compared using the chi-square test. Due to a lack of normal distribution, the number of observed LP procedures was compared using the Kruskal–Wallis test.

Inter-rater agreement was evaluated with the intraclass correlation coefficient (absolute agreement definition).
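The absolute-agreement intraclass correlation can be illustrated with a minimal sketch. The text does not state which ICC model or form was selected in SPSS, so the two-way, single-measures form used here (often written ICC(A,1)), the function name `icc_absolute_agreement`, and the example ratings are all assumptions made for illustration and are not study data.

```python
def icc_absolute_agreement(ratings):
    """ICC(A,1): two-way model, absolute agreement, single measures.

    `ratings` is a list of rows, one row per participant,
    each row holding one score per rater.
    """
    n = len(ratings)        # number of participants
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical LumPAT totals from two raters (invented for illustration).
example_ratings = [[40, 42], [45, 47], [50, 52]]
example_icc = icc_absolute_agreement(example_ratings)
```

In this invented example the second rater always scores two points higher than the first. A consistency-type ICC would ignore that offset, whereas the absolute-agreement form penalizes it, which is why the definition chosen matters when two raters' scores are later averaged.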

To explore the effect of the three different PIs, we compared performance scores among the three groups using the mean of the raters’ scores on all the LumPAT items, and subsequently the mean of the raters’ global ratings on the seven-point Likert scale, using one-way ANOVA. As post hoc tests to identify differences between groups, we used independent-samples t tests. Effect sizes for differences between groups were calculated using Cohen’s d, based on mean values for LumPAT items and global ratings.
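As a rough illustration of the effect-size calculation, the sketch below computes Cohen's d for two independent groups. It assumes the pooled-standard-deviation variant, which the text does not specify; the function name `cohens_d` and the scores are hypothetical and are not study data.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical mean LumPAT totals for two groups (invented for illustration).
glv_scores = [44, 46, 41, 48, 45, 43]
wt_scores = [40, 42, 38, 44, 41, 39]
d = cohens_d(glv_scores, wt_scores)
```

By Cohen's conventional benchmarks, values of d around 0.2, 0.5 and 0.8 are read as small, medium and large effects.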

The chi-square test was used to compare the proportions of participants who succeeded in obtaining liquid among the groups. To assess the relative effect of each format on passing the established standard of the LumPAT, we defined WT as the reference and calculated the relative risk (RR) of a passing score for the other groups, tested using a chi-square test.
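The relative-risk comparison against a reference group can be sketched as follows. The pass counts are hypothetical and deliberately not the study's, the function names are invented for this example, and the Pearson chi-square is computed without a continuity correction, which is an assumption since the text does not say whether one was applied.

```python
import math


def relative_risk(pass_group, n_group, pass_reference, n_reference):
    """Ratio of the pass proportion in a group to that in the reference group."""
    return (pass_group / n_group) / (pass_reference / n_reference)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: 12 of 30 pass in the video group, 6 of 30 in the reference.
rr = relative_risk(12, 30, 6, 30)
chi2 = chi_square_2x2(12, 18, 6, 24)  # rows: group; columns: pass, fail
p = p_value_one_df(chi2)
```

An RR above 1 means the passing proportion is higher than in the reference group; the chi-square p value indicates whether that difference is distinguishable from chance at the chosen significance level.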

Self-confidence scores were not normally distributed, and thus we used nonparametric tests. To explore whether age and the number of observed LP procedures correlated with self-confidence scores, we used Spearman’s correlation. Self-confidence scores were compared by gender using the Mann–Whitney U test. Differences in self-confidence among groups were compared using the Kruskal–Wallis test. Individual groups were compared with one another using the Mann–Whitney U test.
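For readers unfamiliar with the Kruskal–Wallis test, the following minimal sketch computes the H statistic with the usual tie correction for ordinal ratings. The function names and the ratings are hypothetical and not study data, and the p-value helper relies on the large-sample chi-square approximation with two degrees of freedom, so it applies only to a three-group comparison.

```python
import math
from collections import Counter


def midrank(value, pooled):
    """1-based rank of `value` in `pooled`; ties receive the mean of their positions."""
    less = sum(1 for v in pooled if v < value)
    equal = sum(1 for v in pooled if v == value)
    return less + (equal + 1) / 2


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [v for group in groups for v in group]
    n_total = len(pooled)
    weighted = sum(
        sum(midrank(v, pooled) for v in group) ** 2 / len(group)
        for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(count ** 3 - count for count in Counter(pooled).values())
    return h / (1 - tie_term / (n_total ** 3 - n_total))


def p_value_three_groups(h):
    """Chi-square upper tail with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical seven-point self-confidence ratings (invented for illustration).
glv = [4, 5, 5, 6, 3]
tv = [5, 6, 6, 7, 5]
wt = [3, 4, 2, 4, 3]
h_stat = kruskal_wallis_h(glv, tv, wt)
p_approx = p_value_three_groups(h_stat)
```

A significant H only indicates that at least one group differs; pairwise follow-up tests, such as the Mann–Whitney U test named above, are needed to locate the difference.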

We used Spearman’s correlation to explore the correlation between self-confidence scores and both the mean LumPAT score of the two raters and the mean of the raters’ global judgment scores for each of the three groups.
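Spearman's correlation is the Pearson correlation computed on ranks, which the sketch below makes explicit. The function names and the paired values are hypothetical and not study data; tied ratings, which are common on a seven-point scale, are given the mean of their rank positions.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    return [
        sum(1 for v in values if v < x) + (sum(1 for v in values if v == x) + 1) / 2
        for x in values
    ]


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the midranks of x and y."""
    rank_x, rank_y = midranks(x), midranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rank_x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in rank_y))
    return covariance / (spread_x * spread_y)


# Hypothetical self-confidence ratings paired with mean LumPAT scores.
confidence = [3, 5, 4, 6, 5, 2]
lumpat_mean = [41.0, 39.5, 44.0, 42.5, 45.0, 40.0]
rho = spearman_rho(confidence, lumpat_mean)
```

Computing rho separately within each group, as described above, avoids mixing between-group differences in confidence with the within-group relation between confidence and performance.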

Statistical analysis was performed using SPSS version 22.0 software (IBM Corp., Armonk, NY, USA).

Ethics

This study involved no patient participation or collection of biological material. The project was presented to the local ethics committee of Capital Region Denmark, which waived the need for further approval (journal number: H-15018242). All participants were informed verbally and in writing of the purpose of the study, that participation was voluntary and entirely anonymous, and that it would not affect their future clinical appointments. All participants provided written informed consent.

RESULTS

A total of 110 participants completed the study. Data on baseline characteristics of participants are presented in Table 1, and demonstrate no differences among the three groups.

Table 1 Participant Characteristics

The intraclass correlation coefficient was 0.71 for the LumPAT mean score and 0.69 for the raters’ global ratings.

There were significant differences in the LumPAT mean scores among the three study groups (p = 0.01; Table 2). Post hoc analysis demonstrated that the GLV group had a significantly higher mean LumPAT score than the WT group, with a medium effect size (d = 0.73, p = 0.02). There was also a significant difference among the global ratings of the three groups (p = 0.026). Post hoc analysis demonstrated that the GLV group scored higher than both the TV and WT groups.

Table 2 Results of Performance Tests for LumPAT Score, Rater Global Judgment, Participant Self-Confidence Scores and Number of Participants Successfully Obtaining Liquid

There was a significant difference in self-confidence scores among the three groups (p = 0.003). Post hoc analysis demonstrated that the TV group had a significantly higher self-confidence score than the WT group, with a large effect size (d = 0.85, p = 0.002). The GLV group had a higher median score than the WT group (p = 0.046).

The percentage of participants who successfully obtained liquid was highest for the GLV group and lowest for the TV group, but the difference was non-significant (see Tables 2 and 3).

Table 3 Results of Post Hoc Test Demonstrating the Differences Between Groups in the LumPAT Scores, Rater Global Judgment and Participant Self-Confidence Scores

The relative effect of the different formats revealed that the GLV group was 2.6 times as likely to obtain a passing score as the WT group (p = 0.02). There was no difference between TV and WT (see Table 4).

Table 4 Consequences of Different Formats of Pre-Training in Meeting the Standard for Lumbar Puncture

There were no significant associations between participants’ self-confidence scores and age (rho = −0.02; p = 0.85; Spearman’s correlation), number of previously observed LP procedures (rho = 0.07; p = 0.44; Spearman’s correlation), or gender (p = 0.22; Mann–Whitney U test).

The correlations between self-confidence scores and mean LumPAT scores were non-significant for all three groups: GLV group correlation coefficient = 0.14, p = 0.46; TV group correlation coefficient = 0.015, p = 0.93; WT group correlation coefficient = 0.19, p = 0.27. The correlations between self-confidence scores and mean rater global judgment scores were non-significant for all three groups: GLV group correlation coefficient = 0.19, p = 0.31; TV group correlation coefficient = −0.12, p = 0.43; WT group correlation coefficient = 0.05, p = 0.77.

DISCUSSION

This study demonstrates that a video based on a learner-centered design, including a demonstration of procedure-specific process goals, significantly improved immediate performance of a simulated LP compared to written text instruction. However, such benefits were not found for a video based on a traditional design. We further demonstrated that learners’ self-confidence did not correlate with their performance, irrespective of the format used for preparation. Participants preparing with the GLV were 2.6 times as likely as those using the written text to meet the established standard for readiness to perform the procedure in clinical practice. As instructional videos are often used as just-in-time preparation for clinical procedures, the present study suggests that doctors using the GLV would be better prepared for performing standard accepted LP techniques and patient-centered care than those using traditionally designed videos or text instructional material.

The strength of this study is that participants were residents with limited procedural experience, as this group frequently uses either text- or video-based instruction as preparation for performing clinical procedures.30 The study demonstrated sufficient inter-rater reliability31 and used an assessment tool specific to the procedure with an established standard for readiness to perform the procedure independently.25

The outcomes for the GLV group were significantly superior to those for the WT group, whereas those of the TV group were not. Previous studies on the effect of videos have not provided rich descriptions of the choices made when determining how the content of their videos was delivered. Our study shows that reporting such details is necessary, as these design and content decisions can have a significant impact on participant performance. These factors, in turn, may explain the conflicting results among previous studies exploring the effect of instructional videos.7,8,9,10

The significant difference in rater judgment between the GLV and TV groups suggests that important clinical contextualization can be provided by the instructional video format. The integration of non-technical and technical aspects of a procedure has been found to be beneficial for simulation-based training.32 In our study, this contextualization, along with the presentation of the process goals, may explain the 2.6-fold greater chance of meeting the established performance standard with the GLV versus WT approach. However, we found no significant difference in the ability to obtain liquid.

The identification of a significant difference in self-confidence scores between the TV and WT groups, but not between the GLV and WT groups, demonstrates that the design and content of a video influence learners’ self-confidence, but that this self-confidence does not correlate with performance. Several studies have described a lack of congruence between objective performance scores obtained from simulation-based assessment and self-ratings of confidence,33,34 including among residents performing infant LP.7

There are several clinical implications of this study. We find that the concept of presenting procedure-specific process goals can be easily integrated into the design of instructional videos, and we therefore recommend this approach. This additional effort is believed to be cost-effective, as the learners subsequently have a greater chance of achieving the standard for performing the procedure. Further, we recommend that developers of instructional videos examine differences in procedure-related perceptions and strategies for performance between expert and novice performers. Identifying this gap is important, because content requests from novices frequently differ from those of experts. As a result, there is a risk that content experts could unintentionally overlook important needs in their teaching strategies for clinical procedures.12 Another implication of the study is that a video developer’s evaluation of a PI should not rely solely on learner self-rating of confidence, but rather on objective assessment of performance.

There are some notable limitations to the study and in the interpretation of the results. The differences between the GLV and TV groups in LumPAT scores and in the percentage successfully obtaining liquid were non-significant, which could be due to a type II error. It was not possible to perform a power calculation for the present study, but we felt that the sample sizes were large enough that any significant differences would be clinically relevant. Novice performers, such as those in the present study, tend to have a high level of variance in their performance,35 and therefore an additional study limitation is that we evaluated only a single performance after the intervention. Further, the long-term effect of the different formats was not explored with a retention test. Nonetheless, studies comparing the instructional design of outcome versus process goals for simulation-based training found no differences between groups for 1-week retention,36 and no differences were found between post-training and a 6-week retention test for dyad versus solo training.37 The GLV included presentation of procedure-specific process goals and learner-centered information, making it difficult to discern the specific mechanism for the observed effect. However, our aim was to optimize the pedagogical effectiveness of PIs, and hence the GLV should be considered as a single entity. It is also important to note that this study did not explore the effect of the intervention on clinical performance. However, this limitation is mitigated by the fact that the LumPAT was previously developed and validated for assessing residents’ readiness to perform LP independently, and the established standard reflects this aim.25 Moreover, meta-analysis of the correlation between simulation-based assessments and patient-related outcomes has shown that these measures correlate well if based on rigorously established assessment tools.38

Conclusion

We found that an instructional video with a learner-centered approach and procedure-specific process goals was associated with significantly better participant performance in comparison to written instructions. A video based on a traditional design was associated with less benefit with respect to participant performance, but corresponded to higher participant self-confidence than a written text intervention. No correlation was found between self-confidence scores and objective performance scores in any of the groups. More rigorous evaluation of preparatory interventions for invasive procedures should be considered in order to optimize patient safety.