Direct laryngoscopy (DL) and tracheal intubation are essential but complex clinical skills. New residents typically show a rapid improvement during the first 20 attempts. Nevertheless, a mean of 45-57 attempts is required to reach a 90% success rate,1-3 and some residents may not achieve an 80% level of competence despite 100 attempts.4 Accordingly, non-anesthesia trainees rarely have the opportunity to perform the requisite intubations. At our university, medical students are required to perform “assisted direct laryngoscopy” as part of their two-week anesthesia rotation and thus may have insufficient opportunity to acquire skill in DL.5 This situation presents an opportunity to attempt to accelerate skill acquisition.

Several factors contribute to the limited opportunities for acquiring skill in DL, including other training and experiential expectations, e.g., venous cannulation skills, production pressure in the operating room, patient safety, and wider subspecialty exposure. When DL is being performed, the instructor may assume control and sacrifice a teaching opportunity due to uncertainty about the student’s progress. Since the instructor cannot see what the student sees when using DL, it is difficult to provide meaningful feedback. Moreover, if feedback is of limited value and presented at a stressful time for the patient, student, and instructor, the educational value is questionable.

Prior to this study, we observed that many students with limited experience performed successful intubations with a video laryngoscope (VL). Some VLs require a different technique from that of DL, e.g., midline insertion and manipulation of a styletted tracheal tube. Since our training program presumes students will acquire skills in DL, we explored whether Macintosh-style VLs (MacVL), using direct viewing, might provide an opportunity to accelerate DL skill acquisition. The GlideScope Direct® (Verathon Medical, Bothell, WA, USA) and C-MAC® video laryngoscope (Karl Storz Endoskope, Tuttlingen, Germany) resemble the Macintosh laryngoscope and can be viewed directly or indirectly on a monitor. They appear to increase the success rate of students performing DL while the instructor views the progress on the monitor.6,7 This enhanced visualization using the monitor allows the instructor to provide real-time feedback, which is more meaningful and may accelerate skill acquisition. Others have shown that such feedback has a positive impact on subsequent tracheal intubation in mannequins8 and patients.9 In addition, the laryngoscopy can be video recorded for later review at a time and frequency more conducive to learning.

The use of video review in medical education and skills assessment is increasing;10 however, specific investigations in the context of teaching DL would be beneficial. From the educational perspective, a video review can be understood as reflection-on-action and enhancement of self-evaluation.11 Analysis of video recordings offers a strategy for future practice, completing Gibbs’ reflective cycle for enhanced self-directed lifelong learning.12 Recordings increase self-awareness and the development of critical thinking13 alongside self-assessment skills.14-17 We adopted a similar approach when we considered the value of students’ independent self-review of their own MacVL video recordings.

There is a lack of studies comparing the impact of teaching DL to novices using MacVL on actual patients vs using traditional teaching methods. Furthermore, we do not know what impact an independent self-review of tracheal intubation attempts with MacVL may have on learning DL. This pilot study was conducted to assess the feasibility of conducting a larger multicentre randomized-controlled trial (RCT). The objectives of the pilot study were to establish participant recruitment and retention rates, to determine the feasibility of data collection and equipment availability, and to ascertain the magnitude of differences in DL skills achieved in different groups. We then intended to use the results of the pilot study to estimate sample size for a more definitive study.

We expect the future RCT to test the hypothesis that teaching with a Macintosh-style video laryngoscope (MacVL) vs conventional DL instruction during a training phase improves DL performance during a testing phase. We aimed to make a further comparison between this result (i.e., MacVL vs DL) and that of a third group randomized to receive MacVL training and to self-review video recordings of their performance. We hypothesized that the trainer’s real-time verbal feedback while viewing the monitor would result in trainees acquiring DL skill earlier than with conventional DL—as reflected in shorter times to perform intubations. Furthermore, we assumed that students who were provided with recordings of their laryngoscopies would acquire even greater benefit.

Methods

Participants

Research ethics board (REB) approval was granted in November 2013. The REB waived the need for patient consent, as the students would be performing supervised DL and tracheal intubations on these patients as part of the regular learning objectives—if deemed appropriate by the anesthesiologist and irrespective of their participation in the study. Medical students at the University of Toronto have a mandatory two-week rotation in anesthesia during their third year. Fifty to 60 students rotate through the anesthesia departments at the Toronto General Hospital and Toronto Western Hospital during an academic year. Most of these students have no prior experience with DL and represent an ideal study group of novices.

All third-year medical students rotating through our departments from January 2014 to August 2015 were invited to participate by e-mail, and their written consent was required for enrolment. The inclusion criterion was no prior experience performing tracheal intubations in patients or mannequins.

Study design

This pilot study had two parts, each lasting one week (Figure). For the first (TRAINING) week, a computer algorithm was used to randomize the students into one of the following three groups:

1. CONTROL group - clinical training on patients using a conventional Macintosh direct laryngoscope (DL group);

2. VL-1 group - clinical training on patients using a GlideScope Direct® VL or Storz C-MAC® #3 blade with real-time verbal feedback;

3. VL-2 group - clinical training on patients using a GlideScope Direct VL or Storz C-MAC #3 blade with real-time verbal feedback plus a video recording of the laryngoscopy for their self-review.

Figure. Flowchart of the study
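The report states only that a computer algorithm performed the randomization; the algorithm itself is not described. As an illustration, a balanced allocation of 60 students to three groups of 20 could be sketched as follows. The function name, student identifiers, and seed are hypothetical, and the sketch allocates all students at once, whereas enrolment in the study was spread over many months.

```python
import random
from collections import Counter

GROUPS = ["CONTROL", "VL-1", "VL-2"]

def allocate(student_ids, groups=GROUPS, seed=None):
    """Balanced random allocation (illustrative only, not the study's algorithm).

    Builds one group label per student, cycling through the groups so that
    group sizes differ by at most one, then shuffles the labels.
    """
    rng = random.Random(seed)
    labels = [groups[i % len(groups)] for i in range(len(student_ids))]
    rng.shuffle(labels)
    return dict(zip(student_ids, labels))

# Hypothetical identifiers S01..S60 and an arbitrary seed for reproducibility.
allocation = allocate([f"S{i:02d}" for i in range(1, 61)], seed=2014)
group_sizes = Counter(allocation.values())
```

With 60 students, each of the three groups receives exactly 20, matching the stated allocation target.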

We chose a convenience sample (i.e., 60 students) large enough to provide useful information about feasibility and recruitment and to sustain what would likely be a lengthy study with a high number of participants.18 We anticipated that 18 months would be required, allowing for incomplete recruitment and attrition. We aimed to allocate 20 students to each group. The GlideScope Direct VL and Storz C-MAC #3 blade were considered equivalent devices based on the authors’ untested clinical impression. These devices were used interchangeably based on availability.

Prior to their rotation, all medical students were expected to read an anesthesia manual with a chapter on airway management. On day 2 of the TRAINING week, a high-fidelity simulation day was provided to all students irrespective of their participation in the study. The simulation included DL and tracheal intubation in a mannequin. Airway education in the operating room (OR) was left to the supervising anesthesiologist and was not formalized or structured.

Students attempted to perform DL regardless of the group to which they were assigned. In the control (DL) group, feedback was based on the students’ description of what they were seeing and the instructors’ view over the shoulders of their students. In the two VL groups, the monitors were visible only to the supervising anesthesiologists, and feedback was based on the view on the monitor. Anonymized video recordings of the intubation attempts were made in the VL-2 group and provided to students on a flash drive for self-review. These students documented the number of times they viewed each recording.

During the second (TESTING) week of the rotation, all three groups were tested using a conventional Macintosh direct laryngoscope. The following data were recorded: patients’ sex, age, weight, height, and body mass index; time to intubate (measured from insertion of the laryngoscope into the patient’s mouth until a trace of end-tidal carbon dioxide appeared on the monitor); intubation success; and immediate complications. After attempting tracheal intubation, the student and supervising anesthesiologist confidentially documented their subjective confidence in the student’s ability to perform DL using a visual analogue scale from 0 (not confident) to 100 (completely confident). Although the anesthesiologist could intervene if patient safety was compromised, two intubating attempts were permitted. The supervising anesthesiologist timed each attempt, whether successful or unsuccessful. To arrive at a single number approximating the efficiency of intubation, time to intubate was considered the total time of up to two intubation attempts. If the student’s first attempt was successful, this was regarded as time to intubate. Total times to intubate were then averaged across all participants in the group.
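The definition of time to intubate given above can be expressed as a short Python sketch. The function names and the example times are hypothetical. The sketch does not settle how intubations with two failed attempts were treated, which the text does not specify.

```python
from statistics import mean

def time_to_intubate(attempt_times_sec):
    """Total time (seconds) over the one or two timed attempts of one intubation."""
    if not 1 <= len(attempt_times_sec) <= 2:
        raise ValueError("expected one or two timed attempts")
    return sum(attempt_times_sec)

def group_mean_time(intubations):
    """Average of the total times across all intubations recorded for a group."""
    return mean(time_to_intubate(attempts) for attempts in intubations)

# Hypothetical data: first-attempt success, success on second attempt, first-attempt success.
example_intubations = [[52.0], [48.0, 40.0], [75.0]]
example_group_mean = group_mean_time(example_intubations)
```

In this invented example the second intubation contributes 88 seconds (48 + 40), because both attempts count toward the total.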

Blinding was not always possible, as the anesthesiologist in the TESTING week may have also supervised the student during the TRAINING week. During both weeks, patients were deemed suitable for student-performed DL based on the clinical discretion of the supervising anesthesiologist without specific inclusion or exclusion criteria related to preoperative airway evaluation. All members of the anesthesia departments were eligible to participate as trainers, were provided with the protocol, and verbally indicated their consent to participate. The anesthesia co-ordinator assigned students to ORs daily based on the presumed educational value. The anesthetic technique was chosen at the discretion of the supervising anesthesiologist without being influenced by participation in the study—patient safety was always prioritized over the needs of the study.

Objectives

The primary outcome measure was time to intubate during the TESTING week. The secondary outcome measures included 1) intubation success rate during the TESTING week, 2) number of intubating opportunities per student per week (individually for each week as well as the number of attempts in TESTING week), 3) complications of DL during the TESTING week, and 4) confidence scores. The statistical analysis was performed using Prism 5.0 (GraphPad Software Inc., La Jolla, CA, USA). One-way analysis of variance (with Bonferroni’s post hoc correction) was used for intubation times, number of intubating opportunities, and difference in students’ and supervising anesthesiologists’ confidence scores among groups. The chi-square test was used for success rates and complication rates. All reported P values are two-sided. Correlation between students’ and supervising anesthesiologists’ confidence scores was assessed by Spearman correlation. Observed differences in means between groups and standard deviations were used to calculate the number of participants needed for the future RCT.
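As an illustration of the one-way analysis of variance named above, the following minimal Python sketch computes the F statistic by hand, together with a Bonferroni-adjusted significance threshold for the three pairwise comparisons among three groups. The study's analysis was run in Prism 5.0, not in this code; the function names and the toy data are hypothetical and were chosen only so that the arithmetic is easy to check.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def bonferroni_alpha(alpha, n_groups):
    """Per-comparison threshold when all pairwise comparisons are tested."""
    n_comparisons = n_groups * (n_groups - 1) // 2
    return alpha / n_comparisons

# Toy data (not study data): three groups of three observations each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
adjusted_alpha = bonferroni_alpha(0.05, 3)
```

For the toy data the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13 on 2 and 6 degrees of freedom. Converting F to a P value requires the F distribution, for which a statistics package would normally be used.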

Specific feasibility outcomes for this pilot investigation were not established a priori. Instead, they were determined post hoc following an editorial review recommending their inclusion to comply with standard requirements for a pilot study and to allow a proper assessment of the pilot study findings. The authors considered a recruitment rate ≥ 75% and an attrition rate of < 20% to be consistent with feasibility. Based on approximately 60 students rotating through the anesthesia departments annually, this roughly equalled recruiting, on average, four students each month over 18 months and retaining at least three students in the study each month. We had not defined a requisite number of intubations during the training or testing weeks. Other aspects of feasibility were not defined in advance, including our ability to time or record laryngoscopy attempts or the availability of MacVLs during the study.

Results

Patients’ characteristics were similar amongst the groups (Table 1). Sixty-eight (78%) of the 87 consecutive medical students approached about participation provided written consent and enrolled in the study. Eight (12%) students withdrew from the study for personal reasons during the TRAINING week without contributing any data. Data are available for the remaining 60 students (Figure). These data satisfy the feasibility outcomes for recruitment rate and attrition rate. Data for time to intubate were recorded in 103/112 (92%) successful TESTING intubations across all three groups (37/37 Control; 25/29 VL-1; and 41/46 VL-2). Only 78/110 (71%) TRAINING intubations in the VL-2 group were video recorded, and these 78 recordings were viewed a combined total of 135 times. We did not record a single occurrence of a MacVL being unavailable for a TRAINING intubation in the VL groups (i.e., 100% equipment availability).
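The percentages reported above, and their comparison with the feasibility thresholds stated in the Methods (recruitment ≥ 75%, attrition < 20%), can be reproduced from the raw counts. The variable names below are illustrative; the counts are those given in the text.

```python
# Counts taken from the Results paragraph above.
approached, enrolled, withdrew = 87, 68, 8
timed, successful_testing = 103, 112          # TESTING intubations with a recorded time
recorded, vl2_training = 78, 110              # VL-2 TRAINING intubations video recorded
total_views = 135                             # combined viewings of the 78 recordings

recruitment_rate = enrolled / approached      # about 0.78
attrition_rate = withdrew / enrolled          # about 0.12
timing_capture_rate = timed / successful_testing
recording_rate = recorded / vl2_training
views_per_recording = total_views / recorded  # about 1.7

# Feasibility thresholds as stated in the Methods.
recruitment_ok = recruitment_rate >= 0.75
attrition_ok = attrition_rate < 0.20

# Per-group timed intubations should add up to the overall totals.
timed_by_group = {"Control": (37, 37), "VL-1": (25, 29), "VL-2": (41, 46)}
assert sum(t for t, _ in timed_by_group.values()) == timed
assert sum(n for _, n in timed_by_group.values()) == successful_testing
```

The per-group counts are internally consistent with the overall 103/112, and each recording was viewed fewer than twice on average.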

Table 1 Demographic data are shown as mean (SD) or as stated

Primary and secondary clinical outcomes are presented in Table 2. Table 3 details analysis of the primary outcome. We found a significant difference in the mean time to intubate between the Control and the two VL groups (Control, 91 sec; VL-1, 61 sec; VL-2, 66 sec; P = 0.018). There was no incremental benefit from video reviewing. None of the secondary outcome measures reached statistical significance, although we observed a trend in intubation success rates and complication rates favouring video laryngoscopy over conventional Macintosh direct laryngoscopy as a teaching method. Complications are presented in Table 4. The confidence scores expressed by the students did not correlate with those of the supervising anesthesiologists, and we did not observe any difference among groups.

Table 2 Results are shown as mean (SD) or as stated
Table 3 Bonferroni post hoc analysis, difference in means and 95% CI for primary outcome (time to intubate)
Table 4 Complications are shown as numbers

To calculate the sample size for the future RCT, we considered it clinically relevant to use the difference in means between the two groups of interest (Control vs VL-1). As results from pilot studies may be unreliable for sample size calculations, we exercised great caution in using this observed difference.19 Based on the observed difference in means (SD) in time to intubate between Control (DL) and VL-1 in our pilot study, for a probability of type I error (α) = 0.05 and Power (1-β) = 0.9, each group would require 95 students (i.e., total sample size = 190) in a future RCT comparing teaching with DL vs VL with real-time feedback.
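A calculation of this kind can be sketched with the standard normal-approximation formula for comparing two means, n = 2 × ((z for α/2 + z for power) × SD / Δ)² per group. The sketch below is an assumption about the general approach, not the authors' actual computation. The difference Δ = 30 sec is the observed Control vs VL-1 difference (91 vs 61 sec); the standard deviation of 60 sec is a hypothetical placeholder, because the observed SDs appear in Table 2 and are not reproduced in the text.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.9):
    """Per-group sample size for a two-sided comparison of two means.

    Uses the normal approximation with a common standard deviation;
    software based on the t distribution gives slightly larger numbers.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

observed_delta_sec = 91 - 61   # Control vs VL-1 mean time to intubate
hypothetical_sd_sec = 60       # placeholder only; see Table 2 for the observed SDs

n_example = n_per_group(observed_delta_sec, hypothetical_sd_sec)  # 85 with these inputs
```

The required number grows with the square of the SD, so the result is highly sensitive to the variability estimate taken from the pilot data, which is one reason pilot-based sample sizes call for caution.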

Discussion

This pilot study provided information regarding the value, feasibility, and required sample size for a future RCT. We observed adequate recruitment (78%) and withdrawal (12%) rates. The protocol was generally followed, with an acceptable proportion (92%) of intubation times recorded. The equipment was consistently available. No significant safety concerns were raised. Technical difficulties or omissions limited the recording rate of TRAINING intubations in the VL-2 group to 71%. The low video recording rate, compounded by fewer than two viewings per recording on average (self-reported data by students) and the apparent lack of incremental benefit, will likely result in this cohort being removed, and our subsequent study will be restricted to Control and VL-1 groups. Moreover, as the GlideScope Direct VL is no longer being manufactured, only the Storz C-MAC will be used in the VL group. Recording the confidence scores also proved unfeasible, as these tended to reflect success/failure in individual intubations, precluding any meaningful analysis.

For the sample size calculation for the future RCT, we concluded that each group would require 95 students (i.e., total sample size = 190). These targets are likely achievable in a similar time frame (two years) with the involvement of other teaching institutions. We need to ensure that these hospitals will be compatible with our protocol in terms of student characteristics, recruitment rates, duration of anesthesia rotations, and the availability of equipment. As the RCT will likely be powered for the primary outcome measure only (time to intubate), any potentially statistically significant differences in secondary outcome measures will need to be interpreted with caution (as with most RCTs). If our group or others consider intubation success rate to be a more relevant or preferable outcome, the comparison of the two groups (Control vs VL-1), with a power of 80%, probability of type I error (α) of 0.1, and a one-sided test, will require 277 students per group. Numbers would increase for three groups (Control vs VL-1 vs VL-2), increased power, or decreased type I error. In light of such considerations, the feasibility of such a study may become a relevant issue and will likely require a large multicentre study.

Observed times to intubate in this pilot study are similar to other reported times (i.e., 70-76 sec),9,20 though those were single and not combined intubation attempts. There are a number of ways to assess performance of a practical skill. In the case of DL, the most common approaches include overall success rates,8,9 first pass success rates,21,22 success rates by attempt,20 intubation times,8,9,20 complication rates,9,23 and/or incidence of difficult laryngoscopy.8 A combination of these variables likely offers a reasonable assessment of DL performance. We chose time to intubate as our primary outcome because we deemed it to best represent the ability to complete an intubation by DL. Real-time feedback enabled our students to achieve success rates of 56% and 57% with as few as four or five learning opportunities. Although this is far short of the 90% used by others to define skill acquisition,2,3 there was a trend toward a higher success rate in the VL groups and a clinically meaningful shorter intubation time. In our view, this is an encouraging observation and worthy of further investigation.

Limitations of the study

Limitations of our study relate to the students, patients, devices, and instructors. As we did not capture students’ baseline time to intubate, there may be preexisting differences between groups. We tried to minimize these potential differences by including only those students who had no practical experience with DL as well as by randomization. We were unable to ensure that the patients in the three cohorts were similar with respect to the ease of DL and tracheal intubation. This could easily have introduced a bias that we have not captured. It is unlikely that the lack of universal blinding could have affected the primary outcome, i.e., timing or success of intubation. Nevertheless, it could have had an effect on reporting complications and confidence scores.

The direct and indirect views provided by the GlideScope Direct and the C-MAC may not be as similar as we assumed, and the students may not have realized that the view on the monitor could differ from their direct view.24 A structured approach to self-review or supervised video review could have been useful. In one study, the view on the monitor of the Storz Macintosh video laryngoscope (MVL), an earlier version of C-MAC, was the same as the direct view in 55.8% of cases, at least one grade better in 41.5%, and worse in 2.7%.24 Thus the supervisor may have been providing real-time verbal feedback that was based on a somewhat different laryngeal view, reducing its value to the student. Even so, we think the monitor view affords more meaningful information than the verbal description the student can provide.

Our study design using patients with multiple instructors mirrors the reality of large teaching facilities with feedback from various supervisors with a range of experience. This may have resulted in a greater heterogeneity of outcomes but is likely realistic and generalizable. From an educational perspective, it might have been beneficial to limit the number of instructors and provide formal airway education to the students using multiple modalities (e.g., lectures, videos showing a standard intubation technique, drawings, practice on mannequins, demonstrations) and subsequent practice on patients.20,25 In the current climate of scarce resources, other educational approaches may be of interest, e.g., teaching without a teacher26 or remote coaching.27

Conclusion

This pilot has established the feasibility, with some modifications, of a subsequent RCT of similar design. We have calculated a sample size and concluded that we would compare only two groups and involve more hospitals. Several considerations are required regarding standardization of airway education, teaching, feedback, and patient characteristics. We trust that the results and lessons learned from this pilot study will serve as a foundation for a future multicentre RCT.