An airway evaluation is a fundamental component of the preanesthetic evaluation. The National Audit Project-4 study showed that failure to properly assess the airway and identify difficulties contributes to poor patient outcomes during operative care.1

Some medical encounters, including a preanesthetic assessment, have been conducted virtually with both high quality and patient satisfaction.2,3,4 Although guidelines have suggested medical encounters most suited for virtual care are those that do not require a physical exam,5 a recent systematic review reported that virtual preanesthetic assessments had high patient satisfaction, similar surgical cancelation rates, and lower costs compared with in-person assessments.6

The COVID-19 pandemic prompted a dramatic increase in virtual care.7 An airway evaluation is traditionally performed in person but does not require special equipment; therefore, conducting an airway evaluation remotely using videoconferencing technologies is plausible. It is unknown whether a virtual airway evaluation (VAE) is a reliable alternative to an in-person assessment, or whether reliability is affected by the experience of the airway assessor. We therefore sought to characterize the impact of evaluator experience on the reliability of a VAE.

Given the interest in and potential benefits of virtual care, we undertook this study to test the following hypotheses: 1) in-person airway evaluations performed by consultant anesthesiologists are similar to consultant VAEs as assessed by inter-rater agreement; 2) the inter-rater agreement between consultant in-person airway evaluations and consultant VAEs is superior to that between consultant in-person evaluations and medical student (novice) VAEs.

Methods

Following University of Saskatchewan Research Ethics Board approval (BEH-2611, 6 May 2021), we conducted a prospective observational study assessing the inter-rater agreement of in-person airway evaluation performed by consultant anesthesiologists (consultant in-person) to VAEs performed by consultant anesthesiologists (consultant VAE), and VAEs performed by medical students (novice VAE).

Consultant in-person evaluations were performed by consultant anesthesiologists in the preoperative holding area as part of routine care. Consultant VAEs and novice VAEs were completed in an unspecified order (based on anesthesiologist availability) before or after consultant in-person evaluation (Figure). The consultant VAEs were completed by two anesthesiologists (W. M., P. H.) and novice VAEs by two medical students (J. M., M. Z.). Evaluators were blinded to each other’s findings. Data sheets were collected but not investigated until all evaluations for all patient participants were conducted. Data collection occurred between June and August 2021 in Saskatoon, Saskatchewan. Project protocols adhered to the Strengthening the Reporting of Observational Studies in Epidemiology guidelines.8

Figure Conceptual diagram of methodology and participation results

Evaluator characteristics

Evaluators did not have a pre-existing relationship or known previous interaction with participants. The consultant anesthesiologists were all fellowship-trained and Royal College of Physicians and Surgeons of Canada-accredited in anesthesiology. Medical students (J. M., M. Z.) had no experience with airway evaluations prior to receiving a workshop taught by experienced consultant anesthesiologists (J. G., W. M., P. H.). Evaluators did not have prior experience with VAEs.

Sample size, recruitment, and participant population

We targeted a convenience sample of 100 participants based on researcher availability during the study period. Sample size was a function of the bounded time frame of the study and the anticipated number of patients to be seen in hospital for anesthesiology consult. Eligible participants were all patients 17 yr of age and older who were booked for a preoperative anesthetic assessment and had access to a device capable of using the Zoom for Healthcare videoconference software (Zoom Video Communications, San Jose, CA, USA).

Prospective participants scheduled for surgeries who required preanesthetic assessment and were not under care by coinvestigators (J. G., W. M., P. H.) were identified by reviewing the operating room schedules. Participants receiving consultations at the preanesthetic clinic were approached by medical students (J. M., M. Z.). A standardized consenting procedure was used. If needed, patients were given a brief standardized five-minute tutorial at the preanesthetic clinic by the medical students on how to download and use Zoom for Healthcare. We did not formally record the number of participants requiring a tutorial, nor did we assess participants’ technological familiarity. Virtual airway evaluations were conducted after the preanesthetic clinic encounter at a time agreed upon by the participant and evaluator. All VAEs were conducted remotely with the participants choosing a convenient location, most often their residence. Consultant and novice VAEs were conducted at different times before or after the consultant in-person evaluation.

Data collection

The airway evaluation scoring tool used was based on an airway evaluation publication,9 to which we added a single scorable item, the thyro-mental distance, to reflect routine practice in our study centers. We evaluated ten components binarily by assigning 1 and 0 points for positive and negative findings, respectively, for a maximum total score of 10 points. These components were (1) facial trauma, (2) large incisors, (3) a beard or mustache, (4) mouth opening < three fingerbreadths, (5) thyro-mental distance < five fingerbreadths, (6) hyo-mental distance < three fingerbreadths, (7) thyro-hyoid distance < two fingerbreadths, (8) Mallampati class ≥ 3, (9) presence of an obstructed airway, and (10) poor neck mobility. Where fingerbreadth assessments were made, the patients’ fingers were used for VAEs. We considered an obstructed airway synonymous with a history of obstructive sleep apnea (OSA) or loud snoring, defined as louder than talking volume or loud enough to be heard through closed doors.10
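The binary scoring described above can be illustrated with a minimal Python sketch. The class name `AirwayEvaluation` and its field names are hypothetical labels chosen for this illustration; they do not come from the study's data collection form.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AirwayEvaluation:
    """Ten binary findings; True means a positive finding (1 point).

    Field names are hypothetical and chosen for this sketch only.
    """
    facial_trauma: bool
    large_incisors: bool
    beard_or_mustache: bool
    mouth_opening_lt_3_fingerbreadths: bool
    thyromental_lt_5_fingerbreadths: bool
    hyomental_lt_3_fingerbreadths: bool
    thyrohyoid_lt_2_fingerbreadths: bool
    mallampati_class_ge_3: bool
    obstructed_airway: bool  # history of OSA or loud snoring
    poor_neck_mobility: bool

    def total_score(self) -> int:
        # One point per positive finding, so the total ranges from 0 to 10.
        return sum(int(getattr(self, f.name)) for f in fields(self))


# Illustrative (not real) participant: beard and Mallampati class >= 3 only.
example = AirwayEvaluation(
    facial_trauma=False,
    large_incisors=False,
    beard_or_mustache=True,
    mouth_opening_lt_3_fingerbreadths=False,
    thyromental_lt_5_fingerbreadths=False,
    hyomental_lt_3_fingerbreadths=False,
    thyrohyoid_lt_2_fingerbreadths=False,
    mallampati_class_ge_3=True,
    obstructed_airway=False,
    poor_neck_mobility=False,
)
print(example.total_score())
```

Storing each component as a separate boolean keeps the per-component ratings available for component-level agreement analysis as well as for the total score.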

All airway evaluations followed the same study protocol. Consultant in-person evaluators (anesthesiologists) were aware of any previously recorded airway management difficulties from patient participants’ medical charts; consultant VAE evaluators did not review participants’ medical charts. Each participant underwent three separate evaluations: a consultant VAE (anesthesiologists W. M., P. H.), novice VAE (medical students J. M., M. Z.), and a consultant in-person evaluation by a consultant anesthesiologist who was the participant patient’s attending anesthesiologist. Participants with incomplete assessments were excluded from data analysis.

Consultant in-person evaluations

Consenting participants were evaluated by consultant anesthesiologists as part of the preanesthetic evaluation during routine care immediately before their scheduled surgery. Prior to these in-person evaluations, the consultant anesthesiologists were contacted by email (the day prior) to introduce the data collection tool. Afterward, one of the student researchers met with the consultant anesthesiologists on the day of the scheduled procedure to review the data collection tool and answer any study-related questions. Consultant anesthesiologists did not have prior exposure to the study’s airway evaluation scoring tool. In addition to the standardized preanesthetic airway evaluation, the anesthesiologists reported if the intubation was difficult (as determined clinically by the anesthesiologist) in cases where endotracheal intubation was part of the intraoperative anesthetic management (Electronic Supplementary Material [ESM] eAppendix 1).

Consultant and novice VAEs

Consultant and novice VAEs were conducted before or after consultant in-person evaluations in virtual meeting rooms hosted by Zoom for Healthcare.11 We evaluated the airway evaluation components 1 to 10 sequentially as ordered above. Directions given to the participants to optimize conditions for effective evaluation of the airway components (e.g., distance between face and screen, lighting, camera angle) were determined ad hoc by the evaluator during each evaluation. Evaluators recorded findings with visual and oral feedback from participants and collected field notes following evaluations. All evaluators used the same data collection form (ESM eAppendix 2).

Outcomes and data analysis

We tested our first hypothesis by assessing the inter-rater agreement of consultant in-person evaluations to consultant VAEs. We tested our second hypothesis by comparing the inter-rater agreement of consultant in-person evaluations to consultant VAEs against the inter-rater agreement of consultant in-person evaluations to novice VAEs (to elucidate the importance of clinical experience in airway evaluation). The inter-rater agreement for total airway scores, our primary outcome, was assessed using Cohen’s Kappa (CK). The secondary outcomes included the inter-rater agreement for each airway evaluation component of consultant in-person evaluations to consultant VAEs, and consultant in-person evaluations to novice VAEs. These secondary outcomes were assessed using prevalence-adjusted bias-adjusted Kappa (PABAK). Prevalence-adjusted bias-adjusted Kappa was used to account for disagreement between percent agreement and CK and low variability or prevalence in the data (see ESM eTable 1).12 Inter-rater agreement CK coefficients of consultant in-person total scores were compared with those of consultant and novice VAEs by calculating P values from CK 97.5% confidence intervals (CIs),13 with an alpha of 0.05. Analysis was performed with Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) and IBM SPSS Statistics for Windows, version 26 (IBM Corp., Armonk, NY, USA).
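For readers unfamiliar with these statistics, the following Python sketch shows textbook definitions of unweighted Cohen's Kappa and PABAK for paired ratings. It is an illustration only, not the SPSS or Excel procedure used in this study; the function names and the example ratings are assumptions made for the sketch.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of paired ratings on which the two raters agree."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be paired and non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's Kappa: (po - pe) / (1 - pe)."""
    n = len(a)
    po = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    pe = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if pe == 1.0:
        # Kappa is undefined (0/0) when both raters use one identical
        # category; returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (po - pe) / (1 - pe)


def pabak(a, b, n_categories=2):
    """Prevalence-adjusted bias-adjusted Kappa: (k * po - 1) / (k - 1)."""
    po = percent_agreement(a, b)
    return (n_categories * po - 1) / (n_categories - 1)


# Made-up ratings for a rare finding (not study data): 80% raw agreement.
in_person = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
virtual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(percent_agreement(in_person, virtual))  # 0.8
print(cohens_kappa(in_person, virtual))       # approximately -0.11
print(pabak(in_person, virtual))              # approximately 0.6
```

With a low-prevalence finding, Cohen's Kappa can be near zero or negative despite high percent agreement, whereas PABAK depends only on the observed agreement. This is the kind of discrepancy that motivates using PABAK for the individual components.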

Results

One hundred out of 111 participants completed all three evaluations; one participant was unable to complete VAEs because of technological challenges, and ten participants had incomplete in-person evaluations (Figure). Demographics of the participants with complete airway evaluations are presented in Table 1.

Table 1 Characteristics of study participants

The inter-rater agreement CK coefficients (fair, moderate, good, very good: CK = 0.21–0.40, 0.41–0.60, 0.61–0.80, 0.81–1.00; respectively) of consultant in-person total scores were fair compared with those of consultant VAEs (CK, 0.21; 97.5% CI, 0.07 to 0.34) and were good compared with those of novice VAEs (CK, 0.74; 97.5% CI, 0.62 to 0.86) (Table 2).14 Consultant in-person evaluations had a significantly higher level of inter-rater agreement with novice VAEs than with consultant VAEs (P < 0.001). Inter-rater agreement of individual airway evaluation components is described in Table 2. Most consultant in-person to consultant VAE and all consultant in-person to novice VAE PABAK inter-rater assessments were good to very good. There was moderate agreement between the consultant in-person evaluation and consultant VAE for thyro-mental distance (PABAK, 0.56; 97.5% CI, 0.37 to 0.75) and obstructed airway (PABAK, 0.48; 97.5% CI, 0.28 to 0.68), and fair agreement for Mallampati class (PABAK, 0.38; 97.5% CI, 0.17 to 0.59). Raw score frequencies for total scores and airway evaluation components are provided as ESM eTable 2.
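As a rough arithmetic check on the comparison above, the following Python sketch recovers approximate standard errors from the reported 97.5% CIs and computes a two-sided P value for the difference between the two CK coefficients. It assumes a normal approximation and, as a simplification, treats the two coefficients as independent even though both share the same in-person ratings; it is not necessarily the exact procedure used in this study. The function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def agreement_label(coefficient):
    """Map a Kappa coefficient to the bands quoted in the text."""
    if coefficient > 0.80:
        return "very good"
    if coefficient > 0.60:
        return "good"
    if coefficient > 0.40:
        return "moderate"
    if coefficient > 0.20:
        return "fair"
    return "below fair"  # no label is given in the text for this range


def se_from_ci(lower, upper, level=0.975):
    """Standard error implied by a symmetric normal-theory CI."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return (upper - lower) / (2 * z)


def p_value_for_difference(est1, ci1, est2, ci2, level=0.975):
    """Two-sided P value for est1 vs est2, assuming independence."""
    se_diff = sqrt(se_from_ci(*ci1, level=level) ** 2
                   + se_from_ci(*ci2, level=level) ** 2)
    z = abs(est1 - est2) / se_diff
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


# Reported values: consultant VAE and novice VAE vs consultant in-person.
p_value = p_value_for_difference(0.21, (0.07, 0.34), 0.74, (0.62, 0.86))
print(agreement_label(0.21), agreement_label(0.74), p_value < 0.001)
```

Under these assumptions the difference is roughly six to seven standard errors from zero, which is in line with the reported P < 0.001.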

Table 2 Inter-rater agreement of consultant in-person to consultant virtual airway evaluations and consultant in-person to novice virtual airway evaluations

One participant had a difficult intubation. The airway total scores for this patient were 4 for the consultant in-person evaluation, 2 for the consultant VAE, and 4 for the novice VAE.

Discussion

Main outcomes

Our results show that the inter-rater agreement of the total airway score between a consultant in-person evaluation and a consultant VAE was fair and the inter-rater agreement between a consultant in-person evaluation and a novice VAE was good. Additionally, most airway evaluation components had good to very good inter-rater agreement between consultant in-person evaluations and VAEs. The limited number of difficult intubations (n = 1) precludes further analysis or conclusions as to the predictive value of VAEs regarding difficult intubations.

Explanation of the findings

The inter-rater agreement of total scores being lower than that of the individual airway components can be understood when considering the nature of the comparison statistics (CK vs PABAK). The measurement agreement of the individual airway components was assessed in isolation and was not affected by the degree of agreement of the other assessed components. The overall agreement was influenced by each component assessed (with a binary outcome), and if the total score of a matched assessment differed by even one point, that assessment was considered in disagreement.
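The effect described above can be illustrated with a deliberately artificial Python example. The ratings below are constructed toy data, not study data: the second rater differs from the first on exactly one component per participant, and the differing component rotates across participants.

```python
N_COMPONENTS = 10
N_PARTICIPANTS = 10

# Toy data (not study data): rater A records all findings as negative.
rater_a = [[0] * N_COMPONENTS for _ in range(N_PARTICIPANTS)]

# Rater B flips exactly one component per participant, a different one each time.
rater_b = [row[:] for row in rater_a]
for i, row in enumerate(rater_b):
    row[i] = 1 - row[i]

# Agreement on each component, assessed in isolation.
component_agreement = [
    sum(a[j] == b[j] for a, b in zip(rater_a, rater_b)) / N_PARTICIPANTS
    for j in range(N_COMPONENTS)
]

# Agreement on the total score, which requires an exact match of the sums.
total_agreement = sum(
    sum(a) == sum(b) for a, b in zip(rater_a, rater_b)
) / N_PARTICIPANTS

print(component_agreement)  # every component: 0.9
print(total_agreement)      # 0.0
```

In this extreme case every component shows 90% agreement, yet no pair of total scores matches, because a single discordant component is enough to shift the total by one point.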

The imperfect agreement we observed between consultant in-person evaluations and VAEs was expected as the inter-rater agreement of airway evaluation components is imperfect even when comparing in-person with in-person assessments.15 Further, the imprecision in most clinical evaluations was the basis for the suggestion of CK ≥ 0.6 and percent agreement ≥ 80% as acceptable levels for agreement in healthcare settings.16 Additionally, several airway evaluation elements, notably Mallampati class, have been criticized for lacking precise clinical definitions.15 In our study, the evaluation of an obstructed airway was described as the presence of OSA or loud snoring, but consultant in-person evaluators may have had other substitutes for this metric17 that were not easily identified by a VAE. This may explain the lower inter-rater reliability for the Mallampati scores and the presence/absence of an obstructed airway. Further, the study protocol used the patients’ fingers to estimate distances for the VAEs, but consultant in-person evaluations used the traditional assessment using the anesthesiologists’ fingers. This discrepancy may account for the lower inter-rater reliability for thyro-mental distance. Conceptually, of the components assessed in fingerbreadths, the thyro-mental distance uses the largest threshold for distinguishing positive from negative findings and is therefore the most affected by anatomical differences between evaluator and participant fingers; it also had the lowest inter-rater agreement among these components.

Technological equipment and familiarity with videoconferencing platforms may significantly influence the accuracy of VAEs.18 We placed no limitations on minimum technological proficiency. Further, our field notes suggest that many of our patients resided in rural and remote locations with poor internet or cellular service connection, although we did not formally analyze field note data. Additionally, patients used their own available devices, including tablets, cellphones, and desktop computers, at a location of their choosing, potentially further reducing inter-rater agreement. While conducting the study assessments, our field notes suggest there were instances of poor internet connection and/or video camera quality, which resulted in poor picture quality, potentially reducing the accuracy of the measurements. These observations are consistent with those of previous publications.19,20,21,22,23

Our results unexpectedly showed a higher inter-rater agreement of consultant in-person evaluations to novice VAEs than consultant in-person evaluations to consultant VAEs. Varying technological familiarity with videoconferencing between consultant and novice evaluators may explain this surprising finding. Field notes suggested that novice VAE evaluators provided more direction and guidance to the participants during the videoconferencing than consultant VAE evaluators did. Previous studies show improved virtual healthcare outcomes with increased provider technological familiarity.18,24,25,26 Higher technological familiarity may have enabled our novice VAE evaluators to provide participants with additional education and guidance (e.g., manipulation for optimal camera views, helping patients to find favorable lighting, and screenshotting views for re-evaluations). This conjecture is consistent with our results for Mallampati class evaluation: the airway evaluation item that conceptually needed the most coaching was the item for which novice VAEs most outperformed consultant VAEs in inter-rater agreement with consultant in-person evaluations.

Strengths

One of the strengths of our study is the broad inclusion criteria open to any consenting patient 17 yr of age and older who had access to any device (even if they did not personally own the device) capable of using Zoom for Healthcare, regardless of technological familiarity. Additionally, the study protocol allowed participants to use the technological device of their choosing/availability, potentially further increasing the generalizability of our findings. Finally, our project includes both novice and consultant VAEs to characterize the impact of evaluator experience on VAEs.

Limitations

Our results may have been affected by the small number of consultant VAE and novice VAE evaluators (two of each evaluator type). Our broad inclusion criteria and lack of device standardization prevent insights into which patients or hardware may be best suited for VAE. Since only one participant was reported to have a difficult intubation, these results may not be applicable to patients with a difficult airway. Finally, knowledge of previous airway management may bias a consultant in-person evaluator to evaluate an airway differently (either more or less thoroughly) than they would without having this knowledge.

Future directions

The goal of a preanesthetic airway evaluation is to identify and subsequently counsel the patient and plan for difficult airway management;27 further study assessing the ability of a VAE to predict difficult airway management seems prudent, such as repeating the study on a population limited to patients with known difficult airways. Additionally, our results suggest future study is needed to define educational programs to improve both providers’ and patients’ use of VAEs. Evaluators’ technological familiarity and the impact of rural and remote patient locations and poor internet or cellular service connection should be objectively measured in future studies to further assess their impact on VAE success. Finally, future work should determine which patients, hardware, and software are best suited to VAE.

Conclusion

Our findings suggest that, in patients with normal airways, a VAE with good inter-rater reliability is possible when evaluators are proficient with videoconferencing technologies. The lack of an in-person airway evaluation should not be considered an insurmountable barrier to virtual preoperative assessment.