Background

Obstructive sleep apnea (OSA) is the most common sleep-related breathing disorder and is characterized by recurrent episodes of apnea and hypopnea during sleep [1]. Recognizing OSA preoperatively in undiagnosed patients is particularly important, as many analgesics, anesthetics, and sedatives are respiratory depressants that can exacerbate OSA [1, 2]. Up to 68% of surgical patients with OSA may be undiagnosed [2, 3], placing them at increased risk of perioperative cardiovascular and pulmonary complications [2, 4,5,6,7]. Thus, an easy-to-administer tool for preoperatively screening surgical patients at increased risk of OSA is an essential part of the armamentarium for perioperative risk stratification.

Polysomnography (PSG) is the diagnostic standard for OSA. However, access to PSG is limited because it is expensive and time-consuming and requires overnight laboratory observation [8]. Given the limited resources for PSG and the high prevalence of OSA in the general population, several screening tools have been developed to help clinicians prioritize diagnosis and treatment for patients at increased risk of OSA. In the surgical population, the STOP-Bang questionnaire [9], STOP questionnaire [9], P-SAP score [10], Berlin questionnaire [11], and ASA checklist [11] have been validated as screening tools for OSA.

The STOP-Bang questionnaire is a simple OSA screening tool that takes approximately 1 minute to complete. It has been validated in multiple settings [9, 12,13,14,15] and used worldwide in different populations [13,14,15,16]. The questionnaire consists of eight yes/no items: four questions on symptoms and history (STOP: Snoring, Tiredness, Observed apnea, and high blood Pressure) and four demographic items (Bang: Body mass index (BMI), age, neck circumference, and gender) [9]. In the original development study, a cut-off score of 3 or greater had a sensitivity of 83.9, 92.9, and 100% for detecting all OSA (Apnea–Hypopnea Index (AHI) ≥ 5 events per hour), moderate-to-severe OSA (AHI ≥ 15 events per hour), and severe OSA (AHI ≥ 30 events per hour), respectively [9]. Because OSA is associated with a higher risk of perioperative complications [4, 17,18,19], the predictive parameters of the STOP-Bang questionnaire should be evaluated specifically in the surgical population to determine its utility for identifying patients at risk of OSA-related perioperative complications. To date, no systematic review and meta-analysis has examined the validity of the STOP-Bang questionnaire for detecting OSA specifically in surgical patients. The objective of this systematic review and meta-analysis was to determine the validity of the STOP-Bang questionnaire as a preoperative screening tool for identifying surgical patients at increased risk of OSA.
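To make the scoring rule concrete, the minimal sketch below adds one point per positive item and applies the ≥ 3 cut-off. The Bang thresholds it uses (BMI > 35 kg/m², age > 50 years, neck circumference > 40 cm, male gender) are the commonly published STOP-Bang criteria and are an assumption here, since this section does not state them; the `StopBangAnswers` record and the function names are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StopBangAnswers:
    """Hypothetical record of one patient's answers and measurements."""
    snoring: bool               # S: loud snoring
    tiredness: bool             # T: daytime tiredness or sleepiness
    observed_apnea: bool        # O: observed stopped breathing during sleep
    high_blood_pressure: bool   # P: treated or untreated hypertension
    bmi: float                  # kg/m^2
    age: int                    # years
    neck_circumference_cm: float
    male: bool


def stop_bang_score(a: StopBangAnswers) -> int:
    """Return the STOP-Bang score (0-8): one point for each positive item."""
    items = [
        a.snoring,
        a.tiredness,
        a.observed_apnea,
        a.high_blood_pressure,
        a.bmi > 35.0,                    # assumed cut-off: BMI > 35 kg/m^2
        a.age > 50,                      # assumed cut-off: age > 50 years
        a.neck_circumference_cm > 40.0,  # assumed cut-off: neck > 40 cm
        a.male,
    ]
    return sum(items)  # True counts as 1, False as 0


def screens_positive(score: int, threshold: int = 3) -> bool:
    """A score at or above the threshold is a positive screen."""
    return score >= threshold


# Example: 62-year-old man who snores, has hypertension, BMI 31, neck 42 cm
patient = StopBangAnswers(True, False, False, True, 31.0, 62, 42.0, True)
print(stop_bang_score(patient), screens_positive(stop_bang_score(patient)))
```

In this example the patient scores 5 (snoring, blood pressure, age, neck circumference, and male gender) and therefore screens positive at the ≥ 3 threshold.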

Methods

Study registration

The protocol of this study was registered in the International Prospective Register of Systematic Reviews (PROSPERO; registration CRD42021260451). The study was completed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guideline [20].

Literature search strategy

The literature search was performed by an information specialist (ME) using the Ovid platform for the following databases: MEDLINE ALL, Embase, Cochrane Central Register of Controlled Trials, Cochrane Database of Systematic Reviews, APA PsycINFO, and Journals@Ovid. CINAHL, Web of Science, and Scopus (Elsevier) were also searched. The search combined terms for the questionnaire (“stop-bang” or “stopbang”) with perioperative terms (perioperative, postoperative, surgery, and related terms). Searches were limited to records published from 2008 (when the STOP-Bang questionnaire was developed) to May 14, 2021; no other limits were applied. Literature surveillance continued through November 2021. The MEDLINE search strategy is provided in the supplemental material (Appendix 1).

Study selection and data management

Title and abstract screening and full-text evaluation were completed independently by two reviewers (MH, NG) using Covidence. Full-text articles were included if they met the following criteria: 1) the study screened for OSA using the STOP-Bang questionnaire in adult patients (aged ≥ 18 years) undergoing surgery; 2) the OSA diagnosis was confirmed by PSG or home sleep apnea testing (HSAT); 3) OSA severity was measured using Apnea–Hypopnea Index (AHI) or Respiratory Disturbance Index (RDI) cut-offs of ≥ 5, ≥ 15, and ≥ 30 events per hour; and 4) the accuracy of the STOP-Bang questionnaire was assessed with predictive parameters. The two reviewers extracted data from the included studies using a standardized form, and a third reviewer (AS) resolved any discrepancies between them. Data were managed in Microsoft Excel (Microsoft, Redmond, WA, USA).

Evaluation of methodological quality

Two reviewers (MH, NG) independently evaluated the risk of bias of the included studies using criteria for internal and external validity from the Cochrane Screening and Diagnostic Tests Methods Group [21]. Their assessments were compared, and a third reviewer (AS) resolved any discrepancies. Internal validity was assessed using the following criteria: valid reference standard, definition of disease, blinded execution of the index and reference tests, interpretation of the index test independent of clinical information, and study design. External validity was assessed using the following criteria: disease spectrum, clinical setting, previous screening or referral filter, demographic information, explicit cut-off of the index test, percentage of missing participants, management of missing data, and selection of participants for the reference test. In addition, the reviewers used the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool to rate the quality of each included study on a scale from 0 to 14 [22].

Statistical analysis

Meta-analysis was performed with Review Manager version 5.4 and Meta-DiSc version 1.4. For each included study, a 2 × 2 contingency table was constructed to obtain predictive parameters with 95% confidence intervals. To evaluate the validity of the STOP-Bang questionnaire for each OSA severity, defined by AHI cut-offs of ≥ 5 (all), ≥ 15 (moderate-to-severe), and ≥ 30 (severe) events per hour, the following pooled predictive parameters were calculated using a bivariate random-effects model: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), diagnostic odds ratio (DOR), likelihood ratios, and area under the curve (AUC). A STOP-Bang score of three or greater was used as the primary threshold, and post-test probabilities were calculated as described by Brooks et al. [23]. Pooled predictive parameters for additional STOP-Bang score thresholds were also calculated for each OSA severity.
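As an illustration of how a single study's 2 × 2 table yields the predictive parameters listed above, the following minimal Python sketch computes them for one hypothetical table. The counts are invented; the Wilson score interval is one common choice of 95% CI and may differ from what Review Manager or Meta-DiSc report; the 0.5 continuity correction for zero cells is an assumption. Pooling across studies with a bivariate model is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # STOP-Bang positive, OSA present on PSG/HSAT
    fp: int  # STOP-Bang positive, OSA absent
    fn: int  # STOP-Bang negative, OSA present
    tn: int  # STOP-Bang negative, OSA absent


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score 95% CI for a proportion (one common choice)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def predictive_parameters(t: TwoByTwo) -> dict:
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    # DOR is undefined when fp or fn is 0; adding 0.5 to every cell is a
    # common continuity correction (assumed here, not stated in the text).
    a, b, c, d = t.tp, t.fp, t.fn, t.tn
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_ci(t.tp, t.tp + t.fn),
        "specificity": spec,
        "specificity_ci": wilson_ci(t.tn, t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_pos": sens / (1 - spec),   # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,   # negative likelihood ratio
        "dor": (a * d) / (b * c),      # diagnostic odds ratio
    }


# Hypothetical study: 100 patients with OSA, 100 without
params = predictive_parameters(TwoByTwo(tp=85, fp=53, fn=15, tn=47))
for name, value in params.items():
    print(name, value)
```

Without zero cells, the DOR equals the ratio of the positive to the negative likelihood ratio, which is a quick consistency check on any extracted table.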

Meta-regression and sensitivity analyses were performed for moderate-to-severe and severe OSA using Open Meta Analyst software [24], with categorical variables (validation tool and study type) and continuous variables (age, proportion of male patients, BMI, neck circumference, prevalence, and sample size). We assessed the association between these variables and the combined estimates of sensitivity, specificity, and log diagnostic odds ratio. A leave-one-study-out analysis was performed to examine whether any individual study affected the robustness of the combined estimates. Statistical significance was set at p < 0.05.
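A leave-one-study-out analysis re-pools the estimate k times, omitting one study each time. The sketch below illustrates the idea for the log DOR using a univariate DerSimonian-Laird random-effects model. This is a simplified stand-in (an assumption) rather than the exact procedure used in Open Meta Analyst, and the four 2 × 2 tables are hypothetical.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its Woolf variance for one 2x2 table.
    A 0.5 continuity correction is added to every cell (assumption)."""
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate and standard error (DerSimonian-Laird)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    df = len(effects) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re))


def leave_one_out(tables):
    """Pooled DOR with 95% CI after omitting each study in turn."""
    stats = [log_dor_and_variance(*t) for t in tables]
    results = []
    for i in range(len(stats)):
        kept = stats[:i] + stats[i + 1:]
        pooled, se = dersimonian_laird([s[0] for s in kept], [s[1] for s in kept])
        results.append((i, math.exp(pooled),
                        math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)))
    return results


# Hypothetical (tp, fp, fn, tn) tables for four study groups
tables = [(50, 40, 8, 20), (30, 25, 5, 15), (70, 60, 10, 30), (20, 30, 6, 12)]
for omitted, dor, lo, hi in leave_one_out(tables):
    print(f"without study {omitted}: DOR {dor:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

If no single omission moves the pooled DOR or its confidence interval materially, the combined estimate is not driven by any one study.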

Results

The literature search identified 4641 articles, from which 2029 duplicates were removed (Fig. 1). After review of titles and abstracts, 2586 studies did not meet the inclusion criteria and were excluded. The full texts of the remaining 26 studies were reviewed, and 16 articles [25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40] were excluded for the reasons listed in Supplementary Table S1. The review included 10 articles that satisfied the inclusion criteria [41,42,43,44,45,46,47,48,49,50]. Of note, Nunes et al. [42] and Waseem et al. [50] included two and four subgroups, respectively, yielding a total of 14 study groups. The included studies involved 3247 surgical patients who were evaluated preoperatively for OSA.
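The flow counts in this paragraph can be checked arithmetically. The short sketch below reproduces the number of records screened, full texts assessed, articles included, and study groups from the figures reported above.

```python
identified = 4641               # records identified by the search
duplicates = 2029               # duplicates removed
excluded_title_abstract = 2586  # excluded at title/abstract screening
excluded_full_text = 16         # excluded at full-text review

screened = identified - duplicates               # records screened
full_text = screened - excluded_title_abstract   # full texts assessed
included = full_text - excluded_full_text        # articles included

# Nunes et al. contributed two subgroups and Waseem et al. four;
# each of the other included articles contributed one study group.
study_groups = (included - 2) + 2 + 4

print(screened, full_text, included, study_groups)  # 2612 26 10 14
```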

Fig. 1 PRISMA flow diagram

The demographic data of the included studies are summarized in Table 1. Across the surgical patients, the mean age was 57.3 ± 15.2 years, the mean BMI was 32.5 ± 10.1 kg/m², and 47% were male. Seven studies [41,42,43,44, 47, 49, 50] were prospective and three [45, 46, 48] were retrospective. The characteristics of the included studies are summarized in Table 2. Surgical procedures comprised non-cardiac elective surgery [41, 44, 50] (n = 3), abdominal surgery [42] (n = 1), coronary artery bypass grafting [42] (n = 1), bariatric surgery [45, 46, 48, 49] (n = 4), and total joint arthroplasty [47] (n = 1); the study by Nunes et al. [42] included two different surgical populations (abdominal surgery and coronary artery bypass grafting). Four studies reported results for AHI ≥ 5 [41, 43, 44, 47], eight for AHI ≥ 15 [41,42,43, 45, 47,48,49,50], and five for AHI ≥ 30 events per hour [41, 43, 45, 47, 50] (Figs. 2 and 3).

Table 1 Demographic data of patients using STOP-Bang questionnaire
Table 2 Characteristics of included studies
Fig. 2 Forest plot of pooled sensitivity and specificity of the STOP-Bang questionnaire for various OSA severities in surgical patients. Values are presented as means with 95% CI in parentheses. Abbreviations: AHI, Apnea–Hypopnea index; Bang, body mass index, age, neck circumference and gender; CABG, coronary artery bypass graft surgery; CI, confidence interval; OSA, obstructive sleep apnea; STOP, snoring, tiredness, observed apnea and high blood pressure

Fig. 3 Forest plot of pooled diagnostic odds ratio of the STOP-Bang questionnaire for various OSA severities in surgical patients. Values are presented as means with 95% CI in parentheses. Abbreviations: AHI, Apnea–Hypopnea index; Bang, body mass index, age, neck circumference and gender; CABG, coronary artery bypass graft surgery; CI, confidence interval; OR, odds ratio; OSA, obstructive sleep apnea; STOP, snoring, tiredness, observed apnea and high blood pressure

Methodological quality of the included studies

QUADAS scores of the included studies ranged from 9 to 13, indicating a moderate risk of bias (Table 1). PSG or HSAT was used as the reference test in all included studies to determine the accuracy of the STOP-Bang questionnaire (Table 2). Three studies [42, 43, 45] used PSG exclusively, two [41, 48] used a mix of PSG and HSAT, and five [44, 46, 47, 49, 50] used HSAT. Although PSG is the diagnostic standard for OSA, there were no significant differences between studies that used PSG and those that used HSAT in the prevalence of OSA (Table 2) or the accuracy of the STOP-Bang questionnaire (Fig. 2).

The evaluation of internal and external validity of the included studies is summarized in Supplementary Tables S2 and S3. Regarding internal validity, the index and reference tests were performed blind in five studies [41, 44, 45, 49, 50], and STOP-Bang scores were interpreted independently of clinical data in two studies [44, 50] (Supplementary Table S2). All studies except one [43], whose inclusion criteria were unspecified, fully described their inclusion and exclusion criteria. All 10 studies provided adequate information on the study setting and the demographics of the surgical patients, including age, sex, and BMI. Although one study [43] did not randomly select patients for PSG, all studies applied the STOP-Bang questionnaire without pre-screening for OSA. Overall, the risk of bias in the selection of participants for the reference test was low across the 10 studies.

Accuracy of the STOP-Bang questionnaire in surgical patients

The pooled predictive parameters of a STOP-Bang score of three or greater for screening patients undergoing surgery are presented in Table 3 and Figs. 2 and 3. The prevalence of all, moderate-to-severe, and severe OSA was 65, 38, and 17%, respectively. The pooled sensitivity of the STOP-Bang questionnaire was high: 85% (95% CI: 82, 88%; I²: 40.9%), 88% (95% CI: 85, 89%; I²: 85.7%), and 90% (95% CI: 87, 93%; I²: 86.9%) for all, moderate-to-severe, and severe OSA, respectively. The pooled specificity was moderate to low: 47% (95% CI: 42, 52%; I²: 87.2%), 29% (95% CI: 27, 32%; I²: 89.9%), and 27% (95% CI: 25, 29%; I²: 93.1%) for all, moderate-to-severe, and severe OSA, respectively.
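I² describes the proportion of between-study variation attributable to heterogeneity rather than chance, and is computed from Cochran's Q as I² = max(0, (Q - df)/Q). The sketch below computes it for study-level sensitivities on the logit scale with inverse-variance weights. It is a univariate simplification (an assumption, not necessarily how Meta-DiSc derives the I² values reported here), and the inputs are hypothetical.

```python
import math


def i_squared(events, totals):
    """Cochran's Q and I^2 (%) for study-level proportions, e.g. sensitivity
    = true positives / patients with OSA, combined on the logit scale."""
    ys, ws = [], []
    for x, n in zip(events, totals):
        a, b = x + 0.5, n - x + 0.5     # 0.5 continuity correction (assumption)
        ys.append(math.log(a / b))      # logit of the proportion
        ws.append(1 / (1 / a + 1 / b))  # inverse of the logit variance
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical study groups: true positives and number of patients with OSA
q, i2 = i_squared([95, 60, 85], [100, 100, 100])
print(f"Q = {q:.1f}, I^2 = {i2:.0f}%")
```

Identical study results give I² = 0%, whereas widely differing sensitivities, as in this hypothetical input, push I² towards 100%.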

Table 3 Pooled predictive parameters of STOP-Bang ≥3 as cut-off for surgical patients

The pooled PPV was highest for all OSA at 75% (95% CI: 71.8, 77.7%); the corresponding PPVs for moderate-to-severe and severe OSA were 43% (95% CI: 40.8, 45.0%) and 20% (95% CI: 18.5, 22.2%), respectively (Table 3). The NPV was highest for severe OSA at 93.2% (95% CI: 90.9, 95.1%), indicating that the STOP-Bang questionnaire can reasonably rule out severe OSA: a negative screen (score of 0–2) decreased the probability of severe OSA from 17.0 to 5.2%. The corresponding NPVs for all and moderate-to-severe OSA were 62.7% (95% CI: 56.9, 68.1%) and 79.6% (95% CI: 76.2, 82.6%). The AUC was 0.84, 0.67, and 0.63 for all, moderate-to-severe, and severe OSA, respectively.
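PPV, NPV, and post-test probability all follow from sensitivity, specificity, and prevalence via Bayes' theorem. Plugging in the pooled estimates for all OSA at a threshold of 3 (sensitivity 85%, specificity 47%, prevalence 65%) approximately reproduces the reported PPV of 75% and NPV of 62.7%. The sketch also applies the odds form of the calculation (post-test odds = pre-test odds × likelihood ratio); for a negative screen this equals 1 - NPV. The function names are illustrative.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes)."""
    tp = sens * prevalence               # true positives per screened patient
    fp = (1 - spec) * (1 - prevalence)   # false positives
    fn = (1 - sens) * prevalence         # false negatives
    tn = spec * (1 - prevalence)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)


def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


sens, spec, prev = 0.85, 0.47, 0.65  # pooled all-OSA estimates, STOP-Bang >= 3
ppv, npv = predictive_values(sens, spec, prev)
lr_neg = (1 - sens) / spec           # negative likelihood ratio
p_after_negative = post_test_probability(prev, lr_neg)
print(round(ppv, 3), round(npv, 3), round(p_after_negative, 3))  # 0.749 0.628 0.372
```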

Accuracy of different STOP-Bang score thresholds

The accuracy of different STOP-Bang score cut-offs in surgical patients for all OSA (n = 5722), moderate-to-severe OSA (n = 12,207), and severe OSA (n = 9878) is summarized in Supplementary Table S4. As the STOP-Bang threshold increased from 3 to 5, sensitivity decreased from 88 to 50% for moderate-to-severe OSA and from 90 to 61% for severe OSA, whereas specificity increased from 29 to 78% and from 27 to 75%, respectively. The PPV was highest, at 86%, with a threshold of 6 or greater for detecting all OSA, and the NPV was highest, at 94%, with a threshold of 4 or greater for severe OSA.
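The trade-off described above can be seen by sweeping the threshold over a score distribution: each higher cut-off counts fewer patients with OSA as positive (lower sensitivity) and more patients without OSA as negative (higher specificity). The score counts in this sketch are hypothetical and serve only to illustrate the calculation.

```python
def threshold_table(osa_counts, no_osa_counts):
    """osa_counts[k] and no_osa_counts[k] are the (hypothetical) numbers of
    patients with and without OSA who scored k (k = 0..8) on STOP-Bang.
    Returns (threshold, sensitivity, specificity) for thresholds 1..8."""
    n_osa, n_no = sum(osa_counts), sum(no_osa_counts)
    rows = []
    for t in range(1, 9):
        tp = sum(osa_counts[t:])      # OSA patients scoring >= t
        tn = sum(no_osa_counts[:t])   # non-OSA patients scoring < t
        rows.append((t, tp / n_osa, tn / n_no))
    return rows


osa = [0, 2, 5, 12, 18, 20, 15, 6, 2]     # hypothetical, 80 patients with OSA
no_osa = [6, 14, 20, 18, 12, 6, 3, 1, 0]  # hypothetical, 80 patients without OSA
for t, sens, spec in threshold_table(osa, no_osa):
    print(f">= {t}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```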

Meta-regression and sensitivity analysis

Meta-regression and sensitivity analyses were conducted for moderate-to-severe OSA in 12 study groups and for severe OSA in eight study groups (Supplementary Tables S5 and S6). Neither the continuous nor the categorical variables significantly altered the combined estimates; both changed them only marginally. The leave-one-study-out analysis showed that no individual study significantly affected the results (Supplementary Fig. S1).

Discussion

To our knowledge, this is the first systematic review and meta-analysis to examine the validity of the STOP-Bang questionnaire as a preoperative screening tool for OSA in the surgical population. The STOP-Bang questionnaire had an excellent AUC of 0.84 for detecting all OSA in patients undergoing surgery. The high sensitivity and significant diagnostic odds ratio of a STOP-Bang score ≥ 3 across the three OSA severities help identify surgical patients at increased risk of OSA. In addition, the high NPV of 93.2% for severe OSA can help clinicians reasonably exclude severe OSA in patients who score 0 to 2.

The prevalence of OSA in our study was high: 65, 38, and 17% for all, moderate-to-severe, and severe OSA, respectively, in keeping with previously reported prevalence in surgical patients [51, 52]. The higher prevalence in surgical patients than in the general population [53, 54] may reflect the greater burden of comorbidities in patients undergoing surgery, some of which are risk factors for OSA.

The predictive accuracy of the STOP-Bang questionnaire may vary across ethnic groups. Devaraj et al. reported a sensitivity of 82.8% and a specificity of 65.2% [44]. A recent large prospective cohort study found that, in the Indian population, the optimal BMI cut-off was > 27.5 kg/m² and a STOP-Bang score of 4 or greater was the optimal score for discriminating moderate-to-severe and severe OSA [50]. Importantly, a recent meta-analysis of 47 studies with 26,547 participants evaluating the performance of STOP-Bang across geographic regions found it to be a valid screening tool worldwide [13]. Because our study included patients from different countries and ethnicities, our findings are likely to apply broadly to the surgical population.

Ideally, every patient with undiagnosed OSA would be identified to minimize the risk of perioperative complications. Given limited logistical, financial, and clinical resources, especially in the preoperative setting, clinicians must balance the number of missed OSA cases against the healthcare resources used to diagnose OSA. The predictive parameters of a screening tool are therefore important for clinicians to consider when screening patients. Sensitivity and specificity are typically inversely related: we found that increasing the STOP-Bang cut-off increased specificity and decreased sensitivity for the detection of all, moderate-to-severe, and severe OSA. In surgical patients, higher STOP-Bang scores are associated with a higher probability of moderate-to-severe OSA [41, 55]. A STOP-Bang threshold of six or greater had the highest PPV (86%), with a high specificity of 90%, for detecting all OSA (Supplementary Table S4). This finding is consistent with a recent study showing that a STOP-Bang threshold of 6 has a high specificity of 91% for detecting OSA [56]. Whereas a score of three or greater can be used to identify patients at increased risk of OSA, a higher threshold may be useful in populations with a higher prevalence of OSA to reduce false positives. In general, surgical patients should be screened with a threshold of three or greater unless a high prevalence of OSA is suspected, in which case a threshold of five or six may help identify those at high risk of undiagnosed OSA who most need further evaluation.
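To make the balance between missed cases and resource use concrete, the sketch below converts the pooled estimates for moderate-to-severe OSA (prevalence 38%; sensitivity 88% and specificity 29% at a threshold of 3; 50% and 78% at a threshold of 5) into expected counts per 1000 screened patients. This is an illustrative calculation that treats the pooled values as if they applied to a single population; the function name is hypothetical.

```python
def per_1000(sens: float, spec: float, prevalence: float, n: int = 1000) -> dict:
    """Expected screening outcomes among n patients (natural frequencies)."""
    diseased = n * prevalence
    healthy = n - diseased
    return {
        "true_pos": sens * diseased,
        "false_neg": (1 - sens) * diseased,  # missed OSA
        "false_pos": (1 - spec) * healthy,   # positive screen without OSA
        "true_neg": spec * healthy,
    }


prev = 0.38  # pooled prevalence of moderate-to-severe OSA
for label, sens, spec in [(">= 3", 0.88, 0.29), (">= 5", 0.50, 0.78)]:
    counts = per_1000(sens, spec, prev)
    print(label, {k: round(v) for k, v in counts.items()})
```

Under these assumptions, raising the threshold from 3 to 5 would reduce false positives from about 440 to 136 per 1000 screened patients but increase missed cases from about 46 to 190.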

Utility of the STOP-Bang questionnaire in patients undergoing surgery

Despite rising awareness and the increasing prevalence of OSA in patients undergoing surgery [53, 57], the vast majority of patients with OSA are not identified preoperatively [2, 3, 52]. Undiagnosed OSA has been associated with difficult airway management [58] and increased postoperative complications, including cardiovascular events, reintubation, respiratory complications, and longer hospital stay [2, 17, 59,60,61]. Notably, preoperative screening of surgical patients with the STOP-Bang questionnaire to detect undiagnosed OSA has been shown to predict postoperative complications [17, 18, 28, 59, 62]. Of note, several of these studies were non-randomized observational studies [60,61,62].

Greater OSA severity may be associated with a higher rate of postoperative complications. Severe OSA has been associated with an increased risk of postoperative cardiac complications [2], and a higher incidence of postoperative complications has been associated with greater OSA severity [63]. As a higher STOP-Bang score is associated with a higher risk of moderate-to-severe and severe OSA [41], our findings indicate that the STOP-Bang questionnaire is a valid screening tool for preoperative risk stratification.

A recent study found that patients identified by the STOP-Bang questionnaire (score ≥ 3) as being at increased risk of OSA had a 4-fold increase in postoperative cardiopulmonary events [59]. Similarly, patients with a STOP-Bang score ≥ 3 had worse perioperative respiratory outcomes and a longer hospital stay [62]. Failure to recognize OSA in surgical patients can therefore place substantial strain on the healthcare system through increased use of intensive care, greater ventilator support, and longer hospitalization [64]. Patients with OSA who were adherent to continuous positive airway pressure (CPAP) therapy had an improved oxygen desaturation index on the night of surgery and were less likely to require oxygen therapy [65]. In addition, among surgical patients with OSA, a CPAP prescription was associated with fewer cardiovascular complications [17], further highlighting the importance of identifying undiagnosed OSA preoperatively.

Nevertheless, limited time between preoperative evaluation and surgery, patient reluctance to undergo sleep testing, and long waitlists for sleep clinics are barriers to recognizing undiagnosed OSA. This underscores the importance of a robust, easy-to-administer screening tool with high predictive accuracy. Our findings indicate that the STOP-Bang questionnaire meets this need, with an AUC of 0.84 for all OSA, allowing clinicians to risk-stratify preoperative patients and plan mitigation of OSA-related perioperative complications. Surgical patients at high risk of OSA should be considered for postoperative monitoring, such as continuous oximetry and capnography [66, 67].

Our study has some limitations. First, both PSG and HSAT were used to diagnose OSA in the included studies. Although the two are often comparable, PSG is the diagnostic standard, so some heterogeneity may have been introduced. Second, internal validity was difficult to assess because blinding of the index and reference tests was unclear in several studies; we therefore also used the QUADAS tool to evaluate the quality of the included studies. Third, our study population included a variety of surgical procedures, which may limit the applicability of our results to specific surgical populations. Methodological differences in the diagnostic tools, variability in OSA prevalence across studies, and the range of surgical procedures likely contributed to the high heterogeneity of the predictive parameters; in anticipation of this heterogeneity, a random-effects model was used for the meta-analysis. Nevertheless, our study provides an up-to-date review of the literature on the accuracy of the STOP-Bang questionnaire as a preoperative screening tool in the surgical population.

Conclusions

In summary, our systematic review and meta-analysis demonstrates the validity of the STOP-Bang questionnaire for OSA screening in surgical patients. With a cut-off score of 3 or greater, the STOP-Bang questionnaire has high sensitivity across all OSA severities and a high NPV for severe OSA, demonstrating its utility for detecting OSA in the surgical population.