Introduction

Obstructive sleep apnea (OSA) is a sleep-related breathing disorder characterized by repetitive partial or complete upper airway obstruction, which often results in decreased arterial oxygen saturation and arousal from sleep [1]. OSA severity is commonly classified based on the apnea-hypopnea index (AHI) [2]. OSA has been associated with cardiovascular and metabolic consequences and with increased overall mortality [3]. Overnight polysomnography (PSG) is currently the gold standard for diagnosing the presence and severity of OSA. However, its high cost, limited accessibility, and time requirements can delay or impede the diagnosis and treatment of patients with OSA, particularly in areas with limited healthcare resources [4]. In addition, the increasing number of patients suspected of having OSA and the lack of structured patient interviews contribute to the growing number of referrals to sleep clinics [5]. Simple screening instruments that identify patients at high risk for OSA have therefore become increasingly important.

Several instruments have been developed over the years, including the STOP-Bang questionnaire [6, 7] and the NoSAS score [8]. The STOP-Bang questionnaire has a high sensitivity and negative predictive value and is therefore suitable for ruling out OSA [9,10,11,12]. However, its specificity is low to moderate, which can produce a high false-positive rate and, in turn, unnecessary referrals to sleep clinics for polysomnography [6, 7]. The NoSAS score has been validated in multiple patient cohorts, but opinions differ on whether it is superior to the STOP-Bang questionnaire [8, 10, 13,14,15]. In the original validation, Marti-Soler et al. reported a higher specificity and positive predictive value for the NoSAS score than for the STOP-Bang questionnaire, with a moderate to high sensitivity and negative predictive value; the NoSAS score would thus allow clinically significant OSA to be ruled out while reducing both the number of unnecessary nocturnal recordings and the number of missed diagnoses [8]. The Epworth sleepiness scale (ESS), originally designed to assess the extent of daytime sleepiness, has also been suggested as a screening tool for identifying patients at high risk for OSA [16]. However, multiple authors have found the ESS to be inferior to other screening tools for this purpose [11, 12, 17, 18].

The present study reviewed and analyzed a cohort of 235 patients who underwent PSG and to whom all three instruments were applied: the STOP-Bang questionnaire [6], the NoSAS score [8], and the ESS [16]. Our main objectives were to evaluate the predictive and discriminative performance of these screening instruments and to compare their diagnostic effectiveness. Additionally, we aimed to determine which variables were the strongest independent predictors of OSA in this cohort.

Recently, it has been suggested that the AHI is susceptible to variability in the clinical setting and that an alternative parameter is needed to indicate OSA severity [3, 19,20,21]. An important disadvantage of the AHI is that it does not take the morphology and duration of the apneas into account; longer, deeper apneas may be more significant than shorter, shallower ones [22]. Significant differences in OSA severity have been described between patients with a similar AHI [22]. Nocturnal oxygen desaturations result from apneas and are important in the pathogenesis of OSA and the development of its complications [23]. The arterial oxygen desaturation index (ODI) has therefore been proposed as an alternative to the AHI for grading PSG data and classifying OSA severity [23,24,25,26]. The ODI may be more relevant because of its higher reproducibility in the clinical setting [3, 19,20,21]. Furthermore, there is evidence that the ODI, unlike the AHI, is independently associated with prevalent risk factors such as hypertension [19]. In the present study, the discriminatory ability of the screening instruments was therefore evaluated against criteria based on both the AHI and the ODI.

Methods

Study design

Data from 235 patients who were monitored by ambulatory PSG were retrospectively analyzed. Inclusion criteria were age 18 years or older, complete clinical data, and a completed STOP-Bang questionnaire and NoSAS score. Exclusion criteria were previously diagnosed OSA, use of portable sleep studies or respiratory polygraphy, incomplete clinical data, and technically inadequate PSG. In the outpatient clinic, the following clinical parameters were collected for all patients: gender, age, height, weight, body mass index (BMI), neck circumference (NC), self-reported complaints (snoring, daytime sleepiness, and apnea), and self-reported comorbidities (cardiovascular history, hypertension, and pulmonary history). Patients also completed the ESS. The clinical parameters were used to calculate the NoSAS and STOP-Bang scores.

Screening instruments (supplementary material)

The STOP-Bang questionnaire consists of the four questions of the STOP questionnaire (snoring, tiredness, observed apneas, and hypertension) plus four demographic items (BMI > 35 kg/m², age > 50 years, neck circumference > 40 cm, and male gender). Each 'yes' answer scores 1 point and each 'no' scores 0. The total ranges from 0 to 8, and a score ≥ 3 is considered to indicate a high probability of OSA [6]. The NoSAS score is a 5-item instrument comprising neck circumference, obesity, snoring, age, and gender. It ranges from 0 to 17 and assigns 4 points for neck circumference ≥ 40 cm, 3 points for BMI 25–30 kg/m², 5 points for BMI ≥ 30 kg/m², 2 points for snoring, 4 points for age > 55 years, and 2 points for male gender. A total score ≥ 8 is considered to indicate a high probability of OSA [8]. The ESS describes 8 daytime situations in which patients rate their chance of dozing off or falling asleep: 0 for no chance, and 1, 2, and 3 for a slight, moderate, and high chance of dozing, giving a total range of 0–24. A total score ≥ 10 is considered to indicate excessive daytime sleepiness [16].
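The scoring rules above translate directly into code. The sketch below is a minimal illustration, not a validated implementation of the instruments; all function names are hypothetical, and it assumes that "BMI 25–30 kg/m²" means 25 ≤ BMI < 30, so that a BMI ≥ 30 falls only in the 5-point band.

```python
def stop_bang_score(snoring, tired, observed_apnea, hypertension,
                    bmi, age, neck_cm, male):
    """STOP-Bang: one point per positive item, total range 0-8."""
    items = [
        snoring,         # S: snoring
        tired,           # T: tiredness
        observed_apnea,  # O: observed apneas
        hypertension,    # P: pressure (hypertension)
        bmi > 35,        # B: BMI > 35 kg/m^2
        age > 50,        # A: age > 50 years
        neck_cm > 40,    # N: neck circumference > 40 cm
        male,            # G: male gender
    ]
    return sum(1 for item in items if item)


def nosas_score(neck_cm, bmi, snoring, age, male):
    """NoSAS: weighted items, total range 0-17."""
    score = 0
    if neck_cm >= 40:
        score += 4
    if bmi >= 30:
        score += 5
    elif bmi >= 25:      # assumed reading of "BMI 25-30": 25 <= BMI < 30
        score += 3
    if snoring:
        score += 2
    if age > 55:
        score += 4
    if male:
        score += 2
    return score


def ess_score(item_scores):
    """ESS: eight situations, each rated 0-3, total range 0-24."""
    if len(item_scores) != 8 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("ESS expects eight item scores, each between 0 and 3")
    return sum(item_scores)


def high_risk(stop_bang, nosas, ess):
    """Apply the cutoffs used in this study (STOP-Bang >= 3, NoSAS >= 8, ESS >= 10)."""
    return {"STOP-Bang": stop_bang >= 3, "NoSAS": nosas >= 8, "ESS": ess >= 10}


# Hypothetical patient: male, 60 years, BMI 31, neck 42 cm, snores, hypertensive.
sb = stop_bang_score(True, False, False, True, bmi=31, age=60, neck_cm=42, male=True)
ns = nosas_score(neck_cm=42, bmi=31, snoring=True, age=60, male=True)
ess = ess_score([1, 1, 0, 2, 1, 0, 1, 1])
print(sb, ns, ess, high_risk(sb, ns, ess))
```

Note that the two instruments use different neck-circumference boundaries (> 40 cm for STOP-Bang, ≥ 40 cm for NoSAS), so a neck circumference of exactly 40 cm scores on NoSAS only.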

Sleep study, scoring, and diagnosis

All patients underwent a full-night PSG at home. PSG included electroencephalography, electrooculography, surface electromyography, nasal airflow and air temperature, thoracoabdominal movements, pulse oximetry, body position, and snoring sounds. Breathing was recorded with nasal pressure and temperature sensors. The raw electronic data were scored manually according to the recommendations of the American Academy of Sleep Medicine [2]. An apnea was defined as a decrease in airflow of at least 90% from baseline for > 10 s. A hypopnea was defined as a decrease in airflow of at least 30% from baseline for > 10 s, associated with either an arousal or an arterial oxygen desaturation ≥ 3%. The AHI was calculated as the mean number of apneas and hypopneas per hour of sleep. The ODI was defined as the mean number of arterial oxygen desaturations ≥ 3% per hour. OSA severity was categorized according to both the AHI and the ODI. Using the AHI, patients were classified as having mild (5 ≤ AHI < 15 events/h), moderate (15 ≤ AHI < 30 events/h), or severe OSA (AHI ≥ 30 events/h) according to the 2012 American Academy of Sleep Medicine criteria [2]. For classification according to the ODI, the same thresholds were applied: mild (5 ≤ ODI < 15 events/h), moderate (15 ≤ ODI < 30 events/h), and severe (ODI ≥ 30 events/h) [27]. Other PSG parameters collected included the apnea index (AI), the AHI in the supine and non-supine positions, minimal arterial oxygen saturation (minimal SpO2), baseline arterial oxygen saturation (baseline SpO2), average arterial oxygen saturation (average SpO2), and the percentage of sleep time with arterial oxygen saturation below 90% (SpO2 time < 90%).
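The event definitions and severity bands above can be expressed as a small sketch. It follows the thresholds exactly as stated in this section (including > 10 s for event duration), assumes that the per-event airflow reduction, duration, arousal, and desaturation values have already been extracted from the recording, and assumes that the ODI, like the AHI, is normalized per hour of sleep. All function names are hypothetical.

```python
def classify_event(airflow_drop_pct, duration_s, arousal, desaturation_pct):
    """Label one respiratory event using the criteria stated in this section."""
    if duration_s <= 10:            # events must last > 10 s
        return None
    if airflow_drop_pct >= 90:      # >= 90% airflow reduction
        return "apnea"
    if airflow_drop_pct >= 30 and (arousal or desaturation_pct >= 3):
        return "hypopnea"           # >= 30% reduction plus arousal or >= 3% desaturation
    return None


def events_per_hour(n_events, sleep_hours):
    """Events per hour of sleep (AHI; ODI under the per-hour-of-sleep assumption)."""
    if sleep_hours <= 0:
        raise ValueError("sleep time must be positive")
    return n_events / sleep_hours


def severity(index):
    """Shared AHI/ODI bands: <5 none, 5-<15 mild, 15-<30 moderate, >=30 severe."""
    if index < 5:
        return "no OSA"
    if index < 15:
        return "mild"
    if index < 30:
        return "moderate"
    return "severe"


# Hypothetical events and a hypothetical night of 6 h sleep.
events = [classify_event(95, 20, False, 4), classify_event(40, 15, True, 1),
          classify_event(40, 15, False, 1), classify_event(95, 8, False, 5)]
print(events)                          # ['apnea', 'hypopnea', None, None]
ahi = events_per_hour(40 + 50, 6.0)    # 40 apneas + 50 hypopneas -> 15.0 events/h
odi = events_per_hour(130, 6.0)        # 130 desaturations >= 3% -> about 21.7 events/h
print(severity(ahi), severity(odi))    # moderate moderate
```

Because both indices share the same cut points, a single `severity` function serves for the AHI and the ODI; only the event count in the numerator differs.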

Statistical analysis

Statistical analysis was performed using the Statistical Package for the Social Sciences (IBM SPSS Statistics version 24 for Windows, New York, NY, USA). Continuous data are presented as means with standard deviations and categorical variables as frequencies with percentages. Groups were compared using chi-square tests for categorical variables and unpaired Student's t tests and univariate analysis of variance (ANOVA) for continuous variables. Discrimination, the ability of a screening tool to distinguish between patients with and without a given outcome, was estimated as the area under the receiver operating characteristic (ROC) curve (AUC), where 0.5 indicates no discrimination and 1.0 perfect discrimination [28]. AUCs were compared using the method described by Hanley et al. [29]. Additionally, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for different AHI and ODI cutoffs from 2 × 2 contingency tables; all estimates are reported with their 95% confidence intervals (CI). The association between individual demographic and clinical variables and the presence and degree of OSA was assessed using a multivariate logistic regression model (backward stepwise selection, p < 0.05). A two-tailed p value < 0.05 was considered statistically significant.
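As an illustration of the 2 × 2 calculations, the sketch below derives sensitivity, specificity, PPV, and NPV with 95% CIs from hypothetical counts. This section does not state which interval method was used, so the Wilson score interval here is an assumption; SPSS or other software may report different (for example, exact Clopper-Pearson) intervals.

```python
import math


def wilson_ci(k, n, z=1.96):
    """Wilson score 95% interval for a proportion k/n (assumed CI method)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))  # guard round-off


def proportion(k, n):
    """Point estimate with its interval; NaN when the denominator is empty."""
    return (k / n if n else float("nan"), wilson_ci(k, n))


def diagnostic_accuracy(tp, fp, fn, tn):
    """Screening result (positive/negative) cross-tabulated against the PSG diagnosis."""
    return {
        "sensitivity": proportion(tp, tp + fn),  # screen-positive among patients with OSA
        "specificity": proportion(tn, tn + fp),  # screen-negative among patients without OSA
        "PPV": proportion(tp, tp + fp),          # OSA among screen-positives
        "NPV": proportion(tn, tn + fn),          # no OSA among screen-negatives
    }


# Hypothetical counts, not taken from Table 2.
for name, (est, (lo, hi)) in diagnostic_accuracy(tp=80, fp=70, fn=20, tn=30).items():
    print(f"{name}: {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The Wilson interval is used here because, unlike the simple normal approximation, it stays within 0 to 1 and behaves reasonably when a proportion is close to 0 or 1, as with the near-perfect sensitivities reported for the STOP-Bang questionnaire.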

Results

Baseline characteristics

A total of 201 patients met the inclusion criteria; their baseline characteristics are shown in Table 1. Of these, 148 (73.6%) were male; the mean age was 50.0 ± 12.6 years and the mean BMI 28.0 ± 4.8 kg/m². Based on the AHI, OSA was present in 159 (79.1%) patients: 66 (41.5%) with mild, 45 (28.3%) with moderate, and 48 (30.2%) with severe OSA. Male gender, age, BMI, neck circumference, cardiovascular history, hypertension, snoring, and apneas all differed significantly between the OSA groups and the no-OSA group, with higher values or frequencies in the OSA groups. A post hoc Bonferroni test showed statistically significant differences between no OSA and moderate/severe OSA, respectively, for male gender (p = 0.008; p = 0.001), age (p = 0.002; p = 0.013), and BMI (p = 0.045; p < 0.001). BMI also differed significantly between mild/moderate OSA and severe OSA (p < 0.001; p = 0.030). Neck circumference (p = 0.043; p = 0.032), cardiovascular history (p = 0.006; p = 0.040), and hypertension (p = 0.004; p = 0.002) all differed significantly between no/mild OSA and severe OSA. The ESS did not differ significantly between OSA groups (p = 0.667; p = 0.616). According to the NoSAS score (cutoff ≥ 8), 54.5%, 75.6%, and 85.4% of the patients in the mild, moderate, and severe OSA groups, respectively, were classified as being at high risk of OSA (p < 0.001). According to the STOP-Bang questionnaire (cutoff ≥ 3), 97%, 100%, and 100% of the patients in these groups, respectively, were classified as being at high risk (p < 0.001). The polysomnographic results (AHI, ODI ≥ 3%, minimal SpO2, average SpO2, and SpO2 time < 90%) all differed significantly between the OSA and no-OSA groups (p < 0.001; p < 0.001; p < 0.001; p < 0.001; p = 0.001). The proportion of patients with positional sleep apnea also differed significantly between the groups (p < 0.001). A post hoc Bonferroni test showed that this difference was significant between no OSA and each OSA severity group (p < 0.001) and between mild and severe OSA (p = 0.05).
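The post hoc Bonferroni comparisons reported above account for the fact that four severity groups yield six pairwise comparisons. A common way to report the correction, sketched below, is to multiply each raw pairwise p value by the number of comparisons and cap the result at 1 (equivalently, to compare raw p values against 0.05/6). The raw p values are hypothetical, and exactly how a particular SPSS procedure applies the adjustment should be checked against its documentation.

```python
from itertools import combinations


def bonferroni(raw_p_values):
    """Bonferroni-adjusted p values: p * m, capped at 1, with m = number of tests."""
    m = len(raw_p_values)
    return [min(1.0, p * m) for p in raw_p_values]


groups = ["no OSA", "mild", "moderate", "severe"]
pairs = list(combinations(groups, 2))   # 6 pairwise comparisons

# Hypothetical raw (unadjusted) p values, one per pair, in the same order.
raw = [0.010, 0.004, 0.0002, 0.30, 0.03, 0.20]
adjusted = bonferroni(raw)
for (a, b), p_adj in zip(pairs, adjusted):
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}")
```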

Table 1 Baseline characteristics

Performance of instruments

The predictive performance of the screening instruments used as categorical variables is shown in Table 2. Across the AHI and ODI severity cutoffs, the sensitivity of the NoSAS score ranged from 0.70 to 0.92 (AHI > 5 and AHI > 15, respectively) and its specificity from 0.37 to 0.55 (AHI > 15 and AHI > 5, respectively). The STOP-Bang questionnaire showed the highest sensitivity, ranging from 0.99 to 1.00, but a lower specificity, ranging from 0.06 to 0.17. The ESS showed the highest specificity, ranging from 0.79 to 0.83, but a low sensitivity, ranging from 0.15 to 0.19. Figure 1 shows the ROC curves and corresponding AUCs of the three screening instruments, presented as continuous variables, at different levels of AHI and ODI severity. The ESS did not discriminate adequately for either the AHI or the ODI, with AUCs ranging from 0.450 to 0.525. The NoSAS score and the STOP-Bang questionnaire were equally adequate screening tools for both the AHI and the ODI, with AUCs ranging from 0.695 to 0.767 and from 0.684 to 0.767, respectively (all comparisons p > 0.05). When used as categorical variables, the AUC of the NoSAS score (cutoff ≥ 8) ranged from 0.620 to 0.684 and that of the STOP-Bang questionnaire (cutoff ≥ 3) from 0.529 to 0.577 (Table 2). Both instruments performed better as continuous than as categorical variables; however, this difference was significant only for the STOP-Bang questionnaire (all comparisons except AHI ≥ 5, p < 0.05).
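The gap between continuous and categorical use follows from how the AUC is computed. The AUC equals the probability that a randomly chosen patient with OSA scores higher than a randomly chosen patient without OSA, with ties counted as one half (the Mann-Whitney formulation). When a score is dichotomized at a single cutoff, the ROC curve collapses to a single point and this probability reduces to (sensitivity + specificity) / 2, so a cutoff with a sensitivity near 1.00 but a specificity of 0.06–0.17 is pulled toward 0.5. The sketch below uses hypothetical STOP-Bang scores and a brute-force pairwise count, which is adequate for cohorts of this size; the function names are illustrative.

```python
def auc(scores_with_osa, scores_without_osa):
    """Mann-Whitney AUC: P(score_OSA > score_noOSA) + 0.5 * P(tie)."""
    wins = 0.0
    for s_pos in scores_with_osa:
        for s_neg in scores_without_osa:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_with_osa) * len(scores_without_osa))


def dichotomize(scores, cutoff):
    """1 if the score reaches the cutoff (screen-positive), else 0."""
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical STOP-Bang totals (0-8) for patients with and without OSA.
with_osa = [3, 4, 5, 5, 6, 7, 3, 4, 6, 2]
without_osa = [2, 3, 3, 4, 1, 3, 2, 5, 4, 3]

print(auc(with_osa, without_osa))                 # continuous score
pos, neg = dichotomize(with_osa, 3), dichotomize(without_osa, 3)
sens, spec = sum(pos) / len(pos), 1 - sum(neg) / len(neg)
print(auc(pos, neg), (sens + spec) / 2)           # equal up to floating-point rounding
```

With these made-up scores the continuous AUC is 0.77, whereas dichotomizing at ≥ 3 (sensitivity 0.9, specificity 0.3) gives 0.60: the cutoff discards the ordering information within the screen-positive and screen-negative groups.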

Table 2 Performance of the NoSAS score, the STOP-Bang questionnaire, and the ESS. The screening instruments are presented as categorical variables (NoSAS ≥ 8, STOP-Bang ≥ 3, ESS ≥ 10)
Fig. 1

Discriminatory ability reported as area under the curve (AUC) (95% CI). The NoSAS score, the STOP-Bang questionnaire, and the ESS are presented as continuous variables. OSA severity is classified based on AHI ≥ 5 (any OSA), AHI ≥ 15 (moderate to severe OSA), and AHI ≥ 30 (severe OSA). The ODI (desaturations ≥ 3%) is subdivided into ODI ≥ 5, ODI ≥ 15, and ODI ≥ 30. The NoSAS score performed similarly to the STOP-Bang questionnaire at all cutoff points (all comparisons p > 0.05). The ESS showed lower discrimination than the NoSAS score and the STOP-Bang questionnaire at all cutoff points (all comparisons p < 0.05)

Predicting OSA

Multivariate logistic regression analyses were performed to establish the association between individual demographic and clinical variables and the presence and degree of OSA, categorized by the AHI and the ODI. Gender, age, and BMI were the strongest predictors of any OSA (AHI ≥ 5) (p < 0.001; p < 0.001; p = 0.004), moderate to severe OSA (AHI ≥ 15) (p < 0.001; p < 0.001; p < 0.001), ODI ≥ 5 (p = 0.001; p = 0.001; p = 0.001), and ODI ≥ 15 (p < 0.001; p < 0.001; p < 0.001). Gender, BMI, and self-reported history of hypertension were the strongest predictors of severe OSA (AHI ≥ 30) (p = 0.028; p < 0.001; p = 0.028) and ODI ≥ 30 (p = 0.024; p < 0.001; p = 0.034). Figure 2 shows the ROC curves of the estimated predicted probability, the NoSAS score, and the STOP-Bang questionnaire at the cutoff points AHI ≥ 15 and ODI ≥ 15. The AUC of the estimated predicted probability was 0.784 for AHI ≥ 15 and 0.805 for ODI ≥ 15. The predicted probability performed similarly to the NoSAS score and the STOP-Bang questionnaire (all comparisons p > 0.05).
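The estimated predicted probability is the output of a logistic model, p = 1 / (1 + exp(-(b0 + b1·male + b2·age + b3·BMI))), used as a continuous score in the ROC analysis. The sketch below fits such a model by Newton-Raphson on a small hypothetical dataset using only the Python standard library. It does not reproduce the backward stepwise selection or the SPSS output, and the coefficients it produces carry no clinical meaning; the dataset is constructed so that the two outcome groups overlap and a finite maximum-likelihood estimate exists.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def predict(beta, x):
    """Predicted probability for one patient; x excludes the intercept."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    k = len(X[0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k                        # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]    # information matrix X'WX
        for x, yi in zip(X, y):
            row = [1.0] + list(x)
            p = predict(beta, x)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    info[i][j] += p * (1 - p) * row[i] * row[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical patients: (male, age in years, BMI in kg/m^2) and OSA yes (1) / no (0).
corners = [(m, a, b) for m in (0, 1) for a in (45, 55) for b in (26, 30)]
X = corners + corners + [(1, 62, 34), (1, 58, 33), (0, 60, 35),
                         (0, 35, 22), (0, 40, 23), (1, 30, 24)]
y = [0] * 8 + [1] * 8 + [1, 1, 1, 0, 0, 0]

beta = fit_logistic(X, y)
probs = [predict(beta, x) for x in X]   # continuous score for a ROC analysis
print([round(b, 3) for b in beta])
```

Because the logistic function is monotonic, ranking patients by predicted probability is the same as ranking them by the linear predictor b0 + b1·male + b2·age + b3·BMI, so the AUC of the predicted probability depends only on that ordering.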

Fig. 2

Discriminatory ability reported as area under the curve (AUC) (95% CI). The NoSAS score and the STOP-Bang questionnaire are presented as continuous variables. The green ROC curve shows the predicted probability based on gender, age, and BMI. The predicted probability performs similarly to the NoSAS score and the STOP-Bang questionnaire (all comparisons p > 0.05). The ROC curves are presented at AHI ≥ 15 and ODI ≥ 15

Discussion

The present study shows that the NoSAS score and the STOP-Bang questionnaire, but not the ESS, were equally useful for detecting patients at high risk for OSA. In this cohort, the STOP-Bang questionnaire had the highest sensitivity but a low specificity. The NoSAS score had a higher specificity and PPV while maintaining a moderate to high sensitivity. The ESS had the highest specificity but a low sensitivity. These findings are consistent with those of previous studies [8, 10, 11, 13, 18, 30]. The discriminatory ability of the NoSAS score and the STOP-Bang questionnaire was similar with respect to both the AHI and the ODI. However, because of its low specificity and PPV, the STOP-Bang questionnaire may yield a large proportion of false-positive results when applied to an inappropriate patient population, thereby increasing the number of unnecessary nocturnal recordings. The NoSAS score, in contrast, offers a higher specificity and PPV while maintaining a moderate to high sensitivity and NPV.

The discriminatory ability of the NoSAS score and the STOP-Bang questionnaire as categorical variables was compared with their discriminatory ability as continuous variables. As expected, discrimination was higher when the instruments were used as continuous variables; however, this difference was significant only for the STOP-Bang questionnaire. Previous studies have suggested that the probability of moderate to severe OSA increases in direct proportion to the STOP-Bang score and that the questionnaire should therefore be used as a continuous rather than a categorical variable. Chung et al. suggested that patients with a STOP-Bang score of 0 to 2 be classified as being at low risk for moderate to severe OSA, and those with a score of 5 to 8 as being at high risk. In patients with a score of 3 or 4, specific combinations of positive items should be examined further to ensure proper classification [6]. The NoSAS score has previously been used as a categorical variable with various cutoff points [8, 10, 13, 14, 30]. Our results, however, suggest that a scoring approach similar to that of the STOP-Bang questionnaire could be considered. Coutinho Costa et al. proposed such an approach, prioritizing patients according to their score: a score of 0–5 indicates a low probability of OSA, particularly of moderate to severe OSA; a score ≥ 7 indicates probable OSA; and a score ≥ 12 indicates a high probability of OSA, particularly of moderate to severe OSA [14].
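The tiered schemes described above can be written as simple lookup rules. The sketch illustrates the cutoffs as quoted here and is not a validated triage tool; a NoSAS score of 6 is not assigned to any tier in the description, so the sketch leaves it explicitly unclassified rather than guessing. The function names are hypothetical.

```python
def stop_bang_tier(score):
    """Risk of moderate to severe OSA from the STOP-Bang total (Chung et al.)."""
    if not 0 <= score <= 8:
        raise ValueError("STOP-Bang total must be between 0 and 8")
    if score <= 2:
        return "low"
    if score >= 5:
        return "high"
    return "intermediate"    # 3-4: examine specific combinations of positive items


def nosas_tier(score):
    """Probability of OSA from the NoSAS total (Coutinho Costa et al.)."""
    if not 0 <= score <= 17:
        raise ValueError("NoSAS total must be between 0 and 17")
    if score <= 5:
        return "low"
    if score >= 12:          # checked before ">= 7" so the higher tier wins
        return "high"
    if score >= 7:
        return "probable"
    return "unclassified"    # 6 is not assigned to a tier in the description


print([stop_bang_tier(s) for s in range(9)])
print([nosas_tier(s) for s in (4, 6, 8, 14)])   # ['low', 'unclassified', 'probable', 'high']
```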

In the present cohort, male gender, age, and BMI were the strongest individual predictors of OSA severity based on the AHI and the ODI. The discriminatory ability of the three variables combined was similar to that of the NoSAS score and the STOP-Bang questionnaire. This may offer an opportunity to design a screening tool based on only these three variables. Alternatively, the weighting of gender, age, and BMI could be increased in the existing screening instruments. A similar approach was suggested by Chung et al. for the STOP-Bang questionnaire, introducing male gender, BMI, and neck circumference as high-risk variables [6].

Clinical implications

This is the first study to evaluate the predictive performance of three different screening instruments with respect to both the AHI and the ODI. This is relevant because of increasing evidence that the ODI is more reproducible in the clinical setting [19,20,21]. Furthermore, significant differences in OSA severity have been described between patients with a similar AHI, presumably because the AHI does not take the morphology and duration of the apneas into account [22]. In the present study, the NoSAS score and the STOP-Bang questionnaire both showed adequate discriminatory ability to predict OSA severity based on both the AHI and the ODI. The ESS, however, was not able to identify patients at high risk for OSA and should therefore not be used as a screening instrument.

Limitations and strengths

In general, a retrospective analysis is less suitable than a prospective study for validating the predictive value of screening instruments. In this observational study, however, our center had collected the data before PSG monitoring, which supports the credibility of this retrospective analysis. Most patients were referred to the sleep clinic because they were suspected of having sleep-related problems. Selection bias may therefore have been introduced, since the questionnaires were applied only to suspected individuals. The high prevalence of OSA in this study population could also affect the interpretation of the screening instruments. On the other hand, the present study has several important strengths. It is the first study to evaluate the predictive value of different screening instruments against the ODI; as the ODI gains attention as a new variable for classifying OSA severity, this is an important new insight. Furthermore, all patients were evaluated with a full PSG and scored according to the current guidelines of the American Academy of Sleep Medicine published in 2012 [2].
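Why a high prevalence matters can be made concrete with Bayes' theorem: sensitivity and specificity are conditioned on disease status, whereas PPV and NPV also depend on prevalence. The sketch below holds a hypothetical sensitivity and specificity fixed and varies the prevalence, using 79.1% (AHI ≥ 5 in this cohort) and an arbitrary lower value for comparison.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(OSA | screen-positive) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """P(no OSA | screen-negative) by Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


sens, spec = 0.80, 0.50          # hypothetical test characteristics
for prev in (0.791, 0.30):       # cohort prevalence (AHI >= 5) vs. an assumed lower one
    print(f"prevalence {prev:.0%}: PPV {ppv(sens, spec, prev):.2f}, "
          f"NPV {npv(sens, spec, prev):.2f}")
```

With the same test characteristics, the PPV is about 0.86 at a prevalence of 79.1% but about 0.41 at 30%, while the NPV moves in the opposite direction (about 0.40 versus 0.85). Predictive values from a high-prevalence referral cohort may therefore not transfer to a general screening population.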