Introduction

Obstructive sleep apnoea (OSA) is the most common sleep-related respiratory disorder and is recognized as an independent risk factor for a range of clinical conditions, such as hypertension, stroke, depression and diabetes [1, 2]. Moreover, OSA is a significant cause of motor vehicle crashes [3] and is associated with an increase in all-cause mortality, particularly due to coronary artery disease [3, 4]. It has been estimated that up to 80% of individuals with moderate-to-severe OSA have not been diagnosed [5].

Polysomnography, the current gold standard for OSA diagnosis, is expensive and difficult to set up and interpret [1]. Portable home monitoring (type III polygraphy) has been approved by the American Academy of Sleep Medicine as an alternative in patients without significant cardiorespiratory disease, chronic opioid medication use, a history of stroke or severe insomnia [6].

The high prevalence of undiagnosed OSA, limited resources and the short- and long-term consequences of the disease have created a need for a reliable and affordable screening tool for OSA risk stratification. Questionnaires are well suited to this purpose because they can be administered and scored easily as part of routine daily practice [7]. The STOP-Bang questionnaire (SBQ) is a simple, validated questionnaire that detects OSA with high sensitivity; because it helps avoid missed cases, it is better suited to a sleep clinic setting than other questionnaires [8, 9]. In their 2017 meta-analysis, Chiu et al. [10] compared the Berlin questionnaire, the SBQ and the Epworth Sleepiness Scale in terms of OSA detection. For mild, moderate and severe OSA, the pooled sensitivity and diagnostic odds ratio of the SBQ were significantly higher than those of the other screening questionnaires. The SBQ also performed consistently across severity thresholds, having the largest area under the curve of the eight questionnaires compared at the commonly used apnoea–hypopnoea index (AHI) cutoffs of 5, 15 and 30 [11]. The SBQ has been translated and validated in numerous languages, but no scientifically produced translation or validation of a Slovene version has thus far been produced. We aimed to translate, culturally adapt and validate the SBQ for use with Slovene patients.

Methods

We split our study into two parts: first, we translated and adapted the SBQ and tested its internal consistency and test–retest reliability; second, we validated the translated SBQ against sleep studies in a cross-sectional study.

Study population

From February to April 2017, a convenience sample of 153 healthy Slovene-speaking volunteers aged 18 or older was recruited to assess test–retest reliability. Of these, 134 (87.6%; mean age 42.9 ± 12.7 years) completed both administrations of the SBQ required for the final analysis. The demographic characteristics and SBQ results of this sample are shown in Table 1.

Table 1 Demographic characteristics and STOP-Bang output of the convenience sample volunteers

The second part of the study was conducted at the sleep clinic of the Institute of Clinical Neurophysiology, University Medical Centre Ljubljana, Slovenia. All patients aged 18 or older who spoke Slovene and were referred for polygraphy or polysomnography were asked to participate. Patients with neuromuscular conditions were excluded; no other restrictions were placed on referrals. Of the 256 patients referred, 237 (92.6%; mean age 52.5 ± 14.6 years) were included in the final analysis. Sixteen patients did not complete the questionnaires, two were excluded because of low-fidelity polygraphy recordings that they were unwilling to repeat, and one was excluded because he could not fall asleep during polygraphy and declined further testing. The demographic characteristics of this second sample are shown in Table 2.

Table 2 Demographic characteristics of the patients recruited at the sleep clinic

Study design and data collection

Step 1: Translation of the SBQ

The SBQ was translated from English into Slovene by two independent researchers: a medical doctor with experience in sleep medicine and a psychologist with experience in instrument development and translation. Both were native Slovene speakers proficient in English. A bilingual panel, consisting of the two forward translators and an additional medical doctor specializing in somnology, synthesized the two translations. An independent translator, a psychologist by training who had no knowledge of the SBQ and grew up in a Slovene–English bilingual home, performed the back translation. To verify that the questions were understood correctly, 10 adults (6 females and 4 males, mean age 39.3 ± 11.8 years) took part in one-on-one think-aloud cognitive interviews with a psychologist. Procedures followed the standards set out by the World Health Organisation [12].

Step 2: Test–retest reliability of the SBQ

Participants were given two sets of questionnaires. The first contained demographic questions, exclusion criteria and the SBQ; the second contained the SBQ retest, which was completed 2–3 weeks after the first.
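Per-item agreement between the two administrations is quantified with Gwet's AC1 (see Statistical analysis). For a yes/no item, AC1 compares the observed agreement p_a with a chance agreement p_e = 2π(1 - π), where π is the mean proportion of "yes" answers over both administrations. The following is a minimal Python sketch under that two-category assumption; the function name and the answers are hypothetical, and it is not the software used in the study.

```python
from typing import Sequence


def gwet_ac1(test: Sequence[int], retest: Sequence[int]) -> float:
    """Gwet's AC1 for one yes/no item answered at test and at retest.

    Answers are coded 1 = yes, 0 = no; position i is the same respondent.
    """
    if not test or len(test) != len(retest):
        raise ValueError("test and retest must be non-empty and equally long")
    n = len(test)
    # Observed agreement: share of respondents who answered the same both times.
    p_a = sum(a == b for a, b in zip(test, retest)) / n
    # Mean proportion of "yes" answers over the two administrations.
    pi_yes = (sum(test) + sum(retest)) / (2 * n)
    # Chance agreement for two categories: pi(1 - pi) + (1 - pi)pi.
    p_e = 2 * pi_yes * (1 - pi_yes)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical answers of 10 respondents to a single SBQ item.
test_answers = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
retest_answers = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(round(gwet_ac1(test_answers, retest_answers), 3))
```

Here 9 of 10 answers agree and AC1 is about 0.80. Unlike Cohen's kappa, AC1's chance term stays bounded when almost everyone answers "yes" (or "no"), which is one reason it is often preferred for skewed dichotomous items.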

Step 3: Validation

Polysomnography and polygraphy

Patients referred to the sleep clinic for a sleep study underwent either ambulatory type III polygraphy (PG) or type I polysomnography (PSG). PSG was used in patients with significant cardiorespiratory disease, chronic opioid medication use, a history of stroke or severe insomnia. When a PG recording could not be used because of low fidelity or technical failure (for example, a loosened respiratory effort belt or a dislodged pulse oximeter), PSG was performed instead. This is in line with normal sleep centre operations and follows the recommendations of the American Academy of Sleep Medicine (AASM) [6]. PG was recorded with the Alice NightOne system and PSG with the Alice 6 system (both Philips Respironics). Patients with neuromuscular conditions were excluded because disorders for which the SBQ was not designed, such as sleep-onset insomnia, sleep maintenance insomnia, rapid eye movement sleep behaviour disorder, central sleep apnoeas and diaphragm weakness with pseudo-central apnoeas, are common in this population [13].

Recordings were manually scored in our accredited sleep centre by our most experienced certified sleep specialist, in accordance with AASM guidelines and rules [6, 14]. The scorer was blinded to patients’ clinical histories and SBQ scores. OSA severity was classified as mild for an apnoea–hypopnoea index (AHI) ≥ 5 and < 15, moderate for AHI ≥ 15 and < 30, and severe for AHI ≥ 30 [6, 14]. Only datasets with complete STOP-Bang questionnaires and good-quality recordings were included in the final analysis.
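The severity bands above are half-open intervals on the AHI (events per hour), so a boundary value such as 15 belongs to the higher band. A minimal Python sketch of that mapping follows; the function name `osa_severity` is illustrative and not part of any scoring software used in the study.

```python
def osa_severity(ahi: float) -> str:
    """Map an apnoea-hypopnoea index (events/h) to the severity bands used here.

    Bands: < 5 no OSA; 5 to < 15 mild; 15 to < 30 moderate; >= 30 severe.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "no OSA"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


# Boundary values fall into the higher band, e.g. AHI 15.0 is moderate.
for value in (4.9, 5.0, 14.9, 15.0, 29.9, 30.0):
    print(value, osa_severity(value))
```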

Statistical analysis

Patient characteristics are presented as mean (standard deviation) for normally distributed numerical variables, median (interquartile range) for non-normally distributed numerical variables and frequency (%) for categorical variables. Differences between the OSA and non-OSA groups were tested with the independent-samples t test or the Mann–Whitney test for numerical variables and with the chi-square test for categorical variables. The validation of the SBQ included the evaluation of internal consistency (Cronbach’s alpha) and test–retest reliability (Gwet’s AC1 agreement coefficient). Internal consistency was also examined with factor analysis using tetrachoric correlations. To assess the predictive validity of the SBQ, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for different AHI cutoffs. Logistic regression was used to calculate the predicted probabilities of OSA at each AHI cutoff from the SBQ score.
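Sensitivity, specificity, PPV and NPV all come from the same 2 × 2 table of screening result (SBQ ≥ cutoff) against reference result (AHI ≥ cutoff). The sketch below shows that tabulation in Python for one pair of cutoffs; the names and the eight patients are hypothetical, and confidence intervals, which the study reports, are not computed here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num: int, den: int) -> float:
    # Undefined proportions (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def diagnostic_accuracy(sbq: Sequence[int], ahi: Sequence[float],
                        sbq_cutoff: int, ahi_cutoff: float) -> Accuracy:
    """Screen-positive if SBQ >= sbq_cutoff; reference-positive if AHI >= ahi_cutoff."""
    tp = fp = fn = tn = 0
    for score, index in zip(sbq, ahi):
        screen_pos = score >= sbq_cutoff
        osa = index >= ahi_cutoff
        if screen_pos and osa:
            tp += 1
        elif screen_pos:
            fp += 1
        elif osa:
            fn += 1
        else:
            tn += 1
    return Accuracy(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Hypothetical SBQ totals and AHI values for eight patients.
sbq_scores = [5, 4, 3, 2, 1, 6, 3, 0]
ahi_values = [20.0, 8.0, 3.0, 12.0, 2.0, 35.0, 40.0, 1.0]
print(diagnostic_accuracy(sbq_scores, ahi_values, sbq_cutoff=3, ahi_cutoff=5))
```

Looping this over SBQ cutoffs 1 to 8 and AHI cutoffs 5, 15 and 30 gives a grid of the same shape as Table 7.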

Results

Translation and adaptation

After the forward translations were completed, a bilingual panel examined them and reached a consensus translation. No major discrepancies were noted between the original English and the back-translated version. After reviewing the results of the cognitive interviews, the panel decided to split the question of whether the body mass index (BMI) was 35 or higher into its constituents: patients were instead asked to give their weight and height, as these were easier to self-report. The final version of the Slovene SBQ is given in Additional file 1.
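Because the Slovene version asks for weight and height instead of a yes/no BMI item, the BMI item has to be derived before the total score is computed. The sketch below shows one way to do this in Python, assuming one point per positive item and a BMI item that is positive at 35 kg/m² or higher. The four STOP items are taken as yes/no answers and the age item as "older than 50", as quoted in this paper; the male-sex item and the 40 cm neck-circumference threshold are assumptions taken from the original English SBQ, with the neck threshold exposed as a parameter. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SBQAnswers:
    # Hypothetical container; field names are illustrative.
    snoring: bool
    tired: bool
    observed_apnoea: bool
    high_blood_pressure: bool
    weight_kg: float
    height_m: float
    age_years: int
    neck_cm: float
    male: bool


def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2


def sbq_score(a: SBQAnswers, neck_threshold_cm: float = 40.0) -> int:
    """Total STOP-Bang score (0-8), one point per positive item.

    neck_threshold_cm = 40 is an assumption taken from the original English SBQ.
    """
    items = [
        a.snoring,
        a.tired,
        a.observed_apnoea,
        a.high_blood_pressure,
        bmi(a.weight_kg, a.height_m) >= 35.0,  # derived from weight and height
        a.age_years > 50,
        a.neck_cm > neck_threshold_cm,
        a.male,
    ]
    return sum(items)


def is_high_risk(score: int, cutoff: int = 3) -> bool:
    # A score of 3 is the cutoff recommended in this study.
    return score >= cutoff


patient = SBQAnswers(True, False, True, False, 110.0, 1.75, 55, 42.0, True)
score = sbq_score(patient)
print(score, is_high_risk(score))
```

For this hypothetical patient, 110 kg at 1.75 m gives a BMI of about 35.9, so the BMI item scores a point even though the patient never answered a BMI question directly.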

Internal consistency and temporal stability

Cronbach’s alpha for the eight items was 0.628. Factor analysis suggested a single-component solution that explained 32.2% of the total item variance; the loadings for each SBQ item are presented in Table 3. The intraclass correlation coefficient between test and retest total scores was 0.94 (95% CI 0.91–0.95, p < 0.001). Test–retest reliability for each item was assessed with Gwet’s AC1 coefficient; almost all coefficients exceeded 0.9, indicating excellent test–retest reliability (Table 4).
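For k items, Cronbach's alpha is α = k/(k - 1) · (1 - Σ var(item_j) / var(total)). A minimal Python sketch on a respondents-by-items matrix of 0/1 answers follows; the toy data are hypothetical, and this is not the software used in the study. Population variances are used throughout; because the same divisor appears in the numerator and denominator, switching to sample variances gives the same α.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha; responses[i][j] is respondent i's answer (0/1) to item j."""
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([float(row[j]) for row in responses]) for j in range(k)]
    # Variance of the respondents' total scores.
    total_var = pvariance([float(sum(row)) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 answers of four respondents to three items.
toy = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
print(cronbach_alpha(toy))
```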

Table 3 Factor loadings based on principal component analysis with oblimin rotation for the eight items from the SBQ (n = 372)
Table 4 Test–retest reliability

Validation against AHI

A comparison of patients with OSA (AHI ≥ 5) and those without revealed significant differences in demographic characteristics between the two groups (Table 5).

Table 5 Demographic characteristics of the patients recruited at the sleep clinic and a comparison of the data for those with and without OSA

Of the 237 patients at the sleep clinic, 72 (30.4%) had no OSA (AHI < 5), whereas the remaining 165 had mild (n = 53, 22.4%), moderate (n = 52, 21.9%) or severe OSA (n = 60, 25.3%). Five patients underwent PSG; the rest had PG. Table 6 compares the SBQ answers and AHI between patients with and without OSA. As shown in Fig. 1, total SBQ scores were positively associated with AHI; the correlation coefficient was 0.56 (95% CI 0.47–0.64; p < 0.001). The SBQ was evaluated at AHI cutoffs of ≥ 5, 15 and 30; the respective areas under the ROC curve were 0.757 (95% CI 0.692–0.823; p < 0.001), 0.768 (95% CI 0.711–0.825; p < 0.001) and 0.77 (95% CI 0.704–0.836; p < 0.001). Plots of age, BMI, sex and neck circumference against AHI are given for comparison in Figs. 2, 3, 4 and 5, respectively. As the SBQ cutoff score increased, sensitivity and NPV decreased, whereas specificity and PPV increased; detailed results are presented in Table 7. Predicted probabilities of OSA based on the SBQ were also calculated: as the SBQ score increased from 3 to 8, the probability of OSA increased (Fig. 6).
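Predicted probabilities such as those in Fig. 6 come from a logistic model of the form P(AHI ≥ cutoff) = 1 / (1 + exp(-(b0 + b1 · SBQ))). The study does not state which software or covariates were used, so the sketch below assumes a single predictor (the total SBQ score) and fits it by Newton–Raphson in plain Python. The data are synthetic and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 iterations: int = 25) -> Tuple[float, float]:
    """Maximum-likelihood fit of P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: beta <- beta + I^{-1} g (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predicted_probability(b0: float, b1: float, score: float) -> float:
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


# Synthetic data: SBQ totals and whether AHI >= cutoff (1) or not (0).
scores = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8]
outcome = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(scores, outcome)
probs: List[float] = [predicted_probability(b0, b1, s) for s in range(9)]
for s, p in enumerate(probs):
    print(f"SBQ {s}: P(OSA) = {p:.2f}")
```

With one predictor and a positive slope, the fitted probability necessarily rises with every extra SBQ point; fitting one such model per AHI cutoff (5, 15, 30) gives one curve per severity level. The data must not be perfectly separated, otherwise the maximum-likelihood estimate does not exist.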

Table 6 Comparison of the answers to the STOP-Bang questionnaire and AHI in patients with and without OSA
Fig. 1 Scatter plot of total SBQ scores against AHI

Fig. 2 Scatter plot of age against AHI

Fig. 3 Scatter plot of BMI against AHI

Fig. 4 Box plot of sex against AHI

Fig. 5 Scatter plot of neck circumference against AHI

Table 7 Diagnostic ability of the SBQ scores from 1 to 8 at AHI cutoff points 5, 15 and 30 (n = 237)
Fig. 6 Predicted probabilities of having OSA of different severity based on the STOP-Bang questionnaire

Discussion

During cognitive interviewing, it was suggested that the BMI question be broken down into its components, body weight and height, to make it easier to use and understand. We adopted this suggestion. We were unable to find any other validation study of the SBQ in which this was done.

Our translation of the STOP-Bang questionnaire showed good temporal stability: the intraclass correlation coefficient between test and retest total scores was high (0.94).

Internal consistency was evaluated with two statistical methods. The first was Cronbach’s alpha, which was somewhat low at 0.63. This is on par with the 0.62 reported for the Brazilian translation [15], close to the 0.7 for the Arabic translation [16] and higher than the 0.41 for the Lithuanian translation [17]. We also performed factor analysis, which is more suitable for dichotomous variables such as those in the SBQ [18, 19]. This showed good internal consistency for six of the questionnaire’s eight items. Item two, “Do you often feel tired, fatigued or sleepy during the daytime?”, and item six, “Age older than 50?”, had loadings below the threshold of 0.3. For item two, this could be explained by causes of tiredness other than OSA, such as depression, which are common in the general population. Another explanation could be that some patients with OSA do not experience excessive daytime sleepiness [20]. The low loading of item six, which refers to age, could perhaps be explained by the study population consisting mostly of middle-aged subjects; a larger sample might have been more telling. Dr. Chung, who designed the questionnaire, did not evaluate internal consistency, arguing that the questionnaire reflects four different dimensions of OSA morbidity and that internal consistency checking is therefore not applicable [21]. Internal consistency was nevertheless checked in certain validation studies [7, 16] and omitted in others [17, 22, 23]. When it was checked, Cronbach’s alpha was used, and values were typically low.

The prevalence of OSA (AHI ≥ 5) in our sleep clinic population was 69.6%. It was 78% for the Portuguese version [7], 93% for the Lithuanian [17], 94% for the Arabic [16] and 100% for the Malay [23].

A comparison of the answers given by patients with and without OSA (AHI ≥ 5) showed significant differences for all but one of the eight questions, namely the question on BMI ≥ 35 (p = 0.113). This was somewhat surprising, considering that OSA has been reported in over 40% of persons with a BMI above 30 [20]. Nevertheless, of the 44 patients with a BMI ≥ 35, only 9 did not have OSA. Interestingly, Reis et al. [7] also found the BMI item not to be statistically significant. The specific BMI cutoff might be the cause, an interpretation supported by Fig. 3 and Table 5, which show an association between BMI and AHI.

The area under the ROC curve for all OSA (AHI ≥ 5) was 0.757 (95% CI 0.692–0.823; p < 0.001). It increased slightly for moderate/severe (AHI ≥ 15) and severe OSA (AHI ≥ 30), to 0.768 (95% CI 0.711–0.825; p < 0.001) and 0.77 (95% CI 0.704–0.836; p < 0.001), respectively. The AUC for all OSA (AHI ≥ 5) obtained by Reis et al. [7] was slightly higher than ours at 0.806 (95% CI 0.730–0.881), but slightly lower for moderate/severe OSA (AHI ≥ 15) at 0.730 (95% CI 0.661–0.798) and for severe OSA at 0.728 (95% CI 0.655–0.801).
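The AUC summarizes discrimination across all possible SBQ cutoffs rather than at a single score, and it equals the probability that a randomly chosen patient with OSA scores higher than a randomly chosen patient without OSA, with ties counted as one half (the Mann–Whitney formulation). Because the SBQ takes only the integer values 0 to 8, ties are common and must be handled. A minimal Python sketch with hypothetical data follows; it does not compute the confidence intervals reported above.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], positive: Sequence[bool]) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    pos = [s for s, p in zip(scores, positive) if p]
    neg = [s for s, p in zip(scores, positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # tied SBQ scores count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical SBQ totals; OSA defined as AHI >= 5.
sbq_scores = [5, 4, 3, 2, 1, 6, 3, 0]
ahi_values = [20.0, 8.0, 3.0, 12.0, 2.0, 35.0, 40.0, 1.0]
has_osa = [a >= 5 for a in ahi_values]
print(roc_auc(sbq_scores, has_osa))
```

The pairwise loop is quadratic, which is negligible for a sample of a few hundred patients.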

Among patients referred to the sleep clinic, the Slovene version of the SBQ at a score of 3 showed a high sensitivity of 92.1% (86.9–95.7%) and a moderate specificity of 44.4% (32.7–56.6%) for all OSA (AHI ≥ 5). This compares well with benchmarks such as Chung et al. [21], who reported a sensitivity of 72.1% and a specificity of 38.2%, and Silva et al. [8], who reported a sensitivity of 82.0% and a specificity of 43.3% for the same range and cutoff. Low specificity has also been observed in a number of translations [7, 16, 17] and in the meta-analysis by Nagappa et al. [9].

The PPV for an SBQ score of 3 for any OSA (AHI ≥ 5) was high at 79.2% (75.5–82.4%). Specificity and PPV increased with every increase in SBQ score. These results were on par with other translations of the SBQ [7, 24]. High sensitivity and PPV are essential for screening tools, but NPV is arguably even more important for risk stratification. A STOP-Bang score of 2 had an NPV of 80.0% (46.6–94.8%) for all OSA (AHI ≥ 5) and 100.0% for moderate/severe (AHI ≥ 15) and severe OSA (AHI ≥ 30). Although our NPV might be inflated by an underestimation of the AHI resulting from the high proportion of sleep studies conducted with PG, the results are similar to those obtained by Portuguese researchers [7].
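Predictive values depend on prevalence as well as on sensitivity and specificity. As a consistency check, the sketch below recomputes the PPV at an SBQ score of 3 from the reported sensitivity (92.1%), specificity (44.4%) and prevalence (165/237); it gives about 79%, in line with the reported 79.2% (the small difference reflects rounding of the inputs). It also uses an arbitrary prevalence of 30%, chosen only for contrast and not taken from the study, to show how the PPV falls in a less pre-selected population, which bears on the referral-population limitation noted in this section. The function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Reported values at an SBQ score of 3 for AHI >= 5 (rounded inputs).
clinic_ppv, _ = predictive_values(0.921, 0.444, 165 / 237)
# Arbitrary lower prevalence, for contrast only (not from the study).
low_prev_ppv, _ = predictive_values(0.921, 0.444, 0.30)
print(f"PPV at clinic prevalence: {clinic_ppv:.3f}")
print(f"PPV at 30% prevalence:   {low_prev_ppv:.3f}")
```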

An SBQ score of 3 was chosen as the recommended cutoff, in line with other recent translations [7, 16, 17, 23].

An important limitation of the study is that the population referred to the sleep clinic had, in a sense, already been pre-screened by the referring practitioners. Our findings therefore cannot be directly extended to other settings.

In our study, we primarily used PG, which accounted for 97.9% of all recordings. PG devices do not record sleep stages, so the index is calculated over the total recording time, including periods of wakefulness; this can give a lower AHI than PSG, in which wakefulness is excluded from the calculation [25]. PG has, however, been shown to be a reliable alternative to PSG and is becoming increasingly prevalent in clinical practice [7, 26]. Ours was not the first study to use PG to validate the STOP-Bang questionnaire [7].
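The direction of this bias follows from the arithmetic of the index: the same number of respiratory events is divided by total recording time in PG but by total sleep time in PSG. The toy night below (90 events, 8 h of recording, 6 h asleep) is hypothetical and chosen only to show how a patient could fall on different sides of the AHI 15 threshold depending on the denominator; the function name is illustrative.

```python
def events_per_hour(events: int, hours: float) -> float:
    """Respiratory events divided by the time used as the denominator."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return events / hours


# Hypothetical night: 90 scored events, 8 h of recording, 6 h of actual sleep.
EVENTS, RECORDING_H, SLEEP_H = 90, 8.0, 6.0
pg_index = events_per_hour(EVENTS, RECORDING_H)   # PG: no sleep staging, recording time
psg_ahi = events_per_hour(EVENTS, SLEEP_H)        # PSG: wakefulness excluded, sleep time
print(pg_index, psg_ahi)  # 11.25 15.0
```

In this example the PG-style index (11.25) falls in the mild band, whereas the PSG-style AHI (15.0) reaches the moderate threshold.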

Conclusion

Our study has shown that the Slovene version of the SBQ is a simple, reliable and valid tool for stratifying OSA risk among Slovenes referred to a sleep clinic, with high sensitivity and moderate specificity.