Introduction

Pulse oximetry (PO) is a simple and inexpensive method to diagnose obstructive sleep apnea (OSA). It has been validated against sleep laboratory polysomnography (PSG), especially in adult populations with a moderate-to-high pretest probability of OSA [1,2,3]. Its sensitivity and specificity vary considerably depending on the technical features of the oximeters used, the averaging time and/or sampling frequency of oxygen saturation (SpO2), differences between populations, SpO2 analysis algorithms, and the different definitions used for hypopnea and OSA [4,5,6]. The main disadvantage of PO lies in its inability to detect low oxygen desaturation (OD) in borderline ranges, and therefore, it is not sensitive in diagnosing mild, positional, or REM-dependent OSA [5,7,8,9,10].

The clinical validation of PO for outpatient OSA management in adult patients has not been sufficiently studied. Two randomized clinical trials [11, 12] studied the use of PO to diagnose and initiate continuous positive airway pressure (CPAP) treatment in patients with suspected OSA. Oximetry results were considered to be compatible with OSA when the oxygen desaturation index for ≥ 3% or 4% desaturation (ODI3, ODI4) was > 27 or > 14, respectively. Authors concluded that, in patients with a high probability of OSA, a positive oximetry would confirm OSA diagnosis in the most severe cases and allow an early indication of CPAP treatment with a similar performance to a PSG-based strategy, in terms of clinical improvement, CPAP titration, and residual AHI (apnea–hypopnea index).

A simulated observational study evaluated the accuracy of oximetry and clinical data compared to PSG in the indication of CPAP in patients with suspected OSA. Eight independent observers based their indication for CPAP in a blind assessment of the information available online for 411 patients with suspected OSA. Seven observers used clinical data and ODI3, while the eighth observer also had access to PSG results. The combination of patients’ clinical history and PO allowed physicians to reliably indicate CPAP (specificity 98%) in 60% of symptomatic patients (Epworth’s score > 10 or at least one comorbidity) [13].

The American Academy of Sleep Medicine (AASM) Guidelines and the recommendations recently published in the international consensus document on OSA do not include PO in the initial management of subjects with suspected OSA [14, 15]. However, other medical associations include PO among their diagnostic approach strategies and describe it as a valuable resource to decide whether or not to start CPAP [16,17,18,19] though they have not established any criteria on how to use PO in routine medical care.

The objectives of this study were to assess the cut-off values for ODI3 to confirm OSA diagnosis in subjects undergoing PSG in the sleep laboratory and home-based respiratory polygraphy (RP) and propose a management algorithm based on PO in patients with moderate-to-high pretest probability of OSA according to the STOP-BANG (SBQ) and Berlin (BQ) questionnaires.

Material and methods

Study design

This observational, cross-sectional, retrospective study to validate the use of PO as a diagnostic test was carried out in two tertiary care community hospitals in the city of Buenos Aires (HA, HB). The protocol was approved by an independent ethics committee (HB institutional ethics committee, approval number: PRIISA BA 6999).

Sampling and patient selection

Subjects 17 years of age or older who underwent PSG at a sleep laboratory between January 2015 and January 2018 or home-based RP between 2011 and 2020 were selected from the databases of both sleep units. All subjects selected in this study were referred due to suspected OSA.

PSGs with significant artifacts in the neurological, respiratory, or oximetric signals or less than 180 min of total recording time (TRT), and patients on CPAP or supplemental oxygen therapy were excluded, as well as RPs with a valid TRT of less than 4 h or artifacts in > 10% of the PO. Figure 1 shows a flowchart on patient selection.

Fig. 1

Flowchart of patient selection and study design (PSG = polysomnography; TST = total sleep time; RP = respiratory polygraphy; TVRT = total valid recording time; PO = pulse oximetry)

Measurement

Polysomnography

PSGs were conducted at HA’s sleep laboratory and included F4/C4/O2 or 16 EEG channels (FP1/2, F3/4, F7/8, T3/4, T5/6, C3/4, P3/4, O1/2, CZ) depending on the test indication, EOG (E1/2), chin EMG, leg EMG, ECG, air flow by nasal pressure and oral thermistor, piezoelectric belts to measure chest/abdominal movement, SpO2, and body position. PO was recorded with a Nonin OEM III module with fingertip pulse oximeter. SpO2 was averaged over 4 or 8 beats depending on the heart rate. SpO2 sampling and storage frequency was 1 Hz. Nail polish was removed from patients’ nails to prevent interference with SpO2 readings. PSG results were read manually by experienced physicians following international guidelines on sleep stages [20], identification of arousal [21], and respiratory events (AASM 2012) [22]. SpO2 data from PSG were stored in a separate file and subjected to automatic analysis. ODI3 was calculated with Akonic™ EEG V14.9 software.

Respiratory polygraphy

Home-based RPs were indicated and interpreted by HB. Patients used ApneaLink™ (AL) respiratory polygraphy devices (ResMed™ Australia) [23]. These level III devices measure airflow and snoring through nasal pressure, respiratory effort through a thorax/abdominal belt and include fingertip pulse oximetry. AL oximeters are similar to the PSG model.

Automatic signal analysis (with ApneaLink™ 10.2) was followed by a manual correction of respiratory events by trained physicians. Apnea was defined as a ≥ 90% decrease in airflow (AF) for ≥ 10 s and hypopnea as a ≥ 50% reduction in the AF signal or an identifiable drop in AF associated with a ≥ 3% oxygen desaturation, for at least 10 s in both cases [24]. ODI3 was calculated automatically.

AHI was defined as the number of respiratory events (apnea + hypopnea) per hour of sleep (AHIPSG) or per hour of valid total recording time (AHIRP), expressed in events/hour (ev/h). OSA was defined as an AHI of ≥ 5 or ≥ 15 ev/h. OSA was classified as mild (AHI from 5 to 14.9 ev/h), moderate (AHI from 15 to 29.9 ev/h), or severe (AHI ≥ 30 ev/h).
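The definitions above can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from the study software; the denominator is total sleep time for PSG or valid total recording time for RP.

```python
def ahi(n_apneas: int, n_hypopneas: int, hours: float) -> float:
    """Apnea-hypopnea index in events per hour.

    `hours` is total sleep time (PSG) or valid total recording time (RP).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (n_apneas + n_hypopneas) / hours


def osa_severity(ahi_value: float) -> str:
    """Severity class using the thresholds stated in the text (5, 15, 30 ev/h)."""
    if ahi_value < 5:
        return "no OSA"
    if ahi_value < 15:
        return "mild"
    if ahi_value < 30:
        return "moderate"
    return "severe"


# Illustrative values only: 40 apneas and 80 hypopneas over 6 h of sleep.
example_ahi = ahi(n_apneas=40, n_hypopneas=80, hours=6.0)
example_class = osa_severity(example_ahi)
```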

Questionnaires

All patients undergoing RP completed the Spanish version of the STOP-BANG (SBQ) and Berlin (BQ) questionnaires and the Epworth sleepiness scale (ESS) [25,26,27,28,29,30]. Excessive daytime sleepiness was defined as ESS > 10 points [30]. Depending on the number of positive BQ categories, pretest probability was defined as high (≥ 2 categories) or low (only 1 category).

Study procedure and rationale

ODI3 was the PO parameter used to classify a subject as healthy (no OSA = AHI < 5 or < 15 ev/h) or unhealthy (OSA = AHI ≥ 5 or ≥ 15 ev/h). ODI3 shows the best correlation with AHI, making it a valid alternative [31, 32].

HA’s database, referred to as the experimental group (Exp-G), was used to determine ODI3 cut-off values with 100% Sp for both OSA definitions. In a second stage, the Exp-G ODI3 cut-off values (those without false positive results) were applied to HB’s database, referred to as the validation group (Val-G), to determine whether their diagnostic performance was similar. Based on these results, a PO-based strategy was proposed for OSA diagnosis and the indication for CPAP treatment.
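The two-stage procedure can be sketched as follows. This is only an illustration under stated assumptions: a test is taken as positive when ODI3 is strictly greater than the cut-off, the lowest candidate cut-off with no false positives is selected, and all data and function names are invented for the example rather than drawn from the study.

```python
def specificity(odi3, has_osa, cutoff):
    """Fraction of non-OSA subjects who test negative (ODI3 <= cutoff)."""
    negatives = [o for o, d in zip(odi3, has_osa) if not d]
    if not negatives:
        raise ValueError("no subjects without OSA")
    return sum(1 for o in negatives if o <= cutoff) / len(negatives)


def sensitivity(odi3, has_osa, cutoff):
    """Fraction of OSA subjects who test positive (ODI3 > cutoff)."""
    positives = [o for o, d in zip(odi3, has_osa) if d]
    if not positives:
        raise ValueError("no subjects with OSA")
    return sum(1 for o in positives if o > cutoff) / len(positives)


def lowest_cutoff_without_false_positives(odi3, has_osa, candidates):
    """Smallest candidate cut-off reaching 100% specificity, or None."""
    for c in sorted(candidates):
        if specificity(odi3, has_osa, c) == 1.0:
            return c
    return None


# Stage 1: derive the cut-off in a toy "experimental" sample.
exp_odi3 = [2, 4, 8, 11, 6, 13, 20, 35]
exp_osa = [False, False, False, False, True, True, True, True]
cutoff = lowest_cutoff_without_false_positives(exp_odi3, exp_osa, range(0, 31))

# Stage 2: apply the same cut-off, unchanged, to a toy "validation" sample.
val_odi3 = [3, 12, 9, 15, 30, 10]
val_osa = [False, True, False, True, True, True]
val_sens = sensitivity(val_odi3, val_osa, cutoff)
val_spec = specificity(val_odi3, val_osa, cutoff)
```

Fixing the cut-off before it is applied to the second sample is what makes the second stage a validation rather than a refit.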

Statistical analysis

The distribution of parameters was assessed using a histogram of frequencies and a Kolmogorov–Smirnov test. Values were expressed either as mean and standard deviation or median and interquartile range. Comparisons between independent sample variables were made using parametric or non-parametric tests depending on the distribution of their values.

Values of sensitivity, specificity, and positive and negative likelihood ratio for multiple cut-off values of ODI3 in the Exp-G were estimated. Then, ODI3 values with a 100% specificity in the Exp-G were applied to Val-G to assess their diagnostic performance. Areas under the ROC curve (ROC-AUCs) for the ODI3 cut-offs that had a 100% specificity in Exp-G and Val-G were calculated. The degree of agreement between AHIPSG, AHIRP, and ODI3 was evaluated by intraclass correlation coefficient (ICC) and the Bland–Altman plot.
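As a reminder of how these quantities relate, the following sketch computes them from the four cells of a 2 × 2 table. The counts are hypothetical and the function name is ours; the study itself used MedCalc. Note that the positive likelihood ratio is unbounded when specificity is 100%, which is the situation targeted by the cut-offs selected here.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # LR+ = sens / (1 - spec); infinite when there are no false positives.
    lr_pos = math.inf if fp == 0 else sens / (1.0 - spec)
    # LR- = (1 - sens) / spec; infinite when there are no true negatives.
    lr_neg = math.inf if tn == 0 else (1.0 - sens) / spec
    return {"sens": sens, "spec": spec, "lr_pos": lr_pos, "lr_neg": lr_neg}


# Hypothetical counts: a cut-off with no false positives.
no_fp = diagnostic_metrics(tp=70, fp=0, fn=30, tn=100)

# Hypothetical counts: a lower cut-off trading specificity for sensitivity.
some_fp = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
```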

Statistical calculations were made using MedCalc Statistical Software (version 20.027) (MedCalc Software Ltd, Ostend, Belgium; https://www.medcalc.org; 2021).

Results

Out of all the patients initially assessed, 1141 were included in Exp-G (PSG) and 5457 in Val-G (RP). Exclusion criteria are detailed in Fig. 1.

Val-G had a lower proportion of female patients (36.8%) and a higher prevalence of obesity (55.9%), systemic hypertension (52.4%), and OSA (AHI ≥ 5 ev/h, 82.9%) compared to Exp-G (45.0%, 29.8%, 30.3%, and 67.5%, respectively; p < 0.001). Table 1 shows the characteristics of the study population.

Table 1 Characteristics of the study population

Table 2 shows PO sensitivity (S) and specificity (Sp) for OSA diagnosis in Exp-G and Val-G. S and Sp of the different cut-offs of ODI3 and OSA definitions in Val-G (RP) were similar to the ones observed in Exp-G (PSG).

Table 2 Sensitivity and specificity of pulse oximetry for diagnosis of obstructive sleep apnea

In Exp-G, an ODI3 > 12 had an S of 69.5% (CI95% 66.1–72.7) and an Sp of 100% (CI95% 99–100) for OSA (AHI ≥ 5), while an ODI3 ≥ 26 had an S of 53.8% (CI95% 49.3–58.2) and an Sp of 100% (CI95% 99.4–100) for moderate-to-severe OSA (AHI ≥ 15). Figure 2 shows the areas under the receiver operating characteristic curves (AUC-ROCs) for the different oximetric criteria (ODI3 > 12 and ≥ 26) and definitions of OSA (AHI ≥ 5 and ≥ 15). ODI3 had similar diagnostic performance on both PSG and RP (ODI3PSG vs. ODI3RP: AHI ≥ 5: 0.847 vs. 0.845; AHI ≥ 15: 0.768 vs. 0.779, p NS).

Fig. 2

Bland–Altman plot of apnea–hypopnea index (AHI) and oxygen desaturation index ≥ 3% (ODI3) from polysomnography (PSG) (A) and respiratory polygraphy (RP) (B)

Bland–Altman and identity plots in Fig. 3 compare manually edited AHI from PSG and RP with ODI3 from in-laboratory and home sleep apnea testing. The mean difference (95% confidence interval [CI]) between ODI3PSG and AHIPSG was − 0.1 (− 0.43 to 0.23, p 0.9), with limits of agreement ranging from − 11 to 11 events/h. On the other hand, their counterpart (ODI3RP − AHIRP) showed a mean difference (95% CI) of 0.6 (0.4 to 0.67, p 0.1) with narrower limits of agreement of − 8 to 9.3 events/h (p < 0.01).
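For readers less familiar with this analysis, a minimal sketch of the Bland–Altman computation follows. It assumes the conventional limits of agreement (mean difference ± 1.96 standard deviations of the paired differences), which the text does not state explicitly; the paired values are invented and the function name is ours.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (a - b); limits are bias -/+ 1.96 * SD of the
    differences (sample standard deviation), the usual convention.
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Invented paired values (events/h): ODI3 vs. manually edited AHI.
odi3_values = [10, 22, 31, 5]
ahi_values = [12, 20, 30, 6]
bias, lower, upper = bland_altman(odi3_values, ahi_values)
```

A bias near zero with wide limits, as in this toy example, means the two indices agree on average but can differ substantially in an individual patient.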

Fig. 3

Areas under the ROC curves of ODI3 from polysomnography (PSG) (A, B) and respiratory polygraphy (RP) (C, D) with 100% specificity

For AHIPSG < 15 and AHIPSG ≥ 15, the mean difference (95% CI) between ODI3PSG and AHIPSG was 0.89 (0.64 to 1.15) and − 1.4 (− 2 to − 0.7), respectively. In contrast, for AHIRP < 15 and AHIRP ≥ 15, the mean difference (95% CI) between ODI3RP and AHIRP was 0.82 (0.73 to 0.91) and 0.3 (0.03 to 0.5), respectively. The intraclass correlation coefficients (ICCs) between ODI3 and AHI of both PSG and RP are presented in Table 3. The concordance between ODI3 and AHI was lower in subjects with AHI less than 15 (ICCs: 0.72–0.83) compared to those with AHI ≥ 15 (ICCs: 0.92–0.94), p < 0.001.

Table 3 Intraclass correlation coefficients between AHI and ODI3

Table 4 shows the sensitivity of ODI3 according to SBQ and BQ scores in Val-G. SBQ performed better than BQ (AHI ≥ 5 or ≥ 15: 67.8 ± 13% and 53 ± 10.8% vs. 61.2 ± 14% and 49.6 ± 10%, p < 0.01).

Table 4 Sensitivity and specificity of pulse oximetry in validation group according to STOP-BANG questionnaire (Val-G)

There were no significant differences in ROC-AUCs in the following combinations: SBQ < 3 and BQ ≤ 1 (AHI ≥ 5: 0.747 vs. 0.756, p 0.72; AHI ≥ 15: 0.686 vs. 0.713, p 0.46), SBQ ≥ 2 and SBQ ≥ 3 (AHI ≥ 5: 0.85 vs. 0.86, p 0.18; AHI ≥ 15: 0.781 vs. 0.786, p 0.6), SBQ ≥ 2 and BQ ≥ 2 (AHI ≥ 5: 0.85 vs. 0.856, p 0.42; AHI ≥ 15: 0.781 vs. 0.783, p 0.83), and SBQ ≥ 3 and BQ ≥ 2 (AHI ≥ 5: 0.86 vs. 0.856, p 0.6; AHI ≥ 15: 0.786 vs. 0.783, p 0.76).

A high probability for OSA by BQ (≥ 2 categories) had a lower diagnostic performance than SBQ ≥ 5 points in any combination (AHI ≥ 5: 0.856 vs. 0.899, p < 0.001; AHI ≥ 15: 0.783 vs. 0.807, p 0.026) and SBQ ≥ 5 performed better than SBQ ≥ 2 (AHI ≥ 5: 0.899 vs. 0.85, p < 0.001; AHI ≥ 15: 0.807 vs. 0.781, p 0.016).

In the light of these observations, we propose an early management algorithm through PO in patients with suspected OSA based on ODI3 cut-off values without false positives (100% Sp) (see Fig. 4).

Fig. 4

Obstructive sleep apnea diagnostic algorithm based on initial pulse oximetry

Discussion

The most relevant findings of this study are as follows: (1) an ODI3 > 12 (OSA = AHI ≥ 5) or ≥ 26 (OSA = AHI ≥ 15) allowed for an accurate diagnosis of OSA (without false positives) in a significant percentage of evaluated patients with pretest probability of moderate-to-severe OSA according to laboratory-based PSG, and (2) ODI3 cut-off values observed with PSG were reproducible, since similar results were obtained in a second sample of patients with suspected OSA assessed by home-based RP.

Our results show that this strategy would allow a reliable diagnosis in 60% of the subjects that visit a sleep unit.

Thus, an SBQ ≥ 3 (moderate-to-high pretest probability) or BQ ≥ 2 categories (high pretest probability) with an ODI3 > 12 would confirm OSA (AHI ≥ 5) in ≈ 70% of the cases, without the need for additional tests, and an ODI3 ≥ 26 (AHI ≥ 15) would lead to a reliable indication of CPAP (without false positives) in 55% of patients. This strategy offers a stepwise approach to triage health expenditure, as units with the three resources would perform more POs than RPs or PSGs, shortening waiting lists and prioritizing cases that raise doubt.
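The decision rule described in this paragraph can be written as a short sketch. It encodes only the thresholds stated in the text (SBQ ≥ 3 or BQ ≥ 2 categories; ODI3 > 12; ODI3 ≥ 26) and is not a transcription of Fig. 4; the function name, the return labels, and the handling of patients below the pretest thresholds are assumptions made for illustration.

```python
def triage_by_oximetry(sbq_score: int, bq_positive_categories: int, odi3: float) -> str:
    """Illustrative triage using the ODI3 cut-offs reported in this study."""
    moderate_to_high_pretest = sbq_score >= 3 or bq_positive_categories >= 2
    if not moderate_to_high_pretest:
        # Assumption: the proposal targets moderate-to-high pretest probability only.
        return "outside scope"
    if odi3 >= 26:
        # Cut-off without false positives for AHI >= 15 in Exp-G.
        return "AHI >= 15 confirmed"
    if odi3 > 12:
        # Cut-off without false positives for AHI >= 5 in Exp-G.
        return "AHI >= 5 confirmed"
    # A negative oximetry does not rule out OSA: sensitivity is limited.
    return "further testing"


# Invented patients.
patient_a = triage_by_oximetry(sbq_score=5, bq_positive_categories=1, odi3=30)
patient_b = triage_by_oximetry(sbq_score=3, bq_positive_categories=0, odi3=13)
patient_c = triage_by_oximetry(sbq_score=1, bq_positive_categories=2, odi3=12)
```

Because the cut-offs were chosen for specificity rather than sensitivity, the last branch refers the patient to RP or PSG instead of excluding OSA.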

Unlike the current study, Fabius et al. [33] developed a strategy to rule out OSA using PO. They studied 140 patients using PSG and showed that ODI3 < 5 was associated with a S/Sp of 99/65% and an AUC-ROC of 0.75 with the high negative predictive value (95%) as their greatest strength.

Recently, Sánchez-Quiroga et al. [34] showed that OSA diagnosis and management in primary care was not inferior to OSA diagnosis and management by a specialized sleep unit. Other authors have described similar findings [35] opening the door to simplified protocols to diagnose and treat OSA, which is estimated to affect 1 billion people worldwide [36].

PO has limitations inherent to the type of data it offers. For example, PO does not differentiate between central and obstructive events, and it is not possible to identify whether apneas/hypopneas predominate in a specific stage of sleep or body position, which could call for a different therapeutic approach. However, it is clear that the information provided by PO (ODI3), if interpreted in the right clinical context (SBQ, BQ, Epworth sleepiness scale, comorbidities), would facilitate decision making and rapid access to an effective treatment like CPAP, especially for patients at risk or when other diagnostic alternatives are not readily available. In contrast, Chai-Coetzer et al., in a multicenter, randomized, non-inferiority study conducted on 406 POs obtained from PSG to guide OSA treatment indications, concluded that researchers’ level of trust in the diagnosis of OSA was lower when it was based on PO than when it was based on RP or PSG (PO: 50%, RP: 72%, PSG: 83%, p 0.003), and that therapeutic decisions based on ODI4 resulted in lower adherence to CPAP (one hour less) and a smaller effect on symptom resolution [35].

Decisions are not solely based on the sleep test metrics; rather, it is necessary to consider pretest probability, symptoms, and associated diseases that represent a future risk. Many real-life CPAP indications based on patient interviews and clinical probability have received special attention for their high Sp [13, 37, 38].

Recently, the COVID-19 pandemic has limited access to PSG, resulting in unacceptable delays in the diagnosis of OSA. During the lockdown and other COVID-19-related restrictions, the sleep medicine community has used more simplified home-based diagnostic methods than ever before [39, 40], which represents an opportunity to promote a rational use of resources and identify patients with OSA [41]. In addition, the new generation of disposable oximeters with remote data transmission (single use at a low cost) and improved algorithms to increase the sensitivity of automatic classification may open a new and interesting field of research with clinical potential from the perspective of risk (hypoxemic burden) [42].

Our study has strengths, including the systematic assessment of a large number of patients who visited two sleep units to undergo a diagnostic sleep test (real-life setting) with oximetry criteria obtained from standard PSG. Also, ODI3 cut-off values observed in the laboratory-based PSG group had a similar diagnostic performance compared to the oximetry obtained from home-based self-administered RP. However, there are several limitations to consider. First, this study has the limitations inherent to retrospective studies, since we do not have all the clinical data of the patients and the two study samples are not identical. Second, ODI3 cut-offs for subject classification only apply to the oximeter models and automatic analysis algorithms used in this study and, therefore, cannot be extrapolated to other devices. Third, our population’s respiratory function was not evaluated, which may have led to the exclusion of patients with chronic pulmonary or heart diseases where PO may behave differently. Fourth, the hypopnea criteria used may represent a bias in the interpretation of results. AHI, however, proved to be equivalent regardless of the hypopnea criteria used in its definition (AASM 2012, Chicago) [43]. Fifth, we evaluated a White population, so the diagnostic performance of oximetry cannot be extrapolated to subjects with dark skin pigmentation [44].

Conclusions

Our proposal for the initial use of PO in subjects with a moderate-to-high pretest probability of OSA is based on the results of patients who underwent laboratory-based PSG, which were validated in a second sample of patients assessed with a home-based approach.

This algorithm considers the use of PO as well as RP and PSG for more challenging cases or in case of doubt. We believe this strategy may result in a more rational use of resources, shortening long waiting lists and allowing non-experts to work in coordination with specialized units. Prospective randomized studies are necessary to draw stronger conclusions that allow for the generalization of our findings.