Abstract
Purpose
Sleep respiratory events are scored based on the reduction of airflow measured by a thermistor or nasal pressure cannula, together with oxygen desaturation and arousal criteria for hypopneas. We investigated whether automatic scoring can be performed without using the uncomfortable oronasal sensors and developed an automatic scoring system that is compatible with level III home sleep apnea testing devices.
Methods
We developed a respiratory event detection algorithm, based on SpO2 and respiratory effort signal measured from respiratory inductance plethysmograph (uncalibrated RIPsum), that outputs the time and duration of detected events and calculates an apnea–hypopnea-index (AHI) based on total recording time. The algorithm was tested on 98 polysomnography (PSG) recordings of patients, 77 with suspected sleep apnea and 21 without. The results were compared to annotations provided by the PSG systems where PSG AHI was computed using the total sleep time. The predicted AHI was evaluated for correlation and agreement with the PSG AHI using the intra-class correlation coefficient (ICC). Severity classification was performed and evaluated using the following categories: normal (< 5), mild (5–15), moderate (15–30), and severe (\(\ge 30\)).
Results
The ICC between predicted AHI and PSG AHI scored r = 0.96 (0.95–0.97, p < 0.001). The algorithm correctly predicted the severity for 74 recordings, overestimated 16, and underestimated 8. There was no misclassification by more than one severity level.
Conclusion
Using respiratory effort and SpO2, our algorithm was able to detect respiratory events with high correlation and agreement compared to full PSG-based detection.
1 Introduction
Respiratory events are episodes during sleep where respiratory activity is disturbed. The American Academy of Sleep Medicine (AASM) scoring manual describes two respiratory events, namely apnea and hypopnea [1]. According to the scoring manual, apnea is the disruption of breathing during sleep with a reduction in airflow of at least 90% for at least 10 s while hypopnea is a reduction of airflow of at least 30%, with an additional requirement of having an associated oxygen desaturation and/or arousal [1]. The apnea–hypopnea index (AHI) is the measure used to quantify sleep apnea and hypopnea and classify its severity. AHI is defined as the number of apnea and hypopnea events per hour during sleep. Polysomnography (PSG) is a multi-sensor overnight recording of sleep and is the gold standard for sleep diagnosis. It is used to obtain the AHI. The relevant respiratory sensors are oronasal thermistor, nasal pressure cannula, abdomen and thorax respiratory inductance plethysmography (RIP) belts, finger pulse oximeter, and electroencephalography [1].
Aside from the standard PSG, home sleep apnea testing devices, such as the level III device, are currently in use. According to the AASM guideline, a level III sleep study is a smaller PSG version using fewer signals in a portable device, where a minimum of four signals are required: heart rate, oxygen saturation and two channels of respiratory movement or respiratory movement and airflow [2]. Correspondingly, we will present a novel method for detecting respiratory events that is suitable for level III systems and yields comparable results to detection performed with full PSG equipment.
In a previous study, three home sleep apnea testing devices using a reduced number of sensors (Nox T-3 (Nox Medical), ARES (SleepMed Inc.), and WatchPAT (Itamar Medical)) were compared to PSG and showed excellent agreement, with an intra-class correlation coefficient (ICC) > 0.93 [3]. SpO2 is measured by a pulse oximeter and is available in level III devices. It has been shown to be a reliable signal for detecting respiratory events: an automated detection based on SpO2 alone reported an average accuracy of 91% and an average Cohen’s kappa of 0.71. However, because of the delay between apnea onset and the SpO2 response, the start and end times of respiratory events cannot be obtained directly, so a 25-second correction was applied [4]. We wanted to investigate whether respiratory events can be detected using RIPsum and SpO2, signals that are readily available in level III sleep devices, such that the predicted AHI is comparable to PSG scoring. The two sensors were combined so that the precise time and duration of each respiratory event could also be detected. A further aim was to provide a detection system that does not depend on the nasal pressure sensor and that offers an alternative when nasal sensors fail in level III devices.
2 Materials and Methods
2.1 Subjects
A total of 98 patients, 77 with suspected sleep apnea and 21 without, were included in this study. Each patient underwent an overnight PSG recording at Advanced Sleep Research GmbH in Berlin, Germany or at Kepler University Hospital, Department of Neurology 2, in Linz, Austria. The Linz clinic used Somnoscreen Plus with Domino software (Somnomedics, Randersacker, Germany) while the Berlin clinic used the EMBLA N7000 system with RemLogic 3.4.1 software (Embla systems, Broomfield, CO, USA). Of the subjects, 65 were men and 33 were women. The mean age was 53 years (± 15.2). The mean body mass index (BMI) was 28.4 kg/m² (± 5.2). PSG data of the patients are shown in Appendix Table 1. This study protocol was approved by the ethics committee of the state of Upper Austria (B-130–17) and the Charité – Universitätsmedizin Berlin (EA1/127/16). Written and signed consent forms were obtained from the patients prior to inclusion in the study.
2.2 Respiratory Events and AHI Calculation
2.2.1 Procedure
Each PSG recording came with respiratory events annotations detected by the PSG software. Apneas and hypopneas were pooled together and referred to as “respiratory events.” The PSG AHI from the annotations was calculated by the number of respiratory events divided by the total sleep time (TST), where TST was derived from the PSG’s hypnogram. The PSG AHI served as a reference for our study.
The automatic system for detecting respiratory events proposed in this study used signals available in level III devices, the uncalibrated RIPsum and SpO2 signals. The algorithm outputs the time and duration of the detected respiratory events. The algorithm also calculates the predicted AHITRT per recording. Unlike in PSG, sleep staging is not readily available in level III devices. Therefore, the predicted AHI was calculated using the total recording time (TRT). This study compared the predicted AHITRT computed using the TRT to the PSG AHI that was computed using TST. The difference between AHIs computed using TRT and TST is affected by the distribution of respiratory events during sleep and wake periods. When predicted respiratory events occur mainly during sleep periods, the AHITRT would consequently be underestimated compared to the PSG AHI given that the TRT would be higher than the TST.
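The effect of the time base on the AHI can be illustrated with a short sketch. The function name `ahi` and the recording values below are hypothetical and chosen only to show why dividing by TRT gives a lower value than dividing by TST when events occur mainly during sleep.

```python
def ahi(n_events: int, hours: float) -> float:
    """Return the number of respiratory events per hour of the given time base."""
    if hours <= 0:
        raise ValueError("time base must be positive")
    return n_events / hours


# Hypothetical recording: 120 events, 8 h recorded, 6 h asleep.
n_events = 120
trt_hours = 8.0  # total recording time
tst_hours = 6.0  # total sleep time (needs a hypnogram)

ahi_trt = ahi(n_events, trt_hours)  # 15.0 events/h
ahi_tst = ahi(n_events, tst_hours)  # 20.0 events/h
```

With the same event count, the TRT-based value is lower because TRT is at least as long as TST.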
The severity level of each recording was determined and compared to the reference scoring. The levels of severity were as follows: mild if the AHI range was between 5 and 15, moderate when between 15 and 30 and severe when the AHI was above 30 [5]. The normal category was also considered in this study for those recordings with AHI < 5.
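The severity categories can be written as a small lookup. This is a minimal sketch; the function name `severity` is an assumption, and the boundary handling (lower bound inclusive, upper bound exclusive) follows the definitions given in the statistical analysis section.

```python
def severity(ahi: float) -> str:
    """Map an AHI value (events per hour) to a severity category."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

For example, `severity(14.9)` yields `"mild"` while `severity(15)` yields `"moderate"`.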
2.2.2 Automatic Detection of RIPsum Events
Events of reduced RIPsum amplitude (RIPsum events) were detected by first deriving a smooth upper envelope signal of the uncalibrated RIPsum. The upper envelope signal was derived by identifying all positive peaks in the RIPsum and performing a spline interpolation. The locations of the major peaks of the smooth signal were identified, and segments of the RIPsum were created between the peaks. Each segment was automatically processed for any sub-segment with an amplitude lower than a threshold. The threshold was based on the amplitude of the peak at the beginning of the segment and was determined heuristically. A sub-segment below the threshold had to be at least 10 s in duration in order to be classified as a RIPsum event.
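The envelope and threshold steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: linear interpolation stands in for the spline, every local maximum of the envelope is treated as a major peak, and the threshold fraction of 0.5 is a hypothetical placeholder for the heuristically determined value, which the text does not state. The function names are likewise assumptions.

```python
def upper_envelope(signal):
    """Upper envelope through the positive local maxima of the signal.

    Simplification: linear interpolation is used here, whereas the
    source describes a spline interpolation.
    """
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > 0 and signal[i - 1] < signal[i] >= signal[i + 1]]
    if len(peaks) < 2:
        raise ValueError("need at least two positive peaks")
    env = list(signal)
    for i in range(peaks[0]):                 # hold the first peak value at the start
        env[i] = signal[peaks[0]]
    for i in range(peaks[-1], len(signal)):   # hold the last peak value at the end
        env[i] = signal[peaks[-1]]
    for a, b in zip(peaks, peaks[1:]):        # interpolate between neighbouring peaks
        for i in range(a, b):
            t = (i - a) / (b - a)
            env[i] = signal[a] + t * (signal[b] - signal[a])
    return env


def reduced_effort_events(envelope, fs, fraction=0.5, min_duration_s=10.0):
    """Return (start_s, duration_s) of low-amplitude runs in the envelope.

    fraction is a hypothetical threshold relative to the peak that
    begins each segment; fs is the sampling rate in Hz.
    """
    major = [i for i in range(1, len(envelope) - 1)
             if envelope[i - 1] < envelope[i] >= envelope[i + 1]]
    events = []
    for a, b in zip(major, major[1:]):
        threshold = fraction * envelope[a]
        run_start = None
        for i in range(a, b + 1):
            below = i < b and envelope[i] < threshold
            if below and run_start is None:
                run_start = i
            elif not below and run_start is not None:
                duration = (i - run_start) / fs
                if duration >= min_duration_s:   # 10 s minimum from the text
                    events.append((run_start / fs, duration))
                run_start = None
    return events
```

A real signal would likely need smoothing or a peak-prominence criterion before the major peaks are selected; that step is left out of this sketch.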
2.2.3 Automatic Detection of SpO2 Desaturation Events
We defined SpO2 desaturation events as events in the SpO2 signal with a desaturation of at least 3% and a subsequent return to the pre-desaturation oxygen level. The detection was performed by locating major peaks of the SpO2 signal as shown in Fig. 1 and measuring any desaturation in the signal between the peaks. The start of such an event was set at the start of desaturation and terminated on the completion of SpO2 re-saturation before the next succeeding peak.
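The desaturation rule can be sketched in a few lines. The sketch assumes that every local maximum counts as a peak (the source selects major peaks, so a real signal would need smoothing first) and that re-saturation means returning to the value of the peak that precedes the drop. The function name and the 3-point default are illustrative; only the 3% criterion comes from the text.

```python
def desaturation_events(spo2, fs, min_drop=3.0):
    """Return (start_s, duration_s) for each desaturation of at least min_drop.

    spo2 is a list of saturation values in percent, fs the sampling rate in Hz.
    """
    peaks = [i for i in range(1, len(spo2) - 1)
             if spo2[i - 1] < spo2[i] >= spo2[i + 1]]
    events = []
    for a, b in zip(peaks, peaks[1:]):
        baseline = spo2[a]
        nadir = min(range(a, b + 1), key=lambda i: spo2[i])
        if baseline - spo2[nadir] < min_drop:
            continue
        # Start: last sample before the nadir that is still at baseline.
        start = max(i for i in range(a, nadir) if spo2[i] >= baseline)
        # End: first sample after the nadir back at baseline, up to the next peak.
        end = next((i for i in range(nadir + 1, b + 1) if spo2[i] >= baseline), None)
        if end is None:
            continue  # no return to the pre-desaturation level
        events.append((start / fs, (end - start) / fs))
    return events
```

For the hypothetical 1 Hz trace `[95, 96, 96, 95, 93, 92, 94, 96, 95]`, the drop from 96 to 92 qualifies and is reported as starting at 2 s and lasting 5 s.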
2.2.4 RIPsum Events and SpO2 Desaturation Events for Detection
The detection of respiratory events was performed by pairing RIPsum events with associated SpO2 events, as illustrated in Fig. 1. To do so, the algorithm identified pairs of events with a RIPsum event followed by an SpO2 event with a maximum delay of 60 s. When such a pair was found, the algorithm labeled the event as a respiratory event, with the starting time and duration identical to the RIPsum event. The predicted AHITRT was calculated according to the total number of respiratory events and the TRT.
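The pairing step can be illustrated as below. The text does not say from which point the 60 s delay is measured, so this sketch assumes that a desaturation must start no earlier than the RIPsum event and no later than 60 s after the RIPsum event ends. It also does not prevent one desaturation from confirming several RIPsum events. Function names and example values are hypothetical.

```python
def pair_events(rip_events, spo2_events, max_delay_s=60.0):
    """Keep each RIPsum event that is followed by an SpO2 desaturation.

    Both inputs are lists of (start_s, duration_s). The returned events
    carry the start time and duration of the RIPsum event.
    """
    respiratory = []
    for rip_start, rip_duration in rip_events:
        rip_end = rip_start + rip_duration
        for spo2_start, _ in spo2_events:
            # Assumed rule: desaturation begins after the RIPsum event starts
            # and within max_delay_s of its end.
            if spo2_start >= rip_start and spo2_start - rip_end <= max_delay_s:
                respiratory.append((rip_start, rip_duration))
                break
    return respiratory


rip = [(100.0, 20.0), (500.0, 15.0)]
spo2 = [(130.0, 30.0)]
events = pair_events(rip, spo2)

trt_hours = 8.0                           # hypothetical total recording time
predicted_ahi_trt = len(events) / trt_hours
```

Here only the first RIPsum event has a matching desaturation, so one respiratory event is counted.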
2.3 Statistical Analysis
The algorithm was evaluated by comparing the predicted AHITRT to the PSG AHI. The predicted AHITRT was computed using the TRT because it was designed for level III application while the PSG AHI was computed using the TST. An evaluation was also carried out by computing a predicted AHITST using the TST. This was to evaluate the performance of the algorithm when sleep/wake information is available. While standard level III devices do not come readily equipped with sleep/wake information, some are compatible with portable electroencephalography (EEG) for sleep staging. Spearman’s r was calculated to evaluate the correlation between the predicted AHI and PSG AHI. ICC r was computed as the metric of reliability, i.e., the degree of correlation and agreement. An ICC \(r >0.90\) indicates excellent reliability [6]. Following the guideline presented in [6], we used the two-way mixed effects ICC model with absolute agreement and single rater type. The ICC r was computed with the upper and lower bounds of the 95% confidence interval (CI). Additionally, Bland–Altman analysis was performed to evaluate agreement with the PSG AHI by the mean difference and the limits of agreement, set at ± 1.96 standard deviations of the differences, i.e., the range expected to contain 95% of the differences [7].
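The two agreement measures can be sketched with the standard library alone. The ICC below uses the usual single-measurement, absolute-agreement formula from a two-way analysis of variance, (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)), with k = 2 measurement methods. The function names, the sign convention (predicted minus reference), and the example values are assumptions for illustration and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(pred, ref):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [p - r for p, r in zip(pred, ref)]  # assumed: predicted minus reference
    md, sd = mean(diffs), stdev(diffs)
    return md, md - 1.96 * sd, md + 1.96 * sd


def icc_absolute_single(pred, ref):
    """Two-way ICC, absolute agreement, single measurement, for two methods."""
    n, k = len(pred), 2
    rows = list(zip(pred, ref))
    grand = mean([v for row in rows for v in row])
    row_means = [mean(row) for row in rows]
    col_means = [mean(pred), mean(ref)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-recording mean square
    msc = ss_cols / (k - 1)               # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))


# Hypothetical AHI values.
reference = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]            # constant offset of one point

icc_identical = icc_absolute_single(reference, reference)
icc_shifted = icc_absolute_single(shifted, reference)
```

A constant offset leaves a correlation coefficient at 1 but lowers the absolute-agreement ICC, which is why both measures are reported.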
Aside from the comparison of predicted AHI to the PSG AHI, the severity classifications were also compared. Severity categories were defined as follows [5]: mild (5 ≤ AHI < 15), moderate (15 ≤ AHI < 30), severe (AHI ≥ 30) and the normal category (AHI < 5). A confusion matrix and the accuracy were computed to compare the predicted severity to the PSG-classified severity.
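A confusion matrix with counts of over- and underestimation can be assembled as in the following sketch. The function name and the example labels are hypothetical; only the four category names come from the text.

```python
from collections import Counter

LEVELS = ["normal", "mild", "moderate", "severe"]


def compare_severity(predicted, reference):
    """Return (confusion counts, accuracy, n overestimated, n underestimated)."""
    rank = {name: i for i, name in enumerate(LEVELS)}
    matrix = Counter(zip(reference, predicted))   # key: (reference, predicted)
    correct = sum(p == r for p, r in zip(predicted, reference))
    over = sum(rank[p] > rank[r] for p, r in zip(predicted, reference))
    under = sum(rank[p] < rank[r] for p, r in zip(predicted, reference))
    return matrix, correct / len(reference), over, under


# Hypothetical labels for five recordings.
ref_labels = ["normal", "mild", "mild", "moderate", "severe"]
pred_labels = ["normal", "moderate", "mild", "mild", "severe"]

matrix, accuracy, over, under = compare_severity(pred_labels, ref_labels)
```

In this example three of five labels match, one recording is overestimated, and one is underestimated.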
3 Results
Figure 2 shows the predicted AHITRT compared to the PSG AHIs, whereas the full results are tabulated in Appendix Table 3. The PSG AHIs were calculated using the TST while the predicted AHIs were computed using TRT. A comparison between the predicted AHITRT and PSG AHI resulted in a low median absolute difference, \(\left| {\Delta {\text{AHI}}} \right|\) = 2.8, Spearman’s r = 0.96 (p < 0.001) and ICC r = 0.96 (0.95–0.97, p < 0.001). The ICC indicates excellent reliability, suggesting high correlation and agreement. Furthermore, 70 recordings out of 98 had an \(\left| {\Delta {\text{AHI}}} \right|\) ≤ 5. The Bland–Altman plot in Fig. 3 shows a mean difference of 0.6, with all but two recordings within or at the border of the limits of agreement. The only significant outlier in Fig. 2 is recording no. 19 (Appendix Table 3), which had a total sleep efficiency of 67%.
The algorithm was also evaluated using TST, to test performance of the algorithm when sleep information is available, with the predicted AHITST shown in Fig. 4. The predicted AHITST compared to PSG AHI resulted in Spearman’s r = 0.97 (p < 0.001), ICC r = 0.97 (0.96 – 0.98, p < 0.0001), and median \(\left| {\Delta {\text{AHI}}} \right|\)= 2.6. The Bland–Altman plot in Fig. 5 shows a mean difference of − 1.1.
Table 1 shows the confusion matrix for severity classification. The classification was based on the predicted AHITRT. The algorithm classified the correct severity for 75.5% (n = 74) of the recordings. 16 of the recordings were classified as a higher severity by the algorithm and the remaining 8 were classified as a lower severity. The average \(\left| {\Delta {\text{AHI}}} \right|\) between the underestimated recordings was 4.9, while for the overestimation, it was 5.0. The average absolute difference of predicted AHI from the correct classification cutoff (e.g. point difference of predicted AHI from 5 or 15 for mild category) for the underestimated recordings was 3.0, and 3.2 for the overestimation. No misclassification by more than one severity level occurred in any of the recordings. The severity classification based on predicted AHITST shown in Table 2 performed better with 80.6% (n = 79) accuracy, with 11 overestimated recordings and 8 underestimated recordings.
4 Discussion
In this study, we developed an algorithm using only RIPsum and SpO2 to detect respiratory events, intended for use with level III home sleep apnea testing devices. We tested our algorithm on 98 patients, 77 with suspected sleep apnea and 21 without.
The predicted AHITRT performed well, with an ICC of r = 0.96 and Spearman’s r = 0.96 when compared to PSG AHI. The results suggest a high level of agreement and correlation with full PSG-based AHIs. The median \(\left| {\Delta {\text{AHI}}} \right|\) was only 2.8, and 70 of the recordings had an \(\left| {\Delta {\text{AHI}}} \right|\) ≤ 5. The outlier shown in Fig. 3 was underestimated by 26.6 AHI points. This was caused by the predicted events occurring predominantly during sleep: because the TRT is longer than the TST, dividing the same number of events by the TRT lowers the computed AHITRT.
Level III devices do not come readily equipped with sleep staging. Therefore, the predicted AHI was computed using the TRT. Nevertheless, sleep staging can be added when portable EEG devices are integrated into level III devices. To test whether our algorithm can perform reliably when sleep/wake information is added, we computed the predicted AHI using TST from the hypnogram. The predicted AHITST scored an ICC of r = 0.97. The algorithm improved when given the sleep and wake information. Nevertheless, even without the TST, the algorithm performed with excellent reliability.
The severity classification based on the predicted AHITRT scored 75.5% accuracy, with 16 of the recordings overestimated. If overestimated severity is considered acceptable, then 92% (n = 90) of the recordings were given a safe classification. This assumes that it is safer to overestimate the severity than to underestimate it, i.e., it is better to classify a patient as moderate when the true category is mild than to classify a patient as normal when the true category is mild. For the eight underestimated recordings, the predicted AHITRT was on average only 3.0 points away from the cutoff of its correct severity level, and only two of the underestimated recordings were misclassified as normal instead of mild. For comparison, we also performed the severity classification using the predicted AHITST. Using the total sleep time, the accuracy of the predicted severity increased to 80.6%, as expected.
The results of our algorithm are comparable to studies on detection algorithms with fewer or novel sensors. Three different home sleep apnea testing devices scored ICC r = 0.93–0.97 compared to PSG scoring, showing high reliability [3]. The WatchPAT validation study for AHI estimation reported a Spearman’s r of 0.802 between the device’s rapid eye movement sleep (REM) and non-REM-based AHI versus PSG AHI scores [8]. Another study reported a predictive model for apneas and hypopneas using SleepView, a portable two-channel diagnostic device for sleep-related diseases: using a nasal pressure cannula transducer and pulse oximetry sensor, a correlation of \(r^{2} = 0.84, p < 0.01\) was reported between AHI calculated using TRT by the SleepView software and PSG AHI based on TST in 93 subjects [9]. Our algorithm performed on par with other apnea prediction systems without airflow sensors. Using a microphone placed one meter above the bed to detect snoring and estimate AHI, a correlation coefficient of \(r^{2} = 0.81\) was achieved compared to AHIs scored according to the AASM scoring criteria [10]. Using tracheal sound signal and pulse oximetry, a linear correlation score of 0.96 was reported between the estimated AHI and manually scored PSG AHI [11]. A recent study using tracheal sounds to identify apneas reported 92.8% sensitivity and 99.7% specificity [12]. Using only the thoracic respiratory effort, a comparison based on sleep and wake periods between estimated AHI and scored AHI resulted in a correlation coefficient of \(r^{2} = 0.73\) for the training set and \(r^{2} = 0.55\) for the validation set [13]. A study estimating AHI using only SpO2 reported a Cohen’s kappa of 0.71 and an accuracy of 91% [4].
We calculated both Spearman’s r and the ICC between the AHI values to evaluate not only correlation but also agreement. This shows that the predicted AHIs are not only positively correlated with the reference AHIs but also not greatly misestimated, which is needed to avoid misclassification of severity. We take note of the usage of SpO2 alone to estimate the AHI. However, we make the case for using RIPsum because it provides the possibility of classifying events as obstructive, central, or mixed, which will be of interest for future work. Furthermore, with RIPsum, the precise location and duration of the respiratory events can be determined. One limitation of using respiratory effort instead of nasal sensors is that the distinction between apnea and hypopnea cannot be made. However, the aim of this algorithm is to provide an alternative system that is not dependent on the nasal pressure sensor, so that the patient’s comfort can be increased and an alternative is available in the event that nasal sensors in level III devices fail.
5 Conclusion
Our results showed that our method using RIPsum and SpO2 has excellent agreement and correlation with PSG scoring. However, it must be noted that the difference in total recording time and total sleep time can affect the estimated AHI. Nevertheless, the algorithm can detect respiratory events without using airflow sensors, ensuring more comfortable sleep for patients. Another advantage is that the sensors needed for our algorithm are available in and compliant with level III sleep studies.
References
Berry, R. B., Albertario, C. L., & Hardin, S. M., for the American Academy of Sleep Medicine. (2018). The AASM manual for the scoring of sleep and associated events: Rules, terminology, and technical specifications (Version 2.5). Darien, IL: American Academy of Sleep Medicine.
Kapur, V. K., Auckley, D. H., Chowdhuri, S., et al. (2017). Clinical practice guideline for diagnostic testing for adult obstructive sleep apnea: An American academy of sleep medicine clinical practice guideline. Journal of Clinical Sleep Medicine, 13, 479–504. https://doi.org/10.5664/jcsm.6506.
Hong, D., Arguelles, J., Montoya, E., et al. (2017). 0492 comparing three home sleep apnea testing devices to polysomnography: Simultaneous and multi-night assessments. Sleep, 40, A183–A184. https://doi.org/10.1093/sleepj/zsx050.491.
Jung, D. W., Hwang, S. H., Cho, J. G., et al. (2018). Real-time automatic apneic event detection using nocturnal pulse oximetry. IEEE Transactions on Biomedical Engineering, 65, 706–712. https://doi.org/10.1109/TBME.2017.2715405.
American Academy of Sleep Medicine Task Force. (1999). Sleep-related breathing disorders in adults: recommendations for syndrome definition and measurement techniques in clinical research. Sleep, 22, 667–689. https://doi.org/10.1093/sleep/22.5.667.
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15, 155–163. https://doi.org/10.1016/j.jcm.2016.02.012.
Giavarina, D. (2015). Understanding Bland Altman analysis. Biochemical Medicine, 25, 141–151. https://doi.org/10.11613/BM.2015.015.
Körkuyu, E., Düzlü, M., Karamert, R., et al. (2015). The efficacy of watch PAT in obstructive sleep apnea syndrome diagnosis. European Archives of Oto-Rhino-Laryngology, 272, 111–116. https://doi.org/10.1007/s00405-014-3097-0.
Zou, J., Meng, L., Liu, Y., et al. (2015). Evaluation of a 2-channel portable device and a predictive model to screen for obstructive sleep apnea in a laboratory environment. Respiratory Care, 60, 356–362. https://doi.org/10.4187/respcare.03626.
Ben-Israel, N., Tarasiuk, A., & Zigel, Y. (2012). Obstructive apnea hypopnea index estimation by analysis of nocturnal snoring signals in adults. Sleep, 35, 1299–1305. https://doi.org/10.5665/sleep.2092.
Yadollahi, A., Giannouli, E., & Moussavi, Z. (2010). Sleep apnea monitoring and diagnosis based on pulse oximetery and tracheal sound signals. Medical & Biological Engineering & Computing, 48, 1087–1097. https://doi.org/10.1007/s11517-010-0674-2.
Kalkbrenner, C., Eichenlaub, M., Rüdiger, S., et al. (2018). Apnea and heart rate detection from tracheal body sounds for the diagnosis of sleep-related breathing disorders. Medical & Biological Engineering & Computing, 56, 671–681. https://doi.org/10.1007/s11517-017-1706-y.
Bianchi, M. T., Lipoma, T., Darling, C., et al. (2014). Automated sleep apnea quantification based on respiratory movement. International Journal of Medical Sciences, 11, 796–802. https://doi.org/10.7150/ijms.9303.
Funding
This study was supported by the Austrian Research Promotion Agency (FFG), project ID 859622. Thomas Penzel was partially supported by Russian Federation Government Grant No. 075–15-2019–1885.
Author information
Contributions
CC developed the algorithm and performed the computations. CC, CW, and HG designed the study and wrote the paper. CW and BK co-developed the algorithm. MW, MM, MG, and TP carried out PSG recordings. EK provided technical and scientific consultations. SS, MB, GK, and ASK provided sleep studies and scoring consultations. All authors reviewed and approved the manuscript for submission.
Ethics declarations
Conflicts of interest
All authors declare no conflicts of interest.
Ethical Approval
This study protocol was approved by the ethics committee of the state of Upper Austria (B-130–17) and the Charité—Universitätsmedizin Berlin (EA1/127/16).
Informed Consent
Written and signed consent forms were obtained from the patients prior to inclusion in the study.
Research Involving Animal Rights
This article does not contain any studies with animals performed by any of the authors.
Appendix
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Coronel, C., Wiesmeyr, C., Garn, H. et al. Detection of Respiratory Events by Respiratory Effort and Oxygen Desaturation. J. Med. Biol. Eng. 40, 517–525 (2020). https://doi.org/10.1007/s40846-020-00524-9
DOI: https://doi.org/10.1007/s40846-020-00524-9
Keywords
- Obstructive sleep apnea
- Peripheral oxygen saturation
- Automatic scoring
- Home sleep apnea testing
- Portable monitoring