Detection of Respiratory Events by Respiratory Effort and Oxygen Desaturation

Purpose Sleep respiratory events are scored based on the reduction of airflow measured by a thermistor or nasal pressure cannula, together with oxygen desaturation and arousal criteria for hypopneas. We investigated whether automatic scoring can be performed without using the uncomfortable oronasal sensors and developed an automatic scoring system that is compatible with level III home sleep apnea testing devices. Methods We developed a respiratory event detection algorithm, based on SpO2 and the respiratory effort signal measured from a respiratory inductance plethysmograph (uncalibrated RIPsum), that outputs the time and duration of detected events and calculates an apnea-hypopnea index (AHI) based on total recording time. The algorithm was tested on 98 polysomnography (PSG) recordings of patients, 77 with suspected sleep apnea and 21 without. The results were compared to annotations provided by the PSG systems, where the PSG AHI was computed using the total sleep time. The predicted AHI was evaluated for correlation and agreement with the PSG AHI using the intra-class correlation coefficient (ICC). Severity classification was performed and evaluated using the following categories: normal (< 5), mild (5–15), moderate (15–30), and severe (≥ 30). Results The ICC between predicted AHI and PSG AHI scored r = 0.96 (0.95–0.97, p < 0.001). The algorithm correctly predicted the severity for 74 recordings, overestimated 16, and underestimated 8. There was no misclassification by more than one severity level. Conclusion Using respiratory effort and SpO2, our algorithm was able to detect respiratory events with high correlation and agreement compared to full PSG-based detection.


Introduction
Respiratory events are episodes during sleep where respiratory activity is disturbed. The American Academy of Sleep Medicine (AASM) scoring manual describes two respiratory events, namely apnea and hypopnea [1]. According to the scoring manual, apnea is the disruption of breathing during sleep with a reduction in airflow of at least 90% for at least 10 s while hypopnea is a reduction of airflow of at least 30%, with an additional requirement of having an associated oxygen desaturation and/or arousal [1]. The apnea-hypopnea index (AHI) is the measure used to quantify sleep apnea and hypopnea and classify its severity. AHI is defined as the number of apnea and hypopnea events per hour during sleep. Polysomnography (PSG) is a multi-sensor overnight recording of sleep and is the gold standard for sleep diagnosis. It is used to obtain the AHI. The relevant respiratory sensors are oronasal thermistor, nasal pressure cannula, abdomen and thorax respiratory inductance plethysmography (RIP) belts, finger pulse oximeter, and electroencephalography [1].
Aside from the standard PSG, home sleep apnea testing devices, such as the level III device, are currently in use. According to the AASM guideline, a level III sleep study is a smaller PSG version using fewer signals in a portable device, where a minimum of four signals are required: heart rate, oxygen saturation and two channels of respiratory movement or respiratory movement and airflow [2]. Correspondingly, we will present a novel method for detecting respiratory events that is suitable for level III systems and yields comparable results to detection performed with full PSG equipment.
In a previous study, three home sleep apnea testing devices using a reduced number of sensors (Nox T-3 (Nox Medical), ARES (SleepMed Inc.), and WatchPAT (Itamar Medical)) were compared to PSG and showed excellent agreement, with an intra-class correlation coefficient (ICC) > 0.93 [3]. SpO2 is a signal measured by a pulse oximeter and is available in level III devices. It has been shown to be a reliable signal for detecting respiratory events: an automated detection based only on SpO2 reported an average accuracy of 91% and an average Cohen's kappa of 0.71. However, because of the delay between apnea onset and the SpO2 response, the start and end times of respiratory events cannot be obtained, so a 25-second correction was applied [4]. We wanted to investigate whether respiratory events can be detected using RIPsum and SpO2, signals that are readily available in level III sleep devices, such that the predicted AHI is comparable to PSG scoring. The two sensors were combined to ensure that the precise time and duration of each respiratory event were also detected. A further aim was to provide a detection system that does not depend on the nasal pressure sensor and that offers an alternative when nasal sensors fail in level III devices.

Subjects
Ninety-eight patients, 77 with suspected sleep apnea and 21 without, were included in this study. Each patient underwent an overnight PSG recording at Advanced Sleep Research GmbH in Berlin, Germany or at Kepler University Hospital, Department of Neurology 2, in Linz, Austria. The Linz clinic used Somnoscreen Plus with Domino software (Somnomedics, Randersacker, Germany) while the Berlin clinic used the EMBLA N7000 system with RemLogic 3.4.1 software (Embla systems, Broomfield, CO, USA). Sixty-five of the subjects were men and 33 were women. The mean age was 53 years (± 15.2). The mean body mass index (BMI) was 28.4 kg/m² (± 5.2). PSG data of the patients are shown in Appendix Table 1. This study protocol was approved by the ethics committee of the state of Upper Austria (B-130-17) and the Charité-Universitätsmedizin Berlin (EA1/127/16). Written and signed consent forms were obtained from the patients prior to inclusion in the study.

Procedure
Each PSG recording came with respiratory event annotations detected by the PSG software. Apneas and hypopneas were pooled together and referred to as "respiratory events." The PSG AHI was calculated from the annotations as the number of respiratory events divided by the total sleep time (TST) in hours, where TST was derived from the PSG's hypnogram. The PSG AHI served as the reference for our study.
The automatic system for detecting respiratory events proposed in this study used signals available in level III devices: the uncalibrated RIPsum and SpO2 signals. The algorithm outputs the time and duration of the detected respiratory events and calculates the predicted AHI_TRT for each recording. Unlike in PSG, sleep staging is not readily available in level III devices. Therefore, the predicted AHI was calculated using the total recording time (TRT). This study compared the predicted AHI_TRT, computed using the TRT, to the PSG AHI, computed using the TST. The difference between AHIs computed using TRT and TST is affected by the distribution of respiratory events across sleep and wake periods. When predicted respiratory events occur mainly during sleep periods, the AHI_TRT is underestimated compared to the PSG AHI, because the TRT is longer than the TST.
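The effect of the denominator can be illustrated with a small worked example. This is only a sketch with made-up numbers: the function name `ahi` and the recording values are hypothetical and are not taken from the study data.

```python
def ahi(n_events: int, duration_hours: float) -> float:
    """Return the number of events per hour over the given duration."""
    if duration_hours <= 0:
        raise ValueError("duration_hours must be positive")
    return n_events / duration_hours


# Hypothetical recording: 8 h recorded, 6 h asleep,
# 120 respiratory events, all of them during sleep.
n_events = 120
trt_hours = 8.0  # total recording time
tst_hours = 6.0  # total sleep time

ahi_trt = ahi(n_events, trt_hours)  # 120 / 8 = 15.0 events per hour
ahi_tst = ahi(n_events, tst_hours)  # 120 / 6 = 20.0 events per hour
print(f"AHI_TRT = {ahi_trt:.1f}, AHI_TST = {ahi_tst:.1f}")
```

With the same events, the longer TRT yields the lower index, which is the underestimation described above.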
The severity level of each recording was determined and compared to the reference scoring. The levels of severity were as follows: mild for 5 ≤ AHI < 15, moderate for 15 ≤ AHI < 30, and severe for AHI ≥ 30 [5]. The normal category (AHI < 5) was also considered in this study.
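The severity categories map directly onto a small lookup function. The following is a minimal sketch; the function names `classify_severity` and `level_shift` are hypothetical helpers introduced for illustration.

```python
SEVERITY_LEVELS = ["normal", "mild", "moderate", "severe"]


def classify_severity(ahi: float) -> str:
    """Map an AHI value to its severity category using the cutoffs 5, 15, and 30."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


def level_shift(predicted_ahi: float, reference_ahi: float) -> int:
    """Return how many severity levels the prediction lies above (+) or below (-) the reference."""
    predicted = SEVERITY_LEVELS.index(classify_severity(predicted_ahi))
    reference = SEVERITY_LEVELS.index(classify_severity(reference_ahi))
    return predicted - reference


print(classify_severity(4.9), classify_severity(15.0), classify_severity(30.0))
print(level_shift(16.2, 13.8))  # a positive value means the severity was overestimated
```

A shift of 0 corresponds to a correct classification, a positive shift to an overestimated severity, and a negative shift to an underestimated one.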

Automatic Detection of RIPsum Events
RIPsum events (episodes of reduced RIPsum amplitude) were detected by first deriving a smooth upper envelope signal of the uncalibrated RIPsum. The upper envelope signal was derived by identifying all positive peaks in the RIPsum and performing a spline interpolation. The locations of the major peaks of the smooth signal were identified, and segments of the RIPsum were created between the peaks. Each segment was automatically searched for any sub-segment with an amplitude lower than a threshold. The threshold was based on the amplitude of the peak at the beginning of the segment and was determined heuristically. A sub-segment identified as being lower than the threshold had to be at least 10 s in duration in order to be classified as a RIPsum event.
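The final thresholding step can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes that the smooth upper envelope and the indices of its major peaks have already been computed, and the function name, the `ratio` parameter, and its value of 0.7 are hypothetical, since the heuristic threshold is not specified here.

```python
def find_low_amplitude_events(envelope, peak_indices, fs, ratio=0.7, min_duration_s=10.0):
    """Return (start_s, duration_s) for runs of the envelope below a per-segment threshold.

    envelope      : smooth upper envelope of the RIPsum signal (sequence of floats)
    peak_indices  : sorted sample indices of the major envelope peaks
    fs            : sampling rate in Hz
    ratio         : ASSUMED fraction of the segment's starting peak amplitude
    """
    events = []
    min_len = int(round(min_duration_s * fs))
    for start, end in zip(peak_indices[:-1], peak_indices[1:]):
        threshold = ratio * envelope[start]  # based on the beginning peak amplitude
        run_start = None
        for i in range(start, end + 1):
            # The segment's closing peak (i == end) always terminates an open run.
            below = i < end and envelope[i] < threshold
            if below and run_start is None:
                run_start = i
            elif not below and run_start is not None:
                if i - run_start >= min_len:  # keep only runs of at least 10 s
                    events.append((run_start / fs, (i - run_start) / fs))
                run_start = None
    return events


# Toy envelope sampled at 1 Hz: a 12 s dip (kept) and a 5 s dip (discarded).
toy_envelope = [1.0, 0.9, 0.9] + [0.3] * 12 + [0.9, 1.0] + [0.3] * 5 + [1.0]
toy_peaks = [0, 16, 22]
print(find_low_amplitude_events(toy_envelope, toy_peaks, fs=1.0))
```

In the toy example only the 12 s dip qualifies as an event, because the second dip is shorter than the 10 s minimum.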

Automatic Detection of SpO2 Desaturation Events
We defined SpO2 desaturation events as events in the SpO2 signal with a desaturation of at least 3% and a subsequent return to the pre-desaturation oxygen level. The detection was performed by locating the major peaks of the SpO2 signal, as shown in Fig. 1, and measuring any desaturation in the signal between the peaks. The start of such an event was set at the start of the desaturation, and the event ended when the SpO2 re-saturation was complete, before the next peak.
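A simplified version of this rule is sketched below. It is an illustration under stated assumptions rather than the authors' implementation: the major peak indices are taken as given, the peak itself is used as the start of the desaturation, and the function name `find_desaturations` is hypothetical.

```python
def find_desaturations(spo2, peak_indices, fs, min_drop=3.0):
    """Return (start_s, duration_s) for desaturations of at least min_drop percentage points.

    spo2          : SpO2 samples in percent
    peak_indices  : sorted sample indices of the major SpO2 peaks (assumed precomputed)
    fs            : sampling rate in Hz
    """
    events = []
    for start, end in zip(peak_indices[:-1], peak_indices[1:]):
        baseline = spo2[start]  # pre-desaturation level, taken at the peak
        nadir = min(range(start, end + 1), key=lambda i: spo2[i])
        if baseline - spo2[nadir] < min_drop:
            continue  # drop too small to count as a desaturation
        # First sample after the nadir at which SpO2 is back at the baseline level.
        recovery = next((i for i in range(nadir, end + 1) if spo2[i] >= baseline), None)
        if recovery is None:
            continue  # no return to the pre-desaturation level before the next peak
        events.append((start / fs, (recovery - start) / fs))
    return events


# Toy SpO2 trace at 1 Hz: a 4-point drop (kept) followed by a 2-point drop (ignored).
toy_spo2 = [95, 96, 95, 93, 92, 93, 95, 96, 97, 96, 95, 96, 97]
toy_spo2_peaks = [1, 8, 12]
print(find_desaturations(toy_spo2, toy_spo2_peaks, fs=1.0))
```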

RIPsum Events and SpO2 Desaturation Events for Detection
The detection of respiratory events was performed by pairing RIPsum events with associated SpO2 events, as illustrated in Fig. 1. To do so, the algorithm identified pairs of events in which a RIPsum event was followed by an SpO2 event with a maximum delay of 60 s. When such a pair was found, the algorithm labeled the event as a respiratory event, with the starting time and duration identical to those of the RIPsum event. The predicted AHI_TRT was calculated from the total number of respiratory events and the TRT.
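The pairing rule and the resulting index can be sketched as follows. How the 60 s delay is measured is an assumption in this sketch: an SpO2 event is taken to follow a RIPsum event if it starts no earlier than the RIPsum event and no later than 60 s after the RIPsum event ends. The function names are hypothetical.

```python
def pair_events(rip_events, spo2_events, max_delay_s=60.0):
    """Keep each RIPsum event that is followed by an SpO2 desaturation event.

    Both inputs are lists of (start_s, duration_s) tuples.
    ASSUMPTION: the delay is measured from the end of the RIPsum event
    to the start of the SpO2 event.
    """
    respiratory_events = []
    for rip_start, rip_duration in rip_events:
        rip_end = rip_start + rip_duration
        for spo2_start, _spo2_duration in spo2_events:
            if rip_start <= spo2_start <= rip_end + max_delay_s:
                # The respiratory event inherits the RIPsum start time and duration.
                respiratory_events.append((rip_start, rip_duration))
                break
    return respiratory_events


def predicted_ahi_trt(respiratory_events, trt_hours):
    """Return the number of respiratory events per hour of total recording time."""
    return len(respiratory_events) / trt_hours


toy_rip = [(100.0, 20.0), (400.0, 15.0), (900.0, 12.0)]
toy_desat = [(130.0, 30.0), (500.0, 25.0)]
toy_paired = pair_events(toy_rip, toy_desat)
print(toy_paired, predicted_ahi_trt(toy_paired, trt_hours=0.5))
```

In the toy data only the first RIPsum event has a desaturation inside its window, so one respiratory event is reported.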

Statistical Analysis
The algorithm was evaluated by comparing the predicted AHI_TRT to the PSG AHI. The predicted AHI_TRT was computed using the TRT because the algorithm was designed for level III application, while the PSG AHI was computed using the TST. An evaluation was also carried out by computing a predicted AHI_TST using the TST, to assess the performance of the algorithm when sleep/wake information is available. While standard level III devices do not come readily equipped with sleep/wake information, some are compatible with portable electroencephalography (EEG) for sleep staging. Spearman's r was calculated to evaluate the correlation between the predicted AHI and the PSG AHI. The ICC r was computed as the metric of reliability, i.e., the degree of correlation and agreement. An ICC r > 0.90 indicates excellent reliability [6]. Following the guideline presented in [6], we used the two-way mixed-effects ICC model with absolute agreement and single rater type. The ICC r was computed with the lower and upper bounds of the 95% confidence interval (CI). Additionally, Bland-Altman analysis was performed to evaluate agreement with the PSG AHI by the mean difference and the limits of agreement, set at ± 1.96 standard deviations around the mean difference, the range expected to contain about 95% of the differences [7]. Aside from the comparison of the predicted AHI to the PSG AHI, the severity classifications were also compared. Severity categories were defined as follows [5]: normal (AHI < 5), mild (5 ≤ AHI < 15), moderate (15 ≤ AHI < 30), and severe (AHI ≥ 30). A confusion matrix and the accuracy were computed to compare the predicted severity to the PSG-classified severity.

Results
Figure 2 shows the predicted AHI_TRT compared to the PSG AHIs, and the full results are tabulated in Appendix Table 3. The PSG AHIs were calculated using the TST while the predicted AHIs were computed using the TRT. The comparison between the predicted AHI_TRT and the PSG AHI resulted in a low median absolute difference, |ΔAHI| = 2.8, Spearman's r = 0.96 (p < 0.001), and ICC r = 0.96 (0.95–0.97, p < 0.001). The ICC indicates excellent reliability, suggesting high correlation and agreement. Furthermore, 70 of the 98 recordings had |ΔAHI| ≤ 5. The Bland-Altman plot in Fig. 3 shows a mean difference of 0.6, and all but two recordings lie within or at the border of the limits of agreement. The only significant outlier in Fig. 2 is recording no. 19 (Appendix Table 3), with a total sleep efficiency of 67%.
The algorithm was also evaluated using TST, to test the performance of the algorithm when sleep information is available, with the predicted AHI_TST shown in Fig. 4. The predicted AHI_TST compared to the PSG AHI resulted in Spearman's r = 0.97 (p < 0.001), ICC r = 0.97 (0.96–0.98, p < 0.0001), and median |ΔAHI| = 2.6. The Bland-Altman plot in Fig. 5 shows a mean difference of −1.1. Table 1 shows the confusion matrix for severity classification. The classification was based on the predicted AHI_TRT. The algorithm classified the correct severity for 75.5% (n = 74) of the recordings. Sixteen of the recordings were classified as a higher severity by the algorithm and the remaining 8 were classified as a lower severity. The average |ΔAHI| for the underestimated recordings was 4.9, while for the overestimated recordings it was 5.0. The average absolute difference of the predicted AHI from the correct classification cutoff (e.g., the point difference of the predicted AHI from 5 or 15 for the mild category) was 3.0 for the underestimated recordings and 3.2 for the overestimated recordings. No misclassification by more than one severity level occurred in any of the recordings. The severity classification based on the predicted AHI_TST, shown in Table 2, performed better, with 80.6% (n = 79) accuracy, 11 overestimated recordings, and 8 underestimated recordings.
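The agreement measures described under Statistical Analysis can be sketched in plain Python. This is an illustrative sketch, not the analysis code of the study: the function names are hypothetical, the difference is taken as predicted minus reference (an assumption about the sign convention), and the ICC uses the standard single-measure, absolute-agreement, two-way formula ICC = (MSR − MSE) / (MSR + (k − 1)·MSE + (k/n)·(MSC − MSE)) with k = 2 raters. Confidence intervals are omitted.

```python
from statistics import mean, stdev


def bland_altman(predicted, reference):
    """Return (mean difference, lower limit, upper limit) with limits at +/- 1.96 SD."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_absolute_single(predicted, reference):
    """Two-way, absolute-agreement, single-rater ICC for two sets of AHI values."""
    rows = list(zip(predicted, reference))
    n, k = len(rows), 2
    values = [v for row in rows for v in row]
    grand = mean(values)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)    # between recordings
    ss_cols = n * sum((m - grand) ** 2
                      for m in (mean(predicted), mean(reference)))  # between methods
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up AHI values for illustration only.
toy_reference = [1.0, 2.0, 3.0, 4.0]
toy_shifted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated, but offset by one point
print(icc_absolute_single(toy_shifted, toy_reference))  # below 1 because of the offset
print(bland_altman([1.0, 3.0, 2.0, 6.0], toy_reference))
```

The shifted example shows why agreement is reported alongside correlation: a constant offset leaves the rank correlation perfect but lowers the absolute-agreement ICC.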

Discussion
In this study, we developed an algorithm using only RIPsum and SpO2 to detect respiratory events, intended for use with level III home sleep apnea testing devices. We tested our algorithm on 98 patients, 77 with suspected sleep apnea and 21 without.
The predicted AHI_TRT performed well, with an ICC of r = 0.96 and Spearman's r = 0.96 when compared to the PSG AHI. The results suggest that our algorithm achieves a high level of agreement and correlation with full PSG-based AHIs. The median absolute difference |ΔAHI| was only 2.8, and 70 of the recordings had |ΔAHI| ≤ 5. The outlier shown in Fig. 3 was underestimated by 26.6 AHI points. This was caused by the predicted events occurring predominantly during sleep time: because the TRT is longer than the TST, dividing the same events by the TRT lowers the computed AHI_TRT.
Level III devices do not come readily equipped with sleep staging. Therefore, the predicted AHI was computed using the TRT. Nevertheless, sleep staging can be added when portable EEG devices are integrated into level III devices. To test whether our algorithm can perform reliably when sleep/wake information is added, we computed the predicted AHI using the TST from the hypnogram. The predicted AHI_TST scored an ICC of r = 0.97. The agreement of the algorithm with PSG improved when sleep and wake information was given. Even without the TST, however, the algorithm performed with excellent reliability.
The severity classification (based on the AHI computed using TRT) scored 75.5% accuracy, with 16 of the recordings overestimated. If overestimated severity is considered acceptable, then 92% (n = 90) of the recordings were given a safe classification. This assumes that it is safer to overestimate the severity than to underestimate it, i.e., it is better to report a moderate AHI for a patient with a mild AHI than to report a normal AHI when it is actually mild. For the eight underestimated recordings, the predicted AHI_TRT was on average only 3.0 points away from the cutoff of its correct severity level, and only two of the underestimated recordings were misclassified as normal instead of mild. For comparison, we also performed the severity classification using the predicted AHI_TST. Using the total sleep time, the accuracy of the predicted severity increased to 80.6%, as expected.
The results of our algorithm are comparable to those of studies on detection algorithms with fewer or novel sensors. Three different home sleep apnea testing devices scored ICC r = 0.93–0.97 compared to PSG scoring, showing high agreement [3]. The WatchPAT validation study for AHI estimation reported a Spearman's r of 0.802 between the device's rapid eye movement (REM) sleep and non-REM-based AHI versus PSG AHI scores [8]. Another study reported a predictive model for apneas and hypopneas using SleepView, a portable two-channel diagnostic device for sleep-related diseases: using a nasal pressure cannula transducer and a pulse oximetry sensor, a correlation of r² = 0.84 (p < 0.01) was reported in 93 subjects between the AHI calculated by the SleepView software using TRT and the PSG AHI based on TST [9]. Our algorithm performed on par with other apnea prediction systems without airflow sensors. Using a microphone placed one meter above the bed to detect snoring and estimate AHI, a correlation coefficient of r² = 0.81 was achieved compared to AHIs scored according to the AASM scoring criteria [10]. Using the tracheal sound signal and pulse oximetry, a linear correlation of 0.96 was reported between the estimated AHI and the manually scored PSG AHI [11]. A recent study using tracheal sounds to identify apneas reported 92.8% sensitivity and 99.7% specificity [12]. Using only the thoracic respiratory effort, a comparison based on sleep and wake periods between estimated AHI and scored AHI resulted in a correlation coefficient of r² = 0.73 for the training set and r² = 0.55 for the validation set [13]. A study estimating AHI using only SpO2 reported a Cohen's kappa of 0.71 and an accuracy of 91% [4]. We calculated both Spearman's r and the ICC between the AHI values to evaluate not only correlation but also agreement. This shows that the predicted AHIs not only correlate positively with the reference AHIs but also are not greatly misestimated, which helps avoid misclassification of severity.
We take note of the usage of SpO2 alone to estimate the AHI. However, we make the case for using RIPsum because it offers the possibility of classifying events as obstructive, central, or mixed, which will be of interest for future work. Furthermore, with RIPsum, the precise location and duration of the respiratory events can be determined. One limitation of using respiratory effort instead of nasal sensors is that the distinction between apnea and hypopnea cannot be made. However, the aim of this algorithm is to provide an alternative system that does not depend on the nasal pressure sensor, so that patient comfort can be increased and an alternative is available when nasal sensors in level III devices fail.

Conclusion
Our results showed that our method using RIPsum and SpO2 has excellent agreement and correlation with PSG scoring. However, it must be noted that the difference between total recording time and total sleep time can affect the estimated AHI. Nevertheless, the algorithm can detect respiratory events without using airflow sensors, ensuring more comfortable sleep for patients. Another advantage is that the sensors needed for our algorithm are available in and compliant with level III sleep studies.
Author's Contributions CC developed the algorithm and performed the computations. CC, CW, and HG designed the study and wrote the paper. CW and BK co-developed the algorithm. MW, MM, MG, and TP carried out PSG recordings. EK provided technical and scientific consultations. SS, MB, GK, and ASK provided sleep studies and scoring consultations. All authors reviewed and approved the manuscript for submission.

Conflicts of interest All authors declare no conflicts of interest.
Ethical Approval This study protocol was approved by the ethics committee of the state of Upper Austria (B-130-17) and the Charité-Universitätsmedizin Berlin (EA1/127/16).

Informed Consent
Written and signed consent forms were obtained from the patients prior to inclusion in the study.

Research Involving Animal Rights
This article does not contain any studies with animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.