Cascading detection model for prediction of apnea-hypopnea events based on nasal flow and arterial blood oxygen saturation

Purpose Sleep apnea and hypopnea syndrome (SAHS) seriously affects sleep quality. In recent years, much research has focused on the detection of SAHS using various physiological signals and algorithms. The purpose of this study is to find an efficient model for detection of apnea-hypopnea events based on nasal flow and SpO2 signals. Methods A 60-s detector and a 10-s detector were cascaded for precise detection of apnea-hypopnea (AH) events. Random forests were adopted for classification of data segments based on morphological features extracted from nasal flow and arterial blood oxygen saturation (SpO2). Then the segments’ classification results were fed into an event detector to locate the start and end time of every AH event and predict the AH index (AHI). Results A retrospective study of 24 subjects’ polysomnography recordings was conducted. According to segment analysis, the cascading detection model reached an accuracy of 88.3%. While Pearson’s correlation coefficient between estimated AHI and reference AHI was 0.99, in the diagnosis of SAHS severity, the proposed method exhibited a performance with Cohen’s kappa coefficient of 0.76. Conclusions The cascading detection model is able to detect AH events and provide an estimate of AHI. The results indicate that it has the potential to be a useful tool for SAHS diagnosis.


Introduction
Sleep apnea and hypopnea syndrome (SAHS) is a prevalent sleep breathing disorder in middle-aged people. The gold standard for diagnosis of SAHS is to perform polysomnography (PSG) in a laboratory. However, PSG requires patients to sleep with many sensors for at least one night; the scoring of apneahypopnea (AH) events can take a long time. Therefore, many researchers hope to simplify or replace PSG by using a limited number of physiological signals. Electrocardiogram (ECG) was first studied for this purpose. McNames et al. [1] found that heart rate, S-pulse amplitude, and pulse energy were correlated with SAHS. Bsoul et al. [2] cut the ECG into 60-s segments and used a support vector machine (SVM) for real-time detection of apnea. However, many other diseases except SAHS also affect ECG. Hence, nasal flow (NF) [3][4][5][6], arterial blood oxygen saturation (SpO 2 ) [7], snoring [8], or a combination of these signals [9,10] have been adopted more recently. Gutierrez et al. [4] used the overall features of NF for the diagnosis of SAHS severity. Xie et al. [10] utilized a combination of classifiers to achieve realtime detection of SAHS based on ECG and SpO 2 . All the above studies can be roughly divided into two categories: those that predict the AH index (AHI) based on the detection of AH events [2,3,5,7,[9][10][11], and those that predict AHI based on the overall signal features [1,4,6,8,12,13]. The latter approach cannot provide time information for each AH event, whereas most studies in the former [2,7,10,11] only involve a 60-s segment identification which may not be accurate for predicting the segments containing multiple AH events and may lead to errors in the estimation of AHI. On the other hand, the methods mentioned above include rule-based [5,7,9], SVM [2,10,11], and supervised neural network [3,11], which require a large number of hyperparameters to be set by experience. Therefore, we utilized random forests composed of classification and regression trees (CARTs) based on morphological features extracted from NF and SpO 2 for AH events detection. A 60-s detector and a 10-s detector were cascaded for more precise detection of AH events.

Subjects
The St. Vincent University Hospital/Dublin University College Sleep Apnea Syndrome Database (UCDDB) [14] public on Physionet [15] was used for a retrospective data analysis throughout this paper. The database contains 25 subjects' PSG data, including EEG, electrooculogram, submental electromyography, NF, ribcage and abdomen movements, SpO 2 , snoring, and body position. All signals were obtained using a Jaeger-Toennies system. The annotation files consisted of onset time and duration of respiratory events provided by an experienced specialist. The cutoff values for AHI were commonly set to 5, 15, or 30 events/h [3,4,7,16,17]. There were data for two non-SAHS subjects, twelve mild-SAHS subjects, five moderate-SAHS subjects, and six severe-SAHS subjects in the database. While there was a severe distortion in the NF signal of subject ucddb005 thus this recording was excluded. Consequently, totally 24 subjects' polysomnography recordings were taken into this study. The sleep-related parameters of the subjects are summarized in Table 1.
According to the American Academy of Sleep Medicine (AASM) manual [16], apnea is scored when there is a more than 90% drop in the peak signal of the pre-event baseline for NF with a duration longer than 10 s. Hypopnea is scored by the following rules: (1) there is a more than 30% drop in the peak signal of the pre-event baseline for nasal pressure with a duration longer than 10 s, accompanied by (2) more than 3% arterial oxygen desaturation or an arousal. As a result, we selected NF and SpO 2 for SAHS detection. The NF signal was recorded by a thermistor while SpO 2 was recorded by a finger pulse oximeter and the sampling rate of both was 8 Hz.

Study design
The cascading detection model based on AH event detection is shown in Fig. 1. It comprises the following main steps: (1) removal of invalid data, NF signal filtering, segmentation with a sliding window, and SpO 2 alignment; (2) extraction of a specific feature set from each segment; 3) the cascading detection model predicts each segment and outputs a sequence of segments' results; (4) the event detector corrects the invalid results in the sequence and calculates the AHI.

Signal preprocessing
Signal preprocessing comprises the following four steps: (1) removal of invalid data. Any SpO 2 values lower than 50% were considered to be artifacts and removed from the analysis (5.6% of the data). (2) NF signal filtering-a four-point sliding average filter and a third-order Butterworth high-pass filter with a cutoff frequency of 0.05 Hz were used to prevent highfrequency noise caused by artifacts and baseline drift in NF signal. (3) Segmentation-the original signals were segmented using a 60-s window and a 10-s window, respectively. In both cases, the step was set to 1 s. All segments were categorized into two classes: AH and N according to the annotations. The segments containing more than 5 s of AH events were labeled as class AH. Other cases were labeled as class N. (4) SpO 2 alignment. As SpO 2 responds slowly to AH events [18], a time advance of τ s (0 < τ < 30) was applied in SpO 2 . The results showed that the model performed best with τ set to 23 s. After preprocessing, the number of extracted segments were 487,974 (AH, N = 44,476: 443,498).

Feature extraction
NF feature set According to the AASM definition of AH events, the amplitude of NF provides important information. Therefore, we first extracted the maximum and minimum points from each NF segment. Then, the tidal volume per breath Ft was calculated as the difference between two adjacent extreme points. The mean, standard deviation, and range of the tidal volume (Fmean, Fstd, Fran) were extracted within each segment. Besides, we calculated the maximum value of the tidal volume every 30 s using Eq. (1): where Ft i represents the tidal volume values in the ith segment. Fb i represents the maximum tidal volume value in 30 s before the ith segment. The number of breaths with tidal volume drops by more than 30%, 70% from Fb were calculated within each segment and denoted as Fha, Fap. And the number of breaths with tidal volume above 85% of Fb was also calculated and denoted as Fnor. In addition, the ratios of them to the total number of breaths (Fhap, Fapp, Fnorp) within each segment were calculated. Besides, owing to the cessation of breathing, there will be fluctuations in the breathing rate during AH events. One normal breath lasts for 3-5 s; energy will be concentrated with a peak in the corresponding frequency. As a result, we took the fourth statistical moment (Fkur) in 0.2-0.4 Hz of NF's frequency spectrum as another feature.
SpO 2 feature set We first calculated the standard deviation and range coefficients of SpO 2 (Spstd, Spran) in each segment. The tendency of SpO 2 in each segment (Spten) was also calculated by using the last SpO 2 value minus the first SpO 2 value. The commonly used feature: time SpO 2 stays below 90% [19,20] was referred while we calculated the indices Sp92, Sp91 with thresholds set to 92% and 91%. Besides, the maximum SpO 2 value (Spbm) and average SpO 2 value (Spba) in every 30 s were computed. Then within each segment, the time SpO 2 stays below 98% of Spbm and that below 98% of Spba were calculated and Fha, Fhap Number of breaths with a reduction more than 30% in tidal volume and its ratio to total number of breaths 3

Fap, Fapp
Number of breaths with a reduction more than 70% in tidal volume and its ratio to total number of breaths 4 Fnor, Fnorp Number of breaths with a reduction less than 15% in tidal volume and its ratio to total number of breaths 5 Fkur Fourth statistical moment in 0.2-0.4 Hz of NF's frequency spectrum 6 Spstd, Spran Standard deviation and range of SpO 2 7 Spten Tendency of SpO 2 8 Spdum, Spdua Duration of SpO 2 desaturation 9 Spldm, Splda Level of SpO 2 desaturation 10 Sp92, Sp91 Duration of SpO 2 staying below 92 and 91% Sleep Breath (2020) 24:483-490 485 Fig. 1 Design of cascading detection model based on AH event detection denoted as Spdum, Spdua respectively. Finally, the level of oxygen desaturation Spldm and Splda in each segment was calculated as Eqs. (2) and (3) show.
where i represents the ith segment. Sp i represents the SpO 2 values in the ith segment. Spbm i , Spba i represent the maximum and average SpO 2 value in 30 s before the ith segment respectively. The total feature set is shown in Table 2.

Design of cascading detector
The cascading detector contained two parts. The first was a random forest consisting of 10 CARTs for the prediction of 60 s segments. This could screen out most of the N segments while retaining the AH segments. The second part was a random forest consisting of 20 CARTs for the prediction of 10 s segments. Based on the results of the 60-s detector, the 10-s detector was able to detect AH events more precisely. Note that the 60 s detector was trained using a feature set composed of features 2, 6, and 8 in Table 2 in order to improve the training speed. The results indicated that there was almost no effect on the performance. Owing to the imbalance in the number of AH and N segments, the weights for the two classes in CARTs were set to inverse ratio of their numbers.
A twofold cross-validation was used in the test. Each time, half of the segments were used for training with the remaining half used for testing. The cascading detector output the sequence composed of the prediction results of the 10 s segments. The detector was trained on a computer with an i5-7600k CPU and 8 G RAM.

Design of event detector
The sequence predicted by the cascading detector was then fed into the event detector to correct invalid results following two rules; (1) Only more than 10 consecutive segments classified as AH were considered to be one valid AH event. As the original data were segmented by a 10-s window, and one AH event lasts at least 10 s, so one AH event corresponded to at least 10 consecutive AH segments. Any segment which did not meet the rule was modified to class N. (2) The number of segments classified as N between two valid AH segments was supposed to be more than five. This was also determined by the way of data segmentation. Any segment that did not meet the rule was reset to class AH.

Results
The cascading detection model was able to estimate AHI and provide the time information for each AH event. We analyzed its performance with respect to two aspects: segments and AHI.

Segment analysis
The prediction results for the segment-by-segment analysis are shown in Table 3. The cascading detection model achieved an accuracy of 88.3%, a sensitivity of 75.2%, and a specificity of 89.6% for 487,974 test segments. Figure 2 displays the AH event estimation results for one mild-SAHS, medium-SAHS, and severe-SAHS subject. For the mild-SAHS subject, the accuracy, sensitivity and specificity were 92.9%, 80.6%, and 94.0%. The corresponding values for the medium-SAHS subject were 92.2%, 82.2%, and 93.6% while 86.7%, 81.1%, and 88.2% for the severe one.    Table 4 compares the number of AH events, duration of AH events and AHI values for the cascading model and scored by PSG. Table 5 summarizes the classification results for SAHS severity. The mean values for sensitivity, specificity, PPV, and accuracy were 100.0%, 91.1%, 86.7%, and 94.4%, respectively, for AHI thresholds of 5, 15, and 30 events/h. Besides, the kappa coefficient for diagnosis of SAHS severity was 0.76.

Discussion
We proposed a cascading detection model that could predict AHI based on AH event detection. Compared with PSG, only NF and SpO 2 were used. Previously, the original signals were commonly cut into 60 s segments for AH event detection [2,7,10,11]. However, the detection of AH events may not be precise based on 60 s segment analysis because it can only determine whether there was AH in the segment, while, may make mistakes for the segments containing multiple AH events and lead to an error in AHI estimation. Therefore, some researchers [3,9] cut the signals into shorter segments for detection. However, it is difficult to extract effective features from a segment shorter than 10 s, because there will be no more than five complete breaths in one segment in most cases. As a result, we proposed a cascading detection model composed of a 60-s detector and a 10-s detector to predict AH events precisely. Table 3 shows the classification results for the segments. Notably, the model tended to make false positive errors. In approximately 12.1% of these errors, the amplitude of NF signal decreased by more than 30% from previous event baseline accompanied with a SpO 2 desaturation, however no arousal or nasal pressure signals were adopted for identifying hypopneas. Therefore, these segments may be mistaken for class AH.
As illustrated in Fig. 3, AHI est showed high correlation with AHI ref (Pearson correlation coefficient 0.99, p < 0.01). The performance of the model also showed good consistency among different subjects. On the other hand, AHI est was slightly higher than AHI ref . Consequently, SAHS severity was overestimated for four subjects; for the remaining 20 subjects, the model gave the correct prediction ( Table 5). The kappa coefficient of the cascading detection model for diagnosis of SAHS severity was 0.76, indicating that this method represents a powerful screening tool for SAHS. We also tested the speed of the cascading detection model. Training required 24.7 s, while only 20.3 s was needed to provide results for all segments and to predict AHI for all 24 subjects. It took 41.6 μs to predict one segment and 0.85 s to diagnose one subject on average. This implies that the model could be used for real-time AH event detection.
As Table 6 shows, our method exhibited a good sensitivity but not very good specificity compared with other studies. That is mainly because excursions in NF is not as prominent as those in nasal pressure signal during hypopnea [16] thus decrease the event detection performance. Nasal pressure signal or a combination of NF and nasal pressure signal will be taken into study in future to improve this. More importantly, the model could not only predict the severity of SAHS but could also provide time information for each AH event. Furthermore, compared with other methods such as convolutional neural networks, a smaller number of hyperparameters and less computation were required by our random forest based approach, and the CARTs provided better interpretability for clinical detection.
However, there were some limitations to this study. First, we did not further classify AH events into apnea events and hypopnea events. Second, the model was not tested in an online environment. We hope to confirm the usability of our method online in the future. Third, the model was not able to distinguish central and obstructive events because no ribcage or abdominal movement signals for identifying central events from obstructive events were adopted in this study. Finally, no electroencephalography was adopted in this algorithm, thus sleep and awake time were not evaluated in this study.

Conclusion
The purpose of this study was to propose a model for real-time detection of AH events. Based on the morphological features of NF and SpO 2 , the cascade of a 60-s detector and 10-s detector could not only predict AH events, but could also provide time information for each AH event. Compared with previous research, the cascading detection model based on random forests provides better interpretation with reduced computational complexity. Therefore, it is expected to be an effective tool for SAHS diagnosis.
Compliance with ethical standards Formal consent was not required for this study, as the data had already been de-identified.

Conflict of interest
The authors declare that they have no conflict of interest.