Towards automatic EEG cyclic alternating pattern analysis: a systematic review

This study conducted a systematic review to determine the feasibility of automatic Cyclic Alternating Pattern (CAP) analysis. Specifically, this review followed the 2020 Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines to address the formulated research question: is automatic CAP analysis viable for clinical application? From the identified 1,280 articles, the review included 35 studies that proposed various methods for examining CAP, including the classification of A phases, their subtypes, or the CAP cycles. Three main trends were observed over time regarding A phase classification, starting with mathematical models or features classified with a tuned threshold, followed by conventional machine learning models and, recently, deep learning models. Regarding CAP cycle detection, most studies employed a finite state machine to implement the CAP scoring rules, which depended on an initial A phase classifier, stressing the importance of developing suitable A phase detection models. The assessment of A-phase subtypes has proven challenging due to the variety of approaches used in the state of the art for their detection, ranging from multiclass models to creating a model for each subtype. The review provided a positive answer to the main research question, concluding that automatic CAP analysis can be reliably performed. The main recommended research agenda involves validating the proposed methodologies on larger datasets, including more subjects with sleep-related disorders, and providing the source code for independent confirmation.


Introduction
Sleep is a fundamental aspect of the circadian rhythm that is unique to each person and is comprised of various stages with associated autonomic nervous system activities. During sleep, the body repairs vital systems, and the sleep process significantly impacts memory consolidation, physical development, learning, emotion regulation, and overall life quality [1]. However, despite the critical role that sleep plays in maintaining physical and mental health, there remains a lack of consensus regarding the best criteria for determining sleep quality [2]. Furthermore, various factors can impact sleep quality, and non-restorative sleep is widely acknowledged as one of the most frequently reported reasons for seeking medical consultation [3]. This highlights the need for a clearer understanding of what constitutes good sleep and the mechanisms underlying sleep disturbances.
It is anticipated that the evaluation of sleep quality will emerge as a significant aspect of medical diagnosis in the near future. However, as a multifaceted construct, the natural complexity of sleep makes it difficult to capture its processes using a single measure [4]. Thus, it is necessary to adopt a multivariable approach that incorporates a diverse range of predictors, considering variations in sleep quality that include age and gender information. Previous studies have reported that metrics based on the duration, intensity, or uninterrupted nature of sleep (continuity) have a limited correlation with subjective assessments of sleep quality from the previous night [5]. Alternatively, stability-based measures may have greater significance for future medical diagnoses of sleep quality [2].
Fábio Mendonça, fabioruben@staff.uma.pt
In light of these findings, the analysis of sleep microstructure emerges as a crucial aspect in evaluating sleep quality. One particularly significant piece of this analysis is the identification of the Electroencephalogram (EEG) Cyclic Alternating Pattern (CAP) [6], which plays a central role in assessing sleep microstructure. CAP refers to a repeating pattern of changes in brain activity that occurs during sleep and is associated with various markers of sleep quality, including sleep fragmentation and instability. The CAP cycles are composed of alternating activation (A-phase) and quiescent (B-phase) phases that last from 2 to 60 s. The A-phase is characterized by a sequence of transient EEG variations, while the B-phase indicates the recovery of background EEG activity. Additionally, the A-phase can be further classified into three subtypes that play different roles in the sleep process, having distinct amplitude and spectral characteristics in the EEG signal. The first, named A1, is characterized by high-amplitude slow waves, while the third, denoted as A3, is the opposite. The second, entitled A2, represents an intermediate state between the two subtypes [6].
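The alternation described above can be sketched in a few lines of Python. The fragment below is a minimal illustration, assuming a per-second binary annotation (1 for an A-phase second, 0 for background) and a deliberately simplified pairing rule: an A-phase and the B-phase that follows it are kept only when both last between 2 and 60 s. The function names, the one-second resolution, and the toy annotation are hypothetical; the complete CAP scoring rules [6] contain further conditions that are not reproduced here.

```python
from itertools import groupby

MIN_S, MAX_S = 2, 60  # phase duration limits in seconds, as stated in the text


def runs(labels):
    """Collapse a per-second 0/1 sequence into (label, start, duration) runs."""
    out, start = [], 0
    for label, group in groupby(labels):
        duration = len(list(group))
        out.append((label, start, duration))
        start += duration
    return out


def cap_cycles(labels):
    """Return (a_start, a_duration, b_duration) for each A-phase followed by a
    B-phase, keeping only pairs where both durations lie within 2-60 s.
    Simplified, hypothetical pairing rule used for illustration only."""
    r = runs(labels)
    cycles = []
    for (lab_a, start_a, dur_a), (lab_b, _, dur_b) in zip(r, r[1:]):
        if lab_a == 1 and lab_b == 0:
            if MIN_S <= dur_a <= MAX_S and MIN_S <= dur_b <= MAX_S:
                cycles.append((start_a, dur_a, dur_b))
    return cycles


# Toy annotation (made-up data): 1 = A-phase second, 0 = background second.
toy = [0] * 5 + [1] * 4 + [0] * 10 + [1] * 1 + [0] * 3 + [1] * 6 + [0] * 70
print(cap_cycles(toy))  # [(5, 4, 10)]
```

In the toy annotation, the one-second burst is too short to count as an A-phase, and the last A-phase is followed by more than 60 s of background, so only the first pair is retained.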
Research has demonstrated that pathological conditions can alter the characteristics of the subtypes, highlighting the importance of examining the CAP pattern and subtype characteristics in assessing sleep quality. Such analysis can provide valuable insights into the stability and fragmentation of sleep and help to identify markers of sleep disturbances, enabling the development of effective strategies for promoting good sleep health.
It is important to observe that the division of sleep into a limited number of sleep stages, despite its simplicity, is based on possibly obsolete knowledge about the sleep process [7]. As a result, the metrics estimated from sleep macrostructure can be considered a rough estimate of sleep quality, as they are based on a synthetic segmentation of the continuous sleep process. Sleep microstructure provides a much more in-depth understanding of sleep, as it is based on a second-by-second analysis of transient and phasic events [8]. However, this increased resolution also makes the analysis more complex, requiring more time for a human operator to perform a full-night sleep examination. To overcome this challenge, it is crucial to automate the examination process to make sleep assessment based on sleep microstructure metrics feasible [9]. As a result, a fundamental uncertainty is whether automatic CAP analysis is viable. Hence, the formulated research question was: is automatic CAP analysis viable for clinical application?
The goal of this research is to address this question, considering that the examination of CAP, along with other measures of sleep microstructure, can provide a more comprehensive understanding of sleep, enabling the identification of sleep disturbances and the development of effective interventions for promoting good sleep health [10]. A review article on automatic CAP analysis methods has been published [11], discussing the performance of automated tools for CAP analysis. Although highly relevant, that review is limited to the performance analysis. In contrast, this article presents a comprehensive study that not only evaluates the performance of automated tools for CAP analysis but also surveys additional articles, encompassing clinical applications and aspects of interpretability. By examining research trends, utilized features, and models, this article aims to answer the formulated research question.
Whilst a deconstruction of arousal circuitries in the human brain is in its infancy, with its cortical and subcortical sources remaining elusive [12][13][14], the CAP phenotype may provide an indirect fundamental biomarker of its activity [14][15][16][17]. Moreover, there is growing evidence that CAP and arousals underwrite the basic mechanisms of sleep regulation, with subtype A1 contributing to the build-up and consolidation of deep slow-wave sleep (SWS), whilst subtypes A2 and A3 contribute to the onset of rapid eye movement (REM) sleep or wakefulness [15], which is also in keeping with findings from recent animal studies [18].
Therefore, in a future clinical approach, it might be beneficial in some instances to target various subtypes of CAP, for instance, via new neuromodulation technologies or pharmacotherapy [14]. Moreover, it is likely that the ability to record a baseline (untreated) EEG CAP phenotype in the majority of sleep or neuropsychiatric disorders would enable a more individualized approach to be developed. For instance, it has previously been shown that cognitive reserve, daytime sleepiness, affective/mood symptoms, and OSA severity may all dictate the distinct CAP profile in individual patients [14,17,19]. Thus, the baseline (untreated) CAP profile may also shape any individualized response to future treatment in those disorders.
In this background, a systematic literature review was performed to examine the various methodologies for automatic CAP analysis. The study aimed to evaluate the prior work in this area and to identify current trends and advancements.
Considering the existing research and technology in this area, the review aimed to provide insights into the potential of these methods to transform the way sleep is analyzed and understood. The organization of this article is as follows. Section 2 presents the methods utilized in conducting the systematic literature review. Section 3 examines the studies included in the review, summarizing the methodology employed in each work. Section 4 consists of an analysis of the performance of the methods, and Sect. 5 concludes the article by presenting the main findings and highlighting the research agenda for future investigations.

Materials and methods
This section aims to provide a comprehensive overview of the process used to search and analyze the articles. This review study followed the 2020 Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines [20] to ensure that the examination is reproducible. Therefore, the eligibility criteria used to determine which studies to include or exclude are presented, specifying the data sources, the method of data collection, and the selection process.

Search procedure
The systematic article search was conducted using three leading databases: Web of Science, PubMed, and the Institute of Electrical and Electronics Engineers (IEEE). These databases were selected as they offer comprehensive coverage of articles from multiple publishers in various fields, thereby providing a thorough search for the intended topic (automatic methods for CAP examination). The Web of Science indexes an extensive collection of articles from multiple domains, while PubMed focuses on biomedical and life sciences. On the other hand, the IEEE database provides specialized coverage of engineering development analysis. The combination of these three databases offers a comprehensive and complementary search.
The database search was carried out on January 21st, 2023, and aimed to identify all relevant articles aligned with the search strings presented in Table 1. The keywords used in the search strings were chosen to reflect the topic of interest, focusing on sleep patterns and the two most common terms associated with CAP ("cyclic alternating pattern" and "CAP"), alongside "A phase". Additionally, the keywords "automatic" and "classification" were included to emphasize the focus on automatic procedures in the analysis. The number of results for each search string is presented in Table 1, and the total number of articles found in all databases was 1,280. Among these, 635 were duplicates; thus, the total number of unique articles was 645.

Eligibility criteria
The initial screening of the 645 articles was performed by two scorers, who reviewed the title and abstract of each article to determine its relevance. The inclusion criteria for the articles were: the article must describe an automatic analysis of CAP, including the classification of A phases, A phase subtypes, or CAP cycles, and be written in English. Articles that only classified the onset or offset of the A phase were not considered for inclusion, as such a method does not provide information about the entire A phase length. After this screening process, a total of 56 articles were selected for further examination.
Eight articles whose method does not examine the EEG sensor were not considered for the review as they employ an indirect analysis regarding the presence of CAP [21][22][23][24][25][26][27][28]. Articles that examined CAP's signal characteristics but did not provide a fully automatic methodology for A phase, A phase subtypes, or CAP cycle classification were also excluded. As a result, 35 studies were selected for the systematic review. The PRISMA procedure is depicted in Fig. 1.
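The selection counts reported above can be cross-checked with a short tally. The sketch below only restates the numbers given in the text; the variable names are hypothetical, and the last quantity assumes that the two exclusion reasons described after screening (no EEG analysis and no fully automatic methodology) are the only ones.

```python
identified = 1280   # records returned by the three database searches
duplicates = 635    # duplicated records removed
screened_in = 56    # records kept after title and abstract screening
non_eeg = 8         # excluded because CAP was assessed without the EEG signal
included = 35       # studies in the final review

unique = identified - duplicates                  # records screened
excluded_at_screening = unique - screened_in      # removed by title/abstract
excluded_after_screening = screened_in - included
# Assumption: every remaining exclusion was for lacking a fully automatic method.
not_fully_automatic = excluded_after_screening - non_eeg

print(unique, excluded_at_screening, excluded_after_screening, not_fully_automatic)
# 645 589 21 13
```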
The distribution of the selected articles based on their year of publication is presented in Fig. 2. From this figure, it is evident that the search for methods for automatic analysis has lasted for over two decades. It is also noticeable that there was a nearly stagnant period until 2010. However, interest resurged after that, largely due to the advancements in machine learning algorithms and the ability to process larger data sets. This tendency accelerated after 2018, as more than half of the articles (20) were published in the past five years, indicating the significance of the topic and the requirement for a comprehensive review that can consolidate the knowledge, highlight the trends, and identify new directions for exploration.

Performance analysis
As most studies included in the systematic review employ machine learning concepts and algorithms, four standard performance metrics were considered to assess the relevance of the method's performance since these were previously shown to be suitable for comparing dissimilar works in a review [29]. Specifically, the metrics were Accuracy (Acc), Sensitivity (Sen), Specificity (Spe), and Area Under the receiver operating characteristic Curve (AUC). Other performance metrics reported by the articles were included but not further analyzed.

Results
This section summarizes the included articles, presenting their methodologies and results. It is divided into three subsections, following the evolution of automatic classification approaches, from threshold-based classifiers to the conventional machine learning models, concluding with the deep learning models. The results are summarized in Tables 2, 3, and 4 for the A phase classification, A phase subtypes estimation, and CAP cycle detection, respectively. Most works used the CAP sleep database in the examination [6,30]. Some works evidently used samples from this database without explicitly mentioning it; these are not marked in the tables as using data from that database. Likewise, some works report which subjects from the database were used, but without specifying the demographic characteristics. Hence, these characteristics were not included for these works.

Threshold-based methods
EEG exhibits complex patterns and generates substantial data during a full-night examination. Several of these patterns are associated with CAP [10] and comprise amplitude and frequency characteristics. A total of 10 studies proposed to automate the CAP analysis by relying on custom thresholds to identify the A phases [31][32][33][34][35][36][37][38][39][40]. Lima and Rosa [31] proposed a method that relied on an EEG signal model and looked for changes in the squared signal to detect the A phase. Afterward, a Finite State Machine (FSM) was employed for the CAP cycle detection. Rosa et al. [33] also used an FSM for CAP cycle detection, but employed a method based on a matched filter with a variable length and relative amplitude sliding template to detect A phases and then determined the end of these phases using a convolution-based procedure. Nevertheless, modeling EEG signals, which are complex and generate large amounts of data during a full-night examination, presents a difficult challenge.
There is a need to identify characteristics in the data that can emphasize patterns while reducing the amount of information. These characteristics are usually named features, and several have been proposed for CAP analysis. Navona et al. [34] adopted this approach, proposing an A phase detection based on the computation of descriptors for characteristic EEG bands (including delta, 0.75-4 Hz, and theta, 4-8 Hz). These features were also used by Barcaro et al. [35]. Largo et al. [36] further extended this idea by proposing an activity index that computes two moving averages, one short and the other long, from EEG bands (the standard bands with the delta band split into three sub-bands, 0.5-1 Hz, 1-2 Hz, and 2-4 Hz) obtained from a discrete wavelet transform.
Mariani et al. [37] further analyzed the band descriptors (the conventional bands with the delta subdivided into low, 0.5-2 Hz, and high, 2-4 Hz). They also utilized the differential variance of the EEG signal (calculating the difference in variance between consecutive windows) and Hjorth descriptors in the low delta and high delta bands. These Hjorth descriptors were activity (variance of the signal segment) and mobility (the square root of the ratio of the activity of the first derivative of the signal to the activity of the original signal). It was concluded that differential variance provides the highest Acc and Spe. Mariani et al. [38] first segmented the EEG signal using a FeedForward Neural Network (FFNN) to separate the Non-Rapid Eye Movement (NREM) sleep. Then, they used the previously mentioned features (5 band descriptors, Hjorth activity, and differential variance) for A phase analysis and applied the CAP scoring rules to identify the CAP cycles. Machado et al. [40] examined subjects with Nocturnal Frontal Lobe Epilepsy (NFLE) and computed, for the five standard EEG bands, the band descriptors and the Teager Energy Operator (TEO). It was concluded that the best performance for the A1 and A2 subtypes was attained using TEO in the delta bands, while for A3, it was attained using the beta band. A different approach was proposed by Fantozzi et al. [32], who studied healthy subjects and subjects with sleep disorders, including insomnia, bruxism, Sleep-Disordered Breathing (SDB), and REM Behavior Disorder (RBD). They filtered the EEG signal into two bands (slow, 0.3-4.5 Hz, and fast, 7-25 Hz) and then proposed an algorithm that uses the root mean square of the signal to identify the presence of A phases. Niknazar et al. [39] also proposed a conceptually different algorithm based on the statistical behavior of local extrema to determine the start and end times of the A phases by examining the EEG delta band.
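Several of the descriptors named in this subsection have compact definitions. The sketch below, in plain Python, computes the Hjorth activity and mobility and the differential variance as they are defined in the text, together with the usual discrete form of the TEO, x[n]^2 - x[n-1]x[n+1]. It then applies a hand-tuned threshold to the differential variance. The sampling rate, window length, threshold value, and the synthetic test signal are all assumptions made for illustration. The reviewed works compute such descriptors on band-filtered EEG with their own settings, which are not reproduced here.

```python
import math
from statistics import pvariance


def hjorth_activity(x):
    """Hjorth activity: variance of the signal segment."""
    return pvariance(x)


def hjorth_mobility(x):
    """Hjorth mobility: sqrt(activity of first derivative / activity of signal)."""
    dx = [b - a for a, b in zip(x, x[1:])]  # first difference as the derivative
    return math.sqrt(pvariance(dx) / pvariance(x))


def differential_variance(x, win):
    """Difference in variance between consecutive non-overlapping windows."""
    var = [pvariance(x[i:i + win]) for i in range(0, len(x) - win + 1, win)]
    return [b - a for a, b in zip(var, var[1:])]


def teager_energy(x):
    """Discrete Teager Energy Operator: x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def flag_windows(x, win, threshold):
    """Mark windows whose variance increase exceeds a hand-tuned threshold."""
    return [d > threshold for d in differential_variance(x, win)]


# Synthetic 4 s signal at a hypothetical 100 Hz: a 2 Hz sine whose amplitude
# rises from 1 to 5 during the third second, mimicking a transient burst.
FS = 100
toy_eeg = [
    (5.0 if 200 <= n < 300 else 1.0) * math.sin(2 * math.pi * 2 * n / FS)
    for n in range(4 * FS)
]
print(flag_windows(toy_eeg, win=FS, threshold=1.0))  # [False, True, False]
```

The single `True` marks the transition into the high-amplitude second. Changing the threshold or the window length changes which windows are flagged, which is the tuning burden that threshold-based methods carry.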

Methods based on conventional machine learning models
The use of threshold-based methods for CAP signal analysis may seem intuitive, given that these signals exhibit dissimilar amplitude and frequency characteristics. However, it is challenging to generalize a threshold tuned for a specific dataset to a broader population. This difficulty is evident from the trend observed in the year of publication (Fig. 2); except for Fantozzi et al. [32], all other works relying on threshold methods had been published prior to 2015. This likely reflects the generalization problem inherent in threshold-based processes. The following analysis focuses on methodologies that use machine learning algorithms, enabling the models to learn the relevant characteristics from the data. A total of 12 articles compose this examination.
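To make the contrast with a fixed threshold concrete, the sketch below shows a learned decision rule in one of its simplest forms, a k-nearest-neighbours classifier, which is among the conventional classifiers that appear in the reviewed works. The two-dimensional feature vectors, their values, and the function name are hypothetical and serve only to illustrate the idea; no reviewed study is reproduced.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a feature vector by majority vote among its k nearest neighbours."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up training set: each row is (feature 1, feature 2) for one epoch;
# label 1 marks an A phase epoch and label 0 a non-A phase epoch.
train_x = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15), (1.5, 1.2), (1.7, 1.4), (1.6, 1.1)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, (1.4, 1.0)))  # 1
print(knn_predict(train_x, train_y, (0.1, 0.3)))  # 0
```

Here the decision boundary follows from the labelled examples rather than from a value tuned by hand, so it can be re-estimated when data from a different population become available.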
In their study, Mariani et al. [41] suggested using an FFNN fed with the features described by Mariani et al. [38]. However, only the NREM sleep was analyzed. Although it is logical to isolate NREM sleep, manually doing so can hamper the practical applicability of the proposed methodology. It is, therefore, advisable to either keep all sleep data or employ an automatic process to segment the NREM sleep. Another important aspect is the used postprocessing procedure that divided the scored long A phases (over 60 s) into two separate A phases using a neural network-based clustering method. Both preprocessing and postprocessing are critical components in machine learning, as the former prepares the data, while the latter corrects some misclassifications. A Support Vector Machine (SVM) (fed with similar features to those used in the previous study) was employed by Mariani et al. [42], presenting a postprocessing procedure capable of correcting misclassifications by changing isolated 1-second classes to the adjacent class. Later, Mariani et al. [  sliding windows of 4 s with 2 s of overlap. The classification was then carried out using the k-Nearest Neighbors (k-NN) algorithm. Machado et al. [50] further expanded this concept by creating a methodology for identifying the subtypes of the A phase directly from the EEG signal. They utilized the EEG band descriptors (previously described), TEO, zero-crossing, Lempel-Ziv complexity, characteristics of the discrete-time short-time Fourier transform signal (such as frequency of maximum and mean energy and area under the magnitude spectrum curve), empirical mode decomposition, Shannon entropy, fractal dimension, and variance of the EEG signal. A total of 55 features were created, and the minimum Redundancy Maximum Relevance (mRMR) algorithm was used to rank them. The top 40 ranked were fed into an SVM (k-NN and linear DA were also examined but attained a lower Acc). Later, Machado et al. 
[51] used the same methodology but provided results for A phase detection and examined the use of Principal Component Analysis (PCA) to reduce the features' dimensionality. However, the results without PCA were superior.
The same classifiers were examined by Karimzadeh [52] for CAP cycle detection. An SFS procedure was then used to determine the most relevant features, choosing Kolmogorov, Shannon, and Sample Entropy to feed an SVM (best-performing classifier).

Methods based on deep learning models
Despite being intuitive, relying on features designed by researchers has significant drawbacks in the context of analyzing CAP patterns. Feature engineering is a demanding process that requires expertise and thoughtful consideration, often involving a feature selection procedure to identify the most relevant features for the problem at hand. This process can be time-consuming and may not always result in optimal features. Additionally, the features are limited in their ability to capture complex patterns and relationships in the data, leading to poor generalization and potential scalability issues as the amount of data increases. In contrast, deep learning-based methods can automatically learn relevant features from the data, uncovering patterns that may not be immediately apparent to humans. This eliminates the need for manual feature engineering and allows for the effective handling of large amounts of data. A total of 13 articles employed deep learning classifiers. Mostafa et al. [53] propose the first deep learning model for A phase analysis (in 2018), using a Deep AutoEncoder (DAE), whose output was then stored in a buffer to feed a subsequent FFNN responsible for classifying the CAP cycles. Mendonça et al. [54] provided the preprocessed EEG signal to three classifiers, two are based on a Recurrent Neural Network (RNN), precisely, the Long Short-Term the four classifiers, the DA achieved the highest accuracy and specificity.
Linear DA classifier was also used by Mendonça et al. [44], which segmented the EEG signal into two-second segments and estimated six time-based features (average power, standard variation, Shannon entropy, autocovariance, log-energy entropy, TEO) and five frequency-based features by examining the Power Spectral Density (PSD) in the delta, theta, alpha, sigma, and beta bands. Sequential Feature Selection (SFS) identified PSD in the beta, theta, and alpha bands, average power, TEO, and standard deviation as the most relevant features. An FSM was also used to assess the CAP cycles. Later, Mendonça et al. [45] expanded the work by examining nine more classifiers, the Logistic Regression (LR), two tree-based methods (one with and one without ensemble), SVM, kNN, two variants of the FFNN, and unsupervised learning-based classifiers, the SelfOrganizing Map (SOM) and the k-Means Clustering (k-MC). It was concluded that the standard FFNN outperformed the other classifiers using the PSD in the theta and beta bands, Shannon entropy, TEO, and autocovariance as features.
Dhok et al. [46] used the Wigner-Ville distribution to analyze two-second segmented data, which enables exhaustive time-frequency analysis. They then calculated the Rényi entropy and fed the result into an SVM to classify the A phase. To ensure balanced performance, they performed a balancing operation. A time-frequency approach was also proposed by Sharma et al. [47], using an orthogonal filter bank and wavelet to decompose the EEG signal into six subbands. Then they computed the wavelet entropy and three Hjorth parameters (activity, mobility, and complexity) from each subband to produce 48 features. Two tree-based classifiers (one with bagging and the other with boosting), SVM, and k-NN were studied for the A phase classification. The tree-based classifier with bagged trees attained the best performance using balanced data (the authors reported individual performance for multiple sleep disorders, however, in Table 2, only the healthy subjects' results are shown). Sharma et al. [48] also used wavelet decomposition to attain six subbands and computed both the approximate and entropies for each band. An ensemble of boosted trees was then used to classify the occurrence of A phases with a balanced dataset.
Mendez et al. [49] presented a method for further distinguishing A-phase subtypes from previously classified A-phases. For this purpose, two-second segments were analyzed and computed for each segment the mode, standard deviation, skewness, kurtosis, energy, and power after spectral decomposition of the EEG signal in four bands (delta, theta, alpha, and beta). Complexity and entropy measurements (Lempel-Ziv Complexity, Sample Entropy, Fractal Dimension, and Tsallis Entropy) were also computed in          is fed with data that overlap to the right, and the second CNN receives inputs that overlap to both the left and right. The third CNN uses data that overlaps to the left. The output from the three classifiers was combined to classify the A phase or its subtypes. Additionally, they introduce the A-phase index as a complementary perspective for CAP analysis, which provides a visual representation of sleep stability. The study involved healthy subjects and individuals with sleep disorders (NFLE, insomnia, and SDB), but the results in Tables 2 and 3 pertain solely to the healthy subjects.
Deep learning models can also incorporate features as input. Hartmann and Baumert [63] explored this possibility using Hjorth activity, Shannon entropy, TEO, differential EEG variance, and band descriptors. These features were fed to three conventional machine learning models (linear DA, k-NN, and FFNN) and an LSTM that outperformed the other classifiers in A phase classification. Mendonça et al.
[64] compared the performance of deep learning models fed with features against the same model provided directly with the preprocessed EEG signal. The features analyzed three main aspects of the EEG signal: amplitude through symbolic dynamics and an amplitude variation metric; frequency through PSD of the five characteristic EEG frequency bands; and the ratio of the maximum amplitude of an epoch to its calculated PSD, which represented both amplitude and frequency. The relevance of the features was measured using mRMR, and the most important were employed for the A phase subtype classification. The results indicated that using features improved performance, likely because the limited data did not allow the deep learning model to learn all relevant characteristics. These features were later used by Mendonça et al. [65] that conducted a similar analysis but proposed the Heuristic Oriented Search Algorithm (HOSA) for optimizing the structure of deep learning models. The authors examined the performance of LSTM fead with features agains the LSTM fead with the preprocessed EEG signal, and concluded again that the use of the featurebased model was superior for the same reason as previously stated. They also tested a FFNN and a CNN, and performed CAP cycle detection using a FSM.

Discussion
This section examines and discusses the reported results of the surveyed works. The performed classification was first explored, followed by an overview of the used features and classifiers and their relation to the CAP analysis.
Memory (LSTM) and the Gated Recurrent Unit (GRU). The last model was a Convolutional Neural Network (CNN) with one-dimensional input and custom architecture. The result of the A phase classification was then fed to an FSM to classify the CAP cycles. It was reported that LSTM attained the utmost performance. Mendonça et al. [55,56] followed a similar methodology with an LSTM, which was also the classifier employed by Hartmann and Baumert [57] (as a comparison, an FFNN was used, which achieved lower performance). They propose cleaning procedures to remove cardiac field and eye movement artifacts. Furthermore, a balancing process was employed to balance the data. The network structure was optimized by a genetic algorithm and particle swarm optimization, reaching the best performance using three EEG derivations as input. The LSTM layers performed the information fusion and provided the result to dense layers to classify the A phases.
Murarka et al. [58] presented a CNN architecture with one-dimensional input and employed an undersample balancing technique. The results in Table 2, however, show the unbalanced data performance to enable comparison with other deep learning studies. The authors evaluated the individual performance for various sleep disorders, but Table 2 only displays the results for healthy subjects. Loh et al. [59] adopted a similar approach by proposing a CNN architecture and using a balancing method. Therefore, Table 2 presents the unbalanced data performance (for the same reason as before).
Arce-Santana et al. [60] proposed another CNN architecture fed with spectrograms, which in this work are two-dimensional representations of four-second segments of the EEG signal. The authors followed a training procedure where the network was first trained using 12.5% of the subject's data and then used to classify the remaining 87.5% segments. Afterward, the network was retrained with 20-50% of the data classified by a specialist. To allow for comparison with other deep learning studies, Tables 2  and 3 include results without the retraining procedure. The proposed algorithm is capable of classifying the A phase and its subtypes. A methodology with the same classification capability was presented by You et al. [61], proposing an encoder-decoder CNN architecture based on the U-net framework (with skip connections) with a transformer layer incorporating a gated multi-head attention mechanism. The article reports performance for healthy and subjects NFLE subjects. However, Tables 2 and 3 only comprise the results related to the entire sleep data of the healthy subjects.
Mendonça et al. [62] put forward a method that employs long windows of EEG signals with overlapping durations (ranging from 15 to 23 s) as inputs for an ensemble of three CNNs. Each CNN has a one-dimensional input and is optimized separately using the HOSA algorithm. The first CNN 1 3 methodology used, as a possible alternative to manual scoring. Violin plots with the results for the main examined performance metrics are shown in Fig. 3. It was reported that sleep specialists' agreement to score CAP events could range from 69 to 78% [66]. By checking the distributions from Fig. 3, it is notorious that the median is around 78%, which is precisely the upper specialist agreement. Although in a crude examination, it can be inferred that the current automatic models are as good as specialist scoring CAP, supporting the viability of automation for CAP examination. It is also worth noting that most works used the same dataset and examined the same subjects, making this analysis less subjective.
The spread in performance can be attributed to the substantially different methodologies employed. However, it is worth noting that methods that require manual isolation of A phases or consider only data from NREM sleep may enhance model performance but prove impractical for real-world applications. It is also crucial to ensure subject-independent results to avoid bias, particularly when the number of subjects is low. Furthermore, the AUC suggests that reported performance is reasonably balanced, with similar sensitivity and specificity. This is significant because CAP analysis is naturally unbalanced, with far less data belonging to the A phases than to the not A phase class. As a result, a high Acc is ambiguous unless sensitivity and specificity are also reported. Similarly, if a balancing operation is conducted, the test data should remain unchanged, as modifying the natural data distribution makes it impossible to ascertain whether the reported results will generalize to new, unseen data.
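The ambiguity of Acc under class imbalance can be illustrated with a minimal sketch. The label vector below is hypothetical (100 one-second epochs, of which 10 are A phase, a proportion chosen only for illustration), and the function name `acc_sen_spe` is not taken from any reviewed work.

```python
def acc_sen_spe(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels,
    where 1 marks an A phase epoch and 0 a not A phase epoch."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    # Guard against recordings that lack one of the two classes.
    sen = tp / (tp + fn) if (tp + fn) else float("nan")
    spe = tn / (tn + fp) if (tn + fp) else float("nan")
    return acc, sen, spe


# Hypothetical, unbalanced test set: 10 A phase epochs and 90 not A phase epochs.
y_true = [1] * 10 + [0] * 90
# A degenerate classifier that never detects an A phase.
always_negative = [0] * 100

print(acc_sen_spe(y_true, always_negative))  # (0.9, 0.0, 1.0)
```

In this illustration, the degenerate classifier reaches an Acc of 90% while detecting no A phase at all, which is why Sen and Spe need to accompany Acc.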
Regarding the A phase classification performance, the highest Acc (92.5%) and Spe (96.1%) were reported by You et al. [61], but their Sen was too low (63.6%), which is aggravated by the inherent imbalance in A phase analysis

Reviewed works' performance
Out of the 35 articles reviewed, as observed in Tables 2 and 3, 28 performed binary classification of EEG epochs as either A phase or not A phase, with seven using a threshold-based classifier, nine using conventional machine learning classifiers, and 12 using a deep learning classifier. Additionally, ten articles examined the A phase subtypes. Among them, two used a threshold-based classifier, three used conventional machine learning classifiers, and five used a deep learning classifier.
Various approaches were employed for subtype detection, shown in Table 3, including multiclass models, individual models for each subtype, and models that separate previously classified A phases. This diversity of methodologies makes it impractical to compare the results. Furthermore, six studies conducted both A-phase and A-phase subtype analyses [34,51,57,60-62], while the remaining studies [40,49,50,64] only performed A-phase subtype classification.
Lastly, ten articles examined the CAP cycles, presented in Table 4, with three using a threshold-based classifier, two using conventional machine learning classifiers, and five employing an FSM to implement the CAP scoring rules for scoring the CAP cycles. It is worth noting that no work employed a deep learning model for directly classifying the CAP cycles. Furthermore, most methodologies used for CAP cycle detection rely on a prior A phase classifier whose output is fed to an FSM that imposes the CAP cycle rules. Only three works [31,33,52] directly classified the CAP cycles without first estimating the A phases.
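The idea of feeding the output of an A phase classifier to an FSM can be sketched as follows. This is a simplified illustration and not the implementation of any reviewed work. It assumes one binary label per second, the commonly cited 2 to 60 s duration limits for the A and B phases, and that A phases separated by less than 2 s are merged. It does not enforce a minimum number of consecutive cycles or any sleep stage constraint. The names `score_cap_cycles`, `MIN_DUR`, and `MAX_DUR` are hypothetical.

```python
MIN_DUR = 2   # assumed minimum duration of an A or B phase, in seconds
MAX_DUR = 60  # assumed maximum duration of an A or B phase, in seconds


def score_cap_cycles(a_phase):
    """Map per-second A phase labels (1/0) to per-second CAP cycle labels.

    A cycle (A phase followed by B phase) is marked only when both phases
    respect the duration limits and the B phase is closed by a new A phase.
    """
    cap = [0] * len(a_phase)
    state = "IDLE"          # IDLE, A, B, or SKIP (inside an overlong A phase)
    a_start = b_start = 0
    for i, label in enumerate(a_phase):
        if state == "IDLE":
            if label == 1:
                a_start, state = i, "A"
        elif state == "A":
            if label == 1:
                if i - a_start + 1 > MAX_DUR:
                    state = "SKIP"            # A phase too long: discard it
            elif i - a_start >= MIN_DUR:
                b_start, state = i, "B"       # valid A phase, B phase begins
            else:
                state = "IDLE"                # A phase too short
        elif state == "B":
            if label == 1:
                if i - b_start >= MIN_DUR:
                    for k in range(a_start, i):
                        cap[k] = 1            # close the cycle A + B
                    a_start = i               # the new A phase opens a candidate
                # otherwise the gap is too short: merge with the previous A phase
                state = "A"
            elif i - b_start + 1 > MAX_DUR:
                state = "IDLE"                # B phase too long: isolated A phase
        else:  # SKIP
            if label == 0:
                state = "IDLE"
    return cap


# Hypothetical classifier output: A (3 s), B (5 s), A (4 s), 70 s of silence, A (3 s).
labels = [0, 0] + [1] * 3 + [0] * 5 + [1] * 4 + [0] * 70 + [1] * 3 + [0] * 2
cap = score_cap_cycles(labels)
print(sum(cap))  # seconds scored as belonging to a CAP cycle
```

In this example only the first A phase and the B phase that follows it are scored as a CAP cycle. The second A phase closes the cycle but is followed by more than 60 s without a new A phase, so it is treated as isolated. Any error made by the upstream A phase classifier propagates directly to the cycle labels.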
The follow-up analysis focused solely on A phase and CAP cycle detection, aiming to evaluate the current state-of-the-art classification performance, regardless of the
statistics-based features (such as mean or kurtosis), entropy-based features (such as Shannon entropy), and complexity-based features (which explore the signal complexity without relying on entropy). Additionally, Hjorth parameters were included as a separate category since these comprise different metrics, and as some studies did not identify which one was used, it was impossible to classify them into the previous six categories. It should be noted that certain features may fit into multiple categories, but each feature was only associated with one class to simplify the examination.
The number of times each feature-based category was used and the year of publication of the study that used it are presented in Fig. 4. Upon examining the figure, it is evident that amplitude-based features were the most frequently used and were reported in studies published throughout the analyzed period. This suggests a strong preference for using these features, possibly due to the predominance of A1 phase subtype characteristics in healthy subjects and the strong association between this subtype and EEG amplitude variations. While other categories of features can also examine these properties, it is noteworthy that frequency-based features were used less frequently, despite the strong association between frequency components and CAP. Entropy-based, complexity-based, and Hjorth descriptor features may also be suitable for CAP examination, as they can detect the complex and variable patterns of brain activity during the A phases.
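To make some of these categories concrete, the sketch below computes the three Hjorth parameters (activity, mobility, and complexity) and a histogram-based Shannon entropy for one epoch. It is a minimal illustration using their standard definitions; the bin count, the 100 Hz sampling rate, and the synthetic 10 Hz epoch are assumptions made only for the example and do not reproduce the feature extraction of any reviewed study.

```python
import math
from statistics import pvariance


def diff(x):
    """First-order difference, a discrete approximation of the derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth(x):
    """Return the Hjorth activity, mobility, and complexity of a signal."""
    dx = diff(x)
    ddx = diff(dx)
    activity = pvariance(x)                               # signal power
    mobility = math.sqrt(pvariance(dx) / activity)        # mean frequency proxy
    complexity = math.sqrt(pvariance(ddx) / pvariance(dx)) / mobility
    return activity, mobility, complexity


def shannon_entropy(x, bins=8):
    """Shannon entropy (bits) of the amplitude histogram; bins is an assumption."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0       # avoid a zero width for flat signals
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(x)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


# Synthetic one-second epoch: a 10 Hz sinusoid sampled at an assumed 100 Hz.
fs = 100
epoch = [math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]
activity, mobility, complexity = hjorth(epoch)
print(activity, mobility, complexity, shannon_entropy(epoch))
```

For a pure sinusoid the complexity is close to one, and it increases as the waveform departs from a sinusoidal shape, which is one reason such descriptors can respond to the irregular activity of the A phases.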
The subsequent examination is related to the classifiers used by the reviewed works. The distribution of the classifiers by the year of publication of the study and the number of times each classifier was used are presented in Fig. 5. The
and limits the method's practicality. In contrast, Loh et al. [59] reported the highest sensitivity (92.1%), but their accuracy (53.0%) was nearly at a random level, rendering the approach unreliable. Therefore, the method proposed by Mendonça et al. [62] is likely the most suitable for clinical application since it reported the best-performing balanced results (Acc, Sen, and Spe of 83.3%, 80.1%, and 83.7%, respectively) and did not require any manual manipulation of the EEG signals (such as isolating NREM sleep). It is worth mentioning that all three of the indicated studies employed a CNN-based classifier, providing evidence for the suitability of deep learning models in A phase analysis. As for CAP cycle detection, the method of Rosa et al. [33] achieved the highest performance, with an accuracy, sensitivity, and specificity of 89.8%, 89.8%, and 95.0%, respectively. However, since the study only evaluated four subjects, the generalizability of the results may be limited.

Overview of the used methodologies
The patterns contained within the CAP phases comprise characteristics from the signal's amplitude and frequency. As a result, most feature-based studies tend to examine features that explore these domains. These features were categorized into three groups: amplitude-based features (which assess variations in the signal amplitude), frequency-based features (which examine characteristics in the frequency domain, such as the PSD), and amplitude-frequency-based features (for example, the ratio of maximum amplitude to the calculated PSD of an epoch). However, some features do not fit into these categories, so three additional categories were included:
on an initial A-phase classifier. Furthermore, the assessment of the A phase subtypes' performance has proven challenging due to the use of various approaches, ranging from classification with a multiclass model to using individual models for each subtype.
While the current studies have methodological limitations, the performance results determined in this review are consistent and can be considered a reasonable estimate. Notably, the median accuracy of the state-of-the-art methods was comparable to the upper limit of the specialist agreement range, suggesting that automatic CAP analysis can be reliably performed. Therefore, this study provides a positive answer to the main research question.
The recommended research agenda involves validating the proposed methodologies on larger datasets, including more subjects with sleep-related disorders, providing the source code for independent confirmation of the proposed methods, and exploring the possibility of including CAP analysis as a standard sleep examination practice in the future.
Author contributions All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Fábio Mendonça and Sheikh Mostafa. The first draft of the manuscript was written by Fábio Mendonça and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding Open access funding provided by FCT|FCCN (b-on). This research was funded by LARSyS (project -UIDB/50009/2020) and Sheikh Mostafa has received research support from ARDITI (project -
majority of the classification methodologies used a threshold solution. However, it is important to note that, except for one study published in 2021, all other articles that used this methodology were published up to 2015. In contrast, neural network-based methods have been more prevalent in the past four years, primarily due to the growing popularity of deep learning-based approaches. The fact that the best results for A phase analysis were obtained using deep learning models, combined with the continuous growth of available data, suggests that the trend of using neural networks is likely to persist, further reducing the use of conventional machine learning models.

Conclusion
This study aimed to determine whether automatic CAP analysis is currently achievable. A systematic review was performed to address this question by searching three prominent databases: a standard indexing database, a database dedicated to medical publications, and a database focused on engineering applications. A total of 35 articles were reviewed (from the 1,280 articles initially found), published between 1998 and January 21st, 2023. These studies proposed various methods for automatically examining CAP, including the classification of A-phase, their subtypes, or the CAP cycles.
Three main trends were observed over time regarding the A phase classification. Initially, either mathematical models or features were utilized and classified with a tuned threshold. This trend was followed by the adoption of conventional machine learning models, which were the norm until the last five years, when the application of deep learning models surged. Regarding the classification of CAP cycles, most studies employed an FSM-based approach after A-phase classification to implement the CAP scoring rules. As such, these methods depend

Conflict of interest
The authors have no relevant financial or nonfinancial interests to disclose.
Ethics approval Ethics approval was not required for this study.
Informed consent Informed consent was not required for this study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.