1 Introduction

Sleep is one of the most natural and essential requirements for the human body to function properly, both physically and mentally [1]. During sleep, the muscles relax while the brain retains new information [2], eliminates toxic waste, and releases growth hormones. The body also repairs cells and replenishes energy [3]. Normal healthy sleep is defined by adequate duration, good quality, suitable timing, regularity, and the absence of sleep disruptions or disorders. Insomnia, excessive drowsiness, and obstructive sleep apnea (OSA) are among the sleep disorders that can negatively affect mental, physical, social, and emotional well-being. This explains why neuro-anatomists and neuro-physiologists have been researching sleep for over a century [4, 5].

Sleep scoring is the process of extracting sleep cycle information that can be used to detect and treat sleep-related disorders. Traditionally, sleep scoring is performed using polysomnogram (PSG) recordings obtained from patients during overnight sleep in a sleep laboratory. Sleep scoring professionals then analyze the PSG recordings and classify the sleep into stages using either the Rechtschaffen and Kales (R&K) guidelines or the more recent recommendations published by the American Academy of Sleep Medicine (AASM). According to the R&K guidelines, PSG recordings are first divided into 20- or 30-second epochs, which are then classified as wakefulness (W), rapid eye movement (REM) sleep, or non-REM (NREM) sleep. NREM sleep is further subdivided into four stages (S1, S2, S3, and S4). In contrast, the AASM standards merge NREM stages S3 and S4 into a single deep sleep stage known as N3, also called slow-wave sleep (SWS) [6].

PSG recordings comprise electrooculograms (EOG), electroencephalograms (EEG), electrocardiograms (ECG), and electromyograms (EMG). Of these, the EEG is among the most important physiological signals in terms of clinical relevance and practical utility. The EEG spectrum is categorized into five rhythms (frequency bands), namely α, β, γ, δ and θ. The frequency and amplitude of these rhythms can provide important insight for identifying, observing, and treating neurological abnormalities and diseases [7]. When a subject is awake, the EEG shows β waves, which are high in frequency and low in amplitude. N1 is the dozing-off stage; during this stage, the frequency of the brain waves begins to slow, and the stage generally lasts 5-10 minutes per cycle. In the second stage of sleep, the body enters a more subdued state, with a decrease in body temperature and pulse rate. Brain activity in this stage is slower than in the earlier stage, and a new pattern appears: a sharp, rapid fall and rise in the amplitude of the brain waves, termed the K-complex [8], followed by short bursts of activity called sleep spindles [9]. The second stage lasts 20-25 minutes per cycle in a healthy person. The third stage of sleep is also known as deep sleep. It is crucial because this is when the body is most relaxed and the brain produces growth hormones, allowing muscular tissues to mend and grow. Inadequate deep sleep may leave a person feeling heavy and sluggish in the morning. This stage lasts about 20-40 minutes per cycle, depending on the person’s health and age. As humans grow older, growth hormones decrease, resulting in less deep sleep [10]. During the REM stage, the eyes move swiftly from side to side beneath closed eyelids, breathing becomes rapid, and the heart rate accelerates, giving the appearance of wakefulness. A person is more likely to dream during the REM stage, and the arms and legs are briefly paralysed to prevent the dreamer from acting out the dream [11].

The sleep cycle begins with N1 and advances to N2 and then N3; the body then returns to N2 and eventually reaches REM. After REM, the body returns to N2, and the complete sleep cycle repeats. A healthy individual experiences this sleep cycle four to five times every night, with each cycle lasting longer than the one before it [12]. Figure 1 shows sample EEG epochs for each sleep stage, taken from one subject in the database.

Fig. 1 Signal representing all the stages of a subject (subject ID: 10198)

Traditional sleep scoring is a time-consuming, labor-intensive procedure that is vulnerable to human error because it requires many hours of continuous examination. Aside from the expense and difficulty of assessing sleep patterns, patients must spend the night in a fully equipped sleep laboratory with sticky electrodes and cables attached to their heads. Sleeping in this unfamiliar environment is uncomfortable, and the recording setup itself may reduce the patient’s sleep efficiency. Due to these challenges, several automated methods have been proposed for sleep-stage scoring [13,14,15,16]. In this paper, we propose an automated sleep scoring system that uses subbands (SBs) obtained from a biorthogonal wavelet filter bank (BWFB) with optimal stopband energy (SBE), applied to only two channels of EEG signals. For each SB, Hjorth parameters (HP) are computed for classification purposes. Since the proposed approach uses only two EEG channels, it may be easier to deploy than prior state-of-the-art systems that used full PSG or numerous EEG channels and other physiological inputs for sleep scoring [17]. As a result, the patient’s comfort may be enhanced.

Many studies [18,19,20,21,22,23,24,25,26,27,28,29,30] have addressed automatic sleep stage classification. Sharma et al. [13] presented a sleep scoring method for healthy subjects (good sleepers) and unhealthy subjects using the cyclic alternating pattern (CAP) sleep database [31, 32], which consists of 108 PSG files [31]. They employed norm-based features and attained an accuracy of 85.3%. Langkvist et al. [33] proposed a five-stage automated sleep stage classification system based on data from St. Vincent’s University Hospital and University College Dublin (SVUH) [31]. They used one EEG, one EMG, and two EOG signals, developed a deep belief network, and achieved a best result of 72.2% on a database containing only 25 subjects. Tzimourta et al. [34] used the Institute of Systems and Robotics, University of Coimbra (ISRUC) sleep dataset [35] for EEG-based automatic sleep stage classification. Of the 118 healthy and unhealthy subjects available, they used 100 subjects and 87,187 epochs (30 s) in total for five-stage classification with six EEG channels. Using time- and frequency-based features coupled with a random forest classifier, they obtained a best accuracy of 75.3%. Michielli et al. [5] used PhysioBank’s Sleep-EDF database [36, 37] for automated sleep stage classification based on a long short-term memory (LSTM) approach [31]. They achieved 83.6% accuracy for two-class classification and 86.74% for five-class classification using ten healthy subjects. Willemen et al. [38] achieved 69% accuracy for five-class classification (WAKE-REM-N1-N2-N3) and 81% for three-class classification with an SVM classifier, using 85 PSG files obtained from 36 healthy subjects at Vrije Universiteit Brussel. Fonseca et al. [39] conducted sleep stage classification using cardio-respiratory signals obtained from PSG recordings of 48 subjects from the SIESTA project [40]. They achieved an accuracy of 69% for four-class classification and 80% for three-class classification using a multi-class Bayesian linear discriminant with time-varying probabilities. Phan et al. [41] proposed a convolutional neural network (CNN) framework for joint classification and prediction in automatic sleep stage scoring, using two datasets: (i) the sleep cassette (SC) subset of Sleep-EDF [31], containing 20 subjects, and (ii) the Montreal Archive of Sleep Studies (MASS) [42], containing 200 subjects. They achieved accuracies of 82.3% for SC and 83.6% for MASS in five-class classification. Shi et al. [43] achieved an accuracy of 81.1 ± 0.15% for two-class classification by employing multiple kernel learning on data from SVUH Dublin [31] containing 25 PSG recordings from 25 subjects with suspected sleep disorders. Yuan et al. [44] also used the SVUH Dublin dataset [31], containing a total of 25 PSG recordings; they employed a hybrid CNN as the classifier and obtained an accuracy of 0.7424 ± 0.594. Radha et al. [45] used an LSTM model with a dataset obtained from the EU SIESTA project [40] containing 292 subjects.

Thus, the studies mentioned above used various publicly available databases, such as CAP, Sleep-EDF, ISRUC, SVUH Dublin, Massachusetts Institute of Technology at Boston’s Beth Israel Hospital (MIT-BIH), Vrije Universiteit Brussel, the SIESTA project, and MASS. Across these studies, the reported classification accuracy ranged from roughly 69% to 87%, while the number of subjects ranged from 10 to 292. Hence, a large and diverse database with many subjects is essential for generalizing the results. The Wisconsin Sleep Cohort (WSC) is a good candidate that satisfies this requirement: it consists of two databases containing 2570 subjects, including good sleepers and patients suffering from various sleep disorders. Hence, we employed the WSC sleep database in this study.

The salient features of the proposed study are mentioned below:

  • We designed a new class of optimal linear-phase biorthogonal wavelet filter banks using the least-squares (LS) method, wherein the filter bank is a halfband pair filter bank (HPFB).

  • We have used the large Wisconsin Sleep Cohort (WSC), which includes WSC_dataset_1 (1715 PSG) and WSC_dataset_2 (716 PSG) and is larger than the databases used in other state-of-the-art studies.

  • From the total of 2431 subjects, 2,165,205 epochs (30 s each) have been extracted and used in this study.

  • Our proposed model used only two unipolar EEG signals (O1_M2 and C3_M2). Hence, it can easily be incorporated into user-friendly hardware and devices.

  • Both single-channel and combined dual-channel configurations are considered in this study.

  • Feature extraction is done using HP, a good discriminator for EEG signals.

  • For classification, we employed supervised machine learning classifiers with a 10-fold cross-validation technique.

  • Existing systems were typically developed using a single database, so their generalization to new databases is unknown. To investigate the generalization of the proposed system, we evaluated the model on the following publicly available databases: CAP, Sleep-EDF, ISRUC, MIT-BIH, and the sleep apnea database from St. Vincent’s University.

  • Our system was found to be better than or competitive with existing state-of-the-art systems when tested on the five above-mentioned databases other than WSC.

  • The proposed method yielded good accuracy and Kappa score using two unipolar EEG signals. Thus, the proposed model may be installed in real-time, low-cost IoT configurations and hardware, without compromising the comfort of the subjects.

2 Dataset used

The dataset used for this study was provided by the Wisconsin Sleep Cohort (WSC) and obtained from the National Sleep Research Resource (NSRR). The WSC contains two databases collected using different systems and over different periods. WSC_dataset_1 was collected during 2000-2009 using the Grass Heritage system and has 15 signals collected from 1715 recordings. WSC_dataset_2 (from 2009 to the present) was collected using the Grass Comet lab-based system and has 18 signals obtained from 716 recordings. Both datasets are described in detail in Table 1.

Table 1 Detailed explanation of the two datasets used in this work

The majority of the PSG files contain at least two EEG, two EOG, two EMG, and one ECG signal, along with other signals such as snore, nasal airflow, oral airflow, thoracic, abdominal, position, and SpO2. We employed only two EEG channels in the proposed study, namely O1-M2 and C3-M2 (O1 and C3 with reference to M2).

The WSC cohort contains PSG recordings of 2570 subjects with 2,462,228 epochs of 30 seconds duration. EEG was recorded on the O1-M2 channel, the C3-M2 channel, or both; recordings from both channels are available for most subjects, with a few exceptions. In this study, we included only those subjects for whom EEG recordings from both channels are available, as indicated in Table 2. Thus, out of 2570 subjects, we used 2431, and 2,165,205 epochs of 30 s each were generated from the 2431 PSG recordings. Table 3 summarizes the WSC data used in this study. Further, to evaluate the generalization of the proposed system, we tested the model on five more publicly available databases: CAP, Sleep-EDF, MIT-BIH, ISRUC, and SVUH, which are described in [31, 32, 35, 36, 37] and [46].

Table 2 Details of PSG and channel used in the study
Table 3 Distribution of epochs across sleep stages in two datasets of WSC

3 Methodology

Figure 2 depicts the proposed method. The following sections explain the data preparation and the techniques we employed.

Fig. 2 Proposed method used for the automated classification of sleep stages

3.1 Segmentation of the dataset

We used two unipolar EEG channels O1_M2 and C3_M2 in this work, and classified them both individually and collectively. As indicated in Table 4, the dataset was segmented into nine data subsets.

Table 4 Details of classification tasks (CT) and corresponding data subsets

Up-sampling datasets

The EEG channels of WSC_dataset_1 were sampled at 100 Hz, whereas the EEG signals of WSC_dataset_2 were sampled at 200 Hz. We have upsampled EEG signals of WSC_dataset_1 to 200 Hz to have all EEG signals at the same sampling frequency. The upsampled dataset was segmented into three data subsets, as indicated in Table 5. The samples were zero-padded to increase the sampling rate from 100 to 200 Hz.
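As a rough illustration of the upsampling step, the sketch below assumes that "zero-padded" means inserting one zero after every sample (zero stuffing), which doubles the rate from 100 Hz to 200 Hz. The optional smoothing step, the function names, and the toy values are assumptions made for illustration; the text does not say whether an interpolation filter was applied after zero insertion.

```python
def upsample_zero_insert(samples, factor=2):
    """Insert (factor - 1) zeros after every sample (zero stuffing)."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out


def interpolate_linear(stuffed):
    """Smooth a 2x zero-stuffed signal with the kernel [0.5, 1, 0.5].

    Original samples are kept; each inserted zero becomes the mean of
    its two neighbours (the last one is copied from its left neighbour).
    This step is an assumption, not something stated in the text.
    """
    n = len(stuffed)
    out = list(stuffed)
    for i in range(1, n, 2):
        left = stuffed[i - 1]
        right = stuffed[i + 1] if i + 1 < n else left
        out[i] = 0.5 * (left + right)
    return out


x_100hz = [1.0, 3.0, 5.0]                   # toy signal sampled at 100 Hz
stuffed = upsample_zero_insert(x_100hz, 2)  # same duration at 200 Hz
x_200hz = interpolate_linear(stuffed)
```

Zero insertion alone keeps the original samples but adds spectral images above the original Nyquist frequency, which is why a low-pass or interpolation step usually follows.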

Table 5 Details of classification tasks and data subset post upsampling

Furthermore, the EEG signal of every subject is segmented into 30-second epochs. The epochs are labeled according to the annotation files provided with the dataset.
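A minimal sketch of the epoching step follows. It assumes non-overlapping 30 s windows, discards any incomplete tail, and assumes the annotation file supplies one stage label per epoch; the function names and the toy data are hypothetical.

```python
def segment_epochs(signal, fs, epoch_seconds=30):
    """Split a 1-D signal into non-overlapping epochs; drop the incomplete tail."""
    epoch_len = int(fs * epoch_seconds)
    n_epochs = len(signal) // epoch_len
    return [signal[i * epoch_len:(i + 1) * epoch_len] for i in range(n_epochs)]


def label_epochs(epochs, stage_labels):
    """Pair each epoch with its annotated stage; the counts must agree."""
    if len(epochs) != len(stage_labels):
        raise ValueError("number of epochs and labels differ")
    return list(zip(stage_labels, epochs))


fs = 200                                    # Hz, as for WSC_dataset_2
signal = [0.0] * (fs * 95)                  # 95 s of toy data
epochs = segment_epochs(signal, fs)         # three full epochs, 5 s discarded
labelled = label_epochs(epochs, ["W", "N1", "N2"])
```

At 200 Hz each epoch holds 6000 samples; at the original 100 Hz rate of WSC_dataset_1 it would hold 3000.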

3.1.1 Filtering and wavelet decomposition

The wavelet transform has been shown to be an effective technique for evaluating non-linear, non-stationary EEG data. Wavelets provide both frequency- and time-domain information. Hence, feature extraction using wavelets can be an ideal approach for sleep scoring using EEG signals [47, 48]. Many signal processing applications have used two-channel orthogonal and biorthogonal wavelet filter banks [49]. Linear-phase filters are desired in applications such as image and communication coding [50,51,52,53,54,55,56]. Except for Haar filter banks, orthogonal filters cannot attain linear phase, so biorthogonal filter banks are favored over orthogonal ones where linear phase is required. The filter bank (FB) in this study was designed using a least-squares (LS) technique: the FB is obtained by minimizing a quadratic objective function subject to the given linear constraints, where the objective is a quadratic function of the passband and stopband errors. With a halfband analysis filter, we designed an optimal linear-phase biorthogonal wavelet filter bank.

The first step in designing the filter bank was to create a halfband analysis lowpass filter (HALF) by formulating a linearly constrained convex quadratic optimization problem whose objective is a convex combination of the passband and stopband errors. The synthesis lowpass filter (SLPF) is then designed in a similar manner, with two modifications: (i) the SLPF is not constrained to be a halfband filter, and (ii) the perfect reconstruction conditions are imposed along with the regularity (vanishing moment) constraints.

3.1.2 Design of wavelet filter bank

We used a two-stage design process to create a novel class of linear-phase biorthogonal filter banks (FBs). The designed FB is a halfband pair filter bank (HPFB), wherein the lowpass filter of the two-channel FB is a halfband filter [57, 58]. The design is formulated as a two-stage optimization problem: the HALF is designed in the first stage and the SLPF in the second. Both the HALF and the SLPF are obtained as solutions of least-squares (LS) optimization problems, so each stage involves solving linear equations for the given linear constraints. With this technique, optimal filters of different lengths and with different numbers of vanishing moments can be designed easily [13,14,15, 59,60,61]. The filters are designed to minimize the stop-band energy (SBE), which can be expressed in a quadratic form. The vanishing moment (VM) constraints are stated as a set of linear equations. Hence, the optimization problem is an LS problem whose solution can be obtained easily and efficiently. We used the CVX toolbox to solve the optimization problems [62, 63]. In this study, we used a 29/15 filter bank in which the orders of the HALF and the SLPF are 28 and 14, respectively. The HALF and the SLPF each have 4 VMs. The frequency responses and pole-zero plots of the filters used are shown in Figs. 3 and 4.
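The two ingredients named above, a quadratic stop-band energy and linear vanishing-moment constraints, can be illustrated on a toy filter. The sketch below does not reproduce the 29/15 design and does not solve the constrained LS problem; it only evaluates the SBE as the quadratic form h^T Q h and checks the usual zero-at-z = -1 moment conditions. The stop-band edge `ws`, the toy coefficients, and all function names are assumptions.

```python
import math


def stopband_matrix(length, ws):
    """Q[m][n] = integral of cos((m - n) * w) for w from ws to pi."""
    q = [[0.0] * length for _ in range(length)]
    for m in range(length):
        for n in range(length):
            k = m - n
            q[m][n] = (math.pi - ws) if k == 0 else -math.sin(k * ws) / k
    return q


def stopband_energy(h, ws):
    """Quadratic form h^T Q h: energy of H(e^{jw}) over [ws, pi] for real h."""
    q = stopband_matrix(len(h), ws)
    return sum(h[m] * q[m][n] * h[n]
               for m in range(len(h)) for n in range(len(h)))


def vanishing_moments_ok(h, k_max, tol=1e-9):
    """Check the linear conditions sum_n (-1)^n n^k h[n] = 0, k = 0..k_max-1."""
    return all(
        abs(sum((-1) ** n * (n ** k) * c for n, c in enumerate(h))) < tol
        for k in range(k_max)
    )


h_toy = [0.25, 0.5, 0.25]        # toy lowpass with a double zero at z = -1
sbe = stopband_energy(h_toy, math.pi / 2)
two_vms = vanishing_moments_ok(h_toy, 2)
```

Because the energy is quadratic in the coefficients and the moment conditions are linear in them, minimizing one subject to the other is an equality-constrained least-squares problem, which is consistent with the description of the design as solving linear equations.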

Fig. 3 Frequency response obtained using the biorthogonal filter

Fig. 4 Visualization of filter pair in Z-plane

Determining the level of decomposition is critical. We tested 3-level, 5-level, and 7-level wavelet decompositions and obtained the best results with five levels, so a 5-level decomposition was chosen. The frequency components that dominate the signal and the most significant sub-bands for classification are retained. Based on this, a one-dimensional five-level wavelet decomposition is performed using the designed 29/15 FB. The decomposition yields six sub-bands with frequency ranges of 0-3.125 Hz (A), 3.125-6.25 Hz (D1), 6.25-12.5 Hz (D2), 12.5-25 Hz (D3), 25-50 Hz (D4), and 50-100 Hz (D5). In this study, the lowest frequency band, 0-3.125 Hz (A), is referred to as the approximation coefficients [64], while the remaining higher-frequency bands are referred to as the detail coefficients.
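The listed band edges follow from repeatedly halving the Nyquist band. The short sketch below recomputes them, assuming a 200 Hz sampling rate (Nyquist frequency 100 Hz) and ideal halving at each level; the function name is hypothetical. The D1 to D5 labels follow the order used in the text, from lowest to highest frequency, whereas many wavelet toolboxes number detail levels in the opposite direction.

```python
def dyadic_band_edges(fs, levels):
    """Nominal band edges (Hz) of a dyadic wavelet decomposition.

    Returns the approximation band first, then the detail bands from the
    lowest to the highest frequency, matching the order in the text.
    """
    nyquist = fs / 2.0
    bands = [("A", 0.0, nyquist / 2 ** levels)]
    for i in range(1, levels + 1):
        low = nyquist / 2 ** (levels - i + 1)
        bands.append((f"D{i}", low, 2 * low))
    return bands


bands_200hz = dyadic_band_edges(200, 5)
for name, low, high in bands_200hz:
    print(f"{name}: {low}-{high} Hz")
```

These are nominal ranges only; real filters have finite transition bands, so neighbouring sub-bands overlap somewhat.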

3.1.3 Feature extraction

In our study, we extracted the Hjorth parameters (HP), namely activity, mobility, and complexity, introduced by Bo Hjorth [65], from the six sub-bands obtained by the five-level one-dimensional wavelet decomposition of each epoch. These parameters primarily describe the time-domain properties of EEG signals. Table 6 provides a brief explanation of these parameters.

Table 6 Details of HP used in this study

Here y(t) represents the signal in the time domain and \({\frac {dy(t)}{dt}} \) denotes the first derivative of the signal.
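For illustration, the three parameters can be computed with the standard Hjorth definitions: activity is the variance of the signal, mobility is the square root of the ratio of the variance of its first derivative to the variance of the signal, and complexity is the mobility of the derivative divided by the mobility of the signal. The sketch below approximates the derivative by a first difference with unit sample spacing; that choice and the function names are assumptions, not details taken from the study.

```python
import math


def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def first_difference(x):
    """Discrete stand-in for dy/dt (unit sample spacing)."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth_parameters(x):
    """Return (activity, mobility, complexity) of a 1-D sequence.

    A constant sequence has zero variance and would raise ZeroDivisionError.
    """
    dx = first_difference(x)
    ddx = first_difference(dx)
    activity = variance(x)
    mobility = math.sqrt(variance(dx) / activity)
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return activity, mobility, complexity


# Toy check: a pure sinusoid has complexity close to 1.
toy = [math.sin(2 * math.pi * n / 50) for n in range(1000)]
activity, mobility, complexity = hjorth_parameters(toy)
```

Applied to each of the six sub-bands of an epoch, this gives 6 x 3 = 18 values per channel.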

3.1.4 Classification

The extracted HP features are fed into supervised machine learning classifiers: support vector machines (SVM) [66, 67], ensemble bagged trees (EBT) [68, 69], decision trees [68, 70], K-nearest neighbours (KNN) [71], and naive Bayes [72]. Among these classifiers, EBT delivered the best classification performance, as indicated in Table 7. All classifiers were developed using a ten-fold cross-validation (CV) strategy.
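The ten-fold CV protocol can be sketched as an index split. The version below shuffles epoch indices and deals them into ten nearly equal folds; whether the folds in the study were stratified by stage or grouped by subject is not stated, so plain epoch-wise shuffling is an assumption, as are the function name and the seed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(25, k=10))
# Each sample is used for testing exactly once and for training in the
# other nine rounds; the reported score is the average over the ten rounds.
```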

Table 7 Performance of sleep stage classification achieved using different classifiers

4 Results

The final classification results were obtained from the extracted features using the Classification Learner app of MATLAB. As Table 3 shows, the number of epochs differs considerably between sleep stages, which makes the dataset highly unbalanced. Therefore, to evaluate performance on such unbalanced data, we used Cohen’s Kappa coefficient (κ) in addition to accuracy. The classification accuracy and κ value for each data subset are given in Table 8.
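Cohen’s κ compares the observed agreement p_o with the agreement p_e expected by chance from the row and column totals: κ = (p_o - p_e) / (1 - p_e). A minimal sketch computing it from a confusion matrix is shown below; the 2 x 2 counts are made up for illustration and are not taken from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


toy = [[45, 5],
       [10, 40]]          # made-up counts
kappa = cohen_kappa(toy)  # observed 0.85, chance 0.50
```

Unlike raw accuracy, κ is not inflated when a classifier simply favours the majority class, which is why it is informative for unbalanced stage distributions.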

Table 8 Performance of sleep stage classification achieved using EBT classifier and 10-fold Cross Validation

We performed an ANOVA test to evaluate the statistical significance of the features. The p-values obtained for all features were reported as zero, which indicates that the null hypothesis is rejected and that all features are statistically significant. Therefore, we used all features during classification. We also ranked the features. The summary of the extracted features and their rankings are given in Tables 9 and 10, respectively.
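As an illustration of the test, the sketch below computes the one-way ANOVA F statistic for a single feature grouped by sleep stage. It assumes a one-way design with one group per stage, which the text does not spell out, and it stops at the F statistic: converting F to a p-value requires the F-distribution, which is not implemented here. The function name and the toy values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of feature values per class.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # made-up values
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

With millions of epochs, even small differences between stage means give very large F values, so p-values that underflow to zero in floating point are plausible.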

Table 9 Summary of features extracted from O1_M2 EEG channel
Table 10 Summary of features extracted from C3_M2 EEG channel

The WSC database used for this study contains two main datasets, WSC_dataset_1 and WSC_dataset_2. Because the PSG signals were collected with different systems, the two datasets differ in sampling frequency and in the channels present. As previously mentioned, we chose the O1_M2 and C3_M2 unipolar EEG channels because they are present in the majority of subject files. These channels have different sampling frequencies in the two datasets (100 Hz for WSC_dataset_1 and 200 Hz for WSC_dataset_2). To take this diversity into account, we divided the classification of the entire dataset into nine classification tasks, CT-1 to CT-9, which are described in Table 4. The accuracies obtained for these tasks using the EBT classifier are presented in Table 8. The best overall accuracy of 83.2% was obtained with a Kappa value of 0.7345.

Below is a detailed overview of the aforementioned classification tasks, giving for each task the overall accuracy, the Kappa value, the individual accuracies for wake, N1, N2, N3, and REM, and the corresponding confusion matrix.

  • CT-1: overall accuracy 73.4%, Kappa 0.5584; wake 82.6%, N1 9.9%, N2 90%, N3 27.9%, REM 52.2%; confusion matrix in Table 11(c).

  • CT-2: overall accuracy 77%, Kappa 0.6275; wake 83.7%, N1 16.4%, N2 91%, N3 34.3%, REM 67.3%; confusion matrix in Table 11(d).

  • CT-3: overall accuracy 78.5%, Kappa 0.6499; wake 85.8%, N1 17.4%, N2 92%, N3 36%, REM 35.9%; confusion matrix in Table 11(h).

  • CT-4: overall accuracy 78.4%, Kappa 0.6524; wake 89.2%, N1 12.4%, N2 90.7%, N3 56.5%, REM 56.5%; confusion matrix in Table 11(b).

  • CT-5: overall accuracy 81.1%, Kappa 0.7017; wake 90.1%, N1 18.2%, N2 91.1%, N3 64.3%, REM 67.5%; confusion matrix in Table 11(e).

  • CT-6: overall accuracy 83.2%, Kappa 0.7345; wake 91.5%, N1 20.8%, N2 92.7%, N3 65%, REM 73.6%; confusion matrix in Table 11(g).

  • CT-7: overall accuracy 74%, Kappa 0.4164; wake 83%, N1 9%, N2 90%, N3 37%, REM 50%; confusion matrix in Table 11(c).

  • CT-8: overall accuracy 77.5%, Kappa 0.6363; wake 85.0%, N1 15.0%, N2 91%, N3 44.0%, REM 64.0%; confusion matrix in Table 11(f).

  • CT-9: overall accuracy 79.2%, Kappa 0.5151; wake 86.0%, N1 17.0%, N2 92%, N3 45%, REM 69%; confusion matrix in Table 11(i).
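The overall and individual accuracies quoted above can be read off a confusion matrix. The sketch below assumes that "individual accuracy" means the fraction of epochs of a given true stage that were classified correctly (per-class recall), which the text does not define explicitly; the counts are made up and are not taken from Table 11.

```python
def per_class_accuracy(confusion, labels):
    """Fraction of each true class predicted correctly (row-wise recall)."""
    return {
        label: confusion[i][i] / sum(confusion[i])
        for i, label in enumerate(labels)
    }


def overall_accuracy(confusion):
    """Sum of the diagonal divided by the total number of epochs."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / total


labels = ["W", "N1", "N2", "N3", "REM"]
toy = [  # rows: true stage, columns: predicted stage (made-up counts)
    [80, 5, 10, 0, 5],
    [10, 20, 50, 0, 20],
    [5, 5, 180, 5, 5],
    [0, 0, 30, 20, 0],
    [5, 5, 20, 0, 70],
]
class_acc = per_class_accuracy(toy, labels)
acc = overall_accuracy(toy)
```

In this toy matrix the N1 row shows a common pattern for a minority class: most of its epochs are assigned to the much larger N2 class, so the overall accuracy stays moderate while the N1 accuracy is low.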

Table 11 Confusion matrix obtained corresponding to data subsets using EBT classifier with 10-fold CV

Analysing the results of classification tasks CT-1 to CT-9, it appears that C3-M2 gives better classification accuracy than O1-M2: CT-2, CT-5, and CT-8 (C3-M2 channel) provide better accuracy than CT-1, CT-4, and CT-7 (O1-M2 channel). For each WSC dataset, the results improve further when both channels are combined. Comparing the accuracies of CT-1, CT-2, and CT-3 (WSC_dataset_1) with those of CT-4, CT-5, and CT-6 (WSC_dataset_2), we can conclude that WSC_dataset_2 yields better accuracy than WSC_dataset_1.

Because the available datasets differ in sampling frequency, we also experimented with upsampling WSC_dataset_1 from 100 Hz to 200 Hz before classification, to check whether this would improve accuracy. These classification tasks are described in Table 5, and the accuracies obtained using the EBT classifier are presented in Table 8. Upsampling WSC_dataset_1 improved both the individual and the overall accuracies. CT-10 showed an overall accuracy of 75.6% and a Kappa value of 0.6524, with individual accuracies for wake, N1, N2, N3, and REM of 86.7%, 11.8%, 91%, 36.7%, and 55.5%, respectively; its confusion matrix is given in Table 11(j). CT-11 achieved an overall accuracy of 79.7% and a Kappa value of 0.6743, with individual accuracies for wake, N1, N2, N3, and REM of 87.9%, 19.1%, 92%, 44.2%, and 71.3%, respectively; its confusion matrix is shown in Table 11(k). CT-12 yielded an overall accuracy of 81.5% and a Kappa value of 0.7020, with individual accuracies for wake, N1, N2, N3, and REM of 89.2%, 21.8%, 93%, 47.4%, and 74.9%, respectively; its confusion matrix is given in Table 11(l).

5 Discussion

The proposed method is the first sleep scoring study developed using 2431 subjects from the entire WSC sleep database. We employed two unipolar EEG channels (O1-M2 and/or C3-M2). The model was developed using a large set of 2,165,205 EEG epochs of 30 s duration and was trained and tested using optimal wavelet-based, highly discriminating HP features. To assess its robustness, it was tested on 12 diverse data subsets (refer to Tables 4 and 5). To ensure that all EEG signals have the same sampling frequency, the channels with an original sampling frequency of 100 Hz were upsampled to 200 Hz. The proposed model yielded a best classification accuracy of 83.2% and a kappa value of 0.7345. The high kappa value indicates the good discriminating ability of the proposed model.

The results indicate that HP extracted from the C3-M2 channel give better classification performance than those from the O1-M2 channel. The model also performs better on WSC_dataset_2 than on WSC_dataset_1. The best overall result was achieved for CT-6, with an accuracy of 83.2% and a κ value of 0.7345. The best individual accuracy is obtained for N2, followed by wake, REM, N3, and then N1. This trend can be observed in all classification tasks (CT-1 to CT-12). The smaller share of N1 and N3 epochs in the total (Table 3) might contribute to this trend. CT-10, CT-11, and CT-12 (with upsampling) showed better accuracy and κ values than CT-1, CT-2, and CT-3 (without upsampling); thus, upsampling improved accuracy in our experiments. When the datasets are unbalanced, the κ value gives better insight into the robustness of the model than accuracy alone. Therefore, we also used the κ value to evaluate our model, which yielded a high κ value of 0.7345.

We used a combination of an optimal bi-orthogonal filter bank and HP for our datasets, which yielded better accuracies and Kappa values than other wavelet decomposition and feature extraction methods [56, 73]. Compared with the state-of-the-art methods [74,75,76], our proposed method gave better accuracy for the N2 stage. Sharma et al. [15] performed six-class sleep stage classification (SSC) using the Sleep-EDF database (80 subjects). They used six sub-bands from an optimal wavelet filter bank, norm features, and an EBT classifier to obtain a best model with accuracies of 85.3% (unbalanced) and 92.8% (unbalanced) and κ values of 0.786 and 0.915. However, their model reported a best accuracy of 89% for the S2 class. Phan et al. [41] proposed five-class SSC using a joint classification-and-prediction framework based on convolutional neural networks (CNNs) with the Sleep-EDF and MASS datasets. They obtained an individual accuracy of 88% (Sleep-EDF) and 86.9% (MASS). In the proposed work, the best accuracy of 92.7% is obtained for the N2 stage, as shown in Table 11. Sharma et al. [77] performed SSC using the Sleep-EDF database with 76 subjects and reported a best accuracy of 98.3% but achieved only 17.3% accuracy for the N1 class, while our model obtained an accuracy of 21.1% for N1 stage detection. Thus, the proposed model also performed comparatively well in detecting the N1 class.

To assess the robustness of the proposed method, the model was also tested on databases other than WSC. We evaluated it on the following publicly available databases: CAP [36, 37], Sleep-EDF [31, 32], ISRUC [35], MIT-BIH [37], and the sleep apnea database from SVUH [46]. On these five databases, our model was better than or competitive with the other models considered. Employing the same number of subjects and EEG derivations, we achieved accuracies of 90.6%, 83.6%, 77.4%, 75.4%, and 73.6% on the Sleep-EDF, CAP, ISRUC, SVUH, and MIT-BIH databases, respectively. As Table 12 shows, our method attained an accuracy of 83.6% on the CAP database, the highest among the studies listed for that database. For the Sleep-EDF database, our model achieved an accuracy of 90.6%, which is almost equal to the highest accuracy of 90.8% achieved by Yildirim et al. [28]. Similarly, for the ISRUC database, the proposed model obtained an accuracy of 77.4%, the second highest among the studies listed in Table 12. For the MIT-BIH data, our model’s accuracy is close to the highest accuracy, obtained by Tripathy et al. [78]. For the SVUH database, our model surpassed all other methods listed in the table. We attained an accuracy of 83.2% on the WSC database; to the best of our knowledge, this is the first sleep scoring work on this database. The comparison indicates that our model not only performed well on the WSC database but also gave encouraging results on all five widely used public databases, indicating that the model is robust and accurate. It also suggests that the model may generalize to unseen databases.

Table 12 Comparison of the proposed method with state-of-the-art SSG studies developed using the WSC dataset

The advantages of the proposed method are as follows:

  • A two-stage technique is used to design a new class of linear-phase WFB, wherein the analysis lowpass filter is a half-band filter. The LS criterion is used in both design stages, which results in a simple optimization problem involving a set of linear equations.

  • This is the first study to incorporate the entire WSC-dataset for SSG.

  • We have extracted and used 2,165,205 epochs of 30 seconds duration, which is the largest dataset to be used in such studies.

  • We have used a new class of optimal bi-orthogonal wavelet filters to decompose the signals into sub-bands.

  • HP is used for feature extraction, which is a good discriminator for EEG signals.

  • The proposed method achieved good accuracy and κ value utilizing only two unipolar EEG channels.

  • The developed model is robust and generalizable, as it performed well on all six publicly available databases.

The limitations of the study are as follows:

  • The datasets used are not balanced with respect to the number of epochs in each class, making it challenging for unbiased classification.

  • As the datasets are unbalanced and contain the fewest epochs for the N1 sleep stage, this stage has the lowest accuracy. However, the accuracy for the N1 sleep stage is better than that of other reported state-of-the-art techniques.

The COVID-19 pandemic has paved the way for the Internet of Things (IoT) in healthcare to support accurate diagnosis. The rise in demand for remote healthcare and the progress in cloud computing, machine learning, and biomedical sensors have made personal healthcare possible. Our proposed model can be used as an algorithm on cloud-based servers to detect sleep stages accurately; the workflow is shown in Fig. 5. First, the EEG signals are collected from the subject using smart wearable hardware. The data are then uploaded to and processed at a local server connected to a server hosted on the cloud. The processed data can be forwarded to medical professionals for examination, who will provide medical assistance if needed.

Fig. 5 Flow chart of an IoT-based automated sleep stage classification system

6 Conclusion

The proposed study aims to implement an automatic SSG method using a novel class of stop-band energy localized biorthogonal filter banks and supervised machine learning algorithms. We used a very large dataset comprising 2431 PSG files and 2,165,205 epochs of 30 seconds duration each. With the help of the filter bank and a five-level wavelet decomposition, six subbands are created from each EEG epoch. After the wavelet decomposition, three HP are computed from each subband, resulting in 18 features. These features are then fed to supervised machine learning algorithms to classify the sleep stages. The classification is done using a ten-fold cross-validation strategy. In contrast to PSG-based systems, only two unipolar EEG signals (O1_M2 and C3_M2) are used in this study, which makes our approach convenient for patients and suitable for a portable device. The database employed in our study is very large and diverse, involving many channels and several sub-datasets. We obtained a best accuracy of 83.2% and a κ value of 0.7345 using an ensemble bagged trees (EBT) classifier across the 12 data subsets considered. We also observed that our system is better than or competitive with existing state-of-the-art systems when tested on the CAP, Sleep-EDF, ISRUC, MIT-BIH, and St. Vincent’s University databases.

The choice of computationally inexpensive features, an optimal filter bank, and only two EEG channels makes the developed model suitable for real-time application. Hence, it can be implemented in a portable EEG device for sleep monitoring, allowing users to sleep comfortably in their own homes. In the future, we plan to carry out automatic sleep disorder identification using the same WSC database.