Classification of epileptic EEG signals based on simple random sampling and sequential feature selection
Electroencephalogram (EEG) signals are widely used in medicine. Their main applications are the diagnosis and treatment of conditions such as epilepsy, Alzheimer's disease and sleep disorders. This paper presents a new method that extracts and selects features from multi-channel EEG signals. The method has three main steps. Firstly, the simple random sampling (SRS) technique is used to extract features from the time domain of EEG signals. Secondly, the sequential feature selection (SFS) algorithm is applied to select the key features and to reduce the dimensionality of the data. Finally, the selected features are forwarded to a least square support vector machine (LS_SVM) classifier to classify the EEG signals. The experimental results show that the method achieves 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively.
Keywords: Electroencephalogram; Epileptic seizures; Simple random sampling; Sequential feature selection; Least square support vector machine
Epilepsy is a disorder which affects the human brain and hugely impairs patients’ daily lives. It is characterized by the recurrent and sudden incidence of epileptic seizures. According to an estimate by the World Health Organization, more than 50 million people are affected by epilepsy [2, 3]. Approximately 1 % of the population has this neurological disorder [4, 5, 6]. This has led to numerous research efforts to identify epilepsy and related treatments. Electroencephalogram (EEG) signals have proved to be a powerful tool for detecting and diagnosing different neurological diseases, and they are often used to detect and classify epilepsy. It is often difficult for experts to recognize people who have a brain disorder through visual inspection of EEG signals. In addition, visual inspection for discriminating EEG signals is time consuming, error prone and costly, and it does not provide sufficiently reliable information. The analysis and classification of EEG signals can lead to better diagnostic techniques for brain-related disorders. It is thus important to develop better EEG classification methods.
Many researchers have developed new techniques to extract significant information from EEG signals. The information is used as the input to different classifiers. Many approaches are used to extract the key features and to further select among them. Most of these fall under five broad categories: time domain, frequency domain, time–frequency domain, traditional non-linear methods and graph theory approaches.
One of the methods used in this paper for extracting epileptic EEG data is the simple random sampling (SRS) technique, which researchers have often applied in the time domain. In this technique, each sample of the population has the same chance of being selected as a subject. The complete sampling process is done in a single step, and each subject is selected independently of the other samples of the population. We then forwarded all these samples to the sequential feature selection (SFS) method to select the best features.
This study uses the selected features as the input for a classifier. One of the most popular classifiers, the least square support vector machine (LS_SVM), is used to classify the EEG data. This technique is used to distinguish the EEG data of healthy people from those of epileptic patients during epileptic seizures.
Many approaches to EEG signal classification have been developed, and a wide range of classification accuracies has been reported for epileptic EEG data. Brief discussions of the previous research are provided below.
Gajic et al. extracted different features from time, frequency, time–frequency domain and non-linear analysis. These features were obtained from sub-bands with good representative characteristics. The researchers reduced the dimension of the features by using scatter matrices. This method yielded 98.7 % accuracy.
Siuly and Li proposed an optimum allocation-based principal component analysis method to extract key features for the classification of multi-class EEG signals from epileptic EEG data. To find the best classifier, they compared four: LS_SVM, the naive Bayes classifier, the k-nearest neighbour (KNN) algorithm and linear discriminant analysis. They also used four different output coding approaches for the multi-class LS_SVM: error correcting output codes, minimum output codes, one versus one (1vs1) and one versus all. That method achieved 100 % accuracy with LS_SVM_1vs1.
Martis et al. carried out feature extraction through empirical mode decomposition. The extracted features were forwarded to two classifiers, the classification and regression tree and the C4.5 classifier. The method using the C4.5 classifier obtained good experimental results of 95.33, 98 and 97 % for accuracy, sensitivity and specificity, respectively.
Chua et al. obtained features from raw EEG recordings by using higher order spectra (HOS). They used a Gaussian mixture model (GMM) and an SVM classifier to detect epileptic EEG signals. They achieved average accuracies of 93.11 and 92.56 % for the HOS-based GMM classifier and the SVM classifier, respectively, across different EEG classes: normal, pre-ictal and epileptic EEGs.
Guo et al., on the other hand, used a genetic algorithm (GA) to automatically extract features from EEG data in order to enhance the classifier’s performance and to reduce the feature dimensionality. They used two groups of epileptic datasets. The first group had two classes: healthy people and epileptic patients. The second group had three classes: healthy people, inter-ictal and ictal. The KNN classifier was used to classify the two groups. They obtained accuracies of 88.6 and 99.2 % for the first group without and with GA, respectively, and 67.2 and 93.5 % for the second group without and with GA, respectively.
Ocak decomposed EEG signals, which were recorded from normal subjects and epileptic patients, by using the discrete wavelet transform. The approximate entropy (ApEn) was extracted from the approximation and the detail coefficients. The methodology achieved more than 96 % accuracy.
Srinivasan et al. used the ApEn to extract features and an artificial neural network classifier to identify epileptic EEG signals. That approach achieved an overall accuracy of 100 %.
Srinivasan et al. also proposed a special type of recurrent neural network, the Elman network. They used features extracted in the time domain and the frequency domain as the input to the proposed classifier. The Elman network method yielded 99.6 % accuracy with a single input feature.
Gajic et al. used a wavelet transform method to extract the key features. They also used scatter matrices to reduce the dimensionality of the features. These features were used as the input to a quadratic classifier. The EEG epileptic database was classified into healthy subjects, epileptic subjects during a seizure-free interval (inter-ictal) and epileptic patients during seizure activity (ictal). They obtained 99 % classification accuracy.
Shen et al. proposed a cascade of wavelet-ApEn for feature selection. They used Fisher scores for adaptive feature selection, and SVM for feature classification to detect epileptic seizures. They applied the method to two kinds of epileptic EEG recordings: open source EEG data and clinical EEG data. The method obtained overall classification accuracies of 99.97 and 98.73 %, respectively.
A sampling technique (ST) based on an LS_SVM was proposed by Siuly et al. Firstly, they used the ST to extract features from two classes: normal persons with eyes open and epileptic patients during seizure activity. They then applied the LS_SVM to the extracted features. The total classification accuracy of that approach for the training and testing datasets was 80.31 and 80.05 %, respectively.
Husain and Rao presented an artificial neural network model using the back-propagation algorithm for the classification of epileptic EEG signals. They decomposed the EEG signals into a finite set of band-limited signals termed intrinsic mode functions. They also applied the Hilbert transform to these intrinsic mode functions to calculate instantaneous frequencies. They achieved 99.80 % overall classification accuracy.
Rückstieß et al. applied an SFS method to select the most representative features at each time step. Each successive feature depended on the previous features. All the features were put into one vector and forwarded to a classifier. This approach was applied to handwritten digit classification and a medical diabetes prediction task.
Choi et al. proposed a sequential floating forward selection (SFFS) algorithm to detect epileptic seizures in EEG signals. They used the SFFS algorithm to select the highest-power features from the frequency bands. The total accuracy obtained by that method was 97.2 %.
In this study, we developed a new method that combines the SRS with the SFS to acquire the best feature set, and we then used these features as the input to the LS_SVM classifier for EEG classification. All the techniques are discussed in Sects. 3 and 4. The conclusion is presented in Sect. 5.
2 Experimental data
The data used in this study are open source EEG recordings and are publicly available. The database includes five sets of EEG recordings (sets A–E), with each containing 100 single-channel EEG signals of 23.6 s from five separate classes. References [13, 26] present all details of these datasets from set A to E. This study selected set A, which was taken from surface EEG recordings of five healthy people with eyes open, and set E, which was taken from EEG records of five pre-surgical epileptic patients during epileptic seizure activity.
3.1 Simple random sampling (SRS) technique
The SRS technique is a popular type of random, or probability, sampling. In this technique, each sample of the population has the same chance of being selected as a subject. We entered the population size into the sample size calculator of the “Creative Research System” (available online) to determine the sample size for both samples and subsamples. In this work, the datasets used are set A and set E, as noted in Sect. 2. Each set has 100 data files, and each file has 4097 observations.
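The sample size calculation can be illustrated with a standard finite-population formula of the kind such online calculators implement. This is only a sketch: the function name `srs_sample_size`, the z-score of 2.58 (about 99 % confidence), the 1 % confidence interval and the proportion 0.5 are assumptions for illustration, not settings taken from this study.

```python
def srs_sample_size(population, z=2.58, margin=0.01, p=0.5):
    """Finite-population sample size for simple random sampling.

    z, margin and p are illustrative assumptions (about 99 % confidence,
    a 1 % confidence interval and the worst-case proportion 0.5).
    """
    # Sample size for an effectively infinite population.
    ss = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite-population correction shrinks it for a small population.
    return round(ss / (1 + (ss - 1) / population))


# One data file has 4097 observations; a subsample is drawn from a sample.
n_sample = srs_sample_size(4097)
n_subsample = srs_sample_size(n_sample)
print(n_sample, n_subsample)
```

Applying the same function to the sample size gives a subsample size, which mirrors the two-level sampling described above.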
3.2 Sequential feature selection (SFS) algorithms
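Sequential forward selection can be sketched as a greedy loop that starts from an empty set and, at each step, adds the feature that most improves a criterion. The sketch below is a minimal illustration under stated assumptions: the `separability` criterion (between-class distance over within-class scatter), the function names and the toy data are hypothetical, and the criterion used in this study may differ.

```python
from statistics import fmean, pvariance


def separability(rows, labels, subset):
    """Assumed criterion: between-class distance over within-class scatter."""
    between = within = 0.0
    for j in subset:
        class0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        class1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        between += (fmean(class0) - fmean(class1)) ** 2
        within += pvariance(class0) + pvariance(class1)
    return between / (within + 1e-12)  # small constant avoids division by zero


def sequential_forward_selection(rows, labels, k, criterion=separability):
    """Greedily add, one at a time, the feature that maximizes the criterion."""
    selected, remaining = [], list(range(len(rows[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda j: criterion(rows, labels, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: columns 2 and 0 separate the classes, columns 1 and 3 do not.
rows = [
    [0.0, 5.0, 1.0, 3.0], [0.2, 4.0, 1.2, 3.1], [0.1, 6.0, 0.9, 2.9],
    [0.5, 5.5, 3.0, 3.0], [0.6, 4.5, 3.1, 3.2], [0.7, 5.0, 2.9, 2.8],
]
labels = [0, 0, 0, 1, 1, 1]
selected = sequential_forward_selection(rows, labels, k=2)
print(selected)
```

In practice the criterion is often the cross-validated error of the downstream classifier, which costs more per step but ties the selection to the classifier that will use the features.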
3.3 The feature set
After decreasing the dimensions of the features through the SFS, the new feature set is forwarded to the LS_SVM classifier. In this study, we obtain a feature set that has 2000 data points of 35 dimensions. These features are divided into two groups, which are the training set and the testing set. The training set is directed to train a classifier. The testing set is employed to evaluate the performance of the methodology and it is utilized as the input of the classifier.
3.4 Least square support vector machines
In this subsection, we briefly review some basic work on LS_SVMs for classification. LS_SVMs were proposed by Suykens and Vandewalle. They are the least square versions of SVMs, which are a set of related supervised learning methods that analyse data and recognize patterns and are used for classification and regression analysis. In this research, the LS_SVM classifier with a radial basis function kernel is used for the classification of epileptic EEG signals. These classifiers replace the convex quadratic programming problem of classical SVMs with a set of linear equations. In this paper, the classification is performed with the LS_SVMlab (version 1.8) toolbox in MATLAB.
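To make the "set of linear equations" concrete, the following NumPy sketch trains an LS_SVM classifier with a radial basis function kernel by solving the standard LS_SVM linear system directly. It is an illustration only and not the LS_SVMlab implementation: the function names, the kernel parameterization exp(-d²/(2·sigma2)), the values of `gamma` and `sigma2`, and the synthetic two-cluster data are all assumptions.

```python
import numpy as np


def rbf_kernel(U, V, sigma2):
    """K[i, j] = exp(-||U[i] - V[j]||^2 / (2 * sigma2)) (assumed parameterization)."""
    d2 = (U ** 2).sum(axis=1)[:, None] + (V ** 2).sum(axis=1)[None, :] - 2.0 * U @ V.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma2))


def lssvm_train(X, y, gamma=10.0, sigma2=1.0):
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1] for b and alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, support values alpha


def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma2=1.0):
    """Label each new point by the sign of sum_i alpha_i y_i K(x, x_i) + b."""
    scores = rbf_kernel(X_new, X_train, sigma2) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)


# Synthetic two-class data standing in for the selected EEG features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

b, alpha = lssvm_train(X, y)
train_accuracy = np.mean(lssvm_predict(X, y, b, alpha, X) == y)
print(train_accuracy)
```

Training reduces to a single call to a linear solver, which is the practical difference from the quadratic programme of a classical SVM. The price is that every training point receives a non-zero support value, so the solution is not sparse.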
3.5 Performance measures
This subsection describes how the performance of the proposed method is assessed. The measures are accuracy (also known as recognition rate), sensitivity (or recall) and specificity. The accuracy of a classifier is the percentage of the test set that the classifier classifies correctly. The sensitivity is the true positive rate, that is, the proportion of the positive set correctly identified. The specificity is the true negative rate, that is, the proportion of the negative set correctly identified.
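These three measures follow directly from the counts of true and false positives and negatives. The short sketch below computes them for a made-up set of labels; the function name `performance_measures` and the example labels are illustrative assumptions, with 1 standing for the positive (epileptic) class.

```python
def performance_measures(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),        # true positive rate
        "specificity": tn / (tn + fp),        # true negative rate
    }


# Made-up labels: 4 positive and 6 negative cases, with one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
measures = performance_measures(y_true, y_pred)
print(measures)
```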
4 Results and discussions
In this study, we used two datasets, sets A and E, as mentioned in Sect. 2. The SRS technique was used to extract features from the datasets. This technique selected features randomly by choosing 10 samples from each data file of sets A and E. Five subsamples were then selected from each sample. From each subsample, nine statistical features were extracted: minimum, maximum, mean, median, mode, first quartile, third quartile, inter-quartile range and standard deviation (Std), as mentioned in Sect. 3.1.
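The extraction step can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the function names, the sample and subsample sizes (1000 and 500 are placeholders, not the sizes used in this study), the random seeds and the synthetic signal are all hypothetical. Concatenating the nine features of the five subsamples gives one 45-dimensional vector per sample, which is one plausible layout for the features passed on to the SFS.

```python
import random
import statistics


def nine_statistical_features(values):
    """Min, max, mean, median, mode, Q1, Q3, inter-quartile range and Std."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "std": statistics.stdev(values),
    }


def srs_features(signal, n_samples=10, n_subsamples=5,
                 sample_size=1000, subsample_size=500, seed=0):
    """Draw samples and subsamples without replacement; return one row per sample."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        sample = rng.sample(signal, sample_size)       # simple random sample
        row = []
        for _ in range(n_subsamples):
            subsample = rng.sample(sample, subsample_size)
            row.extend(nine_statistical_features(subsample).values())
        rows.append(row)
    return rows


# Synthetic stand-in for one data file of 4097 observations.
demo_rng = random.Random(42)
signal = [demo_rng.randint(-200, 200) for _ in range(4097)]
feature_rows = srs_features(signal)
print(len(feature_rows), len(feature_rows[0]))
```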
[Table: Classification accuracy for epileptic EEG signals (sets A and E)]
[Table: Experimental results using different statistical features as the criterion. Criteria (SFS_feature): Mean ≥ fs2; Mean ≤ fs2; Max ≤ fs2; Min ≥ fs2; Mode ≥ fs2; Median ≤ fs2; Std ≥ fs2]
[Table: Comparison of the results and time complexity for the proposed method with the best criterion (SRS_SFS_LS_SVM) and other methods]
[Table: Comparison of performance of our proposed method with two recently reported methods for sets A and E of the EEG epileptic database]
This research concentrates on two classes of EEG signals, from healthy people and from epileptic patients. The study presents an SRS_SFS method to extract and select the key features for classifying EEG signals into these two classes. After feature extraction and selection, the LS_SVM classifier is used to classify the two-category EEG data. The method yields 99.90, 99.80 and 100 % for classification accuracy, sensitivity and specificity, respectively. In addition, the proposed method is faster than the SRS technique alone, which indicates that the SRS_SFS combination is useful for extracting and selecting EEG features. To sum up, the proposed method is efficient for analysing and classifying epileptic EEG signals, and it may also be useful for the classification of other biomedical data.
- 2. World Health Organization (WHO) (2011). Report: WHO. http://www.who.int/mediacentre/factssheets/fs999/en/index.html. Accessed Dec 2015
- 3. Mcgrogan N (1999) Neural network detection of epileptic seizures in the electroencephalogram. http://www.new.ox.ac.uk/~nmcgroga/work/transfer
- 4. Boer H, Engel J, Prilipko L (2005) Global campaign against epilepsy. Epilepsy Atlas 82–83
- 10. Barreiro PL, Albandoz JP (2001) Population and sample. Sampling techniques. Management mathematics for European schools, MaMaEusch (994342-CP-1-2001-1-DECOMENIUS-C21)
- 11. Wu F, Zhao Y (2005) Least squares support vector machine on Morlet wavelet kernel function. In: International conference on neural networks and brain, 2005. ICNN&B’05. IEEE, Beijing, pp 327–333
- 13. Gajic D, Djurovic Z, Gligorijevic J et al (2015) Detection of epileptiform activity in EEG signals based on time–frequency and non-linear analysis. Front Comput Neurosci. doi:10.3389/fncom.2015.00038
- 21. Siuly S, Li Y, Wen P (2009) Classification of EEG signals using sampling techniques and least square support vector machines. In: Rough sets and knowledge technology. Springer, Berlin, pp 375–382
- 23. Rückstieß T, Osendorfer C, Van Der Smagt P (2011) Sequential feature selection for classification. In: AI 2011: advances in artificial intelligence. Springer, Berlin, pp 132–141
- 24. Choi K-S, Zeng Y, Qin J (2012) Using sequential floating forward selection algorithm to detect epileptic seizure in EEG signals. In: 2012 IEEE 11th international conference on signal processing (ICSP). IEEE, Beijing, pp 1637–1640
- 25. EEG time series (Nov 2005). http://www.meb.unibonn.de/epileptologie/science/physik/eegdata.html. Accessed Nov 2015
- 27. Marcano-Cedeño A, Quintanilla-Domínguez J, Cortina-Januchs M et al (2010) Feature selection using sequential forward selection and classification applying artificial metaplasticity neural network. In: 36th annual conference on IEEE Industrial Electronics Society. USA, pp 2845–2850
- 28. Ferri F, Pudil P, Hatef M et al (1994) Comparative study of techniques for large-scale feature selection. Pattern Recognit Pract IV:403–413
- 32. LS-SVMlab toolbox (version 1.8). http://www.esat.kuleuven.ac.be/sista/lssvmlab/. Accessed Nov 2015
- 33. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Elsevier, Amsterdam
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.