Abstract
Diagnosing people with possible epilepsy has major implications for their health, occupation, driving, and social interactions. The current diagnostic procedure is error-prone and subject to considerable interobserver variation, because it relies on manual review of lengthy long-term EEG recordings that must capture seizure (ictal) activity. Long-term EEG data containing seizure activity are costly and often difficult to obtain, which impedes epilepsy diagnosis for many people, particularly in areas that lack medical resources and well-trained neurologists. There is a pressing need for a new diagnostic tool that can provide quick and accurate epilepsy screening using short-term interictal EEG signals. However, interictal EEG recordings are challenging to analyze because, between seizures, patients often behave like normal subjects. This research develops a new automatic, data-driven pattern recognition system for interictal EEG signals and designs a quick screening process to help neurologists diagnose patients with epilepsy. In particular, we propose a novel information-theory-guided sparse feature selection framework that selects the most important EEG features for accurately discriminating epileptic from non-epileptic EEG patterns. The proposed approach was tested on an EEG dataset of 11 patients and 11 normal subjects and achieved a diagnostic accuracy of 90 % based on visually evoked potentials recorded during a human-computer task. This preliminary study indicates that short-term interictal EEG signals are a promising basis for fast, reliable, and affordable epilepsy diagnosis.
1 Introduction
Epilepsy is one of the most common neurological brain disorders after stroke; about 1 % of the human population (40 million people) suffers from epilepsy [1]. An accurate diagnosis of people with possible epilepsy has major implications for their health, occupation, driving, and social interactions, and an inaccurate diagnosis may have fatal consequences, especially in operating rooms and intensive care units. Unfortunately, misdiagnosis of epilepsy is common in everyday practice. Estimates of the misdiagnosis rate vary greatly, from 5 % in a prospective childhood epilepsy study and 23 % in a British population-based study [2] to as high as 41 % in a Swedish study [3]. One reason for misdiagnosis is that many other diseases and medical conditions can cause abnormal changes in brain behavior, or even seizure-like episodes, and can thus be confused with epilepsy [1]. Among commonly used medical tests, such as blood tests, magnetic resonance imaging (MRI), and positron emission tomography (PET), electroencephalogram (EEG) recording plays a central role in epilepsy diagnosis because it directly detects electrical activity in the brain. Epilepsy diagnosis relies heavily on tedious visual screening, by neurologists, of lengthy EEG recordings that must capture seizure (ictal) activity. Thus, prolonged (24-h) EEG monitoring is often necessary. In the past decades, many quantitative analysis systems have been developed to help neurologists identify epileptiform patterns in long-term EEG recordings for seizure detection and seizure prediction. However, long-term EEG data containing seizure activity are costly and often difficult to obtain, especially in areas that lack medical resources and well-trained neurologists. Very few studies have used short-term interictal EEG for more convenient and affordable epilepsy diagnosis. 
There is a pressing need for a new medical diagnostic tool that can provide quick and accurate epilepsy screening using short-term interictal EEG signals.
This study investigates the use of short-term interictal EEG signals for epilepsy diagnosis using machine learning techniques. In particular, we propose an information-theory-guided feature selection and prediction framework that identifies epilepsy-specific EEG patterns from visually evoked potentials (VEPs) recorded during a human-computer interaction task, as part of a fast screening process. The proposed method could potentially be used to determine whether a patient is epileptic or non-epileptic in a quick screening process. The paper is organized as follows. Section 2 presents the information-theory-guided sparse feature selection framework with regularization. The experimental design, data acquisition, method implementation, and validation procedure are presented in Sect. 3. The experimental results are provided in Sect. 4, and concluding remarks are given in Sect. 5.
2 Information-Theory-Guided Sparse Feature Selection
We propose a novel sparse feature selection approach that interactively integrates information theory with a regularized sparse learning optimization framework to identify an optimal feature subset for discriminating between the patterns of two classes.
Feature Selection. Feature selection techniques have been widely used to identify the most important decision variables, to avoid overfitting and improve model performance, and to gain deeper insight into the underlying processes or problem. Feature selection techniques can generally be divided into three categories: embedded methods, wrapper methods, and filter methods [4]. Both embedded and wrapper methods rely on a classifier or model to select feature subsets, so the selection performance is specific to the chosen model. Typical approaches include Pudil’s floating search [5] and stepwise selection [6]. Filter techniques assess the relevance of features by looking only at the intrinsic properties of the data. Popular examples include correlation-based feature selection [7], fast correlation-based feature selection [8], and minimum redundancy maximum relevance (mRMR) [9]. However, most current filter techniques select high-ranked features and do not fully consider feature dependency. Several individually low-scored features can be combined to form a strong discriminative feature subset for classification. To address this problem, we propose a novel feature selection framework that interactively combines mutual information feature filtering with a sparse learning method to capture feature dependencies and identify the most informative feature subset efficiently.
Mutual Information-Based Feature Ranking. In information theory, mutual information (MI) measures the mutual dependence between two random variables [10]. MI quantifies how much information a feature contains about the class without making any assumptions about the nature of their underlying relationship. The mutual information of two discrete random variables X and Y, denoted by I(X, Y), can be calculated by:
\[
I(X, Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},
\]
where p(x) and p(y) are the marginal probability distributions of X and Y, and p(x, y) is their joint probability distribution. MI can capture nonlinear dependency between random variables and can be used to rank features in feature selection problems [9]. The basic idea of filter-based approaches is to keep the more informative features (with high MI) and remove the redundant or less relevant features (with low MI). These approaches work well in many cases. However, by simply excluding low-MI-ranked features, they may miss some important ones, because current MI-based feature selection approaches insufficiently consider the interactions and dependencies among features. Some weak, low-ranked features may be combined with high-ranked features to produce stronger discriminative power for classification. Based on this consideration, we propose a novel feature selection framework that considers both high-ranked and low-ranked features to discover the most important features efficiently.
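As an illustration of the MI ranking step, the following minimal Python sketch estimates I(X, Y) with the plug-in (empirical frequency) form of the standard mutual information definition and ranks features by their MI with the class labels. The equal-width binning in `discretize`, the number of bins, and all function names are our own assumptions for illustration; the paper does not state how continuous EEG features were discretized, and the mRMR-style removal of correlated features is omitted.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (xv, yv), count in joint.items():
        pxy = count / n
        # p(x, y) * log2(p(x, y) / (p(x) p(y))), summed over observed pairs
        mi += pxy * math.log2(pxy / ((px[xv] / n) * (py[yv] / n)))
    return mi


def discretize(values, n_bins=4):
    """Equal-width binning (an assumption) so continuous features fit the discrete formula."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def rank_by_mi(feature_columns, labels, n_bins=4):
    """Return feature indices sorted by decreasing MI with the labels, plus the MI scores."""
    scores = [mutual_information(discretize(col, n_bins), labels) for col in feature_columns]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Under these assumptions, the high-MI set S and the remaining set W would correspond to `order[:k1]` and `order[k1:]` for a chosen number `k1` of top-ranked features.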
Interactive Feature Selection Framework. The key idea of the proposed approach is to take feature dependency into account while keeping the search process computationally efficient. The proposed mutual-information-guided feature selection framework consists of three steps: MI-based feature ranking, sparse feature learning on low-MI-ranked features, and integration of high- and low-ranked features. In the feature ranking step, we use MI to rank features and identify a subset of high-MI features that individually carry the most information about the class labels. Among these features, highly correlated ones are considered redundant and removed, in a way similar to the mRMR approach. Given a number \(k_1\), the subset of the top \(k_1\) features ranked by MI is denoted by S, and the subset of the remaining features is denoted by W. In the second step, we employ the widely used sparse learning algorithm lasso (least absolute shrinkage and selection operator) to select a potentially important feature subset from the low-ranked features in W. In its standard penalized form, lasso with an \(l_1\)-norm penalty solves:
\[
\hat{\beta} = \arg\min_{\beta_0, \beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \mathbf{x}_i^{\top} \beta \right)^2 + \lambda \lVert \beta \rVert_1,
\]
where \(\mathbf{x}_i\) contains the features in W for the i-th of n training samples, \(y_i\) is its class label, \(\beta\) is the coefficient vector, and \(\lambda \ge 0\) controls the sparsity of the solution.
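A minimal Python sketch of the lasso step follows, assuming the common penalized form \(\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1\) with standardized features and a centered response (for example, class labels coded as ±1 and then centered; this coding is our assumption). It uses cyclic coordinate descent with soft-thresholding, one common way to solve the lasso; the paper does not specify its solver, and `lasso_coordinate_descent` and `lasso_selected` are hypothetical names.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns (assumed standardized) and y is a list
    of n targets (assumed centered), so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes feature j
            rho = sum(row[j] * r for row, r in zip(X, residual)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i, row in enumerate(X):
                    residual[i] -= row[j] * delta
                beta[j] = new_bj
    return beta


def lasso_selected(beta):
    """Indices of features with nonzero lasso coefficients."""
    return [j for j, b in enumerate(beta) if b != 0.0]
```

Applied to the columns of W, the indices returned by `lasso_selected` would form the \(k_2\) lasso-selected features; larger values of `lam` shrink more coefficients to exactly zero and so select fewer features.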
The lasso method selects a sparse model by penalizing the coefficients and forcing some of them to be exactly zero. Assume that \(k_2\) features are selected by the lasso algorithm. The third step finds the optimal feature subset by exploring the \(k_1\) high-MI-ranked features together with the \(k_2\) lasso-selected low-MI-ranked features. Because this pool contains only \(k_1+k_2\) features, different combinations of feature subsets can be enumerated. Each feature subset is evaluated by its leave-one-out cross-validation classification performance using logistic regression. We evaluate feature subsets in ascending order of subset size: we start with single features, then combinations of two features, then combinations of three features, and so on. The subset evaluation stops when the cross-validation accuracy can no longer be improved, and we report the best prediction model with the optimal feature subset. The proposed mutual-information-guided sparse feature selection framework is shown in Fig. 1. Compared with other feature selection methods, the proposed framework combines information-theoretic criteria with a sparse learning method to guide feature selection and discover the most important features efficiently.
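The third step can be sketched in Python under stated assumptions: each row of `X` is one subject's feature vector, `candidates` lists the column indices of the \(k_1+k_2\) pooled features, labels are 1 (epileptic) and 0 (non-epileptic), logistic regression is fitted by plain gradient descent with hypothetical settings (`lr`, `n_iter`), and accuracy is the mean of sensitivity and specificity, as in the validation procedure of Sect. 3. All names are illustrative, not the authors' code.

```python
import itertools
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, n_iter=300):
    """Batch gradient descent on the logistic log-loss; returns (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def loo_balanced_accuracy(X, y, cols):
    """Leave-one-out accuracy on the columns `cols`, as the mean of sensitivity and specificity."""
    preds = []
    for i in range(len(X)):
        train_X = [[row[c] for c in cols] for k, row in enumerate(X) if k != i]
        train_y = [label for k, label in enumerate(y) if k != i]
        w, b = fit_logistic(train_X, train_y)
        z = b + sum(wj * X[i][c] for wj, c in zip(w, cols))
        preds.append(1 if z >= 0 else 0)
    tp = sum(1 for t, q in zip(y, preds) if t == 1 and q == 1)
    tn = sum(1 for t, q in zip(y, preds) if t == 0 and q == 0)
    return (tp / y.count(1) + tn / y.count(0)) / 2


def ascending_subset_search(X, y, candidates):
    """Evaluate all subsets of size 1, 2, ...; stop once a larger size no longer improves accuracy."""
    best_cols, best_acc = (), -1.0
    for size in range(1, len(candidates) + 1):
        level_cols, level_acc = (), -1.0
        for cols in itertools.combinations(candidates, size):
            acc = loo_balanced_accuracy(X, y, cols)
            if acc > level_acc:
                level_cols, level_acc = cols, acc
        if level_acc <= best_acc:
            break  # the best subset of this size does not beat the smaller one
        best_cols, best_acc = level_cols, level_acc
    return best_cols, best_acc
```

Because the number of subsets grows combinatorially with subset size, this exhaustive-by-size search is tractable only for a small candidate pool, which is why the pool is first reduced by MI ranking and lasso.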
3 Experimental Design for Epilepsy Diagnosis
EEG Data Acquisition. In this study, EEG was recorded from a 128-channel electrode array using a geodesic sensor net and an Electrical Geodesics, Inc. (EGI; Eugene, OR) amplifier system; the signal was amplified at a gain of 1000 and bandpass filtered between 0.1 Hz and 100 Hz. During recording, EEG was referenced to the vertex electrode and digitized continuously at 500 Hz. The placement of the 128 scalp EEG electrodes is shown in Fig. 2.
Visually Evoked Potentials. Visually evoked potentials (VEPs) are electrical potentials (usually EEG) recorded in the presence of visual stimuli, and are distinct from spontaneous EEG potentials recorded without stimulation. In particular, steady-state visually evoked potentials (SSVEPs) have been widely investigated over the past 40 years and have been shown to be useful in studying many cognitive paradigms (visual attention, binocular rivalry, working memory, and brain rhythms) and in clinical neuroscience (epilepsy, aging, schizophrenia, migraine, autism, depression, anxiety, and stress). SSVEPs are evoked responses induced by long trains of flickering visual stimuli. The steady-state potentials are periodic, with a stationary spectrum showing stable, characteristic SSVEP peaks over a long time period. Photosensitivity has been found to be common in patients with epilepsy, and visual stimulation may engage the mechanism underlying hyperexcitability in these patients. A series of experiments by Wilkins et al. indicated that spatial properties of visual patterns can elicit epileptiform EEG abnormalities [11]. The epileptic response was reported to be sensitive to luminance, with higher luminance associated with a higher risk of epileptic responses [12]. People with migraine or epilepsy are especially prone to visual perceptual distortions and visual stress when viewing flickering striped patterns. In a recent study, Birca et al. showed that SSVEP harmonics in the gamma range (50–100 Hz) have significantly stronger amplitudes and greater phase alignment in patients with febrile seizures. In children with focal epilepsy, a similar effect in the gamma range was shown by Asano et al. [13]. 
Because patients with epilepsy are prone to abnormal EEG responses to repetitive, modulated flickering patterns, the resulting SSVEPs may be used to discriminate epileptic from non-epileptic subjects in a short EEG test, rather than through long-term EEG monitoring that typically lasts 24 h or longer. The experimental design of this study is based on this observation: we test the hypothesis that epileptic and non-epileptic EEG recordings made during steady-state visual stimulation can be classified.
Experimental Design. Eleven patients with epilepsy and eleven healthy subjects were recruited for this experiment. The 11 patients had been diagnosed with idiopathic generalized epilepsy (IGE) at the University of Washington (UW) Medicine Regional Epilepsy Center at Harborview. Patients with a history of photic-induced seizures or photoparoxysmal responses (PPR) were excluded to minimize the risk of inducing seizures during the experiment. The 11 healthy subjects had no history of neurological or psychiatric diagnoses such as migraine or schizophrenia. All patients and normal subjects had normal or corrected-to-normal visual acuity.
Each subject underwent the same experimental protocol during EEG recording. The visual stimuli consisted of a high-contrast stripe pattern presented on a 19-inch LaCie Electron Blue IV monitor at a resolution of \(800 \times 600\) pixels, with a 72 Hz vertical refresh rate and a mean luminance of 34 cd/\(\mathrm{m}^2\). The stripe pattern flickered (condition 1) or switched (condition 2) at 7.5 Hz, and its contrast was temporally modulated through 10 levels, stepping periodically from the lowest contrast (level 1) to the highest contrast (level 10). Each contrast level lasted 1.067 s, with 16 reversals of the flicker pattern; thus, each stimulus spanning all 10 contrast levels lasted 10.67 s. Each subject performed 20 trials for condition 1 and 20 trials for condition 2, with brief breaks between trials. A typical session lasted about 10–15 min.
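The stimulus timing figures can be checked with a short calculation, assuming (our assumption, not stated in the protocol) that each 7.5 Hz flicker cycle comprises two pattern reversals, so that the pattern reverses 15 times per second:

```latex
% Assumption: one 7.5 Hz flicker cycle = two pattern reversals (15 reversals/s)
\begin{aligned}
t_{\text{level}}    &= \frac{16\ \text{reversals}}{7.5\ \text{cycles/s} \times 2\ \text{reversals/cycle}}
                     = \frac{16}{15}\ \text{s} \approx 1.067\ \text{s},\\
t_{\text{stimulus}} &= 10 \times \frac{16}{15}\ \text{s} = \frac{32}{3}\ \text{s} \approx 10.67\ \text{s},\\
t_{\text{trials}}   &= (20 + 20) \times \frac{32}{3}\ \text{s} = \frac{1280}{3}\ \text{s} \approx 427\ \text{s} \approx 7.1\ \text{min}.
\end{aligned}
```

Under this assumption, the 40 trials amount to roughly 7 min of stimulation, which is consistent with a 10–15 min session that also includes breaks.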
Signal Processing and Feature Extraction. A visual stimulus flickering at a constant frequency can evoke harmonic oscillations, and SSVEPs have been found to have the same fundamental frequency (first harmonic) as the visual stimulation frequency [14]. A recent study showed that higher SSVEP harmonics can also play an important role in studying brain function [15]. In this study, we extracted SSVEP frequency features using the discrete Fourier transform (DFT) with a 0.5 Hz resolution for each EEG channel of each trial, using segments 1.067 s long. The DFT frequency components depend on signal amplitude: if two signals differ in strength, their DFT coefficients also differ, even if the two time series share similar wave patterns. EEG signals are known to have significant inter-individual variability [16], and signal amplitudes can vary considerably from one person to another. The raw DFT components can therefore be problematic for feature selection and model construction across subjects. To tackle this problem, we introduced a normalization step based on Parseval’s theorem, which states that the power spectrum summed over all frequencies equals the total power of the signal; for a zero-mean signal, with suitable normalization of the DFT, this equals the variance of the signal. Based on this rule, we use the standard deviation of a signal as the normalization factor and normalize the signal to unit variance before applying the DFT.
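To illustrate why the unit-variance normalization removes amplitude differences, the following minimal Python sketch (a naive O(N²) DFT applied to a short synthetic signal, not EEG data) normalizes a signal, checks Parseval's relation \(\frac{1}{N^2}\sum_k |X_k|^2 = \frac{1}{N}\sum_n x_n^2\), which equals 1 after normalization, and shows that a scaled copy of the signal yields the same normalized spectrum. The function names are hypothetical.

```python
import cmath
import math


def unit_variance(signal):
    """Remove the mean and scale to unit (population) variance."""
    n = len(signal)
    mean = sum(signal) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    return [(v - mean) / sd for v in signal]


def dft(x):
    """Naive O(N^2) DFT: X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def spectral_power(x):
    """(1/N^2) * sum_k |X_k|^2, which by Parseval's theorem equals (1/N) * sum_n x_n^2."""
    spectrum = dft(x)
    return sum(abs(c) ** 2 for c in spectrum) / len(x) ** 2
```

For any normalized signal `spectral_power` returns 1 up to rounding, and `dft(unit_variance(s))` is the same for a signal `s` and for `5 * s`, so subject-specific amplitude scaling no longer affects the extracted frequency components.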
From the normalized DFT frequency components, the components at the stimulation frequency (7.5 Hz) and its multiples (up to the 9th harmonic) were selected as signal features. Each EEG segment is thus represented by nine features, one for each harmonic frequency component. The feature extraction was applied to each EEG channel of each trial for each subject. For each subject, the features from trials with the same contrast level were averaged to obtain the features for that contrast level. In summary, there are 128 (channels) \(\times \) 10 (contrast levels) \(\times \) 9 (frequency components) = 11520 features for each subject. The feature selection framework of Sect. 2 is then used to select the most informative features for discriminating epileptic patients from normal subjects.
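The harmonic feature extraction can be sketched in Python as below. The sampling rate (500 Hz), stimulation frequency (7.5 Hz), nine harmonics, and 0.5 Hz resolution come from the text; obtaining the 0.5 Hz grid by zero-padding each segment to \(500/0.5 = 1000\) points and using the magnitude of the normalized DFT coefficient as the feature are our assumptions. On this grid, the h-th harmonic at \(7.5h\) Hz falls exactly on bin \(15h\).

```python
import cmath
import math

FS = 500.0          # sampling rate in Hz (from the acquisition setup)
F_STIM = 7.5        # stimulation frequency in Hz
N_HARMONICS = 9     # 1st through 9th harmonic
RESOLUTION = 0.5    # DFT bin spacing in Hz, as stated in the text
TOTAL_FEATURES = 128 * 10 * 9  # channels x contrast levels x harmonics = 11520


def unit_variance(signal):
    """Remove the mean and scale to unit (population) variance."""
    n = len(signal)
    mean = sum(signal) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in signal) / n)
    return [(v - mean) / sd for v in signal]


def harmonic_features(segment, fs=FS, f_stim=F_STIM, n_harm=N_HARMONICS, res=RESOLUTION):
    """Normalized DFT magnitudes at f_stim, 2*f_stim, ..., n_harm*f_stim for one segment."""
    x = unit_variance(segment)
    n_fft = int(round(fs / res))  # zero-padded DFT length giving `res` Hz bins (assumption)
    if len(x) > n_fft:
        raise ValueError("segment is longer than the zero-padded DFT length")
    feats = []
    for h in range(1, n_harm + 1):
        k = int(round(h * f_stim / res))  # bin index of the h-th harmonic (15 * h here)
        # Zero-padded samples add nothing to the sum, so only the real samples are used.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        feats.append(abs(coeff) / len(x))
    return feats
```

A 1.067 s segment at 500 Hz has about 533 samples, so `harmonic_features` returns nine values per channel and segment; averaging them over trials per contrast level would give the 11520 features per subject counted above.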
Assessment and Validation. Feature subsets were assessed with the leave-one-out cross-validation procedure shown in Fig. 3. Cross-validation techniques have been used extensively to assess classification models while reducing the bias introduced by the choice of training and testing data. In this study, we employed a leave-one-patient-out cross-validation methodology to avoid the potential bias of having EEG samples from the same patient in both the training and the testing data. We measured model classification accuracy as the average of sensitivity and specificity, two performance measures widely used in the medical domain. We labeled EEG samples from epileptic patients as positive and those from non-epileptic subjects as negative. Sensitivity is the fraction of positive cases classified as positive; specificity is the fraction of negative cases classified as negative.
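The validation scheme can be sketched in Python as follows: a hypothetical `leave_one_subject_out` generator holds out all samples of one subject at a time, so no subject contributes to both training and testing, and the accuracy measure is the mean of sensitivity and specificity with epileptic samples as the positive class. The function names and the string subject identifiers are illustrative assumptions.

```python
def leave_one_subject_out(subject_ids):
    """Yield (subject, train_indices, test_indices) so that no subject's samples are split."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Classification accuracy as the mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return (sens + spec) / 2
```

With the per-subject averaged features described above, each subject contributes a single sample, so this scheme reduces to ordinary leave-one-out cross-validation over the 22 subjects.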
4 Computational Results
We applied our feature selection and classification approach to each of the 10 contrast levels and each of the 9 harmonic frequencies independently. This setup is designed to determine which contrast level and which harmonic frequency best discriminate epileptic patients from normal subjects. In the feature selection step, we first selected the ten features with the highest MI and then applied lasso to select additional features from the remaining features with relatively low MI values. Once the feature candidates were finalized (the lasso-selected low-MI features and the top 10 high-MI features), we enumerated feature subsets starting from single features. The feature combination with the highest cross-validation classification accuracy was selected as the optimal feature subset. The classification accuracies for each contrast level and harmonic frequency are shown in Table 1. Contrast level 7 and the 5th harmonic frequency yielded the best validation accuracy of 90 %, with six selected channels: 53, 54, 56, 75, 114, and 119. Because the feature selection is guided by prior knowledge, the selected features are readily interpretable by physicians and neurologists.
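The per-setting analysis can be made concrete with a small indexing sketch. Assuming (hypothetically) that each subject's 11520 features are stored channel-major, then by contrast level, then by harmonic, each of the 10 × 9 = 90 (contrast level, harmonic) settings selects its own 128 channel features, and together the settings partition the full feature set. Channel, contrast, and harmonic numbers are 1-based, as in the text; the function names are illustrative.

```python
N_CHANNELS, N_CONTRASTS, N_HARMONICS = 128, 10, 9


def column_index(channel, contrast, harmonic):
    """Flatten a 1-based (channel, contrast level, harmonic) triple into a 0-based column."""
    return ((channel - 1) * N_CONTRASTS + (contrast - 1)) * N_HARMONICS + (harmonic - 1)


def subproblem_columns(contrast, harmonic):
    """The 128 channel features analysed together for one (contrast, harmonic) setting."""
    return [column_index(ch, contrast, harmonic) for ch in range(1, N_CHANNELS + 1)]


def all_subproblems():
    """Map each of the 10 x 9 = 90 settings to its own feature columns."""
    return {
        (c, h): subproblem_columns(c, h)
        for c in range(1, N_CONTRASTS + 1)
        for h in range(1, N_HARMONICS + 1)
    }
```

For example, `subproblem_columns(7, 5)` returns the 128 columns for contrast level 7 and the 5th harmonic, the setting that gave the best accuracy in Table 1; the feature selection pipeline is run separately on each of the 90 column sets.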
We also compared our approach with three popular feature selection methods: regular lasso feature selection [17], stepwise feature selection using statistical significance tests [6], and Pudil’s floating search [5]. Table 2 compares the classification performance of our method with that of these three methods. The feature subset selected by our approach achieved the highest cross-validation accuracy of 90 %, followed by Pudil’s floating search with 85 %; regular lasso and stepwise selection both achieved a validation accuracy of 80 %. Across the 10 contrast levels and 6 harmonic frequencies, the proposed approach also achieved an overall accuracy of 61 %, while the other methods achieved around 50 % with larger standard deviations. This indicates that the feature subset selected by the proposed approach had better discriminative power than the subsets selected by the competing approaches. These results suggest that the proposed feature selection framework effectively captures feature dependencies and discovers an optimal feature subset: combining high-MI features with promising low-MI features produced stronger discriminative feature subsets that most current feature selection algorithms may overlook. The proposed information-guided sparse feature selection framework can generate a sparse model with good interpretability while preserving the most informative feature combinations to improve classification performance.
5 Conclusions and Discussions
A quick and accurate epilepsy-screening tool could greatly reduce the associated healthcare costs and improve the current diagnosis procedure. To reliably recognize whether a patient has epilepsy, we developed a novel mutual-information-guided sparse feature selection and classification framework that identifies epilepsy-specific patterns in visually evoked potentials recorded during a human-computer task. In our experiments, the proposed method achieved the highest diagnostic accuracy among the compared popular methods. The proposed method could help physicians determine whether a patient is epileptic or non-epileptic in a quick screening process. More importantly, the proposed information-theory-guided sparse feature selection is a general framework. It is also promising for helping physicians and neurologists recognize abnormal brainwave patterns in large medical datasets acquired with different brain imaging techniques (such as EEG, MEG, and fMRI). The long-term goal of this study is to develop a fast, reliable, and affordable epilepsy diagnostic system using short-term interictal EEG signals. Such a system could transform current epilepsy diagnosis practice through wide and convenient applications.
References
Epilepsy Foundation: Epilepsy foundation: not another moment lost to seizures (2006). http://www.epilepsyfoundation.org
van Donselaar, C.A., Stroink, H., Arts, W.-F.: How confident are we of the diagnosis of epilepsy? Epilepsia 47(S1), 9–13 (2006)
Forsgren, L.: Prospective incidence study and clinical characterization of seizures in newly referred adults. Epilepsia 31, 292–301 (1990)
Saeys, Y., Inza, I., Larranaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Pudil, P., Novovičová, J., Kittler, J.: Floating search methods in feature selection. Pattern Recogn. Lett. 15(11), 1119–1125 (1994)
Draper, N., Smith, H.: Applied Regression Analysis, 2nd edn. Wiley-Interscience, Hoboken (1998)
Hall, M.: Correlation-based feature selection for machine learning. Ph.D. thesis, Department of Computer Science, Waikato University, New Zealand (1999)
Yu, L., Liu, H.: Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. 5, 1205–1224 (2004)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27(8), 1226–1238 (2005)
Cover, T.M., Thomas, J.A.: Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), 2nd edn. Wiley-Interscience, Hoboken (2001)
Wilkins, A., Nimmo-Smith, I., Tait, A., McManus, C., Sala, S.D., Tilley, A., Arnold, K., Barrie, M., Scott, S.: A neurological basis for visual discomfort. Brain 107, 989–1017 (1984)
Vialatte, F., Maurice, M., Dauwels, J., Cichocki, A.: Steady-state visually evoked potentials: focus on essential paradigms and future perspectives. Prog. Neurobiol. 90(4), 418–438 (2010)
Asano, E., Nishida, M., Fukuda, M., Rothermel, R., Juhasz, C., Sood, S.: Differential visually-induced gamma-oscillations in human cerebral cortex. NeuroImage 45(2), 477–489 (2009)
Regan, D.: Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. Elsevier Science Ltd., New York (1989)
Muller-Putz, G.R., Scherer, R., Brauneis, C., Pfurtscheller, G.: Steady-state visual evoked potential (SSVEP)-based communication: impact of harmonic frequency components. J. Neural Eng. 2(4), 123–130 (2005)
Smith, S.: EEG in the diagnosis, classification, and management of patients with epilepsy. J. Neurol. Neurosurg. Psychiatry 76, ii2–ii7 (2005)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Wang, S., Xiao, C., Tsai, J.J., Chaovalitwongse, W., Grabowski, T.J. (2016). A Novel Mutual-Information-Guided Sparse Feature Selection Approach for Epilepsy Diagnosis Using Interictal EEG Signals. In: Ascoli, G., Hawrylycz, M., Ali, H., Khazanchi, D., Shi, Y. (eds) Brain Informatics and Health. BIH 2016. Lecture Notes in Computer Science, vol 9919. Springer, Cham. https://doi.org/10.1007/978-3-319-47103-7_27
DOI: https://doi.org/10.1007/978-3-319-47103-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47102-0
Online ISBN: 978-3-319-47103-7
eBook Packages: Computer Science, Computer Science (R0)