
1 Introduction

According to the World Health Organization, epilepsy is one of the most common neurological diseases globally; approximately 50 million people worldwide suffer from it [1]. Epilepsy is a chronic disorder of the brain that results from excessive electrical discharges in a group of brain cells, and different parts of the brain can be the site of such discharges [1]. Recent studies have shown that up to 70% of patients can be successfully treated with anti-epileptic drugs (AEDs) [1]. For patients with drug-resistant focal epilepsy, resection of the epileptogenic tissue is one of the most promising treatments for controlling epileptic seizures. Hence, it is very important to determine the seizure focus for surgical therapy. Because focal iEEG signals are more stationary and less random than non-focal iEEG signals, iEEG can be used to identify the location of the focus [2].

As it is not certain that symptoms will be present in the EEG signal at all times, interictal iEEG signals from epilepsy patients are monitored or recorded over the long term. This process generates such large amounts of data that detecting seizures in the iEEG recordings by visual inspection is time-consuming for neurological experts, which can delay the patient’s course of treatment by hours or even weeks. Therefore, many automatic detection methods have been proposed to assist neurologists by accelerating the reading process and thereby reducing workload, such as the classification of normal, interictal and seizure signals [3]. For drug-resistant focal epilepsy, automatic localization of the seizure focus from the interictal iEEG signal is required. To determine the epileptic seizure focus from the iEEG signals, it is essential to extract the most discriminative features and then classify them as focal or non-focal.

For feature extraction, commonly used methods include entropy, empirical mode decomposition (EMD) [4] and time-frequency analysis such as the Fourier transform and wavelet transforms (WT) [5]. In particular, it has been demonstrated that time-frequency features extracted with the short-time Fourier transform (STFT) are suitable for classifying EEG signals for epilepsy [6]. For classification, machine learning techniques have become popular in recent years, including support vector machines (SVM) [4], the K-nearest-neighbor method (KNN) [7] and deep learning models such as the convolutional neural network (CNN) and the recurrent neural network (RNN) [8]. Several computer-aided solutions based on deep learning that use the raw iEEG time-series signal as input have been proposed to localize the seizure focus in epilepsy [9]. However, the combination of time-frequency representations and CNNs has not been widely tested for this task.

In this paper, inspired by the success of CNNs with the raw EEG time-series signal [9], we propose a deep learning approach for classifying focal and non-focal iEEG signals that combines time-frequency analysis and a CNN: simple features are extracted in the time-frequency domain with the STFT, discriminative features are learned by convolutional layers, and classification is performed by a fully connected layer.

2 Dataset

The iEEG signals used in this study are obtained from the publicly available Bern-Barcelona iEEG dataset provided by Andrzejak et al. at the Department of Neurology of the University of Bern [2]. The data were collected from five epilepsy patients who underwent long-term iEEG recordings. All patients suffered from long-standing drug-resistant temporal lobe epilepsy and were candidates for surgery.

Signals recorded at epileptogenic zones were labeled as focal signals; all other signals were labeled as non-focal. The dataset consists of 3750 pairs of focal iEEG signals and 3750 pairs of non-focal iEEG signals. Each pair contains the signals of two adjacent channels, recorded for 20 s at a sampling frequency of 512 Hz. To ensure that no seizure activity is included, iEEG signals recorded during a seizure and in the three hours after the last seizure were excluded.

Examples of a focal and a non-focal iEEG signal pair are shown in Fig. 1.

Fig. 1. An example of the focal and non-focal iEEG signals

Fig. 2. Proposed method of classification

3 Method

The method consists of two key parts. First, the STFT and Z-score normalization are applied in succession to preprocess the iEEG signal into feature arrays in the time-frequency domain. The feature arrays are then fed into the CNN, which classifies the iEEG signal as focal or non-focal.

The proposed method used to classify iEEG signals into focal and non-focal is shown in Fig. 2.

3.1 Short Time Fourier Transform (STFT)

Because the iEEG signal is non-stationary, it is difficult to extract its key features with the plain Fourier transform [10]. The STFT, one of the most commonly used time-frequency analysis methods, is therefore applied. First, the iEEG signal is transformed into a time-frequency spectrogram by the STFT; the spectrogram is then converted into a 2D array that is input to the CNN for training or testing.

For STFT, the process is to divide a longer time signal into shorter segments of equal length and then use the Fourier transform to compute the Fourier spectrum of each shorter segment.

Given a signal x(t), its time-frequency representation at each time point can be obtained by formula (1).

$$\begin{aligned} STFT \{x(t)\}(\tau ,\omega )=\int _{-\infty }^{\infty }x(t)w(t-\tau )e^{-j\omega t}\,dt \end{aligned}$$
(1)

where w(t) is the Hann window function centered around zero, \(\tau \) is the time position of the window and \(\omega \) is the angular frequency.
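A discrete version of formula (1) can be sketched with NumPy as follows. This is a minimal illustration, not the implementation used in this study: the frame length `n_fft = 512`, the hop size `hop = 102` and the centered zero padding are assumptions, chosen only because they turn a 20 s signal sampled at 512 Hz into a 257 \(\times \) 101 array, the input size reported in Sect. 3.3. The function name `stft_magnitude` and the 10 Hz test tone are hypothetical.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=102):
    """Magnitude STFT of a 1D signal using a Hann window.

    n_fft and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(n_fft)                 # Hann window w
    padded = np.pad(x, n_fft // 2)             # zero padding so frames are centered
    n_frames = 1 + (len(padded) - n_fft) // hop
    # Cut the signal into overlapping segments of equal length.
    frames = np.stack([padded[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # Fourier spectrum of each windowed segment (non-negative frequencies only).
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum).T                  # shape: (frequency bins, time frames)

fs = 512                                       # sampling frequency in Hz
t = np.arange(20 * fs) / fs                    # 20 s of samples
x = np.sin(2 * np.pi * 10 * t)                 # synthetic 10 Hz tone standing in for iEEG
spec = stft_magnitude(x)
print(spec.shape)                              # (257, 101)
```

With a 512-sample frame at 512 Hz, each frequency bin is 1 Hz wide, so the synthetic tone peaks in bin 10 of every frame that lies fully inside the signal.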

Examples of spectrograms of iEEG signals are shown in Fig. 3.

Fig. 3. STFT of focal and non-focal iEEG signals

3.2 Z-Score Normalization

Before being fed into the neural network, the data are normalized to improve the accuracy of the network and speed up convergence. Z-score normalization is employed in this study, based on the mean and standard deviation of the original spectrogram array.

Z-score normalization is defined as:

$$\begin{aligned} z = \frac{x-\bar{x}}{S} \end{aligned}$$
(2)

where \(\bar{x}\) is the mean and S is the standard deviation of the data being normalized (here, the original spectrogram array).
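Formula (2) amounts to a few lines of NumPy. The sketch below assumes that the mean and standard deviation are computed over one whole spectrogram array; the function name `z_score`, the small `eps` guard against division by zero and the random test array are illustrative additions.

```python
import numpy as np

def z_score(spectrogram, eps=1e-12):
    """Normalize an array to zero mean and unit standard deviation."""
    mean = spectrogram.mean()                  # x-bar in formula (2)
    std = spectrogram.std()                    # S in formula (2)
    return (spectrogram - mean) / (std + eps)  # eps is an assumed safeguard

rng = np.random.default_rng(0)
spectrogram = rng.random((257, 101)) * 50.0    # stand-in for an STFT magnitude array
normalized = z_score(spectrogram)
```

After normalization the array has a mean of approximately zero and a standard deviation of approximately one, regardless of the scale of the original spectrogram.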

3.3 Convolutional Neural Network (CNN)

The CNN architecture contains three types of layer: convolutional layers, pooling layers, and fully connected layers.

Convolutional Layer: The preprocessed STFT data are used as input to the convolutional layers. In a convolutional layer, the input time-frequency spectrogram is convolved with a learnable filter (kernel), which is a matrix, and the stride controls how far the filter moves across the input at each step. The output of the convolution, also known as the feature map, is obtained after adding a bias and applying a non-linear activation function.
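The operation of a single filter can be sketched in plain NumPy. This is an illustration under stated assumptions: the activation function is not named in the text, so ReLU is used as a stand-in; zero padding keeps the output the same size as the input at stride 1; odd kernel sizes are assumed; and, as in most deep learning frameworks, the filter is applied without flipping (cross-correlation). The function name `conv2d_same` is hypothetical.

```python
import numpy as np

def conv2d_same(image, kernel, bias=0.0, stride=1):
    """Slide one filter over a 2D array with zero padding, add a bias, apply ReLU."""
    kh, kw = kernel.shape                      # assumed odd, e.g. 3 x 3
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out_h = -(-image.shape[0] // stride)       # ceil(height / stride)
    out_w = -(-image.shape[1] // stride)       # ceil(width / stride)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = padded[i * stride : i * stride + kh, j * stride : j * stride + kw]
            out[i, j] = np.sum(patch * kernel) + bias   # weighted sum plus bias
    return np.maximum(out, 0.0)                # ReLU (assumed activation)

spectrogram = np.arange(20, dtype=float).reshape(4, 5)  # tiny stand-in input
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                           # filter that copies the center value
feature_map = conv2d_same(spectrogram, identity)
```

With the identity filter and a non-negative input, the feature map equals the input, which makes the padding and indexing easy to check by hand. In a trained network the filter weights and the bias are learned.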

Pooling Layer: In the pooling layer, also known as the down-sampling layer, feature maps from the previous layer are down-sampled to lower the computational complexity and prevent overfitting. Of the many kinds of pooling operation, max-pooling is used in this study: it takes the maximum value of each region of the feature map and consequently reduces the number of output neurons.
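Max-pooling on one feature map can be sketched as follows. The 3 \(\times \) 3 window and stride 2 match the values given for P1 in the training steps; the padding rule (pad the borders so that the output size is the input size divided by the stride, rounded up) is an assumption that reproduces the reported 129 \(\times \) 51 output for a 257 \(\times \) 101 input. The function name `max_pool_same` is hypothetical.

```python
import numpy as np

def max_pool_same(fmap, size=3, stride=2):
    """Max-pool a 2D feature map; output size is ceil(input size / stride)."""
    h, w = fmap.shape
    out_h, out_w = -(-h // stride), -(-w // stride)
    # Total padding needed so that the last window still fits.
    pad_h = max((out_h - 1) * stride + size - h, 0)
    pad_w = max((out_w - 1) * stride + size - w, 0)
    padded = np.pad(
        fmap,
        ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
        constant_values=-np.inf,               # padding never wins the maximum
    )
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride : i * stride + size, j * stride : j * stride + size]
            out[i, j] = window.max()           # keep the largest value of each region
    return out

rng = np.random.default_rng(0)
fmap = rng.random((257, 101))                  # stand-in for one C1 feature map
pooled = max_pool_same(fmap)
print(pooled.shape)                            # (129, 51)
```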

Fully Connected Layer: In the fully connected layer, all the 2D feature maps from the previous layer are flattened into a one-dimensional feature vector that serves as the input of this layer. In this study, the output is obtained by taking the dot product of the feature vector and a learnable weight vector, adding a learnable bias and then applying the activation function.
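The flattening and dot-product steps can be sketched as below. The input shape of 160 feature maps of size 9 \(\times \) 4 is the size reported for the last pooling stage; the softmax activation on the two outputs, the random weights and the function names are assumptions made for illustration, since the text does not name the activation function.

```python
import numpy as np

def dense(features, weights, bias):
    """One fully connected layer: a dot product per output neuron plus a bias."""
    return weights @ features + bias

def softmax(logits):
    """Map raw outputs to probabilities that sum to one (assumed activation)."""
    exp = np.exp(logits - logits.max())        # subtract the max for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
feature_maps = rng.random((160, 9, 4))         # stand-in for the output of P5
feature_vector = feature_maps.reshape(-1)      # flatten to 160 * 9 * 4 = 5760 values
weights = rng.normal(scale=0.01, size=(2, feature_vector.size))  # one row per class
bias = np.zeros(2)
probabilities = softmax(dense(feature_vector, weights, bias))
```

Each row of `weights` is the weight vector of one output neuron; in a trained network these values and the bias are learned rather than drawn at random.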

CNN Architecture: The preprocessed spectrogram produced by the STFT, a time-frequency array of size 257 \(\times \) 101, is the input on which the CNN performs its convolution operations. Local features are extracted individually based on local correlations in the time-frequency domain, and overall features are built by connecting the local features. The CNN architecture is shown in Fig. 4. The network consists of five convolutional layers (C1 to C5), each followed by a pooling layer (P1 to P5), followed by six fully connected layers; the output layer has two neurons corresponding to the focal iEEG signal and the non-focal iEEG signal. The training process builds relationships between iEEG signals and labels. The specific training process is as follows [11]:

  1. A time-frequency spectrogram of size 257 \(\times \) 101 is convolved in C1 with a 3 \(\times \) 3 filter sliding with stride 1, producing 10 feature maps of the same size as the input.

  2. P1 applies a pooling operation with a 3 \(\times \) 3 window sliding with stride 2 to the 10 feature maps output by C1; its output size is 129 \(\times \) 51.

  3. C2 to C5 and P2 to P5 are similar to C1 and P1, except that their input and output sizes are determined by the preceding layer and the number of feature maps increases exponentially. The final output is 160 feature maps of size 9 \(\times \) 4.

  4. The fully connected layer has 160 \(\times \) 9 \(\times \) 4 neurons connected to the feature maps obtained from P5, and the output layer has two neurons connected to the fully connected layer for classification. Finally, each signal is trained to correspond to one kind of label.
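The layer sizes in the steps above can be checked with a short calculation. The sketch assumes that each convolution preserves the spatial size, that each stride-2 pooling rounds the size up after halving, and that the number of feature maps doubles at each stage (10, 20, 40, 80, 160); the doubling is an inference from the 10 maps of C1 and the 160 maps after P5, not a figure stated for the intermediate layers.

```python
import math

def pooled_size(size, stride=2):
    """Output length of a stride-2 pooling with border padding (assumed rule)."""
    return math.ceil(size / stride)

height, width = 257, 101                       # input spectrogram size
shapes = []
for stage in range(5):                         # stages C1/P1 to C5/P5
    n_maps = 10 * 2 ** stage                   # assumed doubling of feature maps
    height, width = pooled_size(height), pooled_size(width)
    shapes.append((n_maps, height, width))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)      # [(10, 129, 51), (20, 65, 26), (40, 33, 13), (80, 17, 7), (160, 9, 4)]
print(flattened)   # 5760
```

The result reproduces the 129 \(\times \) 51 output of P1 and the 160 feature maps of size 9 \(\times \) 4 after P5, giving 5760 inputs to the fully connected layer.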

Fig. 4. CNN architecture in the method

4 Experimental Result and Discussion

The proposed algorithm was implemented in the Python programming language with the TensorFlow framework, on a workstation with 12 Intel Core i7 3.50 GHz (5930K), a GeForce GTX 1080 graphics processing unit (GPU) and 128 GB of random-access memory (RAM).

Ten-fold cross-validation is used in this study: in each fold, 90% of the dataset is used as the training set (including 10% as a validation set), and the remaining 10% is used as the test set. Because performing each epoch with one pass over the full training set requires a lot of computational overhead, stochastic gradient descent training is used in this paper. In each training epoch, 100 batches of 120 samples each are randomly fed into the network, and the network is validated on the validation set after each epoch.
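The data split and batch sampling can be sketched with index arrays. Several details are assumptions: the 7500 signal pairs are treated as 7500 samples, the validation set is taken as 10% of the full dataset (one fold), and each batch is drawn at random from the training indices without replacement within the batch. The function names are hypothetical.

```python
import numpy as np

def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train, validation, test) index arrays for each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        v = (k + 1) % n_folds                  # next fold serves as validation (assumed)
        train = np.concatenate([folds[i] for i in range(n_folds) if i not in (k, v)])
        yield train, folds[v], folds[k]

def sample_batches(train_idx, n_batches=100, batch_size=120, seed=0):
    """Yield randomly drawn mini-batches of training indices for one epoch."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        yield rng.choice(train_idx, size=batch_size, replace=False)

splits = list(ten_fold_splits(7500))           # 3750 focal + 3750 non-focal pairs
train_idx, val_idx, test_idx = splits[0]
batches = list(sample_batches(train_idx))
print(len(train_idx), len(val_idx), len(test_idx))   # 6000 750 750
```

Under these assumptions each fold trains on 6000 samples, validates on 750 and tests on 750, and one epoch of 100 batches of 120 samples draws 12000 samples in total.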

The classification accuracy on the validation set across all ten folds is shown in Fig. 5.

Compared with the published works recorded in Table 1 [12], our proposed method does not achieve the best classification accuracy, but it still obtains 91.8% accuracy. The advantage of this method is that it requires less preprocessing for feature extraction and selection than other methods such as EMD and entropy.

Fig. 5. Accuracy of the validation set classification

Table 1. Detection results of focal and non-focal EEG signals of published journal articles using the Bern-Barcelona EEG database

5 Conclusion

Since manual visual inspection of iEEG is a time-consuming process, an effective classifier that automates the detection of the epileptic focus has the potential to reduce delays in treatment. In this paper, we propose a new recognition method for iEEG-based localization of the epileptic focus based on STFT and CNN with additional preprocessing, and we implement a 15-layer CNN model for automated iEEG signal classification. The resulting 91.8% accuracy demonstrates that this method is effective, with an efficient and fast preprocessing step, for localizing the focal epileptic seizure area.