Background

An epileptic seizure is a critical clinical problem [1], and the electroencephalogram (EEG) is one of the most prominent ways to study epilepsy and to capture changes in electrical brain activity that could indicate an imminent seizure [2]. The diagnosis of epilepsy relies on manual inspection of EEG, which is time-consuming and error-prone. Elger and Hoppe found that patients accurately document fewer than half of their seizures, and that more than half of the seizures captured in long-term video EEG monitoring go unreported [3]. It is therefore of great significance to develop practical and reliable intelligent diagnosis algorithms for automatic seizure detection. Although considerable effort has been devoted to the field, seizure detection analysis has not yet made its way into clinical practice [4].

The task of seizure detection includes distinguishing the different stages of seizures, which are generally divided into inter-ictal, pre-ictal and ictal periods [5]. In general, the seizure detection procedure has two parts: feature extraction and classification. Numerous studies are based on hand-crafted features and machine learning classifiers [6]. For feature extraction, methods based on time-frequency analysis [7], nonlinear dynamics [8], complexity, synchronization [9] and increments of accumulated energy [10] have been used. For classification, common machine learning classifiers include Bayesian networks, traditional neural networks and support vector machines (SVMs). Such feature-classifier engineering techniques have been used successfully in seizure detection tasks [11]. However, the features are extracted with a limited, predefined set of hand-engineered operations. More importantly, because seizure characteristics vary among patients and may change over time, it is necessary to extract and learn informative features from EEG data automatically.

In the past decade, advances in deep learning have attracted increasing attention in detection and prediction analytics, especially in health care and medical practice [12, 13]. Deep learning is a powerful computational tool that enables features to be learned automatically from data. Previous studies have shown that deep multi-layer perceptron neural networks perform better than traditional methods such as logistic regression [14] and support vector machines [15]. A 13-layer deep convolutional neural network (CNN) achieved an accuracy, specificity, and sensitivity of 88.67, 90.00, and 95.00%, respectively, on the small public Bonn University dataset [16]. An ensemble of pyramidal one-dimensional CNN models [17] was proposed to reduce memory usage and detection time. A recurrent convolutional neural network learned a general, spatially invariant representation of seizures, significantly exceeding previous results obtained with cross-patient classifiers [18]. Unsupervised deep neural networks such as the denoising sparse auto-encoder (DSAE) have been used to detect seizures in a timely manner, but they may miss important information because of the sparsity strategy [19]. Other techniques, such as deep belief networks and transfer learning, have also been applied to seizure detection [20, 21]. These deep learning algorithms lay the foundation for seizure detection research [22].

Although deep neural networks are well suited to time series classification [23, 24], it is difficult for them to learn information from multiple electrodes simultaneously. One approach to multi-channel analysis is to study each electrode separately and then integrate the results [25]. Another uses a two-dimensional (2D) CNN to learn from multiple electrodes, but neglects the relationships between electrodes [26]. We therefore present an accurate, fully automated CNN with three-dimensional (3D) kernels for seizure detection that can be adapted to an individual's needs. 3D convolution was originally designed to address the fact that 2D CNNs ignore inter-frame information when recognizing image sequences.

In this study, the time series of each EEG channel are transformed into images, and the images from all channels are combined into 3D images. A CNN with 3D kernels is then constructed to classify the image datasets into different epileptic EEG stages. The main contributions of this work are as follows:

  1) An efficient method is proposed to preprocess raw EEG data into a 3D image form suitable for a CNN, which integrates multi-channel information;

  2) To our knowledge, this is the first time that a deep CNN with 3D kernels has been applied to epileptic EEG datasets. In addition, we propose instructive settings to help the CNN perform well in the seizure detection task;

  3) The performance of the 3D CNN methodology was validated on test data and compared with both a 2D CNN and traditional machine learning techniques previously evaluated in the literature.

Methods

Data resource and data preparation

Data resource

The data used in this study were collected from epileptic patients in the electroencephalogram room of the Department of Neurology, the First Affiliated Hospital of Xinjiang Medical University, from 2013 to 2016. The sampling frequency was 500 Hz, and the electrodes were placed according to the international 10–20 system. Clinical experts labeled every seizure. Details of the patients are shown in Table 1.

Table 1 The details of collected data

Different seizure types have different signal characteristics, and the performance of seizure detection depends on the type of epileptic seizure [27]. Therefore, in this paper, data from patients with complex partial seizures were selected. The experimental data included 13 patients aged 6 to 51 years. A total of 159 seizures were recorded, an average of 12.2 seizures per patient. Each patient was observed for 24 h, and the total seizure time was 9956 s.

Data preparation

Numerous investigations have demonstrated a gradual transition between the inter-ictal and ictal states, which is defined as the pre-ictal stage [28]. Seizure detection can therefore be treated as the classification of three states. In this study, the EEG data collected from clinical patients were divided into three stages, inter-ictal, pre-ictal and ictal, as depicted in Fig. 1 and defined as follows:

  • Pre-ictal state: The one-hour segment preceding each seizure was defined as the pre-ictal state [29].

  • Ictal state: Neurophysiology experts labeled the clinical seizures.

  • Inter-ictal state: EEG data from each patient that were neither pre-ictal nor ictal were defined as the inter-ictal state.
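The three stage definitions above can be turned into per-sample labels. The sketch below is illustrative only: it assumes seizure annotations are available as hypothetical `(onset_s, offset_s)` pairs in seconds, uses the 500 Hz sampling rate and one-hour pre-ictal horizon from the text, and lets ictal marks override pre-ictal marks where they overlap (an assumption; the text does not say how overlaps were resolved).

```python
import numpy as np

FS = 500            # sampling frequency in Hz, as reported for this dataset
PRE_ICTAL_S = 3600  # one-hour pre-ictal horizon

# Hypothetical integer codes for the three stages.
INTER, PRE, ICTAL = 0, 1, 2


def label_samples(n_samples, seizures, fs=FS, pre_s=PRE_ICTAL_S):
    """Assign a stage label to every EEG sample.

    seizures: list of (onset_s, offset_s) expert annotations in seconds
    (a hypothetical input format). Everything not marked pre-ictal or
    ictal stays inter-ictal.
    """
    labels = np.full(n_samples, INTER, dtype=np.int8)
    # Mark pre-ictal windows first so that ictal marks take precedence.
    for onset, _ in seizures:
        start = max(0, int((onset - pre_s) * fs))
        labels[start:int(onset * fs)] = PRE
    for onset, offset in seizures:
        labels[int(onset * fs):int(offset * fs)] = ICTAL
    return labels
```

For a 24 h recording this would be called as `label_samples(24 * 3600 * FS, seizures)`.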

Fig. 1

Single-channel EEG recordings illustrating typical brain states. The typical brain states of epilepsy patients are the pre-ictal, ictal and inter-ictal states. The one-hour segment before each seizure was defined as the pre-ictal state. Neurophysiology experts annotated the ictal state. EEG data that were neither pre-ictal nor ictal were defined as inter-ictal. The figure depicts the complete course of a seizure in the EEG signal

System design

The overall study design consists of the blocks shown in Fig. 2. First, because there are multiple electrodes, the multi-channel EEG time series were constructed as 3D images according to the positions of the electrodes on the scalp. The 3D convolutional kernels were then tuned to suit the 3D image input. Next, the deep CNN automatically learned the patterns of the different stages from the EEG signal, and the trained model was evaluated on held-out data. Training and inference for all 13 patients were run on a high-performance computer.

Fig. 2

Overview of the pipeline used for seizure detection using 3D CNN

Preprocessing

Time window selection

Sliding-window analysis usually splits raw EEG data into segments for feature extraction, using either overlapping or non-overlapping windows [30]. Because EEG signals are non-stationary, the time window should be chosen so that the data within it are approximately stationary. Overlapping windows preserve the continuity of the data but easily introduce redundant information. Based on pre-experiments, the sliding window was 2500 points (5 s) for ictal data and 10 s for non-ictal data, with no overlap.
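Non-overlapping segmentation as described above can be sketched in a few lines of numpy. The function name is hypothetical, and dropping trailing samples that do not fill a whole window is an assumption, since the text does not say how the remainder was handled.

```python
import numpy as np

FS = 500  # sampling frequency in Hz


def non_overlapping_windows(signal, win_s, fs=FS):
    """Split a (n_channels, n_samples) array into non-overlapping windows.

    Returns an array of shape (n_windows, n_channels, win_len). Trailing
    samples that do not fill a complete window are dropped (assumption).
    """
    win_len = int(round(win_s * fs))
    n_ch, n = signal.shape
    n_win = n // win_len
    trimmed = signal[:, : n_win * win_len]
    # Row-major reshape keeps each channel's consecutive samples together.
    return trimmed.reshape(n_ch, n_win, win_len).transpose(1, 0, 2)
```

With the settings in the text, ictal data would use `win_s=5` (2500 points) and non-ictal data `win_s=10` (5000 points).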

3D image reconstruction

Because a 3D CNN is built in this work, the multi-channel EEG signal must be converted into a 3D array (similar to a multi-channel image). The conversion should preserve as much information from the original data as possible. The procedure has two major steps. First, each time series is drawn as a 2D image. To suit the CNN kernels, the image is square, with a side length equal to the number of samples in the window (e.g., 5000 × 5000); it is then compressed to 256 × 256 to reduce computational complexity. Second, the order of the electrodes is determined by their spatial adjacency [31], and the corresponding 2D EEG images are stacked to form a 3D multi-channel image with the structure [256, 256, 22], as presented in Fig. 3.
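A minimal numpy sketch of the two reconstruction steps follows, under stated assumptions. The text does not specify how a waveform is drawn or how the image is compressed, so this sketch swaps in a simple rasterization: time is binned directly into 256 columns and amplitude into 256 rows, which approximates rendering at full resolution and then downscaling. `channel_order` is a hypothetical stand-in for the adjacency-based electrode order of [31].

```python
import numpy as np


def series_to_image(x, size=256):
    """Rasterize a 1D signal into a size x size binary image.

    Columns index time (x-axis), rows index amplitude (y-axis, row 0 is
    the maximum). Assumption: binning directly to the target resolution
    instead of rendering at N x N and then compressing.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    cols = (np.arange(n) * size) // n          # time bin of each sample
    lo, hi = x.min(), x.max()
    span = hi - lo if hi > lo else 1.0         # avoid division by zero
    rows = np.round((hi - x) / span * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.float32)
    img[rows, cols] = 1.0
    return img


def eeg_window_to_volume(window, channel_order, size=256):
    """Stack per-channel images into a [size, size, n_channels] volume.

    window: array of shape (n_channels, n_samples).
    channel_order: electrode indices in adjacency order (hypothetical).
    """
    return np.stack([series_to_image(window[c], size) for c in channel_order],
                    axis=-1)
```

For a 10 s window of 22 channels, `eeg_window_to_volume(window, order)` yields an array of shape (256, 256, 22), matching the structure stated above.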

Fig. 3

2D and 3D image reconstruction for multi-channel EEG. a 2D image reconstruction on a multi-channel time series results in a 2D image (multiple frames as multiple channels). b 3D image reconstruction on a multi-channel time series results in a 3D image volume, preserving the temporal information of the input signal. The x-axis is the time window, the y-axis is the signal value, and the z-axis is the channel number

The proposed 3D CNN structure

3D convolution was proposed for action recognition in videos, and its most widely used form is the C3D model [32]. Because CNNs with 3D kernels had not been used for epilepsy classification, the literature offers no reference architecture. We therefore constructed a new CNN with 3D kernels for this experiment, as described in Table 2; it differs from the C3D model and is suited to seizure detection.

Table 2 The parameters of the 3D CNN

Feature extraction

A convolutional neural network is a type of neural network with spatial invariance properties. In addition, 3D convolution layers capture spatio-temporal information and preserve the temporal information of the input after every convolution operation. We found empirically that 3 × 3 × 3 convolution kernels in all layers worked best among the limited set of architectures explored. The architecture is shown in Fig. 4. The 3D convolution kernels are 3 × 3 × 3 with a stride of 1 × 1 × 1, followed by a Leaky Rectified Linear Unit (Leaky ReLU) activation with a coefficient of 0.01. The pooling layers use max pooling with a 2 × 2 × 2 window; the stride is 2 × 2 × 2 in the first pooling layer and 1 × 2 × 2 in the remaining ones, which reduces the attenuation of features. The third convolution layer is connected directly to the fourth to retain as much channel information as possible, and fully connected layers of 4096 and 2048 units follow. Finally, a softmax classifier performs the epileptic classification. Our model was implemented in Python 2.7 with TensorFlow 1.6.0.
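To make the effect of the kernel and stride choices concrete, the sketch below traces the (depth, height, width) shape of a [256, 256, 22] input through the layer stack. It rests on several assumptions not stated in the text: the 22-electrode axis is treated as the depth dimension, convolutions use "same" padding, pooling uses "valid" padding, and the pooling layers sit after conv1, conv2 and conv4. Filter counts (Table 2) are omitted, so only spatial shapes are traced.

```python
def out_len(n, k, s, padding):
    """Output length along one axis for a conv or pooling window."""
    if padding == "same":
        return -(-n // s)          # ceil(n / s)
    return (n - k) // s + 1        # "valid"


def trace_shapes(shape=(22, 256, 256)):
    """Trace (depth, height, width) through the assumed 3D CNN layer order."""
    layers = [
        ("conv1", (3, 3, 3), (1, 1, 1), "same"),
        ("pool1", (2, 2, 2), (2, 2, 2), "valid"),
        ("conv2", (3, 3, 3), (1, 1, 1), "same"),
        ("pool2", (2, 2, 2), (1, 2, 2), "valid"),
        ("conv3", (3, 3, 3), (1, 1, 1), "same"),
        ("conv4", (3, 3, 3), (1, 1, 1), "same"),  # conv3 feeds conv4 directly
        ("pool3", (2, 2, 2), (1, 2, 2), "valid"),
    ]
    trace = [("input", shape)]
    for name, k, s, pad in layers:
        shape = tuple(out_len(n, kk, ss, pad) for n, kk, ss in zip(shape, k, s))
        trace.append((name, shape))
    return trace


for name, shape in trace_shapes():
    print(f"{name:6s} {shape}")
```

Under these assumptions, the 1 × 2 × 2 stride in the later pooling layers shrinks the electrode axis only from 11 to 10 to 9, while the image axes are halved each time; this is one way to read "reducing the attenuation of the feature" along the channel dimension.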

Fig. 4

The architecture of the 3D CNN. The network has four convolution layers, three max-pooling layers, and two fully connected layers, followed by a softmax output layer. All Conv3D kernels are 3*3*3 with stride 1 in all three dimensions; all pooling kernels are 2*2*2. The first fully connected layer has 4096 output units and the second has 2048 output units

Overfitting reduction stage

Because the available dataset is limited, it is important to prevent the CNN from overfitting and to improve the performance of the model. First, a balanced dataset with equal numbers of samples for the three stages was used. Second, dropout was applied in both fully connected layers; dropout randomly disables the weights of some hidden-layer nodes during training. Third, because the batch size is small, group normalization, proposed by Wu and He [33], replaced batch normalization [34] in the 3D CNN. Group normalization divides the channels into groups and computes the mean and variance within each group, independently of the batch dimension. It improves the generalization ability of the network and accelerates model convergence. The comparison results are shown in Table 3.
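Group normalization as described above can be sketched in numpy for a channels-last 5D tensor of shape (N, D, H, W, C). This is a generic illustration of the technique, not the authors' TensorFlow code; the group count and optional scale/shift parameters are placeholders.

```python
import numpy as np


def group_norm(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Group normalization for a channels-last tensor (N, D, H, W, C)."""
    n, d, h, w, c = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, d, h, w, num_groups, c // num_groups)
    # Statistics over the spatial axes and the channels inside each group,
    # computed separately for every sample, so the batch size does not matter.
    mean = g.mean(axis=(1, 2, 3, 5), keepdims=True)
    var = g.var(axis=(1, 2, 3, 5), keepdims=True)
    y = ((g - mean) / np.sqrt(var + eps)).reshape(n, d, h, w, c)
    if gamma is not None:
        y = y * gamma   # learnable per-channel scale
    if beta is not None:
        y = y + beta    # learnable per-channel shift
    return y
```

Because no statistic is shared across the batch, normalizing one sample alone gives the same result as normalizing it within a batch; batch normalization does not have this property, which is why it degrades at small batch sizes.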

Table 3 Comparison between batch normalization and group normalization

Classification stage

In this stage, each CNN branch learns features from different stages. The input to each branch is the data produced in the 3D image reconstruction step. After feature extraction and overfitting reduction, the features obtained by each branch are merged. The model outputs the predicted category labels.
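The final softmax step, together with the cross-entropy cost used in training, can be illustrated in numpy. The logits below are made-up numbers and the stage order in `STAGES` is an assumed label encoding; the merging of branches is not modeled.

```python
import numpy as np

STAGES = ("inter-ictal", "pre-ictal", "ictal")  # assumed label order


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean cross-entropy between predicted probabilities and integer labels."""
    n = labels.shape[0]
    return float(-np.log(probs[np.arange(n), labels] + 1e-12).mean())


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
probs = softmax(logits)
pred = probs.argmax(axis=-1)
print([STAGES[i] for i in pred])  # ['inter-ictal', 'ictal']
```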

Training and inference phase

The dataset of 36,000 images was split into a training set (30,000 images), a validation set (3000 images) and a test set (3000 images). The training set was used to fit the parameters of the model, the validation set to validate the model, and the test set to evaluate the trained model.

The classification procedure includes a training phase and an inference (test) phase. In the training phase, we trained our model using 10-fold cross-validation: the data are randomly shuffled and divided into 10 equal parts, and in each fold one part serves as the validation set while the remaining nine form the training set, so that every part is used for validation once. The aim of this strategy is to prevent overfitting of the CNN model during training. In the inference phase, the independent test data were used to evaluate the performance of the model.
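The shuffle-and-split procedure above can be sketched as a generator of index pairs. The function name and fixed seed are illustrative choices; `np.array_split` lets the folds differ by one element when the sample count is not divisible by 10.

```python
import numpy as np


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    The indices are shuffled once, split into k near-equal parts, and each
    part serves as the validation set exactly once.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```

Applied to the 33,000 training and validation images, each fold would hold out 3300 images for validation.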

Based on pre-experiments, we propose instructive settings to help the CNN perform well in the seizure detection task. The batch size is set to 10, each epoch consists of 6000 iterations, and a total of 200 epochs are trained. The cross-entropy loss is used as the cost function, with the Adaptive Moment Estimation (Adam) optimizer (initial learning rate = 0.01, β1 = 0.9, β2 = 0.999, decay = 0). The learning rate schedule is as follows: if the error on the validation set remains unchanged for 10 consecutive epochs, the current learning rate is divided by 10; otherwise, the learning rate is divided by 10 every 40 epochs. These operations are repeated until all epochs have been trained.
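The two-rule learning rate schedule can be simulated as follows. Two details are assumptions because the text leaves them open: "unchanged" is read as "no improvement larger than a small tolerance", and either kind of reduction restarts both the plateau counter and the 40-epoch counter.

```python
def lr_per_epoch(val_errors, lr0=0.01, patience=10, step=40,
                 factor=10.0, tol=1e-6):
    """Return the learning rate used in each epoch.

    val_errors[i] is the validation error measured after epoch i.
    Assumptions: 'unchanged' means no improvement larger than tol, and
    any reduction restarts both counters.
    """
    lr, best = lr0, float("inf")
    stall, since_drop = 0, 0
    rates = []
    for err in val_errors:
        rates.append(lr)              # rate used during this epoch
        since_drop += 1
        if err < best - tol:
            best, stall = err, 0      # validation error improved
        else:
            stall += 1                # validation error unchanged
        if stall >= patience or since_drop >= step:
            lr /= factor              # divide the learning rate by 10
            stall, since_drop = 0, 0
    return rates
```

When the validation error improves every epoch, only the 40-epoch rule fires; when it is flat, the plateau rule fires after 10 stalled epochs.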

2D CNN structure for comparison

2D CNNs have developed rapidly with advances in computer vision; representative architectures include LeNet, AlexNet, Inception, ResNet, DenseNet, Xception, MobileNet, ShuffleNet and capsule networks [35]. We constructed the 12-layer 2D CNN shown in Table 4 and Fig. 5. In the feature extraction stage, the main difference is that 2D convolution layers were used to extract EEG image information, and every convolution layer used batch normalization to reduce changes in the distribution of internal activations. In the overfitting reduction stage, the fully connected layers applied dropout with a rate of 0.5. In the training phase, the settings, including the learning rate, number of epochs and cost function, were the same as for the 3D CNN.

Table 4 The details of 2D CNN structure
Fig. 5

The architecture of the 2D CNN. The network has five convolution layers, five max-pooling layers and two fully connected layers with a dropout rate of 0.5, followed by a softmax output layer. Conv2D kernels are 3*3 or 5*5 with stride 1; pooling kernels are 2*2 or 3*3 with stride 2. The first fully connected layer has 2048 output units and the second has 1024 output units

System evaluation

To evaluate the seizure detection performance, we used the metrics in Table 5 [36].

Table 5 Confusion matrix of predicted and actual results

Standard measures, including sensitivity, specificity, and accuracy, were adopted to evaluate the model. Based on the confusion matrix, the evaluation indexes are defined as:

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}} $$
(1)
$$ \mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}} $$
(2)
$$ \mathrm{Sensitivity}\ (\text{recall or true positive rate})=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} $$
(3)
$$ \text{False negative rate (FNR)}=\frac{\mathrm{FN}}{\mathrm{TP}+\mathrm{FN}}=1-\mathrm{Sensitivity} $$
(4)
$$ \text{False positive rate (FPR)}=\frac{\mathrm{FP}}{\mathrm{FP}+\mathrm{TN}}=1-\mathrm{Specificity} $$
(5)
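Equations (1) to (5) translate directly into code. The sketch below also derives TP, TN, FP and FN for one stage from a three-class confusion matrix by treating that stage as positive and the other two as negative; this one-vs-rest reading is an assumption, since the text does not state how per-stage counts were obtained. The counts in the usage line are made-up numbers.

```python
import numpy as np


def binary_metrics(tp, tn, fp, fn):
    """Evaluation indexes from Eqs. (1)-(5)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sens,
        "specificity": spec,
        "fnr": fn / (tp + fn),   # equals 1 - sensitivity
        "fpr": fp / (fp + tn),   # equals 1 - specificity
    }


def one_vs_rest_counts(cm, cls):
    """TP, TN, FP, FN for class `cls` from a square confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm)
    tp = cm[cls, cls]
    fn = cm[cls].sum() - tp      # true cls, predicted as another class
    fp = cm[:, cls].sum() - tp   # other classes predicted as cls
    tn = cm.sum() - tp - fn - fp
    return tp, tn, fp, fn


print(binary_metrics(tp=90, tn=85, fp=15, fn=10))
```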

Results

In this paper, the 2D CNN model was tested on single-electrode and multi-electrode EEG data, and the 3D CNN model was tested to demonstrate the effectiveness of 3D kernels compared with other methods. The results are shown in Tables 6, 7 and 8.

Table 6 Classification results based on the 2D CNN model using single and multiple channels
Table 7 Classification results based on the 2D and 3D CNN models using multiple electrodes
Table 8 Performance comparison of different methods

According to Table 6, the network tested on single-electrode data achieved an accuracy of 89.95%, an FNR of 15.07% and an FPR of 7.53%, while on multi-channel data it achieved an accuracy of 89.91%, an FNR of 15.13% and an FPR of 7.57%. This demonstrated that more EEG channels carried more information and could increase specificity and sensitivity in medical analysis.

In Table 7, the 3D CNN on multi-channel data achieved an accuracy of 92.37%, an FNR of 11.43% and an FPR of 6.22%, while the 2D CNN achieved an accuracy of 89.91%, an FNR of 15.13% and an FPR of 7.57%. The overall recognition rate of the 3D CNN model was higher than that of the 2D CNN; the recognition rate was highest for ictal segments, followed by pre-ictal EEG data.

Table 8 compares the 3D CNN-based algorithm with traditional machine learning algorithms and with the 2D CNN; all methods were trained and tested on the data used in this study. According to the results, the proposed method not only achieved the best performance but also reduced the time spent on hand-engineering features.

Discussion

People with uncontrolled epilepsy live with uncertainty about when a seizure will occur, and seizure diagnosis is lacking in remote areas because of limited medical services [37]. To examine patients with epilepsy efficiently, we aim to develop an automatic seizure detection system to assist doctors.

Deep learning opens new possibilities for intelligent diagnosis in healthcare, especially in EEG signal processing. An LSTM network was able to predict all 185 seizures in public datasets, with high seizure prediction sensitivity across different pre-ictal time windows [38]. A deep learning approach combining time-frequency analysis with a CNN achieved sensitivities of 81.4, 81.2, and 75% on public datasets [39]. The hidden layers of a deep network make the representation of the data as specific as possible, yielding a more efficient representation of EEG signals.

However, most deep learning studies adopt 2D networks, which ignore the multi-channel nature of the signal [40]; Table 6 shows that using more EEG channels could improve the performance of the network. We proposed a 3D image reconstruction approach that relates multi-channel information, as in video processing [41]. In addition, group normalization and oversampling techniques were applied to reduce overfitting on the limited dataset [42]. Our strategy achieved a mean accuracy of more than 90%, higher than that of the 2D CNN (Table 7), which indicates that a reliable, automatic seizure detection system is feasible. To our knowledge, this is the first study to introduce CNNs with 3D kernels for seizure detection.

To evaluate our approach, we compared three methods applied to the same data, as summarized in Table 8. The first method [43] extracted predefined features from the EEG data and used conventional machine learning techniques to classify epileptic stages. This is time-consuming, and some information may be fully or partly lost in the selected features. The other two, the 2D CNN and the 3D CNN introduced above, learn data patterns automatically. On average, the proposed 3D CNN performs better than the 2D CNN by exploiting multi-channel information, and it outperforms the hand-engineered method in both time and accuracy. In a recent Kaggle seizure detection competition, the top three algorithms [44] included both hand-engineered and deep learning methods, but they relied on complex selected features. By contrast, the method presented here could be run on an online platform and tested on more data, and it may meet the power, resource, and computation constraints of wearable devices.

This work has several limitations. First, like all deep learning techniques, this method requires sufficient data to train the model, and it is hard to guarantee that the network design is optimal; other studies may obtain better performance simply by tuning minor parameters. Second, the data were labeled by a few clinical experts at a single center. Third, the experiment involves only EEG data and neglects other data types from a multi-scale perspective. For a more generalizable clinical validation, the method should be tested on a large, multi-center dataset. Further relevant information sources, such as video, weather patterns, biomarkers, or clinical notes, can be readily incorporated into deep neural networks [45, 46]. A detection algorithm that incorporates these additional inputs and data types is the focus of ongoing work.

Conclusion

This study proposed a new approach for epileptic EEG classification that builds a 3D CNN for multi-channel EEG data. The main advantage of the method is that it fully exploits multi-channel signal information without hand-engineered features. The 3D CNN model outperformed previous heuristic detectors. To the best of our knowledge, this study is the first attempt to use a 3D CNN algorithm for seizure detection. It may therefore serve as a benchmark for new work exploring deep-learning-enabled seizure detection with multi-channel EEG data. Further studies are needed to validate this algorithm on multi-center datasets. We expect advances in signal processing, network design, and model validation to shape the future of automatic seizure detection.