1 Introduction

Motor imagery (MI) brain–computer interfaces (BCIs) are an important type of BCI: they use imagined movements to control devices and to perform rehabilitation training for patients with movement disorders, and are widely used in rehabilitation and human–machine interfaces [1]. The source distribution of MI electroencephalogram (EEG) signals on the cerebral cortex, together with their features, corresponds to the moving body part and the motion type, which makes it possible to determine the motion content by EEG signal analysis. Previous MI-BCI research has mainly focused on upper limb MI; crucial problems such as channel selection, feature extraction, and classification for lower limb MI EEG still need more in-depth work.

In BCI applications, 64 or more channels are usually used to collect EEG data, while only a few of them are selected for follow-up data analysis, so as to reduce the data amount and improve classification accuracy by reducing over-fitting [2]. To obtain optimal channels, many channel selection methods have been studied. Generally, the channels in the motor area, such as C3, C4, and Cz, are selected for upper limb MI EEG signal analysis [3]. Varsehi et al. [4] proposed a Granger causality channel selection (GCCS) method, in which the channels were sorted by the mean GC value of each channel with respect to the other channels, and the channels with higher GC values were selected. With eight selected channels, the method achieved 93.03% accuracy, an increase of 3.95% over the correlation-based channel selection method. Idowu et al. [5] selected optimal channels by a modified particle swarm optimization (MPSO), to which a disturbance term was added to avoid local optima and premature convergence. The method outperformed the previous version of PSO by reducing the error rate by 10.4%. Fauzi et al. [6] used the L2-norm to calculate the energy of each channel and selected the channels with high energies as the active channels for EEG classification. For the hand MI EEG dataset, C3-Cp3-C4-Cp4-Cz, C3-Cp3-C4-Cz-Cpz, and Cp3-Cp4-Cpz were selected. Compared with the method without energy extraction, the proposed method improved the accuracy by up to 50%. Feng et al. [7] argued that the brain areas associated with MI are not exactly the same in different frequency bands and thus proposed a common spatial pattern-rank channel selection method for multifrequency band EEG (CSP-R-MF), with different sub-frequency-bands having different selected channels. The average classification accuracy was improved by about 7% when using the proposed method compared to CSP-R. Qi et al.
[8] proposed a spatiotemporal-filtering-based channel selection (STECS) method, in which the spatiotemporal filter optimization was transformed into a Rayleigh quotient maximization problem, and channel selection was achieved by adding a sparsity-promoting regularization term. Compared with other channel selection methods, the mean classification performance of STECS on three data sets was improved by up to 10.42%, 6.13%, and 3.72%, respectively. Zhang et al. [9] inserted an automatic channel selection (ACS) layer into a convolutional neural network for MI classification. By introducing sparse regularization, the output of the ACS layer was constrained to be sparse, and the channels corresponding to the nonzero coefficients were retained for MI classification. With this method, an average accuracy of 87.2% was obtained, an improvement of up to 23.7% over other channel selection approaches. Nevertheless, channel selection research has focused mainly on the upper limbs. Moreover, channel selection is generally achieved through approximation methods, and it remains unclear how large the gap is between the selected channels and the truly optimal ones [10]. In addition, compared with the needs of real-time BCI applications, the number of channels retained by most channel selection methods is still too large and needs to be further reduced. More work is needed to find the optimal channels suitable for MI-BCI, especially for the less-studied lower limb applications.

Feature extraction is another critical problem for ensuring EEG classification accuracy. Commonly used feature extraction methods mainly include time-domain, spectral, time–frequency-domain, and spatial-domain methods. In the time domain, Balam et al. [11] proposed a single-channel EEG-based drowsiness detection (DD) model. The EEG signal was decomposed into a series of sub-bands by the wavelet packet transform (WPT). The sub-bands in the EEG rhythm frequency ranges were extracted and then passed through the inverse WPT to recover their time-domain signals as feature signals. The proposed model achieved 94.45% and 85.3% accuracy on two EEG datasets, respectively, outperforming other comparison models. Wang et al. [12] adopted a one dimensional-aggregation approximation (1d-AX) method, using time-domain piecewise linear regression to represent the EEG signal and passing the data to a long short-term memory (LSTM) network for classification. On the adopted EEG data of all subjects, the classification accuracies of AX-LSTM were superior to other methods except in the experiment of the left hand/tongue group. In the spectral domain, Samuel et al. [13] compared 12 spectral domain descriptors (SDDs) with 20 time-domain descriptors (TDDs) and revealed that the SDDs worked better than the TDDs, with the best TDD achieving an accuracy of 67.05% as against 87.03% for the best SDD. By applying a linear feature combination, an optimal set of combined TDDs recorded an average accuracy of 90.68%, while that of the SDDs achieved 99.55%. Virgilio et al. [14] applied power spectral density (PSD) and the discrete wavelet transform (DWT) for EEG feature extraction. Using the proposed spiking neural network (SNN) model, a classification accuracy of 74.54% was obtained, 0.54% higher than the traditional neural network model. In the time–frequency domain, Gao et al.
[15] utilized wavelet time–frequency analysis to calculate the energy sequence of each channel. Then, by using the channels of the scalp EEG as nodes and determining the edges between nodes according to the energy difference between channels, a human brain network model was constructed. The results of the brain network analysis indicated that the contralateral sensorimotor areas were more closely related to the MI activities. Ortiz et al. [16] decomposed the brain rhythms in the time–frequency domain using empirical mode decomposition (EMD). The averaged power variations calculated by EMD and by a second-order Butterworth filter were similar, indicating that EMD can be a valid tool for the analysis of EEG signals. In the spatial domain, the common spatial pattern (CSP) method is the most commonly adopted. Mishuhina et al. [17] decomposed the EEG signal into time stages and frequency components and extracted CSP features from every decomposed time–frequency cell. On three public EEG datasets, the method improved the average classification accuracy by up to 8.6% compared with other CSP variants. Tang et al. [18] extracted the optimal frequency band of each electrode and then decomposed the signals of that band into spatial patterns to describe the differences between two classes of MI. The method achieved an average classification accuracy of 91.25% on a public data set, 3.75% and 6.25% higher than the original CSP and autoregressive methods, respectively. Despite the studies above, some researchers have argued that features extracted from a single domain provide only limited information, while features from different domains may contain more useful information for EEG classification [19]. In [19], multi-domain features, including Hjorth parameters, the power spectrum estimated via maximum entropy, and time–frequency energy, were extracted and then fused into low-dimensional informative features by sparse representation.
The method achieved an average accuracy of over 79%, outperforming the single-domain feature extraction methods. Khateeb et al. [20] extracted multi-domain features by combining wavelet entropy and Hjorth parameters from an EEG dataset and improved the emotion classification accuracy to 65.92%, better than the single-domain methods using entropy and Hjorth features, which achieved 63.62% and 64.74%, respectively. Adopting feature fusion to obtain higher classification accuracy is an emerging trend [21].

In the classification stage, various classification approaches have been studied, such as decision trees [22], linear discriminant analysis [23], support vector machines (SVM) [23], and deep-learning-based methods [24]. Among them, SVM is a commonly used EEG classification method which has good generalization ability and can obtain good classification results even when the number of samples is small [25]. SVM can also be combined with other methods, such as deep learning [26], the Bat optimization algorithm [27], or particle swarm optimization (PSO) [28], to improve its classification accuracy.

Therefore, there is a strong need to select a more appropriate channel set and to explore more effective multi-domain features and classification methods for BCI applications of lower limb MI, whose EEG signals are more difficult to analyze and classify. In this paper, we design a lower limb MI EEG experiment and study channel selection, multi-domain feature extraction, and classification of lower limb MI EEG signals, aiming to determine the optimal channel set (OCS) and multi-domain EEG features that reduce the EEG signal processing workload and improve classification accuracy. The main contributions of this paper are as follows:

  (1)

    A novel iterative EEG source localization (ISL) method is proposed to determine the OCS suitable for lower limb MI-BCI. The OCS, consisting of FC1, FC2, C1, C2 and Cz, has a better EEG classification accuracy than the commonly used traditional channel set (TCS) C3, C4 and Cz.

  (2)

    A multi-domain feature (MDF) extraction method of MI EEG signals is proposed. The combination of OCS and MDF can not only reduce the amount of data processing, but also retain more useful feature information to improve the accuracy of EEG classification.

  (3)

    A PSO-SVM is utilized, based on the OCS and the MDFs, to improve the EEG classification accuracy of lower limb MI.

The rest of the paper is organized as follows. In Sect. 2, the lower limb MI experiment is set-up. In Sect. 3, the methods of EEG data preprocessing are briefly described. In Sect. 4, the proposed ISL method is discussed in detail. In Sect. 5, the method of feature extraction and fusion is presented, and the PSO-SVM classification method is provided. The experimental results are reported and analyzed in Sect. 6. Finally, the paper concludes in Sect. 7.

2 Experiment set-up and the data collection

2.1 Experiment platform

The lower limb MI experiment system uses the EEG signal acquisition equipment produced by Brain Products GmbH, as shown in Fig. 1. Computer 1 is used to display the experimental paradigm including the playing of inducing video for MI, and computer 2 is used to collect and display EEG signals. Other hardware includes a Brain Amp amplifier, a signal processor, a 64-channel Quik-cap that complies with the International 10–20 System of Electrode Placement, and experimental consumables such as conductive paste, syringes, medical tape, and cotton swabs. The software of the system includes Scan, E-Prime, and the inducing video player developed by us.

Fig. 1
figure 1

Lower limb MI experiment system

Before the experiment, conductive paste is injected into each electrode of the cap to reduce the impedance of each electrode to less than 5 kΩ, ensuring that all electrodes are in good contact; this can be verified with the help of Scan.

Subjects perform MI tasks according to the content displayed on the screen of computer 1. The raw EEG signals are measured by the electrode cap, then amplified and filtered by the amplifier and the signal processor, respectively, and recorded by computer 2. The EEG records are further used for MI EEG classification.

2.2 Subjects and experiment environment

The subjects are 20 college student volunteers, aged 22–26, all physically and mentally healthy. Before the experiment, each subject performs a week of leg-lifting imagery training to improve motor imagery ability. Each subject should also get enough sleep the night before the experiment and keep the scalp clean so that the conductive paste works well.

To reduce noise in the EEG signals, the experiment is carried out in an isolated small room to ensure a quiet environment. The subject sits in a comfortable chair, with his eyes at the same height as the screen and about 90 cm away from it. During the experiment, the subject should try not to blink or roll his eyes, and should avoid unnecessary actions such as swallowing or making noise.

2.3 Experiment procedure and data acquisition

The experiment records two types of visually induced MI tasks: the rest task and the motor task. A picture of a human body sitting still is used to induce the rest task. When the subject sees the picture, he keeps his body relaxed, without any mental activity in his mind. A video of leg raising is played to induce the motor task. When the subject sees the video, he imagines once in his mind the scene of himself raising his leg.

In order to ensure that sufficient EEG data are generated, and at the same time not to make the subjects too fatigued due to the long experiment time and affect the quality of the data, 110 trials are arranged for each subject. The 110 trials are divided into five experiment groups. Each group has 22 trials, including 11 rest tasks and 11 motor tasks. The total number of trials for all subjects is 2200.

The procedure of a trial is shown in Fig. 2. Each trial lasts 10 s. When the subject is ready, the keyboard “q” key is pressed to start the trial. During the 1st second, a red “ + ” appears in the middle of the screen to remind the subject to concentrate. From the 1st to the 3rd second, the screen is blank. From the 3rd to the 7th second, the motor task video or the rest task picture is randomly played on the screen, and the subject performs the corresponding task. From the 7th to the 10th second is the rest time, during which the subject relaxes to reduce the effect of fatigue. There is a 5-min rest between two experiment groups, during which the subject should not make any big movements, especially not shake his head.

Fig. 2
figure 2

Procedure of a trial

As mentioned above, the EEG data set of each subject includes five groups, and each group includes 22 trials. The data of each trial include the prompt segment, the blank segment, the data segment (rest task or motor task), and the rest segment. All 64 channels of EEG data, including one channel of electrooculogram (EOG) data, are collected. The sampling frequency is 128 Hz. The data acquisition process is as follows:

EEG data are collected by experiment group. At the beginning of each group, the experiment assistant presses the start button on computer 2, computer 2 starts the data collection accordingly, and the subject begins to mentally prepare. After the subject is ready, the experiment assistant presses the “q” key of computer 1, and computer 1 displays the prompt sign “ + ” on the screen and at the same time sends a synchronization message to computer 2. After receiving the synchronization message, computer 2 makes a synchronization mark on the received EEG signal to indicate the beginning of the trial data. After that, computer 1 displays the content in the order shown in Fig. 2 until the trial is over. Within one group, computer 2 continuously records EEG signals, using the synchronization marks to identify the starting point of each trial.

3 EEG data preprocessing

The EEG signals contain a lot of noise caused by many factors such as power-line interference, electromagnetic interference, head movement, eye movement, and tongue movement. In addition, the data quality of the EEG signals can be degraded if the subject is fatigued or lacks concentration during the experiment. The objective of data preprocessing is to reduce the noise and improve the signal-to-noise ratio of the EEG signals. The preprocessing steps include re-referencing, filtering, epoch extraction, removal of bad trials and channels, blind source separation, and removal of artifacts. They are accomplished with the help of EEGLAB [29].

Re-referencing is accomplished by using the average of the EEG signals of all channels as the reference.

Since MI EEG signals lie mainly in the α and β rhythms, the frequency range of 1–35 Hz is studied in this paper. A 1 Hz high-pass filter, a 35 Hz low-pass filter, and a 50 Hz notch filter (to remove power-line interference) are applied.
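As an illustration, the re-referencing and filtering steps can be sketched in Python with SciPy; the Butterworth order and the notch quality factor below are our own illustrative choices, not settings reported in the paper:

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

def preprocess(eeg, fs=128.0):
    """Average-reference, 1-35 Hz band-pass, and 50 Hz notch filter.

    eeg: array of shape (n_channels, n_samples).
    Filter order and notch Q are illustrative assumptions.
    """
    # Re-reference to the common average of all channels
    eeg = eeg - eeg.mean(axis=0, keepdims=True)
    # 1-35 Hz band-pass (4th-order Butterworth, zero-phase)
    b, a = butter(4, [1.0, 35.0], btype="bandpass", fs=fs)
    eeg = filtfilt(b, a, eeg, axis=1)
    # 50 Hz notch for power-line interference
    bn, an = iirnotch(50.0, Q=30.0, fs=fs)
    return filtfilt(bn, an, eeg, axis=1)
```

Because the same linear filters are applied to every channel, the average-reference property (channel mean equal to zero) is preserved after filtering.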

Epoch extraction is used to select the valid segment of the EEG signal of each trial. As shown in Fig. 2, the segment from the 3rd to the 7th second is the MI segment, and the segment from the 2nd to the 3rd second is used for baseline correction. Hence, in this paper, the 5 s EEG signal segment from the 2nd to the 7th second is extracted as the valid segment.
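Assuming each 10 s trial is stored as a (channels × samples) array sampled at 128 Hz (an assumption about the data layout), the epoch extraction and baseline correction described above can be sketched as:

```python
import numpy as np

def extract_epoch(trial, fs=128):
    """Extract the 2-7 s valid segment and baseline-correct it.

    trial: (n_channels, n_samples) array covering one 10 s trial.
    Baseline = per-channel mean over the 2-3 s blank segment.
    """
    seg = trial[:, 2 * fs:7 * fs].astype(float)          # 5 s valid segment
    baseline = seg[:, :fs].mean(axis=1, keepdims=True)   # first 1 s (2-3 s)
    return seg - baseline
```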

Common artifacts include signals caused by eye blinks, eye movement, head movement, power-line interference, EMG, and so on. The independent component analysis (ICA) method is often used to remove these artifacts [30]. During the MI EEG experiment designed in this paper, subjects are required to sit still and avoid large-scale limb movements. Therefore, eye blinks and eye movement have the most significant impact on the EEG signals among all artifacts. They are removed using both the ICA method and the manual method provided by EEGLAB.

4 Channel selection based on the iterative source localization (ISL)

4.1 The iterative source localization

The process of finding the source area in the brain that generates the signals collected by channels on the surface is called source localization [31]. It is an inverse problem [32] and can be approximated to a linear problem. This paper uses the weighted minimum-norm estimation (WMNE) method for EEG source localization [33]. The method uses the head volume conduction model to specify the positions of the channels on the head surface. It adopts a distributed current density (DCD) model to solve the inverse source problem, which triangulates the continuous cerebral cortex and estimates the voltage amplitude of each triangle, also known as the source, from the surface EEG recordings. The higher the amplitude, the more active the source [34] and the closer the relationship between the source and the EEG signal.

The traditional source localization performs a single sourcing operation using all the surface channel data. Many factors, such as noise and data quality, often affect its accuracy. To improve the accuracy of source localization, this paper proposes an ISL method. ISL repeatedly performs source localization followed by source mapping, which selects channels according to the resultant source area, until the set of selected channels is stable. Figure 3 shows the process of the ISL.

Fig. 3
figure 3

Flowchart of the ISL

4.2 Selection of the initial channel set

On one hand, the brain functional areas related to motor activity, including the primary motor area, the premotor area, and the supplementary motor area, are located in the central area of the top of the head. On the other hand, the electrodes in the outermost circle of the Quik-cap are far away from the brain motor areas, and their signals are easily interfered with by eye and head movements. Therefore, to reduce the amount of data processing and the effect of low-quality channel data, we removed the data of the 24 channels in the outermost circle of the Quik-cap, the EOG channel IO, and the channels at the edge of the frontal and occipital regions, namely AF3, AF4, PO3, POz, and PO4. That is, the 34 channels in the middle area of the brain shown in Fig. 4 are selected as the initial channel set S.

Fig. 4
figure 4

34 channels selected as the initial channel set

The paper uses time–frequency analysis to verify the validity of the EEG signals of S. All the 64 channel signals and the middle 34 channel signals of all subjects are averaged, respectively, to create two data sets, a 64-channel data set and a 34-channel data set. In order to improve the accuracy of the signals, the data of 1 s before the MI task is used for baseline correction to reduce the influence of signal drift. Then, short-time Fourier transform (STFT) is performed on these two data sets, and two time–frequency diagrams are obtained, as shown in Fig. 5a, b, respectively. It can be seen that the event-related desynchronization (ERD) phenomenon of the middle 34-channel signals is more obvious than that of the whole 64-channel signals, which means that the 34-channel signals are less affected by noise and more closely related to the lower limb MI.

Fig. 5
figure 5

Time–frequency diagrams of 64 and 34 channels
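The STFT-based time–frequency analysis can be sketched as follows; the 1 s Hann window and 50% overlap are illustrative assumptions, not settings reported in the paper:

```python
import numpy as np
from scipy.signal import stft

def time_frequency_map(sig, fs=128):
    """Short-time Fourier transform of a channel-averaged EEG signal.

    sig: 1-D array. Returns frequencies, times, and the magnitude
    spectrogram. Window length (1 s) and overlap are assumptions.
    """
    f, t, Z = stft(sig, fs=fs, nperseg=fs, noverlap=fs // 2)
    return f, t, np.abs(Z)
```

Plotting the returned magnitude against time and frequency gives diagrams of the kind shown in Fig. 5, on which ERD appears as a power decrease in the α/β bands during the MI segment.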

4.3 Generation of source distribution images

After source localization, a source distribution image is obtained, on which the color of a pixel is used to indicate the current source density (mA/m2) of the pixel as a source, that is, the source intensity. The brighter the color, the more active the source is, as shown in Fig. 6. Here, in order to show the positions of the sources on the brain, the source distribution image is displayed together with the brain template.

Fig. 6
figure 6

Source distribution image

At different times during a trial, the distribution and the intensities of the sources of the same subject are slightly different. In this paper, the time-domain signal curves of all channels are drawn, and the point with the largest amplitude on the curves is taken as the source localization time, as shown in Fig. 7. At the source localization time, the EEG signal is the strongest and the source localization result is the most distinct. Different subjects have different source localization times; the subject shown in Fig. 7 has a source localization time of t = 0.302 s.

Fig. 7
figure 7

Source localization time

The source localization is performed for each subject at his source localization time. Figure 8 shows the 20 source distribution images of the first source localization. Here, the intensity threshold is set to 10%, so the 10% sources with the lowest intensity are ignored in each image. Because the objective of the ISL algorithm is to find the distribution of the sources with the highest intensity, removing the sources of the lowest intensity can reduce the amount of data without affecting the result of channel selection.

Fig. 8
figure 8

Grayscale source distribution images of the first source localization

4.4 Extraction of the source area

The source distribution images are converted to grayscale ones. Figure 8 shows the resultant grayscale source distribution images of the first source localization. These 20 grayscale images are superimposed and averaged and the resultant source image Ia of this source localization is obtained. Figure 9 shows Ia and its gray histogram. In order to reduce the dimension of the features that need to be processed and the workload of feature classification, generally only the top n active channels are selected for follow-up EEG analysis. Since the 64 channels are roughly evenly placed on the head surface, n channels occupy approximately s = n/64 of the area of Ia.

Fig. 9
figure 9

Source image and its gray histogram

Finding the top n active channels amounts to determining a gray threshold k that meets the following condition: on Ia, the ratio of the area of all pixels whose gray values are not less than threshold k to the area of Ia is s. The paper uses Eq. (1) to calculate k and thus determine the source distribution:

$$ k = {\text{max}}\left\{ {g_{i} :P\left( {g_{i} } \right) = \frac{{\mathop \sum \nolimits_{{m = g_{i} }}^{255} f\left( m \right)}}{{\mathop \sum \nolimits_{n = 0}^{255} f\left( n \right)}} \ge s;i,m,n \in \left[ {0,255} \right]} \right\} $$
(1)

where k is the gray threshold to be calculated, \({g}_{i}\) is a gray value in [0,255], \(f\left(m\right)\) is the distribution of gray value \(m\), and \(P\left({g}_{i}\right)\) is the proportion of pixels with a gray value not less than \({g}_{i}\). As shown in Fig. 9, \(P\left({g}_{i}\right)\) can be obtained by integrating the gray distribution over the range of gray values not less than \({g}_{i}\).

Considering that the data volume of 5 channels can achieve a good balance between the amount of feature information and the workload of data processing, in this paper n is set to 5. When n = 5, s = 7.81%. Then, \(k\)=173 is obtained by Eq. (1). Using \(k\)=173 as the threshold, Ia is inverted and binarized to obtain the source area image Is as shown in Fig. 10.

Fig. 10
figure 10

Source area image of the first source localization
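Equation (1) amounts to scanning the gray histogram from the bright end until the accumulated pixel share reaches s. A minimal sketch, assuming the source image Ia is available as an 8-bit grayscale array:

```python
import numpy as np

def gray_threshold(image, s):
    """Largest gray value k whose upper-tail pixel share is >= s (Eq. 1).

    image: 2-D uint8 grayscale array; s: target area ratio (e.g. 5/64).
    """
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    # tail[g] = P(g): proportion of pixels with gray value >= g
    tail = np.cumsum(hist[::-1])[::-1] / hist.sum()
    candidates = np.nonzero(tail >= s)[0]
    return int(candidates.max())
```

Thresholding Ia at the returned k and binarizing then yields the source area image Is.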

4.5 Source mapping

Source mapping finds the channels on the scalp above the source area. It is based on the intuitive fact that the scalp surface closest to a source area receives the most intensive EEG signals generated by that source, so the channels above the source area acquire the best EEG signals. This is also the reason why channels above the motor area, such as C3, C4, and Cz, are commonly selected for EEG signal analysis.

By superimposing Is and the electrode placement map, the channels above Is are obtained. As shown in Fig. 11, the source area covers 7 channels: Fz, FC1, FC2, C1, C2, Cz, and CP2. They are the result of the first iteration of the ISL, and they constitute the initial channel set for the second iteration.

Fig. 11
figure 11

Resultant channels of the first iteration of the ISL

4.6 Determination of the OCS

The source localization is repeated with Fz, FC1, FC2, C1, C2, Cz, and CP2 as the initial channel set. The result of the second iteration includes the channels FC1, FC2, C1, C2, and Cz, which are more concentrated in the middle area than those of the first iteration.

Figure 12 shows the result of the third iteration. The resultant channels are the same as those of the second iteration: FC1, FC2, C1, C2, and Cz. According to the termination condition of the ISL, the iteration is terminated, and the OCS selected for lower limb MI is: FC1, FC2, C1, C2, Cz.

Fig. 12
figure 12

Result of the channel selection

5 Feature extraction and classification

5.1 Feature extraction

We hope to use as little feature data as possible to express the imagery intentions as accurately as possible. In the paper, time-domain features, frequency-domain features, time–frequency features and spatial domain features are extracted and merged into multi-domain features to express the imagery intentions, which perform better than single-domain features [28].

  (1)

    Time-domain (TD) feature extraction

As mentioned above, the valid segment of each trial is 5 s, of which the first 1 s is the baseline segment and the subsequent 4 s is the MI time. With a sampling rate of 128 Hz, there are 640 sampling points in total for each trial.

The EEG signal is first 8–13 Hz band-pass filtered, and the frequency band power is calculated to obtain the TD features. The square of the voltage at each sampling point on the TD EEG signal curve is used to characterize the power. For the EEG signal curve of a trial of a subject, the maximum power \({P}_{\mathrm{max}}\), the minimum power \({P}_{\mathrm{min}}\), and the average power \({P}_{\mathrm{mean}}\) are extracted as TD features according to Eqs. (2), (3), (4).

$$ P_{{{\text{max}}\_ij}} = {\text{max}}\{ v_{ijk}^{2} ,k \in \left[ {1,n} \right]\} $$
(2)
$$ P_{{{\text{min}}\_ij}} = {\text{min}}\{ v_{ijk}^{2} , k \in \left[ {1,n} \right]\} $$
(3)
$$ P_{{{\text{mean}}\_ij}} = \frac{1}{n}\mathop \sum \limits_{k = 1}^{n} v_{ijk}^{2} $$
(4)

where n = 640 is the number of sampling points on the EEG curve of a trial. \(i\in \left[\mathrm{1,5}\right]\) is the channel number in the OCS. \(j\in \left[\mathrm{1,110}\right]\) is the trial number. \(k\in \left[1,n\right]\) is the sampling point number. \({v}_{ijk}\) is the voltage of sampling point \(k\) of trial \(j\) of channel \(i\), and thus \({P}_{\mathrm{max}\_ij}\), \({P}_{\mathrm{min}\_ij}\), and \({P}_{\mathrm{mean}\_ij}\) are the maximum power, the minimum power, and the average power of trial j of channel \(i\), respectively.

Three TD features are extracted for each channel in the OCS. Each subject has 15 TD features for each trial. Then a TD feature matrix B1(2200 × 15) can be generated for all subjects and all trials.
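A sketch of the TD feature extraction of Eqs. (2)–(4) for one trial; the band-pass filter order is our assumption:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def td_features(trial, fs=128.0):
    """Per-channel time-domain power features of Eqs. (2)-(4).

    trial: (n_channels, n_samples) valid segment.
    Returns (n_channels, 3): max, min, and mean of the squared
    8-13 Hz band-passed voltage. Filter order is an assumption.
    """
    b, a = butter(4, [8.0, 13.0], btype="bandpass", fs=fs)
    p = filtfilt(b, a, trial, axis=1) ** 2       # instantaneous power
    return np.stack([p.max(axis=1), p.min(axis=1), p.mean(axis=1)], axis=1)
```

Stacking these rows over all trials of the OCS channels yields the TD feature matrix B1.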

  (2)

    Frequency-domain (FD) feature extraction

For the FD EEG signal of each channel in 0–40 Hz, the power spectral density (PSD) is calculated using the Welch method, and then its mean value \({\mathrm{PSD}}_{\mathrm{mean}}\), standard deviation \({\mathrm{PSD}}_{\mathrm{std}}\), average power \({\mathrm{PSD}}_{\mathrm{avg}}\), kurtosis \({\mathrm{PSD}}_{\mathrm{kur}}\), and skewness \({\mathrm{PSD}}_{\mathrm{ske}}\) are extracted as FD features. For the 5 channels of the OCS, 25 FD features are extracted. An FD feature matrix B2(2200 × 25) is generated for all subjects and all trials.
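A sketch of the FD feature extraction; the Welch segment length is an illustrative choice, not a setting from the paper:

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

def fd_features(trial, fs=128.0):
    """Five Welch-PSD statistics per channel over the 0-40 Hz band.

    trial: (n_channels, n_samples). Returns (n_channels, 5):
    mean, std, average band power, kurtosis, and skewness of the
    PSD curve. Segment length (256) is an assumption.
    """
    f, psd = welch(trial, fs=fs, nperseg=256, axis=1)
    mask = f <= 40.0
    f, psd = f[mask], psd[:, mask]
    avg_power = psd.sum(axis=1) * (f[1] - f[0])   # band-power approximation
    return np.stack([psd.mean(axis=1), psd.std(axis=1), avg_power,
                     kurtosis(psd, axis=1), skew(psd, axis=1)], axis=1)
```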

  (3)

    Time–frequency-domain (TFD) feature extraction

The discrete wavelet transform (DWT) is used to decompose the signals of the OCS, with db4 as the wavelet basis function. According to the Nyquist sampling theorem, the valid frequency range is up to 64 Hz when the sampling rate is 128 Hz. After four-level decomposition, five frequency bands of 0–4 Hz, 4–8 Hz, 8–16 Hz, 16–32 Hz, and 32–64 Hz are obtained, as shown in Fig. 13. The detail coefficients cD3 on the third level represent the 8–16 Hz frequency band of the EEG signals, which contains the α band (8–13 Hz) where the ERD/event-related synchronization (ERS) phenomenon is most obvious. Hence, the average energy of the cD3 frequency band is selected as the TFD feature.

Fig. 13
figure 13

DWT decomposition

One TFD feature is extracted for each channel of the OCS and a TFD feature matrix B3(2200 × 5) is obtained for all subjects and all trials.
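The TFD feature can be sketched as follows. To keep the example self-contained without a wavelet library, an 8–16 Hz Butterworth band-pass is used here as a stand-in for the cD3 detail coefficients of the db4 DWT (an approximation of the same frequency band, not the paper's exact method):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def tfd_feature(trial, fs=128.0):
    """Average energy of the 8-16 Hz band, one value per channel.

    trial: (n_channels, n_samples). The paper uses the cD3
    coefficients of a 4-level db4 DWT; the band-pass below is a
    self-contained stand-in covering the same 8-16 Hz band.
    """
    b, a = butter(4, [8.0, 16.0], btype="bandpass", fs=fs)
    band = filtfilt(b, a, trial, axis=1)
    return (band ** 2).mean(axis=1)
```

With PyWavelets available, the same feature would instead be computed from `pywt.wavedec(x, 'db4', level=4)` by averaging the squared cD3 coefficients.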

  (4)

    Spatial domain (SD) feature extraction

CSP works well for two-class classification, and this paper extracts the SD features of the EEG signals based on the CSP algorithm. First, spatial filtering is performed on the rest task data and the motor task data; then the variance scaling method is used to enlarge the distance between the two types of data and maximize the variance between them. The variance of the voltage values of the 640 sampling points is output as the SD feature.

Each channel yields one SD feature, and an SD feature matrix B4(2200 × 5) is obtained for all subjects and all trials.
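A minimal sketch of CSP spatial filtering for the two-class case, solving the usual generalized eigenvalue problem on the class covariance matrices; the trace normalization and the use of a single filter here are illustrative choices, not the paper's exact pipeline:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(class_a, class_b):
    """Common spatial pattern filters for two-class MI data.

    class_a, class_b: lists of (n_channels, n_samples) trials.
    Returns W whose rows project the signal so that variance is
    maximal for one class and minimal for the other.
    """
    def mean_cov(trials):
        covs = []
        for x in trials:
            c = x @ x.T
            covs.append(c / np.trace(c))   # trace-normalized covariance
        return np.mean(covs, axis=0)

    ca, cb = mean_cov(class_a), mean_cov(class_b)
    # Generalized eigenproblem: ca w = lambda (ca + cb) w
    vals, vecs = eigh(ca, ca + cb)
    return vecs.T[::-1]   # rows ordered from largest to smallest eigenvalue

def csp_variance_feature(trial, W):
    """Log-variance of the first CSP component for one trial."""
    z = W[0] @ trial
    return np.log(z.var())
```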

  (5)

    Feature fusion and multi-domain (MD) feature generation

Assemble the above four feature matrices by Eq. (5) to form the MD feature matrix.

$$ B = \left[ {B_{1} ,B_{2} ,B_{3} ,B_{4} } \right] $$
(5)

According to the size of B1, B2, B3, and B4, it can be known that the size of the MD feature matrix B is 2200 × 50. B can be written as: B = [bi|i = 1,…,50], where bi is the ith column feature vector.

For any channel i, combine its corresponding feature column vectors by Eq. (6) to realize the feature fusion and obtain the fusion feature vector ti:

$$ t_{i} = \left( {\sum\nolimits_{j = 3(i - 1) + 1}^{3(i - 1) + 3} {b_{j} } + \sum\nolimits_{j = 5(i - 1) + 16}^{5(i - 1) + 20} {b_{j} } + b_{40 + i} + b_{45 + i} } \right)/10 $$
(6)

where \(i\in \left[\mathrm{1,5}\right]\). \({{\varvec{t}}}_{i}\) is the fusion feature vector corresponding to the ith channel in the OCS. \({{\varvec{b}}}_{j}\) is the jth column feature vector in B.
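Eq. (6) can be sketched in NumPy as follows, assuming the column layout implied by the summation limits: columns 1–15 hold the TD features (3 per channel), 16–40 the FD features (5 per channel), 41–45 the TFD features, and 46–50 the SD features. The matrix here is random, for illustration only:

```python
import numpy as np

# Hypothetical stand-in for the 2200 x 50 MD feature matrix B
rng = np.random.default_rng(0)
B = rng.standard_normal((2200, 50))

def fuse(B, i):
    """Fusion feature vector t_i for 1-based channel index i, per Eq. (6)."""
    td = B[:, 3 * (i - 1):3 * (i - 1) + 3].sum(axis=1)       # columns 3(i-1)+1 .. 3(i-1)+3
    fd = B[:, 15 + 5 * (i - 1):15 + 5 * (i - 1) + 5].sum(axis=1)  # columns 5(i-1)+16 .. 5(i-1)+20
    tfd = B[:, 40 + i - 1]                                   # column i+40
    sd = B[:, 45 + i - 1]                                    # column i+45
    return (td + fd + tfd + sd) / 10.0   # 3 + 5 + 1 + 1 = 10 features averaged

# Fused MD feature matrix T (2200 x 5), one column per OCS channel
T = np.column_stack([fuse(B, i) for i in range(1, 6)])
```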

After combination, the MD feature matrix T (2200 × 5) = [t1, t2, t3, t4, t5] is obtained. Figure 14 shows the distributions of randomly sampled feature data for pairs of channels: each data point has the form (tij, tik), where i is the trial index and j, k are channel indices. The points are clearly clustered according to whether they belong to the rest task or the motor task, which makes it easier to find a separating plane to classify them.

Fig. 14
Clustering distribution characteristics of feature data

5.2 Feature classification

As mentioned earlier, SVM is an excellent binary classifier and is also adopted in this paper. The radial basis function (RBF) kernel is often selected for SVM because it requires few parameters, is flexible, performs well regardless of sample size, and has good anti-interference ability. For these reasons, it is frequently used in SVM-based MI EEG signal classification and has achieved excellent classification accuracy [35]. In this paper, we also use the RBF kernel. However, some studies have achieved better classification performance with kernels other than the RBF. For example, Chui et al. [36] used a cross-correlation kernel Kxcorr,ij and a convolution kernel Kconv,ij to construct a Mercer kernel function KDDC and performed driver drowsiness detection based on electrocardiogram data; KDDC outperformed typical kernels (linear, quadratic, third-order polynomial, and Gaussian RBF) by 17–63%. Therefore, customizing a kernel function suited to the characteristics of EEG signals to obtain better classification performance is part of our future work.

Considering the clustering distribution characteristics of the fused feature data, this paper adopts a PSO-based SVM (PSO-SVM) for feature classification. PSO is used to optimize the penalty parameter c and the kernel function parameter g of the SVM to improve its classification ability. In the PSO, the population size is set to 20 and 100 iterations are performed to optimize the SVM parameters. The final optimized parameters are c = 0.9221 and g = 0.7832.

We use tenfold cross-validation for performance validation: the multi-domain feature matrix is divided into 10 row blocks, and in each fold nine blocks are used as training data and the remaining block as test data.
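A simplified sketch of this PSO-SVM parameter search, using scikit-learn's `SVC` with k-fold cross-validation accuracy as the fitness function. The inertia and acceleration coefficients, search bounds, and function name are our assumptions, not the paper's exact settings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def pso_svm(X, y, n_particles=20, n_iters=100,
            c_bounds=(0.01, 10.0), g_bounds=(0.01, 10.0),
            w=0.6, c1=1.5, c2=1.5, cv=10, seed=0):
    """PSO search for the SVM penalty C and RBF gamma, scored by k-fold CV accuracy."""
    rng = np.random.default_rng(seed)
    lo = np.array([c_bounds[0], g_bounds[0]])
    hi = np.array([c_bounds[1], g_bounds[1]])

    def fitness(p):
        clf = SVC(kernel="rbf", C=p[0], gamma=p[1])
        return cross_val_score(clf, X, y, cv=cv).mean()

    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    best = np.argmax(pbest_fit)
    gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]

    for _ in range(n_iters):
        r1, r2 = rng.random((n_particles, 2)), rng.random((n_particles, 2))
        # Standard velocity update: inertia + cognitive + social terms
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if pbest_fit.max() > gbest_fit:
            best = np.argmax(pbest_fit)
            gbest, gbest_fit = pbest[best].copy(), pbest_fit[best]
    return gbest, gbest_fit
```

With the paper's settings (20 particles, 100 iterations, tenfold CV) this search would return values analogous to the reported c = 0.9221 and g = 0.7832.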

6 Results and analysis

6.1 Classification results

Table 1 shows the classification results of the proposed method compared with other methods. Two data sets are used for comparison: the EEG signals of the OCS selected by the ISL, and the signals of the TCS commonly used in existing works. For each data set, five kinds of features are extracted: TD, FD, TFD, SD, and MD features. Two kinds of SVM are adopted for classification: the traditional SVM and PSO-SVM. In the traditional SVM, after many trials and adjustments, the penalty parameter c and the kernel function parameter g are set to 1.0 and 0.8, respectively. For comparison, in PSO-SVM the parameters are optimized to c = 0.9221 and g = 0.7832.

Table 1 Classification results for different channels and features

PSO-SVM is also compared with two popular ensemble methods, Bagging and Gradient Boosting; specifically, Random Forest and Gradient Boosting Machine are used. Each classification method is tested on five kinds of features: TD, FD, TFD, SD, and MD features, all extracted from the OCS. Table 2 shows the results of these classification methods.

Table 2 Classification results of different classification methods
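The comparison behind Table 2 can be reproduced in outline with scikit-learn, here on synthetic stand-in data rather than the actual MD feature matrix; the reported c and g values are plugged into a plain RBF SVM for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 2200 x 5 MD feature matrix (hypothetical data)
X, y = make_classification(n_samples=400, n_features=5, n_informative=4,
                           n_redundant=0, random_state=0)

models = {
    "Bagging (Random Forest)": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "SVM (RBF, c=0.9221, g=0.7832)": SVC(kernel="rbf", C=0.9221, gamma=0.7832),
}

# Tenfold cross-validation accuracy for each classifier
results = {name: cross_val_score(m, X, y, cv=10).mean() for name, m in models.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.4f}")
```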

6.2 Result analysis

(1) Channel selection

The purpose of ISL is to find the most active source area, i.e., the one with the highest correlation with the EEG data, so as to obtain the channel set with the most obvious EEG features. After three iterations, a stable OCS is obtained: FC1, FC2, C1, C2, and Cz. As shown in Fig. 12, compared with the TCS (C3, C4, and Cz), which is often used in upper limb BCI applications, the OCS is distributed more centrally on the coronal section of the brain, consistent with the widely recognized lower limb motor area. This suggests that the selected OCS is well suited to expressing lower limb MI EEG data.

It can be seen from Table 1 that, when the same classification method is used (traditional SVM (T-SVM) or PSO-SVM), the classification accuracies on the OCS for the TD, FD, TFD, SD, and MD features are increased by 0.46%/0.43%, 1.34%/2.43%, 0.50%/3.15%, 0.84%/0.08%, and 1.17%/0.68%, respectively, compared with those on the TCS, as shown in Fig. 15. This shows that the OCS yields better classification than the TCS.

(2) Multi-domain feature extraction

From Sect. 5.1, we can see that the number of MD features equals that of the TFD and SD features and is smaller than that of the TD and FD features.

As shown in Table 1, with the same classification method and channel set, the classification accuracies of the MD features increase by 0.93–3.21%, 0.66–2.21%, 0.60–2.50%, and 0.06–3.96%, respectively, compared with the single-domain features, as shown in Fig. 16a.

As shown in Table 2, the classification accuracies of the MD features using Bagging, Gradient Boosting, and PSO-SVM are increased by 1.48–8.53%, 2.25–4.91%, and 0.66–2.01%, respectively, compared with single-domain features, as shown in Fig. 16b.

The results show that MD features achieve better classification accuracies than single-domain features without increasing the amount of data.

(3) Feature classification

It can be seen from Table 1 that (1) with the same channels and the same features, PSO-SVM achieves higher classification accuracies than the traditional SVM, as shown in Fig. 17a; and (2) using the MD features and the OCS data, PSO-SVM achieves the highest accuracy of 86.57%, which is significantly higher than the other methods in the table.

Fig. 15
Improvements of classification accuracies of the OCS compared with the TCS

Fig. 16
Improvements of classification accuracies of MD features compared with single-domain features

Fig. 17
Improvements of classification accuracies of PSO-SVM

It can be seen from Table 2 that PSO-SVM achieves better classification accuracies than Bagging and Gradient Boosting for all the features, as shown in Fig. 17b, and PSO-SVM with the MD features achieves the best result.

In summary, based on the OCS and MD features, PSO-SVM achieves a satisfactory classification accuracy.

7 Conclusions

In this paper, an iterative source localization method is proposed for channel selection in EEG signal classification of lower limb motor imagery. Based on this method, the five channels FC1, FC2, C1, C2, and Cz are selected as the OCS, which performs better than the commonly used traditional channel set. For the EEG signals of the OCS, TD, FD, TFD, and SD features are extracted and fused into MD features, expressed as an MD feature matrix. Finally, the PSO-SVM method is used to classify the multi-domain fusion features. The results show that the classification accuracy reaches 88.43%, which is 3.35–5.41% higher than using the traditional SVM to classify single-domain features on the TCS. This proves that the combination of the OCS and MD features not only reduces the amount of data processing but also retains more feature information, improving the accuracy of EEG classification.

Future work on lower limb MI-BCI should focus on developing more accurate classification methods, including customizing the kernel function of the SVM to better suit the characteristics of lower limb MI EEG data, and extending the classification ability from binary to multi-class classification. Meanwhile, more efficient and automated data processing methods should be studied to realize online real-time feature recognition.