1 Introduction

Brain–computer interfaces (BCIs) enable a person to communicate with the outside world directly through brain signals, bypassing the traditional neural pathways [1, 2]. They can help disabled patients control external equipment and assist in rehabilitation training [3]. With the rapid development of electroencephalogram (EEG) technology, increasing attention has been focused on EEG-based BCIs. In just a few decades, classic paradigms such as P300, steady-state visual-evoked potentials (SSVEP), motor imagery (MI), and emotion BCIs have been developed [4].

Compared with other paradigms, MI-BCIs are commonly based on event-related desynchronization (ERD) and event-related synchronization (ERS). Motor imagery (MI) evokes spontaneous EEG activity, which has great research value and potential in rehabilitation. References [5,6,7] have shown that both actual limb movement and the imagery of such movement cause the Mu and Beta rhythms over the motor cortex to attenuate or increase, in the form of ERD and ERS [8]. The MI-BCI exploits these phenomena to extract features from the EEG signals of different imagery tasks for classification and recognition. Pfurtscheller designed a virtual keyboard input method based on MI-BCI [9], which laid a solid foundation for subsequent research. Hochberg et al. recorded electrical signals from the motor cortex of patients with cerebral palsy and helped them use a mechanical arm to drink [10]. Yasunari Hashimoto et al. [11] combined MI-BCI with a virtual reality system, in which individuals manipulated an avatar by imagining movements of the left hand, right hand, and feet.

MI-BCIs bring much convenience to disabled users, but several problems remain unresolved: (1) the EEG signal of motor imagery is spontaneous, and its non-stationarity results in significant inter-individual variability; (2) the tasks are based on motor imagery patterns that depend strongly on the individual subject and are not feasible for everyone [12]; (3) the traditional MI paradigm suffers from weak, non-discriminative features and low classification performance, which limit the use of MI-BCI.

Many researchers have developed advanced machine learning algorithms to improve the performance of BCI [13,14,15,16]. These classification algorithms produce high classification accuracy but often require large training datasets. In addition, many feature extraction methods are used to differentiate signals under different tasks for more efficient classification [17,18,19].

However, despite advanced feature extraction and classification algorithms, a BCI cannot perform well if the subject is unable to adjust their brain activity to the BCI's requirements [20]. In recent years, the hybrid brain–computer interface has become a research hot spot. It combines two or more types of EEG or other bioelectric signals to further improve the performance of a BCI system, offering an attractive way to improve users' overall operating capacity. Allison et al. [21] proposed a hybrid brain–computer interface that combined motor imagery and steady-state visual-evoked potentials; the system achieved a recognition rate of 81.0%, compared with 76.9% for SSVEP alone and an even lower 74.8% for MI alone. Long et al. [22] proposed a hybrid brain–computer interface using P300 visual-evoked features and imaginary-movement ERD features, which also showed very good prospects for target-selection applications. Yao et al. [23] combined motor imagery with selective sensation to enhance the discrimination between left and right mental tasks. In the study by Kim et al. [24], features were extracted from three different patterns, including RPs, ERD/ERS, and ERPs, and the results showed that the method achieved high accuracy. Zhuang et al. [25] used a combination of wavelet and canonical correlation analysis to reveal the ERD/ERS pattern and combined multiple learning algorithms in a classifier to obtain the best learning results. Lu and Bi [26] designed a longitudinal control algorithm for the speed of simulated vehicles, in which the common spatial pattern (CSP) was used to enhance the SNR of the EEG signals, power spectral density (PSD) features were then extracted, and SSVEP patterns were classified using a traditional RBF-kernel support vector machine (SVM) [27].

References [28,29,30,31] have shown that the EEG signals generated by speech imagery (SI) are related to several brain areas, such as the left inferior frontal gyrus, supplementary motor area, premotor area, superior temporal gyrus, and middle temporal gyrus. Bocquelet et al. [32] argued that the higher-level processing of speech imagery is associated with the left hemisphere. Ikeda et al. [33] studied electrocorticography (ECoG) signals recorded during vowel speech imagery and found that the alpha band (8–13 Hz) and beta band (13–30 Hz) may carry information for decoding single-vowel speech imagery. Sereshkeh et al. [34], using EEG, showed that during SI the channel activation level increases in the 20–30 Hz band over Broca's area and in the band below 15 Hz over Wernicke's area. These studies suggest that SI generates EEG activity in a manner similar to right-hand MI. Tong et al. analysed the EEG recorded during Chinese phoneme imagination combined with MI tasks and obtained better results than with traditional MI [35]. Therefore, SI activities can be used to enhance the alpha and beta bands of the EEG signals generated over the left cortex during MI. This paper conducts an in-depth analysis of this hypothesis and aims to improve the performance of the MI-BCI through SI. A novel paradigm using SI to enhance MI-BCI is proposed to address the non-significant feature differences and low classification accuracy of the traditional MI paradigm.

The remainder of this paper is organized as follows. Section 2 introduces the proposed paradigm, the methods used, and the offline data-collection experiments; Sect. 3 presents the results of the offline data analysis and classification; Sect. 4 discusses the advantages and disadvantages of the proposed paradigm; and Sect. 5 concludes the paper.

2 Methods

2.1 Participants

This experiment was approved by the Ethical Committee of the Tianjin University of Technology. Eight healthy, right-handed subjects (2 females and 6 males, 22–26 years old), labelled S1–S8, participated in our experiments. All subjects were native Chinese speakers with normal or corrected-to-normal vision and had never participated in similar experiments before. All gave informed consent to participate. The average duration of the experiment for each subject (including preparation) was 2.5 h.

2.2 Experimental Paradigm

The proposed paradigm aims to improve the traditional right-hand MI task by combining it with an SI task; that is, the subjects silently read the Chinese Pinyin (the pronunciation of a Chinese character) while imagining writing the corresponding Chinese character with the right hand. This combined task is denoted Speech and Write Motor Imagery (SW-MI) in this paper, to distinguish it from Left-Hand Motor Imagery (LH-MI) and Right-Hand Motor Imagery (RH-MI). To verify the effectiveness of the proposed paradigm while minimizing the impact of the time-varying characteristics of EEG, the SW-MI, LH-MI, and RH-MI tasks are presented in random order for comparison.

During the preparation before an experiment, the subjects sit quietly in a comfortable chair with their hands resting on the armrests, look at the screen (60 Hz refresh rate, 1920 × 1080 resolution) about 60 cm away, and wait for the experiment to start. The detailed flowchart of the experimental paradigm is shown in Fig. 1.

Fig. 1

Time course of the proposed experimental paradigm

Each trial is divided into three temporal parts. The first part is the setup. The “cue1” instruction is displayed as a cross in the centre of the screen for 3 s, informing the subject that the trial is about to start. After the cross disappears, the second part starts with “cue2”, showing one of the following images: “ ← ”, “ → ”, or “yòu shǒu” (the pronunciation of “right hand” in Chinese), representing LH-MI, RH-MI, and SW-MI, respectively. All cues are displayed in the same colour and size in the centre of the screen, to minimize the effects of eye movements and visual responses to colour. In the SW-MI task, the subject silently reads the Pinyin and imagines writing the Chinese character with the right hand at the same time. The subject performs the imagery task indicated by the cue for 5 s. The last part, “cue3”, ends the imagery task by displaying “●” in the centre of the screen, reminding the subject that the task is finished and that they may enter a relaxed state in which they are allowed to blink, take deep breaths, or engage in other activities. One complete experiment consists of 30 trials, ten for each task type. Each subject conducts seven experiments on the same day, a total of 210 imagery trials; the resulting randomized trial sequence is sketched below. At the end of the experiments, each subject is asked verbally whether he/she conducted the experiment seriously and whether he/she committed any infractions, such as sudden body movements or inattention.
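To make the trial structure concrete, the following minimal sketch (an illustration only, not the stimulus-presentation software actually used) generates the randomized sequence described above: 30 trials per run with ten trials of each task, and seven runs per subject, giving 210 trials in total.

```python
# Illustrative generation of the randomized trial sequence (hypothetical helper,
# not the authors' presentation code). Each run: 30 trials, 10 per task class.
import random

TASKS = ["LH-MI", "RH-MI", "SW-MI"]

def make_run(trials_per_task=10, seed=None):
    rng = random.Random(seed)
    run = TASKS * trials_per_task      # 10 trials of each of the 3 tasks
    rng.shuffle(run)                   # present the 30 trials in random order
    return run

# 7 runs per subject -> 7 x 30 = 210 trials in total
session = [make_run(seed=r) for r in range(7)]
```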

2.3 EEG Recording

A SynAmps2 amplifier (Neuroscan, version 4.5) was used to acquire the EEG signals. As shown in Fig. 2, a total of 64 electrodes were used, placed according to the international 10–20 system. Throughout the experiments, the impedances of all channels were kept below 5000 Ω. The sampling frequency of the system was set to 1000 Hz, and the bandpass filter was set to 0.1–100 Hz. After the experiments, the recorded data were saved in the ‘.cnt’ format for subsequent analysis and processing.

Fig. 2

Location of the 64 electrodes in the experiments

Even a tiny action, such as scalp movement, frowning, biting, or swallowing, can generate electromyographic activity that is picked up by the electrodes attached to the scalp. Because it is difficult to separate the desired EEG signals from these superimposed electrical signals, such electromyographic artefacts must be removed. In this work, independent component analysis (ICA) was applied to identify the artefactual components of the EEG signal, so that most of the artefacts were isolated and eliminated.
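As a hedged illustration of this step, the sketch below performs an equivalent ICA-based artefact removal with MNE-Python; the actual processing in this work used EEGLAB in MATLAB, and the file name and excluded component indices shown here are purely illustrative.

```python
# Minimal ICA artefact-removal sketch with MNE-Python (a stand-in for the EEGLAB
# workflow described in the text; "subject01.cnt" and the excluded components
# are hypothetical examples).
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_cnt("subject01.cnt", preload=True)
raw.filter(1.0, 100.0)                    # broadband filter before ICA

ica = ICA(n_components=20, random_state=0)
ica.fit(raw)
ica.exclude = [0, 3]                      # components judged artefactual by inspection
clean = ica.apply(raw.copy())             # reconstruct the EEG without those components
```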

To obtain good signal quality, disturbances and noise must be avoided as much as possible. Therefore, three experts visually monitored the subjects. If a subject exhibited excessive movement or inattention during an experiment, making the data difficult to process, the experiment was repeated to ensure that artefactual noise was eliminated.

In addition, a questionnaire was completed by each subject after the experiment to confirm that the speech imagery had been performed according to the cues. The experimental data were processed using MATLAB software and an NVIDIA GeForce GTX 1650 GPU (8 GB).

2.4 Implementation of the Proposed Method

For each subject, the recorded data were first pre-processed and then split into three categories according to the labels of the three tasks. Afterwards, time–frequency analysis was applied to the processed data. Finally, in the classification step, two classes of signals were used to build a set of spatial filters, from which features were extracted for SVM classification. These steps are shown in Fig. 3 and detailed in the following subsections.

Fig. 3

EEG signal processing flowchart

2.4.1 Preprocessing

The recorded EEG data were pre-processed in EEGLAB [36]. First, the multi-channel EEG signals were re-referenced to the bilateral mastoid electrodes. Second, 6–30 Hz bandpass filtering was performed to remove baseline drift, power-line interference, and EMG artefacts. Third, the data were downsampled to 250 Hz, and independent component analysis (ICA) [37] was applied to remove electrooculographic artefacts from the EEG signals. To preserve the data characteristics as much as possible, the EEG data were then standardized with the Z-score transform, (x − μ)/σ, which converts different groups of data into unitless scores on a common scale. Finally, EEG signals were obtained for the three types of tasks.
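For clarity, the preprocessing chain can be summarized in the following minimal NumPy/SciPy sketch; it assumes the continuous recording has already been loaded into an array, and the ICA step is omitted for brevity (the actual processing in this work was done in EEGLAB/MATLAB).

```python
# Minimal preprocessing sketch: re-reference, 6-30 Hz bandpass, downsample to
# 250 Hz, and per-channel Z-score. `eeg` is assumed to be [n_channels, n_samples].
import numpy as np
from scipy.signal import butter, filtfilt, decimate

FS_RAW = 1000   # original sampling rate (Hz)
FS_NEW = 250    # target sampling rate (Hz)

def preprocess(eeg, mastoid_idx):
    # 1) re-reference to the mean of the bilateral mastoid channels
    eeg = eeg - eeg[mastoid_idx].mean(axis=0, keepdims=True)
    # 2) 6-30 Hz zero-phase Butterworth bandpass
    b, a = butter(4, [6, 30], btype="bandpass", fs=FS_RAW)
    eeg = filtfilt(b, a, eeg, axis=1)
    # 3) downsample 1000 Hz -> 250 Hz (factor 4)
    eeg = decimate(eeg, FS_RAW // FS_NEW, axis=1, zero_phase=True)
    # 4) Z-score standardization per channel: (x - mu) / sigma
    return (eeg - eeg.mean(axis=1, keepdims=True)) / eeg.std(axis=1, keepdims=True)
```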

2.4.2 Signal Analysis

EEG signals are non-stationary and time-varying and cannot be fully analysed in the time or frequency domain alone. Therefore, a time–frequency analysis method is adopted, in which time–frequency plots are used to observe the ERD evoked in the RH-MI and SW-MI tasks.

Time–frequency analysis, also known as joint time–frequency analysis, is commonly used for processing and analysing EEG signals. The basic idea is to design a joint function of time and frequency that simultaneously describes the energy density at different times and frequencies. From the time–frequency distribution of a signal, one obtains the instantaneous frequency components and their amplitudes at a given time point, which can be used to analyse time-varying signals.

The short-time Fourier transform is a well-known time–frequency distribution, in which short data segments are selected by multiplying the signal with a sliding window and the Fourier transform is then applied to each segment separately. The short-time Fourier transform is defined in Eq. (1):

$${\text{STFT}}\left( {t,f} \right) = \mathop \smallint \limits_{ - \infty }^{ + \infty } x\left( \tau \right)W^{*} \left( {\tau - t} \right)e^{ - j2\pi f\tau } d\tau ,$$
(1)

where \(f\) is the frequency, \(x\left( \tau \right)\) is the EEG signal of one trial, \(W\left( t \right)\) is the observation window, \(W\left( {\tau - t} \right)\) is obtained by shifting the observation window to an arbitrary time point \(t\), \(W^{*} \left( {\tau - t} \right)\) is the complex conjugate of \(W\left( {\tau - t} \right)\), and \(x\left( \tau \right)W\left( {\tau - t} \right)\) is the signal obtained after windowing the original signal.
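As a minimal sketch of Eq. (1), the code below applies SciPy's STFT to one pre-processed single-channel trial; the 2 s Hamming window matches the choice described in the next paragraph, while the array shape and sampling rate follow the preprocessing above and are otherwise assumptions.

```python
# STFT of one single-channel trial (assumption: `trial` is a 1-D array at 250 Hz).
import numpy as np
from scipy.signal import stft

FS = 250                  # sampling rate after downsampling (Hz)
WIN = 2 * FS              # 2 s Hamming window

def trial_stft(trial):
    f, t, Z = stft(trial, fs=FS, window="hamming",
                   nperseg=WIN, noverlap=WIN // 2)
    return f, t, np.abs(Z)    # frequencies, times, magnitude spectrum
```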

In this paper, to reduce the spectral leakage of the rectangular window, a Hamming window with t = 2 s was used. To explore the activation of the cerebral motor cortex under different tasks, event-related spectral perturbations were calculated with the short-time Fourier transform. The event-related spectral perturbation is defined as the change in event-related spectral power relative to a preceding baseline or reference period and can be calculated as in Eq. (2):

$${\text{ERSP}}\left( {f,t} \right) = \frac{1}{N}\mathop \sum \limits_{k = 1}^{N} \left| {F_{k} \left( {f,t} \right)} \right|^{2} ,$$
(2)

where \(N\) is the number of trials and \(F_{k} \left( {f,t} \right)\) is the spectral estimate at frequency \(f\) and time \(t\) of the \(k\)-th trial.
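The ERSP of Eq. (2) can then be obtained by averaging the squared spectral magnitudes over trials, as in the sketch below; it assumes a [trials × samples] array for one channel, and any baseline normalization mentioned in the definition above would be applied afterwards.

```python
# ERSP, Eq. (2): average of |F_k(f, t)|^2 over N trials of one channel (e.g. C3).
import numpy as np
from scipy.signal import stft

FS = 250
WIN = 2 * FS   # 2 s Hamming window, as above

def ersp(trials):
    spectra = []
    for x in trials:                               # k = 1 ... N
        f, t, Z = stft(x, fs=FS, window="hamming",
                       nperseg=WIN, noverlap=WIN // 2)
        spectra.append(np.abs(Z) ** 2)             # |F_k(f, t)|^2
    return f, t, np.mean(spectra, axis=0)          # average over the N trials
```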

Considering the delay between the subject seeing the motor imagery cue and actually starting the imagery, data from 0.5 to 5 s of the motor imagery period are selected for classification.

In EEG signal processing, CSP is a well-known and effective feature extraction algorithm [38]. The basic idea of CSP is to find the optimal spatial projection directions of the EEG signals by constructing a set of spatial filters [39]. CSP is suited to binary classification. For the two classes of EEG signals, the single-trial EEG data can be expressed as an \(N \times T\) matrix \(X_{i}\), where \(N\) is the number of channels, \(T\) is the number of sampling points in each epoch, and \(i\) (1 or 2) denotes the class. The sample data of the two classes are denoted \(X_{1}\) and \(X_{2}\), respectively.

For the segmented raw data, the covariance matrix \(R_{i}\) is obtained as shown in Eq. (3):

$$R_{i} = \frac{{X_{i} X_{i}^{T} }}{{{\text{trace}}\left( {X_{i} X_{i}^{T} } \right)}}\left( {i = 1,2} \right),$$
(3)

where \({\text{trace}}\left( X \right)\) is the trace of the matrix \(X\) and \(i\) denotes the class. Averaging \(R_{i}\) over all trials of class \(i\) gives the mean spatial covariance matrix of that class. The sum of the spatial covariance matrices of the two classes is calculated in Eq. (4):

$$R_{c} = R_{1} + R_{2} ,$$
(4)

where \(R_{c}\) is a positive definite matrix that can be factorized by eigenvalue decomposition as

$$R_{c} = U_{c} \Lambda_{c} U_{c}^{T} ,$$
(5)

where \(\Lambda_{c}\) is the diagonal matrix of the eigenvalues of \(R_{c}\), sorted in descending order, and \(U_{c}\) is the corresponding eigenvector matrix.

Applying the whitening transformation, one obtains the whitening matrix \(P\) in Eq. (6):

$$P = \Lambda_{c}^{ - 1/2} U_{c}^{T} .$$
(6)

Applying matrix \(P\) to \(R_{1}\) and \(R_{2}\), one gets the matrices \(S_{1}\) and \(S_{2}\) as shown in Eq. (7):

$$S_{1} = PR_{1} P^{T} , S_{2} = PR_{2} P^{T} .$$
(7)

Since \(S_{1} + S_{2} = PR_{c} P^{T} = I\), \(S_{1}\) and \(S_{2}\) share common eigenvectors; hence, there exist two diagonal matrices \(\Lambda_{1}\) and \(\Lambda_{2}\) and an eigenvector matrix \(B\) that satisfy the following conditions:

$$S_{1} = B\Lambda_{1} B^{T} ,\quad S_{2} = B\Lambda_{2} B^{T} ,\quad \Lambda_{1} + \Lambda_{2} = I,$$
(8)

where \(I\) is the identity matrix; that is, for each common eigenvector, the corresponding eigenvalues of \(S_{1}\) and \(S_{2}\) sum to one.

Consequently, the eigenvectors for which \(S_{1}\) has the largest eigenvalues are those for which \(S_{2}\) has the smallest, so projections onto these eigenvectors yield the largest variance difference between the two classes. Therefore, the matrix \(B\) can be used to solve the two-class classification problem by means of the projection matrix \(W\) in Eq. (9):

$$W = \left( {B^{T} P} \right)^{T} .$$
(9)

The original EEG signal \(X\) can be projected through the matrix \(W\) to obtain the characteristic matrix in Eq. (10), where the dimensions of each matrix are detailed for clarification.

$$Z_{N \times T} = W_{N \times N} X_{N \times T} .$$
(10)

The first \(m\) rows and the last \(m\) rows \((2m < N)\) of \(Z\) are selected as the characteristics of the original input data. The final features are then obtained using Eq. (11):

$$y_{i} = \log \left( {\frac{{var\left( {Z_{i} } \right)}}{{\mathop \sum \nolimits_{n = 1}^{2m} var\left( {Z_{n} } \right)}}} \right),$$
(11)

where \(y_{i}\) is the normalized log-variance feature computed from the \(i\)-th selected row of \(Z\).
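The whole CSP pipeline of Eqs. (3)–(11) can be summarized in the following minimal NumPy sketch; it is an illustration rather than the authors' original MATLAB implementation, the choice m = 3 is only an example, and the filter matrix is taken as \(B^{T} P\) so that the projection is a plain matrix product (the same filter bank as Eq. (9), up to a transpose).

```python
# CSP feature extraction, Eqs. (3)-(11). X1, X2: lists of [N_channels, T_samples]
# trials for the two classes (e.g. SW-MI vs. LH-MI), already filtered to 6-30 Hz.
import numpy as np

def average_cov(trials):
    # normalized spatial covariance of each trial, Eq. (3), averaged over trials
    covs = [X @ X.T / np.trace(X @ X.T) for X in trials]
    return np.mean(covs, axis=0)

def csp_filters(X1, X2):
    R1, R2 = average_cov(X1), average_cov(X2)
    Rc = R1 + R2                                   # Eq. (4)
    eigvals, Uc = np.linalg.eigh(Rc)               # Eq. (5)
    P = np.diag(1.0 / np.sqrt(eigvals)) @ Uc.T     # whitening matrix, Eq. (6)
    S1 = P @ R1 @ P.T                              # Eq. (7)
    lam, B = np.linalg.eigh(S1)                    # shared eigenvectors, Eq. (8)
    B = B[:, np.argsort(lam)[::-1]]                # sort by eigenvalue, descending
    return B.T @ P                                 # rows are spatial filters (Eq. (9))

def csp_features(W, X, m=3):
    Z = W @ X                                      # projected trial, Eq. (10)
    Z = np.vstack([Z[:m], Z[-m:]])                 # first and last m rows
    var = Z.var(axis=1)
    return np.log(var / var.sum())                 # Eq. (11)
```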

2.4.3 Classification

To estimate the classification accuracy, an SVM model was used to classify the feature vectors of the EEG signals. SVM is a machine learning method based on statistical learning theory [40]. It minimizes the structural risk to improve the generalization ability of the model, balancing the empirical risk against the confidence range. Even for low-dimensional samples, good statistical performance can be obtained by using kernel functions to map the input vectors to a high-dimensional space and constructing an optimal classification hyperplane there. The effectiveness of the SVM model depends on the kernel function and the penalty factor: the kernel function determines the spatial distribution of the training samples, and the penalty factor controls the trade-off between the empirical risk and the confidence range. The kernel function used in this paper is the Gaussian (RBF) kernel. Apart from the kernel parameter g and the penalty factor c, which are found by a grid search with 10 × 10 cross-validation over the range \(2^{-10}\) to \(2^{10}\), the default SVM settings are used.
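As a hedged illustration of this step, the sketch below shows an equivalent RBF-kernel SVM with a grid search over c and g in \(2^{-10}\) to \(2^{10}\), evaluated with 10 × 10 (repeated tenfold) cross-validation using scikit-learn; the original analysis was carried out in MATLAB, so the functions used here are not the ones actually employed.

```python
# RBF-kernel SVM with a 10 x 10 cross-validated grid search over C and gamma.
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

def classify(features, labels):
    grid = {"C": 2.0 ** np.arange(-10, 11),        # penalty factor c
            "gamma": 2.0 ** np.arange(-10, 11)}    # kernel parameter g
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=cv, n_jobs=-1)
    search.fit(features, labels)
    return search.best_params_, search.best_score_
```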

3 Results

In this section, the proposed method is used to demonstrate the potential of the multimodal paradigm for improving the separability of EEG features in BCI. The EEG data of the eight subjects (S1–S8) are used to validate the hypothesis that speech imagery improves the performance of an EEG BCI based on the traditional MI paradigm.

Because the imagery process in our experiments is relatively long, and each subject's performance varies somewhat across experiments, the ERD/ERS phenomenon may differ over time. Therefore, to better observe the ERD/ERS phenomenon in each experiment, the 5 s imagery period of a complete experiment is inspected visually from the energy perspective. Figure 4 shows the corresponding event-related spectral perturbation (ERSP) plots of the right-hand tasks in the two paradigms for subjects S2 and S7. Because the speech imagery task is added in the new paradigm, the C3 channel of the sensorimotor area and the FC3 channel over Broca's area are selected for analysis. As shown in Fig. 4, before t = 0 the subject is in the resting state. The time t = 0 corresponds to the moment when “cue2” appears in a trial, indicating that the subject starts the imagery task. When the subject performed the SW-MI and RH-MI tasks, the energy of channels C3 and FC3 in the alpha and beta bands decreased from the baseline level (shown in blue in Fig. 4), due to the ERD phenomenon. Compared with RH-MI (the second row in Fig. 4), the ERD produced by SW-MI (the first row) is significantly stronger, spans a wider frequency band, and lasts longer, especially in the alpha band, which suggests that SW-MI is more suitable for classification.

Fig. 4

ERSPs of SW-MI and RH-MI in channels C3 and FC3 for subjects S2 and S7

According to the ERSP diagrams, the 6–30 Hz frequency band is selected as the characteristic band for filtering, and the spatial patterns are extracted using the CSP method. Figure 5 shows the topographic maps of the two most important spatial patterns, which clearly show the spatial distribution over the brain when each subject performs motor imagery. For each subject, the two maps in the first row are derived from the first and last columns of the spatial filter matrix constructed on the SW-MI and LH-MI EEG signals, respectively; the second row is derived in the same way from the RH-MI and LH-MI signals. From the topographic maps of SW-MI and LH-MI, a clear spatial difference between the two tasks is observed.

Fig. 5

Topographic map of eight subjects after CSP

The difference is not limited to the sensorimotor areas on both sides of the brain; the activated area is larger for SW-MI. This shows that, across the 70 SW-MI trials, subjects exhibit a stronger ERD for SW-MI, distinguishing it from hand MI. On the other hand, the spatial patterns of left-hand and right-hand motor imagery under the traditional paradigm are similar for most subjects, with only small differences observed in the feature space. This indicates that, with the proposed paradigm, the spatial discrimination between classes is higher, which helps to improve the classification accuracy.

Figure 6 shows the classification accuracy of each subject calculated with the 10 × 10 cross-validation method, where ‘mean’ denotes the average classification accuracy over the eight subjects: 68.96% under the traditional paradigm versus 77.03% under the new paradigm, an increase of about 8 percentage points. The highest classification accuracy, 91.29%, is achieved by subject S6 for SW vs. LH. The classification accuracies of S2 and S7 increase by 12.0% and 13.5%, respectively, under the proposed paradigm compared with the traditional paradigm. Figure 6 clearly shows that the proposed paradigm (SW & LH) yields higher classification accuracies than the traditional MI paradigm (RH & LH).

Fig. 6

The average classification accuracy of subjects in the two paradigms (p < 0.01)

To further analyse the differences in classification accuracy between the two paradigms, the data were classified in the alpha and beta frequency bands according to the frequency band energy changes in the ERSP chart and the spectrum chart. A tenfold cross-validation method was adopted, and the results are shown in Table 1.

Table 1 Classification accuracy of two paradigms in two bands

For the alpha band, the classification accuracy of five subjects under the traditional paradigm is below 70%. In MI-EEG classification, 70% is a commonly used benchmark [41], and accuracies below 70% are considered insufficient for practical BCI operation. In contrast, under the proposed paradigm, the classification accuracy of all subjects' EEG signals in the alpha band exceeds 70%, which meets this criterion. This is an encouraging result, suggesting that the proposed paradigm is helpful.

The beta band features are also considered for classification. As shown in Table 1, the classification accuracy using the alpha band is significantly better than that using the beta band for all subjects except S3, indicating that the alpha band is a good choice for the paradigm proposed in this paper. The same table also shows that the classification accuracies in the beta band under the proposed paradigm are higher than those under the traditional paradigm, which further supports the effectiveness of the proposed paradigm.

4 Discussion

This paper proposed a novel paradigm that uses Chinese speech imagery to enhance the EEG features in MI-BCI. Compared with the traditional motor imagery paradigm, the proposed paradigm generated clearer spatial differences among classes and therefore led to higher classification accuracy.

4.1 Speech Imagery Enhances the ERD in BCI

This research demonstrated that it is feasible to merge speech imagery and motor imagery in a BCI by silently reading words relevant to specific classes while imagining writing those words. Imagining the writing activity awakens the subject's hand sensation through semantic understanding, as suggested in the literature [42, 43], thus activating the sensorimotor area of the brain and producing a significant ERD.

Since the MI-BCI is based on the ERD/ERS phenomenon of the brain [44], when the subject imagines left-hand or right-hand movement, the contralateral hemisphere produces energy changes in the Mu rhythm. In the proposed paradigm, the cortical areas activated by speech imagery are mainly concentrated in Broca's and Wernicke's areas [45], which are close to the motor area. Owing to the connectivity of the brain, when the subject imagines speech, the motor area also shows energy changes similar to the ERD observed in motor imagery. Therefore, when the subjects perform motor and speech imagery simultaneously, the brain is affected by both tasks at the same time, which is supported by the energy differences in Figs. 4 and 5. The ERSP diagrams in Fig. 4 show that, under the proposed paradigm, the motor area of the subject exhibits a stable and long-lasting ERD.

4.2 The Language Used in Speech Imagery May Make a Difference

It should be taken into account that each person has a different capacity for speech comprehension. In the literature, most BCIs based on speech imagery have been developed for English speakers. Because of the significant differences in language structure and pronunciation, it is worthwhile to study BCIs based on Chinese speech imagery by analysing the influence of Chinese Pinyin and semantic information on motor imagery. In the proposed paradigm, Chinese silent reading and writing imagination are added to the BCI tasks as a form of multimodal imagery. Other multimodal activities could also be considered, for example, integrating different sensory imagery tasks into the traditional MI-BCI to achieve higher separability among classes.

4.3 Complex Movement Imagination Helps Concentration with Less Training

An interesting point deserving attention is that, after each experiment, we asked the subjects about their experience. Although the proposed SW-MI task is more complicated, most subjects stated that it was easier to maintain concentration when performing SW-MI, whereas during the two simpler tasks (RH-MI and LH-MI) their attention was easily distracted and more susceptible to interference.

It is commonly recognized in the literature that training is necessary for subjects to achieve effective motor imagination. Many subjects do not know how to imagine hand movements, and a long training time is required to achieve good results. However, when performing SW-MI, the subjects involuntarily concentrate on the tasks of silent reading and writing imagination, which improves the signal quality to a certain extent. This observation is supported by the literature [46], which reports that imagining complex movements activates the motor area of the brain more strongly than imagining simple movements, explaining why the proposed paradigm is more effective.

4.4 Future Work

The combination of speech and motor imagery, if fully developed, could further improve BCI performance, which opens up interesting topics for future research. It is also worth studying how to use multiple types of imagery activities in this new paradigm and how to accurately convert the corresponding EEG classifications into multiple control commands for external devices.

In this paper, 64 electrode channels were used, which may make practical use of the BCI cumbersome. In the future, the choice of electrode channels will be further optimized.

Moreover, the subjects in this work were all healthy, whereas the main target population of existing MI-BCI systems is people with disabilities [47]. The decoding methods developed on healthy subjects' motor imagery tasks may not be directly applicable to such patients. Because the SW-MI task involves speech imagery, the proposed paradigm may not be suitable for subjects with diseases causing language disorders or locked-in syndrome. Therefore, future research should recruit subjects from different population categories, not only from the healthy group.

This work uses only EEG to demonstrate the effectiveness of the proposed paradigm. Other techniques, e.g. functional magnetic resonance imaging (fMRI), could also be applied under the proposed paradigm to investigate brain activity during speech imagery from other perspectives.

The CSP method is used in this paper to demonstrate the potential of the proposed paradigm for improving BCI performance. However, CSP alone has limitations, and further research could test additional feature extraction and classification algorithms.

5 Conclusion

In this work, we proposed a novel BCI paradigm combining motor imagery and speech imagery, in which the SW-MI task is shown to be superior to traditional hand motor imagery. The experimental results of eight subjects provide evidence for the interaction between speech imagery and motor imagery and show that speech imagery is helpful for MI-BCI classification.

According to the subjects' oral comments, it is easier to maintain concentration in the new paradigm, and it is less boring. Therefore, the new paradigm not only improves the classification accuracy, but also helps new users adapt to the BCI system faster and more efficiently. Like the traditional paradigm, this experimental paradigm does not require any actual movement, making it suitable for cases where physical movement is not feasible.

However, the work has limitations and can be improved in the future. The most important limitation is the number of subjects available in the experiments (n = 8). Assuming a large effect size a priori (Cohen's d of 0.8), an error probability (alpha) of 0.05, and a false-negative rate (beta) of 0.2 (i.e. a power of 0.8), a minimum sample size of 15 subjects is needed for a paired t test. Another limitation is the language used in the experiments, so the results may not generalize to other languages with different characteristics. In addition, we used the relatively simple CSP algorithm, and other methods should be explored. These limitations point to directions for future research.
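For reference, the quoted minimum sample size follows from a standard power calculation for the paired t test, as in the sketch below; statsmodels is used here only to illustrate the arithmetic and is not a tool named in this work.

```python
# Power analysis for a paired (one-sample) t test: d = 0.8, alpha = 0.05, power = 0.8.
import math
from statsmodels.stats.power import TTestPower

n = TTestPower().solve_power(effect_size=0.8, alpha=0.05, power=0.8,
                             alternative="two-sided")
print(math.ceil(n))   # approximately 14.3 -> 15 subjects
```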