Introduction

In daily life, the function of the upper limbs accounts for 60% of total body function, and the function of the fingers accounts for 90% of upper-limb function [1]. Complete hand function therefore plays a very important role in people’s work and daily life. However, some patients, such as those with hand muscle weakness, hand paralysis, or hand sequelae after stroke, and even amputees, lose hand function partially or completely. Brain–computer interface (BCI) technology is therefore well suited to driving an exoskeleton or a prosthetic hand that compensates for the lost hand function.

BCI systems have been widely used in recent decades as a communication medium between humans and external devices, particularly for people with movement impairments [2]. A typical BCI system allows a person to interact with the environment using only brain activity, without involving the peripheral nervous system or muscles [3]. Motor imagery (MI) is one BCI paradigm, in which activation of the brain’s sensorimotor cortex is detected to identify a person’s motor intent. The sensory homunculus shows that the cortical areas controlling the human hands account for the largest proportion of the total cortical area, and a larger proportion indicates that the corresponding movements can be controlled more finely. It is therefore feasible to decode hand MI with BCI technology. To make use of brain activity, BCI systems require a measurable signal. Functional near-infrared spectroscopy (fNIRS) is a relatively new BCI signal with some favorable properties, such as high temporal resolution, spatial resolution, portability, and ease of wearing.

Several papers have applied fNIRS to BCI, especially to MI. Bhutta et al. classified fNIRS data for deception decoding using linear discriminant analysis (LDA) and support vector machine (SVM), with average classification accuracies of 78.34% and 87.33%, respectively [4]. Yin et al. used fNIRS to classify imagined clench force and clench speed, each divided into three levels; the average classification accuracy using the fNIRS signal alone was \(76 \pm 5\%\) [5]. In Jiao’s fNIRS-BCI study, the classification accuracy of finger tapping was 88.66% [6]. Using SVM, Neethu et al. studied real-time binary classification of left-hand versus right-hand execution and of left-hand versus right-hand imagination; the accuracies were 63% and 80%, respectively [7]. Abtahi [8] and Abibullaev [9] collected fNIRS signals during MI tasks of the upper limbs and hands, respectively, and classified them with SVM; the classification accuracy was more than 90%. Zhu [10], Peng [11], and Ghafoor [12] likewise acquired only fNIRS for MI classification, using LDA, with accuracies of 87.8%, 70.43%, and 77.14%, respectively. Wang et al. classified imagined right-hand grasping from fNIRS signals using SVM, with a classification accuracy of 80.21 ± 6.7% [13]. What these papers have in common is that only the fNIRS signal is used for MI.

In addition to studies that use fNIRS alone for MI, others have combined fNIRS with electroencephalography (EEG) signals. For example, Yin et al. used fNIRS-EEG to perform motor imagery tasks for hand speed and force, with a classification accuracy of \(89\pm 2\%\) [5]. Kaiser et al. completed a 2-class (right hand and feet) MI-based BCI task in 15 subjects; using the LDA classifier, the accuracy was \(89 \pm 6\%\) [14]. Yvonne et al. acquired EEG-fNIRS signals to classify MI and motor execution, with an accuracy of 87% [15]. Zhu et al. gathered EEG-fNIRS signals for the classification of hand MI, and the classification accuracies of SVM and LDA were 86% and 84.92%, respectively [16]. Fu et al. used the same experimental paradigm as in [5] with a mix of EEG and fNIRS signals; using SVM, the accuracy was \(74 \pm 2\%\) [17]. Although EEG is noninvasive and has a high temporal resolution, it is also known for its low spatial resolution, low signal strength, and susceptibility to strong electrical noise [18, 19].

Table 1 Some existing BCI results based on fNIRS

A summary of the recent literature is given in Table 1. It can be seen that only a few types of MI tasks (two, three, or four) have been studied. In addition, different classification methods have not been compared for MI tasks, and the physiological signals have not been studied in depth. For this reason, this study attempts to classify new MI tasks and to analyze the fNIRS signals by empirical mode decomposition (EMD). Compared with other papers, the contributions of this paper are as follows:

  1. In the designed experiment, five kinds of hand MI tasks are performed, and each task contains four levels. The purpose of the experiment is to classify the four levels of each action using only fNIRS signals obtained from the motor areas and other relevant regions of the brain. To the best of the authors’ knowledge, a motor imagery experiment on this scale has not been carried out before. The actions designed in this experiment belong to the Fugl-Meyer assessment scale and can serve as a basic BCI for controlling rehabilitation robots. This lays the foundation for the future use of BCI technology to drive an exoskeleton manipulator or a prosthetic hand for complex hand movement training.

  2. In this study, nine classification methods and nine features are used to determine which combination of classification method and features gives a satisfactory solution for the designed MI classification task. The experiments show a competitive classification accuracy.

  3. To further reduce the number of fNIRS channels, EMD is used to decompose the original fNIRS signal into several sub-modes. By calculating the maximum variance of each fNIRS signal sub-mode, the brain regions with the highest correlation with the motor imagery task are identified, and only the fNIRS signals of these regions are then needed to complete the MI tasks. The experimental results show that the classification accuracy in this optimized setting is very close to that obtained with all fNIRS channels.

This paper is structured as follows. The section “Materials and methods” describes the instrumentation, experiment paradigm, fNIRS probe position, experiment procedure, data processing, feature extraction, and EMD. The section “Results” presents experimental details regarding the fNIRS collection and the classification results of the designed experiment paradigm. The section “Discussion and conclusion” discusses the obtained results and concludes the paper.

Materials and methods

Subjects

In this experiment, six healthy volunteers [one woman and five men, aged 33.3 ± 4.7 years (mean ± SD), all right-handed] took part in the test. None had mental diseases or a history of psychological disorders, and none had prior experience with BCI tests. They were informed of the test in detail and were given a short warm-up test before the formal test.

This human subject study is approved by the ethics committee of Institute of Automation, Chinese Academy of Sciences (protocol number: IA-201942), and all human subjects have signed informed consent forms.

Instrumentation

In this study, fNIRS signals are collected with the Brite 24 (Artinis, Netherlands). The Brite 24 provides 24 channels from 18 optodes (10 transmitters and 8 receivers). The receiver–transmitter distance is 3 cm and the sampling frequency is set to 25 Hz. To make the measurement more accurate, the device is equipped with caps in three sizes (large, medium, and small), which are selected according to the size of the subject’s head.

Fig. 1 Timing diagram of a single trial. In the timing diagram: baseline interval (BI), random rest (RR), ready interval (RI), task cue interval (TCI), and task interval (TI)

Experiment paradigm

This experiment is carried out in a non-laboratory environment. Five hand MI tasks that are essential components of daily life are designed: the hand’s group flexion and extension (GFE), hooklike grasping (HG), digital opposition (DO), cylindroid grasp (CG), and spherical grasp (SG). The GFE and HG tasks are each divided into 4 levels [0%, 30%, 60%, and 100% of the maximum hand motion range (MHMR)]; the DO, CG, and SG tasks are likewise divided into 4 levels [0%, 30%, 60%, and 100% of the maximum hand grasp force (MHGF)]. Note that 0% is a relaxed state, which is shared by all five MI actions.

Figure 1 shows the timing diagram of a single trial of an MI task. The timing diagram is generated by E-Prime 2 (Psychology Software Tools, Inc., Sharpsburg, KY, USA). In Fig. 1, the bottom panel shows one entire MI trial, which consists of four parts: during the baseline interval (BI), a red circle is displayed on the screen for 20 s and the subject watches it while staying relaxed and motionless; during the ready interval (RI), a yellow circle appears on the screen to remind the subject to prepare for the MI task; during the task cue interval (TCI), task pictures are displayed on the screen to tell the subject what to imagine; during the task interval (TI), a green circle is presented on the screen and the subject performs the corresponding MI task for 20 s. In addition, while imagining the movement, the subject is asked to subvocalize the corresponding action to prevent distraction. A single trial lasts 44 s, and each level of each imagery task contains 30 trials. The top panel of Fig. 1 shows the five MI tasks: the two pictures on the left indicate the GFE and HG MI tasks, and the far-left picture displays 0%, 30%, 60%, and 100% of MHMR (one cycle of imagined hand opening and closing takes about 6–8 s); the three pictures on the right indicate the DO, CG, and SG MI tasks, and the far-right picture shows 0%, 30%, 60%, and 100% of MHGF. The subject is required to reach the target grasp force within 2 s and then hold the force constant for the following 18 s.
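As a minimal sketch of how this timing maps onto sample indices, the following Python snippet cuts a continuous single-channel recording into 44 s trials at the 25 Hz sampling rate and extracts the baseline and task segments. The function names are hypothetical, and the layout (the 20 s BI opening each trial and the 20 s TI closing it, leaving 4 s for the intervals in between) is an assumption inferred from the trial description rather than the authors' implementation.

```python
FS_HZ = 25      # sampling frequency stated in the text
TRIAL_S = 44    # duration of one trial
BI_S = 20       # baseline interval, assumed to open the trial
TI_S = 20       # task interval, assumed to close the trial


def split_trials(signal, fs=FS_HZ, trial_s=TRIAL_S):
    """Cut a continuous recording into consecutive, complete trials."""
    n = fs * trial_s  # 1100 samples per trial
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def baseline_and_task(trial, fs=FS_HZ, bi_s=BI_S, ti_s=TI_S):
    """Return the (baseline, task) segments of one trial."""
    baseline = trial[:fs * bi_s]
    task = trial[len(trial) - fs * ti_s:]
    return baseline, task


if __name__ == "__main__":
    recording = [0.0] * (3 * FS_HZ * TRIAL_S)  # placeholder for three trials
    trials = split_trials(recording)
    bi, ti = baseline_and_task(trials[0])
    print(len(trials), len(trials[0]), len(bi), len(ti))  # 3 1100 500 500
```

Under these assumptions, each trial contributes 500 baseline samples and 500 task samples per channel.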

Probe deployment of Brite 24

The brain regions associated with hand motor control mainly include the primary motor cortex, premotor cortex, and sensorimotor area. Because the subject is also asked to subvocalize the required MI task, Broca’s area is also considered. As shown in Fig. 2, the probes of the fNIRS device cover the BA6 region [premotor cortex (PMC)], the BA4 region [motor cortex (M1), sensorimotor cortex (S1)], and Broca’s area. The yellow dots are the transmitting terminals and the blue dots are the receiving terminals. The shortest distance between a yellow dot and a blue dot is 3 cm. The device has a total of 18 terminals (10 transmitters and 8 receivers), forming 24 channels.

Training

Before the formal data collection, all subjects are taught about the process of the experiment many times until they can retell it fluently.

For GFE and HG, the subjects are asked to stretch their four fingers to the maximum extent they can for about 30 s and to remember how their hand muscles feel. Then, using this feeling as a reference point, they stretch or hook their hand to 60%, 30%, and 0% of MHMR, again remembering how their hand muscles feel at each level.

For DO, CG, and SG, the MHGF is determined as the average of three maximal forces measured with the hand dynamometer: digital opposition, cylindroid grasp, and spherical grasp. The subject takes a break every 3 min. After the MHGF has been determined, the subject performs DO, CG, and SG with the hand dynamometer at target forces of 60%, 30%, and 0% of MHGF, each lasting 2 s, and then keeps the hand still for 10 s. Each level of training involves ten repetitions to build muscle memory.

Fig. 2 The optode arrangement. Left panel is 2-D view; right panel is 3-D view. The yellow points are transmitting terminals; the blue ones are receiving terminals

Experiment procedure

The experiment procedures are shown in Fig. 3. The subject sits in a comfortable armchair with their arms resting naturally on the table.

Fig. 3 The experiment process of MI. The laptop is the control terminal; the iPad or the desktop is used for screen tips

A desktop or iPad screen is placed on the table, 0.8–1.2 m from the subject. The subject is asked to start a trial and then to execute 30 trials for every task. One trial lasts 44 s, and the subject is required to avoid body movements and frequent eye blinking. The trials (different MI tasks) in this study are not presented in a randomized sequence; the order of actions is the same for all subjects. This setting can reduce the number of movement repetition errors in comparison with randomized protocols [20].

There are five actions in the experiment; each action includes four levels, and each level is performed 30 times. The subjects take a 5-min break to regain their concentration after every two levels. The subject completes two actions a day, and the entire training lasts for 2.5 days. To ensure a consistent measurement state when the optical cap is worn, the cap is put on as follows: first, the central point CZ of the subject’s head is located and used to fix the position of the CZ point of the electrode cap; second, the midline of the electrode cap is aligned with the line between the subject’s nose bridge and the CZ point; finally, the location of the electrode is set in accordance with International Society of Electroencephalography 10–20 standards. This ensures that when the subject wears the cap again, the position of the electrode is the same as when he/she first wore it.

fNIRS signal collection

The data collected by the instrument are concentration signals: oxygenated hemoglobin (Oxy-Hb), deoxygenated hemoglobin (Deoxy-Hb), and total hemoglobin (THb). The sampling frequency is 25 Hz, and the signals are band-pass filtered with cut-off frequencies of 0.01 Hz and 0.1 Hz [17].

Feature extraction

In this paper, the sliding window method, which is the most commonly used method for processing fNIRS signals, is used for feature extraction. In the classification task, the selected time-domain features are: mean value (MV), slope factor (SF), mean absolute value (MAV), integrated absolute value (IAV), passing zero numbers (PZN), and passing mean numbers (PMN). These features are defined as follows:

$$\begin{aligned} \mathrm{MV}= & {} \frac{1}{{\mathrm{L}}}\sum _{n=1}^{\mathrm{L}}{x(n)}, \end{aligned}$$
(1)
$$\begin{aligned} \mathrm{SF}= & {} {{\mathrm{Polyfit}}}[x(1)\longrightarrow x(\mathrm{L})], \end{aligned}$$
(2)
$$\begin{aligned} \mathrm{MAV}= & {} \frac{1}{{\mathrm{L}}}\sum _{n=1}^{\mathrm{L}}{|x(n)|}, \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{IAV}= & {} \sum _{n=1}^{{\mathrm{L}}}{|x(n)|}, \end{aligned}$$
(4)
$$\begin{aligned} \mathrm{PZN}= & {} {\mathrm{Num}}\left[ x(n)\,x(n+1)<0,\ n=1,\ldots ,\mathrm{L}-1\right] , \end{aligned}$$
(5)
$$\begin{aligned} \mathrm{PMN}= & {} {\mathrm{Num}}\left[ (x(n)-{\mathrm{Avg}}(x))(x(n+1)-\mathrm{Avg}(x))<0,\ n=1,\ldots ,\mathrm{L}-1\right] , \end{aligned}$$
(6)

where \(\mathrm{L}\) is the length of the time window (2 s in this test) and x(n) is the fNIRS signal. Polyfit denotes a polynomial curve fitted to the window, x(1) is the starting point of the window, \(x(\mathrm{L})\) is its end point, and SF is the slope factor of the fitted curve. Num is the number of indices n that satisfy the condition in the bracket that follows it. Avg(x) is the average of x(n) over the time window of length \(\mathrm{L}\).
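The six time-domain features can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the degree of the Polyfit curve is not given in the text and is assumed here to be 1 (a least-squares straight line, with the slope expressed per sample), and the two crossing counts compare each sample with its successor inside the window.

```python
def mean_value(x):
    """MV, Eq. (1)."""
    return sum(x) / len(x)


def slope_factor(x):
    """SF, Eq. (2): slope of a least-squares line (degree 1 is an assumption).

    Requires at least two samples; the slope is per sample, not per second.
    """
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def mean_absolute_value(x):
    """MAV, Eq. (3)."""
    return sum(abs(v) for v in x) / len(x)


def integrated_absolute_value(x):
    """IAV, Eq. (4)."""
    return sum(abs(v) for v in x)


def passing_zero_number(x):
    """PZN, Eq. (5): number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def passing_mean_number(x):
    """PMN, Eq. (6): number of crossings of the window mean."""
    m = mean_value(x)
    return sum(1 for a, b in zip(x, x[1:]) if (a - m) * (b - m) < 0)


def time_domain_features(x):
    """Collect all six features of one window in a dictionary."""
    return {
        "MV": mean_value(x),
        "SF": slope_factor(x),
        "MAV": mean_absolute_value(x),
        "IAV": integrated_absolute_value(x),
        "PZN": passing_zero_number(x),
        "PMN": passing_mean_number(x),
    }
```

With the 2 s window and 25 Hz sampling used in this study, each call would receive 50 samples of one channel.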

The frequency-domain features used in this study contain instantaneous amplitude (IA), instantaneous phase (IP), and instantaneous frequency (IF). These features can be extracted by Hilbert transform (HT) [5, 17].

The non-stationary signal x(t) is transformed into y(t) by Hilbert transform as follows:

$$\begin{aligned} y(t)=\frac{1}{\pi } P \int _{-\infty }^{\infty }{\frac{x(\tau )}{t-\tau }\mathrm{{d}}\tau }, \end{aligned}$$
(7)

where \(\mathrm P\) is the Cauchy principal value. Then, IP, IA, and IF can be, respectively, calculated by:

$$\begin{aligned} \mathrm{IP}= & {} \varphi (t)=\arctan {\frac{y(t)}{x(t)}}, \end{aligned}$$
(8)
$$\begin{aligned} \mathrm{IA}= & {} \sqrt{x(t)^2+y(t)^2}, \end{aligned}$$
(9)
$$\begin{aligned} \mathrm{IF}= & {} \frac{\mathrm{{d}}\varphi (t)}{\mathrm{{d}}t}. \end{aligned}$$
(10)

Both the time-domain and frequency-domain features can be divided into two categories: one is to reflect the value of the curve in the time window, and the other is to reflect the change degree of the curve value in the time window.
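A discrete counterpart of Eqs. (7)–(10) can be sketched with the standard library alone. The sketch builds the analytic signal x + jy by zeroing the negative-frequency half of the spectrum, which is the usual discrete approximation of the Hilbert transform; the plain O(n²) DFT is chosen only to keep the example dependency-free and is adequate for a 50-sample window. The function names are hypothetical, the phase uses the four-quadrant arctangent instead of the plain arctan in Eq. (8), and the instantaneous frequency is taken as the first difference of the unwrapped phase, expressed in Hz.

```python
import cmath
import math


def analytic_signal(x):
    """Return x(t) + j*y(t), where y approximates the Hilbert transform of x."""
    n = len(x)
    # Forward DFT of the window.
    spectrum = [
        sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for f in range(1, (n + 1) // 2):
        weights[f] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [
        sum(weights[f] * spectrum[f] * cmath.exp(2j * math.pi * f * k / n)
            for f in range(n)) / n
        for k in range(n)
    ]


def hilbert_features(x, fs):
    """Return (IA, IP, IF) for one window sampled at fs Hz."""
    z = analytic_signal(x)
    ia = [abs(v) for v in z]            # Eq. (9)
    ip = [cmath.phase(v) for v in z]    # Eq. (8), four-quadrant arctangent
    # Unwrap the phase so that its derivative has no 2*pi jumps.
    unwrapped = [ip[0]]
    for p in ip[1:]:
        step = p - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        unwrapped.append(unwrapped[-1] + step)
    # Eq. (10) as a first difference, converted from rad/sample to Hz.
    inst_freq = [(b - a) * fs / (2 * math.pi)
                 for a, b in zip(unwrapped, unwrapped[1:])]
    return ia, ip, inst_freq
```

For a pure 1 Hz cosine sampled at 25 Hz over a 2 s window, this sketch returns an amplitude of about 1 and a frequency of about 1 Hz at every sample. Real fNIRS windows are not periodic, so edge effects should be expected there.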

Classification

Besides SVM and LDA, several other classical machine learning classification methods are tested in this study: random forest (RF), quadratic discriminant analysis (QDA), k-nearest neighbor (KNN), decision tree (DT), feedforward neural networks (FFNN), naive Bayes (NB), and ensemble learning (EL). SVM is modified according to the bagging method [21], with a Gaussian radial basis function as its kernel. The number of decision trees in RF is set to 500. In LDA and QDA, the discriminant functions are linear and quadratic, respectively. KNN uses 4 neighbors, and the number of trees in DT is 100. Back propagation is applied in FFNN. Naive Bayes is based on the attribute conditional independence assumption. In EL, the method is set to AdaBoostM2, and the weak learners are DTs based on ID3.

The sliding time window is configured as follows: the window length is 2 s and the sliding step is 1 s. The ratio of the training set to the testing set is 3:2. The reported classification accuracy is the average of 10 tests.
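The windowing and evaluation protocol can be sketched as below. Several details are assumptions, since the text does not specify them: the 3:2 split is drawn at random over individual windows, the "10 tests" are read as 10 independent random splits, and `fit_predict` is a hypothetical placeholder for any of the classifiers listed above. All function names are illustrative.

```python
import random


def sliding_windows(signal, fs=25, window_s=2, step_s=1):
    """Cut a signal into overlapping windows (2 s long, 1 s apart by default)."""
    win = fs * window_s
    step = fs * step_s
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]


def split_3_to_2(samples, labels, seed):
    """Randomly split paired samples and labels into 60% training, 40% testing."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = (3 * len(idx)) // 5
    train, test = idx[:cut], idx[cut:]
    return ([samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test])


def mean_accuracy(samples, labels, fit_predict, repeats=10):
    """Average test accuracy over `repeats` random 3:2 splits.

    fit_predict(train_x, train_y, test_x) must return predicted labels.
    """
    accuracies = []
    for seed in range(repeats):
        train_x, train_y, test_x, test_y = split_3_to_2(samples, labels, seed)
        predicted = fit_predict(train_x, train_y, test_x)
        correct = sum(p == y for p, y in zip(predicted, test_y))
        accuracies.append(correct / len(test_y))
    return sum(accuracies) / len(accuracies)
```

With these settings, a 20 s task interval sampled at 25 Hz (500 samples) yields (500 − 50)/25 + 1 = 19 windows of 50 samples each. Because neighbouring windows overlap by 1 s, a window-level random split places partly shared samples in both sets; splitting by trial would avoid this, but the text does not say which variant was used.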

fNIRS channel optimization

To complete the classification tasks with fewer fNIRS channels, it is necessary to determine the relevant channels. EMD is a signal decomposition method that is used here for this purpose. It is the first step of the Hilbert–Huang transform and acts as a dyadic filter bank [22]. EMD decomposes the original signal into a set of intrinsic mode functions (IMFs) and a residual term, which represents the trend of the signal or a constant value [23]. The mathematical formula of EMD is given as follows:

$$\begin{aligned} X(t)=\sum _{i=1}^{n}{\mathrm{IMF}_{i}{(t)}+r_n{(t)}}, \end{aligned}$$
(11)

where X(t) is the original data, \(\mathrm{IMF}_{i}{(t)}\) are intrinsic mode functions, and \(r_n{(t)}\) is the residual term.

Fig. 4 fNIRS signals at channel 16: (a) Oxy-Hb, Deoxy-Hb, and THb in the 30% of MHMR task during GFE; (b) Oxy-Hb of the four levels during GFE

Results

fNIRS signals

Three types of signals are collected in the experiment: Oxy-Hb, Deoxy-Hb, and THb. In Fig. 4a, the red solid line is Oxy-Hb, the blue dashed line is Deoxy-Hb, and the green dotted line is THb. Figure 4 shows the fNIRS signal at channel 16 while the subject carries out three trials. The x-axis represents the sampling points, and one trial comprises 1100 sampling points (44 s at 25 Hz). Therefore, trial one spans points 0 to 1100, trial two spans 1101 to 2200, and trial three spans 2201 to 3300. The y-axis is the fNIRS concentration. In Fig. 4a, the subject imagines 30% of MHMR in the GFE MI task. The yellow vertical solid lines mark the start of the RI, and the black vertical dashed lines mark the start of the TI. The areas between the black solid lines and the yellow vertical solid lines are the BI, the areas between the yellow vertical solid lines and the black vertical dashed lines are the RI and TCI, and the regions between the black vertical dashed lines and the black solid lines are the TI. In Fig. 4a, after the signal curves cross the yellow line, the red solid curve rises quickly, the blue dashed curve changes only slightly, and the green dotted curve follows the red solid curve. The average amplitudes of the red solid and green dotted curves are higher in the TI than in the BI, while the amplitude of the blue dashed curve changes little. Therefore, the data of the red solid curve (Oxy-Hb) are selected for the subsequent analysis.

Figure 4b shows four lines of different colors (red, blue, green, and pink), which represent the four levels of GFE (0%, 30%, 60%, and 100% of MHMR, respectively) based on Oxy-Hb. It can be seen from Fig. 4b that the pink curve changes most dramatically, followed, in decreasing order of amplitude range, by the green, blue, and red curves. This indicates that the intensity of the signal change increases with the magnitude of the imagined hand motion range. The fNIRS figures of the other four MI types are similar to those of GFE and are therefore omitted from this paper.

Table 2 Classification accuracy (%) based on different features and different classification methods for GFE

Classification results

With different features and different classification methods, the classification accuracy of GFE MI task is provided in Table 2. The accuracies greater than 80% are all marked in bold. Table 2 illustrates that features with the satisfactory accuracy are MV and IA, and the satisfactory classification methods are SVM, QDA, KNN, and RF. The results of the other four MI tasks are similar to the results of GFE, which are omitted here.

Furthermore, MV and IA are combined to check whether the classification accuracy can be improved. The results are given in Fig. 5. It can be seen that the classification accuracy is generally improved. In this case, the best classification methods are SVM and RF, with classification accuracies of 89.12% and 88.47%, respectively.

Fig. 5 Classification accuracy under different features and different classification methods

Fig. 6 Raw signal and IMFs

Fig. 7 Comparison between the 5-channel classification accuracy and the full-channel classification accuracy

fNIRS channel optimization results

According to (11), IMFs of 24 channel signals are calculated, which are given in Fig. 6. The first curve in Fig. 6 is the original signal, and the curves from 2 to 11 are 10 IMFs decomposed from the original curve. The 5 channels with the largest IMF variance are given in Table 3. They are R7–T7, R8–T8, R3–T5, R5–T6, and R8–T10.
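The ranking step can be sketched as follows, assuming the IMFs of every channel have already been computed by an EMD implementation (not shown here). The scoring rule, which takes the largest variance among a channel's IMFs as that channel's score, is one reading of the criterion described in this paper; the function names are hypothetical.

```python
from statistics import pvariance


def channel_score(imfs):
    """Largest (population) variance among the IMFs of one channel."""
    return max(pvariance(imf) for imf in imfs)


def top_channels(imfs_by_channel, k=5):
    """Return the k channel labels with the largest IMF variance, best first.

    imfs_by_channel maps a channel label to its list of IMFs,
    each IMF being a sequence of samples.
    """
    ranked = sorted(
        imfs_by_channel,
        key=lambda channel: channel_score(imfs_by_channel[channel]),
        reverse=True,
    )
    return ranked[:k]
```

Only the signals of the returned channels would then be passed on to feature extraction and classification.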

R7–T7 corresponds to the PMC area, and R8–T8 corresponds to the M1 area. R3–T5 and R8–T10 correspond to Broca’s area, and R5–T6 corresponds to the C3 area. Signals from these 5 channels are then used for the classification tasks. As seen in Fig. 7, with the IA and MV feature combination, the classification accuracy based on these five channels is relatively high and only slightly lower than that based on all 24 channels. The results imply:

  1. For the MI tasks designed in this paper, the classification accuracy using these 5 channels is comparable to that based on 24 channels.

  2. For the MI tasks designed in this paper, the BA6 area is more sensitive than the BA4 area.

  3. In this MI paradigm, Broca’s area is activated to a certain extent, because the subject is required to subvocalize specific MI tasks.

Discussion and conclusion

In the proposed experimental paradigm, only fNIRS signals are used. Therefore, there is no need to synchronize different physiological signals, which reduces the complexity of the system. In addition, this helps to collect a more accurate BCI signal, because subjects may move their heads during a long test: the EEG signal is easily disturbed by head movement, whereas fNIRS is less affected. Although the designed MI tasks are relatively simple, they are the basis for other, more complex hand movements. In addition, these MI tasks are typical movements of the Fugl-Meyer assessment scale. Reliable recognition of these tasks may lay the foundation for a BCI-controlled hand rehabilitation robot.

The tests are conducted in a non-laboratory setting and are performed by people who are familiar with the subjects; a familiar environment makes the subjects more comfortable [24]. In addition, a simple test indicates that when subjects are in a familiar environment, fluctuations in the fNIRS signal are relatively small and are therefore less likely to interfere with the normal fNIRS signals.

Table 3 Five channels with the largest variance in five MI tasks

With an appropriate feature selection process, the combined feature (MV and IA) gives the highest classification accuracy for the proposed experimental paradigm (89.12% with SVM and 88.47% with RF). Furthermore, the original signals are decomposed into IMFs by EMD, and the 5 channels with the largest variance can also be used to complete the classification task, with a classification accuracy close to that of the complete set of channels. This suggests that these 5 channels can serve as the principal components of the 24 channels, and the functions of the brain areas under these 5 channels are consistent with the MI task. On the one hand, this can promote the understanding of the cerebral cortex. On the other hand, selecting the fNIRS signals of the cortical regions that correspond to the motor imagery task can further optimize the BCI and improve its efficiency.