1 Introduction

The recognition of brain activity has been among the most promising research fields of the last decades. This technology deals with brain-aided control of computers and other electronic devices, without the intervention of any peripheral nerve or muscle, with the purpose of allowing humans to interact with their surroundings [16]. By exploiting the electrical activity of the brain, a Brain-Computer Interface (BCI) aims at translating brain signals into electronic commands. On the one hand, it is useful for people with severe motor impairments, helping to restore communication and movement [2, 5, 31, 32, 35, 36, 37, 40, 43, 53, 60]. It is also beneficial for the detection of diseases such as schizophrenia [4], ADHD [48], epilepsy [3, 59] and Parkinson's disease [41]. On the other hand, it offers a new interface for healthy people in numerous applications, such as video game control [12, 13, 19, 25], cell phone control [55], smart living environments [26, 27], sporting behavior [33] and others [49]. Other interesting studies have addressed the use of BCI technology for emotion recognition [51, 52].

There exist various technologies for recording brain activity, among which electroencephalography (EEG) and magnetoencephalography (MEG) are noteworthy for providing high temporal resolution, a critical factor for BCI [29, 38]. MEG-based systems are both expensive and bulky, making them impractical for daily applications. The EEG, on the other hand, can be recorded using different and simpler techniques:

  • The invasive technique, which requires the implantation of electrodes inside the subject's brain. It provides high spatial resolution but is the most expensive.

  • The semi-invasive technique, which relies on the implantation of one or more electrodes inside the subject's skull but outside the brain.

  • The non-invasive technique, which uses external sensors, making it the cheapest, although it provides a low spatial resolution.

At present, the most widely used technique is the non-invasive one, due to its safety, its efficiency and the fact that it requires relatively simple and inexpensive equipment, which makes it viable for practical applications.

Motor imagery (MI) is one of the most frequently used mental strategies in BCI applications; it consists of performing a motor task simply by thinking about or imagining it. Whenever a subject imagines moving a part of his/her body, such as a hand, a leg or any other motor activity, a change of activation occurs in certain areas of the brain. In this case there is no external stimulation and the subject generates the specific activity on his/her own. For example, when a person imagines moving his/her right arm, there is a desynchronization of neural activity in the primary motor cortex of the left hemisphere [39].

MI is more attractive, and probably more practical, than other BCI and user-computer interface systems because it is independent of any real physical movement: just thinking of a given motion generates useful signals.

Since its inception, researchers have mainly used the brain-computer interface as a communication tool for people with disabilities [42]. Motivated by the aim of restoring independence and reducing social inequality for these people, BCI research has attracted wide attention in this area. Several types of applications have been studied, among them:

  • Helping disabled people: BCI can improve the quality of life of people with different types of disability, offering them a way to communicate, control and interact with their environment directly or by emulation [21]. Here, BCI is often used to help the disabled control a wheelchair by means of brain waves [36, 37, 60] and to restore movement by controlling neuroprostheses [31, 32, 53].

  • Neurorehabilitation: recently, promising papers have been published studying the benefit of applying BCI in neurorehabilitation, with the aim of increasing the effects of physiotherapy in patients with severe motor impairments [2, 5, 35, 40, 43].

  • Health: applications like assisted living [14], appliance control [22, 62] and the biomedical engineering field [1, 10, 17, 34, 44, 56, 58].

Various approaches have been applied to EEG data classification in MI-BCI. The standard approach focuses on feature engineering and uses the resulting features as input to a standard classifier. Examples of this approach include [28], where an accuracy of 84.06% was achieved by using wavelet packet decomposition as a feature extraction method, followed by a dynamic frequency feature selection algorithm to select the most accurate features for each subject. In [24], the authors first selected the time period most related to the motion information, then used wavelet packet decomposition and SE-isomap; this method attained an accuracy of 84.68%. In [18], an accuracy of 74.03% was achieved by applying the Filter Bank Common Spatial Pattern as the feature extraction method. Finally, [20] reports an accuracy of 100% on a single-session dataset, achieved through a simple and flexible motor imagery classification with an extreme learning machine classifier.

A newer approach takes advantage of the capabilities of deep learning algorithms, where the most widely used architectures are the Long Short-Term Memory (LSTM) network and the Convolutional Neural Network (CNN). These methods can process EEG data without the need for feature extraction, or can combine the raw EEG signals with features resulting from feature extraction procedures. In [47], an accuracy of 86.41% was reached by applying a neural network classifier with two CNN layers to the filtered EEG signal, while the authors of [46] combined a one-versus-the-rest common spatial pattern feature extraction with a CNN classifier to achieve an accuracy of 91.1% on a 4-class MI task. In [54], the authors combined feature extraction with an LSTM, obtaining an accuracy of around 88%. The model developed by Tortora and colleagues in [50] consists of an EEG signal preprocessing stage followed by an LSTM deep neural network classifier, which led to a hit rate of about 70%. Although this is a powerful approach, its great drawbacks are its high computational cost in training and application and the need for a large number of training samples. An interesting proposal that increases the number of samples available for training is transfer learning, with which Zhang et al. [61] obtained an accuracy of 86.89%.

In recent years, numerous works based on Riemannian geometry, such as [7] and [11], have been developed with the purpose of classifying EEG signals using their covariance matrices as features. These techniques have shown great performance since they manipulate the covariance matrices in their native space.

The statistics presented above prove the effectiveness of the models developed by their authors. However, since the execution environments and the datasets used differ, they cannot be used for a correct comparative analysis. For a meaningful comparison, it would be necessary to reproduce each of the algorithms under the same conditions and on the same databases as the model proposed here. Therefore, for a correct comparison, the “Results and discussion” section reports the results obtained with the TSC algorithm, which represents our reference method, after repeating the experiment under the same conditions.

This work proposes the multiple tangent space projection classifier, a classifier that exploits EEG covariance matrices using Riemannian geometry. This method is built upon the tangent space mapping proposed by A. Barachant in [7], which maps the covariance matrices from their native manifold to a Euclidean space, where sophisticated algorithms can be applied for better classification results.

The Riemannian geometry based method is presented in more detail in the next section and represents the starting point of the method proposed in this work.

This paper is organized as follows: first, we introduce the concept of tangent space and how it can be used to classify EEG signals in Section 2. The proposed approach and the results are presented in Sections 3 and 4, respectively. Finally, Section 5 summarizes the main conclusions of this work and presents some suggestions for further research lines.

2 Classification in tangent space

The idea behind tangent space classification is to avoid the complexity of working directly with Riemannian geometry, and of having to develop classifiers based on it, by instead projecting the data onto a Euclidean space where Euclidean geometry and standard classifiers can be used [6]. This classification method is implemented in four steps:

  1. Calculate the covariance matrices of the trial EEG signals.

  2. Define the tangent space, i.e. choose the reference point on the manifold at which the tangent space is taken.

  3. Project the input data onto the tangent space.

  4. Apply a standard classifier.
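The four steps above can be sketched end to end. This is a minimal toy illustration in Python/NumPy with scikit-learn assumed available; for brevity it uses the arithmetic mean of the covariance matrices as the reference point, whereas the methods discussed below use the Riemannian mean. All function names and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def _eig_fun(P, f):
    # Apply a scalar function f to the eigenvalues of a symmetric matrix
    w, U = np.linalg.eigh(P)
    return U @ np.diag(f(w)) @ U.T

def tangent_features(C, Cref):
    # Steps 2-3: project C onto the tangent space at Cref and vectorise
    Ch = _eig_fun(Cref, np.sqrt)
    Cmh = _eig_fun(Cref, lambda w: 1.0 / np.sqrt(w))
    S = Ch @ _eig_fun(Cmh @ C @ Cmh, np.log) @ Ch
    return S[np.triu_indices(S.shape[0])]   # upper-triangular part

# Toy data: N trials of e channels x t samples, with random labels
rng = np.random.default_rng(0)
e, t, N = 3, 100, 40
X = rng.standard_normal((N, e, t))
y = rng.integers(0, 2, N)

# Step 1: trial sample covariance matrices
covs = np.array([x @ x.T / (t - 1) for x in X])

# Step 2: reference point (arithmetic mean stands in for the Riemannian mean)
Cref = covs.mean(axis=0)

# Step 3: tangent-space features
Z = np.array([tangent_features(C, Cref) for C in covs])

# Step 4: a standard classifier
clf = LogisticRegression(max_iter=1000).fit(Z, y)
```

Each step is detailed in the subsections that follow.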

2.1 Covariance matrices

From the BCI perspective, the spatial covariance matrices carry valuable information. On the diagonal of such a matrix we find the variance of the signal measured by each sensor, which is used to calculate the classical PSD (Power Spectral Density) features [57]. In addition, the off-diagonal entries represent the covariance between the signals measured by two different sensors, which provides a measure of the relation between the two signals. Compared to PSD features (based on variance alone), the representation of EEG epochs as spatial sample covariance matrices is therefore richer in information.

BCI is based on the assumption that a short-time segment of the EEG signal is associated with a specific thought or action (raise the right hand, the left leg, ...). Thus, in BCI, EEG signals are analyzed over short-time segments referred to as trials. For every trial i = 1...N, where N is the number of trials, \(X^{(i)} \in \mathbb {R}^{e \times t}\) denotes the trial EEG signal with e channels and t time samples, and y(i) is the associated class, indicating the thought or action. Due to the limited number of available samples (t samples per trial), we cannot compute the true covariance matrix directly. Therefore, we estimate it with the Sample Covariance Matrix (SCM); assuming that the mean of the EEG signal is zero (E[X(i)] = 0), we have:

$$ \mathbf{C}^{(i)} = \frac{1}{(t-1)} \mathbf{X}^{(i)} \mathbf{X}^{(i)T} $$
(1)
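As a quick sanity check on Eq. (1), the SCM of a zero-mean trial is symmetric and, when t is large enough relative to e, positive definite. A minimal NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
e, t = 3, 200                      # e channels, t time samples
X = rng.standard_normal((e, t))    # one zero-mean EEG trial (toy data)

# Sample covariance matrix, Eq. (1): C = X X^T / (t - 1)
C = X @ X.T / (t - 1)

# C is symmetric and (with t >= e independent samples) positive definite
assert np.allclose(C, C.T)
assert np.all(np.linalg.eigvalsh(C) > 0)
```

Positive definiteness is what places the SCMs on the SPD manifold discussed next.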

The covariance matrices lie on the Riemannian manifold of symmetric positive definite (SPD) matrices. A Riemannian manifold is a differentiable manifold in which the tangent space at each point is a finite-dimensional Euclidean space; see [30] for more details. This property provides the possibility of projecting the covariance matrices onto a tangent space and using standard machine learning algorithms [8, 15].

Let’s introduce some notations and basic mathematics. Denote by \(S(n) = \left \{ \mathbf {S} \in \mathbb {R}^{n \times n}, \mathbf {S}^{T}=\mathbf {S} \right \}\) the space of all n × n symmetric matrices in the space of square real matrices \(\mathbb {R}^{n \times n}\) and \(P(n) = \left \{ \mathbf {P} \in S(n), \mathbf {v}^{T}\mathbf {P}\mathbf {v} > 0, \forall \mathbf {v} \in \mathbb {R}^{n} \right \}\) the set of all n × n symmetric positive-definite (SPD) matrices.

For P ∈ P(n), the matrix exponential and logarithm are obtained from the eigenvalue decomposition of P:

$$ \mathbf{P} = \mathbf{U} diag(\sigma_{1}, ... , \sigma_{n}) \mathbf{U}^{T} $$
(2)

where σ1 ≥ σ2 ≥ ... ≥ σn > 0 are the eigenvalues and U the matrix of eigenvectors of P. It reads:

$$ \exp (\mathbf{P}) = \mathbf{U} diag\left (\exp (\sigma_{1}), \ldots , \exp (\sigma_{n}) \right ) \mathbf{U}^{T} $$
(3)

and:

$$ \log (\mathbf{P}) = \mathbf{U} diag\left (\log (\sigma_{1}), \ldots , \log (\sigma_{n}) \right ) \mathbf{U}^{T} $$
(4)
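Equations (3) and (4) translate directly into code. A minimal NumPy sketch (function names are illustrative) that also verifies that the two operations are mutual inverses on SPD matrices:

```python
import numpy as np

def spd_log(P):
    # Eq. (4): log(P) = U diag(log(sigma_i)) U^T, for P SPD
    w, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(w)) @ U.T

def sym_exp(S):
    # Eq. (3): exp(S) = U diag(exp(sigma_i)) U^T, for S symmetric
    w, U = np.linalg.eigh(S)
    return U @ np.diag(np.exp(w)) @ U.T

# Toy SPD matrix: A A^T plus a small ridge to ensure positive definiteness
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
P = A @ A.T + 0.1 * np.eye(4)

# exp and log are mutual inverses on SPD matrices
assert np.allclose(sym_exp(spd_log(P)), P)
# log(P) is symmetric, consistent with the properties listed below
assert np.allclose(spd_log(P), spd_log(P).T)
```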

We also have the following properties :

  • \( \forall \mathbf {P} \in P(n), \det (\mathbf {P}) > 0\)

  • \( \forall \mathbf {P} \in P(n), \mathbf {P}^{-1} \in P(n)\)

  • \( \forall (\mathbf {P}_{1},\mathbf {P}_{2}) \in P(n)^{2}, \mathbf {P}_{1}\mathbf {P}_{2} \in P(n)\)

  • \( \forall P \in P(n), \log (\mathbf {P}) \in S(n)\)

  • \( \forall S \in S(n), \exp (\mathbf {S}) \in P(n)\)

Finally, the notation \(\mathbf {P}^{1/2}\) denotes the symmetric matrix A that fulfils the relation AA = P.

2.2 Tangent space projection

Let us consider P(n), the space of n × n SPD matrices. To every point P ∈ P(n) of this space corresponds a tangent space T(n), lying in S(n), composed of all tangent vectors at P. The tangent space is a flat space, which makes the use of traditional Euclidean tools possible [8].

In order to operate in tangent spaces, a mapping from P(n) to T(n) is required. This mapping is achieved by a logarithmic projection \(\log \_p (\mathbf {P}): P(n)\rightarrow T(n)\) defined as:

$$ \mathbf{S_{i}} = \log\_p \left (\mathbf{P_{i}} \right ) = \mathbf{P}^{1/2}\log\left (\mathbf{P}^{-1/2} \mathbf{P_{i}} \mathbf{P}^{-1/2} \right ) \mathbf{P}^{1/2} $$
(5)

and the inverse mapping is given by the exponential projection:

$$ \mathbf{P_{i}} = \exp\_p \left (\mathbf{S_{i}} \right ) = \mathbf{P}^{1/2}\exp\left (\mathbf{P}^{-1/2} \mathbf{S_{i}} \mathbf{P}^{-1/2} \right ) \mathbf{P}^{1/2} $$
(6)
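The maps in (5) and (6) can be sketched as follows (a minimal NumPy implementation on toy 3 × 3 SPD matrices; helper names are illustrative). The round-trip test verifies that (6) is indeed the inverse of (5):

```python
import numpy as np

def _powm(P, a):
    # P^a via eigendecomposition (P SPD)
    w, U = np.linalg.eigh(P)
    return U @ np.diag(w ** a) @ U.T

def _logm(P):
    w, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(w)) @ U.T

def _expm(S):
    w, U = np.linalg.eigh(S)
    return U @ np.diag(np.exp(w)) @ U.T

def log_map(Pi, P):
    # Eq. (5): project Pi onto the tangent space at reference point P
    Ph, Pmh = _powm(P, 0.5), _powm(P, -0.5)
    return Ph @ _logm(Pmh @ Pi @ Pmh) @ Ph

def exp_map(Si, P):
    # Eq. (6): inverse mapping, back to the manifold
    Ph, Pmh = _powm(P, 0.5), _powm(P, -0.5)
    return Ph @ _expm(Pmh @ Si @ Pmh) @ Ph

rng = np.random.default_rng(2)
A, B = rng.standard_normal((2, 3, 3))
P = A @ A.T + 0.1 * np.eye(3)    # reference point
Pi = B @ B.T + 0.1 * np.eye(3)   # point to project

Si = log_map(Pi, P)
assert np.allclose(exp_map(Si, P), Pi)   # round trip recovers Pi
assert np.allclose(Si, Si.T)             # tangent vectors lie in S(n)
```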

2.3 Tangent space classifier

One of the first and most promising works in this field is the Tangent Space Classifier (TSC) presented by Barachant et al. [7]. TSC is a binary classifier that considers the tangent space at the point corresponding to the Riemannian mean of all training covariance matrices. Let us consider a set of labeled training data {C(n),t(n)} for n = 1...N, where N is the number of training samples, C(n) the covariance matrix of the n-th EEG trial and t(n) its label. The first step is to compute the reference point Cref as the Riemannian mean of all training covariance matrices C(n) and to calculate the projection Z(n) of C(n) using (5):

$$ \mathbf{Z}^{(n)} = \mathbf{C}_{ref}^{1/2}\log\left (\mathbf{C}_{ref}^{-1/2} \mathbf{C}^{(n)} \mathbf{C}_{ref}^{-1/2} \right ) \mathbf{C}_{ref}^{1/2} $$
(7)
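The Riemannian mean used for Cref has no closed form for more than two matrices; it is commonly computed by a fixed-point iteration that repeatedly moves the current estimate along the average tangent vector until that vector vanishes. The sketch below shows this standard iteration in NumPy (it is an assumption that this matches the exact algorithm of [7]; helper names are illustrative):

```python
import numpy as np

def _powm(P, a):
    w, U = np.linalg.eigh(P)
    return U @ np.diag(w ** a) @ U.T

def _logm(P):
    w, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(w)) @ U.T

def _expm(S):
    w, U = np.linalg.eigh(S)
    return U @ np.diag(np.exp(w)) @ U.T

def riemannian_mean(covs, n_iter=50, tol=1e-9):
    # Fixed-point iteration: at the Riemannian mean, the average of the
    # tangent vectors to all points is zero.
    C = np.mean(covs, axis=0)            # arithmetic mean as starting point
    for _ in range(n_iter):
        Ch, Cmh = _powm(C, 0.5), _powm(C, -0.5)
        T = np.mean([_logm(Cmh @ Ci @ Cmh) for Ci in covs], axis=0)
        C = Ch @ _expm(T) @ Ch
        if np.linalg.norm(T) < tol:
            break
    return C

# Toy SPD matrices standing in for trial covariances
rng = np.random.default_rng(3)
covs = []
for _ in range(10):
    A = rng.standard_normal((3, 3))
    covs.append(A @ A.T + 0.1 * np.eye(3))

Cref = riemannian_mean(covs)
# Check the fixed-point condition: the average tangent vector at Cref is ~0
Ch, Cmh = _powm(Cref, 0.5), _powm(Cref, -0.5)
T = np.mean([_logm(Cmh @ Ci @ Cmh) for Ci in covs], axis=0)
assert np.linalg.norm(T) < 1e-6
```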

Given that the Z(n) are symmetric, and duplicate input features are not suitable for machine learning algorithms, only the upper triangular part of the matrix is considered; it is reordered to form a vector of length d = e(e + 1)/2 (with e the number of EEG channels). Equation (7) is rewritten as:

$$ \mathbf{z}^{(n)} = upper \left (\mathbf{C}_{ref}^{1/2}\log\left (\mathbf{C}_{ref}^{-1/2} \mathbf{C}^{(n)} \mathbf{C}_{ref}^{-1/2} \right ) \mathbf{C}_{ref}^{1/2} \right ) $$
(8)

Finally, the set {z(n),t(n)} for n = 1...N is used to train a standard binary classifier. The classifier adopted in that work is Logistic Regression (LR), due to its simplicity and good results [45].
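The upper(·) vectorisation in Eq. (8) can be sketched as follows (a minimal NumPy illustration; the function name upper is taken from the equation, and note that some implementations of [7] additionally weight the off-diagonal entries by √2 to preserve the norm, which is omitted here):

```python
import numpy as np

def upper(S):
    # Vectorise the upper-triangular part of a symmetric matrix,
    # giving d = e(e+1)/2 features (duplicate entries removed)
    return S[np.triu_indices(S.shape[0])]

e = 22                       # e.g. a 22-channel montage, as in DS2
S = np.arange(e * e, dtype=float).reshape(e, e)
S = (S + S.T) / 2            # make the toy matrix symmetric
z = upper(S)
assert z.shape == (e * (e + 1) // 2,)   # d = 22*23/2 = 253
```

The resulting vectors z(n) can then be fed to any standard binary classifier, such as LR.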

3 Proposed method

Taking as its reference point the TSC presented in the previous section, this paper introduces a new classifier based on tangent space projection for EEG signals. The main idea of the present work starts from the question: why use a single projection?

In machine learning, the features have a major impact on the final performance of the model. In classification problems, the model has to extract from the features a way to discriminate between the different classes; therefore, by constructing more discriminative features we can improve the quality of the resulting model.

As mentioned before, in the original TSC method the authors project the data from the original manifold onto a Euclidean space tangent to the manifold at the point representing the mean of the data. This projection can be considered as a general view.

Based on that, the main novelty of this work is the use of multiple projections, one per class, so that every projection can provide the classifier with a different view of the data that depends on the class. For simplicity, only the two-class case is considered in the rest of this paper; multi-class problems will be discussed in future work.

The multiple tangent space projection classifier (M-TSC) consists of the projection of the input data onto two different tangent spaces whose reference points Cref0 and Cref1 are, respectively, the Riemannian means of the training samples of classes 0 and 1. Accordingly, using (8), two projection vectors \(\mathbf {z}_{0}^{(n)}\) and \(\mathbf {z}_{1}^{(n)}\) of length d are obtained. These are concatenated to form the feature vector z(n) of length 2d. The set {z(n),t(n)} for n = 1...N is then used to train a standard binary classifier. The training procedure that estimates the classifier parameters W is summarized in Algorithm 1, while Algorithm 2 presents the prediction procedure.

Algorithm 1 Training the M-TSC model.

Algorithm 2 Class prediction with M-TSC.
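The M-TSC procedure described above can be sketched end to end as follows. This is a minimal toy illustration in Python/NumPy with scikit-learn assumed available; for brevity, the per-class arithmetic means stand in for the Riemannian means Cref0 and Cref1, and the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def _powm(P, a):
    w, U = np.linalg.eigh(P)
    return U @ np.diag(w ** a) @ U.T

def _logm(P):
    w, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(w)) @ U.T

def project_upper(C, Cref):
    # Eq. (8): tangent-space projection at Cref, upper-triangular vectorised
    Ch, Cmh = _powm(Cref, 0.5), _powm(Cref, -0.5)
    S = Ch @ _logm(Cmh @ C @ Cmh) @ Ch
    return S[np.triu_indices(S.shape[0])]

# Toy data: N trials of e x t EEG, two classes
rng = np.random.default_rng(4)
e, t, N = 3, 100, 60
X = rng.standard_normal((N, e, t))
y = rng.integers(0, 2, N)
covs = np.array([x @ x.T / (t - 1) for x in X])

# One reference point per class (arithmetic mean stands in for the
# Riemannian mean here)
Cref0 = covs[y == 0].mean(axis=0)
Cref1 = covs[y == 1].mean(axis=0)

# Concatenate the two projections: feature vector of length 2d
Z = np.array([np.concatenate([project_upper(C, Cref0),
                              project_upper(C, Cref1)]) for C in covs])
d = e * (e + 1) // 2
assert Z.shape == (N, 2 * d)

# Train the standard binary classifier on {z, t}
clf = LogisticRegression(max_iter=1000).fit(Z, y)
```

At prediction time, a new trial's covariance matrix is projected onto both tangent spaces with the same Cref0 and Cref1 estimated during training, and the concatenated vector is passed to the trained classifier.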

The method under study creates a feature vector with twice the dimension of the reference TSC one. The major advantage of this increase in dimensionality is that it improves the classifier's ability to distinguish between classes.

Due to the technical difficulty of building BCI databases, the number of samples N in these datasets tends to be small compared to the number of features D (D ≫ N), which usually makes BCI problems fall into the curse of dimensionality. In this setting, high variance and overfitting are major concerns that can critically limit the algorithm and restrict the choice of an appropriate classifier. In this work, two classifiers are considered:

  • Logistic regression with L1 or L2 regularization, which acts as a sort of feature selection and reduces model variance and overfitting.

  • Support Vector Machines (SVM), a model that, by construction, is less sensitive to high dimensionality.

4 Experiment

In order to demonstrate the consistency and effectiveness of the multiple tangent space projection approach (M-TSC), this section presents a series of experiments comparing its performance, in terms of classification accuracy, to the single tangent space projection method. The comparison with TSC is mainly motivated by the similarity between the two approaches and by the fact that TSC was used as a starting point for M-TSC. Two versions of the proposed method are considered:

  • M-TSC1: the results of the two projections are concatenated and directly used as input for a linear classifier (logistic regression or SVM).

  • M-TSC2: the results of the two projections are concatenated and normalized to zero mean and unit standard deviation before being used as input for a linear classifier (logistic regression or SVM).
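The normalization step that distinguishes M-TSC2 can be sketched as follows (a minimal NumPy illustration on toy features; at test time the same training-set statistics mu and sigma would be applied):

```python
import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((60, 12)) * 5.0 + 2.0   # toy concatenated features

# M-TSC2: standardise each feature to zero mean and unit standard
# deviation, using statistics estimated on the training set
mu, sigma = Z.mean(axis=0), Z.std(axis=0)
Zn = (Z - mu) / sigma

assert np.allclose(Zn.mean(axis=0), 0.0, atol=1e-12)
assert np.allclose(Zn.std(axis=0), 1.0)
```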

4.1 Dataset

With the purpose of evaluating the proposed model's performance, two public datasets have been used: Dataset B and Dataset A from the 2008 BCI Competition. These datasets are referred to in the rest of the paper as DS1 and DS2, respectively.

  • DS1: this dataset consists of labeled EEG data from 9 subjects, collected through three channels: C3, C4 and Cz. The data capture procedure consists of the mental reproduction of two gestures, one of the left hand (class 1) and the other of the right hand (class 2). Each subject completed five sessions of 120 trials [23].

  • DS2: this dataset contains labeled EEG data collected with a 22-electrode helmet from 9 subjects. Every subject participated in two data capture sessions in which they performed the mental reproduction of left hand (class 1), right hand (class 2), both feet (class 3) and tongue (class 4) movements [9].

    Every session is composed of 288 trials. The main experiment of the present paper focuses on hand motor imagery; therefore, only the data for the two classes left hand and right hand are considered in this section, which reduces the number of trials per session to 144.

The fundamental objective of the proposed model is to successfully process the data in raw format; hence, no pre-processing method has been applied.

In order to evaluate the model, the datasets are split into two parts: one to train and design the model (train) and the other to evaluate the performance of the model on data not seen during training (test). For DS1, the first 3 sessions were used for training and the last 2 for testing, while for DS2 the first session was used for training and the second for testing. Table 1 shows a summary of both datasets used in this experiment.

Table 1 Summary of datasets

4.2 Results and discussion

The results of the experiments described above are presented in four tables, one for each database/classifier combination: Table 2 for DS1 with LR, Table 3 for DS1 with SVM, Table 4 for DS2 with LR and Table 5 for DS2 with SVM. The best results are shown in boldface. The M-TSC methods are indicated by LR M-TSC1 and LR M-TSC2 when using the LR classifier, and SVM M-TSC1 and SVM M-TSC2 when using SVM. All these results show an evident advantage of the M-TSC methods over TSC on both datasets for all subjects except one.

Table 2 Accuracy (%) for TSC, LR M-TSC1 and LR M-TSC2 experiment with DS1
Table 3 Accuracy (%) for TSC, SVM M-TSC1 and SVM M-TSC2 experiment with DS1
Table 4 Accuracy (%) for TSC, LR M-TSC1 and LR M-TSC2 experiment with DS2
Table 5 Accuracy (%) for TSC, SVM M-TSC1 and SVM M-TSC2 experiment with DS2

Table 2 shows a clear advantage, especially in the case of subject 5, for which M-TSC1 achieves an accuracy of 81.66%, 8.95% better than the TSC result. Analyzing the table in more detail, M-TSC1 is the best for 4 subjects (4, 5, 7, 9) and M-TSC2 is the best for the remaining 5 subjects. On average, M-TSC2 provides the best results, with an improvement of 2.19% with respect to TSC, while M-TSC1 provides an improvement of 2.06%.

In Table 3, where SVM is used as the classifier, M-TSC shows a remarkable advantage, especially for subjects 2 and 5. In the case of subject 2, there is an improvement of 5% with the M-TSC2 method, reaching a hit rate of 58.86%, whereas for subject 5 the M-TSC1 method attains a hit rate of 81.87%, corresponding to an improvement of 9.16%. A deeper analysis of the table shows that M-TSC2 provides better results for 6 subjects (1, 2, 4, 6, 7, 8), achieving an average improvement of 2.31%. On the other hand, the M-TSC1 method obtains the highest accuracy for the remaining 3 subjects, with a negligible deterioration of 0.63% for subject 7.

With regard to Table 4, it shows a clear advantage for the M-TSC methods in 8 of the 9 subjects. M-TSC2 provides an important improvement for subject 4, of 5.56%. Looking deeper into the table, M-TSC1 wins for subjects 1 and 2, with improvements of 2.09% and 4.16% compared to TSC, respectively, and M-TSC2 is the best for subjects 3, 4, 5, 7 and 9, improving the accuracy by 4.86%, 5.56%, 2.08%, 4.86% and 4.17%, respectively, over its TSC counterparts. For the remaining subject (8), the three models tie with an accuracy of 97.22%. It is important to mention that TSC and M-TSC2 tie for subject 2, and M-TSC1 and M-TSC2 tie for subject 6, which shows that even when there is no improvement, they at least do not present worse results than the original method. On average, M-TSC2 is the best model, with an improvement of 3.01% with respect to TSC, while M-TSC1 provides an improvement of 2.24%.

Finally, Table 5 summarizes the results obtained using the SVM classifier with the DS2 database. It can be noticed that in this case M-TSC1 outperforms M-TSC2 for 5 subjects (1, 2, 6, 7, 9). For subject 6, both methods obtain the same result, with a 4.7% improvement compared to TSC. In the case of subject 8, all methods tie at the same accuracy of 97.22%. It is interesting to mention that the most significant improvement is obtained for subject 4.

These results clearly show that both M-TSC methods provide higher performance than TSC, at the cost of a slight increase in computational effort due to the higher number of features provided to the classifier, which demonstrates the value of the multiple tangent space projection approach for improving classification accuracy on EEG BCI problems.

M-TSC offers several benefits, including the capacity to create class-dependent and more discriminative features, which leads to a robust and precise pattern recognition and prediction model. The method can learn directly from raw data without requiring prior feature selection, which makes it even more versatile. Furthermore, its simplicity and low computational cost in training and operation make it cost-effective for practical use.

Despite its advantages, M-TSC also has some drawbacks. The primary limitation of this method is that it is subject-dependent. This means that the models are tailored to a specific individual, making them less transferable to new subjects. Therefore, it is necessary to collect data and train the model for each subject. Another limitation is the need for regularization methods to prevent overfitting, due to the large number of features and small number of samples.

5 Conclusions and further work

Riemannian geometry is a new EEG classification framework that has opened new research lines in the field of brain-computer interfaces. This paper introduced a new EEG classifier for BCI applications based on the idea of projecting the data from the Riemannian manifold of trial EEG covariance matrices onto multiple tangent spaces. Every projection provides the classifier with a different view of the data on the manifold, allowing it to create a more discriminative model. This paper provides experimental evidence supporting the advantage of the multiple tangent space projection classifier over the single projection one for two-class motor imagery problems. The cost to pay is a moderate computational overhead compared to the reference model.

There are several directions in which this work can be extended. In particular, work is underway to study the usefulness of multiple tangent space projections in multi-subject models and multi-class problems. The authors are also considering the adaptation of this idea to online adaptive training settings, to make it more suitable for real-world applications.