Multiple graph fusion based on Riemannian geometry for motor imagery classification

In motor imagery-based brain-computer interfaces (BCIs), the spatial covariance features of electroencephalography (EEG) signals that lie on Riemannian manifolds are used to enhance the classification performance of motor imagery BCIs. However, the problem of subject-specific bandpass frequency selection frequently arises in Riemannian manifold-based methods. In this study, we propose a multiple Riemannian graph fusion (MRGF) model to optimize the subject-specific frequency band for a Riemannian manifold. After constructing multiple Riemannian graphs corresponding to multiple bandpass frequency bands, graph embedding based on bilinear mapping and graph fusion based on mutual information were applied to simultaneously extract the spatial and spectral features of the EEG signals from Riemannian graphs. Furthermore, with a support vector machine (SVM) classifier performed on learned features, we obtained an efficient algorithm, which achieves higher classification performance on various datasets, such as BCI competition IIa and in-house BCI datasets. The proposed methods can also be used in other classification problems with sample data in the form of covariance matrices.


Introduction
The brain-computer interface (BCI) provides nonmuscular communication between human brains and external devices to aid people with motor impairments. It can be used to control a wheelchair, manipulator or mouse cursor on a computer screen, after decoding the electroencephalogram (EEG) signal from the cerebral scalp [1][2][3][4]. Different types of EEG modalities have been used to design multiple BCI systems [5][6][7][8]. Particularly, motor imagery BCI systems record EEG signals by imagining movements of different parts of the body, such as, the right hand, left hand, feet and tongue [9,10]. The spatial and spectral information of motor imagery EEG signals can help identify the movement intention of the body. Therefore, the major challenge in motor imagery BCIs is the efficient extraction of the spatial and spectral features of EEG signals.
During the past decade, many methods to extract spatial and spectral features from motor imagery EEG signals have been proposed. In particular, the spatial covariance matrices of EEG signals are commonly used in feature extraction methods. The two types of feature extraction methods, based on covariance matrices, are the common spatial pattern (CSP)based methods [11,12] and Riemannian manifold-based methods [13,14]. CSP-based methods, such as CSP, filter bankCSP (FBCSP) [15], sub-band CSP (SBCSP) [16], adaptive FBCSP (AFBCSP) [17], and the common sparse spectral-spatial pattern (CSSSP) [18], tend to extract features using by a spatial filter, which can maximize the variance in one class while minimizing the variance in the others. The original CSP obtains the spatial filter under a large bandpass frequency  Hz) that contains alpha and beta waves closely related to motor imagery. CSP strongly depends on the appropriate selection of subject-specific frequency bands. However, a large bandpass frequency band cannot distinguish the contributions of alpha or beta waves. Several CSP extensions have been proposed to address this problem. The FBCSP decomposes a larger frequency band into multiple sub-bands and learns the corresponding spatial filter on multiple sub-bands. In addition, it selects features from multiple sub-bands based on mutual information [15]. The SBCSP first obtains multiple after-filtering EEG signals by applying a Gabor filter to the multiple subfrequency bands and then selecting discriminative features according to the sub-band score fusion techniques [16]. The AFBCSP designs the time-frequency map of the Fisher ratio to adaptively choose subject-specific frequency bands [17]. The CSSSP simultaneously optimizes the spatial filter and finite impulse response filter to learn the spectralspatial features from the EEG signal [18]. Most CSP-based methods calculate the center of covariance matrices using the arithmetic mean. However, covariance matrices, with the symmetric positive definite form, lie in a Riemannian manifold in nature.
Riemannian manifold-based methods, such as Riemannian CSP [13], tangent space linear discriminant analysis (TSLDA) [14], bilinear sub-manifold learning (BSML) [19] and bilinear regularized locality preserving (BRLP) [20], attempt to project EEG signals from Euclidean space into Riemannian manifolds, where the relationship of samples is expressed by the Riemannian distance. Many efficient Riemannian manifold tools, such as the Riemannian mean and tangent space, can be applied to enhance the classification performance of motor imagery. Riemannian CSP recalculated the center of covariance matrices using the Riemannian mean and obtained the spatial filter through solving the joint diagonalization of mean covariance [13]. TSLDA extracted features by mapping the covariance matrices into tangent space, where the distance structure was consistent with the Riemannian manifold and the relationship between points was linear [14]. BSML designed a bilinear mapping framework for dimensionality reduction in covariance matrices. It learned low-dimensionality features by maximally preserving the global structure of the original manifold [19]. In contrast, BRLP is a localitypreserving dimensionality reduction method that attempts to preserve the similarities between vertex pairs on the Riemannian graph into embedding [20]. Although Riemannian manifold-based methods have been proposed to obtain efficient spatial features from EEG signals, they are primarily designed to project EEG signals into one Riemannian manifold corresponding to a large bandpass frequency, without considering frequency band selection in the Riemannian manifold mapping.
To address the issue-faced when using Riemannian manifold-based method, we propose a novel multiple Riemannian graph fusion method to combine multiple Riemannian manifolds corresponding to multiple bandpass frequency bands. As covariance matrices contain the spatial information of EEG signals, the proposed method attempts to obtain more spectral information and merge the spatial and spectral feature extraction into a unique framework. This unique framework is mainly composed of three parts: Riemannian graph construction on multiple frequency bands, graph embedding, and graph fusion. Many related works on graph fusion and motor imagery classification have been recently proposed. In [21], convolutional neural networks and graph convolutional networks were used to extract image-level features and relation-aware features from the images. Deep feature fusion was developed to fuse two types features to enhance the classifier performance. In [22], an adaptive spatiotemporal graph convolutional network was proposed to fully exploit the characteristics of EEG signals in the time domain and channel correlations in the spatial domain. In [23], a clustering based on a residual graph convolutional network was proposed to infer the possibility of a connection between a given node and its neighbors and achieve high clustering performance. However, the above methods fuse graphs on Euclidean space and ignore that the covariance matrices lie on the Riemannian manifold. The contributions of this study are threefold: 1) A novel framework of multiple graph fusion based on Riemannian geometry is proposed to extract the spatial and spectral features in motor imagery BCIs simultaneously. The proposed framework can be considered an extension of Riemannian manifold-based methods. 2) Insightful research on graph processing is proposed.
Our method designs a fusion technique for the parallel processing of multiple graph embeddings. This is a significantly improved version of the traditional graphembedding method.
3) The proposed method can efficiently alleviate the overfitting problem in the processing of motor imagery EEG signals using graph embedding and graph fusion.
The remainder of this paper is organized as follows. In Section 2, we provide more details on the multiple Riemannian graph fusion methods. In Section 3, we present extensive experimental results and discuss the findings. Finally, in Section 4, the conclusions are presented.
In this section, some fundamental concepts of the space of symmetric positive-definite matrices and Riemannian geometry are briefly reviewed. In addition, the multiple Riemannian graph fusion method was proposed to learn the discriminative spectral-spatial features from motor imagery EEG signals.

Riemannian geometry
The spatial covariance matrix of the N-channel EEG signal X ∈ R N×L is represented by where L is the number of sampled points in EEG trial X. The covariance matrix P ∈ R N×N lies in a space of symmetric positive-definite matrices, defined as where S(N) = P ∈ R N×N , P = P T is the space of positive-definite matrices and P(N) The space of symmetric positive-definite matrices endowed with the Riemannian metric is a differentiable Riemannian manifold M [24]. The concepts of Riemannian distance and tangent space play an important role in the application of Riemannian manifolds. Denoted by two symmetric positive-definite matrices P 1 , P 2 ∈ SPD(N), the Riemannian distance is defined as: where || · || F is the Frobenius norm of a matrix, and β i is the i-th real eigenvalue of P 1 −1 P 2 . The Riemannian distance is the minimum length of the curve connecting two points on a Riemannian manifold [25]. It satisfies three fundamental properties of the metric space: positivity, symmetry, and triangle inequality [24]. The tangent space of a Riemannian manifold is a linear space, that can often be used to study the nonlinearity of manifolds. The tangent space T (N) at P is defined as [26] where P is a tangent point, and the upper(·) operator maintains the upper triangular part of the matrix and vectorizes it. The logarithmic mapping operator is denoted by Log P (P i ) = P 1 2 log P − 1 2 P i P − 1 2 P 1 2 . In the neighborhood of P, the Riemannian distance between P and the nearby point P i is almost identical to the Euclidean distance between the corresponding points on tangent space s, s i [14]: However, the neighborhood of P is a vague area. Generally, all samples from the dataset can be considered to be neighbors, whereas the mean of all samples is regarded as the tangent point P. The relationship between the Riemannian manifold and the tangent space is shown in Fig. 1.

Multiple Riemannian graph fusion
The framework of multiple Riemannian graph fusion algorithms is presented in Fig. 2. The overall framework includes a multiple Riemannian graph construction based on multiple frequency bands, multiple graph embedding for dimensionality reduction and graph fusion for feature selection.

Multiple Riemannian graph construction
The selection of an appropriate bandpass frequency band plays an important role in motor imagery classification. In this study, the EEG signal X was first bandpass filtered by three frequency bands-alpha band, beta band and total band, and the frequency components in the alpha and beta bands provided the best discrimination between the left and right-hand movement imagination [27]. In addition, to capture more information, the EEG signal was filtered by a large total frequency band that covered the alpha and beta bands. Three filtered signals X (1) ,X (2) , andX (3) were projected into three subsets of the Riemannian manifold (M (1) , M (2) , M (3) ). To learn the low-dimensional embedding of the Riemannian manifold, we constructed three Riemannian graphs (G l ) corresponding to three subsets on the Riemannian manifold. For each Riemannian graph G l = (V, E), the vertices V comprise all SPD matrices P i in the l-th subset, and the 2) multiple graph embedding for dimensionality reduction; 3) multiple graph fusion via mutual information edges E contain adjacency and weights u ij . The adjacency on G l was designed using k-nearest neighbors with the Riemannian distance. The weight between two adjacent points P i and P j ∈ V is given by: if P i and P j are neighbors, 0 otherwise where d ij = δ R P i , P j and σ is a scaling factor.

Multiple graph embedding
For each Riemannian graph G l , we expect to design a bilinear mapping W ∈ R M×N and W T ∈ R N×M to learn a low-dimensional embedding from a subset of Riemannian manifold. The learned low-dimensional embedding can be expressed as E p = WPW T ∈ SPD(M), where P ∈ R N×N . This embedding is also a Riemannian sub-manifold. The bilinear mapping matrices have many variations with different types of property preservation, such as distance preservation and locality preservation. In this study, we aim to learn bilinear mapping matrices by preserving the distance structure between a high-dimensional manifold and low-dimensional embedding. A reasonable bilinear mapping W, with respect to the minimum distance loss, can be obtained by solving the following objective function: where C is the experimental dataset of matrices in the SPD(N). Eq. (6) can achieve an isometric mapping between the original Riemannian manifold and the lowdimensional sub-manifold. δ R (P i , P j ) represents the Riemannian distance of points (I, j ) on the original Riemannian manifold, and δ R (WP i W T , WP j W T ) is the Rieman-nian distance of the mapped points on the low-dimensional sub-manifold. The mapping matrix, learned using by (6), can best preserve the distance structure between the manifold and its sub-manifold. The solution of (6) is a nonconvex problem that is difficult to solve. In previous works [19], we showed that the optimal mapping W of (6) is equivalent to the solution of the joint diagonalization of the mean covariance in the CSP algorithm. For the two-class classification problem, the solution to (6) is equivalent to the mapping error among the between-class and within-class points. In [19], we proved that the between-class distance can be approximated as the distance between the means of two classes set, particularly when the within-class variance is much smaller than the between-class distance. Therefore, we approximate optimization (6) as the minimum loss of distance between the mean covariance of the two classes. The solution can be obtained by joint diagonalization of the mean covariance.

Graph fusion
After learning multiple low-dimensional distance-preserving embeddings from multiple subsets, we constructed three new Riemannian graphs (G E ) corresponding to three embeddings. The vertices of G E are comprise E p , and the adjacency and weight are calculated using the Riemannian distance between two points on the embedding. Evidently, G E is close to G l . However, multiple graphs include considerable redundant information, which leads to high computational costs and low classification performance. Thus, we propose a multiple graph fusion method to fuse multiple graphs G E into a unique graph, that contains the most discriminative information from multiple embeddings.
In this study, multiple graph fusion refers to the fusion of the corresponding nodes on different graphs. As the SPD matrix form of the node on G E is difficult to merge directly, we proposed vectorization processing for node where E is the Riemannian mean of the embedding. Notably, vectorization processing is a tangent space mapping in (4). Thus, such vectorization processing can maximally preserve the structure of the G E using (7). Next, we used mutual information to fuse the corresponding nodes on different graphs [15]. As shown in (7), a node on the Riemannian graph is represented by a tangent vector. In this study, we regarded the multiple-node fusion problem as an element selection from multiple tangent vectors. Because mutual information can measure arbitrary relations between variables and does not depend on transformations acting on the different variables, we calculated the mutual information of each element and selected the top k elements as the final fused nodes. Assume V (1) , V (2) and V (3) are the node matrices of G E , and G E , corresponding to the EEG signal X. The total matrix is formed as E , and G E , and the j -th row on V is the jth element of the EEG signal. The mutual information of the j -th element can be computed as where H(·) is the entropy calculation [15] and y is the label of the EEG signal X. Finally, we fuse the corresponding node by retaining elements with a high value of mutual information and removing elements with a low value. The nodes of the fusion graph can be regarded as spatial and spectral features for motor imagery classification. The pseudocode of the proposed algorithm is presented in Algorithm 1.

Results and discussion
In this section, to evaluate the effectiveness of the proposed MRGF method, the proposed algorithm was tested on two motor imagery datasets and compared against three competing methods.

Data description
The EEG data used in this study were come from two motor imagery datasets, that is, the BCI competition IV dataset and an in-house dataset. The experimental settings of the two datasets were as follows.

Algorithms evaluated
The MRGF was compared against the following competing algorithms: 1) A shrinkage estimator-based CSP was used to extract highly discriminative spatial features, and an enhanced one versus one structure was used to classify the EEG signals [28].
2) DPLM: Low-dimensional features, learned by distance preserving to local means (DPLM), were used to improve the performance of motor imagery [29]. 3) MEMDBF: Multivariate empirical mode decompositionbased filtering (MEMDBF) was used to classify EEG signals into multiple classes [30]. 4) ESVL: Ensemble support vector learning (ESVL) was used for feature combinations to improve classification performance [31]. 5) LDA+TSSM: The LDA classifier was applied in the tangent space of the submanifold (TSSM) learned by the distance-preserving dimensionality reduction method [19]. 6) Hybrid learning of transductive and inductive models was used to handle non-stationarities in motor imagery classification [32]. 7) FBCSP: The 1 st winner method for BCI competition IV. The recorded EEG signal was band-pass filtered by multiple sub-frequency bands of 4-8 Hz and 8-12 Hz..., 36-40 Hz. Then, the CSP algorithm was used to extract the spatial features from each subband. In addition, discriminative features were selected from spatial features based on mutual information. Finally, the naive Bayes Parzen window was used for classification [15]. 8) CSP+LDA+Bayes: The 2 nd winner method for BCI competition IV. The recorded EEG signal was bandpass filtered at 8-30 Hz. Then, the CSP algorithm was used to extract spatial features, and Fisher LDA was used to select features. Finally, a Bayesian classifier was applied for classification. 9) CSP+SVM: The 3 rd winner method on BCI competition IV. The recorded EEG signal was band-pass filtered at 8-25 Hz. Standard CSP was applied to learn spatial features, and an ensemble support vector machine was used as a classifier to classify the features.

Parameters setting
The dimensions of embedding were set to 10 for the BCI competition dataset and 6 for the in-house dataset based on cross-validation. The number of selected features was set to 25 and 12. SVM is a built-in function of MATLAB, the parameters of the SVM classifier are set as linear kernels, and the penalty factor is set to 1. An analysis of the parameter settings is included in the following section.

Classification results
As the nodes on the fusion graph have capture the spatial and spectral information of the motor imagery EEG signal, we regarded the nodes on the fusion graph as the feature vectors and applied SVM to classify it. To evaluate the classification performance, we tested the MRGF-SVM on the BCI competition and in-house datasets. Table 1 shows the kappa value of the MRGF-SVM and the nine competing algorithms on the BCI competition dataset. The kappa value is commonly adopted to evaluate the classification performance of the four-class problem in dataset IIa of competition IV because the kappa value considers the misclassification of multi-class problems. As shown in Table 1, MRGF-SVM achieved a mean kappa value of 0.616, which is the highest result in Table 1.
More specifically, the MRGF-SVM was significantly higher than 2 nd (p=0.0012) and 3 rd (p=0.00041). There was no significant difference between the performance of the MRGF-SVM method and FBCSP (p = 0.072). However, the value of p is close to 0.05. Furthermore, we compared the classification performance of the MRGF-SVM with the three competing methods on an in-house dataset. Because the in-house motor imagery BCI classifies right and left imagined movements (two-class problem), for simplicity, we used classification accuracy as a performance measure for the in-house dataset. As shown in Table 2, the accuracy of the MRGF-SVM method is higher than that of FBCSP, CSP+LDA+Bayes, and CSP+SVM by 8.4 %, 9.49 % and 10.9 %, respectively. Upon examination, all p<0.05, and the results in Table 2 were statistically significant.
From the comparison of methods in Tables 1 and 2, the high performance of the proposed method might be attributable, in part, to the highly discriminative features learned by MRGF as the SVM classifier is also commonly used in other competing methods.

Discussion of graph structure
The proposed MRGF method constructs three graphs corresponding to three frequency sub-bands from a single dataset and fuses them into one unified graph. To reveal the principle of multiple graph fusion, we analyzed the changes in graph structures during the execution of the MRGF method. The structures of the graph can be expressed using the weight matrix of the graph U. The weight between the ith point and the ith point is calculated by ij 2σ 2 (9) where d ij is the distance of two points.
In addition, to provide more intuitive results (discriminative features), we calculate the distance of each point from two class-related means on a high-dimensional Riemannian graph, a low-dimensional embedding graph, a graph of tangent space, and a fusion graph. In Figs. 5 and 6, the distance from the right-hand mean is regarded as the abscissa, and the distance from the right and mean is regarded as the ordinate. Figures 5 (d) and 6 (d) have the most separability. Figures 5  (b,c) and 6 (b,c) are more separable than those in Figs. 5 (a) and 6(a). These results provide evidence for the higher discriminative graph structure observed in Figs. 3 and 4.

Discussion of parameter influence
Finally, we analyze the influence of the parameters adopted within the MRGF method, such as the frequency of subbands, the number of selected features and the dimension of embedding.  to capture more information, we used a total band of 7-35 Hz as the sub-band frequency. Thus, three sub-bands of μ and β rhythms and the total band are used in the proposed method.
(II) Analysis of selected features In graph fusion processing, we retain features with high mutual information values and remove the low-value features. The key problem that remains is how to determine the number of selected features. Figure 9 shows the mutual information entropy of the features and the ratio of the selected features to the total features. We rank the entropy value of the features from high to low. As shown in Fig. 9, a larger entropy ratio can be obtained when more features are selected. A large entropy ratio indicates that the selected features accurately represents the total features. However, if the number of selected features is too large, it will lead to high computational cost. Consequently, the number of selected features must be determined by achieving a tradeoff between the degree of representation and computational costs. As shown in Fig. 9, we can obtain 25 for the BCI competition dataset and 12 for the in-house dataset.
(III) Selection of dimension of embedding After setting the frequency of the sub-band and the number of selected features, we could determine the dimensions of embedding using a cross-validation procedure. Tables 3 and 4 show the cross-validation results of the BCI competition dataset and in-house dataset, while the dimension of the embedding changes. In Table 3, the highest mean accuracy of 70.11 % is obtained when the embedding dimension is 10. In Table 4, the highest mean Bold entries are our experimental result and they are the best performance of all the methods compared

Conclusions
To extract the spatial and spectral features from EEG signals, we construct multiple Riemannian graphs corresponding to multiple sub-frequency bands and fuse them into a unified graph. Experimental results on the BCI competition and an in-house dataset show that the proposed MRGF can capture discriminative features and lead to high classification performance. The proposed methods can also be applied to many other pattern-recognition problems with input data in the form of SPD matrices.

Conflict of Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.