Principal component based covariate shift adaption to reduce non-stationarity in a MEG-based brain-computer interface
One of the biggest problems in today’s BCI research is the non-stationarity of the recorded signals. This non-stationarity can cause the BCI performance to deteriorate over time or drop significantly when transferring data from one session to another. To reduce the effect of non-stationaries, we propose a new method for covariate shift adaption that is based on Principal Component Analysis to extract non-stationaries and alleviate them. We show the proposed method to significantly increase BCI performance for an MEG-based BCI in an offline analysis as well as an online experiment with 10 subjects. We also show the method to be superior to other covariate shift adaption methods and present examples of identified non-stationaries to show the effect of the proposed method.
Keywords: BCI, non-stationarity, covariate shift adaption, PCA
A Brain-Computer Interface (BCI) enables a user to communicate with or control a computer by means of pure brain activity, without the need for muscle control. Its primary field of application is to help people who have lost voluntary muscle control due to diseases or traumatic injuries. While BCIs can be used for rehabilitation after stroke, they are most prominently used as a communication device for patients suffering from locked-in syndrome. The locked-in syndrome can be caused by different neurodegenerative diseases (like amyotrophic lateral sclerosis), brainstem stroke or traumatic brain injuries. The locked-in state describes a condition in which the patient is aware and awake but paralysed and therefore unable to move or to communicate verbally or by any other means of muscle activity. A BCI can enable such patients to communicate or to control a computer and interact with their environment.
The basic principle of a BCI relies on the user being able to voluntarily alter their brain activity. These changes in the recorded brain activity can be detected and used as an input signal for a computer. There are different signal acquisition techniques that can measure the brain activity of a user.
While electro-encephalography (EEG) is the most popular non-invasive method, we concentrate on data recorded by magneto-encephalography (MEG) in this paper. MEG is typically associated with higher costs, which may be why it is seldom used, but it also provides a higher spatial resolution (due to the larger number of sensors) and more information in the higher frequency range above 40 Hz. While it has been shown to work well for BCI , it still suffers from the same problem as EEG-based BCIs, namely the non-stationarity of the recorded signals.
Reasons for non-stationarity include changes in the mental state over time (increasing fatigue or losing concentration), the transfer from training without feedback to online usage with feedback, or head movements in the MEG, which cause the generating brain areas to lie under different sensors. These non-stationaries are especially a problem when a classifier trained on data of a previous session is used for classification in the current session, which is often referred to as the session-transfer problem. From the machine learning point of view this phenomenon is termed covariate shift and describes the fact that the training data follows a different distribution than the test data .
The problem of non-stationarity has been approached in different ways: by adaptive classification [4, 5], adaptive spatial filters [6, 7] or covariate shift adaption [8, 9, 10, 11]. In this paper we propose a method that first uses Principal Component Analysis to extract non-stationaries and then minimizes their effect by covariate shift adaption of the principal components. We present results from offline data and demonstrate the benefit of this method in an online experiment with 10 subjects.
Covariate shift adaption by normalized principal components
In this section we propose a method for covariate shift adaption which uses Principal Component Analysis (PCA)  to extract the most important principal components and normalizes these components by shifting a window over the data to reduce the effect of non-stationarity. The normalization is similar to the method described in [8, 9], but normalizes each feature individually instead of normalizing a linear combination of all features.
PCA is a method that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of uncorrelated variables. These uncorrelated variables are called principal components and are sorted by the amount of variance that the principal components account for in the original data. The first principal component accounts for the highest proportion of variance in the original data.
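The transformation described above can be sketched in a few lines of numpy (the function names are ours, for illustration only): the centered data is decomposed by SVD, and the rows of Vt are the principal axes, already sorted by the variance they explain.

```python
import numpy as np

# Minimal PCA sketch (illustrative names, not from the paper):
# rows of X are trials, columns are features.
def pca_fit(X, m):
    """Return the mean and the first m principal axes of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; singular values are sorted in
    # descending order, so Vt's rows come ordered by variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:m]

def pca_transform(X, mean, axes):
    """Project trials onto the principal axes."""
    return (X - mean) @ axes.T

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
mean, axes = pca_fit(X, m=2)
scores = pca_transform(X, mean, axes)
# The resulting components are uncorrelated and sorted by variance.
cov = np.cov(scores, rowvar=False)
```

The covariance matrix of the projected scores is diagonal, reflecting the uncorrelatedness of the principal components.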
For all trials with i ≤ w, all previously available trials are used as the window. We also experimented with different types of windows, e.g. a half-Hanning window, but found the rectangular window to give the best results.
Pt is then added to the end of B and the first trial in B is removed, so that B always holds the latest w trials. The normalized trial can then be used for classification.
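The sliding-window normalization can be sketched as follows (a minimal numpy illustration with our own variable names; it assumes the buffer B has been pre-filled with trials, e.g. from the training data):

```python
import numpy as np
from collections import deque

# Sketch of the per-component sliding-window normalization: each new
# trial's principal components are standardized by the mean and
# standard deviation of the last w trials, then the buffer is updated.
def normalize_trial(p_t, buffer, w):
    """p_t: 1-D array of principal components for the current trial.
    buffer: deque holding the components of up to w previous trials
    (assumed non-empty, e.g. initialized from the training data)."""
    B = np.asarray(buffer)
    p_norm = (p_t - B.mean(axis=0)) / B.std(axis=0)
    buffer.append(p_t)
    if len(buffer) > w:
        buffer.popleft()  # keep only the latest w trials in B
    return p_norm
```

Because each component is normalized individually, a slow drift in a single principal component is removed without affecting the others.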
Covariate shift adaption methods
In the following, we give an overview of the different methods for covariate shift adaption that are tested in this paper. The covariate shift adaption methods were applied after the signal processing, which will be explained in the next section.
Satti et al.: in  a method for covariate shift adaption was proposed that uses a polynomial function to estimate the covariate shift of the next trial and adapts the data accordingly. In the following we used a polynomial of order 3, as stated in , and used the previous w trials to fit the polynomial to the data.
baseline: as a reference method we use results without covariate shift adaption.
pcanorm: this is the method for covariate shift adaption by normalized principal components as proposed in this paper in the previous section.
pcapoly: with this method we propose a slightly different approach than the one presented in the previous section. PCA is also used, but instead of shifting a window over the data and normalizing by the last w trials, a method similar to  is used: a polynomial is fitted to the content of the window and the next trial is adjusted by the value that the polynomial function estimates. Again a polynomial of order 3 is used.
pcaonly: for this method, PCA was used for extraction of non-stationaries without any further covariate shift adaption.
Although different values of w were tested, w was kept constant at w = 15 for all reported results, to provide a fair comparison between the methods.
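The polynomial-based adaption used by the Satti et al. method and pcapoly can be sketched as follows (an illustrative numpy version, not the authors' original code): a polynomial of order 3 is fitted to each feature over the previous w trials, and the incoming trial is corrected by the value the polynomial predicts for the next time step.

```python
import numpy as np

# Sketch of the polynomial covariate-shift estimate (order 3, as in
# the text); variable names are ours.
def poly_shift_correct(history, x_next, order=3):
    """history: (w, n_features) array of the previous w trials.
    x_next: 1-D array, the incoming trial to be adjusted."""
    w, n = history.shape
    t = np.arange(w)
    corrected = np.empty(n)
    for j in range(n):
        # fit the drift of feature j over the window ...
        coeffs = np.polyfit(t, history[:, j], order)
        # ... and subtract the predicted shift at the next time step
        corrected[j] = x_next[j] - np.polyval(coeffs, w)
    return corrected
```

For a feature with a purely linear drift, the cubic fit reproduces the drift exactly, so the corrected value is driven back towards zero.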
To evaluate the advantage of different covariate shift adaption methods we performed an offline analysis on data recorded for another study . In this study 10 subjects performed motor imagery of right hand movement and a subtraction task. In the subtraction task the subject had to choose a random number (around 100), subtract 7, and then repeatedly subtract 7 from each result until the end of the trial; the results were not to be communicated. Two sessions were recorded on different days with 51 trials per task and session. Recording was done with a 275-channel whole-head MEG system (VSM MedTech Ltd.) at a sampling rate of 586 Hz. Each trial lasted 4.05 seconds with about 6 seconds of break between the trials. Instructions were given on a screen and a fixation cross was displayed during trials to minimize eye movement.
Signal processing and classification
The signals were filtered and resampled to 200 Hz. For spatial filtering a small Laplacian derivation was applied. To reduce the number of channels, we only used the 185 inner channels, which should also reduce the influence of possible artifacts, which are most prominent on the outer channels. After the preprocessing the power spectrum was estimated by an autoregressive model computed with the Burg method, as used in a previous MEG-BCI . A model order of 16 was used, since we obtained the best results with this model order in previous MEG-BCI experiments. We used the frequency range from 1 to 40 Hz with a bin width of 2 Hz. The logarithm function was applied to each value. Before classification we used r2-ranking  for feature selection. The number of features was not estimated individually on the training data, as this would have introduced overfitting in our experience. Instead a fixed number of 1000 features was used, which gave on average the best results when evaluated by cross-validation. Each feature was normalized to have zero mean and unit variance for the training dataset. The test dataset was scaled according to the mean and standard deviation of the training dataset.
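The r2-ranking used for feature selection can be illustrated as follows (a sketch assuming the common BCI definition of the r2 score, i.e. the squared point-biserial correlation between each feature and the binary class label; the function name is ours):

```python
import numpy as np

# Sketch of r^2-ranking for feature selection: features are scored by
# the squared correlation between feature values and class labels,
# and the top-scoring feature indices are returned.
def r2_rank(X, y, n_select):
    """X: (trials, features) array, y: binary labels (0/1).
    Returns the indices of the n_select highest-scoring features."""
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation of each feature column with the labels
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(r ** 2)[::-1][:n_select]
```

A feature that separates the two classes well gets a high r2 score and ends up at the front of the ranking.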
For classification we used LibSVM  with C = 1 and a linear kernel. We decided against parameter estimation by grid search and cross-validation because it introduced overfitting in previous experiments.
Offline accuracy evaluation
To evaluate the performance of the pcanorm-method proposed in this paper and to compare it to other previously described covariate shift adaption methods, we trained the classifier, after applying the respective covariate shift adaption method, on session 1 of the data and tested it on session 2, referred to as S1S2-validation in the following. This validation specifically addresses the benefit of the covariate shift adaption methods in the context of the session-transfer problem.
For comparison we also performed a 5×10-fold cross-validation on all data, in which the data was permuted and partitioned into 10 blocks of equal size. In each fold 9 blocks were used for training the classifier (including feature selection and PCA), which was then tested on the one remaining block. Each block was used for testing once. This procedure was repeated 5 times and the accuracy was averaged over all folds.
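The 5×10-fold partitioning scheme can be sketched as follows (illustrative code, names are ours): the trials are permuted, split into 10 blocks, each block serves once as the test set, and the whole procedure is repeated 5 times.

```python
import numpy as np

# Sketch of repeated k-fold cross-validation index generation.
def repeated_kfold_indices(n_trials, k=10, repeats=5, seed=0):
    """Yield (train, test) index arrays for a repeats x k-fold CV."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        perm = rng.permutation(n_trials)
        for block in np.array_split(perm, k):
            test = block
            # all permuted indices not in the current test block
            train = np.setdiff1d(perm, block)
            yield train, test
```

With 10 folds and 5 repetitions this yields 50 train/test splits, over which the accuracies are averaged.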
While non-stationaries have a great effect in the S1S2-validation, since the test set has an unknown data distribution, this effect should be minimized when using a cross-validation (CV), because the data is permuted and the classifier knows the data distribution from both sessions. Using both validation methods allows for a direct estimation of how much non-stationaries are alleviated by the covariate shift adaption methods. To specifically address this issue, and because the proposed method would not make sense on permuted data, no covariate shift adaption was performed for the cross-validation. For a fair comparison, we not only used the baseline method but also the baseline method combined with PCA for a dimensionality reduction to the m = 100 principal components with the highest variance. Although the number of features used for the PCA-based methods differs from the number used for the baseline method, we always used the number of features that gave the best results in a cross-validation for the specific method.
To confirm the results from the offline analysis, we integrated the test of the proposed method into an ongoing online experiment with 10 subjects, who had to perform motor imagery and mental subtraction. To explicitly evaluate the covariate shift adaption in the context of the session-transfer problem, each subject participated in two sessions. In the first session 200 trials of training data were recorded. In the second session the classifier was trained on the training data from the first session and the proposed method was tested with online feedback in 200 trials. Each session was separated into runs of 40 trials with a short break after each run.
To test how the performance deterioration during one session is affected by the proposed covariate shift adaption method, we did a linear regression (least squares regression by Matlab’s polyfit function) on the accuracy throughout a session and used the slope of the regression line as a measure for performance deterioration.
To compare the online results obtained with the pcanorm-method to the baseline-method, the baseline-method was applied offline to simulate the online experiment with the same data and the same parameters, but with a different covariate shift adaption method.
Offline classification accuracies for the baseline method and the PCA-normalization
Since robust PCA has been proposed as a better method to model non-stationaries , we also investigated whether the use of robust PCA would improve the results of the method proposed in this paper. Due to the low number of trials (102 training trials in the S1S2-validation) it was not possible to calculate robust PCA with k = 100 principal components . Following the advice in  on choosing k, we used robust PCA with k = 50 to compare both methods with the first 50 principal components. Since the results differed very little, they are not shown here in detail, but it should be noted that the proposed method with robust PCA achieved an average accuracy of 90.3 % while the proposed method with PCA achieved an average accuracy of 90.0 %, which is not significantly different (p>0.5, paired t-test). Although robust PCA achieved slightly better results, we continued with the use of PCA due to the limitation in the number of principal components when using robust PCA and its higher computation time.
Accuracies during the online experiment with the proposed method and the baseline method, as well as the days between session 1 and session 2
Since the number of features differed between the pcanorm-method (100 features) and the baseline-method (1000 features), it should be stated that the baseline-method with 100 features resulted in an average accuracy of 76.8 %. The pcanorm-method with 1000 features is not significantly better than chance level (50 %), since it is highly affected by the curse of dimensionality, due to too many principal components with a variance ≈ 0.
The results of the linear regression analysis, in which the slope of the regression line is used as a measure of performance deterioration, are not shown in detail. It is worth mentioning, however, that the slope averaged over all subjects with the proposed method was −0.06±0.62, while it was −0.32±0.61 for the baseline method without covariate shift adaption. Although there is less performance deterioration with the proposed covariate shift adaption method, the difference is not significant (p>0.1).
Example of different principal components
The offline results show that the method proposed in this paper is a useful tool that reduces the effect of non-stationaries and improves classification accuracies for BCI. In the offline analysis it has been shown that the proposed method significantly increases classification performance. It also performs significantly better than the other tested methods.
To validate the offline results and to show that the proposed method can be used in an online BCI, we integrated the proposed method into an online experiment with 10 subjects. Although 2 of the 10 subjects showed a noticeable performance decrease, the average performance could be increased by 4.5 % through the use of the PCA-based covariate shift adaption method proposed in this paper. In addition there is less performance deterioration during a session when using the proposed covariate shift adaption method. Since the results of the online experiment are on the verge of significance (p<0.1), more subjects may be needed to show a significant result. With the online experiment we have shown the method to be feasible and computationally efficient enough to be used in an online BCI. While the use of PCA increases the time needed for calibration by a few seconds, there is no noticeable increase in computational cost in the online case and, as we have shown, the method is fast enough to be used online even with high-dimensional MEG data.
Although PCA is not specifically designed to extract non-stationaries, the previously shown examples underline the conclusion drawn in , that PCA is a useful method to model non-stationaries and may help to understand the underlying processes or neurophysiological changes. By analysing the frequency spectrum and the topographical distribution of the principal components, one might be able to draw conclusions about the origin of the non-stationaries and find new methods to alleviate them. Methods like Stationary Subspace Analysis (SSA) , which are tailored to decompose multivariate signals into stationary and non-stationary parts, might give even better results. But in our case SSA could not be applied, since the number of dimensions was much higher than the number of trials.
Since changes in alpha power are known to reflect changing levels of fatigue , this might explain some of the patterns that are shown in Figure 2. Especially a pattern that repeats every run or changes constantly over the whole session is likely to be associated with increasing fatigue, since many subjects reported becoming more tired or losing concentration over the course of a session and over the course of a run. When recording with EEG other types of components may arise that for example may reflect changing impedances of the electrodes.
In this paper we have proposed a new method for covariate shift adaption, which is based on Principal Component Analysis to model non-stationaries. We have shown it to significantly increase BCI performance in an offline analysis and an online experiment with 10 subjects. With the online experiment we have also shown the proposed method to be efficient enough to be used in an online BCI.
The proposed covariate shift adaption method is a step towards a more robust BCI. By reducing the effects of non-stationaries it alleviates the session-transfer problem and keeps the performance from deteriorating during a session.
This study was partly funded by the German Federal Ministry of Education and Research (BMBF, BFNT F*T, Grant UTü 01 GQ 0831) and the DFG (Grant RO 1030/15-1, KOMEG).
- 6. Tomioka R, Hill J, Blankertz B, Aihara K: Adapting spatial filter methods for nonstationary BCIs. IBIS 2006 (Max-Planck-Gesellschaft, 2006), 65-70.
- 11. Reuderink B, Farquhar J, Poel M: Slow sphering to suppress non-stationaries in the EEG. International Journal of Bioelectromagnetism 2011, 13(2):78.
- 13. Satti A, Guan C, Coyle D, Prasad G: A covariance shift minimization method to alleviate non-stationary effects for an adaptive brain-computer interface. Proceedings of the 20th International Conference on Pattern Recognition 2010, 105-108.
- 14. Bensch M, Mellinger J, Bogdan M, Rosenstiel W: A multiclass BCI using MEG. Proceedings of the 4th Int. Brain-Computer Interface Workshop (Graz, 2008), 191-196.
- 15. Spüler M, Rosenstiel W, Bogdan M: A fast feature selection method for high-dimensional MEG BCI data. Proceedings of the 5th Int. Brain-Computer Interface Conference (Graz, 2011), 24-27.
- 18. Pascual J, Kawanabe M, Vidaurre C: Modelling non-stationarities in EEG data with robust principal component analysis. In Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, vol. 6679. Springer Berlin/Heidelberg; 2011:51-58.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.