A Brain-Computer Interface (BCI) enables a user to communicate or to control a computer by brain activity alone, without the need for muscle control. Its primary field of application is to help people who have lost voluntary muscle control due to disease or traumatic injury. While BCIs can be used for rehabilitation after stroke, they are most prominently used as a communication device for patients suffering from locked-in syndrome. The locked-in syndrome can be caused by different neurodegenerative diseases (like amyotrophic lateral sclerosis), brainstem stroke or traumatic brain injury. The locked-in state describes a condition in which the patient is aware and awake but paralysed and therefore unable to move or to communicate verbally or by any other means of muscle activity. A BCI can enable such patients to communicate, control a computer and interact with their environment [1].

The basic principle of a BCI relies on the user being able to voluntarily alter their brain activity. These changes in the recorded brain activity can be detected and used as an input signal for a computer. Different signal acquisition techniques allow the brain activity of a user to be measured.

While electroencephalography (EEG) is the most popular non-invasive method, we concentrate on data recorded by magnetoencephalography (MEG) in this paper. MEG is typically associated with higher costs, which may be the reason it is seldom used, but it also provides a higher spatial resolution (due to the larger number of sensors) and more information in the frequency range above 40 Hz. While MEG has been shown to work well for BCI [2], it still suffers from the same problem as EEG-based BCIs, namely the non-stationarity of the recorded signals.

Reasons for non-stationarity include changes in the mental state over time (increasing fatigue or loss of concentration), the transfer from training without feedback to online usage with feedback, or head movements in the MEG, which cause the generating brain areas to lie under different sensors. These non-stationarities are especially a problem when a classifier trained on data from a previous session is used for classification in the current session, which is often referred to as the session-transfer problem. From the machine learning point of view this phenomenon is termed covariate shift and describes the fact that the training data follow a different distribution than the test data [3].

The problem of non-stationarity has been approached in different ways: by adaptive classification [4, 5], adaptive spatial filters [6, 7] or covariate shift adaptation [8-11]. In this paper we propose a method that first uses Principal Component Analysis to extract non-stationarities and then minimizes their effect by covariate shift adaptation of the principal components. We present results from offline data and demonstrate the benefit of this method in an online experiment with 10 subjects.


Covariate shift adaptation by normalized principal components

In this section we propose a method for covariate shift adaptation which uses Principal Component Analysis (PCA) [12] to extract the most important principal components and normalizes these components by shifting a window over the data, in order to reduce the effect of non-stationarity. The normalization is similar to the method described in [8, 9], but normalizes each feature individually instead of normalizing a linear combination of all features.

PCA is a method that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of uncorrelated variables. These uncorrelated variables are called principal components and are sorted by the amount of variance they account for in the original data; the first principal component accounts for the highest proportion of variance.

The proposed method is applied after feature extraction, when the power spectrum for each channel has been estimated. Given n trials of training data, the dataset consists of a matrix D of dimension n×p, with the number of features p = (channels·bins) and D(i,j) being the value for trial i and feature j. For the covariate shift adaptation, a PCA is first applied for dimensionality reduction and extraction of non-stationary components. Next, the m principal components with the highest variance are selected, resulting in a p×m transformation matrix W and a matrix P = D·W that represents the m principal components. For the data presented here m = 100 was used, which seems to be a robust value giving good results. As a next step a rectangular window of length w is defined, which is shifted over the data, and the value of each P(i,j) is normalized by the preceding w trials with

P̂(i,j) = P(i,j) − mean(P(i−w,j), …, P(i−1,j))

For all P̂(i,j) with i ≤ w, the window (P(1,j), …, P(w,j)) is used. We also experimented with different types of windows, e.g. a half-Hanning window, but found the rectangular window to give the best results.
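This offline procedure can be sketched in Python. The sketch below is our own minimal illustration (the paper provides no code), assuming the normalization subtracts the running window mean from each principal component:

```python
import numpy as np

def fit_pca(D, m=100):
    """Fit PCA on the n x p training matrix D and return the p x m
    transformation matrix W built from the m components with the
    highest variance."""
    Dc = D - D.mean(axis=0)
    # rows of Vt are the principal directions, sorted by decreasing variance
    _, _, Vt = np.linalg.svd(Dc, full_matrices=False)
    return Vt[:m].T

def normalize_components(P, w=15):
    """Subtract from each P(i, j) the mean of the preceding w trials;
    for the first w trials (i <= w, 1-based) the fixed window of the
    first w trials is used instead."""
    P_hat = np.empty_like(P)
    head_mean = P[:w].mean(axis=0)
    for i in range(P.shape[0]):
        if i < w:                      # corresponds to i <= w in 1-based notation
            P_hat[i] = P[i] - head_mean
        else:
            P_hat[i] = P[i] - P[i - w:i].mean(axis=0)
    return P_hat
```

The loop form mirrors the sliding window in the text; a vectorized cumulative-sum variant would behave identically.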

When using the method online, the last w trials (P(t−w), …, P(t−1)) are kept in a buffer B, and the principal components for a new trial D(t) are calculated by P(t) = D(t)·W; P(t) is then normalized by the mean of the buffer

P̂(t,j) = P(t,j) − mean(B(1,j), …, B(w,j))

Then P(t) is added to the end of B and the first trial in B is removed, keeping the latest w trials in B. P̂(t) can then be used for classification.
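A minimal sketch of this online buffer update, assuming the buffer is seeded with the last w projected training trials and that the normalization subtracts the buffer mean (class and variable names are ours, not from the paper):

```python
import numpy as np
from collections import deque

class OnlineNormalizer:
    """Keeps the last w projected trials in a buffer B and subtracts
    their per-component mean from each newly projected trial."""

    def __init__(self, P_train, W, w=15):
        self.W = W                                   # p x m PCA matrix
        self.buffer = deque(P_train[-w:], maxlen=w)  # last w trials

    def __call__(self, d_t):
        p_t = d_t @ self.W                 # P(t) = D(t) * W
        p_hat = p_t - np.mean(self.buffer, axis=0)
        self.buffer.append(p_t)            # drops the oldest trial automatically
        return p_hat                       # ready for classification
```

Using `deque(maxlen=w)` makes the "append new, remove first" step a single operation.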

Covariate shift adaptation methods

In the following, we give an overview of the different methods for covariate shift adaptation that are tested in this paper. The covariate shift adaptation methods were applied after the signal processing, which is explained in the next section.

Satti et al.: in [13] a method for covariate shift adaptation was proposed that uses a polynomial function to estimate the covariate shift of the next trial and adapt the data accordingly. In the following we used a polynomial of order 3, as stated in [13], and used the previous w trials to fit the polynomial to the data.

baseline: as a reference method we use results without covariate shift adaptation.

pcanorm: this is the method for covariate shift adaptation by normalized principal components proposed in the previous section of this paper.

pcapoly: this method takes a slightly different approach than the one presented in the previous section. A PCA is also used, but instead of shifting a window over the data and normalizing by the last w trials, a method similar to [13] is applied: a polynomial is fitted to the content of the window and the next trial is adjusted by the value the polynomial function predicts. Again a polynomial of order 3 is used.
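The polynomial adjustment behind pcapoly (and the method of Satti et al.) can be sketched as follows; `poly_predict_next` is a hypothetical helper name of ours:

```python
import numpy as np

def poly_predict_next(window, order=3):
    """Fit a polynomial of the given order to the w values in the window
    and extrapolate one step ahead; subtracting this prediction from the
    next trial removes the estimated drift."""
    x = np.arange(len(window))
    coeffs = np.polyfit(x, window, order)
    return np.polyval(coeffs, len(window))
```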

pcaonly: for this method, PCA was used for extraction of non-stationarities without any further covariate shift adaptation.

Although different values of w were tested, w was kept constant at w = 15 for the results presented here, to provide a fair comparison between the methods.

Offline analysis


To evaluate the advantage of the different covariate shift adaptation methods, we performed an offline analysis on data recorded for another study [14]. In this study 10 subjects performed motor imagery of right hand movement and a subtraction task. In the subtraction task the subject had to choose a random number (around 100), subtract 7, and continue subtracting 7 from each result, without communicating it, until the end of a trial. Two sessions were recorded on different days with 51 trials per task and session. Recording was done with a 275-channel whole-head MEG system (VSM MedTech Ltd.) at a sampling rate of 586 Hz. Each trial lasted 4.05 seconds with about 6 seconds of break between trials. Instructions were given on a screen and a fixation cross was displayed during trials to minimize eye movement.

Signal processing and classification

The signals were filtered and resampled to 200 Hz. For spatial filtering a small Laplacian derivation was applied. To reduce the number of channels, we only used the 185 inner channels, which should also reduce the influence of possible artifacts, which are most prominent on the outer channels. After the preprocessing, the power spectrum was estimated by an autoregressive model computed with the Burg method, as used in a previous MEG-BCI [2]. A model order of 16 was used, since we obtained the best results with this model order in previous MEG-BCI experiments. We used the frequency range from 1 to 40 Hz with a bin width of 2 Hz. The logarithm was applied to each value. Before classification we used r2-ranking [15] for feature selection. The number of features was not estimated individually on the training data, which in our experience would have introduced overfitting. Instead a fixed number of 1000 features was used, which gave the best results on average when evaluated by cross-validation. Each feature was normalized to zero mean and unit variance on the training dataset. The test dataset was scaled according to the mean and standard deviation of the training dataset.
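The feature selection and normalization steps can be sketched as follows. This is a simplified illustration, assuming r2-ranking amounts to the squared correlation of each feature with the binary class label; the function names are ours:

```python
import numpy as np

def r2_ranking(X, y, n_features=1000):
    """Rank features by their squared correlation (r^2) with the binary
    class label y and return the indices of the best n_features."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    den[den == 0] = 1.0                  # guard against constant features
    r2 = (Xc.T @ yc / den) ** 2
    return np.argsort(r2)[::-1][:n_features]

def zscore_train_test(X_train, X_test):
    """Zero mean / unit variance per feature, fitted on the training set
    only; the test set is scaled with the same statistics."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    return (X_train - mu) / sd, (X_test - mu) / sd
```

Fitting the scaling on the training set only, and reusing it for the test set, avoids leaking test-set statistics into the classifier.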

For classification we used LibSVM [16] with C = 1 and a linear kernel. We decided against parameter estimation by grid search and cross-validation because it introduced overfitting in previous experiments.
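scikit-learn's SVC is built on LIBSVM, so the classification step can be illustrated on synthetic toy data (the data below is made up for the example):

```python
import numpy as np
from sklearn.svm import SVC   # scikit-learn's SVC wraps LIBSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
X[:30, 0] += 3.0                       # separate the two toy classes
y = np.array([1] * 30 + [0] * 30)

# C = 1 and a linear kernel, fixed without any grid search, as in the paper
clf = SVC(C=1.0, kernel="linear")
train, test = np.arange(0, 60, 2), np.arange(1, 60, 2)
clf.fit(X[train], y[train])
accuracy = clf.score(X[test], y[test])
```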

Offline accuracy evaluation

To evaluate the performance of the pcanorm method proposed in this paper and to compare it to other previously described covariate shift adaptation methods, we applied the respective covariate shift adaptation method, trained the classifier on session 1 of the data and tested it on session 2, referred to as S1S2-validation in the following. This evaluation specifically addresses the benefit of the covariate shift adaptation methods in the context of the session-transfer problem.

For comparison we also performed a 5×10-fold cross-validation on all data, in which the data was permuted and partitioned into 10 blocks of equal size. In each fold 9 blocks were used for training the classifier (including feature selection and PCA), and the one remaining block was used for testing. Each block was used for testing once. This procedure was repeated 5 times and the accuracy was averaged over all folds.
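A minimal sketch of this 5×10-fold scheme; the callable interface is our own simplification, and feature selection and PCA would live inside `train_and_score`:

```python
import numpy as np

def repeated_kfold_accuracy(X, y, train_and_score, n_folds=10, n_repeats=5, seed=0):
    """Permute the data, split it into n_folds equal blocks, train on all
    but one block and test on the remaining one, repeat n_repeats times,
    and average the accuracy over all folds.
    train_and_score: (X_tr, y_tr, X_te, y_te) -> accuracy."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        folds = np.array_split(perm, n_folds)
        for k in range(n_folds):
            te = folds[k]
            tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            accs.append(train_and_score(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(accs))
```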

While non-stationarities have a strong effect in the S1S2-validation, since the test set has an unknown data distribution, this effect should be minimized when using cross-validation (CV), because the data is permuted and the classifier knows the data distribution of both sessions. Using both validation methods allows a direct estimation of how much non-stationarities are alleviated by the covariate shift adaptation methods. To specifically address this issue, and because the proposed method would not make sense on permuted data, no covariate shift adaptation was performed for the cross-validation. For a fair comparison, we not only used the baseline method but also the baseline method combined with PCA for a dimensionality reduction to the m = 100 principal components with the highest variance. Although the number of features used for the PCA-based methods differs from the number of features used for the baseline method, we always used the number of features that gave the best results in a cross-validation for the specific method.

Online experiment

To confirm the results from the offline analysis, we integrated the test of the proposed method into an ongoing online experiment with 10 subjects, who had to perform motor imagery and mental subtraction. To explicitly evaluate the covariate shift adaptation in the context of the session-transfer problem, each subject participated in two sessions. In the first session 200 trials of training data were recorded. In the second session the classifier was trained on the training data from the first session and the proposed method was tested with online feedback in 200 trials. Each session was separated into runs of 40 trials with a short break after each run.

Recording was again done with a 275-channel whole-head MEG system (VSM MedTech Ltd.) at a sampling rate of 586 Hz. During measurement the head position was continuously recorded. A notebook with an Intel Core i7 720QM and 4 GB memory running BCI2000 [17] was used for signal acquisition, signal processing, feedback presentation and classification. The design of the paradigm and the corresponding time intervals are shown in Figure 1. During the test phase, feedback indicating the result of the classifier was given after every trial. Since the online test of the proposed method was integrated into the ongoing experiment, it should be noted that the first 4 subjects (B01, B03, B07, B08) received feedback without the covariate shift adaptation method; the results shown below for these 4 subjects come from a simulated online experiment. The other 6 subjects received online feedback with the covariate shift adaptation method proposed in this paper.

Figure 1

(A) training phase without feedback. (B) test phase with feedback.

To test how the performance deterioration during a session is affected by the proposed covariate shift adaptation method, we performed a linear regression (least squares regression using Matlab's polyfit function) on the accuracy throughout a session and used the slope of the regression line as a measure of performance deterioration.
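This slope measure can be sketched with NumPy's polyfit, the analogue of the Matlab function used here (the helper name is ours):

```python
import numpy as np

def performance_slope(accuracies):
    """Least-squares linear fit to per-run accuracies; the slope of the
    regression line measures performance deterioration within a session
    (a negative slope means declining accuracy)."""
    x = np.arange(len(accuracies))
    slope, _intercept = np.polyfit(x, accuracies, 1)
    return slope
```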

To compare the online results of the pcanorm method with the baseline method, the baseline method was applied offline to simulate the online experiment with the same data and the same parameters but a different covariate shift adaptation method.


Offline analysis

The comparison of the cross-validation results and the results from the S1S2-validation is shown in Table 1. It can be seen that the pcanorm method proposed in this paper performs significantly better (p < 0.005, paired t-test) than the baseline method without covariate shift adaptation. The comparison between the results obtained by cross-validation and those of the S1S2-validation shows that the baseline method still suffers a significant performance decrease (p < 0.05) due to the session transfer, while there is no significant difference when comparing the proposed method with the cross-validation results of the baseline method with PCA (p > 0.1).

Table 1 Offline classification accuracies for the baseline method and the PCA-normalization

Since robust PCA has been proposed as a better method to model non-stationarities [18], we also investigated whether the use of robust PCA would improve the results of the method proposed in this paper. Due to the low number of trials (102 training trials in the S1S2-validation) it was not possible to calculate robust PCA with k = 100 principal components [19]. Since it is also advised to choose k ≤ n/2, we used robust PCA with k = 50 and compared both methods using the first 50 principal components. Since the results deviate very little, they are not shown here in detail, but it should be noted that the proposed method with robust PCA achieved an average accuracy of 90.3 % while the proposed method with PCA achieved an average accuracy of 90.0 %, which is not significantly different (p > 0.5, paired t-test). Although robust PCA achieved slightly better results, we continued with PCA due to the limitation in the number of principal components when using robust PCA and its higher computation time.

The results of the comparison of the different covariate shift adaptation methods can be seen in Table 2. They show that the proposed method, with a mean accuracy of 92.2 %, is superior to all other methods tested. While all covariate shift adaptation methods perform significantly better (p < 0.05) than the baseline method without covariate shift adaptation, the accuracies with the proposed method are significantly higher than with the other two tested covariate shift adaptation methods (p < 0.005).

Table 2 Offline classification accuracies for the different covariate shift adaptation methods

Since the number of features differed between the pcanorm method (100 features) and the baseline method (1000 features), it should be noted that the baseline method with 100 features resulted in an average accuracy of 76.8 %. The pcanorm method with 1000 features is not significantly better than chance level (50 %), since it is strongly affected by the curse of dimensionality due to the many principal components with a variance ≈ 0.

Online experiment

The results from the online experiment are shown in Table 3. With a mean accuracy of 80.9 %, the proposed method performs better (p < 0.1) than the baseline method with a mean accuracy of 76.4 %. There is no significant correlation between the performance improvement achieved by the proposed covariate shift adaptation method and the number of days between the sessions.

Table 3 Accuracies during the online experiment with the proposed method and the baseline method, as well as the days between session 1 and session 2

The results of the linear regression analysis, in which the slope of the regression line is used as a measure of performance deterioration, are not shown in detail. It is worth mentioning, however, that the slope averaged over all subjects was −0.06 ± 0.62 with the proposed method, while it was −0.32 ± 0.61 for the baseline method without covariate shift adaptation. Although there is less performance deterioration with the proposed covariate shift adaptation method, the difference is not significant (p > 0.1).

Example of different principal components

In Figure 2 four examples of different principal components are given. In Figure 2(A) the raw principal component is shown together with the linear classifier plane (horizontal line) and the mean of the window of the w = 15 preceding trials. The vertical lines depict the breaks between two runs. In Figure 2(B) the principal component after the covariate shift adaptation is shown together with the corresponding linear classifier plane. In Figure 2(C) and (D) the frequency range and the topographic distribution of the principal component can be seen. The first three principal components are examples of the proposed method improving classification accuracy by removing temporal fluctuations in the principal components, thereby making the two classes more easily separable. The second principal component is an interesting case in which one can also see a topographic shift of activation over time: with an increasing value of this principal component, the power in the alpha band increases in central regions while it decreases in the posterior region. The fourth component is an example of a large non-stationary effect that occurred during a longer-than-usual break between two runs.

Figure 2

Examples of different principal components. (A) raw principal component (B) normalized principal components (C) average PCA weights per frequency bin (D) topographic distribution


The offline results show that the method proposed in this paper is a useful tool to reduce the effect of non-stationarities and to improve classification accuracies for BCI. The offline analysis has shown that the proposed method significantly increases classification performance and also performs significantly better than the other tested methods.

To validate the offline results and to show that the proposed method can be used in an online BCI, we integrated the proposed method into an online experiment with 10 subjects. Although 2 of the 10 subjects showed a noticeable performance decrease, the average performance could be increased by 4.5 % through the use of the PCA-based covariate shift adaptation method proposed in this paper. In addition, there is less performance deterioration during a session when using the proposed covariate shift adaptation method. Since the results of the online experiment are on the verge of significance (p < 0.1), more subjects may be needed to show a significant result. With the online experiment we have shown the method to be feasible and computationally efficient enough to be used in an online BCI. While the use of PCA increases the time needed for calibration by a few seconds, there is no noticeable increase in computational cost in the online case and, as we have shown, the method is fast enough to be used online even with high-dimensional MEG data.

Although PCA is not specifically designed to extract non-stationarities, the examples shown above underline the conclusion drawn in [18] that PCA is a useful method to model non-stationarities and may help to understand the underlying processes or neurophysiological changes. By analysing the frequency spectrum and the topographical distribution of the principal components, one might be able to draw conclusions about the origin of the non-stationarities and find new methods to alleviate them. Methods like Stationary Subspace Analysis (SSA) [20], which are tailored to decompose multivariate signals into stationary and non-stationary parts, might give even better results. In our case, however, SSA could not be applied since the number of dimensions was much higher than the number of trials.

Since changes in alpha power are known to reflect changing levels of fatigue [21], this might explain some of the patterns shown in Figure 2. In particular, a pattern that repeats every run or changes steadily over the whole session is likely to be associated with increasing fatigue, since many subjects reported becoming more tired or losing concentration over the course of a session and over the course of a run. When recording with EEG, other types of components may arise that may, for example, reflect changing impedances of the electrodes.


In this paper we have proposed a new method for covariate shift adaptation, which is based on Principal Component Analysis to model non-stationarities. We have shown that it significantly increases BCI performance in an offline analysis and in an online experiment with 10 subjects. With the online experiment we have also shown the proposed method to be efficient enough to be used in an online BCI.

The proposed covariate shift adaptation method is a step towards a more robust BCI. By reducing the effects of non-stationarities, it alleviates the session-transfer problem and keeps the performance from deteriorating during a session.