Interactions Between Pre-Processing and Classification Methods for Event-Related-Potential Classification

Best-Practice Guidelines for Brain-Computer Interfacing

  • Original Article
  • Published in: Neuroinformatics

Abstract

Detecting event-related potentials (ERPs) from single trials is critical to the operation of many stimulus-driven brain-computer interface (BCI) systems. The low strength of the ERP signal compared to the noise (due to artifacts and BCI-irrelevant brain processes) makes this a challenging signal detection problem. Previous work has tended to focus on how best to detect a single ERP type (such as the visual oddball response). However, the underlying ERP detection problem is essentially the same regardless of stimulus modality (e.g. visual or tactile), ERP component (e.g. P300 oddball response, or the error-potential), measurement system or electrode layout. To investigate whether a single ERP detection method might work for a wider range of ERP BCIs, we compare detection performance over a large corpus of more than 50 ERP BCI datasets whilst systematically varying the electrode montage, spectral filter, spatial filter and classifier training methods. We identify an interesting interaction between spatial whitening and regularised classification which made detection performance independent of the choice of spectral filter low-pass frequency. Our results show that a pipeline consisting of spectral filtering, spatial whitening, and regularised classification gives near-maximal performance in all cases. Importantly, this pipeline is simple to implement and completely automatic, with no expert feature selection or parameter tuning required. Thus, we recommend this combination as a “best-practice” method for ERP detection problems.

Notes

  1. This is not strictly necessary, as the later spectral filter will also remove the DC; however, doing it early improves numerical stability and prevents filter ringing artifacts.

  2. Again, this is not strictly necessary, as the later spatial filtering will also remove the common activation; however, doing it early improves numerical stability and prevents filter ringing artifacts during spectral filtering.

  3. The vector of source detection strengths over sensors is called a spatial pattern. Given a matrix of all the sources' spatial patterns, an optimal spatial filter for each source can be computed by taking the pseudo-inverse of this matrix; see Blankertz et al. (2011).

  4. Indeed, a prototype classifier uses exactly this method to find W, i.e. \(W=\mbox{mean}(X_+) - \mbox{mean}(X_-)\).

  5. As a further aside, a second implication of this observation is that, by combining operations in this way, the computational cost of applying the classification pipeline on-line can be considerably reduced, to only d*T floating-point operations per epoch.

  6. This limitation was also due in part to the sparsity of publicly available multi-session datasets.

  7. We neglect the constant bias-term for simplicity.
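Notes 4 and 5 can be illustrated with a small sketch. The plain-Python example below is an illustration under stated assumptions, not the authors' implementation: epochs are stored as d × T nested lists (channels by time samples), the helper names (`mean_epoch`, `prototype_weights`, `score`, `apply_spatial_filter`, `fold_spatial_filter`) are hypothetical, and all numbers are made up. It computes the prototype weights of Note 4 and checks that a spatial filter followed by a linear classifier collapses into a single d × T weight matrix, so that scoring one epoch needs d*T multiply-adds as in Note 5.

```python
def mean_epoch(epochs):
    """Element-wise mean of a list of d x T epochs (nested lists)."""
    n = len(epochs)
    d, t = len(epochs[0]), len(epochs[0][0])
    return [[sum(e[c][s] for e in epochs) / n for s in range(t)] for c in range(d)]


def prototype_weights(pos_epochs, neg_epochs):
    """Note 4: W = mean(X_+) - mean(X_-)."""
    mp, mn = mean_epoch(pos_epochs), mean_epoch(neg_epochs)
    return [[a - b for a, b in zip(rp, rn)] for rp, rn in zip(mp, mn)]


def score(weights, epoch):
    """Linear decision value: sum over all channels and samples of W * X."""
    return sum(w * x for wr, xr in zip(weights, epoch) for w, x in zip(wr, xr))


def apply_spatial_filter(filt, epoch):
    """Y = S X, where S is a k x d spatial filter matrix and X is d x T."""
    t = len(epoch[0])
    return [[sum(f * epoch[c][s] for c, f in enumerate(row)) for s in range(t)]
            for row in filt]


def fold_spatial_filter(filt, weights):
    """Note 5: W_combined = S^T V, so score(V, S X) equals score(W_combined, X)."""
    k, d, t = len(filt), len(filt[0]), len(weights[0])
    return [[sum(filt[r][c] * weights[r][s] for r in range(k)) for s in range(t)]
            for c in range(d)]


# Toy data (assumed): two channels, three samples, two epochs per class.
pos = [[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
       [[3.0, 2.0, 1.0], [2.0, 1.0, 2.0]]]
neg = [[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
       [[2.0, 0.0, 2.0], [1.0, 3.0, 1.0]]]
W = prototype_weights(pos, neg)

# An arbitrary spatial filter S and classifier weights V on the filtered signals.
S = [[0.5, 0.5], [1.0, -1.0]]
V = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
X = pos[0]
two_step = score(V, apply_spatial_filter(S, X))   # filter, then classify
one_step = score(fold_spatial_filter(S, V), X)    # single combined weight matrix
```

The two decision values agree because every step is linear, which is what allows the combined weights to be precomputed once and applied directly to each incoming epoch.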

References

  • Bell, A.J., & Sejnowski, T.J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129–1159. doi:10.1162/neco.1995.7.6.1129.

  • Birbaumer, N., Kubler, A., Ghanayim, N., Hinterberger, T., Perelmouter, J., Kaiser, J., Iversen, I., Kotchoubey, B., Neumann, N., Flor, H. (2000). The thought translation device (TTD) for completely paralyzed patients. IEEE Transactions on Rehabilitation Engineering, 8(2), 190–193. doi:10.1109/86.847812.

  • Blankertz, B., Muller, K.R., Krusienski, D.J., Schalk, G., Wolpaw, J.R., Schlogl, A., Pfurtscheller, G., Millan, J.R., Schroder, M., Birbaumer, N. (2006). The BCI competition III: validating alternative approaches to actual BCI problems. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14(2), 153–159. doi:10.1109/TNSRE.2006.875642.

  • Blankertz, B., Tomioka, R., Lemm, S., Kawanabe, M., Muller, K.R. (2008). Optimizing spatial filters for robust EEG single-trial analysis. IEEE Signal Processing Magazine, 25(1), 41–56. doi:10.1109/MSP.2008.4408441.

  • Blankertz, B., Lemm, S., Treder, M., Haufe, S., Müller, K. (2011). Single-trial analysis and classification of ERP components: a tutorial. NeuroImage, 56(2), 814–825. doi:10.1016/j.neuroimage.2010.06.048.

  • Bouchard, G., & Triggs, B. (2004). The tradeoff between generative and discriminative classifiers. In 16th IASC international symposium on computational statistics (COMPSTAT ’04), Prague, Czech Republic (pp. 721–728).

  • Brown, R.G., & Hwang, P.Y.C. (1997). Introduction to random signals and applied Kalman filtering (Vol. 2). New York: Wiley.

  • Brunner, C., Naeem, M., Leeb, R., Graimann, B., Pfurtscheller, G. (2007). Spatial filtering and selection of optimized components in four class motor imagery EEG data using independent components analysis. Pattern Recognition Letters, 28(8), 957–964. doi:10.1016/j.patrec.2007.01.002.

  • Christoforou, C., Haralick, R., Sajda, P., Parra, L.C. (2010). Second-order bilinear discriminant analysis. Journal of Machine Learning Research, 11, 665–685.

  • Duda, R.O., Hart, P.E., Stork, D.G. (2000). Pattern classification (2nd ed.). Wiley-Interscience.

  • Farquhar, J. (2009). A linear feature space for simultaneous learning of spatio-spectral filters in BCI. Neural Networks, 22(9), 1278–1285. doi:10.1016/j.neunet.2009.06.035.

  • Farwell, L.A., & Donchin, E. (1988). Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology, 70(6), 510–523.

  • van Gerven, M., Farquhar, J., Schaefer, R., Vlek, R., Geuze, J., Nijholt, A., Ramsey, N., Haselager, P., Vuurpijl, L., Gielen, S., Desain, P. (2009). The brain–computer interface cycle. Journal of Neural Engineering, 6(4), 041001. doi:10.1088/1741-2560/6/4/041001.

  • Hill, J., Farquhar, J., Martens, S., Bießmann, F., Schölkopf, B. (2008). Effects of stimulus type and of error-correcting code design on BCI speller performance. In Advances in neural information processing systems 21: 22nd Annual conference on neural information processing systems 2008 (pp. 665–672). Vancouver, BC: Curran.

  • Hill, N.J., & Raths, C. (2007). New BCI approaches: Selective attention to auditory and tactile stimulus streams. In PASCAL workshop on methods of data analysis in computational neuroscience and brain computer interfaces. Berlin: Fraunhofer FIRST.

  • Hill, N.J., Lal, T.N., Bierig, K., Birbaumer, N., Schölkopf, B. (2005). An auditory paradigm for brain-computer interfaces. In L.K. Saul, Y. Weiss, L. Bottou (Eds), Advances in neural information processing systems (Vol. 17, pp. 569–576). Cambridge, MA: MIT Press.

  • Hoffmann, U., Vesin, J., Ebrahimi, T., Diserens, K. (2008). An efficient p300-based brain-computer interface for disabled subjects. Journal of Neuroscience Methods, 167(1), 115–125. doi:10.1016/j.jneumeth.2007.03.005.

  • Hyvarinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3), 626–634. doi:10.1109/72.761722.

  • Hyvarinen, A., & Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4–5), 411–430. doi:10.1016/S0893-6080(00)00026-5.

  • Krusienski, D., Sellers, E., McFarland, D., Vaughan, T., Wolpaw, J. (2008). Toward enhanced p300 speller performance. Journal of Neuroscience Methods, 167(1), 15–21. doi:10.1016/j.jneumeth.2007.07.017.

  • Krusienski, D.J., Sellers, E.W., Cabestaing, F., Bayoudh, S., McFarland, D.J., Vaughan, T.M., Wolpaw, J.R. (2006). A comparison of classification techniques for the p300 speller. Journal of Neural Engineering, 3, 299–305. doi:10.1088/1741-2560/3/4/007.

  • Krzanowski, W.J., Jonathan, P., McCarthy, W.V., Thomas, M.R. (1995). Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Journal of the Royal Statistical Society Series C (Applied Statistics), 44(1), 101–115. doi:10.2307/2986198.

  • Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411. doi:10.1016/S0047-259X(03)00096-4.

  • Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering, 4(2), R1–R13. doi:10.1088/1741-2560/4/2/R01.

  • Makeig, S., Bell, A., Jung, T., Sejnowski, T., et al. (1996). Independent component analysis of electroencephalographic data. In D. Touretzky, M. Mozer, M. Hasselmo (Eds.), Advances in neural information processing systems 8 (pp. 145–151). Cambridge, MA: MIT Press.

  • Meier, L., Van De Geer, S., Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1), 53–71. doi:10.1111/j.1467-9868.2007.00627.x.

  • Middendorf, M., McMillan, G., Calhoun, G., Jones, K.S. (2000). Brain-computer interfaces based on the steady-state visual-evoked response. IEEE Transactions on Rehabilitation Engineering, 8(2), 211–214. doi:10.1109/86.847819.

  • Muller, K.R., Anderson, C.W., Birch, G.E. (2003). Linear and nonlinear methods for brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 11(2), 165–169. doi:10.1109/TNSRE.2003.814484.

  • Ng, A.Y., & Jordan, M.I. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. In Advances in neural information processing systems (pp. 841–848). Vancouver, BC: MIT Press.

  • Nunez, P.L., & Srinivasan, R. (2005). Electric fields of the brain: The neurophysics of EEG (2nd ed.). USA: Oxford University Press.

  • Nunez, P.L., Silberstein, R.B., Cadusch, P.J., Wijesinghe, R.S., Westdorp, A.F., Srinivasan, R. (1994). A theoretical and experimental study of high resolution EEG based on surface laplacians and cortical imaging. Electroencephalography and Clinical Neurophysiology, 90(1), 40–57.

  • Perrin, F., Pernier, J., Bertrand, O., Echallier, J.F. (1989). Spherical splines for scalp potential and current density mapping. Electroencephalography and Clinical Neurophysiology, 72(2), 184–187.

  • Pfurtscheller, G., & Neuper, C. (2001). Motor imagery and direct brain-computer communication. Proceedings of the IEEE, 89(7), 1123–1134. doi:10.1109/5.939829.

  • Pfurtscheller, G., Brunner, C., Schlogl, A., Lopes da Silva, F. (2006). Mu rhythm (de)synchronization and EEG single-trial classification of different motor imagery tasks. NeuroImage, 31(1), 153–159. doi:10.1016/j.neuroimage.2005.12.003.

  • Ramoser, H., Muller-Gerking, J., Pfurtscheller, G. (2000). Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Transactions on Rehabilitation Engineering, 8(4), 441–446.

  • Schölkopf, B., & Smola, A.J. (2001). Learning with kernels: support vector machines, regularization, optimization, and beyond (1st ed.). The MIT Press.

  • Selim, A.E., Wahed, M.A., Kadah, Y.M. (2008). Machine learning methodologies in brain-computer interface systems. In IEEE Cairo International Biomedical Engineering Conference (CIBEC) 2008, pp. 1–5. doi:10.1109/CIBEC.2008.4786106.

  • Tomioka, R., Aihara, K., Muller, K. (2007). Logistic regression for single trial EEG classification. Advances in Neural Information Processing Systems, 19, 1377–1384.

  • Vicente, M.A., Hoyer, P.O., Hyvarinen, A. (2007). Equivalence of some common linear feature extraction techniques for appearance-based object recognition tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 896–900. doi:10.1109/TPAMI.2007.1074.

  • Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M. (2002). Brain-computer interfaces for communication and control. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology, 113(6), 767–791.

  • Ye, J. (2006). Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6(1), 483.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. doi:10.1111/j.1467-9868.2005.00503.x.

Acknowledgements

The authors acknowledge the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and Science.

Author information

Corresponding author

Correspondence to J. Farquhar.

Appendix: Feature Rotation Invariance of Quadratically Regularised Linear Classifiers

A quadratically regularised linear classifier finds its solution by minimising an objective function of the form (see Note 7):

$$ J(w) = \lambda w^{{\top}} w + \sum\limits_{i=1..N}\mathcal{L}(x_i^{{\top}} w,y_i) $$
(4)

where \(x_i\in \mathbb{R}^d\) is the ith training example with d features, \(w \in \mathbb{R}^d\) are the linear classifier weights, and \(\mathcal{L}\) is the classification loss function, which penalises differences between the classifier predictions (\(x_i^{{\top}} w\)) and the example's true class \(y_i\). Depending on the choice of loss function \(\mathcal{L}\) one obtains different classifiers, e.g. logistic regression for \(\mathcal{L}=\log(1+\exp(-y_i x_i^{{\top}} w))\) with labels \(y_i \in \{-1,+1\}\), or a least squares classifier for \(\mathcal{L}=(y_i-x_i^{{\top}} w)^2\) (which can be used to implement LDA; see section “LDA”). \(w^{{\top}} w\) is the quadratic regularisation penalty, which penalises “complex” solutions, and λ is the relative strength of this penalty.
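As a concrete reading of Eq. 4, the following minimal sketch evaluates the objective for the least squares loss in plain Python. The function names (`predict`, `squared_loss`, `objective`), the toy data and the value of λ are illustrative assumptions and are not taken from the paper.

```python
def predict(x, w):
    """Classifier prediction x^T w for one example."""
    return sum(xj * wj for xj, wj in zip(x, w))


def squared_loss(f, y):
    """Least squares loss L(f, y) = (y - f)^2."""
    return (y - f) ** 2


def objective(w, X, y, lam, loss=squared_loss):
    """Eq. 4: J(w) = lam * w^T w + sum_i L(x_i^T w, y_i)."""
    penalty = lam * sum(wj * wj for wj in w)
    data_term = sum(loss(predict(xi, w), yi) for xi, yi in zip(X, y))
    return penalty + data_term


# Toy problem (assumed): three examples with two features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 1.0]

J_zero = objective([0.0, 0.0], X, y, lam=0.5)   # no penalty, loss 1 per example
J_fit = objective([1.0, -1.0], X, y, lam=0.5)   # fits two examples, pays a penalty
```

Passing a different `loss` function swaps the classifier while leaving the quadratic penalty untouched, which is the only part of the objective the rotation argument below depends on.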

Taking the derivative with respect to w and setting it equal to zero, one finds that the optimal solution, \(w^*\), satisfies

$$ 2 \lambda w^* + \sum\limits_{i=1..N} \mathcal{L}'(x_i^{{\top}} w^*,y_i)x_i = 0, $$
(5)

where \(\mathcal{L}'\) is the derivative of the loss function \(\mathcal{L}\) with respect to its first argument (the classifier prediction).
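The left-hand side of Eq. 5 is simply the gradient of the objective in Eq. 4, which can be checked numerically. The sketch below does this for the least squares loss, for which \(\mathcal{L}'(f,y) = -2(y-f)\), by comparing the analytic gradient against central finite differences. The function names, toy data, step size and choice of λ are assumptions made for illustration only.

```python
def predict(x, w):
    """Classifier prediction x^T w for one example."""
    return sum(xj * wj for xj, wj in zip(x, w))


def objective(w, X, y, lam):
    """Eq. 4 with the least squares loss."""
    penalty = lam * sum(wj * wj for wj in w)
    return penalty + sum((yi - predict(xi, w)) ** 2 for xi, yi in zip(X, y))


def gradient(w, X, y, lam):
    """Left-hand side of Eq. 5: 2*lam*w + sum_i L'(x_i^T w, y_i) x_i."""
    g = [2.0 * lam * wj for wj in w]
    for xi, yi in zip(X, y):
        dloss = -2.0 * (yi - predict(xi, w))   # L'(f, y) for the squared loss
        for j, xij in enumerate(xi):
            g[j] += dloss * xij
    return g


def numeric_gradient(w, X, y, lam, eps=1e-6):
    """Central finite-difference approximation of dJ/dw."""
    g = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        g.append((objective(up, X, y, lam) - objective(down, X, y, lam)) / (2 * eps))
    return g


# Toy problem (assumed) and an arbitrary, non-optimal weight vector.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 1.0]
w = [0.3, -0.2]
analytic = gradient(w, X, y, lam=0.5)
numeric = numeric_gradient(w, X, y, lam=0.5)
```

At the optimum \(w^*\) both gradients vanish, which is exactly the condition stated in Eq. 5.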

If one rotates the features with an arbitrary rotation matrix R such that \(\hat{x} = Rx\), the solution \(\hat{w}^*\) to this rotated problem satisfies

$$2 \lambda \hat{w}^* + \sum\limits_{i=1..N} \mathcal{L}'(\hat{x}_i^{{\top}}\hat{w}^*,y_i)\hat{x}_i = 0,$$
(6)
$$2 \lambda \hat{w}^* + \sum\limits_{i=1..N} \mathcal{L}'(x_i^{{\top}} R^{{\top}}\hat{w}^*,y_i) Rx_i = 0,$$
(7)
$$2 \lambda R^{{\top}}\hat{w}^* + \sum\limits_{i=1..N} \mathcal{L}'(x_i^{{\top}} R^{{\top}}\hat{w}^*,y_i)x_i = 0, $$
(8)

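where Eq. 7 follows by substituting \(\hat{x}_i = Rx_i\), and Eq. 8 by pre-multiplying Eq. 7 by \(R^{{\top}}\) and using the property that the inverse of a rotation is its transpose, i.e. \(R^{{\top}} R=I\). Making the substitution \(R^{{\top}}\hat{w}^*=w^*\) one sees that Eqs. 5 and 8 are identical and therefore have the same solution. The only effect of rotating the features is thus to rotate the optimal solution correspondingly, \(\hat{w}^* = Rw^*\), which leaves the classifier predictions unchanged since \(\hat{x}^{{\top}}\hat{w}^* = x^{{\top}} R^{{\top}} R w^* = x^{{\top}} w^*\).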
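The invariance can also be verified numerically. The sketch below is a minimal illustration, assuming two features and the least squares loss so that Eq. 5 reduces to the linear system \((\lambda I + \sum_i x_i x_i^{{\top}}) w = \sum_i y_i x_i\), which is solved in closed form. The helper names (`solve_ridge_2d`, `rotate`, `predict`), the data, λ and the rotation angle are all assumptions of this example.

```python
import math


def solve_ridge_2d(X, y, lam):
    """Solve (lam*I + sum_i x_i x_i^T) w = sum_i y_i x_i for two features."""
    a = lam + sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = lam + sum(x[1] * x[1] for x in X)
    r0 = sum(yi * x[0] for x, yi in zip(X, y))
    r1 = sum(yi * x[1] for x, yi in zip(X, y))
    det = a * d - b * b                      # positive whenever lam > 0
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]


def rotate(v, theta):
    """Apply the 2-D rotation matrix R(theta) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def predict(x, w):
    """Classifier prediction x^T w for one example."""
    return x[0] * w[0] + x[1] * w[1]


# Toy problem (assumed).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = [1.0, -1.0, 1.0, 1.0]
lam, theta = 0.5, 0.7

w_star = solve_ridge_2d(X, y, lam)           # solution of the original problem
X_hat = [rotate(x, theta) for x in X]        # rotated features, x_hat = R x
w_hat = solve_ridge_2d(X_hat, y, lam)        # solution of the rotated problem
w_back = rotate(w_hat, -theta)               # R^T w_hat, to compare with w_star

pred_orig = [predict(x, w_star) for x in X]
pred_rot = [predict(x, w_hat) for x in X_hat]
```

Up to rounding error, `w_back` matches `w_star` and the two sets of predictions coincide, in line with the substitution \(R^{{\top}}\hat{w}^*=w^*\) used to relate Eqs. 5 and 8.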

About this article

Cite this article

Farquhar, J., Hill, N.J. Interactions Between Pre-Processing and Classification Methods for Event-Related-Potential Classification. Neuroinform 11, 175–192 (2013). https://doi.org/10.1007/s12021-012-9171-0

  • DOI: https://doi.org/10.1007/s12021-012-9171-0
