Backtransformation: a new representation of data processing chains with a scalar decision function

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Data processing often transforms a complex signal into a single value, the outcome of a final decision function, by applying a sequence of different preprocessing algorithms. Still, it is challenging to understand and visualize the interplay between the algorithms performing this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which treats the combination of algorithms as one transformation that maps the original input signal to a single value. To this end, it takes the derivative of the final decision function and transforms it back through the previous processing steps via backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process. Using it for visualizations might improve the understanding of the process. Often, it is possible to construct a feasible processing chain from affine mappings, which considerably simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
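The affine case described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`apply_affine`, `affine_backtransformation`), the toy matrices, and the interpretation of the stages as a spatial filter and a feature scaling are assumptions made for illustration. The sketch collapses a chain of affine stages followed by a linear decision function into one weight vector and one offset by iterating backward through the stages.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j] maps input j to output i


def apply_affine(A: Matrix, t: Vector, x: Sequence[float]) -> Vector:
    """Forward step of one stage: return A x + t."""
    return [sum(a * xi for a, xi in zip(row, x)) + ti for row, ti in zip(A, t)]


def affine_backtransformation(
    stages: Sequence[Tuple[Matrix, Vector]], w: Sequence[float], b: float
) -> Tuple[Vector, float]:
    """Collapse affine stages followed by f(z) = <w, z> + b into one (w, b).

    The stages are visited in reverse order. Only products with transposed
    matrices are needed, so no matrix inversion takes place.
    """
    w_total = list(w)
    b_total = float(b)
    for A, t in reversed(stages):
        # Offset of this stage as seen through everything that follows it.
        b_total += sum(wi * ti for wi, ti in zip(w_total, t))
        # Chain rule: pull the weights back through A (computes A^T w).
        w_total = [
            sum(A[i][j] * w_total[i] for i in range(len(A)))
            for j in range(len(A[0]))
        ]
    return w_total, b_total


# Hypothetical toy chain: 3 inputs -> 2 features -> 2 features -> scalar.
stages = [
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]], [0.0, 1.0]),  # e.g. a spatial filter
    ([[2.0, 0.0], [0.0, 0.5]], [-1.0, 0.0]),            # e.g. a feature scaling
]
w_final, b_final = [1.0, -2.0], 0.5

w_total, b_total = affine_backtransformation(stages, w_final, b_final)

# Compare the stage-by-stage output with the collapsed affine function.
x = [0.2, -1.0, 3.0]
z = x
for A, t in stages:
    z = apply_affine(A, t, z)
chain_output = sum(wi * zi for wi, zi in zip(w_final, z)) + b_final
collapsed_output = sum(wi * xi for wi, xi in zip(w_total, x)) + b_total
```

In this sketch, `w_total` has one entry per component of the original input, so it could be reshaped and plotted in the input's own layout (for example, sensors by time points), which is the kind of visualization the abstract has in mind.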

Notes

  1. Further methods are presented but they are tailored to functional magnetic resonance imaging (fMRI) data.

  2. The respective derivatives are constant, i.e., they do not depend on the sample.

  3. The notation of data and its components differs from the notation in classification tasks. Here, we look at one data sample \(x^{(0)}\) with its different processing stages \(x^{(l)}\) and the respective changes in each component of the data \({\left( x^{(l)}_{gh}\right) }\). The double index notation is applied to account for different axes in the data as in time series (different sensors and time points) or images.

  4. With \(n_{k+1}:=1\) it holds that \(\frac{\partial F_l}{\partial y^{(l)}}\in \mathbb {R}^{n_l\times n_{l+1}}\) and the dimensions of \(B_l\) are a consequence of the recursion. Another reason for the dimensions of \(B_l\) is that \(B_l\) corresponds to the mapping of \(x^{(l)}\) to the scalar output \(x^{\text {out}}\).

  5. Note that no matrix inversion is required. One might expect otherwise, because the goal of finding out what the original mapping does with the data sounds like an inverse approach.

  6. A weighted sum of classifiers preserves linearity/differentiability. A majority vote results in a non-differentiable classifier, but when the score is the sum of the voters for the selected class, the resulting function is still locally linear/differentiable.

  7. http://pyspace.github.io/pyspace/.

  8. Nevertheless, the resulting graphics look reasonable.

  9. A standard extended 10–20 electrode layout has been chosen with 128 electrodes: http://www.brainproducts.com/filedownload.php?path=downloads/actiCAP-128-channel-Standard-2_1201.
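The first claim of Note 6 can be checked with a small sketch. The classifier weights, the combination weights `alphas`, and the helper names are hypothetical and chosen only for illustration; the point is that a weighted sum of affine classifiers is itself a single affine classifier, so it fits into an affine processing chain.

```python
from typing import List, Sequence, Tuple

AffineClassifier = Tuple[List[float], float]  # (weight vector w, offset b)


def score(clf: AffineClassifier, x: Sequence[float]) -> float:
    """Decision value <w, x> + b of one affine classifier."""
    w, b = clf
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def weighted_sum(
    classifiers: Sequence[AffineClassifier], alphas: Sequence[float]
) -> AffineClassifier:
    """Merge affine classifiers into one: w = sum_k a_k w_k, b = sum_k a_k b_k."""
    dim = len(classifiers[0][0])
    w = [
        sum(a * clf[0][j] for a, clf in zip(alphas, classifiers))
        for j in range(dim)
    ]
    b = sum(a * clf[1] for a, clf in zip(alphas, classifiers))
    return w, b


# Hypothetical ensemble of three affine classifiers on two features.
classifiers = [([1.0, -1.0], 0.5), ([0.0, 2.0], -1.0), ([-0.5, 0.5], 0.0)]
alphas = [0.5, 0.25, 0.25]

combined = weighted_sum(classifiers, alphas)

x = [2.0, 3.0]
ensemble_score = sum(a * score(clf, x) for a, clf in zip(alphas, classifiers))
combined_score = score(combined, x)
```

Both scores agree for any input `x`, which is why the merged pair `(w, b)` can stand in for the whole ensemble. The majority-vote case from the same note is not covered by this sketch.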

Acknowledgments

The authors thank David Feess, Marc Tabie, Anett Seeland, Frank Kirchner, Su Kyoung Kim, Hendrik Wöhrle, and Bertold Bongardt for highly valuable discussions and input. This work was supported by the German Federal Ministry of Economics and Technology (BMWi, Grants FKZ 50 RA 1012 and FKZ 50 RA 1011).

Author information

Corresponding author

Correspondence to Mario Michael Krell.

About this article

Cite this article

Krell, M.M., Straube, S. Backtransformation: a new representation of data processing chains with a scalar decision function. Adv Data Anal Classif 11, 415–439 (2017). https://doi.org/10.1007/s11634-015-0229-3

  • DOI: https://doi.org/10.1007/s11634-015-0229-3
