Backtransformation: a new representation of data processing chains with a scalar decision function

  • Mario Michael Krell
  • Sirko Straube
Regular Article


Data processing often transforms a complex signal into a single value, the outcome of a final decision function, using a set of different preprocessing algorithms. Still, it is challenging to understand and visualize the interplay between the algorithms performing this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which treats the combination of algorithms as one transformation that maps the original input signal to a single value. To this end, it takes the derivative of the final decision function and transforms it back through the preceding processing steps via backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process. Using it for visualizations might improve the understanding of the process. Often, it is possible to construct a feasible processing chain from affine mappings, which greatly simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
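
As an illustration of the affine case described in the abstract, the following minimal sketch collapses a chain of affine steps and a linear decision function into one weight vector and one offset by iterating backward through the chain. It is not the authors' implementation: the function name `affine_backtransformation`, the representation of each step as a pair `(A, t)`, the helper `forward`, and the random example data are all assumptions made for this example.

```python
import numpy as np


def affine_backtransformation(steps, w, b):
    """Collapse affine steps plus a linear decision function (hypothetical helper).

    steps: list of (A, t) pairs applied in order, each mapping x -> A @ x + t
    w, b:  final decision function f(z) = w @ z + b
    Returns (w_in, b_in) such that f(chain(x)) == w_in @ x + b_in.
    """
    w_in = np.asarray(w, dtype=float)
    b_in = float(b)
    # Backward iteration: f(A x + t) = (A^T w) x + (w t + b).
    for A, t in reversed(steps):
        b_in += float(w_in @ t)  # offset contributed by this step
        w_in = A.T @ w_in        # chain rule: map the weights back one step
    return w_in, b_in


# Assumed toy chain: 6 input values -> 4 -> 2 -> scalar decision value.
rng = np.random.default_rng(0)
steps = [
    (rng.normal(size=(4, 6)), rng.normal(size=4)),
    (rng.normal(size=(2, 4)), rng.normal(size=2)),
]
w, b = rng.normal(size=2), 0.5


def forward(x):
    """Apply the processing chain step by step, then the decision function."""
    for A, t in steps:
        x = A @ x + t
    return float(w @ x + b)


w_in, b_in = affine_backtransformation(steps, w, b)
x = rng.normal(size=6)
# The collapsed parameters reproduce the chain's output on the raw input.
assert np.isclose(forward(x), w_in @ x + b_in)
```

Because `w_in` has the same shape as the raw input, it could be reshaped to the original data layout (for example, channels by time points) for visualization. For a chain with non-affine steps, the same backward iteration would have to use the derivative of each step at the sample of interest, which this sketch does not cover.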


Affine transformations · Function composition · Processing chain interpretation · Processing chain visualization

Mathematics Subject Classification

68T30 68N99 68W40 



The authors thank David Feess, Marc Tabie, Anett Seeland, Frank Kirchner, Su Kyoung Kim, Hendrik Wöhrle, and Bertold Bongardt for highly valuable discussions and input. This work was supported by the German Federal Ministry of Economics and Technology (BMWi, Grants FKZ 50 RA 1012 and FKZ 50 RA 1011).



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Robotics Lab, Faculty 3 Mathematics and Computer Science, University of Bremen, Bremen, Germany
  2. DFKI Bremen, Robotics Innovation Center, German Research Center for Artificial Intelligence, Bremen, Germany
