# Backtransformation: a new representation of data processing chains with a scalar decision function

• Regular Article

## Abstract

Data processing often transforms a complex signal into a single value, the outcome of a final decision function, by applying a sequence of different preprocessing algorithms. Still, it is challenging to understand and visualize the interplay between the algorithms performing this transformation. Especially when dimensionality reduction is used, the original data structure (e.g., spatio-temporal information) is hidden from subsequent algorithms. To tackle this problem, we introduce the backtransformation concept, which treats the combination of algorithms as one transformation that maps the original input signal to a single value. To this end, it takes the derivative of the final decision function and transforms it back through the previous processing steps via backward iteration and the chain rule. The resulting derivative of the composed decision function at the sample of interest represents the complete decision process. Using it for visualizations might improve the understanding of the process. Often, it is possible to construct a feasible processing chain from affine mappings, which greatly simplifies both the calculation of the backtransformation and the interpretation of the result. In this case, the affine backtransformation provides the complete parameterization of the processing chain. This article introduces the theory, provides implementation guidelines, and presents three application examples.
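The affine case described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `forward` and `affine_backtransformation`, the column-vector convention $$x \mapsto Ax + b$$, and the random example chain are all assumptions made for illustration. The sketch collapses a chain of affine stages followed by a linear decision function into one weight vector and one offset by iterating backward through the stages.

```python
import numpy as np


def forward(stages, w, c, x):
    """Run the chain stage by stage: affine maps, then a linear decision function."""
    for A, b in stages:
        x = A @ x + b
    return float(w @ x + c)


def affine_backtransformation(stages, w, c):
    """Collapse the whole chain into one weight vector and one offset.

    Uses w^T (A x + b) + c = (A^T w)^T x + (w^T b + c), applied from the
    last stage back to the first (hypothetical helper, not from the article).
    """
    w_total = np.asarray(w, dtype=float)
    c_total = float(c)
    for A, b in reversed(stages):
        c_total += float(w_total @ b)  # offset uses the weights before the update
        w_total = A.T @ w_total        # pull the weights back through this stage
    return w_total, c_total


# Assumed example chain: 6 inputs -> 4 features -> 3 features -> scalar score.
rng = np.random.default_rng(0)
stages = [
    (rng.normal(size=(4, 6)), rng.normal(size=4)),
    (rng.normal(size=(3, 4)), rng.normal(size=3)),
]
w, c = rng.normal(size=3), 0.5
x = rng.normal(size=6)

w_total, c_total = affine_backtransformation(stages, w, c)
direct = forward(stages, w, c, x)            # stage-by-stage evaluation
collapsed = float(w_total @ x + c_total)     # single affine map on the raw input
```

Under these assumptions, `w_total` has one entry per component of the original input, so it can be reshaped to the layout of the raw data (for example sensors by time points) and inspected directly.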

## Notes

1. Further methods are presented but they are tailored to functional magnetic resonance imaging (fMRI) data.

2. The respective derivatives are the same for every sample, i.e., they do not depend on the sample.

3. The notation of data and its components differs from the notation in classification tasks. Here, we look at one data sample $$x^{(0)}$$ with its different processing stages $$x^{(l)}$$ and the respective changes in each component of the data $${\left( x^{(l)}_{gh}\right) }$$. The double index notation is applied to account for different axes in the data as in time series (different sensors and time points) or images.

4. With $$n_{k+1}:=1$$ it holds that $$\frac{\partial F_l}{\partial y^{(l)}}\in \mathbb {R}^{n_l\times n_{l+1}}$$ and the dimensions of $$B_l$$ are a consequence of the recursion. Another reason for the dimensions of $$B_l$$ is that $$B_l$$ corresponds to the mapping of $$x^{(l)}$$ to the scalar output $$x^{\text {out}}$$.

5. Note that no matrix inversion is required, even though one might expect it: the goal is to find out what the original mapping does with the data, which sounds like an inverse problem.

6. A weighted sum of classifiers preserves linearity/differentiability. A majority vote will result in a non-differentiable classifier but when the score is the sum of the voters for the selected class, the resulting function will still be locally linear/differentiable.

7. Nevertheless, the resulting graphics look reasonable.
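
Notes 4 and 5 refer to a backward recursion that multiplies derivatives of the individual stages and needs no matrix inversion. The following sketch shows one way such a recursion could look for a chain with a nonlinear stage. It is an illustration under stated assumptions, not the article's algorithm: the stage representation as `(function, jacobian)` pairs, the names `chain_output` and `local_backtransformation`, and the example chain are hypothetical, and the Jacobians are stored as $$n_{l+1}\times n_l$$ matrices, so a transpose appears where Note 4 uses the $$n_l\times n_{l+1}$$ layout.

```python
import numpy as np


def chain_output(stages, x):
    """Evaluate the full chain; the last stage returns an array of shape (1,)."""
    for f, _ in stages:
        x = f(x)
    return x.item()


def local_backtransformation(stages, x):
    """Derivative of the scalar output at sample x via backward iteration."""
    # Forward pass: remember the input of every stage.
    inputs = []
    for f, _ in stages:
        inputs.append(x)
        x = f(x)
    # Backward pass: chain rule, only matrix products, no inversion.
    g = np.ones(1)
    for (_, jac), x_l in zip(reversed(stages), reversed(inputs)):
        g = jac(x_l).T @ g
    return g


# Assumed example chain: affine map, elementwise tanh, linear decision function.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(4, 5)), rng.normal(size=4)
W, c = rng.normal(size=(1, 4)), 0.1
stages = [
    (lambda v: A @ v + b, lambda v: A),
    (np.tanh, lambda v: np.diag(1.0 - np.tanh(v) ** 2)),
    (lambda v: W @ v + c, lambda v: W),
]
x0 = rng.normal(size=5)

grad = local_backtransformation(stages, x0)

# Central finite differences as an independent check of the recursion.
eps = 1e-6
numeric = np.array([
    (chain_output(stages, x0 + eps * e) - chain_output(stages, x0 - eps * e)) / (2 * eps)
    for e in np.eye(x0.size)
])
```

In this sketch the intermediate vector `g` always has the dimension of the current stage's input, which mirrors the remark in Note 4 that each $$B_l$$ maps $$x^{(l)}$$ to the scalar output.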

## Acknowledgments

The authors thank David Feess, Marc Tabie, Anett Seeland, Frank Kirchner, Su Kyoung Kim, Hendrik Wöhrle, and Bertold Bongardt for highly valuable discussions and input. This work was supported by the German Federal Ministry of Economics and Technology (BMWi, Grants FKZ 50 RA 1012 and FKZ 50 RA 1011).

## Author information

### Corresponding author

Correspondence to Mario Michael Krell.

## Rights and permissions

Krell, M.M., Straube, S. Backtransformation: a new representation of data processing chains with a scalar decision function. Adv Data Anal Classif 11, 415–439 (2017). https://doi.org/10.1007/s11634-015-0229-3

• DOI: https://doi.org/10.1007/s11634-015-0229-3