
Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces

  • Conference paper
  • In: Advances in Dynamics, Optimization and Computation (SON 2020)

Abstract

Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes rule and build sequential data models. It was recently shown that transfer operators such as the Perron–Frobenius or Koopman operator can also be approximated in a similar fashion using covariance and cross-covariance operators and that eigenfunctions of these operators can be obtained by solving associated matrix eigenvalue problems. The goal of this paper is to provide a solid functional analytic foundation for the eigenvalue decomposition of RKHS operators and to extend the approach to the singular value decomposition. The results are illustrated with simple guiding examples.
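
As a simple illustration of the last point (a minimal sketch with explicit finite-dimensional feature maps, not code from the paper; the feature maps phi and psi below are hypothetical choices), the singular values of an empirical cross-covariance operator \( \tfrac{1}{n} \Phi \Psi ^\top \) can be computed purely from Gram matrices, since the nonzero eigenvalues of \( G_X G_Y = (\Phi ^\top \Phi )(\Psi ^\top \Psi ) \) are the squared singular values of \( \Phi \Psi ^\top \):

    import numpy as np

    rng = np.random.default_rng(42)
    n = 50
    X, Y = rng.standard_normal((2, n))

    # Explicit polynomial feature maps (hypothetical, for illustration only).
    phi = lambda x: np.stack([np.ones_like(x), x, x**2])  # 3 x n feature matrix
    psi = lambda y: np.stack([np.ones_like(y), y])        # 2 x n feature matrix
    Phi, Psi = phi(X), psi(Y)

    # Direct SVD of the empirical cross-covariance operator (a 3 x 2 matrix here).
    direct = np.linalg.svd(Phi @ Psi.T / n, compute_uv=False)

    # The same singular values obtained from Gram matrices alone.
    G_X, G_Y = Phi.T @ Phi, Psi.T @ Psi
    via_gram = np.sqrt(np.sort(np.linalg.eigvals(G_X @ G_Y).real)[::-1][:2]) / n
    print(np.allclose(direct, via_gram))  # True

The Gram-matrix route is what makes this approach practical for genuine RKHSs, where \( \Phi \) cannot be formed explicitly.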


Notes

  1. Given a continuous kernel k on a compact domain, Mercer's theorem allows for a series representation of the form \( k(x,y) = \sum _{i \in I} \lambda _i\, e_i(x)\, e_i(y) \), see, e.g., [5]. In particular, \( \{ \sqrt{\lambda _i}\, e_i \}_{i \in I} \) forms an (at most countable) orthonormal system in \( \mathscr {H} \). The Mercer feature space can be constructed by computing eigenfunctions of the operator \( \mathcal {E}_k \) introduced below; a numerical sketch of this construction follows after these notes.

  2. For a d-dimensional state space, the polynomial kernel with degree p spans a \( \binom{p+d}{p} \)-dimensional feature space [19]. For instance, for \( d = 2 \) and \( p = 2 \), this yields the \( \binom{4}{2} = 6 \) monomials \( 1, x_1, x_2, x_1^2, x_1 x_2, x_2^2 \).

  3. For a detailed introduction to covariance and cross-covariance operators, see Sect. 4.
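
To complement note 1, here is a minimal numerical sketch (not from the paper) of the Mercer construction: the eigenvalues and eigenfunctions of the integral operator \( \mathcal {E}_k \) are approximated via the Gram matrix of uniform samples, in the spirit of the Nyström method. The Gaussian kernel, bandwidth, and domain below are assumptions made for illustration.

    import numpy as np

    # Approximate eigenpairs of (E_k f)(x) = int_0^1 k(x, y) f(y) dy
    # from n uniform samples: the eigenvalues of G / n approximate the
    # Mercer eigenvalues lambda_i as n grows.
    k = lambda x, y: np.exp(-(x - y) ** 2 / 0.1)  # assumed Gaussian kernel
    n = 200
    x = np.linspace(0, 1, n)
    G = k(x[:, None], x[None, :])

    lam, V = np.linalg.eigh(G / n)
    lam, V = lam[::-1], V[:, ::-1] * np.sqrt(n)  # descending order; e_i(x_j) ~ sqrt(n) V[j, i]
    print(lam[:5])  # leading Mercer eigenvalue estimates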

References

  1. Reed, M., Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis, 2nd edn. Academic Press Inc., Cambridge (1980)

  2. Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)

  3. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge (2001)

  4. Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, Berlin (2004)

  5. Steinwart, I., Christmann, A.: Support Vector Machines. Springer, Heidelberg (2008)

  6. Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: Proceedings of the 18th International Conference on Algorithmic Learning Theory, pp. 13–31. Springer (2007)

  7. Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)

  8. Song, L., Huang, J., Smola, A., Fukumizu, K.: Hilbert space embeddings of conditional distributions with applications to dynamical systems. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 961–968 (2009)

  9. Grünewälder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., Pontil, M.: Conditional mean embeddings as regressors. In: International Conference on Machine Learning, vol. 5 (2012)

  10. Klebanov, I., Schuster, I., Sullivan, T.J.: A rigorous theory of conditional mean embeddings (2019)

  11. Park, J., Muandet, K.: A measure-theoretic approach to kernel conditional mean embeddings (2020)

  12. Fukumizu, K., Song, L., Gretton, A.: Kernel Bayes’ rule: Bayesian inference with positive definite kernels. J. Mach. Learn. Res. 14, 3753–3783 (2013)

  13. Fukumizu, K.: Nonparametric Bayesian inference with kernel mean embedding. In: Peters, G., Matsui, T. (eds.) Modern Methodology and Applications in Spatial-Temporal Modeling (2017)

  14. Klus, S., Schuster, I., Muandet, K.: Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces. J. Nonlinear Sci. 30, 283–315 (2019)

  15. Klus, S., Husic, B.E., Mollenhauer, M., Noé, F.: Kernel methods for detecting coherent structures in dynamical data. Chaos Interdiscip. J. Nonlinear Sci. 29(12), 123112 (2019)

  16. Koltai, P., Wu, H., Noé, F., Schütte, C.: Optimal data-driven estimation of generalized Markov state models for non-equilibrium dynamics. Computation 6(1), 22 (2018)

  17. Weidmann, J.: Lineare Operatoren in Hilberträumen, 3rd edn. Teubner, Stuttgart (1976)

  18. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)

  19. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

  20. Kato, T.: Perturbation Theory for Linear Operators. Springer, Berlin (1980)

  21. Eubank, R., Hsing, T.: Theoretical Foundations of Functional Data Analysis with an Introduction to Linear Operators, 1st edn. Wiley, New York (2015)

  22. Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Berlin (1996)

  23. Baker, C.: Joint measures and cross-covariance operators. Trans. Am. Math. Soc. 186, 273–289 (1973)

  24. Lever, G., Shawe-Taylor, J., Stafford, R., Szepesvári, C.: Compressed conditional mean embeddings for model-based reinforcement learning. In: Association for the Advancement of Artificial Intelligence (AAAI), pp. 1779–1787 (2016)

  25. Stafford, R., Shawe-Taylor, J.: ACCME: actively compressed conditional mean embeddings for model-based reinforcement learning. In: European Workshop on Reinforcement Learning 14 (2018)

  26. Gebhardt, G.H.W., Daun, K., Schnaubelt, M., Neumann, G.: Learning robust policies for object manipulation with robot swarms. In: IEEE International Conference on Robotics and Automation (2018)

  27. Schuster, I., Mollenhauer, M., Klus, S., Muandet, K.: Kernel conditional density operators. In: The 23rd International Conference on Artificial Intelligence and Statistics (2020, accepted for publication)

  28. Lasota, A., Mackey, M.C.: Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathematical Sciences, vol. 97, 2nd edn. Springer, Heidelberg (1994)

  29. Mezić, I.: Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 41(1), 309–325 (2005)

  30. Klus, S., Nüske, F., Koltai, P., Wu, H., Kevrekidis, I., Schütte, C., Noé, F.: Data-driven model reduction and transfer operator approximation. J. Nonlinear Sci. 28, 985–1010 (2018)

  31. Melzer, T., Reiter, M., Bischof, H.: Nonlinear feature extraction using generalized canonical correlation analysis. In: Dorffner, G., Bischof, H., Hornik, K. (eds.) Artificial Neural Networks – ICANN 2001, pp. 353–360. Springer, Heidelberg (2001)

  32. Froyland, G., Padberg-Gehle, K.: Almost-invariant and finite-time coherent sets: directionality, duration, and diffusion. In: Bahsoun, W., Bose, C., Froyland, G. (eds.) Ergodic Theory, Open Dynamics, and Coherent Structures, pp. 171–216. Springer, New York (2014)


Acknowledgements

M. M., S. K., and C. S. were funded by Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 (Scaling Cascades in Complex Systems, project ID: 235221301) and through Germany's Excellence Strategy (MATH+: The Berlin Mathematics Research Center, EXC-2046/1, project ID: 390685689). We would like to thank Ilja Klebanov for proofreading the manuscript and valuable suggestions for improvements.

Author information

Corresponding author

Correspondence to Stefan Klus.

A Appendix

A.1 Proof of Block SVD

Proof

(Lemma 2). Let A admit the SVD given in (2). Then by the definition of T, we have

$$\begin{aligned} T (\pm u_i , v_i) = (A v_i ,\, \pm A^* u_i) = (\sigma _i u_i,\, \pm \sigma _i v_i) = \pm \sigma _i\, (\pm u_i, v_i) \end{aligned}$$

for all \(i \in I\). For any element \((f,h) \in \operatorname{span}\{(\pm u_i , v_i)\}_{i \in I}^{\perp }\), we can immediately deduce

$$\begin{aligned} 0 = \left\langle (f,h),\, (\pm u_i,v_i) \right\rangle _{\oplus } = \pm \left\langle f,\, u_i \right\rangle _F + \left\langle h,\, v_i \right\rangle _H \end{aligned}$$

for all \(i \in I\) and hence \(f \in \operatorname{span}\{u_i \}_{i \in I}^\perp \) and \(h \in \operatorname{span}\{v_i \}_{i \in I}^\perp \). Using the SVD of A in (2), we therefore have

$$\begin{aligned} T(f, h) = (A h,\, A^* f) = \Big( \sum _{i \in I} \sigma _i \left\langle v_i,\, h \right\rangle _H u_i,\ \sum _{i \in I} \sigma _i \left\langle u_i,\, f \right\rangle _F v_i \Big) = (0, 0). \end{aligned}$$
It now remains to show that \(\left\{ \tfrac{1}{\sqrt{2}} (\pm u_i,v_i) \right\} _{i \in I}\) is an orthonormal system in \(F \oplus H\), which is clear since \(\left\langle (\pm u_i,v_i),\, ( \pm u_j,v_j) \right\rangle _{\oplus } = 2\,\delta _{ij}\) and \(\left\langle (-u_i,v_i),\, (u_j,v_j) \right\rangle _{\oplus } = 0\) for all \(i,j \in I\). Hence, T has the form (3), as claimed.    \(\square \)
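
The lemma can also be checked numerically in the matrix case (a sketch, not part of the paper): for a matrix A with singular triples \( (\sigma _i, u_i, v_i) \), the block matrix representing T has eigenvalues \( \pm \sigma _i \) with eigenvectors \( \tfrac{1}{\sqrt{2}} (\pm u_i, v_i) \).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 4))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Block matrix of T(f, h) = (A h, A* f) acting on F + H.
    m, n = A.shape
    T = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])

    # Each (+-u_i, v_i) / sqrt(2) is an eigenvector with eigenvalue +-sigma_i.
    for i in range(len(s)):
        for sign in (+1, -1):
            w = np.concatenate((sign * U[:, i], Vt[i])) / np.sqrt(2)
            assert np.allclose(T @ w, sign * s[i] * w)
    print("block eigenpairs match the SVD of A")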

A.2 Derivation of the Empirical CCA Operator

The claim follows directly when we can show the identity

$$\begin{aligned} (\Phi \Phi ^\top )^{-1/2}\, \Phi = \Phi \, (\Phi ^\top \Phi )^{-1/2} \end{aligned}$$

and its analogue for the feature map \(\Psi \). Let \(\Phi ^\top \Phi = U \Lambda U^\top \) be the eigendecomposition of the Gramian. We know that in this case we have the SVD of the operator \( \Phi \Phi ^\top = \sum _{i \in I} \lambda _i (\lambda _i^{-1/2}\Phi u_i) \otimes (\lambda _i^{-1/2} \Phi u_i) \), since

$$\begin{aligned} \left\langle \lambda _i^{-1/2} \Phi u_i,\, \lambda _j^{-1/2} \Phi u_j \right\rangle = (\lambda _i \lambda _j)^{-1/2}\, u_i^\top \Phi ^\top \Phi \, u_j = (\lambda _i \lambda _j)^{-1/2}\, \lambda _j\, u_i^\top u_j = \delta _{ij}. \end{aligned}$$

We will write this operator SVD for simplicity as \(\Phi \Phi ^\top = (\Phi U \Lambda ^{-1/2}) \Lambda (\Lambda ^{-1/2} U^\top \Phi ^\top )\) with an abuse of notation. Note that we can express the inverted operator square root elegantly in this form as \((\Phi \Phi ^\top )^{-1/2} = (\Phi U \Lambda ^{-1/2}) \Lambda ^{-1/2} (\Lambda ^{-1/2} U^\top \Phi ^\top ) = (\Phi U) \Lambda ^{-3/2} (U^\top \Phi ^\top ) \). Therefore, we immediately get

$$\begin{aligned} (\Phi \Phi ^\top )^{-1/2}\, \Phi = (\Phi U) \Lambda ^{-3/2} (U^\top \Phi ^\top ) \Phi = \Phi U \Lambda ^{-3/2} U^\top (U \Lambda U^\top ) = \Phi U \Lambda ^{-1/2} U^\top = \Phi \, (\Phi ^\top \Phi )^{-1/2}, \end{aligned}$$

which proves the claim. In the regularized case, all operations work the same with an additional \(\epsilon \)-shift of the eigenvalues, i.e., the matrix \(\Lambda \) is replaced with the regularized version \(\Lambda + \epsilon \mathrm {I}\).
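
The identity (including its regularized variant) is easy to verify numerically in finite dimensions (a sketch under the assumption of an explicit D x n feature matrix; not code from the paper):

    import numpy as np
    from scipy.linalg import fractional_matrix_power as mpow

    rng = np.random.default_rng(1)
    D, n, eps = 8, 5, 1e-3
    Phi = rng.standard_normal((D, n))

    # Regularized identity:
    # (Phi Phi^T + eps I)^(-1/2) Phi  =  Phi (Phi^T Phi + eps I)^(-1/2).
    lhs = mpow(Phi @ Phi.T + eps * np.eye(D), -0.5) @ Phi
    rhs = Phi @ mpow(Phi.T @ Phi + eps * np.eye(n), -0.5)
    print(np.allclose(lhs, rhs))  # True

The regularized version is the numerically safe one when \( D > n \), since \( \Phi \Phi ^\top \) is then rank deficient and its unregularized inverse square root exists only on its range.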


Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Mollenhauer, M., Schuster, I., Klus, S., Schütte, C. (2020). Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces. In: Junge, O., Schütze, O., Froyland, G., Ober-Blöbaum, S., Padberg-Gehle, K. (eds) Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol 304. Springer, Cham. https://doi.org/10.1007/978-3-030-51264-4_5

  • DOI: https://doi.org/10.1007/978-3-030-51264-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-51263-7

  • Online ISBN: 978-3-030-51264-4

