Abstract
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes' rule and build sequential data models. It was recently shown that transfer operators such as the Perron–Frobenius or Koopman operator can also be approximated in a similar fashion using covariance and cross-covariance operators and that eigenfunctions of these operators can be obtained by solving associated matrix eigenvalue problems. The goal of this paper is to provide a solid functional analytic foundation for the eigenvalue decomposition of RKHS operators and to extend the approach to the singular value decomposition. The results are illustrated with simple guiding examples.
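As a minimal guiding sketch of the approach summarized above (an illustration, not the paper's implementation), the following estimates Koopman eigenvalues from Gram matrices of snapshot data via a kernel-EDMD-type matrix eigenvalue problem in the spirit of [14]; the toy dynamics, kernel, bandwidth, and regularization \(\epsilon\) are all assumptions made here.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code) of a kernel-based
# transfer operator eigendecomposition: Koopman eigenvalues are estimated
# from Gram matrices of snapshot pairs (x_i, y_i). Toy dynamics, kernel
# bandwidth, and the regularization eps are assumptions.
rng = np.random.default_rng(0)
n, eps = 300, 1e-4
x = rng.uniform(-2.0, 2.0, n)                 # initial states
y = 0.8 * x + 0.1 * rng.standard_normal(n)    # states one time step later

k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2)  # Gaussian kernel
G_xx = k(x, x)                                # Gramian [k(x_i, x_j)]
G_xy = k(x, y)                                # time-lagged Gramian [k(x_i, y_j)]

# Associated matrix eigenvalue problem for the empirical Koopman operator.
K_hat = np.linalg.solve(G_xx + eps * np.eye(n), G_xy)
evals = np.linalg.eigvals(K_hat)
print(np.sort(evals.real)[::-1][:4])          # leading eigenvalues, roughly 0.8**m
```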
Notes
1. Given a continuous kernel \(k\) on a compact domain, Mercer's theorem allows for a series representation of the form \(k(x, x') = \sum_{i \in I} \lambda_i\, e_i(x)\, e_i(x')\), see, e.g., [5]. In particular, \(\{\lambda_i^{1/2} e_i\}_{i \in I}\) forms an (at most countable) orthonormal system in \( \mathscr{H} \). The Mercer feature space can be constructed by computing eigenfunctions of the operator \( \mathcal{E}_k \) introduced below; see also the numerical sketch after these notes.
2. For a d-dimensional state space, the polynomial kernel with degree p spans a \( \binom{p+d}{p} \)-dimensional feature space [19]; see the counting check after these notes.
3. For a detailed introduction to covariance and cross-covariance operators, see Sect. 4.
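The construction mentioned in Note 1 can be mimicked numerically. The following Nyström-type sketch (an assumption, not the paper's construction) approximates eigenpairs of the integral operator \( \mathcal{E}_k \) on the compact domain \([0,1]\) by eigendecomposing the scaled Gram matrix; kernel, bandwidth, and grid size are illustrative choices.

```python
import numpy as np

# Nystroem-type sketch for Note 1: eigenpairs of the integral operator E_k
# on [0, 1] are approximated via the scaled Gram matrix on a uniform grid.
k = lambda a, b: np.exp(-(a - b) ** 2 / 0.1)  # Gaussian kernel, ad hoc bandwidth
n = 500
x = np.linspace(0.0, 1.0, n)
G = k(x[:, None], x[None, :])

# (E_k f)(x) = int k(x, y) f(y) dy  ~  (1/n) sum_j k(x, x_j) f(x_j),
# so the eigenvalues of G/n approximate the Mercer eigenvalues lambda_i.
lam, V = np.linalg.eigh(G / n)
lam, V = lam[::-1], V[:, ::-1]                # sort in descending order
e = np.sqrt(n) * V                            # e[:, i] ~ eigenfunction e_i on the grid
print(lam[:5])                                # leading Mercer eigenvalues
```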
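For Note 2, the dimension formula can be checked directly by counting the monomials of total degree at most \(p\) in \(d\) variables; the values of \(d\) and \(p\) below are arbitrary.

```python
import math
from itertools import combinations_with_replacement

# Check of the feature-space dimension from Note 2: the polynomial kernel of
# degree p on a d-dimensional space spans the monomials of total degree <= p,
# of which there are binom(p + d, p). Here d = 3, p = 4 are arbitrary choices.
d, p = 3, 4
by_formula = math.comb(p + d, p)
by_counting = sum(1 for deg in range(p + 1)
                  for _ in combinations_with_replacement(range(d), deg))
print(by_formula, by_counting)  # 35 35
```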
References
Reed, M., Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis, 2nd edn. Academic Press Inc., Cambridge (1980)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge (2001)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, Berlin (2004)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, Heidelberg (2008)
Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: Proceedings of the 18th International Conference on Algorithmic Learning Theory, pp. 13–31. Springer (2007)
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)
Song, L., Huang, J., Smola, A., Fukumizu, K.: Hilbert space embeddings of conditional distributions with applications to dynamical systems. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 961–968 (2009)
Grünewälder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., Pontil, M.: Conditional mean embeddings as regressors. In: International Conference on Machine Learning, vol. 5 (2012)
Klebanov, I., Schuster, I., Sullivan, T.J.: A rigorous theory of conditional mean embeddings (2019)
Park, J., Muandet, K.: A measure-theoretic approach to kernel conditional mean embeddings (2020)
Fukumizu, K., Song, L., Gretton, A.: Kernel Bayes’ rule: Bayesian inference with positive definite kernels. J. Mach. Learn. Res. 14, 3753–3783 (2013)
Fukumizu, K.: Nonparametric Bayesian inference with kernel mean embedding. In: Peters, G., Matsui, T. (eds.) Modern Methodology and Applications in Spatial-Temporal Modeling (2017)
Klus, S., Schuster, I., Muandet, K.: Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces. J. Nonlinear Sci. 30, 283–315 (2019)
Klus, S., Husic, B.E., Mollenhauer, M., Noé, F.: Kernel methods for detecting coherent structures in dynamical data. Chaos Interdiscip. J. Nonlinear Sci. 29(12), 123112 (2019)
Koltai, P., Wu, H., Noé, F., Schütte, C.: Optimal data-driven estimation of generalized Markov state models for non-equilibrium dynamics. Computation 6(1), 22 (2018)
Weidmann, J.: Lineare Operatoren in Hilberträumen, 3rd edn. Teubner, Stuttgart (1976)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Kato, T.: Perturbation Theory for Linear Operators. Springer, Berlin (1980)
Eubank, R., Hsing, T.: Theoretical Foundations of Functional Data Analysis with an Introduction to Linear Operators, 1st edn. Wiley, New York (2015)
Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Berlin (1996)
Baker, C.: Joint measures and cross-covariance operators. Trans. Am. Math. Soc. 186, 273–289 (1973)
Lever, G., Shawe-Taylor, J., Stafford, R., Szepesvári, C.: Compressed conditional mean embeddings for model-based reinforcement learning. In: Association for the Advancement of Artificial Intelligence (AAAI), pp. 1779–1787 (2016)
Stafford, R., Shawe-Taylor, J.: ACCME: actively compressed conditional mean embeddings for model-based reinforcement learning. In: European Workshop on Reinforcement Learning 14 (2018)
Gebhardt, G.H.W., Daun, K., Schnaubelt, M., Neumann, G.: Learning robust policies for object manipulation with robot swarms. In: IEEE International Conference on Robotics and Automation (2018)
Schuster, I., Mollenhauer, M., Klus, S., Muandet, K.: Kernel conditional density operators. In: The 23rd International Conference on Artificial Intelligence and Statistics (2020, accepted for publication)
Lasota, A., Mackey, M.C.: Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathematical Sciences, vol. 97, 2nd edn. Springer, Heidelberg (1994)
Mezić, I.: Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 41(1), 309–325 (2005)
Klus, S., Nüske, F., Koltai, P., Wu, H., Kevrekidis, I., Schütte, C., Noé, F.: Data-driven model reduction and transfer operator approximation. J. Nonlinear Sci. 28, 985–1010 (2018)
Melzer, T., Reiter, M., Bischof, H.: Nonlinear feature extraction using generalized canonical correlation analysis. In: Dorffner, G., Bischof, H., Hornik, K. (eds.) Artificial Neural Networks – ICANN 2001, pp. 353–360. Springer, Heidelberg (2001)
Froyland, G., Padberg-Gehle, K.: Almost-invariant and finite-time coherent sets: directionality, duration, and diffusion. In: Bahsoun, W., Bose, C., Froyland, G. (eds.) Ergodic Theory, Open Dynamics, and Coherent Structures, pp. 171–216. Springer, New York (2014)
Acknowledgements
M. M., S. K., and C. S. were funded by Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 (Scaling Cascades in Complex Systems, project ID: 235221301) and through Germany's Excellence Strategy (MATH+: The Berlin Mathematics Research Center, EXC-2046/1, project ID: 390685689). We would like to thank Ilja Klebanov for proofreading the manuscript and valuable suggestions for improvements.
A Appendix
A.1 Proof of Block SVD
Proof
(Lemma 2). Let \(A\) admit the SVD given in (2). Then, by the definition of \(T\), we have
\[ T(\pm u_i, v_i) = (A v_i, \pm A^* u_i) = (\sigma_i u_i, \pm \sigma_i v_i) = \pm \sigma_i\, (\pm u_i, v_i) \]
for all \(i \in I\). For any element \((f,h) \in \operatorname{span}\{(\pm u_i , v_i)\}_{i \in I}^{\perp}\), we can immediately deduce
\[ 0 = \left\langle (f,h),\, (\pm u_i, v_i) \right\rangle_{\oplus} = \pm \left\langle f, u_i \right\rangle_F + \left\langle h, v_i \right\rangle_H \]
for all \(i \in I\) and hence \(f \in \operatorname{span}\{u_i \}_{i \in I}^\perp \) and \(h \in \operatorname{span}\{v_i \}_{i \in I}^\perp \). Using the SVD of \(A\) in (2), we therefore have
\[ T(f, h) = (A h, A^* f) = (0, 0). \]
It now remains to show that \(\left\{ \tfrac{1}{\sqrt{2}} (\pm u_i,v_i) \right\} _{i \in I}\) is an orthonormal system in \(F \oplus H\), which is clear since \(\left\langle (\pm u_i,v_i),\, ( \pm u_j,v_j) \right\rangle _{\oplus } = 2\,\delta _{ij}\) and \(\left\langle (-u_i,v_i),\, (u_j,v_j) \right\rangle _{\oplus } = 0\) for all \(i,j \in I\). Concluding, T has the form (3) as claimed. \(\square \)
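In finite dimensions, the lemma is easy to verify numerically. The following sketch (matrix sizes and data are arbitrary assumptions) assembles \(T\) as the symmetric block matrix with off-diagonal blocks \(A\) and \(A^\top\) and checks that its nonzero eigenvalues are the pairs \(\pm \sigma_i\) with eigenvectors \(\tfrac{1}{\sqrt{2}}(\pm u_i, v_i)\).

```python
import numpy as np

# Finite-dimensional check of the block SVD construction: the operator
# T(f, h) = (A h, A^T f) has eigenvalues +/- sigma_i with eigenvectors
# (1/sqrt(2)) (+/- u_i, v_i). Sizes m, n and the matrix A are arbitrary.
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

T = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])
evals = np.linalg.eigvalsh(T)

# The 2n largest-magnitude eigenvalues occur in +/- sigma_i pairs.
top = np.sort(np.abs(evals))[::-1][: 2 * n]
print(np.allclose(top[::2], s) and np.allclose(top[1::2], s))  # True

# One eigenpair explicitly: w = (u_1, v_1)/sqrt(2) satisfies T w = sigma_1 w.
w = np.concatenate([U[:, 0], Vt[0]]) / np.sqrt(2)
print(np.allclose(T @ w, s[0] * w))  # True
```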
A.2 Derivation of the Empirical CCA Operator
The claim follows directly when we can show the identity
\[ (\Phi \Phi^\top)^{-1/2}\, \Phi = \Phi\, K_{XX}^{-1/2} \]
and its analogue for the feature map \(\Psi\). Let \(K_{XX} = \Phi^\top \Phi = U \Lambda U^\top\) be the eigendecomposition of the Gramian. We know that in this case we have the SVD of the operator \(\Phi \Phi^\top = \sum_{i \in I} \lambda_i (\lambda_i^{-1/2}\Phi u_i) \otimes (\lambda_i^{-1/2} \Phi u_i)\), since
\[ \left\langle \lambda_i^{-1/2} \Phi u_i,\, \lambda_j^{-1/2} \Phi u_j \right\rangle = (\lambda_i \lambda_j)^{-1/2}\, u_i^\top \Phi^\top \Phi\, u_j = (\lambda_i \lambda_j)^{-1/2} \lambda_j\, u_i^\top u_j = \delta_{ij}. \]
We will write this operator SVD for simplicity as \(\Phi \Phi^\top = (\Phi U \Lambda^{-1/2}) \Lambda (\Lambda^{-1/2} U^\top \Phi^\top)\) with an abuse of notation. Note that we can express the inverted operator square root elegantly in this form as \((\Phi \Phi^\top)^{-1/2} = (\Phi U \Lambda^{-1/2}) \Lambda^{-1/2} (\Lambda^{-1/2} U^\top \Phi^\top) = (\Phi U) \Lambda^{-3/2} (U^\top \Phi^\top) \). Therefore, we immediately get
\[ (\Phi \Phi^\top)^{-1/2}\, \Phi = (\Phi U) \Lambda^{-3/2} (U^\top \Phi^\top) \Phi = \Phi U \Lambda^{-3/2} U^\top (U \Lambda U^\top) = \Phi U \Lambda^{-1/2} U^\top = \Phi K_{XX}^{-1/2}, \]
which proves the claim. In the regularized case, all operations work the same with an additional \(\epsilon\)-shift of the eigenvalues, i.e., the matrix \(\Lambda\) is replaced with the regularized version \(\Lambda + \epsilon \mathrm{I}\).
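As a finite-dimensional sanity check of this identity, including the \(\epsilon\)-shift just described, the following sketch compares both sides for a random feature matrix; the dimensions and \(\epsilon\) below are arbitrary assumptions.

```python
import numpy as np

# Check of (Phi Phi^T + eps I)^{-1/2} Phi = Phi (K_XX + eps I)^{-1/2}
# for a random finite-dimensional feature matrix Phi; D, n, eps are arbitrary.
rng = np.random.default_rng(1)
D, n, eps = 6, 4, 1e-3
Phi = rng.standard_normal((D, n))

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    lam, U = np.linalg.eigh(S)
    return U @ np.diag(lam ** -0.5) @ U.T

lhs = inv_sqrt(Phi @ Phi.T + eps * np.eye(D)) @ Phi
rhs = Phi @ inv_sqrt(Phi.T @ Phi + eps * np.eye(n))
print(np.allclose(lhs, rhs))  # True
```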
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mollenhauer, M., Schuster, I., Klus, S., Schütte, C. (2020). Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces. In: Junge, O., Schütze, O., Froyland, G., Ober-Blöbaum, S., Padberg-Gehle, K. (eds.) Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol. 304. Springer, Cham. https://doi.org/10.1007/978-3-030-51264-4_5