Abstract
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes' rule and build sequential data models. It was recently shown that transfer operators such as the Perron–Frobenius or Koopman operator can also be approximated in a similar fashion using covariance and cross-covariance operators and that eigenfunctions of these operators can be obtained by solving associated matrix eigenvalue problems. The goal of this paper is to provide a solid functional analytic foundation for the eigenvalue decomposition of RKHS operators and to extend the approach to the singular value decomposition. The results are illustrated with simple guiding examples.
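As a minimal guiding sketch of the approach summarized above (an illustration, not the paper's implementation), the following estimates Koopman eigenvalues from Gram matrices of snapshot data via a kernel-EDMD-type matrix eigenvalue problem in the spirit of [14]; the toy dynamics, kernel, bandwidth, and regularization \(\epsilon\) are all assumptions made here.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code) of a kernel-based
# transfer operator eigendecomposition: Koopman eigenvalues are estimated
# from Gram matrices of snapshot pairs (x_i, y_i). Toy dynamics, kernel
# bandwidth, and the regularization eps are assumptions.
rng = np.random.default_rng(0)
n, eps = 300, 1e-4
x = rng.uniform(-2.0, 2.0, n)                 # initial states
y = 0.8 * x + 0.1 * rng.standard_normal(n)    # states one time step later

k = lambda a, b: np.exp(-(a[:, None] - b[None, :]) ** 2)  # Gaussian kernel
G_xx = k(x, x)                                # Gramian [k(x_i, x_j)]
G_xy = k(x, y)                                # time-lagged Gramian [k(x_i, y_j)]

# Associated matrix eigenvalue problem for the empirical Koopman operator.
K_hat = np.linalg.solve(G_xx + eps * np.eye(n), G_xy)
evals = np.linalg.eigvals(K_hat)
print(np.sort(evals.real)[::-1][:4])          # leading eigenvalues, roughly 0.8**m
```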
Notes
1. Given a continuous kernel \(k\) on a compact domain, Mercer's theorem allows for a series representation of the form \(k(x, x') = \sum_{i \in I} \lambda_i\, e_i(x)\, e_i(x')\), see, e.g., [5]. In particular, \(\{\lambda_i^{1/2} e_i\}_{i \in I}\) forms an (at most countable) orthonormal system in \( \mathscr{H} \). The Mercer feature space can be constructed by computing eigenfunctions of the operator \( \mathcal{E}_k \) introduced below; see also the numerical sketch after these notes.
2. For a d-dimensional state space, the polynomial kernel with degree p spans a \( \binom{p+d}{p} \)-dimensional feature space [19]; see the counting check after these notes.
3. For a detailed introduction to covariance and cross-covariance operators, see Sect. 4.
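The construction mentioned in Note 1 can be mimicked numerically. The following Nyström-type sketch (an assumption, not the paper's construction) approximates eigenpairs of the integral operator \( \mathcal{E}_k \) on the compact domain \([0,1]\) by eigendecomposing the scaled Gram matrix; kernel, bandwidth, and grid size are illustrative choices.

```python
import numpy as np

# Nystroem-type sketch for Note 1: eigenpairs of the integral operator E_k
# on [0, 1] are approximated via the scaled Gram matrix on a uniform grid.
k = lambda a, b: np.exp(-(a - b) ** 2 / 0.1)  # Gaussian kernel, ad hoc bandwidth
n = 500
x = np.linspace(0.0, 1.0, n)
G = k(x[:, None], x[None, :])

# (E_k f)(x) = int k(x, y) f(y) dy  ~  (1/n) sum_j k(x, x_j) f(x_j),
# so the eigenvalues of G/n approximate the Mercer eigenvalues lambda_i.
lam, V = np.linalg.eigh(G / n)
lam, V = lam[::-1], V[:, ::-1]                # sort in descending order
e = np.sqrt(n) * V                            # e[:, i] ~ eigenfunction e_i on the grid
print(lam[:5])                                # leading Mercer eigenvalues
```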
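For Note 2, the dimension formula can be checked directly by counting the monomials of total degree at most \(p\) in \(d\) variables; the values of \(d\) and \(p\) below are arbitrary.

```python
import math
from itertools import combinations_with_replacement

# Check of the feature-space dimension from Note 2: the polynomial kernel of
# degree p on a d-dimensional space spans the monomials of total degree <= p,
# of which there are binom(p + d, p). Here d = 3, p = 4 are arbitrary choices.
d, p = 3, 4
by_formula = math.comb(p + d, p)
by_counting = sum(1 for deg in range(p + 1)
                  for _ in combinations_with_replacement(range(d), deg))
print(by_formula, by_counting)  # 35 35
```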
References
Reed, M., Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis, 2nd edn. Academic Press Inc., Cambridge (1980)
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge (2001)
Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, Berlin (2004)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer, Heidelberg (2008)
Smola, A., Gretton, A., Song, L., Schölkopf, B.: A Hilbert space embedding for distributions. In: Proceedings of the 18th International Conference on Algorithmic Learning Theory, pp. 13–31. Springer (2007)
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10(1–2), 1–141 (2017)
Song, L., Huang, J., Smola, A., Fukumizu, K.: Hilbert space embeddings of conditional distributions with applications to dynamical systems. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 961–968 (2009)
Grünewälder, S., Lever, G., Baldassarre, L., Patterson, S., Gretton, A., Pontil, M.: Conditional mean embeddings as regressors. In: International Conference on Machine Learning, vol. 5 (2012)
Klebanov, I., Schuster, I., Sullivan, T.J.: A rigorous theory of conditional mean embeddings (2019)
Park, J., Muandet, K.: A measure-theoretic approach to kernel conditional mean embeddings (2020)
Fukumizu, K., Song, L., Gretton, A.: Kernel Bayes’ rule: Bayesian inference with positive definite kernels. J. Mach. Learn. Res. 14, 3753–3783 (2013)
Fukumizu, K.: Nonparametric Bayesian inference with kernel mean embedding. In: Peters, G., Matsui, T. (eds.) Modern Methodology and Applications in Spatial-Temporal Modeling (2017)
Klus, S., Schuster, I., Muandet, K.: Eigendecompositions of transfer operators in reproducing kernel Hilbert spaces. J. Nonlinear Sci. 30, 283–315 (2019)
Klus, S., Husic, B.E., Mollenhauer, M., Noé, F.: Kernel methods for detecting coherent structures in dynamical data. Chaos Interdiscip. J. Nonlinear Sci. 29(12), 123112 (2019)
Koltai, P., Wu, H., Noé, F., Schütte, C.: Optimal data-driven estimation of generalized Markov state models for non-equilibrium dynamics. Computation 6(1), 22 (2018)
Weidmann, J.: Lineare Operatoren in Hilberträumen, 3rd edn. Teubner, Stuttgart (1976)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Kato, T.: Perturbation Theory for Linear Operators. Springer, Berlin (1980)
Eubank, R., Hsing, T.: Theoretical Foundations of Functional Data Analysis with an Introduction to Linear Operators, 1st edn. Wiley, New York (2015)
Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Berlin (1996)
Baker, C.: Joint measures and cross-covariance operators. Trans. Am. Math. Soc. 186, 273–289 (1973)
Lever, G., Shawe-Taylor, J., Stafford, R., Szepesvári, C.: Compressed conditional mean embeddings for model-based reinforcement learning. In: Association for the Advancement of Artificial Intelligence (AAAI), pp. 1779–1787 (2016)
Stafford, R., Shawe-Taylor, J.: ACCME: actively compressed conditional mean embeddings for model-based reinforcement learning. In: European Workshop on Reinforcement Learning 14 (2018)
Gebhardt, G.H.W., Daun, K., Schnaubelt, M., Neumann, G.: Learning robust policies for object manipulation with robot swarms. In: IEEE International Conference on Robotics and Automation (2018)
Schuster, I., Mollenhauer, M., Klus, S., Muandet, K.: Kernel conditional density operators. In: The 23rd International Conference on Artificial Intelligence and Statistics (2020, accepted for publication)
Lasota, A., Mackey, M.C.: Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Applied Mathematical Sciences, vol. 97, 2nd edn. Springer, Heidelberg (1994)
Mezić, I.: Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 41(1), 309–325 (2005)
Klus, S., Nüske, F., Koltai, P., Wu, H., Kevrekidis, I., Schütte, C., Noé, F.: Data-driven model reduction and transfer operator approximation. J. Nonlinear Sci. 28, 985–1010 (2018)
Melzer, T., Reiter, M., Bischof, H.: Nonlinear feature extraction using generalized canonical correlation analysis. In: Dorffner, G., Bischof, H., Hornik, K. (eds.) Artificial Neural Networks – ICANN 2001, pp. 353–360. Springer, Heidelberg (2001)
Froyland, G., Padberg-Gehle, K.: Almost-invariant and finite-time coherent sets: directionality, duration, and diffusion. In: Bahsoun, W., Bose, C., Froyland, G. (eds.) Ergodic Theory, Open Dynamics, and Coherent Structures, pp. 171–216. Springer, New York (2014)
Acknowledgements
M. M., S. K., and C. S. were funded by Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 (Scaling Cascades in Complex Systems, project ID: 235221301) and through Germany's Excellence Strategy (MATH+: The Berlin Mathematics Research Center, EXC-2046/1, project ID: 390685689). We would like to thank Ilja Klebanov for proofreading the manuscript and valuable suggestions for improvements.
A Appendix
A.1 Proof of Block SVD
Proof
(Lemma 2). Let \(A\) admit the SVD given in (2). Then, by the definition of \(T\), we have
\[ T(\pm u_i, v_i) = (A v_i, \pm A^* u_i) = (\sigma_i u_i, \pm \sigma_i v_i) = \pm \sigma_i\, (\pm u_i, v_i) \]
for all \(i \in I\). For any element \((f,h) \in \operatorname{span}\{(\pm u_i , v_i)\}_{i \in I}^{\perp}\), we can immediately deduce
\[ 0 = \left\langle (f,h),\, (\pm u_i, v_i) \right\rangle_{\oplus} = \pm \left\langle f, u_i \right\rangle_F + \left\langle h, v_i \right\rangle_H \]
for all \(i \in I\) and hence \(f \in \operatorname{span}\{u_i \}_{i \in I}^\perp \) and \(h \in \operatorname{span}\{v_i \}_{i \in I}^\perp \). Using the SVD of \(A\) in (2), we therefore have
\[ T(f, h) = (A h, A^* f) = (0, 0). \]
It now remains to show that \(\left\{ \tfrac{1}{\sqrt{2}} (\pm u_i,v_i) \right\} _{i \in I}\) is an orthonormal system in \(F \oplus H\), which is clear since \(\left\langle (\pm u_i,v_i),\, ( \pm u_j,v_j) \right\rangle _{\oplus } = 2\,\delta _{ij}\) and \(\left\langle (-u_i,v_i),\, (u_j,v_j) \right\rangle _{\oplus } = 0\) for all \(i,j \in I\). Concluding, T has the form (3) as claimed. \(\square \)
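In finite dimensions, the lemma is easy to verify numerically. The following sketch (matrix sizes and data are arbitrary assumptions) assembles \(T\) as the symmetric block matrix with off-diagonal blocks \(A\) and \(A^\top\) and checks that its nonzero eigenvalues are the pairs \(\pm \sigma_i\) with eigenvectors \(\tfrac{1}{\sqrt{2}}(\pm u_i, v_i)\).

```python
import numpy as np

# Finite-dimensional check of the block SVD construction: the operator
# T(f, h) = (A h, A^T f) has eigenvalues +/- sigma_i with eigenvectors
# (1/sqrt(2)) (+/- u_i, v_i). Sizes m, n and the matrix A are arbitrary.
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

T = np.block([[np.zeros((m, m)), A],
              [A.T, np.zeros((n, n))]])
evals = np.linalg.eigvalsh(T)

# The 2n largest-magnitude eigenvalues occur in +/- sigma_i pairs.
top = np.sort(np.abs(evals))[::-1][: 2 * n]
print(np.allclose(top[::2], s) and np.allclose(top[1::2], s))  # True

# One eigenpair explicitly: w = (u_1, v_1)/sqrt(2) satisfies T w = sigma_1 w.
w = np.concatenate([U[:, 0], Vt[0]]) / np.sqrt(2)
print(np.allclose(T @ w, s[0] * w))  # True
```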
A.2 Derivation of the Empirical CCA Operator
The claim follows directly when we can show the identity
\[ (\Phi \Phi^\top)^{-1/2}\, \Phi = \Phi\, K_{XX}^{-1/2} \]
and its analogue for the feature map \(\Psi\). Let \(K_{XX} = \Phi^\top \Phi = U \Lambda U^\top\) be the eigendecomposition of the Gramian. We know that in this case we have the SVD of the operator \(\Phi \Phi^\top = \sum_{i \in I} \lambda_i (\lambda_i^{-1/2}\Phi u_i) \otimes (\lambda_i^{-1/2} \Phi u_i)\), since
\[ \left\langle \lambda_i^{-1/2} \Phi u_i,\, \lambda_j^{-1/2} \Phi u_j \right\rangle = (\lambda_i \lambda_j)^{-1/2}\, u_i^\top \Phi^\top \Phi\, u_j = (\lambda_i \lambda_j)^{-1/2} \lambda_j\, u_i^\top u_j = \delta_{ij}. \]
We will write this operator SVD for simplicity as \(\Phi \Phi^\top = (\Phi U \Lambda^{-1/2}) \Lambda (\Lambda^{-1/2} U^\top \Phi^\top)\) with an abuse of notation. Note that we can express the inverted operator square root elegantly in this form as \((\Phi \Phi^\top)^{-1/2} = (\Phi U \Lambda^{-1/2}) \Lambda^{-1/2} (\Lambda^{-1/2} U^\top \Phi^\top) = (\Phi U) \Lambda^{-3/2} (U^\top \Phi^\top) \). Therefore, we immediately get
\[ (\Phi \Phi^\top)^{-1/2}\, \Phi = (\Phi U) \Lambda^{-3/2} (U^\top \Phi^\top) \Phi = \Phi U \Lambda^{-3/2} U^\top (U \Lambda U^\top) = \Phi U \Lambda^{-1/2} U^\top = \Phi K_{XX}^{-1/2}, \]
which proves the claim. In the regularized case, all operations work the same with an additional \(\epsilon\)-shift of the eigenvalues, i.e., the matrix \(\Lambda\) is replaced with the regularized version \(\Lambda + \epsilon \mathrm{I}\).
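As a finite-dimensional sanity check of this identity, including the \(\epsilon\)-shift just described, the following sketch compares both sides for a random feature matrix; the dimensions and \(\epsilon\) below are arbitrary assumptions.

```python
import numpy as np

# Check of (Phi Phi^T + eps I)^{-1/2} Phi = Phi (K_XX + eps I)^{-1/2}
# for a random finite-dimensional feature matrix Phi; D, n, eps are arbitrary.
rng = np.random.default_rng(1)
D, n, eps = 6, 4, 1e-3
Phi = rng.standard_normal((D, n))

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    lam, U = np.linalg.eigh(S)
    return U @ np.diag(lam ** -0.5) @ U.T

lhs = inv_sqrt(Phi @ Phi.T + eps * np.eye(D)) @ Phi
rhs = Phi @ inv_sqrt(Phi.T @ Phi + eps * np.eye(n))
print(np.allclose(lhs, rhs))  # True
```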
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mollenhauer, M., Schuster, I., Klus, S., Schütte, C. (2020). Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces. In: Junge, O., Schütze, O., Froyland, G., Ober-Blöbaum, S., Padberg-Gehle, K. (eds.) Advances in Dynamics, Optimization and Computation. SON 2020. Studies in Systems, Decision and Control, vol. 304. Springer, Cham. https://doi.org/10.1007/978-3-030-51264-4_5