Abstract
This chapter presents recent developments in the generalization of the data representation framework based on finite-dimensional covariance matrices to infinite-dimensional covariance operators in Reproducing Kernel Hilbert Spaces (RKHS). We show that the proper mathematical setting for covariance operators is the infinite-dimensional Riemannian manifold of positive definite Hilbert–Schmidt operators, which generalize symmetric positive definite (SPD) matrices. We then give closed-form formulas for the affine-invariant and Log-Hilbert–Schmidt distances between RKHS covariance operators on this manifold, which generalize the affine-invariant and Log-Euclidean distances, respectively, between SPD matrices. The Log-Hilbert–Schmidt distance, in particular, can be used to design a two-layer kernel machine that can be applied directly to practical applications such as image classification. Experimental results are provided to illustrate the power of this paradigm for data representation.
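To fix intuition before the operator-theoretic development, the following minimal Python sketch computes the Log-Euclidean distance \(d_{\mathrm{logE}}(A,B) = ||\log (A) - \log (B)||_F\) between SPD matrices, the finite-dimensional distance that the Log-Hilbert–Schmidt distance generalizes. This is illustrative code, not from the chapter; the helper name spd_log and the test matrices are assumptions.

```python
# Minimal sketch (not chapter code): Log-Euclidean distance between
# SPD matrices, the finite-dimensional analogue of the Log-Hilbert-
# Schmidt distance between covariance operators.
import numpy as np

def spd_log(A):
    # Matrix logarithm of an SPD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(A, B):
    # d_logE(A, B) = ||log(A) - log(B)||_F
    return np.linalg.norm(spd_log(A) - spd_log(B), 'fro')

# Two small SPD matrices built as sample covariances plus a ridge.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((3, 50)), rng.standard_normal((3, 50))
A = X @ X.T / 50 + 1e-3 * np.eye(3)
B = Y @ Y.T / 50 + 1e-3 * np.eye(3)
print(log_euclidean_distance(A, B))
```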
References
E. Andruchow, A. Varela, Non positively curved metric in the space of positive definite infinite matrices. Revista de la Union Matematica Argentina 48(1), 7–15 (2007)
V.I. Arsenin, A.N. Tikhonov, Solutions of Ill-Posed Problems (Winston, Washington, 1977)
V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Fast and simple calculus on tensors in the Log-Euclidean framework, Medical Image Computing and Computer-Assisted Intervention–MICCAI 2005 (Springer, New York, 2005), pp. 115–122
V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29(1), 328–347 (2007)
F. Barbaresco, Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Frechet median, Matrix Information Geometry (Springer, New York, 2013), pp. 199–255
R. Bhatia, Positive Definite Matrices (Princeton University Press, Princeton, 2007)
D.A. Bini, B. Iannazzo, Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra Appl. 438(4), 1700–1710 (2013)
B.J. Boom, J. He, S. Palazzo, P.X. Huang, C. Beyan, H.-M. Chou, F.-P. Lin, C. Spampinato, R.B. Fisher, A research tool for long-term and continuous analysis of fish assemblage in coral-reefs using underwater camera footage. Ecol. Inf. 23, 83–97 (2014)
B. Caputo, E. Hayman, P. Mallikarjuna, Class-specific material categorisation, in ICCV (2005), pp. 1597–1604
C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
A. Cherian, S. Sra, A. Banerjee, N. Papanikolopoulos, Jensen-Bregman LogDet divergence with application to efficient similarity search for covariance matrices. TPAMI 35(9), 2161–2174 (2013)
I.L. Dryden, A. Koloydenko, D. Zhou, Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann. Appl. Stat. 3, 1102–1123 (2009)
H.W. Engl, M. Hanke, A. Neubauer, Regularization of Inverse Problems, vol. 375, Mathematics and Its Applications (Springer, New York, 1996)
M. Faraki, M. Harandi, F. Porikli, Approximate infinite-dimensional region covariance descriptors for image classification, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
P. Formont, J.-P. Ovarlez, F. Pascal, On the use of matrix information geometry for polarimetric SAR image classification, Matrix Information Geometry (Springer, New York, 2013), pp. 257–276
M. Harandi, M. Salzmann, F. Porikli, Bregman divergences for infinite dimensional covariance matrices, in CVPR (2014)
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, M. Harandi, Kernel methods on the Riemannian manifold of symmetric positive definite matrices, in CVPR (2013)
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, M. Harandi, Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Trans. Pattern Anal. Mach. Intell. 37(12), 2464–2477 (2015)
B. Kulis, M.A. Sustik, I.S. Dhillon, Low-rank kernel learning with Bregman matrix divergences. J. Mach. Learn. Res. 10, 341–376 (2009)
G. Kylberg, The Kylberg texture dataset v. 1.0. External report (Blue series) 35, Centre for Image Analysis, Swedish University of Agricultural Sciences and Uppsala University (2011)
G. Larotonda, Geodesic Convexity, Symmetric Spaces and Hilbert-Schmidt Operators. Ph.D. thesis, Universidad Nacional de General Sarmiento, Buenos Aires, Argentina (2005)
G. Larotonda, Nonpositive curvature: a geometrical approach to Hilbert-Schmidt operators. Differ. Geom. Appl. 25, 679–700 (2007)
J.D. Lawson, Y. Lim, The geometric mean, matrices, metrics, and more. Am. Math. Monthly 108(9), 797–812 (2001)
P. Li, Q. Wang, W. Zuo, L. Zhang, Log-Euclidean kernels for sparse representation and dictionary learning, in ICCV (2013)
H.Q. Minh, Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory. Constr. Approx. 32, 307–338 (2010)
H.Q. Minh, Affine-invariant Riemannian distance between infinite-dimensional covariance operators, in Geometric Science of Information, vol. 9389, Lecture Notes in Computer Science, ed. by F. Nielsen, F. Barbaresco (Springer International Publishing, Switzerland, 2015), pp. 30–38
H.Q. Minh, P. Niyogi, Y. Yao, Mercer’s theorem, feature maps, and smoothing, in Proceedings of 19th Annual Conference on Learning Theory (Springer, Pittsburgh, 2006)
H.Q. Minh, M. San Biagio, V. Murino, Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces, in Advances in Neural Information Processing Systems 27 (NIPS 2014) (2014), pp. 388–396
H.Q. Minh, M. San Biagio, L. Bazzani, V. Murino, Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
G.D. Mostow, Some new decomposition theorems for semi-simple groups. Mem. Am. Math. Soc. 14, 31–54 (1955)
X. Pennec, P. Fillard, N. Ayache, A Riemannian framework for tensor computing. Int. J. Comput. Vis. 66(1), 41–66 (2006)
D. Pigoli, J. Aston, I.L. Dryden, P. Secchi, Distances and inference for covariance operators. Biometrika 101(2), 409–422 (2014)
F. Porikli, O. Tuzel, P. Meer, Covariance tracking using model update based on Lie algebra, in CVPR, vol. 1 (IEEE, 2006), pp. 728–735
A. Qiu, A. Lee, M. Tan, M.K. Chung, Manifold learning on brain functional networks in aging. Med. Image Anal. 20(1), 52–60 (2015)
I.J. Schoenberg, Metric spaces and positive definite functions. Trans. Am. Math. Soc. 44, 522–536 (1938)
B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10(5), 1299–1319 (1998)
J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis (Cambridge University Press, Cambridge, 2004)
S. Sra, A new metric on the manifold of kernel matrices with application to matrix geometric means. Adv. Neural Inf. Process. Syst. 1, 144–152 (2012)
D. Tosato, M. Spera, M. Cristani, V. Murino, Characterizing humans on Riemannian manifolds. TPAMI 35(8), 1972–1984 (2013)
O. Tuzel, F. Porikli, P. Meer, Pedestrian detection via classification on Riemannian manifolds. TPAMI 30(10), 1713–1727 (2008)
S.K. Zhou, R. Chellappa, From sample similarity to ensemble similarity: probabilistic distance measures in reproducing kernel Hilbert space. TPAMI 28(6), 917–929 (2006)
Appendix
Proofs of Mathematical Results
Proof
(of Proposition 1) For the first kernel, recall that sums and products of positive definite kernels are again positive definite. Since the inner product \(\langle A, B\rangle _{\mathrm{logE}} = \langle \log (A), \log (B)\rangle _F\) is itself a positive definite kernel, it follows that \(K(A,B) = (c + \langle A, B\rangle _{\mathrm{logE}})^d\) is positive definite for any \(c \ge 0\) and \(d \in \mathbb {N}\), exactly as in the Euclidean setting.
For the second kernel, since \((\mathrm{{Sym}}^{++}(n), \odot , \circledast , \langle \;, \;\rangle _{\mathrm{logE}})\) is an inner product space, it follows that the kernel
$$K(A,B) = \exp \left( -\frac{d^{p}_{\mathrm{logE}}(A,B)}{\sigma ^2}\right) , \quad \sigma \ne 0,$$
is positive definite for \(0 < p \le 2\) by a classical result of Schoenberg on positive definite functions and the embeddability of metric spaces into Hilbert spaces (see [35], Theorem 1 and Corollary 1). \(\square \)
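As a concrete illustration of Proposition 1, here is a hedged Python sketch of the two Log-Euclidean kernels. The helper spd_log and the default parameter values are illustrative assumptions; the constraints \(c \ge 0\), \(d \in \mathbb {N}\), and \(0 < p \le 2\) come from the proof above.

```python
# Hedged sketch (not chapter code) of the two kernels in Proposition 1,
# on SPD matrices with the Log-Euclidean inner product
# <A, B>_logE = <log A, log B>_F.
import numpy as np

def spd_log(A):
    # Matrix logarithm of an SPD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def k_poly(A, B, c=1.0, d=2):
    # (c + <A, B>_logE)^d, positive definite for c >= 0 and integer d >= 1.
    return (c + np.sum(spd_log(A) * spd_log(B))) ** d

def k_gauss(A, B, sigma=1.0, p=2):
    # exp(-d_logE(A, B)^p / sigma^2), positive definite for 0 < p <= 2.
    dist = np.linalg.norm(spd_log(A) - spd_log(B), 'fro')
    return np.exp(-dist**p / sigma**2)

A, B = np.diag([1.0, 2.0]), np.diag([2.0, 3.0])
print(k_poly(A, B), k_gauss(A, B))
```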
Proof
(of Lemma 1) Recall that we have \(\varPhi (\mathbf {x})^{*}\varPhi (\mathbf {x}) = K[\mathbf {x}]\), \(\varPhi (\mathbf {y})^{*}\varPhi (\mathbf {y}) = K[\mathbf {y}]\), \(\varPhi (\mathbf {x})^{*}\varPhi (\mathbf {y}) = K[\mathbf {x},\mathbf {y}]\). By the definition of the Hilbert–Schmidt norm and the properties of the trace, we have
This completes the proof of the lemma. \(\square \)
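The Gram-matrix relations above already determine trace identities of the kind this proof computes. The following Python sketch numerically verifies one such identity for a linear feature map; it illustrates the mechanism only and may differ from the exact statement of Lemma 1 (e.g., by centering or normalization factors), which is not reproduced here.

```python
# Numerical check (illustrative, linear feature map) of the Gram-matrix
# relations Phi(x)^* Phi(y) = K[x, y] and the trace identity
# ||Phi(x)Phi(x)^* - Phi(y)Phi(y)^*||_HS^2
#   = tr(K[x]^2) - 2 tr(K[x,y]^T K[x,y]) + tr(K[y]^2).
import numpy as np

rng = np.random.default_rng(1)
m, dim = 4, 6
Phi_x = rng.standard_normal((dim, m))   # columns: phi(x_1), ..., phi(x_m)
Phi_y = rng.standard_normal((dim, m))   # columns: phi(y_1), ..., phi(y_m)

K_xx = Phi_x.T @ Phi_x                  # K[x]
K_yy = Phi_y.T @ Phi_y                  # K[y]
K_xy = Phi_x.T @ Phi_y                  # K[x, y]

lhs = np.linalg.norm(Phi_x @ Phi_x.T - Phi_y @ Phi_y.T, 'fro')**2
rhs = np.trace(K_xx @ K_xx) - 2 * np.trace(K_xy.T @ K_xy) + np.trace(K_yy @ K_yy)
print(np.isclose(lhs, rhs))             # True
```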
Lemma 2
Let B be a constant with \(0< B < 1\). Then for all \(|x| \le B\),
$$|\log (1+x)| \le \frac{1}{1-B}|x|.$$
Proof
For \(x \ge 0\), we have the well-known inequality \(0 \le \log (1+x) \le x\), so clearly \(0 \le \log (1+x) \le \frac{1}{1-B}x\). Consider now the case \(-B \le x \le 0\). Let
$$f(x) = \log (1+x) - \frac{x}{1-B}.$$
We have
$$f'(x) = \frac{1}{1+x} - \frac{1}{1-B} \le 0 \quad \text{on } [-B, 0],$$
with \(f'(-B) = 0\). Thus the function f is decreasing on \([-B, 0]\) and reaches its minimum at \(x= 0\), which is \(f(0) = 0\). Hence we have for all \(-1 < -B \le x \le 0\)
$$\log (1+x) \ge \frac{x}{1-B}, \quad \text{that is,} \quad |\log (1+x)| \le \frac{1}{1-B}|x|,$$
as we claimed. \(\square \)
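A quick numerical sanity check of Lemma 2 follows; this is an illustrative script, not part of the proof, and the grid size and tolerance are arbitrary choices.

```python
# Quick numerical sanity check of Lemma 2 (illustrative, not a proof):
# |log(1+x)| <= |x| / (1 - B) for all |x| <= B, here with B = 0.9.
import numpy as np

B = 0.9
xs = np.linspace(-B, B, 10001)
assert np.all(np.abs(np.log1p(xs)) <= np.abs(xs) / (1 - B) + 1e-12)
print("Lemma 2 inequality holds on the sampled grid.")
```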
Proof
(of Propositions 2 and 3) We first show that for an operator \(A+\gamma I > 0\), where A is self-adjoint and compact, the operator \(\log (A+\gamma I) \notin \mathrm{HS}(\mathscr {H})\) if \(\gamma \ne 1\). Since A is compact, it has a countable spectrum \(\{\lambda _k\}_{k\in \mathbb {N}}\), with \(\lim _{k \rightarrow \infty }\lambda _k = 0\), so that \(\lim _{k \rightarrow \infty }\log (\lambda _k +\gamma ) = \log (\gamma )\). Thus if \(\gamma \ne 1\), so that \(\log \gamma \ne 0\), we have
$$||\log (A+\gamma I)||^2_{\mathrm{HS}} = \sum _{k=1}^{\infty }[\log (\lambda _k+\gamma )]^2 = \infty ,$$
since the terms of the series do not tend to zero.
Hence \(\log (A+\gamma I) \notin \mathrm{HS}(\mathscr {H})\) if \(\gamma \ne 1\).
Assume now that \(\gamma = 1\). We show that \(\log (A+I) \in \mathrm{HS}(\mathscr {H})\) if and only if \(A \in \mathrm{HS}(\mathscr {H})\). For the first direction, assume that \(B = \log (A+I) \in \mathrm{HS}(\mathscr {H})\). By definition, we have \(A+ I = \exp (B) \Longleftrightarrow A = \exp (B) - I = \sum _{k=1}^{\infty }\frac{B^k}{k!}\), with
$$||A||_{\mathrm{HS}} \le \sum _{k=1}^{\infty }\frac{||B||^k_{\mathrm{HS}}}{k!} = e^{||B||_{\mathrm{HS}}} - 1 < \infty .$$
This shows that \(A \in \mathrm{HS}(\mathscr {H})\). Conversely, assume \(A \in \mathrm{HS}(\mathscr {H})\), so that
$$\sum _{k=1}^{\infty }\lambda _k^2 = ||A||^2_{\mathrm{HS}} < \infty ,$$
and that \(A+I >0\), so that \(\log (A+I)\) is well-defined and bounded, with eigenvalues \(\{\log (\lambda _k+1)\}_{k=1}^{\infty }\). Since \(\lim _{k \rightarrow \infty }\lambda _k = 0\), for any constant \(0< \varepsilon < 1\), there exists \(N = N(\varepsilon )\) such that \(|\lambda _k| < \varepsilon \) for all \(k \ge N\). By Lemma 2, we have
$$||\log (A+I)||^2_{\mathrm{HS}} = \sum _{k=1}^{\infty }[\log (1+\lambda _k)]^2 \le \sum _{k=1}^{N-1}[\log (1+\lambda _k)]^2 + \frac{1}{(1-\varepsilon )^2}\sum _{k=N}^{\infty }\lambda _k^2 < \infty .$$
This shows that \(\log (A+I) \in \mathrm{HS}(\mathscr {H})\), which completes the proof. \(\square \)
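The dichotomy in Propositions 2 and 3 can be seen numerically on truncated diagonal operators. In the sketch below (illustrative only; the eigenvalue sequence \(\lambda _k = 1/k^2\) is an arbitrary Hilbert–Schmidt choice), the partial sums of \(\sum _k [\log (\lambda _k+\gamma )]^2\) keep growing when \(\gamma = 2\) but stabilize when \(\gamma = 1\).

```python
# Illustrative only: partial sums of ||log(A + gamma*I)||_HS^2 for a
# diagonal operator with Hilbert-Schmidt eigenvalues lambda_k = 1/k^2.
import numpy as np

lam = 1.0 / np.arange(1, 10**6 + 1) ** 2

for gamma in (2.0, 1.0):
    partial = np.cumsum(np.log(lam + gamma) ** 2)
    print(f"gamma={gamma}: S_1000={partial[999]:.2f}, S_1e6={partial[-1]:.2f}")
# gamma=2.0: the terms tend to (log 2)^2 > 0, so the sums keep growing.
# gamma=1.0: log(1 + lambda_k) ~ lambda_k, so the sums converge.
```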
Proof
(of Proposition 4) Since the identity operator I commutes with any operator A, we have the decomposition
$$\log (A+\gamma I) = \log \left( \frac{A}{\gamma } + I\right) + (\log \gamma ) I.$$
We first note that the operator \(\log \left( \frac{A}{\gamma } + I\right) \) is compact, since it possesses a countable set of eigenvalues \(\{\log (\frac{\lambda _k}{\gamma } + 1)\}_{k\in \mathbb {N}}\) satisfying \(\lim _{k \rightarrow \infty }\log (\frac{\lambda _k}{\gamma } +1) = 0\).
If A is Hilbert–Schmidt, then by Proposition 2, we have \(\log \left( \frac{A}{\gamma } + I\right) \in \mathrm{HS}(\mathscr {H})\), and thus \(\log (A+\gamma I) \in \mathscr {H}_{\mathbb {R}}\). By definition of the extended Hilbert–Schmidt norm,
$$||\log (A+\gamma I)||^2_{\mathrm{eHS}} = \left|\left|\log \left( \frac{A}{\gamma } + I\right) \right|\right|^2_{\mathrm{HS}} + (\log \gamma )^2 < \infty .$$
Conversely, if \(\log (A+\gamma I) \in \mathscr {H}_{\mathbb {R}}\), then together with the fact that \(\log \left( \frac{A}{\gamma } + I\right) \) is compact, the above decomposition shows that we must have \(\log \left( \frac{A}{\gamma } + I\right) \in \mathrm{HS}(\mathscr {H})\) and hence \(A \in \mathrm{HS}(\mathscr {H})\) by Proposition 2. \(\square \)
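The decomposition at the heart of this proof can be checked directly in finite dimensions, where it is an exact matrix identity. A minimal sketch, with spd_log an illustrative eigendecomposition-based matrix logarithm:

```python
# Finite-dimensional check (illustrative) of the decomposition
#   log(A + gamma*I) = log(A/gamma + I) + (log gamma) * I.
import numpy as np

def spd_log(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
A, gamma, n = X @ X.T, 0.5, 4

lhs = spd_log(A + gamma * np.eye(n))
rhs = spd_log(A / gamma + np.eye(n)) + np.log(gamma) * np.eye(n)
print(np.allclose(lhs, rhs))  # True, since I commutes with A
```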
Proof
(of Proposition 5) Since \((A+\gamma I) > 0\), \((B+ \mu I) > 0\), it is straightforward to see that \((A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2} > 0\). Using the identity
$$(A+\gamma I)^{-1} = \frac{1}{\gamma }\left[ I - A(A+\gamma I)^{-1}\right] ,$$
we obtain
$$(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2} = Z + \nu I,$$
where \(\nu = \frac{\mu }{\gamma }\) and \(Z = (A+\gamma I)^{-1/2}B(A+\gamma I)^{-1/2} - \frac{\mu }{\gamma }A(A+\gamma I)^{-1}\). It is clear that \(Z = Z^{*}\) and that \(Z \in \mathrm{HS}(\mathscr {H})\), since \(\mathrm{HS}(\mathscr {H})\) is a two-sided ideal in \(\mathscr {L}(\mathscr {H})\). It follows that \(\log (Z + \nu I) \in \mathscr {H}_{\mathbb {R}}\) by Proposition 4. Thus the geodesic distance
$$d_{\mathrm{aiHS}}(A+\gamma I, B+\mu I) = ||\log [(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2}]||_{\mathrm{eHS}} = ||\log (Z + \nu I)||_{\mathrm{eHS}}$$
is always finite. Furthermore, by Proposition 2, \(\log (\frac{Z}{\nu } +I) \in \mathrm{HS}(\mathscr {H})\) and thus, by definition of the extended Hilbert–Schmidt norm, when \(\dim (\mathscr {H}) = \infty \),
$$d^2_{\mathrm{aiHS}}(A+\gamma I, B+\mu I) = \left|\left|\log \left( \frac{Z}{\nu } + I\right) \right|\right|^2_{\mathrm{HS}} + \left( \log \frac{\mu }{\gamma }\right) ^2.$$
This completes the proof. \(\square \)
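In finite dimensions, the distance controlled by Proposition 5 reduces to the classical affine-invariant distance between regularized covariance matrices, which the following sketch computes. All names (spd_pow, affine_invariant_distance) and the regularization values are illustrative assumptions.

```python
# Illustrative finite-dimensional version of the affine-invariant
# distance between regularized covariance matrices:
#   d_ai(A + gamma*I, B + mu*I)
#     = ||log[(A+gamma I)^{-1/2}(B+mu I)(A+gamma I)^{-1/2}]||_F.
import numpy as np

def spd_pow(A, p):
    # Fractional power of an SPD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def spd_log(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def affine_invariant_distance(A, B, gamma=1e-2, mu=1e-2):
    n = A.shape[0]
    Ar, Br = A + gamma * np.eye(n), B + mu * np.eye(n)
    S = spd_pow(Ar, -0.5)
    return np.linalg.norm(spd_log(S @ Br @ S), 'fro')

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
print(affine_invariant_distance(X @ X.T / 8, Y @ Y.T / 8))
```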
Proof
(of Proposition 6) By Proposition 4, \((A+\gamma I), (B+\mu I) \in \Sigma (\mathscr {H}) \Longleftrightarrow \log (A+\gamma I), \log (B+ \mu I) \in \mathscr {H}_{\mathbb {R}}\). It follows that \([\log (A+\gamma I) - \log (B+\mu I)] \in \mathscr {H}_{\mathbb {R}}\), so that \(||\log (A+\gamma I) - \log (B+\mu I)||_{\mathrm{eHS}}\) is always finite.
Furthermore, by Proposition 2, \(\log \left( \frac{A}{\gamma }+I\right) , \log \left( \frac{B}{\mu }+I\right) \in \mathrm{HS}(\mathscr {H})\) and, by definition of the extended Hilbert–Schmidt norm, when \(\dim (\mathscr {H}) = \infty \),
$$||\log (A+\gamma I) - \log (B+\mu I)||^2_{\mathrm{eHS}} = \left|\left|\log \left( \frac{A}{\gamma }+I\right) - \log \left( \frac{B}{\mu }+I\right) \right|\right|^2_{\mathrm{HS}} + \left( \log \frac{\gamma }{\mu }\right) ^2.$$
This completes the proof. \(\square \)
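The formula from Proposition 6 can be evaluated directly in finite dimensions, with matrices standing in for the compact parts. An illustrative sketch (names are assumptions, not chapter code):

```python
# Illustrative finite-dimensional evaluation of the Proposition 6 formula
#   d^2 = ||log(A/gamma + I) - log(B/mu + I)||_HS^2 + (log(gamma/mu))^2.
import numpy as np

def spd_log(A):
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def log_hs_distance(A, B, gamma, mu):
    n = A.shape[0]
    hs_part = spd_log(A / gamma + np.eye(n)) - spd_log(B / mu + np.eye(n))
    return np.sqrt(np.linalg.norm(hs_part, 'fro')**2 + np.log(gamma / mu)**2)

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 2.0]])
print(log_hs_distance(A, B, gamma=0.1, mu=0.2))
```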