Abstract
This work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant divergences between symmetric, positive definite matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det divergences is highly general and contains many divergences as special cases, including the recently formulated infinite-dimensional affine-invariant Riemannian distance and the infinite-dimensional Alpha Log-Det divergences between positive definite unitized trace class operators. In particular, it includes a parametrized family of metrics between positive definite trace class operators, with the affine-invariant Riemannian distance and the square root of the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det divergences between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), we obtain closed-form formulas via the corresponding Gram matrices.
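For readers who wish to experiment, the finite-dimensional member of this family, between symmetric positive definite (SPD) matrices, can be computed directly. The sketch below follows the parametrization of Cichocki, Cruces, and Amari (up to normalization conventions); the function names and the choice of the symmetrized form \(M = Q^{-1/2} P Q^{-1/2}\) are illustrative and not taken from this paper.

```python
import numpy as np

def spd_power(S, t):
    """Fractional power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**t) @ V.T

def ab_logdet(P, Q, alpha, beta):
    """Finite-dimensional Alpha-Beta Log-Det divergence (alpha > 0, beta > 0):
    D = (1/(alpha*beta)) * log det[(alpha*M^beta + beta*M^{-alpha}) / (alpha+beta)],
    with M = Q^{-1/2} P Q^{-1/2}."""
    Qih = spd_power(Q, -0.5)
    M = Qih @ P @ Qih
    M = (M + M.T) / 2  # symmetrize against round-off
    inner = (alpha * spd_power(M, beta) + beta * spd_power(M, -alpha)) / (alpha + beta)
    return np.linalg.slogdet(inner)[1] / (alpha * beta)

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((2, 4, 4))
P, Q = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)

# D(P, P) = 0, and D(P, Q) > 0 for P != Q
assert abs(ab_logdet(P, P, 0.6, 0.4)) < 1e-10
assert ab_logdet(P, Q, 0.7, 0.3) > 0

# alpha = beta = 1/2 recovers 4x the symmetric Stein divergence
ld = lambda S: np.linalg.slogdet(S)[1]
stein = ld((P + Q) / 2) - 0.5 * ld(P) - 0.5 * ld(Q)
assert np.isclose(ab_logdet(P, Q, 0.5, 0.5), 4 * stein)
```

The final assertion illustrates the claim above that the symmetric Stein divergence arises as a special case of the family.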
Notes
The current formulation for Alpha-Beta Log-Det divergences can be generalized to the entire Hilbert manifold of positive definite Hilbert–Schmidt operators. This will be presented in a separate work [25].
References
Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29(1), 328–347 (2007)
Barbaresco, F.: Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Fréchet median. Matrix Information Geometry, pp. 199–255. Springer, New York (2013)
Bhatia, R.: Positive Definite Matrices. Princeton University Press, Princeton (2007)
Bhatia, R.: Matrix Analysis, vol. 169. Springer, New York (2013)
Bini, D.A., Iannazzo, B.: Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra Appl. 438(4), 1700–1710 (2013)
Chebbi, Z., Moakher, M.: Means of Hermitian positive-definite matrices based on the log-determinant \(\alpha \)-divergence function. Linear Algebra Appl. 436(7), 1872–1889 (2012)
Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.: Jensen-Bregman LogDet divergence with application to efficient similarity search for covariance matrices. IEEE Trans. Pattern Anal. Mach. Intell. 35(9), 2161–2174 (2013)
Cherian, A., Stanitsas, P., Harandi, M., Morellas, V., Papanikolopoulos, N.: Learning discriminative \(\alpha \beta \)-divergences for positive definite matrices. In The IEEE International Conference on Computer Vision (ICCV), Oct (2017)
Cichocki, A., Cruces, S., Amari, S.: Log-Determinant divergences revisited: Alpha-Beta and Gamma Log-Det divergences. Entropy 17(5), 2988–3034 (2015)
Fan, K.: On a theorem of Weyl concerning eigenvalues of linear transformations: II. Proc. Natl. Acad. Sci. USA 36(1), 31 (1950)
Formont, P., Ovarlez, J.P., Pascal, F.: On the use of matrix information geometry for polarimetric SAR image classification. Matrix Information Geometry, pp. 257–276. Springer, New York (2013)
Harandi, M., Salzmann, M., Porikli, F.: Bregman divergences for infinite dimensional covariance matrices. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1003–1010 (2014)
Hasegawa, H.: \(\alpha \)-divergence of the non-commutative information geometry. Rep. Math. Phys. 33(1), 87–93 (1993)
Minh, H.Q.: Regularized divergences between covariance operators and Gaussian measures on Hilbert spaces. arXiv preprint arXiv:1904.05352, (2019)
Jayasumana, S., Hartley, R., Salzmann, M., Hongdong, Li., Harandi, M.: Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 73–80 (2013)
Jenčová, A.: Geometry of quantum states: dual connections and divergence functions. Rep. Math. Phys. 47(1), 121–138 (2001)
Jost, J.: Postmodern Analysis. Springer, Berlin (1998)
Kittaneh, F., Kosaki, H.: Inequalities for the Schatten p-norm V. Publ. Res. Inst. Math. Sci. 23(2), 433–443 (1987)
Kulis, B., Sustik, M.A., Dhillon, I.S.: Low-rank kernel learning with Bregman matrix divergences. J. Mach. Learn. Res. 10, 341–376 (2009)
Larotonda, G.: Nonpositive curvature: A geometrical approach to Hilbert-Schmidt operators. Differ. Geom. Appl. 25, 679–700 (2007)
Lawson, J.D., Lim, Y.: The geometric mean, matrices, metrics, and more. Am. Math. Mon. 108(9), 797–812 (2001)
Li, P., Wang, Q., Zuo, W., Zhang, L.: Log-Euclidean kernels for sparse representation and dictionary learning. In International Conference on Computer Vision (ICCV), pp. 1601–1608 (2013)
Minh, H.Q.: Affine-invariant Riemannian distance between infinite-dimensional covariance operators. In Geometric Science of Information, pp. 30–38, (2015)
Minh, H.Q.: Infinite-dimensional Log-Determinant divergences between positive definite trace class operators. Linear Algebra Appl. 528, 331–383 (2017)
Minh, H.Q.: Log-Determinant divergences between positive definite Hilbert-Schmidt operators. In Geometric Science of Information, pp. 505–513, (2017)
Minh, H.Q., Murino, V.: From covariance matrices to covariance operators: Data representation from finite to infinite-dimensional settings. Algorithmic Advances in Riemannian Geometry and Applications: For Machine Learning. Computer Vision, Statistics, and Optimization, pp. 115–143. Springer International Publishing, Cham (2016)
Minh, H.Q., Murino, V.: Covariances in Computer Vision and Machine Learning. Synthesis Lectures on Computer Vision. Morgan & Claypool Publishers, San Rafael (2017)
Minh, H.Q., San Biagio, M., Bazzani, L., Murino, V.: Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June (2016)
Minh, H.Q., San Biagio, M., Murino, V.: Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. In Advances in Neural Information Processing Systems (NIPS), pp. 388–396, (2014)
Mostow, G.D.: Some new decomposition theorems for semi-simple groups. Mem. Am. Math. Soc. 14, 31–54 (1955)
Ohara, A., Eguchi, S.: Geometry on positive definite matrices deformed by v-potentials and its submanifold structure. Geometric Theory of Information, pp. 31–55. Springer International Publishing, Cham (2014)
Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. Int. J. Comput. Vis. 66(1), 41–66 (2006)
Petryshyn, W.V.: Direct and iterative methods for the solution of linear operator equations in Hilbert spaces. Trans. Am. Math. Soc. 105, 136–175 (1962)
Peypouquet, J.: Convex optimization in normed spaces: theory, methods and examples. Springer, New York (2015)
Pigoli, D., Aston, J., Dryden, I.L., Secchi, P.: Distances and inference for covariance operators. Biometrika 101(2), 409–422 (2014)
Simon, B.: Notes on infinite determinants of Hilbert space operators. Adv. Math. 24, 244–273 (1977)
Sra, S.: A new metric on the manifold of kernel matrices with application to matrix geometric means. In Advances in Neural Information Processing Systems (NIPS), pp. 144–152, (2012)
Stanitsas, P., Cherian, A., Morellas, V., Papanikolopoulos, N.: Clustering positive definite matrices by learning information divergences. In 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 1304–1312, (2017)
Tuzel, O., Porikli, F., Meer, P.: Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1713–1727 (2008)
Ethics declarations
Conflict of interest
The author declares that there is no conflict of interest.
Proofs of Main results
1.1 Proofs for the General Alpha-Beta Log-Determinant divergences
In this section, we prove Lemma 1, Proposition 1, and Theorems 7, 8, 9, and 10.
Proof of Lemma 1
Since any \(A \in {\mathcal {L}}({\mathcal {H}})\) commutes with the identity I, we have
since \(\sum _{j=1}^{\infty }\frac{A^j}{j!}\) is trace class, with \(\left\| \sum _{j=1}^{\infty }\frac{A^j}{j!}\right\| _{\mathrm{tr}} \le \sum _{j=1}^{\infty }\frac{||A||^j_{\mathrm{tr}}}{j!} = \exp (||A||_{\mathrm{tr}}) -1 < \infty .\)\(\square \)
Proof of Proposition 1
For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), we have \((B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} \in \mathrm{PTr}({\mathcal {H}})\) and the logarithm \(\log [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] \in \mathrm{Tr}_X({\mathcal {H}})\) is well-defined. By the discussion preceding Proposition 1,
For the power function, we have
For the sum of two power functions, we then have
By Lemma 5 in [24], \(\mathrm{det_X}[C(A+\gamma I)C^{-1}] = \mathrm{det_X}(A+\gamma I)\) for any invertible \(C \in {\mathcal {L}}({\mathcal {H}})\). Thus
\(\square \)
Proof of Theorem 7
By definition of the power function, we have
It follows that for \(\delta = \frac{\alpha \gamma ^{p}}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\), \(1-\delta =\frac{(1-\alpha ) \mu ^q}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\), we have
For \(0<\alpha < 1\), equality happens if and only if simultaneously, we have
In particular, for \(\gamma = \mu \), the condition \(\gamma ^{p} = \mu ^{q}\) becomes
With the conditions \(\gamma = \mu \ne 1\) and \(p =q\), we then have \((\frac{A}{\gamma }+I)^{p} = (\frac{B}{\gamma }+I)^{p} \Longleftrightarrow A = B\).\(\square \)
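The equality analysis in this proof and in that of Theorem 8 rests, in essence, on the weighted arithmetic–geometric mean inequality: for \(x, y > 0\) and \(0< \delta < 1\), \(\delta x + (1-\delta )y \ge x^{\delta }y^{1-\delta }\), with equality if and only if \(x = y\). A quick numerical illustration of this scalar inequality (a sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.uniform(0.1, 10.0, size=(2, 1000))
for delta in (0.1, 0.5, 0.9):
    # weighted AM-GM: delta*x + (1-delta)*y >= x**delta * y**(1-delta)
    assert np.all(delta * x + (1 - delta) * y >= x**delta * y**(1 - delta) - 1e-12)

# equality holds when x = y (here x = y = 2)
assert np.isclose(0.3 * 2 + 0.7 * 2, 2**0.3 * 2**0.7)
```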
Proof of Theorem 8
Recall that we write \( (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = \varLambda + (\gamma /\mu )I \in \mathrm{PTr}({\mathcal {H}}). \) Its inverse, also in \(\mathrm{PTr}({\mathcal {H}})\), has the form \( (B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [\varLambda + (\gamma /\mu )I]^{-1} = \frac{\mu }{\gamma }I - (\frac{\mu }{\gamma })^2\varLambda (I + \frac{\mu }{\gamma }\varLambda )^{-1}. \) It follows from Corollary 1 that
where \(\delta = \frac{\alpha (\frac{\gamma }{\mu })^{p}}{\alpha (\frac{\gamma }{\mu })^{p} + \beta (\frac{\mu }{\gamma })^{q}} = \frac{\alpha (\frac{\gamma }{\mu })^{p+q}}{\alpha (\frac{\gamma }{\mu })^{p+q} + \beta }\), \(1-\delta = \frac{\beta (\frac{\mu }{\gamma })^{q}}{\alpha (\frac{\gamma }{\mu })^{p} + \beta (\frac{\mu }{\gamma })^{q}} = \frac{\beta }{\alpha (\frac{\gamma }{\mu })^{p+q} + \beta }\).
For the two determinants on the right hand side of (129) to cancel each other out, we need
Assuming that this condition holds, then by the definition of \(D^{(\alpha , \beta )}_{(p,q)}\), (129) gives
In the inequality in (129), the equality sign happens if and only if
If \(p+q = 0\), then this is always true, so that \(D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+\mu I)] = 0\) for all pairs \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), which is not what we want. In fact, with \(p+q = 0\), the condition \(\alpha p \left( \frac{\gamma }{\mu }\right) ^{p+q} = \beta q\) gives \((\alpha + \beta ) p = 0 \Rightarrow p = 0 \Rightarrow q = 0\).
If \(p +q \ne 0\), since \(\varLambda + (\gamma /\mu )I > 0\), this happens if and only if
\(\square \)
Proof of Theorem 9
Under the condition \(p+q = r\), by Theorem 8, we have \(\alpha p\left( \frac{\gamma }{\mu }\right) ^{r} = \beta (r-p) \Rightarrow p = \frac{\beta r}{\alpha \left( \frac{\gamma }{\mu }\right) ^{r} + \beta }\). Thus \(q = r - p = \frac{r \alpha \left( \frac{\gamma }{\mu }\right) ^{r}}{\alpha \left( \frac{\gamma }{\mu }\right) ^{r} + \beta }\). The equivalence of Eqs. (18) and (19) then follows from Proposition 1. \(\square \)
Proof of Theorem 10
We have
where the operators \(C_1\) and \(C_2\) are given by \(C_1 = \sum _{k=1}^{\infty }\frac{p^k}{k!}\left[ \log \left( \frac{\mu }{\gamma }\varLambda + I\right) \right] ^k \in \mathrm{Tr}({\mathcal {H}})\), \(C_2 = \sum _{k=1}^{\infty }\frac{(-1)^kq^k}{k!}\left[ \log \left( \frac{\mu }{\gamma }\varLambda + I\right) \right] ^k \in \mathrm{Tr}({\mathcal {H}})\). By definition of the \(\mathrm{det_X}\) function, we then have
This, together with the definition of \(D^{(\alpha , \beta )}_{(p,q)}\), gives us the desired expression. \(\square \)
1.2 Proofs for the affine-invariant Riemannian distance
In this section, we prove Theorem 2 (part 1). In Definition 1, with \(\alpha = \beta \), \(\delta = \frac{(\frac{\gamma }{\mu })^{r}}{(\frac{\gamma }{\mu })^{r} + 1}\), we have
We first need the following results.
Lemma 6
Let \(\gamma > 0\). Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Let \(\delta = \frac{\gamma ^r}{\gamma ^r+1}\). Then
In particular, for \(r = 2\alpha \), we have \(\lim _{\alpha \rightarrow 0}\frac{r(\delta - \frac{1}{2})}{\alpha ^2} = \log \gamma \).
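For \(r = 2\alpha \), the quotient in Lemma 6 simplifies: since \(\delta - \frac{1}{2} = \frac{\gamma ^r - 1}{2(\gamma ^r+1)}\), one has \(r(\delta - \frac{1}{2}) = \alpha \tanh (\alpha \log \gamma )\), so the stated limit is \(\lim _{\alpha \rightarrow 0} \tanh (\alpha \log \gamma )/\alpha = \log \gamma \). A numerical sanity check (tolerances are illustrative):

```python
import numpy as np

gamma = 2.5
for a in (1e-2, 1e-3, 1e-4):
    r = 2 * a
    delta = gamma**r / (gamma**r + 1)
    val = r * (delta - 0.5) / a**2   # equals tanh(a*log(gamma)) / a
    # Lemma 6: the quotient tends to log(gamma) as alpha -> 0
    assert abs(val - np.log(gamma)) < 10 * a**2
```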
Lemma 7
Let \(\gamma > 0\) be fixed. Let \(\lambda > 0\) be fixed. Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Define \(\delta = \frac{\gamma ^r}{\gamma ^r+1},\; p = r(1-\delta ), \; q = r\delta \). Then
In particular, if \(\gamma = \lambda \), then \( \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{\gamma ^p + \gamma ^{-q}}{2}\right) = -\frac{[r{'}(0)]^2}{8}(\log \gamma )^2. \) If \(\gamma = 1\), then \( \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{\lambda ^{r/2} + \lambda ^{-r/2}}{2}\right) = \frac{[r{'}(0)]^2}{8}(\log \lambda )^2. \)
Lemma 8
Let \(\gamma > 0\) be fixed. Let \(\lambda \in {\mathbb {R}}\) be fixed such that \(\lambda + \gamma > 0\). Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Define \(\delta = \frac{\gamma ^r}{\gamma ^r+1},\; p = r(1-\delta ), \; q = r\delta \). Then
In particular, if \(r = r(\alpha ) = 2\alpha \), then \(\lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{(\lambda + \gamma )^p + (\lambda + \gamma )^{-q}}{\gamma ^p + \gamma ^{-q}}\right) = \frac{1}{2}\left[ \log \left( \frac{\lambda }{\gamma } +1\right) \right] ^2\).
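The special case \(r(\alpha ) = 2\alpha \) of Lemma 8 can likewise be checked numerically; the values of \(\gamma \) and \(\lambda \) below are arbitrary choices satisfying the hypotheses:

```python
import numpy as np

gamma, lam = 1.7, 0.9   # gamma > 0, lam + gamma > 0
target = 0.5 * np.log(lam / gamma + 1)**2
for a in (1e-3, 1e-4):
    r = 2 * a
    delta = gamma**r / (gamma**r + 1)
    p, q = r * (1 - delta), r * delta
    val = np.log(((lam + gamma)**p + (lam + gamma)**(-q))
                 / (gamma**p + gamma**(-q))) / a**2
    # Lemma 8: the quotient tends to (1/2) * log(lam/gamma + 1)^2
    assert abs(val - target) < 1e-2
```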
Lemma 9
Let \(\gamma > 0\) be fixed. Let \(\lambda \in {\mathbb {R}}\) be fixed such that \(\lambda + \gamma > 0\). Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Define \(\delta = \frac{\gamma ^r}{\gamma ^r+1},\; p = r(1-\delta ), \; q = r\delta \). Then
Proof of Theorem 2, part 1
For \(\alpha = \beta \), we have \(\delta = \frac{(\frac{\gamma }{\mu })^r}{(\frac{\gamma }{\mu })^r+1},\; p = r(1-\delta ), \; q = r\delta \). Let \(\{\lambda _j\}_{j \in {\mathbb {N}}}\) be the eigenvalues of \(\varLambda \). By Theorem 10, we have
By Lemma 6, we have \( \lim _{\alpha \rightarrow 0}\frac{r(\delta -\frac{1}{2})}{\alpha ^2}\log \left( \frac{\gamma }{\mu }\right) = \frac{[r{'}(0)]^2}{4}\left[ \log \frac{\gamma }{\mu }\right] ^2 \).
By Lemma 7, we have \( \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}{2}\right) = -\frac{[r{'}(0)]^2}{8}\left[ \log \frac{\gamma }{\mu }\right] ^2. \)
By Lemma 8, we have
Summing up these three expressions, we obtain
\(\square \)
1.3 Proofs for the Alpha Log-Determinant divergences
Proof of Theorem 13
The cases \(\alpha = 0\) and \(\alpha = 1\) are special cases of the results discussed at the end of Sect. 4.1. Consider now the case \(0< \alpha < 1\). We first note that
From Eq. (67), we have
By Proposition 1, we have
In particular, for \(r=1\), we have
Thus it follows that
For \(r=1\), in Eq. (67), \(\delta = \delta (r=1) = \frac{\alpha \gamma }{\alpha \gamma + (1-\alpha )\mu }\). Combining all of these expressions and comparing with \(d^{1-2\alpha }_{\mathrm{logdet}}\), we obtain the first desired statement. The case \(r=-1\) is proved similarly. \(\square \)
1.4 Proofs for the other limiting cases
In this section, we prove Theorems 11 and 12. We need the following results.
Lemma 10
Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) be such that \(A+ I > 0\). Then \(\forall \alpha \in {\mathbb {R}}\), \((A+I)^{\alpha }\) is well defined and \((A+I)^{\alpha } - I\in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\). Equivalently, let \(\{\lambda _k\}_{k\in {\mathbb {N}}}\) be the eigenvalues of A, then
Proof of Lemma 10
By Lemma 3 in [24], if \(A \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) and \(A+I > 0\), then \(\log (A+I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\). By definition of the power function, we have
Since \(\mathrm{Tr}({\mathcal {H}})\) is a Banach algebra under the trace norm, we have
Thus \((A+I)^{\alpha } - I \in \mathrm{Tr}({\mathcal {H}})\). The equivalent statement is then obvious. \(\square \)
Lemma 11
Let \({\mathcal {H}}\) be a separable Hilbert space. Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\). Then for any \(\alpha \in {\mathbb {R}}\), we have \((A+\gamma I)^{\alpha } - \gamma ^{\alpha }I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) and
Proof of Lemma 11
By definition of the power function, we have
where \(\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] \in \mathrm{Tr}({\mathcal {H}})\) by Lemma 10. Thus \((A+\gamma I)^{\alpha } - \gamma ^{\alpha }I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) and
which is the first identity. For the second identity, by definition of the extended trace
\(\square \)
Lemma 12
Let \((A+\gamma I), (B+ \mu I) \in \mathrm{PTr}({\mathcal {H}})\). Let \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\). Then for any \(\alpha \in {\mathbb {R}}\),
Proof of Lemma 12
By Proposition 1, we have
By the commutativity of the \(\mathrm{tr}_X\) operation (Lemma 4 in [24]), we then have
Similarly, by the product property of the \(\mathrm{det_X}\) operation (Proposition 4 in [24]),
\(\square \)
Lemma 13
Assume that \(\lambda> 0, \gamma> 0, \alpha > 0\) are fixed. Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }, \;\; p = r(1-\delta ), \;\; q = r\delta \), we have
In particular, for \(\lambda = \gamma \), \(\lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\log \left( \frac{\alpha \gamma ^p + \beta \gamma ^{-q}}{\alpha + \beta }\right) = \frac{1}{\alpha ^2}\left( [(\log \gamma ){r(0)} + 1]\gamma ^{-r(0)}-1\right) . \)
Lemma 14
Assume that \(\gamma> 0, \alpha > 0\) are fixed. Assume that \(\lambda \in {\mathbb {R}}\) is also fixed, such that \(\lambda + \gamma > 0\). Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }, \;\; p = r(1-\delta ), \;\; q = r\delta \), we have
Lemma 15
Assume that \(\gamma> 0, \alpha > 0\) are fixed. Assume that \(\lambda \in {\mathbb {R}}\) is also fixed, such that \(\lambda + \gamma > 0\). Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }, \;\; p = r(1-\delta ), \;\; q = r\delta \), we have
Lemma 16
Assume that \(\gamma> 0, \alpha > 0\) are fixed. Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }\),
Proof of Theorem 11
Let \(\{\lambda _j\}_{j=1}^{\infty }\) be the eigenvalues of \(\varLambda \). By Theorem 10,
where \(p= p(\beta ) = r(1-\delta ) = \frac{r \beta }{\alpha (\frac{\gamma }{\mu })^{r} + \beta }\), \(q= q(\beta ) = r\delta = \frac{r \alpha (\frac{\gamma }{\mu })^{r}}{\alpha (\frac{\gamma }{\mu })^{r} + \beta }\).
For \(\alpha > 0\) fixed, as functions of \(\beta \), we have \(\lim _{\beta \rightarrow 0} p(\beta ) = 0, \lim _{\beta \rightarrow 0} q(\beta ) = r(0)\). For simplicity, in the following, we replace \(\frac{\gamma }{\mu }\) by \(\gamma \). By Lemma 13,
By Lemma 14,
By Lemma 15, we have \(\log \left( \frac{\alpha (\lambda _j+\gamma )^p + \beta (\lambda _j+\gamma )^{-q}}{\alpha \gamma ^p + \beta \gamma ^{-q}}\right) \ge 0\)\(\forall j \in {\mathbb {N}}\), so that by Lebesgue’s Monotone Convergence Theorem, we obtain
By Lemma 16, \( \log (\gamma )\lim _{\beta \rightarrow 0}\frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta } = \frac{1}{\alpha ^2}r(0)[-\gamma ^{-r(0)} + 1]\log (\gamma ). \) Combining all three expressions, we obtain the desired limit as the sum
Thus it follows that
Furthermore,
Plugging the last two expressions into (144), we obtain the desired limit as
We now replace \(\gamma \) by \(\frac{\gamma }{\mu }\). We have by Lemma 12,
Then (145) becomes
\(\square \)
Proof of Theorem 12
The dual symmetry in Theorem 14 gives
The limit on the right hand side then follows from Theorem 11. \(\square \)
1.5 Proofs of the properties of the Alpha-Beta Log-Determinant divergences
In this section, we prove Theorems 14, 15, 16, 17, 18, and 19. For the case \(\alpha = \beta = 0\), we have \(D^{(0,0)}_0[(A+\gamma I), (B+\mu I)] = \frac{1}{2}d_{\mathrm{aiHS}}^2[(A+\gamma I), (B+\mu I)]\), with \(d_{\mathrm{aiHS}}\) being the affine-invariant Riemannian distance on \(\mathrm{PTr}({\mathcal {H}})\), so these properties are either automatic or straightforward to verify. We therefore focus on the three cases \((\alpha > 0, \beta > 0)\), \((\alpha > 0, \beta = 0)\), and \((\alpha = 0, \beta > 0)\).
Proof of Theorem 14
The cases \((\alpha > 0, \beta = 0)\) and \((\alpha = 0, \beta > 0)\) follow immediately from Eqs. (20) and (21). Consider now the case \(\alpha > 0, \beta > 0\). Writing \(\delta = \delta (\alpha , \beta )\) to emphasize its dependence on \(\alpha \) and \(\beta \), we have \(\delta (\alpha , \beta ) = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\) in \(D^{(\alpha ,\beta )}_r[(A+\gamma I), (B+ \mu I)]\). Then for \(D^{(\beta , \alpha )}_r[(B+\mu I), (A+\gamma I)]\), \(\delta (\beta , \alpha ) = \frac{\beta \mu ^r}{\alpha \gamma ^r + \beta \mu ^r} = 1 -\delta (\alpha , \beta ), \;\;\; 1-\delta (\beta , \alpha ) = \delta (\alpha , \beta ), \) \(\delta (\beta , \alpha ) - \frac{\beta }{\alpha + \beta } = 1 -\delta (\alpha , \beta ) - \frac{\beta }{\alpha + \beta } = -\left( \delta (\alpha , \beta ) - \frac{\alpha }{\alpha + \beta }\right) . \) By Definition 1, we have
\(\square \)
Proof of Theorem 15
We prove Eq. (77); the proof for Eq. (78) is similar. We write \((A+ \gamma I)^{-1} = \frac{1}{\gamma }I - \frac{A}{\gamma }(A+\gamma I)^{-1}, (B+\mu I)^{-1} = \frac{1}{\mu }I - \frac{B}{\mu }(B+\mu I)^{-1}, (B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]^{-1} \). Consider the case \(\alpha > 0, \beta > 0\). By Definition 1,
where \(\delta _2 = \frac{\alpha (1/\gamma )^r}{\alpha (1/\gamma )^r + \beta (1/\mu )^r} = \frac{\alpha \mu ^r}{\alpha \mu ^r + \beta \gamma ^r} = \delta (-r)\). Thus
Consider the case \(\alpha = 0, \beta > 0\) (the case \(\alpha > 0, \beta = 0\) then follows by dual symmetry). We have
By Lemma 12, we have
Thus it follows that
\(\square \)
Proof of Theorem 16 - Affine-invariance
For \((A + \gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((C+\nu I) \in \mathrm{Tr}_X({\mathcal {H}})\), \(\nu \ne 0\),
Since \((C+\nu I)\) is invertible, \((C+\nu I)(A+\gamma I)(C+\nu I)^{*}\) is also invertible, with inverse \([(C+\nu I)^{*}]^{-1}(A+\gamma I)^{-1}(C+\nu I)^{-1}\). Furthermore, \(\forall x \in {\mathcal {H}}\),
with equality if and only if \((C+\nu I)^{*}x = 0 \Longleftrightarrow x = 0\). Thus \((C+\nu I)(A+\gamma I)(C+\nu I)^{*}\) is strictly positive. Together with its invertibility, this shows that this is a positive definite operator. Hence \((C+\nu I)(A+\gamma I)(C+\nu I)^{*} \in \mathrm{PTr}({\mathcal {H}})\). For \((A+\gamma I), (B + \mu I) \in \mathrm{PTr}({\mathcal {H}})\), we then have
Then for any \(p \in {\mathbb {R}}\), we have
Thus for any \(a, b > 0\) and any \(p, q \in {\mathbb {R}}\).
By the definition of \(D^{(\alpha , \beta )}_r\) and the following invariances of the extended Fredholm determinant \(\mathrm{det_X}\) as well as of the extended trace operation \(\mathrm{tr}_X\), namely,
for \(A + \gamma I \in \mathrm{Tr}_X({\mathcal {H}})\), \(\gamma \ne 0\), and \(C \in {\mathcal {L}}({\mathcal {H}})\) invertible (Lemma 5 in [24]), we obtain
\(\square \)
Proof of Theorem 17 - Invariance under unitary transformations
The proof of this theorem is similar to that of Theorem 16, using the fact that \(C^{*} = C^{-1}\) and the properties
of the operations \(\mathrm{det_X}\) and \(\mathrm{tr}_X\).\(\square \)
Proof of Theorem 18
For the case \(\alpha > 0\), \(\beta > 0\), this follows immediately from Definition 1. For the case \(\alpha > 0, \beta = 0\) (the case \(\alpha =0, \beta > 0\) is entirely similar), by Definition 2 and Lemma 12, we have
\(\square \)
Proof of Theorem 19
We first note that \((\varLambda + \frac{\gamma }{\mu } I)^{\omega } = (\frac{\gamma }{\mu })^{\omega }(\frac{\mu }{\gamma }\varLambda + I)^{\omega }\). For \(\alpha> 0, \beta > 0\), this follows immediately from Definition 1. For \(\alpha > 0, \beta = 0\), by Definition 2 and Lemma 12,
\(\square \)
1.6 Proofs of Theorems 1, 2, and 3
We are now ready to provide the proofs for Theorems 1, 2, and 3. We first need the following result.
Lemma 17
(i) Let \(r \ne 0\) be fixed. The function \(f(x) = x^r-1 - r\log (x)\) for \(x > 0\) has a unique global minimum \(f_{\min } = f(1) = 0\). In other words, \(f(x) \ge 0\) \(\forall x > 0\), with equality if and only if \(x = 1\).

(ii) Let \(\nu > 0\) and \(r \ne 0 \) be fixed. The function \(g(x) = (\frac{x}{\nu }+1)^r - 1 - r \log (\frac{x}{\nu }+1)\) for \(x > - \nu \) has a unique global minimum \(g_{\min } = g(0) = 0\). In other words, \(g(x) \ge 0\) \(\forall x > -\nu \), with equality if and only if \(x = 0\).
Proof of Theorem 1 - Positivity
For the case \(\alpha> 0, \beta > 0\), this is a special case of Theorem 8, with \(p +q = r\). Consider now the case \(\alpha = 0, \beta > 0\) (the case \(\alpha > 0, \beta =0\) then follows by dual symmetry). It suffices to consider \(D^{(0,1)}_r\). Recall that \(\varLambda + \nu I = (B+\mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2}\), where \(\nu = \frac{\gamma }{\mu }\). Then, since \(\mathrm{det_X}[(B+\mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2}] = \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]\) and \(\mathrm{tr}_X[(B + \mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] = \mathrm{tr}_X[(B+\mu I)^{-1}(A + \gamma I)]\), we have
By Lemma 11, \( \mathrm{tr}_X[(\varLambda + \nu I)^{r} -I] = \nu ^r - 1 +\nu ^r\mathrm{tr}\left[ \left( \frac{\varLambda }{\nu } + I\right) ^r - I\right] . \) Also
Thus we have
By the first part of Lemma 17, we have for all \(\nu > 0\), \(\nu ^r - 1 - r\log \nu \ge 0\), with equality if and only if \(\nu =1\). By the second part of Lemma 17, we have \(\forall k \in {\mathbb {N}}\), \(\left( \frac{\lambda _k}{\nu }+1\right) ^r - 1 - r\log \left( \frac{\lambda _k}{\nu } + 1\right) \ge 0\), with equality if and only if \(\lambda _k = 0\). Combining these two inequalities, we obtain
with equality if and only if \(\nu = \frac{\gamma }{\mu } = 1\) and \(\lambda _k = 0\) \(\forall k \in {\mathbb {N}}\Longleftrightarrow \varLambda = 0\), that is, if and only if \((B+\mu I)^{-1/2}(A + \gamma I)(B+ \mu I)^{-1/2} = I \Longleftrightarrow A+\gamma I = B+\mu I \Longleftrightarrow A = B\) and \(\gamma = \mu \). \(\square \)
Proof of Theorem 2 - Special cases I
The first statement of the theorem was proved in Sect. A.2. The second statement is the content of Theorem 13. \(\square \)
Proof of Theorem 3 - Special cases II
This theorem follows from Theorem 2 as well as the symmetry of \(D^{(\alpha , \alpha )}_r\) as proved in Theorem 14. \(\square \)
1.7 Proofs for the Divergences between RKHS covariance operators
In this section, we prove Theorems 26, 27, 28, and 29. We first need the following preliminary results.
Lemma 18
Let \({\mathcal {H}}_1, {\mathcal {H}}_2\) be separable Hilbert spaces. Let \(A:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2,B:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_1\) be compact operators such that \(AB: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2, BA:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_1\) are trace class. Let \(\alpha , \beta > 0\) be fixed. Then
Proof
Since the nonzero eigenvalues of \(AB:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) and \(BA:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_1\) are the same, we have
For any \(p,q \in {\mathbb {R}}\),
In the above equality, we have used the fact that a zero eigenvalue of AB and BA corresponds to an eigenvalue equal to 1 for \(\frac{\alpha (AB + I_{{\mathcal {H}}_2})^{p} + \beta (AB + I_{{\mathcal {H}}_2})^q}{\alpha + \beta }:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) and \(\frac{\alpha (BA + I_{{\mathcal {H}}_1})^{p} + \beta (BA + I_{{\mathcal {H}}_1})^q}{\alpha + \beta }:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_1\), respectively, which does not change the determinant. \(\square \)
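Lemma 18 is straightforward to verify numerically in the self-adjoint case \(B = A^{*}\), where \(AB\) and \(BA\) are symmetric positive semi-definite and the fractional powers can be computed by eigendecomposition; the dimensions and parameter values below are illustrative:

```python
import numpy as np

def spd_power(S, t):
    """Fractional power of a symmetric positive (semi)definite matrix shifted to be SPD."""
    w, V = np.linalg.eigh(S)
    return (V * w**t) @ V.T

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))   # A : H1 -> H2 with dim(H1) = 3, dim(H2) = 5
B = A.T                           # B = A* : H2 -> H1
AB, BA = A @ B, B @ A             # same nonzero eigenvalues, different sizes
alpha, beta, p, q = 0.6, 0.4, 1.3, -0.7
d2 = np.linalg.det((alpha * spd_power(AB + np.eye(5), p)
                    + beta * spd_power(AB + np.eye(5), q)) / (alpha + beta))
d1 = np.linalg.det((alpha * spd_power(BA + np.eye(3), p)
                    + beta * spd_power(BA + np.eye(3), q)) / (alpha + beta))
# zero eigenvalues of AB contribute factors of (alpha + beta)/(alpha + beta) = 1
assert np.isclose(d1, d2)
```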
Lemma 19
Let \({\mathcal {H}}_1, {\mathcal {H}}_2\) be separable Hilbert spaces. Let \(A,B:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2\) be compact operators such that \(AA^{*}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2,BB^{*}:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) are trace class. Let \(\alpha , \beta > 0\) be fixed. For any \(p, q \in {\mathbb {R}}\),
Proof of Lemma 19
We make use of the following notation. Let \(A,B,C: {\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2\) be three bounded linear operators. Consider the operator \((A \; B \; C): {\mathcal {H}}_1^3 \rightarrow {\mathcal {H}}_2\), with \((A \; B \; C)^{*} = \begin{pmatrix} A^{*}\\ B^{*}\\ C^{*} \end{pmatrix}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_1^3\). Here \({\mathcal {H}}_1^3 = {\mathcal {H}}_1 \oplus {\mathcal {H}}_1 \oplus {\mathcal {H}}_1\) denotes the direct sum of \({\mathcal {H}}_1\) with itself, that is
equipped with the inner product
By the Sherman–Morrison–Woodbury formula, \( (BB^{*}+I_{{\mathcal {H}}_2})^{-1} = I_{{\mathcal {H}}_2} - B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1}B^{*}. \) Thus
Here the operators \(C_1, C_2\) are defined as follows.
The operator \(C_2C_1: {\mathcal {H}}_1^3 \rightarrow {\mathcal {H}}_1^3\) is given by
It follows from Lemma 18 that
\(\square \)
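The Sherman–Morrison–Woodbury step used above is easy to check numerically in finite dimensions (a sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 2))   # B : H1 -> H2 with dim(H1) = 2, dim(H2) = 6
# (B B* + I)^{-1} = I - B (I + B* B)^{-1} B*
lhs = np.linalg.inv(B @ B.T + np.eye(6))
rhs = np.eye(6) - B @ np.linalg.inv(np.eye(2) + B.T @ B) @ B.T
assert np.allclose(lhs, rhs)
```

The identity is what lets the \(\mathrm{dim}({\mathcal {H}}_2)\)-sized inverse be computed through the smaller \(\mathrm{dim}({\mathcal {H}}_1)\)-sized one, which is the mechanism behind the Gram-matrix formulas that follow.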
Proof of Theorem 26
Let \(\varLambda +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2} = (BB^{*} + \mu I_{{\mathcal {H}}_2})^{-1/2}(AA^{*}+\gamma I_{{\mathcal {H}}_2})(BB^{*}+ \mu I_{{\mathcal {H}}_2})^{-1/2}, Z + \frac{\gamma }{\mu } I_{{\mathcal {H}}_2} = (AA^{*}+\gamma I_{{\mathcal {H}}_2})(BB^{*}+ \mu I_{{\mathcal {H}}_2})^{-1}\). By Theorem 10,
By Lemma 19, with \(\frac{\mu }{\gamma }Z + I_{{\mathcal {H}}_2} = (\frac{AA^{*}}{\gamma } + I_{{\mathcal {H}}_2})(\frac{BB^{*}}{\mu } + I_{{\mathcal {H}}_2})^{-1}\), the determinant in the last term is
which is obtained by replacing \(AA^{*}\) and \(BB^{*}\) in Lemma 19 with \(\frac{AA^{*}}{\gamma }\) and \(\frac{BB^{*}}{\mu }\), respectively. \(\square \)
Proof of Theorem 27
Let \(Z + \frac{\gamma }{\mu } I_{{\mathcal {H}}_2} = (AA^{*}+\gamma I_{{\mathcal {H}}_2})(BB^{*}+ \mu I_{{\mathcal {H}}_2})^{-1}\). By Eq. (29), when \(\dim ({\mathcal {H}}_2) < \infty \),
As in the proof of Theorem 26, the determinant in last term in the above expression is
This gives us the final expression. \(\square \)
Proof of Theorems 28 and 29
Theorem 28 follows from Theorem 26 by considering the linear operators \(A = \frac{1}{\sqrt{m}}\varPhi ({\mathbf {x}})J_m: {\mathbb {R}}^m \rightarrow {\mathcal {H}}_K, \;\;\; B = \frac{1}{\sqrt{m}}\varPhi ({\mathbf {y}})J_m:{\mathbb {R}}^m \rightarrow {\mathcal {H}}_K\). The proof of Theorem 29 is similar to that of Theorem 28, except that we invoke Theorem 27.\(\square \)
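The reduction to Gram matrices in Theorems 28 and 29 hinges on the fact that \(AA^{*}\) and \(A^{*}A\) share their nonzero spectrum. The sketch below illustrates this with an explicit finite-dimensional feature map standing in for the (generally infinite-dimensional) map \(\varPhi \), and takes \(J_m\) to be the standard centering matrix \(I_m - \frac{1}{m}\mathbf {1}_m\mathbf {1}_m^T\); the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 8, 5                                   # m samples, explicit d-dim feature map
Phi = rng.standard_normal((d, m))             # columns play the role of Phi(x_i)
J = np.eye(m) - np.ones((m, m)) / m           # centering matrix J_m
A = Phi @ J / np.sqrt(m)                      # A = (1/sqrt(m)) Phi(x) J_m
cov = A @ A.T                                 # covariance operator A A* (d x d)
gram = A.T @ A                                # = (1/m) J_m K J_m with K = Phi^T Phi (m x m)
ev_cov = np.sort(np.linalg.eigvalsh(cov))[::-1]
ev_gram = np.sort(np.linalg.eigvalsh(gram))[::-1]
# nonzero eigenvalues of the covariance operator and the centered Gram matrix coincide
assert np.allclose(ev_cov, ev_gram[:d])
```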
1.8 Proofs for the Metric properties
We now prove Theorems 20, 22, and 23, which lead to the proofs of Theorems 4 and 21. We first prove Theorems 4 and 21 for the case \(\alpha = 1/2\), which corresponds to the infinite-dimensional symmetric Stein divergence, and then for the general case \(\alpha > 0\). The former case utilizes Theorem 31 and the latter utilizes Theorem 33 together with the case \(\alpha = 1/2\). Both Theorems 31 and 33 are of interest in their own right.
The case of the infinite-dimensional symmetric Stein divergence. Consider the first case \(\alpha =\frac{1}{2}\).
Lemma 20
Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B,C:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint finite-rank operators, such that \(A+I > 0\), \(B + I > 0\), \(C+I > 0\). Then
Proof
Since A, B, C are finite-rank operators, there exists a finite-dimensional subspace \({\mathcal {H}}_n \subset {\mathcal {H}}\), with \(\dim ({\mathcal {H}}_n) = n\) for some \(n \in {\mathbb {N}}\), such that \(\mathrm{range}(A) \subset {\mathcal {H}}_n, \mathrm{range}(B) \subset {\mathcal {H}}_n\), \(\mathrm{range}(C) \subset {\mathcal {H}}_n\). Let
Then \(A_n,B_n, C_n\) are linear operators on the finite-dimensional space \({\mathcal {H}}_n\) and thus are represented by \(n \times n\) matrices, which we denote by the same symbols. We have \( (A+B)_n = (A+B) \big \vert _{{\mathcal {H}}_n} = A \big \vert _{{\mathcal {H}}_n} + B\big \vert _{{\mathcal {H}}_n} = A_n + B_n, (A+C)_n = A_n + C_n, (C+B)_n = B_n + C_n. \) Applying the finite-dimensional result in [37], we then obtain
Since the non-zero eigenvalues of A and \(A_n\) are the same, we have \(\det (A+I) = \det (A_n+I_n)\) and the same holds true for the other operators. This, together with the last expression, gives us the final result. \(\square \)
Proof of Theorem 23 - Triangle inequality for square root of symmetric Stein divergence
Let \(\{A_n\}_{n\in {\mathbb {N}}}\), \(\{B_n\}_{n\in {\mathbb {N}}}\), \(\{C_n\}_{n\in {\mathbb {N}}}\) be sequences of finite-rank operators with \(||A_n - A||_{\mathrm{tr}} \rightarrow 0\), \(||B_n - B||_{\mathrm{tr}} \rightarrow 0\), \(||C_n - C||_{\mathrm{tr}} \rightarrow 0\) as \(n \rightarrow \infty \). By Lemma 20, we have
By Theorem 3.5 in [36], as \(n \rightarrow \infty \), we have
Taking the limit \(n \rightarrow \infty \) in the above triangle inequality for \((A_n+I), (B_n+I)\), and \((C_n+I)\) gives the final triangle inequality for \((A+I), (B+I)\), and \((C+I)\). \(\square \)
The following is the specialization of Theorem 4 when \(\alpha = 1/2\).
Theorem 30
(Metric property - square root of symmetric Stein divergence) Let \(\gamma > 0, \gamma \in {\mathbb {R}}\) be fixed. The square root \(\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]}\) of the infinite-dimensional symmetric Stein divergence is a metric on \(\mathrm{PTr}({\mathcal {H}})(\gamma )\).
Proof of Theorem 30
The positivity and symmetry of \(D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]\) are shown in Theorems 1 and 14, respectively. It remains for us to show the triangle inequality, namely
for any three operators \((A+\gamma I), (B+\gamma I), (C+\gamma I) \in \mathrm{PTr}({\mathcal {H}})(\gamma )\). We have
Thus the triangle inequality for \(D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]\) follows from the triangle inequality stated in Theorem 23. \(\square \)
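The metric property can be sanity-checked numerically in the finite-dimensional setting. The following sketch (our own illustration in Python, not part of the proof) draws random \(2 \times 2\) SPD matrices and verifies the triangle inequality for the square root of the symmetric Stein divergence, written in its standard finite-dimensional form \(D(A,B) = \log \det \frac{A+B}{2} - \frac{1}{2}\log \det (AB)\); the helper names `stein` and `rand_spd` are our own.

```python
import math
import random

def det2(M):
    # determinant of a 2x2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def stein(A, B):
    # symmetric Stein divergence: log det((A+B)/2) - (1/2) log det(A B)
    M = [[(A[i][j] + B[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    d = math.log(det2(M)) - 0.5 * (math.log(det2(A)) + math.log(det2(B)))
    return max(d, 0.0)  # guard against tiny negative round-off

def rand_spd(rng):
    # M^T M + I is symmetric positive definite
    a, b, c, d = (rng.uniform(-1.0, 1.0) for _ in range(4))
    return [[a * a + c * c + 1.0, a * b + c * d],
            [a * b + c * d, b * b + d * d + 1.0]]

rng = random.Random(0)
for _ in range(1000):
    A, B, C = rand_spd(rng), rand_spd(rng), rand_spd(rng)
    # triangle inequality for the square root of the Stein divergence
    assert math.sqrt(stein(A, C)) <= math.sqrt(stein(A, B)) + math.sqrt(stein(B, C)) + 1e-12
```

The divergence itself famously fails the triangle inequality; only its square root is a metric, which is exactly what the assertion exercises.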
Lemma 21
Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B \in \mathrm{Sym}({\mathcal {H}})\) be finite-rank operators, with maximum rank \(n \in {\mathbb {N}}\), such that \(A+I > 0\), \(B + I > 0\). Then
Proof of Lemma 21
Since A, B are both finite-rank operators, there exists a finite-dimensional subspace \({\mathcal {H}}_n \subset {\mathcal {H}}\), with \(\dim ({\mathcal {H}}_n) = n\), such that \(\mathrm{range}(A) \subset {\mathcal {H}}_n\), \(\mathrm{range}(B) \subset {\mathcal {H}}_n\). Let \(A_n = A \big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n,\;\; B_n = B\big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n\). Then \(A_n,B_n\) are linear operators on the finite-dimensional space \({\mathcal {H}}_n\) and thus are represented by \(n \times n\) matrices, which we denote by the same symbols. Furthermore, \( (A+B)_n = (A+B) \big \vert _{{\mathcal {H}}_n} = A \big \vert _{{\mathcal {H}}_n} + B\big \vert _{{\mathcal {H}}_n} = A_n + B_n. \) Thus we can apply the following inequality for finite-dimensional SPD matrices ([4])
We note that the non-zero eigenvalues of \(A_n, B_n\) are the same as those of A, B, respectively, with at most n of them being non-zero, and \(\det (\frac{A+B}{2}+I) = \det (\frac{A_n+B_n}{2} + I_n)\). Together with the previous inequality, this gives us the final result. \(\square \)
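The finite-dimensional inequality invoked from [4] is not displayed above. One determinant inequality of exactly this type, \(\det (\frac{A+B}{2}+I) \ge \sqrt{\det (A+I)\det (B+I)}\), follows from the log-concavity of the determinant on the SPD cone; the following Python sketch (our own illustration, with an arbitrary choice of random symmetric matrices satisfying \(A+I>0\), \(B+I>0\)) checks it numerically.

```python
import math
import random

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def plus_id(M):
    # M + I for a 2x2 matrix M
    return [[M[0][0] + 1.0, M[0][1]], [M[1][0], M[1][1] + 1.0]]

rng = random.Random(1)
for _ in range(1000):
    # random symmetric A, B with A + I > 0, B + I > 0:
    # eigenvalues are bounded below by min(diag) - |off-diag| >= -0.7 > -1
    A = [[rng.uniform(-0.4, 2.0), 0.0], [0.0, rng.uniform(-0.4, 2.0)]]
    A[0][1] = A[1][0] = rng.uniform(-0.3, 0.3)
    B = [[rng.uniform(-0.4, 2.0), 0.0], [0.0, rng.uniform(-0.4, 2.0)]]
    B[0][1] = B[1][0] = rng.uniform(-0.3, 0.3)
    halfsum = [[(A[i][j] + B[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    lhs = det2(plus_id(halfsum))
    rhs = math.sqrt(det2(plus_id(A)) * det2(plus_id(B)))
    assert lhs >= rhs - 1e-12
```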
Theorem 31
Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint trace class operators, such that \(A+I > 0\), \(B + I > 0\). Then
Proof of Theorem 31
Let \(A = \sum _{j=1}^{\infty }\lambda _j(A)\phi _j \otimes \phi _j\) denote the spectral decomposition for A. For each \(n \in {\mathbb {N}}\), define \(A_n = \sum _{j=1}^n\lambda _j(A)\phi _j \otimes \phi _j\). Then \(A_n\) is a finite-rank operator with the eigenvalues being the first n eigenvalues of A and \(\lim _{n \rightarrow \infty }||A_n - A||_{\mathrm{tr}} = 0\). In the same way, we construct a sequence of finite-rank operators \(B_n\) with \(\lim _{n\rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\), so that \( \lim _{n \rightarrow \infty } ||(A_n+B_n) -(A+B)||_{\mathrm{tr}} = 0. \) By Theorem 3.5 in [36], as \(n \rightarrow \infty \), we then have
Applying Lemma 21 to \(A_n\) and \(B_n\), we have
The final result is then obtained by taking the limit as \(n \rightarrow \infty \), noting that the eigenvalues of \(A_n\), \(B_n\), are precisely the first n eigenvalues of A, B, respectively. \(\square \)
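The truncation argument above can be illustrated numerically for a diagonal trace-class operator (the eigenvalue sequence \(1/j^2\) is our own illustrative choice): the Fredholm determinants \(\det (I+A_n) = \prod _{j=1}^n (1+\lambda _j)\) of the truncations converge to \(\det (I+A)\) as the trace-norm tail vanishes, here against the classical Euler product \(\prod _{j\ge 1}(1+1/j^2) = \sinh (\pi )/\pi \).

```python
import math

# Diagonal trace-class operator with eigenvalues 1/j^2 (illustrative choice).
# Classically, det(I + A) = prod_{j>=1} (1 + 1/j^2) = sinh(pi)/pi.
target = math.sinh(math.pi) / math.pi

partial = 1.0
errors = []
checkpoints = (10, 100, 1000, 20000)
for j in range(1, 20001):
    partial *= 1.0 + 1.0 / (j * j)          # det(I + A_n) for the rank-n truncation
    if j in checkpoints:
        errors.append(abs(partial - target))

# the truncation error shrinks as the trace-norm tail sum_{j>n} 1/j^2 -> 0
assert errors == sorted(errors, reverse=True)
assert errors[-1] < 1e-3
```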
The following is the specialization of Theorem 21 when \(\alpha = 1/2\).
Theorem 32
Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A,B \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) be such that \(A+I> 0,B + I > 0\). Let \(\mathrm{Eig}(A), \mathrm{Eig}(B): \ell ^2 \rightarrow \ell ^2\) be diagonal operators with the diagonals consisting of the eigenvalues of A and B, respectively, in decreasing order. Then
Proof of Theorem 32
By definition, we have
\(\square \)
The general case. We now consider the general case \(\alpha > 0\). In the following, let \({\mathscr {C}}_p({\mathcal {H}})\) denote the set of pth Schatten class operators on \({\mathcal {H}}\), under the norm \(||\;||_p\), which is defined by
with \({\mathscr {C}}_1({\mathcal {H}})\) being the space of trace class operators \(\mathrm{Tr}({\mathcal {H}})\), \({\mathscr {C}}_2({\mathcal {H}})\) being the space of Hilbert–Schmidt operators \(\mathrm{HS}({\mathcal {H}})\), and \({\mathscr {C}}_{\infty }({\mathcal {H}})\) being the set of compact operators under the operator norm \(||\;||\).
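For a diagonal operator, the Schatten norms reduce to the \(\ell ^p\) norms of the singular values, which makes the chain of inclusions \({\mathscr {C}}_1({\mathcal {H}}) \subset {\mathscr {C}}_2({\mathcal {H}}) \subset {\mathscr {C}}_{\infty }({\mathcal {H}})\) easy to see numerically. The Python sketch below uses an illustrative eigenvalue sequence of our own choosing.

```python
import math

# Singular values of a diagonal operator: s_j = 1/j^2 (illustrative choice)
s = [1.0 / (j * j) for j in range(1, 10001)]

def schatten(p):
    # ||A||_p = (sum_j s_j^p)^(1/p) for the diagonal operator diag(s)
    return sum(x ** p for x in s) ** (1.0 / p)

trace_norm = schatten(1)   # C_1 norm (trace class)
hs_norm = schatten(2)      # C_2 norm (Hilbert-Schmidt)
op_norm = max(s)           # C_infty norm (operator norm)

# Schatten norms decrease as p increases, reflecting C_1 subset C_2 subset C_infty
assert trace_norm >= hs_norm >= op_norm
# ||A||_1 approximates the full trace sum_j 1/j^2 = pi^2/6
assert abs(trace_norm - math.pi ** 2 / 6) < 1e-3
```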
Theorem 33
Let \(r \in {\mathbb {R}}\) be fixed but arbitrary. Assume that \(1 \le p \le \infty \). Let \(A,\{A_n\}_{n \in {\mathbb {N}}} \in \mathrm{Sym}({\mathcal {H}}) \cap {\mathscr {C}}_p({\mathcal {H}})\) be such that \(I+A > 0\), \(I+A_n > 0\) \(\forall n \in {\mathbb {N}}\). Assume that \(\lim _{n \rightarrow \infty }||A_n - A||_{p} = 0\). Then
Proof of Theorem 33
(i) The case \(r = 0\) is trivial. We first prove that
We make use of the following result from [18] (Corollary 3.2), which states that for any two positive operators A, B on \({\mathcal {H}}\) such that \(A \ge c > 0\), \(B \ge c > 0\), and any operator X on \({\mathcal {H}}\),
By the assumption \(I+A > 0\), there exists \(M_A > 0\) such that \( \langle x, (I+A)x\rangle \ge M_A||x||^2 \;\; \; \forall x \in {\mathcal {H}}. \) By the assumption \(\lim _{n \rightarrow \infty }||A_n -A||_{p} = 0\), for any \(\epsilon \) satisfying \(0< \epsilon < M_A\), there exists \(N = N(\epsilon ) \in {\mathbb {N}}\) such that \(||A_n - A||_{p} < \epsilon \) \(\forall n \ge N\). Then \(\forall x \in {\mathcal {H}}\),
It thus follows that \(\forall x \in {\mathcal {H}}\),
Thus we have \(I+A \ge M_A > 0\), \(I+A_n \ge M_A-\epsilon > 0\) \(\forall n \ge N = N(\epsilon )\). Then, applying Eq. (157),
which implies \(\lim _{n \rightarrow \infty }||(I+A_n)^r - (I+A)^r||_{p} = 0\).
(ii) For \(r > 1\), we proceed by induction as follows. We have
Thus this case follows from the case \(0 \le r \le 1\) by induction.
(iii) For the case \(r < 0\), we first prove that
This follows from the fact that \(\forall n \ge N = N(\epsilon )\),
(iv) We next prove that
We have \((I+A)^{-1} \ge \frac{1}{\max \{(1+\lambda _k(A)): k \in {\mathbb {N}}\}} = \frac{1}{||I+A||}> 0\). From the limit \(\lim _{n \rightarrow \infty }||A_n-A|| = 0\), it follows that for any \(\epsilon \) satisfying \(0< \epsilon < ||I+A||\), there exists \(M = M(\epsilon ) \in {\mathbb {N}}\) such that \(\forall n \ge M\),
It follows that \(\forall n \ge M\), \( (I+A_n)^{-1} \ge \frac{1}{\max \{(1+\lambda _k(A_n)): k \in {\mathbb {N}}\}} = \frac{1}{||I+A_n||} \ge \frac{1}{||I+A||+\epsilon }. \) Hence invoking Eq. (157) again, we obtain \(\forall n \ge M\)
which, together with the previous limit, implies that \(\lim _{n \rightarrow \infty }||(I+A_n)^{-r} - (I+A)^{-r}||_{p} = 0\) for the case \(r = 1\).
(v) By an induction argument as in step (ii), we then obtain the final part of the theorem, namely
Lemma 22
Let \({\mathcal {H}}\) be a separable Hilbert space. Assume that \(\{A_n\}_{n\in {\mathbb {N}}}\), A are self-adjoint trace class operators on \({\mathcal {H}}\) such that \((I+A) > 0\), \((I+A_n) > 0\) \(\forall n \in {\mathbb {N}}\). Assume that \(||A_n - A||_{\mathrm{tr}} \rightarrow 0\) as \(n \rightarrow \infty \). Then \(A_n(I+A_n)^{-1}\) and \(A(I+A)^{-1}\) are trace class operators and
Lemma 23
Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(\{A_n\}_{n \in {\mathbb {N}}}\), A, \(\{B_n\}_{n \in {\mathbb {N}}}\), B, in \(\mathrm{Tr}({\mathcal {H}})\) be self-adjoint operators with \(\lim _{n \rightarrow \infty }||A_n -A||_{\mathrm{tr}} = 0\), \(\lim _{n \rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\). Assume that \(I+A> 0, I+B> 0, I+A_n> 0, I+ B_n > 0\) \(\forall n \in {\mathbb {N}}\). Then \((I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} - I\) and \((I+B)^{-1/2}(I+A)(I+B)^{-1/2} - I\) are self-adjoint, trace class operators on \({\mathcal {H}}\) and
Proof of Theorem 20 - Convergence in trace norm
Let \(I+\varLambda = (I+B)^{-1/2}(I+A)(I+B)^{-1/2}\) and \(I+\varLambda _n = (I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2}\), with \(\varLambda , \varLambda _n \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\).
By Lemma 23, we have \(\lim _{n\rightarrow \infty }||\varLambda _n -\varLambda ||_{\mathrm{tr}} = 0\). Thus by Theorem 33,
From Eq. (130), we have
Taking the limit as \(n \rightarrow \infty \) and applying the continuity of the Fredholm determinant in the trace norm (e.g. Theorem 3.5 in [36]), we obtain
\(\square \)
Proof of Theorem 21 - Diagonalization
Consider first the case \(\alpha > 0\). As in the proof of Theorem 22, it suffices to prove the result for the case \(\gamma = 1\). Let \(A = \sum _{j=1}^{\infty }\lambda _j(A)\phi _j \otimes \phi _j\) be the spectral decomposition for A. For each \(n \in {\mathbb {N}}\), define \(A_n = \sum _{j=1}^n\lambda _j(A)\phi _j \otimes \phi _j\). Then \(A_n\) is a finite-rank operator whose eigenvalues are the first n eigenvalues of A and \(\lim _{n \rightarrow \infty }||A_n - A||_{\mathrm{tr}} = 0\). In the same way, we construct a sequence of finite-rank operators \(B_n\) with \(\lim _{n\rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\). By construction,
Thus by Theorem 20, we have
Since \(A_n, B_n\) can be identified with finite-dimensional matrices, as in the proof of Lemma 20, we can apply the corresponding finite-dimensional result in [9] to obtain
Thus taking limits as \(n \rightarrow \infty \) gives
Letting \(\alpha \rightarrow 0\) on both sides of the above expression, we also obtain the result for the case \(\alpha = 0\). \(\square \)
Proof of Theorem 22 - Triangle inequality
For a fixed \(\gamma > 0\), we have
which reduces the problem to the case \(\gamma = 1\), so it suffices to give the proof for \(\gamma = 1\).
Let
Let \(\mathrm{Eig}(\varLambda _1), \mathrm{Eig}(\varLambda _2):\ell ^2 \rightarrow \ell ^2\) denote diagonal operators with the diagonals consisting of the eigenvalues of \(\varLambda _1, \varLambda _2\), respectively, in decreasing order. We have by definition
Then for any \(\alpha > 0\), we have
Similarly,
Furthermore, by the Affine Invariance property (Theorem 16), we have
By the triangle inequality in Theorem 23,
The desired triangle inequality for \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+I), (B+I)]}\) requires that for all \((A+I), (B+I), (C+I) \in \mathrm{PTr}({\mathcal {H}})\),
By Eqs. (161)–(164), the triangle inequality in Eq. (165) is satisfied if
Since \(\mathrm{Eig}(\varLambda _1)\) and \(\mathrm{Eig}(\varLambda _2)\) are diagonal operators, they commute and thus
This is precisely the desired inequality stated in Eq. (166). \(\square \)
Proof of Theorem 4 - Metric property
The case \(\alpha = 0\) corresponds to the affine-invariant Riemannian distance on the Hilbert manifold \(\varSigma ({\mathcal {H}})\) [20], which is still a metric when restricted to \(\mathrm{PTr}({\mathcal {H}})\). Consider the case \(\alpha > 0\). The positivity and symmetry of the divergence \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]\) follow from Theorems 1 and 14, respectively. The triangle inequality for the square root \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]}\) is from Theorem 22. Thus \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]}\) is a metric on \(\mathrm{PTr}({\mathcal {H}})(\gamma )\). \(\square \)
To prove Theorem 5, we first need the following technical result.
Lemma 24
Let \(\lambda > 0, \lambda \ne 1\) be fixed. Consider the function
Then f is continuous on \([0, \infty )\) and strictly decreasing on \((0, \infty )\), with \(f_{\max } = f(0) = \frac{1}{2}(\log \lambda )^2\). Furthermore, \(\lim _{x \rightarrow \infty }f(x) = 0\).
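The display defining f is not reproduced above. A function with precisely the stated properties, of the type that arises in the eigenvalue expansion of \(D^{(\alpha ,\alpha )}_{2\alpha }\), is \(f(x) = x^{-2}\log \frac{\lambda ^x + \lambda ^{-x}}{2}\), extended by continuity at 0; we stress that this specific form is our own assumption for illustration, not taken from the source. The stated properties of this candidate can be checked numerically in Python:

```python
import math

def f(x, lam):
    # Candidate for the function in Lemma 24 (an ASSUMED form, for illustration):
    # f(x) = log((lam^x + lam^-x)/2) / x^2, with f(0) = (1/2)(log lam)^2 by continuity.
    if x == 0.0:
        return 0.5 * math.log(lam) ** 2
    L = abs(x * math.log(lam))
    # log(cosh(L)) evaluated stably as L + log(1 + e^(-2L)) - log 2
    return (L + math.log1p(math.exp(-2.0 * L)) - math.log(2.0)) / (x * x)

for lam in (0.3, 2.0, 7.5):
    xs = [0.0] + [0.1 * k for k in range(1, 200)]
    vals = [f(x, lam) for x in xs]
    # f is decreasing on the sampled grid, with maximum f(0) = (1/2)(log lam)^2
    assert all(vals[i] >= vals[i + 1] for i in range(len(vals) - 1))
    assert abs(vals[0] - 0.5 * math.log(lam) ** 2) < 1e-12
    # f(x) -> 0 as x -> infinity
    assert f(1000.0, lam) < 1e-2
```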
Proof of Theorem 5 - Monotonicity
Let \(\{\lambda _j\}_{j=1}^{\infty }\) be the eigenvalues of the trace class operator \(\varLambda \), where \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\). For \(\gamma = \mu \), we have \(\delta = \frac{\alpha }{\alpha + \beta }\), \(p = r(1-\delta ) = \frac{r\beta }{\alpha + \beta }\), \(q = \frac{r\alpha }{\alpha + \beta }\). If furthermore \(\alpha = \beta \) and \(r = 2\alpha \), then \(\delta = \frac{1}{2}\), \(p = \alpha \), \(q = \alpha \). By Theorem 10, for \(\alpha > 0\),
We note that since \(\varLambda +I > 0\), we have \(\lambda _j +1> 0\) \(\forall j \in {\mathbb {N}}\). Furthermore, if \(A \ne B\), then there exists at least one index \(j \in {\mathbb {N}}\) for which \(\lambda _j \ne 0\), so that \(\lambda _j + 1 > 0\), \(\lambda _j +1 \ne 1\), since otherwise \(\varLambda = 0\) and \(A+\gamma I = B+ \gamma I \Longleftrightarrow A = B\). The terms with \(\lambda _j = 0\) simply vanish and contribute nothing to the infinite series expansion above for \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]\).
By Lemma 24, as a function of the parameter \(\alpha \), \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]\) is strictly decreasing on \((0, \infty )\), with maximum value achieved at \(\alpha = 0\).
By Lemma 24 and Lebesgue’s Monotone Convergence Theorem, we have
\(\square \)
1.9 Proofs for the Derivatives
Proof of Lemma 2
This follows from the linearity of the \(\mathrm{tr}_X\) operation, which gives \(\mathrm{tr}_X[A_0+\gamma _0I + t(A+\gamma I)] - \mathrm{tr}_X(A_0+\gamma _0I) - t\mathrm{tr}_X(A+\gamma I) = 0\). \(\square \)
Proof of Proposition 2
For \(\gamma = 0\), this reduces to the Plemelj-Smithies formula. Assume that \(\gamma \ne 0\). By definition of the extended Fredholm determinant \(\mathrm{det_X}\), for \(t \ne -1/\gamma \), we have
Let \(0< \epsilon < 1\) be fixed. For t sufficiently close to zero such that \(1+ \gamma t > \epsilon \), we have
\(\square \)
Proof of Lemma 3
By the product rule of \(\mathrm{det_X}\),
By Proposition 2, we have
from which we have the desired formula. \(\square \)
Proof of Lemma 4
For the function \(\log :{\mathbb {R}}^{+}\rightarrow {\mathbb {R}}\), \(D\log (x_0)(x) = \frac{x}{x_0}\). By the chain rule, we have
by applying the formula for \(D\mathrm{det_X}(A_0+\gamma _0 I)(A+\gamma I)\) from Lemma 3. \(\square \)
We now prove Proposition 3. Recall the chain rule for differentiation on Banach spaces. Let V, W, U be Banach spaces. Let \(\varOmega \subset V\) be open and assume that \(f: \varOmega \rightarrow W\) is differentiable at \(x_0 \in \varOmega \). Let \(\varSigma \subset W\) be open and assume that \(g:\varSigma \rightarrow U\) is differentiable at \(y_0 = f(x_0) \in \varSigma \). Then \(g\circ f\) is defined in an open neighborhood of \(x_0\) and is differentiable at \(x_0\), with
Let \({\mathcal {A}}\) be a Banach algebra with identity I. Consider the mapping \(f: {\mathcal {A}}\rightarrow {\mathcal {A}}\) defined by \(f(A) = A^k\), with \(k \in {\mathbb {N}}\). Its derivative \(Df(A_0)\) at \(A_0 \in {\mathcal {A}}\) is given by
For the exponential mapping \(\exp (A) = \sum _{k=0}^{\infty }\frac{A^k}{k!}\), we then have
In particular, for \(A_0 = I\) and \(A_0 = 0\), respectively,
Let \(\mathrm{GL}({\mathcal {A}})\) denote the group of invertible elements in \({\mathcal {A}}\). For the function \(f:\mathrm{GL}({\mathcal {A}}) \rightarrow \mathrm{GL}({\mathcal {A}})\) defined by \(f(A) = A^{-1}\), its Fréchet derivative \(Df(A_0)\) at \(A_0 \in \mathrm{GL}({\mathcal {A}})\) is given by
Consider now the case \({\mathcal {A}}= {\mathcal {L}}({\mathcal {H}})\). Consider the sets
of unitized compact and positive definite unitized compact operators on \({\mathcal {H}}\), respectively. Then the logarithm \(\log :\mathrm{PK}({\mathcal {H}}) \rightarrow \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\), the inverse of the exponential map \(\exp \), is well-defined. Let us find \(D\log (A_0)\) for \(A_0 \in \mathrm{PK}({\mathcal {H}})\). We first have the following result.
Lemma 25
The map \(\exp :\mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}) \rightarrow \mathrm{PK}({\mathcal {H}})\) and its inverse \(\log :\mathrm{PK}({\mathcal {H}}) \rightarrow \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\) are bijections.
Proof
For \(A \in \mathrm{Sym}({\mathcal {H}})\), A compact, let \(\{\lambda _j\}_{j=1}^{\infty }\) be its eigenvalues, then \(\lim _{j \rightarrow \infty }\lambda _j = 0\). For \(A+\gamma I \in \mathrm{PK}({\mathcal {H}})\), \(\log (\frac{A}{\gamma }+I)\) is compact, with eigenvalues \(\{\log (\frac{\lambda _j}{\gamma } + 1)\}_{j=1}^{\infty } \rightarrow 0\) as \(j \rightarrow \infty \). Hence \( \log (A+\gamma I) = \log (\frac{A}{\gamma } +I) + (\log \gamma )I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}). \) Conversely, for \(A+\gamma I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\), we have \( \exp (A+\gamma I) = e^{\gamma }\exp (A) = e^{\gamma }I + e^{\gamma }[\exp (A) - I] \in \mathrm{PK}({\mathcal {H}}), \) since the operator \((\exp (A) - I)\), with eigenvalues \(\{e^{\lambda _j} - 1\}_{j=1}^{\infty } \rightarrow 0\) as \(j \rightarrow \infty \), is compact. \(\square \)
From the relation \(\exp (\log (A_0)) = A_0\) \(\forall A_0 \in \mathrm{PK}({\mathcal {H}})\) and the chain rule, it follows that
Similarly, from the relation \(\log (\exp (B_0)) = B_0\) \(\forall B_0 \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\) and the chain rule, we have
For each \(A_0 \in \mathrm{PK}({\mathcal {H}})\), we have \(A_0 = \exp (B_0)\) where \(B_0 =\log (A_0) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\), thus
Hence it follows that \(\forall A_0 \in \mathrm{PK}({\mathcal {H}})\),
Thus \(D\log (A_0)\) and \(D\exp (\log (A_0))\) are invertible operators in \({\mathcal {L}}({\mathcal {A}})\) and
We first consider the Fréchet derivative of \(f(t) = (I+tA)^r\), \(r \in {\mathbb {R}}\), for \(A \in \mathrm{Sym}({\mathcal {H}})\).
Lemma 26
Assume that \(A,B \in \mathrm{Sym}({\mathcal {H}})\) are compact, \(A \ne 0\). Let \(r \in {\mathbb {R}}\) be fixed. Consider the function \(f:(-1/||A||, 1/||A||) \rightarrow \mathrm{Sym}({\mathcal {H}})\) defined by \(f(t) = (I+ tA)^r + B\). Then
Proof
Since \(t \in (-1/||A||, 1/||A||)\), \(I+tA > 0\) and thus \((I+tA)^r\) is well-defined \(\forall r \in {\mathbb {R}}\). The derivative of f does not depend on the constant term B, so we can set \(B = 0\). Write \(f(t) = (I+tA)^r = \exp (r\log (I+tA))\). By the chain rule, we have
\(\square \)
Lemma 27
Assume that \(A \in \mathrm{Sym}({\mathcal {H}})\) is compact, \(A \ne 0\). Let \(r,s,c \in {\mathbb {R}}, c\ge 0\) be fixed. Consider the function \(f:(-1/||A||, 1/||A||) \rightarrow \mathrm{Sym}({\mathcal {H}})\) defined by \(f(t) = [(I+tA)^r + cI]^{-1}(I+tA)^{s}\). Then
Proof
Since \(t \in (-1/||A||, 1/||A||)\), \(I+tA > 0\) and thus \((I+tA)^r, (I+tA)^s\) are well-defined \(\forall r,s \in {\mathbb {R}}\). Write \(f(t) = [(I + tA)^{r-s} + c(I+tA)^{-s}]^{-1} = [g(t)]^{-1}\), where \(g(t) = (I + tA)^{r-s} + c(I+tA)^{-s}\), with \(g(0) = (1+c)I\). By Lemma 26,
By Eq. (173), we then have \( Df(0)(t) = -g(0)^{-1}[Dg(0)(t)]g(0)^{-1} = -\frac{ r- (1+c)s}{(1+c)^2}At. \)\(\square \)
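The closed form \(Df(0)(t) = -\frac{r - (1+c)s}{(1+c)^2}At\) of Lemma 27 can be checked against a central finite-difference quotient in the scalar case \(A = a\) (an illustrative Python sketch; the parameter values are arbitrary choices of our own):

```python
def f(t, a, r, s, c):
    # scalar instance of Lemma 27: f(t) = ((1+ta)^r + c)^(-1) (1+ta)^s
    return (1.0 + t * a) ** s / ((1.0 + t * a) ** r + c)

def predicted(a, r, s, c):
    # Lemma 27: Df(0)(1) = -(r - (1+c)s) / (1+c)^2 * a
    return -(r - (1.0 + c) * s) / (1.0 + c) ** 2 * a

for (a, r, s, c) in [(0.7, 2.0, 0.5, 1.5), (-0.4, -1.0, 3.0, 0.0), (1.2, 0.5, -2.0, 4.0)]:
    h = 1e-6
    fd = (f(h, a, r, s, c) - f(-h, a, r, s, c)) / (2.0 * h)  # central difference
    assert abs(fd - predicted(a, r, s, c)) < 1e-6
```

For instance, with \(r=-1\), \(s=3\), \(c=0\) the function collapses to \((1+ta)^4\), whose derivative at 0 is \(4a\), in agreement with the formula.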
Lemma 28
Let \({\mathcal {A}}\) be a Banach algebra with identity I. Let \(A,B \in {\mathcal {A}}\) be fixed and \(k \in {\mathbb {N}}\). Consider the function \(f:{\mathbb {R}}\rightarrow {\mathcal {A}}\) defined by \(f(t) = (A+tB)^k\). Then
Proof
Write \(f(t) = [g(t)]^k = h(g(t))\), where \(g(t) = (A+tB)\), with \(g(0) = A\) and \(Dg(0)(t) = Bt\), and \(h(C) = C^k\). By the chain rule and Eq. (169), we have
\(\square \)
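The non-commutative product rule of Lemma 28, \(Dg(0)(1) = A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}\) (as restated in the proof of Lemma 30 below), can be verified by finite differences for a pair of non-commuting \(2\times 2\) matrices. This is our own illustrative Python sketch; the matrices and the helper names are arbitrary.

```python
def mat_mul(X, Y):
    # product of 2x2 matrices
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(X, Y, s=1.0):
    # X + s*Y for 2x2 matrices
    return [[X[i][j] + s * Y[i][j] for j in range(2)] for i in range(2)]

def mat_pow(X, k):
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        P = mat_mul(P, X)
    return P

A = [[1.0, 2.0], [0.0, 3.0]]
B = [[0.5, -1.0], [2.0, 0.0]]   # A and B do not commute
k = 4

# Lemma 28: D f(0)(1) = sum_{i=0}^{k-1} A^(k-1-i) B A^i
exact = [[0.0, 0.0], [0.0, 0.0]]
for i in range(k):
    exact = mat_add(exact, mat_mul(mat_mul(mat_pow(A, k - 1 - i), B), mat_pow(A, i)))

# central finite difference of t -> (A + tB)^k at t = 0
h = 1e-6
Fp = mat_pow(mat_add(A, B, h), k)
Fm = mat_pow(mat_add(A, B, -h), k)
fd = [[(Fp[i][j] - Fm[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]
assert all(abs(fd[i][j] - exact[i][j]) < 1e-3 for i in range(2) for j in range(2))
```

Note that the naive guess \(kA^{k-1}B\) is wrong precisely because A and B do not commute; the symmetrized sum is required.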
Lemma 29
Let V be a Banach space. Let \(\varOmega \subset {\mathbb {R}}\) be an open subset. Consider \(F(t) = g(t)H(t)\), where \(H:\varOmega \rightarrow V\) and \(g:\varOmega \rightarrow {\mathbb {R}}\) are both differentiable on \(\varOmega \). Then F is differentiable on \(\varOmega \) and
Lemma 30
Let \({\mathcal {A}}\) be a Banach algebra with identity I. Let \(A,B \in {\mathcal {A}}\) be fixed, \(\gamma \in {\mathbb {R}},\gamma \ne 0\), \(\mu \in {\mathbb {R}}\), and \(k \in {\mathbb {N}}\). Consider the function \(f:{\mathbb {R}}\rightarrow {\mathcal {A}}\) defined by \(f(t) = \frac{(A+tB)^k}{(\gamma + t\mu )^k}\). Then
Proof
For \(\mu = 0\), f is well-defined \(\forall t \in {\mathbb {R}}\). For \(\mu \ne 0\), f is well-defined on \((-\infty , -\frac{\gamma }{\mu }) \cup (-\frac{\gamma }{\mu }, \infty )\). Let \(g(t) = (A+tB)^k\), then \(g(0) = A^k\) and \(Dg(0)(1) = A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}\) by Lemma 28. Let \(h(t) = (\gamma + t \mu )^{-k}\), with \(h(0) = \gamma ^{-k}\), \(h{'}(0) = -k \gamma ^{-k-1}\mu \). By Lemma 29,
\(\square \)
We now consider the function \(f(t) = [(A+\gamma I) + t(B+\mu I)]^r\), \(r \in {\mathbb {R}}\), for \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((B+\mu I)\in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). We recall the binomial series expansion in a unital Banach algebra \({\mathcal {A}}\). Let \(A \in {\mathcal {A}}\) be such that \(||A|| < 1\). Then the following binomial series converges absolutely
Proof of Proposition 3
We consider three cases: \(r= k\), \(r=-k\) (\(k \in {\mathbb {N}}\)), and \(r \in {\mathbb {R}}\) in general. While the first two cases are special cases of the last one, they do not require the assumption \(||A||_{\mathrm{tr}} < \gamma \) and result in considerably simpler expressions for the Fréchet derivative.
(a) Consider first the function f of the form \(f(t) = (A+\gamma I + t(B+\mu I))^k\), \(k \in {\mathbb {N}}\), which is well-defined \(\forall t \in {\mathbb {R}}\). Write \(f(t) = h(g(t))\), where \(h(A) = A^k\) and \(g(t) = A+\gamma I + t(B+\mu I)\). By the chain rule and Eq. (169), we have
(b) Consider next the case \(r = -k\), \(k \in {\mathbb {N}}\). Write \(f(t) = h(g(t))\), where \(g(t) = (A+\gamma I + t(B+\mu I))^k\) and \(h(A) = A^{-1}\). Let t be sufficiently close to zero such that \(A+\gamma I + t(B+\mu I) > 0\), then g(t) is invertible and f(t) is well-defined. By the chain rule and Eq. (173), we have
(c) Consider now the general case \(r \in {\mathbb {R}}\). Let t be sufficiently close to zero such that \(A+\gamma I + t(B+\mu I) > 0\). Then \([A+\gamma I + t(B+\mu I)]^r\) is well-defined for any \(r \in {\mathbb {R}}\). We write
Let \(\epsilon = \gamma - ||A||_{\mathrm{tr}} > 0\) by the assumption that \(||A||_{\mathrm{tr}} < \gamma \). Then
for all t satisfying \(|t|\;||B|| - t\mu < \epsilon \). Let \(\varOmega \subset {\mathbb {R}}\) be an open set such that \(0 \in \varOmega \) and
Then f(t) is well-defined on \(\varOmega \) and furthermore \(||\frac{A+tB}{\gamma + t\mu }||_{\mathrm{tr}} < 1\)\(\forall t \in \varOmega \). Thus g(t) admits the following absolutely convergent binomial series expansion on \(\varOmega \)
Since each term in the series is differentiable on \(\varOmega \), it follows that g and hence f are both differentiable on \(\varOmega \). Let \(g_k(t) = \frac{(A+tB)^k}{(\gamma + t\mu )^k}\), then by Lemma 30,
It follows that
By Lemma 29, we then have
\(\square \)
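The binomial series underlying step (c) can be illustrated in the scalar case, where the generalized binomial coefficients are built up by the recursion \(\binom{r}{k+1} = \binom{r}{k}\frac{r-k}{k+1}\). This is an illustrative Python sketch of our own; the parameter values are arbitrary.

```python
def binom_series(a, r, terms=200):
    # (1 + a)^r = sum_{k>=0} C(r, k) a^k, valid for |a| < 1
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff * a ** k
        coeff *= (r - k) / (k + 1)   # C(r, k+1) = C(r, k) * (r - k)/(k + 1)
    return total

for a in (-0.5, 0.25, 0.8):
    for r in (0.5, -1.0, 2.7):
        assert abs(binom_series(a, r) - (1.0 + a) ** r) < 1e-9
```

When r is a non-negative integer, the recursion produces a zero coefficient after k = r and the series terminates, recovering the ordinary binomial theorem.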
Setting \(A = 0\) and \(\gamma = 1\) in Proposition 3, we obtain the following result.
Corollary 3
Let \((B+ \mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}}), r \in {\mathbb {R}}\) be fixed. Then there exists an open set \(\varOmega \subset {\mathbb {R}}\) containing 0 such that the function \(f(t) = [I+ t(B+ \mu I)]^r\) is differentiable on \(\varOmega \). Furthermore,
Corollary 4
Assume that \((B+ \mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\) is fixed. Let \(r \in {\mathbb {R}}\) be fixed. Then for any \((C+\nu I ) \in \mathrm{Tr}_X({\mathcal {H}})\),
Proof
Since \((C+\nu I)\) is a constant, Corollary 3 applied to \(g(t) = (I+ t(B+\mu I))^r(C+\nu I)\) gives \(Dg(0)(1) = r(B+\mu I)(C+\nu I)\). Thus by the chain rule and Lemma 2, for the function \(f(t) = \mathrm{tr}_X[(I+ t(B+ \mu I))^r(C+\nu I)]\),
Since \(f{'}(0) = Df(0)(1)\), this gives us the desired result. \(\square \)
Proof of Proposition 4
Write \(f(t) = \mathrm{tr}_X[(A+\gamma I+t(B+\mu I))^r] = \mathrm{tr}_X[g(t)]\), where \(g(t) = (A+\gamma I +t(B+\mu I))^r\). By the chain rule and Lemma 2, we have
By Proposition 3, we then have
\(\square \)
Proof of Proposition 5
Write \(f(t) = \log \mathrm{det_X}[(A+\gamma I+t(B+\mu I))^r + cI] = \log \mathrm{det_X}[g(t)]\), where \(g(t) = (A+\gamma I +t(B+\mu I))^r + cI\). By the chain rule and Lemma 4, we have
By Proposition 3, noting that the Fréchet derivatives of \((A+\gamma I +t(B+\mu I))^r + cI\) and \((A+\gamma I + t(B+\mu I))^r\) are the same, we have
\(\square \)
Proof of Theorem 25
(The case \(\alpha> 0, \beta > 0\)) Consider the case \(\alpha> 0, \beta > 0\). Let \(B = A + sA_1 + tA_2\) and \(\mu = \gamma +s\gamma _1 + t\gamma _2\). By definition of \(D^{(\alpha , \beta )}_r\) and using the product and inverse rules of \(\mathrm{det_X}\),
For \(\mu = \gamma + s\gamma _1 + t \gamma _2\), \(\mu (t=0) = \gamma + s \gamma _1\), \(\mu (s=0,t=0) = \gamma \), \(\frac{\partial \mu }{\partial s} = \gamma _1\), \(\frac{\partial \mu }{\partial t} = \gamma _2\). With \(\delta = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\), we have \(\delta (t=0) = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r}\) and \( \frac{\partial \delta }{\partial t} = -\frac{(\alpha \gamma ^r)(r\beta \mu ^{r-1}\gamma _2)}{(\alpha \gamma ^r + \beta \mu ^r)^2} \), \(\frac{\partial \delta }{\partial t}\big \vert _{t=0} = -\frac{(\alpha \gamma ^r)(r\beta (\gamma + s\gamma _1)^{r-1}\gamma _2)}{(\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r)^2} \), \( \frac{\partial }{\partial t}\log \frac{\gamma }{\mu }\big \vert _{t=0} = -\frac{\gamma _2}{\gamma + s\gamma _1}. \)
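The partial derivative \(\frac{\partial \delta }{\partial t}\) computed above can be checked against a finite-difference quotient at \(t=0\) (an illustrative Python sketch of our own, with arbitrary parameter values):

```python
alpha, beta, r = 0.6, 0.9, 1.7
gamma, gamma1, gamma2, s = 1.3, 0.4, -0.25, 0.2

def delta(t):
    # delta = alpha gamma^r / (alpha gamma^r + beta mu^r), mu = gamma + s gamma1 + t gamma2
    mu = gamma + s * gamma1 + t * gamma2
    return alpha * gamma ** r / (alpha * gamma ** r + beta * mu ** r)

mu0 = gamma + s * gamma1
# d(delta)/dt at t=0: -(alpha gamma^r)(r beta mu0^(r-1) gamma2)/(alpha gamma^r + beta mu0^r)^2
predicted = -(alpha * gamma ** r) * (r * beta * mu0 ** (r - 1) * gamma2) \
            / (alpha * gamma ** r + beta * mu0 ** r) ** 2

h = 1e-6
fd = (delta(h) - delta(-h)) / (2.0 * h)   # central difference
assert abs(fd - predicted) < 1e-8
```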
(a) Consider the first term in Eq. (185); we have
Differentiating with respect to s then gives
(b) Consider the second term in Eq. (185). With \(\nu = \frac{\gamma }{\mu }\), we write
With \(B+\mu I = (A+\gamma I) + s(A_1 + \gamma _1I) + t(A_2 + \gamma _2I)\), we have
where \((Z_1 + \nu _1 I) = (A+\gamma I)^{-1/2}(A_1+\gamma _1I)(A+\gamma I)^{-1/2}\), \((Z_2 + \nu _2I) = (A+\gamma I)^{-1/2}(A_2+\gamma _2I)(A+\gamma I)^{-1/2}\), \(\nu _1 = \frac{\gamma _1}{\gamma }\), \(\nu _2 = \frac{\gamma _2}{\gamma }\), \(Z_1, Z_2 \in \mathrm{Tr}({\mathcal {H}})\).
By definition, \(Z +\nu I = (A+\gamma I)(B+ \mu I)^{-1}\), with \(\nu = \frac{\gamma }{\mu }\), so that
By Proposition 5, for s sufficiently close to zero so that \(|s|\;||Z_1||_{\mathrm{tr}} < 1 + s\nu _1\),
By Lemma 2, Lemma 27, and the chain rule, differentiating the last expression with respect to s gives
(c) Consider the third term in Eq. (185). By definition, \(Z +\nu I = (A+\gamma I)(B+ \mu I)^{-1}\), with \(\nu = \frac{\gamma }{\mu }\), so that with \(B+\mu I = (A+\gamma I) + s(A_1 + \gamma _1I) + t(A_2 + \gamma _2I)\), we have
By Proposition 5, differentiating with respect to t gives
By Proposition 5, \(\frac{d}{ds}\log \mathrm{det_X}(A+\gamma I + s(A_1 + \gamma _1I))\big \vert _{s=0} = \mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)]. \) Also
Combining the last three expressions, we obtain
Combining Eqs. (186), (187), (188), we obtain
\(\square \)
Proof of Theorem 25 - Limiting cases
We now prove Theorem 25 for the case \(\alpha > 0, \beta = 0\) (the case \(\alpha =0, \beta > 0\) is then obtained by dual symmetry). It suffices to carry out the proof for \(\alpha = 1\), where
Let \(B = A + sA_1 + tA_2, \mu = \gamma + s \gamma _1 + t \gamma _2\), with \(\mu (t=0) = \gamma + s\gamma _1\), \(\mu (s=0, t=0) = \gamma \), \(\frac{\partial \mu }{\partial s} = \gamma _1\), \(\frac{\partial \mu }{\partial t} = \gamma _2\). By definition, \(\frac{\partial }{\partial t}\left( \frac{\mu }{\gamma }\right) ^r = r\frac{\gamma _2}{\gamma }\left( \frac{\mu }{\gamma }\right) ^{r-1}, \frac{\partial }{\partial t}\left( \frac{\mu }{\gamma }\right) ^r \big \vert _{t=0} = r\frac{\gamma _2}{\gamma }\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r-1}. \)
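The derivative \(\frac{\partial }{\partial t}\left( \frac{\mu }{\gamma }\right) ^r = r\frac{\gamma _2}{\gamma }\left( \frac{\mu }{\gamma }\right) ^{r-1}\) stated above can likewise be checked by a finite-difference quotient at \(t=0\) (an illustrative Python sketch of our own, with arbitrary parameter values):

```python
gamma, gamma1, gamma2, s, r = 2.0, 0.3, 0.7, 0.5, -1.4

def power_term(t):
    # (mu/gamma)^r with mu = gamma + s gamma1 + t gamma2
    mu = gamma + s * gamma1 + t * gamma2
    return (mu / gamma) ** r

mu0 = gamma + s * gamma1
# d/dt (mu/gamma)^r at t=0 equals r (gamma2/gamma) (mu0/gamma)^(r-1)
predicted = r * (gamma2 / gamma) * (mu0 / gamma) ** (r - 1)

h = 1e-6
fd = (power_term(h) - power_term(-h)) / (2.0 * h)   # central difference
assert abs(fd - predicted) < 1e-8
```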
(a) Consider the first term; we have \( \frac{\partial }{\partial t}\left[ \left( \frac{\mu }{\gamma }\right) ^r-1\right] \log \frac{\mu }{\gamma } \big \vert _{t=0} = r\frac{\gamma _2}{\gamma }\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r-1}\log \frac{\gamma + s\gamma _1}{\gamma } +\left[ \left( \frac{\gamma +s\gamma _1}{\gamma }\right) ^r-1\right] \frac{\gamma _2}{\gamma + s\gamma _1}. \) Differentiating with respect to s then gives
(b) Consider the second term. For \(B = A+sA_1 + tA_2\) and \(\mu = \gamma + s\gamma _1 + t\gamma _2\), by the definition of the power function, for any \(r \in {\mathbb {R}}\),
where \((Z_1 + \nu _1 I) = (A+\gamma I)^{-1/2}(A_1+\gamma _1I)(A+\gamma I)^{-1/2}\), \((Z_2 + \nu _2I) = (A+\gamma I)^{-1/2}(A_2+\gamma _2I)(A+\gamma I)^{-1/2}\), \(\nu _1 = \frac{\gamma _1}{\gamma }\), \(\nu _2 = \frac{\gamma _2}{\gamma }\), \(Z_1, Z_2 \in \mathrm{Tr}({\mathcal {H}})\).
For s sufficiently close to zero so that \(|s|\;||Z_1||_{\mathrm{tr}} < 1 + s\nu _1\), by Proposition 4,
By Corollary 4, differentiating with respect to s, we obtain
(c) Consider the third term. For \(B = A+sA_1 + tA_2\) and \(\mu = \gamma + s\gamma _1 + t\gamma _2\), we have by the product and inverse rules of \(\mathrm{det_X}\) that
It follows from Proposition 5 and the above calculations that
By Proposition 5, \( \frac{d}{ds}\log \mathrm{det_X}(A+ \gamma I + s(A_1 + \gamma _1I))\big \vert _{s=0} = \mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)]. \) Also
Combining the last three expressions, we obtain
Combining Eqs. (190), (191), and (192), we obtain
\(\square \)
Proof of Lemma 5
By definition of the extended trace \(\mathrm{tr}_X\) and the extended inner product \(\langle \;, \;\rangle _{\mathrm{HS_X}}\),
\(\square \)
Proof of Corollary 2
This follows from Theorem 25, Lemma 5, and the property
for any pair \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((B+\mu I) \in \mathrm{Tr}_X({\mathcal {H}})\). \(\square \)
Cite this article
Minh, H.Q. Alpha-Beta Log-Determinant Divergences Between Positive Definite Trace Class Operators. Info. Geo. 2, 101–176 (2019). https://doi.org/10.1007/s41884-019-00019-w
Keywords
- Positive definite operators
- Trace class operators
- Infinite-dimensional Log-Determinant divergences
- Alpha-Beta divergences
- Affine-invariant Riemannian distance
- Stein divergence
- Extended trace
- Extended Fredholm determinant
- Reproducing kernel Hilbert spaces
- Covariance operators