
Alpha-Beta Log-Determinant Divergences Between Positive Definite Trace Class Operators

  • Research Paper
  • Published in: Information Geometry

Abstract

This work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant divergences between symmetric, positive definite matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det divergences is highly general and contains many divergences as special cases, including the recently formulated infinite-dimensional affine-invariant Riemannian distance and the infinite-dimensional Alpha Log-Det divergences between positive definite unitized trace class operators. In particular, it includes a parametrized family of metrics between positive definite trace class operators, with the affine-invariant Riemannian distance and the square root of the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det divergences between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), we obtain closed form formulas via the corresponding Gram matrices.

Fig. 1


Notes

  1. The current formulation for Alpha-Beta Log-Det divergences can be generalized to the entire Hilbert manifold of positive definite Hilbert–Schmidt operators. This will be presented in a separate work [25].

References

  1. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29(1), 328–347 (2007)

  2. Barbaresco, F.: Information geometry of covariance matrix: Cartan-Siegel homogeneous bounded domains, Mostow/Berger fibration and Fréchet median. In: Matrix Information Geometry, pp. 199–255. Springer, New York (2013)

  3. Bhatia, R.: Positive Definite Matrices. Princeton University Press, Princeton (2007)

  4. Bhatia, R.: Matrix Analysis, vol. 169. Springer, New York (2013)

  5. Bini, D.A., Iannazzo, B.: Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra Appl. 438(4), 1700–1710 (2013)

  6. Chebbi, Z., Moakher, M.: Means of Hermitian positive-definite matrices based on the log-determinant \(\alpha \)-divergence function. Linear Algebra Appl. 436(7), 1872–1889 (2012)

  7. Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.: Jensen-Bregman LogDet divergence with application to efficient similarity search for covariance matrices. IEEE Trans. Pattern Anal. Mach. Intell. 35(9), 2161–2174 (2013)

  8. Cherian, A., Stanitsas, P., Harandi, M., Morellas, V., Papanikolopoulos, N.: Learning discriminative \(\alpha \beta \)-divergences for positive definite matrices. In: IEEE International Conference on Computer Vision (ICCV) (2017)

  9. Cichocki, A., Cruces, S., Amari, S.: Log-Determinant divergences revisited: Alpha-Beta and Gamma Log-Det divergences. Entropy 17(5), 2988–3034 (2015)

  10. Fan, K.: On a theorem of Weyl concerning eigenvalues of linear transformations: II. Proc. Natl. Acad. Sci. USA 36(1), 31 (1950)

  11. Formont, P., Ovarlez, J.P., Pascal, F.: On the use of matrix information geometry for polarimetric SAR image classification. In: Matrix Information Geometry, pp. 257–276. Springer, New York (2013)

  12. Harandi, M., Salzmann, M., Porikli, F.: Bregman divergences for infinite dimensional covariance matrices. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1003–1010 (2014)

  13. Hasegawa, H.: \(\alpha \)-divergence of the non-commutative information geometry. Rep. Math. Phys. 33(1), 87–93 (1993)

  14. Minh, H.Q.: Regularized divergences between covariance operators and Gaussian measures on Hilbert spaces. arXiv preprint arXiv:1904.05352 (2019)

  15. Jayasumana, S., Hartley, R., Salzmann, M., Hongdong, Li., Harandi, M.: Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 73–80 (2013)

  16. Jenčová, A.: Geometry of quantum states: dual connections and divergence functions. Rep. Math. Phys. 47(1), 121–138 (2001)

  17. Jost, J.: Postmodern Analysis. Springer, Berlin (1998)

  18. Kittaneh, F., Kosaki, H.: Inequalities for the Schatten p-norm V. Publ. Res. Inst. Math. Sci. 23(2), 433–443 (1987)

  19. Kulis, B., Sustik, M.A., Dhillon, I.S.: Low-rank kernel learning with Bregman matrix divergences. J. Mach. Learn. Res. 10, 341–376 (2009)

  20. Larotonda, G.: Nonpositive curvature: a geometrical approach to Hilbert-Schmidt operators. Differ. Geom. Appl. 25, 679–700 (2007)

  21. Lawson, J.D., Lim, Y.: The geometric mean, matrices, metrics, and more. Am. Math. Mon. 108(9), 797–812 (2001)

  22. Li, P., Wang, Q., Zuo, W., Zhang, L.: Log-Euclidean kernels for sparse representation and dictionary learning. In: International Conference on Computer Vision (ICCV), pp. 1601–1608 (2013)

  23. Minh, H.Q.: Affine-invariant Riemannian distance between infinite-dimensional covariance operators. In: Geometric Science of Information, pp. 30–38 (2015)

  24. Minh, H.Q.: Infinite-dimensional Log-Determinant divergences between positive definite trace class operators. Linear Algebra Appl. 528, 331–383 (2017)

  25. Minh, H.Q.: Log-Determinant divergences between positive definite Hilbert-Schmidt operators. In: Geometric Science of Information, pp. 505–513 (2017)

  26. Minh, H.Q., Murino, V.: From covariance matrices to covariance operators: data representation from finite to infinite-dimensional settings. In: Algorithmic Advances in Riemannian Geometry and Applications: For Machine Learning, Computer Vision, Statistics, and Optimization, pp. 115–143. Springer International Publishing, Cham (2016)

  27. Minh, H.Q., Murino, V.: Covariances in Computer Vision and Machine Learning. Synthesis Lectures on Computer Vision. Morgan & Claypool Publishers, San Rafael (2017)

  28. Minh, H.Q., San Biagio, M., Bazzani, L., Murino, V.: Approximate Log-Hilbert-Schmidt distances between covariance operators for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  29. Minh, H.Q., San Biagio, M., Murino, V.: Log-Hilbert-Schmidt metric between positive definite operators on Hilbert spaces. In: Advances in Neural Information Processing Systems (NIPS), pp. 388–396 (2014)

  30. Mostow, G.D.: Some new decomposition theorems for semi-simple groups. Mem. Am. Math. Soc. 14, 31–54 (1955)

  31. Ohara, A., Eguchi, S.: Geometry on positive definite matrices deformed by v-potentials and its submanifold structure. In: Geometric Theory of Information, pp. 31–55. Springer International Publishing, Cham (2014)

  32. Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. Int. J. Comput. Vis. 66(1), 41–66 (2006)

  33. Petryshyn, W.V.: Direct and iterative methods for the solution of linear operator equations in Hilbert spaces. Trans. Am. Math. Soc. 105, 136–175 (1962)

  34. Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer, New York (2015)

  35. Pigoli, D., Aston, J., Dryden, I.L., Secchi, P.: Distances and inference for covariance operators. Biometrika 101(2), 409–422 (2014)

  36. Simon, B.: Notes on infinite determinants of Hilbert space operators. Adv. Math. 24, 244–273 (1977)

  37. Sra, S.: A new metric on the manifold of kernel matrices with application to matrix geometric means. In: Advances in Neural Information Processing Systems (NIPS), pp. 144–152 (2012)

  38. Stanitsas, P., Cherian, A., Morellas, V., Papanikolopoulos, N.: Clustering positive definite matrices by learning information divergences. In: IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 1304–1312 (2017)

  39. Tuzel, O., Porikli, F., Meer, P.: Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1713–1727 (2008)


Author information

Correspondence to Hà Quang Minh.

Ethics declarations

Conflict of interest

The author declares that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proofs of Main Results

1.1 Proofs for the General Alpha-Beta Log-Determinant divergences

In this section, we prove Lemma 1, Proposition 1, and Theorems 7, 8, 9, and 10.

Proof of Lemma 1

Since any \(A \in {\mathcal {L}}({\mathcal {H}})\) commutes with the identity I, we have

$$\begin{aligned} \exp (A+\gamma I) = e^{\gamma }\exp (A) = e^{\gamma }\left( I + \sum _{j=1}^{\infty }\frac{A^j}{j!}\right) = e^{\gamma }I + e^{\gamma } \sum _{j=1}^{\infty }\frac{A^j}{j!} \in \mathrm{Tr}_X({\mathcal {H}}), \end{aligned}$$

since \(\sum _{j=1}^{\infty }\frac{A^j}{j!}\) is trace class, with \(\left\| \sum _{j=1}^{\infty }\frac{A^j}{j!}\right\| _{\mathrm{tr}} \le \sum _{j=1}^{\infty }\frac{||A||^j_{\mathrm{tr}}}{j!} = \exp (||A||_{\mathrm{tr}}) -1 < \infty .\)\(\square \)
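In finite dimensions, the two facts used in this proof can be checked directly. The following sketch (hypothetical 5×5 symmetric data standing in for a trace class operator; NumPy/SciPy assumed) verifies that \(\exp (A+\gamma I) = e^{\gamma }\exp (A)\) and the trace-norm bound \(||\exp (A) - I||_{\mathrm{tr}} \le \exp (||A||_{\mathrm{tr}}) - 1\):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical finite-dimensional stand-in for a trace class operator.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
gamma = 0.7
I = np.eye(5)

# A commutes with I, so exp(A + gamma*I) = e^gamma * exp(A).
lhs = expm(A + gamma * I)
rhs = np.exp(gamma) * expm(A)
assert np.allclose(lhs, rhs)

# Trace-norm bound from the proof: ||exp(A) - I||_tr <= exp(||A||_tr) - 1.
trace_norm = lambda X: np.linalg.svd(X, compute_uv=False).sum()
assert trace_norm(expm(A) - I) <= np.exp(trace_norm(A)) - 1
```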

Proof of Proposition 1

For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), we have \((B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} \in \mathrm{PTr}({\mathcal {H}})\) and the logarithm \(\log [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] \in \mathrm{Tr}_X({\mathcal {H}})\) is well-defined. By the discussion preceding Proposition 1,

$$\begin{aligned}&\log [(A+\gamma I)(B+\mu I)^{-1}] \\&\quad = \log [(B+\mu I)^{1/2}(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}(B+\mu I)^{-1/2}] \\&\quad = (B+\mu I)^{1/2}\log [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] (B+ \mu I)^{-1/2} \\&\quad = (B+\mu I)^{1/2}\log (\varLambda + \frac{\gamma }{\mu }I)(B+ \mu I)^{-1/2} \in \mathrm{Tr}_X({\mathcal {H}}). \end{aligned}$$

For the power function, we have

$$\begin{aligned}&[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha } = \exp (\alpha \log [(A+\gamma I)(B+\mu I)^{-1}]) \\&\quad = \exp [(B+\mu I)^{1/2}\alpha \log (\varLambda + \frac{\gamma }{\mu }I)(B+ \mu I)^{-1/2}] \\&\quad = (B+\mu I)^{1/2}\exp [\alpha \log (\varLambda + \frac{\gamma }{\mu }I)](B+ \mu I)^{-1/2} \\&\quad = (B+\mu I)^{1/2}(\varLambda + \frac{\gamma }{\mu }I)^{\alpha }(B+ \mu I)^{-1/2}. \end{aligned}$$

For the sum of two power functions, we then have

$$\begin{aligned}&\frac{\alpha [(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta [(A+\gamma I)(B+ \mu I)^{-1}]^{q}}{\alpha + \beta } \\&\quad = (B+\mu I)^{1/2}\left[ \frac{\alpha (\varLambda + \frac{\gamma }{\mu } I )^{p} + \beta (\varLambda + \frac{\gamma }{\mu } I)^{q}}{\alpha + \beta }\right] (B+\mu I)^{-1/2}. \end{aligned}$$

By Lemma 5 in [24], \(\mathrm{det_X}[C(A+\gamma I)C^{-1}] = \mathrm{det_X}(A+\gamma I)\) for any invertible \(C \in {\mathcal {L}}({\mathcal {H}})\). Thus

$$\begin{aligned}&\mathrm{det_X}\left[ \frac{\alpha [(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta [(A+\gamma I)(B+ \mu I)^{-1}]^{q}}{\alpha + \beta }\right] \\&\quad =\mathrm{det_X}\left[ \frac{\alpha (\varLambda + \frac{\gamma }{\mu } I )^{p} + \beta (\varLambda + \frac{\gamma }{\mu } I)^{q}}{\alpha + \beta }\right] . \end{aligned}$$

\(\square \)
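A finite-dimensional sanity check of the two identities driving this proof: the conjugation formula for the logarithm and the conjugation invariance of the determinant (the matrix analogue of Lemma 5 in [24]). The matrices below are hypothetical stand-ins for \(A+\gamma I\) and \(B+\mu I\):

```python
import numpy as np
from scipy.linalg import logm, sqrtm

rng = np.random.default_rng(1)
def spd(n):
    # Random symmetric positive definite matrix (hypothetical data).
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

P, Q = spd(4), spd(4)                    # stand-ins for A + gamma*I, B + mu*I
Qh = np.real(sqrtm(Q))                   # Q^{1/2}
Qih = np.linalg.inv(Qh)                  # Q^{-1/2}

# log(P Q^{-1}) = Q^{1/2} log(Q^{-1/2} P Q^{-1/2}) Q^{-1/2}
L1 = logm(P @ np.linalg.inv(Q))
L2 = Qh @ logm(Qih @ P @ Qih) @ Qih
assert np.allclose(L1, L2, atol=1e-7)

# det(C X C^{-1}) = det(X): conjugation invariance of the determinant.
C = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # invertible for this seed
X = spd(4)
assert np.isclose(np.linalg.det(C @ X @ np.linalg.inv(C)), np.linalg.det(X))
```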

Proof of Theorem 7

By definition of the power function, we have

$$\begin{aligned}&\alpha (A+ \gamma I)^p + (1-\alpha )(B+ \mu I)^q = \alpha \exp [p\log (A+\gamma I)] + (1-\alpha )\exp [q\log (B + \mu I)] \\&\quad = \alpha \exp \left[ p\log \left( \frac{A}{\gamma }+I\right) + p(\log \gamma )I\right] + (1-\alpha )\exp \left[ q\log \left( \frac{B}{\mu }+I\right) +q(\log \mu )I\right] \\&\quad = \alpha \gamma ^{p}\left( \frac{A}{\gamma }+I\right) ^p + (1-\alpha )\mu ^q\left( \frac{B}{\mu }+I\right) ^q. \end{aligned}$$

It follows that for \(\delta = \frac{\alpha \gamma ^{p}}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\), \(1-\delta =\frac{(1-\alpha ) \mu ^q}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\), we have

$$\begin{aligned}&\mathrm{det_X}[\alpha (A+\gamma I)^p + (1-\alpha )(B+\mu I)^q] \\&\quad = [\alpha \gamma ^p + (1-\alpha ) \mu ^q] \det \left[ \frac{\alpha \gamma ^{p}}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\left( \frac{A}{\gamma }+I\right) ^p \right. \\&\qquad \left. + \frac{(1-\alpha )\mu ^q}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\left( \frac{B}{\mu }+I\right) ^q\right] \\&\quad \ge [\alpha \gamma ^p + (1-\alpha ) \mu ^q]\det \left( \frac{A}{\gamma }+I\right) ^{p\delta } \det \left( \frac{B}{\mu }+I\right) ^{q(1-\delta )} \;\;\;\;\; \text {by Proposition 7 in [24]} \\&\quad \ge \gamma ^{p\alpha }\mu ^{(1-\alpha )q}\det \left( \frac{A}{\gamma }+I\right) ^{p\delta } \det \left( \frac{B}{\mu }+I\right) ^{q(1-\delta )} \\&\quad \;\;\;\;\; \text {by Ky Fan's Inequality applied to }\alpha \gamma ^p + (1-\alpha ) \mu ^q \\&\quad = \gamma ^{p(\alpha - \delta )}\mu ^{-q(\alpha -\delta )}\mathrm{det_X}(A+\gamma I)^{p\delta }\mathrm{det_X}(B+\mu I)^{q(1-\delta )} \\&\quad = \left( \frac{\gamma ^p}{\mu ^q}\right) ^{\alpha -\delta }\mathrm{det_X}(A+\gamma I)^{p\delta }\mathrm{det_X}(B+\mu I)^{q(1-\delta )}. \end{aligned}$$

For \(0<\alpha < 1\), equality happens if and only if simultaneously, we have

$$\begin{aligned} \left( \frac{A}{\gamma }+I\right) ^{p} = \left( \frac{B}{\mu }+I\right) ^{q} \;\; \text {and}\;\;\; \gamma ^p = \mu ^q \Longleftrightarrow (A+\gamma I)^p = (B+\mu I)^q. \end{aligned}$$

In particular, for \(\gamma = \mu \), the condition \(\gamma ^{p} = \mu ^{q}\) becomes

$$\begin{aligned} \gamma ^p = \gamma ^q \Longleftrightarrow \gamma ^{p-q} = 1 \Longleftrightarrow p = q \;\;\; \text {if}\;\;\; \gamma \ne 1. \end{aligned}$$

With the conditions \(\gamma = \mu \ne 1\) and \(p =q\), we then have \((\frac{A}{\gamma }+I)^{p} = (\frac{B}{\gamma }+I)^{p} \Longleftrightarrow A = B\).\(\square \)
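The key inequality in this proof (Proposition 7 in [24]) reduces, for matrices, to the log-concavity of the determinant over positive definite matrices, \(\det (\alpha X + (1-\alpha )Y) \ge (\det X)^{\alpha }(\det Y)^{1-\alpha }\). A quick numerical check with hypothetical 4×4 data:

```python
import numpy as np

rng = np.random.default_rng(2)
def spd(n):
    # Random symmetric positive definite matrix (hypothetical data).
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

X, Y = spd(4), spd(4)
for alpha in (0.2, 0.5, 0.8):
    lhs = np.linalg.det(alpha * X + (1 - alpha) * Y)
    rhs = np.linalg.det(X) ** alpha * np.linalg.det(Y) ** (1 - alpha)
    assert lhs >= rhs  # equality only when X = Y
```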

Proof of Theorem 8

Recall that we write \( (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = \varLambda + (\gamma /\mu )I \in \mathrm{PTr}({\mathcal {H}}). \) Its inverse, also in \(\mathrm{PTr}({\mathcal {H}})\), has the form \( (B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [\varLambda + (\gamma /\mu )I]^{-1} = \frac{\mu }{\gamma }I - (\frac{\mu }{\gamma })^2\varLambda (I + \frac{\mu }{\gamma }\varLambda )^{-1}. \) It follows from Corollary 1 that

$$\begin{aligned}&\mathrm{det_X}\left[ \frac{\alpha [\varLambda + (\gamma /\mu )I]^{p} + \beta [\varLambda + (\gamma /\mu )I]^{-q}}{\alpha + \beta }\right] \nonumber \\&\quad \ge \left( \frac{(\gamma /\mu )^p}{(\mu /\gamma )^q}\right) ^{\frac{\alpha }{\alpha +\beta } -\delta }\mathrm{det_X}[\varLambda + (\gamma /\mu )I]^{p\delta }\,\mathrm{det_X}[\varLambda + (\gamma /\mu )I]^{-q(1-\delta )} \nonumber \\&\quad = \left( \frac{\gamma }{\mu }\right) ^{(p+q)(\frac{\alpha }{\alpha +\beta }-\delta )} \mathrm{det_X}[\varLambda + (\gamma /\mu )I]^{p\delta }\,\mathrm{det_X}[\varLambda + (\gamma /\mu )I]^{-q(1-\delta )}, \end{aligned}$$
(129)

where \(\delta = \frac{\alpha (\frac{\gamma }{\mu })^{p}}{\alpha (\frac{\gamma }{\mu })^{p} + \beta (\frac{\mu }{\gamma })^{q}} = \frac{\alpha (\frac{\gamma }{\mu })^{p+q}}{\alpha (\frac{\gamma }{\mu })^{p+q} + \beta }\), \(1-\delta = \frac{\beta (\frac{\mu }{\gamma })^{q}}{\alpha (\frac{\gamma }{\mu })^{p} + \beta (\frac{\mu }{\gamma })^{q}} = \frac{\beta }{\alpha (\frac{\gamma }{\mu })^{p+q} + \beta }\).

For the two determinants on the right hand side of (129) to cancel each other out, we need

$$\begin{aligned} p\delta = q(1-\delta ) \Longleftrightarrow \alpha p\left( \frac{\gamma }{\mu }\right) ^{p} = \beta q \left( \frac{\mu }{\gamma }\right) ^{q} \Longleftrightarrow \alpha p \left( \frac{\gamma }{\mu }\right) ^{p+q} = \beta q. \end{aligned}$$

Assuming that this condition holds, then by the definition of \(D^{(\alpha , \beta )}_{(p,q)}\), (129) gives

$$\begin{aligned}&\left[ \left( \frac{\gamma }{\mu }\right) ^{(p+q)(\delta -\frac{\alpha }{\alpha +\beta })} \mathrm{det_X}\left( \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I)^p + \beta (\varLambda +\frac{\gamma }{\mu }I)^{-q}}{\alpha + \beta }\right) \right] \ge 1 \\&\quad \Longleftrightarrow D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+ \mu I)] \ge 0. \end{aligned}$$

In the inequality in (129), the equality sign happens if and only if

$$\begin{aligned}{}[\varLambda + (\gamma /\mu )I]^{p} = [\varLambda + (\gamma /\mu )I]^{-q} \Longleftrightarrow [\varLambda + (\gamma /\mu )I]^{p+q} = I. \end{aligned}$$

If \(p+q = 0\), then this is always true, so that \(D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+\mu I)] = 0\) for all pairs \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), which is not what we want. In fact, with \(p+q = 0\), the condition \(\alpha p \left( \frac{\gamma }{\mu }\right) ^{p+q} = \beta q\) gives \((\alpha + \beta ) p = 0 \Rightarrow p = 0 \Rightarrow q = 0\).

If \(p +q \ne 0\), since \(\varLambda + (\gamma /\mu )I > 0\), this happens if and only if

$$\begin{aligned}&\varLambda + (\gamma /\mu )I = I \Longleftrightarrow (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = I \\&\quad \Longleftrightarrow A+\gamma I = B+\mu I \Longleftrightarrow A=B\;\;\text {and}\;\;\;\gamma = \mu . \end{aligned}$$

\(\square \)
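For \(\gamma = \mu \), the cancellation condition reduces to \(\alpha p = \beta q\), and the theorem then asserts \(\mathrm{det_X}\big [(\alpha S^p + \beta S^{-q})/(\alpha + \beta )\big ] \ge 1\) for \(S = \varLambda + I\). A finite-dimensional spot check (hypothetical data; SciPy's `fractional_matrix_power` assumed available):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as fmp

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
S = M @ M.T + 0.5 * np.eye(4)    # plays the role of Lambda + (gamma/mu) I with gamma = mu

alpha, p, q = 0.6, 1.5, 0.75
beta = alpha * p / q             # enforce the cancellation condition alpha*p = beta*q

val = np.linalg.det((alpha * fmp(S, p) + beta * fmp(S, -q)) / (alpha + beta))
assert val >= 1.0                # equality would require S = I, i.e. A = B
```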

Proof of Theorem 9

Under the condition \(p+q = r\), by Theorem 8, we have \(\alpha p\left( \frac{\gamma }{\mu }\right) ^{r} = \beta (r-p) \Rightarrow p = \frac{\beta r}{\alpha \left( \frac{\gamma }{\mu }\right) ^{r} + \beta }\). Thus \(q = r - p = \frac{r \alpha \left( \frac{\gamma }{\mu }\right) ^{r}}{\alpha \left( \frac{\gamma }{\mu }\right) ^{r} + \beta }\). The equivalence of Eqs. (18) and (19) then follows from Proposition 1. \(\square \)

Proof of Theorem 10

We have

$$\begin{aligned}&\frac{\alpha (\varLambda + \frac{\gamma }{\mu }I)^p + \beta (\varLambda + \frac{\gamma }{\mu }I)^{-q}}{\alpha + \beta } = \frac{\alpha (\frac{\gamma }{\mu })^p (\frac{\mu }{\gamma }\varLambda + I)^p + \beta (\frac{\gamma }{\mu })^{-q}(\frac{\mu }{\gamma }\varLambda + I)^{-q}}{\alpha + \beta } \\&\quad = \frac{\alpha (\frac{\gamma }{\mu })^p (I+C_1) + \beta (\frac{\gamma }{\mu })^{-q} (I+C_2)}{\alpha + \beta } \\&\quad = \frac{\left[ \alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}\right] I + \left[ \alpha (\frac{\gamma }{\mu })^p C_1 + \beta (\frac{\gamma }{\mu })^{-q} C_2\right] }{\alpha + \beta } \\&\quad = \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\left[ I + \frac{\alpha (\frac{\gamma }{\mu })^p C_1 + \beta (\frac{\gamma }{\mu })^{-q} C_2}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \end{aligned}$$

where the operators \(C_1\) and \(C_2\) are given by \(C_1 = \sum _{k=1}^{\infty }\frac{p^k}{k!}\left[ \log \left( \frac{\mu }{\gamma }\varLambda + I\right) \right] ^k \in \mathrm{Tr}({\mathcal {H}})\), \(C_2 = \sum _{k=1}^{\infty }\frac{(-1)^kq^k}{k!}\left[ \log \left( \frac{\mu }{\gamma }\varLambda + I\right) \right] ^k \in \mathrm{Tr}({\mathcal {H}})\). By definition of the \(\mathrm{det_X}\) function, we then have

$$\begin{aligned}&\log \mathrm{det_X}\left[ \frac{\alpha (\varLambda + \frac{\gamma }{\mu }I)^p + \beta (\varLambda + \frac{\gamma }{\mu }I)^{-q} )}{\alpha + \beta }\right] \\&\quad = \log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) + \log \det \left[ I + \frac{\alpha (\frac{\gamma }{\mu })^p C_1 + \beta (\frac{\gamma }{\mu })^{-q} C_2}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad = \log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) +\log \det \left[ \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I)^p + \beta (\varLambda + \frac{\gamma }{\mu }I)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] . \end{aligned}$$

This, together with the definition of \(D^{(\alpha , \beta )}_{(p,q)}\), gives us the desired expression. \(\square \)

1.2 Proofs for the affine-invariant Riemannian distance

In this section, we prove Theorem 2 (part 1). In Definition 1, with \(\alpha = \beta \), \(\delta = \frac{(\frac{\gamma }{\mu })^{r}}{(\frac{\gamma }{\mu })^{r} + 1}\), we have

$$\begin{aligned}&D^{(\alpha , \alpha )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{1}{\alpha ^2}\log \left[ \left( \frac{\gamma }{\mu }\right) ^{r(\delta -\frac{1}{2})} \mathrm{det_X}\left( \frac{(\varLambda +\frac{\gamma }{\mu }I)^{r(1-\delta )} + (\varLambda +\frac{\gamma }{\mu }I)^{-r\delta }}{2}\right) \right] . \end{aligned}$$
(130)

We first need the following results.

Lemma 6

Let \(\gamma > 0\). Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Let \(\delta = \frac{\gamma ^r}{\gamma ^r+1}\). Then

$$\begin{aligned} \lim _{\alpha \rightarrow 0}\frac{r(\delta - \frac{1}{2})}{\alpha ^2} = \frac{[r{'}(0)]^2}{4}\log \gamma . \end{aligned}$$
(131)

In particular, for \(r = 2\alpha \), we have \(\lim _{\alpha \rightarrow 0}\frac{r(\delta - \frac{1}{2})}{\alpha ^2} = \log \gamma \).
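The scalar limit in Lemma 6 is easy to probe numerically; a sketch for the case \(r = 2\alpha \) (so \(r'(0) = 2\) and the limit is \(\log \gamma \)), with a hypothetical value of \(\gamma \) and a heuristic tolerance:

```python
import numpy as np

gamma = 2.5                      # hypothetical gamma > 0
def ratio(alpha):
    r = 2 * alpha                # r(alpha) = 2*alpha, so r'(0) = 2
    delta = gamma**r / (gamma**r + 1)
    return r * (delta - 0.5) / alpha**2

# As alpha -> 0 the ratio approaches [r'(0)]^2/4 * log(gamma) = log(gamma).
assert abs(ratio(1e-4) - np.log(gamma)) < 1e-6
```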

Lemma 7

Let \(\gamma > 0\) be fixed. Let \(\lambda > 0\) be fixed. Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Define \(\delta = \frac{\gamma ^r}{\gamma ^r+1},\; p = r(1-\delta ), \; q = r\delta \). Then

$$\begin{aligned} \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{\lambda ^p + \lambda ^{-q}}{2}\right) = \frac{[r{'}(0)]^2}{4}\left[ -(\log \gamma )(\log \lambda ) + \frac{1}{2}(\log \lambda )^2\right] . \end{aligned}$$
(132)

In particular, if \(\gamma = \lambda \), then \( \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{\gamma ^p + \gamma ^{-q}}{2}\right) = -\frac{[r{'}(0)]^2}{8}(\log \gamma )^2. \) If \(\gamma = 1\), then \( \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{\lambda ^{r/2} + \lambda ^{-r/2}}{2}\right) = \frac{[r{'}(0)]^2}{8}(\log \lambda )^2. \)

Lemma 8

Let \(\gamma > 0\) be fixed. Let \(\lambda \in {\mathbb {R}}\) be fixed such that \(\lambda + \gamma > 0\). Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Define \(\delta = \frac{\gamma ^r}{\gamma ^r+1},\; p = r(1-\delta ), \; q = r\delta \). Then

$$\begin{aligned} \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{(\lambda + \gamma )^p + (\lambda + \gamma )^{-q}}{\gamma ^p + \gamma ^{-q}}\right) = \frac{[r{'}(0)]^2}{8}\left[ \log \left( \frac{\lambda }{\gamma } +1\right) \right] ^2. \end{aligned}$$
(133)

In particular, if \(r = r(\alpha ) = 2\alpha \), then \(\lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{(\lambda + \gamma )^p + (\lambda + \gamma )^{-q}}{\gamma ^p + \gamma ^{-q}}\right) = \frac{1}{2}\left[ \log \left( \frac{\lambda }{\gamma } +1\right) \right] ^2\).
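Lemma 8 can likewise be checked numerically for a single eigenvalue; the sketch below uses \(r = 2\alpha \), hypothetical values of \(\gamma \) and \(\lambda \), and a heuristic tolerance:

```python
import numpy as np

gamma, lam = 1.7, 0.9            # hypothetical gamma > 0 and lambda with lam + gamma > 0
def lhs(alpha):
    r = 2 * alpha                # r(alpha) = 2*alpha, so r'(0) = 2
    delta = gamma**r / (gamma**r + 1)
    p, q = r * (1 - delta), r * delta
    return np.log(((lam + gamma)**p + (lam + gamma)**(-q))
                  / (gamma**p + gamma**(-q))) / alpha**2

# With r = 2*alpha, the limit (133) is (1/2) [log(lam/gamma + 1)]^2.
target = 0.5 * np.log(lam / gamma + 1) ** 2
assert abs(lhs(1e-3) - target) < 1e-3
```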

Lemma 9

Let \(\gamma > 0\) be fixed. Let \(\lambda \in {\mathbb {R}}\) be fixed such that \(\lambda + \gamma > 0\). Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = 0\). Define \(\delta = \frac{\gamma ^r}{\gamma ^r+1},\; p = r(1-\delta ), \; q = r\delta \). Then

$$\begin{aligned} \frac{(\lambda + \gamma )^p + (\lambda + \gamma )^{-q}}{\gamma ^p + \gamma ^{-q}} \ge 1, \;\;\; \log \left( \frac{(\lambda + \gamma )^p + (\lambda + \gamma )^{-q}}{\gamma ^p + \gamma ^{-q}}\right) \ge 0. \end{aligned}$$
(134)

Proof of Theorem 2, part 1

For \(\alpha = \beta \), we have \(\delta = \frac{(\frac{\gamma }{\mu })^r}{(\frac{\gamma }{\mu })^r+1},\; p = r(1-\delta ), \; q = r\delta \). Let \(\{\lambda _j\}_{j \in {\mathbb {N}}}\) be the eigenvalues of \(\varLambda \). By Theorem 10, we have

$$\begin{aligned}&D^{(\alpha , \alpha )}_r[(A+\gamma I), (B+ \mu I)] = \frac{r(\delta -\frac{1}{2})}{\alpha ^2}\log \left( \frac{\gamma }{\mu }\right) +\frac{1}{\alpha ^2} \log \left( \frac{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}{2}\right) \\&\qquad + \frac{1}{\alpha ^2}\log \det \left[ \frac{(\varLambda + \frac{\gamma }{\mu }I)^p + (\varLambda +\frac{\gamma }{\mu } I)^{-q}}{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad = \frac{r(\delta -\frac{1}{2})}{\alpha ^2}\log \left( \frac{\gamma }{\mu }\right) +\frac{1}{\alpha ^2} \log \left( \frac{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}{2}\right) \\&\qquad + \frac{1}{\alpha ^2}\sum _{j=1}^{\infty }\log \left( \frac{(\lambda _j + \frac{\gamma }{\mu })^p + (\lambda _j +\frac{\gamma }{\mu })^{-q}}{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}\right) . \end{aligned}$$

By Lemma 6, we have \( \lim _{\alpha \rightarrow 0}\frac{r(\delta -\frac{1}{2})}{\alpha ^2}\log \left( \frac{\gamma }{\mu }\right) = \frac{[r{'}(0)]^2}{4}\left[ \log \frac{\gamma }{\mu }\right] ^2 \).

By Lemma 7, we have \( \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left( \frac{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}{2}\right) = -\frac{[r{'}(0)]^2}{8}\left[ \log \frac{\gamma }{\mu }\right] ^2. \)

By Lemma 8, we have

$$\begin{aligned}&\lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \det \left[ \frac{(\varLambda + \frac{\gamma }{\mu }I)^p + (\varLambda +\frac{\gamma }{\mu } I)^{-q}}{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad = \lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\sum _{j=1}^{\infty }\log \left[ \frac{(\lambda _j + \frac{\gamma }{\mu })^p + (\lambda _j +\frac{\gamma }{\mu })^{-q}}{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad = \sum _{j=1}^{\infty }\lim _{\alpha \rightarrow 0}\frac{1}{\alpha ^2}\log \left[ \frac{(\lambda _j + \frac{\gamma }{\mu })^p + (\lambda _j +\frac{\gamma }{\mu })^{-q}}{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}\right] \\&\;\;\; \;\;\;\;\;\text {by Lebesgue's Monotone Convergence Theorem, since} \\&\;\;\;\;\;\;\; \; \log \left[ \frac{(\lambda _j + \frac{\gamma }{\mu })^p + (\lambda _j +\frac{\gamma }{\mu })^{-q}}{(\frac{\gamma }{\mu })^p + (\frac{\gamma }{\mu })^{-q}}\right] \ge 0 \; \forall j \in {\mathbb {N}}\;\; \text {by Lemma}~\mathrm{9} \\&\quad = \frac{[r{'}(0)]^2}{8}\sum _{j=1}^{\infty }\left[ \log \left( \lambda _j + \frac{\gamma }{\mu }\right) - \log \left( \frac{\gamma }{\mu }\right) \right] ^2 = \frac{[r{'}(0)]^2}{8}\sum _{j=1}^{\infty }\left[ \log \left( \lambda _j\frac{\mu }{\gamma }+1\right) \right] ^2. \end{aligned}$$

Summing up these three expressions, we obtain

$$\begin{aligned}&\lim _{\alpha \rightarrow 0}D^{(\alpha , \alpha )}_r[(A+\gamma I), (B+\mu I)] = \frac{[r{'}(0)]^2}{8}\left( \left[ \log \frac{\gamma }{\mu }\right] ^2 + \sum _{j=1}^{\infty }\left[ \log \left( \lambda _j\frac{\mu }{\gamma }+1\right) \right] ^2\right) \\&\quad = \frac{[r{'}(0)]^2}{8}\left( \left[ \log \frac{\gamma }{\mu }\right] ^2 + \left\| \log \left( \varLambda \frac{\mu }{\gamma }+ I\right) \right\| ^2_{\mathrm{HS}}\right) = \frac{[r{'}(0)]^2}{8}\left\| \log \left( \varLambda + \frac{\gamma }{\mu }I\right) \right\| ^2_{\mathrm{HS_X}} \\&\quad = \frac{[r{'}(0)]^2}{8}||\log [(B+\mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2}]||^2_{\mathrm{HS_X}} \\&\quad =\frac{[r{'}(0)]^2}{8} d_{\mathrm{aiHS}}^2[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$

\(\square \)
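In finite dimensions with \(\gamma = \mu \) (so \(\delta = 1/2\) and \(p = q = r/2\)), the divergence reduces to \(\frac{1}{\alpha ^2}\log \det \frac{S^{r/2} + S^{-r/2}}{2}\) with \(S = Q^{-1/2}PQ^{-1/2}\), and the theorem predicts convergence to \(\frac{1}{2}||\log S||_F^2\) for \(r = 2\alpha \). A sketch with hypothetical SPD matrices:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(4)
def spd(n):
    # Random symmetric positive definite matrix (hypothetical data).
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

P, Q = spd(4), spd(4)
Qih = np.linalg.inv(np.real(sqrtm(Q)))
lam = np.linalg.eigvalsh(Qih @ P @ Qih)       # eigenvalues of S = Q^{-1/2} P Q^{-1/2}

def D(alpha):                                  # D^{(alpha,alpha)}_{r=2 alpha}, gamma = mu case
    return np.sum(np.log((lam**alpha + lam**(-alpha)) / 2)) / alpha**2

d_ai_sq = np.sum(np.log(lam) ** 2)             # squared affine-invariant Riemannian distance
assert abs(D(1e-3) - 0.5 * d_ai_sq) < 1e-4
```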

1.3 Proofs for the Alpha Log-Determinant divergences

Proof of Theorem 13

The cases \(\alpha = 0\) and \(\alpha = 1\) follow as special cases of the results discussed at the end of Sect. 4.1. Consider now the case \(0< \alpha < 1\). We first note that

$$\begin{aligned} d^{1-2\alpha }_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]&= \frac{1}{\alpha (1-\alpha )}\log \left[ \frac{\mathrm{det_X}[\alpha (A+\gamma I) + (1-\alpha )(B+\mu I)]}{\mathrm{det_X}(A+\gamma I)^q\,\mathrm{det_X}(B+\mu I)^{1-q}}\right] \\&\quad + \frac{q-\alpha }{\alpha (1-\alpha )}\log \frac{\gamma }{\mu }, \;\;\;\text {where }q = \frac{\alpha \gamma }{\alpha \gamma + (1-\alpha )\mu }. \end{aligned}$$

From Eq. (67) we have

$$\begin{aligned}&D^{(\alpha , 1-\alpha )}_r[(A+\gamma I), (B+\mu I)] \\&\quad = \frac{1}{\alpha (1-\alpha )}\log [(\frac{\gamma }{\mu })^{r(\delta -\alpha )} \mathrm{det_X}({\alpha (\varLambda +\frac{\gamma }{\mu }I)^{r(1-\delta )} + (1-\alpha )(\varLambda +\frac{\gamma }{\mu }I)^{-r\delta }})] \\&\quad = \frac{r(\delta -\alpha )}{\alpha (1-\alpha )}\log (\frac{\gamma }{\mu }) + \frac{1}{\alpha (1-\alpha )}\log \mathrm{det_X}[{\alpha (\varLambda +\frac{\gamma }{\mu }I)^{r(1-\delta )} + (1-\alpha )(\varLambda +\frac{\gamma }{\mu }I)^{-r\delta }}]. \end{aligned}$$

By Proposition 1, we have

$$\begin{aligned}&\mathrm{det_X}[{\alpha (\varLambda +\frac{\gamma }{\mu }I)^{r(1-\delta )} + (1-\alpha )(\varLambda +\frac{\gamma }{\mu }I)^{-r\delta }}] \\&\quad = \mathrm{det_X}\left[ \alpha [(A+\gamma I)(B+\mu I)^{-1}]^{r(1-\delta )} + (1-\alpha )[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta }\right] \\&\quad = \mathrm{det_X}[(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta }\mathrm{det_X}[\alpha [(A+\gamma I)(B+\mu I)^{-1}]^{r} + (1-\alpha )I]. \end{aligned}$$

In particular, for \(r=1\), we have

$$\begin{aligned}&\mathrm{det_X}[\alpha [(A+\gamma I)(B+\mu I)^{-1}] + (1-\alpha )I] \\&\quad = \frac{\mathrm{det_X}[\alpha (A+\gamma I) + (1-\alpha )(B+\mu I)]}{\mathrm{det_X}(B+\mu I)}. \end{aligned}$$

Thus it follows that

$$\begin{aligned}&\mathrm{det_X}[{\alpha (\varLambda +\frac{\gamma }{\mu }I)^{(1-\delta )} + (1-\alpha )(\varLambda +\frac{\gamma }{\mu }I)^{-\delta }}]\\&\quad = \frac{\mathrm{det_X}[\alpha (A+\gamma I) + (1-\alpha )(B+\mu I)]}{\mathrm{det_X}(A+\gamma I)^{\delta }\mathrm{det_X}(B+\mu I)^{1-\delta }}. \end{aligned}$$

For \(r=1\), in Eq. (67), \(\delta = \delta (r=1) = \frac{\alpha \gamma }{\alpha \gamma + (1-\alpha )\mu }\). Combining all of these expressions and comparing with \(d^{1-2\alpha }_{\mathrm{logdet}}\), we obtain the first desired statement. The case \(r=-1\) is proved similarly. \(\square \)
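The \(r=1\) determinant identity used above has an immediate matrix analogue, \(\det [\alpha PQ^{-1} + (1-\alpha )I] = \det [\alpha P + (1-\alpha )Q]/\det Q\). A quick check on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(5)
def spd(n):
    # Random symmetric positive definite matrix (hypothetical data).
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

P, Q = spd(4), spd(4)       # stand-ins for A + gamma*I, B + mu*I
alpha = 0.3
lhs = np.linalg.det(alpha * P @ np.linalg.inv(Q) + (1 - alpha) * np.eye(4))
rhs = np.linalg.det(alpha * P + (1 - alpha) * Q) / np.linalg.det(Q)
assert np.isclose(lhs, rhs)
```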

1.4 Proofs for the other limiting cases

In this section, we prove Theorems 11 and 12. We need the following results.

Lemma 10

Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) be such that \(A+ I > 0\). Then \(\forall \alpha \in {\mathbb {R}}\), \((A+I)^{\alpha }\) is well defined and \((A+I)^{\alpha } - I\in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\). Equivalently, let \(\{\lambda _k\}_{k\in {\mathbb {N}}}\) be the eigenvalues of A, then

$$\begin{aligned} \mathrm{tr}[(A+I)^{\alpha } - I] = \sum _{k=1}^{\infty }[(\lambda _k + 1)^{\alpha } - 1] \;\;\; \text {has a finite value}. \end{aligned}$$
(135)

Proof of Lemma 10

By Lemma 3 in [24], if \(A \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) and \(A+I > 0\), then \(\log (A+I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\). By definition of the power function, we have

$$\begin{aligned} (A+I)^{\alpha } = \exp [\alpha \log (A+I)] = I + \sum _{j=1}^{\infty }\frac{\alpha ^j}{j!}[\log (A+I)]^j. \end{aligned}$$

Since \(\mathrm{Tr}({\mathcal {H}})\) is a Banach algebra under the trace norm, we have

$$\begin{aligned} ||(A+I)^{\alpha }-I||_{\mathrm{tr}}&= \left\| \sum _{j=1}^{\infty }\frac{\alpha ^j}{j!}[\log (A+I)]^j\right\| _{\mathrm{tr}} \le \sum _{j=1}^{\infty }\frac{|\alpha |^j}{j!}||\log (A+I)||^j_{\mathrm{tr}} \\&= \exp (|\alpha |\;||\log (A+I)||_{\mathrm{tr}}) -1 < \infty . \end{aligned}$$

Thus \((A+I)^{\alpha } - I \in \mathrm{Tr}({\mathcal {H}})\). The equivalent statement is then obvious. \(\square \)
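A finite-dimensional numerical sketch of Lemma 10 (our check, not part of the paper) can be made with the spectral calculus, assuming NumPy; both the trace identity (135) and the trace-norm bound from the exponential series are verified:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 6, -0.7
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = rng.uniform(-0.5, 2.0, n)          # eigenvalues of A, chosen so A + I > 0
A = Q @ np.diag(lam) @ Q.T

# (A+I)^alpha via the spectral calculus; Eq. (135)
P = Q @ np.diag((lam + 1) ** alpha) @ Q.T
assert np.isclose(np.trace(P - np.eye(n)), np.sum((lam + 1) ** alpha - 1))

# trace-norm bound coming from the exponential series
tr_norm = lambda X: np.abs(np.linalg.eigvalsh(X)).sum()
L = Q @ np.diag(np.log(lam + 1)) @ Q.T   # log(A+I)
assert tr_norm(P - np.eye(n)) <= np.exp(abs(alpha) * tr_norm(L)) - 1 + 1e-12
```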

Lemma 11

Let \({\mathcal {H}}\) be a separable Hilbert space. Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\). Then for any \(\alpha \in {\mathbb {R}}\), we have \((A+\gamma I)^{\alpha } - \gamma ^{\alpha }I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) and

$$\begin{aligned} \mathrm{tr}[(A+\gamma I)^{\alpha } - \gamma ^{\alpha }I]&= \gamma ^{\alpha }\mathrm{tr}\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] , \end{aligned}$$
(136)
$$\begin{aligned} \mathrm{tr}_X[(A+\gamma I)^{\alpha }]&= \gamma ^{\alpha }\left( 1 + \mathrm{tr}\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] \right) . \end{aligned}$$
(137)

Proof of Lemma 11

By definition of the power function, we have

$$\begin{aligned} (A+\gamma I)^{\alpha }&= \exp [\alpha \log (A+\gamma I)] = \exp \left[ (\alpha \log \gamma )I + \alpha \log \left( \frac{A}{\gamma } + I\right) \right] \\&= \gamma ^{\alpha }\left( \frac{A}{\gamma } +I\right) ^{\alpha } = \gamma ^{\alpha }\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] + \gamma ^{\alpha }I, \end{aligned}$$

where \(\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] \in \mathrm{Tr}({\mathcal {H}})\) by Lemma 10. Thus \((A+\gamma I)^{\alpha } - \gamma ^{\alpha }I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) and

$$\begin{aligned} \mathrm{tr}[(A+\gamma I)^{\alpha }- \gamma ^{\alpha }I] = \gamma ^{\alpha }\mathrm{tr}\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] , \end{aligned}$$

which is the first identity. For the second identity, by definition of the extended trace

$$\begin{aligned} \mathrm{tr}_X[(A+\gamma I)^{\alpha }] = \mathrm{tr}_X([(A+\gamma I)^{\alpha }- \gamma ^{\alpha }I] + \gamma ^{\alpha }I) = \gamma ^{\alpha }\mathrm{tr}\left[ \left( \frac{A}{\gamma } +I\right) ^{\alpha } - I\right] +\gamma ^{\alpha }. \end{aligned}$$

\(\square \)
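The key algebraic step of Lemma 11, \((A+\gamma I)^{\alpha } = \gamma ^{\alpha }(\frac{A}{\gamma }+I)^{\alpha }\), together with identity (136), can be checked numerically in finite dimensions (our sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha, gamma = 5, 1.7, 0.6
I = np.eye(n)

def spow(M, a):                          # spectral power of a symmetric matrix
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** a) @ V.T

M0 = rng.standard_normal((n, n))
A = M0 @ M0.T                            # SPD, so A + gamma*I > 0

lhs = spow(A + gamma * I, alpha)
assert np.allclose(lhs, gamma ** alpha * spow(A / gamma + I, alpha))
# Eq. (136): tr[(A+gI)^a - g^a I] = g^a tr[(A/g + I)^a - I]
assert np.isclose(np.trace(lhs - gamma ** alpha * I),
                  gamma ** alpha * np.trace(spow(A / gamma + I, alpha) - I))
```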

Lemma 12

Let \((A+\gamma I), (B+ \mu I) \in \mathrm{PTr}({\mathcal {H}})\). Let \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\). Then for any \(\alpha \in {\mathbb {R}}\),

$$\begin{aligned} \mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha }&= \mathrm{tr}_X\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha }. \end{aligned}$$
(138)
$$\begin{aligned} \mathrm{det_X}[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha }&= \mathrm{det_X}\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }\right] \nonumber \\&= \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha }. \end{aligned}$$
(139)

Proof of Lemma 12

By Proposition 1, we have

$$\begin{aligned}{}[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha }&= (B+\mu I)^{1/2}\left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }(B+\mu I)^{-1/2}. \\ [(B+\mu I)^{-1}(A+\gamma I)]^{\alpha }&= (B+\mu I)^{-1/2}\left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }(B+\mu I)^{1/2}. \end{aligned}$$

By the commutativity of the \(\mathrm{tr}_X\) operation (Lemma 4 in [24]), we then have

$$\begin{aligned} \mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha } = \mathrm{tr}_X\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha }. \end{aligned}$$

Similarly, by the product property of the \(\mathrm{det_X}\) operation (Proposition 4 in [24]),

$$\begin{aligned} \mathrm{det_X}[(A+\gamma I)(B+\mu I)^{-1}]^{\alpha } = \mathrm{det_X}\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }\right] = \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]^{\alpha }. \end{aligned}$$

\(\square \)
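Lemma 12 says that the two products \((A+\gamma I)(B+\mu I)^{-1}\) and \((B+\mu I)^{-1}(A+\gamma I)\) share the positive spectrum of \(\varLambda + \frac{\gamma }{\mu }I\), so traces and determinants of their powers agree. A finite-dimensional numerical sketch (ours, assuming NumPy; in finite dimensions \(\mathrm{tr}_X\) and \(\mathrm{det_X}\) reduce to ordinary spectral sums and products up to scalar bookkeeping):

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, gamma, mu = 5, 0.9, 1.1, 0.7

def rand_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

A, B, I = rand_spd(n), rand_spd(n), np.eye(n)
w, V = np.linalg.eigh(B + mu * I)
Bhi = V @ np.diag(w ** -0.5) @ V.T                     # (B + mu*I)^{-1/2}
lam = np.linalg.eigvalsh(Bhi @ (A + gamma * I) @ Bhi)  # spectrum of Lambda + (g/m)I

for T in [(A + gamma * I) @ np.linalg.inv(B + mu * I),
          np.linalg.inv(B + mu * I) @ (A + gamma * I)]:
    ev = np.sort(np.linalg.eigvals(T).real)
    assert np.allclose(ev, np.sort(lam))               # same positive spectrum
    assert np.isclose(np.sum(ev ** alpha), np.sum(lam ** alpha))    # trace of power
    assert np.isclose(np.prod(ev ** alpha), np.prod(lam ** alpha))  # det of power
```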

Lemma 13

Assume that \(\lambda> 0, \gamma> 0, \alpha > 0\) are fixed. Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }, \;\; p = r(1-\delta ), \;\; q = r\delta \), we have

$$\begin{aligned} \lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\log \left( \frac{\alpha \lambda ^p + \beta \lambda ^{-q}}{\alpha + \beta }\right) = \frac{1}{\alpha ^2}\left( (\log \lambda )\frac{r(0)}{\gamma ^{r(0)}} + \lambda ^{-r(0)}-1\right) . \end{aligned}$$
(140)

In particular, for \(\lambda = \gamma \), \(\lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\log \left( \frac{\alpha \gamma ^p + \beta \gamma ^{-q}}{\alpha + \beta }\right) = \frac{1}{\alpha ^2}\left( [(\log \gamma ){r(0)} + 1]\gamma ^{-r(0)}-1\right) . \)
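The scalar limit (140) can be probed numerically (our sketch, not part of the paper; the particular smooth \(r(\beta )\) below is an arbitrary illustrative choice):

```python
import numpy as np

alpha, gam, lam = 0.7, 1.3, 2.1
r = lambda beta: 1.5 + beta             # any smooth r(beta); here r(0) = 1.5

def lhs(beta):
    # left-hand side of Eq. (140) before taking the limit beta -> 0
    delta = alpha * gam ** r(beta) / (alpha * gam ** r(beta) + beta)
    p, q = r(beta) * (1 - delta), r(beta) * delta
    return np.log((alpha * lam ** p + beta * lam ** (-q)) / (alpha + beta)) / (alpha * beta)

rhs = (np.log(lam) * r(0) / gam ** r(0) + lam ** (-r(0)) - 1) / alpha ** 2
assert abs(lhs(1e-6) - rhs) < 1e-4
```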

Lemma 14

Assume that \(\gamma> 0, \alpha > 0\) are fixed. Assume that \(\lambda \in {\mathbb {R}}\) is also fixed, such that \(\lambda + \gamma > 0\). Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }, \;\; p = r(1-\delta ), \;\; q = r\delta \), we have

$$\begin{aligned}&\lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\lambda +\gamma )^p + \beta (\lambda +\gamma )^{-q}}{\alpha \gamma ^p + \beta \gamma ^{-q}}\right) \nonumber \\&\quad = \frac{1}{\alpha ^2}\left[ \log \left( \frac{\lambda }{\gamma }+1\right) \frac{r(0)}{\gamma ^{r(0)}} + (\lambda +\gamma )^{-r(0)} - \gamma ^{-r(0)}\right] . \end{aligned}$$
(141)

Lemma 15

Assume that \(\gamma> 0, \alpha > 0\) are fixed. Assume that \(\lambda \in {\mathbb {R}}\) is also fixed, such that \(\lambda + \gamma > 0\). Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }, \;\; p = r(1-\delta ), \;\; q = r\delta \), we have

$$\begin{aligned} \frac{\alpha (\lambda +\gamma )^p + \beta (\lambda +\gamma )^{-q}}{\alpha \gamma ^p + \beta \gamma ^{-q}} \ge 1, \;\; \log \left( \frac{\alpha (\lambda +\gamma )^p + \beta (\lambda +\gamma )^{-q}}{\alpha \gamma ^p + \beta \gamma ^{-q}}\right) \ge 0. \end{aligned}$$
(142)

Lemma 16

Assume that \(\gamma> 0, \alpha > 0\) are fixed. Assume that \(r = r(\beta )\) is smooth. Then for \(\delta = \frac{\alpha \gamma ^{r}}{\alpha \gamma ^{r} + \beta }\),

$$\begin{aligned} \lim _{\beta \rightarrow 0}\frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta } = \frac{1}{\alpha ^2}r(0)[-\gamma ^{-r(0)} + 1]. \end{aligned}$$
(143)
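Limit (143) follows from \(\delta -\frac{\alpha }{\alpha +\beta } = \frac{\alpha \beta (\gamma ^r-1)}{(\alpha \gamma ^r+\beta )(\alpha +\beta )}\), and can likewise be checked numerically (our sketch; the smooth \(r(\beta )\) is an arbitrary illustrative choice):

```python
import numpy as np

alpha, gam = 0.7, 1.3
r = lambda beta: 1.5 - 2.0 * beta       # any smooth r(beta); here r(0) = 1.5

def lhs(beta):
    # left-hand side of Eq. (143) before taking the limit beta -> 0
    delta = alpha * gam ** r(beta) / (alpha * gam ** r(beta) + beta)
    return r(beta) * (delta - alpha / (alpha + beta)) / (alpha * beta)

rhs = r(0) * (1 - gam ** (-r(0))) / alpha ** 2
assert abs(lhs(1e-6) - rhs) < 1e-4
```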

Proof of Theorem 11

Let \(\{\lambda _j\}_{j=1}^{\infty }\) be the eigenvalues of \(\varLambda \). By Theorem 10,

$$\begin{aligned} D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)]&= \frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta }\log \left( \frac{\gamma }{\mu }\right) + \frac{1}{\alpha \beta }\log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \\&\quad + \frac{1}{\alpha \beta }\log \det \left( \frac{\alpha (\varLambda + \frac{\gamma }{\mu }I)^p + \beta (\varLambda + \frac{\gamma }{\mu } I)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right) \\&= \frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta }\log \left( \frac{\gamma }{\mu }\right) + \frac{1}{\alpha \beta }\log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \\&\quad + \frac{1}{\alpha \beta }\sum _{j=1}^{\infty }\log \left( \frac{\alpha (\lambda _j + \frac{\gamma }{\mu })^p + \beta (\lambda _j + \frac{\gamma }{\mu })^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right) , \end{aligned}$$

where \(p= p(\beta ) = r(1-\delta ) = \frac{r \beta }{\alpha (\frac{\gamma }{\mu })^{r} + \beta }\), \(q= q(\beta ) = r\delta = \frac{r \alpha (\frac{\gamma }{\mu })^{r}}{\alpha (\frac{\gamma }{\mu })^{r} + \beta }\).

For \(\alpha > 0\) fixed, as functions of \(\beta \), we have \(\lim _{\beta \rightarrow 0} p(\beta ) = 0, \lim _{\beta \rightarrow 0} q(\beta ) = r(0)\). For simplicity, in the following, we replace \(\frac{\gamma }{\mu }\) by \(\gamma \). By Lemma 13,

$$\begin{aligned} \lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta } \log \left( \frac{\alpha \gamma ^p + \beta \gamma ^{-q}}{\alpha + \beta }\right) =\frac{1}{\alpha ^2}\left( [(\log \gamma ){r(0)} + 1]\gamma ^{-r(0)}-1\right) . \end{aligned}$$

By Lemma 14,

$$\begin{aligned}&\lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\lambda _j+\gamma )^p + \beta (\lambda _j+\gamma )^{-q}}{\alpha \gamma ^p + \beta \gamma ^{-q}}\right) \\&\quad = \frac{1}{\alpha ^2}\left[ \log \left( \frac{\lambda _j}{\gamma }+1\right) \frac{r(0)}{\gamma ^{r(0)}} + (\lambda _j+\gamma )^{-r(0)} - \gamma ^{-r(0)}\right] . \end{aligned}$$

By Lemma 15, we have \(\log \left( \frac{\alpha (\lambda _j+\gamma )^p + \beta (\lambda _j+\gamma )^{-q}}{\alpha \gamma ^p + \beta \gamma ^{-q}}\right) \ge 0\) for all \(j \in {\mathbb {N}}\), so that by Lebesgue’s Monotone Convergence Theorem, we obtain

$$\begin{aligned}&\lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\sum _{j=1}^{\infty }\log \left( \frac{\alpha (\lambda _j + {\gamma })^p + \beta (\lambda _j + {\gamma })^{-q}}{\alpha {\gamma }^p + \beta {\gamma }^{-q}}\right) \\&\quad = \sum _{j=1}^{\infty }\lim _{\beta \rightarrow 0}\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\lambda _j + {\gamma })^p + \beta (\lambda _j + \gamma )^{-q}}{\alpha {\gamma }^p + \beta {\gamma }^{-q}}\right) \\&\quad = \frac{1}{\alpha ^2}\sum _{j=1}^{\infty }\left[ \log \left( \frac{\lambda _j}{\gamma }+1\right) \frac{r(0)}{\gamma ^{r(0)}} + (\lambda _j+\gamma )^{-r(0)} - \gamma ^{-r(0)}\right] . \end{aligned}$$

By Lemma 16, \( \log (\gamma )\lim _{\beta \rightarrow 0}\frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta } = \frac{1}{\alpha ^2}r(0)[-\gamma ^{-r(0)} + 1]\log (\gamma ). \) Combining all three expressions, we obtain the desired limit as the sum

$$\begin{aligned}&\frac{1}{\alpha ^2}[\gamma ^{-r(0)}+r(0)\log (\gamma ) -1] \nonumber \\&\qquad + \frac{1}{\alpha ^2}\left\{ \frac{r(0)}{\gamma ^{r(0)}}\sum _{j=1}^{\infty } \log \left( \frac{\lambda _j}{\gamma }+1\right) +\sum _{j=1}^{\infty }\left[ \frac{1}{\left( \lambda _j+\gamma \right) ^{r(0)}} - \frac{1}{\gamma ^{r(0)}}\right] \right\} . \end{aligned}$$
(144)

By Lemmas 10 and 11, we have

$$\begin{aligned}&\sum _{j=1}^{\infty }\left[ \frac{1}{(\lambda _j+\gamma )^{r(0)}} - \frac{1}{\gamma ^{r(0)}}\right] = \gamma ^{-r(0)}\sum _{j=1}^{\infty }\left[ \left( \frac{\lambda _j}{\gamma } + 1\right) ^{-r(0)} -1\right] \\&\quad = \gamma ^{-r(0)}\mathrm{tr}\left[ \left( \frac{\varLambda }{\gamma } + I\right) ^{-r(0)} - I\right] = \mathrm{tr}[(\varLambda + \gamma I)^{-r(0)} - \gamma ^{-r(0)}I]. \end{aligned}$$

Thus it follows that

$$\begin{aligned}&\gamma ^{-r(0)} - 1 + \sum _{j=1}^{\infty }\left[ \frac{1}{\left( \lambda _j+\gamma \right) ^{r(0)}} - \frac{1}{\gamma ^{r(0)}}\right] \\&\quad = \gamma ^{-r(0)} - 1 + \mathrm{tr}[(\varLambda + \gamma I)^{-r(0)} - \gamma ^{-r(0)}I] = \mathrm{tr}_X[(\varLambda + \gamma I)^{-r(0)} - I]. \end{aligned}$$

Furthermore,

$$\begin{aligned}&\frac{r(0)}{\gamma ^{r(0)}}\sum _{j=1}^{\infty } \log \left( \frac{\lambda _j}{\gamma }+1\right) = r(0)\gamma ^{-r(0)}\log \det \left( \frac{\varLambda }{\gamma }+I\right) \\&\quad = r(0)\gamma ^{-r(0)}\log \mathrm{det_X}(\varLambda + \gamma I) - r(0)\gamma ^{-r(0)}\log \gamma \\&\quad = - \gamma ^{-r(0)}\log \mathrm{det_X}(\varLambda + \gamma I)^{-r(0)} - r(0)\gamma ^{-r(0)}\log \gamma . \end{aligned}$$

Plugging the last two expressions into (144), we obtain the desired limit as

$$\begin{aligned}&\frac{1}{\alpha ^2}\left\{ r(0)(1-\gamma ^{-r(0)})\log \gamma \right\} \nonumber \\&\quad + \frac{1}{\alpha ^2}\left\{ \mathrm{tr}_X[(\varLambda + \gamma I)^{-r(0)} - I]- \gamma ^{-r(0)}\log \mathrm{det_X}(\varLambda + \gamma I)^{-r(0)} \right\} . \end{aligned}$$
(145)

We now replace \(\gamma \) by \(\frac{\gamma }{\mu }\). We have by Lemma 12,

$$\begin{aligned} \mathrm{tr}_X\left[ \left( \varLambda + \frac{\gamma }{\mu } I\right) ^{-r(0)}\right]&= \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r(0)} \\&= \mathrm{tr}_X[(A+\gamma I)^{-1}(B+ \mu I)]^{r(0)}, \\ \mathrm{det_X}\left( \varLambda + \frac{\gamma }{\mu }I\right) ^{-r(0)}&= \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]^{-r(0)} \\&= \mathrm{det_X}\left[ (A+\gamma I)^{-1}(B+\mu I)\right] ^{r(0)}. \end{aligned}$$

Then (145) becomes

$$\begin{aligned}&\frac{r(0)}{\alpha ^2}[(\frac{\mu }{\gamma })^{r(0)} -1]\log \frac{\mu }{\gamma } + \frac{1}{\alpha ^2}\mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \mu I)]^{r(0)} -I) \\&- \frac{1}{\alpha ^2}(\frac{\mu }{\gamma })^{r(0)}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \mu I)]^{r(0)}. \end{aligned}$$

\(\square \)

Proof of Theorem 12

The dual symmetry in Theorem 14 gives

$$\begin{aligned} \lim _{\alpha \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] = \lim _{\alpha \rightarrow 0}D^{(\beta , \alpha )}_r[(B+ \mu I), (A+\gamma I)]. \end{aligned}$$

The limit on the right hand side then follows from Theorem 11. \(\square \)

1.5 Proofs of the properties of the Alpha-Beta Log-Determinant divergences

In this section, we prove Theorems 14, 15, 16, 17, 18, and 19. For the case \(\alpha = \beta = 0\), we have \(D^{(0,0)}_0[(A+\gamma I), (B+\mu I)] = \frac{1}{2}d_{\mathrm{aiHS}}^2[(A+\gamma I), (B+\mu I)]\), with \(d_{\mathrm{aiHS}}\) being the affine-invariant Riemannian distance on \(\mathrm{PTr}({\mathcal {H}})\). Thus these properties are either automatic or straightforward to verify. We thus focus on the three cases \((\alpha> 0, \beta > 0)\), \((\alpha > 0, \beta = 0)\), and \((\alpha = 0, \beta > 0)\).

Proof of Theorem 14

The cases \((\alpha > 0, \beta = 0)\) and \((\alpha =0, \beta > 0)\) follow immediately from Eqs. (20) and (21). Consider now the case \(\alpha> 0, \beta > 0\). Writing \(\delta = \delta (\alpha , \beta )\) to emphasize its dependence on \(\alpha \) and \(\beta \), we have \(\delta (\alpha , \beta ) = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\) in \(D^{(\alpha ,\beta )}_r[(A+\gamma I), (B+ \mu I)]\). Then for \(D^{(\beta , \alpha )}_r[(B+\mu I), (A+\gamma I)]\), we have \(\delta (\beta , \alpha ) = \frac{\beta \mu ^r}{\alpha \gamma ^r + \beta \mu ^r} = 1 -\delta (\alpha , \beta )\), \(1-\delta (\beta , \alpha ) = \delta (\alpha , \beta )\), and \(\delta (\beta , \alpha ) - \frac{\beta }{\alpha + \beta } = 1 -\delta (\alpha , \beta ) - \frac{\beta }{\alpha + \beta } = -\left( \delta (\alpha , \beta ) - \frac{\alpha }{\alpha + \beta }\right)\). By Definition 1, we have

$$\begin{aligned}&D^{(\beta , \alpha )}_r[(B+ \mu I),(A+\gamma I)] = \frac{1}{\alpha \beta }\log (\frac{\mu }{\gamma })^{r(\delta (\beta , \alpha )-\frac{\beta }{\alpha +\beta })} \\&\qquad + \frac{1}{\alpha \beta }\log \mathrm{det_X}\\&\quad \left( \frac{\beta [(B+\mu I)(A+\gamma I)^{-1}]^{r(1-\delta (\beta , \alpha ))} + \alpha [(B+\mu I)(A+\gamma I)^{-1}]^{-r\delta (\beta , \alpha )}}{\alpha + \beta }\right) \\&\quad = \frac{1}{\alpha \beta }\log (\frac{\gamma }{\mu })^{r(\delta (\alpha , \beta )-\frac{\alpha }{\alpha +\beta })} \\&\qquad + \frac{1}{\alpha \beta }\log \mathrm{det_X}\\&\quad \left( \frac{\beta [(A+\gamma I)(B+\mu I)^{-1}]^{-r\delta (\alpha , \beta )} + \alpha [(A+\gamma I)(B+\mu I)^{-1}]^{r(1-\delta (\alpha , \beta ))}}{\alpha + \beta }\right) \\&\quad = D^{(\alpha , \beta )}_r[(A+ \gamma I),(B+\mu I)]. \end{aligned}$$

\(\square \)
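Since Theorem 10 expresses \(D^{(\alpha ,\beta )}_r\) through the spectrum of \(\varLambda + \frac{\gamma }{\mu }I\), the dual symmetry can be checked numerically in finite dimensions. The sketch below is ours, not part of the paper: it models \(\mathrm{det_X}\) Fredholm-style by extracting the scalar part \(s\), i.e. \(\mathrm{det_X}(sI + \text{trace class}) = s\prod _i(\lambda _i/s)\), and the helper names (`D`, `rand_spd`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_spd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

def D(alpha, beta, r, A, gamma, B, mu):
    # Eigenvalue form of Theorem 10; lam holds the eigenvalues of
    # Lambda + (gamma/mu)*I = (B+mu I)^{-1/2} (A+gamma I) (B+mu I)^{-1/2}.
    I = np.eye(A.shape[0])
    delta = alpha * gamma ** r / (alpha * gamma ** r + beta * mu ** r)
    p, q = r * (1 - delta), r * delta
    w, V = np.linalg.eigh(B + mu * I)
    Bhi = V @ np.diag(w ** -0.5) @ V.T
    lam = np.linalg.eigvalsh(Bhi @ (A + gamma * I) @ Bhi)
    s = (alpha * (gamma / mu) ** p + beta * (gamma / mu) ** (-q)) / (alpha + beta)
    vals = (alpha * lam ** p + beta * lam ** (-q)) / (alpha + beta)
    term1 = r * (delta - alpha / (alpha + beta)) / (alpha * beta) * np.log(gamma / mu)
    logdetX = np.log(s) + np.sum(np.log(vals / s))   # det_X with scalar part s
    return term1 + logdetX / (alpha * beta)

A, B = rand_spd(5), rand_spd(5)
alpha, beta, r, gamma, mu = 0.6, 0.4, 1.2, 1.1, 0.9
assert np.isclose(D(alpha, beta, r, A, gamma, B, mu),
                  D(beta, alpha, r, B, mu, A, gamma))          # dual symmetry
assert np.isclose(D(alpha, beta, r, A, gamma, A, gamma), 0.0)  # D(X, X) = 0
```

Tracking the scalar part separately is exactly what makes the formula well defined in infinite dimensions, where the ordinary determinant of \(sI + \text{trace class}\) need not exist.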

Proof of Theorem 15

We prove Eq. (77); the proof for Eq. (78) is similar. We write \((A+ \gamma I)^{-1} = \frac{1}{\gamma }I - \frac{A}{\gamma }(A+\gamma I)^{-1}\), \((B+\mu I)^{-1} = \frac{1}{\mu }I - \frac{B}{\mu }(B+\mu I)^{-1}\), and \((B+\mu I)^{1/2}(A+\gamma I)^{-1}(B+\mu I)^{1/2} = [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]^{-1} \). Consider the case \(\alpha> 0, \beta > 0\). By Definition 1,

$$\begin{aligned}&D^{(\alpha , \beta )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] \\= & {} \frac{1}{\alpha \beta }\log \left( \frac{1/\gamma }{1/\mu }\right) ^{r(\delta _2-\frac{\alpha }{\alpha +\beta })} + \frac{1}{\alpha \beta }\log \mathrm{det_X}\left( \frac{\alpha (\varLambda + \frac{\gamma }{\mu }I)^{-r(1-\delta _2)} + \beta (\varLambda + \frac{\gamma }{\mu }I)^{r\delta _2}}{\alpha + \beta }\right) \end{aligned}$$

where \(\delta _2 = \frac{\alpha (1/\gamma )^r}{\alpha (1/\gamma )^r + \beta (1/\mu )^r} = \frac{\alpha \mu ^r}{\alpha \mu ^r + \beta \gamma ^r} = \delta (-r)\). Thus

$$\begin{aligned} D^{(\alpha , \beta )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(\alpha , \beta )}_{-r}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$

Consider the case \(\alpha = 0, \beta > 0\) (the case \(\alpha > 0, \beta = 0\) then follows by dual symmetry). We have

$$\begin{aligned}&D^{(0, \beta )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = \frac{r}{\beta ^2}[(\frac{1/\gamma }{1/\mu })^{r} -1]\log \frac{1/\gamma }{1/\mu } \\&\qquad +\frac{1}{\beta ^2}\mathrm{tr}_X([(B+\mu I)(A+ \gamma I)^{-1}]^{r} -I) - \frac{1}{\beta ^2}(\frac{1/\gamma }{1/\mu })^{r}\log \mathrm{det_X}[(B+\mu I)(A+ \gamma I)^{-1}]^{r} \\&\quad = -\frac{r}{\beta ^2}[(\frac{\gamma }{\mu })^{-r} -1]\log \frac{\gamma }{\mu } +\frac{1}{\beta ^2}\mathrm{tr}_X([(A+ \gamma I)(B+\mu I)^{-1}]^{-r} -I) \\&\qquad - \frac{1}{\beta ^2}(\frac{\gamma }{\mu })^{-r}\log \mathrm{det_X}[(A+ \gamma I)(B+\mu I)^{-1}]^{-r}. \end{aligned}$$

By Lemma 12, we have

$$\begin{aligned}&\mathrm{tr}_X[(A+\gamma I)(B+\mu I)^{-1}]^{-r} = \mathrm{tr}_X\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{-r}\right] = \mathrm{tr}_X[(B+\mu I)^{-1}(A+\gamma I)]^{-r}, \\&\mathrm{det_X}[(A+\gamma I)(B+\mu I)^{-1}]^{-r} = \mathrm{det_X}\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{-r}\right] = \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]^{-r}. \end{aligned}$$

Thus it follows that

$$\begin{aligned}&D^{(0, \beta )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] \\&\quad = -\frac{r}{\beta ^2}[(\frac{\gamma }{\mu })^{-r} -1]\log \frac{\gamma }{\mu } +\frac{1}{\beta ^2}\mathrm{tr}_X([(B+\mu I)^{-1}(A+\gamma I)]^{-r} -I) \\&\qquad - \frac{1}{\beta ^2}\left( \frac{\gamma }{\mu }\right) ^{-r}\log \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]^{-r} \\&\quad = D^{(0,\beta )}_{-r}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$

\(\square \)

Proof of Theorem 16 - Affine-invariance

For \((A + \gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((C+\nu I) \in \mathrm{Tr}_X({\mathcal {H}})\), \(\nu \ne 0\),

$$\begin{aligned}&(C+\nu I)(A+\gamma I)(C+\nu I)^{*} \\&\quad = CAC^{*} + \nu (CA+AC^{*}) + \nu ^2A + \gamma CC^{*} + \gamma \nu (C + C^{*}) + \gamma \nu ^2 I \in \mathrm{Tr}_X({\mathcal {H}}). \end{aligned}$$

Since \((C+\nu I)\) is invertible, \((C+\nu I)(A+\gamma I)(C+\nu I)^{*}\) is also invertible, with inverse \([(C+\nu I)^{*}]^{-1}(A+\gamma I)^{-1}(C+\nu I)^{-1}\). Furthermore, \(\forall x \in {\mathcal {H}}\),

$$\begin{aligned} \langle x, (C+\nu I)(A+\gamma I)(C+\nu I)^{*}x\rangle&= \langle (C+\nu I)^{*}x, (A+\gamma I)(C+ \nu I)^{*}x\rangle \\&\ge M_A||(C+\nu I)^{*}x||^2 \ge 0, \end{aligned}$$

with equality if and only if \((C+\nu I)^{*}x = 0 \Longleftrightarrow x = 0\). Thus \((C+\nu I)(A+\gamma I)(C+\nu I)^{*}\) is strictly positive; together with its invertibility, this shows that it is a positive definite operator. Hence \((C+\nu I)(A+\gamma I)(C+\nu I)^{*} \in \mathrm{PTr}({\mathcal {H}})\). For \((A+\gamma I), (B + \mu I) \in \mathrm{PTr}({\mathcal {H}})\), we then have

$$\begin{aligned}&[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1} \\&\quad = (C+\nu I)[(A+\gamma I)(B+\mu I)^{-1}](C+\nu I)^{-1}. \end{aligned}$$

Then for any \(p \in {\mathbb {R}}\), we have

$$\begin{aligned}&([(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1})^p \\&\quad = (C+\nu I)[(A+\gamma I)(B+\mu I)^{-1}]^p(C+\nu I)^{-1}. \end{aligned}$$

Thus for any \(a, b > 0\) and any \(p, q \in {\mathbb {R}}\),

$$\begin{aligned}&a([(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1})^p \\&\quad +b ([(C+\nu I)(A+\gamma I)(C+\nu I)^{*}][(C+\nu I)(B+\mu I)(C+\nu I)^{*}]^{-1})^q \\&= (C+\nu I)(a[(A+\gamma I)(B+\mu I)^{-1}]^p + b [(A+\gamma I)(B+\mu I)^{-1}]^q)(C+\nu I)^{-1}. \end{aligned}$$

By the definition of \(D^{(\alpha , \beta )}_r\) and the following invariances of the extended Fredholm determinant \(\mathrm{det_X}\) as well as of the extended trace operation \(\mathrm{tr}_X\), namely,

$$\begin{aligned} \mathrm{det_X}[C(A+\gamma I)C^{-1}] = \mathrm{det_X}[(A+\gamma I)], \;\;\; \mathrm{tr}_X[C(A+\gamma I)C^{-1}] = \mathrm{tr}_X[(A+\gamma I)], \end{aligned}$$

for \(A + \gamma I \in \mathrm{Tr}_X({\mathcal {H}})\), \(\gamma \ne 0\), and \(C \in {\mathcal {L}}({\mathcal {H}})\) invertible (Lemma 5 in [24]), we obtain

$$\begin{aligned}&D^{(\alpha , \beta )}_r[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}, (C+\nu I)(B+ \mu I)(C+\nu I)^{*}] \nonumber \\&=D^{(\alpha , \beta )}_r[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$

\(\square \)

Proof of Theorem 17 - Invariance under unitary transformations

The proof of this theorem is similar to that of Theorem 16, using the fact that \(C^{*} = C^{-1}\) and the properties

$$\begin{aligned} \mathrm{det_X}[C(A+\gamma I)C^{-1}] = \mathrm{det_X}[(A+\gamma I)], \;\;\; \mathrm{tr}_X[C(A+\gamma I)C^{-1}] = \mathrm{tr}_X[(A+\gamma I)], \end{aligned}$$

of the operations \(\mathrm{det_X}\) and \(\mathrm{tr}_X\).\(\square \)

Proof of Theorem 18

For the case \(\alpha > 0\), \(\beta > 0\), this follows immediately from Definition 1. For the case \(\alpha > 0, \beta = 0\) (the case \(\alpha =0, \beta > 0\) is entirely similar), by Definition 2 and Lemma 12, we have

$$\begin{aligned}&D^{(\alpha , 0)}_r[(A+\gamma I), (B+\mu I)] = \frac{r}{\alpha ^2}\left[ (\frac{\mu }{\gamma })^{r} -1\right] \log \left( \frac{\mu }{\gamma }\right) \\&\qquad +\frac{1}{\alpha ^2}\mathrm{tr}_X\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{-r} -I\right] - \frac{1}{\alpha ^2}\left( \frac{\mu }{\gamma }\right) ^{r}\log \mathrm{det_X}\left( \varLambda + \frac{\gamma }{\mu }I\right) ^{-r} \\&\quad = D^{(\alpha , 0)}_r\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) , I\right] . \end{aligned}$$

\(\square \)

Proof of Theorem 19

We first note that \((\varLambda + \frac{\gamma }{\mu } I)^{\omega } = (\frac{\gamma }{\mu })^{\omega }(\frac{\mu }{\gamma }\varLambda + I)^{\omega }\). For \(\alpha> 0, \beta > 0\), this follows immediately from Definition 1. For \(\alpha > 0, \beta = 0\), by Definition 2 and Lemma 12,

$$\begin{aligned}&D^{(\omega \alpha , 0)}_{\omega r}[(A+\gamma I), (B+\mu I)] = \frac{r}{\omega ^2\alpha ^2}[(\frac{\mu }{\gamma })^{\omega r} -1]\log (\frac{\mu }{\gamma })^{\omega } \\&\quad +\frac{1}{\omega ^2\alpha ^2}\mathrm{tr}_X[(\varLambda + \frac{\gamma }{\mu }I)^{-\omega r} -I] - \frac{1}{\omega ^2\alpha ^2}(\frac{\mu }{\gamma })^{\omega r}\log \mathrm{det_X}(\varLambda + \frac{\gamma }{\mu }I)^{-\omega r} \\&= \frac{1}{\omega ^2}D^{(\alpha , 0)}_r[(\varLambda + \frac{\gamma }{\mu }I)^{\omega }, I]. \end{aligned}$$

The case \(\alpha =0, \beta > 0\) is entirely similar.

\(\square \)

1.6 Proofs of Theorems 1, 2, and 3

We are now ready to provide the proofs for Theorems 1, 2, and 3. We first need the following result.

Lemma 17

  1. (i)

Let \(r \ne 0\) be fixed. The function \(f(x) = x^r-1 - r\log (x)\) for \(x > 0\) has a unique global minimum \(f_{\min } = f(1) = 0\). In other words, \(f(x) \ge 0\) \(\forall x > 0\), with equality if and only if \(x = 1\).

  2. (ii)

Let \(\nu > 0\) and \(r \ne 0 \) be fixed. The function \(g(x) = (\frac{x}{\nu }+1)^r - 1 - r \log (\frac{x}{\nu }+1)\) for \(x > - \nu \) has a unique global minimum \(g_{\min } = g(0) = 0\). In other words, \(g(x) \ge 0\) \(\forall x > -\nu \), with equality if and only if \(x = 0\).
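Part (i) of Lemma 17 can be visualized with a quick grid check (our sketch, assuming NumPy); part (ii) is the same function precomposed with \(x \mapsto \frac{x}{\nu }+1\):

```python
import numpy as np

x = np.linspace(0.01, 10, 2000)
for r in [-1.3, 0.5, 2.0]:
    f = x ** r - 1 - r * np.log(x)
    assert np.all(f >= -1e-12)                          # f >= 0 on the grid
    assert np.isclose(x[np.argmin(f)], 1.0, atol=0.01)  # minimum at x = 1
```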

Proof of Theorem 1 - Positivity

For the case \(\alpha> 0, \beta > 0\), this is a special case of Theorem 8, with \(p +q = r\). Consider now the case \(\alpha = 0, \beta > 0\) (the case \(\alpha > 0, \beta =0\) then follows by dual symmetry). It suffices to consider \(D^{(0,1)}_r\). Recall that \(\varLambda + \nu I = (B+\mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2}\), where \(\nu = \frac{\gamma }{\mu }\). Then, since \(\mathrm{det_X}[(B+\mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2}] = \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]\) and \(\mathrm{tr}_X[(B + \mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}] = \mathrm{tr}_X[(B+\mu I)^{-1}(A + \gamma I)]\), we have

$$\begin{aligned} D^{(0,1)}_r[(A+\gamma I), (B+ \mu I)]&= {r}(\nu ^{r} -1)\log \nu +\mathrm{tr}_X[(\varLambda + \nu I)^{r} -I] \\&\quad - \nu ^{r}\log \mathrm{det_X}(\varLambda + \nu I)^{r}. \end{aligned}$$

By Lemma 11, \( \mathrm{tr}_X[(\varLambda + \nu I)^{r} -I] = \nu ^r - 1 +\nu ^r\mathrm{tr}\left[ \left( \frac{\varLambda }{\nu } + I\right) ^r - I\right] . \) Also

$$\begin{aligned} \log \mathrm{det_X}(\varLambda + \nu I)^{r} = \log \left[ \nu ^r\det \left( \frac{\varLambda }{\nu } + I\right) ^r\right] = r\log \det \left( \frac{\varLambda }{\nu } +I\right) + r\log \nu . \end{aligned}$$

Thus we have

$$\begin{aligned} \begin{aligned}&D^{(0,1)}_r[(A+\gamma I), (B+ \mu I)] \\&\quad = \nu ^r - 1 - r\log \nu + \nu ^r\left( \mathrm {tr}\left[ \left( \frac{\varLambda }{\nu } + I\right) ^r - I\right] \right. \left. -\, r\log \det \left( \frac{\varLambda }{\nu } +I\right) \right) \\&\quad = \nu ^r - 1 - r\log \nu + \nu ^r\sum _{k=1}^{\infty }\left[ \left( \frac{\lambda _k}{\nu }+1\right) ^r - 1 - r\log \left( \frac{\lambda _k}{\nu } + 1\right) \right] . \end{aligned} \end{aligned}$$

By the first part of Lemma 17, we have for all \(\nu > 0\), \(\nu ^r - 1 - r\log \nu \ge 0\), with equality if and only if \(\nu =1\). By the second part of Lemma 17, we have \(\forall k \in {\mathbb {N}}\), \(\left( \frac{\lambda _k}{\nu }+1\right) ^r - 1 - r\log \left( \frac{\lambda _k}{\nu } + 1\right) \ge 0\), with equality if and only if \(\lambda _k = 0\). Combining these two inequalities, we obtain

$$\begin{aligned}&D^{(0,1)}_r[(A+\gamma I), (B+ \mu I)] \ge 0, \end{aligned}$$

with equality if and only if \(\nu = \frac{\gamma }{\mu } = 1\) and \(\lambda _k = 0\) \(\forall k \in {\mathbb {N}}\Longleftrightarrow \varLambda = 0\), that is, if and only if \((B+\mu I)^{-1/2}(A + \gamma I)(B+ \mu I)^{-1/2} = I \Longleftrightarrow A+\gamma I = B+\mu I \Longleftrightarrow A = B\) and \(\gamma = \mu \). \(\square \)
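The eigenvalue expansion of \(D^{(0,1)}_r\) used in the proof lends itself to a finite-dimensional numerical check of positivity (our sketch, assuming NumPy; the eigenvalues of \(\varLambda \) are chosen at random subject to \(\frac{\lambda _k}{\nu }+1 > 0\)):

```python
import numpy as np

rng = np.random.default_rng(5)
r, nu = 1.4, 0.8                          # nu = gamma/mu
lam = rng.uniform(-0.5, 3.0, 8)           # eigenvalues of Lambda; lam/nu + 1 > 0

f = lambda x: x ** r - 1 - r * np.log(x)  # the function of Lemma 17
D01 = f(nu) + nu ** r * np.sum(f(lam / nu + 1))
assert D01 >= 0

# D vanishes exactly when nu = 1 and Lambda = 0 (i.e. A = B, gamma = mu)
assert np.isclose(f(1.0) + 1.0 ** r * np.sum(f(np.zeros(8) + 1)), 0.0)
```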

Proof of Theorem 2 - Special cases I

The first statement of the theorem was proved in Sect. A.2. The second statement is the content of Theorem 13. \(\square \)

Proof of Theorem 3 - Special cases II

This theorem follows from Theorem 2, as well as the symmetry of \(D^{(\alpha , \alpha )}_r\) as proved in Theorem 14. \(\square \)

1.7 Proofs for the Divergences between RKHS covariance operators

In this section, we prove Theorems 26, 27, 28, and 29. We first need the following preliminary results.

Lemma 18

Let \({\mathcal {H}}_1, {\mathcal {H}}_2\) be separable Hilbert spaces. Let \(A:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2,B:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_1\) be compact operators such that \(AB: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2, BA:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_1\) are trace class. Let \(\alpha , \beta > 0\) be fixed. Then

$$\begin{aligned}&\det \left[ \frac{\alpha (AB + I_{{\mathcal {H}}_2})^p + \beta (AB + I_{{\mathcal {H}}_2})^q}{\alpha + \beta }\right] \nonumber \\&\quad = \det \left[ \frac{\alpha (BA + I_{{\mathcal {H}}_1})^{p} + \beta (BA + I_{{\mathcal {H}}_1})^q}{\alpha + \beta }\right] \forall p,q \in {\mathbb {R}}. \end{aligned}$$
(146)

Proof

Since the nonzero eigenvalues of \(AB:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) and \(BA:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_1\) are the same, we have

$$\begin{aligned} \det [(AB + I_{{\mathcal {H}}_2})^p] = \det [(BA+I_{{\mathcal {H}}_1})^p], \;\;\;\forall p \in {\mathbb {R}}. \end{aligned}$$

For any \(p,q \in {\mathbb {R}}\),

$$\begin{aligned}&\det \left[ \frac{\alpha (AB + I_{{\mathcal {H}}_2})^p + \beta (AB + I_{{\mathcal {H}}_2})^q}{\alpha + \beta }\right] = \det \left[ \frac{\alpha (BA + I_{{\mathcal {H}}_1})^{p} + \beta (BA + I_{{\mathcal {H}}_1})^q}{\alpha + \beta }\right] . \end{aligned}$$

In the above equality, we have used the fact that a zero eigenvalue of AB or of BA corresponds to an eigenvalue equal to 1 for \(\frac{\alpha (AB + I_{{\mathcal {H}}_2})^{p} + \beta (AB + I_{{\mathcal {H}}_2})^q}{\alpha + \beta }:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) and \(\frac{\alpha (BA + I_{{\mathcal {H}}_1})^{p} + \beta (BA + I_{{\mathcal {H}}_1})^q}{\alpha + \beta }:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_1\), respectively, which does not change the determinant. \(\square \)
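Lemma 18 is Sylvester's determinant theorem applied termwise. A numerical check with rectangular factors (our sketch, assuming NumPy; integer powers \(p, q\) are used to avoid fractional powers of non-symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 4, 7                               # H2 = R^4, H1 = R^7
A = rng.standard_normal((m, n)) * 0.3     # A : H1 -> H2
B = rng.standard_normal((n, m)) * 0.3     # B : H2 -> H1
alpha, beta, p, q = 0.6, 0.4, 2, -1

# BA has extra zero eigenvalues, each contributing a factor
# (alpha + beta)/(alpha + beta) = 1, so the determinants agree.
mp = np.linalg.matrix_power
lhs = np.linalg.det((alpha * mp(A @ B + np.eye(m), p)
                     + beta * mp(A @ B + np.eye(m), q)) / (alpha + beta))
rhs = np.linalg.det((alpha * mp(B @ A + np.eye(n), p)
                     + beta * mp(B @ A + np.eye(n), q)) / (alpha + beta))
assert np.isclose(lhs, rhs)
```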

Lemma 19

Let \({\mathcal {H}}_1, {\mathcal {H}}_2\) be separable Hilbert spaces. Let \(A,B:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2\) be compact operators such that \(AA^{*}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2,BB^{*}:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) are trace class. Let \(\alpha , \beta > 0\) be fixed. For any \(p, q \in {\mathbb {R}}\),

$$\begin{aligned}&\det \left[ \frac{\alpha [(AA^{*}+I_{{\mathcal {H}}_2})(BB^{*} + I_{{\mathcal {H}}_2})^{-1}]^p + \beta [(AA^{*}+I_{{\mathcal {H}}_2})(BB^{*} + I_{{\mathcal {H}}_2})^{-1}]^q}{\alpha + \beta }\right] \nonumber \\&=\det \left[ \frac{\alpha (C + I_{{\mathcal {H}}_1}\otimes I_3)^{p}+ \beta (C + I_{{\mathcal {H}}_1} \otimes I_3)^q}{\alpha + \beta }\right] , \end{aligned}$$
(147)
$$\begin{aligned}&\text {where}\;\;C = \begin{pmatrix} A^{*}A \;\; -A^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -A^{*}AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \\ B^{*}A \;\; -B^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -B^{*}AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \\ B^{*}A \;\; -B^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -B^{*}AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \end{pmatrix}. \end{aligned}$$
(148)

Proof of Lemma 19

We make use of the following notation. Let \(A,B,C: {\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2\) be three bounded linear operators. Consider the operator \((A \; B \; C): {\mathcal {H}}_1^3 \rightarrow {\mathcal {H}}_2\), with \((A \; B \; C)^{*} = \begin{pmatrix} A^{*}\\ B^{*}\\ C^{*} \end{pmatrix}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_1^3\). Here \({\mathcal {H}}_1^3 = {\mathcal {H}}_1 \oplus {\mathcal {H}}_1 \oplus {\mathcal {H}}_1\) denotes the direct sum of \({\mathcal {H}}_1\) with itself, that is

$$\begin{aligned} {\mathcal {H}}_1^3 = {\mathcal {H}}_1 \oplus {\mathcal {H}}_1 \oplus {\mathcal {H}}_1 = \{(v_1, v_2, v_3) \; : \; v_1, v_2, v_3 \in {\mathcal {H}}_1\}, \end{aligned}$$

equipped with the inner product

$$\begin{aligned} \langle (v_1, v_2, v_3) , (w_1, w_2, w_3)\rangle _{{\mathcal {H}}_1^3} = \langle v_1, w_1\rangle _{{\mathcal {H}}_1} + \langle v_2, w_2\rangle _{{\mathcal {H}}_1} + \langle v_3, w_3\rangle _{{\mathcal {H}}_1}. \end{aligned}$$

By the Sherman–Morrison–Woodbury formula, \( (BB^{*}+I_{{\mathcal {H}}_2})^{-1} = I_{{\mathcal {H}}_2} - B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1}B^{*}. \) Thus

$$\begin{aligned} (AA^{*} + I_{{\mathcal {H}}_2})(BB^{*} + I_{{\mathcal {H}}_2})^{-1}&= I_{{\mathcal {H}}_2} + AA^{*} - B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1}B^{*} \\&\quad - AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1}B^{*} = I_{{\mathcal {H}}_2} + C_1C_2. \end{aligned}$$

Here the operators \(C_1, C_2\) are defined as follows.

$$\begin{aligned} C_1&= [A \;\; -B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1}]:{\mathcal {H}}_1^3 \rightarrow {\mathcal {H}}_2, \\ C_2&= \begin{pmatrix} A^{*} \\ B^{*} \\ B^{*} \end{pmatrix}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_1^3. \end{aligned}$$

The operator \(C_2C_1: {\mathcal {H}}_1^3 \rightarrow {\mathcal {H}}_1^3\) is given by

$$\begin{aligned} C_2C_1 = \begin{pmatrix} A^{*}A \;\; -A^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -A^{*}AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \\ B^{*}A \;\; -B^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -B^{*}AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \\ B^{*}A \;\; -B^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \;\; -B^{*}AA^{*}B(I_{{\mathcal {H}}_1} + B^{*}B)^{-1} \end{pmatrix} = C. \end{aligned}$$

It follows from Lemma 18 that

$$\begin{aligned}&\det \left[ \frac{\alpha [(AA^{*}+I_{{\mathcal {H}}_2})(BB^{*} + I_{{\mathcal {H}}_2})^{-1}]^p + \beta [(AA^{*}+I_{{\mathcal {H}}_2})(BB^{*} + I_{{\mathcal {H}}_2})^{-1}]^q}{\alpha + \beta }\right] \\&\quad =\det \left[ \frac{\alpha (I_{{\mathcal {H}}_2} + C_1C_2)^p + \beta (I_{{\mathcal {H}}_2} + C_1C_2)^q}{\alpha + \beta }\right] \\&\quad = \det \left[ \frac{\alpha (C_2C_1 + I_{{\mathcal {H}}_1}\otimes I_3)^{p}+ \beta (C_2C_1 + I_{{\mathcal {H}}_1} \otimes I_3)^q}{\alpha + \beta }\right] . \end{aligned}$$

\(\square \)
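The algebra in the proof of Lemma 19 is easy to sanity-check numerically in finite dimensions. The following sketch (a toy verification with random matrices; the dimensions and variable names are ours, not from the paper) confirms the Sherman–Morrison–Woodbury identity, the factorization \((AA^{*} + I)(BB^{*} + I)^{-1} = I + C_1C_2\), and Sylvester's determinant identity \(\det (I + C_1C_2) = \det (I + C_2C_1)\), which is what allows the determinant to be computed on \({\mathcal {H}}_1^3\) instead of \({\mathcal {H}}_2\):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 5                      # finite-dimensional stand-ins for dim(H1), dim(H2)
A = rng.standard_normal((n2, n1))  # A, B : H1 -> H2
B = rng.standard_normal((n2, n1))
I1, I2 = np.eye(n1), np.eye(n2)

# Sherman–Morrison–Woodbury: (BB* + I)^{-1} = I - B(I + B*B)^{-1}B*
lhs = np.linalg.inv(B @ B.T + I2)
rhs = I2 - B @ np.linalg.inv(I1 + B.T @ B) @ B.T
assert np.allclose(lhs, rhs)

# The factorization (AA* + I)(BB* + I)^{-1} = I + C1 C2
K = np.linalg.inv(I1 + B.T @ B)
C1 = np.hstack([A, -B @ K, -A @ A.T @ B @ K])   # H1^3 -> H2
C2 = np.vstack([A.T, B.T, B.T])                 # H2 -> H1^3
M = (A @ A.T + I2) @ np.linalg.inv(B @ B.T + I2)
assert np.allclose(M, I2 + C1 @ C2)

# Sylvester's determinant identity: det(I + C1 C2) = det(I + C2 C1)
assert np.allclose(np.linalg.det(I2 + C1 @ C2),
                   np.linalg.det(np.eye(3 * n1) + C2 @ C1))
print("Lemma 19 factorization verified")
```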

Proof of Theorem 26

Define the operators \(\varLambda \) and \(Z\) via \(\varLambda +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2} = (BB^{*} + \mu I_{{\mathcal {H}}_2})^{-1/2}(AA^{*}+\gamma I_{{\mathcal {H}}_2})(BB^{*}+ \mu I_{{\mathcal {H}}_2})^{-1/2}\) and \(Z + \frac{\gamma }{\mu } I_{{\mathcal {H}}_2} = (AA^{*}+\gamma I_{{\mathcal {H}}_2})(BB^{*}+ \mu I_{{\mathcal {H}}_2})^{-1}\). By Theorem 10,

$$\begin{aligned}&D^{(\alpha , \beta )}_{r}[(AA^{*}+\gamma I_{{\mathcal {H}}_2}), (BB^{*}+\mu I_{{\mathcal {H}}_2})] \\&\quad = \frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta }\left( \log \frac{\gamma }{\mu }\right) +\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \nonumber \\&\qquad +\frac{1}{\alpha \beta }\log \det \left[ \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^p + \beta (\varLambda + \frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \\&\qquad \;\;\text {with }p = r(1-\delta )\text { and }q = r\delta . \end{aligned}$$

By Lemma 19, with \(\frac{\mu }{\gamma }Z + I_{{\mathcal {H}}_2} = (\frac{AA^{*}}{\gamma } + I_{{\mathcal {H}}_2})(\frac{BB^{*}}{\mu } + I_{{\mathcal {H}}_2})^{-1}\), the determinant in the last term is

$$\begin{aligned}&\det \left[ \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^p + \beta (\varLambda + \frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad = \det \left[ \frac{\alpha (\frac{\gamma }{\mu })^p(\frac{\mu }{\gamma }\varLambda + I_{{\mathcal {H}}_2})^p + \beta (\frac{\gamma }{\mu })^{-q}(\frac{\mu }{\gamma }\varLambda + I_{{\mathcal {H}}_2})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad =\det \left[ \frac{\alpha (\frac{\gamma }{\mu })^p(\frac{\mu }{\gamma }Z + I_{{\mathcal {H}}_2})^p + \beta (\frac{\gamma }{\mu })^{-q}(\frac{\mu }{\gamma }Z + I_{{\mathcal {H}}_2})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] \\&\quad =\det \left[ \frac{ \alpha (\frac{\gamma }{\mu })^p(C + I_{{\mathcal {H}}_1}\otimes I_3)^{p}+ \beta (\frac{\gamma }{\mu })^{-q} (C + I_{{\mathcal {H}}_1} \otimes I_3)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \;\;\text {where} \\&C = \begin{pmatrix} \frac{A^{*}A}{\gamma } \;\; -\frac{A^{*}B}{\sqrt{\gamma \mu }}(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \;\; -\frac{A^{*}AA^{*}B}{\gamma \sqrt{\gamma \mu }}(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma \mu }} \;\; -\frac{B^{*}B}{\mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \;\; -\frac{B^{*}AA^{*}B}{\gamma \mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma \mu }} \;\; -\frac{B^{*}B}{\mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \;\; -\frac{B^{*}AA^{*}B}{\gamma \mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \end{pmatrix}, \end{aligned}$$

which is obtained by replacing \(AA^{*}\) and \(BB^{*}\) in Lemma 19 with \(\frac{AA^{*}}{\gamma }\) and \(\frac{BB^{*}}{\mu }\), respectively. \(\square \)
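The substitution step can likewise be checked numerically: replacing \(AA^{*} \rightarrow AA^{*}/\gamma \) and \(BB^{*} \rightarrow BB^{*}/\mu \) amounts to the substitutions \(A \rightarrow A/\sqrt{\gamma }\), \(B \rightarrow B/\sqrt{\mu }\) in Lemma 19, and the block matrix \(C\) above must agree with the resulting \(C_2C_1\). A finite-dimensional sketch (random matrices; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 3, 4
gamma, mu = 0.7, 1.3               # regularization parameters gamma, mu > 0
A = rng.standard_normal((n2, n1))
B = rng.standard_normal((n2, n1))
I1 = np.eye(n1)

def C2C1(A, B):
    """The block operator C = C2 C1 of Lemma 19 (finite-dimensional stand-in)."""
    K = np.linalg.inv(I1 + B.T @ B)
    C1 = np.hstack([A, -B @ K, -A @ A.T @ B @ K])
    C2 = np.vstack([A.T, B.T, B.T])
    return C2 @ C1

# Lemma 19 evaluated at A/sqrt(gamma), B/sqrt(mu)
C_scaled = C2C1(A / np.sqrt(gamma), B / np.sqrt(mu))

# The explicit block matrix C from the proof of Theorem 26
K = np.linalg.inv(I1 + B.T @ B / mu)
sgm = np.sqrt(gamma * mu)
row1 = np.hstack([A.T @ A / gamma,
                  -(A.T @ B / sgm) @ K,
                  -(A.T @ A @ A.T @ B / (gamma * sgm)) @ K])
row2 = np.hstack([B.T @ A / sgm,
                  -(B.T @ B / mu) @ K,
                  -(B.T @ A @ A.T @ B / (gamma * mu)) @ K])
C = np.vstack([row1, row2, row2])
assert np.allclose(C, C_scaled)
print("Theorem 26 block matrix verified")
```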

Proof of Theorem 27

Let \(Z + \frac{\gamma }{\mu } I_{{\mathcal {H}}_2} = (AA^{*}+\gamma I_{{\mathcal {H}}_2})(BB^{*}+ \mu I_{{\mathcal {H}}_2})^{-1}\). By Eq. (29), when \(\dim ({\mathcal {H}}_2) < \infty \),

$$\begin{aligned}&D^{(\alpha , \beta )}_{r}[(AA^{*}+\gamma I_{{\mathcal {H}}_2}), (BB^{*}+\mu I_{{\mathcal {H}}_2})] \\&\quad = \frac{1}{\alpha \beta }\log \det \left[ \frac{\alpha (Z +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^p + \beta (Z + \frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^{-q}}{\alpha + \beta }\right] \\&\quad =\frac{1}{\alpha \beta }\left[ \log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \right] \dim ({\mathcal {H}}_2) \\&\qquad +\frac{1}{\alpha \beta }\log \det \left[ \frac{\alpha (Z +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^p + \beta (Z + \frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] . \end{aligned}$$

As in the proof of Theorem 26, the determinant in the last term of the above expression is

$$\begin{aligned}&\det \left[ \frac{\alpha (Z +\frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^p + \beta (Z + \frac{\gamma }{\mu }I_{{\mathcal {H}}_2})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] \\&= \det \left[ \frac{ \alpha (\frac{\gamma }{\mu })^p(C + I_{{\mathcal {H}}_1}\otimes I_3)^{p}+ \beta (\frac{\gamma }{\mu })^{-q} (C + I_{{\mathcal {H}}_1} \otimes I_3)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] . \end{aligned}$$

This gives us the final expression. \(\square \)

Proof of Theorems 28 and 29

Theorem 28 follows from Theorem 26 by considering the linear operators \(A = \frac{1}{\sqrt{m}}\varPhi ({\mathbf {x}})J_m: {\mathbb {R}}^m \rightarrow {\mathcal {H}}_K\) and \(B = \frac{1}{\sqrt{m}}\varPhi ({\mathbf {y}})J_m:{\mathbb {R}}^m \rightarrow {\mathcal {H}}_K\). The proof of Theorem 29 is similar to that of Theorem 28, except that we invoke Theorem 27. \(\square \)

1.8 Proofs for the Metric properties

We now prove Theorems 20, 22, and 23, which lead to the proofs of Theorems 4 and 21. We first prove Theorems 4 and 21 for the case \(\alpha =1/2\), which corresponds to the infinite-dimensional symmetric Stein divergence, and then for the general case \(\alpha > 0\). The former case utilizes Theorem 31 and the latter case utilizes Theorem 33 and the case \(\alpha = 1/2\). Both Theorems 31 and 33 are of interest in their own right.

The case of the infinite-dimensional symmetric Stein divergence. Consider the first case \(\alpha =\frac{1}{2}\).

Lemma 20

Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B,C:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint finite-rank operators, such that \(A+I > 0\), \(B + I > 0\), \(C+I > 0\). Then

$$\begin{aligned} \sqrt{\log \frac{\det (\frac{A+B}{2} + I)}{\sqrt{\det (A+I)\det (B+I)}}}&\le \sqrt{\log \frac{\det (\frac{A+C}{2} + I)}{\sqrt{\det (A+I)\det (C+I)}}} \nonumber \\&+ \sqrt{\log \frac{\det (\frac{C+B}{2} + I)}{\sqrt{\det (C+I)\det (B+I)}}}. \end{aligned}$$
(149)

Proof

Since \(A, B, C\) are finite-rank operators, there exists a finite-dimensional subspace \({\mathcal {H}}_n \subset {\mathcal {H}}\), with \(\dim ({\mathcal {H}}_n) = n\) for some \(n \in {\mathbb {N}}\), such that \(\mathrm{range}(A) \subset {\mathcal {H}}_n, \mathrm{range}(B) \subset {\mathcal {H}}_n\), \(\mathrm{range}(C) \subset {\mathcal {H}}_n\). Let

$$\begin{aligned} A_n = A \big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n,\;\; B_n = B\big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n, \;\; C_n = C\big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n. \end{aligned}$$

Then \(A_n,B_n, C_n\) are linear operators on the finite-dimensional space \({\mathcal {H}}_n\) and thus are represented by \(n \times n\) matrices, which we denote by the same symbols. We have \( (A+B)_n = (A+B) \big \vert _{{\mathcal {H}}_n} = A \big \vert _{{\mathcal {H}}_n} + B\big \vert _{{\mathcal {H}}_n} = A_n + B_n, (A+C)_n = A_n + C_n, (C+B)_n = B_n + C_n. \) Applying the finite-dimensional result in [37], we then obtain

$$\begin{aligned} \sqrt{\log \frac{\det (\frac{A_n+B_n}{2} + I_n)}{\sqrt{\det (A_n+I_n)\det (B_n+I_n)}}}&\le \sqrt{\log \frac{\det (\frac{A_n+C_n}{2} + I_n)}{\sqrt{\det (A_n+I_n)\det (C_n+I_n)}}} \nonumber \\&+ \sqrt{\log \frac{\det (\frac{C_n+B_n}{2} + I_n)}{\sqrt{\det (C_n+I_n)\det (B_n+I_n)}}}. \end{aligned}$$

Since the non-zero eigenvalues of \(A\) and \(A_n\) are the same, we have \(\det (A+I) = \det (A_n+I_n)\), and the same holds true for the other operators. This, together with the last expression, gives us the final result. \(\square \)
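Inequality (149) can be spot-checked numerically in finite dimensions. Below is a minimal sketch (random positive semi-definite \(A, B, C\), a small tolerance for floating-point error; the function name is ours), which tests the triangle inequality for the square root of the symmetric Stein divergence:

```python
import numpy as np

def stein_sqrt(A, B):
    """sqrt of the symmetric Stein divergence between A+I and B+I (finite dim)."""
    I = np.eye(A.shape[0])
    _, ld_mid = np.linalg.slogdet((A + B) / 2 + I)
    _, ld_a = np.linalg.slogdet(A + I)
    _, ld_b = np.linalg.slogdet(B + I)
    return np.sqrt(ld_mid - 0.5 * (ld_a + ld_b))

rng = np.random.default_rng(2)
n = 6
for _ in range(100):
    # random symmetric PSD matrices, so A+I > 0, B+I > 0, C+I > 0
    A, B, C = (X @ X.T for X in rng.standard_normal((3, n, n)))
    assert stein_sqrt(A, B) <= stein_sqrt(A, C) + stein_sqrt(C, B) + 1e-10
print("triangle inequality holds on all trials")
```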

Proof of Theorem 23 - Triangle inequality for square root of symmetric Stein divergence

Let \(\{A_n\}_{n\in {\mathbb {N}}}\), \(\{B_n\}_{n\in {\mathbb {N}}}\), \(\{C_n\}_{n\in {\mathbb {N}}}\) be sequences of finite-rank operators with \(||A_n - A||_{\mathrm{tr}} \rightarrow 0,\;\;||B_n - B||_{\mathrm{tr}} \rightarrow 0,\;\; ||C_n - C||_{\mathrm{tr}} \rightarrow 0, \;\;\hbox { as}\ n \rightarrow \infty . \) By Lemma 20, we have

$$\begin{aligned} \sqrt{\log \frac{\det (\frac{A_n+B_n}{2} + I)}{\sqrt{\det (A_n+I)\det (B_n+I)}}}&\le \sqrt{\log \frac{\det (\frac{A_n+C_n}{2} + I)}{\sqrt{\det (A_n+I)\det (C_n+I)}}} \nonumber \\&+ \sqrt{\log \frac{\det (\frac{C_n+B_n}{2} + I)}{\sqrt{\det (C_n+I)\det (B_n+I)}}}. \end{aligned}$$

By Theorem 3.5 in [36], as \(n \rightarrow \infty \), we have

$$\begin{aligned}&\det \left( A_n + I\right) \rightarrow \det (A+I), \;\; \det (B_n + I) \rightarrow \det (B+I), \\&\det \left( \frac{A_n+B_n}{2} + I\right) \rightarrow \det (\frac{A+B}{2} + I),\; \hbox { and similarly for operators involving}\ C_n. \end{aligned}$$

Taking the limit \(n \rightarrow \infty \) in the above triangle inequality for \((A_n+I), (B_n+I)\), and \((C_n+I)\) gives the final triangle inequality for \((A+I), (B+I)\), and \((C+I)\). \(\square \)

The following is the specialization of Theorem 4 when \(\alpha = 1/2\).

Theorem 30

(Metric property - square root of symmetric Stein divergence) Let \(\gamma > 0, \gamma \in {\mathbb {R}}\) be fixed. The square root \(\sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]}\) of the infinite-dimensional symmetric Stein divergence is a metric on \(\mathrm{PTr}({\mathcal {H}})(\gamma )\).

Proof of Theorem 30

The positivity and symmetry of \(D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]\) are shown in Theorems 1 and 14, respectively. It remains for us to show the triangle inequality, namely

$$\begin{aligned} \sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]}&\le \sqrt{D^{(1/2,1/2)}_1[(A+\gamma I), (C+\gamma I)]} \\&+\sqrt{D^{(1/2,1/2)}_1[(C+\gamma I), (B+\gamma I)]}, \end{aligned}$$

for any three operators \((A+\gamma I), (B+\gamma I), (C+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\). We have

$$\begin{aligned} D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]&= 4 \log \left[ \frac{\mathrm{det_X}(\frac{(A+\gamma I) + (B+\gamma I)}{2})}{\mathrm{det_X}(A+\gamma I)^{1/2}\mathrm{det_X}(B+\gamma I)^{1/2}}\right] \\&= 4\log \left[ \frac{\det (\frac{A+B}{2\gamma }+I)}{\det (\frac{A}{\gamma } + I)^{1/2}\det (\frac{B}{\gamma }+I)^{1/2}}\right] . \end{aligned}$$

Thus the triangle inequality for \(D^{(1/2,1/2)}_1[(A+\gamma I), (B+\gamma I)]\) follows from the triangle inequality stated in Theorem 23. \(\square \)
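The identity used in this proof, rewriting \(D^{(1/2,1/2)}_1\) in terms of ordinary determinants, can also be checked numerically. The sketch below assumes the paper's convention \(\mathrm{det_X}(A + \gamma I) = \gamma \det (\frac{A}{\gamma } + I)\) for the extended Fredholm determinant (finite-dimensional stand-in; all names are ours), and verifies that the \(\mathrm{det_X}\) form and the reduced form agree:

```python
import numpy as np

def detX(A, gamma):
    # extended Fredholm determinant of A + gamma*I (finite-dim stand-in),
    # assuming the convention detX(A + gamma I) = gamma * det(A/gamma + I)
    return gamma * np.linalg.det(A / gamma + np.eye(A.shape[0]))

rng = np.random.default_rng(3)
n, gamma = 5, 0.8
A, B = (X @ X.T for X in rng.standard_normal((2, n, n)))  # symmetric PSD
I = np.eye(n)

# detX form: note ((A+gamma I) + (B+gamma I))/2 = (A+B)/2 + gamma I
d_detX = 4 * np.log(detX((A + B) / 2, gamma)
                    / np.sqrt(detX(A, gamma) * detX(B, gamma)))
# reduced form from the proof of Theorem 30
d_reduced = 4 * np.log(np.linalg.det((A + B) / (2 * gamma) + I)
                       / np.sqrt(np.linalg.det(A / gamma + I)
                                 * np.linalg.det(B / gamma + I)))
assert np.isclose(d_detX, d_reduced)
print("both forms of the symmetric Stein divergence agree")
```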

Lemma 21

Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B \in \mathrm{Sym}({\mathcal {H}})\) be finite-rank operators, with maximum rank \(n \in {\mathbb {N}}\), such that \(A+I > 0\), \(B + I > 0\). Then

$$\begin{aligned} \prod _{j=1}^n\left[ \frac{\lambda _j(A) + \lambda _j(B)}{2} + 1\right]&\le \det \left( \frac{A+B}{2} + I\right) . \end{aligned}$$
(150)

Proof of Lemma 21

Since \(A, B\) are both finite-rank operators, there exists a finite-dimensional subspace \({\mathcal {H}}_n \subset {\mathcal {H}}\), with \(\dim ({\mathcal {H}}_n) = n\), such that \(\mathrm{range}(A) \subset {\mathcal {H}}_n\), \(\mathrm{range}(B) \subset {\mathcal {H}}_n\). Let \(A_n = A \big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n,\;\; B_n = B\big \vert _{{\mathcal {H}}_n}: {\mathcal {H}}_n \rightarrow {\mathcal {H}}_n\). Then \(A_n,B_n\) are linear operators on the finite-dimensional space \({\mathcal {H}}_n\) and thus are represented by \(n \times n\) matrices, which we denote by the same symbols. Furthermore, \( (A+B)_n = (A+B) \big \vert _{{\mathcal {H}}_n} = A \big \vert _{{\mathcal {H}}_n} + B\big \vert _{{\mathcal {H}}_n} = A_n + B_n. \) Thus we can apply the following inequality for finite-dimensional SPD matrices ([4])

$$\begin{aligned} \prod _{j=1}^n\left[ \frac{\lambda _j(A_n) + \lambda _j(B_n)}{2} + 1\right]&= \prod _{j=1}^n\left[ \frac{\lambda _j(A_n+I_n) + \lambda _j(B_n+I_n)}{2}\right] \\&\quad \le \det \left( \frac{A_n+B_n}{2} + I_n\right) . \end{aligned}$$

We note that the non-zero eigenvalues of \(A_n, B_n\) are the same as those of \(A, B\), respectively, with the maximum number being \(n\), and \(\det (\frac{A+B}{2}+I) = \det (\frac{A_n+B_n}{2} + I_n)\). Together with the previous inequality, this gives us the final result. \(\square \)
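The finite-dimensional inequality underlying Lemma 21 (a Fiedler-type bound, pairing the eigenvalues of both matrices in decreasing order) can be spot-checked as follows (a toy numerical sketch, not part of the proof; small perturbations keep \(A+I > 0\), \(B+I > 0\)):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
for _ in range(200):
    # random symmetric A, B with A+I > 0, B+I > 0 (small perturbations of 0)
    A, B = (0.05 * (S + S.T) for S in rng.standard_normal((2, n, n)))
    lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues, decreasing order
    lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]
    lhs = np.prod((lam_A + lam_B) / 2 + 1)
    rhs = np.linalg.det((A + B) / 2 + np.eye(n))
    assert lhs <= rhs + 1e-12
print("eigenvalue product inequality holds on all trials")
```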

Theorem 31

Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint trace class operators, such that \(A+I > 0\), \(B + I > 0\). Then

$$\begin{aligned} \prod _{j=1}^{\infty }\left[ \frac{\lambda _j(A) + \lambda _j(B)}{2} + 1\right]&\le \det \left( \frac{A+B}{2} + I\right) . \end{aligned}$$
(151)

Proof of Theorem 31

Let \(A = \sum _{j=1}^{\infty }\lambda _j(A)\phi _j \otimes \phi _j\) denote the spectral decomposition for A. For each \(n \in {\mathbb {N}}\), define \(A_n = \sum _{j=1}^n\lambda _j(A)\phi _j \otimes \phi _j\). Then \(A_n\) is a finite-rank operator with the eigenvalues being the first n eigenvalues of A and \(\lim _{n \rightarrow \infty }||A_n - A||_{\mathrm{tr}} = 0\). In the same way, we construct a sequence of finite-rank operators \(B_n\) with \(\lim _{n\rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\), so that \( \lim _{n \rightarrow \infty } ||(A_n+B_n) -(A+B)||_{\mathrm{tr}} = 0. \) By Theorem 3.5 in [36], as \(n \rightarrow \infty \), we then have

$$\begin{aligned} \lim _{n \rightarrow \infty }\det \left( \frac{A_n+B_n}{2}+I\right) = \det \left( \frac{A+B}{2}+I\right) . \end{aligned}$$

Applying Lemma 21 to \(A_n\) and \(B_n\), we have

$$\begin{aligned} \prod _{j=1}^{n}\left[ \frac{\lambda _j(A_n) + \lambda _j(B_n)}{2} + 1\right]&\le \det \left( \frac{A_n+B_n}{2} + I\right) . \end{aligned}$$
(152)

The final result is then obtained by taking the limit as \(n \rightarrow \infty \), noting that the eigenvalues of \(A_n\), \(B_n\) are precisely the first \(n\) eigenvalues of \(A, B\), respectively. \(\square \)

The following is the specialization of Theorem 21 when \(\alpha = 1/2\).

Theorem 32

Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A,B \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\) be such that \(A+I> 0,B + I > 0\). Let \(\mathrm{Eig}(A), \mathrm{Eig}(B): \ell ^2 \rightarrow \ell ^2\) be diagonal operators with the diagonals consisting of the eigenvalues of A and B, respectively, in decreasing order. Then

$$\begin{aligned} D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)] \le D^{(1/2,1/2)}_1[(A+I), (B+I)]. \end{aligned}$$
(153)

Proof of Theorem 32

By definition, we have

$$\begin{aligned}&D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)]\\&\quad = 4\log \left[ \frac{\det (\frac{\mathrm{Eig}(A) + \mathrm{Eig}(B)}{2}+I)}{\sqrt{\det (\mathrm{Eig}(A)+I)\det (\mathrm{Eig}(B)+I)}}\right] \\&\quad = 4 \log \left[ \frac{\prod _{j=1}^{\infty }\left[ \frac{\lambda _j(A) + \lambda _j(B)}{2} + 1\right] }{\sqrt{\det (A+I)\det (B+I)}}\right] \\&\quad \le 4\log \left[ \frac{\det (\frac{A + B}{2}+I)}{\sqrt{\det (A+I)\det (B+I)}}\right] \;\;\;\text {by Theorem~}\mathrm{31} \\&\quad = D^{(1/2,1/2)}_{1}[(A + I), (B + I)]. \end{aligned}$$

\(\square \)
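Theorem 32 admits a direct numerical spot check: diagonalizing both operators (with eigenvalues in decreasing order) never increases the symmetric Stein divergence. A finite-dimensional sketch (random symmetric matrices; function names ours):

```python
import numpy as np

def stein(A, B):
    """Symmetric Stein divergence D^{(1/2,1/2)}_1[A+I, B+I] (finite dim)."""
    I = np.eye(A.shape[0])
    _, ld_mid = np.linalg.slogdet((A + B) / 2 + I)
    _, ld_a = np.linalg.slogdet(A + I)
    _, ld_b = np.linalg.slogdet(B + I)
    return 4 * (ld_mid - 0.5 * (ld_a + ld_b))

rng = np.random.default_rng(5)
n = 6
for _ in range(100):
    A, B = (0.05 * (S + S.T) for S in rng.standard_normal((2, n, n)))
    EA = np.diag(np.sort(np.linalg.eigvalsh(A))[::-1])  # Eig(A), decreasing order
    EB = np.diag(np.sort(np.linalg.eigvalsh(B))[::-1])
    assert stein(EA, EB) <= stein(A, B) + 1e-12
print("diagonalization never increases the divergence")
```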

The general case. We now consider the general case \(\alpha > 0\). In the following, let \({\mathscr {C}}_p({\mathcal {H}})\) denote the set of pth Schatten class operators on \({\mathcal {H}}\), under the norm \(||\;||_p\), which is defined by

$$\begin{aligned} ||A||_p = (\mathrm{tr}|A|^p)^{1/p}, \;\;\; 1 \le p \le \infty , \end{aligned}$$
(154)

with \({\mathscr {C}}_1({\mathcal {H}})\) being the space of trace class operators \(\mathrm{Tr}({\mathcal {H}})\), \({\mathscr {C}}_2({\mathcal {H}})\) being the space of Hilbert–Schmidt operators \(\mathrm{HS}({\mathcal {H}})\), and \({\mathscr {C}}_{\infty }({\mathcal {H}})\) being the set of compact operators under the operator norm \(||\;||\).
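In finite dimensions the Schatten norms are computed from singular values, and the special cases \(p = 1, 2, \infty \) recover the trace, Hilbert–Schmidt, and operator norms. A minimal sketch (the helper name is ours):

```python
import numpy as np

def schatten(A, p):
    """Schatten p-norm: ||A||_p = (sum_j s_j(A)^p)^{1/p}; p = inf gives the operator norm."""
    s = np.linalg.svd(A, compute_uv=False)
    return s.max() if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
# p = 1: trace norm, tr|A| with |A| = (A*A)^{1/2}
assert np.isclose(schatten(A, 1), np.sqrt(np.abs(np.linalg.eigvalsh(A.T @ A))).sum())
# p = 2: Hilbert–Schmidt (Frobenius) norm
assert np.isclose(schatten(A, 2), np.linalg.norm(A, 'fro'))
# p = inf: operator norm
assert np.isclose(schatten(A, np.inf), np.linalg.norm(A, 2))
print("Schatten norm special cases verified")
```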

Theorem 33

Let \(r \in {\mathbb {R}}\) be fixed but arbitrary. Assume that \(1 \le p \le \infty \). Let \(A,\{A_n\}_{n \in {\mathbb {N}}} \in \mathrm{Sym}({\mathcal {H}}) \cap {\mathscr {C}}_p({\mathcal {H}})\), be such that \(I+A > 0\), \(I+A_n > 0\)\(\forall n \in {\mathbb {N}}\). Assume that \(\lim _{n \rightarrow \infty }||A_n - A||_{p} = 0\). Then

$$\begin{aligned} \lim _{n \rightarrow \infty }||(I+A_n)^r - (I+A)^r||_{p} = 0. \end{aligned}$$
(155)

Proof of Theorem 33

(i) The case \(r = 0\) is trivial. We first prove that

$$\begin{aligned} \lim _{n \rightarrow \infty }||(I+A_n)^r - (I+A)^r||_{p} = 0, \;\; 0 < r \le 1. \end{aligned}$$
(156)

We make use of the following result from [18] (Corollary 3.2), which states that for any two positive operators \(A, B\) on \({\mathcal {H}}\) such that \(A \ge c > 0\), \(B \ge c > 0\), and any bounded operator \(X\) on \({\mathcal {H}}\),

$$\begin{aligned} ||A^rX - XB^r||_p \le rc^{r-1}||AX-XB||_p, \;\;\; 0 < r \le 1. \end{aligned}$$
(157)
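Inequality (157) can be spot-checked numerically for matrices, for instance with \(p = 2\) (the Hilbert–Schmidt norm) and \(r = 1/2\). A toy sketch (the lower bound \(c\) is enforced by construction; all names are ours):

```python
import numpy as np

def powm(S, r):
    """Fractional power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w ** r) @ V.T

rng = np.random.default_rng(7)
n, c, r = 5, 0.5, 0.5
for _ in range(50):
    # symmetric matrices with A >= c, B >= c by construction
    A, B = (c * np.eye(n) + Y @ Y.T for Y in rng.standard_normal((2, n, n)))
    X = rng.standard_normal((n, n))
    lhs = np.linalg.norm(powm(A, r) @ X - X @ powm(B, r), 'fro')   # Schatten p = 2
    rhs = r * c ** (r - 1) * np.linalg.norm(A @ X - X @ B, 'fro')
    assert lhs <= rhs + 1e-10
print("power-difference bound holds on all trials")
```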

By the assumption \(I+A > 0\), there exists \(M_A > 0\) such that \( \langle x, (I+A)x\rangle \ge M_A||x||^2 \;\; \; \forall x \in {\mathcal {H}}. \) By the assumption \(\lim _{n \rightarrow \infty }||A_n -A||_{p} = 0\), for any \(\epsilon \) satisfying \(0< \epsilon < M_A\), there exists \(N = N(\epsilon ) \in {\mathbb {N}}\) such that \(||A_n - A||_{p} < \epsilon \)\(\forall n \ge N\). Then \(\forall x \in {\mathcal {H}}\),

$$\begin{aligned} |\langle x, (A_n -A)x\rangle | \le ||A_n - A||\;||x||^2 \le ||A_n-A||_{p}||x||^2 \le \epsilon ||x||^2. \end{aligned}$$

It thus follows that \(\forall x \in {\mathcal {H}}\),

$$\begin{aligned} \langle x, (I+A_n)x\rangle = \langle x, (I+A)x\rangle + \langle x, (A_n-A)x\rangle \ge (M_A -\epsilon )||x||^2. \end{aligned}$$

Thus we have \(I+A \ge M_A > 0\), \(I+A_n \ge M_A-\epsilon > 0\)\(\forall n \ge N = N(\epsilon )\). Then, applying Eq. (157),

$$\begin{aligned} ||(I+A_n)^r - (I+A)^r||_{p}&\le r (M_A - \epsilon )^{r-1}||A_n - A||_{p} \;\;\;\forall n \ge N, \end{aligned}$$

which implies \(\lim _{n \rightarrow \infty }||(I+A_n)^r - (I+A)^r||_{p} = 0\).

(ii) For \(r > 1\), we proceed by induction as follows. We have

$$\begin{aligned}&||(I+A_n)^r - (I+A)^r||_p \\&\quad \le ||(I+A_n)^r - (I+A_n)(I+A)^{r-1}||_p + ||(I+A_n)(I+A)^{r-1} - (I+A)^r||_p \\&\quad \le ||I+A_n||\;||(I+A_n)^{r-1} - (I+A)^{r-1}||_p + ||A_n - A||_p||(I+A)^{r-1}||. \end{aligned}$$

Thus the case \(r > 1\) follows from the case \(0 \le r \le 1\) by induction.

(iii) For the case \(r < 0\), we first prove that

$$\begin{aligned} \lim _{n \rightarrow \infty }||(I+A_n)^{-1} - (I+A)^{-1}||_p = 0. \end{aligned}$$
(158)

This follows from the fact that \(\forall n \ge N = N(\epsilon )\),

$$\begin{aligned}&||(I+A_n)^{-1} - (I+A)^{-1}||_{p} = ||(I+A_n)^{-1}[(I+A_n) - (I+A)](I+A)^{-1}||_{p} \\&\quad \le ||(I+A_n)^{-1}||\;||A_n - A||_{p}||(I+A)^{-1}|| \le \frac{1}{M_A(M_A-\epsilon )}||A_n -A||_{p}. \end{aligned}$$

(iv) We next prove that

$$\begin{aligned} \lim _{n \rightarrow \infty }||(I+A_n)^{-r} - (I+A)^{-r}||_p = 0, \;\; 0 < r \le 1. \end{aligned}$$
(159)

We have \((I+A)^{-1} \ge \frac{1}{\max \{(1+\lambda _k(A)): k \in {\mathbb {N}}\}} = \frac{1}{||I+A||}> 0\). From the limit \(\lim _{n \rightarrow \infty }||A_n-A|| = 0\), it follows that for any \(\epsilon \) satisfying \(0< \epsilon < ||I+A||\), there exists \(M = M(\epsilon ) \in {\mathbb {N}}\) such that \(\forall n \ge M\),

$$\begin{aligned} ||I+A|| - \epsilon \le ||I+A_n|| \le ||I+A|| + \epsilon . \end{aligned}$$

It follows that \(\forall n \ge M\), \( (I+A_n)^{-1} \ge \frac{1}{\max \{(1+\lambda _k(A_n)): k \in {\mathbb {N}}\}} = \frac{1}{||I+A_n||} \ge \frac{1}{||I+A||+\epsilon }. \) Hence invoking Eq. (157) again, we obtain \(\forall n \ge M\)

$$\begin{aligned} ||(I+A_n)^{-r} - (I+A)^{-r}||_{p} \le r(||I+A||+\epsilon )^{1-r}||(I+A_n)^{-1} - (I+A)^{-1}||_p, \end{aligned}$$

which, together with the limit in Eq. (158) (the case \(r = 1\)), implies that \(\lim _{n \rightarrow \infty }||(I+A_n)^{-r} - (I+A)^{-r}||_{p} = 0\).

(v) By an induction argument as in step (ii), we then obtain the final part of the theorem, namely

$$\begin{aligned} \lim _{n \rightarrow \infty }||(I+A_n)^{-r} - (I+A)^{-r}||_p = 0, \;\; \forall r > 1. \end{aligned}$$

\(\square \)

Lemma 22

Let \({\mathcal {H}}\) be a separable Hilbert space. Assume that \(\{A_n\}_{n\in {\mathbb {N}}}\) and \(A\) are self-adjoint trace class operators on \({\mathcal {H}}\) such that \((I+A) > 0\), \((I+A_n) > 0\) \(\forall n \in {\mathbb {N}}\). Assume that \(||A_n - A||_{\mathrm{tr}} \rightarrow 0\) as \(n \rightarrow \infty \). Then \(A_n(I+A_n)^{-1}\) and \(A(I+A)^{-1}\) are trace class operators and

$$\begin{aligned} \lim _{n \rightarrow \infty }||A_n(I+A_n)^{-1} - A(I+A)^{-1}||_{\mathrm{tr}} = 0. \end{aligned}$$
(160)

Lemma 23

Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(\{A_n\}_{n \in {\mathbb {N}}}\), \(A\), \(\{B_n\}_{n \in {\mathbb {N}}}\), \(B\) in \(\mathrm{Tr}({\mathcal {H}})\) be self-adjoint operators with \(\lim _{n \rightarrow \infty }||A_n -A||_{\mathrm{tr}} = 0\), \(\lim _{n \rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\). Assume that \(I+A> 0, I+B> 0, I+A_n> 0, I+ B_n > 0\) \(\forall n \in {\mathbb {N}}\). Then \((I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} - I\) and \((I+B)^{-1/2}(I+A)(I+B)^{-1/2} - I\) are self-adjoint, trace class operators on \({\mathcal {H}}\) and

$$\begin{aligned}&\lim _{n \rightarrow \infty }||(I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2} - (I+B)^{-1/2}(I+A)(I+B)^{-1/2}||_{\mathrm{tr}}\\&\quad = 0. \end{aligned}$$

Proof of Theorem 20 - Convergence in trace norm

Let \(I+\varLambda = (I+B)^{-1/2}(I+A)(I+B)^{-1/2}\) and \(I+\varLambda _n = (I+B_n)^{-1/2}(I+A_n)(I+B_n)^{-1/2}\), with \(\varLambda , \varLambda _n \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}})\).

By Lemma 23, we have \(\lim _{n\rightarrow \infty }||\varLambda _n -\varLambda ||_{\mathrm{tr}} = 0\). Thus by Theorem 33,

$$\begin{aligned} \lim _{n \rightarrow \infty }||(I+\varLambda _n)^{\alpha } - (I+\varLambda )^{\alpha }||_{\mathrm{tr}} = 0 \;\;\forall \alpha \in {\mathbb {R}}. \end{aligned}$$

From Eq. (130), we have

$$\begin{aligned}&D^{(\alpha , \alpha )}_{2\alpha }[(I+A_n), (I+B_n)] = \frac{1}{\alpha ^2}\log \det \left[ \frac{(I+\varLambda _n)^{\alpha } + (I+\varLambda _n)^{-\alpha }}{2}\right] . \end{aligned}$$

Taking the limit as \(n \rightarrow \infty \) and applying the continuity of the Fredholm determinant in the trace norm (e.g. Theorem 3.5 in [36]), we obtain

$$\begin{aligned}&\lim _{n \rightarrow \infty }D^{(\alpha , \alpha )}_{2\alpha }[(I+A_n), (I+B_n)] \\&\quad = \frac{1}{\alpha ^2}\log \det \left[ \frac{(I+\varLambda )^{\alpha } + (I+\varLambda )^{-\alpha }}{2}\right] = D^{(\alpha , \alpha )}_{2\alpha }[(I+A), (I+B)]. \end{aligned}$$

\(\square \)

Proof of Theorem 21 - Diagonalization

Consider first the case \(\alpha > 0\). As in the proof of Theorem 22, it suffices to prove for the case \(\gamma = 1\). Let \(A = \sum _{j=1}^{\infty }\lambda _j(A)\phi _j \otimes \phi _j\) be the spectral decomposition for A. For each \(n \in {\mathbb {N}}\), define \(A_n = \sum _{j=1}^n\lambda _j(A)\phi _j \otimes \phi _j\). Then \(A_n\) is a finite-rank operator with the eigenvalues being the first n eigenvalues of A and \(\lim _{n \rightarrow \infty }||A_n - A||_{\mathrm{tr}} = 0\). In the same way, we construct a sequence of finite-rank operators \(B_n\) with \(\lim _{n\rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\). By construction,

$$\begin{aligned} \lim _{n \rightarrow \infty }||\mathrm{Eig}(A_n) - \mathrm{Eig}(A)||_{\mathrm{tr}} = 0, \;\; \lim _{n \rightarrow \infty }||\mathrm{Eig}(B_n) - \mathrm{Eig}(B)||_{\mathrm{tr}} = 0. \end{aligned}$$

Thus by Theorem 20, we have

$$\begin{aligned}&\lim _{n \rightarrow \infty }D^{(\alpha ,\alpha )}_{2\alpha }[(\mathrm{Eig}(A_n) + I), (\mathrm{Eig}(B_n) + I)] = D^{(\alpha ,\alpha )}_{2\alpha }[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)], \\&\lim _{n \rightarrow \infty }D^{(\alpha ,\alpha )}_{2\alpha }[(A_n + I), (B_n + I)] = D^{(\alpha ,\alpha )}_{2\alpha }[(A + I), (B + I)]. \end{aligned}$$

Since \(A_n, B_n\) can be identified with finite-dimensional matrices, as in the proof of Lemma 20, we can apply the corresponding finite-dimensional result in [9] to obtain

$$\begin{aligned} D^{(\alpha ,\alpha )}_{2\alpha }[(\mathrm{Eig}(A_n) + I), (\mathrm{Eig}(B_n) + I)] \le D^{(\alpha ,\alpha )}_{2\alpha }[(A_n + I), (B_n + I)]. \end{aligned}$$

Thus taking limits as \(n \rightarrow \infty \) gives

$$\begin{aligned} D^{(\alpha ,\alpha )}_{2\alpha }[(\mathrm{Eig}(A) + I), (\mathrm{Eig}(B) + I)] \le D^{(\alpha ,\alpha )}_{2\alpha }[(A + I), (B + I)]. \end{aligned}$$

Letting \(\alpha \rightarrow 0\) on both sides of the above expression, we also obtain the result for the case \(\alpha = 0\). \(\square \)

Proof of Theorem 22 - Triangle inequality

For a fixed \(\gamma > 0\), we have

$$\begin{aligned}&D^{(\alpha , \alpha )}_{2\alpha }[(A + \gamma I), (B+ \gamma I)] \\&\quad = \frac{1}{\alpha ^2}\log \mathrm{det_X}\left( \frac{[(A+\gamma I)(B+\gamma I)^{-1}]^{\alpha } + [(A+\gamma I)(B+\gamma I)^{-1}]^{-\alpha }}{2}\right) \\&\quad = \frac{1}{\alpha ^2}\log \det \left( \frac{[(\frac{A}{\gamma }+ I)(\frac{B}{\gamma }+ I)^{-1}]^{\alpha } + [(\frac{A}{\gamma }+ I)(\frac{B}{\gamma }+I)^{-1}]^{-\alpha }}{2}\right) , \end{aligned}$$

which reduces to the case \(\gamma = 1\). Thus it suffices to prove for \(\gamma = 1\).

Let

$$\begin{aligned} \varLambda _1+ I&= (B+I)^{-1/2}(A+I)(B+I)^{-1/2}> 0, \\ \varLambda _2 + I&= (B+I)^{-1/2}(C+I)(B+I)^{-1/2} > 0. \end{aligned}$$

Let \(\mathrm{Eig}(\varLambda _1), \mathrm{Eig}(\varLambda _2):\ell ^2 \rightarrow \ell ^2\) denote diagonal operators with the diagonals consisting of the eigenvalues of \(\varLambda _1, \varLambda _2\), respectively, in decreasing order. We have by definition

$$\begin{aligned} D^{(1/2,1/2)}_{1}[(A+I),(B+I)]&= 4\log \det \left[ \frac{(\varLambda _1 + I)^{1/2} + (\varLambda _1 + I)^{-1/2}}{2}\right] \\&= D^{(1/2,1/2)}_{1}[(\varLambda _1 + I),I] \\&= D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _1) + I),I]. \end{aligned}$$

Then for any \(\alpha > 0\), we have

$$\begin{aligned}&D^{(\alpha , \alpha )}_{2\alpha }[(A+ I), (B+ I)] = \frac{1}{\alpha ^2}\log \det \left[ \frac{(\varLambda _1 + I)^{\alpha } + (\varLambda _1 +I)^{-\alpha }}{2}\right] \nonumber \\&= \frac{1}{4\alpha ^2}D^{(1/2,1/2)}_{1}[(\varLambda _1+I)^{2\alpha }, I] = \frac{1}{4\alpha ^2}D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _1)+I)^{2\alpha }, I]. \end{aligned}$$
(161)

Similarly,

$$\begin{aligned} D^{(\alpha , \alpha )}_{2\alpha }[(C+ I), (B+ I)]&= \frac{1}{4\alpha ^2}D^{(1/2,1/2)}_{1}[(\varLambda _2+I)^{2\alpha }, I] \nonumber \\&= \frac{1}{4\alpha ^2}D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _2)+I)^{2\alpha }, I]. \end{aligned}$$
(162)

Furthermore, by the Affine Invariance property (Theorem 16), we have

$$\begin{aligned} D^{(\alpha , \alpha )}_{2\alpha }[(A+ I), (C+ I)]&= D^{(\alpha , \alpha )}_{2\alpha }[(\varLambda _1 + I), (\varLambda _2 + I)]. \end{aligned}$$
(163)

By the triangle inequality in Theorem 23,

$$\begin{aligned} \sqrt{D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _1) + I)^{2\alpha }, I]}&\le \sqrt{D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _1) + I)^{2\alpha }, (\mathrm{Eig}(\varLambda _2) + I)^{2\alpha }]} \nonumber \\&+ \sqrt{D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _2) + I)^{2\alpha }, I]}. \end{aligned}$$
(164)

The desired triangle inequality for \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+I), (B+I)]}\) requires that for all \((A+I), (B+I), (C+I) \in \mathrm{PTr}({\mathcal {H}})\),

$$\begin{aligned} \sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+I), (B+I)]}&\le \sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+I), (C+I)]} + \sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(C+I), (B+I)]}. \end{aligned}$$
(165)

By Eqs. (161,162,163,164), the triangle inequality in Eq. (165) is satisfied if

$$\begin{aligned} D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _1) + I)^{2\alpha }, (\mathrm{Eig}(\varLambda _2) + I)^{2\alpha }] \le 4\alpha ^2 D^{(\alpha ,\alpha )}_{2\alpha }[(A + I), (C + I)]. \end{aligned}$$
(166)

Since \(\mathrm{Eig}(\varLambda _1)\) and \(\mathrm{Eig}(\varLambda _2)\) are diagonal operators, they commute and thus

$$\begin{aligned}&D^{(1/2,1/2)}_{1}[(\mathrm{Eig}(\varLambda _1) + I)^{2\alpha }, (\mathrm{Eig}(\varLambda _2) + I)^{2\alpha }] \\&\quad =4 \log \left[ \frac{\det \left[ \frac{(\mathrm{Eig}(\varLambda _1) + I)^{2\alpha } + (\mathrm{Eig}(\varLambda _2) + I)^{2\alpha }}{2}\right] }{\det (\mathrm{Eig}(\varLambda _1) +I)^{\alpha }\det (\mathrm{Eig}(\varLambda _2) + I)^{\alpha }}\right] \\&\quad = 4\log \det \left[ \frac{[(\mathrm{Eig}(\varLambda _1) + I)(\mathrm{Eig}(\varLambda _2) + I)^{-1}]^{\alpha } + [(\mathrm{Eig}(\varLambda _1) + I)(\mathrm{Eig}(\varLambda _2) + I)^{-1}]^{-\alpha }}{2}\right] \\&\quad = 4\alpha ^2D^{(\alpha ,\alpha )}_{2\alpha }[(\mathrm{Eig}(\varLambda _1) + I), (\mathrm{Eig}(\varLambda _2) + I)] \\&\quad \le 4\alpha ^2 D^{(\alpha ,\alpha )}_{2\alpha }[(\varLambda _1 + I), (\varLambda _2 + I)] \;\;\;\text {by Theorem}~\mathrm{21} \\&\quad = 4\alpha ^2 D^{(\alpha ,\alpha )}_{2\alpha }[(A + I), (C + I)]. \end{aligned}$$

This is precisely the desired inequality stated in Eq. (166). \(\square \)
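For commuting (diagonal) operators the computation above reduces, eigenvalue by eigenvalue, to the scalar identity \(D^{(1/2,1/2)}_{1}[x^{2\alpha }, y^{2\alpha }] = 4\alpha ^2 D^{(\alpha ,\alpha )}_{2\alpha }[x, y]\), which can be checked directly (a minimal sketch; function names ours):

```python
import numpy as np

def stein_scalar(x, y):
    """D^{(1/2,1/2)}_1 for commuting (scalar) arguments x, y > 0."""
    return 4 * np.log(((x + y) / 2) / np.sqrt(x * y))

def d_alpha_scalar(x, y, alpha):
    """D^{(alpha,alpha)}_{2 alpha} for scalars: (1/alpha^2) log((s^a + s^-a)/2), s = x/y."""
    s = x / y
    return np.log((s ** alpha + s ** (-alpha)) / 2) / alpha ** 2

rng = np.random.default_rng(8)
for _ in range(100):
    x, y = 1 + rng.random(2)          # eigenvalues of Eig(L1)+I, Eig(L2)+I
    alpha = 0.1 + 2 * rng.random()
    assert np.isclose(stein_scalar(x ** (2 * alpha), y ** (2 * alpha)),
                      4 * alpha ** 2 * d_alpha_scalar(x, y, alpha))
print("commuting-case identity verified")
```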

Proof of Theorem 4 - Metric property

The case \(\alpha = 0\) corresponds to the affine-invariant Riemannian distance on the Hilbert manifold \(\varSigma ({\mathcal {H}})\) [20], which is still a metric when restricted to \(\mathrm{PTr}({\mathcal {H}})\). Consider the case \(\alpha > 0\). The positivity and symmetry of the divergence \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]\) are from Theorems 1 and 14, respectively. The triangle inequality for the square root \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]}\) is from Theorem 22. Thus \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]}\) is a metric on \(\mathrm{PTr}({\mathcal {H}})(\gamma )\). \(\square \)

To prove Theorem 5, we first need the following technical result.

Lemma 24

Let \(\lambda > 0, \lambda \ne 1\) be fixed. Consider the function

$$\begin{aligned} f(x) = \left\{ \begin{array}{ll} \frac{1}{x^2}\log \left( \frac{\lambda ^{x} + \lambda ^{-x}}{2}\right) , &{} x > 0. \\ \frac{1}{2}(\log \lambda )^2, &{} x = 0. \end{array} \right. \end{aligned}$$
(167)

Then \(f\) is continuous on \([0, \infty )\) and strictly decreasing on \((0, \infty )\), with \(f_{\max } = f(0) = \frac{1}{2}(\log \lambda )^2\). Furthermore, \(\lim _{x \rightarrow \infty }f(x) = 0\).
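The claimed behavior of \(f\) is easy to confirm numerically. The sketch below uses the overflow-safe form \(\log \frac{\lambda ^{x} + \lambda ^{-x}}{2} = \mathrm{logaddexp}(x\log \lambda , -x\log \lambda ) - \log 2\) (the particular \(\lambda \) and grid are ours):

```python
import numpy as np

lam = 3.0   # any fixed lambda > 0, lambda != 1

def f(x):
    """f(x) = log((lam^x + lam^(-x))/2) / x^2 for x > 0, extended by (log lam)^2/2 at x = 0."""
    x = np.asarray(x, dtype=float)
    t = x * np.log(lam)
    num = np.logaddexp(t, -t) - np.log(2.0)          # log((lam^x + lam^-x)/2), overflow-safe
    return np.where(x == 0, 0.5 * np.log(lam) ** 2, num / np.where(x == 0, 1.0, x) ** 2)

# continuity at 0: f(x) -> f(0) = (log lam)^2 / 2
assert np.isclose(float(f(1e-4)), 0.5 * np.log(lam) ** 2, atol=1e-6)
# strictly decreasing on (0, infinity)
xs = np.linspace(0.01, 20, 500)
assert np.all(np.diff(f(xs)) < 0)
# f(x) -> 0 as x -> infinity
assert float(f(2000.0)) < 1e-3
print("Lemma 24 properties verified numerically")
```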

Proof of Theorem 5 - Monotonicity

Let \(\{\lambda _j\}_{j=1}^{\infty }\) be the eigenvalues of the trace class operator \(\varLambda \), where \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\). For \(\gamma = \mu \), we have \(\delta = \frac{\alpha }{\alpha + \beta }\), \(p = r(1-\delta ) = \frac{r\beta }{\alpha + \beta }\), \(q = \frac{r\alpha }{\alpha + \beta }\). If furthermore \(\alpha = \beta \) and \(r = 2\alpha \), then \(\delta = \frac{1}{2}\), \(p = \alpha \), \(q = \alpha \). By Theorem 10, for \(\alpha > 0\),

$$\begin{aligned} D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]&= \frac{1}{\alpha ^2} \log \det \left[ \frac{(\varLambda + I)^{\alpha } + (\varLambda + I)^{-\alpha }}{2}\right] \\&= \frac{1}{\alpha ^2}\sum _{j=1}^{\infty }\log \left( \frac{(\lambda _j+1)^{\alpha } + (\lambda _j + 1)^{-\alpha }}{2}\right) . \end{aligned}$$

We note that since \(\varLambda +I > 0\), we have \(\lambda _j +1> 0\) \(\forall j \in {\mathbb {N}}\). Furthermore, if \(A \ne B\), then there exists at least one index \(j \in {\mathbb {N}}\) for which \(\lambda _j \ne 0\), so that \(\lambda _j + 1 > 0\) and \(\lambda _j +1 \ne 1\), since otherwise \(\varLambda = 0\) and \(A+\gamma I = B+ \gamma I \Longleftrightarrow A = B\). The terms with \(\lambda _j = 0\) vanish and contribute nothing to the infinite series expansion above for \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]\).

By Lemma 24, as a function of the parameter \(\alpha \), \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]\) is strictly decreasing on \((0, \infty )\), with maximum value achieved at \(\alpha = 0\).

By Lemma 24 and Lebesgue’s Monotone Convergence Theorem, we have

$$\begin{aligned}&\lim _{\alpha \rightarrow \infty }D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)] = \lim _{\alpha \rightarrow \infty }\frac{1}{\alpha ^2}\sum _{j=1}^{\infty }\log \left( \frac{(\lambda _j+1)^{\alpha } + (\lambda _j + 1)^{-\alpha }}{2}\right) \\&\quad = \sum _{j=1}^{\infty }\lim _{\alpha \rightarrow \infty }\frac{1}{\alpha ^2}\log \left( \frac{(\lambda _j+1)^{\alpha } + (\lambda _j + 1)^{-\alpha }}{2}\right) = 0. \end{aligned}$$

\(\square \)
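
In finite dimensions the series above is a finite sum over the eigenvalues, so the monotonicity in \(\alpha \) can be checked directly. The sketch below is our illustration (not from the paper); the random positive definite matrices stand in for \(A+\gamma I\) and \(B+\gamma I\).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5)); A = M @ M.T + np.eye(5)   # stands in for A + gamma*I
N = rng.standard_normal((5, 5)); B = N @ N.T + np.eye(5)   # stands in for B + gamma*I

# Eigenvalues of Lambda + I = (B+gamma I)^{-1/2}(A+gamma I)(B+gamma I)^{-1/2};
# with B = L L^T, the congruent matrix L^{-1} A L^{-T} has the same eigenvalues.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
lam_plus_1 = np.linalg.eigvalsh(Linv @ A @ Linv.T)

def D(alpha):
    # the eigenvalue expansion of D^{(alpha,alpha)}_{2 alpha} used in the proof
    return np.sum(np.log((lam_plus_1**alpha + lam_plus_1**(-alpha)) / 2)) / alpha**2

vals = [D(a) for a in (0.1, 0.5, 1.0, 2.0, 5.0)]
assert all(a > b for a, b in zip(vals, vals[1:]))   # strictly decreasing in alpha
assert vals[-1] > 0
```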

1.9 Proofs for the Derivatives

Proof of Lemma 2

This follows from the linearity of the \(\mathrm{tr}_X\) operation, which gives \(\mathrm{tr}_X[A_0+\gamma _0I + t(A+\gamma I)] - \mathrm{tr}_X(A_0+\gamma _0I) - t\mathrm{tr}_X(A+\gamma I) = 0\). \(\square \)

Proof of Proposition 2

For \(\gamma = 0\), this reduces to the Plemelj-Smithies formula. Assume that \(\gamma \ne 0\). By definition of the extended Fredholm determinant \(\mathrm{det_X}\), for \(t \ne -1/\gamma \), we have

$$\begin{aligned}&\mathrm{det_X}[I+ t(A+\gamma I)] = \mathrm{det_X}[(1+\gamma t)I + tA] = (1+\gamma t)\det \left( \frac{tA}{1+\gamma t} + I\right) \\&\quad = (1+\gamma t)\left[ 1 +\sum _{k=1}^{\infty }\frac{t^k}{k!}\frac{\alpha _k(A)}{(1+\gamma t)^k}\right] = (1+\gamma t) + t\mathrm{tr}(A) +\sum _{k=2}^{\infty }\frac{t^k}{k!}\frac{\alpha _k(A)}{(1+\gamma t)^{k-1}} \\&\quad = 1 +t\mathrm{tr}_X(A+\gamma I) + \sum _{k=2}^{\infty }\frac{t^k}{k!}\frac{\alpha _k(A)}{(1+\gamma t)^{k-1}} \;\; \text {by the Plemelj-Smithies formula}. \end{aligned}$$

Let \(0< \epsilon < 1\) be fixed. For t sufficiently close to zero such that \(1+ \gamma t > \epsilon \), we have

$$\begin{aligned}&\left| \frac{\mathrm{det_X}[I+t(A+\gamma I)] - [1 + t\mathrm{tr}_X(A+\gamma I)]}{t}\right| = \left| \sum _{k=2}^{\infty }\frac{t^{k-1}}{k!}\frac{\alpha _k(A)}{(1+\gamma t)^{k-1}}\right| \\&\quad \le \sum _{k=1}^{\infty }\frac{|t|^k}{(k+1)!}\frac{e^{k+1}||A||_{\mathrm{tr}}^{k+1}}{\epsilon ^k} = \frac{|t|e^2||A||^2_{\mathrm{tr}}}{\epsilon }\left[ \frac{1}{2!} + \frac{|t|e||A||_{\mathrm{tr}}}{3!\epsilon } + \cdots \right] \;\;\text {by Eq.~({95})} \\&\quad \le \frac{|t|e^2||A||^2_{\mathrm{tr}}}{\epsilon }\exp (|t|e||A||_{\mathrm{tr}}/\epsilon ), \;\;\;\text {from which it follows that} \\&\mathrm{det_X}[I+ t(A+\gamma I)] = 1 +t\mathrm{tr}_X(A+\gamma I) + O(t^2), \;\;\; \text {and consequently}, \\&\lim _{t \rightarrow 0}\frac{\mathrm{det_X}[I+ t(A+\gamma I)] - [1 +t\mathrm{tr}_X(A+\gamma I)]}{t} = 0. \end{aligned}$$

\(\square \)
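
The first-order expansion established in this proof can be illustrated in finite dimensions, where \(\mathrm{det_X}[I+t(A+\gamma I)] = (1+\gamma t)\det (I + tA/(1+\gamma t))\) and \(\mathrm{tr}_X(A+\gamma I) = \mathrm{tr}(A) + \gamma \). The sketch below is ours (matrix size and seed arbitrary) and checks that the remainder is \(o(t)\).

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))
gamma = 0.7

def detX_of_I_plus(t):
    # finite-dimensional stand-in for det_X[I + t(A + gamma I)]
    return (1 + gamma * t) * np.linalg.det(np.eye(4) + t * A / (1 + gamma * t))

trX = np.trace(A) + gamma    # tr_X(A + gamma I)
rems = [abs(detX_of_I_plus(t) - 1 - t * trX) / t for t in (1e-2, 1e-3, 1e-4)]
# the remainder of the first-order expansion is O(t^2), so rems shrinks with t
assert rems[0] > rems[2]
assert rems[2] < 2e-3
```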

Proof of Lemma 3

By the product rule of \(\mathrm{det_X}\),

$$\begin{aligned}&\mathrm{det_X}(A_0 +\gamma _0 I + t(A+\gamma I)) - \mathrm{det_X}(A_0 + \gamma _0I) \\&\quad -t \mathrm{det_X}(A_0+\gamma _0I)\mathrm{tr}_X[(A_0+\gamma _0I)^{-1}(A+\gamma I)] \\&\quad = \mathrm{det_X}(A_0+\gamma _0I) \\&\quad \times \left[ \mathrm{det_X}[I + t(A_0+\gamma _0I)^{-1}(A+\gamma I)] - 1 - t \mathrm{tr}_X[(A_0+\gamma _0I)^{-1}(A+\gamma I)]\right] . \end{aligned}$$

By Proposition 2, we have

$$\begin{aligned} \lim _{t \rightarrow 0}\frac{\mathrm{det_X}[I + t(A_0+\gamma _0I)^{-1}(A+\gamma I)] - 1 - t \mathrm{tr}_X[(A_0+\gamma _0I)^{-1}(A+\gamma I)]}{t} = 0, \end{aligned}$$

from which we have the desired formula. \(\square \)

Proof of Lemma 4

For the function \(\log :{\mathbb {R}}^{+}\rightarrow {\mathbb {R}}\), \(D\log (x_0)(x) = \frac{x}{x_0}\). By the chain rule, we have

$$\begin{aligned}&D\log \mathrm{det_X}(A_0+\gamma _0 I)(A+\gamma I) = D(\log \circ \mathrm{det_X})(A_0 + \gamma _0I)(A+\gamma I) \\&\quad = [ D\log (\mathrm{det_X}(A_0+\gamma _0I))\circ D\mathrm{det_X}(A_0 + \gamma _0I)](A+\gamma I) \\&\quad = \mathrm{tr}_X[(A_0+\gamma _0I)^{-1}(A+\gamma I)], \end{aligned}$$

by applying the formula for \(D\mathrm{det_X}(A_0+\gamma _0I)(A+\gamma I) \) from Lemma 3. \(\square \)

We now prove Proposition 3. Recall the chain rule for differentiation on Banach spaces. Let \(V, W, U\) be Banach spaces. Let \(\varOmega \subset V\) be open and assume that \(f: \varOmega \rightarrow W\) is differentiable at \(x_0 \in \varOmega \). Let \(\varSigma \subset W\) be open and assume that \(g:\varSigma \rightarrow U\) is differentiable at \(y_0 = f(x_0) \in \varSigma \). Then \(g\circ f\) is defined in an open neighborhood of \(x_0\) and is differentiable at \(x_0\), with

$$\begin{aligned} D(g\circ f)(x_0) = Dg(y_0) \circ Df(x_0). \end{aligned}$$
(168)

Let \({\mathcal {A}}\) be a Banach algebra with identity I. Consider the mapping \(f: {\mathcal {A}}\rightarrow {\mathcal {A}}\) defined by \(f(A) = A^k\), with \(k \in {\mathbb {N}}\). Its derivative \(Df(A_0)\) at \(A_0 \in {\mathcal {A}}\) is given by

$$\begin{aligned}&Df(A_0)A = A_0^{k-1}A + A_0^{k-2}AA_0 + \cdots + AA_0^{k-1}, \;\; A \in {\mathcal {A}}. \end{aligned}$$
(169)
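
Formula (169) can be verified by a central finite difference; the following sketch is our illustration (arbitrary random matrices), not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
A0 = rng.standard_normal((4, 4))
A = rng.standard_normal((4, 4))
k = 4

# Eq. (169): Df(A0)A = A0^{k-1} A + A0^{k-2} A A0 + ... + A A0^{k-1} for f(A) = A^k
deriv = sum(np.linalg.matrix_power(A0, k - 1 - j) @ A @ np.linalg.matrix_power(A0, j)
            for j in range(k))

h = 1e-5
fd = (np.linalg.matrix_power(A0 + h * A, k)
      - np.linalg.matrix_power(A0 - h * A, k)) / (2 * h)   # central difference
assert np.linalg.norm(fd - deriv) < 1e-6
```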

For the exponential mapping \(\exp (A) = \sum _{k=0}^{\infty }\frac{A^k}{k!}\), we then have

$$\begin{aligned}&D\exp (A_0)(A) = \sum _{k=1}^{\infty }\frac{1}{k!}(A_0^{k-1}A + A_0^{k-2}AA_0 + \cdots + AA_0^{k-1}). \end{aligned}$$
(170)

In particular, for \(A_0 = I\) and \(A_0 = 0\), respectively,

$$\begin{aligned}&D\exp (I)A = A + \frac{2A}{2!} + \cdots + \frac{kA}{k!} + \cdots = A\sum _{k=0}^{\infty }\frac{1}{k!} = \exp (1)A = eA, \end{aligned}$$
(171)
$$\begin{aligned}&D\exp (0)A = A \Longleftrightarrow D\exp (0) = \mathrm{id}_{{\mathcal {A}}}, \; \; \text {where }\mathrm{id}_{{\mathcal {A}}}\text { is the identity operator on }{\mathcal {A}}. \end{aligned}$$
(172)

Let \(\mathrm{GL}({\mathcal {A}})\) denote the group of invertible elements in \({\mathcal {A}}\). For the function \(f:\mathrm{GL}({\mathcal {A}}) \rightarrow \mathrm{GL}({\mathcal {A}})\) defined by \(f(A) = A^{-1}\), its Fréchet derivative \(Df(A_0)\) at \(A_0 \in \mathrm{GL}({\mathcal {A}})\) is given by

$$\begin{aligned}&Df(A_0)(A) = - A_0^{-1}AA_0^{-1}, \;\; A \in {\mathcal {A}}. \end{aligned}$$
(173)
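
Similarly, Eq. (173) admits a quick finite-difference check. The sketch below is ours; the shift \(5I\) merely keeps \(A_0\) safely invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 4)) + 5 * np.eye(4)    # safely invertible
A = rng.standard_normal((4, 4))

# Eq. (173): Df(A0)(A) = -A0^{-1} A A0^{-1} for f(A) = A^{-1}
A0_inv = np.linalg.inv(A0)
deriv = -A0_inv @ A @ A0_inv

h = 1e-5
fd = (np.linalg.inv(A0 + h * A) - np.linalg.inv(A0 - h * A)) / (2 * h)
assert np.linalg.norm(fd - deriv) < 1e-8
```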

Consider now the case \({\mathcal {A}}= {\mathcal {L}}({\mathcal {H}})\). Consider the sets

$$\begin{aligned} \mathrm{K}_X({\mathcal {H}}) = \{A + \gamma I \; : A \in {\mathcal {L}}({\mathcal {H}}), \; A \text { compact }, \gamma \in {\mathbb {R}}\}, \end{aligned}$$
(174)
$$\begin{aligned} \mathrm{PK}({\mathcal {H}}) = \{A + \gamma I > 0 \; : A \in {\mathcal {L}}({\mathcal {H}}), \; A \text { compact },\; A = A^{*}, \gamma \in {\mathbb {R}}\}, \end{aligned}$$
(175)

of unitized compact and positive definite unitized compact operators on \({\mathcal {H}}\), respectively. Then the logarithm, that is, the inverse of the exponential map \(\exp \), \(\log :\mathrm{PK}({\mathcal {H}}) \rightarrow \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\), is well-defined. Let us find \(D\log (A_0)\) for \(A_0 \in \mathrm{PK}({\mathcal {H}})\). We first have the following result.

Lemma 25

The map \(\exp :\mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}) \rightarrow \mathrm{PK}({\mathcal {H}})\) and its inverse \(\log :\mathrm{PK}({\mathcal {H}}) \rightarrow \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\) are bijections.

Proof

For \(A \in \mathrm{Sym}({\mathcal {H}})\), A compact, let \(\{\lambda _j\}_{j=1}^{\infty }\) be its eigenvalues, then \(\lim _{j \rightarrow \infty }\lambda _j = 0\). For \(A+\gamma I \in \mathrm{PK}({\mathcal {H}})\), we necessarily have \(\gamma > 0\), since A is compact and \(A+\gamma I > 0\). Then \(\log (\frac{A}{\gamma }+I)\) is compact, with eigenvalues \(\{\log (\frac{\lambda _j}{\gamma } + 1)\}_{j=1}^{\infty } \rightarrow 0\) as \(j \rightarrow \infty \). Hence \( \log (A+\gamma I) = \log (\frac{A}{\gamma } +I) + (\log \gamma )I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}). \) Conversely, for \(A+\gamma I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\), we have \( \exp (A+\gamma I) = e^{\gamma }\exp (A) = e^{\gamma }I + e^{\gamma }[\exp (A) - I] \in \mathrm{PK}({\mathcal {H}}), \) since the operator \((\exp (A) - I)\), with eigenvalues \(\{e^{\lambda _j} - 1\}_{j=1}^{\infty } \rightarrow 0\) as \(j \rightarrow \infty \), is compact. \(\square \)

From the relation \(\exp (\log (A_0)) = A_0\)\(\forall A_0 \in \mathrm{PK}({\mathcal {H}})\) and the chain rule, it follows that

$$\begin{aligned}{}[D\exp (\log (A_0)) \circ D\log (A_0)]A = A \;\forall A_0 \in \mathrm{PK}({\mathcal {H}}), A \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}). \end{aligned}$$

Similarly, from the relation \(\log (\exp (B_0)) = B_0\)\(\forall B_0 \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\) and the chain rule, we have

$$\begin{aligned}{}[D\log (\exp (B_0)) \circ D\exp (B_0)]A = A \;\;\forall A,B_0 \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}). \end{aligned}$$

For each \(A_0 \in \mathrm{PK}({\mathcal {H}})\), we have \(A_0 = \exp (B_0)\) where \(B_0 =\log (A_0) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}})\), thus

$$\begin{aligned}{}[D\log (A_0) \circ D\exp (\log (A_0))]A = A \;\forall A_0 \in \mathrm{PK}({\mathcal {H}}), A \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{K}_X({\mathcal {H}}). \end{aligned}$$

Hence it follows that \(\forall A_0 \in \mathrm{PK}({\mathcal {H}})\),

$$\begin{aligned}{}[D\exp (\log (A_0)) \circ D\log (A_0)] = [D\log (A_0) \circ D\exp (\log (A_0))] = \mathrm{id}_{{\mathcal {A}}}. \end{aligned}$$

Thus \(D\log (A_0)\) and \(D\exp (\log (A_0))\) are invertible operators in \({\mathcal {L}}({\mathcal {A}})\) and

$$\begin{aligned} D\log (A_0) = [D\exp (\log (A_0))]^{-1}, \;\;\;\text {with} \;\;\; D\log (I) = [D\exp (0)]^{-1} = \mathrm{id}_{{\mathcal {A}}}. \end{aligned}$$
(176)

We first consider the Fréchet derivative of \(f(t) = (I+tA)^r\), \(r \in {\mathbb {R}}\), for \(A \in \mathrm{Sym}({\mathcal {H}})\).

Lemma 26

Assume that \(A,B \in \mathrm{Sym}({\mathcal {H}})\) are compact, \(A \ne 0\). Let \(r \in {\mathbb {R}}\) be fixed. Consider the function \(f:(-1/||A||, 1/||A||) \rightarrow \mathrm{Sym}({\mathcal {H}})\) defined by \(f(t) = (I+ tA)^r + B\). Then

$$\begin{aligned} Df(0)(t) = rAt, \;\; \; t \in {\mathbb {R}}. \end{aligned}$$
(177)

Proof

Since \(t \in (-1/||A||, 1/||A||)\), \(I+tA > 0\) and thus \((I+tA)^r\) is well-defined \(\forall r \in {\mathbb {R}}\). The derivative of f does not depend on the constant term B so we can set \(B = 0\). Write \(f(t) = (I+tA)^r = \exp (r\log (I+tA))\). By the chain rule, we have

$$\begin{aligned} Df(0)(t) = [D\exp (0) \circ rD\log (I)](At) = rtD\exp (0)(D\log (I)(A)) = rAt. \end{aligned}$$

\(\square \)

Lemma 27

Assume that \(A \in \mathrm{Sym}({\mathcal {H}})\) is compact, \(A \ne 0\). Let \(r,s,c \in {\mathbb {R}}, c\ge 0\) be fixed. Consider the function \(f:(-1/||A||, 1/||A||) \rightarrow \mathrm{Sym}({\mathcal {H}})\) defined by \(f(t) = [(I+tA)^r + cI]^{-1}(I+tA)^{s}\). Then

$$\begin{aligned} Df(0)(t) = -\frac{r- (1+c)s}{(1+c)^2}At, \;\;\; \forall t \in {\mathbb {R}}. \end{aligned}$$
(178)

Proof

Since \(t \in (-1/||A||, 1/||A||)\), \(I+tA > 0\) and thus \((I+tA)^r, (I+tA)^s\) are well-defined \(\forall r,s \in {\mathbb {R}}\). Write \(f(t) = [(I + tA)^{r-s} + c(I+tA)^{-s}]^{-1} = [g(t)]^{-1}\), where \(g(t) = (I + tA)^{r-s} + c(I+tA)^{-s}\), so that \(g(0) = (1+c)I\). By Lemma 26,

$$\begin{aligned} Dg(0)(t) = (r-s)At -csAt = [r - (1+c)s]At. \end{aligned}$$

By Eq. (173), we then have \( Df(0)(t) = -g(0)^{-1}[Dg(0)(t)]g(0)^{-1} = -\frac{ r- (1+c)s}{(1+c)^2}At. \)\(\square \)
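
Lemma 27 can be tested numerically in finite dimensions. In the sketch below (ours, not part of the paper), `mpow` is a helper for powers of symmetric positive definite matrices, the values of \(r, s, c\) are arbitrary, and a central difference in t recovers \(Df(0)(1)\).

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.standard_normal((4, 4)); A = (S + S.T) / 2   # plays the A of Lemma 27
r, s, c = 1.7, -0.6, 0.8

def mpow(M, p):
    # power of a symmetric positive definite matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

def f(t):
    # f(t) = [(I + tA)^r + cI]^{-1} (I + tA)^s
    P = np.eye(4) + t * A
    return np.linalg.inv(mpow(P, r) + c * np.eye(4)) @ mpow(P, s)

h = 1e-5
fd = (f(h) - f(-h)) / (2 * h)                  # central difference = Df(0)(1)
pred = -(r - (1 + c) * s) / (1 + c)**2 * A     # Eq. (178) evaluated at t = 1
assert np.linalg.norm(fd - pred) < 1e-6
```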

Lemma 28

Let \({\mathcal {A}}\) be a Banach algebra with identity I. Let \(A,B \in {\mathcal {A}}\) be fixed and \(k \in {\mathbb {N}}\). Consider the function \(f:{\mathbb {R}}\rightarrow {\mathcal {A}}\) defined by \(f(t) = (A+tB)^k\). Then

$$\begin{aligned} Df(0)(t) = t(A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}). \end{aligned}$$
(179)

Proof

Write \(f(t) = [g(t)]^k = h(g(t))\), where \(g(t) = (A+tB)\), with \(g(0) = A\) and \(Dg(0)(t) = Bt\), and \(h(C) = C^k\). By the chain rule and Eq. (169), we have

$$\begin{aligned} Df(0)(t)&= [Dh(g(0)) \circ Dg(0)](t) = Dh(A)(Dg(0)(t)) = Dh(A)(Bt) \\&= t(A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}). \end{aligned}$$

\(\square \)

Lemma 29

Let V be a Banach space. Let \(\varOmega \subset {\mathbb {R}}\) be an open subset. Consider \(F(t) = g(t)H(t)\), where \(H:\varOmega \rightarrow V\) and \(g:\varOmega \rightarrow {\mathbb {R}}\) are both differentiable on \(\varOmega \). Then F is differentiable on \(\varOmega \) and

$$\begin{aligned} DF(x_0)(1) = g(x_0)DH(x_0)(1) + g{'}(x_0)H(x_0), \;\;\forall x_0 \in \varOmega . \end{aligned}$$
(180)

Lemma 30

Let \({\mathcal {A}}\) be a Banach algebra with identity I. Let \(A,B \in {\mathcal {A}}\) be fixed, \(\gamma \in {\mathbb {R}},\gamma \ne 0\), \(\mu \in {\mathbb {R}}\), and \(k \in {\mathbb {N}}\). Consider the function \(f:{\mathbb {R}}\rightarrow {\mathcal {A}}\) defined by \(f(t) = \frac{(A+tB)^k}{(\gamma + t\mu )^k}\). Then

$$\begin{aligned} Df(0)(1)=\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} - \frac{k\mu }{\gamma ^{k+1}}A^k. \end{aligned}$$
(181)

Proof

For \(\mu = 0\), f is well-defined \(\forall t \in {\mathbb {R}}\). For \(\mu \ne 0\), f is well-defined on \((-\infty , -\frac{\gamma }{\mu }) \cup (-\frac{\gamma }{\mu }, \infty )\). Let \(g(t) = (A+tB)^k\), then \(g(0) = A^k\) and \(Dg(0)(1) = A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}\) by Lemma 28. Let \(h(t) = (\gamma + t \mu )^{-k}\), with \(h(0) = \gamma ^{-k}\), \(h{'}(0) = -k \gamma ^{-k-1}\mu \). By Lemma 29,

$$\begin{aligned} Df(0)(1)&= h(0)Dg(0)(1) + h{'}(0)g(0) \\&= \frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} - \frac{k\mu }{\gamma ^{k+1}}A^k. \end{aligned}$$

\(\square \)

We now consider the function \(f(t) = [(A+\gamma I) + t(B+\mu I)]^r\), \(r \in {\mathbb {R}}\), for \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((B+\mu I)\in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). We recall the binomial series expansion in a unital Banach algebra \({\mathcal {A}}\). Let \(A \in {\mathcal {A}}\) be such that \(||A|| < 1\). Then the following binomial series converges absolutely

$$\begin{aligned} (I+A)^r = I + rA + \frac{r(r-1)}{2!}A^2 + \frac{r(r-1)(r-2)}{3!}A^3 + \cdots . \end{aligned}$$
(182)
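
The expansion (182) can be checked numerically for a symmetric matrix with \(||A|| < 1\). The sketch below is our illustration; the truncation at 60 terms is ample at \(||A|| = 0.4\).

```python
import numpy as np

rng = np.random.default_rng(4)
S = rng.standard_normal((4, 4)); A = (S + S.T) / 2
A *= 0.4 / np.linalg.norm(A, 2)          # enforce ||A|| < 1 (spectral norm 0.4)
r = 0.5

# left side (I+A)^r via eigendecomposition of the SPD matrix I + A
w, V = np.linalg.eigh(np.eye(4) + A)
lhs = (V * w**r) @ V.T

# right side: partial sums of the binomial series (182)
rhs, term, coeff = np.eye(4), np.eye(4), 1.0
for k in range(1, 60):
    coeff *= (r - k + 1) / k              # builds r(r-1)...(r-k+1)/k!
    term = term @ A                       # builds A^k
    rhs = rhs + coeff * term
assert np.linalg.norm(lhs - rhs) < 1e-10
```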

Proof of Proposition 3

We consider three cases: \(r= k\), \(r=-k\) (\(k \in {\mathbb {N}}\)), and \(r \in {\mathbb {R}}\) in general. While the first two cases are special cases of the last one, they do not require the assumption \(||A||_{\mathrm{tr}} < \gamma \) and result in considerably simpler expressions for the Fréchet derivative.

(a) Consider first the function f of the form \(f(t) = (A+\gamma I + t(B+\mu I))^k\), \(k \in {\mathbb {N}}\), which is well-defined \(\forall t \in {\mathbb {R}}\). Write \(f(t) = h(g(t))\), where \(h(A) = A^k\) and \(g(t) = A+\gamma I + t(B+\mu I)\). By the chain rule and Eq. (169), we have

$$\begin{aligned}&Df(0)(1) = [Dh(g(0))\circ Dg(0)](1) = Dh(g(0))(Dg(0)(1))\\&\quad = Dh(A+\gamma I)(B+\mu I) \\&\quad = (A+\gamma I)^{k-1}(B+\mu I) + (A+\gamma I)^{k-2}(B+\mu I)(A+\gamma I)\\&\qquad + \cdots + (B+\mu I)(A+\gamma I)^{k-1} \\&\quad = k\mu (A+\gamma I)^{k-1} + \sum _{j=0}^{k-1}(A+\gamma I)^{k-1-j}B(A+\gamma I)^j. \end{aligned}$$

(b) Consider next the case \(r = -k\), \(k \in {\mathbb {N}}\). Write \(f(t) = h(g(t))\), where \(g(t) = (A+\gamma I + t(B+\mu I))^k\) and \(h(A) = A^{-1}\). Let t be sufficiently close to zero such that \(A+\gamma I + t(B+\mu I) > 0\), then g(t) is invertible and f(t) is well-defined. By the chain rule and Eq. (173), we have

$$\begin{aligned}&Df(0)(1) = Dh(g(0))(Dg(0)(1)) = -g(0)^{-1}Dg(0)(1)g(0)^{-1} \\&= - (A+\gamma I)^{-k}[k\mu (A+\gamma I)^{k-1} + \sum _{j=0}^{k-1}(A+\gamma I)^{k-1-j}B(A+\gamma I)^j](A+\gamma I)^{-k}. \end{aligned}$$

(c) Consider now the general case \(r \in {\mathbb {R}}\). Let t be sufficiently close to zero such that \(A+\gamma I + t(B+\mu I) > 0\). Then \([A+\gamma I + t(B+\mu I)]^r\) is well-defined for any \(r \in {\mathbb {R}}\). We write

$$\begin{aligned}&[A +\gamma I + t(B+\mu I)]^r = [(A+tB) + (\gamma + t\mu ) I]^r = (\gamma + t\mu )^r\left[ I + \frac{A +tB}{\gamma + t\mu }\right] ^r \\&\quad = h(t)g(t), \;\;\;\text {where} \;\;\; h(t) =(\gamma + t\mu )^r, \;\; g(t) = \left[ I + \frac{A +tB}{\gamma + t\mu }\right] ^r. \end{aligned}$$

Let \(\epsilon = \gamma - ||A||_{\mathrm{tr}} > 0\) by the assumption that \(||A||_{\mathrm{tr}} < \gamma \). Then

$$\begin{aligned} ||A+tB||_{\mathrm{tr}} \le ||A||_{\mathrm{tr}} + |t|\;||B||_{\mathrm{tr}} = \gamma - \epsilon + |t|\;||B||_{\mathrm{tr}} < \gamma + t\mu \end{aligned}$$

for all t satisfying \(|t|\;||B||_{\mathrm{tr}} - t\mu < \epsilon \). Let \(\varOmega \subset {\mathbb {R}}\) be an open set such that \(0 \in \varOmega \) and

$$\begin{aligned} A+ \gamma I + t(B+ \mu I)> 0, \;\; |t|\;||B||_{\mathrm{tr}} - t\mu < \epsilon \;\;\;\forall t \in \varOmega . \end{aligned}$$

Then f(t) is well-defined on \(\varOmega \) and furthermore \(||\frac{A+tB}{\gamma + t\mu }||_{\mathrm{tr}} < 1\)\(\forall t \in \varOmega \). Thus g(t) admits the following absolutely convergent binomial series expansion on \(\varOmega \)

$$\begin{aligned} g(t) =\left[ I + \frac{A +tB}{\gamma + t\mu }\right] ^r = I + \sum _{k=1}^{\infty }\frac{r(r-1)\ldots (r-k+1)}{k!}\frac{(A+tB)^k}{(\gamma + t\mu )^k}. \end{aligned}$$

Since each term in the series is differentiable on \(\varOmega \), it follows that g and hence f are both differentiable on \(\varOmega \). Let \(g_k(t) = \frac{(A+tB)^k}{(\gamma + t\mu )^k}\), then by Lemma 30,

$$\begin{aligned} Dg_k(0)(1) = \frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} - \frac{k\mu }{\gamma ^{k+1}}A^k \in \mathrm{Tr}({\mathcal {H}}). \end{aligned}$$

It follows that

$$\begin{aligned} Dg(0)(1)&= \sum _{k=1}^{\infty }\frac{r(r-1) \ldots (r-k+1)}{k!}Dg_k(0)(1) \\&= \sum _{k=1}^{\infty }\frac{(r)_k}{k!}\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} - \mu \frac{A}{\gamma ^2}\sum _{k=1}^{\infty }\frac{(r)_k}{(k-1)!}\frac{A^{k-1}}{\gamma ^{k-1}} \\&\quad = \sum _{k=1}^{\infty }\frac{(r)_k}{k!}\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} -r \mu \frac{A}{\gamma ^2}\left( I+\frac{A}{\gamma }\right) ^{r-1}. \end{aligned}$$

By Lemma 29, we then have

$$\begin{aligned}&Df(0)(1) = h(0)Dg(0)(1) + h{'}(0)g(0) \\&\quad = \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_k}{k!}\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} \\&\qquad - r\gamma ^{r-1}\mu \frac{A}{\gamma }\left( I+\frac{A}{\gamma }\right) ^{r-1} + {r\gamma ^{r-1}\mu }\left( I +\frac{A}{\gamma }\right) ^r \\&\quad = \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_k}{k!}\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} + r\gamma ^{r-1}\mu \left( I+\frac{A}{\gamma }\right) ^{r-1} \\&\quad = \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_k}{k!}\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} + r\mu (A+\gamma I)^{r-1}. \end{aligned}$$

\(\square \)

Setting \(A = 0\) and \(\gamma = 1\) in Proposition 3, we obtain the following result.

Corollary 3

Let \((B+ \mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}}), r \in {\mathbb {R}}\) be fixed. Then there exists an open set \(\varOmega \subset {\mathbb {R}}\) containing 0 such that the function \(f(t) = [I+ t(B+ \mu I)]^r\) is differentiable on \(\varOmega \). Furthermore,

$$\begin{aligned} Df(0)(1) = r(B+\mu I). \end{aligned}$$
(183)

Corollary 4

Assume that \((B+ \mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\) is fixed. Let \(r \in {\mathbb {R}}\) be fixed. Then for any \((C+\nu I ) \in \mathrm{Tr}_X({\mathcal {H}})\),

$$\begin{aligned} \frac{d}{dt}\mathrm{tr}_X[(I+ t(B+ \mu I))^r(C+\nu I)]\big \vert _{t=0} = r\mathrm{tr}_X[(B+ \mu I)(C+\nu I)]. \end{aligned}$$
(184)

Proof

Since \((C+\nu I)\) is a constant, for \(g(t) = (I+ t(B+\mu I))^r(C+\nu I)\), from Corollary 3, we have \(Dg(0)(1) = r(B+\mu I)(C+\nu I)\). Thus by the chain rule and Lemma 2, for the function \(f(t) = \mathrm{tr}_X[(I+ t(B+ \mu I))^r(C+\nu I)]\),

$$\begin{aligned} Df(0)(1)&= [D\mathrm{tr}_X(g(0)) \circ Dg(0)](1) = D\mathrm{tr}_X(C+\nu I)(r(B+\mu I)(C+\nu I)) \\&= r\mathrm{tr}_X[(B+ \mu I)(C+\nu I)]. \end{aligned}$$

Since \(f{'}(0) = Df(0)(1)\), this gives us the desired result. \(\square \)
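
In finite dimensions, with \(\mathrm{tr}_X\) replaced by the ordinary trace, the analogue of Corollary 4 reads \(\frac{d}{dt}\mathrm{tr}[(I+tM)^rN]\big \vert _{t=0} = r\mathrm{tr}[MN]\). The sketch below (ours, not part of the paper) verifies it by a central difference.

```python
import numpy as np

rng = np.random.default_rng(5)
S = rng.standard_normal((4, 4)); M = (S + S.T) / 2   # stands in for B + mu*I
N = rng.standard_normal((4, 4))                      # stands in for C + nu*I
r = -1.3

def mpow_sym(P, p):
    # power of a symmetric positive definite matrix via eigendecomposition
    w, V = np.linalg.eigh(P)
    return (V * w**p) @ V.T

def f(t):
    return np.trace(mpow_sym(np.eye(4) + t * M, r) @ N)

h = 1e-5
fd = (f(h) - f(-h)) / (2 * h)            # central difference at t = 0
assert abs(fd - r * np.trace(M @ N)) < 1e-6
```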

Proof of Proposition 4

Write \(f(t) = \mathrm{tr}_X[(A+\gamma I+t(B+\mu I))^r] = \mathrm{tr}_X[g(t)]\), where \(g(t) = (A+\gamma I +t(B+\mu I))^r\). By the chain rule and Lemma 2, we have

$$\begin{aligned} Df(0)(t) = [D\mathrm{tr}_X((A+\gamma I)^r) \circ Dg(0)](t) = \mathrm{tr}_X[Dg(0)(t)]. \end{aligned}$$

By Proposition 3, we then have

$$\begin{aligned} Df(0)(1)&= \mathrm{tr}_X\left[ \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_k(A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1})}{k!\gamma ^k}\right] \nonumber \\&\quad + r\mu \mathrm{tr}_X[(A+\gamma I)^{r-1}], \;\;\text {where}\;\; (r)_k = r(r-1)\ldots (r-k+1) \\&= \mathrm{tr}_X\left[ \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_kA^{k-1}B}{(k-1)!\gamma ^k}\right] + r\mu \mathrm{tr}_X[(A+\gamma I)^{r-1}] \\&= r\mathrm{tr}_X\left[ \gamma ^{r-1}\sum _{k=1}^{\infty }\frac{(r-1)_{k-1}A^{k-1}B}{(k-1)!\gamma ^{k-1}}\right] + r\mu \mathrm{tr}_X[(A+\gamma I)^{r-1}] \\&= r\mathrm{tr}_X\left[ \gamma ^{r-1}\left( I+\frac{A}{\gamma }\right) ^{r-1}B\right] + r\mu \mathrm{tr}_X[(A+\gamma I)^{r-1}]\\&= r\mathrm{tr}_X[(A+\gamma I)^{r-1}(B+\mu I)]. \end{aligned}$$

\(\square \)

Proof of Proposition 5

Write \(f(t) = \log \mathrm{det_X}[(A+\gamma I+t(B+\mu I))^r + cI] = \log \mathrm{det_X}[g(t)]\), where \(g(t) = (A+\gamma I +t(B+\mu I))^r + cI\). By the chain rule and Lemma 4, we have

$$\begin{aligned} Df(0)(t)&= [D\log \mathrm{det_X}((A+\gamma I)^r +cI) \circ Dg(0)](t)\\&= \mathrm{tr}_X[((A+\gamma I)^r +cI)^{-1}Dg(0)(t)]. \end{aligned}$$

By Proposition 3, noting that the Fréchet derivatives of \((A+\gamma I +t(B+\mu I))^r + cI\) and \((A+\gamma I + t(B+\mu I))^r\) are the same, we have

$$\begin{aligned} Df(0)(1)&= \mathrm{tr}_X\left[ ((A+\gamma I)^r+cI)^{-1} \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_k(A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1})}{k!\gamma ^k}\right] \nonumber \\&\quad + r\mu \mathrm{tr}_X[((A+\gamma I)^r + cI)^{-1}(A+\gamma I)^{r-1}], \;\;\text {where}\;\; (r)_k = r(r-1)\ldots (r-k+1) \\&= \mathrm{tr}_X\left[ ((A+\gamma I)^r+cI)^{-1} \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_kA^{k-1}B}{(k-1)!\gamma ^k}\right] \\&\quad + r\mu \mathrm{tr}_X[((A+\gamma I)^r + cI)^{-1}(A+\gamma I)^{r-1}] \\&= r\mathrm{tr}_X\left[ ((A+\gamma I)^r+cI)^{-1} \gamma ^{r-1}\sum _{k=1}^{\infty }\frac{(r-1)_{k-1}A^{k-1}B}{(k-1)!\gamma ^{k-1}}\right] \\&\quad + r\mu \mathrm{tr}_X[((A+\gamma I)^r + cI)^{-1}(A+\gamma I)^{r-1}] \\&= r\mathrm{tr}_X\left[ ((A+\gamma I)^r+cI)^{-1} \gamma ^{r-1}\left( I+\frac{A}{\gamma }\right) ^{r-1}B\right] \\&\quad + r\mu \mathrm{tr}_X[((A+\gamma I)^r + cI)^{-1}(A+\gamma I)^{r-1}] \\&= r\mathrm{tr}_X[((A+\gamma I)^r + cI)^{-1}(A+\gamma I)^{r-1}(B+\mu I)]. \end{aligned}$$

\(\square \)

Proof of Theorem 25

(The case \(\alpha> 0, \beta > 0\)) Let \(B = A + sA_1 + tA_2\) and \(\mu = \gamma +s\gamma _1 + t\gamma _2\). By definition of \(D^{(\alpha , \beta )}_r\) and using the product and inverse rules of \(\mathrm{det_X}\),

$$\begin{aligned}&D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{{r(\delta -\frac{\alpha }{\alpha +\beta })}}{\alpha \beta }\log \left( \frac{\gamma }{\mu }\right) + \frac{1}{\alpha \beta }\log \mathrm{det_X}\left( \frac{\alpha (Z + \frac{\gamma }{\mu }I)^{r(1-\delta )} + \beta (Z + \frac{\gamma }{\mu }I)^{-r\delta }}{\alpha + \beta }\right) \nonumber \\&\quad = \frac{{r(\delta -\frac{\alpha }{\alpha +\beta })}}{\alpha \beta }\log \left( \frac{\gamma }{\mu }\right) + \frac{1}{\alpha \beta }\log \mathrm{det_X}\left( \frac{\alpha (Z + \frac{\gamma }{\mu }I)^{r} + \beta I}{\alpha + \beta }\right) -\frac{r\delta }{\alpha \beta }\log \mathrm{det_X}(Z+\frac{\gamma }{\mu } I). \end{aligned}$$
(185)

For \(\mu = \gamma + s\gamma _1 + t \gamma _2\), \(\mu (t=0) = \gamma + s \gamma _1\), \(\mu (s=0,t=0) = \gamma \), \(\frac{\partial \mu }{\partial s} = \gamma _1\), \(\frac{\partial \mu }{\partial t} = \gamma _2\). With \(\delta = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\), we have \(\delta (t=0) = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r}\) and \( \frac{\partial \delta }{\partial t} = -\frac{(\alpha \gamma ^r)(r\beta \mu ^{r-1}\gamma _2)}{(\alpha \gamma ^r + \beta \mu ^r)^2} \), \(\frac{\partial \delta }{\partial t}\big \vert _{t=0} = -\frac{(\alpha \gamma ^r)(r\beta (\gamma + s\gamma _1)^{r-1}\gamma _2)}{(\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r)^2} \), \( \frac{\partial }{\partial t}\log \frac{\gamma }{\mu }\big \vert _{t=0} = -\frac{\gamma _2}{\gamma + s\gamma _1}. \)

(a) Considering the first term in Eq. (185), we have

$$\begin{aligned} \frac{\partial }{\partial t}\left[ \left( \delta - \frac{\alpha }{\alpha +\beta }\right) \log \frac{\gamma }{\mu } \right] \big \vert _{t=0}&= -\frac{(\alpha \gamma ^r)(r\beta (\gamma + s\gamma _1)^{r-1}\gamma _2)}{(\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r)^2}\log \frac{\gamma }{\gamma + s\gamma _1} \\&\quad -\left[ \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r}-\frac{\alpha }{\alpha + \beta }\right] \frac{\gamma _2}{\gamma + s\gamma _1}. \end{aligned}$$

Differentiating with respect to s then gives

$$\begin{aligned}&\frac{\partial ^2 }{\partial s\partial t}\left[ \left( \delta - \frac{\alpha }{\alpha +\beta }\right) \log \frac{\gamma }{\mu } \right] \big \vert _{s =0, t=0} = \frac{2r\alpha \beta }{(\alpha +\beta )^2}\frac{\gamma _1\gamma _2}{\gamma ^2}. \end{aligned}$$
(186)

(b) Consider the second term in Eq. (185). With \(\nu = \frac{\gamma }{\mu }\), we write

$$\begin{aligned}&\log \mathrm{det_X}\left( \frac{\alpha (Z + \nu I)^r + \beta I}{\alpha + \beta }\right) = \log \frac{\alpha }{\alpha +\beta } + \log \mathrm{det_X}\left( {(Z + \nu I)^{r} + \frac{\beta }{\alpha } I}\right) . \end{aligned}$$

With \(B+\mu I = (A+\gamma I) + s(A_1 + \gamma _1I) + t(A_2 + \gamma _2I)\), we have

$$\begin{aligned}&[(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2}]^r = [I+ s(Z_1 + \nu _1I) + t(Z_2 + \nu _2I)]^r, \end{aligned}$$

where \((Z_1 + \nu _1 I) = (A+\gamma I)^{-1/2}(A_1+\gamma _1I)(A+\gamma I)^{-1/2}\), \((Z_2 + \nu _2I) = (A+\gamma I)^{-1/2}(A_2+\gamma _2I)(A+\gamma I)^{-1/2}\), \(\nu _1 = \frac{\gamma _1}{\gamma }\), \(\nu _2 = \frac{\gamma _2}{\gamma }\), \(Z_1, Z_2 \in \mathrm{Tr}({\mathcal {H}})\).

By definition, \(Z +\nu I = (A+\gamma I)(B+ \mu I)^{-1}\), with \(\nu = \frac{\gamma }{\mu }\), so that

$$\begin{aligned}&\log \mathrm{det_X}\left( {(Z + \nu I)^{r} + \frac{\beta }{\alpha } I}\right) = \log \mathrm{det_X}\left( [(A+\gamma I)(B+\mu I)^{-1}]^{r} + \frac{\beta }{\alpha } I\right) \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad = \log \mathrm{det_X}\left( [(B+\mu I)(A+\gamma I)^{-1}]^{-r} + \frac{\beta }{\alpha } I\right) \\&\quad = \log \mathrm{det_X}\left( [(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2}]^{-r} + \frac{\beta }{\alpha } I\right) \\&\quad = \log \mathrm{det_X}[(I + s(Z_1 + \nu _1I) + t(Z_2 + \nu _2I))^{-r} + \frac{\beta }{\alpha } I]. \end{aligned}$$

By Proposition 5, for s sufficiently close to zero so that \(|s|\;||Z_1||_{\mathrm{tr}} < 1 + s\nu _1\),

$$\begin{aligned}&\frac{\partial }{\partial t}\log \mathrm{det_X}\left( {(Z + \nu I)^{r} + \frac{\beta }{\alpha } I}\right) \big \vert _{t=0} \\&\quad = -r\mathrm{tr}_X\left[ \left( (I + s(Z_1 + \nu _1I))^{-r} + \frac{\beta }{\alpha } I\right) ^{-1}(I + s(Z_1 +\nu _1 I))^{-r-1}(Z_2 + \nu _2I)\right] . \end{aligned}$$

By Lemma 2, Lemma 27, and the chain rule, differentiating the last expression with respect to s gives

$$\begin{aligned}&\frac{\partial ^2}{\partial s\partial t}\log \mathrm{det_X}\left( {(Z + \nu I)^{r} + \frac{\beta }{\alpha } I}\right) \big \vert _{s=0, t=0} \nonumber \\&= \frac{r[1+(1+r)\frac{\beta }{\alpha }]}{(1+\frac{\beta }{\alpha })^2} \mathrm{tr}_X[(Z_1 +\nu _1I)(Z_2 + \nu _2I)] \nonumber \\&= \frac{r\alpha (\alpha + (1+r)\beta )}{(\alpha + \beta )^2} \mathrm{tr}_X[(Z_1 + \nu _1I)(Z_2 + \nu _2I)] \nonumber \\&= \frac{r\alpha (\alpha + (1+r)\beta )}{(\alpha + \beta )^2} \mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 +\gamma _1 I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)]. \end{aligned}$$
(187)

(c) Consider the third term in Eq. (185). By definition, \(Z +\nu I = (A+\gamma I)(B+ \mu I)^{-1}\), with \(\nu = \frac{\gamma }{\mu }\), so that with \(B+\mu I = (A+\gamma I) + s(A_1 + \gamma _1I) + t(A_2 + \gamma _2I)\), we have

$$\begin{aligned} \log \mathrm{det_X}(Z+\nu I)&= \log \mathrm{det_X}(A+\gamma I) - \log \mathrm{det_X}(B + \mu I) \\&= \log \mathrm{det_X}(A+\gamma I) \\&\quad -\log \mathrm{det_X}[A+\gamma I + s(A_1+\gamma _1 I) +t(A_2 + \gamma _2I)]. \end{aligned}$$

By Proposition 5, differentiating with respect to t gives

$$\begin{aligned}&\frac{\partial }{\partial t}\left[ \delta \log \mathrm{det_X}(Z + \nu I)\right] \big \vert _{t=0} = \\&\quad -\frac{(\alpha \gamma ^r)(r\beta (\gamma {+} s\gamma _1)^{r-1}\gamma _2)}{(\alpha \gamma ^r + \beta (\gamma {+} s\gamma _1)^r)^2} [\log \mathrm{det_X}(A{+}\gamma I) - \log \mathrm{det_X}(A+\gamma I + s(A_1 + \gamma _1I))] \\&\quad - \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta (\gamma + s\gamma _1)^r}\mathrm{tr}_X[(A+\gamma I + s(A_1 + \gamma _1I))^{-1}(A_2 +\gamma _2I)]. \end{aligned}$$

By Proposition 5, \(\frac{d}{ds}\log \mathrm{det_X}(A+\gamma I + s(A_1 + \gamma _1I))\big \vert _{s=0} = \mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)]. \) Also

$$\begin{aligned}&\frac{d}{ds}\mathrm{tr}_X[(A+\gamma I + s(A_1 + \gamma _1I))^{-1}(A_2 +\gamma _2I)] \big \vert _{s=0} \\&\quad = \frac{d}{ds}\mathrm{tr}_X[(I+ s(A+\gamma I)^{-1}(A_1 + \gamma _1I))^{-1}(A+\gamma I)^{-1}(A_2 +\gamma _2I)]\big \vert _{s=0} \\&\quad = - \mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \;\text {by Corollary}~\mathrm{4} . \end{aligned}$$

Combining the last three expressions, we obtain

$$\begin{aligned}&\frac{\partial ^2}{\partial s \partial t}\left[ \delta \log \mathrm{det_X}(Z + \nu I)\right] \big \vert _{s=0, t=0} = \frac{r\alpha \beta \gamma _2}{(\alpha + \beta )^2\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)] \nonumber \\&\quad + \frac{r\alpha \beta \gamma _1}{(\alpha + \beta )^2\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \nonumber \\&\quad +\frac{\alpha }{\alpha + \beta }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)]. \end{aligned}$$
(188)

Combining Eqs. (186), (187), (188), we obtain

$$\begin{aligned}&\frac{\partial ^2}{\partial s\partial t}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+ \mu I)] \big \vert _{s=0, t=0} = \frac{r^2}{(\alpha +\beta )^2}\frac{2\gamma _1\gamma _2}{\gamma ^2} \\&\quad - \frac{r^2}{(\alpha +\beta )^2}\frac{\gamma _2}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)] \\&\quad - \frac{r^2}{(\alpha +\beta )^2}\frac{\gamma _1}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \\&\quad + \frac{r^2}{(\alpha +\beta )^2}\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)]. \end{aligned}$$

\(\square \)

Proof of Theorem 25 - Limiting cases

We now prove Theorem 25 for the case \(\alpha > 0, \beta = 0\) (the case \(\alpha =0, \beta > 0\) is then obtained by dual symmetry). It suffices to carry out the proof for \(\alpha = 1\), where

$$\begin{aligned}&D^{(1, 0)}_r[(A+\gamma I), (B+\mu I)] = r[(\frac{\mu }{\gamma })^{r} -1]\log \frac{\mu }{\gamma } \nonumber \\&\quad +\mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \mu I)]^{r} -I) - r(\frac{\mu }{\gamma })^{r}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \mu I)]. \end{aligned}$$
(189)

Let \(B = A + sA_1 + tA_2, \mu = \gamma + s \gamma _1 + t \gamma _2\), with \(\mu (t=0) = \gamma + s\gamma _1\), \(\mu (s=0, t=0) = \gamma \), \(\frac{\partial \mu }{\partial s} = \gamma _1\), \(\frac{\partial \mu }{\partial t} = \gamma _2\). By definition, \(\frac{\partial }{\partial t}\left( \frac{\mu }{\gamma }\right) ^r = r\frac{\gamma _2}{\gamma }\left( \frac{\mu }{\gamma }\right) ^{r-1}, \frac{\partial }{\partial t}\left( \frac{\mu }{\gamma }\right) ^r \big \vert _{t=0} = r\frac{\gamma _2}{\gamma }\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r-1}. \)

(a) Consider the first term. Dropping its overall factor of \(r\) for the moment (it is restored when the terms are combined below), we have \( \frac{\partial }{\partial t}\left[ \left( \frac{\mu }{\gamma }\right) ^r-1\right] \log \frac{\mu }{\gamma } \big \vert _{t=0} = r\frac{\gamma _2}{\gamma }\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r-1}\log \frac{\gamma + s\gamma _1}{\gamma } +\left[ \left( \frac{\gamma +s\gamma _1}{\gamma }\right) ^r-1\right] \frac{\gamma _2}{\gamma + s\gamma _1}. \) Differentiating with respect to \(s\) then gives

$$\begin{aligned}&\frac{\partial ^2}{\partial s \partial t}\left[ \left( \frac{\mu }{\gamma }\right) ^r-1\right] \log \frac{\mu }{\gamma } \big \vert _{s=0, t=0} = 2r\frac{\gamma _1\gamma _2}{\gamma ^2}. \end{aligned}$$
(190)
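As a quick sanity check, Eq. (190) can be verified numerically by a central finite-difference approximation of the mixed partial derivative in the scalar setting; the values of \(\gamma, \gamma _1, \gamma _2, r\) below are arbitrary test choices, not quantities from the paper.

```python
import math

# Finite-difference check of Eq. (190):
# f(s, t) = ((mu/gamma)^r - 1) * log(mu/gamma), mu = gamma + s*g1 + t*g2.
# Claim: the mixed partial d^2 f / (ds dt) at (0, 0) equals 2*r*g1*g2/gamma^2.
gamma, g1, g2, r = 1.7, 0.4, -0.9, 0.6   # arbitrary test values

def f(s, t):
    u = (gamma + s * g1 + t * g2) / gamma
    return (u**r - 1.0) * math.log(u)

# Central second-order mixed difference, accurate to O(h^2).
h = 1e-4
mixed = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
claimed = 2 * r * g1 * g2 / gamma**2
assert abs(mixed - claimed) < 1e-6
```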

(b) Consider the second term. For \(B = A+sA_1 + tA_2\) and \(\mu = \gamma + s\gamma _1 + t\gamma _2\), by the definition of the power function, for any \(r \in {\mathbb {R}}\),

$$\begin{aligned}&[(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2}]^r = [I+ s(Z_1 + \nu _1I) + t(Z_2 + \nu _2I)]^r, \end{aligned}$$

where \((Z_1 + \nu _1 I) = (A+\gamma I)^{-1/2}(A_1+\gamma _1I)(A+\gamma I)^{-1/2}\), \((Z_2 + \nu _2I) = (A+\gamma I)^{-1/2}(A_2+\gamma _2I)(A+\gamma I)^{-1/2}\), \(\nu _1 = \frac{\gamma _1}{\gamma }\), \(\nu _2 = \frac{\gamma _2}{\gamma }\), \(Z_1, Z_2 \in \mathrm{Tr}({\mathcal {H}})\).

For s sufficiently close to zero so that \(|s|\;||Z_1||_{\mathrm{tr}} < 1 + s\nu _1\), by Proposition 4,

$$\begin{aligned}&\frac{\partial }{\partial t}\mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I)]^r \big \vert _{t=0}\\&\quad = \frac{\partial }{\partial t}\mathrm{tr}_X[(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2}]^r \big \vert _{t=0} \\&\quad = r\mathrm{tr}_X[(I+ s(Z_1 + \nu _1I))^{r-1}(Z_2 + \nu _2I)]. \end{aligned}$$

By Corollary 4, differentiating with respect to s, we obtain

$$\begin{aligned}&\frac{\partial ^2 }{\partial s\partial t}\mathrm{tr}_X[(A+\gamma I)^{-1}(B+\mu I)]^r \big \vert _{s=0, t=0}\nonumber \\&\quad = r(r-1)\mathrm{tr}_X[(Z_1+\nu _1 I)(Z_2 + \nu _2I)] \nonumber \\&\quad = r(r-1)\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1+\gamma _1I)(A+\gamma I)^{-1}(A_2+\gamma _2I)]. \end{aligned}$$
(191)
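Equation (191) has an exact finite-dimensional analogue with the ordinary trace: for symmetric \(M_1, M_2\), the mixed partial of \(\mathrm{tr}[(I + sM_1 + tM_2)^r]\) at the origin is \(r(r-1)\mathrm{tr}(M_1M_2)\), since \(\mathrm{tr}[(I+X)^r] = \mathrm{tr}[I + rX + \tfrac{r(r-1)}{2}X^2] + O(\Vert X\Vert ^3)\) and the trace is cyclic. A minimal numerical sketch (matrix powers via eigendecomposition; all test matrices and the exponent are arbitrary choices):

```python
import numpy as np

# Finite-dimensional check of Eq. (191) with the ordinary trace.
rng = np.random.default_rng(0)
n, r = 4, 0.7
M1 = rng.standard_normal((n, n)); M1 = (M1 + M1.T) / 2   # symmetrize
M2 = rng.standard_normal((n, n)); M2 = (M2 + M2.T) / 2

def tr_power(S, r):
    # tr(S^r) for symmetric positive definite S, via its eigenvalues.
    w = np.linalg.eigvalsh(S)
    return np.sum(w**r)

def f(s, t):
    return tr_power(np.eye(n) + s * M1 + t * M2, r)

h = 1e-4  # small enough that I + s*M1 + t*M2 stays positive definite
mixed = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
claimed = r * (r - 1) * np.trace(M1 @ M2)
assert abs(mixed - claimed) < 1e-5
```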

(c) Consider the third term. For \(B = A+sA_1 + tA_2\) and \(\mu = \gamma + s\gamma _1 + t\gamma _2\), we have by the product and inverse rules of \(\mathrm{det_X}\) that

$$\begin{aligned} \log \mathrm{det_X}[(A+\gamma I)^{-1}(B+\mu I)]&= -\log \mathrm{det_X}(A+\gamma I) \\&\quad + \log \mathrm{det_X}[A+ \gamma I + s(A_1 {+} \gamma _1I) + t(A_2 {+} \gamma _2I)]. \end{aligned}$$

It follows from Proposition 5 and the above calculations that

$$\begin{aligned}&\frac{\partial }{\partial t}\left[ r\left( \frac{\mu }{\gamma }\right) ^{r}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \mu I)]\right] \big \vert _{t=0} \\&\quad = -r^2\frac{\gamma _2}{\gamma }\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r-1}\log \mathrm{det_X}(A+\gamma I) \\&\qquad + r^2\frac{\gamma _2}{\gamma }\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r-1}\log \mathrm{det_X}[A+\gamma I + s(A_1+\gamma _1 I)] \\&\qquad + r\left( \frac{\gamma + s\gamma _1}{\gamma }\right) ^{r}\mathrm{tr}_X[(A+\gamma I +s(A_1 + \gamma _1I))^{-1}(A_2 +\gamma _2I)]. \end{aligned}$$

By Proposition 5, \( \frac{d}{ds}\log \mathrm{det_X}[A+ \gamma I + s(A_1 + \gamma _1I)]\big \vert _{s=0} = \mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)]. \) Also

$$\begin{aligned}&\frac{d}{ds}\mathrm{tr}_X[(A+\gamma I +s(A_1 + \gamma _1I))^{-1}(A_2 +\gamma _2I)] \\&\quad = \frac{d}{ds}\mathrm{tr}_X[(I +s(A+\gamma I)^{-1}(A_1 + \gamma _1I))^{-1}(A+\gamma I)^{-1}(A_2 +\gamma _2I)] \\&\quad = -\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \;\;\text {by Corollary~4} . \end{aligned}$$
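In finite dimensions, the resolvent derivative in the last display reduces to the standard identity \(\frac{d}{ds}\mathrm{tr}[(X+sP)^{-1}Q]\big \vert _{s=0} = -\mathrm{tr}[X^{-1}PX^{-1}Q]\), which follows from the Neumann expansion \((X+sP)^{-1} = X^{-1} - sX^{-1}PX^{-1} + O(s^2)\). A numerical check with arbitrary test matrices:

```python
import numpy as np

# Check: d/ds tr[(X + s*P)^{-1} Q] at s=0 equals -tr[X^{-1} P X^{-1} Q].
rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)          # symmetric positive definite, well conditioned
P = rng.standard_normal((n, n))
Q = rng.standard_normal((n, n))

def f(s):
    return np.trace(np.linalg.inv(X + s * P) @ Q)

h = 1e-5
deriv = (f(h) - f(-h)) / (2 * h)     # central difference, O(h^2) accurate
Xinv = np.linalg.inv(X)
claimed = -np.trace(Xinv @ P @ Xinv @ Q)
assert abs(deriv - claimed) < 1e-6
```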

Combining the last three expressions, we obtain

$$\begin{aligned}&\frac{\partial ^2}{\partial s\partial t}\left[ r\left( \frac{\mu }{\gamma }\right) ^{r}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \mu I)]\right] \big \vert _{s=0,t=0} \nonumber \\&\quad = r^2\frac{\gamma _2}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)] + r^2\frac{\gamma _1}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \nonumber \\&\qquad - r\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1+\gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)]. \end{aligned}$$
(192)

Combining Eqs. (190), (191), and (192), we obtain

$$\begin{aligned}&\frac{\partial ^2}{\partial s \partial t}D^{(1,0)}_r[(A+\gamma I), (B+\mu I)] \big \vert _{s=0,t=0} \\&\quad = 2r^2\frac{\gamma _1\gamma _2}{\gamma ^2} + r^2\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1+\gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \\&\qquad - r^2\frac{\gamma _2}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)] - r^2\frac{\gamma _1}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_2 + \gamma _2I)]. \end{aligned}$$

\(\square \)

Proof of Lemma 5

By definition of the extended trace \(\mathrm{tr}_X\) and the extended inner product \(\langle \;, \;\rangle _{\mathrm{HS_X}}\),

$$\begin{aligned}&\mathrm{tr}_X[(A+\gamma I)^{*}(B+\mu I)] = \mathrm{tr}_X[A^{*}B + \mu A^{*} + \gamma B + \gamma \mu I] \\&\quad = \mathrm{tr}(A^{*}B) + \mu \mathrm{tr}(A^{*}) + \gamma \mathrm{tr}(B) + \gamma \mu \\&\quad = [\mathrm{tr}(A^{*}B) + \gamma \mu ] + \mu \mathrm{tr}_X(A+\gamma I)^{*} + \gamma \mathrm{tr}_X(B+\mu I) - 2\gamma \mu \\&\quad = \langle A+\gamma I, B + \mu I\rangle _{\mathrm{HS_X}} + \mu \mathrm{tr}_X(A+\gamma I)^{*} + \gamma \mathrm{tr}_X(B+\mu I) - 2\gamma \mu . \end{aligned}$$
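The algebraic identity above can be checked directly in a finite-dimensional model of the unitized algebra: represent \(A + \gamma I\) by the pair \((A, \gamma )\), with the extended trace \(\mathrm{tr}_X(A+\gamma I) = \mathrm{tr}(A) + \gamma \) and the extended inner product \(\langle A+\gamma I, B+\mu I\rangle _{\mathrm{HS_X}} = \mathrm{tr}(A^{*}B) + \gamma \mu \), following the paper's conventions. The matrices and scalars below are arbitrary test data:

```python
import numpy as np

# Model A + gamma*I as the pair (A, gamma); the extended trace is
# trX(A + gamma*I) := tr(A) + gamma.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); gamma = 0.8
B = rng.standard_normal((n, n)); mu = 1.3

def trX(Z, nu):
    return np.trace(Z) + nu

# (A + gamma*I)^*(B + mu*I) = (A^*B + mu*A^* + gamma*B) + gamma*mu*I
prod_Z = A.T @ B + mu * A.T + gamma * B
lhs = trX(prod_Z, gamma * mu)

# Right-hand side of the identity in the proof of Lemma 5.
inner_HSX = np.trace(A.T @ B) + gamma * mu
rhs = inner_HSX + mu * trX(A.T, gamma) + gamma * trX(B, mu) - 2 * gamma * mu
assert abs(lhs - rhs) < 1e-10
```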

\(\square \)

Proof of Corollary 2

This follows from Theorem 25, Lemma 5, and the property

$$\begin{aligned} \mathrm{tr}_X[(A+\gamma I)^{-1/2}(B+\mu I)(A+\gamma I)^{-1/2}] = \mathrm{tr}_X[(A+\gamma I)^{-1}(B +\mu I)] \end{aligned}$$

for any pair \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((B+\mu I) \in \mathrm{Tr}_X({\mathcal {H}})\). \(\square \)
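For the ordinary trace in finite dimensions, the identity used above is simply cyclicity: \(\mathrm{tr}(X^{-1/2}YX^{-1/2}) = \mathrm{tr}(X^{-1}Y)\) for \(X\) symmetric positive definite. A numerical illustration with arbitrary test matrices (the inverse square root is computed via the eigendecomposition of \(X\)):

```python
import numpy as np

# Check tr(X^{-1/2} Y X^{-1/2}) = tr(X^{-1} Y) by cyclicity of the trace.
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)          # symmetric positive definite
Y = rng.standard_normal((n, n)); Y = (Y + Y.T) / 2

w, V = np.linalg.eigh(X)
X_inv_sqrt = V @ np.diag(w**-0.5) @ V.T

lhs = np.trace(X_inv_sqrt @ Y @ X_inv_sqrt)
rhs = np.trace(np.linalg.inv(X) @ Y)
assert abs(lhs - rhs) < 1e-10
```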

Minh, H.Q. Alpha-Beta Log-Determinant Divergences Between Positive Definite Trace Class Operators. Info. Geo. 2, 101–176 (2019). https://doi.org/10.1007/s41884-019-00019-w
