
Information Geometry, Volume 2, Issue 2, pp 101–176

Alpha-Beta Log-Determinant Divergences Between Positive Definite Trace Class Operators

  • Hà Quang Minh
Research Paper

Abstract

This work presents a parametrized family of divergences, namely Alpha-Beta Log-Determinant (Log-Det) divergences, between positive definite unitized trace class operators on a Hilbert space. This is a generalization of the Alpha-Beta Log-Determinant divergences between symmetric, positive definite matrices to the infinite-dimensional setting. The family of Alpha-Beta Log-Det divergences is highly general and contains many divergences as special cases, including the recently formulated infinite-dimensional affine-invariant Riemannian distance and the infinite-dimensional Alpha Log-Det divergences between positive definite unitized trace class operators. In particular, it includes a parametrized family of metrics between positive definite trace class operators, with the affine-invariant Riemannian distance and the square root of the symmetric Stein divergence being special cases. For the Alpha-Beta Log-Det divergences between covariance operators on a Reproducing Kernel Hilbert Space (RKHS), we obtain closed form formulas via the corresponding Gram matrices.

Keywords

Positive definite operators · Trace class operators · Infinite-dimensional Log-Determinant divergences · Alpha-Beta divergences · Affine-invariant Riemannian distance · Stein divergence · Extended trace · Extended Fredholm determinant · Reproducing kernel Hilbert spaces · Covariance operators

Mathematics Subject Classification

Primary 47B65; Secondary 47L07, 46E22

1 Introduction

Symmetric Positive Definite (SPD) matrices play an important role in many areas of mathematics, statistics, machine learning, optimization, computer vision, and related fields, see e.g. [1, 2, 3, 7, 11, 15, 19, 21, 30, 32, 39]. The set \(\mathrm{Sym}^{++}(n)\) of \(n \times n\) SPD matrices is an open convex cone and can also be equipped with a Riemannian manifold structure. Among the most studied Riemannian metrics on \(\mathrm{Sym}^{++}(n)\) are the classical affine-invariant metric [3, 5, 21, 30, 32] and the more recent Log-Euclidean metric [1, 15, 22]. The convex cone structure of \(\mathrm{Sym}^{++}(n)\), on the other hand, gives rise to distance-like functions such as the Alpha Log-Determinant divergences [6], which have been shown to be special cases of the Alpha-Beta Log-Determinant divergences [9]. These divergences are fast to compute and have been shown to work well in various applications [7, 19, 37].

The Alpha-Beta Log-Determinant (Log-Det) divergences are of interest from both theoretical and practical viewpoints. Theoretically, they provide a unifying framework for smoothly connecting many different, seemingly unrelated distances and divergences on the manifold of SPD matrices, such as the affine-invariant Riemannian distance and the symmetric Stein divergence. By the one-to-one correspondence between the sets of SPD matrices and Gaussian densities in \({\mathbb {R}}^n\) with mean zero, this framework thus connects many different distances/divergences on the manifold of Gaussian densities on \({\mathbb {R}}^n\) with mean zero, such as the Fisher-Rao distance and the Kullback-Leibler (KL) divergence. The Alpha-Beta Log-Det divergences also induce a Riemannian metric on the manifold of SPD matrices that is precisely the Fisher-Rao metric on the manifold of Gaussian densities in \({\mathbb {R}}^n\) with mean zero. Practically, in current applications, a typical approach is to employ one or several fixed distances/divergences for a particular problem, and it is thus not clear whether those distances/divergences are optimal for the application at hand. The Alpha-Beta Log-Det divergences, being a parametrized family encompassing a wide range of distances/divergences, can therefore be employed to design mathematically rigorous algorithms for learning the optimal distance/divergence for a particular problem. Several schemes for optimizing the parameters \((\alpha , \beta )\) in the Alpha-Beta Log-Det divergences have been proposed recently, including [8], for the problem of dictionary learning, and [38], for the problem of image clustering in computer vision.

The present work aims to generalize the Alpha-Beta Log-Determinant divergences to the infinite-dimensional setting. The motivations for the study of these infinite-dimensional divergences come from various recent applications of covariance operators in machine learning, statistics, and computer vision, e.g. [12, 26, 28, 29, 35]. As we show in the paper, this provides a unifying formulation for many distances/divergences between positive definite trace class operators on a Hilbert space, as in the finite-dimensional setting. In the infinite-dimensional setting, we have the correspondence between positive trace class operators on a Hilbert space \({\mathcal {H}}\) and Gaussian measures on \({\mathcal {H}}\) with mean zero. Thus, the above formulation also potentially connects many different distances/divergences between Gaussian measures on \({\mathcal {H}}\) with mean zero. In [14], we report the correspondence between the Alpha Log-Determinant divergences between positive definite trace class operators and the Rényi and Kullback-Leibler divergences between Gaussian measures on \({\mathcal {H}}\) with mean zero. Further results with the general Alpha-Beta Log-Determinant divergences will be reported in future work.

Finite-dimensional Alpha-Beta Log-Determinant divergences We recall that for \(A, B \in \mathrm{Sym}^{++}(n)\), the Alpha-Beta Log-Determinant (Log-Det) divergence between A and B is a parametrized family of divergences defined by (see [9])
$$\begin{aligned}&D^{(\alpha , \beta )}(A,B) = \frac{1}{\alpha \beta }\log \det \left[ \frac{\alpha (AB^{-1})^{\beta } + \beta (AB^{-1})^{-\alpha }}{\alpha + \beta }\right] ,\nonumber \\&\;\; \alpha \ne 0, \beta \ne 0, \alpha + \beta \ne 0. \end{aligned}$$
(1)
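As a concrete illustration (our code, not part of the original text), Eq. (1) can be evaluated spectrally: \(AB^{-1}\) is similar to the SPD matrix \(B^{-1/2}AB^{-1/2}\), so its eigenvalues are real and positive, and the determinant in Eq. (1) is the product of the scalar expression over these eigenvalues. A minimal NumPy sketch, with all function and variable names chosen here for illustration:

```python
import numpy as np

def ab_logdet_div(A, B, alpha, beta):
    """Eq. (1): (1/(alpha*beta)) * log det of the matrix expression,
    evaluated via the (real, positive) eigenvalues of A @ inv(B)."""
    lam = np.linalg.eigvals(A @ np.linalg.inv(B)).real
    vals = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return float(np.sum(np.log(vals)) / (alpha * beta))

# two random SPD matrices for experimentation
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + 4 * np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + 4 * np.eye(4)
```

The divergence vanishes exactly when \(A = B\), and the spectral form makes the invariance under congruences \((A, B) \mapsto (XAX^{T}, XBX^{T})\) immediate, since \(X(AB^{-1})X^{-1}\) has the same eigenvalues as \(AB^{-1}\).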

Remark 1

To keep our presentation compact, in the following we consider the case \(\alpha > 0\), \(\beta > 0\), as well as the limiting cases \(\alpha = 0\), \(\beta = 0\). Since \(D^{(\alpha , \beta )}(A,B) = D^{(-\alpha , -\beta )}(B,A)\), the case \(\alpha < 0, \beta < 0\) is essentially identical to the previous case. We do not consider the cases where \(\alpha \) and \(\beta \) have opposite signs, since in those cases the well-definedness and finiteness of \(D^{(\alpha , \beta )}(A,B)\) depend on the spectrum of \(AB^{-1}\) (see Theorem 2 in [9]); that is, \(D^{(\alpha , \beta )}\) is not a valid divergence on all of \(\mathrm{Sym}^{++}(n)\). For the purposes of this work, we focus on Log-Det divergences, as surveyed in [9]. For other related work on matrix divergences, see e.g. [13, 16, 31].

The parametrized family of divergences defined by Eq.(1) is highly general and admits as special cases many metrics and distance-like functions on \(\mathrm{Sym}^{++}(n)\), including in particular the following.
  1.
    The affine-invariant Riemannian distance [3], corresponding to the limiting case \(D^{(0,0)}(A,B)\), with
    $$\begin{aligned} D^{(0,0)}(A,B) = \frac{1}{2}d^2_{\mathrm{aiE}}(A,B) = \frac{1}{2}||\log (B^{-1/2}AB^{-1/2})||_F^2, \end{aligned}$$
    (2)
    where \(\log (A)\) denotes the principal logarithm of the matrix A and \(||\;||_F\) denotes the Frobenius norm.
     
  2.
    The Alpha Log-Determinant divergences [6], corresponding to \(D^{(\alpha , 1-\alpha )}(A,B)\), \(0< \alpha < 1\), with
    $$\begin{aligned} D^{(\alpha , 1-\alpha )}(A,B) = \frac{1}{\alpha (1-\alpha )} \log \left[ \frac{\det [\alpha A + (1-\alpha )B]}{\det (A)^{\alpha }\det (B)^{1-\alpha }}\right] . \end{aligned}$$
    (3)
    A special case of this divergence is the symmetric Stein divergence (also called the Jensen-Bregman LogDet divergence), corresponding to \(D^{(1/2, 1/2)}(A,B)\), whose square root is a metric on \(\mathrm{Sym}^{++}(n)\) [37], with
    $$\begin{aligned} D^{(1/2,1/2)}(A,B) = 4d_{\mathrm{stein}}^2(A,B) = 4\log \frac{\det (\frac{A+B}{2})}{\sqrt{\det (A)\det (B)}}. \end{aligned}$$
    (4)
     
  3.
    The limiting cases \(\beta = 0\) and \(\alpha = 0\) correspond to, respectively,
    $$\begin{aligned} D^{(\alpha , 0)}(A,B)&= \frac{1}{\alpha ^2}\left\{ \mathrm{tr}((A^{-1}B)^{\alpha } - I) - \alpha \log \det (A^{-1}B)\right\} , \end{aligned}$$
    (5)
    $$\begin{aligned} D^{(0,\beta )}(A,B)&= \frac{1}{\beta ^2}\left\{ \mathrm{tr}((B^{-1}A)^{\beta } - I) - \beta \log \det (B^{-1}A)\right\} , \end{aligned}$$
    (6)
    with \(D^{(1,0)}(A,B)= \mathrm{tr}(A^{-1}B - I) - \log \det (A^{-1}B)\) and \(D^{(0,1)}(A,B)= \mathrm{tr}(B^{-1}A-I) - \log \det (B^{-1}A)\).
     
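The special cases above can be checked numerically. The following sketch (ours; all names are illustrative) verifies, for two random SPD matrices, that Eq. (1) approaches half the squared affine-invariant distance of Eq. (2) as \(\alpha = \beta \rightarrow 0\), and approaches the expression of Eq. (5) as \(\beta \rightarrow 0\) at fixed \(\alpha \):

```python
import numpy as np

def ab_div(A, B, alpha, beta):
    # Eq. (1) via the eigenvalues of A B^{-1} (real and positive, since
    # A B^{-1} is similar to the SPD matrix B^{-1/2} A B^{-1/2})
    lam = np.linalg.eigvals(A @ np.linalg.inv(B)).real
    return float(np.sum(np.log((alpha * lam**beta + beta * lam**(-alpha))
                               / (alpha + beta))) / (alpha * beta))

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + 4 * np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + 4 * np.eye(4)

# Eq. (2): (1/2) ||log(B^{-1/2} A B^{-1/2})||_F^2, via symmetric eigensolvers
w, Q = np.linalg.eigh(B)
Bi = Q @ np.diag(w**-0.5) @ Q.T                      # B^{-1/2}
d2_ai = float(np.sum(np.log(np.linalg.eigvalsh(Bi @ A @ Bi))**2))

# Eq. (5): closed form of the limit beta -> 0 at fixed alpha
alpha = 0.7
mu = np.linalg.eigvals(np.linalg.inv(A) @ B).real    # spectrum of A^{-1} B
eq5 = float((np.sum(mu**alpha - 1) - alpha * np.sum(np.log(mu))) / alpha**2)
```

Plugging small parameter values into `ab_div` reproduces both limits to several digits.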
Contributions of this work The current work is a continuation and generalization of the author’s recent work [24]. In [24], we generalized the Alpha Log-Det divergences between SPD matrices [6] to the infinite-dimensional Alpha Log-Determinant divergences between positive definite unitized trace class operators on a Hilbert space. In the current work, we carry out the following.
  • We present a formulation for the Alpha-Beta Log-Det divergences between positive definite unitized trace class operators, generalizing the Alpha-Beta divergences between SPD matrices as defined by Eq.(1). As in the finite-dimensional setting, the formulation we present here is general and admits as special cases many metrics and distance-like functions between positive definite unitized trace class operators, including in particular the following: the infinite-dimensional affine-invariant Riemannian distance [20]; the infinite-dimensional Alpha Log-Det divergences [24], a special case of which is the infinite-dimensional symmetric Stein divergence. For the divergences between reproducing kernel Hilbert space (RKHS) covariance operators, we obtain closed form formulas for the Alpha-Beta Log-Det divergences via the corresponding Gram matrices.

  • We show that the Alpha-Beta Log-Det divergences contain a parametrized family of metrics between positive definite unitized trace class operators, with the affine-invariant Riemannian distance [20] being a special case.

  • We show that the Alpha-Beta Log-Det divergences induce a positive definite bilinear form on the space of self-adjoint unitized trace class operators, which generalizes (and in a special case coincides with) the restriction to this space of the affine-invariant Riemannian metric on the Hilbert manifold of positive definite unitized Hilbert–Schmidt operators in [20]. This is accomplished via new results on the differential calculus of the extended trace and extended Fredholm determinant operations, which were introduced in [24].

Organization We provide a summary of the main results of the paper in Sect. 3, including our definition of the infinite-dimensional Alpha-Beta Log-Det divergences. The motivations and derivations leading to this definition are presented in Sect. 4. Properties of the Alpha-Beta Log-Det divergences are presented in Sect. 5. The positive definite bilinear form induced by the Alpha-Beta Log-Det divergences is shown in Sect. 6. The closed form formulas for the Alpha-Beta Log-Det divergences between RKHS covariance operators are presented in Sect. 7. All proofs are presented in Appendix A.

2 Positive definite unitized trace class and Hilbert–Schmidt operators

We first briefly review the set of positive definite unitized trace class operators \(\mathrm{PTr}({\mathcal {H}})\) and the extended Fredholm determinant \(\mathrm{det_X}\) and refer to [24] for the detailed motivations leading to the definitions of these concepts. Throughout the paper, let \({\mathcal {H}}\) denote a real separable Hilbert space, with \(\dim ({\mathcal {H}}) = \infty \), unless explicitly stated otherwise. Let \({\mathcal {L}}({\mathcal {H}})\) be the Banach space of bounded linear operators and \(\mathrm{Sym}({\mathcal {H}}) \subset {\mathcal {L}}({\mathcal {H}})\) be the subspace of self-adjoint, bounded operators on \({\mathcal {H}}\). We recall that an operator \(A \in {\mathcal {L}}({\mathcal {H}})\) is said to be positive definite [33] if there exists a constant \(M_A > 0\) such that \(\langle x, Ax\rangle \ge M_A||x||^2 \; \forall x \in {\mathcal {H}}\). This is equivalent to saying that A is both strictly positive and invertible. We denote by \({\mathbb {P}}({\mathcal {H}})\) the set of all self-adjoint, positive definite operators on \({\mathcal {H}}\) and write \(A > 0 \Longleftrightarrow A \in {\mathbb {P}}({\mathcal {H}})\).

Extended trace class operators Let \(\mathrm{Tr}({\mathcal {H}})\) be the set of trace class operators on \({\mathcal {H}}\). The set of extended (or unitized) trace class operators on \({\mathcal {H}}\) is defined to be
$$\begin{aligned} \mathrm{Tr}_X({\mathcal {H}}) = \{ A + \gamma I \; : \; A \in \mathrm{Tr}({\mathcal {H}}), \gamma \in {\mathbb {R}}\}. \end{aligned}$$
(7)
The set \(\mathrm{Tr}_X({\mathcal {H}})\) is a Banach algebra under the extended trace class norm
$$\begin{aligned} ||A+\gamma I||_{\mathrm{tr}_X} = ||A||_{\mathrm{tr}} + |\gamma | = \mathrm{tr}|A| + |\gamma |. \end{aligned}$$
For \((A+\gamma I) \in \mathrm{Tr}_X({\mathcal {H}})\), its extended trace is defined to be
$$\begin{aligned} \mathrm{tr}_X(A+\gamma I) = \mathrm{tr}(A) + \gamma , \;\;\;\text {with}\;\;\; \mathrm{tr}_X(I) = 1 \end{aligned}$$
(8)
in contrast to the standard trace definition, according to which \(\mathrm{tr}(I) = \infty \).
Extended Fredholm determinant For \((A+ \gamma I) \in \mathrm{Tr}_X({\mathcal {H}})\), \(\gamma \ne 0\), its extended Fredholm determinant is defined to be
$$\begin{aligned} \mathrm{det_X}(A+\gamma I) = \gamma \det \left( \frac{A}{\gamma } + I\right) , \end{aligned}$$
(9)
where the determinant on the right hand side is the Fredholm determinant. For \(\gamma = 1\), we recover the standard Fredholm determinant. In the case \(\dim ({\mathcal {H}}) < \infty \), we define \(\mathrm{det_X}(A+\gamma I) = \det (A+\gamma I)\), the standard matrix determinant.
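For intuition (our example, not from the paper), both operations are easy to evaluate when A is represented by a summable spectrum; the helper names below are ours:

```python
import numpy as np

def tr_X(eigs, gamma):
    """Extended trace of A + gamma*I, Eq. (8), from the eigenvalues of the
    trace class part A; in particular tr_X(I) = 1 (empty eigs, gamma = 1)."""
    return float(np.sum(eigs) + gamma)

def det_X(eigs, gamma):
    """Extended Fredholm determinant of A + gamma*I, Eq. (9):
    gamma * prod_j (1 + lambda_j / gamma)."""
    return float(gamma * np.prod(1.0 + np.asarray(eigs) / gamma))

# a trace class spectrum: lambda_j = 2^{-j}, absolutely summable
eigs = 0.5 ** np.arange(1, 40)
```

Since the eigenvalues are summable, the infinite product defining `det_X` converges to a finite positive value, so \(\log \mathrm{det_X}\) is finite, which is the key point of the construction.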
Positive definite unitized trace class operators The set \(\mathrm{PTr}({\mathcal {H}}) \subset \mathrm{Tr}_X({\mathcal {H}})\) of positive definite unitized trace class operators is defined to be the intersection
$$\begin{aligned} \begin{aligned} \mathrm {PTr}({\mathcal {H}}) = \mathrm {Tr}_X({\mathcal {H}}) \cap {\mathbb {P}}({\mathcal {H}}) = \{A + \gamma I > 0\; : A^{*} = A, \; A \in \mathrm {Tr}({\mathcal {H}}), \; \gamma \in {\mathbb {R}}\}. \end{aligned} \end{aligned}$$
(10)
As motivated in detail in [24], positive definite operators of the form \((A+\gamma I)\), with A being a compact operator, are defined so that the logarithm \(\log (A+\gamma I)\) is well-defined and bounded, and the extended Fredholm determinant \(\mathrm{det_X}\) is defined so that the quantity \(\log \mathrm{det_X}(A+\gamma I)\), \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\), is always well-defined and finite.
Extended Hilbert–Schmidt operators Let \(\mathrm{HS}({\mathcal {H}})\) denote the space of Hilbert–Schmidt operators on \({\mathcal {H}}\), which is given by
$$\begin{aligned} \mathrm{HS}({\mathcal {H}}) = \{A \in {\mathcal {L}}({\mathcal {H}}) \; : \; ||A||^2_{\mathrm{HS}} = \mathrm{tr}(A^{*}A) < \infty \}, \end{aligned}$$
(11)
where \(||\;||_{\mathrm{HS}}\) is the Hilbert–Schmidt norm. As in the case of trace class operators, one can introduce the set of extended (or unitized) Hilbert–Schmidt operators [20]
$$\begin{aligned} \mathrm{HS}_X({\mathcal {H}}) =\{ A + \gamma I \; : \; A \in \mathrm{HS}({\mathcal {H}}), \gamma \in {\mathbb {R}}\}, \end{aligned}$$
(12)
along with the extended Hilbert–Schmidt inner product defined by
$$\begin{aligned} \langle A+\gamma I, B+\mu I\rangle _{\mathrm{HS_X}} = \langle A,B\rangle _{\mathrm{HS}} + \gamma \mu . \end{aligned}$$
(13)
Positive definite unitized Hilbert–Schmidt operators We next recall the infinite-dimensional Hilbert manifold of positive definite unitized Hilbert–Schmidt operators on \({\mathcal {H}}\), defined in [20] to be the set
$$\begin{aligned} \varSigma ({\mathcal {H}}) = \{ A + \gamma I > 0 \; : \; A = A^{*}, A \in \mathrm{HS}({\mathcal {H}}), \gamma \in {\mathbb {R}}\}. \end{aligned}$$
(14)
In the case \(\dim ({\mathcal {H}}) = \infty \), the set \(\mathrm{PTr}({\mathcal {H}})\) of positive definite unitized trace class operators on \({\mathcal {H}}\) is a strict subset of \(\varSigma ({\mathcal {H}})\). The manifold \(\varSigma ({\mathcal {H}})\) can be equipped with the following Riemannian metric [20]. For each \(P \in \varSigma ({\mathcal {H}})\), on the tangent space
$$\begin{aligned} T_P(\varSigma ({\mathcal {H}})) \cong \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{HS}_X({\mathcal {H}}) = \{ A+\gamma I \; :\; A = A^{*}, A \in \mathrm{HS}({\mathcal {H}}), \gamma \in {\mathbb {R}}\}, \end{aligned}$$
(15)
we define the following inner product
$$\begin{aligned} \langle A+\gamma I, B+ \mu I\rangle _{P} = \langle P^{-1/2}(A+\gamma I)P^{-1/2}, P^{-1/2}(B+ \mu I)P^{-1/2}\rangle _{\mathrm{HS_X}}. \end{aligned}$$
(16)
The Riemannian metric \(\langle \;, \;\rangle _{P}\) turns \(\varSigma ({\mathcal {H}})\) into an infinite-dimensional Riemannian manifold. The geodesic distance between \((A+\gamma I), (B+\mu I)\) is given by
$$\begin{aligned} d_{\mathrm{aiHS}}[(A+\gamma I), (B + \mu I)] = ||\log [(B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}]||_{\mathrm{HS_X}}. \end{aligned}$$
(17)
Similar to the extended Fredholm determinant, the extended Hilbert–Schmidt inner product is defined so that \(||\log (A+\gamma I)||_{\mathrm{HS_X}}\), \((A+\gamma I) \in \varSigma ({\mathcal {H}})\), and consequently the distance in Eq.(17), are always well-defined and finite.
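As an illustration (ours, not from the paper), Eq. (17) can be evaluated for finite-rank A and B by diagonalizing the symmetric matrix \((B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\) and applying the extended inner product of Eq. (13), under which the identity direction contributes \((\log (\gamma /\mu ))^2\) exactly once; the function names are ours:

```python
import numpy as np

def _inv_sqrt(S):
    # inverse square root of a symmetric positive definite matrix via eigh
    w, Q = np.linalg.eigh(S)
    return Q @ np.diag(w**-0.5) @ Q.T

def d_aiHS(A, gamma, B, mu):
    """Finite-rank sketch of Eq. (17): write
    log[(B+mu I)^{-1/2}(A+gamma I)(B+mu I)^{-1/2}] = L0 + log(gamma/mu) I
    and apply Eq. (13): ||L0 + cI||_{HS_X}^2 = ||L0||_F^2 + c^2."""
    n = A.shape[0]
    S = _inv_sqrt(B + mu * np.eye(n))
    w = np.linalg.eigvalsh(S @ (A + gamma * np.eye(n)) @ S)
    c = np.log(gamma / mu)
    return float(np.sqrt(np.sum((np.log(w) - c)**2) + c**2))

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3)); A = 0.1 * (M @ M.T)
N = rng.standard_normal((3, 3)); B = 0.1 * (N @ N.T)
```

The resulting quantity vanishes only when the two operators coincide and is symmetric in its arguments, as expected of a distance.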

As we show below, for the positive definite trace class operators, the affine-invariant Riemannian distance in Eq.(17) is recovered as a special case of our Alpha-Beta Log-Det divergences, which also induce the affine-invariant Riemannian metric in Eq.(16), just as in the finite-dimensional setting.

3 Summary of main results

We now present a summary of our main results, with the detailed technical descriptions and derivations provided in subsequent sections. The main purpose of the current work is the generalization of the Alpha-Beta Log-Det divergence between SPD matrices, as defined in Eq. (1), to that between positive definite unitized trace class operators in \(\mathrm{PTr}({\mathcal {H}})\). The following is our definition of the Alpha-Beta (Log-Det) divergences in the infinite-dimensional setting.

Definition 1

(Alpha-Beta Log-Determinant divergences between positive definite trace class operators) Assume that \(\dim ({\mathcal {H}}) = \infty \). Let \(\alpha > 0\), \(\beta > 0\) be fixed. Let \(r \in {\mathbb {R}}\), \(r \ne 0\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), the \((\alpha , \beta )\)-Log-Det divergence \(D^{(\alpha , \beta )}_r[(A+\gamma I), (B+ \mu I)]\) is defined to be
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{1}{\alpha \beta }\log \left[ \left( \frac{\gamma }{\mu }\right) ^{r(\delta -\frac{\alpha }{\alpha +\beta })} \mathrm{det_X}\left( \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I)^{r(1-\delta )} + \beta (\varLambda +\frac{\gamma }{\mu }I)^{-r\delta }}{\alpha + \beta }\right) \right] , \end{aligned}$$
(18)
where \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\), \(\delta = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\). Equivalently,
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{1}{\alpha \beta }\log \left[ \left( \frac{\gamma }{\mu }\right) ^{r(\delta -\frac{\alpha }{\alpha +\beta })} \mathrm{det_X}\left( \frac{\alpha (Z + \frac{\gamma }{\mu }I)^{r(1-\delta )} + \beta (Z + \frac{\gamma }{\mu }I)^{-r\delta }}{\alpha + \beta }\right) \right] , \end{aligned}$$
(19)
where \(Z + \frac{\gamma }{\mu }I = (A+\gamma I)(B+\mu I)^{-1}\).
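To make Definition 1 concrete, here is a small numerical sketch (ours; the names and the finite-rank setup are illustrative). It represents \((B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2} = \varLambda + \frac{\gamma }{\mu }I\) by the finitely many eigenvalues that differ from \(\gamma /\mu \), the extended determinant of Eq. (9) absorbing the rest of the spectrum:

```python
import numpy as np

def D_ab_r(nus, gamma, mu, alpha, beta, r):
    """Sketch of Eq. (18) for a finite-rank perturbation: `nus` holds the
    finitely many eigenvalues of (B+mu I)^{-1/2}(A+gamma I)(B+mu I)^{-1/2}
    that differ from gamma/mu; the value gamma/mu on the rest of the
    spectrum is handled by the extended determinant det_X."""
    nus = np.asarray(nus, dtype=float)
    c = gamma / mu
    delta = alpha * gamma**r / (alpha * gamma**r + beta * mu**r)
    f = lambda x: (alpha * x**(r * (1 - delta))
                   + beta * x**(-r * delta)) / (alpha + beta)
    # det_X(f(Lambda + cI)) = f(c) * prod_i f(nu_i)/f(c), per Eq. (9)
    log_detX = np.log(f(c)) + np.sum(np.log(f(nus) / f(c)))
    log_pref = r * (delta - alpha / (alpha + beta)) * np.log(c)
    return float((log_pref + log_detX) / (alpha * beta))

nus = np.array([0.5, 2.0, 3.0])
c = 1.2 / 0.8
# half the squared affine-invariant distance for this spectral data
half_d2 = 0.5 * float(np.log(c)**2 + np.sum(np.log(nus / c)**2))
```

The small-parameter check below reflects the limiting behavior established later in Theorem 2: with \(\alpha = \beta \rightarrow 0\) and \(r = 2\alpha \), the divergence approaches half the squared affine-invariant distance.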

Remark 2

The quantity r in \(D^{(\alpha , \beta )}_r\) is in general a real-valued function of \(\alpha \) and \(\beta \). Except for the case \(r = \alpha + \beta \) (see below), to the best of our knowledge it has no equivalent in the existing literature in the finite-dimensional setting.

Remark 3

Throughout the paper, we employ the following notations. Using the identity \((B+\mu I)^{-1} = \frac{1}{\mu }I - \frac{B}{\mu }(B+\mu I)^{-1}\), we write \((B+ \mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2} = \varLambda + \frac{\gamma }{\mu }I \in \mathrm{PTr}({\mathcal {H}}), \) where \(\varLambda = (B+\mu I)^{-1/2}A(B+ \mu I)^{-1/2} -\frac{\gamma }{\mu }B(B+\mu I)^{-1} \in \mathrm{Tr}({\mathcal {H}})\). This notation is employed in Eq. (18). Similarly, in Eq. (19), we write \((A+\gamma I)(B+ \mu I)^{-1} = Z + \frac{\gamma }{\mu } I\), where \(Z = A(B+\mu I)^{-1} - \frac{\gamma }{\mu }B(B+\mu I)^{-1} \in \mathrm{Tr}({\mathcal {H}})\).

The quantity \(D^{(\alpha ,\beta )}_r[(A+\gamma I), (B+\mu I)]\), \(\alpha> 0, \beta > 0\), as stated in Definition 1, can be extended to the cases \(\alpha > 0, \beta = 0\) and \(\alpha = 0, \beta > 0\), \(\forall r \in {\mathbb {R}}, r \ne 0\), via limiting arguments. The following is our definition in these cases.

Definition 2

(Limiting cases - I) Assume that \(\dim ({\mathcal {H}}) = \infty \). Let \(\alpha> 0, \beta > 0\), \(r \ne 0\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), the Log-Det divergence \(D^{(\alpha , 0)}_r[(A+\gamma I), (B+\mu I)]\) is defined to be
$$\begin{aligned} D^{(\alpha , 0)}_r[(A+\gamma I), (B+\mu I)] =&\frac{r}{\alpha ^2}\left[ \left( \frac{\mu }{\gamma }\right) ^{r} -1\right] \log \frac{\mu }{\gamma } \nonumber \\&+\frac{1}{\alpha ^2}\mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \mu I)]^{r} -I) \nonumber \\&- \frac{1}{\alpha ^2}\left( \frac{\mu }{\gamma }\right) ^{r}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \mu I)]^{r}. \end{aligned}$$
(20)
Similarly, \(D^{(0, \beta )}_r[(A+\gamma I), (B+\mu I)]\) is defined to be
$$\begin{aligned} D^{(0, \beta )}_r[(A+\gamma I), (B+\mu I)] =&\frac{r}{\beta ^2}\left[ \left( \frac{\gamma }{\mu }\right) ^{r} -1\right] \log \frac{\gamma }{\mu } \nonumber \\&+\frac{1}{\beta ^2}\mathrm{tr}_X([(B+\mu I)^{-1}(A+ \gamma I)]^{r} -I) \nonumber \\&- \frac{1}{\beta ^2}\left( \frac{\gamma }{\mu }\right) ^{r}\log \mathrm{det_X}[(B+\mu I)^{-1}(A+ \gamma I)]^{r}. \end{aligned}$$
(21)
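As a consistency check (ours, not from the paper), one can verify numerically that Eq. (20) is indeed the \(\beta \rightarrow 0\) limit of Eq. (18), again representing the operators by the finitely many nontrivial eigenvalues \(\nu _i\) of \((B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\), with the rest of the spectrum equal to \(\gamma /\mu \); all names are ours:

```python
import numpy as np

def D_ab_r(nus, gamma, mu, alpha, beta, r):
    # Eq. (18) for finite-rank data (remaining spectrum = gamma/mu,
    # handled by the extended determinant det_X of Eq. (9))
    nus = np.asarray(nus, dtype=float)
    c = gamma / mu
    delta = alpha * gamma**r / (alpha * gamma**r + beta * mu**r)
    f = lambda x: (alpha * x**(r * (1 - delta))
                   + beta * x**(-r * delta)) / (alpha + beta)
    log_detX = np.log(f(c)) + np.sum(np.log(f(nus) / f(c)))
    return float((r * (delta - alpha / (alpha + beta)) * np.log(c)
                  + log_detX) / (alpha * beta))

def D_a0_r(nus, gamma, mu, alpha, r):
    # Eq. (20): (A+gamma I)^{-1}(B+mu I) has eigenvalues 1/nu_i plus the
    # value mu/gamma on the rest of the spectrum
    nus = np.asarray(nus, dtype=float)
    c = gamma / mu
    t1 = (r / alpha**2) * ((mu / gamma)**r - 1) * np.log(mu / gamma)
    trX = np.sum(nus**(-r) - c**(-r)) + (c**(-r) - 1)        # tr_X(S^r - I)
    logdetX = -r * np.log(c) - r * np.sum(np.log(nus / c))   # log det_X(S^r)
    return float(t1 + trX / alpha**2 - (mu / gamma)**r * logdetX / alpha**2)

nus = np.array([0.5, 2.0, 3.0])
lhs = D_ab_r(nus, 1.2, 0.8, 0.7, 1e-7, 1.5)   # beta -> 0 in Definition 1
rhs = D_a0_r(nus, 1.2, 0.8, 0.7, 1.5)         # Definition 2, Eq. (20)
```

The two values agree to several digits, which is a useful sanity check on the various \((\mu /\gamma )^r\) factors in Eq. (20).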

The following result confirms that the quantity \(D^{(\alpha , \beta )}_{r}\), as defined in Definitions 1 and 2, is in fact a divergence on \(\mathrm{PTr}({\mathcal {H}})\).

Theorem 1

(Positivity) Assume the hypotheses stated in Definitions 1 and 2. Then
$$\begin{aligned} D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)]&\ge 0 \end{aligned}$$
(22)
$$\begin{aligned} D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)]&= 0 \Longleftrightarrow A = B, \; \gamma = \mu . \end{aligned}$$
(23)

Theorem 2

(Special cases - I) The following are some of the most important special cases of Definitions 1 and 2.
  1. 1.
    The infinite-dimensional affine-invariant Riemannian distance \(d_{\mathrm{aiHS}}[(A+\gamma I), (B+ \mu I)]\) [20], which corresponds to the limiting case \(\lim _{\alpha \rightarrow 0}D^{(\alpha ,\alpha )}_{r}[(A+\gamma I), (B+ \mu I)]\), where \(r=r(\alpha )\) is smooth, with \(r(0) = 0\), \(r{'}(0) \ne 0\), and \(r(\alpha ) \ne 0\) for \(\alpha \ne 0\). The limit is given by
    $$\begin{aligned} \lim _{\alpha \rightarrow 0}D^{(\alpha , \alpha )}_{r}[(A+\gamma I), (B+\mu I)] = \frac{[r{'}(0)]^2}{8}d^2_{\mathrm{aiHS}}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$
    (24)
    In particular, for \(r = 2\alpha \),
    $$\begin{aligned} \lim _{\alpha \rightarrow 0}D^{(\alpha ,\alpha )}_{2\alpha }[(A+\gamma I), (B+ \mu I)] = \frac{1}{2}d^2_{\mathrm{aiHS}}[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
    (25)
     
  2. 2.
    The infinite-dimensional Alpha Log-Determinant divergences \(d^{\alpha }_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]\) [24], with
    $$\begin{aligned} D^{(\alpha , 1-\alpha )}_{\pm 1}[(A+\gamma I), (B+ \mu I)] =\,&d^{\pm (1-2\alpha )}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)], \;\; 0 \le \alpha \le 1. \end{aligned}$$
    (26)
    This case is the content of Theorem 13.
     

Since the limit \(\lim _{\alpha \rightarrow 0}D^{(\alpha , \alpha )}_{r}[(A+\gamma I), (B+ \mu I)]\) in the first part of Theorem 2 is unique, up to the multiplicative factor \([r{'}(0)]^2/8\), we define the quantity \(D^{(0,0)}_0[(A+\gamma I), (B+\mu I)]\) as follows.

Definition 3

(Limiting cases - II) For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), the Log-Det divergence \(D^{(0,0)}_0[(A+\gamma I), (B+\mu I)]\) is defined to be
$$\begin{aligned} D^{(0,0)}_0[(A+\gamma I), (B+\mu I)]&= \lim _{\alpha \rightarrow 0}D^{(\alpha ,\alpha )}_{2\alpha }[(A+\gamma I), (B+ \mu I)] \nonumber \\&= \frac{1}{2}d^2_{\mathrm{aiHS}}[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
(27)

Since \(d_{\mathrm{aiHS}}[(A+\gamma I), (B+ \mu I)]\) is a metric on \(\mathrm{PTr}({\mathcal {H}})\), \(D^{(0,0)}_0[(A+\gamma I), (B+\mu I)]\) is automatically a symmetric divergence on \(\mathrm{PTr}({\mathcal {H}})\). In fact, it is a member of the parametrized family \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \mu I)]\), \(\alpha \ge 0\), of symmetric divergences on \(\mathrm{PTr}({\mathcal {H}})\), as stated in the following result.

Theorem 3

(Special cases - II) The parametrized family \(D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \mu I)]\), \(\alpha \ge 0\), is a family of symmetric divergences on \(\mathrm{PTr}({\mathcal {H}})\), with \(\alpha = 0\) corresponding to the infinite-dimensional affine-invariant Riemannian distance above and \(\alpha = 1/2\) corresponding to the infinite-dimensional symmetric Stein divergence, which is given by \(\frac{1}{4}d^{0}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]\).

Finite-dimensional case For \(\gamma = \mu \), we have \(\delta = \frac{\alpha }{\alpha + \beta }\), so that Eq. (19) becomes
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad = \frac{1}{\alpha \beta }\log \mathrm{det_X}\left( \frac{\alpha [(A+\gamma I)(B + \gamma I)^{-1}]^{\frac{r\beta }{\alpha + \beta }} + \beta [(A+\gamma I)(B + \gamma I)^{-1}]^{-\frac{r\alpha }{\alpha +\beta }}}{\alpha + \beta }\right) . \end{aligned}$$
(28)
In the finite-dimensional case, where A and B are two \(n \times n\) SPD matrices, setting \(\gamma = 0\) and recalling that \(\mathrm{det_X}= \det \) for finite matrices, we obtain
$$\begin{aligned}&D^{(\alpha , \beta )}_r(A, B) = \frac{1}{\alpha \beta }\log \det \left( \frac{\alpha (AB^{-1})^{\frac{r\beta }{\alpha + \beta }} + \beta (AB^{-1})^{-\frac{r\alpha }{\alpha +\beta }}}{\alpha + \beta }\right) . \end{aligned}$$
(29)
In particular, by setting \(r = \alpha +\beta \), we recover Eq. (1). For \(\gamma = \mu \), Eq. (20) becomes
$$\begin{aligned}&D^{(\alpha , 0)}_r[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad = \frac{1}{\alpha ^2}\left\{ \mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \gamma I)]^{r} -I) - \log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \gamma I)]^{r}\right\} ,\nonumber \\ \end{aligned}$$
(30)
which reduces to Eq. (5) when \(A,B \in \mathrm{Sym}^{++}(n)\), \(\gamma = 0, r = \alpha \), and Eq. (21) becomes
$$\begin{aligned}&D^{(0, \beta )}_r[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad = \frac{1}{\beta ^2}\left\{ \mathrm{tr}_X([(B+\gamma I)^{-1}(A+ \gamma I)]^{r} -I) - \log \mathrm{det_X}[(B+\gamma I)^{-1}(A+ \gamma I)]^{r}\right\} , \end{aligned}$$
(31)
which reduces to Eq. (6) when \(A,B \in \mathrm{Sym}^{++}(n)\), \(\gamma = 0, r = \beta \).

Remark 4

As in the cases of the Log-Hilbert–Schmidt distance [29], the infinite-dimensional affine-invariant Riemannian distance [20, 23], and the infinite-dimensional Alpha Log-Det divergences [24], we show below that in general, the infinite-dimensional formulation is not obtainable as the limit of the finite-dimensional version as the dimension approaches infinity.

Metric properties Consider now a special case, where \(\alpha = \beta \) and \(r = \alpha + \beta \). For simplicity, we consider operators \((A+\gamma I)\) and \((B+ \mu I)\) with \(\gamma = \mu \). For \(\gamma > 0, \gamma \in {\mathbb {R}}\) fixed, we define the following subset of \(\mathrm{PTr}({\mathcal {H}})\)
$$\begin{aligned} \mathrm{PTr}({\mathcal {H}})(\gamma ) = \{A + \gamma I > 0: A^{*} = A, A \in \mathrm{Tr}({\mathcal {H}})\}. \end{aligned}$$
(32)

Remark 5

When \(\dim ({\mathcal {H}}) = \infty \), the condition \(A+\gamma I > 0\), with A being compact, automatically implies that \(\gamma > 0\). When \(\dim ({\mathcal {H}}) < \infty \), we can set \(\gamma = 0\).

Theorem 4

(Metric property) Let \(\gamma > 0, \gamma \in {\mathbb {R}}\) be fixed. The square root function \(\sqrt{D^{(\alpha ,\alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]}\) is a metric on \(\mathrm{PTr}({\mathcal {H}})(\gamma )\) for all \(\alpha \ge 0\).

Theorem 5

(Monotonicity) Let \((A+\gamma I), (B+ \gamma I) \in \mathrm{PTr}({\mathcal {H}})(\gamma )\) be fixed, with \(A \ne B\). As a function of \(\alpha \), \(f: [0, \infty ) \rightarrow {\mathbb {R}}^{+}\), \(f(\alpha ) = D^{(\alpha ,\alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]\), is strictly decreasing on \((0, \infty )\), with
$$\begin{aligned} f_{\max }&= f(0) = \frac{1}{2}d^2_{\mathrm{aiHS}}[(A+\gamma I), (B+ \gamma I)], \end{aligned}$$
(33)
$$\begin{aligned} \lim _{\alpha \rightarrow \infty } f(\alpha )&= 0. \end{aligned}$$
(34)
We thus have a family of metrics between positive definite operators of the form \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})(\gamma )\), parametrized by \(\alpha \ge 0\), monotonically decreasing as a function of \(\alpha \). In particular, with \(\alpha = 0\), we obtain the maximum metric, namely the affine-invariant Riemannian distance, and with \(\alpha = \frac{1}{2}\) we obtain the following metric, which is the square root of the infinite-dimensional Stein divergence
$$\begin{aligned} \sqrt{D^{(1/2, 1/2)}_{1}[(A+\gamma I), (B+ \gamma I)]} = 2\sqrt{\log \left[ \frac{\mathrm{det_X}\left[ \frac{(A+\gamma I)+(B+\gamma I)}{2}\right] }{\mathrm{det_X}(A+\gamma I)^{1/2}\mathrm{det_X}(B+\gamma I)^{1/2}}\right] }. \end{aligned}$$
The corresponding finite-dimensional result [9], where \(A,B \in \mathrm{Sym}^{++}(n)\), is recovered by setting \(\gamma = 0\) in Theorem 4. In particular, with \(\alpha =1/2\) and \(A,B\in \mathrm{Sym}^{++}(n)\), we obtain the corresponding result of [37]. To the best of our knowledge, even in the finite-dimensional setting, this monotonic behavior is new.
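The monotonic behavior of Theorem 5 is easy to observe numerically. In the case \(\gamma = \mu \), Eq. (28) with \(\alpha = \beta \) and \(r = 2\alpha \) reduces, per eigenvalue t of \((A+\gamma I)(B+\gamma I)^{-1}\), to \(\frac{1}{\alpha ^2}\log \cosh (\alpha \log t)\). A sketch (ours; the toy spectrum is arbitrary):

```python
import numpy as np

# eigenvalues t_i of (A + gamma I)(B + gamma I)^{-1} for a toy
# finite-rank pair with A != B (values chosen arbitrarily)
t = np.array([0.5, 1.7, 3.0])

def f(alpha):
    # D^{(alpha,alpha)}_{2alpha}: by Eq. (28) with alpha = beta, r = 2*alpha,
    # each eigenvalue t contributes (1/alpha^2) log((t^alpha + t^-alpha)/2)
    #                             = (1/alpha^2) log cosh(alpha * log t)
    return float(np.sum(np.log(np.cosh(alpha * np.log(t)))) / alpha**2)

alphas = np.linspace(0.1, 5.0, 50)
vals = [f(a) for a in alphas]
```

The values decrease strictly in \(\alpha \), start from \(\frac{1}{2}\sum _i (\log t_i)^2\) (half the squared affine-invariant distance, Eq. (33)) at \(\alpha \rightarrow 0^{+}\), and tend to 0 as \(\alpha \rightarrow \infty \) (Eq. (34)).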
Geometry of Alpha-Beta Log-Det divergences It is well known [9] that the Alpha-Beta Log-Det divergence induces the affine-invariant Riemannian metric on \(\mathrm{Sym}^{++}(n)\), independently of \(\alpha , \beta \), by neglecting the terms of order higher than two in the calculation of \(D^{(\alpha , \beta )}(A+dA,A)\), where dA is a small deviation lying in the tangent space of \(\mathrm{Sym}^{++}(n)\) at A. The following is the corresponding infinite-dimensional generalization, with the infinite-dimensional affine-invariant Riemannian metric (see Sect. 2) on the Hilbert manifold
$$\begin{aligned} \varSigma ({\mathcal {H}}) = \{ A + \gamma I > 0 \; : \; A = A^{*}, A \in \mathrm{HS}({\mathcal {H}}), \gamma \in {\mathbb {R}}\} \end{aligned}$$
(35)
restricted to the subset \(\mathrm{PTr}({\mathcal {H}})\) and the tangent space of \(\varSigma ({\mathcal {H}})\) restricted to the subspace \(\mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\).

Theorem 6

(Riemannian metric) Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and that \((A_1 + \gamma _1I), (A_2+\gamma _2I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). Then for \(s,t \in {\mathbb {R}}\),
$$\begin{aligned}&\frac{(\alpha + \beta )^2}{r^2}\frac{\partial ^2}{\partial s\partial t}D^{(\alpha , \beta )}_r[(A+\gamma I), (A+\gamma I) + s(A_1 + \gamma _1I) +t(A_2+\gamma _2I) ] \big \vert _{s=0, t=0} \end{aligned}$$
(36)
$$\begin{aligned}&\quad = \langle (A_1 + \gamma _1I), (A_2 + \gamma _2I)\rangle _{(A+\gamma I)}. \end{aligned}$$
(37)
This is the content of Theorem 25 and Corollary 2.

Remark 6

When \(r = \alpha +\beta \), the Riemannian metric induced by \(D^{(\alpha , \beta )}_r\), as stated in Theorem 6, is precisely the affine-invariant Riemannian metric given in Eq. (16) and is independent of \(\alpha , \beta \), as in the finite-dimensional case. Otherwise, it differs from the affine-invariant Riemannian metric by the multiplicative factor \(\frac{(\alpha +\beta )^2}{r^2}\).

The corresponding finite-dimensional result [9] is obtained when \(r= \alpha + \beta , A \in \mathrm{Sym}^{++}(n),A_1, A_2 \in \mathrm{Sym}(n)\), and \(\gamma = \gamma _1 = \gamma _2 = 0\), in which case Eq. (37) gives
$$\begin{aligned} \frac{\partial ^2}{\partial s\partial t}D^{(\alpha , \beta )}_{\alpha + \beta }[A, A+sA_1+tA_2] \big \vert _{s=0, t=0} = \langle A_1, A_2\rangle _{A} = \mathrm{tr}(A^{-1}A_1A^{-1}A_2). \end{aligned}$$
For \(\beta = 1-\alpha \), we recover the result of [6].
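The finite-dimensional identity above can be checked by numerical differentiation. A minimal NumPy sketch, with \(D^{(\alpha ,\beta )}_{\alpha +\beta }\) (\(\gamma = 0\)) computed through the eigenvalues of \(B^{-1}A\); the helper name `ab_logdet_div` is ours:

```python
import numpy as np

def ab_logdet_div(A, B, alpha, beta):
    """Finite-dimensional D^{(alpha,beta)}_{alpha+beta}[A, B] (gamma = 0),
    computed through the eigenvalues of B^{-1}A."""
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return np.sum(np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))) / (alpha * beta)

rng = np.random.default_rng(0)
n, alpha, beta = 4, 0.6, 0.9
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
S1 = rng.standard_normal((n, n)); A1 = (S1 + S1.T) / 2   # tangent directions in Sym(n)
S2 = rng.standard_normal((n, n)); A2 = (S2 + S2.T) / 2

# mixed second derivative of D[A, A + s*A1 + t*A2] at s = t = 0, by central differences
h = 1e-3
f = lambda s, t: ab_logdet_div(A, A + s * A1 + t * A2, alpha, beta)
d2 = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)

Ainv = np.linalg.inv(A)
assert abs(d2 - np.trace(Ainv @ A1 @ Ainv @ A2)) < 1e-4   # matches tr(A^{-1}A1 A^{-1}A2)
```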

Remark 7

We have shown that, generalizing the finite-dimensional case, the affine-invariant Riemannian metric on the set of positive definite trace class operators can be recovered from the second derivatives of the Alpha-Beta divergences. The relation between the third-order derivatives of the Alpha-Beta divergences and the duality of the affine connections on the manifold \(\varSigma ({\mathcal {H}})\) will be reported in a future work.

4 Infinite-dimensional Alpha-Beta Log-Determinant divergences

We now show the motivations and derivations leading to Definition 1. As we show below, this involves the generalization of Ky Fan’s inequality on the log-concavity of the determinant to the infinite-dimensional setting.

Exponential, logarithm, and power functions. We first show that the power function is well-defined on positive definite trace class operators via the exponential and logarithm maps. Consider first the exponential map \(\exp :{\mathcal {L}}({\mathcal {H}}) \rightarrow {\mathcal {L}}({\mathcal {H}})\) defined by \(\exp (A) = \sum _{j=0}^{\infty }\frac{A^j}{j!}\). The following result shows that \(\exp \) maps \(\mathrm{Tr}_X({\mathcal {H}})\) to \(\mathrm{Tr}_X({\mathcal {H}})\).

Lemma 1

Let \((A+\gamma I) \in \mathrm{Tr}_X({\mathcal {H}})\). Then \(\exp (A+\gamma I) \in \mathrm{Tr}_X({\mathcal {H}})\).

Consider next the inverse function \(\log = \exp ^{-1}: {\mathcal {L}}({\mathcal {H}}) \rightarrow {\mathcal {L}}({\mathcal {H}})\). For any \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\), \(\log (A+\gamma I)\) is always well-defined as follows. Let \(\{\lambda _k\}_{k=1}^{\infty }\) be the eigenvalues of A with corresponding orthonormal eigenvectors \(\{\phi _k\}_{k=1}^{\infty }\). Then
$$\begin{aligned} A = \sum _{k=1}^{\infty }\lambda _k \phi _k \otimes \phi _k, \;\;\; \log (A+ \gamma I) = \sum _{k=1}^{\infty }\log (\lambda _k + \gamma )\phi _k \otimes \phi _k, \end{aligned}$$
(38)
where \(\phi _k\otimes \phi _k: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is a rank-one operator defined by \((\phi _k \otimes \phi _k)w = \langle \phi _k, w\rangle \phi _k\) for all \(w \in {\mathcal {H}}\). Moreover, \(\log (A+\gamma I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\) and assumes the form
$$\begin{aligned} \log (A+\gamma I) = A_1 + \gamma _1 I, \;\;\; A_1 \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}({\mathcal {H}}), \gamma _1 \in {\mathbb {R}}. \end{aligned}$$
By Proposition 6 in [24], for any \(\alpha \in {\mathbb {R}}\), the power function \((A+\gamma I)^{\alpha }\) is then well-defined via the expression
$$\begin{aligned} (A+\gamma I)^{\alpha } = \exp [\alpha \log (A+\gamma I)] \in \mathrm{PTr}({\mathcal {H}}). \end{aligned}$$
For the purposes of the current work, we need to go beyond the set \(\mathrm{PTr}({\mathcal {H}})\). Specifically, for two operators \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), we show that
$$\begin{aligned} \log [(A+\gamma I)(B+ \mu I)^{-1}], \;\;\; [(A+\gamma I)(B+\mu I)^{-1}]^{\alpha }, \alpha \in {\mathbb {R}}\end{aligned}$$
(39)
are all well-defined and belong to \(\mathrm{Tr}_X({\mathcal {H}})\), even though they are no longer necessarily self-adjoint. First, let \(B \in {\mathcal {L}}({\mathcal {H}})\) be any invertible operator, then \(\forall A \in {\mathcal {L}}({\mathcal {H}})\),
$$\begin{aligned} \exp (BAB^{-1}) = \sum _{j=0}^{\infty }\frac{(BAB^{-1})^j}{j!} = B\left( \sum _{j=0}^{\infty }\frac{A^j}{j!}\right) B^{-1} = B\exp (A)B^{-1}. \end{aligned}$$
Thus for \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\), the logarithm of \(B(A+\gamma I)B^{-1} = BAB^{-1} + \gamma I \in \mathrm{Tr}_X({\mathcal {H}})\) is also well-defined and is given by
$$\begin{aligned} \log [B(A+\gamma I)B^{-1}]= & {} B\log (A+\gamma I)B^{-1} \nonumber \\= & {} B(A_1 + \gamma _1 I)B^{-1} = BA_1B^{-1} + \gamma _1 I \in \mathrm{Tr}_X({\mathcal {H}}).\qquad \end{aligned}$$
(40)
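The conjugation identity \(\exp (BAB^{-1}) = B\exp (A)B^{-1}\) used in this derivation is easy to confirm numerically; a small sketch using a truncated Taylor series for the matrix exponential (our own helper, adequate here since the operator norms involved are moderate):

```python
import numpy as np

def mat_exp(X, terms=40):
    """exp(X) by truncated Taylor series; accurate for matrices of moderate norm."""
    out = np.eye(len(X))
    term = np.eye(len(X))
    for j in range(1, terms):
        term = term @ X / j
        out = out + term
    return out

rng = np.random.default_rng(0)
A = 0.3 * rng.standard_normal((4, 4))
# unit-diagonal lower triangular perturbation of I: always invertible
B = np.eye(4) + 0.2 * np.tril(rng.standard_normal((4, 4)), -1)
Binv = np.linalg.inv(B)
assert np.allclose(mat_exp(B @ A @ Binv), B @ mat_exp(A) @ Binv)
```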
Using Eq. (40), we obtain the following results.

Proposition 1

Let \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\). Let \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\). Then
  1. 1.
    The logarithm \(\log [(A+\gamma I)(B+\mu I)^{-1}] \in \mathrm{Tr}_X({\mathcal {H}})\) is well-defined by
    $$\begin{aligned} \log \left[ (A+\gamma I)(B+\mu I)^{-1}\right] = (B+\mu I)^{1/2}\log \left( \varLambda + \frac{\gamma }{\mu }I\right) (B+ \mu I)^{-1/2}. \end{aligned}$$
    (41)
     
  2. 2.
    For any \(\alpha \in {\mathbb {R}}\), the power \([(A+\gamma I)(B+\mu I)^{-1}]^{\alpha } \in \mathrm{Tr}_X({\mathcal {H}})\) is well-defined by
    $$\begin{aligned} \left[ (A+\gamma I)(B+\mu I)^{-1}\right] ^{\alpha } = (B+\mu I)^{1/2}\left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\alpha }(B+ \mu I)^{-1/2}. \end{aligned}$$
    (42)
     
  3. 3.
    For any \(p, q \in {\mathbb {R}}\), any \(\alpha , \beta \in {\mathbb {R}}\) such that \(\alpha + \beta \ne 0\),
    $$\begin{aligned}&\mathrm{det_X}\left[ \frac{\alpha [(A+\gamma I)(B+\mu I)^{-1}]^{p} + \beta [(A+\gamma I)(B+ \mu I)^{-1}]^{q}}{\alpha + \beta }\right] \nonumber \\&\quad = \mathrm{det_X}\left[ \frac{\alpha (\varLambda + \frac{\gamma }{\mu } I )^{p} + \beta (\varLambda + \frac{\gamma }{\mu } I)^{q}}{\alpha + \beta }\right] . \end{aligned}$$
    (43)
     

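In finite dimensions, Eq. (42) can be verified directly: for \(\alpha = 1/2\), squaring \((B+\mu I)^{1/2}(\varLambda + \frac{\gamma }{\mu }I)^{1/2}(B+\mu I)^{-1/2}\) must recover \((A+\gamma I)(B+\mu I)^{-1}\). A NumPy sketch with \(\gamma = \mu = 0\) absorbed into SPD matrices (`spd_power` is our helper):

```python
import numpy as np

def spd_power(S, t):
    """S^t for a symmetric positive definite matrix S, via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**t) @ V.T

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + 4 * np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + 4 * np.eye(4)

B_half, B_mhalf = spd_power(B, 0.5), spd_power(B, -0.5)
Lam = B_mhalf @ A @ B_mhalf                  # Lambda = B^{-1/2} A B^{-1/2}, SPD
X = B_half @ spd_power(Lam, 0.5) @ B_mhalf   # [A B^{-1}]^{1/2} as in Eq. (42)
assert np.allclose(X @ X, A @ np.linalg.inv(B))   # squaring recovers A B^{-1}
```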
Generalization of Ky Fan’s inequality. We recall that in the case \(\dim ({\mathcal {H}}) < \infty \), the Log-Det divergences were motivated by Ky Fan’s inequality [10] on the log-concavity of the determinant, which states that for \(A,B \in \mathrm{Sym}^{++}(n)\) and \(0 \le \alpha \le 1\), \(\det (\alpha A + (1-\alpha )B) \ge \det (A)^{\alpha }\det (B)^{1-\alpha }\), with equality, for \(0< \alpha < 1\), if and only if \(A=B\). This inequality has recently been generalized to the infinite-dimensional setting for the extended Fredholm determinant (Theorem 1 in [24]). The following is a further generalization of Theorem 1 in [24].

Theorem 7

Let \(0 \le \alpha \le 1\). For \((A+\gamma I), (B+ \mu I) \in \mathrm{PTr}({\mathcal {H}})\), for any \(p,q \in {\mathbb {R}}\),
$$\begin{aligned}&\mathrm{det_X}[\alpha (A+\gamma I)^p + (1-\alpha )(B+\mu I)^q] \nonumber \\&\quad \ge \left( \frac{\gamma ^p}{\mu ^q}\right) ^{\alpha -\delta }\mathrm{det_X}(A+\gamma I)^{p\delta }\mathrm{det_X}(B+\mu I)^{q(1-\delta )}, \end{aligned}$$
(44)
where \(\delta = \frac{\alpha \gamma ^{p}}{\alpha \gamma ^p + (1-\alpha ) \mu ^q}\). For \(0< \alpha < 1\), equality happens if and only if
$$\begin{aligned} \left( \frac{A}{\gamma }+I\right) ^{p} = \left( \frac{B}{\mu }+I\right) ^{q} \;\;\;\text {and}\;\;\; \gamma ^p = \mu ^q \Longleftrightarrow (A+\gamma I)^p = (B+\mu I)^q. \end{aligned}$$
(45)
For \(\gamma = \mu \ne 1\), equality happens if and only if simultaneously \(p = q\) and \(A = B\).

In particular, for \(p=q=1\), we recover Theorem 1 in [24]. From Theorem 7, we immediately have the following result.

Corollary 1

Let \(\alpha > 0\), \(\beta > 0\). For \((A+\gamma I), (B+ \mu I) \in \mathrm{PTr}({\mathcal {H}})\), for any \(p,q \in {\mathbb {R}}\),
$$\begin{aligned}&\mathrm{det_X}\left[ \frac{\alpha (A+\gamma I)^p + \beta (B+\mu I)^q}{\alpha + \beta }\right] \nonumber \\&\quad \ge \left( \frac{\gamma ^p}{\mu ^q}\right) ^{\frac{\alpha }{\alpha +\beta }-\delta }\mathrm{det_X}(A+\gamma I)^{p\delta }\mathrm{det_X}(B+\mu I)^{q(1-\delta )}, \end{aligned}$$
(46)
where \(\delta = \frac{\alpha \gamma ^{p}}{\alpha \gamma ^p + \beta \mu ^q}\), \(1-\delta = \frac{\beta \mu ^q}{\alpha \gamma ^p + \beta \mu ^q}\). Equality happens if and only if \((A+\gamma I)^p = (B+\mu I)^q\). For \(\gamma = \mu \ne 1\), equality happens if and only if \(p = q\) and \(A = B\).
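The classical Ky Fan inequality recalled above, which motivates these results, is easy to confirm numerically in finite dimensions:

```python
import numpy as np

def slogdet(S):
    """log det(S) for SPD S, via the numerically stable slogdet."""
    return np.linalg.slogdet(S)[1]

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5)); A = M @ M.T + np.eye(5)
N = rng.standard_normal((5, 5)); B = N @ N.T + np.eye(5)
# log-concavity of the determinant: det(aA + (1-a)B) >= det(A)^a det(B)^(1-a)
for a in (0.1, 0.5, 0.9):
    assert slogdet(a * A + (1 - a) * B) >= a * slogdet(A) + (1 - a) * slogdet(B)
# equality when A = B
assert abs(slogdet(0.3 * A + 0.7 * A) - slogdet(A)) < 1e-10
```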

Definition of the Alpha-Beta Log-Det divergences. Motivated by Theorem 7 and Corollary 1, we first define the following.

Definition 4

Let \(\alpha > 0\), \(\beta > 0\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), for \(p,q \in {\mathbb {R}}\), define
$$\begin{aligned}&D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{1}{\alpha \beta }\log \left[ \left( \frac{\gamma }{\mu }\right) ^{(p+q)(\delta -\frac{\alpha }{\alpha +\beta })} \mathrm{det_X}\left( \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I)^p + \beta (\varLambda +\frac{\gamma }{\mu }I)^{-q}}{\alpha + \beta }\right) \right] , \end{aligned}$$
(47)
where \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\), \(\delta = \frac{\alpha (\frac{\gamma }{\mu })^{p+q}}{\alpha (\frac{\gamma }{\mu })^{p+q} + \beta }\).

The following theorem gives sufficient conditions for \(p,q \in {\mathbb {R}}\), with \(\alpha> 0, \beta > 0\) being fixed, so that for a given pair of operators \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), the quantity \(D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+ \mu I)] \) in Definition 4 is nonnegative, with equality if and only if \(A = B\) and \(\gamma = \mu \).

Theorem 8

Let \(\alpha > 0\), \(\beta > 0\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), assume that \(p,q \in {\mathbb {R}}\) satisfy the following conditions
$$\begin{aligned} p+q&\ne 0, \end{aligned}$$
(48)
$$\begin{aligned} \alpha p\left( \frac{\gamma }{\mu }\right) ^{p+q}&= \beta q . \end{aligned}$$
(49)
Then the quantity \(D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+ \mu I)]\) satisfies
$$\begin{aligned} D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+ \mu I)]&\ge 0, \end{aligned}$$
(50)
$$\begin{aligned} D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+ \mu I)]&= 0 \Longleftrightarrow A = B, \gamma = \mu . \end{aligned}$$
(51)

Subsequently, we assume that conditions (48) and (49) are satisfied. Note that p and q are not uniquely determined by (49). One way to enforce the uniqueness of p and q is by fixing the sum \(p+q\). Adopting this approach, we obtain Definition 1.

Theorem 9

Under the hypothesis of Theorem 8, assume further that \(p+q = r, \; r \in {\mathbb {R}}, r \ne 0\), r fixed. Under this condition, in Definition 4, we have
$$\begin{aligned} \delta = \frac{\alpha (\frac{\gamma }{\mu })^{r}}{\alpha (\frac{\gamma }{\mu })^{r} + \beta }, \;\;\;p = r(1-\delta ) = \frac{\beta r}{\alpha (\frac{\gamma }{\mu })^r + \beta },\;\;\; q =r\delta = \frac{\alpha r (\frac{\gamma }{\mu })^r}{\alpha (\frac{\gamma }{\mu })^r + \beta }. \end{aligned}$$
(52)
Plugging the expressions for p and q in Eq. (52) into Definition 4, we obtain Definition 1. Furthermore, the formulas given in Eqs. (18) and (19) are equivalent.
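The reparametrization in Eq. (52) can be sketched in a few lines; the helper below (our naming) computes \(\delta , p, q\) from r and checks conditions (48) and (49):

```python
def pq_from_r(alpha, beta, gamma, mu, r):
    """Given r = p + q, return (delta, p, q) per Eq. (52), satisfying Eqs. (48)-(49)."""
    t = (gamma / mu) ** r
    delta = alpha * t / (alpha * t + beta)
    p = r * (1 - delta)   # = beta * r / (alpha * t + beta)
    q = r * delta         # = alpha * r * t / (alpha * t + beta)
    return delta, p, q

delta, p, q = pq_from_r(alpha=1.2, beta=0.8, gamma=2.0, mu=3.0, r=1.5)
assert abs(p + q - 1.5) < 1e-12                                  # condition (48): p + q = r
assert abs(1.2 * p * (2.0 / 3.0) ** (p + q) - 0.8 * q) < 1e-12   # condition (49)
```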

We now show how \(D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+ \mu I)]\) can be expressed concretely in terms of the standard Fredholm determinant.

Theorem 10

Let \(\alpha > 0\), \(\beta > 0\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), assume that \(p,q \in {\mathbb {R}}\) satisfy conditions (48) and (49) in Theorem 8. Then
$$\begin{aligned}&D^{(\alpha , \beta )}_{(p,q)}[(A+\gamma I), (B+\mu I)] = \frac{(p+q)(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta }\left( \log \frac{\gamma }{\mu }\right) \nonumber \\&\quad +\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) +\frac{1}{\alpha \beta }\log \det \left[ \frac{\alpha (\varLambda +\frac{\gamma }{\mu }I)^p + \beta (\varLambda + \frac{\gamma }{\mu }I)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] . \end{aligned}$$
(53)

4.1 Limiting cases

We next consider the motivations leading to the limiting cases in Definition 2, namely \(\beta \rightarrow 0\) when \(\alpha > 0\) is fixed, and \(\alpha \rightarrow 0\) when \(\beta > 0\) is fixed. In particular, our definitions of \(D^{(\alpha ,0)}_r[(A+\gamma I), (B+\mu I)]\), \(\alpha > 0\), and \(D^{(0,\beta )}_r[(A+\gamma I), (B+\mu I)]\), \(\beta > 0\), as given in Definition 2, are based on the respective limits in Theorems 11 and 12 below.

Theorem 11

(Limiting case \(\alpha > 0, \beta \rightarrow 0\)) Let \(\alpha > 0\) be fixed. Assume that \(r = r(\beta )\) is smooth, with \(r(0) = r(\beta = 0)\). Then
$$\begin{aligned}&\lim _{\beta \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{r(0)}{\alpha ^2}\left[ \left( \frac{\mu }{\gamma }\right) ^{r(0)} -1\right] \log \frac{\mu }{\gamma } \nonumber \\&\quad \quad +\frac{1}{\alpha ^2}\mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \mu I)]^{r(0)} -I) \nonumber \\&\quad \quad - \frac{1}{\alpha ^2}\left( \frac{\mu }{\gamma }\right) ^{r(0)}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \mu I)]^{r(0)}. \end{aligned}$$
(54)

Theorem 12

(Limiting case \(\alpha \rightarrow 0, \beta > 0\)) Let \(\beta > 0\) be fixed. Assume that \(r = r(\alpha )\) is smooth, with \(r(0) = r(\alpha = 0)\). Then
$$\begin{aligned}&\lim _{\alpha \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{r(0)}{\beta ^2}\left[ \left( \frac{\gamma }{\mu }\right) ^{r(0)} -1\right] \log \frac{\gamma }{\mu } \nonumber \\&\quad \quad +\frac{1}{\beta ^2}\mathrm{tr}_X([(B+\mu I)^{-1}(A+ \gamma I)]^{r(0)} -I) \nonumber \\&\quad \quad - \frac{1}{\beta ^2}\left( \frac{\gamma }{\mu }\right) ^{r(0)}\log \mathrm{det_X}[(B+\mu I)^{-1}(A+ \gamma I)]^{r(0)}. \end{aligned}$$
(55)

Special cases. Let us now describe several special cases of Theorems 11 and 12, including their specialization to the finite-dimensional setting.

(i) For \(\gamma = \mu \), we have
$$\begin{aligned}&\lim _{\beta \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad = \frac{1}{\alpha ^2}\mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \gamma I)]^{r(0)} -I) \nonumber \\&\quad \quad - \frac{1}{\alpha ^2}\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \gamma I)]^{r(0)}, \end{aligned}$$
(56)
$$\begin{aligned}&\lim _{\alpha \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad =\frac{1}{\beta ^2}\mathrm{tr}_X([(B+ \gamma I)^{-1}(A+ \gamma I)]^{r(0)} -I) \nonumber \\&\quad \quad - \frac{1}{\beta ^2}\log \mathrm{det_X}[(B+\gamma I)^{-1}(A+ \gamma I)]^{r(0)}. \end{aligned}$$
(57)
In particular, for \(r = \alpha + \beta \), we have \(r(\beta =0) = \alpha \), \(r(\alpha = 0) = \beta \), so that
$$\begin{aligned}&\lim _{\beta \rightarrow 0}D^{(\alpha , \beta )}_{\alpha +\beta }[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad =\frac{1}{\alpha ^2}\left\{ \mathrm{tr}_X([(A+\gamma I)^{-1}(B+ \gamma I)]^{\alpha } -I) - \alpha \log \mathrm{det_X}[(A+\gamma I)^{-1}(B+ \gamma I)]\right\} , \end{aligned}$$
(58)
$$\begin{aligned}&\lim _{\alpha \rightarrow 0}D^{(\alpha , \beta )}_{\alpha +\beta }[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad =\frac{1}{\beta ^2}\left\{ \mathrm{tr}_X([(B+\gamma I)^{-1}(A+ \gamma I)]^{\beta } -I) - \beta \log \mathrm{det_X}[(B+\gamma I)^{-1}(A+ \gamma I)]\right\} . \end{aligned}$$
(59)
These are the direct generalizations of the corresponding finite-dimensional formulas. In fact, for \(A,B \in \mathrm{Sym}^{++}(n)\), \(n \in {\mathbb {N}}\), by setting \(\gamma = 0\), we obtain
$$\begin{aligned} \lim _{\beta \rightarrow 0}D^{(\alpha , \beta )}_{\alpha +\beta }[A,B]&=\frac{1}{\alpha ^2}\left\{ \mathrm{tr}[(A^{-1}B)^{\alpha } -I] - \alpha \log \det (A^{-1}B)\right\} , \end{aligned}$$
(60)
$$\begin{aligned} \lim _{\alpha \rightarrow 0}D^{(\alpha , \beta )}_{\alpha +\beta }[A,B]&=\frac{1}{\beta ^2}\left\{ \mathrm{tr}[(B^{-1}A)^{\beta } -I] - \beta \log \det (B^{-1}A)\right\} . \end{aligned}$$
(61)
These are precisely the finite-dimensional expressions given by Eqs. (5) and (6), which are Eqs. (23) and (22) in [9], respectively.
(ii) If \(r(0) = r(\beta = 0) = 1\), we have for \(\alpha > 0\) fixed,
$$\begin{aligned}&\lim _{\beta \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] =\frac{1}{\alpha ^2}\left( \frac{\mu }{\gamma }-1\right) \log \frac{\mu }{\gamma } \nonumber \\&\qquad + \frac{1}{\alpha ^2}\left\{ \mathrm{tr_X}[(A+\gamma I)^{-1}(B+\mu I) - I] - \frac{\mu }{\gamma }\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+\mu I)]\right\} \nonumber \\&\quad = \frac{1}{\alpha ^2}d^{-1}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
(62)
Similarly, if \(r(0) = r(\alpha = 0) =1\), we have for \(\beta > 0\) fixed,
$$\begin{aligned}&\lim _{\alpha \rightarrow 0}D^{(\alpha , \beta )}_r[(A+\gamma I), (B+\mu I)] =\frac{1}{\beta ^2}\left( \frac{\gamma }{\mu }-1\right) \log \frac{\gamma }{\mu } \nonumber \\&\qquad + \frac{1}{\beta ^2}\left\{ \mathrm{tr_X}[(B+\mu I)^{-1}(A+\gamma I) - I] - \frac{\gamma }{\mu }\log \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]\right\} \nonumber \\&\quad = \frac{1}{\beta ^2}d^{1}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
(63)
In particular, if \(r\equiv 1\) as a constant function, then with \(\beta = 1 - \alpha \), we have
$$\begin{aligned} \lim _{\alpha \rightarrow 1}D^{(\alpha , 1- \alpha )}_1[(A+\gamma I), (B+ \mu I)]&= d^{-1}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)] \\ \lim _{\alpha \rightarrow 0}D^{(\alpha , 1- \alpha )}_1[(A+\gamma I), (B+ \mu I)]&= d^{1}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)], \end{aligned}$$
which are precisely the limiting cases stated in Eqs. (70) and (71) in Theorem 13.
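The limits in case (i) above (Eqs. (60) and (61), where \(\mathrm{tr_X}\) and \(\mathrm{det_X}\) reduce to the ordinary trace and determinant) can be checked numerically by taking \(\beta \) very small; a sketch with our helper `ab_logdet_div` for \(D^{(\alpha ,\beta )}_{\alpha +\beta }\) with \(\gamma = 0\):

```python
import numpy as np

def ab_logdet_div(A, B, alpha, beta):
    """Finite-dimensional D^{(alpha,beta)}_{alpha+beta}[A, B] (gamma = 0)."""
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return np.sum(np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))) / (alpha * beta)

def limit_beta_zero(A, B, alpha):
    """Right-hand side of Eq. (60): (1/alpha^2){tr[(A^{-1}B)^alpha - I] - alpha log det(A^{-1}B)}."""
    nu = np.linalg.eigvals(np.linalg.solve(A, B)).real   # eigenvalues of A^{-1}B
    return (np.sum(nu**alpha - 1.0) - alpha * np.sum(np.log(nu))) / alpha**2

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + 4 * np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + 4 * np.eye(4)
# beta = 1e-6 approximates the limit beta -> 0
assert abs(ab_logdet_div(A, B, 0.75, 1e-6) - limit_beta_zero(A, B, 0.75)) < 1e-4
```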

4.2 Infinite-dimensional Alpha Log-Determinant divergences

We now show that the formulation for the infinite-dimensional Alpha Log-Det divergences in [24] is a special case of the present formulation, with \(\beta = 1-\alpha \) and \(r=\pm 1\). Let \(\dim ({\mathcal {H}}) = \infty \). For \(-1< \alpha < 1\), the Log-Det \(\alpha \)-divergence \(d^{\alpha }_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]\) for \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\) is defined to be [24]
$$\begin{aligned}&d^{\alpha }_{\mathrm{logdet}} [(A+\gamma I), (B+ \mu I)] \nonumber \\&\quad = \frac{4}{1-\alpha ^2} \log \left[ \frac{\mathrm{det_X}\left( \frac{1-\alpha }{2}(A+\gamma I) + \frac{1+\alpha }{2}(B+\mu I)\right) }{\mathrm{det_X}(A+\gamma I)^{q}\mathrm{det_X}(B + \mu I)^{1-q}}\left( \frac{\gamma }{\mu }\right) ^{q - \frac{1-\alpha }{2}}\right] , \end{aligned}$$
(64)
where \(q = \frac{(1-\alpha )\gamma }{(1-\alpha ) \gamma + (1+\alpha )\mu }\), with the limiting cases \(\alpha = \pm 1\) given by
$$\begin{aligned} d^{1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]&= \left( \frac{\gamma }{\mu }-1\right) \log \frac{\gamma }{\mu } + \mathrm{tr_X}[(B+\mu I)^{-1}(A+\gamma I) - I] \nonumber \\&\quad - \frac{\gamma }{\mu }\log \mathrm{det_X}[(B+\mu I)^{-1}(A+\gamma I)]. \end{aligned}$$
(65)
$$\begin{aligned} d^{-1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]&= \left( \frac{\mu }{\gamma }-1\right) \log \frac{\mu }{\gamma } + \mathrm{tr_X}\left[ (A+\gamma I)^{-1}(B+\mu I) - I\right] \nonumber \\&\quad - \frac{\mu }{\gamma }\log \mathrm{det_X}[(A+\gamma I)^{-1}(B+\mu I)]. \end{aligned}$$
(66)
In Definition 1, with \(0< \alpha < 1, \beta = 1-\alpha \), \(\delta = \frac{\alpha (\frac{\gamma }{\mu })^{r}}{\alpha (\frac{\gamma }{\mu })^{r} + 1-\alpha }\), we have
$$\begin{aligned}&D^{(\alpha , 1-\alpha )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{1}{\alpha (1-\alpha )}\log \left[ \left( \frac{\gamma }{\mu }\right) ^{r(\delta -\alpha )} \mathrm{det_X}\left( {\alpha \left( \varLambda +\frac{\gamma }{\mu }I\right) ^{r(1-\delta )} + (1-\alpha )\left( \varLambda +\frac{\gamma }{\mu }I\right) ^{-r\delta }}\right) \right] . \end{aligned}$$
(67)
The following result shows that, for \(r = 1\) and \(r = -1\), \(D^{(\alpha , 1- \alpha )}_{r}[(A+\gamma I), (B+\mu I)]\) equals \(d^{1-2\alpha }_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]\) and \(d^{2\alpha -1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]\), respectively.

Theorem 13

(Alpha Log-Determinant divergences) Let \(0< \alpha <1\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\),
$$\begin{aligned}&D^{(\alpha , 1-\alpha )}_1[(A+\gamma I), (B+\mu I)]= d^{1-2\alpha }_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{\delta -\alpha }{\alpha (1-\alpha )}\log \frac{\gamma }{\mu } + \frac{1}{\alpha (1-\alpha )}\log \left[ \frac{\mathrm{det_X}[\alpha (A+\gamma I) + (1-\alpha )(B+\mu I)]}{\mathrm{det_X}(A+\gamma I)^{\delta }\mathrm{det_X}(B+\mu I)^{1-\delta }}\right] , \end{aligned}$$
(68)
where \(\delta = \frac{\alpha \gamma }{\alpha \gamma + (1-\alpha )\mu }\). Similarly,
$$\begin{aligned} D^{(\alpha , 1-\alpha )}_{-1}[(A+\gamma I), (B+\mu I)] = d^{2\alpha -1}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$
(69)
At the endpoints \(\alpha = 0\) and \(\alpha =1\),
$$\begin{aligned} \lim _{\alpha \rightarrow 1}D^{(\alpha , 1- \alpha )}_1[(A+\gamma I), (B+ \mu I)]&= d^{-1}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)] \end{aligned}$$
(70)
$$\begin{aligned} \lim _{\alpha \rightarrow 0}D^{(\alpha , 1- \alpha )}_1[(A+\gamma I), (B+ \mu I)]&= d^{1}_{\mathrm{logdet}}[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
(71)
In particular, in Theorem 13, for \(\gamma = \mu \), we have \(\delta = \alpha \), and
$$\begin{aligned}&D^{(\alpha , 1-\alpha )}_1[(A+\gamma I), (B+\gamma I)] \nonumber \\&\quad = \frac{1}{\alpha (1-\alpha )}\log \left[ \frac{\mathrm{det_X}[\alpha (A+\gamma I) + (1-\alpha )(B+ \gamma I)]}{\mathrm{det_X}(A+\gamma I)^{\alpha }\mathrm{det_X}(B+\gamma I)^{1-\alpha }}\right] , \end{aligned}$$
(72)
which directly generalizes the finite-dimensional formula given by Eq. (6) in [6].
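Since \(r = \alpha + \beta = 1\) when \(\beta = 1-\alpha \), this equality is easy to check numerically in finite dimensions (\(\gamma = 0\)); both helper names below are ours:

```python
import numpy as np

def ab_logdet_div(A, B, alpha, beta):
    """Finite-dimensional D^{(alpha,beta)}_{alpha+beta}[A, B] (gamma = 0)."""
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return np.sum(np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))) / (alpha * beta)

def alpha_logdet_div(A, B, alpha):
    """Determinant-ratio form of Eq. (72) with gamma = 0, i.e. Eq. (6) in [6]."""
    ld = lambda S: np.linalg.slogdet(S)[1]
    return (ld(alpha * A + (1 - alpha) * B) - alpha * ld(A) - (1 - alpha) * ld(B)) / (alpha * (1 - alpha))

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + np.eye(4)
for a in (0.2, 0.5, 0.8):
    assert np.isclose(ab_logdet_div(A, B, a, 1 - a), alpha_logdet_div(A, B, a))
```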

Remark 8

(Beta Log-Determinant divergences)

In the finite-dimensional setting in [9], the authors call \(D^{(1,\beta )}(A,B)\) the Beta Log-Determinant divergence between \(A,B \in \mathrm{Sym}^{++}(n)\). Similarly, in the case \(\dim ({\mathcal {H}}) = \infty \), let \(\beta > 0\) be fixed and let \(r \in {\mathbb {R}}\), \(r \ne 0\) be fixed. For \((A+\gamma I), (B+\mu I) \in \mathrm{PTr}({\mathcal {H}})\), we then have the corresponding infinite-dimensional Beta Log-Determinant divergence
$$\begin{aligned}&D^{(1, \beta )}_r[(A+\gamma I), (B+\mu I)] \nonumber \\&\quad = \frac{1}{\beta }\log \left[ \left( \frac{\gamma }{\mu }\right) ^{r(\delta -\frac{1}{1+\beta })} \mathrm{det_X}\left( \frac{(\varLambda +\frac{\gamma }{\mu }I)^{r(1-\delta )} + \beta (\varLambda +\frac{\gamma }{\mu }I)^{-r\delta }}{1 + \beta }\right) \right] , \end{aligned}$$
(73)
where \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+\mu I)^{-1/2}\), \(\delta = \frac{(\frac{\gamma }{\mu })^{r}}{ (\frac{\gamma }{\mu })^{r} + \beta }\), \(1-\delta = \frac{\beta }{(\frac{\gamma }{\mu })^{r} + \beta }\). However, we do not explore this divergence in detail in this work.

5 Properties of the Alpha-Beta Log-Determinant divergences

The following results show that many important invariance properties of the finite-dimensional Alpha-Beta Log-Det divergences [6, 9], as well as the infinite-dimensional Alpha Log-Det divergences [24], hold true for the infinite-dimensional Alpha-Beta Log-Det divergences between positive definite trace class operators. There are, however, some subtle differences between the finite and infinite-dimensional settings, see e.g. Theorems 16 and 17 on the affine-invariance and unitary-invariance properties, respectively, below.

Theorem 14

(Dual symmetry)
$$\begin{aligned} D^{(\beta , \alpha )}_r[(B+ \mu I),(A+\gamma I)] = D^{(\alpha , \beta )}_r[(A+ \gamma I),(B+\mu I)]. \end{aligned}$$
(74)
In particular, for \(\beta = \alpha \), we have
$$\begin{aligned} D^{(\alpha , \alpha )}_r[(B+ \mu I),(A+\gamma I)] = D^{(\alpha , \alpha )}_r[(A+ \gamma I),(B+\mu I)]. \end{aligned}$$
(75)

In particular, Eq. (75) provides the symmetry property for the parametrized family of symmetric divergences on \(\mathrm{PTr}({\mathcal {H}})\) stated in Theorem 2, as well as for the parametrized family of metrics stated in Theorem 4.
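In finite dimensions (\(\gamma = 0\), \(r = \alpha + \beta \)), the dual symmetry of Theorem 14 can be observed numerically; sketch with our helper `ab_logdet_div`:

```python
import numpy as np

def ab_logdet_div(A, B, alpha, beta):
    """Finite-dimensional D^{(alpha,beta)}_{alpha+beta}[A, B] (gamma = 0)."""
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return np.sum(np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))) / (alpha * beta)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + np.eye(4)
# dual symmetry, Eq. (74): D^{(beta,alpha)}[B, A] = D^{(alpha,beta)}[A, B]
assert np.isclose(ab_logdet_div(B, A, 0.7, 0.3), ab_logdet_div(A, B, 0.3, 0.7))
# symmetry for alpha = beta, Eq. (75)
assert np.isclose(ab_logdet_div(A, B, 0.5, 0.5), ab_logdet_div(B, A, 0.5, 0.5))
```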

Special case: Dual symmetry of the infinite-dimensional Alpha Log-Det divergences. By Theorem 13, we have for \(0 \le \alpha \le 1\),
$$\begin{aligned}&D^{(\alpha , 1-\alpha )}_1[(A+\gamma I), (B + \mu I)] = D^{(1-\alpha , \alpha )}_1[(B+\mu I), (A + \gamma I)] \nonumber \\&\Longleftrightarrow d^{1-2\alpha }_{\mathrm{logdet}}[(A+\gamma I), (B + \mu I)] = d^{-(1-2\alpha )}_{\mathrm{logdet}}[(B+\mu I), (A + \gamma I)]. \end{aligned}$$
(76)
This is precisely the dual symmetry of the infinite-dimensional Alpha Log-Det divergences (Theorem 4 in [24]).

Theorem 15

(Dual invariance under inversion)
$$\begin{aligned} D^{(\alpha , \beta )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}]&= D^{(\alpha , \beta )}_{-r}[(A+\gamma I), (B+\mu I)], \end{aligned}$$
(77)
$$\begin{aligned} D^{(\alpha , \beta )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}]&= D^{(\beta , \alpha )}_{r}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$
(78)
In particular, for \(\alpha = \beta \), Eq. (78) gives the inversion-invariance property of \(D^{(\alpha , \alpha )}_r\),
$$\begin{aligned} D^{(\alpha , \alpha )}_r[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(\alpha , \alpha )}_{r}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$
(79)
Special case: Dual invariance under inversion of the infinite-dimensional Alpha Log-Det divergences. By Eq. (77) and Theorem 13,
$$\begin{aligned} D^{(\alpha , 1-\alpha )}_1[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(\alpha , 1-\alpha )}_{-1}[(A+\gamma I), (B+\mu I)] \nonumber \\ \Longleftrightarrow d^{1-2\alpha }_{\mathrm{logdet}}[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = d^{-(1-2\alpha )}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$
(80)
This is precisely the dual invariance under inversion of the infinite-dimensional Alpha Log-Det divergences (Theorem 5 in [24]). The same property follows from Eq. (78) and Theorem 13, namely
$$\begin{aligned} D^{(\alpha , 1-\alpha )}_1[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = D^{(1-\alpha , \alpha )}_{1}[(A+\gamma I), (B+\mu I)] \nonumber \\ \Longleftrightarrow d^{1-2\alpha }_{\mathrm{logdet}}[(A+\gamma I)^{-1}, (B+\mu I)^{-1}] = d^{-(1-2\alpha )}_{\mathrm{logdet}}[(A+\gamma I), (B+\mu I)]. \end{aligned}$$
(81)
The following is the generalization of the affine-invariance property of the Alpha-Beta Log-Det divergences on \(\mathrm{Sym}^{++}(n)\). In contrast to the finite-dimensional setting, in general, we do not consider transformations of the form \((A+\gamma I) \rightarrow C(A+\gamma I)C^{*}\) for an invertible \(C \in {\mathcal {L}}({\mathcal {H}})\), but those of the form \((A+\gamma I) \rightarrow (C+\nu I)(A+\gamma I)(C+\nu I)^{*}\), for an invertible \((C+\nu I) \in \mathrm{Tr}_X({\mathcal {H}})\), \(\nu \ne 0\). The latter condition ensures that \((C+\nu I)(A+\gamma I)(C+\nu I)^{*} \in \mathrm{PTr}({\mathcal {H}})\) (see also the next property below).

Theorem 16

(Affine invariance) For any \((A+\gamma I), (B + \mu I) \in \mathrm{PTr}({\mathcal {H}})\) and any invertible \((C+\nu I) \in \mathrm{Tr}_X({\mathcal {H}})\), \(\nu \ne 0\),
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(C+\nu I)(A+\gamma I)(C+\nu I)^{*}, (C+\nu I)(B+ \mu I)(C+\nu I)^{*}] \nonumber \\&\quad =D^{(\alpha , \beta )}_r[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
(82)

For a general invertible operator \(C \in {\mathcal {L}}({\mathcal {H}})\), the affine transformation gives \((A+\gamma I) \rightarrow C(A+\gamma I)C^{*} = CAC^{*} + \gamma CC^{*} \notin \mathrm{PTr}({\mathcal {H}})\) in general. Only in the special case \(CC^{*} = C^{*}C = I\) do we have the unitary transformation \((A+\gamma I) \rightarrow C(A+\gamma I)C^{*} = CAC^{*} + \gamma I \in \mathrm{PTr}({\mathcal {H}})\), and consequently the following generalization of the unitary-invariance property on \(\mathrm{Sym}^{++}(n)\) (thus, in contrast to the finite-dimensional setting, this is not a special case of the affine-invariance property in Theorem 16).

Theorem 17

(Invariance under unitary transformations) For any \((A+\gamma I), (B+ \mu I) \in \mathrm{PTr}({\mathcal {H}})\) and any \(C \in {\mathcal {L}}({\mathcal {H}})\), with \(CC^{*} = C^{*}C = I\),
$$\begin{aligned} D^{(\alpha , \beta )}_r[C(A+\gamma I)C^{*}, C(B+ \mu I)C^{*}] =D^{(\alpha , \beta )}_r[(A+\gamma I), (B+ \mu I)]. \end{aligned}$$
(83)
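In finite dimensions (\(\gamma = 0\)) the two invariance properties coincide, and both can be checked numerically; a sketch with our helper `ab_logdet_div` for \(D^{(\alpha ,\beta )}_{\alpha +\beta }\):

```python
import numpy as np

def ab_logdet_div(A, B, alpha, beta):
    """Finite-dimensional D^{(alpha,beta)}_{alpha+beta}[A, B] (gamma = 0)."""
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return np.sum(np.log((alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta))) / (alpha * beta)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A = M @ M.T + np.eye(4)
N = rng.standard_normal((4, 4)); B = N @ N.T + np.eye(4)
C = np.tril(rng.standard_normal((4, 4)), -1) + np.eye(4)  # unit lower triangular: invertible
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))          # orthogonal (real unitary case)

d = ab_logdet_div(A, B, 0.6, 0.8)
assert np.isclose(ab_logdet_div(C @ A @ C.T, C @ B @ C.T, 0.6, 0.8), d)  # congruence invariance
assert np.isclose(ab_logdet_div(Q @ A @ Q.T, Q @ B @ Q.T, 0.6, 0.8), d)  # unitary invariance
```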

The following properties effectively state that \(D^{(\alpha ,\beta )}_r[(A+ \gamma I), (B+ \mu I)]\) can be expressed in terms of the eigenvalues of \(\varLambda \in \mathrm{Tr}({\mathcal {H}})\), where \(\varLambda + \frac{\gamma }{\mu }I = (B+\mu I)^{-1/2}(A+\gamma I)(B+ \mu I)^{-1/2}\). These properties will be used in particular in the proof of Theorem 22 on the triangle inequality below.

Theorem 18

$$\begin{aligned} D^{(\alpha ,\beta )}_r[(A+ \gamma I), (B+ \mu I)] = D^{(\alpha , \beta )}_r\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) , I\right] . \end{aligned}$$
(84)

Theorem 19

Let \(\omega \in {\mathbb {R}}, \omega \ne 0\) be arbitrary. Then
$$\begin{aligned} D^{(\omega \alpha , \omega \beta )}_{\omega r}[(A+\gamma I), (B+ \mu I)] = \frac{1}{\omega ^2}D^{(\alpha , \beta )}_{r}\left[ \left( \varLambda + \frac{\gamma }{\mu }I\right) ^{\omega }, I\right] . \end{aligned}$$
(85)

The following properties are important for proving that the square root \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \gamma I)]}\) is a metric on \(\mathrm{PTr}({\mathcal {H}})(\gamma )\), as stated in Theorem 4. We focus on the case \(\alpha > 0\), since for \(\alpha = 0\), \(\sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+ \mu I)]} = \frac{1}{\sqrt{2}}d_{\mathrm{aiHS}}[(A+\gamma I), (B+ \mu I)]\) is automatically a metric on \(\mathrm{PTr}({\mathcal {H}})\).

The following shows that for \(\gamma = \mu = 1\), the divergence \(D^{(\alpha , \alpha )}_{2\alpha }[(I+A), (I+B)]\) can be obtained as the limit of a sequence of divergences \(D^{(\alpha , \alpha )}_{2\alpha }[(I+A_n), (I+B_n)]\), where \(\{A_n\}_{n \in {\mathbb {N}}}, \{B_n\}_{n \in {\mathbb {N}}}\) are sequences of finite-rank operators converging to A and B, respectively, in the trace norm.

Theorem 20

(Convergence in trace norm) Let \(\alpha > 0\) be fixed. Let \({\mathcal {H}}\) be a separable Hilbert space, \(A,B:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint, trace class operators with \((I + A)> 0,(I+B) > 0\). Let \(\{A_n\}_{n\in {\mathbb {N}}},\{B_n\}_{n\in {\mathbb {N}}}\) be sequences of self-adjoint, trace-class operators such that \(\lim _{n\rightarrow \infty }||A_n - A||_{\mathrm{tr}} = 0,\lim _{n\rightarrow \infty }||B_n - B||_{\mathrm{tr}} = 0\). Then
$$\begin{aligned} \lim _{n \rightarrow \infty }D^{(\alpha , \alpha )}_{2\alpha }[(I+A_n), (I+B_n)] = D^{(\alpha , \alpha )}_{2\alpha }[(I+A), (I+B)]. \end{aligned}$$
(86)

The following result gives a lower bound for the divergence \(D^{(\alpha , \alpha )}_{2\alpha }[(A + \gamma I), (B + \gamma I)]\) in terms of the eigenvalues of A and B.

Theorem 21

(Diagonalization) Let \(\alpha \ge 0\) be fixed. Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(\gamma > 0\), \(\gamma \in {\mathbb {R}}\), be fixed. Let \(A, B:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint trace class operators, such that \(A+\gamma I > 0\), \(B + \gamma I > 0\). Let \(\mathrm{Eig}(A), \mathrm{Eig}(B): \ell ^2 \rightarrow \ell ^2\) be diagonal operators with the diagonals consisting of the eigenvalues of A and B, respectively, in decreasing order.

Then
$$\begin{aligned} D^{(\alpha ,\alpha )}_{2\alpha }[(\mathrm{Eig}(A) + \gamma I), (\mathrm{Eig}(B) + \gamma I)] \le D^{(\alpha ,\alpha )}_{2\alpha }[(A + \gamma I), (B + \gamma I)]. \end{aligned}$$
(87)

Theorem 22

(Triangle inequality) Let \(\alpha > 0\) be fixed. Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(\gamma > 0\), \(\gamma \in {\mathbb {R}}\) be fixed. Let \(A,B,C:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint, trace class operators such that \((A+\gamma I) > 0\), \((B+\gamma I) > 0\), \((C+\gamma I) > 0\). Then
$$\begin{aligned} \sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A+\gamma I), (B+\gamma I)]}&\le \sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(A + \gamma I), (C + \gamma I)]} \nonumber \\&+ \sqrt{D^{(\alpha , \alpha )}_{2\alpha }[(C + \gamma I), (B + \gamma I)]}. \end{aligned}$$
(88)

By letting \({\mathcal {H}}= {\mathbb {R}}^n\), \(A,B,C \in \mathrm{Sym}^{++}(n)\), and \(\gamma = 0\), we recover the finite-dimensional triangle inequality (see [9], Eq. (192)). For \(\alpha = 1/2\) and \(\gamma =1\), we obtain the following generalization of the triangle inequality for the square root of the symmetric Stein divergence on \(\mathrm{Sym}^{++}(n)\).

Theorem 23

(Triangle inequality for the square root of the symmetric Stein divergence) Let \({\mathcal {H}}\) be a separable Hilbert space. Let \(A, B, C:{\mathcal {H}}\rightarrow {\mathcal {H}}\) be self-adjoint trace-class operators with \(A+I > 0\), \(B + I > 0\), \(C +I > 0\). Then
$$\begin{aligned} \sqrt{\log \frac{\det (\frac{A+B}{2} + I)}{\sqrt{\det (A+I)\det (B+I)}}}&\le \sqrt{\log \frac{\det (\frac{A+C}{2} + I)}{\sqrt{\det (A+I)\det (C+I)}}} \nonumber \\&+ \sqrt{\log \frac{\det (\frac{C+B}{2} + I)}{\sqrt{\det (C+I)\det (B+I)}}}. \end{aligned}$$
(89)
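In finite dimensions, where \(\det \) is the ordinary determinant, the inequality of Eq. (89) can be checked numerically. The following is a minimal sketch with NumPy (not part of the original development); the matrices A, B, C are arbitrary illustrative choices satisfying the positivity hypotheses:

```python
import numpy as np

def stein_sqrt(A, B):
    """Square root of the symmetric Stein divergence between A+I and B+I, cf. Eq. (89)."""
    I = np.eye(A.shape[0])
    val = (np.log(np.linalg.det((A + B) / 2 + I))
           - 0.5 * np.log(np.linalg.det(A + I))
           - 0.5 * np.log(np.linalg.det(B + I)))
    return np.sqrt(max(val, 0.0))  # val >= 0; the max() guards tiny rounding errors

rng = np.random.default_rng(0)

def random_psd(n):
    # symmetric positive semi-definite, so A + I > 0 as the theorem requires
    M = rng.standard_normal((n, n))
    return M @ M.T / n

A, B, C = random_psd(4), random_psd(4), random_psd(4)
lhs = stein_sqrt(A, B)
rhs = stein_sqrt(A, C) + stein_sqrt(C, B)
assert lhs <= rhs + 1e-10  # triangle inequality of Theorem 23
```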

6 Geometry of the Log-Det divergences

In this section, we show that for each fixed \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\), the bilinear form on \(\mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}}) \times \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\), defined by
$$\begin{aligned} \frac{\partial ^2}{\partial s \partial t} D^{(\alpha , \beta )}_r[(A + \gamma I), (A + \gamma I) + s(A_1 + \gamma _1I) + t(A_2 + \gamma _2I)]\big \vert _{s=0,t=0} \end{aligned}$$
(90)
with \((A_1 + \gamma _1I), (A_2+\gamma _2I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\), is strictly positive definite and equal to a generalization of the affine-invariant Riemannian metric defined on the tangent space of the Hilbert manifold \(\varSigma ({\mathcal {H}})\), namely \(\langle (A_1 + \gamma _1I), (A_2 + \gamma _2I)\rangle _{(A+\gamma I)}\), when restricted to \(\mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\).
We first recall the concept of the Fréchet derivative on Banach spaces (see e.g. [17]). Let \(V, W\) be Banach spaces and \({\mathcal {L}}(V,W)\) be the Banach space of bounded linear maps from V to W. Assume that \(f:\varOmega \rightarrow W\) is well-defined, where \(\varOmega \) is an open subset of V. Then the map f is said to be (Fréchet) differentiable at \(x_0 \in \varOmega \) if there exists a bounded linear map \(Df(x_0): V \rightarrow W\) such that
$$\begin{aligned} \lim _{h \rightarrow 0}\frac{||f(x_0+h) - f(x_0) - Df(x_0)(h)||_W}{||h||_V} = 0. \end{aligned}$$
The map \(Df(x_0)\) is called the Fréchet derivative of f at \(x_0\). In particular, for \(V= {\mathbb {R}}\),
$$\begin{aligned} Df(x_0)(h) = hDf(x_0)(1), \;\; \text {with}\;\; Df(x_0)(1) = \lim _{h \rightarrow 0}\frac{f(x_0+h) - f(x_0)}{h}. \end{aligned}$$
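As a concrete illustration of this definition (not taken from the paper), consider the map \(f(X) = X^2\) on the space of \(n \times n\) matrices, whose Fréchet derivative is \(Df(X_0)(H) = X_0H + HX_0\); the defining limit can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))

f = lambda X: X @ X
DfH = X0 @ H + H @ X0   # Frechet derivative of X -> X^2 at X0, applied to H

# the defining limit: ||f(X0 + h) - f(X0) - Df(X0)(h)|| / ||h|| -> 0 as h -> 0
eps = 1e-6
num = np.linalg.norm(f(X0 + eps * H) - f(X0) - eps * DfH)
den = np.linalg.norm(eps * H)
assert num / den < 1e-4   # the remainder is O(||h||^2), so the ratio is O(eps)
```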
If the map \(Df:\varOmega \rightarrow {\mathcal {L}}(V,W)\) is differentiable at \(x_0\), then its Fréchet derivative at \(x_0\), denoted by \(D^2f(x_0): V \rightarrow {\mathcal {L}}(V,W)\), is called the second order derivative of f at \(x_0\). The bounded linear map \(D^2f(x_0)\), which is an element of \({\mathcal {L}}(V,{\mathcal {L}}(V,W))\), can be identified with a bounded bilinear map from \(V \times V \rightarrow W\), via
$$\begin{aligned} D^2f(x_0)(x,y) = (D^2f(x_0)(x))(y), \;\; x, y\in V. \end{aligned}$$
Under this identification, \(D^2f(x_0)\) is a symmetric, continuous bilinear map from \(V \times V \rightarrow W\), so that
$$\begin{aligned} D^2f(x_0)(x,y) = D^2f(x_0)(y,x) \;\; \forall x,y \in V. \end{aligned}$$
The higher derivatives \(D^kf\), \(k \in {\mathbb {N}}\), are defined similarly and can be identified with continuous k-linear maps from \(V \times \cdots \times V\) to W.
Let \(C^k(\varOmega , W)\) denote the set of all k times continuously differentiable functions on \(\varOmega \). For \(f \in C^{k+1}(\varOmega , {\mathbb {R}})\), Taylor’s formula (see [17], Theorem 8.16) states that for \(h \in V\) such that \(\{x_0 + \tau h, 0 \le \tau \le 1\} \subset \varOmega \),
$$\begin{aligned} f(x_0 + h)&= f(x_0) + Df(x_0)(h) + \frac{1}{2!}D^2f(x_0)(h,h) + \cdots + \frac{1}{k!}D^kf(x_0)(h, \ldots , h) \\&\quad + \frac{1}{(k+1)!}D^{k+1}f(x_0 + \theta h)(h, \ldots , h), \end{aligned}$$
for a suitable \(\theta \in (0,1)\). If \(x_0\) is a local minimizer for f, then necessarily (see e.g. Theorem 1.33 in [34])
$$\begin{aligned} Df(x_0) = 0, \;\;\; D^2f(x_0)(h,h) \ge 0 \;\;\forall h \in V. \end{aligned}$$
Let \(x,y \in V\) be fixed. Then for \(s, t \in {\mathbb {R}}\), Taylor’s formula implies that
$$\begin{aligned} \frac{\partial ^2}{\partial s \partial t}f(x_0 + sx +ty)\big \vert _{s=0,t=0} = D^2f(x_0)(x,y), \end{aligned}$$
(91)
which is a symmetric, positive semi-definite bilinear form on \(V \times V\).
In our present context, \(\varOmega = \mathrm{PTr}({\mathcal {H}}) \subset V = \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). For \(x_0 \in \varOmega \) fixed, we define a function \(f_{x_0}:\varOmega \rightarrow {\mathbb {R}}\) by \(f_{x_0}(x) = D^{(\alpha , \beta )}_r[x_0,x]\). Then \(x_0\) is the unique global minimum of \(f_{x_0}\), with \(f_{x_0}(x_0) = 0\). Thus for any pair \(x, y \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\),
$$\begin{aligned} \frac{\partial ^2}{\partial s \partial t}f_{x_0}(x_0 +sx +ty)\big \vert _{s=0, t=0} = \frac{\partial ^2}{\partial s \partial t}D^{(\alpha , \beta )}_r[x_0, x_0 + sx+ty]\big \vert _{s=0, t=0} \end{aligned}$$
is a symmetric, positive semi-definite bilinear form on \(\mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}}) \times \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). We next compute its explicit expression.

The computation of the derivative in Eq. (90) requires the Fréchet derivatives of the maps \(\mathrm{tr}_X\), \(\mathrm{det_X}\), and their powers, which are of interest in their own right.

First, we note that the Fréchet derivative of the extended trace \(\mathrm{tr}_X:\mathrm{Tr}_X({\mathcal {H}}) \rightarrow {\mathbb {R}}\), which is a linear map, is the map itself.

Lemma 2

Assume that \((A_0 + \gamma _0I) \in \mathrm{Tr}_X({\mathcal {H}})\). Then for any \((A+\gamma I) \in \mathrm{Tr}_X({\mathcal {H}})\),
$$\begin{aligned} D\mathrm{tr}_X(A_0 + \gamma _0I)(A+\gamma I) = \mathrm{tr}_X(A+\gamma I). \end{aligned}$$
(92)

To compute the Fréchet derivative of the extended Fredholm determinant \(\mathrm{det_X}:\mathrm{Tr}_X({\mathcal {H}}) \rightarrow {\mathbb {R}}\), which is nonlinear, we first present a generalization of the Plemelj-Smithies formula (see e.g. [36], Theorem 6.8) for trace class operators.

Theorem 24

(Plemelj-Smithies - Trace class operators case) Let \(A \in \mathrm{Tr}({\mathcal {H}})\). Then the following series converges for all \(t \in {\mathbb {C}}\),
$$\begin{aligned} \det (I + t A) = 1 + \sum _{m=1}^{\infty }t^m\frac{\alpha _m(A)}{m!}, \end{aligned}$$
(93)
where \(\alpha _m(A)\) is the \(m \times m\) determinant
$$\begin{aligned} \alpha _m(A) = \det \left( \begin{array}{ccccc} \mathrm{tr}(A) &{} m-1 &{} 0 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ \mathrm{tr}(A^{m-1}) &{} \mathrm{tr}(A^{m-2}) &{} \cdots &{} \cdots &{} 1 \\ \mathrm{tr}(A^{m}) &{} \mathrm{tr}(A^{m-1}) &{} \cdots &{} \cdots &{} \mathrm{tr}(A) \end{array} \right) . \end{aligned}$$
(94)
In particular, \(\alpha _1(A) = \mathrm{tr}(A)\), \(\alpha _2(A) = [\mathrm{tr}(A)]^2 - \mathrm{tr}(A^2)\), and in general, \(\alpha _m(A)\) is a polynomial in \(\mathrm{tr}(A), \mathrm{tr}(A^2), \ldots , \mathrm{tr}(A^m)\). By Corollary 7.6 in [36],
$$\begin{aligned} |\alpha _m(A)| \le e^m ||A||^m_{\mathrm{tr}}. \end{aligned}$$
(95)
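In finite dimensions, Eq. (93) is the expansion of the ordinary determinant, and the first coefficients can be checked against \(\alpha _1(A) = \mathrm{tr}(A)\), \(\alpha _2(A) = [\mathrm{tr}(A)]^2 - \mathrm{tr}(A^2)\), and \(\alpha _3(A) = [\mathrm{tr}(A)]^3 - 3\mathrm{tr}(A)\mathrm{tr}(A^2) + 2\mathrm{tr}(A^3)\) (the expression for \(\alpha _3\) follows from Newton's identities and is stated here only for illustration). A numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
p = lambda k: np.trace(np.linalg.matrix_power(A, k))  # power sums tr(A^k)

alpha1 = p(1)
alpha2 = p(1) ** 2 - p(2)
alpha3 = p(1) ** 3 - 3 * p(1) * p(2) + 2 * p(3)

# det(I + tA) = 1 + t*alpha1 + (t^2/2!)*alpha2 + (t^3/3!)*alpha3 + O(t^4), cf. Eq. (93)
t = 1e-2
lhs = np.linalg.det(np.eye(n) + t * A)
rhs = 1 + t * alpha1 + t**2 / 2 * alpha2 + t**3 / 6 * alpha3
assert abs(lhs - rhs) < 1e-5

# bound of Eq. (95): |alpha_m(A)| <= e^m ||A||_tr^m, checked here for m = 2
tr_norm = np.linalg.svd(A, compute_uv=False).sum()  # trace norm = sum of singular values
assert abs(alpha2) <= np.e**2 * tr_norm**2
```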
For \(\mathrm{det_X}\), the following is the corresponding generalization (with \(t \in {\mathbb {R}}\)).

Proposition 2

Assume that \((A+\gamma I) \in \mathrm{Tr}_X({\mathcal {H}})\). Then
$$\begin{aligned} \mathrm{det_X}[I+ t(A+\gamma I)] = 1 +t\mathrm{tr}_X(A+\gamma I) + \sum _{k=2}^{\infty }\frac{t^k}{k!}\frac{\alpha _k(A)}{(1+\gamma t)^{k-1}}. \end{aligned}$$
(96)
For \(\gamma = 0\), the series converges for all \(t \in {\mathbb {R}}\). For \(\gamma \ne 0\), the series converges for all \(t \ne -1/\gamma \). In particular, for t close to zero,
$$\begin{aligned}&\mathrm{det_X}[I+ t(A+\gamma I)] = 1 +t\mathrm{tr}_X(A+\gamma I) + O(t^2), \end{aligned}$$
(97)
$$\begin{aligned}&\lim _{t \rightarrow 0}\frac{\mathrm{det_X}[I+ t(A+\gamma I)] - [1 +t\mathrm{tr}_X(A+\gamma I)]}{t} = 0. \end{aligned}$$
(98)

Proposition 2 implies the following explicit expression for the Fréchet derivative of \(\mathrm{det_X}\) at an invertible operator \((A_0 + \gamma _0I) \in \mathrm{Tr}_X({\mathcal {H}})\).

Lemma 3

Assume that \((A_0+\gamma _0I) \in \mathrm{Tr}_X({\mathcal {H}})\) is invertible. Then
$$\begin{aligned} D\mathrm{det_X}(A_0+\gamma _0 I)(A+\gamma I)&= \mathrm{det_X}(A_0+\gamma _0I)\mathrm{tr}_X[(A_0+\gamma _0I)^{-1}(A+\gamma I)], \nonumber \\&\quad \; \; \forall (A+ \gamma I) \in \mathrm{Tr}_X({\mathcal {H}}). \end{aligned}$$
(99)

Lemma 3 in turn gives the following explicit expression for the Fréchet derivative of the map \(\log \mathrm{det_X}:\mathrm{PTr}({\mathcal {H}}) \rightarrow {\mathbb {R}}\).

Lemma 4

Assume that \((A_0 + \gamma _0I) \in \mathrm{PTr}({\mathcal {H}})\). Then
$$\begin{aligned} D\log \mathrm{det_X}(A_0+\gamma _0 I)(A+\gamma I) = \mathrm{tr}_X[(A_0+\gamma _0I)^{-1}(A+\gamma I)], \end{aligned}$$
(100)
where \((A+\gamma I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\).
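In finite dimensions, Lemma 4 reduces to the classical identity \(D\log \det (A_0)(H) = \mathrm{tr}(A_0^{-1}H)\); a finite-difference sketch (the matrices below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
A0 = M @ M.T + np.eye(n)                  # symmetric positive definite
H = rng.standard_normal((n, n))
H = (H + H.T) / 2                         # symmetric direction

logdet = lambda X: np.linalg.slogdet(X)[1]

eps = 1e-6
fd = (logdet(A0 + eps * H) - logdet(A0)) / eps   # forward difference
exact = np.trace(np.linalg.solve(A0, H))         # tr(A0^{-1} H), cf. Eq. (100)
assert abs(fd - exact) < 1e-4
```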

By the definition of \(D^{(\alpha , \beta )}_r\), to compute the derivative in Eq. (90), we also need to compute the Fréchet derivative of the map \(f: {\mathbb {R}}\rightarrow \mathrm{Tr}_X({\mathcal {H}})\), defined by \(f(t) = [A+\gamma I + t(B+ \mu I)]^r\), \(r \in {\mathbb {R}}\), where \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((B + \mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\), at \(t = 0\). This is given in the following result.

Proposition 3

Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and \((B+ \mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\) are fixed. Let \(r \in {\mathbb {R}}\) be fixed. For \(r \notin {\mathbb {Z}}\), assume further that \(||A||_{\mathrm{tr}} < \gamma \). Then there exists an open set \(\varOmega \subset {\mathbb {R}}\) containing 0 such that
$$\begin{aligned} f(t) = [A+\gamma I + t(B+\mu I)]^r \end{aligned}$$
(101)
is well-defined and differentiable on \(\varOmega \). Furthermore,
$$\begin{aligned} Df(0)(1)&= \gamma ^r\sum _{k=1}^{\infty }\frac{(r)_k}{k!}\frac{A^{k-1}B + A^{k-2}BA + \cdots + BA^{k-1}}{\gamma ^k} \nonumber \\&\quad + r\mu (A+\gamma I)^{r-1}, \;\;\;\text {with}\;\;\; (r)_k = r(r-1)\ldots (r-k+1). \end{aligned}$$
(102)
In particular, for \(r = k\in {\mathbb {N}}\),
$$\begin{aligned} Df(0)(1) = k\mu (A+\gamma I)^{k-1} + \sum _{j=0}^{k-1}(A+\gamma I)^{k-1-j}B(A+\gamma I)^j. \end{aligned}$$
(103)
For \(r = -k\), \(k \in {\mathbb {N}}\),
$$\begin{aligned} Df(0)(1) =&- (A+\gamma I)^{-k}[k\mu (A+\gamma I)^{k-1}\nonumber \\&\quad + \sum _{j=0}^{k-1}(A+\gamma I)^{k-1-j}B(A+\gamma I)^j](A+\gamma I)^{-k}. \end{aligned}$$
(104)

Applying Proposition 3 and Lemma 2, we obtain the following.

Proposition 4

Assume that \(A+\gamma I \in \mathrm{PTr}({\mathcal {H}})\) and \(B+\mu I \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). Let \(r \in {\mathbb {R}}\) be fixed. For \(r \notin {\mathbb {Z}}\), assume further that \(||A||_{\mathrm{tr}} < \gamma \). Then
$$\begin{aligned} \frac{d}{dt}\mathrm{tr}_X[(A+\gamma I) + t(B + \mu I)]^r \big \vert _{t=0} = r\mathrm{tr}_X[(A+\gamma I)^{r-1}(B+\mu I)]. \end{aligned}$$
(105)
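Equation (105) can be tested numerically in finite dimensions, where the fractional power of a symmetric positive definite matrix is computed by eigendecomposition. The choices of A, B, \(\gamma \), \(\mu \), and r below are illustrative; A is scaled so that \(||A||_{\mathrm{tr}} < \gamma \), as required for non-integer r:

```python
import numpy as np

def spd_power(X, r):
    """X^r for a symmetric positive definite matrix X, via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * w**r) @ V.T

rng = np.random.default_rng(4)
n, gamma, mu, r = 4, 1.0, 0.5, 0.7
M = rng.standard_normal((n, n))
A = 0.01 * (M @ M.T)                # ||A||_tr < gamma, as the proposition requires
N = rng.standard_normal((n, n))
B = (N + N.T) / 2                   # symmetric direction
Ag = A + gamma * np.eye(n)          # A + gamma*I > 0
Bm = B + mu * np.eye(n)

eps = 1e-6                          # Ag + eps*Bm stays positive definite
fd = (np.trace(spd_power(Ag + eps * Bm, r)) - np.trace(spd_power(Ag, r))) / eps
exact = r * np.trace(spd_power(Ag, r - 1) @ Bm)   # right-hand side of Eq. (105)
assert abs(fd - exact) < 1e-3
```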

Applying Proposition 3 and Lemma 4, we obtain the following.

Proposition 5

Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}}),(B+\mu I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). Let \(c \in {\mathbb {R}}, c \ge 0, r \in {\mathbb {R}}\) be fixed. For \(r \notin {\mathbb {Z}}\), assume further that \(||A||_{\mathrm{tr}} < \gamma \). Then
$$\begin{aligned}&\frac{d}{dt}\log \mathrm{det_X}[(A+\gamma I + t(B+\mu I))^r + cI]\big \vert _{t=0} \nonumber \\&\quad = r\mathrm{tr}_X[((A+\gamma I)^r +cI)^{-1}(A+\gamma I)^{r-1}(B+\mu I)]. \end{aligned}$$
(106)

Applying Propositions 4 and 5 to \(D^{(\alpha , \beta )}_r\), we obtain the following.

Theorem 25

Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and that \((A_1 + \gamma _1I), (A_2+\gamma _2I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). Then for \(s,t \in {\mathbb {R}}\),
$$\begin{aligned}&\frac{(\alpha + \beta )^2}{r^2}\frac{\partial ^2}{\partial s\partial t}D^{(\alpha , \beta )}_r[(A+\gamma I), (A+\gamma I) + s(A_1 + \gamma _1I) +t(A_2+\gamma _2I) ] \big \vert _{s=0, t=0} \nonumber \\&\quad = \frac{2\gamma _1\gamma _2}{\gamma ^2} - \frac{\gamma _2}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)] - \frac{\gamma _1}{\gamma }\mathrm{tr}_X[(A+\gamma I)^{-1}(A_2 + \gamma _2I)] \nonumber \\&\qquad \;\; +\mathrm{tr}_X[(A+\gamma I)^{-1}(A_1 + \gamma _1I)(A+\gamma I)^{-1}(A_2 + \gamma _2I)]. \end{aligned}$$
(107)

The following result connects Eq. (107) in Theorem 25 with the Riemannian inner product \(\langle (A_1+\gamma _1I), (A_2 +\gamma _2I)\rangle _{(A+\gamma I)}\) by relating the extended trace \(\mathrm{tr}_X\) to the extended Hilbert–Schmidt inner product \(\langle \; , \;\rangle _{\mathrm{HS_X}}\) when restricted to \(\mathrm{Tr}_X({\mathcal {H}})\).

Lemma 5

Assume that \((A+\gamma I), (B+ \mu I) \in \mathrm{Tr}_X({\mathcal {H}})\). Then
$$\begin{aligned} \langle A+\gamma I, B + \mu I\rangle _{\mathrm{HS_X}}&= \mathrm{tr}_X[(A+\gamma I)^{*}(B+\mu I)] \nonumber \\&\quad \; \; - \mu \mathrm{tr}_X(A+\gamma I)^{*} - \gamma \mathrm{tr}_X(B+\mu I) + 2\gamma \mu . \end{aligned}$$
(108)
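In finite dimensions, the extended quantities can be modeled by \(\mathrm{tr}_X(A + \gamma I) = \mathrm{tr}(A) + \gamma \) and \(\langle A+\gamma I, B+\mu I\rangle _{\mathrm{HS_X}} = \langle A, B\rangle _{\mathrm{HS}} + \gamma \mu \) (the defining conventions of the extended setting, recalled here only for illustration), and the algebra of Eq. (108) checked directly:

```python
import numpy as np

rng = np.random.default_rng(5)
n, gamma, mu = 4, 0.7, 1.3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# extended trace of X + c*I, modeled as tr(X) + c (c tracks the identity part)
trX = lambda X, c: np.trace(X) + c

# left-hand side: <A + gamma*I, B + mu*I>_{HS_X} = <A, B>_HS + gamma*mu
lhs = np.trace(A.T @ B) + gamma * mu

# right-hand side of Eq. (108); note that
# (A + gamma*I)^*(B + mu*I) = (A^T B + mu*A^T + gamma*B) + gamma*mu*I
rhs = (trX(A.T @ B + mu * A.T + gamma * B, gamma * mu)
       - mu * trX(A.T, gamma)
       - gamma * trX(B, mu)
       + 2 * gamma * mu)
assert abs(lhs - rhs) < 1e-10
```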

Together with Lemma 5, Theorem 25 implies the desired expression, as stated explicitly in the following.

Corollary 2

Assume that \((A+\gamma I) \in \mathrm{PTr}({\mathcal {H}})\) and that \((A_1 + \gamma _1I), (A_2+\gamma _2I) \in \mathrm{Sym}({\mathcal {H}}) \cap \mathrm{Tr}_X({\mathcal {H}})\). Then for \(s,t \in {\mathbb {R}}\),
$$\begin{aligned}&\frac{(\alpha + \beta )^2}{r^2}\frac{\partial ^2}{\partial s\partial t}D^{(\alpha , \beta )}_r[(A+\gamma I), (A+\gamma I) + s(A_1 + \gamma _1I) +t(A_2+\gamma _2I) ] \big \vert _{s=0, t=0} \nonumber \\&\quad = \langle (A+\gamma I)^{-1/2}(A_1 + \gamma _1I)(A+\gamma I)^{-1/2},\nonumber \\&\qquad (A+\gamma I)^{-1/2}(A_2 + \gamma _2I)(A+\gamma I)^{-1/2}\rangle _{\mathrm{HS_X}} \nonumber \\&\quad = \langle (A_1 + \gamma _1I), (A_2 + \gamma _2I)\rangle _{(A+\gamma I)}. \end{aligned}$$
(109)

7 Alpha-Beta Log-Det divergences between RKHS covariance operators

Let \({\mathcal {X}}\) be a complete separable metric space. We now compute the Alpha-Beta Log-Det divergences between covariance operators on an RKHS induced by a continuous positive definite kernel K on \({\mathcal {X}}\times {\mathcal {X}}\). In this case, we have explicit formulas for \(D^{(\alpha , \beta )}_r\) via the corresponding Gram matrices. Similar formulas exist in the cases of the Log-Hilbert–Schmidt distance [29], the infinite-dimensional affine-invariant Riemannian distance [20, 23], and the infinite-dimensional Alpha Log-Det divergences [24].

We first prove the following result.

Theorem 26

Let \({\mathcal {H}}_1, {\mathcal {H}}_2\) be separable Hilbert spaces. Let \(A,B:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2\) be compact operators such that \(AA^{*}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2,BB^{*}:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) are trace class operators. Assume that \(\dim ({\mathcal {H}}_2) = \infty \). Let \(\alpha , \beta > 0\) be fixed. For any \(r\in {\mathbb {R}}\), \(r \ne 0\), for any \(\gamma > 0\), \(\mu > 0\),
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(AA^{*} + \gamma I_{{\mathcal {H}}_2}), (BB^{*} + \mu I_{{\mathcal {H}}_2})] \nonumber \\&\quad = \frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta }\left( \log \frac{\gamma }{\mu }\right) +\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \nonumber \\&\qquad +\frac{1}{\alpha \beta }\log \det \left[ \frac{ \alpha (\frac{\gamma }{\mu })^p(C + I_{{\mathcal {H}}_1}\otimes I_3)^{p}+ \beta (\frac{\gamma }{\mu })^{-q} (C + I_{{\mathcal {H}}_1} \otimes I_3)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \end{aligned}$$
(110)
where \(\delta = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\), \(p = r(1-\delta )\), \(q = r\delta \), and
$$\begin{aligned} C = \begin{pmatrix} \frac{A^{*}A}{\gamma } &{} -\frac{A^{*}B}{\sqrt{\gamma \mu }}(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} &{} -\frac{A^{*}AA^{*}B}{\gamma \sqrt{\gamma \mu }}(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma \mu }} &{} -\frac{B^{*}B}{\mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} &{} -\frac{B^{*}AA^{*}B}{\gamma \mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \\ \frac{B^{*}A}{\sqrt{\gamma \mu }} &{} -\frac{B^{*}B}{\mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} &{} -\frac{B^{*}AA^{*}B}{\gamma \mu }(I_{{\mathcal {H}}_1} + \frac{B^{*}B}{\mu })^{-1} \end{pmatrix}. \end{aligned}$$
(111)

For comparison, the following is the corresponding version of \(D^{(\alpha , \beta )}_r[(AA^{*} + \gamma I_{{\mathcal {H}}_2}), (BB^{*} + \mu I_{{\mathcal {H}}_2})]\) when \(\dim ({\mathcal {H}}_2) < \infty \), using the formula given in Eq. (29).

Theorem 27

Let \({\mathcal {H}}_1, {\mathcal {H}}_2\) be separable Hilbert spaces. Let \(A,B:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2\) be compact operators such that \(AA^{*}: {\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2,BB^{*}:{\mathcal {H}}_2 \rightarrow {\mathcal {H}}_2\) are trace class operators. Assume that \(\dim ({\mathcal {H}}_2) < \infty \). Let \(\alpha , \beta > 0\) be fixed. For any \(r\in {\mathbb {R}}\), \(r \ne 0\), for any \(\gamma > 0\), \(\mu > 0\),
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(AA^{*} + \gamma I_{{\mathcal {H}}_2}), (BB^{*} + \mu I_{{\mathcal {H}}_2})] \nonumber \\&\quad = \frac{1}{\alpha \beta }\left[ \log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \right] \dim ({\mathcal {H}}_2) \nonumber \\&\qquad +\frac{1}{\alpha \beta }\log \det \left[ \frac{ \alpha (\frac{\gamma }{\mu })^p(C + I_{{\mathcal {H}}_1}\otimes I_3)^{p}+ \beta (\frac{\gamma }{\mu })^{-q} (C + I_{{\mathcal {H}}_1} \otimes I_3)^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \end{aligned}$$
(112)
where \(p = r\frac{\beta }{\alpha +\beta }\), \(q = r\frac{\alpha }{\alpha +\beta }\), and C is as given in Theorem 26.
Let us briefly recall RKHS covariance operators, see e.g. [24, 27]. Let \({\mathbf {x}}=[x_1, \ldots , x_m]\) be a data matrix randomly sampled from the complete separable metric space \({\mathcal {X}}\) according to a Borel probability distribution \(\rho \), where \(m \in {\mathbb {N}}\) is the number of observations. Let K be a continuous positive definite kernel on \({\mathcal {X}}\times {\mathcal {X}}\) and \({\mathcal {H}}_K\) its induced reproducing kernel Hilbert space (RKHS), which is a separable Hilbert space. Let \(\varPhi : {\mathcal {X}}\rightarrow {\mathcal {H}}_K\) be the corresponding feature map, so that \(K(x,y) = \langle \varPhi (x), \varPhi (y)\rangle _{{\mathcal {H}}_K}\) for all pairs \((x,y) \in {\mathcal {X}}\times {\mathcal {X}}\). The feature map \(\varPhi \) induces the bounded linear operator
$$\begin{aligned} \varPhi ({\mathbf {x}}): {\mathbb {R}}^m \rightarrow {\mathcal {H}}_K, \;\;\; \varPhi ({\mathbf {x}}){\mathbf {b}}= \sum _{j=1}^mb_j\varPhi (x_j) , \;\; {\mathbf {b}}\in {\mathbb {R}}^m. \end{aligned}$$
(113)
The operator \(\varPhi ({\mathbf {x}})\) can also be viewed as the (potentially infinite) data matrix \(\varPhi ({\mathbf {x}}) = [\varPhi (x_1), \ldots , \varPhi (x_m)]\) of size \(\dim ({\mathcal {H}}_K) \times m\) in the feature space \({\mathcal {H}}_K\), with the jth column being \(\varPhi (x_j)\). Assume furthermore that
$$\begin{aligned} \int _{{\mathcal {X}}}||\varPhi (x)||_{{\mathcal {H}}_K}^2d\rho (x) = \int _{{\mathcal {X}}}K(x,x)d\rho (x) < \infty . \end{aligned}$$
(114)
The RKHS mean vector \(\mu _{\varPhi }\) and covariance operator \(C_{\varPhi }:{\mathcal {H}}_K \rightarrow {\mathcal {H}}_K\) are defined by
$$\begin{aligned} \mu _{\varPhi } = \int _{{\mathcal {X}}}\varPhi (x)d\rho (x), \;\;\; C_{\varPhi } = \int _{{\mathcal {X}}}(\varPhi (x)-\mu _{\varPhi })\otimes (\varPhi (x)-\mu _{\varPhi })d\rho (x), \end{aligned}$$
(115)
where, for \(u,v,w \in {\mathcal {H}}_K, (u\otimes v)w = \langle v,w\rangle _{{\mathcal {H}}_K}u\). Then \(C_{\varPhi }\) is a positive trace class operator on \({\mathcal {H}}_K\) and the corresponding empirical covariance operator is defined by
$$\begin{aligned} C_{\varPhi ({\mathbf {x}})} = \frac{1}{m}\varPhi ({\mathbf {x}})J_m\varPhi ({\mathbf {x}})^{*}: {\mathcal {H}}_K \rightarrow {\mathcal {H}}_K, \end{aligned}$$
(116)
where \(\varPhi ({\mathbf {x}})^{*}:{\mathcal {H}}_K \rightarrow {\mathbb {R}}^m\) is the adjoint operator of \(\varPhi ({\mathbf {x}})\) and \(J_m\) is the centering matrix, defined by \(J_m = I_m -\frac{1}{m}{\mathbf {1}}_m{\mathbf {1}}_m^T\) with \({\mathbf {1}}_m = (1, \ldots , 1)^T \in {\mathbb {R}}^m\).
Let \({\mathbf {x}}= [x_i]_{i=1}^m\), \({\mathbf {y}}= [y_i]_{i=1}^m\), \(m \in {\mathbb {N}}\), be two random data matrices sampled from \({\mathcal {X}}\) according to two Borel probability distributions and \(C_{\varPhi ({\mathbf {x}})}\), \(C_{\varPhi ({\mathbf {y}})}\) be the corresponding covariance operators induced by the kernel K. Let \(K[{\mathbf {x}}]\), \(K[{\mathbf {y}}]\), and \(K[{\mathbf {x}},{\mathbf {y}}]\) be the \(m \times m\) Gram matrices defined by
$$\begin{aligned} (K[{\mathbf {x}}])_{ij}&= K(x_i, x_j), \nonumber \\ (K[{\mathbf {y}}])_{ij}&= K(y_i, y_j), \; (K[{\mathbf {x}},{\mathbf {y}}])_{ij} = K(x_i, y_j), \; 1 \le i,j\le m. \end{aligned}$$
(117)
Let \(A = \frac{1}{\sqrt{m}}\varPhi ({\mathbf {x}})J_m:{\mathbb {R}}^m \rightarrow {\mathcal {H}}_K\), \(B = \frac{1}{\sqrt{m}}\varPhi ({\mathbf {y}})J_m:{\mathbb {R}}^m \rightarrow {\mathcal {H}}_K\), so that
$$\begin{aligned}&AA^{*} = C_{\varPhi ({\mathbf {x}})}, \;\; BB^{*} = C_{\varPhi ({\mathbf {y}})},\;\; A^{*}A = \frac{1}{m}J_mK[{\mathbf {x}}]J_m, \;\; B^{*}B = \frac{1}{m}J_mK[{\mathbf {y}}]J_m, \nonumber \\&A^{*}B = \frac{1}{m}J_mK[{\mathbf {x}},{\mathbf {y}}]J_m, \;\; B^{*}A = \frac{1}{m}J_mK[{\mathbf {y}},{\mathbf {x}}]J_m. \end{aligned}$$
(118)
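The identities in Eq. (118) can be verified directly for the linear kernel \(K(x,y) = \langle x, y\rangle \) on \({\mathbb {R}}^n\), where \(\varPhi (x) = x\) and \(\varPhi ({\mathbf {x}})\) is the ordinary data matrix; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 100
X = rng.standard_normal((n, m))        # data matrix Phi(x) for the linear kernel
Y = rng.standard_normal((n, m))
J = np.eye(m) - np.ones((m, m)) / m    # centering matrix J_m (note J @ J == J)

A = X @ J / np.sqrt(m)                 # A = (1/sqrt(m)) Phi(x) J_m
B = Y @ J / np.sqrt(m)

Kx, Kxy = X.T @ X, X.T @ Y             # Gram matrices K[x] and K[x, y]

assert np.allclose(A @ A.T, X @ J @ X.T / m)   # AA^* = empirical covariance C_{Phi(x)}
assert np.allclose(A.T @ A, J @ Kx @ J / m)    # A^*A = (1/m) J_m K[x] J_m
assert np.allclose(A.T @ B, J @ Kxy @ J / m)   # A^*B = (1/m) J_m K[x,y] J_m
```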
Theorems 26 and 27 can then be applied to give closed form formulas for the divergences between \((C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K})\) and \((C_{\varPhi ({\mathbf {y}})}+ \mu I_{{\mathcal {H}}_K})\), as follows.

Theorem 28

(Alpha-Beta Log-Det divergences between RKHS covariance operators - Infinite-dimensional version) Let \(\alpha , \beta > 0\) be fixed. Let \(r\in {\mathbb {R}}\), \(r \ne 0\) be fixed. Assume that \(\dim ({\mathcal {H}}_K) = \infty \). For any \(\gamma > 0\), \(\mu > 0\), the divergence \(D^{(\alpha , \beta )}_{r}[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \mu I_{{\mathcal {H}}_K})]\) is given by
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \mu I_{{\mathcal {H}}_K})] \nonumber \\&\quad = \frac{r(\delta -\frac{\alpha }{\alpha +\beta })}{\alpha \beta }\left( \log \frac{\gamma }{\mu }\right) +\frac{1}{\alpha \beta }\log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \nonumber \\&\qquad +\frac{1}{\alpha \beta }\log \det \left[ \frac{ \alpha (\frac{\gamma }{\mu })^p(C + I_{3m})^{p}+ \beta (\frac{\gamma }{\mu })^{-q} (C + I_{3m})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \end{aligned}$$
(119)
where \(\delta = \frac{\alpha \gamma ^r}{\alpha \gamma ^r + \beta \mu ^r}\), \(p = r(1-\delta )\), \(q = r\delta \), and
$$\begin{aligned} C = \begin{pmatrix} C_{11} &{} C_{12} &{} C_{13} \\ C_{21} &{} C_{22} &{} C_{23} \\ C_{21} &{} C_{22} &{} C_{23} \end{pmatrix} \in {\mathbb {R}}^{3m \times 3m}. \end{aligned}$$
(120)
Here the sub-matrices \(C_{ij}\), \(i=1,2\), \(j = 1,2,3\), each of size \(m \times m\), are given by
$$\begin{aligned} C_{11}&= \frac{1}{\gamma m}J_mK[{\mathbf {x}}]J_m, \end{aligned}$$
(121)
$$\begin{aligned} C_{12}&= -\frac{1}{\sqrt{\gamma \mu }m}J_mK[{\mathbf {x}},{\mathbf {y}}]J_m\left( I_m + \frac{1}{\mu m}J_mK[{\mathbf {y}}]J_m\right) ^{-1}, \end{aligned}$$
(122)
$$\begin{aligned} C_{13}&= -\frac{1}{\gamma \sqrt{\gamma \mu }m^2}J_mK[{\mathbf {x}}]J_mK[{\mathbf {x}},{\mathbf {y}}]J_m\left( I_m + \frac{1}{\mu m}J_mK[{\mathbf {y}}]J_m\right) ^{-1}, \end{aligned}$$
(123)
$$\begin{aligned} C_{21}&= \frac{1}{\sqrt{\gamma \mu } m}J_mK[{\mathbf {y}},{\mathbf {x}}]J_m, \end{aligned}$$
(124)
$$\begin{aligned} C_{22}&= -\frac{1}{\mu m}J_mK[{\mathbf {y}}]J_m\left( I_m + \frac{1}{\mu m}J_mK[{\mathbf {y}}]J_m\right) ^{-1}, \end{aligned}$$
(125)
$$\begin{aligned} C_{23}&= -\frac{1}{\gamma \mu m^2}J_mK[{\mathbf {y}}, {\mathbf {x}}]J_mK[{\mathbf {x}},{\mathbf {y}}]J_m\left( I_m + \frac{1}{\mu m}J_mK[{\mathbf {y}}]J_m\right) ^{-1}. \end{aligned}$$
(126)

Theorem 29

(Alpha-Beta Log-Det divergences between RKHS covariance operators - Finite-dimensional version)

Let \(\alpha , \beta > 0\) be fixed. Let \(r\in {\mathbb {R}}\), \(r \ne 0\) be fixed. Assume that \(\dim ({\mathcal {H}}_K) < \infty \). For any \(\gamma > 0\), \(\mu > 0\), the divergence \(D^{(\alpha , \beta )}_{r}[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \mu I_{{\mathcal {H}}_K})]\) is given by
$$\begin{aligned}&D^{(\alpha , \beta )}_r[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \mu I_{{\mathcal {H}}_K})] \nonumber \\&\quad = \frac{1}{\alpha \beta }\left[ \log \left( \frac{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}{\alpha + \beta }\right) \right] \dim ({\mathcal {H}}_K) \nonumber \\&\qquad +\frac{1}{\alpha \beta }\log \det \left[ \frac{ \alpha (\frac{\gamma }{\mu })^p(C + I_{3m})^{p}+ \beta (\frac{\gamma }{\mu })^{-q} (C + I_{3m})^{-q}}{\alpha (\frac{\gamma }{\mu })^p + \beta (\frac{\gamma }{\mu })^{-q}}\right] , \end{aligned}$$
(127)
where \(p = r\frac{\beta }{\alpha +\beta }\), \(q = r\frac{\alpha }{\alpha +\beta }\), and C is as given in Theorem 28.

Remark 9

The closed form formulas for \(D^{(\alpha , \beta )}_r[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \mu I_{{\mathcal {H}}_K})]\) given in Eqs. (119) and (127) in Theorems 28 and 29, respectively, coincide if and only if \(\gamma = \mu \). If \(\gamma \ne \mu \), then the right hand side of Eq. (127) approaches infinity as \(\dim ({\mathcal {H}}_K) \rightarrow \infty \). Thus, in general, the infinite-dimensional version is not obtainable as the limit of the finite-dimensional version as the dimension goes to infinity.

Remark 10

The closed form formulas given by Eqs. (119) and (127) in Theorems 28 and 29, respectively, are derived under more general conditions than those in [24]; they are consequently more general, but also more complicated, than the corresponding closed form formulas for the Alpha Log-Det divergences in [24] (see Theorems 12, 13, 15, and 16 in [24]). Thus, for practical applications involving the Alpha Log-Det divergences, the corresponding closed form formulas in [24] should be employed.

Special case: \(\gamma = \mu \), \(\alpha = \beta \), \(r = 2\alpha \). In this setting, both Eq. (119) and Eq. (127) simplify to the following
$$\begin{aligned}&D^{(\alpha , \alpha )}_{2\alpha }[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \gamma I_{{\mathcal {H}}_K})]\nonumber \\&\quad =\frac{1}{\alpha ^2}\log \det \left[ \frac{(C+ I_{3m})^{\alpha } + (C+ I_{3m})^{-\alpha }}{2}\right] . \end{aligned}$$
(128)
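For intuition, in finite dimensions with \(\gamma = \mu \), the divergence \(D^{(\alpha , \alpha )}_{2\alpha }\) between symmetric positive definite matrices P, Q can be evaluated from the eigenvalues \(\lambda _i\) of \(PQ^{-1}\) as \(\frac{1}{\alpha ^2}\sum _i \log \frac{\lambda _i^{\alpha } + \lambda _i^{-\alpha }}{2}\), with limit \(\frac{1}{2}\sum _i \log ^2\lambda _i\) as \(\alpha \rightarrow 0\); this eigenvalue reduction is our restatement of the finite-dimensional formula of [9], consistent with the form of Eq. (128). The sketch below also illustrates the monotonicity in \(\alpha \) (cf. Theorem 5):

```python
import numpy as np

def div_alpha(P, Q, alpha):
    """D^{(alpha,alpha)}_{2alpha}(P, Q) for SPD matrices via eigenvalues of P Q^{-1}.
    alpha = 0 returns the limiting value (1/2) * sum(log(lam)^2)."""
    lam = np.linalg.eigvals(P @ np.linalg.inv(Q)).real  # positive for SPD P, Q
    if alpha == 0:
        return 0.5 * np.sum(np.log(lam) ** 2)
    return np.sum(np.log((lam**alpha + lam**(-alpha)) / 2)) / alpha**2

rng = np.random.default_rng(7)
M = rng.standard_normal((5, 5))
P = M @ M.T + np.eye(5)
N = rng.standard_normal((5, 5))
Q = N @ N.T + np.eye(5)

vals = [div_alpha(P, Q, a) for a in (0, 0.5, 1.0, 5.0)]
assert all(v1 >= v2 for v1, v2 in zip(vals, vals[1:]))   # decreasing in alpha
assert abs(div_alpha(P, Q, 1e-4) - vals[0]) < 1e-3       # alpha -> 0 limit
```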
Numerical experiments. We illustrate the computation and monotonicity of the divergences \(D^{(\alpha , \alpha )}_{2\alpha }\) in Eq. (128) via the following synthetic experiment. We randomly generated two data matrices \({\mathbf {x}}\) and \({\mathbf {y}}\) using two different probability distributions on \({\mathbb {R}}^5\), each of size \([n,m] = [5,100]\), where \(m = 100\) is the number of observations, with each observation being a vector in \({\mathbb {R}}^5\). In the experiments reported here, the data matrix \({\mathbf {x}}\) was generated using a multivariate normal distribution \({\mathcal {N}}(\mu ,\varSigma )\) on \({\mathbb {R}}^5\), with mean \(\mu = [-0.6490 \; 1.1812 \; -0.7585 \; -1.1096 \; -0.8456]\) and covariance matrix
$$\begin{aligned} \varSigma = \begin{pmatrix} 1.0546 &{} -0.5433 &{} 1.7346 &{} -1.3049 &{} 1.1829 \\ -0.5433 &{} 4.4706 &{} 0.7386 &{} 0.0352 &{} -2.6820 \\ 1.7346 &{} 0.7386 &{} 11.0023 &{} -5.1949 &{} -3.3368 \\ -1.3049 &{} 0.0352 &{} -5.1949 &{} 10.2375 &{} 0.6546 \\ 1.1829 &{} -2.6820 &{} -3.3368 &{} 0.6546 &{} 5.2014 \end{pmatrix} . \end{aligned}$$
The data matrix \({\mathbf {y}}\), on the other hand, was generated using a \(\chi ^2\) distribution with 3 degrees of freedom. We then computed the finite-dimensional divergences \(D^{(\alpha , \alpha )}[(C_{{\mathbf {x}}}+\gamma I), (C_{{\mathbf {y}}} + \gamma I)]\), where \(C_{{\mathbf {x}}}\), \(C_{{\mathbf {y}}}\) are the empirical covariance matrices corresponding to \({\mathbf {x}}\) and \({\mathbf {y}}\), respectively, which are effectively induced by the linear kernel \(K(x,y) = \langle x,y\rangle \) on \({\mathbb {R}}^5 \times {\mathbb {R}}^5\) with the identity feature map \(\varPhi (x) = x\). We next computed the infinite-dimensional divergences \(D^{(\alpha , \alpha )}_{2\alpha }[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \gamma I_{{\mathcal {H}}_K})]\), where \(C_{\varPhi ({\mathbf {x}})}\), \(C_{\varPhi ({\mathbf {y}})}\) are the empirical covariance operators corresponding to \({\mathbf {x}}\) and \({\mathbf {y}}\), respectively, induced by the Gaussian kernel \(K(x,y) = \exp \left( -\frac{||x-y||^2}{\sigma ^2}\right) \) on \({\mathbb {R}}^5 \times {\mathbb {R}}^5\), with \(\sigma = 1\). The regularization parameter was fixed at \(\gamma = 10^{-6}\). The divergences were computed for 100 values of \(\alpha \), ranging from 0.01 to 9.91, with a step-size of 0.1. The limiting case \(\alpha = 0\) was computed separately. The results are plotted in Fig. 1. The plots confirm in particular the monotonicity stated in Theorem 5, with the divergences attaining their maximum values at \(\alpha = 0\), which correspond to the squared affine-invariant Riemannian distances, and decreasing to zero as \(\alpha \rightarrow \infty \).
Fig. 1

Top: The finite-dimensional divergences \(D^{(\alpha , \alpha )}[(C_{{\mathbf {x}}}+\gamma I), (C_{{\mathbf {y}}} + \gamma I)]\). Bottom: The infinite-dimensional divergences \(D^{(\alpha , \alpha )}_{2\alpha }[(C_{\varPhi ({\mathbf {x}})} + \gamma I_{{\mathcal {H}}_K}), (C_{\varPhi ({\mathbf {y}})} + \gamma I_{{\mathcal {H}}_K})]\)

Remark 11

Further analysis of the function \(D^{(\alpha , \alpha )}_{2\alpha }\) as well as real-world applications for the current mathematical formulations, along the lines of those in [28, 29], will be presented in a separate work.

Footnotes

  1. The current formulation for Alpha-Beta Log-Det divergences can be generalized to the entire Hilbert manifold of positive definite Hilbert–Schmidt operators. This will be presented in a separate work [25].

Notes

Compliance with ethical standards

Conflict of interest

The author declares that there is no conflict of interest.

References

  1. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM J. Matrix Anal. Appl. 29(1), 328–347 (2007)
  2. Barbaresco, F.: Information geometry of covariance matrix: Cartan–Siegel homogeneous bounded domains, Mostow/Berger fibration and Fréchet median. In: Matrix Information Geometry, pp. 199–255. Springer, New York (2013)
  3. Bhatia, R.: Positive Definite Matrices. Princeton University Press, Princeton (2007)
  4. Bhatia, R.: Matrix Analysis, vol. 169. Springer, New York (2013)
  5. Bini, D.A., Iannazzo, B.: Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra Appl. 438(4), 1700–1710 (2013)
  6. Chebbi, Z., Moakher, M.: Means of Hermitian positive-definite matrices based on the log-determinant \(\alpha \)-divergence function. Linear Algebra Appl. 436(7), 1872–1889 (2012)
  7. Cherian, A., Sra, S., Banerjee, A., Papanikolopoulos, N.: Jensen–Bregman LogDet divergence with application to efficient similarity search for covariance matrices. IEEE Trans. Pattern Anal. Mach. Intell. 35(9), 2161–2174 (2013)
  8. Cherian, A., Stanitsas, P., Harandi, M., Morellas, V., Papanikolopoulos, N.: Learning discriminative \(\alpha \beta \)-divergences for positive definite matrices. In: IEEE International Conference on Computer Vision (ICCV) (2017)
  9. Cichocki, A., Cruces, S., Amari, S.: Log-Determinant divergences revisited: Alpha-Beta and Gamma Log-Det divergences. Entropy 17(5), 2988–3034 (2015)
  10. Fan, K.: On a theorem of Weyl concerning eigenvalues of linear transformations: II. Proc. Natl. Acad. Sci. USA 36(1), 31 (1950)
  11. Formont, P., Ovarlez, J.P., Pascal, F.: On the use of matrix information geometry for polarimetric SAR image classification. In: Matrix Information Geometry, pp. 257–276. Springer, New York (2013)
  12. Harandi, M., Salzmann, M., Porikli, F.: Bregman divergences for infinite dimensional covariance matrices. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1003–1010 (2014)
  13. Hasegawa, H.: \(\alpha \)-divergence of the non-commutative information geometry. Rep. Math. Phys. 33(1), 87–93 (1993)
  14. Minh, H.Q.: Regularized divergences between covariance operators and Gaussian measures on Hilbert spaces. arXiv preprint arXiv:1904.05352 (2019)
  15. Jayasumana, S., Hartley, R., Salzmann, M., Hongdong, L., Harandi, M.: Kernel methods on the Riemannian manifold of symmetric positive definite matrices. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 73–80 (2013)
  16. Jenčová, A.: Geometry of quantum states: dual connections and divergence functions. Rep. Math. Phys. 47(1), 121–138 (2001)
  17. Jost, J.: Postmodern Analysis. Springer, Berlin (1998)
  18. Kittaneh, F., Kosaki, H.: Inequalities for the Schatten p-norm V. Publ. Res. Inst. Math. Sci. 23(2), 433–443 (1987)
  19. Kulis, B., Sustik, M.A., Dhillon, I.S.: Low-rank kernel learning with Bregman matrix divergences. J. Mach. Learn. Res. 10, 341–376 (2009)
  20. Larotonda, G.: Nonpositive curvature: a geometrical approach to Hilbert–Schmidt operators. Differ. Geom. Appl. 25, 679–700 (2007)
  21. Lawson, J.D., Lim, Y.: The geometric mean, matrices, metrics, and more. Am. Math. Mon. 108(9), 797–812 (2001)
  22. Li, P., Wang, Q., Zuo, W., Zhang, L.: Log-Euclidean kernels for sparse representation and dictionary learning. In: International Conference on Computer Vision (ICCV), pp. 1601–1608 (2013)
  23. Minh, H.Q.: Affine-invariant Riemannian distance between infinite-dimensional covariance operators. In: Geometric Science of Information, pp. 30–38 (2015)
  24. Minh, H.Q.: Infinite-dimensional Log-Determinant divergences between positive definite trace class operators. Linear Algebra Appl. 528, 331–383 (2017)
  25. Minh, H.Q.: Log-Determinant divergences between positive definite Hilbert–Schmidt operators. In: Geometric Science of Information, pp. 505–513 (2017)
  26. Minh, H.Q., Murino, V.: From covariance matrices to covariance operators: data representation from finite to infinite-dimensional settings. In: Algorithmic Advances in Riemannian Geometry and Applications: For Machine Learning, Computer Vision, Statistics, and Optimization, pp. 115–143. Springer International Publishing, Cham (2016)
  27. Minh, H.Q., Murino, V.: Covariances in Computer Vision and Machine Learning. Synthesis Lectures on Computer Vision. Morgan & Claypool Publishers, San Rafael (2017)
  28. Minh, H.Q., San Biagio, M., Bazzani, L., Murino, V.: Approximate Log-Hilbert–Schmidt distances between covariance operators for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  29. Minh, H.Q., San Biagio, M., Murino, V.: Log-Hilbert–Schmidt metric between positive definite operators on Hilbert spaces. In: Advances in Neural Information Processing Systems (NIPS), pp. 388–396 (2014)
  30. Mostow, G.D.: Some new decomposition theorems for semi-simple groups. Mem. Am. Math. Soc. 14, 31–54 (1955)
  31. Ohara, A., Eguchi, S.: Geometry on positive definite matrices deformed by v-potentials and its submanifold structure. In: Geometric Theory of Information, pp. 31–55. Springer International Publishing, Cham (2014)
  32. Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. Int. J. Comput. Vis. 66(1), 41–66 (2006)
  33. Petryshyn, W.V.: Direct and iterative methods for the solution of linear operator equations in Hilbert spaces. Trans. Am. Math. Soc. 105, 136–175 (1962)
  34. Peypouquet, J.: Convex Optimization in Normed Spaces: Theory, Methods and Examples. Springer, New York (2015)
  35. Pigoli, D., Aston, J., Dryden, I.L., Secchi, P.: Distances and inference for covariance operators. Biometrika 101(2), 409–422 (2014)
  36. Simon, B.: Notes on infinite determinants of Hilbert space operators. Adv. Math. 24, 244–273 (1977)
  37. Sra, S.: A new metric on the manifold of kernel matrices with application to matrix geometric means. In: Advances in Neural Information Processing Systems (NIPS), pp. 144–152 (2012)
  38. Stanitsas, P., Cherian, A., Morellas, V., Papanikolopoulos, N.: Clustering positive definite matrices by learning information divergences. In: IEEE International Conference on Computer Vision Workshops (ICCVW), pp. 1304–1312 (2017)
  39. Tuzel, O., Porikli, F., Meer, P.: Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1713–1727 (2008)

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
