
Information Geometry, Volume 1, Issue 2, pp 287–313

Ordering positive definite matrices

  • Cyrus Mostajeran
  • Rodolphe Sepulchre
Open Access
Research Paper

Abstract

We introduce new partial orders on the set \(S^+_n\) of positive definite matrices of dimension n derived from the affine-invariant geometry of \(S^+_n\). The orders are induced by affine-invariant cone fields, which arise naturally from a local analysis of the orders that are compatible with the homogeneous geometry of \(S^+_n\) defined by the natural transitive action of the general linear group GL(n). We then take a geometric approach to the study of monotone functions on \(S^+_n\) and establish a number of relevant results, including an extension of the well-known Löwner-Heinz theorem derived using differential positivity with respect to affine-invariant cone fields.

Keywords

Positive definite matrices · Partial orders · Monotone functions · Monotone flows · Differential positivity · Matrix means

Mathematics Subject Classification

15B48 34C12 37C65 47H05 

1 Introduction

Well-defined notions of ordering of elements of a space are of fundamental importance to many areas of applied mathematics, including the theory of monotone functions and matrix means in which orders play a defining role [2, 11, 14, 17]. Partial orders play a key part in a wide variety of applications across information geometry where one is interested in performing statistical analysis on sets of matrices. In such applications, the choice of order relation is often taken for granted. This choice, however, is of crucial significance since a function that is not monotone with respect to one order may be monotone with respect to another.

We outline a geometric approach to systematically generate orders on homogeneous spaces. A homogeneous space is a manifold that admits a transitive action by a Lie group, in the sense that any two points on the manifold can be mapped onto each other by elements of a group of transformations that act on the space. The observation that cone fields induce conal orders on continuous spaces, combined with the geometry of homogeneous spaces, forms the basis of the approach taken in this paper. The aim is to generate cone fields that are invariant with respect to the homogeneous geometry, thereby defining partial orders built upon the underlying symmetries of the space. A smooth cone field on a manifold is often also referred to as a causal structure. The geometry of invariant cone fields and causal structures on homogeneous spaces has been the subject of extensive studies from a Lie theoretic perspective; see [12, 13, 18], for instance. Causal structures induced by quadratic cone fields on manifolds also play a fundamental role in mathematical physics, in particular within the theory of general relativity [22].

The focus of this paper is on ordering the elements of the set of symmetric positive definite matrices \(S^+_n\) of dimension n. Positive definite matrices arise in numerous applications, including as covariance matrices in statistics and computer vision, as variables in convex and semidefinite programming, as unknowns in fundamental problems in systems and control theory, as kernels in machine learning, and as diffusion tensors in medical imaging. The space \(S^+_n\) forms a smooth manifold that can be viewed as a homogeneous space admitting a transitive action by the general linear group GL(n), which endows the space with an affine-invariant geometry as reviewed in Sect. 2. In Sect. 3, this geometry is used to construct affine-invariant cone fields and new partial orders on \(S^+_n\). In Sect. 4, we discuss how differential positivity [9] can be used to study and characterize monotonicity on \(S^+_n\) with respect to the invariant orders introduced in this paper. We also state and prove a generalized version of the celebrated Löwner-Heinz theorem [11, 17] of operator monotonicity theory derived using this approach. In Sect. 5, we consider preorder relations induced by affine-invariant and translation-invariant half-spaces on \(S^+_n\), and provide examples of functions and flows that preserve such structures. Finally, in Sect. 6, we review the notion of matrix means and establish a connection between the geometric mean and affine-invariant cone fields on \(S^+_n\).

2 Homogeneous geometry of \(S^+_n\)

The set \(S^+_n\) of symmetric positive definite matrices of dimension n has the structure of a homogeneous space with a transitive GL(n)-action. The transitive action of GL(n) on \(S^+_n\) is given by congruence transformations of the form
$$\begin{aligned} \tau _A: \varSigma \mapsto A\varSigma A^T \quad \forall A\in GL(n), \quad \ \forall \varSigma \in S^+_n. \end{aligned}$$
(1)
Specifically, if \(\varSigma _1,\varSigma _2\in S^+_n\), then \(\tau _A\) with \(A={\varSigma _2^{1/2}\varSigma _1^{-1/2}}\in GL(n)\) maps \(\varSigma _1\) onto \(\varSigma _2\), where \(\varSigma ^{1/2}\) denotes the unique positive definite square root of \(\varSigma \). This action is said to be almost effective in the sense that \(\pm I\) are the only elements of GL(n) that fix every \(\varSigma \in S^+_n\). The isotropy group of this action at \(\varSigma = I\) is precisely the orthogonal group O(n), since \(\tau _Q: I \mapsto QIQ^T=I\) if and only if \(Q\in O(n)\). Thus, we can identify any \(\varSigma \in S^+_n\) with an element of the quotient space GL(n) / O(n). That is
$$\begin{aligned} S^+_n\cong GL(n)/O(n). \end{aligned}$$
(2)
The identification in (2) can also be made by noting that \(\varSigma \in S^+_n\) admits a Cholesky decomposition \(\varSigma = CC^T\) for some \(C\in GL(n)\). The Cauchy polar decomposition of the invertible matrix C yields a unique decomposition \(C=PQ\) of C into an orthogonal matrix \(Q\in O(n)\) and a symmetric positive definite matrix \(P\in S^+_n\). Now note that if \(\varSigma \) has Cholesky decomposition \(\varSigma = CC^T\) and C has a Cauchy polar decomposition \(C=PQ\), then \(\varSigma =PQQ^TP=P^2\). That is, \(\varSigma \) is invariant with respect to the orthogonal part Q of the polar decomposition. Therefore, we can identify any \(\varSigma \in S^+_n\) with the equivalence class \([\varSigma ^{1/2}]=\varSigma ^{1/2}\cdot O(n)\) in the quotient space GL(n) / O(n).
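The transitivity of the congruence action (1) and the polar-decomposition argument above can be checked numerically. The following is a minimal sketch assuming NumPy; the helpers `spd_power` and `random_spd` are hypothetical names introduced only for this illustration, and the polar factors are obtained from a singular value decomposition.

```python
import numpy as np

def spd_power(S, p):
    """Return S**p for a symmetric positive definite S via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T

def random_spd(n, rng):
    """Hypothetical helper: a random SPD matrix of the form B B^T + I."""
    B = rng.standard_normal((n, n))
    return B @ B.T + np.eye(n)

rng = np.random.default_rng(0)
S1, S2 = random_spd(3, rng), random_spd(3, rng)

# A = S2^{1/2} S1^{-1/2} should satisfy tau_A(S1) = A S1 A^T = S2.
A = spd_power(S2, 0.5) @ spd_power(S1, -0.5)
mapped = A @ S1 @ A.T

# Polar decomposition C = P Q of the Cholesky factor, computed from an SVD.
C = np.linalg.cholesky(S1)
U, s, Vt = np.linalg.svd(C)
P = (U * s) @ U.T      # symmetric positive definite factor
Q = U @ Vt             # orthogonal factor
```

Under these assumptions `mapped` agrees with `S2` up to round-off, and `P @ P` reproduces `S1`, so the orthogonal factor `Q` plays no role in recovering \(\varSigma \).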
Recall that the Lie algebra \(\mathfrak {gl}(n)\) of GL(n) consists of the set \(\mathbb {R}^{n\times n}\) of all real \(n\times n\) matrices equipped with the Lie bracket \([X,Y]=XY-YX\), while the Lie algebra of O(n) is \(\mathfrak {o}(n)=\{X\in \mathbb {R}^{n\times n}: X^T=-X\}\). Since any matrix \(X\in \mathbb {R}^{n\times n}\) has a unique decomposition \( X=\frac{1}{2}(X-X^T) + \frac{1}{2}(X+X^T)\), as a sum of an antisymmetric part and a symmetric part, we have \(\mathfrak {gl}(n)=\mathfrak {o}(n)\oplus \mathfrak {m}\), where \(\mathfrak {m}=\{X\in \mathbb {R}^{n\times n}: X^T=X\}\). Furthermore, since \({\text {Ad}}_Q(S)=QSQ^{-1}=QSQ^T\) is a symmetric matrix for each \(S\in \mathfrak {m}\), we have
$$\begin{aligned} {\text {Ad}}_{O(n)}\mathfrak {m}\subseteq \mathfrak {m}, \end{aligned}$$
(3)
which shows that \(S^+_n=GL(n)/O(n)\) is in fact a reductive homogeneous space with reductive decomposition \(\mathfrak {gl}(n)=\mathfrak {o}(n)\oplus \mathfrak {m}\). Also, note that since \((XY-YX)^T=Y^TX^T-X^TY^T\), we have \([\mathfrak {o}(n),\mathfrak {o}(n)]\subseteq \mathfrak {o}(n)\), \([\mathfrak {m},\mathfrak {m}]\subseteq \mathfrak {o}(n)\), and \( [\mathfrak {o}(n),\mathfrak {m}]\subseteq \mathfrak {m}\). The tangent space \(T_oS^+_n\) of \(S^+_n\) at the base-point \(o=[I]=I\cdot O(n)\) is identified with \(\mathfrak {m}\). For each \(\varSigma \in S^+_n\), the action \(\tau _{\varSigma ^{1/2}}:S^+_n\rightarrow S^+_n\) induces the vector space isomorphism \(d\tau _{\varSigma ^{1/2}}\vert _I: T_I S^+_n \rightarrow T_{\varSigma } S^+_n\) given by
$$\begin{aligned} d\tau _{\varSigma ^{1/2}}\big \vert _I X = \varSigma ^{1/2}X\varSigma ^{1/2}, \quad \forall X\in \mathfrak {m}. \end{aligned}$$
(4)
The map (4) can be used to extend structures defined in \(T_o S^+_n\) to structures defined on the tangent bundle \(T S^+_n\) through affine-invariance, provided that the structures in \(T_o S^+_n\) are \({\text {Ad}}_{O(n)}\)-invariant. The \({\text {Ad}}_{O(n)}\)-invariance is required to ensure that the extension to \(T S^+_n\) is unique and thus well-defined. For instance, any homogeneous Riemannian metric on \(S^+_n\cong GL(n)/O(n)\) is determined by an \({\text {Ad}}_{O(n)}\)-invariant inner product on \(\mathfrak {m}\). Any such inner product induces a norm that is rotationally invariant and so can only depend on the scalar invariants \({\text {tr}}(X^k)\) where \(k \ge 1\) and \(X\in \mathfrak {m}\). Moreover, as the inner product is a quadratic function, \(\Vert X\Vert ^2\) must be a linear combination of \(({\text {tr}}(X))^2\) and \({\text {tr}}(X^2)\). Thus, any \({\text {Ad}}_{O(n)}\)-invariant inner product on \(\mathfrak {m}\) must be a scalar multiple of
$$\begin{aligned} \langle X, Y \rangle _{\mathfrak {m}} = {\text {tr}}(XY)+\mu {\text {tr}}(X){\text {tr}}(Y), \end{aligned}$$
(5)
where \(\mu \) is a scalar parameter with \(\mu > - 1/n\) to ensure positive-definiteness [21]. Therefore, the corresponding affine-invariant Riemannian metrics are generated by (4) and given by
$$\begin{aligned} \langle X, Y \rangle _{\varSigma }= & {} \langle \varSigma ^{-1/2}X\varSigma ^{-1/2}, \varSigma ^{-1/2}Y \varSigma ^{-1/2}\rangle _{\mathfrak {m}} \nonumber \\= & {} {\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}Y)+\mu {\text {tr}}(\varSigma ^{-1}X){\text {tr}}(\varSigma ^{-1}Y), \end{aligned}$$
(6)
for \(\varSigma \in S^+_n\) and \(X,Y\in T_{\varSigma }S^+_n\). In the case \(\mu = 0\), (6) yields the most commonly used ‘natural’ Riemannian metric on \(S^+_n\), which corresponds to the Fisher information metric for the multivariate normal distribution [8, 23], and has been widely used in applications such as tensor computing in medical imaging [4].
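The affine-invariance of the metrics (6) can be illustrated with a short numerical experiment. This sketch assumes NumPy; the function name `metric` and the randomly generated test matrices are illustrative choices, not part of the original text.

```python
import numpy as np

def metric(Sigma, X, Y, mu=0.0):
    """Affine-invariant inner product (6) of tangent vectors X, Y at Sigma."""
    Si = np.linalg.inv(Sigma)
    return np.trace(Si @ X @ Si @ Y) + mu * np.trace(Si @ X) * np.trace(Si @ Y)

rng = np.random.default_rng(1)
n = 3
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)                      # a point of S_n^+
X = rng.standard_normal((n, n)); X = X + X.T     # symmetric tangent vectors
Y = rng.standard_normal((n, n)); Y = Y + Y.T
A = rng.standard_normal((n, n)) + 3 * np.eye(n)  # a generic invertible matrix

# The congruence tau_A moves the base point and pushes tangent vectors forward.
before = metric(Sigma, X, Y, mu=0.5)
after = metric(A @ Sigma @ A.T, A @ X @ A.T, A @ Y @ A.T, mu=0.5)
```

The two values `before` and `after` should coincide up to round-off for any invertible `A`, since \((A\varSigma A^T)^{-1}AXA^T\) is similar to \(\varSigma ^{-1}X\).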

3 Affine-invariant orders

3.1 Affine-invariant cone fields

A cone field \(\mathcal {K}\) on \(S^+_n\) smoothly assigns a cone \(\mathcal {K}(\varSigma )\subset T_{\varSigma }S^+_n\) to each point \(\varSigma \in S^+_n\). In this paper, we consider a cone to be a solid and pointed subset of a vector space that is closed under linear combinations with positive coefficients. We say that \(\mathcal {K}\) is affine-invariant or homogeneous with respect to the quotient geometry \(S^+_n\cong GL(n)/O(n)\) if
$$\begin{aligned} d\tau _A\big \vert _{\varSigma }\mathcal {K}(\varSigma )=\mathcal {K}(\tau _A(\varSigma )), \end{aligned}$$
(7)
for all \(\varSigma \in S^+_n\) and \(A\in GL(n)\). The procedure we will use for constructing affine-invariant cone fields on \(S^+_n\) is similar to the approach taken for generating the affine-invariant Riemannian metrics in Sect. 2. We begin by defining a cone \(\mathcal {K}(I)\) at I that is \({\text {Ad}}_{O(n)}\)-invariant:
$$\begin{aligned} X \in \mathcal {K}(I) \Longleftrightarrow {\text {Ad}}_Q X = d\tau _Q\big \vert _I X = QXQ^T \in \mathcal {K}(I), \, \forall Q\in O(n). \end{aligned}$$
(8)
Using such a cone, we generate a cone field via
$$\begin{aligned} \mathcal {K}(\varSigma )= d\tau _{\varSigma ^{1/2}}\big \vert _I\mathcal {K}(I) = \{X\in T_{\varSigma }S^+_n: \varSigma ^{-1/2}X\varSigma ^{-1/2}\in \mathcal {K}(I)\}. \end{aligned}$$
(9)
The \({\text {Ad}}_{O(n)}\)-invariance condition (8) is satisfied if \(\mathcal {K}(I)\) has a spectral characterization; that is, we can check to see if any given \(X\in T_{I}S^+_n\cong \mathfrak {m}\) lies in \(\mathcal {K}(I)\) using only properties of X that are characterized by its spectrum. This observation leads to the following result.

Proposition 1

A cone \(\mathcal {K}(I)\subset T_{I}S^+_n\) is \({\text {Ad}}_{O(n)}\)-invariant if and only if there exists a cone \(\mathcal {K}_{\Lambda }\subset \mathbb {R}^n\) that satisfies
$$\begin{aligned} \varvec{\lambda }\in \mathcal {K}_{\Lambda } \quad \Longleftrightarrow \quad P\varvec{\lambda }\in \mathcal {K}_{\Lambda }, \end{aligned}$$
(10)
for all permutation matrices \(P\in \mathbb {R}^{n\times n}\), such that \(X\in \mathcal {K}(I)\) if and only if \(\varvec{\lambda }_X\in \mathcal {K}_{\Lambda }\), where \(\varvec{\lambda }_X=(\lambda _i(X))\) is a vector consisting of the n real eigenvalues of the symmetric matrix X.

For instance, \({\text {tr}}(X)\) and \({\text {tr}}(X^2)\) are both functions of X that are spectrally characterized and indeed \({\text {Ad}}_{O(n)}\)-invariant. Quadratic \({\text {Ad}}_{O(n)}\)-invariant cones are defined by inequalities on suitable linear combinations of \(({\text {tr}}(X))^2\) and \({\text {tr}}(X^2)\).

Proposition 2

For any choice of parameter \(\mu \in (0,n)\), the set
$$\begin{aligned} \mathcal {K}(I)=\{X\in T_IS^+_n:({\text {tr}}(X))^2-\mu {\text {tr}}(X^2) \ge 0, \ {\text {tr}}(X) \ge 0\}, \end{aligned}$$
(11)
defines an \({\text {Ad}}_{O(n)}\)-invariant cone in \(T_{I}S^+_n=\{X\in \mathbb {R}^{n\times n}: X^T=X\}\).

Proof

\({\text {Ad}}_{O(n)}\)-invariance is clear since \({\text {tr}}(X^2)={\text {tr}}(QXQ^TQXQ^T)\) and \({\text {tr}}(X)={\text {tr}}(QXQ^T)\) for all \(Q\in O(n)\). To prove that (11) is a cone, first note that \(0\in \mathcal {K}(I)\) and for \(\lambda >0\), \(X\in \mathcal {K}(I)\), we have \(\lambda X \in \mathcal {K}(I)\) since \({\text {tr}}(\lambda X)=\lambda {\text {tr}}(X)\ge 0\) and
$$\begin{aligned} ({\text {tr}}(\lambda X))^2-\mu {\text {tr}}((\lambda X)^2)=\lambda ^2[({\text {tr}}(X))^2-\mu {\text {tr}}(X^2)] \ge 0. \end{aligned}$$
(12)
To show convexity, let \(X_1, X_2\in \mathcal {K}(I)\). Now \({\text {tr}}(X_1+X_2)={\text {tr}}(X_1)+{\text {tr}}(X_2)\ge 0\), and
$$\begin{aligned}&({\text {tr}}(X_1+X_2))^2 -\mu {\text {tr}}((X_1+X_2)^2) = [({\text {tr}}(X_1))^2-\mu {\text {tr}}(X_1^2)]\nonumber \\&\quad + [({\text {tr}}(X_2))^2-\mu {\text {tr}}(X_2^2)] + 2 [{\text {tr}}(X_1){\text {tr}}(X_2)-\mu {\text {tr}}(X_1X_2)] \ge 0, \end{aligned}$$
(13)
since \({\text {tr}}(X_1X_2) \le ({\text {tr}}(X_1^2))^{\frac{1}{2}}({\text {tr}}(X_2^2))^{\frac{1}{2}} \le \frac{1}{\sqrt{\mu }}{\text {tr}}(X_1)\frac{1}{\sqrt{\mu }}{\text {tr}}(X_2)\), where the first inequality follows by Cauchy-Schwarz. Finally, we need to show that \(\mathcal {K}(I)\) is pointed. If \(X\in \mathcal {K}(I)\) and \(-X \in \mathcal {K}(I)\), then \({\text {tr}}(-X)=-{\text {tr}}(X)=0\). Thus, \(({\text {tr}}(X))^2-\mu {\text {tr}}(X^2)=-\mu {\text {tr}}(X^2)\ge 0\), which is possible if and only if all of the eigenvalues of X are zero; i.e., if and only if X = 0. \(\square \)
The parameter \(\mu \) controls the opening angle of the cone. If \(\mu = 0\), then (11) defines the half-space \({\text {tr}}(X)\ge 0\). As \(\mu \) increases, the opening angle of the cone becomes smaller and for \(\mu = n\) (11) collapses to a ray. For each \(\mu \in (0,n)\), the cone \(\mathcal {K}_{\Lambda }=\mathcal {K}_{\Lambda }^{\mu }\subset \mathbb {R}^n\) of Proposition 1 is given by
$$\begin{aligned} \mathcal {K}_{\Lambda }^{\mu }=\left\{ \varvec{\lambda }=(\lambda _i)\in \mathbb {R}^n:\left( \sum _{i=1}^n\lambda _i\right) ^2-\mu \sum _{i=1}^n\lambda _i^2\ge 0,\, \sum _{i=1}^n\lambda _i\ge 0\right\} , \end{aligned}$$
(14)
since \({\text {tr}}(X)=\sum _{i=1}^n\lambda _i(X)\) and \({\text {tr}}(X^2)=\sum _{i=1}^n\lambda _i^2(X)\). Indeed \(\mathcal {K}_{\Lambda }^{\mu }\) is a quadratic cone
$$\begin{aligned} \mathcal {K}_{\Lambda }^{\mu }=\{\varvec{\lambda }\in \mathbb {R}^n:\varvec{\lambda }^TQ_{\mu }\varvec{\lambda }\ge 0, \varvec{1}^T\varvec{\lambda }\,\ge 0\}, \end{aligned}$$
(15)
where \(\varvec{1}=(1,\ldots , 1)^T\in \mathbb {R}^n\), and \(Q_{\mu }\) is the \(n\times n\) matrix with entries \((Q_{\mu })_{ii}= 1-\mu \) and \((Q_{\mu })_{ij}=1\) for \(i\ne j\).

The dual cone \(C^*\) of a subset C of a vector space is a very important notion in convex analysis. For a vector space \(\mathcal {V}\) endowed with an inner product \(\langle \cdot ,\cdot \rangle \), the dual cone can be defined as \( C^*=\{y\in \mathcal {V}:\langle y,x\rangle \ge 0, \ \forall x\in C\}\). A cone is said to be self-dual if it coincides with its dual cone. It is well-known that the cone of positive semidefinite matrices is self-dual. The following lemma will be used to characterize the form of the dual cone \((\mathcal {K}_{\Lambda }^{\mu })^*\) for each \(\mu \in (0,n)\) with respect to the standard inner product on \(\mathbb {R}^n\).

Lemma 1

The dual cone of the quadratic cone defined by (15) with respect to the standard inner product on \(\mathbb {R}^n\) is given by
$$\begin{aligned} (\mathcal {K}_{\Lambda }^{\mu })^*=\{\varvec{\lambda }\in \mathbb {R}^n:\varvec{\lambda }^TQ_{\mu }^{-1}\varvec{\lambda }\ge 0, \varvec{1}^T\varvec{\lambda }\;\ge 0\}. \end{aligned}$$
(16)
The inverse matrix \(Q_{\mu }^{-1}\) is given by
$$\begin{aligned} (Q_{\mu }^{-1})_{ij}= {\left\{ \begin{array}{ll} \frac{\mu -(n-1)}{\mu (n-\mu )} \quad &{}i=j, \\ \frac{1}{\mu (n-\mu )} \quad &{}i\ne j. \end{array}\right. } \end{aligned}$$
(17)
Since \(\mu (n-\mu )>0\) and \(\mu -(n-1)=1-\mu ^*\) where \(\mu ^*=n-\mu \), we find that \(\varvec{\lambda }^TQ_{\mu }^{-1}\varvec{\lambda }\ge 0\) if and only if \(\varvec{\lambda }^TQ_{\mu ^*}\varvec{\lambda }\ge 0\). That is,
$$\begin{aligned} (\mathcal {K}_{\Lambda }^{\mu })^*=\mathcal {K}_{\Lambda }^{n-\mu }. \end{aligned}$$
(18)
We notice of course from (18) that \({\text {Ad}}_{O(n)}\)-invariant cones are generally not self-dual. Indeed, for quadratic \({\text {Ad}}_{O(n)}\)-invariant cones, self-duality is only achieved for \(\mu =n/2\).
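The closed form (17) and the duality relation (18) can be probed numerically. The sketch below assumes NumPy; `Q`, `Q_inv_formula` and `sample_cone` are hypothetical helper names, and the sampler only produces cone points whose coordinates sum to n, which suffices for a sanity check because the cones are invariant under positive scaling.

```python
import numpy as np

def Q(mu, n):
    """Matrix Q_mu of (15): entries 1 - mu on the diagonal and 1 elsewhere."""
    return np.ones((n, n)) - mu * np.eye(n)

def Q_inv_formula(mu, n):
    """Closed form (17) for the inverse of Q_mu."""
    off = 1.0 / (mu * (n - mu))
    diag = (mu - (n - 1)) / (mu * (n - mu))
    return off * np.ones((n, n)) + (diag - off) * np.eye(n)

def sample_cone(mu, n, rng):
    """Hypothetical sampler: a point of K_Lambda^mu whose coordinates sum to n."""
    u = rng.standard_normal(n)
    u -= u.mean()                                  # orthogonal to the all-ones vector
    radius = rng.uniform(0, 1) * np.sqrt(n * (n - mu) / mu)
    return np.ones(n) + radius * u / np.linalg.norm(u)

n, mu = 4, 1.3
rng = np.random.default_rng(2)
inverse_ok = np.allclose(Q(mu, n) @ Q_inv_formula(mu, n), np.eye(n))

# Pair points of K^mu with points of K^(n - mu); duality predicts non-negative values.
min_pairing = min(
    sample_cone(mu, n, rng) @ sample_cone(n - mu, n, rng) for _ in range(1000)
)
```

Here `inverse_ok` confirms (17) for the chosen parameters, and `min_pairing` should be non-negative up to round-off, in line with (18). Random sampling of this kind only tests one inclusion of the duality and is not a proof.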
Now for any fixed \(\mu \in (0,n)\), we obtain a unique well-defined affine-invariant cone field given by
$$\begin{aligned} \mathcal {K}(\varSigma )=\{X\in T_{\varSigma }S^+_n:({\text {tr}}(\varSigma ^{-1}X))^2-\mu {\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X) \ge 0, \ {\text {tr}}(\varSigma ^{-1}X) \ge 0\}. \end{aligned}$$
(19)
Note that for the value \(\mu =0\), (19) reduces to the affine-invariant half-space field \(\{X\in T_{\varSigma }S^+_n:{\text {tr}}(\varSigma ^{-1}X)\ge 0\}\). At the other extreme, for \(\mu =n\), it is easy to show that the set at I is given by the ray \(\{X\in T_{I}S^+_n:X=\lambda I, \lambda \ge 0\}\). By affine-invariance, (19) reduces to \(\{X\in T_{\varSigma }S^+_n:X=\lambda \varSigma , \lambda \ge 0\}\) for \(\mu =n\), which describes an affine-invariant field of rays in \(S^+_n\).
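A membership test for the cone field (19), together with a check of the invariance condition (7), might look as follows. This is a sketch assuming NumPy; the function `in_cone`, the tolerance `tol` and the test matrices are illustrative assumptions.

```python
import numpy as np

def in_cone(Sigma, X, mu, tol=1e-10):
    """Test whether X lies in the cone K(Sigma) of (19), up to a tolerance."""
    M = np.linalg.solve(Sigma, X)            # Sigma^{-1} X
    t = np.trace(M)
    return bool(t >= -tol and t**2 - mu * np.trace(M @ M) >= -tol)

rng = np.random.default_rng(0)
n, mu = 3, 1.0
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)
L = np.linalg.cholesky(Sigma)

X_in = Sigma.copy()                          # Sigma^{-1} X = I, inside for mu < n
X_out = L @ np.diag([1.0, -1.0, 0.0]) @ L.T  # nonzero with tr(Sigma^{-1} X) = 0

A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # a generic invertible matrix

def push(S):
    """Congruence by A; acts on points and, being linear, on tangent vectors."""
    return A @ S @ A.T

results = [in_cone(Sigma, X_in, mu), in_cone(Sigma, X_out, mu),
           in_cone(push(Sigma), push(X_in), mu), in_cone(push(Sigma), push(X_out), mu)]
```

The membership answers before and after applying `push` should agree, because both defining inequalities in (19) depend only on the spectrum of \(\varSigma ^{-1}X\), which is preserved by congruence.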

It should be noted that of course not all \({\text {Ad}}_{O(n)}\)-invariant cones at I are quadratic. Indeed, it is possible to construct polyhedral \({\text {Ad}}_{O(n)}\)-invariant cones that arise as the intersections of a collection of spectrally defined half-spaces in \(T_{I}S^+_n\). The clearest example of such a construction is the cone of positive semidefinite matrices in \(T_{I}S^+_n\), which of course itself has a spectral characterization \(\mathcal {K}(I) = \{X\in T_I S^+_n: \lambda _i(X)\ge 0, \; i = 1, \ldots , n\}\).

3.2 Affine-invariant pseudo-Riemannian structures on \(S^+_n\)

At this point it is instructive to note the following systematic analysis of all affine-invariant pseudo-Riemannian structures on \(S^+_n\) before continuing with our treatment of affine-invariant cone fields. This elegant characterization presents the affine-invariant Riemannian metrics of (6) and the quadratic affine-invariant cone fields of (19) within a unified and rigorous mathematical framework. Recall that a pseudo-Riemannian metric is a generalization of a Riemannian metric in which the metric tensor need not be positive definite, but need only be a non-degenerate, smooth, symmetric bilinear form. The signature of such a metric tensor is defined as the ordered pair consisting of the number of positive and negative eigenvalues of the real and symmetric matrix of the metric tensor with respect to a basis. Note that the signature of a metric tensor is independent of the choice of basis by Sylvester’s law of inertia. A metric tensor on a smooth manifold \(\mathcal {M}\) is called Lorentzian if its signature is \((1,\dim \mathcal {M}-1)\).

The irreducible decomposition of \(\mathfrak {m}\) under the \({\text {Ad}}_{O(n)}\)-action is given by \(\mathfrak {m}=\mathbb {R}I\oplus \mathfrak {m}_0\), where \(\mathfrak {m}_0:=\{X\in \mathfrak {m}:{\text {tr}}X = 0\}\). According to this decomposition, we have \(X=\frac{{\text {tr}}X}{n}I\oplus \pi (X)\) for any \(X\in \mathfrak {m}\), where \(\pi (X):=X-\frac{{\text {tr}}X}{n}I\in \mathfrak {m}_0\). Denote by \(\langle X, Y \rangle _{\mathrm {std}}\) the standard inner product \({\text {tr}}(XY)\) on \(\mathfrak {m}\), and let \(\Vert X\Vert ^2_{\mathrm {std}}:=\langle X,X\rangle _{{\text {std}}}\) be the corresponding norm. Then we have
$$\begin{aligned} {\text {tr}}(X^2) = \Vert X\Vert _{\mathrm {std}}^2 = \frac{({\text {tr}}X)^2}{n}+\Vert \pi (X)\Vert ^2_{\mathrm {std}}. \end{aligned}$$
(20)
Now since \(\mathfrak {m}_0\) is an irreducible \({\text {Ad}}_{O(n)}\)-module, any \({\text {Ad}}_{O(n)}\)-invariant quadratic form on \(\mathfrak {m}_0\) is simply a scalar multiple of \(\Vert \cdot \Vert ^2_{\mathrm {std}}\) by Schur’s lemma. Therefore, any \({\text {Ad}}_{O(n)}\)-invariant quadratic form on \(\mathfrak {m}\) is of the form
$$\begin{aligned} Q_{\alpha \beta }(X):=\alpha \frac{({\text {tr}}X)^2}{n}+\beta \Vert \pi (X)\Vert ^2_{\mathrm {std}}, \end{aligned}$$
(21)
with \(\alpha ,\beta \in \mathbb {R}\). Clearly, \(Q_{\alpha \beta }\) is positive definite if and only if \(\alpha >0\) and \(\beta >0\). Moreover, if \(\alpha >0\) and \(\beta <0\), then \(Q_{\alpha \beta }\) is Lorentzian and the set \(\{X\in \mathfrak {m}:Q_{\alpha \beta }(X)\ge 0,\, {\text {tr}}X\ge 0\}\) defines a pointed cone. Noting that
$$\begin{aligned} {\text {tr}}(XY)+\mu {\text {tr}}(X){\text {tr}}(Y) = \left( \mu +\frac{1}{n}\right) {\text {tr}}(X) {\text {tr}}(Y) + \langle \pi (X), \pi (Y) \rangle _{\mathrm {std}}, \end{aligned}$$
(22)
for each \(X, Y\in \mathfrak {m}\), we confirm that the metrics in (6) are indeed positive definite if and only if \(\mu > -1/n\). Similarly, we find that
$$\begin{aligned} ({\text {tr}}X)^2-\mu {\text {tr}}(X^2)=\frac{n-\mu }{n}({\text {tr}}X)^2 - \mu \Vert \pi (X)\Vert ^2_{\mathrm {std}}, \end{aligned}$$
(23)
which is Lorentzian if and only if \(0<\mu <n\). Thus, we see that the affine-invariant pseudo-Riemannian structures on \(S^+_n\) are essentially either Riemannian or Lorentzian, and the quadratic cone fields in (19) are precisely the cone fields defined by the affine-invariant Lorentzian metrics.
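The signature claim for the quadratic form in (23) can be verified directly for small n. The sketch below assumes NumPy and uses an orthonormal basis of \(\mathfrak {m}\) with respect to \({\text {tr}}(XY)\); the function names `gram_matrix` and `signature` are hypothetical.

```python
import numpy as np

def gram_matrix(mu, n):
    """Gram matrix of B(X, Y) = tr(X) tr(Y) - mu tr(XY) in an orthonormal basis of m."""
    basis = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            if i == j:
                E[i, i] = 1.0
            else:
                E[i, j] = E[j, i] = 1.0 / np.sqrt(2.0)
            basis.append(E)
    return np.array([[np.trace(X) * np.trace(Y) - mu * np.trace(X @ Y)
                      for Y in basis] for X in basis])

def signature(mu, n):
    """Return (number of positive, number of negative) eigenvalues of the form."""
    w = np.linalg.eigvalsh(gram_matrix(mu, n))
    return int(np.sum(w > 1e-12)), int(np.sum(w < -1e-12))

sig_inside = signature(1.0, 3)    # a value of mu in (0, n)
sig_outside = signature(4.0, 3)   # a value of mu > n
```

For n = 3 the space \(\mathfrak {m}\) has dimension 6, so a Lorentzian form has signature (1, 5); this is what the sketch returns for \(\mu = 1\), whereas for \(\mu = 4\) the form is negative definite.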

3.3 Affine-invariant partial orders on \(S^+_n\)

A smooth cone field \(\mathcal {K}\) on a manifold \(\mathcal {M}\) gives rise to a conal order \(\prec _{\mathcal {K}}\) on \(\mathcal {M}\), defined by \(x\prec _{\mathcal {K}} y\) if there exists a (piecewise) smooth curve \(\gamma :[0,1]\rightarrow \mathcal {M}\) with \(\gamma (0)=x\), \(\gamma (1)=y\) and \(\gamma '(t)\in \mathcal {K}(\gamma (t))\) whenever the derivative exists. The closure \(\le _{\mathcal {K}}\) of this order is again an order and satisfies \(x\le _{\mathcal {K}} y\) if and only if \(y\in \overline{\{z:x\prec _{\mathcal {K}} z\}}\). We say that \(\mathcal {M}\) is globally orderable if \(\le _{\mathcal {K}}\) is a partial order. Here we will prove that the conal orders induced by affine-invariant cone fields on \(S^+_n\) define partial orders. That is, we will show that the conal orders satisfy the antisymmetry property that \(\varSigma _1\le _{\mathcal {K}} \varSigma _2\) and \(\varSigma _2\le _{\mathcal {K}} \varSigma _1\) together imply \(\varSigma _1=\varSigma _2\), for any affine-invariant cone field \(\mathcal {K}\) on \(S^+_n\). In other words, we will prove that there do not exist any non-trivial closed conal curves in \(S^+_n\). In the following, we will make use of the preimage theorem [3] given below. Recall that given a smooth map \(F:\mathcal {M}\rightarrow \mathcal {N}\) between manifolds, we say that a point \(y\in \mathcal {N}\) is a regular value of F if for all \(x\in F^{-1}(y)\) the map \(dF\vert _x:T_x\mathcal {M}\rightarrow T_y\mathcal {N}\) is surjective.

Theorem 1

(The preimage theorem) Let \(F:\mathcal {M}\rightarrow \mathcal {N}\) be a smooth map of manifolds, with \(\dim \mathcal M=m\) and \(\dim \mathcal {N}=n\). If \(c\in \mathcal {N}\) is a regular value of F, then \(F^{-1}(c)\) is a submanifold of \(\mathcal {M}\) of dimension \(m-n\). Moreover, the tangent space of \(F^{-1}(c)\) at any point \(x\in F^{-1}(c)\) is equal to \(\ker (dF\vert _x)\).

Now define \(F:S^+_n\rightarrow \mathbb {R}\) by \(F(\varSigma )=\det \varSigma \). By Jacobi’s formula, the differential of the determinant takes the form \( d(\det )\vert _{\varSigma }X={\text {tr}}\left( {\text {adj}}(\varSigma )X\right) , \) where \({\text {adj}}(\varSigma )\) denotes the adjugate of \(\varSigma \). That is,
$$\begin{aligned} dF\vert _{\varSigma }X=(\det \varSigma ){\text {tr}}\left( \varSigma ^{-1}X\right) , \end{aligned}$$
(24)
for all \(X\in T_{\varSigma }S^+_n\). Note that for \(c>0\) and any \(\varSigma \in F^{-1}(c)\), we have \(dF\vert _{\varSigma }I=c{\text {tr}}\left( \varSigma ^{-1}\right) >0\), which clearly shows that any \(c>0\) is a regular value of F. Hence, \(F^{-1}(c)\) is a submanifold of codimension 1 for any choice of \(c>0\). Furthermore, as \({\text {im}}(F) = \mathbb {R}^+=\{c\in \mathbb {R}:c>0\}\), the collection of submanifolds \(\{F^{-1}(c)\}_{c>0}\) forms a foliation of \(S^+_n\). Since \(\det \varSigma >0\) for any \(\varSigma \in S^+_n\), (24) implies that \({\text {ker}}(dF\vert _{\varSigma })=\{X\in T_{\varSigma }S^+_n:{\text {tr}}(\varSigma ^{-1}X)=0\}\). Thus, the tangent spaces to the submanifolds \(\{F^{-1}(c)\}_{c>0}\) are described by the affine-invariant distribution \(\mathcal {D}_{\varSigma }\) of rank \(\dim S^+_n-1 = n(n+1)/2-1\) on \(S^+_n\) defined by \(\mathcal {D}_{\varSigma }:=\{X\in T_{\varSigma }S^+_n:{\text {tr}}(\varSigma ^{-1}X)=0\}\).
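Formula (24) and the description of \(\mathcal {D}_{\varSigma }\) can be compared against a finite-difference approximation. This is a sketch assuming NumPy; the name `ddet`, the step size `h` and the random test data are illustrative choices.

```python
import numpy as np

def ddet(Sigma, X):
    """Directional derivative (24) of det at Sigma in the direction X."""
    return np.linalg.det(Sigma) * np.trace(np.linalg.solve(Sigma, X))

rng = np.random.default_rng(3)
n = 3
B = rng.standard_normal((n, n))
Sigma = B @ B.T + np.eye(n)
X = rng.standard_normal((n, n)); X = X + X.T

# Central finite difference of t -> det(Sigma + t X) at t = 0.
h = 1e-6
finite_diff = (np.linalg.det(Sigma + h * X) - np.linalg.det(Sigma - h * X)) / (2 * h)

# Remove the component of X along Sigma to obtain a direction in D_Sigma.
X0 = X - (np.trace(np.linalg.solve(Sigma, X)) / n) * Sigma
tangent_derivative = ddet(Sigma, X0)
```

The value `finite_diff` should match `ddet(Sigma, X)` closely, and `tangent_derivative` should vanish up to round-off, reflecting that directions in \(\mathcal {D}_{\varSigma }\) are tangent to the level sets of the determinant.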

Proposition 3

If \(\gamma :[0,1]\rightarrow S^+_n\) is a non-trivial conal curve with respect to a quadratic affine-invariant cone field \(\mathcal {K}\) (19), then
$$\begin{aligned} t_2>t_1 \implies \det (\gamma (t_2))>\det (\gamma (t_1)), \end{aligned}$$
(25)
for \(t_1,t_2\in [0,1]\).

Proof

First note that \(X\in \mathcal {K}(\varSigma ){\setminus }\{0\}\) implies that \({\text {tr}}(\varSigma ^{-1}X)>0\). This follows by noting that if \({\text {tr}}(\varSigma ^{-1}X)=0\), then the first inequality in (19) with \(\mu >0\) gives \({\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X)={\text {tr}}[(\varSigma ^{-1/2}X\varSigma ^{-1/2})^2]\le 0\), which is a contradiction since the trace of the square of a nonzero symmetric matrix is strictly positive. For simplicity, we assume that \(\gamma \) is a non-trivial smooth conal curve. The proof for a piecewise smooth curve is similar. We then have \({\text {tr}}(\gamma (t)^{-1}\gamma '(t))>0\), which implies that
$$\begin{aligned} \frac{d}{dt}\det \gamma (t)=(\det \gamma (t)){\text {tr}}\left( \gamma (t)^{-1}\gamma '(t)\right) >0. \end{aligned}$$
(26)
\(\square \)
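Proposition 3 can be illustrated on an explicit conal curve. The sketch below assumes NumPy and uses the curve \(\gamma (t)=L\exp (tD)L^T\) with a diagonal direction D chosen in \(\mathcal {K}(I)\) for \(\mu =1\); this particular curve, the helper names and the sampling grid are assumptions made for the illustration.

```python
import numpy as np

mu = 1.0
d = np.array([1.0, 0.5, -0.2])   # spectrum of a direction D = diag(d) in K(I) for mu = 1

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
L = np.linalg.cholesky(B @ B.T + np.eye(3))

def gamma(t):
    """Curve gamma(t) = L exp(t D) L^T, starting at L L^T."""
    return (L * np.exp(t * d)) @ L.T

def gamma_dot(t):
    """Derivative of gamma with respect to t."""
    return (L * (d * np.exp(t * d))) @ L.T

def in_cone(Sigma, X, mu, tol=1e-9):
    """Membership test for the cone K(Sigma) of (19), up to a tolerance."""
    M = np.linalg.solve(Sigma, X)
    t = np.trace(M)
    return bool(t >= -tol and t**2 - mu * np.trace(M @ M) >= -tol)

ts = np.linspace(0.0, 1.0, 21)
conal = all(in_cone(gamma(t), gamma_dot(t), mu) for t in ts)
dets = np.array([np.linalg.det(gamma(t)) for t in ts])
```

Note that D has a negative eigenvalue, so the curve is not increasing in the Löwner sense, yet the velocity stays in the cone field and the sampled determinants increase strictly, as (25) predicts.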

Proposition 3 clearly implies that \(S^+_n\) equipped with any of the cone fields described by (19) does not admit any non-trivial closed conal curves. Indeed, this result holds for all affine-invariant cone fields, not just quadratic ones. To see this, note that the permutation symmetry (10) of Proposition 1 implies that \({\text {tr}}(\varSigma ^{-1}X)\ne 0\) whenever \(X\in \mathcal {K}(\varSigma ){\setminus }\{0\}\). It thus follows by (26) that \(\det \circ \gamma :[0,1]\rightarrow \mathbb {R}^+\) is a strictly monotone function for any non-trivial conal curve \(\gamma \), which rules out the existence of closed conal curves. We thus arrive at the following theorem.

Theorem 2

All affine-invariant conal orders on \(S^+_n\) are partial orders.

At this point it is worth noting a few interesting features of the collection of submanifolds \(\{F^{-1}(c)\}_{c>0}\) of \(S^+_n\). First note that if \(\gamma \) is an inextensible conal curve, then by (26) it must intersect each of the submanifolds \(F^{-1}(c)\) exactly once. That is, for each \(c>0\), \(F^{-1}(c)\) defines a Cauchy surface for the causal structure induced by any affine-invariant cone field. We also note the following results which connect these submanifolds to geodesics on \(S^+_n\) with respect to the standard affine-invariant Riemannian metric \(ds^2={\text {tr}}[(\varSigma ^{-1}d\varSigma )^2]\) on \(S^+_n\).

Proposition 4

Endow \(S^+_n\) with the Riemannian structure defined by the standard Riemannian metric \(ds^2={\text {tr}}[(\varSigma ^{-1}d\varSigma )^2]\). We have the following results.
  (i) If \(\varSigma _1,\varSigma _2\in S^+_n\) satisfy \(\det \varSigma _1=\det \varSigma _2=c\), then the geodesic from \(\varSigma _1\) to \(\varSigma _2\) lies in \(F^{-1}(c)\).

  (ii) If \(X\in T_{\varSigma }S^+_n\) satisfies \({\text {tr}}(\varSigma ^{-1}X)=0\), then the geodesic through \(\varSigma \) in the direction of X stays on the submanifold \(F^{-1}(\det \varSigma )\).

Proof

i) Let \(\varSigma _1,\varSigma _2\in S^+_n\) satisfy \(\det \varSigma _1=\det \varSigma _2\). The geodesic \(\gamma \) from \(\varSigma _1\) to \(\varSigma _2\) is given by
$$\begin{aligned} \gamma (t)=\varSigma _1^{1/2}\exp \left( t\log \left( \varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}\right) \right) \varSigma _1^{1/2}. \end{aligned}$$
(27)
Thus, \(\det (\gamma (t))=(\det \varSigma _1)\det (\exp (t\log (\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2})))\). Using the matrix identity \(\log (\det A)={\text {tr}}(\log A)\), we find that
$$\begin{aligned} \log \left[ \det \left( \exp \left( t\log \left( \varSigma _1^{-\frac{1}{2}}\varSigma _2\varSigma _1^{-\frac{1}{2}}\right) \right) \right) \right]&= {\text {tr}}\left[ \log \left( \exp \left( t\log \left( \varSigma _1^{-\frac{1}{2}}\varSigma _2\varSigma _1^{-\frac{1}{2}}\right) \right) \right) \right] \nonumber \\&= t {\text {tr}}\left( \log \left( \varSigma _1^{-\frac{1}{2}}\varSigma _2\varSigma _1^{-\frac{1}{2}}\right) \right) \end{aligned}$$
(28)
$$\begin{aligned}&= t \log \left( \det \varSigma _2/\det \varSigma _1\right) = 0. \end{aligned}$$
(29)
Therefore, \(\det (\exp (t\log (\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2})))=1\), which implies that \(\det (\gamma (t))=\det \varSigma _1\) for all \(t\in \mathbb {R}\).
ii) The geodesic \(\gamma \) from \(\varSigma \) in the direction of \(X\in T_{\varSigma }S^+_n\) takes the form \(\gamma (t)=\varSigma ^{1/2}\exp (t\varSigma ^{-1/2}X\varSigma ^{-1/2})\varSigma ^{1/2}\). If \({\text {tr}}(\varSigma ^{-1}X)=0\), then
$$\begin{aligned} \log (\det (\exp (t\varSigma ^{-1/2}X\varSigma ^{-1/2})))={\text {tr}}(t\varSigma ^{-1/2}X\varSigma ^{-1/2})=t{\text {tr}}(\varSigma ^{-1}X)=0, \end{aligned}$$
(30)
which implies that \(\det (\gamma (t))=(\det \varSigma )\det (\exp (t\varSigma ^{-1/2}X\varSigma ^{-1/2}))=\det \varSigma \) for all \(t\in \mathbb {R}\). \(\square \)
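Part (i) of Proposition 4 can be checked numerically by evaluating the geodesic (27) between two matrices of equal determinant. This is a sketch assuming NumPy; `spd_fun` and `geodesic` are hypothetical helper names, and matrix functions are computed through eigendecompositions, using \(\exp (t\log M)=M^t\) for positive definite M.

```python
import numpy as np

def spd_fun(S, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix S."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def geodesic(S1, S2, t):
    """Point gamma(t) on the affine-invariant geodesic (27) from S1 to S2."""
    R = spd_fun(S1, np.sqrt)
    Ri = spd_fun(S1, lambda w: 1.0 / np.sqrt(w))
    M = Ri @ S2 @ Ri
    M = (M + M.T) / 2                      # symmetrize against round-off
    return R @ spd_fun(M, lambda w: w**t) @ R

rng = np.random.default_rng(5)
n = 3
mats = []
for _ in range(2):
    B = rng.standard_normal((n, n))
    S = B @ B.T + np.eye(n)
    mats.append(S / np.linalg.det(S) ** (1.0 / n))   # rescale to determinant 1
S1, S2 = mats

dets = np.array([np.linalg.det(geodesic(S1, S2, t)) for t in np.linspace(-1, 2, 13)])
```

All entries of `dets` should equal 1 up to round-off, including parameter values outside [0, 1], consistent with the geodesic remaining in \(F^{-1}(1)\) for all real t.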

3.4 Causal semigroups

Define a wedge to be a closed and convex subset of a vector space that is also invariant with respect to scaling by positive numbers. Notice in particular that a wedge need not be pointed. Let \(\mathcal {M}=G/H\) be a homogeneous space, G a Lie group with group identity element e and Lie algebra \(\mathfrak {g}\), H a closed subgroup with Lie algebra \(\mathfrak {h}\), and \(\pi :G\rightarrow \mathcal {M}\) the associated projection map. Assume that the Lie algebra \(\mathfrak {g}\) contains a wedge W such that (i) \(W\cap -W=\mathfrak {h}\) and (ii) \({\text {Ad}}(h)W=W\) for all \(h\in H\). A wedge W is said to be a Lie wedge if \(e^{{\text {ad}}h}W=W\) for all \(h\in W\cap -W\). Denoting the left action of G on \(\mathcal {M}\) by \(\tau _g:\mathcal {M}\rightarrow \mathcal {M}\), we have \(\pi \circ \lambda _g=\tau _g\circ \pi \), where \(\lambda _g\) is the left multiplication with g on G. Conditions (i) and (ii) ensure that \(d\pi \vert _g\circ d\lambda _g\vert _eW\) only depends on \(\pi (g)\), so that
$$\begin{aligned} \mathcal {K}(\pi (g))=\left( d\pi \vert _g\circ d\lambda _g\vert _e\right) W, \end{aligned}$$
(31)
yields a well-defined field of pointed cones on \(\mathcal {M}\) that is invariant under the action of G on \(\mathcal {M}\): \(d\tau _g\vert _x\mathcal {K}(x)=\mathcal {K}(\tau _g(x))\). These results can be found in [13]. The set \(S=\{g\in G: o \le _{\mathcal {K}} \tau _g(o)\}\), where \(o=\pi (e)\), is a closed semigroup of G referred to as the causal semigroup of \((\mathcal {M},G,\mathcal {K})\). The following theorem is derived from [18].

Theorem 3

Let \(S=\overline{\langle \exp W\rangle H}\subseteq G\), then \(S=\pi ^{-1}\left( \{x\in \mathcal {M}:o\le _{\mathcal {K}} x\}\right) \) and \(\mathcal {M}\) is globally orderable with respect to \(\mathcal {K}\) if and only if \(W=\varvec{L}(S)\), where
$$\begin{aligned} \varvec{L}(S)=\{Z\in \mathfrak {g}:\exp (\mathbb {R}^+Z)\subseteq S\}. \end{aligned}$$
(32)

The affine-invariant cone fields on \(S^+_n=GL(n)/O(n)\) can be viewed as projections of invariant wedge fields on the Lie group GL(n) in the sense of the above results. Since we have the reductive decomposition \(\mathfrak {gl}(n)=\mathfrak {o}(n)\oplus \mathfrak {m}\), it is easy to construct the corresponding wedge field W that satisfies conditions (i) and (ii) for a given affine-invariant cone field \(\mathcal {K}\). We will now use this structure and Theorem 3 to prove the following important result.

Theorem 4

Let \(S^+_n\) be equipped with an affine-invariant cone field \(\mathcal {K}\) and the standard affine-invariant Riemannian metric \(ds^2={\text {tr}}[(\varSigma ^{-1}d\varSigma )^2]\). For any pair of matrices \(\varSigma _1,\varSigma _2\in S^+_n\), we have \(\varSigma _1\le _{\mathcal {K}}\varSigma _2\) if and only if the geodesic from \(\varSigma _1\) to \(\varSigma _2\) is a conal curve.

Proof

Note that the expression of the geodesic from \(\varSigma _1\) to \(\varSigma _2\) given in (27) implies that this theorem is equivalent to
$$\begin{aligned} \varSigma _1\le _{\mathcal {K}}\varSigma _2 \quad \Longleftrightarrow \quad \log \left( \varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}\right) \in \mathcal {K}(I). \end{aligned}$$
(33)
As \(\mathcal {K}\) is affine-invariant, \(\varSigma _1\le _{\mathcal {K}}\varSigma _2\) is equivalent to \(I\le _{\mathcal {K}}\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}\). Thus, it is sufficient to prove that
$$\begin{aligned} I\le _{\mathcal {K}}\varSigma \quad \Longleftrightarrow \quad \log \left( \varSigma \right) \in \mathcal {K}(I), \end{aligned}$$
(34)
for any \(\varSigma \in S^+_n\). We define a wedge W in \(\mathfrak {gl}(n)\) by
$$\begin{aligned} W:=\{X+Y: X\in \mathcal {K}(I), Y\in \mathfrak {o}(n)\}\subset \mathfrak {gl}(n)=\mathfrak {m}\oplus \mathfrak {o}(n), \end{aligned}$$
(35)
where \(\mathcal {K}(I)\) is viewed as a subset of \(\mathfrak {m}\cong T_I S^+_n\). Note that (35) ensures that W satisfies the properties required of it in Theorem 3. If \(I\le _{\mathcal {K}}\varSigma \), it follows from Theorem 3 that there exists \(A\in W\) such that
$$\begin{aligned} \varSigma =\pi (\exp A)=\tau _{\exp A}(I)=(\exp A)(\exp A)^T. \end{aligned}$$
(36)
By the polar decomposition theorem of [16], any element \(g=\exp A\) of the semigroup \(S=\overline{\langle \exp W\rangle O(n)}\subset GL(n)\) admits a unique decomposition as \(g=(\exp X)Q\) with \(X\in W\cap \mathfrak {m}=\mathcal {K}(I)\) and \(Q\in O(n)\). Thus, we have
$$\begin{aligned} \varSigma =\tau _g(I)=\tau _{\exp X}(I)=\exp 2X, \end{aligned}$$
(37)
so that \(\log \varSigma =2X\in \mathcal {K}(I)\). \(\square \)

Remark 1

Let \(\mathcal {K}\) be a quadratic affine-invariant cone field described by (19). Given a pair \(\varSigma _1, \varSigma _2\in S^+_n\), we have by Theorem 4 that \(\varSigma _1\le _{\mathcal {K}} \varSigma _2\) if and only if \(\log \left( \varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}\right) \in \mathcal {K}(I)\), which is equivalent to
$$\begin{aligned} {\left\{ \begin{array}{ll} {\text {tr}}\left( \log (\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2})\right) \ge 0, \\ \left( {\text {tr}}(\log (\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}))\right) ^2-\mu {\text {tr}}\left[ (\log (\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}))^2\right] \ge 0. \end{array}\right. } \end{aligned}$$
(38)
Since \(\varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}\) and \(\varSigma _2\varSigma _1^{-1}\) have the same spectrum, (38) can be written as
$$\begin{aligned} {\left\{ \begin{array}{ll} {\text {tr}}\left( \log (\varSigma _2\varSigma _1^{-1})\right) \ge 0, \\ \left( {\text {tr}}(\log (\varSigma _2\varSigma _1^{-1}))\right) ^2-\mu {\text {tr}}\left[ (\log (\varSigma _2\varSigma _1^{-1}))^2\right] \ge 0, \end{array}\right. } \end{aligned}$$
(39)
which has the virtue of not involving square roots of \(\varSigma _1\) and \(\varSigma _2\). Equation (39) in turn is equivalent to
$$\begin{aligned} {\left\{ \begin{array}{ll} \sum \nolimits _i\log \lambda _i\ge 0, \\ \left( \sum \nolimits _i\log \lambda _i\right) ^2-\mu \sum \nolimits _i(\log \lambda _i)^2\ge 0, \end{array}\right. } \end{aligned}$$
(40)
where \(\lambda _i=\lambda _i(\varSigma _2\varSigma _1^{-1})\) \((i=1,\ldots ,n)\) denote the n real and positive eigenvalues of \(\varSigma _2\varSigma _1^{-1}\). We have thus used invariance to reduce the question of whether a pair of positive definite matrices \(\varSigma _1\) and \(\varSigma _2\) are ordered with respect to any of the quadratic affine-invariant cone fields to a pair of inequalities involving the spectrum of \(\varSigma _2\varSigma _1^{-1}\).
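The spectral test (40) is straightforward to implement. The sketch below assumes NumPy; the function name `leq_quadratic`, the tolerance `tol`, and the example matrices are hypothetical choices made here for illustration. It uses the fact that \(\varSigma _1^{-1}\varSigma _2\) and \(\varSigma _2\varSigma _1^{-1}\) share the same spectrum.

```python
import numpy as np

def leq_quadratic(S1, S2, mu, tol=1e-12):
    """Return True if S1 <=_K S2 according to the spectral inequalities (40)."""
    # S1^{-1} S2 has the same spectrum as S2 S1^{-1}; it is real and positive.
    lam = np.linalg.eigvals(np.linalg.solve(S1, S2)).real
    logs = np.log(lam)
    total = logs.sum()
    return bool(total >= -tol and total**2 - mu * np.sum(logs**2) >= -tol)

I2 = np.eye(2)
A = np.diag([4.0, 2.0])    # both eigenvalues above 1
B = np.diag([4.0, 0.5])    # one eigenvalue above 1, one below

for mu in (0.1, 1.0, 1.9):
    print(mu, leq_quadratic(I2, A, mu), leq_quadratic(I2, B, mu))
```

In this example the answer depends on \(\mu \): the pair (I, A) is ordered for \(\mu =0.1\) and \(\mu =1\) but not for \(\mu =1.9\), while the pair (I, B) is ordered only for \(\mu =0.1\).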

3.5 Visualization of affine-invariant cone fields on \(S^+_2\)

It is well-known that the set of positive semidefinite matrices of dimension n forms a cone in the space of symmetric \(n\times n\) matrices. Moreover, \(S^+_n\) forms the interior of this cone. A concrete visualization of this identification can be made in the \(n=2\) case, as shown in Fig. 1a. The set \(S^+_2\) can be identified with the interior of the set \(K=\{(x,y,z)\in \mathbb {R}^3: z^2-x^2-y^2 \ge 0, \ z \ge 0\}\), through the bijection \(\phi :S^+_2\rightarrow {\text {int}} K\) given by
$$\begin{aligned} \phi : \begin{pmatrix} a \ &{} \ b \\ b \ &{} \ c \end{pmatrix} \mapsto (x,y,z)=\left( \sqrt{2}b,\frac{1}{\sqrt{2}}(a-c),\frac{1}{\sqrt{2}}(a+c)\right) . \end{aligned}$$
(41)
Inverting \(\phi \), we find that \(a = (z+y)/\sqrt{2}\), \(b=x/\sqrt{2}\), \(c=(z-y)/\sqrt{2}\). Note that the point \((x,y,z)=(0,0,\sqrt{2})\) corresponds to the identity matrix \(I\in S^+_2\). We seek to arrive at a visual representation of the affine-invariant cone fields generated from the \({\text {Ad}}_{O(n)}\)-invariant cones (11) for different choices of the parameter \(\mu \). The defining inequalities \({\text {tr}}(X)\ge 0\) and \(({\text {tr}}(X))^2 - \mu {\text {tr}}(X^2) \ge 0\) in \(T_IS^+_2\) take the forms
$$\begin{aligned} \delta z \ge 0, \quad \text {and} \quad \left( \frac{2}{\mu }-1\right) \delta z^2 - \delta x^2-\delta y^2 \ge 0, \end{aligned}$$
(42)
respectively, where \((\delta x, \delta y, \delta z)\in T_{(0,0,\sqrt{2})} K\cong T_{I}S^+_2\). The corresponding spectral cone \(\mathcal {K}_{\Lambda }^{\mu }\subset \mathbb {R}^2\) is given by
$$\begin{aligned} \lambda _1+\lambda _2\ge 0, \quad \text {and} \quad (\lambda _1+\lambda _2)^2-\mu (\lambda _1^2+\lambda _2^2)\ge 0. \end{aligned}$$
(43)
See Fig. 1b for an illustration of such a cone for a choice of \(\mu \in (0,1)\).
Fig. 1

a Identification of \(S^+_2\) with the interior of the closed, convex, pointed cone \(K=\{(x,y,z)\in \mathbb {R}^3:z^2-x^2-y^2 \ge 0, \ z\ge 0\}\) in \(\mathbb {R}^3\). The \({\text {Ad}}_{O(n)}\)-invariant cone \(\mathcal {K}(I)\subset T_{I}S^+_n\) at identity is also shown for a choice of \(\mu \in (0,1)\). b The corresponding spectral cone \(\mathcal {K}^{\mu }_{\Lambda }\subset \mathbb {R}^2\) which characterizes the cone \(\mathcal {K}(I)\subset T_{I}S^+_n\)
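The identification (41) and the coordinate form (42) of the cone at the identity can be verified directly. This is a small sketch assuming NumPy; `phi`, `phi_inv` and the sample matrices are illustrative names and values. Since \(\phi \) is linear, it also maps tangent vectors, and the quadratic form \(({\text {tr}}(X))^2-\mu {\text {tr}}(X^2)\) equals \(\mu \) times the left-hand side of the second inequality in (42).

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def phi(S):
    """Map a symmetric 2x2 matrix [[a, b], [b, c]] to (x, y, z) as in (41)."""
    a, b, c = S[0, 0], S[0, 1], S[1, 1]
    return np.array([SQRT2 * b, (a - c) / SQRT2, (a + c) / SQRT2])

def phi_inv(p):
    """Inverse of phi: a = (z + y)/sqrt(2), b = x/sqrt(2), c = (z - y)/sqrt(2)."""
    x, y, z = p
    return np.array([[z + y, x], [x, z - y]]) / SQRT2

S = np.array([[2.0, 0.5], [0.5, 1.0]])
x, y, z = phi(S)
cone_value = z**2 - x**2 - y**2            # equals 2 det(S), positive on S^+_2

# phi is linear, so tangent vectors at I are mapped by phi as well.
X = np.array([[0.7, -0.2], [-0.2, 0.4]])
dx, dy, dz = phi(X)
mu = 0.5
lhs = np.trace(X) ** 2 - mu * np.trace(X @ X)
rhs = mu * ((2.0 / mu - 1.0) * dz**2 - dx**2 - dy**2)
print(cone_value, lhs, rhs)
```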

Clearly the translation invariant cone fields generated from this cone are given by the same equations as in (42) for \((\delta x, \delta y, \delta z)\in T_{(x,y,z)} K\cong T_{\varSigma }S^+_2\), where \(\phi (\varSigma )=(x,y,z)\). To obtain the affine-invariant cone fields, note that at \(\varSigma = \phi ^{-1}(x,y,z)\in S^+_2\), the inequality \({\text {tr}}(\varSigma ^{-1}X)\ge 0\) takes the form
$$\begin{aligned}&{\text {tr}}\left[ \begin{pmatrix} c &{} -b \\ -b &{} a \end{pmatrix} \begin{pmatrix} \delta a \ &{} \ \delta b \\ \delta b \ &{} \ \delta c \end{pmatrix} \right] = c\ \delta a - 2b \ \delta b +a \ \delta c \ge 0 \end{aligned}$$
(44)
$$\begin{aligned}&\qquad \Longleftrightarrow \ z\ \delta z - x \ \delta x - y \ \delta y \ge 0. \end{aligned}$$
(45)
Similarly, the inequality \(({\text {tr}}(\varSigma ^{-1}X))^2-\mu {\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X)\ge 0\) is equivalent to
$$\begin{aligned}&2(x\ \delta x+y\ \delta y - z\ \delta z)^2 - \mu \Bigg [(z^2+x^2-y^2)\delta x^2+(z^2-x^2+y^2)\delta y^2\nonumber \\&\quad + (x^2+y^2+z^2)\delta z^2+ \ 4xy\ \delta x \delta y - 4xz\ \delta x \delta z - 4yz\ \delta y \delta z\Bigg ] \ge 0, \end{aligned}$$
(46)
where \((\delta x, \delta y, \delta z)\in T_{(x,y,z)} K\cong T_{\varSigma }S^+_2\). In the case \(\mu =1\), this reduces to \(\delta z^2-\delta x^2-\delta y^2 \ge 0\), which is the second inequality in (42) evaluated at \(\mu =1\). Thus, for \(\mu =1\) the quadratic cone field generated by affine-invariance coincides with the corresponding translation-invariant cone field. Generally, however, affine-invariant and translation-invariant cone fields do not agree, as depicted in Fig. 2. Each of the distinct cone fields in Fig. 2 induces a distinct partial order on \(S^+_2\).
Fig. 2

Cone fields on \(S^+_2\): a quadratic affine-invariant cone fields for different choices of the parameter \(\mu \in (0,2)\). b The corresponding translation-invariant cone fields
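The coincidence of the two cone fields at \(\mu =1\) on \(S^+_2\) can be spot-checked without coordinates. The sketch below assumes NumPy and uses the standard \(2\times 2\) identity \(({\text {tr}}M)^2-{\text {tr}}(M^2)=2\det M\), which is not stated in the text; the function names and random samples are illustrative. With \(M=\varSigma ^{-1}X\) the affine-invariant form equals \(2\det X/\det \varSigma \), a positive multiple of the translation-invariant form \(2\det X\), so both have the same sign.

```python
import numpy as np

def affine_form(S, X, mu):
    """(tr(S^{-1}X))^2 - mu tr(S^{-1}X S^{-1}X)."""
    M = np.linalg.solve(S, X)
    return np.trace(M) ** 2 - mu * np.trace(M @ M)

def translation_form(X, mu):
    """(tr X)^2 - mu tr(X^2), the same quadratic form evaluated as if S = I."""
    return np.trace(X) ** 2 - mu * np.trace(X @ X)

rng = np.random.default_rng(0)
checks = []
for _ in range(5):
    G = rng.standard_normal((2, 2))
    S = G @ G.T + np.eye(2)                 # a point of S^+_2
    H = rng.standard_normal((2, 2))
    X = (H + H.T) / 2                       # a tangent vector at S
    a = affine_form(S, X, 1.0)
    t = translation_form(X, 1.0)
    checks.append((a, t / np.linalg.det(S)))  # the two entries should agree
print(checks)
```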

3.6 The Löwner order

The Löwner order is the partial order \(\ge _L\) on \(S^+_n\) defined by
$$\begin{aligned} A \ge _L B \quad \Longleftrightarrow \quad A-B \ge _L O, \end{aligned}$$
(47)
where the inequality on the right denotes that \(A-B\) is positive semidefinite [5]. The definition in (47) is based on translations and the ‘flat’ geometry of \(S^+_n\). It is clear that the Löwner order is translation invariant in the sense that \(A\ge _L B\) implies that \(A+C \ge _L B +C\) for all \(A,B,C\in S^+_n\). From the perspective of conal orders, the Löwner order is the partial order induced by the cone field generated by translations of the cone of positive semidefinite matrices at \(T_IS^+_n\).
In the previous section, we gave an explicit construction showing that the cone field generated through translations of the cone of positive semidefinite matrices at \(T_IS^+_n\) coincides with the cone field generated through affine-invariance in the \(n=2\) case. We will now show that this is a general result which holds for all n. First note that the cone at \(T_IS^+_n\) can be expressed as
$$\begin{aligned} \mathcal {K}(I)=\{X\in T_IS^+_n: u^TXu \ge 0 \ \forall u\in \mathbb {R}^n,\ u^TXu = 0 \Rightarrow u =0\}, \end{aligned}$$
(48)
and the resulting translation-invariant cone field is simply given by
$$\begin{aligned} \mathcal {K}_T(\varSigma )=\{X\in T_{\varSigma }S^+_n: u^TXu \ge 0 \ \forall u\in \mathbb {R}^n,\ u^TXu = 0 \Rightarrow u =0\}. \end{aligned}$$
(49)
The corresponding affine-invariant cone field is given by
$$\begin{aligned}&\mathcal {K}_A(\varSigma )=\{X\in T_{\varSigma }S^+_n: u^T\varSigma ^{-1/2}X\varSigma ^{-1/2}u \ge 0 \ \forall u\in \mathbb {R}^n, \nonumber \\&u^T\varSigma ^{-1/2} X\varSigma ^{-1/2}u = 0 \Rightarrow u =0\}, \end{aligned}$$
(50)
which is seen to be equal to \(\mathcal {K}_T\) by introducing the invertible transformation \(\bar{u}=\varSigma ^{-1/2}u\) in (50). Thus we see that the Löwner order enjoys the special status of being both affine-invariant and translation-invariant, even though its classical definition is based on the ‘flat’ or translational geometry on \(S^+_n\).
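A numerical illustration of this equality, assuming NumPy: the sketch tests membership in the closed versions of \(\mathcal {K}_T\) and \(\mathcal {K}_A\) (positive semidefiniteness up to a tolerance) for a few random samples. The function names `in_KT` and `in_KA`, the tolerance, and the sampling scheme are assumptions made for this example.

```python
import numpy as np

def spd_power(S, r):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**r) @ V.T

def in_KT(X, tol=1e-10):
    """Translation-invariant Loewner cone: X is positive semidefinite (up to tol)."""
    return bool(np.linalg.eigvalsh(X).min() >= -tol)

def in_KA(S, X, tol=1e-10):
    """Affine-invariant version: S^{-1/2} X S^{-1/2} is positive semidefinite."""
    R = spd_power(S, -0.5)
    M = R @ X @ R
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)

rng = np.random.default_rng(1)
agree = []
for i in range(10):
    G = rng.standard_normal((3, 3))
    S = G @ G.T + np.eye(3)           # base point in S^+_3
    H = rng.standard_normal((3, 3))
    X = (H + H.T) / 2                 # random symmetric tangent vector
    if i % 2 == 0:
        X = X @ X + 0.1 * np.eye(3)   # force positive definite half the time
    agree.append(in_KT(X) == in_KA(S, X))
print(agree)
```

The agreement reflects the fact that a congruence by the invertible matrix \(\varSigma ^{-1/2}\) does not change the signs of the eigenvalues of X.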

4 Monotone functions on \(S^+_n\)

4.1 Differential positivity

Let f be a map of \(S^+_n\) into itself. We say that f is monotone with respect to a partial order \(\ge \) on \(S^+_n\) if \(f(\varSigma _1)\ge f(\varSigma _2)\) whenever \(\varSigma _1\ge \varSigma _2\). Such functions were introduced by Löwner in his seminal paper [17] on operator monotone functions. Since then, operator monotone functions have been studied extensively and have found applications in many fields including electrical engineering [1], network theory, and quantum information theory [6, 19]. Monotonicity of mappings and dynamical systems with respect to partial orders induced by cone fields has a local geometric characterization in the form of differential positivity [9]. A smooth map \(f:S^+_n\rightarrow S^+_n\) is said to be differentially positive with respect to a cone field \(\mathcal {K}\) on \(S^+_n\) if \(df\vert _{\varSigma }(\delta \varSigma )\in \mathcal {K}(f(\varSigma ))\) whenever \(\delta \varSigma \in \mathcal {K}(\varSigma )\), where \(df\vert _{\varSigma }:T_{\varSigma }S^+_n \rightarrow T_{f(\varSigma )}S^+_n\) denotes the differential of f at \(\varSigma \). Assuming that \(\ge _{\mathcal {K}}\) is a partial order induced by \(\mathcal {K}\), f is monotone with respect to \(\ge _{\mathcal {K}}\) if and only if it is differentially positive with respect to \(\mathcal {K}\). To see this, recall that \(\varSigma _2\ge _{\mathcal {K}} \varSigma _1\) means that there exists some conal curve \(\gamma :[0,1]\rightarrow S^+_n\) such that \(\gamma (0)=\varSigma _1\), \(\gamma (1)=\varSigma _2\) and \(\gamma '(t)\in \mathcal {K}(\gamma (t))\) for all \(t\in (0,1)\). Now \(f\circ \gamma : [0,1]\rightarrow S^+_n\) is a curve in \(S^+_n\) with \((f\circ \gamma ) (0)=f(\varSigma _1)\), \((f\circ \gamma )(1)=f(\varSigma _2)\), and
$$\begin{aligned} (f\circ \gamma )'(t)=df\vert _{\gamma (t)}\gamma '(t). \end{aligned}$$
(51)
Hence, \(f\circ \gamma \) is a conal curve joining \(f(\varSigma _1)\) to \(f(\varSigma _2)\) if and only if \(df\vert _{\gamma (t)}\mathcal {K}(\gamma (t))\subseteq \mathcal {K}(f(\gamma (t)))\).

4.2 The generalized Löwner-Heinz theorem

One of the most fundamental results in operator theory is the Löwner-Heinz theorem [11, 17] stated below.

Theorem 5

(Löwner-Heinz) If \(\varSigma _1\ge _L \varSigma _2\) in \(S^+_n\) and \(r\in [0,1]\), then
$$\begin{aligned} \varSigma _1^r\ge _L\varSigma _2^r. \end{aligned}$$
(52)

Furthermore, if \(n\ge 2\) and \(r>1\), then \(\varSigma _1\ge _L \varSigma _2 \not \Rightarrow \varSigma _1^r\ge _L \varSigma _2^r\).
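Both halves of the theorem can be seen on a small example. The sketch below assumes NumPy; the matrices A and B and the helper names are chosen here for illustration. Here \(A-B\) is the rank-one matrix with all entries equal to 1, so \(A\ge _L B\), whereas \(A^2-B^2\) has determinant \(-1\) and is therefore indefinite.

```python
import numpy as np

def spd_power(S, r):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**r) @ V.T

def loewner_geq(A, B, tol=1e-10):
    """Return True if A - B is positive semidefinite up to the tolerance tol."""
    return bool(np.linalg.eigvalsh(A - B).min() >= -tol)

A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.diag([2.0, 1.0])

ordered = loewner_geq(A, B)                                    # A >=_L B
roots_ordered = loewner_geq(spd_power(A, 0.5), spd_power(B, 0.5))
squares_ordered = loewner_geq(A @ A, B @ B)                    # r = 2 > 1
print(ordered, roots_ordered, squares_ordered)
```

The order survives taking square roots (\(r=1/2\)) but not squaring (\(r=2\)).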

There are several different proofs of the Löwner-Heinz theorem. See [5, 11, 17, 20], for instance. Most of these proofs are based on analytic methods, such as integral representations from complex analysis. Instead we employ a geometric approach to study monotonicity based on a differential analysis of the system. One of the advantages of such an approach is that it is immediately applicable to all of the conal orders considered in this paper, while providing geometric insight into the behavior of the map under consideration. By using invariant differential positivity with respect to the family of affine-invariant cone fields in (19), we arrive at the following extension to the Löwner-Heinz theorem.

Theorem 6

(Generalized Löwner-Heinz) For any of the affine-invariant partial orders induced by the quadratic cone fields (19) parametrized by \(\mu \), the map \(f_r(\varSigma )=\varSigma ^r\) is monotone on \(S^+_n\) for any \(r\in [0,1]\).

This result suggests that the monotonicity of the map \(f_r: \varSigma \mapsto \varSigma ^r\) for \(r\in (0,1)\) is intimately connected to the affine-invariant geometry of \(S^+_n\) and not its translational geometry. The structure of the proof of Theorem 6 is as follows. We first prove that the map \(f_{1/p}:\varSigma \mapsto \varSigma ^{1/p}\) is monotone for any \(p\in \mathbb {N}\). We then extend this result to maps \(f_{q/p}:\varSigma \mapsto \varSigma ^{q/p}\) for rational numbers \(q/p\in \mathbb {Q}\cap (0,1)\), before arriving at the full result via a density argument. We prove monotonicity by establishing differential positivity in each case. To prove the monotonicity of \(f_{1/p}:\varSigma \mapsto \varSigma ^{1/p}\), \(p\in \mathbb {N}\), we only need the following lemma [24].

Lemma 2

If A and B are Hermitian \(n\times n\) matrices, then
$$\begin{aligned} {\text {tr}}[(AB)^{2m}]\le {\text {tr}}[A^{2m}B^{2m}], \quad m\in \mathbb {N}. \end{aligned}$$
(53)

The proof of the theorem for rational exponents is based on a simple observation whose proof nonetheless requires a few technical steps that are based on Proposition 5, which itself relies on Lemma 3 established in [7, 10].

Lemma 3

Let F, G be real-valued functions on some domain \(D\subseteq \mathbb {R}\) and \(\varSigma \), X be Hermitian matrices, such that the spectrum of \(\varSigma \) is contained in D. If (F, G) is an antimonotone pair so that \((F(a)-F(b))(G(a)-G(b))\le 0\) for all \(a,b\in D\), then
$$\begin{aligned} {\text {tr}}\left[ F(\varSigma )XG(\varSigma )X\right] \ge {\text {tr}}\left[ F(\varSigma )G(\varSigma )X^2\right] . \end{aligned}$$
(54)

Proposition 5

If \(\varSigma \in S^+_n\) and X is a Hermitian matrix, then
$$\begin{aligned} {\text {tr}}\left( \varSigma ^{-2-k}X\varSigma ^{k}X\right) \ge {\text {tr}}\left( \varSigma ^{-1-k}X\varSigma ^{-1+k}X\right) , \end{aligned}$$
(55)
for integers \(k\ge 0\).

Proof

Define \(F,G:(0,\infty )\rightarrow \mathbb {R}\) by \(F(x):=x^{-1-2k}\) and \(G(x):=x\), and note that \((F(a)-F(b))(G(a)-G(b))\le 0\) for all \(a,b> 0\). Let \(\varSigma \in S^+_n\) and X be a Hermitian matrix. Then, we have
$$\begin{aligned} {\text {tr}}\left( \varSigma ^{-2-k}X\varSigma ^{k}X\right)&= {\text {tr}}\left[ \varSigma ^{-1-2k}\left( \varSigma ^{\frac{-1+k}{2}}X\varSigma ^{\frac{-1+k}{2}}\right) \varSigma \left( \varSigma ^{\frac{-1+k}{2}}X\varSigma ^{\frac{-1+k}{2}}\right) \right] \nonumber \\&\ge {\text {tr}}\left[ \varSigma ^{-2k}\left( \varSigma ^{\frac{-1+k}{2}}X\varSigma ^{\frac{-1+k}{2}}\right) \left( \varSigma ^{\frac{-1+k}{2}}X\varSigma ^{\frac{-1+k}{2}}\right) \right] \end{aligned}$$
(56)
$$\begin{aligned}&= {\text {tr}}\left( \varSigma ^{-1-k}X\varSigma ^{-1+k}X\right) , \end{aligned}$$
(57)
following an application of Lemma 3 with the Hermitian matrix replaced by \(\varSigma ^{(-1+k)/2}X\varSigma ^{(-1+k)/2}\). \(\square \)
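Proposition 5 lends itself to a randomized sanity check. This is a sketch under the assumption that NumPy is available and that real symmetric matrices are representative; `prop5_gap`, the seed, and the sample sizes are illustrative. It evaluates the difference between the two sides of (55) for several random \(\varSigma \), X and \(k=0,\ldots ,3\).

```python
import numpy as np

def spd_power(S, r):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**r) @ V.T

def prop5_gap(S, X, k):
    """tr(S^{-2-k} X S^{k} X) - tr(S^{-1-k} X S^{-1+k} X); (55) states this is >= 0."""
    lhs = np.trace(spd_power(S, -2 - k) @ X @ spd_power(S, k) @ X)
    rhs = np.trace(spd_power(S, -1 - k) @ X @ spd_power(S, -1 + k) @ X)
    return lhs - rhs

rng = np.random.default_rng(0)
gaps = []
for _ in range(20):
    G = rng.standard_normal((3, 3))
    S = G @ G.T + np.eye(3)          # random symmetric positive definite matrix
    H = rng.standard_normal((3, 3))
    X = (H + H.T) / 2                # random symmetric matrix
    for k in range(4):
        gaps.append(prop5_gap(S, X, k))
print(min(gaps))
```

When X commutes with \(\varSigma \) both traces coincide, so the gap vanishes, e.g. `prop5_gap(S, np.eye(3), 2)` is zero up to round-off.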

Proof of Theorem 6:

The differential \(df_{1/p}\vert _{\varSigma }:T_{\varSigma }S^+_n\rightarrow T_{f_{1/p}(\varSigma )}S^+_n\) of \(f_{1/p}\) satisfies the generalized Sylvester equation
$$\begin{aligned} \sum _{j=0}^{p-1}(\varSigma ^{1/p})^{p-1-j}(df_{1/p}\vert _{\varSigma }X)(\varSigma ^{1/p})^{j}=X, \end{aligned}$$
(58)
for every \(X\in T_{\varSigma }S^+_n\). Thus,
$$\begin{aligned} \sum _{j=0}^{p-1}(\varSigma ^{1/p})^{p-1-j-\frac{1}{2}p}(df_{1/p}\vert _{\varSigma }X)(\varSigma ^{1/p})^{j-\frac{1}{2}p}=\varSigma ^{-1/2}X\varSigma ^{-1/2}. \end{aligned}$$
(59)
Taking the trace of (59) yields
$$\begin{aligned} {\text {tr}}\left( \sum _{j=0}^{p-1}(\varSigma ^{1/p})^{\frac{1}{2}p-1-j}(df_{1/p}\vert _{\varSigma }X)(\varSigma ^{1/p})^{j-\frac{1}{2}p}\right)&= {\text {tr}}(\varSigma ^{-1/2}X\varSigma ^{-1/2}) \end{aligned}$$
(60)
$$\begin{aligned} \implies \quad {\text {tr}}\left( \sum _{j=0}^{p-1}\varSigma ^{-1/p}(df_{1/p}\vert _{\varSigma }X)\right)&= {\text {tr}}(\varSigma ^{-1}X). \end{aligned}$$
(61)
That is, \(p{\text {tr}}\left( (f_{1/p}(\varSigma ))^{-1}(df_{1/p}\vert _{\varSigma }X)\right) = {\text {tr}}(\varSigma ^{-1}X)\), for all \(X\in T_{\varSigma }S^+_n\). Now taking the trace of the square of (59), we obtain
$$\begin{aligned} {\text {tr}}\left( \sum _{i,j=0}^{p-1}(\varSigma ^{1/p})^{i-j-1}(df_{1/p}\vert _{\varSigma }X)(\varSigma ^{1/p})^{j-i-1}(df_{1/p}\vert _{\varSigma }X)\right) ={\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X). \end{aligned}$$
(62)
The left-hand side of (62) can be rewritten as
$$\begin{aligned}&\sum _{i,j=0}^{p-1}{\text {tr}}\left[ \left( (\varSigma ^{1/p})^{\frac{j-i-1}{2}}(df_{1/p}\vert _{\varSigma }X)(\varSigma ^{1/p})^{\frac{j-i-1}{2}}\right) ^2 \left( (\varSigma ^{1/p})^{i-j}\right) ^2\right] \end{aligned}$$
(63)
$$\begin{aligned}&\quad \ge \quad \sum _{i,j=0}^{p-1}{\text {tr}}\left[ \left( (\varSigma ^{1/p})^{\frac{j-i-1}{2}}(df_{1/p}\vert _{\varSigma }X)(\varSigma ^{1/p})^{\frac{j-i-1}{2}}(\varSigma ^{1/p})^{i-j}\right) ^2\right] \end{aligned}$$
(64)
$$\begin{aligned}&\quad = \quad \sum _{i,j=0}^{p-1}{\text {tr}}\left[ \varSigma ^{-1/p}(df_{1/p}\vert _{\varSigma }X)\varSigma ^{-1/p}(df_{1/p}\vert _{\varSigma }X)\right] \end{aligned}$$
(65)
$$\begin{aligned}&\quad = \quad p^2{\text {tr}}\left[ (f_{1/p}(\varSigma ))^{-1}(df_{1/p}\vert _{\varSigma }X)(f_{1/p}(\varSigma ))^{-1}(df_{1/p}\vert _{\varSigma }X)\right] , \end{aligned}$$
(66)
where the inequality follows from an application of Lemma 2. Thus,
$$\begin{aligned} {\text {tr}}\left[ \left( (f_{1/p}(\varSigma ))^{-1}(df_{1/p}\vert _{\varSigma }X)\right) ^2\right] \le \frac{1}{p^2} {\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X). \end{aligned}$$
(67)
Combined with (61), this implies that
$$\begin{aligned}&{[}{\text {tr}}\left( (f_{1/p}(\varSigma ))^{-1}(df_{1/p}\vert _{\varSigma }X)\right) ]^2-\mu {\text {tr}}\left[ \left( (f_{1/p}(\varSigma ))^{-1}(df_{1/p}\vert _{\varSigma }X)\right) ^2\right] \nonumber \\&\quad \ge \frac{1}{p^2}\left( [{\text {tr}}(\varSigma ^{-1}X)]^2-\mu {\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X)\right) \ge 0,&\end{aligned}$$
(68)
for all \(X\in \mathcal {K}(\varSigma )\). That is, \((df_{1/p}\vert _{\varSigma })\mathcal {K}(\varSigma )\subseteq \mathcal {K}(f_{1/p}(\varSigma ))\) for any choice of \(\mu \).
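The identities used in this step can be checked with a finite-difference approximation of \(df_{1/p}\vert _{\varSigma }\). The sketch below assumes NumPy; the step size `h`, the matrices S and X, and the helper `d_root` are choices made for this example rather than part of the proof. It verifies the Sylvester equation (58), the trace identity following (61), and the inequalities (67) and (68) for \(p=3\).

```python
import numpy as np

def spd_power(S, r):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**r) @ V.T

def d_root(S, X, p, h=1e-6):
    """Central finite-difference approximation of d f_{1/p}|_S X."""
    return (spd_power(S + h * X, 1.0 / p) - spd_power(S - h * X, 1.0 / p)) / (2 * h)

S = np.array([[2.0, 0.5, 0.0], [0.5, 1.5, 0.3], [0.0, 0.3, 1.0]])
X = np.array([[1.0, 0.2, 0.1], [0.2, 0.5, 0.0], [0.1, 0.0, 0.8]])
p, mu = 3, 1.0

F = spd_power(S, 1.0 / p)            # f_{1/p}(S)
Y = d_root(S, X, p)                  # approximate differential applied to X
Fi, Si = np.linalg.inv(F), np.linalg.inv(S)

# Sylvester equation (58): sum_j F^{p-1-j} Y F^{j} = X
sylvester = sum(np.linalg.matrix_power(F, p - 1 - j) @ Y @ np.linalg.matrix_power(F, j)
                for j in range(p))
residual = np.abs(sylvester - X).max()

trace_lhs = p * np.trace(Fi @ Y)                 # p tr(f^{-1} df X)
trace_rhs = np.trace(Si @ X)                     # tr(S^{-1} X)
quad_lhs = np.trace(Fi @ Y @ Fi @ Y)             # left side of (67)
quad_rhs = np.trace(Si @ X @ Si @ X) / p**2      # right side of (67)
form_image = np.trace(Fi @ Y) ** 2 - mu * quad_lhs
form_source = (trace_rhs**2 - mu * np.trace(Si @ X @ Si @ X)) / p**2
print(residual, trace_lhs - trace_rhs, quad_rhs - quad_lhs, form_image - form_source)
```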
This result can be extended to all rational powers \(q/p\in \mathbb {Q}\cap [0,1]\) by combining two observations. First, since the inverse of the p-th root matrix function \(f_{1/p}\) is the p-th power function \(f_p:\varSigma \mapsto \varSigma ^p\) and \(f_{1/p}\) contracts the invariant cone field \(\mathcal {K}\), \(f_p\) must expand \(\mathcal {K}\). Second, this expansion is greater for larger p. That is, for positive integers \(p_1\le p_2\),
$$\begin{aligned} \left( d\tau _{\varSigma ^{-p_1/2}}\vert _{\varSigma ^{p_1}}\circ df_{p_1}\vert _{\varSigma }\right) \mathcal {K}(\varSigma )\subseteq \left( d\tau _{\varSigma ^{-p_2/2}}\vert _{\varSigma ^{p_2}}\circ df_{p_2}\vert _{\varSigma }\right) \mathcal {K}(\varSigma ). \end{aligned}$$
(69)
Thus, the map \(f_{q/p}=f_q\circ f_{1/p}\) is differentially positive, since the contraction of the cone field by \(f_{1/p}\) will dominate the expansion of the cone field by \(f_q\) for \(p\ge q\). Note that the contractions and expansions referred to here need not be strict for the argument to hold. To prove (69), it is sufficient to show that the map \(f_{p+1}\) expands the cone field at least as much as \(f_{p}\) for any \(p\in \mathbb {N}\). This is done by showing that
$$\begin{aligned} df_p\vert _{\varSigma }X\in \partial \mathcal {K}(\varSigma ^p) \implies df_{p+1}\vert _{\varSigma }X\notin {\text {int}}\mathcal {K}(\varSigma ^{p+1}), \end{aligned}$$
(70)
for any \(\varSigma \in S^+_n\) and \(X\in T_{\varSigma }S^+_n\), where \(\partial \mathcal {K}(\varSigma ^p)\) denotes the boundary of \(\mathcal {K}(\varSigma ^p)\). Note that \(df_p\vert _{\varSigma }X\in \partial \mathcal {K}(\varSigma ^p)\) implies that \(X\in \mathcal {K}(\varSigma )\), since \(f_p\) expands \(\mathcal {K}\). The implication in (70) shows that the expansion of the cone field by \(f_{p+1}\) is at least as great as that of \(f_p\) by linearity of the differential maps. Using \({\text {tr}}(f_{p}(\varSigma )^{-1}df_p\vert _{\varSigma }X)=p{\text {tr}}(\varSigma ^{-1}X)\), we see that \(df_p\vert _{\varSigma }X\in \partial \mathcal {K}(\varSigma ^p)\) is equivalent to
$$\begin{aligned} p^2{\text {tr}}(\varSigma ^{-1}X)^2=\mu \sum _{i,j=0}^{p-1}{\text {tr}}\left( \varSigma ^{-1+i-j}X\varSigma ^{-1+j-i}X\right) . \end{aligned}$$
(71)
Assuming (71), we have
$$\begin{aligned}&\left[ {\text {tr}}\left( (f_{p+1}(\varSigma ))^{-1}(df_{p+1}\vert _{\varSigma }X)\right) \right] ^2-\mu {\text {tr}}\left[ \left( (f_{p+1}(\varSigma ))^{-1}(df_{p+1}\vert _{\varSigma }X)\right) ^2\right] \nonumber \\&\quad = (p+1)^2{\text {tr}}(\varSigma ^{-1}X)^2-\mu \sum _{i,j=0}^{p}{\text {tr}}\left( \varSigma ^{-1+i-j}X\varSigma ^{-1+j-i}X\right) \nonumber \\&\quad = \frac{\mu (p+1)^2}{p^2}\sum _{i,j=0}^{p-1}{\text {tr}}\left( \varSigma ^{-1+i-j}X\varSigma ^{-1+j-i}X\right) \nonumber \\&\qquad - \mu \sum _{i,j=0}^{p}{\text {tr}}\left( \varSigma ^{-1+i-j}X\varSigma ^{-1+j-i}X\right) , \end{aligned}$$
(72)
where the last equation follows from substitution using (71). Using the simplification \(\sum _{i,j=0}^{p-1}{\text {tr}}\left( \varSigma ^{-1+i-j}X\varSigma ^{-1+j-i}X\right) =\sum _{k=0}^{p-1}\alpha _k{\text {tr}}\left( \varSigma ^{-k-1}X\varSigma ^{k-1}X\right) \), where \(\alpha _0=p\) and \(\alpha _k=2(p-k)\) for \(k\ge 1\), (72) reduces to
$$\begin{aligned}&\mu \left[ \left( p\frac{(p+1)^2}{p^2}-(p+1)\right) {\text {tr}}\left( \varSigma ^{-1}X\varSigma ^{-1}X\right) \right. \nonumber \\&\qquad +\frac{(p+1)^2}{p^2}\sum _{k=1}^{p-1}2(p-k){\text {tr}}\left( \varSigma ^{-1-k}X\varSigma ^{-1+k}X\right) \nonumber \\&\qquad -\left. \sum _{k=1}^p 2(p+1-k){\text {tr}}\left( \varSigma ^{-1-k}X\varSigma ^{-1+k}X\right) \right] \end{aligned}$$
(73)
$$\begin{aligned}&\quad = \mu \left[ \frac{p+1}{p}{\text {tr}}\left( \varSigma ^{-1}X\varSigma ^{-1}X\right) \right. \nonumber \\&\qquad + \sum _{k=1}^{p-1}\beta _k{\text {tr}}\left( \varSigma ^{-1-k}X\varSigma ^{-1+k}X\right) \left. -2{\text {tr}}\left( \varSigma ^{-1-p}X\varSigma ^{-1+p}X\right) \right] , \end{aligned}$$
(74)
where
$$\begin{aligned} \beta _k=2\frac{(p+1)^2(p-k)}{p^2}-2(p+1-k). \end{aligned}$$
(75)
We find that \(\beta _k\ge 0\) if and only if \(k\le l := \lfloor p/2\rfloor \), where \(\lfloor \cdot \rfloor \) identifies the integer part of its argument. Thus, through repeated applications of Proposition 5, we see that (74) is less than or equal to
$$\begin{aligned} \mu \left( \frac{p+1}{p}+\sum _{k=1}^{l}\beta _k\right) {\text {tr}}\left( \varSigma ^{-1-l}X\varSigma ^{-1+l}X\right) -\mu \left( 2+\sum _{k=l+1}^{p-1}|\beta _k|\right) {\text {tr}}\left( \varSigma ^{-2-l}X\varSigma ^{l}X\right) \nonumber \\ = \mu \left( 2+\frac{(p-l-1)(l+2pl-p)}{p^2}\right) \left[ {\text {tr}}\left( \varSigma ^{-1-l}X\varSigma ^{-1+l}X\right) -{\text {tr}}\left( \varSigma ^{-2-l}X\varSigma ^{l}X\right) \right] , \end{aligned}$$
(76)
which is nonpositive by a final application of Proposition 5. This completes the proof of (70).
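The arithmetic behind (75) and (76) can be confirmed exactly with rational numbers. The sketch below uses only the Python standard library; the function names `beta` and `check` are illustrative. For each p it checks the sign rule for \(\beta _k\), that the two bracketed coefficients in (76) coincide, and that they match the closed form \(2+(p-l-1)(l+2pl-p)/p^2\).

```python
from fractions import Fraction

def beta(p, k):
    """Coefficient beta_k of (75), computed exactly."""
    return 2 * Fraction((p + 1) ** 2 * (p - k), p ** 2) - 2 * (p + 1 - k)

def check(p):
    """Return three booleans: sign rule, equality of coefficients, closed form."""
    l = p // 2
    sign_rule = all((beta(p, k) >= 0) == (k <= l) for k in range(1, p))
    pos = Fraction(p + 1, p) + sum(beta(p, k) for k in range(1, l + 1))
    neg = 2 + sum(-beta(p, k) for k in range(l + 1, p))
    closed = 2 + Fraction((p - l - 1) * (l + 2 * p * l - p), p ** 2)
    return sign_rule, pos == neg, neg == closed

results = {p: check(p) for p in range(1, 30)}
print(all(all(flags) for flags in results.values()))
```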

Finally, we extend the result to all real exponents \(r\in [0,1]\). Assume for a contradiction that there exists some \(r\in (0,1)\) and \(\varSigma _1,\varSigma _2\in S^+_n\) such that \(\varSigma _1\ge \varSigma _2\) and \(\varSigma _1^r < \varSigma _2^r\). Define \(E=\{x\in (0,1): \varSigma _1^x < \varSigma _2^x\}\) and note that \(E\ne \emptyset \) since \(r\in E\). As E is an open set in \(\mathbb {R}\), there exists some \(s\in \mathbb {Q}\cap E\) so that \(\varSigma _1^s < \varSigma _2^s\), which is a contradiction. Therefore, \(f_{r}\) is monotone for all \(r\in [0,1]\) with respect to any of the affine-invariant orders parametrized by \(\mu \). \(\square \)
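As an illustration of Theorem 6, the following sketch tests the order through the spectral inequalities (40) before and after applying \(f_r\). It assumes NumPy; the matrices, the exponents, the values of \(\mu \), and the helper names are hypothetical choices. The two matrices do not commute, so the check is not a purely scalar statement.

```python
import numpy as np

def spd_power(S, r):
    """Real power of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**r) @ V.T

def leq_quadratic(S1, S2, mu, tol=1e-10):
    """Return True if S1 <=_K S2 according to the spectral inequalities (40)."""
    lam = np.linalg.eigvals(np.linalg.solve(S1, S2)).real  # spectrum of S2 S1^{-1}
    logs = np.log(lam)
    total = logs.sum()
    return bool(total >= -tol and total**2 - mu * np.sum(logs**2) >= -tol)

S1 = np.array([[2.0, 1.0], [1.0, 2.0]])
S2 = np.array([[8.0, 1.0], [1.0, 6.0]])   # does not commute with S1

results = {}
for mu in (0.5, 1.0, 1.5):
    before = leq_quadratic(S1, S2, mu)
    after = all(leq_quadratic(spd_power(S1, r), spd_power(S2, r), mu)
                for r in (0.25, 0.5, 0.75))
    results[mu] = (before, after)
print(results)
```

A single pair is of course no substitute for the proof; the example only shows how the statement can be tested in practice.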

Remark 2

The geometric insight provided by differential positivity clarifies the duality between the monotonicity of the function \(f_r:\varSigma \mapsto \varSigma ^{r}\) for \(0< r < 1\) and its non-monotonicity for \(r>1\), which may seem somewhat mysterious otherwise. Specifically, since the inverse of the function \(f_r\) is given by \(f_{1/r}\), we see that if \(f_r\) contracts affine-invariant cone fields for \(r\in (0,1)\) at every point, then \(f_{1/r}\) must expand the same cone fields. Indeed, if the contraction of \(\mathcal {K}\) by \(f_{r}\) is strict at some \(\varSigma \in S^+_n\), then \(f_{1/r}\) cannot be differentially positive with respect to \(\mathcal {K}\) and so is not monotone with respect to \(\le _{\mathcal {K}}\). See Fig. 3. To show that this is indeed the case for any of the affine-invariant cone fields (19), we note that at any \(\varSigma \in S^+_n\), \(X_{\varSigma }=\varSigma \in T_{\varSigma }S^+_n\) lies in the interior of \(\mathcal {K}(\varSigma )\), since \(({\text {tr}}(\varSigma ^{-1}X_{\varSigma }))^2-\mu {\text {tr}}(\varSigma ^{-1}X_{\varSigma }\varSigma ^{-1}X_{\varSigma })=n^2-\mu n\ > 0\) and \({\text {tr}}(\varSigma ^{-1}X_{\varSigma })={\text {tr}}(I)=n > 0\) for \(\mu \in (0,n)\). Let \(\varSigma ={\text {diag}}(\sigma _1,\sigma _2,\ldots ,\sigma _n)\) be any diagonal matrix in \(S^+_n\) with \(\sigma _1>\sigma _2\). As \(X_{\varSigma }=\varSigma \in {\text {int}}\mathcal {K}(\varSigma )\), there exists some \(\delta >0\) such that
$$\begin{aligned} X=(x_{ij}) = \begin{pmatrix} \sigma _1 &{} \delta \\ \delta &{} \sigma _2 \\ &{} &{} \sigma _3 \\ &{}&{}&{} \ddots \\ &{}&{}&{}&{}\sigma _n \\ \end{pmatrix} \end{aligned}$$
(77)
lies on the boundary \(\partial \mathcal {K}(\varSigma )\). Specifically, we find that
$$\begin{aligned}&\left( {\text {tr}}(\varSigma ^{-1}X)\right) ^2-\mu {\text {tr}}\left( \varSigma ^{-1}X\varSigma ^{-1}X\right) \end{aligned}$$
(78)
$$\begin{aligned}&\quad = \left( \sum _i\frac{x_{ii}}{\sigma _i}\right) ^2 -\mu \left( \sum _i\frac{x^2_{ii}}{\sigma ^2_i}+\frac{2}{\sigma _1\sigma _2}\delta ^2\right) = n^2 - \mu \left( n+\frac{2}{\sigma _1\sigma _2}\delta ^2\right) \end{aligned}$$
(79)
vanishes when
$$\begin{aligned} \delta ^2 = \frac{n(n-\mu )\sigma _1\sigma _2}{2\mu }. \end{aligned}$$
(80)
Now for this choice of X, the inequality (55) of Proposition 5 with \(k=0\) becomes strict as
$$\begin{aligned} {\text {tr}}\left( \varSigma ^{-1}X\varSigma ^{-1}X\right) =n+\frac{2}{\sigma _1\sigma _2}\delta ^2 < n + \left( \frac{1}{\sigma _1^{2}}+\frac{1}{\sigma _2^{2}}\right) \delta ^2 = {\text {tr}}(\varSigma ^{-2}X^2), \end{aligned}$$
(81)
since \((1/\sigma _1-1/\sigma _2)^2 > 0\). As this inequality is used to derive (76), which is used to prove (69), it follows that the contraction of \(\mathcal {K}\) by \(f_r\) is strict at some \(\varSigma \in S^+_n\) for \(r\in (0,1)\). Therefore, \(f_r\) cannot be monotone with respect to \(\le _{\mathcal {K}}\) for \(r>1\).
Fig. 3

Contraction of affine-invariant cone fields by \(f_r:\varSigma \mapsto \varSigma ^r\) for \(0< r < 1\) corresponds to expansion of affine-invariant cone fields by the inverse map \(f_r^{-1}=f_{1/r}:\varSigma \mapsto \varSigma ^{1/r}\)
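The construction in Remark 2 can be reproduced numerically. The sketch assumes NumPy, with \(n=3\), \(\mu =1\) and \(\varSigma ={\text {diag}}(3,2,1)\) chosen here purely for illustration. It forms X as in (77) with \(\delta \) given by (80), then evaluates the quadratic form (79) and the gap in (81), which equals \((1/\sigma _1-1/\sigma _2)^2\delta ^2\).

```python
import numpy as np

n, mu = 3, 1.0
sigma = np.array([3.0, 2.0, 1.0])          # sigma_1 > sigma_2
S = np.diag(sigma)

# delta from (80)
delta = np.sqrt(n * (n - mu) * sigma[0] * sigma[1] / (2 * mu))

# X from (77): diagonal of S plus an off-diagonal perturbation delta
X = S.copy()
X[0, 1] = X[1, 0] = delta

Si = np.linalg.inv(S)
M = Si @ X
boundary_value = np.trace(M) ** 2 - mu * np.trace(M @ M)   # (79), should vanish
gap = np.trace(Si @ Si @ X @ X) - np.trace(M @ M)          # strictness in (81)
expected_gap = (1 / sigma[0] - 1 / sigma[1]) ** 2 * delta**2
print(boundary_value, gap, expected_gap)
```

For these values \(\delta ^2=18\) and the gap is \(1/2\), so the inequality of Proposition 5 with \(k=0\) is indeed strict at this X.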

4.3 Matrix inversion

Consider the matrix inversion map \(f(\varSigma )=\varSigma ^{-1}\). The differential \(df\vert _{\varSigma }:T_{\varSigma }S^+_n\rightarrow T_{\varSigma ^{-1}}S^+_n\) of f is given by
$$\begin{aligned} df\vert _{\varSigma }X=-\varSigma ^{-1}X\varSigma ^{-1}. \end{aligned}$$
(82)
To show this, it is sufficient to consider the geodesic from \(\varSigma \) in the direction \(X\in T_{\varSigma }S^+_n\) given by
$$\begin{aligned} \gamma (t)=\varSigma ^{1/2}\exp (t\varSigma ^{-1/2}X\varSigma ^{-1/2})\varSigma ^{1/2}, \end{aligned}$$
(83)
and note that \((f\circ \gamma )(t)=\varSigma ^{-1/2}\exp (-t\varSigma ^{-1/2}X\varSigma ^{-1/2})\varSigma ^{-1/2}\) so that
$$\begin{aligned} (f\circ \gamma )'(0) = \varSigma ^{-1/2}(-\varSigma ^{-1/2}X\varSigma ^{-1/2})e^{-t\varSigma ^{-1/2}X\varSigma ^{-1/2}}\varSigma ^{-1/2}\big \vert _{t=0}=-\varSigma ^{-1}X\varSigma ^{-1}. \end{aligned}$$
(84)
Thus, \( {\text {tr}}(\varSigma \, (df\vert _{\varSigma }X)) =-{\text {tr}}(\varSigma ^{-1}X)\) and \( {\text {tr}}\left[ (\varSigma \, df\vert _{\varSigma }X)^2\right] = {\text {tr}}(\varSigma ^{-1}X\varSigma ^{-1}X)\). Therefore, noting the conditions in (19), it is clear that \(\varSigma \mapsto \varSigma ^{-1}\) reverses the ordering of positive definite matrices for any of the affine-invariant orders since
$$\begin{aligned} {\text {tr}}((f(\varSigma ))^{-1}(df\vert _{\varSigma }X)) =-{\text {tr}}(\varSigma ^{-1}X). \end{aligned}$$
(85)
That is,
$$\begin{aligned} \varSigma _1 \ge _{\mathcal {K}} \varSigma _2 \quad \implies \quad \varSigma _2^{-1} \ge _{\mathcal {K}} \varSigma _1^{-1}, \end{aligned}$$
(86)
for any of the affine-invariant cone fields \(\mathcal {K}\) in (19).
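
As a numerical sanity check of (82) and (85), the following sketch compares the closed-form differential of the inversion map with a central finite difference and evaluates the two trace expressions that enter the cone conditions. The test matrices, the random seed and the step size `h` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)       # positive definite base point (assumed test data)
B = rng.standard_normal((n, n))
X = (B + B.T) / 2                     # symmetric tangent direction at Sigma

Sigma_inv = np.linalg.inv(Sigma)
df_X = -Sigma_inv @ X @ Sigma_inv     # closed form (82)

# Central finite difference of f(Sigma) = Sigma^{-1} in the direction X.
h = 1e-6
fd = (np.linalg.inv(Sigma + h * X) - np.linalg.inv(Sigma - h * X)) / (2 * h)
err_differential = np.max(np.abs(fd - df_X))

# Trace expressions at the image point; note that f(Sigma)^{-1} = Sigma.
lin = np.trace(Sigma @ df_X)                    # linear term, cf. (85)
quad = np.trace(Sigma @ df_X @ Sigma @ df_X)    # quadratic term
lin_ref = -np.trace(Sigma_inv @ X)
quad_ref = np.trace(Sigma_inv @ X @ Sigma_inv @ X)

print(err_differential, lin - lin_ref, quad - quad_ref)
```

The quadratic term is unchanged while the linear term changes sign, which is the numerical counterpart of the order reversal in (86).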

4.4 Scaling and congruence transformations

Consider the function \(S_{\lambda }:S^+_n\rightarrow S^+_n\) defined by \(S_{\lambda }(\varSigma )=\lambda \varSigma \), where \(\lambda >0\) is a scalar. The differential \(dS_{\lambda }\vert _{\varSigma }:T_{\varSigma }S^+_n\rightarrow T_{\lambda \varSigma }S^+_n\) is given by \(dS_{\lambda }\vert _{\varSigma } X=\lambda X\). Substituting into the formula for the family of quadratic affine-invariant cones (19), we find that
$$\begin{aligned}&\left[ {\text {tr}}\left( S_{\lambda }(\varSigma )^{-1}(dS_{\lambda }\vert _{\varSigma }X)\right) \right] ^2 - \mu {\text {tr}}\left( S_{\lambda }(\varSigma )^{-1}(dS_{\lambda }\vert _{\varSigma }X) S_{\lambda }(\varSigma )^{-1}(dS_{\lambda }\vert _{\varSigma }X)\right) \nonumber \\&\quad = \left[ {\text {tr}}\left( \frac{1}{\lambda }\varSigma ^{-1}\lambda X\right) \right] ^2-\mu {\text {tr}}\left( \frac{1}{\lambda }\varSigma ^{-1}\lambda X\right) ^2 = [{\text {tr}}(\varSigma ^{-1}X)]^2-\mu {\text {tr}}(\varSigma ^{-1}X)^2 \ge 0 \end{aligned}$$
(87)
for any \(X\in \mathcal {K}(\varSigma )\). Thus, \(S_{\lambda }\) is differentially positive and so preserves the affine-invariant orders induced by any of the cone fields (19). This is of course a special case of a more general result about congruence transformations \(\tau _{A}(\varSigma )=A\varSigma A^T\), where \(A\in GL(n)\). Congruence transformations can be thought of as generalizations of scaling transformations on \(S^+_n\). The preservation of affine-invariant orders by congruence transformations follows by construction. If \(\varSigma _1 \le _{\mathcal {K}} \varSigma _2\) for some partial order induced by an affine-invariant cone field \(\mathcal {K}\), then there exists a conal curve \(\gamma \) from \(\varSigma _1\) to \(\varSigma _2\). It follows from the definition of affine-invariant cone fields that congruence transformations map conal curves to conal curves in \(S^+_n\). That is, \(\tau _A(\gamma (t))\) is a conal curve joining \(\tau _A(\varSigma _1)\) to \(\tau _A(\varSigma _2)\).
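
The invariance used in this argument can be checked numerically. The sketch below evaluates the trace \({\text {tr}}(\varSigma ^{-1}X)\) and the quadratic form from (87) before and after a scaling and a congruence transformation. The helper name `cone_form`, the value of `mu` and the matrices are hypothetical choices made only for illustration.

```python
import numpy as np

def cone_form(Sigma, X, mu):
    """Return tr(S^{-1}X) and [tr(S^{-1}X)]^2 - mu * tr((S^{-1}X)^2)."""
    M = np.linalg.solve(Sigma, X)     # Sigma^{-1} X without forming the inverse
    t = np.trace(M)
    return t, t**2 - mu * np.trace(M @ M)

rng = np.random.default_rng(1)
n, mu = 3, 1.5                        # mu is an assumed cone parameter
G = rng.standard_normal((n, n))
Sigma = G @ G.T + n * np.eye(n)       # positive definite base point
B = rng.standard_normal((n, n))
X = (B + B.T) / 2                     # symmetric tangent vector

# An explicit invertible matrix (its determinant is 5).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])

before = cone_form(Sigma, X, mu)
# tau_A(Sigma) = A Sigma A^T has differential X -> A X A^T.
after_congruence = cone_form(A @ Sigma @ A.T, A @ X @ A.T, mu)
# S_lambda(Sigma) = lambda Sigma has differential X -> lambda X.
after_scaling = cone_form(2.5 * Sigma, 2.5 * X, mu)

print(before, after_congruence, after_scaling)
```

Both quantities agree up to rounding, since \((A\varSigma A^T)^{-1}AXA^T\) is similar to \(\varSigma ^{-1}X\).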

4.5 Translations

It is important to note that translations do not generally preserve an affine-invariant order unless the associated affine-invariant cone field happens to also be translation invariant.

Proposition 6

Let \(\le _{\mathcal {K}}\) denote the partial order induced by an affine-invariant cone field \(\mathcal {K}\) on \(S^+_n\). If \(\mathcal {K}\) is not translation invariant, then there exists a translation \(T_C:S^+_n\rightarrow S^+_n\), \(T_C(\varSigma )=\varSigma +C\) that does not preserve \(\le _{\mathcal {K}}\).

Proof

If \(\mathcal {K}\) is not translation invariant, then there exist \(\varSigma _1,\varSigma _2\in S^+_n\) such that \(dT_{(\varSigma _2-\varSigma _1)}\vert _{\varSigma _1}\mathcal {K}(\varSigma _1)\ne \mathcal {K}(\varSigma _2)\), where \(T_{(\varSigma _2-\varSigma _1)}(\varSigma )=\varSigma +(\varSigma _2-\varSigma _1)\). Thus there exists some \(\delta \varSigma \) in the cone at either \(\varSigma _1\) or \(\varSigma _2\) that cannot be identified with an element of the cone at the other point under translation. Without loss of generality, assume that \(\delta \varSigma \in \mathcal {K}(\varSigma _1)\) and \(dT_{(\varSigma _2-\varSigma _1)}\big \vert _{\varSigma _1}(\delta \varSigma )\notin \mathcal {K}(\varSigma _2)\). For an affine-invariant cone field \(\mathcal {K}\), we have
$$\begin{aligned} \mathcal {K}(\lambda \varSigma )=d\tau _{\lambda ^{1/2}I}\big \vert _{\varSigma }\mathcal {K}(\varSigma )=dS_{\lambda }\big \vert _{\varSigma }\mathcal {K}(\varSigma )=\lambda \mathcal {K}(\varSigma )=\mathcal {K}(\varSigma ) \end{aligned}$$
(88)
for any \(\lambda >0\) and \(\varSigma \in S^+_n\). That is, the cone field is translationally invariant along each ray \(\gamma (t)=t\varSigma \), \(t>0\). Thus, we can identify \(\mathcal {K}(\varSigma _2)\) through translation with any cone \(\mathcal {K}(\lambda \varSigma _2)\) where \(\lambda >0\). It follows that \(dT_{(\lambda \varSigma _2-\varSigma _1)}\big \vert _{\varSigma _1}(\delta \varSigma ) \notin \mathcal {K}(\lambda \varSigma _2)\) for any \(\lambda >0\). For sufficiently large \(\lambda >0\), \(C:=\lambda \varSigma _2-\varSigma _1\) is a positive definite matrix. Therefore, \(T_C:S^+_n\rightarrow S^+_n\) is not differentially positive with respect to \(\mathcal {K}\) and hence is not monotone with respect to \(\le _{\mathcal {K}}\). \(\square \)
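
A concrete instance of Proposition 6 can be produced by hand for \(n=2\). The sketch below assumes the quadratic cone with parameter \(\mu =1.5\) (an illustrative value, assumed to be admissible) and exhibits a tangent vector that lies in the cone at \(\varSigma _1=I\) but leaves the cone after translation by a positive definite matrix \(C\). Since the differential of \(T_C\) is the identity, the same matrix \(X\) is tested at both points.

```python
import numpy as np

def in_cone(Sigma, X, mu):
    """Membership test: tr(S^{-1}X) >= 0 and the quadratic form is nonnegative."""
    M = np.linalg.solve(Sigma, X)
    t = np.trace(M)
    return bool(t >= 0 and t**2 - mu * np.trace(M @ M) >= 0)

mu = 1.5                              # assumed cone parameter for n = 2
Sigma1 = np.eye(2)
C = np.diag([1.0, 5.0])               # positive definite translation
Sigma2 = Sigma1 + C                   # T_C(Sigma1) = diag(2, 6)
X = np.diag([1.0, 0.3])               # tangent vector; dT_C leaves it unchanged

at_Sigma1 = in_cone(Sigma1, X, mu)    # quadratic form: 1.69 - 1.5 * 1.09 = 0.055
at_Sigma2 = in_cone(Sigma2, X, mu)    # quadratic form: 0.3025 - 1.5 * 0.2525 < 0

print(at_Sigma1, at_Sigma2)
```

Here \(X\in \mathcal {K}(\varSigma _1)\) but \(dT_C\vert _{\varSigma _1}X\notin \mathcal {K}(\varSigma _1+C)\), so this particular \(T_C\) fails to be differentially positive for the chosen cone field.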

5 Invariant half-spaces

5.1 An affine-invariant half-space preorder

The \({\text {Ad}}_{O(n)}\)-invariant condition \({\text {tr}}(X)\ge 0\) on \(T_{I}S^+_n\) in (11) picks out a pointed cone from the double cone defined by the non-negativity of the quadratic form \(({\text {tr}}(X))^2-\mu {\text {tr}}(X^2)\). Indeed, \({\text {tr}}(X)\ge 0\) defines a half-space in \(T_{I}S^+_n\) bounded by the hyperplane \({\text {tr}}(X)=0\) in \(T_{I}S^+_n\). The affine-invariant extension of this hyperplane to all of \(S^+_n\) yields a distribution of rank \(\dim S^+_n-1 = n(n+1)/2-1\) on \(S^+_n\) given by \({\text {tr}}(\varSigma ^{-1/2}X\varSigma ^{-1/2})={\text {tr}}(\varSigma ^{-1}X)=0\) for \(X\in T_{\varSigma }S^+_n\). The corresponding affine-invariant half-space field \(\mathcal {H}_{\varSigma }\) on the tangent bundle \(TS^+_n\) simply takes the form
$$\begin{aligned} \mathcal {H}_{\varSigma }=\{X\in T_{\varSigma }S^+_n:{\text {tr}}(\varSigma ^{-1}X)\ge 0\}. \end{aligned}$$
(89)
A half-space field of this form induces a partial preorder \(\preceq _{\mathcal {H}}\) on \(S^+_n\). That is, a binary relation that is reflexive and transitive. The antisymmetry condition required for a preorder to be a partial order does not hold since \(\mathcal {H}_{\varSigma }\) is not a pointed cone. Nonetheless, one can ask whether any two given matrices \(\varSigma _1,\varSigma _2\in S^+_n\) satisfy \(\varSigma _1\preceq _{\mathcal {H}}\varSigma _2\), or if a given function on \(S^+_n\) is monotone with respect to the preorder induced by (89). The monotonicity of a function with respect to a preorder still gives geometric insight into the effects of the function on the space on which it acts and the discrete-time dynamics defined by its iterations.

To illustrate this, we return to a puzzling aspect of the monotonicity of the function \(f_r(x)=x^r\) on the real line for \(r>0\) and its analogue for positive definite matrices: the map \(f_r\) is monotone on \(S^+_n\) with respect to an affine-invariant partial order if \(r\in [0,1]\), but is not monotone on \(S^+_n\) for \(r>1\). We will show that the monotonicity on the real line for \(r>0\) is inherited in the matrix function setting in the form of a one-dimensional monotonicity expressed as the preservation of the affine-invariant half-space preorder for any \(r>0\).

Proposition 7

The function \(f_r:\varSigma \mapsto \varSigma ^r\) is monotone on \(S^+_n\) with respect to the affine-invariant half-space preorder \(\preceq _{\mathcal {H}}\) for any \(r>0\).

Proof

Let \(p,q\in \mathbb {N}\) be positive integers. The map \(f_{q/p}:\varSigma \mapsto \varSigma ^{q/p}\) can be written as the composition \(f_{1/p}\circ f_q\) with differential
$$\begin{aligned} df_{q/p}\vert _{\varSigma }=df_{1/p}\vert _{f_q(\varSigma )}\circ df_{q}\vert _{\varSigma }. \end{aligned}$$
(90)
Now since \(df_q\vert _{\varSigma }\) is given by
$$\begin{aligned} df_q\vert _{\varSigma }X=\sum _{j=0}^{q-1}\varSigma ^{q-1-j}X\varSigma ^j, \quad (X\in T_{\varSigma }S^+_n) \end{aligned}$$
(91)
and \(df_{1/p}\vert _{\varSigma }\) is the unique solution of the generalized Sylvester equation (58), the differential \(df_{q/p}\vert _{\varSigma }\) in (90) must satisfy
$$\begin{aligned} \sum _{i=0}^{p-1}(\varSigma ^{q/p})^{p-1-i}(df_{q/p}\vert _{\varSigma }X)(\varSigma ^{q/p})^i = \sum _{j=0}^{q-1}\varSigma ^{q-1-j}X\varSigma ^j. \end{aligned}$$
(92)
Multiplying both sides of this equation by \(\varSigma ^{-q}\) and taking the trace of the resulting equation yields
$$\begin{aligned} {\text {tr}}\left( \sum _{i=0}^{p-1}(\varSigma ^{q/p})^{-1-i}(df_{q/p}\vert _{\varSigma }X)(\varSigma ^{q/p})^i\right)&= {\text {tr}}\left( \sum _{j=0}^{q-1}\varSigma ^{-1-j}X\varSigma ^{j}\right) \end{aligned}$$
(93)
$$\begin{aligned} \implies \quad {\text {tr}}\left( \sum _{i=0}^{p-1}\varSigma ^{-q/p}(df_{q/p}\vert _{\varSigma }X)\right)&={\text {tr}}\left( \sum _{j=0}^{q-1}\varSigma ^{-1}X\right) \end{aligned}$$
(94)
$$\begin{aligned} \implies \quad p{\text {tr}}\left( \varSigma ^{-q/p}(df_{q/p}\vert _{\varSigma }X)\right)&= q {\text {tr}}(\varSigma ^{-1}X). \end{aligned}$$
(95)
That is, \({\text {tr}}\left( (f_{q/p}(\varSigma ))^{-1}df_{q/p}\vert _{\varSigma }X\right) =\frac{q}{p}{\text {tr}}(\varSigma ^{-1}X)\) for all \(X\in T_{\varSigma }S^+_n\). A standard argument based on the density of positive rational numbers in the positive real line \(\mathbb {R}_+\) gives
$$\begin{aligned} {\text {tr}}\left( (f_{r}(\varSigma ))^{-1}df_{r}\vert _{\varSigma }X\right) =r{\text {tr}}(\varSigma ^{-1}X) \end{aligned}$$
(96)
for any real \(r>0\). Therefore, we clearly have the implication
$$\begin{aligned} X\in \mathcal {H}_{\varSigma } \implies df_{r}\vert _{\varSigma }X\in \mathcal {H}_{f_r(\varSigma )} \end{aligned}$$
(97)
for all \(X\in T_{\varSigma }S^+_n\), which is precisely the local characterization of the monotonicity of \(f_r\) with respect to the preorder induced by \(\mathcal {H}_{\varSigma }\). \(\square \)
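
The identity (96) lends itself to a quick numerical check. In the sketch below, the differential \(df_r\vert _{\varSigma }X\) is approximated by a central finite difference rather than by solving the Sylvester equation, and the matrix power is computed from an eigendecomposition. The helper `spd_power`, the exponents tested and the step size are illustrative assumptions.

```python
import numpy as np

def spd_power(Sigma, r):
    """Sigma^r for symmetric positive definite Sigma via eigendecomposition."""
    w, V = np.linalg.eigh(Sigma)
    return (V * w**r) @ V.T           # scales column j of V by w_j^r

rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
Sigma = G @ G.T + n * np.eye(n)       # positive definite base point
B = rng.standard_normal((n, n))
X = (B + B.T) / 2                     # symmetric tangent direction

h = 1e-6
results = {}
for r in (0.5, 1.0, 2.0, 3.7):
    # Finite-difference approximation of df_r|_Sigma X.
    d = (spd_power(Sigma + h * X, r) - spd_power(Sigma - h * X, r)) / (2 * h)
    lhs = np.trace(spd_power(Sigma, -r) @ d)          # tr(f_r(Sigma)^{-1} df_r X)
    rhs = r * np.trace(np.linalg.solve(Sigma, X))     # r tr(Sigma^{-1} X)
    results[r] = (lhs, rhs)

for r, (lhs, rhs) in results.items():
    print(r, lhs, rhs)
```

The two columns agree for exponents on both sides of \(r=1\), consistent with the claim that the half-space preorder is preserved for every \(r>0\).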

This result further highlights the natural connection between affine-invariance of causal structures on \(S^+_n\) and monotonicity of the matrix power functions \(f_r(\varSigma )=\varSigma ^r\). In particular, \(f_r\) is generally not monotone with respect to a preorder induced by a half-space field that is translation-invariant.

It should be noted that although the above proof has the virtue of being self-contained, Proposition 7 can also be proven using results from Sect. 3.3. Specifically, it should be clear from the material from that section that \(\varSigma _1\preceq _{\mathcal {H}}\varSigma _2\) if and only if \(\det \varSigma _1\le \det \varSigma _2\), whence \(f_r:\varSigma \mapsto \varSigma ^r\) preserves \(\preceq _{\mathcal {H}}\) precisely when
$$\begin{aligned} \det \varSigma _1\le \det \varSigma _2 \implies \det f_r(\varSigma _1)\le \det f_r(\varSigma _2). \end{aligned}$$
(98)
Since \(\det f_r(\varSigma )=\det \varSigma ^r=(\det \varSigma )^r\) and \(x\mapsto x^r\) is increasing on the positive reals, this is clearly the case for any \(r>0\).
It is instructive to return to the \(n=2\) case to obtain a visualization of the rank 2 distribution \(\mathcal {D}_{\varSigma }=\partial \mathcal {H}\) that defines the affine-invariant preorder induced by \(\mathcal {H}_{\varSigma }\). As noted in Sect. 3.5, the set \(S^+_2\) can be identified with the interior of the quadratic cone K in \(\mathbb {R}^3\) given by \(z^2-x^2-y^2\ge 0\), \(z\ge 0\) via a bijection \(\phi :\varSigma \mapsto (x,y,z)\). At \(\varSigma = \phi ^{-1}(x,y,z)\in S^+_2\), the inequality \({\text {tr}}(\varSigma ^{-1}X)\ge 0\) takes the form \(z\delta z - x \delta x - y \delta y \ge 0\), where \((\delta x, \delta y, \delta z)\in T_{(x,y,z)}K\) as shown in (45). The distribution \(\partial \mathcal {H}\), which consists of the hyperplanes that form the boundary of the half-space field \(\mathcal {H}_{\varSigma }\), is given by \(z\delta z - x \delta x - y \delta y = 0\). This distribution is clearly integrable with integral submanifolds of the form \( z^2-x^2 - y^2 = C\), where \(C> 0\) is a constant for each of the integral submanifolds, which form hyperboloids of revolution as shown in Fig. 4. As expected, these surfaces coincide with the submanifolds of constant determinant predicted in Sect. 3.3.
Fig. 4

a An illustration of the affine-invariant hyperplanes \(\partial \mathcal {H}\) corresponding to \({\text {tr}}(\varSigma ^{-1}X) = 0\) against the backdrop of the cone \(K=\{(x,y,z)\in \mathbb {R}^3:z^2-x^2-y^2\ge 0, z\ge 0\}\) identified with \(S^+_2\). b The distributions integrate to give a family of hyperboloids of revolution parametrized by \(C > 0\). The limiting case \(C=0\) yields the boundary of the cone K
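
The \(n=2\) picture can be reproduced numerically under an explicit choice of coordinates. The sketch below assumes the parametrization \(\varSigma =\begin{pmatrix} z+x &amp; y\\ y &amp; z-x\end{pmatrix}\), which is a hypothetical stand-in for the bijection \(\phi \) of Sect. 3.5 (the paper's map may differ by constant factors). With this choice \(\det \varSigma = z^2-x^2-y^2\), and \({\text {tr}}(\varSigma ^{-1}X)\) is a positive multiple of \(z\delta z-x\delta x-y\delta y\).

```python
import numpy as np

def phi_inv(x, y, z):
    # Assumed parametrization, chosen so that det(Sigma) = z^2 - x^2 - y^2.
    return np.array([[z + x, y],
                     [y, z - x]])

p = np.array([0.3, -0.4, 1.2])        # a point with z > sqrt(x^2 + y^2)
dp = np.array([0.7, 0.1, -0.2])       # a tangent vector (dx, dy, dz)

Sigma = phi_inv(*p)
X = phi_inv(*dp)                      # phi_inv is linear, so it also maps tangent vectors

x, y, z = p
dx, dy, dz = dp
trace_form = np.trace(np.linalg.solve(Sigma, X))
lorentz_form = 2 * (z * dz - x * dx - y * dy) / (z**2 - x**2 - y**2)
det_Sigma = np.linalg.det(Sigma)

print(trace_form, lorentz_form, det_Sigma)
```

Because the two forms differ only by the positive factor \(2/\det \varSigma \), they define the same half-spaces and the same boundary hyperplanes, whose integral surfaces are the level sets of the determinant.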

5.2 The Toda and QR flows

The Toda flow is a well-known Hamiltonian dynamical system on the space of real symmetric matrices of fixed dimension n, which can be expressed in the Lax pair form
$$\begin{aligned} \dot{X}(t)=[X,\pi _s(X)] = X\pi _s(X)-\pi _s(X)X, \end{aligned}$$
(99)
where \(\pi _s(X)\) is the skew-symmetric matrix with entries \(\pi _s(X)_{ij}= X_{ij}\) if \(i>j\), \(\pi _s(X)_{ij}=0\) if \(i=j\), and \(\pi _s(X)_{ij}=-X_{ji}\) if \(i<j\). The QR-flow is a related dynamical system on \(S^+_n\) that has close connections to the QR algorithm and is given by
$$\begin{aligned} \dot{\varSigma }(t)=[\varSigma ,\pi _s(\log \varSigma )]. \end{aligned}$$
(100)
The Lax pair formulations of the Toda and QR-flows show that these flows are isospectral. That is, the eigenvalues of X(t) and \(\varSigma (t)\) are independent of t. Isospectral flows clearly preserve all translation invariant orders that possess spectral characterizations.
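
The isospectral property can be observed in a simple simulation. The sketch below integrates the Toda flow (99) with a classical fourth-order Runge-Kutta scheme, which is our choice of integrator and not one prescribed by the paper, and compares the spectrum at the start and end of the run. The initial condition, step size and horizon are illustrative assumptions.

```python
import numpy as np

def pi_s(X):
    """Skew-symmetric part: strictly lower triangle of X minus its transpose."""
    L = np.tril(X, k=-1)
    return L - L.T

def toda_rhs(X):
    P = pi_s(X)
    return X @ P - P @ X              # the commutator [X, pi_s(X)]

def rk4_step(X, dt):
    k1 = toda_rhs(X)
    k2 = toda_rhs(X + 0.5 * dt * k1)
    k3 = toda_rhs(X + 0.5 * dt * k2)
    k4 = toda_rhs(X + dt * k3)
    return X + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
X0 = (B + B.T) / 2                    # symmetric initial condition (assumed)

X = X0.copy()
dt, steps = 1e-3, 2000                # integrate up to t = 2
for _ in range(steps):
    X = rk4_step(X, dt)

eig_start = np.linalg.eigvalsh(X0)
eig_end = np.linalg.eigvalsh(X)
spectral_drift = np.max(np.abs(eig_end - eig_start))
symmetry_defect = np.max(np.abs(X - X.T))

print(spectral_drift, symmetry_defect)
```

A generic Runge-Kutta scheme does not preserve the spectrum exactly, so the drift reported here reflects discretization error only and shrinks with the step size.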

In [15], the following theorem is established for the projected Toda and QR flows. The projected flows refer to projections of the flows to the \(r\times r\) upper left corner principal submatrices of X(t) and \(\varSigma (t)\), i.e., the flows of \(X_r(t)=E_r^TX(t)E_r\) and \(\varSigma _r(t)=E_r^T\varSigma (t)E_r\), where \(E_r^T= [I_r \; 0]\).

Theorem 7

For \(1\le r \le n\) and any symmetric matrix X(0) and symmetric positive definite matrix \(\varSigma (0)\), the ordered eigenvalues of the projected Toda flow orbit \(X_r(t)=E_r^TX(t)E_r\) and the projected QR flow orbit \(\varSigma _r(t)=E_r^T\varSigma (t)E_r\) are nondecreasing functions of t.

Corollary 1

Let f(x) be any nondecreasing real-valued function and \(\alpha >0\). Then \(F(t)={\text {tr}}(f(E_r^TX(t)E_r))\) and \(G(t)={\text {tr}}(f(E_r^T\varSigma (t)^{\alpha }E_r))\) are nondecreasing functions of t for \(t\in \mathbb {R}\).

The geometric interpretation of the above corollary is that the generalized projected Toda and QR flows, \(f(X_r(t))\) and \(f(\varSigma _r(t))\), respectively, preserve the half-space preorder induced by the translation invariant half-space \({\text {tr}}(X)\ge 0\). This is clear by noting that if \(X(0),\hat{X}(0)\) are symmetric matrices such that \({\text {tr}}(X(0)-\hat{X}(0))\ge 0\), then
$$\begin{aligned} {\text {tr}}(f(X_r(t))-f(\hat{X}_r(t))) \ge {\text {tr}}(X(0)-\hat{X}(0)) \ge 0, \quad \forall t>0, \end{aligned}$$
(101)
and similarly for the generalized projected QR flow.
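
Theorem 7 and Corollary 1 can be illustrated for the Toda flow with a short simulation. The sketch below tracks the ordered eigenvalues of the leading \(r\times r\) block along a numerically integrated orbit and the quantity \(F(t)\) for the nondecreasing function \(f=\tanh \). The Runge-Kutta integrator, the choice of \(f\), the dimensions and the tolerance used to absorb rounding error are all assumptions made for illustration.

```python
import numpy as np

def pi_s(X):
    L = np.tril(X, k=-1)
    return L - L.T

def toda_rhs(X):
    P = pi_s(X)
    return X @ P - P @ X

def rk4_step(X, dt):
    k1 = toda_rhs(X)
    k2 = toda_rhs(X + 0.5 * dt * k1)
    k3 = toda_rhs(X + 0.5 * dt * k2)
    k4 = toda_rhs(X + dt * k3)
    return X + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(5)
n, r = 5, 2
B = rng.standard_normal((n, n))
X = (B + B.T) / 2                     # symmetric initial condition (assumed)

dt, steps = 1e-3, 3000
eigs, F = [], []
for _ in range(steps + 1):
    lam = np.linalg.eigvalsh(X[:r, :r])   # ordered eigenvalues of E_r^T X E_r
    eigs.append(lam)
    F.append(np.sum(np.tanh(lam)))        # tr f(X_r) with f = tanh
    X = rk4_step(X, dt)

eigs = np.array(eigs)
F = np.array(F)
min_eig_increment = np.diff(eigs, axis=0).min()   # most negative step, if any
min_F_increment = np.diff(F).min()

print(min_eig_increment, min_F_increment)
```

Up to integration and rounding error, no step decreases any of the ordered eigenvalues of the projected orbit, and \(F\) inherits the same behaviour.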

6 Matrix means

Notions of means and averaging operations on matrices are of great interest in matrix analysis and operator theory with numerous applications to fields such as radar data processing, medical imaging, statistics and machine learning. Adapting basic properties of means on the positive real line to the setting of positive definite matrices, we may define a matrix mean to be a continuous map \(M:S^+_n\times S^+_n\rightarrow S^+_n\) that satisfies the following properties
  1. \(M(\varSigma _1,\varSigma _2)=M(\varSigma _2,\varSigma _1)\)
  2. \(\varSigma _1 \le \varSigma _2 \implies \varSigma _1 \le M(\varSigma _1,\varSigma _2) \le \varSigma _2\)
  3. \(M(A^T\varSigma _1 A, A^T\varSigma _2 A) = A^T M(\varSigma _1,\varSigma _2)A\), for all \(A\in GL(n)\).
  4. \(M(\varSigma _1,\varSigma _2)\) is monotone in \(\varSigma _1\) and \(\varSigma _2\).

In the existing literature on matrix means, the partial order \(\le \) in the above definition refers to the Löwner order \(\le _{L}\). It is a nontrivial question whether a given map \(M:S^+_n\times S^+_n\rightarrow S^+_n\) defines a matrix mean with respect to any of the new partial orders considered in this paper. A particularly important matrix mean that has been the subject of considerable interest in recent years is the geometric mean \(M(\varSigma _1,\varSigma _2)=\varSigma _1\#\varSigma _2\) defined by
$$\begin{aligned} \varSigma _1\#\varSigma _2 = \varSigma _1^{1/2}\left( \varSigma _1^{-1/2}\varSigma _2\varSigma _1^{-1/2}\right) ^{1/2}\varSigma _1^{1/2}. \end{aligned}$$
(102)
The following theorem shows that the geometric mean and affine-invariant orders on \(S^+_n\) are intimately connected.

Theorem 8

The geometric mean \(\#\) (102) defines a matrix mean for any affine-invariant order \(\le \) on \(S^+_n\).

Proof

The geometric mean \(\varSigma _1\#\varSigma _2\) of two points \(\varSigma _1,\varSigma _2\in S^+_n\) is the midpoint of the geodesic joining \(\varSigma _1\) and \(\varSigma _2\) in \(S^+_n\) endowed with the standard Riemannian metric \(ds^2={\text {tr}}[(\varSigma ^{-1}d\varSigma )^2]\) [5]. This geometric interpretation immediately implies \(\varSigma _1\#\varSigma _2=\varSigma _2\#\varSigma _1\). Furthermore, given any affine-invariant order \(\le _{\mathcal {K}}\) induced by an affine-invariant cone field \(\mathcal {K}\) and a pair of matrices satisfying \(\varSigma _1\le _{\mathcal {K}}\varSigma _2\), the geodesic \(\gamma :[0,1]\rightarrow S^+_n\) from \(\varSigma _1\) to \(\varSigma _2\) is a conal curve by Theorem 4. Hence, the midpoint \(\varSigma _1\#\varSigma _2\) of \(\gamma \) clearly satisfies \(\varSigma _1 \le _{\mathcal {K}} \varSigma _1\#\varSigma _2 \le _{\mathcal {K}} \varSigma _2\). Since congruence transformations are isometries, for any \(A\in GL(n)\) the geodesic connecting \(A^T\varSigma _1 A\) to \(A^T\varSigma _2 A\) is given by \(\tilde{\gamma }(t)=A^T\gamma (t) A\). Thus, \((A^T\varSigma _1A)\#(A^T\varSigma _2A)=A^T( \varSigma _1\#\varSigma _2)A\). Finally, for fixed \(\varSigma _1\in S^+_n\), the function \(F(\varSigma )=\varSigma _1\#\varSigma \) is monotone with respect to any affine-invariant order since congruence transformations preserve affine-invariant orders and the function \(\varSigma \mapsto \varSigma ^{1/2}\) is monotone for any affine-invariant order. By symmetry, \(\#\) is also monotone with respect to its first argument. That is, the four conditions that define a matrix mean are all satisfied by the geometric mean for any choice of affine-invariant order. \(\square \)
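
The properties used in this proof can be spot-checked numerically. The sketch below implements (102) with eigendecomposition-based matrix powers and tests symmetry, congruence covariance and the betweenness property, using the Löwner order as one representative order for which membership is easy to test through eigenvalues. The helper names, the random matrices and the matrix `A` are hypothetical choices for illustration.

```python
import numpy as np

def spd_power(S, r):
    """S^r for symmetric positive definite S via eigendecomposition."""
    w, V = np.linalg.eigh((S + S.T) / 2)   # symmetrize to remove rounding asymmetry
    return (V * w**r) @ V.T

def geometric_mean(S1, S2):
    """Geometric mean S1 # S2 as defined in (102)."""
    R = spd_power(S1, 0.5)
    Rinv = spd_power(S1, -0.5)
    return R @ spd_power(Rinv @ S2 @ Rinv, 0.5) @ R

def random_spd(rng, n):
    G = rng.standard_normal((n, n))
    return G @ G.T + n * np.eye(n)

def min_eig(S):
    return np.linalg.eigvalsh((S + S.T) / 2).min()

rng = np.random.default_rng(4)
n = 3
S1 = random_spd(rng, n)
S2 = S1 + random_spd(rng, n)          # S2 - S1 is positive definite
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])       # invertible (determinant 5)

M = geometric_mean(S1, S2)
symmetry_gap = np.max(np.abs(M - geometric_mean(S2, S1)))
congruence_gap = np.max(np.abs(
    geometric_mean(A.T @ S1 @ A, A.T @ S2 @ A) - A.T @ M @ A))
lower_gap = min_eig(M - S1)           # nonnegative iff S1 <= M in the Loewner order
upper_gap = min_eig(S2 - M)           # nonnegative iff M <= S2 in the Loewner order
riccati_gap = np.max(np.abs(M @ np.linalg.solve(S1, M) - S2))

print(symmetry_gap, congruence_gap, lower_gap, upper_gap, riccati_gap)
```

The last quantity checks the standard characterization of \(\varSigma _1\#\varSigma _2\) as the positive definite solution \(G\) of \(G\varSigma _1^{-1}G=\varSigma _2\).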

7 Conclusion

The choice of partial order is a key part of studying monotonicity of functions that is often taken for granted. Invariant cone fields provide a geometric approach to systematically construct ‘natural’ orders by connecting the geometry of the state space to the search for orders. Coupled with differential positivity, invariant cone fields provide an insightful and powerful method for studying monotonicity, as shown in the case of \(S^+_n\). Future work can focus on exploring the applications of the new partial orders presented in this paper to the study of dynamical systems and convergence analysis of algorithms defined on matrices. It may also be fruitful to explore the implications of this work in convexity theory. New notions of partial orders mean new notions of convexity. In this context it may be natural to consider the concept of geodesic convexity on \(S^+_n\) with respect to the Riemannian structure on \(S^+_n\), as well as the usual notion of convexity on sets of matrices that is based on translational geometry.

Notes

Acknowledgements

We should like to thank the anonymous referees whose reviews resulted in significant improvements to the quality of this paper. We are particularly grateful to the reviewer who suggested the elegant characterization of affine-invariant pseudo-Riemannian structures presented in Sect. 3.2 and important clarifications in Sect. 3.4.

References

  1. Anderson, W.N., Trapp, G.E.: A class of monotone operator functions related to electrical network theory. Linear Algebra Appl. 15(1), 53–67 (1976)
  2. Ando, T.: Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra Appl. 26, 203–241 (1979)
  3. Banyaga, A., Hurtubise, D.: Lectures on Morse Homology, vol. 29. Springer Science & Business Media, Berlin (2013)
  4. Barbaresco, F.: Innovative tools for radar signal processing based on Cartan’s geometry of SPD matrices & information geometry. In: Proceedings of the IEEE International Radar Conference, IEEE, Rome, Italy (2008)
  5. Bhatia, R.: Positive Definite Matrices. Princeton University Press, Princeton (2007)
  6. Bhatia, R.: Matrix Analysis, vol. 169. Springer Science & Business Media, Berlin (2013)
  7. Bourin, J.C.: Some inequalities for norms on matrices and operators. Linear Algebra Appl. 292(1), 139–154 (1999)
  8. Burbea, J., Rao, C.: Entropy differential metric, distance and divergence measures in probability spaces: a unified approach. J. Multivariate Anal. 12(4), 575–596 (1982)
  9. Forni, F., Sepulchre, R.: Differentially positive systems. IEEE Trans. Autom. Control 61(2), 346–359 (2016)
  10. Fujii, J.I.: A trace inequality arising from quantum information theory. Linear Algebra Appl. 400, 141–146 (2005)
  11. Heinz, E.: Beiträge zur Störungstheorie der Spektralzerlegung. Mathematische Annalen 123, 415–438 (1951)
  12. Hilgert, J., Ólafsson, G.: Causal Symmetric Spaces: Geometry and Harmonic Analysis, vol. 18. Elsevier, Amsterdam (1997)
  13. Hilgert, J., Hofmann, K.H., Lawson, J.: Lie Groups, Convex Cones, and Semi-groups. Oxford University Press, Oxford (1989)
  14. Kubo, F., Ando, T.: Means of positive linear operators. Mathematische Annalen 246, 205–224 (1979)
  15. Lagarias, J.C.: Monotonicity properties of the Toda flow, the QR-flow, and subspace iteration. SIAM J. Matrix Anal. Appl. 12(3), 449–462 (1991)
  16. Lawson, J.D.: Polar and Ol’shanski decompositions. Seminar Sophus Lie 1, 163–173 (1991)
  17. Löwner, K.: Über monotone Matrixfunktionen. Mathematische Zeitschrift 38, 177–216 (1934)
  18. Neeb, K.H.: Conal orders on homogeneous spaces. Inventiones mathematicae 104(1), 467–496 (1991)
  19. Nielsen, M.A., Chuang, I.: Quantum Computation and Quantum Information. Cambridge University Press (2002)
  20. Pedersen, G.K.: Some operator monotone functions. Proc. Am. Math. Soc. 36(1), 309–310 (1972)
  21. Pennec, X.: Statistical Computing on Manifolds for Computational Anatomy. Habilitation à diriger des recherches, Université Nice Sophia Antipolis (2006). https://tel.archives-ouvertes.fr/tel-00633163
  22. Segal, I.E.: Mathematical Cosmology and Extragalactic Astronomy, vol. 68. Academic Press (1976). https://www.elsevier.com/books/mathematical-cosmology-and-extragalactic-astronomy/segal/978-0-12-635250-4
  23. Skovgaard, L.T.: A Riemannian geometry of the multivariate normal model. Scand. J. Stat. 11(4), 211–223 (1984)
  24. Yang, Z., Feng, X.: A note on the trace inequality for products of Hermitian matrix power. JIPAM J. Inequal. Pure Appl. Math. [electronic only] 3, 5 (2002)

Copyright information

© The Author(s) 2018

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Engineering, University of Cambridge, Cambridge, UK
