The Expanding Universe of the Geometric Mean

In this paper the authors seek to trace in an accessible fashion the rapid recent development of the theory of the matrix geometric mean in the cone of positive definite matrices up through the closely related operator geometric mean in the positive cone of a unital $C^*$-algebra. The story begins with the two-variable matrix geometric mean, moves to the breakthrough developments in the multivariable matrix setting, the main focus of the paper, and then on to the extension to the positive cone of the $C^*$-algebra of operators on a Hilbert space, even to general unital $C^*$-algebras, and finally to the consideration of barycentric maps that grow out of the geometric mean on the space of integrable probability measures on the positive cone. Besides expected tools from linear algebra and operator theory, one observes a surprisingly substantial interplay with geometrical notions in metric spaces, particularly the notion of nonpositive curvature. Added features include a glance at the probabilistic theory of random variables with values in a metric space of nonpositive curvature, and the appearance of related means such as the inductive and power means.


Foreword
This manuscript is an expanded and more detailed arXiv version of the authors' earlier rather similar article Following the Trail of the Operator Geometric Mean [20]. Some twenty years ago the authors published an article [15] in the Monthly that treated in some depth the two-variable matrix geometric mean and that has been rather widely read and cited. In this paper we seek to describe the significant advancement and broad generalization of the theory that has taken place in these twenty years.

Introduction
The problem of "squaring a rectangle" is the problem of constructing the side of a square that has the same area as a given rectangle. Such a construction is given by Euclid in Book II of the Elements. If the sides of the rectangle are a and b, then the side of the square has length √(ab), the geometric mean of a and b. Because of their interest in proportions and musical ratios, the Greeks defined some 2500 years ago at least eleven different means, the best known being the arithmetic, geometric, harmonic, and golden. The subject of (binary) means for positive numbers or line segments has a rich mathematical lineage dating back to antiquity. The study of various means on the positive reals and their properties has continued off and on throughout the history of mathematics up to the present day.
The appropriate definition of the geometric mean for two positive definite matrices of the same size seems to have first appeared in 1975 in a paper of Pusz and Woronowicz [27]. Ando [2] provided the first systematic development of many of its basic properties and gave equivalent characterizations, as well as applications to matrix inequalities that are otherwise difficult to prove.
In an article [15] appearing some twenty years ago in the Monthly, the authors presented eight characterizations or prominent properties of the classical geometric mean and showed how each extended to the matrix geometric mean setting, providing convincing documentation that the name "matrix geometric mean" was most appropriate. As hinted in the Foreword, that theory has now advanced to a multivariable setting for both positive matrices and operators and beyond, as we will trace out.
Positive definite matrices have become fundamental computational objects in many areas of engineering, statistics, quantum information, applied mathematics, and elsewhere. They appear as "data points" in a diverse variety of settings: covariance matrices in statistics, elements of the search space in convex and semidefinite programming, kernels in machine learning, observations in radar imaging, and diffusion tensors in medical imaging, to cite only a few. A variety of computational algorithms have arisen for approximation, interpolation, filtering, estimation, and averaging. Our interest focuses on the last named, the process of finding an average or mean, which is again positive definite. In recent years it has been increasingly recognized that the Euclidean distance is often not the most suitable for the space P of positive definite matrices and that working with the appropriate geometry does matter in computational problems; see e.g. [4], [25]. The matrix geometric mean grows out of the geometric structure of P, which makes it a particularly suitable averaging tool in a variety of settings.

Positive Definite Matrices
Let M_m(C), or simply M_m, denote the set of m × m complex matrices. We may identify M_m with the set of linear operators on C^m, where we consider C^m to be a complex Hilbert space of column vectors with the usual hermitian inner product.
Denoting the conjugate transpose of A ∈ M_m by A*, we recall that A is hermitian if A = A*. The hermitian matrix A is positive definite if ⟨u, Au⟩ > 0 for all u ≠ 0. These notions readily generalize to B(H), the algebra of operators on an arbitrary Hilbert space.
The following are well-known equivalences for a hermitian matrix A to be positive definite (with the definition appearing first): (1) ⟨Ax, x⟩ > 0 for all x ≠ 0, where ⟨·, ·⟩ is the Hilbert space inner product on C^m. (2) A = BB* for some invertible B. Every positive definite (resp. hermitian) matrix A has a unique spectral decomposition A = Σ_i λ_i E_i, where the λ_i > 0 (resp. λ_i ∈ R) range over the distinct eigenvalues of A and E_i is the orthogonal projection onto the eigenspace of λ_i. One then has A^p = Σ_i λ_i^p E_i, from which one can easily deduce that every positive definite matrix has a unique positive definite p-th root. We also note that for A = U diag(d_1, . . ., d_m) U* the exponential map is given alternatively by exp A = Σ_{k=0}^∞ A^k/k! = U diag(e^{d_1}, . . ., e^{d_m}) U*, from which we can quickly deduce the equivalence of items (3) and (4) in the previous list of equivalent characterizations of positive definite matrices.
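The spectral decomposition and the resulting p-th root can be illustrated numerically. The following is a minimal sketch (our own, not from the paper), using numpy's `eigh` for the spectral decomposition of a hermitian matrix:

```python
import numpy as np

# Build a positive definite matrix A = BB^T + 4I (characterization (2),
# shifted to guarantee strict positivity).
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)

# Spectral decomposition A = U diag(lam) U* with all lam_i > 0
# (characterizations (3) and (5)).
lam, U = np.linalg.eigh(A)
assert np.all(lam > 0)
assert np.allclose(U @ np.diag(lam) @ U.T, A)

# The unique positive definite p-th root: A^{1/p} = U diag(lam^{1/p}) U*.
p = 3
root = U @ np.diag(lam ** (1.0 / p)) @ U.T
assert np.allclose(np.linalg.matrix_power(root, p), A)
```

Functional calculus through the eigenvalues, as here, is the pattern used repeatedly below for square roots, logarithms, and exponentials of positive definite matrices.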
We define a partial order (sometimes called the Loewner order) on the vector space H_m of hermitian matrices by A ≤ B if B − A is positive semidefinite. We note that 0 ≤ A iff A is positive semidefinite, and we write 0 < A, i.e., A ∈ P, iff A is positive definite. The matrix A is sometimes called strictly positive in this setting.
For any invertible M ∈ M_m(C), the congruence transformation Γ_M(X) = MXM* is an invertible linear map on M_m that carries each of H, P, and the closed convex cone of positive semidefinite matrices onto itself. It follows that congruence transformations preserve the Loewner order on H: A ≤ B implies MAM* ≤ MBM* for M invertible. We note, in particular, that for each M ∈ P, Γ_M(X) = MXM is a congruence transformation on P. Matrix inversion A → A^{−1} maps P onto itself and reverses the Loewner order, as we shall see later.
The geometry of P will be crucial in what follows. One important approach to geometry is that of Felix Klein's Erlangen Program, which emphasized the importance of the group of transformations or "symmetries" that preserve basic geometric properties. For the study of P this group, denoted G(P), is the one generated by the congruence transformations and the inversion map, which acts as the point reflection through the identity matrix.
Remark 3.1. For A, B ∈ P, by the previous item (5), there exists a unitary U such that U(A^{−1/2}BA^{−1/2})U* = D, a diagonal matrix, and hence Γ_{UA^{−1/2}} carries A to I and B to D. This observation allows various results about A, B ∈ P to be reduced to the case where A = I and B is a diagonal matrix.
The arithmetic and harmonic means readily extend from R_{>0} := (0, ∞) to the set P of positive definite matrices: (A + B)/2 and 2(A^{−1} + B^{−1})^{−1}, respectively. The geometric mean is not so obvious (e.g., √(AB), the square root of AB with positive eigenvalues, need not be positive definite for A, B positive definite). One approach is to rewrite the equation x² = ab (whose positive solution is the geometric mean of a and b) in its appropriate form in the noncommutative setting and solve for X.

Definition 3.2. The matrix geometric mean A#B of A, B ∈ P is given by A#B = A^{1/2}(A^{−1/2}BA^{−1/2})^{1/2}A^{1/2}. Alternatively it can be characterized as the unique positive definite solution X of the elementary Riccati equation XA^{−1}X = B.

By inverting both sides of XA^{−1}X = B and multiplying through by X on the right and left, one obtains XB^{−1}X = A. Since the second equation is equivalent to the first, we see A#B = B#A. Similarly one can use the Riccati equation to show that the matrix geometric mean is invariant under congruence mappings and inversion, i.e., Γ_M(A#B) = Γ_M(A)#Γ_M(B) and (A#B)^{−1} = A^{−1}#B^{−1}, and this along with order invariance under congruence maps shows that the inversion map is order reversing.
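The formula of Definition 3.2 and the Riccati characterization can be checked directly in a few lines. This is a sketch under our own helper names (`psd_sqrt`, `geo_mean`, `random_pd` are not from the paper):

```python
import numpy as np

def psd_sqrt(A):
    """Unique positive definite square root via the spectral decomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(lam)) @ U.T

def geo_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} (Definition 3.2)."""
    Ah = psd_sqrt(A)
    Ahi = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Ahi @ B @ Ahi) @ Ah

rng = np.random.default_rng(1)
def random_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_pd(3), random_pd(3)
X = geo_mean(A, B)

# X is the unique positive definite solution of the Riccati equation
# X A^{-1} X = B ...
assert np.allclose(X @ np.linalg.inv(A) @ X, B)
# ... and the mean is symmetric: A # B = B # A.
assert np.allclose(X, geo_mean(B, A))
```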
We collect these properties together with other basic properties that can be deduced by elementary arguments; see [15].
Proposition 3.3. The following hold in P: (i) A#B = B#A; (ii) Γ_M(A#B) = Γ_M(A)#Γ_M(B) for invertible M; (iii) (A#B)^{−1} = A^{−1}#B^{−1}; (iv) A ≤ A′ and B ≤ B′ imply A#B ≤ A′#B′.

By the Loewner–Heinz inequality, 0 ≤ A ≤ B implies A^{1/2} ≤ B^{1/2}; since I#A = A^{1/2}, the map A → I#A is monotone. From this, using congruence invariance, one obtains the important monotonicity property (iv).
As mentioned in the introduction, other formulations and connections between the matrix geometric mean and the one for positive real numbers may be found in [15].

The Riemannian Metric
The tools needed for extending the binary geometric mean on P to a multivariable one have relied on the metric and geometric structure of P. Such considerations had already begun in the binary setting; see [15, Section 4]. We briefly overview the necessary tools in this section; see [15, Section 4] and [5, Chapter 6] for details.
We equip the space H of hermitian matrices of some fixed dimension m with the Frobenius inner product ⟨A, B⟩ = Tr(AB), the trace of AB, which makes H a Hilbert space. The corresponding norm ||A||_2 = ⟨A, A⟩^{1/2} is called the Frobenius or Hilbert–Schmidt norm. We can write A = UDU* for some unitary U, where D is a diagonal matrix with entries the eigenvalues of A. We now observe that

(4.1) ||A||_2² = Tr(A²) = Σ_{i=1}^m λ_i²,

where {λ_i}_{i=1}^m is the set of eigenvalues of A, counted with multiplicity. We define the Riemannian metric δ on P by

δ(A, B) = ||log(A^{−1/2}BA^{−1/2})||_2 = (Σ_i log² λ_i)^{1/2},

where {λ_i} are the eigenvalues of A^{−1/2}BA^{−1/2}. In the last expression we may replace A^{−1/2}BA^{−1/2} by BA^{−1}, since the two are similar and hence have the same eigenvalues.
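The eigenvalue formula for δ, using the similar matrix A^{−1}B as the text allows, translates into a short numerical sketch (the helper names are ours), which also lets us check the isometry properties stated next:

```python
import numpy as np

def delta(A, B):
    """delta(A, B) = (sum_i log^2 lambda_i)^{1/2},
    lambda_i the eigenvalues of A^{-1}B (similar to A^{-1/2} B A^{-1/2})."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real  # eigenvalues are positive
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(2)
def random_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, B = random_pd(3), random_pd(3)
M = rng.standard_normal((3, 3))          # invertible almost surely

# Congruence transformations and inversion are isometries for delta.
assert np.isclose(delta(M @ A @ M.T, M @ B @ M.T), delta(A, B))
assert np.isclose(delta(np.linalg.inv(A), np.linalg.inv(B)), delta(A, B))
```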
We list basic properties of the Riemannian metric δ.
Proposition 4.1. The Riemannian metric δ is a metric making P a complete metric space exhibiting the following properties: (1) For M ∈ GL_m(C), the group of m × m invertible matrices, the congruence transformation Γ_M : (P, δ) → (P, δ) defined by Γ_M(X) = MXM* is an isometry. The inversion map A → A^{−1} is also an isometry on P.
(2) (Exponential metric increasing property) For A, B ∈ H, ||A − B||_2 ≤ δ(exp A, exp B). (3) The exponential map restricted to any one-dimensional subspace of H is an isometry. Furthermore, any metric on P that has this property and is invariant under congruence transformations must agree with δ. (4) For A, B ∈ P, A#B is the unique metric midpoint between A and B.

Now let d(·, ·) be a metric satisfying the two properties of (3); then d agrees with δ along each geodesic through I, and congruence invariance carries this agreement to all of P. For the midpoint property in (4), since congruence mappings preserve δ and #, it suffices to consider the case A#B = I (otherwise first apply Γ_{(A#B)^{−1/2}}). Then the Riccati equation yields B = A^{−1}. Applying log we obtain log B = − log A, so restricting exp to the one-dimensional subspace R · log A yields the result by (3).
A proof of (2) is sketched in [15], and a shorter and more elegant proof appears in [5, Chapter 6].
For the triangle inequality, consider first the case of A, B, C ∈ P with A = I. The general case follows using (1) and an appropriate congruence transformation and its inverse.
By (2) and continuity of the exponential map, the metric δ is complete.
For a metric space (X, d), the metric is said to satisfy the semiparallelogram law if for all x₁, x₂ ∈ X there exists m ∈ X such that for any x ∈ X,

d²(x, m) ≤ (1/2) d²(x, x₁) + (1/2) d²(x, x₂) − (1/4) d²(x₁, x₂).

See, for example, [15] for further details.

Means of several variables
Formally a mean of order n, or n-mean for short, on a set X is a function γ : X^n → X satisfying the idempotency condition γ(x, x, . . ., x) = x for all x ∈ X. It is frequently assumed in the definition of a mean that it is symmetric, that is, invariant under any permutation of variables. When we speak of an omnivariable mean γ = {γ_n}, we are referring to one defined for all X^n, n ≥ 1. (For n = 1, γ_1 is the identity and is thus frequently ignored.) The mean γ : X^n → X is continuous, or a topological mean, if X is a topological space and γ is continuous. Frequently a mean represents some type of averaging operator.
In 2004 T. Ando, C.-K. Li, and R. Mathias [3] gave the first extension of the binary geometric mean to an omnivariable mean on P, which came to be called the ALM mean. For three variables A, B, C, they first formed the triple of pairwise geometric means and iterated this construction; the common limit of the resulting triples is their mean for n = 3, and the construction extends inductively to an n-variable mean for all n > 2. We note that Lawson and Lim [16] later extended the ALM construction to a rather wide class of metric spaces.
Ando, Li, and Mathias made two important contributions in their paper. First of all, they identified axiomatic properties that an omnivariable geometric mean γ should satisfy: (P1) consistency with scalars, (P2) joint homogeneity, (P3) permutation invariance, (P4) monotonicity, (P5) continuity from above, (P6) congruence invariance, (P7) joint concavity, (P8) self-duality, (P9) the determinant identity, and (P10) the AGH mean inequalities (AGH is short for arithmetic-geometric-harmonic). They then established that the ALM mean they had defined satisfies all these properties. The proofs typically involved extending from the known case n = 2 by induction. D. Bini, B. Meini, and F. Poloni [9] later gave a variant of the ALM mean that retained its properties but was much more computationally efficient.
A weight w of length n is an n-tuple (w₁, . . ., wₙ) with 0 < w_i ≤ 1 for each i and Σ_{k=1}^n w_k = 1, and a weighted n-tuple of a set X is a pair (w, x), where w is a weight of length n and x = (x₁, . . ., xₙ) ∈ X^n. We think of this as convenient notation for an ordered n-tuple of weighted points (x_i, w_i). An n-variable weighted mean γ on a set X assigns to each weighted n-tuple (w, x) some γ(w, x) ∈ X with the extra condition that γ(w, (x, x, . . ., x)) = x. We may think of γ(w, x) as the assigning of a "center of mass."

Remark 5.2. The weighted geometric mean A#_t B (with w = (1 − t, t)) is given by A#_t B = A^{1/2}(A^{−1/2}BA^{−1/2})^t A^{1/2}. The map t → A#_t B, t ∈ [0, 1], parametrizes the geodesic from A to B proportionally to arclength, δ(A#_s B, A#_t B) = |s − t| δ(A, B); reparametrized by arclength it gives an isometry from [0, δ(A, B)] onto the geodesic arc between A and B.
Rather obvious variants of the axiomatic properties (P1)-(P10) exist in the weighted mean setting and were introduced and studied in [14].There a weighted version of the mean given by Bini, Meini, and Poloni was introduced, which could also be extended to more general metric spaces.
Closely related to the notion of an omnivariable weighted mean is that of a barycentric map. Denote by P_{<∞}(X) the set of all finitely supported probability measures on X, measures of the form Σ_{k=1}^n w_k δ_{x_k}, where (w₁, . . ., wₙ) is a weight and δ_x is the unit point mass at x. A barycentric map in its simplest manifestation is a map β : P_{<∞}(X) → X such that β(δ_x) = x for each x ∈ X. A barycentric map β gives rise to a corresponding multivariable weighted mean defined by γ(w; x₁, . . ., xₙ) = β(Σ_{k=1}^n w_k δ_{x_k}), where w = (w₁, . . ., wₙ) is a weight. We note, however, that it is not the case that all weighted means arise from barycentric maps. If we restrict to uniform weights (w_i = 1/n for each i), we are essentially in the setting of non-weighted means.
Soon after the introduction of the ALM mean, a better candidate for the multivariable matrix geometric mean was put forth, to which we now turn. Let (M, d) be a metric space. The least squares mean Λ(a₁, . . ., aₙ) is defined as the solution to the optimization problem of minimizing the sum of squared distances: Λ(a₁, . . ., aₙ) = arg min_{x∈M} Σ_{i=1}^n d²(x, a_i). E. Cartan considered such "barycenters" in the case of Riemannian manifolds, where they uniquely exist for the manifolds of nonpositive curvature and exist locally much more generally, and M. Fréchet considered them in more general metric spaces.
Thus the least squares mean is also called the Cartan mean or Fréchet mean. Such means also frequently go by the name "Karcher means," but Karcher's approach from differential geometry involved finding the solution of the "Karcher equation" that is satisfied at this extremum [11]. We return to this in a later section.
M. Moakher (2005) [24] first introduced and studied the least squares mean for the set of positive definite matrices P equipped with the Riemannian metric as an omnivariable generalization of the two-variable geometric mean. Independently, R. Bhatia and J. Holbrook (2006) [6, 7] introduced and studied the least squares mean in the weighted setting. These authors established its (unique) existence and verified most of the axiomatic properties (P1)-(P10) satisfied by the Ando-Li-Mathias geometric mean: consistency with scalars, joint homogeneity, permutation invariance, congruence invariance, and self-duality (the last two being true since congruence transformations and inversion are isometries). Further, based on computational experimentation, Bhatia and Holbrook conjectured monotonicity for the least squares mean (property (P4) in the earlier list), but this was left as an open problem.

The Inductive Mean, Random Variables, and Monotonicity
One other mean played an important role in what followed, one that we shall call the inductive mean, following the terminology of K.-T. Sturm (2003) [29]. It appeared elsewhere in the work of M. Sagae and K. Tanabe (1994) [28] and Ahn, Kim, and Lim (2007) [1]. It is defined inductively for Hadamard spaces (or more generally for metric spaces with weighted binary means x#_t y) by S₂(x, y) = x#y and, for k ≥ 3, S_k(x₁, . . ., x_k) = S_{k−1}(x₁, . . ., x_{k−1}) #_{1/k} x_k. Note that this mean at each stage is defined from the previous stage by taking the appropriate two-variable weighted mean, which is monotone for the Hadamard space P. Thus the inductive mean is monotone (property (P4)).
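The recursion defining S_k is easy to implement. A sketch (helper names are ours), with a sanity check on commuting matrices, where the inductive mean reduces to the entrywise geometric mean:

```python
import numpy as np

def psd_power(A, t):
    """A^t for positive definite A via the spectral decomposition."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(lam ** t) @ U.T

def weighted_geo_mean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah = psd_power(A, 0.5)
    Ahi = np.linalg.inv(Ah)
    return Ah @ psd_power(Ahi @ B @ Ahi, t) @ Ah

def inductive_mean(mats):
    """S_2(x, y) = x # y;  S_k(x_1,...,x_k) = S_{k-1}(...) #_{1/k} x_k."""
    S = mats[0]
    for k in range(2, len(mats) + 1):
        S = weighted_geo_mean(S, mats[k - 1], 1.0 / k)
    return S

# On commuting (diagonal) matrices S_n is the entrywise geometric mean.
D = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0]), np.diag([25.0, 36.0])]
S = inductive_mean(D)
expected = np.diag([(1 * 9 * 25) ** (1 / 3), (4 * 16 * 36) ** (1 / 3)])
assert np.allclose(S, expected)
```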

Alternatively, we may use the inductive mean to form deterministic or random "walks" that converge to the least squares mean.
Later Y. Lim and M. Palfia [22] obtained more general results about deterministic walks yielding the least squares mean in general Hadamard spaces.

Note: The ALM mean is typically distinct from the least squares mean for n ≥ 3; thus the ALM axioms do not characterize a mean. The latter fact had already been noted by Bini, Meini, and Poloni (2010), who observed that their variant of the ALM mean was different from it [9]. In [21], Lim and Palfia showed that the Karcher mean is uniquely determined by congruence invariance (P6), self-duality (P8), and the Yamazaki inequality [30].

The Karcher Equation
The uniform convexity of the Riemannian metric δ on P yields that the least squares mean is the unique critical point of the function X → Σ_{k=1}^n w_k δ²(X, A_k). The least squares mean is thus characterized by the vanishing of the gradient, which is equivalent to its being a solution of the following Karcher equation:

(8.3) Σ_{k=1}^n w_k log(X^{−1/2} A_k X^{−1/2}) = 0.

The Karcher equation (8.3) can also be used to define a mean on the cone P of positive invertible bounded operators on an infinite-dimensional Hilbert space (where one no longer has a Hadamard space), called the Karcher mean. As we just previously noted, restricted to the matrix setting it yields the least squares mean. Thus it is reasonable to continue to denote it by Λ(w; A₁, . . ., Aₙ).
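A naive way to solve the Karcher equation numerically is a damped fixed-point (gradient-style) iteration. The sketch below is our own illustration, not an algorithm from the paper, and uses a conservative step size; it simply checks that the residual of equation (8.3) nearly vanishes at the computed point:

```python
import numpy as np

def sym_fun(A, f):
    """Apply f to the eigenvalues of the hermitian matrix A."""
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.T

def karcher_mean(mats, w, iters=300, step=0.5):
    """Iterate X <- X^{1/2} exp(step * G(X)) X^{1/2}, where
    G(X) = sum_k w_k log(X^{-1/2} A_k X^{-1/2}) is the Karcher residual."""
    X = sum(wk * Ak for wk, Ak in zip(w, mats))   # arithmetic mean as start
    for _ in range(iters):
        Xh = sym_fun(X, np.sqrt)
        Xhi = np.linalg.inv(Xh)
        G = sum(wk * sym_fun(Xhi @ Ak @ Xhi, np.log) for wk, Ak in zip(w, mats))
        X = Xh @ sym_fun(step * G, np.exp) @ Xh
    return X

rng = np.random.default_rng(3)
def random_pd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

mats = [random_pd(3) for _ in range(4)]
w = [0.25] * 4
X = karcher_mean(mats, w)

# The residual of the Karcher equation should (nearly) vanish at the mean.
Xhi = np.linalg.inv(sym_fun(X, np.sqrt))
G = sum(wk * sym_fun(Xhi @ Ak @ Xhi, np.log) for wk, Ak in zip(w, mats))
assert np.linalg.norm(G) < 1e-6
```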
Power means for positive definite matrices were introduced by Lim and Palfia (2012) [21].
Theorem. Let A₁, . . ., Aₙ ∈ P and let w = (w₁, . . ., wₙ) be a weight. Then for each t ∈ (0, 1], the equation X = Σ_{k=1}^n w_k (X #_t A_k) has a unique positive definite solution X = P_t(w; A₁, . . ., Aₙ), called the t-weighted power mean.
When restricted to the positive reals, the power mean reduces to the usual power mean P_t(w; a₁, . . ., aₙ) = (w₁a₁^t + · · · + wₙaₙ^t)^{1/t}.
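The defining equation X = Σ_k w_k (X #_t A_k) can be solved by iterating its right-hand side, which Lim and Palfia showed to be a contraction for t ∈ (0, 1]. A sketch with our own helper names, checked against the scalar formula on commuting (diagonal) matrices:

```python
import numpy as np

def sym_fun(A, f):
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.T

def w_geo(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah = sym_fun(A, np.sqrt)
    Ahi = np.linalg.inv(Ah)
    return Ah @ sym_fun(Ahi @ B @ Ahi, lambda x: x ** t) @ Ah

def power_mean(mats, w, t, iters=200):
    """Iterate X <- sum_k w_k (X #_t A_k), a contraction for t in (0, 1]."""
    X = sum(wk * Ak for wk, Ak in zip(w, mats))
    for _ in range(iters):
        X = sum(wk * w_geo(X, Ak, t) for wk, Ak in zip(w, mats))
    return X

# On commuting matrices the fixed point is the classical power mean
# (w_1 a_1^t + ... + w_n a_n^t)^{1/t}, applied entrywise.
D = [np.diag([1.0, 2.0]), np.diag([4.0, 8.0])]
w, t = [0.5, 0.5], 0.5
P = power_mean(D, w, t)
expected = np.diag([((1 ** t + 4 ** t) / 2) ** (1 / t),
                    ((2 ** t + 8 ** t) / 2) ** (1 / t)])
assert np.allclose(P, expected)
```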
In 2014 Lawson and Lim showed that the preceding notion of power mean extends to the setting of bounded operators on a Hilbert space [18] and established that the power means converge to the unique solution of the Karcher equation Σ_{k=1}^n w_k log(X^{−1/2} A_k X^{−1/2}) = 0; see Theorem 8.1 below for details. However, the infinite-dimensional P is not a Hadamard space, so new approaches are required; see [23].

Summary
In the preceding we have attempted to outline the high points of the striking development of the theory of the matrix/operator geometric mean on the cone of positive matrices/operators over the past twenty years. Over this period it has evolved from a two-variable matrix mean to an omnivariable matrix mean (the least squares, Cartan, or Fréchet mean), to an operator mean in the setting of unital C*-algebras (the Karcher mean), and finally to a barycentric map on the space of integrable Borel probability measures. At each stage of the evolution significant new insights and developments were necessary. The theory has drawn heavily from matrix and operator theory and at the same time from geometric notions. And along the way we have seen a variety of characterizations of this mean: the least squares mean, the probabilistic or deterministic characterization as a limit of a "walk" with the inductive mean, the solution of the Karcher equation, and the limit/infimum of the power means {P_t} as t ↓ 0. Whatever future developments may hold, it is clear that a substantial theory has already emerged.

(3) A has all positive eigenvalues. (4) A = exp B = Σ_{k=0}^∞ B^k/k! for some (unique) hermitian B. (5) A = UDU* for some unitary U and diagonal D with positive diagonal entries.

The positive definite m × m matrices form an open cone in H_m, the m × m hermitian matrices, with closure the positive semidefinite matrices (equivalently, those A with ⟨Ax, x⟩ ≥ 0 for all x). We denote the open cone of positive definite matrices by P (or P_m if we need to distinguish the dimension). The exponential map exp : H → P is an analytic diffeomorphism with inverse analytic diffeomorphism log : P → H.

Such a point m is unique and is the unique metric midpoint between x₁ and x₂.
If the inequality is replaced by an equality, one obtains a version of the parallelogram law holding in Hilbert spaces. The semiparallelogram law is a metric version of nonpositive curvature (NPC). We define a Hadamard space to be a complete metric space satisfying the semiparallelogram law. These spaces have been and continue to be widely studied and appear under the alternative names of global NPC-spaces or CAT(0)-spaces. Using Proposition 4.1 it is straightforward to show that (P, δ) is a Hadamard space. One first considers the case that A#B = I and another point C. Then the parallelogram law holds in the Hilbert space H for log A, log B = − log A, and log C, and one uses Proposition 4.1(2) to obtain the semiparallelogram law for A, B, C. Via 4.1(1), the general case reduces to this one.

Corollary 4.2. The space (P, δ) is a Hadamard space.

Remark 4.3. A more sophisticated approach to the results of this section is the path of Riemannian geometry. The open cone P_m of m × m positive definite matrices becomes a well-known Riemannian manifold when equipped with the trace Riemannian metric ⟨X, Y⟩_A = Tr(A^{−1}XA^{−1}Y), where A ∈ P_m and X, Y are m × m hermitian matrices. The corresponding distance metric on P_m is precisely our metric δ, and this is the source of the name "Riemannian metric." The distance metric of a simply connected Riemannian manifold satisfies (NPC) iff the manifold has nonpositive curvature in the usual sense.

{A₁ := B#C, B₁ := A#C, C₁ := A#B}, consisting of the geometric mean (midpoint) of each pair, then repeated this construction on the new three-point set and continued repeating the operation inductively. They showed the triples approached a common point, their mean for the case n = 3. They extended it inductively to an n-variable mean.

Σ_{i=1}^n d²(x, a_i), and the weighted least squares mean Λ(w; a₁, . . ., aₙ) for w = (w₁, . . ., wₙ) by Λ(w; a₁, . . ., aₙ) = arg min_{x∈M} Σ_{i=1}^n w_i d²(x, a_i), provided the solution uniquely exists in each of the respective cases. Note that the least squares mean is equal to the weighted least squares mean for the uniform weight with all entries 1/n. The unique solution of the minimizing problem exists in Hadamard spaces [29, Proposition 1.7], since the non-negative function defined by f(x) = Σ_{i=1}^n w_i d²(x, a_i) is uniformly convex.

Give ∏_{j=1}^∞ N_m, where N_m = {1, . . ., m}, the product probability, making it a probability space, and define a family of i.i.d. random variables {X_k} by X_k(ω) = x_{ω(k)}. We replace the traditional sum of the first k random variables by the inductive mean σ_{ω,x}(k) := S_k(x_{ω(1)}, . . ., x_{ω(k)}) and take for the expected value Λ(w; x₁, . . ., x_m). From this viewpoint we have the following special case of Sturm's Law of Large Numbers for Hadamard spaces [29, Theorem 4.7]:

Sturm's Theorem. Giving ∏_{k=1}^∞ N_m the product probability, the set {ω ∈ ∏_{k=1}^∞ N_m : lim_{k→∞} σ_{ω,x}(k) = Λ(w; x₁, . . ., x_m)} has measure 1, i.e., σ_{ω,x}(k) → Λ(w; x₁, . . ., x_m) as k → ∞ for almost all ω.

We shall briefly return to the theory of random variables on a probability space taking values in P, or more generally in a Hadamard space, at a later point. Using the preceding version of Sturm's Theorem, Lawson and Lim (2011) [17] provided a positive solution to the earlier mentioned conjecture of Bhatia and Holbrook about the monotonicity of the least squares mean.

Theorem 6.1. Let P be the open cone of positive definite matrices of some fixed dimension, and let n ≥ 3. (1) The [weighted] least squares mean Λ on P is monotone: A_i ≤ B_i for 1 ≤ i ≤ n implies Λ(A₁, . . ., Aₙ) ≤ Λ(B₁, . . ., Bₙ) [Λ(w; A₁, . . ., Aₙ) ≤ Λ(w; B₁, . . ., Bₙ)]. (2) The other nine (weighted) ALM axioms hold for Λ.

Proof. Assume for A = (A₁, . . ., A_m) and B = (B₁, . . ., B_m) that A_i ≤ B_i for 1 ≤ i ≤ m. Let w be a weight. By Sturm's theorem applied to the "walks" for A and for B, we have σ_{ω,A}(k) → Λ(w; A₁, . . ., A_m) and σ_{ω,B}(k) → Λ(w; B₁, . . ., B_m) as k → ∞ for almost all ω ∈ ∏_{k=1}^∞ N_m (since the intersection of two sets of measure 1 has measure 1). Fixing any such ω, we obtain part (1), since the partial order relation is closed and, by the monotonicity of the inductive mean, for each k, σ_{ω,A}(k) = S_k(A_{ω(1)}, . . ., A_{ω(k)}) ≤ S_k(B_{ω(1)}, . . ., B_{ω(k)}) = σ_{ω,B}(k).
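A toy scalar simulation illustrates Sturm's theorem: on the positive reals, the inductive-mean walk along an i.i.d. sample drawn according to the weights converges almost surely to the weighted geometric mean ∏ x_i^{w_i}. This is our own illustration (values and seed are arbitrary):

```python
import math
import random

random.seed(7)
x = [1.0, 4.0, 9.0]
w = [0.2, 0.3, 0.5]

s = None
for k in range(1, 200001):
    a = random.choices(x, weights=w)[0]
    # scalar weighted geometric mean: s #_{1/k} a = s^{1 - 1/k} * a^{1/k}
    s = a if k == 1 else s ** (1 - 1 / k) * a ** (1 / k)

# Sturm: the walk converges a.s. to the weighted geometric mean.
target = math.prod(xi ** wi for xi, wi in zip(x, w))
assert abs(s / target - 1) < 0.05   # convergence is slow, O(1/sqrt(k))
```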

The Hilbert Space Setting
Let B(E) be the C*-algebra of bounded linear operators on an infinite-dimensional Hilbert space E, equipped with the operator norm. Let H(E) denote the closed subspace of hermitian operators, and let P_E be the cone of invertible positive hermitian operators, an open cone in H(E). Again exp and log are analytic inverses between H(E) and P_E. We equip P_E with the Thompson metric defined by d_T(A, B) = ||log(A^{−1/2}BA^{−1/2})||, where ||·|| is the operator norm. The metric d_T retains the four properties of δ in Proposition 4.1 from the finite-dimensional setting, except that A#B is no longer the unique midpoint between A and B [26]. Additionally, P_E is no longer a Hadamard space, basically because H(E) is no longer a Hilbert space. The definition and basic properties of the geometric mean A#B, in particular those of Proposition 3.3, remain valid, and it still gives a distinguished metric midpoint (no longer unique [26]) between A and B. The smallest closed set containing A and B and closed under taking #-midpoints yields a distinguished metric geodesic connecting A and B consisting of all A#_t B, 0 ≤ t ≤ 1.
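Restricted to matrices, where the operator norm of a hermitian matrix is its largest absolute eigenvalue, the Thompson metric admits a simple sketch (our own illustration):

```python
import numpy as np

def d_T(A, B):
    """Thompson metric d_T(A, B) = || log(A^{-1/2} B A^{-1/2}) ||
    = max_i |log lambda_i|, lambda_i the eigenvalues of A^{-1}B."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real
    return np.max(np.abs(np.log(lam)))

A = np.diag([1.0, 4.0])
B = np.diag([2.0, 2.0])
# Here A^{-1}B = diag(2, 1/2), so d_T(A, B) = log 2, while the Riemannian
# distance delta(A, B) would instead be (log^2 2 + log^2 2)^{1/2}.
assert np.isclose(d_T(A, B), np.log(2.0))
```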

Theorem 8.1. The power means are decreasing in t: 0 < s ≤ t implies P_s(w; A₁, . . ., Aₙ) ≤ P_t(w; A₁, . . ., Aₙ).

Using power means, Lawson and Lim were able to establish the existence and uniqueness of the Karcher mean in the C*-algebra of bounded operators on a Hilbert space. In the strong operator topology,

Λ(w; A₁, . . ., Aₙ) = lim_{t→0⁺} P_t(w; A₁, . . ., Aₙ) = inf_{t>0} P_t(w; A₁, . . ., Aₙ),

where Λ is the Karcher mean, the unique solution of the Karcher equation: X = Λ(w; A₁, . . ., Aₙ) ⟺ Σ_{k=1}^n w_k log(X^{−1/2} A_k X^{−1/2}) = 0.