Construction of Fréchet Means and Multicouplings

When given measures μ1, …, μN are supported on the real line, computing their Fréchet mean \(\bar \mu \) is straightforward (Sect. 3.1.4). This is in contrast to the multivariate case, where, apart from the important yet special case of compatible measures, closed-form formulae are not available. This chapter presents an iterative procedure that provably approximates at least a Karcher mean under mild restrictions on the measures μ1, …, μN. The algorithm is based on the differentiability properties of the Fréchet functional developed in Sect. 3.1.6 and can be interpreted as classical steepest descent in the Wasserstein space \({\mathcal {W}}_2({\mathbb {R}}^d)\). It reduces the problem of finding the Fréchet mean to a succession of pairwise transport problems, involving only the Monge–Kantorovich problem between two measures. In the Gaussian case (or any location-scatter family), the latter can be done explicitly, rendering the algorithm particularly appealing (see Sect. 5.4.1).


A Steepest Descent Algorithm for the Computation of Fréchet Means
Throughout this section, we assume that N is a fixed integer and consider a fixed collection

\[\mu_1, \dots, \mu_N \in \mathcal{W}_2(\mathbb{R}^d), \quad\text{with } \mu_1 \text{ absolutely continuous with bounded density,}\tag{5.1}\]

whose unique (Proposition 3.1.8) Fréchet mean \(\bar\mu\) is sought. It has been established (Theorem 3.1.14) that if \(\gamma\) is absolutely continuous, then the associated Fréchet functional

\[F(\gamma) = \frac{1}{2N}\sum_{i=1}^N W_2^2(\gamma, \mu_i)\]

has Fréchet derivative at \(\gamma\) given by

\[F'(\gamma) = -\frac{1}{N}\sum_{i=1}^N \big(\mathbf{t}_{\gamma}^{\mu_i} - \mathbf{i}\big).\]

Let \(\gamma_j \in \mathcal{W}_2(\mathbb{R}^d)\) be an absolutely continuous measure, representing our current estimate of the Fréchet mean at step j. Then it makes sense to introduce a step size \(\tau_j > 0\) and to follow the steepest descent of F given by the negative of the gradient:

\[\gamma_{j+1} = \exp_{\gamma_j}\!\big({-\tau_j F'(\gamma_j)}\big) = \left[\mathbf{i} + \frac{\tau_j}{N}\sum_{i=1}^N \big(\mathbf{t}_{\gamma_j}^{\mu_i} - \mathbf{i}\big)\right]\#\gamma_j.\tag{5.2}\]

In order to employ further descent at \(\gamma_{j+1}\), it needs to be verified that F is differentiable at \(\gamma_{j+1}\), which amounts to showing that the latter stays absolutely continuous. This will happen for all but countably many values of the step size \(\tau_j\), and in any case whenever \(\tau_j\) is contained in [0, 1]:

Lemma 5.1.1 (Regularity of the Iterates) If \(\gamma_0\) is absolutely continuous and \(\tau_0 \in [0, 1]\), then \(\gamma_1 = \exp_{\gamma_0}(-\tau_0 F'(\gamma_0))\) is also absolutely continuous.
The idea is that push-forwards of \(\gamma_0\) under monotone maps are absolutely continuous if and only if the monotonicity is strict, a property preserved by averaging. See page 118 in the supplement for the details. Lemma 5.1.1 suggests that the step size should be restricted to [0, 1]. The next result tells us that the optimal step size, achieving the maximal reduction of the objective function (thus corresponding to a line search), is exactly equal to 1. It does not rely on finite-dimensional arguments and remains valid when \(\mathbb{R}^d\) is replaced by a separable Hilbert space.
Lemma 5.1.2 (Optimal Stepsize) If \(\gamma_0 \in \mathcal{W}_2(\mathbb{R}^d)\) is absolutely continuous and \(\gamma_1 = \exp_{\gamma_0}(-\tau F'(\gamma_0))\) for \(\tau \ge 0\), then

\[F(\gamma_1) \le F(\gamma_0) - \left(\tau - \frac{\tau^2}{2}\right)\|F'(\gamma_0)\|^2_{L^2(\gamma_0)},\]

and the bound on the right-hand side of the last display is minimised when \(\tau = 1\).
Proof. Let \(S_i = \mathbf{t}_{\gamma_0}^{\mu_i}\) be the optimal map from \(\gamma_0\) to \(\mu_i\), and set \(W_i = S_i - \mathbf{i}\). Then

\[\gamma_1 = \left[\mathbf{i} + \frac{\tau}{N}\sum_{j=1}^N W_j\right]\#\gamma_0.\]

Both \(\gamma_1\) and \(\mu_i\) can be written as push-forwards of \(\gamma_0\), and (2.3) gives the bound

\[W_2^2(\gamma_1, \mu_i) \le \Big\|\mathbf{i} + \frac{\tau}{N}\sum_{j=1}^N W_j - S_i\Big\|^2_{L^2(\gamma_0)}.\]
For brevity, we omit the subscript \(L^2(\gamma_0)\) from the norms and inner products. Developing the squares, summing over \(i = 1, \dots, N\), and using (5.3) gives

\[\frac{1}{2N}\sum_{i=1}^N W_2^2(\gamma_1, \mu_i) \le \frac{1}{2N}\sum_{i=1}^N \|W_i\|^2 - \left(\tau - \frac{\tau^2}{2}\right)\Big\|\frac{1}{N}\sum_{i=1}^N W_i\Big\|^2,\]

and recalling that \(W_i = S_i - \mathbf{i}\) yields

\[F(\gamma_1) \le F(\gamma_0) - \left(\tau - \frac{\tau^2}{2}\right)\|F'(\gamma_0)\|^2.\]

To conclude, observe that \(\tau - \tau^2/2\) is maximised at \(\tau = 1\).
In light of Lemmata 5.1.1 and 5.1.2, we will always take \(\tau_j = 1\). The resulting iteration is summarised as Algorithm 1:

Algorithm 1 (Steepest descent towards the Fréchet mean)
(A) Set a tolerance threshold ε > 0.
(B) For j = 0, let \(\gamma_j\) be an arbitrary absolutely continuous measure.
(C) Set \(\gamma_{j+1} = \big[N^{-1}\sum_{i=1}^N \mathbf{t}_{\gamma_j}^{\mu_i}\big]\#\gamma_j\).
(D) If \(\|F'(\gamma_{j+1})\| < \varepsilon\), stop; otherwise increase j by one and return to (C).

A first step in the convergence analysis is that the sequence \((F(\gamma_j))\) is nonincreasing: by Lemma 5.1.2 with \(\tau = 1\), for any integer k,

\[\frac{1}{2}\sum_{j=0}^{k}\|F'(\gamma_j)\|^2_{L^2(\gamma_j)} \le F(\gamma_0) - F(\gamma_{k+1}) \le F(\gamma_0).\]

As \(k \to \infty\), the infinite sum on the left-hand side converges, so \(\|F'(\gamma_j)\|^2\) must vanish as \(j \to \infty\).
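To make the iteration concrete, here is a minimal sketch (ours, not from the text) of Algorithm 1 on a family where the pairwise optimal maps are explicit: univariate Gaussians, for which the optimal map from N(m0, s0²) to N(m, s²) is the increasing affine map x ↦ m + (s/s0)(x − m0). The function name is of our choosing. Consistent with the one-dimensional theory (Sect. 3.1.4), a single step already reaches a fixed point.

```python
import numpy as np

def descent_step(gamma, measures):
    """One iteration of Algorithm 1 with step size tau = 1, for 1-D Gaussians.

    gamma and the entries of measures are (mean, sd) pairs.  The optimal map
    from N(m0, s0^2) to N(m, s^2) is x -> m + (s / s0) * (x - m0); the average
    of the N optimal maps is again an increasing affine map, so the
    push-forward of N(m0, s0^2) under it stays Gaussian.
    """
    m0, s0 = gamma
    slope = np.mean([s / s0 for _, s in measures])
    intercept = np.mean([m - (s / s0) * m0 for m, s in measures])
    # push-forward of N(m0, s0^2) under x -> intercept + slope * x
    return intercept + slope * m0, slope * s0

measures = [(0.0, 1.0), (2.0, 3.0), (-1.0, 2.0)]
gamma = descent_step((5.0, 0.5), measures)   # one step from an arbitrary start
# gamma is now the 1-D Frechet mean (average of means, average of sds),
# and further steps leave it unchanged
```

Running a second step returns the same pair, illustrating that on the real line the iteration stops after one step.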

Analogy with Procrustes Analysis
Algorithm 1 is similar in spirit to another procedure, generalised Procrustes analysis, that is used in shape theory. Given a subset B ⊆ \(\mathbb{R}^d\), most commonly a finite collection of labelled points called landmarks, an interesting question is how to mathematically define the shape of B. One way to reach such a definition is to disregard those properties of B that are deemed irrelevant for what one considers this shape should be; typically, these would include its location, its orientation, and/or its scale. Accordingly, the shape of B can be defined as the equivalence class consisting of all sets obtained as gB, where g belongs to a collection G of transformations of \(\mathbb{R}^d\) containing all combinations of rotations, translations, dilations, and/or reflections (Dryden and Mardia [45, Chapter 4]).
If \(B_1\) and \(B_2\) are two collections of k landmarks, one may define the distance between their shapes as the infimum of \(\|B_1 - gB_2\|^2\) over g in the group G. In other words, one seeks to register \(B_2\) as closely as possible to \(B_1\) by using elements of the group G, with distance measured as the sum of squared Euclidean distances between the transformed points of \(B_2\) and those of \(B_1\). In a sense, one can think of the shape problem and the Monge problem as dual to each other. In the former, one is given constraints on how to optimally carry out the registration of the points, with the cost judged by how successful the registration procedure is. In the latter, one imposes that the registration be done exactly, and evaluates the cost by how much the space must be deformed in order to achieve this.
The optimal g and the resulting distance can be found in closed form by means of ordinary Procrustes analysis [45, Section 5.2]. Suppose now that we are given N > 2 collections of points, \(B_1, \dots, B_N\), with the goal of minimising the sum of squares \(\sum_{i<j}\|g_iB_i - g_jB_j\|^2\) over \(g_i \in G\). As in the case of Fréchet means in \(\mathcal{W}_2(\mathbb{R}^d)\) (Sect. 3.1.2), there is an equivalent formulation in terms of sums of squares from the average \(N^{-1}\sum g_jB_j\). Unfortunately, there is no explicit solution for this problem when d ≥ 3. Like Algorithm 1, generalised Procrustes analysis (Gower [66]; Dryden and Mardia [45, p. 90]) tackles this "multimatching" setting by iteratively solving the pairwise problem as follows. Choose one of the configurations as an initial estimate/template, then register every other configuration to the template, employing ordinary Procrustes analysis. The new template is then given by the linear average of the registered configurations, and the process is iterated subsequently.
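As an illustration (ours, not from [45] or [66]), the following sketch implements the iterative template scheme just described, with G restricted to orthogonal transformations only (no translations or dilations, for brevity); the function names are hypothetical.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Ordinary Procrustes: the orthogonal R minimising ||A - B @ R||_F.

    Classical closed-form solution via the SVD of B^T A.  Note R may be a
    rotation or a reflection; the group G in the text may contain both.
    """
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

def generalised_procrustes(configs, n_iter=50):
    """Iterate: register every configuration to the current template via
    ordinary Procrustes, then replace the template by the linear average
    of the registered configurations."""
    template = configs[0]           # one configuration serves as initial template
    for _ in range(n_iter):
        registered = [C @ procrustes_rotation(template, C) for C in configs]
        template = np.mean(registered, axis=0)
    return template, registered
```

For two configurations that are rotations of one another, the registered copies coincide after the first sweep, so the template stabilises immediately.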
Paralleling this framework, Algorithm 1 iterates the two steps of registration and linear averaging given the current template \(\gamma_j\), but in a different manner: (1) Registration: the optimal transportation maps \(\mathbf{t}_{\gamma_j}^{\mu_i}\) from the template \(\gamma_j\) to each \(\mu_i\) are computed. In this sense, the collection (μ1, …, μN) is viewed in the common coordinate system given by the tangent space at the template \(\gamma_j\) and is registered to it.
(2) Averaging: the registered measures are averaged linearly, using the common coordinate system of the registration step (1), as elements in the linear space Tan γ j . The linear average is then retracted back onto the Wasserstein space via the exponential map to yield the estimate at the ( j + 1)-th step, γ j+1 .
Notice that in the Procrustes sense, the maps that register each \(\mu_i\) to the template \(\gamma_j\) are \(\mathbf{t}_{\mu_i}^{\gamma_j}\), the inverses of \(\mathbf{t}_{\gamma_j}^{\mu_i}\). We will not use the term "registration maps" in the sequel, to avoid possible confusion.

Convergence of Algorithm 1
In order to tackle the issue of convergence, we will use an approach that is specific to the nature of optimal transport. This is because the Hessian-type arguments that are used to prove similar convergence results for steepest descent on Riemannian manifolds (Afsari et al. [1]) are not available in the Wasserstein space. In fact, even in Euclidean spaces, convergence of steepest descent usually requires a Lipschitz bound on the derivative of F (Bertsekas [19, Subsection 1.2.2]). Unfortunately, F is not known to be differentiable at discrete measures, and these constitute a dense set in \(\mathcal{W}_2\); consequently, this Lipschitz condition is very unlikely to hold. Still, the specific geometry of the Wasserstein space affords some advantages; for instance, we will place no restriction on the starting point for the iteration, except that it be absolutely continuous, and no assumption on how "spread out" the collection μ1, …, μN is will be necessary as in, for example, [1, 67, 86].

Theorem 5.3.1 (Limit Points are Karcher Means) Let \(\mu_1, \dots, \mu_N \in \mathcal{W}_2(\mathbb{R}^d)\) be probability measures and suppose that one of them is absolutely continuous with a bounded density. Then the sequence generated by Algorithm 1 stays in a compact set of the Wasserstein space \(\mathcal{W}_2(\mathbb{R}^d)\), and any limit point of the sequence is a Karcher mean of (μ1, …, μN).
Since the Fréchet mean \(\bar\mu\) is a Karcher mean (Proposition 3.1.8), we obtain immediately:

Corollary 5.3.2 (Convergence of Algorithm 1) If, in the setting of Theorem 5.3.1, \(\bar\mu\) is the unique Karcher mean of (μ1, …, μN), then the sequence generated by Algorithm 1 converges to \(\bar\mu\) in \(\mathcal{W}_2(\mathbb{R}^d)\).

Alternatively, combining Theorem 5.3.1 with the optimality criterion Theorem 3.1.15 shows that the algorithm converges to \(\bar\mu\) when the appropriate assumptions on {μi} and the Karcher mean μ = lim γj are satisfied. This allows one to conclude that Algorithm 1 converges to the unique Fréchet mean when the μi are Gaussian measures (see Theorem 5.4.1). The proof of Theorem 5.3.1 is rather elaborate, since we need to use specific methods that are tailored to the Wasserstein space. Before giving the proof, we state two important consequences. The first is the uniform convergence of the optimal maps \(\mathbf{t}_{\gamma_j}^{\mu_i}\) to \(\mathbf{t}_{\bar\mu}^{\mu_i}\) on compacta. This convergence does not immediately follow from the Wasserstein convergence of γj to \(\bar\mu\), and is also established for the inverses. Both the formulation and the proof of this result are similar to those of Theorem 4.4.3.

Theorem 5.3.3 (Uniform Convergence of Optimal Maps) Under the conditions of Corollary 5.3.2, there exist sets \(A, B_i \subseteq \mathbb{R}^d\) with \(\bar\mu(A) = 1\) and \(\mu_i(B_i) = 1\), such that

\[\mathbf{t}_{\gamma_j}^{\mu_i} \to \mathbf{t}_{\bar\mu}^{\mu_i} \text{ uniformly on } \Omega_1, \qquad \mathbf{t}_{\mu_i}^{\gamma_j} \to \mathbf{t}_{\mu_i}^{\bar\mu} \text{ uniformly on } \Omega_2^i,\]

for any pair of compacta \(\Omega_1 \subseteq A\), \(\Omega_2^i \subseteq B_i\). If in addition all the measures μ1, …, μN have the same support, then one can choose all the sets \(B_i\) to be the same.
The other consequence is convergence of the optimal multicouplings.

Corollary 5.3.4 (Convergence of Multicouplings) Under the conditions of Corollary 5.3.2, the sequence of multicouplings

\[\big(\mathbf{t}_{\gamma_j}^{\mu_1}, \dots, \mathbf{t}_{\gamma_j}^{\mu_N}\big)\#\gamma_j\]

of (μ1, …, μN) converges to the optimal multicoupling \(\big(\mathbf{t}_{\bar\mu}^{\mu_1}, \dots, \mathbf{t}_{\bar\mu}^{\mu_N}\big)\#\bar\mu\).
The proofs of Theorem 5.3.3 and Corollary 5.3.4 are given at the end of the present section.
The proof of Theorem 5.3.1 is achieved by establishing the following facts:
1. The sequence (γj) stays in a compact subset of \(\mathcal{W}_2(\mathbb{R}^d)\) (Lemma 5.3.5).
2. Any limit of (γj) is absolutely continuous (Proposition 5.3.6 and the paragraph preceding it).
3. Algorithm 1 acts continuously on its argument (Corollary 5.3.8).
Since it has already been established that \(\|F'(\gamma_j)\|_{L^2(\gamma_j)} \to 0\), these three facts indeed suffice: if a subsequence of (γj) converges to some γ, then γ is absolutely continuous, \(\mathcal{A}(\gamma_{j_k}) \to \mathcal{A}(\gamma)\) by continuity, and \(W_2(\gamma_j, \mathcal{A}(\gamma_j)) \le \|F'(\gamma_j)\|_{L^2(\gamma_j)} \to 0\), so that \(\mathcal{A}(\gamma) = \gamma\); that is, γ is a Karcher mean.
Lemma 5.3.5 The sequence generated by Algorithm 1 stays in a compact subset of the Wasserstein space W 2 (R d ).
Proof. For all j ≥ 1, γj takes the form \(M_N\#\pi\), where \(M_N(x_1, \dots, x_N) = N^{-1}(x_1 + \dots + x_N)\) and π is a multicoupling of μ1, …, μN. The compactness of the set of measures of this form has been established in Step 2 of the proof of Theorem 3.1.5; see page 63 in the supplement, where this is done in a more complicated setup.
A closer look at the proof reveals that a more general result holds true. Let \(\mathcal{A}\) denote the steepest descent iteration, that is, \(\mathcal{A}(\gamma) = \big[N^{-1}\sum_{i=1}^N \mathbf{t}_\gamma^{\mu_i}\big]\#\gamma\); then the image of \(\mathcal{A}\) is contained in a fixed compact set. This is also true if \(\mathbb{R}^d\) is replaced by a separable Hilbert space.
In order to show that a weakly convergent sequence (γj) of absolutely continuous measures has an absolutely continuous limit γ, it suffices to show that the densities of γj are uniformly bounded by some constant C, since then for any open set \(O \subseteq \mathbb{R}^d\),

\[\gamma(O) \le \liminf_{j\to\infty}\gamma_j(O) \le C\,\mathrm{Leb}(O)\]

by the portmanteau Lemma 1.7.1. It follows that γ is absolutely continuous with density bounded by C. We now show that such a C can be found that applies to all measures in the image of \(\mathcal{A}\), hence to all sequences resulting from iterations of Algorithm 1.

Proposition 5.3.6 (Uniform Density Bound) For each i = 1, …, N, denote by \(g_i\) the density of \(\mu_i\) (if it exists) and by \(\|g_i\|_\infty\) its supremum, taken to be infinite if \(g_i\) does not exist (or if \(g_i\) is unbounded). Let \(\gamma_0\) be any absolutely continuous probability measure. Then the density of \(\gamma_1 = \mathcal{A}(\gamma_0)\) is bounded by the constant whose 1/d-th power is the harmonic mean of the \(\|g_i\|_\infty^{1/d}\),

\[C_\mu = \left(\frac{1}{N}\sum_{i=1}^N \|g_i\|_\infty^{-1/d}\right)^{-d}.\]

The constant \(C_\mu\) depends only on the measures (μ1, …, μN), and is finite as long as one \(\mu_i\) has a bounded density, since \(C_\mu \le N^d\|g_i\|_\infty\) for any i.
Proof. Let \(h_j\) be the density of \(\gamma_j\), j = 0, 1. By the change of variables formula, for \(\gamma_0\)-almost any x,

\[h_0(x) = g_i\big(\mathbf{t}_{\gamma_0}^{\mu_i}(x)\big)\det\nabla\mathbf{t}_{\gamma_0}^{\mu_i}(x), \quad\text{so that}\quad \det\nabla\mathbf{t}_{\gamma_0}^{\mu_i}(x) \ge \frac{h_0(x)}{\|g_i\|_\infty},\]

when \(g_i\) exists. (Convex functions are twice differentiable almost surely (Villani [125, Theorem 14.25]), hence these gradients are well-defined \(\gamma_0\)-almost surely.) We seek a lower bound on the determinant of \(\nabla\mathbf{t}_{\gamma_0}^{\gamma_1}(x)\), which by definition equals

\[\nabla\mathbf{t}_{\gamma_0}^{\gamma_1}(x) = \frac{1}{N}\sum_{i=1}^N \nabla\mathbf{t}_{\gamma_0}^{\mu_i}(x).\]

Such a bound is provided by the Brunn–Minkowski inequality (Stein and Shakarchi [121, Section 1.5]) for symmetric positive semidefinite matrices,

\[\big[\det(A+B)\big]^{1/d} \ge \big[\det A\big]^{1/d} + \big[\det B\big]^{1/d}.\]

From this, we obtain

\[\big[\det\nabla\mathbf{t}_{\gamma_0}^{\gamma_1}(x)\big]^{1/d} \ge \frac{1}{N}\sum_{i=1}^N\big[\det\nabla\mathbf{t}_{\gamma_0}^{\mu_i}(x)\big]^{1/d} \ge h_0(x)^{1/d}\,\frac{1}{N}\sum_{i=1}^N\|g_i\|_\infty^{-1/d} = \left(\frac{h_0(x)}{C_\mu}\right)^{1/d}.\]

Let Σ be the set of points where these inequalities hold; then \(\gamma_0(\Sigma) = 1\). Hence, by the change of variables formula again, \(\gamma_1\)-almost surely we obtain the upper bound

\[h_1\big(\mathbf{t}_{\gamma_0}^{\gamma_1}(x)\big) = \frac{h_0(x)}{\det\nabla\mathbf{t}_{\gamma_0}^{\gamma_1}(x)} \le C_\mu.\]

The third statement (continuity of \(\mathcal{A}\)) is much more subtle to establish, and its rather lengthy proof is given next. In view of Proposition 5.3.6, the uniform bound on the densities is not a hindrance for the proof of convergence of Algorithm 1.
Proposition 5.3.7 Let (γn) be a sequence of absolutely continuous measures with uniformly bounded densities, suppose that \(W_2(\gamma_n, \gamma) \to 0\), and let

\[\eta_n = \Big(\frac{1}{N}\sum_{i=1}^N \mathbf{t}_{\gamma_n}^{\mu_i},\; \mathbf{t}_{\gamma_n}^{\mu_1}, \dots, \mathbf{t}_{\gamma_n}^{\mu_N}\Big)\#\gamma_n,\]

with η defined analogously in terms of γ. Then \(\eta_n \to \eta\) in \(\mathcal{W}_2((\mathbb{R}^d)^{N+1})\).

Proof. As has been established in the discussion before Proposition 5.3.6, the limit γ must be absolutely continuous, so η is well-defined.
In view of Theorem 2.2.1, it suffices to show that \(\int h\,d\eta_n \to \int h\,d\eta\) whenever \(h : (\mathbb{R}^d)^{N+1} \to \mathbb{R}\) is a continuous nonnegative function such that \(h(z) \le C(1 + \|z\|^2)\) for some constant C. Set

\[g_n(x) = h\Big(\frac{1}{N}\sum_{i=1}^N \mathbf{t}_{\gamma_n}^{\mu_i}(x),\; \mathbf{t}_{\gamma_n}^{\mu_1}(x), \dots, \mathbf{t}_{\gamma_n}^{\mu_N}(x)\Big), \quad\text{so that}\quad \int h\,d\eta_n = \int g_n\,d\gamma_n,\]

and g defined analogously. The proof, given in full detail on page 124 of the supplement, is sketched here.
Step 1: Truncation. Since the γn converge in the Wasserstein space, they satisfy the uniform integrability (2.4) and absolute continuity (2.7) by Theorem 2.2.1. Consequently, the truncation \(g_{n,R} = \min(g_n, 4R)\) is uniformly close to \(g_n\):

\[\sup_n \int (g_n - g_{n,R})\,d\gamma_n \to 0, \qquad R \to \infty.\]

We may thus replace \(g_n\) by its bounded version \(g_{n,R}\).
Step 2: Convergence of \(g_n\) to g. By Proposition 1.7.11, the optimal maps \(\mathbf{t}_{\gamma_n}^{\mu_i}\) converge to \(\mathbf{t}_{\gamma}^{\mu_i}\) and (since h is continuous) \(g_n \to g\) uniformly on "nice" sets \(\Omega \subseteq E = \mathrm{supp}\,\gamma\). Write

\[\int g_n\,d\gamma_n - \int g\,d\gamma = \left(\int_\Omega g\,d\gamma_n - \int_\Omega g\,d\gamma\right) + \int_\Omega (g_n - g)\,d\gamma_n + \left(\int_{\mathbb{R}^d\setminus\Omega} g_n\,d\gamma_n - \int_{\mathbb{R}^d\setminus\Omega} g\,d\gamma\right).\]

Step 3: Bounding the first two integrals. The first integral vanishes as n → ∞ by the portmanteau Lemma 1.7.1, and the second by uniform convergence.
Step 4: Bounding the third integral. The integrand is bounded by 8R, so it suffices to bound the measures of \(\mathbb{R}^d \setminus \Omega\). This is a bit technical, and uses the uniform density bound on (γn) and the portmanteau lemma.

Corollary 5.3.8 (Continuity of \(\mathcal{A}\)) If \(W_2(\gamma_n, \gamma) \to 0\) and the γn have uniformly bounded densities, then \(\mathcal{A}(\gamma_n) \to \mathcal{A}(\gamma)\).
Proof. Choose h in the proof of Proposition 5.3.7 to depend only on y.
Proof (Proof of Corollary 5.3.4). Choose h in the proof of Proposition 5.3.7 to depend only on \(t_1, \dots, t_N\).

Proof (Proof of Theorem 5.3.3). Let \(E = \mathrm{supp}\,\bar\mu\) and set

\[A_i = E^{\mathrm{den}} \cap \{x : \mathbf{t}_{\bar\mu}^{\mu_i}(x) \text{ is univalued}\}.\]

As \(\bar\mu\) is absolutely continuous, \(\bar\mu(A_i) = 1\), and the same is true for \(A = \cap_{i=1}^N A_i\). The first assertion then follows from Proposition 1.7.11. The second statement is proven similarly. Let \(E_i = \mathrm{supp}\,\mu_i\) and notice that by absolute continuity the set \(B_i = (E_i)^{\mathrm{den}} \cap \{x : \mathbf{t}_{\mu_i}^{\bar\mu}(x) \text{ is univalued}\}\) has measure 1 with respect to \(\mu_i\). Apply Proposition 1.7.11. If in addition \(E_1 = \dots = E_N\), then \(\mu_i(B) = 1\) for \(B = \cap B_i\).

Illustrative Examples
As an illustration, we implement Algorithm 1 in several scenarios for which pairwise optimal maps can be calculated explicitly at every iteration, allowing for fast computation without error propagation. In each case, we give some theory first, describing how the optimal maps are calculated, and then implement Algorithm 1 on simulated examples.

Gaussian Measures
No example illustrates the use of Algorithm 1 better than the Gaussian case. This is so because optimal maps between centred nondegenerate Gaussian measures with covariances A and B have the explicit form (see Sect. 1.6.3)

\[\mathbf{t}_A^B = A^{-1/2}\big(A^{1/2} B A^{1/2}\big)^{1/2} A^{-1/2},\]

with the obvious slight abuse of notation. In contrast, the Fréchet mean of a collection of Gaussian measures (one of which is nonsingular) does not admit a closed-form formula and is only known to be a Gaussian measure whose covariance matrix Γ is the unique invertible root of the matrix equation

\[\Gamma = \frac{1}{N}\sum_{i=1}^N \big(\Gamma^{1/2} S_i \Gamma^{1/2}\big)^{1/2},\tag{5.4}\]

where \(S_i\) is the covariance matrix of \(\mu_i\). Given the formula for \(\mathbf{t}_A^B\), application of Algorithm 1 to Gaussian measures is straightforward. The next result shows that, in the Gaussian case, the iterates must converge to the unique Fréchet mean, and that (5.4) can be derived from the characterisation of Karcher means.
Theorem 5.4.1 (Convergence in Gaussian Case) Let μ 1 , . . . , μ N be Gaussian measures with zero means and covariance matrices S i with S 1 nonsingular, and let the initial point γ 0 be N (0,Γ 0 ) with Γ 0 nonsingular. Then the sequence of iterates generated by Algorithm 1 converges to the unique Fréchet mean of (μ 1 , . . . , μ N ).
Proof. Since the optimal maps are linear, so is their average, and therefore \(\gamma_k\) is a Gaussian measure for all k, say \(N(0,\Gamma_k)\) with \(\Gamma_k\) nonsingular. Any limit point γ of \((\gamma_k)\) is a Karcher mean by Theorem 5.3.1. If we knew that γ were Gaussian, then it actually must be the Fréchet mean, because \(N^{-1}\sum \mathbf{t}_\gamma^{\mu_i}\) equals the identity everywhere on \(\mathbb{R}^d\) (see the discussion before Theorem 3.1.15). Let us show that every limit point γ is indeed Gaussian. It suffices to prove that \((\Gamma_k)\) is a bounded sequence, because if \(\Gamma_k \to \Gamma\) along a subsequence, then \(N(0,\Gamma_k) \to N(0,\Gamma)\) weakly along it, as can be seen from either Scheffé's theorem (the densities converge) or Lévy's continuity theorem (the characteristic functions converge).
To see that \((\Gamma_k)\) is bounded, observe first that for any centred (Gaussian or not) measure μ with covariance matrix S,

\[W_2^2(\mu, \delta_0) = \int_{\mathbb{R}^d}\|x\|^2\,d\mu(x) = \mathrm{tr}[S],\]

where \(\delta_0\) is a Dirac mass at the origin. (This follows from the spectral decomposition of S.) Therefore \(0 \le \mathrm{tr}[\Gamma_k] = W_2^2(\gamma_k, \delta_0)\) is bounded uniformly, because \(\{\gamma_k\}\) stays in a Wasserstein-compact set by Lemma 5.3.5. If we define \(C = \sup_k \mathrm{tr}[\Gamma_k] < \infty\), then all the diagonal elements of \(\Gamma_k\) are bounded uniformly. When A is symmetric and positive semidefinite, \(2|A_{ij}| \le A_{ii} + A_{jj}\). Consequently, all the entries of \(\Gamma_k\) are bounded uniformly by C, which means that \((\Gamma_k)\) is a bounded sequence.
From the formula for the optimal maps, we see that if Γ is the covariance of the Fréchet mean, then

\[\frac{1}{N}\sum_{i=1}^N \Gamma^{-1/2}\big(\Gamma^{1/2} S_i \Gamma^{1/2}\big)^{1/2}\Gamma^{-1/2} = I,\]

and multiplying on both sides by \(\Gamma^{1/2}\) recovers the fixed point equation (5.4). If the means are nonzero, then the optimal maps are affine and the same result applies; the Fréchet mean is still a Gaussian measure with covariance matrix Γ, and its mean equals the average of the means of \(\mu_i\), i = 1, …, N. Figure 5.1 shows density plots of N = 4 centred Gaussian measures on \(\mathbb{R}^2\) with covariances \(S_i \sim \mathrm{Wishart}(I_2, 2)\), and Fig. 5.2 shows the density of the resulting Fréchet mean. In this particular example, the algorithm needed 11 iterations starting from the identity matrix. The corresponding optimal maps are displayed in Fig. 5.3. It is apparent from the figure that these maps are linear, and after more careful reflection one can be convinced that their average is the identity. The four plots in the figure are remarkably different, in accordance with the measures themselves having widely varying condition numbers and orientations; \(\mu_3\) and, more so, \(\mu_4\) are very concentrated, so the optimal maps "sweep" the mass towards zero. In contrast, the optimal maps to \(\mu_1\) and \(\mu_2\) spread the mass out away from the origin.
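The resulting update for centred Gaussian measures can be sketched as follows (our illustration; the function names are of our choosing). Each iterate \(N(0, \Gamma_j)\) is mapped to \(\Gamma_{j+1} = M\Gamma_j M\), where M is the average of the matrices of the optimal maps from \(N(0, \Gamma_j)\) to the \(N(0, S_i)\).

```python
import numpy as np

def sqrtm_psd(A):
    # symmetric PSD matrix square root via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_frechet_mean(covs, max_iter=500, tol=1e-12):
    """Algorithm 1 for centred Gaussians N(0, S_i): every iterate is
    N(0, Gamma_j); the optimal map matrices are
    Gamma_j^{-1/2} (Gamma_j^{1/2} S_i Gamma_j^{1/2})^{1/2} Gamma_j^{-1/2},
    and averaging them gives Gamma_{j+1} = M Gamma_j M."""
    G = np.eye(covs[0].shape[0])          # start from the identity matrix
    for _ in range(max_iter):
        R = sqrtm_psd(G)
        R_inv = np.linalg.inv(R)
        M = np.mean([R_inv @ sqrtm_psd(R @ S @ R) @ R_inv for S in covs], axis=0)
        G_new = M @ G @ M
        if np.linalg.norm(G_new - G) < tol:
            return G_new
        G = G_new
    return G
```

At convergence the limit satisfies the fixed point equation (5.4): the average of the matrices \((\Gamma^{1/2} S_i \Gamma^{1/2})^{1/2}\) equals Γ itself.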

Compatible Measures
We next discuss the behaviour of the algorithm when the measures are compatible (Definition 2.3.1). Boissard et al. [28] showed that when this condition holds, the Fréchet mean of (μ1, …, μN) can be found by simple computations involving the iterated barycentre. We again denote by \(\gamma_0\) the initial point of Algorithm 1, which can be any absolutely continuous measure.

Lemma 5.4.2 (Compatibility and Convergence) If \(\{\gamma_0\} \cup \{\mu_1, \dots, \mu_N\}\) is compatible, then Algorithm 1 converges to the Fréchet mean of (μi) after a single step.
Proof. By definition, the next iterate is

\[\gamma_1 = \left[\frac{1}{N}\sum_{i=1}^N \mathbf{t}_{\gamma_0}^{\mu_i}\right]\#\gamma_0,\]

which is the Fréchet mean by Theorem 3.1.9.
In this case, Algorithm 1 requires the calculation of N pairwise optimal maps, and this can be reduced to N − 1 if the initial point is chosen to be μ 1 . This is the same computational complexity as the calculation of the iterated barycentre proposed in [28]. When the measures have a common copula, finding the optimal maps reduces to finding the optimal maps between the one-dimensional marginals (see Lemma 2.3.3) and this can be done using quantile functions as described in Sect. 1.5. The marginal Fréchet means are then plugged into the common copula to yield the joint Fréchet mean. We next illustrate Algorithm 1 in three such scenarios.

The One-Dimensional Case
When the measures are supported on the real line, there is no need to use the algorithm, since the Fréchet mean admits a closed-form expression in terms of quantile functions (see Sect. 3.1.4). We nevertheless discuss this case briefly because we build upon this construction in subsequent examples. Given that \(\mathbf{t}_\mu^\nu = F_\nu^{-1} \circ F_\mu\), we may apply Algorithm 1 starting from one of these measures (or any other measure). Figure 5.4 plots N = 4 univariate densities and the Fréchet mean yielded by the algorithm in two different scenarios. At the left, the densities were generated as in (5.5), with φ the standard normal density and the parameters generated independently as Gamma(4, 4) random variables.
At the right of Fig. 5.4, we used a mixture of a shifted gamma and a Gaussian, as in (5.6). The resulting Fréchet mean density for both settings is shown in thick light blue, and can be seen to capture the bimodal nature of the data. Even though the Fréchet mean of Gaussian mixtures is not itself a Gaussian mixture, it is approximately so, provided that the peaks are separated enough. Figure 5.5 shows the optimal maps pushing the Fréchet mean \(\bar\mu\) forward to the measures μ1, …, μN in each case. If one ignores the "middle part" of the x axis, the maps appear (approximately) affine for small values of x and for large values of x, indicating how the peaks are shifted. In the middle region, the maps need to "bridge the gap" between the different slopes and intercepts of these affine maps.
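The closed form on the real line can be sketched numerically as follows (our code; the function name is hypothetical): the quantile function of the Fréchet mean is the average of the quantile functions, each approximated here on a finite grid from samples.

```python
import numpy as np

def quantile_frechet_mean(samples, grid_size=1000):
    """1-D Wasserstein Frechet mean via quantile averaging:
    F_bar^{-1}(u) = N^{-1} * sum_i F_i^{-1}(u).

    Returns the evaluation grid u and the averaged quantile function,
    with each F_i^{-1} estimated empirically from a sample."""
    u = (np.arange(grid_size) + 0.5) / grid_size
    return u, np.mean([np.quantile(np.asarray(s), u) for s in samples], axis=0)
```

For instance, averaging a uniform distribution on [0, 1] with a uniform on [2, 3] yields (up to grid error) the uniform distribution on [1, 2], whose quantile function is u ↦ 1 + u.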

Independence
We next take measures \(\mu_i\) on \(\mathbb{R}^2\), having independent marginal densities \(f_i^X\) as in (5.5) and \(f_i^Y\) as in (5.6). Figure 5.6 shows the density plot of N = 4 such measures, constructed as the products of the measures from Fig. 5.4. One can distinguish the independence by the "parallel" structure of the figures: for every pair \((y_1, y_2)\), the ratio \(g(x, y_1)/g(x, y_2)\) does not depend on x (and vice versa, interchanging x and y). Figure 5.7 plots the density of the resulting Fréchet mean. We observe that the Fréchet mean captures the four peaks and their locations. Furthermore, the parallel nature of the figure is preserved in the Fréchet mean. Indeed, by Lemma 3.1.11 the Fréchet mean is a product measure. The optimal maps, in Fig. 5.10, are the same as in the next example, and will be discussed there.

Common Copula
Let \(\mu_i\) be a measure on \(\mathbb{R}^2\) with density

\[g_i(x, y) = f_i^X(x)\,f_i^Y(y)\,c\big(F_i^X(x), F_i^Y(y)\big),\]

where \(f_i^X\) and \(f_i^Y\) are random densities on the real line with distribution functions \(F_i^X\) and \(F_i^Y\), and c is a copula density. Figure 5.8 shows the density plot of N = 4 such measures, with \(f_i^X\) generated as in (5.5), \(f_i^Y\) as in (5.6), and c the Frank(−8) copula density, while Fig. 5.9 plots the density of the Fréchet mean obtained. (For ease of comparison we use the same realisations of the densities that appear in Fig. 5.4.) The Fréchet mean can be seen to preserve the shape of the density, having four clearly distinguished peaks. Figure 5.10, depicting the resulting optimal maps, allows for a clearer interpretation: for instance, the leftmost plot (in black) shows more clearly that the map splits the mass around x = −2 to a much wider interval; and conversely a very large amount of mass is sent to x ≈ 2. This rather extreme behaviour matches the peak of the density of \(\mu_1\) located at x = 2.

Partially Gaussian Trivariate Measures
We now apply Algorithm 1 in a situation that entangles two of the previous settings. Let U be a fixed 3 × 3 real orthogonal matrix with columns \(U_1, U_2, U_3\), and let \(\mu_i\) have density

\[g_i(y) = \phi_{S_i}\big(U_1^\top y, U_2^\top y\big)\,f_i\big(U_3^\top y\big),\]

with \(\phi_{S_i}\) the density of \(N(0, S_i)\), \(f_i\) a bounded density on the real line, and \(S_i \in \mathbb{R}^{2\times 2}\) positive definite. We simulated N = 4 such densities with \(f_i\) as in (5.5) and \(S_i \sim \mathrm{Wishart}(I_2, 2)\). We apply Algorithm 1 to this collection of measures and find their Fréchet mean (see the end of this subsection for precise details on how the optimal maps were calculated). Figure 5.11 shows level sets of the resulting densities for some specific values. The bimodal nature of \(f_i\) implies that for most values of a, \(\{x : f_i(x) = a\}\) has four elements. Hence, the level sets in the figures are unions of four separate parts, with each peak of \(f_i\) contributing two parts that together form the boundary of an ellipsoid in \(\mathbb{R}^3\) (see Fig. 5.12). The principal axes of these ellipsoids and their position in \(\mathbb{R}^3\) differ between the measures, but the Fréchet mean can be viewed as an average of those in some sense. In terms of orientation (principal axes) of the ellipsoids, the Fréchet mean is most similar to \(\mu_1\) and \(\mu_2\), whose orientations are similar to one another. Let us now see how the optimal maps are calculated. If \(Y = (y_1, y_2, y_3) \sim \mu_i\), then the random vector \((x_1, x_2, x_3) = X = U^{-1}Y\) has joint density

\[\phi_{S_i}(x_1, x_2)\,f_i(x_3),\]

so the probability law of X is \(\rho_i \otimes \nu_i\), with \(\rho_i\) centred Gaussian with covariance matrix \(S_i\) and \(\nu_i\) having density \(f_i\) on \(\mathbb{R}\). By Lemma 3.1.11, the Fréchet mean of \((U^{-1}\#\mu_i)\) is the product measure of that of \((\rho_i)\) and that of \((\nu_i)\); by Lemma 3.1.12, the Fréchet mean of \((\mu_i)\) is therefore

\[\bar\mu = U\#\big(N(0, \Sigma) \otimes \bar\nu\big),\]

where Σ is the Fréchet–Wasserstein mean of \(S_1, \dots, S_N\) and \(\bar\nu\) is the Fréchet mean of \((\nu_1, \dots, \nu_N)\).
Starting at an initial point \(\gamma_0 = U\#(N(0, \Sigma_0) \otimes \nu_0)\), with \(\nu_0\) having continuous distribution function \(F_{\nu_0}\), the optimal maps are

\[\mathbf{t}_{\gamma_0}^{\mu_i}(y) = U\begin{pmatrix} \mathbf{t}_{\Sigma_0}^{S_i}(x_1, x_2)^\top \\ F_{\nu_i}^{-1}\big(F_{\nu_0}(x_3)\big)\end{pmatrix}, \qquad (x_1, x_2, x_3) = U^{-1}y,\]

where we identify \(\mathbf{t}_{\Sigma_0}^{S_i}\) with the positive definite matrix \((S_i)^{1/2}\big[(S_i)^{1/2}\Sigma_0(S_i)^{1/2}\big]^{-1/2}(S_i)^{1/2}\) that pushes forward \(N(0, \Sigma_0)\) to \(N(0, S_i)\). Due to the one-dimensionality, the algorithm finds the third component of the rotated measures after one step, but the convergence of the Gaussian component requires further iterations.
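The computation just described can be sketched as follows (our code; all names are of our choosing). It assembles the optimal map from the Gaussian map on the first two rotated coordinates and a given one-dimensional map on the third.

```python
import numpy as np

def sqrtm_psd(A):
    # symmetric PSD matrix square root via eigendecomposition
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_map(S0, S1):
    """Matrix of the optimal map pushing N(0, S0) forward to N(0, S1),
    in the form S1^{1/2} [S1^{1/2} S0 S1^{1/2}]^{-1/2} S1^{1/2}."""
    R = sqrtm_psd(S1)
    return R @ np.linalg.inv(sqrtm_psd(R @ S0 @ R)) @ R

def rotated_product_map(U, S0, S1, t_third):
    """Optimal map between U#(N(0,S0) x nu_0) and U#(N(0,S1) x nu_1):
    rotate back by U^{-1}, apply the Gaussian map on the first two
    coordinates and the 1-D map t_third on the third, rotate by U."""
    T = gaussian_map(S0, S1)
    def tmap(y):
        x = U.T @ y                      # U^{-1} = U^T for orthogonal U
        return U @ np.concatenate([T @ x[:2], [t_third(x[2])]])
    return tmap
```

A quick sanity check: the matrix returned by `gaussian_map(S0, S1)` pushes the covariance forward correctly, i.e. \(T S_0 T^\top = S_1\).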

Population Version of Algorithm 1
Let \(\Lambda \in \mathcal{W}_2(\mathbb{R}^d)\) be a random measure with finite Fréchet functional. The population version of (5.1) is

\[q = P\big(\Lambda \text{ absolutely continuous with density bounded by } M\big) > 0 \quad\text{for some } M < \infty,\tag{5.7}\]

which we assume henceforth. This condition is satisfied if and only if P(Λ absolutely continuous with bounded density) > 0.
These probabilities are well-defined because the set \(\{\mu : \mu \text{ absolutely continuous with density bounded by } M\}\) is weakly closed (see the paragraph before Proposition 5.3.6), hence a Borel set of \(\mathcal{W}_2(\mathbb{R}^d)\).
In light of Theorem 3.2.13, we can define a population version of Algorithm 1 with the iteration function

\[\mathcal{A}(\gamma) = \big[E\,\mathbf{t}_\gamma^\Lambda\big]\#\gamma.\]

The (Bochner) expectation is well-defined in \(L^2(\gamma)\) because the random map \(\mathbf{t}_\gamma^\Lambda\) is measurable (Lemma 2.4.6). Since \(L^2(\gamma)\) is a Hilbert space, the law of large numbers applies there, and results for the empirical version carry over to the population version by means of approximations. In particular:

Proposition 5.5.1 (Population Density Bound) If γ is absolutely continuous, then \(\mathcal{A}(\gamma)\) has density bounded by \(q^{-d}M\).

Proof. The result is true in the empirical case, by Proposition 5.3.6. The key point (observed by Pass [102, Subsection 3.3]) is that the number of measures does not appear in the bound \(q^{-d}M\).
Let \(\Lambda_1, \Lambda_2, \dots\) be a sample from Λ and let \(q_n\) be the proportion of measures in \((\Lambda_1, \dots, \Lambda_n)\) that have density bounded by M. Then both \(n^{-1}\sum_{i=1}^n \mathbf{t}_\gamma^{\Lambda_i} \to E\,\mathbf{t}_\gamma^\Lambda\) and \(q_n \to q\) almost surely by the law of large numbers. Pick any ω in the probability space for which this happens and notice that (invoking Lemma 2.4.5)

\[\Big[\frac{1}{n}\sum_{i=1}^n \mathbf{t}_\gamma^{\Lambda_i}\Big]\#\gamma \;\to\; \big[E\,\mathbf{t}_\gamma^\Lambda\big]\#\gamma \quad\text{in } \mathcal{W}_2(\mathbb{R}^d).\]

Let \(\lambda_n\) denote the measure on the left-hand side. By Proposition 5.3.6, its density is bounded by \(q_n^{-d}M \to q^{-d}M\) almost surely, so for any \(C > q^{-d}M\) and n large, \(\lambda_n\) has density bounded by C. By the portmanteau Lemma 1.7.1, so does \(\lim \lambda_n = [E\,\mathbf{t}_\gamma^\Lambda]\#\gamma\). Now let C decrease to \(q^{-d}M\).
Though it follows that every Karcher mean of Λ has a bounded density, we cannot yet conclude that the same bound holds for the Fréchet mean, because we need a priori knowledge that the latter is absolutely continuous. This again can be achieved by approximations: Theorem 5.5.2 (Bounded Density for Population Fréchet Mean) Let \(\Lambda \in \mathcal{W}_2(\mathbb{R}^d)\) be a random measure with finite Fréchet functional. If Λ has a bounded density with positive probability, then the Fréchet mean of Λ is absolutely continuous with a bounded density.
Proof. Let q and M be as in (5.7), \(\Lambda_1, \Lambda_2, \dots\) a sample from Λ, and \(q_n\) the proportion of \((\Lambda_i)_{i\le n}\) with density bounded by M. The empirical Fréchet mean \(\lambda_n\) of the sample \((\Lambda_1, \dots, \Lambda_n)\) has a density bounded by \(q_n^{-d}M\). The Fréchet mean λ of Λ is unique by Proposition 3.2.7, and consequently \(\lambda_n \to \lambda\) in \(\mathcal{W}_2(\mathbb{R}^d)\) by the law of large numbers (Corollary 3.2.10). For any \(C > \limsup q_n^{-d}M\), the density of λ is bounded by C by the portmanteau Lemma 1.7.1, and the lim sup equals \(q^{-d}M\) almost surely. Thus, the density is bounded by \(q^{-d}M\).
In the same way, one shows the population version of Theorem 3.1.9: Theorem 5.5.3 (Fréchet Mean of Compatible Measures) Let Λ : (Ω , F , P) → W 2 (R d ) be a random measure with finite Fréchet functional, and suppose that with positive probability Λ is absolutely continuous and has bounded density. If the collection {γ} ∪ Λ (Ω ) is compatible and γ is absolutely continuous, then [Et Λ γ ]#γ is the Fréchet mean of Λ .
It is of course sufficient that {γ} ∪ Λ (Ω \ N ) be compatible for some null set N ⊂ Ω .

Bibliographical Notes
The algorithm outlined in this chapter was suggested independently in this steepest descent form by Zemel and Panaretos [134] and in the form of a fixed point equation iteration by Álvarez-Esteban et al. [9]. These two papers provide different alternative proofs of Theorem 5.3.1. The exposition here is based on [134]. Although longer and more technical than the one in [9], the formalism in [134] is amenable to directly treating the optimal maps (Theorem 5.3.3) and the multicouplings (Corollary 5.3.4). On the flip side, it is noteworthy that the proof of the Gaussian case (Theorem 5.4.1) given in [9] is more explicit and quantitative; for instance, it shows the additional property that the traces of the matrix iterates are monotonically increasing.
Developing numerical schemes for computing Fréchet means in W 2 (R d ) is a very active area of research, and readers are referred to the recent monograph of Peyré and Cuturi [103, Section 9.2] for a survey.
In recent work, a stochastic steepest descent has been proposed for finding Karcher means of a population Fréchet functional associated with a random measure Λ. At iterate j, one replaces \(\gamma_j\) by

\[\gamma_{j+1} = \big[\mathbf{i} + \tau_j\big(\mathbf{t}_{\gamma_j}^{\Lambda_{j+1}} - \mathbf{i}\big)\big]\#\gamma_j,\]

where \(\Lambda_{j+1}\) is a fresh sample from Λ and \((\tau_j)\) are suitable step sizes. The analogue of Theorem 5.3.1 holds under further conditions.
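One natural form of such an update can be sketched on univariate Gaussians with Robbins–Monro step sizes τ_j = 1/(j+1) (our illustrative assumptions; the exact scheme in the cited work may differ, and all names below are ours).

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_descent(sample_measure, n_iter=20000, gamma=(0.0, 1.0)):
    """Stochastic steepest descent sketch on 1-D Gaussians N(m, s^2).

    For the affine 1-D Gaussian optimal maps, the update
    gamma_{j+1} = [i + tau_j (t_{gamma_j}^{Lambda_{j+1}} - i)] # gamma_j
    moves the mean and sd a fraction tau_j of the way towards the
    sampled measure's parameters.
    """
    m, s = gamma
    for j in range(n_iter):
        tau = 1.0 / (j + 1)              # Robbins-Monro step sizes (assumed)
        m_i, s_i = sample_measure()      # fresh sample Lambda_{j+1}
        m = m + tau * (m_i - m)
        s = s + tau * (s_i - s)
    return m, s

# random measure Lambda = N(m, s^2) with m ~ U(0,1), s ~ U(1,2); its
# population Frechet mean is N(E[m], E[s]^2) = N(0.5, 1.5^2), since on the
# real line quantile functions average: F^{-1}(u) = m + s * Phi^{-1}(u)
m_hat, s_hat = stochastic_descent(lambda: (rng.uniform(0, 1), rng.uniform(1, 2)))
```

With these step sizes the iterates are exactly running averages of the sampled parameters, so they converge almost surely to the population values.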
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.