Central limit theorem for intrinsic Fréchet means in smooth compact Riemannian manifolds

We prove a central limit theorem (CLT) for the Fréchet mean of independent and identically distributed observations in a compact Riemannian manifold, assuming that the population Fréchet mean is unique. Previous general CLT results in this setting have assumed that the cut locus of the Fréchet mean lies outside the support of the population distribution. So far as we are aware, the CLT in the present paper is the first that allows the cut locus to have co-dimension one or two when it is included in the support of the distribution. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the co-dimension of the cut locus is one or greater than one: in the former case a non-standard term appears but not in the latter case. This is the first paper to give a general and explicit expression for the non-standard term which arises when the co-dimension of the cut locus is one.


Introduction
The Fréchet mean, the natural setting for which is a metric space, is defined as the point, or set of points, in the space for which the sum of squared distances is minimised. In Euclidean spaces and normed vector spaces, the Fréchet mean is the standard linear mean. More generally, it extends the concept of the mean to nonlinear spaces. In this paper we focus on the large sample behaviour of the sample Fréchet mean based on the intrinsic distance in smooth, compact Riemannian manifolds.
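The statement that the Fréchet mean reduces to the usual linear mean in Euclidean space can be checked directly. The following worked identity is a standard sketch for a finite sample x_1, ..., x_n in R^d with the Euclidean norm; the symbol \bar{x} is introduced here purely for illustration and is not notation used elsewhere in the paper.

```latex
\[
\sum_{i=1}^{n} \| x_i - y \|^2
  = \sum_{i=1}^{n} \| x_i - \bar{x} \|^2 + n \, \| \bar{x} - y \|^2 ,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ,
\]
% The cross term 2 <sum_i (x_i - \bar{x}), \bar{x} - y> vanishes
% because sum_i (x_i - \bar{x}) = 0.
```

The first term on the right does not depend on y, so the sum of squared distances is minimised uniquely at y = \bar{x}.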
Central limit theory for Fréchet means on compact Riemannian manifolds has been an ongoing topic of research for over 20 years. The principal source of difficulty in proving a general central limit theorem for the intrinsic Fréchet mean is the so-called cut locus of a manifold. Roughly speaking, the cut locus of a point x in a manifold M is the set of points z ∈ M such that there exists more than one distance-minimising geodesic from x to z. This nonuniqueness produces non-smooth behaviour in the estimating function for the Fréchet mean. However, despite the challenge posed by the cut locus, there has been some progress in this area, typically with the limitation that the cut locus of the population Fréchet mean is assumed to lie outside the support of the population distribution.
For an account of nonparametric inference for manifold-valued data see Bhattacharya and Bhattacharya [1]. Significant contributions on central limit theorems (CLTs) for the Fréchet mean in compact Riemannian manifolds include the following. The papers of Bhattacharya and Patrangenaru [4], [5] were the first to lay out an extensive Fréchet central limit theory for manifolds, covering both intrinsic and extrinsic means; Kendall and Le [15] proved a CLT for Fréchet means based on independent but not necessarily identically distributed manifold-valued random variables; Bhattacharya and Lin [2] considered a more general metric space setting than just manifolds but also derived results of interest for manifolds; Eltzner and Huckemann [9] obtained further extensions and also discussed a phenomenon that they call smeariness; moreover, Eltzner et al. [10] proved a further CLT and developed the concepts of topological stability and metric continuity of the cut locus, which we make use of later in the paper. However, all of the CLTs for Fréchet means in general compact Riemannian manifolds given in the contributions mentioned above, and to the best of our knowledge in all of the relevant literature, with the exception of Bhattacharya and Lin [2], assume that the relevant population distribution has support which excludes the cut locus.
The exception, Theorem 3.3 in Bhattacharya and Lin [2], does not assume that the support of the population distribution excludes the cut locus. They essentially showed that the Fréchet mean also exhibits standard behaviour when the cut locus of small balls around the population mean carries mass which goes to zero faster than the radius raised to the manifold's dimension plus two. This will essentially be the case if the distribution is absolutely continuous with respect to the Riemannian volume measure and the cut locus is of co-dimension three, as is the case for three- and higher-dimensional spheres; cf. Corollary 3.5 in Bhattacharya and Lin [2]. In fact, the authors remark that they can treat the two-dimensional sphere only under support restrictions excluding the cut locus (see their Remark 3.7). We speculate that it may be possible to use results along the lines of Brown [6], see also Ritov [19], to prove a standard CLT for the Fréchet mean in the case of S^2, where the cut locus has co-dimension 2, but we have not yet investigated all of the details. Here, we take a different approach to that problem.
At the outset it was not clear whether the CLT for the intrinsic Fréchet mean on compact Riemannian manifolds exhibits standard behaviour but with technically difficult proofs or whether non-standard behaviour can occur. The article by Hotz and Huckemann [13], who considered the intrinsic Fréchet mean on the circle, S^1, settled the matter by showing that highly non-standard behaviour occurs in this setting. This sets the scene for the currently open question of the appropriate form of the central limit theorem for the intrinsic Fréchet mean in a general compact Riemannian manifold.
The principal aims of this paper are (i) to clarify when non-standard behaviour of the Fréchet mean in compact Riemannian manifolds occurs; and (ii) to characterise the non-standard behaviour when it does occur. Specifically, we allow the support of the population distribution to include the cut locus and only a mild regularity assumption is made in this regard. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the co-dimension of the cut locus relative to M is 1 or greater than 1: in the former case a non-standard term will appear but not in the latter case. The non-standard term which arises when the co-dimension of the cut locus is 1 is precisely characterised.
The main results of the paper, Theorem 1 and Theorem 2, are stated in Section 2 and are proved in Section 3 and Section 4, respectively.

Central Limit Theorem
Let M be a compact and connected Riemannian manifold (without boundary) of dimension m and let ρ denote the distance function on M × M induced by the Riemannian metric. Suppose that µ is a probability measure on M. The Fréchet function F_µ of µ is defined as
$$F_\mu(x) = \int_M \rho(x, y)^2 \, d\mu(y), \qquad x \in M. \tag{1}$$
Since M is compact, F_µ(x) < ∞ for all x ∈ M. The population Fréchet mean is defined by
$$x_0 = \arg\min_{x \in M} F_\mu(x).$$
For some µ, x_0 will consist of a subset of M rather than a single point in M. It will be assumed throughout the paper that x_0 is unique. Suppose ξ_1, ..., ξ_n ∈ M is a random sample drawn independently from µ. Then the set of sample Fréchet means, G_n ⊂ M, is the set of global minima over y ∈ M of $n^{-1} \sum_{i=1}^{n} \rho(\xi_i, y)^2$. In those cases where G_n is not a singleton set, it is assumed that a measurable selection ξn ∈ G_n has been made, so that ξn is a measurable random element.
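To make the definition of the sample Fréchet mean concrete, here is a minimal sketch for the circle S^1 with angles in (−π, π]. It relies on a standard fact: away from the antipodes of the data the sample Fréchet function is piecewise quadratic, so every local minimiser has the form (arithmetic mean of the angles) + 2πk/n for some integer k. The function names (`circ_dist`, `sample_frechet_mean_circle`, and so on) are hypothetical and chosen here for illustration only; this is not the authors' algorithm.

```python
import math

def circ_dist(a, b):
    """Intrinsic (arc-length) distance between two angles on the unit circle."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def frechet_fn(y, thetas):
    """Sample Frechet function: mean squared intrinsic distance to y."""
    return sum(circ_dist(t, y) ** 2 for t in thetas) / len(thetas)

def wrap(a):
    """Map an angle to the interval (-pi, pi]."""
    a = (a + math.pi) % (2.0 * math.pi) - math.pi
    return math.pi if a == -math.pi else a

def sample_frechet_mean_circle(thetas):
    """Return one global minimiser of the sample Frechet function.

    Only the n candidates mean + 2*pi*k/n need to be compared (assumption
    stated in the lead-in); ties are broken by taking the first candidate.
    """
    n = len(thetas)
    base = sum(thetas) / n
    candidates = [wrap(base + 2.0 * math.pi * k / n) for k in range(n)]
    return min(candidates, key=lambda y: frechet_fn(y, thetas))

# Data clustered near 0: the intrinsic mean agrees with the arithmetic mean.
near_zero = sample_frechet_mean_circle([0.0, 0.2, 0.1])
# Data clustered near the antipode of 0: the arithmetic mean (0) is wrong,
# and the intrinsic mean sits at pi.
near_pi = sample_frechet_mean_circle([3.0, -3.0])
```

When the minimiser is not unique, `min` simply returns the first candidate attaining the minimum, which plays the role of a (deterministic) selection from G_n.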
The following result, proved in Section 5.1, makes use of the strong laws of large numbers proved by Ziezold [23] and Evans and Jaffe [11].
Proposition 1. Assume that (i) M is compact and (ii) x_0 ∈ M is the unique population Fréchet mean of µ. For each n, let ξn ∈ G_n denote any measurable selection from G_n. Then ρ(x_0, ξn) → 0 almost surely as n → ∞.
Let T_x(M) denote the tangent space at x ∈ M and write exp_x(v) to denote the exponential map, which maps a point v ∈ T_x(M) to the point exp_x(v) ∈ M. The inverse exponential (or log) map, denoted exp_x^{-1}(y), maps a point y ∈ M \ C_x to the point exp_x^{-1}(y) ∈ T_x(M), where C_x denotes the cut locus of x. See, for example, Chavel [7] for terminology. Also, define where 1_A denotes the indicator function of a set A. Note that {G_µ(x) : x ∈ M} is a vector field on M. It follows from the result of Le and Barden [17] that and that, with probability one under the product measure determined by µ, where μn is the empirical distribution on M based on the random sample ξ_1, ..., ξ_n.
Before stating Theorem 1 and Theorem 2, we mention a number of relevant facts. We denote by D the covariant derivative and by ∇ the gradient operator, both defined on M. For x′ ∉ C_x, (cf. Jost [14], p. 203). Moreover, the Hessian, Hess f, of a smooth function f on M is the (symmetric) (0, 2)-tensor field such that, for any vector fields U and V on M, (cf. O'Neill [18], p. 86). That is, Hess ρ_x^2 can be expressed as for any smooth vector fields U, V on M and any x′ ∈ M \ C_x, where H(x′ | x) is the (1, 1)-tensor such that, for any smooth vector field V on M, For any x ∈ M and δ > 0, define the sets and where Eltzner et al. [10] prove that both topological stability and metric continuity of the cut locus hold for compact Riemannian manifolds. Here, it will be slightly more convenient to use the concept of metric continuity at any point x ∈ M. In the notation defined above, metric continuity entails the following.
Proposition 2. If M is a compact Riemannian manifold and x ∈ M, then for any δ > 0 there exists a δ_1 > 0 such that In Appendix A we give a proof of Proposition 2 that differs from the one given by Eltzner et al. [10].
Let vol_M denote the Riemannian volume measure on M. The key linearization result we need is the following.
Theorem 1. Assume that (i) M is a compact, connected Riemannian manifold; (ii) x_0 ∈ M is the unique population Fréchet mean of µ; (iii) for δ > 0 sufficiently small, µ, restricted to B_δ(x_0) defined in (10), is absolutely continuous with respect to vol_M and the corresponding Radon-Nikodym derivative has a version ψ which is continuous on exists, where H(·|·) is defined in (8). Then the vector field G_µ(x) admits the following linearization for x ∈ M in a neighbourhood of x_0: where Π_{x,x_0} denotes parallel transport from T_x(M) to T_{x_0}(M) along the (unique) shortest geodesic between x and x_0, ||R(x, (20).
The proof of Theorem 1, given in Section 3, uses some involved geometric arguments. These arguments are of potentially broader interest than just the current context. The definition of Ψ_µ(x_0), which is given in the next subsection, has a particularly interesting form when the co-dimension of the cut locus of x_0 is 1. In this case Ψ_µ(x_0) contains a non-standard term which we discuss in detail below, and illustrate in some examples at the end of the section.
Note that, under assumption (iii) of Theorem 1, G_µ defined by (3) can be written as in a neighbourhood of x_0 and so we have the relationship in that neighbourhood. Then, one immediate consequence of Theorem 1 is that the Hessian of the Fréchet function F_µ at x_0 exists and it can be expressed in terms of Ψ_µ as In fact, a slight modification of the proof of Theorem 1 shows that the same result holds in a neighbourhood of x_0.
We now state our main result, a CLT for ξn , assumed to be a measurable selection from G n .
Theorem 2. Suppose that assumptions (i)-(v) of Theorem 1 hold and that ξn is any measurable selection from G_n, as in Proposition 1. In addition, assume (vi) that Ψ_µ(x_0) is strictly positive definite. Then where

Discussion of assumptions
Here we discuss the assumptions made in Theorem 1 and Theorem 2. Assumptions (i) and (ii) in Theorem 1 define the setting that we consider. Assumption (iii) in Theorem 1 implies a certain level of regularity of the population distribution in a neighbourhood of the cut locus of the Fréchet mean; some such regularity is needed for an expansion of the type (13) to hold. Previous central limit theorems in this setting, such as Bhattacharya and Patrangenaru [4], [5], have made the much stronger assumption that the population probability density function is zero in a neighbourhood of the cut locus of the population Fréchet mean. Bhattacharya and Lin [2] have assumed µ(A_δ(x_0)) = o(δ^2) whereas our assumptions (iii) and (iv) amount only to µ(A_δ(x_0)) = O(δ). Assumptions (iv) and (v) in Theorem 1 are largely geometric in character. For each of these assumptions, it would be interesting to know whether or not it holds for all smooth, compact connected manifolds when the population Fréchet mean is unique. However, we do not have a proof or a counter-example to this statement in either case and we have found nothing in the literature that throws light on either question.
Finally, assumption (vi) in Theorem 2 is a non-degeneracy assumption. If Ψ_µ(x_0) is non-negative definite but not of full rank then we are in the same situation as that of a smeary central limit theorem, as discussed by Eltzner and Huckemann [9]: specifically, a central limit theorem is expected to hold but with a non-standard convergence rate which depends on the level of smoothness of the population distribution. Bearing in mind that 2Ψ_µ(x_0) is the Hessian of the Fréchet function F_µ(x), see (14), it follows that Ψ_µ(x_0) cannot have a strictly negative eigenvalue: otherwise the Hessian of the Fréchet function F_µ in (1) would fail to be non-negative definite at x_0, so that x_0 could not be a minimiser of F_µ, contradicting x_0 being a Fréchet mean.

The expression of Ψ µ (x 0 )
The expression of Ψ_µ(x_0) comprises two terms, one associated with the Hessian of the squared distance function, away from the cut locus of x_0, and the other with the behaviour of the distance function on the cut locus C_{x_0} of x_0. Hence, the second term reflects the geometric structure of the manifold M.
To make the notation more explicit, we write, for any fixed x ∈ M, ρ_x = ρ(x, · ). Note that ρ_x^2 is a smooth function away from the cut locus of x. The tensor H(x_0 | · ) which appears in (7) and (8) determines the first term of Ψ_µ(x_0). The construction above for H(x_0 | x) requires that x ∉ C_{x_0}. Nevertheless, it follows from the result of Le and Barden [17] that H(x_0 | ξ_1) is well-defined with probability one, because condition (iii) of Theorem 1 implies that µ(ξ ∈ C_{x_0}) = 0, i.e. the cut locus of x_0 has zero probability under µ.
To introduce the second term of Ψ_µ(x_0), we first recall some facts on the cut locus C_x of x and the behaviour of ρ_x nearby. These results, explicitly or implicitly stated in Barden and Le [3] and Le and Barden [16], are given in the following lemmas. The first one is on the structure of C_x, a set of co-Hausdorff dimension at least one (or, equivalently, of Hausdorff dimension at most m − 1).
Lemma 1. For any x ∈ M there is a set Q_x of Hausdorff (m − 1)-measure zero contained in C_x and containing the first conjugate locus of x such that H_x = C_x \ Q_x is a countable union of disjoint hyper-surfaces (co-dimension one sub-manifolds) where, for each y ∈ H_x, there are exactly two minimal geodesics from x to y. In particular, H_x is a Borel measurable set and y ∈ H_x if and only if x ∈ H_y.
The decomposition of C_x in Lemma 1 above is the same as that given in Theorem 2 of [16], but slightly different from that given in Proposition 2 in [3]. In [3] Q_x is the set of the first conjugate loci of x in C_x, while here Q_x is the union of the set of the first conjugate loci of x in C_x with the set of non-conjugate points in C_x which have more than two minimal geodesics to x. Furthermore, the proof of Theorem 2 in Le and Barden [16] made it clear that the set Q_x, which was called E there, has co-dimension at least two, although the Theorem itself only stated that it has Hausdorff (m − 1)-measure zero as needed for that paper. In particular, that the set of the first conjugate loci of x has co-dimension at least two was proved in Proposition 1 of Barden and Le [3].
The next two lemmas show that, although ρ x is not differentiable at C x , it is relatively well behaved in a neighbourhood of H x .
Lemma 2. Let H_x be given as in Lemma 1. For each y ∈ H_x, there is a neighbourhood V_y of y in M on which there are two unique smooth functions φ_{1y}( · | x) and φ_{2y}( · | x) such that for any y′ ∈ V_y,
The neighbourhood V_y and the two functions φ_{iy}( · | x) in the above Lemma were constructed in the proof of Proposition 1 in Barden and Le [3] as follows. There are two disjoint neighbourhoods U_{1y} and U_{2y} in T_x(M) such that, for each i, V_y = exp_x(U_{iy}). Then, φ_{iy}(y′ | x) = ‖(exp_x|_{U_{iy}})^{-1}(y′)‖ for y′ ∈ V_y. The next result is an immediate consequence of this construction.
Lemma 3. Let H_x be given as in Lemma 1, and let V_y, U_{jy} and φ_{jy} be given as in Lemma 2 and in the following construction. If, for y′ ∈ Vy Hx, γ_j is the minimal geodesic with γ_j(0) = x, γ_j(1) = y′ and γ̇_j(0) ∈ U_{jy}, then ∇φ_{jy}(y′ | x) = γ̇_j(1)/‖γ̇_j(1)‖, that is, ∇φ_{jy}(y′ | x) is the unit tangent vector to γ_j at y′.
The following result follows from the uniqueness of the pair of functions φ jy , j = 1, 2, stated in Lemma 2.
Corollary 1. Let H_x be given as in Lemma 1, and let V_y and φ_{jy} be given as in Lemma 2. For each y′ ∈ Vy Hx, the unordered pair of the functions Hx and so, making a continuous choice of sign, this difference is a well-defined This, together with the results in Barden and Le [3], implies the following relationship between H_i and χ_i.
Corollary 2. Let H_i and χ_i be given as in Corollary 1. For y ∈ H_i(x), ∇χ_i(y | x) is non-zero and normal to H_i(x) at y.
With the above understanding of C_x and ρ_x nearby, we reach the following main ingredients for our definition of the second term of Ψ_µ(x_0).
Corollary 3. Let H_x be given as in Lemma 1, and let H_i and χ_i be given as in Corollary 1. Then, (a) the set H_x can be expressed as the countable union of disjoint H_i(x); is well-defined on H_x; (c) the unit normal vector field given by is well-defined up to sign on H_x.
Note that, for y ∈ H_i(x), and that, for y ∈ H_i(x) and y′ ∈ Vy Hi(x), Now, for y ∈ H_x, define d y ⊥x to be the 1-form, unique up to sign, given by d y ⊥x (U(y)) = ⟨n(y | x), U(y)⟩ for any tangent vector U(y) at y. Write J(y | x) for the well-defined (0, 2)-tensor at y on H_x given by That is, for any y ∈ H_x and any U(y), V(y) ∈ T_y(M), Write α(t) for the unit speed geodesic orthogonal to H_x at y and τ_{x|y}(t) for the distance from x to H_{α(t)} along the geodesic orthogonal to H_y. Then That is, τ′_{x|y}(0) represents the rate of change of x orthogonal to H_y as y moves orthogonally to H_x. In terms of J(x_0 | y), τ′_{x|y}(0) and ψ, the Radon-Nikodym derivative of µ with respect to the volume measure in a neighbourhood of C_{x_0}, we denote by J_µ(x_0) the (1, 1)-tensor defined by where d vol_{H_x} denotes the co-dimension one surface measure on H_x.

Three examples of J µ
In the case of symmetric spaces, τ′_{y|x}(0) ≡ 1 and so the expression for J_µ(x) defined by (19) can be simplified. We now calculate J_µ(x) for special symmetric spaces with appropriate 'coordinate systems'. Moreover, we show that condition (iv) in Theorem 1 is satisfied in each of the three examples, i.e. we show that vol_M(A_δ(x)) = O(δ) as δ ↓ 0, where A_δ(x) is defined in (9); and we show that condition (v) in Theorem 1 is also satisfied in the three examples.
(a) M = S^1: H_x = C_x contains only the antipodal point y of x. Thus, ρ(x, y) = π; the initial tangent vectors of the two geodesics from x to y have opposite directions, so that κ(x | y) = 2; and we may take n(x | y) = 1. Hence, if we take the standard coordinate in the subset (−π, π] in its universal cover with x = 0, then the corresponding J_µ is J_µ(0) = −2π ψ(π), identical with the extra term in the covariance of the central limit theorem of Hotz and Huckemann [13].
Finally, we check conditions (iv) and (v) of Theorem 1. Since C_x consists of the antipodal point of x, it follows that, in the local coordinates introduced above, A_δ may be written as A_δ(0) = (−π, −π + δ) ∪ (π − δ, π], so that vol_M(A_δ(x)) = 2δ and therefore condition (iv) of Theorem 1 is satisfied. Condition (v) follows because the circle is flat and therefore the Hessian H(x_0 | ξ) = 1 if ξ is not the antipodal point of x_0.
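The effect of the non-standard term on the circle can be sketched numerically. The code below assumes, as the description of Ψ_µ(x_0) as a sum of two terms suggests, that on S^1 one has Ψ_µ(0) = E[H] + J_µ(0) = 1 − 2πψ(π), and that the limiting variance takes the one-dimensional sandwich form V_0 / Ψ_µ(0)^2. Both are assumptions made for this illustration, since the displayed definitions are not reproduced here; the function name `circle_asymptotic_variance` and its arguments are hypothetical.

```python
import math

def circle_asymptotic_variance(v0, psi_antipode):
    """One-dimensional sandwich variance V0 / Psi^2 on the circle.

    v0           : variance of the limit of the normalised score (assumed given)
    psi_antipode : density of mu (w.r.t. arc length) at the antipode of the mean
    Assumption   : Psi = 1 - 2*pi*psi_antipode (flat Hessian term plus the
                   cut-locus term -2*pi*psi(pi) from example (a)).
    """
    if v0 < 0 or psi_antipode < 0:
        raise ValueError("v0 and psi_antipode must be non-negative")
    psi_mu = 1.0 - 2.0 * math.pi * psi_antipode
    if psi_mu <= 0:
        # Corresponds to a violation of the non-degeneracy assumption (vi).
        raise ValueError("Psi must be strictly positive for a standard-rate CLT")
    return v0 / psi_mu ** 2

# No mass near the antipode: the classical variance is recovered.
var_classical = circle_asymptotic_variance(1.0, 0.0)
# Density 1/(4*pi) at the antipode halves Psi and quadruples the variance.
var_inflated = circle_asymptotic_variance(1.0, 1.0 / (4.0 * math.pi))
```

Under these assumptions, any positive density at the antipode inflates the limiting variance relative to the case in which the support avoids the cut locus.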
For higher dimensional spheres S^d, d > 1, we have vol_M(A_δ(x)) = O(δ^d) but H_x is empty since the cut locus is of co-dimension d > 1, so J_µ(0) vanishes. For d > 2 this has already been observed by Bhattacharya and Lin [2] but the CLT for S^2 given a non-vanishing density at the cut locus appears to be new.
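The order of vol_M(A_δ(x)) on S^2 can be checked with a short computation. The sketch assumes that, by analogy with the circle example, A_δ(x) on the sphere is the geodesic ball of radius δ around the antipode −x; this identification is an assumption made here, as is the helper name `cap_area_s2`. The area of such a cap on the unit sphere is 2π(1 − cos δ) = 4π sin²(δ/2).

```python
import math

def cap_area_s2(delta):
    """Area of a geodesic ball of radius delta on the unit sphere S^2.

    Uses 4*pi*sin(delta/2)^2, which equals 2*pi*(1 - cos(delta)) but avoids
    cancellation for small delta.
    """
    return 4.0 * math.pi * math.sin(delta / 2.0) ** 2

# The ratio area / delta^2 approaches pi, so the area is O(delta^2),
# while area / delta tends to zero.
ratios_sq = [cap_area_s2(d) / d ** 2 for d in (1e-1, 1e-2, 1e-3)]
ratios_lin = [cap_area_s2(d) / d for d in (1e-1, 1e-2, 1e-3)]
```

This is consistent with the O(δ^d) rate quoted above for d = 2, and in particular with the weaker O(δ) requirement of condition (iv).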
This expression can be verified by direct computation of the Hessian of F_µ. Finally, we consider conditions (iv) and (v) of Theorem 1. We first identify the form of A_δ(x_0). Without loss of generality we take x_0 to be x_0 = (0, 0, 1)^⊤ and represent RP^2 by the hemisphere {x = (x_1, x_2, x_3)^⊤ ∈ S^2 : x_3 ≥ 0}. Then it is easy to see that A_δ is given by Moreover, the volume of A_δ with respect to surface area measure on S^2 is 2π sin δ. It follows easily that condition (iv) of Theorem 1 is satisfied here, too.
Condition (v) requires a bit more work to check in this example. From Kendall and Le (2011), H(x | y) on the sphere S^2 is given by the map where ⟨·, ·⟩ is the Riemannian inner product on the tangent space at x ∈ M. When restricted to the (open) half sphere centred at y, it gives H(x | y) on RP^2. For given x ∈ M there is a possible singularity at y = x. However, the singularity is in fact a removable singularity because, for x close to y, ρ(x, y) ∼ sin(ρ(x, y)) and 1 − cos(ρ(x, y)) ∼ ρ(x, y)^2. Then, the boundedness of is well-defined. As is the case with higher-dimensional tori, it is easy to see that also for RP^d with d > 2, conditions (iv) and (v) remain satisfied but that J_µ(x_0) will not vanish in general. To the best of our knowledge, the corresponding CLTs are the first of their kind when the cut locus is contained in the support of the distribution.
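The stated area 2π sin δ can be recovered by a short calculation. The following sketch assumes that A_δ is the band of points in the closed upper hemisphere lying within geodesic distance δ of the equator, i.e. the points with 0 ≤ x_3 < sin δ; this description of A_δ is an assumption made here for illustration. The spherical coordinates (θ, φ), with θ the polar angle measured from x_0, are likewise introduced only for this computation.

```latex
\[
\operatorname{vol}(A_\delta)
  = \int_{0}^{2\pi} \int_{\pi/2 - \delta}^{\pi/2} \sin\theta \, d\theta \, d\varphi
  = 2\pi \bigl[ -\cos\theta \bigr]_{\pi/2 - \delta}^{\pi/2}
  = 2\pi \cos\!\left( \tfrac{\pi}{2} - \delta \right)
  = 2\pi \sin\delta .
\]
```

Since sin δ ≤ δ, this gives vol(A_δ) ≤ 2πδ, which is the O(δ) bound required by condition (iv).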

Proof of Theorem 1
To prove Theorem 1, we first consider a generalised version of the Taylor expansion of the inverse exponential map at different base points. That is, for fixed z ∈ M, we study the Taylor expansion for the vector field exp_x^{-1}(z) for x ∉ C_z. For this, we fix z ∈ M and, for x_0, x_1 ∉ C_z sufficiently close, denote by γ the unit speed geodesic segment such that γ(0) = x_0 and γ(ρ(x_0, x_1)) = x_1.
In the case that γ(t) ∈ H_z ⊆ C_z for some t ∈ (0, ρ(x_0, x_1)), we have the following result on the approximation of Π_{x_1,x_0} exp_{x_1}^{-1}(z) in terms of exp_{x_0}^{-1}(z), generalising the Taylor expansion (21) for smooth vector fields.
Proposition 3. Let x_0, x_1, z ∈ M be such that x_0, x_1 ∉ C_z are sufficiently close and γ be the minimal unit speed geodesic from x_0 to x_1. If there is a parameter t_z ∈ (0, ρ(x_0, x_1)) such that γ(t_z) ∈ H_z, then where H(x′ | x) is defined by (8), κ(y | x) is defined by (15) and n(y | x) is defined by (16).
Proof. If there is a parameter t_z ∈ (0, ρ(x_0, x_1)) such that γ(t_z) ∈ H_z ⊆ C_z, such t_z is unique provided x_0 and x_1 are sufficiently close. Without loss of generality, we may assume that the two smooth functions Then, the difference between the two tangent vectors Π_{x_1,x_0} exp_{x_1}^{-1}(z) and exp_{x_0}^{-1}(z), both in T_{x_0}(M), can be expressed as The definitions for κ(y | x) and n(y | x) given respectively by (15) and (16) imply that the terms in the third curly bracket on the right hand side above are equal to By (21), the difference between the terms in the second curly bracket on the right hand side above and (H( a similar application of (21) to the terms in the first curly bracket results in up to a term of order o(ρ(x_0, x_1)). Hence, and similarly for exp_{x_1}^{-1}(γ(t_z)), as well as noting ρ(x_0, x_1) = ρ(x_0, γ(t_z)) + ρ(γ(t_z), x_1), we have )), so that the required result follows.
In the remainder of the paper it will be useful to use the different but equivalent representation of A_δ(x) in (9) given by To see that (9) and (23) are equivalent, note that using the fact that z ∈ C_y if and only if y ∈ C_z.
Then A_{x_0,x_1} ⊇ N*_{x_0,x}. Since x_0 is the Fréchet mean of µ, G_µ(x_0) = 0. Under the given assumption, we also have that, for x ∈ M in a neighbourhood of x_0, µ({ξ ∈ C_x}) = 0. Thus, it follows from (21) and Proposition 3 that, for x sufficiently close to x_0, Since condition (v) of Theorem 1 together with Lemma 1 ensures that and since the boundedness of M and Lemma 1 together imply that we have that Thus, by the definition (20) of Ψ_µ(x_0), it is sufficient to show that where J_µ(x_0) is defined by (19).
For this, we note that the functions χ_i( · | x) given in Corollary 1 are defined on a neighbourhood of H_x. Thus, we may extend the definitions of the corresponding κ( · | x) and n( · | x) given in (15) and (16) to that neighbourhood of H_x. This implies that To analyse the right hand side of (27) we consider, for any z ∈ N*_{x_0,x_1}, the minimal unit speed geodesic β_z from z to x_0. Extending β_z backwards beyond z, let y_z be the first hitting point of C_{x_0} on the extension; see Figure 1. Let N_{x_0,x_1} = {z ∈ M | γ(t) ∈ H_z for some t ∈ (0, ρ(x_0, x_1)) and y_z ∈ H_{x_0}}.
Then, N_{x_0,x_1} is a Lebesgue measurable subset of N*_{x_0,x_1} and the difference between the volumes of N_{x_0,x_1} and of N*_{x_0,x_1} is o(ρ(x_0, x_1)). Since, by (11) and condition (iii) of Theorem 1, which states that in a neighbourhood of C_{x_0}, µ is absolutely continuous with respect to the volume measure vol_M(·) with continuous Radon-Nikodym derivative ψ, (27) can be expressed in terms of N_{x_0,x} as where x is sufficiently close to x_0. If we write u_z for the point on β_z that lies in C_{x_1} as in Figure 1, then the volume of the local cross-sectional slice of N_{x_0,x_1} at z ∈ N_{x_0,x_1} can be approximated by d vol_{H_{x_0}}(y_z) ⟨n(y_z | x_0), exp_{y_z}^{-1}(u_z)⟩ and, since y_z ∈ H_{x_0}, we also have where both n(y_z | x_0) and n(x_0 | y_z) are chosen such that the inner products are non-negative and where τ′_{y|x}(0) is given in (18). These two facts together imply that, for z ∈ N_{x_0,x_1}, d vol(z) ≈ ⟨n(x_0 | y_z), exp_{x_0}^{-1}(x_1)⟩ τ′_{y_z|x_0}(0) d vol_{H_{x_0}}(y_z).
Using this and the continuity in z of ρ z (x), κ(x | z), n(x | z) and ψ(z), the dominant term on the right hand side of the second equality in (28) can be expressed as Hence, (26) follows from the definition (19) of J µ (x 0 ) as required.
Proof of Proposition 1 and Theorem 2

Proof of Proposition 1
For each n ≥ 1, let ξn ∈ G_n denote any measurable selection from G_n. From the strong law of large numbers in Ziezold [23], and using the assumption that x_0 is the unique population mean, almost surely where a horizontal line over a set indicates set closure. From elementary considerations, the first set inclusion below holds and therefore the set G_0 of limit points is where, for each k ≥ 1, ξk ∈ G_k. Since G_0 ⊆ {x_0}, there are two possibilities: either G_0 = {x_0}, in which case the proposition follows; or, alternatively, G_0 = ∅, the empty set. However, M is compact, so {ξn} must have a convergent subsequence with a limit x_1 ∈ M. Moreover, we must have x_1 ∈ G_0 because x_1 is an accumulation point of the sequence. Therefore, x_1 = x_0, and consequently ρ(ξn, x_0) → 0 almost surely as required.
Our proof of Theorem 2, in particular Step 1, makes use of this lemma.

Proof of Theorem 2
The proof of Theorem 2 is broken into two steps. In the first step we explain how Lemma 5 will be applied. In the subsequent step, we explain how to make the right-hand side of (31) arbitrarily small uniformly for all x ∈ R^m and therefore the CLT in Theorem 2 will have been proved.
Step 1. Application of Lemma 5. Write To establish Theorem 2, we apply Lemma 5 with X and Y defined in (40) and (41), respectively. Since, from (35), we know that Z_n(x_0) → N_m(0_m, V_0) in distribution, it follows that Ψ_µ(x_0)^{-1} Z_n(x_0) is asymptotically normal with mean vector the zero vector and covariance matrix Ψ_µ(x_0)^{-1} V_0 (Ψ_µ(x_0)^⊤)^{-1}. Moreover, as n → ∞, a suitable sequence of w's such that ||w|| → 0 and P[{|Y| ≤ w}^c] → 0 can always be found provided all components of Y go to 0 in probability. Consequently, to complete the proof of Theorem 2, it is sufficient to show that ||Y|| → 0 in probability, which is proved in Step 2.
In Step 2 we first show that ||T_1|| → 0 in probability, where T_1 is defined in (38). Then we deduce that ||Y|| → 0 in probability, where Y is defined in (41). To establish the result for T_1 we shall make use of results from empirical process theory. A key step is to approximate E tr[(Π_{x,x_0} Z_n(x) − Π_{y,x_0} Z_n(y))^⊤ (Π_{x,x_0} Z_n(x) − Π_{y,x_0} Z_n(y))].
It is also assumed below that x, y ∈ B δ0 (x 0 ), the open ball in M of radius δ 0 centred at x 0 , where 0 < δ 0 < δ, and using condition (iii) of Theorem 1, δ has been chosen to be sufficiently small for µ to be absolutely continuous on A δ (x 0 ), where A δ (x 0 ) is defined in (9) and x 0 ∈ M is the population Fréchet mean of µ, assumed to be unique.We may write (43) as .

Figure 1: Relevant points on the minimal geodesic β z from x 0 to z when z ∈ Cx 0