Abstract
We prove a central limit theorem (CLT) for the Fréchet mean of independent and identically distributed observations in a compact Riemannian manifold assuming that the population Fréchet mean is unique. Previous general CLT results in this setting have assumed that the cut locus of the Fréchet mean lies outside the support of the population distribution. In this paper we present a CLT under some mild technical conditions on the manifold plus the following assumption on the population distribution: in a neighbourhood of the cut locus of the population Fréchet mean, the population distribution is absolutely continuous with respect to the volume measure on the manifold and in this neighbourhood the Radon–Nikodym derivative has a version that is continuous. So far as we are aware, the CLT given here is the first which allows the cut locus to have co-dimension one or two when it is included in the support of the distribution. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the co-dimension of the cut locus is one or greater than one: in the former case a non-standard term appears but not in the latter case. This is the first paper to give a general and explicit expression for the non-standard term which arises when the co-dimension of the cut locus is one.
1 Introduction
The Fréchet mean, the natural setting for which is a metric space, is defined as the point, or set of points, in the space for which the sum of squared distances is minimised. In Euclidean spaces and normed vector spaces, the Fréchet mean is the standard linear mean. More generally, it extends the concept of the mean to nonlinear spaces. In this paper we focus on the large sample behaviour of the sample Fréchet mean based on the intrinsic distance in smooth, compact Riemannian manifolds.
Central limit theory for Fréchet means on compact Riemannian manifolds has been an ongoing topic of research for over 20 years. The principal source of difficulty in proving a general central limit theorem for the intrinsic Fréchet mean is due to the so-called cut locus of a manifold. Roughly speaking, the cut locus of a point x in a manifold \({\varvec{M}}\) is the set of points \(z \in {\varvec{M}}\) such that there exists more than one distance-minimising geodesic from x to z or the distance function from x is singular at z. This non-uniqueness produces non-smooth behaviour in the estimating function for the Fréchet mean. However, despite the challenge posed by the cut locus, there has been some progress in this area, typically with the limitation that the cut locus of the population Fréchet mean is assumed to lie outside the support of the population distribution.
For an account of nonparametric inference for manifold-valued data see Bhattacharya and Bhattacharya [1]. Significant contributions on central limit theorems (CLTs) for the Fréchet mean in compact Riemannian manifolds include the following. The papers of Bhattacharya and Patrangenaru [4, 5] were the first to lay out an extensive Fréchet central limit theory for manifolds, covering both intrinsic and extrinsic means; Kendall and Le [15] proved a CLT for Fréchet means based on independent but not necessarily identically distributed manifold-valued random variables; Bhattacharya and Lin [2] considered a more general metric space setting than just manifolds but also derived results of interest for manifolds; Eltzner and Huckemann [9] obtained further extensions and they also discussed a phenomenon that they call smeariness; moreover Eltzner et al. [10] proved a further CLT and developed the concepts of topological stability and metric continuity of the cut locus which we make use of later in the paper. However, all of the CLTs for Fréchet means in general compact Riemannian manifolds given in the contributions mentioned above, and to the best of our knowledge all of the relevant literature, with two exceptions to be discussed below, assume that the relevant population distribution has support which excludes the cut locus.
One of these exceptions is given by Theorem 3.3 in Bhattacharya and Lin [2]. They showed that the Fréchet mean also exhibits standard behaviour when the cut locus of a small ball around the population mean carries mass which goes to zero faster than the radius raised to the manifold’s dimension plus two. This will be the case if the distribution is absolutely continuous with respect to the Riemannian volume measure and the cut locus is of codimension three or higher, as is the case for three- and higher-dimensional spheres; cf. Corollary 3.5 in Bhattacharya and Lin [2]. In fact, the authors remark that they can treat the two-dimensional sphere only under support restrictions excluding the cut locus (see their Remark 3.7). However, we speculate that it may be possible to use results along the lines of Brown [6], see also Ritov [20], to prove a standard CLT for the Fréchet mean in the case of \(S^2\), where the cut locus has co-dimension 2, but we have not yet investigated all of the details. Here, we take a different approach to that problem.
The other exception referred to above is Theorem 2.11 of Eltzner and Huckemann [9]. As these authors point out, their Theorem 2.11 is a generalisation and adaptation of van der Vaart (1998, Theorem 5.2.3). This is an interesting result which provides a starting point for a general study of the phenomenon called smeariness; see Eltzner and Huckemann [9]. However, from the perspective of the current paper, Theorem 2.11 has two deficiencies: (i) it does not give a clear picture of how the geometry impacts the CLT theory; and (ii) it is not straightforward to justify Assumption 2.6 in Theorem 2.11 in the case of compact manifolds, due to non-differentiability of the Fréchet function at the cut locus. In this paper we address point (i) in Theorem 1 and Theorem 2 below; and we use Theorem 1 to deal directly with the problems caused by non-differentiability at the cut locus, thereby side-stepping point (ii).
At the outset it was not clear whether the CLT for the intrinsic Fréchet mean on compact Riemannian manifolds exhibits standard behaviour but with technically difficult proofs or whether non-standard behaviour can occur. The article by Hotz and Huckemann [13], who considered the intrinsic Fréchet mean on the circle, \(S^1\), settled the matter by showing that highly non-standard behaviour occurs in this setting. This sets the scene for the currently open question of the appropriate form of the central limit theorem for the intrinsic Fréchet mean in a general compact Riemannian manifold.
The principal aims of this paper are (i) to clarify when non-standard behaviour of the Fréchet mean in compact Riemannian manifolds occurs; and (ii) to characterise the non-standard behaviour when it does occur. Specifically, we allow the support of the population distribution to include the cut locus and only a mild regularity assumption is made in this regard. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the co-dimension of the cut locus relative to \({\varvec{M}}\) is 1 or greater than 1: in the former case a non-standard term will appear but not in the latter case. The non-standard term which arises when the co-dimension of the cut locus is 1 is precisely characterised.
The main results of the paper, Theorems 1 and 2, are stated in Sect. 2 and are proved in Sects. 3 and 4, respectively.
2 Main results
2.1 Central limit theorem
Let \({\varvec{M}}\) be a compact and connected Riemannian manifold (without boundary) of dimension m and let \(\rho \) denote the distance function on \({\varvec{M}}\times {\varvec{M}}\) induced by the Riemannian metric. Suppose that \(\mu \) is a probability measure on \({\varvec{M}}\). The Fréchet function \(F_{\mu }\) of \(\mu \) is defined as
$$\begin{aligned} F_{\mu }(x)=\int _{{\varvec{M}}}\rho (x,\xi )^2\,\hbox {d}\mu (\xi ). \end{aligned} \tag{1}$$
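Since \({\varvec{M}}\) is compact, \(F_{\mu }(x)<\infty \) for all \(x\in {\varvec{M}}\). The population Fréchet mean is defined by
$$\begin{aligned} x_0=\mathop {\mathrm {arg\,min}}\limits _{x\in {\varvec{M}}}F_{\mu }(x). \end{aligned} \tag{2}$$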
For some \(\mu \), \(x_0\) will consist of a subset of \({\varvec{M}}\) rather than a single point in \({\varvec{M}}\). It will be assumed throughout the paper that \(x_0\) is unique.
Suppose \(\xi _1, \ldots , \xi _n \in {\varvec{M}}\) is a random sample drawn independently from \(\mu \). Then, the set of sample Fréchet means is defined by
where \(\mathcal {G}_n \subset {\varvec{M}}\) is the set of global minima of \(n^{-1} \sum _{i=1}^n \rho (\xi _i, y)^2\). In those cases where \(\mathcal {G}_n\) is not a singleton set, it is assumed that a measurable selection \(\hat{\xi }_n \in \mathcal {G}_n\) has been made, so that \(\hat{\xi }_n\) is a measurable random element.
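To make the definition of \(\mathcal {G}_n\) concrete, the following is a minimal sketch for the circle \(S^1\), with points represented as angles. It relies on the fact that, on the circle, every local minimiser of the sample Fréchet function has the form (arithmetic mean of the angles) \(+\,2\pi k/n\) for some integer k, so comparing n candidates suffices. The function names are illustrative and are not taken from the paper; when \(\mathcal {G}_n\) is not a singleton the code simply returns one of its elements.

```python
import math


def wrap(angle):
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def circle_dist(a, b):
    """Intrinsic (arc-length) distance between two angles on the unit circle."""
    return abs(wrap(a - b))


def frechet_function(x, angles):
    """Sample Frechet function: mean squared intrinsic distance from x."""
    return sum(circle_dist(x, t) ** 2 for t in angles) / len(angles)


def circle_frechet_mean(angles):
    """Return one global minimiser of the sample Frechet function on S^1.

    Every local minimiser has the form mean(angles) + 2*pi*k/n, so it is
    enough to compare the n candidates k = 0, ..., n-1.
    """
    n = len(angles)
    base = sum(wrap(t) for t in angles) / n
    candidates = [wrap(base + 2.0 * math.pi * k / n) for k in range(n)]
    return min(candidates, key=lambda x: frechet_function(x, angles))
```

For data concentrated away from the antipode of the mean, the result agrees with the ordinary average of the angles; for data clustered around \(\pm \pi \), the minimiser sits near \(\pm \pi \) rather than near 0.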
The following result, proved in Section 5.1, makes use of the strong laws of large numbers proved by Ziezold [24] and Evans and Jaffe [11].
Proposition 1
Assume that (i) \({\varvec{M}}\) is compact and (ii) \(x_0 \in {\varvec{M}}\) is the unique population Fréchet mean of \(\mu \). For each n, let \(\hat{\xi }_n \in \mathcal {G}_n\) denote any measurable selection from \(\mathcal {G}_n\). Then \(\rho (x_0, \hat{\xi }_n) \overset{a.s.}{\rightarrow }0\) as \(n \rightarrow \infty \).
Let \(\mathcal {T}_x({\varvec{M}})\) denote the tangent space at \(x \in {\varvec{M}}\) and write \(\exp _x(v)\) to denote the exponential map, which maps a point \(v \in \mathcal {T}_x({\varvec{M}})\) to the point \(\exp _x(v) \in {\varvec{M}}\). The inverse exponential (or log) map, denoted \(\exp _x^{-1}(y)\), maps a point \(y \in {\varvec{M}}\setminus {\mathcal {C}}_x\) to the point \(\exp _x^{-1}(y) \in \mathcal {T}_x({\varvec{M}})\), where \({\mathcal {C}}_x\) denotes the cut locus of x. See, for example, Chavel [7] for terminology. Also, define
$$\begin{aligned} G_{\mu }(x)=\int _{{\varvec{M}}}\exp _x^{-1}(\xi )\,1_{\{\xi \not \in {\mathcal {C}}_x\}}\,\hbox {d}\mu (\xi ), \end{aligned} \tag{3}$$
where \(1_A\) denotes the indicator function of a set A. Note that \(\{G_{\mu } (x): \, x \in {\varvec{M}}\}\) is a vector field on \({\varvec{M}}\). It follows from the result of [17] that
and that, with probability one under the product measure determined by \(\mu \),
where \(\hat{\mu }_n\) is the empirical distribution on \({\varvec{M}}\) based on the random sample \(\xi _1, \ldots , \xi _n\).
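As an illustration of the exponential map, the log map and the empirical vector field, here is a minimal sketch on the unit sphere \(S^2\), where the cut locus of x is its antipode. It assumes that the empirical field at x is the sample average of \(\exp _x^{-1}(\xi _i)\); the helper names and the fixed-point iteration \(x \leftarrow \exp _x(G(x))\) are illustrative choices, not part of the paper, and the iteration only finds a stationary point near its starting value.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def sphere_exp(x, v):
    """Exponential map on the unit sphere: geodesic from x with velocity v."""
    r = norm(v)
    if r < 1e-15:
        return tuple(x)
    return tuple(math.cos(r) * a + math.sin(r) * b / r for a, b in zip(x, v))


def sphere_log(x, y):
    """Inverse exponential map; undefined when y is the antipode of x."""
    c = max(-1.0, min(1.0, dot(x, y)))
    w = tuple(b - c * a for a, b in zip(x, y))  # part of y orthogonal to x
    s = norm(w)
    if s < 1e-15:
        if c < 0.0:
            raise ValueError("y lies in the cut locus (antipode) of x")
        return tuple(0.0 for _ in x)
    theta = math.acos(c)  # geodesic distance between x and y
    return tuple(theta * b / s for b in w)


def empirical_G(x, sample):
    """Sample average of log_x(xi): empirical analogue of the field G."""
    logs = [sphere_log(x, xi) for xi in sample]
    return tuple(sum(l[j] for l in logs) / len(sample) for j in range(len(x)))


def stationary_point(sample, x_init, steps=100):
    """Iterate x <- exp_x(G(x)); at a fixed point the empirical field vanishes."""
    x = tuple(x_init)
    for _ in range(steps):
        x = sphere_exp(x, empirical_G(x, sample))
        r = norm(x)
        x = tuple(a / r for a in x)  # guard against rounding drift off the sphere
    return x
```

For a sample concentrated in a small cap, the iteration converges quickly and the returned point satisfies `empirical_G(x, sample)` \(\approx 0\) up to rounding error.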
Before stating Theorems 1 and 2, we mention a number of relevant facts. For any fixed \(x \in {\varvec{M}}\), define \(\rho _x(\cdot )=\rho (x, \cdot )\). We denote by D the covariant derivative and by \(\nabla \) the gradient operator, both defined on \({\varvec{M}}\). For \(x'\not \in {\mathcal {C}}_x\),
(cf. Jost [14], p. 203). Moreover, the Hessian, \(\hbox {Hess}^f\), of a smooth function f on \({\varvec{M}}\) is the (symmetric) (0, 2)-tensor field such that, for any vector fields U and V on \({\varvec{M}}\),
(cf. O’Neill [19], p.86). That is, \(\hbox {Hess}^{\rho _x^2}\) can be expressed as
for any smooth vector fields U, V on \({\varvec{M}}\) and any \(x'\in {\varvec{M}}\setminus {\mathcal {C}}_x\), where \(H(x'| x)\) is the (1, 1)-tensor such that, for any smooth vector field V on \({\varvec{M}}\),
For any \(x\in {\varvec{M}}\) and \(\delta >0\), define the sets
and
where \(B_{\delta } (x) = \{x' \in {\varvec{M}}:\, \rho (x,x') < \delta \}\). The concepts of topological stability and metric continuity of the cut locus are relevant in the present context; see Definitions 3.6 and 3.10 in Eltzner et al. [10]. Corollary 3.8 and Proposition 3.11 in Eltzner et al. [10] prove that both topological stability and metric continuity of the cut locus hold for compact Riemannian manifolds. Here, it will be slightly more convenient to use the concept of metric continuity at any point \(x \in {\varvec{M}}\). In the notation defined above, metric continuity entails the following.
Proposition 2
If \({\varvec{M}}\) is a compact Riemannian manifold and \(x \in {\varvec{M}}\), then for any \(\delta >0\) there exists a \(\delta _1>0\) such that
In Appendix A we give a proof for Proposition 2 different to that given by Eltzner et al. [10].
Let \(\text {vol}_{{\varvec{M}}}\) denote the Riemannian volume measure on \({\varvec{M}}\). The key linearization result we need is the following.
Theorem 1
Assume that (i) \({\varvec{M}}\) is a compact, connected Riemannian manifold; (ii) \(x_0 \in {\varvec{M}}\) is the unique population Fréchet mean of \(\mu \); (iii) for \(\delta >0\) sufficiently small, \(\mu \), restricted to \(\mathcal {B}_{\delta }(x_0)\) defined in (10), is absolutely continuous with respect to \(\text {vol}_{{\varvec{M}}}\) and the corresponding Radon–Nikodym derivative has a version \(\psi \) which is continuous on \(\mathcal {B}_{\delta }(x_0)\); (iv) as \(\delta \downarrow 0\), \(\mathcal {A}_{\delta }(x_0)\) in (9) satisfies \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x_0))= O(\delta )\); (v) the function \(\xi \mapsto H(x_0\vert \xi )\) is \(\mu \)-integrable, i.e. the integral
exists, where \(H(\cdot \vert \cdot )\) is defined in (8). Then the vector field \(G_{\mu }(x)\) admits the following linearization for \(x \in {\varvec{M}}\) in a neighbourhood of \(x_0\):
where \(\Pi _{x,x_0}\) denotes parallel transport from \({\mathcal {T}}_x({\varvec{M}})\) to \({\mathcal {T}}_{x_0}({\varvec{M}})\) along the (unique) shortest geodesic between x and \(x_0\), \(\vert \vert R(x,x_0)\vert \vert = o(\rho (x,x_0))\) and \(\Psi _{\mu }(x_0)\) is a (1, 1)-tensor defined by
where \(J_{\mu }(x_0)\) is defined by (20) at \(x=x_0\).
The second term \(J_{\mu }\) in the expression for \(\Psi _{\mu }\) specifies the role played by the cut locus of the Fréchet mean \(x_0\). A direct consequence of the proof of Theorem 1 is that the result holds regardless of the codimension of the cut locus of \(x_0\). In particular, in the case that the codimension of the cut locus of \(x_0\) is greater than one, the expression for \(\Psi _{\mu }(x_0)\) reduces to \(\int _{{\varvec{M}}}H(x_0|\xi )\,\hbox {d}\mu (\xi )\).
Note that we choose to use the term Radon–Nikodym derivative throughout this paper, instead of the commonly used term probability density function. This allows us to express quantities of interest in a coordinate-free fashion, while the term ‘probability density function’ is typically used in the Euclidean setting with the volume measure expressed in terms of a fixed standard coordinate system.
The proof of Theorem 1, given in Sect. 3, uses some involved geometric arguments. These arguments are of potentially broader interest than just the current context. The definition of \(\Psi _{\mu }(x_0)\), which is given in the next subsection, has a particularly interesting form when the co-dimension of the cut-locus of \(x_0\) is 1. In this case \(\Psi _{\mu }(x_0)\) contains a non-standard term which we discuss in detail below, and illustrate in some examples at the end of the section.
Note that, under assumption (iii) of Theorem 1, \(G_{\mu }\) defined by (3) can be written as
in a neighbourhood of \(x_0\) and so we have the relationship
in that neighbourhood. Then, one immediate consequence of Theorem 1 is that the Hessian of the Fréchet function \(F_{\mu }\) at \(x_0\) exists and it can be expressed in terms of \(\Psi _{\mu }\) as
In fact, a slight modification of the proof for Theorem 1 shows that the same result holds in a neighbourhood of \(x_0\).
We now state our main result, a CLT for \(\hat{\xi }_n\), assumed to be a measurable selection from \(\mathcal {G}_n\).
Theorem 2
Suppose that assumptions (i) – (v) of Theorem 1 hold and that \(\hat{\xi }_n\) is any measurable selection from \(\mathcal {G}_n\), as in Proposition 1. In addition, assume (vi) that \(\Psi _{\mu }(x_0)\) is strictly positive definite. Then
where \(V_0=\textrm{Cov}(\exp ^{-1}_{x_0}(\xi _1))\).
2.2 Discussion of assumptions
Here we discuss the assumptions made in Theorems 1 and 2. Assumptions (i) and (ii) in Theorem 1 define the setting that we consider. Assumption (iii) in Theorem 1 implies a certain level of regularity of the population distribution in a neighbourhood of the cut locus of the Fréchet mean; some such regularity is needed for an expansion of the type (13) to hold. Previous central limit theorems in this setting, such as Bhattacharya and Patrangenaru [4, 5], have made the much stronger assumption that the population probability density function is zero in a neighbourhood of the cut locus of the population Fréchet mean. Bhattacharya and Lin [2] have assumed \(\mu ({\mathcal {A}}_{\delta }(x_0)) = o(\delta ^2)\) whereas our assumptions (iii) and (iv) amount only to \(\mu ( {\mathcal {A}}_{\delta }(x_0)) =O(\delta )\).
Assumptions (iv) and (v) in Theorem 1 are largely geometric in character. For each of these assumptions, it would be interesting to know whether or not it holds for all smooth, compact connected manifolds when the population Fréchet mean is unique. However, we do not have a proof or a counter-example to this statement in either case and we have found nothing in the literature that throws light on either question.
Finally, assumption (vi) in Theorem 2 is a non-degeneracy assumption. If \(\Psi _{\mu }(x_0)\) is non-negative definite but not of full rank then we appear to be in a similar, but more general, situation to that of a smeary central limit theorem, as discussed by Eltzner and Huckemann [9]. Specifically, a type of central limit theorem is expected to hold but with a non-standard convergence rate which depends on the level of smoothness of the population distribution. The reason that we believe the situation is potentially more general than that considered by Eltzner and Huckemann [9] is that they focus on situations where each component in the CLT has the same convergence rate, but in general rank-deficient cases there is the potential for different components to have different convergence rates. In any case, it appears that a proper study of smeariness in rank-deficient cases is going to be very challenging because it will involve the calculation of higher-order terms in the Taylor expansion of the Fréchet function and it is unclear how to deal with this in general. As a consequence, the discussion in this paragraph is somewhat speculative.
Two further points. First, bearing in mind that \(2 \Psi _{\mu }(x_0)\) is the Hessian of the Fréchet function \(F_{\mu }(x)\) at \(x_0\), see (15), \(\Psi _{\mu }(x_0)\) cannot have a strictly negative eigenvalue: the Hessian of \(F_{\mu }(x)\) in (1) would then fail to be non-negative definite at \(x_0\), so \(x_0\) could not be a minimum of the Fréchet function, contradicting the fact that \(x_0\) is a Fréchet mean. Second, although in this paper we have not explored the practical impact of the non-standard form of the Hessian that arises when the cut locus has codimension 1, we believe that, due to the likely connection with smeariness in rank-deficient cases, the practical impact will be considerable in some cases; see Eltzner and Huckemann [9] and other references therein for further information about the potential practical impact of smeariness.
2.3 The expression of \(\Psi _{\mu }(x_0)\)
The expression of \(\Psi _{\mu }(x_0)\) appearing in Theorems 1 and 2 comprises two terms, one associated with the Hessian of the squared distance function, away from the cut locus of \(x_0\), and the other with the behaviour of the distance function on the cut locus \({\mathcal {C}}_{x_0}\) of \(x_0\). Hence, the second term reflects the geometric structure of the manifold \({\varvec{M}}\).
Note that \(\rho _x(\cdot )^2=\rho (x,\cdot )^2\) is a smooth function away from the cut locus of x. The tensor \(H(x_0|\cdot \,)\) which appears in (7) and (8) determines the first term of \(\Psi _{\mu }(x_0)\). The construction above for \(H(x_0| x)\) requires that \(x\not \in {\mathcal {C}}_{x_0}\). Nevertheless, it follows from the result of Le and Barden [17] that \(H(x_0|\xi _1)\) is well-defined with probability one, because condition (iii) of Theorem 1 implies that \(\mu (\xi \in \mathcal {C}_{x_0})=0\), i.e. the cut locus of \(x_0\) has zero probability under \(\mu \).
To introduce the second term of \(\Psi _{\mu }(x_0)\), we first recall some facts on the cut locus \({\mathcal {C}}_x\) of x and the behaviour of \(\rho _x\) nearby. These results, explicitly or implicitly stated in Barden and Le [3] and Le and Barden [16], are given in the following lemmas. The first one is on the structure of \({\mathcal {C}}_x\), the cut locus of x.
Lemma 1
For any \(x \in {\varvec{M}}\) there is a set \({\mathcal {Q}}_x\) of co-dimension at least two contained in \({\mathcal {C}}_x\) and containing the first conjugate locus of x such that, if \({\mathcal {H}}_x={\mathcal {C}}_x\setminus {\mathcal {Q}}_x\) is non-empty, it is a countable union of disjoint hyper-surfaces (codimension one sub-manifolds) where, for each \(y\in {\mathcal {H}}_x\), there are exactly two minimal geodesics from x to y. In particular, \({\mathcal {H}}_x\) is a Borel measurable set and \(y\in {\mathcal {H}}_x\) if and only if \(x\in {\mathcal {H}}_y\).
The decomposition of \({\mathcal {C}}_x\) in Lemma 1 above is the same as that given in Theorem 2 of [16], but slightly different from that given in Proposition 2 in [3]. In [3] \({\mathcal {Q}}_x\) is the set of the first conjugate loci of x in \({\mathcal {C}}_x\), while here \({\mathcal {Q}}_x\) is the union of the set of the first conjugate loci of x in \({\mathcal {C}}_x\) with the set of non-conjugate points in \({\mathcal {C}}_x\) which have more than two minimal geodesics to x. Furthermore, the proof of Theorem 2 in Le and Barden [16] made it clear that the set \({\mathcal {Q}}_x\), which was called E there, has codimension at least two, although the Theorem itself only stated that it has Hausdorff \((m-1)\)-measure zero as needed for that paper. In particular, that the set of the first conjugate loci of x has co-dimension at least two was proved in Proposition 1 of Barden and Le [3].
The next two lemmas show that, although \(\rho _x\) is not differentiable at \({\mathcal {C}}_x\), it is relatively well behaved in a neighbourhood of \({\mathcal {H}}_x\). For these results, as well as their corollaries, we assume that \({\mathcal {H}}_x\) is non-empty.
Lemma 2
Let \({\mathcal {H}}_x\) be given as in Lemma 1. For each \(y\in {\mathcal {H}}_x\), there is a neighbourhood \({\mathcal {V}}_y\) of y in \({\varvec{M}}\) on which there are two unique smooth functions \(\phi _{1y}(\,\cdot | x)\) and \(\phi _{2y}(\,\cdot | x)\) such that for any \(y'\in {\mathcal {V}}_y\),
where \(\phi _{1y}(y'| x)=\phi _{2y}(y'| x)\) if and only if \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_x\).
The neighbourhood \({\mathcal {V}}_y\) and the two functions \(\phi _{iy}(\,\cdot | x)\) in the above Lemma were constructed in the proof of Proposition 1 in Barden and Le [3] as follows. There are two disjoint neighbourhoods \({\mathcal {U}}_{1y}\) and \({\mathcal {U}}_{2y}\) in \({\mathcal {T}}_x({\varvec{M}})\) such that, for each i, \({\mathcal {V}}_y=\exp _x({\mathcal {U}}_{iy})\). Then, \(\phi _{iy}(y'| x)=\Vert (\exp _x|_{{\mathcal {U}}_{iy}})^{-1}(y')\Vert \) for \(y'\in {\mathcal {V}}_y\).
The next result is an immediate consequence of this construction.
Lemma 3
Let \({\mathcal {H}}_x\) be given as in Lemma 1, and let \({\mathcal {V}}_y\), \({\mathcal {U}}_{jy}\) and \(\phi _{jy}\) be given as in Lemma 2 and in the following construction. If, for \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_x\), \(\gamma _j\) is the minimal geodesic with \(\gamma _j(0)=x\), \(\gamma _j(1)=y'\) and \(\dot{\gamma }_j(0)\in {\mathcal {U}}_{jy}\), then
that is, \(\nabla \phi _{jy}(y'| x)\) is the unit tangent vector to \(\gamma _j\) at \(y'\).
The following result follows from the uniqueness of the pair of functions \(\phi _{jy}\), \(j=1,2\), stated in Lemma 2.
Corollary 1
Let \({\mathcal {H}}_x\) be given as in Lemma 1, and let \({\mathcal {V}}_y\) and \(\phi _{jy}\) be given as in Lemma 2. For each \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_x\), the unordered pair of the functions \(\{\phi _{1y'}(\cdot | x),\phi _{2y'}(\cdot | x)\}\) coincides with the pair \(\{\phi _{1y}(\cdot | x),\phi _{2y}(\cdot | x)\}\) on \({\mathcal {V}}_y\bigcap {\mathcal {V}}_{y'}\). Thus, the difference \(\phi _{1z}(\cdot | x)-\phi _{2z}(\cdot | x)\) is, up to sign, independent of \(z\in ({\mathcal {V}}_y\cup {\mathcal {V}}_{y'})\cap {\mathcal {H}}_x\) and so, making a continuous choice of sign, this difference is a well-defined smooth function \(\chi _i(\cdot | x)\) on a neighbourhood (in \({\varvec{M}}\)) of each connected component \({\mathcal {H}}_i(x)\) of \({\mathcal {H}}_x\).
This, together with the results in Barden and Le [3], implies the following relationship between \({\mathcal {H}}_i\) and \(\chi _i\).
Corollary 2
Let \({\mathcal {H}}_i\) and \(\chi _i\) be given as in Corollary 1. For \(y\in {\mathcal {H}}_i(x)\), \(\nabla \chi _i(y| x)\) is non-zero and normal to \({\mathcal {H}}_i(x)\) at y.
With the above understanding of \({\mathcal {C}}_x\) and \(\rho _x\) nearby, we reach the following main ingredients for our definition of the second term of \(\Psi _{\mu }(x_0)\).
Corollary 3
Let \({\mathcal {H}}_x\) be given as in Lemma 1, and let \({\mathcal {H}}_i\) and \(\chi _i\) be given as in Corollary 1. Then,
(a) the set \({\mathcal {H}}_x\) can be expressed as the countable union of the \({\mathcal {H}}_i(x)\);
(b) the function
$$\begin{aligned} \kappa (y| x)=\left\| \nabla \chi _i(y| x)\right\| ,\qquad \hbox { if }y\in {\mathcal {H}}_i(x), \end{aligned} \tag{16}$$
is a well-defined smooth function on a neighbourhood of \({\mathcal {H}}_x\);
(c) the unit normal vector field given by
$$\begin{aligned} {\varvec{n}}(y| x)=\frac{\nabla \chi _i(y| x)}{\kappa (y| x)}\in {\mathcal {T}}_y({\varvec{M}}),\qquad \hbox { if }y\in {\mathcal {H}}_i(x), \end{aligned} \tag{17}$$
is well-defined up to sign on \({\mathcal {H}}_x\).
Note that, for \(y\in {\mathcal {H}}_i(x)\),
and that, for \(y\in {\mathcal {H}}_i(x)\) and \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_i(x)\),
Now, for \(y\in {\mathcal {H}}_x\), define \(\hbox {d}y^{\perp _x}\) to be the 1-form, unique up to sign, given by \(\hbox {d}y^{\perp _x}(U(y))=\langle {\varvec{n}}(y| x),U(y)\rangle \) for any tangent vector U(y) at y. Write \(J(y\,|\, x)\) for the well-defined (0, 2)-tensor at y on \({\mathcal {H}}_x\) given by
That is, for any \(y\in {\mathcal {H}}_x\) and any \(U(y),V(y)\in \mathcal {T}_y({\varvec{M}})\),
Write \(\alpha (t)\) for the unit speed geodesic orthogonal to \({\mathcal {H}}_y\) at \(x\in {\mathcal {H}}_y\) and \(\tau _{y|x}(t)\) for the distance from y to \({\mathcal {H}}_{\alpha (t)}\) along the geodesic orthogonal to \({\mathcal {H}}_x\). Then
That is, \(\tau '_{y|x}(0)\) represents the rate of change of y orthogonal to \({\mathcal {H}}_x\) as x moves orthogonally to \({\mathcal {H}}_y\), while each remains in the relevant co-dimension one subset of the cut locus of the other. In terms of \(J(x_0| y)\), \(\tau '_{y|x}(0)\) and \(\psi \), the Radon–Nikodym derivative of \(\mu \) with respect to the volume measure in a neighbourhood of \({\mathcal {C}}_{x_0}\), we denote by \(J_{\mu }(x)\) the (1, 1)-tensor field defined, for any vector field V(x), by
where \(\hbox {d}\hbox {vol}_{{\mathcal {H}}_x}\) denotes the co-dimension one surface measure on \({\mathcal {H}}_x\).
2.4 Three examples
In the case of symmetric spaces, \(\tau '_{y|x}(0)\equiv 1\) and so the expression for \(J_{\mu }(x)\) defined by (20) can be simplified. We now calculate \(J_{\mu }(x)\) for special symmetric spaces with appropriate ‘coordinate systems’. Moreover, we show that condition (iv) in Theorem 1 is satisfied in each of the three examples, i.e. we show that \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x))=O(\delta )\) as \(\delta \downarrow 0\), where \(\mathcal {A}_{\delta }(x)\) is defined in (9); and we show that condition (v) in Theorem 1 is also satisfied in the three examples.
(a) \({\varvec{M}}=S^1\): \({\mathcal {H}}_x={\mathcal {C}}_x\) contains only the antipodal point y of x. Thus, \(\rho (x,y)=\pi \); the initial tangent vectors of the two geodesics from x to y point in opposite directions so that \(\kappa (x| y)=2\); and we may take \({\varvec{n}}(x| y)=1\). Hence, if we take the standard coordinate in the subset \((-\pi ,\pi ]\) in its universal cover with \(x=0\), then the corresponding \(J_{\mu }\) is \(J_{\mu }(0)=2\pi \,\psi (\pi )\), identical with the extra term in the covariance of the central limit theorem of Hotz and Huckemann [13].
Finally, we check conditions (iv) and (v) of Theorem 1. Since \(\mathcal {C}_x\) is the antipodal point of x, it follows that, in the local coordinates introduced above, \(\mathcal {A}_{\delta }\) may be written as \(\mathcal {A}_{\delta }(0)=(-\pi , -\pi +\delta ) \cup (\pi -\delta , \pi ]\), so that \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta } (x))=2\delta \) and therefore condition (iv) of Theorem 1 is satisfied. Condition (v) follows because the circle is flat and therefore \(H(x_0\vert \xi )=-1\) if \(\xi \) is not the antipodal point of \(x_0\).
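To see how the extra term \(2\pi \,\psi (\pi )\) affects the limiting variance on the circle, the sketch below assumes the form of the limiting variance obtained by Hotz and Huckemann [13], namely \(V_0/(1-2\pi \,\psi (\pi ))^2\); this formula is recalled from [13] and is not restated in this section. The function and argument names are illustrative only.

```python
import math


def circle_clt_variance(v0, psi_at_antipode):
    """Limiting variance on S^1, ASSUMING the form V0 / (1 - 2*pi*psi(pi))**2.

    v0              : variance of the log-mapped observation at the mean
    psi_at_antipode : value of the density psi at the antipode of the mean
    """
    j = 2.0 * math.pi * psi_at_antipode  # the term J_mu(0) = 2*pi*psi(pi)
    if not 0.0 <= j < 1.0:
        # j >= 1 corresponds to a degenerate (non-standard rate) situation
        raise ValueError("need 0 <= 2*pi*psi(pi) < 1 for a standard CLT")
    return v0 / (1.0 - j) ** 2
```

When the density vanishes at the antipode the variance reduces to \(V_0\), and it grows without bound as \(2\pi \,\psi (\pi )\) approaches 1, which is the regime where the non-degeneracy assumption fails.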
For higher dimensional spheres \(S^d\), \(d > 1\), we have \(\text {vol}_{{\varvec{M}}}({\mathcal {A}}_{\delta }(x)) = O(\delta ^d)\) but \({\mathcal {H}}_x\) is empty since the cut-locus is of co-dimension \(d > 1\), so \(J_{\mu }(0)\) vanishes. For \(d > 2\) this has already been observed by Bhattacharya & Lin [2] but the CLT for \(S^2\) given a non-vanishing density at the cut locus appears to be new.
(b) \({\varvec{M}}=S^1\times S^1\) (the standard torus): We take the standard coordinate system in the subset \((-\pi ,\pi ]\times (-\pi ,\pi ]\) in its universal covering space with \(x=(0,0)\). Then \({\mathcal {C}}_x={\mathcal {H}}_x\cup \{(\pi ,\pi )\}\) where \({\mathcal {H}}_x\) is the union of two disjoint sets \({\mathcal {H}}_1(x)\) and \({\mathcal {H}}_2(x)\) where \({\mathcal {H}}_1(x)=(-\pi ,\pi )\times \{\pi \}\) and \({\mathcal {H}}_2(x)=\{\pi \}\times (-\pi ,\pi )\). Under this coordinate system, \(U=\frac{\partial }{\partial x_1}\) and \(V=\frac{\partial }{\partial x_2}\) form an orthonormal basis of \({\mathcal {T}}_x({\varvec{M}})\), and, for any \(y=(y_1,y_2)\in {\varvec{M}}\), \(\rho (x,y)^2=y_1^2+y_2^2\). Also, up to sign, for \(y\in {\mathcal {H}}_1(x)\), \({\varvec{n}}(x| y)=V\) and, for \(y\in {\mathcal {H}}_2(x)\), \({\varvec{n}}(x| y)=U\). For \(y\in {\mathcal {H}}_1(x)\), \(\rho (x,y)\kappa (x| y)=2|y_1|=2\pi \) and, similarly, \(\rho (x,y)\kappa (x| y)=2|y_2|=2\pi \) for \(y\in {\mathcal {H}}_2(x)\). Thus,
Hence, in this case, under the chosen ‘coordinate system’, the corresponding \(J_{\mu }\) is
Finally, we note that
In this case, \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x))\) is seen to be bounded by \(8 \pi \delta \). It follows that condition (iv) of Theorem 1 is satisfied. Condition (v) of Theorem 1 follows because the torus is flat and hence \(H(x_0\vert \xi )=-I_{2\times 2}\), minus the identity, for \(\xi \not \in {\mathcal {C}}_{x_0}\).
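The bound \(8\pi \delta \) can be checked by elementary arithmetic. The sketch below assumes that, in the flat coordinates on \((-\pi ,\pi ]\times (-\pi ,\pi ]\), \(\mathcal {A}_{\delta }(x)\) is the set of points with \(|y_1|>\pi -\delta \) or \(|y_2|>\pi -\delta \), i.e. the points within \(\delta \) of the cut locus of \(x=(0,0)\); its area is then the area of the full square minus that of the inner square of side \(2\pi -2\delta \). The function name is illustrative.

```python
import math


def torus_band_area(delta):
    """Area of {|y1| > pi - delta or |y2| > pi - delta} in the flat square.

    Equals (2*pi)**2 - (2*pi - 2*delta)**2 = 8*pi*delta - 4*delta**2.
    """
    if not 0.0 <= delta <= math.pi:
        raise ValueError("delta must lie in [0, pi]")
    side = 2.0 * math.pi
    return side ** 2 - (side - 2.0 * delta) ** 2
```

Since the area equals \(8\pi \delta -4\delta ^2\), it never exceeds \(8\pi \delta \) and is \(O(\delta )\) as \(\delta \downarrow 0\).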
For d-dimensional tori with \(d > 2\), \({\mathcal {C}}_x\) is given by the union of \((d-1)\)-dimensional tori, and the conditions remain satisfied with \(J_{\mu }\) not vanishing in general. With a similarly defined coordinate system around x and a similar argument, the corresponding \(H(x_0\vert \xi )=-I_{d\times d}\) and \(J_{\mu }(0,\cdots ,0)\) is the \(d\times d\) diagonal matrix whose ith diagonal element is
(c) \({\varvec{M}}=\mathbb{R}\mathbb{P}^2\) (two-dimensional real projective space): \({\mathcal {Q}}_x=\emptyset \) so that \({\mathcal {H}}_x={\mathcal {C}}_x\); and for any \(y\in {\mathcal {C}}_x\), \(\rho (x,y)=\pi /2\) where the initial tangent vectors of the two minimal geodesics from x to y are in opposite directions. Hence, for \(y\in {\mathcal {C}}_x\), \(\rho (x,y)\,\kappa (x| y)=\pi \). We take the normal coordinates centred at x on \({\mathcal {T}}_x({\varvec{M}})\). Then, using the corresponding polar coordinates \((r,\theta )\), for any \(y\in {\mathcal {C}}_x\) one of the initial unit tangent vectors to the two geodesics from x to y has coordinates \((\cos \theta ,\sin \theta )\) where \(\theta \in [0,\pi )\), which we take as \({\varvec{n}}(x| y)\). Thus, for \(y\in {\mathcal {C}}_x\),
so that in this case, under this coordinate system, the corresponding \(J_{\mu }\) is
This expression can be verified by direct computation of the Hessian of \(F_{\mu }\).
Finally, we consider conditions (iv) and (v) of Theorem 1. We first identify the form of \(\mathcal {A}_{\delta }(x_0)\). Without loss of generality we take \(x_0\) to be \(x_0= (0,0,1)^{\top }\) and represent \(\mathbb{R}\mathbb{P}^2\) by the hemisphere \(\{x =(x_1, x_2, x_3)^{\top } \in \mathcal {S}^2:\, x_3 \ge 0\}\). Then it is easy to see that \(\mathcal {A}_{\delta }(x_0)\) is given by
Moreover, the volume of \(\mathcal {A}_{\delta }(x_0)\) with respect to surface area measure on \(S^2\) is \(2 \pi \sin \delta \). It follows easily that condition (iv) of Theorem 1 is satisfied here, too.
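The value \(2\pi \sin \delta \) can be checked numerically. The sketch below assumes that \(\mathcal {A}_{\delta }(x_0)\) is the band of points on the hemisphere within geodesic distance \(\delta \) of the equator, that is \(\{x\in S^2: 0\le x_3\le \sin \delta \}\); this reading and the function name `band_area` are assumptions made for illustration.

```python
import math

def band_area(delta, n=10000):
    """Midpoint-rule area of the band {x in S^2 : 0 <= x_3 <= sin(delta)}.

    The circle at latitude `lat` (measured from the equator) has length
    2*pi*cos(lat), so the area is the integral of that length over [0, delta].
    """
    h = delta / n
    total = 0.0
    for k in range(n):
        lat = (k + 0.5) * h
        total += 2.0 * math.pi * math.cos(lat) * h
    return total

def closed_form(delta):
    """The value quoted in the text."""
    return 2.0 * math.pi * math.sin(delta)
```

Since \(\sin \delta \le \delta \), the area is at most \(2\pi \delta \), which is the \(O(\delta )\) behaviour used when checking condition (iv).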
Condition (v) requires a bit more work to check in this example. From Kendall and Le [15], \(H(x\vert y)\) on the sphere \(S^2\) is given by the map
where \(\langle \cdot , \cdot \rangle \) is the Riemannian inner product on the tangent space at \(x \in {\varvec{M}}\), noting that H defined in [15] has the opposite sign to the one used in this paper. When restricted to the (open) half sphere centred at y, it gives H(x|y) on \(\mathbb{R}\mathbb{P}^2\). For given \(x \in {\varvec{M}}\) there is a possible singularity at \(y=x\). However, the singularity is in fact a removable singularity because, for x close to y, \(\rho (x,y) \sim \sin (\rho (x,y))\) and \(1- \cos (\rho (x,y)) \sim \rho (x,y)^2/2\). Then, the boundedness of H(x|y) ensures that
is well-defined.
As is the case with higher-dimensional tori, it is also easy to see that, for \(\mathbb{R}\mathbb{P}^d\) with \(d > 2\), conditions (iv) and (v) remain satisfied with the expression for \(H(x\vert y)\) being the same as above for \(\mathbb{R}\mathbb{P}^2\). Also, \(J_{\mu }(x_0)\) will not vanish in general. With a similar coordinate system around \(x_0\) and a similar argument, its expression can be obtained from the above for \(\mathbb{R}\mathbb{P}^2\) with \({\varvec{n}}(x| y)\) there replaced by
and with the integration there replaced by \((d-1)\)-fold integration.
3 Proof of Theorem 1
To prove Theorem 1, we first consider a generalised version of the Taylor expansion of the inverse exponential map at different base points. That is, for fixed \(z\in {\varvec{M}}\), we study the Taylor expansion for the vector field \(\exp ^{-1}_x(z)\) for \(x\not \in {\mathcal {C}}_z\). For this, we fix \(z\in {\varvec{M}}\) and, for \(x_0,x_1\not \in {\mathcal {C}}_z\) sufficiently close, denote by \(\gamma \) the unit speed geodesic segment such that \(\gamma (0)=x_0\) and \(\gamma (\rho (x_0,x_1))=x_1\).
If \(\gamma (t)\not \in {\mathcal {C}}_z\) for all \(t\in (0,\rho (x_0,x_1))\), \(\exp ^{-1}_{\gamma (t)}(z)\) is a smooth vector field along \(\gamma \). Then, it follows from the definition of the covariant derivative that the Taylor expansion for \(\exp _x^{-1}(z)\) about \(\exp ^{-1}_{x_0}(z)\) takes the form
where \(H(x'| x)\) is defined by (8) for \(x'\in {\varvec{M}}\setminus {\mathcal {C}}_x\) and \(\vert \vert R(x_0,x_1) \vert \vert = o(\rho (x_0,x_1))\).
In the case that \(\gamma (t)\in {\mathcal {H}}_z\subseteq {\mathcal {C}}_z\) for some \(t\in (0,\rho (x_0,x_1))\), we have the following result on the approximation of \(\Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right) \) in terms of \(\exp ^{-1}_{x_0}(z)\), generalising the Taylor expansion (21) for smooth vector fields.
Proposition 3
Let \(x_0,x_1,z\in {\varvec{M}}\) be such that \(x_0,x_1\not \in {\mathcal {C}}_z\) are sufficiently close and \(\gamma \) be the minimal unit speed geodesic from \(x_0\) to \(x_1\). If there is a parameter \(t_z\in (0,\rho (x_0,x_1))\) such that \(\gamma (t_z)\in {\mathcal {H}}_z\), then
where \(H(x'| x)\) is defined by (8), \(\kappa (y| x)\) is defined by (16) and \({\varvec{n}}(y| x)\) is defined by (17).
Proof
If there is a parameter \(t_z\in (0,\rho (x_0,x_1))\) such that \(\gamma (t_z)\in {\mathcal {H}}_z\subseteq {\mathcal {C}}_z\), such \(t_z\) is unique provided \(x_0\) and \(x_1\) are sufficiently close. To see the uniqueness of \(t_z\), we first exclude the possibility of a finite number of such \(t_z\). This can be achieved by replacing \(x_1\) with a new point, on the geodesic from \(x_0\) to \(x_1\), between the first and second such \(t_z\), which is permissible because \(x_0\) and \(x_1\) can be chosen arbitrarily close. The other possibility is that there are an infinite number of such \(t_z\). Let \(t_z^{*}\) denote the infimum of such \(t_z\). If \(t_z^{*}\) is not an accumulation point, then the proof is the same as in the finite case. If \(t_z^{*} = 0\), then 0 must be an accumulation point, in which case we have a contradiction, because \(\mathcal {C}_z\) is closed and the fact that there is a sequence of \(t_z\) such that \(t_z \rightarrow 0\) would imply that \(x_0 \in \mathcal {C}_z\), thereby contradicting the given assumption. The only other possibility is that \(t_z^{*}>0\) and \(t_z^{*}\) is an accumulation point. In this case we redefine \(x_1\) to be \(\gamma (t)\) for any \(t \in (0,t_z^{*})\), which gives uniqueness of \(t_z\) when it exists, provided \(x_0\) and \(x_1\) are sufficiently close. See also Lemma 3 in the Supplementary Material of [18] for the description of sufficiently small neighbourhoods of non-conjugate parts of cut loci.
Without loss of generality, we may assume that the two smooth functions \(\phi _i(\,\cdot \,)=\phi _{i\gamma (t_z)}(\,\cdot | z)\), where \(\phi _{iy}(\,\cdot | x)\) are defined in Lemma 2, are chosen such that
Then, the difference between the two tangent vectors \(\Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right) \) and \(\exp ^{-1}_{x_0}(z)\), both in \(\mathcal {T}_{x_0}({\varvec{M}})\), can be expressed as
The definitions for \(\kappa (y| x)\) and \({\varvec{n}}(y| x)\) given respectively by (16) and (17) imply that the terms in the third curly bracket on the right hand side above are equal to
By (21), the difference between the terms in the second curly bracket on the right hand side above and \(\Big (H(x_0| z)\Big )\Big (\exp ^{-1}_{x_0}(\gamma (t_z))\Big )\) is \(o(\rho (x_0,x_1))\). Since
a similar application of (21) to the terms in the first curly bracket results in
up to a term of order \(o(\rho (x_0,x_1))\). Hence,
However, using
and similarly for \(\exp ^{-1}_{x_1}\left( \gamma (t_z)\right) \), as well as noting \(\rho (x_0,x_1)=\rho (x_0,\gamma (t_z))+\rho (\gamma (t_z),x_1)\), we have
so that the required result follows. \(\square \)
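The jump described by Proposition 3 can be seen in the simplest possible setting, the circle \(S^1\) with angle coordinates, where parallel transport is the identity, \(H=-1\), and the cut locus of z is the single antipodal point. The Python sketch below is only an illustration under these assumptions (the names `wrap` and `log_map` are hypothetical); it is not the general formula of the proposition.

```python
import math

TWO_PI = 2.0 * math.pi

def wrap(a):
    """Map an angle to the interval (-pi, pi]."""
    return a - TWO_PI * math.ceil((a - math.pi) / TWO_PI)

def log_map(x, z):
    """exp_x^{-1}(z) on the circle, as a signed angle in (-pi, pi]."""
    return wrap(z - x)

x0, x1 = -0.05, 0.05

# Case 1: the geodesic from x0 to x1 does not meet the cut locus of z.
z_far = 1.0
diff_smooth = log_map(x1, z_far) - log_map(x0, z_far)

# Case 2: z = pi, whose cut locus {0} lies strictly between x0 and x1.
z_cut = math.pi
diff_jump = log_map(x1, z_cut) - log_map(x0, z_cut)

# First-order (smooth) term H * exp_{x0}^{-1}(x1) with H = -1.
first_order = -log_map(x0, x1)

# What remains after removing the smooth term is the jump across the cut locus.
jump = diff_jump - first_order
```

In the first case the difference equals the first-order term alone. In the second case an extra term of size \(2\pi \) appears, which matches the magnitude \(\rho (x,y)\kappa (x| y)=2\pi \) found for the torus in Sect. 2.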
In the remainder of the paper it will be convenient to use the different but equivalent representation of \(\mathcal {A}_{\delta }(x)\) in (9) given by
To see that (9) and (23) are equivalent, note that
using the fact that \(z \in \mathcal {C}_y\) if and only if \(y \in \mathcal {C}_z\).
Proof of Theorem 1
When \(x_0\) and \(x_1\) are sufficiently close, write \(\gamma \) for the unique unit speed geodesic from \(x_0\) to \(x_1\) and \({\mathcal {N}}^*_{x_0,x_1}\) for the set defined by
It follows from condition (iv) of Theorem 1 that the volume of \({\mathcal {N}}^*_{x_0,x_1}\) is \(O(\rho (x_0,x_1))\), because for \(\delta >0\) sufficiently small and \(x_1\) such that \(\rho (x_0, x_1)=\delta \), \(\mathcal {N}_{x_0, x_1}^{*} \subseteq \mathcal {A}_{\delta }(x_0)\) and also \(\mu (\mathcal {A}_{\delta } (x_0))=O(\delta )\) as \(\delta \downarrow 0\). Similar to \({\mathcal {N}}^*_{x_0,x_1}\), we also write \({\mathcal {A}}_{x_0,x_1}\) for the set defined by
Then \({\mathcal {A}}_{\rho (x_0,x_1)}(x_0)\supseteq \,{\mathcal {A}}_{x_0,x_1}\supseteq {\mathcal {N}}^*_{x_0,x_1}\).
Since \(x_0\) is the Fréchet mean of \(\mu \), \(G_{\mu }(x_0)=0\). Under the given assumption, we also have that, for \(x\in {\varvec{M}}\) in a neighbourhood of \(x_0\), \(\mu (\{\xi \in {\mathcal {C}}_x\})=0\). Thus, it follows from (21) and Proposition 2 that, for x sufficiently close to \(x_0\),
Now
On the other hand, the construction in our proof of Proposition 2 in Appendix A together with Lemma 3 of [18] imply that, for any \(\delta >0\), when \(\rho (x_0,x)\) is sufficiently small, we also have
However, by Lemma 1, \({\mathcal {Q}}_{x_0}\) has co-dimension at least two, so that the volume of the set on the right hand side of the above equation is \(O(\delta ^2)\). Thus, from condition (iv) of the Theorem, it follows that vol\(({\mathcal {A}}_{x_0,x} {\setminus }{\mathcal {N}}^*_{x_0,x})=O(\rho (x_0,x)^2)\). This, together with condition (v) of the Theorem, ensures that
and since the boundedness of \({\varvec{M}}\) and Lemma 1 together imply that
we have that
Thus, by the definition (14) of \(\Psi _{\mu } (x_0)\), it is sufficient to show that
where \(J_{\mu }(x_0)\) is defined by (20).
For this, we note that the functions \(\chi _i(\,\cdot | x)\) given in Corollary 1 are defined on a neighbourhood of \({\mathcal {H}}_x\). Thus, we may extend the definitions of the corresponding \(\kappa (\,\cdot | x)\) and \({\varvec{n}}(\,\cdot | x)\) given in (16) and (17) to that neighbourhood of \({\mathcal {H}}_x\). This implies that
To analyse the right hand side of (27) we consider, for any \(z\in {\mathcal {N}}^*_{x_0,x_1}\), the minimal unit speed geodesic \(\beta _z\) from z to \(x_0\). Extending \(\beta _z\) backwards beyond z, let \(y_z\) be the first hitting point of \({\mathcal {C}}_{x_0}\) on the extension; see Fig. 1. Let
Then, \({\mathcal {N}}_{x_0,x_1}\) is a Borel measurable subset of \({\mathcal {N}}^*_{x_0,x_1}\) and the difference between the volumes of \({\mathcal {N}}_{x_0,x_1}\) and of \({\mathcal {N}}^*_{x_0,x_1}\) is \(o(\rho (x_0,x_1))\). By (11) and condition (iii) of Theorem 1, which states that, in a neighbourhood of \({\mathcal {C}}_{x_0}\), \(\mu \) is absolutely continuous with respect to the volume measure \(\textrm{vol}_{{\varvec{M}}}(\cdot )\) with continuous Radon–Nikodym derivative \(\psi \), (27) can be expressed in terms of \({\mathcal {N}}_{x_0,x}\) as
where x is sufficiently close to \(x_0\). If we write \(u_z\) for the point on \(\beta _z\) that lies in \({\mathcal {C}}_{x_1}\) as in Fig. 1, the volume of the local cross-sectional slice of \({\mathcal {N}}_{x_0,x_1}\) at \(z\in {\mathcal {N}}_{x_0,x_1}\) can be approximated by \(\hbox {d}\hbox {vol}_{{\mathcal {H}}_{x_0}}(y_z)\,\langle {\varvec{n}}(y_z| x_0),\,\exp ^{-1}_{y_z}(u_z)\rangle \). Then, by the definition of \(\tau _{y|x}\) prior to (19) and the geometric interpretation of \(\tau '_{y|x}(0)\) following (19) (see also the illustration in Fig. 1), we also have
since \(y_z\in {\mathcal {H}}_{x_0}\), where both \({\varvec{n}}(y_z| x_0)\) and \({\varvec{n}}(x_0| y_z)\) are chosen such that the inner products are non-negative and where \(\tau '_{y|x}(0)\) is given in (19). As usual, the notation \(a\approx b\) used above, as well as below, means that, as \(x_1\rightarrow x_0\), the left-hand and the right-hand terms have the same limit. These observations imply that, for \(z\in {\mathcal {N}}_{x_0,x_1}\),
Using this and the continuity in z of \(\rho _z(x)\), \(\kappa (x| z)\), \({\varvec{n}}(x| z)\) and \(\psi (z)\), the dominant term on the right hand side of the second equality in (28) can be expressed as
where the \(o(\rho (x_0,x))\) term is due to the above approximation as well as to the fact that the volume of \({\mathcal {N}}_{x_0}\) is \(O(\rho (x_0,x))\). Hence, (26) follows from the definition (20) of \(J_{\mu }(x_0)\) as required. \(\square \)
4 Proof of Proposition 1 and Theorem 2
4.1 Proof of Proposition 1
For each \(n \ge 1\), let \(\hat{\xi }_n \in \mathcal {G}_n\) denote any measurable selection from \(\mathcal {G}_n\). From the strong law of large numbers in Ziezold [24], and using the assumption that \(x_0\) is the unique population mean, almost surely
where a horizontal line over a set indicates set closure. From elementary considerations, the first set inclusion below holds and therefore the set \(\mathcal {G}_0\) of limit points is
where, for each \(k\ge 1\), \(\hat{\xi }_k \in \mathcal {G}_k\). Since \(\mathcal {G}_0 \subseteq \{x_0\}\), there are two possibilities: either \(\mathcal {G}_0=\{x_0\}\), in which case the proposition follows; or, alternatively, \(\mathcal {G}_0=\emptyset \), the empty set. However, \({\varvec{M}}\) is compact, so \(\{\hat{\xi }_n\}\) must have a convergent subsequence with a limit \(x_1 \in {\varvec{M}}\). Moreover, we must have \(x_1 \in \mathcal {G}_0\) because \(x_1\) is an accumulation point of the sequence. Therefore, \(x_1=x_0\), and consequently \(\rho (\hat{\xi }_n, x_0) \rightarrow 0\) almost surely as required. \(\square \)
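The consistency statement of Proposition 1 can be illustrated by simulation on the circle \(S^1\). The Python sketch below is a hedged illustration, not part of the proof. It relies on the elementary observation that, at any point where the sample Fréchet function is differentiable, a critical point must be of the form \(\bar{\theta }+2\pi k/n\) for some \(k\in \{0,\ldots ,n-1\}\), where \(\bar{\theta }\) is the arithmetic mean of the angles represented in \((-\pi ,\pi ]\). The wrapped Gaussian population and all function names are assumptions chosen for the example.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def wrap(a):
    """Map an angle to the interval (-pi, pi]."""
    return a - TWO_PI * math.ceil((a - math.pi) / TWO_PI)

def frechet_function(x, thetas):
    """Sample Frechet function: mean squared intrinsic distance from x."""
    return sum(wrap(t - x) ** 2 for t in thetas) / len(thetas)

def sample_frechet_mean(thetas):
    """Global minimiser of the sample Frechet function on the circle.

    Every differentiable critical point has the form mean + 2*pi*k/n, and the
    function has downward kinks at the non-differentiable points, so it is
    enough to compare the n candidates.
    """
    n = len(thetas)
    base = sum(thetas) / n
    candidates = [wrap(base + TWO_PI * k / n) for k in range(n)]
    return min(candidates, key=lambda c: frechet_function(c, thetas))

def simulate(n, centre=0.3, spread=0.5, seed=0):
    """Sample Frechet mean of n wrapped Gaussian angles (illustrative population)."""
    rng = random.Random(seed)
    thetas = [wrap(rng.gauss(centre, spread)) for _ in range(n)]
    return sample_frechet_mean(thetas)

# Distance from the sample Frechet mean to the population mean 0.3.
errors = {n: abs(wrap(simulate(n) - 0.3)) for n in (25, 100, 400)}
```

For this symmetric, unimodal population the Fréchet mean is the centre 0.3, so the entries of `errors` should be small for the larger sample sizes.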
4.2 An elementary lemma
We first introduce some notation. If \(X=(X_1, \ldots , X_m)^{\top }\) and \(x=(x_1, \ldots , x_m)^{\top }\) are vectors with real components then statements such as \(\{X \le x\}\) and \(\{\vert X\vert \le x\}\) are interpreted component-wise as \(\{X_1 \le x_1, \ldots , X_m \le x_m\}\) and \(\{\vert X_1\vert \le x_1, \ldots , \vert X_m \vert \le x_m\}\), respectively. Also, denote by \(\Phi (x; \Sigma )\) the cumulative distribution function of a zero-mean multivariate Gaussian distribution with covariance matrix \(\Sigma \). The Euclidean norm, \((w^{\top } w)^{1/2}\), of a vector \(w \in \mathbb {R}^m\) is denoted \(\vert \vert w \vert \vert \). The following lemma is proved in Appendix B.
Lemma 4
Let X, Y denote \(\mathbb {R}^m\)-valued random vectors defined on an arbitrary probability space \((\Omega , \mathcal {F}, \mathbb {P})\). Let \(w \in \mathbb {R}^m\) denote a non-random vector with positive components. Then
Moreover, suppose that for some \(\epsilon >0\) and some \(m \times m\) covariance matrix \(\Sigma \),
Then, for a constant \(c_0>0\) depending only on \(\Sigma \),
Our proof of Theorem 2, in particular Step 1, makes use of this lemma.
4.3 Proof of Theorem 2
The proof of Theorem 2 is broken into two steps. In the first step we explain how Lemma 4 will be applied. In the subsequent step, we explain how to make the right-hand side of (31) arbitrarily small uniformly for all \(x \in \mathbb {R}^m\) and therefore the CLT in Theorem 2 will have been proved.
Step 1. Application of Lemma 4.
Write
where \(\hat{\mu }_n\) is the empirical distribution function on \({\varvec{M}}\) based on the random sample \(\xi _1, \ldots , \xi _n\) and define the vector field \(\{Z_n(x): \, x \in {\varvec{M}}\}\) on \({\varvec{M}}\) by
where \(G_{\mu } (x)\) is defined in (3). Under the conditions of Theorems 1 and 2, the population Fréchet mean \(x_0\) is a stationary minimum of (1) and, in particular,
i.e. the zero element in \(\mathcal {T}_{x_0}({\varvec{M}})\), which follows from integrating (5) over \({\varvec{M}}\) with respect to the probability measure \(\mu \) and putting \(x=x_0\). Hence
Denote the Euclidean norm (which is the induced Riemannian tangent space norm) on \(\mathcal {T}_{x_0}({\varvec{M}})\) by \(\vert \vert \cdot \vert \vert \). Since \(\vert \vert \exp _{x_{0}}^{-1}(\xi )1_{\{\xi \notin \mathcal {C}_{x_0}\}}\vert \vert \) is bounded over \(\xi \in {\varvec{M}}\), the LHS of (33) with \(x=x_0\) follows a central limit theorem in the tangent space, i.e.
where \(V_0=\text {Cov}\{\exp _{x_0}^{-1}(\xi _1)1_{\{\xi \notin \mathcal {C}_{x_0}\}}\}=\text {Cov}\{\exp _{x_0}^{-1}(\xi _1)\}\).
Moreover, \(\hat{\xi }_n \in \mathcal {G}_n\), which is assumed to be a measurable selection from \(\mathcal {G}_n\) as in Proposition 1, satisfies (4) and consequently,
Define
Then, using (37), (36) and Theorem 1, it is seen that
Since, by assumption (vi) of Theorem 2, \(\Psi _{\mu }(x_0)\) has full rank, it follows that
where
and
To establish Theorem 2, we apply Lemma 4 with X and Y defined in (40) and (41), respectively. Since, from (35), we know that \(Z_n(x_0) \overset{d}{\rightarrow }\mathfrak {N}_m(0_m, V_0)\), it follows that \(\Psi _{\mu }(x_0)^{-1}Z_n(x_0)\) is asymptotically normal with mean vector the zero vector and covariance matrix \(\Psi _{\mu }(x_0)^{-1} V_0\left\{ \Psi _{\mu }(x_0)^{\top }\right\} ^{-1}\). Moreover, as \(n \rightarrow \infty \), a suitable sequence of w’s such that \(\vert \vert w \vert \vert \rightarrow 0\) and \(\mathbb {P}[\{\vert Y \vert \le w\}^c] \rightarrow 0\) can always be found provided all components of Y go to 0 in probability. Consequently, to complete the proof of Theorem 2, it is sufficient to show that \(\vert \vert Y \vert \vert \overset{p}{\rightarrow }0\), which is proved in Step 2.
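The limiting covariance \(\Psi _{\mu }(x_0)^{-1} V_0\left\{ \Psi _{\mu }(x_0)^{\top }\right\} ^{-1}\) has the usual sandwich form. The short Python sketch below evaluates it for \(2\times 2\) matrices; the helper names and the numerical inputs are hypothetical and serve only to show how the formula is assembled.

```python
def inv2(a):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular; full rank is required")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(a):
    """Transpose of a 2x2 matrix."""
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def sandwich(psi, v0):
    """Psi^{-1} V_0 (Psi^T)^{-1}, using (Psi^T)^{-1} = (Psi^{-1})^T."""
    psi_inv = inv2(psi)
    return matmul2(matmul2(psi_inv, v0), transpose2(psi_inv))

# Hypothetical inputs: a diagonal Psi and a covariance matrix V_0.
psi = [[-2.0, 0.0], [0.0, -0.5]]
v0 = [[1.0, 0.2], [0.2, 1.0]]
asymptotic_cov = sandwich(psi, v0)
```

When \(\Psi _{\mu }(x_0)\) equals minus the identity, as for a flat manifold with no contribution from the cut locus, the sandwich reduces to \(V_0\) itself.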
Step 2. Show that \(\vert \vert Y \vert \vert \overset{p}{\rightarrow }0\).
In Step 2 we first show that \(\vert \vert T_1 \vert \vert \overset{p}{\rightarrow }0\), where \(T_1\) is defined in (37). Then we deduce that \(\vert \vert Y \vert \vert \overset{p}{\rightarrow }0\), where Y is defined in (41). To establish the result for \(T_1\) we shall make use of results from empirical process theory. A key step is to approximate
However, as \(\Pi _{x,x_0}\) and \(\Pi _{y,x_0}\) are vector space isomorphisms from \(\mathcal {T}_x({\varvec{M}})\) to \(\mathcal {T}_{x_0}({\varvec{M}})\) and \(\mathcal {T}_y({\varvec{M}})\) to \(\mathcal {T}_{x_0}({\varvec{M}})\), respectively, it follows from the definition of \(Z_n(x)\) in (33) that \(\Pi _{x,x_0} Z_n(x) - \Pi _{y,x_0}Z_n(y)\) in (42) is an IID sum of terms \(q_x(\xi _i)-q_y(\xi _i)\), where
for \(i=1, \ldots , n\), with a similar definition for \(q_y(\xi _i)\), and with \(q_x(\xi _i), q_y(\xi _i) \in \mathcal {T}_{x_0}({\varvec{M}})\). It follows that (42) is equal to
It is also assumed below that \(x,y \in B_{\delta _0}(x_0)\), the open ball in \({\varvec{M}}\) of radius \(\delta _0\) centred at \(x_0\), where \(0<\delta _0 <\delta \), and using condition (iii) of Theorem 1, \(\delta \) has been chosen to be sufficiently small for \(\mu \) to be absolutely continuous on \(\mathcal {A}_{\delta } (x_0)\), where \(\mathcal {A}_{\delta } (x_0)\) is defined in (9) and \(x_0 \in {\varvec{M}}\) is the population Fréchet mean of \(\mu \), assumed to be unique.
We may write (43) as
For \(\xi \in {\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)\), using (21) and Theorem 1, we have
and also, since \({\varvec{M}}\) is smooth and compact, it follows that
Consequently, for \(\xi \in {\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)\),
Therefore, since (44) holds uniformly for \(z \in {\varvec{M}}{\setminus } \mathcal {A}_{\delta }(x_0)\) by compactness, connectedness and smoothness of \({\varvec{M}}\), it follows that
To approximate the integral of \(\{q_x(\xi ) -q_y(\xi )\}^{\top }\{ q_x(\xi )-q_y(\xi ) \}\) on the set \(\mathcal {A}_{\delta }(x_0)\), we use the following facts: recall that \(\delta >0\) has been chosen sufficiently small so that the Radon–Nikodym derivative of \(\mu \) has a continuous version on \(\mathcal {A}_{\delta }(x_0)\) (see assumption (iii) of the theorems); the Riemannian volume of \(\mathcal {A}_{\delta }(x_0)\) satisfies \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x_0))=O(\rho (x,y))\) (see immediately below (25)); and \(q_x(\xi )\) is bounded on \({\varvec{M}}\). As a consequence of these facts,
Consequently, for x and y such that \(\rho (x_0,x) \rightarrow 0\) and \(\rho (x_0,y) \rightarrow 0\),
The relevant class of functions here is
where \(0<\delta _0<\delta \) is chosen to be sufficiently small. Note that, by construction, these functions take values in \(\mathcal {T}_{x_0}({\varvec{M}})\), the tangent space of the population Fréchet mean \(x_0\). Since this is a Euclidean space we may apply the standard theorems of empirical process theory to the function class \(\mathcal {F}_{\delta _0}\). Using (47) and the fact that \({\varvec{M}}\) is compact, it follows that the integrals in Theorem 2.5.6 of van der Vaart and Wellner [22] are both finite, so that \(\mathcal {F}_{\delta _0}\) is a Donsker class. By Theorem 3.34 of Dudley [8], the Donsker property is sufficient to guarantee asymptotic equicontinuity at \(x_0\), which in turn implies that
To justify (48), we argue as follows. The relevant empirical process in terms of the \(q_x\) functions is
where \(Z_n(x)\) is defined in (33), the parallel transport map \(\Pi _{x,x_0}\) is defined in the statement of Theorem 1 and, by definition, the \(q_x\) functions have already been centred so that \(\mathbb {E}[q_x(\xi _i)]=0 \in \mathcal {T}_{x_0}({\varvec{M}})\). Asymptotic equicontinuity implies that, for any \(\eta >0\) and \(\epsilon >0\), there exists a neighbourhood \(V \subset {\varvec{M}}\) of \(x_0 \in {\varvec{M}}\) such that
where \(\Pi _{x_0, x_0}\) is taken to be the identity. Moreover, by Proposition 1,
as \(n \rightarrow \infty \), where \(\hat{\xi }_n\) is a measurable selection of the sample Fréchet mean. The result (48) now follows by a standard argument.
Thus we have proved that \(\vert \vert T_1 \vert \vert \overset{p}{\rightarrow }0\). One further comment: most of the results in the literature on empirical process theory, including van der Vaart and Wellner [22] and Dudley [8], are usually stated for classes of real-valued functions. To generalise to \(\mathbb {R}^m\)-valued functions, where m is finite, is a straightforward matter. In the present context, we simply prove that each component is \(o_p(1)\), which follows immediately from the calculations given above.
We now complete the proof of Step 2. Recall the first equality in (39). Since, from Theorem 1,
it follows that
Moreover, it has already been shown that \(\vert \vert T_1 \vert \vert \overset{p}{\rightarrow }0\), and we know from condition (v) of Theorem 2 that \(\Psi _{\mu }(x_0)^{-1}\) is a fixed matrix with bounded elements and that \(\vert \vert \Psi _{\mu }(x_0)^{-1}Z_n(x_0)\vert \vert =O_p(1)\) due to the central limit theorem. Consequently, it must be the case that \(\sqrt{n} \vert \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\vert \vert = O_p(1)\). Hence \(\sqrt{n}\vert \vert R(\hat{\xi }_n,x_0)\vert \vert =o_p(1)\) and therefore \(\vert \vert Y \vert \vert =o_p(1)\) as claimed. \(\square \)
References
Bhattacharya, A., Bhattacharya, R.N.: Nonparametric Inference on Manifolds with Applications to Shape Spaces. Cambridge University Press, Cambridge (2012)
Bhattacharya, R.N., Lin, L.: Omnibus CLTs for Fréchet means and nonparametric inference on non-Euclidean spaces. Proc. Am. Math. Soc. 145, 413–428 (2017)
Barden, D., Le, H.: Some consequences of the nature of the distance function on the cut locus in a Riemannian manifold. J. Lond. Math. Soc. 56, 369–383 (1997)
Bhattacharya, R.N., Patrangenaru, V.: Large sample theory of intrinsic and extrinsic sample means on manifolds-I. Ann. Stat. 31, 1–29 (2003)
Bhattacharya, R.N., Patrangenaru, V.: Large sample theory of intrinsic and extrinsic sample means on manifolds-II. Ann. Stat. 33, 1225–1259 (2005)
Brown, B.M.: Multiparameter linearization theorems. J. R. Stat. Soc. B 47, 323–331 (1985)
Chavel, I.: Riemannian Geometry: A Modern Introduction. Cambridge University Press, Cambridge (1993)
Dudley, R.M.: Uniform Central Limit Theorems, 2nd edn. Cambridge University Press, Cambridge (2008)
Eltzner, B., Huckemann, S.F.: A smeary central limit theorem for manifolds with application to high-dimensional spheres. Ann. Stat. 47, 3360–3381 (2019)
Eltzner, B., Galaz-Garcia, F., Huckemann, S.F., Tuschmann, W.: Stability of the cut locus and a central limit theorem for Fréchet means of Riemannian manifolds. Proc. Am. Math. Soc. 149, 3947–3963 (2021)
Evans, S.N., Jaffe, A.Q.: Strong laws of large numbers for Fréchet means. arXiv:2012.12859 (2020)
Gallot, S., Hulin, D., Lafontaine, J.: Riemannian Geometry. Springer, New York (1987)
Hotz, T., Huckemann, S.: Intrinsic means on the circle: uniqueness, locus and asymptotics. Ann. Inst. Stat. Math. 67, 177–193 (2015)
Jost, J.: Riemannian Geometry and Geometric Analysis, 4th edn. Springer, New York (2005)
Kendall, W.S., Le, H.: Limit theorems for empirical Fréchet means of independent and non-identically distributed manifold-valued random variables. Braz. J. Probab. Stat. 25, 323–352 (2011)
Le, H., Barden, D.: Itô correction terms for the radial parts of semimartingales on manifolds. Probab. Theory Relat. Fields 101, 133–146 (1995)
Le, H., Barden, D.: On the measure of the cut locus of a Fréchet mean. Bull. Lond. Math. Soc. 46, 698–708 (2014)
Le, H., Lewis, A., Bharath, K., Fallaize, C.: A diffusion approach to Stein’s method on Riemannian manifolds. Bernoulli 30, 1079–1104 (2024)
O’Neill, B.: Semi-Riemannian Geometry. Academic Press, Orlando (1983)
Ritov, Y.: Tightness of monotone random fields. J. R. Stat. Soc. B 49, 331–333 (1987)
van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, New York (1996)
Willmore, T.J.: Riemannian Geometry. Clarendon Press, Oxford (1993)
Ziezold, H.: On expected figures and a strong law of large numbers for random elements in quasi-metric spaces. In: Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes and of the Eighth European Meeting of Statisticians (Tech. Univ. Prague, Prague, 1974), Vol. A, pp. 591–602. Reidel, Dordrecht (1977)
Acknowledgements
The authors are grateful to two reviewers for close readings of the paper and for constructive comments. ATAW is grateful to the Australian Research Council for supporting his research on this paper through grant DP220102322.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Contributions
All three authors contributed to the original discussions that led to the paper. Most of the writing of the paper was due to HL and ATAW. The formulation and proof of Theorem 1 was due to HL, with support from TH and ATAW; the proof of Theorem 2 was due to ATAW, with support from TH and HL. All three authors contributed to the examples and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Appendices
Appendix A: Proof of Proposition 2
For a Riemannian manifold \(({\varvec{M}},g)\) with connection \(\nabla \), the Sasaki metric \(\hat{g}\) on the tangent bundle \({\mathcal {T}}\!\!{\varvec{M}}\) is a natural Riemannian metric that has the properties that (i) horizontal and vertical distributions are orthogonal; (ii) the metric induced on the fibers is Euclidean; (iii) the canonical projection from \(({\mathcal {T}}\!\!{\varvec{M}},\hat{g})\) to \(({\varvec{M}},g)\) is a Riemannian submersion. More precisely, it is determined by
for all vector fields \(X,Y\in \mathcal {C}^{\infty }({\mathcal {T}}\!\!{\varvec{M}})\), where \(X^h\) and \(X^v\) respectively denote the horizontal and vertical lifts of X and Y to \({\mathcal {T}}({\mathcal {T}}\!\!{\varvec{M}})\). A smooth curve \(\Gamma (t)=(\gamma (t),u(t))\) in \({\mathcal {T}}\!\!{\varvec{M}}\) is a geodesic under the Sasaki metric if and only if it satisfies
and
In particular, if \(\gamma (t)\) is a geodesic on \({\varvec{M}}\) and u(t) is a parallel vector field along \(\gamma (t)\), then \(\Gamma (t)=(\gamma (t),u(t))\) satisfies the conditions (49) and (50), so that \(\Gamma (t)\) is a geodesic on \({\mathcal {T}}\!\!{\varvec{M}}\) equipped with the Sasaki metric. Such a \(\Gamma (t)\) is called a horizontal lift of \(\gamma (t)\).
Proof of Proposition 2
It is sufficient to show that, for any \(\delta >0\), there is \(\delta _1>0\) such that, if \(\rho (x,y)<\delta _1\), then for any \(y'\in {{\mathcal {C}}}_y\), there is an \(x'\in {{\mathcal {C}}}_x\) such that \(\rho (x',y')<\delta \).
For this, we note first that the ‘full’ exponential map \(\hbox {Exp}\!:{\mathcal {T}}\!\!{\varvec{M}}\longrightarrow {\varvec{M}}\) is \({\mathcal {C}}^{\infty }\) and that, since \({\varvec{M}}\) is compact, the distance \(\tilde{\rho }(x,v)\) of x to the cut point of x along the geodesic \(\exp _x(tv)\) is a continuous function to \(\mathbb {R}^+\) on the unit sphere bundle \({\mathcal {S}}\!{\varvec{M}}= \{(x,v)\in {\mathcal {T}}\!\!{\varvec{M}}|\Vert v\Vert =1\}\) in the tangent bundle. Thus, it follows from the compactness of \({\varvec{M}}\) that the function R defined by
is uniformly continuous. Hence, for any \(\delta >0\), there is \(\delta _1>0\) such that, for any \(v_x=(x,v),\,w_y=(y,w)\in {\mathcal {S}}\!{\varvec{M}}\), \(\hat{\rho }(v_x,\,w_y)<\delta _1\) implies that
where \(\hat{\rho }\) is the distance function on \({\mathcal {T}}\!\!{\varvec{M}}\) induced by the Riemannian metric \(\hat{g}\).
Now, for given \(x\in {\varvec{M}}\), assume \(\rho (x,y)<\delta _1\). For any \(y'\) in \({\mathcal {C}}_y\), let \(w\in {\mathcal {S}}_y({\varvec{M}})\) be such that \(y'=R(w_y)\). For such a \(w_y=(y,w)\in {\mathcal {S}}\!{\varvec{M}}\), take \(v\in {\mathcal {S}}_x({\varvec{M}})\) to be the parallel transport of w to x along the geodesic from y to x. Then, \(v_x=(x,v)\in {\mathcal {S}}\!{\varvec{M}}\). By the remark following (18), the distance \(\hat{\rho }(v_x,w_y)\) between \(v_x\) and \(w_y\) on \({\mathcal {T}}\!\!{\varvec{M}}\) is equal to \(\rho (x,y)<\delta _1\) on \({\varvec{M}}\). Take \(x'=R(v_x)\). Then, \(x'\in {\mathcal {C}}_x\) and it follows from (51) that \(\rho (x',y')=\rho (R(v_x),\,R(w_y))<\delta \) as required. \(\square \)
Appendix B: Proof of Lemma 4
We first prove (29). For any events A and B we have
Also, if w has positive components, \(A_0=\{X+Y \le x\}\), \(A_1=\{X \le x+w\}\), \(A_2=\{X \le x-w\}\), \(A_3=\{X \le x\}\) and \(B=\{\vert Y \vert \le w\}\), then
and consequently,
Applying the LHS inequality of (52) to the LHS of (53) and the RHS inequality of (52) to the RHS of (53), we obtain
From this it follows that
and
Since \(A_2 \subseteq A_3 \subseteq A_1\), it follows that \(\mathbb {P}[A_2] \le \mathbb {P}[A_3] \le \mathbb {P}[A_1]\) and therefore
Moreover, since \(\mathbb {P}[A_0 \cap B]=\mathbb {P}[A_0] - \mathbb {P}[A_0 \cap B^c]\) and for any real numbers a and b, \(\vert a-b\vert \ge \vert a\vert - \vert b \vert \), it follows that
Therefore, since \(\mathbb {P}[A_0 \cap B^c] \le \mathbb {P}[B^c]\), it follows that
The inequality in (31) follows easily from (30), because (30) implies that
for all \(x \in \mathbb {R}^m\), and for all \(w \in \mathbb {R}^m\) with positive components. Moreover, using Taylor expansion it is easily shown that
where \(c_0>0\) is a suitable constant depending only on \(\Sigma \). The proof is now complete. \(\square \)
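The first part of the argument above can be checked empirically. With the events \(A_0\), \(A_1\), \(A_2\) and B as defined in this appendix, the inclusions \(A_2\cap B\subseteq A_0\cap B\subseteq A_1\cap B\) give the two-sided bound \(\mathbb {P}[A_2]-\mathbb {P}[B^c]\le \mathbb {P}[A_0]\le \mathbb {P}[A_1]+\mathbb {P}[B^c]\), which is presumed here to be the content of (29). The Python sketch below evaluates the three quantities under the empirical measure of a simulated sample; the distributions of X and Y, the vectors x and w, and the function names are all illustrative assumptions.

```python
import random

def leq(u, v):
    """Component-wise comparison u <= v for vectors of equal length."""
    return all(a <= b for a, b in zip(u, v))

def sandwich_terms(xs, ys, x, w):
    """Return (P[A2] - P[B^c], P[A0], P[A1] + P[B^c]) under the empirical measure."""
    n = len(xs)
    upper = [a + b for a, b in zip(x, w)]   # x + w
    lower = [a - b for a, b in zip(x, w)]   # x - w
    # A0 = {X + Y <= x}
    p_a0 = sum(leq([a + b for a, b in zip(X, Y)], x) for X, Y in zip(xs, ys)) / n
    # A1 = {X <= x + w}, A2 = {X <= x - w}
    p_a1 = sum(leq(X, upper) for X in xs) / n
    p_a2 = sum(leq(X, lower) for X in xs) / n
    # B^c = complement of {|Y| <= w}
    p_bc = sum(not leq([abs(c) for c in Y], w) for Y in ys) / n
    return p_a2 - p_bc, p_a0, p_a1 + p_bc

rng = random.Random(0)
xs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5000)]
ys = [(rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)) for _ in range(5000)]
lower_bound, middle, upper_bound = sandwich_terms(xs, ys, x=(0.5, -0.2), w=(0.1, 0.1))
```

Because the bound rests only on set inclusions, it holds for any probability measure, including the empirical one used here, so the three returned values should be in increasing order for any sample.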
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hotz, T., Le, H. & Wood, A.T.A. Central limit theorem for intrinsic Fréchet means in smooth compact Riemannian manifolds. Probab. Theory Relat. Fields (2024). https://doi.org/10.1007/s00440-024-01291-3
DOI: https://doi.org/10.1007/s00440-024-01291-3