1 Introduction

The Fréchet mean, the natural setting for which is a metric space, is defined as the point, or set of points, in the space that minimises the expected squared distance to a random point (for a sample, the sum of squared distances to the data points). In Euclidean spaces, and more generally in Hilbert spaces, the Fréchet mean is the standard linear mean. Beyond such spaces, it extends the concept of the mean to nonlinear spaces. In this paper we focus on the large-sample behaviour of the sample Fréchet mean based on the intrinsic distance in smooth, compact Riemannian manifolds.

Central limit theory for Fréchet means on compact Riemannian manifolds has been an ongoing topic of research for over 20 years. The principal difficulty in proving a general central limit theorem for the intrinsic Fréchet mean arises from the so-called cut locus of a manifold. Roughly speaking, the cut locus of a point x in a manifold \({\varvec{M}}\) is the set of points \(z \in {\varvec{M}}\) such that there exists more than one distance-minimising geodesic from x to z or the distance function from x is singular at z. This non-uniqueness produces non-smooth behaviour in the estimating function for the Fréchet mean. Despite the challenge posed by the cut locus, there has been some progress in this area, typically under the restriction that the cut locus of the population Fréchet mean lies outside the support of the population distribution.

For an account of nonparametric inference for manifold-valued data see Bhattacharya and Bhattacharya [1]. Significant contributions on central limit theorems (CLTs) for the Fréchet mean in compact Riemannian manifolds include the following. The papers of Bhattacharya and Patrangenaru [4, 5] were the first to lay out an extensive Fréchet central limit theory for manifolds, covering both intrinsic and extrinsic means; Kendall and Le [15] proved a CLT for Fréchet means based on independent but not necessarily identically distributed manifold-valued random variables; Bhattacharya and Lin [2] considered a more general metric space setting than just manifolds but also derived results of interest for manifolds; Eltzner and Huckemann [9] obtained further extensions and they also discussed a phenomenon that they call smeariness; moreover Eltzner et al. [10] proved a further CLT and developed the concepts of topological stability and metric continuity of the cut locus which we make use of later in the paper. However, all of the CLTs for Fréchet means in general compact Riemannian manifolds given in the contributions mentioned above, and to the best of our knowledge all of the relevant literature, with two exceptions to be discussed below, assume that the relevant population distribution has support which excludes the cut locus.

One of these exceptions is Theorem 3.3 in Bhattacharya and Lin [2]. They showed that the Fréchet mean also exhibits standard behaviour when the cut loci of points in small balls around the population mean carry mass that goes to zero faster than the radius raised to the power of the manifold’s dimension plus two. This will be the case if the distribution is absolutely continuous with respect to the Riemannian volume measure and the cut locus has codimension three or higher, as is the case for spheres of dimension three and higher; cf. Corollary 3.5 in Bhattacharya and Lin [2]. Indeed, the authors remark that they can treat the two-dimensional sphere only under support restrictions excluding the cut locus (see their Remark 3.7). We speculate that it may be possible to use results along the lines of Brown [6] (see also Ritov [20]) to prove a standard CLT for the Fréchet mean in the case of \(S^2\), where the cut locus has codimension 2, but we have not yet investigated all of the details. Here, we take a different approach to that problem.

The other exception referred to above is Theorem 2.11 of Eltzner and Huckemann [9]. As these authors point out, their Theorem 2.11 is a generalisation and adaptation of van der Vaart (1998, Theorem 5.23). This is an interesting result, which provides a starting point for a general study of the phenomenon called smeariness; see Eltzner and Huckemann [9]. However, from the perspective of the current paper, Theorem 2.11 has two deficiencies: (i) it does not give a clear picture of how the geometry affects the CLT; and (ii) it is not straightforward to justify Assumption 2.6 of Theorem 2.11 for compact manifolds, owing to the non-differentiability of the Fréchet function at the cut locus. In this paper we address point (i) in Theorems 1 and 2 below, and we use Theorem 1 to deal directly with the problems caused by non-differentiability at the cut locus, thereby side-stepping point (ii).

At the outset it was not clear whether the CLT for the intrinsic Fréchet mean on compact Riemannian manifolds exhibits standard behaviour, but with technically difficult proofs, or whether non-standard behaviour can occur. Hotz and Huckemann [13], who considered the intrinsic Fréchet mean on the circle, \(S^1\), settled the matter by showing that highly non-standard behaviour can occur in this setting. This sets the scene for the currently open question of the appropriate form of the central limit theorem for the intrinsic Fréchet mean in a general compact Riemannian manifold.

The principal aims of this paper are (i) to clarify when non-standard behaviour of the Fréchet mean in compact Riemannian manifolds occurs; and (ii) to characterise the non-standard behaviour when it does occur. Specifically, we allow the support of the population distribution to include the cut locus, and only a mild regularity assumption is made in this regard. A key part of the proof is establishing an asymptotic approximation for the parallel transport of a certain vector field. Whether or not a non-standard term arises in the CLT depends on whether the codimension of the cut locus relative to \({\varvec{M}}\) is 1 or greater than 1: in the former case a non-standard term appears, whereas in the latter case it does not. The non-standard term that arises when the codimension of the cut locus is 1 is characterised precisely.

The main results of the paper, Theorems 1 and 2, are stated in Sect. 2 and are proved in Sects. 3 and 4, respectively.

2 Main results

2.1 Central limit theorem

Let \({\varvec{M}}\) be a compact and connected Riemannian manifold (without boundary) of dimension m and let \(\rho \) denote the distance function on \({\varvec{M}}\times {\varvec{M}}\) induced by the Riemannian metric. Suppose that \(\mu \) is a probability measure on \({\varvec{M}}\). The Fréchet function \(F_{\mu }\) of \(\mu \) is defined as

$$\begin{aligned} F_{\mu }(x)=\int _{{\varvec{M}}}\rho (x,y)^2\hbox {d}\mu (y),\quad x\in {\varvec{M}}. \end{aligned}$$
(1)

Since \({\varvec{M}}\) is compact, \(F_{\mu }(x)<\infty \) for all \(x\in {\varvec{M}}\). The population Fréchet mean is defined by

$$\begin{aligned} x_0=\mathop {\textrm{arg}\,\textrm{min}}\limits _{x \in {\varvec{M}}}F_{\mu }(x). \end{aligned}$$

For some \(\mu \), the minimiser of \(F_{\mu }\) is not unique, so that \(x_0\) is a subset of \({\varvec{M}}\) rather than a single point. Throughout the paper we assume that \(x_0\) is unique.

Suppose \(\xi _1, \ldots , \xi _n \in {\varvec{M}}\) is a random sample drawn independently from \(\mu \). Then, the set of sample Fréchet means is defined by

$$\begin{aligned} \mathcal {G}_n=\mathop {\textrm{arg}\,\textrm{min}}\limits _{x \in {\varvec{M}}} \sum _{i=1}^n \rho (x,\xi _i)^2, \end{aligned}$$
(2)

where \(\mathcal {G}_n \subset {\varvec{M}}\) is the set of global minimisers of \(n^{-1} \sum _{i=1}^n \rho (x,\xi _i)^2\). When \(\mathcal {G}_n\) is not a singleton set, we assume that a measurable selection \(\hat{\xi }_n \in \mathcal {G}_n\) has been made, so that \(\hat{\xi }_n\) is a measurable random element of \({\varvec{M}}\).
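For intuition about (2), the following Python sketch computes a sample Fréchet mean on the circle \(S^1\) with its intrinsic (arc-length) distance. It is an illustration under stated assumptions, not a method used in this paper, and the helper names are hypothetical. It uses the elementary fact that a local minimiser x of the sample Fréchet function cannot be antipodal to a data point (there the squared distance has a concave kink), so the estimating equation holds at x and x is the average of the lifts of the data to the window \((x-\pi ,x+\pi ]\). Hence \(x\equiv (\sum _i\theta _i+2\pi k)/n \pmod {2\pi }\) for some \(k\in \{0,\ldots ,n-1\}\), and it suffices to compare the Fréchet function at these n candidates. Ties, i.e. a non-singleton \(\mathcal {G}_n\), are resolved by taking the first minimising candidate, which is one simple selection rule.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def wrap_to_pi(a):
    """Map angles to the interval (-pi, pi]."""
    return np.pi - np.mod(np.pi - a, TWO_PI)


def arc_distance(x, theta):
    """Intrinsic (arc-length) distance rho(x, theta) on S^1; broadcasts over arrays."""
    diff = np.mod(np.abs(x - theta), TWO_PI)
    return np.minimum(diff, TWO_PI - diff)


def sample_frechet_function(x, theta):
    """n^{-1} sum_i rho(x, theta_i)^2 for a scalar or an array of candidate points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.mean(arc_distance(x[:, None], np.asarray(theta)[None, :]) ** 2, axis=1)


def sample_frechet_mean_circle(theta):
    """Hypothetical helper: one global minimiser of (2) on S^1.

    Every local minimiser equals (sum_i theta_i + 2 pi k) / n mod 2 pi for some
    k in {0, ..., n-1}, so only these n candidates need to be compared.
    If G_n is not a singleton, the first minimising candidate is returned.
    """
    theta = np.mod(np.asarray(theta, dtype=float), TWO_PI)
    n = theta.size
    candidates = np.mod((theta.sum() + TWO_PI * np.arange(n)) / n, TWO_PI)
    values = sample_frechet_function(candidates, theta)
    return float(wrap_to_pi(candidates[np.argmin(values)]))


print(sample_frechet_mean_circle([0.1, 0.2, 0.3]))      # close to 0.2
print(sample_frechet_mean_circle([TWO_PI - 0.1, 0.1]))  # close to 0.0: the data straddle 0
```

The exhaustive comparison costs O(n^2) operations, which is adequate for illustration; the second call shows why the naive linear average of the angles (about \(\pi \)) is not the Fréchet mean.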

The following result, proved in Section 5.1, makes use of the strong laws of large numbers proved by Ziezold [24] and Evans and Jaffe [11].

Proposition 1

Assume that (i) \({\varvec{M}}\) is compact and (ii) \(x_0 \in {\varvec{M}}\) is the unique population Fréchet mean of \(\mu \). For each n, let \(\hat{\xi }_n \in \mathcal {G}_n\) denote any measurable selection from \(\mathcal {G}_n\). Then \(\rho (x_0, \hat{\xi }_n) \overset{a.s.}{\rightarrow }0\) as \(n \rightarrow \infty \).

Let \(\mathcal {T}_x({\varvec{M}})\) denote the tangent space at \(x \in {\varvec{M}}\) and write \(\exp _x(v)\) to denote the exponential map, which maps a point \(v \in \mathcal {T}_x({\varvec{M}})\) to the point \(\exp _x(v) \in {\varvec{M}}\). The inverse exponential (or log) map, denoted \(\exp _x^{-1}(y)\), maps a point \(y \in {\varvec{M}}\setminus {\mathcal {C}}_x\) to the point \(\exp _x^{-1}(y) \in \mathcal {T}_x({\varvec{M}})\), where \({\mathcal {C}}_x\) denotes the cut locus of x. See, for example, Chavel [7] for terminology. Also, define

$$\begin{aligned} G_{\mu }(x)=\int _{{\varvec{M}}} \exp _x^{-1}(\xi )\,1_{\{\xi \not \in {\mathcal {C}}_x\}}\hbox {d}\mu (\xi ), \quad x \in {\varvec{M}}, \end{aligned}$$
(3)

where \(1_A\) denotes the indicator function of a set A. Note that \(\{G_{\mu } (x): \, x \in {\varvec{M}}\}\) is a vector field on \({\varvec{M}}\). It follows from the results of Le and Barden [17] that

$$\begin{aligned} G_{\mu }(x_0)=\int _{{\varvec{M}}} \exp _{x_0}^{-1}(\xi )\,\hbox {d}\mu (\xi )=0 \in \mathcal {T}_{x_0}({\varvec{M}}) \end{aligned}$$

and that, with probability one under the product measure determined by \(\mu \),

$$\begin{aligned} G_{\hat{\mu }_n}(\hat{\xi }_n)=\frac{1}{n}\sum _{i=1}^n \exp _{\hat{\xi }_n}^{-1}(\xi _i)=0\in \mathcal {T}_{\hat{\xi }_n}({\varvec{M}}), \end{aligned}$$
(4)

where \(\hat{\mu }_n\) is the empirical distribution on \({\varvec{M}}\) based on the random sample \(\xi _1, \ldots , \xi _n\).
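The estimating equation (4) suggests a common fixed-point scheme for computing a sample Fréchet mean numerically: move the current point x along the geodesic in the direction \(G_{\hat{\mu }_n}(x)\). Since (5) gives \(\nabla F_{\hat{\mu }_n}=-2G_{\hat{\mu }_n}\), this is gradient descent with step size 1/2 on \(F_{\hat{\mu }_n}\). The sketch below does this on the unit sphere \(S^2\subset \mathbb {R}^3\). It assumes the data are concentrated enough for the iteration to converge to the global minimiser, which is not guaranteed in general, and the helper names are hypothetical. Data points in the cut locus of the current iterate (its antipode) contribute zero, mirroring the indicator in (3).

```python
import numpy as np


def sphere_log(x, y, eps=1e-15):
    """exp_x^{-1}(y) on the unit sphere.

    Returns 0 when y = x or y is the antipode of x (the cut locus of x),
    matching the indicator in (3).
    """
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    v = y - c * x                       # component of y orthogonal to x
    nv = np.linalg.norm(v)
    if nv < eps:
        return np.zeros_like(x)
    return np.arctan2(nv, c) * v / nv   # geodesic distance times unit direction


def sphere_exp(x, v, eps=1e-15):
    """exp_x(v) on the unit sphere for a tangent vector v at x."""
    t = np.linalg.norm(v)
    if t < eps:
        return x.copy()
    return np.cos(t) * x + np.sin(t) * v / t


def karcher_mean_sphere(points, x_init, max_iter=200, tol=1e-12):
    """Hypothetical helper: fixed-point iteration x <- exp_x(G_{mu_n}(x))."""
    x = np.asarray(x_init, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(max_iter):
        g = np.mean([sphere_log(x, p) for p in points], axis=0)  # G_{mu_n}(x)
        if np.linalg.norm(g) < tol:     # (4) holds up to tol
            break
        x = sphere_exp(x, g)
        x = x / np.linalg.norm(x)       # guard against round-off drift off the sphere
    return x


# Four points at colatitude 0.3 placed symmetrically around the north pole.
pts = [np.array([np.sin(0.3) * np.cos(p), np.sin(0.3) * np.sin(p), np.cos(0.3)])
       for p in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
m = karcher_mean_sphere(pts, [0.1, 0.05, 1.0])
print(m)  # close to the north pole (0, 0, 1)
```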

Before stating Theorems 1 and 2, we mention a number of relevant facts. For any fixed \(x \in {\varvec{M}}\), define \(\rho _x(\cdot )=\rho (x, \cdot )\). We denote by D the covariant derivative and by \(\nabla \) the gradient operator, both defined on \({\varvec{M}}\). For \(x'\not \in {\mathcal {C}}_x\),

$$\begin{aligned} \nabla \rho _x(x')^2=-2\exp _{x'}^{-1}(x) \end{aligned}$$
(5)

(cf. Jost [14], p. 203). Moreover, the Hessian, \(\hbox {Hess}^f\), of a smooth function f on \({\varvec{M}}\) is the (symmetric) (0, 2)-tensor field such that, for any vector fields U and V on \({\varvec{M}}\),

$$\begin{aligned} \hbox {Hess}^f(U,V)(x')=\langle D_V(\nabla f),U\rangle (x') \end{aligned}$$
(6)

(cf. O’Neill [19], p. 86). In particular, combining (5) and (6), \(\hbox {Hess}^{\rho _x^2}\) can be expressed as

$$\begin{aligned} -\frac{1}{2}\hbox {Hess}^{\rho _x^2}(V(x'),U(x'))=\langle (H(x'| x))(V(x')),\,U(x')\rangle , \end{aligned}$$
(7)

for any smooth vector fields U, V on \({\varvec{M}}\) and any \(x'\in {\varvec{M}}\setminus {\mathcal {C}}_x\), where \(H(x'| x)\) is the (1, 1)-tensor such that, for any smooth vector field V on \({\varvec{M}}\),

$$\begin{aligned} (H(x'| x))(V(x'))=D_{V(x')}\exp ^{-1}_{x'}(x). \end{aligned}$$
(8)
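Identity (5) can be checked numerically. The sketch below does so on the unit sphere \(S^2\subset \mathbb {R}^3\) by central finite differences, assuming that \(x'\) is neither x nor its antipode. The function \(\rho _x^2\) is extended to \(\mathbb {R}^3\setminus \{0\}\) so as to be constant along rays; its Euclidean gradient at a unit vector is then tangent to the sphere and equals the Riemannian gradient for the induced metric. The helper names are hypothetical.

```python
import numpy as np


def sphere_log(x, y):
    """exp_x^{-1}(y) on the unit sphere, assuming y is neither x nor -x."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    v = y - c * x
    return np.arctan2(np.linalg.norm(v), c) * v / np.linalg.norm(v)


def squared_distance_to(x):
    """rho_x(.)^2, extended to R^3 \\ {0} by radial projection onto the sphere."""
    def f(p):
        q = p / np.linalg.norm(p)
        return np.arccos(np.clip(np.dot(q, x), -1.0, 1.0)) ** 2
    return f


def numerical_gradient(f, p, h=1e-6):
    """Central finite differences in the ambient coordinates of R^3."""
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3)
        e[k] = h
        g[k] = (f(p + e) - f(p - e)) / (2.0 * h)
    return g


x = np.array([0.0, 0.0, 1.0])
x_prime = np.array([1.0, 0.5, 0.8])
x_prime /= np.linalg.norm(x_prime)
lhs = numerical_gradient(squared_distance_to(x), x_prime)  # grad rho_x(x')^2
rhs = -2.0 * sphere_log(x_prime, x)                         # -2 exp_{x'}^{-1}(x)
print(np.max(np.abs(lhs - rhs)))  # of the order of the finite-difference error
```

The gradient points away from x and has length \(2\rho (x,x')\), as (5) requires.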

For any \(x\in {\varvec{M}}\) and \(\delta >0\), define the sets

$$\begin{aligned} \mathcal {A}_{\delta }(x) = \bigcup _{y \in B_{\delta }(x)} \mathcal {C}_y \end{aligned}$$
(9)

and

$$\begin{aligned} \mathcal {B}_{\delta }(x)= \bigcup _{z \in \mathcal {C}_x} B_{\delta }(z) =\{y \in {\varvec{M}}: \rho (z,y)<\delta \quad \hbox {for some} \quad z \in \mathcal {C}_x\}, \end{aligned}$$
(10)

where \(B_{\delta } (x) = \{x' \in {\varvec{M}}:\, \rho (x,x') < \delta \}\). The concepts of topological stability and metric continuity of the cut locus are relevant in the present context; see Definitions 3.6 and 3.10 in Eltzner et al. [10]. Corollary 3.8 and Proposition 3.11 in Eltzner et al. [10] prove that both topological stability and metric continuity of the cut locus hold for compact Riemannian manifolds. Here it will be slightly more convenient to use metric continuity at a given point \(x \in {\varvec{M}}\). In the notation defined above, metric continuity entails the following.

Proposition 2

If \({\varvec{M}}\) is a compact Riemannian manifold and \(x \in {\varvec{M}}\), then for any \(\delta >0\) there exists a \(\delta _1>0\) such that

$$\begin{aligned} \mathcal {A}_{\delta _1}(x) \subseteq \mathcal {B}_{\delta }(x). \end{aligned}$$
(11)

In Appendix A we give a proof of Proposition 2 that differs from the one given by Eltzner et al. [10].

Let \(\text {vol}_{{\varvec{M}}}\) denote the Riemannian volume measure on \({\varvec{M}}\). The key linearization result we need is the following.

Theorem 1

Assume that (i) \({\varvec{M}}\) is a compact, connected Riemannian manifold; (ii) \(x_0 \in {\varvec{M}}\) is the unique population Fréchet mean of \(\mu \); (iii) for \(\delta >0\) sufficiently small, \(\mu \), restricted to \(\mathcal {B}_{\delta }(x_0)\) defined in (10), is absolutely continuous with respect to \(\text {vol}_{{\varvec{M}}}\) and the corresponding Radon–Nikodym derivative has a version \(\psi \) which is continuous on \(\mathcal {B}_{\delta }(x_0)\); (iv) as \(\delta \downarrow 0\), \(\mathcal {A}_{\delta }(x_0)\) in (9) satisfies \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x_0))= O(\delta )\); (v) the function \(\xi \mapsto H(x_0\vert \xi )\) is \(\mu \)-integrable, i.e. the integral

$$\begin{aligned} \int _{{\varvec{M}}} H(x_0\vert \xi ) \hbox {d}\mu (\xi ) \end{aligned}$$
(12)

exists, where \(H(\cdot \vert \cdot )\) is defined in (8). Then the vector field \(G_{\mu }(x)\) admits the following linearization for \(x \in {\varvec{M}}\) in a neighbourhood of \(x_0\):

$$\begin{aligned} \Pi _{x,x_0} G_{\mu } (x)=-\Psi _{\mu }(x_0) \exp _{x_0}^{-1}(x)+ R(x,x_0), \end{aligned}$$
(13)

where \(\Pi _{x,x_0}\) denotes parallel transport from \({\mathcal {T}}_x({\varvec{M}})\) to \({\mathcal {T}}_{x_0}({\varvec{M}})\) along the (unique) shortest geodesic between x and \(x_0\), \(\Vert R(x,x_0)\Vert = o(\rho (x,x_0))\) as \(x\rightarrow x_0\), and \(\Psi _{\mu }(x_0)\) is a (1, 1)-tensor defined by

$$\begin{aligned} \Psi _{\mu }(x_0)=-\int _{{\varvec{M}}}H(x_0|\xi )\,\hbox {d}\mu (\xi )-J_{\mu }(x_0), \end{aligned}$$
(14)

where \(J_{\mu }(x_0)\) is defined by (20) at \(x=x_0\).
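To see where the cut-locus term comes from in the simplest case, consider \(S^1\) (example (a) in Sect. 2.4) and assume, for this illustration only, that \(\mu \) has a continuous Radon–Nikodym derivative \(\psi \) with respect to arc length on the whole circle. Working with lifts to the real line, let \(\theta \) range over the window \((x-\pi ,x+\pi ]\), which contains exactly one lift of each point, so that \(\exp _x^{-1}(\theta )=\theta -x\). Differentiating with Leibniz's rule, the moving endpoints of the window, which both represent the cut point of x, contribute a boundary term:

$$\begin{aligned} G_{\mu }(x)&=\int _{x-\pi }^{x+\pi }(\theta -x)\,\psi (\theta )\,\hbox {d}\theta ,\\ G_{\mu }'(x)&=\pi \,\psi (x+\pi )+\pi \,\psi (x-\pi )-\int _{x-\pi }^{x+\pi }\psi (\theta )\,\hbox {d}\theta =2\pi \,\psi (x+\pi )-1. \end{aligned}$$

At \(x=x_0\) this gives \(G_{\mu }'(x_0)=-\{1-2\pi \,\psi (x_0+\pi )\}\). On the flat circle \(H(x_0|\xi )=-1\), so \(-\int H(x_0|\xi )\,\hbox {d}\mu (\xi )=1\), and the boundary term \(2\pi \,\psi (x_0+\pi )\) plays the role of \(J_{\mu }(x_0)\) in (14); thus \(G_{\mu }'(x_0)=-\Psi _{\mu }(x_0)\), in agreement with (13). This one-dimensional calculation is only a heuristic illustration, not the argument used to prove Theorem 1.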

The second term, \(J_{\mu }(x_0)\), in the expression for \(\Psi _{\mu }(x_0)\) captures the role played by the cut locus of the Fréchet mean \(x_0\). It follows directly from the proof of Theorem 1 that the result holds regardless of the codimension of the cut locus of \(x_0\). In particular, when the codimension of the cut locus of \(x_0\) is greater than one, the set \({\mathcal {H}}_{x_0}\) of Lemma 1 below is empty, so \(J_{\mu }(x_0)=0\) and \(\Psi _{\mu }(x_0)\) reduces to \(-\int _{{\varvec{M}}}H(x_0|\xi )\,\hbox {d}\mu (\xi )\).

We use the term Radon–Nikodym derivative throughout this paper, rather than the more common term probability density function. This allows us to express quantities of interest in a coordinate-free way, whereas the term ‘probability density function’ is typically used in the Euclidean setting, with the volume measure expressed in a fixed standard coordinate system.

The proof of Theorem 1, given in Sect. 3, uses some involved geometric arguments, which may be of interest beyond the current context. The term \(J_{\mu }(x_0)\) in \(\Psi _{\mu }(x_0)\), which is defined in Sect. 2.3, has a particularly interesting form when the codimension of the cut locus of \(x_0\) is 1. In this case \(\Psi _{\mu }(x_0)\) contains a non-standard term, which we discuss in detail below and illustrate with examples in Sect. 2.4.

Note that, under assumption (iii) of Theorem 1, \(G_{\mu }\) defined by (3) can be written as

$$\begin{aligned} G_{\mu }(x)=\int _{{\varvec{M}}}\exp ^{-1}_x(\xi )\,\hbox {d}\mu (\xi ) \end{aligned}$$

in a neighbourhood of \(x_0\) and so we have the relationship

$$\begin{aligned} \nabla F_{\mu }(x)=-2G_{\mu }(x) \end{aligned}$$

in that neighbourhood. Then, one immediate consequence of Theorem 1 is that the Hessian of the Fréchet function \(F_{\mu }\) at \(x_0\) exists and it can be expressed in terms of \(\Psi _{\mu }\) as

$$\begin{aligned} \hbox {Hess}^{F_{\mu }}(U,V)(x_0)=2\langle (\Psi _{\mu }(x_0))(V(x_0)),\,U(x_0)\rangle . \end{aligned}$$
(15)

In fact, a slight modification of the proof of Theorem 1 shows that the same result holds in a neighbourhood of \(x_0\).
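Formally differentiating (1) under the integral sign with (5) gives \(\nabla F_{\mu }=-2G_{\mu }\) wherever \(\mu ({\mathcal {C}}_x)=0\), and (15) then identifies \(2\Psi _{\mu }(x_0)\) with the Hessian of \(F_{\mu }\). The sketch below checks both relations numerically on \(S^1\) for a hypothetical density \(\psi (\theta )=(1+a\cos \theta )/(2\pi )\) with \(a=0.5\), chosen for illustration only. For this density \(F_{\mu }(x)=\pi ^2/3-2a\cos x\), so \(x_0=0\) and the Hessian at \(x_0\) is \(2a\), while \(\Psi _{\mu }(0)=1-2\pi \psi (\pi )=a\) (see example (a) in Sect. 2.4). Integrals are written in the variable \(u=\exp _x^{-1}(\xi )\in (-\pi ,\pi )\) and evaluated by the midpoint rule.

```python
import numpy as np

A = 0.5  # hypothetical parameter of the example density


def psi(theta, a=A):
    """Example density on S^1 w.r.t. arc length (an assumption for illustration)."""
    return (1.0 + a * np.cos(theta)) / (2.0 * np.pi)


def _grid(n_grid=20000):
    """Midpoint-rule nodes u on (-pi, pi) and the spacing du."""
    du = 2.0 * np.pi / n_grid
    return -np.pi + (np.arange(n_grid) + 0.5) * du, du


def frechet_function(x):
    """F_mu(x) = int rho(x, xi)^2 dmu(xi), written in the variable u = exp_x^{-1}(xi)."""
    u, du = _grid()
    return np.sum(u ** 2 * psi(x + u)) * du


def G_mu(x):
    """G_mu(x) = int exp_x^{-1}(xi) dmu(xi)."""
    u, du = _grid()
    return np.sum(u * psi(x + u)) * du


h = 1e-3
x = 0.4
dF = (frechet_function(x + h) - frechet_function(x - h)) / (2 * h)
print(dF, -2 * G_mu(x))                 # grad F_mu = -2 G_mu, both about sin(0.4)

hess_F0 = (frechet_function(h) - 2 * frechet_function(0.0) + frechet_function(-h)) / h ** 2
Psi0 = 1.0 - 2.0 * np.pi * psi(np.pi)   # (14) at x0 = 0: -int H dmu - J_mu(0)
print(hess_F0, 2 * Psi0)                # both about 2 * A = 1.0
```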

We now state our main result, a CLT for \(\hat{\xi }_n\), assumed to be a measurable selection from \(\mathcal {G}_n\).

Theorem 2

Suppose that assumptions (i) – (v) of Theorem 1 hold and that \(\hat{\xi }_n\) is any measurable selection from \(\mathcal {G}_n\), as in Proposition 1. In addition, assume (vi) that \(\Psi _{\mu }(x_0)\) is strictly positive definite. Then

$$\begin{aligned} \sqrt{n} \exp _{x_0}^{-1}(\hat{\xi }_n) \overset{d}{\rightarrow }\mathfrak {N}_m \left\{ 0_m, \Psi _{\mu }(x_0)^{-1}V_0 \Big (\Psi _{\mu }(x_0)^{\top }\Big )^{-1}\right\} , \end{aligned}$$

where \(V_0=\textrm{Cov}(\exp ^{-1}_{x_0}(\xi _1))\).
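A small Monte Carlo experiment can illustrate Theorem 2 on \(S^1\). The sketch below assumes the hypothetical density \(\psi (\theta )=(1+A\cos \theta )/(2\pi )\) with \(A=0.5\), for which \(x_0=0\), \(V_0=\pi ^2/3-2A\) and \(\Psi _{\mu }(x_0)=1-2\pi \psi (\pi )=A\). Theorem 2 then predicts \(n\,\textrm{Var}(\hat{\xi }_n)\approx V_0/A^2\approx 9.16\), four times the value \(V_0\approx 2.29\) that would be obtained if the cut-locus term \(J_{\mu }(x_0)\) were ignored. The sample means are computed exactly by comparing the n candidate stationary points on the circle. The sample size, number of replications and seed are arbitrary choices, so only rough agreement should be expected.

```python
import numpy as np

TWO_PI = 2.0 * np.pi
A = 0.5  # hypothetical parameter: psi(theta) = (1 + A cos theta) / (2 pi), so x0 = 0


def sample_circle(rng, n):
    """Draw n points from psi on (-pi, pi] by rejection from the uniform distribution."""
    draws = np.empty(0)
    while draws.size < n:
        prop = rng.uniform(-np.pi, np.pi, size=2 * n)
        accept = rng.uniform(size=2 * n) < (1.0 + A * np.cos(prop)) / (1.0 + A)
        draws = np.concatenate([draws, prop[accept]])
    return draws[:n]


def sample_frechet_mean_circle(theta):
    """Exact sample Frechet mean on S^1: compare the n candidate stationary points."""
    theta = np.mod(theta, TWO_PI)
    n = theta.size
    cand = np.mod((theta.sum() + TWO_PI * np.arange(n)) / n, TWO_PI)
    diff = np.abs(cand[:, None] - theta[None, :])
    values = (np.minimum(diff, TWO_PI - diff) ** 2).sum(axis=1)
    return np.pi - np.mod(np.pi - cand[np.argmin(values)], TWO_PI)  # in (-pi, pi]


rng = np.random.default_rng(12345)
n, reps = 400, 1000
estimates = np.array([sample_frechet_mean_circle(sample_circle(rng, n)) for _ in range(reps)])

V0 = np.pi ** 2 / 3.0 - 2.0 * A              # Var(exp_{x0}^{-1}(xi_1)) for this density
psi_at_pi = (1.0 - A) / (2.0 * np.pi)
Psi0 = 1.0 - 2.0 * np.pi * psi_at_pi         # (14): equals A
print("simulated n * Var      :", n * estimates.var())
print("Theorem 2: V0 / Psi0^2 :", V0 / Psi0 ** 2)  # about 9.16
print("omitting J_mu: V0      :", V0)              # about 2.29
```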

2.2 Discussion of assumptions

Here we discuss the assumptions made in Theorems 1 and 2. Assumptions (i) and (ii) in Theorem 1 define the setting that we consider. Assumption (iii) in Theorem 1 imposes a certain level of regularity on the population distribution in a neighbourhood of the cut locus of the Fréchet mean; some such regularity is needed for an expansion of the type (13) to hold. Previous central limit theorems in this setting, such as those of Bhattacharya and Patrangenaru [4, 5], have made the much stronger assumption that the population probability density function is zero in a neighbourhood of the cut locus of the population Fréchet mean. Bhattacharya and Lin [2] assumed that \(\mu ({\mathcal {A}}_{\delta }(x_0)) = o(\delta ^2)\), whereas our assumptions (iii) and (iv) amount only to \(\mu ( {\mathcal {A}}_{\delta }(x_0)) =O(\delta )\).

Assumptions (iv) and (v) in Theorem 1 are largely geometric in character. For each of these assumptions, it would be interesting to know whether or not it holds for all smooth, compact, connected manifolds when the population Fréchet mean is unique. However, we have neither a proof nor a counterexample in either case, and we have found nothing in the literature that throws light on either question.

Finally, assumption (vi) in Theorem 2 is a non-degeneracy assumption. If \(\Psi _{\mu }(x_0)\) is non-negative definite but not of full rank, then we appear to be in a similar, but more general, situation to that of a smeary central limit theorem, as discussed by Eltzner and Huckemann [9]. Specifically, a type of central limit theorem is expected to hold, but with a non-standard convergence rate that depends on the level of smoothness of the population distribution. We believe the situation is potentially more general than that considered by Eltzner and Huckemann [9] because they focus on situations where each component in the CLT has the same convergence rate, whereas in general rank-deficient cases different components may have different convergence rates. In any case, a proper study of smeariness in rank-deficient cases appears to be very challenging, because it will involve the calculation of higher-order terms in the Taylor expansion of the Fréchet function, and it is unclear how to deal with this in general. As a consequence, the discussion in this paragraph is somewhat speculative.

Two further points. First, since \(2 \Psi _{\mu }(x_0)\) is the Hessian of the Fréchet function \(F_{\mu }\) at \(x_0\), see (15), \(\Psi _{\mu }(x_0)\) cannot have a strictly negative eigenvalue: otherwise the Hessian of \(F_{\mu }\) in (1) at \(x_0\) would not be non-negative definite, so \(x_0\) could not be a local minimum of \(F_{\mu }\), contradicting the assumption that \(x_0\) is the Fréchet mean. Second, although in this paper we have not explored the practical impact of the non-standard form of the Hessian that arises when the cut locus has codimension 1, we believe that, owing to the likely connection with smeariness in rank-deficient cases, the practical impact will be considerable in some cases; see Eltzner and Huckemann [9] and references therein for further information about the potential practical impact of smeariness.

2.3 The expression of \(\Psi _{\mu }(x_0)\)

The expression for \(\Psi _{\mu }(x_0)\) appearing in Theorems 1 and 2 comprises two terms: one associated with the Hessian of the squared distance function away from the cut locus of \(x_0\), and the other with the behaviour of the distance function on the cut locus \({\mathcal {C}}_{x_0}\) of \(x_0\). The second term therefore reflects the geometric structure of the manifold \({\varvec{M}}\) near that cut locus.

Note that \(\rho _x(\cdot )^2=\rho (x,\cdot )^2\) is a smooth function away from the cut locus of x. The tensor \(H(x_0|\cdot \,)\), which appears in (7) and (8), determines the first term of \(\Psi _{\mu }(x_0)\). The construction above of \(H(x_0| x)\) requires that \(x\not \in {\mathcal {C}}_{x_0}\). Nevertheless, it follows from the result of Le and Barden [17] that \(H(x_0|\xi _1)\) is well-defined with probability one: the cut locus \({\mathcal {C}}_{x_0}\) has zero Riemannian volume, so condition (iii) of Theorem 1 implies that \(\mu ({\mathcal {C}}_{x_0})=0\), i.e. the cut locus of \(x_0\) has zero probability under \(\mu \).

To introduce the second term of \(\Psi _{\mu }(x_0)\), we first recall some facts about the cut locus \({\mathcal {C}}_x\) of x and the behaviour of \(\rho _x\) near it. These results, stated explicitly or implicitly in Barden and Le [3] and Le and Barden [16], are given in the following lemmas. The first concerns the structure of \({\mathcal {C}}_x\), the cut locus of x.

Lemma 1

For any \(x \in {\varvec{M}}\) there is a set \({\mathcal {Q}}_x\) of codimension at least two, contained in \({\mathcal {C}}_x\) and containing the first conjugate locus of x, such that \({\mathcal {H}}_x={\mathcal {C}}_x\setminus {\mathcal {Q}}_x\), if non-empty, is a countable union of disjoint hypersurfaces (codimension-one submanifolds) and, for each \(y\in {\mathcal {H}}_x\), there are exactly two minimal geodesics from x to y. In particular, \({\mathcal {H}}_x\) is a Borel measurable set and \(y\in {\mathcal {H}}_x\) if and only if \(x\in {\mathcal {H}}_y\).

The decomposition of \({\mathcal {C}}_x\) in Lemma 1 is the same as that given in Theorem 2 of [16], but slightly different from that given in Proposition 2 of [3]. In [3], \({\mathcal {Q}}_x\) is the first conjugate locus of x in \({\mathcal {C}}_x\), while here \({\mathcal {Q}}_x\) is the union of that set with the set of non-conjugate points in \({\mathcal {C}}_x\) that have more than two minimal geodesics to x. Furthermore, the proof of Theorem 2 in Le and Barden [16] makes it clear that the set \({\mathcal {Q}}_x\), which was called E there, has codimension at least two, although the theorem itself only states that it has Hausdorff \((m-1)\)-measure zero, as needed for that paper. The fact that the first conjugate locus of x has codimension at least two was proved in Proposition 1 of Barden and Le [3].

The next two lemmas show that, although \(\rho _x\) is not differentiable at \({\mathcal {C}}_x\), it is relatively well behaved in a neighbourhood of \({\mathcal {H}}_x\). For these results, as well as their corollaries, we assume that \({\mathcal {H}}_x\) is non-empty.

Lemma 2

Let \({\mathcal {H}}_x\) be given as in Lemma 1. For each \(y\in {\mathcal {H}}_x\), there is a neighbourhood \({\mathcal {V}}_y\) of y in \({\varvec{M}}\) on which there is a unique pair of smooth functions \(\phi _{1y}(\,\cdot | x)\) and \(\phi _{2y}(\,\cdot | x)\) such that, for any \(y'\in {\mathcal {V}}_y\),

$$\begin{aligned} \rho _x(y')=\min \{\phi _{1y}(y'| x),\phi _{2y}(y'| x)\}, \end{aligned}$$

where \(\phi _{1y}(y'| x)=\phi _{2y}(y'| x)\) if and only if \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_x\).

The neighbourhood \({\mathcal {V}}_y\) and the two functions \(\phi _{iy}(\,\cdot | x)\) in the above lemma were constructed in the proof of Proposition 1 in Barden and Le [3] as follows. There are two disjoint neighbourhoods \({\mathcal {U}}_{1y}\) and \({\mathcal {U}}_{2y}\) in \({\mathcal {T}}_x({\varvec{M}})\) such that, for each i, \(\exp _x\) maps \({\mathcal {U}}_{iy}\) diffeomorphically onto \({\mathcal {V}}_y\). Then \(\phi _{iy}(y'| x)=\Vert (\exp _x|_{{\mathcal {U}}_{iy}})^{-1}(y')\Vert \) for \(y'\in {\mathcal {V}}_y\).

The next result is an immediate consequence of this construction.

Lemma 3

Let \({\mathcal {H}}_x\) be given as in Lemma 1, and let \({\mathcal {V}}_y\), \({\mathcal {U}}_{jy}\) and \(\phi _{jy}\) be given as in Lemma 2 and the construction following it. If, for \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_x\), \(\gamma _j\) is the minimal geodesic with \(\gamma _j(0)=x\), \(\gamma _j(1)=y'\) and \(\dot{\gamma }_j(0)\in {\mathcal {U}}_{jy}\), then

$$\begin{aligned} \nabla \phi _{jy}(y'| x)=\dot{\gamma }_j(1)/\Vert \dot{\gamma }_j(1)\Vert , \end{aligned}$$

that is, \(\nabla \phi _{jy}(y'| x)\) is the unit tangent vector to \(\gamma _j\) at \(y'\).

The following is a consequence of the uniqueness of the pair of functions \(\phi _{jy}\), \(j=1,2\), stated in Lemma 2.

Corollary 1

Let \({\mathcal {H}}_x\) be given as in Lemma 1, and let \({\mathcal {V}}_y\) and \(\phi _{jy}\) be given as in Lemma 2. For each \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_x\), the unordered pair of the functions \(\{\phi _{1y'}(\cdot | x),\phi _{2y'}(\cdot | x)\}\) coincides with the pair \(\{\phi _{1y}(\cdot | x),\phi _{2y}(\cdot | x)\}\) on \({\mathcal {V}}_y\bigcap {\mathcal {V}}_{y'}\). Thus, the difference \(\phi _{1z}(\cdot | x)-\phi _{2z}(\cdot | x)\) is, up to sign, independent of \(z\in ({\mathcal {V}}_y\cup {\mathcal {V}}_{y'})\cap {\mathcal {H}}_x\) and so, making a continuous choice of sign, this difference is a well-defined smooth function \(\chi _i(\cdot | x)\) on a neighbourhood (in \({\varvec{M}}\)) of each connected component \({\mathcal {H}}_i(x)\) of \({\mathcal {H}}_x\).

This, together with the results in Barden and Le [3], implies the following relationship between \({\mathcal {H}}_i\) and \(\chi _i\).

Corollary 2

Let \({\mathcal {H}}_i\) and \(\chi _i\) be given as in Corollary 1. For \(y\in {\mathcal {H}}_i(x)\), \(\nabla \chi _i(y| x)\) is non-zero and normal to \({\mathcal {H}}_i(x)\) at y.

With this understanding of \({\mathcal {C}}_x\) and of \(\rho _x\) near it, we obtain the following main ingredients for our definition of the second term of \(\Psi _{\mu }(x_0)\).

Corollary 3

Let \({\mathcal {H}}_x\) be given as in Lemma 1, and let \({\mathcal {H}}_i\) and \(\chi _i\) be given as in Corollary 1. Then,

(a) the set \({\mathcal {H}}_x\) can be expressed as the countable union of the \({\mathcal {H}}_i(x)\);

(b) the function

$$\begin{aligned} \kappa (y| x)=\left\| \nabla \chi _i(y| x)\right\| ,\qquad \hbox { if }y\in {\mathcal {H}}_i(x), \end{aligned}$$
(16)

is a well-defined smooth function on a neighbourhood of \({\mathcal {H}}_x\);

(c) the unit normal vector field given by

$$\begin{aligned} {\varvec{n}}(y| x)=\frac{\nabla \chi _i(y| x)}{\kappa (y| x)}\in {\mathcal {T}}_y({\varvec{M}}),\qquad \hbox { if }y\in {\mathcal {H}}_i(x), \end{aligned}$$
(17)

is well-defined up to sign on \({\mathcal {H}}_x\).

Note that, for \(y\in {\mathcal {H}}_i(x)\),

$$\begin{aligned} \rho (x,y)\,\kappa (y| x)=\frac{1}{2}\left\| \nabla \left( \phi _{1y}(y| x)^2-\phi _{2y}(y| x)^2\right) \right\| ; \end{aligned}$$

and that, for \(y\in {\mathcal {H}}_i(x)\) and \(y'\in {\mathcal {V}}_y\bigcap {\mathcal {H}}_i(x)\),

$$\begin{aligned} \nabla \phi _{1y}(y'| x)=\lim _{z\rightarrow y'}\nabla \phi _{1y}(z| x). \end{aligned}$$
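The first identity above can be checked numerically on the flat torus \(S^1\times S^1\) with \(x=(0,0)\), using the coordinates of example (b) in Sect. 2.4. Near a point \(y\in {\mathcal {H}}_1(x)=(-\pi ,\pi )\times \{\pi \}\), the two branches \(\phi _{1y},\phi _{2y}\) of \(\rho _x\) are the Euclidean distances to the lifts (0, 0) and \((0,2\pi )\) of x in the universal cover \(\mathbb {R}^2\). The sketch below, with hypothetical helper names, computes \(\kappa \) in (16) and the normal in (17) by finite differences. Both sides of the identity come out as \(2\pi \), which in fact does not depend on \(y_1\), and the normal is \(\partial /\partial y_2\) up to sign.

```python
import numpy as np

TWO_PI = 2.0 * np.pi
LIFT1 = np.array([0.0, 0.0])     # lift of x = (0, 0) reached by the first geodesic
LIFT2 = np.array([0.0, TWO_PI])  # lift of x reached across the cut line y_2 = pi


def phi1(z):
    return np.linalg.norm(z - LIFT1)


def phi2(z):
    return np.linalg.norm(z - LIFT2)


def chi(z):
    """chi(. | x) = phi1 - phi2, smooth near H_1(x) (Corollary 1)."""
    return phi1(z) - phi2(z)


def numerical_gradient(f, z, h=1e-6):
    """Central finite differences in the flat coordinates (y_1, y_2)."""
    g = np.zeros(2)
    for k in range(2):
        e = np.zeros(2)
        e[k] = h
        g[k] = (f(z + e) - f(z - e)) / (2.0 * h)
    return g


y = np.array([0.7, np.pi])       # a point of H_1(x)
rho = phi1(y)                    # equals phi2(y) on H_1(x)
grad_chi = numerical_gradient(chi, y)
kappa = np.linalg.norm(grad_chi)  # (16)
normal = grad_chi / kappa         # (17), up to sign
lhs = rho * kappa
rhs = 0.5 * np.linalg.norm(numerical_gradient(lambda z: phi1(z) ** 2 - phi2(z) ** 2, y))
print(lhs, rhs, normal)           # lhs and rhs about 2 pi; normal about (0, 1)
```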

Now, for \(y\in {\mathcal {H}}_x\), define \(\hbox {d}y^{\perp _x}\) to be the 1-form, unique up to sign, given by \(\hbox {d}y^{\perp _x}(U(y))=\langle {\varvec{n}}(y| x),U(y)\rangle \) for any tangent vector U(y) at y. Write \(J(y\,|\, x)\) for the well-defined (0, 2)-tensor at y on \({\mathcal {H}}_x\) given by

$$\begin{aligned} J(y| x)=\rho _x(y)\,\kappa (y| x)\,\textrm{d}y^{\perp _x} \otimes \textrm{d}y^{\perp _x}. \end{aligned}$$
(18)

That is, for any \(y\in {\mathcal {H}}_x\) and any \(U(y),V(y)\in \mathcal {T}_y({\varvec{M}})\),

$$\begin{aligned} \big (J(y| x)\big )(U(y),V(y))=\rho _x(y)\,\kappa (y| x) \,\langle {\varvec{n}}(y| x),U(y)\rangle \,\langle {\varvec{n}}(y| x),V(y)\rangle . \end{aligned}$$

Write \(\alpha (t)\) for the unit-speed geodesic with \(\alpha (0)=x\) that is orthogonal to \({\mathcal {H}}_y\) at \(x\in {\mathcal {H}}_y\), and \(\tau _{y|x}(t)\) for the distance from y to \({\mathcal {H}}_{\alpha (t)}\) along the geodesic orthogonal to \({\mathcal {H}}_x\). Then

$$\begin{aligned} \hbox {d}y^{\perp _x}=\tau '_{y|x}(0)\,\hbox {d}x^{\perp _y}. \end{aligned}$$
(19)

That is, \(\tau '_{y|x}(0)\) represents the rate of change of y orthogonal to \({\mathcal {H}}_x\) as x moves orthogonally to \({\mathcal {H}}_y\), while each remains in the relevant co-dimension one subset of the cut locus of the other. In terms of \(J(x_0| y)\), \(\tau '_{y|x}(0)\) and \(\psi \), the Radon–Nikodym derivative of \(\mu \) with respect to the volume measure in a neighbourhood of \({\mathcal {C}}_{x_0}\), we denote by \(J_{\mu }(x)\) the (1, 1)-tensor field defined, for any vector field V(x), by

$$\begin{aligned} &\left( J_{\mu }(x)\right) (V(x))\\ &\quad =\int _{{\mathcal {H}}_{x}}\rho _y(x)\,\kappa (x| y)\, \langle {\varvec{n}}(x| y),V(x)\rangle \,{\varvec{n}}(x| y)\,\tau '_{y|x}(0)\, \psi (y)\,\hbox {d}\hbox {vol}_{{\mathcal {H}}_{x}}(y), \end{aligned}$$
(20)

where \(\hbox {d}\hbox {vol}_{{\mathcal {H}}_x}\) denotes the co-dimension one surface measure on \({\mathcal {H}}_x\).

2.4 Three examples

For symmetric spaces, \(\tau '_{y|x}(0)\equiv 1\), so the expression (20) for \(J_{\mu }(x)\) simplifies. We now calculate \(J_{\mu }(x)\) for some simple symmetric spaces in convenient coordinate systems. In each of the three examples we also show that conditions (iv) and (v) of Theorem 1 are satisfied, i.e. that \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x))=O(\delta )\) as \(\delta \downarrow 0\), where \(\mathcal {A}_{\delta }(x)\) is defined in (9), and that \(H(x_0\vert \cdot )\) is \(\mu \)-integrable.

(a) \({\varvec{M}}=S^1\): \({\mathcal {H}}_x={\mathcal {C}}_x\) consists only of the antipodal point y of x. Thus \(\rho (x,y)=\pi \); the initial tangent vectors of the two geodesics from x to y point in opposite directions, so that \(\kappa (x| y)=2\); and we may take \({\varvec{n}}(x| y)=1\). Hence, if we take the standard coordinate on the subset \((-\pi ,\pi ]\) of its universal cover, with \(x=0\), then \(J_{\mu }(0)=2\pi \,\psi (\pi )\), which coincides with the extra term in the covariance of the central limit theorem of Hotz and Huckemann [13].

We now check conditions (iv) and (v) of Theorem 1. Since \(\mathcal {C}_y\) is the antipodal point of y for every \(y\in {\varvec{M}}\), in the coordinate introduced above \(\mathcal {A}_{\delta }(0)=(-\pi , -\pi +\delta ) \cup (\pi -\delta , \pi ]\), so that \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta } (x))=2\delta \) and condition (iv) of Theorem 1 is satisfied. Condition (v) holds because the circle is flat, so that \(H(x_0\vert \xi )=-1\) whenever \(\xi \) is not the antipodal point of \(x_0\).
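The linearization (13) can be observed directly on \(S^1\), where parallel transport is the identity in the standard coordinate. The sketch below assumes the hypothetical density \(\psi (\theta )=(1+A\cos \theta )/(2\pi )\) with \(A=0.5\), for which \(x_0=0\). It computes \(\Psi _{\mu }(0)\) from (14) with \(H=-1\) and \(J_{\mu }(0)=2\pi \psi (\pi )\), and compares it with \(-G_{\mu }(x)/x\) for small x. The remainder \(R(x,0)=G_{\mu }(x)+\Psi _{\mu }(0)\,x\) shrinks faster than x, whereas the slope \(-1\) predicted by the standard term alone is clearly wrong.

```python
import numpy as np

A = 0.5  # hypothetical parameter of the example density psi(theta) = (1 + A cos theta) / (2 pi)


def psi(theta):
    return (1.0 + A * np.cos(theta)) / (2.0 * np.pi)


def G_mu(x, n_grid=20000):
    """G_mu(x) = int exp_x^{-1}(xi) psi(xi) dxi on S^1.

    Written in u = exp_x^{-1}(xi) in (-pi, pi), so the integrand u * psi(x + u) is smooth;
    evaluated by the midpoint rule.
    """
    du = 2.0 * np.pi / n_grid
    u = -np.pi + (np.arange(n_grid) + 0.5) * du
    return np.sum(u * psi(x + u)) * du


minus_int_H = 1.0                  # H(x0 | xi) = -1 on the flat circle
J0 = 2.0 * np.pi * psi(np.pi)      # J_mu(0): rho * kappa = 2 pi, n = 1, tau' = 1
Psi0 = minus_int_H - J0            # (14); equals A for this density

for x in (1e-1, 1e-2, 1e-3):
    remainder = G_mu(x) + Psi0 * x  # R(x, x0) in (13)
    print(x, G_mu(x) / x, remainder / x)  # G_mu(x)/x -> -Psi0 = -0.5 and remainder/x -> 0
```

For this density \(G_{\mu }(x)=-A\sin x\) exactly, so the remainder is \(A(x-\sin x)=O(x^3)\).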

For higher-dimensional spheres \(S^d\), \(d > 1\), the cut locus of each point is its antipode, so \(\text {vol}_{{\varvec{M}}}({\mathcal {A}}_{\delta }(x)) = O(\delta ^d)\) and condition (iv) holds; however, \({\mathcal {H}}_x\) is empty since the cut locus has codimension \(d > 1\), so \(J_{\mu }(x)\) vanishes. For \(d > 2\) this has already been observed by Bhattacharya and Lin [2], but the CLT for \(S^2\) with a non-vanishing density at the cut locus appears to be new.

(b) \({\varvec{M}}=S^1\times S^1\) (the standard torus): We take the standard coordinate system on the subset \((-\pi ,\pi ]\times (-\pi ,\pi ]\) of its universal covering space, with \(x=(0,0)\). Then \({\mathcal {C}}_x={\mathcal {H}}_x\cup \{(\pi ,\pi )\}\), where \({\mathcal {H}}_x\) is the union of the two disjoint sets \({\mathcal {H}}_1(x)=(-\pi ,\pi )\times \{\pi \}\) and \({\mathcal {H}}_2(x)=\{\pi \}\times (-\pi ,\pi )\). Under this coordinate system, \(U=\frac{\partial }{\partial x_1}\) and \(V=\frac{\partial }{\partial x_2}\) form an orthonormal basis of \({\mathcal {T}}_x({\varvec{M}})\), and, for any \(y=(y_1,y_2)\in {\varvec{M}}\), \(\rho (x,y)^2=y_1^2+y_2^2\). Also, up to sign, \({\varvec{n}}(x| y)=V\) for \(y\in {\mathcal {H}}_1(x)\) and \({\varvec{n}}(x| y)=U\) for \(y\in {\mathcal {H}}_2(x)\). For \(y\in {\mathcal {H}}_1(x)\), the two minimal geodesics from x to y correspond to the straight segments from the origin to \((y_1,\pi )\) and \((y_1,-\pi )\) in the universal cover, so that \(\rho (x,y)\kappa (x| y)=2|y_2|=2\pi \); similarly, \(\rho (x,y)\kappa (x| y)=2|y_1|=2\pi \) for \(y\in {\mathcal {H}}_2(x)\). Thus,

$$\begin{aligned} & \int _{{\mathcal {H}}_x}\rho (x,y)\,\kappa (x| y)\,\langle {\varvec{n}}(x| y),U\rangle \,\langle {\varvec{n}}(x| y),U\rangle \,\psi (y)\,\hbox {d}\hbox {vol}_{{\mathcal {H}}_x}(y)\\ & \quad =\int _{{\mathcal {H}}_2(x)}\rho (x,y)\,\kappa (x| y) \,\langle {\varvec{n}}(x| y),U\rangle \,\langle {\varvec{n}}(x| y), U\rangle \,\psi (y)\,\hbox {d}\hbox {vol}_{{\mathcal {H}}_2(x)}(y)\\ & \quad =2\pi \int _{-\pi }^{\pi }\psi (\pi ,y_2)\,\hbox {d}y_2;\\ & \int _{{\mathcal {H}}_x}\rho (x,y)\,\kappa (x| y) \,\langle {\varvec{n}}(x| y),V\rangle \,\langle {\varvec{n}}(x| y),V\rangle \,\psi (y)\,\hbox {d}\hbox {vol}_{{\mathcal {H}}_x}(y)\\ & \quad =\int _{{\mathcal {H}}_1(x)}\rho (x,y)\,\kappa (x| y) \,\langle {\varvec{n}}(x| y),V\rangle \,\langle {\varvec{n}}(x| y),V\rangle \,\psi (y)\,\hbox {d}\hbox {vol}_{{\mathcal {H}}_1(x)}(y)\\ & \quad =2\pi \int _{-\pi }^{\pi }\psi (y_1,\pi )\,\hbox {d}y_1;\\ & \int _{{\mathcal {H}}_x}\rho (x,y)\,\kappa (x| y) \,\langle {\varvec{n}}(x| y),U\rangle \,\langle {\varvec{n}}(x| y),V\rangle \,\psi (y)\,\hbox {d}\hbox {vol}_{{\mathcal {H}}_x}(y)\\ & \quad =0. \end{aligned}$$

Hence, in this case, under the chosen ‘coordinate system’, the corresponding \(J_{\mu }\) is

$$\begin{aligned} J_{\mu }(0,0)=2\pi \begin{pmatrix} \displaystyle \int _{-\pi }^{\pi }\psi (\pi ,y_2)\,\hbox {d}y_2&{}0\\ 0&{}\displaystyle \int _{-\pi }^{\pi }\psi (y_1,\pi )\,\hbox {d}y_1 \end{pmatrix}. \end{aligned}$$

Finally, we note that

$$\begin{aligned} \mathcal {A}_{\delta }(x) = \Big ( S^1 \times \big ((-\pi , -\pi +\delta ) \cup (\pi -\delta , \pi ]\big ) \Big ) \cup \Big ( \big ((-\pi , -\pi +\delta ) \cup (\pi -\delta , \pi ]\big ) \times S^1 \Big ). \end{aligned}$$

In this case, \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x))\) is bounded by \(8 \pi \delta \), since each of the two strips has area \(4\pi \delta \). It follows that condition (iv) of Theorem 1 is satisfied. Condition (v) of Theorem 1 follows because the torus is flat and hence \(H(x_0\vert \xi )=-I_{2\times 2}\), where \(I_{2\times 2}\) is the identity matrix, for \(\xi \not \in {\mathcal {C}}_{x_0}\).

For d-dimensional tori with \(d > 2\), \({\mathcal {C}}_x\) is the union of the \((d-1)\)-dimensional tori \(\{y:\, y_i=\pi \}\), \(i=1,\ldots ,d\), and the conditions remain satisfied with \(J_{\mu }\) not vanishing in general. With a similarly defined coordinate system around x and a similar argument, \(H(x_0\vert \xi )=-I_{d\times d}\) and \(J_{\mu }(0,\ldots ,0)\) is the \(d\times d\) diagonal matrix whose ith diagonal element is

$$\begin{aligned} 2\pi \int _{-\pi }^{\pi }\cdots \int _{-\pi }^{\pi }\psi (y_1,\ldots ,y_{i-1},\pi ,y_{i+1},\ldots ,y_d) \,\hbox {d}y_1\cdots \hbox {d}y_{i-1}\,\hbox {d}y_{i+1}\cdots \hbox {d}y_d. \end{aligned}$$
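For the two-dimensional torus, the bound in condition (iv) can be checked by Monte Carlo. The sketch below (hypothetical helper names; seed and sample size are arbitrary) uses the fact that \(z\in \mathcal {A}_{\delta }(x)\) exactly when the cut locus of z, namely the two circles \(\{y_1=z_1+\pi \}\) and \(\{y_2=z_2+\pi \}\), meets the ball \(B_{\delta }(x)\). For \(x=(0,0)\) and \(\delta <\pi /2\), the exact area of the two overlapping strips is \(8\pi \delta -4\delta ^2\le 8\pi \delta \).

```python
import numpy as np


def wrap(u):
    """Map angles to (-pi, pi]."""
    return np.pi - np.mod(np.pi - u, 2.0 * np.pi)


def in_A_delta_torus(z, delta):
    """True where the cut locus of z meets the open ball B_delta((0, 0)).

    The cut locus of z = (z1, z2) is {y1 = z1 + pi} U {y2 = z2 + pi}; the flat
    distance from the origin to the circle {y_i = c} is |wrap(c)|.
    """
    dist_to_cut = np.min(np.abs(wrap(z + np.pi)), axis=1)
    return dist_to_cut < delta


rng = np.random.default_rng(0)
delta, n = 0.1, 400_000
z = rng.uniform(-np.pi, np.pi, size=(n, 2))      # uniform points on the torus
area_mc = (2.0 * np.pi) ** 2 * in_A_delta_torus(z, delta).mean()
area_exact = 8.0 * np.pi * delta - 4.0 * delta**2
print(area_mc, area_exact, 8.0 * np.pi * delta)
```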

(c) \({\varvec{M}}=\mathbb{R}\mathbb{P}^2\) (two-dimensional real projective space): \({\mathcal {Q}}_x=\emptyset \) so that \({\mathcal {H}}_x={\mathcal {C}}_x\); and for any \(y\in {\mathcal {C}}_x\), \(\rho (x,y)=\pi /2\) where the initial tangent vectors of the two minimal geodesics from x to y are in opposite directions. Hence, for \(y\in {\mathcal {C}}_x\), \(\rho (x,y)\,\kappa (x| y)=\pi \). We take the normal coordinates centred at x on \({\mathcal {T}}_x({\varvec{M}})\). Then, using the corresponding polar coordinates \((r,\theta )\), for any \(y\in {\mathcal {C}}_x\) one of the initial unit tangent vectors to the two geodesics from x to y has coordinates \((\cos \theta ,\sin \theta )\) where \(\theta \in [0,\pi )\), which we take as \({\varvec{n}}(x| y)\). Thus, for \(y\in {\mathcal {C}}_x\),

$$\begin{aligned} {\varvec{n}}(x| y)\otimes {\varvec{n}}(x| y)=\begin{pmatrix} \cos \theta \\ \sin \theta \end{pmatrix} \begin{pmatrix} \cos \theta&\sin \theta \end{pmatrix} =\begin{pmatrix} \cos ^2\theta &{}\sin \theta \cos \theta \\ \sin \theta \cos \theta &{}\sin ^2\theta \end{pmatrix} \end{aligned}$$

so that in this case, under this coordinate system, the corresponding \(J_{\mu }\) is

$$\begin{aligned} J_{\mu }(0,0)=\pi \int _0^{\pi }\begin{pmatrix} \cos ^2\theta &{}\sin \theta \cos \theta \\ \sin \theta \cos \theta &{}\sin ^2\theta \end{pmatrix}\psi (\pi /2,\theta )\,\hbox {d}\theta . \end{aligned}$$

This expression can be verified by direct computation of the Hessian of \(F_{\mu }\).
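The matrix integral above is easy to evaluate by quadrature. In the sketch below, `psi_on_cut(theta)` is a hypothetical user-supplied function returning \(\psi \) at the cut-locus point with polar angle \(\theta \), that is, \(\psi (\pi /2,\theta )\). For a constant density c on \({\mathcal {C}}_x\) the result is \((\pi ^2 c/2)\) times the identity, since \(\int _0^{\pi }\cos ^2\theta \,\hbox {d}\theta =\int _0^{\pi }\sin ^2\theta \,\hbox {d}\theta =\pi /2\) and \(\int _0^{\pi }\sin \theta \cos \theta \,\hbox {d}\theta =0\); the usage line takes \(c=1/(2\pi )\) purely for concreteness.

```python
import numpy as np


def J_rp2(psi_on_cut, n_grid=2000):
    """Midpoint-rule evaluation of J_mu(0,0) = pi * int_0^pi n n^T psi(pi/2, theta) dtheta."""
    step = np.pi / n_grid
    theta = (np.arange(n_grid) + 0.5) * step
    n = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # n(x|y) = (cos theta, sin theta)
    outer = n[:, :, None] * n[:, None, :]                  # n n^T for each theta
    weights = psi_on_cut(theta) * step
    return np.pi * np.tensordot(weights, outer, axes=1)    # weighted sum over theta


c = 1.0 / (2.0 * np.pi)
J_const = J_rp2(lambda t: np.full_like(t, c))
print(J_const)   # approximately (pi / 4) times the identity
```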

Finally, we consider conditions (iv) and (v) of Theorem 1. We first identify the form of \(\mathcal {A}_{\delta }(x_0)\). Without loss of generality we take \(x_0= (0,0,1)^{\top }\) and represent \(\mathbb{R}\mathbb{P}^2\) by the hemisphere \(\{x =(x_1, x_2, x_3)^{\top } \in S^2:\, x_3 \ge 0\}\), with antipodal points on the equator identified. Then it is easy to see that \(\mathcal {A}_{\delta }(x_0)\) is given by

$$\begin{aligned} \mathcal {A}_{\delta }(x_0) = \Big \{ (\sin \theta \cos \phi , \sin \theta \sin \phi , \cos \theta )^{\top }: \, \theta \in \big ((\pi /2) - \delta , \pi /2\big ), \phi \in (0,2\pi )\Big \}. \end{aligned}$$

Moreover, the volume of \(\mathcal {A}_{\delta }(x_0)\) with respect to surface area measure on \(S^2\) is \(2 \pi \sin \delta \). It follows easily that condition (iv) of Theorem 1 is satisfied here, too.
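The value \(2\pi \sin \delta \) can also be checked by Monte Carlo, deciding membership through cut loci directly: on \(\mathbb{R}\mathbb{P}^2\) the cut locus of z is the projective line orthogonal to z, whose distance from \(x_0\) is \(\arcsin \vert \langle x_0,z\rangle \vert \), and \(z\in \mathcal {A}_{\delta }(x_0)\) exactly when this distance is less than \(\delta \). In the sketch below (seed, sample size and helper name are arbitrary), uniform points on the upper hemisphere are obtained by normalising Gaussian vectors and flipping those with negative third coordinate.

```python
import numpy as np


def in_A_delta_rp2(z, x0, delta):
    """True where the cut locus of z (the line orthogonal to z) meets B_delta(x0)."""
    dist_to_cut = np.arcsin(np.clip(np.abs(z @ x0), 0.0, 1.0))
    return dist_to_cut < delta


rng = np.random.default_rng(1)
n, delta = 200_000, 0.2
z = rng.standard_normal((n, 3))
z /= np.linalg.norm(z, axis=1, keepdims=True)    # uniform on S^2
z[z[:, 2] < 0] *= -1.0                           # representatives in the upper hemisphere
x0 = np.array([0.0, 0.0, 1.0])
area_mc = 2.0 * np.pi * in_A_delta_rp2(z, x0, delta).mean()   # hemisphere area is 2 pi
print(area_mc, 2.0 * np.pi * np.sin(delta))
```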

Condition (v) requires a bit more work to check in this example. From Kendall and Le [15], \(H(x\vert y)\) on the sphere \(S^2\) is given by the map

$$\begin{aligned} H(x| y):\,\, v&\mapsto -\dfrac{1-\rho (x,y)\cos (\rho (x,y))/ \sin (\rho (x,y))}{\rho (x,y)^2} \langle \exp ^{-1}_x(y),v\rangle \exp ^{-1}_x(y)\\&\quad \ -\dfrac{\rho (x,y)\cos (\rho (x,y))}{\sin (\rho (x,y))}v, \end{aligned}$$

where \(\langle \cdot , \cdot \rangle \) is the Riemannian inner product on the tangent space at \(x \in {\varvec{M}}\), noting that H defined in [15] has the opposite sign to the one used in this paper. When restricted to the (open) half sphere centred at y, it gives H(x|y) on \(\mathbb{R}\mathbb{P}^2\). For given \(x \in {\varvec{M}}\) there is an apparent singularity at \(y=x\). However, this singularity is removable because, as \(\rho (x,y)\rightarrow 0\), \(\rho (x,y)\cos (\rho (x,y))/\sin (\rho (x,y))=1-\rho (x,y)^2/3+O(\rho (x,y)^4)\), so that the coefficient of \(\langle \exp ^{-1}_x(y),v\rangle \exp ^{-1}_x(y)\) tends to \(-1/3\) and H(x|y) tends to minus the identity. Then, the boundedness of H(x|y) ensures that

$$\begin{aligned} \int _{{\varvec{M}}} H(x\vert y)\hbox {d}\mu (y) \end{aligned}$$

is well-defined.
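A direct implementation of the displayed formula makes the removable singularity explicit. The sketch below (an illustration, with a hypothetical function name) expresses H(x|y) as a \(2\times 2\) matrix in an orthonormal basis of the tangent space at x, takes \(u=\exp ^{-1}_x(y)\) as a vector of length \(\rho (x,y)\), and switches to the series \(\rho \cot \rho =1-\rho ^2/3-\rho ^4/45+O(\rho ^6)\) for very small \(\rho \); the switching threshold is an arbitrary choice.

```python
import numpy as np


def H_sphere(u, small=1e-3):
    """H(x|y) on S^2 (sign convention of the text) as a 2x2 matrix on T_x.

    u is exp_x^{-1}(y) in an orthonormal basis of T_x, so rho(x, y) = |u|.
    """
    u = np.asarray(u, dtype=float)
    rho = np.linalg.norm(u)
    if rho < small:
        # Series near rho = 0: rho*cot(rho) = 1 - rho^2/3 - rho^4/45 + O(rho^6).
        b = 1.0 - rho**2 / 3.0 - rho**4 / 45.0
        a = 1.0 / 3.0 + rho**2 / 45.0
    else:
        b = rho * np.cos(rho) / np.sin(rho)   # coefficient of -v
        a = (1.0 - b) / rho**2                # coefficient of -<u, v> u
    return -a * np.outer(u, u) - b * np.eye(2)


print(H_sphere([0.0, 0.0]))          # minus the identity: the singularity is removable
print(H_sphere([np.pi / 2, 0.0]))    # approximately [[-1, 0], [0, 0]]
```

At \(\rho =\pi /2\), the cut-locus distance on \(\mathbb{R}\mathbb{P}^2\), the coefficient \(\rho \cot \rho \) vanishes and H reduces to minus the orthogonal projection onto the direction of u.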

As is the case with higher-dimensional tori, it is also easy to see that, for \(\mathbb{R}\mathbb{P}^d\) with \(d > 2\), conditions (iv) and (v) remain satisfied with the expression for \(H(x\vert y)\) being the same as above for \(\mathbb{R}\mathbb{P}^2\). Also, \(J_{\mu }(x_0)\) will not vanish in general. With a similar coordinate system around \(x_0\) and a similar argument, its expression can be obtained from the above for \(\mathbb{R}\mathbb{P}^2\) with \({\varvec{n}}(x| y)\) there replaced by

$$\begin{aligned} \left( \cos \theta _1, \sin \theta _1\cos \theta _2, \cdots , \sin \theta _1\sin \theta _2\cdots \sin \theta _{d-1} \right) \end{aligned}$$

and with the integration there replaced by \((d-1)\)-fold integration.

3 Proof of Theorem 1

To prove Theorem 1, we first consider a generalised version of the Taylor expansion of the inverse exponential map at different base points. That is, for fixed \(z\in {\varvec{M}}\), we study the Taylor expansion for the vector field \(\exp ^{-1}_x(z)\) for \(x\not \in {\mathcal {C}}_z\). For this, we fix \(z\in {\varvec{M}}\) and, for \(x_0,x_1\not \in {\mathcal {C}}_z\) sufficiently close, denote by \(\gamma \) the unit speed geodesic segment such that \(\gamma (0)=x_0\) and \(\gamma (\rho (x_0,x_1))=x_1\).

If \(\gamma (t)\not \in {\mathcal {C}}_z\) for all \(t\in (0,\rho (x_0,x_1))\), \(\exp ^{-1}_{\gamma (t)}(z)\) is a smooth vector field along \(\gamma \). Then, it follows from the definition of the covariant derivative that the Taylor expansion for \(\exp _x^{-1}(z)\) about \(\exp ^{-1}_{x_0}(z)\) takes the form

$$\begin{aligned} \Pi _{x_1,x_0}\Big (\exp ^{-1}_{x_1}(z)\Big )=\exp ^{-1}_{x_0}(z)+\Big (H(x_0| z)\Big )\Big (\exp _{x_0}^{-1}(x_1)\Big )+R(x_0,x_1), \end{aligned}$$
(21)

where \(H(x'| x)\) is defined by (8) for \(x'\in {\varvec{M}}\setminus {\mathcal {C}}_x\) and \(\vert \vert R(x_0,x_1) \vert \vert = o(\rho (x_0,x_1))\).

In the case that \(\gamma (t)\in {\mathcal {H}}_z\subseteq {\mathcal {C}}_z\) for some \(t\in (0,\rho (x_0,x_1))\), we have the following result on the approximation of \(\Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right) \) in terms of \(\exp ^{-1}_{x_0}(z)\), generalising the Taylor expansion (21) for smooth vector fields.

Proposition 3

Let \(x_0,x_1,z\in {\varvec{M}}\) be such that \(x_0,x_1\not \in {\mathcal {C}}_z\) are sufficiently close, and let \(\gamma \) be the minimal unit speed geodesic from \(x_0\) to \(x_1\). If there is a parameter \(t_z\in (0,\rho (x_0,x_1))\) such that \(\gamma (t_z)\in {\mathcal {H}}_z\), then

$$\begin{aligned} \Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right)&=\exp ^{-1}_{x_0}(z)+(H(x_0| z))\left( \exp ^{-1}_{x_0}(x_1)\right) \nonumber \\&\quad +\,\,\rho _z(\gamma (t_z))\,\kappa (\gamma (t_z)| z) \,\Pi _{\gamma (t_z),x_0}\left( {\varvec{n}}(\gamma (t_z)| z)\right) \nonumber \\&\quad +o(\rho (x_0,x_1)), \end{aligned}$$
(22)

where \(H(x'| x)\) is defined by (8), \(\kappa (y| x)\) is defined by (16) and \({\varvec{n}}(y| x)\) is defined by (17).

Proof

If there is a parameter \(t_z\in (0,\rho (x_0,x_1))\) such that \(\gamma (t_z)\in {\mathcal {H}}_z\subseteq {\mathcal {C}}_z\), such \(t_z\) is unique provided \(x_0\) and \(x_1\) are sufficiently close. To see the uniqueness of \(t_z\), we first exclude the possibility of a finite number (greater than one) of such \(t_z\). This can be achieved by replacing \(x_1\) with a new point on the geodesic from \(x_0\) to \(x_1\), between the first and second such \(t_z\), which is permissible because \(x_0\) and \(x_1\) can be chosen arbitrarily close. The other possibility is that there are infinitely many such \(t_z\). Let \(t_z^{*}\) denote the infimum of such \(t_z\). If \(t_z^{*}\) is not an accumulation point, then the proof is the same as in the finite case. If \(t_z^{*} = 0\), then 0 must be an accumulation point, in which case we have a contradiction: \(\mathcal {C}_z\) is closed, so a sequence of such \(t_z\) with \(t_z \rightarrow 0\) would imply that \(x_0 \in \mathcal {C}_z\), contradicting the given assumption. The only other possibility is that \(t_z^{*}>0\) and \(t_z^{*}\) is an accumulation point. In this case we redefine \(x_1\) to be \(\gamma (t)\) for any \(t \in (0,t_z^{*})\), which gives uniqueness of \(t_z\) when it exists, provided \(x_0\) and \(x_1\) are sufficiently close. See also Lemma 3 in the Supplementary Material of [18] for the description of sufficiently small neighbourhoods of non-conjugate parts of cut loci.

Without loss of generality, we may assume that the two smooth functions \(\phi _i(\,\cdot \,)=\phi _{i\gamma (t_z)}(\,\cdot | z)\), where \(\phi _{iy}(\,\cdot | x)\) are defined in Lemma 2, are chosen such that

$$\begin{aligned} \rho _z(\gamma (t))={\left\{ \begin{array}{ll} \phi _1(\gamma (t))&{}\hbox {if }0\leqslant t\leqslant t_z\\ \phi _2(\gamma (t))&{}\hbox {if }t_z\leqslant t\leqslant \rho (x_0,x_1). \end{array}\right. } \end{aligned}$$

Then, the difference between the two tangent vectors \(\Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right) \) and \(\exp ^{-1}_{x_0}(z)\), both in \(\mathcal {T}_{x_0}({\varvec{M}})\), can be expressed as

$$\begin{aligned}{} & {} \Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right) -\exp ^{-1}_{x_0}(z)\\{} & {} \quad =\left\{ \Pi _{x_1,x_0}\bigg (\exp ^{-1}_{x_1}(z)\bigg ) -\Pi _{\gamma (t_z),x_0}\bigg (-\nabla \phi _2(\gamma (t_z)) \,\rho _z(\gamma (t_z))\bigg )\right\} \\{} & {} \qquad +\left\{ \Pi _{\gamma (t_z),x_0} \bigg (-\nabla \phi _1(\gamma (t_z))\,\rho _z (\gamma (t_z))\bigg )-\exp ^{-1}_{x_0}(z)\right\} \\{} & {} \qquad +\,\,\left\{ \Pi _{\gamma (t_z),x_0} \bigg (\nabla \phi _1(\gamma (t_z))\,\rho _z(\gamma (t_z)) -\nabla \phi _2(\gamma (t_z))\,\rho _z(\gamma (t_z))\bigg )\right\} \!. \end{aligned}$$

The definitions of \(\kappa (y| x)\) and \({\varvec{n}}(y| x)\), given respectively by (16) and (17), imply that the term in the third curly bracket on the right-hand side above is equal to

$$\begin{aligned} \rho _z(\gamma (t_z))\,\kappa (\gamma (t_z)| z) \,\Pi _{\gamma (t_z),x_0}({\varvec{n}}(\gamma (t_z)| z)). \end{aligned}$$

By (21), the difference between the terms in the second curly bracket on the right hand side above and \(\Big (H(x_0| z)\Big )\Big (\exp ^{-1}_{x_0}(\gamma (t_z))\Big )\) is \(o(\rho (x_0,x_1))\). Since

$$\begin{aligned} \Pi _{x_1,x_0}^{-1}\circ \Pi _{\gamma (t_z),x_0} =\Pi _{x_0,x_1}\circ \Pi _{\gamma (t_z),x_0}=\Pi _{\gamma (t_z),x_1}, \end{aligned}$$

a similar application of (21) to the terms in the first curly bracket results in

$$\begin{aligned} -\Pi _{x_1,x_0}\left( (H(x_1| z)) (\exp ^{-1}_{x_1}(\gamma (t_z)))\right) \!, \end{aligned}$$

up to a term of order \(o(\rho (x_0,x_1))\). Hence,

$$\begin{aligned}{} & {} \Pi _{x_1,x_0}\left( \exp ^{-1}_{x_1}(z)\right) \\{} & {} \quad =\exp ^{-1}_{x_0}(z)+(H(x_0| z)) \left( \exp ^{-1}_{x_0}(\gamma (t_z))\right) \\{} & {} \qquad +\,\,\Pi _{x_1,x_0}\left( (H(x_1| z)) \left( -\exp ^{-1}_{x_1}(\gamma (t_z))\right) \right) \\{} & {} \qquad +\,\,\rho _z(\gamma (t_z))\,\kappa (\gamma (t_z)| z) \,\Pi _{\gamma (t_z),x_0}\left( {\varvec{n}}(\gamma (t_z)| z)\right) +o(\rho (x_0,x_1)). \end{aligned}$$

However, using

$$\begin{aligned} \exp ^{-1}_{x_0}(x_1)=\frac{\rho (x_0,x_1)}{\rho (x_0, \gamma (t_z))}\exp ^{-1}_{x_0}\left( \gamma (t_z)\right) \end{aligned}$$

and similarly for \(\exp ^{-1}_{x_1}\left( \gamma (t_z)\right) \), as well as noting \(\rho (x_0,x_1)=\rho (x_0,\gamma (t_z))+\rho (\gamma (t_z),x_1)\), we have

$$\begin{aligned}{} & {} (H(x_0| z))\left( \exp ^{-1}_{x_0}(\gamma (t_z))\right) +\Pi _{x_1,x_0}\left( (H(x_1| z))\left( -\exp ^{-1}_{x_1} (\gamma (t_z))\right) \right) \\{} & {} \quad =(H(x_0| z))\left( \exp ^{-1}_{x_0}(x_1)\right) \\{} & {} \qquad -\,\frac{\rho (\gamma (t_z),x_1)}{\rho (x_0,x_1)} \left\{ (H(x_0| z))\left( \exp ^{-1}_{x_0}(x_1)\right) \!+\!\Pi _{x_1,x_0}\left( (H(x_1| z))\left( \exp ^{-1}_{x_1} (x_0)\right) \right) \right\} \\{} & {} \quad =(H(x_0| z))\left( \exp ^{-1}_{x_0}(x_1)\right) +o(\rho (x_0,x_1)), \end{aligned}$$

so that the required result follows. \(\square \)
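On the circle, (22) can be checked exactly. Parallel transport is the identity, \(H(x_0| z)=-1\) away from the cut locus, and when the arc from \(x_0\) to \(x_1\) crosses the cut point \(z+\pi \) the extra term is \(\rho _z\,\kappa \,{\varvec{n}}=\pi \cdot 2\cdot (\pm 1)\); since \(S^1\) is flat, the remainder vanishes. The sketch below uses hypothetical helper names and assumes that the sign of \({\varvec{n}}\) is the direction of travel from \(x_0\) to \(x_1\).

```python
import numpy as np


def wrap(u):
    """Map angles to (-pi, pi]."""
    return np.pi - np.mod(np.pi - u, 2.0 * np.pi)


def log_map(x, z):
    """exp_x^{-1}(z) on the unit circle, as a signed angle in (-pi, pi]."""
    return wrap(z - x)


def expansion_22(z, x0, x1):
    """Return both sides of (22) on S^1 for x0, x1 off the cut point of z."""
    step = log_map(x0, x1)               # exp_{x0}^{-1}(x1), of length rho(x0, x1)
    to_cut = log_map(x0, z + np.pi)      # signed position of the cut point of z
    crosses = to_cut * step > 0 and abs(to_cut) < abs(step)
    jump = np.pi * 2.0 * np.sign(step) if crosses else 0.0   # rho_z * kappa * n
    lhs = log_map(x1, z)                 # Pi_{x1,x0} is the identity on S^1
    rhs = log_map(x0, z) - step + jump   # H(x0|z) = -1 on the flat circle
    return lhs, rhs


print(expansion_22(0.0, np.pi - 0.01, -np.pi + 0.01))   # geodesic crosses the cut point
print(expansion_22(0.0, 0.1, 0.3))                      # no crossing: expansion (21)
```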

In the remainder of the paper it will be convenient to use the following representation of \(\mathcal {A}_{\delta }(x)\), which is equivalent to (9):

$$\begin{aligned} \mathcal {A}_{\delta }(x)=\{ z \in {\varvec{M}}: \, \mathcal {C}_z \cap B_{\delta }(x) \ne \emptyset \}. \end{aligned}$$
(23)

To see that (9) and (23) are equivalent, note that

$$ \begin{aligned} \mathcal {A}_{\delta }(x)&=\{ z \in {\varvec{M}}: \, \mathcal {C}_z \cap B_{\delta }(x) \ne \emptyset \} \nonumber \\&=\left\{ z \in {\varvec{M}}: \, \text {there exists} \, y \, \, \text {such that} \, \, y \in \mathcal {C}_z \, \, \& \, \, y \in B_{\delta }(x) \right\} \nonumber \\&=\left\{ z \in {\varvec{M}}: \, \text {there exists}\, \, y \, \, \text {such that} \, \, z \in \mathcal {C}_y \, \, \& \, \, y \in B_{\delta }(x) \right\} \nonumber \\&= \bigcup _{y \in B_{\delta }(x)} \mathcal {C}_y, \end{aligned}$$
(24)

using the fact that \(z \in \mathcal {C}_y\) if and only if \(y \in \mathcal {C}_z\).

Proof of Theorem 1

When \(x_0\) and \(x_1\) are sufficiently close, write \(\gamma \) for the unique unit speed geodesic from \(x_0\) to \(x_1\) and \({\mathcal {N}}^*_{x_0,x_1}\) for the set defined by

$$\begin{aligned} {\mathcal {N}}^*_{x_0,x_1}=\{z\in {\varvec{M}}|\gamma (t)\in {\mathcal {H}}_z\hbox { for some }t\in (0,\rho (x_0,x_1))\}. \end{aligned}$$
(25)

It follows from condition (iv) of Theorem 1 that the volume of \({\mathcal {N}}^*_{x_0,x_1}\) is \(O(\rho (x_0,x_1))\): for \(\delta >0\) sufficiently small and \(x_1\) such that \(\rho (x_0, x_1)=\delta \), we have \(\mathcal {N}_{x_0, x_1}^{*} \subseteq \mathcal {A}_{\delta }(x_0)\), and \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta } (x_0))=O(\delta )\) as \(\delta \downarrow 0\). Similarly to \({\mathcal {N}}^*_{x_0,x_1}\), we also write \({\mathcal {A}}_{x_0,x_1}\) for the set defined by

$$\begin{aligned} {\mathcal {A}}_{x_0,x_1}=\{z\in {\varvec{M}}|\gamma (t) \in {\mathcal {C}}_z\hbox { for some }t\in (0,\rho (x_0,x_1))\}. \end{aligned}$$

Then \({\mathcal {A}}_{\rho (x_0,x_1)}(x_0)\supseteq \,{\mathcal {A}}_{x_0,x_1}\supseteq {\mathcal {N}}^*_{x_0,x_1}\).

Since \(x_0\) is the Fréchet mean of \(\mu \), \(G_{\mu }(x_0)=0\). Under the given assumption, we also have that, for \(x\in {\varvec{M}}\) in a neighbourhood of \(x_0\), \(\mu (\{\xi \in {\mathcal {C}}_x\})=0\). Thus, it follows from (21) and Proposition 2 that, for x sufficiently close to \(x_0\),

$$\begin{aligned}{} & {} \Pi _{x,x_0}(G_{\mu }(x))\\{} & {} \quad =\left( \int _{{\varvec{M}}}H(x_0|\xi )\,\hbox {d}\mu (\xi )\right) \left( \exp ^{-1}_{x_0}(x)\right) \\{} & {} \qquad +\int _{{\varvec{M}}}\rho _{\xi }(\gamma (t_{\xi }))\,\kappa (\gamma (t_{\xi })| \xi )\,\Pi _{\gamma (t_{\xi }),x_0} \left( {\varvec{n}}(\gamma (t_{\xi })| \xi )\right) \,1_{{\mathcal {N}}^*_{x_0,x}}(\xi )\,\hbox {d}\mu (\xi )\\{} & {} \qquad +\,\,\Pi _{x,x_0}\left( \int _{{\mathcal {A}}_{x_0,x} \setminus {\mathcal {N}}^*_{x_0,x}}\exp ^{-1}_x(\xi )\,1_{\{\xi \not \in {\mathcal {C}}_x\}} \hbox {d}\mu (\xi )\right) \\{} & {} \qquad -\left( \int _{{\mathcal {A}}_{x_0,x}\setminus {\mathcal {N}}^*_{x_0,x}} H(x_0|\xi )\,\hbox {d}\mu (\xi )\right) \left( \exp ^{-1}_{x_0}(x)\right) \!. \end{aligned}$$

Now

$$\begin{aligned} {\mathcal {A}}_{x_0,x}\setminus {\mathcal {N}}^*_{x_0,x}= & {} \{z\in {\varvec{M}}|\gamma (t)\in {\mathcal {Q}}_z\hbox { for some }t \in (0,\rho (x_0,x))\}\\\subseteq & {} \bigcup _{y\in B_{\rho (x_0,x)}(x_0)}{\mathcal {Q}}_y \subseteq {\mathcal {A}}_{\rho (x_0,x)}(x_0). \end{aligned}$$

On the other hand, the construction in our proof of Proposition 2 in Appendix A together with Lemma 3 of [18] imply that, for any \(\delta >0\), when \(\rho (x_0,x)\) is sufficiently small, we also have

$$\begin{aligned} \bigcup _{y\in B_{\rho (x_0,x)}(x_0)}{\mathcal {Q}}_y \subseteq \bigcup _{y\in {\mathcal {Q}}_{x_0}}B_{\delta }(y). \end{aligned}$$

However, by Lemma 1, \({\mathcal {Q}}_{x_0}\) has co-dimension at least two, so that the volume of the set on the right-hand side of the above inclusion is \(O(\delta ^2)\). Thus, from condition (iv) of Theorem 1, it follows that vol\(({\mathcal {A}}_{x_0,x} {\setminus }{\mathcal {N}}^*_{x_0,x})=O(\rho (x_0,x)^2)\). This, together with condition (v) of Theorem 1, ensures that

$$\begin{aligned} \int _{{\mathcal {A}}_{x_0,x}\setminus {\mathcal {N}}^*_{x_0,x}} H(x_0|\xi )\,\hbox {d}\mu (\xi )=o(\rho (x_0,x)), \end{aligned}$$

and since the boundedness of \({\varvec{M}}\) and Lemma 1 together imply that

$$\begin{aligned} \int _{{\mathcal {A}}_{x_0,x}\setminus {\mathcal {N}}^*_{x_0,x}} \exp ^{-1}_x(\xi )\,1_{\{\xi \not \in {\mathcal {C}}_x\}}\hbox {d}\mu (\xi )=o(\rho (x_0,x)), \end{aligned}$$

we have that

$$\begin{aligned}{} & {} \Pi _{x,x_0}(G_{\mu }(x))\\{} & {} \quad =\left( \int _{{\varvec{M}}}H(x_0|\xi )\,\hbox {d}\mu (\xi )\right) \left( \exp ^{-1}_{x_0}(x)\right) \\{} & {} \qquad +\int _{{\varvec{M}}}\rho _{\xi }(\gamma (t_{\xi }))\,\kappa (\gamma (t_{\xi })| \xi )\,\Pi _{\gamma (t_{\xi }),x_0} \left( {\varvec{n}}(\gamma (t_{\xi })| \xi )\right) \,1_{{\mathcal {N}}^*_{x_0,x}}(\xi )\,\hbox {d}\mu (\xi )\\{} & {} \qquad +\,\,o(\rho (x_0,x)). \end{aligned}$$

Thus, by the definition (14) of \(\Psi _{\mu } (x_0)\), it is sufficient to show that

$$\begin{aligned}{} & {} \int _{{\varvec{M}}}\rho _{\xi }(\gamma (t_{\xi }))\,\kappa (\gamma (t_{\xi })|\xi ) \,\Pi _{\gamma (t_{\xi }),x_0}({\varvec{n}}(\gamma (t_{\xi })|\xi )) \,1_{{\mathcal {N}}^*_{x_0,x}}(\xi )\,\hbox {d}\mu (\xi )\nonumber \\{} & {} \qquad =\left( J_{\mu }(x_0)\right) (\exp ^{-1}_{x_0}(x))+o(\rho (x_0,x)), \end{aligned}$$
(26)

where \(J_{\mu }(x_0)\) is defined by (20).

For this, we note that the functions \(\chi _i(\,\cdot | x)\) given in Corollary 1 are defined on a neighbourhood of \({\mathcal {H}}_x\). Thus, we may extend the definitions of the corresponding \(\kappa (\,\cdot | x)\) and \({\varvec{n}}(\,\cdot | x)\) given in (16) and (17) to that neighbourhood of \({\mathcal {H}}_x\). This implies that

$$\begin{aligned}{} & {} \int _{{\varvec{M}}}\rho _{\xi }(\gamma (t_{\xi }))\,\kappa (\gamma (t_{\xi })|\xi ) \,\Pi _{\gamma (t_{\xi }),x_0}({\varvec{n}}(\gamma (t_{\xi })|\xi ))\,1_{{\mathcal {N}}^*_{x_0,x}} (\xi )\,\hbox {d}\mu (\xi )\nonumber \\{} & {} \quad =\int _{{\varvec{M}}}\rho _{\xi }(x_0)\,\kappa (x_0|\xi )\,{\varvec{n}}(x_0|\xi ) \,1_{{\mathcal {N}}^*_{x_0,x}}(\xi )\,\hbox {d}\mu (\xi )+O\Big (\rho (x_0,x)^2\Big ). \end{aligned}$$
(27)

Fig. 1 Relevant points on the minimal geodesic \(\beta _z\) from \(x_0\) to z when \(z\not \in {\mathcal {C}}_{x_0}\)

To analyse the right hand side of (27) we consider, for any \(z\in {\mathcal {N}}^*_{x_0,x_1}\), the minimal unit speed geodesic \(\beta _z\) from z to \(x_0\). Extending \(\beta _z\) backwards beyond z, let \(y_z\) be the first hitting point of \({\mathcal {C}}_{x_0}\) on the extension; see Fig. 1. Let

$$\begin{aligned} {\mathcal {N}}_{x_0,x_1}=\{z\in {\varvec{M}}|\gamma (t)\in {\mathcal {H}}_z\hbox { for some } t\in (0,\rho (x_0,x_1))\hbox { and }y_z\in {\mathcal {H}}_{x_0}\}. \end{aligned}$$

Then, \({\mathcal {N}}_{x_0,x_1}\) is a Borel measurable subset of \({\mathcal {N}}^*_{x_0,x_1}\), and the volumes of \({\mathcal {N}}_{x_0,x_1}\) and \({\mathcal {N}}^*_{x_0,x_1}\) differ by \(o(\rho (x_0,x_1))\). By (11) and condition (iii) of Theorem 1, which states that, in a neighbourhood of \({\mathcal {C}}_{x_0}\), \(\mu \) is absolutely continuous with respect to the volume measure \(\textrm{vol}_{{\varvec{M}}}(\cdot )\) with continuous Radon–Nikodym derivative \(\psi \), (27) can be expressed in terms of \({\mathcal {N}}_{x_0,x}\) as

$$\begin{aligned} & \int _{{\varvec{M}}}\rho _{\xi }(\gamma (t_{\xi }))\,\kappa (\gamma (t_{\xi })|\xi ) \,\Pi _{\gamma (t_{\xi }),x_0}({\varvec{n}}(\gamma (t_{\xi })|\xi ))\,1_{{\mathcal {N}}^*_{x_0,x}} (\xi )\,\hbox {d}\mu (\xi )\\ & \quad = \int _{{\varvec{M}}}\rho _{\xi }(x_0)\,\kappa (x_0|\xi ) \,{\varvec{n}}(x_0|\xi )\,1_{{\mathcal {N}}_{x_0,x}}(\xi )\,\hbox {d}\mu (\xi )+o(\rho (x_0,x))\\ & \quad = \int _{{\mathcal {N}}_{x_0,x}}\rho _{\xi }(x_0) \,\kappa (x_0|\xi )\,{\varvec{n}}(x_0|\xi )\,\psi (\xi )\,\hbox {d}\hbox {vol} (\xi )+o(\rho (x_0,x)), \end{aligned}$$
(28)

where x is sufficiently close to \(x_0\). If we write \(u_z\) for the point on \(\beta _z\) that lies in \({\mathcal {C}}_{x_1}\), as in Fig. 1, the volume of the local cross-sectional slice of \({\mathcal {N}}_{x_0,x_1}\) at \(z\in {\mathcal {N}}_{x_0,x_1}\) can be approximated by \(\hbox {d}\hbox {vol}_{{\mathcal {H}}_{x_0}}(y_z)\,\langle {\varvec{n}}(y_z| x_0),\,\exp ^{-1}_{y_z}(u_z)\rangle \). Then, by the definition of \(\tau _{y|x}\) before (19) and the geometric interpretation of \(\tau '_{y|x}(0)\) following (19) (see also Fig. 1), we have

$$\begin{aligned} \frac{\langle {\varvec{n}}(y_z| x_0),\,\exp ^{-1}_{y_z}(u_z)\rangle }{\langle {\varvec{n}}(x_0| y_z),\,\exp ^{-1}_{x_0}(x_1)\rangle }\approx \tau '_{y_z|x_0}(0) \end{aligned}$$

since \(y_z\in {\mathcal {H}}_{x_0}\), where both \({\varvec{n}}(y_z| x_0)\) and \({\varvec{n}}(x_0| y_z)\) are chosen such that the inner products are non-negative and where \(\tau '_{y|x}(0)\) is given in (19). As usual, the notation \(a\approx b\) used above, as well as below, means that, as \(x_1\rightarrow x_0\), the left-hand and the right-hand terms have the same limit. These observations imply that, for \(z\in {\mathcal {N}}_{x_0,x_1}\),

$$\begin{aligned} \hbox {d}\hbox {vol}(z)\approx \langle {\varvec{n}}(x_0| y_z), \,\exp ^{-1}_{x_0}(x_1)\rangle \,\tau '_{y_z|x_0}(0) \,\hbox {d}\hbox {vol}_{{\mathcal {H}}_{x_0}}(y_z). \end{aligned}$$

Using this and the continuity in z of \(\rho _z(x)\), \(\kappa (x| z)\), \({\varvec{n}}(x| z)\) and \(\psi (z)\), the dominant term on the right hand side of the second equality in (28) can be expressed as

$$\begin{aligned}{} & {} \int _{{\mathcal {N}}_{x_0,x}}\rho _{\xi }(x_0)\,\kappa (x_0|\xi ) \,{\varvec{n}}(x_0|\xi )\,\psi (\xi )\,\hbox {d}\hbox {vol}(\xi )\\{} & {} \quad =\int _{{\mathcal {H}}_{x_0}}\!\!\!\!\!\rho _y(x_0) \,\kappa (x_0| y)\,{\varvec{n}}(x_0| y)\,\langle {\varvec{n}}(x_0| y), \,\exp ^{-1}_{x_0}(x)\rangle \,\tau '_{y|x_0}(0)\,\psi (y) \,\hbox {d}\hbox {vol}_{{\mathcal {H}}_{x_0}}(y)\\{} & {} \qquad +\,\,o(\rho (x_0,x)), \end{aligned}$$

where the \(o(\rho (x_0,x))\) term is due to the above approximation as well as to the fact that the volume of \({\mathcal {N}}_{x_0,x}\) is \(O(\rho (x_0,x))\). Hence, (26) follows from the definition (20) of \(J_{\mu }(x_0)\), as required. \(\square \)

4 Proof of Proposition 1 and Theorem 2

4.1 Proof of Proposition 1

For each \(n \ge 1\), let \(\hat{\xi }_n \in \mathcal {G}_n\) denote any measurable selection from \(\mathcal {G}_n\). From the strong law of large numbers in Ziezold [24], and using the assumption that \(x_0\) is the unique population mean, almost surely

$$\begin{aligned} \bigcap _{n=1}^{\infty } \overline{\bigcup _{k=n}^{\infty } \mathcal {G}_k}\subseteq \{x_0\}, \end{aligned}$$

where a horizontal line over a set indicates set closure. The first inclusion below holds because \(\hat{\xi }_k \in \mathcal {G}_k\) for every \(k\ge 1\), and therefore the set \(\mathcal {G}_0\) of limit points of the sequence \(\{\hat{\xi }_n\}\) satisfies

$$\begin{aligned} \mathcal {G}_0=\bigcap _{n=1}^{\infty } \overline{\bigcup _{k=n}^{\infty } \{\hat{\xi }_k\}}\subseteq \bigcap _{n=1}^{\infty } \overline{\bigcup _{k=n}^{\infty } \mathcal {G}_k}\subseteq \{x_0\}, \end{aligned}$$

Since \(\mathcal {G}_0 \subseteq \{x_0\}\), every accumulation point of \(\{\hat{\xi }_n\}\) equals \(x_0\). Now suppose that \(\rho (\hat{\xi }_n, x_0)\) does not converge to 0. Then there exist \(\epsilon >0\) and a subsequence of \(\{\hat{\xi }_n\}\) whose terms all satisfy \(\rho (\hat{\xi }_n, x_0)\ge \epsilon \). Since \({\varvec{M}}\) is compact, this subsequence has a further subsequence converging to some \(x_1 \in {\varvec{M}}\) with \(\rho (x_1,x_0)\ge \epsilon \). But \(x_1\) is an accumulation point of \(\{\hat{\xi }_n\}\), so \(x_1 \in \mathcal {G}_0\subseteq \{x_0\}\), a contradiction. Consequently, \(\rho (\hat{\xi }_n, x_0) \rightarrow 0\) almost surely, as required. \(\square \)

4.2 An elementary lemma

We first introduce some notation. If \(X=(X_1, \ldots , X_m)^{\top }\) and \(x=(x_1, \ldots , x_m)^{\top }\) are vectors with real components then statements such as \(\{X \le x\}\) and \(\{\vert X\vert \le x\}\) are interpreted component-wise as \(\{X_1 \le x_1, \ldots , X_m \le x_m\}\) and \(\{\vert X_1\vert \le x_1, \ldots , \vert X_m \vert \le x_m\}\), respectively. Also, denote by \(\Phi (x; \Sigma )\) the cumulative distribution function of a zero-mean multivariate Gaussian distribution with covariance matrix \(\Sigma \). The Euclidean norm, \((w^{\top } w)^{1/2}\), of a vector \(w \in \mathbb {R}^m\) is denoted \(\vert \vert w \vert \vert \). The following lemma is proved in Appendix B.

Lemma 4

Let X and Y denote \(\mathbb {R}^m\)-valued random vectors defined on an arbitrary probability space \((\Omega , \mathcal {F}, \mathbb {P})\). Let \(w \in \mathbb {R}^m\) denote a non-random vector with positive components. Then

$$\begin{aligned} \Big \vert \mathbb {P}[X+Y \le x] - \mathbb {P}[X \le x]\Big \vert \le \mathbb {P}[X \le x+w]-\mathbb {P}[X\le x-w] +2 \mathbb {P}[\{\vert Y \vert \le w\}^c]. \end{aligned}$$
(29)

Moreover, suppose that for some \(\epsilon >0\) and some \(m \times m\) covariance matrix \(\Sigma \),

$$\begin{aligned} \sup _{x \in \mathbb {R}^m} \Big \vert \mathbb {P}[X \le x] - \Phi (x;\Sigma ) \Big \vert \le \epsilon . \end{aligned}$$
(30)

Then, for a constant \(c_0>0\) depending only on \(\Sigma \),

$$\begin{aligned} \Big \vert \mathbb {P}[X+Y \le x] - \Phi (x;\Sigma )\Big \vert \le 3 \epsilon + 2 c_0 \vert \vert w \vert \vert + 2 \mathbb {P}[\{\vert Y \vert \le w\}^c]. \end{aligned}$$
(31)

Our proof of Theorem 2, in particular Step 1, makes use of this lemma.
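Inequality (29) uses only the inclusions \(\{X+Y\le x\}\subseteq \{X\le x+w\}\cup \{\vert Y\vert \le w\}^c\) and \(\{X\le x-w\}\cap \{\vert Y\vert \le w\}\subseteq \{X+Y\le x\}\), so it holds for every probability measure, in particular for the empirical measure of a simulated sample. The sketch below checks this at a few points x; the Gaussian choices for X and Y, the seed and w are arbitrary illustrations.

```python
import numpy as np


def cdf_le(A, b):
    """Empirical P[A <= b] with the component-wise ordering used in Lemma 4."""
    return np.mean(np.all(A <= b, axis=1))


def lemma4_sides(X, Y, x, w):
    """Left- and right-hand sides of (29) under the empirical measure."""
    lhs = abs(cdf_le(X + Y, x) - cdf_le(X, x))
    tail = np.mean(~np.all(np.abs(Y) <= w, axis=1))   # P[{|Y| <= w}^c]
    rhs = cdf_le(X, x + w) - cdf_le(X, x - w) + 2.0 * tail
    return lhs, rhs


rng = np.random.default_rng(2)
n, m = 50_000, 2
X = rng.standard_normal((n, m))
Y = 0.05 * rng.standard_normal((n, m))
w = np.full(m, 0.1)
for x in (np.zeros(m), np.array([0.5, -0.3]), np.array([-1.0, 1.0])):
    lhs, rhs = lemma4_sides(X, Y, x, w)
    print(f"x={x}: lhs={lhs:.4f} <= rhs={rhs:.4f}")
```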

4.3 Proof of Theorem 2

The proof of Theorem 2 is broken into two steps. In the first step we explain how Lemma 4 will be applied. In the subsequent step, we explain how to make the right-hand side of (31) arbitrarily small uniformly for all \(x \in \mathbb {R}^m\) and therefore the CLT in Theorem 2 will have been proved.

Step 1. Application of Lemma 4.

Write

$$\begin{aligned} G_{\hat{\mu }_n}(x)=\frac{1}{n}\sum _{i=1}^n \exp _{x}^{-1}(\xi _i)1_{\{\xi _i \notin \mathcal {C}_x\}}=\int _{{\varvec{M}}}\exp _{x}^{-1}(\xi )1_{\{\xi \notin \mathcal {C}_x\}}\hbox {d}\hat{\mu }_n(\xi ), \end{aligned}$$
(32)

where \(\hat{\mu }_n\) is the empirical distribution on \({\varvec{M}}\) based on the random sample \(\xi _1, \ldots , \xi _n\), and define the vector field \(\{Z_n(x): \, x \in {\varvec{M}}\}\) on \({\varvec{M}}\) by

$$\begin{aligned} Z_n(x)=\sqrt{n}\left[ G_{\hat{\mu }_n}(x) - G_{\mu }(x) \right] , \end{aligned}$$
(33)

where \(G_{\mu } (x)\) is defined in (3). Under the conditions of Theorems 1 and 2, the population Fréchet mean \(x_0\) is a stationary minimum of (1) and, in particular,

$$\begin{aligned} G_{\mu }(x_0)= \int _{{\varvec{M}}} \exp _{x_0}^{-1}(\xi ) 1_{\{\xi \notin \mathcal {C}_{x_0}\}}\hbox {d}\mu (\xi ) = 0, \end{aligned}$$
(34)

i.e. the zero element in \(\mathcal {T}_{x_0}({\varvec{M}})\), which follows from integrating (5) over \({\varvec{M}}\) with respect to the probability measure \(\mu \) and putting \(x=x_0\). Hence

$$\begin{aligned} Z_n(x_0)=\sqrt{n}G_{\hat{\mu }_n}(x_0). \end{aligned}$$
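On \(S^1\) the empirical field in (32) can be written down explicitly, which gives a simple way to compute a sample Fréchet mean. Between consecutive antipodes \(\xi _i+\pi \) of the data, \(G_{\hat{\mu }_n}\) is affine with slope \(-1\), so each such arc contains at most one zero; and since the sample Fréchet function has a concave kink at each \(\xi _i+\pi \), its minimisers lie inside these arcs and are zeros of \(G_{\hat{\mu }_n}\) there. The sketch below enumerates those candidates; it is a hypothetical \(O(n^2)\) illustration, not an algorithm from the text.

```python
import numpy as np


def wrap(u):
    """Map angles to (-pi, pi]."""
    return np.pi - np.mod(np.pi - u, 2.0 * np.pi)


def G_hat(x, xi):
    """Empirical G in (32) on S^1: the mean of exp_x^{-1}(xi_i) = wrap(xi_i - x)."""
    return np.mean(wrap(xi - x))


def circular_frechet_mean(xi):
    """Minimiser of the sample Frechet function on S^1, via zeros of G_hat on each arc."""
    xi = np.asarray(xi, dtype=float)
    cuts = np.sort(np.mod(xi + np.pi, 2.0 * np.pi))          # antipodes of the data
    gaps = np.diff(np.append(cuts, cuts[0] + 2.0 * np.pi))  # arc lengths between cuts
    best_x, best_f = None, np.inf
    for start, gap in zip(cuts, gaps):
        m = start + gap / 2.0                  # arc midpoint
        x_star = m + G_hat(m, xi)              # G_hat has slope -1 on the arc
        if abs(x_star - m) <= gap / 2.0:       # the zero lies inside this arc
            f = np.mean(wrap(xi - x_star) ** 2)
            if f < best_f:
                best_x, best_f = x_star, f
    return wrap(best_x)


xi = np.array([np.pi - 0.3, np.pi - 0.1, -np.pi + 0.1, -np.pi + 0.4])
print(circular_frechet_mean(xi))   # close to -pi + 0.025, i.e. pi + 0.025 on the circle
```

The returned point is a zero of `G_hat`, consistent with a sample Fréchet mean being a zero of \(G_{\hat{\mu }_n}\).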

Denote the Euclidean norm (which is the induced Riemannian tangent space norm) on \(\mathcal {T}_{x_0}({\varvec{M}})\) by \(\vert \vert \cdot \vert \vert \). Since \(\vert \vert \exp _{x_{0}}^{-1}(\xi )1_{\{\xi \notin \mathcal {C}_{x_0}\}}\vert \vert \) is bounded over \(\xi \in {\varvec{M}}\), \(Z_n(x_0)\) satisfies a central limit theorem in the tangent space, i.e.

$$\begin{aligned} Z_n(x_0)=\frac{1}{\sqrt{n}}\sum _{i=1}^n \exp _{x_0}^{-1}(\xi _i)1_{\{\xi _i \notin \mathcal {C}_{x_0}\}} \overset{d}{\rightarrow }\mathfrak {N}_m(0_m,V_0), \end{aligned}$$
(35)

where \(V_0=\text {Cov}\{\exp _{x_0}^{-1}(\xi _1)1_{\{\xi _1 \notin \mathcal {C}_{x_0}\}}\}=\text {Cov}\{\exp _{x_0}^{-1}(\xi _1)\}\).

Moreover, \(\hat{\xi }_n \in \mathcal {G}_n\), which is assumed to be a measurable selection from \(\mathcal {G}_n\) as in Proposition 1, satisfies (4) and consequently,

$$\begin{aligned} Z_n(\hat{\xi }_n)&=\sqrt{n} [G_{\hat{\mu }_n}(\hat{\xi }_n) -G_{\mu }(\hat{\xi }_n)] \nonumber \\&=- \sqrt{n}G_{\mu }(\hat{\xi }_n). \end{aligned}$$
(36)

Define

$$\begin{aligned} T_1=\Pi _{\hat{\xi }_n, x_0}Z_n(\hat{\xi }_n) - Z_n(x_0) \end{aligned}$$
(37)

Then, using (37), (36) and Theorem 1, it is seen that

$$\begin{aligned} T_1&=\Pi _{\hat{\xi }_n, x_0} Z_n(\hat{\xi }_n)-Z_n(x_0)\nonumber \\&=-\sqrt{n} \Pi _{\hat{\xi }_n, x_0}G_{\mu }(\hat{\xi }_n)-Z_n(x_0) \nonumber \\&=\sqrt{n} \Psi _{\mu }(x_0)\exp _{x_0}^{-1}(\hat{\xi }_n) -\sqrt{n}R(\hat{\xi }_n, x_0) - Z_n(x_0). \end{aligned}$$
(38)

Since, by assumption (vi) of Theorem 2, \(\Psi _{\mu }(x_0)\) has full rank, it follows that

$$\begin{aligned} \sqrt{n}\exp _{x_0}^{-1}(\hat{\xi }_n)&=\Psi _{\mu }(x_0)^{-1}Z_n (x_0) +\Psi _{\mu }(x_0)^{-1} \left\{ T_1 + \sqrt{n}R (\hat{\xi }_n, x_0)\right\} \nonumber \\&=X+Y, \end{aligned}$$
(39)

where

$$\begin{aligned} X=\Psi _{\mu }(x_0)^{-1}Z_n(x_0) \end{aligned}$$
(40)

and

$$\begin{aligned} Y=\Psi _{\mu }(x_0)^{-1} \left\{ T_1 + \sqrt{n}R(\hat{\xi }_n, x_0)\right\} \!. \end{aligned}$$
(41)

To establish Theorem 2, we apply Lemma 4 with X and Y defined in (40) and (41), respectively. Since, from (35), we know that \(Z_n(x_0) \overset{d}{\rightarrow }\mathfrak {N}_m(0_m, V_0)\), it follows that \(\Psi _{\mu }(x_0)^{-1}Z_n(x_0)\) is asymptotically normal with zero mean vector and covariance matrix \(\Psi _{\mu }(x_0)^{-1} V_0\left\{ \Psi _{\mu }(x_0)^{\top }\right\} ^{-1}\). Moreover, as \(n \rightarrow \infty \), a suitable sequence of w’s such that \(\vert \vert w \vert \vert \rightarrow 0\) and \(\mathbb {P}[\{\vert Y \vert \le w\}^c] \rightarrow 0\) can always be found provided all components of Y converge to 0 in probability. Consequently, to complete the proof of Theorem 2, it suffices to show that \(\vert \vert Y \vert \vert \overset{p}{\rightarrow }0\), which is done in Step 2.
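The sandwich covariance \(\Psi _{\mu }(x_0)^{-1} V_0\{\Psi _{\mu }(x_0)^{\top }\}^{-1}\) can be checked numerically in a toy setting, assuming \({\varvec{M}}=S^1\) and a von Mises law centred at \(x_0=0\) with density f with respect to arc length. On \(S^1\) parallel transport is the identity and, differentiating \(G_{\mu }(x)=\int \exp _x^{-1}(\xi )\,\hbox {d}\mu (\xi )\) while accounting for the jump of the wrapped angle at the cut point \(x+\pi \), one finds \(G_{\mu }(x) = -\{1-2\pi f(x_0+\pi )\}\exp _{x_0}^{-1}(x) + o(\rho (x,x_0))\). So \(\Psi _{\mu }(x_0)=1-2\pi f(x_0+\pi )\), and the limiting variance of \(\sqrt{n}\exp _{x_0}^{-1}(\hat{\xi }_n)\) is \(V_0/\Psi _{\mu }(x_0)^2\), noticeably larger than \(V_0\) when mass sits near the cut locus. The sketch finds the sample Fréchet mean by comparing the n candidates allowed by the first-order condition (a circle-specific shortcut, not a general algorithm); the function names are hypothetical. At n = 100 the agreement is only approximate, because the cut-locus term converges slowly.

```python
import math
import random

TAU = 2.0 * math.pi

def wrap(a):
    """Representative of an angle in [-pi, pi); exp_x^{-1}(xi) = wrap(xi - x) on S^1."""
    return ((a + math.pi) % TAU) - math.pi

def sample_frechet_mean(theta):
    """Intrinsic sample Frechet mean on S^1 (first minimiser found if not unique).

    A minimiser x has sum_i wrap(theta_i - x) = 0, so x = mean(theta) + TAU*k/n
    (mod TAU) for some integer k; it suffices to compare these n candidates.
    """
    n = len(theta)
    base = sum(theta) / n
    best_x, best_f = 0.0, math.inf
    for k in range(n):
        x = wrap(base + TAU * k / n)
        f = sum(wrap(t - x) ** 2 for t in theta)
        if f < best_f:
            best_x, best_f = x, f
    return best_x

def asymptotic_constants(kappa, grid=20_000):
    """(V_0, Psi) for von Mises(0, kappa) with x0 = 0, by the midpoint rule."""
    h = TAU / grid
    xs = [-math.pi + (k + 0.5) * h for k in range(grid)]
    w = [math.exp(kappa * math.cos(x)) for x in xs]
    z = sum(w) * h                                   # normaliser, 2*pi*I0(kappa)
    v0 = sum(x * x * wk for x, wk in zip(xs, w)) * h / z
    psi = 1.0 - TAU * math.exp(-kappa) / z           # 1 - 2*pi*f(x0 + pi)
    return v0, psi

def mc_variance(kappa, n, reps, seed=0):
    """Monte Carlo variance of sqrt(n) * exp_{x0}^{-1}(sample mean) with x0 = 0."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        theta = [rng.vonmisesvariate(0.0, kappa) for _ in range(n)]
        vals.append(math.sqrt(n) * sample_frechet_mean(theta))
    m = sum(vals) / reps
    return sum((v - m) ** 2 for v in vals) / (reps - 1)

if __name__ == "__main__":
    v0, psi = asymptotic_constants(1.0)
    print("V_0 =", v0, " Psi =", psi, " V_0/Psi^2 =", v0 / psi ** 2)
    print("Monte Carlo, n = 100:", mc_variance(1.0, n=100, reps=300))
```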

Step 2. Show that \(\vert \vert Y \vert \vert \overset{p}{\rightarrow }0\).

In Step 2 we first show that \(\vert \vert T_1 \vert \vert \overset{p}{\rightarrow }0\), where \(T_1\) is defined in (37). Then we deduce that \(\vert \vert Y \vert \vert \overset{p}{\rightarrow }0\), where Y is defined in (41). To establish the result for \(T_1\) we shall make use of results from empirical process theory. A key step is to approximate

$$\begin{aligned} \mathbb {E}\left[ \text {tr} \left\{ \left( \Pi _{x,x_0}Z_n(x) - \Pi _{y,x_0} Z_n(y) \right) ^{\top } \left( \Pi _{x,x_0}Z_n(x) - \Pi _{y,x_0}Z_n(y) \right) \right\} \right] \!. \end{aligned}$$
(42)

Since \(\Pi _{x,x_0}\) and \(\Pi _{y,x_0}\) are vector space isomorphisms from \(\mathcal {T}_x({\varvec{M}})\) to \(\mathcal {T}_{x_0}({\varvec{M}})\) and from \(\mathcal {T}_y({\varvec{M}})\) to \(\mathcal {T}_{x_0}({\varvec{M}})\), respectively, it follows from the definition of \(Z_n(x)\) in (33) that \(\Pi _{x,x_0} Z_n(x) - \Pi _{y,x_0}Z_n(y)\) in (42) equals \(n^{-1/2}\) times a sum of IID terms \(q_x(\xi _i)-q_y(\xi _i)\), where

$$\begin{aligned} q_x(\xi _i) =\Pi _{x,x_0} \exp _x^{-1} (\xi _i)1_{\{\xi _i \notin \mathcal {C}_x\}} - \Pi _{x,x_0}G_{\mu }(x), \end{aligned}$$

for \(i=1, \ldots , n\), with a similar definition for \(q_y(\xi _i)\), and with \(q_x(\xi _i), q_y(\xi _i) \in \mathcal {T}_{x_0}({\varvec{M}})\). Since these terms are independent with mean zero, the cross terms vanish in expectation, and it follows that (42) is equal to

$$\begin{aligned} \int _{{\varvec{M}}} \left\{ q_x(\xi )-q_y(\xi )\right\} ^{\top } \left\{ q_x (\xi ) - q_y(\xi )\right\} \hbox {d}\mu (\xi ). \end{aligned}$$
(43)

It is also assumed below that \(x,y \in B_{\delta _0}(x_0)\), the open ball in \({\varvec{M}}\) of radius \(\delta _0\) centred at \(x_0\), where \(0<\delta _0 <\delta \). Here \(x_0 \in {\varvec{M}}\) is the population Fréchet mean of \(\mu \), assumed to be unique, and, using condition (iii) of Theorem 1, \(\delta \) has been chosen sufficiently small for \(\mu \) to be absolutely continuous on \(\mathcal {A}_{\delta } (x_0)\), which is defined in (9).

We split the domain of integration in (43) as

$$\begin{aligned} \int _{{\varvec{M}}} = \int _{\mathcal {A}_{\delta }(x_0)} + \int _{{\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)}. \end{aligned}$$

For \(\xi \in {\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)\), using (21) and Theorem 1, we have

$$\begin{aligned} q_x(\xi ) - q_y(\xi )&= \Pi _{x,x_0}\{\exp _x^{-1}(\xi )1_{\{\xi \notin \mathcal {C}_x\}} - G_{\mu }(x)\} \\&\quad -\Pi _{y,x_0}\{\exp _y^{-1}(\xi )1_{\{\xi \notin \mathcal {C}_y\}}-G_{\mu }(y)\} \\&=H(x_0\vert \xi ) \{\exp _{x_0}^{-1}(x) - \exp _{x_0}^{-1}(y)\} + R_1(x_0,x) - R_1(x_0,y)\\&\quad + \Psi _{\mu }(x_0) \{\exp _{x_0}^{-1}(x) -\exp _{x_0}^{-1}(y)\} -R(x,x_0)+ R(y,x_0); \end{aligned}$$

and also, since \({\varvec{M}}\) is smooth and compact, it follows that

$$\begin{aligned} \vert \vert R(x,x_0) - R(y,x_0)\vert \vert = O(\rho (x,y)) \quad \text {and} \quad \vert \vert R_1(x_0,x)-R_1(x_0,y)\vert \vert = O(\rho (x,y)). \end{aligned}$$

Consequently, for \(\xi \in {\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)\),

$$\begin{aligned} \vert \vert q_x(\xi ) - q_y(\xi ) \vert \vert = O(\rho (x,y)). \end{aligned}$$
(44)

Therefore, since (44) holds uniformly for \(\xi \in {\varvec{M}}{\setminus } \mathcal {A}_{\delta }(x_0)\) by compactness, connectedness and smoothness of \({\varvec{M}}\), it follows that

$$\begin{aligned} \int _{{\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)} \{q_x(\xi ) -q_y(\xi )\}^{\top } \{q_x(\xi ) - q_y(\xi ) \} \hbox {d}\mu (\xi ) =O(\rho (x,y)^2). \end{aligned}$$
(45)

To approximate the integral of \(\{q_x(\xi ) -q_y(\xi )\}^{\top }\{ q_x(\xi )-q_y(\xi ) \}\) on the set \(\mathcal {A}_{\delta }(x_0)\), we use the following facts: recall that \(\delta >0\) has been chosen sufficiently small so that the Radon–Nikodym derivative of \(\mu \) has a continuous version on \(\mathcal {A}_{\delta }(x_0)\) (see assumption (iii) of the theorems); the Riemannian volume of \(\mathcal {A}_{\delta }(x_0)\) satisfies \(\text {vol}_{{\varvec{M}}}(\mathcal {A}_{\delta }(x_0))=O(\rho (x,y))\) (see immediately below (25)); and \(q_x(\xi )\) is bounded on \({\varvec{M}}\). As a consequence of these facts,

$$\begin{aligned} \int _{\mathcal {A}_{\delta }(x_0)} \text {tr}\left[ \{ q_x(\xi ) - q_y(\xi ) \}^{\top } \{q_x(\xi ) - q_y(\xi )\} \right] \hbox {d}\mu (\xi ) =O(\rho (x,y)). \end{aligned}$$
(46)

Consequently, for x and y such that \(\rho (x_0,x) \rightarrow 0\) and \(\rho (x_0,y) \rightarrow 0\),

$$\begin{aligned}&\sqrt{\int _{{\varvec{M}}} \text {tr}\left[ \left\{ q_x(\xi )-q_y(\xi ) \right\} ^{\top } \left\{ q_x (\xi ) - q_y(\xi )\right\} \right] \hbox {d}\mu (\xi )} = \sqrt{\int _{\mathcal {A}_{\delta }(x_0)} + \int _{{\varvec{M}}\setminus \mathcal {A}_{\delta }(x_0)} }\nonumber \\&\quad = \sqrt{O(\rho (x,y)) +O( \rho (x,y)^2)} \nonumber \\&\quad =O \Big \{\rho (x,y)^{1/2} \Big \}. \end{aligned}$$
(47)
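The rate \(\rho (x,y)^{1/2}\) in (47), rather than \(\rho (x,y)\), comes from the region near the cut loci, and a one-dimensional example makes this visible. Assume, for this sketch only, \({\varvec{M}}=S^1\), \(\mu \) von Mises centred at 0, \(x=x_0=0\) and y a small positive angle. Parallel transport is then the identity, so \(q_x-q_y = d-\mathbb {E}[d]\) with \(d(\xi )=\exp _x^{-1}(\xi )-\exp _y^{-1}(\xi )\), and d takes only two values: y on most of the circle and \(y-2\pi \) on the arc between the cut points \(x+\pi \) and \(y+\pi \), whose probability is \(p\approx f(\pi )y\). Hence \(\int \{q_x-q_y\}^2\hbox {d}\mu = 4\pi ^2p(1-p)\approx 4\pi ^2 f(\pi )\,y\), of order \(\rho (x,y)\) as in (46). The hypothetical helper below evaluates this norm by quadrature.

```python
import math

TAU = 2.0 * math.pi

def wrap(a):
    """Angle representative in [-pi, pi); exp_x^{-1}(xi) = wrap(xi - x) on S^1."""
    return ((a + math.pi) % TAU) - math.pi

def l2_increment(kappa, m, grid=100_000):
    """Return (||q_x - q_y||_{L2(mu)}, y) for x = 0, y = m grid cells, mu = von Mises(0, kappa).

    Midpoint rule on [-pi, pi); y is a whole number of cells, so the arc between
    the cut points x + pi and y + pi contains exactly m quadrature nodes.
    """
    h = TAU / grid
    y = m * h
    w_sum = s1 = s2 = 0.0
    for k in range(grid):
        xi = -math.pi + (k + 0.5) * h
        w = math.exp(kappa * math.cos(xi))   # unnormalised von Mises density
        d = wrap(xi) - wrap(xi - y)          # exp_x^{-1}(xi) - exp_y^{-1}(xi)
        w_sum += w
        s1 += w * d
        s2 += w * d * d
    mean = s1 / w_sum
    return math.sqrt(s2 / w_sum - mean * mean), y   # sqrt of Var_mu(d)

if __name__ == "__main__":
    for m in (256, 64, 16):
        norm, y = l2_increment(1.0, m)
        print(f"y={y:.2e}  norm/y={norm / y:8.3f}  norm/sqrt(y)={norm / math.sqrt(y):.5f}")
```

As y shrinks, norm/y keeps growing (so no Lipschitz bound holds), while norm/sqrt(y) stays close to \(2\pi \sqrt{f(\pi )}\approx 1.351\) for concentration 1.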

The relevant class of functions here is

$$\begin{aligned} \mathcal {F}_{\delta _0} =\{q_x(\cdot ): \, q_x: {\varvec{M}}\rightarrow \mathcal {T}_{x_0}({\varvec{M}}), \, \rho (x,x_0)<\delta _0\}, \end{aligned}$$

where \(0<\delta _0<\delta \) is chosen to be sufficiently small. Note that, by construction, these functions take values in \(\mathcal {T}_{x_0}({\varvec{M}})\), the tangent space of the population Fréchet mean \(x_0\). Since this is a Euclidean space we may apply the standard theorems of empirical process theory to the function class \(\mathcal {F}_{\delta _0}\). Using (47) and the fact that \({\varvec{M}}\) is compact, it follows that the integrals in Theorem 2.5.6 of van der Vaart and Wellner [22] are both finite, so that \(\mathcal {F}_{\delta _0}\) is a Donsker class. By Theorem 3.34 of Dudley [8], the Donsker property is sufficient to guarantee asymptotic equicontinuity at \(x_0\), which in turn implies that

$$\begin{aligned} \Big \vert \Big \vert \Pi _{\hat{\xi }_n,x_0}Z_n(\hat{\xi }_n)-Z_n(x_0)\Big \vert \Big \vert = o_p(1). \end{aligned}$$
(48)

To justify (48), we argue as follows. The relevant empirical process in terms of the \(q_x\) functions is

$$\begin{aligned} \frac{1}{\sqrt{n}} \sum _{i=1}^n q_x(\xi _i) = \Pi _{x,x_0} Z_n(x),\quad x \in {\varvec{M}}, \end{aligned}$$

where \(Z_n(x)\) is defined in (33), the parallel transport map \(\Pi _{x,x_0}\) is defined in the statement of Theorem 1 and, by definition, the \(q_x\) functions have already been centred so that \(\mathbb {E}[q_x(\xi _i)]=0 \in \mathcal {T}_{x_0}({\varvec{M}})\). Asymptotic equicontinuity implies that, for any \(\eta >0\) and \(\epsilon >0\), there exists a neighbourhood \(V \subset {\varvec{M}}\) of \(x_0 \in {\varvec{M}}\) such that

$$\begin{aligned} \limsup _{n \rightarrow \infty } \mathbb {P}\left[ \sup _{x \in V} \Big \vert \Big \vert \Pi _{x,x_0}Z_n(x) -Z_n(x_0) \Big \vert \Big \vert >\eta \right] <\epsilon , \end{aligned}$$

where \(\Pi _{x_0, x_0}\) is taken to be the identity. Moreover, by Proposition 1,

$$\begin{aligned} \mathbb {P}[\hat{\xi }_n \in V] \rightarrow 1 \end{aligned}$$

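as \(n \rightarrow \infty \), where \(\hat{\xi }_n\) is a measurable selection of the sample Fréchet mean. The result (48) now follows by a standard argument: since \(\mathbb {P}[\vert \vert \Pi _{\hat{\xi }_n,x_0}Z_n(\hat{\xi }_n)-Z_n(x_0)\vert \vert >\eta ] \le \mathbb {P}[\sup _{x \in V} \vert \vert \Pi _{x,x_0}Z_n(x)-Z_n(x_0)\vert \vert >\eta ] + \mathbb {P}[\hat{\xi }_n \notin V]\), the limit superior of the left-hand side is less than \(\epsilon \), and \(\eta >0\) and \(\epsilon >0\) are arbitrary.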

Since \(T_1\) in (37) is precisely the quantity inside the norm in (48), we have proved that \(\vert \vert T_1 \vert \vert \overset{p}{\rightarrow }0\). One further comment: most results in the literature on empirical process theory, including van der Vaart and Wellner [22] and Dudley [8], are stated for classes of real-valued functions. The generalisation to \(\mathbb {R}^m\)-valued functions, with m finite, is straightforward: in the present context, we simply prove that each component is \(o_p(1)\), which follows immediately from the calculations given above.

We now complete the proof of Step 2. Recall the first equality in (39). Since, from Theorem 1,

$$\begin{aligned} R(\hat{\xi }_n, x_0) = o(\rho (\hat{\xi }_n, x_0))=o\left( \Big \vert \Big \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\Big \vert \Big \vert \right) \!, \end{aligned}$$

and since \(\rho (\hat{\xi }_n,x_0) \overset{p}{\rightarrow }0\) by Proposition 1, it follows that

$$\begin{aligned} \sqrt{n}R(\hat{\xi }_n,x_0)=o_p\left( \sqrt{n} \Big \vert \Big \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\Big \vert \Big \vert \right) \!. \end{aligned}$$

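Moreover, it has already been shown that \(\vert \vert T_1 \vert \vert \overset{p}{\rightarrow }0\), and we know from condition (v) of Theorem 2 that \(\Psi _{\mu }(x_0)^{-1}\) is a fixed matrix with bounded elements and that \(\vert \vert \Psi _{\mu }(x_0)^{-1}Z_n(x_0)\vert \vert =O_p(1)\) due to the central limit theorem. Taking norms in the first equality in (39) therefore gives \(\sqrt{n}\vert \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\vert \vert \le O_p(1) + o_p(1) + o_p\big (\sqrt{n}\vert \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\vert \vert \big )\), that is, \(\{1-o_p(1)\}\sqrt{n}\vert \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\vert \vert \le O_p(1)\). Consequently, \(\sqrt{n} \vert \vert \exp _{x_0}^{-1}(\hat{\xi }_n)\vert \vert = O_p(1)\). Hence \(\sqrt{n}\vert \vert R(\hat{\xi }_n,x_0)\vert \vert =o_p(1)\) and therefore \(\vert \vert Y \vert \vert =o_p(1)\) as claimed. \(\square \)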