1 Introduction

The hydrodynamic limit of interacting particle systems is a well-established subject. A large variety of parabolic equations (such as the non-linear heat equation) and hyperbolic conservation laws have been obtained from microscopic stochastic particle systems; see De Masi and Presutti [7], Kipnis and Landim [13], Seppäläinen [17] for overviews. Usually the setting is that the particles of the underlying system move on the lattice \(\mathbb {Z}^d\), and after rescaling the limiting partial differential equation is defined on \(\mathbb {R}^d\), or on a subdomain of \(\mathbb {R}^d\) such as an interval, in which case equations with boundary conditions at the ends of the interval are derived (e.g. Dirichlet boundary conditions when the system is coupled at the left and right ends to reservoirs fixing the density of particles, see Gonçalves [10]).

Motivated by, e.g., the study of the motion of proteins in a cell membrane, or more generally the motion of particles on curved interfaces, it is clear that there are many relevant physical systems whose macroscopic motion takes place on a Riemannian manifold rather than on Euclidean space. The aim of this paper is to provide first steps in this direction, by considering the simplest interacting particle system on a suitable discretization of a Riemannian manifold and proving its hydrodynamic limit. The symmetric exclusion process is a well-known and well-studied interacting particle system for which, in the standard setting, it is rather straightforward to obtain the hydrodynamic limit using duality. Duality allows one to translate the one-particle scaling limit, i.e., the fact that the rescaled single-particle position converges to Brownian motion, into the fact that the hydrodynamic limit of the particle system is the diffusion equation. Another manifestation of duality is the fact that the microscopic equation for the expectation of the density field is already a closed equation. We consider the symmetric exclusion process on a suitable discretization (a notion defined more precisely below) of a compact Riemannian manifold and prove that its empirical density field, after appropriate rescaling, converges to the solution of the heat equation on the manifold. To obtain this result, we start in Sect. 2 by studying the invariance principle for a class of geodesic random walks, thereby extending earlier results of Jørgensen [12]. These random walks are shown to converge to Brownian motion via the technique of generator convergence. Next, in Sect. 3, we define a notion of “uniformly approximating grids” and show that choosing N points uniformly on the manifold and connecting them via a kernel depending on the Riemannian distance yields a weighted graph such that the corresponding random walk converges (as the number of random points tends to infinity) to a geodesic random walk, which in turn scales to Brownian motion. We also formulate abstract conditions on approximating grids ensuring the convergence of the weighted random walk to Brownian motion. In particular, convergence of the empirical distribution to the normalized Riemannian volume in Kantorovich distance is shown to be sufficient, i.e. we show that in that setting weights can be chosen such that the corresponding random walk converges to Brownian motion. We give several examples of such suitable grids. Finally, in Sect. 4, we define the exclusion process on such suitable grids (defined in Sect. 3) and show that its empirical density converges to the solution of the heat equation, following the proof from Seppäläinen [17].

2 The Invariance Principle for a Class of Geodesic Random Walks

Let M be an n-dimensional, compact and connected Riemannian manifold. Then we know that M is complete and hence geodesically complete. The main purpose of this section is to define the geodesic random walk and to show that it approximates Brownian motion when appropriately rescaled (in time and space). Such random walks and this so-called invariance principle have been studied before (Jørgensen [12] and, in a special case, Blum [4]). However, we will directly obtain results that are tailor-made for application in Sect. 3. In particular, we will obtain general assumptions on the jumping distributions of the geodesic random walk for it to converge to Brownian motion. In Sect. 2.1, we define the geodesic random walk and show convergence of the generators to the generator of Brownian motion under certain assumptions on the jumping distributions. Section 2.2 is devoted to finding out which distributions satisfy these assumptions.

2.1 Convergence of the Generators

The process

Let \(\{\mu _p,p\in M\}\) be a collection of positive, finite measures where each \(\mu _p\) is a measure on \(T_pM\). The measure \(\mu _p\) represents the rate to jump in a particular direction of \(T_pM\). More precisely, the Markov process \(X^N=\{X^N_t,t\ge 0\}\) associated to \(\{\mu _p,p\in M\}\) has generator

$$\begin{aligned} L_Nf(p)=\int _{T_pM} \left[ f(p(1/N,\eta ))-f(p)\right] \mu _p(\mathrm {d}\eta ), \end{aligned}$$

where for a vector \(\xi \in T_pM\) we denote the geodesic through p with tangent vector \(\xi \) at p by \(p(\cdot ,\xi )\). We denote the corresponding semigroup by

$$\begin{aligned} S^N_tf(p)=\mathbb {E}_pf(X^N_t). \end{aligned}$$

Both of these have as their domain the space C(M) of continuous functions on the manifold.

We interpret this process as follows. When the process \(X^N\) is at a point p, it chooses a random direction \(\eta \) from \(T_pM\) with rates given by \(\mu _p\) (i.e. it waits for an exponential time with rate \(\mu _p(T_pM)\) and then independently picks a vector according to the probability distribution \(\frac{\mu _p}{\mu _p(T_pM)}\)). Then the process jumps to the position \(p(1/N,\eta )\) that is reached by following the geodesic through p in the direction of \(\eta \) for time \(\frac{1}{N}\). This situation is sketched in Fig. 1. We assume that choosing random directions happens independently. In this section we will specify restrictions that the measures \(\mu _p\) should satisfy. Later (in Sect. 2.2), we will show that we can take \(\mu _p\) to be for instance the uniform distribution on the unit tangent vectors at p.

Fig. 1

Left: geodesic random walk on a sphere. Right: Brownian motion on a sphere (Source: https://en.wikipedia.org/wiki/Brownian_motion)
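To make the dynamics concrete, here is a minimal simulation sketch (ours, not part of the original text) of a geodesic random walk on the unit sphere \(S^2\), as in the left panel of Fig. 1. We assume numpy; geodesics are great circles, and we take \(\mu _p\) uniform on tangent vectors of norm \(\sqrt{2}\), a choice justified in Sect. 2.2. All function names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def geodesic_step(p, eta, t):
    """Follow the great circle through p with initial velocity eta for time t
    (p a unit vector in R^3, eta a tangent vector at p)."""
    s = np.linalg.norm(eta)
    return np.cos(s * t) * p + np.sin(s * t) * eta / s

def random_tangent(p, radius):
    """Sample a uniform direction in T_p S^2, rescaled to the given norm."""
    v = rng.standard_normal(3)
    v -= v.dot(p) * p                     # project onto the tangent plane at p
    return radius * v / np.linalg.norm(v)

def geodesic_walk(N, t_end, p=np.array([0.0, 0.0, 1.0])):
    """Jump at rate 1; each jump follows a geodesic for time 1/N."""
    for _ in range(rng.poisson(t_end)):
        p = geodesic_step(p, random_tangent(p, np.sqrt(2)), 1.0 / N)
    return p

# Position of X^N at time N^2, i.e. the rescaled walk at macroscopic time 1:
print(geodesic_walk(N=100, t_end=100 ** 2))
```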

The Case \(\mathbb {R}^n\)

Before we go into the general case, we illustrate the above in \(\mathbb {R}^n\). In \(\mathbb {R}^n\) the exponential map is simply addition if we identify \(T_p\mathbb {R}^n\) with \(\mathbb {R}^n\) itself. So in that case from a point p the process moves to \(p(1/N,\eta )=p+\frac{1}{N}\eta \), where \(\eta \) is chosen from \(T_p\mathbb {R}^n=\mathbb {R}^n\) randomly. This means that the discrete-time jumping process, when jumping as described above, can be written as \(S^N_m=\sum _{i=1}^m\frac{1}{N}\eta _i=\frac{1}{N}\sum _{i=1}^m\eta _i\), where \(\eta _j\) is drawn from \(T_{S^N_{j-1}}\mathbb {R}^n=\mathbb {R}^n\) according to some distribution. Now let \(\{N_t,t\ge 0\}\) be a Poisson process with rate one and define \(X^N_t=S^N_{N_t}\). Then \(X^N\) makes the same jumps as \(S^N\), but after independent exponential times. We see that \(X^N=\{X^N_t,t\ge 0\}\) satisfies the description above. Now the invariance principle tells us that under some conditions on the jumping distributions \(X^N_{tN^2}\rightarrow B_t\) in distribution as N goes to infinity, where B is Brownian motion. We show the analogous result in the more general setting of a manifold.
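The invariance principle in \(\mathbb {R}^n\) can be checked numerically. The following sketch (our illustration; numpy assumed) uses steps \(\eta \) uniform on the sphere of radius \(\sqrt{n}\), so that \(\mathrm {Cov}(\eta ^i,\eta ^j)=\delta ^i_j\), and estimates the covariance of \(X^N_{tN^2}\), which should be close to \(t\cdot I\).

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_steps(m, n, radius):
    """m i.i.d. vectors, uniform on the sphere of the given radius in R^n."""
    v = rng.standard_normal((m, n))
    return radius * v / np.linalg.norm(v, axis=1, keepdims=True)

def rescaled_walk(N, t, n=2):
    """X^N_{t N^2}: a Poisson(t N^2) number of jumps of size eta / N,
    with eta uniform on the sphere of radius sqrt(n), so Cov(eta) = I."""
    m = rng.poisson(t * N ** 2)
    return sphere_steps(m, n, np.sqrt(n)).sum(axis=0) / N

# The empirical covariance of X^N_{t N^2} should approximate t * I:
samples = np.array([rescaled_walk(N=50, t=1.0) for _ in range(2000)])
print(np.cov(samples.T))
```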

Aim

We denote the Laplace–Beltrami operator on the manifold by \(\Delta _M\). The rest of this section will be devoted to the proof of the following result.

Proposition 2.1

Suppose that in the situation above we have:

  • \(\sup _{p\in M} \sup _{\eta \in \mathrm {supp}\mu _p} ||\eta ||<\infty \)

  • \(\sup _{p\in M} \mu _p(T_pM)<\infty \)

  • \(\int \eta ^i \mu _p(\mathrm {d}\eta )=0\) and \(\int \eta ^i\eta ^j\mu _p(\mathrm {d}\eta )=g^{ij}(p)\) in each coordinate system around p

Then for \(f\in C^\infty (M)\): \(N^2L_Nf\rightarrow \frac{1}{2} \Delta _Mf\) uniformly on M.

The first two assumptions require that the supports of the measures and their total masses are bounded uniformly over all points of the manifold. We will loosely say that the measures are uniformly compactly supported and uniformly finite. Since \(C^\infty (M)\) is a core for \(\frac{1}{2}\Delta _M\) [20], the Trotter-Kurtz theorem (see Kurtz [14]) implies the following corollary.

Corollary 2.2

In the situation of Proposition 2.1 the geodesic random walk converges to Brownian motion in distribution in \(D([0,\infty ),M)\) (the space of càdlàg maps \([0,\infty )\rightarrow M\)).

Note that if we denote the random variable corresponding to \(\mu _p\) by \(\zeta _p\), the third requirement of Proposition 2.1 is that (in any coordinate system) \(\mathbb {E}\zeta _p^i = 0\) and \(\mathrm {Cov}(\zeta _p^i,\zeta _p^j)=g^{ij}(p)\). This shows that the mean vector m of \(\zeta _p\) satisfies \(m=0\) and the covariance matrix \(\Sigma \) satisfies \(\Sigma =(g^{ij})(p)\). In \(\mathbb {R}^n\), this simplifies to \(\mathbb {E}\zeta _p^i = 0\) and \(\mathrm {Cov}(\zeta _p^i,\zeta _p^j)=\delta ^i_j\). This is satisfied for instance when \(\mu _p\) is the uniform distribution on the sphere with radius \(\sqrt{n}\) in \(\mathbb {R}^n\). Section 2.2 deals with the question of which measures satisfy the restrictions above. Some examples will be given at the end of that section as well.
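As a quick sanity check of this example (ours, assuming numpy), a Monte Carlo estimate for the uniform distribution on the radius-\(\sqrt{n}\) sphere indeed gives mean vector \(\approx 0\) and covariance matrix \(\approx I\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 500_000
v = rng.standard_normal((m, n))
eta = np.sqrt(n) * v / np.linalg.norm(v, axis=1, keepdims=True)
print(eta.mean(axis=0))  # approx (0, 0, 0)
print(np.cov(eta.T))     # approx identity: E[eta^i eta^j] = delta^i_j
```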

Remark 2.3

Although we study the jumping distributions later, something that can already be seen now is that we do not require any relation between the jumping measures at different points of the manifold (apart from the uniform bounds on the support and the total mass). This means that our result does not require the jumping measures to be identically distributed, so it really generalizes [12].

Choosing Suitable Charts

Let f be a fixed smooth function from now on. Since we want the convergence \(N^2L_N f\rightarrow \frac{1}{2} \Delta _Mf\) to be uniform on M, we cannot just consider this problem pointwise. To deal with this, we will choose specific coordinate charts.

Let \(\rho \) denote the original metric of the manifold and let d denote the metric that is induced by the Riemannian metric. Recall that these metrics induce the same topology. This means that we do not cause confusion when we speak about open and closed sets, continuous maps and compactness without explicitly mentioning the metric. For each \(p\in M\), let \((x_p,U_p)\) be a coordinate chart for M around p. \(U_p\) is open with respect to \(\rho \) and hence with respect to d. This means that there is some \(\epsilon _p>0\) such that \(G_p:=\overline{B_d(p,\epsilon _p)}\subset U_p\). Now define \(O_p=B_d(p,\epsilon _p/2)\). Since M is compact, we can find \(p_1,\ldots ,p_m\) such that \(M\subset \cup _i O_{p_i}\). We have the following easy statement.

Lemma 2.4

Let \((g_k)_{k=1}^\infty \) and g be functions \(M\rightarrow \mathbb {R}\). If \(g_k\rightarrow g\) uniformly on each \(O_{p_i}\), then \(g_k\rightarrow g\) uniformly on M.

Proof

Let \(\epsilon >0\). For each i there is an \(N_i\in \mathbb {N}\) such that for all \(k\ge N_i\): \(\sup _{q\in O_{p_i}}|g_k(q)-g(q)|<\epsilon \). Set \(N=\max _{1\le i\le m}N_i\) and let \(q\in M\). Then there is a j such that \(q\in O_{p_j}\). Now for all \(k\ge N\), we see \(k\ge N_j\), so \(|g_k(q)-g(q)|\le \sup _{s\in O_{p_j}}|g_k(s)-g(s)|<\epsilon \). This shows that \(\sup _{q\in M} |g_k(q)-g(q)|\le \epsilon \). Hence \(g_k\rightarrow g\) uniformly on M. \(\square \)

Now let \(j\in \{1,\ldots ,m\}\) be fixed. Call \(O:=O_{p_j}\), \(\epsilon :=\epsilon _{p_j}\), \(x:=x_{p_j}\), \(G:=G_{p_j}\) and \(U:=U_{p_j}\) (this situation is shown in Fig. 2). Because of the lemma, it suffices to show that \(N^2L_N f\rightarrow \frac{1}{2} \Delta _Mf\) uniformly on O.

Technical Considerations

To obtain good estimates later, we will need that \(p(s,\eta )\) is still in our coordinate system (x, U), and even in the set G, when \(|s|\le \frac{1}{N}\) for N large enough. Since the convergence must be uniform, how large N must be cannot depend on the point p. The following lemma tells us how to choose such N.

Lemma 2.5

Call \(K=\sup _{p\in M} \sup _{\eta \in \mathrm {supp}\mu _p} ||\eta ||<\infty \) (by assumption). Choose \(N_\epsilon \in \mathbb {N}\) such that \(\frac{1}{N_\epsilon }<\frac{\epsilon }{2K}\). Then for all \(p\in O\), all \(\eta \in \mathrm {supp}\mu _p\) and all \(N\ge N_\epsilon \) we see

$$\begin{aligned} \forall |s|\le \frac{1}{N}: p(s,\eta )\in G. \end{aligned}$$

Proof

Let \(N\ge N_\epsilon \) and let \(p\in O\). The situation of the proof is visually represented in Fig. 2. Fix s with \(|s|\le \frac{1}{N}\). Without loss of generality assume \(s>0\). Note that the speed of the geodesic \(p(\cdot ,\eta )\) equals \(||\eta ||\), so at time s it has traveled a distance \(s||\eta ||\) from p. This means that there is a path of length \(s||\eta ||\) from \(p(s,\eta )\) to p, so \(d(p(s,\eta ),p)\le s||\eta ||\le \frac{1}{N}K \le \frac{1}{N_\epsilon }K<\epsilon /2\). Since \(p\in O\), we know \(d(p,p_j)<\epsilon /2\). Now the triangle inequality shows that \(d(p_j,p(s,\eta ))\le d(p_j,p)+d(p,p(s,\eta ))< \epsilon /2+\epsilon /2=\epsilon \). This implies that \(p(s,\eta )\in B_d(p_j,\epsilon )\subset G\). \(\square \)

Fix \(N_\epsilon \) as in the lemma and take N larger than \(N_\epsilon \).

Fig. 2

The chart (x, U) with closed ball G and open ball O around \(p_j\). As is shown in Lemma 2.5, \(p^\eta =p(t,\eta )\) does not leave the ball around p with radius \(\epsilon /2\), as long as \(|t|\le 1/N\) for \(N\ge N_\epsilon \). The importance for uniformity is that it does not matter where in O we choose p

Taylor Expansion

Now fix \(p\in O\) and \(\eta \in T_pM\). Write \(p^\eta \) for the map \(\mathbb {R}\rightarrow M\) that takes t to \(p(t,\eta )\). We can locally write \(f\circ p^\eta = (f\circ x^{-1})\circ (x \circ p^\eta )\), which is a composition of smooth maps. This means that \(f\circ p^\eta \) is just a smooth map \(\mathbb {R}\rightarrow \mathbb {R}\), so we can use a Taylor expansion and obtain

$$\begin{aligned} f(p(1/N,\eta ))=f(p)+\frac{1}{N} \frac{\mathrm {d}(f\circ p^\eta )}{\mathrm {d}t}(0)+\frac{1}{2N^2} \frac{\mathrm {d}^2 (f\circ p^\eta )}{\mathrm {d}t^2}(0) + \frac{1}{6N^3} \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t_{N,\eta ,p}), \end{aligned}$$

where \(t_{N,\eta ,p}\in (0,1/N)\) is a number depending on N, \(\eta \) and p. This gives us

$$\begin{aligned} N^2 L_Nf(p)= & {} N^2 \int _{T_pM} \left[ f(p(1/N,\eta ))-f(p)\right] \mu _p(\mathrm {d}\eta )\nonumber \\= & {} N^2 \int \frac{1}{N} \frac{\mathrm {d}(f\circ p^\eta )}{\mathrm {d}t}(0)+\frac{1}{2N^2} \frac{\mathrm {d}^2 (f\circ p^\eta )}{\mathrm {d}t^2}(0)\nonumber \\&+ \frac{1}{6N^3} \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t_{N,\eta ,p}) \mu _p(\mathrm {d}\eta )\nonumber \\= & {} N \int \frac{\mathrm {d}(f\circ p^\eta )}{\mathrm {d}t}(0)\mu _p(\mathrm {d}\eta )+\frac{1}{2}\int \frac{\mathrm {d}^2 (f\circ p^\eta )}{\mathrm {d}t^2}(0)\mu _p(\mathrm {d}\eta ) \nonumber \\&+\, \frac{1}{6N} \int \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t_{N,\eta ,p}) \mu _p(\mathrm {d}\eta ). \end{aligned}$$
(1)

We will examine these terms separately.

The First Term

Recall that \(p\in O\) and that O is contained in a coordinate chart (x, U). Since \(N\ge N_\epsilon \), Lemma 2.5 guarantees us that \(p(s,\eta )\) stays in the coordinate chart for \(|s|<\frac{1}{N}\). Writing \(\eta =\sum _{i=1}^n \eta ^i \frac{\partial }{\partial x^i}|_p\), we see for \(|s|<\frac{1}{N}\):

$$\begin{aligned} \frac{\mathrm {d}(f\circ p^\eta )}{\mathrm {d}t}(s)= & {} \frac{\mathrm {d}}{\mathrm {d}t} [(f\circ x^{-1}) \circ (x \circ p^\eta )](s)\\= & {} \sum _{i=1}^n D_i(f\circ x^{-1})(x(p^\eta (s))) \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t}(s)\\= & {} \sum _{i=1}^n \frac{\partial f}{\partial x^i}( p^\eta (s)) \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t}(s). \end{aligned}$$

Now setting \(s=0\), this becomes:

$$\begin{aligned} \sum _{i=1}^n \frac{\partial f}{\partial x^i}(p) \eta ^i = \sum _{i=1}^n \eta ^i \frac{\partial }{\partial x^i}|_p f = \eta (f), \end{aligned}$$

since \(p^\eta (0)=p(0,\eta )=p\) and the tangent vector to the geodesic \(p(\cdot ,\eta )\) at 0 is \(\eta \) (so the \(i^{\text {th}}\) coordinate with respect to x is just \(\eta ^i\)). Now the first term of (1) becomes:

$$\begin{aligned} N\int \eta (f)\mu _p(\mathrm {d}\eta ) = N \int \sum _{i=1}^n \eta ^i \frac{\partial }{\partial x^i}|_p f \mu _p(\mathrm {d}\eta ) = N \sum _{i=1}^n \frac{\partial }{\partial x^i}|_p f \int \eta ^i \mu _p(\mathrm {d}\eta ). \end{aligned}$$

By assumption these integrals are 0. This shows that the first term of (1) vanishes.

The Second Term

Now we want to show that the remaining term equals \(\frac{1}{2}\Delta _Mf(p)\). Similarly to above we see for \(|s|<\frac{1}{N}\) (leaving out the arguments to keep things clear):

$$\begin{aligned} \frac{\mathrm {d}^2 (f\circ p^\eta )}{\mathrm {d}t^2}= & {} \frac{\mathrm {d}}{\mathrm {d}t} \sum _{i=1}^n \frac{\partial f}{\partial x^i} \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t}\\= & {} \sum _{i=1}^n \left\{ \left( \frac{\mathrm {d}}{\mathrm {d}t} \frac{\partial f}{\partial x^i}\right) \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t} + \frac{\partial f}{\partial x^i} \left( \frac{\mathrm {d}}{\mathrm {d}t} \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t}\right) \right\} \\= & {} \sum _{i=1}^n \left\{ \sum _{j=1}^n \frac{\partial ^2 f}{\partial x^j\partial x^i} \frac{\mathrm {d}(x^j \circ p^\eta )}{\mathrm {d}t} \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t} + \frac{\partial f}{\partial x^i} \frac{\mathrm {d}^2 (x^i \circ p^\eta )}{\mathrm {d}t^2} \right\} . \end{aligned}$$

Since \(p^\eta \) is a geodesic, we know that it satisfies the geodesic equations. This shows that for each \(i=1,\ldots ,n\) we have

$$\begin{aligned} \frac{\mathrm {d}^2 (x^i \circ p^\eta )}{\mathrm {d}t^2} + \sum _{k,l=1}^n \Gamma ^i_{kl} \frac{\mathrm {d}(x^k\circ p^\eta )}{\mathrm {d}t}\frac{\mathrm {d}(x^l\circ p^\eta )}{\mathrm {d}t}=0. \end{aligned}$$

Using this yields the following expression for the second derivative:

$$\begin{aligned} \sum _{i=1}^n \left\{ \sum _{j=1}^n \frac{\partial ^2 f}{\partial x^j\partial x^i} \frac{\mathrm {d}(x^j \circ p^\eta )}{\mathrm {d}t} \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t} - \frac{\partial f}{\partial x^i} \sum _{k,l=1}^n \Gamma ^i_{kl} \frac{\mathrm {d}(x^k\circ p^\eta )}{\mathrm {d}t}\frac{\mathrm {d}(x^l\circ p^\eta )}{\mathrm {d}t} \right\} , \end{aligned}$$

so

$$\begin{aligned} \frac{\mathrm {d}^2 (f\circ p^\eta )}{\mathrm {d}t^2}(0)=\sum _{i=1}^n \left\{ \sum _{j=1}^n \frac{\partial ^2 f}{\partial x^j\partial x^i}(p) \eta ^j\eta ^i - \frac{\partial f}{\partial x^i}(p) \sum _{k,l=1}^n \Gamma ^i_{kl}(p) \eta ^k\eta ^l \right\} . \end{aligned}$$

Using linearity of the integral, we obtain the following expression for the second term of (1):

$$\begin{aligned} \frac{1}{2}\sum _{i=1}^n \left\{ \sum _{j=1}^n \frac{\partial ^2 f}{\partial x^i\partial x^j}(p) \int \eta ^i\eta ^j\mu _p(\mathrm {d}\eta ) - \frac{\partial f}{\partial x^i}(p) \sum _{k,l=1}^n \Gamma ^i_{kl}(p) \int \eta ^k\eta ^l\mu _p(\mathrm {d}\eta ) \right\} . \end{aligned}$$

Note that we also changed the order of the derivatives of f; this is allowed since f is smooth. Now we want the term above to equal

$$\begin{aligned}&\frac{1}{2}\Delta _Mf(p)=\frac{1}{2}\left\{ g^{ij}\frac{\partial ^2 f}{\partial x^i\partial x^j} - g^{kl}\Gamma ^i_{kl} \frac{\partial f}{\partial x^i}\right\} \\&\quad = \frac{1}{2}\sum _{i=1}^n \left\{ \sum _{j=1}^n \frac{\partial ^2 f}{\partial x^i\partial x^j}(p) g^{ij}(p) - \frac{\partial f}{\partial x^i}(p) \sum _{k,l=1}^n \Gamma ^i_{kl}(p) g^{kl}(p) \right\} . \end{aligned}$$

This is true, since we required that for any coordinate chart around p and for all i, j: \(\int _{T_pM}\eta ^i\eta ^j\mu _p(\mathrm {d}\eta )=g^{ij}(p)\).
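As an illustration of this identity (our own numerical check, not part of the proof), on \(S^2\) one can average the second derivative \(\frac{\mathrm {d}^2(f\circ p^\eta )}{\mathrm {d}t^2}(0)\) over \(\eta \) uniform with norm \(\sqrt{2}\) and compare with \(\Delta _{S^2}f(p)\); for \(f(p)=p_z^2\) one has \(\Delta _{S^2}f=2-6p_z^2\). numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)

def second_derivative_at_zero(p, eta, h=1e-3):
    """Central difference for (d^2/dt^2) f(p(t, eta)) at t = 0 on S^2,
    with f(q) = q_z^2 and great-circle geodesics."""
    s = np.linalg.norm(eta)
    f = lambda t: (np.cos(s * t) * p + np.sin(s * t) * eta / s)[2] ** 2
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2

p = np.array([0.3, 0.4, np.sqrt(1 - 0.25)])    # a point on the unit sphere
vals = []
for _ in range(100_000):
    v = rng.standard_normal(3)
    v -= v.dot(p) * p                          # tangent direction at p
    eta = np.sqrt(2) * v / np.linalg.norm(v)   # norm sqrt(2): E[eta eta^T] = I on T_pM
    vals.append(second_derivative_at_zero(p, eta))
print(np.mean(vals), 2 - 6 * p[2] ** 2)        # both approx Delta f(p)
```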

The Rest Term

If the last term goes to 0 uniformly on O, we have the result. Let N still be larger than \(N_\epsilon \).

$$\begin{aligned} \left| \frac{1}{6N} \int \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t_{N,\eta ,p}) \mu _p(\mathrm {d}\eta )\right|\le & {} \frac{1}{6N} \int \left| \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t_{N,\eta ,p})\right| \mu _p(\mathrm {d}\eta ) \\\le & {} \frac{K'}{6N} \sup _{\eta \in \mathrm {supp}\mu _p} \left| \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t_{N,\eta ,p})\right| \end{aligned}$$

where \(K'=\sup _{p\in M} \mu _p(T_pM)<\infty \) (by assumption). We know that \(t_{N,\eta ,p}\in [0,1/N]\subset [0,1/N_\epsilon ]\). This means that the above is smaller than:

$$\begin{aligned} \frac{K'}{6N} \sup _{\eta \in \mathrm {supp}\mu _p}\sup _{t\in [0,1/N_\epsilon ]} \left| \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t)\right| \le \frac{K'}{6N} \sup _{\eta :||\eta ||\le K}\sup _{t\in [0,1/N_\epsilon ]} \left| \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t)\right| . \end{aligned}$$

Because of the factor 1/N in front, we only need to know that the rest is uniformly bounded to obtain uniform convergence. It thus suffices to show that \(\frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}(t)\) is bounded as a function of \(\eta \) with \(||\eta ||\le K\) and \(t\in [0,1/N_\epsilon ]\). Lemma 2.5 shows that \(p(t,\eta )\) stays in G for all such \(\eta \) and t. We will use this fact multiple times.

We first express \(\frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}\) in local coordinates for \(|t|\le 1/N\).

$$\begin{aligned} \frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}= & {} \frac{\mathrm {d}}{\mathrm {d}t}\frac{\mathrm {d}^2 (f\circ p^\eta )}{\mathrm {d}t^2} \nonumber \\= & {} \frac{\mathrm {d}}{\mathrm {d}t} \sum _{i=1}^n \left\{ \sum _{j=1}^n \frac{\partial ^2 f}{\partial x^j\partial x^i} \frac{\mathrm {d}(x^j \circ p^\eta )}{\mathrm {d}t} \frac{\mathrm {d}(x^i \circ p^\eta )}{\mathrm {d}t} + \frac{\partial f}{\partial x^i} \frac{\mathrm {d}^2 (x^i \circ p^\eta )}{\mathrm {d}t^2} \right\} . \end{aligned}$$
(2)

To make the formulas more compact, we introduce the following notation (and \(f_i\), \(f_{ijk}\) analogously):

$$\begin{aligned} f_{ij}:=\frac{\partial ^2 f}{\partial x^j\partial x^i},\quad \qquad p^i_k:=\frac{\mathrm {d}^k (x^i \circ p^\eta )}{\mathrm {d}t^k}. \end{aligned}$$

Combining this with Einstein summation, we can write (2) as

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} (f_{ij}p^i_1p^j_1 + f_ip^i_2)= & {} (f_{ijk}p^k_1)p^i_1p^j_1+f_{ij}(p^i_1p^j_2+p^i_2p^j_1) + (f_{ij}p^j_1)p^i_2+f_ip^i_3\\= & {} f_{ijk}p^k_1p^i_1p^j_1+f_{ij}(p^i_1p^j_2+2p^i_2p^j_1) +f_ip^i_3. \end{aligned}$$

Now, as before, we can deal with second derivatives of geodesics using the geodesic equations:

$$\begin{aligned} p^i_2=-\Gamma ^i_{rs}p^r_1p^s_1. \end{aligned}$$

We can also calculate the third derivative:

$$\begin{aligned} p^i_3=\frac{\mathrm {d}}{\mathrm {d}t}p^i_2=\frac{\mathrm {d}}{\mathrm {d}t}(-\Gamma ^i_{rs}p^r_1p^s_1) = -\left( \frac{\mathrm {d}}{\mathrm {d}t}\Gamma ^i_{rs}\right) p^r_1p^s_1-\Gamma ^i_{rs}(p^r_1p^s_2+p^r_2p^s_1). \end{aligned}$$

This shows us that \(\frac{\mathrm {d}^3 (f\circ p^\eta )}{\mathrm {d}t^3}\) is a combination of products and sums of the following types of expressions: \(f_i\), \(f_{ij}\), \(f_{ijk}\), \(p^i_1\), \(\Gamma ^i_{rs}\) and \(\frac{\mathrm {d}}{\mathrm {d}t}\Gamma ^i_{rs}\). If we can bound all of these on the right domains (independent of p and \(\eta \)), we are done.

Bounding \(f_i\), \(f_{ij}\) and \(f_{ijk}\)

First of all, note that f is a smooth function on U. Further, \(\partial _i\) defines a smooth vector field on U. Since \(f_i=\frac{\partial f}{\partial x^i}\) is obtained by applying \(\partial _i\) on U to f, it is a smooth function on U. Continuing in this way, we see that \(f_{ij}\) and \(f_{ijk}\) are also smooth functions on U. In particular, they are smooth functions on G (since it is a subset of U). G is a closed subset of the compact M and is hence compact itself. This implies that \(f_i\), \(f_{ij}\) and \(f_{ijk}\) are (for each choice of i, j, k) bounded on G. Since we evaluate these functions in the points \(p(s,\eta )\) for \(0\le s\le 1/N\), \(N\ge N_\epsilon \) and \(||\eta ||\le K\), our discussion above shows that we only evaluate them in points of G. This means that we have found bounds for \(f_i\), \(f_{ij}\) and \(f_{ijk}\).

Bounding \(p^i_1\)

We start with a technical lemma.

Lemma 2.6

Let \(q\in M\) and let (y, V) be a coordinate chart around q. Let \(v\in T_qM\) and write \(v=v^i\partial _i\). Then \(|v^i|\le \sqrt{g^{ii}(q)}||v||\).

Proof

Fix some \(1\le i\le n\). We see in the tangent space at q:

$$\begin{aligned} \left<v,g^{ij}\partial _j\right>=\left<v^k\partial _k,g^{ij}\partial _j\right>=v^kg^{ij}g_{kj}=v^k\delta ^i_k=v^i. \end{aligned}$$

Further,

$$\begin{aligned} ||g^{ij}\partial _j||^2=\left<g^{ij}\partial _j,g^{ik}\partial _k\right>=g^{ij}g^{ik}g_{jk}=g^{ij}\delta ^i_j=g^{ii}. \end{aligned}$$

Using the relations above and the Cauchy-Schwarz inequality, we obtain:

$$\begin{aligned} |v^i|=|\left<v,g^{ij}\partial _j\right>|\le ||v||\cdot ||g^{ij}\partial _j||=\sqrt{g^{ii}}||v||. \end{aligned}$$

\(\square \)

Now we can use this to show the following.

Lemma 2.7

\(|p^i_1(t)|=\left| \frac{\mathrm {d}(x^i\circ p^\eta )}{\mathrm {d}t}(t)\right| \le \sqrt{g^{ii}(p(t,\eta ))}||\eta ||\).

Proof

The first equality is just a change of notation. Further, we see

$$\begin{aligned} \frac{\mathrm {d}(x^i\circ p^\eta )}{\mathrm {d}t}=\left( p^\eta _* \frac{\mathrm {d}}{\mathrm {d}t}\right) (x^i)=\frac{\mathrm {d}p^\eta }{\mathrm {d}t}(x^i)=\left( \frac{\mathrm {d}p^\eta }{\mathrm {d}t}\right) ^i. \end{aligned}$$

This means that \(\frac{\mathrm {d}(x^i\circ p^\eta )}{\mathrm {d}t}\) is just the \(i^{\text {th}}\) coordinate with respect to (x, U) of the tangent vector to \(p^\eta \) at time t, i.e. at the point \(p(t,\eta )\in M\). Using Lemma 2.6, we see

$$\begin{aligned} \left| \frac{\mathrm {d}(x^i\circ p^\eta )}{\mathrm {d}t}(t)\right| \le \sqrt{g^{ii}(p(t,\eta ))}\left| \left| \frac{\mathrm {d}p^\eta }{\mathrm {d}t}\right| \right| . \end{aligned}$$
(3)

Since \(p^\eta \) is a geodesic, it has constant speed. Its speed at p is \(||\eta ||\), so this must be its speed anywhere else along the trajectory. Hence \(||\frac{\mathrm {d}p^\eta }{\mathrm {d}t}||=||\eta ||\). Inserting this in (3) yields the result. \(\square \)

We can now easily obtain a bound for \(p^i_1\). For \(0\le t\le 1/N\) and \(||\eta ||\le K\), we know \(p(t,\eta )\) stays in G. \(g^{ii}\) is a smooth and hence continuous function on U, so it is bounded on G (since G is compact). This means that \(\sqrt{g^{ii}(p(t,\eta ))}\) is bounded by some \(K^{i}\) for \(||\eta ||\le K\) and \(0\le t\le 1/N\). Now we see \(|p^i_1|\le \sqrt{g^{ii}(p(t,\eta ))}\left| \left| \frac{\mathrm {d}p^\eta }{\mathrm {d}t}\right| \right| \le K^{i}K\).

Bounding \(\Gamma ^i_{rs}\) and \(\frac{\mathrm {d}}{\mathrm {d}t}\Gamma ^i_{rs}\)

Each \(g_{ij}\) is a smooth function on U. This means that \(\frac{\partial g_{ij}}{\partial x^k}\) is a smooth function on U. This implies that \(\Gamma ^i_{rs}\) is just a combination of products and sums of smooth functions, so it is smooth itself. Now, as before, \(\Gamma ^i_{rs}\) is bounded on G. Since we only evaluate it in \(p(t,\eta )\) with \(0\le t\le 1/N\) and \(||\eta ||\le K\), we only evaluate it in G, so we have bounded \(\Gamma ^i_{rs}\).

Now \(\frac{\mathrm {d}}{\mathrm {d}t}\Gamma ^i_{rs}\) can be written as

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\Gamma ^i_{rs}=\frac{\partial \Gamma ^i_{rs}}{\partial x^j} \frac{\mathrm {d}(x^j\circ p^\eta )}{\mathrm {d}t} = (\Gamma ^i_{rs})_j p^j_1, \end{aligned}$$

with notation as above. Since \(\Gamma ^i_{rs}\) is a smooth function \(U\rightarrow \mathbb {R}\), this expression can be bounded in exactly the same way as expressions like \(f_jp^j_1\) above.

2.2 Stepping Distribution

Constraints for a Stepping Distribution

The question now is which distributions \(\mu _p\) on \(T_pM\) satisfy the assumptions of Proposition 2.1. From here on we fix \(p\in M\) and simply write \(\mu \) for \(\mu _p\). Being compactly supported and finite are rather natural constraints, but the other assumptions are harder to verify, especially since they involve local coordinates. In this section we address the question of which distributions satisfy the other assumptions, i.e. for every coordinate system around p:

$$\begin{aligned} \begin{aligned}&\int \eta ^i \mu (\mathrm {d}\eta ) = 0 \quad&\forall i=1,\ldots ,n\\&\int \eta ^i\eta ^j \mu (\mathrm {d}\eta ) = g^{ij}&\forall i,j=1,\ldots ,n. \end{aligned} \end{aligned}$$
(4)

To generalize this a bit, suppose \(\mu \) satisfies the following for some \(c>0\) for every coordinate system:

$$\begin{aligned} \begin{aligned}&\int \eta ^i \mu (\mathrm {d}\eta ) = 0 \quad&\forall i=1,\ldots ,n\\&\int \eta ^i\eta ^j \mu (\mathrm {d}\eta ) = cg^{ij}&\forall i,j=1,\ldots ,n. \end{aligned} \end{aligned}$$
(5)

Following the proof in the previous section, one sees directly that in this case the generators converge to the generator of Brownian motion that is speeded up by a factor c. We will look into this generalized situation and at the end we will see how to determine c.

Independence of (5) of Coordinate Systems

The following lemma shows that if (5) holds for a single coordinate system, it holds for any coordinate system.

Lemma 2.8

If (5) holds for some \(c>0\) and for some coordinate system (x, U) around p, then it holds for the same c for all coordinate systems around p.

Proof

Let (x, U) be a coordinate system around p for which (5) holds with \(c>0\) and let (y, V) be any other coordinate system around p. It suffices to show that (5) holds with the same c for y. Denote the metric matrix with respect to x by \(G=(g_{ij})\) and the one with respect to y by \({\hat{G}}=({\hat{g}}_{ij})\). For any \(\eta \in T_pM\) define \(\eta ^1,\ldots ,\eta ^n\) as the coefficients of \(\eta \) with respect to x, so such that \(\eta =\sum _i \eta ^i \frac{\partial }{\partial x^i}\). Analogously let \({\hat{\eta }}^1,\ldots ,{\hat{\eta }}^n\) be such that \(\eta =\sum _i {\hat{\eta }}^i \frac{\partial }{\partial y^i}\). Let \(J=\frac{\partial (x^1,\ldots ,x^n)}{\partial (y^1,\ldots ,y^n)}\). If \(\eta \in T_pM\), then

$$\begin{aligned} {\hat{\eta }}^j=\eta (y^j)=\sum _i \eta ^i \frac{\partial }{\partial x^i} y^j = \sum _i \eta ^i \frac{\partial y^j}{\partial x^i}. \end{aligned}$$

This shows that for any j

$$\begin{aligned} \int {\hat{\eta }}^j\mu (\mathrm {d}\eta ) = \int \sum _{i=1}^n \eta ^i \frac{\partial y^j}{\partial x^i} \mu (\mathrm {d}\eta ) = \sum _{i=1}^n \frac{\partial y^j}{\partial x^i} \int \eta ^i \mu (\mathrm {d}\eta ) = 0, \end{aligned}$$

since for any i: \(\int \eta ^i \mu (\mathrm {d}\eta ) = 0\). Moreover, for any i, j: \(\int \eta ^i\eta ^j\mu (\mathrm {d}\eta )=c g^{ij}\), so for any i, j:

$$\begin{aligned} \int {\hat{\eta }}^i{\hat{\eta }}^j\mu (\mathrm {d}\eta )= & {} \int \sum _{k=1}^n \eta ^k \frac{\partial y^i}{\partial x^k} \sum _{l=1}^n \eta ^l \frac{\partial y^j}{\partial x^l} \mu (\mathrm {d}\eta ) = \sum _{k,l=1}^n \frac{\partial y^i}{\partial x^k}\frac{\partial y^j}{\partial x^l} \int \eta ^k \eta ^l \mu (\mathrm {d}\eta )\\= & {} \sum _{k,l=1}^n \frac{\partial y^i}{\partial x^k}\frac{\partial y^j}{\partial x^l} c g^{kl} = c (J^{-1}G^{-1}(J^{-1})^T)_{ij}. \end{aligned}$$

Since \(J^{-1}G^{-1}(J^{-1})^T=J^{-1}G^{-1}(J^{T})^{-1}=(J^TGJ)^{-1}={\hat{G}} ^{-1}\), we see that \(\int {\hat{\eta }}^i{\hat{\eta }}^j\mu (\mathrm {d}\eta )=c{\hat{g}}^{ij}\). We conclude that (5) holds for y with the same c. \(\square \)
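The matrix identity underlying this proof, \({\hat{G}}^{-1}=J^{-1}G^{-1}(J^{-1})^T\) for \({\hat{G}}=J^TGJ\), can be spot-checked numerically; the following sketch (ours, numpy assumed) draws a random positive definite "metric" and a random invertible "Jacobian".

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)                      # symmetric positive definite "metric"
J = rng.standard_normal((n, n)) + n * np.eye(n)  # an invertible "Jacobian"
Ghat = J.T @ G @ J                               # the metric in the new coordinates
Jinv = np.linalg.inv(J)
lhs = Jinv @ np.linalg.inv(G) @ Jinv.T
print(np.allclose(lhs, np.linalg.inv(Ghat)))     # True
```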

Orthogonal Transformations and Canonical Measures

We now introduce a class of measures.

Definition 2.9

Let V be an inner product space and let T be a linear map \(V\rightarrow V\). We call T an orthogonal transformation if for any \(u,v\in V\): \(\left<Tu,Tv\right>=\left<u,v\right>\).

We call a measure \(\mu \) on \(T_pM\) canonical if for any orthogonal transformation T on \(T_pM\) and for any coordinate system:

$$\begin{aligned} \int \eta ^i \mu (\mathrm {d}\eta ) = \int (T\eta )^i \mu (\mathrm {d}\eta ) \text { and } \int \eta ^i \eta ^j \mu (\mathrm {d}\eta ) = \int (T\eta )^i(T\eta )^j \mu (\mathrm {d}\eta ). \end{aligned}$$

Remark 2.10

In the same way as above, one can show that \(\mu \) has the property above with respect to some coordinate system if and only if it has the property with respect to every coordinate system. Moreover, since \(-I\) always satisfies \((-I)^TG(-I)=G\), we see that \(\int \eta ^i \mu (\mathrm {d}\eta )=\int (-\eta )^i \mu (\mathrm {d}\eta )=\int -\eta ^i \mu (\mathrm {d}\eta )=-\int \eta ^i \mu (\mathrm {d}\eta )\), so \(\int \eta ^i \mu (\mathrm {d}\eta )\) is 0 for any canonical \(\mu \).

In words, \(\mu \) is canonical if orthogonal transformations do not change the mean vector and the covariance matrix of a random variable that has distribution \(\mu \). Remark 2.10 shows that in fact the mean vector must be 0. Note that in particular measures that are invariant under orthogonal transformations are canonical, since then \(\int (T\eta )^i \mu (\mathrm {d}\eta ) = \int \eta ^i (\mu \circ T^{-1})(\mathrm {d}\eta ) = \int \eta ^i\mu (\mathrm {d}\eta )\) and the other equation follows analogously. However, a simple example shows that the converse is not true. Let \(M=\mathbb {R}\) and let \(\mu \) be any non-symmetric distribution on \(T_pM=\mathbb {R}\) with mean 0. The only orthogonal transformation (apart from the identity) is \(t\mapsto -t\). Under this transformation the mean (which is 0) and the second moment are obviously left invariant, but \(\mu \) is not symmetric, so it is not invariant. We will give an example for \(\mathbb {R}^n\) later.

If (x, U) is some coordinate system around p and \(G=(g_{ij})\) is the matrix of the metric in p with respect to x, we can write a linear transformation \(T:T_pM\rightarrow T_pM\) as a matrix (which we will also call T) with respect to the basis \(\frac{\partial }{\partial x^1},\ldots ,\frac{\partial }{\partial x^n}\). We see that

$$\begin{aligned} \left<T\eta ,T\xi \right>=\sum _{i,j} g_{ij} (T\eta )^i(T\xi )^j = \sum _{i,j} g_{ij} \sum _{k} T_{ik} \eta ^k \sum _{l} T_{jl} \xi ^l=\sum _{k,l} \left( \sum _{i,j} g_{ij} T_{ik}T_{jl}\right) \eta ^k \xi ^l. \end{aligned}$$

If T is orthogonal, this must equal

$$\begin{aligned} \left<\eta ,\xi \right>=\sum _{k,l} g_{kl} \eta ^k\xi ^l, \end{aligned}$$

so we see that \(g_{kl}=\sum _{i,j} g_{ij} T_{ik}T_{jl}=(T^TGT)_{kl}\) and hence \(G=T^TGT\).

Now for a measure \(\mu \) on \(T_pM\) and a coordinate system (xU), define the vector \(A_\mu \) and the matrix \(B_\mu \) by \(A_\mu ^i=\int \eta ^i\mu (\mathrm {d}\eta )\) and \(B_\mu ^{ij}=\int \eta ^i\eta ^j\mu (\mathrm {d}\eta )\). Then we have the following.

Lemma 2.11

Let \(\mu \) be a measure on \(T_pM\). Then the following are equivalent.

  (i) \(\mu \) is canonical.

  (ii) For every linear transformation T and every coordinate system (x, U): if \(G=T^TGT\), then \(A_\mu =TA_\mu \) and \(B_{\mu }=TB_\mu T^T\).

Proof

\((i)\Leftrightarrow (ii)\) because (ii) is just the definition of being canonical written in local coordinates. Indeed, we already saw that orthogonality of T translates in local coordinates to \(G=T^TGT\); the other expressions follow in a similar way from the following equations:

$$\begin{aligned} A_\mu ^{i}= & {} \int (T\eta )^i \mu (\mathrm {d}\eta ) = \int \sum _k T_{ik} \eta ^k\mu (\mathrm {d}\eta ) = \sum _k T_{ik} \int \eta ^k\mu (\mathrm {d}\eta ) = \sum _k T_{ik}A_\mu ^k\\ B_\mu ^{ij}= & {} \int (T\eta )^i(T\eta )^j\mu (\mathrm {d}\eta )\\= & {} \int \sum _k T_{ik} \eta ^k \sum _l T_{jl} \eta ^l \mu (\mathrm {d}\eta ) = \sum _{k,l} T_{ik} T_{jl} \int \eta ^k\eta ^l\mu (\mathrm {d}\eta ) = \sum _{k,l} T_{ik} T_{jl} B_\mu ^{kl}. \end{aligned}$$

\(\square \)

Canonical Measures are Stepping Distributions

Now we have the following result.

Proposition 2.12

Let \(\mu \) be a probability measure on \(T_pM\). Then \(\mu \) is canonical if and only if it satisfies (5) for some \(c>0\).

Proof

First assume that \(\mu \) is canonical and let (x, U) be normal coordinates centered at p. Because of Lemma 2.8 it suffices to verify (5) for x, so we need to show that \(A_\mu =0\) and \(B_\mu =cG^{-1}=cI\) for some \(c>0\).

The fact that \(A_\mu =0\) is just Remark 2.10. Now note that since \(B_\mu \) is symmetric, it can be diagonalized as \(TB_\mu T^{-1}\) where T is an orthogonal matrix (in the usual sense). This means that \(T^T=T^{-1}\) and that \(T^TGT=T^TIT=T^TT=I=G\), so Lemma 2.11 tells us that the diagonalization equals \(TB_\mu T^T=B_\mu \). This implies that \(B_\mu \) is a diagonal matrix. Now for \(i\ne j\) let \({\bar{I}}^{ij}\) be the \(n\times n\)-identity matrix with the \(i^\text {th}\) and \(j^\text {th}\) column exchanged. It is easy to see that \(({\bar{I}}^{ij})^T{\bar{I}}^{ij}=I\), so we must also have \(B_\mu ={\bar{I}}^{ij}B_\mu ({\bar{I}}^{ij})^T\). The latter is \(B_\mu \) with the \(i^\text {th}\) and \(j^\text {th}\) diagonal element exchanged. This shows that these elements must be equal. Hence all diagonal elements are equal and \(B_\mu =cI\) for some \(c\in \mathbb {R}\). Since \(c=B_\mu ^{11}=\int \eta ^1\eta ^1\mu (\mathrm {d}\eta )\ge 0\), we know that \(c\ge 0\). If \(c=0\), then \(B_\mu =0\), so \(\mu \) is concentrated at the origin; we exclude this degenerate case. We conclude that \(c>0\).

Conversely let (x, U) be a coordinate system with corresponding metric matrix G and assume that \(\mu \) satisfies (5) for some \(c>0\). Let T be such that \(G=T^TGT\). Then \(A_\mu =0=T0=TA_\mu \). We also see: \(T^TGT=G \iff G=(T^T)^{-1}GT^{-1} \iff G^{-1}=TG^{-1}T^T \iff cG^{-1}=T(cG^{-1})T^T \implies B_\mu =TB_\mu T^T\) (since \(B_\mu =cG^{-1}\)), so by Lemma 2.11, \(\mu \) is canonical. \(\square \)

Now we know that if the stepping distribution is canonical (and finite and compactly supported, uniformly on M), the generators converge to the generator of Brownian motion that is speeded up by some factor \(c>0\) (depending on \(\mu \)). The question remains what this c is. The following lemma answers this question.

Lemma 2.13

Suppose \(\mu \) satisfies (5) for some \(c>0\). Then \(c=\frac{\int ||\eta ||^2\mu (\mathrm {d}\eta )}{n}\).

Proof

We calculate the following (with respect to some coordinate system (xU)):

$$\begin{aligned} \int ||\eta ||^2 \mu (\mathrm {d}\eta )= & {} \int \left<\eta ,\eta \right>\mu (\mathrm {d}\eta ) = \int \left<\sum _i \eta ^i\frac{\partial }{\partial x^i},\sum _j \eta ^j\frac{\partial }{\partial x^j}\right>\mu (\mathrm {d}\eta ) \\= & {} \sum _{i,j}\left<\frac{\partial }{\partial x^i},\frac{\partial }{\partial x^j}\right>\int \eta ^i\eta ^j\mu (\mathrm {d}\eta )\\= & {} \sum _{i,j} g_{ij}c g^{ij} = c \sum _i \sum _j g_{ij}g^{ji} = c\sum _i 1 = cn. \end{aligned}$$

Hence \(c=\frac{\int ||\eta ||^2\mu (\mathrm {d}\eta )}{n}\). \(\square \)

The nice part of this lemma is that the expression for c does not involve a coordinate system, only the norm (and hence inner product) of \(T_pM\). In particular we see that \(c=1\) is equivalent to \(\int ||\eta ||^2\mu (\mathrm {d}\eta )=n\). We summarize our findings in the following result.

Proposition 2.14

A probability measure \(\mu \) on \(T_pM\) satisfies (5) for some \(c>0\) if and only if it is canonical and \(c=\frac{\int ||\eta ||^2\mu (\mathrm {d}\eta )}{n}\). In particular, it satisfies (4) if and only if it is canonical and \(\int ||\eta ||^2\mu (\mathrm {d}\eta )=n\).

Remark 2.15

Note that all we need of the jumping distributions is that their mean is 0, their covariance matrix is invariant under orthogonal transformations, they are (uniformly) compactly supported and they are (uniformly) finite. We don’t need the measures to be similar in any other way, so we do not at all require the jumps to have identical distributions in the sense of Jørgensen [12].

Examples

  1. To satisfy (4) for every coordinate system, by Lemma 2.8 it suffices to choose a coordinate system and construct a distribution that satisfies (4) for that coordinate system. Let (x, U) be any coordinate system around some point in M with corresponding metric matrix G in that point. Let X be any random variable in \(\mathbb {R}^n\) that has mean vector 0 and covariance matrix \(G^{-1}\) (for instance let \(X\sim N(0,G^{-1})\)). Now let \(\mu \) be the distribution of \(\sum _i X^i\frac{\partial }{\partial x^i}\). Then by construction \(\int \eta ^i\mu (\mathrm {d}\eta )=\mathbb {E}X^i = 0\) and \(\int \eta ^i\eta ^j\mu (\mathrm {d}\eta )=\mathbb {E}X^iX^j = \mathbb {E}X^iX^j -\mathbb {E}X^i\mathbb {E}X^j=g^{ij}\).

  2. In the previous example (4) is immediate. Let us now consider an example that illustrates the use of Proposition 2.14. Let \(\mu _p\) be the uniform distribution on \(\sqrt{n}S_pM\) (the vectors with norm \(\sqrt{n}\)). By definition such a distribution is invariant under orthogonal transformations (rotations and reflections), so it is a canonical distribution. Since also \(\int ||\eta ||^2\mu (\mathrm {d}\eta ) = \int \sqrt{n}^2 \mu (\mathrm {d}\eta )=n\), we conclude that the uniform distribution on \(\sqrt{n}S_pM\) satisfies (4). Moreover, \(\sup _{p\in M} \sup _{\eta \in \mathrm {supp}\mu _p} ||\eta ||=\sqrt{n}<\infty \) and \(\sup _{p\in M} \mu _p(T_pM)=1<\infty \). Together this shows that the \(\mu _p\)'s satisfy the assumptions of Proposition 2.1.

  3. Let us conclude by showing for \(\mathbb {R}^n\) that the class of canonical distributions is strictly larger than the class of distributions that are invariant under orthogonal transformations, even with the restriction that \(\int ||\eta ||^2\mu (\mathrm {d}\eta )=n\). It suffices to find a distribution \(\mu \) with mean 0 and covariance matrix I (since then \(\mu \) satisfies (4) and Proposition 2.14 then tells us that \(\mu \) is canonical and has \(\int ||\eta ||^2\mu (\mathrm {d}\eta )=n\)) and an orthogonal T such that \(\mu \ne \mu \circ T^{-1}\). Let \(\nu \) be the distribution on \(\mathbb {R}\) given by \(\nu =\frac{1}{5}\delta _{-2}+\frac{4}{5}\delta _{1/2}\). Then, using the natural coordinate system, \(\int t \nu (\mathrm {d}t)=\frac{1}{5}(-2)+\frac{4}{5}\frac{1}{2}=0\) and \(\int t^2\nu (\mathrm {d}t) = \frac{1}{5}(-2)^2+\frac{4}{5}(\frac{1}{2})^2=1\). Now let \(\mu =\nu \times \cdots \times \nu \) (n times). Then we directly see that the mean vector is 0 and the covariance matrix is I. However \(T=-I\) is an orthogonal transformation and \(\mu \circ (-I)^{-1}\) equals the product of n times \(\frac{1}{5}\delta _{2}+\frac{4}{5}\delta _{-1/2}\), so obviously \(\mu \ne \mu \circ (-I)^{-1}\) (a numerical illustration follows after this list).
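The following sketch (our illustration, assuming numpy) checks Example 3 numerically: the product measure built from \(\nu \) has mean 0 and covariance I (hence is canonical by Proposition 2.14), while it is visibly not invariant under \(-I\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 200_000
# nu puts mass 1/5 at -2 and 4/5 at 1/2: mean 0 and variance 1.
X = rng.choice([-2.0, 0.5], p=[0.2, 0.8], size=(m, n))
print(X.mean(axis=0))   # approx 0
print(np.cov(X.T))      # approx identity, so mu = nu^n is canonical
# mu is not invariant under -I: the value -2 has probability 1/5 in each
# coordinate under mu, while +2 has probability 0.
print((X[:, 0] == -2.0).mean(), (X[:, 0] == 2.0).mean())  # approx 0.2 and 0.0
```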

3 Uniformly Approximating Grids

We would like to consider interacting particle systems such as the symmetric exclusion process on a manifold. Because the exclusion process does not make sense directly in a continuum, we need a proper discrete grid approximation. More precisely, we need a sequence of grids on the manifold that converges to the manifold in a suitable way. It will become clear that the grids will need to approximate the manifold in a uniform way. We will see in Sect. 4 that a natural requirement on the grids is that we can define edge weights (or, equivalently, random walks) on them, such that the graph Laplacians converge to the Laplace-Beltrami operator in a suitable sense.

To be more precise, we would like to have a sequence \((p_n)_{n=1}^\infty \) in M and construct a sequence of grids \((G^N)_{N=1}^\infty \) by setting \(G^N=\{p_1,\ldots ,p_N\}\). On each \(G^N\), we would like to define a random walk \(X^N\) which jumps from \(p_i\) to \(p_j\) with (symmetric) rate \(W^N_{ij}\) with the property that there exists some function \(a:\mathbb {N}\rightarrow [0,\infty )\) and some constant \(C>0\) such that for each smooth \(\phi \)

$$\begin{aligned} a(N)\sum _{j=1}^NW^N_{ij}(\phi (p_j)-\phi (p_i))\longrightarrow C\Delta _M\phi (p_i)\quad (N\rightarrow \infty ) \end{aligned}$$

where the convergence is in the sense that for all smooth \(\phi :M\rightarrow \mathbb {R}\)

$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N}\sum _{i=1}^N \left| a(N)\sum _{j=1}^NW^N_{ij}(\phi (p_j)-\phi (p_i))- C\Delta _M\phi (p_i)\right| = 0. \end{aligned}$$
(6)

Definition 3.1

We call a sequence of grids and corresponding weights \((G^N,W^N)_{N=1}^\infty \) uniformly approximating grids if they satisfy (6).

Remark 3.2

(Comparison with standard grids) To give an idea of how known grids in Euclidean spaces can be incorporated in this framework, let S be the one-dimensional torus. Let \(S^N\) be the grid that places a grid point at \(k/N, k=1,\ldots ,N\). Now we can define a nearest-neighbour random walk by putting \(W^N_{ij}=\mathbb {1}_{|p_i-p_j|=1/N}\). Also set \(a(N)=N^2\). Then we see for a point \(p_i\in S^N\) with \(N=2^m\) for some \(m\in \mathbb {N}\) that

$$\begin{aligned} a(N)\sum _{j=1}^NW^N_{ij}(\phi (p_j)-\phi (p_i))= & {} N^2 (\phi (p_i+1/N)+\phi (p_i-1/N)-2\phi (p_i)) \\= & {} \phi ''(p_i)+O(N^{-1}). \end{aligned}$$

The compactness of the torus easily implies that this rest term can be bounded uniformly. This implies that (6) holds.
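This computation is easy to reproduce; the sketch below (ours, numpy assumed) evaluates \(a(N)\sum _j W^N_{ij}(\phi (p_j)-\phi (p_i))\) for \(\phi (x)=\sin (2\pi x)\) on the torus grid and compares it with \(\phi ''\); the maximal error decreases with N.

```python
import numpy as np

def max_error(N):
    """Compare N^2 times the nearest-neighbour difference with phi'' on the torus."""
    p = np.arange(1, N + 1) / N
    phi = lambda x: np.sin(2 * np.pi * x)          # automatically periodic
    lap = N ** 2 * (phi(p + 1 / N) + phi(p - 1 / N) - 2 * phi(p))
    exact = -(2 * np.pi) ** 2 * phi(p)             # phi'' for this choice
    return np.max(np.abs(lap - exact))

for N in [16, 32, 64, 128]:
    print(N, max_error(N))
```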

We will show in Sect. 4 that the Symmetric Exclusion Process defined on uniformly approximating grids has a hydrodynamic limit satisfying the heat equation on M.

It is not obvious how uniformly approximating grids could be defined. Most natural grids in Euclidean settings involve some notion of equidistance, scaling or translation invariance. All of these concepts are very hard, if not intrinsically impossible, to define on a manifold. The current section is dedicated to showing that uniformly approximating grids actually exist. To be more precise, we will show that a sequence \((p_n)_{n=1}^\infty \) can be used to define such grids if the empirical measures \(1/N\sum _{i=1}^N\delta _{p_i}\) converge to the uniform distribution in the Kantorovich sense. In Sect. 3.4 we will show that such sequences exist: they are obtained with probability 1 when sampling uniformly from the manifold, i.e. from the normalized Riemannian volume measure.

For the calculations of this section, we need a result that forms the core of the proof of the invariance principle, which we established in Sect. 2.

Remark 3.3

At first sight the requirement that the empirical measures approximate the uniform measure and that the grid points can be sampled uniformly seems arbitrary, but it is actually quite natural. We want to construct a random walk with symmetric jumping rates (we need this for instance for the Symmetric Exclusion Process later). This implies that the invariant measure of the random walk is the counting measure, so the random walk spends on average the same amount of time in each point of the grid. Hence the amount of time that the random walk spends in some subset of the manifold is proportional to the number of grid points in that subset. Since we want the random walk to approximate Brownian motion and the volume measure is invariant for Brownian motion, we want the amount of time that the random walk spends in a set to be proportional to the volume of the set. This means that the number of grid points in a subset of M should be proportional to the volume of that subset. This suggests that the empirical measures \(1/N\sum _{i=1}^N\delta _{p_i}\) should in some sense approximate the uniform measure. Moreover, a natural way to let the number of grid points in a subset be proportional to its volume is by sampling grid points from the uniform distribution on the manifold.

3.1 Model and Motivation

Motivation

In statistical data analysis the following setting is known and used in various contexts such as data clustering, dimension reduction, computer vision and statistical learning; see Singer [18], von Luxburg et al. [22], Giné et al. [9], Belkin and Niyogi [3] and Belkin [2] and references therein for general background and various applications. Suppose we have a manifold M that is embedded in \(\mathbb {R}^m\) for some m and we would like to recover the manifold from some observations of it, say an i.i.d. sample of uniform random elements of M. To do this we can describe the observations as a graph in which the weight on the edge between two points is obtained by applying a positive semidefinite kernel with bandwidth \(\epsilon \) to the Euclidean distance between those points. Then it can be shown that the graph Laplacian of the graph that is obtained in this way converges in a suitable sense to the Laplace-Beltrami operator on M as the number of observations goes to infinity and \(\epsilon \) goes to 0. This suggests that we could define random walks on such random graphs and that the corresponding generators converge to the generator of Brownian motion. We generalize this idea by taking a more general sequence of graphs, but our main example (in Sect. 3.4) will be this random graph.

The main distinction between the statistical literature and our context is the following: for our purposes it is much more natural to view the manifold M on its own instead of embedded in a possibly high-dimensional Euclidean space. This means that we have to use the distance that is induced by the Riemannian metric instead of the Euclidean distance. The latter is more suitable for statistical purposes, because in that setting the Riemannian metric on M is not known beforehand. Also, a lot is known about the behaviour of the Euclidean distance in this type of situation and not so much about the distance on the manifold. We will have to make things work in M itself.

The problem of discretizing the Laplacian on a manifold (without embedding in a Euclidean space) is also studied in the analysis literature where the main concern is the convergence of spectra, see for instance: Burago et al. [5], Fujiwara [8] and Aubry [1], where structures like \(\epsilon \)-nets or triangulations are used to discretize the manifold. However, since we want to define the exclusion process on our discrete weighted graph which approximates the manifold, it is important that the edge weights are symmetric. Therefore these papers cannot be applied in our context.

Model

Let M be a compact and connected Riemannian manifold. We call a function f on M Lipschitz with Lipschitz constant \(L_f\) if

$$\begin{aligned} \sup _{p,q\in M}\frac{|f(p)-f(q)|}{d(p,q)} = L_f<\infty . \end{aligned}$$

Let \((p_n)_{n\ge 1}\) be a sequence in M such that \(\mu ^N:=\frac{1}{N}\sum _{i=1}^N\delta _{p_i}\) converges in the Kantorovich sense to \({\bar{V}}\) (the uniform distribution on M), i.e.

$$\begin{aligned} W_1(\mu ^N,{\bar{V}}) = \sup _{f\in \mathcal {F}_1(M)} \left\{ \int _M f\mathrm {d}\mu ^N - \int _M f \mathrm {d}{\bar{V}}\right\} \rightarrow 0, \end{aligned}$$

where \(\mathcal {F}_1(M)\) denotes the set of Lipschitz functions f on M that have Lipschitz constant \(L_f\le 1\). Define the \(N^{\text {th}}\) grid \(V_N\) as \(V_N=\{p_1,\ldots ,p_N\}\). Set

$$\begin{aligned} \epsilon :=\epsilon (N):=\left( \sup _{m\ge N} W_1(\mu ^m,\bar{V})\right) ^{\frac{1}{4+d}}. \end{aligned}$$
(7)

This \(\epsilon \) rescales the distance over which particles will jump. Naturally, \(\epsilon \downarrow 0\) as \(N\rightarrow \infty \) (since \(W_1(\mu ^N,{\bar{V}})\rightarrow 0\)). Let \(k:[0,\infty )\rightarrow [0,\infty )\) be Lipschitz and compactly supported (for instance \(k(x)=(1-x)\mathbb {1}_{[0,1]}(x)\)); we will call such a k a kernel. Define

$$\begin{aligned} W^\epsilon _{ij}=k(d(p_i,p_j)/\epsilon ) \end{aligned}$$

as the jumping rate from \(p_i\) to \(p_j\). Here d is the Riemannian metric on M. Note that the only dependence on N is through \(\epsilon \), hence the notation \(W^\epsilon _{ij}\) instead of \(W^N_{ij}\). These jumping rates define a random walk on \(V_N\). If we regard two points \(p_i,p_j\) as having an edge between them if \(W^\epsilon _{ij}>0\), we want the resulting graph to be connected (to make sense of the random walk and later of the particle systems defined on it). If we assume that there is some \(\alpha >0\) such that \(k(x)>0\) for \(x\le \alpha \), one can show that the resulting graph is connected for N large enough. The main reason is that the distance between points that are close to each other goes to zero faster than \(\epsilon \). The details of the proof are in the appendix (see also Remark 3.6). Finally we define

$$\begin{aligned} a(N)=\epsilon ^{-2-d}N^{-1}. \end{aligned}$$

To prove that the grids are uniformly approximating we have to show (6), i.e. as the number of points N goes to infinity (and hence the bandwidth \(\epsilon \) goes to 0)

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^N \left| a(N)\sum _{j=1}^NW^\epsilon _{ij}(f(p_j)-f(p_i))- C\Delta _Mf(p_i)\right| \longrightarrow 0 \quad (N\rightarrow \infty ). \end{aligned}$$

We will prove the following slightly stronger result:

$$\begin{aligned} \sup _{1\le i\le N} \left| a(N)\sum _{j=1}^NW^\epsilon _{ij}(f(p_j)-f(p_i))- C\Delta _Mf(p_i)\right| \longrightarrow 0 \quad (N\rightarrow \infty ). \end{aligned}$$
(8)

Note that since the process defined above is just a continuous-time random walk, its generator is given by

$$\begin{aligned} L^Nf(p_i)=\sum _{j=1}^NW^\epsilon _{ij}(f(p_j)-f(p_i)). \end{aligned}$$
(9)

Therefore we call (8) “convergence of the (rescaled) generators to \(\Delta _M\) uniformly in the \(p_i\)’s for \(i\le N\)” or just “convergence of the generators to \(\Delta _M\) uniformly for \(i\le N\)”. In fact, we will show that the rate of convergence does not depend on \(p_i\), so we might as well call it “uniformly in the \(p_i\)’s”.
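To illustrate the whole construction, the following sketch (our own, not from the paper) builds the weights \(W^\epsilon _{ij}\) with the kernel \(k(x)=(1-x)\mathbb {1}_{[0,1]}(x)\) on the flat 2-torus, where uniform sampling and distances are easy, and checks that \(a(N)L^Nf\) is approximately proportional to \(\Delta f\) for \(f(p)=\sin (2\pi p_1)\); the fitted slope estimates the constant C. The choice of \(\epsilon \) here is an illustrative stand-in for (7); numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(6)

def torus_dist(p, q):
    """Geodesic distance on the flat torus [0,1)^2."""
    diff = np.abs(p - q)
    return np.linalg.norm(np.minimum(diff, 1.0 - diff), axis=-1)

N, d = 2000, 2
pts = rng.random((N, d))                  # uniform samples from the torus
eps = N ** (-1.0 / (4 + d))               # illustrative stand-in for (7)
D = torus_dist(pts[:, None, :], pts[None, :, :])
W = np.clip(1.0 - D / eps, 0.0, None)     # W_ij = k(d(p_i, p_j) / eps)
np.fill_diagonal(W, 0.0)
aN = eps ** (-2 - d) / N                  # a(N) = eps^{-2-d} N^{-1}

f = np.sin(2 * np.pi * pts[:, 0])
Lf = aN * (W @ f - W.sum(axis=1) * f)     # a(N) * sum_j W_ij (f_j - f_i)
lap = -(2 * np.pi) ** 2 * f               # Delta f on the flat torus
print(np.polyfit(lap, Lf, 1)[0])          # slope approximates the constant C
```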

Remark 3.4

In fact, we can say more. We denote the semigroups corresponding to the generators \(a(N)\sum _{j=1}^NW^\epsilon _{ij}(f(p_j)-f(p_i))\) by \(S_t^N\) and the semigroup corresponding to \(C\Delta _M\) by \(S_t\). Then (8) implies that uniformly on compact time intervals

$$\begin{aligned} \sup _{1\le i\le N} \left| S_t^Nf|_{G^N}(p_i)- S_tf(p_i)\right| \longrightarrow 0 \quad (N\rightarrow \infty ). \end{aligned}$$

The proof is a straightforward application of (Kurtz [14], Theorem 2.1) and a small argument that the extended limit of the generators above (as described in Kurtz [14]) equals \(C\Delta _M\), since they are equal on the smooth functions.

Remark 3.5

To see why the rescaling a(N) is natural, we can write

$$\begin{aligned} a(N)L^Nf(p_i)=\frac{1}{\epsilon ^2}\sum _{j=1}^N\frac{k\left( \frac{d(p_i,p_j)}{\epsilon }\right) }{N\epsilon ^d}(f(p_j)-f(p_i)). \end{aligned}$$

Since k is a kernel that is rescaled by \(\epsilon \) inside, we need the \(1/\epsilon ^d\) to make sure the integral of the kernel stays of order 1 as \(\epsilon \) goes to 0. Since the number of points that the process can jump to equals N, we also need the factor 1/N to make sure the jumping rate is of order 1 as N goes to infinity. Also note that the typical distance that a particle jumps with these rates is of order \(\epsilon \). This means that space is scaled by \(\epsilon \). Hence it is very natural to expect that time should be rescaled by \(1/\epsilon ^2\), which is exactly what we have.

Finally note that in the calculations N is the main parameter and \(\epsilon \) an auxiliary parameter depending on N. However, conceptually, where the scaling is concerned, the most important parameter is \(\epsilon \): N is just the total number of positions and simply has to grow fast enough as \(\epsilon \) goes to 0. To see why this is true, note that any sequence \(\epsilon (N)\) that goes to 0 more slowly than the one we use here will also do. Hence \(\epsilon \) should go to 0 slowly enough with respect to N or, equivalently, N should go to infinity fast enough with respect to \(\epsilon \).

Remark 3.6

We mentioned earlier that N must grow to infinity fast enough as \(\epsilon \) goes to 0. In fact, with \(\epsilon \) as defined in (7), the number of points in a ball of radius \(\epsilon \) goes to infinity (even though \(\epsilon \) shrinks to 0). In particular, this means that the number of points that a particle can jump to goes to infinity. This is very different from the \(\mathbb {R}^d\) case with the lattice approximation \(\frac{1}{N}\mathbb {Z}^d\), where the number of neighbours is constant. The reason why it should be different in the manifold case is the following. In \(\mathbb {R}^d\), the natural grid \(\frac{1}{N}\mathbb {Z}^d\) is very symmetric. Indeed, we can split the graph Laplacian into the contributions \(N^2(f(x+\mathrm {e}_i/N)+f(x-\mathrm {e}_i/N)-2f(x))\) in each direction i, where \(\mathrm {e}_i\) is the unit vector in direction i. Now when applying a Taylor expansion we see that the first-order terms cancel perfectly, leaving us only with the second-order terms, which we want for the Laplacian. In a manifold such perfect cancellation is not possible. Therefore the way to make the first-order terms cancel is to sample more and more points around a grid point, such that the sum over the linear-order terms becomes an integral which then vanishes in the limit. For this reason we need the number of points in a ball of size \(\epsilon \) to go to infinity.

Remark 3.7

It is also possible to define \(W_{ij}^N\) as \(p_\epsilon (p_i,p_j)\), the heat kernel after time \(\epsilon \), and rescale by \(\epsilon ^{-1}\) instead of \(\epsilon ^{-2-d}\). Then the result of Sect. 3.2 can be proven in the same way (by obtaining some good bounds on Lipschitz constants and suprema of the heat kernel and choosing \(\epsilon =\epsilon (N)\) appropriately, see Cipriani and van Ginkel [6]) and the result of Sect. 3.3 is a direct consequence of the fact that the Laplace-Beltrami operator generates the heat semigroup. However, for purposes of application/simulation the weights that we have chosen here are much easier to calculate (since only the geodesic distances need to be known, not the heat kernel).

3.2 Replacing Empirical Measure by Uniform Measure

We would like to show that in this case there is a C independent of i such that for all smooth f

$$\begin{aligned} \lim _{N\rightarrow \infty }\epsilon ^{-2-d}N^{-1}\sum _{j=1}^N k(d(p_j,p_i)/\epsilon )\left[ f(p_j)-f(p_i)\right] = C\Delta _Mf(p_i) \end{aligned}$$

uniformly in the \(p_i\)’s.

We can write

$$\begin{aligned} \epsilon ^{-2-d}N^{-1}\sum _{j=1}^N k(d(p_j,p_i)/\epsilon )\left[ f(p_j)-f(p_i)\right] = \epsilon ^{-2-d} \int _M g^{\epsilon ,i}\mathrm {d}\mu ^N, \end{aligned}$$
(10)

where

$$\begin{aligned} g^{\epsilon ,i}(p)=k(d(p,p_i)/\epsilon )\left[ f(p)-f(p_i)\right] . \end{aligned}$$

Now (10) equals

$$\begin{aligned} \epsilon ^{-2-d} \int _M g^{\epsilon ,i}\mathrm {d}{\bar{V}}+\epsilon ^{-2-d} \int _M g^{\epsilon ,i}\mathrm {d}(\mu ^N-{\bar{V}}). \end{aligned}$$
(11)

We will show later that the first term converges to \(C\Delta _Mf(p_i)\) (uniformly in the \(p_i\)’s) as \(N\rightarrow \infty \). Therefore it suffices for now to show that the second term converges to 0, uniformly in the \(p_i\)’s.

Note that k is Lipschitz so it has some Lipschitz constant \(L_k<\infty \). This implies that

$$\begin{aligned} \left| k\left( \frac{d(q^1,p_i)}{\epsilon }\right) -k\left( \frac{d(q^2,p_i)}{\epsilon }\right) \right| \le L_k \left| \frac{d(q^1,p_i)}{\epsilon }-\frac{d(q^2,p_i)}{\epsilon }\right| \le \frac{L_k}{\epsilon } d(q^1,q^2), \end{aligned}$$

by the reverse triangle inequality, so \(k(d(\cdot ,p_i)/\epsilon )\) has Lipschitz constant \(\frac{L_k}{\epsilon }\). f is smooth, so it is Lipschitz too with Lipschitz constant \(L_f\). Since \(f(p_i)\) is just a constant, \(f(\cdot )-f(p_i)\) is also Lipschitz with Lipschitz constant \(L_f\). Since they are both bounded functions, we see for the Lipschitz constant of \(g^{\epsilon ,i}\):

$$\begin{aligned} L_{g^{\epsilon ,i}}\le & {} L_{k(d(\cdot ,p_i)/\epsilon )} ||f(\cdot )-f(p_i)||_\infty + ||k(d(\cdot ,p_i)/\epsilon )||_\infty L_{f(\cdot )-f(p_i)} \\\le & {} \frac{2L_k}{\epsilon }||f||_\infty + ||k||_\infty L_f. \end{aligned}$$

Note that k is bounded since it is Lipschitz and compactly supported, so \(||k||_\infty <\infty \). This shows that:

$$\begin{aligned} \left| \epsilon ^{-2-d} \int _M g^{\epsilon ,i}\mathrm {d}(\mu ^N-\bar{V})\right|\le & {} \epsilon ^{-2-d}\left( \frac{2L_k}{\epsilon }||f||_\infty + ||k||_\infty L_f\right) W_1(\mu ^N,{\bar{V}})\\= & {} \epsilon (N)^{-3-d}\left( 2L_k||f||_\infty + \epsilon (N)||k||_\infty L_f\right) W_1(\mu ^N,{\bar{V}}), \end{aligned}$$

where we used the Kantorovich–Rubinstein duality \(\left| \int _M g\,\mathrm {d}(\mu ^N-{\bar{V}})\right| \le L_{g} W_1(\mu ^N,{\bar{V}})\) and denoted the dependence of \(\epsilon \) on N explicitly. By (7), \(W_1(\mu ^N,{\bar{V}})\le \epsilon (N)^{4+d}\), so we obtain

$$\begin{aligned} \left| \epsilon ^{-2-d} \int _M g^{\epsilon ,i}\mathrm {d}(\mu ^N-\bar{V})\right| \le \epsilon \left( 2L_k||f||_\infty + \epsilon ||k||_\infty L_f\right) . \end{aligned}$$

Note that this bound does not depend on \(p_i\). Since \(\epsilon \rightarrow 0\), it follows that the second term of (11) goes to 0 uniformly in the \(p_i\)’s.

What Remains

What we have shown above means that we can replace the empirical distribution \(\mu ^N\) by the uniform distribution \({\bar{V}}\). For convergence of the generators we still have to show that

$$\begin{aligned} \lim _{\epsilon \downarrow 0} \epsilon ^{-2-d}\int _M k(d(p,p_i)/\epsilon ) \left[ f(p)-f(p_i)\right] {\bar{V}} (\mathrm {d}p) = C\Delta _Mf(p_i) \end{aligned}$$

uniformly in the \(p_i\)’s. Note that we can replace \(N\rightarrow \infty \) by \(\epsilon \downarrow 0\), since the expression only depends on N via \(\epsilon \) and \(\epsilon (N)\downarrow 0\) as \(N\rightarrow \infty \). Since the \(p_i\)’s are all in M we can replace \(p_i\) by q and require that the convergence is uniform in \(q\in M\).

Because of these considerations it remains to show that there exists \(C>0\) such that uniformly in \(q\in M\):

$$\begin{aligned} \lim _{\epsilon \downarrow 0} \epsilon ^{-2-d} \int _M k(d(p,q)/\epsilon ) \left[ f(p)-f(q)\right] {\bar{V}} (\mathrm {d}p) = C\Delta _Mf(q). \end{aligned}$$
(12)

Note that for every \(\epsilon >0\) this expression can be interpreted as the generator of a jump process on the manifold M. The process jumps from a point p into a (measurable) set \(Q\subset M\) with rate \(\int _Q \epsilon ^{-2-d}k(d(p,q)/\epsilon )\,{\bar{V}}(\mathrm {d}q)\).

Remark 3.8

Note that this is easy to show in \(\mathbb {R}^d\). Indeed, using the transformation \(u=(y-x)/\epsilon \) and Taylor, we see

$$\begin{aligned}&\epsilon ^{-2-d} \int _{\mathbb {R}^d} k\left( \frac{\Vert y-x\Vert }{\epsilon }\right) (f(y)-f(x))\mathrm {d}y = \epsilon ^{-2} \int _{\mathbb {R}^d} k(\Vert u\Vert ) (f(x+\epsilon u)-f(x))\mathrm {d}u\\&\quad = \epsilon ^{-1} \int _{\mathbb {R}^d} k(\Vert u\Vert ) \nabla f(x) \cdot u \mathrm {d}u + \frac{1}{2} \int _{\mathbb {R}^d} k(\Vert u\Vert ) u^TH(x)u\mathrm {d}u + O(\epsilon ), \end{aligned}$$

where H(x) is the Hessian of f in x. Now changing coordinates to integrate over each sphere \(B_r\) of radius r with respect to the appropriate surface measure \(S_r\) and then with respect to r, we obtain

$$\begin{aligned} \epsilon ^{-1} \int _{\mathbb {R}} k(r) \int _{B_r} \nabla f(x) \cdot w S_r(\mathrm {d}w)\mathrm {d}r + \frac{1}{2} \int _{\mathbb {R}} k(r) \int _{B_r} w^TH(x)w S_r(\mathrm {d}w)\mathrm {d}r + O(\epsilon ). \end{aligned}$$

Now because of symmetry the integrals of \(w_i\) and of \(w_iw_j\) over spheres vanish for each \(i\ne j\). Moreover the integrals of \(w_i^2\) do not depend on i, but only on r. Therefore the first term vanishes and we are left with

$$\begin{aligned} \frac{1}{2} \int _{\mathbb {R}} k(r) C(r) \Delta f(x) \mathrm {d}r + O(\epsilon ) = C'\Delta f(x) + O(\epsilon ). \end{aligned}$$

This shows convergence (at least pointwise, for uniform convergence we have to be a little more careful about the \(O(\epsilon )\)).

3.3 Convergence Result

Integral Over Tangent Space

Let \(\alpha >0\) be such that \(\mathrm {supp}~k\subset [0,\alpha ]\) (such \(\alpha \) exists since k is compactly supported). We denote for \(p\in M,r>0: B_d(p,r)=\{q\in M: d(p,q)\le r\}\). Then we can write

$$\begin{aligned} \int _M k(d(p,q)/\epsilon )(f(q)-f(p)) {\bar{V}}(\mathrm {d}q) = \int _{B_d(p,\alpha \epsilon )} k(d(p,q)/\epsilon ) (f(q)-f(p)) \bar{V}(\mathrm {d}q). \end{aligned}$$
(13)

Denote for \(\eta \in T_pM, r>0: B_p(\eta ,r)=\{\xi \in T_pM: ||\xi -\eta ||\le r\}\) (not to be confused with \(B_\rho \), which is a ball in M with respect to the original metric \(\rho \)). For \(\epsilon \) small enough we know that \(\exp _p: T_pM\supset B_p(0,\alpha \epsilon ) \rightarrow B_d(p,\alpha \epsilon )\subset M\) is a diffeomorphism. We want to use this to write the integral above as an integral over \(B_p(0,\alpha )\subset T_pM\):

$$\begin{aligned}&\int _{B_d(p,\alpha \epsilon )} k(d(p,q)/\epsilon ) (f(q)-f(p)) \bar{V}(\mathrm {d}q) \nonumber \\&\quad = \int _{B_p(0,\alpha \epsilon )} k(d(p,\exp _p(\eta ))/\epsilon ) (f(\exp _p(\eta ) )-f(p)) \bar{V}\circ \exp (\mathrm {d}\eta )\nonumber \\&\quad = \int _{B_p(0,\alpha )} k(d(p,\exp _p(\epsilon \eta ))/\epsilon ) (f(\exp _p(\epsilon \eta ))-f(p)) \bar{V}\circ \exp \circ \lambda _\epsilon (\mathrm {d}\eta ) \nonumber \\&\quad = \int _{B_p(0,\alpha )} k(||\eta ||) (f(\exp _p(\epsilon \eta ))-f(p)) {\bar{V}}\circ \exp \circ \lambda _\epsilon (\mathrm {d}\eta ). \end{aligned}$$
(14)

This means we integrate with respect to the measure \(\bar{V}\circ \exp \circ \lambda _\epsilon \), where \(\lambda _\epsilon \) denotes multiplication with \(\epsilon \).

Determining the Measure \(\bar{V}\circ \exp \circ \lambda _\epsilon \)

Since \(B_p(0,\alpha \epsilon )\) is a star-shaped open neighbourhood of 0, we see that for \(\epsilon \) small enough \(V_\epsilon :=B_d(p,\alpha \epsilon )=\exp _p(B_p(0,\alpha \epsilon ))\) is a normal neighbourhood of p, so there exists a normal coordinate system \((x,V_\epsilon )\) that is centered at p. We interpret, for \(v\in \mathbb {R}^n\), \(v_p\in T_pM\) as \(\sum _i v_i \frac{\partial }{\partial x^i}\). Consequently, when we write \(A_p\) for some subset A of \(\mathbb {R}^n\), we mean \(\{v_p: v\in A\}\). Since the basis \(W=\left( \frac{\partial }{\partial x^1},\ldots ,\frac{\partial }{\partial x^n}\right) \) is orthonormal in \(T_pM\), it is easy to see that \(\phi : v_p\mapsto v\) preserves the inner product and is an isomorphism of inner product spaces. Indeed,

$$\begin{aligned} ||v_p||^2=\left<v_p,v_p\right>=(v_p)^i(v_p)^jg_{ij}=\sum _{ij}v^iv^j\delta ^i_j=\sum _i (v^i)^2=||v||^2. \end{aligned}$$

In particular \(B_{\mathbb {R}^n}(0,\alpha \epsilon )_p=B_p(0,\alpha \epsilon )\) (where \(B_{\mathbb {R}^n}\) denotes a ball in \(\mathbb {R}^n\) with respect to the Euclidean metric). We can use this in the following lemma, which tells us more about \({\bar{V}}\circ \exp \circ \lambda _\epsilon \).

Lemma 3.9

There exist \(\epsilon '>0\) and a function \(h:B_{\mathbb {R}^n}(0,\epsilon ')\rightarrow \mathbb {R}\) with \(h(t)=O(||t||^2)\) as \(t\rightarrow 0\), such that for all \(0<\epsilon <\epsilon '\): \(\bar{V}\circ \exp \circ \lambda _\epsilon =\epsilon ^n \left( \frac{1+h(\epsilon t)}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n \right) \circ \phi \) on \(B_p(0,\alpha )\).

Proof

Let \(\epsilon '\) be small enough such that the considerations above the lemma hold and let \(\epsilon <\epsilon '\). For clarity of the proof, we first separately prove the following statement.

Claim: \(x\circ \exp =\phi \) on \(B_{\mathbb {R}^n}(0,\alpha \epsilon )_p\).

Proof

The geodesics through p are straight lines with respect to x, so they are of the form \(x(\gamma (t))=ta+b\) with \(a,b\in \mathbb {R}^n\). For \(\eta =\sum _i \eta ^i\frac{\partial }{\partial x^i}\), the geodesic starting at p with tangent vector \(\eta \) at p must satisfy \(b=x(p)=0\) and \(a_i=\eta ^i\) for all i, so we see \(x^k(\gamma (t))=t\eta ^k\). For \(q\in B_d(p,\alpha \epsilon )\), we see \(x^k(\exp (x(q)_p))=1\cdot x^k(q)=x^k(q)\), so \(\exp (x(q)_p)=q\). This also shows that \(x\circ \exp (v_p)=v\) for \(v\in B_{\mathbb {R}^n}(0,\alpha \epsilon )\) (since x is invertible), which gives an identification

$$\begin{aligned} x\circ \exp : T_pM\supset B_{\mathbb {R}^n}(0,\alpha \epsilon )_p \rightarrow B_{\mathbb {R}^n}(0,\alpha \epsilon )\subset \mathbb {R}^n \end{aligned}$$

which is the restriction of \(\phi \) to \(B_{\mathbb {R}^n}(0,\alpha \epsilon )_p\). This situation is sketched in Fig. 3. \(\square \)

Now we will first use the definition of integration to see what the measure is in coordinates (so it becomes a measure on a subset of \(\mathbb {R}^n\)). Then we will use the claim above: we will pull the measure on \(\mathbb {R}^n\) back to \(T_pM\) using \(\phi \).

On \((x,V_\epsilon )\) the volume measure is given by \(\sqrt{\det G} \mathrm {d}x^1\wedge \ldots \wedge \mathrm {d}x^n\). According to (Wang [23], Cor 2.3), \(\sqrt{\det G}\) can be expanded (in normal coordinates) as \(1+h(x)\) where h is such that \(h(x)=O(||x||^2)\). Now the measure can be written in local coordinates on \(B_{\mathbb {R}^n}(0,\alpha \epsilon ')\) as \((1+h(x))\mathrm {d}x^1\wedge \ldots \wedge \mathrm {d}x^n\), so the uniform measure is \(\frac{1+h(x)}{V(M)}\mathrm {d}x^1\wedge \ldots \wedge \mathrm {d}x^n\). This yields the measure \({\bar{V}}\circ x^{-1}=\frac{1+h(t)}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n\) on \(x(V_{\epsilon '})=B_{\mathbb {R}^n}(0,\alpha \epsilon ')\). We have on \(B_{\mathbb {R}^n}(0,\alpha )_p\):

$$\begin{aligned} {\bar{V}}\circ \exp \circ \lambda _\epsilon =({\bar{V}}\circ x^{-1})\circ (x\circ \exp )\circ \lambda _\epsilon . \end{aligned}$$

According to the claim above, \(x\circ \exp \) is a restriction of \(\phi \), so we can replace it by \(\phi \). Since this map is linear, it can be interchanged with \(\lambda _\epsilon \), which yields (inserting what we found before and since \(\epsilon <\epsilon '\)):

$$\begin{aligned} \left( \frac{1+h(t)}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n \right) \circ \lambda _\epsilon \circ \phi =\left( \frac{\epsilon ^n(1+h(\epsilon t))}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n \right) \circ \phi . \end{aligned}$$

In the last step we interpret \(\frac{\epsilon ^n(1+h(\epsilon t))}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n\) as a measure on \(B_{\mathbb {R}^n}(0,\alpha )\) and this last step is then just a transformation of measures on \(\mathbb {R}^n\). This yields the expression that we want. \(\square \)

Fig. 3 The situation in Lemma 3.9. On \(B_p(0,\alpha \epsilon )\): \(x\circ \exp =\phi \). The uniform measure on \(B_d(p,\alpha \epsilon )\) is moved via x to \(B_{\mathbb {R}^n}(0,\alpha \epsilon )\) using the density \(\sqrt{\det G}\,\mathrm {d}t^1 \ldots \mathrm {d}t^n\). This measure can then be pulled back to \(B_p(0,\alpha \epsilon )\) using \(\phi \). Since \(\phi \) is an inner product space isomorphism, it will be easy to deal with orthogonal transformations later, in Lemma 3.12

Remark 3.10

We used [23, Cor 2.3] in the proof above. In these notes the expansion of \(\sqrt{\det G(p,x)}\) is calculated around a point p in normal coordinates x centered around p:

$$\begin{aligned} \sqrt{\det G(p,x)} = 1 - \frac{1}{6}\text {Ric}(p)_{kl}x^kx^l+O\left( |x|^3\right) . \end{aligned}$$
(15)

As can be seen, there are no linear terms in the expansion. The coefficients of the quadratic terms are coefficients of the Ricci curvature of M at p. This implies that the way that the uniform distribution on a ball around p in M is pulled back to the tangent space via the exponential map depends on the curvature of M at p. In particular, if there is no curvature, M is locally isometric to a neighbourhood in \(\mathbb {R}^n\), so the same thing happens as in \(\mathbb {R}^n\): we get a uniform distribution on a ball around 0 in the tangent space.

Remark 3.11

We will need in Proposition 3.13 that the statement of Lemma 3.9 holds uniformly in all points of the manifold. This means that the difference between the uniform measure on a ball in the tangent space and the pulled back uniform measure on a geodesic ball in the manifold decays quadratically with \(\epsilon \) uniformly in the manifold. Note that this uniform convergence is intuitively clear, since the difference between the two measures is caused by curvature and curvature is bounded in a compact manifold. As in the proof of Lemma 3.9, one needs to write

$$\begin{aligned} \sqrt{\det G}(\exp _p(x)) = 1 + h_p(x) \end{aligned}$$

for some function \(h_p\) that is \(O(|x|^2)\) independently of p. Here G(q) is the metric matrix at q expressed in (fixed) normal coordinates centered at p. Since the square root and \(\det \) are uniformly continuous on the relevant domains, it suffices to show that

$$\begin{aligned} G(\exp _p(x)) = I + O(|x|^2), \end{aligned}$$
(16)

where the \(O(|x|^2)\) is independent of p. In other words,

$$\begin{aligned} ||G(\exp _p(x))-I||\le C||x||^2, \end{aligned}$$
(17)

where C does not depend on p. For all \(p\in M\) (and for any system of normal coordinates centered at p) we have the following Taylor expansion (note that for fixed p, \(G(\exp _p(\cdot ))_{ij}\) is a map from a subset of \(\mathbb {R}^n\) to \(\mathbb {R}\)):

$$\begin{aligned} G(\exp _p(x))_{ij} = \delta _{ij} + \frac{1}{3}R_{ijkl}x^kx^l + \sum _{|\beta |=3} \frac{3}{\beta !}\int _0^1 (1-t)^2 D^\beta G(\exp _p(\cdot ))_{ij}(tx)\mathrm {d}t\cdot x^\beta . \end{aligned}$$
(18)

From this we get (17) directly for fixed p, i.e. we have

$$\begin{aligned} ||G(\exp _p(x))-I||\le C_p||x||^2. \end{aligned}$$

In order to obtain uniformity of \(C_p\) in p, we note that the functions of p and x appearing in the r.h.s. of (18) can be made smooth both in p and x. Smoothness in x is obvious (within the injectivity radius) and smoothness in p follows from a special choice of normal coordinates in such a way that they vary smoothly with p. A choice of normal coordinates is equivalent to a choice of an orthonormal basis, so one can construct smoothly varying normal coordinates by taking a smooth section of the orthonormal frame bundle (this can only be done locally, but it is enough to have the uniformity result locally, since then by compactness one has it globally). By compactness, the injectivity radius is bounded from below by some \(\delta >0\). Now for all \(p\in M\) and \(||x||<\delta \), (18) holds and (locally) the quantities on the r.h.s. vary smoothly and therefore (again by compactness) one can show that \(C:=\sup _p C_p\) is finite.

A Canonical Part Plus a Rest Term

Now define

$$\begin{aligned} \mu =\left( \frac{1}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n \right) \circ \phi \qquad \text { and } \qquad \mu _R=\left( \frac{h(\epsilon t)}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n \right) \circ \phi \end{aligned}$$

on \(B_p(0,\alpha )\) and 0 everywhere else. Then the lemma implies that (14) equals

$$\begin{aligned}&\int _{B_p(0,\alpha )} k(||\eta ||) (f(\exp _p(\epsilon \eta ))-f(p)) \epsilon ^n (\mu +\mu _R)(\mathrm {d}\eta ) \\&\quad =\epsilon ^n \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) k(||\eta ||) (\mu +\mu _R)(\mathrm {d}\eta ). \end{aligned}$$

Recall that \(p(\epsilon ,\eta )\) is just notation for following the geodesic from p in the direction of \(\eta \) for time \(\epsilon \). Now we define \(\mu ^k=k(||\cdot ||)\mu \) (so the measure which has density \(k(||\cdot ||)\) with respect to \(\mu \)) and analogously \(\mu ^k_R=k(||\cdot ||)\mu _R\). Then we can write the integral above as

$$\begin{aligned} \epsilon ^n \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) (\mu ^k+\mu _R^k)(\mathrm {d}\eta ). \end{aligned}$$

In this way we transformed the integral to one that we worked with in Sect. 2.1 since we wrote it as the generator of a geodesic random walk (see \(L_N\) on page 2). To use the theory that we obtained in that section, we need the following lemma. It tells us that \(\mu ^k\) can be used as a stepping distribution for a geodesic random walk and it gives us the constant speed of the Brownian motion to which it converges (see Sect. 2.2).

Lemma 3.12

\(\mu ^k\) is canonical. Moreover \(\int _{T_pM} ||\eta ||^2 \mu ^k(\mathrm {d}\eta )=\frac{2\pi ^{n/2}}{V(M)\Gamma (n/2)}\int _0^\infty k(r)r^{n+1}\mathrm {d}r\).

Proof

First of all recall that k is continuous and compactly supported, so the integral over k above makes sense and is finite. Define \(\nu =\frac{1}{V(M)} \mathrm {d}t^1 \ldots \mathrm {d}t^n\) on \(B_{\mathbb {R}^n}(0,\alpha )\) and 0 everywhere else. Then we can write \(\mu =\nu \circ \phi \). Since \(\phi \) preserves the norm, we see that \(k(||\cdot ||_{T_pM})\circ \phi ^{-1}=k(||\cdot ||_{\mathbb {R}^n})\). This means that \(\mu ^k=\nu ^k\circ \phi \), where \(\nu ^k:=k(||\cdot ||)\nu \). Since \(\phi \) preserves the inner product, the measure \(\mu ^k\) behaves the same with respect to orthogonal transformations in \(T_pM\) as \(\nu ^k\) with respect to orthogonal transformations in \(\mathbb {R}^n\). Since \(\nu ^k\) is clearly preserved under such transformations, so is \(\mu ^k\). This shows that \(\mu ^k\) is canonical.

Now we calculate the corresponding constant.

$$\begin{aligned} \int _{T_pM} ||\eta ||_{T_pM}^2\mu ^k(\mathrm {d}\eta )= & {} \int _{T_pM} ||v_p||_{T_pM}^2\mu ^k(\mathrm {d}v_p) = \int _{\mathbb {R}^n} ||\phi ^{-1}(v)||_{T_pM}^2\nu ^k(\mathrm {d}v)\\= & {} \int _{\mathbb {R}^n} ||v||_{\mathbb {R}^n}^2\nu ^k(\mathrm {d}v) = \frac{1}{V(M)} \int _{B_{\mathbb {R}^n}(0,\alpha )} ||v||_{\mathbb {R}^n}^2k(||v||_{\mathbb {R}^n})\mathrm {d}v\\= & {} \frac{1}{V(M)}\int _0^\alpha r^2 k(r) \frac{2\pi ^{n/2}}{\Gamma (n/2)}r^{n-1}\mathrm {d}r\\= & {} \frac{2\pi ^{n/2}}{V(M)\Gamma (n/2)}\int _0^\infty k(r)r^{n+1}\mathrm {d}r \end{aligned}$$

The first step was just writing the integral with respect to the coordinates for which we defined \(\mu \). The second step holds because \(\mu ^k=\nu ^k\circ \phi \). The third uses the fact that \(\phi \) preserves the norm. The penultimate step is a change of coordinates in \(\mathbb {R}^n\) using the fact that ||v|| is constant on spheres around the origin. Here \(\frac{2\pi ^{n/2}}{\Gamma (n/2)}r^{n-1}\) is the area of \(rS_{n-1}\). In the last step we used that \(\mathrm {supp}(k)\subset [0,\alpha ]\). \(\square \)
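The constant in Lemma 3.12 can also be checked numerically. The following sketch (illustrative, not from the paper) takes \(n=3\), \(\alpha =1\), the assumed kernel \(k(r)=\max (0,1-r)\), and leaves out the factor \(1/V(M)\); it compares a Monte Carlo evaluation of \(\int _{B(0,1)}||v||^2k(||v||)\,\mathrm {d}v\) with \(\frac{2\pi ^{n/2}}{\Gamma (n/2)}\int _0^1 k(r)r^{n+1}\,\mathrm {d}r\).

```python
import numpy as np
from math import gamma, pi

# Monte Carlo check of the constant in Lemma 3.12 (V(M) factored out), n = 3,
# with the kernel k(r) = max(0, 1 - r) supported in [0, 1].
rng = np.random.default_rng(2)
n = 3
v = rng.uniform(-1.0, 1.0, size=(4_000_000, n))
r = np.linalg.norm(v, axis=1)
lhs = 2.0 ** n * np.mean(r ** 2 * np.maximum(0.0, 1.0 - r))  # MC over the cube [-1,1]^3

# int_0^1 (1 - r) r^{n+1} dr = 1/(n+2) - 1/(n+3) = 1/30 for n = 3
rhs = (2 * pi ** (n / 2) / gamma(n / 2)) * (1.0 / 30.0)
print(lhs, rhs)  # both should be close to 2*pi/15 ≈ 0.4189
```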

Conclusion

We use everything above to obtain the statement that we aim for.

Proposition 3.13

Set

$$\begin{aligned} C=\frac{\pi ^{n/2}}{V(M)n\Gamma (n/2)}\int _0^\infty k(r)r^{n+1}\mathrm {d}r. \end{aligned}$$

Then as \(\epsilon \rightarrow 0\) we have uniformly in \(p\in M\):

$$\begin{aligned} \epsilon ^{-2-n} \int _M k(d(p,q)/\epsilon ) \left[ f(q)-f(p)\right] {\bar{V}} (\mathrm {d}q) \longrightarrow C \Delta _M f(p). \end{aligned}$$

Proof

Let \(p\in M\). We can write

$$\begin{aligned}&\int _M k(d(p,q)/\epsilon )(f(q)-f(p)) {\bar{V}}(\mathrm {d}q) = \epsilon ^n \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) (\mu ^k+\mu ^k_R)(\mathrm {d}\eta )\\&\quad =\epsilon ^n \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) \mu ^k(\mathrm {d}\eta )\\&\qquad +\, \epsilon ^n \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) \mu _R^k(\mathrm {d}\eta ). \end{aligned}$$

From the results in Sects. 2.1 and 2.2 (Proposition  2.14) and Lemma 3.12, we see for the first term uniformly in p

$$\begin{aligned}&\lim _{\epsilon \downarrow 0} \frac{1}{\epsilon ^{2+n}} \epsilon ^n \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) \mu ^k(\mathrm {d}\eta ) \\&\quad = \lim _{\epsilon \downarrow 0}\frac{1}{\epsilon ^{2}} \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) \mu ^k(\mathrm {d}\eta )\\&\quad =\frac{1}{n} \frac{2\pi ^{n/2}}{V(M)\Gamma (n/2)}\int _0^\infty k(r)r^{n+1}\mathrm {d}r\cdot \frac{1}{2} \Delta _M f(p)=C \Delta _M f(p). \end{aligned}$$

Now it suffices to show that the second term goes to zero at a rate independent of p. Let \(\epsilon '',K>0\) be such that \(\epsilon ''<\epsilon '\) and \(|h(s)|<K||s||^2\) for \(s\in B_{\mathbb {R}^n}(0,\epsilon '')\) (where both \(\epsilon '\) and h are from Lemma 3.9). We need Remark 3.11 to make sure that K and \(\epsilon ''\) do not depend on p. Now note that for \(\epsilon <\epsilon ''\):

$$\begin{aligned} |\mu _R|\le & {} \left( \sup _{t\in B_{\mathbb {R}^n}(0,1)}|h(\epsilon t)|\right) \mu \le \left( \sup _{t\in B_{\mathbb {R}^n}(0,1)} K||\epsilon t||^2\right) \mu = \left( \sup _{t\in B_{\mathbb {R}^n}(0,1)} K\epsilon ^2 ||t||^2\right) \mu \\= & {} K\epsilon ^2\mu . \end{aligned}$$

Now we see:

$$\begin{aligned}&\lim _{\epsilon \downarrow 0} \frac{1}{\epsilon ^{2+n}} \epsilon ^n \left| \int _{T_pM} (f(p(\epsilon ,\eta ))-f(p)) \mu ^k_R(\mathrm {d}\eta )\right| \\&\quad \le \lim _{\epsilon \downarrow 0} \frac{1}{\epsilon ^{2}} \int _{T_pM} \left| f(p(\epsilon ,\eta ))-f(p)\right| k(||\eta ||) |\mu _R|(\mathrm {d}\eta )\\&\quad \le \lim _{\epsilon \downarrow 0} \frac{1}{\epsilon ^{2}} \int _{T_pM} d(p(\epsilon ,\eta ),p)L_f k(||\eta ||) K\epsilon ^2 \mu (\mathrm {d}\eta ) \le L_f K \lim _{\epsilon \downarrow 0} \int _{T_pM} \epsilon ||\eta || k(||\eta ||) \mu (\mathrm {d}\eta ) \\&\quad = L_f K \int _{T_pM} ||\eta || k(||\eta ||) \mu (\mathrm {d}\eta )\lim _{\epsilon \downarrow 0}\epsilon = 0, \end{aligned}$$

where we used that the integral is finite since k is bounded and has support in \([0,\alpha ]\). Combining everything above gives what we wanted. \(\square \)

3.4 Example Grid

So far, we have seen that a sequence of grids is suitable for the hydrodynamic limit problem if the empirical distributions converge to the uniform distribution in the Kantorovich topology. We conclude by giving examples of such grids. To be more precise, we show that if one constructs a grid by adding uniformly sampled points from the manifold, this grid is suitable with probability 1.

Remark 3.14

(Comparison with standard grids) Recall the grids \(S^N\) on the one-dimensional torus S from Remark 3.2. We can show that the empirical measures corresponding to these grids along the subsequence \(N=2^m, m=0,1,2, \ldots \) converge to the uniform measure on S with respect to the Kantorovich distance. To this end let \(N=2^m\) be fixed, call the corresponding empirical measure \(\mu ^N\) and call the uniform measure \(\lambda \). Recall that the Kantorovich distance between these measures is alternatively given by

$$\begin{aligned} W_1(\mu ^N,\lambda )=\inf _{\gamma \in \Gamma (\mu ^N,\lambda )} \int _{S\times S} d(x,y) \gamma (dx,dy), \end{aligned}$$

where \(\Gamma (\mu ^N,\lambda )\) is the set of all couplings of \(\mu ^N\) and \(\lambda \). Now let Y be a uniform random variable on S and define

$$\begin{aligned} X=k/N \iff Y\in \left[ \frac{k-1/2}{N},\frac{k+1/2}{N}\right) . \end{aligned}$$

Denote the joint distribution of (X, Y) by \(\nu \). Then it is easy to see that \(\nu \in \Gamma (\mu ^N,\lambda )\). This implies that

$$\begin{aligned} W_1(\mu ^N,\lambda )\le \int _{S\times S} d(x,y) \nu (dx,dy) = \mathbb {E}_\nu (d(X,Y)) \le \frac{1}{2N}. \end{aligned}$$

This implies convergence with respect to the Kantorovich metric along the subsequence \(N=2^m, m=0,1,2, \ldots \). Note, however, that the corresponding edge weights as described in this section are not the same as those in Remark 3.2.
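The coupling used above is easy to test empirically. The following sketch (not from the paper) samples Y uniformly, couples it to the nearest grid point, and confirms that \(\mathbb {E}_\nu [d(X,Y)]\le \frac{1}{2N}\) (in fact the empirical mean is close to \(\frac{1}{4N}\)).

```python
import numpy as np

# Empirical check of the coupling bound in Remark 3.14 on the torus S = [0,1):
# X is the grid point k/N nearest to the uniform variable Y.
rng = np.random.default_rng(3)
for m in (2, 5, 8):
    N = 2 ** m
    Y = rng.uniform(0.0, 1.0, size=1_000_000)
    X = np.round(Y * N) / N % 1.0                       # couples X ~ mu^N to Y ~ lambda
    d = np.minimum(np.abs(X - Y), 1 - np.abs(X - Y))    # distance on the torus
    print(N, d.mean(), 1 / (2 * N))                     # mean ≈ 1/(4N) <= 1/(2N)
```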

Convergence of a Random Grid

Now we move back to the general case of a compact and connected n-dimensional Riemannian manifold M. Let \((P_n)_{n=1}^\infty \) be a sequence of i.i.d. uniform random points on M. Define \( \mu ^N=\frac{1}{N}\sum _{i=1}^N \delta _{P_i}\). We follow [21, Example 5.15] to show that \(W_1(\mu ^N,{\bar{V}})\rightarrow 0\) as \(N\rightarrow \infty \). First we will show that the expectation goes to 0, then we will derive that it goes to 0 almost surely.

For now, let N be fixed. Let \(\mathscr {F}_1\) be the set of Lipschitz functions on M with Lipschitz constant \(\le 1\). Then we define for \(f\in \mathscr {F}_1\) the random variable \(X_f=\mu ^Nf-\bar{V} f\). Note that both \(\mu ^N\) and \({\bar{V}}\) are probability distributions, so \(X_f(\omega )\) is Lipschitz in f for each \(\omega \):

$$\begin{aligned} |X_f-X_g|=|\mu ^Nf-{\bar{V}} f-(\mu ^Ng-{\bar{V}} g)|\le |\mu ^N(f-g)|+|\bar{V}(f-g)|\le 2||f-g||_\infty . \end{aligned}$$

Now note that since f has Lipschitz constant \(\le 1\):

$$\begin{aligned} \sup _{p\in M}f(p)-\inf _{q\in M}f(q) = \sup _{p,q\in M} |f(p)-f(q)|\le \sup _{p,q\in M}d(p,q) =: K. \end{aligned}$$

M is compact, so \(K<\infty \). Since adding constants to f does not change \(X_f\), it suffices to consider \(f\in \mathscr {F}_{1,K}=\{g\in \mathscr {F}_1: 0\le g \le K\}\). For each \(f\in \mathscr {F}_{1,K}\), by writing

$$\begin{aligned} X_f = \sum _{i=1}^N \frac{f(P_i)-{\bar{V}} f}{N}, \end{aligned}$$

we see that it is a sum of iid random variables taking values in \([-\frac{K}{N},\frac{K}{N}]\). By the Azuma-Hoeffding inequality, this implies that \(X_f\) is \(\frac{K^2}{N}\)-subgaussian for each \(f\in \mathscr {F}_{1,K}\). Now [21, Lemma 5.7] shows that

$$\begin{aligned} \mathbb {E}[W_1(\mu ^N,{\bar{V}})]\le \inf _{\epsilon >0}\left\{ 2\epsilon +\sqrt{\frac{2K^2}{N} \log N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )}\right\} , \end{aligned}$$

where \(N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )\) is the minimal number of points in some space containing \(\mathscr {F}_{1,K}\) such that the balls of radius \(\epsilon \) with respect to the uniform distance around those points cover \(\mathscr {F}_{1,K}\).

Estimating the Covering Number \(N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )\)

We now need to estimate this covering number. To do this we need an upper bound of the covering number \(N(M,d,\epsilon )\) of M. Since M is compact there exist \(a,\delta >0\) such that for all \(0<\epsilon <\delta \): \(N(M,d,\epsilon )\le a\epsilon ^{-d}\) (see for instance [16, Lemma 4.2]). Using this we can prove the following.

Lemma 3.15

There is a \(c>0\) such that for all \(0<\epsilon <\delta \): \(N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )\le \exp \left( c/\epsilon ^d\right) .\)

Proof

Fix \(\epsilon >0\) and call \(m=N(M,d,\epsilon /4)\). By definition of this number, we can find points \(p_1, \ldots ,p_m\in M\) such that \(\cup _{i=1}^m B(p_i,\epsilon /4)\supset M\). Now define \(V_1=B(p_1,\epsilon /4)\) and for \(i\ge 2\): \(V_i=B(p_i,\epsilon /4)\setminus \cup _{j=1}^{i-1}V_j\). Now for \(f\in \mathscr {F}_{1,K}\), define \(\pi ^f:M\rightarrow \mathbb {R}\) by

$$\begin{aligned} \pi ^f: V_i\ni p\mapsto \epsilon \left( \left\lfloor \frac{f(p_i)}{\epsilon }\right\rfloor +\frac{1}{2}\right) . \end{aligned}$$

Since each \(p\in M\) is contained in exactly one \(V_i\) (by construction), this map is well-defined. Note that if \(k\epsilon \le f(p_i)<(k+1)\epsilon \), then \(\pi ^f=(k+1/2)\epsilon \) on \(V_i\). In particular, \(|f(p_i)-\pi ^f(p_i)|\le \epsilon /2\). Now denote \(Y=\{\pi ^f: f\in \mathscr {F}_{1,K}\}\).

Now fix \(f\in \mathscr {F}_{1,K}\) and \(p\in M\). Let i be such that \(p\in V_i\). Then we see:

$$\begin{aligned} |\pi ^f(p)-f(p)|= & {} |\pi ^f(p_i)-f(p)|\le |\pi ^f(p_i)-f(p_i)|+|f(p_i)-f(p)|\\\le & {} \epsilon /2 + L_fd(p_i,p) \\\le & {} \epsilon /2+\epsilon /4< \epsilon . \end{aligned}$$

This shows that \(||\pi ^f-f||_\infty \le \epsilon \), which implies that Y is an \(\epsilon \)-net for \(\mathscr {F}_{1,K}\). Hence \(N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )\le \#Y\).

All we have to do now is estimate \(\#Y\).

First of all let \(\pi ^f\in Y\). Note that if \(d(p_i,p_j)\le \epsilon /2\), we see

$$\begin{aligned} |\pi ^f(p_i)-\pi ^f(p_j)|\le & {} |\pi ^f(p_i)-f(p_i)|+|f(p_i)-f(p_j)|+|f(p_j)-\pi ^f(p_j)|\\\le & {} \epsilon /2 + L_fd(p_i,p_j)+\epsilon /2=3\epsilon /2. \end{aligned}$$

Since \(\pi ^f(p_i)-\pi ^f(p_j)=k\epsilon \) for some \(k\in \mathbb {Z}\), we conclude \(\pi ^f(p_i)-\pi ^f(p_j)\in \{-\epsilon ,0,\epsilon \}\), so \(\pi ^f(p_i)\in \{\pi ^f(p_j)-\epsilon ,\pi ^f(p_j),\pi ^f(p_j)+\epsilon \}.\)

Now define a graph G with vertices \(p_1,\ldots ,p_m\) by putting an edge between \(p_i\) and \(p_j\) whenever \(d(p_i,p_j)\le \epsilon /2\). Any \(\pi ^f\) is uniquely specified by its values on the nodes of G. Note further that whenever we know \(\pi ^f\) at some point of the graph, there are only 3 possible values left for each of its neighbours (since neighbours are at distance at most \(\epsilon /2\)). Now \(\#Y\) is dominated by the number of ways in which we can assign values of the type \((k+1/2)\epsilon \) to nodes of G while taking this restriction into account. Define, for \(i\ge 0\), \(S_i=\{p\in G: d_G(p_1,p)=i\}\), where \(d_G(p,q)\) denotes the minimum number of edges that need to be followed to walk from p to q in G. Now we can start counting.

For \(p_1\), there are at most \(\left\lceil K/\epsilon \right\rceil \) possible values (recall that any \(f\in \mathscr {F}_{1,K}\) has \(0\le f\le K\)). Each node in \(S_1\) is at distance at most \(\epsilon /2\) from \(p_1\), so each such node can take at most 3 values. This brings the possible number of value assignments to (less than) \(\left\lceil K/\epsilon \right\rceil 3^{\#S_1}\). Now each node in \(S_2\) is at distance at most \(\epsilon /2\) from a node in \(S_1\), so each of these can take at most 3 different values. This brings the number of options so far to at most \(\left\lceil K/\epsilon \right\rceil 3^{\#S_1}3^{\#S_2}\). Continuing in this way, we obtain that the number of ways to assign values is at most

$$\begin{aligned} \left\lceil \frac{K}{\epsilon } \right\rceil \prod _{i=1}^{\infty }3^{\#S_i} = \left\lceil \frac{K}{\epsilon } \right\rceil 3^{\sum _{i=1}^{\infty }\#S_i} = \left\lceil \frac{K}{\epsilon } \right\rceil 3^{m-1} = \left\lceil \frac{K}{\epsilon } \right\rceil 3^{N(M,d,\epsilon /4)-1}. \end{aligned}$$

Recall that m is the total number of balls defined at the beginning of the proof, which we chose equal to \(N(M,d,\epsilon /4)\). Now we know that for \(0<\epsilon <\delta \)

$$\begin{aligned} N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )\le \left\lceil \frac{K}{\epsilon } \right\rceil 3^{a/(\epsilon /4)^d-1} = \left\lceil \frac{K}{\epsilon } \right\rceil 3^{a4^d/\epsilon ^d-1}. \end{aligned}$$

This implies that there exists \(c>0\) such that for all \(0<\epsilon <\delta \), \(N(\mathscr {F}_{1,K},||\cdot ||_\infty ,\epsilon )\le \mathrm {e}^{c/\epsilon ^d}\). \(\square \)

Now we see that for any \(0<\epsilon <\delta :\)

$$\begin{aligned} \mathbb {E}[W_1(\mu ^N,{\bar{V}})]\le 2\epsilon + \sqrt{\frac{2K^2}{N} \log \exp {c/\epsilon ^d}} = 2\epsilon +\sqrt{\frac{2cK^2}{N}}\epsilon ^{-d/2}. \end{aligned}$$

Elementary methods show that this value takes a minimum at \(\epsilon =c_0N^{\frac{-1}{d+2}}\) where \(c_0\) is some constant (take N large enough such that \(c_0N^{\frac{-1}{d+2}}<\delta \)). This shows that the optimal bound that we get is

$$\begin{aligned} 2c_0N^{\frac{-1}{d+2}}+\sqrt{\frac{2cK^2}{N}}\left( c_0N^{\frac{-1}{d+2}}\right) ^{-d/2} = 2c_0N^{\frac{-1}{d+2}} +c_1 N^{\frac{-1}{d+2}} \end{aligned}$$

where \(c_1\) is a product of constants that do not depend on N. This shows that

$$\begin{aligned} \mathbb {E}[W_1(\mu ^N,{\bar{V}})]\le (2c_0 +c_1) N^{\frac{-1}{d+2}} \rightarrow 0 \end{aligned}$$

as \(N\rightarrow \infty \).
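As an illustration (a sketch under simplifying assumptions, not from the paper), one can observe this decay empirically in the one-dimensional case \(M=[0,1]\), where \(W_1(\mu ^N,\mathrm {Unif})=\int _0^1|F_N(t)-t|\,\mathrm {d}t\) with \(F_N\) the empirical CDF. Note that the rate \(N^{-1/(d+2)}\) from the chaining argument is only an upper bound; in dimension 1 the observed decay is the faster \(N^{-1/2}\), comfortably within the bound.

```python
import numpy as np

# Empirical illustration of E[W_1(mu^N, uniform)] -> 0 for i.i.d. uniform points
# on [0,1], where W_1 equals the L^1 distance between the empirical CDF and t.
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 20_001)
for N in (100, 1_000, 10_000):
    est = []
    for _ in range(20):
        x = np.sort(rng.uniform(0.0, 1.0, size=N))
        F_N = np.searchsorted(x, t) / N          # empirical CDF on a fine grid
        est.append(np.mean(np.abs(F_N - t)))     # ≈ W_1 on the interval
    print(N, np.mean(est), N ** (-1 / 3), N ** (-1 / 2))
```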

Convergence a.s.

It remains to show that \(W_1(\mu ^N,{\bar{V}})\) goes to zero almost surely. For a function \(f:M^N\rightarrow \mathbb {R}\) define

$$\begin{aligned} D_if(p_1,\ldots ,p_N)= & {} \sup _{z\in M} f(p_1,\ldots ,p_{i-1},z,p_{i+1},\ldots ,p_N) \\&- \inf _{z\in M} f(p_1,\ldots ,p_{i-1},z,p_{i+1},\ldots ,p_N). \end{aligned}$$

Further, define the function \(H:M^N\rightarrow \mathbb {R}\) by

$$\begin{aligned} (p_1,\ldots ,p_N)\mapsto \sup _{g\in \mathscr {F}_1} \left\{ \frac{1}{N}\sum _{i=1}^N g(p_i) - \int _M g\mathrm {d}{\bar{V}}\right\} . \end{aligned}$$

Note that \(H(p_1,\ldots ,p_N)=W_1(\mu ^N,{\bar{V}})\).

Lemma 3.16

Set (as before) \(K=\sup _{p,q\in M}d(p,q)\). Then for each \(1\le j\le N\): \(||D_jH||_\infty \le K/N\).

Proof

Let \(1\le j\le N\) and fix \(p_1,\ldots ,p_N\). Denote for \(p\in M\) and \(g\in \mathscr {F}_1\)

$$\begin{aligned} J^j(g,p)=\frac{1}{N}\left( \sum _{i=1,i\ne j}^N g(p_i)+g(p)\right) -\int _Mg\mathrm {d}{\bar{V}}. \end{aligned}$$

Now let \(p,q\in M\). Then for any \(g\in \mathscr {F}_1\):

$$\begin{aligned} |J^j(g,p)-J^j(g,q)| = \frac{1}{N} |g(p)-g(q)| \le \frac{1}{N} d(p,q) \le \frac{K}{N}. \end{aligned}$$

This shows that \(g\mapsto J^j(g,p)\) and \(g\mapsto J^j(g,q)\) are always at most K/N apart from each other, which implies that

$$\begin{aligned} \left| \sup _{g\in \mathscr {F}_1}J^j(g,p)-\sup _{g\in \mathscr {F}_1}J^j(g,q)\right| \le \frac{K}{N}. \end{aligned}$$

Now

$$\begin{aligned} D_jH(p_1,\ldots ,p_N)= & {} \sup _{p\in M} H(p_1,\ldots ,p_{j-1},p,p_{j+1},\ldots ,p_N) \\&- \inf _{q\in M} H(p_1,\ldots ,p_{j-1},q,p_{j+1},\ldots ,p_N)\\= & {} \sup _{p,q\in M} | H(p_1,\ldots ,p_{j-1},p,p_{j+1},\ldots ,p_N) \\&- H(p_1,\ldots ,p_{j-1},q,p_{j+1},\ldots ,p_N)|\\= & {} \sup _{p,q\in M} \left| \sup _{g\in \mathscr {F}_1}J^j(g,p)-\sup _{g\in \mathscr {F}_1}J^j(g,q)\right| \le \frac{K}{N}. \end{aligned}$$

Since \(p_1,\ldots ,p_N\) were arbitrary, we conclude that \(||D_jH||_\infty \le \frac{K}{N}\). \(\square \)

Now we are in position to prove the main result.

Proposition 3.17

\(W_1(\mu ^N,{\bar{V}})\rightarrow 0\) almost surely as \(N\rightarrow \infty \).

Proof

Since \(P_1, \ldots ,P_N\) are independent, (van Handel [21], Theorem 3.11) gives us that for any \(t>0\)

$$\begin{aligned} \mathbb {P}(W_1(\mu ^N,{\bar{V}})-\mathbb {E}W_1(\mu ^N,{\bar{V}})>t)= & {} \mathbb {P}\left( H(P_1, \ldots ,P_N)-\mathbb {E}H(P_1,\ldots ,P_N)>t\right) \\\le & {} \exp \left( \frac{-2t^2}{\sum _{k=1}^N||D_kH||_\infty ^2}\right) \le \exp \left( \frac{-2t^2N}{K^2}\right) , \end{aligned}$$

where the last inequality follows from Lemma 3.16. For reasons of symmetry we obtain

$$\begin{aligned} \mathbb {P}\left( \left| W_1(\mu ^N,{\bar{V}})-\mathbb {E}W_1(\mu ^N,{\bar{V}})\right| >t\right) \le 2 \exp \left( \frac{-2t^2N}{K^2}\right) . \end{aligned}$$

By a standard application of the Borel-Cantelli lemma, this implies that \(W_1(\mu ^N,{\bar{V}})-\mathbb {E}W_1(\mu ^N,{\bar{V}})\rightarrow 0\) a.s. Since we have already seen that \(\mathbb {E}W_1(\mu ^N,{\bar{V}})\rightarrow 0\), we conclude that a.s. as \(N\rightarrow \infty \)

$$\begin{aligned} W_1(\mu ^N,{\bar{V}})\rightarrow 0. \end{aligned}$$

\(\square \)

We conclude that sampling uniformly from the manifold yields a suitable grid with probability 1.
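To summarise the construction of this section concretely, the following sketch (illustrative, not from the paper; the kernel \(k(r)=\max (0,1-r)\), the fixed \(\epsilon \) and the test function are assumptions for the demonstration) samples a uniform random grid on the sphere \(S^2\) (so \(n=d=2\), \(V(M)=4\pi \)), builds the weights \(W_{ij}=k(d(p_i,p_j)/\epsilon )\), and tests the rescaled generator in the averaged form \(\frac{1}{N}\sum _i f(p_i)\,a(N)\sum _j W_{ij}(f(p_j)-f(p_i))\rightarrow C\int _M f\Delta _Mf\,\mathrm {d}{\bar{V}}\). For \(f=z\) one has \(\Delta _{S^2}f=-2f\), and Proposition 3.13 gives \(C=1/160\) for this kernel, so the limit is \(-2C/3=-1/240\).

```python
import numpy as np

rng = np.random.default_rng(5)
N, eps = 10_000, 0.2                             # fixed eps, for illustration only
P = rng.normal(size=(N, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)    # i.i.d. uniform points on S^2
f = P[:, 2]                                      # f = z on S^2, so Delta_M f = -2 f
aN = 1.0 / (N * eps ** 4)                        # a(N) = 1/(N eps^{2+d}) with d = 2

gen = np.empty(N)
for s in range(0, N, 500):                       # chunked to avoid an N x N matrix
    d_geo = np.arccos(np.clip(P[s:s + 500] @ P.T, -1.0, 1.0))  # geodesic distance
    W = np.maximum(0.0, 1.0 - d_geo / eps)       # W_ij = k(d/eps), k(r) = max(0, 1-r)
    gen[s:s + 500] = aN * (W @ f - W.sum(axis=1) * f[s:s + 500])

# Averaged test: should approach C * int f Delta_M f dV-bar = -1/240,
# up to O(eps^2) bias (including curvature corrections) and sampling error.
print(np.mean(f * gen), -1.0 / 240.0)
```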

4 Hydrodynamic Limit of the SEP

In Sect. 3 we showed the existence of uniformly approximating grids. In this section we will apply such grids: we will use them to define an interacting particle system on the manifold. Then we will show that this interacting particle system has a hydrodynamic limit and that this limit satisfies the heat equation (the precise formulation is given in Theorem 4.2). We follow a standard method that is used in (Seppäläinen [17], Chap. 8) for the Euclidean case.

Now let \((G_N,W_N)_{N=1}^\infty \) be a sequence of uniformly approximating grids with corresponding weights. Recall that this means the following. There is a sequence \((p_n)_{n=1}^\infty \) in M such that \(G^N=\{p_1,\ldots ,p_N\}\). On each \(G^N\), there is a random walk \(X^N\) which jumps from \(p_i\) to \(p_j\) with (symmetric) rate \(W^N_{ij}\). We assume that there exists some function \(a:\mathbb {N}\rightarrow [0,\infty )\) and some constant \(C>0\) such that for each smooth \(\phi \)

$$\begin{aligned} a(N)\sum _{j=1}^NW^N_{ij}(\phi (p_j)-\phi (p_i))\longrightarrow C\Delta _M\phi (p_i)\quad (N\rightarrow \infty ) \end{aligned}$$

where the convergence is in the sense that for all smooth \(\phi \)

$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N}\sum _{i=1}^N \left| a(N)\sum _{j=1}^NW^N_{ij}(\phi (p_j)-\phi (p_i))- C\Delta _M\phi (p_i)\right| = 0. \end{aligned}$$
(19)

By dividing a(N) by C if necessary, we can assume that \(C=1\).

Remark 4.1

Note that for the result of this section it is not necessary that the grids are constructed from a single sequence of points as above: any sequence of finite grids such that (19) holds would do. However, since the grids that we constructed in Sect. 3 are of this form and this section partially serves as an example of the application of those grids, we formulate our results in this section in the same way.

4.1 Symmetric Exclusion Process

The Symmetric Exclusion Process (SEP) is an interacting particle system that was introduced in Spitzer [19] and studied in detail in (Liggett [15], Chap. 8). The idea is that there is some (possibly countably infinite) number of particles on a (possibly countably infinite) graph G. The particles are considered identical. Each particle jumps after independent exponential times with parameter 1 from x to y with probability p(x,y), provided that the site that it wants to jump to is not already occupied; otherwise, the jump is suppressed. We assume that \(p(x,y)=p(y,x)\). Let \(\eta _t\in \{0,1\}^G\) denote the configuration of the particles at time t, i.e. \(\eta _t(x)=1\) if there is a particle at site \(x\in G\) at time t and 0 otherwise. We will sometimes write \(\eta (p,t)=\eta _t(p)\). For any configuration \(\eta \) and points x, y define \(\eta ^{xy}\) by

$$\begin{aligned} \eta ^{xy}(z)={\left\{ \begin{array}{ll} \eta (x) &{} \text { if } z=y\\ \eta (y) &{} \text { if } z=x\\ \eta (z) &{} \text { if } z\ne x,y \end{array}\right. } \end{aligned}$$

An equivalent description of this process is the following. All edges (x, y) have independent exponential clocks with rate \(p(x,y)=p(y,x)\). Whenever a clock rings, the particles that are at either side of the corresponding edge jump along the edge. This means that if there are no particles, nothing happens. If there is one particle, it jumps. If there are two particles, they switch places. Since we are not interested in individual particles, the configuration stays the same in the latter case. Note that in this way there can never be more than one particle at the same site. Using the notation introduced above, we see that the generator of this process is defined on the core of local functions as

$$\begin{aligned} Lf(\eta )=\frac{1}{2}\sum _{x,y} p(x,y) (f(\eta ^{xy})-f(\eta )). \end{aligned}$$

The factor \(\frac{1}{2}\) is there since we count every edge twice.
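The edge-clock description translates directly into a simple simulation. The following sketch (illustrative code, not from the paper; all names are hypothetical) runs the SEP on a finite graph with symmetric rates p(x, y) by repeatedly sampling the next ringing clock and swapping the contents of the corresponding edge; note that the particle number is conserved.

```python
import numpy as np

def simulate_sep(rates, eta0, t_max, rng):
    """Edge-clock SEP. rates: symmetric (n x n) array with zero diagonal;
    eta0: initial 0/1 configuration. Returns the configuration at time t_max."""
    n = len(eta0)
    edges = [(x, y) for x in range(n) for y in range(x + 1, n) if rates[x, y] > 0]
    edge_rates = np.array([rates[x, y] for x, y in edges])
    total = edge_rates.sum()
    eta, t = eta0.copy(), 0.0
    while True:
        t += rng.exponential(1.0 / total)        # time until the next clock rings
        if t > t_max:
            return eta
        x, y = edges[rng.choice(len(edges), p=edge_rates / total)]
        eta[x], eta[y] = eta[y], eta[x]          # swap: covers 0, 1 or 2 particles

rng = np.random.default_rng(6)
n = 50
rates = rng.uniform(size=(n, n)); rates = (rates + rates.T) / 2  # symmetric p(x,y)
np.fill_diagonal(rates, 0.0)
eta0 = (rng.uniform(size=n) < 0.5).astype(int)
print(eta0.sum(), simulate_sep(rates, eta0, 5.0, rng).sum())     # particle number conserved
```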

The Process

We now define the SEP \(\eta ^{N}=(\eta ^{N}_t)_{t\ge 0}\) on \(G^N\) through the generator

$$\begin{aligned} L^{N}h(\eta )=\frac{a(N)}{2}\sum _{i,j=1}^NW_{ij}^N (h(\eta ^{ij})-h(\eta )), \quad h: \{0,1\}^{G^N}\rightarrow \mathbb {R}. \end{aligned}$$

Here \(\eta ^{ij}:=\eta ^{p_ip_j}\). It follows from our considerations above that this process describes particles that perform independent random walks according to \(X^N\) with the restriction that jumps to occupied sites are suppressed.

Let \((X_i)_{i=1}^\infty \) be some sequence of (possibly degenerate) random variables taking values in \(\{0,1\}\). Set as the initial configuration \(\eta ^{N}_0(p_i)=X_i\).

4.2 Hydrodynamic Limit

We will use this subsection to give the basic definitions that describe the idea of a hydrodynamic limit. At a microscopic scale, the particles are just random walkers with some interaction, but at the macroscopic scale (where limits are taken in space and time), the behaviour is deterministic: it is described by a partial differential equation (in our case the heat equation).

Path Space

Now write R(M) for the space of Radon measures on M with the vague topology and let \(D=D([0,\infty ),R(M))\) denote the space of all paths \(\gamma :[0,\infty )\rightarrow R(M)\) such that \(\gamma \) is right continuous and has left limits. On this space we can define the Skorokhod metric (see for instance [17, Appendix A.2.2]). Since R(M) is a Polish space, it can be shown that D with the Skorokhod metric is a Polish space too.

Initial Conditions and Trajectories of Particle Configurations

Define

$$\begin{aligned} \mu _t^{N}=\frac{1}{N}\sum _{i=1}^N\delta _{p_i}\eta ^{N}_{t}(p_i), \end{aligned}$$

where \(\delta _p\) is the Dirac measure which places mass 1 at \(p\in M\). The measure \(\mu _t^{N}\) puts a point mass at each particle and rescales by the number of possible positions, so it represents the particle configuration \(\eta ^{N}_t\) at time t. In particular \(\mu _t^{N}\) is a sub-probability measure and is in R(M).

Instead of dealing with convergence pointwise for each t, we will look at trajectories. As the particles move according to the SEP, \(\gamma ^{N}:[0,\infty )\rightarrow R(M)\) defined by \(t\mapsto \mu _t^{N}\) is a random trajectory and hence a random element of D. It represents the positions of the particles over time. The initial configuration \(X_1,\ldots ,X_N\) and the dynamics of the SEP determine a distribution \(Q^{N}\) on D. In this way we obtain a sequence \((Q^{N})_{N=1}^\infty \) of measures on D.

Assumption on the Initial Configuration

We assume that there exists a measurable function \(\rho _0:M\rightarrow \mathbb {R}\) such that \(0\le \rho _0\le 1\) and \(\mu _0^{N}\) converges vaguely to \(\rho _0\mathrm {d}{\bar{V}}\) in probability, i.e. for any continuous \(\phi \) as \(N\rightarrow \infty \):

$$\begin{aligned} \int _M \phi \mathrm {d}\mu _0^{N} \rightarrow \int _M \rho _0\phi \mathrm {d}{\bar{V}} \qquad \text { in probability.} \end{aligned}$$
(20)

If this is the case, we say that \(\rho _0\mathrm {d}{\bar{V}}\) is the density profile corresponding to the configurations \(\eta _0^{N}\). Note that using measures here to represent the particles provides a bridge between separate particles (discrete measures) and density profiles (measures that are absolutely continuous with respect to \({\bar{V}}\)). We would like to show that if this initial condition is given, then at any time t the configurations \(\eta _t^{N}\) have a corresponding density profile \(\rho _t\mathrm {d}{\bar{V}}\). Moreover, we want to show that \(t\mapsto \rho _t\) solves the heat equation with initial condition \(\rho _0\).

Example of Initial Distribution

Suppose for now that the \(p_i\)'s are such that for any continuous f: \(\frac{1}{N}\sum _{i=1}^Nf(p_i)\rightarrow \int _Mf\mathrm {d}{\bar{V}}\). Define the random variables \((X_i)_{i=1}^\infty \) to be independent Bernoulli random variables with \(\mathbb {E}X_i = \rho _0(p_i)\) for some continuous function \(\rho _0:M\rightarrow \mathbb {R}\) with \(0\le \rho _0\le 1\). Then we see as \(N\rightarrow \infty \):

$$\begin{aligned} \mathbb {E}\left[ \int \phi \mathrm {d}\mu _0^{N}\right]= & {} \mathbb {E}\left[ \frac{1}{N} \sum _{i=1}^N \phi (p_i)\eta _0^{N}(p_i)\right] = \frac{1}{N} \sum _{i=1}^N \phi (p_i)\mathbb {E}\eta _0^{N}(p_i)\\= & {} \frac{1}{N} \sum _{i=1}^N \phi (p_i)\rho _0(p_i)\rightarrow \int \phi \rho _0\mathrm {d}{\bar{V}}, \end{aligned}$$

since \(\phi \) and \(\rho _0\) are continuous. Further,

$$\begin{aligned} \mathrm {var}\left[ \int \phi \mathrm {d}\mu _0^{N}\right]= & {} \mathrm {var}\left[ \frac{1}{N} \sum _{i=1}^N \phi (p_i)\eta _0^{N}(p_i)\right] = \frac{1}{N^2} \sum _{i=1}^N \phi (p_i)^2\mathrm {var}(\eta _0^{N}(p_i)) \\= & {} \frac{1}{N^2} \sum _{i=1}^N \phi (p_i)^2\rho _0(p_i)(1-\rho _0(p_i))\rightarrow 0. \end{aligned}$$

Together this implies that (20) holds here for any continuous \(\phi \).
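A quick numerical check of (20) for this example (a sketch, not from the paper) can be done on the circle with the equidistributed grid \(p_i=i/N\) and arbitrary continuous choices of \(\rho _0\) and \(\phi \):

```python
import numpy as np

rng = np.random.default_rng(7)
rho0 = lambda x: 0.5 + 0.4 * np.cos(2 * np.pi * x)   # continuous profile in [0.1, 0.9]
phi = lambda x: np.sin(2 * np.pi * x) + 2.0          # continuous test function
for N in (100, 10_000, 1_000_000):
    p = np.arange(N) / N                             # equidistributed grid on [0,1)
    eta0 = (rng.uniform(size=N) < rho0(p)).astype(float)  # independent Bernoulli(rho0(p_i))
    print(N, np.mean(phi(p) * eta0))                 # -> int phi rho0 dV-bar = 1.0 here
```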

Main Result

After all these definitions, we can state the main result of this section.

Theorem 4.2

Let M be a compact, connected, n-dimensional Riemannian manifold and let \((G_N,W_N)_{N=1}^\infty \) be a sequence of uniformly approximating grids with corresponding weights. Let \(\eta ^N_t\) be particle configurations that behave according to the SEP on \((G_N,W_N)\) and let \(\mu ^N_t\) be its measure valued representation. Suppose that \(\mu _0^N\) has density profile \(\rho _0\mathrm {d}{\bar{V}}\) for some measurable function \(\rho _0\). Then the trajectory \(t\mapsto \mu ^N_t\) converges in probability to the trajectory \(t\mapsto \rho _t\mathrm {d}{\bar{V}}\) in the Skorokhod topology, where \(t\mapsto \rho _t\) satisfies the heat equation on M with initial condition \(\rho _0\).

4.3 Convergence Result

Dynkin Martingale

The proof of the hydrodynamic limit follows the line of (Seppäläinen [17], Chap. 8), which is a canonical method that is also discussed in Kipnis and Landim [13]. However, in our context there are several new technical difficulties along the way which we have to tackle. The core calculations are based on the following Dynkin martingale result. It is a standard result and it is also proved in Seppäläinen [17]; we formulate it in the generality needed for our situation.

Proposition 4.3

Let \(\{\eta _t,t\ge 0\}\) be a Feller process on a compact metric space with generator L and semigroup \(S_t\). For any function f such that both f and \(f^2\) are in D(L), define

$$\begin{aligned} M_t=f(\eta _t)-f(\eta _0) - \int _0^t Lf(\eta _s)\mathrm {d}s. \end{aligned}$$

Then \(M_t\) is a martingale with respect to the filtration \(\mathscr {F}_t=\sigma \{\eta _r,r\le t\}\). Moreover, its quadratic variation \(\left<M,M\right>_t\) equals \(\int _0^t\gamma (s)\mathrm {d}s\), where \(\gamma (s)=(L(f^2)-2fLf)(\eta _s)\).

Application of the Proposition

First of all fix a smooth function \(\phi \) on M. Define for \(\eta \in \{0,1\}^{G^N}\): \(f^N(\eta )=\frac{1}{N}\sum _{i=1}^N \eta (p_i)\phi (p_i)=\mu (\phi )\), where \(\mu =\frac{1}{N}\sum _{i=1}^N\delta _{p_i}\eta (p_i)\). Note that since \(L^{N}\) is the generator of a random walk on the finite space of configurations, its domain consists of all functions on those configurations, so in particular \(f^N\) and \((f^N)^2\) are in it. Applying Proposition 4.3 in this situation shows that \(M^{N}\) defined by

$$\begin{aligned} M^{N}_{t}=f^N(\eta ^{N}_{t})-f^N(\eta ^{N}_0) - \int _0^{t} L^{N}f^N(\eta ^{N}_s)\mathrm {d}s \end{aligned}$$
(21)

is a martingale with quadratic variation \(\left<M^{N},M^{N}\right>_t=\int _0^t\gamma (s)\mathrm {d}s\), where \(\gamma (s)=(L^{N}(f^N)^2-2f^NL^{N}f^N)(\eta _s)\). Some basic manipulations show that

$$\begin{aligned} f^N(\eta ^{ij})-f^N(\eta )=-\frac{1}{N}(\phi (p_j)-\phi (p_i))(\eta (p_j)-\eta (p_i)). \end{aligned}$$
(22)

Inserting definitions and suppressing some indices (to keep the formulas readable) shows that the right hand side of (21) equals

$$\begin{aligned}&\frac{1}{N}\sum _{i=1}^N\phi (p_i)(\eta _t(p_i))-\frac{1}{N}\sum _{i=1}^N\phi (p_i)(\eta _0(p_i))\nonumber \\&\qquad -\left( -\int _0^{t} \frac{a(N)}{2N} \sum _{i,j=1}^NW_{ij}^N (\phi (p_j)-\phi (p_i))(\eta _s(p_j)-\eta _s(p_i))\mathrm {d}s\right) \nonumber \\&\quad =\mu _t^{N}(\phi )-\mu _0^{N}(\phi ) - \int _0^{t} \frac{a(N)}{N} \sum _{i,j=1}^NW_{ij}^N(\phi (p_j)-\phi (p_i))\eta _{s}(p_i)\mathrm {d}s\nonumber \\&\quad =\mu _t^{N}(\phi )-\mu _0^{N}(\phi ) - \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)\left( a(N)\sum _{j=1}^NW_{ij}^N(\phi (p_j)-\phi (p_i))\right) \mathrm {d}s.\qquad \end{aligned}$$
(23)
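Identity (22) is elementary but easy to get wrong by a sign; a short numerical check (illustrative, not from the paper):

```python
import numpy as np

# Check of (22): for f^N(eta) = (1/N) sum_i eta(p_i) phi(p_i), swapping the
# occupancies of sites i and j changes f^N by -(1/N)(phi_j - phi_i)(eta_j - eta_i).
rng = np.random.default_rng(8)
N = 100
phi = rng.normal(size=N)                      # values phi(p_i)
eta = rng.integers(0, 2, size=N).astype(float)
i, j = 3, 47
eta_swapped = eta.copy()
eta_swapped[[i, j]] = eta[[j, i]]             # the configuration eta^{ij}
lhs = np.dot(eta_swapped - eta, phi) / N      # f^N(eta^{ij}) - f^N(eta)
rhs = -(phi[j] - phi[i]) * (eta[j] - eta[i]) / N
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```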

Using Convergence of the Generators

By (19), we can write for any \(p_i\):

$$\begin{aligned} a(N)\sum _{j=1}^NW_{ij}^N(\phi (p_j)-\phi (p_i)) = \Delta _M\phi (p_i)+E_{p_i}(N), \end{aligned}$$
(24)

where

$$\begin{aligned} E(N):=\frac{1}{N}\sum _{i=1}^N|E_{p_i}(N)|\rightarrow 0 \qquad (N\rightarrow \infty ). \end{aligned}$$
(25)

This shows that

$$\begin{aligned}&\int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)\left( a(N)\sum _{j=1}^NW_{ij}^N(\phi (p_j)-\phi (p_i))\right) \mathrm {d}s \\&\quad = \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)\left( \Delta _M\phi (p_i)+E_{p_i}(N)\right) \mathrm {d}s\\&\quad = \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)\Delta _M\phi (p_i)\mathrm {d}s +\int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)E_{p_i}(N)\mathrm {d}s\\&\quad = \int _0^{t} \mu _s(\Delta _M\phi )\mathrm {d}s +\int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)E_{p_i}(N)\mathrm {d}s. \end{aligned}$$

Plugging this into (23) and (21), we obtain:

$$\begin{aligned} \mu _t^{N}(\phi )-\mu _0^{N}(\phi ) - \int _0^{t} \mu _s^{N}(\Delta _M\phi )\mathrm {d}s = M_t^{N} + \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}^{N}(p_i)E_{p_i}(N)\mathrm {d}s, \end{aligned}$$
(26)

so for any \(T>0\):

$$\begin{aligned}&\sup _{0\le t\le T}\left| \mu _t^{N}(\phi )-\mu _0^{N}(\phi ) - \int _0^{t} \mu _s^{N}(\Delta _M\phi )\mathrm {d}s \right| \le \sup _{0\le t\le T} \left| M_t^{N}\right| \nonumber \\&\quad + \sup _{0\le t\le T}\left| \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}^{N}(p_i)E_{p_i}(N)\mathrm {d}s\right| . \end{aligned}$$
(27)

We want to show that this expression converges to 0 in probability. We will deal with the terms on the right hand side separately.

The Error Term

First of all

$$\begin{aligned} \left| \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}^{N}(p_i)E_{p_i}(N)\mathrm {d}s\right|\le & {} \int _0^{t} \frac{1}{N} \sum _{i=1}^N|\eta _{s}^{N}(p_i)| |E_{p_i}(N)|\mathrm {d}s \le \int _0^{t} E(N)\mathrm {d}s\\= & {} t E(N), \end{aligned}$$

so

$$\begin{aligned} \sup _{0\le t\le T}\left| \int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}(p_i)E_{p_i}(N)\mathrm {d}s\right| \le TE(N)\rightarrow 0 \quad \text {(by}\,(25)\text {)}. \end{aligned}$$

Convergence of the Martingale to 0

Now for the other term. Since the trajectory \(t\mapsto \mu _t^{N}\) is càdlàg, so is \(M^{N}\). Hence by Doob's inequality we see:

$$\begin{aligned} \mathbb {P}\left( \sup _{0\le t\le T} \left| M_t^{N}\right| >\delta \right) \le \frac{\mathbb {E}|M_T^{N}|}{\delta }. \end{aligned}$$
(28)

To show that \(\mathbb {E}|M_T^{N}|\) goes to 0, it suffices to show that \(\mathbb {E}\left<M^{N},M^{N}\right>_T\) goes to 0 (since then \(\mathbb {E}\left[ (M_T^{N})^2\right] =\mathbb {E}\left<M^{N},M^{N}\right>_T\rightarrow 0\) and hence \(\mathbb {E}|M_T^{N}|\rightarrow 0\)). This is what the following lemma tells us.

Lemma 4.4

For any \(T>0\):

$$\begin{aligned} \lim _{N\rightarrow \infty }\mathbb {E}\left<M^{N},M^{N}\right>_T=0. \end{aligned}$$

Proof

Recall that \(\left<M^{N},M^{N}\right>_T=\int _0^T (L^{N}(f^N)^2-2f^NL^{N}f^N)(\eta _s) \mathrm {d}s\). By writing out, one simply obtains

$$\begin{aligned} (L^{N}(f^N)^2-2f^NL^{N}f^N)(\eta )=\sum _{i,j=1}^N \frac{a(N)}{2} W_{ij}^N (f(\eta ^{ij})-f(\eta ))^2. \end{aligned}$$

Using (22), we see

$$\begin{aligned} (f(\eta ^{ij})-f(\eta ))^2\le \left( \frac{1}{N}(\phi (p_j)-\phi (p_i))(\eta (p_j)-\eta (p_i))\right) ^2 \le \frac{1}{N^2}(\phi (p_j)-\phi (p_i))^2, \end{aligned}$$

since \(\eta (p_i)\in \{0,1\}\) for all i. This shows that

$$\begin{aligned} 0\le & {} \left<M^{N},M^{N}\right>_T=\int _0^T (L^{N}(f^N)^2-2f^NL^{N}f^N)(\eta _s) \mathrm {d}s \\\le & {} \int _0^T \frac{a(N)}{2N^2} \sum _{i,j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i))^2\mathrm {d}s = T \frac{a(N)}{2N^2} \sum _{i,j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i))^2. \end{aligned}$$

This implies that also

$$\begin{aligned} 0\le \mathbb {E}\left<M^{N},M^{N}\right>_T\le T \frac{a(N)}{2N^2} \sum _{i,j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i))^2. \end{aligned}$$
(29)

We can estimate this term by using (25). Some basic manipulations show that

$$\begin{aligned}&\frac{a(N)}{2}\sum _{i,j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i))^2 = - \sum _{i=1}^N \phi (p_i) a(N) \sum _{j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i)) \\&\quad = - \sum _{i=1}^N \phi (p_i) \left( \Delta _M\phi (p_i)+E_{p_i}(N)\right) = - \sum _{i=1}^N \phi (p_i)\Delta _M\phi (p_i) - \sum _{i=1}^N \phi (p_i)E_{p_i}(N), \end{aligned}$$

where the \(E_{p_i}\)’s are as before. This implies that

$$\begin{aligned}&\limsup _{N\rightarrow \infty }\left| \frac{a(N)}{2N^2}\sum _{i,j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i))^2\right| \\&\quad \le \limsup _{N\rightarrow \infty }\left\{ \frac{1}{N^2}\sum _{i=1}^N |\phi (p_i)||\Delta _M\phi (p_i)| +\frac{1}{N^2} \sum _{i=1}^N |\phi (p_i)||E_{p_i}(N)|\right\} \\&\quad \le \limsup _{N\rightarrow \infty }\frac{1}{N} ||\phi ||_\infty ||\Delta _M\phi ||_\infty + \limsup _{N\rightarrow \infty }\frac{1}{N}||\phi ||_\infty E(N)=0, \end{aligned}$$

where in the last step we used (25). So we obtain

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{a(N)}{2N^2} \sum _{i,j=1}^N W_{ij}^N (\phi (p_j)-\phi (p_i))^2 = 0. \end{aligned}$$

Together with (29) this gives the result. \(\square \)

We conclude from the lemma that the right hand side of (28) goes to zero as N goes to infinity, so

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{0\le t\le T} \left| M_t^{N}\right| = 0 \quad \text {in probability.} \end{aligned}$$

Convergence of (27) to 0 in Probability

Combining everything above and using (27), we conclude that

$$\begin{aligned} \lim _{N\rightarrow \infty }\sup _{0\le t\le T}\left| \mu _t^{N}(\phi )-\mu _0^{N}(\phi ) - \int _0^{t} \mu _s^{N}(\Delta _M\phi )\mathrm {d}s \right| =0 \quad \text {in probability.} \end{aligned}$$

In particular, for any \(\delta \ge 0\), define

$$\begin{aligned} H^\delta =\left\{ \alpha \in D: \sup _{0\le t< T}\left| \alpha _t(\phi )-\alpha _0(\phi ) - \int _0^{t} \alpha _s(\Delta _M\phi )\mathrm {d}s \right| \le \delta \right\} . \end{aligned}$$

It can be shown, as in (Seppäläinen [17], Chap. 8), that \(H^\delta \) is closed for any \(\delta >0\). Recall from page 26 that we write the distribution of \(t\mapsto \mu _t^{N}\) as \(Q^{N}\). Then the convergence result above implies that for any \(\delta >0\):

$$\begin{aligned} \lim _{N\rightarrow \infty }Q^{N}(H^\delta ) = 1. \end{aligned}$$

Tightness of \((Q^N)_{N=1}^\infty \)

We will need that the sequence of distributions \((Q^N)_{N=1}^\infty \) is tight. This can be shown in exactly the same way as in (Kipnis and Landim [13], p. 55-56). In fact, all the crucial calculations have already been performed above.

Lemma 4.5

The sequence of distributions \((Q^N)_{N=1}^\infty \) is tight.

Proof

It needs to be shown that the two conditions of (Kipnis and Landim [13], Chap. 4, Thm. 1.3) are satisfied. Note that for any continuous f we can map a path \(\nu \in D([0,T],R(M))\) to the path in \(D([0,T],\mathbb {R})\) given by \(t\mapsto \nu _t(f)\). This induces a sequence of distributions \(Q^Nf^{-1}\) on \(D([0,T],\mathbb {R})\). By (Kipnis and Landim [13], Chap. 4, Prop. 1.7) and the fact that the smooth functions are uniformly dense in the set of continuous functions on a manifold, it suffices to prove the conditions of (Kipnis and Landim [13], Chap. 4, Thm. 1.3) for \(\{Q^Nf^{-1}: N\ge 1\}\) for all smooth f. Fix such an f. Since each path stays in the set of sub-probability measures, the first condition is easily satisfied. For the second condition, it suffices to prove Aldous' tightness criterion, i.e. that

$$\begin{aligned} \lim _{\gamma \rightarrow 0} \limsup _{N\rightarrow \infty }\sup _{\tau \in \mathcal {I}_T, \theta \le \gamma } Q^Nf^{-1} \left[ \left| \mu ^N_\tau (f)-\mu ^N_{\tau +\theta }(f)\right| >\epsilon \right] =0, \end{aligned}$$
(30)

where \(\mathcal {I}_T\) denotes the set of all stopping times bounded by T. We know from equation (26) that there exists a martingale M (depending on f) such that

$$\begin{aligned} \mu _t^{N}(f)-\mu _0^{N}(f) - \underbrace{\int _0^{t} \mu _s^{N}(\Delta _M f)\mathrm {d}s}_\text {(I)} = \underbrace{M_t^{N}}_\text {(II)} + \underbrace{\int _0^{t} \frac{1}{N} \sum _{i=1}^N\eta _{s}^{N}(p_i)E_{p_i}(N)\mathrm {d}s}_\text {(III)}. \end{aligned}$$

It therefore suffices to check the tightness criterion for the two terms (II) and (III) on the right-hand side of this equation and for the integral (I) on the left-hand side (the only other term, \(\mu _0^{N}(f)\), is constant in time). We now make the following estimates.

(I). First of all, since \(\mu _s^N\) is a sub-probability measure and \(\Delta _M f\) is bounded:

$$\begin{aligned} \left| \int _0^{\tau +\theta } \mu _s^{N}(\Delta _M f)\mathrm {d}s - \int _0^{\tau } \mu _s^{N}(\Delta _M f)\mathrm {d}s\right| \le \theta ||\Delta _M f||_\infty . \end{aligned}$$

This implies that

$$\begin{aligned}&\sup _{\tau \in \mathcal {I}_T, \theta \le \gamma } Q^Nf^{-1} \left[ \left| \int _0^{\tau +\theta } \mu _s^{N}(\Delta _M f)\mathrm {d}s - \int _0^{\tau } \mu _s^{N}(\Delta _M f)\mathrm {d}s\right|>\epsilon \right] \\&\quad \le Q^Nf^{-1} \left[ \sup _{\tau \in \mathcal {I}_T, \theta \le \gamma } \left| \int _0^{\tau +\theta } \mu _s^{N}(\Delta _M f)\mathrm {d}s - \int _0^{\tau } \mu _s^{N}(\Delta _M f)\mathrm {d}s\right|>\epsilon \right] \\&\quad \le Q^Nf^{-1} \left[ \sup _{\tau \in \mathcal {I}_T, \theta \le \gamma } \theta ||\Delta _M f||_\infty>\epsilon \right] \le Q^Nf^{-1} \left[ \gamma ||\Delta _M f||_\infty>\epsilon \right] = \mathbb {1}_{\gamma ||\Delta _M f||_\infty >\epsilon }. \end{aligned}$$

This implies that the limit in (30) is at most

$$\begin{aligned} \lim _{\gamma \rightarrow 0} \limsup _{N\rightarrow \infty } \mathbb {1}_{\gamma ||\Delta _M f||_\infty>\epsilon } = \lim _{\gamma \rightarrow 0} \mathbb {1}_{\gamma ||\Delta _M f||_\infty >\epsilon } = 0, \end{aligned}$$

so (I) satisfies the tightness criterion.

(III). Further, the calculations above show that

$$\begin{aligned} \left| \int _0^{\tau +\theta } \frac{1}{N} \sum _{i=1}^N\eta _{s}^{N}(p_i)E_{p_i}(N)\mathrm {d}s - \int _0^{\tau } \frac{1}{N} \sum _{i=1}^N\eta _{s}^{N}(p_i)E_{p_i}(N)\mathrm {d}s\right| \le \theta E(N)\le \theta K. \end{aligned}$$

Here K is a positive constant, which exists because of (25). This term satisfies (30) in the same way as the previous one.

(II). Finally, for the martingale term we first estimate \(\mathbb {E}\left[ (M^N_{\tau +\theta }-M^N_{\tau })^2\right] \) (as is done in (Kipnis and Landim [13], p. 56)). Naturally, the expectation is taken with respect to \(Q^Nf^{-1}\). Because of the martingale property:

$$\begin{aligned} 0\le \mathbb {E}\left[ (M^N_{\tau +\theta }-M^N_{\tau })^2\right] = \mathbb {E}(M^N_{\tau +\theta })^2-\mathbb {E}(M^N_{\tau })^2 = \mathbb {E}\left<M^N,M^N\right>_{\tau +\theta }-\mathbb {E}\left<M^N,M^N\right>_{\tau }. \end{aligned}$$
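
The middle equality is the orthogonality of martingale increments. Written out, using optional stopping for the bounded stopping time \(\tau \) (so that \(\mathbb {E}\left[ M^N_{\tau +\theta }\mid \mathcal {F}_{\tau }\right] =M^N_{\tau }\)):

$$\begin{aligned} \mathbb {E}\left[ (M^N_{\tau +\theta }-M^N_{\tau })^2\right]&= \mathbb {E}(M^N_{\tau +\theta })^2 - 2\,\mathbb {E}\left[ M^N_{\tau }\,\mathbb {E}\left[ M^N_{\tau +\theta }\mid \mathcal {F}_{\tau }\right] \right] + \mathbb {E}(M^N_{\tau })^2 = \mathbb {E}(M^N_{\tau +\theta })^2 - \mathbb {E}(M^N_{\tau })^2. \end{aligned}$$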

We see from the calculations in the proof of Lemma 4.4 that

$$\begin{aligned} \mathbb {E}\left<M^N,M^N\right>_{\tau +\theta }-\mathbb {E}\left<M^N,M^N\right>_{\tau } \le \theta \frac{a(N)}{2N^2} \sum _{i,j=1}^N W_{ij}^N (f(p_j)-f(p_i))^2. \end{aligned}$$

Since the factor multiplying \(\theta \) converges to 0, it is in particular bounded in N by some constant \(\alpha \). By Chebyshev’s inequality we obtain:

$$\begin{aligned} Q^Nf^{-1} \left( |M^N_{\tau +\theta }-M^N_{\tau }|>\epsilon \right) \le \frac{\mathbb {E}\left[ (M^N_{\tau +\theta }-M^N_{\tau })^2\right] }{\epsilon ^2} \le \frac{\theta \alpha }{\epsilon ^2}. \end{aligned}$$

Since

$$\begin{aligned} \lim _{\gamma \rightarrow 0} \limsup _{N\rightarrow \infty }\sup _{\tau \in \mathcal {I}_T, \theta \le \gamma } \frac{\theta \alpha }{\epsilon ^2} = \lim _{\gamma \rightarrow 0} \limsup _{N\rightarrow \infty } \frac{\gamma \alpha }{\epsilon ^2} = \lim _{\gamma \rightarrow 0} \frac{\gamma \alpha }{\epsilon ^2} = 0, \end{aligned}$$

this term satisfies (30) as well, which completes the proof. \(\square \)

Limit Distribution

We have just shown that \((Q^N)_{N=1}^\infty \) is a tight sequence of measures on D. This implies that every one of its subsequences is also tight and therefore has a weakly convergent subsequence. If these all have the same limit, then it follows from a basic result in metric spaces that the sequence itself converges weakly to that limit. It therefore suffices for weak convergence of \((Q^N)_{N=1}^\infty \) to show that every weakly convergent subsequence of \((Q^N)_{N=1}^\infty \) has the same limit. Let \((Q^{N_k})_{k=1}^\infty \) be any weakly convergent subsequence and denote its limit by Q. Since \(H^\delta \) is closed, the portmanteau theorem gives for any \(\delta >0\) that

$$\begin{aligned} Q(H^\delta )\ge \limsup _{k\rightarrow \infty } Q^{N_k}(H^\delta )=1, \end{aligned}$$

so \(Q(H^\delta )=1\). Since this holds for any \(\delta >0\), we see

$$\begin{aligned} Q(H^0)=Q\left( \bigcap _{m=1}^\infty H^\frac{1}{m}\right) =1-Q\left( \bigcup _{m=1}^\infty (H^\frac{1}{m})^C\right) \ge 1- \sum _{m=1}^\infty Q\left( \left( H^\frac{1}{m}\right) ^C\right) =1. \end{aligned}$$

This means that

$$\begin{aligned} Q\left( \alpha \in D: \sup _{0\le t< T}\left| \alpha _t(\phi )-\alpha _0(\phi ) - \int _0^{t} \alpha _s(\Delta _M\phi )\mathrm {d}s \right| =0 \right) =1. \end{aligned}$$

By doing this for a countable set of functions \(\phi \) that is dense in \(C^\infty \) with respect to the norm \(||\cdot ||_\infty +||\Delta _M\cdot ||_\infty \), and arguing that this implies the same statement for every smooth function, we see:

$$\begin{aligned} Q\left( \alpha \in D: \sup _{0\le t< T}\left| \alpha _t(\phi )-\alpha _0(\phi ) - \int _0^{t} \alpha _s(\Delta _M\phi )\mathrm {d}s \right| =0 \,\,\forall \phi \in C^\infty \right) =1. \end{aligned}$$
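
The extension from the countable dense set to all of \(C^\infty \) uses only that every \(\alpha _s\) is a sub-probability measure: if \(\phi _k\rightarrow \phi \) with respect to \(||\cdot ||_\infty +||\Delta _M\cdot ||_\infty \), then for all \(t<T\)

$$\begin{aligned}&\left| \alpha _t(\phi )-\alpha _0(\phi ) - \int _0^{t} \alpha _s(\Delta _M\phi )\mathrm {d}s\right| \\&\quad \le \left| \alpha _t(\phi _k)-\alpha _0(\phi _k) - \int _0^{t} \alpha _s(\Delta _M\phi _k)\mathrm {d}s\right| + (2+T)\left( ||\phi -\phi _k||_\infty +||\Delta _M(\phi -\phi _k)||_\infty \right) , \end{aligned}$$

so the supremum vanishes for \(\phi \) as soon as it vanishes for every \(\phi _k\).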

Since this holds for any \(T>0\), we see that \(Q\)-a.s. for every \(t\ge 0\) and for all smooth \(\phi \):

$$\begin{aligned} \alpha _t(\phi )-\alpha _0(\phi ) = \int _0^{t} \alpha _s(\Delta _M\phi )\mathrm {d}s. \end{aligned}$$
(31)

Note that (31) is a weak, measure-valued formulation of the heat equation. We will shortly argue that this equation, together with the initial condition, uniquely determines the trajectory \(t\mapsto \alpha _t\).

Continuity

To obtain uniqueness, we first need to know that the trajectory is continuous. For the \(\mathbb {R}^n\) case this is shown in  (Seppäläinen [17], Lemma 8.6). The result can be shown in exactly the same way in our case, so we will not provide all the details. The topology on the space of measures is generated by the following metric:

$$\begin{aligned} d_M(\mu ,\nu ) = \sum _{j=1}^\infty 2^{-j} \left( 1\wedge \left| \mu (\phi _j)-\nu (\phi _j)\right| \right) , \end{aligned}$$

for a suitable countable family of functions \(\phi _j\in C^\infty (M)\). It suffices to control

$$\begin{aligned} \sup _{t\ge 0} \mathrm {e}^{-t}d_M(\mu ^N_t,\mu ^N_{t-}). \end{aligned}$$

Doing that can be reduced to showing that for any \(T>0\) and \(\psi \in C^\infty (M)\):

$$\begin{aligned} \lim _{\delta \rightarrow 0}\limsup _{N\rightarrow \infty } \mathbb {E}\left[ \sup _{0\le s,t\le T, |s-t|<\delta } \left| \mu ^N_s(\psi )-\mu ^N_t(\psi )\right| ^2\right] = 0. \end{aligned}$$

This can be done by using the Dynkin martingale representation (26) and bounding all the differences as in the proof of tightness. The only term that needs some attention is \((M^N_t-M^N_s)^2\), but it can be controlled using Doob’s maximal inequality:

$$\begin{aligned} \mathbb {E}\left[ \sup _{0\le s,t\le T, |s-t|<\delta } (M^N_t-M^N_s)^2\right]&\le \mathbb {E}\left[ \sup _{0\le t\le T} 4 (M^N_t)^2\right] \\&\le 16\, \mathbb {E}(M^N_T)^2 = 16\, \mathbb {E}\left<M^N,M^N\right>_T, \end{aligned}$$

which goes to zero according to Lemma 4.4.
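
Here the first inequality uses the elementary bound

$$\begin{aligned} (M^N_t-M^N_s)^2\le 2(M^N_t)^2+2(M^N_s)^2\le 4\sup _{0\le u\le T}(M^N_u)^2, \end{aligned}$$

and the second is Doob’s \(L^2\) maximal inequality, \(\mathbb {E}\left[ \sup _{0\le t\le T}(M^N_t)^2\right] \le 4\,\mathbb {E}(M^N_T)^2\); finally \(\mathbb {E}(M^N_T)^2=\mathbb {E}\left<M^N,M^N\right>_T\) because \(M^N_0=0\).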

Uniqueness

To obtain uniqueness of limits of subsequences of \(Q^N\), we need to know that there is a unique continuous solution to (31) with initial condition \(\rho _0\mathrm {d}{\bar{V}}\). We know that \(t\mapsto \rho _t\mathrm {d}{\bar{V}}\) is a continuous solution to (31) with the right initial condition if \(t\mapsto \rho _t\) satisfies the heat equation with initial condition \(\rho _0\). Therefore it suffices to show that this solution is unique. This result is proven under a boundedness condition in (Seppäläinen [17], Thm. A.28). The main idea of the proof is that the measure-valued path \(\alpha _t\) is smoothed by taking its convolution with a smooth kernel of bandwidth \(\epsilon >0\). It is then shown that this trajectory of functions satisfies the heat equation with initial condition \(\rho _0\) in the strong sense (by interchanging integral and derivatives and using that these identities are known for sufficiently many \(\phi \)), so it must equal \(t\mapsto \rho _t\). By letting \(\epsilon \) go to zero, it follows that the original trajectory \(t\mapsto \alpha _t\) must equal \(t\mapsto \rho _t\mathrm {d}\lambda \), where \(\lambda \) is the Lebesgue measure.

To obtain the analogous result in our setting, we cannot use convolution, since it is not well defined on a general manifold. However, we can smooth the measures by integrating the heat kernel at time \(\epsilon \) against them. With this smoothing we can follow exactly the same approach, i.e. show that the smoothed trajectory satisfies the heat equation in a strong sense and then let \(\epsilon \) go to 0. The boundedness condition in Seppäläinen [17] is a bound on volumes, which is needed there for some estimates and for the uniqueness of the strong solution to the heat equation. Since we work in a compact setting and with probability measures, such a bound is not necessary here. The uniqueness of the strong solution to the heat equation is a standard result in our case (that is, for a compact and connected Riemannian manifold); see for instance (Grigoryan [11], Thm. 8.18). Further results on the heat kernel on a manifold can be found in Grigoryan [11].
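
To indicate the main computation, here is a sketch of the smoothing step. Write \(p_\epsilon (x,y)\) for the heat kernel of M at time \(\epsilon >0\), which is smooth and symmetric, and set

$$\begin{aligned} \alpha ^\epsilon _t(x) := \int _M p_\epsilon (x,y)\,\alpha _t(\mathrm {d}y). \end{aligned}$$

Applying (31) with \(\phi =p_\epsilon (x,\cdot )\in C^\infty (M)\) and using that \(\Delta _{M,y}\,p_\epsilon (x,y)=\Delta _{M,x}\,p_\epsilon (x,y)\) by the symmetry of the heat kernel, we obtain

$$\begin{aligned} \alpha ^\epsilon _t(x)-\alpha ^\epsilon _0(x) = \int _0^t\int _M \Delta _{M,y}\,p_\epsilon (x,y)\,\alpha _s(\mathrm {d}y)\,\mathrm {d}s = \int _0^t \Delta _{M,x}\,\alpha ^\epsilon _s(x)\,\mathrm {d}s, \end{aligned}$$

so the smoothed trajectory solves the heat equation in the strong sense; the interchange of \(\Delta _{M,x}\) and the integral over M is justified by the compactness of M and the smoothness of \(p_\epsilon \).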

Conclusion

Now let \(t\mapsto \rho _t\) be the solution to the heat equation on M with initial condition \(\rho _0\) and set \(\beta :=(t\mapsto \rho _t\mathrm {d}{\bar{V}})\). Recall that (31) holds \(Q\)-a.s. By the uniqueness result above, this implies that Q is the Dirac measure concentrated on \(\beta \). Since this limit does not depend on the chosen subsequence \((Q^{N_k})\), every convergent subsequence has the same limit, so by the argument given above we conclude that \(Q^N\rightarrow Q\) weakly. Let \(\gamma ^N\) denote the random trajectory \(t\mapsto \mu ^{N}_t\). Since Q is degenerate, the weak convergence implies convergence in probability, so \(\gamma ^N\rightarrow \beta \) in probability. This is what we wanted to show.