Hydrodynamic Limit of the Symmetric Exclusion Process on a Compact Riemannian Manifold

We consider the symmetric exclusion process on suitable random grids that approximate a compact Riemannian manifold. We prove that a class of random walks on these random grids converge to Brownian motion on the manifold. We then consider the empirical density field of the symmetric exclusion process and prove that it converges to the solution of the heat equation on the manifold.


Introduction
Hydrodynamic limits of interacting particle systems is a well established subject. A large variety of parabolic equations (such as the non-linear heat equation) and hyperbolic conservation laws have been obtained from microscopic stochastic particle systems; see DeMasi and Presutti [7], Kipnis and Landim [13], Seppäläinen [17] for overviews. Usually, the setting here is that in the underlying particle system the particles move on the lattice Z d , and after rescaling the limiting partial differential equation is defined on R d , or on a subdomain of R d such as an interval, where then equations with boundary conditions on the ends of the interval are derived (e.g. Dirichlet boundary conditions for the case where at the right and left end the system is coupled to a reservoir fixing the density of particles, see Gonçalves [10]). Motivated e.g. by the study of the motion of proteins in a cell-membrane, or more general motion of particles on curved interfaces, it is clear that there are many relevant physical systems of which the macroscopic motion takes place on a Riemannian manifold rather than on Euclidean space. It is the aim of this paper to provide first steps in this direction, by con-Communicated by Hal Tasaki. sidering the simplest interacting particle system on a suitable discretization of a Riemannian manifold and proving its hydrodynamic limit. The symmetric exclusion process is a wellknown and well-studied interacting particle system for which in standard setting it is rather straightforward to obtain the hydrodynamic limit using duality. Duality allows to translate the one-particle scaling limit, i.e., the fact that the rescaled single particle position converges to Brownian motion to the fact that the hydrodynamic limit of the particle system is the diffusion equation. Another manifestation of duality is the fact that the microscopic equation for the expectation of the density field is already a closed equation. We consider the symmetric exclusion process on a suitable discretization (a notion defined more precisely below) of a compact Riemannian manifold and prove that its empirical density field, after appropriate rescaling, converges to the solution of the heat equation on the manifold. To obtain this result, we start in Sect. 2 by studying the invariance principle of a class of geodesic random walks, thereby extending earlier results of Jørgensen [12]. These random walks are shown to converge to Brownian motion, via the technique of generator convergence. Next, in Sect. 3, we define a notion of "uniformly approximating grids" and show that choosing uniformly N points on the manifold, and connecting them via a kernel depending on the Riemannian distance yields a weighted graph such that the corresponding random walk converges (as the number of random points tends to infinity) to a geodesic random walk which in turn scales to Brownian motion. We also formulate abstract conditions on approximating grids ensuring the convergence of the weighted random walk to Brownian motion. In particular, convergence of the empirical distribution to the normalized Riemannian volume in Kantorovic distance is shown to be sufficient, i.e. we show that in that setting weights can be chosen such that the corresponding random walk converges to Brownian motion. We give several examples of such suitable grids. Finally, in Sect. 4, we define the exclusion process on such suitable grids (defined in Sect. 3) and show that its empirical density converges to the solution of the heat equation, following the proof from Seppäläinen [17].

The Invariance Principle for a Class of Geodesic Random Walks
Let M be an n-dimensional, compact and connected Riemannian manifold. Then we know that M is complete and hence geodesically complete. The main purpose of this section is to define the geodesic random walk and to show that it approximates Brownian motion when appropriately rescaled (in time and space). Such random walks and this so-called invariance principle have been studied before (Jørgensen [12] and in a special case Blum [4]). However we will directly obtain results that are tailor-made to apply them in Sect. 3. In particular, we will obtain general assumptions on the jumping distributions of the geodesic random walk for it to converge to Brownian motion. In Sect. 2.1, we define the geodesic random walk and show convergence of the generators to the generator of Brownian motion under certain assumptions on the jumping distributions. Section 2.2 is devoted to finding out which distributions satisfy these assumptions.

Convergence of the Generators
The process Let {μ p , p ∈ M} be a collection of positive, finite measures where each μ p is a measure on T p M. The measure μ p represents the rate to jump in a particular direction of T p M. More where for a vector ξ ∈ T p M we denote the geodesic through p with tangent vector ξ at p by p(·, ξ). We denote the corresponding semigroup by Both of these have the continuous functions on the manifold C(M) as their domain. We interpret this process as follows. When the process X N is at a point p, it chooses a random direction η from T p M with rates given by μ p (i.e. it waits for an exponential time with rate μ p (T p M) and then independently picks a vector according to the probability distribution μ p μ p (T p M) ). Then the process jumps to the position p(1/N , η) that is reached by following the geodesic through p in the direction of η for time 1 N . This situation is sketched in Fig. 1. We assume that choosing random directions happens independently. In this section we will specify restrictions that the measures μ p should satisfy. Later (in Sect. 2.2), we will show that we can take μ p to be for instance the uniform distribution on the unit tangent vectors at p.
The R R R n case Before we go into the general case, we illustrate the above in R n . In R n the exponential map is simply addition if we identify T p R n with R n itself. So in that case from a point p the process moves to p(1/N , η) = p + 1 N η where η is chosen from T p R n = R n randomly. This means that the discrete time jumping process when jumping as described above, can be denoted by where η j is drawn from T S j−1 R n = R n according to some distribution. Now let {N t , t ≥ 0} be a Poisson process with rate one and define X N t = S N t . Then X makes the same jumps as S, but after independent exponential times. We see that X N = {X N t , t ≥ 0} satisfies the description above. Now the invariance principle tells us that under some conditions on the jumping rates X N t N 2 → B t in distribution as N goes to infinity, where B is Brownian motion. We show the analogous result in the more general setting of a manifold.

Aim
We denote the Laplace-Beltrami operator on the manifold by M . The rest of this section will be devoted to the proof of the following result.
Proposition 2.1 Suppose that in the situation above we have: • sup p∈M sup η∈suppμ p ||η|| < ∞ • sup p∈M μ p (T p M) < ∞ • η i μ p (dη) = 0 and η i η j μ p (dη) = g i j ( p) in each coordinate system around p The first assumption requires that the supports of the measures and their total masses are bounded uniformly over all points of the manifold. We will loosely say that the measures are uniformly compactly supported and uniformly finite. Since C ∞ (M) is a core for 1 2 M [20], the Trotter-Kurtz theorem (see Kurtz [14]) implies the following corollary. Note that if we denote the random variable corresponding to μ p by ζ p , the second requirement of Proposition 2.1 is that (in any coordinate system) Eζ i p = 0 and Cov(ζ i p , ζ j p ) = g i j ( p). This shows that the mean vector m of ζ p satisfies m = 0 and the covariance matrix satisfies = (g i j )( p). In R n , this simplifies to Eζ i p = 0 and Cov(ζ i p , ζ j p ) = δ i j . This is satisfied for instance when μ p is the uniform distribution on the sphere with radius √ N in R n . Section 2.2 deals with the question which measures satisfy the restrictions above. Some examples will be given at the end of that section as well.  Fig. 2). Because of the lemma, it suffices to show that

Technical Considerations
To obtain good estimations later, we will need that p(s, η) is still in our coordinate system (x, U ) and even in the set G when |s| ≤ 1 N for N large enough. Since the convergence must be uniform, how large N must be can not depend on the point p. The following lemma tells us how to choose such N . Lemma 2.5 Call K = sup p∈M sup η∈suppμ p ||η|| < ∞ (by assumption). Choose N ∈ N such that 1 N < 2K . Then for all p ∈ O and N ≥ N we see Proof Let N ≥ N and let p ∈ O. The situation of the proof is visually represented in Fig. 2. Fix s ∈ (− 1 N , 1 N ). Without loss of generality assume s > 0. Note that the speed of the geodesic p(·, η) equals ||η||, so at time s, it has traveled a distance s||η|| from p. This means that there is a path of length s||η|| from p(s, η) to p, Fix N as in the lemma and take N larger than N .

Taylor Expansion
Now fix p ∈ O and η ∈ T p M. Write p η for the map R → M that takes t to p(t, η). We can which is a composition of smooth maps. This means that f • p η is just a smooth map R → R, so we can use a Taylor expansion and obtain where t N ,η, p ∈ (0, 1/N ) is a number depending on N , η and p. This gives us We will examine these terms separately.

The First Term
Recall that p ∈ O and that O is contained in a coordinate chart (x, U ). Since N ≥ N , Lemma 2.5 guarantees us that p(s, η) stays in the coordinate chart for |s| < 1 Now setting s = 0, this becomes: since p η (0) = p(0, η) = p and the tangent vector to the geodesic p(·, η) at 0 is η (so the i th coordinate with respect x is just η i ). Now the first term of (1) becomes: By assumption these integrals are 0. This shows that the first term of (1) vanishes.

The Second Term
Now we want to show that the remaining term equals 1 2 M f ( p). Similarly to above we see for |s| < 1 N (leaving out the arguments to keep things clear): Since p η is a geodesic, we know that it satisfies the geodesic equations. This shows that for each i = 1, . . . , n we have Using this yields the following expression for the second derivative: Using linearity of the integral, we obtain the following expression for the second term of (1): Note that we also changed the order of the derivatives of f , this can be done since f is smooth. Now we want the term above to equal This is true, since we required that for any coordinate chart around p and for all i, j:

The Rest Term
If the last term goes to 0 uniformly on O, we have the result. Let N still be larger then N .
This means that the above is smaller than: Because of the 1/N in front of the equation, we only need to know that the rest is uniformly bounded to obtain uniform convergence. It thus suffices to show that d 3 (t) is bounded as a function of η with ||η|| < K and t ∈ [0, 1/N ]. Lemma 2.5 shows that p(t, η) stays in G for all such η and t. We will use this fact multiple times. We first express d 3 in local coordinates for |t| ≤ 1/N .
To make notation more compact, we introduce the following notation (and f i , f i jk analogously): Combining this with Einstein summation, we can write (2) as . Now, as before, we can deal with second derivatives of geodesics using the geodesic equations: We can also calculate the third derivative: This shows us that d 3 is a combination of products and sums of the following types of expressions: f i , f i j , f i jk , p i 1 , i rs and d dt i rs . If we can bound all of these on the right domains (independent of p and η), we are done.
Bounding f i , f i j and f i jk First of all, note that f is a smooth function on U . Further, ∂ i defines smooth vector field on U . Since f i = ∂ f ∂ x i is obtained by applying ∂ i on U to f , it is a smooth function on U .
Continuing in this way, we see that f i j and f i jk are also smooth functions on U . In particular, they are smooth functions on G (since it is a subset of U ). G is a closed subset of the compact M and is hence compact itself. This implies that f i , f i j and f i jk are (for each choice of i, j, k) bounded on G. Since we evaluate these functions in the points p(s, η) for 0 ≤ s ≤ 1/N , N ≥ N and ||μ|| ≤ K , our discussion above shows that we only evaluate them in points of G. This means that we have found bounds for f i , f i j and f i jk .
Bounding p i 1 We start with a technical lemma.

Lemma 2.6
Let q ∈ M and let (y, V ) be a coordinate chart around q. Let v ∈ T q M and Proof Fix some 1 ≤ i ≤ n. We see in the tangent space at q: Using the relations above and the Cauchy-Schwarz inequality, we obtain: Now we can use this to show the following.
Proof The first equation is just a change of notation. Further we see This means that d(x i • p η ) dt is just the i th coordinate with respect to (x, U ) of the tangent vector to p η at time t so at the point p(t, η) ∈ M. Using Lemma 2.6, we see Since p η is a geodesic, it has constant speed. Its speed at p is ||η||, so this must be its speed anywhere else along the trajectory. Hence || d p η dt || = ||η||. Inserting this in (3) yields the result.
We can now easily obtain a bound for p i 1 . For 0 ≤ t ≤ 1/N and ||η|| ≤ K , we know p(t, η) stays in G. g ii is a smooth and hence continous function on U , so it is bounded on G (since G is compact). This means that g ii ( p(t, η)) is bounded by some K i for ||η|| ≤ K and with notation as above. Since i rs is smooth function U → R, this expression can be bounded in exactly the same way as expressions like f j p j 1 above.

Constraints for a Stepping Distribution
The question now is which distributions μ p on T p M satisfy the assumptions of Proposition 2.1. From here on we fix p ∈ M and simply write μ for μ p . Being compactly supported and finite are rather natural constraints, but the other assumptions are harder, especially since they involve local coordinates. In this section we address the question which distributions satisfy the other assumptions, i.e. for every coordinate system around p: To generalize this a bit, suppose μ satisfies the following for some c > 0 for every coordinate system: Following the proof in the previous section, one sees directly that in this case the generators converge to the generator of Brownian motion that is speeded up by a factor c. We will look into this generalized situation and at the end we will see how to determine c.

Independence of (5) of Coordinate Systems
The following lemma shows that if (5) holds for a single coordinate system, it holds for any coordinate system. (5) holds for some c > 0 and for some coordinate system (x, U ) around p, then it holds for the same c for all coordinate systems around p.

Lemma 2.8 If
Proof Let (x, U ) be a coordinate system around p for which (5) holds with c > 0 and let (y, V ) be any other coordinate system around p. It suffices to show that (5) holds with the same c for y. Denote the metric matrix with respect to x by g and the one with respect to y bŷ g. For any η ∈ T p M define η 1 , . . . , η n as the coefficients of η with respect to x, so such that η = i η i ∂ ∂ x i . Analogously letη 1 , . . . ,η n be such that η = iη i ∂ ∂ y i . Let J = ∂(x 1 ,...,x n ) ∂(y 1 ,...,y n ) . If η ∈ T p M, thenη This shows that for any j since for any i: η i μ(dη) = 0. Moreover, for any i, j: η i η j μ(dη) = cg i j , so for any i, j: We conclude that (5) holds for y with the same c.

Orthogonal Transformations and Canonical Measures
We now introduce a class of measures.

Definition 2.9
Let V be an inner product space and let T be a linear map We call a measure μ on T p M canonical if for any orthogonal transformation T on T p M and for any coordinate system:

Remark 2.10
In the same way as above, one can show that μ has the property above with respect to some coordinate system if and only if it has the property with respect to every coordinate system. Moreover, since −I always satisfies (−I ) T G(−I ) = G, we see that In words, μ is canonical if orthogonal transformations do not change the mean vector and the covariance matrix of a random variable that has distribution μ. Remark 2.10 shows that in fact the mean vector must be 0. Note that in particular measures that are invariant under orthogonal transformations are canonical, since then and the other equation follows analogously. However a simple example shows that the converse is not true. Let M = R and let μ be any non-symmetric distribution on T p M = R with mean 0. The only orthogonal transformation (apart from the identity) is t → −t. Under this transformation the mean (which is 0) and the second moment are obviously left invariant, but μ is not symmetric, so it is not invariant. We will give an example for R n later.
If (x, U ) is some coordinate system around p and G = (g i j ) is the matrix of the metric in p with respect to x, we can write a linear transformation T : T p M → T p M as a matrix (which we will also call T ) with respect to the base ∂ ∂ x 1 , . . . , ∂ ∂ x n . We see that If T is orthogonal, this must equal so we see that g kl = i, j g i j T ik T jl = (T T GT ) kl and hence G = T T GT . Now for a measure μ on T p M and a coordinate system (x, U ), define the vector A μ and the matrix B μ by A i μ = η i μ(dη) and B i j Then we have the following.

Lemma 2.11
Let μ be a measure on T p M. Then the following are equivalent.
(ii) For every linear transformation T and every coordinate system is just the definition of being canonical written in local coordinates. Indeed, we already saw that orthogonality or T translates in local coordinates to G = T T GT , the other expressions follow in a similar way from the following equations:

Canonical Measures are Stepping Distributions
Now we have the following result.

Proposition 2.12 Let μ be a probability measure on T p M. Then μ is canonical if and only
if it satisfies (5) for some c > 0.
Proof First assume that μ is canonical and let (x, U ) be normal coordinates centered at p. Because of Lemma 2.8 it suffices to verify (5) for x, so we need to show that A μ = 0 and B μ = cG −1 = cI for some c > 0. The fact that A μ = 0 is just Remark 2.10. Now note that since B μ is symmetric, it can be diagonalized as T B μ T −1 where T is an orthogonal matrix (in the usual sense). This means that T T = T −1 and that T T GT = T T I T = T T T = I = G, so Lemma 2.11 tells us that the diagonalization equals T B μ T T = B μ . This implies that B μ is a diagonal matrix. Now for i = j letĪ i j be the n × n-identity matrix with the i th and j th column exchanged. It is easy to see that The latter is B μ with the i th and j th diagonal element exchanged. This shows that these elements must be equal. Hence all diagonal elements are equal and B μ = cI for some c ∈ R. Since Conversely let (x, U ) be a coordinate system with corresponding metric matrix G and assume that μ satisfies (5) for some c > 0. Let T be such that G = T T GT . Then A μ = 0 = T 0 = T A μ . We also see: Now we know that if the stepping distribution is canonical (and finite and compactly supported, uniformly on M), the generators converge to the generator of Brownian motion that is speeded up by some factor c > 0 (depending on μ). The question remains what this c is. The following lemma answers this question.
Proof We calculate the following (with respect to some coordinate system (x, U )): The nice part of this lemma is that the expression for c does not involve a coordinate system, only the norm (and hence inner product) of T p M. In particular we see that c = 1 is equivalent to ||η|| 2 μ(dη) = n. We summarize our findings in the following result.

Remark 2.15
Note that all we need of the jumping distributions is that their mean is 0, their covariance matrix is invariant under orthogonal transformations, they are (uniformly) compactly supported and they are (uniformly) finite. We don't need the measures to be similar in any other way, so we do not at all require the jumps to have identical distributions in the sense of Jørgensen [12]. Examples 1. To satisfy (4) for every coordinate system, by Lemma 2.8 it suffices to choose a coordinate system and construct a distribution that satisfies (4) for that coordinate system. Let (x, U ) be any coordinate system around some point in M with corresponding metric matrix G in that point. Let X be any random variable in R n that has mean vector 0 and covariance matrix G −1 (for instance let X ∼ N (0, G −1 )). Now let μ be the distribution of 2. In the previous Example (4) is immediate. Let us now consider an example that illustrates the use of Proposition 2.14. Let μ p be the uniform distribution on √ nS p M (the vectors with norm √ n). By definition of such a distribution, it is invariant under orthogonal transformations (rotations and reflections), so it is a canonical distribution. Since also ||η|| 2 μ(dη) = √ n 2 μ(dη) = n, we conclude that the uniform distribution on √ nS p M satisfies (4). Moreover, sup p∈M sup η∈suppμ p ||η|| = √ n < ∞ and sup p∈M μ p (T p M) = 1 < ∞. Together this shows that the μ p 's satisfy the assumption of proposition 2.1. 3. Let us conclude by showing for R n that the class of canonical distributions is strictly larger than the class of distributions that are invariant under orthogonal transformations, even with the restriction that ||η|| 2 μ(dη) = n. It suffices to find a distribution μ with mean 0 and covariance matrix I (since then μ satisfies (4) and 2.14 then tells us that μ is canonical and has ||η|| 2 μ(dη) = n) and an orthogonal T such that μ = μ • T −1 . Let ν be the distribution on R given by ν = 1 5 δ −2 + 4 5 δ 1/2 . Then, using the natural coordinate system, tν(dt) = 1 5 (−2) + 4 5 1 2 = 0 and t 2 μ(dt) = 1 5 (−2) 2 + 4 5 ( 1 2 ) 2 = 1. Now let μ = ν × · · · × ν (n times). Then we directly see that the mean vector is 0 and the covariance matrix is I . However T = −I is an orthogonal transformation and μ•(−I ) −1 equals the product of n times 1 5

Uniformly Approximating Grids
We would like to consider interacting particle systems such as the symmetric exclusion process on a manifold. Because the exclusion process does not make sense directly in a continuum, we need a proper discrete grid approximation. More precisely, we need a sequence of grids on the manifold that converges to the manifold in a suitable way. It will become clear that the grids will need to approximate the manifold in a uniform way. We will see in Sect. 4 that a natural requirement on the grids is that we can define edge weights (or, equivalently, random walks) on them, such that the graph Laplacians converge to the Laplace-Beltrami operator in a suitable sense.
To be more precise, we would like to have a sequence ( p n ) ∞ n=1 in M and construct a sequence of grids On each G N , we would like to define a random walk X N which jumps from p i to p j with (symmetric) rate W N i j with the property that there exists some function a : N → [0, ∞) and some constant C > 0 such that for each smooth φ where the convergence is in the sense that for all smooth φ : Definition 3. 1 We call a sequence of grids and corresponding weights (G N , W N ) ∞ N =1 uniformly approximating grids if they satisfy (6).

Remark 3.2 (Comparison with standard grids)
To give an idea of how known grids in Euclidean spaces can be incorporated in this framework, let S be the one-dimensional torus. Let S N be the grid that places a grid point in k/N , k = 1, . . . , N . Now we can define a nearest neighbour random walk by putting The compactness of the torus easily implies that this rest term can be bounded uniformly. This implies that (6) holds.
We will show in Sect. 4 that if we define the Symmetric Exclusion Process on uniformly approximating grids we can prove that its hydrodynamic limit satisfies the heat equation on M.
It is not obvious how uniformly approximating grids could be defined. Most natural grids in Euclidean settings involve some notion of equidistance, scaling or translation invariance. All of these concepts are very hard if not intrinsically impossible to define on a manifold. The current section is dedicated to showing that uniformly approximating grids actually exist.
To be more precise, we will show that a sequence ( p n ) ∞ n=1 can be used to define such grids if the empirical measures 1/N N i=1 δ p i converge to the uniform distribution in Kantorovich sense. In Sect. 3.4 we will show that such sequences exist: they are obtained with probability 1 when sampling uniformly from the manifold, i.e. from the normalized Riemannian volume measure. For the calculations of this section, we need a result that forms the core of proving the invariance principle, which we have proved in Sect. 2.

Remark 3.3
At first sight the requirement that the empirical measures approximate the uniform measure and that the grid points can be sampled uniformly seems arbitrary, but this is actually quite natural. We want to construct a random walk with symmetric jumping rates (we need this for instance for the Symmetric Exclusion Process later). This implies that the invariant measure of the random walk is the counting measure, so the random walk spend on average the same amount of time in each point of the grid. Hence the amount of time that the random walk spends in some subset of the manifold is proportional to the amount of grid points in that subset. Since we want the random walk to approximate Brownian motion and the volume measure is invariant for Brownian motion, we want the amount of time that the random walk spends in a set to be proportional to the volume of the set. This means that the amount of grid points in a subset of M should be proportional to the volume of that subset. This suggests that the empirical measures 1/N N i=1 δ p i should in some sense approximate the uniform measure. Moreover, a natural way to let the amount of grid points in a subset be proportional to its volume is by sampling grid points from the uniform distribution on the manifold.

Motivation
In statistical data analysis the following setting is known and used in various contexts such as data clustering, dimension reduction, computer vision and statistical learning, see: Singer [18], von Luxburg et al. [22], Giné et al. [9], Belkin and Niyogi [3] and Belkin [2] and references therein for general background and various applications. Suppose we have a manifold M that is embedded in R m for some m and we would like to recover the manifold from some observations of it, say an i.i.d. sample of uniform random elements of M. To do this we can describe the observations as a graph with as weight on the edge between two points a semi positive kernel with bandwidth applied to the Euclidean distance between those points. Then it can be shown that the graph Laplacian of the graph that is obtained in this way converges in a suitable sense to the Laplace-Beltrami operator on M as the number of observations goes to infinity and goes to 0. This suggests that we could define random walks on such random graphs and that the corresponding generators converge to the generator of Brownian motion. We generalize this idea by taking a more general sequence of graphs, but our main example (in Sect. 3.4) will be this random graph. The main distinction between the statistical literature and our context is the following: for our purposes it is much more natural to view the manifold M on its own instead of embedded in a possibly high dimensional Euclidean space. This means that we have to use the distance that is induced by the Riemannian metric instead of the Euclidean distance. The latter is more suitable to purposes in statistics, because in that setting the Riemannian metric on M is not known beforehand. Also, a lot is known about the behaviour of the Euclidean distance in this type of situation and not so much about the distance on the manifold. We will have to make things work in M itself. The problem of discretizing the Laplacian on a manifold (without embedding in a Euclidean space) is also studied in the analysis literature where the main concern is the convergence of spectra, see for instance: Burago et al. [5], Fujiwara [8] and Aubry [1], where structures like -nets or triangulations are used to discretize the manifold. However, since we want to define the exclusion process on our discrete weighted graph which approximates the manifold, it is important that the edge weights are symmetric. Therefore these papers cannot be applied in our context.

Model
Let M be a compact and connected Riemannian manifold. We call a function f on M where This rescales the distance over which particles will jump. Naturally, ↓ 0 as N → ∞ ), we will call such k a kernel. Define as the jumping rate from p i to p j . Here d is the Riemannian metric on M. Note that the only dependence on N is through , hence the notation W i j instead of W N i j . These jumping rates define a random walk on V N . If we regard to points p i , p j as having an edge between them if W N i j > 0, we want the resulting graph to be connected (to make sense of the random walk and later of the particle systems defined on it). If we assume that there is some α such that k(x) > 0 for x ≤ α, one can show that the resulting graph is connected for N large enough. The main reason is that the distance between points that are close to each other goes to zero faster than . The details of the proof are in the appendix (see also Remark 3.6). Finally we define To prove that the grids are uniformly approximating we have to show (6), i.e. as the number of points N goes to infinity (and hence the bandwidth goes to 0) We will prove the following slightly stronger result: Note that since the process defined above is just a continuous-time random walk its generator is given by Therefore we call (8) "convergence of the (rescaled) generators to M uniformly in the p i 's for i ≤ N " or just "convergence of the generators to M uniformly for i ≤ N ". In fact, we will show that the rate of convergence does not depend on p i , so we might as well call it "uniformly in the p i 's".

Remark 3.4
In fact, we can say more. We denote the semigroups corresponding to the gen- ) by S N t and the semigroup corresponding to C M by S t . Then (8) implies that uniformly on compact time intervals The proof is a straightforward application of (Kurtz [14], Theorem 2.1) and a small argument that the extended limit of the generators above (as described in Kurtz [14]) equals C since they are equal on the smooth functions.

Remark 3.5
To see why the rescaling a(N ) is natural, we can write Since k is a kernel that is rescaled by inside, we need the 1/ d to make sure the integral of the kernel stays of order 1 as goes to 0. Since the amount of points that the process can jump to equals N , we also need the factor 1/N to make sure the jumping rate is of order 1 as N goes to infinity. Also note that the typical distance that a particle jumps with these rates is of order . This means that space is scaled by . Hence it is very natural to expect that time should be rescaled by 1/ 2 , which is exactly what we have. Finally note that in the calculations N is the main parameter and an auxiliary parameter depending on N . However, conceptually, when the scaling is concerned, the most important parameter is . N is just the total amount of positions and simply has to grow fast enough as goes to 0. To see why this is true, note that any sequence (N ) that goes to 0 more slowly than what we use here will also do. Hence should go to 0 slow enough with respect to N or, equivalently, N should go to infinity fast enough with respect to . Remark 3. 6 We mentioned earlier that N must grow to infinity fast enough as goes to 0. In fact, with as defined in (7), the number of points in a ball of radius goes to infinity (even though shrinks to 0). In particular, this means that the number of points that a particle can jump to, goes to infinity. This is very different from the R n case with the lattice approximation 1 N Z d , where the number of neighbours is constant. The reason why it should be different in the manifold case is the following. In R d , the natural grid 1 N Z d is very symmetric. Indeed, we can split the graph Laplacian into the contributions in each direction i, where e i it the unit vector in direction i. Now when applying Taylor we see that the first order terms cancel perfectly, leaving us only with the second order terms, which we want for the Laplacian. In a manifold such perfect cancellation is not possible. Therefore the way to make the first order terms cancel is to sample more and more points around a grid point, such that the sum over the linear order terms becomes an integral which then vanishes in the limit. For this reason we need the number of points in a ball of size to go to infinity.

Remark 3.7
It is also possible to define W N i j as p ( p i , p j ), the heat kernel after time , and rescale by −1 instead of −2−d . Then the result of Sect. 3.2 can be proven in the same way (by obtaining some good bounds on Lipschitz constants and suprema of the heat kernel and choosing = (N ) appropriately, see Cipriani and van Ginkel [6]) and the result of Sect. 3.3 is a direct consequence of the fact that the Laplace-Beltrami operator generates the heat semigroup. However, for purposes of application/simulation the weights that we have chosen here are much easier to calculate (since only the geodesic distances need to be known, not the heat kernel).

Replacing Empirical Measure by Uniform Measure
We would like to show that in this case there is a C independent of i such that for all smooth f where

Now (10) equals
We will show later that the first term converges to C M f ( p i ) (uniformly in the p i 's) as N → ∞. Therefore it suffices for now to show that the second term converges to 0, uniformly in the p i 's.
Note that k is Lipschitz so it has some Lipschitz constant L k < ∞. This implies that by the reverse triangle inequality, so k(d(·, p i )/ ) has Lipschitz constant L k . f is smooth, is also Lipschitz with Lipschitz constant L f . Since they are both bounded functions, we see for the Lipschitz constant of g , j : Note that k is bounded since it is Lipschitz and compactly supported, so ||k|| ∞ < ∞. This shows that: where we denoted the dependence of on N explicitly. By (7), Note that this bound does not depend on p i . Since → 0, it follows that the second term of (11) goes to 0 uniformly in the p i 's.

What Remains
What we have seen above basically means that we can replace the empirical distribution μ N by the uniform distributionV . For convergence of the generators we still have to show that uniformly in the p i 's. Note that we can replace N → ∞ by ↓ 0, since the expression only depends on N via and (N ) ↓ 0 as N → ∞. Since the p i 's are all in M we can replace p i by q and require that the convergence is uniform in q ∈ M. Because of these considerations it remains to show that there exists C > 0 such that uniformly in q ∈ M: Note that for every > 0 this expression can be interpreted as the generator of a jump process on the manifold M. The process jumps from p to a (measurable) set Q ⊂ M with rate Q −2−d k(d( p, q)/ )dV .

Remark 3.8
Note that this is easy to show in R d . Indeed, using the transformation u = (y − x)/ and Taylor, we see where H (x) is the Hessian of f in x. Now changing coordinates to integrate over each sphere B r of radius r with respect to the appropriate surface measure S r and then with respect to r , we obtain Now because of symmetry the integrals of w i and of w i w j over spheres vanish for each i = j. Moreover the integrals of w 2 i do not depend on i, but only on r . Therefore the first term vanishes and we are left with This shows convergence (at least pointwise, for uniform convergence we have to be a little more careful about the O( )).

Integral Over Tangent Space
Let α > 0 be such that supp k ⊂ [0, α] (such α exists since k is compactly supported). We denote for p ∈ M, r > 0 : (13) Denote for η ∈ T p M, r > 0 : B p (η, r ) = {ξ ∈ T p M : ||ξ − η|| ≤ r } (not to be confused with B ρ , which is a ball in M with respect to the original metric ρ). For small enough we know that exp p : T p M ⊃ B p (0, α ) → B d ( p, α ) ⊂ M is a diffeomorphism. We want to use this to write the integral above as an integral over B p (0, ) ⊂ T p M: This means we integrate with respect to the measureV • exp •λ , where λ denotes multiplication with .
Determining the MeasureV • exp •λ Since B p (0, α ) is a star-shaped open neighbourhood of 0, we see that for small enough V := B d ( p, α ) = exp p (B p (0, α )) is a normal neighbourhood of p, so there exists a normal coordinate system (x, V ) that is centered at p. We interpret, for v ∈ R n , v p ∈ T p M as Consequently, when we write A p for some subset A of R n , we mean {v p : v ∈ A}.
Since the basis W = ∂ ∂ x 1 . . . , ∂ ∂ x n is orthogonal in T p M, it is easy to see that φ := v p → v preserves the inner product and is an isomorphism of inner product spaces. Indeed, In particular B R n (0, α ) p = B p (0, α ) (where B R n denotes a ball in R n with respect to the Euclidean metric). We can use this in the following lemma, which tells us more about V • exp •λ .

Lemma 3.9
There exist > 0 and a function h : B R n (0, ) → R such that for t tending to 0 h(t) = O(||t|| 2 ) and for all 0 Proof Let be small enough such that the considerations above the lemma hold and let < . For clarity of the proof, we first separately prove the following statement.
Proof The geodesics through p are straight lines with respect to x, so they are of the form which is the restriction of φ to B R n (0, α ) p . This situation is sketched in Fig. 3. Now we will first use the definition of integration to see what the measure is in coordinates (so it becomes a measure on a subset of R n ). Then we will use the claim above: we will pull the measure on R n back to T p M using φ.
On (x, V ) the volume measure is given by √ det Gdx 1 ∧ . . . ∧ dx n . According to (Wang [23], Cor 2.3), √ det G can be expanded (in normal coordinates) as 1 + h(x) where h is such that h(x) = O(||x|| 2 ). Now the measure can be written in local coordinates on B R n (α ) as According to the claim above, x • exp is a restriction of φ, so we can replace it by φ. Since this map is linear, it can be interchanged with λ , which yields (inserting what we found before and since < ): In the last step we interpret n (1+h( t)) V (M) dt 1 . . . dt n as a measure on B R n (0, α) and this last step is then just a transformation of measures on R n . This yields the expression that we want.
The situation in Lemma 3.9. On B p (0, α ): This measure can then be pulled back to B p (0, α ) using φ. Since φ is an inner product space isomorphism, it will be easy to deal with orthogonal transformations later, in Lemma 3.12 Remark 3. 10 We used [23,Cor 2.3] in the proof above. In these notes the expansion of √ det G( p, x) is calculated around a point p in normal coordinates x centered around p: As can be seen, there are no linear terms in the expansion. The coefficients for the quadratic terms are coefficients of the Ricci curvature of M in p. This implies that the way that the uniform distribution on a ball around p in M is pulled back to the tangent space via the exponential map depends on the curvature of M in p. In particular, if there is no curvature, M is locally isomorphic to a neighbourhood in R n so the same thing happens as in R n . This means that we get a uniform distribution on a ball around 0 in the tangent space.

Remark 3.11
We will need in Proposition 3.13 that the statement of Lemma 3.9 holds uniformly in all points of the manifold. This means that the difference between the uniform measure on a ball in the tangent space and the pulled back uniform measure on a geodesic ball in the manifold decays quadratically with uniformly in the manifold. Note that this uniform convergence is intuitively clear, since the difference between the two measures is caused by curvature and curvature is bounded in a compact manifold. As in the proof of Lemma 3.9, one needs to write for some function h p that is O(|x| 2 ) independent of p. Here G(q) is the metric matrix at q expressed in (fixed) normal coordinates centered at p. Since √ and det are uniformly continuous in the right domains, it suffices to show that where the O(|x| 2 ) is independent of p. In other words, where C does not depend on p. For all p ∈ M (and for any system of normal coordinates centered at p) we have the following Taylor expansion (note that for fixed p G(exp p (·)) i j is a map from a (subset of) R d to R): From this we get (17) directly for fixed p, i.e. we have In order to obtain uniformity of C p in p, we note that the functions of p and x appearing in the r.h.s. of (18) can be made smooth both in p and x. Smoothness in x is obvious (within the injectivity radius) and smoothness in p follows from a special choice of normal coordinates in such a way that they vary smoothly with p. A choice of normal coordinates is equivalent to a choice of an orthonormal basis, so one can construct smoothly varying normal coordinates by taking a smooth section of the orthonormal frame bundle (this can only be done locally, but it is enough to have the uniformity result locally, since then by compactness one has it globally). By compactness, the injectivity radius is bounded from below by some δ > 0. Now for all p ∈ M and ||x|| < δ, (18) holds and (locally) the quantities on the r.h.s. vary smoothly and therefore (again by compactness) one can show that C := sup p C p is finite.

A Canonical Part Plus a Rest Term
Now define on B p (0, α) and 0 everywhere else. Then the lemma implies that (14) equals Recall that p( , η) is just notation for following the geodesic from p in the direction of η for time . Now we define μ k = k(|| · ||)μ (so the measure which has density k(|| · ||) with respect to μ) and analogously μ k R = k(|| · ||)μ R . Then we can write the integral above as In this way we transformed the integral to one that we worked with in Sect. 2.1 since we wrote it as the generator of a geodesic random walk (see L N on page 2). To use the theory that we obtained in that section, we need the following lemma. It tells us that μ k can be used as a stepping distribution for a geodesic random walk and it gives us the constant speed of the Brownian motion to which it converges (see Sect. 2.2).

Lemma 3.12 μ k is canonical. Moreover
Proof First of all recall that k is continuous and compactly supported, so the integral over k above makes sense and is finite. Define ν = 1 V (M) dt 1 . . . dt n on B R n (0, α) and 0 everywhere else. Then we can write μ = ν•φ. Since φ preserves the norm, we see that k(||·|| T p M )•φ −1 = k(|| · || R n ). This means that μ k = ν k • φ, where ν k := k(|| · ||)ν. Since φ preserves the inner product, the measure μ k behaves the same with respect to orthogonal transformations in T p M as ν k with respect to orthogonal transformations in R n . Since ν k is clearly preserved under such transformations, so is μ k . This shows that μ k is canonical. Now we calculate the corresponding constant.
The first step was just writing the integral with respect to the coordinates for which we defined μ. The second step holds because μ k = ν k • φ. The third uses the fact that φ preserves the norm. The penultimate step is a change of coordinates in R n using the fact that ||v|| is constant on spheres around the origin. Here 2π n/2 (n/2) r n−1 is the area of r S n−1 . In the last step we used that supp(k) ⊂ [0, α].

Conclusion
We use everything above to obtain the statement that we aim for.

Proposition 3.13 Set
Then as → 0 we have uniformly in p ∈ M: From the results in Sects. 2.1 and 2.2 (Proposition 2.14) and Lemma 3.12, we see for the first term uniformly in p Now it suffices to show that the second term goes to zero at a rate independent of p. Let , K > 0 such that < and |h(s)| < K ||s|| 2 for s ∈ B R n (0, ) (where both and h are from Lemma 3.9). We need Remark 3.11 to make sure that K and do not depend on p. Now note that for < : Now we see: where we used that the integral is finite since k is bounded and has support in [0, α]. Combining everything above gives what we wanted.

Example Grid
So far, we have seen that a sequence of grids is suitable for the hydrodynamic limit problem if the empirical distributions converge to the uniform distribution in the Kantorovich topology. We conclude by giving examples of such grids. To be more precise, we show that if one constructs a grid by adding uniformly sampled points from the manifold, this grid is suitable with probability 1.
Remark 3.14 (Comparison with standard grids) Recall the grids S N on the one-dimensional torus S from Remark 3.2. We can show that the empirical measures corresponding to these grids along the subsequence N = 2 m , m = 0, 1, 2, . . . converge to the uniform measure on S with respect to the Kantorovich distance. To this end let N = 2 m be fixed, call the corresponding empirical measure μ N and call the uniform measure λ. Recall that the Kantorovich distance between these measures is alternatively given by where (μ N , λ) is the set of all couplings of μ N and λ. Now let Y be a uniform random variable on S and define Denote the joint distribution of (X , Y ) by ν. Then it is easy to see that ν ∈ (μ N , λ). This implies that This implies convergence with respect to the Kantorovich metric along the subsequence N = 2 m , m = 0, 1, 2, . . .. Note, however, that the corresponding edge weights as described in this section are not the same as those in Remark 3.2.

Convergence of a Random Grid
Now we move back to the general case of a compact and connected n-dimensional Riemannian manifold M. Let (P n ) ∞ n=1 be a sequence of iid uniformly random points of M. Define [21,Example 5.15] to show that W 1 (μ N ,V ) → 0 as N → ∞. First we will show that the expectation goes to 0, then we will derive that it goes to 0 almost surely. For now, let N be fixed. Let F 1 be the set of Lipschitz function on M with Lipschitz constant ≤ 1. Then we define for f ∈ F 1 the random variable X f = μ N f −V f . Note that both μ N andV are probability distributions, so X f (ω) is Lipschitz in f for each ω: Now note that since f has Lipschitz constant ≤ 1: M is compact, so K < ∞. Since adding constants to f does not change X f , it suffices to consider f ∈ F 1,K = {g ∈ F 1 : 0 ≤ g ≤ K }. It follows that for each f ∈ F 1,K by writing we see that it is a sum of iid random variables taking values in [− K N , K N ]. By the Azuma-Hoeffding inequality, this implies that X f is K 2 N -subgaussian for each f ∈ F 1,K . Now [21,Lemma 5.7] shows that where N (F 1,K , || · || ∞ , ) is the minimal number of points in some space containing F 1,K such that the balls of radius with respect to the uniform distance around those points cover F 1,K .

Estimating the Covering Number
We now need to estimate this covering number. To do this we need an upper bound of the covering number N (M, d, ) of M. Since M is compact there exist a, δ > 0 such that for all 0 < < δ : N (M, d, ) ≤ a −d (see for instance [16,Lemma 4.2]). Using this we can prove the following.

Lemma 3.15
There is a c > 0 such that for all 0 < < δ: N(F 1,K , || · || ∞ , ) ≤ exp c/ d . Now we see that for any 0 < < δ : Elementary methods show that this value takes a minimum at = c 0 N −1 d+2 where c 0 is some constant (take N large enough such that c 0 N −1 d+2 < δ). This shows that the optimal bound that we get is where c 1 is the product of some constants that don't depend on N . This shows that

Convergence a.s.
It remains to show that W 1 (μ N ,V ) goes to zero almost surely. For a function f : Further, define the function H : M N → R by Note that H ( p 1 , . . . , p N ) = W 1 (μ N ,V ).
Now we are in position to prove the main result.
Proof Since P 1 , . . . , P N are independent, (van Handel [21], Theorem 3.11) gives us that for any t > 0 where the last inequality follows from Lemma 3.16. For reasons of symmetry we obtain By a standard application of the Borel-Cantelli lemma, this implies that W 1 (μ N ,V ) − EW 1 (μ N ,V ) → 0 a.s. Since we have already seen that EW 1 (μ N ,V ) → 0, we conclude that a.s. as N → ∞ We conclude that sampling uniformly from the manifold yields a suitable grid with probability 1.

Hydrodynamic Limit of the SEP
In Sect. 3 we showed the existence of uniformly approximating grids. In this section we will apply such grids. We will use it to define an interacting particle system on the manifold. Then we will show that this interacting particle system has a hydrodynamic limit and that this limit satisfies the heat equation (the precise formulation is given in Theorem 4.2). We follow a standard method that is used in (Seppäläinen [17], Chap. 8) for the Euclidean case. Now let (G N , W N ) ∞ N =1 be a sequence of uniformly approximating grids with corresponding weights. Recall that this means the following. There is a sequence ( p n ) ∞ n=1 in M such that G N = {p 1 , . . . , p N }. On each G N , there is a random walk X N which jumps from p i to p j with (symmetric) rate W N i j . We assume that there exists some function a : N → [0, ∞) and some constant C > 0 such that for each smooth φ where the convergence is in the sense that for all smooth φ 1 By dividing a(N ) by C if necessary, we can assume that C = 1.

Remark 4.1
Note that for the result of this section it is not necessary to construct grids from a sequence. Any sequence of finite grids such that (19) holds would do. However, since the grid that we constructed in Sect. 3 is of this form and this section partially serves as an example of the application of that grid, we formulate our results in this section in the same way.

Symmetric Exclusion Process
The Symmetric Exclusion Process (SEP) is an interacting particle system that was introduced in Spitzer [19] and studied in detail in (Liggett [15], Chap. 8). The idea is that there is some (possibly countably infinite) amount of particles on a (possibly countably infinite) graph G. The particles are considered identical. Each particle jumps after independent exponential times with parameter 1 from x to y with probability p(x, y), provided that the place that it wants to jump to is not already occupied. Otherwise, the jump is suppressed. We assume that p(x, y) = p(y, x). Let η t ∈ {0, 1} G denote the configuration of the particles at time t, i.e. η t (x) = 1 if there is a particle at place x ∈ G at time t and 0 else. We will sometimes write η( p, t) = η t ( p). For any configuration η and points x, y define η xy by An equivalent description of this process is the following. All edges (x y) have independent exponential clocks with rate p(x, y) = p(y, x). Whenever a clock rings, the particles that are at either side of the corresponding edge jump along the edge. This means that if there are no particles, nothing happens. If there is one particle, it jumps. If there are two particle, they switch places. Since we are not interested in individual particles, the configuration stays the same in the latter case. Note that in this way there can never be more than two particles at the same place. Using the notation introduced above, we see that the generator of this process is defined on the core of local functions as The factor 1 2 is there since we count every edge twice.

The Process
We now define the SEP η N = (η N t ) t≥0 on G N through the generator Here η i j := η p i p j . It follows from our considerations above that this process describes particles that perform independent random walks according to X N with the restriction that jumps to occupied sites are suppressed.
Let (X i ) ∞ i=1 be some sequence of (possibly degenerate) random variables taking values in {0, 1}. Set as the initial configuration η N 0 ( p i ) = X i .

Hydrodynamic Limit
We will use this subsection to give the basic definitions that describe the idea of a hydrodynamic limit. At a microscopic scale, the particles are just random walkers with some interaction, but at the macroscopic scale (where limits are taken in space and time), the behaviour is deterministic: it is described by a partial differential equation (in our case the heat equation).

Initial Conditions and Trajectories of Particle Configurations Define
where δ p is the Dirac measure which places mass 1 at p ∈ M. It puts a point mass at each particle and rescales it by the amount of possible positions, which represents the particle configuration η N t at time t. In particular μ N t is a sub-probability measure and is in R(M). Instead of dealing with this problem pointwise for each t, we will look at trajectories. As the particles move according to the SEP, γ N : [0, ∞) → R(M) defined by t → μ N t is a random trajectory and hence a random element of D. It represents the positions of the particles over time. The initial configuration X 1 , . . . , X N and the dynamics of the SEP determine a distribution Q N on D. In this way we obtain a sequence (Q N ) ∞ N =0 of measures on D.

Assumption on the Initial Configuration
We assume that there exists a measurable function ρ 0 : M → R such that 0 ≤ ρ 0 ≤ 1 and μ N 0 converges vaguely to ρ 0 dV in probability, i.e. for any continuous φ as N → ∞: If this is the case, we say that ρ 0 dV is the density profile corresponding to the configurations η N 0 . Note that using measures here to represent the particles provides a bridge between separate particles (discrete measures) and density profiles (measures that are absolutely continuous with respect to V ). We would like to show that if this initial condition is given, then at any time t the configurations η N t have a corresponding density profile ρ t dV . Moreover, we want to show that t → ρ t solves the heat equation with initial condition ρ 0 .

Example of Initial Distribution
Suppose for now that the p i 's are such that for any continuous f : 1 to be independent Bernoulli random variables with EX i = ρ 0 ( p i ) for some continuous function ρ 0 : M → R with 0 ≤ ρ 0 ≤ 1. Then we see as N → ∞: Together this implies that (20) holds here for any continuous φ.

Main Result
After all these definitions, we can state the main result of this section.

Theorem 4.2 Let M be a complete, n-dimensional, connected Riemannian manifold and let
be a sequence of uniformly approximating grids with corresponding weights. Let η N t be particle configurations that behave according to the SEP on (G N , W N ) and let μ N t be its measure valued representation. Suppose that μ N 0 has density profile ρ 0 dV for some measurable function ρ 0 . Then the trajectory t → μ N t converges in probability to the trajectory t → ρ t dV in the Skohorod topology, where t → ρ t satisfies the heat equation on M with initial condition ρ 0 .

Dynkin Martingale
The proof of the hydrodynamic limit follows the line of (Seppäläinen [17], Chap. 8) which is a canonical method that is also discussed in Kipnis and Landim [13]. However, in our context, there are several new technical difficulties along the way which we have to tackle. Its core calculations are based on the following Dynkin martingale result. It is a standard result and it is also proved in Seppäläinen [17]. We will formulate it in terms of our situation on a compact Riemannian manifold.

Proposition 4.3 Let {η t , t ≥ 0} be a Feller process on a compact Riemannian manifold with generator L and semigroup S t . For any function f such that both f and f 2 are in D(L), define
Then M t is a martingale with respect to the filtration F t = σ {η r , r ≤ t}. Moreover, its quadratic variation M, M t equals

Application of the Proposition
First of all fix a smooth function φ on M. Define for η ∈ {0, 1} G N : Note that since L N is the generator of a random walk on a the finite space of configurations, its domain consists of all functions on those configurations, so in particular f N and ( f N ) 2 are in it. Applying Theorem 4.3 in this situation shows that M N defined by is a martingale with quadratic variation M N , Inserting definitions and leaving out some indexes (to keep everything clear) shows that the right hand side of (21) equals Using Convergence of the Generators By (19), we can write for any p i : where This shows that Plugging this into (23) and (21), we obtain: so for any T > 0: We want to show that this expression converges to 0 in probability. We will deal with the terms on the right hand side separately.

The Error Term
First of all (25)).

Convergence of the Martingale to 0
Now for the other term. Since the trajectory t → μ N t is cadlag, so is M N . Hence by Doob's inequality we see: To show that E|M N T | goes to 0, it suffices to show that E M N , M N T goes to 0 (since then . This is what the following lemma tells us.

Lemma 4.4 For any T > 0: lim
By writing out, one simply obtains Using (22), we see for all i. This shows that This implies that also We can estimate this term by using (25). Some basic manipulations show that where the E p i 's are as before. This implies that where in the last step we used (25). So we obtain Together with (29) this gives the result.
We conclude from the lemma that the right hand side of (28) goes to zero as N goes to infinity and goes to zero, so Convergence of (27) to 0 in Probability Combining everything above and using (27), we conclude that In particular, for any δ ≥ 0, define It can be shown, as in (Seppäläinen [17], Chap. 8), that H δ is closed for any δ > 0. Recall from page 26 that we write the distribution of t → μ N t as Q N . Then the convergence result above implies that for any δ > 0: We will need that the sequence of distributions (Q N ) ∞ N =1 is tight. This can be shown in exactly the same way as (Kipnis and Landim [13], p.55-56). In fact all the most crucial calculations have already been performed above.
where I T denotes the set of all stopping times bounded by T . We know from equation (26) that there exists a martingale M (depending on f ) such that (III) .
It therefore suffices to check the tightness criterion for the RHS of this equation and for the integral on the LHS (since the only other term is constant). Now we can make the following estimations.

(I).
First of all, since μ N s is a sub-probability measure and M f is bounded: This implies that sup τ ∈I T ,θ ≤γ This implies that the limit in (30) is smaller than 1 γ || M f || ∞ > = 0, so (I) satisfies the tightness criterion.
(II). Further, the calculations above show that τ +θ Here K is some positive number which exists, because of (25). This part satisfies (30) in the same way as the previous part. (III). Now for the last term, we first estimate E (M N τ +θ − M N τ ) 2 (as is done in (Kipnis and Landim [13], p.56)). Naturally, the expectation is taken with respect to Q N f −1 . Note that because of the martingale property: We see from the calculations in the proof of Lemma 4.4 that Since the term after θ converges to 0, we see that it is bounded by some constant α. By Chebyshev's inequality we obtain: Since lim

Limit Distribution
We have just shown that (Q N ) ∞ N =1 is a tight sequence of measures on D. This implies that every one of its subsequences is also tight and therefore has a weakly convergent subsequence.
If these all have the same limit, then it follows from a basic result in metric spaces that the sequence itself converges weakly to that limit. It therefore suffices for weak convergence of (Q N ) ∞ N =1 to show that every weakly convergent subsequence of (Q N ) ∞ N =1 has the same limit. Let (Q N k ) ∞ k=1 be any weakly convergent subsequence and denote its limit by Q. Since H is closed, we know for any δ > 0 that Q(H δ ) ≥ lim sup k→∞ Q N k (H δ ) = 1, so Q(H δ ) = 1. Since this holds for any δ > 0, we see This means that By doing this for a countable set of functions φ that is dense in C ∞ with respect to || · || ∞ + || M · || ∞ and arguing that this implies the same for any smooth function we see: Since this holds for any T > 0, we see that Q−a.s. for every t ≥ 0 and for all smooth φ: Note that (31) is a weak, measure-valued formulation of the heat equation. We will argue and use shortly that this equation uniquely determines the trajectory t → α t given the initial conditions.

Continuity
To obtain uniqueness, we first need to know that the trajectory is continuous. For the R n case this is shown in (Seppäläinen [17], Lemma 8.6). The result can be shown in exactly the same way in our case, so we will not provide all the details. The topology on the space of measures is generated by the following metric: for some sequence φ j ∈ C ∞ (M

Uniqueness
To obtain uniqueness of limits of subsequences of Q N , we need to know that there is a unique continuous solution to (31) that has initial condition ρ 0 dV . We know that t → ρ t dV is a continuous solution to (31) with the right initial condition if t → ρ t satisfies the heat equation with initial condition ρ 0 . Therefore it suffices to show that this solution is unique. This result is proven with a boundedness condition in (Seppäläinen [17], Thm A.28). The main idea of the proof is that the measure valued path α t is smoothed by taking its convolution with some smooth kernel with bandwidth > 0. Then it is shown that this trajectory of functions satisfies the heat equation with initial condition ρ 0 in the strong sense (by interchanging integral and derivatives and using that these identities are known for sufficiently many φ), so it must equal t → ρ t . Then by letting go to zero, it is shown that the original trajectory t → α t must equal t → ρ t dλ, where λ is the Lebesgue measure.
To obtain the analogous result in our setting, we cannot use convolution, since this is not well-defined on a manifold. However, we can smooth the measures by integrating the heat kernel at time with respect to the measures. Using this smoothing, we can follow exactly the same approach, i.e. showing that the smoothed trajectory satisfies the heat equation in a strong sense and then letting go to 0. The boundedness condition is a bound on volumes, which is needed for some estimations in Seppäläinen [17] and for the uniqueness of the strong solution to the heat equation. Since we work in a compact setting and with probability measures, such a bound is not necessary. The uniqueness of the strong solution to the heat equation is a standard result in our case (so for a compact and connected Riemannian manifold). See for instance [11,Thm. 8.18]. Results on the heat kernel on a manifold can also be found in Grigoryan [11].

Conclusion
Now let t → ρ t be the solution to the heat equation on M with initial condition ρ 0 and call β := (t → ρ t dV ). Recall that (31) holds Q−a.s. By the uniqueness result above, this implies that Q is a Dirac distribution with β as its support. Since this does not depend on Q N k , it must be the same for any convergent subsequence, so with arguments given above, we conclude that Q N → Q weakly. Let γ N denote the random trajectory t → μ N t . Since Q is degenerate, the weak convergence implies convergence in probability, so γ N → β in probability. This is what we wanted to show.
(not depending on s N ), such that for N large enough Now we see there is a C > 0 such that for N large enough This implies that there is some N 0 such that for all N ≥ N 0 α N ≥ β N . By our choice of k, all points at distance α N or less are joined by an edge, so this inequality combined with the definition of β N shows that for all N ≥ N 0 V N with edges as defined in the lemma statement is connected.