1 Introduction

Let M and G be finite dimensional smooth manifolds. Let \(Y_k\), \(k=1,\dots , m\), be \(C^6\) vector fields on M, \(\alpha _k\) real valued \(C^r\) functions on G, \(\epsilon \) a positive number, and \((z_t^\epsilon )\) a family of diffusions on a filtered probability space \((\Omega , {\mathcal {F}}, {\mathcal {F}}_t, {\mathbb {P}})\) with values in G and infinitesimal generator \({\mathcal {L}}_0^\epsilon ={1\over \epsilon } {\mathcal {L}}_0\), which will be made precise later. The aim of this paper is to study limit theorems associated with the system of ordinary differential equations on M,

$$\begin{aligned} \dot{y}_t^\epsilon (\omega )=\sum _{k=1}^m Y_k(y_t^\epsilon (\omega ))\alpha _k(z_t^\epsilon (\omega )) \end{aligned}$$
(1.1)

under the assumption that \(\alpha _k\) ‘averages’ to zero. The ‘average’ is taken with respect to the unique invariant probability measure of \({\mathcal {L}}_0\) in case \({\mathcal {L}}_0\) satisfies the strong Hörmander condition; more generally, the ‘average’ is the projection to a suitable function space. We prove that \(y_{t\over \epsilon }^\epsilon \) converges, as \(\epsilon \rightarrow 0\), to a Markov process whose Markov generator has an explicit expression.

This study is motivated by problems arising from stochastic homogenization. It turns out that, in the study of randomly perturbed systems with a conserved quantity which does not necessarily take values in a linear space, the reduced equations for the slow variables can sometimes be transformed into (1.1). In Sect. 2 below we illustrate this by four examples, including one on the orthonormal frame bundle over a Riemannian manifold. Of these examples, the first is from [25], where we did not know how to obtain a rate of convergence, and the last three are from [26], where a family of interpolation equations on homogeneous manifolds is introduced. An additional example can be found in [24].

1.1 Outline of the paper

In all the examples described in Sect. 2 below, the scalar functions average to 0 with respect to a suitable probability measure on G. Bear in mind that if a Hamiltonian system approximates a physical system with error \(\epsilon \) on a compact time interval, then over a time interval of size \({1\over \epsilon }\) the physical orbits deviate visibly from those of the Hamiltonian system unless the error is reduced by oscillations. It is therefore natural, and a classical problem, to study ODEs whose right hand sides are random and whose averages in time are zero.

The objectives of the present article are: (1) to prove that, as \(\epsilon \) tends to zero, the law of \((y_{s\over \epsilon }^\epsilon , s \le t)\) converges weakly to a probability measure \(\bar{\mu }\) on the path space over M, and to describe the properties of the limiting Markov semigroups; (2) to estimate the rate of convergence, especially in the Wasserstein distance. For simplicity we assume that all the equations are complete. In Sects. 4, 5, 6 and 8 we assume that \({\mathcal {L}}_0\) is a regularity improving Fredholm operator on a compact manifold G, see Definition 4.1. In Theorem 6.4 we assume, in addition, that \({\mathcal {L}}_0\) has Fredholm index 0. The strong Hörmander condition can be used in place of the condition ‘regularity improving Fredholm operator of index 0’.

For simplicity, throughout the introduction, the \(\alpha _k\) are bounded and belong to \( N^\perp \), where N is the kernel of \({\mathcal {L}}_0^*\), the adjoint of the unbounded operator \({\mathcal {L}}_0\) in \(L^2(G)\) with respect to the volume measure. In case \({\mathcal {L}}_0\) is not elliptic we assume, in addition, that \(r\ge 3\) or \(r\ge \max {\{3, {n\over 2} +1\}}\), depending on the result. The growth conditions on \(Y_k\) are given in terms of a control function V and a controlled function space \(B_{V,r}\), where r indicates the order of the derivatives to be controlled, see (5.1). For simplicity we assume both M and G are compact.
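For orientation, here is a minimal illustration of this setup, not taken from the paper: take the circle \(G=S^1\) with \({\mathcal {L}}_0\) equal to half the Laplacian, a choice for which everything can be computed by hand.

```latex
% Hypothetical illustration: G = S^1, \mathcal{L}_0 = \frac{1}{2}\frac{d^2}{d\theta^2}.
% The invariant probability measure is \pi = \frac{d\theta}{2\pi}, the kernel of
% \mathcal{L}_0^* consists of the constants, and therefore
\alpha \in N^{\perp} \iff \int_0^{2\pi} \alpha (\theta )\, d\theta = 0 .
% For example \alpha (\theta ) = \sin \theta averages to zero, and the Poisson
% equation \mathcal{L}_0 \beta = \alpha is solved explicitly by
\beta (\theta ) = -2\sin \theta , \qquad
\tfrac{1}{2}\,\beta ''(\theta ) = \sin \theta .
```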

In Sect. 3 we present two elementary lemmas, Lemmas 3.4 and 3.5, assuming \({\mathcal {L}}_0\) mixes exponentially in a weighted total variation norm with weight \(W: G\rightarrow {\mathbb {R}}\). In Sect. 4, for \({\mathcal {L}}_0\) a regularity improving Fredholm operator and f a \(C^2\) function, we deduce a formula for \(f(y_{t\over \epsilon }^\epsilon )\) in which the transmission of the randomness from the fast motion \((z_t^\epsilon )\) to the slow motion \((y_t^\epsilon )\) is manifested in a martingale. This provides a platform for the uniform estimates over large time intervals, the weak convergence, and the study of the rate of convergence in later sections.

In Sect. 5, we obtain uniform estimates in \(\epsilon \) for functionals of \(y_{t}^\epsilon \) over \([0,{1\over \epsilon }]\). Let \({\mathcal {L}}_0\) be a regularity improving Fredholm operator, \(y_0^\epsilon =y_0\), and V a \(C^2\) function such that \(\sum _{j=1}^m|L_{Y_j}V|\le c+KV\) and \(\sum _{i,j=1}^m|L_{Y_i}L_{Y_j}V| \le c+KV\) for some numbers c and K. Then, by Theorem 5.2, for every number \(p\ge 1\) there exists a positive number \(\epsilon _0\) such that \(\sup _{0<\epsilon \le \epsilon _0}{\mathbf E}\sup _{0\le u\le t} V^p(y_{u\over \epsilon }^\epsilon )\) is finite and belongs to \(B_{V,0}\) as a function of \(y_0\). This leads to convergence in the Wasserstein distance and will be used later to prove a key lemma on averaging functions along the paths of \((y_t^\epsilon , z_t^\epsilon )\).

In Sect. 6, \({\mathcal {L}}_0\) is an operator on a compact manifold G satisfying Hörmander’s condition and with Fredholm index 0; M has positive injectivity radius and satisfies other geometric restrictions. In particular we do not make any assumption on the ergodicity of \({\mathcal {L}}_0\). Let \(\overline{\alpha _i\beta _j}\) denote \(\sum _{l} u_l\langle \alpha _i\beta _j, \pi _l\rangle \), where \(\{u_l\}\) is a basis of the kernel of \({\mathcal {L}}_0\) and \(\{\pi _l\}\) the dual basis in the kernel of \({\mathcal {L}}_0^*\). Theorem 6.4 states that, given bounds on the \(Y_k\) and their derivatives, and for \(\alpha _k\in C^r\) where \(r\ge \max {\{3, {n\over 2}+1\}}\), \((y_{s\over \epsilon }^\epsilon , s \le t)\) converges weakly, as \(\epsilon \rightarrow 0\), to the Markov process with Markov generator \(\bar{{\mathcal {L}}}=\sum _{i,j=1}^m \overline{\alpha _i\beta _j} L_{Y_i}L_{Y_j}\). This follows from a tightness result, Proposition 6.1, where no assumption on the Fredholm index of \({\mathcal {L}}_0\) is made, and from a law of large numbers for sub-elliptic operators on compact manifolds, Lemma 6.2. Convergence of \(\{(y_{t\over \epsilon }^\epsilon , 0\le t\le T)\}\) in the Wasserstein p-distance is also obtained.

In Sect. 7 we study the solution flows of SDEs and their associated Kolmogorov equations, to be applied to the limiting operator \(\bar{{\mathcal {L}}}\) in Sect. 8. Otherwise this section is independent of the rest of the paper. Let \(Y_k, Y_0\) be \(C^6\) and \(C^5\) vector fields respectively. If M is compact, or more generally if the \(Y_k\) are \(BC^5\) vector fields, the conclusions in this section hold trivially. Denote by \(B_{V,r}\) the set of functions whose derivatives up to order r are controlled by a function V, cf. (5.1). Let \(\Phi _t(y)\) be the solution flow to

$$\begin{aligned} dy_t=\sum _{k=1}^m Y_k(y_t)\circ dB_t^k+Y_0(y_t)dt. \end{aligned}$$

Let \(P_t f(y)={\mathbf E}f(\Phi _t(y))\) and \(Z={1\over 2}\sum _{k=1}^m \nabla _{Y_k} Y_k+Y_0\). Let \(V\in C^2(M, {\mathbb {R}}_+)\) be such that \(\sup _{s\le t}{\mathbf E}V^q(\Phi _s(y))\in B_{V,0}\) for every \(q\ge 1\). This assumption on V is implied by the following conditions: \(|L_{Y_i}L_{Y_j} V| \le c+KV\), \(|L_{Y_j}V|\le c+KV\), where c, K are constants. Let \(\tilde{V}=1+\ln (1+|V|)\). We assume, in addition, that for some number c the following hold:

$$\begin{aligned} {\begin{aligned}&\sum _{k=1}^m\sum _{\alpha =0}^5 | \nabla ^{(\alpha )} Y_k| \in B_{V,0}, \;&\sum _{\alpha =0}^4 |\nabla ^{(\alpha )} Y_0| \in B_{V,0},\; \\&\sum _{k=1}^m| \nabla Y_k|^2 \le c\tilde{V},\;&\sup _{|u|=1}\langle \nabla _u Z, u\rangle \le c\tilde{V}. \end{aligned}} \end{aligned}$$
(1.2)

Then there is a global smooth solution flow \(\Phi _t(y)\), by Theorem 7.2. Furthermore, for \(f\in B_{V,4}\): \({\mathcal {L}}f\in B_{V,2}\), \({\mathcal {L}}^2 f\in B_{V,0}\), and \( P_tf \in B_{V,4}\).

For \(M={\mathbb {R}}^n\), an example of the control pair is \(V(x)=C(1+|x|^2)\) and \(\tilde{V}(x)=\ln (1+|x|^2)\). Our conditions are weaker than those commonly used in the probability literature for \(d(P_tf)\) in two ways: firstly, we allow unbounded first order derivatives; secondly, we allow one-sided conditions on the drift and its first order derivatives. In this regard we extend a theorem of Kohler and Papanicolaou [32], where estimates from Oleinik–Radkevič [31] were used. The estimates on the derivative flows obtained in this section are often assumptions in applications of Malliavin calculus to the study of stochastic differential equations. Results in this section might be of independent interest.
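As a sanity check of this control pair (our own computation, under the added assumption that the \(Y_j\) have at most linear growth), the Lyapunov bounds stated above do hold:

```latex
% Assume |Y_j(x)| \le c_0 (1+|x|) on \mathbb{R}^n and take V(x) = C(1+|x|^2),
% so that \nabla V(x) = 2Cx. Then
|L_{Y_j} V(x)| = 2C\,|\langle x, Y_j(x)\rangle |
             \le 2C c_0\, |x|\,(1+|x|)
             \le c + K V(x)
% for suitable constants c, K. If, in addition, |\nabla Y_j| is bounded, then
% \nabla (L_{Y_j}V) = 2C\,\bigl( Y_j + (\nabla Y_j)^{T} x \bigr) grows at most
% linearly, so |L_{Y_i} L_{Y_j} V| \le c + K V as well.
```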

Let \(P_t\) be the Markov semigroup generated by \(\bar{{\mathcal {L}}}\). In Sect. 8 we prove the following estimate: \(|{\mathbf E}f(\Phi _t^\epsilon (y_0))- P_tf(y_0)|\le C(t)\gamma (y_0)\epsilon \sqrt{|\log \epsilon |}\), where C(t) is a constant, \(\gamma \) is a function in \(B_{V,0}\), and \(\Phi _t^\epsilon (y_0)\) is the solution to (1.1) with initial value \(y_0\). The conditions on the vector fields \(Y_k\) are similar to (1.2); we also assume the conditions of Theorem 5.2 and that \({\mathcal {L}}_0\) satisfies the strong Hörmander condition. We combine traditional techniques for time averaging with techniques from homogenization. The homogenization techniques were developed from [23], which was inspired by the study in Hairer and Pavliotis [12]. For the rate of convergence we were particularly influenced by the following papers: Kohler and Papanicolaou [32, 36] and Papanicolaou and Varadhan [34]. Denote by \(\hat{P}_{y^\epsilon _{t\over \epsilon }}\) the probability distribution of the random variable \(y^\epsilon _{t\over \epsilon }\) and by \(\bar{\mu }_t\) the probability measure determined by \(P_t\). Then, under suitable conditions, \(W_1(\hat{P}_{y^\epsilon _{t\over \epsilon }}, \bar{\mu }_t) \le C\epsilon ^r\), where r is any positive number less than or equal to \({1\over 4}\) and \(W_1\) denotes the Wasserstein 1-distance, see Sect. 9.

1.2 Main theorems

We strive to impose as little as possible on the vector fields \(\{Y_k\}\); hence a few different sets of assumptions are used. For the examples we have in mind, G is a compact Lie group acting on a manifold M, and so for simplicity G is assumed to be compact throughout the article, with a few exceptions. In a future study, it would be interesting to provide examples in which G is not compact.

If M is also compact, only the following two conditions are needed: (a) \({\mathcal {L}}_0\) satisfies the strong Hörmander condition; (b) \(\{\alpha _k\} \subset C^r\cap N^\perp \), where N is the kernel of \({\mathcal {L}}_0^*\) and r is a sufficiently large number. If \({\mathcal {L}}_0\) is elliptic, ‘\(C^r\)’ can be replaced by ‘bounded measurable’. For the convergence, condition (a) can be replaced by ‘\({\mathcal {L}}_0\) satisfies Hörmander’s condition and has Fredholm index 0’. If \({\mathcal {L}}_0\) has a unique invariant probability measure, no condition is needed on the Fredholm index of \({\mathcal {L}}_0\).

Theorem 6.4 and Corollary 6.5. Under the conditions of Proposition 6.1 and Assumption 6.1, \((y_{t\over \epsilon }^\epsilon )\) converges weakly to the Markov process determined by

$$\begin{aligned} \bar{{\mathcal {L}}} =-\sum _{i,j=1}^m \overline{ \alpha _i {\mathcal {L}}_0^{-1} \alpha _j } L_{Y_i}L_{Y_j}, \quad \overline{ \alpha _i {\mathcal {L}}_0^{-1} \alpha _j }=\sum _{b=1}^{n_0} u_b \langle \alpha _i {\mathcal {L}}_0^{-1} \alpha _j ,\pi _b \rangle , \end{aligned}$$

where \(n_0\) is the dimension of the kernel of \({\mathcal {L}}_0\), which, by the assumption that \({\mathcal {L}}_0\) has Fredholm index 0, equals the dimension of the kernel of \({\mathcal {L}}_0^*\). The set of functions \(\{u_b\}\) is a basis of \(\mathrm{ker}({\mathcal {L}}_0)\) and \(\{\pi _b\}\subset \mathrm{ker}({\mathcal {L}}_0^*)\) is its dual basis. In case \({\mathcal {L}}_0\) satisfies the strong Hörmander condition, there is a unique invariant measure and \(\overline{ \alpha _i {\mathcal {L}}_0^{-1} \alpha _j }\) is simply the average of \(\alpha _i {\mathcal {L}}_0^{-1} \alpha _j\) with respect to that measure. Let \(p\ge 1\) be a number and V a Lyapunov type function such that \(\rho ^p\in B_{V, 0}\), a function space controlled by V. If furthermore Assumption 5.1 holds, \((y_{\cdot \over \epsilon }^\epsilon )\) converges, on [0, t], in the Wasserstein p-distance.
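As a hedged sanity check of the sign and the averaging in this formula (our own toy computation, not an example from the paper), take \(G=S^1\), \({\mathcal {L}}_0={1\over 2}{d^2\over d\theta ^2}\), \(m=1\) and \(\alpha _1(\theta )=\sin \theta \):

```latex
\mathcal{L}_0^{-1}\alpha _1 = -2\sin \theta , \qquad
\overline{\alpha _1 \mathcal{L}_0^{-1}\alpha _1}
  = \frac{1}{2\pi }\int_0^{2\pi } \sin \theta \,(-2\sin \theta )\, d\theta = -1 ,
% so the effective generator reduces to
\bar{\mathcal{L}}
  = -\,\overline{\alpha _1 \mathcal{L}_0^{-1}\alpha _1}\; L_{Y_1}L_{Y_1}
  = L_{Y_1}L_{Y_1} ,
% a non-negative second order operator in H\"ormander form, as one expects
% of a Markov generator.
```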

Theorem 8.2. Denote by \(\Phi _t^\epsilon (\cdot )\) the solution flow to (1.1) and by \(P_t\) the semigroup for \(\bar{{\mathcal {L}}}\). If Assumption 8.1 holds, then for \(f \in B_{V,4}\),

$$\begin{aligned} \left| {\mathbf E}f\left( \Phi ^\epsilon _{T\over \epsilon }(y_0)\right) -P_Tf(y_0)\right| \le \epsilon |\log \epsilon |^{1\over 2}C(T)\gamma _1(y_0), \end{aligned}$$

where \(\gamma _1\in B_{V,0}\) and C(T) is a constant increasing in T. Similarly, if \(f\in BC^4\),

$$\begin{aligned} \left| {\mathbf E}f\left( \Phi ^\epsilon _{T\over \epsilon }(y_0)\right) -P_Tf(y_0)\right| \le \epsilon |\log \epsilon |^{1\over 2}\,C(T)\gamma _2(y_0) (1+ |f|_{4, \infty }). \end{aligned}$$
(1.3)

where \(\gamma _2\) is a function in \(B_{V,0}\) independent of f and C(T) is a constant increasing in T.

A complete connected Riemannian manifold is said to have bounded geometry if it has strictly positive injectivity radius, and if the Riemannian curvature tensor and its covariant derivatives are bounded.

Proposition 9.1. Suppose that M has bounded geometry, \(\rho _o^2 \in B_{V,0}\), and Assumption 8.1 holds. Let \(\bar{\mu }\) be the limit measure and \(\bar{\mu }_t=(ev_t)_*\bar{\mu }\). Then for every \(r<{1\over 4}\) there exist \(C(T)\in B_{V,0}\) and \(\epsilon _0>0\) such that for all \(\epsilon \le \epsilon _0\) and \(t\le T\),

$$\begin{aligned} d_W\left( {\mathrm {Law}} \left( {y^\epsilon _{t\over \epsilon }}\right) , \bar{\mu }_t\right) \le C(T)\epsilon ^{r}. \end{aligned}$$

Besides the fact that we work on manifolds, with the inherited non-linearity and the problem of the cut locus, the following aspects of the paper are perhaps new. (a) We do not assume that there exists a unique invariant probability measure for the noise; the effective processes are obtained by a suitable projection, accommodating one type of degeneracy. Furthermore, the noise takes values in another manifold, accommodating ‘removable’ degeneracy: for example, the stochastic processes in question live in a Lie group, while the noise is entirely in the directions of a sub-group. (b) We use Lyapunov functions to control the growth of the vector fields and their derivatives, leading to estimates uniform in \(\epsilon \) and to the conclusion on convergence in the Wasserstein topologies. A key step for the convergence is a law of large numbers, with rates, for sub-elliptic operators (i.e. operators satisfying Hörmander’s sub-elliptic estimates). (c) Instead of working with iterated time averages, we use a solution to Poisson equations to reveal the effective operator. Functionals of the processes \(y_{t\over \epsilon }^\epsilon \) split naturally into the sum of a fast martingale, a finite variation term involving a second order differential operator in Hörmander form, and a term of order \(\epsilon \). From this we obtain the effective diffusion in explicit Hörmander form. It is perhaps also new to have an estimate for the rate of convergence in the Wasserstein distance. Finally, we improve known theorems on the existence of global smooth solutions for SDEs in [22], cf. Theorem 7.2 below, where a criterion is given in terms of a pair of Lyapunov functions. New estimates on the moments of higher order covariant derivatives of the derivative flows are also given.

1.3 Classical theorems

We review briefly basic ideas from the existing literature on random ordinary differential equations with fast oscillating vector fields. Let \(F(x,t,\omega , \epsilon ):=F^{(0)}(x,t,\omega )+\epsilon F^{(1)}(x,t,\omega )\), where the \(F^{(i)}(x,t,\cdot )\) are measurable functions for which a Birkhoff ergodic theorem holds, whose limit is denoted by \(\bar{F}\). The solutions to the equations \(\dot{y}_t^\epsilon = F(y_t^\epsilon ,{t\over \epsilon },\omega , \epsilon )\) over a time interval [0, t] can be approximated by the solution to the averaged equation driven by \(\bar{F}\). If \(\bar{F}^{(0)}=0\), we should observe the solutions in the next time scale and study \(\dot{x}_t^\epsilon ={1\over \epsilon } F(x_t^\epsilon ,{t\over \epsilon ^2},\omega , \epsilon )\). See Stratonovich [42, 43]. Suppose that for some functions \(\bar{a}_{j,k}\) and \(\bar{b}_j\) the following estimates hold uniformly:

$$\begin{aligned}&\left| {1\over \epsilon ^3} \int _s^{s+\epsilon } \int _{s}^{r_1} {\mathbf E}F_j^{(0)}\left( x, {r_2\over \epsilon ^2}\right) F^{(0)}_k\left( x, {r_1\over \epsilon ^2}\right) \, dr_2\; dr_1-\bar{a}_{j,k}(s,x)\right| \le o(\epsilon ),\nonumber \\&\left| {1\over \epsilon ^3} \int _s^{s+\epsilon } \int _{s}^{r_1} \sum _{k=1}^d {\mathbf E}{\partial F_j^{(0)}\over \partial x_k} \left( x, {r_2\over \epsilon ^2}\right) F_k^{(0)} \left( x, {r_1\over \epsilon ^2}\right) \, dr_2\; dr_1\right. \nonumber \\&\quad \left. +{1\over \epsilon } \int _s^{s+\epsilon } {\mathbf E}F_j^{(1)} \left( x, {r\over \epsilon ^2}\right) \,dr-\bar{b}_j(x, s)\right| \nonumber \\&\quad \le o(\epsilon ). \end{aligned}$$
(1.4)

Then, under a ‘strong mixing’ condition with a suitable mixing rate, the solutions of the equations \(\dot{x}_t^\epsilon ={1\over \epsilon } F(x_t^\epsilon ,{t\over \epsilon ^2},\omega , \epsilon )\) converge weakly on any compact interval to a Markov process. This is a theorem of Stratonovich [43] and Khasminskii [14], further refined and explored in Khasminskii [15] and Borodin [3]. These theorems lay the foundation for investigations beyond ordinary differential equations with a fast oscillating right hand side.

In our case, noise comes into the system via an \({\mathcal {L}}_0\)-diffusion satisfying Hörmander’s conditions, and hence we can bypass these assumptions and also obtain convergence in the Wasserstein distances. For manifold valued stochastic processes, some difficulties are caused by the inherited non-linearity. For example, integrating a vector field along a path makes sense only after it has been parallel translated back. The parallel transport of a vector field along a path, from time t to time 0, involves the whole path up to time t and introduces extra difficulties; this is still an unexplored territory inviting further investigation. For the proof of tightness, the non-linearity causes particular difficulty if the Riemannian distance function is not smooth. The advantage of working in a manifold setting is that, for some specific physical models, the noise can be untwisted and becomes easy to deal with.

Our estimates for the rate of convergence, Sects. 8 and 9, can be considered as an extension of those in Kohler and Papanicolaou [32, 36], which were in turn developed from the following sequence of remarkable papers: Cogburn and Hersh [6], Keller and Papanicolaou [35], Hersh and Pinsky [17], Hersh and Papanicolaou [16] and Papanicolaou and Varadhan [34]. See also Kurtz [21] and [33] by Stroock and Varadhan.

The condition \(\bar{F}=0\) need not hold for this type of scaling and convergence. If \(F(x, t, \omega , \epsilon )=F^{(0)} (x, \zeta _t(\omega ))\), where \(\zeta _t\) is a stationary process with values in \({\mathbb {R}}^m\), and \(\bar{F}^{(0)}=X_H\), the Hamiltonian vector field associated to a function \(H\in BC^3( {\mathbb {R}}^2; {\mathbb {R}})\) whose level sets are closed connected curves without intersections, then \(H(y_{t\over \epsilon }^\epsilon )\) converges to a Markov process, under suitable mixing and technical assumptions. See Borodin and Freidlin [4], also Freidlin and Weber [8], where a first integral replaces the Hamiltonian, and Li [25], where the value of a map from a manifold to another is preserved by the unperturbed system.

In Freidlin and Wentzell [9], the following type of central limit theorem is proved: \({1\over \sqrt{\epsilon }} (H(x_s^\epsilon )-H(\bar{x}_s))\) converges to a Markov diffusion. This formulation is not suitable when the conserved quantity takes values in a non-linear space.

For the interested reader, we also refer to the following articles on limit theorems, averaging and homogenization for stochastic equations on manifolds: Enriquez et al. [7], Gargate and Ruffino [11], Ikeda and Ochi [19], Kifer [20], Liao and Wang [27], Manabe and Ochi [29], Ogura [30], Pinsky [37], and Sowers [41].

1.4 Further question

(1) I am grateful to the associate editor for pointing out the paper by Liverani and Olla [28], where a randomly perturbed Hamiltonian system, in the context of weakly interacting particle systems, is studied. Their system is somewhat related to the completely integrable equation studied in [23], leading to a new problem which we now state. Denote by \(X_f\) the Hamiltonian vector field on a symplectic manifold corresponding to a function f. If the symplectic manifold is \({\mathbb {R}}^{2n}\) with the canonical symplectic form, \(X_f\) is the skew gradient of f. Suppose that \(\{H_1, \dots , H_n\}\) is a completely integrable system, i.e. the functions Poisson commute at every point and their Hamiltonian vector fields are linearly independent at almost all points. Following [23] we consider a completely integrable SDE perturbed by a transversal Hamiltonian vector field:

$$\begin{aligned} dy_t^\epsilon = \sum _{i=1}^n X_{H_i}(y_t^\epsilon )\circ dW_t^i+X_{H_0}(y_t^\epsilon )dt+\epsilon X_K(y_t^\epsilon )dt. \end{aligned}$$

Suppose that \(X_{H_0}\) commutes with \(X_{H_k}\) for \(k=1,\dots , n\); then each \(H_i\) is a first integral of the unperturbed system. Then, by [23, Thm 4.1], within the action angle coordinates of a regular value of the energy function \(H=(H_1, \dots , H_n)\), the energies \(\{ H_1(y^\epsilon _{t\over \epsilon ^2}), \dots , H_n(y^\epsilon _{t\over \epsilon ^2})\}\) converge weakly to a Markov process. When restricted to the level sets of the energies, the fast motions are elliptic. It would be desirable to remove the ‘complete integrability’ in favour of Hörmander type conditions. There is a non-standard symplectic form on \(({\mathbb {R}}^4)^N\) with respect to which the vector fields in [28] are Hamiltonian vector fields, and when restricted to level sets of the energies the unperturbed system in [28] satisfies Hörmander’s condition, see [28, section 5]; this therefore provides a motivating example for further studies. Finally, note that the driving vector fields in (1.1) are of a special form; the results here apply neither to the systems in [23] nor to those in [28], and hence it would be interesting to formulate and develop limit theorems for more general random ODEs to include these two cases.

(2) It should be interesting to develop a theory for the ODEs below

$$\begin{aligned} \dot{y}_t^\epsilon (\omega )=\sum _{k=1}^m Y_k(y_t^\epsilon (\omega ))\alpha _k(z_t^\epsilon (\omega ), y_t^\epsilon (\omega )) \end{aligned}$$
(1.5)

where the \(\alpha _k\) depend also on the \(y^\epsilon \) process.

(3) It would be nice to extend the theory to allow the noise to live in a non-compact manifold, in which case \({\mathcal {L}}_0\) should be an Ornstein–Uhlenbeck type operator whose drift term would provide for a deformed volume measure.

Notation. Throughout this paper \({\mathcal {B}}_b(M;N)\), \(C_K^r(M;N)\), and \(BC^r(M;N)\) denote the set of functions from M to N that are respectively bounded measurable, \(C^r\) with compact supports, and bounded \(C^r\) with bounded first r derivatives. If \(N={\mathbb {R}}\) the letter N will be suppressed. Also \({\mathbb L}(V_1;V_2)\) denotes the space of bounded linear maps; \(C^r(\Gamma TM)\) denotes \(C^r\) vector fields on a manifold M.

2 Examples

Let \(\{W_t^k\}\) be independent real valued Brownian motions on a given filtered probability space and let \(\circ \) denote Stratonovich integration. In the following, \(H_0\) and \( A_k\) are smooth vector fields, and the \(A_k\) form an orthonormal basis at each point of the vertical tangent spaces. To be brief, we do not specify the properties of the vector fields; instead we refer the interested reader to [25] for details. For any \(\epsilon >0\), the stochastic differential equations

$$\begin{aligned} du_t^\epsilon =H_0(u_t^\epsilon )dt+{1\over \sqrt{\epsilon }}\sum _{k=1}^{n(n-1)\over 2} A_k(u_t^\epsilon )\circ dW_t^k \end{aligned}$$

are degenerate and interpolate between the geodesic equation (\(\epsilon =\infty \)) and Brownian motions on the fibres (\(\epsilon =0\)). The fast random motion is transmitted to the horizontal direction by the action of the Lie bracket \([H_0, A_k]\). If \( H_0=0\), there is a conserved quantity for the system, the projection from the orthonormal frame bundle to the base manifold. This allows us to separate the slow variable \((y_t^\epsilon )\) and the fast variable \((z_t^\epsilon )\). The reduced equation for \((y_t^\epsilon )\), once suitable ‘coordinate maps’ are chosen, can be written in the form of (1.1). In [25] we proved that \((y^\epsilon _{t\over \epsilon })\) converges weakly to a rescaled horizontal Brownian motion. Recently Angst et al. gave this a beautiful treatment [1], using rough path analysis. By the theorems in this article, the above model can be generalised to include random perturbation by hypoelliptic diffusions, i.e. \(\{A_1, \dots , A_k\}\) need only generate all vertical directions. In [25] we did not know how to obtain a rate for the convergence; Theorem 8.2 of this article applies, and indeed we obtain an upper bound for the rate of convergence.

As a second example, we consider, on the special orthogonal group SO(n), the following equations:

$$\begin{aligned} dg_t^\epsilon ={1\over \sqrt{\epsilon }}\sum _{k=1}^{(n-1)(n-2)\over 2} g_t^\epsilon E_{k}\circ dW_t^k + g_t^\epsilon Y_0dt, \end{aligned}$$
(2.1)

where \(\{E_k\}\) is an orthonormal basis of \({\mathfrak {so}}(n-1)\), as a subspace of \({\mathfrak {so}}(n)\), and \(Y_0\) is a skew symmetric matrix orthogonal to \({\mathfrak {so}}(n-1)\). The above equation is closely related to the following set of equations:

$$\begin{aligned} dg_t=\gamma \sum _{k=1}^{(n-1)(n-2)\over 2} g_t E_{k}\circ dW_t^k +\delta g_t Y_0dt, \end{aligned}$$

where \(\gamma , \delta \) are two positive numbers. If \(\delta =0\) and \(\gamma =1\), the solutions are Brownian motions on \(SO(n-1)\). If \(\delta ={1\over |Y_0|}\) and \(\gamma =0\), the solutions are unit speed geodesics on SO(n). These equations interpolate between a Brownian motion on the subgroup \(SO(n-1)\) and a one-parameter subgroup of SO(n). See [26]. Take \(\delta =1\) and let \(\gamma ={1\over \sqrt{\epsilon }}\rightarrow \infty \): what could be the ‘effective limit’, if it exists? The slow components of the solutions, which we denote by \((u_t^\epsilon )\), satisfy equations of the form (1.1). They are ‘horizontal lifts’ of the projections of the solutions to \(S^n\). If \({\mathfrak {m}}\) is the orthogonal complement of \({\mathfrak {so}}(n-1)\) in \({\mathfrak {so}}(n)\), then \({\mathfrak {m}}\) is \(\mathrm{Ad}_H\)-invariant and \(\mathrm{Ad}_H\)-irreducible; noise is transmitted from \({\mathfrak {h}}\) to every direction in \({\mathfrak {m}}\), and this in a uniform way. It is therefore plausible that \(u_{t\over \epsilon }^\epsilon \) can be approximated by a diffusion \(\bar{u}_t\) of constant rank. The projection of \(\bar{u}_t\) to \(S^n\) is a scaled Brownian motion with scale \(\lambda \). The scale \(\lambda \) is a function of the dimension n, but is independent of \(Y_0\); it is associated to an eigenvalue of the Laplacian on \(SO(n-1)\), indicating the speed of propagation.

As a third example we consider the Hopf fibration \(\pi : S^3\rightarrow S^2\). Let \(\{X_1, X_2, X_3\}\) be the Pauli matrices; they form an orthonormal basis of \({\mathfrak {su}}(2)\) with respect to the canonical bi-invariant Riemannian metric:

$$\begin{aligned} X_1=\left( \begin{matrix} i &{}0\\ 0&{}-i \end{matrix}\right) , \quad X_2=\left( \begin{matrix} 0 &{}1\\ -1&{}0 \end{matrix}\right) , \quad X_3=\left( \begin{matrix} 0 &{}i\\ i&{}0 \end{matrix}\right) . \end{aligned}$$

Denote by \(X^*\) the left invariant vector field generated by \(X\in {\mathfrak {su}}(2)\). By declaring \(\{{1\over \sqrt{\epsilon }}X_1^*, X_2^*, X_3^*\}\) an orthonormal frame, we obtain a family of left invariant Riemannian metrics \(m^\epsilon \) on \(S^3\). The Berger spheres \((S^3, m^\epsilon )\) converge in the measured Gromov–Hausdorff topology to the lower dimensional sphere \(S^2({1\over 2})\). For further discussion see Fukaya [10] and Cheeger and Gromov [5]. Let \(W_t\) be a one dimensional Brownian motion and take Y from \({\mathfrak {m}}:=\langle X_2, X_3\rangle \). The infinitesimal generator of the equation \( dg_t^\epsilon ={1\over \sqrt{\epsilon }} X_1^*(g_t^\epsilon )\circ dW_t+Y^*(g_t^\epsilon ) \;dt\) satisfies the weak Hörmander condition. The ‘slow motions’, suitably scaled, converge to a ‘horizontal’ Brownian motion whose generator is \({1\over 2}c\,\mathrm{trace}_{{\mathfrak {m}}}\nabla d\), where the trace is taken in \({\mathfrak {m}}\). A slightly different, ad hoc, example on the Hopf fibration is discussed in [24]. Analogous equations can be considered on SU(n), where the diffusion coefficients come from a maximal torus.
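The weak Hörmander condition claimed here can be verified directly from the bracket relations of this basis; the following is our own sketch, using the matrices displayed above:

```latex
[X_1, X_2] = 2X_3 , \qquad [X_2, X_3] = 2X_1 , \qquad [X_3, X_1] = 2X_2 .
% For Y = a X_2 + b X_3 \in \mathfrak{m} with (a,b) \ne (0,0),
[X_1, Y] = -2b\, X_2 + 2a\, X_3 ,
% so Y and [X_1, Y] span \mathfrak{m} (the determinant of the coefficient
% matrix is 2a^2 + 2b^2 \ne 0), and together with X_1 they span \mathfrak{su}(2).
% The diffusion vector field X_1^* alone does not bracket-generate, which is
% why only the weak (drift-assisted) H\"ormander condition holds.
```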

Finally we give an example where the noise \((z_t^\epsilon )\) in the reduced equation is not elliptic. Let \(M=SO(4)\), let \(E_{i,j}\) be the elementary \(4\times 4\) matrices, and let \(A_{i,j}={1\over \sqrt{2}} (E_{i,j}-E_{j,i})\). For \(k=1,2\) and 3, we consider the equations

$$\begin{aligned} dg_t^\epsilon ={1\over \sqrt{\epsilon }} A_{1,2}^*(g_t^\epsilon )\circ db_t^1+ {1\over \sqrt{\epsilon }} A_{1,3}^*(g_t^\epsilon )\circ db_t^2+A_{k,4}^*(g_t^\epsilon )dt. \end{aligned}$$

The slow components of the solutions of these equations again satisfy an equation of the form (1.1).

3 Preliminary estimates

Let \({\mathcal {L}}_0\) be a diffusion operator on a manifold G, with transition semigroup \(Q_t\) and transition probabilities \(Q_t(x,\cdot )\). Let \(\Vert \cdot \Vert _{TV}\) denote the total variation norm of a measure, normalized so that the total variation distance between two probability measures is at most 2. By the duality formulation of the total variation norm,

$$\begin{aligned} \Vert \mu \Vert _{TV}=\sup _{|f| \le 1, f\in {\mathcal {B}}_b(G;{\mathbb {R}})} \left| \int _G fd\mu \right| . \end{aligned}$$

For \(W\in {\mathcal {B}}( G; [1,\infty ))\) denote by \(\Vert f\Vert _W\) the weighted supremum norm and by \(\Vert \mu \Vert _{TV,W}\) the weighted total variation norm:

$$\begin{aligned} \Vert f\Vert _W=\sup _{x\in G}{|f(x)|\over W(x)}, \quad \Vert \mu \Vert _{TV, W}= \sup _{ \{ \Vert f\Vert _W\le 1\}} \left| \int _G f d\mu \right| . \end{aligned}$$

Assumption 3.1

There exist an invariant probability measure \(\pi \) for \({\mathcal {L}}_0\), a real valued function \(W\in L^1(G, \pi )\) with \(W\ge 1\), and numbers \(\delta >0\) and \(a>0\) such that

$$\begin{aligned} \sup _{x\in G}{\Vert Q_t(x,\cdot )-\pi \Vert _{TV, W}\over W(x)} \le ae^{-\delta t}. \end{aligned}$$

If G is compact we take \(W\equiv 1\).

In the following lemma we collect some elementary estimates, which will be used to prove Lemmas 3.4 and 3.5; for completeness, their proofs are given in the Appendix. Write \(\bar{W}=\int _G W d\pi \).

Lemma 3.1

Suppose Assumption 3.1 holds. Let \(f,g:G\rightarrow {\mathbb {R}}\) be bounded measurable functions and let \(c_\infty =|f|_\infty \Vert g\Vert _W\). Then the following statements hold for all \(s, t\ge 0\).

  (1)

    Let \((z_t)\) be an \({\mathcal {L}}_0\) diffusion. If \( \int _G gd\pi =0\),

    $$\begin{aligned}&\left| {1\over t-s} \int _s^{t} \int _s^{s_1} \left( {\mathbf E}\left\{ f(z_{s_2}) g(z_{s_1}) \Big | {\mathcal {F}}_s\right\} -\int _G fQ_{s_1-s_2}g d\pi \right) ds_2ds_1\right| \\&\quad \le {a^2c_\infty \over (t-s)\delta ^2}W(z_s). \end{aligned}$$
  (2)

    Let \((z_t)\) be an \({\mathcal {L}}_0\) diffusion. If \(\int _G gd \pi =0\) then

    $$\begin{aligned}&\left| {1\over t-s} \int _s^{t} \int _s^{s_1} {\mathbf E}\left\{ f(z_{s_2}) g(z_{s_1}) \Big | {\mathcal {F}}_s\right\} \; ds_2 \; ds_1 - \int _G\int _0^\infty f Q_rg \; dr \; d\pi \right| \\&\quad \le {c_\infty \over (t-s) \delta ^2} (a^2 W(z_s)+a \bar{W} )+{c_\infty a\over \delta } \bar{W}. \end{aligned}$$
  (3)

    Suppose that either \(\int _G f\; d\pi =0\) or \(\int _G g\; d\pi =0\). Let

    $$\begin{aligned} C_1={a\over \delta ^2} (aW+\bar{W})|f|_\infty \Vert g\Vert _W, \quad C_2={2a\over \delta }|f|_\infty \Vert g\Vert _W\bar{W} +{a \over \delta }|\bar{g}| \; \Vert f\Vert _W W. \end{aligned}$$

    Let \((z_t^\epsilon )\) be an \({\mathcal {L}}_0^\epsilon \) diffusion. Then for every \(\epsilon >0\),

    $$\begin{aligned} \left| \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} {\mathbf E}\left\{ f(z^\epsilon _{s_2}) g(z^\epsilon _{s_1}) \Big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_2\; ds_1 \right| \le C_1(z_{s\over \epsilon }^\epsilon ) \epsilon ^2+C_2(z_{s\over \epsilon }^\epsilon ) (t-s). \end{aligned}$$

To put Assumption 3.1 into context, we consider Hörmander type operators. Let \(L_{X}\) denote Lie differentiation in the direction of a vector field X and \([X,Y]\) the Lie bracket of two vector fields X and Y. Let \(\{X_i, i=0, 1,\dots , m'\}\) be a family of smooth vector fields on a compact smooth manifold G and \({\mathcal {L}}_0={1\over 2}\sum _{i=1}^{m'}L_{X_i}L_{X_i}+L_{X_0}\). If \(\{X_i, i=1, \dots , m'\}\) and their Lie brackets generate the tangent space \(T_xG\) at each point x, we say that the operator \({\mathcal {L}}_0\) satisfies the strong Hörmander condition.

Lemma 3.2

Suppose that \({\mathcal {L}}_0\) satisfies the strong Hörmander condition on a compact manifold G and let \(Q_t(x,\cdot )\) be its family of transition probabilities. Then Assumption 3.1 holds with W identically 1. Furthermore the invariant probability measure \(\pi \) has a strictly positive smooth density w.r.t. the Lebesgue measure and

$$\begin{aligned} \Vert Q_t(x,\cdot )-\pi (\cdot )\Vert _{TV} \le Ce^{-\delta t} \end{aligned}$$

for all x in G and for all \(t>0\).

Proof

By Hörmander’s theorem there are smooth functions \(q_t(x,y)\) such that \(Q_t(x,dy)=q_t(x,y)dy\). Furthermore \(q_t(x,y)\) is strictly positive, see Bony [2] and Sanchez-Calle [39]. Fix \(t>0\) and let \(a=\inf _{x,y\in G}q_t(x,y)>0\). Thus Doeblin’s condition holds: if \(\mathrm{vol}(A)\) denotes the volume of a Borel set A, then \(Q_t(x, A)\ge a \,\mathrm{vol}(A)\). \(\square \)
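The same minorization argument can be watched at work on a finite state space. In the sketch below (our own illustration), a \(3\times 3\) stochastic matrix whose entries are all at least \(a>0\) plays the role of the positive density \(q_t(x,y)\); the coupling argument behind Doeblin's condition then gives \(\sup _x\Vert P^n(x,\cdot )-\pi \Vert _{TV}\le 2(1-3a)^n\) in the normalization used above, which we check numerically.

```python
import numpy as np

# Doeblin on a 3-state chain: every entry of P is >= a, so P(x, .) >= 3a * Unif,
# and coupling forces geometric decay of the TV distance to the invariant pi.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
a = P.min()                       # minorization constant

# Invariant measure: left eigenvector of P for the eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

Pn = np.eye(3)
for n in range(1, 20):
    Pn = Pn @ P
    # sup_x ||P^n(x, .) - pi||_TV in the normalization where TV <= 2
    tv = np.abs(Pn - pi).sum(axis=1).max()
    assert tv <= 2 * (1 - 3 * a) ** n + 1e-12   # Doeblin coupling bound
```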

We say that W is a \(C^3\) Lyapunov function for the ergodicity problem if there are constants \(c\not =0\) and \(C>0\) s.t. \({\mathcal {L}}_0 W\le C-c^2 W\). If such a function exists, the \({\mathcal {L}}_0^\epsilon \) diffusions are conservative. Suppose that the Lyapunov function W satisfies in addition the following condition: there exist a number \(\alpha \in (0,1)\) and \(t_0>0\) s.t. for every \(R>0\),

$$\begin{aligned} \sup _{\{(x,y): W(x)+W(y)\le R\}} \Vert Q_{t_0}(x, \cdot )-Q_{t_0}(y, \cdot )\Vert _{TV} \le 2(1-\alpha ). \end{aligned}$$

Then there exists a unique invariant measure \(\pi \) such that Assumption 3.1 holds, see e.g. Hairer and Mattingly [13]. We mention the following standard estimate, which helps to control the quantities appearing in Lemma 3.1.

Lemma 3.3

Let W be a \(C^3\) Lyapunov function for the ergodicity problem of \({\mathcal {L}}_0\) and suppose that \( {\mathbf E}W(z_0^\epsilon )\) is uniformly bounded in \(\epsilon \) for \(\epsilon \) sufficiently small. Then there exist numbers \(\epsilon _0>0\) and c s.t. for all \(t>0\),

$$\begin{aligned} \sup _{s\le t}\sup _{\epsilon \le \epsilon _0}{\mathbf E}W(z_{s\over \epsilon }^\epsilon ) \le c. \end{aligned}$$

Proof

By localizing \((z_t^\epsilon )\) if necessary, we see that \(W(z_t^\epsilon )-W(z_0^\epsilon )-{1\over \epsilon } \int _0^t {\mathcal {L}}_0 W (z^\epsilon _r) dr\) is a martingale. Let \(c\not =0\) and \(C>0\) be constants s.t. \({\mathcal {L}}_0 W\le C-c^2 W\). By Gronwall’s inequality, \( {\mathbf E}W(z_{t}^\epsilon ) \le {\mathbf E}W(z_0^\epsilon ) e^{-{c^2\over \epsilon } t}+{C\over c^2}\left( 1-e^{-{c^2\over \epsilon } t}\right) \le {\mathbf E}W(z_0^\epsilon )+{C\over c^2}\), and the claim follows on taking \(t={s\over \epsilon }\). \(\square \)

As an application we see that, under the assumption of Lemma 3.3, the functions \(C_i\) in part (3) of Lemma 3.1 satisfy that \(\sup _{\epsilon \le \epsilon _0}{\mathbf E}C_i(z_{s\over \epsilon }^\epsilon ) <\infty \).
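Lemma 3.3 can be tested on the Ornstein–Uhlenbeck generator \({\mathcal {L}}_0=\partial _x^2-x\partial _x\) with \(W(x)=1+x^2\), for which \({\mathcal {L}}_0W=2-2x^2=4-2W\), i.e. \(C=4\) and \(c^2=2\). The sketch below (our own; the exact Gaussian transition sampling is special to this example) estimates \({\mathbf E}W(z_{s/\epsilon }^\epsilon )\) by Monte Carlo and checks the uniform bound \({\mathbf E}W(z_0^\epsilon )+C/c^2\).

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_W(eps, s=1.0, z0=2.0, n_paths=200_000):
    """Estimate E W(z_{s/eps}^eps) for the OU diffusion with generator (1/eps) L_0.

    z^eps_{s/eps} equals z_{s/eps^2} in law, where z is the L_0-diffusion, and
    the OU transition at time t is Gaussian with mean z0*exp(-t) and
    variance 1 - exp(-2t) (the stationary variance is 1)."""
    t = s / eps**2
    m = z0 * np.exp(-t)
    sd = np.sqrt(1.0 - np.exp(-2.0 * t))
    z = m + sd * rng.normal(size=n_paths)
    return 1.0 + (z**2).mean()          # W(x) = 1 + x^2

# The lemma's uniform bound W(z0) + C/c^2 = 5 + 2 = 7; in fact for small eps
# the process is essentially stationary and E W is close to 2.
for eps in [0.5, 0.1, 0.02]:
    assert mean_W(eps) <= 7.0
```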

Definition 3.1

We say that a stochastic differential equation (SDE) on M is complete or conservative if, for each initial point \(y\in M\), any solution with initial value y exists for all \(t\ge 0\). Let \(\Phi _t(x)\) be its solution starting from x. The SDE is strongly complete if it has a unique strong solution and \((t,x)\mapsto \Phi _t(x, \omega )\) is continuous for almost every \(\omega \).

From now on, by a solution we always mean a globally defined solution. For \(\epsilon \in (0,1)\) we define \({\mathcal {L}}_0^\epsilon ={1\over \epsilon } {\mathcal {L}}_0\). Let \(Q_t^\epsilon \) denote their transition semigroups and transition probabilities. For each \(\epsilon >0\), let \((z_t^\epsilon )\) be an \({\mathcal {L}}_0^\epsilon \) diffusion. Let \(\alpha _k\in {\mathcal {B}}_b(G;{\mathbb {R}})\) and \((y_t^\epsilon )\) solutions to the equations

$$\begin{aligned} \dot{y}_t^\epsilon =\sum _{k=1}^m Y_k(y_t^\epsilon )\alpha _k(z_t^\epsilon ). \end{aligned}$$
(3.1)

Let \(\Phi _{s,t}^\epsilon \) be the solution flow to (3.1) with \(\Phi _{s,s}^\epsilon (y)=y\). We denote by \(\bar{g}\) the average of an integrable function \(g: G\rightarrow {\mathbb {R}}\) with respect to \(\pi \). Let

$$\begin{aligned} c_0(a, \delta )={a^2+a\over \delta ^2}+{3 a\over \delta }, \quad c_{W}=c_0(a,\delta )(W+\bar{W}). \end{aligned}$$
(3.2)

Lemma 3.4

Suppose that Assumption 3.1 holds. Let \(f,g \in {\mathcal {B}}_b(G; {\mathbb {R}})\) with \(\bar{g}=0\), and suppose that the \(\alpha _k\) are bounded. Then for any \(F\in C^1(M; {\mathbb {R}})\), \(0\le s \le t\) and \(0<\epsilon <1\),

$$\begin{aligned}&\left| \epsilon \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} {\mathbf E}\left\{ F(y^\epsilon _{s_2}) g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_2 \; ds_1\right| \\&\quad \le 2 \gamma _\epsilon |g|_\infty |f|_\infty ( \epsilon ^2+ (t-s)^2 ). \end{aligned}$$

Here

$$\begin{aligned} \gamma _\epsilon = \left( |F(y_{s\over \epsilon }^\epsilon )| \;c_W ( z_{s\over \epsilon }^\epsilon ) + \sum _{l=1}^m |\alpha _l|_\infty {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left| (L_{Y_l} F)(y^\epsilon _r)\right| c_W (z^\epsilon _r) \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} dr \right) . \end{aligned}$$

Proof

We first expand \(F(y_{s_2}^\epsilon )\) at \({s \over \epsilon }\):

$$\begin{aligned}&\epsilon \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} {\mathbf E}\left\{ F(y^\epsilon _{s_2}) g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_2 \; ds_1\\&\quad =\epsilon F(y^\epsilon _{s\over \epsilon }) \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} {\mathbf E}\left\{ g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_2 \; ds_1\\&\qquad +\sum _{l=1}^m\epsilon \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1}\int _{s\over \epsilon }^{s_2} {\mathbf E}\left\{ (dF)(Y_l(y^\epsilon _{s_3})) \alpha _l(z_{s_3}^\epsilon ) g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_3 \; ds_2 \; ds_1 \end{aligned}$$

By part (3) of Lemma 3.1,

$$\begin{aligned}&\left| \epsilon F(y^\epsilon _{s\over \epsilon }) \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} {\mathbf E}\left\{ g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_2 \; ds_1\right| \\&\quad \le |F(y_{s\over \epsilon }^\epsilon )||f|_\infty |g|_\infty c_W (z^\epsilon _{s\over \epsilon }) (\epsilon ^3+(t-s)\epsilon ). \end{aligned}$$

It remains to estimate the summands in the second term, whose absolute values are bounded as follows:

$$\begin{aligned} { \begin{aligned} A_l:=&\left| \epsilon \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1}\int _{s\over \epsilon }^{s_2} {\mathbf E}\left\{ (dF)(Y_l(y^\epsilon _{s_3})) \alpha _l(z_{s_3}^\epsilon ) g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_3 \; ds_2 \; ds_1\right| \\ =&\left| \epsilon \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ (dF)(Y_l(y^\epsilon _{s_3})) \alpha _l(z_{s_3}^\epsilon ) \int _{s_3}^{t\over \epsilon }\int _{s_2}^{t\over \epsilon } {\mathbf E}\left\{ g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s_3}\right\} \; ds_1 \; ds_2 \Big |{\mathcal {F}}_{s\over \epsilon } \right\} ds_3 \right| . \end{aligned} } \end{aligned}$$

For \(s_3\in [{s\over \epsilon }, {t\over \epsilon }]\), we apply part (3) of Lemma 3.1 to bound the inner iterated integral,

$$\begin{aligned}&\left| \int _{s_3}^{t\over \epsilon }\int _{s_2}^{t\over \epsilon } {\mathbf E}\left\{ g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s_3}\right\} \; ds_1 \; ds_2\right| {=}\left| \int _{s_3}^{t\over \epsilon }\int _{s_3}^{s_1} {\mathbf E}\left\{ g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s_3}\right\} \; ds_2 \; ds_1\right| \\&\quad \le ( \epsilon ^2+ t-\epsilon s_3 ) c_W (z^\epsilon _{s_3}) |f|_\infty |g|_\infty . \end{aligned}$$

We bring this back to the previous line, using the notation \(L_{Y_l}F=dF(Y_l)\):

$$\begin{aligned} A_l\le & {} \epsilon \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left| (dF)(Y_l(y^\epsilon _{s_3})) \alpha _l(z_{s_3}^\epsilon )\right| c_W (z^\epsilon _{s_3}) \Big |{\mathcal {F}}_{s\over \epsilon } \right\} ( \epsilon ^2+ (t-\epsilon s_3) ) |f|_\infty |g|_\infty \; ds_3\\\le & {} |f|_\infty |g|_\infty |\alpha _l|_\infty (t-s) (\epsilon ^2+(t-s)) {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left| (L_{Y_l} F)(y^\epsilon _{s_3})\right| c_W (z^\epsilon _{s_3})\Big |{\mathcal {F}}_{s\over \epsilon } \right\} ds_3. \end{aligned}$$

Putting everything together we see that, for \(\gamma _\epsilon \) given in the statement of the lemma and \(\epsilon <1\),

$$\begin{aligned} \left| \epsilon \int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} {\mathbf E}\left\{ F(y^\epsilon _{s_2}) g(z^\epsilon _{s_2})f(z^\epsilon _{s_1}) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \; ds_2 \; ds_1\right| \le 2\gamma _\epsilon |g|_\infty |f|_\infty ( \epsilon ^2+ (t-s)^2). \end{aligned}$$

The proof is complete. \(\square \)

In Sect. 5 we will estimate \(\gamma _\epsilon \) and give uniform, in \(\epsilon \), moment estimates of functionals of \((y_t^\epsilon )\) on \([0, {T \over \epsilon }]\).

Lemma 3.5

Assume that \((z_t^\epsilon )\) satisfies Assumption 3.1 and \(\alpha _j\) are bounded. If \(F\in C^2(M;{\mathbb {R}})\) and \(f\in {\mathcal {B}}_b(G;{\mathbb {R}})\), then for all \(s\le t\),

$$\begin{aligned}&\left| {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ F (y_r^\epsilon ) f(z_{r}^\epsilon ) \big | {\mathcal {F}}_{s\over \epsilon }\right\} dr -\bar{f}\; F(y_{s\over \epsilon }^\epsilon ) \right| \\&\quad \le {2a\over \delta } |f|_\infty \left( W(z_{s\over \epsilon }^\epsilon )|F(y_{s\over \epsilon }^\epsilon )| +\sum _{j=1}^{m}\gamma ^j_\epsilon |\alpha _j|_\infty \right) \left( {\epsilon ^2\over t-s} +(t-s)\right) \end{aligned}$$

where

$$\begin{aligned} \gamma ^j_{\epsilon }= & {} c_W (z_{s\over \epsilon }^\epsilon ) \;|L_{Y_j}F (y^\epsilon _{s\over \epsilon })|\\&+ \sum _{l=1}^m |\alpha _l|_\infty {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left| L_{Y_l} L_{Y_j}F(y^\epsilon _r)\right| c_W (z^\epsilon _r) \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} dr. \end{aligned}$$

Proof

We note that,

$$\begin{aligned} {\begin{aligned} {\epsilon \over t-s}\int _{s\over \epsilon }^{t\over \epsilon }F (y_r^\epsilon ) f(z_{r}^\epsilon )dr&=F(y_{s\over \epsilon }^\epsilon ) {\epsilon \over t-s}\int _{s\over \epsilon }^{t\over \epsilon } f(z_{r}^\epsilon )dr\\&+ \sum _{j=1}^m {\epsilon \over t-s}\int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} dF (Y_j(y_{s_2}^\epsilon )) \alpha _j (z_{s_2}^\epsilon ) f(z_{s_1}^\epsilon )ds_2ds_1. \end{aligned}} \end{aligned}$$

Letting \(\psi (r)=ae^{-\delta r}\), it is clear that

$$\begin{aligned} {\begin{aligned}&\left| {\mathbf E}\left\{ \left( F(y_{s\over \epsilon }^\epsilon ) {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } f(z_{r}^\epsilon )dr -\bar{f}\;F(y_{s\over \epsilon }^\epsilon ) \right) \big |\;{\mathcal {F}}_{s\over \epsilon }\right\} \right| \\&\quad \le \Vert f\Vert _W W(z^\epsilon _{s\over \epsilon }) \left| F(y_{s\over \epsilon }^\epsilon )\right| {\epsilon ^2\over t-s} \int _{s\over \epsilon ^2}^{t\over \epsilon ^2} \psi \left( r-{s\over \epsilon ^2}\right) dr\\&\quad \le {a\over \delta } \Vert f\Vert _W W(z^\epsilon _{s\over \epsilon }) \left| F(y_{s\over \epsilon }^\epsilon )\right| {\epsilon ^2 \over t-s}. \end{aligned}} \end{aligned}$$

To the second term we apply Lemma 3.4 and obtain the bound

$$\begin{aligned}&\left| {\mathbf E}\left\{ \sum _{j=1}^m {\epsilon \over t-s}\int _{s\over \epsilon }^{t\over \epsilon } \int _{s\over \epsilon }^{s_1} dF (Y_j(y_{s_2}^\epsilon )) \alpha _j (z_{s_2}^\epsilon ) f(z_{s_1}^\epsilon )ds_2ds_1\big |\;{\mathcal {F}}_{s\over \epsilon }\right\} \right| \\&\quad \le 2 \sum _{j=1}^m \gamma _\epsilon ^j |\alpha _j|_\infty |f|_\infty \left( {\epsilon ^2\over t-s} +(t-s)\right) \end{aligned}$$

where

$$\begin{aligned} \gamma _\epsilon ^j= & {} |L_{Y_j}F(y_{s\over \epsilon }^\epsilon )| \;c_W ( z_{s\over \epsilon }^\epsilon )\\&+ \sum _{l=1}^m |\alpha _l|_\infty {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left| (L_{Y_l} L_{Y_j}F)(y^\epsilon _r)\right| c_W (z^\epsilon _r) \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} dr. \end{aligned}$$

Adding the error estimates together we conclude the proof. \(\square \)

It is worth noticing that if \(\phi :{\mathbb {R}}\rightarrow {\mathbb {R}}\) is an increasing concave function, then \(\phi (W)\) is again a Lyapunov function. Thus, by using \(\log W\) if necessary, we may assume uniform bounds on \({\mathbf E}W^p(z_{s\over \epsilon }^\epsilon ) \), and further estimates on the conditional expectations in the error terms follow from the Cauchy–Schwarz inequality. If G is compact then \(c_W \) is bounded. In Corollary 5.3, we will give uniform estimates on moments of \(\gamma ^j_\epsilon \).

4 A reduction

Let G be a smooth manifold of dimension n with volume measure dx. Let \(H^s\equiv H^s(G)\) denote the Sobolev space of real valued functions over G and \(\Vert \cdot \Vert _s\) the Sobolev norm. The norm \((\Vert u\Vert _s)^2 :=(2\pi )^{-n}\int |\hat{u}(\xi )|^2 (1+|\xi |^2)^s d\xi \) extends from domains in \({\mathbb {R}}^n\) to compact manifolds, e.g. by taking the supremum of \(\Vert u\Vert _s\) over charts. If \(s\in {\mathbb {N}}\), \(H^s\) is the completion of \(C^\infty (G)\) with respect to the norm \(\Vert u\Vert _{s} =\left( \sum _{j=0}^s \int |\nabla ^j u|^2 dx \right) ^{1\over 2}\), where \(\nabla \) is usually taken to be the Levi-Civita connection; when the manifold is compact this is independent of the Riemannian metric. Moreover, \(u\in H^s\) if and only if for any function \(\phi \in C_K^\infty \), \(\phi u\) in any chart belongs to \(H^s\).

Let \(\{X_i, i=0, 1,\dots , m'\}\) be a family of smooth vector fields on G and let us consider the Hörmander form operator \({\mathcal {L}}_0={1\over 2}\sum _{i=1}^{m'}L_{X_i}L_{X_i}+L_{X_0}\). Let

$$\begin{aligned} \Lambda := \{X_{i_1}, [X_{i_1}, X_{i_2}], [X_{i_1}, [X_{i_2},X_{i_3}]], \dots \; : \; i_j =0,1, \dots , m'\}. \end{aligned}$$

If the vector fields in \(\Lambda \) generate \(T_xG\) at each \(x\in G\), we say that Hörmander’s condition is satisfied. By the proof of a theorem of Hörmander [18, Theorem 1.1], if \({\mathcal {L}}_0\) satisfies Hörmander’s condition then any distribution u is a \(C^\infty \) function on every open set on which \({\mathcal {L}}_0 u\) is a \(C^\infty \) function. Moreover, there is a number \(\delta >0\) giving a \(\delta \) improvement in Sobolev regularity: if u is a distribution such that \({\mathcal {L}}_0 u\in H^s_{\mathrm{loc}}\), then \(u\in H^{s+\delta }_{\mathrm{loc}}\).

Suppose that G is compact. Then \(\Vert u\Vert _\delta \le C(\Vert u\Vert _{L^2}+\Vert {\mathcal {L}}_0 u\Vert _{L^2})\), the resolvents \(({\mathcal {L}}_0+\lambda I)^{-1}\), as operators from \(L^2(G;dx)\) to \(L^2(G;dx)\), are compact, and \({\mathcal {L}}_0\) is Fredholm on \(L^2(dx)\), by which we mean that \({\mathcal {L}}_0\) is a bounded linear operator from \(\mathrm{Dom}({\mathcal {L}}_0)\) to \(L^2(dx)\) with the Fredholm property: its range is closed and of finite co-dimension, and its kernel \(\mathrm{ker}({\mathcal {L}}_0)\) is finite dimensional. The domain of \({\mathcal {L}}_0\) is endowed with the norm \(|u|_{\mathrm{Dom}({\mathcal {L}}_0)}=|u|_{L_2}+|{\mathcal {L}}_0 u|_{L_2}\). Let \({\mathcal {L}}_0^*\) denote the adjoint of \({\mathcal {L}}_0\). Then the kernel N of \({\mathcal {L}}_0^*\) is finite dimensional and its elements are measures on G with smooth densities in \(L^2(dx)\). Denote by \(N^\perp \) the annihilator of N: \(g\in L^2(dx)\) is in \( N^\perp \) if and only if \(\langle g, \pi \rangle =0\) for all \(\pi \in \mathrm{ker}({\mathcal {L}}_0^*)\). Since \({\mathcal {L}}_0\) has closed range, \(N^\perp \) can be identified with the range of \({\mathcal {L}}_0\), and the set of g for which the Poisson equation \({\mathcal {L}}_0 u=g\) is solvable is exactly \(N^\perp \). We denote by \({\mathcal {L}}_0^{-1}g\) a solution. Furthermore \({\mathcal {L}}_0^{-1}g\) is \(C^r\) whenever g is \(C^r\). Denote by \(\hbox {index}({\mathcal {L}}_0)=\mathrm{dim}\,\mathrm{ker}({\mathcal {L}}_0)-\mathrm{dim}\,\mathrm{Coker}({\mathcal {L}}_0)\) the index of the Fredholm operator \({\mathcal {L}}_0\), where \(\mathrm{Coker}({\mathcal {L}}_0)=L^2(dx)/\mathrm{range}({\mathcal {L}}_0)\). If \({\mathcal {L}}_0\) is self-adjoint, \(\hbox {index} ({\mathcal {L}}_0)=0\).
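The solvability criterion — \({\mathcal {L}}_0u=g\) has a solution exactly when \(g\in N^\perp \) — is transparent in the simplest compact example. The sketch below (our own illustration) takes \(G=S^1\) and \({\mathcal {L}}_0=d^2/d\theta ^2\), whose invariant measure is the uniform measure, so \(N^\perp \) is the space of mean-zero functions; \({\mathcal {L}}_0^{-1}g\) is then computed mode by mode with the FFT, the zeroth mode being the obstruction.

```python
import numpy as np

# Poisson equation L_0 u = g on the circle for L_0 = d^2/dtheta^2.
# The mode e^{ik theta} is mapped to -k^2 e^{ik theta}, so every nonzero
# mode is invertible and solvability requires the zeroth mode of g to vanish.
n = 256
theta = 2 * np.pi * np.arange(n) / n
g = np.cos(theta) + 0.5 * np.sin(3 * theta)     # mean-zero, so g is in N^perp

g_hat = np.fft.fft(g)
k = np.fft.fftfreq(n, d=1.0 / n)                # integer frequencies
u_hat = np.zeros_like(g_hat)
nz = k != 0
u_hat[nz] = g_hat[nz] / (-k[nz]**2)             # invert -k^2 mode by mode
u = np.real(np.fft.ifft(u_hat))                 # one choice of L_0^{-1} g

# Check: the second derivative of u recovers g.
lap_u = np.real(np.fft.ifft(-(k**2) * np.fft.fft(u)))
assert np.max(np.abs(lap_u - g)) < 1e-10

# Adding a constant pushes g out of N^perp: the zeroth Fourier mode of g + 1
# is nonzero and cannot be inverted, so L_0 u = g + 1 has no solution.
```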

Definition 4.1

We say that \({\mathcal {L}}_0\) is a regularity improving Fredholm operator, if it is a Fredholm operator and \({\mathcal {L}}_0^{-1}\alpha \) is \(C^r\) whenever \(\alpha \in C^r \cap N^\perp \).

Let \(\{W_t^k, k=1,\dots , m'\}\) be a family of independent real valued Brownian motions. We may and will often represent \({\mathcal {L}}_0^\epsilon \)-diffusions \((z_t^\epsilon )\) as solutions to the following stochastic differential equations, in Stratonovich form,

$$\begin{aligned} dz_t^\epsilon ={1\over \sqrt{\epsilon }}\sum _{k=1}^{m'} X_k(z_t^\epsilon ) \circ dW_t^k +{1\over \epsilon } X_0(z_t^\epsilon )dt. \end{aligned}$$

Lemma 4.1

Let \({\mathcal {L}}_0\) be a regularity improving Fredholm operator on a compact manifold G, \(\alpha _k\in C^3\cap N^\perp \), and \(\beta _j={\mathcal {L}}_0^{-1}\alpha _j\). Let \((y_r^\epsilon )\) be global solutions of (3.1) on M. Then for all \(0\le s<t\), \(\epsilon >0\) and \(f\in C^2(M; {\mathbb {R}})\),

$$\begin{aligned} f(y^\epsilon _{t\over \epsilon })= & {} f(y^\epsilon _{s\over \epsilon }) + \epsilon \sum _{j=1}^m ( df(Y_j(y^\epsilon _{t\over \epsilon } ) )\beta _j(z^\epsilon _{t\over \epsilon }) -df(Y_j(y^\epsilon _{s\over \epsilon } ))\beta _j ( z^\epsilon _{s\over \epsilon }))\nonumber \\&-\epsilon \sum _{i,j=1}^m\int _{s\over \epsilon }^{t\over \epsilon } L_{Y_i}L_{Y_j} f(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \;dr \\&- \sqrt{\epsilon }\sum _{j=1}^m\sum _{k=1}^{m'} \int _{s\over \epsilon }^{t\over \epsilon } df( Y_j(y^\epsilon _r)) \; d\beta _j( X_k(z^\epsilon _r)) \;dW_r^k.\nonumber \end{aligned}$$
(4.1)

Suppose that, furthermore, for each \(\epsilon >0\), \(j=1,\dots , m\) and \(k=1,\dots , m'\), \(\int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}|df( Y_j(y^\epsilon _r))|^2 |d\beta _j( X_k(z^\epsilon _r))|^2 \;dr\) is finite. Then,

$$\begin{aligned}&{\mathbf E}\left\{ f(y^\epsilon _{t\over \epsilon })\big | {\mathcal {F}}_{s\over \epsilon }\right\} -f(y^\epsilon _{s\over \epsilon })\nonumber \\&\quad = \epsilon \sum _{j=1}^m ( {\mathbf E}\left\{ df(Y_j(y^\epsilon _{t\over \epsilon } ) ) \beta _j(z^\epsilon _{t\over \epsilon })\big | {\mathcal {F}}_{s\over \epsilon }\right\} -df(Y_j(y^\epsilon _{s\over \epsilon } ))\beta _j ( z^\epsilon _{s\over \epsilon }))\nonumber \\&\qquad -\epsilon \sum _{i,j=1}^m\int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ L_{Y_i}L_{Y_j} f(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \;dr. \end{aligned}$$
(4.2)

Proof

Firstly, for any \(C^2\) function \(f:M\rightarrow {\mathbb {R}}\),

$$\begin{aligned} f(y_{t\over \epsilon }^\epsilon )-f (y_{s\over \epsilon }^\epsilon )= \sum _{j=1}^{m}\int _{s\over \epsilon }^{t\over \epsilon } df (Y_j(y_{s_1}^\epsilon )) \alpha _j (z_{s_1}^\epsilon ) ds_1. \end{aligned}$$

Since the \(\alpha _j\)’s are \(C^3\), so are the \(\beta _j\), by the regularity improving property of \({\mathcal {L}}_0\). We apply Itô’s formula to the functions \((L_{Y_j}f)\beta _j: M\times G\rightarrow {\mathbb {R}}\). To avoid extra regularity conditions, we apply Itô’s formula to the function \(df(Y_j)\), which is \(C^1\), and to the \(C^3\) functions \(\beta _j\) separately, and follow with the product rule. This gives:

$$\begin{aligned} df(Y_j(y^\epsilon _{t\over \epsilon } ))\beta _j (z^\epsilon _{t\over \epsilon })= & {} df(Y_j(y^\epsilon _{s\over \epsilon } )) \beta _j\left( z^\epsilon _{s\over \epsilon }\right) \\&+\sum _{i=1}^m \int _{s\over \epsilon }^{t\over \epsilon } L_{Y_i}L_{Y_j} f(y^\epsilon _r)\,\alpha _i(z^\epsilon _r)\;\beta _j( z^\epsilon _r)\; \;dr \\&+ {1\over \sqrt{\epsilon }} \sum _{k=1}^{m'} \int _{s\over \epsilon }^{t\over \epsilon } L_{Y_j} f( y^\epsilon _r) \, d\beta _j( X_k (z^\epsilon _r))dW_r^k\\&+{1\over \epsilon } \int _{s\over \epsilon }^{t\over \epsilon } L_{Y_j} f(y^\epsilon _r) \,{\mathcal {L}}_0 \beta _j(z_r^\epsilon )dr . \end{aligned}$$

Substituting this into the earlier equation, we obtain (4.1). For (4.2), we note that

$$\begin{aligned}&{\mathbf E}\left( \sum _{k=1}^{m'} \int _{s\over \epsilon }^{t\over \epsilon } df( Y_j(y^\epsilon _r)) (d\beta _j) ( X_k(z^\epsilon _r)) \;dW_r^k\right) ^2 \\&\quad \le \sum _{k=1}^{m'}{\mathbf E}\int _{s\over \epsilon }^{t\over \epsilon } |df( Y_j(y^\epsilon _r)) |^2 |d\beta _j (X_k(z_r^\epsilon ))|^2 \; dr<\infty \end{aligned}$$

and the stochastic integrals are \(L^2\)-martingales, so (4.2) follows. \(\square \)

When G is compact, \(d\beta _j(X_k)\) is bounded and the condition becomes: \({\mathbf E}\int _{s\over \epsilon }^{t\over \epsilon } |df( Y_j(y^\epsilon _r))|^2 \; dr\) is finite, which we discuss below. Otherwise, assumptions on moments of \(|d\beta _j(X_k(z_r^\epsilon ))|\) of order slightly larger than 2 are needed.

5 Uniform estimates

If \(V: M\rightarrow {\mathbb {R}}_+\) is a locally bounded function such that \(\lim _{y\rightarrow \infty } V(y)=\infty \), we say that V is a pre-Lyapunov function. Let \(\alpha _k\in {\mathcal {B}}_b(G;{\mathbb {R}})\) and let \(\{Y_k\}\) be \(C^1\) vector fields on M. If either (a) each \(Y_k\) grows at most linearly, or (b) there exist a pre-Lyapunov function \(V\in C^1(M;{\mathbb {R}}_+)\) and positive constants c and K such that \(\sum _{k=1}^m |L_{Y_k} V| \le c+KV\), then the equations (3.1) are complete. In case (a) let \(o\in M\) and a be a constant such that \(|Y_k(x)|\le a(1+\rho (o,x))\), where \(\rho \) denotes the Riemannian distance function on M. For x fixed, denote \(\rho _x=\rho (x, \cdot )\). By the definition of the Riemannian distance function,

$$\begin{aligned} \rho (y_t^\epsilon , y_0)\le \int _0^t |\dot{y}_s^\epsilon |ds =\sum _{k=1}^m \int _0^t |Y_k(y_s^\epsilon ) \alpha _k(z_s^\epsilon ) |ds \le \sum _{k=1}^m |\alpha _k|_\infty \int _0^t |Y_k(y_s^\epsilon )| ds. \end{aligned}$$

This together with the inequality \(\rho (y_t^\epsilon , o)\le \rho (y_t^\epsilon , y_0^\epsilon )+\rho (o, y_0^\epsilon )\) implies that for all \(p\ge 1\) there exist constants \(C_1, C_2\) depending on p such that

$$\begin{aligned} \sup _{s\le t}\rho ^p(y_s^\epsilon , o)\le (C_1 \rho ^p(o, y_0^\epsilon )+C_2 t^p) e^{C_2t} \end{aligned}$$

where \(C_2=a^pC_1^2( \sum _{k=1}^m |\alpha _k|_\infty )^p\). When restricted to \(\{t<\tau ^\epsilon \}\), where \(\tau ^\epsilon \) is the first time \(y_t^\epsilon \) reaches the cut locus, the bound is simply \(Ce^{Ct}\). In case (b), for any \(q\ge 1\),

$$\begin{aligned} \sup _{s\le t} (V(y_s^\epsilon ))^q\le \left( V^q(y_0^\epsilon )+ctq\sum _{k=1}^m|\alpha _k|_\infty \right) \exp {\left( q\sum _{k=1}^m|\alpha _k|_\infty (K+c)t\right) }, \end{aligned}$$

which follows easily from the bound

$$\begin{aligned} |dV^q(\alpha _k Y_k)|=|qV^{q-1}dV(\alpha _kY_k)|\le q|\alpha _k|_\infty (c+(c+K) V^q). \end{aligned}$$
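The case (b) bound can be sanity-checked on a one dimensional example. In the sketch below (our own, with \(m=1\), \(Y(y)=y\), \(\alpha =\cos \) and \(V(y)=1+y^2\), so that \(|L_YV|=2y^2\le 2+2V\) and one may take \(c=K=2\)), the explicit solution of \(\dot{y}=y\cos t\) is compared with the \(q=1\) bound.

```python
import numpy as np

# Sanity check of the case (b) bound with q = 1:
#   sup_{s<=t} V(y_s) <= (V(y_0) + c t |alpha|_inf) exp(|alpha|_inf (K + c) t)
# for y' = y cos(t), V(y) = 1 + y^2, |L_Y V| = 2 y^2 <= 2 + 2 V (c = K = 2).
y0 = 1.5
c, K, a_inf = 2.0, 2.0, 1.0
t_grid = np.linspace(0.0, 3.0, 1001)
y = y0 * np.exp(np.sin(t_grid))          # explicit solution of y' = y cos(t)
V = 1.0 + y**2

running_sup = np.maximum.accumulate(V)   # sup over s <= t of V(y_s)
bound = (1.0 + y0**2 + c * a_inf * t_grid) * np.exp(a_inf * (K + c) * t_grid)
assert np.all(running_sup <= bound + 1e-9)
```

At \(t=0\) the two sides agree, and for \(t>0\) the exponential factor dominates the oscillating solution, as the Gronwall argument predicts.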

For the convenience of comparing the above estimates, which are standard and expected, with the uniform estimates of \((y_t^\epsilon )\) on the time scale \({1\over \epsilon }\) in Theorem 5.2 below, we record them in the following lemma.

Lemma 5.1

Let \(\alpha _k \in {\mathcal {B}}_b(G;{\mathbb {R}})\). Let \(\epsilon \in (0,1)\), \(0\le s\le t\), \(\omega \in \Omega \).

  1.

If the \(Y_k\) grow at most linearly, then (3.1) is complete and there exist constants C and C(t) s.t.

    $$\begin{aligned} \sup _{0\le s\le t}\rho ^p(y_s^\epsilon (\omega ), o)\le (C \rho ^p(o, y_0^\epsilon (\omega ))+C(t)) e^{C(t)}. \end{aligned}$$
  2.

If there exist a pre-Lyapunov function \(V\in C^1(M;{\mathbb {R}}_+)\) and positive constants c and K such that \(\sum _{j=1}^m |L_{Y_j} V| \le c+KV\), then (3.1) is complete.

  3.

If (3.1) is complete and there exists \(V\in C^1(M;{\mathbb {R}}_+)\) such that \(\sum _{j=1}^m |L_{Y_j} V| \le c+KV\) for some positive constants c and K, then for every \(q\ge 1\) there exists a constant C s.t.

    $$\begin{aligned} \sup _{0\le s\le t} (V(y_s^\epsilon (\omega )))^q\le ( ( V(y_0^\epsilon (\omega )))^q+Ct ) e^{Ct}. \end{aligned}$$

If \(V\in {\mathcal {B}}(M;{\mathbb {R}})\) is a positive function, denote by \(B_{V,r}\) the following classes of functions:

$$\begin{aligned} B_{V,r}=\left\{ f\in C^r(M;{\mathbb {R}}): \sum _{j=0}^r |d^{j}f|\le c+c V^q \hbox { for some numbers }c, q \right\} . \end{aligned}$$
(5.1)

In particular, \(B_{V,0}\) is the class of continuous functions bounded by a function of the form \( c+c V^q\). In \({\mathbb {R}}^n\), the constant functions and the function \(V(x)=1+|x|^2\) are potential ‘control’ functions.

Assumption 5.1

Assume that: (i) the equations (3.1) are complete for every \(\epsilon \in (0,1)\); (ii) \(\sup _\epsilon {\mathbf E}\left( V(y_0^\epsilon )\right) ^q\) is finite for every \(q\ge 1\); and (iii) there exist a function \(V\in C^2(M;{\mathbb {R}}_+)\) and positive constants c and K such that

$$\begin{aligned} \sum _{j=1}^m |L_{Y_j} V| \le c+KV, \quad \sum _{i,j=1}^m |L_{Y_i}L_{Y_j} V| \le c+KV. \end{aligned}$$

Below we assume that \({\mathcal {L}}_0\) satisfies Hörmander’s condition. We do not make any assumption on the mixing rate. Let \(\beta _j={\mathcal {L}}_0^{-1}\alpha _j\), \(a_1= \sum _{j=1}^m |\beta _j|_\infty \), \(a_2=\sum _{i,j=1}^m |\alpha _i|_\infty |\beta _j|_\infty \), \(a_3=\sum _{j=1}^m |d\beta _j|_\infty \), and \(a_4=\sum _{k=1}^{m'} |X_k|^2_\infty \). We recall that if \(\alpha _k\) and \({\mathcal {L}}_0\) satisfy Assumption 6.1 then \({\mathcal {L}}_0\) is a regularity improving Fredholm operator.

Theorem 5.2

Let \({\mathcal {L}}_0\) be a regularity improving Fredholm operator on a compact manifold G, and \(\alpha _k\in C^3(G;{\mathbb {R}})\cap N^\perp \). Assume that \(Y_k\) satisfy Assumption 5.1. Then for all \(p\ge 1\), there exists a constant \(C=C(c, K, a_i,p)\) s.t. for any \(0\le s\le t\) and all \(\epsilon \le \epsilon _0\),

$$\begin{aligned} {\mathbf E}\left\{ \sup _{s\le u\le t} ( V(y^\epsilon _{u\over \epsilon }) )^{2p} \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \le ( 4 V^{2p}(y^\epsilon _{s\over \epsilon }) +C(t-s)^2+C ) e^{ C(t-s+1)t}. \end{aligned}$$
(5.2)

Here \(\epsilon _0\le 1\) depends on \(c, K, p, a_1\), and on V and the vector fields \(Y_j\).

Proof

Let \(0\le s\le t\). We apply (4.1) to \(f=V^p\):

$$\begin{aligned} V^p(y^\epsilon _{t\over \epsilon })= & {} V^p(y^\epsilon _{s\over \epsilon }) + \epsilon \sum _{j=1}^m dV^p(Y_j(y^\epsilon _{t\over \epsilon } ) )\beta _j(z^\epsilon _{t\over \epsilon }) -\epsilon \sum _{j=1}^m dV^p(Y_j(y^\epsilon _{s\over \epsilon } ))\beta _j( z^\epsilon _{s\over \epsilon })\\&-\epsilon \sum _{i,j=1}^m \int _{s\over \epsilon }^{t\over \epsilon } L_{Y_i}L_{Y_j} V^p(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \; dr\\&-\sqrt{\epsilon }\sum _{k=1}^{m'} \int _{s\over \epsilon }^{t\over \epsilon } \sum _{j=1}^m dV^p (Y_j(y_r^\epsilon ))(d\beta _j) (X_k(z_r^\epsilon )) \;dW_r^k. \end{aligned}$$

In the following estimates, we may first assume that \(\sum _{j=1}^m |L_{Y_j}V|\) and \(\sum _{i,j=1}^m |L_{Y_i}L_{Y_j}V|\) are bounded, and then replace t by \(t\wedge \tau _n\), where \(\tau _n\) is the first time that either quantity is greater than or equal to n. We take this point of view in the proofs of inequalities and may not repeat it each time.

We take the supremum over \([s,t]\) followed by the conditional expectation on both sides of the inequality:

$$\begin{aligned}&{\mathbf E}\left\{ \sup _{s\le u\le t} V^p(y^\epsilon _{u\over \epsilon }) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \le V^p(y^\epsilon _{s\over \epsilon }) + \epsilon {\mathbf E}\left\{ \sup _{s\le u\le t} \sum _{j=1}^m dV^p(Y_j(y^\epsilon _{u\over \epsilon } )) \beta _j(z^\epsilon _{u\over \epsilon }) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\qquad - \epsilon \sum _{j=1}^m dV^p(Y_j(y^\epsilon _{s\over \epsilon } ))\beta _j( z^\epsilon _{s\over \epsilon })\\&\qquad +\,\epsilon {\mathbf E}\left\{ \sup _{s\le u\le t} \left| \int _{s\over \epsilon }^{u\over \epsilon }\sum _{i,j=1}^mL_{Y_i}L_{Y_j} V^p\left( y^\epsilon _r\right) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \;dr \right| \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\qquad +\,\sqrt{\epsilon }{\mathbf E}\left\{ \sup _{s\le u\le t} \left| \sum _{k=1}^{m'} \int _{s\over \epsilon }^{u\over \epsilon } \sum _{j=1}^m dV^p (Y_j(y_r^\epsilon ))(d\beta _j) (X_k(z_r^\epsilon )) dW_r^k\right| \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} . \end{aligned}$$

By the conditional Jensen inequality and the conditional Doob’s inequality, there exists a universal constant \(\tilde{C}\) depending only on p s.t.,

$$\begin{aligned}&{\mathbf E}\left\{ \sup _{s\le u\le t} V^{2p}(y^\epsilon _{u\over \epsilon }) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\quad \le 4 V^{2p}(y^\epsilon _{s\over \epsilon }) + 4\epsilon ^2 {\mathbf E}\left\{ \left( \sum _{j=1}^m |\beta _j|_\infty \sup _{s\le u \le t}| dV^p(Y_j(y^\epsilon _{u\over \epsilon } ))| \right) ^2 \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\qquad +\, 4\epsilon ^2 \left( \sum _{j=1}^m |\beta _j|_\infty | dV^p(Y_j(y^\epsilon _{s\over \epsilon } ))| \right) ^2 \\&\qquad +8\epsilon (t-s) {\mathbf E}\left\{ \left( \int _{s\over \epsilon }^{t\over \epsilon } \sum _{i,j=1}^m |\alpha _i|_\infty |\beta _j|_\infty \left| L_{Y_i}L_{Y_j} V^p\left( y^\epsilon _r\right) \right| \;dr\right) ^2 \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\qquad +\tilde{C} \sum _{k=1}^{m'}{\mathbf E}\left\{ \epsilon \int _{s\over \epsilon }^{t\over \epsilon } \left| \sum _{j=1}^m dV^p( Y_j(y^\epsilon _r)) (d\beta _j) \left( X_k(z^\epsilon _r)\right) \right| ^2 \; dr \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} . \end{aligned}$$

Since \(\sum _j|L_{Y_j}V|\le c+KV\) and \(\sum _{i,j=1}^m |L_{Y_i}L_{Y_j}V|\le c+KV\), there are constants \(c_1\) and \(K_1\) such that \(\max _{j=1, \dots , m}|L_{Y_j}V^p |\le c_1+K_1V^p\) and \(\max _{i,j=1, \dots , m} |L_{Y_i}L_{Y_j}V^p|\le c_1+K_1V^p\). This leads to the following estimate:

$$\begin{aligned}&{\mathbf E}\left\{ \sup _{s\le u\le t} V^{2p}(y^\epsilon _{u\over \epsilon }) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\quad \le 4 V^{2p}(y^\epsilon _{s\over \epsilon }) + 8\epsilon ^2(a_1)^2 \left( 2(c_1)^2+ (K_1)^2 {\mathbf E}\left\{ \sup _{s\le u \le t} V^{2p}(y^\epsilon _{u\over \epsilon } ) \; \big | \; {\mathcal {F}}_{s\over \epsilon }\right\} +(K_1)^2 V^{2p}(y^\epsilon _{s\over \epsilon } )\right) \\&\qquad +16 (a_2)^2(t-s) \epsilon \int _{s\over \epsilon }^{t\over \epsilon } \left( (c_1)^2+ (K_1)^2{\mathbf E}\left\{ V^{2p}(y^\epsilon _r ) \; \big | \; {\mathcal {F}}_{s\over \epsilon }\right\} \right) dr \\&\qquad + \tilde{C} (a_3a_4)^2 \epsilon \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left( c_1+K_1 V^p((y^\epsilon _r))\right) ^2 \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \; dr. \end{aligned}$$

Let \(\epsilon _0=\min \{{1\over 8a_1K_1}, 1\}\). For \(\epsilon \le \epsilon _0\),

$$\begin{aligned}&{1\over 2} {\mathbf E}\left\{ \sup _{s\le u\le t} V^{2p}(y^\epsilon _{u\over \epsilon }) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \\&\quad \le 4 V^{2p}(y^\epsilon _{s\over \epsilon }) +16\epsilon ^2 ( a_1c_1)^2 +16(t-s)^2 (a_2c_1)^2+ 4\tilde{C} (a_3a_4c_1)^2(t-s)\\&\qquad + ( 16 (a_2K_1)^2(t-s) +4\tilde{C} (a_3a_4K_1)^2 ) \epsilon \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ V^{2p}(y^\epsilon _r ) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \;dr. \end{aligned}$$

It follows that there exists a constant C such that for \(\epsilon \le \epsilon _0\),

$$\begin{aligned} {\mathbf E}\left\{ \sup _{s\le u\le t} V^{2p}(y^\epsilon _{u\over \epsilon }) \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \le ( 4 V^{2p}(y^\epsilon _{s\over \epsilon }) +C(t-s)^2+C ) e^{ C(t-s+1)t}. \end{aligned}$$

\(\square \)

Remark

If \(M={\mathbb {R}}^n\) and the \(Y_i\) are vector fields with bounded first order derivatives, then \(\rho _0^2\) is a pre-Lyapunov function satisfying the conditions of Theorem 5.2, so the conclusion of Theorem 5.2 holds.

We now return to Lemma 3.5 of Sect. 3 to obtain a key estimate, which will be used in Sect. 8. Let us recall that \(B_{V,r}\) is defined in (5.1).

Corollary 5.3

Assume that (3.1) is complete for every \(\epsilon \in (0,1)\) and that the conditions of Assumption 3.1 hold. Let \(V\in {\mathcal {B}}(M;{\mathbb {R}}_+)\) be a locally bounded function and \(\epsilon _0\) a positive number such that for all \(q\ge 1\) and \(T>0\) there exist a locally bounded function \(C_q: {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+\) and a real valued polynomial \(\lambda _q\) such that for \(0\le s\le t\le T\) and for all \(\epsilon \le \epsilon _0\),

$$\begin{aligned} \sup _{s\le u \le t} {\mathbf E}\left\{ V^q(y_{u\over \epsilon }^\epsilon ) \; \big | {\mathcal {F}}_{s\over \epsilon } \right\} \le C_q(t)+C_q(t) \lambda _q( V(y_{s\over \epsilon }^\epsilon )), \quad \sup _{0<\epsilon \le \epsilon _0}{\mathbf E}(V^q(y_0^\epsilon ))<\infty . \end{aligned}$$
(5.3)

Let \(h\in {\mathcal {B}}_b( G; {\mathbb {R}})\). If \(f\in B_{V,0}\) is a function s.t. \(L_{Y_j}f\in B_{V,0}\) and \( L_{Y_l}L_{Y_j}f\in B_{V,0}\) for all \(j, l=1, \dots , m\), then for all \(0\le s\le t\),

$$\begin{aligned} \left| {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ f (y_r^\epsilon ) h(z_{r}^\epsilon ) \big | {\mathcal {F}}_{s\over \epsilon }\right\} dr -\bar{h}\; f(y_{s\over \epsilon }^\epsilon ) \right| \le \tilde{c} |h|_\infty \gamma _\epsilon (y_{s\over \epsilon }^\epsilon ) \left( {\epsilon ^2\over t-s}+(t-s)\right) . \end{aligned}$$

Here \(\tilde{c}\) is a constant, given in (5.4) below, and

$$\begin{aligned} \gamma _\epsilon =|f| + \sum _{j=1}^m |L_{Y_j}f|+\sum _{j,l=1}^m {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ |L_{Y_l} L_{Y_j}f(y^\epsilon _r) | \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} dr. \end{aligned}$$

For all \(s<t\) and \(p\ge 1\),

$$\begin{aligned} \sup _{s\le u\le t }\sup _{\epsilon \le \epsilon _0} {\mathbf E}( \gamma _\epsilon (y_{u\over \epsilon }^\epsilon ))^p<\infty . \end{aligned}$$

More explicitly, if \(\sum _{j=1}^m\sum _{l=1}^m|L_{Y_l} L_{Y_j}f|\le K+KV^q\), where K and q are constants, then there exists a constant C(t), depending only on the differential equation (3.1), such that

$$\begin{aligned} \gamma _\epsilon \le |f|+ \sum _{j=1}^m |L_{Y_j}f|+K+C(t)V^q. \end{aligned}$$

Proof

By Lemma 3.5,

$$\begin{aligned}&\left| {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ f (y_r^\epsilon ) h(z_{r}^\epsilon ) \big | {\mathcal {F}}_{s\over \epsilon }\right\} dr -\bar{h}\; f(y_{s\over \epsilon }^\epsilon ) \right| \\&\quad \le {2a\over \delta } |h|_\infty \left( W(z_{s\over \epsilon }^\epsilon )\left| f(y_{s\over \epsilon }^\epsilon )\right| +\sum _{j=1}^{m}\gamma ^j_\epsilon |\alpha _j|_\infty \right) \left( {\epsilon ^2\over t-s} +(t-s)\right) , \end{aligned}$$

where

$$\begin{aligned} \gamma ^j_{\epsilon }&= c_W (z_{s\over \epsilon }^\epsilon ) \;|L_{Y_j}f (y^\epsilon _{s\over \epsilon })|\\&\quad + \sum _{l=1}^m |\alpha _l|_\infty {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \left| L_{Y_l} L_{Y_j}f(y^\epsilon _r)\right| c_W (z^\epsilon _r) \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} dr. \end{aligned}$$

Since W is bounded, so is \(c_W \); indeed \(c_W\) is bounded by \(2c(a,\delta ) |W|_\infty \). Furthermore,

$$\begin{aligned} {\mathbf E}\left\{ |L_{Y_l} L_{Y_j}f(y^\epsilon _r)|c_W (z^\epsilon _r) \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} \le 2c(a,\delta ) |W|_\infty {\mathbf E}\left\{ |L_{Y_l} L_{Y_j}f(y^\epsilon _r)| \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} . \end{aligned}$$

We gather all the constants together:

$$\begin{aligned} \tilde{c}= {2a\over \delta }|W|_\infty +2c(a,\delta )|W|_\infty \sum _{j,l=1}^m |\alpha _j|_\infty +2\left( \sum _{j=1}^m |\alpha _j|_\infty \right) ^2. \end{aligned}$$
(5.4)

It is clear that,

$$\begin{aligned} \left| {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ f (y_r^\epsilon ) h(z_{r}^\epsilon ) \big | {\mathcal {F}}_{s\over \epsilon }\right\} dr -\bar{h}\; f(y_{s\over \epsilon }^\epsilon ) \right| \le \tilde{c}\, \gamma _\epsilon |h|_\infty \left( {\epsilon ^2\over t-s} +(t-s)\right) . \end{aligned}$$

Since f, \(L_{Y_j}f\) and \(L_{Y_l}L_{Y_j}f\) belong to \(B_{V,0}\), by (5.3) the following quantities are finite for all \(p\ge 1\):

$$\begin{aligned} \sup _{\epsilon \le \epsilon _0} \sup _{s\le u \le t} {\mathbf E}| (L_{Y_l}L_{Y_j}f)(y_{u\over \epsilon }^\epsilon )| ^p, \quad \sup _{\epsilon \le \epsilon _0} \sup _{s\le u \le t} {\mathbf E}| L_{Y_j}f(y_{u\over \epsilon }^\epsilon )| ^p, \quad \sup _{\epsilon \le \epsilon _0} \sup _{s\le u \le t} {\mathbf E}| f(y_{u\over \epsilon }^\epsilon )|^p. \end{aligned}$$

Furthermore since \(\sum _{j=1}^m\sum _{l=1}^m|L_{Y_l} L_{Y_j}f|\le K+KV^q\),

$$\begin{aligned} \sum _{j=1}^m\sum _{l=1}^m {\epsilon \over t-s} \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ |L_{Y_l} L_{Y_j}f(y^\epsilon _r)| \; \big |\;{\mathcal {F}}_{s\over \epsilon } \right\} dr \le K+C(t) V^q(y_{s\over \epsilon }^\epsilon ). \end{aligned}$$

Consequently, \(\gamma _\epsilon \le |f|+ \sum _{j=1}^m |L_{Y_j}f|+K+C(t)V^q\), completing the proof. \(\square \)

6 Convergence under Hörmander’s conditions

Below \(\mathrm {inj}(M)\) denotes the injectivity radius of M and \(\rho _y=\rho (y, \cdot )\) is the Riemannian distance function on M from a point y. Let o denote a point in M. The following proposition applies to an operator \({\mathcal {L}}_0\), on a compact manifold, satisfying Hörmander’s condition.

Proposition 6.1

Let M be a manifold with positive injectivity radius and let \(\epsilon _0>0\). Suppose that conditions (1)–(5) below hold, or that conditions (1)–(3), (4’) and (5) hold.

  (1)

    \({\mathcal {L}}_0\) is a regularity improving Fredholm operator on \(L^2(G)\) for a compact manifold G;

  (2)

    \(\{\alpha _k\} \subset C^3\cap N^\perp \);

  (3)

    Suppose that for \(\epsilon \in (0, \epsilon _0)\), (3.1) is complete and \(\sup _{\epsilon \le \epsilon _0}{\mathbf E}\rho (y_0^\epsilon , o)<\infty \);

  (4)

    Suppose that there exists a locally bounded function V s.t. for all \(\epsilon \le \epsilon _0\) and for any \(0\le s\le u\le t\), and for all \(p\ge 1\),

    $$\begin{aligned} {\mathbf E}V^p(y^\epsilon _0) \le c_0, \quad \sup _{s\le u\le t}{\mathbf E}\left\{ ( V(y^\epsilon _{u\over \epsilon }) )^p \; \big | \; {\mathcal {F}}_{s\over \epsilon } \right\} \le K +K V^{p'}(y^\epsilon _{s\over \epsilon }) \end{aligned}$$

    where \(c_0=c_0(p)\), \(K=K(p,t)\), and \( p'=p'(p,t)\) is a natural number; \(K, p'\) are locally bounded in t.

  (4’)

    There exist a function \(V\in C^2(M;{\mathbb {R}}_+)\) and positive constants c and K such that

    $$\begin{aligned} \sum _{j=1}^m |L_{Y_j} V| \le c+KV, \quad \sum _{i,j=1}^m |L_{Y_i}L_{Y_j} V| \le c+KV. \end{aligned}$$
  (5)

    For V in part (4) or in part (4’), suppose that for some number \(\delta >0\),

    $$\begin{aligned} |Y_j|\in B_{V,0}, \quad \sup _{\rho (y,\cdot ) \le \delta }| L_{Y_i}L_{Y_j}\rho _y(\cdot ) |\in B_{V,0}. \end{aligned}$$

Then there exists a distance function \(\tilde{\rho }\) on M that is compatible with the topology of M and there exists a number \(\alpha >0\) such that

$$\begin{aligned} \sup _{\epsilon \le \epsilon _0} {\mathbf E}\sup _{s\not =t} \left( {\tilde{\rho }( y_{s\over \epsilon }^\epsilon , y_{t\over \epsilon }^\epsilon ) \over |t-s|^\alpha }\right) <\infty , \end{aligned}$$

and for any \(T>0\), \(\{ (y_{t\over \epsilon }^\epsilon , t\le T), 0<\epsilon \le 1\}\) is tight.

Proof

By Theorem 5.2, conditions (1–3) and (4’) imply condition (4). Let \({\delta }<\min (1, {1\over 2} \mathrm {inj}(M))\) and let \(f: {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+\) be a smooth concave function such that \(f(r)=r\) when \(r\le {\delta \over 2}\) and \(f(r)=1\) when \(r\ge \delta \). Then \(\tilde{\rho }=f\circ \rho \) is a distance function with \(\tilde{\rho } \le 1\), whose open sets generate the same topology on M as that generated by \(\rho \). Let \(\beta _j\) be a solution to \({\mathcal {L}}_0\beta _j=\alpha _j\). For any \(y_0\in M\), \(|L_{Y_j}\tilde{\rho }^2(y_0,\cdot )|\le 2|Y_j(\cdot )|\). Since \(|Y_j|\in B_{V,0}\), \(\int _0^{t\over \epsilon } {\mathbf E}|L_{Y_j} \tilde{\rho }(y_r^\epsilon )|^2 dr<\infty \). We may apply (4.2) in Lemma 4.1:

$$\begin{aligned}&{\mathbf E}\left\{ \tilde{\rho }^2(y^\epsilon _{s\over \epsilon }, y_{t\over \epsilon }^\epsilon ) \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \\&\quad = \epsilon \sum _{j=1}^m \left( {\mathbf E}\left\{ (L_{Y_j}\tilde{\rho }^2(y^\epsilon _{s\over \epsilon }, y_{t\over \epsilon }^\epsilon )) \;\beta _j(z^\epsilon _{t\over \epsilon })\; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} - (L_{Y_j}\tilde{\rho }^2(y^\epsilon _{s\over \epsilon }, \cdot )) (y_{s\over \epsilon }^\epsilon )\;\beta _j(z^\epsilon _{s\over \epsilon })\right) \\&\qquad -\epsilon \sum _{i,j=1}^m\int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ (L_{Y_i}L_{Y_j} \tilde{\rho }^2( y^\epsilon _{s\over \epsilon }, y^\epsilon _r))\; \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r)\; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \;dr. \end{aligned}$$

In the above equation, differentiation of \(\tilde{\rho }^2 \) is with respect to the second variable. By construction \( \tilde{\rho } \) is bounded by 1 and \(|\nabla \tilde{\rho } |\le |\nabla \rho |\le 1\). Furthermore, since the \(\alpha _j\) are \(C^3\) functions on a compact manifold, the \(\beta _j\) are bounded. For any \(y_0\in M\), \(L_{Y_j} \tilde{\rho } ( y_0, \cdot )=f'(\rho _{y_0} )L_{Y_j} \rho _{y_0} \). Thus

$$\begin{aligned} \left| {\mathbf E}\left\{ (L_{Y_j}\tilde{\rho }^2( y_{s\over \epsilon }^\epsilon , y^\epsilon _{t\over \epsilon })) \;\beta _j(z^\epsilon _{t\over \epsilon }) \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \right| \le |\beta _j|_\infty {\mathbf E}\left\{ \tilde{\rho } (y_{s\over \epsilon }^\epsilon ,{y^\epsilon _{t\over \epsilon }}) |Y_j({y^\epsilon _{t\over \epsilon }})| \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} . \end{aligned}$$

Recall that \(\tilde{\rho }\le 1\) and that there are numbers \(K_1\) and \(p_1\) such that \(|Y_j|\le K_1+K_1V^{p_1}\), so

$$\begin{aligned} {\mathbf E}\left\{ |Y_j({y^\epsilon _{t\over \epsilon }})| \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \le K_1+K_1{\mathbf E}\left\{ V^{p_1}({y^\epsilon _{t\over \epsilon }}) \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \le K_1+K_1K(p_1,t) V^{p'(p_1,t)}({y^\epsilon _{s\over \epsilon }}). \end{aligned}$$

Let \(g_1 =K_1+K_1K(p_1,t) V^{p'(p_1,t)}\); it is clear that \(g_1\in B_{V,0}\). We remark that,

$$\begin{aligned} L_{Y_i}L_{Y_j}(\tilde{\rho }^2) =(f^2)''(\rho ) (L_{Y_i}\rho ) (L_{Y_j}\rho ) + (f^2)'(\rho ) L_{Y_i}L_{Y_j}\rho . \end{aligned}$$

By the assumption, there exists a function \(g_2\in B_{V,0}\) s.t.

$$\begin{aligned} {\mathbf E}\left\{ \tilde{\rho }^2 (y^\epsilon _{s\over \epsilon }, y_{t\over \epsilon }^\epsilon ) \big | {\mathcal {F}}_{s\over \epsilon }\right\} \le g_2(y^\epsilon _{s\over \epsilon })\epsilon +g_2(y^\epsilon _{s\over \epsilon })(t-s). \end{aligned}$$

For \(\epsilon \ge \sqrt{t-s}\), it is better to estimate directly from (3.1):

$$\begin{aligned} {\mathbf E}\left\{ \tilde{\rho }^2 (y^\epsilon _{s\over \epsilon }, y_{t\over \epsilon }^\epsilon ) \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\}&= \sum _{k=1}^m\int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ 2\tilde{\rho } (y^\epsilon _{s\over \epsilon }, y_r^\epsilon ) L_{Y_k}\tilde{\rho } (y^\epsilon _{s\over \epsilon }, y_r^\epsilon ) \alpha _k(z_r^\epsilon ) \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \, dr\\&\le 2\max _k|\alpha _k|_\infty \sum _{k=1}^m \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ |Y_k(y_r^\epsilon )| \;\big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \;dr \le g_3(y^\epsilon _{s\over \epsilon }) \left( {t-s \over \epsilon }\right) \end{aligned}$$

where \(g_3\in B_{V,0}\). We interpolate these two estimates and conclude that for some function \(g_4\in B_{V,0}\) the following holds: \({\mathbf E}\{ \tilde{\rho }^2(y_{t\over \epsilon }^{\epsilon }, y_{s\over \epsilon }^\epsilon )\;\big |\; {\mathcal {F}}_{s\over \epsilon }\} \le (t-s)g_4(y^\epsilon _{s\over \epsilon })\). There are then a function \(g_5\in B_{V,0}\) and a constant c such that

$$\begin{aligned} {\mathbf E}\tilde{\rho }^2(y_{t\over \epsilon }^{\epsilon }, y_{s\over \epsilon }^\epsilon ) \le {\mathbf E}g_5(y_0^\epsilon ) (t-s)\le c(t-s). \end{aligned}$$

In the last step we used condition (4) on the initial value. By Kolmogorov’s criterion, there exists \(\alpha >0\) such that

$$\begin{aligned} \sup _\epsilon {\mathbf E}\sup _{s\not =t} \left( {\tilde{\rho }^2 ( y_{s\over \epsilon }^\epsilon , y_{t\over \epsilon }^\epsilon ) \over |t-s|^\alpha }\right) <\infty , \end{aligned}$$

and the processes \((y_{s\over \epsilon }^\epsilon )\) are Hölder equi-continuous on any compact time interval. Consequently the family of stochastic processes \(\{ y_{t\over \epsilon }^\epsilon , 0<\epsilon \le 1 \}\) is tight. \(\square \)

If \({\mathcal {L}}_0\) is the Laplace–Beltrami operator on a compact Riemannian manifold and \(\pi \) its invariant probability measure, then for any Lipschitz continuous function \(f:G\rightarrow {\mathbb {R}}\),

$$\begin{aligned} \sqrt{{\mathbf E}\left( {1\over t} \int _0^t f(z_s) ds-\int f \; d\pi \right) ^2} \le C(\Vert f\Vert _{Osc}){1\over \sqrt{t}}. \end{aligned}$$
(6.1)

where \(\Vert f\Vert _{Osc}\) denotes the oscillation of f. If \({\mathcal {L}}_0\) is not elliptic, we suppose that it satisfies Hörmander’s condition and has Fredholm index 0; the dimension of the kernel of \({\mathcal {L}}_0^*\) then equals the dimension of the kernel of \({\mathcal {L}}_0\). Let \(\{u_i, i=1, \dots , n_0\}\) be a basis of \(\mathrm{ker}( {\mathcal {L}}_0)\) and \(\{\pi _i, i=1, \dots , n_0\}\) the dual basis for the null space of \({\mathcal {L}}_0^*\). For \(f\in L^2(G;{\mathbb {R}})\) we define \(\bar{f}= \sum _{i=1}^{n_0} u_i\langle f, \pi _i\rangle \), where the bracket denotes the dual pairing between \(L^2\) and \((L^2)^*\).
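The projection \(f\mapsto \bar f\) has a transparent finite dimensional analogue (an illustration only, not part of the construction above): for an irreducible Markov chain generator Q on n states, \(\mathrm{ker}(Q)\) is spanned by the constant vector \(\mathbf {1}\) and \(\mathrm{ker}(Q^T)\) by the stationary distribution \(\pi \), so \(\bar f\) is the constant vector \((\pi \cdot f)\mathbf {1}\) and \(f-\bar f\) pairs to zero with \(\pi \).

```python
import numpy as np

# A 3-state chain generator: rows sum to zero, off-diagonals nonnegative.
Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -3.0, 2.0],
              [ 2.0, 2.0, -4.0]])

# pi spans the null space of Q^T; take the last right singular vector of Q^T
# and normalise it into a probability vector.
_, _, Vt = np.linalg.svd(Q.T)
pi = Vt[-1]
pi = pi / pi.sum()

f = np.array([1.0, -2.0, 5.0])
f_bar = (pi @ f) * np.ones(3)      # the 'average' bar(f) = <f, pi> * u
print(pi, f_bar)
```

Here `Q` and `f` are arbitrary illustrative choices; note that `f_bar` lies in the kernel of `Q`, mirroring \(\bar f\in \mathrm{ker}({\mathcal {L}}_0)\).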

Lemma 6.2

Suppose that \((z_t)\) is a Markov process on a compact manifold G with generator \( {\mathcal {L}}_0\) satisfying Hörmander’s condition and having Fredholm index 0. Then for any function \(f\in C^r(G; {\mathbb {R}})\), where \(r\ge \max {\{3, {n\over 2}+1\}}\), there is a constant C, depending on \(\Vert f-\bar{f}\Vert _{{n\over 2}+1}\), such that

$$\begin{aligned} \sqrt{{\mathbf E}\left( {1\over t-s} \int _s^t f(z_r) dr- \bar{f} \right) ^2} \le C(\Vert f-\bar{f}\Vert _{{n\over 2}+1}) {1\over \sqrt{t-s}}. \end{aligned}$$
(6.2)

Proof

Since \(\langle \bar{f}, \pi _j\rangle =\langle f, \pi _j\rangle \), \(f-\bar{f}\in N^\perp \). By working with \(f-\bar{f}\) we may assume that \(f\in N^\perp \); let g be a solution to \({\mathcal {L}}_0 g=f\). By Hörmander’s theorem [18], there is a positive number \(\delta \) such that for all \(u\in C^\infty (M)\),

$$\begin{aligned} \Vert u\Vert _{s+\delta } \le C( \Vert {\mathcal {L}}_0 u\Vert _s+\Vert u\Vert _{L_2} ). \end{aligned}$$

The number \(\delta =2^{1-k}\), where \(k\in {\mathbb {N}}\) is related to the number of brackets needed to generate the tangent spaces.

Furthermore every u such that \(\Vert {\mathcal {L}}_0 u\Vert _s<\infty \) must be in \(H^{s+\delta }\). If \(s>{n\over 2}+1\), \(H^s\) is embedded in \(C^1\) and, for some constants \(c_i\),

$$\begin{aligned} |g|_{C^1(M)}\le c_1 \,\Vert g\Vert _{ {n\over 2}+1+\epsilon }\le c_2 \; ( \Vert f\Vert _{{n\over 2}+1}+|g|_{L_2}) \le c_3\, \Vert f\Vert _{{n\over 2}+1}. \end{aligned}$$

Recall that \({\mathcal {L}}_0=\sum _{i=1}^{m'} L_{X_i}L_{X_i}+L_{X_0}\). Let \(\{W_t^j, j=1, \dots , m'\}\) be independent one dimensional Brownian motions and let \((z_t)\) be a solution of \(dz_t=\sum _{j=1}^{m'} X_j(z_t)\circ dW_t^j+X_0(z_t)\,dt\). Applying Itô’s formula to g,

$$\begin{aligned} {1\over t-s} \int _s^t f(z_r)dr ={1\over t-s} \left( g(z_t)-g(z_s)\right) -{1\over t-s}\left( \sum _{j=1}^{m'} \int _s^t (dg(X_j))(z_r) dW_r^j\right) . \end{aligned}$$

We apply the Sobolev estimates to g and use Doob’s \(L^2\) inequality to see that for \(t-s\ge 1\) there is a constant C such that,

$$\begin{aligned}&{\mathbf E}\left( {1\over t-s} \int _s^t f(z_r)dr\right) ^2 \le {4 \over (t-s)^2} |g|_\infty ^2 +{8\over (t-s)^2} \sum _{j=1}^{m'} \int _s^t ( {\mathbf E}|dg(z_r)|^2|X_j(z_r)|^2 ) dr\\&\quad \le {4 \over (t-s)^2} |g|_\infty ^2 +{8m'\over t-s} |dg|^2_{\infty } \sum _{j=1}^{m'} |X_j|_\infty ^2 \le C(\Vert f\Vert _{{n\over 2}+1})^2{1\over t-s}. \end{aligned}$$

\(\square \)
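The \(1/\sqrt{t-s}\) rate in (6.2) is easy to observe numerically. The sketch below (our illustration under simplifying assumptions, not the paper's construction) takes \(G=S^1\), \({\mathcal {L}}_0={1\over 2}{d^2\over d\theta ^2}\) and \(f=\sin \), so that \(\bar f=0\), and estimates the \(L^2\) norm of the time average over many Brownian paths; quadrupling the time span should roughly halve it.

```python
import numpy as np

rng = np.random.default_rng(0)

def rms_time_average(t, n_paths=2000, dt=0.01):
    """RMS over paths of (1/t) * integral_0^t sin(z_s) ds, where z is a
    Brownian motion on the circle (sin is 2*pi-periodic, so no wrapping)."""
    n = int(t / dt)
    z = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n)), axis=1)
    avg = np.sin(z).mean(axis=1)          # Riemann-sum time average
    return np.sqrt((avg**2).mean())

r1, r2 = rms_time_average(10.0), rms_time_average(40.0)
print(r1, r2)   # the ratio r1/r2 should be roughly sqrt(4) = 2
```

The observed decay matches solving \({\mathcal {L}}_0 g=\sin \) by \(g=-2\sin \), which gives an asymptotic variance \(2/t\) for the time average.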

We remark that a self-adjoint operator satisfying Hörmander’s condition has index zero.

Lemma 6.3

Suppose that \({\mathcal {L}}_0\) satisfies Hörmander’s condition and, in addition, either has Fredholm index 0 or has a unique invariant probability measure. Let \(r\ge \max {\{3, {n\over 2}+1\}}\). Let \(h : M \times G\rightarrow {\mathbb {R}}\) be such that \(h(y, \cdot )\in C^r\) for each y and \(|h|_\infty + \sup _{z} |h(\cdot , z)|_{\mathrm {Lip}} +\sup _{y} |h(y, \cdot )|_{C^r}<\infty \). Let \(s\le t\) be a pair of positive numbers and \(F\in BC( C([0,s]; M); {\mathbb {R}})\). For any equi-uniformly continuous subsequence \(\tilde{y}^n_t:=(y^{\epsilon _n}_{t\over {\epsilon _n}})\) of \((y^\epsilon _{t\over \epsilon })\) that converges weakly to a continuous process \(\bar{y}_\cdot \) as \(n\rightarrow \infty \), the following convergence holds weakly:

$$\begin{aligned} F(y^{\epsilon _n}_{\cdot \over \epsilon _n}) \int _s^t h(y^{\epsilon _n}_{u\over \epsilon _n}, z^{\epsilon _n}_{u\over \epsilon _n}) du \rightarrow F( \bar{y}_\cdot ) \int _s^t \overline{ h(\bar{y}_u, \cdot ) }du \end{aligned}$$

where \(\overline{ h(y, \cdot )}=\sum _{i=1}^{n_0} u_i \langle h(y, \cdot ),\pi _i\rangle \).

Proof

For simplicity we omit the subscript n. The required convergence follows from Lemma 4.3 in [25], where it was assumed that (6.1) holds and that \({\mathcal {L}}_0\) has a unique invariant probability measure \(\mu \). It is easy to check that the proof there remains valid, taking care to replace \(\int _G h(y,z) d\mu (z)\) in Lemma 4.3 there by \(\sum _{i=1}^{n_0} u_i \langle h(y, \cdot ),\pi _i\rangle \). We remark that by the regularity improving property each \(u_i\) is smooth and therefore bounded. In the first part of the proof, we divide [s, t] into sub-intervals of size \(\epsilon \), freeze the slow variable \((y^\epsilon _{u\over \epsilon })\) on each \([t_k, t_{k+1}]\), and approximate \(h(y_{u\over \epsilon }^\epsilon , z_{u\over \epsilon }^\epsilon )\) by \(h(y_{t_k\over \epsilon }^\epsilon , z_{u\over \epsilon }^\epsilon )\) there. This approximation is justified exactly as in Lemma 4.3 of [25], using the uniform continuity of \((y_t^\epsilon )\) and the finiteness of \(|h|_\infty \) and \(\sup _z |h( \cdot , z)|_{\mathrm {Lip}}\). The convergence of

$$\begin{aligned} \int _{t_{k-1}\over \epsilon }^{t_{k}\over \epsilon } h(y_{t_k\over \epsilon }^\epsilon , z_{u\over \epsilon }^\epsilon )du \rightarrow \Delta t_k \sum _{i=1}^{n_0} u_i \langle h(y_{t_{k-1}\over \epsilon }^\epsilon , \cdot ), \pi _i\rangle \end{aligned}$$

follows from the law of large numbers in Lemma 6.2. The convergence of

$$\begin{aligned} \sum _k\Delta t_k \sum _{i=1}^{n_0} u_i \langle h(y_{t_{k-1}\over \epsilon }^\epsilon , \cdot ), \pi _i\rangle \rightarrow \sum _{i=1}^{n_0} u_i \int _s^t \langle h(y_{u\over \epsilon }^\epsilon , \cdot ), \pi _i\rangle du \end{aligned}$$

is also clear, and follows from the Lipschitz continuity of h in the first variable and the equi-continuity of the \(y^\epsilon \) paths. Finally, denoting by \(y^\epsilon _{[0,s]}\) the restriction of the path \(y^\epsilon _\cdot \) to the interval [0, s], the weak convergence of \( \sum _{i=1}^{n_0} u_i F(y^\epsilon _{[0,s]}) \int _s^t \langle h(y_{u\over \epsilon }^\epsilon , \cdot ), \pi _i\rangle du\) to the required limit follows as explained in Lemma 4.3 of [25]. \(\square \)

Assumption 6.1

The generator \({\mathcal {L}}_0\) satisfies Hörmander’s condition and has Fredholm index 0 (or has a unique invariant probability measure). For \(k=1, \dots , m\), \(\alpha _k\in C^r( G; {\mathbb {R}})\cap N^\perp \) for some \(r\ge \max \{3,{n\over 2}+1\}\).

If \({\mathcal {L}}_0\) is elliptic, it is sufficient to assume \(\alpha _k\in {\mathcal {B}}_b(G;{\mathbb {R}})\), instead of \(\alpha _k\in C^r\).

Theorem 6.4

If \({\mathcal {L}}_0\), \(\alpha _k\), \((y_0^\epsilon )\) and \(|Y_j|\) satisfy the conditions of Proposition 6.1 and Assumption 6.1, then \((y_{t\over \epsilon }^\epsilon )\) converges weakly to the Markov process determined by the Markov generator

$$\begin{aligned} \bar{{\mathcal {L}}} =-\sum _{i,j=1}^m \overline{ \alpha _i \beta _j } L_{Y_i}L_{Y_j}, \quad \overline{ \alpha _i \beta _j }=\sum _{b=1}^{n_0} u_b \langle \alpha _i \beta _j ,\pi _b\rangle . \end{aligned}$$

Proof

By Proposition 6.1, \(\{(y_{t\over \epsilon }^\epsilon , t\ge 0)\}\) is tight. We prove that any convergent sub-sequence converges to the same limit. Let \((\epsilon _n)\) be a monotone sequence converging to zero such that the probability distributions of \((y_{t\over \epsilon _n}^{\epsilon _n})\) converge weakly, on [0, T], to a measure \(\bar{\mu }\). For notational simplicity we may assume that \(\{(y_{t\over \epsilon }^\epsilon , t\ge 0)\}\) converges to \(\bar{\mu }\).

Let \(s<t\), let \(\{{\mathcal {B}}_s\}\) be the canonical filtration, \((Y_s)\) the canonical process, and \(Y_{[0,s]}\) its restriction to [0, s]. By the Stroock–Varadhan martingale method, it is sufficient to prove that \(f(Y_t)-f(Y_s)-\int _s^t \bar{{\mathcal {L}}} f(Y_r)\; dr\) is a local martingale for any \(f\in C_K^\infty (M)\). By (4.1), the following is a local martingale:

$$\begin{aligned}&f(y^\epsilon _{t\over \epsilon }) -f(y^\epsilon _{s\over \epsilon }) - \epsilon \sum _{j=1}^m \left( df(Y_j(y^\epsilon _{t\over \epsilon } ) )\beta _j(z^\epsilon _{t\over \epsilon }) -df(Y_j(y^\epsilon _{s\over \epsilon } ))\beta _j ( z^\epsilon _{s\over \epsilon })\right) \\&\quad +\,\epsilon \sum _{i,j=1}^m\int _{s\over \epsilon }^{t\over \epsilon } L_{Y_i}L_{Y_j} f(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \;dr. \end{aligned}$$

Since the terms carrying the factor \(\epsilon \beta _j\) outside the integral converge to zero as \(\epsilon \) tends to zero, it is sufficient to prove

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} {\mathbf E}\left\{ \epsilon \sum _{i,j=1}^m\int _{s\over \epsilon }^{t\over \epsilon } L_{Y_i}L_{Y_j} f(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \;dr-\int _s^t \bar{{\mathcal {L}}} f(y_{r\over \epsilon }^\epsilon )\; dr \; \big | {\mathcal {F}}_{s\over \epsilon }\right\} =0. \end{aligned}$$

This follows from Lemma 6.3, completing the proof. \(\square \)
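For a concrete feel for the limit generator, consider the scalar toy case (an illustrative sketch under simplifying assumptions of ours, not the paper's setting): \(m=1\), \(Y_1={d\over dx}\) on \(M={\mathbb {R}}\), \(G=S^1\), \({\mathcal {L}}_0={1\over 2}{d^2\over d\theta ^2}\), \(\alpha _1=\sin \). Then \(\beta _1=-2\sin \) solves \({\mathcal {L}}_0\beta _1=\alpha _1\), \(\overline{\alpha _1\beta _1}=-1\), and \(\bar{{\mathcal {L}}}={d^2\over dx^2}\), so \(y^\epsilon _{t/\epsilon }\) should be approximately centred Gaussian with variance 2t.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, t, du, n_paths = 0.1, 1.0, 0.01, 1000

# z^eps_s = B_{s/eps} (generator (1/eps) L_0); substituting u = s/eps gives
#   y^eps_{t/eps} = int_0^{t/eps} sin(z^eps_s) ds
#                 = eps * int_0^{t/eps**2} sin(B_u) du.
n = int(t / eps**2 / du)
B = np.cumsum(rng.normal(0.0, np.sqrt(du), size=(n_paths, n)), axis=1)
y = eps * du * np.sin(B).sum(axis=1)

var_est = (y**2).mean()
print(var_est)   # should be near 2*t = 2.0 for small eps
```

The sample variance approaches \(2t\), the variance of the diffusion generated by \({d^2\over dx^2}\), illustrating the weak convergence asserted in Theorem 6.4.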

Corollary 6.5

Let \(p\ge 1\) be a number and suppose that \(\rho ^p\in B_{V, 0}\). Then, under the conditions of Theorem 6.4 and Assumption 5.1, \((y_{\cdot \over \epsilon }^\epsilon )\) converges in the Wasserstein p-distance on C([0, t]; M).

Proof

By Theorem 5.2, \(\sup _{\epsilon \le \epsilon _0} {\mathbf E}\sup _{s\le t} \rho ^p(o, y_{s\over \epsilon }^\epsilon )<\infty \). Let \(W_p\) denote the Wasserstein p distance:

$$\begin{aligned} W_p(\mu _1, \mu _2)=\left( \inf \int _{C([0,t];M)\times C([0,t];M)} \sup _{s\le t} \rho ^p(\sigma _1(s), \sigma _2(s))\, d\mu (\sigma _1, \sigma _2)\right) ^{1\over p}. \end{aligned}$$

Here the infimum is taken over all probability measures \(\mu \) on \(C([0, t]; M)\times C([0, t]; M)\) with marginals \(\mu _1\) and \(\mu _2\). Note that C([0, t]; M) is a Polish space; a family of probability measures \(\mu _n\) converges to \(\mu \) in \(W_p\) if and only if the following holds: (1) it converges weakly, and (2) \(\sup _n \int \sup _{s\le t} \rho ^p(o, \sigma (s)) d\mu _n( \sigma )<\infty \). The conclusion follows. \(\square \)

7 A study of the semigroups

The primary aim of this section is to study the properties of \(P_t f\) for \(f\in B_{V,r}\), where \(P_t\) is the semigroup for a generic stochastic differential equation. These results will be applied to the limit equation, to provide the necessary a priori estimates. Theorem 7.2 should be of independent interest; it also leads to Lemma 7.5, which will be used in Sect. 8.

Throughout this section M is a complete smooth Riemannian manifold. Let \(Y_0\) be \(C^5\) and \(\{Y_k, k=1,\dots , m\}\) be \(C^6\) smooth vector fields on M, \(\{B_t^k\}\) independent real valued Brownian motions. Let \((\Phi _t(y), t<\zeta (y)) \) be the maximal solution to the following equation

$$\begin{aligned} dy_t=\sum _{k=1}^m Y_k(y_t)\circ dB_t^k+Y_0(y_t)dt \end{aligned}$$
(7.1)

with initial value y. Its Markov generator is \({\mathcal {L}}f={1\over 2} \sum _{k=1}^m L_{Y_k}L_{Y_k}f+L_{Y_0}f\). Let \(Z={1\over 2}\sum _{k=1}^m \nabla _{Y_k}Y_k+Y_0\) be the drift vector field, so

$$\begin{aligned} {\mathcal {L}}f={1\over 2}\sum _{k=1}^m \nabla df(Y_k,Y_k)+df(Z). \end{aligned}$$
(7.2)

If there exist a \(C^3\) pre-Lyapunov function V and constants c and K such that \({\mathcal {L}}V\le c+KV\), then (7.1) is complete. However we do not limit ourselves to the Lyapunov test for the completeness of the SDE. Let us denote \(|f|_r=\sum _{k=1}^{r}|\nabla ^{(k-1)} df|\) and \(|f|_{r,\infty }=\sum _{k=1}^{r}|\nabla ^{(k-1)} df|_\infty \). The following observation is useful.

Lemma 7.1

Let \(V\in {\mathcal {B}}(M;{\mathbb {R}})\) be locally bounded.

  • Suppose that \(\sum _{j=1}^m |Y_j| \in B_{V,0}\) and \( |Z|\in B_{V,0}\). Then if \(f\in B_{V, 2}\), \({\mathcal {L}}f\in B_{V,0}\). If \(f \in BC^2\), then \(|{\mathcal {L}}f|\le |f|_{2, \infty } F_1\) where \(F_1\in B_{V,0}\) does not depend on f.

  • Suppose that

    $$\begin{aligned} \sum _{j=1}^m( |Y_j| +|\nabla Y_j|+|\nabla ^{(2)} Y_j|) \in B_{V,0}, \quad |Z|+|\nabla Z|+|\nabla ^{(2)} Z|\in B_{V,0}. \end{aligned}$$

    If \(f\in B_{V,4}\), \({\mathcal {L}}^2 f\in B_{V,0}\). If \(f \in BC^4\), then \(|{\mathcal {L}}^2 f|\le |f|_{4,\infty } F_2\) where \(F_2\) is a function in \(B_{V,0}\) not depending on f.

Proof

That \({\mathcal {L}}f\) belongs to \(B_{V,0}\) follows from (7.2). If \(f\in BC^2\), \(|{\mathcal {L}}f|\le (|f|_2)_\infty (\sum _{k=1}^m|Y_k|^2+|Z|)\). For the second part we observe that \({\mathcal {L}}^2 f\) involves at most four derivatives of f and two derivatives of \(Y_j\) and Z where \(j=1, \dots , m\). \(\square \)

Let \(d\Phi _t(v)\) denote the derivative flow in the direction of \(v\in T_yM\). It is the derivative of the function \(y\mapsto \Phi _t(y, \omega )\), in probability. Moreover, it solves the following stochastic covariant differential equation along the solutions \(y_t:=\Phi _t(y_0)\),

$$\begin{aligned} Dv_t=\sum _{k=1}^m \nabla _{v_t} Y_k\circ dB_t^k+\nabla _{v_t} Y_0dt. \end{aligned}$$

Here \(D V_t:=//_t(y_\cdot ) d (//_{\!t}^{-1}(y_\cdot ) V_t)\), where \(//_{\!t}(y_\cdot ) :T_{y_0}M\rightarrow T_{y_t}M\) is the stochastic parallel transport map along the path \(y_\cdot \). Denote by \(|d\Phi _t|_{y_0}\) the norm of \(d\Phi _t(y_0): T_{y_0}M\rightarrow T_{y_t}M\). For \(p>0\), \(y\in M\) and \(v\in T_yM\), we define \(H_p(y)\in {\mathbb L}(T_yM\times T_yM; {\mathbb {R}})\) by

$$\begin{aligned} H_p(y)(v,v)= \sum _{k=1}^{m} |\nabla Y_k(v)|^2 +(p-2)\sum _{k=1}^{m} {\langle \nabla Y_k(v), v\rangle ^2\over |v|^2} + 2\langle \nabla Z(v), v\rangle . \end{aligned}$$

Let \(\underline{h}_p(y)=\sup _{ |v|=1} H_p(y)(v,v)\). Its upper bound will be used to control \(|d\Phi _t|_y\).
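As a simple sanity check (our illustration, not from the paper), consider the linear scalar equation \(dy_t=\sigma y_t\circ dB_t+b y_t\,dt\) on \(M={\mathbb {R}}\). Here \(\nabla Y_1(v)=\sigma v\) and \(Z=({1\over 2}\sigma ^2+b)y\), so

$$\begin{aligned} H_p(y)(v,v)=\sigma ^2|v|^2+(p-2)\sigma ^2|v|^2+2\left( {1\over 2}\sigma ^2+b\right) |v|^2 =(p\sigma ^2+2b)|v|^2, \end{aligned}$$

and \(\underline{h}_p=p\sigma ^2+2b\). Since \(\Phi _t(y)=ye^{\sigma B_t+bt}\) and hence \(d\Phi _t=e^{\sigma B_t+bt}\), we have \({\mathbf E}|d\Phi _t|^p=e^{({p^2\sigma ^2\over 2}+pb)t}=e^{{p\over 2}\underline{h}_p t}\), consistent with the role of \(\underline{h}_p\) in controlling \(|d\Phi _t|_y\).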

Assumption 7.1

Equation (7.1) is complete, and conditions (i’) and (ii) below hold.

  (i)

    There exists a locally bounded function \(V\in {\mathcal {B}}(M;{\mathbb {R}}_+)\), s.t. for all \(q\ge 1\) and \(t\le T\), there exists a number \(C_q(t)\) and a polynomial \(\lambda _q\) such that

    $$\begin{aligned} \sup _{s\le t}{\mathbf E}(|V(\Phi _s(y))|^q)\le C_q(t)+C_q(t) \lambda _q( V(y)). \end{aligned}$$
    (7.3)
  (i’)

    There exists \(V\in C^3(M; {\mathbb {R}}_+)\) and constants c and K such that

    $$\begin{aligned} {\mathcal {L}}V\le c+KV, \quad |L_{Y_j}V|\le c+KV, \quad j= 1, \dots , m, \end{aligned}$$
  (ii)

    Let \(\tilde{V}=1+ \ln (1+|V|)\). For some constant c,

    $$\begin{aligned} \sum _{k=1}^{m} |\nabla Y_k|^2\le c\tilde{V}, \quad \sup _{|v|=1}{\langle \nabla Z(v),v\rangle } \le c\tilde{V}. \end{aligned}$$
    (7.4)

Remark

Suppose that (7.1) is complete. Since \({\mathcal {L}}V^q =qV^{q-1} {\mathcal {L}}V+{1\over 2}q(q-1)V^{q-2} \sum _{j=1}^m|L_{Y_j}V|^2\), (i’) implies (i). In fact, \( {\mathbf E}\sup _{s\le t} \left( V(y_s)\right) ^q\le \left( {\mathbf E}V(y_0)^q+cq^2t \right) e^{(c+K)q^2t}\).

Recall that (7.1) is strongly complete if \((t,y)\mapsto \Phi _t(y)\) is continuous almost surely on \([0, t]\times M\) for any \(t>0\).

Theorem 7.2

Under Assumption 7.1, the following statements hold.

  1. 1.

    The SDE (7.1) is strongly complete and for every \(t\le T\), \( \Phi _t(\cdot )\) is \(C^4\). Furthermore for all \(p\ge 1\), there exists a positive number C(t, p) such that

    $$\begin{aligned} {\mathbf E}\left( \sup _{s\le t}|d\Phi _s(y)|^p\right) \le C(t,p)+C(t,p)V^{C(t,p)}(y) . \end{aligned}$$
    (7.5)
  2. 2.

    Let \(f \in B_{V,1}\). Define \(\delta P_t (df)={\mathbf E}\,df(d\Phi _t(\cdot ))\). Then \(d(P_tf)=\delta P_t(df)\) and \(|d(P_tf)|\in B_{V,0}\). Furthermore, for a constant C(t, p) independent of f,

    $$\begin{aligned} |d(P_tf)|\le \sqrt{ {\mathbf E}\left( |df|_{\Phi _t(y)}\right) ^2} \sqrt{ C(t,p)(1+ V ^{C(t,p)}(y))}. \end{aligned}$$
  3. 3.

    Suppose furthermore that

    $$\begin{aligned} \sum _{j=1}^m\sum _{\alpha =0}^3|\nabla ^{(\alpha )}Y_j|\in B_{V,0}, \qquad \sum _{\alpha =0}^2|\nabla ^{(\alpha )} Y_0| \in B_{V, 0}. \end{aligned}$$

    Then, (a) \({\mathbf E}\sup _{s\le t}|\nabla d\Phi _s|^2(y)\in B_{V,0}\); (b) If \(f\in B_{V, 2}\), then \(P_tf\in B_{V,2}\), and

    $$\begin{aligned} (\nabla dP_tf)(u_1,u_2)={\mathbf E}\nabla df (d\Phi _t(u_1), d\Phi _t(u_2))+{\mathbf E}df (\nabla _{u_1} d\Phi _t(u_2)). \end{aligned}$$

    Furthermore, (c) \({d P_t f\over dt }=P_t {\mathcal {L}}f\), and \({\mathcal {L}}(P_t f)=P_t ({\mathcal {L}}f)\).

  4. 4.

    Let \(r\ge 2\). Suppose furthermore that

    $$\begin{aligned} \sum _{\alpha =0}^{r}|\nabla ^{(\alpha )}Y_0| \in B_{V, 0}, \quad \sum _{\alpha =0}^{r+1} \sum _{k=1}^m|\nabla ^{(\alpha )} Y_k|\in B_{ V,0}. \end{aligned}$$

    Then \({\mathbf E}\sup _{s\le t}(|\nabla ^{(r-1)} d\Phi _s|_y)^2\) belongs to \(B_{V,0}\). If \(f\in B_{V, r}\), then \(P_tf\in B_{V,r}\).

Proof

The statement on strong completeness follows from the following theorem (Theorem 5.1 in [22]). Suppose that (7.1) is complete. If \(\tilde{V}\) is a function and \(c_0\) a number such that for all \(t>0\), all compact sets K, and all constants \(\lambda \),

$$\begin{aligned} \sup _{y\in K}{\mathbf E}\exp {\left( \lambda \int _0^t \tilde{V}(\Phi _s(y))ds\right) }<\infty , \quad \sum _{k=1}^{m} |\nabla Y_k|^2\le c_0\tilde{V}, \quad \underline{h}_p\le 6pc_0 \tilde{V}, \end{aligned}$$
(7.6)

then (7.1) is strongly complete. Furthermore for every \(p\ge 1\) there exists a constant c(p) such that

$$\begin{aligned} {\mathbf E}\left( \sup _{s\le t}|d\Phi _s(y)|^p\right) \le c(p){\mathbf E}\left( \exp {\left( 6p^2 \int _0^{t}\tilde{V}(\Phi _s(y)) ds\right) }\right) . \end{aligned}$$
(7.7)

Since the \(Y_j\) are \(C^6\), for every t, \( \Phi _t(\cdot )\) is \(C^4\). It is easy to verify that condition (7.6) is satisfied. Indeed, by the assumption, \(\underline{h}_p\le 6p c \tilde{V}\). Taking \(\tilde{V}=1+ \ln (1+|V|)\), for \(p\ge 1\),

$$\begin{aligned} {\mathbf E}\left( \exp {\left( 6p^2 \int _0^{t}\tilde{V}(\Phi _s(y)) ds\right) }\right) \le C(t,p) +C(t,p) ( V ^{C(t,p)}(y))<\infty . \end{aligned}$$
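This bound follows from (7.3) and Jensen's inequality applied to the convex function \(\exp \); in outline,

```latex
\exp\Big(6p^{2}\!\int_0^t \tilde V(\Phi_s(y))\,ds\Big)
 \le \frac{1}{t}\int_0^t \exp\big(6p^{2}t\,\tilde V(\Phi_s(y))\big)\,ds
 = \frac{e^{6p^{2}t}}{t}\int_0^t \big(1+V(\Phi_s(y))\big)^{6p^{2}t}\,ds,
```

and taking expectations and applying (7.3) with \(q=6p^2t\) gives a bound of the form \(C(t,p)+C(t,p)V^{C(t,p)}(y)\).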

This proves part (1).

For part (2) let \(f\in C^1\). Then \(y\mapsto f(\Phi _t(y,\omega ))\) is differentiable for almost every \(\omega \). Let \(\sigma : [0,t_0]\rightarrow M\) be a geodesic segment with \(\sigma (0)=y\). Then

$$\begin{aligned} { f(\Phi _t(\sigma _s, \omega ))-f(\Phi _t(y, \omega ))\over s} ={1\over s}\int _0^s {d\over dr} f\left( \Phi _t(\sigma _r, \omega )\right) dr. \end{aligned}$$

Since \({\mathbf E}|d\Phi _t(y)|^2\) is locally bounded in y, \(r\mapsto {\mathbf E}|d\Phi _t(\sigma _r)|\) is continuous and the expectation of the right hand side converges to \({\mathbf E}df(d\Phi _t(\dot{\sigma }(0)))\). The left hand side clearly converges almost surely. Since \({\mathbf E}|df (d\Phi _t(y))|^2\) is locally bounded, the convergence is in \(L^1\). This proves that \(d(P_tf)=\delta P_t(df)\). Furthermore, supposing that \(|df|\le K+K V^q\),

$$\begin{aligned} |d(P_tf)|_y\le & {} \sqrt{ {\mathbf E}(|df|_{\Phi _t(y)})^2} \sqrt{{\mathbf E}|d\Phi _t|_y^2} \\\le & {} \sqrt{ 2K^2+2K^2 {\mathbf E}V^{2q}(\Phi _t(y))} \sqrt{ c(p) C(t,p) +c(p) C(t,p) ( V ^{C(t,p)}(y))}. \end{aligned}$$

The latter, as a function of y, belongs to \(B_{V,0}\).

We proceed to part (3a). Let \(v,w \in T_yM\) and \(U_t:=\nabla d\Phi _t(w, v)\). Then \(U_t\) satisfies the following equation:

$$\begin{aligned} DU_t= & {} \sum _{k=1}^m\nabla ^{(2)} Y_k (d\Phi _t(v), d\Phi _t(w))\circ dB_t^k+\sum _{k=1}^m\nabla Y_k(U_t)\circ dB_t^k\\&+\,\nabla ^{(2)} Y_0 (d\Phi _t(v), d\Phi _t(w))dt+\nabla Y_0(U_t)dt. \end{aligned}$$

It follows that,

$$\begin{aligned} {d} |U_t|^2= & {} 2\sum _{k=1}^m \langle \nabla ^{(2)} Y_k (d\Phi _t(v), d\Phi _t(w))\circ dB_t^k+\nabla ^{(2)} Y_0 (d\Phi _t(v), d\Phi _t(w))dt, U_t\rangle \\&+\left\langle \sum _{k=1}^m\nabla Y_k(U_t)\circ dB_t^k+\nabla Y_0(U_t)dt, U_t\right\rangle . \end{aligned}$$

To the first term on the right hand side we apply the Cauchy–Schwarz inequality to split the two factors in each inner product. This gives \(C|U_t|^2\) plus terms that do not involve \(U_t\). The Stratonovich corrections produce the extra derivative \(\nabla ^{(3)} Y_k\), which does not involve \(U_t\). The second term on the right hand side is a sum of the form \(\sum _{k=1}^m \langle \nabla Y_k(U_t), U_t\rangle dB_t^k\), for which only a bound on \(|\nabla Y_k|\) is required, and

$$\begin{aligned} \left\langle \sum _{k=1}^m \nabla ^{(2)} Y_k(Y_k, U_t)+\nabla Y_0(U_t), U_t\right\rangle =\langle \nabla Z(U_t), U_t\rangle -\left\langle \sum _{k=1}^m \nabla Y_k(\nabla _{U_t} Y_k), U_t\right\rangle . \end{aligned}$$

The second term is bounded by

$$\begin{aligned} \left| \sum _{k=1}^m\left\langle \nabla Y_k(\nabla _{U_t} Y_k), U_t\right\rangle \right| \le \sum _{k=1}^m |\nabla Y_k|^2 |U_t|^2. \end{aligned}$$

By the assumption, there exist \(c>0,q\ge 1\) such that, for every \(k=1, \dots , m\),

$$\begin{aligned} |\nabla Y_k|^2 \le c\tilde{V} ,\quad |\nabla ^{(2)}Y_k|\le c+cV^q,\quad |\nabla ^{(3)} Y_k|\le c+cV^q,\quad \langle \nabla _u Z, u\rangle \le (c+KV)|u|^2. \end{aligned}$$

There is a stochastic process \(I_s\), which does not involve \(U_t\), and constants C, q such that

$$\begin{aligned} {\mathbf E}|U_t|^2 \le {\mathbf E}|U_0|^2+\int _0^t {\mathbf E}I_r dr+\int _0^t C{\mathbf E}\tilde{V}^q(y_r)|U_r|^2dr. \end{aligned}$$

By induction, \(I_r\) has moments of all orders which are bounded on compact intervals. By Gronwall’s inequality, for \(t\le T\),

$$\begin{aligned} {\mathbf E}|U_t|^2\le \left( {\mathbf E}|U_0|^2+\int _0^T {\mathbf E}I_r dr\right) \exp {\left( C \int _0^t \tilde{V}^q(y_r)dr\right) }. \end{aligned}$$

To obtain the supremum inside the expectation, we simply use Doob’s \(L^p\) inequality before taking expectations. With the argument in the proof of part (1) we conclude that \({\mathbf E}\sup _{s\le t} |\nabla d\Phi _s|^2(y)\) is finite and belongs to \(B_{V,0}\).
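For reference, the form of Gronwall's inequality used throughout this proof is the classical integral version: for \(\varphi \ge 0\) bounded on \([0,T]\) and \(g\ge 0\) integrable,

```latex
\varphi(t)\le a+\int_0^t g(r)\,\varphi(r)\,dr,\quad t\le T
\qquad\Longrightarrow\qquad
\varphi(t)\le a\,\exp\!\Big(\int_0^t g(r)\,dr\Big).
```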

Part (3b). Let \(f\in B_{V, 2}\). By part (2), \(d(P_tf)={\mathbf E}df(d\Phi _t(y))\). Let \(u_1, u_2\in T_yM\). By an argument analogous to that of part (2), we may differentiate the right hand side under the expectation to obtain

$$\begin{aligned} (\nabla dP_tf)(u_1,u_2)={\mathbf E}\nabla df (d\Phi _t(u_1), d\Phi _t(u_2))+{\mathbf E}df (\nabla _{u_1} d\Phi _t(u_2)). \end{aligned}$$

Hence \(P_tf\in B_{V,2}\). This procedure can be iterated.

Part (3c). By Itô’s formula,

$$\begin{aligned} f(y_t)=f(y_s)+\sum _{k=1}^m\int _s^t df(Y_k(y_r))dB_r^k+\int _s^t {\mathcal {L}}f(y_r)dr. \end{aligned}$$

Since \(df(Y_k)\in B_{V,0}\), the expectations of the stochastic integrals with respect to the Brownian motions vanish. Since \( {\mathcal {L}}f\in B_{V,0}\) by part (3), \({\mathcal {L}}f(y_r)\) is bounded in \(L^2\). It follows that the function \(r\mapsto {\mathbf E}{\mathcal {L}}f (y_r)\) is continuous,

$$\begin{aligned} \lim _{t\rightarrow s} {{\mathbf E}f(y_t)-{\mathbf E}f(y_s)\over t-s}={\mathbf E}{\mathcal {L}}f(y_s) \end{aligned}$$

and we obtain Kolmogorov’s backward equation, \({\partial \over \partial s}P_sf =P_s ({\mathcal {L}}f)\). Since \(P_sf \in B_{V,2}\), we apply the above argument to \(P_sf\), let t tend to zero in \({P_t(P_sf)-P_sf\over t}\), and obtain \({\partial \over \partial s} P_sf = {\mathcal {L}}(P_sf)\). This yields the required identity \({\mathcal {L}}P_sf =P_s {\mathcal {L}}f\).

Part (4). For higher order derivatives of \(\Phi _t\) we simply iterate the above procedure and note that the linear terms in the equation for \({d\over dt} |\nabla ^{k-1}d\Phi _t(u_1, \dots , u_k)|^2\) are always of the same form. \(\square \)

Remark 7.3

With the assumption of part (3), we can show that for every integer p, \({\mathbf E}\sup _{s\le t}|\nabla d\Phi _s|_{y}^p\in B_{V,0}\).

If we assume the additional conditions that

$$\begin{aligned} |\nabla Y_0| \le c\tilde{V}, \quad \sum _{k=1}^m |\nabla ^{(2)} Y_k||Y_k|\le c\tilde{V}, \end{aligned}$$

the conclusion of the remark follows more easily. With the assumptions of part (4) of Theorem 7.2 we need to work a bit harder, as we illustrate below. Let \(U_t=\nabla d\Phi _t(w,v)\). Instead of writing down all the terms in the equation for \(|U_t|^p\), we classify them into two classes: those involving \(U_t\) and those not. For the first class we must assume that they are bounded by \(c\tilde{V}\) for some c. For the second class we may use induction, and hence it is sufficient to assume that they belong to \(B_{V,0}\). The terms involving \(U_t\) are:

$$\begin{aligned} \nabla Y_k(U_t), \quad \sum _{k=1}^m\nabla ^{(2)} Y_k(Y_k, U_t)+\nabla Y_0(U_t). \end{aligned}$$

The essential identity to use is:

$$\begin{aligned} \sum _{k=1}^m\nabla ^{(2)} Y_k(Y_k, U_t)+\nabla Y_0(U_t)=\nabla Z(U_t)-\sum _{k=1}^m \nabla Y_k(\nabla Y_k (U_t)). \end{aligned}$$

We do not need to assume that \(|\nabla ^{(2)} Y_k||Y_k|\le c\tilde{V}\); it is sufficient to assume the corresponding bounds on \(|\nabla Y_k|^2\) and \(\nabla Z\) for all \(k=1,\dots , m\). With a bit of care, we check that only one-sided derivatives of Z are involved.

For example, we can convert to the \(p=2\) case:

$$\begin{aligned} d|U_t|^p={p\over 2} |U_t|^{p-2} \circ d|U_t|^2={p\over 2} |U_t|^{p-2}\, d|U_t|^2 + {1 \over 8} p(p-2)|U_t|^{p-4} \langle d|U_t|^2\rangle . \end{aligned}$$

By the first term \({p\over 2} |U_t|^{p-2}\, d|U_t|^2 \) we mean that, in place of \(d|U_t|^2\), we plug in all the terms on the right hand side of the equation for \(d|U_t|^2\), after formally converting the integrals to Itô form. By \( \langle d|U_t|^2\rangle \) we mean the bracket of the martingale term on the right hand side of \(d|U_t|^2\). It is now easy to check that in all the terms involving \(U_t\), higher order derivatives of \(Y_k\) do not appear, except in the form of \(|U_t|^{p-2}\langle \nabla _{U_t} Z, U_t\rangle \).
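The coefficients in this change of variables can be checked symbolically. The following sketch (using sympy, and viewing \(|U_t|^p\) as \(x^{p/2}\) with \(x=|U_t|^2\)) computes the derivatives that produce the drift coefficient and the Itô correction:

```python
import sympy as sp

x, p = sp.symbols('x p', positive=True)
f = x**(p/2)  # |U_t|^p as a function of x = |U_t|^2

drift_coeff = sp.diff(f, x)       # coefficient of d|U_t|^2
ito_coeff = sp.diff(f, x, 2) / 2  # coefficient of the bracket <d|U_t|^2>

assert sp.simplify(drift_coeff - (p/2)*x**(p/2 - 1)) == 0
assert sp.simplify(ito_coeff - p*(p - 2)/8*x**(p/2 - 2)) == 0
```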

Remark 7.4

Assume the SDE is complete. Suppose that for some positive number C,

$$\begin{aligned} \sum _{k=1}^m \sum _{\alpha =0}^{5} |\nabla ^{(\alpha )}Y_k|\le C, \quad \sum _{\alpha =0}^{4} |\nabla ^{(\alpha )} Y_0|\le C. \end{aligned}$$

Then for all \(p\ge 1\), there exists a constant \(C(t,p)\) such that

$$\begin{aligned} {\mathbf E}\left( \sup _{s\le t}|d\Phi _s(x)|^p\right) \le C(t,p). \end{aligned}$$

Furthermore the statements in Theorem 7.2 hold for \(r\le 4\).

Recall that \(|f|_r=\sum _{k=1}^{r} |\nabla ^{(k-1)} d f|\) and \(|f|_{r,\infty }=\sum _{k=1}^{r} |\nabla ^{(k-1)}df|_\infty \).

Lemma 7.5

Assume Assumption 7.1 and

$$\begin{aligned} \sum _{\alpha =0}^{4}|\nabla ^{(\alpha )}Y_0| \in B_{V, 0}, \quad \sum _{\alpha =0}^{5} \sum _{k=1}^m|\nabla ^{(\alpha )} Y_k|\in B_{ V,0}. \end{aligned}$$

Then there exist constants \(q_1, q_2\ge 1\), constants \(c_1\) and \(c_2\) depending on t and f and locally bounded in t, functions \(\gamma _i \in B_{V,0}\), and polynomials \(\lambda _{q_i}\), such that for \(s\le t\),

$$\begin{aligned} |P_tf(y_0)-P_sf (y_0)|\le & {} (t-s) c_1( 1+\lambda _{q_1}(V(y_0))), \quad f\in B_{V,2}\\ \left| P_tf(y_0)-P_sf(y_0)-(t-s) P_s( {\mathcal {L}}f) (y_0)\right|\le & {} (t-s)^2 c_2 ( 1+\lambda _{q_2}(V(y_0))), \quad f\in B_{V,4}\\ |P_tf(y_0)-P_sf (y_0)|\le & {} (t-s) (1+|f|_{2, \infty })\gamma _1(y_0), \quad \forall f \in BC^2 \\ \left| P_tf(y_0)-P_sf(y_0)-(t-s) P_s( {\mathcal {L}}f) (y_0)\right|\le & {} (t-s)^2(1+|f|_{4, \infty })\gamma _2(y_0), \quad \forall f \in BC^4. \end{aligned}$$

Proof

Denote \(y_t=\Phi _t(y_0)\), the solution to (7.1). Then for \(f\in C^2\),

$$\begin{aligned} P_tf(y_0)=P_sf (y_0)+\int _s^t P_r ({\mathcal {L}}f)(y_0 )dr+\sum _{k=1}^{m} {\mathbf E}\left( \int _s^t df(Y_k(y_r)) dB_r^k\right) . \end{aligned}$$

Since \(|L_{Y_k} f| \le |df|_\infty |Y_k|\) and |df|, \(Y_k\) belong to \( B_{V,0}\), by Assumption 7.1(i), \( \int _0^t {\mathbf E}|L_{Y_k}f|_{y_r}^2 dr \) is finite and the last term vanishes. Hence \( |P_tf(y_0)-P_sf (y_0)|\le \int _s^t |P_{s_2} ( {\mathcal {L}}f)(y_0)|ds_2\). By Lemma 7.1, \({\mathcal {L}}f\in B_{V,0}\) if \(f\in B_{V,2}\). Let \(K, q_1\) be s.t. \(|{\mathcal {L}}f|\le K+KV^{q_1}\). Then

$$\begin{aligned} \int _s^r |P_{s_2} ( {\mathcal {L}}f)(y_0)|ds_2 \le \int _0^r (K+K{\mathbf E}V^{q_1}(\Phi _{s_2}(y_0)) )ds_2. \end{aligned}$$

By the assumption, we see easily that \(\sum _{\alpha =0}^3 |\nabla ^{(\alpha )} Z|\in B_{V,0}\). By Assumption 7.1, \(\sup _{s\le t}{\mathbf E}(|V(\Phi _s(y_0))|^{q_1})\le C_{q_1}(t)+C_{q_1}(t) \lambda _{q_1}( V(y_0))\) and the first conclusion holds. We repeat this procedure for \(f\in C^4\) to obtain:

$$\begin{aligned}&P_tf(y_0)-P_sf (y_0)\\&\quad =\int _s^t \left( P_s ({\mathcal {L}}f)(y_0 ) +\int _s^{s_1} P_{s_2} ( {\mathcal {L}}^2 f)(y_0)ds_2 +\sum _{k=1}^{m} {\mathbf E}\int _s^{s_1} ( L_{Y_k} ({\mathcal {L}}f) )(y_{s_2}) dB_{s_2}^k \right) ds_1. \end{aligned}$$

The last term also vanishes, as every term in \(L_{Y_k} {\mathcal {L}}f \) belongs to \(B_{V,0}\). Indeed

$$\begin{aligned} L_{Y_k} {\mathcal {L}}f= & {} \sum _i \nabla ^{(2)} df(Y_k, Y_i, Y_i)+ 2\sum _i \nabla df( \nabla _{Y_k}Y_i, Y_i)+ \nabla df \left( Y_k, Z\right) \\&+\sum _i df(\nabla ^{(2)}Y_i(Y_k,Y_i) +\nabla Y_i(\nabla _{Y_k}Y_i+\nabla _{Y_k}Y_0)). \end{aligned}$$

This gives, for all \(f \in B_{V,4}\),

$$\begin{aligned} \left| P_tf(y_0)-P_sf(y_0)-(t-s) P_s( {\mathcal {L}}f) (y_0)\right| \le \left| \int _s^t \int _s^{s_1} P_{s_2} ({\mathcal {L}}^2 f)(y_0)ds_2 ds_1 \right| . \end{aligned}$$
(7.8)

Let \(q_2, K\) be numbers such that \(|{\mathcal {L}}^2 f| \le K+K V^{q_2}\). Then,

$$\begin{aligned} \sup _{s\le t} P_{s} ({\mathcal {L}}^2 f)(y_0)\le K+K \sup _{s\le t}{\mathbf E}\left( V(y_s)\right) ^{q_2} \le K+C_{q_2}(t)+KC_{q_2}(t) \lambda _{q_2}(V(y_0)). \end{aligned}$$

Consequently, there exists a constant \(c_2(t, K, q_2)\) s.t.

$$\begin{aligned} \left| P_tf(y_0)-P_sf(y_0)-(t-s) P_s( {\mathcal {L}}f) (y_0)\right| \le (t-s)^2 c_2(t, K, q_2) (1+ \lambda _{q_2}(V(y_0))). \end{aligned}$$

This completes the proof for \(f\in B_{V,2}\) and \(f\in B_{V,4}\). Next suppose that \(f\in BC^2\). By Lemma 7.1, \(| {\mathcal {L}}f|\le |f|_{2, \infty } F_1\), and \(|{\mathcal {L}}^2 f| \le |f|_{4, \infty } F_2\) if \(f\in BC^4\). Here \(F_1, F_2\in B_{V,0}\) and do not depend on f. We iterate the argument above to complete the proof for \(f\in BC^2\) and \(f\in BC^4\). \(\square \)

8 Rate of convergence

If \({\mathcal {L}}_0\) has a unique invariant probability measure \(\pi \), for \(f\in L^1(G, d\pi )\) we denote \(\bar{f}=\int _G fd\pi \). Let \(\bar{{\mathcal {L}}}=-\sum _{i,j=1}^m \overline{\alpha _i \beta _j} L_{Y_i}L_{Y_j} \). Let \(\{\sigma _k^i, i,k=1,\dots , m\}\) be the entries of a square root of the matrix \((\overline{-\alpha _i \beta _j})\); they are constants and satisfy \(\sum _{k=1}^m \sigma _k^i\sigma _k^j=\overline{-\alpha _i \beta _j}\). Let us consider the SDE:

$$\begin{aligned} dy_t=\sum _{k=1}^{m} \left( \sum _{i=1}^{m} \sigma _k^i Y_i(y_t) \right) \circ dB_t^k, \end{aligned}$$
(8.1)

where \(\{B_t^k\}\) are independent one dimensional Brownian motions. Let

$$\begin{aligned} \tilde{Y}_k= \sum _{i=1}^{m} \sigma _k^i Y_i, \quad \tilde{Z}=\sum _{i,j=1}^m\overline{ -\alpha _i\beta _j}\nabla _{Y_i}Y_j. \end{aligned}$$

The results from Sect. 7 apply. Recall that \({\mathcal {L}}_0={1\over 2}\sum _{i=1}^p L_{X_i}L_{X_i}+L_{X_0}\) and \((z_t^\epsilon )\) are \({\mathcal {L}}^\epsilon ={1\over \epsilon } {\mathcal {L}}_0\) diffusions. Let \(\Phi ^\epsilon _t(y)\) be the solution to (1.5), \(\dot{y}_t^\epsilon =\sum _{k=1}^m \alpha _k(z_t^\epsilon )Y_k(y_t^\epsilon )\), with initial value y.
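As a concrete illustration of the square-root construction behind (8.1), the following sketch uses a hypothetical \(2\times 2\) matrix standing in for \((\overline{-\alpha _i\beta _j})\), assumed symmetric positive semi-definite, and checks the defining relation \(\sum _k \sigma _k^i\sigma _k^j=\overline{-\alpha _i\beta _j}\):

```python
import numpy as np

# Hypothetical stand-in for the matrix a_{ij} = -\overline{alpha_i beta_j}.
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# A symmetric square root sigma, via the eigendecomposition a = Q diag(w) Q^T;
# its entries sigma[i, k] play the role of sigma^i_k.
w, Q = np.linalg.eigh(a)
sigma = Q @ np.diag(np.sqrt(w)) @ Q.T

# Check the defining relation sum_k sigma^i_k sigma^j_k = a_{ij}.
assert np.allclose(sigma @ sigma.T, a)
```

Any other square root (e.g. a Cholesky factor) would serve equally well, since only the products \(\sum _k\sigma _k^i\sigma _k^j\) enter the generator of (8.1).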

Assumption 8.1

G is compact, \(Y_0\in C^5(\Gamma TM)\), and \(Y_k\in C^6(\Gamma TM)\) for \(k=1, \dots , m\). Either conditions (1)–(5) below hold, or conditions (1), (2’) and (3)–(5) hold.

  1. (1)

    The SDEs (3.1) and (8.1) are complete.

  2. (2)

    \(V\in {\mathcal {B}}(M;{\mathbb {R}}_+)\) is a locally bounded function and \(\epsilon _0\) a positive number s.t. for all \(q\ge 1\) and \(T>0\), there exists a locally bounded function \(C_q: {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+\), a real valued polynomial \(\lambda _q\) such that for \(0\le s\le t\le T\) and for all \(\epsilon \le \epsilon _0\)

$$\begin{aligned} \sup _{s\le u \le t} {\mathbf E}\left\{ V^q(\Phi _{u\over \epsilon }^\epsilon (y)) \; \big | {\mathcal {F}}_{s\over \epsilon } \right\} \le C_q(t)+C_q(t) \lambda _q( V(\Phi _{s\over \epsilon }^\epsilon (y))). \end{aligned}$$
    (8.2)
  3. (2’)

    There exists a function \(V\in C^3(M; {\mathbb {R}}_+) \) s.t. for all \(i,j\in \{1, \dots , m\}\), \(|L_{Y_i}L_{Y_j} V|\le c+KV\) and \(|L_{Y_j} V| \le c+KV\).

  4. (3)

    For V defined above, let \(\tilde{V}=1+ \ln (1+|V|)\). Suppose that

    $$\begin{aligned}&\sum _{\alpha =0}^{4}|\nabla ^{(\alpha )}Y_0| \in B_{V, 0}, \quad \sum _{\alpha =0}^{5} \sum _{k=1}^m|\nabla ^{(\alpha )} Y_k|\in B_{ V,0}, \\&\sum _{j=1}^m|\nabla Y_j| ^2\le c\tilde{V}, \quad \sup _{|u|=1} \langle \nabla \tilde{Z}(u), u\rangle \le c\tilde{V} \end{aligned}$$
  5. (4)

    \({\mathcal {L}}_0\) satisfies Hörmander’s conditions and has a unique invariant measure \(\pi \) satisfying Assumption 3.1.

  6. (5)

    \(\alpha _k\in C^3(G; {\mathbb {R}})\cap N^\perp \).

We emphasize the following:

Remark 8.1

  1. (a)

    If V in (2’) is a pre-Lyapunov function, then (3.1) is complete. Furthermore \(|\bar{{\mathcal {L}}} V|\le c+KV\) and so (8.1) is complete.

  2. (b)

Under conditions (1), (2’) and (4)–(5), condition (2) holds; see Theorem 5.2. Corollary 5.3 also holds. Conditions (1)–(5) imply the conclusions of Theorem 7.2.

  3. (c)

    If \({\mathcal {L}}_0\) satisfies strong Hörmander’s condition, condition (4) is satisfied.

Let \(P_t^\epsilon \) be the Markov semigroup associated with \((y_{t}^\epsilon )\) and \(P_t\) the Markov semigroup for \(\bar{{\mathcal {L}}}\). Recall that \(|f|_{r,\infty }=\sum _{j=1}^{r} |\nabla ^{(j-1)}df|_\infty \). Recall also that an operator \({\mathcal {L}}_0\) on a compact manifold G satisfying the strong Hörmander condition has an exponential mixing rate, so such an \({\mathcal {L}}_0\) satisfies Assumption 3.1.

Theorem 8.2

Assume that \(Y_k, \alpha _k\) and \( {\mathcal {L}}_0\) satisfy Assumption 8.1. For every \(f \in B_{V,4}\),

$$\begin{aligned} |{\mathbf E}f(\Phi ^\epsilon _{T\over \epsilon }(y_0)) -P_Tf(y_0)|\le \epsilon |\log \epsilon |^{1\over 2}C(T)\gamma _1(y_0), \end{aligned}$$

where \(\gamma _1\in B_{V,0}\) and C(T) is a constant increasing in T. Similarly, if \(f\in BC^4\),

$$\begin{aligned} |{\mathbf E}f(\Phi ^\epsilon _{T\over \epsilon }(y_0)) -P_Tf(y_0)| \le \epsilon |\log \epsilon |^{1\over 2}\,C(T)\gamma _2(y_0) (1+ |f|_{4, \infty }). \end{aligned}$$

where \(\gamma _2\) is a function in \(B_{V,0}\) that does not depend on f, and C(T) is a constant increasing in T.

Proof

Step 1. To obtain optimal estimates we work on intervals of order \(\epsilon \), cf. Lemma 3.4. Let \(t_0=0<t_1<\dots <t_N=T\) be a partition of [0, T] with \(\Delta t_k=t_k-t_{k-1}=\epsilon \) for \(k\ge 2\) and \(t_1\le \epsilon \). Write \(y_t^\epsilon =\Phi _t^\epsilon (y_0)\). Then,

$$\begin{aligned}&f(y^\epsilon _{T\over \epsilon })-P_T f(y_0) = \sum _{k=1}^N( P_{T-t_k}f(y_{t_k\over \epsilon }^\epsilon ) - P_{T-t_{k-1}}f(y_{t_{k-1}\over \epsilon }^\epsilon ) )\\&\quad = \sum _{k=1}^N ( P_{T-t_k}f(y_{t_k \over \epsilon }^\epsilon ) - P_{T-t_{k}} f(y_{t_{k-1} \over \epsilon }^\epsilon ) + \Delta t_k ( P_{T-t_{k-1}} \bar{{\mathcal {L}}} f (y_{t_{k-1}\over \epsilon }^\epsilon )) )\\&\qquad + \sum _{k=1}^N ( P_{T-t_k}f (y_{t_{k-1}\over \epsilon }^\epsilon ) - P_{T-t_{k-1}}f(y_{t_{k-1}\over \epsilon }^\epsilon ) - \Delta t_k (P_{T-t_{k-1}} \bar{{\mathcal {L}}} f ) (y_{t_{k-1}\over \epsilon }^\epsilon )). \end{aligned}$$

Define

$$\begin{aligned} I_k^\epsilon= & {} P_{T-t_k}f(y_{t_k \over \epsilon }^\epsilon ) - P_{T-t_{k}} f(y_{t_{k-1} \over \epsilon }^\epsilon ) + \Delta t_k ( P_{T-t_{k-1}} \bar{{\mathcal {L}}} f) (y_{t_{k-1}\over \epsilon }^\epsilon ),\\ J_k^\epsilon= & {} P_{T-t_k}f - P_{T-t_{k-1}}f -\Delta t_k P_{T-t_{k-1}} \bar{{\mathcal {L}}} f. \end{aligned}$$
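The algebra behind this decomposition (the sums of \(I_k^\epsilon \) and \(J_k^\epsilon \) telescope) holds for arbitrary values of the quantities involved. A mechanical check, with random placeholders for the semigroup values (hypothetical names; nothing probabilistic is involved):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
# F[k, j] stands in for P_{T-t_k} f(y^eps_{t_j/eps});
# G[k] stands in for Delta t_k * (P_{T-t_{k-1}} Lbar f)(y^eps_{t_{k-1}/eps}).
F = rng.standard_normal((N + 1, N + 1))
G = rng.standard_normal(N + 1)

I = [F[k, k] - F[k, k - 1] + G[k] for k in range(1, N + 1)]
J = [F[k, k - 1] - F[k - 1, k - 1] - G[k] for k in range(1, N + 1)]

# Since t_0 = 0 and t_N = T, the sum telescopes to f(y^eps_{T/eps}) - P_T f(y_0).
assert np.isclose(sum(I) + sum(J), F[N, N] - F[0, 0])
```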

Since \(f\in B_{V,4}\), Lemma 7.5 applies and we obtain the desired estimate on the second term:

$$\begin{aligned} | J_k^\epsilon (y_{t_{k-1}\over \epsilon }^\epsilon )| \le (\Delta t_k)^2 \tilde{c}_2(T,f)\left( 1+ \lambda _{q_2} (V(y_{t_{k-1}\over \epsilon }^\epsilon ))\right) \end{aligned}$$

where \(\tilde{c}_2(T,f)\) is a constant and \(\lambda _{q_2}\) a polynomial.

Let K, q be constants such that \(\lambda _{q_2}(V)\le K+KV^q\). We apply (8.2) from Assumption 8.1 to see that for some constant \(C_q(T)\),

$$\begin{aligned} {\mathbf E}\left( \lambda _{q_2} (V(y_{t_{k-1}\over \epsilon }^\epsilon ))\right) \le K+KC_{q}(T)+KC_{q}(T) \lambda _{q} (V(y_0)). \end{aligned}$$

Since \(\Delta t_k\le \epsilon \) and \(N\sim {1\over \epsilon }\),

$$\begin{aligned} \sum _{k=1}^N {\mathbf E}| J_k^\epsilon (y_{t_{k-1}\over \epsilon }^\epsilon ) | \le \epsilon \tilde{c}_2(T,f)(K+1)( 1+C_{q}(T)+C_{q}(T) \lambda _{q} (V(y_0))). \end{aligned}$$
(8.3)

If f belongs to \( BC^4\), we apply Lemma 7.5 to see that there exists a function \(F\in B_{V,0}\), independent of f s.t.

$$\begin{aligned} \bigg | J_k^\epsilon (y_{t_{k-1}\over \epsilon }^\epsilon ) \bigg | \le (\Delta t_k)^2(1+|f|_{4,\infty }) \left( F(y_{t_{k-1}\over \epsilon }^\epsilon )\right) . \end{aligned}$$

Hence

$$\begin{aligned} \sum _{k=1}^N {\mathbf E}\bigg | J_k^\epsilon (y_{t_{k-1}\over \epsilon }^\epsilon )\bigg | \le \epsilon (1+|f|_{4,\infty }) \sup _{k\le N}{\mathbf E}\left( F(y_{t_{k-1}\over \epsilon }^\epsilon )\right) . \end{aligned}$$
(8.4)

The rest of the proof is just as for the case of \(f\in B_{V,4}\).

Step 2. Let \(0\le s <t\). By part (3) of Theorem 7.2, \(\bar{{\mathcal {L}}} P_tf=P_t \bar{{\mathcal {L}}}f\) for any \(t>0\); in particular, \(P_{T-t_{k}} \bar{{\mathcal {L}}} f =\bar{{\mathcal {L}}} P_{T-t_k}f\). We approximate \(P_{T-t_{k-1}} \bar{{\mathcal {L}}} f \) by \(P_{T-t_{k}} \bar{{\mathcal {L}}} f\) and estimate the error

$$\begin{aligned} \sum _{k=1}^N \Delta t_k ( P_{T-t_{k}} \bar{{\mathcal {L}}} f -P_{T-t_{k-1}} \bar{{\mathcal {L}}} f )(y_{t_{k-1}\over \epsilon }^\epsilon ). \end{aligned}$$

By Lemma 7.1, \(\bar{{\mathcal {L}}}f\in B_{V,2}\), and we may apply Lemma 7.5 to \(\bar{{\mathcal {L}}} f\). We have,

$$\begin{aligned} |P_{T-t_{k}}\bar{{\mathcal {L}}} f(y_0)-P_{T-t_{k-1}} \bar{{\mathcal {L}}} f (y_0)| \le \Delta t_k \tilde{c}_1(T)( 1+\lambda _{q_1}(V(y_0))). \end{aligned}$$

Recall that \(\lambda _{q_1} (V)\in B_{V,0}\). Summing over k and taking the expectation of the above inequality, we obtain

$$\begin{aligned} \sum _{k=1}^N \Delta t_k {\mathbf E}| P_{T-t_{k}} \bar{{\mathcal {L}}} f (y_{t_{k-1}\over \epsilon }^\epsilon )-P_{T-t_{k-1}} \bar{{\mathcal {L}}} f(y_{t_{k-1}\over \epsilon }^\epsilon )| \le \epsilon c_1(T)( 1+\lambda _{q_1}(V(y_0))). \end{aligned}$$
(8.5)

If \(f\in BC^4\), then \(\bar{{\mathcal {L}}}f\in BC^2\). By Lemma 7.5, there exist a constant C(T) and a function \(\gamma _1 \in B_{V,0}\), independent of f, s.t.

$$\begin{aligned} |P_tf(y_0)-P_sf (y_0)| \le (t-s) (1+|f|_{2,\infty })\gamma _1(y_0). \end{aligned}$$

Thus for \(f\in BC^4\),

$$\begin{aligned} \sum _{k=1}^N \Delta t_k {\mathbf E}| P_{T-t_{k}} \bar{{\mathcal {L}}} f (y_{t_{k-1}\over \epsilon }^\epsilon )-P_{T-t_{k-1}} \bar{{\mathcal {L}}} f(y_{t_{k-1}\over \epsilon }^\epsilon )| \le 2\epsilon |f|_{2,\infty }( 1+\gamma _1(y_0)). \end{aligned}$$
(8.6)

Finally instead of estimating \(I_k^\epsilon \), we estimate

$$\begin{aligned} D_k^\epsilon :=P_{T-t_k}f(y_{t_k \over \epsilon }^\epsilon ) - P_{T-t_{k}} f(y_{t_{k-1} \over \epsilon }^\epsilon ) + \Delta t_k P_{T-t_{k}} \bar{{\mathcal {L}}} f (y_{t_{k-1}\over \epsilon }^\epsilon ). \end{aligned}$$

Step 3. If \(f \in B_{V,4}\), by Theorem 7.2, \(P_tf\in B_{V,4}\) for any t. Since \(\alpha _k\in N^\perp \cap C^3\), we may apply Lemma 4.1 to \(P_{T-t_k}f\) and obtain the following formula for \(D_k^\epsilon \).

$$\begin{aligned}&D_k^\epsilon =P_{T-t_k}f (y^\epsilon _{{t_k}\over \epsilon } ) - P_{T-t_k}f (y^\epsilon _{t_{k-1}\over \epsilon } ) + \Delta t_k P_{T-t_{k}}\bar{{\mathcal {L}}} f (y_{t_{k-1}\over \epsilon }^\epsilon )\\&\quad =\epsilon \sum _{j=1}^m ( dP_{T-t_k}f (Y_j (y^\epsilon _{{t_k}\over \epsilon } ) )\beta _j (z^\epsilon _{{t_k}\over \epsilon } ) - dP_{T-t_k}f (Y_j (y^\epsilon _{t_{k-1}\over \epsilon } ) ) \beta _j ( z^\epsilon _{t_{k-1}\over \epsilon } ) )\\&\qquad + \Delta t_k P_{T-t_{k}}\bar{{\mathcal {L}}} f (y_{t_{k-1}\over \epsilon }^\epsilon ) -\epsilon \sum _{i,j=1}^m\int _{t_{k-1}\over \epsilon }^{{t_k}\over \epsilon } ( L_{Y_i}L_{Y_j} P_{T-t_k}f(y^\epsilon _r) ) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r)\; dr\\&\qquad - \sqrt{\epsilon }\sum _{j=1}^m\sum _{l=1}^{m'} \int _{t_{k-1}\over \epsilon }^{{t_k}\over \epsilon } d P_{T-t_k}f( Y_j(y^\epsilon _r)) \; d\beta _j( X_l(z^\epsilon _r)) \;dW_r^l. \end{aligned}$$

Since \(Y_0, Y_k\in B_{V,0}\), \(L_{Y_i} L_{Y_j} P_{T-t_k}f\in B_{V,0}\), by the same argument as for Lemma 7.1. In particular, for each \(0<\epsilon \le \epsilon _0\),

$$\begin{aligned} \int _0^{t\over \epsilon }{\mathbf E}(| L_{Y_i} L_{Y_j} P_{T-t_k}f(y_r^\epsilon )|)^2 dr <\infty . \end{aligned}$$

The expectation of the martingale term in the above formula vanishes. For \(j=1,\dots , m\) and \(k=1, \dots , N\), let

$$\begin{aligned} A_{jk}^\epsilon= & {} dP_{T-t_k}f ( Y_j (y^\epsilon _{t_k\over \epsilon } ) ) \beta _j (z^\epsilon _{ t_k \over \epsilon } ) - dP_{T-t_k}f (Y_j (y^\epsilon _{t_{k-1}\over \epsilon } ) ) \beta _j ( z^\epsilon _{t_{k-1}\over \epsilon } ),\\ B_{k}^\epsilon= & {} \Delta t_k (P_{T-t_{k}}\bar{{\mathcal {L}}} f) (y_{t_{k-1}\over \epsilon }^\epsilon )-\epsilon \sum _{i,j=1}^m\int _{t_{k-1}\over \epsilon }^{{t_k}\over \epsilon } ( L_{Y_i}L_{Y_j} P_{T-t_k}f )(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \; dr. \end{aligned}$$

Step 4. We recall that \(\bar{{\mathcal {L}}} P_{T-t_k}f=\sum _{i,j=1}^m \overline{\alpha _i\beta _j} L_{Y_i}L_{Y_j}P_{T-t_k}f\). By Theorem 7.2, \(L_{Y_i} L_{Y_j} P_{T-t_k}f\) is \(C^2\). Furthermore, by Assumption 3.1, the \((z_t^\epsilon )\) diffusion has an exponential mixing rate. We apply Corollary 5.3 to each function of the form \(L_{Y_i} L_{Y_j} P_{T-t_k}f\) and take \(h=\alpha _i\beta _j\). There exist a constant \(\tilde{c}\) and a function \(\gamma _{i,j,k,\epsilon }\in B_{V,0}\) such that

$$\begin{aligned} |B_k^\epsilon |\le & {} {\Delta t_k}\sum _{i,j=1}^m \left| \overline{\alpha _i\beta _j}\; L_{Y_i} L_{Y_j} P_{T-t_k}f (y_{t_{k-1}\over \epsilon }^\epsilon )\right. \\&\qquad \left. - {\epsilon \over \Delta t_k} \int _{t_{k-1}\over \epsilon }^{t_k\over \epsilon } {\mathbf E}\left\{ L_{Y_i} L_{Y_j} P_{T-t_k}f (y_r^\epsilon ) (\alpha _i\beta _j)(z_{r}^\epsilon ) \big | {\mathcal {F}}_{t_{k-1}\over \epsilon }\right\} dr\right| \\\le & {} \sum _{i,j=1}^m\tilde{c} |\alpha _i\beta _j|_\infty \gamma _{i,j, k,\epsilon }(y_{t_{k-1}\over \epsilon }^\epsilon ) ({\epsilon ^2}+ (\Delta t_k)^2), \end{aligned}$$

where denoting \(G^k_{i,j}:=L_{Y_i} L_{Y_j} P_{T-t_k}f\),

$$\begin{aligned} \gamma _{i,j,k, \epsilon }=|G^k_{i,j}| + \sum _{l'=1}^m |L_{Y_{l'}}G^k_{i,j}|+\sum _{l,l'=1}^m {\epsilon \over \Delta t_k} \int _{t_{k-1}\over \epsilon }^{t_k\over \epsilon } {\mathbf E}\left\{ |L_{Y_l} L_{Y_{l'}}G^k_{i,j}(y^\epsilon _r) | \; \big |\;{\mathcal {F}}_{t_{k-1}\over \epsilon } \right\} dr. \end{aligned}$$

By Theorem 7.2, \(G^k_{i,j}=L_{Y_i} L_{Y_j} P_{T-t_k}f\) belongs to \(B_{V,2}\). Furthermore, \(G^k_{i,j}\) and its first two derivatives are bounded by a function in \(B_{V,0}\) which depends on f only through \(\sum _{\alpha =0}^4 P_{T-t_k} ( |\nabla ^{(\alpha )}d f|^p)\), for some p. Thus there are numbers c, q such that \(\max _{i,j}|\gamma _{i,j,k, \epsilon }|\le c+cV^q\) for all k. Since \(\Delta t_k\le \epsilon \le 1\) and \(N\sim O({1\over \epsilon })\), summing over k we obtain

$$\begin{aligned} \sum _{k=1}^N {\mathbf E}|B_k^\epsilon | \le 2\epsilon \cdot c\cdot \tilde{c} \sum _{i,j=1}^m |\alpha _i\beta _j|_\infty C_q(T) \sup _k {\mathbf E}(1+V^q (y_{t_{k-1}\over \epsilon }^\epsilon )) \le \epsilon C(T)\tilde{\gamma }(y_0), \end{aligned}$$
(8.7)

for some constant C(T) and some function \(\tilde{\gamma }\) in \(B_{V,0}\). If \(f\in BC^4\), it is easy to see that there is a function \(g\in B_{V,0}\), not depending on f, s.t. \(\max _{i,j,k}{\mathbf E}\gamma _{i,j, k,\epsilon }(y_{t_{k-1}\over \epsilon }^\epsilon ) \le C(T)g(y_0)|f|_{4,\infty }\).

Step 5. Finally, by Lemma 8.4 below, for \(f\in B_{V,3}\) there exist a constant C and a function \(\gamma \in B_{V,0}\), depending on T and f, s.t. for \(0\le s <t \le T\),

$$\begin{aligned}&\left| \sum _{j=1}^m {\mathbf E}df(Y_j (y^\epsilon _{t\over \epsilon } ) )\beta _j (z^\epsilon _{t\over \epsilon } ) -{\mathbf E}df (Y_j (y^\epsilon _{s\over \epsilon } ) ) \beta _j ( z^\epsilon _{s\over \epsilon } ) \right| \nonumber \\&\quad \le C\gamma (y_0)\epsilon \sqrt{|\log \epsilon |}+C\gamma (y_0) (t-s). \end{aligned}$$
(8.8)

For the partition \(t_0<t_1<\dots <t_N\), we assumed that \(t_1-t_0\le \epsilon \) and \(\Delta t_k=\epsilon \) for \(k\ge 2\). Let \(k\ge 2\). Since \(dP_{T-t_k}f(Y_j)\in B_{V,3}\), estimate (8.8) holds also with f replaced by \(dP_{T-t_k}f(Y_j)\), and we have:

$$\begin{aligned} \left| \sum _{j=1}^m \epsilon {\mathbf E}A_{jk}^\epsilon \right| \le C\tilde{\gamma }(y_0)\epsilon ^2 \sqrt{|\log \epsilon |}, \quad k\ge 2 \end{aligned}$$
(8.9)

Since the \(\beta _j\) are bounded and, by Theorem 7.2, \(dP_{T-t_k}f\) is bounded by a function in \(B_{V,0}\) that does not depend on k, for \(\epsilon \le \epsilon _0\) each term \({\mathbf E}|A_{jk}^\epsilon |\) is bounded by a function in \(B_{V,0}\) and \( \sup _{0<\epsilon \le \epsilon _0}|{\mathbf E}A_{jk}^\epsilon |\) is of order \(\epsilon \tilde{\gamma }(y_0)\) for some function \(\tilde{\gamma }\in B_{V,0}\). We may ignore a finite number of terms in the summation; in particular, we need not worry about the term with \(k=1\). Since the sum over k involves \(O({1\over \epsilon })\) terms, the following bound follows from (8.9):

$$\begin{aligned} \sum _{k=1}^N \left| \sum _{j=1}^m \epsilon {\mathbf E}A_{jk}^\epsilon \right| \le C\tilde{\gamma }(y_0)\epsilon \sqrt{|\log \epsilon |}. \end{aligned}$$
(8.10)

Here \(\tilde{\gamma }\in B_{V,0}\) and may depend on f. The case of \(f\in BC^4\) can be treated similarly; the estimate is then of the form \((1+|f|_{4,\infty })\gamma _0\), where \( \gamma _0\in B_{V,0}\) does not depend on f. Putting together (8.3), (8.5), (8.7) and (8.10), we see that if \(f \in B_{V,4}\),

$$\begin{aligned} |{\mathbf E}f(\Phi ^\epsilon _{t\over \epsilon }(y_0)) -P_tf(y_0)|\le C(T) \gamma (y_0)\epsilon \sqrt{|\log \epsilon |}, \end{aligned}$$

where \(\gamma \in B_{V,0}\). If \(f\in BC^4\), collecting the estimates together, we see that there is a constant C(T) s.t.

$$\begin{aligned} |{\mathbf E}f(\Phi ^\epsilon _{t\over \epsilon }(y_0)) -P_tf(y_0)| \le \epsilon \sqrt{|\log \epsilon |}\, C(T)\left( 1+ \sum _{k=1}^4 |\nabla ^{(k-1)}df|_\infty \right) \tilde{\gamma }(y_0) \end{aligned}$$

where \(\tilde{\gamma }\) is a function in \(B_{V,0}\) that does not depend on f. By induction the finite dimensional distributions converge, and hence the required weak convergence follows. The proof is complete. \(\square \)

Lemma 8.3

Assume that (3.1) is complete for all \(\epsilon \in (0, \epsilon _0)\), for some \(\epsilon _0>0\), and that the following conditions hold.

  1. (1)

    \({\mathcal {L}}_0\) is a regularity improving Fredholm operator on a compact manifold G, \(\alpha _k\in C^3\cap N^\perp \).

  2. (2)

There exists \(V\in C^2(M; {\mathbb {R}}_+)\) and constants c, K, s.t.

$$\begin{aligned} \sum _{j=1}^m |L_{Y_j} V| \le c+KV, \quad \sum _{i,j=1}^m |L_{Y_i}L_{Y_j}V| \le c+KV. \end{aligned}$$
  (2’)

    There exists a locally bounded \(V:M\rightarrow {\mathbb {R}}_+\) such that for all \(q\ge 2\) and \(t>0\) there are constants C(t) and \(q'\), with the property that

    $$\begin{aligned} \sup _{s\le u\le t}{\mathbf E}\left\{ (V(y_{u\over \epsilon }^\epsilon ))^q \; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \le C V^{q'}(y^\epsilon _{s\over \epsilon })+C. \end{aligned}$$
    (8.11)
  (3)

    For V in part (2) or in part (2’), \( \sup _{\epsilon } {\mathbf E}V^q(y_0^\epsilon )<\infty \) for all \(q\ge 2\).

For \(f\in C^2\) with the property that \(L_{Y_j} f, L_{Y_i}L_{Y_j}f\in B_{V,0}\) for all i, j, there exists a number \(\epsilon _0>0\) s.t. for every \(0<\epsilon \le \epsilon _0\),

$$\begin{aligned} |{\mathbf E}\{ f (y_{t\over \epsilon }^\epsilon ) \; \big |\; {\mathcal {F}}_{s\over \epsilon } \}- f(y_{s\over \epsilon }^\epsilon ) |\le & {} \gamma _1(y_{s\over \epsilon }^\epsilon )\max _j |\beta _j|_\infty \; \epsilon \\&+\,(t-s) \gamma _2(y_{s\over \epsilon }^\epsilon )\max _i|\alpha _i|_\infty \max _j |\beta _j|_\infty . \end{aligned}$$

Here \(\gamma _1, \gamma _2\in B_{V,0}\) and depend on |f| only through \(|L_{Y_j}f|\) and \(|L_{Y_j}L_{Y_i}f|\). In particular there exists \(\gamma \in B_{V,0}\) s.t. for all \(0<\epsilon \le \epsilon _0\),

$$\begin{aligned} | {\mathbf E}f(y_{t\over \epsilon }^\epsilon )-{\mathbf E}f(y_{s\over \epsilon }^\epsilon )| \le \sup _{0<\epsilon \le \epsilon _0} {\mathbf E}\gamma (y_0^\epsilon )(t-s+\epsilon ). \end{aligned}$$

Furthermore, \(\sup _{0<\epsilon \le \epsilon _0}{\mathbf E}|f(y_{t\over \epsilon }^\epsilon )-f(y_{s\over \epsilon }^\epsilon )|\le (\epsilon +\sqrt{t-s}\,){\mathbf E}\gamma (y_0^\epsilon )\).

Proof

Since the hypotheses of Theorem 5.2 hold, if V is as defined in (2), it satisfies (2’). Since \(L_{Y_j}f\in B_{V,0}\), \(\sup _{s\le t}{\mathbf E}|L_{Y_j}f(y_{s\over \epsilon }^\epsilon )| ^2\) is finite. We apply Lemma 4.1:

$$\begin{aligned}&{\mathbf E}\left\{ f(y^\epsilon _{t\over \epsilon }) \; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} =\ f(y^\epsilon _{s\over \epsilon }) \\&\quad +\, \epsilon \sum _{j=1}^m {\mathbf E}\left\{ ( df(Y_j(y^\epsilon _{t\over \epsilon } ) )\beta _j(z^\epsilon _{t\over \epsilon }) -df(Y_j(y^\epsilon _{s\over \epsilon } ))\beta _j( z^\epsilon _{s\over \epsilon })) \; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} \\&\quad -\,\epsilon \sum _{i,j=1}^m {\mathbf E}\left\{ \int _{s\over \epsilon }^{t\over \epsilon } L_{Y_i}L_{Y_j} f(y^\epsilon _r) \alpha _i(z^\epsilon _r)\;\beta _j(z^\epsilon _r) \;dr \; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} . \end{aligned}$$

Let

$$\begin{aligned}&\gamma _1(y_{s\over \epsilon }^\epsilon )=2 \sup _{s\le r\le t} \sum _{j=1}^m{\mathbf E}\left\{ |L_{Y_j}f(y_{r\over \epsilon }^\epsilon )| \; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} , \\&\gamma _2(y_{s\over \epsilon }^\epsilon ) = \sup _{s\le r\le t} \sum _{i,j=1}^m{\mathbf E}\left\{ |L_{Y_i}L_{Y_j} f(y^\epsilon _{r\over \epsilon })| \; \big |\; {\mathcal {F}}_{s\over \epsilon }\right\} . \end{aligned}$$

Since \(L_{Y_j}f\) and \( L_{Y_i}L_{Y_j} f\in B_{V,0}\), \(\gamma _1, \gamma _2\in B_{V,0}\). The conditional inequality follows, and with it the estimate for \(| {\mathbf E}f(y_{t\over \epsilon }^\epsilon )-{\mathbf E}f(y_{s\over \epsilon }^\epsilon )|\). To estimate \( {\mathbf E}| f(y_{t\over \epsilon }^\epsilon )-f(y_{s\over \epsilon }^\epsilon )|\), we must also involve the diffusion term in (4.1), whence the \(\sqrt{t-s}\) term. \(\square \)

Lemma 8.4

Assume the conditions of Lemma 8.3 and Assumption 3.1. Let \(y_0^\epsilon =y_0\). If \(f\in C^3\) is s.t. \(|L_{Y_j}f|\), \(|L_{Y_i}L_{Y_j}f|\), \(|L_{Y_l}L_{Y_i}L_{Y_j}f|\) belong to \(B_{V,0}\) for all i, j, l, then for some \(\epsilon _0\) and all \(0<\epsilon \le \epsilon _0\), and for all \(0\le \epsilon \le s<t\le T\) where \(T>0\),

$$\begin{aligned}&\left| \sum _{l=1}^m {\mathbf E}df(Y_l(y^\epsilon _{t\over \epsilon } ) )\beta _l(z^\epsilon _{t\over \epsilon }) -{\mathbf E}df(Y_l(y^\epsilon _{s\over \epsilon } )) \beta _l( z^\epsilon _{s\over \epsilon }) \right| \\&\quad \le C(T) \gamma (y_0)\epsilon \sqrt{|\log \epsilon |}+C(T) \gamma (y_0) (t-s), \end{aligned}$$

where \(\gamma \in B_{V,0}\) and C(T) is a constant. If the assumptions of Theorem 8.2 hold, the above estimate holds for any \(f\in B_{V,3}\); if \(f\in BC^3\), we may take \(\gamma =(|f|_{3,\infty }+1)\tilde{\gamma } \) where \(\tilde{\gamma }\in B_{V,0}\).

Proof

Let \(t\le T\). Since \(\beta _l(z^\epsilon _{t\over \epsilon })\) is the highly oscillating term, we expect that averaging the oscillations in \(\beta _l\) gains a factor of \(\epsilon \) in the estimate. We first split the sum:

$$\begin{aligned}&( df (Y_l (y^\epsilon _{t\over \epsilon } ) )\beta _l (z^\epsilon _{t\over \epsilon } ) ) - ( df (Y_l (y^\epsilon _{s\over \epsilon } ) )\beta _l ( z^\epsilon _{s\over \epsilon } ) )\nonumber \\&\quad = df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ( \beta _l (z^\epsilon _{t\over \epsilon } )- \beta _l (z^\epsilon _{s\over \epsilon } ) )\nonumber \\&\qquad + ( df (Y_l (y^\epsilon _{t\over \epsilon } ) )- df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ) \beta _l (z^\epsilon _{t\over \epsilon } )=I_l+II_l. \end{aligned}$$
(8.12)

By Assumption 3.1, \({\mathcal {L}}_0\) has mixing rate \(\psi (r)=ae^{-{\delta r}}\). Let \(s'<s\le t\),

$$\begin{aligned}&|{\mathbf E}df(Y_l(y^\epsilon _{s'\over \epsilon } ))( \beta _l(z^\epsilon _{t\over \epsilon })- \beta _l(z^\epsilon _{s\over \epsilon }))|\\&\quad \le {\mathbf E}\left( \left| df(Y_l(y^\epsilon _{s'\over \epsilon } ) ) \right| \cdot \left| {1\over \epsilon } \int _{s\over \epsilon }^{t\over \epsilon } {\mathbf E}\left\{ \alpha _l (z_r^\epsilon )\big | {\mathcal {F}}_{s'\over \epsilon } \right\} \; dr \right| \right) \\&\quad \le {\mathbf E}| df(Y_l(y^\epsilon _{s'\over \epsilon } )) |{1\over \epsilon } \int _{0}^{t-s\over \epsilon } \psi \left( {r+{s-s'\over \epsilon } \over \epsilon }\right) dr\\&\quad \le {a^2\over \delta } e^{-{\delta (s-s')\over \epsilon ^2}} \; {\mathbf E}| df(Y_l(y^\epsilon _{s'\over \epsilon } ))|. \end{aligned}$$
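For the reader's convenience, the passage from the integral of \(\psi \) to the last line can be checked directly from the explicit form \(\psi (r)=ae^{-\delta r}\):

$$\begin{aligned} {1\over \epsilon } \int _{0}^{t-s\over \epsilon } \psi \left( {r+{s-s'\over \epsilon } \over \epsilon }\right) dr = {a\over \epsilon }\, e^{-{\delta (s-s')\over \epsilon ^2}} \int _{0}^{t-s\over \epsilon } e^{-{\delta r\over \epsilon }}\, dr \le {a\over \delta }\, e^{-{\delta (s-s')\over \epsilon ^2}}, \end{aligned}$$

which is bounded by \({a^2\over \delta } e^{-{\delta (s-s')\over \epsilon ^2}}\) provided \(a\ge 1\), as we may assume by enlarging a if necessary.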

If \(s-s'=\delta _0 \epsilon ^2|\log \epsilon |\), \(\exp (-{\delta (s-s')\over \epsilon ^2})=\epsilon ^{\delta \delta _0}\). We apply Theorem 5.2 to the functions \(L_{Y_l}f\in B_{V,0}\). For a constant \(\epsilon _0>0\),

$$\begin{aligned} {a^2\over \delta }\sup _{0<\epsilon \le \epsilon _0}\sup _{0\le s'\le t} {\mathbf E}| (df(Y_l(y_{s'\over \epsilon }^\epsilon )) ) | \le \tilde{\gamma }_l(y_0) \end{aligned}$$

where \(\tilde{\gamma }_l\) is a function in \(B_{V,0}\), depending on T. Thus for \(s'<s<t\),

$$\begin{aligned} |{\mathbf E}( df (Y_l (y^\epsilon _{s'\over \epsilon } ) ) ( \beta _l (z^\epsilon _{t\over \epsilon } )- \beta _l(z^\epsilon _{s\over \epsilon } ) ) ) | \le \tilde{\gamma }_l(y_0){a^2\over \delta } \exp {\left( -{\delta (s-s')\over \epsilon ^2}\right) }. \end{aligned}$$
(8.13)

Let us split the first term on the right hand side of (8.12). Denoting \(s'=s-{1\over \delta }\epsilon ^2|\log \epsilon |\),

$$\begin{aligned}&I_l= {\mathbf E}df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ( \beta _l (z^\epsilon _{t\over \epsilon } )- \beta _l (z^\epsilon _{s\over \epsilon } ) )\\&\quad = {\mathbf E}df (Y_l (y^\epsilon _{s'\over \epsilon } ) ) ( \beta _l (z^\epsilon _{t\over \epsilon } )- \beta _l (z^\epsilon _{s\over \epsilon } ) ) \\&\qquad +\,{\mathbf E}( ( df (Y_l (y^\epsilon _{s\over \epsilon } ) ) -df (Y_l (y^\epsilon _{s'\over \epsilon } ) ) ) ( \beta _l (z^\epsilon _{t\over \epsilon } )- \beta _l (z^\epsilon _{s\over \epsilon } ) ) ). \end{aligned}$$

The first term on the right hand side is estimated by (8.13). To the second term we take the supremum norm of \(\beta _l\) and use Lemma 8.3. For some \(\tilde{C}(T)\) and \( \gamma \in B_{V,0}\),

$$\begin{aligned} {\mathbf E}| df(Y_l(y^\epsilon _{s\over \epsilon } ) ) -df(Y_l(y^\epsilon _{s'\over \epsilon } )) | \le \tilde{C}(T)\gamma (y_0) \left( \epsilon + {1\over \sqrt{\delta }}\epsilon |\log \epsilon |^{1\over 2}\right) . \end{aligned}$$
(8.14)

Then for some number C(T),

$$\begin{aligned} \sum _l I_l\le {1\over \sqrt{\delta }} \epsilon \sqrt{|\log \epsilon |}C(T)\gamma (y_0) \end{aligned}$$
(8.15)

where \(\gamma \in B_{V,0}\). Let us treat the second term on the right hand side of (8.12). Let \(t'=t-{1\over \delta }\epsilon ^2|\log \epsilon |\). Then

$$\begin{aligned} II_l= & {} {\mathbf E}( df (Y_l (y^\epsilon _{t\over \epsilon } ) )- df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ) \beta _l (z^\epsilon _{t\over \epsilon } ) \\= & {} {\mathbf E}( df (Y_l (y^\epsilon _{t\over \epsilon } ) )- df (Y_l (y^\epsilon _{t'\over \epsilon } ) ) ) \beta _l (z^\epsilon _{t\over \epsilon } ) \\&+\, {\mathbf E}( df (Y_l (y^\epsilon _{t'\over \epsilon } ) )- df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ) \beta _l (z^\epsilon _{t\over \epsilon } ). \end{aligned}$$

To the first term we apply (8.14) and obtain a rate \({1\over \sqrt{\delta }}\epsilon \sqrt{|\log \epsilon |}\). We may assume that \(\beta _l\) averages to zero: subtracting the average \(\bar{\beta }_l\) does not change \(I_l\). Alternatively, Lemma 8.3 provides an estimate of order \(\epsilon \) for \(| {\mathbf E}( df(Y_l(y^\epsilon _{t\over \epsilon } ) )- df(Y_l(y^\epsilon _{s\over \epsilon } ) )) |\). Finally, since \(\int \beta _l\, d\pi =0\),

$$\begin{aligned}&| {\mathbf E}( df (Y_l (y^\epsilon _{t'\over \epsilon } ) )- df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ) \beta _l (z^\epsilon _{t\over \epsilon } ) | \\&\quad = \left| {\mathbf E}( df (Y_l (y^\epsilon _{t'\over \epsilon } ) )- df (Y_l (y^\epsilon _{s\over \epsilon } ) ) ) {\mathbf E}\left\{ \beta _l (z^\epsilon _{t\over \epsilon } ) \; \big | {\mathcal {F}}_{t'\over \epsilon }\right\} \right| \\&\quad \le {\mathbf E}| df (Y_l (y^\epsilon _{t'\over \epsilon } ) )- df (Y_l (y^\epsilon _{s\over \epsilon } ) ) | |\beta _l|_\infty ae^{ -\delta {t-t'\over \epsilon ^2}}\le \gamma _{l} (y_0) |\beta _l|_\infty a\epsilon . \end{aligned}$$

In the last step we used condition (2’) and \(\gamma _l\) is a function in \(B_{V,0}\). We have proved the first assertion.
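For the record, the exponential factor in the last display is evaluated from the choice \(t'=t-{1\over \delta }\epsilon ^2|\log \epsilon |\):

$$\begin{aligned} ae^{-\delta {t-t'\over \epsilon ^2}} = a\exp \left( -\delta \cdot {1\over \delta }\,|\log \epsilon |\right) = ae^{-|\log \epsilon |}=a\epsilon . \end{aligned}$$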

If the assumptions of Theorem 8.2 hold, then for any \(f\in B_{V,3}\) the following functions belong to \(B_{V,0}\): \(|L_{Y_j}f|\), \(|L_{Y_i}L_{Y_j}f|\), and \(|L_{Y_l}L_{Y_i}L_{Y_j}f|\). If \(f\in BC^3\), these functions are clearly controlled by \(|f|_{3,\infty }\) multiplied by a function in \(B_{V,0}\), completing the proof. \(\square \)

9 Rate of convergence in Wasserstein distance

Let \({\mathcal {B}}(M)\) denote the collection of Borel sets in a \(C^k\) smooth Riemannian manifold M with Riemannian distance function \(\rho \), and let \({\mathbb {P}}(M)\) be the space of probability measures on M. Let \(\epsilon \in (0, \epsilon _0)\) where \(\epsilon _0\) is a positive number. If \(P_\epsilon \rightarrow P\) weakly, we may measure the rate of convergence of \(P_\epsilon \) to P in either the total variation distance or the Wasserstein distance, both of which imply weak convergence. The Wasserstein 1-distance is

$$\begin{aligned} d_W(P,Q)=\inf _{ (\pi _1)^*\mu =P, (\pi _2)^*\mu =Q} \int _{M\times M} \rho (x,y)d\mu (x,y). \end{aligned}$$

Here \(\pi _i:M\times M\rightarrow M\), \(i=1,2\), are the projections to the first and second factors respectively, and the infimum is taken over probability measures \(\mu \) on \(M\times M\) that couple P and Q. If the diameter \(\mathrm {diam}(M)\) of M is finite, then the Wasserstein distance is controlled by the total variation distance: \(d_W(P,Q)\le \mathrm {diam}(M)\Vert P-Q\Vert _{TV}\). See Villani [44].

Let us assume that the manifold has bounded geometry; i.e. it has positive injectivity radius \(\mathrm {inj}(M)\), and the curvature tensor and its covariant derivatives are bounded. The exponential map restricted to a ball of radius \(r<\mathrm {inj}(M)\) at a point x defines a chart, through a fixed orthonormal frame at x. Coordinates consisting of such exponential charts are said to be canonical. In canonical coordinates, all transition functions have bounded derivatives of all orders. That f is bounded in \(C^k\) can then be formulated as follows: in any canonical coordinates, \(|\partial ^\lambda f| \) is bounded for every multi-index \(\lambda \) of order at most k. The following types of manifolds have bounded geometry: Lie groups, homogeneous spaces with invariant metrics, and Riemannian covering spaces of compact manifolds.

In the lemma below we deduce, from the convergence rate of \(P_\epsilon \) to P in the \((C^k)^*\) norm, a rate in the Wasserstein distance. Let \(\rho \) be the Riemannian distance, with reference to which we speak of Lipschitz continuity of a real valued function on M and of the Wasserstein distance on \({\mathbb {P}}(M)\). If \(\xi \) is a random variable we denote by \(\hat{P}_\xi \) its probability distribution, and by \(|f|_{\mathrm {Lip}}\) the Lipschitz constant of a function f. Let \(p\in M\) and \(|f|_{C^k}=|f|_\infty + \sum _{j=0}^{k-1} |\nabla ^jdf|_\infty \).

Lemma 9.1

Let \(\xi _1\) and \(\xi _2\) be random variables on a \(C^k\) manifold M, where \(k\ge 1\), with bounded geometry. Suppose that for a reference point \(p\in M\), \(c_0:=\sum _{i=1}^2{\mathbf E}\rho ^2(\xi _i, p) \) is finite. Suppose that there exist numbers \(c\ge 0, \alpha \in (0,1), \epsilon \in (0, 1]\) s.t. for \(g\in BC^k\),

$$\begin{aligned} |{\mathbf E}g(\xi _1)-{\mathbf E}g(\xi _2)|\le c\epsilon ^\alpha (1+ |g|_{C^k}). \end{aligned}$$

Then there is a constant C, depending only on the geometry of the manifold, s.t.

$$\begin{aligned} d_W(\hat{P}_{\xi _1}, \hat{P}_{\xi _2})\le C(c_0+c)\epsilon ^{\alpha \over k}. \end{aligned}$$

Proof

If \(k=1\), this is clear. Let us take \(k\ge 2\) and let \(f:M\rightarrow {\mathbb {R}}\) be a Lipschitz continuous function with Lipschitz constant 1. Since we are concerned only with the difference of the values of f at two points, \(|{\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)|\), we first shift f so that its value at the reference point is zero. By the Lipschitz continuity of f, \(|f(x) | \le |f|_{\mathrm {Lip}}\;\rho (x,p)\). We may also assume that f is bounded; if not we define a family of functions \(f_n=(f\wedge n )\vee (-n)\). Then \(f_n\) is Lipschitz continuous with its Lipschitz constant bounded by \(|f|_{\mathrm {Lip}}\). Let \(i=1,2\). The correction term \((f-f_n)(\xi _i)\) can be easily controlled by the second moment of \(\rho (p, \xi _i)\):

$$\begin{aligned} {\mathbf E}|(f-f_n)(\xi _i)| \le {\mathbf E}|f(\xi _i)|{\mathbf 1}_{\{|f(\xi _i)|>n\}} \le {1\over n} {\mathbf E}f(\xi _i)^2 \le {1\over n} {\mathbf E}\rho ^2(p, \xi _i). \end{aligned}$$

Let \(\eta : {\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a smooth function supported in the unit ball with \(|\eta |_{L_1}=1\), and set \(\eta _\delta (x)=\delta ^{-n}\eta ({x\over \delta })\), where \(\delta \) is a positive number and n is the dimension of the manifold. If \(M={\mathbb {R}}^n\),

$$\begin{aligned}&\left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right| \\&\quad \le \left| {\mathbf E}(f*\eta _\delta )(\xi _1)-{\mathbf E}(f*\eta _\delta )(\xi _2)\right| +\sum _{i=1}^2 \left| {\mathbf E}(f*\eta _\delta )(\xi _i)-{\mathbf E}f(\xi _i) \right| \\&\quad \le c\epsilon ^\alpha (1+ |f*\eta _\delta |_{C^k})+2\delta |f|_{\mathrm {Lip}}. \end{aligned}$$

In the last step we used the assumption on \({\mathbf E}|f*\eta _\delta (\xi _1)-f*\eta _\delta (\xi _2)|\) for the \(BC^k\) function \(f*\eta _\delta \). By distributing the derivatives to \(\eta _\delta \) we see that the norm of the first k derivatives of \(f*\eta _\delta \) are controlled by \(|f|_{\mathrm {Lip}}\). If f is bounded,

$$\begin{aligned} c\epsilon ^\alpha (1+ |f*\eta _\delta |_{C^k})\le c \epsilon ^\alpha (1+ |f|_\infty +c_1 \delta ^{-k+1}|f|_{\mathrm {Lip}}), \end{aligned}$$

where \(c_1\) is a combinatorial constant. To summarize, for all Lipschitz continuous f with \(|f|_{\mathrm {Lip}}=1\),

$$\begin{aligned} \left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right|\le & {} 2\delta |f|_{\mathrm {Lip}}+ c\epsilon ^\alpha (1+|f_n*\eta _\delta |_{C^k})+{c_0\over n}\\\le & {} 2 \delta +c\epsilon ^\alpha + c\epsilon ^\alpha n+ c_1c\epsilon ^\alpha \delta ^{-k+1} +{c_0\over n}. \end{aligned}$$

Let \(\delta =\epsilon ^{\alpha \over k}\). Since \(k\ge 2\), we choose n with the property \(\epsilon ^{-{\alpha \over k}}\le n\le 2\epsilon ^{-\alpha +{\alpha \over k}}\), then for f with \(|f|_{\mathrm {Lip}}=1\),

$$\begin{aligned} \left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right| \le (2+2c+c_1c+2c_0)\epsilon ^{\alpha \over k}. \end{aligned}$$
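The mollifier estimate used above (“distributing the derivatives to \(\eta _\delta \)”) is standard; a sketch: for a multi-index \(\lambda \) with \(1\le |\lambda |\le k\), write \(\lambda =e_i+\lambda '\), so that

$$\begin{aligned} \partial ^\lambda (f*\eta _\delta )=(\partial _i f)*\partial ^{\lambda '}\eta _\delta , \qquad |\partial ^{\lambda '}\eta _\delta |_{L^1}=\delta ^{-|\lambda '|}\,|\partial ^{\lambda '}\eta |_{L^1}, \end{aligned}$$

whence \(|\partial ^\lambda (f*\eta _\delta )|_\infty \le |f|_{\mathrm {Lip}}\,\delta ^{-|\lambda |+1}\,|\partial ^{\lambda '}\eta |_{L^1}\). This is the source of the factor \(c_1\delta ^{-k+1}|f|_{\mathrm {Lip}}\), with \(c_1\) a combinatorial constant depending only on \(\eta \) and k.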

Let \(\delta \) be a positive number with \(4\delta <\mathrm {inj}(M)\). Let \(B_x(r)\) denote the geodesic ball centred at x with radius r, whose Riemannian volume is denoted by V(x, r). There is a countable sequence \(\{x_i\}\) in M with the following properties: (1) \(\{B_{x_i}(\delta )\}\) covers M; (2) there is a natural number N such that any point y belongs to at most N balls from \(\{B_{x_i}(3 \delta )\} \); i.e. the cover \(\{B_{x_i}(3 \delta )\} \) has finite multiplicity. Moreover, N is independent of \(\delta \). See Shubin [40]. To see that N is independent of \(\delta \), let us choose a sequence \(\{x_i, i\ge 1\}\) in M with the property that \(\{B_{x_i}( \delta )\}\) covers M and \(\{B_{x_i}({\delta \over 2})\}\) are pairwise disjoint. Since the curvature tensor and its derivatives are bounded, there is a positive number C such that

$$\begin{aligned} {1\over C}\le {V(x,r)\over V(y,r)}\le C, \quad x,y \in M, r\in (0, 4\delta ). \end{aligned}$$

Let \(y\in M\) be a fixed point that belongs to N balls of the form \(B_{x_i}(3\delta )\). Since each \(B_{x_i}({\delta \over 2})\subset B(y, 4\delta )\), the sum of their volumes satisfies \(\sum V(x_i, {\delta \over 2}) \le V(y, 4\delta )\), and hence \({N\over C} V(y, {\delta \over 2})\le V(y, 4\delta )\). The ratio \(\sup _y{V(y, 4\delta )\over V(y, {\delta \over 2})}\) depends only on the dimension of the manifold.
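Combining the displayed inequality with the volume comparison gives the multiplicity bound explicitly:

$$\begin{aligned} N \le C\, {V(y, 4\delta )\over V(y, {\delta \over 2})} \le C \sup _y {V(y, 4\delta )\over V(y, {\delta \over 2})}, \end{aligned}$$

a bound independent of \(\delta \), as claimed.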

Let us take a \(C^k\) smooth partition of unity \(\{\alpha _i, i \in \Lambda \}\) subordinate to \(\{B_{x_i}(2\delta )\} \): \(1=\sum _{i\in \Lambda }\alpha _i\), \(\alpha _i\ge 0\), each \(\alpha _i\) is supported in \(B_{x_i}(2\delta )\), and for any point x only finitely many summands in \(\sum _{i\in \Lambda } \alpha _i(x)\) are non-zero. The partition of unity satisfies the additional property \(\sup _i|\partial ^\lambda \alpha _i|\le C_\lambda \) for every multi-index \(\lambda \) of order at most k.

Let \((B_{x_i}(\mathrm {inj}(M)), \phi _i)\) be the geodesic charts. Let \(f_i=f\alpha _i\) and let \(\tilde{g}=g\circ \phi _i\) denote the representation of a function g in a chart.

$$\begin{aligned} \left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right|= & {} \left| \sum _{i\in \Lambda } {\mathbf E}\tilde{f}_i ( \phi ^{-1}_i(\xi _1))-\sum _{i\in \Lambda }{\mathbf E}\tilde{f}_i ( \phi ^{-1}_i(\xi _2))\right| \\\le & {} \left| \sum _{i\in \Lambda } {\mathbf E}\tilde{f}_i*\eta _{\delta } ( \phi ^{-1}_i(\xi _1))-\sum _{i\in \Lambda }{\mathbf E}\tilde{f}_i*\eta _{\delta }( \phi ^{-1}_i(\xi _2))\right| \\&+ \sum _{j=1}^2\left| \sum _{i\in \Lambda } {\mathbf E}\tilde{f}_i*\eta _{\delta } ( \phi ^{-1}_i(\xi _j))-\sum _{i\in \Lambda }{\mathbf E}\tilde{f}_i ( \phi ^{-1}_i(\xi _j))\right| . \end{aligned}$$

It is crucial to note that there are at most N non-zero terms in the summation. By the assumption, for each i,

$$\begin{aligned} | {\mathbf E}\tilde{f}_i*\eta _{\delta } ( \phi ^{-1}_i(\xi _1))-{\mathbf E}\tilde{f}_i*\eta _{\delta }( \phi ^{-1}_i(\xi _2))| \le c\epsilon ^\alpha | \tilde{f}_i*\eta _{\delta }\circ \phi _i^{-1}|_{C^k}. \end{aligned}$$

By construction, \(\sup _i|\alpha _i|_{C^k}\) is bounded. There is a constant \(c'\) that depends only on the partition of unity, such that

$$\begin{aligned} | \tilde{f}_i*\eta _{\delta }\circ \phi _i^{-1}|_{C^k} \le c'| \tilde{f}_i*\eta _{\delta }|_{C^k} \le c' |\tilde{f}_i|_\infty +c'c_1\delta ^{1-k}|\tilde{f}_i|_{\mathrm {Lip}}. \end{aligned}$$

Similarly, for the second summation we work with the representatives of \(f_i\):

$$\begin{aligned} | \tilde{f}_i*\eta _{\delta } ( \phi ^{-1}_i(y))-\tilde{f}_i( \phi ^{-1}_i(y))| \le \delta |\tilde{f}_i|_{\mathrm {Lip}}\le c'\delta . \end{aligned}$$

Since we work in geodesic charts, the Lipschitz constants of the \(\tilde{f}_i\) are comparable to \(|f|_{\mathrm {Lip}}\). Let \(|f|_{\mathrm {Lip}}=1\). If f is bounded,

$$\begin{aligned} \left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right| \le N c\epsilon ^\alpha (1+c' | f|_\infty +c'\delta ^{1-k})+2 c'\delta N. \end{aligned}$$

Let \(\delta =\epsilon ^{\alpha \over k}\),

$$\begin{aligned} \left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right| \le Nc\epsilon ^\alpha (c' | f|_\infty +1)+ Nc' \epsilon ^{\alpha \over k}+2c'N\epsilon ^{\alpha \over k}. \end{aligned}$$

On a compact manifold \(|f|_\infty \) can be controlled by \(|f|_{\mathrm {Lip}}\); otherwise we use the cut-off function \(f_n\) in place of f and the estimate \({\mathbf E}|(f-f_n)(\xi _i)| \le {c_0\over n} \). Choosing n sufficiently large, as before, we see that \( \left| {\mathbf E}f(\xi _1)-{\mathbf E}f(\xi _2)\right| \le C\epsilon ^{\alpha \over k}\). Finally we apply the Kantorovich–Rubinstein duality theorem,

$$\begin{aligned} d_W(\hat{P}_{\xi _1},\hat{P}_{\xi _2})=\sup _{f: |f|_{\mathrm {Lip}}\le 1 }\left\{ |{\mathbf E}f(\xi _1) -{\mathbf E}f(\xi _2)| \right\} \le C\epsilon ^{\alpha \over k}, \end{aligned}$$

to obtain the required estimate on the Wasserstein 1-distance, concluding the proof. \(\square \)

Let \( \mathrm{ev}_t: C([0,T]; M)\rightarrow M\) denote the evaluation map at time t: \( \mathrm{ev}_t(\sigma )=\sigma (t)\). Let \(\hat{P}_{\xi }\) denote the probability distribution of a random variable \(\xi \). Let \(o\in M\).

Proposition 9.2

Assume the conditions and notations of Theorem 8.2. Suppose that M has bounded geometry and \(\rho _o^2 \in B_{V,0}\). Let \(\bar{\mu }\) be the limit measure and \(\bar{\mu }_t=(\mathrm{ev}_t)_*\bar{\mu }\). Then for every \(r<{1\over 4}\) there exist \(C(T)\in B_{V,0}\) and \(\epsilon _0>0\) s.t. for all \(\epsilon \le \epsilon _0\) and \(t\le T\),

$$\begin{aligned} d_W(\hat{P}_{y^\epsilon _{t\over \epsilon }}, \bar{\mu }_t)\le C(T)\epsilon ^{r}. \end{aligned}$$

Proof

By Theorem 8.2, for \(f\in BC^4\),

$$\begin{aligned} |{\mathbf E}f(\Phi ^\epsilon _{t\over \epsilon }(y_0)) -P_tf(y_0)|\le C(T)(y_0)\epsilon \sqrt{|\log \epsilon |}, \end{aligned}$$

where \(C(T) (y_0)\le \tilde{C}(T)(y_0) (1+|f|_{C^4})\) for some function \(\tilde{C}(T)\in B_{V,0}\). Since, by Theorem 5.2, there exists \(\epsilon _0>0\) such that \(\sup _{\epsilon \le \epsilon _0} {\mathbf E}\rho ^2_o (\Phi _t^\epsilon (y_0))\) is finite, we may take \(\alpha \) in Lemma 9.1 to be any number less than 1, which concludes the proposition. \(\square \)