1 Introduction

In recent years, McKean-Vlasov stochastic differential equations (MV-SDEs for short) have received increasing attention from researchers. They are also called mean-field SDEs or distribution-dependent SDEs, and they are considerably more involved than classical SDEs because the drift and diffusion coefficients depend on both the solution and its law. Equations of this kind play an important role in characterising non-linear Fokker-Planck equations and environment-dependent financial systems, see [9, 10, 12, 13, 20, 23, 24] and references therein. They have also been applied to characterise partial differential equations (PDEs for short) involving the Lions derivative (L-derivative for short), which was introduced by P.-L. Lions in his lecture notes [6]; see also [5, 7, 14, 16, 21, 22] for more details. Additionally, the analysis of stochastic particle systems (MV-SDEs arise as limits of such systems as the number of particles tends to infinity) has developed into a crucial mathematical tool for modelling economic and financial systems.

It is well known that the key point of a large deviation principle (LDP for short) is to quantify the probability of a rare event, see [1, 4, 11, 15, 26]. In the case of stochastic processes, the idea is to find a deterministic path around which the diffusion is concentrated with high probability, so that the stochastic motion can be interpreted as a small perturbation of this deterministic path. There are two main approaches to establishing LDPs: one is the weak convergence method, and the other is based on exponential approximation arguments.

Compared with the theory of the LDP, the central limit theorem (CLT for short) concerns the asymptotic behaviour of the stochastic motion around the corresponding deterministic path at the smallest deviation scale. As with the LDP and the CLT, the theory of the moderate deviation principle (MDP for short) has recently attracted a lot of attention. For instance, the MDP for \(2D\) stochastic convective Brinkman-Forchheimer equations is established in [19]. The authors in [18] investigated large and moderate deviation principles for McKean-Vlasov SDEs with jumps. For more details, we refer to [2, 15] and references therein. It is worth noting that the MDP is concerned with probabilities of a smaller order than those in the LDP, and its deviation scale fills the gap between the CLT scale and the LDP scale.

In this paper, we investigate the CLT and the MDP for solutions of MV-SDEs by using the weak convergence approach. More precisely, we first show that the law of the solution to a good approximation SDE of the underlying MV-SDE satisfies an LDP via the weak convergence method. It is worth noting that the weak convergence approach yields a convenient representation formula for the rate function. Secondly, we show that the solution to the approximation SDE and the solution to the MV-SDE are exponentially equivalent as the deviation scale tends to zero.

To introduce the main results, we recall some preliminaries.

Let \(|\cdot |\) and \(\langle \cdot ,\cdot \rangle \) denote the Euclidean norm and inner product in \(\mathbb{R}^{d}\), respectively. Consider the Cameron-Martin space

$$ \mathbb{H}=\Big\{ h\in C ([0,T];\mathbb{R}^{d}): h(0)={\mathbf{0}},\ h \text{ is absolutely continuous and } \|h\|_{\mathbb{H}}:=\Big(\int _{0}^{T}|\dot{h}(t)|^{2} \mathrm{d}t\Big)^{\frac{1}{2}}< \infty \Big\} , $$

where \({\mathbf{0}}\) denotes the zero vector in \(\mathbb{R}^{d}\).
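
For readers who wish to experiment numerically, the ℍ-norm of a discretised path can be approximated as in the following minimal sketch; the grid, the helper name `cameron_martin_norm` and the test path are our own illustrative choices, not part of the paper.

```python
import numpy as np

def cameron_martin_norm(h, T):
    """Approximate the Cameron-Martin norm ||h||_H of a sampled path.

    h : array of shape (n + 1, d) holding h(t_k) on a uniform grid of [0, T],
        with h[0] = 0. The derivative is approximated by forward differences,
        so the return value approximates (int_0^T |h'(t)|^2 dt)^{1/2}.
    """
    n = h.shape[0] - 1
    dt = T / n
    hdot = np.diff(h, axis=0) / dt      # piecewise-constant approximation of h'
    return np.sqrt(np.sum(hdot ** 2) * dt)

# Example: h(t) = (t, sin t) on [0, 1]; the exact norm is
# (int_0^1 (1 + cos^2 t) dt)^{1/2}.
t = np.linspace(0.0, 1.0, 1001)
h = np.stack([t, np.sin(t)], axis=1)
print(cameron_martin_norm(h, T=1.0))
```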

Let \(\mathscr{A}\) denote the class of \(\mathbb{R}^{d}\)-valued \(\{\mathscr{F}_{t}\}\)-predictable processes \(h(\omega ,\cdot )\) that belong to ℍ a.s. For each \(N>0\), let

$$ S_{N}:=\Big\{ h\in \mathbb{H};~\int _{0}^{T}|\dot{h}(s)|^{2} \mathrm{d}s\le N\Big\} . $$

\(S_{N}\) is endowed with the weak topology induced from ℍ. Define

$$ \mathscr{A}_{N}:=\{h\in \mathscr{A},h(\omega ,\cdot )\in S_{N},~ \mathbb{P}-a.s.\}. $$

In the sequel, we recall the definition of the \(L\)-derivative (see [22] for more details). Let \(\mathscr{P}_{2}(\mathbb{R}^{d})\) be the set of all probability measures on \(\mathbb{R}^{d}\) with finite second moment, i.e.

$$ \mathscr{P}_{2}(\mathbb{R}^{d})=\Big\{ \mu \in \mathscr{P}(\mathbb{R}^{d}) :\mu (|\cdot |^{2})=\int _{\mathbb{R}^{d}}|x|^{2}\mu (\mathrm{d}x)< \infty \Big\} , $$

where \(\mu (f):=\int f\mathrm{d}\mu \) for a measurable function \(f\). Then \(\mathscr{P}_{2}(\mathbb{R}^{d})\) is a Polish space under the Wasserstein distance

$$\begin{aligned} \mathbb{W}_{2}(\mu ,\nu ):=\inf _{\pi \in \mathscr{C}(\mu ,\nu )} \Big(\int _{\mathbb{R}^{d}\times \mathbb{R}^{d}}|x-y|^{2}\pi ( \mathrm{d}x,\mathrm{d}y)\Big)^{\frac{1}{2}}, \mu ,\nu \in \mathscr{P}_{2}(\mathbb{R}^{d}), \end{aligned}$$

where \(\mathscr{C}(\mu ,\nu )\) is the set of couplings for \(\mu \) and \(\nu \).
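
In dimension \(d=1\), the Wasserstein distance between two empirical measures with the same number of atoms is attained by the monotone (sorted) coupling, which gives a quick way to compute \(\mathbb{W}_{2}\) numerically. The following sketch is our own illustration of the definition above, not part of the paper.

```python
import numpy as np

def empirical_w2_1d(x, y):
    """W_2 between two empirical measures on R with the same number of atoms.

    In d = 1 the optimal coupling pairs the sorted samples, so
    W_2^2 = (1/n) * sum_i |x_(i) - y_(i)|^2.
    """
    x, y = np.sort(x), np.sort(y)
    return np.sqrt(np.mean((x - y) ** 2))

rng = np.random.default_rng(0)
mu_samples = rng.normal(0.0, 1.0, size=10_000)   # samples from N(0, 1)
nu_samples = rng.normal(1.0, 1.0, size=10_000)   # samples from N(1, 1)
# For two Gaussians with equal variance, W_2 equals the distance of the means (= 1).
print(empirical_w2_1d(mu_samples, nu_samples))
```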

For any \(\mu \in \mathscr{P}_{2}(\mathbb{R}^{d})\), the tangent space at \(\mu \) is given by

$$\begin{aligned} T_{\mu ,2}=L^{2}(\mathbb{R}^{d}\rightarrow \mathbb{R}^{d};\mu ):=\{ \phi :\mathbb{R}^{d}\rightarrow \mathbb{R}^{d}~ \text{is measurable with} ~\mu (|\phi |^{2})< \infty \}. \end{aligned}$$

For \(\phi \in T_{\mu ,2}\), we set \(\|\phi \|_{T_{\mu ,2}}^{2}=\int _{\mathbb{R}^{d}}|\phi (x)|^{2}\mu ( \mathrm{d}x)\).

Definition 1.1

Let \(f:\mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}\) be a continuous function, and let \(Id\) be the identity map on \(\mathbb{R}^{d}\).

  1. (1)

    \(f\) is called intrinsically differentiable at a point \(\mu \in \mathscr{P}_{2}(\mathbb{R}^{d})\), if

    $$\begin{aligned} T_{\mu ,2}\ni \phi \mapsto D_{\phi }^{L}f(\mu ):=\lim _{\epsilon \downarrow 0} \frac{f(\mu \circ (Id+\epsilon \phi )^{-1})-f(\mu )}{\epsilon }\in \mathbb{R} \end{aligned}$$

    is a well-defined bounded linear functional. In this case, by the Riesz representation theorem, the unique element \(D^{L}f(\mu )\in T_{\mu ,2}\) satisfying

    $$\begin{aligned} \langle D^{L}f(\mu ),\phi \rangle _{T_{\mu ,2}}:=\int _{\mathbb{R}^{d}} \langle D^{L}f(\mu )(x),\phi (x)\rangle \mu (\mathrm{d}x)=D_{\phi }^{L}f( \mu ), ~~\phi \in T_{\mu ,2}, \end{aligned}$$

    is called the intrinsic derivative of \(f\) at \(\mu \).

    If, moreover,

    $$\begin{aligned} \lim _{\|\phi \|_{T_{\mu ,2}}\downarrow 0} \frac{|f(\mu \circ (Id+\phi )^{-1})-f(\mu )-D_{\phi }^{L}f(\mu )|}{\|\phi \|_{T_{\mu ,2}}}=0, \end{aligned}$$

    then \(f\) is called \(L\)-differentiable at \(\mu \) with the \(L\)-derivative (i.e. Lions derivative) \(D^{L}f(\mu )\).

  2. (2)

    We write \(f\in C^{1}(\mathscr{P}_{2}(\mathbb{R}^{d}))\) if \(f\) is \(L\)-differentiable at any point \(\mu \in \mathscr{P}_{2}(\mathbb{R}^{d})\), and the \(L\)-derivative has a version \(D^{L}f(\mu )(x)\) jointly continuous in \((x,\mu )\in \mathbb{R}^{d}\times \mathscr{P}_{2}(\mathbb{R}^{d})\). If moreover, \(D^{L}f(\mu )(x)\) is bounded, we denote \(f\in C_{b}^{1}(\mathscr{P}_{2}(\mathbb{R}^{d}))\).

For a vector-valued function \(f=(f_{i})\), or a matrix-valued function \(f=(f_{ij})\) with \(L\)-differentiable components, we write

$$\begin{aligned} D_{\phi }^{L}f(\mu )=(D_{\phi }^{L}f_{i}(\mu )), ~\text{or}~D_{\phi }^{L}f( \mu )=(D_{\phi }^{L}f_{ij}(\mu )),~\mu \in \mathscr{P}_{2}(\mathbb{R}^{d}). \end{aligned}$$
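The intrinsic derivative in Definition 1.1 can be probed by finite differences: push the samples of \(\mu \) forward by \(Id+\epsilon \phi \) and difference. The sketch below is our own, in \(d=1\), with the illustrative choices \(g=\tanh \) and \(\phi =\sin \); it checks numerically the identity \(D^{L}\mu (g)=\nabla g\), which is verified analytically in Example 2.3 below.

```python
import numpy as np

# Finite-difference check of Definition 1.1 for f(mu) = mu(g), d = 1.
# Expected: D_phi^L f(mu) = E<g'(X), phi(X)> with X ~ mu, i.e. D^L f(mu) = g'.
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)        # Monte Carlo samples of mu = N(0, 1)
g = np.tanh                         # a C_b^2 test function
gprime = lambda z: 1.0 - np.tanh(z) ** 2
phi = np.sin                        # a direction phi in T_{mu,2}

eps = 1e-5
# mu o (Id + eps*phi)^{-1} is the law of x + eps*phi(x), so f evaluated at the
# pushforward is estimated by the empirical mean of g(x + eps*phi(x)).
lhs = (np.mean(g(x + eps * phi(x))) - np.mean(g(x))) / eps
rhs = np.mean(gprime(x) * phi(x))   # <D^L f(mu), phi>_{T_{mu,2}}
print(lhs, rhs)                     # agree up to O(eps) + Monte Carlo error
```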

In this paper, we use the symbol “⇒” to denote convergence in distribution.

The following uniform LDP criterion was presented in [17].

Lemma 1.1

For any \(\epsilon >0\), let \(\Gamma ^{\epsilon }\) be a measurable mapping from \(C([0,T];\mathbb{R}^{d})\) into \(C([0,T];\mathbb{R}^{d})\). Suppose that \(\{\Gamma ^{\epsilon }\}_{\epsilon >0}\) satisfies the following assumptions: there exists a measurable map \(\Gamma ^{0}:C([0,T];\mathbb{R}^{d})\rightarrow C([0,T];\mathbb{R}^{d})\) such that

  1. (a)

    For every \(N<+\infty \) and any family \(\{h_{\epsilon };\epsilon >0\}\subset \mathscr{A}_{N}\) such that \(h_{\epsilon }\) converges in distribution, as \(S_{N}\)-valued random variables, to \(h\) as \(\epsilon \rightarrow 0\), we have

    $$ \Gamma ^{\epsilon }\Big(W_{\cdot }+\frac{1}{\sqrt{\epsilon }}\int _{0}^{\cdot }\dot{h}_{\epsilon }(s)\mathrm{d}s\Big)\Rightarrow \Gamma ^{0} \Big(\int _{0}^{\cdot }\dot{h}(s)\mathrm{d}s\Big) \textit{ as } \epsilon \rightarrow 0. $$
  2. (b)

    For every \(N<+\infty \), the set \(\{\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s)\mathrm{d}s); h\in S_{N}\}\) is a compact subset of \(C([0,T];\mathbb{R}^{d})\).

Then the family \(\{\Gamma ^{\epsilon }\}_{\epsilon >0}\) satisfies a large deviation principle in \(C([0,T];\mathbb{R}^{d})\) with the rate function \(I\) given by

$$ I(g):=\inf _{h\in \mathbb{H};g=\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s) \mathrm{d}s)} \Big\{ \frac{1}{2}\int _{0}^{T}|\dot{h}(s)|^{2} \mathrm{d}s\Big\} ,~~g\in C([0,T];\mathbb{R}^{d}) $$
(1.1)

with \(\inf \emptyset =\infty \) by convention.
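
As a sanity check of (1.1), consider the simplest Schilder-type situation; this is our own illustration, not part of the lemma. Suppose \(\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s)\mathrm{d}s)(t)=\int _{0}^{t}\sigma \dot{h}(s)\mathrm{d}s\) for a constant invertible matrix \(\sigma \). Then \(g=\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s)\mathrm{d}s)\) forces \(\dot{h}=\sigma ^{-1}\dot{g}\), and (1.1) evaluates to

$$ I(g)=\frac{1}{2}\int _{0}^{T}|\sigma ^{-1}\dot{g}(s)|^{2}\mathrm{d}s $$

for absolutely continuous \(g\) with \(g(0)={\mathbf{0}}\), and \(I(g)=\infty \) otherwise, recovering the classical rate function of \(\sqrt{\epsilon }\sigma W\).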

The rest of the paper is organised as follows. In Sect. 2 we present the main results, Theorems 2.1 and 2.2; Sects. 3 and 4 are devoted to the proofs of Theorems 2.1 and 2.2, respectively.

Throughout this paper, we let \(C(\alpha , \beta )\) stand for a generic constant which depends on the parameters \(\alpha , \beta \) and may change from occurrence to occurrence. For \(x\in \mathbb{R}^{d}\), \(\delta _{x}\) stands for the Dirac measure at \(x\). Let \(\|\cdot \|\) denote the operator norm for linear operators. Moreover, we use \(A\lesssim B\) to denote \(A\le cB\) for some constant \(c>0\), and \(a\vee b=\max \{a,b\}\).

2 Main Results

We are interested in the MV-SDE on \((\mathbb{R}^{d},\langle \cdot ,\cdot \rangle ,|\cdot |)\) as follows:

$$ \mathrm{d}X_{t}^{\epsilon }=b_{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})\mathrm{d}t +\sqrt{\epsilon }\sigma _{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})\mathrm{d}W_{t}, ~~X_{0}^{\epsilon }=x, $$
(2.1)

where \(\epsilon >0\) is called the scaling parameter. Here \(W_{t}\) is a \(d\)-dimensional Brownian motion defined on a complete filtered probability space \((\Omega , \mathscr{F}, \{\mathscr{F}_{t}\}_{t\ge 0}, \mathbb{P})\), and \(\mathscr{L}_{X_{t}^{\epsilon }}\) denotes the law of \(X_{t}^{\epsilon }\). We assume that the coefficients \(b\) and \(\sigma \) satisfy the following conditions:

(H1):

The coefficients \(b:[0,\infty )\times \mathbb{R}^{d}\times \mathscr{P}_{2}(\mathbb{R}^{d}) \rightarrow \mathbb{R}^{d}\), \(\sigma :[0,\infty )\times \mathbb{R}^{d} \times \mathscr{P}_{2}(\mathbb{R}^{d})\rightarrow \mathbb{R}^{d\otimes d}\) are continuous. There exists an increasing function \(K:[0,\infty )\rightarrow [0,\infty )\) such that

$$\begin{aligned} &\max \{\|\nabla b_{t}(\cdot ,\mu )(x)\|,\|D^{L}b_{t}(x,\cdot )(\mu ) \|_{T_{\mu , 2}}\}\le K(t), ~~t\ge 0,x\in \mathbb{R}^{d},\mu \in \mathscr{P}_{2}(\mathbb{R}^{d}), \end{aligned}$$
(2.2)
$$\begin{aligned} &\|\sigma _{t}(x,\mu )-\sigma _{t}(y,\nu )\| \le K(t)(|x-y|+ \mathbb{W}_{2}(\mu ,\nu )),~t\ge 0,x,y\in \mathbb{R}^{d},\mu ,\nu \in \mathscr{P}_{2}(\mathbb{R}^{d}), \end{aligned}$$
(2.3)

and

$$ |b_{t}({\mathbf{0}},\delta _{{\mathbf{0}}})|+\|\sigma _{t}({\mathbf{0}},\delta _{{ \mathbf{0}}})\| \le K(t),~~t\ge 0. $$
(2.4)
(H2):

The coefficient \(b_{t}(x,\mu )\) is differentiable with respect to \(x\) and \(\mu \), and its derivatives satisfy

$$ \|\nabla b_{t}(\cdot ,\mu )(x)-\nabla b_{t}(\cdot ,\nu )(y) \|\le K(t)(|x-y|+\mathbb{W}_{2}(\mu ,\nu )), $$
(2.5)
$$ \begin{aligned} &\Big|\mathbb{E}\langle D^{L}b_{t}(x,\cdot )(\mathscr{L}_{X})(X), \phi \rangle -\mathbb{E}\langle D^{L}b_{t}(y,\cdot )(\mathscr{L}_{Y})(Y), \phi \rangle \Big| \\ &\le K(t)\big(|x-y|+\mathbb{W}_{2}(\mathscr{L}_{X},\mathscr{L}_{Y})+( \mathbb{E}|X-Y|^{2})^{1/2}\big)(\mathbb{E}|\phi |^{2})^{\frac{1}{2}}, \end{aligned} $$

for all \(t\ge 0\), \(x,y\in \mathbb{R}^{d}\), \(\mu ,\nu \in \mathscr{P}_{2}(\mathbb{R}^{d})\), and \(X,Y,\phi \in L^{2}(\Omega \rightarrow \mathbb{R}^{d},\mathbb{P})\).
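
Under (H1) the MV-SDE (2.1) is well posed (existence and uniqueness are recalled in Sect. 3). For intuition, its solution can be approximated by the standard interacting particle system, replacing \(\mathscr{L}_{X_{t}^{\epsilon }}\) by an empirical measure. The following Euler-Maruyama sketch uses illustrative coefficients of our own choosing that satisfy (H1); it is not taken from the paper.

```python
import numpy as np

def simulate_mv_sde(eps, n_particles=2000, n_steps=500, T=1.0, x0=1.0, seed=0):
    """Euler-Maruyama particle scheme for the MV-SDE (2.1) in d = 1.

    The law L_{X_t} is approximated by the empirical measure of the particles.
    Illustrative coefficients (ours, chosen to satisfy (H1)):
        b_t(x, mu)     = -x + E_mu[X]       (mean-field attraction)
        sigma_t(x, mu) = 1 + 0.1 * E_mu[X]
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_particles, x0)
    for _ in range(n_steps):
        m = x.mean()                         # first moment of the empirical law
        x = (x + (-x + m) * dt
               + np.sqrt(eps) * (1.0 + 0.1 * m) * np.sqrt(dt)
                 * rng.normal(size=n_particles))
    return x

# As eps -> 0 the terminal empirical law concentrates around the ODE limit X_T^0.
for eps in (1.0, 0.1, 0.01):
    x_T = simulate_mv_sde(eps)
    print(eps, x_T.mean(), x_T.std())
```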

Remark 2.1

By (H1), we have for \(t\ge 0\), \(x,y\in \mathbb{R}^{d}\), \(\mu ,\nu \in \mathscr{P}_{2}(\mathbb{R}^{d})\) that

$$ |b_{t}(x,\mu )-b_{t}(y,\nu )| \le K(t)(|x-y|+ \mathbb{W}_{2}(\mu ,\nu )). $$
(2.6)

Intuitively, as the parameter \(\epsilon \) tends to 0 in (2.1), the diffusion term vanishes and we have the following ordinary differential equation

$$ \mathrm{d}X_{t}^{0}=b_{t}(X_{t}^{0},\delta _{X_{t}^{0}}) \mathrm{d}t, $$
(2.7)

with the same initial datum as (2.1), that is, \(X_{0}^{0}=x\). Since \(x\) is deterministic, the path \(X_{\cdot }^{0}\) is deterministic, so the law of \(X_{t}^{0}\) is the Dirac measure \(\delta _{X_{t}^{0}}\).

In general, investigating the deviations of the solution \(X_{t}^{\epsilon }\) to (2.1) from the solution \(X_{t}^{0}\) to (2.7) amounts to studying the asymptotic behaviour of the trajectory

$$\begin{aligned} \overline{X}_{t}^{\epsilon }= \frac{1}{\sqrt{\epsilon }\lambda (\epsilon )}(X_{t}^{\epsilon }-X_{t}^{0}),~t \in [0,T]. \end{aligned}$$
(2.8)
  1. (LDP)

    The case \(\lambda (\epsilon )=1/{\sqrt{\epsilon }}\) provides large deviation estimates. In [11], it was proved that the law of the solution \(X^{\epsilon }\) satisfies an LDP by means of exponential tightness arguments.

  2. (CLT)

    If \(\lambda (\epsilon )\equiv 1\), we shall show that \(\frac{X^{\epsilon }-X^{0}}{\sqrt{\epsilon }}\) converges to a stochastic process in a certain sense as \(\epsilon \rightarrow 0\), see Theorem 2.1.

  3. (MDP)

    To fill the gap between the CLT scale and the LDP scale, the MDP for \(X^{\epsilon }\) investigates the LDP of the trajectory (2.8), where the deviation scale \(\lambda (\epsilon )\) satisfies (a concrete admissible scale is given right after this list)

    $$\begin{aligned} \lambda (\epsilon )\rightarrow \infty ,~~\sqrt{\epsilon }\lambda ( \epsilon )\rightarrow 0, ~~\text{as}~\epsilon \rightarrow 0. \end{aligned}$$
    (2.9)
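
For a concrete admissible scale (our illustration), take \(\lambda (\epsilon )=\epsilon ^{-\delta }\) with \(\delta \in (0,\frac{1}{2})\); then

$$ \lambda (\epsilon )=\epsilon ^{-\delta }\rightarrow \infty ,\qquad \sqrt{\epsilon }\lambda (\epsilon )=\epsilon ^{\frac{1}{2}-\delta }\rightarrow 0,\quad \text{as}~\epsilon \rightarrow 0, $$

so (2.9) holds, and the endpoints \(\delta =0\) and \(\delta =\frac{1}{2}\) formally recover the CLT scale and the LDP scale, respectively.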

Our first main result establishes the CLT for \((X^{\epsilon })_{\epsilon \in (0,1)}\) and is stated as follows:

Theorem 2.1

Under assumptions (H1) and (H2),

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}\Big| \frac{X_{t}^{\epsilon }-X_{t}^{0}}{\sqrt{\epsilon }}-Z_{t}\Big|^{p}\Big) \lesssim \epsilon ,~~\textit{ for any } p\ge 2, \end{aligned}$$

where \(Z_{t}\) solves

$$\begin{aligned} &\mathrm{d}Z_{t}=\nabla _{Z_{t}}b_{t}(\cdot ,\delta _{X_{t}^{0}})(X_{t}^{0}) \mathrm{d}t+\mathbb{E}\langle D^{L}b_{t}(y,\cdot )(\delta _{X_{t}^{0}})(X_{t}^{0}),Z_{t} \rangle |_{y=X_{t}^{0}}\mathrm{d}t +\sigma_{t} (X_{t}^{0},\delta _{X_{t}^{0}}) \mathrm{d}W_{t}, \\ &\quad Z_{0}={\mathbf{0}}. \end{aligned}$$
(2.10)

Here, and in what follows, for \(x,y\in \mathbb{R}^{d}\) and \(\mu \in \mathscr{P}_{2}(\mathbb{R}^{d})\), \(\nabla _{y}f(\cdot ,\mu )(x)\) denotes the directional derivative of the function \(f(\cdot ,\mu )\) at \(x\) in the direction \(y\).

Our second main result establishes an MDP for \((X^{\epsilon })_{\epsilon \in (0,1)}\) and is stated as follows:

Theorem 2.2

Under assumptions (H1) and (H2), \(\overline{X}^{\epsilon }_{\cdot }\), defined by (2.8), satisfies an LDP on \(C([0,T];\mathbb{R}^{d})\) with the rate function \(I\) defined by

$$ I(g):=\inf _{\{h\in \mathbb{H};g=\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s) \mathrm{d}s)\}} \Big\{ \frac{1}{2}\int _{0}^{T}|\dot{h}(s)|^{2} \mathrm{d}s\Big\} ,~~g\in C([0,T];\mathbb{R}^{d}), $$
(2.11)

where, by convention, \(I(g)=\infty \) if \(\{h\in \mathbb{H};g=\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s) \mathrm{d}s)\}=\emptyset \) and \(Y_{\cdot }^{h}:=\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s)\mathrm{d}s)\) satisfies the following equation:

$$ \mathrm{d}Y_{t}^{h}=\Big\{ \nabla _{Y_{t}^{h}} b_{t}(\cdot , \delta _{X_{t}^{0}})(X_{t}^{0})+\sigma _{t}(X_{t}^{0},\delta _{X_{t}^{0}}) \dot{h}(t)\Big\} \mathrm{d}t. $$
(2.12)
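
The skeleton equation (2.12) is a linear ODE and can be solved by any standard scheme. The following forward-Euler sketch in \(d=1\) is our own illustration; the concrete choices of \(b\), \(\sigma \), \(X^{0}\) and \(h\) below are assumptions made for the example only.

```python
import numpy as np

def solve_skeleton(hdot, x0_path, db_dx, sigma, T=1.0):
    """Forward Euler for the skeleton ODE (2.12) in d = 1:
        dY_t = { (d/dx) b_t(X_t^0) * Y_t + sigma_t(X_t^0) * hdot(t) } dt,  Y_0 = 0.
    db_dx and sigma are callables (t, x) evaluated along the path X^0."""
    n = len(x0_path) - 1
    dt = T / n
    y, out = 0.0, [0.0]
    for k in range(n):
        t, x0 = k * dt, x0_path[k]
        y += (db_dx(t, x0) * y + sigma(t, x0) * hdot(t)) * dt
        out.append(y)
    return np.array(out)

# Example with b(x) = -x, sigma = 1, X_0^0 = 1 (so X_t^0 = e^{-t}), h(t) = t:
# then dY = (-Y + 1) dt, Y_0 = 0, whose exact solution is Y_t = 1 - e^{-t}.
x0_path = np.exp(-np.linspace(0.0, 1.0, 1001))
y = solve_skeleton(lambda t: 1.0, x0_path, lambda t, x: -1.0, lambda t, x: 1.0)
print(y[-1], 1.0 - np.exp(-1.0))    # numerical vs exact value at t = 1
```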

Remark 2.2

Theorems 2.1 and 2.2 can be extended to path-distribution dependent SDEs, and to drifts that satisfy only a monotonicity condition.

We give an example to illustrate the theory.

Example 2.3

For any \(g\in C_{b}^{2}(\mathbb{R}^{d})\), define the function \(\mu \mapsto \mu (g):=\int _{\mathbb{R}^{d}}g\mathrm{d}\mu \). Consider the following MV-SDE on \(\mathbb{R}^{d}\):

$$\begin{aligned} \mathrm{d}X_{t}^{\epsilon }=\{X_{t}^{\epsilon }+(\mathscr{L}_{X_{t}^{\epsilon }}(g))^{2}\}\mathrm{d}t+\sqrt{\epsilon }\{X_{t}^{\epsilon }+\mathscr{L}_{X_{t}^{\epsilon }}(g)\}\mathrm{d}W_{t} \end{aligned}$$
(2.13)

with the initial value \(X_{0}^{\epsilon }=x\). Letting \(\epsilon \rightarrow 0\), we formally obtain the ordinary differential equation

$$\begin{aligned} \mathrm{d}X_{t}^{0}=\{X_{t}^{0}+(\delta _{X_{t}^{0}}(g))^{2} \}\mathrm{d}t. \end{aligned}$$
(2.14)

We now check that the coefficients of (2.13) satisfy (H1) and (H2).

Let \(b(x,\mu )=x+(\mu (g))^{2}\). Then \(\nabla b(\cdot ,\mu )(x)=I\), where \(I\) is the \(d\times d\) identity matrix. It is easy to check that (H1) and (H2) hold for the spatial component of \(b\). Now, we check that (H1) and (H2) also hold for the measure component of \(b\).

Firstly, we verify condition (H1). By Taylor expansion, we arrive at

$$\begin{aligned} &\lim _{\|\phi \|_{T_{\mu ,2}}\rightarrow 0} \frac{1}{\|\phi \|_{T_{\mu ,2}}}\Big|\mu \circ (Id+\phi )^{-1}(g)- \mu (g)-\mu (\langle \nabla g,\phi \rangle ) \Big| \\ &=\lim _{\|\phi \|_{T_{\mu ,2}}\rightarrow 0} \frac{1}{\|\phi \|_{T_{\mu ,2}}}\Big|\int _{\mathbb{R}^{d}}\{g(x+ \phi (x))-g(x)-\langle \nabla g(x),\phi (x)\rangle \}\mu ( \mathrm{d}x) \Big| \\ &\le \lim _{\|\phi \|_{T_{\mu ,2}}\rightarrow 0} \frac{\|\nabla ^{2} g\|_{\infty }}{2\|\phi \|_{T_{\mu ,2}}} \int _{ \mathbb{R}^{d}}|\phi (x)|^{2}\mu (\mathrm{d}x) \\ &\le \lim _{\|\phi \|_{T_{\mu ,2}}\rightarrow 0}\|\nabla ^{2} g\|_{\infty }\|\phi \|_{T_{\mu ,2}}=0. \end{aligned}$$

That is, \(D^{L}\mu (g)=\nabla g\). Similarly, we can show that \(D^{L}b(x,\cdot )(\mu )=2\mu (g)\nabla g\). This yields \(\|D^{L}b(x,\cdot )(\mu )\|_{T_{\mu ,2}}\le 2\|g\|_{\infty }\|\nabla g\|_{\infty }=:K<\infty \), since \(g\in C_{b}^{2}(\mathbb{R}^{d})\).

We now check condition (H2). For \(X,Y,\phi \in L^{2}(\Omega \rightarrow \mathbb{R}^{d},\mathbb{P})\),

$$\begin{aligned} &|\mathbb{E}\langle D^{L}b(x,\cdot )(\mathscr{L}_{X})(X),\phi \rangle -\mathbb{E}\langle D^{L}b(x,\cdot )(\mathscr{L}_{Y})(Y), \phi \rangle | \\ &=\Big|\mathbb{E}\langle 2(\mathscr{L}_{X}(g)\nabla g)(X),\phi \rangle -\mathbb{E}\langle 2(\mathscr{L}_{Y}(g)\nabla g)(Y),\phi \rangle \Big| \\ &\le 2(\mathbb{E}|\phi |^{2})^{1/2}(\mathbb{E}|(\mathscr{L}_{X}(g) \nabla g)(X)-(\mathscr{L}_{Y}(g)\nabla g)(Y)|^{2})^{1/2} \\ &\le 2(\mathbb{E}|\phi |^{2})^{1/2} \big\{ (\mathbb{E}|(\mathscr{L}_{X}(g)- \mathscr{L}_{Y}(g))\nabla g(X)|^{2})^{1/2}+(\mathbb{E}|\mathscr{L}_{Y}(g)( \nabla g(X)-\nabla g(Y))|^{2})^{1/2}\big\} \\ &\le C(\mathbb{E}|\phi |^{2})^{1/2} (\mathbb{E}|X-Y|^{2})^{1/2}, \end{aligned}$$

where in the last inequality we have used that \(g\), \(\nabla g\) and \(\nabla ^{2}g\) are bounded.

Similarly, we can check that \(\sigma \) satisfies (H1). Thus, by Theorem 2.1, the limit process \(Z_{t}\) satisfies

$$\begin{aligned} \mathrm{d}Z_{t}=Z_{t}\mathrm{d}t+\mathbb{E}\langle 2(\delta _{X_{t}^{0}}(g) \nabla g)(X_{t}^{0}),Z_{t}\rangle \mathrm{d}t+\{X_{t}^{0}+ \delta _{X_{t}^{0}}(g)\}\mathrm{d}W_{t}. \end{aligned}$$
(2.15)
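
A quick numerical sanity check of Example 2.3 (our own sketch, with the illustrative choice \(g=\tanh \) and \(d=1\)): simulating (2.13) by a particle scheme, the distance between \(X_{T}^{\epsilon }\) and \(X_{T}^{0}\) should decay like \(\sqrt{\epsilon }\), in line with (3.7) below.

```python
import numpy as np

def example_paths(eps, n_particles=5000, n_steps=400, T=1.0, x0=0.5, seed=2):
    """Particle Euler-Maruyama scheme for the example MV-SDE (2.13) in d = 1
    with g = tanh, i.e. b(x, mu) = x + mu(g)^2 and sigma(x, mu) = x + mu(g).
    Also integrates the ODE (2.14) for X^0. Returns (particles X_T^eps, X_T^0)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_particles, x0)     # particles approximating X_t^eps
    x_det = x0                       # deterministic limit X_t^0
    for _ in range(n_steps):
        mg = np.tanh(x).mean()       # empirical approximation of L_{X_t}(g)
        x = (x + (x + mg ** 2) * dt
               + np.sqrt(eps) * (x + mg) * np.sqrt(dt)
                 * rng.normal(size=n_particles))
        x_det = x_det + (x_det + np.tanh(x_det) ** 2) * dt
    return x, x_det

# The root-mean-square deviation at t = T should scale like sqrt(eps).
for eps in (0.1, 0.01, 0.001):
    x_T, x0_T = example_paths(eps)
    print(eps, np.sqrt(np.mean((x_T - x0_T) ** 2)))
```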

3 Proof of Theorem 2.1

Before giving the proof of Theorem 2.1, we prepare the following lemmas, the first of which is a formula for the \(L\)-derivative, due to [22].

Lemma 3.1

Let \((\Omega ,\mathscr{F},\mathbb{P})\) be an atomless probability space, and let \(X,Y\in L^{2}(\Omega \rightarrow \mathbb{R}^{d},\mathbb{P})\) with \(\mathscr{L}_{X}=\mu \). If either \(X\) and \(Y\) are bounded and \(f\) is \(L\)-differentiable at \(\mu \), or \(f\in C_{b}^{1}(\mathscr{P}_{2}(\mathbb{R}^{d}))\), then

$$ \lim _{\epsilon \rightarrow 0} \frac{f(\mathscr{L}_{X+\epsilon Y})-f(\mu )}{\epsilon }=\mathbb{E} \langle D^{L}f(\mu )(X),Y\rangle . $$
(3.1)

Consequently,

$$ \Big|\lim _{\epsilon \downarrow 0} \frac{f(\mathscr{L}_{X+\epsilon Y})-f(\mu )}{\epsilon }\Big|=| \mathbb{E}\langle D^{L}f(\mu )(X),Y\rangle |\le \|D^{L}f(\mu )\|\sqrt{ \mathbb{E}|Y|^{2}}. $$
(3.2)

The existence and uniqueness of the solution to (2.1) has been proved in [25]. The following lemma gives uniform \(p\)-th moment estimates for \(X_{t}^{\epsilon }\) and \(X_{t}^{0}\).

Lemma 3.2

Assume (H1). Then, for any \(p\ge 2\), we have

$$ \mathbb{E}\Big(\sup _{0\le t\le T}|X_{t}^{\epsilon }|^{p}\Big)\vee \Big( \sup _{0\le t\le T}|X_{t}^{0}|^{p}\Big)< \infty , $$
(3.3)

with the initial value \(X_{0}^{0}=X_{0}^{\epsilon }=x\in \mathbb{R}^{d}\).

Proof

It is easy to deduce from (H1) that

$$ |b_{t}(x,\mu )|\vee \|\sigma _{t}(x,\mu )\|\le K(t)(1+|x|+\mathbb{W}_{2}( \mu ,\delta _{0})). $$
(3.4)

Noting that \(\mathbb{W}_{2}(\mathscr{L}_{X_{s}^{\epsilon }},\delta _{0})^{p}\le ( \mathbb{E}|X_{s}^{\epsilon }|^{2})^{p/2}\), by the Burkholder-Davis-Gundy (BDG for short) inequality and (3.4), one has

$$ \mathbb{E}\Big(\sup _{0\le t\le T}|X_{t}^{\epsilon }|^{p} \Big) \le 3^{p-1}|x|^{p}+C(T,p)\mathbb{E}\int _{0}^{T}(1+|X_{s}^{\epsilon }|^{p})\mathrm{d}s, $$

and

$$\begin{aligned} \Big(\sup _{0\le t\le T}|X_{t}^{0}|^{p}\Big)\le C(T,p)\Big(|x|^{p}+\int _{0}^{T}(1+|X_{s}^{0}|^{p}) \mathrm{d}s\Big), \end{aligned}$$

thus, (3.3) follows from Gronwall’s inequality. □

Lemma 3.3

Under (H1) and (H2), we have, for any \(p\ge 2\),

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|Z_{t}^{\epsilon }|^{p}\Big)\vee \mathbb{E}\Big(\sup _{0\le t\le T}|Z_{t}|^{p}\Big)< \infty , \end{aligned}$$
(3.5)

where \(Z_{\cdot }^{\epsilon }:= \frac{X_{\cdot }^{\epsilon }-X_{\cdot }^{0}}{\sqrt{\epsilon }}\) and \(Z_{t}\) is defined in (2.10).

Proof

By (2.1) and (2.7), we know that \(Z_{t}^{\epsilon }\) satisfies

$$\begin{aligned} \mathrm{d}Z_{t}^{\epsilon }=\frac{1}{\sqrt{\epsilon }}(b_{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})-b_{t}(X_{t}^{0},\delta _{X_{t}^{0}})) \mathrm{d}t+\sigma _{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})\mathrm{d}W_{t}. \end{aligned}$$
(3.6)

To prove \(\mathbb{E}\Big (\sup _{0\le t\le T}|Z_{t}^{\epsilon }|^{p}\Big )<\infty \), \(p \ge 2\), it suffices to show

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|X_{t}^{\epsilon }-X_{t}^{0}|^{p}\Big) \le C(T,p)\epsilon ^{\frac{p}{2}}. \end{aligned}$$
(3.7)

Indeed, by (2.6), (3.4), Hölder’s inequality and BDG’s inequality, one gets

$$\begin{aligned} &\mathbb{E}\Big(\sup _{0\le t\le T}|X_{t}^{\epsilon }-X_{t}^{0}|^{p} \Big) \\ &\le 2^{p-1}\Big\{ \mathbb{E}\Big|\int _{0}^{T}|b_{s}(X_{s}^{\epsilon }, \mathscr{L}_{X_{s}^{\epsilon }})-b_{s}(X_{s}^{0},\delta _{X_{s}^{0}})| \mathrm{d}s\Big|^{p}+\epsilon ^{p/2}\mathbb{E}\Big(\sup _{0\le t \le T}\Big|\int _{0}^{t}\sigma _{s}(X_{s}^{\epsilon },\mathscr{L}_{X_{s}^{\epsilon }})\mathrm{d}W_{s}\Big|^{p}\Big)\Big\} \\ &\le C(T,p)\Big\{ \int _{0}^{T}\mathbb{E}(|X_{s}^{\epsilon }-X_{s}^{0}|+ \mathbb{W}_{2}(\mathscr{L}_{X_{s}^{\epsilon }},\delta _{X_{s}^{0}}))^{p} \mathrm{d}s+\epsilon ^{p/2}\Big(\int _{0}^{T}(\mathbb{E}|X_{s}^{\epsilon }|^{2}+1)\mathrm{d}s\Big)^{p/2}\Big\} \\ &\le C(T,p) \int _{0}^{T}\mathbb{E}|X_{s}^{\epsilon }-X_{s}^{0}|^{p} \mathrm{d}s+\epsilon ^{p/2}C(T,p)\Big(1+\int _{0}^{T}\mathbb{E}|X_{s}^{\epsilon }|^{p}\mathrm{d}s\Big), \end{aligned}$$

where the last inequality uses \(\mathbb{W}_{2}(\mathscr{L}_{X_{s}^{\epsilon }},\delta _{X_{s}^{0}})^{2}\le \mathbb{E}|X_{s}^{\epsilon }-X_{s}^{0}|^{2}\) and \(\mathbb{W}_{2}(\mathscr{L}_{X_{s}^{\epsilon }},\delta _{0})^{2}\le \mathbb{E}|X_{s}^{\epsilon }|^{2}\). Then, (3.7) follows from (3.3) and the Gronwall inequality.

Similarly, by (H2) and (3.3), we derive from (2.10) that

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|Z_{t}|^{p}\Big)&\le C(T,p)\int _{0}^{T} \mathbb{E}|Z_{t}|^{p}\mathrm{d}t+C(T,p)\int _{0}^{T}(1+|X_{t}^{0}|^{p}) \mathrm{d}t \\ &\le C(T,p)\Big(1+\int _{0}^{T}\mathbb{E}|Z_{t}|^{p}\mathrm{d}t \Big), \end{aligned}$$

and this implies (3.5) by Gronwall’s inequality. □

Proof of Theorem 2.1

By the definitions of \(Z_{t}^{\epsilon }\) and \(Z_{t}\), we derive that

$$\begin{aligned} Z_{t}^{\epsilon }-Z_{t}&=\int _{0}^{t}\Big(\frac{1}{\sqrt{\epsilon }}(b_{s}(X_{s}^{\epsilon },\mathscr{L}_{X_{s}^{\epsilon }})-b_{s}(X_{s}^{0},\mathscr{L}_{X_{s}^{\epsilon }}))- \nabla _{Z_{s}^{\epsilon }}b_{s}(\cdot ,\mathscr{L}_{X_{s}^{\epsilon }})(X_{s}^{0}) \Big)\mathrm{d}s \\ &~~+\int _{0}^{t}\Big(\frac{1}{\sqrt{\epsilon }}(b_{s}(X_{s}^{0}, \mathscr{L}_{X_{s}^{\epsilon }})-b_{s}(X_{s}^{0},\delta _{X_{s}^{0}}))- \mathbb{E}\langle D^{L}b_{s}(y,\cdot )(\delta _{X_{s}^{0}})(X_{s}^{0}), Z_{s}^{\epsilon }\rangle |_{y=X_{s}^{0}} \Big)\mathrm{d}s \\ &~~+\int _{0}^{t}(\nabla _{Z_{s}^{\epsilon }}b_{s}(\cdot ,\mathscr{L}_{X_{s}^{\epsilon }})(X_{s}^{0})-\nabla _{Z_{s}}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0})) \mathrm{d}s \\ &~~+\int _{0}^{t}(\mathbb{E}\langle D^{L}b_{s}(y,\cdot )(\delta _{X_{s}^{0}})(X_{s}^{0}), Z_{s}^{\epsilon }\rangle |_{y=X_{s}^{0}}-\mathbb{E}\langle D^{L}b_{s}(y, \cdot )(\delta _{X_{s}^{0}})(X_{s}^{0}), Z_{s}\rangle |_{y=X_{s}^{0}} ) \mathrm{d}s \\ &~~+\int _{0}^{t}(\sigma _{s}(X_{s}^{\epsilon },\mathscr{L}_{X_{s}^{\epsilon }})-\sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})) \mathrm{d}W_{s}. \end{aligned}$$

By (H2), Lemma 3.1, Hölder’s inequality and BDG’s inequality, we have

$$\begin{aligned} &\mathbb{E}\Big(\sup _{0\le t\le T}|Z_{t}^{\epsilon }-Z_{t}|^{p}\Big) \\ &\quad \le C(T,p)\int _{0}^{T}\mathbb{E}\Big|\int _{0}^{1}\nabla _{Z_{s}^{\epsilon }}b_{s}(\cdot ,\mathscr{L}_{X_{s}^{\epsilon }})(R_{s}^{\epsilon }(r)) \mathrm{d}r-\nabla _{Z_{s}^{\epsilon }}b_{s}(\cdot ,\mathscr{L}_{X_{s}^{\epsilon }})(X_{s}^{0})\Big|^{p}\mathrm{d}s \\ &\qquad{}+C(T,p) \int _{0}^{T}\mathbb{E}\Big|\int _{0}^{1}\mathbb{E}\langle D^{L}b_{s}(y, \cdot )(\mathscr{L}_{R_{s}^{\epsilon }(r)})(R_{s}^{\epsilon }(r)), Z_{s}^{\epsilon }\rangle |_{y=X_{s}^{0}}\mathrm{d}r \\ &\qquad{}-\mathbb{E}\langle D^{L}b_{s}(y,\cdot )(\delta _{X_{s}^{0}})(X_{s}^{0}), Z_{s}^{\epsilon }\rangle |_{y=X_{s}^{0}}\Big|^{p}\mathrm{d}s \\ &\qquad{}+C(T,p)\int _{0}^{T}\mathbb{E}|\nabla _{Z_{s}^{\epsilon }-Z_{s}}b_{s}( \cdot ,\mathscr{L}_{X_{s}^{\epsilon }})(X_{s}^{0})|^{p}\mathrm{d}s \\ &\qquad{}+C(T,p)\int _{0}^{T}\Big(\mathbb{E}|\nabla _{Z_{s}}b_{s}(\cdot , \mathscr{L}_{X_{s}^{\epsilon }})(X_{s}^{0})-\nabla _{Z_{s}}b_{s}(\cdot , \delta _{X_{s}^{0}})(X_{s}^{0})|^{p}\Big)\mathrm{d}s \\ &\qquad{}+C(T,p)\int _{0}^{T}\mathbb{E}\Big|\mathbb{E}\langle D^{L}b_{s}(y, \cdot )(\delta _{X_{s}^{0}})(X_{s}^{0}), Z_{s}^{\epsilon }\rangle |_{y=X_{s}^{0}} \\ &\qquad{}-\mathbb{E}\langle D^{L}b_{s}(y,\cdot )(\delta _{X_{s}^{0}})(X_{s}^{0}), Z_{s}\rangle |_{y=X_{s}^{0}}\Big|^{p}\mathrm{d}s \\ &\qquad{}+C(T,p)\int _{0}^{T}\mathbb{E}|\sigma _{s}(X_{s}^{\epsilon }, \mathscr{L}_{X_{s}^{\epsilon }})-\sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})|^{p} \mathrm{d}s \\ &\quad=:\sum _{i=1}^{6}J_{i}(T), \end{aligned}$$
(3.8)

where \(R_{s}^{\epsilon }(r)=X_{s}^{0}+r(X_{s}^{\epsilon }-X_{s}^{0})\), \(r\in [0,1]\).

By (H1), (H2), (3.5) and Hölder’s inequality, we have

$$\begin{aligned} &\sum _{i=1}^{5}J_{i}(T) \\ &\quad\le C(T,p)\Big\{ \int _{0}^{T}(\mathbb{E}|Z_{s}^{\epsilon }|^{2})^{\frac{p}{2}}\mathbb{E}\Big(\int _{0}^{1}((\mathbb{E}|R_{s}^{\epsilon }(r)-X_{s}^{0}|^{2})^{1/2}+\mathbb{W}_{2}(\mathscr{L}_{R_{s}^{\epsilon }(r)},\delta _{X_{s}^{0}}))\mathrm{d}r\Big)^{p} \mathrm{d}s \\ &\qquad+\epsilon ^{p/2}\int _{0}^{T}\mathbb{E}|Z_{s}^{\epsilon }|^{2p} \mathrm{d}s+\int _{0}^{T}\mathbb{E}|Z_{s}^{\epsilon }-Z_{s}|^{p} \mathrm{d}s +\int _{0}^{T}\mathbb{E}(|Z_{s}|\mathbb{W}_{2}( \mathscr{L}_{X_{s}^{\epsilon }},\delta _{X_{s}^{0}}))^{p} \mathrm{d}s\Big\} \\ &\quad\le C(T,p)\Big\{ \epsilon ^{p/2}\int _{0}^{T}\big(\mathbb{E}|Z_{s}^{\epsilon }|^{2p}+\mathbb{E}|Z_{s}|^{p}\mathbb{E}|Z_{s}^{\epsilon }|^{p} \big)\mathrm{d}s+\int _{0}^{T}\mathbb{E}|Z_{s}^{\epsilon }-Z_{s}|^{p} \mathrm{d}s\Big\} , \end{aligned}$$
(3.9)

where we used \(\mathbb{W}_{2}(\mathscr{L}_{R_{s}^{\epsilon }(r)},\delta _{X_{s}^{0}}) \le r\sqrt{\epsilon }(\mathbb{E}|Z_{s}^{\epsilon }|^{2})^{1/2}\) and \(\mathbb{W}_{2}(\mathscr{L}_{X_{s}^{\epsilon }},\delta _{X_{s}^{0}}) \le \epsilon ^{1/2}(\mathbb{E}|Z_{s}^{\epsilon }|^{2})^{1/2}\) in the last inequality.

Moreover, we obtain from (H1), (3.5) and Hölder’s inequality that

$$\begin{aligned} J_{6}(T)&\le C(T,p)\int _{0}^{T}(\epsilon ^{p/2}\mathbb{E}|Z_{s}^{\epsilon }|^{p}+\mathbb{E}\mathbb{W}_{2}(\mathscr{L}_{X_{s}^{\epsilon }}, \delta _{X_{s}^{0}})^{p})\mathrm{d}s \\ &\le C(T,p)\int _{0}^{T}\epsilon ^{p/2}\mathbb{E}|Z_{s}^{\epsilon }|^{p} \mathrm{d}s. \end{aligned}$$
(3.10)

Substituting the estimates (3.9) and (3.10) into (3.8), we arrive at

$$\begin{aligned} &\mathbb{E}\Big(\sup _{0\le t\le T}|Z_{t}^{\epsilon }-Z_{t}|^{p}\Big)\\ &\quad \le C(T,p)\Big\{ \epsilon ^{p/2}\int _{0}^{T}\big(\mathbb{E}|Z_{s}^{\epsilon }|^{2p}+\mathbb{E}|Z_{s}|^{p}\mathbb{E}|Z_{s}^{\epsilon }|^{p} \big)\mathrm{d}s+\int _{0}^{T}\mathbb{E}|Z_{s}^{\epsilon }-Z_{s}|^{p} \mathrm{d}s\Big\} . \end{aligned}$$

An application of the Gronwall inequality yields

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|Z_{t}^{\epsilon }-Z_{t}|^{p}\Big) \le C_{T,p}\epsilon ^{p/2}. \end{aligned}$$

The desired assertion follows since \(\epsilon ^{p/2}\le \epsilon \) for \(\epsilon \in (0,1)\) and \(p\ge 2\). □

4 Proof of Theorem 2.2

From (2.1), (2.7), (2.8), we can see that \(\overline{X}^{\epsilon }\) satisfies the following equation:

$$ \overline{X}_{t}^{\epsilon }= \frac{1}{\sqrt{\epsilon }\lambda (\epsilon )}\int _{0}^{t}[b_{s}(X_{s}^{\epsilon },\mathscr{L}_{X_{s}^{\epsilon }})-b_{s}(X_{s}^{0},\delta _{X_{s}^{0}}) ]\mathrm{d}s +\frac{1}{\lambda (\epsilon )}\int _{0}^{t}\sigma _{s}(X_{s}^{\epsilon },\mathscr{L}_{X_{s}^{\epsilon }})\mathrm{d}W_{s}. $$
(4.1)

In the sequel, we aim to show that the law of \(\overline{X}_{t}^{\epsilon }\) satisfies an LDP. To this end, we first recall that the LDP identifies a deterministic path around which the diffusion is concentrated with overwhelming probability, so that the stochastic motion can be seen as a small random perturbation of this deterministic path. This means in particular that the law of \(\overline{X}_{t}^{\epsilon }\) is close to some Dirac mass when \(\epsilon \) is small. We therefore proceed in two steps to prove that the law of \(\overline{X}^{\epsilon }\) satisfies an LDP.

Firstly, note that \(\mathscr{L}_{X_{t}^{\epsilon }}\) converges weakly to \(\delta _{X_{t}^{0}}\) as \(\epsilon \rightarrow 0\), with the deviation scale \(\lambda (\epsilon )\) satisfying (2.9). We therefore replace \(\mathscr{L}_{X_{t}^{\epsilon }}\) by \(\delta _{X_{t}^{0}}\) in (4.1) and obtain the following approximation SDE of (4.1):

$$ \overline{Y}_{t}^{\epsilon }= \frac{1}{\sqrt{\epsilon }\lambda (\epsilon )}\int _{0}^{t}[b_{s}( \widetilde{Y}_{s}^{\epsilon },\delta _{X_{s}^{0}})-b_{s}(X_{s}^{0}, \delta _{X_{s}^{0}}) ]\mathrm{d}s +\frac{1}{\lambda (\epsilon )} \int _{0}^{t}\sigma _{s}(\widetilde{Y}_{s}^{\epsilon },\delta _{X_{s}^{0}}) \mathrm{d}W_{s}, $$
(4.2)

where \(\mathrm{d}\widetilde{Y}_{t}^{\epsilon }=b_{t}(\widetilde{Y}_{t}^{\epsilon },\delta _{X_{t}^{0}})\mathrm{d}t+\sqrt{\epsilon }\sigma _{t}( \widetilde{Y}_{t}^{\epsilon },\delta _{X_{t}^{0}})\mathrm{d}W_{t}\) and \(\overline{Y}_{t}^{\epsilon }= \frac{\widetilde{Y}_{t}^{\epsilon }-X_{t}^{0}}{\sqrt{\epsilon }\lambda (\epsilon )}\). Then we establish that the law of \(\overline{Y}_{t}^{\epsilon }\) satisfies an LDP.

Secondly, we claim that \(\overline{X}_{t}^{\epsilon }\) and \(\overline{Y}_{t}^{\epsilon }\) are exponentially equivalent. Thus, the law of \(\overline{X}_{t}^{\epsilon }\) satisfies an LDP with the good rate function \(I(g)\) given in (2.11), due to the fact that the LDP does not distinguish between exponentially equivalent families.

To make the content self-contained, in the following subsection we sketch the proof that the law of \(\overline{Y}^{\epsilon }\) satisfies an LDP.

4.1 Large Deviation Principle for \(\overline{Y}^{\epsilon }\)

Lemma 4.1

Under the assumptions of Theorem 2.2, the family \((\overline{Y}^{\epsilon })_{\epsilon >0}\) satisfies a large deviation principle in \(C([0,T];\mathbb{R}^{d})\), equipped with the topology of the uniform norm, with the good rate function \(I(g)\) given in (2.11).

According to Lemma 1.1, to complete the proof of Lemma 4.1, we only need to verify the conditions (a) and (b) in Lemma 1.1.

By the Yamada-Watanabe theorem, there exists a measurable map \(\Gamma ^{\epsilon }:C([0,T];\mathbb{R}^{d})\rightarrow C([0,T]; \mathbb{R}^{d})\) such that \(\overline{Y}_{\cdot }^{\epsilon }=\Gamma ^{\epsilon }\Big ( \frac{1}{\lambda (\epsilon )}W_{\cdot }\Big )\).

Since \(\mathbb{E}_{\mathbb{P}}\Big (\exp \big \{\frac{1}{2}\int _{0}^{T}|\dot{h}_{\epsilon }(s)|^{2}\mathrm{d}s\big \}\Big )<\infty \) for \(h_{\epsilon }\in \mathscr{A}_{N}\), the Novikov condition holds. By the Girsanov theorem, we know that

$$\begin{aligned} \frac{1}{\lambda (\epsilon )}\widetilde{W}_{t}= \frac{1}{\lambda (\epsilon )}W_{t}+\int _{0}^{t}\dot{h}_{\epsilon }(s) \mathrm{d}s \end{aligned}$$

is a Brownian motion under the probability measure \(\mathbb{P}_{\epsilon }:=R_{T}\mathbb{P}\), where

$$\begin{aligned} R_{T}=\exp {\Big\{ -\int _{0}^{T}\dot{h}_{\epsilon }(s)\mathrm{d} \frac{W_{s}}{\lambda (\epsilon )}-\frac{1}{2}\int _{0}^{T}|\dot{h}_{\epsilon }(s)|^{2}\mathrm{d}s\Big\} } \end{aligned}$$

is an exponential martingale.

Furthermore, we obtain that \(\overline{Y}_{\cdot }^{\epsilon ,h_{\epsilon }}=\Gamma ^{\epsilon }\Big ( \frac{1}{\lambda (\epsilon )}W_{\cdot }+\int _{0}^{\cdot }\dot{h}_{\epsilon }(s)\mathrm{d}s\Big )\), which solves

$$\begin{aligned} \mathrm{d}\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}&= \frac{1}{\sqrt{\epsilon }\lambda (\epsilon )}[b_{t}(Y_{t}^{\epsilon ,h_{\epsilon }},\delta _{X_{t}^{0}})-b_{t}(X_{t}^{0},\delta _{X_{t}^{0}}) ] \mathrm{d}t \\ &~~+\frac{1}{\lambda (\epsilon )}\sigma _{t}(Y_{t}^{\epsilon ,h_{\epsilon }},\delta _{X_{t}^{0}})\mathrm{d}W_{t} +\sigma _{t}(Y_{t}^{ \epsilon ,h_{\epsilon }},\delta _{X_{t}^{0}})\dot{h}_{\epsilon }(t) \mathrm{d}t, \end{aligned}$$
(4.3)

where \(Y_{t}^{\epsilon ,h_{\epsilon }}:=X_{t}^{0}+\sqrt{\epsilon }\lambda ( \epsilon )\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}\).

The following lemmas play key roles in the proof of Lemma 4.1.

Lemma 4.2

Under Assumptions (H1) and (2.5) in (H2), for any \(h\in \mathbb{H}\), Eq. (2.12) admits a unique solution \(Y_{\cdot }^{h}\) in \(C([0,T];\mathbb{R}^{d})\). Moreover, for any \(N>0\), there exists a constant \(C_{N,T}\) such that

$$ \sup _{h\in S_{N}}\Big\{ \sup _{0\le t\le T}|Y_{t}^{h}|\Big\} \le C_{N,T}. $$
(4.4)

Proof

By (H1) and (H2), the coefficients of (2.12) satisfy the Lipschitz condition, and therefore Eq. (2.12) admits a unique solution. Moreover, since the coefficients satisfy a linear growth condition and \(\mathbb{W}_{2}(\mathscr{L}_{Y_{t}^{h}},\delta _{0})^{2}\le \mathbb{E}|Y_{t}^{h}|^{2}\), the estimate (4.4) follows from the Gronwall inequality; we omit the details. □

Firstly, we prove that condition (b) of Lemma 1.1 holds.

Lemma 4.3

Under assumptions (H1) and (2.5) in (H2), for any positive number \(N<\infty \), the family

$$ K_{N}:=\Big\{ \Gamma ^{0}\Big(\int _{0}^{\cdot }\dot{h}(s) \mathrm{d}s\Big); h\in S_{N}\Big\} , $$

is compact in \(C([0,T];\mathbb{R}^{d})\), where the map \(\Gamma ^{0}\) is defined in Theorem 2.2.

Proof

For any \(N<\infty \), the set \(K_{N}\) is compact provided that \(S_{N}\) is compact and the map \(\Gamma ^{0}\) is continuous from \(S_{N}\) to \(C([0,T];\mathbb{R}^{d})\). Since \(S_{N}\) is compact under the weak topology, it suffices to show that \(\Gamma ^{0}\) is a continuous map from \(S_{N}\) to \(C([0,T];\mathbb{R}^{d})\). Let \(h_{n}\rightarrow h\) in \(S_{N}\) as \(n\rightarrow \infty \). Then

$$ \begin{aligned} Y_{t}^{h_{n}}-Y_{t}^{h}&=\int _{0}^{t}\nabla _{\{Y_{s}^{h_{n}}-Y_{s}^{h} \}}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0})\mathrm{d}s +\int _{0}^{t} \sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})(\dot{h}_{n}(s)-\dot{h}(s)) \mathrm{d}s \\ &=:I_{1}^{n}(t)+I_{2}^{n}(t). \end{aligned} $$

By (H2), (3.3) and (3.4), it is easy to see that

$$\begin{aligned} |I_{1}^{n}(t)|\le \int _{0}^{t}K(s)(1+|X_{s}^{0}|+\mathbb{W}_{2}( \delta _{X_{s}^{0}},\delta _{0}))|Y_{s}^{h_{n}}-Y_{s}^{h}| \mathrm{d}s. \end{aligned}$$

Let \(g^{n}(t)=\int _{0}^{t}\sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}}) \dot{h}_{n}(s)\mathrm{d}s\). By (H1), Lemma 3.2, and \(h_{n},h\in S_{N}\), we derive that

$$\begin{aligned} |g^{n}(t)|&\le \Big(\int _{0}^{t}\|\sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}}) \|^{2}\mathrm{d}s\Big)^{1/2}\Big(\int _{0}^{t}|\dot{h}_{n}(s)|^{2} \mathrm{d}s\Big)^{1/2} \\ &\le \Big(\int _{0}^{t}K^{2}(s)(1+|X_{s}^{0}|+W_{2}(\delta _{X_{s}^{0}}, \delta _{0}))^{2}\mathrm{d}s\Big)^{1/2}\Big(\int _{0}^{t}|\dot{h}_{n}(s)|^{2} \mathrm{d}s\Big)^{1/2} \\ &< \infty . \end{aligned}$$

Similarly, we see that for any \(0\le t_{1}\le t_{2}\le T\),

$$\begin{aligned} |g^{n}(t_{2})-g^{n}(t_{1})|&\le \int _{t_{1}}^{t_{2}} \|\sigma _{s}(X_{s}^{0}, \delta _{X_{s}^{0}})\||\dot{h}_{n}(s)|\mathrm{d}s \\ &\le \int _{t_{1}}^{t_{2}}K(s)(1+|X_{s}^{0}|+W_{2}(\delta _{X_{s}^{0}}, \delta _{0}))|\dot{h}_{n}(s)|\mathrm{d}s \\ &\le C(T)(t_{2}-t_{1})^{1/2}\Big(\int _{t_{1}}^{t_{2}}|\dot{h}_{n}(s)|^{2} \mathrm{d}s\Big)^{1/2} \\ &\le C(T,N)(t_{2}-t_{1})^{1/2}. \end{aligned}$$

Hence, the family of functions \(\{g^{n}\}_{n\ge 1}\) is equicontinuous in \(C([0,T];\mathbb{R}^{d})\).

According to the Arzelà-Ascoli theorem, \(\{g^{n}\}_{n\ge 1}\) is relatively compact in \(C([0,T];\mathbb{R}^{d})\), and any limit point is identified by the weak convergence \(h_{n}\rightarrow h\) in \(S_{N}\): we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\int _{0}^{t}\sigma _{s}(X_{s}^{0}, \delta _{X_{s}^{0}})\dot{h}_{n}(s)\mathrm{d}s=\int _{0}^{t} \sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})\dot{h}(s)\mathrm{d}s, \forall t\in [0,T], \end{aligned}$$

that is, \(\lim _{n\rightarrow \infty }\sup _{t\in [0,T]}|I_{2}^{n}(t)|=0\). This, together with (3.3), yields that

$$\begin{aligned} \sup _{0\le t\le T}|Y_{t}^{h_{n}}-Y_{t}^{h}|&\le \int _{0}^{T}K(t)(1+|X_{t}^{0}|+ \mathbb{W}_{2}(\delta _{X_{t}^{0}},\delta _{0}))|Y_{t}^{h_{n}}-Y_{t}^{h}| \mathrm{d}t+\sup _{0\le t\le T}|I_{2}^{n}(t)|, \end{aligned}$$

by the Gronwall inequality, we arrive at

$$\begin{aligned} \sup _{0\le t\le T}|Y_{t}^{h_{n}}-Y_{t}^{h}|&\le \exp {\Big\{ \int _{0}^{T}K(t)(1+|X_{t}^{0}|+ \mathbb{W}_{2}(\delta _{X_{t}^{0}},\delta _{0}))\mathrm{d}t \Big\} }\sup _{0\le t\le T}|I_{2}^{n}(t)| \\ &\le C(T,N)\sup _{0\le t\le T}|I_{2}^{n}(t)|\rightarrow 0,~\text{as}~n \rightarrow \infty . \end{aligned}$$

Thus, \(\Gamma ^{0}\) is a continuous map, and the proof is complete. □

Before verifying condition (a), we give an estimate for the second moment of \(\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}\).

Lemma 4.4

Assume (H1) holds. Then, there exists an \(\epsilon _{0}\in (0,1)\) such that for some \(C_{T}\),

$$ \mathbb{E}\Big(\sup _{0\le t\le T}|\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\Big)\le C_{T},~\epsilon \in (0,\epsilon _{0}),~h_{\epsilon }\in \mathscr{A}_{N}, $$
(4.5)

where \(\overline{Y}_{\cdot }^{\epsilon ,h_{\epsilon }}\) is defined in (4.3).

Proof

Note that \(\overline{Y}_{\cdot }^{\epsilon ,h_{\epsilon }}\) can be decomposed into the following three parts

$$\begin{aligned} \overline{Y}_{t}^{\epsilon ,h_{\epsilon }}&=\int _{0}^{t} \frac{1}{\sqrt{\epsilon }\lambda (\epsilon )}[b_{s}(Y_{s}^{\epsilon ,h_{\epsilon }},\delta _{X_{s}^{0}})-b_{s}(X_{s}^{0},\delta _{X_{s}^{0}}) ] \mathrm{d}s \\ &~~+\int _{0}^{t}\frac{1}{\lambda (\epsilon )}\sigma _{s}(Y_{s}^{ \epsilon ,h_{\epsilon }},\delta _{X_{s}^{0}})\mathrm{d}W_{s} +\int _{0}^{t} \sigma _{s}(Y_{s}^{\epsilon ,h_{\epsilon }},\delta _{X_{s}^{0}})\dot{h}_{\epsilon }(s)\mathrm{d}s \\ &=: \sum _{i=1}^{3}J_{i}^{\epsilon ,h_{\epsilon }}(t). \end{aligned}$$

By (H1), we have

$$ \begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|J_{1}^{\epsilon ,h_{\epsilon }}(t)|^{2}\Big) &\le \frac{TK(T)}{\epsilon \lambda ^{2}(\epsilon )}\int _{0}^{T}\mathbb{E}|Y_{s}^{ \epsilon ,h_{\epsilon }}-X_{s}^{0}|^{2}\mathrm{d}s \le C_{T}\int _{0}^{T}\mathbb{E}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}|^{2}\mathrm{d}s. \end{aligned} $$

By the BDG inequality, (3.3) and (3.4), one has

$$ \begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|J_{2}^{\epsilon ,h_{\epsilon }}(t)|^{2}\Big) &\le \frac{C_{T}}{\lambda ^{2}(\epsilon )} \int _{0}^{T}\mathbb{E}[1+|Y_{s}^{\epsilon ,h_{\epsilon }}|^{2}+ \mathbb{W}_{2}(\delta _{X_{s}^{0}},\delta _{\mathbf{0}})^{2}] \mathrm{d}s \\ &\le \frac{C_{T}}{\lambda ^{2}(\epsilon )}\int _{0}^{T}[1+\mathbb{E}|Y_{s}^{ \epsilon ,h_{\epsilon }}-X_{s}^{0}|^{2}+\mathbb{E}|X_{s}^{0}|^{2}] \mathrm{d}s \\ &\le \frac{C_{T}}{\lambda ^{2}(\epsilon )}\int _{0}^{T}[1+\epsilon \lambda ^{2}(\epsilon )\mathbb{E}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}|^{2}+\mathbb{E}|X_{s}^{0}|^{2}]\mathrm{d}s \\ &\le \frac{C_{T}}{\lambda ^{2}(\epsilon )}+\epsilon C_{T}\int _{0}^{T} \mathbb{E}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}|^{2} \mathrm{d}s. \end{aligned} $$

Applying the Hölder inequality and recalling \(h_{\epsilon }\in \mathscr{A}_{N}\), we obtain from (3.3) and (3.4) that

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|J_{3}^{\epsilon ,h_{\epsilon }}(t)|^{2} \Big) &\le C_{T}\mathbb{E}\int _{0}^{T}[1+|{Y}_{s}^{\epsilon ,h_{\epsilon }}|^{2}+\mathbb{W}_{2}(\delta _{X_{s}^{0}},\delta _{\mathbf{0}})^{2}]| \dot{h}_{\epsilon }(s)|^{2}\mathrm{d}s \\ &\le C_{T}\Big(1+\Big(\sup _{0\le t\le T}|X_{t}^{0}|^{2}\Big)+ \epsilon \lambda ^{2}(\epsilon )\mathbb{E}\Big(\sup _{0\le t\le T}| \overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\Big)\Big)\int _{0}^{T}| \dot{h}_{\epsilon }(s)|^{2}\mathrm{d}s \\ &\le C_{T}\Big(1+\epsilon \lambda ^{2}(\epsilon )\mathbb{E}\Big(\sup _{0 \le t\le T}|\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\Big)\Big) . \end{aligned}$$

Thus, we arrive at

$$ \mathbb{E}\Big(\sup _{0\le t\le T}|\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\Big) \le C_{T}\Big(1+ \frac{1}{\lambda ^{2}(\epsilon )}+\epsilon \lambda ^{2}(\epsilon ) \mathbb{E}\Big(\sup _{0\le t\le T}|\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\Big) +(1+\epsilon )\int _{0}^{T}\mathbb{E}| \overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\mathrm{d}t\Big). $$

Taking \(\epsilon >0\) sufficiently small such that \(C_{T}\epsilon \lambda ^{2}(\epsilon )\le \frac{1}{2}\) leads to

$$ \mathbb{E}\Big(\sup _{0\le t\le T}|\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}|^{2}\Big) \le C_{T}\Big(1+ \frac{1}{\lambda ^{2}(\epsilon )} +(1+\epsilon )\int _{0}^{T} \mathbb{E}\Big(\sup _{0\le s\le t}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}|^{2}\Big)\mathrm{d}t\Big). $$

The desired assertion follows from Gronwall’s inequality and the fact that \(\frac{1}{\lambda ^{2}(\epsilon )}\rightarrow 0\) as \(\epsilon \rightarrow 0\). □

We are now in a position to verify condition (a) of Lemma 1.1.

Lemma 4.5

Assume (H1) and (2.5) in (H2). For every fixed \(N\in \mathbb{N}\), let \(h_{\epsilon }, h\in \mathscr{A}_{N}\) be such that \(h_{\epsilon }\Rightarrow h\) as \(\epsilon \rightarrow 0\). Then \(\Gamma ^{\epsilon }\Big (\frac{1}{\lambda (\epsilon )}W_{\cdot }+\int _{0}^{\cdot }\dot{h}_{\epsilon }(s)\mathrm{d}s\Big )\Rightarrow \Gamma ^{0} \Big (\int _{0}^{\cdot }\dot{h}(s)\mathrm{d}s\Big )\) in \(C([0,T];\mathbb{R}^{d})\).

Proof

By the Skorokhod representation theorem [3, Theorem 6.7, p. 70], there exist a probability space \((\widetilde{\Omega },\widetilde{\mathscr{F}}, \widetilde{\mathscr{F}}_{t},\widetilde{\mathbb{P}})\), a Brownian motion \(\widetilde{W}\) on it, and a family of \(\widetilde{\mathscr{F}}_{t}\)-predictable processes \(\{\widetilde{h}_{\epsilon };\epsilon >0\}\), \(\widetilde{h}\), taking values in \(S_{N}\) \(\widetilde{\mathbb{P}}\)-a.s., such that the joint law of \((h_{\epsilon },h,W)\) under ℙ coincides with that of \((\widetilde{h}_{\epsilon },\widetilde{h},\widetilde{W})\) under \(\widetilde{\mathbb{P}}\) and

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\langle \widetilde{h}_{\epsilon }- \widetilde{h},g\rangle =0, \forall g\in \mathbb{H}, \widetilde{\mathbb{P}}- a.s. \end{aligned}$$

Let \(\widetilde{Y}^{\epsilon ,\widetilde{h}_{\epsilon }}\) be the solution of (4.3) with \(h_{\epsilon }\) replaced by \(\widetilde{h}_{\epsilon }\) and \(W\) by \(\widetilde{W}\), and let \(\widetilde{Y}^{\widetilde{h}}\) be the solution of (2.12) with \(h\) replaced by \(\widetilde{h}\). Then it suffices to verify

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\|\widetilde{Y}^{\epsilon , \widetilde{h}_{\epsilon }}-\widetilde{Y}^{\widetilde{h}}\|_{\infty }=0~~ \text{in probability}. \end{aligned}$$

In the sequel, we drop the tilde from the notation for simplicity.

Note that \(\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}-Y_{t}^{h}\) can be decomposed into the following three parts:

$$ \begin{aligned} &\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}-Y_{t}^{h} \\ &=\left [\frac{1}{\sqrt{\epsilon }\lambda (\epsilon )}\int _{0}^{t}[b_{s}(Y_{s}^{ \epsilon ,h_{\epsilon }},\delta _{X_{s}^{0}})-b_{s}(X_{s}^{0},\delta _{X_{s}^{0}})] \mathrm{d}s -\int _{0}^{t}\nabla _{Y_{s}^{h}}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0}) \mathrm{d}s\right ] \\ &~~+\int _{0}^{t}\Big[\sigma _{s}(Y_{s}^{\epsilon ,h_{\epsilon }}, \delta _{X_{s}^{0}})\dot{h}_{\epsilon }(s)-\sigma _{s}(X_{s}^{0}, \delta _{X_{s}^{0}})\dot{h}(s)\Big]\mathrm{d}s + \frac{1}{\lambda (\epsilon )}\int _{0}^{t}\sigma _{s}(Y_{s}^{ \epsilon ,h_{\epsilon }},\delta _{X_{s}^{0}})\mathrm{d}W_{s} \\ &=:\sum _{i=1}^{3}I_{i}^{\epsilon ,h_{\epsilon }}(t). \end{aligned} $$

By (H2), we have

$$\begin{aligned} |I_{1}^{\epsilon ,h_{\epsilon }}(t)|&\le \int _{0}^{t}\Big|\int _{0}^{1} \nabla _{\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}}b_{s}(\cdot , \delta _{X_{s}^{0}})(X_{s}^{0}+r(Y_{s}^{\epsilon ,h_{\epsilon }}-X_{s}^{0})) \mathrm{d}r-\nabla _{Y_{s}^{h}}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0}) \Big|\mathrm{d}s \\ &\le \int _{0}^{t}\Big|\int _{0}^{1}\nabla _{\{\overline{Y}_{s}^{ \epsilon ,h_{\epsilon }}-Y_{s}^{h}\}}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0}+r(Y_{s}^{ \epsilon ,h_{\epsilon }}-X_{s}^{0}))\mathrm{d}r\Big|\mathrm{d}s \\ &+\int _{0}^{t}\Big|\int _{0}^{1}\nabla _{Y_{s}^{h}}b_{s}(\cdot , \delta _{X_{s}^{0}})(X_{s}^{0}+r(Y_{s}^{\epsilon ,h_{\epsilon }}-X_{s}^{0})) \mathrm{d}r-\nabla _{Y_{s}^{h}}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0}) \Big|\mathrm{d}s \\ &\le K(t)\int _{0}^{t}\Big(|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}-Y_{s}^{h}| +\frac{\sqrt{\epsilon }\lambda (\epsilon )}{2}|Y_{s}^{h}|| \overline{Y}_{s}^{\epsilon ,h_{\epsilon }}|\Big)\mathrm{d}s. \end{aligned}$$

By (4.4) and (4.5), it follows that

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|I_{1}^{\epsilon ,h_{\epsilon }}(t)|^{2} \Big) \lesssim \epsilon \lambda ^{2}(\epsilon )+\int _{0}^{T} \mathbb{E}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}-Y_{s}^{h}|^{2} \mathrm{d}s. \end{aligned}$$

By (H1) and (3.4), it follows that

$$\begin{aligned} &|I_{2}^{\epsilon ,h_{\epsilon }}(t)| \\ &\le \Big|\int _{0}^{t}\Big[\sigma _{s}(Y_{s}^{\epsilon ,h_{\epsilon }}, \delta _{X_{s}^{0}})-\sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})\Big] \dot{h}_{\epsilon }(s)\mathrm{d}s\Big|+\Big|\int _{0}^{t}\sigma _{s}(X_{s}^{0}, \delta _{X_{s}^{0}})(\dot{h}_{\epsilon }(s)-\dot{h}(s))\mathrm{d}s \Big| \\ &\le \int _{0}^{t}K(s)|Y_{s}^{\epsilon ,h_{\epsilon }}-X_{s}^{0}|| \dot{h}_{\epsilon }(s)|\mathrm{d}s +\int _{0}^{t}|\sigma _{s}(X_{s}^{0}, \delta _{X_{s}^{0}})(\dot{h}_{\epsilon }(s)-\dot{h}(s))|\mathrm{d}s \\ &\le \sqrt{\epsilon }\lambda (\epsilon )\int _{0}^{t}K(s)| \overline{Y}_{s}^{\epsilon ,h_{\epsilon }}||\dot{h}_{\epsilon }(s)| \mathrm{d}s +\int _{0}^{t}K(s)(1+|X_{s}^{0}|)|\dot{h}_{\epsilon }(s)- \dot{h}(s)|\mathrm{d}s, \end{aligned}$$

thus, by Hölder’s inequality and (3.3), it follows that

$$\begin{aligned} \mathbb{E}\Big(\sup _{0\le t\le T}|I_{2}^{\epsilon ,h_{\epsilon }}(t)|^{2} \Big) \lesssim \epsilon \lambda ^{2}(\epsilon )+\int _{0}^{T} \mathbb{E}|\dot{h}_{\epsilon }(s)-\dot{h}(s)|^{2}\mathrm{d}s. \end{aligned}$$

By the BDG inequality, (3.4) and (4.4), we arrive at

$$\begin{aligned} &\mathbb{E}\Big(\sup _{0\le t\le T}|I_{3}^{\epsilon ,h_{\epsilon }}(t)|^{2} \Big) \\ &\le \frac{1}{\lambda ^{2}(\epsilon )}\int _{0}^{T}\mathbb{E}\Big(\| \sigma _{s}(Y_{s}^{\epsilon ,h_{\epsilon }},\delta _{X_{s}^{0}})- \sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})\|^{2}+\|\sigma _{s}(X_{s}^{0}, \delta _{X_{s}^{0}})\|^{2}\Big)\mathrm{d}s \\ &\lesssim \frac{1}{\lambda ^{2}(\epsilon )}+\epsilon \int _{0}^{T} \mathbb{E}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}|^{2} \mathrm{d}s. \end{aligned}$$

Taking the above estimates into consideration, it follows that

$$\begin{aligned} &\mathbb{E}\Big(\sup _{0\le t\le T}|\overline{Y}_{t}^{\epsilon ,h_{\epsilon }}-Y_{t}^{h}|^{2}\Big) \\ &\lesssim \frac{1}{\lambda ^{2}(\epsilon )}+\epsilon (\lambda ^{2}( \epsilon )+1)+\int _{0}^{T}\mathbb{E}|\dot{h}_{\epsilon }(s)-\dot{h}(s)|^{2} \mathrm{d}s+\int _{0}^{T}\mathbb{E}|\overline{Y}_{s}^{\epsilon ,h_{\epsilon }}-Y_{s}^{h}|^{2}\mathrm{d}s, \end{aligned}$$

thus, the desired assertion follows from the Gronwall inequality upon letting \(\epsilon \rightarrow 0\). □

Proof of Lemma 4.1

The conclusion of Lemma 4.1 follows from Lemma 1.1, Lemmas 4.3 and 4.5. □

4.2 \(\overline{X}^{\epsilon }\) and \(\overline{Y}^{\epsilon }\) Are Exponentially Equivalent

In order to show \(\overline{X}^{\epsilon }\) and \(\overline{Y}^{\epsilon }\) are exponentially equivalent, we need to prove the following lemma.

Lemma 4.6

For any \(\delta >0\), we have

$$\begin{aligned} \limsup _{\epsilon \rightarrow 0}\epsilon \log \Big(\mathbb{P}\Big\{ \sup _{0\le t\le T}|\overline{X}_{t}^{\epsilon }-\overline{Y}_{t}^{\epsilon }|\ge \delta \Big\} \Big)=-\infty . \end{aligned}$$
(4.6)

The proof of Lemma 4.6 is based on the following lemma, which corresponds to [8, Lemma 5.6.18].

Lemma 4.7

Let \(b_{t},\sigma _{t}\) be progressively measurable processes, let \((w_{t})_{t\ge 0}\) be a \(d\)-dimensional Brownian motion, and let

$$\begin{aligned} \mathrm{d}z_{t}=b_{t}\mathrm{d}t+\sqrt{\epsilon }\sigma _{t} \mathrm{d}w_{t}, \, t\ge 0, \end{aligned}$$

where \(z_{0}\) is deterministic. Let \(\tau _{1}\in [0,1]\) be a stopping time with respect to the filtration of \(\{w_{t},t\in [0,1]\}\). Suppose that the coefficients of the diffusion matrix \(\sigma \) are uniformly bounded, and for some constants \(M,B,\rho \) and any \(t\in [0,\tau _{1}]\),

$$\begin{aligned} |\sigma _{t}|\le M(\rho ^{2}+|z_{t}|^{2})^{1/2},~~|b_{t}|\le B(\rho ^{2}+|z_{t}|^{2})^{1/2}. \end{aligned}$$

Then for any \(\delta >0\) and any \(\epsilon \le 1\),

$$\begin{aligned} \epsilon \log \mathbb{P}\Big(\sup _{t\in [0,\tau _{1}]}|z_{t}|\ge \delta \Big)\le K+\log \Big( \frac{\rho ^{2}+|z_{0}|^{2}}{\rho ^{2}+\delta ^{2}}\Big), \end{aligned}$$

where \(K=2B+M^{2}(2+d)\).

Proof of Lemma 4.6

Without loss of generality, we may choose \(R>0\) such that the initial datum \(x\) lies in the ball \(B_{R+1}(0)\) (centre 0 and radius \(R+1\)). We also assume that \(X_{t}^{0}\) does not leave this ball up to time \(T\). We define the stopping time \(\tau _{R}':=\inf \{t\ge 0: |\overline{X}_{t}^{\epsilon }| \vee |\overline{Y}_{t}^{\epsilon }|\ge R+1\}\), and set \(\tau _{R}:=\min \{T,\tau _{R}'\}\).

In the sequel, we consider \(\overline{z}_{t}:=\overline{X}_{t}^{\epsilon }-\overline{Y}_{t}^{\epsilon }\), which satisfies the following equation

$$\begin{aligned} \mathrm{d}\overline{z}_{t}= b_{t}\mathrm{d}t+ \sqrt{\epsilon }\sigma _{t}\mathrm{d}W_{t},~ \overline{z}_{0}={\mathbf{0}}, \end{aligned}$$
(4.7)

where

$$\begin{aligned} b_{t}:= \frac{b_{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})-b_{t}(\widetilde{Y}_{t}^{\epsilon },\delta _{X_{t}^{0}})}{\sqrt{\epsilon }\lambda (\epsilon )},~~~ \sigma _{t}:= \frac{\sigma _{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})-\sigma _{t}(\widetilde{Y}_{t}^{\epsilon },\delta _{X_{t}^{0}})}{\sqrt{\epsilon }\lambda (\epsilon )}. \end{aligned}$$

Note that both \(b_{t}\) and \(\sigma _{t}\) are progressively measurable processes. For \(t\le \tau _{R}\), we derive from (2.6) that

$$\begin{aligned} |b_{t}|&= \frac{|b_{t}(X_{t}^{\epsilon },\mathscr{L}_{X_{t}^{\epsilon }})-b_{t}(X_{t}^{\epsilon },\delta _{X_{t}^{0}})+b_{t}(X_{t}^{\epsilon },\delta _{X_{t}^{0}})-b_{t}(\widetilde{Y}_{t}^{\epsilon },\delta _{X_{t}^{0}})|}{\sqrt{\epsilon }\lambda (\epsilon )} \\ &\le \frac{K(t)W_{2}(\mathscr{L}_{X_{t}^{\epsilon }},\delta _{X_{t}^{0}})}{\sqrt{\epsilon }\lambda (\epsilon )}+ \frac{K(t)|X_{t}^{\epsilon }-\widetilde{Y}_{t}^{\epsilon }|}{\sqrt{\epsilon }\lambda (\epsilon )} \\ &\le K(t)(\rho ^{2}(\epsilon )+|\overline{z}_{t}|^{2})^{1/2}, \end{aligned}$$

where \(\rho ^{2}(\epsilon )=\sup _{0\le t\le T}\mathbb{E}|\overline{X}_{t}^{\epsilon }|^{2}\). In the same vein, we have

$$\begin{aligned} |\sigma _{t}|\le K(t)(\rho ^{2}(\epsilon )+|\overline{z}_{t}|^{2})^{1/2}. \end{aligned}$$

Noting that \(\overline{z}_{0}={\mathbf{0}}\), for any \(\delta >0\) and any \(\epsilon \) small enough, we derive from Lemma 4.7 that

$$\begin{aligned} \epsilon \log \mathbb{P}\Big(\sup _{t\in [0,\tau _{R}]}|\overline{z}_{t}| \ge \delta \Big)\le KT+\log \Big( \frac{\rho ^{2}(\epsilon )}{\rho ^{2}(\epsilon )+\delta ^{2}}\Big). \end{aligned}$$

In the same way as the proof of (3.7), one can show that \(\rho ^{2}(\epsilon )\) converges to 0 as \(\epsilon \rightarrow 0\). Hence, we deduce that

$$\begin{aligned} \limsup _{\epsilon \rightarrow 0}\epsilon \log \mathbb{P}\Big(\sup _{t \in [0,\tau _{R}]}|\overline{z}_{t}|\ge \delta \Big)=-\infty . \end{aligned}$$
(4.8)

Now, since

$$\begin{aligned} \{\|\overline{X}^{\epsilon }-\overline{Y}^{\epsilon }\|_{\infty }\ge \delta \}\subset \{\tau _{R}\le T\}\cup \Big\{ \sup _{0\le t\le \tau _{R}}| \overline{X}_{t}^{\epsilon }-\overline{Y}_{t}^{\epsilon }|\ge \delta \Big\} , \end{aligned}$$

we can conclude as long as we show that

$$\begin{aligned} \lim _{R\rightarrow \infty }\limsup _{\epsilon \rightarrow 0} \epsilon \log \Big(\mathbb{P}\{\tau _{R}< T\}\Big)=-\infty . \end{aligned}$$

Define \(\eta _{R}:=\inf \{t\ge 0: |\overline{Y}_{t}^{\epsilon }|\ge R\}\), i.e., the first time \(\overline{Y}^{\epsilon }\) exits the ball \(B_{R}(0)\) (centre 0 and radius \(R\)).

If \(\tau _{R}< T\), then \(|\overline{X}_{\tau _{R}}^{\epsilon }|\vee |\overline{Y}_{\tau _{R}}^{\epsilon }|= R+1\).

If \(|\overline{Y}_{\tau _{R}}^{\epsilon }|= R+1\), then we immediately have \(\eta _{R}< T\), and hence \(\mathbb{P}\{\tau _{R}< T\}\le \mathbb{P}\{\eta _{R}< T\}\).

If \(|\overline{X}_{\tau _{R}}^{\epsilon }|= R+1\), one can derive that

$$\begin{aligned} &\mathbb{P}\{\tau _{R}< T\}\le \mathbb{P}\{ |\overline{X}_{\tau _{R}}^{\epsilon }|= R+1\} \\ &=\mathbb{P}\bigg\{ \sup _{t\in [0, \tau _{R}]}|\overline{z}_{t}| \ge \frac{1}{2}, |\overline{X}_{\tau _{R}}^{\epsilon }|= R+1\bigg\} + \mathbb{P} \bigg\{ \sup _{t\in [0, \tau _{R}]}|\overline{z}_{t}| < \frac{1}{2}, | \overline{X}_{\tau _{R}}^{\epsilon }|= R+1\bigg\} \\ &\le \mathbb{P}\bigg\{ \sup _{t\in [0, \tau _{R}]}|\overline{z}_{t}| \ge \frac{1}{2}\bigg\} + \mathbb{P}\{ \eta _{R}< T\}. \end{aligned}$$

By (4.8), to end the proof, it is sufficient to prove that the probability that \(\overline{Y}^{\epsilon }\) exits the ball \(B_{R}(0)\) is very small as \(\epsilon \) goes to zero, i.e.

$$\begin{aligned} \lim _{R\rightarrow \infty }\limsup _{\epsilon \rightarrow 0} \epsilon \log \Big(\mathbb{P}\{\eta _{R}< T\}\Big)=-\infty . \end{aligned}$$

Recall that \(\overline{Y}^{\epsilon }\) satisfies an LDP for the uniform norm with good rate function \(I(g)\) given in (2.11). Then, for any closed set \(F\subset C([0,T];\mathbb{R}^{d})\) we have

$$\begin{aligned} \limsup _{\epsilon \rightarrow 0}\epsilon \log \mathbb{P}\{ \overline{Y}^{\epsilon }\in F\}\le -\inf _{g\in F}I(g). \end{aligned}$$

As a consequence,

$$\begin{aligned} &\limsup _{\epsilon \rightarrow 0}\epsilon \log \Big(\mathbb{P}\{\eta _{R}< T \}\Big) =\limsup _{\epsilon \rightarrow 0}\epsilon \log \Big( \mathbb{P}\Big\{ \sup _{0\le t\le T}|\overline{Y}_{t}^{\epsilon }|\ge R \Big\} \Big) \\ &\le -\inf _{\{h\in \mathbb{H};g=\Gamma ^{0}(\int _{0}^{\cdot }\dot{h}(s) \mathrm{d}s),\|g\|_{\infty }\ge R\}}\frac{1}{2}\int _{0}^{T}| \dot{h}(s)|^{2}\mathrm{d}s. \end{aligned}$$

We remark that the infimum of \(I(g)\) on the set of paths exiting from the ball \(B_{R}(0)\) goes to infinity as \(R\) goes to infinity.

By (H1) and (3.3), we obtain that

$$\begin{aligned} |g(t)|&\le \int _{0}^{t}|\nabla _{g(s)}b_{s}(\cdot ,\delta _{X_{s}^{0}})(X_{s}^{0})+ \sigma _{s}(X_{s}^{0},\delta _{X_{s}^{0}})\dot{h}(s)|\mathrm{d}s \\ &\le \int _{0}^{t}K(s)(|g(s)|+(1+|X_{s}^{0}|)|\dot{h}(s)|) \mathrm{d}s \\ &\le C_{t}\Big(\int _{0}^{t}|g(s)|\mathrm{d}s+\Big(\int _{0}^{t}| \dot{h}(s)|^{2}\mathrm{d}s\Big)^{1/2}\Big), \end{aligned}$$

and by the Gronwall inequality, we have

$$\begin{aligned} |g(t)|\le C_{t}\Big(\int _{0}^{t}|\dot{h}(s)|^{2}\mathrm{d}s\Big)^{1/2}< \infty . \end{aligned}$$

Consequently, the constraint \(\|g\|_{\infty }\ge R\) forces \(\int _{0}^{T}|\dot{h}(s)|^{2}\mathrm{d}s\ge R^{2}/C_{T}^{2}\), so the infimum in the upper bound is at least \(R^{2}/(2C_{T}^{2})\), which tends to \(\infty \) as \(R\rightarrow \infty \) (with the convention \(\inf \emptyset =\infty \) when the constraint set is empty). Hence \(\lim _{R\rightarrow \infty }\limsup _{\epsilon \rightarrow 0}\epsilon \log \mathbb{P}\{\eta _{R}< T\}=-\infty \). That is, \(\overline{X}^{\epsilon }\) and \(\overline{Y}^{\epsilon }\) are exponentially equivalent. □

Proof of Theorem 2.2

The conclusion of Theorem 2.2 follows from Lemma 4.1 and Lemma 4.6. □