The Cutoff Phenomenon in Wasserstein Distance for Nonlinear Stable Langevin Systems with Small Lévy Noise

Barrera, G.; Högele, M. A.; Pardo, J. C.

doi:10.1007/s10884-022-10138-1

Open access
Published: 25 February 2022

Volume 36, pages 251–278, (2024)

Journal of Dynamics and Differential Equations

Abstract

This article establishes the cutoff phenomenon in the Wasserstein distance for systems of nonlinear ordinary differential equations with a dissipative stable fixed point subject to small additive Markovian noise. This result generalizes the results shown in Barrera, Högele, Pardo (EJP2021) in a more restrictive setting of Blumenthal-Getoor index $\alpha >3/2$ to the formulation in Wasserstein distance, which allows us to cover the case of general Lévy processes with some given moment. The main proof techniques are based on the close control of the errors in a version of the Hartman–Grobman theorem and the adaptation of the linear theory established in Barrera, Högele, Pardo (JSP2021). In particular, they rely on the precise asymptotics of the nonlinear flow and the nonstandard shift linearity property of the Wasserstein distance, which is established by the authors in (JSP2021). The main examples are the nonlinear Fermi–Pasta–Ulam–Tsingou gradient flow and dissipative nonlinear oscillators subject to small (and possibly degenerate) Brownian or arbitrary $\alpha $-stable noise.

1 Introduction

In this paper, we study the asymptotics of the ergodic behavior of the following stochastic differential equation (SDE)

$$\begin{aligned} \mathrm {d}X^\varepsilon _t(x) = -b(X^\varepsilon _t(x))\mathrm {d}t + \varepsilon \mathrm {d}L_t, \quad X^\varepsilon _0(x) = x\in \mathbb {R}^d \end{aligned}$$

(1.1)

for small noise intensity $\varepsilon >0$, where the vector field $b\in \mathcal {C}^2(\mathbb {R}^d,\mathbb {R}^d)$ satisfies $b(0)=0$ and the following dissipative condition.

Hypothesis 1

(Dissipativity) There exists a constant $\delta >0$ such that

$$\begin{aligned} \langle b(x)-b(y),x-y\rangle \geqslant \delta |x-y|^2 \qquad \text { for all } ~x,y\in \mathbb {R}^d. \end{aligned}$$

(1.2)

The noise process $L=(L_t)_{t\geqslant 0}$ in (1.1) is a Lévy process with values in $\mathbb {R}^d$ on a given probability space $(\Omega , \mathcal {F}, \mathbb {P})$. It is well-known that the law of L is characterized by the triplet $(a,\Sigma , \nu )$, where $a\in \mathbb {R}^d$, $\Sigma \in \mathbb {R}^{d \times d}$ is a non-negative definite matrix and $\nu : \mathcal {B}(\mathbb {R}^d) \rightarrow [0, \infty ]$ is a locally finite Borel measure satisfying

$$\begin{aligned} \nu (\{0\}) = 0 \qquad \text{ and } \qquad \int _{\mathbb {R}^d} (1\wedge |z|^2) \nu (\mathrm {d}z) < \infty . \end{aligned}$$

For $\nu =0$ the process L is a multidimensional Brownian motion with drift, while for $a=0$ and $\Sigma =0$ we have a multidimensional pure jump process such as compound Poisson processes or $\alpha $-stable processes, in particular, the Cauchy process for $\alpha =1$. We refer to [1, 14, 16, 18, 22] for further details on Lévy processes. Under Hypothesis 1, it is known that the SDE (1.1) has a pathwise unique strong solution, see for instance Theorem 1.1 in [10], here denoted by $X^\varepsilon (x):=(X^\varepsilon _t(x))_{t\geqslant 0}$. Moreover, $X^\varepsilon (x)$ is a Markov process and, in particular, it satisfies the Feller property, see Proposition 2.1 in [21].
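To make the dynamics (1.1) concrete, the following minimal sketch simulates a scalar instance with the Euler-Maruyama scheme. The drift $b(x)=x+x^3$ (which satisfies (1.2) with $\delta =1$), the Brownian driver, the step count and all function names are illustrative assumptions and are not taken from the article.

```python
import math
import random


def drift(x):
    # Hypothetical scalar drift b(x) = x + x**3; it satisfies (1.2) with delta = 1.
    return x + x ** 3


def euler_maruyama(x0, eps, t_end, n_steps, rng):
    """Approximate X^eps_{t_end}(x0) for dX = -b(X) dt + eps dB (Brownian noise only)."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        # One Euler-Maruyama step: deterministic drift plus a scaled Gaussian increment.
        x += -drift(x) * dt + eps * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
x_det = euler_maruyama(2.0, 0.0, 5.0, 5000, rng)     # eps = 0: deterministic flow X^0
x_noisy = euler_maruyama(2.0, 0.01, 5.0, 5000, rng)  # small noise intensity
print(x_det, x_noisy)
```

For $\varepsilon =0$ the scheme follows the deterministic flow towards the stable state 0, while for small $\varepsilon >0$ the state fluctuates around 0 on the scale of $\varepsilon $. A jump driver would require a different increment sampler, which this sketch does not attempt.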

In order to present the main results of this paper, we formally introduce the Wasserstein distance of order $p_*$. We assume some finite moment for $L_t$ and hence $X^{\varepsilon }_t(x)$ for all $t\geqslant 0$.

Hypothesis 2

(Finite $p_*$-th moment) There exists $p_*>0$ such that

$$\begin{aligned} \int _{|z|>1} |z|^{p_*} \nu (\mathrm {d}z) < \infty . \end{aligned}$$

This article shows the cutoff phenomenon for the family of processes $(X^{\varepsilon }(x))_{\varepsilon >0}$ with respective invariant measures $(\mu ^\varepsilon )_{\varepsilon >0}$ under the Wasserstein distance $\mathcal {W}_{p_*}$ of order $p_*>0$. For $p_*>1$ we characterize the following cutoff profile asymptotics

$$\begin{aligned} \mathcal {W}_{p_*}(\text{ Law }(X^\varepsilon _{\mathfrak {t}_\varepsilon +r}(x)),\mu ^\varepsilon )=\varepsilon \cdot C e^{-\mathfrak {q}r}+o(\varepsilon )\quad \text { for }\quad \varepsilon \rightarrow 0, \end{aligned}$$

(1.3)

where $\mathfrak {t}_\varepsilon =\frac{1}{\mathfrak {q}}|\ln (\varepsilon )|+\frac{\ell -1}{\mathfrak {q}}\ln (|\ln (\varepsilon )|)$ for some explicit positive constants $\mathfrak {q},\ell ,C$ that depend on x in terms of an $\omega $-limit set of the rotational part for the Hartman–Grobman linearization of $X^0(x)$.
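As a sanity check of (1.3) in the simplest setting, one may take the scalar linear case $b(x)=\mathfrak {q}x$ driven by a standard Brownian motion, which is an illustrative assumption here (linear systems are the setting of [2]). Then both $\text{Law}(X^\varepsilon _t(x))$ and $\mu ^\varepsilon $ are Gaussian, $\mathcal {W}_2$ has a closed form, and $\ell =1$. The sketch below, with hypothetical function names, evaluates the ratio $\mathcal {W}_2/\varepsilon $ at time $\mathfrak {t}_\varepsilon +r$ and compares it with $|x|e^{-\mathfrak {q}r}$.

```python
import math


def cutoff_time(eps, q, ell=1):
    """Cutoff time |ln eps|/q + (ell - 1)/q * ln|ln eps| (requires eps != 1)."""
    log_eps = abs(math.log(eps))
    return log_eps / q + (ell - 1) / q * math.log(log_eps)


def w2_ou_to_equilibrium(x, eps, q, t):
    """W_2 between Law(X_t) and the invariant law for dX = -q X dt + eps dB, X_0 = x.

    Both laws are one-dimensional Gaussians, for which
    W_2^2 = (difference of means)^2 + (difference of standard deviations)^2.
    """
    mean_t = x * math.exp(-q * t)
    std_t = eps * math.sqrt((1.0 - math.exp(-2.0 * q * t)) / (2.0 * q))
    std_inf = eps / math.sqrt(2.0 * q)
    return math.hypot(mean_t, std_t - std_inf)


x, q, r = 3.0, 2.0, 0.5
for eps in (1e-2, 1e-4, 1e-6):
    t = cutoff_time(eps, q) + r
    # Second column: rescaled distance; third column: candidate profile |x| e^{-q r}.
    print(eps, w2_ou_to_equilibrium(x, eps, q, t) / eps, abs(x) * math.exp(-q * r))
```

In this Gaussian example the constant in (1.3) is $C=|x|$; for nonlinear b and general Lévy noise no such closed form is used in the article, and the constants come from Lemma 2.2.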

For such processes $(X^{\varepsilon }(x))_{\varepsilon >0}$ where (1.3) fails, we establish the following weaker window cutoff asymptotics

$$\begin{aligned}&\lim \limits _{r\rightarrow \infty }\limsup \limits _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p_*}(\text{ Law }(X^\varepsilon _{\mathfrak {t}_\varepsilon +r}(x)),\mu ^\varepsilon )}{\varepsilon }=0\qquad \text {and}\\&\lim \limits _{r\rightarrow -\infty }\liminf \limits _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p_*}(\text{ Law }(X^\varepsilon _{\mathfrak {t}_\varepsilon +r}(x)),\mu ^\varepsilon )}{\varepsilon }=\infty . \end{aligned}$$

Our results generalize those of [2] to nonlinear vector fields and those of [3, 5] and [6] to the Wasserstein distance, which covers second order equations with degenerate noise. For a detailed introduction on the subject we refer to the aforementioned articles, in particular, see Table 1.1 in [3]. There is a particular advantage of studying this problem under the Wasserstein distance rather than in the total variation distance. While the Wasserstein distance only requires the existence of moments of $X^{\varepsilon }(x)$ of a given order, the total variation distance needs the existence of its density in addition to its regularity. The latter brings further requirements for the Lévy process L which can be quite restrictive, see [3] for further details. Furthermore, in the Wasserstein case, at least when $X^{\varepsilon }(x)$ has moments of order $p> 1$, the cutoff phenomenon of $(X^{\varepsilon }(x))_{\varepsilon > 0}$ is completely determined by an explicit function (see Theorem 2 below), here called the cutoff profile. By contrast, in the total variation case the profile function can be very involved and even hard to simulate in examples.

In [4], the authors study the cutoff phenomenon with respect to the total variation distance for SDEs of the type (1.1) in the one dimensional case, with L being a standard Brownian motion and a general drift coefficient b satisfying Hypothesis 1. Since scalar systems are gradient systems, there is always a cutoff profile, which can be given explicitly in terms of the Gauss error function. The follow-up work [5] covers the multidimensional case, where the picture is considerably richer due to the presence of strong and complicated rotational patterns. The authors characterize sharply the existence of a cutoff profile in terms of the omega limit sets appearing in the long-term behavior of the matrix exponential function $e^{-\mathcal {Q} t}x$ in Lemma B.2 in [5], which plays an analogous role in this article. The paper [6] is the first attempt to study the cutoff phenomenon for such models with jumps. More precisely, [6] covers the cutoff phenomenon with respect to the total variation distance for generalized Ornstein-Uhlenbeck processes, which satisfy an SDE of the form (1.1) with L being a Lévy process and $b(x)=\mathcal {Q} x$, where $\mathcal {Q}$ is a square real matrix whose eigenvalues have positive real parts. The proof methods are based on concise Fourier inversion techniques. Due to the aforementioned regularity required by the total variation distance, the results in [6] are given under the hypothesis of continuous densities of the marginals, which to date has not been characterized mathematically in simple terms. The cutoff profile function in [6] is given in terms of the Lévy-Ornstein-Uhlenbeck limiting measure for $\varepsilon =1$ and measured in the total variation distance. Such profile functions are theoretically highly insightful, but almost impossible to calculate and simulate in examples. The characterization of the existence of a cutoff profile remains, analogously to [5], in abstract terms of the behavior of the mentioned profile function on a suitably defined omega limit set. The Wasserstein case is treated in [2] where, contrary to the total variation case, it is noted that the profile function takes an explicit and simple shape. Finally, [3] treats the cutoff phenomenon with respect to the total variation distance for (1.1) with b satisfying Hypothesis 1 and driven by a Lévy process in the rather restrictive class of strongly locally layered stable processes (see Definition 1.4 in [3]).

In this article we combine a nonlinear version of the Wasserstein estimates of [2], with the Freidlin-Wentzell first order approximation of (1.1) in the spirit of [3] and the fine properties of the Wasserstein distance given in Lemma 2.1, in particular, the non-standard shift linearity of Lemma 2.1.d).

The manuscript is organized in four parts. After the exposition of the setting and the presentation of the main results in Sect. 2, we illustrate our findings for the nonlinear Fermi–Pasta–Ulam–Tsingou gradient system and a class of nonlinear oscillators in Sect. 3. The main steps of the proof of the cutoff phenomenon are given in Sect. 4, while the auxiliary technical results, such as the exponential ergodicity in Wasserstein distance and the coupling between the original nonlinear system and the Freidlin-Wentzell linearization, are given in the “Appendix”.

2 Setting and Main Results

2.1 Fine Properties of the Wasserstein Distance

For any two probability distributions $\mu _1$ and $\mu _2$ on $\mathbb {R}^d$ with finite $p_*$-th moment for some $p_*>0$, we define the Wasserstein $p_*$-distance between them as follows

$$\begin{aligned} \mathcal {W}_{p_*}(\mu _1,\mu _2)= \inf _{\Pi } \left( \int _{\mathbb {R}^d\times \mathbb {R}^d}|u-v|^{p_*}\Pi (\mathrm {d}u,\mathrm {d}v)\right) ^{1\wedge (1/p_*)}, \end{aligned}$$

where the infimum is taken over all couplings (joint distributions on $\mathbb {R}^d\times \mathbb {R}^d$) $\Pi $ with marginals $\mu _1$ and $\mu _2$. We refer to [12, 20] and references therein for more details. For convenience of notation we do not distinguish a random variable U and its law $\mathbb {P}_U$ as an argument of $\mathcal {W}_{p_*}$. That is, for random variables $U_1$, $U_2$ and probability measure $\mu $ we write $\mathcal {W}_{p_*}(U_1, U_2)$ instead of $\mathcal {W}_{p_*}(\mathbb {P}_{U_1}, \mathbb {P}_{U_2})$, $\mathcal {W}_{p_*}(U_1, \mu )$ instead of $\mathcal {W}_{p_*}(\mathbb {P}_{U_1}, \mu )$ etc. The next result establishes properties of the Wasserstein distance which turn out to be important for our arguments.

Lemma 2.1

(Properties of $\mathcal {W}_{p_*}$) For $p_*>0$, $u_1,u_2\in \mathbb {R}^d$, $c\in \mathbb {R}$ and $U_1$ and $U_2$ being random vectors in $\mathbb {R}^d$ with finite $p_*$-th moment we have the following:

(a)
The Wasserstein distance $\mathcal {W}_{p_*}$ is a metric.
(b)
Translation invariance: $\mathcal {W}_{p_*}(u_1+U_1,u_2+U_2)=\mathcal {W}_{p_*}(u_1-u_2+U_1,U_2)$.
(c)
Homogeneity:
$$\begin{aligned} \mathcal {W}_{p_*}(c\cdot U_1,c\cdot U_2)= {\left\{ \begin{array}{ll} |c|\;\mathcal {W}_{p_*}(U_1,U_2)&{}\text { for } p_*\in [1,\infty ),\\ |c|^{p_*}\;\mathcal {W}_{p_*}(U_1,U_2)&{}\text { for } p_*\in (0,1). \end{array}\right. } \end{aligned}$$
(d)
Shift linearity: For $p_*\geqslant 1$ it follows
$$\begin{aligned} \mathcal {W}_{p_*}(u_1+U_1,U_1)=|u_1|. \end{aligned}$$
(2.1)
For $p_*\in (0,1)$ we have
$$\begin{aligned} \max \{|u_1|^{p_*}-2\mathbb {E}[|U_1|^{p_*}],0\}\leqslant \mathcal {W}_{p_*}(u_1+U_1,U_1)\leqslant |u_1|^{p_*}. \end{aligned}$$
(2.2)
(e)
Domination: For any given coupling $\tilde{\Pi }$ between $U_1$ and $U_2$ it follows
$$\begin{aligned} \mathcal {W}_{p_*}(U_1, U_2) \leqslant \Big (\int _{\mathbb {R}^d\times \mathbb {R}^d} |v_1-v_2|^{p_*} \tilde{\Pi }(\mathrm {d}v_1,\mathrm {d}v_2)\Big )^{1\wedge (1/p_*)}. \end{aligned}$$
(f)
Characterization: Let $(U_n)_{n\in \mathbb {N}}$ be a sequence of random vectors with finite $p_*$-th moments and U a random vector with finite $p_*$-th moment. Then the following statements are equivalent:
1. (1)
  $\mathcal {W}_{p_*}(U_n, U) \rightarrow 0$ as $n\rightarrow \infty $.
2. (2)
  $U_n {\mathop {\longrightarrow }\limits ^{d}} U$ as $n \rightarrow \infty $ and $\mathbb {E}[|U_n|^{p_*}] \rightarrow \mathbb {E}[|U|^{p_*}]$ as $n\rightarrow \infty $.

For $p_*\in (0,1)$ equality (2.1) is false in general, see Remark 2.4 in [2]. The proof of the previous lemma is given in Lemma 2.2 in [2].
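The properties (c) and (d) of Lemma 2.1 can be observed numerically in dimension one. The sketch below is only an illustration under stated assumptions: it works with empirical laws of equally sized real samples and with $p\geqslant 1$, where the monotone (sorted) coupling is optimal on the real line. The function name, the sample laws and the parameter values are hypothetical.

```python
import random


def empirical_wasserstein_1d(xs, ys, p):
    """W_p (p >= 1) between the empirical laws of two equally sized real samples.

    On the real line with p >= 1 the monotone coupling (i-th smallest value
    paired with i-th smallest value) is optimal.
    """
    if p < 1 or len(xs) != len(ys):
        raise ValueError("need p >= 1 and samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(a - b) ** p for a, b in pairs) / len(xs)) ** (1.0 / p)


rng = random.Random(1)
U1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
U2 = [rng.expovariate(1.0) for _ in range(2000)]
u, c, p = 0.7, -3.0, 1.5

# Shift linearity (2.1): distance between u + U1 and U1 should equal |u|.
shift = empirical_wasserstein_1d([u + a for a in U1], U1, p)
# Homogeneity (c) for p >= 1: scaling both samples by c multiplies the distance by |c|.
scaled = empirical_wasserstein_1d([c * a for a in U1], [c * b for b in U2], p)
base = empirical_wasserstein_1d(U1, U2, p)
print(shift, abs(u))
print(scaled, abs(c) * base)
```

The sorted coupling is not optimal in general for the concave cost $|u-v|^{p}$ with $p\in (0,1)$, which is why the sketch rejects that range instead of returning a value.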

The following result yields the existence of a unique invariant distribution for (1.1) under Hypotheses 1 and 2. Moreover, under the Wasserstein distance, the strong solution of (1.1) is exponentially ergodic.

Proposition 1

(Existence of a unique invariant distribution) Under Hypothesis 1 and Hypothesis 2 for some $p_*>0$ there exists a unique invariant probability measure $\mu ^\varepsilon $ such that

$$\begin{aligned} \mathcal {W}_{p_*}(X^\varepsilon _t(x),\mu ^{\varepsilon })\leqslant e^{-({1\wedge p_*}) \delta t} \left( |x|^{1\wedge p_*}+\int _{\mathbb {R}^d}|y|^{1\wedge p_*}\mu ^{\varepsilon }(\mathrm {d}y) \right) . \end{aligned}$$

(2.3)

The proof is given in “Appendix 1”.

2.2 Hartman–Grobman Asymptotics

The zeroth-order approximation of a smooth dynamical system on a finite time horizon [0, T] subject to small perturbations is given by the deterministic system, that is, $(X^0_t(x))_{t\in [0,T]}$. Our main results treat small noise asymptotics close to the stable state 0, which translates to meaningful time scales $t_\varepsilon \rightarrow \infty $, as $\varepsilon \rightarrow 0$, in Theorems 1 and 2. Before we state our main result, we first provide the long-time asymptotics of $X^0_t(x)$ in terms of the spectral decomposition of the solution $t\mapsto e^{-Db(0)t}x^*$ of the respective linear system for some $x^*$ in a small neighbourhood of the origin.

Lemma 2.2

(Asymptotic Hartman–Grobman) Assume Hypothesis 1. Then for any $x\in \mathbb {R}^{d}\setminus \{0\}$ there exist:

(i)
positive constants $ \mathfrak {q}^x, \tau ^x,\ell ^x, m^x$ with $\ell ^x,m^x\in \{1,\ldots ,d\}$,
(ii)
angular velocities $\theta ^{x}_{1},\dots ,\theta ^x_{m^x}\in \mathbb {R}$, where all $\theta ^x_k \ne 0$ come in pairs $(\theta ^x_{j_*},\theta ^x_{j_*+1})=(\theta ^x_{j_*}, -\theta ^x_{j_*})$,
(iii)
linearly independent vectors $v_1^x,\dots ,v_{m^x}^x$ in $\mathbb {C}^d$ which are complex conjugate $(v^x_{j_*},v^x_{j_*+1})=(v^x_{j_*}, \bar{v}^x_{j_*})$ whenever $(\theta ^x_{j_*},\theta ^x_{j_*+1})=(\theta ^x_{j_*}, -\theta ^x_{j_*})$,

such that

$$\begin{aligned} \lim _{t \rightarrow \infty } \left| \frac{e^{\mathfrak {q}^x t}}{t^{\ell ^x-1}} X^0_{t+\tau ^x}(x) - \sum _{k=1}^{m^x} e^{i\theta ^x_k t}v^x_k \right| =0. \end{aligned}$$

(2.4)

Moreover,

$$\begin{aligned} 0<\liminf _{t\rightarrow \infty }\left| \sum _{k=1}^{m^x} e^{i t\theta ^x_k} v^x_k\right| \leqslant \limsup _{t\rightarrow \infty }\left| \sum _{k=1}^{m^x} e^{i t\theta ^x_k} v^x_k\right| \leqslant \sum _{k=1}^{m^x} |v^x_k|. \end{aligned}$$

(2.5)

The formal proof of the previous lemma is given in Lemma B.2 in Appendix B of [5].

Remark 2.3

(1)
Convention: Note that $\theta ^x_k=0$ is true for at most one index $k\in \{1,\ldots , m^x\}$. If such an index shows up in $\theta ^x_{1},\ldots , \theta ^x_{m^x}$ we adopt the convention that $\theta ^x_1=0$ and $v_1^x\in \mathbb {R}^d$, and hence $m^x=2n+1$ for some $n\in \mathbb {N}_0$. Otherwise, $m^x=2n$ for some $n\in \mathbb {N}_0$ and we eliminate $\theta ^x_1$ and count the angular velocities as follows $\theta ^x_2,\ldots , \theta ^x_{2n+1}$.
(2)
Note that the linearly independent complex vectors $v_1^x,\dots ,v_{m^x}^x$ in $\mathbb {C}^d$ not only depend on x but also crucially on the dissipation time $\tau ^x$ of the deterministic system to a Hartman–Grobman domain of conjugacy U. We stress that $\tau ^x$ is not unique since $X^0_{t+\tau ^x}(x) \in U$ for all $t\geqslant 0$.
(3)
A word about the parameters $\ell ^x$, $\mathfrak {q}^x$ and $m^x$ in Lemma 2.2. By the Hartman–Grobman theorem there are open sets $0\in U, V\subset \mathbb {R}^d$ and a homeomorphism $H: U\rightarrow V$ with $H(0) = 0$ satisfying for all $u\in U$ and $t\geqslant 0$
$$\begin{aligned} H(X^0_t(u)) = e^{- Db(0) t}H(u). \end{aligned}$$
(2.6)
In fact, by Hypothesis 1 we have that H is a $\mathcal {C}^1$-diffeomorphism, see the original paper [8] or Theorem (Hartman), Sec. 2.8, p. 127 in [13]. In [8] it is shown that H can be chosen to be
$$\begin{aligned} H(x) = x + o(|x|)_{|x| \rightarrow 0}. \end{aligned}$$
Let $\tilde{u} = X^0_{\tau ^x}(x) \in U$. With the help of a linear coordinate change W we obtain the Jordan normal form $Db(0) = W^{-1} J(Db(0)) W$ and (using the linearity of the semigroup)
$$\begin{aligned} H(X^0_{t+\tau ^x}(x)) = W^{-1} e^{- J(Db(0)) t} (W H(\tilde{u})). \end{aligned}$$
We denote $\tilde{w} = W H(\tilde{u})$. Now, the parameters $\ell ^x$, $\mathfrak {q}^x$ and $m^x$ are given as follows. Consider the sequence of generalized eigenspaces $H_{j}$ of J(Db(0)) such that
$$\begin{aligned} \mathbb {R}^d = H_1\oplus \dots \oplus H_{k_*}. \end{aligned}$$
By construction, $\tilde{w} \in G(\tilde{w}) := \text{ span }(\{H_k~|~\text{ where } 1\leqslant k\leqslant k_*: ~\text{ proj }(\tilde{w}, H_k)\ne 0\})$. Note that $G(\tilde{w})$ is unique. We consider the restriction
$$\begin{aligned} \tilde{J}(\tilde{w}):= J(Db(0))\big |_{G(\tilde{w})}. \end{aligned}$$
Now, $\mathfrak {q}^x$ is the smallest real part of the spectrum of $\tilde{J}(\tilde{w})$, $\ell ^x$ is the dimension of the largest Jordan block of $\tilde{J}(\tilde{w})$ which has the real part $\mathfrak {q}^x$ and $m^x$ is the number of Jordan blocks associated to $\mathfrak {q}^x$ and $\ell ^x$. Note that in case of a non real eigenvalue with real part $\mathfrak {q}^x$ and Jordan block size $\ell ^x$, we have $m^x\geqslant 2$. For an extensive numerical example for a linear chain of oscillators we refer to Sect. 4.3.2 in [2].

2.3 Main Results

Our first main result establishes the $\infty /0$ collapse of the Wasserstein distance between the law of the current state $X^{\varepsilon }_t(x)$ and the dynamical equilibrium $\mu ^\varepsilon $ along the critical time scale $\mathfrak {t}^x_\varepsilon $ given in (2.7) under mild conditions.

Theorem 1

(Window cutoff) Let b satisfy Hypothesis 1 and $\nu $ satisfy Hypothesis 2 for some $p_*>0$. Fix $x\in \mathbb {R}^d\setminus \{0\}$ and consider the notation in the asymptotic Hartman–Grobman representation $\mathfrak {q}^x>0$, $\ell ^x , m^x \in \{1,\ldots , d\}$, $\theta ^x_1,\dots ,\theta ^x_{m^x} \in [0,2\pi )$, $v^x_1,\dots ,v^x_{m^x} \in \mathbb {C}^d$ and $\tau ^x>0$ of Lemma 2.2.

Then the family of processes $(X^{\varepsilon }(x))_{\varepsilon >0}$ exhibits a window cutoff phenomenon on the time scale

$$\begin{aligned} \mathfrak {t}^x_\varepsilon =\frac{1}{\mathfrak {q}^x}|\ln (\varepsilon )|+\frac{\ell ^x-1}{\mathfrak {q}^x}\ln (|\ln (\varepsilon )|) \end{aligned}$$

(2.7)

and for all asymptotically constant window sizes $w_\varepsilon $, that is, $w_\varepsilon \rightarrow w>0$ as $\varepsilon \rightarrow 0$, in the following sense. For all $0<p< p_*$ we have

$$\begin{aligned}&\lim _{r\rightarrow \infty }\limsup _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x),\mu ^\varepsilon )}{\varepsilon ^{{1\wedge p}}}=0 \qquad \text{ and } \nonumber \\&\lim _{r\rightarrow -\infty }\liminf _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\infty . \end{aligned}$$

(2.8)

The second main result provides two characterizations for the proper limits ($\varepsilon \rightarrow 0$) of the expressions in (2.8) for any fixed $r\in \mathbb {R}$. That is to say, we characterize under which conditions the asymptotics (1.3) is satisfied. In addition, it yields the precise shape of the limit, which turns out to be a simple exponential function for $p\in [1,p_*)$.

Theorem 2

(Dynamical profile cutoff characterization for $p_*>0$) Let the assumptions (and the notation) of Theorem 1 be valid for some $p_*>0$. Consider the unique strong solution $(\mathcal {O}_t)_{t\geqslant 0}$ of the linear system

$$\begin{aligned} \mathrm {d}\mathcal {O}_t=-Db(0)\mathcal {O}_t\mathrm {d}t+\mathrm {d}L_t, \end{aligned}$$

(2.9)

where $\mathcal {O}_\infty $ is the unique invariant probability distribution of (2.9).

(1)
Then for any $0<p< p_*$ the following statements are equivalent.
1. (i)
  For any $\lambda >0$, the function $\omega (x)\ni u\mapsto \mathcal {W}_{p}(\lambda u+\mathcal {O}_\infty ,\mathcal {O}_\infty )$ is constant, where
  $$\begin{aligned} \omega (x):= \Big \{ \text {accumulation points of } \sum _{k=1}^{m^x} e^{i t \theta ^x_k} v^x_k \text { as } t\rightarrow \infty \Big \}. \end{aligned}$$
2. (ii)
  The family of processes $(X^{\varepsilon }(x))_{\varepsilon >0}$ exhibits a profile cutoff for any $0< p< p_*$ as follows
  $$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}= \mathcal {P}^{x}_{p}(r) \quad \text { for any } r\in \mathbb {R}, \end{aligned}$$
  where
  $$\begin{aligned} \mathcal {P}^{x}_{p}(r):=\mathcal {W}_{p}\Big (\kappa ^x(r)\cdot v+ \mathcal {O}_\infty ,\mathcal {O}_\infty \Big ) \qquad \text{ for } \text{ any } v\in \omega (x) \end{aligned}$$
  (2.10)
  and
  $$\begin{aligned} \kappa ^x(r)= \frac{e^{-\mathfrak {q}^x r\cdot w}}{e^{\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{\ell ^x-1}}. \end{aligned}$$
(2)
For $p_*> 1$ and $p\in [1,p_*)$ the profile has the shape
$$\begin{aligned} \mathcal {P}^{x}_{p}(r)=\kappa ^x(r)\cdot |v|\quad \text { for all }v\in \omega (x) \end{aligned}$$
if and only if $\omega (x)$ is contained in a sphere in $\mathbb {R}^d$ with respect to the Euclidean norm.
(3)
We recall the convention of Remark 2.3. Let $p_*> 1$ and $p\in [1,p_*)$. If the angles $\theta ^x_{2},\ldots , \theta ^x_{2n}$ satisfy the following non-resonance condition
$$\begin{aligned} h_1\theta _2 + \ldots + h_n \theta _{2n} \notin 2 \pi \cdot \mathbb {Z} \qquad \text{ for } \text{ all } (h_1, \ldots , h_n)\in \mathbb {Z}^n\setminus \{0\}, \end{aligned}$$
(2.11)
then the statements (i) and (ii) in item (1) are equivalent to the following normal growth condition of the asymptotic Hartman–Grobman linearization: The family of limiting vectors $(v_1^x,\mathsf {Re}\,v^x_2,\mathsf {Im}\, v^x_2,\ldots ,\mathsf {Re}\, v^x_{2n},\mathsf {Im}\, v^x_{2n})$ is orthogonal in $\mathbb {R}^d$ and satisfies
$$\begin{aligned} |\mathsf {Re}\, v^x_{2k}|=|\mathsf {Im}\, v^x_{2k}|\qquad \text{ for } \text{ all } \quad k=1,\ldots ,n. \end{aligned}$$
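The normal growth condition in item (3) can be visualized for a single rotating pair $(\theta ,-\theta )$ with vectors $(v,\bar{v})$, where the limiting curve is $e^{i\theta t}v+e^{-i\theta t}\bar{v}=2\,\mathsf {Re}(e^{i\theta t}v)$. The following sketch, with hypothetical function names and example vectors, samples the Euclidean norm of this curve over one period. A spread close to zero indicates that the curve stays on a sphere, which is the situation of item (2).

```python
import math


def limit_curve_norm(re_v, im_v, theta, t):
    """Euclidean norm of e^{i theta t} v + e^{-i theta t} conj(v) = 2 Re(e^{i theta t} v)."""
    c, s = math.cos(theta * t), math.sin(theta * t)
    # Re(e^{i theta t} v) = cos(theta t) Re v - sin(theta t) Im v, componentwise.
    return 2.0 * math.sqrt(sum((c * a - s * b) ** 2 for a, b in zip(re_v, im_v)))


def norm_spread(re_v, im_v, theta=1.0, n=400):
    """Max minus min of the norm over one period of the rotation."""
    values = [
        limit_curve_norm(re_v, im_v, theta, 2.0 * math.pi * k / (n * theta))
        for k in range(n)
    ]
    return max(values) - min(values)


# Re v and Im v orthogonal with equal length: the curve is a circle.
print(norm_spread((1.0, 0.0), (0.0, 1.0)))
# Re v and Im v neither orthogonal nor of equal length: the curve is an ellipse.
print(norm_spread((1.0, 0.0), (0.5, 2.0)))
```

This only treats one pair of angular velocities; with several pairs the non-resonance condition (2.11) enters, which the sketch does not check.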

Remark 2.4

We stress that $\mathcal {O}_\infty = \lim _{t\rightarrow \infty } \mathcal {O}_t$ in $\mathcal {W}_{p_*}$ and due to Hypothesis 1 (in combination with Hypothesis 2) the distribution of $\mathcal {O}_\infty $ does not depend on any deterministic initial condition of (2.9).

Due to their relevance as physical observables, we formulate the corresponding window cutoff result for the respective moments.

Corollary 2.5

(Moments cutoff) Let the assumptions (and the notation) of Theorem 1 be valid for some $p_*>0$. Then for any $0< p<p_*$ it follows

$$\begin{aligned} \lim _{r\rightarrow \infty } \liminf _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{ p}}&=\lim _{r\rightarrow \infty } \limsup _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{ p}}= \mathbb {E}[|\mathcal {O}_{\infty }|^{p}],\\ \lim _{r\rightarrow -\infty } \liminf _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{p}}&=\lim _{r\rightarrow -\infty } \limsup _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{ p}}=\infty . \end{aligned}$$
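For orientation, the statement can be verified by hand in the simplest case, which we add here as an illustration under explicit assumptions: $d=1$, $b(x)=\mathfrak {q}x$ with $\mathfrak {q}>0$, L a standard Brownian motion B, and $p=2$. Then $\ell ^x=1$, $\mathfrak {t}^x_\varepsilon =|\ln (\varepsilon )|/\mathfrak {q}$ and $X^\varepsilon _t(x)=e^{-\mathfrak {q}t}x+\varepsilon \int _0^t e^{-\mathfrak {q}(t-s)}\mathrm {d}B_s$, so that for $\varepsilon \in (0,1)$

$$\begin{aligned} \mathbb {E}[|X^{\varepsilon }_{t}(x)|^{2}]&=e^{-2\mathfrak {q}t}x^2+\varepsilon ^2\,\frac{1-e^{-2\mathfrak {q}t}}{2\mathfrak {q}},\\ \frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{2}]}{\varepsilon ^{2}}&=e^{-2\mathfrak {q}r w_\varepsilon }x^2+\frac{1-\varepsilon ^2e^{-2\mathfrak {q}r w_\varepsilon }}{2\mathfrak {q}} \;\longrightarrow \; e^{-2\mathfrak {q}r w}x^2+\frac{1}{2\mathfrak {q}}\quad \text { as }\quad \varepsilon \rightarrow 0. \end{aligned}$$

Since $\mathbb {E}[|\mathcal {O}_{\infty }|^{2}]=1/(2\mathfrak {q})$ in this case, the limit $r\rightarrow \infty $ yields $\mathbb {E}[|\mathcal {O}_{\infty }|^{2}]$, while for $x\ne 0$ the limit $r\rightarrow -\infty $ yields $\infty $, in agreement with the corollary.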

3 Examples

In this section we present two examples which illustrate the applicability of Theorem 1 and Theorem 2 to nonlinear dynamics with degenerate noise.

Example 3.1

(The Fermi–Pasta–Ulam–Tsingou potential) We consider the nonlinear Langevin gradient system

$$\begin{aligned} \mathrm {d}X^\varepsilon _t = - \nabla \mathcal {U}(X^\varepsilon _t)\mathrm {d}t + \varepsilon \mathrm {d}L_t \end{aligned}$$

(3.1)

for the strongly convex quartic Fermi–Pasta–Ulam–Tsingou potential $\mathcal {U}(x) = \frac{1}{2} |x|^2 + \frac{1}{4}|x|^4$, $x\in \mathbb {R}^d$ subject to degenerate noise $\mathrm {d}L_t$. For any Lévy process L satisfying Hypothesis 2 for some $p_*>0$ the system (3.1) exhibits a profile cutoff due to Theorem 2 where the cutoff time is given by $\mathfrak {t}_\varepsilon ^x = |\ln (\varepsilon )|$. For $p_*>1$ and any $p\in [1,p_*)$ the profile function in $\mathcal {W}_{p}$ is always of the following exponential shape

$$\begin{aligned} \mathcal {P}^{x}_{p}(r)=e^{-wr- \tau ^x}\Big |\sum _{k=1}^m v^x_k\Big |, \end{aligned}$$

(3.2)

where $\tau ^{x}:=\min \{t\geqslant 0:|X^0_t(x)|\leqslant R_0/2\}$ and $R_0$ is a small radius inside of which the Hartman–Grobman conjugation is valid. Note that $\tau ^x$ can be replaced by any upper bound of $\tau ^x$ such as, for instance, $(1/\delta )\ln (2|x|/R_0)$ given by Hypothesis 1.

In particular, the profile cutoff (3.2) is valid for $L=L^\alpha $ being a (possibly degenerate) $\alpha $-stable process with index $\alpha \in (1,2]$. Note that for the limiting case of a possibly degenerate Cauchy process ($\alpha =1$) and in fact for any $L^\alpha $ with index $\alpha \in (0,1)$, Theorem 2 also yields a profile cutoff. However, the profile function is not explicit. This is due to the absence of a finite first moment and the lack of the shift linearity (2.1), for which only the bounds (2.2) are available. In other words, the profile function is given in (2.10) for $p\in (0,\alpha )$ and, to our knowledge, it is not known how to simplify it further. Note that the case of $\alpha \in (0,3/2]$ is new and is not covered in [3].
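The cutoff time $|\ln (\varepsilon )|$ reflects that the deterministic flow of (3.1) decays like $e^{-t}$ without polynomial correction. The sketch below illustrates this numerically. It uses that $\nabla \mathcal {U}(x)=(1+|x|^2)x$ is radial, so that $r_t=|X^0_t(x)|$ solves $r'=-(r+r^3)$. The closed-form limit $e^{t}r_t\rightarrow |x|/\sqrt{1+|x|^2}$ is our own computation by separation of variables and is not stated in the article; the integrator and the parameter values are likewise illustrative.

```python
import math


def radial_rhs(r):
    # grad U(x) = (1 + |x|^2) x for U(x) = |x|^2/2 + |x|^4/4, hence r' = -(r + r^3).
    return -(r + r ** 3)


def rk4_radius(r0, t_end, n_steps=20000):
    """Integrate r' = -(r + r^3) on [0, t_end] with the classical Runge-Kutta scheme."""
    h, r = t_end / n_steps, r0
    for _ in range(n_steps):
        k1 = radial_rhs(r)
        k2 = radial_rhs(r + 0.5 * h * k1)
        k3 = radial_rhs(r + 0.5 * h * k2)
        k4 = radial_rhs(r + h * k3)
        r += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return r


r0, t = 2.0, 10.0
rescaled = math.exp(t) * rk4_radius(r0, t)   # e^{t} |X^0_t(x)| at a large time
predicted = r0 / math.sqrt(1.0 + r0 ** 2)    # limit obtained by separating variables
print(rescaled, predicted)
```

The agreement of the two printed values is consistent with $\mathfrak {q}^x=1$ and $\ell ^x=1$ for every $x\ne 0$ in this example.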

Example 3.2

(Nonlinear non-gradient with degenerate noise) For $F,\mathcal {H}\in \mathcal {C}^2(\mathbb {R}^2,\mathbb {R})$ we consider the following perturbed simple harmonic oscillator with unit angular frequency given in Section 4 of [19] subject to a small noise perturbation

$$\begin{aligned} \mathrm {d}\left( \begin{matrix} X^{\varepsilon ,1}_t \\ X^{\varepsilon ,2}_t \end{matrix} \right) =- \left( \begin{array}{c} X^{\varepsilon ,2}_t \,F(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t)-\partial _1 \mathcal {H}(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t) \\ -X^{\varepsilon ,1}_t \,F(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t)-\partial _2 \mathcal {H}(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t) \end{array} \right) \mathrm {d}t+ \varepsilon \mathrm {d}\left( \begin{matrix} 0 \\ \mathcal {L}_t \end{matrix} \right) , \end{aligned}$$

where $\mathcal {L}=(\mathcal {L}_t)_{t\geqslant 0}$ is a one dimensional Lévy process with finite $p_*$-th moments. The Jacobian matrix $Jb(v_1,v_2)$ at $(v_1,v_2)$ of the respective vector field $b:\mathbb {R}^2\rightarrow \mathbb {R}^2$ is given by

$$\begin{aligned} \left( \begin{matrix} v_2\partial _1 F(v_1,v_2)-\partial _{11}\mathcal {H}(v_1,v_2) &{} F(v_1,v_2)+v_2\partial _2 F(v_1,v_2)-\partial _{12}\mathcal {H}(v_1,v_2)\\ -F(v_1,v_2)-v_1\partial _{1}F(v_1,v_2)-\partial _{12}\mathcal {H}(v_1,v_2) &{} -v_1\partial _{2}F(v_1,v_2)- \partial _{22}\mathcal {H}(v_1,v_2) \end{matrix} \right) . \end{aligned}$$

It is enough to prove the existence of a positive constant $\delta $ such that for any $u_1,u_2,v_1,v_2\in \mathbb {R}$ it follows

$$\begin{aligned} (u_1,u_2) Jb(v_1,v_2)(u_1,u_2)^*&=(v_2\partial _1 F(v_1,v_2)-\partial _{11}\mathcal {H}(v_1,v_2))u^2_1\nonumber \\&\qquad +\, (-v_1\partial _2 F(v_1,v_2)-\partial _{22}\mathcal {H}(v_1,v_2))u^2_2 \nonumber \\&\qquad +\, (v_2\partial _2 F(v_1,v_2)-v_1\partial _{1}F(v_1,v_2)-2\partial _{12}\mathcal {H}(v_1,v_2))u_1u_2 \nonumber \\&\geqslant \delta (u^2_1+u^2_2). \end{aligned}$$

(3.3)

For instance, for a nonlinear perturbation of a linear oscillator, that is, $F(v_1,v_2)=\eta $ for some $\eta >0$, the preceding condition reads

$$\begin{aligned} -\Big (\partial _{11}\mathcal {H}(v_1,v_2)u^2_1+\partial _{22}\mathcal {H}(v_1,v_2)u^2_2+2\partial _{12}\mathcal {H}(v_1,v_2)u_1u_2\Big )\geqslant \delta (u^2_1+u^2_2). \end{aligned}$$

For $\mathcal {L}$ satisfying Hypothesis 2 with $p_*$, and F, $\mathcal {H}$ fulfilling (3.3), Theorem 1 implies window cutoff for any initial condition $(X^{\varepsilon ,1}_0,X^{\varepsilon ,2}_0)=x\in \mathbb {R}^2\setminus \{0\}$ and any $p\in (0,p_*)$. The cutoff time is given by

$$\begin{aligned} \mathfrak {t}^x_\varepsilon = \frac{1}{\mathfrak {q}^x} |\ln (\varepsilon )|+\frac{\ell ^x-1}{\mathfrak {q}^x}\ln (|\ln (\varepsilon )|). \end{aligned}$$

Note that this result is new even in the Brownian case since the results of [3] and [5] are stated for the total variation distance, which requires regularity of the transition probabilities given in the setting of non-degenerate noise. In our case, the Wasserstein distance circumvents this difficulty by the continuity of $\mathcal {W}_{p}(x+X,X)$ for any $X\in L^{p}$ as $|x|\rightarrow 0$ and $|x|\rightarrow \infty $, while the total variation distance requires absolute continuity of the distribution of X. We refer to [3], Lemma 1.17 in Subsection 1.3.5, for an example where the continuity of the total variation distance under shifts is not valid.

In the sequel, we characterize the existence of a profile cutoff under (3.3) in terms of the linearization at the stable state (0, 0). Let $a:=-\partial ^2_{11}\mathcal {H}(0,0)$, $b:=-\partial ^2_{22}\mathcal {H}(0,0)$, $c:=-\partial _{12}\mathcal {H}(0,0)$ and $\eta _0:=-F(0,0)$. Then

$$\begin{aligned} Jb(0,0)= \left( \begin{matrix} a &{} -\eta _0+c\\ \eta _0+c &{} b \end{matrix} \right) . \end{aligned}$$

Note that $\eta _0=c$ implies that the eigenvalues of Jb(0, 0) are the numbers a and b which are positive and hence by Theorem 2 profile cutoff is valid. In the sequel we assume $\eta _0 \ne c$. Then the eigenvalues of Jb(0, 0) are given by

$$\begin{aligned} \lambda _{\pm }:=\frac{(a+b)\pm \sqrt{\Delta }}{2},\quad \Delta :=(a-b)^2+4(c^2-\eta ^2_0), \end{aligned}$$

with corresponding eigenvectors

$$\begin{aligned} v_{\pm }:= \left( 1,-\frac{a-b\mp \sqrt{\Delta }}{2(-\eta _0+c)}\right) . \end{aligned}$$

In addition,

$$\begin{aligned} \mathsf {Re}(v_{\pm })={\left\{ \begin{array}{ll} \left( 1,-\frac{a-b\mp \sqrt{\Delta }}{2(-\eta _0+c)}\right) &{} \text {if } \Delta \geqslant 0,\\ \left( 1,-\frac{a-b}{2(-\eta _0+c)}\right) &{} \text {if } \Delta< 0, \end{array}\right. }\qquad \mathsf { and } \qquad \mathsf {Im}(v_{\pm })={\left\{ \begin{array}{ll} \left( 0,0\right) &{} \text {if } \Delta \geqslant 0,\\ \pm \left( 0, \frac{ \sqrt{|\Delta |} }{2(-\eta _0+c)}\right) &{} \text {if } \Delta < 0. \end{array}\right. } \end{aligned}$$

For $\Delta \geqslant 0$ Theorem 2 yields a profile cutoff phenomenon. For $\Delta <0$ Theorem 1 implies the weaker window cutoff phenomenon. However, by part (3) of Theorem 2, the stronger profile cutoff for $p_*>1$ and $p\in [1,p_*)$ is valid if and only if

$$\begin{aligned} |\mathsf {Re}(v_{+})|^2=|\mathsf {Im}(v_{+})|^2 \text { and } \langle \mathsf {Re}(v_{+}),\mathsf {Im}(v_{+}) \rangle =0 \end{aligned}$$

which is equivalent to the special case $a=b$ and $c=0$. In other words, $e^{-Jb(0,0)t}=e^{-at}R(\theta t)$, where $R(\theta t)$ is the $2\times 2$ rotation matrix with angle $\theta t$.
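The spectral formulas above can be checked numerically. The following Python sketch is illustrative only: the function names `spectrum`, `rotation_condition` and `eigen_residual` as well as the sample parameter values are assumptions introduced here and are not part of the original analysis. It evaluates $\lambda _{\pm }$ and $v_{+}$ from the closed-form expressions and, for $\Delta <0$, tests the two conditions on $\mathsf {Re}(v_{+})$ and $\mathsf {Im}(v_{+})$.

```python
import cmath


def spectrum(a, b, c, eta0):
    """Return (Delta, lambda_plus, lambda_minus, v_plus) for the matrix
    [[a, c - eta0], [c + eta0, b]], using the closed-form expressions.
    Assumes eta0 != c, as in the text."""
    if eta0 == c:
        raise ValueError("the eigenvector formula requires eta0 != c")
    delta = (a - b) ** 2 + 4.0 * (c ** 2 - eta0 ** 2)
    root = cmath.sqrt(delta)  # purely imaginary when delta < 0
    lam_plus = ((a + b) + root) / 2.0
    lam_minus = ((a + b) - root) / 2.0
    v_plus = (1.0 + 0.0j, -((a - b) - root) / (2.0 * (c - eta0)))
    return delta, lam_plus, lam_minus, v_plus


def rotation_condition(a, b, c, eta0, tol=1e-12):
    """For Delta < 0, test |Re v_+|^2 = |Im v_+|^2 and <Re v_+, Im v_+> = 0."""
    delta, _, _, v_plus = spectrum(a, b, c, eta0)
    if delta >= 0:
        raise ValueError("the condition is only relevant for Delta < 0")
    re = [z.real for z in v_plus]
    im = [z.imag for z in v_plus]
    norm_gap = sum(x * x for x in re) - sum(y * y for y in im)
    inner = sum(x * y for x, y in zip(re, im))
    return abs(norm_gap) <= tol and abs(inner) <= tol


def eigen_residual(a, b, c, eta0):
    """Largest entry of |M v_+ - lambda_+ v_+|, a sanity check of the formulas."""
    _, lam_plus, _, (v1, v2) = spectrum(a, b, c, eta0)
    row1 = a * v1 + (c - eta0) * v2 - lam_plus * v1
    row2 = (c + eta0) * v1 + b * v2 - lam_plus * v2
    return max(abs(row1), abs(row2))
```

For the hypothetical sample values $a=b=1$, $c=0$, $\eta _0=2$ one has $\Delta =-16<0$ and `rotation_condition` returns `True`, whereas changing either $b$ or $c$ makes it return `False`, in line with the characterization $a=b$ and $c=0$.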

Remark 3.3

(A word about the linear dynamics) In [2] the authors study (1.1) for the linear vector field $b(x)=\mathcal {Q}x$ for any Hurwitz stable matrix $-\mathcal {Q}$, that is, $\mathsf {Re}(\lambda )<0$ for any eigenvalue $\lambda $ of $-\mathcal {Q}$. Under these assumptions, the results of Theorem 1 and Theorem 2 are obtained.

It is not hard to see that Hypothesis 1 implies $\mathsf {Re}(\lambda )\leqslant -\delta $ for any eigenvalue $\lambda $ of $-\mathcal {Q}$ and hence Hurwitz stability. However, the dissipativity condition (1.2), which is assumed in order to control the nonlinear vector field, is strictly stronger than Hurwitz stability. For instance, consider the vector field $b:\mathbb {R}^2 \rightarrow \mathbb {R}^2$ given by $b(x)=\mathcal {Q}x$ with

$$\begin{aligned} \mathcal {Q}= \left( \begin{matrix} 0 &{} -1\\ \lambda &{} \lambda \end{matrix} \right) \text { with } \lambda \in (0,1/2). \end{aligned}$$

The eigenvalues of $-\mathcal {Q}$ have real part $-\lambda /2<0$, so $-\mathcal {Q}$ is Hurwitz stable, but $b$ does not satisfy Hypothesis 1. Note that the dissipativity condition (1.2) is not even satisfied locally in a neighborhood of the origin.
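The gap between Hurwitz stability and dissipativity in this example can be made concrete with a few lines of Python. This is a sketch under stated assumptions: the displayed matrix is read as $\mathcal {Q}$ (so that the linear drift is $-\mathcal {Q}$), Hypothesis 1 is taken in the form $\langle \mathcal {Q}y,y\rangle \geqslant \delta |y|^2$, and the helper names, the value `lam = 0.25` and the test vector `y` are hypothetical choices made here.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a real 2x2 matrix via the trace/determinant formula."""
    (p, q), (r, s) = m
    trace = p + s
    det = p * s - q * r
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def quadratic_form(m, y):
    """Return <m y, y> for a 2x2 matrix m and a vector y in R^2."""
    (p, q), (r, s) = m
    return y[0] * (p * y[0] + q * y[1]) + y[1] * (r * y[0] + s * y[1])


lam = 0.25  # hypothetical sample value in (0, 1/2)
Q = ((0.0, -1.0), (lam, lam))
minus_Q = tuple(tuple(-entry for entry in row) for row in Q)

# Hurwitz stability: both eigenvalues of -Q have real part -lam / 2.
real_parts = [z.real for z in eigenvalues_2x2(minus_Q)]

# Dissipativity (in the assumed form) needs <Q y, y> > 0 for all y != 0.
y = (1.0, 0.1)
form_value = quadratic_form(Q, y)  # negative, so the condition fails
scaled_value = quadratic_form(Q, (1e-3 * y[0], 1e-3 * y[1]))  # still negative
```

Since the quadratic form is homogeneous of degree two, rescaling `y` does not change its sign, which illustrates why the condition also fails in every neighborhood of the origin.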

4 Proofs of the Main Results

4.1 The First Order Approximation

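We define the Freidlin-Wentzell first order approximation given by

$$\begin{aligned} Y^\varepsilon _t(x):=X^0_t(x)+\varepsilon \mathcal {Y}^x_t,\quad t\geqslant 0, \end{aligned}$$

(4.1)

where $(\mathcal {Y}^{x}_t)_{t\geqslant 0}$ is the unique strong solution of the linear inhomogeneous SDE

$$\begin{aligned} \mathrm {d}\mathcal {Y}^x_t=-Db(X^0_t(x))\mathcal {Y}^x_t\,\mathrm {d}t+\mathrm {d}L_t,\quad \mathcal {Y}^x_0=0. \end{aligned}$$

(4.2)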

In [3], Lemma C.4 in Section C.4 it is shown that $Y^{\varepsilon }_t(x)$ converges in total variation distance to a unique limiting distribution $\mu ^\varepsilon _*$ as $t\rightarrow \infty $. Moreover, it is shown there that $\mu ^\varepsilon _*{\mathop {=}\limits ^{d}}\varepsilon \mathcal {O}_\infty $, where $\mathcal {O}_\infty $ is the unique invariant probability distribution of the homogeneous Ornstein-Uhlenbeck dynamics

$$\begin{aligned} \mathrm {d}\mathcal {O}_t=-Db(0)\mathcal {O}_t\,\mathrm {d}t+\mathrm {d}L_t. \end{aligned}$$

(4.3)

In the sequel we reduce the nonlinear ergodic convergence of $X^\varepsilon _t(x)$ to the ergodic convergence of the Freidlin-Wentzell linearization $Y^\varepsilon _t(x)$ in (4.4) up to error terms. For any $0<p\leqslant p_*$, by the triangle inequality it follows that

$$\begin{aligned} \mathcal {W}_{p}(X^\varepsilon _t(x),\mu ^\varepsilon )\leqslant \mathcal {W}_{p}(X^\varepsilon _t(x),Y^\varepsilon _t(x))+\mathcal {W}_{p}(Y^\varepsilon _t(x),\mu ^\varepsilon _*)+\mathcal {W}_{p}(\mu ^\varepsilon _*,\mu ^\varepsilon ) \end{aligned}$$

for any $t\geqslant 0$, $x\in \mathbb {R}^d$. Analogously we estimate

$$\begin{aligned} \mathcal {W}_{p}(Y^\varepsilon _t(x),\mu ^\varepsilon _*)\leqslant \mathcal {W}_{p}(Y^\varepsilon _t(x),X^\varepsilon _t(x))+\mathcal {W}_{p}(X^\varepsilon _t(x),\mu ^\varepsilon )+\mathcal {W}_{p}(\mu ^\varepsilon ,\mu ^\varepsilon _*). \end{aligned}$$

Combining the preceding inequalities we obtain the linear approximation

$$\begin{aligned} \left| \mathcal {W}_{p}(X^\varepsilon _t(x),\mu ^\varepsilon )-\mathcal {W}_{p}(Y^\varepsilon _t(x),\mu ^\varepsilon _*) \right| \leqslant \mathcal {W}_{p}(X^\varepsilon _t(x),Y^\varepsilon _t(x))+\mathcal {W}_{p}(\mu ^\varepsilon ,\mu ^\varepsilon _*) \end{aligned}$$

(4.4)

for any $t\geqslant 0$, $x\in \mathbb {R}^d$. In Proposition 2 given in “Appendix B.2” we show that for any $t_\varepsilon =O(|\ln (\varepsilon )|)$ and $0< p < p_*$ the following limit holds

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{t_\varepsilon }(x),Y^\varepsilon _{t_\varepsilon }(x))}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

(4.5)

Moreover, in Lemma B.2 we show that for $0< p < p_*$

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(\mu ^\varepsilon _*,\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

(4.6)

4.2 Derivation of the Cutoff Phenomenon

In the sequel, we analyze the asymptotic behavior of $\mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)\cdot \varepsilon ^{-(1\wedge p)}$ from which we recognize the cutoff of the Freidlin-Wentzell linearization $Y^\varepsilon _t(x)$. By the triangle inequality, translation invariance, homogeneity and shift linearity given in Lemma 2.1 we obtain for $0< p\leqslant p_*$

$$\begin{aligned} \mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)&= \mathcal {W}_{p}(X^0_t(x) + \varepsilon \mathcal {Y}^x_t, \varepsilon \mathcal {O}_\infty ) \\&\leqslant \mathcal {W}_{p}(X^0_t(x) + \varepsilon \mathcal {Y}^x_t, X^0_t(x) + \varepsilon \mathcal {O}_\infty ) + \mathcal {W}_{p}(X^0_t(x) + \varepsilon \mathcal {O}_\infty , \varepsilon \mathcal {O}_\infty )\\&= \varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty ) +\varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\varepsilon ^{-1} \cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty ). \end{aligned}$$

Analogously we deduce

$$\begin{aligned} \mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)&\geqslant \varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\varepsilon ^{-1}\cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty ) -\varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty ). \end{aligned}$$

Consequently,

$$\begin{aligned} \Big |\frac{\mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)}{\varepsilon ^{1\wedge p}} - \mathcal {W}_{p}(\varepsilon ^{-1} \cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty ) \Big | \leqslant \mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty ). \end{aligned}$$

(4.7)

The right-hand side of (4.7) does not depend on $\varepsilon $ and by Lemma B.3 it tends to 0 as $t\rightarrow \infty $. It is therefore enough to study the precise long-term behavior of $\mathcal {W}_{p}(\varepsilon ^{-1}\cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty )$ in order to derive the cutoff phenomenon.
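The estimate (4.7) can be illustrated on empirical measures. The sketch below is not taken from the original proof and rests on several assumptions: it works in dimension one, restricts to $p\geqslant 1$ (where the sorted coupling is optimal and $\varepsilon ^{1\wedge p}=\varepsilon $), and replaces the laws of $\mathcal {Y}^x_t$ and $\mathcal {O}_\infty $ by small hypothetical samples `ys` and `os_`. The function name `wasserstein_1d` is introduced here for illustration.

```python
def wasserstein_1d(xs, ys, p):
    """W_p between two equal-size empirical measures on the real line, p >= 1.
    In one dimension the sorted (quantile) coupling is optimal for p >= 1."""
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / p)


# Hypothetical toy data standing in for samples of Y_t^x and O_infinity.
ys = [-0.4, 0.1, 0.9, 1.7]
os_ = [-1.0, -0.2, 0.3, 1.1]
m, eps, p = 0.05, 0.01, 2.0  # m plays the role of the deterministic X_t^0(x)

# Left-hand side of (4.7): W_p(Y^eps, mu_*^eps) / eps with Y^eps = m + eps * Y.
lhs = wasserstein_1d([m + eps * y for y in ys], [eps * o for o in os_], p) / eps
# Shift term W_p(m / eps + O, O) and error term W_p(Y, O).
shift_term = wasserstein_1d([m / eps + o for o in os_], os_, p)
error_term = wasserstein_1d(ys, os_, p)

# Analogue of (4.7) for the empirical measures, up to rounding errors.
sandwich_holds = abs(lhs - shift_term) <= error_term + 1e-9
```

In this toy setting the shift term equals $|m|/\varepsilon $, so the only contribution that does not scale with $\varepsilon ^{-1}$ is the error term, which mirrors the role of $\mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty )$ in (4.7).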

4.3 Proof of Theorem 1

For any $0< p< p_*$, with $\mathfrak {t}^x_\varepsilon $ and $w_\varepsilon $ as given in the statement of Theorem 1 and $r\in \mathbb {R}$, the relations (4.4), (4.5), (4.6) and (4.7) yield

$$\begin{aligned} \begin{aligned}&\limsup \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon } (x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\limsup \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ),\\&\liminf \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon } (x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\liminf \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ). \end{aligned} \end{aligned}$$

For short, we define

$$\begin{aligned} \mathfrak {T}^x_\varepsilon =\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon -\tau ^x \quad \text { and } \quad \Lambda ^x(\varepsilon ):=\frac{(\mathfrak {T}^x_\varepsilon )^{\ell -1}}{\varepsilon e^{\mathfrak {q}^x \mathfrak {T}^x_\varepsilon }} \sum _{k=1}^{m} e^{i \mathfrak {T}^x_\varepsilon \theta ^x_k} v^x_k. \end{aligned}$$

(4.8)

Claim A.

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }^\varepsilon (x), \mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\limsup _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ) \end{aligned}$$

and

$$\begin{aligned} \liminf _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }^\varepsilon (x), \mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\liminf _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ). \end{aligned}$$

for any $0<p< p_*$. In particular, the limit

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }^\varepsilon (x), \mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}\quad \text { exists iff}\quad \lim _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )\quad \text {exists}. \end{aligned}$$

(4.9)

Proof of Claim A

In the sequel we study the asymptotics of the drift term $X^0_t(x) \cdot \varepsilon ^{-1}$. A straightforward calculation shows

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0} \frac{(\mathfrak {T}^x_\varepsilon )^{\ell -1} e^{-\mathfrak {q}^x \mathfrak {T}^x_\varepsilon }}{\varepsilon }=e^{-\mathfrak {q}^x \tau }(\mathfrak {q}^x)^{1-\ell }e^{-\mathfrak {q}^x r\cdot w}. \end{aligned}$$

(4.10)

The preceding limit implies with the help of the spectral decomposition (2.4) given in Lemma 2.2 and the triangle inequality that

$$\begin{aligned} \begin{aligned} \mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big )&\leqslant \mathcal {W}_{p}\Big (\Big (\frac{X^0_{\tau +\mathfrak {T}^x_\varepsilon }(x)}{\varepsilon }- \Lambda ^x(\varepsilon )\Big )+\mathcal {O}_\infty , \mathcal {O}_\infty \Big )\\&\quad +\, \mathcal {W}_{p}\Big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ). \end{aligned} \end{aligned}$$

(4.11)

We set

$$\begin{aligned} R^x_\varepsilon := \mathcal {W}_{p}\Big (\Big (\frac{X^0_{\tau +\mathfrak {T}^x_\varepsilon }(x)}{\varepsilon }- \Lambda ^x(\varepsilon )\Big ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ). \end{aligned}$$

Analogous reasoning yields

$$\begin{aligned} \mathcal {W}_{p}\Big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ) \leqslant \mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big )+R^x_\varepsilon . \end{aligned}$$

It remains to show that $R^x_\varepsilon \rightarrow 0$ as $\varepsilon \rightarrow 0$. By the continuity of $z\mapsto \mathcal {W}_{p}(z+\mathcal {O}_\infty ,\mathcal {O}_\infty )$ at $z=0$ it is enough to prove

$$\begin{aligned} \Big |\frac{X^0_{\tau +\mathfrak {T}^x_\varepsilon }(x)}{\varepsilon }- \Lambda ^x(\varepsilon )\Big |\rightarrow 0, \quad \varepsilon \rightarrow 0, \end{aligned}$$

which is valid due to the limit (2.4) and (4.10). This finishes the proof of Claim A. $\square $

In the sequel, we prove the window cutoff asymptotics in (2.8). Note that $\Lambda ^x(\varepsilon )$ is uniformly bounded on $\varepsilon \in (0,1]$. For any accumulation point U (as $\varepsilon \rightarrow 0$) of $\big (\mathcal {W}_{p}( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty )\big )_{\varepsilon \in (0,1]}$ there exists a sequence $(\varepsilon _k)_{k\in \mathbb {N}}$, $\varepsilon _k\rightarrow 0$ as $k\rightarrow \infty $, such that

$$\begin{aligned} U=\lim \limits _{k\rightarrow \infty } \mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon _k) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ). \end{aligned}$$

The Bolzano–Weierstrass theorem for the sequence $(\Lambda ^x(\varepsilon _k))_{k\in \mathbb {N}}$, the limit (4.10) and the continuity of $\mathcal {W}_{p}$ yield

$$\begin{aligned} U=\mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r} u+\mathcal {O}_\infty ,\mathcal {O}_\infty )\quad \text { for some } u\in \omega (x). \end{aligned}$$

(4.12)

In particular,

$$\begin{aligned} \begin{aligned}&\limsup \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=\mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}{\hat{u}}+\mathcal {O}_\infty ,\mathcal {O}_\infty ),\\&\liminf \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=\mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}\check{u}+\mathcal {O}_\infty ,\mathcal {O}_\infty ), \end{aligned} \end{aligned}$$

where ${\hat{u}},\check{u}\in \omega (x)$ and $\check{u}\ne 0$ by (2.5). Hence item (d) in Lemma 2.1 implies

$$\begin{aligned}&\lim \limits _{r\rightarrow \infty }\limsup \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=0 \quad \text { and }\\&\lim \limits _{r\rightarrow -\infty }\liminf \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=\infty . \end{aligned}$$

This finishes the proof of Theorem 1. $\square $

4.4 Proof of Theorem 2

We keep the notation (4.8) of the proof of Theorem 1. By (4.9) it is enough to prove that the limit

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\Big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big )\quad \text { exists}. \end{aligned}$$

(4.13)

We recall the definition of $\Lambda ^x(\varepsilon )$ in (4.8) and the limit (4.10). By (4.12) we have

$$\begin{aligned}&\left\{ \text {accumulation points of } \mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ) \text { as } \varepsilon \rightarrow 0\right\} \nonumber \\&\quad = \left\{ \mathcal {W}_{p}\big ((e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r})\, u+\mathcal {O}_\infty ,\mathcal {O}_\infty \big ) : u\in \omega (x) \right\} . \end{aligned}$$

(4.14)

For $p\geqslant 1$, the shift linearity given in item d) of Lemma 2.1 implies

$$\begin{aligned} \mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r} u+\mathcal {O}_\infty ,\mathcal {O}_\infty )= e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}|u|. \end{aligned}$$

(4.15)

Combining (4.14) and (4.15) we infer

$$\begin{aligned}&\left\{ \mathcal {W}_{p}\big ((e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r})\, u+\mathcal {O}_\infty ,\mathcal {O}_\infty \big ) : u\in \omega (x) \right\} \nonumber \\&\quad = \left\{ e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}\,|u| : u\in \omega (x) \right\} . \end{aligned}$$

(4.16)

Hence (4.14) and (4.16) imply that the limit (4.13) exists if and only if the right-hand side of (4.16) has exactly one element. This is equivalent to $\omega (x)$ being contained in a sphere in $\mathbb {R}^d$ with respect to the Euclidean distance. For $p\in (0,1)$ the shift linearity is not valid and the argument cannot be continued beyond (4.14). In this case (4.14) still holds true, and the limit (4.13) exists if and only if for all $\lambda >0$ the function

$$\begin{aligned} \omega (x)\ni u\mapsto \mathcal {W}_{p}(\lambda u+\mathcal {O}_\infty ,\mathcal {O}_\infty )\quad \text { is constant}. \end{aligned}$$

This finishes the proof of Theorem 2. $\square $
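The sphere criterion for $p\geqslant 1$ can be visualized in the planar oscillatory case. The following sketch is an illustration under assumptions that go beyond the text: it supposes that the sum in (4.8) consists of one complex-conjugate pair, so that the relevant limit points are traced by $s\mapsto 2(\cos (s)\,\mathsf {Re}(v)-\sin (s)\,\mathsf {Im}(v))$, and it uses hypothetical sample vectors. The function name `orbit_norm_range` is introduced here.

```python
import math


def orbit_norm_range(re_v, im_v, steps=720):
    """Min and max of |cos(s) re_v - sin(s) im_v| over a grid of angles s.

    The curve s -> 2 * (cos(s) re_v - sin(s) im_v) is the real vector
    e^{is} v + e^{-is} conj(v); the factor 2 is dropped since only the
    constancy of the norm matters."""
    norms = []
    for k in range(steps):
        s = 2.0 * math.pi * k / steps
        point = [math.cos(s) * r - math.sin(s) * i for r, i in zip(re_v, im_v)]
        norms.append(math.sqrt(sum(x * x for x in point)))
    return min(norms), max(norms)


# Hypothetical sample vectors: orthogonal real and imaginary parts of equal
# length (first pair) versus a non-orthogonal pair of different lengths.
circle = orbit_norm_range((1.0, 0.0), (0.0, -1.0))
ellipse = orbit_norm_range((1.0, -0.25), (0.0, -math.sqrt(15.0) / 4.0))
```

In the first case the norm is constant along the orbit, so the set in (4.16) reduces to a single value, while in the second case the norm oscillates and only window cutoff can be expected.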

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

1. Applebaum, D.: Lévy Processes and Stochastic Calculus, 2nd edn. Cambridge University Press, Cambridge (2009)
2. Barrera, G., Högele, M.A., Pardo, J.C.: Cutoff thermalization for Ornstein–Uhlenbeck systems with small Lévy noise in the Wasserstein distance. J. Stat. Phys. 184(27) (2021)
3. Barrera, G., Högele, M.A., Pardo, J.C.: The cutoff phenomenon in total variation for nonlinear Langevin systems with small layered stable noise. Electron. J. Probab. 26, 1–76 (2021)
4. Barrera, G., Jara, M.: Abrupt convergence of stochastic small perturbations of one dimensional dynamical systems. J. Stat. Phys. 163(1), 113–138 (2016)
5. Barrera, G., Jara, M.: Thermalisation for small random perturbation of dynamical systems. Ann. Appl. Probab. 30(3), 1164–1208 (2020)
6. Barrera, G., Pardo, J.C.: Cut-off phenomenon for Ornstein–Uhlenbeck processes driven by Lévy processes. Electron. J. Probab. 25(15), 1–33 (2020)
7. Da Prato, G., Gatarek, D., Zabczyk, J.: Invariant measures for semilinear stochastic equations. Stoch. Anal. Appl. 10(4), 387–408 (1992)
8. Hartman, P.: On local homeomorphisms of Euclidean spaces. Bol. Soc. Mat. Mexicana 2(5), 220–241 (1960)
9. Kallianpur, G., Sundar, P.: Stochastic Analysis and Diffusion Processes. Oxford University Press, Oxford (2014)
10. Majka, M.: A note on existence of global solutions and invariant measures for jump SDEs with locally one-sided Lipschitz drift. Probab. Math. Stat. 40(1), 37–55 (2020)
11. Mikami, T.: Asymptotic expansions of the invariant density of a Markov process with a small parameter. Ann. Inst. H. Poincaré Probab. Stat. 24(3), 403–424 (1988)
12. Panaretos, V., Zemel, Y.: An Invitation to Statistics in Wasserstein Space. Springer (2020)
13. Perko, L.: Differential Equations and Dynamical Systems. Texts in Applied Mathematics, vol. 7, 3rd edn. Springer, New York (2001)
14. Protter, P.: Stochastic Integration and Differential Equations, 2nd edn. Springer-Verlag, Berlin (2004)
15. Saint Loubert Bié, E.: Étude d’une EDPS conduite par un bruit poissonnien. Probab. Theory Relat. Fields 111(2), 287–321 (1998)
16. Sato, K.: Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge (1999)
17. Siorpaes, P.: Applications of pathwise Burkholder–Davis–Gundy inequalities. Bernoulli 24(4B), 3222–3245 (2018)
18. Situ, R.: Theory of Stochastic Differential Equations with Jumps and Applications. Springer, New York (2005)
19. Tudoran, R.M.: On the coercivity of continuously differentiable vector fields. Qual. Theory Dyn. Syst. 19(2), Paper No. 58, 1–7 (2020)
20. Villani, C.: Optimal Transport. Old and New. Springer-Verlag, Berlin (2009)
21. Wang, J.: Regularity of semigroups generated by Lévy type operators via coupling. Stoch. Process. Appl. 120(9), 1680–1700 (2010)
22. Watanabe, S., Ikeda, N.: Stochastic Differential Equations and Diffusion Processes. North-Holland Publishing Co., Amsterdam-New York, Kodansha Ltd., Tokyo (1981)

Acknowledgements

The authors would like to thank the anonymous referee for her/his valuable comments which have led to a significant improvement of the manuscript. The authors would like to thank Carlos Gustavo Tamm de Araújo Moreira (Gugu) at IMPA for clarifying comments on the Hartman–Grobman theorem.

Funding

Open Access funding provided by University of Helsinki including Helsinki University Central Hospital. The research of GB has been supported by the Academy of Finland, via the Matter and Materials Profi4 University Profiling Action, an Academy Project (Project No. 339228) and the Finnish Centre of Excellence in Randomness and STructures (Project No. 346306). GB also would like to express his gratitude to University of Helsinki for all the facilities used along the realization of this work. The research of MAH has been supported by the proyecto de la Convocatoria 2020–2021: “Stochastic dynamics of systems perturbed with small Markovian noise with applications in biophysics, climatology and statistics” of the Facultad de Ciencias at Universidad de los Andes.

Author information

Authors and Affiliations

Department of Mathematical and Statistical Sciences, University of Helsinki, Exactum in Kumpula Campus. PL 68, Pietari Kalmin katu 5, 00560, Helsinki, Finland
G. Barrera
Departamento de Matemáticas, Facultad de Ciencias, Universidad de los Andes, Bogotá, Colombia
M. A. Högele
CIMAT, Jalisco S/N, Valenciana, CP 36240, Guanajuato, México
J. C. Pardo

Contributions

All authors have contributed equally to the paper.

Corresponding author

Correspondence to G. Barrera.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A. Existence of the Invariant Measure

A.1 Invariant distribution $\mu ^\varepsilon $

In the sequel we show the existence of a unique invariant distribution $\mu ^\varepsilon $ of the solution of (1.1) for any $\varepsilon >0$. We stress that beyond the existence of moments (Hypothesis 2), this does not include any regularity such as absolute continuity whatsoever in our setting. For instance, our setting covers nonlinear oscillators with degenerate noise in Example 3.2.

We recall the standing assumptions Hypothesis 1 with $\delta >0$ and Hypothesis 2 with $p_*>0$. By [7], p. 388, for the existence of the invariant probability measure $\mu ^\varepsilon $ it is enough to verify that for some $x\in \mathbb {R}^d$

$$\begin{aligned} \lim _{R\rightarrow \infty }\liminf _{T\rightarrow \infty }\frac{1}{T}\int _{0}^{T} \mathbb {P}\left( |X^\varepsilon _t(x)|>R\right) \mathrm {d}t=0. \end{aligned}$$

(A.1)

Hypotheses 1 and 2 imply inequality (D.3), p. 71 in [3]. That is to say, for $\gamma \in (0,1\wedge p_*)$ there exist positive constants $C_1,C_2$ such that for all $x\in \mathbb {R}^d$, $\varepsilon >0$ and $t\geqslant 0$, with $A=\varepsilon \Pi $ and $c=\varepsilon $,

$$\begin{aligned} \mathbb {E}[|X^\varepsilon _t(x)|^\gamma ]\leqslant e^{-\delta \gamma t}|x|^\gamma +C_3, \end{aligned}$$

(A.2)

where $C_3= c^\gamma +\frac{1}{\gamma \delta } \big (\gamma \delta c^\gamma + C_1\Vert A\Vert ^\gamma + C_2 c^{\gamma -2}\Vert A\Vert ^{2} \big )=\varepsilon ^\gamma \cdot \big (2+ \frac{1}{\gamma \delta }( C_1\Vert \Pi \Vert ^\gamma +C_2 \Vert \Pi \Vert ^2) \big ) $. Inequality (A.2) implies (A.1) with the help of the Markov inequality.
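The passage from (A.2) to (A.1) can be made explicit. By the Markov inequality and (A.2), $\mathbb {P}(|X^\varepsilon _t(x)|>R)\leqslant R^{-\gamma }(e^{-\delta \gamma t}|x|^\gamma +C_3)$, and the time average of the exponential term is $(1-e^{-\delta \gamma T})/(\delta \gamma T)$. The following Python sketch evaluates the resulting bound; the function name `averaged_tail_bound` and all numerical values are hypothetical choices made for illustration.

```python
import math


def averaged_tail_bound(R, T, x_norm, gamma, delta, c3):
    """Upper bound for (1/T) * int_0^T P(|X_t| > R) dt obtained from the
    Markov inequality applied to the moment estimate (A.2)."""
    rate = delta * gamma
    decay = (1.0 - math.exp(-rate * T)) / (rate * T)  # time average of e^{-rate t}
    return (x_norm ** gamma * decay + c3) / R ** gamma


# Hypothetical sample values: gamma = 1/2, delta = 1, C_3 = 2, |x| = 4.
bound_R100 = averaged_tail_bound(100.0, 1e6, 4.0, 0.5, 1.0, 2.0)
bound_R1e4 = averaged_tail_bound(1e4, 1e6, 4.0, 0.5, 1.0, 2.0)
```

As $T\rightarrow \infty $ the bound tends to $C_3R^{-\gamma }$, which vanishes as $R\rightarrow \infty $; this is the content of (A.1).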

For the uniqueness, it is enough to verify the following condition given in Theorem 11.4.3 in [9]. For any given positive numbers $\eta $, $\delta $ and R, there exists a positive constant S such that

$$\begin{aligned} \frac{1}{T}\int _{0}^{T} \mathbb {P}\left( |X^\varepsilon _t(x)-X^\varepsilon _t(y)|\geqslant \delta \right) \mathrm {d}t < \eta \quad \text { for all }\quad |x|,|y|\leqslant R \quad \text { and } \quad T>S. \end{aligned}$$

(A.3)

Hypotheses 1, 2 and the additivity of the noise imply (D.5) p. 71 in [3]. In other words, for any $\gamma \in (0,1\wedge p_*)$, $x,y\in \mathbb {R}^d$, $t\geqslant 0$, $\varepsilon >0$, $c=\varepsilon $ we have

$$\begin{aligned} \mathbb {E}[|X^\varepsilon _t(x)-X^\varepsilon _t(y)|^\gamma ]\leqslant |x-y|^\gamma e^{-\delta \gamma t}+2\varepsilon ^\gamma . \end{aligned}$$

The preceding inequality implies (A.3) with the help of the Markov inequality.

A.2 Convergence to $\mu ^\varepsilon $ in $\mathcal {W}_{p_*}$ for $p_*>0$

Due to Hypothesis 1 and the additivity of the noise, the natural coupling yields

$$\begin{aligned} |X^{\varepsilon }_t(x)-X^{\varepsilon }_t(y)|\leqslant |x-y| e^{-\delta t}\quad \text { for all }\quad x,y\in \mathbb {R}^d, t\geqslant 0. \end{aligned}$$

(A.4)

Since $\mu ^\varepsilon $ is an invariant measure and $X^\varepsilon $ is a Feller process, disintegration and (A.4) imply

$$\begin{aligned} \begin{aligned} \mathcal {W}_{p_*}(X^\varepsilon _t(x),\mu ^{\varepsilon })&\leqslant \int _{\mathbb {R}^d }\mathcal {W}_{p_*}(X^\varepsilon _t(x),X^\varepsilon _t(y))\mu ^\varepsilon (\mathrm {d}y)\leqslant e^{-({1\wedge p_*})\delta t}\int _{\mathbb {R}^d }|x-y|^{1\wedge p_*}\mu ^\varepsilon (\mathrm {d}y)\\&\leqslant e^{-({1\wedge p_*})\delta t}|x|^{1\wedge p_*}+e^{-({1\wedge p_*})\delta t}\int _{\mathbb {R}^d}|y|^{1\wedge p_*}\mu ^{\varepsilon }(\mathrm {d}y). \end{aligned} \end{aligned}$$

(A.5)

The preceding right-hand side tends to zero as $t\rightarrow \infty $ provided that $\int _{\mathbb {R}^d}|y|^{1\wedge p_*}\mu ^{\varepsilon }(\mathrm {d}y)<\infty $ which is shown in (2.84) p. 48 in [3].

Appendix B. $L^{p}$ Estimates for $p\in (0,p_*)$

We recall the Lévy-Khinchin formula of L with characteristic triple $(a,\Sigma ,\nu )$

$$\begin{aligned} \ln (\mathbb {E}\big [e^{i\langle u, L_t \rangle }\big ]) = t\left( i\langle a, u \rangle - \frac{1}{2} \langle u, \Sigma u \rangle + \int _{\mathbb {R}^d} \Big (e^{i\langle u, z \rangle } - 1 - i\langle u, z \rangle \mathbf {1}_{(0,1)}(|z|)\Big ) \nu (\mathrm {d}z)\right) \end{aligned}$$

and the pathwise Lévy-Itô representation

$$\begin{aligned} L_t=at+\Sigma ^{1/2}B_t+\int _{0}^{t}\int _{|z|\leqslant 1} z \tilde{N}(\mathrm {d}s \mathrm {d}z)+ \int _{0}^{t}\int _{|z|>1} z {N}(\mathrm {d}s \mathrm {d}z), \end{aligned}$$

(B.1)

where $(B_t)_{t\geqslant 0}$ is a standard Brownian motion in $\mathbb {R}^d$, N is a Poisson random measure on $[0,\infty )\times \mathbb {R}^d$ with intensity measure $\mathrm {d}t\otimes \nu (\mathrm {d}z)$ and $\tilde{N}$ is the compensated counterpart of N. See [16] for further details on Lévy processes.

We recall the standing assumptions Hypothesis 1 with $\delta >0$ and Hypothesis 2 with $p_*>0$.

B.1 Localization

We start with the probability estimate of the event

$$\begin{aligned} \mathcal {D}_t^x=\Big \{\sup \limits _{0\leqslant s\leqslant t}|\mathcal {Y}_s^x|> \vartheta \Big \}, \qquad \vartheta >0. \end{aligned}$$

where $\mathcal {Y}^x$ is given in (4.2). Note that $\mathcal {Z}_\cdot (0) = \mathcal {Y}^0_\cdot $ satisfies

$$\begin{aligned} \mathrm {d}\mathcal {Z}_t(x) = - Db(0) \mathcal {Z}_t(x) \mathrm {d}t + \mathrm {d}L_t, \qquad \mathcal {Z}_0(x) = x \end{aligned}$$

(B.2)

for $x= 0$.

Lemma B.1

For any $\gamma \in (0,p_*\wedge 1]$ there is a positive constant C such that for any $\vartheta \geqslant 1$, $x\in \mathbb {R}^d$ and $t\geqslant 0$ we have

$$\begin{aligned} \mathbb {P}(\mathcal {D}_t^x)\leqslant C\,t \vartheta ^{-\gamma }. \end{aligned}$$

(B.3)

Proof

By Theorem 1 in [17] we have

$$\begin{aligned} \sup \limits _{0\leqslant s\leqslant t}|\mathcal {Y}^x_s|\leqslant 6\sqrt{[\mathcal {Y}^x_\cdot ]_t}+2\int _{0}^{t} H_{s-}\, \mathrm {d}\mathcal {Y}^x_s,\qquad \text{ where } \qquad H_{s-}=\frac{\mathcal {Y}^x_{s-}}{\sqrt{\sup \limits _{s\leqslant t} (|\mathcal {Y}^x_{s-}|^2+[\mathcal {Y}^x_\cdot ]_{s-})}}. \end{aligned}$$

In particular, it follows

$$\begin{aligned}&[\mathcal {Y}^x_\cdot ]_{t} = [L]_{t} = \int _0^t \int _{|z|\leqslant 1} |z|^2 N(\mathrm {d}s\mathrm {d}z)\quad \text {such that }\\&\int _0^{t} H_{s-}\, \mathrm {d}\mathcal {Y}^x_s = \int _0^t \langle H_{s}, - Db(X^0_s(x)) \mathcal {Y}^x_s \rangle \mathrm {d}s + \int _0^t \int _{|z|\leqslant 1}\langle H_{s-}, z \rangle \tilde{N}(\mathrm {d}s\mathrm {d}z) \\&\quad +\, \int _0^t \int _{|z|> 1}\langle H_{s-}, z \rangle N(\mathrm {d}s\mathrm {d}z). \end{aligned}$$

By Hypothesis 1 we obtain $ \int _0^t \langle H_{s-}, - Db(X^0_s(x)) \mathcal {Y}^x_s \rangle \mathrm {d}s \leqslant 0 $ a.s. Hence

$$\begin{aligned}&\mathbb {P}\Big ( \sup _{0\leqslant s\leqslant t}|\mathcal {Y}^x_s|> \vartheta \Big )\\ {}&\quad \leqslant \mathbb {P}\Big (6 \Big (\int _0^{t} \int \limits _{|z|\leqslant 1} |z|^2 N(\mathrm {d}s\mathrm {d}z)\Big )^{\frac{1}{2}} + 2\int _0^{t} \int \limits _{|z|\leqslant 1}\langle H_{s-}, z \rangle \tilde{N}(\mathrm {d}s\mathrm {d}z) \\ {}&\qquad +\, 2\int _0^{t} \int \limits _{|z|> 1}\langle H_{s-}, z \rangle N(\mathrm {d}s\mathrm {d}z)> \vartheta \Big ) \\ {}&\quad \leqslant \mathbb {P}\Big (\int _0^{t} \int _{|z|\leqslant 1} |z|^2 N(\mathrm {d}s\mathrm {d}z)> \frac{\vartheta ^2}{18^2} \Big ) + \mathbb {P}\Big (\int _0^{t} \int _{|z|\leqslant 1}\langle H_{s-}, z \rangle \tilde{N}(\mathrm {d}s\mathrm {d}z)> \frac{2\vartheta }{3} \Big ) \\ {}&\qquad +\, \mathbb {P}\Big (\int _0^{t} \int _{|z|> 1}\langle H_{s-}, z \rangle N(\mathrm {d}s\mathrm {d}z) > \frac{2\vartheta }{3} \Big ). \end{aligned}$$

We continue term by term. By the Chebyshev inequality we obtain

$$\begin{aligned} \mathbb {P}\Big (\int _0^{t} \int _{|z|\leqslant 1} |z|^2 N(\mathrm {d}s\mathrm {d}z) > \frac{\vartheta ^2}{18^2} \Big )&\leqslant \frac{18^2 t }{\vartheta ^2} \int _{|z|\leqslant 1} |z|^2 \nu (\mathrm {d}z) =:C_1 \frac{t}{\vartheta ^2} \end{aligned}$$

and

$$\begin{aligned} \mathbb {P}\Big (\int _0^{t} \int _{|z|\leqslant 1}\langle H_{s-}, z \rangle \tilde{N}(\mathrm {d}s\mathrm {d}z) > \frac{2\vartheta }{3}\Big )&\leqslant \Big (\frac{3}{2}\Big )^2\frac{1}{\vartheta ^2} \mathbb {E}\Big [\Big (\int _0^{t} \int _{|z|\leqslant 1}\langle H_{s-}, z \rangle \tilde{N}(\mathrm {d}s\mathrm {d}z)\Big )^2\Big ]\\&= \Big (\frac{3}{2}\Big )^2\frac{1}{\vartheta ^2} \mathbb {E}\Big [\int _0^{t} \int _{|z|\leqslant 1}\langle H_{s-}, z \rangle ^2 \nu (\mathrm {d}z)\mathrm {d}s\Big ] \\&\leqslant \Big (\frac{3}{2}\Big )^2\frac{t}{\vartheta ^2} \int _{|z|\leqslant 1} |z|^2 \nu (\mathrm {d}z) =:C_2 \frac{t}{\vartheta ^2}. \end{aligned}$$

Finally, for $\gamma \in (0,p_*\wedge 1]$ we have

$$\begin{aligned} \mathbb {P}\Big (\int _0^{t} \int _{|z|> 1}\langle H_{s-}, z \rangle N(\mathrm {d}s\mathrm {d}z)> \frac{2\vartheta }{3}\Big )&\leqslant \mathbb {P}\Big (\int _0^{t} \int _{|z|> 1} |z| N(\mathrm {d}s\mathrm {d}z)> \frac{2\vartheta }{3}\Big ) \\&\leqslant \Big (\frac{3}{2}\Big )^\gamma \frac{1}{\vartheta ^\gamma } \mathbb {E}\Big [\Big (\int _0^{t} \int _{|z|> 1} |z| N(\mathrm {d}s\mathrm {d}z)\Big )^{\gamma }\Big ] \\&\leqslant \Big (\frac{3}{2}\Big )^\gamma \frac{1}{\vartheta ^\gamma } \mathbb {E}\Big [\int _0^{t} \int _{|z|> 1} |z|^{\gamma } N(\mathrm {d}s\mathrm {d}z)\Big ]\\&= \Big (\frac{3}{2}\Big )^\gamma \frac{t}{\vartheta ^\gamma } \int _{|z|> 1} |z|^{\gamma } \nu (\mathrm {d}z) =:C_3 \frac{t}{\vartheta ^\gamma }, \end{aligned}$$

where we have used the subadditivity of the power $\gamma $ in the sense of Subsection 1.1.2, see formula (1.6) in [15]. This finishes the proof of the statement. $\square $

B.2 First Order Approximation

We start with some technical preliminaries. The map $u \mapsto |u|^{p}$ for $p \in (0, 2)$ is not twice continuously differentiable, a property which is necessary for applying Itô’s formula. To overcome this, we use the following $\mathcal {C}^2$ norm approximation $|x|_c:= \sqrt{|x|^2 + c^2}$, $c> 0$, with the limiting case $|x|_0 = |x|$. It is well-behaved in the following sense. For any $c>0$ we have

$$\begin{aligned} c \leqslant |x|_c \leqslant |x| + c, \quad \nabla |x|_c := \frac{x}{|x|_c} \quad \text { and }\quad 0 \leqslant \frac{|x|}{|x|_c} < 1. \end{aligned}$$

Furthermore, it is straightforward to verify for $G(x)=|x|^p_c$ the following calculations

$$\begin{aligned} \nabla G(x)=p |x|^{p-1}_c \frac{x}{|x|_c}=p|x|^{p-2}_c x\quad \text { and } \quad |\nabla G(x)|\leqslant p|x|^{p-1}_c. \end{aligned}$$

The $L_1$-matrix norm $\Vert \cdot \Vert _1$ of the respective Hessian $H_{G}(x)$, $x\in \mathbb {R}^d$, can be estimated as follows

$$\begin{aligned} \Vert H_{G}(x)\Vert _1 \leqslant p d|x|^{p-2}_c+pd (2-p)|x|^{p-2}_c=C(p,d)|x|^{p-2}_c. \end{aligned}$$

For details of the estimates, we refer to p. 69 in [3]. Since $p \in (0,2)$ and $c\leqslant |x|_c$, we obtain

$$\begin{aligned} \sup _{x\in \mathbb {R}^d}\Vert H_{G}(x)\Vert _1\leqslant C(p,d)\,c^{p-2}. \end{aligned}$$

(B.4)

Proposition 2

We keep the notation of Theorem 1. Then for any $x\in \mathbb {R}^d$, $r\in \mathbb {R}$ and $p\in (0,p_*)$ it follows

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x),Y^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x))}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

(B.5)

Proof

By the domination property of the Wasserstein distance in Lemma 2.1 it is enough to show the preceding limit in the respective $L^{p}$ space. By (4.1) we have

$$\begin{aligned} \mathrm {d}Y^\varepsilon _t(x)&= \big (- Db(X^0_t(x)) Y^\varepsilon _t(x) + Db(X^0_t(x)) X^0_t(x)-b(X^0_t(x)) \big )\mathrm {d}t + \varepsilon \mathrm {d}L_t. \end{aligned}$$

Let $\Delta ^\varepsilon _t := X^\varepsilon _t(x)-Y^\varepsilon _t(x)$, $t\geqslant 0$. Then

$$\begin{aligned} \mathrm {d}\, \Delta ^\varepsilon _t&= -\big (b(X^\varepsilon _t(x))-b(Y^\varepsilon _t(x))\big )\mathrm {d}t -\big (b(Y^\varepsilon _t(x)) -b(X^0_t(x)) - Db(X^0_t(x)) \varepsilon \mathcal {Y}^x_t \big )\mathrm {d}t, \end{aligned}$$

where $(\mathcal {Y}^x_t)_{t\geqslant 0}$ is given in (4.2). An elementary estimate of the $p_*$-th power of a sum yields for all $t\geqslant 0$

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p_*}]&=\mathbb {E}[|(X^\varepsilon _t(x)-X^0_t(x))-\varepsilon \mathcal {Y}^x_t|^{p_*}]\nonumber \\&\leqslant C_{p_*}\left( \mathbb {E}[|X^\varepsilon _t(x)-X^0_t(x)|^{p_*}]+\varepsilon ^{p_*}\mathbb {E}[|\mathcal {Y}^x_t|^{p_*}]\right) , \end{aligned}$$

(B.6)

where $C_{p_*}$ is a positive constant. Since $(\mathcal {Y}^x_t)_{t\geqslant 0}$ satisfies a dissipative linear equation, it exhibits the same integrability as $L$, which is straightforward to verify. There exist a positive constant $\tilde{C}_{p_*}$ and a function $S_{p_*}(t)$ of at most polynomial order such that

$$\begin{aligned} \mathbb {E}[|\mathcal {Y}^x_t|^{p_*}] \leqslant \tilde{C}_{p_*} \mathbb {E}[|L_t|^{p_*}]\leqslant \tilde{C}_{p_*} S_{p_*}(t) \quad \text { for all } t\geqslant 0. \end{aligned}$$

(B.7)

For the first term of the right-hand side of (B.6), Lemmas B.4 and B.5 yield the following estimate. For any $\eta \in (0,p_*)$ there is a map $R_\eta :[0,\infty )\rightarrow [0,\infty )$ which increases with polynomial order as $t$ tends to infinity, such that

$$\begin{aligned} \mathbb {E}[|X^\varepsilon _t(x)-X^0_t(x)|^{p_*}]\leqslant \varepsilon ^{p_*-\eta }R_\eta (t) \quad \text { for any } t\geqslant 0. \end{aligned}$$

(B.8)

We start with the case $p_*>1$ and $p\in (1,p_*)$. The Hölder inequality implies

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p-1}]&\leqslant \left( \mathbb {E}[|\Delta ^\varepsilon _t|^{p_*}]\right) ^{\frac{p-1}{p_*}}\leqslant \varepsilon ^{\frac{p_*-\eta }{p_*}(p-1)}\tilde{R}_\eta (t)= \varepsilon ^{p-1-\eta '}\tilde{R}_\eta (t)\quad \text { for any } t\geqslant 0, \end{aligned}$$

(B.9)

where $\tilde{R}_\eta $ is a function of at most polynomial order as $t$ tends to infinity and $\eta '=\eta (p-1) p_*^{-1}$. For $\eta $ small enough we fix $\eta '\in (0,{1}/{4})$. Since $p_*>1$, we may choose $p\in (1,p_*)$ and $\theta \in (0,{1}/{4})$. We split

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}] = \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\;\mathbf {1}(\mathcal {A}^\varepsilon _t)] +\mathbb {E}[|\Delta ^\varepsilon _t|^{p}\;\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})], \end{aligned}$$

(B.10)

where

$$\begin{aligned} \mathcal {A}^\varepsilon _t:=\Big \{\sup _{0\leqslant s\leqslant t} |\varepsilon \mathcal {Y}^x_s|\leqslant \varepsilon ^{1-\theta }\Big \}. \end{aligned}$$

(B.11)

First we prove that

$$\begin{aligned} \left( \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\right) ^{\frac{1}{p}}\leqslant \left( pC(|x|) \int _{0}^{t}\tilde{R}_\eta (s)\mathrm {d}s\right) ^{\frac{1}{p}} \varepsilon ^{1+\frac{1-\eta '-2\theta }{p}}, \end{aligned}$$

where $C(|x|)=\max \limits _{|u|\leqslant |x|+1}|D^2b(u)|$. The choice of $\eta '$ and $\theta $ yields $1-\eta '-2\theta >{1}/{4}$. For notational convenience we use the differential formalism; however, we stress that all differential inequalities are understood in the integral sense. Since $p> 1$, the chain rule, Hypothesis 1 and the Cauchy–Schwarz inequality imply

$$\begin{aligned} \mathrm {d}\,|\Delta ^\varepsilon _t|^{p}&=-p|\Delta ^\varepsilon _t|^{p-2}\langle \Delta ^\varepsilon _t,b(X^\varepsilon _t(x))-b(Y^\varepsilon _t(x)) \rangle \mathrm {d}t\\&\qquad -\, p|\Delta ^\varepsilon _t|^{p-2}\langle \Delta ^\varepsilon _t,b(Y^\varepsilon _t(x)) -b(X^0_t(x)) - Db(X^0_t(x))\varepsilon \mathcal {Y}^x_t \rangle \mathrm {d}t\\&\leqslant -\delta p|\Delta ^\varepsilon _t|^{p}\mathrm {d}t+p|\Delta ^\varepsilon _t|^{p-1} |b(Y^\varepsilon _t(x)) -b(X^0_t(x)) - Db(X^0_t(x))\varepsilon \mathcal {Y}^x_t|\mathrm {d}t. \end{aligned}$$

On the event $\mathcal {A}^\varepsilon _t$, Taylor’s theorem applied to b implies

$$\begin{aligned} \mathrm {d}\, |\Delta ^\varepsilon _t|^{p}\leqslant -\delta p |\Delta ^\varepsilon _t|^{p}\mathrm {d}t + pC(|x|)|\Delta ^\varepsilon _t|^{p-1}\varepsilon ^{2-2\theta }\mathrm {d}t. \end{aligned}$$

Taking expectations, the monotonicity of the integral, Fubini’s theorem and (B.9) yield

$$\begin{aligned}&\mathrm {d}\,\mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\\&\quad \leqslant -\delta p \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\mathrm {d}t+ pC(|x|)\mathbb {E}[|\Delta ^\varepsilon _t|^{p-1}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\varepsilon ^{2-2\theta }\mathrm {d}t\\&\quad \leqslant pC(|x|)\mathbb {E}[|\Delta ^\varepsilon _t|^{p-1}]\varepsilon ^{2-2\theta }\mathrm {d}t\\&\quad \leqslant pC(|x|) \tilde{R}_{\eta }(t) \varepsilon ^{p+1-\eta '-2\theta }\mathrm {d}t. \end{aligned}$$

Bearing in mind $|\Delta ^\varepsilon _0|^{p}=0$, we have

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\leqslant pC(|x|) \varepsilon ^{p+1-\eta '-2\theta }\int _{0}^{t}\tilde{R}_\eta (s)\mathrm {d}s. \end{aligned}$$

Therefore

$$\begin{aligned} \left( \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\right) ^{\frac{1}{p}}\leqslant \left( p C(|x|) \int _{0}^{t}\tilde{R}_\eta (s)\mathrm {d}s\right) ^{\frac{1}{p}} \varepsilon ^{1+\frac{1-\eta '-2\theta }{p}}. \end{aligned}$$

(B.12)
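The exponent in (B.12) results from simple bookkeeping. As a worked sketch, which additionally assumes $\varepsilon \in (0,1]$ (harmless for the limit $\varepsilon \rightarrow 0$), the factor $\varepsilon ^{p-1-\eta '}$ from (B.9) combines with the Taylor remainder $\varepsilon ^{2-2\theta }$ as follows:

$$\begin{aligned} \varepsilon ^{p-1-\eta '}\cdot \varepsilon ^{2-2\theta }=\varepsilon ^{p+1-\eta '-2\theta }, \qquad \big (\varepsilon ^{p+1-\eta '-2\theta }\big )^{\frac{1}{p}}=\varepsilon ^{1+\frac{1-\eta '-2\theta }{p}}\leqslant \varepsilon ^{1+\frac{1}{4p}}. \end{aligned}$$

The last inequality uses $1-\eta '-2\theta >1/4$, so this contribution is of strictly smaller order than $\varepsilon $, up to a polynomially growing prefactor in $t$.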

We continue with the estimate on the complement of $\mathcal {A}^\varepsilon _t$. We show

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})]\leqslant \varepsilon ^{p-\eta \frac{p}{p_*}}\mathcal {R}(t)\cdot \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^\frac{p_*-p}{p_*}, \end{aligned}$$

where $\mathcal {R}(t)$ is a function of at most polynomial order. Indeed, by Hölder’s inequality and the inequalities (B.6), (B.7) and (B.8) we have

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})]&\leqslant \mathbb {E}[|\Delta ^\varepsilon _t|^{p_*}]^{\frac{p}{p_*}}\cdot \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^\frac{p_*-p}{p_*}\\&\leqslant \Big ( C_{p_*}\varepsilon ^{p_*-\eta }R_\eta (t)+C_{p_*}\varepsilon ^{p_*}\tilde{C}_{p_*}S_{p_*}(t) \Big )^{\frac{p}{p_*}} \cdot \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^\frac{p_*-p}{p_*}\\&\leqslant \Big (\left( C_{p_*}\varepsilon ^{p_*-\eta }R_\eta (t)\right) ^{\frac{p}{p_*}} +\left( C_{p_*}\varepsilon ^{p_*}\tilde{C}_{p_*}S_{p_*}(t) \right) ^{\frac{p}{p_*}}\Big ) \cdot \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^\frac{p_*-p}{p_*}\\&= \Big (\left( C_{p_*}R_\eta (t)\right) ^{\frac{p}{p_*}}\varepsilon ^{(p_*-\eta ) \frac{p}{p_*}} +\left( C_{p_*}\tilde{C}_{p_*}S_{p_*}(t) \right) ^{\frac{p}{p_*}}\varepsilon ^{p}\Big ) \cdot \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^\frac{p_*-p}{p_*}\\&\leqslant \varepsilon ^{p-\eta \frac{p}{p_*}}\mathcal {R}(t)\cdot \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^\frac{p_*-p}{p_*}, \end{aligned}$$

where $ \mathcal {R}(t):=2\max \{\big ( C_{p_*}R_\eta (t)\big )^{\frac{p}{p_*}},\big (C_{p_*}\tilde{C}_{p_*}S_{p_*}(t) \big )^{\frac{p}{p_*}}\}$ and $\varepsilon \in (0,1]$ is assumed in the last step. As a consequence,

$$\begin{aligned} \left( \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})] \right) ^{1/p} \leqslant \varepsilon ^{1-\frac{\eta }{p_*}} (\mathcal {R}(t))^{\frac{1}{p}} \cdot \mathbb {P}\left( (\mathcal {A}^\varepsilon _t)^{\mathsf {c}} \right) ^\frac{p_*-p}{p_*p}. \end{aligned}$$

(B.13)
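For the reader's convenience, the first and third steps in the chain of inequalities leading to (B.13) can be spelled out. The following is a sketch in which the shorthand $q:=p/p_*\in (0,1)$ is introduced only for illustration. Hölder's inequality with the conjugate exponents $p_*/p$ and $p_*/(p_*-p)$ gives

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})]&\leqslant \big (\mathbb {E}[|\Delta ^\varepsilon _t|^{p\cdot \frac{p_*}{p}}]\big )^{\frac{p}{p_*}} \big (\mathbb {E}[\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})^{\frac{p_*}{p_*-p}}]\big )^{\frac{p_*-p}{p_*}} = \big (\mathbb {E}[|\Delta ^\varepsilon _t|^{p_*}]\big )^{\frac{p}{p_*}}\, \mathbb {P}\big ((\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\big )^{\frac{p_*-p}{p_*}},\\ (a+b)^{q}&\leqslant a^{q}+b^{q}\quad \text { for all } a,b\geqslant 0. \end{aligned}$$

The second line is the subadditivity of the concave power $q$, which separates the two contributions coming from (B.6).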

Combining estimates (B.12), (B.13) in decomposition (B.10) we obtain a positive constant $C:=C(p_*,p,\delta , |x|, |D^2F|)$ such that for any $t\geqslant 0$

$$\begin{aligned}&\quad \mathcal {W}_{p}(X^{\varepsilon }_t(x),Y^{\varepsilon }_t(x)) \leqslant \left( \mathbb {E}[|\Delta ^\varepsilon _t|^{p}]\right) ^{\frac{1}{p}}\nonumber \\ {}&\quad \leqslant \left( pC(|x|) \int _{0}^{t}\tilde{R}_\eta (s)\mathrm {d}s\right) ^{\frac{1}{p}} \varepsilon ^{1+\frac{1-\eta '-2\theta }{p}}+\varepsilon ^{1-\frac{\eta }{p_*}} (\mathcal {R}(t))^{\frac{1}{p}} \cdot \mathbb {P}\left( (\mathcal {A}^\varepsilon _t)^{\mathsf {c}}\right) ^\frac{p_*-p}{p_*p}. \end{aligned}$$

(B.14)

By Lemma B.1 there exists a positive constant $C$ such that for all $\gamma \in (0, 1)$, for the choice $\vartheta = \varepsilon ^{-\theta /\gamma }$ and any $t\geqslant 0$ it follows

$$\begin{aligned} \mathbb {P}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})\leqslant Ct \varepsilon ^{\theta }. \end{aligned}$$

(B.15)

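We further restrict $\eta $ and $\theta $ such that additionally $\frac{2\eta p}{p_*-p}\leqslant \theta <\frac{1}{4}$, which is possible for $\eta $ sufficiently small. Then the exponent of $\varepsilon $ in the second term of (B.14) satisfies $1-\frac{\eta }{p_*}+\theta \frac{p_*-p}{p_*p}\geqslant 1+\frac{\eta }{p_*}$. Hence, with the help of inequalities (B.14) and (B.15) we have for $\varepsilon \in (0,1]$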

$$\begin{aligned}&\mathcal {W}_{p}(X^{\varepsilon }_t(x),Y^{\varepsilon }_t(x))\leqslant \mathcal {R}_1(t)\varepsilon ^{1+\frac{1}{4p}}+\mathcal {R}_2(t)\varepsilon ^{1+\frac{\eta }{p_*}}, \end{aligned}$$

where $\mathcal {R}_1$ and $\mathcal {R}_2$ are functions of at most polynomial order. Consequently we obtain the desired limit

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x),Y^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x))}{\varepsilon }=0. \end{aligned}$$

We continue with the case $p_*>0$ and $p\in (0,1\wedge p_*]$. Let $\theta \in (0,{1}/{4})$ and recall the event $\mathcal {A}^\varepsilon _t$ in (B.11). For $p\in (0,1\wedge p_*]$ we split

$$\begin{aligned} \mathbb {E}[|\Delta _t^\varepsilon |^{p}]= \mathbb {E}[|\Delta _t^\varepsilon |^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]+\mathbb {E}[|\Delta _t^\varepsilon |^{p}\mathbf {1}((\mathcal {A}^\varepsilon _t)^{\mathsf {c}})]=:J_1+J_2. \end{aligned}$$

We start with the term $J_1$. Since $|\cdot |^p$ is not differentiable, we apply the chain rule for the smooth approximation $|x|^{p}_c=(\sqrt{|x|^2+c^2})^{p}$. Hypothesis 1 then yields

$$\begin{aligned} \mathrm {d}\,|\Delta ^\varepsilon _t|^{p}_{c}&= -p|\Delta ^\varepsilon _t|^{p-2}_c\langle \Delta ^\varepsilon _t,b(X^\varepsilon _t(x))-b(Y^\varepsilon _t(x)) \rangle \mathrm {d}t\\&\qquad -\, p|\Delta ^\varepsilon _t|^{p-2}_c\langle \Delta ^\varepsilon _t,b(Y^\varepsilon _t(x)) -b(X^0_t(x)) - Db(X^0_t(x)) \varepsilon \mathcal {Y}^x_t \rangle \mathrm {d}t\\&\leqslant -p\delta |\Delta ^\varepsilon _t|^{p-2}_c|\Delta ^\varepsilon _t|^2 \mathrm {d}t +p|\Delta ^\varepsilon _t|^{p-1}_c |b(Y^\varepsilon _t(x)) -b(X^0_t(x)) - Db(X^0_t(x)) \varepsilon \mathcal {Y}^x_t|\mathrm {d}t\\&\leqslant -p\delta |\Delta ^\varepsilon _t|^{p}_c \mathrm {d}t+p\delta c^{p}\mathrm {d}t +pc^{p-1} |b(Y^\varepsilon _t(x)) -b(X^0_t(x)) - Db(X^0_t(x)) \varepsilon \mathcal {Y}^x_t|\mathrm {d}t. \end{aligned}$$
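The last inequality in the preceding display combines two elementary facts about $|\cdot |_c$ for $p\in (0,1]$. As a worked sketch, writing $u=\Delta ^\varepsilon _t$ for brevity (a shorthand used only here):

$$\begin{aligned} |u|^{p-2}_c|u|^2&= |u|^{p-2}_c\big (|u|_c^2-c^2\big )=|u|^{p}_c-c^2|u|^{p-2}_c\geqslant |u|^p_c-c^{p},\\ |u|^{p-1}_c&\leqslant c^{p-1}. \end{aligned}$$

Both lines use $|u|_c\geqslant c$ together with the non-positive exponents $p-2<0$ and $p-1\leqslant 0$.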

Due to $|X^0_t(x)|\leqslant e^{-\delta t}|x|$ for all $t\geqslant 0$ and $x\in \mathbb {R}^d$, Taylor’s expansion for b on the event $\mathcal {A}^\varepsilon _t$ implies

$$\begin{aligned} \mathrm {d}\,|\Delta ^\varepsilon _t|^{p}_{c}\leqslant -p\delta |\Delta ^\varepsilon _t|^{p}_c \mathrm {d}t+p\delta c^{p}\mathrm {d}t +pc^{p-1}C(|x|)\varepsilon ^{2(1-\theta )}\mathrm {d}t, \end{aligned}$$

where $C(|x|)=\max \limits _{|u|\leqslant |x|+1}|D^2b(u)|$. Hence

$$\begin{aligned} \mathrm {d}\, \mathbb {E}[|\Delta ^\varepsilon _t|^{p}_{c}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\leqslant -p\delta \mathbb {E}[|\Delta ^\varepsilon _t|^{p}_c\mathbf {1}(\mathcal {A}^\varepsilon _t)]\mathrm {d}t+p\delta c^{p}\mathrm {d}t +pc^{p-1} C(|x|) \varepsilon ^{2(1-\theta )}\mathrm {d}t. \end{aligned}$$

The integral version of the Grönwall inequality with negative linearity given in Lemma 1 in [11] implies for all $t\geqslant 0$

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\leqslant \mathbb {E}[|\Delta ^\varepsilon _t|^{p}_{c}\mathbf {1}(\mathcal {A}^\varepsilon _t)]\leqslant c^{p} +\frac{1}{\delta }c^{p-1} C(|x|)\varepsilon ^{2(1-\theta )}. \end{aligned}$$

(B.16)
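For orientation, the bound (B.16) can be read off from the explicit solution of the associated linear comparison equation. The following is only a sketch in differential form; the precise statement used is Lemma 1 in [11], and the shorthand $u(t)$, $a$ and $\beta $ is introduced here for illustration. With $u(t)=\mathbb {E}[|\Delta ^\varepsilon _t|^{p}_{c}\mathbf {1}(\mathcal {A}^\varepsilon _t)]$, $a=p\delta $, $\beta =p\delta c^{p}+pc^{p-1}C(|x|)\varepsilon ^{2(1-\theta )}$ and $u(0)\leqslant c^{p}$ (since $\Delta ^\varepsilon _0=0$), one obtains

$$\begin{aligned} u(t)&\leqslant e^{-at}u(0)+\frac{\beta }{a}\big (1-e^{-at}\big ) \leqslant c^{p}e^{-at}+\Big (c^{p}+\frac{1}{\delta }c^{p-1}C(|x|)\varepsilon ^{2(1-\theta )}\Big )\big (1-e^{-at}\big )\\&\leqslant c^{p}+\frac{1}{\delta }c^{p-1}C(|x|)\varepsilon ^{2(1-\theta )}. \end{aligned}$$

In particular, the right-hand side does not depend on $t$.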

For $p\ne 1$ we have the following. Since $c>0$ is arbitrary and $\theta \in (0,{1}/{4})$, the choice $c=\varepsilon ^{1+{\eta }/{p}}$ with $\eta \in (0,\frac{p}{2(1-p)})$ in (B.16) yields for any $r\in \mathbb {R}$

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \frac{1}{\varepsilon ^{p}}\mathbb {E}[|\Delta ^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }|^{p}\mathbf {1}(\mathcal {A}^{\varepsilon }_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon })]=0. \end{aligned}$$

(B.17)
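The constraint on $\eta $ comes from comparing exponents. As a worked check for $p\in (0,1)$ and $\varepsilon \in (0,1]$, dividing the two terms on the right-hand side of (B.16) by $\varepsilon ^{p}$ with $c=\varepsilon ^{1+\eta /p}$ gives

$$\begin{aligned} \frac{c^{p}}{\varepsilon ^{p}}=\varepsilon ^{\eta },\qquad \frac{c^{p-1}\varepsilon ^{2(1-\theta )}}{\varepsilon ^{p}} =\varepsilon ^{(1+\frac{\eta }{p})(p-1)+2-2\theta -p}=\varepsilon ^{1-2\theta -\frac{\eta (1-p)}{p}}. \end{aligned}$$

Since $\theta <1/4$ yields $1-2\theta >1/2$ and $\eta <\frac{p}{2(1-p)}$ yields $\frac{\eta (1-p)}{p}<1/2$, both exponents are positive. Because the right-hand side of (B.16) is uniform in $t$, the limit holds along any time scale.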

The case of $p=1$ follows by the choice $c=\varepsilon ^{2}$ in (B.16).

We continue with the term $J_2$. By the subadditivity of the power $p\leqslant 1$ and the Hölder inequality for the index $p'/p$ where $p'\in (p,p_*)$ and r is such that $p/p'+1/r=1$ we have

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _t|^{p}\mathbf {1}((\mathcal {A}^{\varepsilon }_t)^{\mathsf {c}})]&\leqslant \mathbb {E}[|X^\varepsilon _t(x)|^{p}\mathbf {1}((\mathcal {A}^{\varepsilon }_t)^{\mathsf {c}})]+\mathbb {E}[|Y^\varepsilon _t(x)|^{p}\mathbf {1}((\mathcal {A}^{\varepsilon }_t)^{\mathsf {c}})]\nonumber \\&\leqslant (\mathbb {E}[|X^\varepsilon _t(x)|^{p'}])^{p/p'}(\mathbb {P}((\mathcal {A}^{\varepsilon }_t)^{\mathsf {c}}))^{1/r}+(\mathbb {E}[|Y^\varepsilon _t(x)|^{p'}])^{p/p'}(\mathbb {P}((\mathcal {A}^{\varepsilon }_t)^{\mathsf {c}}))^{1/r}. \end{aligned}$$

(B.18)

By Lemma B.5 we obtain for all $t\geqslant 0$

$$\begin{aligned} (\mathbb {E}[|X^\varepsilon _t(x)|^{p'}])^{{p}/{p'}}&\leqslant ( \mathbb {E}[|X^\varepsilon _t(x)-X^0_t(x)|^{p'}]+ |X^0_t(x)|^{p'})^{p/{p'}}\nonumber \\ {}&\leqslant \varepsilon ^{p}(1+C_{p'}\cdot t)^{{p}/{p'}}+ |X^0_t(x)|^{p}.\end{aligned}$$

(B.19)

Note that for all $t\geqslant 0$ it follows

$$\begin{aligned} (\mathbb {E}[|Y^\varepsilon _t(x)|^{p'}])^{\frac{p}{p'}}\leqslant \varepsilon ^{p}(\mathbb {E}[|\mathcal {Y}^x_t|^{p'}])^{\frac{p}{p'}}+|X^0_t(x)|^{p}. \end{aligned}$$

(B.20)

Lemma A.1 in [3] yields the existence of a positive constant $C(r,|x|)$ such that

$$\begin{aligned} |X^0_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x)|\leqslant C(r,|x|)\varepsilon . \end{aligned}$$

(B.21)

Combining (B.18) with inequalities (B.15), (B.19), (B.20) and (B.21) gives

$$\begin{aligned} \mathbb {E}[|\Delta ^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }|^{p}\mathbf {1}((\mathcal {A}^{\varepsilon }_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon })^{\mathsf {c}})]&\leqslant (C(\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ) \varepsilon ^{\theta })^{1/r}C^{p}(r,|x|)\varepsilon ^{p}\\ {}&\quad +\, (C(\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ) \varepsilon ^{\theta })^{1/r} \big (\varepsilon ^{p}(1+C_{p'}\cdot (\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ))^{{p}/{p'}}\big )\\ {}&\quad +\, (C(\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ) \varepsilon ^{\theta })^{1/r}\big ( \varepsilon ^{p}(\mathbb {E}[|\mathcal {Y}^x_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }|^{p'}])^{{p}/{p'}} \big ). \end{aligned}$$

Since $\mathbb {E}[|\mathcal {Y}^x_{t}|^{p'}]\leqslant \mathcal {R}(t)$, where $\mathcal {R}$ is a function of at most polynomial order, we have

$$\begin{aligned}&\limsup \limits _{\varepsilon \rightarrow 0} \frac{1}{\varepsilon ^{p}}\mathbb {E}[|\Delta ^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }|^{p}\mathbf {1}((\mathcal {A}^{\varepsilon }_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon })^{\mathsf {c}})]\\ {}&\quad \leqslant \limsup \limits _{\varepsilon \rightarrow 0} \varepsilon ^{\theta /r} (C(\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ))^{1/r}\big (C^p(r,|x|)\\ {}&\qquad +\, (1+C_{p'}\cdot (\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ))^{{p}/{p'}} +(\mathcal {R}(\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon ))^{{p}/{p'}} \big ). \end{aligned}$$

The right-hand side of the preceding inequality equals zero. The preceding argument combined with (B.17) yields the desired limit (B.5). $\square $

1.3 Asymptotic First Order Approximation

Lemma B.2

For any $p\in (0,p_*)$ we have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(\mu ^\varepsilon _*,\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

Proof

First we observe that $Y^\varepsilon _t(0) = \mathcal {Z}^\varepsilon _t(0)$ for any $t\geqslant 0$, $\varepsilon >0$, where $(\mathcal {Z}_t^\varepsilon (0))_{t\geqslant 0}$ is given in (B.2). By abuse of notation, we write $(X^\varepsilon _t(\mu ^\varepsilon ))_{t\geqslant 0}$ (and analogously $(\mathcal {Z}^\varepsilon _t(\mu ^\varepsilon _*))_{t\geqslant 0}$) for the process starting at a random vector with distribution $\mu ^\varepsilon $ (respectively $\mu ^\varepsilon _*$) independent of the noise process $L$. Since $X^{\varepsilon }_t(\mu ^\varepsilon ){\mathop {=}\limits ^{d}}\mu ^\varepsilon $ and $\mathcal {Z}^{\varepsilon }_t(\mu ^\varepsilon _*){\mathop {=}\limits ^{d}}\mu ^\varepsilon _*$ for any $t\geqslant 0$, the triangle inequality yields

$$\begin{aligned} \mathcal {W}_{p}(\mu ^\varepsilon ,\mu ^\varepsilon _*)&=\mathcal {W}_{p}(X^\varepsilon _t(\mu ^\varepsilon ),\mathcal {Z}^\varepsilon _t(\mu ^\varepsilon _*))\leqslant \mathcal {W}_{p}(X^\varepsilon _t(\mu ^\varepsilon ),X^\varepsilon _t(0))\nonumber \\&\quad + \mathcal {W}_{p}(X^\varepsilon _t(0),\mathcal {Z}^\varepsilon _t(0))+ \mathcal {W}_{p}(\mathcal {Z}^\varepsilon _t(0),\mathcal {Z}^\varepsilon _t(\mu ^\varepsilon _*)). \end{aligned}$$

(B.22)

By Proposition 2 for $x= 0$, we have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(0),\mathcal {Z}^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(0))}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

(B.23)

By disintegration, inequalities (A.4) and (2.84) in [3] imply

$$\begin{aligned} \mathcal {W}_{p}(X^\varepsilon _t(\mu ^\varepsilon ),X^\varepsilon _t(0))&\leqslant \int _{\mathbb {R}^d} \mathcal {W}_{p}(X^\varepsilon _t(u),X^\varepsilon _t(0))\mu ^{\varepsilon }(\mathrm {d}u)\\&\leqslant e^{-\delta (1\wedge p) t}\int _{\mathbb {R}^d} |u|^{1\wedge p}\mu ^{\varepsilon }(\mathrm {d}u)\leqslant C e^{-\delta (1\wedge p) t}\varepsilon ^{1\wedge p} \end{aligned}$$

for some positive constant C. As a consequence,

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(\mu ^\varepsilon ),X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(0))}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

(B.24)

Analogously,

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(\mathcal {Z}^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(\mu ^\varepsilon _*),\mathcal {Z}^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(0))}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$

(B.25)

Combining (B.22) with the estimates (B.23), (B.24) and (B.25) completes the proof. $\square $

Lemma B.3

For any $p\in (0,p_*)$ we have

$$\begin{aligned} \lim \limits _{t\rightarrow \infty }\mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty )=0. \end{aligned}$$

(B.26)

Proof

Recall that $\mathcal {O}_\infty $ is the limiting and invariant distribution of the homogeneous Ornstein–Uhlenbeck process $(\mathcal {Z}_t(x))_{t\geqslant 0}$ defined in (B.2), that is, $\mathcal {O}_\infty {\mathop {=}\limits ^{d}} \mathcal {Z}_\infty $. Since $-Db(X^0_t(x))$ converges exponentially fast to $-Db(0)$, it is natural to expect that the flow of $(\mathcal {Y}^x_t)_{t\geqslant 0}$ behaves as the flow of $(\mathcal {Z}_t(x))_{t\geqslant 0}$ for large $t$. In [3], Lemma C.3, it is shown that $\mathcal {Y}^x_t \rightarrow \mathcal {O}_\infty $ in law as $t \rightarrow \infty $. However, the law $\mathcal {O}_\infty $ is not invariant under the random dynamics of $(\mathcal {Y}^x_t)_{t\geqslant 0}$ due to the time inhomogeneity. Analogously to (A.5) we deduce

$$\begin{aligned} \mathcal {W}_{p}(\mathcal {Z}_t(x), \mathcal {O}_\infty ) \rightarrow 0, \quad \text{ as } t\rightarrow \infty . \end{aligned}$$

(B.27)

We start with the proof of the statement. The triangle inequality yields

$$\begin{aligned} \mathcal {W}_{p}(\mathcal {Y}^x_t,\mathcal {O}_\infty )\leqslant \mathcal {W}_{p}(\mathcal {Y}^x_t,\mathcal {Z}_t(0))+ \mathcal {W}_{p}(\mathcal {Z}_t(0),\mathcal {O}_\infty ), \end{aligned}$$

(B.28)

where the second term on the right-hand side tends to 0 as $t\rightarrow \infty $ due to (B.27). Thus it remains to prove $ \mathcal {W}_{p}(\mathcal {Y}^x_t,\mathcal {Z}_t(0)) \rightarrow 0$, as $t \rightarrow \infty $. Since

$$\begin{aligned} \mathcal {W}_{p}(\mathcal {Y}^x_t,\mathcal {Z}_t(0))\leqslant (\mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}])^{1\wedge (1/p)}, \end{aligned}$$

we derive the respective $L^{p}$ estimates. By (4.2) and (B.2) we obtain

$$\begin{aligned} \mathrm {d}\,(\mathcal {Y}^x_t-\mathcal {Z}_t(0))=-Db(X^0_t(x))(\mathcal {Y}^x_t-\mathcal {Z}_t(0))\mathrm {d}t+ (Db(0)-Db(X^0_t(x)))\mathcal {Z}_t(0)\mathrm {d}t. \end{aligned}$$

We first consider the case $p_*>1$ and $p\in (1,p_*)$. The chain rule and Hypothesis 1 yield

$$\begin{aligned} \mathrm {d}|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}&=-p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-2}\langle \mathcal {Y}^x_t-\mathcal {Z}_t(0),Db(X^0_t(x))(\mathcal {Y}^x_t-\mathcal {Z}_t(0)) \rangle \mathrm {d}t\\&\qquad +\, p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-2}\langle \mathcal {Y}^x_t-\mathcal {Z}_t(0),(Db(0)-Db(X^0_t(x)))\mathcal {Z}_t(0) \rangle \mathrm {d}t\\&\leqslant -p\delta |\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathrm {d}t+ p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-1}|Db(0)- Db(X^0_t(x))||\mathcal {Z}_t(0)|\mathrm {d}t\\&\leqslant -p\delta |\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathrm {d}t+ p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-1}C(|x|)|X^0_t(x)||\mathcal {Z}_t(0)|\mathrm {d}t, \end{aligned}$$

where $C(|x|)=\max \limits _{|u|\leqslant |x|+1}|D^2b(u)|$. Taking expectations and using the monotonicity of the integral and Fubini’s theorem, we obtain

$$\begin{aligned}&\mathrm {d}\, \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]\leqslant -p\delta \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]\mathrm {d}t\\&\quad +\, pC(|x|)|X^0_t(x)|\mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-1}\cdot |\mathcal {Z}_t(0)|]\mathrm {d}t. \end{aligned}$$

By Young’s inequality and $|X^0_t(x)|\leqslant e^{-\delta t}|x|$ for any $t\geqslant 0$ and $x\in \mathbb {R}^d$ it follows

$$\begin{aligned} \mathrm {d}\, \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]&\leqslant -p\delta \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]\mathrm {d}t \\&\quad +\, pC(|x|)|x| e^{-\delta t}\left( \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}] + \mathbb {E}[|\mathcal {Z}_t(0)|^{p}]\right) \mathrm {d}t. \end{aligned}$$
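The form of Young’s inequality behind this step can be spelled out as follows. This is a sketch assuming the natural conjugate exponents $p/(p-1)$ and $p$ for $p>1$, with both weights bounded by $1$ for simplicity. For $a,b\geqslant 0$,

$$\begin{aligned} a^{p-1}b\leqslant \frac{p-1}{p}\big (a^{p-1}\big )^{\frac{p}{p-1}}+\frac{1}{p}b^{p}=\frac{p-1}{p}a^{p}+\frac{1}{p}b^{p}\leqslant a^{p}+b^{p}, \end{aligned}$$

applied pointwise with $a=|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|$ and $b=|\mathcal {Z}_t(0)|$ before taking expectations.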

A straightforward calculation yields (for any $p>0$) that there exist functions $P_1(t)$ and $P_2(t)$ of polynomial order (depending on $p$, $\delta $, $|x|$) such that

$$\begin{aligned} \mathbb {E}[|\mathcal {Z}_t(0)|^{p}] \leqslant P_1(t)\quad \text { and } \quad \mathbb {E}[|\mathcal {Y}^x_t|^{p}]\leqslant P_2(t)\quad \text { for any } t\geqslant 0. \end{aligned}$$

(B.29)

Therefore,

$$\begin{aligned} \mathrm {d}\, \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]&\leqslant -p\delta \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]\mathrm {d}t+ p2^{p}C(|x|)|x|e^{-\delta t}(P_1(t) + P_2(t))\mathrm {d}t. \end{aligned}$$

The integral version of the Grönwall inequality with negative linearity given in Lemma 1 in [11] yields

$$\begin{aligned} \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]&\leqslant p2^{p}C(|x|)|x| e^{-p\delta t}\int _{0}^{t} e^{p\delta s}e^{-\delta s} (P_1(s) + P_2(s))\mathrm {d}s\\&\leqslant \frac{p2^{p}C(|x|)|x|}{\delta (p-1)} \max \limits _{0\leqslant s\leqslant t}\{P_1(s),P_2(s)\}e^{-\delta t}. \end{aligned}$$
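The second inequality in the preceding display rests on an explicit integration. As a sketch for $p>1$, after bounding the polynomial factor by its maximum over $[0,t]$ (up to a numerical constant), the remaining integral is

$$\begin{aligned} e^{-p\delta t}\int _{0}^{t} e^{(p-1)\delta s}\mathrm {d}s=\frac{e^{-\delta t}-e^{-p\delta t}}{\delta (p-1)}\leqslant \frac{e^{-\delta t}}{\delta (p-1)}. \end{aligned}$$

Since a function of polynomial order multiplied by $e^{-\delta t}$ tends to zero, the right-hand side vanishes as $t\rightarrow \infty $.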

Therefore,

$$\begin{aligned} \lim \limits _{t\rightarrow \infty }\mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {Z}_t(0)) \leqslant \lim \limits _{t\rightarrow \infty }\big (\mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]\big )^{\frac{1}{p}}=0. \end{aligned}$$

(B.30)

Combining (B.27) and (B.30) in (B.28) we conclude (B.26).

We continue with the case $p \in (0, p_*\wedge 1]$. Note that the case $p_*>1$ and $p\in (0,1]$ is also covered in the sequel. By Lemma B.1 there exists a positive constant C such that for the choice $\gamma =p$, $\vartheta = e^{\frac{\delta }{2} t}$ and any $t\geqslant 0$ it follows

$$\begin{aligned} \mathbb {P}(\mathcal {D}_t^0)\leqslant Ct e^{-\frac{\delta p}{2} t}, \qquad \text{where we recall} \quad \mathcal {D}_t^0=\Big \{\sup \limits _{0\leqslant s\leqslant t}|\mathcal {Z}_s(0)|> \vartheta \Big \}. \end{aligned}$$

(B.31)

We split

$$\begin{aligned} \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}]=\mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})]+ \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathbf {1}(\mathcal {D}^0_t)] =:I_1+I_2. \end{aligned}$$

We start with the term $I_1$. The chain rule for $|x|^{p}_c=(\sqrt{|x|^2+c^2})^{p}$ and Hypothesis 1 yield

$$\begin{aligned} \mathrm {d}\,|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_{c}&=-p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-2}_c\langle \mathcal {Y}^x_t-\mathcal {Z}_t(0),Db(X^0_t(x))(\mathcal {Y}^x_t-\mathcal {Z}_t(0)) \rangle \mathrm {d}t\\&\qquad +p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-2}_c\langle \mathcal {Y}^x_t-\mathcal {Z}_t(0),(Db(0)-Db(X^0_t(x)))\mathcal {Z}_t(0) \rangle \mathrm {d}t\\&\leqslant -p\delta |\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-2}_c|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^2 \mathrm {d}t \\&\qquad +p|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p-1}_c C(|x|)|X^0_t(x)||\mathcal {Z}_t(0)|\mathrm {d}t\\&\leqslant -p\delta |\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_c \mathrm {d}t+p\delta c^{p}\mathrm {d}t +p c^{p-1} C(|x|)|X^0_t(x)||\mathcal {Z}_t(0)|\mathrm {d}t, \end{aligned}$$

where $C(|x|)=\max \limits _{|u|\leqslant |x|+1}|D^2b(u)|$. On the event $(\mathcal {D}^0_t)^{\mathsf {c}}$ we have

$$\begin{aligned} \mathrm {d}\,|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_{c}\leqslant -p\delta |\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_c \mathrm {d}t+p\delta c^{p}\mathrm {d}t +pc^{p-1} C(|x|)|x| e^{-(\delta /2) t}\mathrm {d}t \end{aligned}$$

due to $|X^0_t(x)|\leqslant e^{-\delta t}|x|$ for all $t\geqslant 0$ and $x\in \mathbb {R}^d$. Hence

$$\begin{aligned} \mathrm {d}\, \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_{c}\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})]&\leqslant -p\delta \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_c\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})]\mathrm {d}t\\&\qquad +p\delta c^{p}\mathrm {d}t +pc^{p-1} C(|x|)|x| e^{-(\delta /2) t}\mathrm {d}t. \end{aligned}$$

The Grönwall inequality in [11] implies

$$\begin{aligned} \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})]&\leqslant \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}_{c}\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})] \\&\leqslant c^{p} +pc^{p-1} C(|x|)|x|e^{-p\delta t}\int _{0}^{t}e^{p\delta s} e^{-(\delta /2) s}\mathrm {d}s. \end{aligned}$$

Then

$$\begin{aligned} \limsup _{t\rightarrow \infty } \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})] \leqslant c^{p}\quad \text { for all } c>0, \end{aligned}$$

which yields $\lim _{t\rightarrow \infty } \mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}\mathbf {1}((\mathcal {D}^0_t)^{\mathsf {c}})]=0$.
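To see why the integral term in the Grönwall bound vanishes, one can evaluate it explicitly. In the following worked sketch the shorthand $I(t)$ is introduced only for illustration:

$$\begin{aligned} I(t):=e^{-p\delta t}\int _{0}^{t}e^{(p-\frac{1}{2})\delta s}\mathrm {d}s= {\left\{ \begin{array}{ll} \dfrac{e^{-\frac{\delta }{2}t}-e^{-p\delta t}}{(p-\frac{1}{2})\delta }, &{} p\ne \frac{1}{2},\\ t\,e^{-\frac{\delta }{2}t}, &{} p=\frac{1}{2}. \end{array}\right. } \end{aligned}$$

In both cases $I(t)\rightarrow 0$ as $t\rightarrow \infty $ for every fixed $c>0$, so only the term $c^{p}$ survives in the limit superior, and letting $c\rightarrow 0$ gives the claim.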

We continue with the term $I_2$. By the subadditivity of the power $p\leqslant 1$ and the Hölder inequality for the index $p'/p$, where $p'=(p+p_*)/2$ and $r$ is the conjugate index of $p'/p$, we have

$$\begin{aligned}&\mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p} \mathbf {1}(\mathcal {D}^0_t)]\leqslant \mathbb {E}[|\mathcal {Y}^x_t|^{p}\mathbf {1}(\mathcal {D}^0_t)]+\mathbb {E}[|\mathcal {Z}_t(0)|^{p}\mathbf {1}(\mathcal {D}^0_t)]\nonumber \\&\quad \leqslant (\mathbb {E}[|\mathcal {Y}^x_t|^{p'}])^{p/p'}(\mathbb {P}(\mathcal {D}^0_t))^{1/r}+(\mathbb {E}[|\mathcal {Z}_t(0)|^{p'}])^{p/p'}(\mathbb {P}(\mathcal {D}^0_t))^{1/r}. \end{aligned}$$

(B.32)

By (B.29) and (B.31) the right-hand side of (B.32) tends to zero as $t\rightarrow \infty $. As a consequence we have $\mathcal {W}_{p}(\mathcal {Y}^x_t,\mathcal {Z}_t(0))\leqslant (\mathbb {E}[|\mathcal {Y}^x_t-\mathcal {Z}_t(0)|^{p}])^{1\wedge (1/p)}$ which tends to zero as $t\rightarrow \infty $. By (B.27) and (B.28) we obtain (B.26). $\square $

1.4 Auxiliary Moment Estimates

Lemma B.4

For any $2\leqslant p< p_*$ (and $p = 2$ if $p_* = 2$) there are a function $R(t)$ of at most polynomial order as $t \rightarrow \infty $ and $\varepsilon _0\in (0, 1]$ such that for any $t\geqslant 0$ and $0< \varepsilon < \varepsilon _0$ we have

$$\begin{aligned} \mathbb {E}[|X^\varepsilon _t(x)-X^0_t(x)|^{p_*}] \leqslant \varepsilon ^{p} R(t). \end{aligned}$$

Proof

First note that for $G(u) = |u|^{p_*},p_*\geqslant 2$ we have

$$\begin{aligned}&\nabla G(u) = p_* |u|^{p_*-2} u = p_* (|u|^2)^{\frac{p_*-2}{2}} u, \text{ with } \partial _i G(u) = p_* |u|^{p_*-2} u_i,\\&\sum _{ij} \partial _i \partial _j G(u) \leqslant p_* |u|^{p_*-4} \big (d |u|^{2}+\sum _{ij} \frac{(p_*-2)}{2} (u_j^2 + u_i^2)\big )= p_* (p_*-1 )d |u|^{p_*-4} |u|^{2}. \end{aligned}$$
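For completeness, the second-order estimate can be traced through the explicit second derivatives. The following is a worked sketch for $u\ne 0$ and $p_*\geqslant 2$, using the elementary bound $u_iu_j\leqslant (u_i^2+u_j^2)/2$:

$$\begin{aligned} \partial _i\partial _j G(u)&=p_*|u|^{p_*-2}\,\mathbf {1}\{i=j\}+p_*(p_*-2)|u|^{p_*-4}u_iu_j,\\ \sum _{i,j}\frac{1}{2}(u_i^2+u_j^2)&=d|u|^2, \quad \text { so that }\quad \sum _{i,j}\partial _i\partial _jG(u)\leqslant p_*\big (1+(p_*-2)\big )d|u|^{p_*-2}=p_*(p_*-1)d|u|^{p_*-2}. \end{aligned}$$

This agrees with the right-hand side above, since $|u|^{p_*-4}|u|^{2}=|u|^{p_*-2}$.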

Recall the notation (B.1) for $L$. The Itô formula for $\Theta ^\varepsilon _t= X^\varepsilon _t(x)-X^0_t(x)$ yields

$$\begin{aligned} \mathrm {d}|\Theta ^\varepsilon _t|^{p_*}&= -p_* |\Theta ^\varepsilon _t|^{p_*-2} \langle \Theta ^\varepsilon _t, b(X^{\varepsilon }_t(x))-b(X^0_t(x)) \rangle \mathrm {d}t + p_* |\Theta ^\varepsilon _t|^{p_*-2} \langle \Theta ^\varepsilon _t, \varepsilon \Sigma ^{1/2} \mathrm {d}B_t \rangle \\&\qquad +\, \frac{\varepsilon ^2}{2} {{\,\mathrm{trace}\,}}(\Sigma ^{1/2} \text{ Hess } G(\Theta ^\varepsilon _t)(\Sigma ^{1/2})^{*}) \mathrm {d}t\\&\qquad +\, \int _{\mathbb {R}^d} \big (|\Theta ^\varepsilon _t+ \varepsilon z|^{p_*} -|\Theta ^\varepsilon _t|^{p_*}- p_* |\Theta ^\varepsilon _t|^{p_*-2} \langle \Theta ^\varepsilon _t, \varepsilon z \rangle \mathbf {1}\{|z|\leqslant 1\}\big ) \nu (\mathrm {d}z) \mathrm {d}t \\&\qquad +\, \int _{\mathbb {R}^d} \big (|\Theta ^\varepsilon _t+ \varepsilon z|^{p_*} -|\Theta ^\varepsilon _t|^{p_*}\big ) \tilde{N}(\mathrm {d}t, \mathrm {d}z). \end{aligned}$$

Taking expectation yields

$$\begin{aligned}&\mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}]\\&\quad \leqslant - \delta p_* \int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s +\varepsilon ^2 p_* (p_*-1 )d {{\,\mathrm{trace}\,}}(\Sigma ^{1/2}(\Sigma ^{1/2})^{*})\int _0^t \mathbb {E}\Big [ |\Theta ^\varepsilon _s|^{p_*-2} \Big ] \mathrm {d}s\\&\qquad +\, \int _0^t\int _{\mathbb {R}^d} \mathbb {E}\Big [|\Theta ^\varepsilon _s+ \varepsilon z|^{p_*} -|\Theta ^\varepsilon _s|^{p_*}- p_*|\Theta ^\varepsilon _s|^{p_*-2} \langle \Theta ^\varepsilon _s, \varepsilon z \rangle \Big ] \nu (\mathrm {d}z) \mathrm {d}s. \end{aligned}$$

By the mean value theorem we have

$$\begin{aligned}&\mathbb {E}\Big [|\Theta ^\varepsilon _t+ \varepsilon z|^{p_*} -|\Theta ^\varepsilon _t|^{p_*}- p_* |\Theta ^\varepsilon _t|^{p_*-2} \langle \Theta ^\varepsilon _t, \varepsilon z \rangle \Big ]\\&\quad \leqslant \mathbb {E}\Big [ p_* (p_*-1 )d \iint _{0}^1 |\Theta ^\varepsilon _t+ \theta \vartheta \varepsilon z|^{p_*-2} \mathrm {d}\theta \mathrm {d}\vartheta \Big ] |\varepsilon z|^2 \\&\quad \leqslant (1\vee 2^{p_*-2}) \mathbb {E}\Big [ p_* (p_*-1 )d ( |\Theta ^\varepsilon _t|^{p_*-2}+ |\varepsilon z|^{p_*-2}) \Big ] |\varepsilon z|^2 \\&\quad \leqslant (1\vee 2^{p_*-2}) p_* (p_*-1 )d \mathbb {E}\Big [ |\Theta ^\varepsilon _t|^{p_*-2}\Big ] \Big (|\varepsilon z|^2 + |\varepsilon z|^{p_*} \Big ) \end{aligned}$$

and

$$\begin{aligned}&\int _{\mathbb {R}^d} \mathbb {E}\Big [|\Theta ^\varepsilon _t+ \varepsilon z|^{p_*} -|\Theta ^\varepsilon _t|^{p_*}- p_* |\Theta ^\varepsilon _t|^{p_*-2} \langle \Theta ^\varepsilon _t, \varepsilon z \rangle \Big ] \nu (\mathrm {d}z)\\&\quad \leqslant C_{p_*, d} \Big (\int _{\mathbb {R}^d} | z|^2 \nu (\mathrm {d}z) + \int _{\mathbb {R}^d} | z|^{p_*} \nu (\mathrm {d}z)\Big ) \varepsilon ^2 \mathbb {E}\Big [ |\Theta ^\varepsilon _t|^{p_*-2}\Big ]. \end{aligned}$$

Hence there is a positive constant K such that

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}]&\leqslant - \delta p_* \int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s +\varepsilon ^2 K\int _0^t \mathbb {E}\Big [ |\Theta ^\varepsilon _s|^{p_*-2} \Big ] \mathrm {d}s. \end{aligned}$$

(B.33)

For $p_*=2$ we have directly $ \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}]\leqslant \varepsilon ^2 K t. $ For $p_*>2$ we continue in (B.33) with Young’s inequality, used in the form $a^{p_*-2}\leqslant a^{p_*}+1$ for $a\geqslant 0$,

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}]&\leqslant - \delta p_*\int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s +\varepsilon ^2 K\int _0^t \mathbb {E}\Big [ |\Theta ^\varepsilon _s|^{p_*-2} \Big ] \mathrm {d}s \nonumber \\ {}&\leqslant - \delta p_*\int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s +\varepsilon ^2 K\int _0^t \mathbb {E}\Big [ |\Theta ^\varepsilon _s|^{p_*} \Big ] \mathrm {d}s+ \varepsilon ^2 Kt\nonumber \\ {}&\leqslant - \frac{\delta p_* }{2} \int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s + \varepsilon ^2 Kt \end{aligned}$$

for $\varepsilon <(\frac{\delta p_*}{2K})^{1/2}$. Grönwall’s lemma applied to the preceding estimate yields the a priori estimate $ \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}] \leqslant \varepsilon ^2 K t^2 =: \varepsilon ^2 R_0(t). $ Inserting the a priori estimate in (B.33) and using the Hölder inequality for $p_*>2$ we obtain

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}]&\leqslant - \delta p_*\int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s +\varepsilon ^2 K\int _0^t \mathbb {E}\Big [ |\Theta ^\varepsilon _s|^{p_*-2} \Big ] \mathrm {d}s\\&\leqslant - \delta p_*\int _0^t \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p_*}\Big ] \mathrm {d}s +\varepsilon ^2 K\int _0^t \mathbb {E}\Big [ |\Theta ^\varepsilon _s|^{p_*} \Big ]^\frac{p_*-2}{p_*}\mathrm {d}s\\&\leqslant \varepsilon ^{2+2 \frac{p_*-2}{p_*}} K^{1+ \frac{p_*-2}{p_*}} \int _0^t s^{2\frac{p_*-2}{p_*}}\mathrm {d}s=: \varepsilon ^{2+2 \frac{p_*-2}{p_*}} R_1(t). \end{aligned}$$

By induction, after the i-th iteration of the bootstrap we obtain the estimate

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}] \leqslant \varepsilon ^{2 \sum _{j=0}^i (\frac{p_*-2}{p_*})^j} R_i(t) \end{aligned}$$

for a function $R_i(t)$ of polynomial order in t. Clearly, $\lim _{i\rightarrow \infty } 2 \sum _{j=0}^i \Big (\frac{p_*-2}{p_*}\Big )^j = p_*$ and therefore for any $0<p<p_*$ there is an iteration $i_0=i_0(p_*,p)$ such that we obtain $ \mathbb {E}[|\Theta ^\varepsilon _t|^{p_*}] \leqslant \varepsilon ^{p} R_{i_0}(t) $. This finishes the proof of the lemma. $\square $
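
The limit of the exponents and the iteration count $i_0$ can be written out explicitly. The following is a short worked computation; the abbreviations $q$ and $e_i$ are notation introduced here for illustration only, and the sketch assumes $p_*>2$ (so that $q\in (0,1)$) and $\varepsilon \in (0,1]$ (so that a larger exponent gives a smaller power of $\varepsilon $).

$$\begin{aligned} q:=\frac{p_*-2}{p_*},\qquad e_i:=2\sum _{j=0}^i q^j = 2\,\frac{1-q^{i+1}}{1-q} = p_*\big (1-q^{i+1}\big ), \qquad \text{since } 1-q=\frac{2}{p_*}. \end{aligned}$$

Hence $e_i\nearrow p_*$ as $i\rightarrow \infty $, and for $0<p<p_*$

$$\begin{aligned} e_i\geqslant p \quad \Longleftrightarrow \quad q^{i+1}\leqslant 1-\frac{p}{p_*} \quad \Longleftrightarrow \quad i+1\geqslant \frac{\ln (1-p/p_*)}{\ln q}, \end{aligned}$$

so one admissible choice is the smallest integer $i_0\geqslant 0$ satisfying the last inequality. As an illustration, for $p_*=4$ we have $q=1/2$ and $e_0=2$, $e_1=3$, $e_2=7/2$, $e_3=15/4$; the target exponent $p=7/2$ is therefore reached at $i_0=2$, while any $p\leqslant 2$ is already covered by $i_0=0$.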

Lemma B.5

Let $p_*>0$. Then for any $p\in (0,2\wedge p_*)$ there exists a positive constant $C_{p}$ such that for any $t\geqslant 0$ and $\varepsilon >0$ we have

$$\begin{aligned} \mathbb {E}[|X^\varepsilon _t(x)-X^0_t(x)|^{p}] \leqslant \varepsilon ^{p}(1+C_{p}\cdot t). \end{aligned}$$

Proof

Without loss of generality let $p_*\in (0,2]$. Itô’s formula yields for $\Theta ^\varepsilon _t = X^{\varepsilon }_t(x) -X^0_t(x)$ and the function $G(z)=|z|^{p}_c$

$$\begin{aligned} \mathrm {d}|\Theta ^\varepsilon _t|_c^p&= -p |\Theta ^\varepsilon _t|_c^{p-2} \langle \Theta ^\varepsilon _t, b(X^{\varepsilon }_t(x))-b(X^0_t(x)) \rangle \mathrm {d}t + p |\Theta ^\varepsilon _t|_c^{p-2} \langle \Theta ^\varepsilon _t, \varepsilon \Sigma ^{1/2} \mathrm {d}B_t \rangle \\&\qquad +\frac{\varepsilon ^2}{2} {{\,\mathrm{trace}\,}}(\Sigma ^{1/2} \mathrm {Hess}G(\Theta ^\varepsilon _t)(\Sigma ^{1/2})^{*}) \mathrm {d}t\\&\qquad + \int _{\mathbb {R}^d} \big (|\Theta ^\varepsilon _t+ \varepsilon z|_c^p -|\Theta ^\varepsilon _t|_c^p- p |\Theta ^\varepsilon _t|_c^{p-2} \langle \Theta ^\varepsilon _t, \varepsilon z \rangle \mathbf {1}\{|z|\leqslant 1\}\big ) \nu (\mathrm {d}z) \mathrm {d}t \\&\qquad + \int _{\mathbb {R}^d} \big (|\Theta ^\varepsilon _t+ \varepsilon z|_c^p -|\Theta ^\varepsilon _t|_c^p\big ) \tilde{N}(\mathrm {d}t, \mathrm {d}z). \end{aligned}$$

Taking expectation and using Hypothesis 1 we have

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|_c^p]&\leqslant c^p - p \delta \int _0^t \mathbb {E}[|\Theta ^\varepsilon _s|_c^{p-2}|\Theta ^{\varepsilon }_s|^2] \mathrm {d}s + \varepsilon ^2 \int _{0}^t \mathbb {E}\big [{{\,\mathrm{trace}\,}}(\Sigma ^{1/2}\mathrm {Hess}G(\Theta ^\varepsilon _s)(\Sigma ^{1/2})^{*})\big ]\mathrm {d}s\\&\qquad +\, \int _0^t\int _{\mathbb {R}^d} \mathbb {E}\Big [|\Theta ^\varepsilon _s+ \varepsilon z|_c^p -|\Theta ^\varepsilon _s|_c^p- p |\Theta ^\varepsilon _s|_c^{p-2} \langle \Theta ^\varepsilon _s, \varepsilon z \rangle \mathbf {1}\{|z|\leqslant 1\}\Big ] \nu (\mathrm {d}z) \mathrm {d}s. \end{aligned}$$

Since $|x|^2=|x|^2_c-c^2$, we obtain

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|_c^p]&\leqslant c^p - p \delta \int _0^t \mathbb {E}[|\Theta ^\varepsilon _s|_c^{p} ]\mathrm {d}s +p\delta c^{p}t + \varepsilon ^2 c^{p-2}tC(p,d){{\,\mathrm{trace}\,}}(\Sigma ^{1/2}(\Sigma ^{1/2})^{*})\nonumber \\&\qquad +\, \int _0^t\int _{\mathbb {R}^d} \mathbb {E}\Big [|\Theta ^\varepsilon _s+ \varepsilon z|_c^p -|\Theta ^\varepsilon _s|_c^p- p |\Theta ^\varepsilon _s|_c^{p-2} \langle \Theta ^\varepsilon _s, \varepsilon z \rangle \mathbf {1}\{|z|\leqslant 1\}\Big ] \nu (\mathrm {d}z) \mathrm {d}s. \end{aligned}$$

(B.34)

In the sequel we estimate the second order term for small increments with the help of (B.4) by

$$\begin{aligned} \int _{0}^{t}&\int _{|z|\leqslant 1} \mathbb {E}\Big [|\Theta ^\varepsilon _s+ \varepsilon z|_c^p -|\Theta ^\varepsilon _s|_c^p- p |\Theta ^\varepsilon _s|_c^{p-2} \langle \Theta ^\varepsilon _s, \varepsilon z \rangle \Big ] \nu (\mathrm {d}z)\mathrm {d}s \nonumber \\& \leqslant C(p,d)\varepsilon ^2 c^{p-2} t \int _{|z|\leqslant 1} | z|^2\nu (\mathrm {d}z)=:K_1 \varepsilon ^2 c^{p-2} t. \end{aligned}$$

(B.35)

For the large increments, we use the mean value theorem and obtain

$$\begin{aligned}&\int _0^t\int _{|z|>1} \mathbb {E}\Big [|\Theta ^\varepsilon _s+ \varepsilon z|_c^p -|\Theta ^\varepsilon _s|_c^p\Big ] \nu (\mathrm {d}z)\mathrm {d}s\leqslant p\varepsilon \int _0^t\int _{|z|>1}\int _{0}^{1} \mathbb {E}[|\Theta ^\varepsilon _s+\theta \varepsilon z|^{p-1}_c ]| z|\mathrm {d}\theta \nu (\mathrm {d}z) \mathrm {d}s. \end{aligned}$$

For $p\in (0,1]$, note that $|x+y|^p_c\leqslant |x|^p+|y|^p+c^p$ for all $x,y \in \mathbb {R}^d$. Then we have for all $t\geqslant 0$

$$\begin{aligned} \int _0^t \int _{|z|>1} \mathbb {E}\Big [|\Theta ^\varepsilon _s+ \varepsilon z|_c^p -|\Theta ^\varepsilon _s|_c^p\Big ] \nu (\mathrm {d}z)\mathrm {d}s&\leqslant \int _0^t\int _{|z|>1} (\varepsilon ^{p}| z|^p+c^p) \nu (\mathrm {d}z)\mathrm {d}s\nonumber \\&=t\varepsilon ^p \int _{|z|>1}|z|^p \nu (\mathrm {d}z)+tc^p\nu (\{|z|>1\}). \end{aligned}$$

(B.36)
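
The elementary inequality behind (B.36) can be checked in two steps. The following sketch assumes the reading $|x|_c=(|x|^2+c^2)^{1/2}$, which is the one consistent with the identity $|x|^2=|x|^2_c-c^2$ used above; the exponent $\kappa $ is auxiliary notation introduced here. For $\kappa \in (0,1]$ and $r,s\geqslant 0$ the map $r\mapsto r^{\kappa }$ is subadditive, that is, $(r+s)^{\kappa }\leqslant r^{\kappa }+s^{\kappa }$. Applying this first with $\kappa =p/2$ and then with $\kappa =p$ gives for $p\in (0,1]$

$$\begin{aligned} |x+y|_c^p = \big (|x+y|^2+c^2\big )^{p/2} \leqslant |x+y|^p+c^p \leqslant \big (|x|+|y|\big )^p+c^p \leqslant |x|^p+|y|^p+c^p. \end{aligned}$$

With $x=\Theta ^\varepsilon _s$, $y=\varepsilon z$ and $|x|^p\leqslant |x|^p_c$ the integrand in (B.36) is then bounded pointwise by

$$\begin{aligned} |\Theta ^\varepsilon _s+ \varepsilon z|_c^p -|\Theta ^\varepsilon _s|_c^p \leqslant |\Theta ^\varepsilon _s|^p+\varepsilon ^p|z|^p+c^p-|\Theta ^\varepsilon _s|_c^p \leqslant \varepsilon ^p|z|^p+c^p. \end{aligned}$$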

For $p>1$, due to $|x+y|^{p-1}_c\leqslant |x|^{p-1}+|y|^{p-1}+c^{p-1}$ for all $x,y\in \mathbb {R}^d$, we split the intermediate value as follows

$$\begin{aligned}&p\varepsilon \int _0^t\int _{|z|>1}\int _{0}^{1} \mathbb {E}[|\Theta ^\varepsilon _s+\theta \varepsilon z|^{p-1}_c ]| z|\mathrm {d}\theta \nu (\mathrm {d}z) \mathrm {d}s\nonumber \\&\leqslant p\varepsilon \int _0^t\int _{|z|>1} \mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p-1} \Big ]| z|\nu (\mathrm {d}z) \mathrm {d}s \nonumber \\&\qquad +\, p\varepsilon \int _0^t\int _{|z|>1} |\varepsilon z|^{p-1} |z|\nu (\mathrm {d}z) \mathrm {d}s +p\varepsilon c^{p-1}t \nu (\{|z|>1\}) \nonumber \\&= p \mathbb {E}\Big [\int _0^t \varepsilon |\Theta ^\varepsilon _s|^{p-1} \mathrm {d}s\Big ] \int _{|z|>1} | z|\nu (\mathrm {d}z) +p\varepsilon ^p t \int _{|z|>1} | z|^{p} \nu (\mathrm {d}z) +p\varepsilon c^{p-1}t \nu (\{|z|>1\}) \nonumber \\&\leqslant tp(1/K_3)^p \varepsilon ^p+\frac{p\delta }{2}\int _{0}^{t}\mathbb {E}\Big [|\Theta ^\varepsilon _s|^{p}_c\Big ]\mathrm {d}s+p\varepsilon ^p t \int _{|z|>1} | z|^{p} \nu (\mathrm {d}z)+p\varepsilon c^{p-1}t \nu (\{|z|>1\}), \end{aligned}$$

(B.37)

where we have used in the last line the following weighted Young inequality

$$\begin{aligned} \int _0^t K_2\varepsilon |\Theta ^\varepsilon _s|^{p-1} \mathrm {d}s&\leqslant (1/K_3)^p t K^p_2\varepsilon ^p+K_3^{p/(p-1)}\int _{0}^{t}|\Theta ^\varepsilon _s|^{p} \mathrm {d}s\\&\leqslant t(1/K_3)^p \varepsilon ^p+\frac{\delta }{2}\int _{0}^{t}|\Theta ^\varepsilon _s|^{p}_c \mathrm {d}s \end{aligned}$$

with $K_2=\int _{|z|>1} | z|\nu (\mathrm {d}z)+1$ and $K_3=({\delta }/{2})^{(p-1)/p}$, so that $K_3^{p/(p-1)}=\delta /2$, followed by $|y|\leqslant |y|_c$. Combining (B.35) with (B.37) for $p>1$, and (B.35) with (B.36) for $p\leqslant 1$, respectively, in (B.34) we obtain

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|_c^p]&\leqslant c^p - \frac{p \delta }{2} \int _0^t \mathbb {E}[|\Theta ^\varepsilon _s|_c^{p} ]\mathrm {d}s +p\delta c^{p}t + K_0\varepsilon ^2 c^{p-2}t+K_1\varepsilon ^2 c^{p-2} t \\&\quad +\, tp(1/K_3)^p \varepsilon ^p\cdot \mathbf {1}\{p\geqslant 1\} +p\varepsilon ^p t \int _{|z|>1} | z|^{p} \nu (\mathrm {d}z)+p\varepsilon c^{p-1}t \nu (\{|z|>1\}), \end{aligned}$$

where $K_0=C(p,d){{\,\mathrm{trace}\,}}(\Sigma ^{1/2}(\Sigma ^{1/2})^{*})$. Since $|x|^p\leqslant |x|^p_c$, the choice $c=c_\varepsilon =\varepsilon $ yields for all $t\geqslant 0$ $ \mathbb {E}[|\Theta ^\varepsilon _t|^p]\leqslant \varepsilon ^p (1+Ct) $ for some constant $C=C(p,\delta )$. This completes the proof of the lemma. $\square $
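
To see how the choice $c=\varepsilon $ produces the common factor $\varepsilon ^p$, one can substitute it term by term in the last display. The following bookkeeping is a sketch only: the constant $C$ collected here is one admissible choice, written out for illustration, and it depends on $d$, $\Sigma $ and $\nu $ through $K_0$, $K_1$ and the two integrals over $\{|z|>1\}$, which are assumed to be finite.

$$\begin{aligned} c^p=\varepsilon ^p,\qquad p\delta c^{p}t=p\delta \varepsilon ^p t,\qquad (K_0+K_1)\varepsilon ^2 c^{p-2}t=(K_0+K_1)\varepsilon ^p t,\qquad p\varepsilon c^{p-1}t \nu (\{|z|>1\})=p\varepsilon ^p t \nu (\{|z|>1\}). \end{aligned}$$

The remaining two terms already carry the factor $\varepsilon ^p t$. Dropping the nonpositive term $-\frac{p\delta }{2}\int _0^t \mathbb {E}[|\Theta ^\varepsilon _s|_c^{p}]\mathrm {d}s$ and using $|x|^p\leqslant |x|^p_c$ then gives

$$\begin{aligned} \mathbb {E}[|\Theta ^\varepsilon _t|^p]\leqslant \varepsilon ^p\big (1+Ct\big ),\qquad C:=p\delta +K_0+K_1+p(1/K_3)^p\cdot \mathbf {1}\{p\geqslant 1\}+p\int _{|z|>1} | z|^{p} \nu (\mathrm {d}z)+p\nu (\{|z|>1\}). \end{aligned}$$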

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Barrera, G., Högele, M.A. & Pardo, J.C. The Cutoff Phenomenon in Wasserstein Distance for Nonlinear Stable Langevin Systems with Small Lévy Noise. J Dyn Diff Equat 36, 251–278 (2024). https://doi.org/10.1007/s10884-022-10138-1

Received: 13 September 2021
Revised: 13 January 2022
Accepted: 24 January 2022
Published: 25 February 2022
Issue Date: March 2024
DOI: https://doi.org/10.1007/s10884-022-10138-1
