1 Introduction

In this paper, we study the asymptotics of the ergodic behavior of the following stochastic differential equation (SDE)

$$\begin{aligned} \mathrm {d}X^\varepsilon _t(x) = -b(X^\varepsilon _t(x))\mathrm {d}t + \varepsilon \mathrm {d}L_t, \quad X^\varepsilon _0(x) = x\in \mathbb {R}^d \end{aligned}$$
(1.1)

for small noise intensity \(\varepsilon >0\), where the vector field \(b\in \mathcal {C}^2(\mathbb {R}^d,\mathbb {R}^d)\) satisfies \(b(0)=0\) and the following dissipative condition.

Hypothesis 1

(Dissipativity) There exists a constant \(\delta >0\) such that

$$\begin{aligned} \langle b(x)-b(y),x-y\rangle \geqslant \delta |x-y|^2 \qquad \text { for all } ~x,y\in \mathbb {R}^d. \end{aligned}$$
(1.2)

The noise process \(L=(L_t)_{t\geqslant 0}\) in (1.1) is a Lévy process with values in \(\mathbb {R}^d\) on a given probability space \((\Omega , \mathcal {F}, \mathbb {P})\). It is well-known that the law of L is characterized by the triplet \((a,\Sigma , \nu )\), where \(a\in \mathbb {R}^d\), \(\Sigma \in \mathbb {R}^{d \times d}\) is a non-negative definite matrix and \(\nu : \mathcal {B}(\mathbb {R}^d) \rightarrow [0, \infty ]\) is a locally finite Borel measure satisfying

$$\begin{aligned} \nu (\{0\}) = 0 \qquad \text{ and } \qquad \int _{\mathbb {R}^d} (1\wedge |z|^2) \nu (\mathrm {d}z) < \infty . \end{aligned}$$

For \(\nu =0\) the process L is a multidimensional Brownian motion with drift, while for \(a=0\) and \(\Sigma =0\) we have a multidimensional pure jump process such as compound Poisson processes or \(\alpha \)-stable processes, in particular, the Cauchy process for \(\alpha =1\). We refer to [1, 14, 16, 18, 22] for further details on Lévy processes. Under Hypothesis 1, it is known that the SDE (1.1) has a pathwise unique strong solution, here denoted by \(X^\varepsilon (x):=(X^\varepsilon _t(x))_{t\geqslant 0}\), see for instance Theorem 1.1 in [10]. Moreover, \(X^\varepsilon (x)\) is a Markov process and, in particular, it satisfies the Feller property, see Proposition 2.1 in [21].

In order to present the main results of this paper, we formally introduce the Wasserstein distance of order \(p_*\). To this end we assume that \(L_t\), and hence \(X^{\varepsilon }_t(x)\), has a finite moment of some positive order for all \(t\geqslant 0\).

Hypothesis 2

(Finite \(p_*\)-th moment) There exists \(p_*>0\) such that

$$\begin{aligned} \int _{|z|>1} |z|^{p_*} \nu (\mathrm {d}z) < \infty . \end{aligned}$$

This article shows the cutoff phenomenon for the family of processes \((X^{\varepsilon }(x))_{\varepsilon >0}\) with respective invariant measures \((\mu ^\varepsilon )_{\varepsilon >0}\) under the Wasserstein distance \(\mathcal {W}_{p_*}\) of order \(p_*>0\). For \(p_*>1\) we characterize the following cutoff profile asymptotics

$$\begin{aligned} \mathcal {W}_{p_*}(\text{ Law }(X^\varepsilon _{\mathfrak {t}_\varepsilon +r}(x)),\mu ^\varepsilon )=\varepsilon \cdot C e^{-\mathfrak {q}r}+o(\varepsilon )\quad \text { for }\quad \varepsilon \rightarrow 0, \end{aligned}$$
(1.3)

where \(\mathfrak {t}_\varepsilon =\frac{1}{\mathfrak {q}}|\ln (\varepsilon )|+\frac{\ell -1}{\mathfrak {q}}\ln (|\ln (\varepsilon )|)\) for some explicit positive constants \(\mathfrak {q},\ell ,C\) that depend on x in terms of an \(\omega \)-limit set of the rotational part for the Hartman–Grobman linearization of \(X^0(x)\).
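The role of the time scale \(\mathfrak {t}_\varepsilon \) can be checked numerically. The following minimal sketch evaluates \(\mathfrak {t}_\varepsilon \) and verifies that, for \(\ell =1\), the deterministic contraction factor \(e^{-\mathfrak {q}(\mathfrak {t}_\varepsilon +r)}\) divided by \(\varepsilon \) equals \(e^{-\mathfrak {q}r}\) for every \(\varepsilon \in (0,1)\). The function names and the sample values of \(\mathfrak {q}\), \(\ell \) and r are illustrative assumptions and are not taken from the paper.

```python
import math

def cutoff_time(eps, q, ell=1):
    """Time scale t_eps = |ln eps|/q + (ell - 1)/q * ln|ln eps| for 0 < eps < 1."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    log_eps = abs(math.log(eps))
    return log_eps / q + (ell - 1) / q * math.log(log_eps)

def rescaled_decay(eps, q, r):
    """exp(-q (t_eps + r)) / eps for ell = 1; this equals exp(-q r) for every eps."""
    return math.exp(-q * (cutoff_time(eps, q) + r)) / eps

# Illustrative values only: q = 2, ell = 2 for the time scale, r = 0.5 for the decay.
for eps in (1e-2, 1e-4, 1e-8):
    print(eps, cutoff_time(eps, q=2.0, ell=2), rescaled_decay(eps, q=2.0, r=0.5))
```

The second printed column grows like \(|\ln (\varepsilon )|\), while the third one stays at \(e^{-1}\) up to rounding, which is the scaling behind the exponential profile in (1.3).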

For such processes \((X^{\varepsilon }(x))_{\varepsilon >0}\) where (1.3) fails, we establish the following weaker window cutoff asymptotics

$$\begin{aligned}&\lim \limits _{r\rightarrow \infty }\limsup \limits _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p_*}(\text{ Law }(X^\varepsilon _{\mathfrak {t}_\varepsilon +r}(x)),\mu ^\varepsilon )}{\varepsilon }=0\qquad \text {and}\\&\lim \limits _{r\rightarrow -\infty }\liminf \limits _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p_*}(\text{ Law }(X^\varepsilon _{\mathfrak {t}_\varepsilon +r}(x)),\mu ^\varepsilon )}{\varepsilon }=\infty . \end{aligned}$$

Our results generalize those of [2] to nonlinear vector fields and those of [3, 5, 6] to the Wasserstein distance, which allows us to cover second order equations with degenerate noise. For a detailed introduction to the subject we refer to the aforementioned articles, in particular, see Table 1.1 in [3]. Studying this problem under the Wasserstein distance rather than the total variation distance has a particular advantage. While the Wasserstein distance only requires the existence of moments of \(X^{\varepsilon }(x)\) of a given order, the total variation distance requires the existence of a density and, in addition, its regularity. The latter imposes further requirements on the Lévy process L which can be quite restrictive, see [3] for further details. Furthermore, in the Wasserstein case, at least when \(X^{\varepsilon }(x)\) has finite moments of order \(p> 1\), the cutoff phenomenon of \((X^{\varepsilon }(x))_{\varepsilon > 0}\) is completely determined by an explicit function (see Theorem 2 below), here called the cutoff profile. By contrast, in the total variation case the profile function can be very involved and even hard to simulate in examples.

In [4], the cutoff phenomenon with respect to the total variation distance is studied for SDEs of the type (1.1) in the one dimensional case, with L being a standard Brownian motion and a general drift coefficient b satisfying Hypothesis 1. Since scalar systems are gradient systems, there is always a cutoff profile, which can be given explicitly in terms of the Gauss error function. The follow-up work [5] covers the multidimensional case, where the picture is considerably richer due to the presence of strong and complicated rotational patterns. The authors sharply characterize the existence of a cutoff profile in terms of the omega limit sets appearing in the long-term behavior of the matrix exponential function \(e^{-\mathcal {Q} t}x\) in Lemma B.2 in [5], which plays an analogous role in this article. The paper [6] is the first attempt to study the cutoff phenomenon for such models with jumps. More precisely, [6] covers the cutoff phenomenon with respect to the total variation distance for generalized Ornstein-Uhlenbeck processes. Such a process satisfies an SDE of the form (1.1) with L being a Lévy process and \(b(x)=\mathcal {Q} x\), where \(\mathcal {Q}\) is a square real matrix whose eigenvalues have positive real parts. The proofs are based on Fourier inversion techniques. Due to the aforementioned regularity required by the total variation distance, the results in [6] are given under the hypothesis of continuous densities of the marginals, which to date has not been characterized in simple terms. The cutoff profile function in [6] is given in terms of the Lévy-Ornstein-Uhlenbeck limiting measure for \(\varepsilon =1\) and measured in the total variation distance. Such profile functions are theoretically highly insightful, but almost impossible to calculate and simulate in examples. Analogously to [5], the characterization of the existence of a cutoff profile remains in abstract terms of the behavior of the mentioned profile function on a suitably defined omega limit set. The Wasserstein case is treated in [2] where, contrary to the total variation case, it is noted that the profile function takes an explicit and simple shape. Finally, [3] treats the cutoff phenomenon with respect to the total variation distance for (1.1) with b satisfying Hypothesis 1 and driven by a Lévy process in the rather restrictive class of strongly locally layered stable processes (see Definition 1.4 in [3]).

In this article we combine a nonlinear version of the Wasserstein estimates of [2], with the Freidlin-Wentzell first order approximation of (1.1) in the spirit of [3] and the fine properties of the Wasserstein distance given in Lemma 2.1, in particular, the non-standard shift linearity of Lemma 2.1.d).

The manuscript is organized in four parts. After the exposition of the setting and the presentation of the main results in Sect. 2, we illustrate our findings for the nonlinear Fermi–Pasta–Ulam–Tsingou gradient system and a class of nonlinear oscillators in Sect. 3. The main steps of the proof of the cutoff phenomenon are given in Sect. 4, while the auxiliary technical results, such as the exponential ergodicity in Wasserstein distance and the coupling between the original nonlinear system and the Freidlin-Wentzell linearization, are given in the “Appendix”.

2 Setting and Main Results

2.1 Fine Properties of the Wasserstein Distance

For any two probability distributions \(\mu _1\) and \(\mu _2\) on \(\mathbb {R}^d\) with finite \(p_*\)-th moment for some \(p_*>0\), we define the Wasserstein \(p_*\)-distance between them as follows

$$\begin{aligned} \mathcal {W}_{p_*}(\mu _1,\mu _2)= \inf _{\Pi } \left( \int _{\mathbb {R}^d\times \mathbb {R}^d}|u-v|^{p_*}\Pi (\mathrm {d}u,\mathrm {d}v)\right) ^{1\wedge (1/p_*)}, \end{aligned}$$

where the infimum is taken over all couplings (joint distributions on \(\mathbb {R}^d\times \mathbb {R}^d\)) \(\Pi \) with marginals \(\mu _1\) and \(\mu _2\). We refer to [12, 20] and references therein for more details. For convenience of notation we do not distinguish a random variable U and its law \(\mathbb {P}_U\) as an argument of \(\mathcal {W}_{p_*}\). That is, for random variables \(U_1\), \(U_2\) and probability measure \(\mu \) we write \(\mathcal {W}_{p_*}(U_1, U_2)\) instead of \(\mathcal {W}_{p_*}(\mathbb {P}_{U_1}, \mathbb {P}_{U_2})\), \(\mathcal {W}_{p_*}(U_1, \mu )\) instead of \(\mathcal {W}_{p_*}(\mathbb {P}_{U_1}, \mu )\) etc. The next result establishes properties of the Wasserstein distance which turn out to be important for our arguments.

Lemma 2.1

(Properties of \(\mathcal {W}_{p_*}\)) For \(p_*>0\), \(u_1,u_2\in \mathbb {R}^d\), \(c\in \mathbb {R}\) and \(U_1\) and \(U_2\) being random vectors in \(\mathbb {R}^d\) with finite \(p_*\)-th moment we have the following:

  1. (a)

    The Wasserstein distance \(\mathcal {W}_{p_*}\) is a metric.

  2. (b)

    Translation invariance: \(\mathcal {W}_{p_*}(u_1+U_1,u_2+U_2)=\mathcal {W}_{p_*}(u_1-u_2+U_1,U_2)\).

  3. (c)

    Homogeneity:

    $$\begin{aligned} \mathcal {W}_{p_*}(c\cdot U_1,c\cdot U_2)= {\left\{ \begin{array}{ll} |c|\;\mathcal {W}_{p_*}(U_1,U_2)&{}\text { for } p_*\in [1,\infty ),\\ |c|^{p_*}\;\mathcal {W}_{p_*}(U_1,U_2)&{}\text { for } p_*\in (0,1). \end{array}\right. } \end{aligned}$$
  4. (d)

    Shift linearity: For \(p_*\geqslant 1\) it follows

    $$\begin{aligned} \mathcal {W}_{p_*}(u_1+U_1,U_1)=|u_1|. \end{aligned}$$
    (2.1)

    For \(p_*\in (0,1)\) we have

    $$\begin{aligned} \max \{|u_1|^{p_*}-2\mathbb {E}[|U_1|^{p_*}],0\}\leqslant \mathcal {W}_{p_*}(u_1+U_1,U_1)\leqslant |u_1|^{p_*}. \end{aligned}$$
    (2.2)
  5. (e)

    Domination: For any given coupling \(\tilde{\Pi }\) between \(U_1\) and \(U_2\) it follows

    $$\begin{aligned} \mathcal {W}_{p_*}(U_1, U_2) \leqslant \Big (\int _{\mathbb {R}^d\times \mathbb {R}^d} |v_1-v_2|^{p_*} \tilde{\Pi }(\mathrm {d}v_1,\mathrm {d}v_2)\Big )^{1\wedge (1/p_*)}. \end{aligned}$$
  6. (f)

    Characterization: Let \((U_n)_{n\in \mathbb {N}}\) be a sequence of random vectors with finite \(p_*\)-th moments and U a random vector with finite \(p_*\)-th moment. Then the following statements are equivalent:

    1. (1)

      \(\mathcal {W}_{p_*}(U_n, U) \rightarrow 0\) as \(n\rightarrow \infty \).

    2. (2)

      \(U_n {\mathop {\longrightarrow }\limits ^{d}} U\) as \(n \rightarrow \infty \) and \(\mathbb {E}[|U_n|^{p_*}] \rightarrow \mathbb {E}[|U|^{p_*}]\) as \(n\rightarrow \infty \).

For \(p_*\in (0,1)\) equality (2.1) is false in general, see Remark 2.4 in [2]. The proof of the previous lemma is given in Lemma 2.2 in [2].
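Items (c) and (d) of Lemma 2.1 can be observed on empirical measures. The sketch below is restricted to dimension one and to \(p\geqslant 1\), where the monotone coupling of the sorted samples is optimal; the function `wasserstein_1d`, the sample laws and the constants are illustrative assumptions and not part of the paper.

```python
import random

def wasserstein_1d(xs, ys, p):
    """Empirical W_p (p >= 1) between two equal-size samples on the real line.

    In one dimension the monotone (sorted) coupling is optimal for p >= 1.
    """
    if p < 1 or len(xs) != len(ys):
        raise ValueError("need p >= 1 and samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum(abs(x - y) ** p for x, y in pairs) / len(xs)) ** (1.0 / p)

rng = random.Random(0)
u_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
v_sample = [rng.expovariate(1.0) for _ in range(1000)]
shift, c, p = 0.7, -3.0, 2

# (d) shift linearity: W_p(u + U, U) = |u|
print(wasserstein_1d([shift + x for x in u_sample], u_sample, p))
# (c) homogeneity: W_p(cU, cV) = |c| W_p(U, V)
print(wasserstein_1d([c * x for x in u_sample], [c * y for y in v_sample], p),
      abs(c) * wasserstein_1d(u_sample, v_sample, p))
```

For \(p\in (0,1)\) the cost \(|u-v|^{p}\) is concave and the sorted coupling is in general no longer optimal, which is why the sketch rejects such p.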

The following result yields the existence of a unique invariant distribution for (1.1) under Hypotheses 1 and 2. Moreover, under the Wasserstein distance, the strong solution of (1.1) is exponentially ergodic.

Proposition 1

(Existence of a unique invariant distribution) Under Hypothesis 1 and Hypothesis 2 for some \(p_*>0\) there exists a unique invariant probability measure \(\mu ^\varepsilon \) such that

$$\begin{aligned} \mathcal {W}_{p_*}(X^\varepsilon _t(x),\mu ^{\varepsilon })\leqslant e^{-({1\wedge p_*}) \delta t} \left( |x|^{1\wedge p_*}+\int _{\mathbb {R}^d}|y|^{1\wedge p_*}\mu ^{\varepsilon }(\mathrm {d}y) \right) . \end{aligned}$$
(2.3)

The proof is given in “Appendix 1”.
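A standard mechanism behind contraction estimates of the type (2.3) is the synchronous coupling: two solutions of (1.1) driven by the same noise differ by a noise-free quantity, which Hypothesis 1 contracts at rate \(\delta \). The sketch below illustrates this with an Euler scheme; the scalar drift \(b(x)=x+x^3\) (for which \(\delta =1\)), the noise model and all numerical values are assumptions chosen for illustration, and the argument in the Appendix may proceed differently.

```python
import math
import random

def drift(x):
    """Hypothetical scalar drift b(x) = x + x**3; it satisfies (1.2) with delta = 1."""
    return x + x ** 3

def coupled_paths(x0, y0, eps, t_end, h, seed=0):
    """Euler scheme for (1.1) started at x0 and y0, driven by the same noise increments."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(int(round(t_end / h))):
        # Illustrative Levy increment: Brownian part plus rare Gaussian jumps.
        dl = rng.gauss(0.0, math.sqrt(h))
        if rng.random() < h:
            dl += rng.gauss(0.0, 1.0)
        x, y = x - drift(x) * h + eps * dl, y - drift(y) * h + eps * dl
    return x, y

x_t, y_t = coupled_paths(x0=1.5, y0=-0.5, eps=0.1, t_end=3.0, h=1e-3)
# Distance at time t = 3 versus the bound e^{-delta t} |x0 - y0| with delta = 1.
print(abs(x_t - y_t), math.exp(-3.0) * 2.0)
```

By the domination property of Lemma 2.1(e), a pathwise bound of this kind yields the corresponding bound for \(\mathcal {W}_{p}\) with \(p\geqslant 1\) between the laws of the two solutions.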

2.2 Hartman–Grobman Asymptotics

The zeroth-order approximation of a smooth dynamical system on a finite time horizon [0, T] subject to small perturbations is given by the deterministic system, that is, \((X^0_t(x))_{t\in [0,T]}\). Our main results treat small noise asymptotics close to the stable state 0, which translates to meaningful time scales \(t_\varepsilon \rightarrow \infty \), as \(\varepsilon \rightarrow 0\), in Theorems 1 and 2. Before we state our main results, we first provide the long-time asymptotics of \(X^0_t(x)\) in terms of the spectral decomposition of the solution \(t\mapsto e^{-Db(0)t}x^*\) of the respective linear system for some \(x^*\) in a small neighbourhood of the origin.

Lemma 2.2

(Asymptotic Hartman–Grobman) Assume Hypothesis 1. Then for any \(x\in \mathbb {R}^{d}\setminus \{0\}\) there exist:

  1. (i)

    positive constants \( \mathfrak {q}^x, \tau ^x,\ell ^x, m^x\) with \(\ell ^x,m^x\in \{1,\ldots ,d\}\),

  2. (ii)

    angular velocities \(\theta ^{x}_{1},\dots ,\theta ^x_{m^x}\in \mathbb {R}\), where all \(\theta ^x_k \ne 0\) come in pairs \((\theta ^x_{j_*},\theta ^x_{j_*+1})=(\theta ^x_{j_*}, -\theta ^x_{j_*})\),

  3. (iii)

    linearly independent vectors \(v_1^x,\dots ,v_{m^x}^x\) in \(\mathbb {C}^d\) which are complex conjugate \((v^x_{j_*},v^x_{j_*+1})=(v^x_{j_*}, \bar{v}^x_{j_*})\) whenever \((\theta ^x_{j_*},\theta ^x_{j_*+1})=(\theta ^x_{j_*}, -\theta ^x_{j_*})\),

such that

$$\begin{aligned} \lim _{t \rightarrow \infty } \left| \frac{e^{\mathfrak {q}^x t}}{t^{\ell ^x-1}} X^0_{t+\tau ^x}(x) - \sum _{k=1}^{m^x} e^{i\theta ^x_k t}v^x_k \right| =0. \end{aligned}$$
(2.4)

Moreover,

$$\begin{aligned} 0<\liminf _{t\rightarrow \infty }\left| \sum _{k=1}^{m^x} e^{i t\theta ^x_k} v^x_k\right| \leqslant \limsup _{t\rightarrow \infty }\left| \sum _{k=1}^{m^x} e^{i t\theta ^x_k} v^x_k\right| \leqslant \sum _{k=1}^{m^x} |v^x_k|. \end{aligned}$$
(2.5)

The formal proof of the previous lemma is given in Lemma B.2 in Appendix B of [5].

Remark 2.3

  1. (1)

    Convention: Note that \(\theta ^x_k=0\) is true for at most one index \(k\in \{1,\ldots , m^x\}\). If such an index shows up in \(\theta ^x_{1},\ldots , \theta ^x_{m^x}\) we adopt the convention that \(\theta ^x_1=0\) and \(v_1^x\in \mathbb {R}^d\), and hence \(m^x=2n+1\) for some \(n\in \mathbb {N}_0\). Otherwise, \(m^x=2n\) for some \(n\in \mathbb {N}_0\) and we eliminate \(\theta ^x_1\) and count the angular velocities as follows \(\theta ^x_2,\ldots , \theta ^x_{2n+1}\).

  2. (2)

    Note that the linearly independent complex vectors \(v_1^x,\dots ,v_{m^x}^x\) in \(\mathbb {C}^d\) not only depend on x but also crucially on the dissipation time \(\tau ^x\) of the deterministic system to a Hartman–Grobman domain of conjugacy U. We stress that \(\tau ^x\) is not unique since \(X^0_{t+\tau ^x}(x) \in U\) for all \(t\geqslant 0\).

  3. (3)

    A word about the parameters \(\ell ^x\), \(\mathfrak {q}^x\) and \(m^x\) in Lemma 2.2. By the Hartman–Grobman theorem there are open sets \(0\in U, V\subset \mathbb {R}^d\) and a homeomorphism \(H: U\rightarrow V\) with \(H(0) = 0\) satisfying for all \(u\in U\) and \(t\geqslant 0\)

    $$\begin{aligned} H(X^0_t(u)) = e^{- Db(0) t}H(u). \end{aligned}$$
    (2.6)

    In fact, by Hypothesis 1 we have that H is a \(\mathcal {C}^1\)-diffeomorphism, see the original paper [8] or Theorem (Hartman), Sec. 2.8, p. 127, in [13]. In [8] it is shown that H can be chosen to be

    $$\begin{aligned} H(x) = x + o(|x|)_{|x| \rightarrow 0}. \end{aligned}$$

    Let \(\tilde{u} = X^0_{\tau ^x}(x) \in U\). With the help of a linear coordinate change W we obtain the Jordan normal form \(Db(0) = W^{-1} J(Db(0)) W\) and (using the linearity of the semigroup)

    $$\begin{aligned} H(X^0_{t+\tau ^x}(x)) = W^{-1} e^{- J(Db(0)) t} (W H(\tilde{u})). \end{aligned}$$

    We denote \(\tilde{w} = W H(\tilde{u})\). Now, the parameters \(\ell ^x\), \(\mathfrak {q}^x\) and \(m^x\) are given as follows. Consider the sequence of generalized eigenspaces \(H_{j}\) of J(Db(0)) such that

    $$\begin{aligned} \mathbb {R}^d = H_1\oplus \dots \oplus H_{k_*}. \end{aligned}$$

    By construction, \(\tilde{w} \in G(\tilde{w}) := \text{ span }(\{H_k~|~\text{ where } 1\leqslant k\leqslant k_*: ~\text{ proj }(\tilde{w}, H_k)\ne 0\})\). Note that \(G(\tilde{w})\) is unique. We consider the restriction

    $$\begin{aligned} \tilde{J}(\tilde{w}):= J(Db(0))\big |_{G(\tilde{w})}. \end{aligned}$$

    Now, \(\mathfrak {q}^x\) is the smallest real part of the spectrum of \(\tilde{J}(\tilde{w})\), \(\ell ^x\) is the dimension of the largest Jordan block of \(\tilde{J}(\tilde{w})\) whose eigenvalue has real part \(\mathfrak {q}^x\), and \(m^x\) is the number of Jordan blocks associated to \(\mathfrak {q}^x\) and \(\ell ^x\). Note that in the case of a non-real eigenvalue with real part \(\mathfrak {q}^x\) and Jordan block size \(\ell ^x\), we have \(m^x\geqslant 2\). For an extensive numerical example for a linear chain of oscillators we refer to Sect. 4.3.2 in [2].

2.3 Main Results

Our first main result establishes the \(\infty /0\) collapse of the Wasserstein distance between the law of the current state \(X^{\varepsilon }_t(x)\) and the dynamical equilibrium \(\mu ^\varepsilon \) along the critical time scale \(\mathfrak {t}^x_\varepsilon \) given in (2.7) under mild conditions.

Theorem 1

(Window cutoff) Let b satisfy Hypothesis 1 and \(\nu \) satisfy Hypothesis 2 for some \(p_*>0\). Fix \(x\in \mathbb {R}^d\setminus \{0\}\) and consider the notation in the asymptotic Hartman–Grobman representation \(\mathfrak {q}^x>0\), \(\ell ^x , m^x \in \{1,\ldots , d\}\), \(\theta ^x_1,\dots ,\theta ^x_{m^x} \in [0,2\pi )\), \(v^x_1,\dots ,v^x_{m^x} \in \mathbb {C}^d\) and \(\tau ^x>0\) of Lemma 2.2.

Then the family of processes \((X^{\varepsilon }(x))_{\varepsilon >0}\) exhibits a window cutoff phenomenon on the time scale

$$\begin{aligned} \mathfrak {t}^x_\varepsilon =\frac{1}{\mathfrak {q}^x}|\ln (\varepsilon )|+\frac{\ell ^x-1}{\mathfrak {q}^x}\ln (|\ln (\varepsilon )|) \end{aligned}$$
(2.7)

and for all asymptotically constant window sizes \(w_\varepsilon \), that is, \(w_\varepsilon \rightarrow w>0\) as \(\varepsilon \rightarrow 0\), in the following sense. For all \(0<p< p_*\) we have

$$\begin{aligned}&\lim _{r\rightarrow \infty }\limsup _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x),\mu ^\varepsilon )}{\varepsilon ^{{1\wedge p}}}=0 \qquad \text{ and } \nonumber \\&\lim _{r\rightarrow -\infty }\liminf _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\infty . \end{aligned}$$
(2.8)

The second main result provides two characterizations for the proper limits (\(\varepsilon \rightarrow 0\)) of the expressions in (2.8) for any fixed \(r\in \mathbb {R}\). That is to say, we characterize under which conditions the asymptotics (1.3) is satisfied. In addition, it yields the precise shape of the limit, which turns out to be a simple exponential function for \(p\in [1,p_*)\).

Theorem 2

(Dynamical profile cutoff characterization for \(p_*>0\)) Let the assumptions (and the notation) of Theorem 1 be valid for some \(p_*>0\). Consider the unique strong solution \((\mathcal {O}_t)_{t\geqslant 0}\) of the linear system

$$\begin{aligned} \mathrm {d}\mathcal {O}_t=-Db(0)\mathcal {O}_t\mathrm {d}t+\mathrm {d}L_t, \end{aligned}$$
(2.9)

where \(\mathcal {O}_\infty \) is the unique invariant probability distribution of (2.9).

  1. (1)

    Then for any \(0<p< p_*\) the following statements are equivalent.

    1. (i)

      For any \(\lambda >0\), the function \(\omega (x)\ni u\mapsto \mathcal {W}_{p}(\lambda u+\mathcal {O}_\infty ,\mathcal {O}_\infty )\) is constant, where

      $$\begin{aligned} \omega (x):= \Big \{ \text {accumulation points of } \sum _{k=1}^{m^x} e^{i t \theta ^x_k} v^x_k \text { as } t\rightarrow \infty \Big \}. \end{aligned}$$
    2. (ii)

      The family of processes \((X^{\varepsilon }(x))_{\varepsilon >0}\) exhibits a profile cutoff for any \(0< p< p_*\) as follows

      $$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}= \mathcal {P}^{x}_{p}(r) \quad \text { for any } r\in \mathbb {R}, \end{aligned}$$

      where

      $$\begin{aligned} \mathcal {P}^{x}_{p}(r):=\mathcal {W}_{p}\Big (\kappa ^x(r)\cdot v+ \mathcal {O}_\infty ,\mathcal {O}_\infty \Big ) \qquad \text{ for } \text{ any } v\in \omega (x) \end{aligned}$$
      (2.10)

      and

      $$\begin{aligned} \kappa ^x(r)= \frac{e^{-\mathfrak {q}^x r\cdot w}}{e^{\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{\ell ^x-1}}. \end{aligned}$$
  2. (2)

    For \(p_*> 1\) and \(p\in [1,p_*)\) the profile has the shape

    $$\begin{aligned} \mathcal {P}^{x}_{p}(r)=\kappa ^x(r)\cdot |v|\quad \text { for all }v\in \omega (x) \end{aligned}$$

    if and only if \(\omega (x)\) is contained in a sphere in \(\mathbb {R}^d\) with respect to the Euclidean norm.

  3. (3)

    We recall the convention of Remark 2.3. Let \(p_*> 1\) and \(p\in [1,p_*)\). If the angles \(\theta ^x_{2},\ldots , \theta ^x_{2n}\) satisfy the following non-resonance condition

    $$\begin{aligned} h_1\theta _2 + \ldots + h_n \theta _{2n} \notin 2 \pi \cdot \mathbb {Z} \qquad \text{ for } \text{ all } (h_1, \ldots , h_n)\in \mathbb {Z}^n\setminus \{0\}, \end{aligned}$$
    (2.11)

    then the statements (i) and (ii) in item (1) are equivalent to the following normal growth condition of the asymptotic Hartman–Grobman linearization: The family of limiting vectors \((v_1^x,\mathsf {Re}\,v^x_2,\mathsf {Im}\, v^x_2,\ldots ,\mathsf {Re}\, v^x_{2n},\mathsf {Im}\, v^x_{2n})\) is orthogonal in \(\mathbb {R}^d\) and satisfies

    $$\begin{aligned} |\mathsf {Re}\, v^x_{2k}|=|\mathsf {Im}\, v^x_{2k}|\qquad \text{ for } \text{ all } \quad k=1,\ldots ,n. \end{aligned}$$
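The geometric content of the sphere condition in item (2) can be illustrated for a single conjugate pair \((\theta ,-\theta )\), \((v,\bar{v})\). In this case the limiting curve is \(t\mapsto 2\,\mathsf {Re}(e^{i\theta t}v)\), an ellipse spanned by \(\mathsf {Re}\,v\) and \(\mathsf {Im}\,v\), and its Euclidean norm is constant exactly when these two vectors are orthogonal and of equal length. The following sketch samples the curve over one period; the function name, the angular velocity and the vectors are hypothetical choices for illustration only.

```python
import cmath
import math

def limit_curve_norms(theta, v, n_points=2000):
    """Min and max Euclidean norm of e^{i theta t} v + e^{-i theta t} conj(v) over one period."""
    norms = []
    for j in range(n_points):
        t = 2.0 * math.pi * j / (n_points * theta)
        phase = cmath.exp(1j * theta * t)
        # Each coordinate equals 2 Re(e^{i theta t} v_k), hence it is real.
        point = [(phase * vk + (phase * vk).conjugate()).real for vk in v]
        norms.append(math.sqrt(sum(c * c for c in point)))
    return min(norms), max(norms)

# Re v and Im v orthogonal with equal length: the curve is a circle of radius 2.
print(limit_curve_norms(1.3, [1.0 + 0.0j, 0.0 + 1.0j]))
# Unequal lengths: the curve is an ellipse and the norm oscillates between 1 and 2.
print(limit_curve_norms(1.3, [1.0 + 0.0j, 0.0 + 0.5j]))
```

In the first case the sampled set lies on a sphere, in line with the explicit profile \(\kappa ^x(r)\cdot |v|\); in the second case the norm is not constant on the sampled set.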

Remark 2.4

We stress that \(\mathcal {O}_\infty = \lim _{t\rightarrow \infty } \mathcal {O}_t\) in \(\mathcal {W}_{p_*}\) and due to Hypothesis 1 (in combination with Hypothesis 2) the distribution of \(\mathcal {O}_\infty \) does not depend on any deterministic initial condition of (2.9).

Due to their relevance as physical observables, we formulate the corresponding window cutoff result for the respective moments.

Corollary 2.5

(Moments cutoff) Let the assumptions (and the notation) of Theorem 1 be valid for some \(p_*>0\). Then for any \(0< p<p_*\) it follows

$$\begin{aligned} \lim _{r\rightarrow \infty } \liminf _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{ p}}&=\lim _{r\rightarrow \infty } \limsup _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{ p}}= \mathbb {E}[|\mathcal {O}_{\infty }|^{p}],\\ \lim _{r\rightarrow -\infty } \liminf _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{p}}&=\lim _{r\rightarrow -\infty } \limsup _{\varepsilon \rightarrow 0}\frac{\mathbb {E}[|X^{\varepsilon }_{\mathfrak {t}_\varepsilon ^x+r\cdot w_\varepsilon }(x)|^{p}]}{\varepsilon ^{ p}}=\infty . \end{aligned}$$

3 Examples

In this section we present two examples which illustrate the applicability of Theorem 1 and Theorem 2 to nonlinear dynamics with degenerate noise.

Example 3.1

(The Fermi–Pasta–Ulam–Tsingou potential) We consider the nonlinear Langevin gradient system

$$\begin{aligned} \mathrm {d}X^\varepsilon _t = - \nabla \mathcal {U}(X^\varepsilon _t)\mathrm {d}t + \varepsilon \mathrm {d}L_t \end{aligned}$$
(3.1)

for the strongly convex quartic Fermi–Pasta–Ulam–Tsingou potential \(\mathcal {U}(x) = \frac{1}{2} |x|^2 + \frac{1}{4}|x|^4\), \(x\in \mathbb {R}^d\) subject to degenerate noise \(\mathrm {d}L_t\). For any Lévy process L satisfying Hypothesis 2 for some \(p_*>0\) the system (3.1) exhibits a profile cutoff due to Theorem 2 where the cutoff time is given by \(\mathfrak {t}_\varepsilon ^x = |\ln (\varepsilon )|\). For \(p_*>1\) and any \(p\in [1,p_*)\) the profile function in \(\mathcal {W}_{p}\) is always of the following exponential shape

$$\begin{aligned} \mathcal {P}^{x}_{p}(r)=e^{-wr- \tau ^x}\Big |\sum _{k=1}^m v^x_k\Big |, \end{aligned}$$
(3.2)

where \(\tau ^{x}:=\min \{t\geqslant 0:|X^0_t(x)|\leqslant R_0/2\}\) and \(R_0\) is a small radius inside of which the Hartman–Grobman conjugation is valid. Note that \(\tau ^x\) can be replaced by any upper bound of \(\tau ^x\) such as for instance \((1/\delta )\ln (2|x|/R_0)\) given by Hypothesis 1.
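The entrance time \(\tau ^x\) and its upper bound can be compared numerically. For the potential \(\mathcal {U}\) above, the norm \(\rho _t=|X^0_t(x)|\) solves the radial equation \(\dot{\rho }=-(\rho +\rho ^3)\) and Hypothesis 1 holds with \(\delta =1\). The sketch below integrates this equation with an Euler scheme; the radius `R0`, the initial norm and the step size are hypothetical values, since the paper does not specify \(R_0\).

```python
import math

def entrance_time(r0, radius, h=1e-4):
    """Euler time at which the radial flow r' = -(r + r**3) first satisfies r <= radius / 2."""
    r, t = r0, 0.0
    while r > radius / 2.0:
        r -= h * (r + r ** 3)
        t += h
    return t

x_norm, R0 = 3.0, 0.2   # R0 is a hypothetical Hartman-Grobman radius
tau = entrance_time(x_norm, R0)
bound = math.log(2.0 * x_norm / R0)  # (1/delta) ln(2|x|/R0) with delta = 1
print(tau, bound)
```

The computed entrance time stays below the logarithmic bound, since the cubic term only accelerates the decay towards the origin.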

In particular, the profile cutoff (3.2) is valid for \(L=L^\alpha \) being a (possibly degenerate) \(\alpha \)-stable process with index \(\alpha \in (1,2]\). Note that for the limiting case of a possibly degenerate Cauchy process (\(\alpha =1\)) and in fact of any \(L^\alpha \) with index \(\alpha \in (0,1)\), Theorem 2 also yields a profile cutoff. However, the profile function is not explicit in this case. This is due to the absence of a finite first moment and the lack of the shift linearity (2.1) for \(p<1\). In other words, the profile function is given by (2.10) for \(p\in (0,\alpha )\) and, to our knowledge, it is not known how to simplify it further. Note that the case of \(\alpha \in (0,3/2]\) is new and is not covered in [3].

Example 3.2

(Nonlinear non-gradient with degenerate noise) For \(F,\mathcal {H}\in \mathcal {C}^2(\mathbb {R}^2,\mathbb {R})\) we consider the following perturbed simple harmonic oscillator with unit angular frequency given in Section 4 of [19] subject to a small noise perturbation

$$\begin{aligned} \mathrm {d}\left( \begin{matrix} X^{\varepsilon ,1}_t \\ X^{\varepsilon ,2}_t \end{matrix} \right) =- \left( \begin{array}{c} X^{\varepsilon ,2}_t \,F(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t)-\partial _1 \mathcal {H}(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t) \\ -X^{\varepsilon ,1}_t \,F(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t)-\partial _2 \mathcal {H}(X^{\varepsilon ,1}_t,X^{\varepsilon ,2}_t) \end{array} \right) \mathrm {d}t+ \varepsilon \mathrm {d}\left( \begin{matrix} 0 \\ \mathcal {L}_t \end{matrix} \right) , \end{aligned}$$

where \(\mathcal {L}=(\mathcal {L}_t)_{t\geqslant 0}\) is a one dimensional Lévy process with finite \(p_*\)-th moments. The Jacobian matrix \(Jb(v_1,v_2)\) at \((v_1,v_2)\) of the respective vector field \(b:\mathbb {R}^2\rightarrow \mathbb {R}^2\) is given by

$$\begin{aligned} \left( \begin{matrix} v_2\partial _1 F(v_1,v_2)-\partial _{11}\mathcal {H}(v_1,v_2) &{} F(v_1,v_2)+v_2\partial _2 F(v_1,v_2)-\partial _{12}\mathcal {H}(v_1,v_2)\\ -F(v_1,v_2)-v_1\partial _{1}F(v_1,v_2)-\partial _{12}\mathcal {H}(v_1,v_2) &{} -v_1\partial _{2}F(v_1,v_2)- \partial _{22}\mathcal {H}(v_1,v_2) \end{matrix} \right) . \end{aligned}$$

To verify Hypothesis 1, it is enough to prove the existence of a positive constant \(\delta \) such that for any \(u_1,u_2,v_1,v_2\in \mathbb {R}\) it follows

$$\begin{aligned} (u_1,u_2) Jb(v_1,v_2)(u_1,u_2)^*&=(v_2\partial _1 F(v_1,v_2)-\partial _{11}\mathcal {H}(v_1,v_2))u^2_1\nonumber \\&\qquad +\, (-v_1\partial _2 F(v_1,v_2)-\partial _{22}\mathcal {H}(v_1,v_2))u^2_2 \nonumber \\&\qquad +\, (v_2\partial _2 F(v_1,v_2)-v_1\partial _{1}F(v_1,v_2)-2\partial _{12}\mathcal {H}(v_1,v_2))u_1u_2 \nonumber \\&\geqslant \delta (u^2_1+u^2_2). \end{aligned}$$
(3.3)

For instance, for a nonlinear perturbation of a linear oscillator, that is, \(F(v_1,v_2)=\eta \) for some \(\eta >0\), the preceding condition reads

$$\begin{aligned} -\Big (\partial _{11}\mathcal {H}(v_1,v_2)u^2_1+\partial _{22}\mathcal {H}(v_1,v_2)u^2_2+2\partial _{12}\mathcal {H}(v_1,v_2)u_1u_2\Big )\geqslant \delta (u^2_1+u^2_2). \end{aligned}$$

For \(\mathcal {L}\) satisfying Hypothesis 2 with \(p_*\), and F, \(\mathcal {H}\) fulfilling (3.3), Theorem 1 implies window cutoff for any initial condition \((X^{\varepsilon ,1}_0,X^{\varepsilon ,2}_0)=x\in \mathbb {R}^2\setminus \{0\}\) and any \(p\in (0,p_*)\). The cutoff time is given by

$$\begin{aligned} \mathfrak {t}^x_\varepsilon = \frac{1}{\mathfrak {q}^x} |\ln (\varepsilon )|+\frac{\ell ^x-1}{\mathfrak {q}^x}\ln (|\ln (\varepsilon )|). \end{aligned}$$

Note that this result is new even in the Brownian case since the results of [3] and [5] are stated for the total variation distance, which requires regularity of the transition probabilities given in the setting of non-degenerate noise. In our case, the Wasserstein distance circumvents this difficulty by the continuity of \(\mathcal {W}_{p}(x+X,X)\) for any \(X\in L^{p}\) as \(|x|\rightarrow 0\) and \(|x|\rightarrow \infty \), while the total variation distance requires absolute continuity of the distribution of X. We refer to [3], Lemma 1.17 in Subsection 1.3.5, for an example where the continuity of the total variation distance under shifts is not valid.

In the sequel, we characterize the existence of a profile cutoff under (3.3) in terms of the linearization at the stable state (0, 0). Let \(a:=-\partial _{11}\mathcal {H}(0,0)\), \(b:=-\partial _{22}\mathcal {H}(0,0)\), \(c:=-\partial _{12}\mathcal {H}(0,0)\) and \(\eta _0:=-F(0,0)\). Then

$$\begin{aligned} Jb(0,0)= \left( \begin{matrix} a &{} -\eta _0+c\\ \eta _0+c &{} b \end{matrix} \right) . \end{aligned}$$

Note that \(\eta _0=c\) implies that the eigenvalues of Jb(0, 0) are the numbers a and b which are positive and hence by Theorem 2 profile cutoff is valid. In the sequel we assume \(\eta _0 \ne c\). Then the eigenvalues of Jb(0, 0) are given by

$$\begin{aligned} \lambda _{\pm }:=\frac{(a+b)\pm \sqrt{\Delta }}{2},\quad \Delta :=(a-b)^2+4(c^2-\eta ^2_0), \end{aligned}$$

with corresponding eigenvectors

$$\begin{aligned} v_{\pm }:= \left( 1,-\frac{a-b\mp \sqrt{\Delta }}{2(-\eta _0+c)}\right) . \end{aligned}$$

In addition,

$$\begin{aligned} \mathsf {Re}(v_{\pm })={\left\{ \begin{array}{ll} \left( 1,-\frac{a-b\mp \sqrt{\Delta }}{2(-\eta _0+c)}\right) &{} \text {if } \Delta \geqslant 0,\\ \left( 1,-\frac{a-b}{2(-\eta _0+c)}\right) &{} \text {if } \Delta< 0, \end{array}\right. }\qquad \mathsf { and } \qquad \mathsf {Im}(v_{\pm })={\left\{ \begin{array}{ll} \left( 0,0\right) &{} \text {if } \Delta \geqslant 0,\\ \pm \left( 0, \frac{ \sqrt{|\Delta |} }{2(-\eta _0+c)}\right) &{} \text {if } \Delta < 0. \end{array}\right. } \end{aligned}$$

For \(\Delta \geqslant 0\) Theorem 2 yields a profile cutoff phenomenon. For \(\Delta <0\) Theorem 1 implies the weaker window cutoff phenomenon, however, by part (3) of Theorem 2 the stronger profile cutoff for \(p_*>1\) and \(p\in [1,p_*)\) is valid if and only if

$$\begin{aligned} |\mathsf {Re}(v_{+})|^2=|\mathsf {Im}(v_{+})|^2 \text { and } \langle \mathsf {Re}(v_{+}),\mathsf {Im}(v_{+}) \rangle =0 \end{aligned}$$

which is equivalent to the special case \(a=b\) and \(c=0\). In other words, \(e^{-Jb(0,0)t}=e^{-at}R(\theta t)\), where \(R(\theta t)\) is an orthogonal \(2\times 2\) matrix with angle \(\theta t\).
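The case distinction for the linearization can be checked numerically. The sketch below evaluates \(\Delta \), the eigenvalues \(\lambda _\pm \) and the vectors \(\mathsf {Re}(v_+)\), \(\mathsf {Im}(v_+)\) from the formulas above, and tests the two conditions \(|\mathsf {Re}(v_+)|=|\mathsf {Im}(v_+)|\) and \(\langle \mathsf {Re}(v_+),\mathsf {Im}(v_+)\rangle =0\). The function names and the parameter values are hypothetical, and the check is only meaningful for \(\Delta <0\) and \(\eta _0\ne c\).

```python
import cmath

def linearization_summary(a, b, c, eta0):
    """Eigen-data of Jb(0,0) = [[a, c - eta0], [c + eta0, b]], assuming eta0 != c."""
    delta = (a - b) ** 2 + 4.0 * (c ** 2 - eta0 ** 2)
    root = cmath.sqrt(delta)
    lam_plus, lam_minus = ((a + b) + root) / 2.0, ((a + b) - root) / 2.0
    # Second component of v_+ = (1, -(a - b - sqrt(Delta)) / (2 (c - eta0))).
    w = -((a - b) - root) / (2.0 * (c - eta0))
    re_v, im_v = (1.0, w.real), (0.0, w.imag)
    return delta, lam_plus, lam_minus, re_v, im_v

def normal_growth(re_v, im_v, tol=1e-12):
    """Check |Re v_+| = |Im v_+| and <Re v_+, Im v_+> = 0."""
    dot = re_v[0] * im_v[0] + re_v[1] * im_v[1]
    n_re = re_v[0] ** 2 + re_v[1] ** 2
    n_im = im_v[0] ** 2 + im_v[1] ** 2
    return abs(dot) < tol and abs(n_re - n_im) < tol

# Illustrative parameters (a, b, c, eta0); both give Delta < 0.
for params in [(1.0, 1.0, 0.0, 2.0), (1.0, 2.0, 0.0, 2.0)]:
    delta, lp, lm, re_v, im_v = linearization_summary(*params)
    print(params, delta, lp, lm, normal_growth(re_v, im_v))
```

The first parameter set has \(a=b\) and \(c=0\) and passes the check, while the second one has \(a\ne b\) and fails it.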

Remark 3.3

(A word about the linear dynamics) In [2] the authors study (1.1) for the linear vector field \(b(x)=\mathcal {Q}x\) for any Hurwitz stable matrix \(-\mathcal {Q}\), that is, \(\mathsf {Re}(\lambda )<0\) for any eigenvalue \(\lambda \) of \(-\mathcal {Q}\). Under these assumptions, the results of Theorem 1 and Theorem 2 are obtained.

It is not hard to see that Hypothesis 1 implies \(\mathsf {Re}(\lambda )\leqslant -\delta \) for any eigenvalue \(\lambda \) of \(-\mathcal {Q}\) and hence Hurwitz stability. However, the dissipativity condition (1.2) which is assumed in order to control the nonlinear vector field, is strictly stronger than Hurwitz stability. For instance, the vector field \(b:\mathbb {R}^2 \rightarrow \mathbb {R}^2\) given by \(b(x)=\mathcal {Q}x\) with

$$\begin{aligned} -\mathcal {Q}= \left( \begin{matrix} 0 &{} 1\\ -\lambda &{} -\lambda \end{matrix} \right) \text { with } \lambda \in (0,1/2) \end{aligned}$$

has eigenvalues with real part \(-\lambda /2<0\), but it does not satisfy Hypothesis 1. Note that the dissipativity condition (1.2) is not even satisfied locally in a neighborhood of the origin.
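This counterexample can be verified by a short computation. The sketch below assumes the drift matrix \(\mathcal {Q}\) with rows \((0,-1)\) and \((\lambda ,\lambda )\), so that \(-\mathcal {Q}\) has trace \(-\lambda \) and determinant \(\lambda \); the value \(\lambda =0.25\) and the helper names are illustrative and not part of the original argument. Since b is linear, \(\langle b(x)-b(y),x-y\rangle =\langle \mathcal {Q}(x-y),x-y\rangle \), so it suffices to evaluate the quadratic form.

```python
import cmath

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def quadratic_form(q, x):
    """<Q x, x> for a 2x2 matrix Q and a vector x in R^2."""
    (a, b), (c, d) = q
    return (a * x[0] + b * x[1]) * x[0] + (c * x[0] + d * x[1]) * x[1]

lam = 0.25                                   # illustrative value in (0, 1/2)
Q = ((0.0, -1.0), (lam, lam))                # assumed drift matrix, b(x) = Q x
minus_Q = tuple(tuple(-e for e in row) for row in Q)

eigs = eigenvalues_2x2(minus_Q)              # complex pair with real part -lam/2
form_at_e1 = quadratic_form(Q, (1.0, 0.0))   # vanishes, so no delta > 0 can work
form_at_diag = quadratic_form(Q, (1.0, 1.0)) # strictly negative
```

Both eigenvalues of \(-\mathcal {Q}\) have real part \(-\lambda /2\), while the quadratic form is zero along the first coordinate axis and negative on the diagonal. By linearity the same holds for arbitrarily small multiples of these vectors, which is why (1.2) fails even locally.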

4 Proofs of the Main Results

4.1 The First Order Approximation

We define the Freidlin-Wentzell first order approximation given by

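$$\begin{aligned} Y^\varepsilon _t(x):=X^0_t(x)+\varepsilon \mathcal {Y}^x_t, \qquad t\geqslant 0, \end{aligned}$$
(4.1)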

where \((\mathcal {Y}^{x}_t)_{t\geqslant 0}\) is the unique strong solution of the linear inhomogeneous SDE

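$$\begin{aligned} \mathrm {d}\mathcal {Y}^x_t=-Db(X^0_t(x))\mathcal {Y}^x_t\,\mathrm {d}t+\mathrm {d}L_t, \qquad \mathcal {Y}^x_0=0. \end{aligned}$$
(4.2)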

In [3], Lemma C.4 in Section C.4 it is shown that \(Y^{\varepsilon }_t(x)\) converges in total variation distance to a unique limiting distribution \(\mu ^\varepsilon _*\) as \(t\rightarrow \infty \). Moreover, it is shown there that \(\mu ^\varepsilon _*{\mathop {=}\limits ^{d}}\varepsilon \mathcal {O}_\infty \), where \(\mathcal {O}_\infty \) is the unique invariant probability distribution of the homogeneous Ornstein-Uhlenbeck dynamics

$$\begin{aligned} \mathrm {d}\mathcal {O}_t=-Db(0)\mathcal {O}_t\,\mathrm {d}t+\mathrm {d}L_t. \end{aligned}$$
(4.3)

In the sequel we reduce the nonlinear ergodic convergence of \(X^\varepsilon _t(x)\) to the ergodic convergence of the Freidlin-Wentzell linearization \(Y^\varepsilon _t(x)\), up to error terms; see (4.4) below. For any \(0<p\leqslant p_*\), by the triangle inequality it follows that

$$\begin{aligned} \mathcal {W}_{p}(X^\varepsilon _t(x),\mu ^\varepsilon )\leqslant \mathcal {W}_{p}(X^\varepsilon _t(x),Y^\varepsilon _t(x))+\mathcal {W}_{p}(Y^\varepsilon _t(x),\mu ^\varepsilon _*)+\mathcal {W}_{p}(\mu ^\varepsilon _*,\mu ^\varepsilon ) \end{aligned}$$

for any \(t\geqslant 0\), \(x\in \mathbb {R}^d\). Analogously we estimate

$$\begin{aligned} \mathcal {W}_{p}(Y^\varepsilon _t(x),\mu ^\varepsilon _*)\leqslant \mathcal {W}_{p}(Y^\varepsilon _t(x),X^\varepsilon _t(x))+\mathcal {W}_{p}(X^\varepsilon _t(x),\mu ^\varepsilon )+\mathcal {W}_{p}(\mu ^\varepsilon ,\mu ^\varepsilon _*). \end{aligned}$$

Combining the preceding inequalities we obtain the linear approximation

$$\begin{aligned} \left| \mathcal {W}_{p}(X^\varepsilon _t(x),\mu ^\varepsilon )-\mathcal {W}_{p}(Y^\varepsilon _t(x),\mu ^\varepsilon _*) \right| \leqslant \mathcal {W}_{p}(X^\varepsilon _t(x),Y^\varepsilon _t(x))+\mathcal {W}_{p}(\mu ^\varepsilon ,\mu ^\varepsilon _*) \end{aligned}$$
(4.4)

for any \(t\geqslant 0\), \(x\in \mathbb {R}^d\). In Proposition 2 given in “Appendix B.2” we show that for any \(t_\varepsilon =O(|\ln (\varepsilon )|)\) and \(0< p < p_*\) the following limit holds

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{t_\varepsilon }(x),Y^\varepsilon _{t_\varepsilon }(x))}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$
(4.5)

Moreover, in Lemma B.2 we show that for \(0< p < p_*\)

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(\mu ^\varepsilon _*,\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=0. \end{aligned}$$
(4.6)

4.2 Derivation of the Cutoff Phenomenon

In the sequel, we analyze the asymptotic behavior of \(\mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)\cdot \varepsilon ^{-(1\wedge p)}\) from which we recognize the cutoff of the Freidlin-Wentzell linearization \(Y^\varepsilon _t(x)\). By the triangle inequality, translation invariance, homogeneity and shift linearity given in Lemma 2.1 we obtain for \(0< p\leqslant p_*\)

$$\begin{aligned} \mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)&= \mathcal {W}_{p}(X^0_t(x) + \varepsilon \mathcal {Y}^x_t, \varepsilon \mathcal {O}_\infty ) \\&\leqslant \mathcal {W}_{p}(X^0_t(x) + \varepsilon \mathcal {Y}^x_t, X^0_t(x) + \varepsilon \mathcal {O}_\infty ) + \mathcal {W}_{p}(X^0_t(x) + \varepsilon \mathcal {O}_\infty , \varepsilon \mathcal {O}_\infty )\\&= \varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty ) +\varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\varepsilon ^{-1} \cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty ). \end{aligned}$$

Analogously we deduce

$$\begin{aligned} \mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)&\geqslant \varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\varepsilon ^{-1}\cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty ) -\varepsilon ^{1\wedge p}\cdot \mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty ). \end{aligned}$$

Consequently,

$$\begin{aligned} \Big |\frac{\mathcal {W}_{p}(Y^\varepsilon _t(x), \mu ^\varepsilon _*)}{\varepsilon ^{1\wedge p}} - \mathcal {W}_{p}(\varepsilon ^{-1} \cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty ) \Big | \leqslant \mathcal {W}_{p}(\mathcal {Y}^x_t, \mathcal {O}_\infty ). \end{aligned}$$
(4.7)

The right-hand side of (4.7) does not depend on \(\varepsilon \) and by Lemma B.3 it tends to 0 as \(t\rightarrow \infty \). It is therefore enough to study the precise long-term behavior of \(\mathcal {W}_{p}(\varepsilon ^{-1}\cdot X^0_t(x) + \mathcal {O}_\infty , \mathcal {O}_\infty )\) in order to derive the cutoff phenomenon.
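To see how this reduction produces a cutoff, consider the simplest possible case. The sketch below assumes a scalar linear drift \(b(x)=qx\) with \(q>0\), so that \(X^0_t(x)=xe^{-qt}\), together with the time scale \(|\ln (\varepsilon )|/q\) and a unit window; these choices, the function name `rescaled_drift` and the numerical values are assumptions made for illustration only. In this setting \(\varepsilon ^{-1}X^0_t(x)\) evaluated at \(t=|\ln (\varepsilon )|/q+r\) equals \(xe^{-qr}\) for every \(\varepsilon \in (0,1)\).

```python
import math

def rescaled_drift(x, q, eps, r, w=1.0):
    """eps^{-1} * X^0_t(x) at t = |ln eps|/q + r*w for the scalar flow X^0_t(x) = x*exp(-q*t)."""
    t = abs(math.log(eps)) / q + r * w
    return x * math.exp(-q * t) / eps

x, q = 2.0, 1.5  # illustrative initial value and decay rate
profile = {
    r: [rescaled_drift(x, q, eps, r) for eps in (1e-2, 1e-4, 1e-8)]
    for r in (-2.0, 0.0, 2.0)
}
```

For each fixed r the three values agree up to rounding, and they decay to 0 as \(r\rightarrow \infty \) and blow up as \(r\rightarrow -\infty \). For \(p\geqslant 1\), shift linearity then turns the rescaled drift directly into the value of the distance in this toy case.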

4.3 Proof of Theorem 1

For any \(0< p< p_*\), with \(\mathfrak {t}^x_\varepsilon \) and \(w_\varepsilon \) as given in the statement of Theorem 1 and \(r\in \mathbb {R}\), (4.4), (4.5), (4.6), (4.7) yield

$$\begin{aligned} \begin{aligned}&\limsup \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon } (x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\limsup \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ),\\&\liminf \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X^\varepsilon _{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon } (x),\mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\liminf \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon + r \cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ). \end{aligned} \end{aligned}$$

For short, we define

$$\begin{aligned} \mathfrak {T}^x_\varepsilon =\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon -\tau ^x \quad \text { and } \quad \Lambda ^x(\varepsilon ):=\frac{(\mathfrak {T}^x_\varepsilon )^{\ell -1}}{\varepsilon e^{\mathfrak {q}^x \mathfrak {T}^x_\varepsilon }} \sum _{k=1}^{m} e^{i \mathfrak {T}^x_\varepsilon \theta ^x_k} v^x_k. \end{aligned}$$
(4.8)

Claim A.

$$\begin{aligned} \limsup _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }^\varepsilon (x), \mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\limsup _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ) \end{aligned}$$

and

$$\begin{aligned} \liminf _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }^\varepsilon (x), \mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}=\liminf _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ) \end{aligned}$$

for any \(0<p< p_*\). In particular, the limit

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\frac{\mathcal {W}_{p}(X_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }^\varepsilon (x), \mu ^\varepsilon )}{\varepsilon ^{1\wedge p}}\quad \text { exists iff}\quad \lim _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )\quad \text {exists}. \end{aligned}$$
(4.9)

Proof of Claim A

In the sequel we study the asymptotics of the drift term \(X^0_t(x) \cdot \varepsilon ^{-1}\). A straightforward calculation shows

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0} \frac{(\mathfrak {T}^x_\varepsilon )^{\ell -1} e^{-\mathfrak {q}^x \mathfrak {T}^x_\varepsilon }}{\varepsilon }=e^{-\mathfrak {q}^x \tau }(\mathfrak {q}^x)^{1-\ell }e^{-\mathfrak {q}^x r\cdot w}. \end{aligned}$$
(4.10)

The preceding limit implies with the help of the spectral decomposition (2.4) given in Lemma 2.2 and the triangle inequality that

$$\begin{aligned} \begin{aligned} \mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big )&\leqslant \mathcal {W}_{p}\Big (\Big (\frac{X^0_{\tau +\mathfrak {T}^x_\varepsilon }(x)}{\varepsilon }- \Lambda ^x(\varepsilon )\Big )+\mathcal {O}_\infty , \mathcal {O}_\infty \Big )\\&\quad +\, \mathcal {W}_{p}\Big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ). \end{aligned} \end{aligned}$$
(4.11)

We set

$$\begin{aligned} R^x_\varepsilon := \mathcal {W}_{p}\Big (\Big (\frac{X^0_{\tau +\mathfrak {T}^x_\varepsilon }(x)}{\varepsilon }- \Lambda ^x(\varepsilon )\Big ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ). \end{aligned}$$

Analogous reasoning yields

$$\begin{aligned} \mathcal {W}_{p}\Big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big ) \leqslant \mathcal {W}_{p}\Big (\frac{X^0_{\mathfrak {t}^x_\varepsilon +r\cdot w_\varepsilon }(x)}{\varepsilon } + \mathcal {O}_\infty , \mathcal {O}_\infty \Big )+R^x_\varepsilon . \end{aligned}$$

It remains to show that \(R^x_\varepsilon \rightarrow 0\) as \(\varepsilon \rightarrow 0\). By the continuity of \(z\mapsto \mathcal {W}_{p}(z+\mathcal {O}_\infty ,\mathcal {O}_\infty )\) at \(z=0\) it is enough to prove

$$\begin{aligned} \Big |\frac{X^0_{\tau +\mathfrak {T}^x_\varepsilon }(x)}{\varepsilon }- \Lambda ^x(\varepsilon )\Big |\rightarrow 0, \quad \varepsilon \rightarrow 0, \end{aligned}$$

which is valid due to the limit (2.4) and (4.10). This finishes the proof of Claim A. \(\square \)

In the sequel, we prove the window cutoff asymptotics in (2.8). Note that \(\Lambda ^x(\varepsilon )\) is uniformly bounded on \(\varepsilon \in (0,1]\). For any accumulation point U (as \(\varepsilon \rightarrow 0\)) of \(\big (\mathcal {W}_{p}( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty )\big )_{\varepsilon \in (0,1]}\) there exists a sequence \((\varepsilon _k)_{k\in \mathbb {N}}\), \(\varepsilon _k\rightarrow 0\) as \(k\rightarrow \infty \), such that

$$\begin{aligned} U=\lim \limits _{k\rightarrow \infty } \mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon _k) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ). \end{aligned}$$

The Bolzano–Weierstrass theorem for the sequence \((\Lambda ^x(\varepsilon _k))_{k\in \mathbb {N}}\), the limit (4.10) and the continuity of \(\mathcal {W}_{p}\) yield

$$\begin{aligned} U=\mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r} u+\mathcal {O}_\infty ,\mathcal {O}_\infty )\quad \text { for some } u\in \omega (x). \end{aligned}$$
(4.12)

In particular,

$$\begin{aligned} \begin{aligned}&\limsup \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=\mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}{\hat{u}}+\mathcal {O}_\infty ,\mathcal {O}_\infty ),\\&\liminf \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=\mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}\check{u}+\mathcal {O}_\infty ,\mathcal {O}_\infty ), \end{aligned} \end{aligned}$$

where \({\hat{u}},\check{u}\in \omega (x)\) and \(\check{u}\ne 0\) by (2.5). Hence item (d) in Lemma 2.1 implies

$$\begin{aligned}&\lim \limits _{r\rightarrow \infty }\limsup \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=0 \quad \text { and }\\&\lim \limits _{r\rightarrow -\infty }\liminf \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big )=\infty . \end{aligned}$$

This finishes the proof of Theorem 1. \(\square \)

4.4 Proof of Theorem 2

We keep the notation (4.8) of the proof of Theorem 1. By (4.9) it is enough to prove that the limit

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}\mathcal {W}_{p}\Big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \Big )\quad \text { exists}. \end{aligned}$$
(4.13)

We recall the definition of \(\Lambda ^x(\varepsilon )\) (4.8) and the limit (4.10). By (4.12) we have

$$\begin{aligned}&\left\{ \text {accumulation points of } \mathcal {W}_{p}\big ( \Lambda ^x(\varepsilon ) + \mathcal {O}_\infty , \mathcal {O}_\infty \big ) \text { as } \varepsilon \rightarrow 0\right\} \nonumber \\&\quad = \left\{ \mathcal {W}_{p}\big ((e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r})\, u+\mathcal {O}_\infty ,\mathcal {O}_\infty \big ) : u\in \omega (x) \right\} . \end{aligned}$$
(4.14)

For \(p\geqslant 1\), the shift linearity given in item (d) of Lemma 2.1 implies

$$\begin{aligned} \mathcal {W}_{p}(e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r} u+\mathcal {O}_\infty ,\mathcal {O}_\infty )= e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}|u|. \end{aligned}$$
(4.15)

Combining (4.14) and (4.15) we infer

$$\begin{aligned}&\left\{ \mathcal {W}_{p}\big ((e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r})\, u+\mathcal {O}_\infty ,\mathcal {O}_\infty \big ) : u\in \omega (x) \right\} \nonumber \\&\quad = \left\{ e^{-\mathfrak {q}^x \tau ^x}(\mathfrak {q}^x)^{1-\ell ^x}e^{-\mathfrak {q}^x w r}\,|u| : u\in \omega (x) \right\} . \end{aligned}$$
(4.16)

Hence (4.14) and (4.16) imply that the limit (4.13) exists if and only if the right-hand side of (4.16) has exactly one element. This is equivalent to \(\omega (x)\) being contained in a sphere in \(\mathbb {R}^d\) centered at the origin with respect to the Euclidean distance. For \(p\in (0,1)\) the shift linearity is not available and the argument cannot proceed beyond (4.14). Consequently, in this case the limit (4.13) exists if and only if for all \(\lambda >0\) the function

$$\begin{aligned} \omega (x)\ni u\mapsto \mathcal {W}_{p}(\lambda u+\mathcal {O}_\infty ,\mathcal {O}_\infty )\quad \text { is constant}. \end{aligned}$$

This finishes the proof of Theorem 2. \(\square \)
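The different roles of \(p\geqslant 1\) and \(p\in (0,1)\) in the argument above can be illustrated on a toy example. The sketch below replaces \(\mathcal {O}_\infty \) by the uniform law on the two atoms 0 and 1, which is purely an illustrative stand-in, and assumes the convention \(\mathcal {W}_p=(\inf \mathbb {E}|X-Y|^p)^{1\wedge (1/p)}\); the function name `wasserstein_uniform` is hypothetical. For uniform laws with equally many atoms the infimum is attained at a permutation coupling, so a brute-force search suffices.

```python
from itertools import permutations

def wasserstein_uniform(xs, ys, p):
    """W_p between the uniform laws on the real atoms xs and ys (equal counts).

    Uses the convention W_p = (inf E|X-Y|^p)^(min(1, 1/p)) and searches
    all permutation couplings, which is enough for uniform atoms.
    """
    n = len(xs)
    best = min(
        sum(abs(x - ys[j]) ** p for x, j in zip(xs, perm)) / n
        for perm in permutations(range(n))
    )
    return best ** min(1.0, 1.0 / p)

atoms = [0.0, 1.0]                 # toy stand-in for the law of O_infinity
u = 1.0                            # deterministic shift
shifted = [a + u for a in atoms]

w2 = wasserstein_uniform(shifted, atoms, p=2.0)      # equals |u|
w_half = wasserstein_uniform(shifted, atoms, p=0.5)  # strictly below |u|**0.5
```

For \(p=2\) the shift coupling is optimal and the distance equals \(|u|\), in line with (4.15). For \(p=1/2\) the cost is concave, leaving the common atom fixed is cheaper, and the distance depends on the law being shifted and not only on \(|u|\).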