The cutoff phenomenon in Wasserstein distance for nonlinear stable Langevin systems with small L\'evy noise

This article establishes the cutoff phenomenon in the Wasserstein distance for systems of nonlinear ordinary differential equations with a unique coercive stable fixed point subject to general additive Markovian noise in the limit of small noise intensity. This result generalizes the results of Barrera, H\"ogele, Pardo (EJP2021), obtained in the more restrictive setting of Blumenthal-Getoor index $\alpha>3/2$, to the Wasserstein distance, which allows us to cover general L\'evy processes with some finite moment. The main proof techniques are based on a close control of the errors in a version of the Hartman-Grobman theorem and on the adaptation of the linear theory established in Barrera, H\"ogele, Pardo (JSP2021). In particular, they rely on the precise asymptotics of the nonlinear flow and on the nonstandard shift linearity property of the Wasserstein distance established by the authors in (JSP2021). The main examples are the Fermi-Pasta-Ulam-Tsingou gradient flow and coercive nonlinear oscillators subject to small (and possibly degenerate) Brownian or arbitrary $\alpha$-stable noise.


Introduction
In this paper, we study the asymptotics of the ergodic behavior of the stochastic differential equation (SDE)
$$dX^\varepsilon_t(x) = -b(X^\varepsilon_t(x))\,dt + \varepsilon\, dL_t, \qquad X^\varepsilon_0(x) = x \in \mathbb{R}^d, \tag{1.1}$$
for small noise intensity $\varepsilon > 0$, where the vector field $b \in C^2(\mathbb{R}^d, \mathbb{R}^d)$ satisfies $b(0) = 0$ and the following dissipativity condition.
Hypothesis 1 (Dissipativity). There exists a constant $\delta > 0$ such that
$$\langle b(x) - b(y), x - y \rangle \geqslant \delta |x - y|^2 \qquad \text{for all } x, y \in \mathbb{R}^d. \tag{1.2}$$
The noise process $L = (L_t)_{t \geqslant 0}$ in (1.1) is a Lévy process with values in $\mathbb{R}^d$ on a given probability space $(\Omega, \mathcal{F}, \mathbb{P})$. It is well known that the law of $L$ is characterized by the triplet $(a, \Sigma, \nu)$, where $a \in \mathbb{R}^d$, $\Sigma \in \mathbb{R}^{d \times d}$ is a non-negative definite matrix and $\nu : \mathcal{B}(\mathbb{R}^d) \to [0, \infty]$ is a locally finite Borel measure satisfying
$$\nu(\{0\}) = 0 \qquad \text{and} \qquad \int_{\mathbb{R}^d} (1 \wedge |z|^2)\, \nu(dz) < \infty.$$
For $\nu = 0$ the process $L$ is a multidimensional Brownian motion with drift, while for $a = 0$ and $\Sigma = 0$ we have a multidimensional pure jump process such as compound Poisson processes or $\alpha$-stable processes, in particular, the Cauchy process for $\alpha = 1$. We refer to [1,16,18,22] for further details on Lévy processes. Under Hypothesis 1, it is known that the SDE (1.1) has a pathwise unique strong solution, here denoted by $X^\varepsilon(x) := (X^\varepsilon_t(x))_{t \geqslant 0}$, see for instance Theorem 1.1 in [10]. Moreover, $X^\varepsilon(x)$ is a Markov process and, in particular, it satisfies the Feller property, see Proposition 2.1 in [21].
In order to present the main results of this paper, we introduce the Wasserstein distance of order $p^*$ in Section 2.1. Throughout, we assume that $L_t$, and hence $X^\varepsilon_t(x)$, has a finite moment of some order $p^* > 0$ for all $t \geqslant 0$ (Hypothesis 2).
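This article shows the cutoff phenomenon for the family of processes $(X^\varepsilon(x))_{\varepsilon>0}$ with respective invariant measures $(\mu^\varepsilon)_{\varepsilon>0}$ under the Wasserstein distance $W_{p^*}$ of order $p^* > 0$. For $p^* > 1$ we characterize the following cutoff profile asymptotics
$$\lim_{\varepsilon \to 0} \frac{W_{p^*}(\mathrm{Law}(X^\varepsilon_{t_\varepsilon + r}(x)), \mu^\varepsilon)}{\varepsilon} = C e^{-q r} \qquad \text{for all } r \in \mathbb{R}, \tag{1.3}$$
where $t_\varepsilon = \frac{1}{q}|\ln(\varepsilon)| + \frac{\ell - 1}{q}\ln(|\ln(\varepsilon)|)$ for some explicit positive constants $q$, $\ell$, $C$ that depend on $x$ through an $\omega$-limit set of the rotational part of the Hartman-Grobman linearization of $X^0(x)$.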
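As a quick sanity check of the time scale $t_\varepsilon$ above, the following Python sketch evaluates it for given $q > 0$ and $\ell \geqslant 1$. The helper name `cutoff_time` is ours, and $q$, $\ell$ are simply inputs here rather than being computed from a drift. The sketch also confirms the identity $e^{-q t_\varepsilon}|\ln \varepsilon|^{\ell-1} = \varepsilon$, which explains the $\ln|\ln\varepsilon|$ correction: it compensates the polynomial factor $t^{\ell-1}$ in a decay of the form $t^{\ell-1}e^{-qt}$.

```python
import math


def cutoff_time(eps: float, q: float, ell: int) -> float:
    """Cutoff time t_eps = |ln eps|/q + (ell - 1)/q * ln|ln eps| for 0 < eps < 1.

    q > 0 plays the role of the slowest decay rate and ell >= 1 of the
    Jordan block size; both are given as inputs, not computed.
    """
    if not (0.0 < eps < 1.0):
        raise ValueError("eps must lie in (0, 1)")
    big_l = abs(math.log(eps))
    return big_l / q + (ell - 1) / q * math.log(big_l)


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8):
        t = cutoff_time(eps, q=0.5, ell=3)
        # e^{-q t_eps} |ln eps|^{ell - 1} reproduces eps up to rounding
        print(eps, t, math.exp(-0.5 * t) * abs(math.log(eps)) ** 2)
```

For $\ell = 1$ the correction vanishes and $t_\varepsilon = |\ln\varepsilon|/q$.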
For processes $(X^\varepsilon(x))_{\varepsilon>0}$ for which (1.3) fails, we establish the following weaker window cutoff asymptotics
$$\lim_{r \to \infty} \limsup_{\varepsilon \to 0} \frac{W_{p^*}(\mathrm{Law}(X^\varepsilon_{t_\varepsilon + r}(x)), \mu^\varepsilon)}{\varepsilon} = 0 \qquad \text{and} \qquad \lim_{r \to -\infty} \liminf_{\varepsilon \to 0} \frac{W_{p^*}(\mathrm{Law}(X^\varepsilon_{t_\varepsilon + r}(x)), \mu^\varepsilon)}{\varepsilon} = \infty.$$
Our results generalize those of [2] to nonlinear vector fields and those of [3], [5] and [6] to the Wasserstein distance, which allows us to cover second-order equations with degenerate noise. For a detailed introduction to the subject we refer to the aforementioned articles, in particular, see Table 1.1 in [3]. Studying this problem in the Wasserstein distance rather than in the total variation distance has a particular advantage. While the Wasserstein distance only requires the existence of moments of $X^\varepsilon(x)$ of a given order, the total variation distance requires in addition the existence of a density and its regularity. The latter imposes further requirements on the Lévy process $L$, which can be quite restrictive, see [3] for further details. Furthermore, in the Wasserstein case, at least when $X^\varepsilon(x)$ has moments of order $p > 1$, the cutoff phenomenon of $(X^\varepsilon(x))_{\varepsilon>0}$ is completely determined by an explicit function (see Theorem 2 below), here called the cutoff profile. In contrast, in the total variation case the profile function can be very involved and even hard to simulate in examples.
In [4], the cutoff phenomenon with respect to the total variation distance is studied for SDEs of the type (1.1) in the one-dimensional case, with $L$ a standard Brownian motion and a general drift coefficient $b$ satisfying Hypothesis 1. Since scalar systems are gradient systems, there is always a cutoff profile, which can be given explicitly in terms of the Gauss error function. The follow-up work [5] covers the multidimensional case, where the picture is considerably richer due to the presence of strong and complicated rotational patterns. The authors sharply characterize the existence of a cutoff profile in terms of the omega limit sets appearing in the long-term behavior of the matrix exponential function $e^{-Qt}x$, see Lemma B.2 in [5], which plays an analogous role in this article. The paper [6] is the first attempt to study the cutoff phenomenon for such models with jumps. More precisely, [6] covers the cutoff phenomenon with respect to the total variation distance for generalized Ornstein-Uhlenbeck processes. These processes satisfy an SDE of the form (1.1) with $L$ a Lévy process and $b(x) = Qx$, where $Q$ is a square real matrix whose eigenvalues have positive real parts. The proof methods are based on concise Fourier inversion techniques. Due to the aforementioned regularity required by the total variation distance, the results in [6] are stated under the hypothesis of continuous densities of the marginals, which to date has no simple mathematical characterization. The cutoff profile function in [6] is given in terms of the Lévy-Ornstein-Uhlenbeck limiting measure for $\varepsilon = 1$, measured in the total variation distance. Such profile functions are theoretically highly insightful, but almost impossible to calculate and simulate in examples. As in [5], the existence of a cutoff profile is characterized abstractly in terms of the behavior of this profile function on a suitably defined omega limit set.
The Wasserstein case is treated in [2] where, contrary to the total variation case, it is noted that the profile function takes an explicit and simple shape. Finally, [3] treats the cutoff phenomenon with respect to the total variation distance for (1.1) with b satisfying Hypothesis 1 and driven by a Lévy process in the rather restrictive class of strongly locally layered stable processes (see Definition 1.4 in [3]).
In this article we combine a nonlinear version of the Wasserstein estimates of [2] with the Freidlin-Wentzell first order approximation of (1.1) in the spirit of [3] and with the fine properties of the Wasserstein distance given in Lemma 2.1, in particular, the non-standard shift linearity of Lemma 2.1.d).
The manuscript is organized in four parts. After the exposition of the setting and the presentation of the main results in Section 2, we illustrate our findings for the nonlinear Fermi-Pasta-Ulam-Tsingou gradient system and a class of nonlinear oscillators in Section 3. The main steps of the proof of the cutoff phenomenon are given in Section 4, while the auxiliary technical results, such as the exponential ergodicity in the Wasserstein distance and the coupling between the original nonlinear system and its Freidlin-Wentzell linearization, are given in the appendix.

Setting and main results
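2.1. Fine properties of the Wasserstein distance. For any two probability distributions $\mu_1$ and $\mu_2$ on $\mathbb{R}^d$ with finite $p^*$-th moment for some $p^* > 0$, we define the Wasserstein $p^*$-distance between them as
$$W_{p^*}(\mu_1, \mu_2) = \inf_{\Pi} \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} |u - v|^{p^*}\, \Pi(du, dv) \Big)^{1 \wedge \frac{1}{p^*}},$$
where the infimum is taken over all couplings (joint distributions on $\mathbb{R}^d \times \mathbb{R}^d$) $\Pi$ with marginals $\mu_1$ and $\mu_2$. We refer to [12,20] and the references therein for more details. For convenience of notation we do not distinguish a random variable $U$ from its law $\mathbb{P}_U$ as an argument of $W_{p^*}$. That is, for random variables $U_1$, $U_2$ and a probability measure $\mu$ we write
$$W_{p^*}(U_1, U_2) := W_{p^*}(\mathbb{P}_{U_1}, \mathbb{P}_{U_2}) \qquad \text{and} \qquad W_{p^*}(U_1, \mu) := W_{p^*}(\mathbb{P}_{U_1}, \mu).$$
The next result establishes properties of the Wasserstein distance which turn out to be important for our arguments.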
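Lemma 2.1 (Properties of $W_{p^*}$). For $p^* > 0$, $u_1, u_2 \in \mathbb{R}^d$, $c \in \mathbb{R}$ and random vectors $U_1$ and $U_2$ in $\mathbb{R}^d$ with finite $p^*$-th moment we have the following:
a) The Wasserstein distance $W_{p^*}$ is a metric.
b) Translation invariance: $W_{p^*}(u_1 + U_1, u_2 + U_2) = W_{p^*}(u_1 - u_2 + U_1, U_2)$.
c) Homogeneity: $W_{p^*}(c\,U_1, c\,U_2) = |c|\, W_{p^*}(U_1, U_2)$ for $p^* \in [1, \infty)$ and $W_{p^*}(c\,U_1, c\,U_2) = |c|^{p^*} W_{p^*}(U_1, U_2)$ for $p^* \in (0, 1)$.
d) Shift linearity: For $p^* \geqslant 1$ it follows
$$W_{p^*}(u_1 + U_1, U_1) = |u_1|. \tag{2.1}$$
For $p^* \in (0, 1)$ we have
$$\max\{|u_1|^{p^*} - 2\,\mathbb{E}[|U_1|^{p^*}], 0\} \leqslant W_{p^*}(u_1 + U_1, U_1) \leqslant |u_1|^{p^*}. \tag{2.2}$$
e) Domination: For any given coupling $\tilde\Pi$ between $U_1$ and $U_2$ it follows
$$W_{p^*}(U_1, U_2) \leqslant \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} |v_1 - v_2|^{p^*}\, \tilde\Pi(dv_1, dv_2) \Big)^{1 \wedge \frac{1}{p^*}}.$$
f) Characterization: Let $(U_n)_{n \in \mathbb{N}}$ be a sequence of random vectors with finite $p^*$-th moments and $U$ a random vector with finite $p^*$-th moment. Then the following statements are equivalent:
(1) $W_{p^*}(U_n, U) \to 0$ as $n \to \infty$.
(2) $U_n \to U$ in distribution and $\mathbb{E}[|U_n|^{p^*}] \to \mathbb{E}[|U|^{p^*}]$ as $n \to \infty$.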
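Several items of Lemma 2.1 can be observed directly on empirical measures in dimension one, where for $p \geqslant 1$ the monotone (sorted) coupling is optimal. The Python sketch below is a minimal illustration under that one-dimensional assumption; the helper `empirical_wasserstein_1d` and the sample distributions are our own choices. It checks translation invariance b), homogeneity c) and the shift linearity (2.1) for $p \geqslant 1$. For $0 < p < 1$ the sorted coupling is only one admissible coupling, so by the domination property e) it yields an upper bound rather than the distance itself.

```python
import numpy as np


def empirical_wasserstein_1d(xs, ys, p):
    """W_p between two empirical measures on R with the same number of atoms.

    Uses the monotone (sorted) coupling. For p >= 1 this coupling is optimal
    in one dimension and the result is (mean |x_(i) - y_(i)|^p)^(1/p). For
    0 < p < 1 it is merely admissible, so by the domination property the
    returned value mean |x_(i) - y_(i)|^p is only an upper bound for W_p.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("samples must have the same size")
    cost = np.mean(np.abs(xs - ys) ** p)
    return float(cost ** (1.0 / p)) if p >= 1 else float(cost)


rng = np.random.default_rng(0)
U1 = rng.standard_t(df=3, size=5_000)   # heavy-ish tails, finite moments of order < 3
U2 = rng.normal(size=5_000)

# b) translation invariance, c) homogeneity and d) shift linearity for p = 2
print(empirical_wasserstein_1d(0.3 + U1, 1.1 + U2, 2), empirical_wasserstein_1d(-0.8 + U1, U2, 2))
print(empirical_wasserstein_1d(-3 * U1, -3 * U2, 2), 3 * empirical_wasserstein_1d(U1, U2, 2))
print(empirical_wasserstein_1d(1.5 + U1, U1, 2))   # equals |1.5| up to rounding
```

Since the identities hold for the empirical laws themselves, they are reproduced up to rounding errors and not merely up to sampling errors.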
For p * ∈ (0, 1) equality (2.1) is false in general, see Remark 2.4 in [2]. The proof of the previous lemma is given in Lemma 2.2 in [2]. The following result yields the existence of a unique invariant distribution for (1.1) under Hypotheses 1 and 2. Moreover, under the Wasserstein distance, the strong solution of (1.1) is exponentially ergodic.
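Proposition 1 (Existence of a unique invariant distribution). Under Hypothesis 1 and Hypothesis 2 for some $p^* > 0$, there exists a unique invariant probability measure $\mu^\varepsilon$ of (1.1), and $\mathrm{Law}(X^\varepsilon_t(x))$ converges to $\mu^\varepsilon$ exponentially fast in $W_{p^*}$ as $t \to \infty$. The proof is given in Appendix A.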
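A standard mechanism behind such exponential contraction is the synchronous coupling: reading Hypothesis 1 as the one-sided Lipschitz bound $\langle b(x) - b(y), x - y\rangle \geqslant \delta|x - y|^2$, two solutions of (1.1) driven by the same noise path satisfy $\frac{d}{dt}|X^\varepsilon_t(x) - X^\varepsilon_t(y)|^2 \leqslant -2\delta |X^\varepsilon_t(x) - X^\varepsilon_t(y)|^2$, because the additive noise cancels in the difference, and hence $|X^\varepsilon_t(x) - X^\varepsilon_t(y)| \leqslant e^{-\delta t}|x - y|$. Combined with the domination property e) this gives a contraction in $W_p$. The Python sketch below illustrates this for a hypothetical scalar drift $b(x) = x + x^3$ (for which $\delta = 1$), Brownian noise and an Euler discretization of our choosing; it is not the argument of Appendix A, which proceeds via the criteria of [7] and [9].

```python
import numpy as np


def b(x):
    # Hypothetical scalar drift b(x) = x + x**3 (gradient of x^2/2 + x^4/4).
    # Since (x - y)(b(x) - b(y)) = (x - y)^2 (1 + x^2 + x*y + y^2) >= (x - y)^2,
    # the dissipativity constant is delta = 1.
    return x + x**3


def synchronous_coupling(x0, y0, eps, T, h, seed=0):
    """Euler paths of dX = -b(X) dt + eps dW started at x0 and y0 and driven by
    the SAME Brownian increments; returns the grid times and |X_t - Y_t|."""
    rng = np.random.default_rng(seed)
    n = int(round(T / h))
    x, y = float(x0), float(y0)
    dist = np.empty(n + 1)
    dist[0] = abs(x - y)
    for k in range(n):
        dw = rng.normal(scale=np.sqrt(h))
        x = x - h * b(x) + eps * dw
        y = y - h * b(y) + eps * dw
        dist[k + 1] = abs(x - y)
    return h * np.arange(n + 1), dist


times, dist = synchronous_coupling(2.0, -1.0, eps=0.1, T=5.0, h=1e-3)
# The noise cancels in the difference, so the gap decays at least like e^{-delta t}.
print(dist[-1], np.exp(-times[-1]) * dist[0])
```

For this drift each Euler step multiplies the gap by $1 - h(1 + x^2 + xy + y^2) \leqslant 1 - h \leqslant e^{-h}$, so the discrete bound mirrors the continuous one as long as the step size is small.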

2.2. Hartman-Grobman asymptotics. The zeroth-order approximation of a smooth dynamical system on a finite time horizon $[0, T]$ subject to small perturbations is given by the deterministic system, that is,
$$dX^0_t(x) = -b(X^0_t(x))\,dt, \qquad X^0_0(x) = x.$$
Our main results treat the small-noise asymptotics close to the stable state $0$, which translates into time scales $t_\varepsilon \to \infty$ as $\varepsilon \to 0$ in Theorem 1 and Theorem 2. Before we state our main results, we first provide the long-time asymptotics of $X^0_t(x)$ in terms of the spectral decomposition of the solution $t \mapsto e^{-Db(0)t}x^*$ of the respective linear system for some $x^*$ in a small neighbourhood of the origin.

Lemma 2.2 (Asymptotic Hartman-Grobman).
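Assume Hypothesis 1. Then for any $x \in \mathbb{R}^d \setminus \{0\}$ there exist a time $\tau_x \geqslant 0$, a rate $q_x > 0$, integers $\ell_x, m_x \in \{1, \dots, d\}$, angular velocities $\theta^x_1, \dots, \theta^x_{m_x} \in \mathbb{R}$ and linearly independent vectors $v^x_1, \dots, v^x_{m_x} \in \mathbb{C}^d$ such that
$$\lim_{t \to \infty} \Big| \frac{e^{q_x t}}{t^{\ell_x - 1}}\, X^0_{t + \tau_x}(x) - \sum_{k=1}^{m_x} e^{i \theta^x_k t} v^x_k \Big| = 0. \tag{2.3}$$
Moreover,
$$0 < \liminf_{t \to \infty} \Big| \sum_{k=1}^{m_x} e^{i \theta^x_k t} v^x_k \Big| \leqslant \limsup_{t \to \infty} \Big| \sum_{k=1}^{m_x} e^{i \theta^x_k t} v^x_k \Big| \leqslant \sum_{k=1}^{m_x} |v^x_k|.$$
The formal proof of the previous lemma is given in Lemma B.2 in Appendix B of [5].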
Remark 2.3. (1) If a zero angular velocity shows up among $\theta^x_1, \dots, \theta^x_{m_x}$, we adopt the convention that $\theta^x_1 = 0$ and $v^x_1 \in \mathbb{R}^d$, and hence $m_x = 2n + 1$ for some $n \in \mathbb{N}_0$. Otherwise, $m_x = 2n$ for some $n \in \mathbb{N}_0$, we eliminate $\theta^x_1$ and count the angular velocities as $\theta^x_2, \dots, \theta^x_{2n+1}$.
(2) Note that the linearly independent complex vectors $v^x_1, \dots, v^x_{m_x}$ in $\mathbb{C}^d$ depend not only on $x$ but also, crucially, on the dissipation time $\tau_x$ the deterministic system needs to reach a Hartman-Grobman domain of conjugacy $U$. We stress that $\tau_x$ is not unique: since $X^0_{t+\tau_x}(x) \in U$ for all $t \geqslant 0$, any larger time serves as well. In fact, by Hypothesis 1 we have that $H$ is a $C^1$-diffeomorphism, see the original paper [8] or Theorem (Hartman), Sec. 2.8, p. 127, [13]. In [8] it is shown that $H$ can be chosen to be With the help of a linear coordinate change $W$ we obtain the Jordan normal form $Db(0) = W^{-1} J(Db(0)) W$ and (using the linearity of the semigroup) ). We denote $\tilde w = W H(\tilde u)$. Now, the parameters $\ell_x$, $q_x$ and $m_x$ are given as follows. Consider the sequence of generalized eigenspaces $H_j$ of $J(Db(0))$ such that Then, $q_x$ is the smallest real part of the spectrum of $\tilde J(\tilde w)$, $\ell_x$ is the dimension of the largest Jordan block of $\tilde J(\tilde w)$ which has real part $q_x$, and $m_x$ is the number of Jordan blocks associated to $q_x$ and $\ell_x$. Note that in the case of a non-real eigenvalue with real part $q_x$ and Jordan block size $\ell_x$, we have $m_x \geqslant 2$. For an extensive numerical example for a linear chain of oscillators we refer to Section 4.3.2 in [2].
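In the linear case $b(x) = Qx$ no conjugation is needed, and (by our reading of the construction above in this special case) $q_x$ is the smallest relevant real part of the spectrum of $Q$ and $\ell_x$ the size of the corresponding largest Jordan block that $x$ actually charges. The Python sketch below, a toy illustration under these assumptions with our own helper `normalized_flow` built on `scipy.linalg.expm`, evaluates the normalized flow $e^{q_x t}t^{-(\ell_x - 1)}X^0_t(x)$ whose long-time behavior Lemma 2.2 describes. For a Jordan block it converges to a single real vector ($m_x = 1$), while for a rotation it keeps turning on a circle, the situation of a complex conjugate pair ($m_x = 2$).

```python
import numpy as np
from scipy.linalg import expm


def normalized_flow(Q, x, t, q, ell):
    """Return e^{q t} t^{-(ell-1)} e^{-Q t} x for the linear flow X^0_t(x) = e^{-Q t} x."""
    return np.exp(q * t) / t ** (ell - 1) * (expm(-Q * t) @ x)


# Single Jordan block with eigenvalue 1 and size 2, started off the eigenvector:
# e^{-Q t} x = e^{-t} (-t, 1), so with q = 1, ell = 2 the normalized vector is (-1, 1/t).
Q_jordan = np.array([[1.0, 1.0], [0.0, 1.0]])
x = np.array([0.0, 1.0])
for t in (5.0, 20.0, 80.0):
    print(t, normalized_flow(Q_jordan, x, t, q=1.0, ell=2))

# Rotation with angular velocity 2 and decay rate 1 (eigenvalues 1 +- 2i): q = 1, ell = 1.
# The normalized vector rotates on the circle of radius |x| and has no limit.
Q_rot = np.array([[1.0, -2.0], [2.0, 1.0]])
print(np.linalg.norm(normalized_flow(Q_rot, x, 7.3, q=1.0, ell=1)))
```

The example also shows the dependence on $x$: for the eigenvector $x = (1, 0)$ of the same Jordan block one gets $e^{-Qt}x = e^{-t}x$, that is, $\ell_x = 1$ instead of $2$.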

2.3. Main results.
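Our first main result establishes the $\infty/0$ collapse of the Wasserstein distance between the law of the current state $X^\varepsilon_t(x)$ and the dynamical equilibrium $\mu^\varepsilon$ along the critical time scale $t^x_\varepsilon$ given in (2.5) under mild conditions.
Theorem 1 (Window cutoff). Let $b$ satisfy Hypothesis 1 and $\nu$ satisfy Hypothesis 2 for some $p^* > 0$. Fix $x \in \mathbb{R}^d \setminus \{0\}$ and consider the parameters $q_x$ and $\ell_x$ of the asymptotic Hartman-Grobman representation in Lemma 2.2. Then the family of processes $(X^\varepsilon(x))_{\varepsilon>0}$ exhibits a window cutoff phenomenon on the time scale
$$t^x_\varepsilon = \frac{1}{q_x}|\ln(\varepsilon)| + \frac{\ell_x - 1}{q_x}\ln(|\ln(\varepsilon)|) \tag{2.5}$$
for all asymptotically constant window sizes $w_\varepsilon$, that is, $w_\varepsilon \to w > 0$ as $\varepsilon \to 0$, in the following sense. For all $0 < p < p^*$ we have
$$\lim_{r \to \infty} \limsup_{\varepsilon \to 0} \frac{W_p(X^\varepsilon_{t^x_\varepsilon + r w_\varepsilon}(x), \mu^\varepsilon)}{\varepsilon^{1 \wedge p}} = 0 \qquad \text{and} \qquad \lim_{r \to -\infty} \liminf_{\varepsilon \to 0} \frac{W_p(X^\varepsilon_{t^x_\varepsilon + r w_\varepsilon}(x), \mu^\varepsilon)}{\varepsilon^{1 \wedge p}} = \infty. \tag{2.6}$$
The second main result provides two characterizations of the proper limits ($\varepsilon \to 0$) of the expressions in (2.6) for any fixed $r \in \mathbb{R}$. That is to say, we characterize under which conditions the asymptotics (1.3) is satisfied. In addition, it yields the precise shape of the limit, which turns out to be a simple exponential function for $p \in [1, p^*)$.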
Theorem 2 (Dynamical profile cutoff characterization for $p^* > 0$). Let the assumptions (and the notation) of Theorem 1 be valid for some $p^* > 0$. Consider the unique strong solution $(O_t)_{t \geqslant 0}$ of the linear system
$$dO_t = -Db(0)\, O_t\, dt + dL_t, \qquad O_0 = 0, \tag{2.7}$$
where $O_\infty$ denotes the unique invariant probability distribution of (2.7).
(1) Then for any $0 < p < p^*$ the following statements are equivalent.
ii) The family of processes $(X^\varepsilon(x))_{\varepsilon>0}$ exhibits a profile cutoff for any $0 < p < p^*$ as follows and
(2) For $p^* > 1$ and $p \in [1, p^*)$ the profile has the shape if and only if $\omega(x)$ is contained in a sphere in $\mathbb{R}^d$ with respect to the Euclidean norm.
(3) We recall the convention of Remark 2.3. Let $p^* > 1$ and $p \in [1, p^*)$. If the angles $\theta^x_2, \dots, \theta^x_{2n}$ satisfy the following non-resonance condition then the statements i) and ii) in item (1) are equivalent to the following normal growth condition of the asymptotic Hartman-Grobman linearization: The family of limiting vectors
Remark 2.4. We stress that $O_\infty = \lim_{t \to \infty} O_t$ in $W_{p^*}$ and that, due to Hypothesis 1 (in combination with Hypothesis 2), the distribution of $O_\infty$ does not depend on any deterministic initial condition of (2.7).
Due to their relevance as physical observables, we formulate the corresponding window cutoff result for the respective moments.
Corollary 2.5 (Moments cutoff). Let the assumptions (and the notation) of Theorem 1 be valid for some $p^* > 0$. Then for any $0 < p < p^*$ it follows

Examples
In this section we present two examples which illustrate the applicability of Theorem 1 and Theorem 2 to nonlinear dynamics with degenerate noise.
Example 3.1 (The Fermi-Pasta-Ulam-Tsingou potential). We consider the nonlinear Langevin gradient system
$$dX^\varepsilon_t = -\nabla U(X^\varepsilon_t)\,dt + \varepsilon\, dL_t \tag{3.1}$$
for the strongly convex quartic Fermi-Pasta-Ulam-Tsingou potential $U(x) = \frac{1}{2}|x|^2 + \frac{1}{4}|x|^4$, $x \in \mathbb{R}^d$, subject to degenerate noise $dL_t$. For any Lévy process $L$ satisfying Hypothesis 2 for some $p^* > 0$, the system (3.1) exhibits a profile cutoff due to Theorem 2, where the cutoff time is given by $t^x_\varepsilon = |\ln(\varepsilon)|$. For $p^* > 1$ and any $p \in [1, p^*)$ the profile function in $W_p$ is always of the following exponential shape
where $\tau_x := \min\{t \geqslant 0 : |X^0_t(x)| \leqslant R_0/2\}$ and $R_0 > 0$ is a small radius inside of which the Hartman-Grobman conjugation is valid. Note that $\tau_x$ can be replaced by any upper bound of $\tau_x$, such as for instance $(1/\delta)\ln(2|x|/R_0)$, which follows from the decay $|X^0_t(x)| \leqslant e^{-\delta t}|x|$ implied by Hypothesis 1.
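To make the constants of Example 3.1 concrete: for $U(x) = \frac12|x|^2 + \frac14|x|^4$ the Hessian is $(1 + |x|^2)I + 2xx^\top \geqslant I$, so $\langle \nabla U(x) - \nabla U(y), x - y\rangle \geqslant |x - y|^2$, the dissipativity of Hypothesis 1 holds with $\delta = 1$, and $Db(0) = \nabla^2 U(0) = I$ gives $q_x = 1$, $\ell_x = 1$ and hence $t^x_\varepsilon = |\ln\varepsilon|$. The Python sketch below checks this numerically and compares an Euler approximation of $\tau_x$ with the bound $(1/\delta)\ln(2|x|/R_0)$; the radius $R_0 = 0.5$, the initial point and the step size are arbitrary choices of ours, not values from the text.

```python
import numpy as np


def grad_U(x):
    # gradient of U(x) = |x|^2/2 + |x|^4/4
    return (1.0 + x @ x) * x


def hess_U(x):
    # Hessian (1 + |x|^2) I + 2 x x^T; eigenvalues 1 + |x|^2 (orthogonal to x) and 1 + 3|x|^2
    return (1.0 + x @ x) * np.eye(x.size) + 2.0 * np.outer(x, x)


def hitting_time(x0, R0, h=1e-3, t_max=100.0):
    """First Euler time at which |X^0_t(x0)| <= R0/2 for dX = -grad U(X) dt."""
    x = np.asarray(x0, dtype=float)
    t = 0.0
    while np.linalg.norm(x) > R0 / 2:
        if t > t_max:
            raise RuntimeError("no hitting before t_max")
        x = x - h * grad_U(x)
        t += h
    return t


x0 = np.array([2.0, -1.0, 2.0])                 # |x0| = 3
R0 = 0.5                                         # hypothetical conjugacy radius
tau = hitting_time(x0, R0)
bound = np.log(2 * np.linalg.norm(x0) / R0)      # (1/delta) ln(2|x|/R0) with delta = 1
print(tau, bound)                                # the cubic term makes tau clearly smaller
```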
In particular, the profile cutoff (3.2) is valid for $L = L^\alpha$ being a (possibly degenerate) $\alpha$-stable process with index $\alpha \in (1, 2]$. Note that for the limiting case of a possibly degenerate Cauchy process ($\alpha = 1$), and in fact for any $L^\alpha$ with index $\alpha \in (0, 1)$, Theorem 2 also yields a profile cutoff. However, the profile function is not explicit. This is due to the absence of a finite first moment and the lack of the shift linearity (2.2). In other words, the profile function is given in (2.8) for $p \in (0, \alpha)$ and, to our knowledge, it is unknown how to simplify it further. Note that the case $\alpha \in (0, 3/2]$ is new and is not covered in [3].
Example 3.2 (Nonlinear non-gradient with degenerate noise). For $F, H \in C^2(\mathbb{R}^2, \mathbb{R})$ we consider the following perturbed simple harmonic oscillator with unit angular frequency given in Section 4 of [19], subject to a small noise perturbation
where $L = (L_t)_{t \geqslant 0}$ is a one-dimensional Lévy process with finite $p^*$-th moments. The Jacobian matrix $Jb(v_1, v_2)$ at $(v_1, v_2)$ of the respective vector field $b : \mathbb{R}^2 \to \mathbb{R}^2$ is given by
It is enough to prove the existence of a positive constant $\delta$ such that for any $u_1$,
For instance, for a nonlinear perturbation of a linear oscillator, that is, $F(v_1, v_2) = \eta$ for some $\eta > 0$, the preceding condition reads
For $L$ satisfying Hypothesis 2 with $p^*$, and $F$, $H$ fulfilling (3.3), Theorem 1 implies window cutoff for any initial condition $(X^{\varepsilon,1}_0, X^{\varepsilon,2}_0) = x \in \mathbb{R}^2 \setminus \{0\}$ and any $p \in (0, p^*)$. The cutoff time is given by
Note that this result is new even in the Brownian case, since the results of [3] and [5] are stated for the total variation distance, which requires regularity of the transition probabilities that is available in the setting of non-degenerate noise. In our case, the Wasserstein distance circumvents this difficulty, since by the shift linearity of Lemma 2.1 d) we have $W_p(x + X, X) \to 0$ as $|x| \to 0$ and $W_p(x + X, X) \to \infty$ as $|x| \to \infty$ for any $X \in L^p$, while for the total variation distance the analogous behavior requires absolute continuity of the distribution of $X$.
We refer to [3], Lemma 1.17 in Subsection 1.3.5, for an example where the continuity of the total variation distance under shifts is not valid.
In the sequel, we characterize the existence of a profile cutoff under
Note that $\eta_0 = c$ implies that the eigenvalues of $Jb(0,0)$ are the numbers $a$ and $b$, which are positive, and hence profile cutoff is valid by Theorem 2. In the sequel we assume $\eta_0 \neq c$. Then the eigenvalues of $Jb(0,0)$ are given by
In addition,
It is not hard to see that Hypothesis 1 implies $\mathrm{Re}(\lambda) \leqslant -\delta$ for any eigenvalue $\lambda$ of $-Q$ and hence Hurwitz stability. However, the dissipativity condition (1.2), which is assumed in order to control the nonlinear vector field, is strictly stronger than Hurwitz stability. For instance, the vector field $b :$ has eigenvalues with real part $-\lambda/2 < 0$, but it does not satisfy Hypothesis 1. Note that the dissipativity condition (1.2) is not even satisfied locally in a neighborhood of the origin.
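For a linear field $b(u) = Qu$ the gap between the two notions can be computed directly: Hurwitz stability of $-Q$ only asks that every eigenvalue of $Q$ has positive real part, while dissipativity asks that the smallest eigenvalue of the symmetric part $(Q + Q^\top)/2$ be positive. Taking an eigenvector $v$ and the real part of $\langle Qv, v\rangle$ shows $\mathrm{Re}\,\lambda \geqslant \delta$, so dissipativity implies Hurwitz stability. The Python sketch below uses hypothetical matrices of our own choosing (not the vector field of the text): `Q_bad` is Hurwitz but not dissipative, and since it is linear the failure occurs in every neighbourhood of the origin.

```python
import numpy as np


def minus_q_is_hurwitz(Q):
    """True if -Q is Hurwitz stable, i.e. every eigenvalue of Q has positive real part."""
    return bool(np.all(np.linalg.eigvals(Q).real > 0))


def dissipativity_constant(Q):
    """Largest delta with <Q u, u> >= delta |u|^2 for all u: the smallest
    eigenvalue of the symmetric part (Q + Q^T)/2. For b(u) = Q u the
    dissipativity condition holds iff this number is positive."""
    return float(np.linalg.eigvalsh(0.5 * (Q + Q.T)).min())


# Hypothetical example: -Q is Hurwitz stable, but Q is far from normal.
Q_bad = np.array([[1.0, 10.0], [0.0, 1.0]])
# Dissipative example: rotation plus contraction, symmetric part equal to the identity.
Q_good = np.array([[1.0, -3.0], [3.0, 1.0]])

for Q in (Q_bad, Q_good):
    print(minus_q_is_hurwitz(Q), dissipativity_constant(Q), np.linalg.eigvals(Q).real.min())
```

For `Q_bad` the direction $u = (1, -1)$ gives $\langle Q u, u\rangle = -8 = -4|u|^2$.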

Proofs of the main results
4.1. The first order approximation. We define the Freidlin-Wentzell first order approximation $Y^\varepsilon_t(x) := X^0_t(x) + \varepsilon Y^x_t$, where $Y^x$ solves the linear inhomogeneous equation $dY^x_t = -Db(X^0_t(x))\, Y^x_t\, dt + dL_t$, $Y^x_0 = 0$. In [3], Lemma C.4 in Section C.4, it is shown that $Y^\varepsilon_t(x)$ converges in total variation distance to a unique limiting distribution $\mu^\varepsilon_*$ as $t \to \infty$. Moreover, it is shown there that $\mu^\varepsilon_* \stackrel{d}{=} \varepsilon O_\infty$, where $O_\infty$ is the unique invariant probability distribution of the homogeneous Ornstein-Uhlenbeck dynamics $dO_t = -Db(0)\, O_t\, dt + dL_t$. In the sequel we reduce the nonlinear ergodic convergence of $X^\varepsilon_t(x)$ to the ergodic convergence of the Freidlin-Wentzell linearization $Y^\varepsilon_t(x)$ in (4.3) up to error terms. For any $0 < p \leqslant p^*$, by the triangle inequality it follows that Combining the preceding inequalities we obtain the linear approximation In Proposition 2 given in Appendix B.2 we show that for any $t_\varepsilon = O(|\ln(\varepsilon)|)$ and $0 < p < p^*$ the following limit holds Moreover, in Lemma B.2 we show that for $0 < p < p^*$
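The point of the first order approximation is that the remainder $X^\varepsilon_t(x) - Y^\varepsilon_t(x)$ is of second order in $\varepsilon$ along each path. The Python sketch below illustrates this under explicit assumptions of ours: the standard Freidlin-Wentzell form $Y^\varepsilon_t = X^0_t + \varepsilon Y_t$ with $dY_t = -Db(X^0_t)Y_t\,dt + dL_t$, $Y_0 = 0$, a hypothetical scalar drift $b(x) = x + x^3$, Brownian $L$ and an Euler discretization. Halving $\varepsilon$ should divide the pathwise error by roughly four; this is a pathwise illustration only, not the $L^p$ estimate of Appendix B.

```python
import numpy as np


def b(x):
    return x + x**3            # hypothetical scalar drift (dissipative with delta = 1)


def db(x):
    return 1.0 + 3.0 * x**2    # its derivative Db


def first_order_error(eps, x0=1.0, T=5.0, h=1e-3, seed=1):
    """sup over the Euler grid of |X^eps_t - (X^0_t + eps * Y_t)|.

    X^eps: Euler scheme for dX = -b(X) dt + eps dW,
    X^0:   Euler scheme for the deterministic flow dX = -b(X) dt,
    Y:     Euler scheme for dY = -Db(X^0_t) Y dt + dW, Y_0 = 0,
    all driven by the same Brownian increments (fixed seed)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / h))
    dw = rng.normal(scale=np.sqrt(h), size=n)
    x, x_det, y = x0, x0, 0.0
    err = 0.0
    for k in range(n):
        x = x - h * b(x) + eps * dw[k]
        y = y - h * db(x_det) * y + dw[k]     # uses X^0 at the left grid point
        x_det = x_det - h * b(x_det)
        err = max(err, abs(x - (x_det + eps * y)))
    return err


for eps in (0.016, 0.008, 0.004):
    print(eps, first_order_error(eps))   # shrinks roughly like eps**2
```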

4.2. Derivation of the cutoff phenomenon.
In the sequel, we analyze the asymptotic behavior of $W_p(Y^\varepsilon_t(x), \mu^\varepsilon_*) \cdot \varepsilon^{-(1 \wedge p)}$, from which we read off the cutoff of the Freidlin-Wentzell linearization $Y^\varepsilon_t(x)$. By the triangle inequality, translation invariance, homogeneity and shift linearity given in Lemma 2.1 we obtain for $0 < p \leqslant p^*$ The right-hand side of (4.6) does not depend on $\varepsilon$ and by Lemma B.3 it tends to $0$ as $t \to \infty$. It is therefore enough to study the precise long-term behavior of $W_p(\varepsilon^{-1} X^0_t(x) + O_\infty, O_\infty)$ in order to derive the cutoff phenomenon.

4.3. Proof of Theorem 1. For any $0 < p < p^*$, $t^x_\varepsilon$ and $w_\varepsilon$ as given in the statement and $r \in \mathbb{R}$, (4.3), (4.4), (4.5) and (4.6) yield For short, we define for any $0 < p < p^*$. In particular, the limit
Proof of Claim A. In the sequel we study the asymptotics of the drift term $X^0_t(x) \cdot \varepsilon^{-1}$. A straightforward calculation shows The preceding limit implies, with the help of the spectral decomposition (2.3) given in Lemma 2.2 and the triangle inequality, that We set Analogous reasoning yields In the sequel it remains to show that $R^x_\varepsilon \to 0$ as $\varepsilon \to 0$. By the continuity of which is valid due to the limit (2.3) and (4.9). This finishes the proof of Claim A.
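An elementary limit of the kind behind Claim A can be checked numerically. With $t^x_\varepsilon = \frac{1}{q}|\ln\varepsilon| + \frac{\ell-1}{q}\ln|\ln\varepsilon|$ one has
$$\ln\Big(\frac{(t^x_\varepsilon + r)^{\ell-1}e^{-q(t^x_\varepsilon + r)}}{\varepsilon}\Big) = (\ell - 1)\ln\Big(\frac{t^x_\varepsilon + r}{|\ln\varepsilon|}\Big) - qr \longrightarrow -(\ell-1)\ln q - qr,$$
so, ignoring the shift $\tau_x$, the rescaled decay factor tends to $q^{-(\ell-1)}e^{-qr}$. The Python sketch below (helper name ours) evaluates this logarithm directly from $\ln\varepsilon$, so that $\varepsilon$ can be taken far below the floating-point range, and shows that for $\ell > 1$ the convergence is only logarithmically slow.

```python
import math


def log_rescaled_drift(log_eps, r, q, ell):
    """log of (t_eps + r)^(ell-1) * exp(-q (t_eps + r)) / eps, evaluated in log space.

    Passing log(eps) < 0 instead of eps lets eps go far below the float range.
    t_eps = |ln eps|/q + (ell - 1)/q * ln|ln eps| is the cutoff time scale.
    """
    big_l = -log_eps                       # |ln eps| for 0 < eps < 1
    t = big_l / q + (ell - 1) / q * math.log(big_l)
    return (ell - 1) * math.log(t + r) - q * (t + r) + big_l


q, ell, r = 0.5, 2, 1.3
limit = -(ell - 1) * math.log(q) - q * r    # log of q^{-(ell-1)} e^{-q r}
for log_eps in (-50.0, -1e3, -1e8):
    # the gap decays only like ln|ln eps| / |ln eps|
    print(log_eps, log_rescaled_drift(log_eps, r, q, ell) - limit)
```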

4.4. Proof of Theorem 2. We keep the notation (4.7) of the proof of Theorem 1. By (4.8) it is enough to prove that the limit We recall the definition of $\Lambda^x(\varepsilon)$ in (4.7) and the limit (4.9). By (4.10) we have For $p \geqslant 1$, the shift linearity given in item d) of Lemma 2.1 implies Combining (4.12) and (4.13) we infer Hence (4.12) and (4.14) imply that the limit (4.11) exists if and only if the right-hand side of (4.14) has exactly one element. This is equivalent to $\omega(x)$ being contained in a sphere in $\mathbb{R}^d$ with respect to the Euclidean distance. For $p \in (0, 1)$ the shift linearity is not valid and we cannot proceed beyond (4.12). Consequently, (4.12) holds true and the limit (4.11) exists if and only if for all $\lambda > 0$ the function This finishes the proof of Theorem 2.
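The sphere criterion can be visualized in a linear toy model $b(x) = Qx$ with $q_x = 1$ and $\ell_x = 1$, where, by our reading, $\omega(x)$ is the limit set of the normalized orbit $e^{t}e^{-Qt}x$. For $p \geqslant 1$, shift linearity reduces $W_p(\varepsilon^{-1}X^0_t(x) + O_\infty, O_\infty)$ to $|\varepsilon^{-1}X^0_t(x)|$, so a profile can only exist if the norm is constant on $\omega(x)$. The Python sketch compares a rotation, whose $\omega(x)$ is a circle, with a hypothetical non-normal matrix of our choosing with the same eigenvalues $1 \pm 2i$, for which $\omega(x)$ is an ellipse and the normalized norm keeps oscillating between about $\sqrt2 - 1$ and $\sqrt2 + 1$.

```python
import numpy as np
from scipy.linalg import expm


def normalized_norms(Q, x, q, ts):
    """|e^{q t} e^{-Q t} x| along the times ts (case ell = 1)."""
    return np.array([np.exp(q * t) * np.linalg.norm(expm(-Q * t) @ x) for t in ts])


ts = np.linspace(10.0, 10.0 + np.pi, 400)    # one full period for angular velocity 2
x = np.array([1.0, 0.0])

Q_normal = np.array([[1.0, -2.0], [2.0, 1.0]])   # rotation: omega(x) is a circle
S = np.array([[1.0, 2.0], [0.0, 1.0]])           # hypothetical non-orthogonal change of basis
Q_skew = S @ Q_normal @ np.linalg.inv(S)         # same spectrum 1 +- 2i, but non-normal

for name, Q in (("normal", Q_normal), ("non-normal", Q_skew)):
    n = normalized_norms(Q, x, 1.0, ts)
    print(name, n.min(), n.max())    # constant vs. oscillating between ~0.414 and ~2.414
```

In the non-normal case, different sequences $\varepsilon_n \to 0$ sample different points of the ellipse, which is the mechanism by which the limit (4.11) can fail to exist.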
Appendix A. Existence of the invariant measure
A.1. Invariant distribution $\mu^\varepsilon$. In the sequel we show the existence of a unique invariant distribution $\mu^\varepsilon$ of the solution of (1.1) for any $\varepsilon > 0$. We stress that, beyond the existence of moments (Hypothesis 2), this requires no regularity whatsoever, such as absolute continuity, in our setting. For instance, our setting covers nonlinear oscillators with degenerate noise as in Example 3.2.
We recall the standing assumptions: Hypothesis 1 with $\delta > 0$ and Hypothesis 2 with $p^* > 0$. By [7], p. 388, for the existence of the invariant probability measure $\mu^\varepsilon$ it is enough to verify the following condition. For some $x \in \mathbb{R}^d$, the limit Hypotheses 1 and 2 imply inequality (D.3), p. 71 in [3]. That is to say, for $\gamma \in (0, 1 \wedge p^*)$ there exist positive constants $C_1$, $C_2$, $C_3$ such that for all $x \in \mathbb{R}^d$, $\varepsilon > 0$, $t \geqslant 0$, A = εΠ, c = ε . Inequality (A.2) implies (A.1) with the help of the Markov inequality.
For the uniqueness, it is enough to verify the following condition given in Theorem 11.4.3 in [9]. For any given positive numbers $\eta$, $\delta$ and $R$, there exists a positive constant $S$ such that Hypotheses 1, 2 and the additivity of the noise imply (D.5), p. 71 in [3]. In other words, for any The preceding inequality implies (A.3) with the help of the Markov inequality.
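Appendix B. $L^p$ estimates for $p \in (0, p^*)$
We recall the Lévy-Khinchin formula of $L$ with characteristic triplet $(a, \Sigma, \nu)$,
$$\mathbb{E}\big[e^{i\langle u, L_t\rangle}\big] = \exp\Big(t\Big(i\langle a, u\rangle - \frac{1}{2}\langle \Sigma u, u\rangle + \int_{\mathbb{R}^d} \big(e^{i\langle u, z\rangle} - 1 - i\langle u, z\rangle \mathbf{1}_{\{|z| \leqslant 1\}}\big)\, \nu(dz)\Big)\Big), \qquad u \in \mathbb{R}^d,$$
and the pathwise Lévy-Itô representation
$$L_t = a t + \Sigma^{1/2} B_t + \int_0^t \int_{\{|z| \leqslant 1\}} z\, \tilde N(ds, dz) + \int_0^t \int_{\{|z| > 1\}} z\, N(ds, dz),$$
where $(B_t)_{t \geqslant 0}$ is a standard Brownian motion in $\mathbb{R}^d$, $N$ is a Poisson random measure on $[0, \infty) \times \mathbb{R}^d$ with intensity measure $dt \otimes \nu(dz)$ and $\tilde N$ is the compensated counterpart of $N$. See [16] for further details on Lévy processes. We recall the standing assumptions: Hypothesis 1 with $\delta > 0$ and Hypothesis 2 with $p^* > 0$.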
B.1. Localization. We start with the probability estimate of the event where Y x is given in (4.2).
Lemma B.1. For any $\gamma \in (0, p^* \wedge 1]$ there is a positive constant $C$ such that for any $\vartheta \geqslant 1$, $x \in \mathbb{R}^d$ and $t \geqslant 0$ we have
In particular, it follows We continue term by term. By the Chebyshev inequality we obtain Finally, for $\gamma \in (0, p^* \wedge 1]$ we have where we have used the subadditivity of the power $\gamma$ in the sense of Subsection 1.1.2, see formula (1.6) in [15]. This finishes the proof of the statement.
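The subadditivity used in the last step is the elementary inequality $|a + b|^\gamma \leqslant |a|^\gamma + |b|^\gamma$ for $\gamma \in (0, 1]$, which follows from the triangle inequality and the concavity of $s \mapsto s^\gamma$ (so that $(s + t)^\gamma \leqslant s^\gamma + t^\gamma$ for $s, t \geqslant 0$). The short randomized Python check below, with a helper name of our own, is an illustration and not a proof; it also shows the failure for $\gamma = 2$.

```python
import numpy as np


def subadditivity_gap(a, b, gamma):
    """|a|^gamma + |b|^gamma - |a + b|^gamma; nonnegative whenever 0 < gamma <= 1."""
    norm = np.linalg.norm
    return norm(a) ** gamma + norm(b) ** gamma - norm(a + b) ** gamma


rng = np.random.default_rng(0)
pairs = rng.standard_cauchy(size=(1000, 2, 3))   # heavy-tailed test vectors in R^3
for gamma in (0.1, 0.5, 1.0):
    gaps = [subadditivity_gap(a, b, gamma) for a, b in pairs]
    print(gamma, min(gaps))
# For gamma > 1 the inequality fails: a = b gives 2|a|^2 - 4|a|^2 < 0 when gamma = 2.
print(subadditivity_gap(np.ones(3), np.ones(3), 2.0))
```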
B.2. First order approximation. We start with some technical preliminaries. Since $u \mapsto |u|^p$ for $p \in (0, 2)$ is not twice continuously differentiable, which is necessary for applying Itô's formula, we use the following $C^2$ approximation of the norm: $|x|_c := \sqrt{|x|^2 + c^2}$, $c > 0$, with the limiting case $|x|_0 = |x|$. It is well behaved in the following sense. For any $c > 0$ we have
$$c \leqslant |x|_c \leqslant |x| + c, \qquad \nabla |x|_c = \frac{x}{|x|_c} \qquad \text{and} \qquad 0 \leqslant \frac{|x|}{|x|_c} < 1.$$
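The properties of the regularized norm $|x|_c$ listed above, together with the chain-rule gradient $\nabla G(x) = p\,|x|_c^{p-2}x$ of $G(x) = |x|_c^p$, can be checked numerically. The Python sketch below (helper names ours) compares the closed-form gradients with central finite differences at a random point. Note that for $0 < p < 2$ the factor $|x|_c^{p-2} \leqslant c^{p-2}$ stays bounded only because $c > 0$, which is why the regularization is needed.

```python
import numpy as np


def smooth_norm(x, c):
    """|x|_c = sqrt(|x|^2 + c^2): smooth in x for c > 0, equal to |x| for c = 0."""
    return np.sqrt(x @ x + c * c)


def grad_smooth_norm(x, c):
    return x / smooth_norm(x, c)


def grad_G(x, c, p):
    """Gradient of G(x) = |x|_c^p, namely p |x|_c^(p-2) x."""
    return p * smooth_norm(x, c) ** (p - 2) * x


def central_difference(f, x, h=1e-6):
    e = np.eye(x.size)
    return np.array([(f(x + h * e[i]) - f(x - h * e[i])) / (2 * h) for i in range(x.size)])


rng = np.random.default_rng(0)
x, c, p = rng.normal(size=3), 0.3, 1.5
print(c <= smooth_norm(x, c) <= np.linalg.norm(x) + c)
print(np.abs(central_difference(lambda y: smooth_norm(y, c), x) - grad_smooth_norm(x, c)).max())
print(np.abs(central_difference(lambda y: smooth_norm(y, c) ** p, x) - grad_G(x, c, p)).max())
```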
Furthermore, it is straightforward to verify for $G(x) = |x|_c^p$ the following calculations For details of the estimates, we refer to p. 69 in [3]. Since $p \in (0, 2)$ and $c \leqslant |x|_c$, we obtain
Proposition 2. We keep the notation of Theorem 1. Then for any $x \in \mathbb{R}^d$, $r \in \mathbb{R}$ and $p \in (0, p^*)$ it follows
Proof. By the domination property of the Wasserstein distance in Lemma 2.1 it is enough to show the preceding limit in the respective $L^p$ space. By (4.1) we have $dY^\varepsilon$ where $C_{p^*}$ is a positive constant. Since $(Y^x_t)_{t \geqslant 0}$ satisfies a dissipative linear equation, it exhibits the same integrability as $L$, which is straightforward to verify. There are a positive constant $\tilde C_{p^*}$ and a function $S_{p^*}(t)$ of at most polynomial order such that
$$\mathbb{E}[|Y^x_t|^{p^*}] \leqslant \tilde C_{p^*}\, \mathbb{E}[|L_t|^{p^*}] \leqslant \tilde C_{p^*}\, S_{p^*}(t) \qquad \text{for all } t \geqslant 0. \tag{B.6}$$