Sticky nonlinear SDEs and convergence of McKean-Vlasov equations without confinement

We develop a new approach to studying the long-time behaviour of solutions to nonlinear stochastic differential equations in the sense of McKean, as well as propagation of chaos for the corresponding mean-field particle system approximations. Our approach is based on a sticky coupling between two solutions to the equation. We show that the distance process between the two copies is dominated by a solution to a one-dimensional nonlinear stochastic differential equation with a sticky boundary at zero. This new class of equations is then analyzed carefully. In particular, we show that the dominating equation exhibits a phase transition. In the regime where the Dirac measure at zero is the only invariant probability measure, we prove exponential convergence to equilibrium both for the one-dimensional equation and for the original nonlinear SDE. Similarly, propagation of chaos is shown by a componentwise sticky coupling and comparison with a system of one-dimensional nonlinear SDEs with sticky boundaries at zero. The approach applies to equations without a confinement potential and to interaction terms that are not of gradient type.


Introduction
The main objective of this paper is to study and quantify convergence to equilibrium for McKean-Vlasov type nonlinear stochastic differential equations of the form (1), where (B_t)_{t≥0} is a d-dimensional standard Brownian motion and b : R^d → R^d is a Lipschitz continuous function. This nonlinear SDE is the probabilistic counterpart of the Fokker-Planck equation (2), which describes the time evolution of the density u_t of μ̄_t with respect to the Lebesgue measure on R^d. Moreover, we also study uniform-in-time propagation of chaos for the approximating mean-field interacting particle systems (3) with i.i.d. initial values X^{1,N}_0, . . . , X^{N,N}_0, driven by independent d-dimensional Brownian motions. Our results are based on a new probabilistic approach relying on sticky couplings and comparison with solutions to a class of nonlinear stochastic differential equations on the real interval [0, ∞) with a sticky boundary at 0. The study of this type of equations, carried out below, might also be of independent interest.
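Since the displayed equations were not preserved in this extraction, the following sketch makes the setup concrete: the mean-field particle system is simulated by Euler-Maruyama with empirical-mean interaction drift N^{-1} Σ_j b(X^i − X^j). The √2 noise scaling and the linear choice b(z) = −Lz are illustrative assumptions, not taken from the text.

```python
import numpy as np

def simulate_particles(b, N=200, d=2, T=2.0, dt=0.01, rng=None):
    """Euler-Maruyama for the mean-field system
    dX^i = N^{-1} sum_j b(X^i - X^j) dt + sqrt(2) dB^i  (noise scaling assumed)."""
    rng = np.random.default_rng(0) if rng is None else rng
    X = rng.standard_normal((N, d))
    for _ in range(int(T / dt)):
        diffs = X[:, None, :] - X[None, :, :]   # (N, N, d): pairwise X^i - X^j
        drift = b(diffs).mean(axis=1)           # empirical mean over j
        X = X + drift * dt + np.sqrt(2 * dt) * rng.standard_normal((N, d))
    return X

L = 1.0
b = lambda z: -L * z          # gradient of the strongly convex W(z) = L|z|^2/2
X = simulate_particles(b)
# since b is anti-symmetric, the drift leaves the centre of mass untouched,
# so the empirical mean stays small (it only feels the averaged noise)
print(np.linalg.norm(X.mean(axis=0)))
```

This is only a finite-time, finite-N sketch; the paper's results concern the limits t → ∞ and N → ∞ of exactly this kind of system.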
The equations (1) and (2) have been studied in many works. Often a slightly different setup is considered, where the interaction b is assumed to be of gradient type, i.e., b = −∇W for an interaction potential W : R^d → R, and an additional confinement potential V : R^d → R satisfying lim_{|x|→∞} V(x) = ∞ is included in the equations. The corresponding Fokker-Planck equation occurs for example in the modelling of granular media, see [45,3] and the references therein. Existence and uniqueness of solutions to (1), (2) and (4) have been studied intensively. Introductions to this topic can be found for example in [24,36,37,44], while recent results have been established in [38,27]. Under appropriate conditions, it can be shown that the solutions converge to a unique stationary distribution at some given rate, see e.g. [11,12,7,18,17,26]. In the case without confinement considered here, convergence to equilibrium of (μ̄_t)_{t≥0} defined by (1) can only be expected for centered solutions, or after recentering around the center of mass of μ̄_t. It has first been analyzed in [11,12] by an analytic approach and under the assumption that b = −∇W for a convex function W. In particular, exponential convergence to equilibrium has been established under the strong convexity assumption Hess(W) ≥ ρ Id for some ρ > 0, and polynomial convergence in the case where W is only degenerately strictly convex. Similar results and some extensions have been derived in [34,13] using a probabilistic approach. Our first contribution aims at complementing these results and extending them to non-convex interaction potentials and to interaction functions that are not of gradient type. More precisely, we consider drifts of the form b(z) = −Lz + γ(z), where L ∈ (0, ∞) is a positive real constant and γ : R^d → R^d is a bounded function. We then give conditions on γ ensuring exponential convergence of centered solutions of (1) to a unique stationary distribution in the standard L^1 Wasserstein metric.
More generally, we show in Theorem 1 that under these conditions there exist constants M, c ∈ (0, ∞) depending only on L and γ such that if (μ̄_t)_{t≥0} and (ν̄_t)_{t≥0} are the marginal distributions of two solutions of (1), then for all t ≥ 0,
W^1(μ̄_t, ν̄_t) ≤ M e^{−ct} W^1(μ̄_0, ν̄_0).
Using a coupling approach, related results have been derived in the previous works [18,17] for the case where an additional confinement term is included in the equations. However, the arguments in these works rely on treating the equation with confinement and interaction term as a perturbation of the corresponding equation without interaction term, which has good ergodic properties. In the unconfined case this approach does not work, since the equation without interaction is transient and hence does not admit an invariant probability measure. Moreover, we are not aware of results for this framework, with non-convex interaction potentials and non-gradient interaction functions, that rely on classical analytical methods. We therefore develop a new approach for analyzing the equation without confinement.
Our approach is based on sticky couplings, an idea first developed in [19] to control the total variation distance between the marginal distributions of two non-degenerate diffusion processes with identical noise but different drift coefficients. Since two solutions of (1) differ only in their drifts, we can indeed couple them using a sticky coupling in the sense of [19]. It can then be shown that the coupling distance process can be controlled by the solution (r_t)_{t≥0} of a nonlinear SDE on [0, ∞) with a sticky boundary at 0 of the form
dr_t = (b̃(r_t) + a P(r_t > 0)) dt + 2 𝟙_(0,∞)(r_t) dW_t.     (6)
Here b̃ is a real-valued function on [0, ∞) satisfying b̃(0) = 0, a is a positive constant, and (W_t)_{t≥0} is a one-dimensional standard Brownian motion. Solutions to SDEs with diffusion coefficient proportional to 𝟙_(0,∞)(r), as in (6), have a sticky boundary at 0, i.e., if the drift at 0 is strictly positive, then the set of all time points t ∈ [0, ∞) such that r_t = 0 is a fractal set with strictly positive Lebesgue measure that does not contain any open interval. Sticky SDEs have attracted wide interest, starting from [22,23] in the one-dimensional case. Multivariate extensions have been considered in [28,46,47], building upon results obtained in [35,41,42], while corresponding martingale problems have been investigated in [43]. Versions of sticky processes occur among others in the natural sciences [8,25] and in finance [30]. Note that in general no strong solution for this class of SDEs exists, as illustrated in [14]. We refer to [21,2] and the references therein for recent contributions on this topic. Note, however, that in contrast to standard sticky SDEs, the equation (6) is nonlinear in the sense of McKean. We are not aware of previous studies of such nonlinear sticky equations, which seems to be a very interesting topic on its own. Intuitively, one would hope that as time evolves, more mass gets stuck at 0, i.e., P(r_t > 0) decreases.
As a consequence, the drift at 0 in equation (6) decreases, which in turn forces even more mass to get stuck at 0. Therefore, under appropriate conditions one may hope that P(r_t = 0) converges to 1 as t → ∞. On the other hand, if a is too large, then the drift at 0 might be too strong, so that not all of the mass gets stuck at 0 eventually. This indicates that there might be a phase transition for the nonlinear sticky SDE, depending on the size of the constant a compared to b̃. In Section 3, we prove rigorously that this intuition is correct. Under appropriate conditions on b̃, we first show that existence and uniqueness in law hold for solutions of (6). Then we prove that for a sufficiently small, the Dirac measure at 0 is the unique invariant probability measure, and geometric ergodicity holds. As a consequence, under corresponding assumptions, the sticky coupling approach yields exponential convergence to equilibrium for the original nonlinear SDE (1). On the other hand, we prove the existence of multiple invariant probability measures for (6) if the smallness condition on a is not satisfied. In this case, we cannot draw conclusions about the distance process underlying the sticky coupling approach: the approach only yields upper bounds, and the existence of multiple invariant measures for the dominating sticky nonlinear SDE does not imply that the underlying distance process fails to converge. If the unconfined SDE (1) has multiple invariant measures and the two copies of the unconfined SDE in the sticky coupling start in two different equilibria, then the law of the distance process does not converge to the Dirac measure at zero. Our results for (1) can also be adapted to deal with nonlinear SDEs on the torus T = R/(2πZ), as considered in [16]. As an example, we discuss the application to the Kuramoto model, for which a more explicit analysis is available [1,4,5,9].
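The existence side of this phase transition can be probed numerically for the linear drift b̃(r) = −L̃r of Proposition 6 below. The stationarity condition coded here is our own heuristic flux-balance reconstruction (the paper's displayed equation (27) was not preserved): the atom at 0 with mass 1−p must balance the density flux of the part of the law with mass p, giving a(1−p) I(p) = 2 with I(p) = ∫_0^∞ exp(−L̃r²/4 + a p r/2) dr. Reassuringly, it reproduces the threshold a/√L̃ = 2/√π stated in Proposition 6.

```python
import math

def I(p, a, L):
    """Closed form of the integral of exp(-L r^2/4 + a p r/2) over r in (0, inf)."""
    x = a * p / (2.0 * math.sqrt(L))
    return math.sqrt(math.pi / L) * math.exp(x * x) * (1.0 + math.erf(x))

def residual(p, a, L=1.0):
    """Heuristic flux-balance condition a (1 - p) I(p) = 2 (our reconstruction,
    not the paper's equation (27))."""
    return a * (1.0 - p) * I(p, a, L) - 2.0

def solve_p(a, L=1.0, tol=1e-12):
    """Bisect for a root p in (0, 1); None when the residual starts negative,
    which for this drift corresponds to delta_0 being the only candidate."""
    lo, hi = 1e-12, 1.0 - 1e-12
    if residual(lo, a, L) <= 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, a, L) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

threshold = 2.0 / math.sqrt(math.pi)   # ~1.128, the threshold a/sqrt(L) of Proposition 6
print(solve_p(1.0))                    # below threshold: no root
print(solve_p(1.5))                    # above threshold: a root p in (0, 1)
```

For a = 1.0 < 2/√π no fixed point exists, while for a = 1.5 the bisection finds one, mirroring the dichotomy of Proposition 6.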
Finally, in addition to studying the long-time behaviour of the nonlinear SDE (1), we are also interested in establishing propagation of chaos for the mean-field particle system approximation (3). The propagation of chaos phenomenon, first introduced by Kac [31], describes the convergence of the empirical measure of the mean-field particle system (3) to the solution of (1). More precisely, in [44,37] it has been shown under weak assumptions on W that for i.i.d. initial laws, the random variables X^{i,N}_t, i ∈ {1, . . . , N}, become asymptotically independent as N → ∞, and the common law µ^N_t of each of these random variables converges to μ̄_t. However, the original results are only valid uniformly over a finite time horizon. Quantifying the convergence uniformly for all times t ∈ R_+ is an important issue. The case with a confinement potential has been studied for example in [17], see also the references therein. Again, the case when there is only interaction is more difficult. Malrieu [34] seems to have been the first to consider the case without confinement. By applying a synchronous coupling, he proved uniform in time propagation of chaos for strongly convex interaction potentials. Later on, assuming that the interaction potential loses strict convexity only in a finite number of points (e.g., W(x) = |x|³), Cattiaux, Guillin and Malrieu [13] have shown uniform in time propagation of chaos with a rate deteriorating with the degeneracy in convexity. In a very recent work, Delarue and Tse [15] prove uniform in time weak propagation of chaos (i.e., observable by observable) on the torus via Lions derivative methods. Remarkably, their results are not limited to the unique invariant measure case.
Our contribution is in the same vein, using probabilistic tools in place of analytic ones. We endow the space R^{Nd} of N-particle configurations with a semimetric l^1 ∘ π, where l^1 is a normalized l^1-distance between configurations x, y ∈ R^{Nd}, given in (7), and π is a projection, given in (8). Let W_{l^1∘π} denote the L^1 Wasserstein semimetric on probability measures on R^{Nd} corresponding to the cost function l^1 ∘ π. Then, under assumptions stated below, we prove uniform in time propagation of chaos for the mean-field particle system in the following sense: Suppose that (X^{1,N}_t, . . . , X^{N,N}_t)_{t≥0} is a solution of (3) such that X^{1,N}_0, . . . , X^{N,N}_0 are i.i.d. with distribution μ̄_0 having finite second moment. Let ν^N_t denote the joint law of the random variables X^{i,N}_t, i ∈ {1, . . . , N}, and let μ̄_t denote the law of the solution of (1) with initial law μ̄_0. Then there exists a constant C ∈ [0, ∞) such that for any N ∈ N,
sup_{t≥0} W_{l^1∘π}(ν^N_t, μ̄_t^{⊗N}) ≤ C N^{−1/2}.
The proof is based on a componentwise sticky coupling and a comparison of the coupling difference process with a system of one-dimensional sticky nonlinear SDEs.
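For concreteness, the normalized l^1 cost between configurations can be sketched as follows; the 1/N normalization and the per-particle Euclidean norm are assumptions on our part, since the displayed definition (7) was not preserved in this extraction.

```python
import numpy as np

def l1_config(x, y):
    """Normalized l^1 distance between configurations x, y in R^{N x d}:
    the average Euclidean displacement per particle (normalization assumed)."""
    N = x.shape[0]
    return np.sum(np.linalg.norm(x - y, axis=1)) / N

x = np.zeros((4, 2))
y = np.array([[3.0, 4.0]] * 4)      # every particle displaced by a 3-4-5 vector
print(l1_config(x, y))              # -> 5.0
```

With this normalization the distance is intensive in N, which is what makes a bound of order N^{-1/2}, uniform in time, meaningful.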
The paper is organised as follows. In Section 2, we state our main results regarding the long-time behaviour of (1). The main results on one-dimensional nonlinear SDEs with a sticky boundary at zero are stated in Section 3. Sections 4 and 5 contain the corresponding results on uniform (in time) propagation of chaos and mean-field systems of sticky SDEs. All the proofs are given in Section 6. In Appendix A, we carry the results over to nonlinear sticky SDEs over T and consider the application to the Kuramoto model.

Notation
The Euclidean norm on R^d is denoted by |·|. For x ∈ R, we write x⁺ = max(0, x). For a space X, which here is either R^d, R^{Nd} or R_+, we denote its Borel σ-algebra by B(X). The space of all probability measures on (X, B(X)) is denoted by P(X). Let µ, ν ∈ P(X). A coupling ξ of µ and ν is a probability measure on (X × X, B(X) ⊗ B(X)) with marginals µ and ν. Γ(µ, ν) denotes the set of all couplings of µ and ν. The L^1 Wasserstein distance with respect to a distance function d : X × X → [0, ∞) is defined by
W_d(µ, ν) = inf_{ξ ∈ Γ(µ,ν)} ∫ d(x, y) ξ(dx dy).
We write W^1 if the underlying distance function is the Euclidean distance.
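In dimension one, the L^1 Wasserstein distance defined above is realized by the monotone (quantile) coupling, which pairs sorted samples; a small empirical sketch:

```python
import numpy as np

def w1_empirical(xs, ys):
    """Empirical L^1 Wasserstein distance on R: the optimal coupling for
    the cost |x - y| pairs order statistics (monotone/quantile coupling)."""
    xs, ys = np.sort(xs), np.sort(ys)
    return np.mean(np.abs(xs - ys))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100000)
y = rng.normal(2.0, 1.0, 100000)
print(w1_empirical(x, y))   # close to 2.0: for equal shapes, W^1 is the mean shift
```

Between two Gaussians with the same variance, the monotone coupling is a pure translation, so W^1 equals the distance between the means.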
We denote by C(R + , X) the set of continuous functions from R + to X, and by C 2 (R + , X) the set of twice continuously differentiable functions.
Consider a probability space (Ω, A, P) and a measurable function r : Ω → C(R_+, X). Then P = P ∘ r^{−1} denotes the law on C(R_+, X), and P_t = P ∘ r_t^{−1} the marginal law on X at time t.

Long-time behaviour of McKean-Vlasov diffusions
We establish our results regarding (1) and (3) under the following assumption on b.
B1. The function b : R^d → R^d is Lipschitz continuous and anti-symmetric, i.e., b(z) = −b(−z), and there exist L ∈ (0, ∞), a bounded function γ : R^d → R^d and a Lipschitz continuous function κ : R_+ → R such that b(z) = −Lz + γ(z) and the following conditions are satisfied for all x, y ∈ R^d:
⟨x − y, γ(x) − γ(y)⟩ ≤ κ(|x − y|) |x − y|²,     (12)
and lim sup_{r→∞} κ(r) < L.     (13)
Let b̃(r) = (κ(r) − L)r. If (13) holds, then there exist R_0, R_1 ≥ 0 such that b̃(r) < 0 for any r > R_0. In addition, we assume B2.
Often drifts of gradient type are considered, i.e., b ≡ −∇U for some potential U ∈ C². Then B1 is satisfied, for instance, for L-strongly convex potentials, and condition (12) holds with κ ≡ 0. In this case, B2 reduces to ‖γ‖_∞ ≤ √L/8. But the assumptions also include asymptotically L-strongly convex potentials, such as double-well potentials, and more general drifts, provided the deviation from the linear term −Lz, represented by the function γ, is sufficiently small in terms of the generalized one-sided Lipschitz bound and the bound in the supremum norm. In particular, this can always be achieved by considering a sufficiently small multiple of γ.
Additionally, we consider the following condition on the initial distribution.
Note that under conditions B1 and B3, unique strong solutions of (1) and (3) exist, see e.g. [13, Theorem 2.6]. In addition, note that since b is assumed to be anti-symmetric, an easy localisation argument shows that E[X̄_t] = 0 for all t ≥ 0. Suppose f : R_+ → R_+ is an increasing, concave function vanishing at zero. Then d(x, y) = f(|x − y|) defines a distance. The corresponding L^1 Wasserstein distance is denoted by W_f. Note that in the case f(t) = t for all t ≥ 0, W_f is simply W^1.
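The centering property rests on the exact cancellation of anti-symmetric pairwise interactions: for b(z) = −b(−z) the terms b(x_i − x_j) and b(x_j − x_i) cancel, so the particle system's centre of mass feels no drift. A quick numerical check, with an illustrative anti-symmetric Lipschitz drift:

```python
import numpy as np

# For anti-symmetric b, sum_{i,j} b(x_i - x_j) = 0 exactly: each pair (i, j)
# contributes b(z) + b(-z) = 0, so the centre of mass of the particle
# system has no interaction drift.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 3))          # 50 particles in R^3
b = lambda z: -z + np.tanh(z)             # anti-symmetric, Lipschitz (illustrative)
diffs = x[:, None, :] - x[None, :, :]
total_drift = b(diffs).sum(axis=(0, 1))
print(np.max(np.abs(total_drift)))        # zero up to floating-point rounding
```

The same cancellation, applied to the nonlinear SDE via localisation, gives the centering identity above.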
Theorem 1 (Contraction for the nonlinear SDE). Assume B1 and B2. Let μ̄_0, ν̄_0 be probability measures on (R^d, B(R^d)) satisfying B3. For any t ≥ 0, let μ̄_t and ν̄_t denote the laws of X̄_t and Ȳ_t, where (X̄_s)_{s≥0} and (Ȳ_s)_{s≥0} are solutions of (1) with initial distributions μ̄_0 and ν̄_0, respectively. Then, for all t ≥ 0,
W_f(μ̄_t, ν̄_t) ≤ M_1 e^{−c̃t} W_f(μ̄_0, ν̄_0),
where the function f is defined by (37) and c̃ and M_1 are explicit constants depending only on L and γ.
Proof. The proof is postponed to Section 6.2.1.
The construction and definition of the underlying distance function f (|x − y|) mentioned in Theorem 1 is based on the one introduced by [20].
To prove Theorem 1 we use a coupling (X̄_t, Ȳ_t)_{t≥0} of two copies of solutions to the nonlinear stochastic differential equation (1) with different initial conditions. The coupling (X̄_t, Ȳ_t)_{t≥0} is defined as the weak limit of a family of couplings (X̄^δ_t, Ȳ^δ_t)_{t≥0}, parametrized by δ > 0. Roughly, this family is a mixture of synchronous and reflection couplings and can be described as follows: when the two copies are far apart, a reflection coupling is applied, while at distances of order δ we take an interpolation of synchronous and reflection coupling. We argue that the family of couplings {(X̄^δ_t, Ȳ^δ_t)_{t≥0} : δ > 0} is tight and that a subsequence {(X̄^{δ_n}_t, Ȳ^{δ_n}_t)_{t≥0} : n ∈ N} converges to a limit (X̄_t, Ȳ_t)_{t≥0}. This limit is a coupling which we call the sticky coupling associated to (1).

Nonlinear SDEs with sticky boundaries
Consider nonlinear SDEs with a sticky boundary at 0 of the form
dr_t = (b̃(r_t) + P_t(g)) dt + 2 𝟙_(0,∞)(r_t) dW_t,     (23)
where b̃ : [0, ∞) → R is some continuous function and P_t(g) = ∫_{R_+} g(r) P_t(dr) for some measurable function g : [0, ∞) → R.
In this section we establish existence, uniqueness in law and comparison results for solutions of (6). Consider a filtered probability space (Ω, A, (F_t)_{t≥0}, P) and a probability measure µ on R_+. We call an (F_t)_{t≥0}-adapted process (r_t, W_t)_{t≥0} a weak solution of (23) with initial distribution µ if the following holds: µ = P ∘ r_0^{−1}, the process (W_t)_{t≥0} is a one-dimensional (F_t)_{t≥0} Brownian motion w.r.t. P, and the process (r_t)_{t≥0} is non-negative, continuous, and satisfies (23) almost surely. Note that the sticky nonlinear SDE given in (6) is a special case of (23) with g(r) = a 𝟙_(0,∞)(r).

Existence, uniqueness in law, and a comparison result
Let W = C(R_+, R) be the space of continuous functions endowed with the topology of uniform convergence on compact sets, and let B(W) be the corresponding Borel σ-algebra. If (r_t, W_t)_{t≥0} is a solution of (23) on (Ω, A, P), then we denote by P = P ∘ r^{−1} its law on (W, B(W)). We say that uniqueness in law holds for (23) if for any two solutions (r^1_t)_{t≥0} and (r^2_t)_{t≥0} of (23) with the same initial law, the distributions of (r^1_t)_{t≥0} and (r^2_t)_{t≥0} on (W, B(W)) are equal. We impose the following assumptions on b̃, g and the initial condition µ:
H1. b̃ is a Lipschitz continuous function with Lipschitz constant L̃ and b̃(0) = 0.
H2. g is a left-continuous, non-negative, non-decreasing and bounded function.
H3. There exists p > 2 such that the p-th order moment of the law µ is finite.
Note that for (6), condition H2 is satisfied if a is a positive constant. It follows from H1 and H2 that there is a constant C < ∞ such that the linear growth condition |b̃(r)| + g(r) ≤ C(1 + r) holds for all r ∈ R_+. In order to get a solution to (23) on R_+ we extend the function b̃ to R by setting b̃(r) = 0 for r < 0. Note that any solution (r_t)_{t≥0} with initial distribution supported on R_+ satisfies almost surely r_t ≥ 0 for all t ≥ 0. This follows from the Itō-Tanaka formula applied to F(r) = 𝟙_(−∞,0)(r) r, where ℓ^{0−}_t(r) denotes the left local time of r at 0. Existence and uniqueness in law for (23) are a direct consequence of a stronger result that we now introduce. To study existence and uniqueness, and to compare two solutions of (23) with different drifts, we establish existence of a synchronous coupling of two copies of (23), driven by the same Brownian motion, where η ∈ Γ(µ, ν) for µ, ν ∈ P(R_+).
Proof. The proof is postponed to Section 6.3.1.
Remark 4. We note that by the comparison result we can deduce uniqueness in law for the solution of (23).

Invariant measures and phase transition for (6)
Under the following conditions on the drift function b̃ we exhibit a phase transition phenomenon for the model (6), where as compared to (23) we focus on the case P_t(g) = a P[r_t > 0].
Theorem 5. Suppose H1 holds and lim sup_{r→∞} (r^{−1} b̃(r)) < 0. Then the Dirac measure at 0, δ_0, is an invariant probability measure for (6). If there exists p ∈ (0, 1) solving the fixed-point equation (26), then the probability measure π on [0, ∞) given by (28) is another invariant probability measure for (6).
Proof. The proof is postponed to Section 6.3.2.
In our next result we specify a necessary and sufficient condition for the existence of a solution of (26).

Proposition 6. Suppose that b̃(r) in (6) is of the form b̃(r) = −L̃r with a constant L̃ > 0. If a/√L̃ > 2/√π, then there exists a unique p̂ solving (27). In particular, the Dirac measure δ_0 and the measure π given in (28) with p̂ are invariant measures for (6). On the other hand, if a/√L̃ ≤ 2/√π, then there exists no p̂ solving (27).

Proof. The proof is postponed to Section 6.3.2.

Convergence for sticky nonlinear SDEs of the form (6)
Under H1 and the following additional assumption, we establish geometric convergence in Wasserstein distance of the marginal law of the solution r_t of (6) to the Dirac measure at 0:
H4. It holds lim sup_{r→∞} (r^{−1} b̃(r)) < 0 and a ≤ (2
Theorem 7. Suppose H1 and H4 hold. Then the Dirac measure at 0, δ_0, is the unique invariant probability measure of (6). Moreover, if (r_s)_{s≥0} is a solution of (6) with r_0 distributed according to an arbitrary probability measure µ on (R_+, B(R_+)), then for all t ≥ 0,
W_f(P_t, δ_0) = E[f(r_t)] ≤ e^{−ct} E[f(r_0)],
where f and c are given by (37) and (36), with a and b̃ given in (6), and R̃_0 and R̃_1 given in (29) and (30).
Proof. The proof is postponed to Section 6.3.3.

Uniform in time propagation of chaos
To prove uniform in time propagation of chaos, we consider the L^1 Wasserstein distance with respect to the cost function f̄_N ∘ π : R^{Nd} × R^{Nd} → R_+, with π given in (8) and f̄_N given in (32), where f : R_+ → R_+ is defined in (37). This distance is denoted by W_{f,N}. Note that f̄_N is equivalent to l^1 defined in (7). We note that since π defines a projection from R^{Nd} onto the hyperplane H_N ⊂ R^{Nd} given in (9), for μ̂ and ν̂ on H_N, W_{f,N}(μ̂, ν̂) coincides with the Wasserstein distance Ŵ_{f̄_N}(μ̂, ν̂) given by (33), and W_{l^1∘π}(μ̂, ν̂) = Ŵ_{l^1}(μ̂, ν̂), where f̄_N and l^1 are given in (32) and (7), respectively, and where Ŵ_{l^1}(μ̂, ν̂) is defined as in (33) with respect to the distance l^1.
Proof. The proof is postponed to Section 6.4.
, respectively, with finite fourth moment. An easy inspection and adaptation of the proof of Theorem 8 shows that if B1 holds, then the corresponding estimate holds, where f, c̃ and M_1 are defined as in Theorem 8.

System of N sticky SDEs
Consider a system of N one-dimensional SDEs with sticky boundaries at 0, given by (34). The results on existence, uniqueness and the comparison theorem for solutions of sticky nonlinear SDEs mostly carry over directly to solutions of (34) and are applied to prove propagation of chaos in Theorem 8. Let µ be a probability distribution on R^N_+. For a weak solution with initial distribution µ, each process (r^i_t)_{t≥0} is non-negative and continuous, and satisfies (34) almost surely for any i ∈ {1, . . . , N} and t ∈ R_+. To show existence and uniqueness in law of a weak solution ({r^i_t}_{i=1}^N, {W^i_t}_{i=1}^N)_{t≥0}, we suppose H1 and H2 for b̃ and g.
It follows that there exists a constant C < ∞ such that a linear growth condition analogous to the one in Section 3.1 holds, so that the solution is non-explosive. If the initial distribution is supported on R^N_+, then, along the same lines as for the nonlinear SDE in Section 3.1, the solution ({r^i_t}_{i=1}^N)_{t≥0} satisfies r^i_t ≥ 0 almost surely for any i = 1, . . . , N and t ≥ 0 by H1 and H2.
Existence and uniqueness in law for (34) are a direct consequence of a stronger result that we now introduce. To study existence and uniqueness, and to compare two solutions of (34) with different drifts, we establish existence of a synchronous coupling of two copies of (34). Let W^N = C(R_+, R^N) be the space of continuous functions from R_+ to R^N, endowed with the topology of uniform convergence on compact sets, and let B(W^N) denote its Borel σ-algebra.
for any r ∈ R + , Proof. The proof is postponed to Section 6.5.
Remark 11. We note that by the comparison result we can deduce uniqueness in law for the solution of (34).

Proofs
Before proving the statements of Sections 2-5, let us give an overview of the proofs. The first subsection gives the definition of the underlying distance function f used in Theorem 1, Theorem 7 and Theorem 8. Section 6.2 and Section 6.3 provide proofs of the convergence result for the nonlinear SDE (Theorem 1), using the sticky coupling approach, and of the results for the sticky nonlinear SDE (Theorem 7). Note that both Theorem 1 and Theorem 7 rely on the auxiliary Lemmata 14-16, which provide a comparison result and a two-step approximation of the sticky nonlinear SDE. The existence of a solution to the sticky nonlinear SDE and a comparison result are essential to show contraction in this approach. In Section 6.4 and Section 6.5, the proofs of propagation of chaos for the mean-field particle system and of the results for the system of sticky SDEs are given. Note that the techniques used for the particle system and for the system of N sticky SDEs are partially similar to those in the nonlinear case. In particular, the proofs of Theorem 8 and Theorem 10 and their auxiliary Lemmata 18 and 20-23 have a structure similar to those of Theorem 2 and Theorem 3 and their auxiliary Lemmata 12-16, respectively.

Definition of the metrics
In Theorem 1, Theorem 7 and Theorem 8 we consider Wasserstein distances based on a carefully designed concave function f : R_+ → R_+ that we now define. In addition, we derive useful properties of this function that will be used in our proofs of Theorem 1, Theorem 8 and Theorem 7. Let a ∈ R_+ and b̃ : R_+ → R be such that H4 is satisfied, with R̃_0 and R̃_1 defined in (29) and (30). We define the auxiliary functions ϕ and g by (35) and (36). It holds that ϕ(r) = ϕ(R̃_0) for r ≥ R̃_0, with R̃_0 given in (29), g(r) = g(R̃_1) ∈ [1/2, 3/4] for r ≥ R̃_1, and g(r) ∈ [1/2, 1] for all r ∈ R_+, by (36) and H4. We then define the increasing function f by (37). The construction is adapted from the function f given in [20]. Here, the function g has an extra term; as we will see in the proofs of Theorem 1 and Theorem 7, this term serves to control the term a P[r_t > 0]. We observe that f is concave, since ϕ and g are decreasing, and that f satisfies (39) and (40). Indeed, by construction of f, (40) holds for 0 ≤ r < R̃_1 by (38). To show (40) for r > R̃_1, note that f′′(r) = 0 and f′(r) ≥ ϕ(R̃_0)/2 hold for r > R̃_1. Hence, by the definition (30) of R̃_1, we obtain (41) for r > R̃_1. Furthermore, we have (42) and (43). We insert (42) and (43) in (41) and use (36) to obtain (44). By H4 and (36), we get a further estimate; combining it with (44) gives (40) for r > R̃_1. Hence, the choice of the underlying function f for the Wasserstein distance ensures (39) and (40). These properties guarantee that the term a P[r_t > 0] is controlled in (6) and that contraction with rate c is obtained in Theorem 1, Theorem 7 and Theorem 8.
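The displayed formulas (35)-(38) did not survive extraction. As a schematic only, distance functions of this type (following [20]) are typically built as below; the constants here are illustrative, chosen for the generator $2 f'' + \tilde b f'$ of (6), and are not the paper's exact definitions.

```latex
f(r) \;=\; \int_0^{r \wedge \tilde R_1} \varphi(s)\, g(s)\, ds, \qquad
\varphi(s) \;=\; \exp\Bigl(-\tfrac{1}{2}\int_0^s \tilde b(u)^+\, du\Bigr),
\qquad
\Phi(s) \;=\; \int_0^s \varphi(u)\, du,
\]
with $g$ decreasing from $1$ to a value in $[1/2,1)$, e.g.
\[
g(s) \;=\; 1 - \frac{c}{4}\int_0^s \frac{\Phi(u)}{\varphi(u)}\, du .
\]
Then $f' = \varphi g$, so $2 f'' + \tilde b f' \le 2\varphi g' \le -\tfrac{c}{2}\,\Phi
\le -\tfrac{c}{2}\, f$, which is the shape of the contraction estimate (40).
```

Choosing $c$ small enough keeps $g \ge 1/2$, which is consistent with the bounds $g \in [1/2,1]$ and $f'(r) \ge \varphi(\tilde R_0)/2$ stated above.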

Proof of Section 2
First, we prove Theorem 1 using Theorem 2 and properties of the carefully constructed function f; afterwards we show Theorem 2. To prove that the dominating process r_t exists, we make use of the results on the sticky nonlinear SDE, which are proven in Section 6.3.1.

Proof of Theorem 2
Note that the nonlinear SDE (21) has Lipschitz continuous coefficients, which yields existence and uniqueness of the coupling (X̄^δ_t, Ȳ^δ_t)_{t≥0}. For ε > 0 we define, as in [19, Lemma 8], a C² approximation of the square root. Then, by Itō's formula, we obtain the dynamics of the approximated distance process, and we take the limit ε → 0.
Since rc^δ is Lipschitz continuous with rc^δ(0) = 0, we apply Lebesgue's dominated convergence theorem to show convergence of the integrals with respect to time t. More precisely, the integrand is dominated uniformly in ε. Hence, the stochastic integral converges along a subsequence, almost surely, to ∫_0^t 2 rc^δ(r^δ_s) dW^δ_s, see [40, Chapter 4, Theorem 2.12]. Thus we obtain (46).
Proof. Note that (r^δ_t)_{t≥0} and (r^{δ,ǫ}_t)_{t≥0} have the same initial distribution and are driven by the same noise. Since the drift of (r^δ_t)_{t≥0} is smaller than the drift of (r^{δ,ǫ}_t)_{t≥0} for ǫ < ǫ_0, the result follows by Lemma 14.
Proof of Theorem 2. We consider the nonlinear process (U^{δ,ǫ}_t)_{t≥0} = (X̄^δ_t, Ȳ^δ_t, r^{δ,ǫ}_t)_{t≥0} on R^{2d+1} for each ǫ, δ > 0. We denote by P^{δ,ǫ} the law of U^{δ,ǫ} on the space C(R_+, R^{2d+1}). We define by X, Y : C(R_+, R^{2d+1}) → C(R_+, R^d) and r : C(R_+, R^{2d+1}) → C(R_+, R) the canonical projections onto the first d components, onto the second d components and onto the last component, respectively. By B1 and B3, following the same lines as the proof of Lemma 15, see (56), a moment bound holds for each T > 0, with a constant C depending on T, L, ‖γ‖_Lip, ‖γ‖_∞ and on the fourth moments of µ_0 and ν_0. Hence, by [32, Corollary 14.9], for each ǫ > 0 there exists a subsequence δ_n → 0 such that (P^{δ_n,ǫ}_T)_{n∈N} converges. By a diagonalization argument, and since {P^ǫ_T : T ≥ 0} is a consistent family, cf. [32, Theorem 5.16], there exists a probability measure P^ǫ on C(R_+, R^{2d+1}) such that for all ǫ there exists a subsequence δ_n along which (P^{δ_n,ǫ})_{n∈N} converges to P^ǫ. As in the proof of Lemma 16, we repeat this argument for the family of measures (P^ǫ)_{ǫ>0}. Hence, there exists a subsequence ǫ_m → 0 such that (P^{ǫ_m})_{m∈N} converges to a measure P. Let (X̄_t, Ȳ_t, r_t)_{t≥0} be a process on R^{2d+1} with distribution P on (Ω̃, F̃, P̃).
Since (X̄^δ_t)_{t≥0} and (Ȳ^δ_t)_{t≥0} are solutions of (1), which are unique in law, the marginal laws are preserved in the limit, and therefore (X̄_t)_{t≥0} and (Ȳ_t)_{t≥0} are solutions of (1) as well, with the same initial conditions. Hence P ∘ (X, Y)^{−1} is a coupling of two copies of (1).
Similarly to the proofs of Lemma 15 and Lemma 16, there exist an extended probability space and a one-dimensional Brownian motion (W_t)_{t≥0} such that (r_t, W_t)_{t≥0} is a solution to the dominating sticky nonlinear SDE. In addition, the statement of Lemma 13 carries over to the limiting process (r_t)_{t≥0}, i.e., |X̄_t − Ȳ_t| ≤ r_t for all t ≥ 0, by the weak convergence along the subsequences (δ_n)_{n∈N} and (ǫ_m)_{m∈N} and the Portmanteau theorem.

Proof of Section 3
First, we introduce a family of nonlinear SDEs whose drift and diffusion coefficients are Lipschitz continuous approximations of the drift and diffusion coefficient of (25). Theorem 3 is shown by proving a comparison result for nonlinear SDEs, taking the limit of the approximations in two steps, and identifying the limit with the solution of (25). Then Theorem 5 and Theorem 7 are shown, making use of the careful construction of the function f.

Proof of Theorem 3
We show Theorem 3 via a family of stochastic differential equations, indexed by n, m ∈ N, with Lipschitz continuous coefficients, given by (50), with marginal laws P^{n,m}_t, for some measurable functions (g_m)_{m∈N} and (h_m)_{m∈N}, and where η_{n,m} ∈ Γ(µ_{n,m}, ν_{n,m}) for µ_{n,m}, ν_{n,m} ∈ P(R_+). We identify the weak limit for n → ∞ as a solution of a family of stochastic differential equations, indexed by m ∈ N, given by (51), with P^m_t = Law(r^m_t) and P̃^m_t = Law(s^m_t), and where η_m ∈ Γ(µ_m, ν_m) for µ_m, ν_m ∈ P(R_+). Taking the limit m → ∞, we show in the next step that the solution of (51) converges to a solution of (25).
We impose assumptions H5-H7 on (g_m)_{m∈N}, (h_m)_{m∈N}, (θ_n)_{n∈N} and the initial distributions. Note that by H5, for any non-decreasing sequence (u_m)_{m∈N} converging to u ∈ R_+, g_m(u_m) and h_m(u_m) converge to g(u) and h(u), respectively. More precisely, it holds for all m ∈ N that g_m(u_m) − g(u) ≤ 0, and for m ≥ n that g_m(u_m) ≥ g_m(u_n); therefore,
lim inf_{m→∞} (g_m(u_m) − g(u)) ≥ lim_{n→∞} lim_{m→∞} (g_m(u_n) − g(u)) = lim_{n→∞} (g(u_n) − g(u)) = 0
by left-continuity of g. Hence, lim_{m→∞} (g_m(u_m) − g(u)) = 0, and analogously lim_{m→∞} (h_m(u_m) − h(u)) = 0. By H5, Γ = max(‖h‖_∞, ‖g‖_∞) is a uniform upper bound for (g_m)_{m∈N} and (h_m)_{m∈N}.
Consider a probability space (Ω_0, A_0, Q) and a one-dimensional Brownian motion (W_t)_{t≥0}. Under H5, H6 and H7, for all m, n ∈ N there exist random variables r^{n,m}, s^{n,m} : Ω_0 → W such that (r^{n,m}_t, s^{n,m}_t)_{t≥0} is the unique strong solution to (50) associated to (W_t)_{t≥0}, by [37, Theorem 2.2]. We denote by P^{n,m} = Q ∘ (r^{n,m}, s^{n,m})^{−1} the corresponding distribution on W × W.
Before studying the two limits n, m → ∞ and proving Theorem 3, we state a modification of the comparison theorem of Ikeda and Watanabe to compare two solutions of (50). Let (a_k)_{k∈N} be a decreasing sequence with 1 > a_1 > a_2 > · · · and lim_{k→∞} a_k = 0. We choose a sequence Ψ_k(u), k = 1, 2, . . ., of continuous functions such that the support of Ψ_k is contained in (a_k, a_{k−1}), 0 ≤ Ψ_k(u) ≤ 2/(ku), and ∫_{a_k}^{a_{k−1}} Ψ_k(u) du = 1. Such functions exist. We set ϕ_k(x) = ∫_0^{x⁺} ∫_0^y Ψ_k(u) du dy. Note that, for any k ∈ N, |ϕ′_k| ≤ 1 and ϕ_k(x) ↑ x⁺ as k → ∞. Applying Itō's formula to ϕ_k(r_t − s_t), we obtain a decomposition into a martingale term and two integral terms I_1 and I_2. The term I_1 is controlled by the boundedness and Lipschitz continuity of θ_n. We note that by H5, E[(g_m(r_u) − h_m(s_u)) 𝟙_{r_u−s_u<0}] ≤ 0, and, by the Lipschitz continuity of g_m, by g_m(r) ≤ h_m(r), and since g_m and h_m are non-decreasing, we obtain the corresponding bound for I_2. Taking the limit k → ∞ and using that E[r_0 − s_0] = 0, we conclude by the monotone convergence theorem, since (ϕ′_k)_{k∈N} is a monotone increasing sequence. By definition of t*, E[(r_u − s_u)⁺] = 0 for all u < t*, and hence both terms are zero. This contradicts the definition of t*. Hence, (52) holds.
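For the reader's convenience, a standard way to realize such a sequence $(\Psi_k, \phi_k)$ is the classical Yamada-Watanabe construction; the specific choice of $a_k$ below is ours, but it matches the stated properties.

```latex
a_0 = 1, \qquad \int_{a_k}^{a_{k-1}} \frac{du}{u} = k
\;\Longleftrightarrow\; a_k = a_{k-1} e^{-k},
\]
so that $\int_{a_k}^{a_{k-1}} \frac{2}{k u}\, du = 2$, and a continuous
$\Psi_k$ with $\operatorname{supp}\Psi_k \subset (a_k, a_{k-1})$,
$0 \le \Psi_k(u) \le \frac{2}{k u}$ and $\int \Psi_k = 1$ exists. Setting
\[
\phi_k(x) = \int_0^{x^+}\!\!\int_0^y \Psi_k(u)\, du\, dy
\]
gives $0 \le \phi_k' \le 1$, $\phi_k''(x) = \Psi_k(x) \le \frac{2}{k x}$,
and $\phi_k(x) \uparrow x^+$ as $k \to \infty$.
```

The bound $\phi_k'' \le 2/(kx)$ is what makes the second-order Itō term vanish as $k \to \infty$ in the comparison argument above.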
Next, we show that the distribution of the solution of (50) converges as n → ∞.
Then by Gronwall's lemma where C p depends on T and on the p-th moment of the initial distribution, which is finite by H6.
where C i (·) are constants depending on the stated arguments and independent of n, m. Note that in the second step we used the Burkholder-Davis-Gundy inequality, see [ . Hence, for each T > 0 there exist a subsequence n k → ∞ and a probability measure P m T on C([0, T ], R 2 ) such that P n k ,m T converges weakly to P m T . Since {P m T } T is a consistent family, there exists by [32, Theorem 5.16] a probability measure P m on (W × W, B(W) ⊗ B(W)) such that P n k ,m converges along this subsequence to P m . Note that by a diagonalization argument we can take the same subsequence (n k ) k∈N for all m.
Characterization of the limit measure: In the following, we drop for simplicity the index k in the subsequence. Denote by (r t , s t )(ω) = ω(t) the canonical process on W × W. Since with P u = P m • r −1 u and P̄ u = P m • s −1 u . To show weak convergence to P m • (r, s, M m , N m ) −1 , we note that (M m , N m ) is continuous on W and we consider for a Lipschitz continuous and bounded function G : W → R, The second term converges to 0 as n → ∞, since (M m ) is continuous. For the first term it holds This term converges to 0 as n → ∞, since for all T > 0 and ω ∈ W, as n → ∞ Let G : W → R + be an F s -measurable, bounded, non-negative function. By uniform integrability of (M n,m t , P n,m ) n∈N,t≥0 , it holds for any s ≤ t, using uniform integrability of ((M n,m t ) 2 , P n,m ) n∈N,t≥0 , which follows similarly as above. Note that where ℓ y t (r) is the local time of r in y. Therefore, for any 0 ≤ s < t, and hence, for any F s -measurable, bounded, non-negative function G : W → R + , As before, by a monotone class argument,  H1, H5 and H7. Hence, by Kolmogorov's continuity criterion, cf. [32, Corollary 14.9], we can deduce that there exists a probability measure P on (W × W, B(W) ⊗ B(W)) such that there is a subsequence (m k ) k∈N along which P m k converges to P. To characterize the limit, we first note that by the Skorokhod representation theorem, cf. [6, Chapter 1, Theorem 6.7], without loss of generality we can assume that (r m , s m ) are defined on a common probability space (Ω, A, P ) with expectation E and converge almost surely to (r, s) with distribution P. By H5, P m t (g m ) = E[g m (r m t )] and the monotone convergence theorem, P m t (g m ) converges to P t (g) for any t ≥ 0. Then, by the dominated convergence theorem it holds almost surely for all t ≥ 0  • (r, s, M, N ) where Let G : W → R + be an F s -measurable, bounded, non-negative function. By uniform integrability, for any s ≤ t,

Proof of Theorem 3. The proof is a direct consequence of Lemma 15 and Lemma 16.

Proof of Theorem 5
Proof of Theorem 5. Note that the Dirac measure at 0, δ 0 , is by definition an invariant measure of (r t ) t≥0 solving (6). Assume that the process starts from an invariant probability measure π, so that P(r t > 0) = p = π((0, ∞)) for any t ≥ 0. Note that for p = 0 the drift vanishes. If the initial measure is the Dirac measure δ 0 at 0, then the diffusion coefficient vanishes as well. Hence, Law(r t ) = δ 0 for any t ≥ 0. It remains to investigate the case p ≠ 0. Here, we are in the regime of [19, Lemma 24], where an invariant measure is of the form (28). Since p = P(r t > 0), the invariant measure π satisfies in addition the necessary condition p = π((0, ∞)) = I(a, p)/(2/(ap) + I(a, p)) with I(a, p) given in (27). For p ≠ 0, this expression is equivalent to (26).

Proof of Theorem 7
Proof of Theorem 7. To show (31) we extend the function f to a concave function on R by setting f (x) = x for x < 0. Note that f is continuously differentiable and f ′ is absolutely continuous and bounded. Using the Itō-Tanaka formula, cf. [40, Chapter 6, Theorem 1.1], we obtain where the last step holds by (39) and (40). Applying Gronwall's lemma, we obtain (31).
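The Gronwall step can be spelled out as follows: assuming the Itō-Tanaka computation yields a differential inequality of the form below (with a rate c > 0 determined by (39) and (40); the precise constant is the one appearing in (31), which we do not restate), exponential decay follows:

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathbb{E}[f(r_t)] \le -c\,\mathbb{E}[f(r_t)]
\quad\Longrightarrow\quad
\frac{\mathrm{d}}{\mathrm{d}t}\Bigl(e^{ct}\,\mathbb{E}[f(r_t)]\Bigr) \le 0
\quad\Longrightarrow\quad
\mathbb{E}[f(r_t)] \le e^{-ct}\,\mathbb{E}[f(r_0)].
```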

Proofs of Section 4
The proof of Theorem 8 follows the same lines as the proofs of Theorem 1 and Theorem 2. In addition, the difference between the nonlinear SDE and the mean-field system is bounded in Lemma 19, which requires a uniform-in-time bound on the second moment of the process (X t ) t≥0 solving (1); this bound is given first.
Then there exists C ∈ (0, ∞) depending on d, W and the second moment of X 0 such that The proof relies on standard techniques (see e.g. [17, Lemma 8]) and is included for completeness.
Proof of Lemma 17. By Itō's formula, it holds Taking expectations and using symmetry, we get Hence, by the definition (14) of R 0 and by Gronwall's lemma, we obtain (69).
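As a numerical plausibility check (a toy model of our own choosing, not the setting of Lemma 17: we take the hypothetical linear drift b(x) = −x and the bounded antisymmetric interaction γ(z) = tanh(z)), the second moment of an Euler–Maruyama discretization of a mean-field particle system stays bounded uniformly in time:

```python
# Toy Euler-Maruyama check (illustration only): N interacting particles
#   dX^i = b(X^i) dt + (1/N) sum_j gamma(X^i - X^j) dt + sqrt(2) dB^i
# with the hypothetical choices b(x) = -x and gamma(z) = tanh(z)
# (bounded, antisymmetric). We track the empirical second moment.
import math, random

random.seed(0)
N, dt, steps = 50, 0.01, 500
x = [random.gauss(0.0, 1.0) for _ in range(N)]
second_moments = []
for _ in range(steps):
    interaction = [sum(math.tanh(xi - xj) for xj in x) / N for xi in x]
    x = [xi + (-xi + gi) * dt + math.sqrt(2 * dt) * random.gauss(0.0, 1.0)
         for xi, gi in zip(x, interaction)]
    second_moments.append(sum(xi * xi for xi in x) / N)
```

The empirical second moment fluctuates around its stationary level instead of growing in time, consistent with a uniform-in-time bound of the type (69).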
Let N ∈ N. We construct a sticky coupling of N i.i.d. realizations of solutions ( (1) and of the solution ({Y i t } N i=1 ) t≥0 to the mean-field particle system (3). Then, we consider a weak limit for δ → 0 of Markovian couplings which are constructed similarly as in Section 2. Let rc δ , sc δ satisfy (19) and (20). The coupling ( is defined as a process in R 2N d satisfying a system of SDEs given by where Proof. By (70) and since γ is antisymmetric, it holds by Itō's formula for any i ∈ {1, . . . , N }, one-dimensional Brownian motions given by (73). Note that the prefactor (N/(N + 1)) 1/2 ensures that the quadratic variation satisfies [W i ] t = t for t ≥ 0, and hence ( are Brownian motions. This definition of ({W i t } N i=1 ) t≥0 leads to the factor (1 + 1/N ) 1/2 in the diffusion term of the SDE. Applying the C 2 approximation of the square root used in the proof of Lemma 12 and taking ε → 0 in the approximation yields the stochastic differential equations for ({r i,δ t } N i=1 ) t≥0 . We obtain its upper bound for ǫ < ǫ 0 by B1 and (20) similarly to the proof of Lemma 12.
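The normalization by (N/(N + 1)) 1/2 can be checked numerically. The combination below is an assumed illustrative form, W = (N/(N + 1)) 1/2 (B + N −1/2 B̃) with B, B̃ independent Brownian motions, for which (N/(N + 1))(1 + 1/N ) = 1; the precise combination in (73) is not reproduced here:

```python
# Monte Carlo check (illustrative, assumed form): for independent Brownian
# motions B and Btilde, W = sqrt(N/(N+1)) * (B + Btilde/sqrt(N)) has
# quadratic variation [W]_t = (N/(N+1)) * (1 + 1/N) * t = t.
import math, random

random.seed(1)
N, n_steps, T = 10, 100_000, 1.0
dt = T / n_steps
c = math.sqrt(N / (N + 1))
qv = 0.0
for _ in range(n_steps):
    dB = math.sqrt(dt) * random.gauss(0.0, 1.0)
    dBt = math.sqrt(dt) * random.gauss(0.0, 1.0)
    dW = c * (dB + dBt / math.sqrt(N))
    qv += dW * dW   # sum of squared increments approximates [W]_T
```

The sum of squared increments concentrates around T = 1, which is exactly the algebra (N/(N + 1)) · (1 + 1/N ) = 1 behind the prefactor.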
Next, we state a bound for (72). The result and the proof are adapted from [17, Theorem 2].

Lemma 19. Under the same assumptions as in Lemma 20, it holds for any
is given in (72) and C 1 and C 2 are constants depending on ‖γ‖ ∞ , L and the constant C given in Lemma 17.
Proof. Note that both processes ( have the same initial condition and are driven by the same noise. Since the drift for ( (20), we can conclude r̄ i,δ t ≤ r i,δ,ǫ t almost surely for all t ≥ 0, ǫ < ǫ 0 and i = 1, . . . , N by Lemma 21.

Proof of Theorem 8. Consider the process ({U i,δ,ǫ t } N i=1 ) t≥0 . We define the canonical projections X, Y, r onto the first N d, the second N d and the last N components, respectively.
By B1 and B3 it holds, along the same lines as in the proof of Lemma 22, for each T > 0 for some constant C depending on T , L, ‖γ‖ Lip , ‖γ‖ ∞ , N and on the fourth moments of µ 0 and ν 0 . Note that we used here that the additional drift terms (A i,δ t ) t≥0 occurring in the SDE of . Then, as in the proofs of Lemma 22 and Lemma 23, P δ,ǫ is tight and converges weakly along a subsequence to a measure P by Kolmogorov's continuity criterion, cf. [32, Corollary 14.9].
As in Lemma 22, the law P δ,ǫ T of ( ) is tight for each T > 0 by [32, Corollary 14.9], and for each ǫ > 0 there exists a subsequence δ n → 0 such that (P δn,ǫ T ) n∈N on C([0, T ], R N (2d+1) ) converges to a measure P ǫ T on C([0, T ], R N (2d+1) ). By a diagonalization argument and since {P ǫ T : T ≥ 0} is a consistent family, cf. [32, Theorem 5.16], there exists a probability measure P ǫ on C(R + , R N (2d+1) ) such that for all ǫ there exists a subsequence (δ n ) along which (P δn,ǫ ) n∈N converges to P ǫ . As in the proof of Lemma 23, we repeat this argument for the family of measures (P ǫ ) ǫ>0 . Hence, there exists a subsequence ǫ m → 0 such that (P ǫm ) m∈N converges to a measure P.
Similarly to the proofs of Lemma 22 and Lemma 23, there exist an extended underlying probability space and N i.i.d. one-dimensional Brownian motions ( where |. In addition, the statement of Lemma 20 carries over to the limiting process ({r i t } N i=1 ) t≥0 , since by the weak convergence along the subsequences (δ n ) n∈N and (ǫ m ) m∈N and the Portmanteau theorem, j t for all t ≥ 0 and i = 1, . . . , N .

Proofs of Section 5
Analogously to the proof of Theorem 3, we introduce approximations for the system of sticky SDEs and prove Theorem 10 using the comparison result given in Lemma 21, taking the limit of the approximations of the system of sticky SDEs in two steps and identifying the limit with the solution of (35).
As in the nonlinear case, we show Theorem 10 via a family of stochastic differential equations with Lipschitz continuous coefficients, where η n,m ∈ Γ(µ n,m , ν n,m ), and where η m ∈ Γ(µ m , ν m ).
Taking the limit m → ∞, we obtain (35) as the weak limit of (79). In the case g(r) = ½ (0,∞) (r), Hence, we obtain analogously to (54), In the next step, we prove that the distribution of the solution of (78) converges as n → ∞.

Lemma 22. Assume that H1 and H2 are satisfied for
where C p depends on T and on the p-th moment of the initial distribution, which is finite by assumption. Similarly, it holds sup t∈[0,T ] E[|s i,n,m t | p ] < C p for t ≤ T . Using these moment bounds, it holds for all t 1 , t 2 ∈ [0, T ] by H1, H5 and H6, where C k (·) are constants depending on the stated arguments but independent of n, m. Note that in the second step we use the Burkholder-Davis-Gundy inequality, see [ . Hence, for each T > 0 there exist a subsequence n k → ∞ and a probability measure P m T on C([0, T ], R 2N ) such that P n k ,m T converges weakly to P m T . Since {P m T } T is a consistent family, there exists by [32, Theorem 5.16] a probability measure P m on (W N × W N , B(W N ) ⊗ B(W N )) such that P n k ,m converges weakly to P m . Note that by a diagonalization argument we can take the same subsequence (n k ) for all m. Characterization of the limit measure: Denote by ( To characterize the measure P m , we first note that P m • (r i 0 , s i 0 ) −1 = η m for all i ∈ {1, . . . , N }, since P n,m • (r i 0 , s i 0 ) −1 = η n,m converges weakly to η m by assumption. We define exists P m -almost surely. To complete the identification of the limit, it suffices to identify the quadratic variation. Similarly to the computations in the proof of Lemma 15, it holds Then, by a martingale representation theorem, cf. [29, Chapter II, Theorem 7.1], there are a probability space (Ω m , A m , P m ) and a Brownian motion In the next step, we show that the distribution of the solution of (79) converges as m → ∞. Consider a probability space (Ω m , A m , P m ) for each m ∈ N and random variables : H1 and H2 are satisfied for (b, g) and (b, h). Let η ∈ Γ(µ, ν), where the probability measures µ and ν on R + satisfy H3. Further, let (g m ) m∈N , (h m ) m∈N , (µ m ) m∈N , (ν m ) m∈N and (η m ) m∈N be such that H5 and H7 hold. Then there exists a random variable ({r i , s

Lemma 23. Assume that
is a weak solution of (35). Moreover, the laws P m • ({r i,m , s for any r ∈ R + , and , P m ) m∈N,t≥0 are uniformly integrable. Along the same lines as weak convergence is shown in the proof of Lemma 15, and by (85) , F t , P) are continuous martingales using the same argument as in (59). Further, the quadratic variation ([{M i t , N i t } N i=1 ] t ) t≥0 exists P-almost surely and is given by (83) and (84) P-almost surely, which follows from the computations in the proofs of Lemma 15 and Lemma 22. As in Lemma 22, we conclude by a martingale representation theorem that there are a probability space (Ω, A, P ) and a Brownian motion {W i } N i=1 and random variables ( is a weak solution of (25).
By the Portmanteau theorem, the monotonicity carries over to the limit, since P m •({r i ,

Proof of Theorem 10. The proof is a direct consequence of Lemma 22 and Lemma 23.

A.1 Kuramoto model
Lower bounds on the contraction rate can also be shown for nonlinear SDEs on the one-dimensional torus using the same approach. Here, we consider the Kuramoto model given by on the torus T = R/(2πZ).
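For intuition, the mean-field Kuramoto dynamics can be simulated by its particle approximation. The sketch below is an illustration under assumed conventions (unit noise, coupling strength k, drift −(k/N) Σ j sin(x i − x j ); the exact form of (86) is not restated here); for weak coupling the empirical order parameter R t = |N −1 Σ j e iX j t | stays at the incoherent noise level:

```python
# Illustrative particle approximation of a Kuramoto-type McKean-Vlasov
# SDE on the torus (assumed form, unit noise):
#   dX^i = -(k/N) * sum_j sin(X^i - X^j) dt + dB^i   (mod 2*pi).
import cmath, math, random

random.seed(2)
N, k, dt, steps = 100, 0.1, 0.05, 300
x = [random.uniform(0.0, 2 * math.pi) for _ in range(N)]  # incoherent start
orders = []
for _ in range(steps):
    drift = [-(k / N) * sum(math.sin(xi - xj) for xj in x) for xi in x]
    x = [(xi + di * dt + math.sqrt(dt) * random.gauss(0.0, 1.0)) % (2 * math.pi)
         for xi, di in zip(x, drift)]
    orders.append(abs(sum(cmath.exp(1j * xi) for xi in x)) / N)
```

With the weak coupling k = 0.1 chosen here, the order parameter remains at the O(N −1/2 ) fluctuation level, indicating incoherence below the synchronization threshold.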

Theorem 24.
Let µ t and ν t be the laws of X t and Y t , where (X s ) s≥0 and (Y s ) s≥0 are two solutions of (86) with initial distributions µ 0 and ν 0 on (T, B(T)), respectively. If and f̄ is a concave, increasing function given in (92).
In [16, Appendix A], a contraction result is stated for a general drift using a similar approach. We prove Theorem 24 via a sticky coupling approach. In the same way as in Section 2, the coupling (X t , Y t ) t≥0 is defined as the weak limit of Markovian couplings {(X δ where r̄ δ Consider the process (U δ,ǫ t ) t≥0 = (X δ t , Y δ t , r δ,ǫ t ) t≥0 on T 2 × [0, π] for each ǫ, δ > 0. We define by X, Y : C(R + , T 2 × [0, π]) → C(R + , T) and r : C(R + , T 2 × [0, π]) → C(R + , [0, π]) the canonical projections onto the first, the second and the last component, respectively. Analogously to the proof of Theorem 2, the law P δ,ǫ of the process (U δ,ǫ t ) t≥0 converges along a subsequence (δ k , ǫ k ) k∈N to a probability measure P. Let (X t , Y t , r t ) t≥0 be a process on T 2 × [0, π] with distribution P on (Ω, F, P ). Since (X δ t ) t≥0 and (Y δ t ) t≥0 are solutions of (86) which are unique in law, we have for any ǫ, δ > 0 that P δ,ǫ • X −1 = P • X −1 and P δ,ǫ • Y −1 = P • Y −1 . Therefore, (X t ) t≥0 and (Y t ) t≥0 are solutions of (86) as well, with the same initial conditions. Hence, P • (X, Y) −1 is a coupling of two copies of (86).
Further, the monotonicity r̄ δ t ≤ r δ,ǫ t carries over to the limit by the Portmanteau theorem. Finally, similarly to the proofs of Lemma 15 and Lemma 16, there exist an extended probability space and a one-dimensional Brownian motion (W t ) t≥0 such that (r t , W t ) t≥0 is a solution to (97).
Remark 26. Let us finally remark that we can relax the condition (87) and obtain contraction with a modified contraction rate c̃ for all k < k 0 , where k 0 is given by k 0 ∫ 0 π exp(2k 0 − 2k 0 cos(r/2)) dr = 1 .
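Reading the condition defining k 0 as k 0 ∫ 0 π exp(2k 0 − 2k 0 cos(r/2)) dr = 1 (our reading of the display above), the left-hand side is strictly increasing in k 0 , so the threshold can be computed by bisection; the sketch below illustrates this computation:

```python
# Solve k0 * int_0^pi exp(2*k0 - 2*k0*cos(r/2)) dr = 1 for k0 by bisection.
# The integrand is >= 1, so F(k) >= k*pi and the root satisfies k0 <= 1/pi.
import math

def F(k, n=10_000):
    """Trapezoid approximation of k * int_0^pi exp(2k(1 - cos(r/2))) dr."""
    h = math.pi / n
    vals = [math.exp(2 * k * (1 - math.cos(i * h / 2))) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return k * integral

lo, hi = 0.0, 1.0          # F(0) = 0 < 1 and F(1) > pi > 1
for _ in range(60):
    mid = (lo + hi) / 2
    if F(mid) < 1.0:
        lo = mid
    else:
        hi = mid
k0 = (lo + hi) / 2
```

Since the integrand is at least 1, F(k) ≥ kπ and hence k 0 ≤ 1/π ≈ 0.318; the bisection converges because F is continuous and strictly increasing in k.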