1 Introduction

Consider a diffusion process \((X_t)_{t\ge 0}\) in \({\mathbb {R}}^d\) defined by a stochastic differential equation

$$\begin{aligned} dX_t\,=\,b(X_t)\,dt+\sigma \,dB_t. \end{aligned}$$
(1)

Here \((B_t)_{t\ge 0}\) is a d-dimensional Brownian motion, \(\sigma \in {\mathbb {R}}^{d\times d}\) is a constant \(d\times d\) matrix with \(\det \sigma >0\), and \(b:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) is a locally Lipschitz continuous function. We assume that the unique strong solution of (1) is non-explosive for any initial condition, which is essentially a consequence of the assumptions imposed further below. The transition kernels of the diffusion process on \({\mathbb {R}}^d\) defined by (1) will be denoted by \(p_t(x,dy)\).
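For readers who want to experiment numerically, paths of (1) can be approximated by an Euler-Maruyama scheme. The following is only a minimal sketch: the function name `euler_maruyama`, the step size `h`, and the drift \(b(x)=-x/2\) with \(\sigma =I_2\) are illustrative assumptions and are not taken from the text.

```python
import math
import random

def euler_maruyama(b, sigma, x0, h, n_steps, rng):
    """Approximate a path of dX = b(X) dt + sigma dB with step size h."""
    d = len(x0)
    x = list(x0)
    path = [tuple(x)]
    for _ in range(n_steps):
        # Independent N(0, h) increments of the d-dimensional Brownian motion.
        dB = [rng.gauss(0.0, math.sqrt(h)) for _ in range(d)]
        drift = b(x)
        x = [x[i] + drift[i] * h + sum(sigma[i][j] * dB[j] for j in range(d))
             for i in range(d)]
        path.append(tuple(x))
    return path

# Illustrative choice (an assumption, not from the text): b(x) = -x/2, sigma = I_2.
rng = random.Random(0)
path = euler_maruyama(lambda x: [-0.5 * xi for xi in x],
                      [[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], 0.01, 1000, rng)
```

The scheme is a time discretization, so it only approximates the law of the diffusion; none of the results below depend on it.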

Contraction properties of the transition semigroup \((p_t)_{t\ge 0}\) have been studied by various approaches. In particular, \(L^2\) and entropy methods (e.g. spectral gap estimates, logarithmic Sobolev and transportation inequalities) yield bounds that are both relatively stable under perturbations and applicable in high dimensions, cf. e.g. [27, 38, 43]. On the other hand, coupling methods provide a more intuitive probabilistic understanding of convergence to equilibrium [13, 14, 16, 24, 25, 34, 35, 41, 43]. In contrast to \(L^2\) and entropy methods, bounds resulting from coupling methods typically hold for arbitrary initial values \(x_0\in \mathbb {R}^d\). In many applications, couplings are used to bound the total variation distances \(d_{TV}(\mu p_t,\nu p_t)\) between the laws \(\mu p_t\) and \(\nu p_t\) of \(X_t\) w.r.t. two different initial distributions \(\mu \) and \(\nu \) at a given time \(t\ge 0\), cf. [34, 35]. Typically, however, the total variation distance decays substantially only after a certain amount of time. This is also manifested in cut-off phenomena [12, 19, 20, 33].

Alternatively, it is well-known that synchronous couplings [i.e., couplings given by the flow of the s.d.e. (1)] can be used to show that the map \(\mu \mapsto \mu p_t\) is exponentially contractive w.r.t. \(L^p\) Wasserstein distances \(W^p\) for any \(p\in [1,\infty )\) if, for example, \((X_t)\) is an overdamped Langevin diffusion with a strictly convex potential \(U\in C^2(\mathbb {R}^d)\), i.e., \(\sigma =I_d\) and \(b=-\nabla U/2\), see e.g. [7]. This leads to an elegant and powerful approach to convergence to equilibrium and to many related results if applicable. However, it has been pointed out in [37] that strict convexity of U is also a necessary condition for exponential contractivity w.r.t. \(W^p\). This seems to limit the applicability substantially.

Here, we are instead considering exponential contractivity w.r.t. Kantorovich (\(L^1\) Wasserstein) distances \(W_f\) based on underlying distance functions of the form

$$\begin{aligned} d_f(x,y) = f(\Vert x-y\Vert )\quad \text{ on } \ \mathbb {R}^d, \end{aligned}$$

and, more generally,

$$\begin{aligned} d_f(x,y)= \sum _{i=1}^nf_i(\Vert x^i-y^i\Vert ) \quad \text{ on } \ \mathbb {R}^{d_1}\times \cdots \times \mathbb {R}^{d_n} , \end{aligned}$$

where \(f,f_i:[0,\infty )\rightarrow [0,\infty )\) are strictly increasing concave functions, cf. Sects. 2.1 and 3.1 below for details. For proving exponential contractivity, we will apply a reflection coupling on \({\mathbb {R}^{d}}\) and an (approximate) componentwise reflection coupling on products of Euclidean spaces. It will become clear from the proofs below that for distances based on concave functions \(f,f_i\), these couplings are superior to synchronous couplings, whereas the synchronous couplings are superior w.r.t. the Wasserstein distances \(W^p\) for \(p>1\), cf. e.g. Lemma 4.

The idea to study contraction properties w.r.t. Kantorovich distances based on concave distance functions appears in Chen and Wang [15, 16, 42] and Hairer and Mattingly [24]. In [16], similar methods are applied to estimate spectral gaps of diffusion generators on \(\mathbb {R}^d\) and on manifolds. In [24] and [25], Hairer, Mattingly and Scheutzow apply Wasserstein distances based on particular concave distance functions to prove exponential ergodicity in infinite dimensional situations. The key idea below is to obtain more quantitative results by “almost” optimizing the choice of the functions f and \(f_i\) to obtain large contraction rates. In the case \(n=1\), this idea has also been exploited in [16] to derive lower bounds for spectral gaps. The novelty here is that we suggest a simple and very explicit choice for f that leads to close to optimal results in several examples. Furthermore, by a new extension to the product case based on an approximate componentwise reflection coupling, we obtain dimension free contraction results in product models and perturbations thereof without relying on convexity.

Before stating the general results, we consider some examples illustrating the scope of the approach:

Example 1

(Overdamped Langevin dynamics with locally non-convex potential) Suppose that \(\sigma =I_d\) and \(b(x)=-\frac{1}{2}\nabla U(x)\) for a function \(U\in C^2({\mathbb {R}}^d)\) that is strictly convex outside a given ball \(B\subset {\mathbb {R}}^d\). Then \(Z:=\int \exp (-U(x))dx\) is finite, and the probability measure

$$\begin{aligned} d\mu = Z^{-1}\exp (-U)\,dx \end{aligned}$$

is a stationary distribution for the diffusion process \((X_t)\). Corollary 2 below yields exponential contractivity for the transition semigroup \((p_t)\) with an explicit rate w.r.t. an appropriate Kantorovich distance \(W_f\). As a consequence, we obtain dimension-independent upper bounds for the standard \(L^1\) Wasserstein distances between the laws \(\nu p_t\) of \(X_t\) and \(\mu \) for arbitrary initial distributions \(\nu \) and \(t\ge 0\). These bounds are of optimal order in \(R,L\in [0,\infty )\) and \(K\in (0,\infty )\) if \((x-y)\cdot (\nabla U(x)-\nabla U(y))\) is bounded from below by \(-L|x-y|^2\) for \(|x-y|<R\) and by \(K|x-y|^2\) for \(|x-y|\ge R\).

Example 2

(Product models) For a diffusion process \(X_t=(X_t^1,\ldots ,X_t^n)\) in \({\mathbb {R}}^{n\cdot d}\) with independent Langevin diffusions \(X^1,\ldots ,X^n\) as in Example 1, Theorem 7 below yields exponential contractivity in an appropriate Kantorovich distance with rate \(c=\min \,(c_1,\ldots ,c_n)\) where \(c_1,\ldots ,c_n\) are the lower bounds obtained for the contraction rates of the components.

Example 3

(Systems of interacting diffusions) More generally, consider a system

$$\begin{aligned} dX_t^i= -\frac{1}{2} \nabla U(X_t^i)\, dt\, -\, \frac{\alpha }{n}\sum _{j=1}^n\nabla V(X_t^i-X_t^j)\, dt \, +\, dB_t^i, \ \ \ \quad i=1,\ldots ,n, \end{aligned}$$

of n interacting diffusion processes in \({\mathbb {R}}^d\) where \(U\in C^2({\mathbb {R}}^d)\) is strictly convex outside a ball, \(V\in C^2({\mathbb {R}}^d)\) has bounded second derivatives, and \(B^1,\ldots ,B^n\) are independent Brownian motions in \({\mathbb {R}}^d\). Then Corollary 9 below shows that for \(\alpha \) sufficiently small, exponential contractivity holds in an appropriate Kantorovich distance with a rate that does not depend on n.

We now introduce briefly the couplings to be considered in the proofs below:

A coupling by reflection of two solutions of (1) with initial distributions \(\mu \) and \(\nu \) is a diffusion process \((X_t,Y_t)\) with values in \({\mathbb {R}}^{2d}\) defined by \((X_0,Y_0)\sim \eta \) where \(\eta \) is a coupling of \(\mu \) and \(\nu \),

$$\begin{aligned} dX_t= & {} b(X_t)\,dt+\sigma \,dB_t\qquad \qquad \qquad \quad \,\text{ for } t\ge 0, \end{aligned}$$
(2)
$$\begin{aligned} dY_t= & {} b(Y_t)\,dt+\sigma (I-2e_te_t^\top )\,dB_t\quad \hbox { for }t<T,\quad Y_t= X_t \text{ for } t\ge T. \end{aligned}$$
(3)

Here \(e_te_t^\top \) is the orthogonal projection onto the unit vector

$$\begin{aligned} e_t:=\sigma ^{-1}(X_t-Y_t)/|\sigma ^{-1}(X_t-Y_t)|, \end{aligned}$$

and \(T=\inf \{t\ge 0\, :\, X_t=Y_t\} \) is the coupling time, i.e., the first hitting time of the diagonal \(\varDelta =\{(x,y)\in {\mathbb {R}}^{2d}:x=y\}\), cf. [14, 35]. The reflection coupling can be realized as a diffusion process in \({\mathbb {R}}^{2d}\), and the marginal processes \((X_t)_{t\ge 0}\) and \((Y_t)_{t\ge 0}\) are solutions of (1) w.r.t. the Brownian motions \(B_t\) and

$$\begin{aligned} \check{B}_t=\int _0^t(I_d-2\mathbb {I}_{\{s<T\}}e_se_s^\top )\,dB_s. \end{aligned}$$

Notice that by Lévy’s characterization, \(\check{B}\) is indeed a Brownian motion since the process \(I_d-2\mathbb {I}_{\{s<T\}}e_se_s^\top \) takes values in the orthogonal matrices. The difference vector

$$\begin{aligned} Z_t:= X_t-Y_t \end{aligned}$$

solves the s.d.e.

$$\begin{aligned} dZ_t= & {} (b(X_t)-b(Y_t))\,dt+{2}{|\sigma ^{-1}Z_t|^{-1}}Z_t\,dW_t\quad \text{ for } t<T,\nonumber \\ Z_t= & {} 0\quad \text{ for } t\ge T, \end{aligned}$$
(4)

w.r.t. the one-dimensional Brownian motion

$$\begin{aligned} W_t\,=\,\int _0^te_s^\top \,dB_s. \end{aligned}$$

A synchronous coupling of two solutions of (1) is defined correspondingly with \(e_t\equiv 0\), i.e., the same noise is applied both to \(X_t\) and \(Y_t\). Below we will also consider mixed couplings that are reflection couplings for certain values of \(Z_t\), synchronous couplings for other values of \(Z_t\), and mixtures of both types of couplings for \(Z_t\) in an intermediate region. Notice that the standard reflection coupling introduced above is a synchronous coupling for \(t\ge T\), i.e., if \(Z_t=0\)!
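The construction in (2) and (3) can be sketched in discrete time as follows. This is a hedged illustration only: it assumes \(\sigma =I_d\), uses an Euler step of size `h`, and replaces the exact coupling time T by a hypothetical numerical tolerance `tol`, since a discretized pair does not hit the diagonal exactly. The helper names `reflect` and `coupled_step` are not from the text.

```python
import math
import random

def reflect(e, dB):
    """Apply the reflection matrix I - 2 e e^T to the increment dB."""
    proj = sum(ei * bi for ei, bi in zip(e, dB))
    return [bi - 2.0 * proj * ei for ei, bi in zip(e, dB)]

def coupled_step(b, x, y, h, rng, tol=1e-8):
    """One Euler step of the pair (X, Y); sigma = I_d is assumed for simplicity."""
    d = len(x)
    dB = [rng.gauss(0.0, math.sqrt(h)) for _ in range(d)]
    z = [xi - yi for xi, yi in zip(x, y)]
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    if norm_z <= tol:
        # After (numerical) coalescence both copies are driven by the same noise.
        dB_y = dB
    else:
        # Before coalescence the noise of Y is reflected at the hyperplane
        # orthogonal to e = (X - Y) / |X - Y|.
        e = [zi / norm_z for zi in z]
        dB_y = reflect(e, dB)
    bx, by = b(x), b(y)
    x_new = [x[i] + bx[i] * h + dB[i] for i in range(d)]
    y_new = [y[i] + by[i] * h + dB_y[i] for i in range(d)]
    return x_new, y_new
```

Since the reflection matrix is orthogonal, the reflected increment has the same length as `dB`, which is the discrete counterpart of the Lévy characterization argument above. The noise in the difference `x_new - y_new` points in the direction `e`, in line with (4).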

More generally, we will consider couplings for diffusion processes on product spaces (such as in Examples 2 and 3) that are approximately componentwise reflection couplings, i.e., the i-th component \((X^i_t,Y^i_t)\) of the coupling \((X_t,Y_t)\) is defined similarly to (2) provided \(|X_t^i-Y_t^i|\ge \delta \) for a given constant \(\delta >0\), cf. Sect. 6 below.

For diffusion processes with non-constant diffusion matrix \(\sigma (x)\), the reflection coupling should be replaced by the Kendall-Cranston coupling w.r.t. the intrinsic Riemannian metric \(G(x)=\left( \sigma (x)\sigma (x)^T\right) ^{-1}\) induced by the diffusion coefficients, cf. [17, 28, 31, 43]. Here, we restrict ourselves to the case of constant diffusion matrices where the Kendall-Cranston coupling coincides with the standard coupling by reflection.

The main results of this paper are stated in Sect. 2 for reflection coupling, and in Sect. 3 for componentwise reflection coupling on product spaces. The proofs are contained in Sects. 4, 5 and 6. Some of the results in Sect. 2 have been announced in the Comptes Rendus Note [21].

2 Main results for reflection coupling

2.1 Reflection couplings and contractivity on \({\mathbb {R}}^d\)

Lindvall and Rogers [35] introduced coupling by reflection in order to derive upper bounds for the total variation distance of the distributions of \(X_t\) and \(Y_t\) at a given time \(t\ge 0\). Here we are instead considering the Kantorovich-Rubinstein (\(L^1\)-Wasserstein) distances

$$\begin{aligned} W_f(\mu ,\nu )\,=\,\inf \limits _\eta \int d_f(x,y)\,\eta (dx\,dy),\,\quad d_f(x,y)\,=\,f(\Vert x-y\Vert )\quad (x,y\in {\mathbb {R}}^d), \end{aligned}$$
(5)

of probability measures \(\mu , \nu \) on \({\mathbb {R}}^d\), where the infimum is over all couplings \(\eta \) of \(\mu \) and \(\nu \), \(f:[0,\infty )\rightarrow [0,\infty )\) is an appropriately chosen concave increasing function with \(f(0)=0\), and \(\Vert z\Vert =\sqrt{z\cdot Gz}\) with \(G\in {\mathbb {R}}^{d\times d}\) symmetric and strictly positive definite. Typical choices for the norm are the Euclidean norm \(\Vert z\Vert =|z|\) and the intrinsic metric \(\Vert z\Vert =|\sigma ^{-1}z|\) corresponding to \(G=I_d\) and \(G=(\sigma \sigma ^\top )^{-1}\) respectively.

Remark 1

(Interpolating between total variation and Wasserstein distances) For the choice of the function f there are two extreme cases with minimal and maximal concavity:

  1.

    Choosing \(f(x)=x\) yields the standard Kantorovich (\(L^1\) Wasserstein) distance \(W_f=W^1\). In this case it is well known that if, for example, \(G=\sigma =I_d\) and \(b(x)=-\nabla U(x)/2\), then the transition kernels \(p_t(x,dy)\) of the diffusion process \((X_t)\) satisfy

    $$\begin{aligned} W_f(\mu p_t,\nu p_t)\ \le \ e^{-K t/2}\, W_f(\mu ,\nu )\qquad \hbox {for any }\mu ,\nu \quad \hbox { and }\quad t\ge 0, \end{aligned}$$

    provided \(\nabla ^2U\ge K\cdot I_d\) holds globally. This condition is also sharp in the sense that if U is not globally strictly convex, then contractivity of \(p_t\) w.r.t. \(W_f\) does not hold, cf. Sturm and von Renesse [37].

  2.

    On the other hand, choosing \(f(x)=\mathbb {I}_{(0,\infty )}(x)\) yields the total variation distance \(W_f=d_{TV}\). In this case,

    $$\begin{aligned} W_f(\mu p_t,\nu p_t)\ \le \ \mathbb {P}[T>t]\qquad \text{ for } \text{ any } \mu ,\nu \, \quad \text{ and } \quad t\ge 0, \end{aligned}$$

    but there is no strict contractivity of \(p_t\) w.r.t. \(d_{TV}\) in general. Indeed, in many applications \(d_{TV}(\mu p_t,\nu p_t)\) only decreases substantially after a certain amount of time (“cut-off phenomenon”).

By choosing for f an appropriate concave function, exponential contractivity w.r.t. \(W_f\) may hold even without global convexity, cf. [16]. We now explain how the function f can be chosen in a very explicit way such that the obtained exponential decay rate w.r.t. the Kantorovich distance \(W_f\) differs from the maximal decay rate that we can achieve by our approach based on reflection coupling only by a constant factor.

At first, similarly to Lindvall and Rogers [35], let us define for \(r\in (0,\infty )\):

$$\begin{aligned} \kappa (r)= & {} \inf \left\{ -2\,\frac{|\sigma ^{-1}(x-y)|^2}{\Vert x-y\Vert ^2}\,\frac{(x-y)\cdot G(b(x)-b(y))}{\Vert x-y\Vert ^2}\,\right. \\&\qquad \left. :\, x,y\in {\mathbb {R}}^d \text{ s.t. } \Vert x-y\Vert =r\right\} , \end{aligned}$$

i.e., \(\kappa (r)\) is the largest constant such that

$$\begin{aligned} (x-y)\cdot G(b(x)-b(y))\ \le \ -\frac{1}{2}\kappa (r) \Vert x-y\Vert ^4 /|\sigma ^{-1}(x-y)|^2 \end{aligned}$$
(6)

holds for any \(x,y\in {\mathbb {R}}^d\) with \(\Vert x-y\Vert =r\). Notice that if \(\Vert \,\cdot \,\Vert \) is the intrinsic metric then the factor \(|\sigma ^{-1}(x-y)|^2/\Vert x-y\Vert ^2\) equals 1. In Example 1 with \(G=I_d\), we have

$$\begin{aligned} \kappa (r)=\inf \left\{ \int \nolimits _0^1\partial ^2_{(x-y)/|x-y|}U((1-t)x+ty)\, dt :x,y\in {\mathbb {R}}^d \text{ s.t. }\, |x-y| =r\right\} . \end{aligned}$$

We assume from now on that \(\kappa (r)\) is a continuous function on \((0,\infty )\) satisfying

$$\begin{aligned} \liminf _{r\rightarrow \infty }\kappa (r)>0\quad \text{ and } \quad \int _0^1r\kappa (r)^{-}\, dr<\infty . \end{aligned}$$
(7)

In Example 1 with \(G=I_d\), this assumption is satisfied if U is strictly convex outside a ball.
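As a rough numerical illustration in dimension one with \(\sigma =1\), \(G=1\) and \(b=-U'/2\), the definition reduces to \(\kappa (r)=\inf _y\,(U'(y+r)-U'(y))/r\). The sketch below approximates this infimum on a grid. The double-well potential \(U(x)=x^4/4-x^2/2\), the search window and the grid size are illustrative assumptions not taken from the text; for this potential one can check by hand that \(\kappa (r)=r^2/4-1\), so that (7) holds.

```python
def kappa_1d(dU, r, lo=-10.0, hi=10.0, n=20001):
    """Grid approximation of kappa(r) = inf_y (U'(y + r) - U'(y)) / r for d = 1.

    The infimum is only searched over y in [lo, hi], which is an assumption
    that must be adapted to the potential at hand.
    """
    best = float("inf")
    for k in range(n):
        y = lo + (hi - lo) * k / (n - 1)
        best = min(best, (dU(y + r) - dU(y)) / r)
    return best

# Illustrative double-well potential U(x) = x^4/4 - x^2/2, i.e. U'(x) = x^3 - x.
dU = lambda x: x ** 3 - x
values = {r: kappa_1d(dU, r) for r in (0.5, 1.0, 2.0, 3.0)}
```

In this example \(\kappa \) is negative for small r and positive for \(r>2\), which is the typical situation covered by assumption (7).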

Next, we define constants \(R_0,R_1\in [0,\infty )\) with \(R_0\le R_1\) by

$$\begin{aligned} R_0= & {} \inf \{R\ge 0\,:\,\kappa (r)\ge 0\ \quad \forall \,r\ge R\}, \end{aligned}$$
(8)
$$\begin{aligned} R_1= & {} \inf \{R\ge R_0\,:\,\kappa (r)R(R-R_0)\ge 8\ \quad \forall \, r\ge R\} . \end{aligned}$$
(9)

Notice that by (7), both constants are finite. We now consider the particular distance function \(d_f(x,y)=f(\Vert x-y\Vert )\) given by

$$\begin{aligned} f(r)= & {} \int \limits _0^{r}\varphi (s)g(s)\,ds,\qquad \qquad \text{ where }\nonumber \\ \varphi (r)= & {} \exp \left( -\frac{1}{4}\int \limits _{0}^{r}s\kappa (s)^{-}\,ds\right) ,\qquad \varPhi (r) \, =\, \int \limits _0^r\varphi (s)\,ds,\nonumber \\ g(r)= & {} 1-\frac{1}{2}\int _0^{r\wedge R_1}\frac{\varPhi (s)}{\varphi (s)}\,ds\Big /\int \limits _0^{R_1}\frac{\varPhi (s)}{\varphi (s)}\,ds. \end{aligned}$$
(10)

Let us summarize some basic properties of the functions \(\varphi ,g\) and f:

  • \(\varphi \) is decreasing, \(\varphi (0)=1\), and \(\varphi (r)=\varphi (R_0)\) for any \(r\ge R_0\),

  • g is decreasing, \(g(0)=1\), and \(g(r)=\frac{1}{2}\) for any \(r\ge R_1\),

  • f is concave, \(f(0)=0\), \(f'(0)=1\), and

    $$\begin{aligned} \varPhi (r)/2\,\le \,f(r)\,\le \,\varPhi (r)\quad \text{ for } \text{ any } r\ge 0. \end{aligned}$$
    (11)

The last statement shows that \(d_f\) and \(d_\varPhi \) as well as \(W_f\) and \(W_\varPhi \) differ at most by a factor 2.
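The functions in (10) are straightforward to tabulate numerically. The following sketch does this for the step profile \(\kappa (r)=-L\) for \(r\le R\) and \(\kappa (r)=K\) for \(r>R\) with \(L>0\). This profile is an illustrative assumption (it is not continuous, so it should be read as a stand-in for a lower bound of the type considered later in (23)). For it, \(R_0=R\), and \(R_1\) solves \(KR_1(R_1-R_0)=8\). The names `cumtrapz` and `tabulate_f` are hypothetical.

```python
import math

def cumtrapz(values, h):
    """Cumulative trapezoidal integral of grid values with spacing h."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (a + b))
    return out

def tabulate_f(L, R, K, n=2000):
    """Tabulate phi, Phi, g, f from (10) on [0, 2*R1] for the step profile
    kappa(r) = -L (r <= R), kappa(r) = K (r > R); L > 0 is assumed."""
    R0 = R
    R1 = 0.5 * (R0 + math.sqrt(R0 * R0 + 32.0 / K))  # K * R1 * (R1 - R0) = 8
    h = R1 / n                                        # grid index n sits at R1
    r = [i * h for i in range(2 * n + 1)]
    # kappa^- equals L on (0, R] and 0 beyond, so the inner integral is explicit.
    phi = [math.exp(-L * min(s, R) ** 2 / 8.0) for s in r]
    Phi = cumtrapz(phi, h)
    I = cumtrapz([P / p for P, p in zip(Phi, phi)], h)
    # g decreases from 1 to 1/2 on [0, R1] and is constant afterwards.
    g = [1.0 - 0.5 * I[min(i, n)] / I[n] for i in range(2 * n + 1)]
    f = cumtrapz([p * q for p, q in zip(phi, g)], h)
    return {"r": r, "phi": phi, "Phi": Phi, "g": g, "f": f, "R0": R0, "R1": R1}

table = tabulate_f(1.0, 1.0, 1.0, n=500)
```

On the grid, the tabulated values satisfy the two-sided bound (11) up to rounding, because \(1/2\le g\le 1\) holds pointwise.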

We will explain in Sect. 4 below how the choice of f is obtained by trying to maximize the exponential decay rate. Let us now state our first main result which will be proven in Sect. 4.

Theorem 1

(Exponential contractivity of reflection coupling) Let \(\alpha :=\sup \{|\sigma ^{-1}z|^2\, :\,z\in {\mathbb {R}}^d \text{ with } \Vert z\Vert =1\}\), and define \(c\in (0,\infty )\) by

$$\begin{aligned} \frac{1}{c}= \alpha \int \limits _0^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds= \alpha \int \limits _0^{R_1}\int \limits _0^s\exp \left( \frac{1}{4}\int \limits _t^su\kappa (u)^{-}\,du\right) \,dt\,ds\, . \end{aligned}$$
(12)

Then for the distance \(d_f\) given by (5) and (10), the function \(t\mapsto e^{ct}\mathbb {E}[d_f(X_t,Y_t)]\) is decreasing on \([0,\infty )\).

The theorem yields exponential contractivity at rate \(c>0\) for the transition kernels \(p_t\) of (1) w.r.t. the Kantorovich distance \(W_f\). Moreover, it implies upper bounds for the standard Kantorovich (\(L^1\) Wasserstein) distance \(W^1=W_{\mathrm{id}}\) w.r.t. the distance function \(d(x,y)=\Vert x-y\Vert \):

Corollary 2

For any \(t\ge 0\) and any probability measures \(\mu ,\nu \) on \({\mathbb {R}}^d\),

$$\begin{aligned} W_f(\mu p_t,\nu p_t)\le & {} \exp ({-ct})\, W_f(\mu ,\nu ),\ \qquad \text{ and } \end{aligned}$$
(13)
$$\begin{aligned} W^1(\mu p_t,\nu p_t)\le & {} 2\varphi (R_0)^{-1}\exp ({-ct})\, W^1(\mu ,\nu ). \end{aligned}$$
(14)

Note that the second estimate follows from the first, since by the properties of \(\varphi \) and g stated above, \(\varphi (R_0)/2\le f^\prime \le 1\), and hence

$$\begin{aligned} \varphi (R_0)\Vert x-y\Vert /2\ \le \ d_f(x,y)\ \le \ \Vert x-y\Vert \qquad \text{ for } \text{ any } x,y\in {\mathbb {R}}^d. \end{aligned}$$
(15)

The corollary yields an upper bound for mixing times w.r.t. the Kantorovich distance \(W^1\). For \(\varepsilon >0\) let

$$\begin{aligned} \tau _{W^1}(\varepsilon )\ :=\ \inf \{t\ge 0\,:\,W^1(\mu p_t,\nu p_t)\le \varepsilon W^1(\mu ,\nu )\ \quad \forall \,\mu ,\nu \in \mathcal M_1({\mathbb {R}}^d )\} . \end{aligned}$$

Then by Corollary 2,

$$\begin{aligned} \tau _{W^1}(\varepsilon )\,\le \, c^{-1}\log (2/(\varepsilon \varphi (R_0)))\quad \text{ for } \text{ any } \varepsilon >0. \end{aligned}$$

The proofs of Theorem 1 and Corollary 2 are given in Sect. 4 below.
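To get a feeling for the size of the constant c in (12) and of the resulting mixing time bound, one can evaluate both numerically. The sketch below again uses the illustrative step profile \(\kappa (r)=-L\) for \(r\le R\), \(\kappa (r)=K\) for \(r>R\) with \(L>0\), which is an assumption and not part of the theorem; `contraction_rate` and `mixing_time_bound` are hypothetical helper names, and `alpha` defaults to 1 as for the intrinsic metric.

```python
import math

def cumtrapz(values, h):
    """Cumulative trapezoidal integral of grid values with spacing h."""
    out = [0.0]
    for a, b in zip(values, values[1:]):
        out.append(out[-1] + 0.5 * h * (a + b))
    return out

def contraction_rate(L, R, K, alpha=1.0, n=4000):
    """Approximate c from (12) for the step profile; returns (c, phi(R0))."""
    R0 = R
    R1 = 0.5 * (R0 + math.sqrt(R0 * R0 + 32.0 / K))  # K * R1 * (R1 - R0) = 8
    h = R1 / n
    r = [i * h for i in range(n + 1)]
    phi = [math.exp(-L * min(s, R) ** 2 / 8.0) for s in r]
    Phi = cumtrapz(phi, h)
    inv_c = alpha * cumtrapz([P / p for P, p in zip(Phi, phi)], h)[-1]
    return 1.0 / inv_c, math.exp(-L * R0 * R0 / 8.0)

def mixing_time_bound(c, phi_R0, eps):
    """Upper bound c^{-1} log(2 / (eps * phi(R0))) for the W^1 mixing time."""
    return math.log(2.0 / (eps * phi_R0)) / c

c, phi_R0 = contraction_rate(1.0, 1.0, 1.0)
t_mix = mixing_time_bound(c, phi_R0, 0.01)
```

For \(R=0\) the integrand in (12) equals s and \(R_1=\sqrt{8/K}\), so the sketch returns \(c=K/4\), which serves as a simple sanity check.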

Remark 2

(Non-constant diffusion coefficients) The methods and results presented above have natural extensions to diffusion processes with smooth non-constant diffusion matrices. In that case, one possibility is to use an ad hoc coupling as in [35], but this leads to restrictive assumptions and bounds that are far from optimal. A better approach is to switch to a Riemannian setup where the metric is the intrinsic metric \(G(x)=(\sigma (x)\sigma (x)^T)^{-1}\) given by the diffusion coefficients. The diffusion process \((X_t)\) can then be represented in the form

$$\begin{aligned} dX_t= \beta (X_t)\, dt\, +\, dB^G_t \end{aligned}$$
(16)

where \((B^G_t)\) is a Brownian motion on the Riemannian manifold \(({\mathbb {R}}^d,G)\), and \(\beta \) is a modified drift vector field. Now, by replacing the reflection coupling by the corresponding Kendall-Cranston coupling on \(({\mathbb {R}}^d,G)\), one can expect similar results as above with \(\kappa \) defined as

$$\begin{aligned} \kappa (r)= & {} 2r^{-1}\inf \left\{ - \langle \gamma _{y,x}^\prime (r),{\beta (x)}\rangle +\langle \gamma _{y,x}^\prime (0),{\beta (y)}\rangle \right. \\&\left. \,+\, \int _0^r\mathrm{Ric}(\gamma _{y,x}^\prime (s) ,\gamma _{y,x}^\prime (s) )\, ds : \Vert x-y\Vert =r\right\} , \end{aligned}$$

where \(\gamma _{y,x}:[0,r]\rightarrow {\mathbb {R}}^d\) is the unit speed geodesic from y to x and \(\mathrm{Ric}\) denotes the Ricci curvature on \(({\mathbb {R}}^d,G)\), cf. [17, 43].

Remark 3

(Diffusions with reflection on smooth convex domains) The results above also apply to diffusion processes on a smooth bounded domain \(D\subseteq {\mathbb {R}}^d\) with normal reflection at the boundary [1, 10, 18, 36, 40]. In that case the SDE (1) is replaced by

$$\begin{aligned} dX_t= b(X_t)\, dt\, +\, n(X_t)\, d\ell _t\, +\,\sigma \, dB_t, \end{aligned}$$
(17)

where n(x) is the interior normal vector at a boundary point x, and \((\ell _t)\) is the local time of \((X_t)\) on the boundary \(\partial D\), i.e., \(t\mapsto \ell _t\) is a non-decreasing process that increases only at times when \(X_t\in \partial D\). Consequently, in Eq. (4) for the coupling difference \(Z_t=X_t-Y_t\), additional drift terms in the directions \(n(X_t)\) and \(-n(Y_t)\) occur when one of the two copies is at the boundary. Since for a convex domain, both \(Z_t\cdot n(X_t)\le 0\) and \(-Z_t\cdot n(Y_t)\le 0\), the reflection at the boundary improves the upper bounds for \(\Vert Z_t\Vert \) in the proofs below when choosing \(G=I_d\). Therefore, the assertions of Theorem 1 and Corollary 2 hold true without further change if we take the infimum in the definition of \(\kappa \) only over \(x,y\in D\) and choose \(R_0\), \(R_1\) respectively equal to the diameter of D in case the infima in (8) or (9) are over empty sets.

2.2 Consequences

We summarize some important consequences of exponential contractivity w.r.t. Kantorovich distances as stated in Corollary 2. These consequences are essentially well-known, cf. e.g. Joulin [29], Joulin and Ollivier [30], and Komorowski and Walczuk [32] for related results. For the reader’s convenience, the proofs are nevertheless included in Sect. 4 below. We assume that \(\Vert z\Vert =|\sigma ^{-1}z|\) is the intrinsic metric, b is in \(C^1({\mathbb {R}^{d}},{\mathbb {R}^{d}})\), and

$$\begin{aligned} \int |z|\, p_t(x_0,dz)\ <\ \infty \end{aligned}$$
(18)

holds for some \(x_0\in {\mathbb {R}^{d}}\) and any \(t\ge 0\). Then, equivalently to (13), Theorem 1 implies Lipschitz contractivity for the transition semigroup

$$\begin{aligned} (p_tg)(x)= \int g(z)\, p_t(x,dz) \end{aligned}$$

w.r.t. the metric \(d_f\), i.e.,

$$\begin{aligned} \Vert p_tg\Vert _{\mathrm{Lip}(f)}\ \le \ \exp (-ct)\, \Vert g\Vert _{\mathrm{Lip}(f)} \end{aligned}$$
(19)

holds for any \(t\ge 0\) and any Lipschitz continuous function \(g:{\mathbb {R}^{d}}\rightarrow \mathbb {R}\), where

$$\begin{aligned} \Vert g\Vert _{\mathrm{Lip}(f)}= \sup \left\{ \frac{|g(x)-g(y)|}{d_f(x,y)}\, :\, x,y\in {\mathbb {R}^{d}} \text{ s.t. } x\ne y\right\} \end{aligned}$$

denotes the Lipschitz semi-norm w.r.t. \(d_f\). An immediate consequence is the existence of a unique stationary distribution \(\mu \) with finite second moments:

Corollary 3

(Convergence to equilibrium) There exists a unique stationary distribution \(\mu \) of \((p_t)_{t\ge 0}\) satisfying \(\int |y|\, \mu (dy) < \infty \) and

$$\begin{aligned} \mathrm{Var}_\mu (g) \ \le \ (2c)^{-1 }\Vert g\Vert ^2_{\mathrm{Lip}(f)}\ \text{ for } \text{ any } \text{ Lipschitz } \text{ continuous } g:{\mathbb {R}^{d}}\rightarrow \mathbb {R}. \end{aligned}$$
(20)

Moreover, for any probability measure \(\nu \) on \({\mathbb {R}^{d}}\),

$$\begin{aligned} W_f(\mu ,\nu p_t)\ \le \ \exp (-ct)\, W_f(\mu ,\nu )\qquad \text{ for } \text{ any } t\ge 0. \end{aligned}$$
(21)

We refer to [7, 11] for other recent results on convergence to equilibrium of diffusion processes in Wasserstein distances.

Further important consequences of (19) are quantitative non-asymptotic bounds for the decay of correlations and the bias and variance of ergodic averages. Let \(x_0\in {\mathbb {R}^{d}}\) and suppose that \((X,{\mathbb {P}})\) is a solution of (1) with initial condition \(X_0=x_0\).

Corollary 4

(Decay of correlations) For any Lipschitz continuous functions \(g,h:{\mathbb {R}^{d}}\rightarrow \mathbb {R}\) and \(s,t\ge 0\),

$$\begin{aligned} \mathrm{Cov}\, (g(X_t),h(X_{t+s})) \ \le \ \frac{1-e^{-2ct}}{2c}\, e^{-cs}\,\Vert g\Vert _{\mathrm{Lip}(f)}\, \Vert h\Vert _{\mathrm{Lip}(f)}. \end{aligned}$$
(22)

Corollary 5

(Bias and variance of ergodic averages) For any Lipschitz continuous function \(g:{\mathbb {R}^{d}}\rightarrow \mathbb {R}\) and \(t\in (0,\infty )\),

$$\begin{aligned} \left| {\mathbb {E}}\left( \frac{1}{t}\int _0^tg(X_s)\,ds\,-\,\int g\,d\mu \right) \right|\le & {} \frac{1-e^{-ct}}{ct}\, \Vert g\Vert _{\mathrm{Lip}(f)}\,\int d_f(x_0,y)\,\mu (dy),\ \text{ and }\\ \mathrm{Var}\left( \frac{1}{t}\int _0^tg(X_s)\,ds \right)\le & {} \frac{1}{c^2t}\, \Vert g\Vert ^2_{\mathrm{Lip}(f)} . \end{aligned}$$

In the variance estimate in Corollary 5, one of the factors \(1/c\) is due to the variance bound (20) w.r.t. the stationary distribution, whereas the second factor \(1/c\) bounds the decay rate for the correlations. Short proofs of Corollaries 3, 4, and 5 are included in Sect. 4.
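The two estimates in Corollary 5 are explicit enough to be evaluated directly. The following sketch does so; the function names and the argument `dist_to_mu`, which stands for the integral \(\int d_f(x_0,y)\,\mu (dy)\) and has to be supplied by the user, are illustrative assumptions.

```python
import math

def bias_bound(c, t, lip_g, dist_to_mu):
    """Right-hand side of the bias estimate: (1 - e^{-ct}) / (ct) * Lip * dist."""
    return -math.expm1(-c * t) / (c * t) * lip_g * dist_to_mu

def variance_bound(c, t, lip_g):
    """Right-hand side of the variance estimate: Lip^2 / (c^2 t)."""
    return lip_g ** 2 / (c * c * t)

# Illustrative numbers only: rate c = 0.5 and a 1-Lipschitz observable.
bias = [bias_bound(0.5, t, 1.0, 2.0) for t in (1.0, 10.0, 100.0)]
var = [variance_bound(0.5, t, 1.0) for t in (1.0, 10.0, 100.0)]
```

Both bounds decay like \(1/t\) for large t, with the bias prefactor tending to \(\Vert g\Vert _{\mathrm{Lip}(f)}\int d_f(x_0,y)\,\mu (dy)\) as \(t\rightarrow 0\).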

Remark 4

(CLT, Gaussian deviation inequality) The contractivity w.r.t. \(W_f\) can also be used to prove a central limit theorem for the ergodic averages [32] and a Gaussian deviation inequality strengthening Corollary 5, cf. Remark 2.10 in [29].

2.3 Examples

In order to illustrate the quality of the bounds given in Theorem 1 and in Corollary 2, we estimate the constant c defined by (12) in different scenarios, and we study the behaviour of c under perturbations of the drift b.

We first consider the situation where \(\kappa \) is bounded from below by a negative constant for any r, and by a positive constant for large r:

Lemma 1

(Contractivity under lower bounds on \(\varvec{\kappa }\)) Suppose that

$$\begin{aligned} \kappa (r)\ge -L\, \text{ for } r\le R, \text{ and } \kappa (r)\ge K\, \text{ for } r>R \end{aligned}$$
(23)

hold with constants \(R,L\in [0,\infty )\) and \(K\in (0,\infty )\). If \(LR_0^2\le 8\) then

$$\begin{aligned} \alpha ^{-1}c^{-1}\ \le \ \frac{e-1}{2}R^2\,+\,e\sqrt{8K^{-1}}\,R\,+\, 4K^{-1}\ \le \ \frac{3e}{2}\,\max (R^2,8K^{-1}), \end{aligned}$$
(24)

and if \(LR_0^2\ge 8\) then

$$\begin{aligned} \alpha ^{-1}c^{-1}\ \le \ 8\sqrt{2\pi }R^{-1}L^{-1/2}(L^{-1}+K^{-1})\exp \left( \frac{LR^2}{8}\right) +32R^{-2}K^{-2}. \end{aligned}$$
(25)

For diffusions with reflection on a smooth convex domain, corresponding bounds with \(K=\infty \) hold if R is the diameter of the domain, cf. Remark 3 above.

Remark 5

If \(L=0\) then the bound in (24) improves to

$$\begin{aligned} \alpha ^{-1}c^{-1}\ \le \ {2}\,\max (R^2,2K^{-1}). \end{aligned}$$
(26)

The proofs of Lemma 1 and Remark 5 are given in Sect. 5 below.

In the first case considered in the lemma, the constant c is at least of order \(\min (R^{-2}, K)\). Even if \(L=0\) (convex case), this order cannot be improved as one-dimensional Langevin diffusions with potential \(U(x)=Kx^2/2\), or, respectively, with vanishing drift on \((-R/2,R/2)\) demonstrate. In particular, for \(U(x)=Kx^2/2\) with \(K>0\), the distance \(W_f\) is equivalent to \(W^1\), and the exact decay rate is \(K/2\). This differs from the bounds in (26) and (24) only by a factor 2 and 6e, respectively. Thus, if \(LR_0^2\) is not too large, the contractivity properties are not affected substantially by non-convexity!
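The closed-form bounds of Lemma 1 and Remark 5 can be evaluated as follows. This is a plain transcription of the right-hand sides of (24), (25) and (26) into code under the stated case distinction; the function names are hypothetical, and \(\alpha =1\) is assumed when the bounds are converted into lower bounds for c.

```python
import math

def lemma1_bound(L, R, K, R0):
    """Upper bound on alpha^{-1} c^{-1} from (24) if L*R0^2 <= 8, else from (25)."""
    if L * R0 * R0 <= 8.0:
        return (0.5 * (math.e - 1.0) * R * R
                + math.e * math.sqrt(8.0 / K) * R + 4.0 / K)
    return (8.0 * math.sqrt(2.0 * math.pi) / (R * math.sqrt(L))
            * (1.0 / L + 1.0 / K) * math.exp(L * R * R / 8.0)
            + 32.0 / (R * R * K * K))

def crude_bound(R, K):
    """Second, simpler upper bound in (24)."""
    return 1.5 * math.e * max(R * R, 8.0 / K)

def remark5_bound(R, K):
    """Improved bound (26) in the convex case L = 0."""
    return 2.0 * max(R * R, 2.0 / K)

# Quadratic potential U(x) = K x^2 / 2 with K = 2: here L = 0 and R = R0 = 0.
K = 2.0
rate_from_26 = 1.0 / remark5_bound(0.0, K)   # lower bound for c with alpha = 1
rate_from_24 = 1.0 / crude_bound(0.0, K)     # lower bound for c with alpha = 1
exact_rate = K / 2.0
```

In this quadratic example the ratio of the exact rate to the lower bound is 2 for (26) and 6e for the simpler bound in (24).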

In the second case (\(LR_0^2\ge 8\)), if \(K\ge \text{ const. }\cdot L\) then the upper bound for \(c^{-1}\) is of order \(L^{-3/2}R^{-1}\exp (LR^2/8)\). By the next example, this order in R and L is again optimal:

Example 4

(Double-well potential with \( {U}^{\prime \prime }(x)=-L\,\mathrm{for }\,|x|\le R/2\)) Consider a Langevin diffusion in \({\mathbb {R}}^1\) with a symmetric potential \(U\in C^2({\mathbb {R}})\) satisfying \(U(x)=-Lx^2/2\) for \(x\in [-R/2,R/2]\), \(U^{\prime \prime }\ge -L\), and \(\liminf _{|x|\rightarrow \infty }U^{\prime \prime }(x)>0\). If \(\Vert \,\cdot \,\Vert \) is the Euclidean norm then \(\kappa (r)= -L\) for \(r\in (0,R]\). On the other hand, let \(\tau _0=\inf \{ t\ge 0:X_t=0\}\) denote the first hitting time of 0. Then for any initial condition \(x_0>0\),

$$\begin{aligned} \lim _{t\rightarrow \infty }t^{-1}\log \,P_{x_0}[\tau _0>t]= -\lambda _1(0,\infty ) \end{aligned}$$
(27)

where \(-\lambda _1(0,\infty )\) is the first Dirichlet eigenvalue of the generator \({\mathcal {L}}v=(v''-U'v')/2\) on \((0,\infty )\), cf. [23] or see Sect. 5 below for a short proof of the corresponding lower bound that is relevant here. If \(LR^2\ge 4\) then by inserting the function \(g(x)=\min (\sqrt{L}x,1)\) into the variational characterization of the Dirichlet eigenvalue, we obtain the upper bound

$$\begin{aligned} \lambda _1(0,\infty )\ \le \ \frac{3}{4} e^{1/2} L^{3/2}R\exp (-LR^2/8), \end{aligned}$$
(28)

cf. Sect. 5 below. The estimates (27) and (28) seem to indicate that for \(x_0>0\), the Kantorovich distance \(W^1(\delta _{-x_0}p_t, \delta _{x_0}p_t)\) decays at most with a rate of order \(L^{3/2}R\exp (-LR^2/8)\). Indeed, under appropriate growth assumptions on U(x) for \(|x|\ge R\), one can prove that

$$\begin{aligned} {\mathbb {P}}_R\left[ \tau _0 >t\right] \ \ge \ 3/4\qquad \text{ for } \text{ any } t\le \lambda _1(0,\infty )^{-1}/4 , \end{aligned}$$

cf. Sect. 5. Hence for \(t\le 3^{-1}e^{-1/2}L^{-3/2}R^{-1}\exp (LR^2/8)\), the Kantorovich distance \(W^1(\delta _Rp_t,\mu )\) between \(\delta _Rp_t\) and the stationary distribution \(\mu \) is bounded from below by a strictly positive constant that does not depend on L and R if \(LR^2\ge 4\).
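The time threshold in the last statement is simply one quarter of the reciprocal of the upper bound (28). The following short sketch makes this arithmetic explicit; the function names are hypothetical and the numerical values of L and R are arbitrary illustrations with \(LR^2\ge 4\).

```python
import math

def lambda1_upper(L, R):
    """Upper bound (28): (3/4) e^{1/2} L^{3/2} R exp(-L R^2 / 8)."""
    return 0.75 * math.exp(0.5) * L ** 1.5 * R * math.exp(-L * R * R / 8.0)

def time_threshold(L, R):
    """Threshold (1/3) e^{-1/2} L^{-3/2} R^{-1} exp(L R^2 / 8) from Example 4."""
    return math.exp(-0.5) / 3.0 * L ** -1.5 / R * math.exp(L * R * R / 8.0)

# Illustrative values: the threshold grows exponentially in L * R^2.
thresholds = [time_threshold(1.0, R) for R in (2.0, 4.0, 8.0)]
```

Since \(\lambda _1(0,\infty )\) is bounded above by (28), every t below this threshold also satisfies \(t\le \lambda _1(0,\infty )^{-1}/4\).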

For analyzing the behaviour of c under perturbations of the drift, we assume that \(\Vert z\Vert =|\sigma ^{-1}z|\) is the intrinsic metric corresponding to the diffusion matrix, i.e., \(G=(\sigma \sigma ^T)^{-1}\). Suppose that

$$\begin{aligned} b(x)= b_0(x)+\gamma (x)\qquad \text{ for } \text{ any } x\in {\mathbb {R}^{d}} \end{aligned}$$
(29)

with locally Lipschitz continuous functions \(b_0,\gamma : \mathbb {R}^d\rightarrow {\mathbb {R}^{d}}\). For \(r>0\) let

$$\begin{aligned} \kappa _0(r)= \inf \left\{ -2\,\frac{(x-y)\cdot G(b_0(x)-b_0(y))}{\Vert x-y\Vert ^2}\, :\, x,y\in {\mathbb {R}}^d \text{ s.t. } \Vert x-y\Vert =r\right\} \end{aligned}$$
(30)

be defined analogously to \(\kappa (r)\) with b replaced by \(b_0\). We assume that \(\kappa _0\) satisfies the assumptions (7) imposed on \(\kappa \) above, and we define \(R_0\) and \(R_1\) similarly to (8) and (9) but with \(\kappa \) replaced by \(\kappa _0\). Now suppose that there exists a constant \(R\le R_0\) such that

$$\begin{aligned} (x-y)\cdot G(\gamma (x)-\gamma (y))\ \le \ 0 \qquad \text{ for } \text{ any } x,y\in {\mathbb {R}^{d}} \text{ s.t. } \Vert x-y\Vert \ge R. \end{aligned}$$
(31)

Then \(\kappa (r)\ge \kappa _0(r)\) for \(r\ge R\), and hence the constants \(R_0\) and \(R_1\) defined w.r.t. b are smaller than the corresponding constants defined w.r.t. \(b_0\). In this situation, we can compare the lower bounds c and \(c_0\) for the contraction rates w.r.t. b and \(b_0\) given by (12):

Lemma 2

(Bounded and Lipschitz perturbations) Suppose that the drift \(b:{\mathbb {R}^{d}}\rightarrow {\mathbb {R}^{d}}\) is given by (29) with \(b_0\) and \(\gamma \) satisfying the assumptions stated above, and let c and \(c_0\) denote the lower bounds for the contraction rates w.r.t. b and \(b_0\) given by (12).

  1.

    If \(\gamma \) is bounded and (31) holds for a constant \(R\in [0,R_0]\) then

    $$\begin{aligned} c\ \ge \ c_0\,\exp (-R\sup \Vert \gamma \Vert ). \end{aligned}$$
    (32)
  2.

    If \(\gamma \) satisfies the one-sided Lipschitz condition

    $$\begin{aligned} (x-y)\cdot G(\gamma (x)-\gamma (y))\ \le \ L\cdot \Vert x-y\Vert ^2 \qquad \forall \ x,y\in {\mathbb {R}^{d}}\end{aligned}$$
    (33)

    with a finite constant \(L\in [0,\infty )\) and (31) holds for a constant \(R\in [0,R_0]\) then

    $$\begin{aligned} c\ \ge \ c_0\,\exp (-LR^2/4 ). \end{aligned}$$
    (34)

Remark 6

The condition \(R\le R_0\) is required in Lemma 2. If (31) does not hold for \(x,y\in {\mathbb {R}^{d}}\) with \(\Vert x-y\Vert \ge R_0\) then the constants \(R_0(b)\) and \(R_1(b)\) defined w.r.t. b are in general greater than the corresponding constants defined w.r.t. \(b_0\), i.e., the region of non-convexity increases by adding the drift \(\gamma \). This will also affect the bound in (12) significantly.

The proof of Lemma 2 is given in Sect. 5.

2.4 Local contractivity and a high-dimensional example

Consider again the setup in Sect. 2.1. In some applications, the condition \(\liminf _{r\rightarrow \infty } \kappa (r)>0\) imposed above is not satisfied, but the diffusion process will stay inside a ball \(B\subset {\mathbb {R}^{d}}\) for a long time with high probability. In this case, one can still prove exponential contractivity up to an error term that is determined by the exit probabilities from the ball. Corresponding estimates are useful to prove non-asymptotic error bounds, i.e., for fixed \(t\in (0,\infty )\), cf. e.g. [8, 9, 22].

Fix \(R\in (0,\infty )\) and let \(W_{f_R}\) denote the Kantorovich distance based on the distance function \(d_{f_R}(x,y)=f_R(\Vert x-y\Vert )\) given by

$$\begin{aligned} f_R(r)= \int _0^r\varphi (s)g_R(s)\, ds\qquad \text{ for } r\ge 0, \end{aligned}$$
(35)

where \(\varphi \) and \(\varPhi \) are defined by (10), and

$$\begin{aligned} g_R(r) = 1-\int _0^{r\wedge R}\frac{\varPhi (s)}{\varphi (s)}\, ds\Big / \int _0^{ R}\frac{\varPhi (s)}{\varphi (s)}\, ds . \end{aligned}$$
(36)

Notice that

$$\begin{aligned} g_R(r)=0\quad \text{ and }\quad f_R(r)=f_R(R) \quad \text{ for } \text{ any } r\ge R, \end{aligned}$$

i.e., we have cut the distance at \(f_R(R)\).
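The construction of \(f_R\) can be checked numerically. The following is a minimal sketch, not part of the original argument: it tabulates \(\varphi \), \(g_R\) and \(f_R\) on \([0,2R]\) by the trapezoidal rule, assuming \(\varphi (r)=\exp (-\frac{1}{4}\int _0^rs\kappa (s)^{-}ds)\) and \(\varPhi (r)=\int _0^r\varphi (s)\,ds\). The function name `cut_distance`, the grid size and the constant profile \(\kappa \equiv -1\) are illustrative assumptions.

```python
import math

def cut_distance(kappa, R, n=1000):
    """Tabulate phi, g_R and f_R on [0, 2R] with the trapezoidal rule.

    kappa is a user-supplied function on (0, inf) (an assumption of this
    sketch). Grid index n corresponds to r = R, so larger indices lie
    beyond the cut-off.
    """
    h = R / n
    r = [i * h for i in range(2 * n + 1)]
    # integrand s * kappa(s)^-, with the value at s = 0 set to 0
    w = [0.0] + [s * max(-kappa(s), 0.0) for s in r[1:]]
    phi, acc = [1.0], 0.0
    for i in range(1, 2 * n + 1):
        acc += 0.5 * h * (w[i - 1] + w[i])
        phi.append(math.exp(-acc / 4.0))
    Phi = [0.0]
    for i in range(1, 2 * n + 1):
        Phi.append(Phi[-1] + 0.5 * h * (phi[i - 1] + phi[i]))
    ratio = [Phi[i] / phi[i] for i in range(2 * n + 1)]
    # integral of Phi/phi over [0, min(r, R)]: increments stop at index n
    I = [0.0]
    for i in range(1, 2 * n + 1):
        step = 0.5 * h * (ratio[i - 1] + ratio[i]) if i <= n else 0.0
        I.append(I[-1] + step)
    g = [1.0 - v / I[n] for v in I]          # g_R as in (36)
    f = [0.0]
    for i in range(1, 2 * n + 1):            # f_R as in (35)
        f.append(f[-1] + 0.5 * h * (phi[i - 1] * g[i - 1] + phi[i] * g[i]))
    return r, phi, g, f

# Illustrative profile: kappa = -1 everywhere, cut-off radius R = 2
r, phi, g, f = cut_distance(lambda s: -1.0, R=2.0)
```

In this sketch \(g_R\) vanishes and \(f_R\) is constant at all grid points beyond \(R\), and \(f_R(r)\le r\) because \(0\le \varphi g_R\le 1\).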

Theorem 6

(Local exponential contractivity) Suppose that the assumptions from Sect. 2.1 are satisfied except for the condition \(\liminf _{r\rightarrow \infty } \kappa (r)>0\). Then for any \(t,R\ge 0\) and any probability measures \(\mu ,\nu \) on \({\mathbb {R}}^d\),

$$\begin{aligned} W_{f_R}(\mu p_t,\nu p_t)\le & {} \exp ({-c_Rt})\, W_{f_R}(\mu ,\nu )\nonumber \\&+\, R\cdot \left( {\mathbb {P}}_\mu [\tau _{R/2}\le t] +{\mathbb {P}}_\nu [\tau _{R/2}\le t]\right) , \end{aligned}$$
(37)

where \((X_t,{\mathbb {P}}_\mu )\) is a diffusion process satisfying (1) with initial distribution \(\mu \), \(\tau _{R/2}=\inf \{ t\ge 0:\Vert X_t\Vert >R/2\}\) denotes the first exit time from the ball of radius \(R/2\) around 0, and

$$\begin{aligned} \frac{1}{c_R}= \alpha \int \limits _0^{R}\varPhi (s)\varphi (s)^{-1}\,ds= \alpha \int \limits _0^{R}\int \limits _0^s\exp \left( \frac{1}{4}\int \limits _t^su\kappa (u)^{-}\,du\right) \,dt\,ds. \end{aligned}$$
(38)

The proof of the theorem is given in Sect. 5. In applications, the exit probabilities are typically estimated by using appropriate Lyapunov functions.

Example 5

(Stochastic heat equation) We consider the diffusion in \({\mathbb {R}}^{d-1}\) given by \(X_t^0\equiv X_t^d\equiv 0\) and

$$\begin{aligned} dX_t^i= \left[ d^2\, (X_t^{i+1}-2X_t^i+X_t^{i-1})+ V'(X_t^i)\right] \, dt\, +\,\sqrt{d}\, dB_t^i, \end{aligned}$$
(39)

\(i=1,\ldots ,d-1\), where \(V:\mathbb {R}\rightarrow \mathbb {R}\) is a \(C^2\) function such that \(V''\ge -L\) for a finite constant \(L\in \mathbb {R}\). Equation (39) is a spatial discretization at the grid points \(i/d\) (\(i=0,1,\ldots ,d\)) of the stochastic heat equation with space-time white noise and Dirichlet boundary conditions on the interval [0, 1] given by

$$\begin{aligned} du= \left( \varDelta _{{\mathrm{Dir}}}u\, +\, V'(u)\right) \, dt\, +\, dW \end{aligned}$$
(40)

with the Dirichlet Laplacian \(\varDelta _{{\mathrm{Dir}}}\) on the interval [0, 1] and a cylindrical Wiener process \((W_t)_{t\ge 0}\) over the Hilbert space \(L^2(0,1)\). We observe that (39) is of the form (1) with \(\sigma =\sqrt{d} I_{d-1}\) and \(b=-d\nabla U\) where

$$\begin{aligned} U(x)= \frac{d}{2} \sum _{i=1}^d\left| x^i-x^{i-1}\right| ^2 \, +\, \frac{1}{d}\sum _{i=0}^d V(x^i) \end{aligned}$$

for \(x=(x^1,\ldots ,x^{d-1})\in {\mathbb {R}}^{d-1}\) and \(x^0=x^d=0\). By the discrete Poincaré inequality,

$$\begin{aligned} \sum _{i=1}^d\left| x^i-x^{i-1}\right| ^2 \ \ge \ 2\, (1-\cos (\pi /d))\, \sum _{i=1}^{d-1}\left| x^i\right| ^2 . \end{aligned}$$

Hence for any \(x,\xi \in \mathbb {R}^{d-1}\) and \(x^0=x^d =\xi ^0=\xi ^d=0\), the lower bound

$$\begin{aligned} \partial ^2_{\xi \xi }U(x)= d\sum _{i=1}^d\left| \xi ^i-\xi ^{i-1}\right| ^2\, +\,\frac{1}{d}\sum _{i=1}^{d-1}V''(x^i)\left| \xi ^i\right| ^2 \ \ge \ \frac{1}{d} K_d\sum _{i=1}^{d-1}\left| \xi ^i\right| ^2 \end{aligned}$$

holds with \(K_d= 2\, d^2\, (1-\cos (\pi /d))-L \), and thus

$$\begin{aligned} (x-y)\cdot (b(x)-b(y))= -d\, (x-y)\cdot (\nabla U(x) -\nabla U(y))\ \le \ -K_d\, |x-y|^2 \end{aligned}$$

for any \(x,y\in \mathbb {R}^{d-1}\) where \(|\cdot |\) denotes the Euclidean norm. Choosing for \(\Vert \cdot \Vert \) the intrinsic metric \(\Vert x\Vert =d^{-1/2}|x|\), we obtain

$$\begin{aligned} \kappa (r)\ \ge \ 2\, K_d\quad \text{ for } \text{ any } r>0. \end{aligned}$$

In particular, the function \(\kappa \) is bounded from below uniformly by a real constant that does not depend on the dimension d since

$$\begin{aligned} \lim _{d\rightarrow \infty } K_d= \pi ^2-L\ >\ -\infty . \end{aligned}$$
(41)
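The constants in this example are easy to check numerically. The sketch below is an illustration only: the function names are ours, and the identity \(1-\cos x=2\sin ^2(x/2)\) is used to avoid cancellation for large d. It evaluates the discrete Poincaré constant and \(K_d\), and tests the inequality on the vector \(x^i=\sin (\pi i/d)\), for which equality holds.

```python
import math

def poincare_constant(d):
    """2 (1 - cos(pi/d)), written as 4 sin^2(pi/(2d)) for numerical stability."""
    return 4.0 * math.sin(math.pi / (2 * d)) ** 2

def K(d, L):
    """K_d = 2 d^2 (1 - cos(pi/d)) - L."""
    return d * d * poincare_constant(d) - L

def dirichlet_energy(x):
    """sum_{i=1}^d |x^i - x^{i-1}|^2 for x = (x^1, ..., x^{d-1}), x^0 = x^d = 0."""
    padded = [0.0] + list(x) + [0.0]
    return sum((b - a) ** 2 for a, b in zip(padded, padded[1:]))

d = 8
# Extremal vector: the inequality becomes an equality
x = [math.sin(math.pi * i / d) for i in range(1, d)]
lhs = dirichlet_energy(x)
rhs = poincare_constant(d) * sum(v * v for v in x)

# A non-extremal test vector: strict inequality is expected
y = [(-1) ** i * i for i in range(1, d)]
lhs_y = dirichlet_energy(y)
rhs_y = poincare_constant(d) * sum(v * v for v in y)
```

Evaluating `K(d, L)` for increasing d illustrates the convergence to \(\pi ^2-L\) stated in (41).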

Theorem 6 now shows that for any \(R>0\), local exponential contractivity in the sense of (37) holds on the ball

$$\begin{aligned} B_{R/2}= \{ x\in \mathbb {R}^{d-1}: \Vert x\Vert \le R/2\} = \{ x\in \mathbb {R}^{d-1}: | x|\le d^{1/2}R/2\} \end{aligned}$$

with rate \(c_R\) satisfying

$$\begin{aligned} \frac{1}{c_R}\le & {} 4\sqrt{\pi }R^{-1}|K_d|^{-3/2}\exp (-K_dR^2/4)\qquad \text{ for } K_dR^2\le -4,\\ \frac{1}{c_R}\le & {} (e-1)R^2/2\qquad \qquad \text{ for } -4\le K_dR^2<0,\\ \frac{1}{c_R}\le & {} R^2/2\qquad \qquad \text{ for } K_d=0 \text{ respectively. } \end{aligned}$$

Here the explicit upper bounds are obtained analogously as in the proof of Lemma 1. For \(K_d>0\), strict convexity holds, and we obtain global exponential contractivity with a dimension-independent rate. We remark that because of (41), the bounds also carry over to the limiting SPDE (40) for which they imply local exponential contractivity on balls w.r.t. the \(L^2\) norm.
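As a sanity check, the bounds on \(1/c_R\) can be compared with a direct quadrature of (38). The sketch below is illustrative and rests on two assumptions: \(\alpha =1\) for the intrinsic metric, and \(\kappa \) replaced by its lower bound \(2K_d\), so that the exponent in (38) becomes \(K_d^{-}(s^2-t^2)/4\). The name `inv_local_rate` and the midpoint rule are our choices.

```python
import math

def inv_local_rate(K, R, n=400):
    """Midpoint-rule approximation of
    int_0^R int_0^s exp(K^- (s^2 - t^2) / 4) dt ds,
    i.e. the right-hand side of (38) with alpha = 1 and kappa = 2K (assumed)."""
    k_minus = max(-K, 0.0)
    h = R / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        hs = s / n
        inner = hs * sum(
            math.exp(k_minus * (s * s - ((j + 0.5) * hs) ** 2) / 4.0)
            for j in range(n)
        )
        total += inner * h
    return total

# Three regimes of K_d R^2 (the numerical values are illustrative)
v_zero = inv_local_rate(0.0, 2.0)      # K_d = 0
v_small = inv_local_rate(-1.0, 1.0)    # -4 <= K_d R^2 < 0
v_large = inv_local_rate(-4.0, 2.0)    # K_d R^2 <= -4

bound_small = (math.e - 1.0) * 1.0 ** 2 / 2.0
bound_large = 4.0 * math.sqrt(math.pi) / 2.0 * 4.0 ** (-1.5) * math.exp(4.0)
```

In each regime the quadrature value should lie below the corresponding bound, and for \(K_d=0\) it should reproduce \(R^2/2\) up to rounding.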

3 Main results for componentwise reflection couplings

3.1 Componentwise reflection couplings and contractivity on product spaces

We now consider a system

$$\begin{aligned} dX_t^i= b^i(X_t)\, dt\, +\, dB_t^i ,\qquad i=1,\ldots ,n, \end{aligned}$$
(42)

of n interacting diffusion processes taking values in \(\mathbb {R}^{d_i}\), \(d_i\in \mathbb {N}\). Here \(B^i\), \(i=1,\ldots ,n\), are independent Brownian motions in \(\mathbb {R}^{d_i}\), \(X=(X^1,\ldots ,X^n)\) is a diffusion process taking values in \({\mathbb {R}^{d}}\) where \(d=\sum _{i=1}^nd_i\), and \(b^i:{\mathbb {R}^{d}}\rightarrow \mathbb {R}^{d_i}\) are locally Lipschitz continuous functions. We will assume that

$$\begin{aligned} b^i(x)= b_0^i(x^i)\, +\, \gamma ^i(x) ,\qquad i=1,\ldots ,n, \end{aligned}$$
(43)

where the functions \(b_0^i:\mathbb {R}^{d_i}\rightarrow \mathbb {R}^{d_i}\) are locally Lipschitz continuous, and \(\gamma ^i:{\mathbb {R}^{d}}\rightarrow \mathbb {R}^{d_i}\) are “sufficiently small” perturbations, cf. Theorem 7 below. In particular, for \(\gamma ^i \equiv 0\) the components \(X^1,\ldots ,X^n\) are independent.

To analyse contraction properties of the process X, one could use a reflection coupling on \({\mathbb {R}^{d}}\) and apply the results above based on a distance function of the form \(d_f(x,y)=f(|x-y|)\). In some applications, this approach does indeed provide dimension-free bounds, cf. Example 5 above. However, in the product case \(\gamma ^i \equiv 0\) it leads in general to lower bounds for contraction rates that degenerate rapidly as \(n\rightarrow \infty \), even though one would expect exponential contractivity with the minimum of the contraction rates for the components. The reason is that the approach requires convexity outside a Euclidean ball in \({\mathbb {R}^{d}}\) whereas in corresponding product models, in general convexity only holds if all components are outside given balls in \(\mathbb {R}^{d_i}\).

Instead, we now consider contractivity w.r.t. Kantorovich distances \(W_{f,w}\) based on distance functions on \({\mathbb {R}^{d}}=\mathbb {R}^{d_1+\cdots +d_n}\) of the form

$$\begin{aligned} d_{f,w}(x,y)= \sum _{i=1}^nf_i(|x^i-y^i|)\, w_i\, . \end{aligned}$$
(44)

Here \(f_i:[0,\infty )\rightarrow [0,\infty )\), \(1\le i\le n\), are strictly increasing concave \(C^1\) functions with \(f_i(0)=0\) and \(f_i^\prime (0)=1\) that are obtained from \(b_0^i\) in the same way as f has been obtained from b above, and \(w_i\in (0,1]\) are positive weights. In many applications, one can choose \(w_i=1\) for any i. The corresponding distance will then be denoted by \(d_{1,f}\). Notice that \(d_{1,f}\) is bounded from above by the \(\ell ^1\) distance

$$\begin{aligned} d_{\ell ^1}(x,y)= \sum _{i=1}^n|x^i-y^i| . \end{aligned}$$

Hence \(W_{1,f}\) is bounded from above by the Kantorovich distance \(W_{\ell ^1}\) based on \(d_{\ell ^1}\).

For \(r\in (0,\infty )\) let

$$\begin{aligned} \kappa _i(r)\,=\,r^{-2}\, \inf \left\{ -2\,(x-y)\cdot (b_0^i(x)-b_0^i(y))\, :\, x,y\in \mathbb {R}^{d_i} \text{ s.t. } |x-y|=r\right\} . \end{aligned}$$
(45)

Similarly as above, we assume that for \(1\le i\le n\),

$$\begin{aligned} \kappa _i:(0, \infty )\rightarrow \mathbb {R} \text{ is } \text{ continuous } \text{ with } \, \liminf _{r\rightarrow \infty }\kappa _i(r)>0. \end{aligned}$$
(46)

Moreover, we assume

$$\begin{aligned} \lim _{r\rightarrow 0}r\kappa _i(r)=0. \end{aligned}$$
(47)

Let \(R_0^i\), \(R_1^i\), \(g_i(r)\), \(\varphi _i(r)\), \(f_i(r)\) and \(\varPhi _i(r)=\int _0^r\varphi _i(s)\, ds\) be defined analogously to (8), (9) and (10) with \(\kappa \) replaced by \(\kappa _i\). Moreover, we define \(c_i\in (0,\infty )\) by

$$\begin{aligned} \frac{1}{c_i}= \int \limits _0^{R_1^i}\varPhi _i(s)\varphi _i(s)^{-1}\,ds= \int \limits _0^{R_1^i}\int \limits _0^s\exp \left( \frac{1}{4}\int \limits _t^su\kappa _i(u)^{-}\,du\right) \,dt\,ds\, . \end{aligned}$$
(48)

Recall that by Theorem 1 and Corollary 2, \(c_i\) is a lower bound for the contraction rate of the diffusion process \(\widetilde{X}^i\) on \(\mathbb {R}^{d_i}\) satisfying the s.d.e. \(d\widetilde{X}_t^i=b_0^i(\widetilde{X}_t^i)\,dt\,+\,dB_t^i\).

Let \(p_t(x,dy)\) denote the transition kernels of the diffusion process \(X_t=(X_t^1,\ldots ,X_t^n)\) on \({\mathbb {R}^{d}}\) satisfying (42). We now state our second main result:

Theorem 7

(Exponential contractivity on product spaces) Suppose that (46) and (47) hold, and suppose that there exist constants \(\varepsilon _i\in [0,c_i)\), \(1\le i\le n\), such that for any \(x,y\in {\mathbb {R}^{d}}\),

$$\begin{aligned} \sum _{i=1}^n|\gamma ^i(x)-\gamma ^i(y)|\, w_i\ \le \ \sum _{i=1}^n\varepsilon _i\, f_i(|x^i-y^i|)\, w_i . \end{aligned}$$
(49)

Then for any \(t\ge 0\) and any probability measures \(\mu ,\nu \) on \({\mathbb {R}}^d\),

$$\begin{aligned} W_{f,w}(\mu p_t,\nu p_t)\le & {} \exp ({-ct})\, W_{f,w}(\mu ,\nu ),\ \qquad \text{ and } \end{aligned}$$
(50)
$$\begin{aligned} W_{\ell ^1}(\mu p_t,\nu p_t)\le & {} A\, \exp ({-ct})\, W_{\ell ^1}(\mu ,\nu ), \end{aligned}$$
(51)

where \( c = \min \nolimits _{i=1,\ldots ,n} (c_i-\varepsilon _i)\qquad \text{ and }\qquad A = 2\Big /\min \nolimits _{i=1,\ldots ,n} (\varphi _i(R_0^i)w_i) .\)

Example 6

(Product model) In the product case, \(\gamma ^i\equiv 0\) for any i. Hence Condition (49) is satisfied with \(\varepsilon _i=0\), and, therefore,

$$\begin{aligned} W_{f,w}(\mu p_t,\nu p_t) \ \le \ \exp ({-ct})\, W_{f,w}(\mu ,\nu ) \end{aligned}$$

holds with \(c=\min c_i\) for any choice of the weights \(w_1,\ldots , w_n\).

More generally than in the example, suppose now that \(\gamma =(\gamma ^1,\ldots ,\gamma ^n)\) satisfies an \(\ell ^1\)-Lipschitz condition

$$\begin{aligned} \sum _{i=1}^n|\gamma ^i(x)-\gamma ^i(y)|\ \le \ \lambda \, \sum _{i=1 }^n |x^i-y^i|\qquad \forall \ x,y\in {\mathbb {R}^{d}}. \end{aligned}$$
(52)

Then exponential contractivity holds for the perturbed product model provided \(\lambda <c_i\varphi _i(R_0^i)/2\) for any i:

Corollary 8

(Perturbations of product models) Suppose that (43), (46), (47) and (52) hold with \(\lambda \in [0,\infty )\). Then for any \(t\ge 0\) and any probability measures \(\mu ,\nu \) on \({\mathbb {R}}^d\),

$$\begin{aligned} W_{f,1}(\mu p_t,\nu p_t)\le & {} \exp ({-ct})\, W_{f,1}(\mu ,\nu ),\ \text{ and } \end{aligned}$$
(53)
$$\begin{aligned} W_{\ell ^1}(\mu p_t,\nu p_t)\le & {} A\exp ({-ct})\, W_{\ell ^1}(\mu ,\nu ), \end{aligned}$$
(54)

where \(\ c=\min \nolimits _{i=1,\ldots n}(c_i-2\lambda \varphi _i(R_0^i)^{-1})\ \) and \(\ A=2\max \nolimits _{i=1,\ldots n}\varphi _i(R_0^i)^{-1}\).
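The constants in Corollary 8 agree with those of Theorem 7 for \(w_i=1\) and \(\varepsilon _i=2\lambda \varphi _i(R_0^i)^{-1}\), which is the natural choice suggested by the bound \(f_i(r)\ge \varphi _i(R_0^i)\,r/2\). The following sketch illustrates this bookkeeping. The function names and the numerical values of \(c_i\), \(\varphi _i(R_0^i)\) and \(\lambda \) are hypothetical and are not taken from the text.

```python
import math

def theorem7_constants(c, eps, phi0, w):
    """c = min_i (c_i - eps_i) and A = 2 / min_i (phi_i(R_0^i) w_i)."""
    rate = min(ci - ei for ci, ei in zip(c, eps))
    A = 2.0 / min(p * wi for p, wi in zip(phi0, w))
    return rate, A

def corollary8_constants(c, lam, phi0):
    """c = min_i (c_i - 2 lam / phi_i(R_0^i)) and A = 2 max_i 1 / phi_i(R_0^i)."""
    rate = min(ci - 2.0 * lam / p for ci, p in zip(c, phi0))
    A = 2.0 * max(1.0 / p for p in phi0)
    return rate, A

# Hypothetical component data for n = 3 components
c = [0.5, 0.8, 0.3]          # component rates c_i
phi0 = [0.9, 0.6, 1.0]       # values phi_i(R_0^i) in (0, 1]
lam = 0.05                   # l^1-Lipschitz constant in (52)

eps = [2.0 * lam / p for p in phi0]
rate7, A7 = theorem7_constants(c, eps, phi0, [1.0] * 3)
rate8, A8 = corollary8_constants(c, lam, phi0)
```

For these values each \(\varepsilon _i\) is smaller than \(c_i\), so the resulting rate is strictly positive.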

The intuitive idea of the proof of Theorem 7 is to construct a coupling \((X_t,Y_t)\) of two solutions of (42) by applying a reflection coupling individually for each component \((X_t^i,Y_t^i)\) if \(X_t^i\ne Y_t^i\), and a synchronous coupling if \(X_t^i=Y_t^i\). In the product case this just means that \(X_t^i=Y_t^i\) for any \(t\ge \tau ^i\) where \(\tau ^i=\inf \{ t\ge 0: X_t^i=Y_t^i\} \) is the coupling time for the i-th component. In the non-product case, however, \(X_t^i\) and \(Y_t^i\) can move apart again after the time \(\tau ^i\) due to interactions with other components. In that case it is not clear how to define a coupling as described above rigorously. Instead we will use a regularized version where reflection coupling is applied to the i-th component whenever \(|X_t^i-Y_t^i|\ge \delta \) for a given constant \(\delta >0\), and synchronous coupling is applied whenever \(|X_t^i-Y_t^i|\le \delta /2 \). A precise description of the coupling and the proofs of Theorem 7 and Corollary 8 are given in Sects. 6 and 7 below.

3.2 Consequences

The contractivity results in Theorem 7 and Corollary 8 have consequences analogous to those of the contractivity results in the non-product case, cf. Sect. 2.2 above. An important difference, however, is that on product spaces,

$$\begin{aligned} d_{f,w}(x,y)\ \le \ \sum _{i=1}^n|x^i-y^i|\ \le \ n^{1/2}\, |x-y| \end{aligned}$$

by the Cauchy-Schwarz inequality. Therefore, an additional factor n occurs in the variance bounds from Corollaries 3, 4 and 5 on product spaces. Apart from this additional factor, all results in Sect. 2.2 carry over to the setup considered in Sect. 3.1.

3.3 Interacting Langevin diffusions

As an illustration of the results in Sect. 3.1, we consider a system

$$\begin{aligned} dX_t^i= -\frac{1}{2}\nabla U(X_t^i)\, dt\,-\, \sum _{j=1}^na_{ij}\, \nabla V(X_t^i-X_t^j)\, dt \, +\, dB_t^i \end{aligned}$$
(55)

of n interacting overdamped Langevin diffusions taking values in \(\mathbb {R}^k\) for some \(k\in \mathbb {N}\). Here \(B^1,\ldots ,B^n\) are independent Brownian motions in \(\mathbb {R}^k\), \(U\in C^2(\mathbb {R}^k)\) is strictly convex outside a given ball, the interaction potential V is in \(C^2(\mathbb {R}^k)\) with bounded second derivatives, and \(a_{ij}\), \(1\le i,j\le n\), are finite real constants. For example, we are interested in nearest-neighbour interactions and mean-field interactions given by

$$\begin{aligned} a_{ij}= & {} \left\{ \begin{array}{ll}\alpha /2\ \ &{}\quad \text{ if } i-j\equiv 1 \text{ mod } n \text{ or } i-j\equiv -1 \text{ mod } n,\\ 0&{}\quad \text{ otherwise, } \end{array}\right. \end{aligned}$$
(56)
$$\begin{aligned} a_{ij}= & {} \alpha \, n^{-1}\qquad \text{ respectively, } \end{aligned}$$
(57)

where \(\alpha \in \mathbb {R}\) is a finite coupling constant.

Choosing \(b_0^i(x^i)=-\nabla U(x^i)/2\) and \(\gamma ^i(x)= -\sum _{j=1}^na_{ij}\nabla V(x^i-x^j)\), we observe that the function

$$\begin{aligned} \kappa _i(r)= \inf \left\{ \int \nolimits _0^1\partial ^2_{(x-y)/|x-y|}U((1-t)x+ty)\, dt :x,y\in {\mathbb {R}}^k \text{ s.t. }\, |x-y| =r\right\} \end{aligned}$$

does not depend on i. Let \(\varphi \) and f be the corresponding functions given by (10), and consider the distance

$$\begin{aligned} d_{1,f}(x,y)= \sum _{i=1}^nf(|x^i-y^i|). \end{aligned}$$

Moreover, let c be given by (12) with \(\alpha =1\), i.e., c is the lower bound for the contraction rate of the diffusion process Y in \({\mathbb {R}}^k\) satisfying \(dY=-\frac{1}{2}\nabla U(Y)\, dt\, +\, dB\). We note that \(\gamma \) satisfies the \(\ell ^1\) Lipschitz condition (52) with

$$\begin{aligned} \lambda = M\cdot \max _i\sum _{j=1}^n \left( |a_{ij}|+|a_{ji}|\right) \end{aligned}$$

where \(M=\sup \Vert \nabla ^2V\Vert \). Therefore, if

$$\begin{aligned} \max _i\sum _{j=1}^n\left( |a_{ij}|+|a_{ji}|\right) \ <\ c\, \varphi (R_0)\, M^{-1}/2 \end{aligned}$$

then by Corollary 8, contractivity in the sense of (53) holds with contraction rate

$$\begin{aligned} {\bar{c}}= c-2\lambda \varphi (R_0)^{-1}> 0. \end{aligned}$$

In particular, in the nearest neighbour and mean field case, we obtain contractivity with a rate that does not depend on the dimension if \( \alpha \) is small:

Corollary 9

(Mean field and nearest neighbour interactions) Let \(p_t\), \(t\ge 0\), denote the transition kernels of the diffusion process on \({\mathbb {R}}^{nk}\) solving (55). Suppose that \(\sup \Vert \nabla ^2V\Vert <\infty \) and that \(a_{ij}\) is given by (56) or by (57) with \(\alpha \in \mathbb {R}\). Then there exist finite constants \(c,\theta ,A\in (0,\infty )\) that do not depend on the dimension n such that

$$\begin{aligned} W_{f,1}(\mu p_t,\nu p_t)\le & {} e^{(\theta \alpha -c)t}\, W_{f,1}(\mu ,\nu ),\ \text{ and } \end{aligned}$$
(58)
$$\begin{aligned} W_{\ell ^1}(\mu p_t,\nu p_t)\le & {} A\, e^{(\theta \alpha -c)t}\, W_{\ell ^1}(\mu ,\nu ), \end{aligned}$$
(59)

hold for any \(t\ge 0\) and any probability measures \(\mu ,\nu \) on \({\mathbb {R}}^{nk}\). In particular, exponential contractivity holds for \(\alpha <c/\theta \).

The bounds in (58) and (59) are not sharp. However, it is known that for example in mean field models where U is a double-well potential and V is quadratic, exponential contractivity with a rate independent of the dimension cannot be expected to hold for large \(\alpha \). Indeed, in this case the corresponding McKean-Vlasov process has several stationary distributions if \(\alpha > \alpha _1\) for some critical parameter \(\alpha _1\in (0,\infty )\), cf. [26, 27].
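The dimension-independence of the Lipschitz constant \(\lambda \) for the interactions (56) and (57) can be verified directly. The sketch below is an illustration with hypothetical function names and parameter values: it builds the matrices \((a_{ij})\) and evaluates \(\lambda =M\max _i\sum _j(|a_{ij}|+|a_{ji}|)\), which equals \(2M|\alpha |\) in both cases when \(n\ge 3\).

```python
import math

def nearest_neighbour(n, alpha):
    """a_ij = alpha/2 if i - j = +1 or -1 mod n, and 0 otherwise, as in (56)."""
    return [[alpha / 2.0 if (i - j) % n in (1, n - 1) else 0.0
             for j in range(n)] for i in range(n)]

def mean_field(n, alpha):
    """a_ij = alpha / n for all i, j, as in (57)."""
    return [[alpha / n] * n for _ in range(n)]

def lipschitz_constant(a, M):
    """lambda = M * max_i sum_j (|a_ij| + |a_ji|)."""
    n = len(a)
    return M * max(sum(abs(a[i][j]) + abs(a[j][i]) for j in range(n))
                   for i in range(n))

def perturbed_rate(c, lam, phi_R0):
    """c_bar = c - 2 lambda / phi(R_0)."""
    return c - 2.0 * lam / phi_R0

# Hypothetical values: M = sup ||Hess V||, coupling constant alpha
M, alpha = 2.0, 0.3
lams = {n: (lipschitz_constant(nearest_neighbour(n, alpha), M),
            lipschitz_constant(mean_field(n, alpha), M))
        for n in (5, 50)}
```

Since \(\lambda \) does not grow with n, the rate \({\bar{c}}\) returned by `perturbed_rate` is the same for all system sizes, in line with Corollary 9.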

4 Proofs for reflection coupling

In this section, we first motivate our particular choice of the function f, and we prove Theorem 1. Afterwards, we prove Corollaries 2, 3, 4 and 5.

Let \(r_t=\Vert X_t-Y_t\Vert \) where \((X,Y)\) is a reflection coupling of two solutions of (1). Our goal is to find an explicit concave increasing function \(f:[0,\infty )\rightarrow [0,\infty )\) with \(f(0)=0\) and \(f^\prime (0)=1\) such that \(e^{ct}f(r_t)\) is a (local) supermartingale for t less than the coupling time T with a constant \(c>0\) that we are trying to maximize by the choice of f.

An application of Itô’s formula to the s.d.e. (4) satisfied by the difference process \(Z_t=X_t-Y_t\) shows that the following Itô equations hold almost surely for \(t<T\) whenever f is \(C^1\) and \(f^\prime \) is absolutely continuous:

$$\begin{aligned} d\Vert Z_t\Vert ^2= & {} 4\, |\sigma ^{-1}Z_t|^{-1}\Vert Z_t\Vert ^2\,dW_t\nonumber \\&+\, 2\, Z_t\cdot G(b(X_t)-b(Y_t))\,dt\, +\,4\, |\sigma ^{-1}Z_t|^{-2}\Vert Z_t\Vert ^2\,dt,\nonumber \\ dr_t= & {} 2\, |\sigma ^{-1}Z_t|^{-1}r_t\,dW_t\,+ \,r_t^{-1}Z_t\cdot G(b(X_t)-b(Y_t))\,dt,\ \ \text{ and }\nonumber \\ df(r_t)= & {} 2\, |\sigma ^{-1}Z_t|^{-1}r_t\, f'(r_t)\,dW_t\nonumber \\&+\, r_t^{-1}Z_t\cdot G(b(X_t)-b(Y_t))f'(r_t)\,dt\, +\, 2\, |\sigma ^{-1}Z_t|^{-2}r_t^2f''(r_t)\,dt.\nonumber \\ \end{aligned}$$
(60)

By definition of the function \(\kappa \), the drift term on the right hand side of (60) is bounded from above by

$$\begin{aligned} \beta _t:=2\, |\sigma ^{-1}Z_t|^{-2}r_t^2\cdot \left( f''(r_t)-\frac{1}{4}\, r_t\, \kappa (r_t)f'(r_t)\right) . \end{aligned}$$
(61)

Hence the process \(e^{ct}f(r_t)\) is a supermartingale for \(t<T\) if \(\beta _t\le -cf(r_t)\). Since

$$\begin{aligned} |\sigma ^{-1}z|^2\ \le \ \alpha \Vert z\Vert ^2\quad \text{ for } \text{ any } z\in \mathbb {R}^d \end{aligned}$$
(62)

with \(\alpha \) defined as in Theorem 1, a sufficient condition is

$$\begin{aligned} f''(r)-\frac{1}{4}r\kappa (r)f'(r)\ \le \ -\frac{\alpha c}{2}f(r)\qquad \text{ for } \text{ a.e. } r>0. \end{aligned}$$
(63)

We first observe that this inequality holds with \(c=0\) (i.e., \(f(r_t)\) is a supermartingale for \(t<T\)) if f is chosen such that \(f'(r)=\varphi (r)=\exp (-\int _0^rs\kappa (s)^{-} ds/4)\). Indeed, \(f(r)=\int _0^r \varphi (s)\, ds\) is the least concave among all concave functions f satisfying \(\beta _t\le 0\).

To satisfy the stronger condition \(\beta _t\le -cf(r_t)\) with \(c>0\), we make the ansatz

$$\begin{aligned} f'(r)= \varphi (r)\, g(r) \end{aligned}$$
(64)

with a decreasing absolutely continuous function \(g\ge 1/2\) such that \(g(0)=1\). Note that the condition \(g\ge 0\) is required to ensure that f is non-decreasing. By replacing this condition by the stronger condition \(g\ge 1/2\), we are losing at most a factor 2 in the estimates below. On the other hand, the condition \(1/2\le g\le 1\) has the huge advantage of ensuring that

$$\begin{aligned} \varPhi /2\ \le \ f\ \le \ \varPhi \end{aligned}$$
(65)

where \(\varPhi (r)=\int _0^r\varphi (s)\, ds\). The ansatz (64) yields

$$\begin{aligned} f''= -\frac{1}{4}\, r\kappa ^{-}f'\,+\,\varphi g'\ \le \ \frac{1}{4}\, r\kappa f'\,+\,\varphi g' , \end{aligned}$$

i.e., Condition (63) is satisfied if

$$\begin{aligned} g'\ \le \ -\frac{\alpha c}{2}f/\varphi \qquad \text{ almost } \text{ everywhere. } \end{aligned}$$
(66)

We will see in the proof below that for \(r\ge R_1\), Condition (63) is automatically satisfied since \(\kappa \) is sufficiently positive. Therefore, it is enough to assume that (66) holds on \((0,R_1)\).

Now on the one hand, if (66) is satisfied on \((0,R_1)\) then

$$\begin{aligned} g(R_1)\ \le \ 1-\frac{\alpha c}{2} \int _0^{R_1}f(s)\varphi (s)^{-1}\, ds\ \le \ 1-\frac{\alpha c}{4}\int _0^{R_1}\varPhi (s) \varphi (s)^{-1}\, ds . \end{aligned}$$

This condition can only be satisfied with a function g taking values in [1 / 2, 1] if

$$\begin{aligned} \alpha \, c\ \le \ 2\Big /\int _0^{R_1}\varPhi (s) \varphi (s)^{-1}\, ds . \end{aligned}$$

On the other hand, by choosing

$$\begin{aligned} g'(r)= -\frac{\varPhi (r)}{2\varphi (r)} \Big /\int \nolimits _0^{R_1}\frac{\varPhi (s)}{\varphi (s)}\,ds\qquad \text{ for } r<R_1, \end{aligned}$$
(67)

Condition (66) is satisfied with the constant

$$\begin{aligned} \alpha \, c= 1\Big /\int _0^{R_1}\varPhi (s) \varphi (s)^{-1}\, ds . \end{aligned}$$

This shows that up to a factor 2, choosing g as in (67) is the best we can do under the assumptions that we have made.

The considerations above explain the particular choice of the function f made in (10). Once this choice has been made, the proof of Theorem 1 is almost straightforward:

Proof of Theorem 1

As remarked above, the drift in the s.d.e. (60) for \(f(r_t)\) is bounded from above by \(\beta _t\) defined by (61). We now show that by our choice of f in (10), this expression is smaller than \(-cf(r_t)\) where c is given by (12). Indeed, for \(r<R_1\),

$$\begin{aligned} f''(r)= & {} -\frac{1}{4} r\kappa (r)^{-}\varphi (r)g(r)-\frac{1}{2}\varPhi (r)\Big /\int \limits _0^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds \nonumber \\\le & {} \ \frac{1}{4} r\kappa (r)f'(r)-\frac{1}{2} f(r)\Big /\int \limits _0^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds. \end{aligned}$$
(68)

For \(r > R_1\), we have \(f'(r)=\varphi (r)/2=\varphi (R_0)/2\) and \(\kappa (r)R_1(R_1-R_0)\ge 8\) by definition of \(R_1\), whence

$$\begin{aligned} f''(r)-\frac{1}{4} r\kappa (r)f'(r)= & {} -\frac{1}{8} r\kappa (r)\varphi (R_0)\ \le \ -\frac{\varphi (R_0)}{R_1-R_0}\cdot \frac{r}{R_1}\nonumber \\\le & {} -\frac{\varphi (R_0)}{R_1-R_0}\cdot \frac{\varPhi (r)}{\varPhi (R_1)}\ \le \ -\frac{1}{2}\varPhi (r)\Big /\int \nolimits _{R_0}^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds\nonumber \\\le & {} -\frac{1}{2}f(r)\Big /\int \nolimits _0^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds. \end{aligned}$$
(69)

Here we have used that for \(r\ge R_0\), the function \(\varphi (r)\) is constant, and, therefore, \(\varPhi (r)= \varPhi (R_0)+(r-R_0)\, \varphi (R_0)\), and

$$\begin{aligned} \int \nolimits _{R_0}^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds= & {} \int \nolimits _{R_0}^{R_1}(\varPhi (R_0)+(s-R_0)\varphi (R_0))\varphi (R_0)^{-1}\,ds\\= & {} {\varPhi (R_0)}{\varphi (R_0)^{-1}}(R_1-R_0)+(R_1-R_0)^2/2\\\ge & {} (R_1-R_0)\left( \varPhi (R_0)+(R_1-R_0)\varphi (R_0)\right) \varphi (R_0)^{-1}/2\\= & {} (R_1-R_0)\varPhi (R_1)\varphi (R_0)^{-1}/2. \end{aligned}$$

By (68) and (69), we conclude that \(\beta _t\le -cf(r_t)\). Optional stopping in (60) at \(T_k=\inf \{t\ge 0:\, r_t\not \in (k^{-1},k)\}\) now implies

$$\begin{aligned} \mathbb {E}[f(r_t)\,;\,t<T_k]\ \le \ \mathbb {E}[f(r_0)]\, -\, c\int \nolimits _0^t\mathbb {E}[f(r_s)\,;\,s<T_k]\,ds\, \end{aligned}$$

for any \(k\in \mathbb N\) and \(t\ge 0\). The assertion follows for \(k\rightarrow \infty \) since \(r_t=0\) for \(t\ge T\), and \(T=\sup T_k\) by non-explosiveness. \(\square \)

Proof of Corollary 2

Let \((X,Y)\) be a reflection coupling of two solutions of (1) with joint initial distribution \((X_0,Y_0)\sim \eta \). Then by Theorem 1,

$$\begin{aligned} W_f(\mu p_t,\nu p_t)\le & {} \mathbb {E}\left[ d_f(X_t,Y_t)\right] \ \le \ e^{-ct}\, \mathbb {E}\left[ d_f(X_0,Y_0)\right] \\= & {} e^{-ct}\,\int d_f(x,y)\,\eta (dx\, dy) \end{aligned}$$

for any \(t\ge 0\). The estimate (13) now follows by taking the infimum over all couplings \(\eta \) of two given probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^d\). Moreover, (14) follows from (13) by (15).\(\square \)

Next, we are going to prove the results in Sect. 2.2. Suppose that (18) holds, \(\Vert z\Vert =|\sigma ^{-1}z|\) is the intrinsic metric, and b is in \(C^1\). Corollary 2 implies

$$\begin{aligned} \int |y|\, p_t(x,dy)\le \int |y|\, p_t(x_0,dy)\, +\, W^1(p_t(x,\cdot ),p_t(x_0,\cdot ))\, <\,\infty \end{aligned}$$

for any \(t\ge 0\) and any \(x\in {\mathbb {R}^{d}}\). In particular, \((p_tg)(x)=\int g(y)\,p_t(x,dy)\) is defined for any Lipschitz continuous function \(g:{\mathbb {R}^{d}}\rightarrow \mathbb {R}\), and

$$\begin{aligned} |(p_tg)(x)-(p_tg)(y)|= \left| {\mathbb {E}} [g(X_t)- g(Y_t)]\right| \ \le \ \Vert g\Vert _{\mathrm{Lip}(f)} {\mathbb {E}} [d_f(X_t,Y_t)] \end{aligned}$$

for any coupling \((X_t,Y_t)\) of \(p_t(x,\cdot )\) and \(p_t(y,\cdot )\). Hence by Theorem 1,

$$\begin{aligned} |(p_tg)(x)-(p_tg)(y)| \le e^{-ct}\,\Vert g\Vert _{\mathrm{Lip}(f)} \, d_f(x,y), \end{aligned}$$
(70)

i.e., \(p_t\) satisfies the exponential contractivity condition (19) w.r.t. \(\Vert \cdot \Vert _{\mathrm{Lip}(f)}\). If \(p_tg\) is \(C^1\) then by (70) and since

$$\begin{aligned} d_f(x,y) \le \Vert x-y\Vert = |\sigma ^{-1}(x-y)| \qquad \forall \, x,y\in {\mathbb {R}^{d}}, \end{aligned}$$

we obtain the uniform gradient bound

$$\begin{aligned} \sup \left| \sigma ^T\nabla p_tg\right| \le e^{-ct}\,\Vert g\Vert _{\mathrm{Lip}(f)} \qquad \forall \ t\ge 0. \end{aligned}$$
(71)

It is well-known that this bound can be used to control variances w.r.t. the measures \(p_t(x,\cdot )\):

Lemma 3

For any \(t\ge 0\), \(x\in {\mathbb {R}^{d}}\), and any Lipschitz continuous \(g:{\mathbb {R}^{d}}\rightarrow \mathbb {R}\),

$$\begin{aligned} \mathrm{Var}_{p_t(x,\cdot )}(g)\ \le \ \frac{1-\exp ({-2ct})}{2c} \,\Vert g\Vert ^2_{\mathrm{Lip}(f)} . \end{aligned}$$
(72)

Proof

We may assume \(g\in C^2({\mathbb {R}^{d}})\) and \(t>0\). Then, by standard elliptic regularity results, \((t,x)\mapsto (p_tg)(x)\) is differentiable in t and x, and

$$\begin{aligned} \frac{d}{dt}\, p_tg= {\mathcal {L}}p_tg= p_t{\mathcal {L}}g \end{aligned}$$

where \({\mathcal {L}}=\frac{1}{2}\sum a_{ij}\frac{\partial ^2}{\partial x^i\partial x^j}+b(x)\cdot \nabla \), \(a=\sigma \sigma ^T\), is the generator of \((X_t)\), cf. e.g. [38, 39]. In particular, for \(s\in (0,t)\),

$$\begin{aligned} \frac{d}{ds}\, p_s(p_{t-s}g)^2= & {} p_s\, \left( {\mathcal {L}}(p_{t-s}g)^2-2p_{t-s}g\, {\mathcal {L}}p_{t-s}g\right) \\= & {} p_s\,\left| \sigma ^T\nabla p_{t-s}g\right| ^2 \ \le \ e^{-2c(t-s)}\Vert g\Vert ^2_{\mathrm{Lip}(f)} \end{aligned}$$

by (71). Integrating w.r.t. s, we obtain

$$\begin{aligned} p_tg^2-(p_tg)^2\ \le \ \frac{1-\exp ({-2ct})}{2c} \,\Vert g\Vert ^2_{\mathrm{Lip}(f)} , \end{aligned}$$

which is equivalent to (72). \(\square \)

By Lemma 3 and (70), we can now easily prove Corollaries 3, 4 and 5:

Proof of Corollary 3

Existence and uniqueness of a stationary distribution \(\mu \) for \((p_t)_{t\ge 0}\) satisfying \(\int |y|\, \mu (dy)<\infty \) follow easily as in [32], Sect. 3: By Corollary 2, the map \(\nu \mapsto \nu p_1\) is a contraction w.r.t. the distance \(W_f\) (equivalent to \(W^1\)) on the complete metric space \({\mathcal {P}}^1\) of all probability measures \(\nu \) on \(({\mathbb {R}^{d}},\mathcal B({\mathbb {R}^{d}}))\) satisfying \(\int |y|\, \nu (dy)<\infty \). Hence by the Banach fixed point theorem, there exists a unique probability measure \(\mu _0\) such that \(\mu _0 p_1=\mu _0 \). It is then elementary to verify that the measure \(\mu =\int _0^1 \mu _0p_s\, ds\) satisfies \(\mu p_t=\mu \) for any \(t\in [0,1]\), and hence for any \(t\in [0,\infty )\). Moreover, by Corollary 2,

$$\begin{aligned} W_f(\mu ,\nu p_t)= W_f(\mu p_t,\nu p_t)\ \le \ e^{-ct}\, W_f(\mu ,\nu ) \end{aligned}$$

for any \(\nu \in {\mathcal {P}}^1\). In particular, as \(t\rightarrow \infty \), \(p_t(x,\cdot )\rightarrow \mu \) in \({\mathcal {P}}^1\) for any \(x\in {\mathbb {R}^{d}}\). The variance bound for \(\mu \) now follows from the corresponding bound for \(p_t(x,\cdot )\) in Lemma 3. \(\square \)

Proof of Corollary 4

By Lemma 3,

$$\begin{aligned} \mathrm{Cov}\, \left( g(X_t),h(X_{t+s})\right)= & {} {\mathbb {E}}\left[ g(X_t)\,h(X_{t+s})\right] \, -\, {\mathbb {E}}\left[ g(X_t)\right] \, {\mathbb {E}}\left[ h(X_{t+s})\right] \\= & {} {\mathbb {E}}\left[ (g\,p_sh)(X_t)\right] \, -\, {\mathbb {E}}\left[ g(X_t)\right] \, {\mathbb {E}}\left[ (p_sh)(X_{t})\right] \ \\= & {} \ \mathrm{Cov}_{p_t(x_0,\cdot )}(g,p_sh)\\\le & {} {(1-\exp ({-2ct}))}\, {(2c)^{-1}} \,\Vert g\Vert _{\mathrm{Lip}(f)}\Vert p_sh\Vert _{\mathrm{Lip}(f)} \end{aligned}$$

for any \(s,t\ge 0\). The assertion now follows by (70). \(\square \)

Proof of Corollary 5

The bound for the bias follows immediately from (70), since

$$\begin{aligned} \left| {\mathbb {E}}\left[ \frac{1}{t} \int _0^t g(X_s)\, ds\, - \, \int g\, d\mu \right] \right|= & {} \left| \frac{1}{t} \int _0^t \int (p_sg(x_0)-p_sg (y))\, \mu (dy)\, ds\right| \\\le & {} \frac{1}{t} \int _0^t e^{-cs}\, ds\ \Vert g\Vert _{\mathrm{Lip}(f)}\, \int d_f(x_0,y)\, \mu (dy). \end{aligned}$$

Moreover, by Corollary 4,

$$\begin{aligned} \mathrm{Var}\, \left( \frac{1}{t} \int _0^t g(X_s)\, ds \right)= & {} \mathrm{Cov}\, \left( \frac{1}{t} \int _0^t g(X_s)\, ds\, , \, \frac{1}{t} \int _0^t g(X_s)\, ds \right) \\= & {} \frac{2}{t^2}\int _0^t\int _s^t\mathrm{Cov}\,\left( g(X_s),g(X_u)\right) \, du\, ds\\\le & {} \frac{1}{ct^2}\int _0^t(1-e^{-2cs})\int _s^te^{-c (u-s)}\, du\, ds\; \Vert g\Vert ^2_{\mathrm{Lip}(f)}\\\le & {} \frac{1}{c^2t}\, \Vert g\Vert ^2_{\mathrm{Lip}(f)}. \end{aligned}$$

\(\square \)

5 Examples

We now prove the results in Sects. 2.3 and 2.4, including in particular Lemmas 1, 2 and Theorem 6.

Proof of Lemma 1 and Remark 5

We first prove the lower bounds on the exponential decay rate c in (12) stated in (24), (25) and (26). Notice that the constant c defined by (12) increases if \(\kappa (r)\) is replaced by a greater function. Indeed, for \(r\ge 0\),

$$\begin{aligned} \varPhi (r)\varphi (r)^{-1}\,=\,\int \limits _0^r\varphi (t)\varphi (r)^{-1}\,dt\,=\,\int \limits _0^r\exp \left( \frac{1}{4}\int \limits _t^rs\kappa (s)^{-}\,ds\right) \,dt, \end{aligned}$$
(73)

whence \(R_0\), \(R_1\) and \(c^{-1}=\alpha \int \nolimits _0^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds\) are decreasing functions of \(\kappa \).

Convex Case. Suppose first that \(\kappa (r)\ge 0\) for any \(r\ge 0\) and \(\kappa (r)\ge K\) for \(r\ge R\) with constants \(K \in (0,\infty )\) and \(R\in [0,\infty )\). Then \(R_0=0\), \(R_1\le \max (R,\sqrt{8/K})\), \(\varphi \equiv 1\), and hence

$$\begin{aligned} c\,=\,(\alpha R_1^2/2)^{-1}\,\ge \,\alpha ^{-1}\min (R^{-2}/2,K/4). \end{aligned}$$

Locally non-convex case. Now suppose that \(\kappa (r)\ge -L\) for \(r\le R\) and \(\kappa (r)\ge K\) for \(r>R\) with constants \(K,L\in (0,\infty )\) and \(R\in [0,\infty ]\). Since \(\varphi (r)=\varphi (R_0)\) and \(\varPhi (r)=\varPhi (R_0)+(r-R_0)\varphi (R_0)\) for \(r\ge R_0\), we have

$$\begin{aligned} \alpha ^{-1}c^{-1}= & {} \int \limits _0^{R_1}\varPhi (s)\varphi (s)^{-1}\,ds\nonumber \\= & {} \int \limits _0^{R_0}\varPhi (s)\varphi (s)^{-1}\,ds+(R_1-R_0)\varPhi (R_0)\varphi (R_0)^{-1}+(R_1-R_0)^2/2. \nonumber \\ \end{aligned}$$
(74)

The lower curvature bounds imply the upper bounds

$$\begin{aligned} R_0\le & {} R,\qquad R_1-R_0\ \le \ \min (8/(KR_0),\sqrt{8/K}),\qquad \text{ and } \end{aligned}$$
(75)
$$\begin{aligned} \varPhi (r)\varphi (r)^{-1}\le & {} \int \limits _0^r\exp (L(r^2-t^2)/8)\,dt\nonumber \\\le & {} \min (\sqrt{2\pi /L},r)\exp (Lr^2/8)\qquad \text{ for } r\le R_0. \end{aligned}$$
(76)

Since \(\exp x\le 1+(e-1)x\) for \(x\in [0,1]\) and

$$\begin{aligned} \int _0^x\exp (u^2)\, du\ \le \ e+\int _1^x(2-u^{-2})\exp (u^2)\, du= x^{-1}\exp (x^2)\qquad \text{ for } x \ge 1, \end{aligned}$$

we can conclude that

$$\begin{aligned} \int \limits _0^{R_0}\varPhi (r)\varphi (r)^{-1}\,dr\le & {} \int _0^{R_0}r\exp (Lr^2/8)\, dr= 4L^{-1} (\exp (LR_0^2/8 )-1)\\\le & {} (e-1)R_0^2/2\qquad \text{ if } LR_0^2/8\le 1,\qquad \text{ and }\\ \int \limits _0^{R_0}\varPhi (r)\varphi (r)^{-1}\,dr\le & {} \sqrt{\frac{2\pi }{L} }\int _0^{R_0}\exp (\frac{Lr^2}{8})\, dr= \sqrt{\frac{8\cdot 2\pi }{L^2} }\int _0^{\sqrt{LR_0^2/8}}\exp (u^2)\, du\\\le & {} 8\sqrt{2\pi }L^{-3/2}R_0^{-1}\exp (LR_0^2/8)\qquad \text{ if } LR_0^2/8\ge 1. \end{aligned}$$

Combining these estimates, we obtain by (74), (75) and (76),

$$\begin{aligned} \alpha ^{-1}c^{-1}\le & {} (e-1)R^2/2+e\sqrt{8/K}R+4/K\qquad \text{ if } LR_0^2/8\,\le \,1, \text{ and } \\ \alpha ^{-1}c^{-1}\le & {} 8\sqrt{2\pi }R^{-1}L^{-1/2}(L^{-1}+K^{-1})\exp (LR^2/8)+32R^{-2}K^{-2} \text{ if } LR_0^2/8\ge 1, \end{aligned}$$

where we have used that the function \(x\mapsto x^{-1} \exp (x^2)\) is increasing for \(x\ge 1\). \(\square \)
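The two elementary inequalities used in this proof, \(\exp x\le 1+(e-1)x\) on \([0,1]\) and \(\int _0^x\exp (u^2)\,du\le x^{-1}\exp (x^2)\) for \(x\ge 1\), are easy to confirm numerically. The sketch below is only a plausibility check under a midpoint-rule approximation; the helper names and the grid of test points are assumptions made for illustration.

```python
import math


def integral_exp_sq(x: float, n: int = 20000) -> float:
    """Midpoint-rule approximation of int_0^x exp(u^2) du."""
    h = x / n
    return sum(math.exp(((k + 0.5) * h) ** 2) for k in range(n)) * h


def tail_bound(x: float) -> float:
    """Right-hand side x^{-1} exp(x^2), claimed to dominate the integral for x >= 1."""
    return math.exp(x * x) / x


def chord_bound(x: float) -> float:
    """Chord 1 + (e - 1) x, which lies above exp on [0, 1] by convexity."""
    return 1.0 + (math.e - 1.0) * x


for x in [1.0, 1.5, 2.0, 3.0]:
    print(x, integral_exp_sq(x), tail_bound(x))
```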

Proofs for Example 4

Consider the one-dimensional Langevin diffusion \((X_t)\) with drift \(-\nabla U(x)/2\) and generator

$$\begin{aligned} {\mathcal {L}}v= \frac{1}{2}(v''-U'v')= \frac{1}{2}\,e^U \left( e^{-U}v^\prime \right) ^\prime . \end{aligned}$$
(77)

The assumption \(\liminf _{|x|\rightarrow \infty }U^{\prime \prime }(x)>0\) implies that there is a unique strictly positive bounded eigenfunction \(v_1\in C^2(0,\infty )\cap C([0,\infty ))\) satisfying \(v_1(0)=0\), \(v_1^\prime (0)=1\) and \({\mathcal {L}}v_1=-\lambda _1v_1\), where

$$\begin{aligned} \lambda _1= \lambda _1 (0,\infty )= \inf _{v\in C_0^\infty (0,\infty )}\frac{\frac{1}{2} \int _0^\infty v'(x)^2\,\exp (-U(x))\, dx}{\int _0^\infty v(x)^2\,\exp (-U(x))\, dx} \end{aligned}$$

is the infimum of the spectrum of the self-adjoint realization of \(-{\mathcal {L}}\) with Dirichlet boundary conditions on \((0,\infty )\). Since \({\mathcal {L}}v_1=- \lambda _1v_1\) and \(v_1\) is bounded, the process \(M_t=\exp (\lambda _1t)v_1(X_t)\) is a martingale. Optional stopping applied to the diffusion with initial condition \(X_0=x_0\) shows that

$$\begin{aligned} v_1(x_0)= & {} {\mathbb {E}}_{x_0}\left[ M_0\right] = {\mathbb {E}}_{x_0}\left[ M_{\tau _0\wedge t}\right] = {\mathbb {E}}_{x_0}\left[ \exp (\lambda _1t)v_1(X_t);\, {\tau _0> t}\right] \nonumber \\\le & {} \exp (\lambda _1t)\, {\mathbb {P}}_{x_0}\left[ {\tau _0> t}\right] \, \sup v_1 \end{aligned}$$
(78)

for any \(x_0>0\) and \(t\ge 0\). Since \(v_1(x_0)>0\) and \(\sup v_1 <\infty \), the estimate (78) implies the asymptotic lower bound

$$\begin{aligned} \liminf _{t\rightarrow \infty }t^{-1}\log \,{\mathbb {P}}_{x_0}[\tau _0>t]\ \ge \ -\lambda _1(0,\infty ) . \end{aligned}$$
(79)

Moreover, for any fixed \(t\le \lambda _1^{-1}/4\),

$$\begin{aligned} {\mathbb {P}}_R \left[ {\tau _0> t}\right] \ \ge \ e^{-1/4}\, v_1(R)/\sup v_1\ \ge \ 3/4 \end{aligned}$$

provided \(v_1(R)\ge \frac{3}{4}e^{1/4}\sup v_1=0.96\ldots \cdot \sup v_1\). By the eigenfunction equation \(e^U(e^{-U}v_1^\prime )'=-\lambda _1v_1\), one verifies that the latter condition is satisfied whenever U is growing fast enough on \([R,\infty )\).
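The numerical threshold above follows from rearranging (78) into \({\mathbb {P}}_{x_0}[\tau _0>t]\ge \exp (-\lambda _1t)\,v_1(x_0)/\sup v_1\). A small sketch of this arithmetic is given here; the function name and the sample value of \(\lambda _1\) are hypothetical, and no eigenfunction is actually computed.

```python
import math


def survival_lower_bound(v1_ratio: float, lambda1: float, t: float) -> float:
    """Lower bound exp(-lambda1 * t) * v1(x0) / sup v1 for P_{x0}[tau_0 > t], from (78)."""
    return math.exp(-lambda1 * t) * v1_ratio


# Smallest ratio v1(R) / sup v1 for which the bound equals 3/4 at t = 1 / (4 lambda1).
threshold = 0.75 * math.exp(0.25)
print(threshold)

# Hypothetical eigenvalue; at t = 1 / (4 lambda1) the bound is 3/4 up to rounding.
lambda1 = 2.0
print(survival_lower_bound(threshold, lambda1, 1.0 / (4.0 * lambda1)))
```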

For bounding \(\lambda _1(0,\infty )\) from above let

$$\begin{aligned} v(x)= \min (\sqrt{L}x,1)= \left\{ \begin{array}{l@{\quad }l} \sqrt{L} x &{} \text{ if } x\le 1/\sqrt{L},\\ 1 &{} \text{ if } x\ge 1/\sqrt{L}. \end{array}\right. \end{aligned}$$

By the assumptions on U, the function v is contained in the weighted Sobolev space \(H_0^{1,2}((0,\infty ), e^{-U}\, dx)\) (closure of \(C_0^\infty (0,\infty )\) w.r.t. the norm \(\Vert w\Vert ^2=\int _0^\infty (w^2+(w')^2)\, e^{-U}\, dx\)). Therefore, if \(LR^2/4\ge 1\) then (28) holds, since

$$\begin{aligned} \lambda _1\le & {} \frac{\frac{1}{2}\int v'(x)^2\exp (-U(x))\,dx}{\int v(x)^2\exp (-U(x))\,dx}\ \le \ \frac{\frac{1}{2}\int _0^{1/\sqrt{L}}L\exp (Lx^2/2)\,dx}{\int _0^{R/2}v(x)^2\exp (Lx^2/2)\,dx}\\= & {} \frac{L}{2}\,\frac{\int _0^{1}\exp (y^2/2)\,dy}{\int _0^{\sqrt{LR^2/4}}\min (y,1)^2\exp (y^2/2)\,dy} \ \le \ \frac{3Le^{1/2}}{2}\, \sqrt{\frac{LR^2}{4}}\, \exp \left( -\frac{LR^2}{8} \right) . \end{aligned}$$

Here we have used that by assumption, \(U(x)\ge -Lx^2/2\) for any \(x\in {\mathbb {R}}\) with equality for \(|x|<R/2\), and for \(x\ge 1\),

$$\begin{aligned} \int _0^x\min (y,1)^2e^{y^2/2}\, dy= \int _0^1\ldots +\int _1^x\ldots \ \ge \ \frac{1}{3} +\frac{1}{x}e^{x^2/2}-1 \ \ge \ \frac{1}{3x}e^{x^2/2} \end{aligned}$$

as \((x^{-1}e^{x^2/2})'=(1-x^{-2}) e^{x^2/2}\le e^{x^2/2}\). \(\square \)

Proof of Lemma 2

Since \(b=b_0+\gamma \), we have

$$\begin{aligned}&(x-y)\cdot G(b(x)-b(y))= (x-y)\cdot G(b_0(x)-b_0(y))\\&\quad +(x-y)\cdot G(\gamma (x)-\gamma (y)) \end{aligned}$$

for any \(x,y\in {\mathbb {R}^{d}}\). Therefore, by (31) and by definition of \(\kappa \) and \(\kappa _0\),

$$\begin{aligned} \kappa (r)^{-}\le & {} \kappa _0(r)^{-} \qquad \qquad \text{ for } \text{ any } r\ge R, \text{ and } \end{aligned}$$
(80)
$$\begin{aligned} \kappa (r)^{-}\le & {} \kappa _0(r)^{-} +4r^{-1}\sup \Vert \gamma \Vert \qquad \text{ for } \text{ any } r\in (0,\infty ). \end{aligned}$$
(81)

In particular, if \(\gamma \) is bounded then \(\kappa \) satisfies the conditions in (7). Since the constant \(R_1(b)\) defined w.r.t. b is smaller than the corresponding constant \(R_1\) defined w.r.t. \(b_0\), we obtain

$$\begin{aligned} \frac{1}{c}\le & {} \int _0^{R_1}\int _0^s\exp \left( \frac{1}{4}\int _t^su\kappa (u)^{-}\, du\right) \, dt\, ds\\\le & {} \int _0^{R_1}\int _0^s\exp \left( \frac{1}{4}\int _t^su\kappa _0 (u)^{-}\, du\right) \, \exp \left( R\sup \Vert \gamma \Vert \right) \, dt\, ds\\\le & {} \frac{1}{c_0}\cdot \exp \left( R\sup \Vert \gamma \Vert \right) , \end{aligned}$$

i.e., (32) holds.

Similarly, if \(\gamma \) satisfies the one-sided Lipschitz condition (33) then

$$\begin{aligned} \kappa (r)^{-} \le \kappa _0(r)^{-} +2L\qquad \text{ for } \text{ any } r\in (0,\infty ). \end{aligned}$$
(82)

Hence again the conditions in (7) are satisfied, and we obtain

$$\begin{aligned} \frac{1}{c} \le \frac{1}{c_0}\cdot \exp \left( \frac{L}{2}\int _0^Rr\, dr\right) \end{aligned}$$

similarly as above, i.e., (34) holds. \(\square \)
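To get a feeling for the size of the two perturbation factors, note that \(\frac{L}{2}\int _0^Rr\,dr=LR^2/4\), so the bounds derived above read \(c\ge c_0\exp (-R\sup \Vert \gamma \Vert )\) and \(c\ge c_0\exp (-LR^2/4)\), respectively. The following sketch simply evaluates these expressions; the function names and the sample parameters are assumptions made for illustration.

```python
import math


def rate_bound_bounded_perturbation(c0: float, R: float, gamma_sup: float) -> float:
    """Lower bound c0 * exp(-R * sup||gamma||) on c for a bounded perturbation gamma."""
    return c0 * math.exp(-R * gamma_sup)


def rate_bound_one_sided_lipschitz(c0: float, R: float, L: float) -> float:
    """Lower bound c0 * exp(-L R^2 / 4) on c, using (L/2) int_0^R r dr = L R^2 / 4."""
    return c0 * math.exp(-L * R * R / 4.0)


# Hypothetical values: unperturbed rate c0 = 1, radius R = 2.
print(rate_bound_bounded_perturbation(1.0, 2.0, 0.5))
print(rate_bound_one_sided_lipschitz(1.0, 2.0, 0.5))
```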

Proof of Theorem 6

Fix \(R>0\) and probability measures \(\mu ,\nu \) on \({\mathbb {R}^{d}}\). By definition of \(f_R\),

$$\begin{aligned} f_R''(r) \le \frac{1}{4} r\kappa (r)f_R'(r)-f_R(r) \Big / \int _0^R\frac{\varPhi (s)}{\varphi (s)}\, ds \end{aligned}$$

for any \(r<R\). Therefore, similarly to the proof of Theorem 1, Eq. (60) shows that the process \(e^{c_Rt}f_R(r_t)\) is a local supermartingale for \(t<{\hat{\tau }}_R\) where

$$\begin{aligned} {\hat{\tau }}_R= \inf \{ t\ge 0: r_t> R\} . \end{aligned}$$

Here \(r_t=\Vert X_t-Y_t\Vert \) again denotes the distance process for a reflection coupling \((X_t,Y_t)\) of two solutions of (1) with initial distribution given by a coupling \(\eta \) of \(\mu \) and \(\nu \). By optional stopping and Fatou’s lemma, we thus obtain

$$\begin{aligned} \mathbb {E}[f_R(r_t);\, {\hat{\tau }}_R >t]\ \le \ \mathbb {E}[f_R(r_{t\wedge {\hat{\tau }}_R})]\ \le \ \exp ({-c_Rt})\,\mathbb {E}[f_R(r_0)] \end{aligned}$$

for any \(t\ge 0\), and hence

$$\begin{aligned} \mathbb {E}[f_R(r_t)]\le & {} \exp ({-c_Rt})\mathbb {E}[f_R(r_0)] \,+\,{\mathbb {P}} [ {\hat{\tau }}_R \le t]\\\le & {} e^{-c_Rt}\int f_R(\Vert x-y\Vert )\, \eta (dx\,dy)\, +\, {\mathbb {P}}_\mu [\tau _{R/2}\le t]\, +\, {\mathbb {P}}_\nu [\tau _{R/2}\le t] . \end{aligned}$$

The assertion now follows as in the proof of Corollary 2 by minimizing over all couplings \(\eta \) of \(\mu \) and \(\nu \). \(\square \)

6 Couplings on product spaces

Let \(d=\sum _{i=1}^nd_i\) with \(n,d_1,\ldots ,d_n\in \mathbb {N}\). We now consider “componentwise” couplings for diffusion processes \(X_t=(X_t^1,\ldots , X_t^n)\) and \(Y_t=(Y_t^1,\ldots , Y_t^n)\) on \({\mathbb {R}^{d}}\) satisfying the s.d.e.

$$\begin{aligned} dX_t^i= b^i(X_t)\, dt\, +\, dB_t^i,\qquad i=1,\ldots ,n, \end{aligned}$$
(83)

with initial conditions \(X_0\sim \mu \) and \(Y_0\sim \nu \). Here \(B^i\), \(i=1,\ldots ,n\), are independent Brownian motions on \({\mathbb {R}}^{d_i}\), and \(b^i:{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}^{d_i}\) are locally Lipschitz continuous functions such that the unique strong solution of (83) is non-explosive for any given initial condition.

Let \(\delta >0\). Suppose that \(\lambda ^i,\pi ^i:{\mathbb {R}^{d}}\rightarrow [0,1]\), \(i=1,\ldots ,n\), are Lipschitz continuous functions such that

$$\begin{aligned} \lambda ^i(z)^2+\pi ^i(z)^2= & {} 1\qquad \text{ for } \text{ any } z\in {\mathbb {R}^{d}},\qquad \text{ and } \end{aligned}$$
(84)
$$\begin{aligned} \lambda ^i(z)= & {} 0\qquad \text{ if } |z^i|\le \delta /2, \end{aligned}$$
(85)

and let \(B^i\) and \({\widetilde{B}}^i\), \(1\le i\le n\), be independent Brownian motions on \({\mathbb {R}}^{d_i}\). Then a coupling of two solutions of (83) with initial distributions \(\mu \) and \(\nu \) is given by a strong solution of the system

$$\begin{aligned} dX_t^i= & {} b^i(X_t)\, dt\, +\,\lambda ^i(Z_t)\, dB_t^i\, +\,\pi ^i(Z_t)\, d{\widetilde{B}}_t^i ,\nonumber \\ dY_t^i= & {} b^i(Y_t)\, dt\, +\,\lambda ^i(Z_t)\,(I-2e_t^i e_t^{i,T})\, dB_t^i\, +\,\pi ^i(Z_t)\, d{\widetilde{B}}_t^i , \end{aligned}$$
(86)

\(1\le i\le n\), with initial distribution \((X_0,Y_0)\sim \eta \) where \(\eta \) is a coupling of \(\mu \) and \(\nu \). Here we use the notation

$$\begin{aligned} Z_t= X_t-Y_t, \end{aligned}$$

and \(e_t^i\) is a measurable process taking values in the unit sphere in \(\mathbb {R}^{d_i}\) such that

$$\begin{aligned} e_t^i= \left\{ \begin{array}{l@{\quad }l}Z_t^i/|Z_t^i| &{}\ \text{ if } Z_t^i\ne 0,\\ u^i&{}\ \text{ if } Z_t^i= 0, \end{array}\right. \end{aligned}$$

where \(u^i\) is an arbitrary fixed unit vector in \(\mathbb {R}^{d_i}\). Notice that by (85), the choice of \(u^i\) is not relevant for (86), which is a standard Itô s.d.e. in \(\mathbb {R}^{2d}\) with locally Lipschitz continuous coefficients. To see that (86) defines a coupling, we observe that \((X_t)\) and \((Y_t)\) satisfy (83) w.r.t. the processes \(\hat{B}_t=(\hat{B}_t^1,\ldots , \hat{B}_t^n)\) and \(\check{B}_t=(\check{B}_t^1,\ldots , \check{B}_t^n)\) defined by

$$\begin{aligned} \hat{B}_t^i= & {} \int _0^t\lambda ^i(Z_s)\, dB_s^i\,+\, \int _0^t\pi ^i(Z_s)\, d{\widetilde{B}}_s^i,\\ \check{B}_t^i= & {} \int _0^t\lambda ^i(Z_s)\, (I-2e_s^i e_s^{i,T})\, dB_s^i\,+\, \int _0^t\pi ^i(Z_s)\, d{\widetilde{B}}_s^i. \end{aligned}$$

By Lévy’s characterization and by (84), both \(\hat{B}\) and \(\check{B}\) are indeed Brownian motions in \({\mathbb {R}^{d}}\), cp. the corresponding argument for reflection coupling.

Remark 7

(1) By Condition (85) and non-explosiveness of (83), the coupling process \((X_t,Y_t )\) is defined for any \(t\ge 0\).

(2) By choosing \(\lambda ^i\equiv 0\) and \(\pi ^i\equiv 1\) we recover the synchronous coupling, i.e., the same noise is applied to both processes X and Y.

(3) A componentwise reflection coupling would be informally given by choosing \(\lambda ^i(z)=1\) if \(z^i\ne 0\) and \(\lambda ^i(z)=0\) if \(z^i=0\). As this function is not continuous and \(e^i(z)=z^i/|z^i|\) also has a discontinuity at zero, it is not obvious how to make sense of this coupling rigorously. Instead, we will use below an approximate componentwise reflection coupling where \(\lambda ^i(z)=1\) if \(|z^i|\ge \delta \) and \(\lambda ^i(z)=0\) if \(|z^i|\le \delta /2 \) for a small positive constant \(\delta \).
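The approximate componentwise reflection coupling can be illustrated by a short simulation. The following is a minimal sketch under several assumptions that are not taken from the text: all components are one-dimensional (\(d_i=1\), so that \(I-2e^ie^{i,T}=-1\)), the interpolation between the synchronous and the reflection regime uses \(\lambda ^i=\sin \theta \), \(\pi ^i=\cos \theta \) with \(\theta \) piecewise linear in \(|z^i|\) (one possible Lipschitz choice satisfying (84) and (85)), the drift is a hypothetical linear one, and the system (86) is discretized by an Euler scheme.

```python
import math
import random
from typing import Callable, List, Tuple


def mixing_coefficients(z_norm: float, delta: float) -> Tuple[float, float]:
    """Return (lambda, pi) with lambda^2 + pi^2 = 1, lambda = 0 for z_norm <= delta/2
    and lambda = 1 for z_norm >= delta (sin/cos interpolation, an illustrative choice)."""
    s = min(max((z_norm - delta / 2.0) / (delta / 2.0), 0.0), 1.0)
    theta = 0.5 * math.pi * s
    return math.sin(theta), math.cos(theta)


def coupled_step(x: List[float], y: List[float],
                 drift: Callable[[List[float]], List[float]],
                 delta: float, dt: float,
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """One Euler step of the coupled system (86) for scalar components."""
    bx, by = drift(x), drift(y)
    new_x, new_y = [], []
    for i in range(len(x)):
        lam, pi = mixing_coefficients(abs(x[i] - y[i]), delta)
        dB = rng.gauss(0.0, math.sqrt(dt))        # increment of B^i (reflected for Y)
        dB_tilde = rng.gauss(0.0, math.sqrt(dt))  # increment of tilde B^i (shared)
        # In one dimension the reflection I - 2 e e^T is multiplication by -1.
        new_x.append(x[i] + bx[i] * dt + lam * dB + pi * dB_tilde)
        new_y.append(y[i] + by[i] * dt - lam * dB + pi * dB_tilde)
    return new_x, new_y


def toy_drift(x: List[float]) -> List[float]:
    """Hypothetical drift b^i(x) = -x^i, used only for illustration."""
    return [-xi for xi in x]


rng = random.Random(0)
x, y = [1.0, -2.0], [-1.0, 0.5]
for _ in range(1000):
    x, y = coupled_step(x, y, toy_drift, delta=0.05, dt=1e-3, rng=rng)
print([abs(a - b) for a, b in zip(x, y)])
```

Once a component of the difference is below \(\delta /2\), both copies receive the same noise in that component, so the difference then evolves by the drift alone.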

By subtracting the equations for X and Y in (86), we see that the difference process \(Z=X-Y\) satisfies the s.d.e.

$$\begin{aligned} dZ_t^i = (b^i(X_t)-b^i(Y_t))\, dt\, +\,2\lambda ^i(Z_t)\, e_t^i\, dW_t^i\, , \end{aligned}$$
(87)

\(i=1,\ldots ,n\), where the processes

$$\begin{aligned} W_t^i= \int _0^te_s^{i}\cdot dB_s^i,\qquad 1\le i\le n, \end{aligned}$$

are independent one-dimensional Brownian motions.

Let \(r_t^i=|X_t^i-Y_t^i|\) denote the Euclidean norm of \(Z_t^i\). The next lemma is crucial for quantifying contraction properties of the coupling given by (86):

Lemma 4

Suppose that \(f:[0,\infty )\rightarrow [0,\infty )\) is a strictly increasing concave function in \(C^1([0,\infty ))\) such that \(f'\) is absolutely continuous on \((0,\infty )\). Then for any \(i=1,\ldots ,n\), the process \(f(r_t^i)\) satisfies the Itô equation

$$\begin{aligned} f(r_t^i)= & {} f(r_0^i)\, +\, 2\int _0^t\lambda ^i (X_s-Y_s)\, f'(r_s^i)\, dW_s^i\nonumber \\&+\int _0^t\left\{ e_s^i\cdot (b^i(X_s)-b^i(Y_s))\, f'(r_s^i)\, +\,2\lambda ^i(X_s-Y_s)^2\, f''(r_s^i)\right\} \, ds.\nonumber \\ \end{aligned}$$
(88)

Remark 8

The lemma shows in particular that the process \(r_t^i\) satisfies

$$\begin{aligned} dr_t^i= e_t^i\cdot (b^i(X_t)-b^i(Y_t))\, dt\, +\,2\lambda ^i(X_t-Y_t)\, dW_t^i . \end{aligned}$$
(89)

Notice that in this equation, the drift term does not depend on the choice of \(\lambda \).

Proof of Lemma 4

Recall that \(e_t^i= Z_t^i/|Z_t^i|\) if \(r_t^i=|Z_t^i|\ne 0\). Since the function \(y\mapsto y/|y|\) is smooth on \(\mathbb {R}^{d_i} {\setminus }\{ 0\} \) and \(x\mapsto \sqrt{x}\) is smooth on \((0,\infty )\), we can apply Itô’s formula and (87) to show that the Itô equations

$$\begin{aligned} d|Z^i|^2= & {} 2Z^i\cdot (b^i(X)-b^i(Y))\, dt\, +\, 4\, \lambda ^i(Z)^2\, dt\, +\,4\lambda ^i(Z)\,|Z^i| \, dW^i\, ,\nonumber \\ dr^i= & {} \frac{1}{2r^i}\, d|Z^i|^2\, - \,\frac{1}{8(r^i)^3} \, d[|Z^i|^2]\nonumber \\= & {} e^i\cdot (b^i(X)-b^i(Y))\, dt\, +\,2\lambda ^i(X-Y)\, dW^i \end{aligned}$$
(90)

hold almost surely on any stochastic interval \([\tau _1,\tau _2]\) such that \(Z_t^i\ne 0\) a.s. for \(\tau _1\le t \le \tau _2\).

On the other hand, suppose that \(|Z^i| <\delta /2\) a.s. on a stochastic interval \([\tau _3,\tau _4 ]\). Then on \([\tau _3,\tau _4 ]\), \(\lambda ^i(Z)\equiv 0\) by (85), and hence \(Z^i\) is almost surely absolutely continuous with

$$\begin{aligned} dZ^i/dt= b^i(X)-b^i(Y)\qquad \text{ a.e. } \text{ on }\ [\tau _3,\tau _4 ]. \end{aligned}$$

This implies that \(r^i=|Z^i|\) is almost surely absolutely continuous on \([\tau _3,\tau _4 ]\) as well with

$$\begin{aligned} dr^i/dt= e^i\cdot (b^i(X)-b^i(Y))\qquad \text{ a.e. } \text{ on } \ [\tau _3,\tau _4 ], \end{aligned}$$
(91)

which is equivalent to (89) on \([\tau _3,\tau _4 ]\). Note that the value of \(e^i\) for \(Z^i=0\) is not relevant here, since \(Z^i\) can only stay at 0 for a positive amount of time if \(b^i(X)-b^i(Y)\) vanishes during that time interval.

Since \(\mathbb {R}_+\) is the union of countably many stochastic intervals of the first and second type considered above, the Itô equation (89) holds almost surely on \(\mathbb {R}_+\). The assertion (88) now follows from (89) by another application of Itô’s formula. Here it is enough to assume that f is \(C^1\) on \([0,\infty )\) and \(f'\) is absolutely continuous on \((0,\infty )\) because \(\lambda ^i(X_s-Y_s)\) vanishes for \(r_s^i<\delta /2\). \(\square \)
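The cancellation behind the second identity in (90) is that the term \(4\lambda ^i(Z)^2\,dt\) in \(d|Z^i|^2\) is exactly compensated by the second-order Itô correction of the square root. The sketch below verifies this arithmetic for sample values; the function names and the inputs are illustrative assumptions.

```python
def radial_drift(e_dot_db: float, lam: float, r: float) -> float:
    """dt-coefficient of dr, computed from d|Z|^2 as in (90) with Z = r e."""
    drift_sq = 2.0 * r * e_dot_db + 4.0 * lam ** 2  # dt-coefficient of d|Z|^2
    qv_sq = (4.0 * lam * r) ** 2                    # dt-coefficient of d[|Z|^2]
    return drift_sq / (2.0 * r) - qv_sq / (8.0 * r ** 3)


def radial_diffusion(lam: float, r: float) -> float:
    """dW-coefficient of dr: the coefficient 4 lam r of d|Z|^2 divided by 2 r."""
    return 4.0 * lam * r / (2.0 * r)


# Hypothetical values of e.(b(X) - b(Y)), lambda and r.
print(radial_drift(0.3, 0.7, 1.3), radial_diffusion(0.7, 1.3))
```

The printed drift agrees with \(e^i\cdot (b^i(X)-b^i(Y))\) and the diffusion coefficient with \(2\lambda ^i\), independently of \(r\).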

We now fix weights \(w_1,\ldots ,w_n\in [0,\infty )\) and strictly increasing concave functions \(f_1,\ldots ,f_n \in C^1 ([0,\infty ))\cap C^2((0,\infty ))\) such that \(f_i(0)=0\) for any i. Consider

$$\begin{aligned} \rho _t= \sum _{i=1}^nf_i(r_t^i)\, w_i= d_{f,w}(X_t,Y_t) \end{aligned}$$
(92)

where \(d_{f,w}\) is defined by (44). By Lemma 4,

$$\begin{aligned} d\rho _t= & {} \sum _{i=1}^n \left( e_t^i\cdot (b^i(X_t)-b^i(Y_t))\, f_i^\prime (r_t^i)\,+\, 2\lambda ^i(X_t-Y_t)^2\, f_i^{\prime \prime }(r_t^i)\right) \, w_i\, dt\nonumber \\&+ 2\,\sum _{i=1}^n\lambda ^i (X_t-Y_t)\, f_i^\prime (r_t^i)\, w_i\, dW_t^i . \end{aligned}$$
(93)

Notice that the last term on the right hand side is a martingale since \(\lambda ^i\) and \(f_i^\prime \) are bounded. This enables us to control the expectation \({\mathbb {E}}[\rho _t]\) if we can bound the drift in (93) by \(m-c\rho _t\) for constants \(m,c\in (0,\infty )\):

Lemma 5

Let \(m,c\in (0,\infty )\) and suppose that

$$\begin{aligned} \sum _{i=1}^n \left( cf_i(r^i)\!+\! (x^i-y^i)\cdot (b^i(x)-b^i(y))\, \frac{f_i^\prime (r^i)}{r^i} \!+\! 2\lambda ^i(x-y)^2\, f_i^{\prime \prime }(r^i)\right) \, w_i\ \le \ m \end{aligned}$$
(94)

holds for any \(x,y\in {\mathbb {R}^{d}}\) with \(r^i:=|x^i-y^i|>0\ \quad \forall \, i\in \{ 1,\ldots n\} \). Then

$$\begin{aligned} \mathbb {E}[\rho _t]\ \le \ e^{-ct}\,\mathbb {E}[\rho _0] \, +\, m\, (1-e^{-ct})/c\qquad \text{ for } \text{ any } t\ge 0. \end{aligned}$$
(95)

Proof

We first note that by continuity of \(b^i\) and \(f_i^\prime \), (94) implies that

$$\begin{aligned} \sum _{i=1}^n \left( cf_i(r^i)\, +\, e^i\cdot (b^i(x)-b^i(y))\, {f_i^\prime (r^i)} \,+\, 2\lambda ^i(x-y)^2\, f_i^{\prime \prime }(r^i)\right) \, w_i\ \le \ m \end{aligned}$$
(96)

holds for any \(x,y\in {\mathbb {R}^{d}}\) (even if \(x^i-y^i=0\)) provided \(e^i=(x^i-y^i)/r^i\) if \(r^i>0\) and \(e^i\) is an arbitrary unit vector if \(r^i=0\). Indeed, we obtain (96) by applying (94) with \(x^i\) replaced by \(x^i+he^i\) whenever \(x^i-y^i=0\) and taking the limit as \(h\downarrow 0\). In particular, by (96), the drift term \(\beta _t\) in (93) is bounded from above by

$$\begin{aligned} \beta _t\ \le \ m-\sum _{i=1}^ncf_i(r_t^i)w_i= m-c\rho _t . \end{aligned}$$

Therefore by (93) and by the Itô product rule,

$$\begin{aligned} d(e^{ct}\rho )= e^{ct}\, d\rho \, +\, ce^{ct}\rho \, dt\ \le \ e^{ct}m\, dt\, +\, dM \end{aligned}$$

where M is a martingale, and thus

$$\begin{aligned} \mathbb {E}[e^{ct}\rho _t]\ \le \ \mathbb {E}[\rho _0] \, +\, m\, \int _0^te^{cs}\, ds\qquad \text{ for } \text{ any } t\ge 0. \end{aligned}$$

\(\square \)
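The right-hand side of (95) is the solution of the comparison equation \(u'=m-cu\) with \(u(0)={\mathbb {E}}[\rho _0]\), which gives a quick way to sanity-check the formula. The following sketch integrates this ODE by an explicit Euler scheme and compares the result with the closed form; the function names and parameter values are hypothetical.

```python
import math


def lemma5_bound(rho0: float, m: float, c: float, t: float) -> float:
    """Right-hand side of (95): exp(-c t) rho0 + m (1 - exp(-c t)) / c."""
    return math.exp(-c * t) * rho0 + m * (1.0 - math.exp(-c * t)) / c


def euler_comparison_ode(rho0: float, m: float, c: float, t: float,
                         n: int = 100000) -> float:
    """Explicit Euler approximation of u' = m - c u, u(0) = rho0, at time t."""
    h = t / n
    u = rho0
    for _ in range(n):
        u += h * (m - c * u)
    return u


# Hypothetical values: E[rho_0] = 2, m = 0.1, c = 0.5.
print(lemma5_bound(2.0, 0.1, 0.5, 4.0), euler_comparison_ode(2.0, 0.1, 0.5, 4.0))
```

As \(t\rightarrow \infty \) the bound tends to \(m/c\), so a small constant \(m\) yields an almost contractive estimate.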

Since \(f_i^{\prime \prime }\le 0\), the process \(\rho _t\) is decreasing more rapidly (or growing more slowly) if \(\lambda ^i\) takes larger values. In particular, the decay properties of \(\rho _t\) would be optimized when \(\lambda ^i(z)=1 \) for any z with \(z^i\ne 0\). This optimal choice of \(\lambda ^1,\ldots ,\lambda ^n\) would correspond to a componentwise reflection coupling, but it violates Condition (85). It is perhaps possible to construct a corresponding coupling process by an approximation argument. For our purpose of bounding the Kantorovich distance \(W_{f,w}( \mu p_t,\nu p_t)\) this is not necessary. Indeed, it will be sufficient to consider approximate componentwise reflection couplings where (84) and (85) are satisfied and \(\lambda ^i(z)=1\) whenever \(|z^i|>\delta \). The limit \(\delta \downarrow 0\) will then be considered for the resulting estimates of the Kantorovich distance but not for the coupling processes.

7 Application to interacting diffusions

We will now apply the couplings introduced in Sect. 6 to prove the contraction properties for systems of interacting diffusions stated in Theorem 7 and Corollary 8. We consider the setup described in Sect. 3.1, i.e.,

$$\begin{aligned} b^i(x)= b_0^i(x^i)\, +\,\gamma ^i(x) \qquad \text{ for } i=1,\ldots ,n \end{aligned}$$
(97)

with \(b_0^i:\mathbb {R}^{d_i}\rightarrow \mathbb {R}^{d_i}\) locally Lipschitz such that \(\kappa _i\) defined by (45) is continuous on \((0,\infty )\) with

$$\begin{aligned} \liminf _{r\rightarrow \infty }\kappa _i(r)>0\ \text{ and } \ \lim _{r\rightarrow 0}r\kappa _i(r)=0\ \quad \text{ for } \text{ any } 1\le i\le n. \end{aligned}$$
(98)

The functions \(f_i\) are defined via \(\kappa _i\), and \(c_i\) is the corresponding contraction rate given by (48).

Proof of Theorem 7

We fix \(\delta >0\) and Lipschitz continuous functions \(\lambda ^i,\pi ^i:{\mathbb {R}^{d}}\rightarrow [0,1]\), \(1\le i\le n\), such that (84) and (85) hold and \(\lambda ^i(z)=1\) if \(|z^i|\ge \delta \). Let \((X_t,Y_t)\) denote a corresponding approximate componentwise reflection coupling of two solutions of (42) given by (86), and let \(\rho _t=d_{f,w}(X_t,Y_t)\). We will apply Lemma 5, which requires bounding the left-hand side of (94). For this purpose recall that \(f_i\) and \(c_i\) have been chosen in such a way that

$$\begin{aligned} 2f_i^{\prime \prime }(r)-\frac{1}{2} r\kappa _i(r)f_i^\prime (r)\ \le \ -c_i\, f_i(r)\qquad \forall \ r>0, \end{aligned}$$

cf. (68) and (69). Therefore, by (97) and by definition of \(\kappa _i\),

$$\begin{aligned}&(x^i-y^i)\cdot (b^i(x)-b^i(y))\, f_i^\prime (r^i)/r^i\, +\, 2\lambda ^i(x-y)^2\, f_i^{\prime \prime }(r^i)\nonumber \\&\quad \le -\frac{1}{2}r^i\kappa _i(r^i)f_i^\prime (r^i) \, +\, |\gamma ^i(x)-\gamma ^i(y)|f_i^\prime (r^i)\, +\, 2\lambda ^i(x-y)^2\, f_i^{\prime \prime }(r^i)\nonumber \\&\quad \le -\lambda ^i(x-y)^2 c_i f_i(r^i) + |\gamma ^i(x)-\gamma ^i(y)| - \frac{1}{2}(1-\lambda ^i(x-y)^2)\,r^i\kappa _i(r^i) f_i^{\prime }(r^i)\nonumber \\&\quad \le - c_if_i(r^i) \, +\, |\gamma ^i(x)-\gamma ^i(y)|\, +\, c_i\delta \, +\, \frac{1}{2}\sup _{r<\delta }\left( r\kappa _i(r)^{-}\right) \end{aligned}$$
(99)

for any \(x,y\in {\mathbb {R}^{d}}\) with \(r^i=|x^i-y^i|>0\). Here we have used that \(0\le f_i^\prime \le 1 \), and that \(\lambda ^i (x-y)\ne 1 \) only if \(r^i<\delta \). In this case, \(f_i(r^i)\le r^i\le \delta \). By (99) and by the assumption (49) on \(\gamma ^i\), we obtain

$$\begin{aligned}&\sum _{i=1}^n\left( (x^i-y^i)\cdot (b^i(x)-b^i(y))\, f_i^\prime (r^i)/r^i\, +\, 2\lambda ^i(x-y)^2\, f_i^{\prime \prime }(r^i)\right) \, w_i\\&\quad \le m(\delta )\, +\, \sum _{i=1}^n(-c_i+\varepsilon _i) f_i(r^i)w_i\ \le \ m(\delta )\, -\, c\sum _{i=1}^nf_i(r^i)w_i \end{aligned}$$

for \(x,y\) as above, where

$$\begin{aligned} m( \delta )= \sum _{i=1}^n \left( c_i\delta +\frac{1}{2}\sup _{r<\delta }(r\kappa _i(r)^{-})\right) w_i \end{aligned}$$

is a finite constant by (98), and \(c=\min _{i=1,\ldots n}(c_i-\varepsilon _i)\). Hence (94) is satisfied with c and \(m(\delta )\) and, therefore,

$$\begin{aligned} {\mathbb {E}}[\rho _t]\ \le \ e^{-ct}\, {\mathbb {E}}[\rho _0]\, +\, m(\delta )\, (1-e^{-ct})/c . \end{aligned}$$
(100)

By choosing the coupling process \((X_t,Y_t)\) with initial distribution given by a coupling \(\eta \) of probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}^{d}}\), we conclude that

$$\begin{aligned} W_{f,w}(\mu p_t,\nu p_t)\le & {} {\mathbb {E}}\left[ d_{f,w}(X_t,Y_t)\right] = {\mathbb {E}}[\rho _t]\nonumber \\\le & {} e^{-ct}\int d_{f,w}(x,y)\,\eta (dx~dy)\, +\, m(\delta )\, (1-e^{-ct})/c \end{aligned}$$
(101)

for any \(t\ge 0\). Moreover, by (47), \(m(\delta )\rightarrow 0\) as \(\delta \downarrow 0\). Hence the assertion (50) follows from (101) by taking the limit as \(\delta \downarrow 0\) and minimizing over all couplings \(\eta \) of \(\mu \) and \(\nu \). Finally, (51) follows from (50) since \(\varphi (R^i_0)r/2\le f_i(r)\le r\) implies

$$\begin{aligned} A^{-1}\, d_{\ell ^1}(x,y)\ \le \ d_{f,w}(x,y)= \sum f_i(|x^i-y^i|)\, w_i\ \le \ d_{\ell ^1}(x,y). \end{aligned}$$

\(\square \)

Proof of Corollary 8

The \(\ell ^1 \)-Lipschitz condition (52) for \(\gamma \) implies that (49) holds with \(w_i=1\) for any i, and

$$\begin{aligned} \lambda \varepsilon _i^{-1}= \inf _{r>0}f_i(r)/r= f_i^\prime (R_1^i)= \varphi _i (R_0^i)/2, \end{aligned}$$

i.e., \(\varepsilon _i=2\lambda /\varphi _i(R_0^i)\). The assertion now follows from Theorem 7. \(\square \)
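For a rough impression of the resulting rate \(c=\min _i(c_i-\varepsilon _i)\), one can evaluate \(\varepsilon _i=2\lambda /\varphi _i(R_0^i)\) in a simple model case. The sketch below assumes, purely for illustration, that \(\varphi _i(r)=\exp (-\frac{1}{4}\int _0^rs\kappa _i(s)^{-}\,ds)\) (consistent with (73) and the normalization \(\varphi _i(0)=1\)) and that \(\kappa _i(r)=-L\) for \(r\le R\) and \(\kappa _i(r)\ge 0\) for \(r>R\), so that \(R_0^i=R\) and \(\varphi _i(R_0^i)=\exp (-LR^2/8)\). The rates \(c_i\) are hypothetical inputs, not values computed from (48).

```python
import math
from typing import List


def phi_at_R0(L: float, R: float) -> float:
    """phi(R_0) = exp(-L R^2 / 8) in the assumed model case kappa = -L on [0, R]."""
    return math.exp(-L * R * R / 8.0)


def epsilon(lam: float, phi_R0: float) -> float:
    """epsilon_i = 2 lambda / phi_i(R_0^i), with lam the l^1-Lipschitz constant of gamma."""
    return 2.0 * lam / phi_R0


def overall_rate(c_list: List[float], eps_list: List[float]) -> float:
    """c = min_i (c_i - epsilon_i); a nonpositive value gives no contraction."""
    return min(c - e for c, e in zip(c_list, eps_list))


# Hypothetical parameters: L = 1, R = 2, interaction strength lam = 0.05.
eps = epsilon(0.05, phi_at_R0(1.0, 2.0))
print(eps, overall_rate([1.0, 0.8], [eps, eps]))
```

In this model case \(\varepsilon _i\) grows like \(\exp (LR^2/8)\), so the interaction strength \(\lambda \) has to be correspondingly small for the rate to remain positive.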