1 Introduction

Hamiltonian or Hybrid Monte Carlo (HMC) methods are a class of Markov Chain Monte Carlo (MCMC) methods originating in statistical physics [20] which have become increasingly popular in various application areas [11, 48, 53, 60, 64]. Their success is due in particular to the empirically observed acceleration of convergence compared to more traditional, random-walk based methods. The basic idea in HMC is to define an MCMC method with the help of an artificial Hamiltonian dynamics whose only purpose is to accelerate convergence to equilibrium. This Hamiltonian dynamics is designed to leave invariant a product of the target measure and a fictitious Gaussian measure in an artificial velocity variable. First rigorous theoretical results supporting the empirical evidence have only been established recently. In particular, geometric ergodicity has been verified in [10, 22, 49], and quantitative convergence bounds have been derived in the strongly convex case in [52], and under more general assumptions in [9], both by applying coupling methods.

Since many applications are high dimensional, a key issue is to understand the dependence of the convergence bounds on the dimension. Here, we study the problem of dimension dependence for a special class of models that is relevant for several important applications including Path Integral Molecular Dynamics (PIMD) [14,15,16,17, 31, 32, 43, 44, 50, 57, 58], Transition Path Sampling (TPS) [6, 59, 63, 65], and Bayesian inverse problems [7, 18, 41, 68]. For the class of models we consider, a corresponding HMC Markov chain relying on a preconditioned Hamiltonian dynamics can be defined directly on the infinite dimensional state space [3]. This suggests that one might hope for dimension-free convergence bounds for the corresponding Markov chains on finite-dimensional discretizations of the state space. Corresponding dimension-free convergence rates to equilibrium have been established for the preconditioned Crank-Nicolson (pCN) algorithm [36] and for the Metropolis-adjusted Langevin algorithm (MALA) [24], but a corresponding result for HMC has not been available so far.

The goal of this paper is to fill this gap. To this end we extend the coupling approach developed for HMC in the finite dimensional case in [9], and combine it with a two-scale coupling approach for stochastic dynamics on infinite dimensional Hilbert spaces that originates in [33, 34, 54, 55] and has been further developed in [71]. The splitting into “low modes” and “high modes” in the two-scale coupling can be traced back to contraction results for the stochastic Navier-Stokes equations [55], and analogous results in the deterministic setting [28]; see [54] for a detailed review.

Our object of study is the exact preconditioned HMC algorithm (pHMC) with fixed durations on a Hilbert space, i.e., the (preconditioned) Hamiltonian dynamics is exactly integrated (or, in practical terms, the integration is carried out with very small step sizes). Here, preconditioning corresponds to an appropriate choice of the kinetic energy which involves picking the mass operator equal to the stiffness operator (or inverse covariance) associated to the Gaussian reference measure of the target probability measure. This choice of kinetic energy ensures that the corresponding pHMC algorithm is more amenable to numerical approximation and Metropolis-adjustment than HMC without preconditioning [3, 11].

We prove that the transition kernel of the Markov chain induced by the pHMC algorithm is contracting in a suitable Wasserstein/Kantorovich metric with a rate that depends transparently on the duration of the Hamiltonian flow, the eigenvalues of the covariance operator of the Gaussian reference measure, and the regularity of the preconditioned Hamiltonian dynamics. The results are given in a more general setting that includes pHMC as a special case, and also covers other types of dynamics and preconditioning strategies. As a consequence of our general results, we derive dimension-free bounds for pHMC applied to finite-dimensional approximations arising in TPS and PIMD.

Before stating our results in detail, we conclude this introduction with a brief outlook. The results below apply only to pHMC with exact integration of the Hamiltonian dynamics. In practice, the Hamiltonian dynamics is approximated numerically, yielding versions of pHMC that are implementable on a computer. The time integrator of choice for pHMC is the symmetric splitting integrator introduced in [3]. Unlike other splittings for the Hamiltonian dynamics, this approximation has an acceptance rate that is uniform with respect to the spatial step size associated with the discretization of the Hilbert space [11, § 8]. Time discretization creates a bias in the invariant measure that can be avoided by a Metropolis adjustment [11, 70]. We would expect that for unadjusted numerical HMC based on the integrator proposed in [3], contraction results similar to those stated below hold if the time step size is chosen sufficiently small (but independently of the dimension). Under additional regularity assumptions, one could also hope for dimension-free bounds for the Metropolis-adjusted version. First steps in this direction are carried out in [9, § 2.5.4] in the finite dimensional case, and in [62, § 4] in a strongly convex infinite dimensional case, but a full study in the general case would be lengthy and go beyond the scope of the current work.

As an alternative to preconditioning, it is also possible (though trickier) to implement non-preconditioned HMC, which corresponds to injecting white noise into the velocity variable. In this case, the corresponding Hamiltonian dynamics is highly oscillatory in the high modes [61]. Therefore, convergence bounds for exact HMC without preconditioning on an infinite dimensional Hilbert space can be expected to hold only if the durations are randomized [60], and in numerical implementations, strongly stable integrators [43, 44] have to be used in order to be able to choose the step size independently of the dimension. Furthermore, scaling limit results show that for Metropolis-adjusted HMC applied to i.i.d. product measures on high dimensional state spaces, the step size has to be chosen of order \(O(d^{-1/4})\) to avoid degeneracy of the acceptance probabilities [2, 30, 42].

We now state our main results in Sect. 2, and consider applications to TPS and PIMD in Sect. 3. The remaining sections contain the proofs of all results.

2 Main results

Let \({\mathcal {H}}\) be a real separable Hilbert space with inner product \(\langle \cdot , \cdot \rangle \) and norm \(\left|\cdot \right|\). Let \({\mathcal {C}}: {\mathcal {H}}\rightarrow {\mathcal {H}}\) be a positive compact symmetric linear operator. By the spectral theorem, the eigenfunctions \(\{ e_i \}_{i \in {\mathbb {N}}}\) of \({\mathcal {C}}\) form a complete orthonormal basis of \({\mathcal {H}}\) with corresponding eigenvalues \(\{ \lambda _i \}_{i \in {\mathbb {N}}}\) which we arrange in descending order, i.e., \(\lambda _1 \ge \lambda _2 \ge \cdots \). The positivity condition means that \(\lambda _j>0\) for all \(j \in {\mathbb {N}}\), and by compactness, if \(\text {dim}({\mathcal {H}})=\infty \) then \(\lim _{j\rightarrow \infty }\lambda _j=0\). Any function \(x \in {\mathcal {H}}\) can be represented in spectral coordinates by the expansion

$$\begin{aligned} x = \sum _{j=1}^{\infty } x_j e_j \quad \text {where }x_j := \langle x, e_j \rangle . \end{aligned}$$
(2.1)

Moreover, for all \(s\in {\mathbb {R}}\), the operator \({\mathcal {C}}^s\) is defined via the spectral decomposition of \({\mathcal {C}}\). We introduce the family of inner products and norms given by

$$\begin{aligned} \langle x,y\rangle _s:= \langle x, {\mathcal {C}}^{-s} y \rangle = \langle {\mathcal {C}}^{-s/2} x, {\mathcal {C}}^{-s/2} y \rangle , \quad \left|x \right|_s:=\langle x,x\rangle _s^{1/2} \end{aligned}$$
(2.2)

for \(x,y\in {\mathcal {H}}^s\). Here for \(s\ge 0\), \({\mathcal {H}}^s\) denotes the Hilbert space consisting of all \( x \in {\mathcal {H}}\) with \(\left|x \right|_s < \infty \), whereas for \(s<0\), \({\mathcal {H}}^s\) is the completion of \({\mathcal {H}}\) w.r.t. \(\left|x \right|_s\). Note that \({\mathcal {H}}={\mathcal {H}}^0\), and for \(s>0\), \({\mathcal {H}}^s \subset {\mathcal {H}}\subset {\mathcal {H}}^{-s}\). Furthermore, the linear operator \({\mathcal {C}}\) restricts or extends (depending on whether \(s>0\) or \(s<0\)) to a linear isometry from \({\mathcal {H}}^s\) to \({\mathcal {H}}^{s+2}\) which will again be denoted by \({\mathcal {C}}\). This setup is consistent with the framework for infinite-dimensional Bayesian inverse problems [3, 18, 68]. Here, typically, \(s \in (0,1)\).
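In a finite truncation, these norms are straightforward to evaluate in spectral coordinates. The following Python sketch illustrates the scale of spaces for a hypothetical spectrum \(\lambda _j = j^{-2}\); all concrete numbers are illustrative assumptions, not taken from the text.

```python
import numpy as np

# In the spectral coordinates (2.1), the norm (2.2) reads
# |x|_s^2 = sum_j lambda_j^{-s} x_j^2.  We truncate to 100 modes and use
# the hypothetical spectrum lambda_j = j^{-2}.

def h_norm(x, lam, s):
    """Sobolev-like norm |x|_s from (2.2), in spectral coordinates."""
    return np.sqrt(np.sum(lam ** (-s) * x ** 2))

lam = 1.0 / np.arange(1, 101) ** 2                  # eigenvalues, descending
x = np.random.default_rng(0).normal(size=100) * lam  # a smooth test vector

# Since lambda_j <= 1 here, |x|_s is nondecreasing in s, reflecting the
# inclusions H^s ⊂ H ⊂ H^{-s} for s > 0.
assert h_norm(x, lam, -0.5) <= h_norm(x, lam, 0.0) <= h_norm(x, lam, 0.5)
```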

We will now introduce the pHMC method for approximate sampling from a probability measure \(\mu \) that has a density w.r.t. a Gaussian measure \(\mu _0\) on one of the Hilbert spaces \({\mathcal {H}}^s\). Afterwards, in Sect. 2.2, we will introduce a more general family of Markov chains on Hilbert spaces that includes the Markov chain associated to pHMC as a special case. In Sect. 2.3, we introduce a new coupling for these Markov chains that combines ideas from [9, 71]. Then in Sects. 2.4 and 2.5, we state our main contraction result for these couplings, and derive quantitative error bounds.

2.1 Exact preconditioned Hamiltonian Monte Carlo

Let \(\mu _0={\mathcal {N}}(0, {\mathcal {C}})\) denote the centered Gaussian measure whose covariance operator w.r.t. the inner product \(\langle \cdot , \cdot \rangle \) is \({\mathcal {C}}\) [5]. If \({\mathcal {C}}\) is trace class then \(\mu _0\) is supported on \({\mathcal {H}}\). More generally, we fix \(s\in (-\infty ,1)\) and assume that \(\mu _0\) is supported on the corresponding Hilbert space \({\mathcal {H}}^s\). This is ensured by the following assumption:

Assumption 2.1

The operator \({\mathcal {C}}^{1-s}\) is trace class, i.e.,

$$\begin{aligned} {\text {trace}}({\mathcal {C}}^{1-s}) = \sum _{j=1}^{\infty } \lambda _j^{1-s} <\infty . \end{aligned}$$

A realization \(\xi \) from \(\mu _0\) can be generated using the expansion

$$\begin{aligned} \xi = \sum _{j=1}^{\infty } \sqrt{\lambda _j} \rho _j e_j, \quad \{ \rho _i \} \overset{\text {i.i.d.}}{\sim } {\mathcal {N}}(0,1). \end{aligned}$$

For \(\xi \sim \mu _0\), Assumption 2.1 implies \({\mathbb {E}}\left|\xi \right|_s^2 = {\text {trace}}({\mathcal {C}}^{1-s}) < \infty \), and thus, \(\xi \) is indeed a Gaussian random variable on \({\mathcal {H}}^s\).
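In a finite truncation, the expansion above yields a direct sampler for \(\mu _0\). The sketch below, for an assumed example spectrum \(\lambda _j=j^{-2}\) and \(s=1/4\) (so that Assumption 2.1 holds), also verifies the identity \({\mathbb {E}}\left|\xi \right|_s^2 = {\text {trace}}({\mathcal {C}}^{1-s})\) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
s, n = 0.25, 200                       # example regularity and truncation level
lam = 1.0 / np.arange(1, n + 1) ** 2   # trace(C^{1-s}) < ∞ requires s < 1/2 here

def sample_mu0(rng):
    """Truncated expansion xi = sum_j sqrt(lambda_j) rho_j e_j."""
    return np.sqrt(lam) * rng.normal(size=n)

# Empirically, E|xi|_s^2 should match trace(C^{1-s}) = sum_j lambda_j^{1-s}.
m = 2000
mean_sq = np.mean([np.sum(lam ** (-s) * sample_mu0(rng) ** 2) for _ in range(m)])
trace = np.sum(lam ** (1 - s))
assert abs(mean_sq - trace) / trace < 0.1
```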

Remark 2.2

To avoid confusion, we stress that the covariance operator of a Gaussian measure is a non-intrinsic object that depends on the choice of an inner product. In particular, the covariance operator of \(\mu _0\) w.r.t. the \({\mathcal {H}}^s\) inner product is \({\mathcal {C}}^{1-s}\). Nonetheless, in what follows, we always define the covariance operator with respect to the \({\mathcal {H}}\) inner product, and in this sense, the measure \(\mu _0\) has covariance operator \({\mathcal {C}}\).

Exact preconditioned Hamiltonian Monte Carlo (pHMC) is an MCMC method for approximate sampling from probability distributions on a Hilbert space that have the general form

$$\begin{aligned} \mu (dx) \propto \exp (- U(x) ) \mu _0(dx), \quad \mu _0 = {\mathcal {N}}(0, {\mathcal {C}}), \end{aligned}$$
(2.3)

where U is a function on a Hilbert space on which the Gaussian measure \(\mu _0\) is supported. The pHMC method generates a Markov chain on this Hilbert space with transition step

$$\begin{aligned} x\mapsto X'(x) \quad \text {where} \quad X'(x) = q_T(x,\xi ). \end{aligned}$$
(2.4)

Here \(\xi \sim {\mathcal {N}}(0,{\mathcal {C}})\), and the duration \(T:\Omega \rightarrow {\mathbb {R}}_+\) is in general an independent random variable with a given distribution \(\nu \) (e.g. \(\nu =\delta _r\) or \(\nu =\text {Exp}(\lambda ^{-1})\)). We will only consider the case where \(T\in (0,\infty )\) is a given deterministic constant. Moreover,

$$\begin{aligned} \phi _t(x,v) = \left( q_t(x,v), v_t(x,v)\right) \qquad ( t\in [0,\infty )) \end{aligned}$$

is the exact flow of the Hilbert space valued ODE given by

$$\begin{aligned} \frac{d}{dt} q_t \ = \ v_t, \quad \frac{d}{dt} v_t \ = \ -q_t-{\mathcal {C}}\, DU(q_t),\quad \left( q_0(x,v),v_0(x,v)\right) \ = \ (x,v). \end{aligned}$$
(2.5)

Formally, (2.5) is a preconditioned Hamiltonian dynamics for the Hamiltonian

$$\begin{aligned} H(x,v)=U(x)+ \langle x, {\mathcal {C}}^{-1}x\rangle /2+\langle v, {\mathcal {C}}^{-1}v\rangle /2 , \end{aligned}$$

where the covariance operator \({\mathcal {C}}\) is used for preconditioning. A key property of (2.5) is that it leaves invariant the probability measure

$$\begin{aligned} {\hat{\mu }}(dx \; dv)\ \propto \ \exp (- U(x) )\, {\mathcal {N}}(0, {\mathcal {C}})(dx)\, {\mathcal {N}}(0, {\mathcal {C}})(dv), \end{aligned}$$
(2.6)

on phase space, and in turn, this implies that the transition kernel of pHMC defined by \(\pi (x,B) = {\mathbb {P}}[X'(x) \in B]\) leaves \(\mu \) in (2.3) invariant [3].
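For illustration, a single pHMC transition step can be sketched in spectral coordinates. The quadratic potential \(U(x)=g\,x_1^2/2\), the spectrum, and the leapfrog approximation of the exact flow (2.5) used below are all hypothetical choices; exact pHMC corresponds to the limit of vanishing step size.

```python
import numpy as np

# A sketch of one pHMC transition step (2.4) in spectral coordinates.
# Hypothetical ingredients: U(x) = g*x_1^2/2, so DU(q) = g*q_1*e_1 and
# C DU(q) = g*lambda_1*q_1*e_1; spectrum lambda_j = j^{-2}; and a leapfrog
# discretization of (2.5) with many small steps in place of the exact flow.

rng = np.random.default_rng(2)
n, g = 100, 2.0
lam = 1.0 / np.arange(1, n + 1) ** 2      # eigenvalues of C, descending

def drift(q):
    # b(q) = -q - C DU(q), cf. (2.8)
    b = -q
    b[0] -= lam[0] * g * q[0]
    return b

def phmc_step(x, T=1.0, n_steps=2000, xi=None):
    if xi is None:
        xi = np.sqrt(lam) * rng.normal(size=n)   # xi ~ N(0, C)
    q, v, h = x.copy(), xi.copy(), T / n_steps
    for _ in range(n_steps):                     # leapfrog with small step h
        v = v + 0.5 * h * drift(q)
        q = q + h * v
        v = v + 0.5 * h * drift(q)
    return q                                     # X'(x) = q_T(x, xi)

x_new = phmc_step(np.ones(n))
```

Since this \(U\) is quadratic, each coordinate solves a linear oscillator and the output can be compared against the exact rotation; for nonlinear potentials no such closed form is available.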

Below, our key assumption in this setup will be that U is a gradient Lipschitz function on the Hilbert space \({\mathcal {H}}^s\) where the reference measure \(\mu _0\) is supported:

Assumption 2.3

The target measure \(\mu \) is a probability measure on \({\mathcal {H}}^s\) that is absolutely continuous with respect to \(\mu _0\). The relative density is proportional to \(\exp (-U)\) where \(U:{\mathcal {H}}^s\rightarrow [0,\infty )\) is a Fréchet differentiable function satisfying the gradient Lipschitz condition

$$\begin{aligned} \left| \partial _hU(x)-\partial _hU(y)\right| \ \le \ L_g\, \left| x-y\right| _s\, \left| h\right| _s\qquad \text {for all }x,y,h\in {\mathcal {H}}^s \end{aligned}$$

for some finite positive constant \(L_g\).

In Assumption 2.3, \(\partial _h U\) denotes the directional derivative of U in direction h. We also use the notation DU to denote the differential of U, i.e., (DU)(x) is the linear functional on \({\mathcal {H}}^s\) defined by \((DU)(x)[h]=(\partial _hU)(x)\). Identifying the dual space of \({\mathcal {H}}^s\) with \({\mathcal {H}}^{-s}\), Assumption 2.3 shows that we can view DU as a Lipschitz continuous function from \({\mathcal {H}}^s\) to \({\mathcal {H}}^{-s}\), i.e.,

$$\begin{aligned} \left| DU(x)-DU(y)\right| _{-s}\ \le \ L_g \, \left| x-y\right| _s\qquad \text {for all }x,y\in {\mathcal {H}}^s. \end{aligned}$$
(2.7)

Recalling that \({\mathcal {C}}\) is an isometry from \({\mathcal {H}}^{-s}\) to \({\mathcal {H}}^{2-s}\), and \({\mathcal {H}}^{2-s}\) is continuously embedded into \({\mathcal {H}}^s\) for \(s<1\), we see that Assumption 2.3 implies that the drift function

$$\begin{aligned} b(x):= - x- {\mathcal {C}} D U(x) \end{aligned}$$
(2.8)

occurring in (2.5) is a Lipschitz continuous map from \({\mathcal {H}}^s\) to \({\mathcal {H}}^s\).

Remark 2.4

The global Lipschitz condition in Assumption 2.3 is essentially the same as Condition 3.2 in [3], except that here the domain \({\mathcal {H}}^s\) of the potential energy is defined in terms of the covariance operator itself rather than in terms of an auxiliary operator with related eigenfunctions and eigenvalues.

2.2 General setting

We now introduce a more general setup that includes the Markov chain induced by pHMC as a special case. We fix \(s\in {\mathbb {R}}\), and we assume that \(b: {\mathcal {H}}^{s} \rightarrow {\mathcal {H}}^{s}\) is a Lipschitz continuous function. Let \( \phi _t(x,v) = \left( q_t(x,v), v_t(x,v)\right) \) denote the exact flow of the Hilbert space valued ODE given by

$$\begin{aligned} \frac{d}{dt} q_t \ = \ v_t, \quad \frac{d}{dt} v_t \ = \ b(q_t),\quad \left( q_0(x,v),v_0(x,v)\right) \ = \ (x,v). \end{aligned}$$
(2.9)

As above, we fix a constant duration \(T\in (0,\infty )\) and consider the Markov chain on \({\mathcal {H}}^s\) with transition step

$$\begin{aligned} x\ \mapsto \ X'(x)\ :=\ q_T(x,\xi ) \;,\qquad \xi \sim {\mathcal {N}}(0, \widetilde{{\mathcal {C}}}) \end{aligned}$$
(2.10)

where \(\widetilde{{\mathcal {C}}}\) is a linear operator on \({\mathcal {H}}\) with the same eigenfunctions as \({\mathcal {C}}\).

Assumption 2.5

\(\widetilde{{\mathcal {C}}}: {\mathcal {H}}\rightarrow {\mathcal {H}}\) is a symmetric linear operator with eigenfunctions \(\{e_i\}_{i \in {\mathbb {N}}}\) and corresponding eigenvalues \(\{{\widetilde{\lambda }}_i\}_{i \in {\mathbb {N}}}\). Moreover, the operator \(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}\) is trace class, i.e.,

$$\begin{aligned} {\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}) = \sum _{j=1}^{\infty } {\widetilde{\lambda }}_j \lambda _j^{-s} <\infty . \end{aligned}$$

For \(\xi \sim {\mathcal {N}}(0, \widetilde{{\mathcal {C}}})\), this implies \({\mathbb {E}}\left|\xi \right|_s^2 = {\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}) < \infty \), and thus, \(\xi \) in (2.10) is a Gaussian random variable on \({\mathcal {H}}^s\). Let \(\pi (x,B) = {\mathbb {P}}[X'(x) \in B]\) denote the corresponding transition kernel. In particular, in the case where b is given by (2.8) and \(\widetilde{ {\mathcal {C}}}={\mathcal {C}}\), we recover the Markov chain associated to pHMC. When \(\widetilde{ {\mathcal {C}}} \ne {\mathcal {C}}\), the choice \(b(x)=-\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-1} x - \widetilde{{\mathcal {C}}} DU(x)\) ensures that the corresponding partially preconditioned dynamics in (2.9) leaves invariant the probability measure \(\mu (dx)\, {\mathcal {N}}(0, \widetilde{{\mathcal {C}}})(dv)\).

Our main result rests on the assumption that the Hilbert space \({\mathcal {H}}^s\) can be split into a finite dimensional subspace \({\mathcal {H}}^{s,\ell }\) (“the low modes”) and its orthogonal complement \({\mathcal {H}}^{s,h}\) (“the high modes”) such that b(x) is close to a linear map on \({\mathcal {H}}^{s,h}\). More precisely, fix \(n \in {\mathbb {N}}\). Let \({\mathcal {H}}^{s,\ell } := {\text {span}}\{e_1, \dots , e_n\}\), and let \({\mathcal {H}}^{s,h}\) denote its orthogonal complement, i.e., \({\mathcal {H}}^{s,h}\) is the closure in \({\mathcal {H}}^s\) of \( {\text {span}}\{e_{n+1},e_{n+2}, \dots \} \). Thus \({\mathcal {H}}^s = {\mathcal {H}}^{s,\ell } \oplus {\mathcal {H}}^{s,h}\). For any \(x \in {\mathcal {H}}^s\), we denote by \(x^{\ell }\) and \(x^h\) the orthogonal projections onto \({\mathcal {H}}^{s,\ell }\) and \({\mathcal {H}}^{s,h}\), respectively.

Assumption 2.6

b is a function from \({\mathcal {H}}^s\) to \({\mathcal {H}}^s\) such that \(b(0)=0\). Moreover it satisfies the following conditions:

  1. (B1)

    There exists \(L\in [1,\infty )\) such that

    $$\begin{aligned} \left|b(x) - b(y) \right|_s \ \le \ L\left|x - y \right|_s \quad \text {for all }x, y \in {\mathcal {H}}^{s}. \end{aligned}$$
    (2.11)
  2. (B2)

    There exists \(n \in {\mathbb {N}}\) such that

    $$\begin{aligned} \left|b^h(x) - b^h(y) + x^h - y^h \right|_s \ \le \ \frac{1}{3} \left|x-y \right|_s \quad \text {for all }x, y \in {\mathcal {H}}^{s}. \end{aligned}$$
    (2.12)
  3. (B3)

    There exist \(K>0\) and \(A \ge 0\) such that

    $$\begin{aligned} \langle x, b(x) \rangle _{s} \ \le \ - K \left|x \right|_s^2 + A \quad \text {for any }x \in {\mathcal {H}}^{s}. \end{aligned}$$
    (2.13)

Condition (B1) is a global Lipschitz condition. Since \(b(0)=0\), it implies the linear growth bound \(\left|b(x) \right|_s \le L\left|x \right|_s\), which together with (B3) yields \(K \le L\). Condition (B2) says that in the high modes, b(x) behaves essentially as a linear drift. Finally, Condition (B3) is a standard drift condition which implies that the Markov chain has a Foster–Lyapunov function. It is similar to other conditions in the literature for Markov processes on unbounded spaces based on second-order dynamical systems, including Hypothesis (H2) in [10], Equation (13) of [66], Hypothesis 1.1 in [69], Condition 3.1 in [56], and Assumption 1.2 in [25].

Lemma 2.7

(Foster–Lyapunov function) Suppose that Assumptions 2.5 and 2.6 hold. Then for any \(T>0\) satisfying \( LT^2 \le \frac{1}{48} \frac{K}{L}\) we have

$$\begin{aligned} {\mathbb {E}}\left[ \left|X'(x) \right|_s^2 \right] \le \left( 1- \frac{K T^2}{2} \right) \left|x \right|_s^2 + 5 ( A + {\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}) ) T^2 \quad \text {for all }x \in {\mathcal {H}}^s. \end{aligned}$$

The proof of this lemma is given in Sect. 5.
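As a sanity check of Lemma 2.7, consider the linear drift \(b(x)=-x\), which satisfies (B1)–(B3) with \(L=K=1\) and \(A=0\); the flow (2.9) is then the exact rotation \(q_T=x\cos T+\xi \sin T\), so both sides of the bound can be evaluated in closed form. The spectrum below (with \(\widetilde{{\mathcal {C}}}={\mathcal {C}}\)) is an illustrative assumption.

```python
import numpy as np

# For b(x) = -x, the flow (2.9) gives q_T = x cos T + xi sin T, so
# E|X'(x)|_s^2 = cos^2(T)|x|_s^2 + sin^2(T) trace(C~ C^{-s}).  We verify that
# this closed form is dominated by the Foster-Lyapunov bound of Lemma 2.7.

s, T = 0.25, 0.1                        # L*T^2 = 0.01 <= K/(48*L) = 1/48
lam = 1.0 / np.arange(1, 201) ** 2      # example eigenvalues, C~ = C
trace = np.sum(lam ** (1 - s))          # trace(C~ C^{-s}) for C~ = C

x_norm_sq = 7.0                         # an arbitrary value of |x|_s^2
lhs = np.cos(T) ** 2 * x_norm_sq + np.sin(T) ** 2 * trace
rhs = (1 - T ** 2 / 2) * x_norm_sq + 5 * trace * T ** 2
assert lhs <= rhs
```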

Example 2.1

(pHMC) Suppose that b is given by (2.8) and U satisfies the global Lipschitz condition in Assumption 2.3 with Lipschitz constant \(L_g\). Then condition (B1) holds with Lipschitz constant \(L= 1 + \lambda _1^{1-s} L_g\) and Condition (B2) holds with \(n=\inf \{ k \in {\mathbb {N}} : \lambda _{k+1}^{1-s} < 1/(3 L_g) \}\). Indeed, by (2.7),

$$\begin{aligned}&\left|b(x) - b(y) \right|_s \le \left|x-y \right|_s + \left|{\mathcal {C}}^{1-s} ( DU(x) - D U(y) ) \right|_{-s} \le ( 1 + \lambda _1^{1-s} L_g ) \left|x-y \right|_s, \end{aligned}$$

and \(\left|b^h(x) - b^h(y) + x^h - y^h \right|_s\le \lambda _{n+1}^{1-s} L_g \left|x - y \right|_s \le (1/3) \left|x- y \right|_s\) as required. Moreover, the drift condition (B3) can be verified in examples, see Sect. 3.
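The constants in Example 2.1 are easy to compute for a concrete spectrum. The following sketch assumes \(\lambda _j=j^{-2}\), \(s=1/4\), and \(L_g=5\), all purely illustrative.

```python
import numpy as np

# Constants from Example 2.1 for a hypothetical spectrum lambda_j = j^{-2}
# and Lipschitz constant L_g: L = 1 + lambda_1^{1-s} L_g, and n is the
# smallest k with lambda_{k+1}^{1-s} < 1/(3 L_g).

s, L_g = 0.25, 5.0
lam = 1.0 / np.arange(1, 1001) ** 2          # eigenvalues, descending

L = 1 + lam[0] ** (1 - s) * L_g              # Lipschitz constant in (B1)
# lam[1:][i] is lambda_{i+2}, so the first True index i gives n = i + 1:
n = int(np.argmax(lam[1:] ** (1 - s) < 1 / (3 * L_g)) + 1)  # splitting level (B2)

# n is minimal: the threshold is crossed exactly between modes n and n+1.
assert lam[n] ** (1 - s) < 1 / (3 * L_g) <= lam[n - 1] ** (1 - s)
```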

2.3 Two-scale coupling

We now introduce a coupling for the transition steps of two copies of the Markov chain starting at different initial conditions x and y. We use a synchronous coupling of the high modes in \({\mathcal {H}}^{s,h}\) and a different coupling for the low modes in \({\mathcal {H}}^{s,\ell }\) that together enable us to derive a weak form of contractivity. Note that the covariance operator \(\widetilde{{\mathcal {C}}}\) has a bounded inverse on the finite dimensional subspace \({\mathcal {H}}^{s,\ell }\). Therefore, for \({\mathsf {h}}\in {\mathcal {H}}^{s,\ell }\), the Gaussian measure \({\mathcal {N}}({\mathsf {h}}, \widetilde{{\mathcal {C}}})\) is absolutely continuous w.r.t. \({\mathcal {N}}(0, \widetilde{{\mathcal {C}}})\) with relative density

$$\begin{aligned} \rho _{{\mathsf {h}}}(x)\ =\ \exp \left( \langle \widetilde{{\mathcal {C}}}^{-1} {\mathsf {h}}, x \rangle - \langle \widetilde{{\mathcal {C}}}^{-1} {\mathsf {h}}, {\mathsf {h}} \rangle /2\right) . \end{aligned}$$
(2.14)

Let \(\gamma >0\) be a positive constant. The precise value of the parameter \(\gamma \) will be chosen in an appropriate way below. The coupling transition step is given by \((x,y)\mapsto (X'(x,y),Y'(x,y))\) where

$$\begin{aligned} X'(x,y) \ = \ q_T(x,\xi ), \quad \text {and} \quad Y'(x,y) \ = \ q_T(y,\eta ) \end{aligned}$$
(2.15)

with \(\xi \sim {\mathcal {N}}(0, \widetilde{{\mathcal {C}}})\) and \(\eta \) defined in high/low components as \(\eta ^h \ := \ \xi ^h\) and

$$\begin{aligned} \eta ^{\ell } := {\left\{ \begin{array}{ll} \xi ^{\ell } \ + \ \gamma z^{\ell } &{} \text {if } \ {\mathcal {U}} \ \le \ \rho _{-\gamma z^\ell } (\xi ^{\ell }) , \\ {\mathcal {R}} \xi ^{\ell } &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(2.16)

Here \({\mathcal {U}}\sim \text {Unif}(0,1)\) is independent of \(\xi \), \(z:=x-y\), and the reflection operator \({\mathcal {R}}\) is defined by

$$\begin{aligned} {\mathcal {R}} \ :=\ \widetilde{{\mathcal {C}}}^{1/2} (I - 2 e^{\ell } \langle e^{\ell }, \cdot \rangle ) \widetilde{{\mathcal {C}}}^{-1/2},\quad \text {where}\quad e^{\ell }\ :=\ \widetilde{{\mathcal {C}}}^{-1/2} z^{\ell }/\left|\widetilde{{\mathcal {C}}}^{-1/2} z^{\ell } \right|. \end{aligned}$$
(2.17)

Due to Assumption 2.6 (B2), the component in \({\mathcal {H}}^{s,h}\) of the resulting coupled dynamics is contracting in a finite time interval as a result of the linear part of the drift in (2.9). Moreover, the coupling of the components of the initial velocities in \({\mathcal {H}}^{s,\ell }\) is similar to the coupling in [9] which is inspired by a related coupling for second order Langevin diffusions [25]. It is defined in such a way that \(\xi ^{\ell }-\eta ^{\ell }=-\gamma z^{\ell }\) occurs with the maximal possible probability. As illustrated in Fig. 1, and proven later in Lemma 4.3, the reason for this choice is that the projection of the difference process on \({\mathcal {H}}^{s,\ell }\), i.e., \(q_t^{\ell }(x,\xi )-q_t^{\ell }(y,\eta )\), is contracting in a finite time interval if the difference \(\xi ^{\ell }-\eta ^{\ell }\) of the initial velocities is negatively proportional to the difference of the initial positions \(x^{\ell }-y^{\ell }\). Note that if \(b(x)= 0\) or \(b(x)=-x\) then the optimal choices of \(\gamma \) would be \(\gamma =T^{-1}\) and \(\gamma =\cot (T)\), respectively, because for these choices, \(X'(x,y)=Y'(x,y)\) if \( {\mathcal {U}} \le \rho _{-\gamma z^\ell }(\xi ^{\ell })\). In the case where \(\xi ^{\ell }-\eta ^{\ell }\ne -\gamma z^{\ell }\), a reflection coupling is applied. The corresponding reflection \({\mathcal {R}}\) is an orthogonal transformation w.r.t. the inner product \(\langle x,y\rangle _{\widetilde{{\mathcal {C}}}}=\langle \widetilde{{\mathcal {C}}}^{-1/2}x,\widetilde{{\mathcal {C}}}^{-1/2}y\rangle \) induced by the covariance operator \(\widetilde{{\mathcal {C}}}\) on \({\mathcal {H}}^\ell \).

Fig. 1

Two-Scale Coupling. A diagram illustrating the two-scale coupling in the case \(\gamma =T^{-1}\). (a) In the low modes \((1 \le i \le n)\), the initial velocities are coupled such that the final positions \(q_T^{\ell }(x,\xi ) = q_T^{\ell }(y, \eta )\) are equal when \(b \equiv 0\). (b) In the high modes \((i>n)\), the initial velocities are synchronously coupled

In order to verify that \((X'(x,y),Y'(x,y))\) is indeed a coupling of the transition probabilities \(\pi (x,\cdot )\) and \(\pi (y,\cdot ) \), we remark that the distribution of \(\eta \) is \({\mathcal {N}}(0,\widetilde{{\mathcal {C}}})\) since, by definition of \(\eta ^{\ell }\) in (2.16) and a change of variables,

$$\begin{aligned}&P[ \eta ^{\ell } \in B ]\\&\quad = E \left[ I_B( \xi ^{\ell }+\gamma z^{\ell })\, \rho _{-\gamma z^\ell } (\xi ^{\ell }) \wedge 1 \right] \, +\, E \left[ I_B( \mathcal {{R}} \xi ^{\ell } ) \left( 1 - \rho _{-\gamma z^\ell } (\xi ^{\ell } ) \right) ^+ \right] \\&\quad = E \left[ \rho _{\gamma z^\ell } (\xi ^{\ell }) I_B( \xi ^{\ell }) \rho _{-\gamma z^\ell } (\xi ^{\ell }-\gamma z^{\ell }) \wedge 1 \right] + E \left[ I_B( \xi ^{\ell } ) \left( 1 - \rho _{-\gamma z^\ell } (\mathcal {{R}} \xi ^{\ell } ) \right) ^+ \right] \\&\quad = E \left[ I_B( \xi ^{\ell })\, 1\wedge \rho _{\gamma z^\ell } (\xi ^{\ell }) \right] \, +\, E \left[ I_B( \xi ^{\ell } ) \left( 1 - \rho _{\gamma z^\ell } (\xi ^{\ell }) \right) ^+ \right] \ = \ P[\xi ^{\ell } \in B] \end{aligned}$$

for any measurable set B. Here \(a \wedge b\) denotes the minimum of real numbers a and b, \(I_B(\cdot )\) denotes the indicator function for the set B, and we have used that \({\mathcal {N}}(0,\widetilde{{\mathcal {C}}})\) is invariant under the reflection \( {\mathcal {R}}\), \({\mathcal {R}} z^{\ell }= - z^{\ell }\), and by (2.14), \( \rho _{-{\mathsf {h}}}(x-{\mathsf {h}}) \rho _{{\mathsf {h}}}(x)=1\). A similar calculation shows that

$$\begin{aligned} P[\eta ^{\ell }\ne \xi ^{\ell } +\gamma z^{\ell }]\ = \ E \left[ \left( 1 - \rho _{-\gamma z^\ell } (\xi ^{\ell } ) \right) ^+ \right] \ =\ d_{\mathrm {TV}}({\mathcal {N}}(0, \widetilde{{\mathcal {C}}}) , {\mathcal {N}}(\gamma z^{\ell }, \widetilde{{\mathcal {C}}}) ) \end{aligned}$$
(2.18)

where \(d_{\mathrm {TV}}\) is the total variation distance. Hence, by the coupling characterization of the total variation distance, \(\eta ^{\ell } = \xi ^{\ell } + \gamma z^{\ell }\) does indeed hold with the maximal possible probability. Note that if z is not in the reproducing kernel Hilbert space of the covariance operator \( \widetilde{{\mathcal {C}}}\) then the probability of the event \(\eta ^{\ell } \ne \xi ^{\ell } + \gamma z^{\ell }\) in (2.18) tends to one as the number of low modes increases. This explains why it is necessary to split the Hilbert space and apply a two-scale coupling.
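In finite dimensions, with \(\widetilde{{\mathcal {C}}}\) diagonal in spectral coordinates, the velocity coupling (2.16)–(2.17) is short to implement. The sketch below (eigenvalues, \(z^{\ell }\), and \(\gamma \) are illustrative assumptions; the synchronously coupled high modes are omitted) also estimates the miss probability and compares it with the Gaussian total variation distance in (2.18).

```python
import numpy as np
from math import erf, sqrt

# A sketch of the low-mode velocity coupling (2.16)-(2.17), with C~ diagonal
# in spectral coordinates.  All concrete numbers are illustrative.

rng = np.random.default_rng(4)
lt = np.array([1.0, 0.25, 0.1])      # low-mode eigenvalues of C~
z = np.array([0.3, -0.2, 0.1])       # z^l = x^l - y^l
gamma = 0.5

def log_rho(h, x):
    """log of the density rho_h(x) from (2.14), for diagonal C~."""
    return np.dot(h / lt, x) - 0.5 * np.dot(h / lt, h)

def couple(xi, u):
    """Return eta^l given xi^l ~ N(0, C~) and u ~ Unif(0,1)."""
    if np.log(u) <= log_rho(-gamma * z, xi):
        return xi + gamma * z                      # maximal-coupling branch
    e = z / np.sqrt(lt)
    e = e / np.linalg.norm(e)                      # e^l in (2.17)
    w = xi / np.sqrt(lt)
    return np.sqrt(lt) * (w - 2 * e * np.dot(e, w))  # reflection R xi^l

m = 20000
etas, misses = [], 0
for _ in range(m):
    xi = np.sqrt(lt) * rng.normal(size=3)
    eta = couple(xi, rng.uniform())
    misses += 0 if np.allclose(eta, xi + gamma * z) else 1
    etas.append(eta)
etas = np.array(etas)

# For Gaussians with equal covariance, the total variation distance in (2.18)
# is 2*Phi(|C~^{-1/2} gamma z|/2) - 1 = erf(|C~^{-1/2} gamma z|/(2*sqrt(2))).
tv = erf(np.linalg.norm(gamma * z / np.sqrt(lt)) / (2 * sqrt(2)))
```

The empirical miss rate `misses / m` should approximate `tv`, and the marginal distribution of \(\eta ^{\ell }\) should again be \({\mathcal {N}}(0,\widetilde{{\mathcal {C}}})\), as verified in the calculation above.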

2.4 Contractivity

We now state our main contraction bound for the coupling introduced above. We first define a norm \({\left| \left| \left| \cdot \right| \right| \right| _{\alpha }}\) on \({\mathcal {H}}^s\) where the high modes are weighted by \(\alpha >0\):

$$\begin{aligned} {\left| \left| \left| x \right| \right| \right| _{\alpha }}\ =\ \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} x^{\ell } \right| + \alpha \left|x^h \right|_s \ =\ \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} x^{\ell } \right|_s + \alpha \left|x^h \right|_s. \end{aligned}$$
(2.19)

Let \(\sigma _{min} = \min _{1 \le i \le n} \{ {\widetilde{\lambda }}_i^{-1/2} \lambda _i^{s/2}\}\) and \(\sigma _{max} = \max _{1 \le i \le n} \{ {\widetilde{\lambda }}_i^{-1/2} \lambda _i^{s/2} \}\). Note that

$$\begin{aligned} \sigma _{min} \left|x^\ell \right|_s\ \le \ \left|\widetilde{{\mathcal {C}}}^{-1/2} {\mathcal {C}}^{s/2}x^\ell \right|_s \ \le \ \sigma _{max} \left|x^\ell \right|_s \quad \text {for all }x \in {\mathcal {H}}^{s}. \end{aligned}$$
(2.20)

Thus \({\left| \left| \left| \cdot \right| \right| \right| _{\alpha }}\) and \(\left| \cdot \right|_s\) are equivalent norms with

$$\begin{aligned} \min (\sigma _{min}, \alpha ) \left|x \right|_s \le {\left| \left| \left| x \right| \right| \right| _{\alpha }} \le \sqrt{2} \max (\sigma _{max}, \alpha ) \left|x \right|_s \quad \text {for all }x \in {\mathcal {H}}^{s}. \end{aligned}$$
(2.21)

Remark 2.8

If the dimension is infinite then the operator \(\widetilde{{\mathcal {C}}}^{-1} {\mathcal {C}}^{s}\) is unbounded on \({\mathcal {H}}^s\), because its inverse is trace class. Nonetheless, \({\left| \left| \left| x \right| \right| \right| _{\alpha }}\) is a well-defined norm for any \(x \in {\mathcal {H}}^s\) because the operator \(\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}}\) appearing in \({\left| \left| \left| x \right| \right| \right| _{\alpha }}\) only acts on the projection \(x^{\ell }\) of x onto the finite dimensional space \({\mathcal {H}}^{s,\ell }\).
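The weighted norm (2.19) and the equivalence bounds (2.21) can be checked numerically in a finite truncation; the spectrum and parameters below are illustrative assumptions.

```python
import numpy as np

# The norm |||x|||_alpha from (2.19) in spectral coordinates, together with a
# numerical check of the equivalence bounds (2.21).  Illustrative setting:
# lambda_j = j^{-2}, C~ = C, s = 1/4, n = 5 low modes, alpha = 2.

rng = np.random.default_rng(5)
s, n_low, alpha = 0.25, 5, 2.0
lam = 1.0 / np.arange(1, 101) ** 2
lt = lam.copy()                       # take C~ = C for the illustration

def norm_s(x):
    return np.sqrt(np.sum(lam ** (-s) * x ** 2))

def triple_norm(x):
    low = np.sqrt(np.sum(x[:n_low] ** 2 / lt[:n_low]))       # |C~^{-1/2} x^l|
    high = np.sqrt(np.sum(lam[n_low:] ** (-s) * x[n_low:] ** 2))
    return low + alpha * high

sig = lt[:n_low] ** (-0.5) * lam[:n_low] ** (s / 2)          # sigma_i
s_min, s_max = sig.min(), sig.max()

x = rng.normal(size=100) * lam        # a test vector in H^s
assert min(s_min, alpha) * norm_s(x) <= triple_norm(x)
assert triple_norm(x) <= np.sqrt(2) * max(s_max, alpha) * norm_s(x)
```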

As we will see below, even when U is non-convex, we can still obtain contractivity with respect to a semimetric \(\rho : {\mathcal {H}}^s \times {\mathcal {H}}^s \rightarrow [0, \infty )\) of the form

$$\begin{aligned} \rho (x,y) \ =\ \sqrt{ f( {\left| \left| \left| x-y \right| \right| \right| _{\alpha }}) (1+ \epsilon \left|x \right|_s^2 + \epsilon \left|y \right|_s^2 ) },\qquad x,y\in {\mathcal {H}}^s, \end{aligned}$$
(2.22)

where \(f:[0,\infty )\rightarrow [0,\infty )\) is a concave function given by

$$\begin{aligned} f(r) = \int _0^r e^{-a t} \ I_{\{ t \le R\}} \ dt \ =\ \frac{1}{a}\left( 1-e^{-a\, r\wedge R}\right) , \end{aligned}$$
(2.23)

and where \(R>0\), \(a >0\), and \(\epsilon > 0\) are parameters to be specified below. The semimetric \(\rho \) is similar to the one introduced in [9] in order to prove contractivity of the HMC transition step in the finite dimensional case. In general, \(\rho \) is not a metric, since the triangle inequality might be violated. Note that f is non-decreasing, and constant when \(r \ge R\).
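The function f and the semimetric \(\rho \) are elementary to implement; the parameter values below are illustrative placeholders for the specific choices made later.

```python
import numpy as np

# The concave function f from (2.23) and the semimetric rho from (2.22).
# The parameters a, R, eps are hypothetical stand-ins for the values
# prescribed in Theorem 2.10.

a, R, eps = 2.0, 1.0, 0.01

def f(r):
    """f(r) = (1 - exp(-a * min(r, R))) / a, cf. (2.23)."""
    return (1.0 - np.exp(-a * min(r, R))) / a

def rho(norm_diff, x_norm_sq, y_norm_sq):
    """rho(x,y) from (2.22), given |||x-y|||_alpha and |x|_s^2, |y|_s^2."""
    return np.sqrt(f(norm_diff) * (1 + eps * x_norm_sq + eps * y_norm_sq))
```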

Remark 2.9

The semimetric (2.22) incorporates, in a multiplicative or weighted way, the quadratic Foster–Lyapunov function for pHMC from Lemma 2.7 with weight \(\epsilon \). The idea to use semimetrics of this general form to study contraction properties of Markov processes goes back to [12, 35]; see also [26].

Lemma 2.7 implies that the coupling transition \((x,y) \mapsto (X'(x,y), Y'(x,y))\) also has a quadratic Foster–Lyapunov function: If \( LT^2 \le \frac{1}{48} \frac{K}{L}\) then

$$\begin{aligned} {\mathbb {E}}\left[ \left|X'(x,y) \right|_s^2 + \left|Y'(x,y) \right|_s^2 \right] \le \left( 1- \frac{K T^2}{2} \right) ( \left|x \right|_s^2 + \left|y \right|_s^2 ) + 10 ( A + {\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}) ) T^2. \end{aligned}$$

We fix a finite, positive constant R satisfying

$$\begin{aligned} R \ \ge \ 8 \sqrt{40} ( A + {\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}))^{1/2}\sigma _{max} LK^{-1/2}. \end{aligned}$$
(2.24)

In our main result below, we choose \(\alpha :=4\sigma _{max}L\). In this case, the choice of R in (2.24) guarantees that a strict drift condition

$$\begin{aligned} {\mathbb {E}}\left[ \left|X'(x,y) \right|_s^2 + \left|Y'(x,y) \right|_s^2 \right] \ \le \ \left( 1- {K T^2}/{4} \right) ( \left|x \right|_s^2 + \left|y \right|_s^2 ) \end{aligned}$$
(2.25)

holds for all \((x,y)\) satisfying \({\left| \left| \left| x - y \right| \right| \right| _{\alpha }}\ge R \), because by (2.21) and since \(L\ge 1\),

$$\begin{aligned} {\left| \left| \left| x - y \right| \right| \right| _{\alpha }}\ \le \ 4\sqrt{2}\, \sigma _{max}L\left|x-y \right|_s\ \le \ 8 \sigma _{max}L\sqrt{ \left|x \right|_s^2 + \left|y \right|_s^2} . \end{aligned}$$
(2.26)

The asymptotic strict drift condition in (2.25) allows us to split the proof of contractivity into two parts: (i) \({\left| \left| \left| x - y \right| \right| \right| _{\alpha }}\ge R \) where any coupling is contracting in \(\rho \) due to (2.25), and (ii) \({\left| \left| \left| x - y \right| \right| \right| _{\alpha }}< R \), where \(\rho \) is contracting due to the specially designed two-scale coupling.

Theorem 2.10

Suppose that Assumption  2.6 holds. Let \(T>0\) satisfy

$$\begin{aligned} \dfrac{\sigma _{max}}{\sigma _{min}} LT^2\ \le \ \min \left( \dfrac{1}{48} \dfrac{K}{L}, \dfrac{1}{256 LR^2} \dfrac{\sigma _{min}}{\sigma _{max}} \right) \end{aligned}$$
(2.27)

Let \(\alpha \), \(\gamma \), a, and \(\epsilon \) be given by

$$\begin{aligned} \alpha:= & {} 4 \sigma _{max} L, \end{aligned}$$
(2.28)
$$\begin{aligned} \gamma:= & {} \min \left( T^{-1},R^{-1}/4\right) , \end{aligned}$$
(2.29)
$$\begin{aligned} a:= & {} T^{-1}, \end{aligned}$$
(2.30)
$$\begin{aligned} \epsilon:= & {} (1/160) (A+{\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}))^{-1} e^{- R / T }. \end{aligned}$$
(2.31)

Then for any \(x, y \in {\mathcal {H}}^s\), we have

$$\begin{aligned} E[\rho (X'(x,y),Y'(x,y))]\le & {} e^{-c} \rho (x,y),\qquad \text{ where } \end{aligned}$$
(2.32)
$$\begin{aligned} c= & {} \min \left( \frac{1}{16} K T^2,\, \frac{1}{128} T\max (R,T)\, e^{-\max (R,T)/T} \right) . \end{aligned}$$
(2.33)

The proof of this theorem is given in Sect. 6.

Remark 2.11

The rate in (2.33) is similar to the rate in the finite-dimensional case found in Theorem 2.3 of Ref. [9]. The main difference is that the condition on \(LT^2\) in (2.27) now reflects the effect of preconditioning.

2.5 Quantitative bounds for distance to the invariant measure

Theorem 2.10 establishes global contractivity of the transition kernel \(\pi (x,dy)\) w.r.t. the Kantorovich distance based on the underlying semimetric \(\rho \), which for probability measures \(\nu ,\eta \) on \({\mathcal {H}}^s\) is defined as

$$\begin{aligned} {\mathcal {W}}_\rho (\nu ,\eta )\ =\ \inf _{\gamma \in C(\nu ,\eta )}\int \rho (x,y)\,\gamma (dx\, dy) = \inf _{X' \sim \nu ,\, Y' \sim \eta }E \left[ \rho (X',Y') \right] \end{aligned}$$

where the infimum is over all couplings \(\gamma \) of \(\nu \) and \(\eta \). Moreover, it implies quantitative bounds for the standard \(L^1\) Wasserstein distance

$$\begin{aligned} {\mathcal {W}}^{s,1} (\nu ,\mu )\ =\ \inf _{\gamma \in C(\nu ,\mu )}\int \left|x-y \right|_s\,\gamma (dx\, dy) = \inf _{X' \sim \nu ,\, Y' \sim \mu }E \left[ \left|X'-Y' \right|_s \right] \end{aligned}$$

with respect to the invariant measure \(\mu \) on \({\mathcal {H}}^s\). Let \(M_1(\nu ):= \int \left|x \right|_s\, \nu (dx)\).
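As an aside on the definition above: in one dimension, and for empirical measures with equally many atoms, the infimum over couplings in the \(L^1\) Wasserstein distance is attained by the monotone (sorted) matching, so it can be computed directly. A minimal sketch (our illustration, unrelated to the specific spaces \({\mathcal {H}}^s\) considered here):

```python
def wasserstein1_empirical(xs, ys):
    """L^1 Wasserstein distance between two empirical measures on the real
    line with equally many atoms: in one dimension the infimum over
    couplings is attained by matching sorted samples, so
    W1 = (1/n) * sum_i |x_(i) - y_(i)|."""
    assert len(xs) == len(ys) and xs
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

For example, translating an empirical measure by a constant shifts the distance by exactly that constant, as the monotone coupling matches each atom to its translate.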

Corollary 2.12

Suppose that Assumption  2.6 holds. Let \(T\in (0,R)\) satisfy (2.27). Then for any \(k\in {\mathbb {N}}\) and for any probability measures \(\nu ,\eta \) on \({\mathcal {H}}^s\),

$$\begin{aligned} {\mathcal {W}}_\rho (\nu \pi ^k,\eta \pi ^k)\le & {} e^{-c k}\, {\mathcal {W}}_\rho (\nu ,\eta ),\qquad \qquad \text {and} \end{aligned}$$
(2.34)
$$\begin{aligned} {\mathcal {W}}^{s,1} (\nu \pi ^k ,\mu )\le & {} C\,(1+\sqrt{\epsilon }M_1(\nu )+ (1/4) K^{-1/2} e^{-R/(2 T)})\, e^{-c k} \end{aligned}$$
(2.35)

where the rate c and the constant \(\epsilon \) are given explicitly by (2.33) and (2.31), and

$$\begin{aligned} C\ =\ \max \left( 2 {T}\sigma _{min}^{-1},\,23\, ({A+{\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s})})^{1/2} e^{ R /(2 T) }\right) . \end{aligned}$$
(2.36)

In particular, for a given constant \(\delta \in (0,\infty )\), the \(L^1\) Wasserstein distance \(\Delta (k)={\mathcal {W}}^{s,1}(\nu \pi ^k ,\mu )\) w.r.t. \(\mu \) after k steps of the chain with initial distribution \(\nu \) satisfies \(\Delta (k)\le \delta \) provided

$$\begin{aligned} k\ \ge \ \frac{1}{c}\, \log \frac{C \,(1+ \sqrt{\epsilon } M_1(\nu )+(1/4) K^{-1/2} e^{-R/(2 T)})}{\delta } . \end{aligned}$$
(2.37)

The corollary is a rather direct consequence of Theorem 2.10. A short proof is included in Sect. 6.
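Since all constants in (2.37) are explicit, the required number of steps can be evaluated directly. The following minimal sketch does this; the numerical values passed in are placeholders for illustration, not constants derived from any particular model:

```python
import math

def mixing_time(delta, C, c, eps, K, R, T, M1_nu):
    """Smallest integer k satisfying the sufficient condition (2.37):
    k >= (1/c) * log( C * (1 + sqrt(eps)*M1(nu)
                          + (1/4)*K^(-1/2)*exp(-R/(2T))) / delta )."""
    numerator = C * (1.0 + math.sqrt(eps) * M1_nu
                     + 0.25 * K ** (-0.5) * math.exp(-R / (2.0 * T)))
    return max(0, math.ceil(math.log(numerator / delta) / c))
```

As expected, the guaranteed number of steps grows only logarithmically as the accuracy threshold \(\delta \) decreases.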

Remark 2.13

(Quantitative bounds for ergodic averages) MCMC methods are often applied to approximate expectation values w.r.t. the target distribution by ergodic averages of the Markov chain. Our results (e.g. (2.34)) directly imply completely explicit bounds for bias and variances, as well as explicit concentration inequalities for these ergodic averages in the case of pHMC. Indeed, the general results by Joulin and Ollivier [40] show that such bounds follow directly from an \(L^1\) Wasserstein contraction w.r.t. an arbitrary metric \(\rho \), which is precisely the statement shown above.

3 Applications

3.1 Transition path sampling

Here we discuss the use of pHMC in transition path sampling (TPS). As an application of Theorem 2.10, we obtain dimension-free contraction rates for exact preconditioned HMC in this context. Fix a time horizon \(\tau >0\) (not to be confused with the duration parameter in preconditioned HMC which we denote by T). The aim of TPS [4, 37, 38, 65] is to sample from a diffusion bridge or conditioned diffusion, i.e., from the conditional law \(\nu _{a,b}\) of the solution \({\mathsf {X}}: [0,\tau ] \rightarrow {\mathbb {R}}^d\) to a d-dimensional stochastic differential equation of the form

$$\begin{aligned} {\mathsf {d}} {\mathsf {X}}( {\mathsf {t}} ) = - \nabla \Psi ( {\mathsf {X}}( {\mathsf {t}} ) ) \, {{\mathsf {d}}}{{\mathsf {t}}} + {\mathsf {d}} {\mathsf {W}}( {\mathsf {t}} ) \end{aligned}$$
(3.1)

given both initial and final conditions

$$\begin{aligned} {\mathsf {X}}(0) = a \quad \text {and} \quad {\mathsf {X}}(\tau ) = b . \end{aligned}$$

Here \(\Psi : {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is a given potential energy function and \( {\mathsf {W}}\) is a d-dimensional standard Brownian motion. TPS is particularly relevant to molecular dynamics where the states a and b represent different configurations of a molecular system [6, 59, 63].

We first recenter: Let \(\mu =\nu _{a,b}\circ \theta _{{\mathsf {M}}}^{-1 }\) denote the law of the recentered bridge, where \(\theta _{{\mathsf {M}}}(x)=x-{\mathsf {M}}\) is the translation on path space by the mean \({\mathsf {M}}({\mathsf {t}}) = a + ({\mathsf {t}}/\tau ) (b-a)\) of the Brownian bridge from a to b. Then by Girsanov’s theorem, the measure \(\mu \) is absolutely continuous with respect to the law \(\mu _0\) of the Brownian bridge from 0 to 0 [4, 39]. Moreover, the measure \(\mu _0\) is the centered Gaussian measure on the Hilbert space \({\mathcal {H}}= L^2([0,\tau ], {\mathbb {R}}^d)\) with covariance operator \({\mathcal {C}}=-\Delta _D^{-1}\), where \(\Delta _D\) is the Dirichlet Laplacian, and the relative density of \(\mu \) with respect to \(\mu _0\) is proportional to \(\exp (-U(x))\), where the function U(x) is defined in terms of the so-called path potential energy function \(G : {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) as follows:

$$\begin{aligned} U(x) = \int _0^{\tau } G ( x({\mathsf {t}})+{\mathsf {M}}({\mathsf {t}}) ) {{\mathsf {d}}}{{\mathsf {t}}} \quad \text {where} \quad G(\cdot ) = \frac{1}{2} | \nabla \Psi (\cdot )|^2 - \frac{1}{2} \Delta \Psi (\cdot ). \end{aligned}$$
(3.2)

In the main convergence result given below, we make the following regularity assumption on G.

Assumption 3.1

The function \(G: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is continuously differentiable. Moreover, \(\nabla G(0)=0\), and \(\nabla G\) is uniformly bounded and globally Lipschitz continuous, i.e., there exist finite constants \(M_G, L_G\) such that for all \(x,y\in {\mathbb {R}}^d\),

$$\begin{aligned} | \nabla G(x)| \le M_G, \quad \text {and}\quad |\nabla G(x)-\nabla G(y)|\le L_G|x-y|. \end{aligned}$$

This regularity assumption frequently holds in molecular dynamics applications, since the configuration space of molecular systems is usually taken to be a fixed cubic box with periodic boundary conditions [1, 8, 13, 23, 29, 46, 67]. In this case, we can lift the TPS problem to the covering space \({\mathbb {R}}^d\) by extending the path potential to a periodic function on this space. Thus after recentering the coordinate system, Assumption 3.1 is satisfied whenever G is \(C^2\).
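To illustrate Assumption 3.1 concretely in the periodic setting, consider the hypothetical one-dimensional example \(\Psi (q)=\cos q\) (our choice, not from the text): then \(G = \frac{1}{2}|\nabla \Psi |^2 - \frac{1}{2}\Delta \Psi \) gives \(G(q) = \frac{1}{2}\sin ^2 q + \frac{1}{2}\cos q\), so \(\nabla G(q) = \sin q \cos q - \frac{1}{2}\sin q\), which vanishes at 0, is uniformly bounded, and is globally Lipschitz. A short numerical check of the constants \(M_G, L_G\):

```python
import math

def grad_G(q):
    """Gradient of the path potential for the illustrative 1D periodic
    potential Psi(q) = cos(q):
      G(q)  = (1/2) sin(q)^2 + (1/2) cos(q)
      G'(q) = sin(q) cos(q) - (1/2) sin(q)."""
    return math.sin(q) * math.cos(q) - 0.5 * math.sin(q)

# Crude numerical estimates of the constants in Assumption 3.1 over one period:
qs = [2 * math.pi * i / 10000 for i in range(10001)]
M_G = max(abs(grad_G(q)) for q in qs)              # uniform bound on |G'|
L_G = max(abs(grad_G(qs[i + 1]) - grad_G(qs[i])) / (qs[i + 1] - qs[i])
          for i in range(10000))                   # Lipschitz estimate
```

Since \(G'(q) = \frac{1}{2}\sin 2q - \frac{1}{2}\sin q\) and \(G''(q) = \cos 2q - \frac{1}{2}\cos q\), the estimates stay below the analytic bounds \(M_G \le 1\) and \(L_G \le 3/2\).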

To implement TPS on a computer, we use the finite difference method to approximate the infinite-dimensional distribution \(\mu (dx) \propto \exp (- U(x) ) {\mathcal {N}}(0, {\mathcal {C}})(dx)\) by a finite-dimensional probability measure \(\mu _{m}\). Other approximations, e.g., Galerkin or finite-element, are also possible and should yield similar results. We focus on the finite difference method because it is widely used in practice. Discretize the interval \([0,\tau ]\) into \(m +2\) evenly spaced grid points

$$\begin{aligned} {\mathsf {t}}_j = {\tau }j/{(m+1)} , \quad j=0, \dots , m+1 . \end{aligned}$$
(3.3)

The space of paths on \({\mathbb {R}}^d\) is then approximated by the finite-dimensional space \({\mathbb {R}}^{m d}\). Specifically, we write \(\varvec{x} \in {\mathbb {R}}^{m d}\) as

$$\begin{aligned} \varvec{x} = (\varvec{x}_{1:d}, \dots , \varvec{x}_{m+1:m+d}) \end{aligned}$$

where the j-th component \(\varvec{x}_{j+1:j+d}:=(\varvec{x}_{j+1}, \dots , \varvec{x}_{j+d})\) is a d-dimensional vector that can be viewed as an approximation of \(x({\mathsf {t}}_j)\) for \(j=1,\dots ,m\). The Dirichlet Laplacian \(\Delta _D\) is approximated by the \(m d \times m d\) Dirichlet Laplacian matrix \(\varvec{\Delta }_{D,m}\) with (ij)-th entry

$$\begin{aligned} (\varvec{\Delta }_{D,m})_{i,j} = {\left\{ \begin{array}{ll} - 2 \left( { \tau }/{(m+1)} \right) ^{-2} &{} \text {if }|i-j|=0, \\ \left( { \tau }/{(m+1)} \right) ^{-2} &{} \text {if }|i-j|=d, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

The covariance operator \({\mathcal {C}}\) is approximated by the \(m d \times m d\) matrix \(\varvec{{\mathcal {C}}} = - \varvec{\Delta }_{D,m}^{-1}\), and the Hilbert space \({\mathcal {H}}\) is represented by \({\mathbb {R}}^{m d}\) with inner product given by the weighted dot product \(\langle \varvec{x}, \varvec{y} \rangle = \frac{\tau }{m+1} \varvec{x} \bullet \varvec{y}\). The functional (3.2) is discretized as

$$\begin{aligned} U_{m}( \varvec{x}) = \frac{\tau }{m+1} G_{m}(\varvec{x}), \quad \text {where} \quad G_{m}(\varvec{x}) = \sum _{j=1}^{m} G(\varvec{x}_{j+1:j+d} + {\mathsf {M}}({\mathsf {t}}_j) ). \end{aligned}$$

Note that if the vector \(\varvec{x}\) contains the grid values of a smooth function x, then \(U_{m}( \varvec{x}) \rightarrow U(x)\) as \(m \rightarrow \infty \). In summary, the infinite-dimensional path distribution \(\mu (dx)\) is approximated by the finite-dimensional probability measure \(\mu _m(d \varvec{x})\) with non-normalized density \(\exp \left( -U_{m}( \varvec{x})- \frac{1}{2} \langle \varvec{x}, \varvec{{\mathcal {C}}}^{-1} \varvec{x} \rangle \right) \).
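As a sanity check on this discretization, the following sketch (scalar case \(d = 1\); illustrative grid size) builds the matrix \(\varvec{\Delta }_{D,m}\) and verifies that on grid values of a smooth function vanishing at both endpoints it approximates the second derivative:

```python
import math

def dirichlet_laplacian(m, tau):
    """(m x m) finite-difference Dirichlet Laplacian (the case d = 1 of
    Delta_{D,m}): -2/h^2 on the diagonal and 1/h^2 on the first
    off-diagonals, with grid spacing h = tau/(m+1) and zero boundary
    values built in."""
    h2 = (tau / (m + 1)) ** 2
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = -2.0 / h2
        if i + 1 < m:
            A[i][i + 1] = A[i + 1][i] = 1.0 / h2
    return A

# Consistency check: on grid values of x(t) = sin(pi*t/tau), which vanishes
# at both endpoints, the matrix approximates x'' = -(pi/tau)^2 x.
m, tau = 200, 1.0
A = dirichlet_laplacian(m, tau)
x = [math.sin(math.pi * j / (m + 1)) for j in range(1, m + 1)]
Ax = [sum(A[i][j] * x[j] for j in range(m)) for i in range(m)]
err = max(abs(Ax[i] + (math.pi / tau) ** 2 * x[i]) for i in range(m))
```

The error is of the expected size \(O(h^2)\) for the centered second-difference stencil.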

To approximately sample from \(\mu _m\), we use pHMC with transition step in (2.10). This corresponds to a Markov chain on \({\mathbb {R}}^{m d}\) with transition step

$$\begin{aligned} \varvec{x} \mapsto \varvec{X}'(\varvec{x}) ~:=~ \varvec{q}_T(\varvec{x}, \varvec{\xi }), \qquad \varvec{\xi } \sim {\mathcal {N}}\left( 0, \frac{m+1}{\tau } \varvec{{\mathcal {C}}}\right) \end{aligned}$$
(3.4)

where \(\varvec{q}_t\) solves

$$\begin{aligned} \frac{d}{dt} \varvec{q}_t = \varvec{v}_t, \quad \frac{d}{dt} \varvec{v}_t = \varvec{b}(\varvec{q}_t) , \quad (\varvec{q}_0( \varvec{x},\varvec{v} ), \varvec{v}_0( \varvec{x},\varvec{v} )) = ( \varvec{x},\varvec{v} ) \in {\mathbb {R}}^{2 m d}, \end{aligned}$$
(3.5)

with \(\varvec{b}(\varvec{x}) = - \varvec{x} - \varvec{{\mathcal {C}}} \nabla G_{m} (\varvec{x})\). Let \(\pi _m\) denote the transition kernel of (3.4).
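The Gaussian velocity in (3.4) need not be sampled by factorizing \(\varvec{{\mathcal {C}}}\). In the scalar case \(d=1\), a direct computation with the inverse of the second-difference matrix (our observation, used only for this illustration) shows that \(\frac{m+1}{\tau } \varvec{{\mathcal {C}}}\) has entries \({\mathsf {t}}_i(\tau - {\mathsf {t}}_j)/\tau \) for \(i \le j\), i.e., \(\varvec{\xi }\) is a standard Brownian bridge from 0 to 0 evaluated at the interior grid points, which can be sampled by conditioning a Brownian path to end at 0:

```python
import math
import random

def sample_velocity(m, tau, rng=random):
    """Draw xi ~ N(0, ((m+1)/tau) * C) with C = -Delta_{D,m}^{-1}, d = 1.
    This covariance coincides with that of a standard Brownian bridge from
    0 to 0 on [0, tau] at the interior grid points t_j = tau*j/(m+1)."""
    h = tau / (m + 1)
    w, path = 0.0, []
    for _ in range(m + 1):
        w += math.sqrt(h) * rng.gauss(0.0, 1.0)  # Brownian increments
        path.append(w)                           # path[j] = W(t_{j+1})
    w_end = path[-1]                             # W(tau)
    return [path[j] - ((j + 1) * h / tau) * w_end for j in range(m)]
```

The empirical variance at the midpoint matches the bridge variance \({\mathsf {t}}(\tau -{\mathsf {t}})/\tau \).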

Theorem 3.2

(Transition Path Sampling) Suppose that Assumption 3.1 holds. Let \(\kappa :=2 (\tau ^2 / \pi ^2) L_G\), \(m_{\ell }=\lfloor \sqrt{3 \kappa } \rfloor \), \(n=m_{\ell } d\), and \(m^{\star }= \lceil (m_{\ell }+1) \pi /2 \rceil \). Let R, c, C and \(\epsilon \) be defined as

$$\begin{aligned} R&:= \ 16 \sqrt{20} \pi \kappa ^{1/2} (1+\kappa ) ( (\tau / \pi )^3 M_G^2 + d)^{1/2}, \end{aligned}$$
(3.6)
$$\begin{aligned} \quad c&:= \ \min ( (1/32) T^2, (1/128) T^2 \max (R,T) e^{-\max (R,T)/T} ), \end{aligned}$$
(3.7)
$$\begin{aligned} C&:= \ \max (T \tau , 23 ( (\tau ^5 / \pi ^4) M_G^2+ d \tau ^2 / 3 )^{1/2} e^{R/(2 T)}) , \end{aligned}$$
(3.8)
$$\begin{aligned} \epsilon&:= \ ( \tau ^5 M_G^2)^{-1} e^{-R/T}. \end{aligned}$$
(3.9)

Suppose that the duration parameter \(T \in (0, R)\) satisfies

$$\begin{aligned} 2 \sqrt{3 \kappa } (1+\kappa ) T^2&\le \ \min \left( \frac{1}{96 (1+\kappa )}, \frac{1}{512 \sqrt{3 \kappa } (1+\kappa ) R^2} \right) \;. \end{aligned}$$
(3.10)

Then for any \(m > m^{\star }\), \(k\in {\mathbb {N}}\), and probability measure \(\nu _m\) on \({\mathbb {R}}^{m d}\),

$$\begin{aligned} {\mathcal {W}}^{0,1} (\nu _m \pi _m^k, \mu _m) \le C e^{-c k} \left( 1 + \sqrt{\epsilon } M_1(\nu _m) + (1/8) e^{-R/(2 T)} \right) . \end{aligned}$$
(3.11)

Remark 3.3

Note that the upper bound in (3.11) depends on dimension only through the initial distribution. The dimension independence in the other terms of this bound reflects that the finite-dimensional pHMC algorithm in (3.4) converges to an infinite-dimensional pHMC algorithm whose transition kernel satisfies an infinite-dimensional analog of this quantitative bound.

A proof of this result is given in Sect. 7.1.

3.2 Path integral molecular dynamics

Here we discuss the use of pHMC for path-integral molecular dynamics (PIMD), and as an application of Theorem 2.10, obtain dimension-free contraction rates for preconditioned HMC in this context. PIMD is used to compute exact Boltzmann properties and approximate dynamical properties of quantum mechanical systems [14]. The technique is based on Feynman’s path-integral formulation of quantum statistical mechanics [27], and the observation that the quantum Boltzmann statistical mechanics of a quantum system can be reproduced by the classical Boltzmann statistical mechanics of a ring-polymer system [14].

Consider N interacting quantum particles in 3D with potential energy operator given by

$$\begin{aligned} {\hat{V}}=V({\hat{q}}_1, \dots , {\hat{q}}_N) \end{aligned}$$
(3.12)

where \({\hat{q}}_i\) is the three-dimensional position operator of particle i and \(V: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) is a potential energy function where \(d=3 N\) [32]. The thermal equilibrium properties of this system are described by the quantum mechanical Boltzmann partition function,

$$\begin{aligned} Q={\text {Tr}}[e^{-\beta {\hat{V}}}] \end{aligned}$$
(3.13)

where \(\beta \) is an inverse temperature parameter. For some \({\mathsf {a}}>0\), suppose that the potential energy function can be written as

$$\begin{aligned} V(\cdot ) = \frac{1}{2} {\mathsf {a}} |\cdot |^2 + G(\cdot ) \end{aligned}$$

where \(G: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\). Then the partition function Q can be written as the expected value of a Gaussian random variable on loop space as follows

$$\begin{aligned} Q={\mathbb {E}}[ e^{-U(\xi )} ], \quad \text {where} \quad \xi \sim \mu _0 = {\mathcal {N}}(0,{\mathcal {C}}_{{\mathsf {a}}}), \end{aligned}$$
(3.14)

and the covariance operator \({\mathcal {C}}_{{\mathsf {a}}}\) of the Gaussian reference measure is defined in terms of the Laplacian with periodic boundary conditions \(\Delta _P\) on \(L^2([0,\beta ],{\mathbb {R}}^{d})\) as follows

$$\begin{aligned} {\mathcal {C}}_{{\mathsf {a}}} = (-\Delta _P + {\mathsf {a}} I)^{-1}, \end{aligned}$$

where I is the identity operator and the potential energy U(x) is given by

$$\begin{aligned} U(x) = \int _0^{\beta } G(x({\mathsf {t}})) {{\mathsf {d}}}{{\mathsf {t}}}. \end{aligned}$$
(3.15)

The probability measures \(\mu _0\) and \(\mu (dx) \propto \exp (- U(x) )\, {\mathcal {N}}(0, {\mathcal {C}}_{{\mathsf {a}}})(dx)\) are supported on the loop space consisting of all periodic continuous paths \(x:[0,\beta ]\rightarrow {\mathbb {R}}^d\). They are similar to the corresponding measures considered for Transition Path Sampling, but there is an additional, artificially introduced parameter \({\mathsf {a}}\) appearing in \({\mathcal {C}}_{{\mathsf {a}}}\). This parameter is essential because \(\Delta _P\) is not invertible since it has a zero (leading) eigenvalue corresponding to the ‘centroid mode’ [50].

To implement PIMD on a computer, we use finite differences to approximate the infinite-dimensional path distribution \(\mu \) by a finite-dimensional one \(\mu _{m}\), discretizing the interval \([0,\beta ]\) into \(m +1\) grid points

$$\begin{aligned} {\mathsf {t}}_j = {\beta }j/{m} , \quad j=0, \dots , m . \end{aligned}$$
(3.16)

The space of loops on \({\mathbb {R}}^d\) is approximated by the finite-dimensional space \({\mathbb {R}}^{m d}\). Specifically, we write \(\varvec{x} \in {\mathbb {R}}^{m d}\) as

$$\begin{aligned} \varvec{x} = (\varvec{x}_{1:d}, \dots , \varvec{x}_{m+1:m+d}) \end{aligned}$$

where \(\varvec{x}_{j+1:j+d}:=(\varvec{x}_{j+1}, \dots , \varvec{x}_{j+d})\) is a d-dimensional vector that can be viewed as an approximation of \(x({\mathsf {t}}_j)\) for \(j=1,\dots ,m\).

Remark 3.4

Comparing (3.3) and (3.16), note that the number of grid points in TPS, resp. PIMD, is \(m+2\), resp. \(m+1\). Nonetheless, in both cases path and loop space are approximated by \({\mathbb {R}}^{m d}\). The difference in the number of grid points is due to the boundary conditions: in TPS the Dirichlet boundary conditions eliminate two unknown d-dimensional vectors, whereas in PIMD the periodic boundary conditions eliminate only one unknown d-dimensional vector. Thus, the total number of unknowns in both cases is md.

The periodic Laplacian \(\Delta _P\) is approximated by the \(m d \times m d\) discrete periodic Laplacian matrix \(\varvec{\Delta _{P,m}}\) with (ij)-th entry

$$\begin{aligned} (\varvec{\Delta _{P,m}})_{i,j} = {\left\{ \begin{array}{ll} - 2 \left( \beta /{m} \right) ^{-2} &{} \text {if }|i-j|=0, \\ \left( \beta /{m} \right) ^{-2} &{} \text {if }(i-j) \bmod ( m d )=d~\text {or}~(j-i) \bmod (m d)=d, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Naturally, the covariance operator \({\mathcal {C}}_{{\mathsf {a}}}\) is approximated by the \(md \times md\) matrix \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}} = (- \varvec{\Delta _{P,m}} + {\mathsf {a}} \varvec{I}_{m d \times m d})^{-1}\) where \(\varvec{I}_{m d \times m d}\) is the \(m d \times m d\) identity matrix, and the infinite-dimensional Hilbert space \({\mathcal {H}}\) is represented by \({\mathbb {R}}^{m d}\) with inner product given by the weighted scalar product \(\langle \varvec{x}, \varvec{y} \rangle = \frac{\beta }{m} \varvec{x} \bullet \varvec{y} \). The functional in (3.2) is discretized as

$$\begin{aligned} U_{m}( \varvec{x}) = \frac{\beta }{m} G_{m}(\varvec{x}), \quad \text {where} \quad G_{m}(\varvec{x}) = \sum _{j=1}^{m} G(\varvec{x}_{j+1:j+d}). \end{aligned}$$

In summary, the infinite-dimensional path distribution \(\mu (dx)\) is approximated by the finite-dimensional distribution \(\mu _{m}(d \varvec{x}) \propto \exp (-U_{m}(\varvec{x})) {\mathcal {N}}(0,\frac{m}{\beta } \varvec{{\mathcal {C}}}_{{\mathsf {a}}})(d \varvec{x})\).
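A quick numerical check (our illustration, scalar case \(d=1\)) of the role of the shift \({\mathsf {a}}\) discussed above: the constant 'centroid' vector lies in the kernel of \(\varvec{\Delta _{P,m}}\), so \(-\varvec{\Delta _{P,m}}\) alone is singular, and the shift by \({\mathsf {a}} \varvec{I}\) is what makes the covariance matrix well defined.

```python
def periodic_laplacian(m, beta):
    """(m x m) finite-difference periodic Laplacian (the case d = 1 of
    Delta_{P,m}): -2/h^2 on the diagonal and 1/h^2 on the first
    off-diagonals, wrapped around cyclically, with h = beta/m."""
    h2 = (beta / m) ** 2
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = -2.0 / h2
        A[i][(i + 1) % m] += 1.0 / h2
        A[i][(i - 1) % m] += 1.0 / h2
    return A

# The constant ('centroid') mode spans the kernel of Delta_P,m ...
m, beta, a = 8, 2.0, 0.1
A = periodic_laplacian(m, beta)
Aones = [sum(row) for row in A]                        # Delta_P,m applied to constants
shifted = [-Aones[i] + a for i in range(m)]            # (-Delta_P,m + aI) * ones
```

Here `Aones` vanishes identically (the zero eigenvalue), while the shifted matrix maps the constant vector to \({\mathsf {a}}\) times itself.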

In this context, pHMC generates a Markov chain on \({\mathbb {R}}^{m d}\) with invariant measure \(\mu _{m}\) and with transition step given by

$$\begin{aligned} \varvec{x} \mapsto \varvec{X}'(\varvec{x}) ~:=~ \varvec{q}_T(\varvec{x}, \varvec{\xi }), \qquad \varvec{\xi } \sim {\mathcal {N}}(0, \frac{m}{\beta } \varvec{{\mathcal {C}}}_{{\mathsf {a}}}) \end{aligned}$$
(3.17)

where \(\varvec{q}_t\) solves (3.5) with \( \varvec{b}(\varvec{x}) = - \varvec{x} - \varvec{{\mathcal {C}}}_{{\mathsf {a}}} \nabla G_{m} (\varvec{x})\).

Theorem 3.5

(Path Integral Molecular Dynamics) Suppose that Assumption 3.1 holds. Let \(\kappa :=6 {\mathsf {a}}^{-1} L_G \), \(m_{\ell }=\lceil \sqrt{3 L_G/2} (\beta /\pi ) \rceil \), \(n=2 m_{\ell } d -d\), and \(m^{\star }=\lceil 2 \pi m_{\ell } \rceil \). Let R, c, C and \(\epsilon \) be defined as

$$\begin{aligned} R&:= \ 16 \sqrt{20} (1+\kappa )^{3/2} ( (1/2) \beta {\mathsf {a}}^{-1} M_G^2 + 2 d (\beta ^2 {\mathsf {a}} + 1) )^{1/2}, \end{aligned}$$
(3.18)
$$\begin{aligned} \quad c&:= \ \min ( (1/32) T^2, (1/128) T^2 \max (R,T) e^{-\max (R,T)/T} ), \end{aligned}$$
(3.19)
$$\begin{aligned} C&:= \ \max (2 T {\mathsf {a}}^{1/2}, 23 ((1/2) \beta {\mathsf {a}}^{-2} M_G^2 + 2 d ({\mathsf {a}}^{-1} + \beta ^2) )^{1/2} e^{R/(2 T)}), \end{aligned}$$
(3.20)
$$\begin{aligned} \epsilon&:= \ (1/80) {\mathsf {a}}^2 ( \beta M_G^2 )^{-1} e^{-R/T}. \end{aligned}$$
(3.21)

Suppose that the duration parameter \(T \in (0, R)\) satisfies

$$\begin{aligned} (1+\kappa )^{3/2} T^2&\le \ \min \left( \frac{1}{96 (1+ \kappa )}, \frac{1}{256 (1+\kappa )^{3/2} R^2} \right) . \end{aligned}$$
(3.22)

Then for any \(m > m^{\star }\), \(k\in {\mathbb {N}}\), and probability measure \(\nu _m\) on \({\mathbb {R}}^{m d}\), (3.11) holds for the transition kernel of (3.17).

A proof of this result is given in Sect. 7.2.

3.3 Numerical illustration of couplings

Before turning to the proofs of our main results, we test the two-scale coupling defined by (2.15) numerically on the following distributions.

  • A TPS distribution with the three-well path potential energy function illustrated in Fig. 2a. The initial conditions of the components of the coupling are taken to be paths that pass through the two channels that connect the bottom two wells located at \(x^{\pm } \approx (\pm 1.048, -0.042)\).

  • A PIMD distribution where the underlying potential energy is the negative logarithm of the normal mixture density illustrated in Fig. 2b. The mixture components are twenty Gaussian distributions on \({\mathbb {R}}^2\), each with covariance matrix given by the \(2 \times 2\) identity matrix and with mean vectors given by 20 independent samples from the uniform distribution over the square \([0, 10] \times [0,10]\). The energy barriers are not large. The potential energy in this example is adapted from [45, 47]. The initial paths are taken to be two unit circles, one centered at (1, 1) and the other centered at (9, 9). The parameter \({\mathsf {a}}\) is selected to be 0.1.

  • A PIMD distribution where the underlying potential energy is the negative logarithm of the Laplace mixture density illustrated in Fig. 2c. The mixture components are twenty (regularized) Laplace distributions on \({\mathbb {R}}^2\) using the same covariance matrix and mean vectors as in the preceding example. However, unlike the preceding example, here the underlying potential is only asymptotically convex. The initial paths are taken to be two unit circles, one centered at (1, 1) and the other centered at (9, 9). The parameter \({\mathsf {a}}\) is selected to be 0.1.

  • A PIMD distribution where the underlying potential energy is the banana-shaped potential energy illustrated in Fig. 2d. This function is highly non-convex and unimodal with a global minimum at the point (1, 1). This minimum lies in a long, narrow, banana shaped valley. The initial paths are taken to be small circles centered at \((\pm 4,16)\). The parameter \({\mathsf {a}}\) is selected to be 1.0.

For the TPS and PIMD distributions we use the finite-dimensional approximations described in Sects. 3.1 and 3.2, respectively. The resulting semi-discrete evolution equations are discretized in time using a strongly stable symmetric splitting integrator [3, 43, 44]. We describe this integrator in the specific context of TPS, since a very similar method is used for PIMD with \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) replacing \(\varvec{{\mathcal {C}}}\) in the dynamics, and the covariance matrix \((m/\beta ) \varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) replacing \(((m+1)/\tau ) \varvec{{\mathcal {C}}}\) in the velocity randomization step. First, split (3.5) with \(\varvec{b}(\varvec{x}) = -\varvec{x}-\varvec{{\mathcal {C}}} \nabla G_{m}(\varvec{x})\) into

$$\begin{aligned}&(\mathrm {A}):\qquad \dot{\varvec{q}}_t = \varvec{v}_t, \quad \dot{\varvec{v}}_t = -\varvec{q}_t \end{aligned}$$
(3.23)
$$\begin{aligned}&(\mathrm {B}):\qquad \dot{\varvec{q}}_t = 0 \;, \quad \dot{\varvec{v}}_t = -\varvec{{\mathcal {C}}} \nabla G_{m}(\varvec{q}_t) \end{aligned}$$
(3.24)

with corresponding flows explicitly given by

$$\begin{aligned}&\varphi _t^{(\mathrm {A})}(\varvec{q}_0,\varvec{v}_0) = (\cos (t) \varvec{q}_0 + \sin (t) \varvec{v}_0, -\sin (t) \varvec{q}_0 + \cos (t) \varvec{v}_0) \\&\varphi _t^{(\mathrm {B})}(\varvec{q}_0,\varvec{v}_0) = (\varvec{q}_0, \varvec{v}_0 - t \varvec{{\mathcal {C}}} \nabla G_{m}(\varvec{q}_0)). \end{aligned}$$

Given a time step size \(\Delta t>0\) and using these exact flows, one step of size \(\Delta t\) of the symmetric splitting integrator is given by

$$\begin{aligned} \psi _{\Delta t} = \varphi _{\Delta t/2}^{(\mathrm {B})} \circ \varphi _{\Delta t}^{(\mathrm {A})} \circ \varphi _{\Delta t/2}^{(\mathrm {B})}. \end{aligned}$$
(3.25)
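To make the splitting concrete, here is a minimal sketch of \(\psi _{\Delta t}\) in a toy scalar setting (our simplification, not the md-dimensional implementation: \(m=d=1\) and \(\varvec{{\mathcal {C}}}\) replaced by 1, so \(\nabla G_m\) is a scalar function); the flows \(\varphi ^{(\mathrm {A})}\) and \(\varphi ^{(\mathrm {B})}\) are exactly those displayed above.

```python
import math

def phi_A(q, v, t):
    """Exact flow of (A): rotation in the (q, v) plane."""
    c, s = math.cos(t), math.sin(t)
    return c * q + s * v, -s * q + c * v

def phi_B(q, v, t, grad_G):
    """Exact flow of (B): velocity kick by the force (scalar toy case,
    with the preconditioner replaced by 1)."""
    return q, v - t * grad_G(q)

def psi(q, v, dt, grad_G):
    """One step of the symmetric splitting integrator (3.25):
    psi_dt = phi_{dt/2}^(B) o phi_dt^(A) o phi_{dt/2}^(B)."""
    q, v = phi_B(q, v, 0.5 * dt, grad_G)
    q, v = phi_A(q, v, dt)
    return phi_B(q, v, 0.5 * dt, grad_G)
```

Because both substeps are exact flows and the composition is symmetric, \(\psi _{\Delta t}\) is second-order accurate and exhibits near-conservation of energy over long simulation times, which is what keeps the acceptance rate in the Metropolis step below close to one.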

In order to mitigate the effect of periodicities or near-periodicities in the underlying dynamics, we choose the number of integration steps to be geometrically distributed with mean \(T/\Delta t\). The idea of duration randomization has a long history [10, 11, 13, 19, 21, 51]. The initial velocity is taken to be an md-dimensional centered Gaussian vector with covariance matrix \(((m+1)/\tau ) \varvec{{\mathcal {C}}}\), and a Metropolis accept/reject step is added to ensure that the algorithm leaves \(\mu _m\) invariant [11, 70]. In summary, we use the following transition step in the simulations.

Algorithm 3.1

(Numerical Randomized pHMC) Denote by \(T>0\) the duration parameter and let \(\psi _{\Delta t}\) be the time integrator described in (3.25). Given the current state of the chain \(\varvec{x} \in {\mathbb {R}}^{m d}\), the algorithm outputs the next state of the chain \(\varvec{X} \in {\mathbb {R}}^{m d}\) as follows.

  • Step 1 Generate an md-dimensional random vector \(\varvec{\xi } \sim {\mathcal {N}}(\varvec{0},((m+1)/\tau ) \varvec{{\mathcal {C}}})\).

  • Step 2 Generate a geometric random variable k supported on the set \(\{1, 2, 3, ... \}\) with mean \(T/ \Delta t\).

  • Step 3 Output \(\varvec{X} = \gamma \widetilde{\varvec{q}}_k + (1-\gamma ) \varvec{x}\) where \((\widetilde{\varvec{q}}_k, \widetilde{\varvec{v}}_k) = \psi _{\Delta t}^k(\varvec{x}, \varvec{\xi })\), and given \(\varvec{\xi }\) and k, \(\gamma \) is a Bernoulli random variable with parameter \(\alpha \) defined as

    $$\begin{aligned} \alpha = \min \{ 1, \exp \left( - [ {\mathcal {E}}(\widetilde{\varvec{q}}_k, \widetilde{\varvec{v}}_k) - {\mathcal {E}}(\varvec{x},\varvec{\xi }) ]\right) \} \end{aligned}$$

    where \({\mathcal {E}}(\varvec{x},\varvec{v}) = (1/2) \langle \varvec{v}, \varvec{{\mathcal {C}}}^{-1} \varvec{v} \rangle + U_{m}(\varvec{x})+ (1/2) \langle \varvec{x} , \varvec{{\mathcal {C}}}^{-1} \varvec{x} \rangle \).

We stress that \(\varvec{\xi }\) and k from (Step 1) and (Step 2) are mutually independent and independent of the state of the Markov chain associated to pHMC. We pick the time step size \(\Delta t\) of the integrator sufficiently small to ensure that \(99\%\) of proposal moves are accepted on average in (Step 3).
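The three steps of Algorithm 3.1 can be sketched end to end in the same toy scalar setting as before (our simplification: \(m=d=1\), covariance replaced by 1, so \({\mathcal {E}}(x,v) = v^2/2 + G(x) + x^2/2\)); this is an illustration, not the md-dimensional implementation used for the figures.

```python
import math
import random

def phmc_step(x, T, dt, grad_G, G, rng=random):
    """One transition of numerical randomized pHMC in a toy scalar setting.
    Requires dt <= T."""
    xi = rng.gauss(0.0, 1.0)          # Step 1: velocity randomization
    k = 1                             # Step 2: geometric number of steps,
    while rng.random() >= dt / T:     # supported on {1, 2, ...}, mean T/dt
        k += 1
    q, v = x, xi
    for _ in range(k):                # k steps of the splitting integrator
        v -= 0.5 * dt * grad_G(q)     # half kick (B)
        q, v = (math.cos(dt) * q + math.sin(dt) * v,
                -math.sin(dt) * q + math.cos(dt) * v)  # rotation (A)
        v -= 0.5 * dt * grad_G(q)     # half kick (B)
    E0 = 0.5 * xi * xi + G(x) + 0.5 * x * x
    E1 = 0.5 * v * v + G(q) + 0.5 * q * q
    accept = (E1 <= E0) or (rng.random() < math.exp(-(E1 - E0)))
    return q if accept else x         # Step 3: Metropolis accept/reject
```

For \(G = 0\) the proposal is an exact rotation, energy is conserved, and every proposal is accepted, so the chain has the standard normal distribution as its invariant measure.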

Realizations of the coupling process are shown in Fig. 2. The parameters were chosen for visualization purposes only. The different components of the coupling are shown as dots of different colors. The insets of the figures show the distance between the components of the coupling as a function of the number of steps.

Figure 3 shows the average time after which the distance between the components of the coupling is for the first time within \(10^{-8}\). To produce this figure, we generated one hundred samples of the coupled process for each of one hundred different values of the duration parameter T. As indicated in the figure legends, the coupling parameter \(\gamma \) is set to one of the following:

  • \(\gamma =0\) which corresponds to a synchronous coupling of the initial velocities;

  • \(\gamma =T^{-1}\) which corresponds to the optimal coupling of the initial velocities when \(\varvec{b}(\varvec{x})=0\); and,

  • \(\gamma =\cot (T)\) which corresponds to the optimal coupling of the initial velocities when \(\varvec{b}(\varvec{x})=-\varvec{x}\).

Fig. 2

Realizations of Two-Scale Coupling. This figure illustrates realizations of the coupling with \(T=\gamma ^{-1}\). The different components of the coupling are shown as dots of different colors. A contour plot of the underlying potential energy function is shown in the background. The inset plots the distance \(r_i\) between the components of the coupling as a function of the step index i. The simulation is terminated when this distance first reaches \(10^{-12}\). In (a), (b), (c), and (d), this occurs in 34, 44, 38, and 88 steps, respectively.

Fig. 3

Mean Coupling Times. This figure illustrates the average of the random time \(\tau \) after which the distance between the components of the coupling is for the first time within \(10^{-8}\). The estimated average is plotted as a function of the duration T of the Hamiltonian dynamics for \(\gamma =0\) (black), \(\gamma =T^{-1}\) (gray), and \(\gamma =\cot (T)\) (light gray). In all cases, note that the minimum of the function is smaller and occurs at a smaller value of T when \(\gamma =T^{-1}\) or \(\gamma =\cot (T)\) than when \(\gamma =0\). The difference between the minima for \(\gamma =T^{-1}\) and \(\gamma =\cot (T)\) is slight, because these minima occur at \(T\le 1\), where \(\cot (T) \approx T^{-1}\).

4 A priori bounds

In this section we gather several bounds for the dynamics and for the coupling that will be crucial in the proof of our main result.

4.1 Bounds for the dynamics

In the following, we assume throughout that Assumption 2.6 is satisfied, and

$$\begin{aligned} Lt^2\le & {} 1. \end{aligned}$$
(4.1)

Recall that \(\phi _t=(q_t,v_t)\) denotes the flow of (2.9). Except for the use of a different norm, the proofs of Lemmas 4.1 and 4.2 below are identical to the proofs of Lemmas 3.1 and 3.2 in Ref. [9] and are therefore not repeated here.

Lemma 4.1

For any \(x,v\in {\mathcal {H}}^s\),

$$\begin{aligned} \ \ \sup _{r\le t} \left|q_r(x,v)-(x+rv) \right|_s\le & {} Lt^2 \max (\left|x \right|_s,\left|x+tv \right|_s),\quad \text{ and } \end{aligned}$$
(4.2)
$$\begin{aligned} \sup _{r\le t} \left|v_r(x,v)-v \right|_s\le & {} Lt\, \sup _{r\le t}\left|q_r(x,v) \right|_s \nonumber \\\le & {} Lt (1+Lt^2) \max (\left|x \right|_s,\left|x+tv \right|_s). \end{aligned}$$
(4.3)

In particular,

$$\begin{aligned} \sup _{r\le t}\left|q_r(x,v) \right|_s\le & {} 2 \max (\left|x \right|_s,\left|x+tv \right|_s),\qquad \text{ and } \end{aligned}$$
(4.4)
$$\begin{aligned} \sup _{r\le t}\left|v_r(x,v) \right|_s\le & {} \left|v \right|_s + 2 Lt \max (\left|x \right|_s,\left|x+tv \right|_s). \end{aligned}$$
(4.5)

Lemma 4.2

For any \(x,y,u,v\in {\mathcal {H}}^s\),

$$\begin{aligned}&\sup _{r\le t} \left|q_r(x,u)-q_r(y,v)-(x-y)-r(u-v) \right|_s\nonumber \\&\quad \le Lt^2 \max \left( \left|x-y \right|_s,\left|(x-y)+t(u-v) \right|_s\right) ,\qquad \text{ and } \end{aligned}$$
(4.6)
$$\begin{aligned}&{ \sup _{r\le t} \left|v_r(x,u)-v_r(y,v)-(u-v) \right|_s \le Lt\, \sup _{r\le t} \left|q_r(x,u)-q_r(y,v) \right|_s} \nonumber \\&\quad \le Lt (1+Lt^2)\max \left( \left|x-y \right|_s,\left|(x-y)+t(u-v) \right|_s\right) . \end{aligned}$$
(4.7)

In particular,

$$\begin{aligned} \sup _{r\le t} \left|q_r(x,u)-q_r(y,v) \right|_s \ \le \ (1 + Lt^2) \max (\left|x-y \right|_s,\left|(x-y)+t(u-v) \right|_s). \end{aligned}$$
(4.8)

Lemma 4.1 is used in the proof of the Foster–Lyapunov drift condition in Lemma 2.7. Lemma 4.2 is used in the proof of Lemma 4.3 below.

4.2 Bounds related to two-scale coupling

The following lemma is used in the proof of Theorem 2.10 to obtain a contraction for the two-scale coupling when the distance between the components of the coupling is sufficiently small, i.e., \({\left| \left| \left| x-y \right| \right| \right| _{\alpha }}<R\).

Lemma 4.3

Suppose that \(\gamma >0\) and \(t>0\) satisfy \(\gamma t \le 1\) and \(Lt^2 \le 1/4\). Then for any \(x, y, u, v \in {\mathcal {H}}^s\) such that \(v^h = u^h\) and \(v^{\ell } = u^{\ell } + \gamma (x^{\ell } - y^{\ell })\), we have

  • (i) \(\left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} (q_t^{\ell }(x,u) - q_t^{\ell }(y,v) ) \right|_s \le \left( 1-\gamma t + \dfrac{5}{8} \dfrac{\sigma _{max}}{\sigma _{min}} L t^2 \right) \left| \widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} (x^{\ell }-y^{\ell }) \right|_s + \dfrac{5}{8} \sigma _{max} L t^2 \left|x^h-y^h \right|_s\)

  • (ii) \(\left| q_t^h(x,u) - q_t^h(y,v) \right|_s \le \left( 1 - \dfrac{1}{4} t^2 \right) \left|x^h - y^h \right|_s + \dfrac{1}{4} \sigma _{min}^{-1} t^2 \left| \widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} (x^{\ell }-y^{\ell }) \right|_s \)

Proof

Let \({\mathcal {G}}(x) = b(x) + x\), \(z_t = q_t(x,u) - q_t(y,v)\), and \(w_t = \frac{dz_t}{dt}\). By (2.9),

$$\begin{aligned}&\frac{d}{dt} z^{\ell }_t \ = \ w^{\ell }_t,\quad \frac{d}{dt} w^{\ell }_t \ = \ b^{\ell }(q_t(x,u)) - b^{\ell }(q_t(y,v)), \\&\frac{d}{dt} z^h_t \ = \ w^h_t,\quad \frac{d}{dt} w^h_t \ = \ -z^h_t + {\mathcal {G}}^h(q_t(x,u)) - {\mathcal {G}}^h(q_t(y,v)), \end{aligned}$$

with \(w_0 = - \gamma z^{\ell }_0\), so that in particular \(w_0^h=0\). These are second order linear ordinary differential equations, perturbed by a nonlinearity. A variation of constants ansatz shows that they are equivalent to the equations

$$\begin{aligned}&z^{\ell }_t \ = \ (1-\gamma t) z^{\ell }_0+ \int _0^t (t-r) \left( b^{\ell }(q_r(x,u)) - b^{\ell }(q_r(y,v)) \right) dr, \\&z^h_t \ = \ \cos (t) z^h_0 + \int _0^t \sin (t-r) \left( {\mathcal {G}}^h(q_r(x,u)) - {\mathcal {G}}^h(q_r(y,v)) \right) dr. \end{aligned}$$

Since \(t^2\le Lt^2 \le 1/4\), \(\gamma t \le 1\), and by Assumption 2.6 (B1) and (B2) and (2.20),

$$\begin{aligned} \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell }_t \right|_s&\le \ (1-\gamma t) \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell }_0 \right|_s+ \sigma _{max} L\int _0^t (t-r) \left|z_r \right|_s dr, \nonumber \\&\le \ (1-\gamma t) \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell }_0 \right|_s+ \sigma _{max} \frac{ Lt^2}{2} \sup _{r \le t} \left|z_r \right|_s, \end{aligned}$$
(4.9)
$$\begin{aligned} \left|z^h_t \right|_s&\le \ |\cos (t)| \left|z^h_0 \right|_s + \frac{1}{3} \int _0^t |\sin (t-r)| \left|z_r \right|_s dr, \nonumber \\&\le \left( 1-\frac{t^2}{2} +\frac{t^4}{24} \right) \left|z^h_0 \right|_s + \frac{t^2}{6} \sup _{r \le t} \left|z_r \right|_s, \end{aligned}$$
(4.10)

Here in (4.10) we used that \(t\le 1/2\), and hence \(|\cos (t)| \le 1- (1/2) t^2 + (1/24) t^4\), and \(|\sin (t-r)|\le t\). Since \(w_0=-\gamma z_0^\ell \) and \(\gamma t\le 1\), Lemma 4.2 and (2.20) imply

$$\begin{aligned} \sup _{r \le t} \left|z_r \right|_s \le \frac{5}{4} ( \left|z_0^{\ell } \right|_s + \left|z_0^h \right|_s ) \le \frac{5}{4} \left( \sigma _{min}^{-1} \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z_0^{\ell } \right|_s + \left|z_0^h \right|_s \right) . \end{aligned}$$

Inserting this estimate into (4.9) and (4.10), and again using \(t^2 \le 1/4\) yields,

$$\begin{aligned}&\left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell }_t \right|_s \ \le \ \left( 1-\gamma t + \frac{5}{8} \frac{\sigma _{max}}{\sigma _{min}} L t^2 \right) \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell }_0 \right|_s+ \frac{5}{8} \sigma _{max} L t^2 \left|z_0^h \right|_s, \\&\left|z^h_t \right|_s \ \le \ \left( 1- \frac{1}{4} t^2 \right) \left|z^h_0 \right|_s + \frac{1}{4} \sigma _{min}^{-1} t^2 \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z_0^{\ell } \right|_s \qquad \text {as required.} \end{aligned}$$

\(\square \)
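As a numerical sanity check, not part of the proof, the variation of constants representation used above can be verified on a scalar model equation \(z''=-z+g(t)\). In the sketch below, the forcing term, initial data, and duration are illustrative choices, not quantities from the paper; the formula output is compared against direct Runge–Kutta integration.

```python
import math

def rk4(f, y0, t1, n=2000):
    # classical fourth-order Runge-Kutta for y' = f(t, y), y(0) = y0
    h = t1 / n
    t, y = 0.0, list(y0)
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, [y[i] + h / 2 * k1[i] for i in range(2)])
        k3 = f(t + h / 2, [y[i] + h / 2 * k2[i] for i in range(2)])
        k4 = f(t + h, [y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
        t += h
    return y

def simpson(fun, a, b, n=2000):
    # composite Simpson rule (n even)
    h = (b - a) / n
    s = fun(a) + fun(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * fun(a + i * h)
    return s * h / 3

g = lambda t: math.cos(2 * t)   # illustrative forcing term
z0, w0, T = 0.7, -0.3, 1.5      # illustrative initial data and duration

# direct integration of z'' = -z + g(t), z(0) = z0, z'(0) = w0
zT = rk4(lambda t, y: [y[1], -y[0] + g(t)], [z0, w0], T)[0]

# variation of constants: z(t) = cos(t) z0 + sin(t) w0 + int_0^t sin(t-r) g(r) dr
zT_voc = (math.cos(T) * z0 + math.sin(T) * w0
          + simpson(lambda r: math.sin(T - r) * g(r), 0.0, T))

assert abs(zT - zT_voc) < 1e-8
```

The same computation with \(\cos (t),\sin (t)\) replaced by \(1,t\) (and the \(-z\) term dropped) recovers the representation for the low modes \(z^{\ell }_t\).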

Recall that the two-scale coupling that we consider ensures that \(\xi ^{\ell } -\eta ^{\ell } =-\gamma z^{\ell }\) with the maximal possible probability, where \(z=x-y\). The following lemma enables us to control the probability that \(\xi ^{\ell } -\eta ^{\ell } \ne -\gamma z^{\ell }\) for small distances \({\left| \left| \left| z \right| \right| \right| _{\alpha }}<R\).

Lemma 4.4

For any choice of \(\gamma \), \( P[\xi ^{\ell }-\eta ^{\ell }\ne -\gamma z^{\ell }] \le \left| \gamma \widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell } \right|_s/\sqrt{2\pi }.\)

Note that since \(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}\) is a trace class operator on \({\mathcal {H}}^s\), the operator \(\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}}\) is not bounded. Nonetheless, the bound appearing in Lemma 4.4 is finite because the operator \(\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}}\) appearing in that bound only acts on \(z^{\ell }\), i.e., the projection of z onto the n-dimensional space \({\mathcal {H}}^{s,\ell }\).

Proof

Recall from (2.18) that \(P[\xi ^{\ell }-\eta ^{\ell }\ne -\gamma z^{\ell }] =d_{\mathrm {TV}}({\mathcal {N}}(0, \widetilde{{\mathcal {C}}}) , {\mathcal {N}}(\gamma z^{\ell }, \widetilde{{\mathcal {C}}}) )\). Let \(\widetilde{{\mathcal {C}}}^{\ell }\) denote the restriction of \(\widetilde{{\mathcal {C}}}\) to \({\mathcal {H}}^\ell \). Then since \(z^\ell \in {\mathcal {H}}^\ell \), and by scale invariance of the total variation distance,

$$\begin{aligned} P[\xi ^{\ell }-\eta ^{\ell }\ne -\gamma z^{\ell }]= & {} d_{\mathrm {TV}}({\mathcal {N}}(0, \widetilde{{\mathcal {C}}}^\ell ) , {\mathcal {N}}(\gamma z^{\ell }, \widetilde{{\mathcal {C}}}^\ell ) )\\= & {} d_{\mathrm {TV}}({\mathcal {N}}(0, I_\ell ) , {\mathcal {N}}(\gamma \widetilde{{\mathcal {C}}}^{-1/2}z^{\ell }, I_\ell ) ) \ =\ d_{\mathrm {TV}}({\mathcal {N}}(0, 1) , {\mathcal {N}}(|\gamma \widetilde{{\mathcal {C}}}^{-1/2}z^{\ell }|, 1) )\\= & {} 2\, {\mathcal {N}}(0,1) \left[ (0 ,\, |\gamma \widetilde{{\mathcal {C}}}^{-1/2} z^{\ell }/2 |) \right] \ \le \ |\gamma \widetilde{{\mathcal {C}}}^{-1/2} z^{\ell } |/\sqrt{2\pi }, \end{aligned}$$

see Fig. 4 for the last equality. \(\square \)

Fig. 4

The total variation distance between the one-dimensional normal distributions \({\mathcal {N}}(0,1)\) and \({\mathcal {N}}(h,1)\) equals one minus the area of the shaded region. Therefore, \(d_{\mathrm {TV}}({\mathcal {N}}(0, 1) , {\mathcal {N}}(h,1)) =2{\mathcal {N}}(0,1)[(0,h/2)]\)
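The identity illustrated in Fig. 4 can also be checked numerically. The following sketch, with illustrative values of h, compares \(d_{\mathrm {TV}}({\mathcal {N}}(0,1),{\mathcal {N}}(h,1))\), computed by quadrature of the two densities, with \(2\,{\mathcal {N}}(0,1)[(0,h/2)]=2\Phi (h/2)-1\) and with the bound \(h/\sqrt{2\pi }\) from Lemma 4.4.

```python
import math

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)   # N(0,1) density
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))          # N(0,1) cdf

def tv_numeric(h, a=-12.0, b=12.0, n=40000):
    # d_TV(N(0,1), N(h,1)) = (1/2) * integral |phi(x) - phi(x - h)| dx
    f = lambda x: abs(phi(x) - phi(x - h))
    w = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * w)
    return s * w / 3 / 2   # half the L1 distance of the densities

for h in (0.1, 0.5, 1.0, 2.0):   # illustrative shift values
    exact = 2 * Phi(h / 2) - 1               # = 2 N(0,1)[(0, h/2)], cf. Fig. 4
    assert abs(tv_numeric(h) - exact) < 1e-6
    assert exact <= h / math.sqrt(2 * math.pi)
```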

Next we gather some elementary inequalities on the function f in (2.23) needed in the proof of Theorem 2.10. To state these results, let \(f_{-}^{\prime }\) denote the left derivative of f which satisfies

$$\begin{aligned} f_{-}^{\prime }(r) = {\left\{ \begin{array}{ll} e^{-a r} &{} \text {for }r \le R, \\ 0 &{} \text {for }r > R . \end{array}\right. } \end{aligned}$$

Lemma 4.5

For all \(r,{{\widetilde{r}}}>0\), the function f in (2.23) satisfies:

  • (i) \(f({{\widetilde{r}}}) - f(r) \le f^{\prime }_{-}(r) ({{\widetilde{r}}}-r) \;.\)

  • (ii) \(f({{\widetilde{r}}}) - f(r) \le a^{-1} f'_{-}(r).\)

  • (iii) If \(r \le R\) then \( \max (1,a R) e^{-\max (1,a R)} \le {r f^{\prime }_{-}(r)}/{f(r)} \le 1.\)

Proof

Property (i) follows from the fact that f is concave. Since f is non-decreasing and constant for \(r\ge R\), (ii) is trivially true in the cases \({{\widetilde{r}}} \le r\) and \(r \ge R\). In the case \(r<\min ({{\widetilde{r}}},R)\),

$$\begin{aligned} f({{\widetilde{r}}}) - f(r)&\ =\ \frac{1}{a}\left( e^{-ar}-e^{-a\min ({{\widetilde{r}}},R)}\right) \ \le \ \frac{1}{a} f_{-}^{\prime }(r) . \end{aligned}$$

Combining these cases gives (ii). Let

$$\begin{aligned} g(x) := \frac{x}{e^{x} - 1} ~~ \text {so that }\frac{r f^{\prime }_{-}(r) }{f(r)} = g(ar). \end{aligned}$$

Property (iii) then follows because g decreases with x, \(\lim _{x \rightarrow 0} g(x) = 1\) and \(g(x) \ge \max (1,x) e^{-\max (1,x)}\). \(\square \)
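The three properties of \(g(x)=x/(e^x-1)\) used in the proof of (iii) can be spot-checked numerically; the grid below is an illustrative choice.

```python
import math

def g(x):
    # x / (e^x - 1), with the removable singularity at x = 0 filled in
    return 1.0 if x == 0 else x / math.expm1(x)

xs = [0.01 * k for k in range(1, 1001)]   # grid on (0, 10]

# g decreases with x
assert all(g(a) > g(b) for a, b in zip(xs, xs[1:]))
# g(x) -> 1 as x -> 0
assert abs(g(1e-9) - 1.0) < 1e-6
# lower bound used in Lemma 4.5 (iii): g(x) >= max(1, x) e^{-max(1, x)}
assert all(g(x) >= max(1.0, x) * math.exp(-max(1.0, x)) for x in xs)
```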

5 Proof of Foster–Lyapunov drift condition

Before giving the proof of Lemma 2.7, we first gather some preparatory results. Using the shorthand notation \(\varrho (t)=\left|q_t(x,\xi ) \right|_s^2\) and \(\varphi (t) = \left|v_t(x,\xi ) \right|_s^2\), (2.9) implies

$$\begin{aligned} \varrho '(t)= & {} 2 \langle q_t(x,\xi ), v_t(x,\xi ) \rangle _s, \\ \varrho ''(t)= & {} 2 \left( \varphi (t) + \langle q_t(x,\xi ), b(q_t(x,\xi )) \rangle _s \right) , \\ \varphi '(t)= & {} 2 \langle v_t(x,\xi ), b(q_t(x,\xi )) \rangle _s . \end{aligned}$$

Hence, by Assumption 2.6 (B1) and (B3), we have the differential inequalities

$$\begin{aligned} \varrho ''(t)&\le 2 \left( \varphi (t) - K\varrho (t) + A \right) \;, \qquad - \varrho ''(t) \le 2 L\varrho (t). \end{aligned}$$
(5.1)

The following formula comes from two applications of integration by parts and is valid for any \(k \in {\mathbb {N}}\) and for any twice differentiable function \(g: {\mathbb {R}} \rightarrow {\mathbb {R}}\),

$$\begin{aligned} \int _0^T g(r) (T-r)^k dr = \frac{g(0) T^{k+1}}{k+1} + \frac{g'(0) T^{k+2}}{(k+1) (k+2)} + \int _0^T \frac{g''(r) (T-r)^{k+2}}{(k+1) (k+2)} dr. \end{aligned}$$
(5.2)
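The identity (5.2) can be verified numerically for a concrete test function; the choice \(g(r)=e^r\) (so that \(g=g'=g''\)) and the values of T and k below are illustrative.

```python
import math

def simpson(fun, a, b, n=2000):
    # composite Simpson rule (n even)
    h = (b - a) / n
    s = fun(a) + fun(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * fun(a + i * h)
    return s * h / 3

g, dg, d2g = math.exp, math.exp, math.exp   # test function g(r) = e^r
T = 0.8                                     # illustrative value
for k in range(1, 5):
    lhs = simpson(lambda r: g(r) * (T - r)**k, 0.0, T)
    rhs = (g(0) * T**(k + 1) / (k + 1)
           + dg(0) * T**(k + 2) / ((k + 1) * (k + 2))
           + simpson(lambda r: d2g(r) * (T - r)**(k + 2) / ((k + 1) * (k + 2)), 0.0, T))
    assert abs(lhs - rhs) < 1e-10
```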

We also require the following inequalities

$$\begin{aligned} \sup _{r \le T} \varrho (r)\le & {} 4 (\left|x \right|_s + T \left|\xi \right|_s)^2 \ \le \ 8 ( \varrho (0) + T^2 \varphi (0) ), \end{aligned}$$
(5.3)
$$\begin{aligned} \sup _{r \le T} \varphi (r)\le & {} 8 L^2 T^2 \varrho (0) + 2 (1+2LT^2)^2 \varphi (0) \ \le \ 8 L^2 T^2 \varrho (0) + \frac{9}{2} \varphi (0) ,\ \ \end{aligned}$$
(5.4)

which follow from Lemma 4.1 and the assumption \(LT^2 \le 1/48\).

Proof of Lemma 2.7

Apply in turn (5.1), then (5.2) with \(g(r) = \varrho (r)\), and then (5.1) again, noting that \(E[\varrho '(0)]=2\, E \langle x, \xi \rangle _s = 0\) since \(\xi \) is centered, to obtain

$$\begin{aligned}&E \left[ \left|X'(x) \right|_s^2 \right] \ =\ E \left[ \varrho (T) \right] \ =\ \varrho (0)+T\, E[\varrho '(0)]+ \int _0^T (T-r) E \left[ \varrho ''(r) \right] dr \\&\quad \le \left|x \right|_s^2 + 2 \int _0^T (T-r)\, E \left[ \varphi (r) - K\varrho (r) + A \right] dr \\&\quad \le (1 - K T^2) \left|x \right|_s^2 + A T^2 + 2 \int _0^T (T-r) E \left[ \varphi (r) \right] dr \\&\qquad - 2 K \int _0^T \frac{(T-r)^3}{3!} E \left[ \varrho ''(r) \right] dr \\&\quad \le (1 - K T^2) \left|x \right|_s^2 + A T^2 + T^2 E \left[ \sup _{r \le T} \varphi (r) \right] + \frac{1}{6} K LT^4 \, E \left[ \sup _{r \le T} \varrho (r) \right] \\&\quad \le \left( 1 - K T^2 + 8 L^2 T^4 + \frac{4}{3} K L T^4 \right) \left|x \right|_s^2+ A T^2 + \left( \frac{9}{2} + \frac{4}{3} K LT^4 \right) T^2 E\left|\xi \right|_s^2 \end{aligned}$$

where in the last step we applied (5.3) and (5.4). Since we assume \(LT^2 \le (1/48) ( K/L)\), we have \(8 L^2 T^4 \le (1/6) K T^2\), and since also \(KT^2\le LT^2 \le 1/4\),

$$\begin{aligned} E \left[ \left|X'(x) \right|_s^2 \right] \le \left( 1 - {K T^2}/{2} \right) \left|x \right|_s^2 + 5 ({\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s}) + A) T^2 \end{aligned}$$

as required. \(\square \)
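The arithmetic in the two final estimates of this proof can be spot-checked numerically over randomly sampled parameters satisfying \(K\le L\) and \(LT^2\le (1/48)(K/L)\); the sampling ranges below are illustrative.

```python
import math
import random

random.seed(0)
for _ in range(1000):
    L = random.uniform(0.5, 10.0)        # illustrative range for L
    K = L * random.uniform(0.01, 1.0)    # ensures K <= L
    T = math.sqrt(K / 48) / L * random.random()   # ensures L*T^2 <= (1/48)*(K/L)
    # coefficient of |x|_s^2 in the final estimate
    assert 1 - K * T**2 + 8 * L**2 * T**4 + (4 / 3) * K * L * T**4 <= 1 - K * T**2 / 2 + 1e-12
    # coefficient of T^2 * E|xi|_s^2
    assert 9 / 2 + (4 / 3) * K * L * T**4 <= 5
```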

6 Proofs of main results

Proof of Theorem 2.10

The parameters \(\gamma \), a and \(\epsilon \) have been chosen in (2.29), (2.30), and (2.31) such that the following conditions are satisfied:

$$\begin{aligned} \gamma T\le & {} 1, \end{aligned}$$
(6.1)
$$\begin{aligned} \gamma R\le & {} 1/4, \end{aligned}$$
(6.2)
$$\begin{aligned} a T= & {} 1, \end{aligned}$$
(6.3)
$$\begin{aligned} (\sigma _{max}/{\sigma _{min}} ) LT\le & {} {\gamma }/{4} , \end{aligned}$$
(6.4)
$$\begin{aligned} \epsilon (A+{\text {trace}}( \widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s} ) )= & {} \ \max (1,R/T)\,e^{ - \max (1,R / T)}/160. \end{aligned}$$
(6.5)

Indeed, (6.1) and (6.2) hold by selection of \(\gamma \) in (2.29); (6.3) holds by selection of a in (2.30); (6.4) holds because (2.27) implies that

$$\begin{aligned} ( \sigma _{max}/\sigma _{min} )\, L T \le \min (T^{-1}/4, R^{-1}/16) =\gamma /4 \end{aligned}$$
(6.6)

by selection of \(\gamma \) in (2.29); and (6.5) holds by selection of \(\epsilon \) in (2.31).

Let \(z=x-y\), \(W=\xi -\eta \), \(r={\left| \left| \left| z \right| \right| \right| _{\alpha }}\), \(R'= {\left| \left| \left| X'(x,y)-Y'(x,y) \right| \right| \right| _{\alpha }}\), \(G=1+\epsilon (\left|x \right|_s^2+\left|y \right|_s^2)\), \(G'=1+\epsilon (\left|X' \right|_s^2+\left|Y' \right|_s^2) \), \(F=f(r)\) and \(F'=f(R')\). We consider separately the cases where \(r < R\) and \(r \ge R\).

(i) Contractivity for \(r < R\). Expand

$$\begin{aligned} E [ F' - F ] = \mathrm {I} + \mathrm {II}, \quad \text {where} \quad {\left\{ \begin{array}{ll} \mathrm {I} =E [ F' - F ; ~W^{\ell } = -\gamma z^{\ell }], \\ \mathrm {II} = E[ F' - F; ~W^{\ell } \ne -\gamma z^{\ell } ].\end{array}\right. } \end{aligned}$$
(6.7)

Let \(Z_T = q_T(x,\xi ) - q_T(y, \eta )\). By Lemma 4.5 (i), Lemma 4.3, (6.4) and (2.28),

$$\begin{aligned} \mathrm {I}&\ \le \ f'(r) E [ R' - r ; ~ W^{\ell } = -\gamma z^{\ell }] \nonumber \\&\ =\ f'(r) E \left[ \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} Z^{\ell }_T \right|_s - \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell } \right|_s + \alpha ( \left|Z^h_T \right|_s - \left|z^h \right|_s ) ;~ W^{\ell } = -\gamma z^{\ell } \right] \nonumber \\&\ \le \ f'(r) \left( ( -\gamma T + \frac{5}{8} \frac{\sigma _{max}}{\sigma _{min}} LT^2 + \frac{1}{4} \sigma _{min}^{-1} \alpha T^2 ) \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell } \right|_s \right. \nonumber \\&\quad \left. +\, ( - \frac{1}{4} \alpha T^2 + \frac{5}{8} \sigma _{max} LT^2 ) \left|z^h \right|_s \right) ( 1 - P[ W^{\ell } \ne -\gamma z^{\ell } ])\nonumber \\&\ \le \ -f'(r) \left( \frac{19}{32}\gamma T \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell } \right|_s \, +\, \frac{3}{32} \alpha T^2 \left|z^h \right|_s \right) ( 1 - P[ W^{\ell } \ne -\gamma z^{\ell } ]) \end{aligned}$$
(6.8)

Moreover, by Lemmas 4.4 and 4.5 (ii),

$$\begin{aligned} P[ W^{\ell } \ne -\gamma z^{\ell } ]&\ \le \ \frac{\gamma R}{\sqrt{2 \pi }}\ \le \ \frac{1}{4} \frac{1}{ \sqrt{2 \pi }}\ <\ \frac{1}{10} \;,\qquad \text { and} \end{aligned}$$
(6.9)
$$\begin{aligned} \mathrm {II}&\ \le \ a^{-1} f'(r) P[ W^{\ell } \ne -\gamma z^{\ell } ]\ \le \ f'(r) \frac{2}{5} \gamma T \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell } \right|_s , \end{aligned}$$
(6.10)

where in (6.9) we used (6.2) and in (6.10) we used (6.3) and \(\sqrt{2\pi }>5/2\).

Inserting (6.8), (6.9), and (6.10) into (6.7), and using Lemma 4.5 (iii), gives

$$\begin{aligned}&E [ F' - F ] \ \le \ f'(r) \left( - \tfrac{1}{8} \gamma T \left|\widetilde{{\mathcal {C}}}^{-\frac{1}{2}} {\mathcal {C}}^{\frac{s}{2}} z^{\ell } \right|_s - \tfrac{1}{12} \alpha T^2 \left|z^h \right|_s \right) \nonumber \\&\quad \le -\tfrac{1}{12}\, \min (\gamma T,T^2)\, rf'(r) \nonumber \\&\quad \le - \tfrac{1}{12} \, \min (1, T/(4R),T^2)\,\max (1,R/T)\,e^{-\max (1,R/T)}\, f(r)\nonumber \\&\quad \le -c_1 \, F. \end{aligned}$$
(6.11)

Here we have introduced \(c_1 := (1/12) T^2 \max (1,R/T) e^{-\max (1, R / T) }\), and we have used (2.29), (6.3), and the fact that \(T^2\le \min (1,T/(4R))\) by (6.6).

Furthermore, by Lemma 2.7,

$$\begin{aligned} E [ G' - G ]&\le \ 10 \epsilon (A+{\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s})) T^2 \ \le \ (3/4) {c_1} G, \end{aligned}$$
(6.12)

where in the last step we eliminated \(\epsilon \) using (6.5).

The Cauchy-Schwarz inequality, (6.11) and (6.12) now imply

$$\begin{aligned}&E[ \rho (X',Y')]\ =\ E[ \sqrt{F' G'} ]\ \le \ \sqrt{ E[F'] } \sqrt{ E[G'] } \end{aligned}$$
(6.13)
$$\begin{aligned}&\quad \le \sqrt{(1-c_1)F}\sqrt{ (1 + 3{c_1}/{4} )G}\ \le \ \sqrt{ 1 - {c_1}/{4}} \sqrt{FG} \nonumber \\&\quad \le \exp \left( - {c_1}/{8} \right) \rho (x,y), \end{aligned}$$
(6.14)

where in the last step we used \(1-{\mathsf {x}} \le e^{-{\mathsf {x}}}\) with \({\mathsf {x}}=c_1/4\).

(ii) Contractivity for \(r \ge R\). In this case, by (2.24) and (2.26), we have \(\left|x \right|_s^2 + \left|y \right|_s^2 \ge 40 (A+{\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s})) / K\), and we can apply the Foster–Lyapunov drift condition in (2.25) to (6.13) to obtain

$$\begin{aligned} E&[ \rho (X',Y')] \le \sqrt{F} \sqrt{ E[G'] } \le \sqrt{F} \left( 1 + \epsilon \left( 1 - \frac{K T^2}{4} \right) (\left|x \right|_s^2 + \left|y \right|_s^2) \right) ^{1/2} \nonumber \\&\le \sqrt{F} \left( 1- \frac{5}{2} \epsilon (A+{\text {trace}}(\widetilde{{\mathcal {C}}} {\mathcal {C}}^{-s})) T^2 + \epsilon \left( 1 - \frac{K T^2}{8} \right) (\left|x \right|_s^2 + \left|y \right|_s^2) \right) ^{1/2} \nonumber \\&\le \left( 1 - c_2 \right) ^{1/2} \rho (x,y)\, \le \, \exp \left( - {c_2}/{2} \right) \rho (x,y) \end{aligned}$$
(6.15)

where \(c_2:= \min \left( K T^2/8,\, T^2 \max (1,R/T)e^{-\max (1,R/T)} /64 \right) \).

(iii) Global Contraction. Let \(c:=\min (c_1/8,c_2/2)=c_2/2\). By combining the bounds in (6.14) and (6.15), we see that for any \(x,y \in {\mathcal {H}}^s\),

$$\begin{aligned} E [ \rho (X',Y')] \le e^{-c} \rho (x,y). \end{aligned}$$

\(\square \)
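The relation \(c=\min (c_1/8,\,c_2/2)=c_2/2\) used in step (iii) amounts to \(c_2/2\le c_1/8\), which holds because \(1/128\le 1/96\). This, and the elementary bound used in (6.14), can be spot-checked numerically; the parameter values below are illustrative.

```python
import math

def c1(T, R):
    # rate from step (i): (1/12) T^2 max(1, R/T) exp(-max(1, R/T))
    M = max(1.0, R / T)
    return T**2 * M * math.exp(-M) / 12

def c2(T, R, K):
    # rate from step (ii)
    M = max(1.0, R / T)
    return min(K * T**2 / 8, T**2 * M * math.exp(-M) / 64)

for T in (0.05, 0.2, 0.5):
    for R in (0.1, 1.0, 5.0):
        for K in (0.1, 0.5, 1.0):
            assert c2(T, R, K) / 2 <= c1(T, R) / 8 + 1e-15

# the elementary bound (1 - x)(1 + 3x/4) <= 1 - x/4 used in (6.14)
for i in range(101):
    x = i / 100
    assert (1 - x) * (1 + 3 * x / 4) <= 1 - x / 4 + 1e-15
```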

Proof of Corollary 2.12

The Wasserstein contraction in (2.34) follows directly from Theorem 2.10, see e.g. [9, Corollary 2.8] for a similar result. The bound in (2.35) then follows from (2.34) by comparing \(\rho \) to the metric on \({\mathcal {H}}^s\). Indeed, recall that by (2.23) and (2.30), \(f(r)=(1-e^{-\min (r,R)/T}) T\). Let \(x,y\in {\mathcal {H}}^s\), and let \(r={\left| \left| \left| x-y \right| \right| \right| _{\alpha }}\). Suppose first that \(r\le T\). Then \(f(r)\ge r/e\), and thus

$$\begin{aligned} \rho (x,y)\ \ge \ \sqrt{f(r)}\ \ge \ \sqrt{r/e}\ \ge \ r/\sqrt{eT}\ \ge \ \left|x-y \right|_s\min (\sigma _{min},\alpha )/\sqrt{eT} \end{aligned}$$

where in the last step we used (2.21). Now suppose that \(r>T\). Then since also \(R\ge T\) by assumption, \(f(r)\ge (1-e^{-1})T\), and thus we obtain

$$\begin{aligned} \rho (x,y)\ \ge \ \sqrt{(1-e^{-1})T\epsilon }\sqrt{\left|x \right|_s^2+\left|y \right|_s^2}\ \ge \ \sqrt{(1-e^{-1})T{\epsilon }/2}\left|x-y \right|_s. \end{aligned}$$

Combining both cases and noting that by (2.28), \(\alpha \ge \sigma _{min}\), we see that

$$\begin{aligned} \left|x-y \right|_s\ \le \ \max \left( \sqrt{eT}/\sigma _{min}, \sqrt{2}/\sqrt{(1-e^{-1})T{\epsilon }} \right) \, \rho (x,y)\quad \text {for all }x,y , \end{aligned}$$

which implies an analogous bound for the corresponding Wasserstein distances \({\mathcal {W}}^{s,1}\) and \({\mathcal {W}}_\rho \). Conversely, since \(f(r)\le T\) for all r,

$$\begin{aligned} \rho (x,y)\ \le \ \sqrt{T}\sqrt{1+\epsilon \left|x \right|_s^2+\epsilon \left|y \right|_s^2}\ \le \ \sqrt{T} (1+\sqrt{\epsilon }\left|x \right|_s+\sqrt{\epsilon }\left|y \right|_s)\quad \text {for all }x,y . \end{aligned}$$

Therefore, with C defined by (2.36), we obtain

$$\begin{aligned}&{\mathcal {W}}^{s,1} (\nu \pi ^k ,\mu )\ = \ {\mathcal {W}}^{s,1} (\nu \pi ^k ,\mu \pi ^k )\ \le \ CT^{-1/2}\, {\mathcal {W}}_\rho (\nu \pi ^k,\mu \pi ^k) \\&\quad \le CT^{-1/2}e^{-c k}\, {\mathcal {W}}_\rho (\nu ,\mu ) \ \le \ C\left( 1+\sqrt{\epsilon }M_1(\nu )+\sqrt{\epsilon }M_1(\mu )\right) e^{-c k} \end{aligned}$$

for all \(k\in {\mathbb {N}}\) and all probability measures \(\nu \) on \({\mathcal {H}}^s\). Finally, by Lemma 2.7 and (2.31), we have \(\sqrt{\epsilon }M_1(\mu ) \le (1/4) K^{-1/2} e^{-R/(2 T)}\). \(\square \)

7 Proofs of results from Sect. 3 (applications)

7.1 Proofs of results for TPS

To prove Theorem 3.2, we compare the eigenvalues of \(\varvec{{\mathcal {C}}}\) to those of \({\mathcal {C}}\). Note that these eigenvalues each have multiplicity d, and to account for this, define the index function \(\varphi (k,j) = d (k-1) +j\). Then the eigenvalues of \({\mathcal {C}}\) are

$$\begin{aligned} \lambda _{\varphi (k,j)}= \left( \frac{ \tau }{k \pi } \right) ^{2}, \quad k \in {\mathbb {N}}, \quad 1 \le j \le d, \end{aligned}$$
(7.1)

and the eigenvalues of \(\varvec{{\mathcal {C}}}\) are

$$\begin{aligned} \varvec{\Lambda }_{\varphi (k,j)} = \lambda _{\varphi (k,j)} \left( \frac{\theta _k}{\sin (\theta _k)} \right) ^2, ~ \theta _k:=\frac{k \pi }{2 (m+1)}, ~ 1 \le k \le m, ~ 1 \le j \le d. \end{aligned}$$
(7.2)

The following lemma helps estimate the error of the eigenvalues of the approximation \(\varvec{{\mathcal {C}}}\) relative to those of \({\mathcal {C}}\).

Lemma 7.1

For any \(m \in {\mathbb {N}}\), for all \(1 \le k \le m\), and for \(1 \le j \le d\),

  • (E1) \(| \varvec{\Lambda }_{\varphi (k,j)} - \lambda _{\varphi (k,j)} | = \varvec{\Lambda }_{\varphi (k,j)} - \lambda _{\varphi (k,j)} \le \lambda _{\varphi (k,j)} \dfrac{k^2 \pi ^2 }{6 (m+1)^2} = \dfrac{\tau ^2}{6 (m+1)^2}\),

  • (E2) \(\left( \dfrac{ \varvec{\Lambda }_1}{ \varvec{\Lambda }_{\varphi (k,j)}}\right) ^{1/2} \le \left( \dfrac{\lambda _1}{\lambda _{\varphi (k,j)}}\right) ^{1/2} \left( 1 + \dfrac{\pi ^2}{16 (m+1)^2} \right) \).

Proof

This lemma is an easy consequence of the elementary inequalities

$$\begin{aligned} \frac{1}{2}< 1 - \frac{\theta ^2}{6}< \frac{\sin (\theta )}{\theta }< 1 \quad \text {and} \quad 1< \frac{\theta }{\sin (\theta )}< 1 + \frac{\theta ^2}{4} < \frac{5}{3} \end{aligned}$$
(7.3)

which are valid for \(0< \theta < \pi /2\). Indeed, (7.1), (7.2) and (7.3) imply

$$\begin{aligned} | \varvec{\Lambda }_{\varphi (k,j)}&- \lambda _{\varphi (k,j)} | = \varvec{\Lambda }_{\varphi (k,j)} - \lambda _{\varphi (k,j)} \\&= \lambda _{\varphi (k,j)} \left( \left( \frac{\theta _k}{\sin (\theta _k)} \right) ^2 - 1 \right) \\&= \lambda _{\varphi (k,j)} \left( \frac{\theta _k}{\sin (\theta _k)} + 1 \right) \left( \frac{\theta _k}{\sin (\theta _k)} - 1 \right) \\&\le \lambda _{\varphi (k,j)}\frac{2}{3} \theta _k^2 = \lambda _{\varphi (k,j)} \frac{k^2 \pi ^2}{6 (m+1)^2} = \frac{\tau ^2}{6 (m+1)^2} \end{aligned}$$

as required for (E1). For (E2), we use (7.2) to write

$$\begin{aligned} \left( \dfrac{ \varvec{\Lambda }_1}{ \varvec{\Lambda }_{\varphi (k,j)}}\right) ^{1/2} = \left( \dfrac{\lambda _1}{\lambda _{\varphi (k,j)}}\right) ^{1/2} \frac{\sin (\theta _k)}{\theta _k} \frac{\theta _1}{\sin (\theta _1)}. \end{aligned}$$

Hence, by (7.3),

$$\begin{aligned} \left( \dfrac{ \varvec{\Lambda }_1}{ \varvec{\Lambda }_{\varphi (k,j)}}\right) ^{1/2} \le \left( \dfrac{\lambda _1}{\lambda _{\varphi (k,j)}}\right) ^{1/2} \left( 1 + \frac{\theta _1^2}{4} \right) = \left( \dfrac{\lambda _1}{\lambda _{\varphi (k,j)}}\right) ^{1/2} \left( 1 + \frac{\pi ^2}{16 (m+1)^2} \right) \end{aligned}$$

as required for (E2). \(\square \)
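Bounds (E1) and (E2) of Lemma 7.1 can be spot-checked numerically from the explicit eigenvalues (7.1) and (7.2); the value of \(\tau \) below is illustrative, and the multiplicity index j plays no role.

```python
import math

tau = 2.0   # illustrative value of tau
for m in (1, 2, 5, 20, 100):
    th1 = math.pi / (2 * (m + 1))
    lam1 = (tau / math.pi) ** 2
    Lam1 = lam1 * (th1 / math.sin(th1)) ** 2
    for k in range(1, m + 1):
        lam = (tau / (k * math.pi)) ** 2        # eigenvalue of C, eq. (7.1)
        th = k * math.pi / (2 * (m + 1))
        Lam = lam * (th / math.sin(th)) ** 2    # eigenvalue of bold C, eq. (7.2)
        # (E1): eigenvalue error bound
        assert 0.0 <= Lam - lam <= tau ** 2 / (6 * (m + 1) ** 2) + 1e-12
        # (E2): ratio bound
        assert ((Lam1 / Lam) ** 0.5
                <= (lam1 / lam) ** 0.5 * (1 + math.pi ** 2 / (16 * (m + 1) ** 2)) + 1e-12)
```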

Proof of Theorem 3.2

This result is an application of Corollary 2.12. Since \(\varvec{{\mathcal {C}}}\) is a finite-dimensional matrix, Assumption 2.1 holds for \(\varvec{{\mathcal {C}}}\) with \(s=0\), and since we choose \(\varvec{\widetilde{{\mathcal {C}}}} = \varvec{{\mathcal {C}}}\), Assumption 2.5 also holds with \(s=0\). Therefore, to apply Corollary 2.12, it suffices to check that: (i) Assumption 2.6 holds with dimension-free constants L, n, K, and A; (ii) the dimension-free R defined in (3.6) satisfies condition (2.24); and (iii) the dimension-free condition (3.10) on the duration T implies that (2.27) holds. We then invoke Corollary 2.12 to conclude convergence in the standard \(L^1\) Wasserstein distance.

Verify Assumption 2.6(B1)-(B3). For (B1), note that

$$\begin{aligned} |\varvec{b}(\varvec{x}) - \varvec{b}(\varvec{y}) |&\le (1+\varvec{\Lambda }_1 L_G) |\varvec{x}-\varvec{y}| \\&\le (1+2 \lambda _1 L_G) |\varvec{x}-\varvec{y}| = (1+2 (\tau ^2 / \pi ^2) L_G) |\varvec{x}-\varvec{y}| \end{aligned}$$

where in the last step we used Lemma 7.1 (E1) which implies that \(\varvec{\Lambda }_1 - \lambda _1 \le \lambda _1 \pi ^2 / (6 (m+1)^2) \le \lambda _1\) since \(m \ge 1\). Thus, (B1) holds with \(L=1+\kappa \) since \(\kappa = 2 (\tau ^2 / \pi ^2) L_G\). For (B2), since \(n=m_{\ell } d = \varphi (m_{\ell },d)\), \(n+1=\varphi (m_{\ell }+1,1)\) and

$$\begin{aligned} |\varvec{b}^h(\varvec{x}) + \varvec{x}^h&- \varvec{b}^h(\varvec{y}) - \varvec{y}^h | \le \varvec{\Lambda }_{\varphi (m_{\ell }+1,1)} L_G |\varvec{x}-\varvec{y}| \\&\le \lambda _{\varphi (m_{\ell }+1,1)} L_G |\varvec{x}-\varvec{y}| + ( \varvec{\Lambda }_{\varphi (m_{\ell }+1,1)} - \lambda _{\varphi (m_{\ell }+1,1)}) L_G |\varvec{x}-\varvec{y}| \\&\le 2 \lambda _{\varphi (m_{\ell }+1,1)} L_G |\varvec{x}-\varvec{y}| \le (1/3) |\varvec{x}-\varvec{y}| \end{aligned}$$

where in the second to last step we used

$$\begin{aligned} \varvec{\Lambda }_{\varphi (m_{\ell }+1,1)} - \lambda _{\varphi (m_{\ell }+1,1)} \le \lambda _{\varphi (m_{\ell }+1,1)} (m_{\ell }+1)^2 \pi ^2 / (6 (m+1)^2) \le \lambda _{\varphi (m_{\ell }+1,1)} \end{aligned}$$

which follows from Lemma 7.1 (E1) since \(m \ge (m_{\ell }+1) \pi /2\), and in the last step, we used that \(m_{\ell }+1 \ge \sqrt{6 L_G} \tau /\pi \). Hence, (B2) holds with \(n=m_{\ell } d\). For (B3),

$$\begin{aligned} \langle \varvec{x}, \varvec{b}(\varvec{x}) \rangle&\le - | \varvec{x}|^2 + | \varvec{x}| |\varvec{{\mathcal {C}}} \nabla G_{m}(\varvec{x})| \le - (1/2) | \varvec{x}|^2 + (1/2) |\varvec{{\mathcal {C}}} \nabla G_{m}(\varvec{x})|^2 \\&\le - (1/2) | \varvec{x}|^2 + (1/2) \varvec{\Lambda }_1^2 M_G^2 \tau \le - (1/2) | \varvec{x}|^2 + \lambda _1^2 \tau M_G^2 \end{aligned}$$

where in the last step we used

$$\begin{aligned} \varvec{\Lambda }_1^2 - \lambda _1^2 = 2 \lambda _1 (\varvec{\Lambda }_1 - \lambda _1) + (\varvec{\Lambda }_1 - \lambda _1)^2 \le \lambda _1^2 \end{aligned}$$

which follows from (E1) since \(m \ge 1\). Thus, (B3) holds with \(K=1/2\) and \(A= \lambda _1^2 \tau M_G^2 = ( \tau ^5 / \pi ^4) M_G^2\). To summarize, Assumption 2.6 holds with dimension-independent constants \(L=1+\kappa \), \(n=m_{\ell } d\) where \(m_{\ell } = \lfloor \sqrt{3 \kappa } \rfloor \), \(K=1/2\), and \(A= (\tau ^5 / \pi ^4) M_G^2\).

Verify Conditions (2.24) & (2.27). To show that R defined in (3.6) satisfies condition (2.24) and that condition (3.10) on the duration parameter implies condition (2.27), in this paragraph we gather some additional bounds. Since \(m_{\ell } \le \sqrt{3 \kappa }\), we have

$$\begin{aligned} \sigma _{max}= & {} \varvec{\Lambda }_{\varphi (m_{\ell },1)}^{-1/2} = \lambda _{\varphi (m_{\ell },1)}^{-1/2} (\sin (\theta _{m_{\ell }})/\theta _{m_{\ell }}) \le m_{\ell } \pi / \tau \le \sqrt{6 L_G}, \end{aligned}$$
(7.4)
$$\begin{aligned} \sigma _{min}^{-1}= & {} \varvec{\Lambda }_1^{1/2} = \lambda _1^{1/2} + \lambda _1^{1/2} \left( \theta _1/\sin (\theta _1) - 1 \right) \le 2 \lambda _1^{1/2} = 2 \tau /\pi \;, \end{aligned}$$
(7.5)

where in (7.4) and (7.5) we used (7.3). Moreover, by Lemma 7.1 (E2),

$$\begin{aligned} \dfrac{\sigma _{max}}{\sigma _{min}} = \left( \dfrac{ \varvec{\Lambda }_1}{ \varvec{\Lambda }_{\varphi (m_{\ell },1)}}\right) ^{1/2} \le \left( \dfrac{ \lambda _1}{ \lambda _{\varphi (m_{\ell },1)}}\right) ^{1/2} \left( 1 + \frac{\pi ^2}{16 (m+1)^2} \right) \le 2 m_{\ell } \end{aligned}$$
(7.6)

since \(m \ge 1\) and \(\sqrt{\lambda _1/\lambda _{\varphi (m_{\ell },1)}} = m_{\ell }\), and by Lemma 7.1 (E1),

$$\begin{aligned} {\text {trace}}(\varvec{{\mathcal {C}}})&\le {\text {trace}}({\mathcal {C}})+ \sum _{i=1}^{m d} ( \varvec{\Lambda }_i - \lambda _i) \le {\text {trace}}({\mathcal {C}})+ (d \tau ^2 / 6) \ m/(m+1)^2 \nonumber \\&\le 2 {\text {trace}}({\mathcal {C}}) = d \tau ^2 / 3, \quad \text {since } {\text {trace}}({\mathcal {C}}) = d \tau ^2 / 6. \end{aligned}$$
(7.7)
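The trace bound (7.7) can be spot-checked directly from the explicit eigenvalues (7.2); the values of \(\tau \) and d below are illustrative.

```python
import math

tau, d = 1.7, 3    # illustrative parameter values
for m in (1, 5, 50, 500):
    tr = 0.0
    for k in range(1, m + 1):
        lam = (tau / (k * math.pi)) ** 2
        th = k * math.pi / (2 * (m + 1))
        tr += d * lam * (th / math.sin(th)) ** 2   # eigenvalue of bold C, multiplicity d
    # trace(C) = d tau^2 / 6, and the discretized trace stays below twice that
    assert tr <= d * tau ** 2 / 3 + 1e-12
```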

Let \(R_m=8 \sqrt{40} (A + {\text {trace}}(\varvec{{\mathcal {C}}}))^{1/2} \sigma _{max} L K^{-1/2}\) denote the RHS of (2.24). Then using (7.4), (7.7), \(L=1+\kappa \), \(K=1/2\), and \(A=M_G^2 \tau ^5 / \pi ^4\), we have

$$\begin{aligned} R_m^2&\le 128 \times 40 \times 6 L_G (1+\kappa )^2 ( (\tau ^5 / \pi ^4) M_G^2 + {\text {trace}}(\varvec{{\mathcal {C}}}) ) \\&\le 256 \times 20 \times \kappa (1 + \kappa )^2 \left( 3 (\tau ^3/\pi ^2) M_G^2 + d \pi ^2 \right) \\&\le 256 \times 20 \pi ^2 \kappa (1+\kappa )^2 ( (\tau / \pi )^3 M_G^2 + d) = R^2 \end{aligned}$$

which implies that R defined in (3.6) satisfies (2.24). Moreover, by (7.6),

$$\begin{aligned} \dfrac{\sigma _{max}}{\sigma _{min}} L\le 2 m_{\ell } (1+\kappa ) \le 2 \sqrt{3 \kappa } (1+\kappa ) . \end{aligned}$$
(7.8)

Inserting (7.8) into the LHS and RHS of (2.27) gives (3.10). Thus, whenever T satisfies condition (3.10), condition (2.27) holds.

Invoke Corollary 2.12. By Corollary 2.12 and using \(K=1/2\), as long as T satisfies (3.10),

$$\begin{aligned} {\mathcal {W}}^{0,1} (\nu _m \pi _m^k, \mu _m)&\le C_{m} e^{-c k} \left( 1 + \sqrt{\epsilon _m} M_1(\nu _m) + (1/8) e^{-R/(2 T)} \right) \end{aligned}$$
(7.9)

holds with the dimension-free rate in (3.7) and the constants:

$$\begin{aligned} \begin{aligned} C_{m}&= \max (2 T \sigma _{min}^{-1}, 23 (A + {\text {trace}}(\varvec{{\mathcal {C}}}))^{1/2} e^{R/(2 T)}), \\ \epsilon _m&= (1/160) (A + {\text {trace}}( \varvec{{\mathcal {C}}} ))^{-1} e^{-R/T} \;. \end{aligned} \end{aligned}$$
(7.10)

These dimension-dependent constants can be upper bounded by dimension-free constants C and \(\epsilon \) given in (3.8) and (3.9), by using \(A=(\tau ^5 / \pi ^4) M_G^2 \), (7.5) and (7.7). Thus, (3.11) holds. \(\square \)

7.2 Proofs of results for PIMD

To prove Theorem 3.5, we compare the eigenvalues of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) to those of \({\mathcal {C}}_{{\mathsf {a}}}\). The leading eigenvalue of \({\mathcal {C}}_{{\mathsf {a}}}\) has multiplicity d, while all of the other eigenvalues have multiplicity 2d. If m is odd, then the leading eigenvalue of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) has multiplicity d, while all of the other eigenvalues of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) have multiplicity 2d. However, if m is even, then the trailing and leading eigenvalues of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) have multiplicity d, while all of the other eigenvalues of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) have multiplicity 2d. To account for these multiplicities, it is helpful to define the index function

$$\begin{aligned} \varphi (k,j) = {\left\{ \begin{array}{ll} j &{} \text {if }k=1,~~ 1 \le j \le d, \\ 2 d (k-2)+j+d &{} \text {if }k>1, ~~ 1 \le j \le 2 d. \end{array}\right. } \end{aligned}$$

For any \(k\in {\mathbb {N}}\), the eigenvalues of \({\mathcal {C}}_{{\mathsf {a}}}\) are

$$\begin{aligned} \lambda _{\varphi (k,j)}= {\left\{ \begin{array}{ll} {\mathsf {a}}^{-1} &{} \text {if }k=1, ~~ 1 \le j \le d, \\ \dfrac{1}{{\mathsf {a}}+\omega _k^2} &{} \text {if }k>1,~~ 1 \le j \le 2 d. \end{array}\right. } \end{aligned}$$
(7.11)

For all \(m \in {\mathbb {N}}\) and \(k \in \{1, \dots , \lceil (m+1)/2 \rceil \}\), the eigenvalues of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) are

$$ \begin{aligned} \varvec{\Lambda }_{\varphi (k,j)} = {\left\{ \begin{array}{ll} {\mathsf {a}}^{-1} &{} \text {if }k=1, ~~ 1 \le j \le d, \\ \dfrac{1}{{\mathsf {a}} +\omega _k^2 \sin ^2(\theta _k)/\theta _k^2} &{} {\left\{ \begin{array}{ll} \text {if }k>1 \& k \ne \frac{m}{2}+1, &{} 1 \le j \le 2 d, \\ \text {if }k=\frac{m}{2}+1, &{} 1 \le j \le d, \end{array}\right. } \end{array}\right. } \end{aligned}$$
(7.12)

Here we have introduced

$$\begin{aligned} \theta _k := \ \frac{(k-1) \pi }{m} \quad \text {and} \quad \omega _{k}^2 := \frac{4 (k-1)^2 \pi ^2}{\beta ^2}. \end{aligned}$$
(7.13)

Note that the definition in (7.12) covers both odd and even values of m. The following lemma estimates the error in the eigenvalues of \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) relative to those of \({\mathcal {C}}_{{\mathsf {a}}}\).

Lemma 7.2

For any \(m \in {\mathbb {N}}\) and \(k \in \{1, \dots , \lceil (m+1)/2 \rceil \}\),

  • (E1) \(| \varvec{\Lambda }_{\varphi (k,1)} - \lambda _{\varphi (k,1)} | = \varvec{\Lambda }_{\varphi (k,1)} - \lambda _{\varphi (k,1)} \le \lambda _{\varphi (k,1)} 2 \theta _k^2 \).

  • (E2) \(\left( \dfrac{ \varvec{\Lambda }_1}{ \varvec{\Lambda }_{\varphi (k,1)}}\right) ^{1/2} \le \left( \dfrac{\lambda _1}{\lambda _{\varphi (k,1)}}\right) ^{1/2}\).

Proof

This lemma is an easy consequence of the inequalities in (7.3). For \(k=1\), (E1) and (E2) trivially hold since \(\varvec{\Lambda }_{\varphi (k,1)}=\lambda _{\varphi (k,1)}={\mathsf {a}}^{-1}\). For \(k>1\), (7.12), (7.11), and (7.3) imply

$$\begin{aligned} | \varvec{\Lambda }_{\varphi (k,j)}&- \lambda _{\varphi (k,j)} | = \varvec{\Lambda }_{\varphi (k,j)} - \lambda _{\varphi (k,j)} \\&= \left( [{\mathsf {a}}+\omega _k^2 \sin ^2(\theta _k)/\theta _k^2 ]^{-1} - \lambda _{\varphi (k,j)} \right) \\&\le \left( [{\mathsf {a}}+\omega _k^2 - \omega _k^2 \theta _k^2 /3]^{-1} - \lambda _{\varphi (k,j)} \right) \quad \text {since } \sin ^2(\theta _k)/\theta _k^2 \ge 1-\theta _k^2 / 3 \\&\le \lambda _{\varphi (k,j)} \left( [ 1 - \omega _k^2 \theta _k^2 /(3 ({\mathsf {a}}+\omega _k^2))]^{-1} - 1 \right) \\&\le \lambda _{\varphi (k,j)} \theta _k^2/ 3 (1 - \theta _k^2/ 3)^{-1} \\&\le \lambda _{\varphi (k,j)} 2 \theta _k^2 \quad \text {as required for (E1).} \end{aligned}$$

For (E2) with \(k>1\), by (7.12), (7.11), and (7.3),

$$\begin{aligned} \dfrac{ \varvec{\Lambda }_1}{ \varvec{\Lambda }_{\varphi (k,j)}} = 1+\frac{\omega _k^2}{{\mathsf {a}}} \frac{\sin ^2(\theta _k)}{\theta _k^2} \le 1+\frac{\omega _k^2}{{\mathsf {a}}} = \dfrac{\lambda _1}{\lambda _{\varphi (k,j)}}. \end{aligned}$$

Taking square roots of both sides then gives (E2). \(\square \)
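As a quick numerical sanity check of Lemma 7.2 (separate from the proof above), the bounds (E1) and (E2) can be tested directly from the explicit formulas (7.11)–(7.13); the parameter values \({\mathsf {a}}=1\), \(\beta =2\), \(m=8\) below are arbitrary illustrative choices.

```python
import math

def eigenvalue_pair(a, beta, m, k):
    """Return (Lambda, lam): the k-th distinct eigenvalue of the discretized
    operator, cf. (7.12), and of the limiting operator, cf. (7.11)."""
    if k == 1:
        return 1.0 / a, 1.0 / a
    theta = (k - 1) * math.pi / m                              # theta_k in (7.13)
    omega_sq = 4.0 * (k - 1) ** 2 * math.pi ** 2 / beta ** 2   # omega_k^2 in (7.13)
    lam = 1.0 / (a + omega_sq)
    Lam = 1.0 / (a + omega_sq * math.sin(theta) ** 2 / theta ** 2)
    return Lam, lam

# Illustrative parameters (not taken from the paper).
a, beta, m = 1.0, 2.0, 8
for k in range(1, math.ceil((m + 1) / 2) + 1):
    Lam, lam = eigenvalue_pair(a, beta, m, k)
    theta = (k - 1) * math.pi / m
    # (E1): the error is nonnegative and bounded by 2 * lam * theta_k^2.
    assert 0.0 <= Lam - lam <= 2.0 * lam * theta ** 2
    # (E2): the ratio of leading to k-th eigenvalue only shrinks under discretization.
    assert math.sqrt((1.0 / a) / Lam) <= math.sqrt((1.0 / a) / lam) + 1e-15
print("Lemma 7.2 holds for the sample parameters")
```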

Proof of Theorem 3.5

This proof is very similar to the proof of Theorem 3.2; the main differences are highlighted below.

Verify Assumption 2.6 (B1)–(B3). Since both \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\) and \({\mathcal {C}}_{{\mathsf {a}}}\) have leading eigenvalue \({\mathsf {a}}^{-1}\), (B1) holds with \(L=1+6 {\mathsf {a}}^{-1} L_G\). Similarly, (B3) holds with \(K=1/2\) and \(A=(1/2) \beta \lambda _1^2 M_G^2= (1/2) \beta {\mathsf {a}}^{-2} M_G^2\). Moreover, (B2) holds with \(n=2 m_{\ell } d -d = \varphi (m_{\ell }, 2d)\), since \(n+1= \varphi (m_{\ell }+1,1)\), \(m_{\ell } \ge \sqrt{3 L_G/2} (\beta /\pi )\), and \(m \ge 2 \pi m_{\ell }\).

Verify Conditions (2.24) & (2.27). By (7.12) and (7.13),

$$\begin{aligned} \sigma _{min} = {\mathsf {a}}^{1/2}, \quad \sigma _{max} \le \beta ^{-1} \left( \beta ^2 {\mathsf {a}} + 4 (m_{\ell }-1)^2 \pi ^2 \right) ^{1/2} \le \sqrt{{\mathsf {a}} + 6 L_G}, \end{aligned}$$
(7.14)

since \(m_{\ell }-1<\sqrt{3 L_G/2} (\beta /\pi )\). Moreover, by Lemma 7.2 (E2),

$$\begin{aligned} \frac{\sigma _{max}}{\sigma _{min}} \le \left( 1+ \frac{\omega _{m_{\ell }}^2}{{\mathsf {a}}} \right) ^{1/2}&= \left( 1+ \frac{4 (m_{\ell }-1)^2 \pi ^2}{\beta ^2 {\mathsf {a}}} \right) ^{1/2} \le \left( 1+\frac{6 L_G}{{\mathsf {a}}} \right) ^{1/2}. \end{aligned}$$
(7.15)

Furthermore, by Lemma 7.2 (E1),

$$\begin{aligned} {\text {trace}}(\varvec{{\mathcal {C}}}_{{\mathsf {a}}})&\le {\text {trace}}({\mathcal {C}}_{{\mathsf {a}}})+ 2 d \sum _{k=1}^{\lceil (m+1)/2 \rceil } ( \varvec{\Lambda }_{\varphi (k,1)} - \lambda _{\varphi (k,1)}) \nonumber \\&\le {\text {trace}}({\mathcal {C}}_{{\mathsf {a}}}) + d \beta ^2 = \frac{d}{2 {\mathsf {a}}}+ \frac{d \beta }{4 \sqrt{{\mathsf {a}}}} \left( 1 + \frac{2}{e^{\sqrt{{\mathsf {a}}} \beta }-1} \right) + d \beta ^2 \nonumber \\&\le 2 d ( {\mathsf {a}}^{-1} + \beta ^2 ) \end{aligned}$$
(7.16)
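The middle inequality in (7.16) hides a short computation, which can be sketched as follows (the \(k=1\) term vanishes since \(\varvec{\Lambda }_1 = \lambda _1\)): by (E1) together with \(\lambda _{\varphi (k,1)} \le \omega _k^{-2}\) for \(k>1\), and since the sum has at most \(m/2\) nonzero terms,

```latex
\begin{aligned}
2d \sum_{k=2}^{\lceil (m+1)/2 \rceil}
  \bigl( \varvec{\Lambda}_{\varphi(k,1)} - \lambda_{\varphi(k,1)} \bigr)
&\le 2d \sum_{k=2}^{\lceil (m+1)/2 \rceil} 2\,\lambda_{\varphi(k,1)}\, \theta_k^2
 \le 4d \sum_{k=2}^{\lceil (m+1)/2 \rceil}
   \frac{\beta^2}{4 (k-1)^2 \pi^2} \cdot \frac{(k-1)^2 \pi^2}{m^2} \\
&\le 4d \cdot \frac{m}{2} \cdot \frac{\beta^2}{4 m^2}
 = \frac{d \beta^2}{2 m} \le d \beta^2 .
\end{aligned}
```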

where in the last step we used \(1 + 2/(e^{2 {\mathsf {x}}}-1) < {\mathsf {x}} + {\mathsf {x}}^{-1}\) valid for all \({\mathsf {x}}>0\).
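This elementary inequality can be verified, for instance, from \(e^{2 {\mathsf {x}}} \ge 1 + 2{\mathsf {x}} + 2{\mathsf {x}}^2\) (the quadratic Taylor lower bound of the exponential) together with \(\frac{1}{1+{\mathsf {x}}} \ge 1-{\mathsf {x}}\) for \({\mathsf {x}}>0\):

```latex
\begin{aligned}
1 + \frac{2}{e^{2\mathsf{x}} - 1}
\le 1 + \frac{2}{2\mathsf{x} + 2\mathsf{x}^2}
= 1 + \frac{1}{\mathsf{x}} - \frac{1}{1+\mathsf{x}}
\le 1 + \frac{1}{\mathsf{x}} - (1 - \mathsf{x})
= \mathsf{x} + \frac{1}{\mathsf{x}} .
\end{aligned}
```

Here the middle equality uses the partial-fraction identity \(\frac{1}{\mathsf{x}(1+\mathsf{x})} = \frac{1}{\mathsf{x}} - \frac{1}{1+\mathsf{x}}\).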

Let \(R_m=8 \sqrt{40} (A + {\text {trace}}(\varvec{{\mathcal {C}}}))^{1/2} \sigma _{max} L K^{-1/2}\) denote the RHS of (2.24). Using \(L=1+ 6 {\mathsf {a}}^{-1} L_G\), \(K=1/2\), \(A=(1/2) (\beta / {\mathsf {a}}^2) M_G^2\), (7.14), and (7.16),

$$\begin{aligned} R_m^2&\le 128 \times 40 {\mathsf {a}} (1 + 6 L_G {\mathsf {a}}^{-1})^3 ( (1/2) \beta {\mathsf {a}}^{-2} M_G^2 + 2 d (\beta ^2 + {\mathsf {a}}^{-1}) ) \\&\le 256 \times 20 (1+6 L_G {\mathsf {a}}^{-1})^3 \left( (1/2) \beta {\mathsf {a}}^{-1}M_G^2 + 2 d (\beta ^2 {\mathsf {a}} + 1) \right) \\&\le 256 \times 20 (1+\kappa )^3 \left( (1/2) \beta {\mathsf {a}}^{-1} M_G^2+ 2 d (\beta ^2 {\mathsf {a}} + 1) \right) = R^2, \end{aligned}$$

which implies that R defined in (3.18) satisfies (2.24). Moreover, by (7.15),

$$\begin{aligned} \left( \dfrac{\sigma _{max}}{\sigma _{min}}\right) L\le (1+\kappa )^{3/2}. \end{aligned}$$
(7.17)

Inserting (7.17) into (2.27) gives (3.22).

Invoke Corollary 2.12. By Corollary 2.12, provided that T satisfies (3.22), (7.9) holds with the dimension-free rate in (3.19) and the constants in (7.10), with \(\varvec{{\mathcal {C}}}\) replaced by \(\varvec{{\mathcal {C}}}_{{\mathsf {a}}}\). Moreover, using \(A=(1/2) \beta M_G^2 {\mathsf {a}}^{-2}\), (7.14), and (7.16), the dimension-dependent constants in (7.10) can be upper bounded by the dimension-free constants C and \(\epsilon \) given in (3.20) and (3.21). Thus, (3.11) holds for the transition kernel of (3.17). \(\square \)