Two-scale coupling for preconditioned Hamiltonian Monte Carlo in infinite dimensions

We derive non-asymptotic quantitative bounds for convergence to equilibrium of the exact preconditioned Hamiltonian Monte Carlo algorithm (pHMC) on a Hilbert space. As a consequence, explicit and dimension-free bounds for pHMC applied to high-dimensional distributions arising in transition path sampling and path integral molecular dynamics are given. Global convexity of the underlying potential energies is not required. Our results are based on a two-scale coupling which is contractive in a carefully designed distance.


Introduction
Hamiltonian or Hybrid Monte Carlo (HMC) methods are a class of Markov Chain Monte Carlo (MCMC) methods originating in statistical physics [20] which have become increasingly popular in various application areas [11,48,53,60,64]. Their success is in particular due to empirically observed convergence acceleration compared to more traditional, random-walk based methods. The basic idea in HMC is to define an MCMC method with the help of an artificial Hamiltonian dynamics whose only purpose is to accelerate convergence to equilibrium. This Hamiltonian dynamics is designed to leave invariant a product of the target measure and a fictitious Gaussian measure in an artificial velocity variable. First rigorous theoretical results supporting the empirical evidence have only been established recently. In particular, geometric ergodicity has been verified in [10,22,49], and quantitative convergence bounds have been derived in the strongly convex case in [52], and under more general assumptions in [9], both by applying coupling methods.
Since many applications are high dimensional, a key issue is to understand the dependence of the convergence bounds on the dimension. Here, we study the problem of dimension dependence for a special class of models that is relevant for several important applications including Path Integral Molecular Dynamics (PIMD) [14-17,31,32,43,44,50,57,58], Transition Path Sampling (TPS) [6,59,63,65], and Bayesian inverse problems [7,18,41,68]. For the class of models we consider, a corresponding HMC Markov chain relying on a preconditioned Hamiltonian dynamics can be defined directly on the infinite dimensional state space [3]. This suggests that one might hope for dimension-free convergence bounds for the corresponding Markov chains on finite-dimensional discretizations of the state space. Corresponding dimension-free convergence rates to equilibrium have been established for the preconditioned Crank-Nicolson (pCN) algorithm [36] and for the Metropolis-adjusted Langevin algorithm (MALA) [24], but a corresponding result for HMC is not known so far.
The goal of this paper is to fill this gap. To this end we extend the coupling approach developed for HMC in the finite dimensional case in [9], and combine it with a two-scale coupling approach for stochastic dynamics on infinite dimensional Hilbert spaces that originates in [33,34,54,55] and has been further developed in [71]. The splitting into "low modes" and "high modes" in the two-scale coupling can be traced back to contraction results for the stochastic Navier-Stokes equations [55], and analogous results in the deterministic setting [28]; see [54] for a detailed review.
Our object of study is the exact preconditioned HMC algorithm (pHMC) with fixed durations on a Hilbert space, i.e., the (preconditioned) Hamiltonian dynamics is exactly integrated (or, in practical terms, the integration is carried out with very small step sizes). Here, preconditioning corresponds to an appropriate choice of the kinetic energy which involves picking the mass operator equal to the stiffness operator (or inverse covariance) associated to the Gaussian reference measure of the target probability measure. This choice of kinetic energy ensures that the corresponding pHMC algorithm is more amenable to numerical approximation and Metropolis-adjustment than HMC without preconditioning [3,11].
We prove that the transition kernel of the Markov chain induced by the pHMC algorithm is contracting in a suitable Wasserstein/Kantorovich metric with a rate that depends transparently on the duration of the Hamiltonian flow, the eigenvalues of the covariance operator of the Gaussian reference measure, and the regularity of the preconditioned Hamiltonian dynamics. The results are given in a more general setting that includes pHMC as a special case, and also covers other types of dynamics and preconditioning strategies. As a consequence of our general results, we derive dimension-free bounds for pHMC applied to finite-dimensional approximations arising in TPS and PIMD.
Before stating our results in detail, we conclude with a brief outlook. The results below apply only to pHMC with exact integration of the Hamiltonian dynamics. In practice, the Hamiltonian dynamics is numerically approximated to obtain versions of pHMC that are implementable on a computer. The time integrator of choice for pHMC is the symmetric splitting integrator introduced in [3]. Unlike other splittings for the Hamiltonian dynamics, this approximation has an acceptance rate that is uniform with respect to the spatial step size associated with the discretization of the Hilbert space [11, § 8]. Time discretization creates a bias in the invariant measure that can be avoided by a Metropolis adjustment [11,70]. We would expect that for unadjusted numerical HMC based on the integrator proposed in [3], similar contraction results as stated below hold if the time step size is chosen sufficiently small (but independently of the dimension). Under additional regularity assumptions, one could also hope for dimension-free bounds for the Metropolis adjusted version. First steps in this direction are carried out in [9, § 2.5.4] in the finite dimensional case, and in [62, § 4] in a strongly convex infinite dimensional case, but a full study in the general case would be lengthy and go beyond the scope of the current work.
Alternatively to preconditioning, it is also possible (but trickier) to implement non-preconditioned HMC, which corresponds to injecting white noise into the velocity variable. In this case, the corresponding Hamiltonian dynamics is highly oscillatory in the high modes [61]. Therefore, convergence bounds for exact HMC without preconditioning on an infinite dimensional Hilbert space can be expected to hold only if the durations are randomized [60], and in numerical implementations, strongly stable integrators [43,44] have to be used in order to be able to choose the step size independently of the dimension. Furthermore, scaling limit results show that for Metropolis adjusted HMC applied to i.i.d. product measures on high dimensional state spaces, the step size has to be chosen of order O(d^{-1/4}) to avoid degeneracy of the acceptance probabilities [2,30,42].
We now state our main results in Sect. 2, and consider applications to TPS and PIMD in Sect. 3. The remaining sections contain the proofs of all results.

Main results
Let H be a separable real Hilbert space with inner product ⟨·, ·⟩ and norm |·|. Let C : H → H be a positive, compact, symmetric linear operator. By the spectral theorem, the eigenfunctions {e_i}_{i∈N} of C form a complete orthonormal basis of H with corresponding eigenvalues {λ_i}_{i∈N}, which we arrange in descending order, i.e., λ_1 ≥ λ_2 ≥ · · · . The positivity condition means that λ_j > 0 for all j ∈ N, and by compactness, if dim(H) = ∞ then lim_{j→∞} λ_j = 0. Any function x ∈ H can be represented in spectral coordinates by the expansion x = Σ_{i∈N} x_i e_i with x_i = ⟨x, e_i⟩. Moreover, for all s ∈ R, the operator C^s is defined via the spectral decomposition of C. We introduce the family of inner products and norms given by ⟨x, y⟩_s := ⟨C^{-s} x, y⟩ and |x|_s := |C^{-s/2} x|, together with the associated Hilbert spaces H_s [3,18,68]. Here, typically, s ∈ (0, 1). We will now introduce the pHMC method for approximate sampling from a probability measure μ that has a density w.r.t. a Gaussian measure μ_0 on one of the Hilbert spaces H_s. Afterwards, in Sect. 2.2, we will introduce a more general family of Markov chains on Hilbert spaces that includes the Markov chain associated to pHMC as a special case. In Sect. 2.3, we introduce a new coupling for these Markov chains that combines ideas from [9,71]. Then in Sects. 2.4 and 2.5, we state our main contraction result for these couplings, and derive quantitative error bounds.
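In finite truncations these spectral objects are straightforward to compute. The following sketch evaluates the norm |x|_s = |C^{-s/2} x| in spectral coordinates; the eigenvalue decay λ_i = i^{-2} is an illustrative assumption made only for this example.

```python
import numpy as np

def hs_norm(x_coeffs, eigvals, s):
    """Return |x|_s for x given by its spectral coordinates x_i = <x, e_i>,
    i.e. |x|_s = |C^{-s/2} x| = sqrt(sum_i lambda_i^{-s} x_i^2)."""
    return float(np.sqrt(np.sum(eigvals ** (-s) * x_coeffs ** 2)))

# Assumed spectrum lambda_i = i^{-2} (typical of an inverse Laplacian in 1D)
eigvals = np.arange(1, 101, dtype=float) ** -2.0
x = eigvals.copy()  # a smooth element: coefficients decay like the eigenvalues
print(hs_norm(x, eigvals, s=0.0))  # s = 0 recovers the plain Hilbert norm |x|
```

For s > 0 the weights λ_i^{-s} ≥ 1 penalize the high modes more strongly, so |x|_s ≥ |x| for this spectrum, reflecting that H_s is a smaller, smoother space.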

Exact preconditioned Hamiltonian Monte Carlo
Let μ_0 = N(0, C) denote the centered Gaussian measure whose covariance operator w.r.t. the inner product ⟨·, ·⟩ is C [5]. If C is trace class then μ_0 is supported on H. More generally, we fix s ∈ (−∞, 1) and assume that μ_0 is supported on the corresponding Hilbert space H_s. This is ensured by the following assumption:

Assumption 2.1 trace(C^{1−s}) = Σ_{i∈N} λ_i^{1−s} < ∞.

A realization ξ from μ_0 can be generated using the expansion ξ = Σ_{i∈N} λ_i^{1/2} ζ_i e_i with i.i.d. standard normal random variables ζ_i. For ξ ∼ μ_0, Assumption 2.1 implies E|ξ|_s^2 = trace(C^{1−s}) < ∞, and thus, ξ is indeed a Gaussian random variable on H_s.
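The expansion above is easy to implement in a finite truncation. The sketch below (again with the assumed spectrum λ_i = i^{-2}) samples ξ ∼ μ_0 and checks the identity E|ξ|_s^2 = trace(C^{1−s}) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative assumed spectrum of C: lambda_i = i^{-2}
eigvals = np.arange(1, 201, dtype=float) ** -2.0

def sample_mu0(rng, eigvals):
    """Truncated expansion xi = sum_i sqrt(lambda_i) zeta_i e_i,
    returned as the vector of spectral coordinates."""
    return np.sqrt(eigvals) * rng.standard_normal(eigvals.size)

# Monte Carlo check of E|xi|_s^2 = trace(C^{1-s}) for s = 1/2
s = 0.5
vals = [np.sum(eigvals ** (-s) * sample_mu0(rng, eigvals) ** 2)
        for _ in range(20000)]
print(np.mean(vals), np.sum(eigvals ** (1.0 - s)))
```

The two printed numbers agree up to Monte Carlo error, confirming that the truncated expansion reproduces the correct second moments on H_s.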

Remark 2.2
To avoid confusion, we stress that the covariance operator of a Gaussian measure is a non-intrinsic object that depends on the choice of an inner product. In particular, the covariance operator of μ 0 w.r.t. the H s inner product is C 1−s . Nonetheless, in what follows, we always define the covariance operator with respect to the H inner product, and in this sense, the measure μ 0 has covariance operator C.
Exact preconditioned Hamiltonian Monte Carlo (pHMC) is an MCMC method for approximate sampling from probability distributions on a Hilbert space that have the general form μ(dx) ∝ exp(−U(x)) μ_0(dx), where U is a function on a Hilbert space on which the Gaussian measure μ_0 is supported. The pHMC method generates a Markov chain on this Hilbert space with transition step X(x) = q_T(x, ξ). Here ξ ∼ N(0, C), and the duration T : Ω → R_+ is in general an independent random variable with a given distribution ν (e.g. ν = δ_r or ν = Exp(λ^{−1})). We will only consider the case where T ∈ (0, ∞) is a given deterministic constant. Moreover, (q_t, v_t) is the exact flow of the Hilbert-space-valued ODE given by

dq_t/dt = v_t,  dv_t/dt = −q_t − C DU(q_t). (2.5)

Formally, (2.5) is a preconditioned Hamiltonian dynamics for the Hamiltonian H(x, v) = U(x) + (1/2)⟨x, C^{−1} x⟩ + (1/2)⟨v, C^{−1} v⟩, where the covariance operator C is used for preconditioning. A key property of (2.5) is that it leaves invariant the probability measure μ̂(dx dv) = μ(dx) N(0, C)(dv) on phase space, and in turn, this implies that the transition kernel of pHMC defined by π(x, B) = P[X(x) ∈ B] leaves μ invariant [3]. Below, our key assumption in this setup will be that U is a gradient Lipschitz function on the Hilbert space H_s where the reference measure μ_0 is supported:

Assumption 2.3 The target measure μ is a probability measure on H_s that is absolutely continuous with respect to μ_0. The relative density is proportional to exp(−U), where U : H_s → [0, ∞) is a Fréchet differentiable function satisfying the gradient Lipschitz condition

|∂_h U(x) − ∂_h U(y)| ≤ L_g |x − y|_s |h|_s for all x, y, h ∈ H_s,

for some finite positive constant L_g.
In Assumption 2.3, ∂_h U denotes the directional derivative of U in direction h. We also use the notation DU to denote the differential of U, i.e., (DU)(x) is the linear functional h ↦ ∂_h U(x).
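For concreteness, here is a finite-dimensional sketch of one pHMC transition step. A leapfrog scheme stands in for the exact flow of (2.5); the diagonal covariance, the step size, and the toy potential U = 0 used in the check below are illustrative assumptions only.

```python
import numpy as np

def phmc_step(x, eigvals, grad_U, T, dt, rng):
    """One pHMC transition X(x) = q_T(x, xi), with xi ~ N(0, C) and the flow
    of dq/dt = v, dv/dt = -q - C grad U(q) approximated by leapfrog.
    C is assumed diagonal with entries `eigvals` (spectral coordinates)."""
    q = x.copy()
    v = np.sqrt(eigvals) * rng.standard_normal(x.size)   # xi ~ N(0, C)
    force = lambda q: -q - eigvals * grad_U(q)           # preconditioned force
    n = max(1, int(round(T / dt)))
    v = v + 0.5 * dt * force(q)
    for i in range(n):
        q = q + dt * v
        if i < n - 1:
            v = v + dt * force(q)
    v = v + 0.5 * dt * force(q)
    return q

# Sanity check: with U = 0 the flow is a rotation and N(0, C) is invariant,
# so the variance of each coordinate converges to the matching eigenvalue.
rng = np.random.default_rng(1)
eigvals = np.array([1.0, 0.25])
x = np.zeros(2)
samples = []
for _ in range(6000):
    x = phmc_step(x, eigvals, lambda q: np.zeros_like(q), T=1.0, dt=0.05, rng=rng)
    samples.append(x[0])
print(np.var(samples))  # close to lambda_1 = 1
```

Note that with U = 0 every mode oscillates with the same frequency, which is precisely the effect of the preconditioning by C.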

Remark 2.4
The global Lipschitz condition in Assumption 2.3 is essentially the same as Condition 3.2 in [3], except that here the domain H_s of the potential energy is defined in terms of the covariance operator itself rather than in terms of an auxiliary operator with related eigenfunctions and eigenvalues.

General setting
We now introduce a more general setup that includes the Markov chain induced by pHMC as a special case. We fix s ∈ R and a function b : H_s → H_s. Let φ_t = (q_t, v_t) denote the exact flow of the Hilbert-space-valued ODE given by

dq_t/dt = v_t,  dv_t/dt = b(q_t). (2.9)

As above, we fix a constant duration T ∈ (0, ∞) and consider the Markov chain on H_s with transition step

X(x) = q_T(x, ξ), ξ ∼ N(0, C̃), (2.10)

where C̃ is a linear operator on H with the same eigenfunctions as C.
Under the corresponding trace-class condition (Assumption 2.5), for ξ ∼ N(0, C̃) we have E|ξ|_s^2 = trace(C̃ C^{−s}) < ∞, and thus ξ in (2.10) is a Gaussian random variable on H_s. Let π(x, B) = P[X(x) ∈ B] denote the corresponding transition kernel. In particular, in the case where b is given by (2.8) and C̃ = C, we recover the Markov chain associated to pHMC. When C̃ ≠ C, the choice b(x) = −C̃ C^{−1} x − C̃ DU(x) ensures that the corresponding partially preconditioned dynamics in (2.9) leaves invariant the probability measure μ(dx) N(0, C̃)(dv).
Our main result rests on the assumption that the Hilbert space H_s can be split into a finite-dimensional subspace H_{s,ℓ} ("the low modes") and its orthogonal complement H_{s,h} ("the high modes") such that b(x) is close to a linear map on H_{s,h}. More precisely, fix n ∈ N, and let H_{s,ℓ} := span{e_1, . . . , e_n} and H_{s,h} := H_{s,ℓ}^⊥. The precise conditions are stated in Assumption 2.6, and the resulting Foster-Lyapunov bound in Lemma 2.7; the proof of this lemma is given in Sect. 5.

Two-scale coupling
We now introduce a coupling for the transition steps of two copies of the Markov chain starting at different initial conditions x and y. We use a synchronous coupling of the high modes in H_{s,h} and a different coupling for the low modes in H_{s,ℓ} that together enable us to derive a weak form of contractivity. Note that the covariance operator C has a bounded inverse on the finite-dimensional subspace H_ℓ. Therefore, for h ∈ H_ℓ, the Gaussian measures N(h, C) and N(0, C) restricted to H_ℓ are mutually absolutely continuous. Let γ > 0 be a positive constant; its precise value will be chosen in an appropriate way below. The coupling transition step is given by (2.15), with ξ ∼ N(0, C) and η defined in high/low components as η_h := ξ_h and, in the low modes, by (2.16): η_ℓ := ξ_ℓ + γ z_ℓ on an event of maximal probability, and η_ℓ := R ξ_ℓ otherwise. Here U ∼ Unif(0, 1) is independent of ξ, z := x − y, and R is the reflection operator defined by (2.17). Due to Assumption 2.6 (B2), the component in H_{s,h} of the resulting coupled dynamics is contracting in a finite time interval as a result of the linear part of the drift in (2.9). Moreover, the coupling of the components of the initial velocities in H_{s,ℓ} is similar to the coupling in [9], which is inspired by a related coupling for second order Langevin diffusions [25]. It is defined in such a way that ξ_ℓ − η_ℓ = −γ z_ℓ occurs with the maximal possible probability. As illustrated in Fig. 1, in the case where ξ_ℓ − η_ℓ ≠ −γ z_ℓ, a reflection coupling is applied. The corresponding reflection R is an orthogonal transformation w.r.t. the inner product on H_ℓ. In order to verify that (X(x, y), Y(x, y)) is indeed a coupling of the transition probabilities π(x, ·) and π(y, ·), we remark that the distribution of η is N(0, C): this follows from the definition of η in (2.16) and a change of variables, applied to any measurable set B. Here a ∧ b denotes the minimum of real numbers a and b, I_B(·) denotes the indicator function for the set B, and we have used that N(0, C) is invariant under the reflection R, that R z_ℓ = −z_ℓ, and the total variation identity (2.14), where d_TV is the total variation distance.
Hence, by the coupling characterization of the total variation distance, η_ℓ = ξ_ℓ + γ z_ℓ does indeed hold with the maximal possible probability. Note that if z is not in the reproducing kernel Hilbert space of the covariance operator C, then the probability of the event η_ℓ = ξ_ℓ + γ z_ℓ in (2.18) tends to zero as the number of low modes increases. This explains why it is necessary to split the Hilbert space and apply a two-scale coupling.
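The low-mode velocity coupling can be sketched as follows, under the simplifying (purely illustrative) assumption that the covariance restricted to the low modes is the identity. The draw η then equals ξ + γ z_ℓ with the maximal possible probability, is a reflection of ξ otherwise, and retains the correct N(0, I) marginal.

```python
import numpy as np

def couple_low_modes(z_low, gamma, rng):
    """Return (xi, eta) with xi ~ N(0, I): eta = xi + gamma*z_low with maximal
    probability, else eta = R xi with R the reflection satisfying R z = -z.
    The identity low-mode covariance is an illustrative assumption."""
    xi = rng.standard_normal(z_low.size)
    e = z_low / np.linalg.norm(z_low)           # unit vector along z
    w = e @ xi                                  # component of xi along e
    c = gamma * np.linalg.norm(z_low)
    phi = lambda t: np.exp(-0.5 * t * t)        # unnormalized N(0,1) density
    if rng.uniform() * phi(w) <= phi(w + c):    # accept the shifted draw
        return xi, xi + gamma * z_low
    return xi, xi - 2.0 * w * e                 # reflection R xi

# Check the marginal of eta: it should be standard normal in each coordinate.
rng = np.random.default_rng(2)
z = np.array([1.0, 0.0])
etas = np.array([couple_low_modes(z, 0.5, rng)[1] for _ in range(40000)])
print(etas.mean(axis=0), etas.var(axis=0))
```

The accept/reject rule is the one-dimensional maximal coupling along the direction of z; coordinates orthogonal to z are coupled synchronously.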

Contractivity
We now state our main contraction bound for the coupling introduced above. We first define a norm |||·|||_α on H_s in which the high modes are weighted by α > 0. The norms |||·|||_α and |·|_s are then equivalent; cf. (2.21).

Remark 2.8
If the dimension is infinite, then the operator C^{−1} C^s is unbounded on H_s, because its inverse is trace class. Nonetheless, |||x|||_α is a well-defined norm for any x ∈ H_s, because the operator C^{−1/2} C^{s/2} appearing in |||x|||_α only acts on the projection x_ℓ of x onto the finite-dimensional space H_{s,ℓ}.
As we will see below, even when U is non-convex, we can still obtain contractivity with respect to a semimetric ρ : H_s × H_s → [0, ∞) of the form (2.22), defined in terms of a concave function f, where R > 0, a > 0, and ε > 0 are parameters to be specified below. The semimetric ρ is similar to the one introduced in [9] in order to prove contractivity of the HMC transition step in the finite dimensional case. In general, ρ is not a metric, since the triangle inequality might be violated. Note that f is non-decreasing, and constant when r ≥ R.

Remark 2.9
The semimetric (2.22) incorporates, in a multiplicative or weighted way, the quadratic Foster-Lyapunov function for pHMC from Lemma 2.7 with weight ε. The idea to use semimetrics of this general form to study contraction properties of Markov processes goes back to [12,35]; see also [26].
Lemma 2.7 implies that the coupling transition (x, y) → (X(x, y), Y(x, y)) also has a quadratic Foster-Lyapunov function. We fix a finite, positive constant R satisfying the lower bound (2.24). In our main result below, we choose α := 4σ_max L. In this case, the choice of R in (2.24) guarantees that a strict drift condition holds for all (x, y) satisfying |||x − y|||_α ≥ R, by (2.21) and since L ≥ 1. The asymptotic strict drift condition in (2.25) allows us to split the proof of contractivity into two parts: (i) |||x − y|||_α ≥ R, where any coupling is contracting in ρ due to (2.25), and (ii) |||x − y|||_α < R, where ρ is contracting due to the specially designed two-scale coupling.
Theorem 2.10 Suppose that Assumption 2.6 holds, and let T > 0 satisfy the condition (2.27). Let α := 4σ_max L, and let γ, a, and ε be given by (2.29), (2.30), and (2.31). Then for any x, y ∈ H_s, the coupling contracts in expectation, with contraction rate c given by (2.33). The proof of this theorem is given in Sect. 6.

Remark 2.11
The rate in (2.33) is similar to the rate in the finite-dimensional case found in Theorem 2.3 of [9]. The main difference is that the condition on L T^2 in (2.27) now reflects the effect of preconditioning.

Quantitative bounds for distance to the invariant measure
Theorem 2.10 establishes global contractivity of the transition kernel π(x, dy) w.r.t. the Kantorovich distance based on the underlying semimetric ρ, defined for probability measures ν, η on H_s by ρ_K(ν, η) := inf_γ ∫ ρ(x, y) γ(dx dy), where the infimum is over all couplings γ of ν and η. Moreover, it implies quantitative bounds for the standard L^1 Wasserstein distance W^1(ν, η) := inf_γ ∫ |x − y|_s γ(dx dy).
Corollary 2.12 Suppose that the assumptions of Theorem 2.10 hold. Then for any k ∈ N and for any probability measures ν, η on H_s, the bound (2.34) holds, where the rate c and the constant ε are given explicitly by (2.33) and (2.31), respectively. In particular, for a given accuracy δ ∈ (0, ∞), this yields an explicit upper bound on the number of steps required so that the distance to equilibrium is at most δ. The corollary is a rather direct consequence of Theorem 2.10. A short proof is included in Sect. 6.

Remark 2.13 (Quantitative bounds for ergodic averages)
MCMC methods are often applied to approximate expectation values w.r.t. the target distribution by ergodic averages of the Markov chain. Our results (e.g. (2.34)) directly imply completely explicit bounds for the bias and variance of these ergodic averages, as well as explicit concentration inequalities, in the case of pHMC. Indeed, the general results by Joulin and Ollivier [40] show that such bounds follow directly from an L^1 Wasserstein contraction w.r.t. an arbitrary metric, which is precisely the statement shown above.

Transition path sampling
Here we discuss the use of pHMC in transition path sampling (TPS). As an application of Theorem 2.10, we obtain dimension-free contraction rates for exact preconditioned HMC in this context. Fix a time horizon τ > 0 (not to be confused with the duration parameter in preconditioned HMC, which we denote by T). The aim of TPS [4,37,38,65] is to sample from a diffusion bridge or conditioned diffusion, i.e., from the conditional law ν_{a,b} of the solution X : [0, τ] → R^d of the SDE (3.1), given both the initial condition X(0) = a and the final condition X(τ) = b. Here the drift of (3.1) is given in terms of a potential energy function on R^d, and W is a d-dimensional standard Brownian motion. TPS is particularly relevant to molecular dynamics, where the states a and b represent different configurations of a molecular system [6,59,63].
We first recenter by subtracting the mean path of the Brownian bridge from a to b. Then by Girsanov's theorem, the measure μ is absolutely continuous with respect to the law μ_0 of the Brownian bridge from 0 to 0 [4,39]. Moreover, the measure μ_0 is the centered Gaussian measure with covariance operator C = (−Δ_D)^{−1}, where Δ_D is the Dirichlet Laplacian, and the relative density of μ with respect to μ_0 is proportional to exp(−U(x)), where the function U(x) is defined in terms of the so-called path potential energy function G : R^d → R as follows:

U(x) = ∫_0^τ G(x(t)) dt. (3.2)

In the main convergence result given below, we make the following regularity assumption on G. This regularity assumption frequently holds in molecular dynamics applications, since the configuration space of molecular systems is usually taken to be a fixed cubic box with periodic boundary conditions [1,8,13,23,29,46,67]. In this case, we can lift the TPS problem to the covering space R^d by extending the path potential to a periodic function on this space. Thus, after recentering the coordinate system, Assumption 3.1 is satisfied whenever G is C^2.
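The reference measure μ_0 can be sampled directly from the Karhunen-Loève expansion in the Dirichlet eigenbasis, e_k(t) = sqrt(2/τ) sin(kπt/τ) with eigenvalues (τ/(kπ))^2. A minimal sketch for d = 1 and τ = 1:

```python
import numpy as np

def sample_bridge(t, tau, n_modes, rng):
    """Truncated Karhunen-Loeve sample of the Brownian bridge from 0 to 0 on
    [0, tau], evaluated at the times in t."""
    k = np.arange(1, n_modes + 1)
    lam = (tau / (np.pi * k)) ** 2                       # eigenvalues of C
    zeta = rng.standard_normal(n_modes)
    basis = np.sqrt(2.0 / tau) * np.sin(np.outer(np.atleast_1d(t), k) * np.pi / tau)
    return basis @ (np.sqrt(lam) * zeta)

# Check: the bridge variance at time t is t(tau - t)/tau, so 1/4 at t = 1/2.
rng = np.random.default_rng(3)
mid = np.array([sample_bridge(0.5, 1.0, 200, rng)[0] for _ in range(20000)])
print(mid.var())
```

The truncation level (200 modes) and the Monte Carlo sample size are illustrative; the variance check converges as both grow.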
To implement TPS on a computer, we use the finite difference method to approximate the infinite-dimensional distribution μ(dx) ∝ exp(−U(x)) N(0, C)(dx) by a finite-dimensional probability measure μ_m. Other approximations, e.g., Galerkin or finite-element, are also possible and should yield similar results. We focus on the finite difference method because it is widely used in practice. Discretize the interval [0, τ] into m + 2 evenly spaced grid points t_j = jτ/(m + 1), j = 0, . . . , m + 1. (3.3) The space of paths on R^d is then approximated by the finite-dimensional space R^{md}: the vector x ∈ R^{md} collects the values of the path at the m interior grid points, while the endpoint values are fixed by the boundary conditions.
The covariance operator C is approximated by the md × md matrix C = −Δ_{D,m}^{−1}, where Δ_{D,m} is the discrete Dirichlet Laplacian, and the Hilbert space H is represented by R^{md} with inner product given by the weighted dot product ⟨x, y⟩ = (τ/(m + 1)) x • y. The functional (3.2) is discretized as U_m(x) = (τ/(m + 1)) Σ_{j=1}^m G(x_j). Note that if the vector x contains the grid values of a smooth function x, then U_m(x) → U(x) as m → ∞. In summary, the infinite-dimensional path distribution μ(dx) is approximated by the finite-dimensional probability measure μ_m(dx) with non-normalized density exp(−U_m(x) − (1/2)⟨x, C^{−1} x⟩). To approximately sample from μ_m, we use pHMC with transition step in (2.10). This corresponds to a Markov chain on R^{md} with transition step (3.4). Let π_m denote the transition kernel of (3.4).
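A minimal sketch of these finite-difference objects for d = 1: the leading eigenvalues of C = −Δ_{D,m}^{−1} approach the continuum values (τ/(kπ))^2, which is the comparison made precise in Sect. 7.1.

```python
import numpy as np

def dirichlet_laplacian(m, tau):
    """Second-difference matrix on the m interior grid points with Dirichlet
    boundary conditions, scaled by the grid spacing h = tau/(m+1)."""
    h = tau / (m + 1)
    return (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
            + np.diag(np.ones(m - 1), -1)) / h ** 2

m, tau = 200, 1.0
C_m = -np.linalg.inv(dirichlet_laplacian(m, tau))
top = np.sort(np.linalg.eigvalsh(C_m))[::-1][:3]         # leading eigenvalues
cont = np.array([(tau / (k * np.pi)) ** 2 for k in (1, 2, 3)])
print(top, cont)
```

The relative error of the low eigenvalues is O((kπ/(m+1))^2), so the low-mode spectrum is essentially dimension-independent for large m.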
Theorem 3.2 Suppose that Assumption 3.1 holds and that the duration parameter T satisfies the dimension-free condition (3.10). Then for all sufficiently large m, any k ∈ N, and any probability measure ν_m on R^{md}, the quantitative bound (3.11) holds.

Remark 3.3
Note that the upper bound in (3.11) depends on the dimension only through the initial distribution. The dimension independence of the other terms in this bound reflects that the finite-dimensional pHMC algorithm in (3.4) converges to an infinite-dimensional pHMC algorithm whose transition kernel satisfies an infinite-dimensional analog of this quantitative bound.
A proof of this result is given in Sect. 7.1.

Path integral molecular dynamics
Here we discuss the use of pHMC for path-integral molecular dynamics (PIMD), and as an application of Theorem 2.10, obtain dimension-free contraction rates for preconditioned HMC in this context. PIMD is used to compute exact Boltzmann properties and approximate dynamical properties of quantum mechanical systems [14]. The technique is based on Feynman's path-integral formulation of quantum statistical mechanics [27], and the observation that the quantum Boltzmann statistical mechanics of a quantum system can be reproduced by the classical Boltzmann statistical mechanics of a ring-polymer system [14].
Consider N interacting quantum particles in 3D with potential energy operator given by

V̂ = V(q̂_1, . . . , q̂_N), (3.12)

where q̂_i is the three-dimensional position operator of particle i and V : R^d → R is a potential energy function with d = 3N [32]. The thermal equilibrium properties of this system are described by the quantum mechanical Boltzmann partition function Q, where β is an inverse temperature parameter. For some a > 0, suppose that the potential energy function can be written as V(q) = (a/2)|q|^2 + G(q), where G : R^d → R. Then the partition function Q can be written as the expected value of a functional of a Gaussian random variable on loop space, see (3.14), where the covariance operator C_a of the Gaussian reference measure is defined in terms of the Laplacian Δ_P with periodic boundary conditions on L^2([0, β], R^d) as C_a = (−Δ_P + aI)^{−1}, with I the identity operator, and the potential energy U(x) is given by U(x) = ∫_0^β G(x(t)) dt. The probability measures μ_0 and μ(dx) ∝ exp(−U(x)) N(0, C_a)(dx) are supported on the loop space consisting of all periodic continuous paths x : [0, β] → R^d. They are similar to the corresponding measures considered for Transition Path Sampling, but there is an additional, artificially introduced parameter a appearing in C_a. This parameter is essential because Δ_P is not invertible, since it has a zero (leading) eigenvalue corresponding to the 'centroid mode' [50].
To implement PIMD on a computer, we use finite differences to truncate the infinite-dimensional path distribution μ to a finite-dimensional one μ_m by discretizing the interval [0, β] into m + 1 grid points t_j = jβ/m, j = 0, . . . , m. (3.16) The space of loops on R^d is approximated by the finite-dimensional space R^{md}. Specifically, we write x ∈ R^{md} in components, where x_{j+1:j+d} := (x_{j+1}, . . . , x_{j+d}) is a d-dimensional vector that can be viewed as an approximation of x(t_j) for j = 1, . . . , m.

Remark 3.4 Comparing (3.3)-(3.16)
, note that the number of grid points in TPS, resp. PIMD, is m + 2, resp. m + 1. Nonetheless, in both cases path and loop space are approximated by R md . The difference in the number of grid points is due to the boundary conditions: in TPS the Dirichlet boundary conditions eliminate two unknown d-dimensional vectors, whereas in PIMD the periodic boundary conditions eliminate only one unknown d-dimensional vector. Thus, the total number of unknowns in both cases is md.
The periodic Laplacian Δ_P is approximated by the md × md discrete periodic Laplacian matrix Δ_{P,m}, defined entrywise by the standard second-difference stencil with periodic wrap-around.
Naturally, the covariance operator C_a is approximated by the md × md matrix C_a = (−Δ_{P,m} + a I_{md×md})^{−1}, where I_{md×md} is the md × md identity matrix, and the infinite-dimensional Hilbert space H is represented by R^{md} with inner product given by the weighted scalar product ⟨x, y⟩ = (β/m) x • y. The functional in (3.2) is discretized as U_m(x) = (β/m) Σ_j G(x_{j+1:j+d}).
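A minimal sketch of the corresponding matrices for d = 1. The leading eigenvalue of C_a is exactly 1/a, corresponding to the centroid mode, which illustrates why the regularization a > 0 is essential.

```python
import numpy as np

def periodic_laplacian(m, beta):
    """Second-difference matrix on m grid points with periodic wrap-around,
    scaled by the grid spacing h = beta/m."""
    h = beta / m
    A = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))
    A[0, -1] += 1.0   # periodic corner entries
    A[-1, 0] += 1.0
    return A / h ** 2

m, beta, a = 128, 4.0, 0.1
Ca = np.linalg.inv(-periodic_laplacian(m, beta) + a * np.eye(m))
print(np.max(np.linalg.eigvalsh(Ca)), 1.0 / a)
```

Since −Δ_{P,m} has the constant vector in its kernel, the largest eigenvalue of C_a equals 1/a exactly; without the shift a the matrix would be singular.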
In summary, the infinite-dimensional path distribution μ(dx) is approximated by the finite-dimensional distribution μ_m(dx) ∝ exp(−U_m(x)) N(0, (m/β) C_a)(dx). In this context, pHMC generates a Markov chain on R^{md} with invariant measure μ_m and with transition step given by (3.17).

Theorem 3.5 Suppose that the duration parameter T satisfies the condition (3.22). Then for all sufficiently large m, any k ∈ N, and any probability measure ν_m on R^{md}, (3.11) holds for the transition kernel of (3.17).
A proof of this result is given in Sect. 7.2.

Numerical illustration of couplings
Before turning to the proofs of our main results, we test the two-scale coupling defined by (2.15) numerically on the following distributions.
• A TPS distribution with the three-well path potential energy function illustrated in Fig. 2a. The initial conditions of the components of the coupling are taken to be paths that pass through the two channels that connect the bottom two wells located at x_± ≈ (±1.048, −0.042). • A PIMD distribution where the underlying potential energy is the negative logarithm of the normal mixture density illustrated in Fig. 2b. The mixture components are 20 two-dimensional Gaussian distributions with covariance matrix given by the 2 × 2 identity matrix and with mean vectors given by 20 independent samples from the uniform distribution over the square [0, 10] × [0, 10]. The energy barriers are not large. The potential energy in this example is adapted from [45,47]. The initial paths are taken to be two unit circles, one centered at (1, 1) and the other centered at (9, 9). The parameter a is selected to be 0.1. • A PIMD distribution where the underlying potential energy is the negative logarithm of the Laplace mixture density illustrated in Fig. 2c. The mixture components are 20 two-dimensional (regularized) Laplace distributions using the same covariance matrix and mean vectors as in the preceding example. However, unlike the preceding example, in this example the underlying potential is only asymptotically convex. The initial paths are taken to be two unit circles, one centered at (1, 1) and the other centered at (9, 9). The parameter a is selected to be 0.1. • A PIMD distribution where the underlying potential energy is the banana-shaped potential energy illustrated in Fig. 2d. This function is highly non-convex and unimodal with a global minimum at the point (1, 1). This minimum lies in a long, narrow, banana-shaped valley. The initial paths are taken to be small circles centered at (±4, 16). The parameter a is selected to be 1.0.
For the TPS and PIMD distributions we use the finite-dimensional approximations described in Sects. 3.1 and 3.2, respectively. The resulting semi-discrete evolution equations are discretized in time using a strongly stable symmetric splitting integrator [3,43,44]. We describe this integrator in the specific context of TPS, since a very similar method is used for PIMD, with C_a replacing C in the dynamics and the covariance matrix (m/β)C_a replacing ((m + 1)/τ)C in the velocity randomization step. First, the dynamics is split into its linear (Gaussian) part and a velocity kick, with corresponding flows explicitly given in closed form. Given a time step size Δt > 0, and using these exact solutions, a Δt step of the symmetric splitting integrator we use is given by (3.25). In order to mitigate the effect of periodicities or near periodicities in the underlying dynamics, we choose the number of integration steps to be geometrically distributed with mean T/Δt. The idea of duration randomization has a long history [10,11,13,19,21,51]. The initial velocity is taken to be an md-dimensional centered normal vector with covariance matrix ((m + 1)/τ)C, and a Metropolis accept/reject step is added to ensure that the algorithm leaves μ_m invariant [11,70]. In summary, we use the following transition step in the simulations.

Algorithm 3.1 (Numerical Randomized pHMC)
Denote by T > 0 the duration parameter and let ψ_Δt be the time integrator described in (3.25). Given the current state of the chain x ∈ R^{md}, the algorithm outputs the next state of the chain X ∈ R^{md} as follows.
Step 1 Generate an initial velocity ξ ∼ N(0, ((m + 1)/τ)C).
Step 2 Generate a geometric random variable k supported on the set {1, 2, 3, ...} with mean T/Δt.
Step 3 Set X := γ q̂ + (1 − γ) x, where (q̂, v̂) = ψ^k_Δt(x, ξ), and, given ξ and k, γ is a Bernoulli random variable whose parameter α is the Metropolis acceptance probability.
We stress that ξ and k from (Step 1) and (Step 2) are mutually independent and independent of the state of the Markov chain associated to pHMC. We pick the time step size Δt of the integrator sufficiently small to ensure that 99% of proposal moves are accepted on average in (Step 3).
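A compact sketch of this transition step in spectral coordinates, under illustrative assumptions: a diagonal covariance with entries `lam`, a symmetric splitting (half kick, exact rotation of the Gaussian part, half kick) standing in for (3.25), a geometric number of steps with mean T/Δt, and a Metropolis accept/reject step.

```python
import numpy as np

def randomized_phmc_step(x, lam, U, grad_U, T, dt, rng):
    """One transition of numerical randomized pHMC (illustrative version):
    xi ~ N(0, C) with C = diag(lam); k geometric with mean ~ T/dt; k steps of
    the splitting integrator; Metropolis step with
    H(q, v) = U(q) + (|q|_C^2 + |v|_C^2)/2 in spectral coordinates."""
    v = np.sqrt(lam) * rng.standard_normal(x.size)
    k = rng.geometric(min(1.0, dt / T))
    q, w = x.copy(), v.copy()
    c, s = np.cos(dt), np.sin(dt)
    for _ in range(k):
        w = w - 0.5 * dt * lam * grad_U(q)        # half kick: v' = -C grad U
        q, w = c * q + s * w, -s * q + c * w      # exact rotation: q' = v, v' = -q
        w = w - 0.5 * dt * lam * grad_U(q)        # half kick
    H = lambda a, b: U(a) + 0.5 * np.sum((a * a + b * b) / lam)
    if np.log(rng.uniform()) <= H(x, v) - H(q, w):
        return q                                   # accept proposal
    return x                                       # reject, keep current state

# Sanity check: with U = 0 the energy is conserved exactly, every proposal is
# accepted, and N(0, C) is invariant.
rng = np.random.default_rng(4)
lam = np.array([1.0, 0.25])
x = np.zeros(2)
out = []
for _ in range(6000):
    x = randomized_phmc_step(x, lam, lambda q: 0.0,
                             lambda q: np.zeros_like(q), T=1.0, dt=0.1, rng=rng)
    out.append(x[0])
print(np.var(out))  # close to lambda_1 = 1
```

The rotation step uses the same angle Δt for every mode, reflecting that the preconditioned Gaussian part oscillates at unit frequency in all modes.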
Realizations of the coupling process are shown in Fig. 2. We chose parameters for visualization purposes only. The different components of the coupling are shown as dots of different colors. The insets of the figures show the distance between the components of the coupling as a function of the number of steps. Figure 3 shows the average time after which the distance between the components of the coupling first falls below 10^{−12}. To produce this figure, we averaged over independent realizations of the coupling.

A priori bounds
In this section we gather several bounds for the dynamics and for the coupling that will be crucial in the proof of our main result.

Bounds for the dynamics
In the following, we assume throughout that Assumption 2.6 is satisfied. Recall that φ_t = (q_t, v_t) denotes the flow of (2.9). With the exception of using a different norm, the proofs of Lemmas 4.1 and 4.2 below are identical to the proofs of Lemmas 3.1 and 3.2 in [9], and are therefore not repeated.

Bounds related to two-scale coupling
The following lemma is used in the proof of Theorem 2.10 to obtain a contraction for the two-scale coupling when the distance between the components of the coupling is sufficiently small, i.e., |||x − y||| α < R.

Lemma 4.3
Suppose that γ > 0 and t > 0 satisfy γt ≤ 1 and Lt^2 ≤ 1/4. Then for any x, y, u, v ∈ H_s such that v_h = u_h and v_ℓ = u_ℓ + γ(x_ℓ − y_ℓ), the following bound holds.

Proof Let z_t := q_t(x, u) − q_t(y, v) and w_t := dz_t/dt. By (2.9), z_t solves a second-order linear ordinary differential equation perturbed by a nonlinearity, with w_0 = −γ z_0. A variation of constants ansatz shows that this is equivalent to Duhamel-type integral equations for z_t and w_t, involving the difference of the drifts along q_r(x, u) and q_r(y, v).

Lemma 4.4 For any choice of γ, P[η_ℓ = ξ_ℓ + γ z_ℓ] = 1 − d_TV(N(0, C), N(γ z_ℓ, C)), and this probability admits an explicit lower bound.
Since C C^{−s} is a trace class operator on H_s, the operator C^{−1/2} C^{s/2} is not a bounded operator. Nonetheless, the bound appearing in Lemma 4.4 is finite, because the operator C^{−1/2} C^{s/2} appearing in that bound only acts on z_ℓ, i.e., the n-dimensional projection of z onto the finite-dimensional space H_{s,ℓ}.

Fig. 4 The total variation distance between the one-dimensional normal distributions N(0, 1) and N(h, 1) equals one minus the area of the shaded region

Proof of Lemma 4.4 The distance d_TV(N(0, C), N(γ z_ℓ, C)) reduces to a one-dimensional computation involving d_TV(N(0, 1), N(h, 1)); see Fig. 4. Let C_ℓ denote the restriction of C to H_ℓ. Then, since z_ℓ ∈ H_ℓ, and by scale invariance of the total variation distance, d_TV(N(0, C), N(γ z_ℓ, C)) = d_TV(N(0, C_ℓ), N(γ z_ℓ, C_ℓ)).

Proof Property (i) follows from the fact that f is concave. Since f is non-decreasing and constant for r ≥ R, (ii) is trivially true in the boundary cases; in the remaining case, a direct computation applies, and combining these cases gives (ii). Property (iii) then follows because the function g defined in the proof decreases in x, lim_{x→0} g(x) = 1, and g(x) ≥ max(1, x) e^{−max(1, x)}.
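The one-dimensional identity behind Fig. 4 is d_TV(N(0, 1), N(h, 1)) = 2Φ(h/2) − 1 = erf(h/(2√2)), with Φ the standard normal CDF; it can be checked numerically against the defining integral:

```python
import math

def tv_closed_form(h):
    """d_TV(N(0,1), N(h,1)) = 2*Phi(h/2) - 1 = erf(h / (2*sqrt(2)))."""
    return math.erf(abs(h) / (2.0 * math.sqrt(2.0)))

def tv_quadrature(h, n=200000, lo=-12.0, hi=12.0):
    """0.5 * integral of |phi(t) - phi(t - h)| dt via a midpoint Riemann sum."""
    phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    dt = (hi - lo) / n
    return 0.5 * sum(abs(phi(lo + (i + 0.5) * dt) - phi(lo + (i + 0.5) * dt - h)) * dt
                     for i in range(n))

print(tv_closed_form(1.0), tv_quadrature(1.0))
```

Both routes give the same value, and the closed form makes explicit that the overlap 1 − d_TV decays to zero only when h grows, which is the mechanism exploited by the coupling on the low modes.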

Proof of Foster-Lyapunov drift condition
Before giving the proof of Lemma 2.7, we collect some preparatory results. Writing ψ(t) = |q_t(x, ξ)|²_s and ϕ(t) = |v_t(x, ξ)|²_s, (2.9) yields differential identities for ψ and ϕ; hence, by Assumption 2.6 (B1) and (B3), we obtain the corresponding differential inequalities. The following formula, which comes from two applications of integration by parts, is valid for any k ∈ N and for any twice differentiable function g : R → R.
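A representative identity of this kind, obtained by integrating by parts twice, is the second-order Taylor formula with integral remainder, g(t) = g(0) + g′(0)t + ∫₀ᵗ (t − r) g″(r) dr. A quick numerical sanity check in Python (illustrative only; the helper name `taylor_remainder` is ours):

```python
import math

def taylor_remainder(g, dg, d2g, t, n=20000):
    # g(t) = g(0) + g'(0)*t + \int_0^t (t - r) g''(r) dr
    # (remainder integral evaluated with the trapezoidal rule)
    dr = t / n
    integral = 0.0
    for i in range(n + 1):
        r = i * dr
        w = 0.5 if i in (0, n) else 1.0
        integral += w * (t - r) * d2g(r)
    return g(0.0) + dg(0.0) * t + integral * dr

# check against g = exp, for which g = g' = g''
t = 0.7
assert abs(taylor_remainder(math.exp, math.exp, math.exp, t) - math.exp(t)) < 1e-8
```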

Proofs of main results
Proof of Theorem 2.10 The parameters γ, a, and the constant fixed in (2.31) have been chosen in (2.29), (2.30), and (2.31) such that the following conditions are satisfied, with F = f(R). We consider separately the cases r < R and r ≥ R.

Proofs of results for TPS
To prove Theorem 3.2, we compare the eigenvalues of the approximation C to those of C. Note that these eigenvalues each have multiplicity d; to account for this, define the index function ϕ(k, j) = d(k − 1) + j. The eigenvalues of both operators can then be written explicitly in terms of this indexing. The following lemma estimates the error of the eigenvalues of the approximation C relative to those of C.
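The index function ϕ(k, j) = d(k − 1) + j simply assigns d consecutive integer indices to each d-fold eigenvalue; a minimal Python check (illustrative only):

```python
# index function phi(k, j) = d*(k-1) + j from the TPS eigenvalue bookkeeping:
# the k-th distinct eigenvalue, repeated with multiplicity d, receives the
# d consecutive indices phi(k, 1), ..., phi(k, d)
def phi(k, j, d):
    return d * (k - 1) + j

d = 3
indices = [phi(k, j, d) for k in range(1, 5) for j in range(1, d + 1)]
# ranging over k and j enumerates 1, 2, 3, ... without gaps or repeats
assert indices == list(range(1, 13))
```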

Proof of Theorem 3.2
This result is an application of Corollary 2.12. Since C is a finite-dimensional matrix, Assumption 2.1 holds for C with s = 0, and since we choose C = C, Assumption 2.5 also holds with s = 0. Therefore, to apply Corollary 2.12, it suffices to check that: (i) Assumption 2.6 holds with dimension-free constants L, n, K, and A; (ii) the dimension-free R defined in (3.6) satisfies condition (2.24); and (iii) the dimension-free condition (3.10) on the duration T implies (2.27). We then invoke Corollary 2.12 to conclude convergence in the standard L¹ Wasserstein distance.

Proofs of results for PIMD
To prove Theorem 3.5, we compare the eigenvalues of the approximation C_a to those of C_a. The leading eigenvalue of C_a has multiplicity d, while all of its other eigenvalues have multiplicity 2d. If m is odd, the same is true for the approximation: its leading eigenvalue has multiplicity d, while all of its other eigenvalues have multiplicity 2d. However, if m is even, then both the leading and the trailing eigenvalues of the approximation have multiplicity d, while all of its other eigenvalues have multiplicity 2d. To account for these multiplicities, it is helpful to define an index function, via which the eigenvalues of C_a and of its approximation can be written explicitly for any m ∈ N and k ∈ {1, . . . , ⌈(m + 1)/2⌉}. Here we have introduced the quantity θ_k; note that its definition in (7.12) covers both odd and even values of m. The following lemma estimates the error of the eigenvalues of the approximation C_a relative to those of C_a.

Lemma 7.2 For any m ∈ N and k ∈ {1, . . . , ⌈(m + 1)/2⌉},

(E1) |λ̃_{ϕ(k,1)} − λ_{ϕ(k,1)}| = λ̃_{ϕ(k,1)} − λ_{ϕ(k,1)} ≤ 2 λ_{ϕ(k,1)} θ_k²,

where λ̃ denotes the eigenvalues of the approximation. Taking square roots of both sides then gives (E2).
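As a sanity check on the multiplicity bookkeeping above (illustrative code, not from the paper): for either parity of m, the multiplicities must sum to the total dimension dm of the discretized space.

```python
# multiplicity pattern of the discretized PIMD covariance: with m modes in
# dimension d, odd m gives one d-fold leading eigenvalue and 2d-fold others,
# while even m gives d-fold leading and trailing eigenvalues and 2d-fold others
def multiplicities(m, d):
    if m % 2 == 1:   # odd m
        return [d] + [2 * d] * ((m + 1) // 2 - 1)
    else:            # even m
        return [d] + [2 * d] * (m // 2 - 1) + [d]

# in both cases the multiplicities account for the full dimension d*m
for m in range(1, 20):
    for d in (1, 2, 3):
        assert sum(multiplicities(m, d)) == d * m
```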
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.