# Using Perturbed Underdamped Langevin Dynamics to Efficiently Sample from Probability Distributions


## Abstract

In this paper we introduce and analyse Langevin samplers that consist of perturbations of the standard underdamped Langevin dynamics. The perturbed dynamics is such that its invariant measure is the same as that of the unperturbed dynamics. We show that appropriate choices of the perturbations can lead to samplers that have improved properties, at least in terms of reducing the asymptotic variance. We present a detailed analysis of the new Langevin sampler for Gaussian target distributions. Our theoretical results are supported by numerical experiments with non-Gaussian target measures.

## 1 Introduction and Motivation

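*overdamped Langevin dynamics* defined to be the unique (strong) solution \((X_{t})_{t\ge 0}\) of the following stochastic differential equation (SDE):

$$\begin{aligned} \mathrm {d}X_t = -\nabla V(X_t)\,\mathrm {d}t + \sqrt{2}\,\mathrm {d}W_t. \end{aligned}\tag{3}$$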

*V*, i.e. on the measure \(\pi (\mathrm {d}x)\), the process \((X_{t})_{t\ge 0}\) is ergodic and in fact reversible with respect to the target distribution.

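*underdamped Langevin dynamics* given by \((X_t)_{t\ge 0} = (q_t, p_t)_{t\ge 0}\) defined on the extended space (phase space) \(\mathbb {R}^{d}\times \mathbb {R}^{d}\) by the following pair of coupled SDEs:

$$\begin{aligned} \mathrm {d}q_t&= M^{-1}p_t\,\mathrm {d}t, \\ \mathrm {d}p_t&= -\nabla V(q_t)\,\mathrm {d}t - \varGamma M^{-1}p_t\,\mathrm {d}t + \sqrt{2\varGamma }\,\mathrm {d}W_t, \end{aligned}\tag{4}$$

where *M* and \(\varGamma \) are assumed to be symmetric positive definite matrices. It is well-known [36, 46] that \((q_{t},p_{t})_{t\ge 0}\) is ergodic with respect to the measure \(\widehat{\pi }:=\pi \otimes \mathcal {N}(0,M)\), having density with respect to the Lebesgue measure on \(\mathbb {R}^{2d}\) given by

$$\begin{aligned} \widehat{\pi }(q,p) \propto \exp \left( -V(q) - \frac{1}{2}\,p\cdot M^{-1}p\right) . \end{aligned}\tag{5}$$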

*p* and thus for functions \(f\in L^{1}(\pi )\), we have that \(\frac{1}{t}\int _0^t f(q_{s})\,\mathrm {d}s\rightarrow \pi (f)\) almost surely as \(t\rightarrow \infty \). Notice also that the dynamics restricted to the *q*-variables is no longer Markovian. The *p*-variables can thus be interpreted as giving some instantaneous memory to the system, facilitating efficient exploration of the state space. Higher order Markovian models, based on a finite dimensional (Markovian) approximation of the generalized Langevin equation, can also be used [12].
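As an illustration of how the position marginal of the underdamped dynamics is used for sampling, the following minimal sketch approximates a time average of \(f(q_t)\) with a plain Euler-Maruyama discretisation. It is not the scheme used in this paper: the one-dimensional setting, the choices \(M = 1\) and \(\varGamma = \gamma \), the step size, the pooling of several chains, and the function name `ergodic_average` are all assumptions made for the example.

```python
import numpy as np


def ergodic_average(f, grad_V, gamma=1.0, dt=0.01, n_steps=20_000,
                    n_chains=200, seed=0):
    """Time average of f(q_t) along Euler-Maruyama paths of one-dimensional
    underdamped Langevin dynamics with M = 1 and Gamma = gamma (assumed).

    Several independent chains are advanced in parallel and pooled.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(n_chains)
    p = rng.standard_normal(n_chains)      # momenta start from N(0, 1)
    noise_scale = np.sqrt(2.0 * gamma * dt)
    running_sum = 0.0
    for _ in range(n_steps):
        xi = rng.standard_normal(n_chains)
        q_next = q + p * dt                                   # dq = p dt
        p = p - grad_V(q) * dt - gamma * p * dt + noise_scale * xi
        q = q_next
        running_sum += f(q).mean()
    return running_sum / n_steps


if __name__ == "__main__":
    # Standard Gaussian target: V(q) = q^2 / 2, so pi(q^2) = 1.
    estimate = ergodic_average(lambda q: q**2, lambda q: q)
    print(f"time average of q^2: {estimate:.3f} (exact value under pi: 1)")
```

Only the *q*-component enters the observable; the momentum is simulated but discarded, in line with its role as an auxiliary variable. The explicit Euler step introduces a discretisation bias of order `dt`, which a more careful integrator would reduce.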

As there is a lot of freedom in choosing the dynamics in (2), see the discussion in Sect. 2, it is desirable to choose the diffusion process \((X_t)_{t\ge 0}\) in such a way that \(\pi _T(f)\) can provide a good estimation of \(\pi (f)\). The performance of the estimator (2) can be quantified in various manners. The ultimate goal, of course, is to choose the dynamics as well as the numerical discretization in such a way that the computational cost of the longtime-average estimator is minimized, for a given tolerance. The minimization of the computational cost consists of three steps: bias correction, variance reduction and choice of an appropriate discretization scheme. For the latter step see Sect. 5 and [14, Sect. 6].

Under appropriate conditions on the potential *V* it can be shown that both (3) and (4) converge to equilibrium exponentially fast, e.g. in relative entropy. One performance objective would then be to choose the process \((X_t)_{t\ge 0}\) so that this rate of convergence is maximised. Conditions on the potential *V* which guarantee exponential convergence to equilibrium, both in \(L^{2}(\pi )\) and in relative entropy can be found in [7, 39, 54]. In the case when the target measure \(\pi \) is Gaussian, both the overdamped (3) and the underdamped (4) dynamics become generalized Ornstein–Uhlenbeck processes. For such processes the entire spectrum of the generator—or, equivalently, the Fokker–Planck operator—can be computed analytically and, in particular, an explicit formula for the \(L^2\)-spectral gap can be obtained [38, 43, 44]. A detailed analysis of the convergence to equilibrium in relative entropy for stochastic differential equations with linear drift, i.e. generalized Ornstein–Uhlenbeck processes, has been carried out in [1, 2].

*f*, the estimator \(\pi _T(f)\) satisfies a central limit theorem (CLT) [31], that is,

*asymptotic variance* of the estimator \(\pi _T(f)\). The asymptotic variance characterises the magnitude of fluctuations of \(\pi _T(f)\) around \(\pi (f)\). Consequently, another natural objective is to choose the process \((X_t)_{t\ge 0}\) such that \(\sigma ^2_f\) is as small as possible. It is well known that the asymptotic variance can be expressed in terms of the solution to an appropriate Poisson equation for the generator of the dynamics [31]

Other measures of performance have also been considered. For example, in [50, 51], performance of the estimator is quantified in terms of the rate functional of the ensemble measure \(\frac{1}{t}\int _0^t \delta _{X(t)}(dx)\). See also [28] for a study of the nonasymptotic behaviour of MCMC techniques, including the case of overdamped Langevin dynamics.

Similar analyses have been carried out for various modifications of (3). Of particular interest to us are the *Riemannian manifold MCMC* [18] (see the discussion in Sect. 2) and the *nonreversible Langevin samplers* [20, 21]. As a particular example of the general framework that was introduced in [18], we mention the preconditioned overdamped Langevin dynamics \( \mathrm {d}X_t = -P \nabla V(X_t)\,\mathrm {d}t + \sqrt{2P}\,\mathrm {d}W_t, \) that was presented in [4]. There, the long-time behaviour of the process as well as the asymptotic variance of the corresponding estimator \(\pi _T(f)\) are studied and applied to equilibrium sampling in molecular dynamics. A variant of the standard underdamped Langevin dynamics that can be thought of as a form of preconditioning and that has been used by practitioners is the *mass-tensor molecular dynamics* [6].

Our objective is to investigate the use of these dynamics for computing ergodic averages of the form (2). To this end, we study the long time behaviour of (8) and, using hypocoercivity techniques, prove that the process converges exponentially fast to equilibrium. This perturbed underdamped Langevin process introduces a number of parameters in addition to the mass and friction tensors which must be tuned to ensure that the process is an efficient sampler. For Gaussian target densities, we derive estimates for the spectral gap and the asymptotic variance, valid in certain parameter regimes. Moreover, for certain classes of observables, we are able to identify the choices of parameters which lead to the optimal performance in terms of asymptotic variance. While these results are valid for Gaussian target densities, we advocate these particular parameter choices also for more complex target densities. To demonstrate their efficacy, we perform a number of numerical experiments on more complex, multimodal distributions. In particular, we use the Langevin sampler (8) in order to study the problem of diffusion bridge sampling.

The rest of the paper is organized as follows. In Sect. 2 we present some background material on Langevin dynamics, we construct general classes of Langevin samplers and we introduce criteria for assessing the performance of the samplers. In Sect. 3 we study qualitative properties of the perturbed underdamped Langevin dynamics (8) including exponentially fast convergence to equilibrium and the overdamped limit. In Sect. 4 we study in detail the performance of the Langevin sampler (8) for the case of Gaussian target distributions. In Sect. 5 we introduce a numerical scheme for simulating the perturbed dynamics (8) and we present numerical experiments on the implementation of the proposed samplers for the problem of diffusion bridge sampling. Section 6 is reserved for conclusions and suggestions for further work.

## 2 Construction of General Langevin Samplers

### 2.1 Background and Preliminaries

*f* and \(\, : \,\) denotes the Frobenius inner product. In general, \(\varSigma \) is nonnegative definite, and could possibly be degenerate. In particular, the infinitesimal generator (10) need not be uniformly elliptic. To ensure that the corresponding semigroup exhibits sufficient smoothing behaviour, we shall require that the process (9) is hypoelliptic in the sense of Hörmander. If this condition holds, then irreducibility of the process \((X_t)_{t\ge 0}\) will be an immediate consequence of the existence of a strictly positive invariant distribution \(\pi (x)\mathrm {d}x\), see [30].

Suppose that \((X_t)_{t\ge 0}\) is nonexplosive. It follows from the hypoellipticity assumption that the process \((X_t)_{t\ge 0}\) possesses a smooth transition density *p*(*t*, *x*, *y*) which is defined for all \(t \ge 0\) and \(x, y \in {{\mathrm{\mathbb {R}}}}^d\), [5, Theorem VII.5.6]. The associated strongly continuous Markov semigroup \((P_t)_{t\ge 0}\) is defined by \( P_t f(x) = \int _{{{\mathrm{\mathbb {R}}}}^d} p(t, x, y)f(y)\,\mathrm {d}y. \) Suppose that \((P_t)_{t\ge 0}\) is invariant with respect to the target measure \(\pi \), i.e. \(\int _{\mathbb {R}^d} P_t f(x)\pi (\mathrm {d}x) = \int _{\mathbb {R}^d} f(x)\pi (\mathrm {d}x) \) for \(t\ge 0\) and all bounded continuous functions *f*. Then \((P_t)_{t\ge 0}\) can be extended to a positivity preserving contraction semigroup on \(L^2(\pi )\) which is strongly continuous. Moreover, the infinitesimal generator corresponding to \((P_t)_{t\ge 0}\) is given by an extension of \(({{\mathrm{\mathcal {L}}}}, C^{2}_c({{\mathrm{\mathbb {R}}}}^d))\), also denoted by \({{\mathrm{\mathcal {L}}}}\).

Due to hypoellipticity and invariance with respect to \((P_t)_{t \ge 0}\), the probability measure \(\pi \) on \(\mathbb {R}^d\) has a smooth density with respect to the Lebesgue measure. If this density is strictly positive, it follows that \(\pi \) is necessarily the unique invariant distribution. Slightly abusing the notation, we will denote both the measure and its density by \(\pi \). Furthermore, we will denote by \(L^2(\pi )\) the Hilbert space of \(\pi \)-square integrable functions equipped with inner product \(\left\langle \cdot , \cdot \right\rangle _{L^2(\pi )}\) and norm \(\left\Vert \cdot \right\Vert _{L^2(\pi )}\).

### 2.2 A General Characterisation of Ergodic Diffusions

A natural question is what conditions on the coefficients *a* and *b* of (9) are required to ensure that \((X_t)_{t\ge 0}\) is invariant with respect to the distribution \(\pi (x)\,\mathrm {d}x\). The following result provides a necessary and sufficient condition for a diffusion process to be invariant with respect to a given target distribution.

### Theorem 1

The proof of the first part of this result can be found in [46, Chap. 4]; similar versions of this characterisation can be found in [54] and [21]. For the existence of the skew-symmetric matrix *C* see, e.g., [16, Sec.4, Prop. 1]. See also [37].

### Remark 1

If (11) holds and \(\mathcal {L}\) is hypoelliptic it follows immediately that \((X_t)_{t\ge 0}\) is ergodic with unique invariant distribution \(\pi (x)\,\mathrm {d}x\) (see [30]).

1. Choosing \(b = I\) and \(\gamma = 0\) we immediately recover the overdamped Langevin dynamics (3).

2. Choosing \(b = I\), and \(\gamma \ne 0\) such that (12) holds gives rise to the nonreversible overdamped equation defined by (7). As it satisfies the conditions of Theorem 1, it is ergodic with respect to \(\pi \). In particular, choosing \(\gamma (x) = J\nabla V(x)\) for a constant skew-symmetric matrix *J* we obtain

   $$\begin{aligned} \mathrm {d}X_t = -(I + J)\nabla V(X_t)\,\mathrm {d}t + \sqrt{2}\,\mathrm {d}W_t, \end{aligned}\tag{15}$$

   which has been studied in previous works.

3. Given a target density \(\pi > 0\) on \(\mathbb {R}^d\), if we consider the augmented target density \(\widehat{\pi }\) on \(\mathbb {R}^{2d}\) given in (5), then choosing

   $$\begin{aligned} \gamma ((q,p)) = \left( \begin{array}{c} M^{-1}p \\ -\nabla V(q)\end{array}\right) \quad \text{ and }\quad b = \left( \begin{array}{c}\mathbf {0} \\ \sqrt{\varGamma }\end{array}\right) \in \mathbb {R}^{2d \times d}, \end{aligned}\tag{16}$$

   where *M* and \(\varGamma \) are positive definite symmetric matrices, the conditions of Theorem 1 are satisfied for the target density \(\widehat{\pi }\). The resulting dynamics \((q_t, p_t)_{t\ge 0}\) is determined by the underdamped Langevin equation (4). It is straightforward to verify that the generator is hypoelliptic [35, Sec 2.2.3.1], and thus \((q_t, p_t)_{t\ge 0}\) is ergodic.

4. More generally, consider the augmented target density \(\widehat{\pi }\) on \(\mathbb {R}^{2d}\) as above, and choose

   $$\begin{aligned} \gamma ((q,p)) = \left( \begin{array}{c} M^{-1}p - \mu J_1\nabla V(q) \\ -\nabla V(q) - \nu J_2 M^{-1}p\end{array}\right) \quad \text{ and }\quad b = \left( \begin{array}{c}\mathbf {0} \\ \sqrt{\varGamma }\end{array}\right) \in \mathbb {R}^{2d \times d}, \end{aligned}\tag{17}$$

   where \(\mu \) and \(\nu \) are scalar constants and \(J_1, J_2 \in \mathbb {R}^{d\times d}\) are constant skew-symmetric matrices. With this choice we recover the perturbed Langevin dynamics (8). It is straightforward to check that (17) satisfies the invariance condition (12), and thus Theorem 1 guarantees that (8) is invariant with respect to \(\widehat{\pi }\).

5. In a similar fashion, one can introduce an augmented target density on \(\mathbb {R}^{(m+2)d}\), with

   $$\begin{aligned} \widehat{\widehat{\pi }}(q, p, u_1,\ldots , u_m) \propto e^{-\left( \frac{|p|^2}{2} + \frac{\vert u_1 \vert ^2 + \ldots + \vert u_m \vert ^2}{2}+V(q)\right) }, \end{aligned}$$

   where \(p, q, u_i \in \mathbb {R}^d\), for \(i=1,\ldots , m\). Clearly \(\int _{\mathbb {R}^{d}\times \mathbb {R}^{md}} \widehat{\widehat{\pi }}(q, p, u_1,\ldots ,u_m)\,\mathrm {d}p\,\mathrm {d}u_1\,\ldots \mathrm {d}u_m = \pi (q)\). We now define \(\gamma :\mathbb {R}^{(m+2)d}\rightarrow \mathbb {R}^{(m+2)d}\) by

   $$\begin{aligned} \gamma (q,p, u_1,\ldots ,u_m) = \left( p \quad -\nabla _q V(q) + \sum _{j=1}^{m} \lambda _j u_j \quad -\lambda _1 p \quad \cdots \quad -\lambda _m p \right) ^{T} \end{aligned}$$

   and \(b: \mathbb {R}^{(m+2)d}\rightarrow \mathbb {R}^{(m+2)d\times (m+2)d}\) by

   $$\begin{aligned} b(q,p,u_1,\ldots , u_m) = \left( \begin{array}{cccccc}\mathbf {0} & \mathbf {0} & \mathbf {0} & \mathbf {0} & \ldots & \mathbf {0}\\ \mathbf {0} & \mathbf {0} & \mathbf {0} & \mathbf {0} & \ldots & \mathbf {0} \\ \mathbf {0} & \mathbf {0} & \sqrt{\alpha _1}I_{d\times d} & \mathbf {0} & \ldots & \mathbf {0} \\ \mathbf {0} & \mathbf {0} & \mathbf {0} & \sqrt{\alpha _2}I_{d\times d} & \ldots & \mathbf {0} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf {0} & \mathbf {0} & \mathbf {0} & \mathbf {0} & \ldots & \sqrt{\alpha _m}I_{d\times d}\end{array}\right) , \end{aligned}$$

   where \(\lambda _i \in \mathbb {R}\) and \(\alpha _i > 0\), for \(i=1,\ldots , m\). The resulting process (9) is given by

   $$\begin{aligned} \mathrm {d}q_t&= p_t \,\mathrm {d}t, \quad \mathrm {d}p_t = -\nabla _q V(q_t)\,\mathrm {d}t + \sum _{j=1}^{m}\lambda _j u^{j}_t\,\mathrm {d}t, \\ \mathrm {d}u^{1}_t&= -\lambda _1 p_t\,\mathrm {d}t -\alpha _1 u^{1}_t\,\mathrm {d}t + \sqrt{2\alpha _1 }\,\mathrm {d}W^{1}_t,\\ \vdots&\\ \mathrm {d}u^{m}_t&= -\lambda _m p_t\,\mathrm {d}t -\alpha _m u^{m}_t\,\mathrm {d}t + \sqrt{2\alpha _m }\,\mathrm {d}W^{m}_t, \end{aligned}\tag{18}$$

   where \((W^j_t)_{t\ge 0,j=1,\ldots ,m}\) are independent \(\mathbb {R}^d\)-valued Brownian motions. This process is ergodic with unique invariant distribution \(\widehat{\widehat{\pi }}\), and under appropriate conditions on *V*, converges exponentially fast to equilibrium in relative entropy [42]. Equation (18) is a Markovian representation of a generalised Langevin equation of the form

   $$\begin{aligned} \mathrm {d}q_t = p_t \,\mathrm {d}t, \quad \mathrm {d}p_t = -\nabla _{q}V(q_t) \,\mathrm {d}t - \left( \int _0^t F(t-s)p_s\,\mathrm {d}s\right) \mathrm {d}t + N(t)\,\mathrm {d}t, \end{aligned}$$

   where *N*(*t*) is a mean-zero stationary Gaussian process with autocorrelation function *F*(*t*), i.e. \( \mathbb {E}\left[ N(t) \otimes N(s) \right] = F(t-s)I_{d\times d}\) and \(F(t) = \sum _{i=1}^{m} \lambda _i^2 e^{-\alpha _i|t|}. \)

6. Let \(\widetilde{\pi }(z) \propto \exp (-\Phi (z))\) be a positive density on \(\mathbb {R}^N\), where \(N > d\), such that \( \pi (x) = \int _{\mathbb {R}^{N-d}}\widetilde{\pi }(x,y)\,\mathrm {d}y, \) where \(z = (x, y)\in \mathbb {R}^d\times \mathbb {R}^{N-d}\). Then choosing \(b = I_{N\times N}\) and \(\gamma = 0\) we obtain the dynamics

   $$\begin{aligned} \mathrm {d}X_t = -\nabla _x \Phi (X_t, Y_t)\,\mathrm {d}t + \sqrt{2}\,\mathrm {d}W^{1}_t, \quad \mathrm {d}Y_t = -\nabla _y \Phi (X_t, Y_t)\,\mathrm {d}t + \sqrt{2}\,\mathrm {d}W^{2}_t, \end{aligned}$$

   and \((X_t, Y_t)_{t\ge 0}\) is ergodic with respect to \(\widetilde{\pi }\).
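The invariance claim for the perturbed choice (17) can be checked numerically in the Gaussian case. The sketch below assumes \(V(q) = \frac{1}{2} q\cdot S q\) and assumes that the perturbed dynamics takes the form obtained by adding the friction term \(-\varGamma M^{-1}p\) and the noise \(\sqrt{2\varGamma }\,\mathrm {d}W_t\) to the drift in (17), by analogy with the underdamped case. Under these assumptions the process is linear, \(\mathrm {d}X_t = AX_t\,\mathrm {d}t + \sigma \,\mathrm {d}W_t\), and a Gaussian with covariance \(\varSigma \) is stationary when \(A\varSigma + \varSigma A^{T} + \sigma \sigma ^{T} = 0\). The function names are hypothetical.

```python
import numpy as np


def perturbed_ou_matrices(S, M, Gamma, J1, J2, mu, nu):
    """Drift matrix A, noise covariance D = sigma sigma^T and candidate
    stationary covariance Sigma for the perturbed dynamics with
    V(q) = q.Sq / 2 (assumed form, see the lead-in)."""
    d = S.shape[0]
    zero = np.zeros((d, d))
    M_inv = np.linalg.inv(M)
    A = np.block([[-mu * J1 @ S, M_inv],
                  [-S, -(nu * J2 + Gamma) @ M_inv]])
    D = np.block([[zero, zero],
                  [zero, 2.0 * Gamma]])
    Sigma = np.block([[np.linalg.inv(S), zero],
                      [zero, M]])
    return A, D, Sigma


def lyapunov_residual(A, D, Sigma):
    """Residual of the stationary Lyapunov equation A Sigma + Sigma A^T + D = 0."""
    return A @ Sigma + Sigma @ A.T + D


def random_spd(rng, d):
    """Random symmetric positive definite matrix (test helper)."""
    B = rng.standard_normal((d, d))
    return B @ B.T + d * np.eye(d)


def random_skew(rng, d):
    """Random skew-symmetric matrix (test helper)."""
    B = rng.standard_normal((d, d))
    return B - B.T


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 3
    A, D, Sigma = perturbed_ou_matrices(
        random_spd(rng, d), random_spd(rng, d), random_spd(rng, d),
        random_skew(rng, d), random_skew(rng, d), mu=0.7, nu=-1.3)
    print(np.abs(lyapunov_residual(A, D, Sigma)).max())
```

The residual vanishes up to rounding for any scalars \(\mu \), \(\nu \) and any skew-symmetric \(J_1\), \(J_2\), because the skew-symmetric contributions cancel in \(A\varSigma + \varSigma A^{T}\). Replacing `J1` by a matrix that is not skew-symmetric makes the residual nonzero.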

### 2.3 Comparison Criteria

*f*, a natural measure of accuracy of the estimator \(\pi _T(f) = T^{-1}\int _0^{T}f(X_s)\,\mathrm {d}s\) is the *mean square error* (MSE) defined by

*x*. It is instructive to introduce the decomposition \(MSE(f, T) = \mu ^2(f, T) + \sigma ^2(f, T)\), where

### Remark 2

If (21) holds with \(C=1\), this estimate is equivalent to \(-{{\mathrm{\mathcal {L}}}}\) having a spectral gap in \(L^2(\pi )\). Allowing for a constant \(C>1\) is essential for our purposes though in order to treat nonreversible and degenerate diffusion processes by the theory of *hypocoercivity* as outlined in [54].

The following lemma characterises the decay of the bias \(\mu (f,T)\) as \(T\rightarrow \infty \) in terms of \(\lambda \) and *C*. The proof can be found in [41].

### Lemma 1

### Lemma 2


*C* and \(\lambda \) appearing in the exponential decay estimate (21) also control the speed of convergence of \(\sigma ^2(f, T)\) to zero. Indeed, it is straightforward to show that if (21) is satisfied, then the solution \(\chi \) of (22) satisfies

*C* and \(\lambda \) in (26) would be an effective means of improving the performance of the estimator \(\pi _T(f)\), especially since the improvement in performance would be uniform over an entire class of observables. When this is possible, this is indeed the case. However, as has been observed in [20, 21, 34], maximising the speed of convergence to equilibrium is a delicate task. As the leading order term in *MSE*(*f*, *T*), it is typically sufficient to focus specifically on the asymptotic variance \(\sigma ^2_{f}\) and study how the parameters of the SDE (9) can be chosen to minimise \(\sigma ^2_{f}\). This study was undertaken in [14] for processes of the form (7).

## 3 Perturbation of Underdamped Langevin Dynamics

The primary objective of this work is to compare the performances of the perturbed underdamped Langevin dynamics (8) and the unperturbed dynamics (4) according to the criteria outlined in Sect. 2.3 and to find suitable choices for the matrices \(J_{1}\), \(J_{2}\), *M* and \(\varGamma \) that improve the performance of the sampler. We begin our investigations of (8) by establishing ergodicity and exponentially fast return to equilibrium, and by studying the overdamped limit of (8). As the latter turns out to be nonreversible and therefore in principle superior to the usual overdamped limit (3), e.g. [21], this calculation provides us with further motivation to study the proposed dynamics.

*S* (i.e. the covariance matrix is given by \(S^{-1}\)). In this case, we advocate the following conditions for the choice of parameters:

Numerical experiments and analysis show that departing significantly from (27) can in fact decrease the performance of the sampler. This is in stark contrast to (7), where it is not possible to increase the asymptotic variance by *any* perturbation. For that reason, at present it seems practical to use (8) as a sampler only when a reasonable estimate of the global covariance of the target distribution is available. In the case of Bayesian inverse problems and diffusion bridge sampling, the target measure \(\pi \) is given with respect to a Gaussian prior. We demonstrate the effectiveness of our approach in these applications, taking the prior Gaussian covariance as *S* in (27).

### Remark 3

*J* again denoting an antisymmetric matrix. However, under the change of variables \(p\mapsto (1+J)\tilde{p}\) the above equations transform into

*f* depends only on *q* (the *p*-variables are merely auxiliary), the estimator \(\pi _T(f)\) as well as its associated convergence characteristics (i.e. asymptotic variance and speed of convergence to equilibrium) are invariant under this transformation. Therefore, (28) reduces to the underdamped Langevin dynamics (4) and does not represent an independent approach to sampling. Suitable choices of *M* and \(\varGamma \) will be discussed in Sect. 4.5.

### 3.1 Properties of Perturbed Underdamped Langevin Dynamics

### Lemma 3

The infinitesimal generator \({{\mathrm{\mathcal {L}}}}\) (30) is hypoelliptic.

### Proof

The proof consists of verifying the conditions of Hörmander’s Theorem for the generator (30) and can be found in [41]. \(\square \)

An immediate corollary of this result and of Theorem 1 is that the perturbed underdamped Langevin process (8) is ergodic with unique invariant distribution \(\widehat{\pi }\) given by (5).

As explained in Sect. 2.3, the exponential decay estimate (21) is crucial for our approach, as in particular it guarantees the well-posedness of the Poisson equation (22). From now on, we will therefore make the following assumption on the potential *V*, required to prove exponential decay in \(L^2(\pi )\):

### Assumption 1

Assume that the Hessian of *V* is *bounded* and that the target measure \(\pi (\mathrm {d}q) = \frac{1}{Z}e^{-V}\mathrm {d}q\) satisfies a *Poincaré inequality*, i.e. there exists a constant \(\rho >0\) such that

Sufficient conditions on the potential so that the Poincaré inequality holds, e.g. the Bakry–Émery criterion, are presented in [7].

### Theorem 2

Under Assumption 1 there exist constants \(C\ge 1\) and \(\lambda >0\) such that the semigroup \((P_t)_{t\ge 0}\) generated by \({{\mathrm{\mathcal {L}}}}\) satisfies exponential decay in \(L^2(\pi )\) as in (21).

### Proof

The proof uses the machinery of hypocoercivity developed in [54] and can be found in [41]. Using the framework of [15], we conjecture that the assumption on the boundedness of the Hessian of *V* can be substantially weakened and more quantitative decay estimates (in particular with respect to \(\mu \) and \(\nu \)) can be obtained. This approach has recently been successfully applied to equilibrium and nonequilibrium Langevin dynamics, see [27, 53]. We leave this line of work for future study. \(\square \)

### 3.2 The Overdamped Limit

*d*-dimensional torus \(\mathbb {T}^d \cong (\mathbb {R} / \mathbb {Z})^d\), i.e. we will assume \(q \in \mathbb {T}^d\). Consider the following scaling of (8):

### Proposition 1

Denote by \((q_{t}^{\epsilon },p_{t}^{\epsilon })\) the solution to (32) with (deterministic) initial conditions \((q_{0}^{\epsilon },p_{0}^{\epsilon })=(q_{init},p_{init})\) and by \(q_{t}^{0}\) the solution to (33) with initial condition \(q_{0}^{0}=q_{init}.\) For any \(T>0\), \((q_{t}^{\epsilon })_{0\le t\le T}\) converges to \((q_{t}^{0})_{0\le t\le T}\) in \(L^{2}(\Omega ;C([0,T];\mathbb {T}^{d}))\) as \(\epsilon \rightarrow 0\), i.e. \( \lim _{\epsilon \rightarrow 0}\mathbb {E}\big (\sup _{0\le t\le T}\vert q_{t}^{\epsilon }-q_{t}^{0}\vert ^{2}\big )=0. \)

### Proof

The proof follows standard arguments (see for instance [46]) and can be found in [41]. By a more refined analysis, it is also possible to get information on the rate of convergence; see, e.g. [48, 49]. \(\square \)

### Remark 4

The overdamped limit (33) respects the invariant distribution, in the sense that it is ergodic with respect to \(\pi (\mathrm {d}q) = \frac{1}{Z}e^{-V}\mathrm {d}q\).

The limiting SDE (33) is nonreversible due to the term \(-\mu J_1 \nabla _q V(q_t)\mathrm {d}t\) and also because the matrix \((\nu J_{2}+\varGamma )^{-1}\) is in general neither symmetric nor antisymmetric. This result, together with the fact that nonreversible perturbations of overdamped Langevin dynamics of the form (7) are by now well-known to have improved performance properties, motivates further investigation of the dynamics (8).
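The statement about \((\nu J_{2}+\varGamma )^{-1}\) is easy to inspect numerically by splitting the matrix into its symmetric and antisymmetric parts. The sketch below is only an illustration: the specific values of \(\nu \), \(\varGamma \) and \(J_2\) and the helper names are assumptions, not parameters taken from the paper.

```python
import numpy as np


def symmetric_antisymmetric_parts(A):
    """Return the symmetric part (A + A^T)/2 and antisymmetric part (A - A^T)/2."""
    return 0.5 * (A + A.T), 0.5 * (A - A.T)


def inverse_friction_matrix(J2, Gamma, nu):
    """The matrix (nu * J2 + Gamma)^{-1} appearing in the overdamped limit."""
    return np.linalg.inv(nu * J2 + Gamma)


if __name__ == "__main__":
    J2 = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric (assumed example)
    Gamma = 2.0 * np.eye(2)                     # Gamma = gamma * I with gamma = 2
    sym, antisym = symmetric_antisymmetric_parts(
        inverse_friction_matrix(J2, Gamma, nu=1.5))
    print("symmetric part:\n", sym)
    print("antisymmetric part:\n", antisym)
```

In this two-dimensional example both parts are nonzero, and the symmetric part is a positive multiple of the identity, so the matrix is neither symmetric nor antisymmetric unless \(\nu = 0\).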

## 4 Sampling from a Gaussian Distribution

In this section we study in detail the performance of the Langevin sampler (8) for Gaussian target densities, first considering the case of unit covariance. In particular, we study the optimal choice for the parameters in the sampler, the exponential decay rate and the asymptotic variance. We then extend our results to Gaussian target densities with arbitrary covariance matrices.

### 4.1 Unit Covariance: Small Perturbations

In our study of the dynamics given by (8) we first consider the simple case when \(V(q)=\frac{1}{2}\vert q\vert ^{2}\), i.e. the task of sampling from a Gaussian measure with unit covariance. We will assume \(M=I\), \(\varGamma =\gamma I\) and \(J_{1}=J_{2}=:J\) (so that the \(q\)- and \(p\)-dynamics are perturbed in the same way, albeit possibly with different strengths \(\mu \) and \(\nu \)). Our first result concerns the asymptotic variance for linear and quadratic observables for small perturbations of equal strength (\(\mu = \nu \)). For sufficiently strong damping (\(\gamma >\sqrt{2}\)), such a perturbation always leads to an improvement in asymptotic variance under the nondegeneracy conditions \([J,K]\ne 0\) and \(l\notin \ker J\):

### Theorem 3

*K* and *J* (and \(\gamma \), as long as \(\gamma >\sqrt{2}\)), i.e.

### Proof

[*J*, *K*] is symmetric. It follows that \({{\mathrm{Tr}}}(JKJK)-{{\mathrm{Tr}}}(J^{2}K^{2})\ge 0\) with equality if and only if \([J,K]=0\). Together with \(-2\gamma ^3+4\gamma < 0\) for \(\gamma > \sqrt{2}\) and \(\gamma - \frac{4}{\gamma ^3} - \gamma ^3 - \frac{1}{\gamma } < 0\) for \(\gamma >0\), the claim follows. \(\square \)
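The trace inequality used in the proof rests on the identity \({{\mathrm{Tr}}}(JKJK)-{{\mathrm{Tr}}}(J^{2}K^{2}) = \frac{1}{2}{{\mathrm{Tr}}}([J,K]^{2})\), which is nonnegative because \([J,K]\) is symmetric when *J* is antisymmetric and *K* is symmetric. The following sketch checks this on random matrices; the symmetry of *K* is assumed here, and the function name is hypothetical.

```python
import numpy as np


def commutator_trace_gap(J, K):
    """Return Tr(JKJK) - Tr(J^2 K^2), the value (1/2) Tr([J,K]^2),
    and the commutator [J, K] = JK - KJ."""
    gap = np.trace(J @ K @ J @ K) - np.trace(J @ J @ K @ K)
    comm = J @ K - K @ J
    return gap, 0.5 * np.trace(comm @ comm), comm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    C = rng.standard_normal((4, 4))
    J = B - B.T          # antisymmetric
    K = C + C.T          # symmetric
    gap, half_trace, comm = commutator_trace_gap(J, K)
    print(gap, half_trace, np.allclose(comm, comm.T))
```

For a commuting pair, for instance *K* equal to the identity, the gap is zero, which matches the equality case in the proof.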

### Remark 5

As we will see in Sect. 4.3, Example 1, if \([J,K]=0\) and \(l\in \ker J\), the asymptotic variance is constant as a function of \(\mu \), i.e. the perturbation has no effect.

Figure 1a, b, c show the asymptotic variance associated with the quadratic observable \(f(q)=q\cdot K q\). In accordance with Theorem 3, the asymptotic variance is at a local maximum at zero perturbation in the case \(\mu =\nu \) (see Fig. 1a). For increasing perturbation strength, the graph shows monotone decay as \(\mu \rightarrow \infty \) (this limiting behaviour will be explored analytically in Sect. 4.3). If the condition \(\mu =\nu \) is only approximately satisfied (Fig. 1b), our numerical examples still exhibit decaying asymptotic variance in the neighbourhood of the critical point. In this case, however, the asymptotic variance diverges for growing values of the perturbation \(\mu \). If the perturbations are opposed (\(\mu =-\nu \)), it is possible for certain observables that the unperturbed dynamics represents a global minimum. Such a case is observed in Fig. 1c. In Fig. 1d, e the observable \(f(q)=l\cdot q\) is considered. If the damping is sufficiently strong (\(\gamma > \sqrt{2}\)), the unperturbed dynamics is at a local maximum of the asymptotic variance (Fig. 1d). Furthermore, the asymptotic variance approaches zero as \(\mu \rightarrow \infty \) (for a theoretical explanation see again Sect. 4.3). The graph in Fig. 1e shows that the assumption of \(\gamma \) not being too small cannot be dropped from Theorem 3. Even in this case, though, the example shows decay of the asymptotic variance for large values of \(\mu \).
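The qualitative behaviour described for the linear observable \(f(q)=l\cdot q\) can be reproduced with a few lines of linear algebra. The sketch below assumes the unit-covariance setting of this subsection (\(M=I\), \(\varGamma =\gamma I\), \(J_1=J_2=J\)) and assumes the linear form \(\mathrm {d}X_t = AX_t\,\mathrm {d}t + \sigma \,\mathrm {d}W_t\) of the perturbed dynamics. For a linear observable the Poisson equation \(-\mathcal {L}\chi = f\) has the linear solution \(\chi (x) = c\cdot x\) with \(A^{T}c = -(l, 0)\), and the quantity returned is \(\langle \chi , f\rangle _{L^{2}(\widehat{\pi })} = c_q\cdot l\). This agrees with the asymptotic variance only up to the constant factor fixed by the normalisation of the CLT, which is an assumption of the sketch; the function name is hypothetical.

```python
import numpy as np


def linear_observable_variance(J, l, mu, nu, gamma):
    """<chi, f> for f(q) = l.q under the perturbed dynamics with unit
    covariance target, M = I, Gamma = gamma * I and J1 = J2 = J (assumed)."""
    l = np.asarray(l, dtype=float)
    d = l.size
    I = np.eye(d)
    # Drift matrix of the linear SDE for X = (q, p).
    A = np.block([[-mu * J, I],
                  [-I, -(nu * J + gamma * I)]])
    # Linear Poisson solution chi(x) = c.x solves A^T c = -(l, 0).
    rhs = -np.concatenate([l, np.zeros(d)])
    c = np.linalg.solve(A.T, rhs)
    # Stationary covariance is the identity, so <chi, f> = c_q . l.
    return float(c[:d] @ l)


if __name__ == "__main__":
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])
    l = np.array([1.0, 0.0])
    for gamma in (2.0, 1.0):            # above and below sqrt(2)
        values = [linear_observable_variance(J, l, mu, mu, gamma)
                  for mu in (0.0, 0.25, 0.5, 1.0, 4.0)]
        print(gamma, np.round(values, 4))
```

In this two-dimensional example with \(\mu =\nu \), solving the linear system by hand gives \(\gamma \vert l\vert ^{2}/(1+(\gamma ^{2}-2)\mu ^{2}+\mu ^{4})\). This expression has a local maximum at \(\mu = 0\) exactly when \(\gamma >\sqrt{2}\) and tends to zero as \(\mu \rightarrow \infty \), consistent with the behaviour reported for Fig. 1d, e.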

### 4.2 Exponential Decay Rate

*optimal exponential decay rate* in (21), i.e.

*spectral bound* of the generator \({{\mathrm{\mathcal {L}}}}\) by

*B*. In the case where \(\mu =\nu \), the spectrum of *B* can be computed explicitly.

### Lemma 4

*B* is given by


### Proof

*I* is understood to denote the identity matrix of appropriate dimension. The above quantity is zero if and only if

### 4.3 Unit Covariance: Large Perturbations

In the previous subsection we observed that for the particular perturbation \(J_1 = J_2\) and \(\mu = \nu \) [see equation (34)] the perturbed Langevin dynamics demonstrated an improvement in performance for \(\mu \) in a neighbourhood of 0, when the observable is linear or quadratic. Recall that this dynamics is ergodic with respect to a standard Gaussian measure \(\hat{\pi }\) on \(\mathbb {R}^{2d}\) with marginal \(\pi \) with respect to the *q*-variable. As before, we shall consider only observables that do not depend on *p*. Moreover, we assume without loss of generality that \(\pi (f) = 0\). For such observables we will write \(f \in L_0^2(\pi )\) and consider the canonical embedding \(L_0^2(\pi ) \subset L^2(\hat{\pi })\). We emphasize that \(L_0^2(\pi )\) consists of functions that only depend on *q*, whereas functions in \(L^2(\hat{\pi })\) may depend on both *q* and *p*.

*B* and *Q* are assumed to be such that \(\mathcal {L}\) is the generator of an ergodic stochastic process (see [2, Definition 2.1] for precise conditions).

### Theorem 4

- (a) The space \(L^{2}(\widehat{\pi })\) has a decomposition into mutually orthogonal subspaces:

  $$\begin{aligned} L^{2}(\widehat{\pi })=\bigoplus _{m\in \mathbb {N}_{0}}H_{m}. \end{aligned}$$

- (b) For all \(m\in \mathbb {N}_{0}\), \(H_{m}\) is invariant under \(\mathcal {L}\) as well as under the semigroup \((e^{t\mathcal {L}})_{t\ge 0}\).

- (c) The spectrum of \(\mathcal {L}\) has the following decomposition:

  $$\begin{aligned} \sigma (\mathcal {L})=\bigcup _{m\in \mathbb {N}_{0}}\sigma (\mathcal {L}\vert _{H_{m}}), \quad \sigma (\mathcal {L}\vert _{H_{m}})=\left\{ \sum _{j=1}^{2d}\alpha _{j}\lambda _{j}:\,\vert \alpha \vert =m,\,\lambda _{j}\in \sigma (B)\right\} . \end{aligned}$$
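Part (c) gives a direct recipe for listing the spectrum on each subspace \(H_m\) once the eigenvalues of *B* are known. The sketch below follows the formula as displayed, enumerating all multi-indices \(\alpha \) with \(\vert \alpha \vert = m\) as multisets of size *m* drawn from the eigenvalues. The function name and the example eigenvalues are assumptions for illustration only.

```python
from itertools import combinations_with_replacement

import numpy as np


def hermite_block_spectrum(eigs_B, m):
    """All sums sum_j alpha_j * lambda_j with |alpha| = m, where the lambda_j
    are the given eigenvalues of B (one entry per multi-index alpha)."""
    eigs_B = np.asarray(eigs_B, dtype=complex)
    # A multi-index with |alpha| = m is a multiset of m eigenvalue indices.
    sums = [sum(combo) for combo in combinations_with_replacement(eigs_B, m)]
    return np.array(sums, dtype=complex)


if __name__ == "__main__":
    # Hypothetical eigenvalues of a 2 x 2 matrix B.
    eigs = [0.5 + 1.0j, 0.5 - 1.0j]
    for m in range(4):
        print(m, hermite_block_spectrum(eigs, m))
```

For \(m = 0\) the only multi-index is zero and the corresponding eigenvalue is 0, which belongs to the constant functions. For \(m = 1\) the output is the spectrum of *B* itself, so in this representation the eigenvalues of *B* already determine the slowest nontrivial modes.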

### Remark 6

Our first main result of this section is an expression for the asymptotic variance in terms of the unperturbed operator \(\mathcal {L}_{0}\) and the perturbation \(\mathcal {A}\):

### Proposition 2

### Remark 7

The proof of the preceding Proposition will show that \(\mathcal {L}_{0}^{2}+\mu ^{2}\mathcal {A}^{*}\mathcal {A}\) is invertible on \(L^2_0(\widehat{\pi })\) and that \((\mathcal {L}_{0}^{2}+\mu ^{2}\mathcal {A}^{*}\mathcal {A})^{-1}f \in \mathcal {D}(\mathcal {L}_0)\) for all \(f \in L^2_0(\widehat{\pi })\).

*generator with reversed perturbation*

*momentum flip operator*

*P* are gathered in the following lemma:

### Lemma 5

- (a) The generator \(\mathcal {L}_{0}\) is symmetric in \(L^2(\widehat{\pi })\) with respect to *P*:

  $$\begin{aligned} \langle \phi , P\mathcal {L}_{0}P \psi \rangle _{L^2(\widehat{\pi })}=\langle \mathcal {L}_{0} \phi , \psi \rangle _{L^2(\widehat{\pi })}. \end{aligned}$$

- (b) The perturbation \(\mathcal {A}\) is skewadjoint in \(L^{2}(\widehat{\pi })\):

  $$\begin{aligned} \mathcal {A}^{*} = -\mathcal {A}. \end{aligned}$$

- (c) The operators \(\mathcal {L}_{0}\) and \(\mathcal {A}\) commute:

  $$\begin{aligned}{}[\mathcal {L}_{0},\mathcal {A}]\phi =0. \end{aligned}$$

- (d) The perturbation \(\mathcal {A}\) satisfies

  $$\begin{aligned} P\mathcal {A}P\phi =\mathcal {A}\phi . \end{aligned}$$

- (e) \(\mathcal {L}\) and \(\mathcal {L}_{-}\) commute,

  $$\begin{aligned}{}[\mathcal {L},\mathcal {L}_{-}]\phi = 0, \end{aligned}$$

  and the following relation holds:

  $$\begin{aligned} \langle \phi ,P\mathcal {L}P\psi \rangle _{L^{2}(\widehat{\pi })}=\langle \mathcal {L}_{-}\phi ,\psi \rangle _{ L^{2}(\widehat{\pi })}. \end{aligned}\tag{52}$$

- (f) The operators \(\mathcal {L}\), \(\mathcal {L}_0\), \(\mathcal {L}_{-}\), \(\mathcal {A}\) and *P* leave the Hermite spaces \(H_m\) invariant.

### Remark 8

The claim (c) in the above lemma is crucial for our approach, which itself rests heavily on the fact that the \(q\)- and \(p\)-perturbations match (\(J_{1}=J_{2}\)).

### Proof of Lemma 5

The statement (a) is well-known and its proof can be found in [35, Sect. 2.2.3.1] for instance. The claim (b) follows by noting that the flow vector field \(b(q,p)=(-Jq,-Jp)\) associated to \(\mathcal {A}\) is divergence-free with respect to \(\widehat{\pi }\), i.e. \(\nabla \cdot (\widehat{\pi }b)=0\). Therefore, \(\mathcal {A}\) is the generator of a strongly continuous unitary semigroup on \(L^2(\widehat{\pi })\) and hence skewadjoint by Stone’s Theorem. The claims (c), (d) and (e) follow by direct computations which can be found in [41]. To prove (f) first notice that \(\mathcal {L}\), \(\mathcal {L}_0\) and \(\mathcal {L}_{-}\) are of the form (36) and therefore leave the spaces \(H_m\) invariant by Theorem 4. It follows immediately that also \(\mathcal {A}\) leaves those spaces invariant. The fact that *P* leaves the spaces \(H_m\) invariant follows directly by inspection of (49) and (51). \(\square \)

Now we proceed with the proof of Proposition 2:

### Proof of Proposition 2

*V* is quadratic, Assumption 1 clearly holds and thus Lemma 2 ensures that \(\mathcal {L}\) and \(\mathcal {L}_{-}\) are invertible on \(L^2_{0}(\widehat{\pi })\) with

### Theorem 5

Let \(f\in L_{0}^{2}(\pi )\) (so in particular \(f = f(q)\)). Then \( \lim _{\mu \rightarrow \infty }\sigma _{f}^{2}(\mu )=\sigma _{\varPi f}^{2}(0)\le \sigma _{f}^{2}(0). \)

### Remark 9

Note that the fact that the limit exists and is finite is nontrivial. In particular, as Fig. 1b, c demonstrate, it is often the case that \(\lim _{\mu \rightarrow \infty }\sigma _{f}^{2}(\mu )=\infty \) if the condition \(\mu =\nu \) is not satisfied.

### Remark 10

*f* that only depend on *q*, \(f \in \ker (Jq\cdot \nabla _q)\) is equivalent to \(f \in \ker \mathcal {A}\). Let us denote by \( \bar{\sigma } = \bigcap _{\mu \in \mathbb {R}} \sigma (\mathcal {L}_0 + \mu \mathcal {A}) \) the part of \(\sigma (\mathcal {L}_0)\) that is not affected by the perturbation and by

^{3}In Fig. 2a, \(\bar{\sigma }\) has been highlighted by diamonds.

### Proof of Theorem 5

*p*, \(\varPi f\in \ker (Jq\cdot \nabla _q)\) and \((1-\varPi )f\in \bigoplus _{i\ge 1}W_{i}\). Since \(\mathcal {L}_{0}\) commutes with \(\mathcal {A}\), it follows that \((-\mathcal {L}_{0})^{-1}\) leaves both \(W_{0}\) and \(\bigoplus _{i\ge 1}W_{i}\) invariant. Therefore, as the latter spaces are orthogonal to each other, it follows that \(R=0\), from which the result follows. \(\square \)

From Theorem 5 it follows that in the limit as \(\mu \rightarrow \infty \), the asymptotic variance \(\sigma _f^2(\mu )\) is not decreased by the perturbation if \(f \in \ker (Jq \cdot \nabla _q)\). In fact, this result also holds true non-asymptotically, i.e. observables in \(\ker (Jq \cdot \nabla _q)\) are not affected at all by the perturbation:

### Lemma 6

Let \(f\in \ker (Jq\cdot \nabla _q)\). Then \( \sigma ^2_f(\mu ) = \sigma ^2_f(0) \) for all \(\mu \in \mathbb {R}\).

### Proof

From \(f \in \ker (Jq\cdot \nabla _q)\) it follows immediately that \(f \in \ker \mathcal {A}^{*}\mathcal {A}\). Then the claim follows from the expression (57). \(\square \)

### Example 1

The following result shows that the dynamics (34) is particularly effective for antisymmetric observables (at least in the limit of large perturbations):

### Proposition 3

*J* are rationally independent, i.e.

### Proof of Proposition 3

The claim would immediately follow from \(f\in \ker (Jq\cdot \nabla )^{\perp }\) according to Theorem 5, but that does not seem to be so easy to prove directly. Instead, we again make use of the Hermite polynomials.

*f* being antisymmetric it follows that \( f\in \bigoplus _{m\ge 1,m\,\text {odd}}H_{m}. \) In view of (45), ((c)) and (58) the spectrum of \(\mathcal {L}_{\vert H_{m}}\) can be written as

*B*(0, *r*) denotes the ball of radius *r* centered at the origin in \(\mathbb {C}\). Consequently, the spectral radius of \((-\mathcal {L}\vert _{H_m})^{-1}\) and hence \((-\mathcal {L}\vert _{H_m})^{-1}\) itself converges to zero as \(\mu \rightarrow \infty \). The result then follows from (59). \(\square \)

### Remark 11

The idea of the preceding proof can be explained using Fig. 2a and Remark 10. The eigenvalues in the fixed spectrum \(\bar{E}\) (on the real axis, highlighted by diamonds) correspond to Hermite polynomials of even order. The independence condition on the eigenvalues of *J* prevents cancellations that would lead to fixed eigenvalues associated to Hermite polynomials of odd order. Therefore, antisymmetric observables are orthogonal to \(\bar{E} = \ker \mathcal {A}\).

The following corollary gives a version of the converse of Proposition 3 and provides further intuition into the mechanics of the variance reduction achieved by the perturbation.

### Corollary 1

*B*(0, *r*) denotes the ball centered at 0 with radius *r*.

### Proof

*J* is antisymmetric, we have that \(Jq\cdot \nabla \phi _n=\nabla \cdot (\phi _n Jq)\). Now Gauss’s theorem yields

*n* denotes the outward normal to the sphere \(\partial B(0,r)\). This quantity is zero due to the orthogonality of *Jq* and *n*, and so the result follows from Lebesgue’s dominated convergence theorem. \(\square \)

### 4.4 Optimal Choices of *J* for Quadratic Observables

Assume \(f\in L_{0}^{2}(\pi )\) is given by \(f(q)=q\cdot Kq+l\cdot q -{{\mathrm{Tr}}}K\), with \(K\in \mathbb {R}_{sym}^{d\times d}\) and \(l\in \mathbb {R}^{d}\) (note that the constant term is chosen such that \( \pi (f)=0 \)). Our objective is to choose *J* in such a way that \(\lim _{\mu \rightarrow \infty }\sigma _{f}^{2}(\mu )\) becomes as small as possible. To stress the dependence on the choice of *J*, we introduce the notation \(\sigma _{f}^{2}(\mu ,J)\). Also, we denote the orthogonal projection onto \((\ker J)^{\perp }\) by \(\varPi ^{\perp }_{\ker J}\).

### Lemma 7

### Proof

According to Theorem 5, we have to show that \(\varPi f=0\), where \(\varPi \) is the \(L^{2}(\pi )\)-orthogonal projection onto \(\ker (Jq\cdot \nabla )\). Let us thus use (63) and prove that \( f\in \overline{{{\mathrm{im}}}(Jq\cdot \nabla )}. \) Indeed, since \(\varPi ^{\perp }_{\ker J}{l}=0\), by Fredholm’s alternative there exists \(u \in \mathbb {R}^d\) such that \(Ju=l\). Now define \(\phi \in L_{0}^{2}(\pi )\) by \(\phi (q)=-u\cdot q,\) leading to \( f=Jq\cdot \nabla \phi , \) so the result follows. \(\square \)
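As a quick check of the last step, and assuming (as the proof suggests) that the observable is purely linear, i.e. \(K=0\) and \(f(q)=l\cdot q\), the identity \(f=Jq\cdot \nabla \phi \) can be verified directly from \(\nabla \phi =-u\), \(Ju=l\) and the antisymmetry \(J^{T}=-J\):

$$\begin{aligned} Jq\cdot \nabla \phi (q) = -Jq\cdot u = -q\cdot J^{T}u = q\cdot Ju = l\cdot q = f(q). \end{aligned}$$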

### Lemma 8

- (a) There exists an antisymmetric matrix *J* such that \(\lim _{\mu \rightarrow \infty }\sigma _{f_{0}}^{2}(\mu ,J)=0,\) and there is an algorithmic way (see Algorithm 1) to compute an appropriate *J* in terms of *K*.
- (b) The trace-part is not affected by the perturbation, i.e. \(\sigma _{f_{1}}^{2}(\mu ,J)=\sigma _{f_{1}}^{2}(0)\) for all \(\mu \in \mathbb {R}\).

### Proof

*J* such that \( \lim _{\mu \rightarrow \infty }\sigma _{f_{0}}^{2}(\mu ,J)=0 \) can therefore be accomplished by constructing an antisymmetric matrix *J* such that there exists a symmetric matrix *A* with the property \(K_{0}=[A,J]\). Given any traceless matrix \(K_{0}\) there exists an orthogonal matrix \(U\in O(\mathbb {R}^{d})\) such that \(UK_{0}U^{T}\) has zero entries on the diagonal, and that *U* can be obtained in an algorithmic manner (see for example [29] or [22, Chap. 2, Sect. 2, Problem 3]). Assume thus that such a matrix \(U\in O(\mathbb {R}^{d})\) has been found and choose real numbers \(a_1,\ldots ,a_d \in \mathbb {R}\) such that \(a_{i}\ne a_{j}\) if \(i\ne j\). We now set \( \bar{A}={{\mathrm{diag}}}(a_{1},\ldots ,a_{d}), \) and

*J* constructed in this way indeed satisfies (4.4). For the second claim, note that \(f_{1}\in \ker (Jq\cdot \nabla )\), since \( Jq\cdot \nabla \left( q\cdot \frac{{{\mathrm{Tr}}}K}{d}q\right) =2\frac{{{\mathrm{Tr}}}K}{d}q\cdot Jq=0 \) due to the antisymmetry of *J*. The result then follows from Lemma 6. \(\square \)
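The role of the commutator condition \(K_{0}=[A,J]\) can be made explicit by a short computation. The following is a sketch for an arbitrary symmetric matrix *A* and antisymmetric matrix *J*; the test function \(\phi _{A}(q)=q\cdot Aq\) is introduced here only for illustration:

$$\begin{aligned} Jq\cdot \nabla \phi _{A}(q) = 2\,Jq\cdot Aq = 2\,q\cdot J^{T}Aq = -2\,q\cdot JAq = q\cdot (AJ-JA)q = q\cdot [A,J]q, \end{aligned}$$

where the fourth equality uses that only the symmetric part of \(-2JA\), namely \(-(JA+A^{T}J^{T})=AJ-JA\), contributes to the quadratic form. Hence, whenever \(K_{0}=[A,J]\), the quadratic form \(q\cdot K_{0}q\) lies in the image of \(Jq\cdot \nabla \), which is the property the construction aims for.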

We would like to stress that the perturbation *J* constructed in the previous lemma is far from unique due to the freedom of choice of *U* and \(a_1,\ldots ,a_d \in \mathbb {R}\) in its proof. However, it is asymptotically optimal:

### Corollary 2

### Proof

The claim follows immediately since \(f_{1}\in \ker (Jq\cdot \nabla )\) for arbitrary antisymmetric *J* as shown in (4.4), and therefore the contribution of the trace part \(f_1\) to the asymptotic variance cannot be reduced by any choice of *J* according to Lemma 6. \(\square \)

As the proof of Lemma 8 is constructive, we obtain the following algorithm for determining optimal perturbations for quadratic observables:

### Algorithm 1

*J* as follows:

- 1. Set \(K_{0}=K-\frac{{{\mathrm{Tr}}}K}{d}\cdot I.\)
- 2. Find \(U\in O(\mathbb {R}^{d})\) such that \(UK_{0}U^{T}\) has zero entries on the diagonal.
- 3. Choose \(a_{i}\in \mathbb {R}\), \(i=1,\ldots ,d\) such that \(a_{i}\ne a_{j}\) for \(i\ne j\) and set$$\begin{aligned} \bar{J}_{ij}=\frac{(UK_{0}U^{T})_{ij}}{a_{i}-a_{j}} \end{aligned}$$for \(i\ne j\) and \(\bar{J}_{ii}=0\) otherwise.
- 4. Set \(J=U^{T}\bar{J}U\).
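The four steps above can be sketched in Python with NumPy. This is a minimal illustration and not a reference implementation: step 2 is realised with a sequence of plane (Givens) rotations, which is an illustrative choice and not necessarily the procedure of [29] or [22]; the function names `zero_diagonal_basis` and `optimal_perturbation` and the default choice \(a_{i}=i\) are assumptions made for this example.

```python
import numpy as np


def zero_diagonal_basis(K0, tol=1e-12):
    """Orthogonal U such that U @ K0 @ U.T has (numerically) zero diagonal.

    K0 must be symmetric with zero trace. Plane rotations are an
    illustrative choice, not necessarily the method of the cited references.
    """
    d = K0.shape[0]
    M = np.array(K0, dtype=float)
    U = np.eye(d)
    for i in range(d - 1):
        if abs(M[i, i]) <= tol:
            continue
        # Entries 0..i-1 are already zero and the trace is zero, so the later
        # diagonal entries sum to -M[i, i]; pick the most opposite one.
        j = i + 1 + int(np.argmax(-np.sign(M[i, i]) * np.diag(M)[i + 1:]))
        # Solve M_jj t^2 + 2 M_ij t + M_ii = 0 for t = tan(theta); the
        # discriminant is positive because M_ii and M_jj have opposite signs.
        disc = M[i, j] ** 2 - M[i, i] * M[j, j]
        t = (-M[i, j] + np.sqrt(disc)) / M[j, j]
        c = 1.0 / np.sqrt(1.0 + t * t)
        s = t * c
        G = np.eye(d)
        G[i, i], G[i, j], G[j, i], G[j, j] = c, s, -s, c
        M = G @ M @ G.T  # zeroes entry (i, i); only entries i and j of the diagonal change
        U = G @ U
    return U


def optimal_perturbation(K, a=None):
    """Steps 1-4 of Algorithm 1. Returns (J, A, K0) with K0 = A J - J A."""
    K = np.asarray(K, dtype=float)
    d = K.shape[0]
    K0 = K - np.trace(K) / d * np.eye(d)              # step 1
    U = zero_diagonal_basis(K0)                        # step 2
    # Distinct a_i; the default 1, 2, ..., d is an arbitrary (assumed) choice.
    a = np.arange(1.0, d + 1.0) if a is None else np.asarray(a, dtype=float)
    B = U @ K0 @ U.T
    gaps = a[:, None] - a[None, :]                     # a_i - a_j
    np.fill_diagonal(gaps, 1.0)                        # placeholder; diagonal is overwritten below
    J_bar = B / gaps                                   # step 3
    np.fill_diagonal(J_bar, 0.0)
    J = U.T @ J_bar @ U                                # step 4
    A = U.T @ np.diag(a) @ U
    return J, A, K0


# Usage on a random symmetric K (hypothetical test data).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
J, A, K0 = optimal_perturbation(X + X.T)
print(np.max(np.abs(J + J.T)), np.max(np.abs(A @ J - J @ A - K0)))
```

Both printed residuals should be at the level of floating-point rounding, reflecting that *J* is antisymmetric and that \(K_{0}=[A,J]\) with \(A=U^{T}\bar{A}U\).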

### Remark 12

In [14], the authors consider the task of finding optimal perturbations *J* for the nonreversible overdamped Langevin dynamics given in (15). In the Gaussian case this optimization problem turns out to be equivalent to the one considered in this section. Indeed, equation (39) of [14] can be rephrased as \( f \in \ker (Jq\cdot \nabla )^{\perp }. \) Therefore, Algorithm 1 and its generalization Algorithm 2 (described in Sect. 4.5) can be used without modifications to find optimal perturbations of overdamped Langevin dynamics.

### 4.5 Gaussians with Arbitrary Covariance and Preconditioning

### Corollary 3

### Proof

*S* is nondegenerate. The second condition is equivalent to \( S^{1/2}J_{1}l\ne 0, \) which is equivalent to \(J_{1}l\ne 0,\) again by nondegeneracy of *S*. \(\square \)

### Corollary 4

Assume the setting from the previous corollary and denote by \(\varPi \) the orthogonal projection onto \(\ker (J_{1}Sq\cdot \nabla )\). For \(f\in L^{2}(\pi )\) it holds that \( \lim _{\mu \rightarrow \infty }\sigma _{f}^{2}(\mu )=\sigma _{\varPi f}^{2}(0)\le \sigma _{f}^{2}(0). \)

### Proof

Theorem 5 implies \( \lim _{\mu \rightarrow \infty }\widetilde{\sigma }_{\widetilde{f}}^{2}(\mu )=\widetilde{\sigma }_{\widetilde{\varPi }\widetilde{f}}^{2}(0)\le \widetilde{\sigma }_{\widetilde{f}}^{2}(0) \) for the transformed system (66). Here \(\widetilde{f}(q)=f(S^{-1/2}q)\) is the transformed observable and \(\widetilde{\varPi }\) denotes \(L^{2}(\pi )\)-orthogonal projection onto \(\ker (S^{1/2}J_{1}S^{1/2}q\cdot \nabla )\). According to (4.5), it is sufficient to show that \((\varPi f)\circ S^{-1/2}=\widetilde{\varPi }\widetilde{f}\). This however follows directly from the fact that the linear transformation \(\phi \mapsto \phi \circ S^{1/2}\) maps \(\ker (S^{1/2}J_{1}S^{1/2}q\cdot \nabla )\) bijectively onto \(\ker (J_{1}Sq\cdot \nabla )\). \(\square \)

Let us also reformulate Algorithm 1 for the case of a Gaussian with arbitrary covariance.

### Algorithm 2

*S* is nondegenerate), determine optimal perturbations \(J_{1}\) and \(J_{2}\) as follows:

- 1. Set \(\widetilde{K}=S^{-1/2}KS^{-1/2}\) and \(\widetilde{K}_{0}=\widetilde{K}-\frac{{{\mathrm{Tr}}}\widetilde{K}}{d}\cdot I\).
- 2. Find \(U\in O(\mathbb {R}^{d})\) such that \(U\widetilde{K}_{0}U^{T}\) has zero entries on the diagonal.
- 3. Choose \(a_{i}\in \mathbb {R}\), \(i=1,\ldots ,d\) such that \(a_{i}\ne a_{j}\) for \(i\ne j\) and set$$\begin{aligned} \bar{J}_{ij}=\frac{(U\widetilde{K}_{0}U^{T})_{ij}}{a_{i}-a_{j}}. \end{aligned}$$
- 4. Set \(\widetilde{J}=U^{T}\bar{J}U\).
- 5. Put \(J_{1}=S^{-1/2}\widetilde{J}S^{-1/2}\) and \(J_{2}=S^{1/2}\widetilde{J}S^{1/2}\).
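As a consistency check, and assuming that both formulas in the last step are built from the matrix \(\widetilde{J}\) of the preceding step, the two perturbations satisfy the relation \(J_{2}=SJ_{1}S\):

$$\begin{aligned} SJ_{1}S = S\,S^{-1/2}\widetilde{J}S^{-1/2}\,S = S^{1/2}\widetilde{J}S^{1/2} = J_{2}. \end{aligned}$$

Moreover, since \(S^{\pm 1/2}\) is symmetric and \(\widetilde{J}\) is antisymmetric, both \(J_{1}\) and \(J_{2}\) are antisymmetric, e.g. \(J_{1}^{T}=S^{-1/2}\widetilde{J}^{T}S^{-1/2}=-J_{1}\).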

Finally, we obtain the following optimality result from Lemma 7 and Corollary 2.

### Corollary 5

### Remark 13

Since in Sect. 4.1 we analysed the case where \(J_{1}\) and \(J_{2}\) are proportional, we are not able to drop the restriction \(J_{2}=SJ_{1}S\) from the above optimality result. Analysis of completely arbitrary perturbations will be the subject of future work.

### Remark 14

*S* is diagonal, i.e. \(S={{\mathrm{diag}}}(s^{(1)},\ldots ,s^{(d)})\) and that \(M={{\mathrm{diag}}}(m^{(1)},\ldots ,m^{(d)})\) and \(\varGamma ={{\mathrm{diag}}}(\gamma ^{(1)},\ldots ,\gamma ^{(d)})\) are chosen to be diagonal as well. Then (65) decouples into one-dimensional SDEs of the following form:

*i*, leading to the restriction \(M=cS\) with \(c>0\). Choosing *c* small will result in fast convergence to equilibrium, but also make the dynamics (69) quite stiff, requiring a very small timestep \(\varDelta t\) in a discretisation scheme. The choice of *c* will therefore need to strike a balance between those two competing effects. The constraint (71) then implies \(\varGamma =2cS\). By a coordinate transformation, the preceding argument also applies if *S*, *M* and \(\varGamma \) are diagonal in the same basis, and of course *M* and \(\varGamma \) can always be chosen that way. Numerical experiments show that it is possible to increase the rate of convergence to equilibrium even further by choosing *M* and \(\varGamma \) nondiagonally with respect to *S* (although only by a small margin). A clearer understanding of this is a topic of further investigation.

## 5 Numerical Experiments: Diffusion Bridge Sampling

### 5.1 Numerical Scheme

*BAOAB* scheme (see [33] and references therein) has proven to be efficient for computing long-time ergodic averages with respect to *q*-dependent observables. Motivated by this, we introduce the following perturbed scheme, introducing additional Runge-Kutta integration steps:

*O*-part without much computational overhead. We emphasize the fact that many different splitting schemes could be investigated: although the BAOAB-scheme works well for unperturbed Langevin dynamics, it is not clear whether this remains true for the perturbed dynamics. Moreover, the perturbations introduced by \(J_1\) and \(J_2\) can be added in various places. Other discretisation schemes for the ODE (73) could be useful as well, for instance one could use a symplectic integrator, using the Hamiltonian structure of (73). However, since *V* as the Hamiltonian for (73) is not separable in general, such a symplectic integrator would have to be implicit. Note that (72c) and (72e) could be merged since (72e) commutes with (72d). In this paper, we content ourselves with the above scheme for our numerical experiments. Investigation of optimal numerical schemes for perturbed Langevin dynamics is an interesting problem for further research.
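To illustrate how such a splitting might be organised, the following Python sketch implements one possible arrangement of the steps; it is not claimed to coincide with the scheme (72). It assumes perturbed dynamics of the form \(\mathrm {d}q=p\,\mathrm {d}t-\mu J_{1}\nabla V(q)\,\mathrm {d}t\), \(\mathrm {d}p=-\nabla V(q)\,\mathrm {d}t-(\gamma +\nu J_{2})p\,\mathrm {d}t+\sqrt{2\gamma }\,\mathrm {d}W\), with the simplifying choices \(M=I\) and \(\varGamma =\gamma I\). Under these assumptions the \(J_{2}\)-term can be absorbed exactly into the *O*-part as a rotation of the momentum, while the \(J_{1}\)-term is integrated by classical Runge-Kutta half steps. The names `rotation_exp`, `rk4` and `make_step` are hypothetical.

```python
import numpy as np


def rotation_exp(J, t):
    """exp(-t J) for a real antisymmetric J, via the Hermitian matrix iJ."""
    w, V = np.linalg.eigh(1j * J)          # iJ = V diag(w) V^H with real w
    return ((V * np.exp(1j * t * w)) @ V.conj().T).real


def rk4(q, rhs, h):
    """One classical fourth-order Runge-Kutta step for dq/dt = rhs(q)."""
    k1 = rhs(q)
    k2 = rhs(q + 0.5 * h * k1)
    k3 = rhs(q + 0.5 * h * k2)
    k4 = rhs(q + h * k3)
    return q + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def make_step(grad_V, J1, J2, mu, nu, gamma, dt):
    """Build a one-step map (q, p, rng) -> (q, p); assumes M = I, Gamma = gamma * I."""
    R = rotation_exp(J2, nu * dt)          # exact flow of dp/dt = -nu J2 p
    decay = np.exp(-gamma * dt)
    noise_scale = np.sqrt(1.0 - decay ** 2)

    def drift(q):
        return -mu * (J1 @ grad_V(q))      # q-perturbation, dq/dt = -mu J1 grad V(q)

    def step(q, p, rng):
        p = p - 0.5 * dt * grad_V(q)       # B: half kick
        q = q + 0.5 * dt * p               # A: half drift
        q = rk4(q, drift, 0.5 * dt)        # Runge-Kutta half step for the J1-term
        # O: exact Ornstein-Uhlenbeck update including the J2 rotation
        p = decay * (R @ p) + noise_scale * rng.standard_normal(p.shape)
        q = rk4(q, drift, 0.5 * dt)        # Runge-Kutta half step for the J1-term
        q = q + 0.5 * dt * p               # A: half drift
        p = p - 0.5 * dt * grad_V(q)       # B: half kick
        return q, p

    return step


# Usage on a two-dimensional standard Gaussian target, V(q) = |q|^2 / 2.
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
step = make_step(lambda q: q, J, J, mu=1.0, nu=1.0, gamma=2.0, dt=0.01)
rng = np.random.default_rng(0)
q, p = np.ones(2), np.zeros(2)
for _ in range(1000):
    q, p = step(q, p, rng)
```

The exact *O*-update relies on the fact that, for antisymmetric \(J_{2}\), the matrix \(e^{-t(\gamma +\nu J_{2})}\) is a rotation scaled by \(e^{-\gamma t}\), so the covariance of the injected noise is \((1-e^{-2\gamma t})I\) as in the unperturbed case; for general *M* and \(\varGamma \) this simplification is not available.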

### Remark 15

The aforementioned schemes lead to an error in the approximation for \(\pi (f)\), since the invariant measure \(\pi \) is not preserved exactly by the numerical scheme. In practice, the *BAOAB*-scheme can therefore be accompanied by an accept-reject Metropolis step as in [40], leading to an unbiased estimate of \(\pi (f)\), albeit with an inflated variance. In this case, after every rejection the momentum variable has to be flipped (\(p\mapsto -p\)) in order to keep the correct invariant measure. We note here that our perturbed scheme can be ’Metropolized’ in a similar way by ’flipping’ the matrices \(J_{1}\) and \(J_{2}\) after every rejection (\(J_{1}\mapsto -J_{1}\) and \(J_{2}\mapsto -J_{2})\) and using an appropriate (volume-preserving and time-reversible) integrator for the dynamics given by (73). Implementations of this idea are the subject of ongoing work. See [47] for a similar approach to nonreversible overdamped Langevin dynamics.

### 5.2 Diffusion Bridge Sampling

*U* is multimodal. This setting has been used as a test case for sampling probability measures in high dimensions (see for example [9] and [45]). For a more detailed introduction (including applications) see [11] and for a rigorous theoretical treatment the papers [11, 24, 25, 26].

*s*-interval [0, 1] according to

*M*, \(\varGamma \), \(J_{1}\) and \(J_{2}\) according to the recommended choice in the Gaussian case, (27), where we take \(S=\frac{\beta }{2}\delta \cdot A_{\delta }\) as the precision operator of the Gaussian target. We will consider the linear observable \(f_{1}(x)=l\cdot x\) with \(l=(1,\ldots ,1)\) and the quadratic observable \(f_{2}(x)=\vert x\vert ^{2}\). In a first experiment we adjust the perturbations \(J_{1}\) and \(J_{2}\) to the observable \(f_{2}\) according to Algorithm 2. The dynamics (8) is integrated using the splitting scheme introduced in Sect. 5.1 with a stepsize of \(\varDelta t=10^{-4}\) over the time interval [0, *T*] with \(T=10^{2}\). Furthermore, we choose initial conditions \(q_0=(1,\ldots ,1)\), \(p_0=(0,\ldots ,0)\) and introduce a burn-in time \(T_{0}=1\), i.e. we take the estimator to be \( \hat{\pi }(f)\approx \frac{1}{T-T_{0}}\int _{T_{0}}^{T}f(q_{t})\mathrm {d}t. \) We compute the variance of the above estimator from \(N=500\) realisations and compare the results for different choices of the friction coefficient \(\gamma \) and of the perturbation strength \(\mu \).
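The estimator and its empirical variance can be computed along the following lines. This is a minimal sketch that assumes the values \(f(q_{t})\) have been stored on a uniform time grid, one row per independent realisation; the time integral is replaced by a plain Riemann sum, and the function names `ergodic_average` and `estimator_variance` are hypothetical.

```python
import numpy as np


def ergodic_average(f_path, dt, burn_in):
    """Riemann-sum approximation of (1/(T-T0)) * int_{T0}^{T} f(q_t) dt.

    f_path holds f(q_t) on a uniform grid with spacing dt along the last axis;
    samples before the burn-in time T0 = burn_in are discarded.
    """
    n0 = int(round(burn_in / dt))
    return np.mean(f_path[..., n0:], axis=-1)


def estimator_variance(f_paths, dt, burn_in):
    """Unbiased sample variance of the ergodic average over realisations (rows)."""
    estimates = ergodic_average(f_paths, dt, burn_in)
    return np.var(estimates, ddof=1)


# Usage with synthetic data standing in for N = 3 simulated trajectories.
rng = np.random.default_rng(0)
paths = rng.standard_normal((3, 1000))
print(estimator_variance(paths, dt=0.01, burn_in=1.0))
```

For large \(T\), multiplying this variance by \(T-T_{0}\) gives an empirical proxy for the asymptotic variance \(\sigma _{f}^{2}\), which is the quantity compared across different values of \(\gamma \) and \(\mu \).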

In the regime of growing values of \(\mu \), the experiments confirm the results from Sect. 4.3, i.e. the asymptotic variance approaches a limit that is smaller than the asymptotic variance of the unperturbed dynamics.

As a final remark we report our finding that the performance of the sampler for the linear observable is qualitatively independent of the choice of \(J_1\) [as long as \(J_2\) is adjusted according to (27)]. This result is in alignment with Proposition 3 which predicts good properties of the sampler for antisymmetric observables. In contrast to this, a judicious choice of \(J_1\) is critical for quadratic observables. In particular, applying Algorithm 2 significantly improves the performance of the perturbed sampler in comparison to choosing \(J_1\) arbitrarily.

## 6 Outlook and Future Work

A new family of Langevin samplers was introduced in this paper. These new SDE samplers consist of perturbations of the underdamped Langevin dynamics (that is known to be ergodic with respect to the canonical measure), where auxiliary drift terms in the equations for both the position and the momentum are added, in a way that the perturbed family of dynamics is ergodic with respect to the same (canonical) distribution. These new Langevin samplers were studied in detail for Gaussian target distributions where it was shown, using tools from spectral theory for differential operators, that an appropriate choice of the perturbations in the equations for the position and momentum can improve the performance of the Langevin sampler, at least in terms of reducing the asymptotic variance. The performance of the perturbed Langevin sampler for non-Gaussian target densities was tested numerically on the problem of diffusion bridge sampling.

The work presented in this paper can be improved and extended in several directions. First, a rigorous analysis of the new family of Langevin samplers for non-Gaussian target densities is needed. The analytical tools developed in [14] can be used as a starting point. Furthermore, the study of the actual computational cost and its minimization by an appropriate choice of the numerical scheme and of the perturbations in position and momentum would be of interest to practitioners. In addition, the analysis of our proposed samplers can be facilitated by using tools from symplectic and differential geometry. Finally, combining the new Langevin samplers with existing variance reduction techniques such as zero variance MCMC, preconditioning/Riemannian manifold MCMC can lead to sampling schemes that can be of interest to practitioners, in particular in molecular dynamics simulations. All these topics are currently under investigation.

## Footnotes

- 1.
In fact, using the results from [8], we could consider observables in \(L^2(\pi )\). However, we will not extend this point further in this paper.

- 2.
Notice that \(\sqrt{\left( \frac{\gamma }{2}\right) ^2-1}\) is understood to be a complex number for \(\gamma < 2\).

- 3.
Indeed, the fact that \(f\in \ker \mathcal {A}\) is equivalent to \(f \in \bar{E}\) is easy to check if *f* is an eigenvector of \(\mathcal {L}_0\) (recall that *f* is then an eigenvector of \(\mathcal {L}_0 +\mu \mathcal {A}\) as well, using Lemma 5(c)). The claim then follows by extending linearly.

## Notes

### Acknowledgements

AD was supported by the EPSRC under Grant No. EP/J009636/1. NN is supported by EPSRC through a Roth Departmental Scholarship. GP is partially supported by the EPSRC under Grants No. EP/J009636/1, EP/L024926/1, EP/L020564/1 and EP/L025159/1. Part of the work reported in this paper was done while NN and GP were visiting the Institut Henri Poincaré during the Trimester Program “Stochastic Dynamics Out of Equilibrium”. The hospitality of the Institute and of the organizers of the program is gratefully acknowledged. We thank the referees for their useful comments and suggestions that have led to various improvements in the presentation of this paper.

## References

- 1. Achleitner, F., Arnold, A., Stürzer, D.: Large-time behavior in non-symmetric Fokker-Planck equations. Riv. Math. Univ. Parma (N.S.) **6**(1), 1–68 (2015)
- 2. Arnold, A., Erb, J.: Sharp entropy decay for hypocoercive and non-symmetric Fokker-Planck equations with linear drift. arXiv:1409.5425v2 (2014)
- 3. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability, vol. 57. Springer, New York (2007)
- 4. Alrachid, H., Mones, L., Ortner, C.: Some remarks on preconditioning molecular dynamics. arXiv preprint arXiv:1612.05435 (2016)
- 5. Bass, R.F.: Diffusions and Elliptic Operators. Springer, Berlin (1998)
- 6. Bennett, C.H.: Mass tensor molecular dynamics. J. Comput. Phys. **19**(3), 267–279 (1975)
- 7. Bakry, D., Gentil, I., Ledoux, M.: Analysis and Geometry of Markov Diffusion Operators, vol. 348. Springer, Berlin (2013)
- 8. Bhattacharya, R.N.: On the functional central limit theorem and the law of the iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete **60**(2), 185–201 (1982)
- 9. Beskos, A., Pinski, F.J., Sanz-Serna, J.M., Stuart, A.M.: Hybrid Monte Carlo on Hilbert spaces. Stoch. Process. Appl. **121**(10), 2201–2230 (2011)
- 10. Beskos, A., Roberts, G., Stuart, A., Voss, J.: MCMC methods for diffusion bridges. Stoch. Dyn. **8**(3), 319–350 (2008)
- 11. Beskos, A., Stuart, A.: MCMC methods for sampling function space. In: ICIAM 07—6th International Congress on Industrial and Applied Mathematics, pp. 337–364. European Mathematical Society, Zürich (2009)
- 12. Ceriotti, M., Bussi, G., Parrinello, M.: Langevin equation with colored noise for constant-temperature molecular dynamics simulations. Phys. Rev. Lett. **102**(2), 020601 (2009)
- 13. Cattiaux, P., Chafaï, D., Guillin, A.: Central limit theorems for additive functionals of ergodic Markov diffusions processes. ALEA **9**(2), 337–382 (2012)
- 14. Duncan, A.B., Lelièvre, T., Pavliotis, G.A.: Variance reduction using nonreversible Langevin samplers. J. Stat. Phys. **163**(3), 457–491 (2016)
- 15. Dolbeault, J., Mouhot, C., Schmeiser, C.: Hypocoercivity for linear kinetic equations conserving mass. Trans. Am. Math. Soc. **367**(6), 3807–3828 (2015)
- 16. Eyink, G.L., Lebowitz, J.L., Spohn, H.: Hydrodynamics and fluctuations outside of local equilibrium: driven diffusive systems. J. Stat. Phys. **83**(3–4), 385–472 (1996)
- 17. Engel, K.-J., Nagel, R.: One-parameter semigroups for linear evolution equations, volume 194 of Graduate Texts in Mathematics. Springer, New York (2000). With contributions by Brendle, S., Campiti, M., Hahn, T., Metafune, G., Nickel, G., Pallara, D., Perazzoli, C., Rhandi, A., Romanelli, S., Schnaubelt, R.
- 18. Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. **73**(2), 123–214 (2011). With discussion and a reply by the authors
- 19. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis. Texts in Statistical Science Series, 3rd edn. CRC Press, Boca Raton, FL (2014)
- 20. Hwang, C.-R., Hwang-Ma, S., Sheu, S.J.: Accelerating Gaussian diffusions. Ann. Appl. Probab. **3**(3), 897–913 (1993)
- 21. Hwang, C.-R., Hwang-Ma, S.-Y., Sheu, S.-J.: Accelerating diffusions. Ann. Appl. Probab. **15**(2), 1433–1444 (2005)
- 22. Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2013)
- 23. Hwang, C.-R., Normand, R., Wu, S.-J.: Variance reduction for diffusions. Stoch. Process. Appl. **125**(9), 3522–3540 (2015)
- 24. Hairer, M., Stuart, A.M., Voss, J.: Analysis of SPDEs arising in path sampling. II. The nonlinear case. Ann. Appl. Probab. **17**(5–6), 1657–1706 (2007)
- 25. Hairer, M., Stuart, A., Voss, J.: Sampling conditioned diffusions. In: Trends in Stochastic Analysis. London Mathematical Society Lecture Note Series, vol. 353, pp. 159–185. Cambridge University Press, Cambridge (2009)
- 26. Hairer, M., Stuart, A.M., Voss, J., Wiberg, P.: Analysis of SPDEs arising in path sampling. I. The Gaussian case. Commun. Math. Sci. **3**(4), 587–603 (2005)
- 27. Iacobucci, A., Olla, S., Stoltz, G.: Convergence rates for nonequilibrium Langevin dynamics. arXiv:1702.03685 (2017)
- 28. Joulin, A., Ollivier, Y.: Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. **38**(6), 2418–2442 (2010)
- 29. Kazakia, J.Y.: Orthogonal transformation of a trace free symmetric matrix into one with zero diagonal elements. Int. J. Eng. Sci. **26**(8), 903–906 (1988)
- 30. Kliemann, W.: Recurrence and invariant measures for degenerate diffusions. Ann. Probab. **15**, 690–707 (1987)
- 31. Komorowski, T., Landim, C., Olla, S.: Fluctuations in Markov processes. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 345. Springer, Heidelberg. Time symmetry and martingale approximation (2012)
- 32. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Berlin (2008)
- 33. Leimkuhler, B., Matthews, C.: Molecular Dynamics. Interdisciplinary Applied Mathematics, vol. 39. Springer, Berlin (2015). With deterministic and stochastic numerical methods
- 34. Lelièvre, T., Nier, F., Pavliotis, G.A.: Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J. Stat. Phys. **152**(2), 237–274 (2013)
- 35. Lelièvre, T., Rousset, M., Stoltz, G.: Free Energy Computations. Imperial College Press, London (2010). A Mathematical Perspective
- 36. Lelièvre, T., Stoltz, G.: Partial differential equations and stochastic methods in molecular dynamics. Acta Numer. **25**, 681–880 (2016)
- 37. Ma, Y.-A., Chen, T., Fox, E.: A complete recipe for stochastic gradient MCMC. In: Advances in Neural Information Processing Systems, pp. 2899–2907 (2015)
- 38. Metafune, G., Pallara, D., Priola, E.: Spectrum of Ornstein-Uhlenbeck operators in \(L^p\) spaces with respect to invariant measures. J. Funct. Anal. **196**(1), 40–60 (2002)
- 39. Markowich, P.A., Villani, C.: On the trend to equilibrium for the Fokker-Planck equation: an interplay between physics and functional analysis. Mat. Contemp. **19**, 1–29 (2000)
- 40. Matthews, C., Weare, J., Leimkuhler, B.: Ensemble preconditioning for Markov Chain Monte Carlo simulation. arXiv:1607.03954 (2016)
- 41. Nüsken, N.: Construction of optimal samplers (in preparation). PhD thesis, Imperial College London (2018)
- 42. Ottobre, M., Pavliotis, G.A.: Asymptotic analysis for the generalized Langevin equation. Nonlinearity **24**(5), 1629 (2011)
- 43. Ottobre, M., Pavliotis, G.A., Pravda-Starov, K.: Exponential return to equilibrium for hypoelliptic quadratic systems. J. Funct. Anal. **262**(9), 4000–4039 (2012)
- 44. Ottobre, M., Pavliotis, G.A., Pravda-Starov, K.: Some remarks on degenerate hypoelliptic Ornstein-Uhlenbeck operators. J. Math. Anal. Appl. **429**(2), 676–712 (2015)
- 45. Ottobre, M., Pillai, N.S., Pinski, F.J., Stuart, A.M.: A function space HMC algorithm with second order Langevin diffusion limit. Bernoulli **22**(1), 60–106 (2016)
- 46. Pavliotis, G.A.: Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, vol. 60. Springer, Berlin (2014)
- 47. Poncet, R.: Generalized and hybrid Metropolis-Hastings overdamped Langevin algorithms. arXiv:1701.05833 (2017)
- 48. Pavliotis, G.A., Stuart, A.M.: White noise limits for inertial particles in a random field. Multiscale Model. Simul. **1**(4), 527–533 (2003). (electronic)
- 49. Pavliotis, G.A., Stuart, A.M.: Analysis of white noise limits for stochastic systems with two fast relaxation times. Multiscale Model. Simul. **4**(1), 1–35 (2005). (electronic)
- 50. Rey-Bellet, L., Spiliopoulos, K.: Irreversible Langevin samplers and variance reduction: a large deviations approach. Nonlinearity **28**(7), 2081–2103 (2015)
- 51. Rey-Bellet, L., Spiliopoulos, K.: Variance reduction for irreversible Langevin samplers and diffusion on graphs. Electron. Commun. Probab., vol. 20, pp. 15, 16, (2015)
- 52. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, Berlin (2013)
- 53. Roussel, J., Stoltz, G.: Spectral methods for Langevin dynamics and associated error estimates. arXiv:1702.04718 (2017)
- 54. Villani, C.: Hypocoercivity. Number 949-951. American Mathematical Society (2009)
- 55. Wu, S.-J., Hwang, C.-R., Chu, M.T.: Attaining the optimal Gaussian diffusion acceleration. J. Stat. Phys. **155**(3), 571–590 (2014)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.