# Variance Reduction Using Nonreversible Langevin Samplers


## Abstract

A standard approach to computing expectations with respect to a given target measure is to introduce an overdamped Langevin equation which is reversible with respect to the target distribution, and to approximate the expectation by a time-averaging estimator. As has been noted in recent papers [30, 37, 61, 72], introducing an appropriately chosen nonreversible component to the dynamics is beneficial, both in terms of reducing the asymptotic variance and of speeding up convergence to the target distribution. In this paper we present a detailed study of the dependence of the asymptotic variance on the deviation from reversibility. Our theoretical findings are supported by numerical simulations.

## 1 Introduction

### 1.1 Motivation

*f* with respect to a target probability distribution \(\pi (dx)\) on \(\mathbb {R}^d\) with density \(\pi (x)\) with respect to the Lebesgue measure, known up to the normalization constant.^{1} When the dimension *d* is large, standard deterministic quadrature approaches become intractable, and one typically resorts to Markov-chain Monte Carlo (MCMC) methods [19, 39, 63]. In this approach, \(\pi (f)\) is approximated by a long-time average of the form:

*f*. For the reversible process (3) started from stationarity (i.e. \(X_0 \sim \pi \)), the Kipnis–Varadhan theorem [12, 32] implies that (4) holds with asymptotic variance

*f* and the process \(X_t\) over which the long-time average is generated. Quite often, \(X_t\) will exhibit some form of metastability [10, 11, 36, 66]: the process \(X_t\) will remain trapped for a long time exploring one mode, with transitions between different modes occurring over longer timescales. When the observable depends directly on the metastable degrees of freedom (i.e. the observable takes different values in different metastable regions), the asymptotic variance \(\sigma _f^2\) of the estimator \(\pi _T(f)\) may be very large. As a result, more samples are required to obtain an estimate of \(\pi (f)\) with the desired accuracy. A similar scenario arises when the mass of \(\pi \) is tightly concentrated along a low-dimensional submanifold of \(\mathbb {R}^d\), as illustrated in Fig. 1. In this case, reversible dynamics such as (3) will explore the support of \(\pi \) very slowly. As a result, \(\pi _T(f)\) will exhibit very large variance for observables which vary strongly along the manifold.

^{2}many Markov processes with invariant distribution \(\pi \), a natural question is whether such a process can be chosen to have optimal performance. The two standard optimality criteria that are commonly used are:

- (a)
With respect to speeding up convergence to the target distribution.

- (b)
With respect to minimizing the asymptotic variance.

Within the family of reversible samplers, much work has been done to derive samplers which exhibit good (if not optimal) computational performance. This has motivated a number of variants of MALA which exploit geometric features of \(\pi \) to explore the state space more effectively, including preconditioned MALA [64], Riemannian Manifold MALA [20] and Stochastic Newton Methods [45].

### 1.2 Nonreversible Langevin Dynamics

An MCMC scheme which departs from the assumption of reversible dynamics is Hamiltonian MCMC [53], which has proved successful in Bayesian inference. By augmenting the state space with a momentum variable, proposals for the next step of the Markov chain are generated by following Hamiltonian dynamics over a large, fixed time interval. The resulting nonreversible chain is able to make distant proposals. Various methods have been proposed which are related to this general idea of breaking reversibility by introducing an additional dimension to the state space and introducing dynamics which explore the enlarged space while still preserving the equilibrium distribution. In particular, the lifting method [15, 27, 71] is one such method applied to discrete state systems, where the Markov chain is “lifted” from the state space \(\Omega \) onto the space \(\Omega \times \lbrace 1, -1 \rbrace \). The transition probabilities in each copy are modified to introduce transitions between the copies which preserve the invariant distribution but encourage the sampler to generate long strides or trajectories. Similar methods based on introducing nonreversibility into the dynamics of discrete state chains to speed up convergence have been applied with success in various applications [9, 26, 52, 68, 69]. These methods are also reminiscent of parallel tempering or replica exchange MCMC [51], which are aimed at efficiently sampling from multimodal distributions.

It is well documented that breaking detailed balance, i.e. considering a nonreversible diffusion process that is ergodic with respect to \(\pi \), can help accelerate convergence to equilibrium. In fact, it has been proved that, among all diffusion processes with additive noise that are ergodic with respect to \(\pi \), the reversible dynamics (3) has the slowest rate of convergence to equilibrium, measured in terms of the spectral gap of the generator in \(L^2(\mathbb {R}^d; \, \pi ) = : L^2(\pi )\), cf. [37]. Adding a drift to (3) that is divergence-free with respect to \(\pi \), and that therefore preserves the invariant measure of the dynamics, will always accelerate convergence to equilibrium [28, 29, 37, 54, 61, 72]. The optimal nonreversible perturbation can be identified and obtained in an algorithmic manner for diffusions with linear drift, whose invariant distribution is Gaussian [37]; see also [72].

The effect of nonreversibility in the dynamics on the asymptotic variance has also been studied. In [61] it is shown that small antisymmetric perturbations of the reversible dynamics always decrease the asymptotic variance, and more recently in [62] Freidlin–Wentzell theory is used to study the limit of infinitely strong antisymmetric perturbations. In [30] the authors use spectral theory for selfadjoint operators to study the effect of antisymmetric perturbations on diffusions on both \(\mathbb {R}^d\) and on compact manifolds and provide a general comparison result between reversible and nonreversible diffusions. This work is related to previous studies on the behavior of the spectral gap of the generator when the strength of the nonreversible perturbation is increased [13, 18].

### 1.3 Objectives of This Paper

The effect of the antisymmetric part on the long time dynamics of diffusion processes has also been studied extensively in the context of turbulent diffusion [43] and fluid mixing. The effect of an incompressible flow on the convergence of the solution to the advection–diffusion equation on a compact manifold to its mean value (i.e. when \(\pi \equiv 1\)) was first studied in [13]. In particular, the concept of a relaxation enhancing flow was introduced and it was shown that a divergence-free flow is relaxation enhancing if and only if the Liouville operator \(\mathcal A= v \cdot \nabla \) has no eigenfunctions in the Sobolev space \(H^{1}\). An equivalent formulation of this result is that an incompressible flow is relaxation enhancing if and only if it is weakly mixing. Examples of relaxation enhancing flows are given in [13]. This problem was studied further in [18], where it is also mentioned that there are very few examples of relaxation enhancing flows. In [18] it is shown that the spectral gap of the advection–diffusion operator \(\mathcal L= \alpha v \cdot \nabla + \Delta \), for a divergence-free drift *v* and \(\alpha \in \mathbb {R}\), remains bounded above in the limit as \(\alpha \rightarrow \pm \infty \) by a negative constant if and only if the advection operator has an eigenfunction in \(H^{1}\). These results are reminiscent of the results we mention above on the necessary and sufficient condition to obtain a reduction of asymptotic variance in the limit \(\alpha \rightarrow \pm \infty \).

Our analysis of the asymptotic variance \(\sigma ^2_f\), based on the careful study of the Poisson Eq. (10), enables us to study in detail the problem of finding the nonreversible perturbation giving rise to minimum asymptotic variance for diffusions with linear drift, i.e. diffusions whose invariant measure is Gaussian, over a large class of observables. Diffusions with linear drift were considered in [37] where the optimal nonreversible perturbation with respect to accelerating convergence was obtained. For linear and quadratic observables, we can give a complete solution to this problem, and construct nonreversible perturbations that provide a dramatic reduction in asymptotic variance. Moreover, we demonstrate that the conditions under which the variance is reduced are very different from those of maximising the spectral gap discussed in [37]. In particular, we show how a nonreversible perturbation can dramatically reduce the asymptotic variance for the estimator \(\pi _T(f)\), even though no such improvement can be made on the rate of convergence to equilibrium.

Guided by our theoretical results, we can then study numerically the reduction in the asymptotic variance due to the addition of a nonreversible drift for some toy models from molecular dynamics. In particular, we study the problem of computing expectations of observables with respect to a warped Gaussian [23] in two dimensions, as well as a simple model for a dimer in a solvent [38]. The numerical experiments reported in this paper illustrate that a judicious choice of the nonreversible perturbation, dependent on the target distribution and the observable, can dramatically reduce the asymptotic variance.

To compute \(\pi _T(f)\) numerically, we use an Euler–Maruyama discretisation of (6). The resulting discretisation error introduces an additional bias in the estimator for \(\pi (f)\), see [47] for a comprehensive error analysis. This imposes additional constraints on the magnitude of the nonreversible drift, since increasing \(\alpha \) arbitrarily will give rise to a large discretisation error which must be controlled by taking smaller timesteps. A natural question is whether the increase in the computational cost due to the necessity of taking smaller timesteps negates any benefits of the resulting variance reduction. To study this problem, we compare the computational cost of the unadjusted nonreversible Langevin sampler with the corresponding MALA scheme.^{3} Our numerical results, together with the theoretical analysis for diffusions with linear drift, show that the nonreversible Langevin sampler can outperform the MALA algorithm, provided that the nonreversible perturbation is well-chosen. Finally, we consider a higher order numerical scheme for generating samples of (6), based on splitting the reversible and nonreversible dynamics. Numerically, we investigate the properties of this integrator, and demonstrate that its improved stability and discretisation error make it a good numerical scheme for computing the estimator \(\pi _T(f)\) using a nonreversible diffusion.
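The Euler–Maruyama time average described above can be sketched as follows. Equation (6) is not reproduced in this section, so the sketch assumes the nonreversible dynamics take the form \(dX_t = (I + \alpha J)\nabla \log \pi (X_t)\,dt + \sqrt{2}\,dW_t\) with a constant antisymmetric matrix *J*, which matches the flow field \(\gamma = J\nabla \log \pi \) used later in the paper. The function name, arguments and the Gaussian test target are illustrative choices, not taken from the paper.

```python
import numpy as np

def nonreversible_langevin_average(grad_log_pi, f, J, alpha, x0, dt, n_steps, rng):
    """Time average of f along an Euler-Maruyama path of the ASSUMED SDE
    dX = (I + alpha*J) grad log pi(X) dt + sqrt(2) dW, with J antisymmetric."""
    x = np.array(x0, dtype=float)
    d = x.size
    drift_matrix = np.eye(d) + alpha * J  # reversible part plus antisymmetric part
    running_sum = 0.0
    for _ in range(n_steps):
        noise = rng.standard_normal(d)
        x = x + dt * (drift_matrix @ grad_log_pi(x)) + np.sqrt(2.0 * dt) * noise
        running_sum += f(x)
    return running_sum / n_steps

# Illustrative target: standard Gaussian in R^2, observable f(x) = x_1, so pi(f) = 0.
rng = np.random.default_rng(0)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
estimate = nonreversible_langevin_average(
    grad_log_pi=lambda x: -x,
    f=lambda x: x[0],
    J=J,
    alpha=2.0,
    x0=np.zeros(2),
    dt=0.01,
    n_steps=20_000,
    rng=rng,
)
print(estimate)
```

Because no accept–reject step is applied, the estimate carries a discretisation bias that grows with both the timestep and \(\alpha \), which is the tradeoff discussed in this paragraph.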

The rest of the paper is organized as follows. In Sect. 2 we describe how the central limit theorem (4) arises from the solution of the Poisson equation associated with the generator of the dynamics. In Sect. 3 we analyse the asymptotic variance and formulate the problem of finding the optimal perturbation with respect to minimising \(\sigma ^2_f\), for a fixed observable, or over the space of square-integrable observables. Moreover, following the analysis described in [58], we derive a spectral representation for the asymptotic variance \(\sigma ^2_f\), in terms of the discrete spectrum of the operator \(\mathcal {G}\) defined in (12), and recover estimates for the asymptotic variance for any value of \(\alpha \). In Sect. 4, we consider the case of Gaussian diffusions, which are amenable to explicit calculation to demonstrate the theory presented in this paper. In Sect. 5, we provide various numerical examples to complement the theoretical results. Finally, in Sect. 6 we describe the bias-variance tradeoff for nonreversible Langevin samplers, and explore their computational cost. Conclusions and discussion on further work are presented in Sect. 7.

## 2 The Central Limit Theorem and Estimates on the Asymptotic Variance via the Poisson Equation

**Assumption A** (*Foster–Lyapunov Criterion*) There exists a function \(U:\mathbb {R}^d\rightarrow \mathbb {R}\) and constants \(c > 0\) and \(b \in \mathbb {R}\) such that \(\pi (U) < \infty \) and

*petite set*. For the definition of a petite set we refer the reader to [48]. For the generator \(\mathcal {L}\) corresponding to the process (6), compact sets are always petite. As noted in [65], a sufficient condition on \(\pi \) for (6) to possess a Lyapunov function is the following.

**Assumption B** The density \(\pi \) is bounded and for some \(0 < \beta < 1\):

**Lemma 1**

*U* is a Lyapunov function for \(X_t^\gamma \) defined by (6) for all \(\alpha \in \mathbb {R}\).

*Proof*

*M* such that for \(|x| > M\):

*b* is a positive constant.

Finally, we note that since \(\pi \) is bounded, *U* is bounded away from zero uniformly. Thus, *U* can be rescaled to satisfy the condition \(U \ge 1\), as is required by the Foster–Lyapunov criterion. \(\square \)

*Remark 1*

*p*(*x*) is a polynomial of order *m* such that \(p(x) \rightarrow \infty \) as \(|x|\rightarrow \infty \) (necessarily \(m \ge 2\) and *m* is even). Clearly

If condition (14) holds for \(X_t^\gamma \), then the process will be exponentially ergodic. More specifically, the law of the process \(X_t^\gamma \) started from a point \(x \in \mathbb {R}^d\) will converge exponentially fast in the total variation norm to the equilibrium distribution \(\pi \). In particular, denoting by \((P_t^\gamma )_{t\ge 0}\) the semigroup associated with the diffusion process (6), we have the following result.

**Theorem 1**

*U*. Then there exist positive constants *c* and \(\lambda \) such that, for all *x*,

For a central limit theorem to hold for the process \(X_t^\gamma \), and thus for \(\sigma ^2_f\) to be finite, it is necessary that the Poisson Eq. (10) is well-posed. The Foster-Lyapunov condition (14) is sufficient for this to hold.

**Theorem 2**

[21, Thm3.2] Suppose that Assumption A holds for the diffusion process (6) with Lyapunov function *U*. Then there exists a positive constant *c* such that for any \(|f|^2 \le U\), the Poisson Eq. (10) admits a unique zero mean solution \(\phi \) satisfying the bound \(\left| \phi (x)\right| ^2 \le cU(x)\). In particular, \(\phi \in L^2(\pi )\). \(\square \)

*H*. The result then follows by decomposing

**Theorem 3**

*U*, then for any *f* such that \(f^2(x) \le U(x)\), there exists a constant \(0 < \sigma ^2_f < \infty \) such that \(\sqrt{t}(\pi _t(f) - \pi (f))\) converges in distribution to an \(\mathcal {N}(0, \sigma ^2_f)\) distribution, as \(t \rightarrow \infty \), for any initial distribution \(\nu \), where

In the remainder of this paper we shall study the dependence of \(\phi \), and thus \(\sigma ^2_f\) on the choice of non-reversible perturbation \(\gamma \). We note that (16) is precisely the Dirichlet form associated with the dynamics \(\mathcal {L}\) evaluated at the solution \(\phi \) of the Poisson Eq. (13).

## 3 Analysis of the Asymptotic Variance

### 3.1 Mathematical Setting

The operator \({\mathcal S}\) is symmetric with respect to \(\pi \) and can be extended to a selfadjoint operator on \(L^2_0(\pi )\), which is also denoted by \({\mathcal S}\), with domain \(\mathcal {D}({\mathcal S}) = {\mathcal H}^2\). We shall make the following assumption on \(\pi \) which is required, in addition to Assumption B, to ensure that \(\mathcal {L}\) possesses a spectral gap in \(L^2_0(\pi )\).

**Assumption C**

**Lemma 2**

*Proof*

Throughout this section, we shall assume that Assumptions B and C hold, and moreover, for simplicity we shall make the following additional assumption:

**Assumption D** The nonreversible perturbation \(\gamma \) is smooth and bounded in \(L^\infty \).

*J* is antisymmetric. Moreover, it is clear that \(\gamma (x)\) is bounded and smooth, thus satisfying Assumption D. In particular, if *V* has compact level sets (which holds for example if *V* is a nonnegative polynomial function), then condition (24) is satisfied by choosing \(\psi \) to be a smooth non-negative function with compact support.

*V* invariant under the flow \(\dot{z}_t = \gamma (z_t)\), i.e. \(V(z_t)\) is constant for all \(t \ge 0\). Thus, for large \(|\alpha |\), the flow \(\alpha \gamma \) will result in rapid exploration of the level surfaces of *V*, but the motion of \(X_t\) between level surfaces is entirely due to the reversible dynamics of the process. In particular, for potentials with energy barriers, the transition time for \(X_t^\gamma \) to cross a barrier will still satisfy the same Arrhenius law as the corresponding reversible process. Other choices of flow \(\gamma \) are possible. For example, one could alternatively consider a skew-symmetric matrix function *J*(*x*) as detailed in [41]. The corresponding flow would then be defined by

*J*(*x*) is skew-symmetric. If, additionally, the matrix function *J* is smooth with bounded derivative and compact support, then \(\gamma \) satisfies Assumption D. As detailed in [41] one can further generalise this choice of dynamics by additionally introducing a space dependent diffusion tensor, and an appropriate correction of the drift to maintain ergodicity with respect to \(\pi \). We do not consider this choice of dynamics in this paper, noting that most of the presented results can be readily generalized to this scenario.

### 3.2 An Expression for the Asymptotic Variance

**Lemma 3**

*Proof*

Thus, the asymptotic variance is never increased by introducing a nonreversible perturbation, for all \(f \in L^2_0(\pi )\). This had already been noted in [61], where an expression for the asymptotic variance was derived as the curvature of the rate function of the empirical measure, and also in [30] using an approach similar to that above. Expression (28) provides us with a formula for \(\sigma ^2_f\) in terms of a symmetric quadratic form which is explicit in terms of \(\mathcal A\) and \({\mathcal S}\).

### 3.3 Quantitative Estimates for the Asymptotic Variance

In this section we derive quantitative versions of (29) and (30), using techniques developed in [58] for the analysis of the Green–Kubo formula, which is itself based on earlier work on the estimation of the eddy diffusivity in turbulent diffusion [2, 8, 44, 57].

**Lemma 4**

Suppose that Assumption D holds, then the operator \({\mathcal G}= (-{\mathcal S})^{-1}\mathcal A\) is skew-adjoint on \({\mathcal H}^1\).

*Proof*

Moreover, under appropriate assumptions on the target distribution \(\pi \), one can show that the operator \({\mathcal G}\) is compact.

**Lemma 5**

Suppose that Assumptions B, C and D hold, then the operator \({\mathcal G}\) is compact on \({\mathcal H}^1\).

*Proof*

**Theorem 4**

*f* reduces to finding \(\gamma \) such that

*H* such that \(H\circ V \in {\mathcal H}^1\). In this case, it is always possible to choose an observable *f* such that \(\gamma \) will not be optimal for *f*, in the sense that \(\sigma _f^2(\alpha )\) is nonzero in the limit \(\alpha \rightarrow \infty \). We also remark that these asymptotic results, in particular the distinction between these two cases, are reminiscent of similar results that have been obtained in the context of turbulent diffusion [44, 57].

An analogous classification of the asymptotic behaviour of the spectral gap of the operator (9) in the limit of large \(\alpha \) is considered in [18] on a compact manifold. Indeed, in [18, Theorem 1] it is determined that the spectral gap is finite in the limit of \(\alpha \rightarrow \pm \infty \) if and only if \(\mathcal A\) has a nonconstant eigenfunction in \({\mathcal H}^1\). However, one should note that the asymptotic variance \(\sigma ^2_f(\alpha )\) may converge to 0 as \(\alpha \rightarrow \pm \infty \) even when the spectral gap is finite; see also [61, Example 2.9] for a counterexample.

### 3.4 A Two Dimensional Example

*f*, any variance in \(\pi _T(f)\) arising from the variation of *f* along the level curves should vanish as \(|\alpha |\rightarrow \infty \), leaving only the variance contributed by the variation of *f* between level curves. We make this precise with a particular example. Consider the observable \(f(x_1,x_2) = 2 x_1^2\), expressible in polar coordinates as

*r*, we have \(\hat{f}_{\mathcal {N}} = r^2/2 - 1\). It follows that \(2\left| \left| \hat{f}_{\mathcal {N}}\right| \right| ^2_{1} = 4\), so that

*f* varies strongly with respect to \(\theta \). In the other extreme, we expect zero improvement when *f* depends only on *r*. This intuition is formalised in the following section, in particular in Proposition 1 and the subsequent bound (51).

For more general potentials, using a flow field of the form \(\gamma (x) = J\nabla \log \pi (x)\), \(J^\top = -J\), the mechanism for reducing the asymptotic variance is analogous: the large antisymmetric drift gives rise to fast deterministic mixing along the level curves of the potential, while the reversible dynamics induce slow diffusive motion along the gradient of the potential. When the nullspace of \(\mathcal {A}\) is trivial in \({\mathcal H}^1\), the fast deterministic flow is ergodic, so that, for \(\alpha \) large, the antisymmetric component will cause a rapid exploration of the entire state space. Consequently, the asymptotic variance converges to 0 as \(\alpha \rightarrow \infty \). On the other hand, if \(\mathcal {A}\) has a nontrivial nullspace, the antisymmetric perturbation is no longer ergodic, and the state space can be decomposed into components such that the rapid flow behaves ergodically in each individual component. In the limit of large \(\alpha \), \(X_t\) becomes a fast-slow system, with rapid exploration within the ergodic components coupled to a slow diffusion between components. Very recently, Rey-Bellet and Spiliopoulos [62] have applied Freidlin–Wentzell theory to rigorously analyse this case in the large \(\alpha \) limit for a large class of potentials.
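Two properties of the flow field \(\gamma = J\nabla \log \pi \) used in this discussion can be verified in one line each. The following check assumes a constant antisymmetric matrix *J* and writes \(\pi \propto e^{-V}\) for a smooth potential *V*; it is a sketch of the standard argument rather than a statement taken from the text.

\[
\begin{aligned}
\nabla \cdot \big (\pi \, J\nabla \log \pi \big ) &= \nabla \cdot \big (J\nabla \pi \big ) = \sum _{i,j} J_{ij}\, \partial _i \partial _j \pi = 0, \\
\frac{d}{dt} V(z_t) &= \nabla V(z_t)\cdot J\nabla \log \pi (z_t) = -\nabla V(z_t)\cdot J\nabla V(z_t) = 0 .
\end{aligned}
\]

Both expressions vanish because *J* is antisymmetric while the Hessian of \(\pi \) and the quadratic form \(v \mapsto v\cdot Jv\) are symmetric in their indices. The first identity says that the added drift leaves \(\pi \) invariant; the second says that the deterministic flow \(\dot{z}_t = \gamma (z_t)\) moves along level sets of *V*, so crossing between level sets is left to the reversible part of the dynamics.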

## 4 Nonreversible Perturbations of Gaussian Diffusions

*f*. Indeed, consider the nonsymmetric Ornstein-Uhlenbeck process in \( \mathbb {R}^d\):

*J* is an antisymmetric matrix, \(\alpha > 0\), and \(W_t\) is a standard *d*-dimensional Brownian motion. The stationary distribution \(\pi (x)\) is \(\mathcal {N}(0, I)\), independent of \(\alpha \) and *J*. Although this system does not fall under the framework of Theorem 4, we are still able to obtain analogous conditions for a reduction in the asymptotic variance. The objective of this section is to mirror the results for speeding up convergence to equilibrium of \(X_t\) that were derived in [37] to the case of minimizing the asymptotic variance. In particular, following arguments similar to [58, Sect. 4.2], an explicit formula for the asymptotic variance will be derived, from which an optimal *J* can be chosen, in a manner similar to [37]. We note that for the process (41), the optimal nonreversible perturbation obtained in [37] does not provide any increase to the rate of convergence to equilibrium, since all eigenvalues of the covariance matrix of the Gaussian stationary distribution are the same. Nonetheless, in this section we show that for certain observables, the asymptotic variance of \(\pi _T(f)\) can be dramatically decreased.
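For linear observables \(f(x) = l\cdot x\) the effect described above can be computed in closed form. The sketch below assumes that process (41), which is not reproduced here, has the form \(dX_t = -(I + \alpha J)X_t\,dt + \sqrt{2}\,dW_t\). Under that assumption the ansatz \(\phi (x) = a\cdot x\) turns the Poisson equation into the linear system \((I - \alpha J)a = l\), and \(\sigma ^2_f = 2\,a\cdot l = 2\,l^\top (I + \alpha ^2 J^\top J)^{-1} l\). The rank-two matrix built from *l* and a unit vector \(\omega \) orthogonal to *l* is one illustrative perturbation with unit Frobenius norm; both function names are hypothetical.

```python
import numpy as np

def asymptotic_variance_linear(l, J, alpha):
    """sigma_f^2 for f(x) = l.x under the ASSUMED dynamics
    dX = -(I + alpha*J) X dt + sqrt(2) dW with stationary law N(0, I)."""
    l = np.asarray(l, dtype=float)
    d = l.size
    # With phi(x) = a.x, the Poisson equation -L phi = f becomes (I - alpha*J) a = l.
    a = np.linalg.solve(np.eye(d) - alpha * J, l)
    # sigma_f^2 = 2 <phi, f>_pi = 2 a.l because the covariance of pi is the identity.
    return 2.0 * float(a @ l)

def rank_two_perturbation(l, omega):
    """Illustrative antisymmetric J with ||J||_F = 1 built from l and omega."""
    l = np.asarray(l, dtype=float)
    l = l / np.linalg.norm(l)
    omega = np.asarray(omega, dtype=float)
    omega = omega - (omega @ l) * l          # make omega orthogonal to l
    omega = omega / np.linalg.norm(omega)
    return (np.outer(omega, l) - np.outer(l, omega)) / np.sqrt(2.0)

l = np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0)
J = rank_two_perturbation(l, omega=np.array([1.0, 0.0, 0.0]))
for alpha in (0.0, 1.0, 4.0, 16.0):
    print(alpha, asymptotic_variance_linear(l, J, alpha))
```

For this particular *J*, the vector *l* is an eigenvector of \(J^\top J\) with eigenvalue \(1/2\), so the computed variance equals \(2/(1 + \alpha ^2/2)\) and decays to zero as \(\alpha \) grows, whereas any observable whose direction lies in the nullspace of *J* sees no reduction at all.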

### 4.1 Explicit Formula for the Asymptotic Variance

*f* is a quadratic functional of the form

*k* is a constant, chosen so that *f*(*x*) is centered with respect to \(\pi (x)\): \(k = -{{\mathrm{Tr}}}M\). Consider the Poisson Eq. (10):

*f*.

**Proposition 1**

*A*, and \([A, B] = AB - BA\) is the commutator of *A* and *B*. In particular,

*Proof*

*x*, so we make the ansatz

*x* we have

*x* is arbitrary, it follows that

*M* is positive definite and \(\text{ spec }(A) \subset \lbrace \lambda \in \mathbb {C} \, : \text{ Re } \lambda >0 \rbrace \). Indeed, for *C* given by

*C* and *M* are symmetric,

*J* is antisymmetric, we have that, for all \(l \in \mathbb {R}^d\):

*J* is skew-symmetric, the matrix exponential \(e^{\alpha J s}\) is a rotation matrix. Thus, the matrix \(e^{\alpha J s} M e^{-\alpha J s}\) has the same eigenvalues as *M* and \(M^\top \). From [4, III. 6.14] we have

*M*, sorted in ascending and descending order, respectively. In particular,

*U* is a \(d\times d\) orthogonal matrix with the first *N* columns spanning \(\mathcal {N}\). Then

*l* onto \({\mathcal N}\). Thus, for quadratic observables, from (49), the asymptotic variance has the following lower bound in the limit of large \(\alpha \):

*J* and \(\alpha \in \mathbb {R}\), analogous to the situation which occurs in [37] for maximising the spectral gap, when all the eigenvalues of the covariance matrix are equal.

### 4.2 Finding the Optimal Perturbation

*J* which minimizes the asymptotic variance subject to \(\left| \left| J\right| \right| _F = 1\). In this case, we have the equality

*l* is orthogonal to \(\mathcal {N}\). Thus, the best we can do is to choose *J* such that *l* is an eigenvector of \(J^\top J\) with maximal eigenvalue. This can be done by choosing a unit vector \(\omega \in \mathbb {R}^d\) orthogonal to *l* and setting

*J* which give the minimal asymptotic variance. As an example, let \(d = 3\), and consider the observables \(f(x) = l^{(i)}\cdot x\) for \(l^{(1)} = (0, 1, 1)^\top /\sqrt{2}\), \(l^{(2)} = (1, 0, 1)^{\top }/\sqrt{2}\) and \(l^{(3)} = (1, -1, 1)^{\top }/\sqrt{3}\), respectively. We choose *J* to be

*J*, given by

*J*. Thus, for this observable the matrix *J* given by (52) is an optimal perturbation. For \(f(x) = l^{(2)}\cdot x\), since \(l^{(2)}\) is not orthogonal to \(\zeta \), as \(\alpha \rightarrow \infty \),

*J*, however the bound (51) suggests a good candidate for *J*. Suppose that *M* has eigenvalues \(\lambda _1 \le \lambda _2 \le \cdots \le \lambda _d\) with corresponding eigenvectors \(e_1, \ldots , e_d\). Suppose that *d* is even. Let

*J* by

*J*, the asymptotic variance is given by

## 5 Numerical Experiments

*V*(*x*) is a given smooth, confining potential with finite unknown normalisation constant *Z*. We will also assume that the divergence-free vector field \(\gamma (x)\) is given by

*f* such that \(\pi (f) = 0\) and

### 5.1 Periodic Distribution

*V*(*x*) is given by

### 5.2 Warped Gaussian Distribution

*V*(*x*) is plotted in Fig. 4. Our objective is to compute \(\pi (f)\) where the observable *f* is given by:

### 5.3 Introducing a Metropolis–Hastings Accept–Reject Step

A natural question is whether an MH chain using a proposal distribution based on the SDE (6) with antisymmetric drift will inherit the superior mixing properties of the nonreversible diffusion process. As the MH algorithm works by enforcing the detailed balance of the chain \(X^{(n)}\) with respect to the distribution \(\pi \), we expect that any benefits of the antisymmetric drift term will be negated when introducing this accept–reject step. To test this, we repeat the numerical experiment of Sect. 5.2 using the MH algorithm using the Euler discretisation of (6) as a proposal scheme, for various values of \(\alpha \). The effect of introducing this accept–reject step to the nonreversible diffusion is evident from Fig. 5. While the accept–reject step removes any bias due to discretisation error, as is evident from Fig. 5b, the asymptotic variance actually increases as \(\alpha \) increases. This is due to the fact that for large \(\alpha \), proposals are more likely to be rejected as they are far away from the current state.
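The experiment described above can be sketched as a Metropolis–Hastings chain whose proposal is one Euler step of the nonreversible dynamics. As before, the form of (6) is assumed to be \(dX_t = (I + \alpha J)\nabla \log \pi (X_t)\,dt + \sqrt{2}\,dW_t\) with constant antisymmetric *J*, and a standard Gaussian target is used in place of the warped Gaussian of Sect. 5.2 to keep the example short. The function name and the parameter values are illustrative.

```python
import numpy as np

def mh_nonreversible_proposal(log_pi, grad_log_pi, J, alpha, x0, dt, n_steps, rng):
    """Metropolis-Hastings chain whose proposal is one Euler-Maruyama step of the
    ASSUMED SDE dX = (I + alpha*J) grad log pi(X) dt + sqrt(2) dW."""
    x = np.array(x0, dtype=float)
    d = x.size
    B = np.eye(d) + alpha * J

    def proposal_mean(z):
        return z + dt * (B @ grad_log_pi(z))

    def log_q(z_from, z_to):
        # Gaussian proposal with covariance 2*dt*I; normalising constants cancel.
        diff = z_to - proposal_mean(z_from)
        return -(diff @ diff) / (4.0 * dt)

    samples = np.empty((n_steps, d))
    n_accept = 0
    for n in range(n_steps):
        y = proposal_mean(x) + np.sqrt(2.0 * dt) * rng.standard_normal(d)
        log_ratio = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
        if np.log(rng.uniform()) < log_ratio:
            x = y
            n_accept += 1
        samples[n] = x
    return samples, n_accept / n_steps

rng = np.random.default_rng(1)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
log_pi = lambda x: -0.5 * (x @ x)     # standard Gaussian, up to a constant
grad_log_pi = lambda x: -x
rates = {}
for alpha in (0.0, 5.0):
    _, rates[alpha] = mh_nonreversible_proposal(
        log_pi, grad_log_pi, J, alpha, np.zeros(2), dt=0.1, n_steps=2000, rng=rng)
print(rates)
```

With \(\alpha = 0\) the chain is the usual MALA scheme. For large \(\alpha \) the reverse move from the proposed point is far less likely than the forward move, so the acceptance rate drops, which is consistent with the loss of efficiency reported in this section.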

### 5.4 Dimer in a Solvent

*N* particles \(P_1, \ldots , P_N\) in a two-dimensional periodic box of side length *L*. Particles \(P_1\) and \(P_2\) are assumed to form a dimer pair, in a solvent comprising the particles \(P_3, \ldots , P_N\). The solvent particles interact through a truncated Lennard-Jones potential:

*r* is the distance between two particles, \(\epsilon \) and \(\sigma \) are two positive parameters, and \(r_0 = 2^{\frac{1}{6}}\sigma \). The interaction potential between the dimer pair is given by a double-well potential

*h* and *w* are two positive parameters. The total energy of the system is given by

*h*. Define \(\xi (q)\) to be

*R* is the following rotation matrix on \(\mathbb {R}^{4 \times 4}\):

## 6 The Computational Cost of Nonreversible Langevin Samplers

As observed in Sect. 5, while increasing \(\alpha \) is guaranteed to decrease the asymptotic variance \(\sigma ^2_{f}(\alpha )\) for the estimator \(\pi _T(f)\), this will also give rise to an increase in the discretisation error arising from the particular discretisation being used. Moreover, as \(\alpha \) increases, the SDE (6) becomes more and more stiff, to the extent that the discretisation becomes numerically unstable unless the stepsize is chosen to be accordingly small. As a result, any discretisation \(\lbrace X^{(n)} \rbrace _{n=1}^N\) will require smaller timesteps to guarantee that the stationary distribution of \(X^{(n)}\) is sufficiently close to \(\pi (x)\). This tradeoff between computational cost and asymptotic variance of the estimator must be taken into consideration when comparing reversible to nonreversible diffusions.

*C* is a positive constant independent of \(\Delta t\) and *N*, which depends on the coefficients of the SDE and the observable *f*. This estimate makes explicit the tradeoff between discretisation error and sampling error. For a fixed computational budget *N*, the right hand side of (64) is minimized when \(\Delta t \propto N^{-\frac{1}{3}}.\) For an SDE of the form (6), we expect that the constant *C* will increase with \(\alpha \). Identifying the correct scaling of the error with respect to \(\alpha \) is an interesting problem that we intend to study.
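The scaling \(\Delta t \propto N^{-1/3}\) can be recovered with a short calculation. Since the bound (64) is not reproduced here, the following assumes that its right hand side has the form \(C\left( \Delta t^2 + \frac{1}{N\Delta t}\right) \), i.e. a squared first-order discretisation bias plus a sampling term inversely proportional to the simulated time \(N\Delta t\); this form is an assumption made for illustration.

\[
g(\Delta t) = \Delta t^2 + \frac{1}{N\Delta t}, \qquad g'(\Delta t) = 2\Delta t - \frac{1}{N\Delta t^2} = 0 \iff \Delta t_* = (2N)^{-\frac{1}{3}}, \qquad g(\Delta t_*) = 3\,(2N)^{-\frac{2}{3}} .
\]

Under this assumption, a budget of \(N = 10^6\) steps gives \(\Delta t_* \approx 7.9\times 10^{-3}\), and the best attainable error bound decays like \(N^{-2/3}\) rather than the \(N^{-1}\) rate of an unbiased sampler.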

To obtain a clearer idea of the bias variance tradeoff we compute the mean-square error for the Euler–Maruyama discretisation for two particular examples. In Fig. 7a, we consider the warped Gaussian distribution defined by (58) and the observable \(f(x) = |x|^2\). A value for \(\pi (f)\) is obtained by integrating \(\int _{\mathbb {R}^d} f(x)\pi (dx)\) numerically, using a globally adaptive quadrature scheme to obtain an approximation with error less than \(10^{-12}\). In Fig. 7a we plot the relative mean–squared-error defined by \(\left( Err_{N, \Delta t}[f]/\pi (f)\right) ^2\) for an Euler–Maruyama discretisation of (6), for timestep \(\Delta t\) in the interval \([2^{-5}, 1]\). The total number of timesteps is kept fixed at \(N = 10^6\). For each value of \(\alpha \), the mean square error is approximated over an ensemble of 256 independent realisations. Missing points indicate finite time blowup of the discretized diffusion. The dashed line denotes the MSE generated from the corresponding MALA sampler, namely an Euler–Maruyama discretisation of the reversible diffusion with an added Metropolis–Hastings accept–reject step. We note that both the Euler–Maruyama discretisation and the MALA sampler require one evaluation of the gradient term \(\nabla \log \pi \) per timestep, so that comparing an ergodic average obtained from \(10^6\) steps of each scheme is fair.

A trade-off between discretisation error and variance is evident from Fig. 7a, and is consistent with the error estimate (64). We observe that the nonreversible Langevin sampler outperforms the reversible Langevin sampler by an order of magnitude, with the lowest MSE attained when \(\alpha = 10\). As \(\alpha \) is increased beyond this point, the discretisation error is balanced by the decrease in variance, and we observe no further gain in performance. Nonetheless, despite the fact that the MALA scheme has no bias, the nonreversible sampler, with \(\alpha = 5\), outperforms MALA (in terms of MSE) by a significant factor of 8.8.

We repeat this numerical experiment for the target distribution \(\pi \) given by a standard Gaussian distribution in \(\mathbb {R}^d\) and observable \(f(x) = x_2 + x_3\). In this case \(\pi (f)\) is exactly 0. We use the linear diffusion specified by (41), where the antisymmetric matrix *J* is given in (52), which is optimal for this observable. We plot the (absolute) MSE for the estimator \(\pi _T(f)\) in Fig. 8a. While the smallest MSE is attained by the nonreversible Langevin sampler when \(\alpha = 25\), the increase in performance is only marginal. This is due to the fact that increasingly smaller timesteps must be taken to ensure that the EM approximation does not blow up. Indeed, the \(\alpha = 25\) sampler would not converge to a finite value for \(\Delta t\) greater than \(10^{-3}\), while the reversible sampler (\(\alpha = 0\)) and the MALA scheme were accurate even for timesteps of order 1.
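The timestep restriction can be made concrete for a linear drift. Assuming the diffusion takes the form \(dX_t = -(I + \alpha J)X_t\,dt + \sqrt{2}\,dW_t\), the Euler–Maruyama recursion is \(X_{n+1} = MX_n + \sqrt{2\Delta t}\,\xi _n\) with \(M = I - \Delta t(I + \alpha J)\), and its second moments stay bounded only if the spectral radius of \(M\) is below 1. For an eigenvalue \(i\lambda \) of \(J\) this gives \((1 - \Delta t)^2 + \Delta t^2\alpha ^2\lambda ^2 < 1\), that is \(\Delta t < 2/(1 + \alpha ^2\lambda ^2)\). The following sketch evaluates this bound. The matrix `J` used here is a hypothetical antisymmetric matrix, since (52) is not reproduced in this section, and the function names are illustrative.

```python
import numpy as np


def em_spectral_radius(J, alpha, dt):
    """Spectral radius of the Euler-Maruyama update matrix I - dt * (I + alpha * J)."""
    d = J.shape[0]
    M = np.eye(d) - dt * (np.eye(d) + alpha * J)
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def max_stable_dt(J, alpha):
    """Largest dt with spectral radius below 1: dt < 2 / (1 + (alpha * lam)^2),
    where lam is the largest modulus of the (purely imaginary) eigenvalues of J."""
    lam = np.max(np.abs(np.linalg.eigvals(J).imag))
    return float(2.0 / (1.0 + (alpha * lam) ** 2))


# Hypothetical antisymmetric matrix in R^3 (an assumption, not the J of (52)).
J = np.array([[0.0, 1.0, 1.0],
              [-1.0, 0.0, 1.0],
              [-1.0, -1.0, 0.0]])

for alpha in (0.0, 5.0, 25.0):
    print(f"alpha = {alpha}: stable for dt < {max_stable_dt(J, alpha):.3e}")

dt_max_25 = max_stable_dt(J, 25.0)
```

Under these assumptions the admissible timestep shrinks like \(\alpha ^{-2}\), which is one way to read the observation that the variance reduction at large \(\alpha \) is offset by the cost of the smaller timesteps needed for stability.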

*t*, and let \(\Phi _{n, t}(x)\) denote the flow map corresponding to the ODE:

*t* to \(t + \Delta t\):

We leave the justification and analysis of this scheme as the goal of future work, and in this paper simply use it to compute a long-time average approximation to \(\pi (f)\) and compare the MSE with that of a corresponding reversible MALA scheme. To obtain a fair comparison between the results obtained by MALA and the splitting scheme, we note that while a careful implementation of MALA requires only one evaluation of \(\nabla \log \pi \) per timestep, the splitting scheme requires six evaluations of \(\nabla \log \pi \) per timestep (naively, a single timestep would require two evaluations for each reversible substep and four evaluations for the nonreversible substep; however, we can reuse two evaluations of \(\nabla \log \pi \) between the steps). Thus, we shall compare the MSE obtained from trajectories of \(10^6\) timesteps of the nonreversible sampler with \(6\cdot 10^6\) timesteps of the corresponding MALA scheme, for stepsizes ranging from \(10^{-5}\) to 1. The results for the warped Gaussian distribution in \(\mathbb {R}^2\) and standard Gaussian in \(\mathbb {R}^3\) are plotted in Figs. 7b and 8b, respectively. Note that we omit the \(\alpha = 0\) case since, in this case, the splitting scheme reduces to the standard MALA scheme. We observe that with this splitting scheme, the nonreversible sampler outperforms MALA by a factor of 13 for the warped Gaussian model, and by a factor of 20 for the standard Gaussian model. The benefits of the splitting scheme appear to be twofold. Firstly, the integrator is more stable: in both models, the long-time simulation of \(X_t^\gamma \) did not blow up, even for large values of \(\alpha \) and for \(\Delta t = 0.1\). Secondly, compared to the corresponding Euler–Maruyama discretisation, the MSE is consistently an order of magnitude less.
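A splitting step of this kind might be sketched as follows. The defining equations of the scheme are not reproduced in this section, so the composition used here is an assumption inferred from the evaluation count above: a MALA half-step for the reversible part, a full fourth-order Runge–Kutta step for the nonreversible ODE \(\dot{z} = \gamma (z)\), and a second MALA half-step. The target, the matrix `J`, the form `gamma = alpha * J @ grad_log_pi` and all function names are hypothetical stand-ins.

```python
import numpy as np


def mala_step(x, h, log_pi, grad_log_pi, rng):
    """One Metropolis-adjusted Langevin step of size h (reversible part)."""
    def log_q(y, x_from):
        # Log proposal density of y given x_from, up to an additive constant.
        mean = x_from + h * grad_log_pi(x_from)
        return -np.sum((y - mean) ** 2) / (4.0 * h)

    proposal = x + h * grad_log_pi(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.size)
    log_accept = (log_pi(proposal) - log_pi(x)
                  + log_q(x, proposal) - log_q(proposal, x))
    return proposal if np.log(rng.uniform()) < log_accept else x


def rk4_step(x, h, gamma):
    """One classical RK4 step for the nonreversible ODE dz/dt = gamma(z)."""
    k1 = gamma(x)
    k2 = gamma(x + 0.5 * h * k1)
    k3 = gamma(x + 0.5 * h * k2)
    k4 = gamma(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def splitting_step(x, dt, log_pi, grad_log_pi, gamma, rng):
    """Assumed Strang-type composition: MALA(dt/2), RK4(dt), MALA(dt/2)."""
    x = mala_step(x, 0.5 * dt, log_pi, grad_log_pi, rng)
    x = rk4_step(x, dt, gamma)
    return mala_step(x, 0.5 * dt, log_pi, grad_log_pi, rng)


# Stand-in problem (assumption): standard Gaussian in R^3, f(x) = x_2 + x_3.
alpha = 5.0
J = np.array([[0.0, 1.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, -1.0, 0.0]])
log_pi = lambda x: -0.5 * np.sum(x ** 2)
grad_log_pi = lambda x: -x
gamma = lambda x: alpha * (J @ grad_log_pi(x))

rng = np.random.default_rng(1)
x = np.zeros(3)
n_steps, dt = 10000, 0.1
sum_f, sum_sq = 0.0, 0.0
for _ in range(n_steps):
    x = splitting_step(x, dt, log_pi, grad_log_pi, gamma, rng)
    sum_f += x[1] + x[2]
    sum_sq += x @ x
avg_f, avg_sq = sum_f / n_steps, sum_sq / n_steps
print(f"time average of f: {avg_f:.3f}, time average of |x|^2: {avg_sq:.3f}")
```

This naive version recomputes every gradient, so it uses more evaluations of \(\nabla \log \pi \) per step than the six quoted above; caching the gradient at the current state between substeps would be needed to reach that count.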

While this splitting scheme is only a first step towards properly investigating appropriate integrators for nonreversible Langevin schemes, the above numerical experiments demonstrate clearly that there is a significant benefit in doing so, which motivates future investigation.

## 7 Conclusions and Further Work

In this paper we have presented a detailed analytical and numerical study of the effect of nonreversible perturbations to Langevin samplers. In particular, we have focused on the effect on the asymptotic variance of adding a nonreversible drift to the overdamped Langevin dynamics. Our theoretical analysis, presented for diffusions with periodic coefficients and for diffusions with linear drift for which a complete analytical study can be performed, and our numerical investigations on toy models clearly show that a judicious choice of the nonreversible drift can lead to a substantial reduction in the asymptotic variance. On the other hand, as observed from the dimer model example in Sect. 5.4, an arbitrary choice of nonreversible drift will not always give rise to significant improvement. We have also presented a careful study of the computational cost of the algorithm based on a nonreversible Langevin sampler, in which the competing effects of reducing the asymptotic variance and of increasing the stiffness due to the addition of a nonreversible drift are monitored. The main conclusion that can be drawn from our numerical experiments is that a nonreversible Langevin sampler with a close-to-optimal choice of the nonreversible drift significantly outperforms the (reversible) Metropolis–Hastings sampler.

- (1) The effect of using degenerate, hypoelliptic diffusions for sampling from a given distribution.
- (2) Combining the use of nonreversible Langevin samplers with standard variance reduction techniques such as the zero variance reduction MCMC methodology [14].
- (3) Optimizing Langevin samplers within the class of reversible diffusions.
- (4) The development of nonreversible Metropolis–Hastings algorithms based on the above techniques, possibly related to the approach described in [9].
- (5) The development and analysis of numerical schemes specifically designed to simulate nonreversible Langevin diffusions.

## Footnotes

- 1. With a slight abuse of notation, we will denote by \(\pi \) both the measure and the density.

- 2. Formally, all diffusion processes \(X_t\) with drift *b*(*x*) and diffusion \(\sigma (x)\) and generator \(\mathcal L= b\cdot \nabla + \frac{1}{2}\sigma \sigma ^\top :\nabla \nabla \) such that \(\pi \) is the unique solution of the stationary Fokker-Planck equation \(\mathcal L^{\star } \pi =0\) can be used to sample from \(\pi \).

- 3. As we illustrate in our paper, there is no point in considering the Metropolis adjusted sampler with a nonreversible proposal, since the addition of the accept–reject step renders the resulting Markov chain reversible and any nonreversibility-induced variance reduction is lost.

## Notes

### Acknowledgments

The work of TL is supported by the European Research Council under the European Union’s Seventh Framework Program/ ERC Grant Agreement Number 614492. GP thanks Ch. Doering for useful discussions and comments. The research of AD is supported by the EPSRC under Grant No. EP/J009636/1. GP is partially supported by the EPSRC under Grants Nos. EP/J009636/1, EP/L024926/1, EP/L020564/1 and EP/L025159/1. TL would like to thank J. Roussel for pointing out some mistakes in a preliminary version of the manuscript. The authors would like to thank the anonymous referees for their useful comments and suggestions.

### References

- 1. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis, vol. 57. Springer, New York (2007)
- 2. Avellaneda, M., Majda, A.J.: Stieltjes integral representation and effective diffusivity bounds for turbulent transport. Phys. Rev. Lett. **62**, 753–755 (1989)
- 3. Avellaneda, M., Majda, A.J.: An integral representation and bounds on the effective diffusivity in passive advection by laminar and turbulent flows. Commun. Math. Phys. **138**(2), 339–391 (1991)
- 4. Bhatia, R.: Matrix Analysis, vol. 169. Springer, New York (1997)
- 5. Bhattacharya, R.: A central limit theorem for diffusions with periodic coefficients. Ann. Probab. **13**(2), 385–396 (1985)
- 6. Bhattacharya, R.N.: On the functional central limit theorem and the law of the iterated logarithm for Markov processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete **60**(2), 185–201 (1982)
- 7. Bhattacharya, R.N.: Multiscale diffusion processes with periodic coefficients and an application to solute transport in porous media. Ann. Appl. Probab. **9**(4), 951–1020 (1999)
- 8. Bhattacharya, R.N., Gupta, V.K., Walker, H.F.: Asymptotics of solute dispersion in periodic porous media. SIAM J. Appl. Math. **49**(1), 86–98 (1989)
- 9. Bierkens, J.: Nonreversible Metropolis-Hastings. arXiv:1401.8087 (2014)
- 10. Bovier, A., Eckhoff, M., Gayrard, V., Klein, M.: Metastability in reversible diffusion processes: sharp asymptotics for capacities and exit times. Technical report, WIAS, Berlin (2002)
- 11. Bovier, A., Gayrard, V., Klein, M.: Metastability in reversible diffusion processes. II. Precise asymptotics for small eigenvalues. J. Eur. Math. Soc. **7**(1), 69–99 (2005)
- 12. Cattiaux, P., Chafaï, D., Guillin, A.: Central limit theorems for additive functionals of ergodic Markov diffusions processes. ALEA **9**(2), 337–382 (2012)
- 13. Constantin, P., Kiselev, A., Ryzhik, L., Zlatoš, A.: Diffusion and mixing in fluid flow. Ann. Math. **168**(2), 643–674 (2008)
- 14. Dellaportas, P., Kontoyiannis, I.: Control variates for estimation based on reversible Markov chain Monte Carlo samplers. J. R. Stat. Soc. B **74**(1), 133–161 (2012)
- 15. Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible Markov chain sampler. Ann. Appl. Probab. **10**(3), 726–752 (2000)
- 16. Douc, R., Fort, G., Guillin, A.: Subgeometric rates of convergence of f-ergodic strong Markov processes. Stoch. Processes Appl. **119**(3), 897–923 (2009)
- 17. Down, D., Meyn, S.P., Tweedie, R.L.: Exponential and uniform ergodicity of Markov processes. Ann. Probab. **23**(4), 1671–1691 (1995)
- 18. Franke, B., Hwang, C.R., Pai, H.M., Sheu, S.J.: The behavior of the spectral gap under growing drift. Trans. Am. Math. Soc. **362**(3), 1325–1350 (2010)
- 19. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, vol. 2. Taylor & Francis, Boca Raton (2014)
- 20. Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J. R. Stat. Soc. B **73**(2), 123–214 (2011)
- 21. Glynn, P.W., Meyn, S.P.: A Liapounov bound for solutions of the Poisson equation. Ann. Probab. **24**(2), 916–931 (1996)
- 22. Golden, K., Papanicolaou, G.: Bounds for effective parameters of heterogeneous media by analytic continuation. Commun. Math. Phys. **90**(4), 473–491 (1983)
- 23. Haario, H., Saksman, E., Tamminen, J.: Adaptive proposal distribution for random walk Metropolis algorithm. Comput. Stat. **14**(3), 375–396 (1999)
- 24. Helffer, B.: Spectral Theory and Its Applications, vol. 139. Cambridge University Press, Cambridge (2013)
- 25. Helland, I.S.: Central limit theorems for martingales with discrete or continuous time. Scand. J. Stat. **9**(2), 79–94 (1982)
- 26. Hennequin, G., Aitchison, L., Lengyel, M.: Fast sampling-based inference in balanced neuronal networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, pp. 2240–2248. Curran Associates, Inc., New York (2014)
- 27. Hukushima, K., Sakai, Y.: An irreversible Markov-chain Monte Carlo method with skew detailed balance conditions. J. Phys. **473**, 012012 (2013)
- 28. Hwang, C.R., Hwang-Ma, S.Y., Sheu, S.J.: Accelerating Gaussian diffusions. Ann. Appl. Probab. **3**(3), 897–913 (1993)
- 29. Hwang, C.R., Hwang-Ma, S.Y., Sheu, S.J., et al.: Accelerating diffusions. Ann. Appl. Probab. **15**(2), 1433–1444 (2005)
- 30. Hwang, C.-R., Normand, R., Wu, S.-J.: Variance reduction for diffusions. arXiv:1406.4657 (2014)
- 31. Jacod, J., Shiryaev, A.N.: Limit Theorems for Stochastic Processes, vol. 1943877. Springer, Berlin (1987)
- 32. Kipnis, C., Varadhan, S.R.S.: Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Commun. Math. Phys. **104**(1), 1–19 (1986)
- 33. Komorowski, T., Landim, C., Olla, S.: Fluctuations in Markov Processes: Time Symmetry and Martingale Approximation. Springer, Heidelberg (2012)
- 34. Krengel, U.: Ergodic Theorems. de Gruyter Studies in Mathematics, vol. 6. Walter de Gruyter & Co., Berlin (1985)
- 35. Leimkuhler, B., Reich, S.: Simulating Hamiltonian Dynamics, vol. 14. Cambridge University Press, Cambridge (2004)
- 36. Lelièvre, T.: Two mathematical tools to analyze metastable stochastic processes. In: Cangiani, A., Davidchack, R.L., Georgoulis, E., Gorban, A.N., Levesley, J., Tretyakov, M.V. (eds.) Numerical Mathematics and Advanced Applications 2011, pp. 791–810. Springer, New York (2013)
- 37. Lelièvre, T., Nier, F., Pavliotis, G.A.: Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion. J. Stat. Phys. **152**(2), 237–274 (2013)
- 38. Lelièvre, T., Stoltz, G., Rousset, M.: Free Energy Computations: A Mathematical Perspective. World Scientific, Singapore (2010)
- 39. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2008)
- 40. Lorenzi, L., Bertoldi, M.: Analytical Methods for Markov Semigroups. CRC Press, New York (2006)
- 41. Ma, Y.-A., Chen, T., Fox, E.: A complete recipe for stochastic gradient MCMC. In: Advances in Neural Information Processing Systems, pp. 2899–2907 (2015)
- 42. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)
- 43. Majda, A.J., Kramer, P.R.: Simplified models for turbulent diffusion: theory, numerical modelling, and physical phenomena. Phys. Rep. **314**(4), 237–574 (1999)
- 44. Majda, A.J., McLaughlin, R.M.: The effect of mean flows on enhanced diffusivity in transport by incompressible periodic velocity fields. Stud. Appl. Math. **89**(3), 245–279 (1993)
- 45. Martin, J., Wilcox, L.C., Burstedde, C., Ghattas, O.: A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM J. Sci. Comput. **34**(3), A1460–A1487 (2012)
- 46. Mattingly, J.C., Stuart, A.M., Higham, D.J.: Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stoch. Processes Appl. **101**(2), 185–232 (2002)
- 47. Mattingly, J.C., Stuart, A.M., Tretyakov, M.V.: Convergence of numerical time-averaging and stationary measures via Poisson equations. SIAM J. Numer. Anal. **48**(2), 552–577 (2010)
- 48. Meyn, S.P., Tweedie, R.L.: Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. Appl. Probab. **25**(3), 518–548 (1993)
- 49. Meyn, S.P., Tweedie, R.L.: A survey of Foster-Lyapunov techniques for general state space Markov processes. In: Proceedings of the Workshop on Stochastic Stability and Stochastic Stabilization, Metz, France. Citeseer (1993)
- 50. Mira, A.: Ordering and improving the performance of Monte Carlo Markov chains. Stat. Sci. **16**(4), 340–350 (2001)
- 51. Neal, R.M.: Sampling from multimodal distributions using tempered transitions. Stat. Comput. **6**(4), 353–366 (1996)
- 52. Neal, R.M.: Improving asymptotic variance of MCMC estimators: nonreversible chains are better. arXiv:math/0407281 (2004)
- 53. Neal, R.M.: MCMC Using Hamiltonian Dynamics. Handbook of Markov Chain Monte Carlo, vol. 54. CRC Press, Boca Raton (2010)
- 54. Ohzeki, M., Ichiki, A.: Simple implementation of Langevin dynamics without detailed balance condition. arXiv:1307.0434 (2013)
- 55. Pardoux, E., Veretennikov, A.Y.: On the Poisson equation and diffusion approximation I. Ann. Probab. **29**(3), 1061–1085 (2001)
- 56. Pavliotis, G.: Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations, vol. 60. Springer, New York (2014)
- 57. Pavliotis, G.A.: Homogenization Theory for Advection–Diffusion Equations with Mean Flow, Ph.D Thesis. Rensselaer Polytechnic Institute, Troy, NY (2002)
- 58. Pavliotis, G.A.: Asymptotic analysis of the Green-Kubo formula. IMA J. Appl. Math. **75**, 951–967 (2010)
- 59. Peskun, P.H.: Optimum Monte-Carlo sampling using Markov chains. Biometrika **60**(3), 607–612 (1973)
- 60. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion, vol. 293. Springer, New York (1999)
- 61. Rey-Bellet, L., Spiliopoulos, K.: Irreversible Langevin samplers and variance reduction: a large deviation approach. arXiv:1404.0105 (2014)
- 62. Rey-Bellet, L., Spiliopoulos, K.: Variance reduction for irreversible Langevin samplers and diffusion on graphs. arXiv:1410.0255 (2014)
- 63. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2013)
- 64. Roberts, G.O., Stramer, O.: Langevin diffusions and Metropolis-Hastings algorithms. Methodol. Comput. Appl. Probab. **4**(4), 337–357 (2002)
- 65. Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli **2**(4), 341–363 (1996)
- 66. Schütte, C., Sarich, M.: Metastability and Markov State Models in Molecular Dynamics: Modeling, Analysis, Algorithmic Approaches. American Mathematical Society, Providence (2013)
- 67. Strang, G.: On the construction and comparison of difference schemes. SIAM J. Numer. Anal. **5**(3), 506–517 (1968)
- 68. Sun, Y., Schmidhuber, J., Faustino, J.G.: Improving the asymptotic performance of Markov chain Monte-Carlo by inserting vortices. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23, pp. 2235–2243. Curran Associates, Inc., New York (2010)
- 69. Suwa, H., Todo, S.: General construction of irreversible kernel in Markov Chain Monte Carlo. arXiv:1207.0258 (2012)
- 70. Talay, D., Tubaro, L.: Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. **8**(4), 483–509 (1990)
- 71. Turitsyn, K.S., Chertkov, M., Vucelja, M.: Irreversible Monte Carlo algorithms for efficient sampling. Phys. D **240**(4), 410–414 (2011)
- 72. Wu, S.J., Hwang, C.R., Chu, M.T.: Attaining the optimal Gaussian diffusion acceleration. J. Stat. Phys. **155**(3), 571–590 (2014)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.