Abstract
It is well known in many settings that reversible Langevin diffusions in confining potentials converge to equilibrium exponentially fast. Adding an irreversible perturbation to the drift of a Langevin diffusion that preserves its invariant measure accelerates convergence to stationarity, and many existing works therefore advocate the use of such non-reversible dynamics for sampling. When implementing Markov chain Monte Carlo (MCMC) algorithms using time discretisations of such stochastic differential equations (SDEs), one can append the usual Metropolis–Hastings accept–reject step to the discretisation; this is often done in practice because the accept–reject step eliminates discretisation bias. On the other hand, such a step makes the resulting chain reversible, and it is not known whether adding it preserves the faster mixing properties of the non-reversible dynamics. In this paper, we address this gap between theory and practice by analysing the optimal scaling of MCMC algorithms constructed from proposal moves that are time-step Euler discretisations of an irreversible SDE, for high-dimensional Gaussian target measures. We call the resulting algorithm ipMALA, in analogy with the classical MALA algorithm (here "ip" stands for irreversible proposal). In order to quantify how the cost of the algorithm scales with the dimension N, we prove invariance principles for the appropriately rescaled chain. In contrast to the usual MALA algorithm, we show that two regimes can arise asymptotically: (i) a diffusive regime, as for MALA, and (ii) a "fluid" regime, where the limit is an ordinary differential equation. We provide concrete examples in which the limit is a diffusion, as in standard MALA, but with provably higher limiting acceptance probabilities. Numerical results corroborating the theory are also given.
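As a concrete illustration of the algorithm just described, the sketch below implements a toy finite-dimensional ipMALA-type chain in Python for a standard Gaussian target. This is a hedged illustration, not the exact algorithm analysed in the paper (which is set on a Hilbert space with covariance \(\mathcal{C}\) and a dimension-dependent step size): the step size h, the 2-dimensional state space, and the particular antisymmetric matrix S are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def ipmala_chain(n_steps, h=0.1):
    """Toy ipMALA-type chain targeting N(0, I_2).

    Proposal: one Euler step of the irreversible Langevin dynamics
        dX = (-X + S X) dt + sqrt(2) dW,
    with S antisymmetric, so that the drift perturbation S X leaves
    N(0, I) invariant.  The Metropolis-Hastings correction uses the
    full (asymmetric) Gaussian proposal density, so the accepted
    chain is exactly N(0, I)-invariant.
    """
    S = np.array([[0.0, 1.0], [-1.0, 0.0]])  # illustrative antisymmetric S

    def log_target(x):                # log-density of N(0, I), up to a constant
        return -0.5 * (x @ x)

    def drift(x):                     # reversible part -x plus irreversible S x
        return -x + S @ x

    def log_q(y, x):                  # log N(y; x + h*drift(x), 2h I), up to const
        m = x + h * drift(x)
        return -np.sum((y - m) ** 2) / (4.0 * h)

    x = np.zeros(2)
    samples, accepted = [], 0
    for _ in range(n_steps):
        y = x + h * drift(x) + np.sqrt(2.0 * h) * rng.standard_normal(2)
        log_alpha = log_target(y) - log_target(x) + log_q(x, y) - log_q(y, x)
        if np.log(rng.uniform()) < log_alpha:
            x, accepted = y, accepted + 1
        samples.append(x)
    return np.array(samples), accepted / n_steps
```

For example, `samples, acc = ipmala_chain(20000)` yields an empirical mean close to 0 and empirical variance close to 1 in each coordinate, with a high acceptance rate for this small h.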
Notes
Indeed, in this case Assumptions 5.1 and 5.2 presented in Sect. 5 are satisfied with \(c_1\) as in (4.8) and \(c_2=c_3=0\). This makes Assumption 5.4 easy to verify. Moreover, Condition 5.3 is trivially satisfied for matrices in Jordan block form. Detailed comments on this can be found in Sect. 5, see comments after (5.29).
This does not mean that the effect of the irreversible term is destroyed. It simply means that the acceptance probability will not feel it.
These calculations are a bit long but straightforward and follow the lines of the calculation done in Lemma B.1, proof of point (vi).
Having used Lemma 5.6 for the last equality.
References
Berger, E.: Asymptotic behaviour of a class of stochastic approximation procedures. Probab. Theory Relat. Fields 71(4), 517–552 (1986)
Bernard, E.P., Krauth, W., Wilson, D.B.: Event-chain algorithms for hard-sphere systems. Phys. Rev. E 80(5), 056704 (2009)
Bierkens, J.: Non-reversible Metropolis–Hastings. Stat. Comput. 26(6), 1213–1228 (2016)
Bouchard-Côté, A., Vollmer, S.J., Doucet, A.: The bouncy particle sampler: a non-reversible rejection-free Markov Chain Monte Carlo method. J. Am. Stat. Assoc. 113(522), 855–867 (2018). https://doi.org/10.1080/01621459.2017.1294075
Christensen, O.F., Roberts, G.O., Rosenthal, J.S.: Scaling limits for the transient phase of local Metropolis–Hastings algorithms. J. R. Stat. Soc. Ser. B Stat. Methodol. 67(2), 253–268 (2005)
Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible Markov chain sampler. Ann. Appl. Probab. 10(3), 726–752 (2000)
Duncan, A.B., Lelievre, T., Pavliotis, G.A.: Variance reduction using nonreversible Langevin samplers. J. Stat. Phys. 163(3), 457–491 (2016)
Duncan, A.B., Pavliotis, G.A., Zygalakis, K.: Nonreversible Langevin samplers: splitting schemes, analysis and implementation. submitted (2017)
Dvoretzky, A., et al.: Asymptotic normality for sums of dependent random variables. In: Proceedings of 6th Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 513–535 (1972)
Horowitz, A.M.: A generalized guided Monte Carlo algorithm. Phys. Lett. B 268(2), 247–252 (1991)
Hwang, C.-R., Hwang-Ma, S.-Y., Sheu, S.-J.: Accelerating diffusions. Ann. Appl. Probab. 15(2), 1433–1444 (2005)
Jourdain, B., Lelievre, T., Miasojedow, B.: Optimal scaling for the transient phase of Metropolis–Hastings algorithms: the longtime behavior. Bernoulli 20(4), 1930–1978 (2014)
Jourdain, B., Lelievre, T., Miasojedow, B.: Optimal scaling for the transient phase of the random walk Metropolis algorithm: the mean field limit. Ann. Appl. Probab. 25(4), 2263–2300 (2015)
Kuntz, J., Ottobre, M., Stuart, A.M.: Diffusion Limit for the Random Walk Metropolis Algorithm Out of Stationarity. arXiv:1405.4896 (2014)
Kuntz, J., Ottobre, M., Stuart, A.M.: Non-stationary Phase of the MALA Algorithm. arXiv:1608.08379 (2016)
Lu, J., Spiliopoulos, K.: Multiscale Integrators for Stochastic Differential Equations and Irreversible Langevin Samplers. arXiv:1606.09539 (2016)
Ma, Y.-A., Chen, T., Fox, E.: A complete recipe for stochastic-gradient MCMC. Adv. Neural Inf. Process. Syst. 28, 2899–2907 (2015)
Mattingly, J.C., Pillai, N.S., Stuart, A.M.: Diffusion limits of the random walk metropolis algorithm in high dimensions. Ann. Appl. Probab. 22(3), 881–930 (2012)
Monmarche, P.: Piecewise deterministic simulated annealing. ALEA Lat. Am. J. Probab. Math. Stat. 13(1), 357–398 (2016)
Ottobre, M.: Markov chain Monte Carlo and irreversibility. Rep. Math. Phys. 77, 267–292 (2016)
Ottobre, M., Pillai, N.S., Pinski, F.J., Stuart, A.M.: A function space hmc algorithm with second order Langevin diffusion limit. Bernoulli 22(1), 60–106 (2016)
Pillai, N.S., Stuart, A.M., Thiéry, A.H.: Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22(6), 2320–2356 (2012)
Poncet, R.: Generalized and Hybrid Metropolis–Hastings Overdamped Langevin Algorithms. arXiv:1701.05833 (2017)
Rey-Bellet, L., Spiliopoulos, K.: Irreversible Langevin samplers and variance reduction: a large deviations approach. Nonlinearity 28(7), 2081–2103 (2015)
Rey-Bellet, L., Spiliopoulos, K.: Variance reduction for irreversible Langevin samplers and diffusion on graphs. Electron. Commun. Probab. 20(15), 1–16 (2015)
Roberts, G.O., Rosenthal, J.S.: Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 60(1), 255–268 (1998)
Roberts, G.O., Rosenthal, J.S.: Optimal scaling for various Metropolis–Hastings algorithms. Stat. Sci. 16(4), 351–367 (2001)
Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2, 341–363 (1996)
Rossky, P.J., Doll, J.D., Friedman, H.L.: Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69(10), 4628–4633 (1978)
Acknowledgements
K.S. was partially supported by NSF CAREER AWARD NSF-DMS 1550918. N.S.P. was partially supported by the ONR grant N00014-18-1-2730.
Appendices
Appendix A. Proof of Theorem 4.2
In this section we present the proof of our main results. The proof is based on diffusion-approximation techniques analogous to those used in [18]. In [18] the authors consider the MALA algorithm with a reversible proposal; that is, if we fix \(S=0\) in our paper and \(\Psi =0\) in theirs, the algorithms coincide. For this reason we adopt notation as similar as possible to that of [18] and, for the sake of brevity, we detail only the parts of the proof that differ from [18], sketching the rest.
We start by recalling that, by Fernique’s theorem,
This fact will often be used implicitly in the remainder of the paper.
We also recall that the chain \(\{x_k^N\}_k\) under consideration is defined in (3.6); the drift-martingale decomposition of this chain is given in Eq. (4.1). Let us start by recalling the definition of the continuous interpolant of the chain, Eqs. (4.5)–(4.6), and by introducing the piecewise constant interpolant of the chain \(x_k^N\), that is,
where \( t_k=k/N^{\zeta \gamma }\). It is easy to see (see e.g., [21, Appendix A]) that
where
For any \(t \in [0,T]\), we set
With the above notation, we can then rewrite (A.2) as
Let now \(C([0,T];{\mathcal {H}})\) denote the space of \({\mathcal {H}}\)-valued continuous functions, endowed with the uniform topology and consider the map
That is, \({\mathcal {I}}\) is the map that associates with every \((x_0 , \eta (t)) \in {\mathcal {H}} \times C([0,T];{\mathcal {H}})\) the (unique) solution of the equation
From (A.4) it is clear that \(x^{(N)}= {\mathcal {I}}(x_0^N, {\hat{w}}_{{\mathbf {p}}}^N)\). Notice that, under our continuity assumption on \({\tilde{S}}\), \({\mathcal {I}}\) is a continuous map. Therefore, in order to prove that \(x^{(N)}(t)\) converges weakly to x(t) (where x(t) is the solution of Eq. (A.5) with \(\eta (t)=D_{{\mathbf {p}}}W^{{\mathcal {C}}}(t)\)), by the continuous mapping theorem we only need to prove that \({\hat{w}}_{{\mathbf {p}}}^N\) converges weakly to \(D_{{\mathbf {p}}}W^{{\mathcal {C}}}(t)\), where \(W^{{\mathcal {C}}}(t)\) is an \({\mathcal {H}}\)-valued \({\mathcal {C}}\)-Brownian motion. The weak convergence of \({\hat{w}}_{{\mathbf {p}}}^N\) to \(D_{{\mathbf {p}}}W^{{\mathcal {C}}}(t)\) is a consequence of (A.3) and of Lemmas A.1 and A.2, from which the statement of Theorem 4.2 follows. The proofs of Lemmas A.1 and A.2 occupy the remainder of this appendix.
Lemma A.1
Under Assumptions 5.1, 5.2 and Condition 5.3, the following holds
and
where the function \(d_{{\mathbf {p}}}(x)\) has been defined in the statement of Theorem 4.2.
Set now
and
While \(h_{{\mathbf {p}}}\) [see (5.28)] is the limiting average acceptance probability, \(h_{{\mathbf {p}}}^N(x)\) is the local average acceptance probability. The above notation will be used in the proof of the next lemma.
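The stabilisation of the average acceptance probability around a nontrivial limit can be observed numerically. The sketch below is our own illustration, not taken from the paper: it uses an i.i.d. standard Gaussian product target, a reversible proposal (i.e. \(S = 0\)), and illustrative parameters, with step size \(h = \ell^2 N^{-1/3}\), the scaling corresponding to \(\gamma = 1/6\). As N grows, the empirical acceptance rate settles near a dimension-free limit, the finite-dimensional analogue of \(h_{{\mathbf {p}}}\).

```python
import numpy as np

def mala_acceptance(N, ell=1.0, n_steps=4000, seed=0):
    """Empirical MALA acceptance rate on N(0, I_N) with step size
    h = ell^2 * N^(-1/3); as N grows, the rate approaches a nonzero
    limit depending only on ell."""
    rng = np.random.default_rng(seed)
    h = ell ** 2 * N ** (-1.0 / 3.0)
    x = rng.standard_normal(N)        # start the chain in stationarity
    accepted = 0
    for _ in range(n_steps):
        y = x - h * x + np.sqrt(2.0 * h) * rng.standard_normal(N)
        # log pi(y)/pi(x) plus the Gaussian proposal-density correction
        log_alpha = (0.5 * (x @ x - y @ y)
                     - (np.sum((x - y + h * y) ** 2)
                        - np.sum((y - x + h * x) ** 2)) / (4.0 * h))
        if np.log(rng.uniform()) < log_alpha:
            x = y
            accepted += 1
    return accepted / n_steps
```

Running `mala_acceptance(50)` and `mala_acceptance(500)` gives similar acceptance rates even though the step sizes differ by a factor of roughly 2.2, consistent with a dimension-free limiting acceptance probability.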
Lemma A.2
If Assumptions 5.1, 5.2 and Condition 5.3 hold, then \(w_{{\mathbf {p}}}^N(t)\) converges weakly in \(C([0,T];{\mathcal {H}})\) to \(D_{{\mathbf {p}}}W^{{\mathcal {C}}}(t)\), where \(W^{{\mathcal {C}}}(t)\) is a \({\mathcal {H}}\)-valued \({\mathcal {C}}\)-Brownian motion and the constant \(D_{{\mathbf {p}}}\) has been defined in the statement of Theorem 4.2.
Proof of Lemma A.1
We start by proving (A.7), which is simpler. The drift coefficient \(d_{{\mathbf {p}}}\) is globally Lipschitz; therefore, using (3.6), (3.3) and (3.4), if \(t_k\le t <t_{k+1}\), we have
Let us now come to the proof of (A.6). From (4.3)–(4.4), we have
where
We split the function \(d_{{\mathbf {p}}}(x)\) in three (corresponding) parts:
with
We therefore need to consecutively estimate the above three terms.
\( \bullet \mathbf{\,\, A_1^N-A_1}:\) if \(\alpha \ge 2\) (and \(\gamma =1/6\)) we fix \(\zeta =2\) and we have
as the first addend tends to zero by Lemma B.2 of Appendix B and the second addend tends to zero by definition (see also [22, equation (4.3)]). If \(\alpha <2\) then \(\zeta =\alpha \) and we have
\( \bullet \mathbf{\,\, A_2^N-A_2}:\) if \(\alpha \le 2\) then, recalling (4.5), a calculation analogous to the one in (A.11) gives the statement. If \(\alpha >2\) then we can act as in (A.12).
\( \bullet \mathbf{\,\, A_3^N-A_3}:\) by Lemma B.2 of Appendix B [and (4.5)], we have
This concludes the proof. \(\square \)
Proof of Lemma A.2
The calculations here are standard, so we only prove the statement in the case \(\gamma =1/6\) and \(\alpha >2\). Let us recall the martingale difference given by (5.35). In the case \(\gamma =1/6\) and \(\alpha >2\), we have \(\zeta =2\) and \(d_{{\mathbf {p}}}(x)=-\frac{\ell ^2}{2}h_{{\mathbf {p}}}x\). Hence, the expression (5.35) becomes
By Lemma A.1, we have that \({\mathbb {E}}_{\pi ^N}\Vert d_{{\mathbf {p}}}^N(x)-d_{{\mathbf {p}}}(x) \Vert ^{2} \longrightarrow 0\) as \(N\rightarrow \infty \). This implies that
At the same time, we also notice that
Hence, if we define \({\mathcal {M}}_{N}(x)={\mathbb {E}}_x\left[ M_k^N \otimes M_k^N|x_k^N=x\right] \), we obtain that up to a constant
Then, as in Lemma 4.8 of [22] we obtain that
which then immediately implies that
The latter result implies that the invariance principle of Proposition 5.1 of [1] holds, which in turn implies the statement of the lemma. \(\square \)
Appendix B. Auxiliary estimates
We first decompose \(Q^N\) as follows: let
Then
where
with
having defined
and
Finally, we set
That is, \({\tilde{e}}^N\) contains only the addends of \(e^N\) that depend on the noise z.
Furthermore, we split \(Q^N(x,z)\) into the terms that contain \(z^{j,N}\) and the terms that don’t, \(Q^N_j\) and \(Q^N_{j, \perp }\), respectively; that is
where
having denoted by \((i_1^N)_j, \, (i_2^N)_j\) and \({Z}^N_j\) the parts of \(i_1^N, i_2^N\) and \({Z}^N\), respectively, that depend on \(z^{j,N}\).
Lemma B.1
Let Assumptions 5.1, 5.2 and Condition 5.3 hold; then,
Proof of Lemma B.1
Recall that, under \(\pi ^N\), \(x^{i, N} \sim \lambda _i \rho ^i\), where \(\{\rho ^i\}_{i \in {\mathbb {N}}}\) are i.i.d. standard Gaussians. We now consecutively prove all the statements of the lemma.
Proof of (i) Notice that
Therefore, since \(\gamma \ge 1/6\),
Furthermore, using (5.15) (which follows from point i) of Assumption 5.2) and (5.11), we have
similarly, using (5.16) (which follows from point ii) of Assumption 5.2) and (5.12),
Now the first statement of the lemma is a consequence of (B.6)–(B.7) and the above (B.13), (B.14) and (B.15).
Proof of (ii) This is proved in [22], see calculations after [22, equation (4.18)], so we omit it.
Proof of (iii) This estimate follows again from Assumption 5.2, once we observe that if \(x \sim \pi ^N\) then \( \Vert {\tilde{S}}^N x^N\Vert _{{\mathcal {C}}^N}^2\) and \(\Vert {\tilde{S}}^N ({\mathcal {C}}^N)^{1/2} z^N\Vert _{{\mathcal {C}}^N}^2\) are two independent random variables with the same distribution. With this observation in place, we have
which gives the claim.
Proof of (iv) We recall that \({Z}^N_j\) has been introduced in (B.8). Using the antisymmetry of \(S^N\) and the definition of \({\tilde{S}}^N\), we have
and
We can therefore write an explicit expression for \({Z}^N_j\):
For the sake of clarity we stress again that in the above \((S^Nx^N)\) is an N-dimensional vector and \((S^Nx^N)^j\) is the j-th component of such a vector. Therefore, recalling (2.3), (2.1), (A.1) and setting \(\gamma =1/6\), we have
Therefore,
Proof of (v) Follows from Assumption 5.2, from statement (i) of this lemma and from (B.7).
Proof of (vi) From (B.1),
Therefore,
where \(r^N\) contains all the cross-products in the expansion of the variance. By direct calculation and using the antisymmetry of S, one finds that most of these cross-products vanish and we have
Observe that
using this fact, Assumption 5.2 implies that the first addend in the above expression for \({\mathbb {E}}_x r^N\) vanishes as \(N \rightarrow \infty \). The second addend contributes instead to the limiting variance. Now straightforward calculations give the result.
\(\square \)
We recall the definitions
and
Lemma B.2
Suppose that Assumptions 5.1, 5.2 and Condition 5.3 hold. Then
- (i)
If \(\alpha > 2\) and \(\gamma = 1/6\),
$$\begin{aligned} N^{1/3}{\mathbb {E}}_{\pi ^N}\Vert \epsilon _{{\mathbf {p}}}^N(x)\Vert ^2 {\mathop {\longrightarrow }\limits ^{N\rightarrow \infty }} 0\, ; \end{aligned}$$ (B.18)
- (ii)
if \(1 \le \alpha \le 2\) and \(\gamma \ge 1/6\) then
$$\begin{aligned} {\mathbb {E}}_{\pi ^N}\Vert N^{\gamma (\alpha -1)}\epsilon _{{\mathbf {p}}}^N(x) + 2 \ell ^{\alpha -1} \nu _{{\mathbf {p}}}\, {\tilde{S}} x\Vert ^2 {\mathop {\longrightarrow }\limits ^{N\rightarrow \infty }} 0 \, \end{aligned}$$ (B.19)
where the constant \(\nu _{{\mathbf {p}}}\) has been defined in (5.31).
- (iii)
if \(\alpha \ge 1\), \(\gamma \ge 1/6\) and \(S^N\) is such that \((c_1, c_2, c_3) \ne (0,0,0)\), then
$$\begin{aligned} {\mathbb {E}}_{\pi ^N}\left| h_{{\mathbf {p}}}^N(x)-h_{{\mathbf {p}}}\right| ^2 {\mathop {\longrightarrow }\limits ^{N\rightarrow \infty }} 0 \,; \end{aligned}$$ (B.20)
- (iv)
finally, if \( 1< \alpha < 2 \), \(\gamma >1/6\) and \(S^N\) is such that \(c_1=c_2=c_3=0\), then
$$\begin{aligned} {\mathbb {E}}_{\pi ^N}\left| h_{{\mathbf {p}}}^N(x)- 1\right| ^2 {\mathop {\longrightarrow }\limits ^{N\rightarrow \infty }} 0 \,, \end{aligned}$$
i.e., the constant \(h_{{\mathbf {p}}}\) in (B.20) is equal to one. This means that, as \(N \rightarrow \infty \), the acceptance probability tends to one.
Proof of Lemma B.2
\(\bullet \,\)Proof of (i) Acting as in [22, page 2349], we obtain
Taking the sum over j on both sides of the above then gives
Therefore, if we show
(B.18) follows. From (B.8), it is clear that the above is a consequence of Lemma B.1 [in particular, it follows from (B.9)–(B.10)].
\(\bullet \,\)Proof of (ii) Let us split \(Q^N\) as follows:
where \( e^N\) and \(i_2^N\) are defined in (B.6) and (B.5), respectively, while
having set
and \(i_1^N\) is defined in (B.4). The j-th component of \(N^{\gamma (\alpha -1)} \epsilon _{{\mathbf {p}}}^N\) can therefore be expressed as follows:
where \(T_0^j:=\langle T_0, \varphi _j \rangle \) and
We now decompose \(R^N\) into a component which depends on \(z^{j,N}\), \(R^N_j\), and a component that does not depend on \(z^{j,N}\), \(R^N_{j, \perp }\):
with
having denoted by \( (i_1^N)_j, (B^N)_j\) and \((H^N)_j\) the parts of \(i_1^N, B^N\) and \(H^N\), respectively, that depend on \(z^{j,N}\). That is,
as for \((H^N)_j\), it suffices to notice that
and the expression for \({Z}^N_j\) is detailed in (B.16) (just set \(\alpha =2\) in (B.16)). With this notation, from (B.25), we further write
where, like before, \(T_1^j:=\langle T_1, \varphi _j \rangle \) and
We recall that the notation \({\mathbb {E}}_x\) denotes expected value given x, where the expectation is taken over all the sources of noise contained in the integrand. In order to further evaluate the RHS of (B.27) we calculate the expected value of the integrand with respect to the law of \(z^j\) (we denote such expected value by \({\mathbb {E}}^{z^j}\) and use \({\mathbb {E}}^{z^j_-}\) to denote expectation with respect to \(z^N\setminus z^{j,N}\)); to this end, we use the following lemma.
Lemma B.3
If G is a normally distributed random variable with \(G \sim {\mathcal {N}}(0,1)\) then
where \(\Phi \) is the CDF of a standard Gaussian.
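The display of Lemma B.3 does not survive the extraction above; assuming it is the standard Gaussian computation used throughout the optimal-scaling literature, namely \({\mathbb E}\big[1 \wedge e^{\delta G + \mu}\big] = \Phi(\mu/\delta) + e^{\mu + \delta^2/2}\,\Phi(-\delta - \mu/\delta)\) for \(\delta > 0\) (obtained by splitting the expectation on \(\{\delta G + \mu \ge 0\}\) and exponentially tilting on the complement), a quick Monte Carlo sanity check of the identity reads:

```python
import math
import numpy as np

def closed_form(delta, mu):
    """Standard identity: for G ~ N(0,1) and delta > 0,
    E[min(1, exp(delta*G + mu))]
        = Phi(mu/delta) + exp(mu + delta^2/2) * Phi(-delta - mu/delta),
    with Phi the standard normal CDF."""
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (Phi(mu / delta)
            + math.exp(mu + 0.5 * delta ** 2) * Phi(-delta - mu / delta))

def monte_carlo(delta, mu, n=10**6, seed=1):
    """Direct Monte Carlo estimate of the same expectation."""
    G = np.random.default_rng(seed).standard_normal(n)
    return float(np.minimum(1.0, np.exp(delta * G + mu)).mean())
```

For instance, `closed_form(0.5, -0.2)` and `monte_carlo(0.5, -0.2)` agree to within Monte Carlo error; as \(\delta \to 0\) the formula reduces to \(1 \wedge e^{\mu}\), as expected.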
We apply the above lemma with \(\mu = R^N_{j,\perp }\) and \(\delta = \delta ^B_j\), where
We therefore obtain
and
To prove the statement it suffices to show that
These calculations are a bit lengthy, so we gather the proof of the above in Lemma B.4 below. Assuming for the moment that the above is true, the proof is concluded after recognising that
\(\bullet \,\)Proof of (iii) By acting as in the proof of [22, Lemma 4.5 and Corollary 4.6] we see that (B.20) is a consequence of (B.2), (B.12) and of the following limit:
The above follows from the definition (B.3), Lemma B.1, Assumption 5.2 and [22, equation (4.7)].
\(\bullet \,\)Proof of (iv) One could prove this with the same procedure as in (iii). However, in this case things are easier; indeed, we can write
The above limit follows simply by the assumption that \(\gamma >1/6\) and \(c_1=c_2=c_3=0\). \(\square \)
Lemma B.4
If Assumptions 5.1, 5.2, 5.4 and Condition 5.3 hold, then
where \(T_n=\sum _{i=1}^N \langle T_n, \varphi _i\rangle \varphi _i\) and the terms \(T_0^j ,{\dots },T_5^j\) have been introduced in (B.26)–(B.28).
Proof of Lemma B.4
We consecutively prove the above limit for the terms \(T_0^j ,{\dots },T_5^j\).
\(\bullet \) Using the Lipschitz continuity of the function \(u \rightarrow 1 \wedge e^u\), the result for \(T_0\) follows from the definition of \(R^N\), Eq. (B.21), and Lemma B.1, statements (iii) and (v). The result for \(T_1\) can be obtained similarly.
\(\bullet \) Term \(T_2\): we use the Lipschitz continuity of the function \(\Phi \) and observe that the following holds
The above can be obtained by a reasoning similar to the one detailed in [18, page 916 and (5.20)], using (i) of Assumption 5.4. Using the above observations and applying Hölder's inequality with the exponent r appearing in (ii) of Assumption 5.4, one then gets
Therefore the term \(T_2\) goes to zero by Assumption 5.4.
\(\bullet \) The term \(T_3\) can be treated with calculations completely analogous to those in [18, Lemma 5.8]. As a result of such calculations, using the fact that the noise \(z^{j,N}\) is always independent of the current position x, and recalling Eq. (B.16), we obtain that for any \(r,q>1\) (to be later appropriately chosen), the following bound holds:
Now set
so that
Notice that by the bounded convergence theorem, we have
With this observation it is easy to show that the term (I) tends to zero. It is less easy to show that (II) tends to zero, so for this term we detail calculations a bit more.
We denote by (II)\(_1\) to (II)\(_4\) the terms in lines 1 to 4 of the above array of equations, and the second factor in line i we denote by (II)\(_{ib}\); so, e.g.,
where
To streamline the presentation, we have written the calculations leading to the above four addends in a way that suggests the choice of q should be the same for all four terms. However, acting appropriately in the computations that give (B.29), one can see that q does not need to be the same for each of the above addends. We show how to study (II)\(_1\) and (II)\(_3\); the other terms can be handled with similar tricks. Starting from (II)\(_1\), because of (B.30), we just need to prove that (II)\(_{1b}\) is bounded. We will do slightly better in what follows. Recall that by assumption
Choosing \(q=2\) in the definition of (II)\(_{1b}\) and recalling \(x^{j,N} \sim \lambda _j \rho _j\), we get
where in the last inequality we have used the weighted Jensen inequality (relying on the fact that \(\{\lambda _j^2\}_j\) is summable) and the convergence of the RHS to zero follows from (B.31). The term (II)\(_{2b}\) can be dealt with analogously, choosing \(q=4\) (this time, when applying the weighted Jensen inequality, one should rely on the fact that the sequence \(\{\lambda _j^4 \left| (Sx)^j\right| ^2 \}_j \) is summable for every \(x \in {\mathcal {H}}\)). Finally, to deal with (II)\(_{3b}\), we use the fact that the sequence \(\{({\tilde{S}}^2 x)^j\}_j\) is, by assumption, bounded for every \(x \in {\mathcal {H}}\). Therefore, choosing \(q=2\) we have:
Because \(2\alpha \gamma -\gamma > 2\gamma (\alpha -1)\), the RHS of the above tends to zero by using (B.31). The term (II)\(_{4b}\) can be dealt with in a completely analogous manner.
\(\bullet \) The terms \(T_4\) and \(T_5\) can be studied similarly to what has been done in [14], see calculations from equation (8.31), in particular the terms \(e^{i,N}_{3,k},e^{i,N}_{5,k}\). \(\square \)
Cite this article
Ottobre, M., Pillai, N.S. & Spiliopoulos, K. Optimal scaling of the MALA algorithm with irreversible proposals for Gaussian targets. Stoch PDE: Anal Comp 8, 311–361 (2020). https://doi.org/10.1007/s40072-019-00147-5