Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm

Durmus, Alain; Moulines, Éric

doi:10.1007/s11222-014-9511-z


Published: 20 September 2014

Statistics and Computing, Volume 25, pages 5–19 (2015)


Abstract

In this paper, we establish explicit convergence rates for Markov chains in Wasserstein distance. Compared to the more classical total variation bounds, the proposed rate of convergence leads to useful insights for the analysis of MCMC algorithms, and suggests ways to construct samplers with good mixing rates even if the dimension of the underlying sampling space is large. We apply these results to the Exponential Integrator version of the Metropolis Adjusted Langevin Algorithm and illustrate our findings using a Bayesian linear inverse problem.


Notes

The underlying problem is typically infinite-dimensional; the problem is finite dimensional after truncation.


Author information

Authors and Affiliations

Département de Mathématiques, École Normale Supérieure de Cachan, 61 Av. du Président Wilson, 94235, Cachan Cedex, France
Alain Durmus
Institut Mines-Télécom; Télécom ParisTech ; CNRS LTCI, 46 rue Barrault, 75634, Paris, Cedex 13, France
Éric Moulines


Corresponding author

Correspondence to Alain Durmus.

A Proofs

Lemma 5

Assume M1.

1.
For all $x,y,z \in \mathbb {R}^d$,
$$\begin{aligned}&\left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),z \right\rangle \le C_\varUpsilon \left\| x-y \right\| _Q\left\| z \right\| _Q \\&\quad \left\| Q^{-1}\nabla \varUpsilon (x) \right\| _Q \le C_\varUpsilon \left\| x \right\| _Q + \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q \\&\quad \left\langle \nabla \varUpsilon (z), x-y \right\rangle \le C_\varUpsilon ( C_\varUpsilon \left\| z \right\| _Q \\&\quad + \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q) \left\| x-y \right\| _Q. \end{aligned}$$
2.
For all $x,y \in \mathbb {R}^d$ and $h \le 4/(C_\varUpsilon ^2 +1)$,
$$\begin{aligned}&\left\| x-(h/2)Q^{-1}\nabla U(x) - \{y - (h/2) Q^{-1} \nabla U (y) \} \right\| _Q \nonumber \\&\quad \le \nu \left\| x-y \right\| _Q, \end{aligned}$$
(41)
where
$$\begin{aligned} \nu = \left( 1- h (1- h(1+C_\varUpsilon ^2)/4) \right) ^{1/2}. \end{aligned}$$
(42)
In particular,
$$\begin{aligned}&\left\| x - (h/2) Q^{-1}\nabla U(x) \right\| _Q\nonumber \\&\quad \le \nu \left\| x \right\| _Q + (h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q. \end{aligned}$$
(43)

Proof

(1) is just a consequence of the definition of $\left\langle \cdot ,\cdot \right\rangle _Q$, M1, the Cauchy–Schwarz inequality and the triangle inequality.
(2) Let $h \le 4/(C_\varUpsilon ^2+1)$. Under M1, since $\varUpsilon $ is convex and $C^1$, for all $x,y \in \mathbb {R}^d$,
$$\begin{aligned}&\left\| x-y-(h/2)Q^{-1}(\nabla U(x) - \nabla U(y)) \right\| _Q^2 \\&\quad = (1-h/2)^2\left\| x-y \right\| _Q^2 \\&\qquad +(h^2/4)\left\| Q^{-1}( \nabla \varUpsilon (x) - \nabla \varUpsilon (y)) \right\| _Q^2 \\&\qquad - h (1-h/2)\left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),x-y \right\rangle \\&\quad \le \left( 1- h(1 - h (C_\varUpsilon ^2+1) /4) \right) \left\| x-y \right\| _Q ^2, \end{aligned}$$
showing (41). Equation (43) follows from (41) and the triangle inequality. $\square $
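The contraction (41) can be checked numerically on a toy example. The sketch below is only an illustration under assumptions that are not taken from the paper: $Q$ is diagonal, $U(x) = \tfrac{1}{2} x^\top Q x + \varUpsilon (x)$ (the form suggested by the expansion in the proof), and $\varUpsilon (x) = c \sum _i q_i \log \cosh (x_i)$, which is convex with $Q^{-1}\nabla \varUpsilon (x) = c \tanh (x)$, so that one may take $C_\varUpsilon = c$. The names `q_norm`, `drift_step` and `nu` are hypothetical helpers.

```python
import numpy as np

def q_norm(v, q):
    """Norm ||v||_Q = sqrt(v^T Q v) for a diagonal Q = diag(q), q > 0."""
    return np.sqrt(np.sum(q * v * v, axis=-1))

def drift_step(x, h, c):
    """x - (h/2) Q^{-1} grad U(x), assuming Q^{-1} grad U(x) = x + c * tanh(x)."""
    return x - 0.5 * h * (x + c * np.tanh(x))

def nu(h, c):
    """Contraction factor from (42), with C_Upsilon = c."""
    return np.sqrt(1.0 - h * (1.0 - h * (1.0 + c**2) / 4.0))

rng = np.random.default_rng(0)
q = np.array([1.0, 4.0, 9.0])      # diagonal of Q (toy choice)
h, c = 0.5, 1.0                    # h <= 4 / (c^2 + 1) = 2

# Random pairs of points; compare the contraction ratio with nu.
x = 3.0 * rng.normal(size=(1000, 3))
y = 3.0 * rng.normal(size=(1000, 3))
ratios = q_norm(drift_step(x, h, c) - drift_step(y, h, c), q) / q_norm(x - y, q)
worst_ratio = ratios.max()

print("worst observed ratio:", worst_ratio, "bound nu:", nu(h, c))
```

For these toy values, $\nu ^2 = 1 - 0.5\,(1 - 0.25) = 0.625$, and every sampled ratio should stay below $\nu $, as (41) predicts.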

Lemma 6

Assume M1.

1.
For all $x,y \in \mathbb {R}^d$,
$$\begin{aligned} \left\| X_1(x) - Y_1(y) \right\| _Q&\le \nu \left\| x-y \right\| _Q \\ \left\| X_1(x) \right\| _Q&\le \nu \left\| x \right\| _Q +(h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q \\&+ \widetilde{h}\left\| Q^{-1/2}Z_0 \right\| _Q, \end{aligned}$$
where $(X_1(x), Y_1(y))$ and $\nu $ are given by (28) and (42), respectively.
2.
For all $x,y \in \mathbb {R}^d$
$$\begin{aligned}&\left\| Q^{-1}(\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y))) \right\| _Q \le C_\varUpsilon \nu \left\| x-y \right\| _Q \\&\quad \left\| Q^{-1}\nabla \varUpsilon (X_1(x)) \right\| _Q \\&\quad \le C_\varUpsilon \left( \nu \left\| x \right\| _Q +(h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\left\| Z_0 \right\| \right) \!. \end{aligned}$$

Proof

(1) is just a consequence of Lemma 5-(2). Then (2) follows from M1 and Lemma 5-(1). $\square $
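To spell out the first step, here is a short sketch. It assumes (the definition (28) is not restated in this appendix) that the basic coupling drives both proposals with the same Gaussian variable $Z_0$, that is, $X_1(x) = x - (h/2)Q^{-1}\nabla U(x) + \widetilde{h} Q^{-1/2} Z_0$ and $Y_1(y) = y - (h/2)Q^{-1}\nabla U(y) + \widetilde{h} Q^{-1/2} Z_0$. Under this assumption the noise cancels in the difference, and (41) and (43) give

$$\begin{aligned} \left\| X_1(x) - Y_1(y) \right\| _Q&= \left\| x-(h/2)Q^{-1}\nabla U(x) - \{y - (h/2) Q^{-1} \nabla U (y) \} \right\| _Q \le \nu \left\| x-y \right\| _Q, \\ \left\| X_1(x) \right\| _Q&\le \left\| x - (h/2) Q^{-1}\nabla U(x) \right\| _Q + \widetilde{h}\left\| Q^{-1/2}Z_0 \right\| _Q \\&\le \nu \left\| x \right\| _Q + (h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\left\| Q^{-1/2}Z_0 \right\| _Q. \end{aligned}$$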

Lemma 7

Assume M1. There exists $C$ such that

$$\begin{aligned}&\left| G_h(x, X_1(x) ) - G_h(y,Y_1(y)) \right| \le C \left\| x-y \right\| _Q \nonumber \\&\quad \times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

(44)

Proof

Let us write

$$\begin{aligned} \left| G_h(x, X_1(x) ) - G_h(y,Y_1(y)) \right| \le \sum _{i=1}^4 I_i. \end{aligned}$$

Using M1, Lemma 5 and Lemma 6, we have the following inequalities for $I_i$, $i=1, \ldots , 4$.

$$\begin{aligned} I_1&= \left| \varUpsilon (x) -\varUpsilon (y) \right| + \left| \varUpsilon (X_1(x) )-\varUpsilon (Y_1(y)) \right| \\&\quad \;+ \left| \varGamma (x) -\varGamma (y) \right| + \left| \varGamma (X_1(x) )-\varGamma (Y_1(y)) \right| \\&\le \left| \int _0 ^1 \left\langle \nabla \varUpsilon (tx + (1-t)y),x-y \right\rangle \mathrm {d}t \right| \\&\quad \;+ \Bigg | \int _0 ^1 \langle \nabla \varUpsilon (tX_1(x) + (1-t)Y_1(y)),\\&\qquad X_1(x)-Y_1(y)\rangle \mathrm {d}t \Bigg | \\&\quad \;+ \left| \int _0 ^1 \left\langle \nabla \varGamma (tx + (1-t)y),x-y \right\rangle \mathrm {d}t \right| \\&\quad \;+ \Bigg |\int _0 ^1 \langle \nabla \varGamma (tX_1(x) + (1-t)Y_1(y)),\\&\qquad X_1(x)-Y_1(y) \rangle \mathrm {d}t \Bigg | \\&\le C_1 \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

$$\begin{aligned} I_2&= (1/2)\vert \left\langle x-X_1(x),\nabla \varUpsilon (x) + \nabla \varUpsilon (X_1(x) ) \right\rangle \\&\quad \;- \left\langle y-Y_1(y),\nabla \varUpsilon (y) + \nabla \varUpsilon (Y_1(y) ) \right\rangle \vert \\&= (1/2) \vert \left\langle x-y,\nabla \varUpsilon (x) + \nabla \varUpsilon (X_1(x) ) \right\rangle \\&\quad \;+ \left\langle Y_1(y)-X_1(x),\nabla \varUpsilon (x) + \nabla \varUpsilon (X_1(x) ) \right\rangle \\&\quad \;- \left\langle y-Y_1(y),\nabla \varUpsilon (y) -\nabla \varUpsilon (x) \right\rangle \\&\quad \;- \left\langle y-Y_1(y),\nabla \varUpsilon (Y_1(y)) - \nabla \varUpsilon (X_1(x) ) \right\rangle \vert \\&\le C_2 \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

Denote $\widehat{h} =h /(8-2h)$. Then,

$$\begin{aligned} I_3&= \widehat{h} \vert \left\langle X_1(x)+x,\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;- \left\langle Y_1(y)+y,\nabla \varUpsilon (Y_1(y)) - \nabla \varUpsilon (y) \right\rangle \vert \\&= \widehat{h} \vert \left\langle X_1(x) -Y_1(y),\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;+ \left\langle x-y,\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;+ \left\langle Y_1(y)+y,\nabla \varUpsilon (X_1(x)) -\nabla \varUpsilon (Y_1(y)) \right\rangle \\&\quad \;+ \left\langle Y_1(y)+y,\nabla \varUpsilon (y) -\nabla \varUpsilon (x) \right\rangle \vert \\&\le C_3 \widehat{h} \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

$$\begin{aligned} I_4&= \widehat{h} \vert \left\| Q^{-1/2}\nabla \varUpsilon (y) \right\| _Q^2 - \left\| Q^{-1/2} \nabla \varUpsilon (x) \right\| _Q^2 \\&\quad \;+ \left\| Q^{-1/2}\nabla \varUpsilon (X_1(x)) \right\| _Q^2 - \left\| Q^{-1/2}\nabla \varUpsilon (Y_1(y)) \right\| _Q^2 \vert \\&= \widehat{h}\vert \left\langle Q^{-1}(\nabla \varUpsilon (y) + \nabla \varUpsilon (x)) , \nabla \varUpsilon (y) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;+ \left\langle Q^{-1} \nabla \varUpsilon (X_1(x)),\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y)) \right\rangle \\&\quad \;+ \left\langle Q^{-1} \nabla \varUpsilon (Y_1(y)),\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y)) \right\rangle \vert \\&\le C_4 \widehat{h} \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

$\square $
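As an illustration of how the individual terms are bounded, consider the first integral in $I_1$. The sketch below assumes that $\left\langle u,v \right\rangle _Q = \left\langle Qu,v \right\rangle $, so that $\left\langle \nabla \varUpsilon (w),x-y \right\rangle = \left\langle Q^{-1}\nabla \varUpsilon (w),x-y \right\rangle _Q$; this convention is not restated in this appendix. Writing $w_t = tx + (1-t)y$, the Cauchy–Schwarz inequality and the second inequality of Lemma 5-(1) give, for all $t \in \left[ 0,1\right] $,

$$\begin{aligned} \left| \left\langle \nabla \varUpsilon (w_t),x-y \right\rangle \right|&\le \left\| Q^{-1}\nabla \varUpsilon (w_t) \right\| _Q \left\| x-y \right\| _Q \\&\le \left( C_\varUpsilon \left\| w_t \right\| _Q + \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q \right) \left\| x-y \right\| _Q \\&\le \left( C_\varUpsilon \{ \left\| x \right\| _Q \vee \left\| y \right\| _Q \} + \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q \right) \left\| x-y \right\| _Q, \end{aligned}$$

where the last line uses the convexity of the norm, $\left\| w_t \right\| _Q \le t\left\| x \right\| _Q + (1-t)\left\| y \right\| _Q$. Integrating over $t$ yields a bound of the announced form for this term; the terms involving $X_1(x)$ and $Y_1(y)$ are handled in the same way with Lemma 6, which is where $\left\| Z_0 \right\| $ enters.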

Proof of Proposition 4

Set $R = \nu ^{-1} \vee R_h$. First, by Lemma 6-(1) and M1,

$$\begin{aligned}&\sup _{x \in \mathrm{B }_Q\left( 0,R\right) } P \fancyscript{V}(x) \nonumber \\ \nonumber&\quad \le \sup _{x \in \mathrm{B }_Q\left( 0,R\right) } \mathbb {E} \left[ 1 \vee \left\| x \right\| _Q \vee \left\| X_1(x) \right\| _Q \right] \nonumber \\&\quad \le 1 \vee \left( R + (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] \right) < +\infty \nonumber \\ \end{aligned}$$

(45)

Next, if $x \not \in \mathrm{B }_Q\left( 0,R\right) $, let $X_1(x)$ be defined by (28), let $U \sim \mathcal {U} (\left[ 0,1\right] )$, and set

$$\begin{aligned} \fancyscript{A}(x)&= \left\{ U \le \alpha _h(x, X_1(x)) \right\} \\ \fancyscript{I}&= \left\{ \widetilde{h}\left\| Z_0 \right\| \le r_h \right\} . \end{aligned}$$

On $\fancyscript{A}(x)$, the proposal is accepted and $\mathbf {X}_1 = X_1(x)$. On its complement, $\mathbf {X}_1 = x$. Then by Lemma 6-(1) and since $\left\| x \right\| _Q \ge \nu \left\| x \right\| _Q \ge 1$,

$$\begin{aligned}&\!\!\!P \fancyscript{V}(x) \nonumber \\&\le \mathbb {E} \left[ \left\| X_1(x) \right\| _Q \mathbb {1}_{\fancyscript{A}(x) \cap \fancyscript{I}} \right] \nonumber \\&+ \mathbb {E} \left[ \left\| x \right\| _Q \mathbb {1}_{\fancyscript{A}(x)^c \cap \fancyscript{I}} \right] + \mathbb {E} \left[ \left\| x \right\| _Q \vee \left\| X_1(x) \right\| _Q \mathbb {1}_{\fancyscript{I}^c} \right] \nonumber \\&\le \nu \left\| x \right\| _Q \mathbb {P}\left[ \fancyscript{A}(x) \cap \fancyscript{I} \right] + \left\| x \right\| _Q \mathbb {P}\left[ \fancyscript{A}(x)^c \cap \fancyscript{I} \right] \nonumber \\&\quad \;+\left\| x \right\| _Q \mathbb {P}\left[ \fancyscript{I}^c \right] + (h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] \nonumber \\&\le \lambda \left\| x \right\| _Q + (h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] \end{aligned}$$

(46)

with $\lambda = \mathbb {P}\left[ \fancyscript{I} \right] (1 - (1-\nu ) \mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] ) + \mathbb {P}\left[ \fancyscript{I}^c \right] $. Since by M2, $\mathbb {P}\left[ \fancyscript{I} \right] $ and $\mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] $ are positive, $\lambda \in \left( 0,1\right) $. In addition, under M1, $(h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] $ is finite. Therefore, the proof is concluded by combining (45) and (46). $\square $
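The claim $\lambda \in \left( 0,1\right) $ can be read off from a one-line rearrangement, given here as a sketch. Using $\mathbb {P}\left[ \fancyscript{I} \right] \mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] = \mathbb {P}\left[ \fancyscript{A}(x) \cap \fancyscript{I} \right] $ and $\mathbb {P}\left[ \fancyscript{I} \right] + \mathbb {P}\left[ \fancyscript{I}^c \right] = 1$,

$$\begin{aligned} \lambda&= \mathbb {P}\left[ \fancyscript{I} \right] - (1-\nu ) \mathbb {P}\left[ \fancyscript{I} \right] \mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] + \mathbb {P}\left[ \fancyscript{I}^c \right] \\&= 1 - (1-\nu ) \mathbb {P}\left[ \fancyscript{A}(x) \cap \fancyscript{I} \right] . \end{aligned}$$

Provided $\nu \in \left( 0,1\right) $ and $\mathbb {P}\left[ \fancyscript{A}(x) \cap \fancyscript{I} \right] > 0$, this gives $\nu \le \lambda < 1$.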

Lemma 8

Let $x,y \in \mathbb {R}^d$, $T ,\epsilon , \eta \in \mathbb {R}^*_+$ and $\psi \in \fancyscript{F}_{x,y}^T$. Denote

$$\begin{aligned} \delta _\epsilon (x,y) {\overset{\mathrm{def}}{=}} \epsilon / (\eta (\left\| x \right\| _Q \vee \left\| y \right\| _Q-\epsilon )\vee 0 + 1). \end{aligned}$$

(47)

(i)
$T \ge \left\| x-y \right\| _Q$.
(ii)
If $ \eta \int _0 ^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T < \epsilon $, then $T \le \delta _\epsilon (x,y) \le \epsilon $.
(iii)
$d_{\eta ,\epsilon }(x,y) \le \epsilon ^{-1} \, \left\| x-y \right\| _Q \left( \eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\} +1 \right) $
(iv)
If $d_{\eta ,\epsilon }(x,y) < 1$, then
$$\begin{aligned} d_{\eta , \epsilon }(x,y)&\ge \epsilon ^{-1} \, \left\| x-y \right\| _Q \\&\left( (\eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\}-\delta _\epsilon (x,y))\vee 0 +1 \right) . \end{aligned}$$
(v)
If $d_{\eta , \epsilon }(x,y) < 1$ then
$$\begin{aligned}&d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \\&\quad \le \left( \left\| x-(h/2)\nabla U(x) - (y - (h/2) \nabla U(y)) \right\| _Q \right. \\&\quad \left\{ \eta \left\| x-(h/2)\nabla U(x) \right\| _Q \vee \left\| y-(h/2)\nabla U(y) \right\| _Q \right. \\&\quad \left. \left. \quad + \eta \widetilde{h}\left\| Z_0 \right\| _Q+1 \right\} \right) \\&\quad \left( \left\| x-y \right\| _Q\left\{ \eta \{\left\| x \right\| \vee \left\| y \right\| - \delta _\epsilon (x,y)\} \vee 0 +1 \right\} \right) ^{-1} \end{aligned}$$
and under M1 for all $h \le 4/(C_\varUpsilon ^2 +1)$,
$$\begin{aligned}&d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \\&\quad \le \left( \nu \left\{ \eta \nu \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\} \right. \right. \\&\quad \left. \left. +\eta (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\eta \left\| Z_0 \right\| +1 \right\} \right) \\&\quad \left( \eta \{\left\| x \right\| \vee \left\| y \right\| - \delta _\epsilon (x,y)\} \vee 0 +1 \right) ^{-1} \;, \end{aligned}$$
with $\nu $ given by (42).

Proof of Lemma 8

(i) By definition of $\fancyscript{F}_{x,y}^T$,

$$\begin{aligned} \left\| x-y \right\| _Q&= \left\| \psi (T) - \psi (0) \right\| _Q \\&\le \int _0^T \left\| \psi '(s) \right\| _Q ds = T. \end{aligned}$$

(ii) First, it is straightforward to see $T < \epsilon $. In addition, for all $s \in \left[ 0,T\right] $,

$$\begin{aligned} \nonumber \left\| \psi (s) \right\| _Q&\ge \left| \left\| x \right\| _Q - s \right| \vee \left| \left\| y \right\| _Q - (T-s) \right| \\&\ge \left| \left\| x \right\| _Q \vee \left\| y \right\| _Q -\epsilon \right| . \end{aligned}$$

(48)

Then the result follows from integrating the inequality between $0$ and $T$ and using the assumption.

(iii) It suffices to consider the particular case, $T = \left\| y-x \right\| _Q$ and $\psi \in \fancyscript{F}^T_{x,y}$ defined for $s \in \left[ 0,T\right] $ by

$$\begin{aligned} \psi (s) = x +(y-x)s / \left\| x-y \right\| _Q. \end{aligned}$$

(iv) Let $\{(T_n,\psi _n) ; n \in \mathbb {N}\}$ be such that

$$\begin{aligned} \lim _{n \rightarrow +\infty } \epsilon ^{-1} \left( \eta \int _0 ^{T_n} \left\| \psi _n(s) \right\| _Q ds + T_n \right)&= d_{\eta ,\epsilon }(x,y) \\ \nonumber \epsilon ^{-1} \left( \eta \int _0 ^{T_n} \left\| \psi _n(s) \right\| _Q ds + T_n \right)&< 1 \quad \forall n \end{aligned}$$

(49)

Then using (i)–(ii) and (48), for all $n$ it holds

$$\begin{aligned}&\epsilon ^{-1} \left( \eta \int _0 ^{T_n} \left\| \psi _n(s) \right\| _Q ds + T_n \right) \\&\quad \ge \epsilon ^{-1} \left\| x-y \right\| _Q\\&\qquad \left( (\eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\}-\delta _\epsilon (x,y))\vee 0 +1 \right) . \end{aligned}$$

The result now comes from (49).

(v) The claimed inequalities come from (iii)–(iv) with the definition of the basic coupling (28) for the first one, and using Lemma 6-(1) for the second. $\square $
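The upper bound (iii) can be illustrated numerically. The sketch below assumes, as suggested by (49) but not restated in this appendix, that $d_{\eta ,\epsilon }(x,y)$ is the minimum of $1$ and the infimum over paths $\psi \in \fancyscript{F}_{x,y}^T$ of $\epsilon ^{-1}(\eta \int _0^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T)$. It evaluates this cost along the straight path used in the proof of (iii), for a diagonal $Q$, and compares it with the right-hand side of (iii). All function names and numerical values are hypothetical.

```python
import numpy as np

def q_norm(v, q):
    """Norm ||v||_Q for a diagonal Q = diag(q) with positive entries."""
    return float(np.sqrt(np.sum(q * v * v)))

def straight_path_cost(x, y, q, eta, eps, n_grid=2000):
    """eps^{-1} (eta * int_0^T ||psi(s)||_Q ds + T) along psi(s) = x + (y - x) s / T."""
    T = q_norm(y - x, q)
    if T == 0.0:
        return 0.0
    s = (np.arange(n_grid) + 0.5) * T / n_grid          # midpoint rule nodes
    path = x + np.outer(s / T, y - x)                   # points psi(s)
    norms = np.sqrt(np.sum(q * path * path, axis=1))    # ||psi(s)||_Q
    integral = T * norms.mean()
    return (eta * integral + T) / eps

def bound_iii(x, y, q, eta, eps):
    """Right-hand side of Lemma 8-(iii)."""
    return q_norm(x - y, q) * (eta * max(q_norm(x, q), q_norm(y, q)) + 1.0) / eps

def delta_eps(x, y, q, eta, eps):
    """The quantity delta_eps(x, y) from (47)."""
    m = max(q_norm(x, q), q_norm(y, q))
    return eps / (max(eta * (m - eps), 0.0) + 1.0)

q = np.array([1.0, 4.0])
x = np.array([1.0, 0.5])
y = np.array([-0.5, 1.0])
eta, eps = 0.1, 2.0

cost = straight_path_cost(x, y, q, eta, eps)
print("straight-path cost:", cost)
print("bound (iii):       ", bound_iii(x, y, q, eta, eps))
print("delta_eps:         ", delta_eps(x, y, q, eta, eps))
```

Because $\left\| \cdot \right\| _Q$ is convex, its value along the segment never exceeds $\left\| x \right\| _Q \vee \left\| y \right\| _Q$, which is why the straight-path cost stays below the bound.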

Lemma 9

Assume M1 and M2.

Then for all $h \in \left( 0, h_\ell \wedge (4/C_U) \right) $, there exist $\epsilon , \eta , \tau >0$ such that for all $x,y \in \mathbb {R}^d$ with $d_{\eta ,\epsilon }(x,y) <1$,

$$\begin{aligned} \mathbf{K}_\mathrm{M}d_{\eta ,\epsilon }(x,y) \le (1- \tau )d_{\eta ,\epsilon }(x,y), \end{aligned}$$

with $d_{\eta ,\epsilon }$ given by (32). In particular, for all $x,y \in \mathbb {R}^d$,

$$\begin{aligned} \mathbf{K}_\mathrm{M}d_{\eta ,\epsilon }(x,y) \le d_{\eta ,\epsilon }(x,y). \end{aligned}$$

Proof

For ease of notation, we simply write $\mathbf {K}$ for $\mathbf{K}_\mathrm{M}$. Let $x,y \in \mathbb {R}^d$, $d_{\eta ,\epsilon }(x,y) <1$. Then, if $\left\| y \right\| _Q \le R_h$,

$$\begin{aligned} \left\| x \right\| _Q \le \left\| x-y \right\| _Q + \left\| y \right\| _Q \le \epsilon + R_h, \end{aligned}$$

where we used Lemma 8-(iv). Therefore, as we will choose $\epsilon $ small enough, we can assume $\epsilon \le 1$, and we end up with two cases: either $x,y \in \mathrm{B }_Q\left( 0, R_h +2\right) $ or $x,y \not \in \mathrm{B }_Q\left( 0,R_h\right) $. Let $(\mathbf {X}_1,\mathbf {Y}_1)$ be the basic coupling between $P(x,\cdot )$ and $P(y,\cdot )$; let $Z_0$ and $U$ be, respectively, the Gaussian variable and the uniform variable used for the basic coupling. Let $\mathbb {P}\left[ \cdot \right] $ and $\mathbb {E} \left[ \cdot \right] $ be the probability and the expectation over $Z_0$ and $U$. Set

$$\begin{aligned} \fancyscript{I}&= \left\{ \widetilde{h}\left\| Z_0 \right\| \le r_h \right\} \\ \fancyscript{A}(x,y)&= \left\{ \alpha _h(x, X_1(x)) \wedge \alpha _h(y, Y_1(y)) > U \right\} \\ \fancyscript{R}(x,y)&= \left\{ \alpha _h(x, X_1(x)) \vee \alpha _h(y, Y_1(y)) < U \right\} . \end{aligned}$$

On the event $\fancyscript{A}(x,y)$, the moves are both accepted so that $\mathbf {X}_1 = X_1(x)$ and $\mathbf {Y}_1 = Y_1(y)$; on the event $\fancyscript{R}(x,y)$, the moves are both rejected so that $\mathbf {X}_1 =x$ and $\mathbf {Y}_1 =y$. Then, for any event $\fancyscript{I}$, it holds that

$$\begin{aligned} \nonumber&\!\!\!\mathbf {K}d_{\eta ,\epsilon }(x,y) \le \mathbb {E} \left[ d_{\eta ,\epsilon }(\mathbf {X}_1, \mathbf {Y}_1) \right] \\ \nonumber&\le \mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \\ \nonumber&\quad + \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{R}(x,y) \cap \fancyscript{I}} \right] \\ \nonumber&\quad + \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y)) \mathbb {1}_{\fancyscript{I}^c} \right] \\&\quad + \mathbb {E} \left[ \left| \alpha _h(x,X_1(x) ) - \alpha _h(y,Y_1(y)) \right| \right] , \end{aligned}$$

(50)

where we have used that $d_{\eta ,\epsilon }$ is bounded by $1$. First, by Lemma 8-(v), since $\delta _\epsilon (x,y) \le \epsilon \le 1$, there exist $\eta _1,\tau _1 >0$ such that for all $\eta < \eta _1$

$$\begin{aligned}&\!\!\!\mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \\&\quad \le (1-\tau _1) d_{\eta ,\epsilon }(x,y) \mathbb {P}\left[ \fancyscript{A}(x,y) \cap \fancyscript{I} \right] . \end{aligned}$$

Let us define $\tau _2 = (1-(1-\mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right] ) \tau _1)$. We claim that $\tau _2 >0$. Indeed, if $x,y \in \mathrm{B }_Q\left( 0,R_h + 2\right) $, since $\varUpsilon $ and $Q^{-1}\nabla \varUpsilon $ are continuous by assumption, we have by Lemma 6-(1)

$$\begin{aligned} \mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right]&\ge \inf \left\{ \exp (-G(t,z)^+) ; t \in \mathrm{B }_Q\left( 0,R_h +2\right) , \right. \\&\quad \left. z \in \mathrm{B }_Q\left( 0,C_\varUpsilon ((R_h+2)\nu +(h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q +r_h) \right) \right\} > 0. \end{aligned}$$

If $x,y \not \in \mathrm{B }_Q\left( 0,R_h\right) $, by M2, $\mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right] > a_l >0$. Therefore, as $\mathbb {P}\left[ \fancyscript{R}(x,y) \cap \fancyscript{I} \right] \le \mathbb {P}\left[ \fancyscript{I} \right] - \mathbb {P}\left[ \fancyscript{I}\cap \fancyscript{A}(x,y) \right] $

$$\begin{aligned}&\mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \nonumber \\&\quad + \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{R}(x,y) \cap \fancyscript{I}} \right] \le \mathbb {P}\left[ \fancyscript{I} \right] (1-\tau _2) d_{\eta ,\epsilon }(x,y).\nonumber \\ \end{aligned}$$

(51)

By Lemma 8-(ii)–(v), since $\delta _\epsilon (x,y) \le \epsilon \le 1$ and for all $a,b,c \in \mathbb {R}_+$, $(a+b)/(a+c) \le 1 \vee (b/c)$,

$$\begin{aligned} d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \le \nu \left\{ 2 \eta (1- \nu ) +\eta (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\eta \left\| Z_0 \right\| +1 \right\} . \end{aligned}$$

Also, by the Dominated Convergence theorem and Lemma 8-(v), for all $\kappa >0$ there exists $\eta _2 \in \left( 0,\eta _1\right] $ such that for all $\eta \in \left( 0, \eta _2\right] $,

$$\begin{aligned} \nonumber&\mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y)) \mathbb {1}_{\fancyscript{I}^c} \right] \\ \nonumber&\quad \le d_{\eta ,\epsilon }(x,y) \mathbb {E} \left[ 1 \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y))/d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{I}^c} \right] \\&\quad \le d_{\eta ,\epsilon }(x,y) ( 1+ \kappa ) \mathbb {P}\left[ \fancyscript{I}^c \right] . \end{aligned}$$

(52)

Therefore, since $\mathbb {P}\left[ \fancyscript{I} \right] >0$, using (51)–(52) and choosing $\kappa $ small enough, there exist $\tau _3, \eta _2 >0$ such that for all $\eta \in \left( 0,\eta _2\right] $

$$\begin{aligned}&\mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \nonumber \\&\quad +\, \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{R}(x,y) \cap \fancyscript{I}} \right] \nonumber \\&\quad +\, \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y)) \mathbb {1}_{\fancyscript{I}^c} \right] \nonumber \\&\qquad \le (1-\tau _3) d_{\eta ,\epsilon }(x,y). \end{aligned}$$

(53)

Next, by Lemma 7 and Lemma 8-(iv), since $\delta _\epsilon (x,y) \le 1$, there exists $C$ such that

$$\begin{aligned} \nonumber&\mathbb {E} \left[ \left| \alpha _h(x,X_1(x) ) - \alpha _h(y,Y_1(y)) \right| \right] \\ \nonumber&\quad \le C \left\| x-y \right\| _Q(\left\| x \right\| _Q \vee \left\| y \right\| _Q \\ \nonumber&\qquad -\delta _\epsilon (x,y) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] )\\&\quad \le C ((1/\eta ) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q\nonumber \\&\qquad + \mathbb {E} \left[ \left\| Z_0 \right\| \right] ) \epsilon d_{\eta ,\epsilon }(x,y). \end{aligned}$$

(54)

Set

$$\begin{aligned} \epsilon _1 \overset{\text {def}}{=}\tau _3/(2 C ((1/\eta _2) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] )). \end{aligned}$$

(55)

Therefore by (50)–(53) and (54), for $\eta \leftarrow \eta _2$ and $\epsilon \leftarrow \epsilon _1$, for all $x,y \in \mathbb {R}^d$,

$$\begin{aligned} \mathbf {K}d_{\eta ,\epsilon }(x,y) \le (1-\tau _3/2) d_{\eta ,\epsilon }(x,y). \end{aligned}$$

(56)

$\square $
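The basic coupling used in this proof, in which both chains share the same Gaussian variable and the same uniform variable, can be sketched in code. The example below is a simplified stand-in and not the algorithm analysed in the paper: it uses plain MALA in the Euclidean geometry with a standard Gaussian target, because the exponential-integrator proposal and the acceptance ratio $\alpha _h$ are not restated in this appendix. All function names are hypothetical.

```python
import numpy as np

def potential(x):
    # Hypothetical target: standard Gaussian, U(x) = |x|^2 / 2 (an assumption).
    return 0.5 * float(np.dot(x, x))

def grad_potential(x):
    return x

def log_proposal(x, y, h):
    # log q(x -> y), up to a constant, for y ~ N(x - (h/2) grad U(x), h I).
    diff = y - x + 0.5 * h * grad_potential(x)
    return -float(np.dot(diff, diff)) / (2.0 * h)

def accept_prob(x, prop, h):
    # Metropolis-Hastings acceptance probability for plain MALA.
    log_ratio = (potential(x) - potential(prop)
                 + log_proposal(prop, x, h) - log_proposal(x, prop, h))
    return float(np.exp(min(0.0, log_ratio)))

def coupled_step(x, y, h, rng):
    """One step of the coupling: common Gaussian z and common uniform u."""
    z = rng.normal(size=x.shape)
    u = rng.uniform()
    prop_x = x - 0.5 * h * grad_potential(x) + np.sqrt(h) * z
    prop_y = y - 0.5 * h * grad_potential(y) + np.sqrt(h) * z
    acc_x = u <= accept_prob(x, prop_x, h)
    acc_y = u <= accept_prob(y, prop_y, h)
    x_new = prop_x if acc_x else x
    y_new = prop_y if acc_y else y
    # Flags for the "both accepted" and "both rejected" events.
    return x_new, y_new, (acc_x and acc_y), (not acc_x and not acc_y)

rng = np.random.default_rng(1)
x, y = np.array([3.0, -2.0]), np.array([-1.0, 4.0])
h = 0.5
both_accept = both_reject = 0
for _ in range(200):
    x, y, a, r = coupled_step(x, y, h, rng)
    both_accept += a
    both_reject += r

print("both accepted:", both_accept, "both rejected:", both_reject)
print("final distance:", float(np.linalg.norm(x - y)))
```

The two flags correspond to the events $\fancyscript{A}(x,y)$ and $\fancyscript{R}(x,y)$; the remaining steps, where exactly one move is accepted, are the ones controlled in (50) through the term $\left| \alpha _h(x,X_1(x) ) - \alpha _h(y,Y_1(y)) \right| $.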

Proof of Proposition 5

For ease of notation, we simply write $\mathbf {K}$ for $\mathbf{K}_\mathrm{M}$. Let $\{(\mathbf{X}_n,\,\mathbf{Y}_n), n\in \mathbb {N}\}$ be a Markov chain with Markov kernel $\mathbf {K}$. For all $n \in \mathbb {N}^*$, we denote by $Z_n$ and $U_n$, respectively, the common Gaussian variable and the common uniform variable sampled to build $(\mathbf {X}_n,\mathbf {Y}_n)$. Let $\mathbb {P}\left[ \cdot \right] $ and $\mathbb {E} \left[ \cdot \right] $ be the probability and the expectation over $\left\{ Z_n , U_n ; \;n \in \mathbb {N} \right\} $. Note that by definition the variables $\left\{ Z_n , U_n ; \;n \in \mathbb {N} \right\} $ are independent. Under M1 and M2, by definition of $Q$ and Lemma 9, conditions (i) and (ii) of Definition 1 are satisfied. In addition, there exist $\epsilon ,\eta , \tau >0$ such that for all $x,y \in \mathbb {R}^d$ with $d_{\eta ,\epsilon }(x,y) <1$,

$$\begin{aligned} \mathbf {K}d_{\eta ,\epsilon }(x,y) \le (1-\tau ) d_{\eta ,\epsilon }(x,y) \end{aligned}$$

(57)

Let $R >0 $, and $x,y$ be in $\mathrm{B }_Q\left( 0,R\right) $. Assume first $d_{\eta ,\epsilon }(x,y) < 1$. Then by (57) and Lemma 2, for every $n \in \mathbb {N}^*$,

$$\begin{aligned}&\mathbf {K}^nd_{\eta ,\epsilon }(x,y) \le \mathbf {K}^{n-1}d_{\eta ,\epsilon }(x,y) \nonumber \\&\quad \le \cdots \le (1-\tau ) d_{\eta ,\epsilon }(x,y). \end{aligned}$$

(58)

Consider now the case $d_{\eta ,\epsilon }(x,y) = 1$. Let $\{(\mathbf{X}_n,\,\mathbf{Y}_n), n\in \mathbb {N}\}$ be the Markov chain with Markov kernel $\mathbf {K}$ starting in $(x,y)$. Let $n \in \mathbb {N}^*$ and denote for all $1 \le i \le n$

$$\begin{aligned}&\varPsi (\mathbf {X}_{i-1},\mathbf {Y}_{i-1},Z_i)\\&\quad = \alpha _h(\mathbf {X}_{i-1}, \fancyscript{O}(\mathbf {X}_{i-1},Z_i)) \wedge \alpha _h(\mathbf {Y}_{i-1}, \fancyscript{O}(\mathbf {Y}_{i-1},Z_i)) \\&\fancyscript{I}_i(n) = \left\{ \left\| (h/2)Q^{-1}\nabla \varUpsilon (0) + \widetilde{h}Q^{-1/2}Z_i \right\| _Q \le R/ n \right\} \\&\fancyscript{A}_i(x,y)= \left\{ U_i \le \varPsi (\mathbf {X}_{i-1},\mathbf {Y}_{i-1},Z_i) \right\} \\&\widetilde{\fancyscript{A}}^i(x,y,n) = \bigcap _{1 \le j \le i} \left( \fancyscript{I}_j(n) \cap \fancyscript{A}_j(x,y) \right) , \end{aligned}$$

where $\fancyscript{O}$ is given by (26). On the set $\widetilde{\fancyscript{A}}^i(x,y,i)$, for all $1 \le j \le i$, $\mathbf {X}_j= \fancyscript{O}(\mathbf {X}_{j-1},Z_j)$, $\mathbf {Y}_j = \fancyscript{O}(\mathbf {Y}_{j-1},Z_j) $ and $ \left\| \mathbf {X}_j \right\| _Q \vee \left\| \mathbf {Y}_j \right\| _Q \le 2R $. Then, since by Lemma 8-(iii),

$$\begin{aligned}&d_{\eta ,\epsilon }(\mathbf {X}_n,\mathbf {Y}_n) \le \epsilon ^{-1}\left\| \mathbf {X}_n-\mathbf {Y}_n \right\| _Q \\&\quad \times \left( \eta \left\{ \left\| \mathbf {X}_n \right\| _Q\vee \left\| \mathbf {Y}_n \right\| _Q \right\} + 1 \right) , \end{aligned}$$

by Lemma 6-(1) on $\widetilde{\fancyscript{A}}^n(x,y,n)$ it holds

$$\begin{aligned} d_{\eta ,\epsilon }(\mathbf {X}_n,\mathbf {Y}_n) \le \epsilon ^{-1} \nu ^n \left\| x-y \right\| _Q( 2 \eta R + 1). \end{aligned}$$

This inequality and $d_{\eta ,\epsilon }\le 1$ yield

$$\begin{aligned} \nonumber&\mathbf {K}^n d_{\eta ,\epsilon }(x,y) \\&\quad = \widetilde{\mathbb {E}}_{x,y} \left[ d_{\eta ,\epsilon }(\mathbf {X}_n,\mathbf {Y}_n)( \mathbb {1}_{\widetilde{\fancyscript{A}}^n(x,y,n)} + \mathbb {1}_{(\widetilde{\fancyscript{A}}^n(x,y,n))^c}) \right] \end{aligned}$$

(59)

$$\begin{aligned} \nonumber&\quad \le \epsilon ^{-1} \nu ^n \left\| x-y \right\| _Q( 2 \eta R + 1) \mathbb {P}\left[ \widetilde{\fancyscript{A}}^n(x,y,n) \right] \nonumber \\&\qquad +\mathbb {P}\left[ (\widetilde{\fancyscript{A}}^n(x,y,n))^c \right] \nonumber \\ \nonumber&\quad \le \epsilon ^{-1} \nu ^n 2 R ( 2 \eta R + 1) \mathbb {P}\left[ \widetilde{\fancyscript{A}}^n(x,y,n) \right] \nonumber \\&\qquad + \mathbb {P}\left[ (\widetilde{\fancyscript{A}}^n(x,y,n))^c \right] \nonumber \\&\quad \le 1+ \left( \epsilon ^{-1} \nu ^n 2 R ( 2 \eta R \!+\! 1) \!-\!1 \right) \, \mathbb {P}\left[ \widetilde{\fancyscript{A}}^n(x,y,n) \right] . \end{aligned}$$

(60)

As $\nu \in \left( 0,1\right) $, there exists $m$ such that, $\epsilon ^{-1} \nu ^m 2 R (2 \eta R + 1) < 1$. It remains to lower bound $\mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] $ by a positive constant to conclude, which is done by the following inequalities, where we use the independence of the random variables $\left\{ Z_i , U_i ; \;i \in \mathbb {N}^* \right\} $.

$$\begin{aligned}&\mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] = \mathbb {P}\left[ \widetilde{\fancyscript{A}}^{m-1}(x,y,m) \cap \fancyscript{I}_m(m) \right] \\&\quad \times \widetilde{\mathbb {E}}_{x,y} \left[ \varPsi (\mathbf {X}_{m-1}, \mathbf {Y}_{m-1}, Z_m) \left| \widetilde{\fancyscript{A}}^{m-1}(x,y,m) \cap \fancyscript{I}_m(m) \right. \right] . \end{aligned}$$

For all $1 \le i \le m$, on the event $ \bigcap _{j \le i} \fancyscript{I}_j(m)$, it holds

$$\begin{aligned}&\varPsi (\mathbf {X}_{i-1},\mathbf {Y}_{i-1},Z_i) \\&\quad \ge \exp \left( -\sup _{(z,t) \in \mathrm{B }_Q\left( 0,2R\right) } G(z,t)^+ \right) = \delta , \end{aligned}$$

where $\delta \in (0,1)$, since $G$ is continuous by M1. Therefore, since $Z_i$ is independent of $\widetilde{\fancyscript{A}}^{i-1}(x,y,m)$, we have

$$\begin{aligned} \mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] \ge \delta \ \mathbb {P}\left[ \widetilde{\fancyscript{A}}^{m-1}(x,y,m) \right] \ \mathbb {P}\left[ \fancyscript{I}_m(m) \right] . \end{aligned}$$

An immediate induction leads to

$$\begin{aligned} \mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right]&\ge \mathbb {P}\left[ \fancyscript{I}_1(m) \right] ^m \delta ^m. \end{aligned}$$

Plugging this result into (60) and using (58), we obtain that there exists $s \in \left( 0,1\right) $ such that for all $x,y \in \mathrm{B }_Q\left( 0,R\right) $, $\mathbf {K}^m d_{\eta ,\epsilon }(x,y) \le s d_{\eta ,\epsilon }(x,y)$. $\square $
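The constants in the last step are explicit, which a small computation can make concrete. The sketch below picks the smallest $m$ with $\epsilon ^{-1} \nu ^m 2 R (2 \eta R + 1) < 1$ and then evaluates the bound obtained by inserting $\mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] \ge \mathbb {P}\left[ \fancyscript{I}_1(m) \right] ^m \delta ^m$ into (60), for the case $d_{\eta ,\epsilon }(x,y) = 1$. All numerical values are hypothetical and are not taken from the paper; in particular `p_small_noise` stands in for $\mathbb {P}\left[ \fancyscript{I}_1(m) \right] $, which in reality depends on $m$.

```python
import math

def smallest_m(eps, eta, R, nu):
    """Smallest integer m >= 1 with eps^{-1} * nu^m * 2R(2 eta R + 1) < 1."""
    const = 2.0 * R * (2.0 * eta * R + 1.0) / eps
    m = 1
    if const >= 1.0:
        # nu^m * const < 1  <=>  m > log(const) / (-log(nu)), for nu in (0, 1).
        m = max(1, math.floor(math.log(const) / -math.log(nu)) + 1)
    while const * nu**m >= 1.0:     # guard against floating-point rounding
        m += 1
    return m

def contraction_factor(eps, eta, R, nu, p_small_noise, delta):
    """Return (m, s), with s the right-hand side of (60) after the lower bound."""
    m = smallest_m(eps, eta, R, nu)
    const = 2.0 * R * (2.0 * eta * R + 1.0) / eps
    s = 1.0 + (const * nu**m - 1.0) * (p_small_noise * delta) ** m
    return m, s

m, s = contraction_factor(eps=0.1, eta=0.5, R=2.0, nu=0.9,
                          p_small_noise=0.9, delta=0.8)
print("m =", m, " s =", s)
```

With values like these, $s$ is strictly below $1$ but extremely close to it, which illustrates that the argument is quantitative without being tight.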


About this article

Cite this article

Durmus, A., Moulines, É. Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm. Stat Comput 25, 5–19 (2015). https://doi.org/10.1007/s11222-014-9511-z


Received: 15 March 2014
Accepted: 28 August 2014
Published: 20 September 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s11222-014-9511-z
