Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm

Statistics and Computing

Abstract

In this paper, we establish explicit convergence rates for Markov chains in Wasserstein distance. Compared to the more classical total variation bounds, the proposed rate of convergence leads to useful insights for the analysis of MCMC algorithms, and suggests ways to construct samplers with a good mixing rate even if the dimension of the underlying sampling space is large. We illustrate these results by analyzing the Exponential Integrator version of the Metropolis Adjusted Langevin Algorithm, using a Bayesian linear inverse problem as a test case.

Notes

  1. The underlying problem is typically infinite-dimensional; it becomes finite-dimensional after truncation.

Author information

Corresponding author

Correspondence to Alain Durmus.

A Proofs

Lemma 5

Assume M1.

  1. For all \(x,y,z \in \mathbb {R}^d\),

    $$\begin{aligned} \left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),z \right\rangle&\le C_\varUpsilon \left\| x-y \right\| _Q\left\| z \right\| _Q \\ \left\| Q^{-1}\nabla \varUpsilon (x) \right\| _Q&\le C_\varUpsilon \left\| x \right\| _Q + \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q \\ \left\langle \nabla \varUpsilon (z), x-y \right\rangle&\le C_\varUpsilon \left( C_\varUpsilon \left\| z \right\| _Q + \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q \right) \left\| x-y \right\| _Q. \end{aligned}$$
  2. For all \(x,y \in \mathbb {R}^d\) and \(h \le 4/(C_\varUpsilon ^2 +1)\),

    $$\begin{aligned}&\left\| x-(h/2)Q^{-1}\nabla U(x) - \{y - (h/2) Q^{-1} \nabla U (y) \} \right\| _Q \nonumber \\&\quad \le \nu \left\| x-y \right\| _Q, \end{aligned}$$
    (41)

    where

    $$\begin{aligned} \nu = \left( 1- h (1- h(1+C_\varUpsilon ^2)/4) \right) ^{1/2}. \end{aligned}$$
    (42)

    In particular,

    $$\begin{aligned}&\left\| x - (h/2) Q^{-1}\nabla U(x) \right\| _Q\nonumber \\&\quad \le \nu \left\| x \right\| _Q + (h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q. \end{aligned}$$
    (43)

Proof

  1. (1)

    is just a consequence of the definition of \(\left\langle \cdot ,\cdot \right\rangle _Q\), M1, the Cauchy–Schwarz inequality and the triangle inequality.

  2. (2)

    Let \(h \le 4/(C_\varUpsilon ^2+1)\). Under M1, since \(\varUpsilon \) is convex and \(C^1\), for all \(x,y \in \mathbb {R}^d\),

    $$\begin{aligned}&\left\| x-y-(h/2)Q^{-1}(\nabla U(x) - \nabla U(y)) \right\| _Q^2 \\&\quad = (1-h/2)^2\left\| x-y \right\| _Q^2 \\&\qquad +(h^2/4)\left\| Q^{-1}( \nabla \varUpsilon (x) - \nabla \varUpsilon (y)) \right\| _Q^2 \\&\qquad - h (1-h/2)\left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),x-y \right\rangle \\&\quad \le \left( 1- h(1 - h (C_\varUpsilon ^2+1) /4) \right) \left\| x-y \right\| _Q ^2, \end{aligned}$$

    showing (41). Equation (43) follows from (41) and the triangle inequality. \(\square \)
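The last inequality in the display above compresses two estimates. A sketch of the intermediate step, assuming (as the expansion itself suggests) that M1 provides the Lipschitz bound \(\left\| Q^{-1}(\nabla \varUpsilon (x) - \nabla \varUpsilon (y)) \right\| _Q \le C_\varUpsilon \left\| x-y \right\| _Q\), and assuming in addition \(h \le 2\) so that \(1-h/2 \ge 0\): convexity of \(\varUpsilon \) gives \(\left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),x-y \right\rangle \ge 0\), so the cross term can be dropped, and then

$$\begin{aligned}&(1-h/2)^2\left\| x-y \right\| _Q^2 +(h^2/4)\left\| Q^{-1}( \nabla \varUpsilon (x) - \nabla \varUpsilon (y)) \right\| _Q^2 \\&\quad \le \left\{ (1-h/2)^2 + h^2 C_\varUpsilon ^2/4 \right\} \left\| x-y \right\| _Q^2 \\&\quad = \left( 1 - h + h^2 (1+C_\varUpsilon ^2)/4 \right) \left\| x-y \right\| _Q^2 = \nu ^2 \left\| x-y \right\| _Q^2 , \end{aligned}$$

which is (41) after taking square roots, with \(\nu \) as in (42).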

Lemma 6

Assume M1.

  1. For all \(x,y \in \mathbb {R}^d\),

    $$\begin{aligned} \left\| X_1(x) - Y_1(y) \right\| _Q&\le \nu \left\| x-y \right\| _Q \\ \left\| X_1(x) \right\| _Q&\le \nu \left\| x \right\| _Q +(h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q \\&+ \widetilde{h}\left\| Q^{-1/2}Z_0 \right\| _Q, \end{aligned}$$

    where \((X_1(x), Y_1(y))\) and \(\nu \) are given by (28) and (42), respectively.

  2. For all \(x,y \in \mathbb {R}^d\),

    $$\begin{aligned}&\left\| Q^{-1}(\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y))) \right\| _Q \le C_\varUpsilon \nu \left\| x-y \right\| _Q \\&\quad \left\| Q^{-1}\nabla \varUpsilon (X_1(x)) \right\| _Q \\&\quad \le C_\varUpsilon \left( \nu \left\| x \right\| _Q +(h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\left\| Z_0 \right\| \right) \!. \end{aligned}$$

Proof

  1. (1)

    is just a consequence of Lemma 5-(2).

  2. (2)

    follows from M1 and Lemma 5-(1). \(\square \)
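The contraction in Lemma 6-(1) can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions, not the authors' code: it assumes a diagonal \(Q\) with \(\left\| v \right\| _Q^2 = \sum _i q_i v_i^2\), assumes \(\nabla U(x) = Qx + \nabla \varUpsilon (x)\) and a proposal of the form \(X_1(x) = x - (h/2)Q^{-1}\nabla U(x) + \widetilde{h} Q^{-1/2} Z_0\) (the form suggested by the bounds of Lemma 6, since (28) is not reproduced here), and uses the hypothetical choice \(\varUpsilon (x) = c \sum _i q_i \log \cosh (x_i)\), which is convex with \(C_\varUpsilon = c\). The value of `h_tilde` is a free parameter of the sketch.

```python
import math
import random


def q_norm(v, q):
    """Norm ||v||_Q = sqrt(sum_i q_i v_i^2) for a diagonal Q = diag(q)."""
    return math.sqrt(sum(qi * vi * vi for qi, vi in zip(q, v)))


def drift(x, h, c):
    """x - (h/2) Q^{-1} grad U(x), where Q^{-1} grad U(x) = x + c * tanh(x)
    for the hypothetical Upsilon(x) = c * sum_i q_i * log(cosh(x_i))."""
    return [xi - 0.5 * h * (xi + c * math.tanh(xi)) for xi in x]


def coupled_proposals(x, y, q, h, h_tilde, c, rng):
    """Move x and y with the same Gaussian draw Z_0 (assumed coupling form)."""
    z = [rng.gauss(0.0, 1.0) for _ in q]
    # h_tilde * Q^{-1/2} Z_0 for diagonal Q
    noise = [h_tilde * zi / math.sqrt(qi) for zi, qi in zip(z, q)]
    x1 = [a + n for a, n in zip(drift(x, h, c), noise)]
    y1 = [a + n for a, n in zip(drift(y, h, c), noise)]
    return x1, y1


def nu(h, c):
    """Contraction factor (42) with C_Upsilon = c."""
    return math.sqrt(1.0 - h * (1.0 - h * (1.0 + c * c) / 4.0))


def max_contraction_ratio(trials=1000, seed=0):
    """Largest observed ||X_1(x) - Y_1(y)||_Q / ||x - y||_Q, and nu."""
    rng = random.Random(seed)
    q = [1.0, 4.0, 9.0]          # hypothetical diagonal of Q
    c, h, h_tilde = 0.5, 0.8, 0.3  # hypothetical constants, h <= 4/(c^2+1)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in q]
        y = [rng.uniform(-5.0, 5.0) for _ in q]
        x1, y1 = coupled_proposals(x, y, q, h, h_tilde, c, rng)
        before = q_norm([a - b for a, b in zip(x, y)], q)
        after = q_norm([a - b for a, b in zip(x1, y1)], q)
        worst = max(worst, after / before)
    return worst, nu(h, c)
```

Because both chains share the same noise, the noise cancels in the difference, which is why the observed ratio depends only on the deterministic part of the move.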

Lemma 7

Assume M1. There exists \(C\) such that

$$\begin{aligned}&\left| G_h(x, X_1(x) ) - G_h(y,Y_1(y)) \right| \le C \left\| x-y \right\| _Q \nonumber \\&\quad \times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$
(44)

Proof

Let us write

$$\begin{aligned} \left| G_h(x, X_1(x) ) - G_h(y,Y_1(y)) \right| \le \sum _{i=1}^4 I_i. \end{aligned}$$

Using M1, Lemma 5 and Lemma 6, we have the following inequalities for \(I_i\), \(i=1, \ldots , 4\):

$$\begin{aligned} I_1&= \left| \varUpsilon (x) -\varUpsilon (y) \right| + \left| \varUpsilon (X_1(x) )-\varUpsilon (Y_1(y)) \right| \\&\quad \;+ \left| \varGamma (x) -\varGamma (y) \right| + \left| \varGamma (X_1(x) )-\varGamma (Y_1(y)) \right| \\&\le \left| \int _0 ^1 \left\langle \nabla \varUpsilon (tx + (1-t)y),x-y \right\rangle \mathrm {d}t \right| \\&\quad \;+ \Bigg | \int _0 ^1 \langle \nabla \varUpsilon (tX_1(x) + (1-t)Y_1(y)),\\&\qquad X_1(x)-Y_1(y)\rangle \mathrm {d}t \Bigg | \\&\quad \;+ \left| \int _0 ^1 \left\langle \nabla \varGamma (tx + (1-t)y),x-y \right\rangle \mathrm {d}t \right| \\&\quad \;+ \Bigg |\int _0 ^1 \langle \nabla \varGamma (tX_1(x) + (1-t)Y_1(y)),\\&\qquad X_1(x)-Y_1(y)\rangle \mathrm {d}t \Bigg | \\&\le C_1 \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$
$$\begin{aligned} I_2&= (1/2)\vert \left\langle x-X_1(x),\nabla \varUpsilon (x) + \nabla \varUpsilon (X_1(x) ) \right\rangle \\&\quad \;- \left\langle y-Y_1(y),\nabla \varUpsilon (y) + \nabla \varUpsilon (Y_1(y) ) \right\rangle \vert \\&= (1/2) \vert \left\langle x-y,\nabla \varUpsilon (x) + \nabla \varUpsilon (X_1(x) ) \right\rangle \\&\quad \;+ \left\langle Y_1(y)-X_1(x),\nabla \varUpsilon (x) + \nabla \varUpsilon (X_1(x) ) \right\rangle \\&\quad \;- \left\langle y-Y_1(y),\nabla \varUpsilon (y) -\nabla \varUpsilon (x) \right\rangle \\&\quad \;- \left\langle y-Y_1(y),\nabla \varUpsilon (Y_1(y)) - \nabla \varUpsilon (X_1(x) ) \right\rangle \vert \\&\le C_2 \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

Denote \(\widehat{h} =h /(8-2h)\). Then,

$$\begin{aligned} I_3&= \widehat{h} \vert \left\langle X_1(x)+x,\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;- \left\langle Y_1(y)+y,\nabla \varUpsilon (Y_1(y)) - \nabla \varUpsilon (y) \right\rangle \vert \\&= \widehat{h} \vert \left\langle X_1(x) -Y_1(y),\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;+ \left\langle x-y,\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;+ \left\langle Y_1(y)+y,\nabla \varUpsilon (X_1(x)) -\nabla \varUpsilon (Y_1(y)) \right\rangle \\&\quad \;+ \left\langle Y_1(y)+y,\nabla \varUpsilon (y) -\nabla \varUpsilon (x) \right\rangle \vert \\&\le C_3 \widehat{h} \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$
$$\begin{aligned} I_4&= \widehat{h} \vert \left\| Q^{-1/2}\nabla \varUpsilon (y) \right\| _Q^2 - \left\| Q^{-1/2} \nabla \varUpsilon (x) \right\| _Q^2 \\&\quad \;+ \left\| Q^{-1/2}\nabla \varUpsilon (X_1(x)) \right\| _Q^2 - \left\| Q^{-1/2}\nabla \varUpsilon (Y_1(y)) \right\| _Q^2 \vert \\&= \widehat{h}\vert \left\langle Q^{-1}(\nabla \varUpsilon (y) + \nabla \varUpsilon (x)) , \nabla \varUpsilon (y) - \nabla \varUpsilon (x) \right\rangle \\&\quad \;+ \left\langle Q^{-1} \nabla \varUpsilon (X_1(x)),\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y)) \right\rangle \\&\quad \;+ \left\langle Q^{-1} \nabla \varUpsilon (Y_1(y)),\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y)) \right\rangle \vert \\&\le C_4 \widehat{h} \left\| x-y \right\| _Q \\&\quad \;\times \left\{ \left\| x \right\| _Q\vee \left\| y \right\| _Q +\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \left\| Z_0 \right\| \right\} \!. \end{aligned}$$

\(\square \)

Proof of Proposition 4

Set \(R = \nu ^{-1} \vee R_h\). First, by Lemma 6-(1) and M1,

$$\begin{aligned}&\sup _{x \in \mathrm{B }_Q\left( 0,R\right) } P \fancyscript{V}(x) \nonumber \\ \nonumber&\quad \le \sup _{x \in \mathrm{B }_Q\left( 0,R\right) } \mathbb {E} \left[ 1 \vee \left\| x \right\| _Q \vee \left\| X_1(x) \right\| _Q \right] \nonumber \\&\quad \le 1 \vee \left( R + (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] \right) < +\infty \nonumber \\ \end{aligned}$$
(45)

Next, if \(x \not \in \mathrm{B }_Q\left( 0,R\right) \), let \(X_1(x)\) be defined by (28), let \(U \sim \mathcal {U} (\left[ 0,1\right] )\), and set

$$\begin{aligned} \fancyscript{A}(x)&= \left\{ U \le \alpha _h(x, X_1(x)) \right\} \\ \fancyscript{I}&= \left\{ \widetilde{h}\left\| Z_0 \right\| \le r_h \right\} . \end{aligned}$$

On \(\fancyscript{A}(x)\), the proposal is accepted and \(\mathbf {X}_1 = X_1(x)\). On its complement, \(\mathbf {X}_1 = x\). Then, by Lemma 6-(1) and since \(\left\| x \right\| _Q \ge \nu \left\| x \right\| _Q \ge 1\),

$$\begin{aligned}&\!\!\!P \fancyscript{V}(x) \nonumber \\&\le \mathbb {E} \left[ \left\| X_1(x) \right\| _Q \mathbb {1}_{\fancyscript{A}(x) \cap \fancyscript{I}} \right] \nonumber \\&+ \mathbb {E} \left[ \left\| x \right\| _Q \mathbb {1}_{\fancyscript{A}(x)^c \cap \fancyscript{I}} \right] + \mathbb {E} \left[ \left\| x \right\| _Q \vee \left\| X_1(x) \right\| _Q \mathbb {1}_{\fancyscript{I}^c} \right] \nonumber \\&\le \nu \left\| x \right\| _Q \mathbb {P}\left[ \fancyscript{A}(x) \cap \fancyscript{I} \right] + \left\| x \right\| _Q \mathbb {P}\left[ \fancyscript{A}(x)^c \cap \fancyscript{I} \right] \nonumber \\&\quad \;+\left\| x \right\| _Q \mathbb {P}\left[ \fancyscript{I}^c \right] + (h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] \nonumber \\&\le \lambda \left\| x \right\| _Q + (h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] \end{aligned}$$
(46)

with \(\lambda = \mathbb {P}\left[ \fancyscript{I} \right] (1 - (1-\nu ) \mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] ) + \mathbb {P}\left[ \fancyscript{I}^c \right] \). Since by M2, \(\mathbb {P}\left[ \fancyscript{I} \right] \) and \(\mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] \) are positive, \(\lambda \in \left( 0,1\right) \). In addition, under M1, \((h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] \) is finite. Therefore, the proof is concluded by combining (45) and (46). \(\square \)
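To make the role of each probability in \(\lambda \) concrete, here is a minimal sketch that evaluates the formula above. The function name `drift_rate` and the numerical inputs are hypothetical; the paper does not compute these probabilities explicitly, and in practice they depend on \(x\), \(h\) and the target.

```python
def drift_rate(p_i, p_a_given_i, nu):
    """lambda = P[I] * (1 - (1 - nu) * P[A | I]) + P[I^c].

    p_i         -- P[I], probability that the noise is small (assumed in (0, 1])
    p_a_given_i -- P[A(x) | I], acceptance probability given I (assumed in (0, 1])
    nu          -- contraction factor from (42) (assumed in (0, 1))
    """
    for name, p in (("p_i", p_i), ("p_a_given_i", p_a_given_i)):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1]")
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie in (0, 1)")
    return p_i * (1.0 - (1.0 - nu) * p_a_given_i) + (1.0 - p_i)


# Hypothetical values: lambda = 0.9 * (1 - 0.2 * 0.5) + 0.1
example_lambda = drift_rate(p_i=0.9, p_a_given_i=0.5, nu=0.8)
```

The formula can be rewritten as \(\lambda = 1 - (1-\nu )\, \mathbb {P}\left[ \fancyscript{A}(x) \cap \fancyscript{I} \right] \), which shows directly that \(\lambda < 1\) as soon as the accepted, small-noise event has positive probability.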

Lemma 8

Let \(x,y \in \mathbb {R}^d\), \(T ,\epsilon , \eta \in \mathbb {R}^*_+\) and \(\psi \in \fancyscript{F}_{x,y}^T\). Denote

$$\begin{aligned} \delta _\epsilon (x,y) {\overset{\mathrm{def}}{=}} \epsilon / (\eta (\left\| x \right\| _Q \vee \left\| y \right\| _Q-\epsilon )\vee 0 + 1). \end{aligned}$$
(47)
  1. (i)

    \(T \ge \left\| x-y \right\| _Q\).

  2. (ii)

    If \( \eta \int _0 ^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T < \epsilon \), then \(T \le \delta _\epsilon (x,y) \le \epsilon \).

  3. (iii)

    \(d_{\eta ,\epsilon }(x,y) \le \epsilon ^{-1} \, \left\| x-y \right\| _Q \left( \eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\} +1 \right) \)

  4. (iv)

    If \(d_{\eta ,\epsilon }(x,y) < 1\), then

    $$\begin{aligned} d_{\eta , \epsilon }(x,y)&\ge \epsilon ^{-1} \, \left\| x-y \right\| _Q \\&\left( (\eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\}-\delta _\epsilon (x,y))\vee 0 +1 \right) . \end{aligned}$$
  5. (v)

    If \(d_{\eta , \epsilon }(x,y) < 1\) then

    $$\begin{aligned}&d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \\&\quad \le \left( \left\| x-(h/2)Q^{-1}\nabla U(x) - (y - (h/2) Q^{-1}\nabla U(y)) \right\| _Q \right. \\&\qquad \left\{ \eta \left\{ \left\| x-(h/2)Q^{-1}\nabla U(x) \right\| _Q \vee \left\| y-(h/2)Q^{-1}\nabla U(y) \right\| _Q \right\} \right. \\&\qquad \left. \left. \quad + \eta \widetilde{h}\left\| Z_0 \right\| +1 \right\} \right) \\&\qquad \left( \left\| x-y \right\| _Q\left\{ \eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q - \delta _\epsilon (x,y)\} \vee 0 +1 \right\} \right) ^{-1} \end{aligned}$$

    and under M1 for all \(h \le 4/(C_\varUpsilon ^2 +1)\),

    $$\begin{aligned}&d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \\&\quad \le \left( \nu \left\{ \eta \nu \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\} \right. \right. \\&\quad \left. \left. +\eta (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\eta \left\| Z_0 \right\| +1 \right\} \right) \\&\quad \left( \eta \{\left\| x \right\| \vee \left\| y \right\| - \delta _\epsilon (x,y)\} \vee 0 +1 \right) ^{-1} \;, \end{aligned}$$

    with \(\nu \) given by (42).

Proof of Lemma 8

(i) By definition of \(\fancyscript{F}_{x,y}^T\),

$$\begin{aligned} \left\| x-y \right\| _Q&= \left\| \psi (T) - \psi (0) \right\| _Q \\&\le \int _0^T \left\| \psi '(s) \right\| _Q ds = T. \end{aligned}$$

(ii) First, it is straightforward to see \(T < \epsilon \). In addition, for all \(s \in \left[ 0,T\right] \),

$$\begin{aligned} \nonumber \left\| \psi (s) \right\| _Q&\ge \left| \left\| x \right\| _Q - s \right| \vee \left| \left\| y \right\| _Q - (T-s) \right| \\&\ge \left| \left\| x \right\| _Q \vee \left\| y \right\| _Q -\epsilon \right| . \end{aligned}$$
(48)

Then the result follows from integrating the inequality between \(0\) and \(T\) and using the assumption.
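To spell out the integration step (a sketch, using the positive part of the lower bound in (48), which is the form that appears in (47)): since \(\left\| \psi (s) \right\| _Q \ge (\left\| x \right\| _Q \vee \left\| y \right\| _Q - \epsilon ) \vee 0\) for all \(s \in \left[ 0,T\right] \),

$$\begin{aligned} \epsilon&> \eta \int _0 ^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T \\&\ge T \left( \eta (\left\| x \right\| _Q \vee \left\| y \right\| _Q - \epsilon ) \vee 0 + 1 \right) . \end{aligned}$$

Dividing by the bracket gives \(T < \delta _\epsilon (x,y)\), and \(\delta _\epsilon (x,y) \le \epsilon \) because the denominator in (47) is at least \(1\).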

(iii) It suffices to consider the particular case, \(T = \left\| y-x \right\| _Q\) and \(\psi \in \fancyscript{F}^T_{x,y}\) defined for \(s \in \left[ 0,T\right] \) by

$$\begin{aligned} \psi (s) = x +(y-x)s / \left\| x-y \right\| _Q. \end{aligned}$$

(iv) Let \(\{(T_n,\psi _n) ; n \in \mathbb {N}\}\) such that

$$\begin{aligned} \lim _{n \rightarrow +\infty } \epsilon ^{-1} \left( \eta \int _0 ^{T_n} \left\| \psi _n(s) \right\| _Q ds + T_n \right)&= d_{\eta ,\epsilon }(x,y) \\ \nonumber \epsilon ^{-1} \left( \eta \int _0 ^{T_n} \left\| \psi _n(s) \right\| _Q ds + T_n \right)&< 1 \quad \forall n \end{aligned}$$
(49)

Then using (i)–(ii) and (48), for all \(n\) it holds

$$\begin{aligned}&\epsilon ^{-1} \left( \eta \int _0 ^{T_n} \left\| \psi _n(s) \right\| _Q ds + T_n \right) \\&\quad \ge \epsilon ^{-1} \left\| x-y \right\| _Q\\&\qquad \left( (\eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\}-\delta _\epsilon (x,y))\vee 0 +1 \right) . \end{aligned}$$

The result now comes from (49).

(v) The claimed inequalities come from (iii)–(iv) with the definition of the basic coupling (28) for the first one, and using Lemma 6-(1) for the second. \(\square \)
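The upper bound (iii) can be illustrated numerically. The sketch below assumes, as the proof of (iv) suggests, that (32) defines \(d_{\eta ,\epsilon }(x,y)\) as \(1 \wedge \epsilon ^{-1} \inf (\eta \int _0^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T)\) over unit-speed paths from \(x\) to \(y\); definition (32) itself is not reproduced in this appendix. It evaluates the cost of the straight-line path from the proof of (iii) by midpoint quadrature and compares it with the closed-form bound. The diagonal \(Q\) and all numerical values are hypothetical.

```python
import math


def q_norm(v, q):
    """Norm ||v||_Q = sqrt(sum_i q_i v_i^2) for a diagonal Q = diag(q)."""
    return math.sqrt(sum(qi * vi * vi for qi, vi in zip(q, v)))


def straight_path_cost(x, y, q, eta, steps=2000):
    """eta * int_0^T ||psi(s)||_Q ds + T along psi(s) = x + (y - x) s / T,
    with T = ||x - y||_Q, approximated by the midpoint rule."""
    T = q_norm([b - a for a, b in zip(x, y)], q)
    if T == 0.0:
        return 0.0
    ds = T / steps
    integral = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        point = [a + (b - a) * s / T for a, b in zip(x, y)]
        integral += q_norm(point, q) * ds
    return eta * integral + T


def bound_iii(x, y, q, eta, eps):
    """Right-hand side of Lemma 8-(iii)."""
    T = q_norm([b - a for a, b in zip(x, y)], q)
    return T * (eta * max(q_norm(x, q), q_norm(y, q)) + 1.0) / eps


# Hypothetical inputs
x, y, q = [1.0, 2.0], [-0.5, 0.3], [1.0, 4.0]
eta, eps = 0.5, 1.0
path_value = straight_path_cost(x, y, q, eta) / eps
upper = bound_iii(x, y, q, eta, eps)
```

The comparison works because every point on the segment has \(Q\)-norm at most \(\left\| x \right\| _Q \vee \left\| y \right\| _Q\), by convexity of the norm, so the integral is at most \(T\) times that maximum.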

Lemma 9

Assume M1 and M2.

Then for all \(h \in \left( 0, h_\ell \wedge (4/C_U) \right) \), there exist \(\epsilon , \eta , \tau >0\) such that for all \(x,y \in \mathbb {R}^d\) with \(d_{\eta ,\epsilon }(x,y) <1\),

$$\begin{aligned} \mathbf{K}_\mathrm{M}d_{\eta ,\epsilon }(x,y) \le (1- \tau )d_{\eta ,\epsilon }(x,y), \end{aligned}$$

with \(d_{\eta ,\epsilon }\) given by (32). In particular, for all \(x,y \in \mathbb {R}^d\),

$$\begin{aligned} \mathbf{K}_\mathrm{M}d_{\eta ,\epsilon }(x,y) \le d_{\eta ,\epsilon }(x,y). \end{aligned}$$

Proof

For ease of notation, we simply write \(\mathbf {K}\) for \(\mathbf{K}_\mathrm{M}\). Let \(x,y \in \mathbb {R}^d\), \(d_{\eta ,\epsilon }(x,y) <1\). Then, if \(\left\| y \right\| _Q \le R_h\),

$$\begin{aligned} \left\| x \right\| _Q \le \left\| x-y \right\| _Q + \left\| y \right\| _Q \le \epsilon + R_h, \end{aligned}$$

where we used Lemma 8-(iv). Therefore, as we will choose \(\epsilon \) small enough, we can assume \(\epsilon \le 1\), and we end up with two cases: either \(x,y \in \mathrm{B }_Q\left( 0, R_h +2\right) \) or \(x,y \not \in \mathrm{B }_Q\left( 0,R_h\right) \). Let \((\mathbf {X}_1,\mathbf {Y}_1)\) be the basic coupling between \(P(x,\cdot )\) and \(P(y,\cdot )\); let \(Z_0\) and \(U\) be, respectively, the Gaussian variable and the uniform variable used for the basic coupling. Let \(\mathbb {P}\left[ \cdot \right] \) and \(\mathbb {E} \left[ \cdot \right] \) be the probability and the expectation over \(Z_0\) and \(U\). Set

$$\begin{aligned} \fancyscript{I}&= \left\{ \widetilde{h}\left\| Z_0 \right\| \le r_h \right\} \\ \fancyscript{A}(x,y)&= \left\{ \alpha _h(x, X_1(x)) \wedge \alpha _h(y, Y_1(y)) > U \right\} \\ \fancyscript{R}(x,y)&= \left\{ \alpha _h(x, X_1(x)) \vee \alpha _h(y, Y_1(y)) < U \right\} . \end{aligned}$$

On the event \(\fancyscript{A}(x,y)\), the moves are both accepted so that \(\mathbf {X}_1 = X_1(x)\) and \(\mathbf {Y}_1 = Y_1(y)\); on the event \(\fancyscript{R}(x,y)\), the moves are both rejected so that \(\mathbf {X}_1 =x\) and \(\mathbf {Y}_1 =y\). Then, with the event \(\fancyscript{I}\) defined above, it holds that

$$\begin{aligned} \nonumber&\!\!\!\mathbf {K}d_{\eta ,\epsilon }(x,y) \le \mathbb {E} \left[ d_{\eta ,\epsilon }(\mathbf {X}_1, \mathbf {Y}_1) \right] \\ \nonumber&\le \mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \\ \nonumber&\quad + \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{R}(x,y) \cap \fancyscript{I}} \right] \\ \nonumber&\quad + \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y)) \mathbb {1}_{\fancyscript{I}^c} \right] \\&\quad + \mathbb {E} \left[ \left| \alpha _h(x,X_1(x) ) - \alpha _h(y,Y_1(y)) \right| \right] , \end{aligned}$$
(50)

where we have used that \(d_{\eta ,\epsilon }\) is bounded by \(1\). First, by Lemma 8-(v), since \(\delta _\epsilon (x,y) \le \epsilon \le 1\), there exist \(\eta _1,\tau _1 >0\) such that for all \(\eta < \eta _1\)

$$\begin{aligned}&\!\!\!\mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \\&\quad \le (1-\tau _1) d_{\eta ,\epsilon }(x,y) \mathbb {P}\left[ \fancyscript{A}(x,y) \cap \fancyscript{I} \right] . \end{aligned}$$

Let us define \(\tau _2 = \tau _1 \mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right] \). We claim that \(\tau _2 >0\). Indeed, if \(x,y \in \mathrm{B }_Q\left( 0,R_h + 2\right) \), since \(\varUpsilon \) and \(Q^{-1}\nabla \varUpsilon \) are continuous by assumption, we have by Lemma 6-(1)

$$\begin{aligned} \mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right]&\ge \inf \left\{ \exp (-G(t,z)^+) ; t \in \mathrm{B }_Q\left( 0,R_h +2\right) , \right. \\&\quad \left. z \in \mathrm{B }_Q\left( 0,C_\varUpsilon ((R_h+2)\nu +(h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q +r_h) \right) \right\} > 0. \end{aligned}$$

If \(x,y \not \in \mathrm{B }_Q\left( 0,R_h\right) \), by M2, \(\mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right] > a_l >0\). Therefore, as \(\mathbb {P}\left[ \fancyscript{R}(x,y) \cap \fancyscript{I} \right] \le \mathbb {P}\left[ \fancyscript{I} \right] - \mathbb {P}\left[ \fancyscript{I}\cap \fancyscript{A}(x,y) \right] \)

$$\begin{aligned}&\mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \nonumber \\&\quad + \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{R}(x,y) \cap \fancyscript{I}} \right] \le \mathbb {P}\left[ \fancyscript{I} \right] (1-\tau _2) d_{\eta ,\epsilon }(x,y).\nonumber \\ \end{aligned}$$
(51)

By Lemma 8-(ii)–(v), since \(\delta _\epsilon (x,y) \le \epsilon \le 1\) and for all \(a,b,c \in \mathbb {R}_+\), \((a+b)/(a+c) \le 1 \vee (b/c)\),

$$\begin{aligned} d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \le \nu \left\{ 2 \eta (1- \nu ) +\eta (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\eta \left\| Z_0 \right\| +1 \right\} . \end{aligned}$$

Also, by the Dominated Convergence theorem and Lemma 8-(v), for all \(\kappa >0\) there exists \(\eta _2 \in \left( 0,\eta _1\right] \) such that for all \(\eta \in \left( 0, \eta _2\right] \),

$$\begin{aligned} \nonumber&\mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y)) \mathbb {1}_{\fancyscript{I}^c} \right] \\ \nonumber&\quad \le d_{\eta ,\epsilon }(x,y) \mathbb {E} \left[ 1 \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y))/d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{I}^c} \right] \\&\quad \le d_{\eta ,\epsilon }(x,y) ( 1+ \kappa ) \mathbb {P}\left[ \fancyscript{I}^c \right] . \end{aligned}$$
(52)

Therefore, since \(\mathbb {P}\left[ \fancyscript{I} \right] >0\), using (51)–(52) and choosing \(\kappa \) small enough, there exist \(\tau _3, \eta _2 >0\) such that for all \(\eta \in \left( 0,\eta _2\right] \)

$$\begin{aligned}&\mathbb {E} \left[ d_{\eta ,\epsilon }(X_1(x) , Y_1(y)) \mathbb {1}_{\fancyscript{A}(x,y) \cap \fancyscript{I}} \right] \nonumber \\&\quad +\, \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \mathbb {1}_{\fancyscript{R}(x,y) \cap \fancyscript{I}} \right] \nonumber \\&\quad +\, \mathbb {E} \left[ d_{\eta ,\epsilon }(x,y) \vee d_{\eta ,\epsilon }(X_1(x),Y_1(y)) \mathbb {1}_{\fancyscript{I}^c} \right] \nonumber \\&\qquad \le (1-\tau _3) d_{\eta ,\epsilon }(x,y). \end{aligned}$$
(53)

Next, by Lemma 7 and Lemma 8-(iv), since \(\delta _\epsilon (x,y) \le 1\), there exists \(C\), such that

$$\begin{aligned} \nonumber&\mathbb {E} \left[ \left| \alpha _h(x,X_1(x) ) - \alpha _h(y,Y_1(y)) \right| \right] \\ \nonumber&\quad \le C \left\| x-y \right\| _Q(\left\| x \right\| _Q \vee \left\| y \right\| _Q \\ \nonumber&\qquad -\delta _\epsilon (x,y) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] )\\&\quad \le C ((1/\eta ) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q\nonumber \\&\qquad + \mathbb {E} \left[ \left\| Z_0 \right\| \right] ) \epsilon d_{\eta ,\epsilon }(x,y). \end{aligned}$$
(54)

Set

$$\begin{aligned} \epsilon _1 \overset{\text {def}}{=}\tau _3/(2 C ((1/\eta _2) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] )). \end{aligned}$$
(55)

Therefore, by (50), (53) and (54), for \(\eta \leftarrow \eta _2\) and \(\epsilon \leftarrow \epsilon _1\), for all \(x,y \in \mathbb {R}^d\) with \(d_{\eta ,\epsilon }(x,y) <1\),

$$\begin{aligned} \mathbf {K}d_{\eta ,\epsilon }(x,y) \le (1-\tau _3/2) d_{\eta ,\epsilon }(x,y). \end{aligned}$$
(56)

\(\square \)
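The final combination can be unpacked as follows; this only restates (50), (53), (54) and (55) for \(\eta = \eta _2\) and \(\epsilon = \epsilon _1\). Writing \(c = C ((1/\eta _2) +1+ \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \mathbb {E} \left[ \left\| Z_0 \right\| \right] )\) as a shorthand introduced here, definition (55) reads \(c\, \epsilon _1 = \tau _3/2\), hence

$$\begin{aligned} \mathbf {K}d_{\eta ,\epsilon }(x,y)&\le (1-\tau _3)\, d_{\eta ,\epsilon }(x,y) + c\, \epsilon _1\, d_{\eta ,\epsilon }(x,y) \\&= (1-\tau _3/2)\, d_{\eta ,\epsilon }(x,y). \end{aligned}$$

The first term collects the accepted, rejected and large-noise contributions bounded in (53); the second is the acceptance-mismatch term bounded in (54).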

Proof of Proposition 5

For ease of notation, we simply write \(\mathbf {K}\) for \(\mathbf{K}_\mathrm{M}\). Let \(\{(\mathbf{X}_n,\,\mathbf{Y}_n), n\in \mathbb {N}\}\) be a Markov chain with Markov kernel \(\mathbf {K}\). We denote for all \(n \in \mathbb {N}^*\), \(Z_n\) and \(U_n\), respectively the common Gaussian variable and uniform variable, sampled to build \((\mathbf {X}_n,\mathbf {Y}_n)\). Let \(\mathbb {P}\left[ \cdot \right] \) and \(\mathbb {E} \left[ \cdot \right] \) be the probability and the expectation over \(\left\{ Z_n , U_n ; \;n \in \mathbb {N} \right\} \). Note that by definition the variables \(\left\{ Z_n , U_n ; \;n \in \mathbb {N} \right\} \) are independent. Under M1 and M2, by definition of \(Q\) and Lemma 9, conditions (i) and (ii) of Definition 1 are satisfied. In addition, there exist \(\epsilon ,\eta , \tau >0\) such that for all \(x,y \in \mathbb {R}^d\) with \(d_{\eta ,\epsilon }(x,y) <1\),

$$\begin{aligned} \mathbf {K}d_{\eta ,\epsilon }(x,y) \le (1-\tau ) d_{\eta ,\epsilon }(x,y) \end{aligned}$$
(57)

Let \(R >0 \), and \(x,y\) be in \(\mathrm{B }_Q\left( 0,R\right) \). Assume first \(d_{\eta ,\epsilon }(x,y) < 1\). Then by (57) and Lemma 2, for every \(n \in \mathbb {N}^*\),

$$\begin{aligned}&\mathbf {K}^nd_{\eta ,\epsilon }(x,y) \le \mathbf {K}^{n-1}d_{\eta ,\epsilon }(x,y) \nonumber \\&\quad \le \cdots \le (1-\tau ) d_{\eta ,\epsilon }(x,y). \end{aligned}$$
(58)

Consider now the case \(d_{\eta ,\epsilon }(x,y) = 1\). Let \(\{(\mathbf{X}_n,\,\mathbf{Y}_n), n\in \mathbb {N}\}\) be the Markov chain with Markov kernel \(\mathbf {K}\) starting in \((x,y)\). Let \(n \in \mathbb {N}^*\) and denote for all \(1 \le i \le n\)

$$\begin{aligned}&\varPsi (\mathbf {X}_{i-1},\mathbf {Y}_{i-1},Z_i)\\&\quad = \alpha _h(\mathbf {X}_{i-1}, \fancyscript{O}(\mathbf {X}_{i-1},Z_i)) \wedge \alpha _h(\mathbf {Y}_{i-1}, \fancyscript{O}(\mathbf {Y}_{i-1},Z_i)) \\&\fancyscript{I}_i(n) = \left\{ \left\| (h/2)Q^{-1}\nabla \varUpsilon (0) + \widetilde{h}Q^{-1/2}Z_i \right\| _Q \le R/ n \right\} \\&\fancyscript{A}_i(x,y)= \left\{ U_i \le \varPsi (\mathbf {X}_{i-1},\mathbf {Y}_{i-1},Z_i) \right\} \\&\widetilde{\fancyscript{A}}^i(x,y,n) = \bigcap _{1 \le j \le i} \left( \fancyscript{I}_j(n) \cap \fancyscript{A}_j(x,y) \right) , \end{aligned}$$

where \(\fancyscript{O}\) is given by (26). On the set \(\widetilde{\fancyscript{A}}^i(x,y,n)\), for all \(1 \le j \le i\), \(\mathbf {X}_j= \fancyscript{O}(\mathbf {X}_{j-1},Z_j), \mathbf {Y}_j = \fancyscript{O}(\mathbf {Y}_{j-1},Z_j) \) and \( \left\| \mathbf {X}_j \right\| _Q \vee \left\| \mathbf {Y}_j \right\| _Q \le 2R \). Then, since by Lemma 8-(iii),

$$\begin{aligned}&d_{\eta ,\epsilon }(\mathbf {X}_n,\mathbf {Y}_n) \le \epsilon ^{-1}\left\| \mathbf {X}_n-\mathbf {Y}_n \right\| _Q \\&\quad \times \left( \eta \left\{ \left\| \mathbf {X}_n \right\| _Q\vee \left\| \mathbf {Y}_n \right\| _Q \right\} + 1 \right) , \end{aligned}$$

by Lemma 6-(1) on \(\widetilde{\fancyscript{A}}^n(x,y,n)\) it holds

$$\begin{aligned} d_{\eta ,\epsilon }(\mathbf {X}_n,\mathbf {Y}_n) \le \epsilon ^{-1} \nu ^n \left\| x-y \right\| _Q( 2 \eta R + 1). \end{aligned}$$

This inequality and \(d_{\eta ,\epsilon }\le 1\) yield

$$\begin{aligned} \nonumber&\mathbf {K}^n d_{\eta ,\epsilon }(x,y) \\&\quad = \widetilde{\mathbb {E}}_{x,y} \left[ d_{\eta ,\epsilon }(\mathbf {X}_n,\mathbf {Y}_n)( \mathbb {1}_{\widetilde{\fancyscript{A}}^n(x,y,n)} + \mathbb {1}_{(\widetilde{\fancyscript{A}}^n(x,y,n))^c}) \right] \end{aligned}$$
(59)
$$\begin{aligned} \nonumber&\quad \le \epsilon ^{-1} \nu ^n \left\| x-y \right\| _Q( 2 \eta R + 1) \mathbb {P}\left[ \widetilde{\fancyscript{A}}^n(x,y,n) \right] \nonumber \\&\qquad +\mathbb {P}\left[ (\widetilde{\fancyscript{A}}^n(x,y,n))^c \right] \nonumber \\ \nonumber&\quad \le \epsilon ^{-1} \nu ^n 2 R ( 2 \eta R + 1) \mathbb {P}\left[ \widetilde{\fancyscript{A}}^n(x,y,n) \right] \nonumber \\&\qquad + \mathbb {P}\left[ (\widetilde{\fancyscript{A}}^n(x,y,n))^c \right] \nonumber \\&\quad \le 1+ \left( \epsilon ^{-1} \nu ^n 2 R ( 2 \eta R \!+\! 1) \!-\!1 \right) \, \mathbb {P}\left[ \widetilde{\fancyscript{A}}^n(x,y,n) \right] . \end{aligned}$$
(60)

As \(\nu \in \left( 0,1\right) \), there exists \(m\) such that, \(\epsilon ^{-1} \nu ^m 2 R (2 \eta R + 1) < 1\). It remains to lower bound \(\mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] \) by a positive constant to conclude, which is done by the following inequalities, where we use the independence of the random variables \(\left\{ Z_i , U_i ; \;i \in \mathbb {N}^* \right\} \).

$$\begin{aligned}&\mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] = \mathbb {P}\left[ \widetilde{\fancyscript{A}}^{m-1}(x,y,m) \cap \fancyscript{I}_m(m) \right] \\&\quad \times \widetilde{\mathbb {E}}_{x,y} \left[ \varPsi (\mathbf {X}_{m-1}, \mathbf {Y}_{m-1}, Z_m) \left| \widetilde{\fancyscript{A}}^{m-1}(x,y,m) \cap \fancyscript{I}_m(m) \right. \right] . \end{aligned}$$

For all \(1 \le i \le m\), on the event \( \bigcap _{j \le i} \fancyscript{I}_j(m)\), it holds

$$\begin{aligned}&\varPsi (\mathbf {X}_{i-1},\mathbf {Y}_{i-1},Z_i) \\&\quad \ge \exp \left( -\sup _{(z,t) \in \mathrm{B }_Q\left( 0,2R\right) } G(z,t)^+ \right) = \delta , \end{aligned}$$

where \(\delta \in (0,1)\), since \(G\) is continuous by M1. Therefore, since \(Z_i\) is independent of \(\widetilde{\fancyscript{A}}^{i-1}(x,y,m)\), we have

$$\begin{aligned} \mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] \ge \delta \ \mathbb {P}\left[ \widetilde{\fancyscript{A}}^{m-1}(x,y,m) \right] \ \mathbb {P}\left[ \fancyscript{I}_m(m) \right] . \end{aligned}$$

An immediate induction leads to

$$\begin{aligned} \mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right]&\ge \mathbb {P}\left[ \fancyscript{I}_1(m) \right] ^m \delta ^m. \end{aligned}$$

Plugging this result into (60) and combining it with (58), we obtain that there exists \(s \in \left( 0,1\right) \) such that for all \(x,y \in \mathrm{B }_Q\left( 0,R\right) \), \(\mathbf {K}^m d_{\eta ,\epsilon }(x,y) \le s d_{\eta ,\epsilon }(x,y)\). \(\square \)
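For the case \(d_{\eta ,\epsilon }(x,y) = 1\), the choice of \(m\) and the resulting factor from (60) can be sketched as follows. This is a minimal illustration under stated assumptions: `p_i` stands for \(\mathbb {P}\left[ \fancyscript{I}_1(m) \right] \) (which in the proof depends on \(m\) and is treated here as a given number), `delta` is the acceptance lower bound \(\delta \), and all numerical values are hypothetical.

```python
def prefactor(eps, eta, R):
    """The constant eps^{-1} * 2R * (2 eta R + 1) multiplying nu^m in (60)."""
    return 2.0 * R * (2.0 * eta * R + 1.0) / eps


def smallest_m(eps, eta, R, nu):
    """Smallest integer m >= 1 with eps^{-1} nu^m 2R (2 eta R + 1) < 1."""
    if not 0.0 < nu < 1.0:
        raise ValueError("nu must lie in (0, 1)")
    a = prefactor(eps, eta, R)
    m = 1
    while a * nu ** m >= 1.0:
        m += 1
    return m


def contraction_after_m_steps(eps, eta, R, nu, p_i, delta):
    """Return (m, s) with s = 1 - (1 - a nu^m) * (p_i * delta)^m, obtained by
    inserting the lower bound P[A~^m] >= (p_i * delta)^m into (60)."""
    m = smallest_m(eps, eta, R, nu)
    gap = 1.0 - prefactor(eps, eta, R) * nu ** m  # positive by choice of m
    return m, 1.0 - gap * (p_i * delta) ** m


# Hypothetical values
m_example, s_example = contraction_after_m_steps(
    eps=0.5, eta=0.1, R=2.0, nu=0.8, p_i=0.9, delta=0.7
)
```

The factor obtained this way is typically very close to \(1\), since the lower bound \((p_i\, \delta )^m\) decays geometrically in \(m\); the argument only needs it to be strictly below \(1\).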

Cite this article

Durmus, A., Moulines, É. Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm. Stat Comput 25, 5–19 (2015). https://doi.org/10.1007/s11222-014-9511-z
