Abstract
In this paper, we establish explicit convergence rates for Markov chains in Wasserstein distance. Compared to the more classical total variation bounds, the proposed rate of convergence leads to useful insights for the analysis of MCMC algorithms, and suggests ways to construct samplers with good mixing rates even if the dimension of the underlying sampling space is large. We apply these results to the Exponential Integrator version of the Metropolis Adjusted Langevin Algorithm, and illustrate our findings on a Bayesian linear inverse problem.
Notes
The underlying problem is typically infinite-dimensional; it becomes finite-dimensional after truncation.
A Proofs
Lemma 5
Assume M1.
1. For all \(x,y,z \in \mathbb {R}^d\),
$$\begin{aligned} \left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),z \right\rangle&\le C_\varUpsilon \left\| x-y \right\| _Q\left\| z \right\| _Q, \\ \left\| Q^{-1}\nabla \varUpsilon (x) \right\| _Q&\le C_\varUpsilon \left\| x \right\| _Q + \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q, \\ \left\langle \nabla \varUpsilon (z), x-y \right\rangle&\le C_\varUpsilon \left( C_\varUpsilon \left\| z \right\| _Q + \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q \right) \left\| x-y \right\| _Q. \end{aligned}$$
2. For all \(x,y \in \mathbb {R}^d\) and \(h \le 4/(C_\varUpsilon ^2 +1)\),
$$\begin{aligned} \left\| x-(h/2)Q^{-1}\nabla U(x) - \{y - (h/2) Q^{-1} \nabla U (y) \} \right\| _Q \le \nu \left\| x-y \right\| _Q, \end{aligned} \tag{41}$$
where
$$\begin{aligned} \nu = \left( 1- h (1- h(1+C_\varUpsilon ^2)/4) \right) ^{1/2}. \end{aligned} \tag{42}$$
In particular,
$$\begin{aligned} \left\| x - (h/2) Q^{-1}\nabla U(x) \right\| _Q \le \nu \left\| x \right\| _Q + (h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q. \end{aligned} \tag{43}$$
Proof
(1) This is a consequence of the definition of \(\left\langle \cdot ,\cdot \right\rangle _Q\), M1, the Cauchy–Schwarz inequality and the triangle inequality.
(2) Let \(h \le 4/(C_\varUpsilon ^2+1)\). Under M1, since \(\varUpsilon \) is convex and \(C^1\), for all \(x,y \in \mathbb {R}^d\),
$$\begin{aligned}&\left\| x-y-(h/2)Q^{-1}(\nabla U(x) - \nabla U(y)) \right\| _Q^2 \\&\quad = (1-h/2)^2\left\| x-y \right\| _Q^2 \\&\qquad +(h^2/4)\left\| Q^{-1}( \nabla \varUpsilon (x) - \nabla \varUpsilon (y)) \right\| _Q^2 \\&\qquad - h (1-h/2)\left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),x-y \right\rangle \\&\quad \le \left( 1- h(1 - h (C_\varUpsilon ^2+1) /4) \right) \left\| x-y \right\| _Q ^2, \end{aligned}$$
where the inequality uses M1 to bound the second term by \((h^2/4) C_\varUpsilon ^2 \left\| x-y \right\| _Q^2\), and the monotonicity \(\left\langle \nabla \varUpsilon (x) - \nabla \varUpsilon (y),x-y \right\rangle \ge 0\) of the gradient of the convex function \(\varUpsilon \) to handle the last term. This shows (41). Equation (43) follows from (41) applied with \(y = 0\) and the triangle inequality. \(\square \)
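The contraction (41) can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions that are not taken from the text: \(\left\| v \right\| _Q^2 = v^T Q v\), \(U(x) = x^T Q x/2 + \varUpsilon (x)\), and a quadratic \(\varUpsilon (x) = x^T A x / 2\) with \(A\) positive semidefinite, for which \(C_\varUpsilon \) is the spectral norm of \(Q^{-1/2} A Q^{-1/2}\). The names `q_norm`, `contraction_factor` and `drift_map` are hypothetical helpers.

```python
import numpy as np

def q_norm(v, Q):
    """Norm induced by Q, assumed here to be ||v||_Q = sqrt(v^T Q v)."""
    return float(np.sqrt(v @ Q @ v))

def contraction_factor(h, c_ups):
    """The constant nu of (42)."""
    return float(np.sqrt(1.0 - h * (1.0 - h * (1.0 + c_ups**2) / 4.0)))

def drift_map(x, h, Q, A):
    """x - (h/2) Q^{-1} grad U(x), assuming U(x) = x^T Q x / 2 + x^T A x / 2."""
    return x - 0.5 * h * (x + np.linalg.solve(Q, A @ x))

rng = np.random.default_rng(0)
d = 5
B = rng.standard_normal((d, d))
Q = B @ B.T + d * np.eye(d)   # symmetric positive definite
G = rng.standard_normal((d, d))
A = G @ G.T                   # symmetric PSD, so the toy Upsilon is convex

# Lipschitz constant of Q^{-1} grad Upsilon in the Q-norm:
# the spectral norm of Q^{-1/2} A Q^{-1/2}.
eigvals, eigvecs = np.linalg.eigh(Q)
Q_inv_half = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
c_ups = float(np.linalg.norm(Q_inv_half @ A @ Q_inv_half, 2))

h = 2.0 / (c_ups**2 + 1.0)    # satisfies h <= 4 / (C^2 + 1), and also h <= 2
nu = contraction_factor(h, c_ups)

# Largest observed ratio ||F(x) - F(y)||_Q / ||x - y||_Q over random pairs.
worst_ratio = 0.0
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    num = q_norm(drift_map(x, h, Q, A) - drift_map(y, h, Q, A), Q)
    worst_ratio = max(worst_ratio, num / q_norm(x - y, Q))

print(f"nu = {nu:.4f}, worst observed ratio = {worst_ratio:.4f}")
```

The step size is deliberately chosen below 2 so that the factor \(1-h/2\) in front of the cross term stays nonnegative in this toy setting.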
Lemma 6
Assume M1.
1. For all \(x,y \in \mathbb {R}^d\),
$$\begin{aligned} \left\| X_1(x) - Y_1(y) \right\| _Q&\le \nu \left\| x-y \right\| _Q, \\ \left\| X_1(x) \right\| _Q&\le \nu \left\| x \right\| _Q +(h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\left\| Q^{-1/2}Z_0 \right\| _Q, \end{aligned}$$
where \((X_1(x), Y_1(y))\) and \(\nu \) are given by (28) and (42), respectively.
2. For all \(x,y \in \mathbb {R}^d\),
$$\begin{aligned} \left\| Q^{-1}(\nabla \varUpsilon (X_1(x)) - \nabla \varUpsilon (Y_1(y))) \right\| _Q&\le C_\varUpsilon \nu \left\| x-y \right\| _Q, \\ \left\| Q^{-1}\nabla \varUpsilon (X_1(x)) \right\| _Q&\le C_\varUpsilon \left( \nu \left\| x \right\| _Q +(h/2) \left\| Q^{-1} \nabla \varUpsilon (0) \right\| _Q + \widetilde{h}\left\| Z_0 \right\| \right) . \end{aligned}$$
Proof
(1) is a consequence of Lemma 5-(2), and (2) then follows from M1 and Lemma 5-(1). \(\square \)
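The role of the common noise in Lemma 6-(1) can be illustrated with a short simulation. The sketch below assumes, without confirmation from the text, that the proposal in (28) has the form \(X_1(x) = x - (h/2)Q^{-1}\nabla U(x) + \widetilde{h} Q^{-1/2} Z_0\), that \(\widetilde{h} = (h - h^2/4)^{1/2}\), that \(Q\) is diagonal, and that \(\varUpsilon (x) = x^T A x/2 + b^T x\) with \(A\) positive semidefinite. The function names are hypothetical. Because both chains use the same \(Z_0\), the noise cancels in the difference and only the drift contraction remains.

```python
import numpy as np

def q_norm(v, q_diag):
    """||v||_Q for a diagonal Q (assumed: ||v||_Q^2 = sum_i q_i v_i^2)."""
    return float(np.sqrt(np.sum(q_diag * v * v)))

def proposal(x, z, h, h_tilde, q_diag, A, b):
    """Assumed form of the proposal: drift step plus h_tilde * Q^{-1/2} z."""
    grad_ups = A @ x + b                            # gradient of the toy Upsilon
    drift = x - 0.5 * h * (x + grad_ups / q_diag)   # Q^{-1} grad U(x) = x + Q^{-1} grad Upsilon(x)
    return drift + h_tilde * z / np.sqrt(q_diag)

rng = np.random.default_rng(1)
d = 4
q_diag = rng.uniform(1.0, 3.0, size=d)
G = rng.standard_normal((d, d))
A = G @ G.T
b = rng.standard_normal(d)

# C_Upsilon for the toy model: spectral norm of Q^{-1/2} A Q^{-1/2}.
c_ups = float(np.linalg.norm(A / np.sqrt(np.outer(q_diag, q_diag)), 2))
h = 2.0 / (c_ups**2 + 1.0)
h_tilde = float(np.sqrt(h - h**2 / 4.0))            # assumed noise scale
nu = float(np.sqrt(1.0 - h * (1.0 - h * (1.0 + c_ups**2) / 4.0)))

x, y, z0 = (rng.standard_normal(d) for _ in range(3))
x1 = proposal(x, z0, h, h_tilde, q_diag, A, b)
y1 = proposal(y, z0, h, h_tilde, q_diag, A, b)      # same z0: basic coupling

pair_lhs = q_norm(x1 - y1, q_diag)
pair_rhs = nu * q_norm(x - y, q_diag)
single_lhs = q_norm(x1, q_diag)
single_rhs = (nu * q_norm(x, q_diag)
              + 0.5 * h * q_norm(b / q_diag, q_diag)
              + h_tilde * q_norm(z0 / np.sqrt(q_diag), q_diag))

print(pair_lhs <= pair_rhs, single_lhs <= single_rhs)
```

Both inequalities hold for any nonnegative value of `h_tilde`, so the check does not depend on the assumed noise scale.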
Lemma 7
Assume M1. There exists \(C\) such that
Proof
Let us write
Using M1, Lemma 5 and Lemma 6 we have the following inequalities for \(I_i\), \(i=1 \ldots 4\).
Denote \(\widehat{h} =h /(8-2h)\). Then,
\(\square \)
Proof of Proposition 4
Set \(R = \nu ^{-1} \vee R_h\). First, by Lemma 6-(1) and M1,
Next, if \(x \not \in \mathrm{B }_Q\left( 0,R\right) \), let \(X_1(x)\) be defined by (28), let \(U \sim \mathcal {U} (\left[ 0,1\right] )\), and set
On \(\fancyscript{A}(x)\), the proposal is accepted and \(\mathbf {X}_1 = X_1(x)\). On its complement, \(\mathbf {X}_1 = x\). Then, by Lemma 6-(1) and since \(\left\| x \right\| _Q \ge \nu \left\| x \right\| _Q \ge 1\),
with \(\lambda = \mathbb {P}\left[ \fancyscript{I} \right] (1 - (1-\nu ) \mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] ) + \mathbb {P}\left[ \fancyscript{I}^c \right] \). Since by M2, \(\mathbb {P}\left[ \fancyscript{I} \right] \) and \(\mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] \) are positive, \(\lambda \in \left( 0,1\right) \). In addition, under M1, \((h/2)\left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\mathbb {E} \left[ \left\| Z_0 \right\| \right] \) is finite. Therefore, the proof is concluded by combining (45) and (46). \(\square \)
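To see why \(\lambda \in \left( 0,1\right) \), note that \(\mathbb {P}\left[ \fancyscript{I}^c \right] = 1 - \mathbb {P}\left[ \fancyscript{I} \right] \), so the expression for \(\lambda \) collapses to \(1 - \mathbb {P}\left[ \fancyscript{I} \right] (1-\nu ) \mathbb {P}\left[ \fancyscript{A}(x) \left| \fancyscript{I} \right. \right] \). The following sketch evaluates both forms; the numerical values are purely illustrative and are not taken from the paper, and `drift_rate` is a hypothetical helper name.

```python
def drift_rate(p_i, p_accept_given_i, nu):
    """lambda = P[I] * (1 - (1 - nu) * P[A | I]) + P[I^c], with P[I^c] = 1 - P[I]."""
    return p_i * (1.0 - (1.0 - nu) * p_accept_given_i) + (1.0 - p_i)

# Illustrative values only (assumptions, not from the paper).
p_i, p_accept_given_i, nu = 0.9, 0.3, 0.8

lam = drift_rate(p_i, p_accept_given_i, nu)
# Equivalent simplified form: 1 - P[I] * (1 - nu) * P[A | I].
lam_simplified = 1.0 - p_i * (1.0 - nu) * p_accept_given_i

print(lam, lam_simplified)
```

In the simplified form, \(\lambda < 1\) is immediate as soon as the two probabilities are positive and \(\nu < 1\).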
Lemma 8
Let \(x,y \in \mathbb {R}^d\), \(T ,\epsilon , \eta \in \mathbb {R}^*_+\) and \(\psi \in \fancyscript{F}_{x,y}^T\). Denote
(i) \(T \ge \left\| x-y \right\| _Q\).
(ii) If \( \eta \int _0 ^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T < \epsilon \), then \(T \le \delta _\epsilon (x,y) \le \epsilon \).
(iii) \(d_{\eta ,\epsilon }(x,y) \le \epsilon ^{-1} \, \left\| x-y \right\| _Q \left( \eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\} +1 \right) \).
(iv) If \(d_{\eta ,\epsilon }(x,y) < 1\), then
$$\begin{aligned} d_{\eta , \epsilon }(x,y)&\ge \epsilon ^{-1} \, \left\| x-y \right\| _Q \\&\quad \times \left( (\eta \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\}-\delta _\epsilon (x,y))\vee 0 +1 \right) . \end{aligned}$$
(v) If \(d_{\eta , \epsilon }(x,y) < 1\), then
$$\begin{aligned}&d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \\&\quad \le \left\| x-(h/2)\nabla U(x) - (y - (h/2) \nabla U(y)) \right\| _Q \\&\qquad \times \left\{ \eta \left\| x-(h/2)\nabla U(x) \right\| _Q \vee \left\| y-(h/2)\nabla U(y) \right\| _Q + \eta \widetilde{h}\left\| Z_0 \right\| _Q+1 \right\} \\&\qquad \times \left( \left\| x-y \right\| _Q\left\{ \eta \{\left\| x \right\| \vee \left\| y \right\| - \delta _\epsilon (x,y)\} \vee 0 +1 \right\} \right) ^{-1}, \end{aligned}$$
and under M1, for all \(h \le 4/(C_\varUpsilon ^2 +1)\),
$$\begin{aligned}&d_{\eta , \epsilon }(X_1(x), Y_1(y) ) / d_{\eta ,\epsilon } (x,y) \\&\quad \le \nu \left\{ \eta \nu \{\left\| x \right\| _Q \vee \left\| y \right\| _Q\} +\eta (h/2) \left\| Q^{-1}\nabla \varUpsilon (0) \right\| _Q+ \widetilde{h}\eta \left\| Z_0 \right\| +1 \right\} \\&\qquad \times \left( \eta \{\left\| x \right\| \vee \left\| y \right\| - \delta _\epsilon (x,y)\} \vee 0 +1 \right) ^{-1}, \end{aligned}$$
with \(\nu \) given by (42).
Proof of Lemma 8
(i) By definition of \(\fancyscript{F}_{x,y}^T\),
(ii) First, it is straightforward to see \(T < \epsilon \). In addition, for all \(s \in \left[ 0,T\right] \),
Then the result follows from integrating the inequality between \(0\) and \(T\) and using the assumption.
(iii) It suffices to consider the particular case, \(T = \left\| y-x \right\| _Q\) and \(\psi \in \fancyscript{F}^T_{x,y}\) defined for \(s \in \left[ 0,T\right] \) by
(iv) Let \(\{(T_n,\psi _n) ; n \in \mathbb {N}\}\) be such that
Then using (i)–(ii) and (48), for all \(n\) it holds
The result now comes from (49).
(v) The claimed inequalities come from (iii)–(iv) with the definition of the basic coupling (28) for the first one, and using Lemma 6-(1) for the second. \(\square \)
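The upper bound in (iii) can be illustrated numerically. The sketch below assumes, as the choice \(T = \left\| y-x \right\| _Q\) in the proof suggests, that the relevant path is the straight segment from \(x\) to \(y\) traversed at unit \(Q\)-speed, and that the cost of a path is the quantity \(\eta \int _0^T \left\| \psi (s) \right\| _Q \mathrm {d}s + T\) appearing in (ii). A diagonal \(Q\) and the helper names are further assumptions made for illustration. By convexity of the norm, \(\left\| \psi (s) \right\| _Q \le \left\| x \right\| _Q \vee \left\| y \right\| _Q\) along the segment, which gives the factor multiplying \(\epsilon ^{-1}\) in (iii).

```python
import numpy as np

def q_norm(v, q_diag):
    """||v||_Q for a diagonal Q (assumed: ||v||_Q^2 = sum_i q_i v_i^2)."""
    return float(np.sqrt(np.sum(q_diag * v * v)))

def straight_path_cost(x, y, eta, q_diag, n_grid=2001):
    """eta * int_0^T ||psi(s)||_Q ds + T along the unit-Q-speed segment from x to y."""
    T = q_norm(y - x, q_diag)
    s = np.linspace(0.0, T, n_grid)
    path = x[None, :] + (s[:, None] / T) * (y - x)[None, :]
    norms = np.sqrt(np.sum(q_diag[None, :] * path**2, axis=1))
    ds = T / (n_grid - 1)
    # Trapezoid rule written out by hand.
    integral = ds * (norms.sum() - 0.5 * (norms[0] + norms[-1]))
    return eta * integral + T

def bracket_bound(x, y, eta, q_diag):
    """||x - y||_Q * (eta * max(||x||_Q, ||y||_Q) + 1), as in item (iii)."""
    largest = max(q_norm(x, q_diag), q_norm(y, q_diag))
    return q_norm(x - y, q_diag) * (eta * largest + 1.0)

rng = np.random.default_rng(2)
q_diag = rng.uniform(0.5, 2.0, size=3)
x, y = rng.standard_normal(3), rng.standard_normal(3)
eta = 0.3

cost = straight_path_cost(x, y, eta, q_diag)
bound = bracket_bound(x, y, eta, q_diag)
print(cost <= bound)
```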
Lemma 9
Assume M1 and M2.
Then for all \(h \in \left( 0, h_\ell \wedge (4/C_U) \right) \), there exist \(\epsilon , \eta , \tau >0\) such that for all \(x,y \in \mathbb {R}^d\) with \(d_{\eta ,\epsilon }(x,y) <1\),
with \(d_{\eta ,\epsilon }\) given by (32). In particular, for all \(x,y \in \mathbb {R}^d\),
Proof
For ease of notation, we simply write \(\mathbf {K}\) for \(\mathbf{K}_\mathrm{M}\). Let \(x,y \in \mathbb {R}^d\) be such that \(d_{\eta ,\epsilon }(x,y) <1\). Then, if \(\left\| y \right\| _Q \le R_h\),
where we used Lemma 8-(iv). Therefore, since we will choose \(\epsilon \) small enough, we can assume \(\epsilon \le 1\), and we end up with two cases: either \(x,y \in \mathrm{B }_Q\left( 0, R_h +2\right) \) or \(x,y \not \in \mathrm{B }_Q\left( 0,R_h\right) \). Let \((\mathbf {X}_1,\mathbf {Y}_1)\) be the basic coupling between \(P(x,\cdot )\) and \(P(y,\cdot )\), and let \(Z_0\) and \(U\) be, respectively, the Gaussian and uniform variables used for the basic coupling. Let \(\mathbb {P}\left[ \cdot \right] \) and \(\mathbb {E} \left[ \cdot \right] \) be the probability and the expectation over \(Z_1\) and \(U_1\). Set
On the event \(\fancyscript{A}(x,y)\), the moves are both accepted so that \(\mathbf {X}_1 = X_1(x)\) and \(\mathbf {Y}_1 = Y_1(y)\); on the event \(\fancyscript{R}(x,y)\), the moves are both rejected so that \(\mathbf {X}_1 =x\) and \(\mathbf {Y}_1 =y\). Then, for every event \(\fancyscript{I}\), it holds that
where we have used that \(d_{\eta ,\epsilon }\) is bounded by \(1\). First, by Lemma 8-(v), since \(\delta _\epsilon (x,y) \le \epsilon \le 1\), there exist \(\eta _1,\tau _1 >0\) such that for all \(\eta < \eta _1\)
Let us define \(\tau _2 = (1-(1-\mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right] ) \tau _1)\). We claim that \(\tau _2 >0\). Indeed, if \(x,y \in \mathrm{B }_Q\left( 0,R_h + 2\right) \), since \(\varUpsilon \) and \(Q^{-1}\nabla \varUpsilon \) are continuous by assumption, we have by Lemma 6-(1)
If \(x,y \not \in \mathrm{B }_Q\left( 0,R_h\right) \), by M2, \(\mathbb {P}\left[ \fancyscript{A}(x,y) \left| \fancyscript{I} \right. \right] > a_l >0\). Therefore, as \(\mathbb {P}\left[ \fancyscript{R}(x,y) \cap \fancyscript{I} \right] \le \mathbb {P}\left[ \fancyscript{I} \right] - \mathbb {P}\left[ \fancyscript{I}\cap \fancyscript{A}(x,y) \right] \)
By Lemma 8-(ii)–(v), since \(\delta _\epsilon (x,y) \le \epsilon \le 1\) and for all \(a,b,c \in \mathbb {R}_+\), \((a+b)/(a+c) \le 1 \vee (b/c)\),
Also, by the dominated convergence theorem and Lemma 8-(v), for all \(\kappa >0\) there exists \(\eta _2 \in \left( 0,\eta _1\right] \) such that for all \(\eta \in \left( 0, \eta _2\right] \),
Therefore, since \(\mathbb {P}\left[ \fancyscript{I} \right] >0\), using (51)–(52) and choosing \(\kappa \) small enough, there exist \(\tau _3, \eta _2 >0\) such that for all \(\eta \in \left( 0,\eta _2\right] \)
Next, by Lemma 7 and Lemma 8-(iv), since \(\delta _\epsilon (x,y) \le 1\), there exists \(C\) such that
Set
Therefore by (50)–(53) and (54), for \(\eta \leftarrow \eta _2\) and \(\epsilon \leftarrow \epsilon _1\), for all \(x,y \in \mathbb {R}^d\),
\(\square \)
Proof of Proposition 5
For ease of notation, we simply write \(\mathbf {K}\) for \(\mathbf{K}_\mathrm{M}\). Let \(\{(\mathbf{X}_n,\,\mathbf{Y}_n), n\in \mathbb {N}\}\) be a Markov chain with Markov kernel \(\mathbf {K}\). For all \(n \in \mathbb {N}^*\), we denote by \(Z_n\) and \(U_n\), respectively, the common Gaussian variable and uniform variable sampled to build \((\mathbf {X}_n,\mathbf {Y}_n)\). Let \(\mathbb {P}\left[ \cdot \right] \) and \(\mathbb {E} \left[ \cdot \right] \) be the probability and the expectation over \(\left\{ Z_n , U_n ; \;n \in \mathbb {N} \right\} \). Note that by definition the variables \(\left\{ Z_n , U_n ; \;n \in \mathbb {N} \right\} \) are independent. Under M1 and M2, by definition of \(Q\) and Lemma 9, conditions (i) and (ii) of Definition 1 are satisfied. In addition, there exist \(\epsilon ,\eta , \tau >0\) such that for all \(x,y \in \mathbb {R}^d\) with \(d_{\eta ,\epsilon }(x,y) <1\),
Let \(R >0 \), and \(x,y\) be in \(\mathrm{B }_Q\left( 0,R\right) \). Assume first \(d_{\eta ,\epsilon }(x,y) < 1\). Then by (57) and Lemma 2, for every \(n \in \mathbb {N}^*\),
Consider now the case \(d_{\eta ,\epsilon }(x,y) = 1\). Let \(\{(\mathbf{X}_n,\,\mathbf{Y}_n), n\in \mathbb {N}\}\) be the Markov chain with Markov kernel \(\mathbf {K}\) starting in \((x,y)\). Let \(n \in \mathbb {N}^*\) and denote for all \(1 \le i \le n\)
where \(\fancyscript{O}\) is given by (26). On the set \(\widetilde{\fancyscript{A}}\,^{i}(x,y,i)\), for all \(1 \le j \le i\), \(\mathbf {X}_j= \fancyscript{O}(\mathbf {X}_{j-1},Z_j), \mathbf {Y}_j = \fancyscript{O}(\mathbf {Y}_{j-1},Z_j) \) and \( \left\| \mathbf {X}_j \right\| _Q \vee \left\| \mathbf {Y}_j \right\| _Q \le 2R \). Then, since by Lemma 8-(iii),
by Lemma 6-(1) on \(\widetilde{\fancyscript{A}}^n(x,y,n)\) it holds
This inequality and \(d_{\eta ,\epsilon }\le 1\) yield
As \(\nu \in \left( 0,1\right) \), there exists \(m\) such that \(\epsilon ^{-1} \nu ^m 2 R (2 \eta R + 1) < 1\). To conclude, it remains to lower bound \(\mathbb {P}\left[ \widetilde{\fancyscript{A}}^m(x,y,m) \right] \) by a positive constant, which is done through the following inequalities, using the independence of the random variables \(\left\{ Z_i , U_i ; \;i \in \mathbb {N}^* \right\} \).
For all \(1 \le i \le m\), on the event \( \bigcap _{j \le i} \fancyscript{I}_j(m)\), it holds
where \(\delta \in (0,1)\), since \(G\) is continuous by M1. Therefore, since \(Z_i\) is independent of \(\widetilde{\fancyscript{A}}^{i-1}(x,y,m)\), we have
An immediate induction leads to
Plugging this result into (60) and (58) shows that there exists \(s \in \left( 0,1\right) \) such that for all \(x,y \in \mathrm{B }_Q\left( 0,R\right) \), \(\mathbf {K}^m d_{\eta ,\epsilon }(x,y) \le s d_{\eta ,\epsilon }(x,y)\). \(\square \)
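The number of steps \(m\) used in the proof above is explicit once \(\epsilon \), \(\eta \), \(R\) and \(\nu \) are fixed: it is any integer with \(\epsilon ^{-1} \nu ^m 2 R (2 \eta R + 1) < 1\). A minimal sketch of how the smallest such \(m\) could be computed is given below; the parameter values are illustrative assumptions, not values from the paper, and `smallest_m` is a hypothetical helper.

```python
def smallest_m(eps, eta, R, nu):
    """Smallest integer m >= 1 with nu**m * 2*R*(2*eta*R + 1) / eps < 1."""
    if not (0.0 < nu < 1.0):
        raise ValueError("nu must lie in (0, 1)")
    if eps <= 0.0 or eta <= 0.0 or R <= 0.0:
        raise ValueError("eps, eta and R must be positive")
    const = 2.0 * R * (2.0 * eta * R + 1.0) / eps
    m = 1
    # nu < 1, so const * nu**m decreases to 0 and the loop terminates.
    while const * nu**m >= 1.0:
        m += 1
    return m

# Illustrative values only (assumptions, not from the paper).
eps, eta, R, nu = 0.5, 0.1, 2.0, 0.9
m = smallest_m(eps, eta, R, nu)
print(m)
```

Since the constant grows like \(R^2\) while the decay is geometric in \(m\), the required number of steps grows only logarithmically with the radius \(R\).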
Cite this article
Durmus, A., Moulines, É. Quantitative bounds of convergence for geometrically ergodic Markov chain in the Wasserstein distance with application to the Metropolis Adjusted Langevin Algorithm. Stat Comput 25, 5–19 (2015). https://doi.org/10.1007/s11222-014-9511-z
DOI: https://doi.org/10.1007/s11222-014-9511-z