
Rate-optimal refinement strategies for local approximation MCMC

Abstract

Many Bayesian inference problems involve target distributions whose density functions are computationally expensive to evaluate. Replacing the target density with a local approximation based on a small number of carefully chosen density evaluations can significantly reduce the computational expense of Markov chain Monte Carlo (MCMC) sampling. Moreover, continual refinement of the local approximation can guarantee asymptotically exact sampling. We devise a new strategy for balancing the decay rate of the bias due to the approximation with that of the MCMC variance. We prove that the error of the resulting local approximation MCMC (LA-MCMC) algorithm decays at roughly the expected \(1/\sqrt{T}\) rate, and we demonstrate this rate numerically. We also introduce an algorithmic parameter that guarantees convergence given very weak tail bounds, significantly strengthening previous convergence results. Finally, we apply LA-MCMC to a computationally intensive Bayesian inverse problem arising in groundwater hydrology.


Notes

  1. We note that our \({\widehat{\pi }}(x) = \exp \widehat{{\mathcal {L}}}(x)\) often fails to be a probability density. We resolve this technical issue in Sect. 3; somewhat surprisingly, it has very little impact on the heuristic calculations that follow.

References

  • Al-Murad, M., Zubari, W.K., Uddin, S.: Geostatistical characterization of the transmissivity: an example of Kuwait aquifers. Water 10(7), 828 (2018)

  • Angelikopoulos, P., Papadimitriou, C., Koumoutsakos, P.: X-TMCMC: adaptive kriging for Bayesian inverse modeling. Comput. Methods Appl. Mech. Eng. 289, 409–428 (2015)

  • Blanco, J.L., Rai, P.K.: nanoflann: a C++ header-only fork of FLANN, a library for nearest neighbor (NN) search with kd-trees. https://github.com/jlblancoc/nanoflann (2014)

  • Blatman, G., Sudret, B.: Adaptive sparse polynomial chaos expansion based on least angle regression. J. Comput. Phys. 230(6), 2345–2367 (2011)

  • Bliznyuk, N., Ruppert, D., Shoemaker, C.A.: Local derivative-free approximation of computationally expensive posterior densities. J. Comput. Graph. Stat. 21(2), 476–495 (2012)

  • Chkrebtii, O.A., Campbell, D.A., Calderhead, B., Girolami, M.A., et al.: Bayesian solution uncertainty quantification for differential equations. Bayesian Anal. 11(4), 1239–1267 (2016)

  • Christen, J.A., Fox, C.: Markov chain Monte Carlo using an approximation. J. Comput. Graph. Stat. 14(4), 795–810 (2005)

  • Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization, vol. 8. SIAM, Philadelphia (2009)

  • Conrad, P.R., Marzouk, Y.M., Pillai, N.S., Smith, A.: Accelerating asymptotically exact MCMC for computationally intensive models via local approximations. J. Am. Stat. Assoc. 111(516), 1591–1607 (2016)

  • Conrad, P.R., Davis, A.D., Marzouk, Y.M., Pillai, N.S., Smith, A.: Parallel local approximation MCMC for expensive models. SIAM/ASA J. Uncertain. Quantif. 6(1), 339–373 (2018)

  • Constantine, P.G., Kent, C., Bui-Thanh, T.: Accelerating Markov chain Monte Carlo with active subspaces. SIAM J. Sci. Comput. 38(5), A2779–A2805 (2016)

  • Cotter, S.L., Dashti, M., Stuart, A.M.: Approximation of Bayesian inverse problems for PDEs. SIAM J. Numer. Anal. 48(1), 322–345 (2010)

  • Cui, T., Fox, C., O'Sullivan, M.: Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis–Hastings algorithm. Water Resources Res. 47(10) (2011)

  • Cui, T., Martin, J., Marzouk, Y.M., Solonen, A., Spantini, A.: Likelihood-informed dimension reduction for nonlinear inverse problems. Inverse Prob. 30(11), 114015 (2014)

  • Cui, T., Marzouk, Y., Willcox, K.: Scalable posterior approximations for large-scale Bayesian inverse problems via likelihood-informed parameter and state reduction. J. Comput. Phys. 315, 363–387 (2016)

  • Dodwell, T.J., Ketelsen, C., Scheichl, R., Teckentrup, A.L.: A hierarchical multilevel Markov chain Monte Carlo algorithm with applications to uncertainty quantification in subsurface flow. SIAM/ASA J. Uncertain. Quantif. 3(1), 1075–1108 (2015)

  • Haario, H., Saksman, E., Tamminen, J.: An adaptive Metropolis algorithm. Bernoulli 7(2), 223–242 (2001)

  • Janetti, E.B., Riva, M., Straface, S., Guadagnini, A.: Stochastic characterization of the Montalto Uffugo research site (Italy) by geostatistical inversion of moment equations of groundwater flow. J. Hydrol. 381(1–2), 42–51 (2010)

  • Jardani, A., Dupont, J.P., Revil, A., Massei, N., Fournier, M., Laignel, B.: Geostatistical inverse modeling of the transmissivity field of a heterogeneous alluvial aquifer under tidal influence. J. Hydrol. 472, 287–300 (2012)

  • Jasra, A., Kamatani, K., Law, K.J., Zhou, Y.: A multi-index Markov chain Monte Carlo method. Int. J. Uncertain. Quantif. 8(1) (2018)

  • Johndrow, J.E., Mattingly, J.C., Mukherjee, S., Dunson, D.: Optimal approximating Markov chains for Bayesian inference. arXiv preprint arXiv:1508.03387 (2015)

  • Johnson, S.G.: The NLopt nonlinear-optimization package (2014)

  • Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems, vol. 160. Springer, Berlin (2006)

  • Kaipio, J., Somersalo, E.: Statistical inverse problems: discretization, model reduction and inverse crimes. J. Comput. Appl. Math. 198(2), 493–504 (2007)

  • Kohler, M.: Universal consistency of local polynomial kernel regression estimates. Ann. Inst. Stat. Math. 54(4), 879–899 (2002)

  • Łatuszyński, K., Rosenthal, J.S.: The containment condition and AdapFail algorithms. J. Appl. Probab. 51(4), 1189–1195 (2014)

  • Li, J., Marzouk, Y.M.: Adaptive construction of surrogates for the Bayesian solution of inverse problems. SIAM J. Sci. Comput. 36(3), A1163–A1186 (2014)

  • Llorente, F., Martino, L., Read, J., Delgado, D.: A survey of Monte Carlo methods for noisy and costly densities with application to reinforcement learning. arXiv preprint arXiv:2108.00490 (2021)

  • Marzouk, Y., Xiu, D.: A stochastic collocation approach to Bayesian inference in inverse problems. Commun. Comput. Phys. 6(4), 826–847 (2009)

  • Matott, L.S.: Screening-level sensitivity analysis for the design of pump-and-treat systems. Groundwater Monitor. Remediat. 32(2), 66–80 (2012)

  • Medina-Aguayo, F., Rudolf, D., Schweizer, N.: Perturbation bounds for Monte Carlo within Metropolis via restricted approximations. arXiv preprint arXiv:1809.09547 (2018)

  • Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, Berlin (2012)

  • 40 CFR Part 191 Subparts B and C and 40 CFR 194 Monitoring Implementation Plan (Rev. 3). Tech. rep., Waste Isolation Pilot Plant (WIPP), Carlsbad, NM (United States); Washington (2003)

  • Pillai, N.S., Smith, A.: Ergodicity of approximate MCMC chains with applications to large data sets. arXiv preprint arXiv:1405.0182 (2014)

  • Pool, M., Carrera, J., Alcolea, A., Bocanegra, E.: A comparison of deterministic and stochastic approaches for regional scale inverse modeling on the Mar del Plata aquifer. J. Hydrol. 531, 214–229 (2015)

  • Roberts, G.O., Rosenthal, J.S.: Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Probab. 44(2), 458–475 (2007)

  • Roberts, G.O., Tweedie, R.L.: Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83(1), 95–110 (1996)

  • Rote, G., Tichy, R.F.: Quasi-Monte Carlo methods and the dispersion of point sequences. Math. Comput. Model. 23(8–9), 9–23 (1996)

  • Rudolf, D., Schweizer, N., et al.: Perturbation theory for Markov chains via Wasserstein distance. Bernoulli 24(4A), 2610–2639 (2018)

  • Schillings, C., Schwab, C.: Sparsity in Bayesian inversion of parametric operator equations. Inverse Prob. 30(6), 065007 (2014)

  • Sherlock, C., Golightly, A., Henderson, D.A.: Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods. J. Comput. Graph. Stat. 26(2), 434–444 (2017). https://doi.org/10.1080/10618600.2016.1231064

  • Stone, C.J.: Consistent nonparametric regression. Ann. Stat. 595–620 (1977)

  • Stuart, A., Teckentrup, A.: Posterior consistency for Gaussian process approximations of Bayesian posterior distributions. Math. Comput. 87(310), 721–753 (2018)

  • Willmann, M., Carrera, J., Sánchez-Vila, X., Vázquez-Suñé, E.: On the meaning of the transmissivity values obtained from recovery tests. Hydrogeol. J. 15(5), 833–842 (2007)

  • Wolff, U. (ALPHA Collaboration): Monte Carlo errors with less errors. Comput. Phys. Commun. 156(2), 143–153 (2004)

  • Zahm, O., Cui, T., Law, K., Spantini, A., Marzouk, Y.: Certified dimension reduction in nonlinear Bayesian inverse problems. Math. Comput. (2022)


Author information


Corresponding author

Correspondence to Andrew D. Davis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

AS was supported by NSERC. AD and YM were supported by the SciDAC program of the DOE Office of Advanced Scientific Computing Research.

Theoretical results


We collect here the theoretical results of the paper. Throughout this section, we deal only with the case in which the underlying proposal distribution \(q_{t}\) does not change with t. To handle the small adaptations typical in practice, we believe the following framework can be combined with, e.g., the approach of Roberts and Rosenthal (2007); however, doing so would lengthen the paper significantly, and these adaptations are not central to our approach.

General bounds on non-Markovian approximate MCMC algorithms

Proceeding more formally, let \(\{{\hat{X}}_t, {\hat{K}}_t, {\mathcal {F}}_t\}_{t \ge 0}\) be a triple satisfying:

  1. \(\{{\hat{X}}_t\}_{t \ge 0}\) is a sequence of random variables on \({\mathbb {R}}^{d}\);

  2. \(\{{\hat{K}}_t\}_{t \ge 0}\) is a (typically random) sequence of transition kernels on \({\mathbb {R}}^{d}\);

  3. \(\{{\mathcal {F}}_t\}_{t \ge 0}\) is a filtration, and \(\{{\hat{X}}_t, {\hat{K}}_t\}_{t \ge 0}\) is adapted to this filtration;

  4. the three agree in the sense that

     $$\begin{aligned} {\mathbb {P}}[{\hat{X}}_{s+1} \in A \vert {\mathcal {F}}_{s}] = {\hat{K}}_s({\hat{X}}_s, A) \end{aligned}$$
     (38)

     for all \(s \ge 0\) and all measurable A. Note that, in particular, both left- and right-hand sides are \({\mathcal {F}}_s\)-measurable random variables in [0, 1]. A schematic example of such a triple is sketched immediately after this list.
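To make the setup concrete, the following Python sketch (ours, not the implementation used in the paper) shows one generic way such a triple arises: a Metropolis–Hastings chain whose acceptance step uses a surrogate log-density refined from past, expensive evaluations of the true log-target. The Gaussian target, nearest-neighbor surrogate, proposal scale, and refinement schedule are illustrative placeholders only.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Stand-in for an expensive log-density (standard Gaussian, for illustration).
    return -0.5 * float(x @ x)

class LocalSurrogate:
    """Toy surrogate: returns the stored log-density value at the nearest
    previously evaluated point. This is only an F_t-measurable placeholder,
    not the paper's local polynomial construction."""
    def __init__(self):
        self.pts, self.vals = [], []

    def refine(self, x):
        # One expensive evaluation of the true log-target.
        self.pts.append(np.array(x, dtype=float))
        self.vals.append(log_target(x))

    def __call__(self, x):
        dists = [np.linalg.norm(x - p) for p in self.pts]
        return self.vals[int(np.argmin(dists))]

def approx_mh_step(x, surrogate, step=0.5):
    """One draw from the current approximate kernel: random-walk proposal,
    accepted or rejected using the surrogate log-density."""
    y = x + step * rng.standard_normal(x.shape)
    log_alpha = surrogate(y) - surrogate(x)
    return y if np.log(rng.uniform()) < log_alpha else x

# The chain {X_t}: the kernel used at step t depends on every refinement made
# so far, so the process is adapted to the filtration generated by the past
# states and past expensive evaluations, but it is not itself Markov.
x = np.zeros(2)
surrogate = LocalSurrogate()
surrogate.refine(x)
for t in range(1, 1001):
    if rng.uniform() < t ** (-0.5):
        # Placeholder refinement schedule: occasionally evaluate the true
        # log-target near the current state and update the surrogate.
        surrogate.refine(x + 0.5 * rng.standard_normal(x.shape))
    x = approx_mh_step(x, surrogate)
```

Here the kernel \({\hat{K}}_t\) is determined by the surrogate available at step t, so condition (38) holds with \({\mathcal {F}}_t\) generated by the past states and the accumulated target evaluations.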

In practice, \({\mathcal {F}}_{s}\) is generated by our sequence of approximations to the true log-target. We use the following quantitative assumptions:

Assumption 4

(Lyapunov inequality) There exists \(V: {\mathbb {R}}^{d} \rightarrow [1, \infty )\) and constants \(0 < \alpha \le 1\) and \(0 \le \beta < \infty \) so that

$$\begin{aligned} ({\hat{K}}_s V)({\hat{X}}_{s}) \le (1-\alpha ) V({\hat{X}}_{s}) + \beta \end{aligned}$$
(39)

and

$$\begin{aligned} (KV)(x) \le (1-\alpha ) V(x) + \beta \end{aligned}$$
(40)

for all \(s \ge 0\). The first inequality should hold deterministically; note that it is an \({\mathcal {F}}_{s}\)-measurable event.
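Such Lyapunov functions are available for many standard kernels; for instance, for random-walk Metropolis chains on targets with sufficiently light tails, functions proportional to \(\pi (x)^{-1/2}\) are a common choice of V (see, e.g., Roberts and Tweedie 1996). We also record the elementary consequence of (40), obtained by iterating the drift inequality, that is used repeatedly below: for any \(n \ge 0\) and \(x \in {\mathbb {R}}^{d}\),

$$\begin{aligned} (K^{n} V)(x) \le (1-\alpha )^{n} V(x) + \beta \sum _{j=0}^{n-1} (1-\alpha )^{j} \le (1-\alpha )^{n} V(x) + \frac{\beta }{\alpha }. \end{aligned}$$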

Assumption 5

(Good approximation) Let Assumption 1 or 4 hold. There exists a monotonically decreasing function \(\delta :[0, \infty ) \rightarrow [0, 0.5)\) so that

$$\begin{aligned} \Vert K(x, \cdot ) - {\hat{K}}_s(x, \cdot )\Vert _{TV} \le \delta (s) V(x) \end{aligned}$$
(41)

for all \(s \ge 0\) and \(x \in {\mathbb {R}}^{d}\). Again, this inequality should hold deterministically, which is an \({\mathcal {F}}_{s}\)-measurable event. For notational convenience, we define \(\delta (s) = \delta (0)\) for all \(s < 0\).

Initial coupling bounds

The following is our main technical lemma. The bound it provides is not monotone in the time s; this will be remedied in the applications below.

Lemma 1

Let Assumptions 24, and 5 hold. There exists a constant \(0< C < \infty \) depending only on \(\alpha \), \(\beta \), R, and \(\gamma \) so that for \(x \in {\mathbb {R}}^{d}\) and triple \(\{{\hat{X}}_t, {\hat{K}}_t, {\mathcal {F}}_t\}_{t \ge 0}\) started at \({\hat{X}}_0 = x\), we have

$$\begin{aligned} \Vert {\mathbb {P}}[{\hat{X}}_s \in \cdot ] - \pi (\cdot ) \Vert _{TV} \le {\left\{ \begin{array}{ll} 1 &{} \text{ if } s \le C_0 \\ C \delta (0) s &{} \text{ if } s > C_0, \end{array}\right. } \end{aligned}$$

where \(C_0 = C \log {(\delta (0)^{-1}V(x))}\).

Proof

Define the “small set”

$$\begin{aligned} {\mathcal {C}} = \left\{ y : V(y) \le \frac{4 \beta }{\alpha }\right\} \end{aligned}$$
(42)

and the associated hitting time

$$\begin{aligned} \tau _{{\mathcal {C}}} = \min {\{t : {\hat{X}}_t \in {\mathcal {C}}\}}. \end{aligned}$$
(43)

Denote by \(T_b \ge 0\) a “burn-in” time whose value will be fixed toward the end of the proof.

By the triangle inequality, for all measurable \(A \subset {\mathbb {R}}^{d}\)

$$\begin{aligned}&\vert {\mathbb {P}}[{\hat{X}}_s \in A] - \pi (A) \vert \\&\quad \le \vert {\mathbb {P}}[{\hat{X}}_s \in A, \tau _{{\mathcal {C}}} \le T_b] - \pi (A) {\mathbb {P}}[\tau _{{\mathcal {C}}} \le T_b] \vert \\&\qquad + \vert {\mathbb {P}}[{\hat{X}}_s \in A, \tau _{{\mathcal {C}}}> T_b] - \pi (A) {\mathbb {P}}[\tau _{{\mathcal {C}}}> T_b] \vert \\&\quad \le \vert {\mathbb {P}}[{\hat{X}}_s \in A, \tau _{{\mathcal {C}}} \le T_b] - \pi (A) {\mathbb {P}}[\tau _{{\mathcal {C}}} \le T_b] \vert \\&\qquad + {\mathbb {P}}[\tau _{{\mathcal {C}}} > T_b] \end{aligned}$$

To bound the first term, note that

$$\begin{aligned}&\vert {\mathbb {P}}[{\hat{X}}_s \in A, \tau _{{\mathcal {C}}} \le T_b] - \pi (A) {\mathbb {P}}[\tau _{{\mathcal {C}}} \le T_b] \vert \nonumber \\&\quad \le \sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\vert {\mathbb {P}}[{\hat{X}}_s\in A \vert {\hat{X}}_u = y, \tau _{{\mathcal {C}}}=u] - \pi (A) \vert } \nonumber \\&\quad \le \sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\vert {\mathbb {P}}[{\hat{X}}_s\in A \vert {\hat{X}}_u = y, \tau _{{\mathcal {C}}}=u] - K^{s-u-1}(y, A) \vert } \nonumber \\&\qquad + \sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\vert K^{s-u-1}(y, A) - \pi (A) \vert }. \end{aligned}$$
(44)

Assumption 2 gives

$$\begin{aligned} \sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\left| K^{s-u-1}(y, A) - \pi (A)\right| } \le R \gamma ^{s-T_b-1} \end{aligned}$$

and Assumption 5 gives

$$\begin{aligned}&\sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\vert {\mathbb {P}}[{\hat{X}}_s\in A \vert {\hat{X}}_u = y, \tau _{{\mathcal {C}}}=u] - K^{s-u-1}(y, A) \vert } \\&\quad \le \sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\sum _{t=u}^{s-1} \delta (t) (K^{s-t-1} V)(y)}. \end{aligned}$$

Furthermore,

$$\begin{aligned}&\sup _{y \in {\mathcal {C}}, 0 \le u \le T_b}{\sum _{t=u}^{s-1} \delta (t) (K^{s-t-1} V)(y)} + R \gamma ^{s-T_b-1} \\&\quad \le \sup _{y \in {\mathcal {C}}}{\sum _{t=0}^{s-1} \delta (t) (K^{s-t-1} V)(y)} + R \gamma ^{s-T_b-1}. \end{aligned}$$

Substituting this back into (44) and by Assumption 4,

$$\begin{aligned}&\vert {\mathbb {P}}[{\hat{X}}_s \in A, \tau _{{\mathcal {C}}} \le T_b] - \pi (A) {\mathbb {P}}[\tau _{{\mathcal {C}}} \le T_b] \vert \nonumber \\&\quad \le \sup _{y \in {\mathcal {C}}}{\sum _{t=0}^{s-1} \delta (t) ((1-\alpha )^{s-t-1} V(y) + \beta /\alpha )} + R \gamma ^{s-T_b-1} \end{aligned}$$
(45a)
$$\begin{aligned}&\quad \le \sum _{t=0}^{s-1} \delta (t) ((1-\alpha )^{s-t-1} 4 \beta /\alpha + \beta /\alpha ) + R \gamma ^{s-T_b-1} \end{aligned}$$
(45b)
$$\begin{aligned}&\quad \le \delta (0) 5 s \beta /\alpha + R \gamma ^{s-T_b-1}. \end{aligned}$$
(45c)

To bound the second term in (44), recall from Assumption 4 that

$$\begin{aligned}&{\mathbb {E}}[V({\hat{X}}_{t+1}) {\mathbf {1}}_{\tau _{{\mathcal {C}}}> t} \vert {\mathcal {F}}_t] \\&\quad \le ((1-\alpha )V({\hat{X}}_t) + \beta ) {\mathbf {1}}_{V({\hat{X}}_t)>4 \beta /\alpha } \\&\quad \le \left( 1-\frac{3\alpha }{4}\right) V({\hat{X}}_t) + \left( \beta - \frac{\alpha }{4} V({\hat{X}}_t)\right) {\mathbf {1}}_{V({\hat{X}}_t)>4 \beta /\alpha } \\&\quad \le \left( 1-\frac{3\alpha }{4}\right) V({\hat{X}}_t) + \left( \beta - \frac{\alpha }{4} \frac{4 \beta }{\alpha }\right) {\mathbf {1}}_{V({\hat{X}}_t)>4 \beta /\alpha } \\&\quad \le \left( 1-\frac{3\alpha }{4}\right) V({\hat{X}}_t) \end{aligned}$$

for all \(t \ge 0\). Iterating, we find by induction on t that

$$\begin{aligned} {\mathbb {E}}[V({\hat{X}}_{t}) {\mathbf {1}}_{\tau _{{\mathcal {C}}} > t}] \le \left( 1-\frac{3\alpha }{4}\right) ^{t} V({\hat{X}}_0) = \left( 1-\frac{3\alpha }{4}\right) ^{t} V(x). \end{aligned}$$

Thus, by Markov’s inequality,

$$\begin{aligned} {\mathbb {P}}[\tau _{{\mathcal {C}}}>T_b]= & {} {\mathbb {P}}\left[ V({\hat{X}}_{\tau _{{\mathcal {C}}}}) {\mathbf {1}}_{\tau _{{\mathcal {C}}}> T_b}>\frac{4 \beta }{\alpha } \right] \nonumber \\\le & {} \frac{\alpha }{4 \beta }\left( 1-\frac{3\alpha }{4}\right) ^{T_b} V(x). \end{aligned}$$
(46)

Combining (45) and (46), we have shown that

$$\begin{aligned}&\vert {\mathbb {P}}[{\hat{X}}_s \in A] - \pi (A) \vert \le \delta (0) \frac{5 s \beta }{\alpha } + R \gamma ^{s-T_b-1} \nonumber \\&\quad + \frac{\alpha }{4 \beta }\left( 1-\frac{3\alpha }{4}\right) ^{T_b} V(x). \end{aligned}$$
(47)

Finally, we can choose \(T_b\). Set

$$\begin{aligned} T(s) = \max {\left\{ t : \max \left( R \gamma ^{s-t-1}, \, \frac{\alpha }{4\beta }\left( 1-\frac{3\alpha }{4}\right) ^{t} V(x)\right) \le \frac{1}{2}\delta (0)\right\} } \end{aligned}$$

and define \(T_b = \lfloor T(s) \rfloor \) when \(0< T(s) < \infty \) and \(T_b = 0\) otherwise. Define \(S=\min {\{s : T(s) \in (0, \infty )\}}\). Noting that \(S = \varTheta (\log {(\delta (0)^{-1})}+\log {(V(x))})\) for fixed \(\alpha \), \(\beta \), R, and \(\gamma \) completes the proof. \(\square \)

We strengthen Lemma 1 by first observing that if \({\hat{X}}_0\) satisfies \(V({\hat{X}}_0) \le 4 \beta / \alpha \), then

$$\begin{aligned} {\mathbb {E}}[V({\hat{X}}_1)]\le & {} (1-\alpha ) {\mathbb {E}}[ V({\hat{X}}_0)]+ \beta \le (1-\alpha ) \frac{4 \beta }{\alpha } + \beta \nonumber \\\le & {} \frac{4 \beta }{\alpha }. \end{aligned}$$
(48)

Thus, by induction, if \({\mathbb {E}}[V({\hat{X}}_{0})] \le 4 \beta / \alpha \), then \({\mathbb {E}}[V({\hat{X}}_s)] \le 4 \beta / \alpha \) for all times \(s \ge 0\). Using this (and possibly relabeling the starting time to the quantity denoted by \(T_{0}(s)\)), Lemma 1 admits the following immediate strengthening:

Lemma 2

Let Assumptions 24, and 5 hold. There exists a constant \(0< C < \infty \) depending only on \(\alpha \), \(\beta \), R, and \(\gamma \) so that for all starting distributions \(\mu \) on \({\mathbb {R}}^{d}\) with \(\mu (V) \le 4 \beta / \alpha \) and triple \(\{{\hat{X}}_t, {\hat{K}}_t, {\mathcal {F}}_t\}_{t \ge 0}\) started at \({\hat{X}}_0 \sim \mu \), we have

$$\begin{aligned}&\Vert {\mathbb {P}}[{\hat{X}}_s \in \cdot ] - \pi (\cdot ) \Vert _{TV} \\&\le {\left\{ \begin{array}{ll} 1, &{} s \le C_0 \\ C \delta ( T_{0}(s) ) \log {(\delta (T_{0}(s))^{-1})}, &{} s > C_0, \end{array}\right. } \end{aligned}$$

where \(C_0 = C \delta (0) \log {(\delta (0)^{-1})}\) and

$$\begin{aligned} T_{0}(s) = s - C \delta (0) \log {(\delta (0)^{-1})}. \end{aligned}$$
(49)

Application to bounds on mean-squared error

Recall that \(f : {\mathbb {R}}^{d} \rightarrow [-1,1]\) with \(\pi (f) = 0\). We apply Lemma 2 to obtain the following bound on the Monte Carlo bias:

Lemma 3

(Bias estimate) Let Assumptions 24, and 5 hold. There exists a constant \(0< C < \infty \) depending only on \(\alpha \), \(\beta \), R, and \(\gamma \) so that for all starting points \(x \in {\mathbb {R}}^{d}\) with \(V(x) \le 4 \beta / \alpha \) and triple \(\{{\hat{X}}_t, {\hat{K}}_t, {\mathcal {F}}_t\}_{t \ge 0}\) started at \({\hat{X}}_0 = x\) we have

$$\begin{aligned}&\left| {\mathbb {E}}\left[ \frac{1}{T} \sum _{t=1}^{T} f({\hat{X}}_t) \right] \right| \\&\quad \le \ {\left\{ \begin{array}{ll} 1, &{} T \le C_0 \\ \frac{C_0}{T} + \frac{C}{T} \displaystyle \sum _{s=C_0}^{T} \delta (T_{0}(s)) \log {(\delta (T_{0}(s))^{-1})} &{} T > C_0, \end{array}\right. } \end{aligned}$$

where \(C_0 = C \delta (0) \log {(\delta (0)^{-1})}\).

Proof

In the notation of Lemma 2, we have for \(T > C_0\) sufficiently large

$$\begin{aligned} \left| {\mathbb {E}}\left[ \frac{1}{T} \sum _{t=1}^{T} f({\hat{X}}_t) \right] \right|\le & {} \frac{1}{T} \sum _{t=1}^{C_0} |{\mathbb {E}}[f({\hat{X}}_{t})]| + \frac{1}{T} \sum _{t=C_0}^{T} |{\mathbb {E}}[f({\hat{X}}_{t})]| \\\le & {} \frac{C_0}{T} + \frac{C}{T} \sum _{s=C_0}^{T} \delta (T_{0}(s)) \log (\delta (T_{0}(s))^{-1}). \end{aligned}$$

\(\square \)

We have a similar bound for the Monte Carlo variance:

Lemma 4

(Covariance estimate) Let Assumptions 24, and 5 hold. There exists a constant \(0< C < \infty \) depending only on \(\alpha \), \(\beta \), R, and \(\gamma \) so that for all starting points \(x \in {\mathbb {R}}^{d}\) with \(V(x) \le 4 \beta / \alpha \) and triple \(\{{\hat{X}}_t, {\hat{K}}_t, {\mathcal {F}}_t\}_{t \ge 0}\) started at \({\hat{X}}_0 = x\) we have

$$\begin{aligned}&\sqrt{\vert {\mathbb {E}}[f({\hat{X}}_s) f({\hat{X}}_t)] - {\mathbb {E}}[f({\hat{X}}_s)] {\mathbb {E}}[f({\hat{X}}_t)] \vert } \\&\quad \le \ {\left\{ \begin{array}{ll} 1, &{} m(s,t) \le C_0 \\ C \delta ( T_{0} ) \log {(\delta (T_{0})^{-1})}, &{} m(s,t) > C_0, \end{array}\right. } \end{aligned}$$

where \(m(s,t) = \min (s,t,|t-s|)\), \(T_{0} = T_{0}(m(s,t))\) is as in (49), and \(C_0 = C \delta (0) \log {(\delta (0)^{-1})}\) as before.

Proof

By the triangle inequality

$$\begin{aligned}&\vert {\mathbb {E}}[f({\hat{X}}_s) f({\hat{X}}_t)] - {\mathbb {E}}[f({\hat{X}}_s)] {\mathbb {E}}[f({\hat{X}}_t)] \vert \\&\quad \le \vert {\mathbb {E}}[f({\hat{X}}_s) f({\hat{X}}_t)] \vert + \vert {\mathbb {E}}[f({\hat{X}}_s)] {\mathbb {E}}[f({\hat{X}}_t)] \vert . \end{aligned}$$

As above, applying Lemma 2 completes the proof. \(\square \)

The above bias and variance estimates immediately imply our main theorem on the total error of the Monte Carlo estimator:

Theorem 2

Let Assumptions 24, and 5 hold. There exists a constant \(0< C < \infty \) depending only on \(\alpha \), \(\beta \), R, and \(\gamma \) so that for all starting points \(x \in {\mathbb {R}}^{d}\) with \(V(x) \le 4 \beta / \alpha \) and triple \(\{{\hat{X}}_t, {\hat{K}}_t, {\mathcal {F}}_t\}_{t \ge 0}\) started at \({\hat{X}}_0 = x\) we have

$$\begin{aligned} {\mathbb {E}}\left[ \left( \frac{1}{T} \sum _{t=1}^{T} f({\hat{X}}_t) \right) ^2 \right] \le \frac{2C_{0}}{T^{2}}\sum _{s=1}^{T} C_{0}(s) + \frac{3}{T} \sum _{s=1}^{T} C_{0}(s)^{2}, \end{aligned}$$

where \(C_{0}(s) = C \delta (T_{0}(s)) \log (\delta (T_{0}(s))^{-1})\) and we write \(C_{0} = C_{0}(0)\).

Proof

We calculate

$$\begin{aligned}&{\mathbb {E}}\left[ \left( \frac{1}{T} \sum _{t=1}^{T} f({\hat{X}}_t) \right) ^2 \right] \\&\quad = T^{-2}\Bigg [\sum _{t=1}^{T} {\mathbb {E}}[f({\hat{X}}_{t})^{2}] \ + \sum _{s,t \, : \, m(s,t) < C_{0}} {\mathbb {E}}[f({\hat{X}}_{s}) f({\hat{X}}_{t})] \ \\&\quad + \sum _{s,t \, : \, m(s,t) \ge C_{0}} {\mathbb {E}}[f({\hat{X}}_{s}) f({\hat{X}}_{t})] \Bigg ] \\&\quad \le T^{-2} \left[ \sum _{s=1}^{T} C_{0}(s) + C_{0} \sum _{s=1}^{T} C_{0}(s) + 3 T \sum _{s=1}^{T} C_{0}(s)^{2} \right] \\&\quad \le \frac{2C_{0}}{T^{2}}\sum _{s=1}^{T} C_{0}(s) + \frac{3}{T} \sum _{s=1}^{T} C_{0}(s)^{2}. \end{aligned}$$

\(\square \)
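To illustrate the quantity bounded in Theorem 2, the following Python sketch (ours, not the paper's implementation) simulates a toy approximate Metropolis chain on \({\mathbb {R}}\) whose log-density error decays like \(t^{-1/2}\), and estimates the root-mean-squared error of the ergodic average of \(f(x) = x\) over independent replications. The Gaussian target, proposal scale, perturbation size, chain lengths, and replication count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_chain(T, c=1.0, step=2.4):
    """Ergodic average of f(x) = x for a toy approximate Metropolis chain:
    the exact target is N(0, 1), while the kernel at step t accepts with a
    perturbed log-density -x^2/2 + eps_t * x, where eps_t = c / sqrt(t)."""
    x, total = 0.0, 0.0
    for t in range(1, T + 1):
        eps = c / np.sqrt(t)  # stand-in for the decaying approximation error
        y = x + step * rng.standard_normal()
        log_alpha = (-0.5 * y * y + eps * y) - (-0.5 * x * x + eps * x)
        if np.log(rng.uniform()) < log_alpha:
            x = y
        total += x
    return total / T  # pi(f) = 0, so this is the Monte Carlo error itself

for T in [500, 2000, 8000, 32000]:
    errors = np.array([run_chain(T) for _ in range(200)])
    rmse = np.sqrt(np.mean(errors ** 2))
    # RMSE * sqrt(T) staying roughly bounded indicates an MSE of order 1/T.
    print(f"T={T:>6d}  RMSE={rmse:.4f}  RMSE*sqrt(T)={rmse * np.sqrt(T):.2f}")
```

A roughly constant value of \(\mathrm {RMSE}\cdot \sqrt{T}\) across the chain lengths is consistent with a mean-squared error of order 1/T up to logarithmic factors, as in the bound above.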

Inheriting Lyapunov conditions

Observe that each step of the main “for” loop in Algorithm 1 determines an entire transition kernel from any starting point; denote the kernel used in step t by \({\hat{K}}_{t}\), and let \({\mathcal {F}}_{t}\) be the associated filtration.

Lemma 5

Let Assumptions 1 and 3 hold. Then, in fact Assumption 4 holds as well.

Proof

Under Assumption 3, all proposals that decrease V are more likely to be accepted under \({\hat{K}}_{t}\) than under K, while all proposals that increase V are less likely to be accepted under \({\hat{K}}_{t}\) than under K. Thus, for all x and all t,

$$\begin{aligned} ({\hat{K}}_{t}V)({\hat{X}}_{t}) \le (KV)({\hat{X}}_{t}) \le (1 - \alpha ) V({\hat{X}}_{t}) + \beta , \end{aligned}$$
(50)

which completes the proof. \(\square \)

Final estimates

We combine the theoretical results of the previous sections to obtain a final estimate of the error of our algorithm. Continuing with the notation above, we now give the proof of our main theoretical result, Theorem 1.

Proof

We set some notation. Define

$$\begin{aligned} E(T) \equiv {\mathbb {E}}\left[ \left( \frac{1}{T} \sum _{t=1}^{T} f({\hat{X}}_t) \right) ^2 \right] . \end{aligned}$$
(51)

Also define the “burn-in” time \(T_{b} = \log (T)^{2}\), and define the hitting time \(\tau _{{\mathcal {C}}}\) as in Eq. (43).

By Lemma 5, Assumption 4 in fact holds. Note that our assumptions also immediately give Assumption 5 with

$$\begin{aligned}\delta (t) \le 2 \gamma _{0} \sqrt{\frac{\tau _{0}}{t}}.\end{aligned}$$

Thus, applying Theorem 2 in line 3 and then Assumption  4 and Markov’s inequality in line 4, we have (in the notation of that theorem and assumption):

$$\begin{aligned}&E(T) \le {\mathbb {E}}\left[ \left( \frac{1}{T} \sum _{t=1}^{T_{b}} f({\hat{X}}_t) \right) ^2 \right] + {\mathbb {E}}\left[ \left( \frac{1}{T} \sum _{t=T_{b}+1}^{T} f({\hat{X}}_t) \right) ^2 \right] \\&\qquad + 2 {\mathbb {E}}\left[ \frac{1}{T^{2}} \left( \sum _{t=1}^{T_{b}} f({\hat{X}}_t) \right) \left( \sum _{t=T_{b}+1}^{T} f({\hat{X}}_t) \right) \right] \\&\quad \le {\mathbb {E}}\left[ \left( \frac{1}{T} \sum _{t=T_{b}+1}^{T} f({\hat{X}}_t) \right) ^2 \right] + \frac{T_{b}^{2}}{T^{2}} + \frac{2T_{b}}{T} \\&\quad \le \frac{2C_{0}}{T^{2}}\sum _{s=1}^{T} C_{0}(s) + \frac{3}{T} \sum _{s=1}^{T} C_{0}(s)^{2} + {\mathbb {P}}[\tau _{{\mathcal {C}}} > T_{b}] \\&\qquad + \frac{T_{b}^{2}}{T^{2}} + \frac{2T_{b}}{T} \\&\quad \le \frac{2C_{0}}{T^{2}}\sum _{s=1}^{T} C_{0}(s) + \frac{3}{T} \sum _{s=1}^{T} C_{0}(s)^{2} + \frac{\alpha }{4 \beta } (1-\alpha )^{T_{b}} V(x) \\&\qquad + \frac{T_{b}^{2}}{T^{2}} + \frac{2T_{b}}{T} \\&\quad =O\Bigg ( \frac{1}{T^{2}} \sum _{s=1}^{T} \sqrt{\frac{\tau _{0}}{s}} \log \left( \sqrt{\frac{\tau _{0}}{s}} \right) + \frac{1}{T} \sum _{s=1}^{T}\frac{\tau _{0}}{s} \log \left( \sqrt{\frac{\tau _{0}}{s}}\right) ^{2} \\&\qquad + \frac{\log (T)^{2}}{T} \Bigg )\\&\quad = O \left( \frac{\log (T)^{3}}{T} \right) . \end{aligned}$$

\(\square \)
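The rate in the final display can also be checked numerically. The short Python sketch below (an illustration only, not part of the proof) evaluates the three terms of the bound for \(\delta (s) = \sqrt{\tau _0/s}\) with an arbitrary \(\tau _0\), ignoring the unspecified constants and the shift \(T_{0}(s)\), and compares their total with \(\log (T)^{3}/T\).

```python
import numpy as np

# Evaluate the sums in the last display for delta(s) = sqrt(tau0 / s),
# dropping the unspecified constants and the shift T_0(s), and compare the
# total with log(T)^3 / T; the ratio should stay bounded as T grows.
tau0 = 4.0
for T in [10**3, 10**4, 10**5, 10**6]:
    s = np.arange(1, T + 1, dtype=float)
    d = np.sqrt(tau0 / s)                       # delta(s), up to constants
    term1 = np.sum(d * np.abs(np.log(d))) / T**2
    term2 = np.sum(d**2 * np.log(d)**2) / T
    term3 = np.log(T)**2 / T                    # burn-in contribution
    bound = term1 + term2 + term3
    rate = np.log(T)**3 / T
    print(f"T={T:>8d}  bound={bound:.3e}  log(T)^3/T={rate:.3e}  ratio={bound/rate:.3f}")
```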

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Davis, A.D., Marzouk, Y., Smith, A. et al. Rate-optimal refinement strategies for local approximation MCMC. Stat Comput 32, 60 (2022). https://doi.org/10.1007/s11222-022-10123-0


Keywords

  • Markov chain Monte Carlo
  • Local regression
  • Bayesian inference
  • Surrogate models
  • Sampling methods