Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels

Abstract

Monte Carlo algorithms often aim to draw from a distribution \(\pi \) by simulating a Markov chain with transition kernel \(P\) such that \(\pi \) is invariant under \(P\). However, there are many situations for which it is impractical or impossible to draw from the transition kernel \(P\). For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace \(P\) by an approximation \(\hat{P}\). Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how ‘close’ the chain given by the transition kernel \(\hat{P}\) is to the chain given by \(P\). We apply these results to several examples from spatial statistics and network analysis.
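The setting described in the abstract can be illustrated with a short simulation: a random-walk Metropolis–Hastings sampler in which the exact log-target is replaced by a noisy estimate that is re-drawn at every iteration, so the chain evolves under an approximate kernel \(\hat{P}\) rather than \(P\). The toy target, the noise level, and all names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_mh(log_pi_hat, x0, n_iters, step=1.0):
    """Random-walk Metropolis-Hastings driven by a *noisy* log-target.

    Re-estimating log pi at every iteration replaces the exact kernel P
    by an approximate kernel P-hat, as in the paper's setting.
    """
    x = x0
    chain = [x]
    for _ in range(n_iters):
        x_prop = x + step * rng.normal()
        # both terms are fresh, independent noisy estimates
        log_alpha = log_pi_hat(x_prop) - log_pi_hat(x)
        if np.log(rng.uniform()) < log_alpha:
            x = x_prop
        chain.append(x)
    return np.array(chain)

# toy target: standard normal, log-density observed with small Gaussian noise
noise_sd = 0.05
def log_pi_hat(x):
    return -0.5 * x**2 + noise_sd * rng.normal()

chain = noisy_mh(log_pi_hat, x0=0.0, n_iters=20000)
print(chain.mean(), chain.std())  # close to 0 and 1 when the noise is small
```

When the noise is small, the empirical moments stay close to those of the exact target; the results below quantify this kind of closeness in total variation.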


References

  • Ahn, S., Korattikara, A., Welling, M.: Bayesian posterior sampling via stochastic gradient Fisher scoring. In: Proceedings of the 29th International Conference on Machine Learning. (2012)

  • Andrieu, C., Roberts, G.: The pseudo-marginal approach for efficient Monte-Carlo computations. Ann. Stat. 37(2), 697–725 (2009)

  • Andrieu, C., Vihola, M.: Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms. Preprint arXiv:1210.1484 (2012).

  • Bardenet, R., Doucet, A., Holmes, C.: Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st International Conference on Machine Learning (2014)

  • Beaumont, M.A.: Estimation of population growth or decline in genetically monitored populations. Genetics 164, 1139–1160 (2003)

  • Besag, J.E.: Spatial Interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser. B 36, 192–236 (1974)

  • Bottou, L., Bousquet, O.: The tradeoffs of large-scale learning. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 351–368. MIT Press, Cambridge (2011)

  • Bühlmann, P., Van de Geer, S.: Statistics for High-Dimensional Data. Springer, Berlin (2011)

  • Caimo, A., Friel, N.: Bayesian inference for exponential random graph models. Soc. Netw. 33, 41–55 (2011)

  • Dalalyan, A., Tsybakov, A.B.: Sparse regression learning by aggregation and Langevin Monte-Carlo. J. Comput. Syst. Sci. 78(5), 1423–1443 (2012)

  • Ferré, D., Hervé, L., Ledoux, J.: Regular perturbation of \(V\)-geometrically ergodic Markov chains. J. Appl. Probab. 50(1), 184–194 (2013)

  • Friel, N., Pettitt, A.N.: Likelihood estimation and inference for the autologistic model. J. Comput. Graph. Stat. 13, 232–246 (2004)

  • Friel, N., Pettitt, A.N., Reeves, R., Wit, E.: Bayesian inference in hidden Markov random fields for binary data defined on large lattices. J. Comput. Graph. Stat. 18, 243–261 (2009)

  • Friel, N., Rue, H.: Recursive computing and simulation-free inference for general factorizable models. Biometrika 94, 661–672 (2007)

  • Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)

  • Gilks, W., Roberts, G., George, E.: Adaptive direction sampling. Statistician 43, 179–189 (1994)

  • Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods (with discussion). J. R. Stat. Soc. Ser. B 73, 123–214 (2011)

  • Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  • Kartashov, N.V.: Strong Stable Markov Chains. VSP, Utrecht (1996)

  • Korattikara, A., Chen Y., Welling, M.: Austerity in MCMC land: cutting the Metropolis–Hastings Budget. In: Proceedings of the 31st International Conference on Machine Learning, pp. 681–688 (2014)

  • Liang, F., Jin, I.-H.: An auxiliary variables Metropolis–Hastings algorithm for sampling from distributions with intractable normalizing constants. Technical report (2011)

  • Marin, J.-M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22(6), 1167–1180 (2012)

  • Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge (1993)

  • Mitrophanov, A.Y.: Sensitivity and convergence of uniformly ergodic Markov chains. J. Appl. Probab. 42, 1003–1014 (2005)

  • Møller, J., Pettitt, A.N., Reeves, R., Berthelsen, K.K.: An efficient Markov chain Monte-Carlo method for distributions with intractable normalizing constants. Biometrika 93, 451–458 (2006)

  • Murray, I., Ghahramani, Z., MacKay, D.: MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06). AUAI Press, Arlington, Virginia (2006)

  • Nicholls, G. K., Fox, C., Watt, A.M.: Coupled MCMC with a randomized acceptance probability. Preprint arXiv:1205.6857 (2012)

  • Propp, J., Wilson, D.: Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algorithms 9, 223–252 (1996)

  • Reeves, R., Pettitt, A.N.: Efficient recursions for general factorisable models. Biometrika 91, 751–757 (2004)

  • Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  • Roberts, G.O., Stramer, O.: Langevin diffusions and Metropolis–Hastings algorithms. Methodol. Comput. Appl. Probab. 4, 337–357 (2002)

  • Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996a)

  • Roberts, G.O., Tweedie, R.L.: Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83(1), 95–110 (1996b)

  • Robins, G., Pattison, P., Kalish, Y., Lusher, D.: An introduction to exponential random graph models for social networks. Soc. Netw. 29(2), 169–348 (2007)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)

  • Valiant, L.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)

  • Welling, M., Teh, Y. W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning, pp. 681–688 (2011)

Acknowledgments

The Insight Centre for Data Analytics is supported by Science Foundation Ireland under Grant Number SFI/12/RC/2289. Nial Friel’s research was also supported by a Science Foundation Ireland grant: 12/IP/1424.

Author information

Corresponding author

Correspondence to N. Friel.

Appendix: Proofs

Proof of Corollary 2.3

We apply Theorem 2.1. First, note that we have

$$\begin{aligned} P(\theta ,\mathrm{d}\theta ')&= \delta _{\theta }(\mathrm{d}\theta ') \left[ 1-\int \mathrm{d}t\; h(t|\theta ) \min \left( 1,\alpha (\theta ,t)\right) \right] \\&+\, h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \end{aligned}$$

and

$$\begin{aligned} \hat{P}(\theta ,\mathrm{d}\theta ')&= \delta _{\theta }(\mathrm{d}\theta ')\, \Biggl [1-\iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y') \min \left( 1,\hat{\alpha }(\theta ,t,y')\right) \Biggr ]\\&+ \int \mathrm{d}y'F_{\theta '}(y') \Bigl [ h(\theta '|\theta ) \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr ]. \end{aligned}$$

So we can write

$$\begin{aligned}&(P-\hat{P})(\theta ,\mathrm{d}\theta ')\\&\quad = \delta _{\theta }(\mathrm{d}\theta ') \iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y') \Bigl [ \min \left( 1,\hat{\alpha }(\theta ,t,y')\right) \\&\quad - \min \left( 1,\alpha (\theta ,t)\right) \Bigr ]+\int \mathrm{d}y'\; F_{\theta '}(y') \Bigl [ h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\quad -\, h(\theta '|\theta ) \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr ] \end{aligned}$$

and, finally,

$$\begin{aligned} \Vert P-\hat{P}\Vert&= \frac{1}{2}\sup _{\theta } \int |P-\hat{P}|(\theta ,\mathrm{d}\theta ')\\&= \frac{1}{2}\sup _{\theta } \Biggl \{ \Biggl | \iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y') \Bigl [ \min \left( 1,\hat{\alpha }(\theta ,t,y')\right) \\&\quad -\,\min \left( 1,\alpha (\theta ,t)\right) \Bigr ] \Biggr |\\&\quad +\, \Biggl | \iint \mathrm{d}y'\; \mathrm{d}\theta '\; F_{\theta '}(y') \Biggl [h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\quad -\, h(\theta '|\theta ) \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Biggr ] \Biggr |\Biggr \}\\&= \sup _{\theta } \Biggl \{ \Biggl | \iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y')\\&\qquad \times \, \Bigl [\min \left( 1,\hat{\alpha }(\theta ,t,y')\right) - \min \left( 1,\alpha (\theta ,t)\right) \Bigr ] \Biggr | \Biggr \}\\&\le \sup _{\theta } \iint \mathrm{d}y'\; \mathrm{d}\theta ' F_{\theta '}(y') h(\theta '|\theta ) \Bigl | \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\quad -\, \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr |\\&= \sup _{\theta } \int \mathrm{d}\theta '\; h(\theta '|\theta ) \int \mathrm{d}y'\; F_{\theta '}(y') \Bigl | \min (1,\alpha (\theta ,\theta '))\\&\quad - \,\min (1,\hat{\alpha }(\theta ,\theta ',y')) \Bigr |\\&\le \sup _{\theta } \int \mathrm{d}\theta '\; h(\theta '|\theta ) \delta (\theta ,\theta '). \end{aligned}$$

\(\square \)
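The final inequality in the proof above rests on the elementary fact that \(x\mapsto \min (1,x)\) is 1-Lipschitz, so \(|\min (1,\alpha )-\min (1,\hat{\alpha })|\le |\alpha -\hat{\alpha }|\). A quick numerical sanity check of this fact, on arbitrary nonnegative inputs (the exponential draws are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# random nonnegative "acceptance ratios" standing in for alpha and alpha-hat
a = rng.exponential(size=100000)
b = rng.exponential(size=100000)

lhs = np.abs(np.minimum(1, a) - np.minimum(1, b))
rhs = np.abs(a - b)
print(bool(np.all(lhs <= rhs)))  # True: min(1, .) is 1-Lipschitz
```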

Proof of Lemma 1

We again apply Theorem 2.1. Note that

Now, note that

where \(X\sim \mathcal {N}(0,I)\) and . Then:

$$\begin{aligned} \mathbb {E}\Biggl |&1 - \exp \left( a^T X - \frac{\Vert a\Vert ^2 }{2} \right) \Biggr | \\&= \exp \left( - \frac{\Vert a\Vert ^2}{2} \right) \mathbb {E}\Biggl | \exp \left( a^T X \right) - \exp \left( \frac{\Vert a\Vert ^2 }{2} \right) \Biggr |\\&= \exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \mathbb {E}\Biggl | \exp \left( a^T X \right) - \mathbb {E}\left[ \exp \left( a^T X \right) \right] \Biggr |\\&\le \exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \sqrt{\mathrm{Var}[\exp \left( a^T X \right) ]}\\&=\exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \sqrt{\mathbb {E}\left[ \exp \left( 2 a^T X \right) \right] -\mathbb {E}\left[ \exp \left( a^T X \right) \right] ^2}\\&= \exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \sqrt{\exp (2 \Vert a\Vert ^2) - \exp (\Vert a\Vert ^2 )} \\&= \sqrt{\exp (\Vert a\Vert ^2) -1 }. \end{aligned}$$

So finally,

\(\square \)
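The displayed bound \(\mathbb {E}\bigl |1-\exp (a^TX-\Vert a\Vert ^2/2)\bigr |\le \sqrt{\exp (\Vert a\Vert ^2)-1}\) for \(X\sim \mathcal {N}(0,I)\) can be checked by simulation; the test vectors \(a\) below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def check_bound(a, n_samples=200000):
    """Compare E|1 - exp(a^T X - ||a||^2/2)| with sqrt(exp(||a||^2) - 1)."""
    a = np.asarray(a, dtype=float)
    X = rng.normal(size=(n_samples, a.size))
    lhs = np.abs(1 - np.exp(X @ a - a @ a / 2)).mean()
    rhs = np.sqrt(np.exp(a @ a) - 1)
    return lhs, rhs

for a in ([0.3], [0.5, 0.2]):
    lhs, rhs = check_bound(a)
    print(f"{lhs:.3f} <= {rhs:.3f}")
```

The gap between the two sides is exactly the gap between the mean absolute deviation and the standard deviation of the lognormal variable \(\exp (a^TX)\).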

Proof of Lemma 2

We only have to check that

$$\begin{aligned}&\mathbb {E}_{y'\sim F_{\theta '}} \left| \hat{\alpha }(\theta ,\theta ',y')-\alpha (\theta ,\theta ')\right| \\&\quad \le \int \mathrm{d}y'\; f(y'|\theta ') \Bigl | \alpha (\theta ,\theta ') - \hat{\alpha }(\theta ,\theta ',y') \Bigr |\\&=\frac{h(\theta |\theta ')\pi (\theta ')q_{\theta '}(y)}{h(\theta '|\theta )\pi (\theta )q_{\theta }(y)}\\&\quad \quad \times \,\,\mathbb {E}_{y_1',\dots ,y_N'\sim f(\cdot |\theta ')} \left| \frac{1}{N}\sum _{i=1}^{N} \frac{q_{\theta }(y_i')}{q_{\theta '}(y_i')} - \frac{Z(\theta )}{Z(\theta ')} \right| \\&\le \frac{1}{\sqrt{N}} \frac{h(\theta |\theta ')\pi (\theta ')q_{\theta '}(y)}{h(\theta '|\theta )\pi (\theta )q_{\theta }(y)} \sqrt{\mathrm{Var}_{y_1 '\sim f(y_1 '|\theta ')} \left( \frac{q_{\theta }(y_1')}{q_{\theta '}(y_1')} \right) }. \end{aligned}$$

\(\square \)
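The proof above bounds the error of the importance-sampling estimator \(\frac{1}{N}\sum _i q_{\theta }(y_i')/q_{\theta '}(y_i')\) of the ratio of normalising constants \(Z(\theta )/Z(\theta ')\) by its standard deviation, which yields the \(1/\sqrt{N}\) rate. A minimal sketch on a toy model, where the Gaussian-shaped choice of \(q_\theta \) is an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy unnormalised model (an illustrative assumption):
# q_theta(y) = exp(-theta * y^2 / 2), so Z(theta) = sqrt(2*pi/theta)
def ratio_estimate(theta, theta_prime, N):
    # y_i' ~ f(.|theta') = N(0, 1/theta'), i.e. the model's own distribution
    y = rng.normal(scale=1 / np.sqrt(theta_prime), size=N)
    return np.mean(np.exp(-(theta - theta_prime) * y**2 / 2))

theta, theta_prime = 2.0, 1.0
true_ratio = np.sqrt(theta_prime / theta)  # Z(theta)/Z(theta') = sqrt(theta'/theta)
est = ratio_estimate(theta, theta_prime, N=100000)
print(est, true_ratio)  # unbiased estimate, O(1/sqrt(N)) Monte Carlo error
```

Note that the direction of the ratio matters in practice: with \(\theta <\theta '\) here the weights would be unbounded and the variance could be infinite, in which case the bound is vacuous.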

Proof of Theorem 3.1

Under the assumptions of Theorem 3.1, note that (4) leads to

$$\begin{aligned} \alpha (\theta _n,\theta ') = \frac{\pi (\theta ')q_{\theta '}(y) Z(\theta _n)}{\pi (\theta _n)q_{\theta _n}(y)Z(\theta ') } \frac{h(\theta _n|\theta ')}{h(\theta '|\theta _n)} \ge \frac{1}{c_{\pi }^2 c_{h}^2 \mathcal {K}^4}. \end{aligned}$$
(10)

Let us consider any measurable subset \(B\) of \(\varTheta \) and \(\theta \in \varTheta \). We have

$$\begin{aligned} P(\theta ,B)&= \int _{B} \delta _{\theta }(\mathrm{d}\theta ') \left[ 1-\int \mathrm{d}t\; h(t|\theta ) \min \left( 1,\alpha (\theta ,t)\right) \right] \\&\quad +\int _B \mathrm{d}\theta '\; h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\ge \int _B \mathrm{d}\theta '\; h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\ge \frac{1}{c_{\pi }^2 c_{h}^2 \mathcal {K}^4} \int _B \mathrm{d}\theta '\; h(\theta '|\theta ) \text { thanks to~(10)}\\&\ge \frac{1}{c_{\pi }^2 c_{h}^3 \mathcal {K}^4} \int _B \mathrm{d}\theta '. \end{aligned}$$

This proves that \(\varTheta \) is a small set for the Lebesgue measure (multiplied by constant \(1/c_{\pi }^2 c_{h}^3 \mathcal {K}^4\)) on \(\varTheta \). According to Theorem 16.0.2 page 394 in Meyn and Tweedie (1993), this proves that:

$$\begin{aligned} \sup _{\theta } \Vert \delta _{\theta } P - \pi (\cdot |y) \Vert \le C \rho ^n \end{aligned}$$

where

$$\begin{aligned} C = 2 \text { and } \rho = 1 - \frac{1}{c_{\pi }^3 c_{h}^3 \mathcal {K}^4} \end{aligned}$$

(note that, by definition, \(\mathcal {K},c_\pi ,c_h>1\), so we necessarily have \(0<\rho <1\); the extra factor \(1/c_{\pi }\) in \(\rho \), compared with the minorization constant above, comes from the bound \(\mathrm{Leb}(\varTheta )\ge 1/c_{\pi }\), which holds because \(\pi (\cdot |y)\le c_{\pi }\) integrates to one over \(\varTheta \)). So, Condition (H1) in Lemma 2.3 is satisfied.

Moreover,

$$\begin{aligned} \delta (\theta ,\theta ')&= \frac{h(\theta |\theta ')\pi (\theta ')q_{\theta '}(y)}{h(\theta '|\theta )\pi (\theta )q_{\theta }(y)} \sqrt{\mathrm{Var}_{y '\sim f(y '|\theta ')} \left( \frac{q_{\theta }(y')}{q_{\theta '}(y')} \right) } \\&\le c_h^2 c_{\pi }^2 \frac{q_{\theta '}(y)}{q_{\theta }(y)} \sqrt{\mathbb {E}_{y '\sim f(y '|\theta ')} \left[ \left( \frac{q_{\theta }(y')}{q_{\theta '}(y')} \right) ^2\right] } \le c_h^2 c_{\pi }^2 \mathcal {K}^4. \end{aligned}$$

So, Condition (H2) in Lemma 2.3 is satisfied. We can apply this lemma to give

$$\begin{aligned} \sup _{\theta _0\in \varTheta } \Vert \delta _{\theta _0} P^n - \delta _{\theta _0} \hat{P}^n \Vert \le \frac{\mathcal {C}}{\sqrt{N}} \end{aligned}$$

with

$$\begin{aligned} \mathcal {C} = c_\pi ^2 c_h^2 \mathcal {K}^4 \left( \lambda + \frac{C\rho ^{\lambda }}{1-\rho } \right) \end{aligned}$$

with \(\lambda =\left\lceil \frac{\log (1/C)}{\log (\rho )} \right\rceil \). \(\square \)
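For concreteness, the constants \(C\), \(\rho \), \(\lambda \) and \(\mathcal {C}\) of Theorem 3.1 can be evaluated numerically; the values of \(c_\pi \), \(c_h\) and \(\mathcal {K}\) below are arbitrary illustrative choices, not taken from the paper's examples:

```python
import math

# illustrative values for the constants of Theorem 3.1 (not from the paper);
# the theorem requires K, c_pi, c_h > 1
c_pi, c_h, K = 1.5, 1.5, 1.2

C = 2
rho = 1 - 1 / (c_pi**3 * c_h**3 * K**4)
lam = math.ceil(math.log(1 / C) / math.log(rho))
const = c_pi**2 * c_h**2 * K**4 * (lam + C * rho**lam / (1 - rho))
print(rho, lam, const)  # the bound on sup ||P^n - P-hat^n|| is then const / sqrt(N)
```

Even for these mild constants \(\mathcal {C}\) runs into the hundreds, so a large number \(N\) of auxiliary draws is needed before the bound \(\mathcal {C}/\sqrt{N}\) becomes informative.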

Proof of Lemma 3

Note that

So we have to find an upper bound, uniformly over \(\theta \), for

$$\begin{aligned}&D\,{:=}\, \mathbb {E}_{y'\sim F_{\theta _n}} \Biggl \{ \exp \Biggl [ \frac{\sigma ^2}{2}\Biggl \Vert \Sigma ^{\frac{1}{2}}\Biggl (\frac{1}{N}\sum _{i=1}^{N} s(y'_i)\\&\quad \quad - \mathbb {E}_{y'\sim f_{\theta }}[s(y')] \Biggr )\Biggr \Vert ^2 \Biggr ] -1 \Biggr \}. \end{aligned}$$

Let us put \(V:=\frac{1}{N}\sum _{i=1}^N V^{(i)} := \frac{1}{N}\sum _{i=1}^{N} \Sigma ^{\frac{1}{2}} \{ s(y'_i) - \mathbb {E}_{y'\sim f_{\theta }}[s(y')]\}\) and denote \(V_j\) (\(j=1,\dots ,k\)) the coordinates of \(V\), and \(V_j^{(i)}\) (\(j=1,\dots ,k\)) the coordinates of \(V^{(i)}\). We have

$$\begin{aligned} D&= \mathbb {E} \left\{ \exp \left[ \frac{1}{2}\sum _{j=1}^k V_j^2 \right] -1 \right\} \\&= \mathbb {E} \left\{ \exp \left[ \frac{1}{k}\sum _{j=1}^k \frac{k}{2} V_j^2 \right] -1 \right\} \\&\le \frac{1}{k}\sum _{j=1}^k \mathbb {E} \left\{ \exp \left[ \frac{k}{2} V_j^2 \right] -1 \right\} . \end{aligned}$$

Now, remark that \(V_j=\frac{1}{N}\sum _{i=1}^{N} V_j^{(i)}\) with \(-\mathcal {S} \Vert \Sigma \Vert \le V_j^i \le \mathcal {S} \Vert \Sigma \Vert \) so, Hoeffding’s inequality ensures, for any \(t\ge 0\),

$$\begin{aligned} \mathbb {P} \left( \left| \sqrt{N} V_j \right| \ge t \right) \le 2 \exp \left[ - \frac{t^2}{2 \mathcal {S}^2 \Vert \Sigma \Vert ^2 } \right] . \end{aligned}$$

As a consequence, for any \(\tau >0\),

$$\begin{aligned} \mathbb {E} \exp&\left[ \frac{k}{2} V_j^2 \right] = \mathbb {E} \exp \left[ \frac{k}{2 N} \left( \sqrt{N}V_j\right) ^2 \right] \\&= \mathbb {E} \left\{ \exp \left[ \frac{k}{2 N} \left( \sqrt{N} V_j\right) ^2 \right] \mathbf {1}_{|\sqrt{N} V_j|\le \tau } \right\} \\&\quad + \mathbb {E} \left\{ \exp \left[ \frac{k}{2 N} \left( \sqrt{N}V_j\right) ^2 \right] \mathbf {1}_{|\sqrt{N} V_j|> \tau } \right\} \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + \int _{\tau }^{\infty } \exp \left( \frac{k}{2N} x^2 \right) \mathbb {P}\left( \left| \sqrt{N} V_j\right| \ge x\right) \mathrm{d} x \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \int _{\tau }^{\infty } \exp \left[ \left( \frac{k}{2N} - \frac{1}{2\mathcal {S}^2\Vert \Sigma \Vert ^2} \right) x^2 \right] \mathrm{d} x \\&=\exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \sqrt{\frac{2\pi }{\frac{1}{\mathcal {S}^2\Vert \Sigma \Vert ^2}-\frac{2 k}{N}}}\\&\quad \times \mathbb {P}\left( |\mathcal {N}| > \tau \sqrt{\frac{1}{\frac{1}{\mathcal {S}^2\Vert \Sigma \Vert ^2}-\frac{2k}{N}}}\right) \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \sqrt{\frac{2\pi }{\frac{1}{\mathcal {S}^2 \Vert \Sigma \Vert ^2}-\frac{2k}{N}}} \exp \left[ -\frac{\tau ^2}{ \left( \frac{2}{\mathcal {S}^2 \Vert \Sigma \Vert ^2}-\frac{4k}{N}\right) } \right] \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \sqrt{\frac{2\pi }{\frac{1}{\mathcal {S}^2 \Vert \Sigma \Vert ^2} -\frac{2k}{N}}} \exp \left[ - \frac{\tau ^2\mathcal {S}^2 \Vert \Sigma \Vert ^2}{2}\right] \end{aligned}$$

where \(\mathcal {N}\sim \mathcal {N}(0,1)\). Now, we assume that \(N>4 k \mathcal {S}^2 \Vert \Sigma \Vert ^2\). This leads to \(\frac{1}{\mathcal {S}^2\Vert \Sigma \Vert ^2}-\frac{2k}{N} >\frac{1}{2\mathcal {S}^2 \Vert \Sigma \Vert ^2}\). This simplifies the bound to

$$\begin{aligned} \mathbb {E} \exp \left[ \frac{k}{2} V_j^2 \right]&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&+\,\, 4\sqrt{\pi } \mathcal {S} \Vert \Sigma \Vert \exp \left[ -\frac{\tau ^2\mathcal {S}^2 \Vert \Sigma \Vert ^2}{2} \right] . \end{aligned}$$

Finally, we put \(\tau =\sqrt{\log (N/k)/(2\mathcal {S}^2 \Vert \Sigma \Vert ^2)}\) to get

$$\begin{aligned} \mathbb {E} \exp \left[ \frac{k}{2} V_j^2 \right] \le \exp \left( \frac{k \log \left( \frac{N}{k}\right) }{4 \mathcal {S}^2 \Vert \Sigma \Vert ^2 N} \right) + \frac{4k\sqrt{\pi } \mathcal {S}\Vert \Sigma \Vert }{N}. \end{aligned}$$

It follows that

$$\begin{aligned} D\le \exp \left( \frac{k \log (N)}{4 \mathcal {S}^2 \Vert \Sigma \Vert ^2 N} \right) -1 +\frac{4k \sqrt{\pi } \mathcal {S} \Vert \Sigma \Vert }{N}. \end{aligned}$$

This ends the proof. \(\square \)
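The quantity controlled in this proof, \(\mathbb {E}\exp [kV_j^2/2]\) for an average \(V_j\) of \(N\) bounded i.i.d. terms, can also be checked by simulation to be close to one for large \(N\) (so that \(D\rightarrow 0\)); the bounded distribution below is an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(4)

# V is an average of N i.i.d. terms bounded in [-S, S], as in the proof
N, S, k, trials = 200, 1.0, 3, 20000
V = rng.uniform(-S, S, size=(trials, N)).mean(axis=1)
emp = np.exp(k * V**2 / 2).mean()
print(emp)  # close to 1 for large N, so D = E[exp(k V_j^2 / 2)] - 1 is small
```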

Proof of Lemma 3.2

We just check all the conditions of Theorem 2.2. First, from Lemma 3, we know that \(\Vert P_{\Sigma }-\hat{P}_{\Sigma }\Vert \le \sqrt{\delta /2}\rightarrow 0\) when \(N\rightarrow \infty \). Then, we have to find the function \(V\). Note that here:

$$\begin{aligned} \nabla \log \pi (\theta |y)&= \nabla \log \pi (\theta ) + s(y)-\mathbb {E}_{y|\theta }[s(y)]\\&= - \frac{\theta }{s^2} + s(y)-\mathbb {E}_{y|\theta }[s(y)]\\&\asymp - \frac{\theta }{s^2}. \end{aligned}$$

Then, according to Theorem 3.1 page 352 in Roberts and Tweedie (1996a) (and its proof), we know that for \(\Sigma <s^2\), for some positive numbers \(a\) and \(b\), for \(V(\theta )=a\theta \) when \(\theta \ge 0\) and \(V(\theta )=-b\theta \) for \(\theta <0\), there is a \(0<\delta <1\), \(L>0\) and an interval \(I\) with

$$\begin{aligned} \int V(\theta ) P_{\Sigma } (\theta _0,\mathrm{d}\theta ) \le \delta V(\theta _0) + L\mathbf {1}_{I}(\theta _0), \end{aligned}$$

and so \(P_{\Sigma }\) is geometrically ergodic with function \(V\). We calculate

and:

So,

$$\begin{aligned} \int V(\theta ) \hat{P}_{\Sigma } (\theta _0,\mathrm{d}\theta )&\le \int V(\theta ) P_{\Sigma } (\theta _0,\mathrm{d}\theta ) + 2\mathcal {S} \max (a,b) \\&\le \delta V(\theta _0) + [L + 2\mathcal {S} \max (a,b)]. \end{aligned}$$

So all the assumptions of Theorem 2.2 are satisfied, and we can conclude that \( \Vert \pi _{\Sigma }-\pi _{\Sigma ,N}\Vert \xrightarrow [N\rightarrow \infty ]{} 0\). \(\square \)
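The geometric drift condition used above, \(\int V(\theta )\,P_{\Sigma }(\theta _0,\mathrm{d}\theta )\le \delta V(\theta _0)+L\mathbf {1}_{I}(\theta _0)\), can be observed empirically for a random-walk Metropolis kernel: far from the origin, one step shrinks \(V\) on average. The Gaussian target and the choice \(a=b=1\) in \(V\) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

s2 = 1.0  # target variance (illustrative)

def mh_step(theta, sigma=0.5, n=200000):
    # one random-walk MH step targeting N(0, s2), vectorised over n replicas
    prop = theta + sigma * rng.normal(size=n)
    log_alpha = (theta**2 - prop**2) / (2 * s2)
    accept = np.log(rng.uniform(size=n)) < log_alpha
    return np.where(accept, prop, theta)

theta0 = 5.0
V = np.abs  # drift function V(theta) = |theta|, i.e. a = b = 1
drift = V(mh_step(theta0)).mean()
print(drift, V(theta0))  # E[V(theta_1)] < V(theta_0) far from the origin
```

Downhill proposals are always accepted while uphill ones are almost surely rejected out in the tail, which is exactly why the expected value of \(V\) contracts there.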

About this article

Cite this article

Alquier, P., Friel, N., Everitt, R. et al. Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels. Stat Comput 26, 29–47 (2016). https://doi.org/10.1007/s11222-014-9521-x


Keywords

  • Markov chain Monte Carlo
  • Pseudo-marginal Monte Carlo
  • Intractable likelihoods