Abstract
We analyze the computational efficiency of approximate Bayesian computation (ABC), which approximates a likelihood function by drawing pseudo-samples from the associated model. For the rejection sampling version of ABC, it is known that multiple pseudo-samples cannot substantially increase (and can substantially decrease) the efficiency of the algorithm as compared to employing a high-variance estimate based on a single pseudo-sample. We show that this conclusion also holds for a Markov chain Monte Carlo version of ABC, implying that it is unnecessary to tune the number of pseudo-samples used in ABC-MCMC. This conclusion is in contrast to particle MCMC methods, for which increasing the number of particles can provide large gains in computational efficiency.
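As context for the analysis, an ABC-MCMC sampler in the style of Marjoram et al. (2003) with \(M\) pseudo-samples per iteration can be sketched as follows. The Gaussian toy model, proposal scale, and tolerance below are our own illustrative assumptions, not taken from the paper.

```python
import math
import random

def abc_mcmc(y_obs, M, eps, n_iter, seed=0):
    """ABC-MCMC in the style of Marjoram et al. (2003) on a toy model:
    theta ~ N(0,1) prior, y | theta ~ N(theta,1), identity summary.
    The intractable likelihood is replaced by the fraction of M
    pseudo-samples landing within eps of y_obs."""
    rng = random.Random(seed)

    def phat(th):
        # estimate of the ABC likelihood from M pseudo-samples
        return sum(abs(rng.gauss(th, 1.0) - y_obs) < eps for _ in range(M)) / M

    def prior(th):
        return math.exp(-0.5 * th * th)  # unnormalized N(0,1) density

    theta, p_cur = 0.0, 0.0
    while p_cur == 0.0:          # keep simulating until the initial estimate is positive
        p_cur = phat(theta)

    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, 1.0)       # random-walk proposal
        p_prop = phat(prop)
        ratio = (prior(prop) * p_prop) / (prior(theta) * p_cur)
        if rng.random() < min(1.0, ratio):       # pseudo-marginal accept/reject
            theta, p_cur = prop, p_prop
        chain.append(theta)
    return chain

chain = abc_mcmc(y_obs=0.0, M=1, eps=0.5, n_iter=20000)
mean = sum(chain) / len(chain)   # close to 0, the center of the ABC posterior
```

Even with \(M = 1\) this is an exact pseudo-marginal chain targeting the ABC posterior; the paper's result is that increasing `M` here cannot substantially improve efficiency per pseudo-sample.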
References
Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 697–725 (2009)
Andrieu, C., Vihola, M.: Establishing some order amongst exact approximations of MCMCs. arXiv preprint, arXiv:1404.6909v1 (2014)
Andrieu, C., Doucet, A., Holenstein, R.: Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. 72, 269–342 (2010)
Doucet, A., Pitt, M., Deligiannidis, G., Kohn, R.: Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. arXiv preprint, arXiv:1210.1871v3 (2014)
Flury, T., Shephard, N.: Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econom. Theory 27(5), 933–956 (2011)
Guan, Y., Krone, S.M.: Small-world MCMC and convergence to multi-modal distributions: from slow mixing to fast mixing. Ann. Appl. Probab. 17, 284–304 (2007)
Latuszyński, K., Roberts, G.O.: CLTs and asymptotic variance of time-sampled Markov chains. Methodol. Comput. Appl. Probab. 15(1), 237–247 (2013)
Lee, A., Latuszyński, K.: Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation. arXiv preprint, arXiv:1210.6703 (2013)
Leskelä, L., Vihola, M.: Conditional convex orders and measurable martingale couplings. arXiv preprint, arXiv:1404.0999 (2014)
Marin, J.-M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22, 1167–1180 (2012)
Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S.: Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. 100(26), 15324–15328 (2003)
Narayanan, H., Rakhlin, A.: Random walk approach to regret minimization. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Conference proceedings of NIPS, Advances in Neural Information Processing Systems, vol. 23. Curran Associates, Inc., http://papers.nips.cc/book/advances-in-neural-information-processing-systems-23-2010 (2010)
Pitt, M.K., Silva, R.d.S., Giordani, P., Kohn, R.: On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econom. 171(2), 134–151 (2012)
Roberts, G.O., Rosenthal, J.S.: Variance bounding Markov chains. Ann. Appl. Probab. 18, 1201–1214 (2008)
Sherlock, C., Thiery, A.H., Roberts, G.O., Rosenthal, J.S.: On the efficiency of pseudo-marginal random walk Metropolis algorithms. arXiv preprint, arXiv:1309.7209 (2013)
Tavaré, S., Balding, D.J., Griffiths, R., Donnelly, P.: Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518 (1997)
Tierney, L.: A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab. 8, 1–9 (1998)
Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)
Woodard, D.B., Schmidler, S.C., Huber, M.: Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions. Ann. Appl. Probab. 19, 617–640 (2009)
Acknowledgments
The authors thank Alex Thiery for his careful reading of an earlier draft, as well as Pierre Jacob, Rémi Bardenet, Christophe Andrieu, Matti Vihola, Christian Robert, and Arnaud Doucet for useful discussions. This research was supported in part by U.S. National Science Foundation grants 1461435, DMS-1209103, and DMS-1406599, by DARPA under Grant No. FA8750-14-2-0117, by ARO under Grant No. W911NF-15-1-0172, and by NSERC.
Appendices
Proofs
Proof of Theorem 3
Denote by \(H_{2,\alpha }\) the transition kernel of the pseudo-marginal algorithm with proposal kernel q, target marginal distribution \(\mu \), and estimator \(T_{2,x,\alpha }\) of the unnormalized target. If we denote by \(\{ (X_{t}^{(1)}, T_{t}^{(1)}) \}_{t \in \mathbb {N}}\) and \(\{ (X_{t}^{(2)}, T_{t}^{(2)}) \}_{t \in \mathbb {N}}\) the Markov chains driven by the kernels \(H_{2,\alpha }\) and \(\alpha \mathrm {I} + (1-\alpha ) H_2\) respectively, then \(\{ X_{t}^{(1)} \}_{t \in \mathbb {N}}\) and \(\{ X_{t}^{(2)} \}_{t \in \mathbb {N}}\) have the same distribution.
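The identity between the handicapped chain and the lazy chain \(\alpha \mathrm {I} + (1-\alpha ) H_2\) can be illustrated with a small Monte Carlo check. The form of the handicapped estimator used below (zero with probability \(\alpha \), \(T/(1-\alpha )\) otherwise) is our reading of (4) of the main document, which is not reproduced here, and the exponential estimator draw is purely illustrative.

```python
import random

def accept_plain(T_cur, mh_ratio, n=200000, seed=1):
    """Mean acceptance probability of the pseudo-marginal chain H_2:
    draw a fresh estimate T' and accept w.p. min(1, (T'/T_cur) * mh_ratio)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t_prop = rng.expovariate(1.0)   # illustrative positive estimator draw
        total += min(1.0, (t_prop / T_cur) * mh_ratio)
    return total / n

def accept_handicapped(T_cur, mh_ratio, alpha, n=200000, seed=2):
    """Same chain with the handicapped estimator: the proposed estimate is 0
    with probability alpha (forcing rejection) and T'/(1-alpha) otherwise.
    The 1/(1-alpha) factors cancel in the MH ratio, since any accepted
    (hence current) estimate is also a nonzero value T_cur/(1-alpha)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t_prop = rng.expovariate(1.0)
        if rng.random() < alpha:
            continue                    # zero estimate: MH ratio is 0, reject
        total += min(1.0, (t_prop / T_cur) * mh_ratio)
    return total / n

alpha = 0.75
a2 = accept_plain(1.0, 0.8)
a2a = accept_handicapped(1.0, 0.8, alpha)
# a2a equals (1 - alpha) * a2 up to Monte Carlo error: the handicapped
# chain moves exactly as often as the lazy chain alpha*I + (1-alpha)*H_2.
```

Since the handicap only ever forces extra holds, the X-marginal of the handicapped chain coincides with that of the lazy chain, as stated above.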
If \(T_{1,x} \le _{cx} T_{2,x,\alpha }\) then by Theorem 3 of Andrieu and Vihola (2014),
for any \(f \in L^2(\mu )\). We also have
where the first inequality follows from Corollary 1 of Latuszyński and Roberts (2013) and the second follows from the fact that \(H_{2}\) has nonnegative spectrum. Combining this with (6) yields the desired result.\(\square \)
Proof of Proposition 4
For any \(M\ge 1\), let \(T_{M,\varvec{\theta }}\) be the estimator \(\hat{\pi }_{K,M}(\varvec{\theta }|\varvec{y}_{\mathrm {obs}})\) of the target \(\pi _K\), so that \(T_{M,\varvec{\theta },\alpha }\) is \(T_{M,\varvec{\theta }}\) handicapped by \(\alpha \) as defined in (4) of the main document. To obtain (5) of the main document, by Theorem 3 it is sufficient to take \(\alpha = 1-\frac{1}{M}\) and show that \(T_{1,\varvec{\theta }} \le _{cx} T_{M,\varvec{\theta },\alpha }\). By Proposition 2.2 of Leskelä and Vihola (2014), it is furthermore sufficient to show that, for all \(c \in \mathbb {R}\),
Let \(\mathrm {Bin}(n,\psi )\) denote the binomial distribution with n trials and success probability \(\psi \). For a given point \(\varvec{\theta }\in \Theta \), let \(\tau = \tau (\varvec{\theta }) \equiv \mathbb {P}[T_{1,\varvec{\theta }} \ne 0] = \int \mathbf {1}_{\{\Vert \eta (\varvec{y}_{\mathrm {obs}}) - \eta (\varvec{y})\Vert < \epsilon \}} p(\varvec{y}|\varvec{\theta }) \, d\varvec{y}\). Noting that \(\frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \in \{0, 1 \}\), we may then write \(T_{1,\varvec{\theta }}\) and \(T_{M,\varvec{\theta },\alpha }\) as the following mixtures:
where \(\delta _0\) is the unit point mass at zero. Denote \(T_{1,\varvec{\theta }}' = \frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \) and \(T_{M,\varvec{\theta },\alpha }'=\frac{T_{M,\varvec{\theta },\alpha }}{\pi (\varvec{\theta })}\). We will check condition (8) for \(T_{1,\varvec{\theta }}', T_{M,\varvec{\theta },\alpha }'\) and \(0 \le c \le 1\), then separately for \(c < 0\) and \(c > 1\). For \(0 \le c \le 1\), we compute:
For \(c < 0\), we have
and the analogous calculation gives the same conclusion for \(c \ge M\). Finally, for \(1 < c < M\), note
Also, the functions \(f_1(c) \equiv \mathbb {E}[\vert T_{1,\varvec{\theta }}' - c \vert ]\) and \(f_2(c) \equiv \mathbb {E}[\vert T_{M,\varvec{\theta },\alpha }' - c \vert ]\) are continuous, convex and piecewise linear. For \(c \ge 1\), they satisfy
where the derivative of \(f_{2}\) exists. Combining inequalities (10) and (11), we conclude that
for all \(1 < c < M\). Thus we have verified (8) and the proposition follows.\(\square \)
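The convex-order claim verified in this proof can also be checked numerically through the \(\mathbb {E}\vert T - c \vert \) characterization invoked via (8): given equal means, \(\mathbb {E}\vert X - c \vert \le \mathbb {E}\vert Y - c \vert \) for all c is equivalent to \(X \le _{cx} Y\). The mixture form of \(T_{M,\varvec{\theta },\alpha }'\) used below (zero with probability \(1 - 1/M\), a \(\mathrm {Bin}(M,\tau )\) draw otherwise) is our reading of the proof, and the values of \(\tau \) and M are arbitrary.

```python
from math import comb

def e_abs_dev_bernoulli(tau, c):
    """E|T - c| for T ~ Bernoulli(tau), i.e. the normalized
    single-pseudo-sample estimator T'_1."""
    return (1 - tau) * abs(c) + tau * abs(1 - c)

def e_abs_dev_handicapped(tau, M, c):
    """E|T - c| for the normalized handicapped M-sample estimator with
    alpha = 1 - 1/M, taken here to equal 0 with probability 1 - 1/M and
    B ~ Bin(M, tau) with probability 1/M."""
    pmf = lambda k: comb(M, k) * tau**k * (1 - tau) ** (M - k)
    return (1 - 1 / M) * abs(c) + (1 / M) * sum(
        pmf(k) * abs(k - c) for k in range(M + 1)
    )

# Both estimators have mean tau, and E|T - c| is never larger for the
# single-sample estimator -- the |.-c| characterization of <=_cx.
tau, M = 0.3, 5
for c in [x / 10 for x in range(-20, 71)]:
    assert e_abs_dev_bernoulli(tau, c) <= e_abs_dev_handicapped(tau, M, c) + 1e-12
```

The loop sweeps c across and beyond the supports \(\{0,1\}\) and \(\{0,\dots ,M\}\), mirroring the case split \(c < 0\), \(0 \le c \le 1\), \(1 < c < M\), and \(c \ge M\) in the proof.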
Analysis of an alternative ABC-MCMC method
We give a result analogous to Corollary 2 for the version of ABC-MCMC proposed in Wilkinson (2013), given in Algorithm 3 below. The constant c can be any value satisfying \(c \ge \sup _{\varvec{y}} K(\eta (\varvec{y}_{\mathrm {obs}}) - \eta (\varvec{y}))\).
Lemma 6 compares Algorithm 3 (call its transition kernel \(\tilde{Q}\)) to \(Q_\infty \).
Lemma 6
For any \(f\in L^2(\pi _K)\) we have \( v(f, \tilde{Q}) \ge v( f, Q_{\infty })\).
Proof
Both \(\tilde{Q}\) and \(Q_\infty \) have stationary density \(\pi _K\), so by Theorem 4 of Tierney (1998), it suffices to show that \(Q_\infty (\varvec{\theta },A\backslash \{\varvec{\theta }\}) \ge \tilde{Q}(\varvec{\theta },A\backslash \{\varvec{\theta }\})\) for all \(\varvec{\theta }\in \Theta \) and measurable \(A \subset \Theta \). Since \(\tilde{Q}\) and \(Q_\infty \) use the same proposal density q, it is furthermore sufficient to show that for every \(\varvec{\theta }^{(t-1)}\) and \(\varvec{\theta }'\), the acceptance probability of \(Q_\infty \) is at least as large as that of \(\tilde{Q}\). Since \(p(\cdot |\varvec{\theta })\) is a probability density,
So the acceptance probability of \(\tilde{Q}\), marginalizing over \(\varvec{y}_{\varvec{\theta }'}\), is
The acceptance probability of the Metropolis-Hastings algorithm is
Combining (12), (13), and (14), we conclude that \(a_{\mathrm {ABC}}/a_{\mathrm {MH}} \le 1\).\(\square \)
Bornn, L., Pillai, N.S., Smith, A. et al. The use of a single pseudo-sample in approximate Bayesian computation. Stat Comput 27, 583–590 (2017). https://doi.org/10.1007/s11222-016-9640-7