
The use of a single pseudo-sample in approximate Bayesian computation

  • Published in: Statistics and Computing

Abstract

We analyze the computational efficiency of approximate Bayesian computation (ABC), which approximates a likelihood function by drawing pseudo-samples from the associated model. For the rejection sampling version of ABC, it is known that multiple pseudo-samples cannot substantially increase (and can substantially decrease) the efficiency of the algorithm as compared to employing a high-variance estimate based on a single pseudo-sample. We show that this conclusion also holds for a Markov chain Monte Carlo version of ABC, implying that it is unnecessary to tune the number of pseudo-samples used in ABC-MCMC. This conclusion is in contrast to particle MCMC methods, for which increasing the number of particles can provide large gains in computational efficiency.
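To make the setting concrete, the following sketch implements a pseudo-marginal ABC-MCMC with a configurable number of pseudo-samples M. The Gaussian model, flat prior, tolerance, and step size are illustrative choices of ours, not taken from the paper; the result above says that running such an algorithm with M = 1 sacrifices little efficiency relative to larger M.

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs, eps = 0.3, 0.5  # observed summary statistic and ABC tolerance

def abc_lik_hat(theta, M):
    """Unbiased estimate of the ABC likelihood P(|y - y_obs| < eps),
    with pseudo-samples y ~ N(theta, 1), built from M draws."""
    y = rng.normal(theta, 1.0, size=M)
    return np.mean(np.abs(y - y_obs) < eps)

def abc_mcmc(n_iter, M, step=1.0):
    """Pseudo-marginal random-walk Metropolis with a flat prior on theta."""
    theta, lik = 0.0, 0.0
    while lik == 0.0:            # start from a state with a positive estimate
        lik = abc_lik_hat(theta, M)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        lik_prop = abc_lik_hat(prop, M)
        # Accept/reject using the ratio of likelihood *estimates*; the
        # estimate is carried along with the state (pseudo-marginal).
        if rng.uniform() < lik_prop / lik:
            theta, lik = prop, lik_prop
        chain[t] = theta
    return chain

chain = abc_mcmc(5000, M=1)  # per the paper, M = 1 loses little efficiency
```

Note that the likelihood estimate is stored with the state and reused in later acceptance ratios; refreshing it in place at the current state would change the stationary distribution.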

References

  • Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 697–725 (2009)

  • Andrieu, C., Vihola, M.: Establishing some order amongst exact approximations of MCMCs. arXiv preprint, arXiv:1404.6909v1 (2014)

  • Andrieu, C., Doucet, A., Holenstein, R.: Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. 72, 269–342 (2010)

  • Doucet, A., Pitt, M., Deligiannidis, G., Kohn, R.: Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. arXiv preprint, arXiv:1210.1871v3 (2014)

  • Flury, T., Shephard, N.: Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econom. Theory 27(5), 933–956 (2011)

  • Guan, Y., Krone, S.M.: Small-world MCMC and convergence to multi-modal distributions: from slow mixing to fast mixing. Ann. Appl. Probab. 17, 284–304 (2007)

  • Latuszyński, K., Roberts, G.O.: CLTs and asymptotic variance of time-sampled Markov chains. Methodol. Comput. Appl. Probab. 15(1), 237–247 (2013)

  • Lee, A., Latuszyński, K.: Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation. arXiv preprint, arXiv:1210.6703 (2013)

  • Leskelä, L., Vihola, M.: Conditional convex orders and measurable martingale couplings. arXiv preprint, arXiv:1404.0999 (2014)

  • Marin, J.-M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22, 1167–1180 (2012)

  • Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S.: Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. 100(26), 15324–15328 (2003)

  • Narayanan, H., Rakhlin, A.: Random walk approach to regret minimization. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Conference proceedings of NIPS, Advances in Neural Information Processing Systems, vol. 23. Curran Associates, Inc., http://papers.nips.cc/book/advances-in-neural-information-processing-systems-23-2010 (2010)

  • Pitt, M.K., Silva, R.d.S., Giordani, P., Kohn, R.: On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econom. 171(2), 134–151 (2012)

  • Roberts, G.O., Rosenthal, J.S.: Variance bounding Markov chains. Ann. Appl. Probab. 18, 1201–1214 (2008)

  • Sherlock, C., Thiery, A.H., Roberts, G.O., Rosenthal, J.S.: On the efficiency of pseudo-marginal random walk Metropolis algorithms. arXiv preprint, arXiv:1309.7209 (2013)

  • Tavaré, S., Balding, D.J., Griffiths, R., Donnelly, P.: Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518 (1997)

  • Tierney, L.: A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab. 8, 1–9 (1998)

  • Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)

  • Woodard, D.B., Schmidler, S.C., Huber, M.: Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions. Ann. Appl. Probab. 19, 617–640 (2009)

Acknowledgments

The authors thank Alex Thiery for his careful reading of an earlier draft, as well as Pierre Jacob, Rémi Bardenet, Christophe Andrieu, Matti Vihola, Christian Robert, and Arnaud Doucet for useful discussions. This research was supported in part by U.S. National Science Foundation grants 1461435, DMS-1209103, and DMS-1406599, by DARPA under Grant No. FA8750-14-2-0117, by ARO under Grant No. W911NF-15-1-0172, and by NSERC.

Author information

Corresponding author

Correspondence to Luke Bornn.

Appendices

Proofs

Proof of Theorem 3

Denote by \(H_{2,\alpha }\) the transition kernel of the pseudo-marginal algorithm with proposal kernel q, target marginal distribution \(\mu \), and estimator \(T_{2,x,\alpha }\) of the unnormalized target. If we denote by \(\{ (X_{t}^{(1)}, T_{t}^{(1)}) \}_{t \in \mathbb {N}}\) and \(\{ (X_{t}^{(2)}, T_{t}^{(2)}) \}_{t \in \mathbb {N}}\) the Markov chains driven by the kernels \(H_{2,\alpha }\) and \(\alpha \mathrm {I} + (1-\alpha ) H_2\) respectively, then \(\{ X_{t}^{(1)} \}_{t \in \mathbb {N}}\) and \(\{ X_{t}^{(2)} \}_{t \in \mathbb {N}}\) have the same distribution.

If \(T_{1,x} \le _{cx} T_{2,x,\alpha }\) then by Theorem 3 of Andrieu and Vihola (2014),

$$\begin{aligned} v(f,H_{1})&\le v(f, H_{2,\alpha }) \end{aligned}$$
(6)

for any \(f \in L^2(\mu )\). We also have

$$\begin{aligned} v(f, H_{2,\alpha })\le & {} \frac{1}{1-\alpha } v(f,H_{2}) + \frac{\alpha }{1 - \alpha } v(f) \nonumber \\\le & {} \frac{1+ \alpha }{1 - \alpha } v(f, H_{2}), \end{aligned}$$
(7)

where the first inequality follows from Corollary 1 of Latuszyński and Roberts (2013) and the second follows from the fact that \(H_{2}\) has nonnegative spectrum. Combining this with (6) yields the desired result.\(\square \)

Proof of Proposition 4

For any \(M\ge 1\), let \(T_{M,\varvec{\theta }}\) be the estimator \(\hat{\pi }_{K,M}(\varvec{\theta }|\varvec{y}_{\mathrm {obs}})\) of the target \(\pi _K\), so that \(T_{M,\varvec{\theta },\alpha }\) is \(T_{M,\varvec{\theta }}\) handicapped by \(\alpha \) as defined in (4) of the main document. To obtain (5) of the main document, by Theorem 3 it is sufficient to take \(\alpha = 1-\frac{1}{M}\) and show that \(T_{1,\varvec{\theta }} \le _{cx} T_{M,\varvec{\theta },\alpha }\). By Proposition 2.2 of Leskelä and Vihola (2014), it is furthermore sufficient to show that, for all \(c \in \mathbb {R}\),

$$\begin{aligned} \mathbb {E}[\vert T_{1,\varvec{\theta }} - c \vert ] \le \mathbb {E}[\vert T_{M,\varvec{\theta },\alpha } - c \vert ]. \end{aligned}$$
(8)

Let \(\mathrm {Bin}(n,\psi )\) denote the binomial distribution with n trials and success probability \(\psi \). For a given point \(\varvec{\theta }\in \Theta \), let \(\tau = \tau (\varvec{\theta }) \equiv \mathbb {P}[T_{1,\varvec{\theta }} \ne 0] = \int \mathbf 1 _{\{\Vert \eta (\varvec{y}_{\mathrm {obs}}) - \eta (\varvec{y})\Vert < \epsilon \}} p(\varvec{y}|\varvec{\theta }) d \varvec{y}\). Noting that \(\frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \in \{0, 1 \}\), we may then write \(T_{1,\varvec{\theta }}\) and \(T_{M,\varvec{\theta },\alpha }\) as the following mixtures

$$\begin{aligned}&\frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \mathop {=}\limits ^{D} \mathrm {Bin}(1,\tau ),\nonumber \\&\frac{T_{M,\varvec{\theta },\alpha }}{ \pi (\varvec{\theta })} \mathop {=}\limits ^{D} \frac{M-1}{M} \delta _{0} + \frac{1}{M} \mathrm {Bin}(M,\tau ), \end{aligned}$$

where \(\delta _0\) is the unit point mass at zero. Denote \(T_{1,\varvec{\theta }}' = \frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \) and \(T_{M,\varvec{\theta },\alpha }'=\frac{T_{M,\varvec{\theta },\alpha }}{\pi (\varvec{\theta })}\). We will check condition (8) for \(T_{1,\varvec{\theta }}', T_{M,\varvec{\theta },\alpha }'\) and \(0 \le c \le 1\), then separately for \(c < 0\) and \(c > 1\). For \(0 \le c \le 1\), we compute:

$$\begin{aligned}&\mathbb {E}\left[ \vert T_{M,\varvec{\theta },\alpha }' - c \vert \right] \nonumber \\&\quad =\left( 1 - \frac{1}{M} \right) c + \frac{1}{M} (1 - \tau )^{M} c \nonumber \\&\qquad \; +\,\frac{1}{M} \left( \sum _{j=1}^{M} \frac{M!}{j! (M-j)!} \tau ^{j} (1-\tau )^{M-j} \left( j - c \right) \right) \nonumber \\&\quad = \tau + c \left( 1 - \frac{2}{M}\left( 1 - (1-\tau )^{M} \right) \right) \ge \tau + c \left( 1 - 2\tau \right) \nonumber \\&\quad = \mathbb {E}\left[ \vert T_{1,\varvec{\theta }}' - c \vert \right] . \end{aligned}$$
(9)

For \(c < 0\), we have

$$\begin{aligned} \mathbb {E}\left[ \vert T_{M,\varvec{\theta },\alpha }' - c \vert \right]= & {} \mathbb {E}\left[ T_{M,\varvec{\theta },\alpha }'\right] - c = \mathbb {E}\left[ T_{1,\varvec{\theta }}'\right] - c \\= & {} \mathbb {E}\left[ \vert T_{1,\varvec{\theta }}' - c \vert \right] , \end{aligned}$$

and the analogous calculation gives the same conclusion for \(c \ge M\). Finally, for \(1< c < M\), note

$$\begin{aligned}&\mathbb {E}\left[ \vert T_{1,\varvec{\theta }}' - 1 \vert \right] \le \mathbb {E}\left[ \vert T_{M,\varvec{\theta },\alpha }' - 1 \vert \right] , \nonumber \\&\mathbb {E}\left[ \vert T_{1,\varvec{\theta }}' - M \vert \right] = \mathbb {E}\left[ \vert T_{M,\varvec{\theta },\alpha }' - M \vert \right] . \end{aligned}$$
(10)

Also, the functions \(f_1(c) \equiv \mathbb {E}[\vert T_{1,\varvec{\theta }}' - c \vert ]\) and \(f_2(c) \equiv \mathbb {E}[\vert T_{M,\varvec{\theta },\alpha }' - c \vert ]\) are continuous, convex and piecewise linear. For \(c \ge 1\), they satisfy

$$\begin{aligned} \frac{d}{dc} f_{1}(c) = 1 \ge \frac{d}{dc} f_{2}(c) \end{aligned}$$
(11)

where the derivative of \(f_{2}\) exists. Combining inequalities (10) and (11), we conclude that

$$\begin{aligned} \mathbb {E}[\vert T_{1,\varvec{\theta }}' - c \vert ] \le \mathbb {E}[\vert T_{M,\varvec{\theta },\alpha }' - c \vert ] \end{aligned}$$

for all \(1 < c < M\). Thus we have verified (8) and the proposition follows.\(\square \)
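The convex-order condition just verified can also be checked numerically. The following script (a sanity check of ours, not part of the proof) computes both sides of (8) exactly for \(T_{1,\varvec{\theta }}'\) and \(T_{M,\varvec{\theta },\alpha }'\) from the binomial pmf:

```python
from math import comb

def e_abs_single(tau, c):
    """E|T' - c| for T' ~ Bin(1, tau)."""
    return (1 - tau) * abs(c) + tau * abs(1 - c)

def e_abs_mixture(tau, c, M):
    """E|T' - c| for T' ~ (1 - 1/M) delta_0 + (1/M) Bin(M, tau)."""
    binom = sum(comb(M, j) * tau**j * (1 - tau) ** (M - j) * abs(j - c)
                for j in range(M + 1))
    return (1 - 1 / M) * abs(c) + binom / M

# E|T'_1 - c| <= E|T'_{M,alpha} - c| for every c, as (8) requires; for
# c < 0 and c >= M the two sides agree, matching the proof's case analysis.
for tau in (0.05, 0.3, 0.7, 0.95):
    for M in (2, 5, 20):
        for c in (-0.5, 0.0, 0.4, 1.0, 2.5, float(M), 2.0 * M):
            assert e_abs_single(tau, c) <= e_abs_mixture(tau, c, M) + 1e-12
```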

Analysis of an alternative ABC-MCMC method

We give a result analogous to Corollary 2 for the version of ABC-MCMC proposed in Wilkinson (2013), given in Algorithm 3 below. The constant c can be any value satisfying \(c \ge \sup _{\varvec{y}} K(\eta (\varvec{y}_{\mathrm {obs}}) - \eta (\varvec{y}))\).
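The following sketch shows one iteration of such a kernel-weighted ABC-MCMC step. It is our reconstruction, with toy choices of model, kernel, and proposal; the factorized acceptance probability is the form consistent with the bound in (13), and \(c = 1\) is valid here because the Gaussian kernel used has supremum 1.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.5
K = lambda u: np.exp(-u**2 / (2 * eps**2))  # sup_u K(u) = 1, so c = 1 works
c = 1.0

def wilkinson_step(theta, y_obs, eta, sample_y, log_prior, propose):
    """One iteration of kernel-weighted ABC-MCMC: propose theta',
    simulate y' ~ p(.|theta'), and accept with probability
    [K(eta(y_obs) - eta(y')) / c] * min{1, prior ratio}.
    The proposal here is a symmetric random walk, so the q-terms cancel."""
    prop = propose(theta)
    y = sample_y(prop)
    mh = min(1.0, np.exp(log_prior(prop) - log_prior(theta)))
    accept = rng.uniform() < (K(eta(y_obs) - eta(y)) / c) * mh
    return (prop if accept else theta), accept

# Toy run: y ~ N(theta, 1), identity summary, N(0, 3^2) prior.
theta, n_acc = 0.0, 0
for _ in range(300):
    theta, acc = wilkinson_step(
        theta, y_obs=0.3,
        eta=lambda y: y,
        sample_y=lambda th: rng.normal(th, 1.0),
        log_prior=lambda th: -th**2 / 18.0,
        propose=lambda th: th + rng.normal(),
    )
    n_acc += acc
```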

Lemma 6 compares Algorithm 3 (call its transition kernel \(\tilde{Q}\)) to \(Q_\infty \).

[Algorithm 3]

Lemma 6

For any \(f\in L^2(\pi _K)\) we have \( v(f, \tilde{Q}) \ge v( f, Q_{\infty })\).

Proof

Both \(\tilde{Q}\) and \(Q_\infty \) have stationary density \(\pi _K\), so by Theorem 4 of Tierney (1998), it suffices to show that \(Q_\infty (\varvec{\theta },A\backslash \{\varvec{\theta }\}) \ge \tilde{Q}(\varvec{\theta },A\backslash \{\varvec{\theta }\})\) for all \(\varvec{\theta }\in \Theta \) and measurable \(A \subset \Theta \). Since \(\tilde{Q}\) and \(Q_\infty \) use the same proposal density q, it is furthermore sufficient to show that for every \(\varvec{\theta }^{(t-1)}\) and \(\varvec{\theta }'\), the acceptance probability of \(Q_\infty \) is at least as large as that of \(\tilde{Q}\). Since \(p(\cdot |\varvec{\theta })\) is a probability density,

$$\begin{aligned}&\int K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y})) p(\varvec{y}|\varvec{\theta }) d\varvec{y}\nonumber \\ {}&\quad \le \sup _{\varvec{y}} K(\eta (\varvec{y}_{\mathrm {obs}})- \eta (\varvec{y})) \qquad \forall \varvec{\theta }\in \Theta . \end{aligned}$$
(12)

So the acceptance probability of \(\tilde{Q}\), marginalizing over \(\varvec{y}_{\varvec{\theta }'}\), is

$$\begin{aligned} a_{\mathrm {ABC}}= & {} \int r(\varvec{\theta }^{(t-1)}, \varvec{\theta }' | \varvec{y}) p(\varvec{y}| \varvec{\theta }') d \varvec{y}\nonumber \\\le & {} \frac{\int K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y})) p(\varvec{y}| \varvec{\theta }') d\varvec{y}}{\sup _{\varvec{y}} K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y}))}\nonumber \\&\times \min \left\{ 1,\frac{\pi (\varvec{\theta }') q(\varvec{\theta }^{(t-1)} \vert \varvec{\theta }')}{\pi (\varvec{\theta }^{(t-1)}) q(\varvec{\theta }' \vert \varvec{\theta }^{(t-1)})}\right\} \nonumber \\\le & {} \min \left\{ 1, \frac{\int K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y})) p(\varvec{y}| \varvec{\theta }') d\varvec{y}}{\sup _{\varvec{y}} K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y}))}\right. \nonumber \\&\left. \times \left( \frac{\pi (\varvec{\theta }') q(\varvec{\theta }^{(t-1)} \vert \varvec{\theta }')}{\pi (\varvec{\theta }^{(t-1)}) q(\varvec{\theta }' \vert \varvec{\theta }^{(t-1)})}\right) \right\} . \end{aligned}$$
(13)

The acceptance probability of the Metropolis-Hastings algorithm is

$$\begin{aligned} a_{\mathrm {MH}}= & {} \min \left\{ 1, \frac{\int K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y})) p(\varvec{y}| \varvec{\theta }') d\varvec{y}}{\int K(\eta (\varvec{y}_{\mathrm {obs}})-\eta (\varvec{y})) p(\varvec{y}| \varvec{\theta }^{(t-1)}) d\varvec{y}}\right. \nonumber \\&\times \left. \left( \frac{\pi (\varvec{\theta }') q(\varvec{\theta }^{(t-1)} \vert \varvec{\theta }')}{\pi (\varvec{\theta }^{(t-1)}) q(\varvec{\theta }' \vert \varvec{\theta }^{(t-1)})}\right) \right\} . \end{aligned}$$
(14)

Using (12), (13), and (14), \(a_{\mathrm {ABC}}/a_{\mathrm {MH}} \le 1\).\(\square \)

Cite this article

Bornn, L., Pillai, N.S., Smith, A. et al. The use of a single pseudo-sample in approximate Bayesian computation. Stat Comput 27, 583–590 (2017). https://doi.org/10.1007/s11222-016-9640-7
