Abstract
We analyze the computational efficiency of approximate Bayesian computation (ABC), which approximates a likelihood function by drawing pseudo-samples from the associated model. For the rejection sampling version of ABC, it is known that multiple pseudo-samples cannot substantially increase (and can substantially decrease) the efficiency of the algorithm as compared to employing a high-variance estimate based on a single pseudo-sample. We show that this conclusion also holds for a Markov chain Monte Carlo version of ABC, implying that it is unnecessary to tune the number of pseudo-samples used in ABC-MCMC. This conclusion is in contrast to particle MCMC methods, for which increasing the number of particles can provide large gains in computational efficiency.
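As context for the analysis, an ABC-MCMC sampler in the style of Marjoram et al. (2003) with \(M\) pseudo-samples per iteration can be sketched as follows. The Gaussian toy model, proposal scale, and tolerance below are our own illustrative assumptions, not taken from the paper.

```python
import math
import random

def abc_mcmc(y_obs, M, eps, n_iter, seed=0):
    """ABC-MCMC in the style of Marjoram et al. (2003) on a toy model:
    theta ~ N(0,1) prior, y | theta ~ N(theta,1), identity summary.
    The intractable likelihood is replaced by the fraction of M
    pseudo-samples landing within eps of y_obs."""
    rng = random.Random(seed)

    def phat(th):
        # estimate of the ABC likelihood from M pseudo-samples
        return sum(abs(rng.gauss(th, 1.0) - y_obs) < eps for _ in range(M)) / M

    def prior(th):
        return math.exp(-0.5 * th * th)  # unnormalized N(0,1) density

    theta, p_cur = 0.0, 0.0
    while p_cur == 0.0:          # keep simulating until the initial estimate is positive
        p_cur = phat(theta)

    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, 1.0)       # random-walk proposal
        p_prop = phat(prop)
        ratio = (prior(prop) * p_prop) / (prior(theta) * p_cur)
        if rng.random() < min(1.0, ratio):       # pseudo-marginal accept/reject
            theta, p_cur = prop, p_prop
        chain.append(theta)
    return chain

chain = abc_mcmc(y_obs=0.0, M=1, eps=0.5, n_iter=20000)
mean = sum(chain) / len(chain)   # close to 0, the center of the ABC posterior
```

Even with \(M = 1\) this is an exact pseudo-marginal chain targeting the ABC posterior; the paper's result is that increasing `M` here cannot substantially improve efficiency per pseudo-sample.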
References
Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 697–725 (2009)
Andrieu, C., Vihola, M.: Establishing some order amongst exact approximations of MCMCs. arXiv preprint, arXiv:1404.6909v1 (2014)
Andrieu, C., Doucet, A., Holenstein, R.: Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. 72, 269–342 (2010)
Doucet, A., Pitt, M., Deligiannidis, G., Kohn, R.: Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. arXiv preprint, arXiv:1210.1871v3 (2014)
Flury, T., Shephard, N.: Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models. Econom. Theory 27(5), 933–956 (2011)
Guan, Y., Krone, S.M.: Small-world MCMC and convergence to multi-modal distributions: from slow mixing to fast mixing. Ann. Appl. Probab. 17, 284–304 (2007)
Latuszyński, K., Roberts, G.O.: CLTs and asymptotic variance of time-sampled Markov chains. Methodol. Comput. Appl. Probab. 15(1), 237–247 (2013)
Lee, A., Latuszyński, K.: Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation. arXiv preprint, arXiv:1210.6703 (2013)
Leskelä, L., Vihola, M.: Conditional convex orders and measurable martingale couplings. arXiv preprint, arXiv:1404.0999 (2014)
Marin, J.-M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22, 1167–1180 (2012)
Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S.: Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. 100(26), 15324–15328 (2003)
Narayanan, H., Rakhlin, A.: Random walk approach to regret minimization. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Conference proceedings of NIPS, Advances in Neural Information Processing Systems, vol. 23. Curran Associates, Inc., http://papers.nips.cc/book/advances-in-neural-information-processing-systems-23-2010 (2010)
Pitt, M.K., Silva, R.d.S., Giordani, P., Kohn, R.: On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econom. 171(2), 134–151 (2012)
Roberts, G.O., Rosenthal, J.S.: Variance bounding Markov chains. Ann. Appl. Probab. 18, 1201–1214 (2008)
Sherlock, C., Thiery, A.H., Roberts, G.O., Rosenthal, J.S.: On the efficiency of pseudo-marginal random walk Metropolis algorithms. arXiv preprint, arXiv:1309.7209 (2013)
Tavaré, S., Balding, D.J., Griffiths, R., Donnelly, P.: Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518 (1997)
Tierney, L.: A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab. 8, 1–9 (1998)
Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)
Woodard, D.B., Schmidler, S.C., Huber, M.: Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions. Ann. Appl. Probab. 19, 617–640 (2009)
Acknowledgments
The authors thank Alex Thiery for his careful reading of an earlier draft, as well as Pierre Jacob, Rémi Bardenet, Christophe Andrieu, Matti Vihola, Christian Robert, and Arnaud Doucet for useful discussions. This research was supported in part by U.S. National Science Foundation grants 1461435, DMS-1209103, and DMS-1406599, by DARPA under Grant No. FA8750-14-2-0117, by ARO under Grant No. W911NF-15-1-0172, and by NSERC.
Appendices
Proofs
Proof of Theorem 3
Denote by \(H_{2,\alpha }\) the transition kernel of the pseudo-marginal algorithm with proposal kernel q, target marginal distribution \(\mu \), and estimator \(T_{2,x,\alpha }\) of the unnormalized target. If we denote by \(\{ (X_{t}^{(1)}, T_{t}^{(1)}) \}_{t \in \mathbb {N}}\) and \(\{ (X_{t}^{(2)}, T_{t}^{(2)}) \}_{t \in \mathbb {N}}\) the Markov chains driven by the kernels \(H_{2,\alpha }\) and \(\alpha \mathrm {I} + (1-\alpha ) H_2\) respectively, then \(\{ X_{t}^{(1)} \}_{t \in \mathbb {N}}\) and \(\{ X_{t}^{(2)} \}_{t \in \mathbb {N}}\) have the same distribution.
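The identity between the handicapped chain and the lazy chain \(\alpha \mathrm {I} + (1-\alpha ) H_2\) can be illustrated with a small Monte Carlo check. The form of the handicapped estimator used below (zero with probability \(\alpha \), \(T/(1-\alpha )\) otherwise) is our reading of (4) of the main document, which is not reproduced here, and the exponential estimator draw is purely illustrative.

```python
import random

def accept_plain(T_cur, mh_ratio, n=200000, seed=1):
    """Mean acceptance probability of the pseudo-marginal chain H_2:
    draw a fresh estimate T' and accept w.p. min(1, (T'/T_cur) * mh_ratio)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t_prop = rng.expovariate(1.0)   # illustrative positive estimator draw
        total += min(1.0, (t_prop / T_cur) * mh_ratio)
    return total / n

def accept_handicapped(T_cur, mh_ratio, alpha, n=200000, seed=2):
    """Same chain with the handicapped estimator: the proposed estimate is 0
    with probability alpha (forcing rejection) and T'/(1-alpha) otherwise.
    The 1/(1-alpha) factors cancel in the MH ratio, since any accepted
    (hence current) estimate is also a nonzero value T_cur/(1-alpha)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t_prop = rng.expovariate(1.0)
        if rng.random() < alpha:
            continue                    # zero estimate: MH ratio is 0, reject
        total += min(1.0, (t_prop / T_cur) * mh_ratio)
    return total / n

alpha = 0.75
a2 = accept_plain(1.0, 0.8)
a2a = accept_handicapped(1.0, 0.8, alpha)
# a2a equals (1 - alpha) * a2 up to Monte Carlo error: the handicapped
# chain moves exactly as often as the lazy chain alpha*I + (1-alpha)*H_2.
```

Since the handicap only ever forces extra holds, the X-marginal of the handicapped chain coincides with that of the lazy chain, as stated above.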
If \(T_{1,x} \le _{cx} T_{2,x,\alpha }\) then by Theorem 3 of Andrieu and Vihola (2014),
for any \(f \in L^2(\mu )\). We also have
where the first inequality follows from Corollary 1 of Latuszyński and Roberts (2013) and the second follows from the fact that \(H_{2}\) has nonnegative spectrum. Combining this with (6) yields the desired result.\(\square \)
Proof of Proposition 4
For any \(M\ge 1\), let \(T_{M,\varvec{\theta }}\) be the estimator \(\hat{\pi }_{K,M}(\varvec{\theta }|\varvec{y}_{\mathrm {obs}})\) of the target \(\pi _K\), so that \(T_{M,\varvec{\theta },\alpha }\) is \(T_{M,\varvec{\theta }}\) handicapped by \(\alpha \) as defined in (4) of the main document. To obtain (5) of the main document, by Theorem 3 it is sufficient to take \(\alpha = 1-\frac{1}{M}\) and show that \(T_{1,\varvec{\theta }} \le _{cx} T_{M,\varvec{\theta },\alpha }\). By Proposition 2.2 of Leskelä and Vihola (2014), it is furthermore sufficient to show that, for all \(c \in \mathbb {R}\),
Let \(\mathrm {Bin}(n,\psi )\) denote the binomial distribution with n trials and success probability \(\psi \). For a given point \(\varvec{\theta }\in \Theta \), let \(\tau = \tau (\varvec{\theta }) \equiv \mathbb {P}[T_{1,\varvec{\theta }} \ne 0] = \int \mathbf {1}_{\{\Vert \eta (\varvec{y}_{\mathrm {obs}}) - \eta (\varvec{y})\Vert < \epsilon \}} p(\varvec{y}|\varvec{\theta }) \, d\varvec{y}\). Noting that \(\frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \in \{0, 1 \}\), we may then write \(T_{1,\varvec{\theta }}\) and \(T_{M,\varvec{\theta },\alpha }\) as the following mixtures:
where \(\delta _0\) is the unit point mass at zero. Denote \(T_{1,\varvec{\theta }}' = \frac{T_{1,\varvec{\theta }}}{\pi (\varvec{\theta })} \) and \(T_{M,\varvec{\theta },\alpha }'=\frac{T_{M,\varvec{\theta },\alpha }}{\pi (\varvec{\theta })}\). We will check condition (8) for \(T_{1,\varvec{\theta }}', T_{M,\varvec{\theta },\alpha }'\) and \(0 \le c \le 1\), then separately for \(c < 0\) and \(c > 1\). For \(0 \le c \le 1\), we compute:
For \(c < 0\), we have
and the analogous calculation gives the same conclusion for \(c \ge M\). Finally, for \(1 < c < M\), note
Also, the functions \(f_1(c) \equiv \mathbb {E}[\vert T_{1,\varvec{\theta }}' - c \vert ]\) and \(f_2(c) \equiv \mathbb {E}[\vert T_{M,\varvec{\theta },\alpha }' - c \vert ]\) are continuous, convex and piecewise linear. For \(c \ge 1\), they satisfy
where the derivative of \(f_{2}\) exists. Combining inequalities (10) and (11), we conclude that
for all \(1 < c < M\). Thus we have verified (8) and the proposition follows.\(\square \)
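The convex-order claim verified in this proof can also be checked numerically through the \(\mathbb {E}\vert T - c \vert \) characterization invoked via (8): given equal means, \(\mathbb {E}\vert X - c \vert \le \mathbb {E}\vert Y - c \vert \) for all c is equivalent to \(X \le _{cx} Y\). The mixture form of \(T_{M,\varvec{\theta },\alpha }'\) used below (zero with probability \(1 - 1/M\), a \(\mathrm {Bin}(M,\tau )\) draw otherwise) is our reading of the proof, and the values of \(\tau \) and M are arbitrary.

```python
from math import comb

def e_abs_dev_bernoulli(tau, c):
    """E|T - c| for T ~ Bernoulli(tau), i.e. the normalized
    single-pseudo-sample estimator T'_1."""
    return (1 - tau) * abs(c) + tau * abs(1 - c)

def e_abs_dev_handicapped(tau, M, c):
    """E|T - c| for the normalized handicapped M-sample estimator with
    alpha = 1 - 1/M, taken here to equal 0 with probability 1 - 1/M and
    B ~ Bin(M, tau) with probability 1/M."""
    pmf = lambda k: comb(M, k) * tau**k * (1 - tau) ** (M - k)
    return (1 - 1 / M) * abs(c) + (1 / M) * sum(
        pmf(k) * abs(k - c) for k in range(M + 1)
    )

# Both estimators have mean tau, and E|T - c| is never larger for the
# single-sample estimator -- the |.-c| characterization of <=_cx.
tau, M = 0.3, 5
for c in [x / 10 for x in range(-20, 71)]:
    assert e_abs_dev_bernoulli(tau, c) <= e_abs_dev_handicapped(tau, M, c) + 1e-12
```

The loop sweeps c across and beyond the supports \(\{0,1\}\) and \(\{0,\dots ,M\}\), mirroring the case split \(c < 0\), \(0 \le c \le 1\), \(1 < c < M\), and \(c \ge M\) in the proof.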
Analysis of an alternative ABC-MCMC method
We give a result analogous to Corollary 2 for the version of ABC-MCMC proposed in Wilkinson (2013), given in Algorithm 3 below. The constant c can be any value satisfying \(c \ge \sup _{\varvec{y}} K(\eta (\varvec{y}_{\mathrm {obs}}) - \eta (\varvec{y}))\).
Lemma 6 compares Algorithm 3 (call its transition kernel \(\tilde{Q}\)) to \(Q_\infty \).
Lemma 6
For any \(f\in L^2(\pi _K)\) we have \( v(f, \tilde{Q}) \ge v( f, Q_{\infty })\).
Proof
Both \(\tilde{Q}\) and \(Q_\infty \) have stationary density \(\pi _K\), so by Theorem 4 of Tierney (1998), it suffices to show that \(Q_\infty (\varvec{\theta },A\backslash \{\varvec{\theta }\}) \ge \tilde{Q}(\varvec{\theta },A\backslash \{\varvec{\theta }\})\) for all \(\varvec{\theta }\in \Theta \) and measurable \(A \subset \Theta \). Since \(\tilde{Q}\) and \(Q_\infty \) use the same proposal density q, it is furthermore sufficient to show that for every \(\varvec{\theta }^{(t-1)}\) and \(\varvec{\theta }'\), the acceptance probability of \(Q_\infty \) is at least as large as that of \(\tilde{Q}\). Since \(p(\cdot |\varvec{\theta })\) is a probability density,
So the acceptance probability of \(\tilde{Q}\), marginalizing over \(\varvec{y}_{\varvec{\theta }'}\), is
The acceptance probability of the Metropolis-Hastings algorithm is
Combining (12), (13), and (14), we conclude that \(a_{\mathrm {ABC}}/a_{\mathrm {MH}} \le 1\).\(\square \)
Bornn, L., Pillai, N.S., Smith, A. et al. The use of a single pseudo-sample in approximate Bayesian computation. Stat Comput 27, 583–590 (2017). https://doi.org/10.1007/s11222-016-9640-7