Abstract
Delayed-acceptance is a technique for reducing computational effort for Bayesian models with expensive likelihoods. Using a delayed-acceptance kernel for Markov chain Monte Carlo can reduce the number of expensive likelihood evaluations required to approximate a posterior expectation. Delayed-acceptance uses a surrogate, or approximate, likelihood to avoid evaluation of the expensive likelihood when possible. Within the sequential Monte Carlo (SMC) framework, we utilise the history of the sampler to adaptively tune the surrogate likelihood to yield better approximations of the expensive likelihood and use a surrogate-first annealing schedule to further increase computational efficiency. Moreover, we propose a framework for optimising computation time whilst avoiding particle degeneracy, which encapsulates existing strategies in the literature. Overall, we develop a novel algorithm for computationally efficient SMC with expensive likelihood functions. The method is applied to static Bayesian models, which we demonstrate on toy and real examples.
References
Banterle, M., Grazian, C., Lee, A., Robert, C.P.: Accelerating Metropolis-Hastings algorithms by delayed acceptance. Foundations Data Sci. 1(2), 103–128 (2019)
Barndorff-Nielsen, O., Schou, G.: On the parametrization of autoregressive models by partial autocorrelations. J. Multivariate Anal. 3(4), 408–419 (1973)
Beskos, A., Jasra, A., Kantas, N., Thiery, A.: On the convergence of adaptive sequential Monte Carlo methods. Ann. Appl. Probab. 26(2), 1111–1146 (2016)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting. Springer, Switzerland (2016)
Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3(1), 1–27 (1974)
Chopin, N.: A sequential particle filter method for static models. Biometrika 89(3), 539–552 (2002)
Christen, J.A., Fox, C.: Markov chain Monte Carlo using an approximation. J. Comput. Graph. Stat. 14(4), 795–810 (2005)
Conrad, P.R., Marzouk, Y.M., Pillai, N.S., Smith, A.: Accelerating asymptotically exact MCMC for computationally intensive models via local approximations. J. Am. Stat. Assoc. 111(516), 1591–1607 (2016)
Cui, T., Fox, C., O’Sullivan, M.: Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis Hastings algorithm. Water Resour. Res. 47(10) (2011)
Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. Royal Stat. Soc. Ser. B Stat. Methodol. 68(3), 411–436 (2006)
Donnet, S., Robin, S.: Using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971 (2017)
Drovandi, C.C., Moores, M.T., Boys, R.J.: Accelerating pseudo-marginal MCMC using Gaussian processes. Comput. Stat. Data Anal. 118, 1–17 (2018)
Drovandi, C.C., Pettitt, A.N.: Likelihood-free Bayesian estimation of multivariate quantile distributions. Comput. Stat. Data Anal. 55(9), 2541–2556 (2011)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis, vol. 3. Wiley, New York (1973)
Elf, J., Ehrenberg, M.: Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Res. 13(11), 2475–2484 (2003)
Everitt, R.G., Rowińska, P.A.: Delayed acceptance ABC-SMC. arXiv preprint arXiv:1708.02230 (2017)
Fearnhead, P., Taylor, B.M.: An adaptive sequential Monte Carlo sampler. Bayesian Anal. 8(2), 411–438 (2013)
Fox, C., Nicholls, G.: Sampling conductivity images via MCMC. In: Mardia, K., Gill, C., Aykroyd, R. (eds.) The Art and Science of Bayesian Image Analysis, Proceedings of the Leeds Annual Statistical Research Workshop (LASR), Leeds, pp. 91–100 (1997)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software 33(1), 1 (2010)
Gilks, W.R., Berzuini, C.: Following a moving target-Monte Carlo inference for dynamic Bayesian models. J. Royal Stat. Soc. Ser. B Stat. Methodol. 63(1), 127–146 (2001)
Golightly, A., Henderson, D.A., Sherlock, C.: Delayed acceptance particle MCMC for exact inference in stochastic kinetic models. Statistics Comput. 25(5), 1039–1055 (2015)
Granger, C.W., Joyeux, R.: An introduction to long-memory time series models and fractional differencing. J. Time Ser. Anal. 1(1), 15–29 (1980)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. J. Royal Stat. Soc. Ser. C Appl. Stat. 28(1), 100–108 (1979)
Hastings, W.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
Hennig, C.: fpc: Flexible Procedures for Clustering. R package version 2.2-7. https://CRAN.R-project.org/package=fpc (2020)
Jasra, A., Stephens, D.A., Doucet, A., Tsagaris, T.: Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian J. Stat. 38(1), 1–22 (2011)
Kitagawa, G.: Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5(1), 1–25 (1996)
Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032–1044 (1998)
Merkle, M.: Jensen’s inequality for multivariate medians. J. Math. Anal. Appl. 370(1), 258–269 (2010)
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
Pasarica, C., Gelman, A.: Adaptively scaling the Metropolis algorithm using expected squared jumped distance. Statistica Sinica 20(1), 343–364 (2010)
Payne, R.D., Mallick, B.K.: Two-stage Metropolis-Hastings for tall data. J. Classif. 35(1), 29–51 (2018)
Prangle, D.: Lazy ABC. Stat. Comput. 26(1–2), 171–185 (2016)
Quiroz, M., Tran, M.-N., Villani, M., Kohn, R.: Speeding up MCMC by delayed acceptance and data subsampling. J. Comput. Graph. Stat. 27(1), 12–22 (2018)
Salomone, R., Quiroz, M., Kohn, R., Villani, M., Tran, M.-N.: Spectral subsampling MCMC for stationary time series. arXiv preprint arXiv:1910.13627 (2019)
Salomone, R., South, L.F., Drovandi, C.C., Kroese, D.P.: Unbiased and consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924 (2018)
Sherlock, C., Golightly, A., Henderson, D.A.: Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods. J. Comput. Graph. Stat. 26(2), 434–444 (2017)
Sherlock, C., Thiery, A., Golightly, A.: Efficiency of delayed-acceptance random walk Metropolis algorithms. arXiv preprint arXiv:1506.08155 (2015)
Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., Järvinen, H., et al.: Efficient MCMC for climate model parameter estimation: Parallel adaptive chains and early rejection. Bayesian Anal. 7(3), 715–736 (2012)
South, L.F., Pettitt, A.N., Drovandi, C.C.: Sequential Monte Carlo samplers with independent Markov chain Monte Carlo proposals. Bayesian Anal. 14(3), 753–776 (2019). https://doi.org/10.1214/18-BA1129
Stathopoulos, V., Girolami, M.A.: Markov chain Monte Carlo inference for Markov jump processes via the linear noise approximation. Philos. Trans. Royal Soc. A Math. Phys. Eng. Sci. 371, 20110541 (2013)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)
Whittle, P.: Estimation and information in stationary time series. Arkiv för Matematik 2(5), 423–434 (1953)
Wiqvist, S., Picchini, U., Forman, J.L., Lindorff-Larsen, K., Boomsma, W.: Accelerating delayed-acceptance Markov chain Monte Carlo algorithms. arXiv preprint arXiv:1806.05982 (2018)
Acknowledgements
JJB is a recipient of a Ph.D. Research Training Program scholarship from the Australian Government. JJB, AL, and CD thank the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers for financial support (CE140100049). AL and CD were supported by an ARC Discovery Project (DP200102101). AL was supported by an EPSRC grant (EP/R034710/1) and received travel funding from the Statistical Society of Australia. JJB and CD also thank the Centre for Data Science at QUT for support.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Availability of data and material
Code to simulate datasets is available at https://github.com/bonStats/smcdar.
Code availability
Code to run examples is available at https://github.com/bonStats/smcdar.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A Appendix
A.1 One-move diversification
An alternative to ESJD diversification can be found in a simple method from South et al. (2019) for choosing the number of MCMC runs—the basis of which is from Drovandi and Pettitt (2011). In this regime, the number of cycles, k, is chosen so that each particle moves at least once in k iterations. A move occurs when a proposal is accepted using an MH kernel. We will refer to this criterion as one-move diversification. For a fixed scaling parameter h, one-move diversification uses the MH acceptance rates to determine the average number of MCMC cycles required for at least one proposal per particle to be accepted. More generally, one could require a higher minimum number of moves, but for simplicity we just consider the case of at least one move.
This section will consider a single mutation step of the SMC algorithm, consisting of multiple cycles of the MCMC kernel, indexed by \(s \in \{1,2, \ldots , k\}\). Assuming the probability of moving (or acceptance, \(\alpha ^{(1)}\)) is equal across cycles, the average probability (across the tempered posterior distribution) that at least one proposal is accepted in a sequence of k cycles, \(\alpha ^{(k)}\), is
\[ \alpha ^{(k)} = 1 - \left( 1 - \alpha ^{(1)}\right) ^{k} \tag{23} \]
for \(k \in \{1,2,\ldots \}\). A pilot mutation step can be used to estimate the average acceptance rate across the particles in a single step, \(\widehat{\alpha }^{(1)}\). We can then find k such that \(\alpha ^{(k)} \ge p_{\min }\) for some threshold \(0< p_{\min } < 1\). The formula to choose the total number of iterations, k, is
\[ k = \left\lceil \frac{\log (1 - p_{\min })}{\log \left( 1 - \widehat{\alpha }^{(1)}\right) } \right\rceil \tag{24} \]
where \(\widehat{\alpha }^{(1)}\) is the estimated acceptance rate from the pilot run of the MH kernel.
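As a minimal sketch of this rule, the Python function below returns the smallest k satisfying \(1 - (1 - \widehat{\alpha }^{(1)})^{k} \ge p_{\min }\), which is the closed form obtained when acceptances are assumed independent across cycles with a common rate. The function names `one_move_cycles` and `prob_at_least_one_move` are hypothetical and are not taken from the authors' code.

```python
import math


def prob_at_least_one_move(acc_rate: float, k: int) -> float:
    """Probability of at least one acceptance in k cycles,
    assuming independent cycles with a common acceptance rate."""
    return 1.0 - (1.0 - acc_rate) ** k


def one_move_cycles(acc_rate: float, p_min: float) -> int:
    """Smallest k such that prob_at_least_one_move(acc_rate, k) >= p_min.

    acc_rate is the pilot estimate of the single-cycle acceptance rate.
    """
    if not 0.0 < acc_rate < 1.0:
        raise ValueError("acc_rate must lie strictly between 0 and 1")
    if not 0.0 < p_min < 1.0:
        raise ValueError("p_min must lie strictly between 0 and 1")
    # Both logarithms are negative, so the ratio is positive.
    ratio = math.log(1.0 - p_min) / math.log(1.0 - acc_rate)
    return max(1, math.ceil(ratio))


if __name__ == "__main__":
    k = one_move_cycles(acc_rate=0.2, p_min=0.99)
    print(k, prob_at_least_one_move(0.2, k))
```

Because k is rounded up, the achieved probability of at least one move is typically slightly above \(p_{\min }\).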
To frame this in the context of optimising computation time, note that the underlying criterion is to ensure a sufficient number of mutation steps are taken so that the probability of at least one move is greater than \(p_{\min }\) for a given particle.
If we denote a move by \(\Vert \varvec{\theta }_{s} - \varvec{\theta }_{s-1}\Vert _{0}\), where \(\Vert \cdot \Vert _{0}\) is the zero “norm”, the corresponding diversification criterion can be expressed with
where the probability is taken with respect to the acceptance rates of the Metropolis-Hastings steps. Of course, this expression for \(D(k, {\varvec{\phi }})\) is a more general version of (23) and coincides if we assume the probability of acceptance is equal across particle locations and MCMC iterations, s. We emphasise the norm notation to draw a comparison to the jumping distance diversification in Sect. 4.1. That is, we can write \(P(k, {\varvec{\phi }})\) in (8) as
where the expectation is with respect to the random acceptance over k cycles of the MH kernel. Written in this way, \(P(k, {\varvec{\phi }})\) elicits an interesting comparison to (25); it is a change of “norm” when moving between one-move and jumping distance diversification.
Now we wish to use the one-move criterion to select the tuning parameters. If we use different proposal kernel tuning parameters for particular subsets of particles, the acceptance rate will be a function of those parameters, so we write \(\alpha ^{(k)}\) as \(\alpha ^{(k)}({\varvec{\phi }})\). The optimisation stated in (3) can be simplified as stated in Proposition 3.
Proposition 3
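Assume the cost function is \(C(k, {\varvec{\phi }}) = k \times L_{F}\), approximating the cost of a standard Metropolis-Hastings step, and \(D(k,{\varvec{\phi }}) = \alpha ^{(k)}({\varvec{\phi }})\). The latter also corresponds to (25) assuming a uniform acceptance rate across the support of \(\varvec{\theta }\). Then the general problem in (3) is equivalent to
\[ {\varvec{\phi }}^{*} = \mathop {\arg \min }_{{\varvec{\phi }} \in \Phi } \left\lceil \frac{\log (1 - p_{\min })}{\log \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }})\right\} } \right\rceil , \qquad k^{*} = \left\lceil \frac{\log (1 - p_{\min })}{\log \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }}^{*})\right\} } \right\rceil \tag{26} \]
where the general diversification threshold, d, has been replaced by the probability \(p_{\min }\).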
Proposition 3 is the solution to choosing the best tuning parameters with the one-move criterion and MH-cost. It closely connects to the original decision for k without tuning parameters (24). A proof of Proposition 3 is in Appendix A.4.
In general, we expect the tuning criterion in Proposition 3 to perform poorly. This can be demonstrated by a simple, but highly applicable, example. If the tuning parameter is the step size for an MH mutation, i.e. \({\varvec{\phi }} = [h]\), then we would expect the acceptance probability, \(\alpha ^{(1)}({\varvec{\phi }})\), to be monotone decreasing in h. Hence, the minimisation in (26) will prefer the minimum step size possible, which will ensure at least one move with the minimal computation cost. In other words, the diversification criterion in (25) is only concerned with the probability of at least one move, not the quality of this move.
Due to the aforementioned shortcoming, a diversification criterion that also measures the quality of the mutation is desirable. For this reason, we focus on the ESJD as a criterion in the main text.
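The step-size example above can be illustrated with a small simulation. The sketch below is a toy under stated assumptions, not the authors' experiment: the target is a one-dimensional standard normal, the kernel is a random-walk MH proposal with standard deviation h, and the particles are assumed to be already distributed according to the target. The names `pilot_cycle`, `one_move_cycles` and `compare_step_sizes` are hypothetical.

```python
import math

import numpy as np


def pilot_cycle(h, n_particles=5000, seed=1):
    """One random-walk MH cycle on a standard normal target (toy assumption).

    Returns the pilot acceptance rate and the mean squared jumping distance.
    """
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_particles)          # particles at the target
    proposal = theta + h * rng.standard_normal(n_particles)
    log_ratio = 0.5 * (theta**2 - proposal**2)        # log pi(proposal) - log pi(theta)
    accepted = np.log(rng.uniform(size=n_particles)) < log_ratio
    sq_jump = np.where(accepted, (proposal - theta) ** 2, 0.0)
    return float(accepted.mean()), float(sq_jump.mean())


def one_move_cycles(acc_rate, p_min=0.99):
    """Smallest k with 1 - (1 - acc_rate)**k >= p_min."""
    if acc_rate >= 1.0:
        return 1
    if acc_rate <= 0.0:
        raise ValueError("no proposals were accepted in the pilot run")
    return max(1, math.ceil(math.log(1.0 - p_min) / math.log(1.0 - acc_rate)))


def compare_step_sizes(step_sizes=(0.1, 1.0, 2.4, 10.0)):
    """Return (h, acceptance rate, one-move k, mean squared jump) per step size."""
    rows = []
    for h in step_sizes:
        acc, sq_jump = pilot_cycle(h)
        rows.append((h, acc, one_move_cycles(acc), sq_jump))
    return rows


if __name__ == "__main__":
    for h, acc, k, sq_jump in compare_step_sizes():
        print(f"h={h:5.1f}  acc={acc:.3f}  k={k:3d}  mean sq. jump={sq_jump:.4f}")
```

In this toy setting the smallest step size gives the highest acceptance rate and therefore the smallest k, even though its mean squared jumping distance is far smaller than that of a moderate step size. This is the behaviour that motivates a criterion based on jumping distance.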
A.2 Proof of Proposition 1
Let \(\mathsf {D}_m = \left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi :D(k,{\varvec{\phi }}) \ge d\right\} \) with
Using the multivariate Jensen inequality for medians in Merkle (2010, Theorem 5.2), we have that
and assuming the jumping distances are iid, for a given \({\phi }\), we further reduce this to
We can define the set \(\tilde{\mathsf {D}}_m\) as
then we see from (27) that \(\tilde{\mathsf {D}}_m \subseteq \mathsf {D}_m\).
A.3 Proof of Proposition 2
Under the MH cost function, \(C(k, {\varvec{\phi }}) = k \times L_{F}\), and approximate ESJD diversification criterion,
the inequality for the diversification criterion can be rearranged into
Under this restriction, note that
so under these conditions, the general problem in (3) is equivalent to
A.4 Proof of Proposition 3
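Under the MH cost function, \(C(k, {\varvec{\phi }}) = k \times L_{F}\), and one-move diversification criterion,
\[ D(k, {\varvec{\phi }}) = \alpha ^{(k)}({\varvec{\phi }}) = 1 - \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }})\right\} ^{k}, \]
the inequality for the diversification criterion, \(D(k, {\varvec{\phi }}) \ge p_{\min }\), can be rearranged into
\[ k \ge \frac{\log (1 - p_{\min })}{\log \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }})\right\} }, \]
where the inequality reverses because \(\log \{1 - \alpha ^{(1)}({\varvec{\phi }})\} < 0\). Under this restriction, note that
\[ \min _{k \in \mathbb {Z}^{+}} C(k, {\varvec{\phi }}) = L_{F} \left\lceil \frac{\log (1 - p_{\min })}{\log \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }})\right\} } \right\rceil , \]
so under these conditions, the general problem in (3) is equivalent to
\[ {\varvec{\phi }}^{*} = \mathop {\arg \min }_{{\varvec{\phi }} \in \Phi } \left\lceil \frac{\log (1 - p_{\min })}{\log \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }})\right\} } \right\rceil , \qquad k^{*} = \left\lceil \frac{\log (1 - p_{\min })}{\log \left\{ 1 - \alpha ^{(1)}({\varvec{\phi }}^{*})\right\} } \right\rceil . \]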
A.5 Additional figures and tables from simulation
A.6 Additional figures from ARFIMA model example
Cite this article
Bon, J.J., Lee, A. & Drovandi, C. Accelerating sequential Monte Carlo with surrogate likelihoods. Stat Comput 31, 62 (2021). https://doi.org/10.1007/s11222-021-10036-4
DOI: https://doi.org/10.1007/s11222-021-10036-4