Accelerating sequential Monte Carlo with surrogate likelihoods

Abstract

Delayed-acceptance is a technique for reducing computational effort for Bayesian models with expensive likelihoods. Using a delayed-acceptance kernel for Markov chain Monte Carlo can reduce the number of expensive likelihood evaluations required to approximate a posterior expectation. Delayed-acceptance uses a surrogate, or approximate, likelihood to avoid evaluation of the expensive likelihood when possible. Within the sequential Monte Carlo (SMC) framework, we utilise the history of the sampler to adaptively tune the surrogate likelihood to yield better approximations of the expensive likelihood, and use a surrogate first annealing schedule to further increase computational efficiency. Moreover, we propose a framework for optimising computation time whilst avoiding particle degeneracy, which encapsulates existing strategies in the literature. Overall, we develop a novel algorithm for computationally efficient SMC with expensive likelihood functions. The method is applied to static Bayesian models, which we demonstrate on toy and real examples.

References

  • Banterle, M., Grazian, C., Lee, A., Robert, C.P.: Accelerating Metropolis-Hastings algorithms by delayed acceptance. Foundations Data Sci. 1(2), 103–128 (2019)

  • Barndorff-Nielsen, O., Schou, G.: On the parametrization of autoregressive models by partial autocorrelations. J. Multivariate Anal. 3(4), 408–419 (1973)

  • Beskos, A., Jasra, A., Kantas, N., Thiery, A.: On the convergence of adaptive sequential Monte Carlo methods. Ann. Appl. Probab. 26(2), 1111–1146 (2016)

  • Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  • Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting. Springer, Switzerland (2016)

  • Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3(1), 1–27 (1974)

  • Chopin, N.: A sequential particle filter method for static models. Biometrika 89(3), 539–552 (2002)

  • Christen, J.A., Fox, C.: Markov chain Monte Carlo using an approximation. J. Comput. Graph. Stat. 14(4), 795–810 (2005)

  • Conrad, P.R., Marzouk, Y.M., Pillai, N.S., Smith, A.: Accelerating asymptotically exact MCMC for computationally intensive models via local approximations. J. Am. Stat. Assoc. 111(516), 1591–1607 (2016)

  • Cui, T., Fox, C. and O’Sullivan, M.: Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis-Hastings algorithm. Water Resour. Res. 47(10) (2011)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. Royal Stat. Soc. Ser. B Stat. Methodol. 68(3), 411–436 (2006)

  • Donnet, S., and Robin, S.: Using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971 (2017)

  • Drovandi, C.C., Moores, M.T., Boys, R.J.: Accelerating pseudo-marginal MCMC using Gaussian processes. Comput. Stat. Data Anal. 118, 1–17 (2018)

  • Drovandi, C.C., Pettitt, A.N.: Likelihood-free Bayesian estimation of multivariate quantile distributions. Comput. Stat. Data Anal. 55(9), 2541–2556 (2011)

  • Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis, vol. 3. Wiley, New York (1973)

  • Elf, J., Ehrenberg, M.: Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Res. 13(11), 2475–2484 (2003)

  • Everitt, R. G. and Rowińska, P. A.: Delayed acceptance ABC-SMC. arXiv preprint arXiv:1708.02230 (2017)

  • Fearnhead, P., Taylor, B.M., et al.: An adaptive sequential Monte Carlo sampler. Bayesian Anal. 8(2), 411–438 (2013)

  • Fox, C. and Nicholls, G.: Sampling conductivity images via MCMC. In: Mardia, K., Gill, C., Aykroyd, R. (eds.) The Art and Science of Bayesian Image Analysis, Proceedings of the Leeds Annual Statistical Research Workshop (LASR), Leeds, pp. 91–100 (1997)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software 33(1), 1 (2010)

  • Gilks, W.R., Berzuini, C.: Following a moving target-Monte Carlo inference for dynamic Bayesian models. J. Royal Stat. Soc. Ser. B Stat. Methodol. 63(1), 127–146 (2001)

  • Golightly, A., Henderson, D.A., Sherlock, C.: Delayed acceptance particle MCMC for exact inference in stochastic kinetic models. Statistics Comput. 25(5), 1039–1055 (2015)

  • Granger, C.W., Joyeux, R.: An introduction to long-memory time series models and fractional differencing. J. Time Ser. Anal. 1(1), 15–29 (1980)

  • Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. J. Royal Stat. Soc. Ser. C Appl. Stat. 28(1), 100–108 (1979)

  • Hastings, W.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

  • Hennig, C.: fpc: Flexible Procedures for Clustering. R package version 2.2-7. https://CRAN.R-project.org/package=fpc (2020)

  • Jasra, A., Stephens, D.A., Doucet, A., Tsagaris, T.: Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian J. Stat. 38(1), 1–22 (2011)

  • Kitagawa, G.: Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5(1), 1–25 (1996)

  • Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032–1044 (1998)

  • Merkle, M.: Jensen’s inequality for multivariate medians. J. Math. Anal. Appl. 370(1), 258–269 (2010)

  • Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

  • Pasarica, C., Gelman, A.: Adaptively scaling the Metropolis algorithm using expected squared jumped distance. Statistica Sinica 20(1), 343–364 (2010)

  • Payne, R.D., Mallick, B.K.: Two-stage Metropolis-Hastings for tall data. J. Classif. 35(1), 29–51 (2018)

  • Prangle, D.: Lazy ABC. Stat. Comput. 26(1–2), 171–185 (2016)

  • Quiroz, M., Tran, M.-N., Villani, M., Kohn, R.: Speeding up MCMC by delayed acceptance and data subsampling. J. Comput. Graph. Stat. 27(1), 12–22 (2018)

  • Salomone, R., Quiroz, M., Kohn, R., Villani, M. and Tran, M.-N.: Spectral subsampling MCMC for stationary time series. arXiv preprint arXiv:1910.13627 (2019)

  • Salomone, R., South, L. F., Drovandi, C. C. and Kroese, D. P.: Unbiased and consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924 (2018)

  • Sherlock, C., Golightly, A., Henderson, D.A.: Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods. J. Comput. Graph. Stat. 26(2), 434–444 (2017)

  • Sherlock, C., Thiery, A. and Golightly, A.: Efficiency of delayed-acceptance random walk Metropolis algorithms. arXiv preprint arXiv:1506.08155 (2015)

  • Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., Järvinen, H., et al.: Efficient MCMC for climate model parameter estimation: Parallel adaptive chains and early rejection. Bayesian Anal. 7(3), 715–736 (2012)

  • South, L.F., Pettitt, A.N., Drovandi, C.C.: Sequential Monte Carlo samplers with independent Markov chain Monte Carlo proposals. Bayesian Anal. 14(3), 753–776 (2019). https://doi.org/10.1214/18-BA1129

  • Stathopoulos, V., Girolami, M.A.: Markov chain Monte Carlo inference for Markov jump processes via the linear noise approximation. Philos. Trans. Royal Soc. A Math. Phys. Eng. Sci. 371(1984), 20110541 (2013)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)

  • Whittle, P.: Estimation and information in stationary time series. Arkiv för Matematik 2(5), 423–434 (1953)

  • Wiqvist, S., Picchini, U., Forman, J. L., Lindorff-Larsen, K. and Boomsma, W.: Accelerating delayed-acceptance Markov chain Monte Carlo algorithms. arXiv preprint arXiv:1806.05982 (2018)

Acknowledgements

JJB is a recipient of a Ph.D. Research Training Program scholarship from the Australian Government. JJB, AL, and CD thank the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers for financial support (CE140100049). AL and CD were supported by an ARC Discovery Project (DP200102101). AL was supported by an EPSRC grant (EP/R034710/1) and received travel funding from the Statistical Society of Australia. JJB and CD also thank the Centre for Data Science at QUT for support.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Availability of data and material

Code to simulate datasets is available at https://github.com/bonStats/smcdar.

Code availability

Code to run examples is available at https://github.com/bonStats/smcdar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

A.1 One-move diversification

An alternative to ESJD diversification can be found in a simple method from South et al. (2019) for choosing the number of MCMC runs—the basis of which is from Drovandi and Pettitt (2011). In this regime, the number of cycles, k, is chosen so that each particle moves at least once in k iterations. A move occurs when a proposal is accepted using an MH kernel. We will refer to this criterion as one-move diversification. For a fixed scaling parameter h, one-move diversification uses the MH acceptance rates to determine the average number of MCMC cycles required for at least one proposal per particle to be accepted. More generally, one could require a higher minimum number of moves, but for simplicity we just consider the case of at least one move.

This section considers a single mutation step of the SMC algorithm, consisting of multiple cycles of the MCMC kernel, indexed by \(s \in \{1,2, \ldots , k\}\). Assuming the probability of moving (that is, of acceptance, \(\alpha ^{(1)}\)) is equal across cycles, the average probability (across the tempered posterior distribution) that at least one proposal is accepted in a sequence of k cycles, \(\alpha ^{(k)}\), is

$$\begin{aligned} \alpha ^{(k)} = 1 - \left( 1-\alpha ^{(1)}\right) ^{k} \end{aligned}$$
(23)

for \(k \in \{1,2,\ldots \}\). A pilot mutation step can be used to estimate the average acceptance rate across the particles in a single step, \(\widehat{\alpha }^{(1)}\). We can then find k such that \(\alpha ^{(k)} \ge p_{\min }\) for some threshold \(0< p_{\min } < 1\). The formula to choose the total number of iterations, k, is

$$\begin{aligned} k = \left\lceil \frac{\log (1 - p_{\min })}{\log \left( 1-\widehat{\alpha }^{(1)}\right) } \right\rceil \end{aligned}$$
(24)

where \(\widehat{\alpha }^{(1)}\) is the estimated acceptance rate from the pilot run of the MH kernel.
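
To make the calculation concrete, the following sketch (Python; the function name and example acceptance rate are illustrative assumptions, not taken from the paper or its code) evaluates (24) for a pilot estimate \(\widehat{\alpha }^{(1)}\).

```python
import math

def cycles_for_one_move(alpha_hat_1, p_min=0.95):
    """Smallest k with 1 - (1 - alpha_hat_1)**k >= p_min, i.e. formula (24)."""
    assert 0.0 < alpha_hat_1 < 1.0 and 0.0 < p_min < 1.0
    return math.ceil(math.log(1.0 - p_min) / math.log(1.0 - alpha_hat_1))

# A pilot acceptance rate of 0.23 with p_min = 0.95 gives k = 12 cycles.
print(cycles_for_one_move(0.23, 0.95))
```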

To frame this in the context of optimising computation time, note that the underlying criterion is to ensure a sufficient number of mutation steps are taken so that the probability of at least one move is greater than \(p_{\min }\) for a given particle.

If we denote a move by \(\Vert \varvec{\theta }_{s} - \varvec{\theta }_{s-1}\Vert _{0}\), where \(\Vert \cdot \Vert _{0}\) is the zero “norm”, the corresponding diversification criterion can be expressed with

$$\begin{aligned} D(k, {\varvec{\phi }}) = \mathsf {P}\left( \sum _{s=1}^{k} \left\| \varvec{\theta }_{s} - \varvec{\theta }_{s-1} \right\| _{0} \ge 1 \right) \quad \text {and} \quad d= p_{\min } \end{aligned}$$
(25)

where the probability is taken with respect to the acceptance rates of the Metropolis-Hastings steps. Of course, this expression for \(D(k, {\varvec{\phi }})\) is a more general version of (23) and coincides with it if we assume the probability of acceptance is equal across particle locations and MCMC iterations s. We emphasise the norm notation to draw a comparison to the jumping distance diversification in Sect. 4.1. That is, we can write \(P(k, {\varvec{\phi }})\) in (8) as

$$\begin{aligned} P(k, {\varvec{\phi }}) = \mathsf {P}\left( \mathsf {E}\left[ \sum _{s=1}^{k} \left\| \varvec{\theta }_{s} - \varvec{\theta }_{s-1} \right\| ^{2}_{\Sigma } \right] \ge d \right) \end{aligned}$$

where the expectation is with respect to the random acceptance over k cycles of the MH kernel. Written in this way, \(P(k, {\varvec{\phi }})\) elicits an interesting comparison to (25); it is a change of “norm” when moving between one-move and jumping distance diversification.
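
As an illustrative sketch of this distinction (Python; the simulated acceptance indicators and jumping distances are placeholder assumptions, not output from an actual SMC run), the empirical analogues of the one-move probability in (25) and the ESJD-type quantity in (8) can be computed from the same record of k MH cycles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder record of k = 5 MH cycles for N = 1000 particles:
# jump_sq[i, s] is the Sigma-weighted squared jumping distance of particle i
# at cycle s; a rejected proposal contributes a zero jump.
N, k = 1000, 5
accepted = rng.random((N, k)) < 0.25
jump_sq = np.where(accepted, rng.chisquare(df=2, size=(N, k)), 0.0)

# One-move criterion (25): estimated probability of at least one accepted
# move within the k cycles.
one_move = accepted.any(axis=1).mean()

# ESJD-style quantity (cf. (8)): total squared jumping distance over the
# k cycles, averaged across particles.
esjd = jump_sq.sum(axis=1).mean()

print(one_move, esjd)
```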

Now we wish to use the one-move criterion to select the tuning parameters. If we use different proposal kernel tuning parameters for particular subsets of particles, the acceptance rate will be a function of those parameters, so we write \(\alpha ^{(k)}\) as \(\alpha ^{(k)}({\varvec{\phi }})\). The optimisation in (3) can then be simplified as stated in Proposition 3.

Proposition 3

Assume the cost function is \(C(k, {\varvec{\phi }}) = k \times L_{F}\), approximating the cost of a standard Metropolis-Hastings step, and \(D(k,{\varvec{\phi }}) = \alpha ^{(k)}({\varvec{\phi }})\). The latter also corresponds to (25) assuming a uniform acceptance rate across the support of \(\varvec{\theta }\). Then the general problem in (3) is equivalent to

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{{\varvec{\phi }} \in \Phi }~ \frac{\log (1 - p_{\min })}{\log \left( 1-\alpha ^{(1)}({\varvec{\phi }})\right) } \end{aligned}$$
(26)

where the general diversification threshold, d, has been replaced by the probability \(p_{\min }\).

Proposition 3 gives the solution for choosing the best tuning parameters under the one-move criterion and MH cost. It closely connects to the original choice of k without tuning parameters in (24). A proof of Proposition 3 is in Appendix A.4.

In general, we expect the tuning criterion in Proposition 3 to perform poorly. This can be demonstrated by a simple, but highly applicable, example. If the tuning parameter is the step size for an MH mutation, i.e. \({\varvec{\phi }} = [h]\), then we would expect the acceptance probability, \(\alpha ^{(1)}({\varvec{\phi }})\), to be monotone decreasing in h. Hence, the minimisation in (26) will prefer the minimum step size possible, which will ensure at least one move with the minimal computation cost. In other words, the diversification criterion in (25) is only concerned with the probability of at least one move, not the quality of this move.
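
A small numerical illustration of this shortcoming (Python; the acceptance curve \(\alpha ^{(1)}(h)\) below is a hypothetical monotone-decreasing example, not estimated from any model in the paper): the objective in (26) decreases as h decreases, so the minimiser is always the smallest candidate step size.

```python
import math

def one_move_cost(alpha_1, p_min=0.95):
    """Objective in (26), proportional to the expected cost under one-move diversification."""
    return math.log(1.0 - p_min) / math.log(1.0 - alpha_1)

# Hypothetical acceptance rates: a smaller step size h gives higher acceptance,
# hence a smaller objective, so h = 0.01 "wins" even though such tiny moves
# barely diversify the particles.
for h, alpha_1 in [(0.01, 0.95), (0.1, 0.60), (0.5, 0.30), (1.0, 0.10)]:
    print(f"h = {h:<4}: cost = {one_move_cost(alpha_1):.2f}")
```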

Due to the aforementioned shortcoming, a diversification criterion that also measures the quality of the mutation is desirable. For this reason, we focus on the ESJD as a criterion in the main text.

Table 4 Median (80% interval) multiplicative improvement of computation time relative to MH-SMC (using median tuning method) for simulation study

A.2 Proof of Proposition 1

Let \(\mathsf {D}_m = \left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi :D(k,{\varvec{\phi }}) \ge d\right\} \) with

$$\begin{aligned} D(k,{\varvec{\phi }}) = \mathrm {median}\left\{ \sum _{s=1}^{k} J_{s}({\varvec{\phi }}) \right\} . \end{aligned}$$

Using the multivariate Jensen inequality for medians (Merkle 2010, Theorem 5.2), we have

$$\begin{aligned} \sum _{s=1}^{k}\mathrm {median}\left\{ J_{s}({\varvec{\phi }}) \right\} \le \mathrm {median}\left\{ \sum _{s=1}^{k} J_{s}({\varvec{\phi }}) \right\} \end{aligned}$$

and, assuming the jumping distances are iid for a given \({\varvec{\phi }}\), we can further reduce this to

$$\begin{aligned} k \times \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} \le \mathrm {median}\left\{ \sum _{s=1}^{k} J_{s}({\varvec{\phi }}) \right\} . \end{aligned}$$
(27)

We can define the set \(\tilde{\mathsf {D}}_m\) as

$$\begin{aligned} \tilde{\mathsf {D}}_m =&\left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi :\tilde{D}(k,{\varvec{\phi }}) \ge d\right\} \\ \text {where }&\tilde{D}(k, {\varvec{\phi }}) = k \times \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} , \end{aligned}$$

and then (27) shows that \(\tilde{\mathsf {D}}_m \subseteq \mathsf {D}_m\), which completes the proof.
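
As a quick numerical sanity check of inequality (27), the following sketch (Python; the chi-squared stand-in for the iid jumping distances is an arbitrary assumption, not part of the paper) compares the two medians by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
k, reps = 10, 100_000

# Stand-in iid jumping distances J_s(phi); any nonnegative distribution works.
J = rng.chisquare(df=2, size=(reps, k))

lhs = k * np.median(J[:, 0])      # k * median{ J_1(phi) }
rhs = np.median(J.sum(axis=1))    # median{ sum_{s=1}^k J_s(phi) }
print(lhs <= rhs, lhs, rhs)       # the inequality in (27) holds
```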

A.3 Proof of Proposition 2

Under the MH cost function, \(C(k, {\varvec{\phi }}) = k \times L_{F}\), and the approximate ESJD diversification criterion,

$$\begin{aligned} \tilde{\mathsf {D}}_m =&\left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi : \tilde{D}(k,{\varvec{\phi }}) \ge d\right\} \\ \text {where }&\tilde{D}(k,{\varvec{\phi }}) = k \times \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} , \end{aligned}$$

the inequality for the diversification criterion can be rearranged into

$$\begin{aligned} k \ge \frac{d}{\mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} }. \end{aligned}$$

Under this restriction, note that

$$\begin{aligned} C(k, {\varvec{\phi }}) \ge L_{F} \times \frac{d}{\mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} }. \end{aligned}$$

Hence, under these conditions, the general problem in (3) is equivalent to

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{{\varvec{\phi }} \in \Phi }~ \left( \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} \right) ^{-1} \equiv \mathop {\hbox {arg max}}\limits _{\varvec{\phi } \in \Phi }~ \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} . \end{aligned}$$
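
This result says that, under the MH cost and the approximate ESJD criterion, tuning reduces to maximising the median squared jumping distance over the candidate tuning parameters. A minimal sketch of that selection (Python; the one-dimensional standard normal target, pilot particles assumed to be at stationarity, and random-walk MH kernel with step size h are all illustrative assumptions, not the paper's examples):

```python
import numpy as np

rng = np.random.default_rng(2)

def median_jump_sq(h, n=5000):
    """Pilot estimate of median{J_1(phi)} for a random-walk MH step of size h
    on a standard normal target; rejected proposals contribute zero jumps."""
    theta = rng.standard_normal(n)                 # pilot particles
    prop = theta + h * rng.standard_normal(n)      # random-walk proposals
    log_alpha = -0.5 * (prop**2 - theta**2)        # MH log-acceptance ratio
    accept = np.log(rng.random(n)) < log_alpha
    return np.median(np.where(accept, (prop - theta) ** 2, 0.0))

# Proposition 2: choose the step size with the largest median squared jump.
candidates = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best_h = max(candidates, key=median_jump_sq)
print(best_h)
```

In this toy setting the objective is typically maximised at an intermediate step size, in contrast with the one-move criterion of Proposition 3, which is minimised at the smallest candidate.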

A.4 Proof of Proposition 3

Under the MH cost function, \(C(k, {\varvec{\phi }}) = k \times L_{F}\), and the one-move diversification criterion,

$$\begin{aligned} \mathsf {D} =&\left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi : \alpha ^{(k)}({\varvec{\phi }}) \ge p_{\min }\right\} \\ \text {where }&\alpha ^{(k)}({\varvec{\phi }}) = 1 - (1-\alpha ^{(1)}({\varvec{\phi }}))^{k}, \end{aligned}$$

the inequality for the diversification criterion can be rearranged into

$$\begin{aligned} k \ge \frac{\log (1 - p_{\min })}{\log (1-\alpha ^{(1)}({\varvec{\phi }}))}. \end{aligned}$$

Under this restriction, note that

$$\begin{aligned} C(k, {\varvec{\phi }}) \ge L_{F} \times \frac{\log (1 - p_{\min })}{\log (1-\alpha ^{(1)}({\varvec{\phi }}))}. \end{aligned}$$

Hence, under these conditions, the general problem in (3) is equivalent to

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{{\varvec{\phi }} \in \Phi }~ \frac{\log (1 - p_{\min })}{\log (1-\alpha ^{(1)}({\varvec{\phi }}))}. \end{aligned}$$

A.5 Additional figures and tables from simulation

Fig. 3 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 1)

Fig. 4 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 2)

Fig. 5 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 3)

Fig. 6 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 4)

Fig. 7 Posteriors of \(\phi _{1}\) from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

Fig. 8 Posteriors of \(\theta _{1}\) from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

A.6 Additional figures from ARFIMA model example

Fig. 9 Posteriors of d from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

Fig. 10 Posteriors of \(\sigma ^{2}\) from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

About this article

Cite this article

Bon, J.J., Lee, A. & Drovandi, C. Accelerating sequential Monte Carlo with surrogate likelihoods. Stat Comput 31, 62 (2021). https://doi.org/10.1007/s11222-021-10036-4
