Accelerating sequential Monte Carlo with surrogate likelihoods

Abstract

Delayed-acceptance is a technique for reducing computational effort for Bayesian models with expensive likelihoods. Using a delayed-acceptance kernel for Markov chain Monte Carlo can reduce the number of expensive likelihood evaluations required to approximate a posterior expectation. Delayed-acceptance uses a surrogate, or approximate, likelihood to avoid evaluation of the expensive likelihood when possible. Within the sequential Monte Carlo (SMC) framework, we utilise the history of the sampler to adaptively tune the surrogate likelihood to yield better approximations of the expensive likelihood, and use a surrogate first annealing schedule to further increase computational efficiency. Moreover, we propose a framework for optimising computation time whilst avoiding particle degeneracy, which encapsulates existing strategies in the literature. Overall, we develop a novel algorithm for computationally efficient SMC with expensive likelihood functions. The method is applied to static Bayesian models, which we demonstrate on toy and real examples.

References

  • Banterle, M., Grazian, C., Lee, A., Robert, C.P.: Accelerating Metropolis-Hastings algorithms by delayed acceptance. Foundations Data Sci. 1(2), 103–128 (2019)

  • Barndorff-Nielsen, O., Schou, G.: On the parametrization of autoregressive models by partial autocorrelations. J. Multivariate Anal. 3(4), 408–419 (1973)

  • Beskos, A., Jasra, A., Kantas, N., Thiery, A.: On the convergence of adaptive sequential Monte Carlo methods. Ann. Appl. Probab. 26(2), 1111–1146 (2016)

  • Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  • Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting. Springer, Switzerland (2016)

  • Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3(1), 1–27 (1974)

  • Chopin, N.: A sequential particle filter method for static models. Biometrika 89(3), 539–552 (2002)

  • Christen, J.A., Fox, C.: Markov chain Monte Carlo using an approximation. J. Comput. Graph. Stat. 14(4), 795–810 (2005)

  • Conrad, P.R., Marzouk, Y.M., Pillai, N.S., Smith, A.: Accelerating asymptotically exact MCMC for computationally intensive models via local approximations. J. Am. Stat. Assoc. 111(516), 1591–1607 (2016)

  • Cui, T., Fox, C. and O’Sullivan, M.: Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis-Hastings algorithm. Water Resour. Res. 47(10) (2011)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. Royal Stat. Soc. Ser. B Stat. Methodol. 68(3), 411–436 (2006)

  • Donnet, S., and Robin, S.: Using deterministic approximations to accelerate SMC for posterior sampling. arXiv preprint arXiv:1707.07971 (2017)

  • Drovandi, C.C., Moores, M.T., Boys, R.J.: Accelerating pseudo-marginal MCMC using Gaussian processes. Comput. Stat. Data Anal. 118, 1–17 (2018)

  • Drovandi, C.C., Pettitt, A.N.: Likelihood-free Bayesian estimation of multivariate quantile distributions. Comput. Stat. Data Anal. 55(9), 2541–2556 (2011)

  • Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis, vol. 3. Wiley, New York (1973)

  • Elf, J., Ehrenberg, M.: Fast evaluation of fluctuations in biochemical networks with the linear noise approximation. Genome Res. 13(11), 2475–2484 (2003)

  • Everitt, R. G. and Rowińska, P. A.: Delayed acceptance ABC-SMC. arXiv preprint arXiv:1708.02230 (2017)

  • Fearnhead, P., Taylor, B.M., et al.: An adaptive sequential Monte Carlo sampler. Bayesian Anal. 8(2), 411–438 (2013)

  • Fox, C. and Nicholls, G.: Sampling conductivity images via MCMC. In: Mardia, K., Gill, C., Aykroyd, R. (eds.) The Art and Science of Bayesian Image Analysis, Proceedings of the Leeds Annual Statistical Research Workshop (LASR), Leeds, pp. 91–100 (1997)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software 33(1), 1 (2010)

  • Gilks, W.R., Berzuini, C.: Following a moving target-Monte Carlo inference for dynamic Bayesian models. J. Royal Stat. Soc. Ser. B Stat. Methodol. 63(1), 127–146 (2001)

  • Golightly, A., Henderson, D.A., Sherlock, C.: Delayed acceptance particle MCMC for exact inference in stochastic kinetic models. Statistics Comput. 25(5), 1039–1055 (2015)

  • Granger, C.W., Joyeux, R.: An introduction to long-memory time series models and fractional differencing. J. Time Ser. Anal. 1(1), 15–29 (1980)

  • Hartigan, J.A., Wong, M.A.: Algorithm AS 136: A k-means clustering algorithm. J. Royal Stat. Soc. Ser. C Appl. Stat. 28(1), 100–108 (1979)

  • Hastings, W.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

  • Hennig, C.: fpc: Flexible Procedures for Clustering. R package version 2.2-7. https://CRAN.R-project.org/package=fpc (2020)

  • Jasra, A., Stephens, D.A., Doucet, A., Tsagaris, T.: Inference for Lévy-driven stochastic volatility models via adaptive sequential Monte Carlo. Scandinavian J. Stat. 38(1), 1–22 (2011)

  • Kitagawa, G.: Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5(1), 1–25 (1996)

  • Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032–1044 (1998)

  • Merkle, M.: Jensen’s inequality for multivariate medians. J. Math. Anal. Appl. 370(1), 258–269 (2010)

  • Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

  • Pasarica, C., Gelman, A.: Adaptively scaling the Metropolis algorithm using expected squared jumped distance. Statistica Sinica 20(1), 343–364 (2010)

  • Payne, R.D., Mallick, B.K.: Two-stage Metropolis-Hastings for tall data. J. Classif. 35(1), 29–51 (2018)

  • Prangle, D.: Lazy ABC. Stat. Comput. 26(1–2), 171–185 (2016)

  • Quiroz, M., Tran, M.-N., Villani, M., Kohn, R.: Speeding up MCMC by delayed acceptance and data subsampling. J. Comput. Graph. Stat. 27(1), 12–22 (2018)

  • Salomone, R., Quiroz, M., Kohn, R., Villani, M. and Tran, M.-N.: Spectral subsampling MCMC for stationary time series. arXiv preprint arXiv:1910.13627 (2019)

  • Salomone, R., South, L. F., Drovandi, C. C. and Kroese, D. P.: Unbiased and consistent nested sampling via sequential Monte Carlo. arXiv preprint arXiv:1805.03924 (2018)

  • Sherlock, C., Golightly, A., Henderson, D.A.: Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods. J. Comput. Graph. Stat. 26(2), 434–444 (2017)

  • Sherlock, C., Thiery, A. and Golightly, A.: Efficiency of delayed-acceptance random walk Metropolis algorithms. arXiv preprint arXiv:1506.08155 (2015)

  • Solonen, A., Ollinaho, P., Laine, M., Haario, H., Tamminen, J., Järvinen, H., et al.: Efficient MCMC for climate model parameter estimation: Parallel adaptive chains and early rejection. Bayesian Anal. 7(3), 715–736 (2012)

  • South, L.F., Pettitt, A.N., Drovandi, C.C.: Sequential Monte Carlo samplers with independent Markov chain Monte Carlo proposals. Bayesian Anal. 14(3), 753–776 (2019). https://doi.org/10.1214/18-BA1129

  • Stathopoulos, V., Girolami, M.A.: Markov chain Monte Carlo inference for Markov jump processes via the linear noise approximation. Philos. Trans. Royal Soc. A Math. Phys. Eng. Sci. 371(1984), 20110541 (2013)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)

  • Whittle, P.: Estimation and information in stationary time series. Arkiv för Matematik 2(5), 423–434 (1953)

  • Wiqvist, S., Picchini, U., Forman, J. L., Lindorff-Larsen, K. and Boomsma, W.: Accelerating delayed-acceptance Markov chain Monte Carlo algorithms. arXiv preprint arXiv:1806.05982 (2018)

Acknowledgements

JJB is a recipient of a Ph.D. Research Training Program scholarship from the Australian Government. JJB, AL, and CD thank the Australian Research Council (ARC) Centre of Excellence for Mathematical and Statistical Frontiers for financial support (CE140100049). AL and CD were supported by an ARC Discovery Project (DP200102101). AL was supported by an EPSRC grant (EP/R034710/1) and received travel funding from the Statistical Society of Australia. JJB and CD also thank the Centre for Data Science at QUT for support.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Availability of data and material

Code to simulate datasets is available at https://github.com/bonStats/smcdar.

Code availability

Code to run examples is available at https://github.com/bonStats/smcdar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

A.1 One-move diversification

An alternative to ESJD diversification can be found in a simple method from South et al. (2019) for choosing the number of MCMC runs—the basis of which is from Drovandi and Pettitt (2011). In this regime, the number of cycles, k, is chosen so that each particle moves at least once in k iterations. A move occurs when a proposal is accepted using an MH kernel. We will refer to this criterion as one-move diversification. For a fixed scaling parameter h, one-move diversification uses the MH acceptance rates to determine the average number of MCMC cycles required for at least one proposal per particle to be accepted. More generally, one could require a higher minimum number of moves, but for simplicity we just consider the case of at least one move.

This section considers a single mutation step of the SMC algorithm, consisting of multiple cycles of the MCMC kernel, indexed by \(s \in \{1,2, \ldots , k\}\). Assuming the probability of moving (that is, of acceptance, \(\alpha ^{(1)}\)) is equal across cycles, the average probability (across the tempered posterior distribution) that at least one proposal is accepted in a sequence of k cycles, \(\alpha ^{(k)}\), is

$$\begin{aligned} \alpha ^{(k)} = 1 - \left( 1-\alpha ^{(1)}\right) ^{k} \end{aligned}$$
(23)

for \(k \in \{1,2,\ldots \}\). A pilot mutation step can be used to estimate the average acceptance rate across the particles in a single step, \(\widehat{\alpha }^{(1)}\). We can then find k such that \(\alpha ^{(k)} \ge p_{\min }\) for some threshold \(0< p_{\min } < 1\). The formula to choose the total number of iterations, k, is

$$\begin{aligned} k = \left\lceil \frac{\log (1 - p_{\min })}{\log \left( 1-\widehat{\alpha }^{(1)}\right) } \right\rceil \end{aligned}$$
(24)

where \(\widehat{\alpha }^{(1)}\) is the estimated acceptance rate from the pilot run of the MH kernel.
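
To make the calculation concrete, the following sketch (Python; the function name and example acceptance rate are illustrative assumptions, not taken from the paper or its code) evaluates (24) for a pilot estimate \(\widehat{\alpha }^{(1)}\).

```python
import math

def cycles_for_one_move(alpha_hat_1, p_min=0.95):
    """Smallest k with 1 - (1 - alpha_hat_1)**k >= p_min, i.e. formula (24)."""
    assert 0.0 < alpha_hat_1 < 1.0 and 0.0 < p_min < 1.0
    return math.ceil(math.log(1.0 - p_min) / math.log(1.0 - alpha_hat_1))

# A pilot acceptance rate of 0.23 with p_min = 0.95 gives k = 12 cycles.
print(cycles_for_one_move(0.23, 0.95))
```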

To frame this in the context of optimising computation time, note that the underlying criterion is to ensure a sufficient number of mutation steps are taken so that the probability of at least one move is greater than \(p_{\min }\) for a given particle.

If we denote a move by \(\Vert \varvec{\theta }_{s} - \varvec{\theta }_{s-1}\Vert _{0}\), where \(\Vert \cdot \Vert _{0}\) is the zero “norm”, the corresponding diversification criterion can be expressed with

$$\begin{aligned} D(k, {\varvec{\phi }}) = \mathsf {P}\left( \sum _{s=1}^{k} \left\| \varvec{\theta }_{s} - \varvec{\theta }_{s-1} \right\| _{0} \ge 1 \right) \quad \text {and} \quad d= p_{\min } \end{aligned}$$
(25)

where the probability is taken with respect to the acceptance rates of the Metropolis-Hastings steps. Of course, this expression for \(D(k, {\varvec{\phi }})\) is a more general version of (23) and coincides with it if we assume the probability of acceptance is equal across particle locations and MCMC iterations s. We emphasise the norm notation to draw a comparison to the jumping distance diversification in Sect. 4.1. That is, we can write \(P(k, {\varvec{\phi }})\) in (8) as

$$\begin{aligned} P(k, {\varvec{\phi }}) = \mathsf {P}\left( \mathsf {E}\left[ \sum _{s=1}^{k} \left\| \varvec{\theta }_{s} - \varvec{\theta }_{s-1} \right\| ^{2}_{\Sigma } \right] \ge d \right) \end{aligned}$$

where the expectation is with respect to the random acceptance over k cycles of the MH kernel. Written in this way, \(P(k, {\varvec{\phi }})\) elicits an interesting comparison to (25); it is a change of “norm” when moving between one-move and jumping distance diversification.
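
As an illustrative sketch of this distinction (Python; the simulated acceptance indicators and jumping distances are placeholder assumptions, not output from an actual SMC run), the empirical analogues of the one-move probability in (25) and the ESJD-type quantity in (8) can be computed from the same record of k MH cycles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder record of k = 5 MH cycles for N = 1000 particles:
# jump_sq[i, s] is the Sigma-weighted squared jumping distance of particle i
# at cycle s; a rejected proposal contributes a zero jump.
N, k = 1000, 5
accepted = rng.random((N, k)) < 0.25
jump_sq = np.where(accepted, rng.chisquare(df=2, size=(N, k)), 0.0)

# One-move criterion (25): estimated probability of at least one accepted
# move within the k cycles.
one_move = accepted.any(axis=1).mean()

# ESJD-style quantity (cf. (8)): total squared jumping distance over the
# k cycles, averaged across particles.
esjd = jump_sq.sum(axis=1).mean()

print(one_move, esjd)
```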

Now we wish to use the one-move criterion to select the tuning parameters. If we use different proposal kernel tuning parameters for particular subsets of particles, the acceptance rate will be a function of those parameters, so we write \(\alpha ^{(k)}\) as \(\alpha ^{(k)}({\varvec{\phi }})\). The optimisation in (3) can then be simplified as stated in Proposition 3.

Proposition 3

Assume the cost function is \(C(k, {\varvec{\phi }}) = k \times L_{F}\), approximating the cost of a standard Metropolis-Hastings step, and \(D(k,{\varvec{\phi }}) = \alpha ^{(k)}({\varvec{\phi }})\). The latter also corresponds to (25) assuming a uniform acceptance rate across the support of \(\varvec{\theta }\). Then the general problem in (3) is equivalent to

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{{\varvec{\phi }} \in \Phi }~ \frac{\log (1 - p_{\min })}{\log \left( 1-\alpha ^{(1)}({\varvec{\phi }})\right) } \end{aligned}$$
(26)

where the general diversification threshold, d, has been replaced by the probability \(p_{\min }\).

Proposition 3 gives the solution for choosing the best tuning parameters under the one-move criterion and MH cost. It closely connects to the original choice of k without tuning parameters in (24). A proof of Proposition 3 is in Appendix A.4.

In general, we expect the tuning criterion in Proposition 3 to perform poorly. This can be demonstrated by a simple, but highly applicable, example. If the tuning parameter is the step size for an MH mutation, i.e. \({\varvec{\phi }} = [h]\), then we would expect the acceptance probability, \(\alpha ^{(1)}({\varvec{\phi }})\), to be monotone decreasing in h. Hence, the minimisation in (26) will prefer the minimum step size possible, which will ensure at least one move with the minimal computation cost. In other words, the diversification criterion in (25) is only concerned with the probability of at least one move, not the quality of this move.
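
A small numerical illustration of this shortcoming (Python; the acceptance curve \(\alpha ^{(1)}(h)\) below is a hypothetical monotone-decreasing example, not estimated from any model in the paper): the objective in (26) decreases as h decreases, so the minimiser is always the smallest candidate step size.

```python
import math

def one_move_cost(alpha_1, p_min=0.95):
    """Objective in (26), proportional to the expected cost under one-move diversification."""
    return math.log(1.0 - p_min) / math.log(1.0 - alpha_1)

# Hypothetical acceptance rates: a smaller step size h gives higher acceptance,
# hence a smaller objective, so h = 0.01 "wins" even though such tiny moves
# barely diversify the particles.
for h, alpha_1 in [(0.01, 0.95), (0.1, 0.60), (0.5, 0.30), (1.0, 0.10)]:
    print(f"h = {h:<4}: cost = {one_move_cost(alpha_1):.2f}")
```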

Due to the aforementioned shortcoming, a diversification criterion that also measures the quality of the mutation is desirable. For this reason, we focus on the ESJD as a criterion in the main text.

Table 4 Median (80% interval) multiplicative improvement of computation time relative to MH-SMC (using median tuning method) for simulation study

A.2 Proof of Proposition 1

Let \(\mathsf {D}_m = \left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi :D(k,{\varvec{\phi }}) \ge d\right\} \) with

$$\begin{aligned} D(k,{\varvec{\phi }}) = \mathrm {median}\left\{ \sum _{s=1}^{k} J_{s}({\varvec{\phi }}) \right\} . \end{aligned}$$

Using the multivariate Jensen inequality for medians (Merkle 2010, Theorem 5.2), we have

$$\begin{aligned} \sum _{s=1}^{k}\mathrm {median}\left\{ J_{s}({\varvec{\phi }}) \right\} \le \mathrm {median}\left\{ \sum _{s=1}^{k} J_{s}({\varvec{\phi }}) \right\} \end{aligned}$$

and, assuming the jumping distances are iid for a given \({\varvec{\phi }}\), we can further reduce this to

$$\begin{aligned} k \times \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} \le \mathrm {median}\left\{ \sum _{s=1}^{k} J_{s}({\varvec{\phi }}) \right\} . \end{aligned}$$
(27)

We can define the set \(\tilde{\mathsf {D}}_m\) as

$$\begin{aligned} \tilde{\mathsf {D}}_m =&\left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi :\tilde{D}(k,{\varvec{\phi }}) \ge d\right\} \\ \text {where }&\tilde{D}(k, {\varvec{\phi }}) = k \times \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} , \end{aligned}$$

and then (27) shows that \(\tilde{\mathsf {D}}_m \subseteq \mathsf {D}_m\), which completes the proof.
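
As a quick numerical sanity check of inequality (27), the following sketch (Python; the chi-squared stand-in for the iid jumping distances is an arbitrary assumption, not part of the paper) compares the two medians by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
k, reps = 10, 100_000

# Stand-in iid jumping distances J_s(phi); any nonnegative distribution works.
J = rng.chisquare(df=2, size=(reps, k))

lhs = k * np.median(J[:, 0])      # k * median{ J_1(phi) }
rhs = np.median(J.sum(axis=1))    # median{ sum_{s=1}^k J_s(phi) }
print(lhs <= rhs, lhs, rhs)       # the inequality in (27) holds
```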

A.3 Proof of Proposition 2

Under the MH cost function, \(C(k, {\varvec{\phi }}) = k \times L_{F}\), and the approximate ESJD diversification criterion,

$$\begin{aligned} \tilde{\mathsf {D}}_m =&\left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi : \tilde{D}(k,{\varvec{\phi }}) \ge d\right\} \\ \text {where }&\tilde{D}(k,{\varvec{\phi }}) = k \times \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} , \end{aligned}$$

the inequality for the diversification criterion can be rearranged into

$$\begin{aligned} k \ge \frac{d}{\mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} }. \end{aligned}$$

Under this restriction, note that

$$\begin{aligned} C(k, {\varvec{\phi }}) \ge L_{F} \times \frac{d}{\mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} }. \end{aligned}$$

Hence, under these conditions, the general problem in (3) is equivalent to

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{{\varvec{\phi }} \in \Phi }~ \left( \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} \right) ^{-1} \equiv \mathop {\hbox {arg max}}\limits _{\varvec{\phi } \in \Phi }~ \mathrm {median}\left\{ J_{1}({\varvec{\phi }}) \right\} . \end{aligned}$$
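
This result says that, under the MH cost and the approximate ESJD criterion, tuning reduces to maximising the median squared jumping distance over the candidate tuning parameters. A minimal sketch of that selection (Python; the one-dimensional standard normal target, pilot particles assumed to be at stationarity, and random-walk MH kernel with step size h are all illustrative assumptions, not the paper's examples):

```python
import numpy as np

rng = np.random.default_rng(2)

def median_jump_sq(h, n=5000):
    """Pilot estimate of median{J_1(phi)} for a random-walk MH step of size h
    on a standard normal target; rejected proposals contribute zero jumps."""
    theta = rng.standard_normal(n)                 # pilot particles
    prop = theta + h * rng.standard_normal(n)      # random-walk proposals
    log_alpha = -0.5 * (prop**2 - theta**2)        # MH log-acceptance ratio
    accept = np.log(rng.random(n)) < log_alpha
    return np.median(np.where(accept, (prop - theta) ** 2, 0.0))

# Proposition 2: choose the step size with the largest median squared jump.
candidates = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best_h = max(candidates, key=median_jump_sq)
print(best_h)
```

In this toy setting the objective is typically maximised at an intermediate step size, in contrast with the one-move criterion of Proposition 3, which is minimised at the smallest candidate.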

A.4 Proof of Proposition 3

Under the MH cost function, \(C(k, {\varvec{\phi }}) = k \times L_{F}\), and the one-move diversification criterion,

$$\begin{aligned} \mathsf {D} =&\left\{ (k,{\varvec{\phi }}) \in \mathbb {Z}^{+} \times \Phi : \alpha ^{(k)}({\varvec{\phi }}) \ge p_{\min }\right\} \\ \text {where }&\alpha ^{(k)}({\varvec{\phi }}) = 1 - (1-\alpha ^{(1)}({\varvec{\phi }}))^{k}, \end{aligned}$$

the inequality for the diversification criterion can be rearranged into

$$\begin{aligned} k \ge \frac{\log (1 - p_{\min })}{\log (1-\alpha ^{(1)}({\varvec{\phi }}))}. \end{aligned}$$

Under this restriction, note that

$$\begin{aligned} C(k, {\varvec{\phi }}) \ge L_{F} \times \frac{\log (1 - p_{\min })}{\log (1-\alpha ^{(1)}({\varvec{\phi }}))}. \end{aligned}$$

Hence, under these conditions, the general problem in (3) is equivalent to

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{{\varvec{\phi }} \in \Phi }~ \frac{\log (1 - p_{\min })}{\log (1-\alpha ^{(1)}({\varvec{\phi }}))}. \end{aligned}$$

A.5 Additional figures and tables from simulation

Fig. 3 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 1)

Fig. 4 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 2)

Fig. 5 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 3)

Fig. 6 Example of posterior \(\varvec{\beta }\) densities from three SMC algorithms (example 4)

Fig. 7 Posteriors of \(\phi _{1}\) from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

Fig. 8 Posteriors of \(\theta _{1}\) from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

A.6 Additional figures from ARFIMA model example

Fig. 9 Posteriors of d from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

Fig. 10 Posteriors of \(\sigma ^{2}\) from 10 replicates of the four SMC algorithms. The SMC algorithms using only the surrogate likelihood (approx) are annealed to \(\gamma _{T} \in \{0.01,1\}\); the run annealed to \(\gamma _{T} = 0.01\) provides the initial particle set for the surrogate first annealing procedure

About this article

Cite this article

Bon, J.J., Lee, A. & Drovandi, C. Accelerating sequential Monte Carlo with surrogate likelihoods. Stat Comput 31, 62 (2021). https://doi.org/10.1007/s11222-021-10036-4
