Sankhya A

pp. 1–37

Subsampling MCMC - an Introduction for the Survey Statistician

  • Matias Quiroz
  • Mattias Villani
  • Robert Kohn
  • Minh-Ngoc Tran
  • Khue-Dung Dang

Abstract

The rapid growth of computing power and the development of efficient Markov chain Monte Carlo (MCMC) simulation algorithms have revolutionized Bayesian statistics, making it a highly practical inference method in applied work. However, MCMC algorithms are computationally demanding and become particularly slow on large datasets. Data subsampling has recently been proposed as a way to scale MCMC to massive datasets by exploiting efficient sampling schemes and estimators from the survey sampling literature. These developments tend to be unknown to many survey statisticians, who traditionally work with non-Bayesian methods and rarely use MCMC. Our article explains the idea of data subsampling in MCMC by reviewing one strand of work, Subsampling MCMC, a so-called pseudo-marginal MCMC approach to speeding up MCMC through data subsampling. The review is written for survey statisticians without previous knowledge of MCMC methods, since our aim is to motivate survey sampling experts to contribute to the growing Subsampling MCMC literature.
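
To fix ideas, here is a minimal sketch (ours, not the authors' code) of the subsampling idea for a toy logistic regression: each Metropolis-Hastings iteration replaces the full-data log-likelihood with a difference estimator, i.e. an exact sum of cheap Taylor-expansion control variates plus a correction estimated from a small random subsample. The data, subsample size m, expansion point theta0, and all function names are illustrative assumptions; the exact method in the paper additionally bias-corrects the resulting likelihood estimate, which we omit here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: logistic regression with a single coefficient theta.
n = 100_000
x = rng.normal(size=n)
theta_true = 1.5
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-theta_true * x))).astype(float)

def ell(theta, idx):
    """Per-observation log-likelihood terms l_i(theta) for observations idx."""
    eta = theta * x[idx]
    return y[idx] * eta - np.log1p(np.exp(eta))

# Control variates: first-order Taylor expansion of each l_i around a fixed
# theta0, computed in ONE full pass over the data and then reused throughout.
theta0 = 0.0
q0 = ell(theta0, np.arange(n))                    # l_i(theta0)
g0 = (y - 1.0 / (1.0 + np.exp(-theta0 * x))) * x  # dl_i/dtheta at theta0
sum_q0, sum_g0 = q0.sum(), g0.sum()

def loglik_estimate(theta, m=500):
    """Difference estimator: exact control-variate sum over all n terms plus
    a correction estimated from a size-m subsample drawn with replacement."""
    idx = rng.integers(0, n, size=m)
    q_idx = q0[idx] + g0[idx] * (theta - theta0)  # surrogate for l_i(theta)
    correction = (n / m) * np.sum(ell(theta, idx) - q_idx)
    return sum_q0 + sum_g0 * (theta - theta0) + correction

def log_prior(theta):
    return -0.5 * theta ** 2 / 100.0              # N(0, 10^2) prior

# Random walk Metropolis with the estimated log-likelihood. Reusing the stored
# estimate for the current state (rather than re-estimating it each iteration)
# is exactly how the noise enters a pseudo-marginal acceptance ratio.
theta, ll = 0.0, loglik_estimate(0.0)
draws = []
for _ in range(5000):
    prop = theta + 0.05 * rng.normal()
    ll_prop = loglik_estimate(prop)
    if np.log(rng.random()) < ll_prop + log_prior(prop) - ll - log_prior(theta):
        theta, ll = prop, ll_prop
    draws.append(theta)

print("posterior mean of theta:", np.mean(draws[1000:]))
```

Each iteration touches only m = 500 of the n = 100,000 observations, while the one-off control-variate pass keeps the estimator's variance low; holding the current state's noisy estimate fixed until the next acceptance is the pseudo-marginal device that the reviewed literature builds on.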

Keywords and phrases.

Pseudo-marginal MCMC; Difference estimator; Hamiltonian Monte Carlo (HMC).

AMS (2000) subject classification.

Primary 62-02; Secondary 62D05.



Acknowledgements

Matias Quiroz and Robert Kohn were partially supported by Australian Research Council Center of Excellence grant CE140100049.


Copyright information

© Indian Statistical Institute 2018

Authors and Affiliations

  1. Australian School of Business, University of New South Wales, Sydney, Australia
  2. Department of Statistics, Stockholm University, Stockholm, Sweden
  3. Department of Computer and Information Science, Linköping University, Linköping, Sweden
  4. Discipline of Business Analytics, University of Sydney, Camperdown, Australia
