Statistics and Computing, Volume 26, Issue 1–2, pp 29–47

Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels

Abstract

Monte Carlo algorithms often aim to draw from a distribution \(\pi \) by simulating a Markov chain with transition kernel \(P\) such that \(\pi \) is invariant under \(P\). However, there are many situations for which it is impractical or impossible to draw from the transition kernel \(P\). For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and with intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace \(P\) by an approximation \(\hat{P}\). Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how ‘close’ the chain given by the transition kernel \(\hat{P}\) is to the chain given by \(P\). We apply these results to several examples from spatial statistics and network analysis.
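To make the idea concrete, the following sketch (not from the paper; all names and parameter values are illustrative assumptions) runs a Metropolis–Hastings chain for the mean of a Gaussian model, but evaluates the acceptance ratio on a random mini-batch of the data rather than the full likelihood. The resulting chain has an approximate kernel \(\hat{P}\) in the sense of the abstract: each step is cheap, but the invariant distribution is only close to the exact posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n observations from N(theta_true, 1).
theta_true = 2.0
data = rng.normal(theta_true, 1.0, size=10_000)

def subsampled_loglik(theta, batch):
    # Unbiased mini-batch estimate of the full Gaussian log-likelihood,
    # rescaled by n / |batch| (additive constants cancel in the ratio).
    return len(data) / len(batch) * np.sum(-0.5 * (batch - theta) ** 2)

def noisy_mh(n_iter=2000, batch_size=500, step=0.05):
    """Random-walk Metropolis with a subsampled acceptance ratio.

    Using the estimated log-likelihood in place of the exact one replaces
    the exact kernel P by an approximation P-hat; the chain targets a
    perturbation of the true posterior.
    """
    theta = 0.0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()
        # The same mini-batch is used for both states, so only the
        # likelihood *difference* is estimated.
        batch = rng.choice(data, size=batch_size, replace=False)
        log_alpha = subsampled_loglik(prop, batch) - subsampled_loglik(theta, batch)
        if np.log(rng.random()) < log_alpha:
            theta = prop
        samples[t] = theta
    return samples

samples = noisy_mh()
print(samples[500:].mean())  # concentrates near theta_true, up to subsampling error
```

The extra variance injected by the mini-batch estimate perturbs the stationary distribution; the paper's results are about bounding how far such a perturbed chain can drift from the exact one.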

Keywords

Markov chain Monte Carlo · Pseudo-marginal Monte Carlo · Intractable likelihoods

Acknowledgments

The Insight Centre for Data Analytics is supported by Science Foundation Ireland under Grant Number SFI/12/RC/2289. Nial Friel’s research was also supported by a Science Foundation Ireland grant: 12/IP/1424.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. ENSAE, Paris, France
  2. School of Mathematical Sciences and Insight: The National Center for Data Analytics, University College Dublin, Dublin, Ireland
  3. Department of Mathematics and Statistics, University of Reading, Reading, UK
