Statistics and Computing, Volume 18, Issue 4, pp 343–373

A tutorial on adaptive MCMC

Abstract

We review adaptive Markov chain Monte Carlo (MCMC) algorithms as a means to optimise their performance. Using simple toy examples, we review their theoretical underpinnings and, in particular, show why adaptive MCMC algorithms might fail when some fundamental properties are not satisfied. This leads to guidelines concerning the design of correct algorithms. We then review criteria and the useful framework of stochastic approximation, which allows one not only to systematically optimise commonly used criteria but also to analyse the properties of adaptive MCMC algorithms. We then propose a series of novel adaptive algorithms which prove to be robust and reliable in practice. These algorithms are applied to artificial and high-dimensional scenarios, but also to the classic mine disaster dataset inference problem.
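The central recipe sketched in the abstract, tuning proposal parameters with stochastic approximation while the chain runs, can be illustrated with a minimal example. The snippet below adapts the log scale of a random-walk Metropolis proposal towards a nominal 0.234 acceptance rate using a Robbins-Monro recursion with diminishing step sizes. This is a sketch for intuition only: the function name `adaptive_rwm`, the 0.234 target and the 1/n step-size schedule are illustrative choices, not the specific algorithms proposed in the paper.

```python
import numpy as np

def adaptive_rwm(log_target, x0, n_iter=10_000, target_acc=0.234, seed=0):
    """Random-walk Metropolis whose proposal scale is tuned on the fly
    by a Robbins-Monro (stochastic approximation) recursion targeting
    a fixed expected acceptance rate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp_x = log_target(x)
    log_sigma = 0.0                        # adapted log proposal scale
    samples = np.empty((n_iter, x.size))
    for n in range(1, n_iter + 1):
        prop = x + np.exp(log_sigma) * rng.standard_normal(x.size)
        lp_prop = log_target(prop)
        acc_prob = np.exp(min(0.0, lp_prop - lp_x))   # Metropolis acceptance probability
        if rng.random() < acc_prob:
            x, lp_x = prop, lp_prop
        # Vanishing adaptation: step sizes gamma_n = 1/n satisfy
        # sum(gamma_n) = infinity and sum(gamma_n^2) < infinity.
        log_sigma += (acc_prob - target_acc) / n
        samples[n - 1] = x
    return samples, np.exp(log_sigma)

# Toy usage: adapt the scale on a 2-dimensional standard normal target.
samples, final_scale = adaptive_rwm(lambda x: -0.5 * float(np.sum(x * x)),
                                    x0=np.zeros(2))
```

The diminishing step sizes make the amount of adaptation vanish over time, which is one of the conditions discussed in the tutorial for designing correct adaptive algorithms.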

Keywords

MCMC · Adaptive MCMC · Controlled Markov chain · Stochastic approximation



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. School of Mathematics, University of Bristol, Bristol, UK
  2. Chairs of Statistics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
