An Introduction to MCMC for Machine Learning

Abstract

The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses new and interesting research horizons.
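
A minimal sketch of one such building block, a random-walk Metropolis-Hastings sampler, is given below in Python. The target density, the proposal scale, and all names (metropolis_hastings, log_target, proposal_std) are illustrative assumptions for this sketch, not code taken from the paper.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, proposal_std=1.0, seed=None):
    """Random-walk Metropolis-Hastings sampler (illustrative sketch).

    log_target   -- log of the (unnormalised) target density
    x0           -- initial state
    n_samples    -- number of samples to draw
    proposal_std -- standard deviation of the Gaussian random-walk proposal
    """
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, log_px = x0, log_target(x0)
    for i in range(n_samples):
        # Propose a move from a symmetric Gaussian random walk.
        x_new = x + proposal_std * rng.standard_normal()
        log_px_new = log_target(x_new)
        # Accept with probability min(1, p(x_new) / p(x)), computed in log space.
        if np.log(rng.uniform()) < log_px_new - log_px:
            x, log_px = x_new, log_px_new
        samples[i] = x
    return samples

# Example: sample from a standard normal target (log density up to a constant).
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000, seed=0)
print(draws.mean(), draws.std())  # roughly 0 and 1 after a short burn-in
```

Because the proposal is symmetric, the Hastings correction cancels and only the ratio of target densities enters the acceptance test; this special case is the original Metropolis algorithm.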

Cite this article

Andrieu, C., de Freitas, N., Doucet, A. et al. An Introduction to MCMC for Machine Learning. Machine Learning 50, 5–43 (2003). https://doi.org/10.1023/A:1020281327116

Keywords

  • Markov chain Monte Carlo
  • MCMC
  • sampling
  • stochastic algorithms