Machine Learning, Volume 50, Issue 1–2, pp. 175–196

Population Markov Chain Monte Carlo

  • Kathryn Blackmond Laskey
  • James W. Myers

Abstract

Stochastic search algorithms inspired by physical and biological systems are applied to the problem of learning directed graphical probability models in the presence of missing observations and hidden variables. For this class of problems, deterministic search algorithms tend to halt at local optima, requiring random restarts to obtain solutions of acceptable quality. We compare three stochastic search algorithms: a Metropolis-Hastings Sampler (MHS), an Evolutionary Algorithm (EA), and a new hybrid algorithm called Population Markov Chain Monte Carlo, or popMCMC. PopMCMC uses statistical information from a population of MHSs to inform the proposal distributions for individual samplers in the population. Experimental results show that popMCMC and EAs learn more efficiently than the MHS with no information exchange. Populations of MCMC samplers exhibit more diversity than populations evolving according to EAs, whose operators do not satisfy physics-inspired local reversibility conditions.
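
Below is a minimal sketch, in Python with NumPy, of the population-informed proposal idea described in the abstract. It runs on a toy one-dimensional bimodal target rather than the Bayesian-network structure posterior studied in the paper, and the mixing weight `mix`, the Gaussian fit to the other chains' states, and all function names are illustrative assumptions, not the authors' exact popMCMC scheme.

```python
import numpy as np

# Toy target: log-density (up to a constant) of a 1-D mixture of two Gaussians.
# Stand-in for a model-posterior score; the paper's actual target, a
# Bayesian-network structure posterior, is not reproduced here.
def log_target(x):
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def pop_mcmc(n_chains=20, n_steps=2000, mix=0.5, seed=0):
    """Population of Metropolis-Hastings chains whose proposals mix a local
    random walk with a draw informed by population statistics (here, a
    Gaussian fitted to the other chains' current states).  Hypothetical
    parameterization, in the spirit of popMCMC."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_chains)      # current state of each chain
    logp = log_target(x)               # cached log-target values
    for _ in range(n_steps):
        for i in range(n_chains):
            # Population statistics computed from the OTHER chains only.
            others = np.delete(x, i)
            pop_mean, pop_std = others.mean(), others.std() + 1e-6
            if rng.random() < mix:
                # Population-informed independence proposal.
                prop = rng.normal(pop_mean, pop_std)
                log_q_fwd = -0.5 * ((prop - pop_mean) / pop_std) ** 2
                log_q_rev = -0.5 * ((x[i] - pop_mean) / pop_std) ** 2
            else:
                # Plain symmetric random-walk proposal.
                prop = x[i] + rng.normal(scale=0.5)
                log_q_fwd = log_q_rev = 0.0
            logp_prop = log_target(prop)
            # Metropolis-Hastings acceptance test with proposal correction.
            if np.log(rng.random()) < logp_prop - logp[i] + log_q_rev - log_q_fwd:
                x[i], logp[i] = prop, logp_prop
    return x

print(pop_mcmc())   # final population; should cover both modes near -3 and +3
```

Fitting the population-informed proposal to the other chains' states leaves each chain's target distribution invariant under the usual Metropolis-Hastings acceptance rule; this is the kind of local reversibility that the abstract contrasts with standard EA operators.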

Keywords: Markov chain Monte Carlo, Metropolis-Hastings algorithm, graphical probabilistic models, Bayesian networks, Bayesian learning, evolutionary algorithms

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Kathryn Blackmond Laskey (1)
  • James W. Myers (2)

  1. Department of Systems Engineering and Operations Research, George Mason University, Fairfax, USA
  2. TRW, VAR1/9D02, Reston
