Machine Learning, Volume 20, Issue 3, pp 197–243

Learning Bayesian networks: The combination of knowledge and statistical data

  • David Heckerman
  • Dan Geiger
  • David M. Chickering

Abstract

We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing the informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with the previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen, called a prior network, together with a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and we apply this methodology to compare several learning approaches.
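
To make the scoring step concrete: with complete discrete data and Dirichlet priors on the parameters, the marginal likelihood P(D | S) of a structure S has a closed form as a product of Gamma-function ratios over nodes, parent configurations, and states. The sketch below is a minimal Python illustration, assuming uniform hyperparameters alpha_ijk = 1 (a K2-style prior); the paper's BDe metric instead derives the hyperparameters from the user's prior network and an equivalent sample size, which is what makes the score agree on likelihood-equivalent structures. The function name bd_score and the data layout are this sketch's own conventions, not the paper's.

```python
from math import lgamma
from collections import Counter

def bd_score(data, structure, arity):
    """Log marginal likelihood log P(D | S) for a discrete Bayesian
    network under Dirichlet parameter priors.

    data      -- complete cases, each a tuple of ints
    structure -- dict: node index -> tuple of parent indices
    arity     -- dict: node index -> number of states

    Assumes uniform hyperparameters alpha_ijk = 1 (a K2-style prior,
    an assumption of this sketch, not the paper's BDe choice).
    """
    total = 0.0
    for i, parents in structure.items():
        r = arity[i]
        counts = Counter()  # N_ijk: child in state k under parent config j
        for case in data:
            counts[(tuple(case[p] for p in parents), case[i])] += 1
        for j in {j for (j, _) in counts}:  # observed parent configs only
            n_ij = sum(counts[(j, k)] for k in range(r))
            # Gamma(alpha_ij) / Gamma(alpha_ij + N_ij), with alpha_ij = r
            total += lgamma(r) - lgamma(r + n_ij)
            for k in range(r):
                # Gamma(1 + N_ijk) / Gamma(1); note lgamma(1) == 0
                total += lgamma(1 + counts[(j, k)])
    return total
```

Because the score factors over nodes, a single-edge change to the structure requires rescoring only the affected family, a property that search procedures exploit.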
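
For the general case k > 1, where finding the best structure is NP-hard, the abstract mentions heuristic strategies such as local search. Here is a minimal greedy hill-climbing sketch using the bd_score function above; it is a hypothetical illustration of the idea, not the paper's exact procedure, and it rescores the whole network at each step where a real implementation would rescore only the changed family.

```python
def greedy_search(data, arity, score=bd_score):
    """Greedy local search over single-edge additions and deletions,
    one of the heuristic strategies the abstract mentions for k > 1."""
    nodes = list(arity)
    structure = {i: () for i in nodes}  # start from the empty graph

    def acyclic(s):
        done, stack = set(), set()
        def visit(v):
            if v in done:
                return True
            if v in stack:
                return False  # back edge along parent pointers: a cycle
            stack.add(v)
            ok = all(visit(p) for p in s[v])
            stack.discard(v)
            done.add(v)
            return ok
        return all(visit(v) for v in nodes)

    best = score(data, structure, arity)
    improved = True
    while improved:
        improved = False
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                ps = set(structure[y])
                ps.symmetric_difference_update({x})  # toggle edge x -> y
                cand = dict(structure)
                cand[y] = tuple(sorted(ps))
                if not acyclic(cand):
                    continue
                s = score(data, cand, arity)
                if s > best:
                    best, structure, improved = s, cand, True
    return structure, best
```

For example, with three binary variables, greedy_search(data, {0: 2, 1: 2, 2: 2}) returns locally optimal parent sets and their log score. As the abstract notes, iterative local search and simulated annealing can escape the local maxima at which this basic procedure stops.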

Keywords

Bayesian networks, learning, Dirichlet, likelihood equivalence, maximum branching, heuristic search


Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • David Heckerman (1)
  • Dan Geiger (1, 2)
  • David M. Chickering (1)

  1. Microsoft Research, Redmond, WA, USA
  2. Computer Science Department, Technion, Haifa, Israel
