Learning Bayesian networks: The combination of knowledge and statistical data
 David Heckerman,
 Dan Geiger,
 David M. Chickering
Abstract
We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
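As a rough illustration of likelihood equivalence, the sketch below scores two structures that encode the same independence assertions (X → Y and Y → X) and checks that they receive the same log marginal likelihood. It is not the paper's implementation. It assumes discrete variables, a uniform prior network (the special case often called BDeu), and a single confidence value `ess` (equivalent sample size). The function name `log_bdeu`, its parameters, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import isclose, lgamma


def log_bdeu(data, arity, parents, ess=1.0):
    """Log marginal likelihood of a discrete DAG, assuming a uniform prior network.

    data    : list of dicts mapping variable name -> state index
    arity   : dict mapping variable name -> number of states
    parents : dict mapping variable name -> tuple of parent names
    ess     : equivalent sample size (confidence in the prior network)
    """
    total = 0.0
    for node, pa in parents.items():
        r = arity[node]                      # number of states of this node
        q = 1                                # number of parent configurations
        for p in pa:
            q *= arity[p]
        a_ijk = ess / (r * q)                # Dirichlet exponent per (config, state)
        a_ij = ess / q                       # sum of exponents per parent config
        counts = Counter(
            (tuple(row[p] for p in pa), row[node]) for row in data
        )
        # With no parents, product() yields a single empty configuration.
        for cfg in product(*(range(arity[p]) for p in pa)):
            n_ijk = [counts[(cfg, k)] for k in range(r)]
            total += lgamma(a_ij) - lgamma(a_ij + sum(n_ijk))
            total += sum(lgamma(a_ijk + n) - lgamma(a_ijk) for n in n_ijk)
    return total


# Toy data set (made up for illustration only).
pairs = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]
data = [{"X": x, "Y": y} for x, y in pairs]
arity = {"X": 2, "Y": 2}

x_to_y = log_bdeu(data, arity, {"X": (), "Y": ("X",)})
y_to_x = log_bdeu(data, arity, {"X": ("Y",), "Y": ()})
no_arc = log_bdeu(data, arity, {"X": (), "Y": ()})

print(x_to_y, y_to_x, no_arc)
print(isclose(x_to_y, y_to_x))
```

Under these assumptions the two arc directions score identically, so the data cannot separate them, while the structure with no arc encodes a different independence assertion and is scored separately. A non-uniform prior network would replace the constant `a_ijk` with values derived from that network's joint distribution.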
 Journal

Machine Learning
Volume 20, Issue 3, pp 197–243
 Cover Date
 1995-09-01
 DOI
 10.1007/BF00994016
 Print ISSN
 0885-6125
 Online ISSN
 1573-0565
 Publisher
 Kluwer Academic Publishers
 Keywords

 Bayesian networks
 learning
 Dirichlet
 likelihood equivalence
 maximum branching
 heuristic search
 Authors

 David Heckerman ^{(1)}
 Dan Geiger ^{(1)}
 David M. Chickering ^{(1)}
 Author Affiliations

 1. Microsoft Research, 9S, 98052-6399, Redmond, WA