# Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

## Abstract

We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of *likelihood equivalence*, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a *prior network*) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most *k* = 1 parent. For the general case (*k* > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and use it to compare several learning approaches.
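To make likelihood equivalence concrete, the following is a minimal sketch of a Dirichlet-based structure score for discrete data. It is not taken from the paper. It assumes a uniform prior network, so every prior count is the equivalent sample size divided evenly over the cells, and it treats the equivalent sample size `ess` as the "single measure of confidence". The function name `log_bd_score`, the data layout, and the toy dataset are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import lgamma


def log_bd_score(data, arities, parents, ess=1.0):
    """Log marginal likelihood of a structure under Dirichlet priors.

    data    : list of tuples, one discrete value per variable
    arities : number of states of each variable
    parents : dict mapping each variable index to a tuple of parent indices
    ess     : equivalent sample size (assumed confidence in the prior network)

    Assumption: the prior network is uniform, so each prior count is
    ess / (r_i * q_i) for a variable with r_i states and q_i parent states.
    """
    total = 0.0
    for child, pa in parents.items():
        r = arities[child]
        q = 1
        for p in pa:
            q *= arities[p]
        prior_cell = ess / (r * q)   # prior count for one (parent config, state)
        prior_row = ess / q          # prior count for one parent config

        # Observed counts per parent configuration and per cell.
        row_counts = Counter(tuple(row[p] for p in pa) for row in data)
        cell_counts = Counter(
            (tuple(row[p] for p in pa), row[child]) for row in data
        )

        # Unobserved configurations contribute zero, so they are skipped.
        for n in row_counts.values():
            total += lgamma(prior_row) - lgamma(prior_row + n)
        for n in cell_counts.values():
            total += lgamma(prior_cell + n) - lgamma(prior_cell)
    return total


# Toy data over two binary variables (index 0 = X, index 1 = Y).
data = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]
arities = [2, 2]

score_x_to_y = log_bd_score(data, arities, {0: (), 1: (0,)})
score_y_to_x = log_bd_score(data, arities, {1: (), 0: (1,)})
score_empty = log_bd_score(data, arities, {0: (), 1: ()})

print(score_x_to_y, score_y_to_x, score_empty)
```

The structures X → Y and Y → X encode the same independence assertions (none), so under this prior the two scores agree up to floating-point error, while the empty structure, which asserts that X and Y are independent, generally scores differently. A search procedure such as the local search mentioned above would call a score of this kind repeatedly while adding, deleting, or reversing arcs.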
