Machine Learning, Volume 9, Issue 4, pp 309–347

A Bayesian method for the induction of probabilistic networks from data

  • Gregory F. Cooper
  • Edward Herskovits

Abstract

This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks. We present results from a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. Finally, we relate the methods in this paper to previous work, and we discuss open problems.
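
A note on the method: the approach summarized above rests on a closed-form expression for the probability of a discrete network structure given a database of fully observed cases, built from counts of how often each node takes each of its values under each configuration of its parents. The sketch below is illustrative only (a minimal Python rendering, assuming complete discrete data; the data layout and the function name log_node_score are hypothetical), and it computes the per-node term of such a score in log space to avoid overflow:

    import math
    from collections import Counter

    def log_node_score(cases, child, parents, arity):
        """Log marginal-likelihood term for one node given its parents.

        cases   : list of dicts mapping variable name -> discrete value
        child   : name of the node being scored
        parents : list of parent variable names
        arity   : dict mapping variable name -> number of possible values
        """
        # Joint counts of (parent configuration j, child value k).
        counts = Counter(
            (tuple(c[p] for p in parents), c[child]) for c in cases
        )
        score = 0.0
        for j in {cfg for (cfg, _) in counts}:
            n_ij = sum(n for (cfg, _), n in counts.items() if cfg == j)
            # log[(r - 1)! / (N_ij + r - 1)!], with r the child's arity,
            # computed via the log-gamma function.
            score += math.lgamma(arity[child]) - math.lgamma(n_ij + arity[child])
            # log of the product over k of N_ijk!
            score += sum(
                math.lgamma(n + 1) for (cfg, _), n in counts.items() if cfg == j
            )
        return score

Summing this term over all nodes, and adding a log prior over structures, yields a quantity proportional to the log joint probability of structure and data, which a search procedure can then maximize over candidate networks.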

Keywords

probabilistic networks, Bayesian belief networks, machine learning, induction

Copyright information

© Kluwer Academic Publishers 1992

Authors and Affiliations

  • Gregory F. Cooper, Section of Medical Informatics, Department of Medicine, University of Pittsburgh, Pittsburgh
  • Edward Herskovits, Noetic Systems, Incorporated, Baltimore
