Machine Learning, Volume 50, Issue 1–2, pp 95–125

Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks

  • Nir Friedman
  • Daphne Koller

Abstract

In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model, and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has a much smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
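
The closed-form sum rests on an identity worth making explicit (reconstructed here from the paper's development; it assumes a structure-modular prior, a decomposable score such as BDe, and parent sets bounded in size by a constant k). Given an order ≺ over the n variables, every network consistent with ≺ factors by family, so the sum over all consistent structures collapses into a product of small per-variable sums:

\[
P(D \mid \prec) \;=\; \prod_{i=1}^{n} \;\sum_{\substack{U \subseteq \{X_j : X_j \prec X_i\} \\ |U| \le k}} \operatorname{score}(X_i, U : D)
\]

Each inner sum ranges over at most O(n^k) candidate parent sets, so the marginal likelihood of a fixed order, and by the same factorization the posterior of an edge feature given that order, are computable in polynomial time.

Below is a minimal, self-contained sketch of the whole pipeline under these assumptions: binary data, a BDeu-style family score, the per-order marginal above, an edge-feature posterior given an order, and Metropolis-Hastings over orders with a transposition proposal. Every name, the toy data set, and the proposal choice are illustrative assumptions, not the authors' implementation.

```python
import itertools
import math
import random
from math import lgamma

def bdeu_family_score(child, parents, data, ess=1.0):
    """Log BDeu score of the family (X_child, parents) on binary data.
    Illustrative stand-in for the paper's decomposable score."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        c = counts.setdefault(key, [0, 0])
        c[row[child]] += 1
    q = 2 ** len(parents)        # number of parent configurations
    a_ij = ess / q               # Dirichlet mass per configuration
    a_ijk = a_ij / 2             # ... per (configuration, value) cell
    score = 0.0
    for c0, c1 in counts.values():
        score += (lgamma(a_ij) - lgamma(a_ij + c0 + c1)
                  + lgamma(a_ijk + c0) - lgamma(a_ijk)
                  + lgamma(a_ijk + c1) - lgamma(a_ijk))
    return score

def log_marginal_given_order(order, data, k=2):
    """log P(D | order): per-variable log-sum-exp over all parent sets
    of size <= k drawn from the variable's predecessors in the order."""
    total = 0.0
    for pos, child in enumerate(order):
        preds = order[:pos]
        terms = [bdeu_family_score(child, parents, data)
                 for size in range(min(k, len(preds)) + 1)
                 for parents in itertools.combinations(preds, size)]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total

def edge_posterior(child, parent, order, data, k=2):
    """P(parent -> child | order, D): the fraction of the per-child sum
    carried by parent sets containing `parent` (0 if order forbids it)."""
    pos = order.index(child)
    preds = order[:pos]
    if parent not in preds:
        return 0.0
    num, den = [], []
    for size in range(min(k, len(preds)) + 1):
        for parents in itertools.combinations(preds, size):
            s = bdeu_family_score(child, parents, data)
            den.append(s)
            if parent in parents:
                num.append(s)
    m = max(den)
    return (sum(math.exp(s - m) for s in num)
            / sum(math.exp(s - m) for s in den))

def mcmc_over_orders(data, n_vars, steps=300, k=2, seed=0):
    """Metropolis-Hastings over orders; the proposal swaps two positions,
    which is symmetric, so the acceptance ratio is just the score ratio."""
    rng = random.Random(seed)
    order = list(range(n_vars))
    cur = log_marginal_given_order(order, data, k)
    samples = []
    for _ in range(steps):
        a, b = rng.sample(range(n_vars), 2)
        prop = order[:]
        prop[a], prop[b] = prop[b], prop[a]
        new = log_marginal_given_order(prop, data, k)
        if rng.random() < math.exp(min(0.0, new - cur)):
            order, cur = prop, new
        samples.append(order[:])
    return samples

if __name__ == "__main__":
    # Toy chain X0 -> X1 -> X2 with strong dependencies.
    rng = random.Random(1)
    data = []
    for _ in range(200):
        x0 = rng.random() < 0.5
        x1 = rng.random() < (0.9 if x0 else 0.1)
        x2 = rng.random() < (0.9 if x1 else 0.1)
        data.append((int(x0), int(x1), int(x2)))
    samples = mcmc_over_orders(data, n_vars=3)[100:]   # drop burn-in
    adj = sum(edge_posterior(1, 0, s, data) + edge_posterior(0, 1, s, data)
              for s in samples) / len(samples)
    print(f"Posterior that X0 and X1 are adjacent: {adj:.2f}")
```

On this toy chain, the adjacency estimate for (X0, X1) should come out near 1, while either single direction alone hovers near 1/2, reflecting Markov equivalence; feature-level conclusions of exactly this kind are what the order-MCMC posterior is designed to support.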

Keywords: Bayesian networks, structure learning, MCMC, Bayesian model averaging

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Nir Friedman (1)
  • Daphne Koller (2)

  1. School of Computer Science & Engineering, Hebrew University, Jerusalem, Israel
  2. Computer Science Department, Stanford University, Stanford, USA
