Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks

Abstract

In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model, and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has a much smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
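To make the two computations the abstract describes concrete, here is a minimal Python sketch, not the authors' implementation: it assumes binary data, a toy BDeu-style family score, a uniform prior over orders and over structures consistent with an order, and a bounded in-degree. All names (`log_family_score`, `mcmc_over_orders`, `max_indegree`, and so on) are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def _logsumexp(logs):
    """Numerically stable log of a sum of exponentials."""
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

def log_family_score(data, child, parents):
    """Toy BDeu-style log marginal likelihood for binary variables.

    Illustrative stand-in for a standard Bayesian (BDe-like) family score;
    alpha is a Dirichlet pseudo-count per table cell.
    """
    alpha = 1.0
    counts = Counter()
    for row in data:
        counts[tuple(row[p] for p in parents), row[child]] += 1
    score = 0.0
    for key in {k for k, _ in counts}:
        n0, n1 = counts[key, 0], counts[key, 1]
        score += (math.lgamma(2 * alpha) - math.lgamma(2 * alpha + n0 + n1)
                  + math.lgamma(alpha + n0) + math.lgamma(alpha + n1)
                  - 2 * math.lgamma(alpha))
    return score

def log_marginal_given_order(data, order, max_indegree=2):
    """log P(D | order): the sum over the exponentially many structures
    consistent with the order decomposes into, per variable, a sum of
    family scores over all parent sets drawn from its predecessors."""
    total = 0.0
    for pos, child in enumerate(order):
        preds = order[:pos]
        logs = [log_family_score(data, child, ps)
                for k in range(min(len(preds), max_indegree) + 1)
                for ps in itertools.combinations(preds, k)]
        total += _logsumexp(logs)
    return total

def log_edge_posterior_given_order(data, order, u, v, max_indegree=2):
    """log P(u -> v | order, D): scores of v's parent sets containing u,
    normalized by the sum over all of v's legal parent sets."""
    preds = order[:order.index(v)]
    if u not in preds:
        return float("-inf")  # the edge is inconsistent with this order
    logs_all, logs_u = [], []
    for k in range(min(len(preds), max_indegree) + 1):
        for ps in itertools.combinations(preds, k):
            s = log_family_score(data, v, ps)
            logs_all.append(s)
            if u in ps:
                logs_u.append(s)
    return _logsumexp(logs_u) - _logsumexp(logs_all)

def mcmc_over_orders(data, n_vars, steps=1000, max_indegree=2):
    """Metropolis-Hastings over orders: propose swapping two positions
    (a symmetric move) and accept with min(1, P(D|order')/P(D|order)),
    targeting the posterior over orders under a uniform order prior."""
    order = list(range(n_vars))
    cur = log_marginal_given_order(data, order, max_indegree)
    samples = []
    for _ in range(steps):
        i, j = random.sample(range(n_vars), 2)
        prop = order[:]
        prop[i], prop[j] = prop[j], prop[i]
        new = log_marginal_given_order(data, prop, max_indegree)
        if math.log(random.random()) < new - cur:  # MH acceptance test
            order, cur = prop, new
        samples.append(order[:])
    return samples
```

Averaging `exp(log_edge_posterior_given_order(...))` over the orders returned by `mcmc_over_orders` then gives an estimate of the overall posterior probability of the edge feature, which is the quantity the paper's algorithm approximates.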



Cite this article

Friedman, N., Koller, D. Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning 50, 95–125 (2003). https://doi.org/10.1023/A:1020249912095

Keywords

  • Bayesian networks
  • structure learning
  • MCMC
  • Bayesian model averaging