The max-min hill-climbing Bayesian network structure learning algorithm

Abstract

We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In an extensive empirical evaluation, MMHC outperforms, on average and across several metrics, a range of prototypical and state-of-the-art algorithms: PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC also offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, and our experiments corroborate them. MMHC and detailed results of our study are publicly available at http://www.dsl-lab.org/supplements/mmhc_paper/mmhc_index.html.
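The second phase described above, a greedy hill-climbing search restricted to the edges of a previously found skeleton, can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the skeleton is supplied as input rather than learned, BIC is substituted for the Bayesian score mentioned in the abstract, the data are discrete rows indexed by variable number, and the function names (`family_bic`, `creates_cycle`, `neighbors`, `hill_climb`) are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(data, child, parents):
    """BIC term of one node given its parents (assumption: BIC stands in
    for the Bayesian score; variables are discrete)."""
    parents = sorted(parents)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
    child_states = len({r[child] for r in data})
    pa_configs = 1
    for p in parents:
        pa_configs *= len({r[p] for r in data})
    n_params = (child_states - 1) * pa_configs
    return loglik - 0.5 * math.log(len(data)) * n_params


def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a directed cycle,
    i.e. if dst is already an ancestor of src."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def neighbors(parents, allowed):
    """Yield {node: new_parent_set} for each legal add, delete, or reverse.
    Only pairs present in the skeleton (`allowed`) are ever considered."""
    for pair in allowed:
        for x, y in permutations(sorted(pair)):
            if x in parents[y]:
                yield {y: parents[y] - {x}}                       # delete x -> y
                trial = {**parents, y: parents[y] - {x}}
                if not creates_cycle(trial, y, x):                # reverse x -> y
                    yield {y: parents[y] - {x}, x: parents[x] | {y}}
            elif y not in parents[x] and not creates_cycle(parents, x, y):
                yield {y: parents[y] | {x}}                       # add x -> y


def hill_climb(data, n_vars, skeleton):
    """Greedy search from the empty graph; stops at a local optimum."""
    allowed = {frozenset(edge) for edge in skeleton}
    parents = {v: frozenset() for v in range(n_vars)}
    score = {v: family_bic(data, v, parents[v]) for v in parents}
    while True:
        best_gain, best_change = 1e-9, None
        for change in neighbors(parents, allowed):
            new = {v: family_bic(data, v, ps) for v, ps in change.items()}
            gain = sum(new[v] - score[v] for v in change)
            if gain > best_gain:
                best_gain, best_change, best_scores = gain, change, new
        if best_change is None:
            return parents
        parents.update(best_change)
        score.update(best_scores)


# Toy data: variables 0 and 1 are dependent, variable 2 is independent noise.
ab = [(0, 0)] * 40 + [(1, 1)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10
data = [(a, b, c) for c in (0, 1) for a, b in ab]
dag = hill_climb(data, 3, skeleton=[(0, 1), (1, 2)])
print({child: sorted(ps) for child, ps in dag.items()})
```

Because the score decomposes into one term per node, each candidate move only rescores the one or two families it changes. Restricting the moves to skeleton edges is what keeps the search space small; a pair of variables outside the skeleton is never tested, whatever the data say.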

Corresponding author

Correspondence to Laura E. Brown.

Additional information

Editor: Andrew W. Moore

About this article

Cite this article

Tsamardinos, I., Brown, L.E. & Aliferis, C.F. The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65, 31–78 (2006). https://doi.org/10.1007/s10994-006-6889-7

Keywords

  • Bayesian networks
  • Graphical models
  • Structure learning