Machine Learning, Volume 65, Issue 1, pp. 31–78

The max-min hill-climbing Bayesian network structure learning algorithm

  • Ioannis Tsamardinos
  • Laura E. Brown
  • Constantin F. Aliferis

Abstract

We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation, MMHC outperforms, on average and across various metrics, several prototypical and state-of-the-art algorithms: PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, that are corroborated by our experiments. MMHC and detailed results of our study are publicly available at http://www.dsl-lab.org/supplements/mmhc_paper/mmhc_index.html.
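
To make the two-phase idea concrete, here is a minimal Python sketch of the skeleton-then-search structure the abstract describes. It is an illustration written for this page, not the authors' released implementation: `ci_independent` (a conditional-independence test) and `score` (a Bayesian scoring function such as BDeu) are hypothetical caller-supplied callbacks, and `estimate_pc` is a deliberately crude stand-in for the Max-Min Parents-and-Children (MMPC) step.

```python
from itertools import permutations


def estimate_pc(x, variables, ci_independent):
    # Stand-in for MMPC: keep every variable not independent of x given
    # the empty conditioning set. The real MMPC step also conditions on
    # subsets of already-selected neighbours via a max-min heuristic.
    return {y for y in variables if y != x and not ci_independent(x, y, set())}


def mmhc(variables, ci_independent, score):
    # Phase 1 (constraint-based): estimate each variable's parents-and-
    # children set, then symmetrize to obtain the undirected skeleton.
    pc = {x: estimate_pc(x, variables, ci_independent) for x in variables}
    skeleton = {(x, y) for x, y in permutations(variables, 2)
                if y in pc[x] and x in pc[y]}

    # Phase 2 (search-and-score): greedy hill-climbing over DAGs (edge
    # sets of (parent, child) pairs), with edge additions restricted to
    # pairs present in the skeleton.
    dag, best, improved = frozenset(), score(frozenset()), True
    while improved:
        improved = False
        for cand in neighbours(dag, skeleton):
            s = score(cand)
            if s > best:
                dag, best, improved = cand, s, True
    return dag


def neighbours(dag, skeleton):
    # Classic hill-climbing moves: add (skeleton-restricted), delete,
    # and reverse an edge, keeping the graph acyclic throughout.
    for x, y in skeleton:
        if (x, y) not in dag and (y, x) not in dag and not reaches(dag, y, x):
            yield dag | {(x, y)}
    for x, y in dag:
        removed = dag - {(x, y)}
        yield removed
        if not reaches(removed, x, y):  # reversal must not create a cycle
            yield removed | {(y, x)}


def reaches(dag, src, dst):
    # DFS reachability, used as the acyclicity check for add/reverse moves.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(child for parent, child in dag if parent == node)
    return False
```

The design point reflected in `neighbours` is that add-edge moves are drawn only from the learned skeleton, which sharply prunes the search space relative to unconstrained greedy search.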

Keywords

Bayesian networks · Graphical models · Structure learning

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Ioannis Tsamardinos (1)
  • Laura E. Brown (1)
  • Constantin F. Aliferis (1)

  1. Discovery Systems Laboratory, Dept. of Biomedical Informatics, Vanderbilt University, Nashville