
Learning Causal Bayesian Networks Using Minimum Free Energy Principle

New Generation Computing

Abstract

Constraint-based search methods, a major approach to learning Bayesian networks, are expected to be effective in causal discovery tasks. However, such methods often suffer from the impracticality of classical hypothesis testing for conditional independence when the sample size is insufficiently large. We present a new conditional independence (CI) testing method designed to be effective for small samples. Our method uses the minimum free energy principle, which originates in thermodynamics, together with the “Data Temperature” assumption we recently proposed. The method incorporates the maximum entropy principle and converges to classical hypothesis tests in the asymptotic regime. In experiments on repository datasets (Alarm, Insurance, Hailfinder, Barley, and Mildew), our method improves the learning performance of the well-known PC algorithm with respect to reversed-edge errors as well as extra/missing-edge errors.
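The article's free-energy CI statistic itself is not reproduced in this preview, so the sketch below is only a point of reference: a minimal implementation of the classical asymptotic test (the G² likelihood-ratio test of conditional independence) that, per the abstract, the proposed method converges to in the large-sample limit, in the form typically used inside constraint-based learners such as PC. The function name g2_ci_test and the assumption that variables are coded as 0-indexed integers are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import chi2


def g2_ci_test(data, x, y, z=(), alpha=0.05):
    """Classical G^2 (likelihood-ratio) test of X independent of Y given Z.

    data : 2-D integer array, one row per sample, one column per variable,
           each variable coded 0, 1, ..., k-1 (an assumption made here for
           simplicity, not a requirement stated in the paper).
    x, y : column indices of the two variables under test.
    z    : tuple of column indices forming the conditioning set.
    Returns (independent, p_value): independent is True when the test
    fails to reject independence at level alpha.
    """
    kx = int(data[:, x].max()) + 1
    ky = int(data[:, y].max()) + 1

    # Assign each sample to a stratum: one per configuration of Z.
    if z:
        _, strata = np.unique(data[:, list(z)], axis=0, return_inverse=True)
    else:
        strata = np.zeros(len(data), dtype=int)

    g2, dof = 0.0, 0
    for s in np.unique(strata):
        rows = data[strata == s]
        # Contingency table of X versus Y within this Z stratum.
        tab = np.zeros((kx, ky))
        np.add.at(tab, (rows[:, x], rows[:, y]), 1.0)
        n = tab.sum()
        expected = np.outer(tab.sum(axis=1), tab.sum(axis=0)) / n
        nonzero = tab > 0  # empty cells contribute nothing to G^2
        g2 += 2.0 * np.sum(tab[nonzero] * np.log(tab[nonzero] / expected[nonzero]))
        dof += (kx - 1) * (ky - 1)

    p_value = chi2.sf(g2, max(dof, 1))
    return p_value > alpha, p_value


# Toy check: X and Y both copy a hidden binary Z 80% of the time, so they
# are dependent marginally but independent given Z.
rng = np.random.default_rng(0)
zc = rng.integers(0, 2, size=2000)
xc = np.where(rng.random(2000) < 0.8, zc, 1 - zc)
yc = np.where(rng.random(2000) < 0.8, zc, 1 - zc)
data = np.column_stack([xc, yc, zc])
print(g2_ci_test(data, 0, 1))          # expect: dependent (False, small p)
print(g2_ci_test(data, 0, 1, z=(2,)))  # expect: independent (True, larger p)
```

The abstract's point is that this statistic is unreliable at small sample sizes; the paper replaces it with a minimum-free-energy statistic under the "Data Temperature" assumption that recovers the classical test asymptotically.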



Author information


Correspondence to Takashi Isozaki.

About this article

Cite this article

Isozaki, T. Learning Causal Bayesian Networks Using Minimum Free Energy Principle. New Gener. Comput. 30, 17–52 (2012). https://doi.org/10.1007/s00354-012-0103-1
