Abstract
Constraint-based search methods, a major approach to learning Bayesian networks, are expected to be effective for causal discovery tasks. However, such methods often suffer from the impracticality of classical hypothesis testing for conditional independence when the sample size is not sufficiently large. We present a new conditional independence (CI) testing method designed to be effective for small samples. Our method uses the minimum free energy principle, which originates in thermodynamics, together with the “Data Temperature” assumption that we recently proposed. The method incorporates the maximum entropy principle and converges to classical hypothesis tests in the asymptotic regime. In experiments on repository datasets (Alarm, Insurance, Hailfinder, Barley, and Mildew), our method improves the learning performance of the well-known PC algorithm in terms of edge-reversal errors as well as extra/missing-edge errors.
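The classical CI test that the proposed method converges to in the asymptotic regime can be sketched as a standard G² (log-likelihood-ratio) test of conditional independence on discrete data. This is a generic illustration of the baseline test used by constraint-based learners such as PC, not the paper's minimum-free-energy test; the function and variable names are hypothetical.

```python
import math
from collections import Counter

def g2_ci_test(samples, x, y, z):
    """Classical G^2 test statistic for X independent of Y given Z.

    samples: list of dicts mapping variable name -> discrete value
    x, y: variable names; z: tuple of conditioning variable names
    Returns (G2 statistic, degrees of freedom).
    """
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for s in samples:
        zv = tuple(s[v] for v in z)          # conditioning configuration
        n_xyz[(s[x], s[y], zv)] += 1
        n_xz[(s[x], zv)] += 1
        n_yz[(s[y], zv)] += 1
        n_z[zv] += 1
    # G^2 = 2 * sum over observed cells of n * log(n / expected),
    # where "expected" is the count implied by X _|_ Y | Z.
    g2 = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * n * math.log(n / expected)
    # Degrees of freedom: (|X|-1)(|Y|-1) times the number of observed
    # Z configurations (a common practical choice for sparse data).
    x_levels = {s[x] for s in samples}
    y_levels = {s[y] for s in samples}
    dof = (len(x_levels) - 1) * (len(y_levels) - 1) * len(n_z)
    return g2, dof
```

Independence is then accepted when G² falls below the chi-square critical value for the given degrees of freedom (e.g., 3.84 for dof = 1 at significance level 0.05). The abstract's point is that this asymptotic chi-square approximation degrades at small sample sizes, which is what the minimum-free-energy CI test is designed to address.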
Isozaki, T. Learning Causal Bayesian Networks Using Minimum Free Energy Principle. New Gener. Comput. 30, 17–52 (2012). https://doi.org/10.1007/s00354-012-0103-1