Abstract
Constraint-based search methods, a major approach to learning Bayesian networks, are expected to be effective for causal discovery tasks. However, such methods often suffer from the impracticality of classical hypothesis testing for conditional independence when the sample size is not sufficiently large. We present a new conditional independence (CI) testing method designed to be effective for small samples. Our method uses the minimum free energy principle, which originates in thermodynamics, together with the “Data Temperature” assumption that we recently proposed. The method incorporates the maximum entropy principle and converges to classical hypothesis tests in the asymptotic regime. In experiments on repository datasets (Alarm, Insurance, Hailfinder, Barley, and Mildew), our method improves the learning performance of the well-known PC algorithm in terms of edge-reversal errors as well as extra/missing-edge errors.
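The classical CI test that the proposed method converges to in the asymptotic regime can be sketched as a standard G² (log-likelihood-ratio) test of conditional independence on discrete data. This is a generic illustration of the baseline test used by constraint-based learners such as PC, not the paper's minimum-free-energy test; the function and variable names are hypothetical.

```python
import math
from collections import Counter

def g2_ci_test(samples, x, y, z):
    """Classical G^2 test statistic for X independent of Y given Z.

    samples: list of dicts mapping variable name -> discrete value
    x, y: variable names; z: tuple of conditioning variable names
    Returns (G2 statistic, degrees of freedom).
    """
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for s in samples:
        zv = tuple(s[v] for v in z)          # conditioning configuration
        n_xyz[(s[x], s[y], zv)] += 1
        n_xz[(s[x], zv)] += 1
        n_yz[(s[y], zv)] += 1
        n_z[zv] += 1
    # G^2 = 2 * sum over observed cells of n * log(n / expected),
    # where "expected" is the count implied by X _|_ Y | Z.
    g2 = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * n * math.log(n / expected)
    # Degrees of freedom: (|X|-1)(|Y|-1) times the number of observed
    # Z configurations (a common practical choice for sparse data).
    x_levels = {s[x] for s in samples}
    y_levels = {s[y] for s in samples}
    dof = (len(x_levels) - 1) * (len(y_levels) - 1) * len(n_z)
    return g2, dof
```

Independence is then accepted when G² falls below the chi-square critical value for the given degrees of freedom (e.g., 3.84 for dof = 1 at significance level 0.05). The abstract's point is that this asymptotic chi-square approximation degrades at small sample sizes, which is what the minimum-free-energy CI test is designed to address.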
Isozaki, T. Learning Causal Bayesian Networks Using Minimum Free Energy Principle. New Gener. Comput. 30, 17–52 (2012). https://doi.org/10.1007/s00354-012-0103-1