Abstract
This chapter addresses the problem of estimating the parameters of a Bayesian network from incomplete data. This is a hard problem, which for computational reasons cannot be effectively tackled by a fully Bayesian approach. The workaround is to search for the estimate with maximum posterior probability. This is usually done by running Expectation-Maximization from multiple distinct starting points and selecting the estimate with the highest posterior probability. However, the posterior probability function has many local maxima, several of which have similarly high probability. We argue that high probability is necessary but not sufficient for obtaining good estimates. We present an approach based on maximum entropy to address this problem and describe a simple and effective way to implement it. Experiments show that our approach produces significantly better estimates than the most commonly used method.
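To make the selection step concrete, the sketch below is a minimal illustration, not the authors' implementation: the tiny X → Y network, the MCAR masking rate, the Dirichlet smoothing, and the tie tolerance `tol` are all illustrative assumptions. It runs EM from several random starting points on data whose values are missing completely at random and then, among runs whose score is within a small margin of the best, prefers the estimate whose parameters have maximum entropy, rather than simply keeping the single highest-scoring run.

```python
# Sketch (assumptions flagged above): multi-restart EM on a tiny Bayesian
# network X -> Y with X missing completely at random (MCAR), followed by a
# maximum-entropy tie-break among near-equally probable estimates.
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic data: binary X -> binary Y, each X hidden with prob. 0.5 ---
n = 500
true_px = 0.7                       # P(X = 1)
true_py_x = np.array([0.2, 0.9])    # P(Y = 1 | X = 0), P(Y = 1 | X = 1)
x = rng.binomial(1, true_px, n)
y = rng.binomial(1, true_py_x[x])
x_obs = np.where(rng.random(n) < 0.5, x, -1)   # -1 marks a missing X value

def em_run(x_obs, y, iters=200):
    """One EM run from a random start; returns (px, py_x, log-likelihood)."""
    px = rng.uniform(0.05, 0.95)
    py_x = rng.uniform(0.05, 0.95, size=2)
    for _ in range(iters):
        # E-step: posterior P(X = 1 | Y) for records with X missing
        p1 = px * np.where(y == 1, py_x[1], 1 - py_x[1])
        p0 = (1 - px) * np.where(y == 1, py_x[0], 1 - py_x[0])
        resp = np.where(x_obs == -1, p1 / (p1 + p0), x_obs)
        # M-step: expected counts (a small Dirichlet prior keeps params off 0/1)
        px = (resp.sum() + 1) / (n + 2)
        for k in (0, 1):
            w = resp if k == 1 else 1 - resp
            py_x[k] = (w @ y + 1) / (w.sum() + 2)
    # observed-data log-likelihood of the final parameters
    p1 = px * np.where(y == 1, py_x[1], 1 - py_x[1])
    p0 = (1 - px) * np.where(y == 1, py_x[0], 1 - py_x[0])
    ll = np.where(x_obs == -1,
                  np.log(p1 + p0),
                  np.where(x_obs == 1, np.log(p1), np.log(p0))).sum()
    return px, py_x, ll

def params_entropy(px, py_x):
    """Sum of the entropies of the local conditional distributions."""
    def h(p):
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return h(px) + h(py_x[0]) + h(py_x[1])

runs = [em_run(x_obs, y) for _ in range(20)]
best_ll = max(r[2] for r in runs)
tol = 1e-3 * abs(best_ll)           # runs within this margin count as "ties"
candidates = [r for r in runs if best_ll - r[2] <= tol]
# common practice: keep the single highest-scoring run;
# the idea illustrated here: among near-ties, prefer the maximum-entropy one
px, py_x, ll = max(candidates, key=lambda r: params_entropy(r[0], r[1]))
print(f"chosen estimate: P(X=1)={px:.3f}, P(Y=1|X)={py_x.round(3)}, loglik={ll:.2f}")
```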
Notes
- 1.
MCAR (missing completely at random) indicates that the probability of each value being missing depends neither on the value itself nor on the values of the other variables.
Acknowledgements
The research in this paper has been partially supported by the Swiss NSF grant no. 200021_146606/1.