A Maximum Entropy Approach to Learn Bayesian Networks from Incomplete Data

  • Conference paper
  • First Online:
Interdisciplinary Bayesian Statistics

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 118))

Abstract

This chapter addresses the problem of estimating the parameters of a Bayesian network from incomplete data. This is a hard problem, which for computational reasons cannot be effectively tackled by a full Bayesian approach. The workaround is to search for the estimate with maximum posterior probability. This is usually done by selecting the highest posterior probability estimate among those found by multiple runs of Expectation-Maximization with distinct starting points. However, the posterior probability function has many local maxima, and several of them have similarly high probability. We argue that high probability is necessary but not sufficient to obtain good estimates. We present an approach based on maximum entropy to address this problem and describe a simple and effective way to implement it. Experiments show that our approach produces significantly better estimates than the most commonly used method.
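The selection scheme the abstract describes can be sketched in a few lines: run EM from several random starting points, keep the candidates whose posterior probability is close to the best one found, and among those near-optimal candidates prefer the estimate with maximum entropy. The sketch below is illustrative only (the helper names, the tolerance, and the flat probability vectors standing in for Bayesian network parameters are our assumptions, not the authors' implementation):

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_estimate(candidates, tol=1e-3):
    """candidates: list of (log_posterior, prob_vector) pairs, one per
    EM restart. Keep those within `tol` of the best log-posterior,
    then return the probability vector with maximum entropy."""
    best_lp = max(lp for lp, _ in candidates)
    near_optimal = [p for lp, p in candidates if best_lp - lp <= tol]
    return max(near_optimal, key=entropy)

# Toy example: three local maxima found by hypothetical EM restarts.
candidates = [
    (-10.0000, [0.9, 0.05, 0.05]),   # high posterior, low entropy
    (-10.0005, [0.5, 0.3, 0.2]),     # equally high posterior, higher entropy
    (-25.0,    [1/3, 1/3, 1/3]),     # low posterior: excluded despite max entropy
]
print(select_estimate(candidates))  # -> [0.5, 0.3, 0.2]
```

Note how the third candidate is excluded even though the uniform distribution maximizes entropy: high posterior probability acts as a filter first, and entropy only breaks ties among the near-optimal estimates.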


Notes

  1. MCAR (missing completely at random) means that the probability of a value being missing depends neither on the value itself nor on the values of the other variables.
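MCAR missingness can be simulated by deleting each entry with a fixed probability, independently of everything else in the data. A minimal illustrative sketch (the function name and parameters are ours, not from the chapter):

```python
import random

def make_mcar(data, p_missing=0.2, seed=0):
    """Replace each entry with None with fixed probability p_missing,
    independently of the entry's value and of all other variables (MCAR)."""
    rng = random.Random(seed)
    return [[None if rng.random() < p_missing else v for v in row]
            for row in data]

complete = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]
incomplete = make_mcar(complete)
```

Because the deletion probability is a constant, the missingness pattern carries no information about the underlying values, which is what distinguishes MCAR from the harder missing-at-random and missing-not-at-random settings.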


Acknowledgements

The research in this paper has been partially supported by the Swiss NSF grant no. 200021_146606/1.

Author information

Corresponding author

Correspondence to Giorgio Corani.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Corani, G., de Campos, C. (2015). A Maximum Entropy Approach to Learn Bayesian Networks from Incomplete Data. In: Polpo, A., Louzada, F., Rifo, L., Stern, J., Lauretto, M. (eds) Interdisciplinary Bayesian Statistics. Springer Proceedings in Mathematics & Statistics, vol 118. Springer, Cham. https://doi.org/10.1007/978-3-319-12454-4_6

