How Good Are the Bayesian Information Criterion and the Minimum Description Length Principle for Model Selection? A Bayesian Network Analysis

  • Nicandro Cruz-Ramírez
  • Héctor-Gabriel Acosta-Mesa
  • Rocío-Erandi Barrientos-Martínez
  • Luis-Alonso Nava-Fernández
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)


The Bayesian Information Criterion (BIC) and the Minimum Description Length principle (MDL) have been widely proposed as good metrics for model selection. Both scores consist of two terms: one that measures accuracy (how well the model fits the data) and one that penalizes complexity. Their underlying philosophy is to select the model that best balances these two terms. Surprisingly, however, both metrics often do not work well in practice because they overfit the data. In this paper, we present an analysis of the BIC and MDL scores within the framework of Bayesian networks that supports this claim. To this end, we carry out several tests, including the recovery of gold-standard network structures and the construction and evaluation of Bayesian network classifiers. Finally, based on these results, we discuss the disadvantages of both metrics and propose future work to examine these limitations in more depth.
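As a rough illustration of the two-term structure described above, the sketch below scores a discrete Bayesian network structure G with the commonly used form BIC(G) = Σ_i Σ_j Σ_k N_ijk log(N_ijk / N_ij) - (k/2) log N. Here N_ijk counts the cases in which variable X_i is in state k while its parents are in configuration j, N_ij = Σ_k N_ijk, N is the sample size, and k = Σ_i q_i (r_i - 1) is the number of free parameters (r_i states of X_i, q_i parent configurations). The first term is the accuracy part and the second is the complexity penalty. The sketch assumes the crude two-part MDL that is common in the Bayesian network literature, MDL(G) = -BIC(G). Variants that also charge for encoding the graph itself add further terms that are not shown. Natural logarithms are an assumption here; switching to base 2 rescales both terms equally and leaves rankings unchanged. The function names (`bic_score`, `mdl_score`) and the dict-of-parents structure encoding are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]              # one case: variable name -> observed state
Structure = Dict[str, List[str]]  # hypothetical encoding: variable -> list of its parents


def num_states(data: List[Row], var: str) -> int:
    """Number of distinct states of `var` observed in the data."""
    return len({row[var] for row in data})


def family_log_likelihood(data: List[Row], child: str, parents: List[str]) -> float:
    """Maximum-likelihood term sum_jk N_ijk * log(N_ijk / N_ij) for one family."""
    # N_ijk: how often the child takes state k under parent configuration j
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    # N_ij: how often parent configuration j occurs
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / n_ij[config]) for (config, _), n in n_ijk.items())


def num_free_params(data: List[Row], child: str, parents: List[str]) -> int:
    """q_i * (r_i - 1): free parameters of the child's conditional probability table."""
    q = 1
    for p in parents:
        q *= num_states(data, p)
    return q * (num_states(data, child) - 1)


def bic_score(data: List[Row], structure: Structure) -> float:
    """BIC = log-likelihood (accuracy) - (k/2) log N (complexity); higher is better."""
    n = len(data)
    log_lik = sum(family_log_likelihood(data, v, ps) for v, ps in structure.items())
    k = sum(num_free_params(data, v, ps) for v, ps in structure.items())
    return log_lik - 0.5 * k * math.log(n)


def mdl_score(data: List[Row], structure: Structure) -> float:
    """Crude two-part MDL as commonly used for Bayesian networks: -BIC; lower is better."""
    return -bic_score(data, structure)


# Toy sample (hypothetical): B always copies A.
toy = [{"A": str(i % 2), "B": str(i % 2)} for i in range(8)]
empty = {"A": [], "B": []}
a_to_b = {"A": [], "B": ["A"]}
print(bic_score(toy, a_to_b) > bic_score(toy, empty))  # True
```

On this toy sample, where B always copies A, the gain in likelihood from the arc A -> B outweighs the penalty for its extra parameter, so the arc is preferred. With weaker dependencies or a different sample size, the (k/2) log N term can tip the balance the other way. That balance is exactly what the paper examines.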






Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nicandro Cruz-Ramírez (1)
  • Héctor-Gabriel Acosta-Mesa (1)
  • Rocío-Erandi Barrientos-Martínez (1)
  • Luis-Alonso Nava-Fernández (2)

  1. Facultad de Física e Inteligencia Artificial, Universidad Veracruzana, Xalapa, Veracruz, México
  2. Instituto de Investigaciones en Educación, Universidad Veracruzana, Xalapa, Veracruz, México
