A Bayes Evaluation Criterion for Decision Trees

  • Nicolas Voisine
  • Marc Boullé
  • Carine Hue
Part of the Studies in Computational Intelligence book series (SCI, volume 292)


We present a new evaluation criterion for the induction of decision trees. We exploit a parameter-free Bayesian approach and propose an analytic formula for the evaluation of the posterior probability of a decision tree given the data. We thus transform the training problem into an optimization problem in the space of decision tree models, and search for the best tree, which is the maximum a posteriori (MAP) one. The optimization is performed using top-down heuristics with pre-pruning and post-pruning processes. Extensive experiments on 30 UCI datasets and on the 5 WCCI 2006 performance prediction challenge datasets show that our method obtains predictive performance similar to that of alternative state-of-the-art methods, with far simpler trees.
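The chapter's exact analytic formula is not reproduced in this abstract, but the general idea of a two-part Bayesian (MDL-style) cost, a prior term for the model parameters plus a likelihood term for the class labels, can be sketched as follows. This is a toy illustration in that spirit, not the authors' actual criterion: it scores a single leaf and compares a candidate split against keeping the leaf, and it omits the prior on the tree structure itself. All function names here are illustrative.

```python
import math

def log_multinomial(n, counts):
    # log of n! / (c_1! ... c_J!): nats needed to encode which of the
    # equally likely label orderings occurred in a leaf of size n
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def leaf_cost(counts):
    # Two-part cost of a leaf, in nats:
    #   (1) parameter cost: choosing one class-count vector among the
    #       C(n+J-1, J-1) possible multinomials over J classes,
    #   (2) label cost: encoding the observed labels given those counts.
    n, J = sum(counts), len(counts)
    param = math.lgamma(n + J) - math.lgamma(n + 1) - math.lgamma(J)
    return param + log_multinomial(n, counts)

def split_gain(parent_counts, children_counts):
    # Positive gain: the split codes the data more compactly than the leaf,
    # so a MAP-style search would accept it; negative gain: prune/reject.
    return leaf_cost(parent_counts) - sum(leaf_cost(c) for c in children_counts)
```

With such a criterion, a perfectly separating split (e.g. `[10, 10]` into `[10, 0]` and `[0, 10]`) gets a positive gain, while an uninformative split (into `[5, 5]` and `[5, 5]`) gets a negative one, so pre- and post-pruning fall out of the cost itself rather than from a user-tuned threshold, which is the parameter-free behaviour the abstract emphasizes.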


Keywords: Decision Tree · Bayesian Optimization · Minimum Description Length · Supervised Learning · Model Selection





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Nicolas Voisine (1)
  • Marc Boullé (1)
  • Carine Hue (1)

  1. Orange Labs, France
