Estimating Prediction Certainty in Decision Trees

  • Eduardo P. Costa
  • Sicco Verwer
  • Hendrik Blockeel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8207)


Decision trees estimate prediction certainty using the class distribution in the leaf responsible for the prediction. We introduce an alternative method that yields better estimates. For each instance to be predicted, our method inserts the instance into the training set with one of the possible labels for the target attribute and learns a tree; this procedure is repeated for each label. By comparing the outcomes of the resulting trees, the method can identify instances that may be difficult to classify correctly and assign some uncertainty to their predictions. We perform an extensive evaluation of the proposed method and show that it is particularly suitable for ranking and reliability estimation. The ideas investigated in this paper may also be applied to other machine learning techniques, and combined with other methods for prediction certainty estimation.
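The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The tree learner is a hypothetical toy learner that uses information-gain splits on numeric features, with `max_depth` and `min_leaf` as illustrative stopping parameters. The per-label trees are combined by averaging the leaf class distributions that the query instance receives; this is our assumption about one simple way to "compare the outcomes", and the paper's own scoring rule may differ. The names `build_tree`, `leaf_distribution` and `augmented_certainty` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Toy tree learner (hypothetical): binary threshold splits chosen by
    information gain. Each leaf stores a Counter of the training labels in it."""
    counts = Counter(y)
    if depth == max_depth or len(counts) == 1:
        return {"leaf": counts}
    base, best = entropy(y), None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between values
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            rest = (len(left) * entropy([y[i] for i in left])
                    + len(right) * entropy([y[i] for i in right])) / len(y)
            if best is None or base - rest > best[0]:
                best = (base - rest, f, t, left, right)
    if best is None or best[0] <= 0:
        return {"leaf": counts}
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": f, "threshold": t, "left": grow(left), "right": grow(right)}


def leaf_distribution(tree, x, classes):
    """Baseline estimate: relative class frequencies in the leaf reached by x."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    counts = tree["leaf"]
    total = sum(counts.values())
    return {c: counts[c] / total for c in classes}


def augmented_certainty(X, y, x, classes, **params):
    """Insert x into the training set once per candidate label, learn a tree
    each time, and combine the leaf distributions x receives.

    ASSUMPTION: the trees are combined by plain averaging."""
    per_label = []
    for label in classes:
        tree = build_tree(X + [x], y + [label], **params)
        per_label.append(leaf_distribution(tree, x, classes))
    return {c: sum(d[c] for d in per_label) / len(per_label) for c in classes}


# Tiny one-feature example: class "a" on the left, class "b" on the right.
X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = ["a"] * 4 + ["b"] * 4
classes = ["a", "b"]

print(leaf_distribution(build_tree(X, y), [5], classes))  # {'a': 1.0, 'b': 0.0}
print(augmented_certainty(X, y, [5], classes))            # {'a': 0.5, 'b': 0.5}
print(augmented_certainty(X, y, [1.5], classes))          # {'a': 0.75, 'b': 0.25}
```

In this toy data, the leaf-frequency baseline reports full certainty for the borderline point x = 5 because it falls into a pure leaf. Retraining with x labelled "a" and then "b" moves the split boundary each time, so the two trees disagree and the averaged estimate drops to 0.5. For x = 1.5, which lies inside the region of class "a", the estimate stays closer to "a".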


Keywords: Decision trees, prediction certainty, soft classifiers





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Eduardo P. Costa (1)
  • Sicco Verwer (2)
  • Hendrik Blockeel (1, 3)
  1. Department of Computer Science, KU Leuven, Leuven, Belgium
  2. Institute for Computing and Information Sciences, Radboud University Nijmegen, The Netherlands
  3. Leiden Institute of Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands
