
Many Are Better Than One: Improving Probabilistic Estimates from Decision Trees

  • Nitesh V. Chawla
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3944)

Abstract

Decision trees, a popular choice for classification, are limited in the quality of the probability estimates they produce at their leaves, which therefore require smoothing. Typically, smoothing methods such as the Laplace correction or the m-estimate are applied at the leaves to overcome the systematic bias of frequency-based estimates. In this work, we show that an ensemble of decision trees significantly improves the quality of the probability estimates produced at the leaves, overcoming the myopia of leaf-frequency-based estimates. We demonstrate the effectiveness of these probabilistic decision trees as part of the Predictive Uncertainty Challenge. We also include three additional highly imbalanced datasets in our study, and show that ensemble methods significantly improve not only the quality of the probability estimates but also the AUC on the imbalanced datasets.
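
The following is a minimal Python sketch (not the authors' code) of the leaf estimates named above: the raw frequency k/n, the Laplace correction (k+1)/(n+C), the m-estimate (k + m·b)/(n + m) with prior b, and simple averaging across an ensemble. The m = 10 default and the example numbers are illustrative assumptions, not values taken from the paper.

    def frequency_estimate(k, n):
        """Raw leaf frequency: fraction of positives among the n training
        examples reaching the leaf. Systematically biased toward 0 or 1
        for small, pure leaves."""
        return k / n

    def laplace_estimate(k, n, num_classes=2):
        """Laplace-smoothed leaf estimate: (k + 1) / (n + C)."""
        return (k + 1) / (n + num_classes)

    def m_estimate(k, n, prior, m=10.0):
        """m-estimate: (k + m * prior) / (n + m), shrinking the leaf
        frequency toward a class prior. m = 10 is an illustrative
        choice, not the paper's setting."""
        return (k + m * prior) / (n + m)

    def ensemble_estimate(leaf_probs):
        """Average the per-tree leaf estimates for one test example,
        as in bagged probability estimation."""
        return sum(leaf_probs) / len(leaf_probs)

    # A pure leaf of 3 positives gives p = 1.0 by raw frequency, but a
    # more conservative 4/5 = 0.8 under Laplace smoothing; averaging
    # over several trees further softens any single leaf's estimate.
    print(frequency_estimate(3, 3))             # 1.0
    print(laplace_estimate(3, 3))               # 0.8
    print(ensemble_estimate([0.8, 0.55, 0.9]))  # 0.75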

Keywords

Decision Tree, Feature Selection, Probability Estimate, Information Gain, Ensemble Method



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nitesh V. Chawla
    1. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, USA
