Improving the Ranking Performance of Decision Trees
An accurate ranking of instances based on their class probabilities, as measured by AUC (the area under the Receiver Operating Characteristic curve), is desired in many applications. A traditional decision tree faces two obstacles to producing accurate rankings: the sample at each leaf is small, and all instances falling into the same leaf are assigned the same class probability. In this paper, we propose two techniques to address these issues. First, we use the statistical technique of shrinkage, which estimates the class probability of a test instance as a linear interpolation of the local class probabilities at each node along the path from the leaf to the root. We also present an efficient algorithm for learning the interpolation weights. Second, we introduce an instance-based method, weighted probability estimation (WPE), which generates distinct local probability estimates for test instances that fall into the same leaf. The key idea is to weight each training instance by its similarity to the test instance when estimating probabilities. Furthermore, we combine shrinkage and WPE so that each compensates for the weaknesses of the other. Our experiments show that both shrinkage and WPE improve the ranking performance of decision trees, and that their combination works even better. The experiments also indicate that various decision tree algorithms, when combined with shrinkage and WPE, significantly outperform both the original algorithms and other state-of-the-art techniques proposed to enhance the ranking performance of decision trees.
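The two estimators described above can be made concrete with a short sketch. This is not the authors' implementation. It assumes nominal attributes and Laplace-corrected estimates, uses a hypothetical similarity function (the fraction of matching attribute values), and fits the shrinkage weights with an expectation-maximization (EM) loop on held-out instances that reach the same leaf. EM is one common way to learn interpolation weights, but it is not necessarily the efficient algorithm the paper proposes. The combination step (WPE at every node on the path, then shrinkage) is likewise only one plausible reading of "combine shrinkage and WPE". All function names are illustrative.

```python
from typing import List, Sequence, Tuple

# A training instance: (nominal attribute values, class label in 0..K-1).
Instance = Tuple[Tuple[str, ...], int]


def laplace_estimate(labels: Sequence[int], n_classes: int) -> List[float]:
    """Laplace-corrected class frequencies at one node (assumed smoothing)."""
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    return [(c + 1) / (len(labels) + n_classes) for c in counts]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical similarity: fraction of attributes whose values match."""
    return sum(u == v for u, v in zip(a, b)) / len(a)


def wpe_estimate(node: List[Instance], x: Sequence[str], n_classes: int) -> List[float]:
    """WPE sketch: each training instance at the node counts with a weight
    equal to its similarity to the test instance x (Laplace-smoothed)."""
    mass = [0.0] * n_classes
    for attrs, y in node:
        mass[y] += similarity(attrs, x)
    total = sum(mass)
    return [(m + 1) / (total + n_classes) for m in mass]


def shrink(local: List[List[float]], weights: Sequence[float]) -> List[float]:
    """Shrinkage: linear interpolation of per-node class probabilities along
    the leaf-to-root path (local[0] is the leaf, local[-1] the root)."""
    n_classes = len(local[0])
    return [sum(w * p[c] for w, p in zip(weights, local)) for c in range(n_classes)]


def learn_weights(held_out: List[Tuple[List[List[float]], int]],
                  n_iter: int = 50) -> List[float]:
    """Assumption: EM for the interpolation weights. Each item is (per-node
    estimates on the path, true label) for a held-out instance reaching the
    same leaf, so every path has the same length."""
    depth = len(held_out[0][0])
    lam = [1.0 / depth] * depth
    for _ in range(n_iter):
        resp = [0.0] * depth
        for local, y in held_out:
            # E-step: share of the observed label's probability from each node.
            contrib = [w * p[y] for w, p in zip(lam, local)]
            z = sum(contrib)
            for i in range(depth):
                resp[i] += contrib[i] / z
        # M-step: the new weights are the average responsibilities.
        lam = [r / len(held_out) for r in resp]
    return lam


def combined_estimate(path: List[List[Instance]], x: Sequence[str],
                      weights: Sequence[float], n_classes: int) -> List[float]:
    """Hypothetical combination: WPE at every node, then shrinkage."""
    local = [wpe_estimate(node, x, n_classes) for node in path]
    return shrink(local, weights)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative; ties (e.g. instances sharing a leaf) count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    root = [(("a", "x"), 1), (("a", "y"), 1), (("a", "x"), 0),
            (("b", "x"), 0), (("b", "y"), 0), (("b", "y"), 1)]
    leaf = root[:3]          # toy split: first attribute == "a"
    path = [leaf, root]      # leaf first, root last
    for x in [("a", "x"), ("a", "y")]:
        print(x, combined_estimate(path, x, [0.7, 0.3], 2))
```

Run as a script, the two test instances fall into the same toy leaf yet receive different probability estimates, so they no longer tie in the ranking. With plain Laplace-corrected leaf frequencies both would receive P(class 1) = 3/5 and count as a tie in `auc`.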
Keywords: Decision Tree, Class Probability, Ranking, AUC, Shrinkage, WPE