Lazy Learning for Improving Ranking of Decision Trees
Decision-tree-based probability estimation has received considerable attention because accurate probability estimates can improve both classification accuracy and probability-based ranking. In this paper, we aim to improve probability-based ranking under the decision-tree paradigm, using AUC as the evaluation metric. We deploy a lazy probability estimator at each leaf to avoid uniform probability assignment. More importantly, the lazy probability estimator gives higher weights to the leaf samples closer to an unlabeled sample, so that the probability estimate for the unlabeled sample is based on its similarities to those leaf samples. The motivation is that ranking is a relative measure over a set of samples; it is therefore reasonable to derive the probability for an unlabeled sample from its degree of similarity to its neighbors. The proposed decision-tree model, LazyTree, outperforms C4.5, its recent improvement C4.4, and their state-of-the-art variants in AUC on a large suite of benchmark sample sets.
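The abstract's core idea can be illustrated with a minimal sketch: instead of the uniform frequency estimate at a leaf, weight each training sample that fell into the leaf by its similarity to the unlabeled sample. The paper's exact weighting scheme is not given in the abstract, so the exponential distance kernel, the `bandwidth` parameter, and the Laplace smoothing below are assumptions for illustration only.

```python
import math

def lazy_leaf_probability(unlabeled, leaf_samples, leaf_labels, bandwidth=1.0):
    """Similarity-weighted positive-class probability estimate at a tree leaf.

    Each leaf sample is weighted by exp(-distance / bandwidth), so samples
    closer to the unlabeled sample contribute more to the estimate
    (an assumed kernel; the paper's own weighting may differ).
    """
    weights = []
    for x in leaf_samples:
        dist = math.dist(unlabeled, x)            # Euclidean distance
        weights.append(math.exp(-dist / bandwidth))  # closer -> larger weight
    total = sum(weights)
    # Weighted vote for the positive class, with Laplace smoothing so the
    # estimate never collapses to exactly 0 or 1.
    pos = sum(w for w, y in zip(weights, leaf_labels) if y == 1)
    return (pos + 1.0) / (total + 2.0)
```

Because the estimate depends on the query point, two unlabeled samples routed to the same leaf can receive different probabilities, which is what allows the tree to break the ties that hurt AUC under the uniform leaf estimate.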
Keywords: Decision Tree · Probability Estimator · Decision Tree Model · Unlabeled Sample · Improve Ranking
- 1. Blake, C., Merz, C.J.: UCI Repository of Machine Learning Databases
- 2. Flach, P.A., Ferri, C., Hernandez-Orallo, J.: Improving the AUC of probabilistic estimation trees. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) ECML 2003. LNCS (LNAI), vol. 2837. Springer, Heidelberg (2003)
- 3. Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45 (2001)
- 4. Kohavi, R.: Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (1996)
- 5. Liang, H., Yan, Y.: Lazy learning for improving ranking of decision trees (2006), www.flydragontech.com/publications/2006/LazyLeaveTree_long.pdf
- 6. Ling, C.X., Yan, R.J.: Decision tree with better ranking. In: Proceedings of the 20th International Conference on Machine Learning (ICML 2003). Morgan Kaufmann, San Francisco (2003)
- 7. Provost, F.J., Domingos, P.: Tree induction for probability-based ranking. Machine Learning 52(3) (2003)
- 8. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
- 9. Zadrozny, B., Elkan, C.: Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Proceedings of the 18th International Conference on Machine Learning (ICML 2001). Springer, Heidelberg (2001)