Gradient Descent Style Leveraging of Decision Trees and Stumps for Misclassification Cost Performance

  • Mike Cameron-Jones
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2256)

Abstract

This paper investigates some gradient descent style leveraging approaches to classifier learning in the presence of misclassification costs: Schapire and Singer’s AdaBoost.MH and AdaBoost.MR [16], Collins et al.’s multiclass logistic regression method [4], and some modifications that retain the gradient descent style approach. Decision trees and stumps are used as the underlying base classifiers, learned from modified versions of Quinlan’s C4.5 [15]. Experiments are reported comparing the performance, in terms of average cost, of the modified methods to that of the originals, and to the previously suggested “Cost Boosting” methods of Ting and Zheng [21] and Ting [18], which also use decision trees based upon modified C4.5 code, but do not have an interpretation in the gradient descent framework. While some of the modifications improve upon the originals in terms of cost performance for both trees and stumps, the comparison with tree-based Cost Boosting suggests that, among the methods newly experimented with here, it is one based on stumps that shows the most promise.
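As a rough illustration of the evaluation setting only, and not of the paper's modified-C4.5-based methods, the sketch below boosts decision stumps with scikit-learn's standard AdaBoost and scores the result by average misclassification cost under an asymmetric cost matrix, contrasting plain predictions with a minimum-expected-cost decision rule. The dataset, cost values, and library choices are assumptions made purely for illustration.

```python
# Minimal sketch (assumed setup, not the paper's implementation): boosted
# decision stumps scored by average misclassification cost.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical cost matrix: C[i, j] is the cost of predicting class j when
# the true class is i (zero cost on the diagonal, asymmetric off-diagonal).
C = np.array([[0.0, 1.0],
              [5.0, 0.0]])

# Boosted decision stumps (depth-1 trees) as the base classifiers.
stump = DecisionTreeClassifier(max_depth=1)
model = AdaBoostClassifier(stump, n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Average cost of the plain (accuracy-oriented) predictions.
pred = model.predict(X_te)
avg_cost = (confusion_matrix(y_te, pred) * C).sum() / len(y_te)

# A simple cost-sensitive alternative: predict the class with minimum
# expected cost under the model's class-probability estimates.
proba = model.predict_proba(X_te)          # shape (n_samples, n_classes)
cost_pred = np.argmin(proba @ C, axis=1)   # column j = expected cost of predicting j
avg_cost_sensitive = (confusion_matrix(y_te, cost_pred) * C).sum() / len(y_te)

print(f"average cost, plain predictions:          {avg_cost:.3f}")
print(f"average cost, minimum-expected-cost rule: {avg_cost_sensitive:.3f}")
```

The average-cost score used here (cost matrix applied to the test-set confusion matrix, divided by the number of test examples) is the kind of criterion the paper's comparisons are framed in; the stump-versus-tree contrast is made by swapping the `max_depth` of the base learner.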


References

  1. L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
  2. M. Cameron-Jones and A. Charman-Williams. Stacking for misclassification cost performance. In Advances in Artificial Intelligence: 14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI 2001, pages 215–224. Springer-Verlag, 2001.
  3. M. Cameron-Jones and L. Richards. Repechage bootstrap aggregating for misclassification cost reduction. In PRICAI’98: Topics in Artificial Intelligence, Fifth Pacific Rim International Conference on Artificial Intelligence, pages 1–11. Springer-Verlag, 1998.
  4. M. Collins, R.E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 158–169. Morgan Kaufmann, 2000.
  5. C. Drummond and R.C. Holte. Explicitly representing expected cost: An alternative to ROC representation. Technical report, University of Ottawa, 2000.
  6. C. Drummond and R.C. Holte. Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), pages 239–246. Morgan Kaufmann, 2000.
  7. N. Duffy and D. Helmbold. Potential boosters? In Advances in Neural Information Processing Systems 12, pages 258–264. MIT Press, 2000.
  8. W. Fan, S.J. Stolfo, J. Zhang, and P.K. Chan. AdaCost: Misclassification cost-sensitive boosting. In Machine Learning: Proceedings of the Sixteenth International Conference (ICML’99), pages 97–105, 1999.
  9. Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.
  10. J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression. Technical report, Stanford University, 1998.
  11. D. Margineantu. Building ensembles of classifiers for loss minimisation. In Proceedings of the 31st Symposium on the Interface: Models, Prediction and Computing, pages 190–194, 1999.
  12. M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, and C. Brunk. Reducing misclassification costs. In Proceedings of the Eleventh International Conference on Machine Learning (ML94), pages 217–225. Morgan Kaufmann, 1994.
  13. F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42:203–231, 2001.
  14. F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Machine Learning: Proceedings of the Fifteenth International Conference (ICML’98). Morgan Kaufmann, 1998.
  15. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
  16. R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.
  17. K.M. Ting. A comparative study of cost-sensitive boosting algorithms. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), pages 983–990. Morgan Kaufmann, 2000.
  18. K.M. Ting. An empirical study of MetaCost using boosting algorithms. In Proceedings of the Eleventh European Conference on Machine Learning (ECML-2000), pages 413–425. Springer-Verlag, 2000.
  19. K.M. Ting and I.H. Witten. Stacked generalization: when does it work? In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 866–871. Morgan Kaufmann, 1997.
  20. K.M. Ting and I.H. Witten. Issues in stacked generalization. Journal of Artificial Intelligence Research, 10:271–289, 1999.
  21. K.M. Ting and Z. Zheng. Boosting trees for cost-sensitive classifications. In Machine Learning: ECML-98, Proceedings of the Tenth European Conference on Machine Learning, pages 190–195. Springer-Verlag, 1998.
  22. P.D. Turney. Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2:369–409, 1995.
  23. D.H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Mike Cameron-Jones
  1. School of Computing, University of Tasmania, Launceston, Australia