Stacking for Misclassification Cost Performance

  • Mike Cameron-Jones
  • Andrew Charman-Williams
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2056)

Abstract

This paper investigates the application of the multiple classifier technique known as “stacking” [23] to the task of classifier learning for misclassification cost performance, by straightforwardly adapting a technique successfully developed by Ting and Witten [20] for the task of classifier learning for accuracy performance. Experiments are reported comparing the performance of the stacked classifier with that of its component classifiers, and of other proposed cost-sensitive multiple classifier methods: a variation of “bagging”, and two “boosting” style methods. These experiments confirm that stacking is competitive with the other methods that have previously been proposed. Some further experiments examine the performance of stacking methods with different numbers of component classifiers, including the case of stacking a single classifier, and provide the first demonstration that stacking a single classifier can be beneficial for many data sets.
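To make the idea concrete, below is a minimal sketch of cost-sensitive stacking in Python with scikit-learn-style estimators. It follows the general stacking recipe only loosely: level-0 classifiers contribute held-out class-probability predictions as level-1 attributes, a generic meta-learner is trained on them (logistic regression here, rather than the multi-response linear regression meta-learner discussed in [20]), and a cost matrix is applied at prediction time by choosing the class with minimum expected cost. The choice of base learners, the example cost matrix, the expected-cost decision rule, and all function names are illustrative assumptions, not the authors' exact experimental setup.

```python
# Sketch of cost-sensitive stacking: NOT the paper's exact method, just an
# illustration of the general idea under the assumptions stated above.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression


def build_level1_features(base_learners, X, y, cv=5):
    """Level-1 attributes: class-probability predictions of each base learner,
    obtained on held-out folds so the meta-learner never sees resubstitution
    estimates."""
    blocks = [cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
              for clf in base_learners]
    return np.hstack(blocks)


def fit_stack(base_learners, meta_learner, X, y):
    """Train the meta-learner on held-out level-1 attributes, then refit the
    base learners on all of the training data for use at prediction time."""
    Z = build_level1_features(base_learners, X, y)
    meta_learner.fit(Z, y)
    fitted = [clf.fit(X, y) for clf in base_learners]
    return fitted, meta_learner


def predict_min_expected_cost(fitted, meta_learner, X, cost_matrix):
    """Predict, for each example, the class with minimum expected cost.
    cost_matrix[i, j] is the cost of predicting class j when the true class
    is i; expected costs combine it with the meta-learner's probabilities."""
    Z = np.hstack([clf.predict_proba(X) for clf in fitted])
    probs = meta_learner.predict_proba(Z)      # shape (n_examples, n_classes)
    expected_cost = probs @ cost_matrix        # column j = expected cost of predicting j
    return np.argmin(expected_cost, axis=1)


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    base = [DecisionTreeClassifier(random_state=0),
            GaussianNB(),
            KNeighborsClassifier()]
    meta = LogisticRegression(max_iter=1000)

    # Illustrative cost matrix (an assumption, not from the paper):
    # misclassifying a true class-1 example as class 0 costs five times
    # as much as the reverse error.
    C = np.array([[0.0, 1.0],
                  [5.0, 0.0]])

    fitted, meta = fit_stack(base, meta, X_tr, y_tr)
    pred = predict_min_expected_cost(fitted, meta, X_te, C)
    print("total misclassification cost on the test set:", C[y_te, pred].sum())
```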


References

  1. D.W. Aha, D. Kibler, and M.K. Albert. Instance-based learning algorithms. Machine Learning, 6:37–66, 1991.
  2. E. Bauer and R. Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36:105–139, 1999.
  3. C. Blake, E. Keogh, and C.J. Merz. UCI Repository of Machine Learning Databases. University of California, Department of Information and Computer Science, Irvine, California, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.
  4. L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996.
  5. M. Cameron-Jones and L. Richards. Repechage bootstrap aggregating for misclassification cost reduction. In PRICAI'98: Topics in Artificial Intelligence - Fifth Pacific Rim International Conference on Artificial Intelligence, pages 1–11. Springer-Verlag, 1998.
  6. A. Charman-Williams. Cost-stacked classification. Honours thesis, School of Computing, University of Tasmania, 1999.
  7. M. Collins, R.E. Schapire, and Y. Singer. Logistic regression, AdaBoost and Bregman distances. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 158–169. Morgan Kaufmann, 2000.
  8. S. Cost and S. Salzberg. A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning, 10:57–78, 1993.
  9. T.G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization. Machine Learning, 40:139–157, 2000.
  10. C. Drummond and R.C. Holte. Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), pages 239–246. Morgan Kaufmann, 2000.
  11. W. Fan, S.J. Stolfo, J. Zhang, and P.K. Chan. AdaCost: Misclassification cost-sensitive boosting. In Machine Learning: Proceedings of the Sixteenth International Conference (ICML'99), pages 97–105, 1999.
  12. Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.
  13. C.L. Lawson and R.J. Hanson. Solving Least Squares Problems. SIAM, 1995.
  14. M.G. O'Meara. Investigations in cost boosting. Honours thesis, School of Computing, University of Tasmania, 1998.
  15. F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Machine Learning: Proceedings of the Fifteenth International Conference (ICML'98). Morgan Kaufmann, 1998.
  16. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993. The Morgan Kaufmann Series in Machine Learning.
  17. J.R. Quinlan. Bagging, boosting and C4.5. In Proceedings of the Thirteenth American Association for Artificial Intelligence National Conference on Artificial Intelligence, pages 725–730. AAAI Press, 1996.
  18. K.M. Ting. A comparative study of cost-sensitive boosting algorithms. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), pages 983–990. Morgan Kaufmann, 2000.
  19. K.M. Ting and I.H. Witten. Stacked generalization: when does it work? In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pages 866–871. Morgan Kaufmann, 1997.
  20. K.M. Ting and I.H. Witten. Issues in stacked generalization. Journal of Artificial Intelligence Research, 10:271–289, 1999.
  21. K.M. Ting and Z. Zheng. Boosting trees for cost-sensitive classifications. In Machine Learning: ECML-98: Proceedings of the Tenth European Conference on Machine Learning, pages 190–195. Springer-Verlag, 1998.
  22. P.D. Turney. Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2:369–409, 1995.
  23. D.H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Mike Cameron-Jones (1)
  • Andrew Charman-Williams (1)
  1. University of Tasmania, Launceston, Australia
