Cost-Based Classifier Evaluation for Imbalanced Problems

  • Thomas Landgrebe
  • Pavel Paclík
  • David M. J. Tax
  • Serguei Verzakov
  • Robert P. W. Duin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3138)

Abstract

A common assumption in pattern recognition is that the class priors observed in the training set are representative of the true class distribution. This assumption does not always hold: the true class distribution may differ from the training distribution and may vary significantly. As a result, the cost incurred by a given classifier may be worse than expected. In this paper we address this issue by discussing a theoretical framework and methodology for assessing the effect on cost for a classifier in imbalanced conditions. The methodology can be applied to many different types of costs. Experiments on artificial data show how the methodology can be used to assess and compare classifiers. We observe that classifiers that model the underlying distributions well are more resilient to changes in the true class distribution than weaker classifiers.
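To make the dependence of cost on the class distribution concrete, the following is a minimal sketch using the standard expected misclassification cost for a two-class problem at a fixed operating point. It is not the framework from the paper, only an illustration of the general idea. The function names, the unit default costs, and the example operating points are all assumptions introduced here.

```python
def expected_cost(tpr, fpr, pos_prior, cost_fn=1.0, cost_fp=1.0):
    """Expected misclassification cost at a fixed operating point.

    tpr, fpr  : true and false positive rates of the classifier
    pos_prior : true fraction of positive (target) objects
    cost_fn   : cost of a false negative (assumed value)
    cost_fp   : cost of a false positive (assumed value)
    """
    if not 0.0 <= pos_prior <= 1.0:
        raise ValueError("pos_prior must lie in [0, 1]")
    # Positives are missed at rate (1 - tpr); negatives are
    # falsely accepted at rate fpr.
    miss_cost = pos_prior * (1.0 - tpr) * cost_fn
    false_alarm_cost = (1.0 - pos_prior) * fpr * cost_fp
    return miss_cost + false_alarm_cost


def cost_over_priors(tpr, fpr, priors, cost_fn=1.0, cost_fp=1.0):
    """Evaluate the expected cost for each candidate positive prior."""
    return [expected_cost(tpr, fpr, p, cost_fn, cost_fp) for p in priors]


if __name__ == "__main__":
    # Hypothetical operating points for two classifiers.
    priors = [0.5, 0.1, 0.01]
    for name, (tpr, fpr) in {"A": (0.9, 0.1), "B": (0.8, 0.05)}.items():
        costs = cost_over_priors(tpr, fpr, priors)
        print(name, [round(c, 4) for c in costs])
```

Sweeping `pos_prior` while holding the operating point fixed shows how a classifier tuned for one class distribution can incur a different cost when the true distribution shifts.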

Keywords

Class Distribution; Left Plot; Target Class; Positive Fraction; Imbalanced Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Thomas Landgrebe
    • 1
  • Pavel Paclík
    • 1
  • David M. J. Tax
    • 1
  • Serguei Verzakov
    • 1
  • Robert P. W. Duin
    • 1
  1. Elect. Eng., Maths and Comp. Sc., Delft University of Technology, The Netherlands
