Evaluating Misclassifications in Imbalanced Data

  • William Elazmeh
  • Nathalie Japkowicz
  • Stan Matwin
Conference paper

DOI: 10.1007/11871842_16

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)
Cite this paper as:
Elazmeh W., Japkowicz N., Matwin S. (2006) Evaluating Misclassifications in Imbalanced Data. In: Fürnkranz J., Scheffer T., Spiliopoulou M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg

Abstract

Evaluating classifier performance with ROC curves is popular in the machine learning community. To date, the only method to assess confidence of ROC curves is to construct ROC bands. In the case of severe class imbalance with few instances of the minority class, ROC bands become unreliable. We propose a generic framework for classifier evaluation to identify a segment of an ROC curve in which misclassifications are balanced. Confidence is measured by Tango’s 95%-confidence interval for the difference in misclassification in both classes. We test our method with severe class imbalance in a two-class problem. Our evaluation favors classifiers with low numbers of misclassifications in both classes. Our results show that the proposed evaluation method is more confident than ROC bands.
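As a rough illustration of the confidence measure named in the abstract, the sketch below computes a Tango-style score interval for the difference between two paired proportions. It rests on assumptions not stated in the abstract: that the discordant counts are the false positives (`b`) and false negatives (`c`) out of `n` test instances, that the score statistic decreases monotonically in the hypothesized difference (so bisection applies), and that "balanced" means the interval contains zero. The function names `tango_score` and `tango_interval` are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from statistics import NormalDist


def tango_score(b, c, n, delta):
    """Score statistic for H0: (b - c) / n == delta (paired proportions)."""
    # Constrained maximum-likelihood estimate of the smaller discordant cell
    # probability under H0, obtained from a quadratic in that probability.
    A = 2 * n
    B = -b - c + (2 * n - b + c) * delta
    C = -c * delta * (1 - delta)
    disc = max(0.0, B * B - 4 * A * C)  # guard against rounding below zero
    q21 = (math.sqrt(disc) - B) / (2 * A)
    var = n * (2 * q21 + delta * (1 - delta))
    num = b - c - n * delta
    if var <= 0:  # degenerate variance near delta = -1 or +1
        return 0.0 if num == 0 else math.copysign(math.inf, num)
    return num / math.sqrt(var)


def _bisect(f, lo, hi, iters=200):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tango_interval(b, c, n, level=0.95):
    """Confidence interval for (b - c) / n by inverting the score test."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    est = (b - c) / n
    eps = 1e-9  # stay strictly inside (-1, 1), where the variance is positive
    lower = _bisect(lambda d: tango_score(b, c, n, d) - z, -1 + eps, est)
    upper = _bisect(lambda d: tango_score(b, c, n, d) + z, est, 1 - eps)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 5 false positives, 20 false negatives, 200 instances.
    lo, hi = tango_interval(5, 20, 200)
    print(f"95% interval for (FP - FN) / n: [{lo:.4f}, {hi:.4f}]")
    print("balanced (interval contains 0):", lo <= 0 <= hi)
```

Under these assumptions, a point on an ROC curve would count as balanced when the returned interval contains zero, and sweeping the classifier threshold while recomputing the interval would mark out the segment of the curve the abstract refers to.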


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • William Elazmeh (1)
  • Nathalie Japkowicz (1)
  • Stan Matwin (1, 2)
  1. School of Information Technology and Engineering, University of Ottawa, Canada
  2. The Institute of Computer Science, Polish Academy of Sciences, Poland
