F-Measure Curves for Visualizing Classifier Performance with Imbalanced Data

  • Roghayeh Soleymani
  • Eric Granger
  • Giorgio Fumera (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11081)


Training classifiers on imbalanced data is a challenging problem in many real-world recognition applications, due in part to the performance biases that arise when: (1) classifiers are optimized and compared using performance measures that are unsuitable for imbalanced problems; (2) classifiers are trained and tested at a fixed imbalance level, which may differ from operational scenarios; (3) the preferred trade-off between classes is application dependent. Specialized performance evaluation metrics and tools are needed for problems that involve class imbalance, including scalar metrics that assume a given operating condition (skew level and relative preference of classes), and global evaluation curves or metrics that consider a range of operating conditions. We propose a global evaluation space for the scalar F-measure metric that is analogous to the cost curves for expected cost. In this space, a classifier is represented as a curve that shows its performance over all of its decision thresholds and a range of imbalance levels, for the desired preference of true positive rate to precision. Experiments with synthetic data show the benefits of evaluating and comparing classifiers under different operating conditions in the proposed F-measure space over the ROC, precision-recall, and cost spaces.
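The dependence of the F-measure on the imbalance level follows from two standard identities: given a fixed ROC operating point (TPR, FPR) and a positive-class prior pi, precision is TPR·pi / (TPR·pi + FPR·(1 − pi)), and F_beta combines precision with recall (TPR) using a preference parameter beta. A minimal sketch of tracing such a curve over a range of priors (not the authors' code; the operating point and priors below are illustrative):

```python
def f_measure(tpr, fpr, pi, beta=1.0):
    """F_beta at positive-class prior pi for a fixed operating point (tpr, fpr).

    Precision is derived from the ROC point and the prior; recall equals tpr.
    """
    precision = tpr * pi / (tpr * pi + fpr * (1 - pi))
    recall = tpr
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Trace one classifier's F1 over a range of imbalance levels (priors).
priors = [0.5, 0.2, 0.1, 0.05, 0.01]
curve = [f_measure(tpr=0.8, fpr=0.1, pi=p) for p in priors]
```

For a fixed operating point, the resulting F-measure decreases as the positive class becomes rarer, since precision shrinks while recall is unchanged; this is the kind of skew sensitivity the proposed evaluation space makes visible.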


Keywords: Class imbalance · Performance visualization tools · F-measure


References

  1. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: ICML, pp. 233–240 (2006)
  2. Dembczynski, K.J., Waegeman, W., Cheng, W., Hüllermeier, E.: An exact algorithm for F-measure maximization. In: NIPS, pp. 1404–1412 (2011)
  3. Drummond, C., Holte, R.C.: Cost curves: an improved method for visualizing classifier performance. Mach. Learn. 65(1), 95–130 (2006)
  4. Fawcett, T.: An introduction to ROC analysis. Pattern Recognit. Lett. 27(8), 861–874 (2006)
  5. Ferri, C., Hernández-Orallo, J., Flach, P.A.: Brier curves: a new cost-based visualisation of classifier performance. In: ICML, pp. 585–592 (2011)
  6. Ferri, C., Hernández-Orallo, J., Modroiu, R.: An experimental comparison of performance measures for classification. Pattern Recognit. Lett. 30(1), 27–38 (2009)
  7. Flach, P., Kull, M.: Precision-recall-gain curves: PR analysis done right. In: NIPS, pp. 838–846 (2015)
  8. García, V., Mollineda, R., Sánchez, J.: Theoretical analysis of a performance measure for imbalanced data. In: ICPR, pp. 617–620 (2010)
  9. Hanczar, B., Nadif, M.: Precision-recall space to correct external indices for biclustering. In: ICML, pp. 136–144 (2013)
  10. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Prog. Artif. Intell. 5(4), 221–232 (2016)
  11. Landgrebe, T.C., Paclik, P., Duin, R.P.: Precision-recall operating characteristic (P-ROC) curves in imprecise environments. In: ICPR, vol. 4, pp. 123–127 (2006)
  12. Lipton, Z.C., Elkan, C., Naryanaswamy, B.: Optimal thresholding of classifiers to maximize F1 measure. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8725, pp. 225–239. Springer, Heidelberg (2014)
  13. Parambath, S.P., Usunier, N., Grandvalet, Y.: Optimizing F-measures by cost-sensitive classification. In: NIPS, pp. 2123–2131 (2014)
  14. Pillai, I., Fumera, G., Roli, F.: Designing multi-label classifiers that maximize F measures: state of the art. Pattern Recognit. 61, 394–404 (2017)
  15. Prati, R.C., Batista, G.E., Monard, M.C.: A survey on graphical methods for classification predictive performance evaluation. IEEE Trans. Knowl. Data Eng. 23(11), 1601–1618 (2011)
  16. Van Rijsbergen, C.: Information retrieval: theory and practice. In: Proceedings of the Joint IBM/University of Newcastle upon Tyne Seminar on Data Base Systems, pp. 1–14 (1979)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Roghayeh Soleymani (1)
  • Eric Granger (1)
  • Giorgio Fumera (2, email author)

  1. Laboratoire d’imagerie, de vision et d’intelligence artificielle, École de technologie supérieure, Université du Québec, Montreal, Canada
  2. Pattern Recognition and Applications Lab, Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
