A New Performance Evaluation Method for Two-Class Imbalanced Problems

  • Vicente García
  • Ramón A. Mollineda
  • J. Salvador Sánchez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5342)


In this paper, we introduce a new approach to evaluating and visualizing classifier performance in two-class imbalanced domains. The method defines a two-dimensional space by combining the geometric mean of the class accuracies with a new metric that indicates how balanced those accuracies are. A point in this space represents a particular trade-off between the two measures, which is expressed as a trapezoidal function. Moreover, this evaluation function has the interesting property of allowing correct predictions on the minority class, often considered the more important class, to be emphasized. Experiments demonstrate the consistency and validity of the proposed evaluation method.
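As a rough illustration of the two axes the abstract describes, a minimal sketch in Python is given below. The geometric mean of the class accuracies is standard; the second axis, here computed as the signed gap between the two class accuracies, is only an assumption about what a "balance" metric might look like, since the excerpt does not define the paper's actual metric or its trapezoidal combination. All function names are hypothetical.

```python
import math

def class_accuracies(tp, fn, tn, fp):
    """Per-class accuracies from a two-class confusion matrix:
    sensitivity on the positive (minority) class and
    specificity on the negative (majority) class."""
    tpr = tp / (tp + fn)  # accuracy on the minority class
    tnr = tn / (tn + fp)  # accuracy on the majority class
    return tpr, tnr

def gmean(tpr, tnr):
    """Geometric mean of the two class accuracies (first axis)."""
    return math.sqrt(tpr * tnr)

def balance_gap(tpr, tnr):
    """Hypothetical balance indicator (second axis): 0 when the two
    class accuracies are equal, positive when the minority class is
    favoured, negative when the majority class is favoured."""
    return tpr - tnr
```

For example, a classifier with 90% accuracy on the minority class and 60% on the majority class would sit at roughly (0.735, 0.3) in this space: a fairly high geometric mean, but with the minority class noticeably favoured.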


Keywords: Imbalance · Performance measure · Learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Vicente García (1, 2)
  • Ramón A. Mollineda (2)
  • J. Salvador Sánchez (2)
  1. Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Metepec, México
  2. Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, Spain
