An Empirical Study of the Behavior of Classifiers on Imbalanced and Overlapped Data Sets

  • Vicente García
  • Jose Sánchez
  • Ramon Mollineda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4756)


Class imbalance has been reported as an important obstacle to applying traditional learning algorithms to real-world domains. Recent investigations have questioned whether imbalance is the only factor that hinders the performance of classifiers. In this paper, we study the behavior of six algorithms when classifying imbalanced, overlapped data sets under uncommon situations (e.g., when the overall imbalance ratio differs from the local imbalance ratio in the overlap region). This is accomplished by analyzing the accuracy on each individual class, thus revealing how those situations affect the majority and minority classes. The experiments corroborate that overlap is more important than imbalance for classification performance. They also show that the classifiers behave differently depending on the nature of each model.
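The two quantities the abstract relies on, the per-class accuracy and the overall versus local imbalance ratio, can be illustrated with a minimal Python sketch. This is not the authors' experimental code: the toy one-dimensional data, the overlap interval [4, 6], the threshold classifier, and the helper names `per_class_accuracy` and `imbalance_ratio` are all assumptions made for illustration.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of that class's samples predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def imbalance_ratio(labels, majority, minority):
    """Majority-to-minority count ratio; infinite if no minority samples."""
    counts = Counter(labels)
    if counts[minority] == 0:
        return float("inf")
    return counts[majority] / counts[minority]


# Hypothetical toy data: (feature value, class label).
samples = [
    (0.5, "maj"), (1.0, "maj"), (2.0, "maj"), (3.0, "maj"), (3.5, "maj"),
    (4.5, "maj"), (4.2, "min"), (5.0, "min"), (5.5, "min"), (7.0, "min"),
]

# Assumed overlap region: feature values in [4, 6].
all_labels = [label for _, label in samples]
overlap_labels = [label for x, label in samples if 4.0 <= x <= 6.0]

overall_ir = imbalance_ratio(all_labels, "maj", "min")      # 6 / 4
local_ir = imbalance_ratio(overlap_labels, "maj", "min")    # 1 / 3

# Assumed stand-in classifier: a single threshold at x = 4.
y_pred = ["min" if x >= 4.0 else "maj" for x, _ in samples]
acc = per_class_accuracy(all_labels, y_pred)

print(f"overall imbalance ratio: {overall_ir:.2f}")
print(f"local imbalance ratio in overlap: {local_ir:.2f}")
print(f"per-class accuracy: {acc}")
```

In this toy setting the majority class outnumbers the minority class overall, yet the minority class dominates inside the overlap region, which is the kind of uncommon situation the abstract mentions. Reporting accuracy per class shows that the only error falls on the majority class, a detail that a single overall accuracy figure would hide.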


Keywords: imbalance, overlapping, classifiers, performance measures



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Vicente García (1, 2)
  • Jose Sánchez (2)
  • Ramon Mollineda (2)
  1. Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Av. Tecnológico s/n, 52140 Metepec, México
  2. Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Av. Sos Baynat s/n, 12071 Castelló de la Plana, Spain
