
Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification

  • V. García
  • R. Alejo
  • J. S. Sánchez
  • J. M. Sotoca
  • R. A. Mollineda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)

Abstract

In real-world applications, it has often been observed that class imbalance (significant differences in class prior probabilities) may produce a marked deterioration in classifier performance, particularly on patterns belonging to the less represented classes. This effect becomes especially significant in instance-based learning because of its reliance on a dissimilarity measure. We analyze the effects of class imbalance on classifier performance and how class overlap influences both this effect and the behavior of several techniques proposed in the literature to tackle class imbalance. We also study how these methods affect performance on both classes, not only on the minority class as is usual.
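The abstract's central claim, that a dissimilarity-based classifier drifts toward the majority class and that overlap makes this worse, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' experimental setup: synthetic 1-D Gaussian data, plain k-NN with absolute difference as the dissimilarity measure, random oversampling as a stand-in for "techniques to tackle class imbalance", and per-class rates (TPR on the minority class, TNR on the majority class, and their geometric mean G) to report performance on both classes. The function names `make_data`, `knn_predict`, `per_class_rates` and `random_oversample` are hypothetical.

```python
import math
import random


def make_data(n_major, n_minor, separation, seed=0):
    """Hypothetical generator: two 1-D Gaussian classes with unit variance.

    Label 0 is the majority (negative) class centred at 0; label 1 is the
    minority (positive) class centred at `separation`. A smaller separation
    means more class overlap.
    """
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)]
    data += [(rng.gauss(separation, 1.0), 1) for _ in range(n_minor)]
    return data


def knn_predict(train, x, k=1):
    """Plain k-NN: absolute difference as dissimilarity, unweighted vote.

    Use an odd k so that a two-class vote cannot tie.
    """
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    minority_votes = sum(label for _, label in neighbours)
    return 1 if 2 * minority_votes > k else 0


def per_class_rates(train, test, k=1):
    """Return (TPR, TNR, G): accuracy on the minority class, accuracy on the
    majority class, and their geometric mean. The test set must contain
    both classes, otherwise a rate is undefined."""
    tp = fn = tn = fp = 0
    for x, y in test:
        pred = knn_predict(train, x, k)
        if y == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr, math.sqrt(tpr * tnr)


def random_oversample(train, seed=0):
    """Random oversampling (a stand-in rebalancing method): duplicate randomly
    chosen minority samples until both classes have the same size."""
    rng = random.Random(seed)
    minority = [p for p in train if p[1] == 1]
    majority = [p for p in train if p[1] == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train + extra


if __name__ == "__main__":
    for sep in (3.0, 1.0):  # well-separated vs. strongly overlapping classes
        train = make_data(500, 50, sep, seed=0)  # 10:1 imbalance
        test = make_data(300, 300, sep, seed=1)  # balanced held-out set
        for name, data in (("imbalanced", train), ("oversampled", random_oversample(train))):
            tpr, tnr, g = per_class_rates(data, test, k=5)
            print(f"sep={sep} {name:11s} TPR={tpr:.2f} TNR={tnr:.2f} G={g:.2f}")
```

Reporting TPR and TNR separately, rather than overall accuracy, is what lets such a sketch show whether a rebalancing step helps the minority class at the expense of the majority class. Note that with k = 1, duplicated samples never change the nearest neighbour, which is why the example uses k = 5; the printed figures depend on the seed and are not results from the paper.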

Keywords

Majority Class, Near Neighbor, Weighted Distance, Minority Class, Class Imbalance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • V. García (1, 2)
  • R. Alejo (1, 2)
  • J. S. Sánchez (1)
  • J. M. Sotoca (1)
  • R. A. Mollineda (1)
  1. Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló de la Plana, Spain
  2. Instituto Tecnológico de Toluca, Lab. Reconocimiento de Patrones, Metepec, México
