Combined Effects of Class Imbalance and Class Overlap on Instance-Based Classification
In real-world applications, it has often been observed that class imbalance (significant differences in class prior probabilities) can substantially degrade classifier performance, particularly on patterns belonging to the less represented classes. This effect becomes especially significant in instance-based learning because of its reliance on a dissimilarity measure. We analyze the effects of class imbalance on classifier performance and how class overlap influences both that effect and several techniques proposed in the literature to tackle class imbalance. In addition, we study how these methods affect performance on both classes, not only on the minority class as is usual.
Keywords: Majority Class, Near Neighbor, Weighted Distance, Minority Class, Class Imbalance