Class Imbalance Problem
Motivation and Background
Class-imbalanced datasets arise in many real-world applications in which one class is far more frequent than another. For the two-class case, one assumes without loss of generality that the minority (rare) class is the positive class and the majority class is the negative class. The minority class is often very infrequent, for example 1% of the dataset. Most traditional (cost-insensitive) classifiers applied to such a dataset are likely to predict everything as negative (the majority class), since doing so already yields an accuracy of about 99% in this example. This was often regarded as a problem in learning from highly imbalanced datasets.
However, Provost (2000) describes two fundamental assumptions that are often made...
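To make the majority-class behaviour described above concrete, the following is a minimal sketch rather than a method taken from the cited works. It assumes a hypothetical dataset of 1,000 examples with a 1% positive class, and a degenerate classifier that always predicts negative. The names `n_total`, `n_positive`, `labels`, and `predictions` are illustrative only.

```python
# Hypothetical dataset: 1,000 examples, 1% positive (minority) class.
n_total = 1000
n_positive = 10
labels = [1] * n_positive + [0] * (n_total - n_positive)

# A degenerate classifier that always predicts the majority (negative) class.
predictions = [0] * n_total

# Count overall correct predictions and correctly identified positives.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)

accuracy = correct / n_total      # fraction of all examples classified correctly
recall = true_pos / n_positive    # fraction of positive examples recovered

print(f"accuracy = {accuracy:.2f}")                 # accuracy = 0.99
print(f"recall on positive class = {recall:.2f}")   # recall on positive class = 0.00
```

Under these assumptions the classifier is 99% accurate yet never identifies a single positive example, which is why accuracy alone can be a misleading measure on such data.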
- Drummond, C., & Holte, R. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the seventeenth international conference on machine learning (pp. 239–246).
- Drummond, C., & Holte, R. (2005). Severe class imbalance: Why better algorithms aren’t the answer. In Proceedings of the sixteenth European conference of machine learning, LNAI (Vol. 3720, pp. 539–546).
- Ling, C. X., & Li, C. (1998). Data mining for direct marketing – Specific problems and solutions. In Proceedings of fourth international conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 73–79).
- Provost, F. (2000). Machine learning from imbalanced data sets 101. In Proceedings of the AAAI’2000 workshop on imbalanced data.