Encyclopedia of Machine Learning

2010 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Class Imbalance Problem

  • Charles X. Ling
  • Victor S. Sheng
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_110

Definition

Data are said to suffer from the Class Imbalance Problem when the class distributions are highly imbalanced. In this context, many classification learning algorithms have low predictive accuracy for the infrequent class. Cost-sensitive learning is a common approach to solving this problem.
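As a hedged illustration of the cost-sensitive idea, a classifier's estimated probability of the positive class can be compared against a threshold derived from misclassification costs instead of the usual 0.5. The sketch below assumes that correct predictions cost nothing and that the prediction with the lower expected cost is chosen; the function names and the cost values are hypothetical and are not taken from the entry.

```python
# Minimal sketch of a cost-sensitive decision rule (an assumed formulation).
# Predicting positive costs (1 - p) * cost_fp in expectation; predicting
# negative costs p * cost_fn. Positive is cheaper when
# p >= cost_fp / (cost_fp + cost_fn).

def cost_threshold(cost_fp, cost_fn):
    """Probability threshold above which predicting positive has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)

def predict(p_positive, cost_fp, cost_fn):
    """Return 1 (positive) or 0 (negative) for an estimated positive-class probability."""
    return 1 if p_positive >= cost_threshold(cost_fp, cost_fn) else 0

# Hypothetical costs: missing a rare positive is 99 times worse than a false alarm.
p = 0.05  # assumed probability estimate for one example
equal_cost_label = predict(p, cost_fp=1, cost_fn=1)    # threshold 0.5
skewed_cost_label = predict(p, cost_fp=1, cost_fn=99)  # threshold 0.01
print(equal_cost_label, skewed_cost_label)
```

With equal costs the example is labelled negative, whereas the skewed costs lower the threshold enough for the same example to be labelled positive.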

Motivation and Background

Class-imbalanced datasets, in which the class distributions are highly skewed, occur in many real-world applications. For the two-class case, without loss of generality, one assumes that the minority or rare class is the positive class, and the majority class is the negative class. Often the minority class is very infrequent, for example 1% of the dataset. Most traditional (cost-insensitive) classifiers applied to such a dataset are likely to predict everything as negative (the majority class). This was often regarded as a problem in learning from highly imbalanced datasets.
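The effect described above can be illustrated with a minimal sketch. The synthetic labels (10 positives among 1,000 examples, matching the 1% figure) and the helper names `accuracy` and `recall` are assumptions made for illustration; they do not come from the entry.

```python
# Illustrative sketch: a classifier that always predicts the majority
# (negative) class on a hypothetical dataset with 1% positives.

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive examples that are predicted positive."""
    n_pos = sum(t == positive for t in y_true)
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return hits / n_pos if n_pos else 0.0

# Hypothetical labels: 10 positives (minority) and 990 negatives (majority).
y_true = [1] * 10 + [0] * 990
# Predict everything as negative, as a cost-insensitive learner might.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, minority recall = {rec:.2f}")
```

Overall accuracy is 99% even though no minority example is identified, which is why accuracy alone can be a misleading measure on such data.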

However, Provost (2000) describes two fundamental assumptions that are often made...


Recommended Reading

  1. Drummond, C., & Holte, R. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the seventeenth international conference on machine learning (pp. 239–246).
  2. Drummond, C., & Holte, R. (2005). Severe class imbalance: Why better algorithms aren’t the answer. In Proceedings of the sixteenth European conference of machine learning, LNAI (Vol. 3720, pp. 539–546).
  3. Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429–450.
  4. Ling, C. X., & Li, C. (1998). Data mining for direct marketing – Specific problems and solutions. In Proceedings of fourth international conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 73–79).
  5. Provost, F. (2000). Machine learning from imbalanced data sets 101. In Proceedings of the AAAI’2000 workshop on imbalanced data.

Copyright information

© Springer Science+Business Media, LLC 2011
