Class Imbalance Problem

Definition

Data are said to suffer from the Class Imbalance Problem when the class distribution is highly skewed, that is, when one class has far fewer examples than the others. In this setting, many classification learning algorithms have low predictive accuracy for the infrequent class. Cost-sensitive learning is a common approach to addressing this problem.
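As a rough illustration of the cost-sensitive idea, the sketch below picks the class with the lower expected misclassification cost. It is a minimal sketch, not the entry's own method: the function names and the cost values are hypothetical, and it assumes that correct predictions cost nothing and that the classifier outputs a reasonably calibrated probability for the positive class.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting positive has the lower expected cost.

    Assumes (for illustration) zero cost for correct predictions:
    predict positive when p * cost_fn > (1 - p) * cost_fp.
    """
    return cost_fp / (cost_fp + cost_fn)


def predict(p_positive, cost_fp, cost_fn):
    """Return 1 (positive) or 0 (negative) for an estimated P(positive)."""
    return 1 if p_positive > cost_sensitive_threshold(cost_fp, cost_fn) else 0


# Hypothetical costs: missing a rare positive is 99 times worse than a false alarm.
threshold = cost_sensitive_threshold(cost_fp=1, cost_fn=99)
rare_case = predict(0.05, cost_fp=1, cost_fn=99)   # unequal costs
equal_case = predict(0.05, cost_fp=1, cost_fn=1)   # equal costs, threshold 0.5
print(threshold, rare_case, equal_case)
```

With equal costs the same example is labeled negative, while the unequal costs lower the threshold enough for it to be labeled positive.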

Motivation and Background

Class-imbalanced datasets occur in many real-world applications. For the two-class case, without loss of generality, one assumes that the minority (rare) class is the positive class and the majority class is the negative class. Often the minority class is very infrequent, for example 1% of the dataset. Most traditional (cost-insensitive) classifiers applied to such a dataset are likely to predict everything as negative (the majority class); with 1% positives, doing so yields 99% accuracy while identifying none of the positive examples. This was often regarded as a problem in learning from highly imbalanced datasets.
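The effect of predicting everything as negative can be checked with a small sketch. The dataset here is hypothetical (1,000 examples with a 1% positive class), and the helper functions are written for illustration only.

```python
# Hypothetical dataset: 10 positive (minority) and 990 negative (majority) examples.
labels = [1] * 10 + [0] * 990

# A degenerate classifier that always predicts the majority (negative) class.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(labels, predictions)
rec = recall(labels, predictions)
print(f"accuracy = {acc:.2f}, minority recall = {rec:.2f}")
# prints: accuracy = 0.99, minority recall = 0.00
```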

However, Provost (2000) describes two fundamental assumptions that are often made...

Recommended Reading

  • Drummond, C., & Holte, R. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. In Proceedings of the seventeenth international conference on machine learning (pp. 239–246).

  • Drummond, C., & Holte, R. (2005). Severe class imbalance: Why better algorithms aren’t the answer. In Proceedings of the sixteenth European conference on machine learning, LNAI (Vol. 3720, pp. 539–546).

  • Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429–450.

  • Ling, C. X., & Li, C. (1998). Data mining for direct marketing – Specific problems and solutions. In Proceedings of the fourth international conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 73–79).

  • Provost, F. (2000). Machine learning from imbalanced data sets 101. In Proceedings of the AAAI’2000 workshop on imbalanced data.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Ling, C.X., Sheng, V.S. (2011). Class Imbalance Problem. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_110
