Severe Class Imbalance: Why Better Algorithms Aren’t the Answer

Drummond, Chris; Holte, Robert C.

doi:10.1007/11564096_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3720)

Included in the following conference series:

European Conference on Machine Learning

Abstract

This paper argues that severe class imbalance is not just an interesting technical challenge that improved learning algorithms will address; it is a much more serious problem. To be useful, a classifier must appreciably outperform a trivial solution, such as always choosing the majority class. Any application that is inherently noisy limits the error rate, and cost, that is achievable. When data are normally distributed, even a Bayes optimal classifier achieves only a vanishingly small reduction in the majority classifier’s error rate, and cost, as imbalance increases. For fat-tailed distributions, and when practical classifiers are used, often no reduction is achieved.
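
The claim about normally distributed data can be illustrated with a small numerical sketch. This is not the authors' experiment: it assumes a one-dimensional problem with two unit-variance Gaussian classes, the majority centred at 0 and the minority at a hypothetical `separation` of 1.0, and equal misclassification costs. The function name `relative_error_reduction` and the chosen priors are illustrative only. Under these assumptions the Bayes optimal rule is a single threshold, and the sketch reports how much of the majority classifier's error it removes.

```python
import math


def normal_sf(x):
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def relative_error_reduction(p_minority, separation):
    """Fraction of the majority classifier's error removed by the Bayes rule.

    Assumed setup (illustrative, not taken from the paper): majority class
    ~ N(0, 1), minority class ~ N(separation, 1), equal error costs.
    The majority classifier's error rate equals p_minority.
    """
    if not 0.0 < p_minority <= 0.5:
        raise ValueError("p_minority must be in (0, 0.5]")
    if separation <= 0.0:
        raise ValueError("separation must be positive")

    prior_ratio = (1.0 - p_minority) / p_minority
    # Bayes threshold: predict minority when x exceeds the point where the
    # prior-weighted class densities are equal.
    threshold = separation / 2.0 + math.log(prior_ratio) / separation

    true_positive_rate = normal_sf(threshold - separation)
    false_positive_rate = normal_sf(threshold)

    # Majority error minus Bayes error, divided by majority error:
    # (p*TPR - (1-p)*FPR) / p = TPR - prior_ratio * FPR
    return true_positive_rate - prior_ratio * false_positive_rate


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01, 0.001):
        reduction = relative_error_reduction(p, 1.0)
        print(f"minority prior {p:<6} relative reduction {reduction:.3e}")
```

In this sketch the relative reduction shrinks rapidly as the minority prior decreases, because the optimal threshold moves far into the tail of both distributions. A different separation or unequal variances would change the numbers but, for Gaussian classes, not the overall trend.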

Author information

Authors and Affiliations

Institute for Information Technology, National Research Council Canada, Ottawa, Ontario, K1A 0R6, Canada
Chris Drummond
Department of Computing Science, University of Alberta, Edmonton, Alberta, T6G 2E8, Canada
Robert C. Holte

Editor information

Editors and Affiliations

Faculty of Economics of the University of Porto, Portugal
João Gama
Faculdade de Engenharia & LIAAD, Universidade do Porto, Portugal
Rui Camacho
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto LA / FEP, University of Porto, R. de Ceuta, 118, 6., 4050-190, Porto, Portugal
Luís Torgo

About this paper

Cite this paper

Drummond, C., Holte, R.C. (2005). Severe Class Imbalance: Why Better Algorithms Aren’t the Answer. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_52

DOI: https://doi.org/10.1007/11564096_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)
