Evaluation of Sampling Methods for Learning from Imbalanced Data

Goel, Garima; Maguire, Liam; Li, Yuhua; McLoone, Sean

doi:10.1007/978-3-642-39479-9_47

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7995)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Learning from imbalanced data is of critical importance in many application domains and can be a bottleneck for conventional learning methods that assume a balanced class distribution. The class imbalance problem arises when one class massively outnumbers the other. If imbalanced data are used directly, the disparity between the majority and minority classes biases the learner and produces unreliable outcomes. Interest in this research area has been increasing and a number of algorithms have been developed; however, independent evaluation of these algorithms is limited. This paper evaluates the performance of five representative data sampling methods that address class imbalance, namely SMOTE, ADASYN, Borderline-SMOTE, SMOTETomek and RUSBoost. A comparative study is conducted and the performance of each method is critically analysed in terms of assessment metrics.
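
To make the idea of data sampling concrete, the following is a minimal sketch of the interpolation step commonly associated with SMOTE: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. It is an illustration only, not the code or experimental setup used in this paper; the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`) and the toy data are all assumptions introduced here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours.

    Illustrative sketch only; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random fraction of the way to the neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (hypothetical): 4 minority samples against 20 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 20

new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 20
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay within the region already occupied by the minority class. Variants such as ADASYN and Borderline-SMOTE differ mainly in which minority samples they select as the base for generation.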

Author information

Authors and Affiliations

School of Computing and Intelligent Systems, University of Ulster, UK
Garima Goel, Liam Maguire & Yuhua Li
Department of Electronic Engineering, National University of Ireland Maynooth, Ireland
Sean McLoone

Editor information

Editors and Affiliations

Machine Learning and Systems Biology Laboratory, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
Electrical and Electronics Department, Polytechnic of Bari, Via Orabona 4, 70125, Bari, Italy
Vitoantonio Bevilacqua
Faculty of Engineering, District University Francisco José de Caldas, Cra. 7a No. 40-53, Fifth Floor, Bogotá, Colombia
Juan Carlos Figueroa
School of Electrical, Computer and Telecommunications Engineering, The University of Wollongong, 2522, North Wollongong, NSW, Australia
Prashan Premaratne

Cite this paper

Goel, G., Maguire, L., Li, Y., McLoone, S. (2013). Evaluation of Sampling Methods for Learning from Imbalanced Data. In: Huang, DS., Bevilacqua, V., Figueroa, J.C., Premaratne, P. (eds) Intelligent Computing Theories. ICIC 2013. Lecture Notes in Computer Science, vol 7995. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39479-9_47

DOI: https://doi.org/10.1007/978-3-642-39479-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39478-2
Online ISBN: 978-3-642-39479-9
eBook Packages: Computer Science, Computer Science (R0)
