Hybrid Data-Level Techniques for Class Imbalance Problem

  • Conference paper
International Conference on Innovative Computing and Communications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1165)

Abstract

In data mining, the task of classification is to assign each instance in a dataset to one of a set of predefined classes. In real-life applications, traditional classifiers do not work well on imbalanced datasets, i.e., datasets in which one class, called the minority class, contains far fewer data points than the other class(es), called the majority class(es). This skewed class distribution is termed the class imbalance problem (CIP). Researchers have examined the effects of CIP on classifier performance and proposed various techniques to handle it. In the literature, these techniques are broadly classified into three levels: data-level approaches (or pre-processing techniques), algorithm-level approaches and ensemble-level approaches. The data-level (sampling-based) approaches are further subdivided into three categories: oversampling techniques, undersampling techniques and hybrid sampling (undersampling + oversampling) techniques. In this paper, we propose three hybrid sampling techniques (named Bor-SMOTE+TL, TL+C-SMOTE and SL-SMOTE+TL) that combine Tomek links (an undersampling technique) with oversampling techniques. Experiments are carried out on real-life imbalanced datasets to show the usefulness of the proposed techniques compared with existing sampling techniques.
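The hybrid idea described in the abstract, Tomek-link undersampling combined with SMOTE-style oversampling, can be illustrated with a minimal sketch. This is not the authors' implementation of Bor-SMOTE+TL, TL+C-SMOTE or SL-SMOTE+TL. It assumes Euclidean distance, removes only the majority-class member of each Tomek link, and replaces the oversampler with a simplified SMOTE-like step that interpolates toward the single nearest minority neighbour. All function names and the toy data are hypothetical.

```python
import math
import random


def nearest_neighbor(i, X):
    """Index of the point closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y, majority):
    """Undersampling step: drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


def smote_like(X, y, minority, n_new, rng):
    """Oversampling step (simplified, k = 1): interpolate between a random
    minority point and its nearest minority neighbour."""
    pts = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(pts))
        j = min((k for k in range(len(pts)) if k != i),
                key=lambda k: math.dist(pts[i], pts[k]))
        t = rng.random()  # position along the segment pts[i] -> pts[j]
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(pts[i], pts[j])))
    return X + synthetic, y + [minority] * n_new


# Toy data (hypothetical): three minority points (label 1) and six majority
# points (label 0), one of which sits inside the minority region.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.25, 0.15),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.3, 0.8)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]

links = tomek_links(X, y)  # [(1, 3)] for this toy data
X_clean, y_clean = remove_tomek_majority(X, y, majority=0)
X_bal, y_bal = smote_like(X_clean, y_clean, minority=1, n_new=2,
                          rng=random.Random(0))
```

Here the undersampling step runs first, in the same order as the TL+C-SMOTE name suggests; swapping the two calls gives the oversample-then-clean order suggested by Bor-SMOTE+TL and SL-SMOTE+TL. For real experiments a maintained library implementation with a proper k-nearest-neighbour search would be the more sensible choice.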

Corresponding author

Correspondence to Deepika Singh.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

Cite this paper

Gosain, A., Gupta, A., Singh, D. (2021). Hybrid Data-Level Techniques for Class Imbalance Problem. In: Gupta, D., Khanna, A., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. Advances in Intelligent Systems and Computing, vol 1165. Springer, Singapore. https://doi.org/10.1007/978-981-15-5113-0_95