KSMOTEEN: A Cluster Based Hybrid Sampling Model for Imbalance Class Data

  • Conference paper
International Conference on Innovative Computing and Communications (ICICC 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 731)

Abstract

Accurate classification of class-imbalanced data is a central problem in machine learning. Most classification algorithms achieve poor accuracy when applied to class-imbalanced data. Such data arise in many sensitive domains, such as medicine and finance, where infrequent events such as rare disease diagnoses and fraudulent transactions must be identified and correct classification is essential. This paper presents a hybrid sampling model, KSMOTEEN, to address class imbalance. The model combines a clustering step, based on the K-means algorithm, with the SMOTEEN technique. Experimental results show that KSMOTEEN outperforms some existing sampling methods and thereby improves classifier performance on class-imbalanced data.
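The abstract does not say how the clustering step and the resampling step are wired together, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "SMOTEEN" refers to SMOTE oversampling followed by Edited Nearest Neighbours (ENN) cleaning, that K-means is run on the full data set, that SMOTE is applied to the minority points inside each cluster, and that ENN is applied once to the merged result. The function names (`kmeans`, `smote`, `enn`, `ksmoteen_sketch`), the farthest-point initialisation, and the toy data are all hypothetical and chosen for illustration.

```python
from collections import Counter
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def smote(minority, n_new, k, rng):
    """Create n_new points by interpolating towards one of the k nearest minority neighbours."""
    if len(minority) < 2:
        return []  # nothing to interpolate with
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: dist2(p, q))[:k]
        q = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic

def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop points whose label loses the vote of their k neighbours."""
    keep = []
    for i, p in enumerate(X):
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: dist2(p, X[j]))[:k]
        vote = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if vote == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def ksmoteen_sketch(X, y, minority_label, n_clusters=2, k=3, seed=0):
    """Assumed pipeline: K-means, then per-cluster SMOTE, then ENN on the merged data."""
    rng = random.Random(seed)
    clusters = kmeans(X, n_clusters)
    X_out, y_out = [], []
    for c in range(n_clusters):
        Xc = [p for p, lab in zip(X, clusters) if lab == c]
        yc = [t for t, lab in zip(y, clusters) if lab == c]
        minority = [p for p, t in zip(Xc, yc) if t == minority_label]
        deficit = (len(Xc) - len(minority)) - len(minority)  # majority minus minority
        new = smote(minority, max(deficit, 0), k, rng)
        X_out += Xc + new
        y_out += yc + [minority_label] * len(new)
    return enn(X_out, y_out, k)

# Toy data: two distant groups, each with a majority block and a few minority points.
# The point (0.5, 10.2) is a majority-labelled point sitting among minority points.
cluster_a = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
X = cluster_a + [(0.5, 10.2)] + [(0, 10), (1, 10), (1, 11)]
y = ["maj"] * 7 + ["min"] * 3
X += [(x + 100, v) for x, v in cluster_a] + [(100, 10), (101, 10)]
y += ["maj"] * 6 + ["min"] * 2

X_res, y_res = ksmoteen_sketch(X, y, minority_label="min")
print(Counter(y), "->", Counter(y_res))
```

On this toy data the class counts go from 13 majority and 5 minority points to 12 majority and 13 minority points: each cluster's minority is topped up to its local majority count, and the ENN pass then removes the majority-labelled point that sits inside the minority region. In practice one would more likely rely on maintained implementations of K-means and SMOTE with ENN from established libraries rather than hand-written versions like these.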

Author information

Correspondence to Shashi Mehrotra.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Dhamal, P., Mehrotra, S. (2024). KSMOTEEN: A Cluster Based Hybrid Sampling Model for Imbalance Class Data. In: Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. ICICC 2023. Lecture Notes in Networks and Systems, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-99-4071-4_51

  • DOI: https://doi.org/10.1007/978-981-99-4071-4_51

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4070-7

  • Online ISBN: 978-981-99-4071-4

  • eBook Packages: Engineering, Engineering (R0)
