An Adaptive Oversampling Method for Imbalanced Datasets Based on Mean-Shift and SMOTE

  • Conference paper
Explore Business, Technology Opportunities and Challenges After the Covid-19 Pandemic (ICBT 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 495)

Abstract

Class imbalance is a common challenge in real-world datasets, where the majority class contains a large number of data points and the minority class contains only a few. It degrades the learning process and can lead classification algorithms to ignore the minority class. Researchers have developed various algorithms to address this problem; however, most of them are complex and generate noise. This paper presents a simple and effective oversampling technique based on the mean-shift clustering algorithm, with the synthetic minority oversampling technique (SMOTE) applied to selected clusters.

We conducted several experiments on three common datasets to compare the performance of our technique with algorithms from the literature. The experimental results indicate that our technique performs better in synthesizing new samples and improves support vector machine (SVM) classification performance on imbalanced datasets.
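
The abstract names the two building blocks (mean-shift clustering, then SMOTE on selected clusters) without giving details, so the following is only a minimal, dependency-free sketch of how such a pipeline might be wired together, not the authors' implementation. Several choices here are assumptions made for illustration: the function names `mean_shift` and `oversample`, the flat kernel, clustering the minority class only, selecting clusters that hold at least two points, and drawing synthetic samples in proportion to cluster size.

```python
import math
import random


def mean_shift(points, bandwidth, max_iter=50, tol=1e-4):
    """Flat-kernel mean-shift; returns one cluster label per input point."""
    modes = []
    for p in points:
        mode = list(p)
        for _ in range(max_iter):
            # Points inside the window centred on the current mode estimate.
            window = [q for q in points if math.dist(mode, q) <= bandwidth]
            if not window:
                break
            new_mode = [sum(coord) / len(window) for coord in zip(*window)]
            shift = math.dist(mode, new_mode)
            mode = new_mode
            if shift < tol:
                break
        modes.append(mode)

    # Points whose modes end up close together share a cluster label.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def oversample(minority, n_new, bandwidth, k=5, seed=0):
    """Generate n_new synthetic minority points by SMOTE-style interpolation
    inside mean-shift clusters (selection rule is an assumption, see above)."""
    rng = random.Random(seed)
    clusters = {}
    for p, label in zip(minority, mean_shift(minority, bandwidth)):
        clusters.setdefault(label, []).append(p)

    # Assumed selection rule: skip singleton clusters, which may be noise
    # and offer no neighbour to interpolate towards.
    selected = [c for c in clusters.values() if len(c) >= 2]
    if not selected:
        return []

    weights = [len(c) for c in selected]
    synthetic = []
    for _ in range(n_new):
        cluster = rng.choices(selected, weights=weights)[0]
        a = rng.choice(cluster)
        # k nearest neighbours of a, restricted to its own cluster.
        neighbours = sorted(
            (q for q in cluster if q is not a),
            key=lambda q: math.dist(a, q),
        )[:k]
        b = rng.choice(neighbours)
        gap = rng.random()
        # SMOTE step: a random point on the segment between a and b.
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic


# Toy minority class: two dense groups and one isolated point.
minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.2, 5.1], [9.0, 0.0]]
new_points = oversample(minority, n_new=4, bandwidth=1.0)
```

In this toy run the isolated point at (9, 0) forms its own cluster and is never used as an interpolation endpoint, which is one plausible way for a cluster-based scheme to avoid amplifying noise. The bandwidth controls how many clusters mean-shift finds, so no cluster count has to be fixed in advance.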

Corresponding author

Correspondence to Ahmed S. Ghorab.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ghorab, A.S., Ashour, W.M., Abudalfa, S.I. (2023). An Adaptive Oversampling Method for Imbalanced Datasets Based on Mean-Shift and SMOTE. In: Alareeni, B., Hamdan, A. (eds) Explore Business, Technology Opportunities and Challenges After the Covid-19 Pandemic. ICBT 2022. Lecture Notes in Networks and Systems, vol 495. Springer, Cham. https://doi.org/10.1007/978-3-031-08954-1_2

  • DOI: https://doi.org/10.1007/978-3-031-08954-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08953-4

  • Online ISBN: 978-3-031-08954-1

  • eBook Packages: Engineering, Engineering (R0)
