An Adaptive Oversampling Method for Imbalanced Datasets Based on Mean-Shift and SMOTE

Ghorab, Ahmed S.; Ashour, Wesam M.; Abudalfa, Shadi I.

doi:10.1007/978-3-031-08954-1_2

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 495)

Included in the following conference series:

International Conference on Business and Technology

Abstract

Class imbalance is a challenge in many real-world datasets, where the majority class contains a large number of data points and the minority class contains only a few. It harms the learning process and leads classification algorithms to neglect the minority class. Researchers have developed various algorithms to address this problem; however, most of them are complex and generate noise. This paper presents a simple and effective oversampling technique based on the mean-shift clustering algorithm, applying the synthetic minority oversampling technique (SMOTE) to selected clusters.
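The general idea of combining mean-shift clustering with SMOTE-style interpolation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how clusters are selected, so the rule used here (keep clusters in which minority points form the majority and at least two are present) is an assumption, as are the function names `mean_shift_labels` and `oversample_selected_clusters`, the flat kernel, and the parameters `bandwidth`, `k`, and `n_new`.

```python
import math
import random


def mean_shift_labels(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean-shift: shift every point to the mean of its
    neighbours until it stops moving, then group coinciding modes."""
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        largest_move = 0.0
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            if not near:  # guard against rounding at the window edge
                shifted.append(m)
                continue
            centre = [sum(axis) / len(near) for axis in zip(*near)]
            largest_move = max(largest_move, math.dist(m, centre))
            shifted.append(centre)
        modes = shifted
        if largest_move < tol:
            break
    # Modes closer than half a bandwidth are treated as one cluster.
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels


def oversample_selected_clusters(X, y, minority, bandwidth, n_new, k=3, seed=0):
    """Return n_new synthetic minority points interpolated inside
    selected mean-shift clusters (empty list if no cluster qualifies)."""
    rng = random.Random(seed)
    labels = mean_shift_labels(X, bandwidth)
    clusters = {}
    for point, label, cls in zip(X, labels, y):
        clusters.setdefault(label, []).append((point, cls))
    # ASSUMED selection rule (not taken from the paper): keep clusters
    # dominated by the minority class that hold at least two minority points.
    selected = []
    for members in clusters.values():
        minority_pts = [p for p, cls in members if cls == minority]
        if len(minority_pts) >= 2 and 2 * len(minority_pts) > len(members):
            selected.append(minority_pts)
    synthetic = []
    if not selected:
        return synthetic
    for _ in range(n_new):
        pts = rng.choice(selected)
        i = rng.randrange(len(pts))
        base = pts[i]
        others = pts[:i] + pts[i + 1:]
        # SMOTE step: pick one of the k nearest minority neighbours and
        # place a new point at a random position on the connecting segment.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy data: 4 minority points near the origin, 8 majority points near (10, 10).
X = [[0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [0.4, 0.4],
     [10.0, 10.0], [10.5, 10.2], [9.8, 10.4], [10.1, 9.7],
     [10.3, 10.3], [9.9, 9.9], [10.2, 10.0], [9.7, 10.1]]
y = [1] * 4 + [0] * 8
new_points = oversample_selected_clusters(X, y, minority=1, bandwidth=2.0, n_new=4)
print(new_points)
```

Because every synthetic point lies on a segment between two minority points of the same cluster, new samples cannot fall in the gap between distant minority groups, which is one plausible way clustering before SMOTE limits noise. In practice a library implementation such as scikit-learn's `MeanShift` would likely replace the hand-written clustering shown here.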

We conducted several experiments on three common datasets to compare our technique with several algorithms from the literature. The experimental results indicate that our technique synthesizes better new samples and improves support vector machine (SVM) classification performance on imbalanced datasets.

Author information

Authors and Affiliations

University College of Applied Sciences, Gaza, Palestine
Ahmed S. Ghorab & Shadi I. Abudalfa
Islamic University of Gaza, Gaza, Palestine
Wesam M. Ashour

Corresponding author

Correspondence to Ahmed S. Ghorab.

Editor information

Editors and Affiliations

Northern Cyprus Campus, R-141, KKTC, Middle East Technical University, Kalkanli, Güzelyurt, Turkey
Bahaaeddin Alareeni
Col of Business & Finance, Building 41, Ahlia University, Manama, Bahrain
Allam Hamdan

About this paper

Cite this paper

Ghorab, A.S., Ashour, W.M., Abudalfa, S.I. (2023). An Adaptive Oversampling Method for Imbalanced Datasets Based on Mean-Shift and SMOTE. In: Alareeni, B., Hamdan, A. (eds) Explore Business, Technology Opportunities and Challenges After the Covid-19 Pandemic. ICBT 2022. Lecture Notes in Networks and Systems, vol 495. Springer, Cham. https://doi.org/10.1007/978-3-031-08954-1_2

DOI: https://doi.org/10.1007/978-3-031-08954-1_2
Published: 13 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08953-4
Online ISBN: 978-3-031-08954-1
eBook Packages: Engineering, Engineering (R0)
