Abstract
Class imbalance is a common challenge in real-world datasets, where the majority class contains many data points and the minority class contains few. It degrades the learning process and leads classification algorithms to neglect the minority class. Researchers have developed many algorithms to address this problem; however, most of them are complex and generate noise. This paper presents a simple and effective oversampling technique that is based on the mean-shift clustering algorithm and applies the synthetic minority oversampling technique (SMOTE) to selected clusters.
We conducted several experiments comparing our technique with algorithms from the literature on three commonly used datasets. The experimental results indicate that our technique performs better at synthesizing new samples and improves support vector machine (SVM) classification performance on imbalanced datasets.
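The general idea of clustering with mean-shift and then applying SMOTE inside selected clusters can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which data are clustered, how clusters are selected, or how many samples each cluster receives. The sketch assumes a binary problem, clusters only the minority class, keeps clusters with at least two samples, and distributes new samples in proportion to cluster size until the classes are balanced. The function names `mean_shift_smote` and `smote_in_cluster` and all parameters are hypothetical; only scikit-learn's `MeanShift` and NumPy are used.

```python
import numpy as np
from sklearn.cluster import MeanShift


def smote_in_cluster(points, n_new, k, rng):
    """SMOTE-style interpolation between a point and one of its k nearest
    neighbours, restricted to the points of a single cluster."""
    n = len(points)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the cluster.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position on the connecting segment
    return points[base] + gap * (points[picked] - points[base])


def mean_shift_smote(X, y, minority_label, k=5, bandwidth=None, seed=0):
    """Oversample the minority class until it matches the rest of the data.

    Assumptions of this sketch (not taken from the paper): binary labels,
    mean-shift run on minority samples only, clusters with fewer than two
    samples skipped, new samples allocated proportionally to cluster size.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label)) - len(X_min)
    if n_needed <= 0:
        return X, y  # already balanced

    # bandwidth=None lets scikit-learn estimate the kernel bandwidth.
    labels = MeanShift(bandwidth=bandwidth).fit_predict(X_min)
    clusters = [X_min[labels == c] for c in np.unique(labels)]
    clusters = [c for c in clusters if len(c) >= 2]  # assumed selection rule
    if not clusters:
        return X, y

    sizes = np.array([len(c) for c in clusters])
    counts = np.floor(n_needed * sizes / sizes.sum()).astype(int)
    counts[np.argmax(sizes)] += n_needed - counts.sum()  # hand out the remainder

    synthetic = np.vstack(
        [smote_in_cluster(c, n, k, rng) for c, n in zip(clusters, counts)]
    )
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate(
        [y, np.full(len(synthetic), minority_label, dtype=y.dtype)]
    )
    return X_out, y_out
```

Because interpolation happens only between neighbours in the same cluster, synthetic points stay inside dense minority regions rather than bridging separate groups, which is one plausible way such a method limits noise. The balanced output could then be passed to a classifier such as scikit-learn's `SVC`.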
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ghorab, A.S., Ashour, W.M., Abudalfa, S.I. (2023). An Adaptive Oversampling Method for Imbalanced Datasets Based on Mean-Shift and SMOTE. In: Alareeni, B., Hamdan, A. (eds) Explore Business, Technology Opportunities and Challenges After the Covid-19 Pandemic. ICBT 2022. Lecture Notes in Networks and Systems, vol 495. Springer, Cham. https://doi.org/10.1007/978-3-031-08954-1_2
DOI: https://doi.org/10.1007/978-3-031-08954-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08953-4
Online ISBN: 978-3-031-08954-1
eBook Packages: Engineering, Engineering (R0)