Abstract
Recently, there has been renewed interest in smart health systems that aim to deliver high-quality healthcare services. Prediction methods are essential to support these systems, and they rely mainly on datasets whose assumptions match reality. However, one of the greatest challenges for prediction methods is obtaining datasets that are normally distributed. This paper presents an experimental study that applies SMOTE (Synthetic Minority Oversampling Technique) and bootstrapping to normalize datasets. It also measures the impact of both methods on the performance of several prediction methods: Support Vector Machine (SVM), Naive Bayes, and Neural Network (NN). The results show that bootstrapping with Naive Bayes yielded better prediction performance than the other prediction methods combined with SMOTE.
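The abstract does not say how either resampling method was configured. As a minimal sketch, assuming binary labels and assuming that "bootstrapping" here means drawing minority-class rows with replacement until both classes are the same size, the two oversampling strategies can be contrasted in plain Python. The helper names `smote`, `bootstrap_oversample`, and `balance` are hypothetical, and `k = 5` neighbours is an assumed default. The SMOTE function is a variant: it picks base samples at random instead of looping over every minority sample as Chawla et al. (2002) do. The key difference is that SMOTE creates new points by interpolating between a minority sample and one of its nearest minority neighbours, while bootstrapping only duplicates existing rows.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """SMOTE-style oversampling (hypothetical helper): interpolate between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position in [0, 1) along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def bootstrap_oversample(minority: Sequence[Point], n_new: int, seed: int = 0) -> List[Point]:
    """Bootstrap oversampling (assumed meaning): copy minority rows drawn with replacement."""
    rng = random.Random(seed)
    return [list(rng.choice(minority)) for _ in range(n_new)]


def balance(
    X: Sequence[Point],
    y: Sequence[int],
    minority_label: int,
    method: Callable[..., List[Point]],
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Append resampled minority rows until both classes have equal counts (binary labels)."""
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    extra = method(minority, n_new, seed=seed)
    return [list(x) for x in X] + extra, list(y) + [minority_label] * len(extra)


if __name__ == "__main__":
    # Toy data: 3 minority (label 1) vs 5 majority (label 0) samples
    X = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
         [5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 6.1], [4.9, 5.5]]
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    for name, method in [("SMOTE", smote), ("bootstrap", bootstrap_oversample)]:
        Xb, yb = balance(X, y, minority_label=1, method=method)
        print(name, "class counts:", yb.count(0), yb.count(1))
    # prints "SMOTE class counts: 5 5" then "bootstrap class counts: 5 5"
```

Either balanced set would then be used to train the classifiers being compared (SVM, Naive Bayes, NN). For a fair comparison, resampling is normally applied to the training split only, so that synthetic or duplicated rows do not leak into the test data.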
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Aborujilah, A. et al. (2021). Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_1
DOI: https://doi.org/10.1007/978-3-030-70713-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70712-5
Online ISBN: 978-3-030-70713-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)