Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods

Aborujilah, Abdulaziz; Nassr, Rasheed Mohammad; Al-Hadhrami, Tawfik; Husen, Mohd Nizam; Ali, Nor Azlina; Othmani, Abdulaleem Al-; Hamdi, Mustapha

doi:10.1007/978-3-030-70713-2_1

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 72)

Included in the following conference series:

International Conference of Reliable Information and Communication Technology

Abstract

Recently, there has been renewed interest in smart health systems that aim to deliver high-quality healthcare services. Prediction methods are essential to support these systems. They rely mainly on datasets whose assumptions match reality. However, one of the greatest challenges for prediction methods is obtaining datasets that are normally distributed. This paper presents experimental work that implements SMOTE (Synthetic Minority Oversampling Technique) and bootstrapping methods to normalize datasets. It also measures the impact of both methods on the performance of different prediction methods, such as support vector machine (SVM), Naive Bayes, and neural network (NN). The results showed that bootstrapping with Naive Bayes yielded better prediction performance than the other prediction methods with SMOTE.
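The two resampling ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. The function names `smote` and `bootstrap_sample`, the neighbour count `k=3`, the toy two-dimensional data, and the choice to resample only the minority class are all assumptions made for illustration. SMOTE is shown in its basic form: a synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. Bootstrapping is shown as plain sampling with replacement.

```python
import random

def bootstrap_sample(rows, n=None, rng=None):
    """Draw n rows uniformly at random with replacement (n defaults to len(rows))."""
    rng = rng or random.Random()
    n = len(rows) if n is None else n
    return [rng.choice(rows) for _ in range(n)]

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy imbalanced data (hypothetical): 20 majority rows, 5 minority rows.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]

# SMOTE: add synthetic minority rows until both classes have 20 rows.
balanced_minority = minority + smote(minority, n_new=len(majority) - len(minority), rng=rng)

# Bootstrapping: resample the minority class with replacement up to 20 rows.
boot_minority = bootstrap_sample(minority, n=len(majority), rng=rng)
```

The difference between the two is visible in the output: `boot_minority` contains only repeated copies of the original five rows, while `balanced_minority` also contains new points that lie between existing minority rows.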

Author information

Authors and Affiliations

University Kuala Lumpur, 50250, Kuala Lumpur, Malaysia
Abdulaziz Aborujilah, Rasheed Mohammad Nassr, Mohd Nizam Husen, Nor Azlina Ali & Abdulaleem Al- Othmani
Nottingham Trent University, Nottingham, NG1 4FQ, UK
Tawfik Al-Hadhrami
Edge IA, IoT, Nottingham, UK
Mustapha Hamdi

Corresponding author

Correspondence to Abdulaziz Aborujilah.

Editor information

Editors and Affiliations

College of Computer Science and Engineering, Taibah University, Medina, Saudi Arabia
Faisal Saeed
School of Computing, Information Systems Department, Universiti Utara Malaysia, Sintok, Malaysia
Fathey Mohammed
Sana'a Community College, Sana'a, Yemen
Abdulaziz Al-Nahari

Cite this paper

Aborujilah, A. et al. (2021). Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_1

DOI: https://doi.org/10.1007/978-3-030-70713-2_1
Published: 06 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70712-5
Online ISBN: 978-3-030-70713-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
