Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods

  • Conference paper
Innovative Systems for Intelligent Health Informatics (IRICT 2020)

Abstract

Recently, there has been renewed interest in smart health systems that aim to deliver high-quality healthcare services. Prediction methods are essential to support these systems, and they rely mainly on datasets whose assumptions match reality. However, one of the greatest challenges for prediction methods is obtaining datasets that are normally distributed. This paper presents experimental work that implements SMOTE (Synthetic Minority Oversampling Technique) and bootstrapping to normalize datasets. It also measures the impact of both methods on the performance of different prediction methods, namely support vector machine (SVM), Naive Bayes, and neural network (NN). The results show that bootstrapping with Naive Bayes yielded better prediction performance than the other prediction methods with SMOTE.
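
The two resampling methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names `bootstrap_oversample` and `smote_oversample`, the neighbour count `k`, and the target class size are all assumptions made for illustration, since the preview does not give the paper's datasets or parameters. Bootstrapping here repeats existing minority rows by sampling with replacement, while the SMOTE-style step creates new points by interpolating between a minority row and one of its nearest minority neighbours.

```python
import math
import random


def bootstrap_oversample(minority, n_new, rng):
    """Bootstrapping: draw n_new rows from the minority class with replacement."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote_oversample(minority, n_new, k, rng):
    """SMOTE-style oversampling: interpolate between a minority row and
    one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` inside the minority class, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy data: 4 minority rows to be balanced against 10 majority rows.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
n_majority = 10
n_new = n_majority - len(minority)

boot = bootstrap_oversample(minority, n_new, rng)
smote = smote_oversample(minority, n_new, k=2, rng=rng)

print("bootstrap rows:", boot)
print("SMOTE-style rows:", smote)
```

Either set of new rows would be appended to the minority class before training a classifier such as SVM, Naive Bayes, or a neural network. The bootstrap rows are exact copies of observed data, whereas the SMOTE-style rows are new points lying between observed minority rows.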



Author information

Corresponding author

Correspondence to Abdulaziz Aborujilah.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Aborujilah, A. et al. (2021). Comparative Study of SMOTE and Bootstrapping Performance Based on Predication Methods. In: Saeed, F., Mohammed, F., Al-Nahari, A. (eds) Innovative Systems for Intelligent Health Informatics. IRICT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 72. Springer, Cham. https://doi.org/10.1007/978-3-030-70713-2_1
