ACTIVE SMOTE for Imbalanced Medical Data Classification

  • Conference paper
Advances in Information Systems, Artificial Intelligence and Knowledge Management (ICIKS 2023)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 486)

Abstract

Classifying imbalanced data is a major challenge for machine learning techniques, especially for medical data. Many solutions have been proposed to address this challenge. The best-known methods are based on the Synthetic Minority Over-sampling Technique (SMOTE), which creates new synthetic instances in the minority class. In this paper, we study the effectiveness of SMOTE-based methods on several imbalanced data sets. We then propose extending these techniques with Active Learning to better control the evolution of the minority class. Active Learning uses uncertainty and diversity sampling to select the data points from which the synthetic samples are generated. To evaluate our approach, we conduct comprehensive experimental studies on two medical data sets, one for diabetes diagnosis and one for breast cancer diagnosis.
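The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which this preview does not show. It assumes the standard SMOTE interpolation step (a synthetic point is placed on the segment between a minority point and one of its k nearest minority neighbors) and a simple uncertainty score that peaks when the classifier's predicted probability is 0.5. Diversity sampling is left out for brevity. All names (`select_seeds`, `active_smote`, `toy_proba`) and the toy data are hypothetical.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), skipping the point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def smote_sample(seed_point, minority, k, rng):
    """Standard SMOTE step: interpolate between a seed and a random minority neighbor."""
    neighbor = rng.choice(nearest_neighbors(seed_point, minority, k))
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(s + gap * (n - s) for s, n in zip(seed_point, neighbor))


def uncertainty(prob_minority):
    """Assumed score: 1.0 when the classifier is undecided (p = 0.5), 0.0 at p = 0 or 1."""
    return 1.0 - 2.0 * abs(prob_minority - 0.5)


def select_seeds(minority, predict_proba, n_seeds):
    """Uncertainty sampling: keep the minority points the classifier is least sure about."""
    ranked = sorted(minority, key=lambda x: uncertainty(predict_proba(x)), reverse=True)
    return ranked[:n_seeds]


def active_smote(minority, predict_proba, n_seeds, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points, seeded only from uncertain points."""
    rng = random.Random(seed)
    seeds = select_seeds(minority, predict_proba, n_seeds)
    return [smote_sample(rng.choice(seeds), minority, k, rng) for _ in range(n_new)]


# Toy minority class and a hypothetical classifier (logistic in the first feature).
minority = [(1.0, 1.0), (1.2, 0.9), (2.0, 2.1), (2.2, 1.8), (3.0, 3.0)]


def toy_proba(x):
    return 1.0 / (1.0 + math.exp(-(x[0] - 2.0)))


synthetic = active_smote(minority, toy_proba, n_seeds=2, n_new=4)
```

Because each synthetic point is a convex combination of two minority points, it always stays inside the bounding box of the minority class. Restricting the seeds to uncertain points concentrates new samples near the region where the classifier is undecided, which is one plausible reading of "choosing wisely" in the abstract.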

Notes

  1. https://imbalanced-learn.org/stable/index.html

  2. http://archive.ics.uci.edu/ml

Author information

Corresponding author

Correspondence to Sana Ben Hamida.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sena, R., Ben Hamida, S. (2024). ACTIVE SMOTE for Imbalanced Medical Data Classification. In: Saad, I., Rosenthal-Sabroux, C., Gargouri, F., Chakhar, S., Williams, N., Haig, E. (eds) Advances in Information Systems, Artificial Intelligence and Knowledge Management. ICIKS 2023. Lecture Notes in Business Information Processing, vol 486. Springer, Cham. https://doi.org/10.1007/978-3-031-51664-1_6

  • DOI: https://doi.org/10.1007/978-3-031-51664-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51663-4

  • Online ISBN: 978-3-031-51664-1

  • eBook Packages: Computer Science, Computer Science (R0)
