ACTIVE SMOTE for Imbalanced Medical Data Classification

Sena, Raul; Ben Hamida, Sana

doi:10.1007/978-3-031-51664-1_6

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 486)

Included in the following conference series:

International Conference on Information and Knowledge Systems

Abstract

Classifying imbalanced data is a major challenge for machine learning techniques, particularly for medical data. Many solutions have been proposed to address it. The best-known methods are based on the Synthetic Minority Over-sampling Technique (SMOTE), which creates new synthetic instances of the minority class. In this paper, we study the effectiveness of SMOTE-based methods on several imbalanced data sets. We then propose extending these techniques with Active Learning to better control the evolution of the minority class. Active Learning uses uncertainty and diversity sampling to select the data points from which the synthetic samples are generated. To evaluate our approach, we conduct comprehensive experiments on two medical data sets, one for diabetes diagnosis and one for breast cancer diagnosis.
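
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a classifier has already produced a minority-class probability for each minority sample, picks the samples whose probability is closest to 0.5 (plain uncertainty sampling), and uses only those as seeds for standard SMOTE interpolation toward a random nearest minority neighbor. The function names `most_uncertain` and `smote_from_seeds`, the 0.5 threshold, and the parameter `k` are assumptions made for illustration; diversity sampling is left out.

```python
import numpy as np


def most_uncertain(proba_minority, n_seeds):
    """Indices of the n_seeds samples whose predicted minority-class
    probability is closest to 0.5 (assumed uncertainty criterion)."""
    proba_minority = np.asarray(proba_minority, dtype=float)
    return np.argsort(np.abs(proba_minority - 0.5))[:n_seeds]


def smote_from_seeds(X_min, seed_idx, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style
    interpolation, starting only from the rows listed in seed_idx."""
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples")
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Draw a seed, then one of its k nearest minority neighbors.
    seeds = rng.choice(np.asarray(seed_idx), size=n_new, replace=True)
    picked = neighbors[seeds, rng.integers(0, k, size=n_new)]

    # New point lies on the segment between the seed and the neighbor.
    gap = rng.random((n_new, 1))
    return X_min[seeds] + gap * (X_min[picked] - X_min[seeds])


if __name__ == "__main__":
    X_min = np.array([[0.0, 0.0], [1.0, 0.2], [0.9, 1.1],
                      [0.1, 0.8], [0.5, 0.5], [1.2, 0.9]])
    proba = np.array([0.90, 0.52, 0.10, 0.45, 0.99, 0.70])
    seed_idx = most_uncertain(proba, n_seeds=2)
    print(smote_from_seeds(X_min, seed_idx, n_new=4, k=3, rng=0))
```

Restricting the seeds is the only difference from plain SMOTE here: every synthetic point is still a convex combination of two real minority samples, but the generation is concentrated where the current classifier is least sure.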

Author information

Authors and Affiliations

Paris Dauphine University, PSL Research University, CNRS, UMR 7243, LAMSADE, Paris, France
Raul Sena & Sana Ben Hamida

Corresponding author

Correspondence to Sana Ben Hamida.

Editor information

Editors and Affiliations

ESC Amiens and University of Picardie Jules Verne, Amiens, France
Inès Saad
University of Paris Dauphine-PSL, Paris, France
Camille Rosenthal-Sabroux
University of Sfax, Sfax, Tunisia
Faiez Gargouri
University of Portsmouth, Portsmouth, UK
Salem Chakhar
University of Portsmouth, Portsmouth, UK
Nigel Williams
University of Portsmouth, Portsmouth, UK
Ella Haig

About this paper

Cite this paper

Sena, R., Ben Hamida, S. (2024). ACTIVE SMOTE for Imbalanced Medical Data Classification. In: Saad, I., Rosenthal-Sabroux, C., Gargouri, F., Chakhar, S., Williams, N., Haig, E. (eds) Advances in Information Systems, Artificial Intelligence and Knowledge Management. ICIKS 2023. Lecture Notes in Business Information Processing, vol 486. Springer, Cham. https://doi.org/10.1007/978-3-031-51664-1_6

DOI: https://doi.org/10.1007/978-3-031-51664-1_6
Published: 20 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51663-4
Online ISBN: 978-3-031-51664-1
eBook Packages: Computer Science, Computer Science (R0)
