
Machine Learning Approaches for Handling Imbalances in Health Data Classification

Sustainable Statistical and Data Science Methods and Practices

Abstract

Imbalanced data classification is an important area of statistical learning that has attracted sustained research attention. Despite this, it remains one of the most challenging problems in data science and machine learning, especially for large data sets. Most real-world datasets are skewed, and in health data the imbalance makes it difficult for machine learning models to classify data points accurately; the resulting biased predictions can have severe consequences in medical settings. Previous studies suggest that sampling approaches are effective for imbalanced data. In this study, we discuss oversampling and undersampling methods, including Random Oversampling, SMOTE, ADASYN, Random Undersampling, Tomek links, and NearMiss, and evaluate them experimentally on four skewed secondary health datasets covering diabetes, anaemia, lung cancer, and obesity classification. The results show that the Repeated Edited Nearest Neighbours (RENN) undersampling technique combined with logistic regression is more effective at handling data skewness than the other approaches considered. We therefore recommend the RENN technique for imbalanced data in health research.
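To make the RENN idea concrete, the following is a minimal pure-Python sketch, not the chapter's implementation. It assumes Euclidean distance, a majority vote among k = 3 neighbours, and that only the majority class is cleaned; the function names (`k_nearest_labels`, `enn_pass`, `renn`) and the toy one-dimensional data are hypothetical and chosen for illustration only. The logistic regression step is omitted. In practice, the imbalanced-learn library provides `RepeatedEditedNearestNeighbours` in `imblearn.under_sampling`, whose defaults may differ from the choices made here.

```python
import math
from collections import Counter


def k_nearest_labels(X, y, i, k):
    """Labels of the k nearest neighbours of sample i (Euclidean distance)."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [y[j] for _, j in dists[:k]]


def enn_pass(X, y, majority_label, k=3):
    """One Edited Nearest Neighbours pass over the majority class.

    A majority sample is kept only if more than half of its k nearest
    neighbours share its label; minority samples are always kept.
    """
    keep = []
    for i in range(len(X)):
        if y[i] != majority_label:
            keep.append(i)
            continue
        neighbours = k_nearest_labels(X, y, i, k)
        agree = sum(1 for label in neighbours if label == y[i])
        if agree > k / 2:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def renn(X, y, k=3, max_iter=100):
    """Repeat ENN passes until a pass removes nothing (or max_iter is hit)."""
    majority_label = Counter(y).most_common(1)[0][0]
    for _ in range(max_iter):
        X_new, y_new = enn_pass(X, y, majority_label, k)
        if len(y_new) == len(y):
            break  # converged: no sample was removed in this pass
        X, y = X_new, y_new
    return X, y


# Toy data (hypothetical): class 0 is the majority, class 1 the minority.
# The majority point at 10.1 sits inside the minority cluster.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (10.1,), (10.0,), (10.2,), (10.3,)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = renn(X, y)
print(Counter(y_res))  # Counter({0: 5, 1: 3})
```

In this toy run the only sample removed is the majority point at 10.1, whose three nearest neighbours all belong to the minority class. The cleaned set `X_res, y_res` would then be passed to a classifier such as logistic regression.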


Corresponding author

Correspondence to O. Olawale Awe.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this chapter

Awe, O.O., Ojumu, J.B., Ayanwoye, G.A., Ojumoola, J.S., Dias, R. (2023). Machine Learning Approaches for Handling Imbalances in Health Data Classification. In: Awe, O.O., Vance, E.A. (eds) Sustainable Statistical and Data Science Methods and Practices. STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health. Springer, Cham. https://doi.org/10.1007/978-3-031-41352-0_19
