
Influence of Bias and Variance in Selection of Machine Learning Classifiers for Biomedical Applications

  • Conference paper
Smart Data Intelligence

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Machine learning classifiers play a vital role in biomedical signal analysis and disease diagnosis. The selection of an appropriate machine learning model for disease detection depends on the characteristics of the data. Bias and variance are important sources of error that affect the performance of a machine learning model, and they are often taken into consideration in the error analysis of any model. Unbiasedness is often considered a desirable property of a classifier selection criterion, but here we show that a low variance is at least as important, since a non-negligible variance introduces the potential for over-fitting in classifier selection as well as in model training. In machine learning (ML), performance degradation caused by over-fitting the classifier selection criterion is a common problem, yet it has received little attention in the ML literature. This paper aims to address the problems caused by over-fitting in machine learning. The effects of over-fitting are often comparable in magnitude to the differences in performance between learning algorithms, and hence cannot be neglected in experimental evaluation. Common performance metrics depend on the chosen bias/variance trade-off and can therefore result in over-fitting, making the reported results unreliable in practice. We discuss various methods to avoid over-fitting in the selection of classifiers, as well as the subsequent bias/variance selection in performance evaluation. While this study focuses on ML classifier selection based on statistical parameters, the findings are quite general and can be applied to model selection in practice, including ML classifier selection for biomedical signal and data applications. The novelty of the proposed method lies in highlighting the effect of bias and variance on the choice of ML classifiers, especially for biomedical signal and data classification.
To the best of our knowledge, very little work has been carried out on ML classifier selection based on bias and variance; our proposed method therefore aims to ensure better performance in abnormality detection using biomedical signals and data.
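The abstract's central claim, that a high-variance selection criterion invites over-fitting in classifier selection, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: the function names `selected_accuracy` and `criterion_std` are hypothetical, every candidate classifier is assumed to share the same true accuracy (0.7), and each candidate's validation accuracy is modeled as a binomial estimate over `n_val` held-out cases. Because no candidate is genuinely better than another, any gap between the selected candidate's validation score and the true accuracy is optimism introduced by the selection step itself.

```python
import math
import random
import statistics


def criterion_std(true_acc: float, n_val: int) -> float:
    """Standard deviation of a validation-accuracy estimate (binomial model)."""
    return math.sqrt(true_acc * (1.0 - true_acc) / n_val)


def selected_accuracy(n_candidates: int, n_val: int, true_acc: float = 0.7,
                      n_trials: int = 400, seed: int = 0) -> float:
    """Mean validation accuracy of the candidate picked by 'best validation score'.

    Assumption (illustration only): all candidates have the same true accuracy
    `true_acc`, so any excess over `true_acc` is optimism caused purely by
    selecting on a noisy, finite-sample criterion.
    """
    rng = random.Random(seed)
    picked = []
    for _ in range(n_trials):
        # Simulated validation accuracy of each candidate on n_val cases:
        # each case is classified correctly with probability true_acc.
        scores = [
            sum(rng.random() < true_acc for _ in range(n_val)) / n_val
            for _ in range(n_candidates)
        ]
        # Model-selection step: keep whichever candidate looks best.
        picked.append(max(scores))
    return statistics.mean(picked)


if __name__ == "__main__":
    true_acc = 0.7
    for n_val in (30, 300):
        for n_candidates in (1, 20):
            est = selected_accuracy(n_candidates, n_val, true_acc)
            print(f"n_val={n_val:4d} candidates={n_candidates:3d} "
                  f"criterion sd={criterion_std(true_acc, n_val):.3f} "
                  f"optimism={est - true_acc:+.3f}")
```

Under these assumptions the optimism stays near zero with a single candidate and grows as more candidates are compared or as the validation set shrinks (which raises the criterion's standard deviation). This is why a low-variance criterion matters as much as an unbiased one. Nested cross-validation, which keeps the data used for selection separate from the data used for the final performance estimate, is one common way to avoid reporting this optimistic figure.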

Author information

Corresponding author

Correspondence to Parnasree Chakraborty.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chakraborty, P., Rafiammal, S.S., Tharini, C., Jamal, D.N. (2022). Influence of Bias and Variance in Selection of Machine Learning Classifiers for Biomedical Applications. In: Asokan, R., Ruiz, D.P., Baig, Z.A., Piramuthu, S. (eds) Smart Data Intelligence. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-3311-0_39
