Abstract
Machine learning classifiers play a vital role in biomedical signal analysis and disease diagnosis. The selection of a proper machine learning model for disease detection depends on the characteristics of the data. Bias and variance are important sources of error that affect the performance of a machine learning model, and they are often considered in the error analysis of any model. Unbiasedness is often considered a desirable property of a classifier selection criterion, but here we argue that low variance is at least as significant, since a non-negligible variance introduces the potential for over-fitting in classifier selection as well as in model training. In machine learning (ML), the performance degradation caused by over-fitting the classifier selection criterion is a common problem, yet it has received little attention in the ML literature. This paper aims to address the problems caused by this form of over-fitting. The effects of over-fitting are often comparable in magnitude to the differences in performance between learning algorithms and hence cannot be ignored in experimental evaluation. Common performance metrics depend on the bias/variance selection and are therefore prone to over-fitting, which makes them unreliable in practice. We discuss various methods to avoid over-fitting in the selection of classifiers, as well as the subsequent bias/variance selection in the evaluation of performance parameters. While this study focuses on statistical parameter-based ML classifier selection, the findings are general and can be applied to any practical model selection involving ML classifiers in biomedical signal and data applications. The novelty of the suggested method lies in highlighting the effect of bias and variance on the choice of ML classifiers, especially for biomedical signal and data classification.
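The bias/variance trade-off referred to above can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' method: it assumes a hypothetical one-dimensional problem with a known P(y = 1 | x) (`true_prob`), uses a k-nearest-neighbour probability estimate (`knn_prob`) as a stand-in classifier, and measures the squared bias and the variance of the predicted probabilities over many independently drawn training sets. For squared error on a 0/1 label (the Brier score), the expected loss at x splits into the irreducible noise p(1 - p), plus bias², plus variance.

```python
import random
import statistics


def true_prob(x):
    # Hypothetical ground truth (an assumption): P(y = 1 | x) rises from 0.1 to 0.9.
    return 0.1 + 0.8 * x


def sample_training_set(n, rng):
    # Draw n points uniformly on [0, 1] with labels sampled from true_prob.
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < true_prob(x) else 0 for x in xs]
    return xs, ys


def knn_prob(train_x, train_y, x, k):
    # Stand-in classifier: fraction of positives among the k nearest training points.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def bias_variance(k, n_train=50, n_sets=200, seed=0):
    """Average squared bias and variance of the predicted P(y = 1 | x) over a grid."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(21)]
    preds = {x: [] for x in grid}
    for _ in range(n_sets):
        # A fresh, independently drawn training set for every repetition.
        train_x, train_y = sample_training_set(n_train, rng)
        for x in grid:
            preds[x].append(knn_prob(train_x, train_y, x, k))
    # Bias: average prediction minus the true probability, squared, averaged over x.
    bias_sq = statistics.mean(
        (statistics.mean(p) - true_prob(x)) ** 2 for x, p in preds.items()
    )
    # Variance: spread of predictions across training sets, averaged over x.
    variance = statistics.mean(statistics.pvariance(p) for p in preds.values())
    return bias_sq, variance


for k in (1, 5, 25):
    b, v = bias_variance(k)
    print(f"k={k:2d}  bias^2={b:.4f}  variance={v:.4f}")
```

In this toy setting a small k gives low bias but high variance, while a large k trades variance for bias; the exact numbers depend on the assumed distribution, training-set size and seed.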
To the best of our knowledge, very little work has addressed ML classifier selection based on bias and variance. Our suggested method aims to fill this gap and to ensure better performance in abnormality detection using biomedical signals and data.
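The claim that a non-negligible variance in the selection criterion leads to over-fitting in classifier selection can be illustrated with a toy simulation. This is a hedged sketch, not the authors' experiment: it assumes 50 hypothetical candidate classifiers that all have a true accuracy of exactly 0.5, picks the one with the highest accuracy on a validation set, and then scores that choice on a fresh test set. Because the candidates are identical, any advantage of the winner on the validation set is pure selection bias. The names `simulated_accuracy` and `selection_optimism` are illustrative.

```python
import random
import statistics


def simulated_accuracy(n_samples, rng, true_accuracy=0.5):
    # Hypothetical classifier: each of n_samples predictions is correct
    # independently with probability true_accuracy.
    correct = sum(rng.random() < true_accuracy for _ in range(n_samples))
    return correct / n_samples


def selection_optimism(n_val, n_candidates=50, n_test=1000, n_trials=100, seed=1):
    """Mean gap between the winning validation accuracy and its test accuracy."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Every candidate has the same true accuracy, so differences are noise.
        val_scores = [simulated_accuracy(n_val, rng) for _ in range(n_candidates)]
        best_val = max(val_scores)
        # The selected candidate is scored again on independent test data.
        test_score = simulated_accuracy(n_test, rng)
        gaps.append(best_val - test_score)
    return statistics.mean(gaps)


for n_val in (30, 300):
    print(f"validation size {n_val:3d}: optimism = {selection_optimism(n_val):.3f}")
```

The optimism shrinks as the validation set grows, because the variance of the validation accuracy falls. This is the usual motivation for scoring the finally selected classifier on data that played no part in the selection, for example in the outer loop of nested cross-validation.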
Ethics declarations
Conflict of Interest
The authors declare no conflict of interest.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chakraborty, P., Rafiammal, S.S., Tharini, C., Jamal, D.N. (2022). Influence of Bias and Variance in Selection of Machine Learning Classifiers for Biomedical Applications. In: Asokan, R., Ruiz, D.P., Baig, Z.A., Piramuthu, S. (eds) Smart Data Intelligence. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-3311-0_39
DOI: https://doi.org/10.1007/978-981-19-3311-0_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-3310-3
Online ISBN: 978-981-19-3311-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)