
Foundations of Machine Learning-Based Clinical Prediction Modeling: Part II—Generalization and Overfitting

  • Conference paper
  • First Online:
Machine Learning in Clinical Neuroscience

Part of the book series: Acta Neurochirurgica Supplement (volume 134)

Abstract

We review the concept of overfitting, a well-known concern in the machine learning community that is less established in the clinical community. Overfitted models may lead to invalid conclusions that could wrongly, or even harmfully, shape clinical decision-making. Overfitting can be defined as the difference between discriminatory performance on training data and on testing data: while it is normal for out-of-sample performance to be equal to or slightly worse than training performance in any adequately fitted model, a markedly worse out-of-sample performance suggests relevant overfitting. We delve into resampling methods, specifically recommending k-fold cross-validation and the bootstrap, to arrive at realistic estimates of out-of-sample error during training. We also encourage the use of regularization techniques such as L1 or L2 regularization, and the choice of a level of algorithm complexity appropriate for the dataset at hand. Data leakage is addressed, and we discuss the importance of external validation both to assess true out-of-sample performance and, upon successful external validation, to release the model into clinical practice. Finally, for high-dimensional datasets, we elucidate feature reduction using principal component analysis (PCA) and feature elimination using recursive feature elimination (RFE).
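The k-fold cross-validation procedure recommended in the abstract can be made concrete with a minimal sketch in plain Python. The nearest-centroid classifier and the synthetic two-class dataset below are assumptions made purely for illustration, not models discussed in the chapter:

```python
import random
from statistics import mean

def fit_nearest_centroid(X, y):
    # "Training" here is just computing the per-class feature means (centroids).
    centroids = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_nearest_centroid(centroids, x):
    # Assign the class whose centroid is closest (squared Euclidean distance).
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], x))

def k_fold_cv_accuracy(X, y, k=5, seed=0):
    # Shuffle indices once, split them into k disjoint folds, and use each
    # fold in turn as the held-out test set while training on the other k-1.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_accs = []
    for fold in folds:
        held_out = set(fold)
        X_train = [X[i] for i in idx if i not in held_out]
        y_train = [y[i] for i in idx if i not in held_out]
        model = fit_nearest_centroid(X_train, y_train)
        correct = sum(predict_nearest_centroid(model, X[i]) == y[i] for i in fold)
        fold_accs.append(correct / len(fold))
    return mean(fold_accs)  # averaged out-of-sample accuracy estimate

# Synthetic two-class data: class 1 is shifted by +2 in both features.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

cv_acc = k_fold_cv_accuracy(X, y, k=5)
print(f"5-fold cross-validated accuracy: {cv_acc:.2f}")
```

Because every observation is held out exactly once, the averaged fold accuracy approximates performance on unseen data far more honestly than resubstitution accuracy computed on the training set itself.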
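The bootstrap offers a complementary out-of-sample estimate: fit on a resample drawn with replacement and evaluate on the roughly 37% of observations left out of it ("out-of-bag"). The 1-nearest-neighbour model below is a deliberately overfitting-prone assumption chosen for illustration, since its resubstitution accuracy is always perfect; comparing it with the out-of-bag estimate exposes the optimism of training performance:

```python
import random
from statistics import mean

def fit_1nn(X, y):
    # 1-nearest-neighbour "training" simply memorizes the data.
    return list(zip(X, y))

def predict_1nn(model, x):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: sq_dist(pair[0], x))[1]

def accuracy(model, X, y):
    return mean(predict_1nn(model, xi) == yi for xi, yi in zip(X, y))

def bootstrap_oob_accuracy(X, y, n_boot=30, seed=0):
    rng = random.Random(seed)
    n = len(X)
    oob_accs = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n with replacement
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]  # ~36.8% of observations
        if not oob:
            continue
        model = fit_1nn([X[i] for i in sample], [y[i] for i in sample])
        oob_accs.append(accuracy(model, [X[i] for i in oob], [y[i] for i in oob]))
    return mean(oob_accs)

# Synthetic two-class data, as above purely illustrative.
rng = random.Random(7)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(60)]
y = [0] * 60 + [1] * 60

train_acc = accuracy(fit_1nn(X, y), X, y)  # apparent (resubstitution) accuracy
oob_acc = bootstrap_oob_accuracy(X, y)     # realistic out-of-sample estimate
print(f"training accuracy: {train_acc:.2f}, out-of-bag accuracy: {oob_acc:.2f}")
```

The gap between the perfect training accuracy and the lower out-of-bag accuracy is exactly the kind of train-test discrepancy that the chapter identifies as the signature of overfitting.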



Author information

Corresponding author

Correspondence to Victor E. Staartjes.

Ethics declarations

Funding

No funding was received for this research.

Conflict of Interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

No human or animal participants were included in this study.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kernbach, J.M., Staartjes, V.E. (2022). Foundations of Machine Learning-Based Clinical Prediction Modeling: Part II—Generalization and Overfitting. In: Staartjes, V.E., Regli, L., Serra, C. (eds) Machine Learning in Clinical Neuroscience. Acta Neurochirurgica Supplement, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-030-85292-4_3


  • DOI: https://doi.org/10.1007/978-3-030-85292-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85291-7

  • Online ISBN: 978-3-030-85292-4

  • eBook Packages: Medicine, Medicine (R0)
