
Foundations of Machine Learning-Based Clinical Prediction Modeling: Part II—Generalization and Overfitting

  • Conference paper
  • First Online:
Machine Learning in Clinical Neuroscience

Part of the book series: Acta Neurochirurgica Supplement (volume 134)

Abstract

We review the concept of overfitting, a well-known concern in the machine learning community that is less established in the clinical community. Overfitted models may lead to invalid conclusions that could wrongly, or even harmfully, shape clinical decision-making. Overfitting can be defined as the difference between discriminatory performance on training data and on testing data: while it is normal for out-of-sample performance to be equal to or slightly worse than training performance in any adequately fitted model, a markedly worse out-of-sample performance suggests relevant overfitting. We delve into resampling methods, specifically recommending k-fold cross-validation and the bootstrap, to arrive at realistic estimates of out-of-sample error during training. We also encourage the use of regularization techniques such as L1 or L2 regularization, and the choice of a level of algorithm complexity appropriate for the dataset at hand. Data leakage is addressed, and we discuss the importance of external validation both to assess true out-of-sample performance and, upon successful external validation, to release the model into clinical practice. Finally, for high-dimensional datasets, we elucidate feature reduction using principal component analysis (PCA) and feature elimination using recursive feature elimination (RFE).
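The k-fold cross-validation procedure recommended in the abstract can be made concrete with a minimal sketch in plain Python. The nearest-centroid classifier and the synthetic two-class dataset below are assumptions made purely for illustration, not models discussed in the chapter:

```python
import random
from statistics import mean

def fit_nearest_centroid(X, y):
    # "Training" here is just computing the per-class feature means (centroids).
    centroids = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [mean(col) for col in zip(*rows)]
    return centroids

def predict_nearest_centroid(centroids, x):
    # Assign the class whose centroid is closest (squared Euclidean distance).
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(centroids[c], x))

def k_fold_cv_accuracy(X, y, k=5, seed=0):
    # Shuffle indices once, split them into k disjoint folds, and use each
    # fold in turn as the held-out test set while training on the other k-1.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    fold_accs = []
    for fold in folds:
        held_out = set(fold)
        X_train = [X[i] for i in idx if i not in held_out]
        y_train = [y[i] for i in idx if i not in held_out]
        model = fit_nearest_centroid(X_train, y_train)
        correct = sum(predict_nearest_centroid(model, X[i]) == y[i] for i in fold)
        fold_accs.append(correct / len(fold))
    return mean(fold_accs)  # averaged out-of-sample accuracy estimate

# Synthetic two-class data: class 1 is shifted by +2 in both features.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

cv_acc = k_fold_cv_accuracy(X, y, k=5)
print(f"5-fold cross-validated accuracy: {cv_acc:.2f}")
```

Because every observation is held out exactly once, the averaged fold accuracy approximates performance on unseen data far more honestly than resubstitution accuracy computed on the training set itself.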
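The bootstrap offers a complementary out-of-sample estimate: fit on a resample drawn with replacement and evaluate on the roughly 37% of observations left out of it ("out-of-bag"). The 1-nearest-neighbour model below is a deliberately overfitting-prone assumption chosen for illustration, since its resubstitution accuracy is always perfect; comparing it with the out-of-bag estimate exposes the optimism of training performance:

```python
import random
from statistics import mean

def fit_1nn(X, y):
    # 1-nearest-neighbour "training" simply memorizes the data.
    return list(zip(X, y))

def predict_1nn(model, x):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: sq_dist(pair[0], x))[1]

def accuracy(model, X, y):
    return mean(predict_1nn(model, xi) == yi for xi, yi in zip(X, y))

def bootstrap_oob_accuracy(X, y, n_boot=30, seed=0):
    rng = random.Random(seed)
    n = len(X)
    oob_accs = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n with replacement
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]  # ~36.8% of observations
        if not oob:
            continue
        model = fit_1nn([X[i] for i in sample], [y[i] for i in sample])
        oob_accs.append(accuracy(model, [X[i] for i in oob], [y[i] for i in oob]))
    return mean(oob_accs)

# Synthetic two-class data, as above purely illustrative.
rng = random.Random(7)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(60)]
y = [0] * 60 + [1] * 60

train_acc = accuracy(fit_1nn(X, y), X, y)  # apparent (resubstitution) accuracy
oob_acc = bootstrap_oob_accuracy(X, y)     # realistic out-of-sample estimate
print(f"training accuracy: {train_acc:.2f}, out-of-bag accuracy: {oob_acc:.2f}")
```

The gap between the perfect training accuracy and the lower out-of-bag accuracy is exactly the kind of train-test discrepancy that the chapter identifies as the signature of overfitting.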



Author information

Corresponding author

Correspondence to Victor E. Staartjes.

Ethics declarations

Funding

No funding was received for this research.

Conflict of Interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

No human or animal participants were included in this study.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Kernbach, J.M., Staartjes, V.E. (2022). Foundations of Machine Learning-Based Clinical Prediction Modeling: Part II—Generalization and Overfitting. In: Staartjes, V.E., Regli, L., Serra, C. (eds) Machine Learning in Clinical Neuroscience. Acta Neurochirurgica Supplement, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-030-85292-4_3


  • DOI: https://doi.org/10.1007/978-3-030-85292-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85291-7

  • Online ISBN: 978-3-030-85292-4

  • eBook Packages: Medicine, Medicine (R0)
