
Foundations of Machine Learning-Based Clinical Prediction Modeling: Part IV—A Practical Approach to Binary Classification Problems

  • Conference paper
Machine Learning in Clinical Neuroscience

Part of the book series: Acta Neurochirurgica Supplement (NEUROCHIRURGICA, volume 134)

Abstract

We illustrate the steps required to train and validate a simple, machine learning-based clinical prediction model for any binary outcome, such as the occurrence of a complication, in the statistical programming language R. To illustrate the methods, we supply a simulated database of 10,000 glioblastoma patients who underwent microsurgery and predict 12-month survival. We walk the reader through each step, including importing, checking, and splitting the dataset. For pre-processing, we focus on the practical implementation of imputation with a k-nearest neighbor algorithm and of feature selection with recursive feature elimination. For model training, we apply the theory discussed in Parts I–III. We show how to implement bootstrapping and how to evaluate and select models based on out-of-sample error. Specifically for classification, we discuss how to counteract class imbalance with upsampling techniques. We argue that reports should include, at a minimum, accuracy, area under the curve (AUC), sensitivity, and specificity for discrimination, as well as slope and intercept for calibration, if possible alongside a calibration plot. Finally, we explain how to arrive at a measure of variable importance using a universal, AUC-based method. We provide the full, structured code as well as the complete glioblastoma survival database for readers to download and execute in parallel to this section.
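The discrimination measures named in the abstract can be made concrete with a small sketch. The chapter's own code is written in R and is provided as Supplement 5.1; the Python below is not that code. It is a minimal, standard-library illustration in which the function names (`confusion_counts`, `auc`, `discrimination_metrics`), the toy labels and probabilities, and the 0.5 classification threshold are all assumptions made for the example. AUC is computed here as the proportion of positive/negative pairs in which the positive case receives the higher predicted probability, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def discrimination_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and AUC for a binary outcome."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": auc(y_true, y_prob),
    }


# Invented toy data: 1 = event occurred, 0 = event did not occur.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

metrics = discrimination_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC does not, which is one reason the abstract asks for all four to be reported together. Calibration slope and intercept require fitting a recalibration model and are not covered by this sketch.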

J. M. Kernbach and V. E. Staartjes have contributed equally to this series, and share first authorship.

Corresponding author

Correspondence to Victor E. Staartjes.

Ethics declarations

Funding

No funding was received for this research.

Conflict of Interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

No human or animal participants were included in this study.

Electronic Supplementary Material

Supplement 5.1

R Code (R 17 kb)

Supplement 5.2

Simulated Glioblastoma dataset (XLSX 1851 kb)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Staartjes, V.E., Kernbach, J.M. (2022). Foundations of Machine Learning-Based Clinical Prediction Modeling: Part IV—A Practical Approach to Binary Classification Problems. In: Staartjes, V.E., Regli, L., Serra, C. (eds) Machine Learning in Clinical Neuroscience. Acta Neurochirurgica Supplement, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-030-85292-4_5

  • DOI: https://doi.org/10.1007/978-3-030-85292-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85291-7

  • Online ISBN: 978-3-030-85292-4

  • eBook Packages: Medicine, Medicine (R0)
