Abstract
We illustrate the steps required to train and validate a simple, machine learning-based clinical prediction model for any binary outcome, such as the occurrence of a complication, in the statistical programming language R. To illustrate the methods, we supply a simulated database of 10,000 glioblastoma patients who underwent microsurgery, and predict 12-month survival. We walk the reader through each step, including importing, checking, and splitting the dataset. For pre-processing, we focus on the practical implementation of imputation using a k-nearest neighbor algorithm and of feature selection using recursive feature elimination. For model training, we apply the theory discussed in Parts I–III. We show how to implement bootstrapping and how to evaluate and select models based on out-of-sample error. Specifically for classification, we discuss how to counteract class imbalance by using upsampling techniques. We stress that, at a minimum, accuracy, area under the curve (AUC), sensitivity, and specificity should be reported for discrimination, and slope and intercept for calibration, if possible alongside a calibration plot. Finally, we explain how to arrive at a measure of variable importance using a universal, AUC-based method. We provide the full, structured code, as well as the complete glioblastoma survival database, for readers to download and execute in parallel to this section.
J. M. Kernbach and V. E. Staartjes have contributed equally to this series, and share first authorship.
Ethics declarations
Funding
No funding was received for this research.
Conflict of Interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Ethical Approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent
No human or animal participants were included in this study.
Electronic Supplementary Material
Supplement 5.1
R Code (R 17 kb)
Supplement 5.2
Simulated Glioblastoma dataset (XLSX 1851 kb)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Staartjes, V.E., Kernbach, J.M. (2022). Foundations of Machine Learning-Based Clinical Prediction Modeling: Part IV—A Practical Approach to Binary Classification Problems. In: Staartjes, V.E., Regli, L., Serra, C. (eds) Machine Learning in Clinical Neuroscience. Acta Neurochirurgica Supplement, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-030-85292-4_5
DOI: https://doi.org/10.1007/978-3-030-85292-4_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85291-7
Online ISBN: 978-3-030-85292-4
eBook Packages: Medicine, Medicine (R0)