Development and Validation of Machine Learning Models for Prediction of 1-Year Mortality Utilizing Electronic Medical Record Data Available at the End of Hospitalization in Multicondition Patients: a Proof-of-Concept Study
Predicting death in a cohort of clinically diverse, multicondition hospitalized patients is difficult. Prognostic models that use electronic medical record (EMR) data to determine 1-year death risk can improve end-of-life planning and risk adjustment for research.
Determine if the final set of demographic, vital sign, and laboratory data from a hospitalization can be used to accurately quantify 1-year mortality risk.
A retrospective study using electronic medical record data linked with the state death registry.
A total of 59,848 hospitalized patients within a six-hospital network over a 4-year period.
The last set of vital signs, complete blood count, basic and complete metabolic panel, demographic information, and ICD codes. The outcome of interest was death within 1 year.
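As a rough illustration of what "the last set" of measurements could mean in practice, the Python sketch below keeps, for one hospitalization, the most recent value of each variable recorded within a fixed window before discharge. The record layout, the variable names, the 48 h default, and the `last_values` helper are assumptions made for this example and are not taken from the study's data pipeline.

```python
from datetime import datetime, timedelta


def last_values(observations, discharge_time, window_hours=48):
    """Most recent value per variable within the window before discharge.

    observations: iterable of (variable_name, timestamp, value) tuples.
    This layout is hypothetical; real EMR extracts will differ.
    """
    window_start = discharge_time - timedelta(hours=window_hours)
    latest = {}  # variable name -> (timestamp, value)
    for name, ts, value in observations:
        # Skip anything recorded outside the look-back window.
        if ts < window_start or ts > discharge_time:
            continue
        # Keep the value only if it is newer than what we already hold.
        if name not in latest or ts > latest[name][0]:
            latest[name] = (ts, value)
    return {name: value for name, (_, value) in latest.items()}


# Toy records for a single (made-up) hospitalization.
discharge = datetime(2015, 3, 10, 12, 0)
obs = [
    ("bun", datetime(2015, 3, 7, 8, 0), 40.0),         # older than 48 h, ignored
    ("bun", datetime(2015, 3, 9, 8, 0), 31.0),
    ("bun", datetime(2015, 3, 10, 6, 0), 28.0),        # latest BUN in window
    ("hemoglobin", datetime(2015, 3, 9, 8, 0), 10.2),
    ("platelets", datetime(2015, 3, 6, 8, 0), 150.0),  # older than 48 h, ignored
]
features = last_values(obs, discharge)
print(features)
```

A variable with no measurement inside the window is simply absent from the result, so a downstream model would still need some strategy for missing values.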
Model performance was measured on the validation data set. Random forest (RF) models outperformed logistic regression (LR) models in discriminative ability. An RF model that used the final set of demographic, vital sign, and laboratory data from the final 48 h of hospitalization had an AUC of 0.86 (0.85–0.87) for predicting death within a year. Age, blood urea nitrogen, platelet count, hemoglobin, and creatinine were the most important variables in the RF model. Models that used comorbidity variables alone had the lowest AUC. In groups of patients with a high probability of death, RF models underestimated the probability by less than 10%.
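The two kinds of performance measure mentioned here, discrimination (AUC) and agreement between predicted and observed risk in a high-risk group, can be sketched in a few lines of plain Python. This is a minimal illustration on made-up labels and scores, not the study's evaluation code: no model is trained, and the function names, the toy data, and the 0.6 risk threshold are assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient who died
    (label 1) has a higher score than a randomly chosen survivor
    (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_gap(labels, scores, threshold):
    """Observed event rate minus mean predicted risk for scores >= threshold.

    A positive gap means the predictions underestimate risk in that group.
    """
    group = [(y, s) for y, s in zip(labels, scores) if s >= threshold]
    if not group:
        raise ValueError("no patients at or above the threshold")
    observed = sum(y for y, _ in group) / len(group)
    predicted = sum(s for _, s in group) / len(group)
    return observed - predicted


# Made-up outcomes (1 = died within a year) and predicted risks.
labels = [0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

model_auc = auc(labels, scores)
high_risk_gap = calibration_gap(labels, scores, threshold=0.6)
print(round(model_auc, 3), round(high_risk_gap, 3))
```

The pairwise loop is quadratic in the number of patients, which is fine for a toy example; for a cohort of tens of thousands a rank-based implementation from a statistics library would be the practical choice.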
The last set of EMR data from a hospitalization can be used to accurately estimate the risk of 1-year mortality within a cohort of multicondition hospitalized patients.
KEY WORDS: machine learning; hospital outcomes; predictive models; data mining
Aspartate aminotransferase
Alanine aminotransferase
Agency for Healthcare Research and Quality
End of life
End of life planning
Electronic medical record
Mean corpuscular volume
White blood cell count
Complete metabolic panel
Complete blood count
Basic metabolic panel
Area under the curve
Out of bag
International Classification of Diseases 9-Clinical Modification
International Classification of Diseases 10
Mean decrease in Gini Index
We thank Zohara Cohen, MS, and Justin Dale for help with the technical aspects of data management. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health Award Number UL1TR000114. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Compliance with Ethical Standards
Conflicts of Interest
The authors report no conflicts of interest in this work.