Journal of General Internal Medicine, Volume 33, Issue 6, pp 921–928

Development and Validation of Machine Learning Models for Prediction of 1-Year Mortality Utilizing Electronic Medical Record Data Available at the End of Hospitalization in Multicondition Patients: a Proof-of-Concept Study

  • Nishant Sahni
  • Gyorgy Simon
  • Rashi Arora
Original Research



Background

Predicting death in a cohort of clinically diverse, multicondition hospitalized patients is difficult. Prognostic models that use electronic medical record (EMR) data to determine 1-year death risk can improve end-of-life planning and risk adjustment for research.


Objective

Determine if the final set of demographic, vital sign, and laboratory data from a hospitalization can be used to accurately quantify 1-year mortality risk.


Design

A retrospective study using electronic medical record data linked with the state death registry.


Participants

A total of 59,848 hospitalized patients within a six-hospital network over a 4-year period.

Main Measures

The last set of vital signs, complete blood count, basic and complete metabolic panel, demographic information, and ICD codes. The outcome of interest was death within 1 year.
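As a rough illustration of what "the last set" of measurements could mean in practice, the sketch below keeps the most recent value of each variable for every hospitalization. The record layout (`encounter_id`, `name`, `time`, `value`) and the sample values are hypothetical and are not taken from the study.

```python
from datetime import datetime


def last_observations(records):
    """Return {encounter_id: {variable_name: latest_value}}.

    `records` is an iterable of dicts with the hypothetical keys
    'encounter_id', 'name', 'time' (a datetime) and 'value'.
    """
    latest = {}  # (encounter_id, name) -> (time, value)
    for rec in records:
        key = (rec["encounter_id"], rec["name"])
        # Keep the record with the latest timestamp for each variable.
        if key not in latest or rec["time"] > latest[key][0]:
            latest[key] = (rec["time"], rec["value"])

    features = {}
    for (encounter_id, name), (_, value) in latest.items():
        features.setdefault(encounter_id, {})[name] = value
    return features


# Made-up example data for a single hospitalization.
records = [
    {"encounter_id": 1, "name": "creatinine",
     "time": datetime(2015, 3, 1, 8), "value": 1.9},
    {"encounter_id": 1, "name": "creatinine",
     "time": datetime(2015, 3, 3, 7), "value": 1.2},
    {"encounter_id": 1, "name": "hemoglobin",
     "time": datetime(2015, 3, 2, 6), "value": 10.4},
]
features = last_observations(records)
print(features)
```

Each resulting row of features could then be joined with demographic data and comorbidity codes before modeling; how the study actually assembled its data set is not described in this abstract.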

Key results

Model performance was measured on the validation data set. Random forests (RF) outperformed logistic regression (LR) models in discriminative ability. An RF model that used the final set of demographic, vitals, and laboratory data from the final 48 h of hospitalization had an AUC of 0.86 (0.85–0.87) for predicting death within a year. Age, blood urea nitrogen, platelet count, hemoglobin, and creatinine were the most important variables in the RF model. Models that used comorbidity variables alone had the lowest AUC. In groups of patients with a high probability of death, RF models underestimated the probability by less than 10%.
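For readers less familiar with the two kinds of performance reported here, the following minimal sketch shows one way to compute them. Discrimination (AUC) is taken as the fraction of (died, survived) patient pairs in which the patient who died received the higher predicted risk. Calibration is summarized as the gap between the observed death rate and the mean predicted risk in a high-risk group. The function names, the threshold, and the data are illustrative assumptions, not the study's code or results.

```python
def auc(labels, scores):
    """AUC as the fraction of (died, survived) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def calibration_gap(labels, scores, threshold):
    """Observed death rate minus mean predicted risk among patients
    whose predicted risk is at least `threshold`.

    A positive gap means the model underestimates risk in that group.
    """
    group = [(y, s) for y, s in zip(labels, scores) if s >= threshold]
    if not group:
        raise ValueError("no patients at or above the threshold")
    observed = sum(y for y, _ in group) / len(group)
    predicted = sum(s for _, s in group) / len(group)
    return observed - predicted


# Made-up outcomes (1 = died within 1 year) and predicted risks.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]
auc_value = auc(labels, scores)
gap = calibration_gap(labels, scores, threshold=0.5)
print(auc_value, gap)
```

The pairwise loop is quadratic in the number of patients and is meant only to make the definition concrete; for a cohort of this size a rank-based computation would be the more practical choice.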


Conclusions

The last set of EMR data from a hospitalization can be used to accurately estimate the risk of 1-year mortality within a cohort of multicondition hospitalized patients.


Key Words

machine learning; hospital outcomes; predictive models; data mining



Abbreviations

Aspartate amino transferase
Alanine amino transferase
Agency for Health Care Research and Quality
End of life
End of life planning
Electronic medical record
Mean corpuscular volume
White blood cell count
Complete metabolic panel
Complete blood count
Basic metabolic panel
Random forest
Likelihood ratio
Logistic regression
Standard deviation
Area under the curve
Receiver-operator curve
Out of bag
International Classification of Diseases 9-Clinical Modification
International Classification of Diseases 10
Mean decrease in Gini Index
Machine learning



Acknowledgments

We thank Zohara Cohen, MS, and Justin Dale for help with the technical aspects of data management. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health Award Number UL1TR000114. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Compliance with Ethical Standards

Conflicts of Interest

The authors report no conflicts of interest in this work.

Supplementary material

11606_2018_4316_MOESM1_ESM.docx (495 kb)



Copyright information

© Society of General Internal Medicine 2018

Authors and Affiliations

  1. Division of General Internal Medicine, University of Minnesota, Minneapolis, USA
  2. Institute of Health Informatics, University of Minnesota, Minneapolis, USA
