Journal of General Internal Medicine, Volume 33, Issue 6, pp 921–928

Development and Validation of Machine Learning Models for Prediction of 1-Year Mortality Utilizing Electronic Medical Record Data Available at the End of Hospitalization in Multicondition Patients: a Proof-of-Concept Study

  • Nishant Sahni
  • Gyorgy Simon
  • Rashi Arora
Original Research

Abstract

Background

Predicting death in a cohort of clinically diverse, multicondition hospitalized patients is difficult. Prognostic models that use electronic medical record (EMR) data to determine 1-year death risk can improve end-of-life planning and risk adjustment for research.

Objective

Determine if the final set of demographic, vital sign, and laboratory data from a hospitalization can be used to accurately quantify 1-year mortality risk.

Design

A retrospective study using electronic medical record data linked with the state death registry.

Participants

A total of 59,848 hospitalized patients within a six-hospital network over a 4-year period.

Main Measures

The last set of vital signs, complete blood count, basic and complete metabolic panel, demographic information, and ICD codes. The outcome of interest was death within 1 year.

Key results

Model performance was measured on the validation data set. Random forest (RF) models outperformed logistic regression (LR) models in discriminative ability. An RF model that used the last set of demographic, vital sign, and laboratory data from the final 48 h of hospitalization had an AUC of 0.86 (0.85–0.87) for predicting death within a year. Age, blood urea nitrogen, platelet count, hemoglobin, and creatinine were the most important variables in the RF model. Models that used comorbidity variables alone had the lowest AUC. In groups of patients with a high probability of death, RF models underestimated the probability by less than 10%.
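The two quantities reported above, discrimination (AUC) and calibration in high-risk groups, can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc` and `calibration_gap`, the risk threshold, and the toy scores and labels are all assumptions made for illustration, and the numbers bear no relation to the study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen patient who died
    (label 1) was given a higher predicted risk than a randomly chosen
    survivor (label 0), with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_gap(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """Observed death rate minus mean predicted risk in a high-risk group.

    The group is every patient with predicted risk >= threshold
    (a hypothetical cut-off). A positive value means the model
    underestimates risk in that group.
    """
    group = [(s, y) for s, y in zip(scores, labels) if s >= threshold]
    if not group:
        raise ValueError("no patients at or above the threshold")
    mean_predicted = sum(s for s, _ in group) / len(group)
    observed_rate = sum(y for _, y in group) / len(group)
    return observed_rate - mean_predicted


# Toy data (hypothetical): predicted 1-year death risk and actual outcome.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1]

print(auc(toy_scores, toy_labels))  # 0.875
print(calibration_gap(toy_scores, toy_labels, threshold=0.8))
```

In this toy example, 14 of the 16 death/survivor pairs are ranked correctly, giving an AUC of 0.875. The calibration gap is positive because both patients with predicted risk of at least 0.8 died, so the mean predicted risk in that group falls short of the observed rate.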

Conclusion

The last set of EMR data from a hospitalization can be used to accurately estimate the risk of 1-year mortality within a cohort of multicondition hospitalized patients.

KEY WORDS

machine learning; hospital outcomes; predictive models; data mining

Abbreviations

AST

Aspartate amino transferase

ALT

Alanine amino transferase

AHRQ

Agency for Health Care Research and Quality

EOL

End of life

EOLp

End of life planning

EMR

Electronic medical record

MCV

Mean corpuscular volume

WBC

White blood cell count

CMP

Complete metabolic panel

CBC

Complete blood count

BMP

Basic metabolic panel

RF

Random forest

LR

Likelihood ratio

LR

Logistic regression

SD

Standard deviation

AUC

Area under the curve

ROC

Receiver-operator curve

OOB

Out of bag

ICD9-CM

International Classification of Diseases 9-Clinical Modification

ICD10

International Classification of Diseases 10

MD-Gini

Mean decrease in Gini Index

ML

Machine learning

Notes

Acknowledgements

We thank Zohara Cohen, MS, and Justin Dale for help with the technical aspects of data management. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health Award Number UL1TR000114. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Compliance with Ethical Standards

Conflicts of Interest

The authors report no conflicts of interest in this work.

Supplementary material

11606_2018_4316_MOESM1_ESM.docx (495 kb)
ESM 1 (DOCX 494 kb)

Copyright information

© Society of General Internal Medicine 2018

Authors and Affiliations

  1. Division of General Internal Medicine, University of Minnesota, Minneapolis, USA
  2. Institute of Health Informatics, University of Minnesota, Minneapolis, USA
