Health and Technology, Volume 9, Issue 3, pp 297–309

A machine learning model for predicting ICU readmissions and key risk factors: analysis from a longitudinal health records

  • Alvaro Ribeiro Botelho Junqueira
  • Farhaan Mirza
  • Mirza Mansoor Baig
Original Paper


Because of the high costs, resource demands and management burden associated with readmission to Intensive Care Units (ICUs), ICU readmission has been a focus of clinical research. Previous research has identified several common risk factors and proposed a variety of frameworks to predict ICU readmissions, although some studies reported that many risk factors were too specific and/or had limited focus. This study investigates whether the relevance of ICU readmission risk factors has changed over time. We used the MIMIC-III database, with 42,307 ICU stays of 31,749 patients from a US hospital, covering medical services provided from 2001 to 2012. The dataset was first split into two chronological subsets (2001–2008 and 2008–2012), and each subset was then split into training (70%) and test (30%) datasets. The training datasets were rebalanced through undersampling. To identify whether the most relevant risk factors change over time, 13 variables (12 features and one class) were selected and a three-step machine learning approach was executed: (i) Numerical Analysis, to identify overall quantitative changes; (ii) Feature Correlation Value Analysis, to rank the most important risk factors in each subset and compare them to identify any significant changes; and (iii) Classifier Performance Analysis, to identify changes in the predictive capability of the risk factors, based on three machine learning algorithms: Multilayer Perceptron, Random Forest and Support Vector Machine. Regarding readmission rates, some changes were observed for patients with private insurance (variability of +3.0%) and for patients first admitted to the ICU through the Medical Intensive Care Unit (−3.1%). In the feature analysis, the two most relevant variables were the same in both datasets, with similar correlation values.
When the machine learning algorithms were applied to the test datasets, the model presented similar results for both periods, with a best accuracy of 86.4% and an Area Under the ROC Curve (AUC) of 0.642. The difference in AUC values between the first and second periods was up to 0.05 (better in the first period), and the difference in accuracy was up to 4% (better in the second period). Overall, the results indicate that the most relevant risk factors were stable over the years, with some minor changes. Further research is required to incorporate other readmission risk factors, such as social determinants, mental health and well-being.
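
The data preparation described in the abstract (a chronological split, a 70/30 train/test split within each period, and undersampling of the training data only) can be sketched roughly as follows. This is a minimal illustration on synthetic records, not the authors' implementation: the field names `year` and `readmitted`, the helper function names, the handling of the shared boundary year 2008, the 1:1 undersampling ratio and the random seed are all assumptions made for the example.

```python
import random
from collections import Counter


def chronological_split(stays, cutoff_year):
    """Split ICU stays into an earlier and a later period.

    Stays from `cutoff_year` onward go to the later period. How the paper
    assigns the shared boundary year (2008) is not stated, so this rule is
    an assumption.
    """
    early = [s for s in stays if s["year"] < cutoff_year]
    late = [s for s in stays if s["year"] >= cutoff_year]
    return early, late


def train_test_split(stays, train_fraction, rng):
    """Shuffle a copy of `stays` and split it into train and test lists."""
    shuffled = stays[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def undersample(train, rng, label="readmitted"):
    """Randomly drop majority-class rows until both classes are equal in size.

    The 1:1 target ratio is an assumption; the abstract only says that the
    training data were rebalanced through undersampling.
    """
    positives = [s for s in train if s[label]]
    negatives = [s for s in train if not s[label]]
    minority, majority = sorted((positives, negatives), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Synthetic stand-in data (not MIMIC-III): 1,000 stays per year,
# with every tenth stay flagged as a readmission.
rng = random.Random(0)
stays = [
    {"year": year, "readmitted": i % 10 == 0}
    for year in range(2001, 2013)
    for i in range(1000)
]

early, late = chronological_split(stays, cutoff_year=2008)
for name, subset in (("early", early), ("late", late)):
    train, test = train_test_split(subset, train_fraction=0.7, rng=rng)
    balanced_train = undersample(train, rng)
    counts = Counter(s["readmitted"] for s in balanced_train)
    print(name, len(train), len(test), dict(counts))
```

In this sketch the test split is left at its natural class distribution, so metrics computed on it still reflect the imbalance between readmitted and non-readmitted stays.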


Keywords: Unplanned ICU readmission; MIMIC-III database; Machine learning; US hospital admissions; Readmission risk factors and predictive modeling; Machine learning algorithms; Multilayer perceptron; Neural networks; Random forest and support vector machine


Compliance with ethical standards

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Ethical approval was obtained from the appropriate committees prior to conducting the research study.


Copyright information

© IUPESM and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Alvaro Ribeiro Botelho Junqueira (1)
  • Farhaan Mirza (1)
  • Mirza Mansoor Baig (1)
  1. School of Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand
