Health Care Management Science, Volume 22, Issue 1, pp 156–179

Claims data-driven modeling of hospital time-to-readmission risk with latent heterogeneity

  • Suiyao Chen
  • Nan Kong
  • Xuxue Sun
  • Hongdao Meng
  • Mingyang Li (corresponding author)


Hospital readmission risk modeling is of great interest to both hospital administrators and health care policy makers as a means of reducing preventable readmissions and improving the quality of care. To serve both groups of stakeholders, a readmission risk model should (i) exhibit superior prediction performance; (ii) identify risk factors that help target the most at-risk individuals; and (iii) construct composite metrics for evaluating multiple hospitals, hospital networks, and geographic regions. Existing work has mainly addressed the first two features; the third is challenging because available medical data are fragmented across hospitals. To address all three features simultaneously, this paper proposes readmission risk models that incorporate latent heterogeneity and take advantage of administrative claims data, which are less fragmented and cover larger patient cohorts. Different levels of latent heterogeneity are considered in order to quantify the effects of unobserved factors, provide composite measures for performance evaluation at various aggregate levels, and compensate for the less informative nature of claims data. To demonstrate the prediction performance of the proposed models, a real case study is conducted on a state-wide heart failure patient cohort. A systematic comparison study is then carried out to evaluate the performance of 49 risk models and their variants.
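The abstract does not spell out how latent heterogeneity enters the models. As a rough illustration only, one common way to represent it in time-to-event modeling is a multiplicative hospital-level random effect (a "frailty") on each patient's hazard. The sketch below assumes an exponential baseline hazard, a single binary patient covariate, and hand-picked frailty values; the function names `simulate_cohort` and `hospital_rate_ratios` and all parameter values are hypothetical and are not taken from the paper. It shows how an unobserved hospital effect can be summarized as an aggregate-level composite measure, here the ratio of each hospital's crude readmission rate to the pooled rate.

```python
import math
import random


def simulate_cohort(frailties, patients_per_hospital=2000, baseline_rate=0.005,
                    beta=0.7, horizon_days=365.0, seed=0):
    """Simulate time-to-readmission under an assumed frailty model.

    Hazard for a patient in hospital h: z_h * baseline_rate * exp(beta * x),
    where z_h is the unobserved hospital effect and x is a binary covariate.
    Follow-up is censored at horizon_days.
    """
    rng = random.Random(seed)
    records = []
    for hospital, z in enumerate(frailties):
        for _ in range(patients_per_hospital):
            x = 1 if rng.random() < 0.5 else 0           # observed risk factor
            rate = z * baseline_rate * math.exp(beta * x)
            t = rng.expovariate(rate)                    # exponential event time
            readmitted = t <= horizon_days               # False means censored
            records.append((hospital, min(t, horizon_days), readmitted))
    return records


def hospital_rate_ratios(records, n_hospitals):
    """Composite measure: hospital crude rate (events / follow-up days)
    divided by the pooled crude rate across all hospitals."""
    events = [0] * n_hospitals
    exposure = [0.0] * n_hospitals
    for hospital, followup, readmitted in records:
        events[hospital] += int(readmitted)
        exposure[hospital] += followup
    pooled = sum(events) / sum(exposure)
    return [e / d / pooled for e, d in zip(events, exposure)]


# Three hypothetical hospitals with low, average, and high latent risk.
records = simulate_cohort([0.5, 1.0, 2.0])
ratios = hospital_rate_ratios(records, 3)
print([round(r, 2) for r in ratios])
```

A ratio above 1 flags a hospital whose patients are readmitted faster than the pooled cohort. Because the crude ratio ignores differences in patient mix, a fitted model would estimate the hospital effect jointly with the covariate effects rather than from raw rates.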


Hospital readmission; Latent heterogeneity; Predictive modeling; Aggregate-level performance; Administrative claims data



This work was supported in part by University of South Florida Research & Innovation Internal Awards Program under Grant No. 0114783.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Suiyao Chen (1)
  • Nan Kong (2)
  • Xuxue Sun (1)
  • Hongdao Meng (3)
  • Mingyang Li (1, corresponding author)
  1. Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, USA
  2. Weldon School of Biomedical Engineering, Purdue University, West Lafayette, USA
  3. School of Aging Studies, University of South Florida, Tampa, USA
