Abstract
Hospital readmission risk modeling is of great interest to both hospital administrators and health care policy makers seeking to reduce preventable readmissions and improve the quality of care services. To meet the needs of both stakeholders, a readmission risk model should (i) achieve superior predictive performance; (ii) identify risk factors that help target the most at-risk individuals; and (iii) provide composite metrics for evaluating hospitals, hospital networks, and geographic regions. Existing work has mainly addressed the first two features; the third is difficult to address because available medical data are fragmented across hospitals. To address all three features simultaneously, this paper proposes readmission risk models that incorporate latent heterogeneity and draws on administrative claims data, which are less fragmented and cover larger patient cohorts. Different levels of latent heterogeneity are considered in order to quantify the effects of unobserved factors, provide composite measures for performance evaluation at various aggregate levels, and compensate for the less informative nature of claims data. To demonstrate the predictive performance of the proposed models, a case study is conducted on a state-wide cohort of heart failure patients. A systematic comparison study is then carried out to evaluate the performance of 49 risk models and their variants.
References
Jencks SF, Williams MV, Coleman EA (2009) Rehospitalizations among Patients in the Medicare Fee-for-Service Program. N Engl J Med 360:1418–1428
Shams I, Ajorlou S, Yang K (2015) A predictive analytics approach to reducing 30-day avoidable readmissions among patients with heart failure, acute myocardial infarction, pneumonia, or COPD. Health Care Manag Sci 18:19–34
Centers for Medicare and Medicaid Services (CMS) (2013) Medicare and Medicaid Statistical Supplement. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Archives/MMSS/2013.html
Gu Q, Koenig L, Faerberg J et al (2014) The Medicare hospital readmissions reduction program: potential unintended consequences for hospitals serving vulnerable populations. Health Serv Res 49:818–837
Barrett ML, Wier LM, Jiang J, Steiner CA (2015) All-cause readmissions by payer and age, 2009–2013: table 2. HCUP Stat Br #199 166:1–14
Council FL (2017) Demystifying hospital readmissions penalties commonly asked questions from hospital CFOs. Advis Board Co 1–8
Zheng B, Zhang J, Yoon SW et al (2015) Predictive modeling of hospital readmissions using metaheuristics and data mining. Expert Syst Appl 42:7110–7120
Betihavas V, Davidson PM, Newton PJ et al (2012) What are the factors in risk prediction models for rehospitalisation for adults with chronic heart failure? Aust Crit Care 25:31–40
Nijhawan AE, Kitchell E, Etherton SS et al (2015) Half of 30-Day Hospital Readmissions Among HIV-Infected Patients Are Potentially Preventable. AIDS Patient Care STDS 29:465–473
Tran T, Luo W, Phung D et al (2014) A framework for feature extraction from hospital medical data with applications in risk prediction. BMC Bioinforma 15:65–96
Maddipatla RM, Hadzikadic M, Misra DP, Yao L (2015) 30-day hospital readmission analysis. In: Proceedings of the 2015 IEEE International Conference on Big Data (IEEE Big Data 2015). IEEE, pp 2922–2924
Shulan M, Gao K, Moore CD (2013) Predicting 30-day all-cause hospital readmissions. Health Care Manag Sci 16:167–175
Futoma J, Morris J, Lucas J (2015) A comparison of models for predicting early hospital readmissions. J Biomed Inform 56:229–238
Ross JS, Mulvey GK, Stauffer B, Patlolla V, Bernheim SM, Keenan PS, Krumholz HM (2008) Statistical models and patient predictors of readmission for heart failure: a systematic review. Arch Intern Med 168:1371–1386
Wan H, Zhang L, Witz S et al (2016) A literature review of preventable hospital readmissions: preceding the readmissions reduction act. IIE Trans Healthc Syst Eng 6:193–211
Kansagara D, Englander H, Salanitro A et al (2011) Risk prediction models for hospital readmission. JAMA 306:1688–1698
McGinnis JM, Olsen L, Goolsby WA, Grossmann C (eds) (2011) Clinical data as the basic staple of health learning. National Academies Press, Washington, D.C.
Houchens RL, Ross DN, Elixhauser A, Jiang J (2014) U.S. Agency for Healthcare Research and Quality. HCUP NIS Related Reports ONLINE. Nationwide Inpatient Sample Redesign Final Report. http://www.hcup-us.ahrq.gov/db/nation/nis/nisrelatedreports.jsp
He D, Mathews SC, Kalloo AN, Hutfless S (2014) Mining high-dimensional administrative claims data to predict early hospital readmissions. J Am Med Informatics Assoc 21:272–279
Wallmann R, Llorca J, Gómez-Acebo I et al (2013) Prediction of 30-day cardiac-related-emergency-readmissions using simple administrative hospital data. Int J Cardiol 164:193–200
Chin DL, Bang H, Manickam RN, Romano PS (2016) Rethinking thirty-day hospital readmissions: shorter intervals might be better indicators of quality of care. Health Aff 35:1867–1875
Helm JE, Alaeddini A, Stauffer JM et al (2016) Reducing hospital readmissions by integrating empirical prediction with resource optimization. Prod Oper Manag 25:233–257
Lin CH, Lin SC, Chen MC, Wang SY (2006) Comparison of time to rehospitalization among schizophrenic patients discharged on typical antipsychotics, clozapine or risperidone. J Chinese Med Assoc 69:264–269
Lin CH, Lin KS, Lin CY et al (2008) Time to rehospitalization in patients with major depressive disorder taking venlafaxine or fluoxetine. J Clin Psychiatry 69:54–59
Omurlu IK, Ture M, Tokatli F (2009) The comparisons of random survival forests and Cox regression analysis with simulation and an application related to breast cancer. Expert Syst Appl 36:8582–8588
Ture M, Tokatli F, Kurt I (2009) Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients. Expert Syst Appl 36:2017–2026
Miller RG Jr (2011) Survival analysis, vol 66. Wiley
Li M, Hu Q, Liu J (2014) Proportional hazard modeling for hierarchical systems with multi-level information aggregation. IIE Trans 46:149–163
Cox DR (1992) Regression models and life-tables. In: Kotz S, Johnson NL (eds) Breakthroughs in statistics. Springer, New York, pp 527–541
Shapiro SP (2005) Agency theory. Annu Rev Sociol 31:263–284
Kiser E (1999) Comparing varieties of agency theory in economics, political science, and sociology: an illustration from state policy implementation. Sociol Theory 17:146–170
Eisenhardt KM (1989) Agency theory: an assessment and review. Acad Manag Rev 14:57–74
Anwar AM (2016) Presenting traveller preference heterogeneity in the context of agency theory: understanding and minimising the agency problem. Urban Plan Transp Res 4:26–45
Anwar AHMM, Tieu K, Gibson P et al (2014) Analysing the heterogeneity of traveller mode choice preference using a random parameter logit model from the perspective of principal-agent theory. Int J Logist Syst Manag 17:447–471
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Nielsen GG, Gill RD, Andersen PK, Sorensen TIA (1992) A counting process approach to maximum likelihood estimation in frailty models. Scand J Stat 19:25–43
Cortiñas Abrahantes J, Burzykowski T (2005) A version of the EM algorithm for proportional hazard model with random effects. Biometrical J 47:847–862
Harrell FE (2015) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer
Carlin BP, Louis TA (2000) Bayes and empirical Bayes methods for data analysis. Chapman & Hall/CRC, Boca Raton
Vaida F, Xu R (2000) Proportional hazards model with random effects. Stat Med 19:3309–3324
Klein JP (1992) Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 48:795–806
Zhang HH, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94:691–703
Wu Y (2012) Elastic net for Cox’s proportional hazards model with a solution path algorithm. Stat Sin 22:271–294
Schnedler W (2005) Likelihood estimation for censored random vectors. Econom Rev 24:195–217
Hastie TJ, Tibshirani RJ, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer
Harrell FE, Califf RM, Pryor DB et al (1982) Evaluating the yield of medical tests. JAMA 247:2543–2546
Kremers WK (2007) Concordance for survival time data: fixed and time-dependent covariates and possible ties in predictor and time. Mayo Foundation. https://www.semanticscholar.org/paper/Concordance-for-Survival-Time-Data-Fixed-and-Time-Kremers-Liebig/06ad5dc66f40f1f2a7be3cb068bbd619ce06e3d4
Uno H, Cai T, Pencina MJ et al (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30:1105–1117
Schmid M, Wright MN, Ziegler A (2016) On the use of Harrell’s C for clinical risk prediction via random survival forests. Expert Syst Appl 63:450–459
Collins TC, Daley J, Henderson WH, Khuri SF (1999) Risk factors for prolonged length of stay after major elective surgery. Ann Surg 230:251–259
Pencina MJ, D’Agostino RB, Song L (2012) Quantifying discrimination of Framingham risk functions with different survival C statistics. Stat Med 31:1543–1553
Hosmer DW, Lemeshow S (2000) Applied logistic regression. Wiley Series in Probability and Statistics. Wiley
Zhu K, Lou Z, Zhou J et al (2015) Predicting 30-day hospital readmission with publicly available administrative database: a conditional logistic regression modeling approach. Methods Inf Med 54:560–567
Fingar K, Washington R (2015) Trends in hospital readmissions for four high-volume conditions, 2009–2013: statistical brief #196. HCUP Stat Br #196 1–17
Silverstein MD, Qin H, Mercer SQ et al (2008) Risk factors for 30-day hospital readmission in patients ≥65 years of age. Proc (Bayl Univ Med Cent) 21:363–372
HCUP State Inpatient Databases (SID). Healthcare Cost and Utilization Project (HCUP). (2009–2011). Agency for Healthcare Research and Quality, Rockville, MD. http://www.hcup-us.ahrq.gov/sidoverview.jsp
Schmutte T, Dunn CL, Sledge WH (2010) Predicting time to readmission in patients with recent histories of recurrent psychiatric hospitalization. J Nerv Ment Dis 198:860–863
van Walraven C, Dhalla IA, Bell C et al (2010) Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ 182:551–557
Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Addison Wesley
Fung GM, Mangasarian OL (2004) A feature selection Newton method for support vector machine classification. Comput Optim Appl 28:185–202
Kruse RL, Hays HD, Madsen RW et al (2013) Risk factors for all-cause hospital readmission within 30 days of hospital discharge. J Clin Outcomes Manag 21:203–214
García-Pérez L, Linertová R, Lorenzo-Riera A et al (2011) Risk factors for hospital readmissions in elderly patients: a systematic review. QJM 104:639–651
Acknowledgments
This work was supported in part by University of South Florida Research & Innovation Internal Awards Program under Grant No. 0114783.
Appendix: E-M algorithm derivations
Maximizing Eq. 3 involves two steps that are repeated iteratively: the expectation step (E-step) and the maximization step (M-step). In the E-step of iteration r, the conditional expectation of Eq. 3 with respect to the latent variables γ, given the data D and the current estimates, is computed, i.e., \(\text {Q}(\boldsymbol {\beta },\lambda _{0}(t),\boldsymbol {{\gamma }})=\text {E}_{\boldsymbol {{\gamma }}|\mathbf {D},\boldsymbol {\beta }^{(r)},\lambda _{0}(t)^{(r)}}(\text {L}(\boldsymbol {\beta },\lambda _{0}(t)|\mathbf {D},\boldsymbol {{\gamma }}))\), which can be explicitly expressed as
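where \(\boldsymbol{\beta}^{(r)}\) and \(\lambda_{0}(t)^{(r)}\) are the values of β and λ0(t) updated at iteration r. The two parts of this expression, \(\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))\) and \(\text{Q}_{2}(\boldsymbol{\gamma})\), can be expressed as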
and
To evaluate \(\text{Q}(\boldsymbol{\beta},\lambda_{0}(t),\boldsymbol{\gamma})\), the conditional expectations \(\text {E}(g({\gamma }_{j})|\mathbf {D},\boldsymbol {\beta }^{(r)},\lambda _{0}(t)^{(r)}) = \int g({\gamma }_{j})p({\gamma }_{j}|\mathbf {D},\boldsymbol {\beta }^{(r)},\lambda _{0}(t)^{(r)})d{\gamma }_{j}\) need to be computed for \(g(\gamma_{j}) \in \{\gamma_{j}, \exp(\gamma_{j}), \log p(\gamma_{j})\}\). These integrals can be evaluated numerically by Monte Carlo simulation based on the conditional distribution \(p(\gamma_{j}|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\) [39].
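As an illustration of the Monte Carlo E-step, the following Python sketch makes two assumptions that are not taken from Eq. 3: a shared log-frailty model in which subject i in cluster j has hazard \(\lambda_{0}(t)\exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}+\gamma_{j})\), and a normal latent variable \(\gamma_{j}\sim N(0,\sigma^{2})\) (i.e., a lognormal frailty). Under these assumptions the conditional density is proportional to \(\exp(d_{j}\gamma_{j}-A_{j}e^{\gamma_{j}})\,p(\gamma_{j})\), where \(d_{j}\) is the number of observed readmissions in cluster j and \(A_{j}=\sum_{i}\Lambda_{0}(t_{ij})\exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta})\). The hypothetical function `mc_estep` draws from the prior and reweights the draws (self-normalized importance sampling). This is only one simple way to carry out the simulation, not necessarily the authors' scheme.

```python
import numpy as np


def mc_estep(d_j, A_j, sigma2, n_draws=20000, rng=None):
    """Monte Carlo E-step for one cluster j (illustrative sketch).

    Assumes hazard lambda0(t) * exp(x_ij' beta + gamma_j) and gamma_j ~ N(0, sigma2).
    Up to a constant, p(gamma_j | D) is proportional to
        exp(d_j * gamma_j - A_j * exp(gamma_j)) * N(gamma_j; 0, sigma2),
    with d_j = observed readmissions in cluster j and
    A_j = sum_i Lambda0(t_ij) * exp(x_ij' beta).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Draw from the prior N(0, sigma2) and use it as the importance-sampling proposal.
    gamma = rng.normal(0.0, np.sqrt(sigma2), size=n_draws)
    # Log importance weights = cluster-j likelihood contribution as a function of gamma.
    log_w = d_j * gamma - A_j * np.exp(gamma)
    w = np.exp(log_w - log_w.max())  # subtract the max before exponentiating for stability
    w /= w.sum()                     # self-normalize the weights
    # log p(gamma) under the assumed normal prior.
    log_prior = -0.5 * np.log(2.0 * np.pi * sigma2) - gamma**2 / (2.0 * sigma2)
    return {
        "E_gamma": float(np.sum(w * gamma)),              # g = gamma_j
        "E_exp_gamma": float(np.sum(w * np.exp(gamma))),  # g = exp(gamma_j)
        "E_log_p": float(np.sum(w * log_prior)),          # g = log p(gamma_j)
    }
```

The three returned values correspond to the three choices of g(γj). A cluster with many readmissions relative to its cumulative risk \(A_{j}\) pulls the weighted draws toward larger γj, so its expected latent effect becomes positive.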
In the M-step of iteration r, the coefficient vector β and the baseline readmission rate function λ0(t) are updated by maximizing \(\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))\). When λ0(t) is modeled parametrically (e.g., Weibull or exponential) with parameters \(\boldsymbol{\theta}_{\lambda}\), \(\text{Q}_{1}(\boldsymbol{\beta},\boldsymbol{\theta}_{\lambda})\) can be maximized directly with numerical optimization methods such as the Newton-Raphson method, which is available in many computing packages and routines. When λ0(t) is modeled non-parametrically with the Cox baseline specification, directly maximizing \(\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))\) is not straightforward, because λ0(t) is then an unspecified function rather than a finite set of parameters. Instead, a partial likelihood approach is used: λ0(t) is treated as a nuisance parameter, and β is updated by maximizing the partial likelihood function \(\text{Q}_{1}^{\prime}(\boldsymbol{\beta})\), which is explicitly given by
Equation 9 has the form of the partial log-likelihood of the Cox proportional hazards model and can therefore be maximized with readily available software for estimating Cox models [40]. Maximizing \(\text {Q}_{1}^{\prime }(\boldsymbol {\beta })\) yields the updated coefficients \(\boldsymbol{\beta}^{(r+1)}\). The baseline rate function \(\lambda_{0}(t)^{(r+1)}\) and the corresponding cumulative baseline function \(\Lambda_{0}(t)^{(r+1)}\) are then calculated non-parametrically at iteration r with the Nelson-Aalen estimator [41].
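To make the M-step concrete, assume (as an illustration, not as a transcription of Eq. 9) that the hazard of subject i in cluster j is \(\lambda_{0}(t)\exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}+\gamma_{j})\). Profiling λ0(t) out of \(\text{Q}_{1}\) then gives a Cox partial log-likelihood in which every subject in cluster j carries a fixed offset \(o_{j}=\log \text{E}(\exp(\gamma_{j})|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\):
\[\text{Q}_{1}^{\prime}(\boldsymbol{\beta})=\sum_{j}\sum_{i}\delta_{ij}\Big[\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}-\log\sum_{(k,l):\,t_{kl}\ge t_{ij}}\exp(\mathbf{x}_{kl}^{\top}\boldsymbol{\beta}+o_{l})\Big],\]
where \(\delta_{ij}\) is the readmission indicator. The Python sketch below uses the hypothetical functions `cox_offset_newton` and `breslow_cumhaz`. It maximizes this expression by Newton-Raphson, using the Breslow convention for tied readmission times, and then computes a Breslow-type cumulative baseline function \(\Lambda_{0}(t)=\sum_{t_{(m)}\le t} d_{(m)}\big/\sum_{(k,l):\,t_{kl}\ge t_{(m)}}\exp(\mathbf{x}_{kl}^{\top}\boldsymbol{\beta}+o_{l})\). This estimator reduces to the Nelson-Aalen estimator when β = 0 and all offsets are zero. In practice, an established Cox routine that accepts an offset term would normally be used instead.

```python
import numpy as np


def cox_offset_newton(times, events, X, offset, n_iter=50, tol=1e-10):
    """Maximize a Cox partial log-likelihood with a fixed offset (illustrative sketch).

    Ties use the Breslow convention: every subject with time >= t_i is in the
    risk set of event time t_i.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    X = np.asarray(X, dtype=float).reshape(len(times), -1)
    offset = np.asarray(offset, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        risk = np.exp(X @ beta + offset)       # exp(x'beta + o) for every subject
        grad = np.zeros_like(beta)
        hess = np.zeros((X.shape[1], X.shape[1]))
        for i in np.flatnonzero(events):        # loop over observed readmissions
            at_risk = times >= times[i]
            w, Xr = risk[at_risk], X[at_risk]
            s0 = w.sum()
            xbar = (w @ Xr) / s0                # risk-weighted covariate mean
            grad += X[i] - xbar
            hess -= (Xr.T * w) @ Xr / s0 - np.outer(xbar, xbar)
        step = np.linalg.solve(hess, grad)      # Newton-Raphson: beta <- beta - H^{-1} g
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def breslow_cumhaz(times, events, X, offset, beta):
    """Breslow-type cumulative baseline function at the distinct event times."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    X = np.asarray(X, dtype=float).reshape(len(times), -1)
    risk = np.exp(X @ np.asarray(beta, dtype=float) + np.asarray(offset, dtype=float))
    event_times = np.unique(times[events == 1])
    # Increment at each event time: number of events / total risk in the risk set.
    increments = [events[times == t].sum() / risk[times >= t].sum() for t in event_times]
    return event_times, np.cumsum(increments)
```

Because the offset enters exactly like a covariate with a fixed coefficient of 1, the latent-variable information from the E-step changes only the risk-set weights and does not require a separate estimation routine.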
Because the E-M method treats the latent variables as augmented (missing) data, the latent variables γj can be estimated directly by maximizing \(\text{Q}_{2}(\boldsymbol{\gamma})\). Different distributions, such as the gamma and lognormal distributions, can be assumed for the γj. For instance, when a gamma distribution is assumed, each latent variable \({\gamma }_{j}^{(r + 1)}\) at iteration r can be updated as
where \(\Lambda_{0}(t)^{(r+1)}\) and \((\sigma^{2})^{(r)}\) are the updated cumulative baseline function and the variance of the latent variables, respectively.
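For the gamma case, a standard conjugacy result can be sketched as follows. It assumes that the latent variable acts multiplicatively on the hazard, \(\lambda_{0}(t)\gamma_{j}\exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta})\), and that \(\gamma_{j}\) follows a gamma distribution with mean 1 and variance \(\sigma^{2}\) (shape and rate both equal to \(1/\sigma^{2}\)). Under these assumptions the conditional distribution of \(\gamma_{j}\) is again gamma, with shape \(1/\sigma^{2}+d_{j}\) and rate \(1/\sigma^{2}+\sum_{i}\Lambda_{0}(t_{ij})\exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta})\). Its mean is
\[\gamma_{j}^{(r+1)}=\frac{1/(\sigma^{2})^{(r)}+d_{j}}{1/(\sigma^{2})^{(r)}+\sum_{i}\Lambda_{0}(t_{ij})^{(r+1)}\exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta}^{(r+1)})}.\]
Whether this matches the paper's exact update is an assumption. The function name `gamma_frailty_update` and its arguments are hypothetical.

```python
import numpy as np


def gamma_frailty_update(cluster, events, cum_baseline, lin_pred, sigma2):
    """Posterior-mean update of shared gamma latent variables (illustrative sketch).

    cluster      : cluster label j of each admission (e.g., hospital or region)
    events       : readmission indicator delta_ij (1 = readmitted, 0 = censored)
    cum_baseline : Lambda0(t_ij) evaluated at each subject's observed time
    lin_pred     : x_ij' beta for each subject
    sigma2       : current variance of the gamma latent variables (mean fixed at 1)
    """
    cluster = np.asarray(cluster)
    events = np.asarray(events, dtype=float)
    # Expected readmission count contributed by each subject when gamma_j = 1.
    risk = np.asarray(cum_baseline, dtype=float) * np.exp(np.asarray(lin_pred, dtype=float))
    ids = np.unique(cluster)
    d = np.array([events[cluster == j].sum() for j in ids])  # observed readmissions d_j
    A = np.array([risk[cluster == j].sum() for j in ids])    # cumulative risk of cluster j
    shape = 1.0 / sigma2 + d                                   # conditional gamma shape
    rate = 1.0 / sigma2 + A                                    # conditional gamma rate
    return ids, shape / rate                                   # conditional means
```

A cluster with more readmissions than its cumulative risk predicts (\(d_{j}\) greater than its summed risk) receives an estimate above 1. The estimate shrinks toward 1 as \(\sigma^{2}\) decreases, so small clusters are not assigned extreme values on the basis of a few events.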
Cite this article
Chen, S., Kong, N., Sun, X. et al. Claims data-driven modeling of hospital time-to-readmission risk with latent heterogeneity. Health Care Manag Sci 22, 156–179 (2019). https://doi.org/10.1007/s10729-018-9431-0