Claims data-driven modeling of hospital time-to-readmission risk with latent heterogeneity

Health Care Management Science

Abstract

Hospital readmission risk modeling is of great interest to both hospital administrators and health care policy makers as a means of reducing preventable readmissions and improving the quality of care. To serve both groups of stakeholders, a readmission risk model should (i) exhibit superior prediction performance; (ii) identify risk factors that help target the most at-risk individuals; and (iii) construct composite metrics to evaluate multiple hospitals, hospital networks, and geographic regions. Existing work has mainly addressed the first two features; the third is challenging because available medical data are fragmented across hospitals. To address all three features simultaneously, this paper proposes readmission risk models that incorporate latent heterogeneity and take advantage of administrative claims data, which are less fragmented and cover larger patient cohorts. Different levels of latent heterogeneity are considered to quantify the effects of unobserved factors, to provide composite measures for performance evaluation at various aggregate levels, and to compensate for the less informative claims data. To demonstrate the prediction performance of the proposed models, a real case study is conducted on a state-wide heart failure patient cohort. A systematic comparison study is then carried out to evaluate the performance of 49 risk models and their variants.

References

  1. Jencks S F, Williams M V, Coleman E A (2009) Rehospitalizations among Patients in the Medicare Fee-for-Service Program. N Engl J Med 360:1418–1428

  2. Shams I, Ajorlou S, Yang K (2015) A predictive analytics approach to reducing 30-day avoidable readmissions among patients with heart failure, acute myocardial infarction, pneumonia, or COPD. Health Care Manag Sci 18:19–34

  3. Centers for Medicare and Medicaid Services (CMS). Medicare and Medicaid Statistical Supplement. (2013). https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Archives/MMSS/2013.html

  4. Gu Q, Koenig L, Faerberg J et al (2014) The Medicare hospital readmissions reduction program: potential unintended consequences for hospitals serving vulnerable populations. Health Serv Res 49:818–837

  5. Barrett M L, Wier L M, Jiang J, Steiner C A (2015) All-cause readmissions by payer and age, 2009-2013: table 2. HCUP Stat Br #199 166:1–14

  6. Council FL (2017) Demystifying hospital readmissions penalties commonly asked questions from hospital CFOs. Advis Board Co 1–8

  7. Zheng B, Zhang J, Yoon S W et al (2015) Predictive modeling of hospital readmissions using metaheuristics and data mining. Expert Syst Appl 42:7110–7120

  8. Betihavas V, Davidson P M, Newton P J et al (2012) What are the factors in risk prediction models for rehospitalisation for adults with chronic heart failure? Aust Crit Care 25:31–40

  9. Nijhawan A E, Kitchell E, Etherton S S et al (2015) Half of 30-Day Hospital Readmissions Among HIV-Infected Patients Are Potentially Preventable. AIDS Patient Care STDS 29:465–473

  10. Tran T, Luo W, Phung D et al (2014) A framework for feature extraction from hospital medical data with applications in risk prediction. BMC Bioinforma 15:65–96

  11. Maddipatla R M, Hadzikadic M, Misra D P, Yao L (2015) 30 Day hospital readmission analysis. In: Proc. - 2015 IEEE int. conf. big data, IEEE big data 2015. IEEE, pp 2922–2924

  12. Shulan M, Gao K, Moore C D (2013) Predicting 30-day all-cause hospital readmissions. Health Care Manag Sci 16:167–175

  13. Futoma J, Morris J, Lucas J (2015) A comparison of models for predicting early hospital readmissions. J Biomed Inform 56:229–238

  14. Ross J S, Mulvey G K, Stauffer B, Patlolla V, Bernheim S M, Keenan P S, Krumholz H M (2008) Statistical models and patient predictors of readmission for heart failure: a systematic review. Arch Intern Med 168:1371–1386

  15. Wan H, Zhang L, Witz S et al (2016) A literature review of preventable hospital readmissions: preceding the readmissions reduction act. IIE Trans Healthc Syst Eng 6:193–211

  16. Kansagara D, Englander H, Salanitro A et al (2011) Risk prediction models for hospital readmission. JAMA 306:1688–1698

  17. McGinnis J M, Olsen L, Goolsby WA, Grossmann C (eds) (2011) Clinical data as the basic staple of health learning. National Academies Press, Washington, D.C.

  18. Houchens R L, Ross DN, Elixhauser A, Jiang J (2014) U.S. Agency for Healthcare Research and Quality. HCUP NIS Related Reports ONLINE. Nationwide Inpatient Sample Redesign Final Report. http://www.hcup-us.ahrq.gov/db/nation/nis/nisrelatedreports.jsp

  19. He D, Mathews S C, Kalloo A N, Hutfless S (2014) Mining high-dimensional administrative claims data to predict early hospital readmissions. J Am Med Informatics Assoc 21:272–279

  20. Wallmann R, Llorca J, Gómez-Acebo I et al (2013) Prediction of 30-day cardiac-related-emergency-readmissions using simple administrative hospital data. Int J Cardiol 164:193–200

  21. Chin D L, Bang H, Manickam R N, Romano P S (2016) Rethinking thirty-day hospital readmissions: shorter intervals might be better indicators of quality of care. Health Aff 35:1867–1875

  22. Helm J E, Alaeddini A, Stauffer J M et al (2016) Reducing hospital readmissions by integrating empirical prediction with resource optimization. Prod Oper Manag 25:233–257

  23. Lin C H, Lin S C, Chen M C, Wang S Y (2006) Comparison of time to rehospitalization among schizophrenic patients discharged on typical antipsychotics, clozapine or risperidone. J Chinese Med Assoc 69:264–269

  24. Lin C H, Lin K S, Lin C Y et al (2008) Time to rehospitalization in patients with major depressive disorder taking venlafaxine or fluoxetine. J Clin Psychiatry 69:54–59

  25. Omurlu I K, Ture M, Tokatli F (2009) The comparisons of random survival forests and Cox regression analysis with simulation and an application related to breast cancer. Expert Syst Appl 36:8582–8588

  26. Ture M, Tokatli F, Kurt I (2009) Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients. Expert Syst Appl 36:2017–2026

  27. Miller RG Jr (2011) Survival analysis, vol 66. Wiley

  28. Li M, Hu Q, Liu J (2014) Proportional hazard modeling for hierarchical systems with multi-level information aggregation. IIE Trans 46:149–163

  29. Cox D R, Johnson NI (1992) Regression models and life-tables. In: Kotz S (ed) Breakthrough in statistics. Springer, New York, pp 527–541

  30. Shapiro S P (2005) Agency theory. Annu Rev Sociol 31:263–284

  31. Kiser E (1999) Comparing varieties of agency theory in economics, political science, and sociology: an illustration from state policy implementation. Sociol Theory 17:146–170

  32. Eisenhardt K M (1989) Agency theory: an assessment and review. Acad Manag Rev 14:57–74

  33. Anwar A M (2016) Presenting traveller preference heterogeneity in the context of agency theory: understanding and minimising the agency problem. Urban Plan Transp Res 4:26–45

  34. Anwar A H M M, Tieu K, Gibson P et al (2014) Analysing the heterogeneity of traveller mode choice preference using a random parameter logit model from the perspective of principal-agent theory. Int J Logist Syst Manag 17:447–71

  35. Dempster A P, Laird N M, Rubin D B (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38

  36. Nielsen G G, Gill R D, Andersen P K, Sorensen T I A (1992) A counting process approach to maximum likelihood estimation in frailty models. Scand J Stat 19:25–43

  37. Cortiñas Abrahantes J, Burzykowski T (2005) A version of the EM algorithm for proportional hazard model with random effects. Biometrical J 47:847–862

  38. Harrell FE (2015) Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. Springer

  39. Carlin BP, Louis TA (2000) Bayes and empirical Bayes methods for data analysis. Chapman & Hall/CRC, Boca Raton

  40. Vaida F, Xu R (2000) Proportional hazards model with random effects. Stat Med 19:3309–3324

  41. Klein J P (1992) Semiparametric estimation of random effects using the cox model based on the EM algorithm. Biometrics 48:795–806

  42. Zhang H H, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94:691–703

  43. Wu Y (2012) Elastic net for Cox’s proportional hazards model with a solution path algorithm. Stat Sin 22:27–294

  44. Schnedler W (2005) Likelihood estimation for censored random vectors. Econom Rev 24:195–217

  45. Hastie TJ, Tibshirani RJ, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer

  46. Harrell F E, Califf R M, Pryor D B et al (1982) Evaluating the yield of medical tests. JAMA 247:2543–2546

  47. Kremers WK (2007) Concordance for survival time data: fixed and time-dependent covariates and possible ties in predictor and time. Mayo Foundation. https://www.semanticscholar.org/paper/Concordance-for-Survival-Time-Data-Fixed-and-Time-Kremers-Liebig/06ad5dc66f40f1f2a7be3cb068bbd619ce06e3d4

  48. Uno H, Cai T, Pencina M J et al (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30:1105–1117

  49. Schmid M, Wright M N, Ziegler A (2016) On the use of Harrell’s C for clinical risk prediction via random survival forests. Expert Syst Appl 63:450–459

  50. Collins T C, Daley J, Henderson W H, Khuri S F (1999) Risk factors for prolonged length of stay after major elective surgery. Ann Surg 230:251–259

  51. Pencina M J, D’Agostino R B, Song L (2012) Quantifying discrimination of Framingham risk functions with different survival C statistics. Stat Med 31:1543–1553

  52. Hosmer DW, Lemeshow S (2000) Applied logistic regression. Wiley Ser Probab Statistics

  53. Zhu K, Lou Z, Zhou J et al (2015) Predicting 30-day hospital readmission with publicly available administrative database: a conditional logistic regression modeling approach. Methods Inf Med 54:560–567

  54. Fingar K, Washington R (2006) Trends in hospital readmissions for four high-volume conditions, 2009-2013: statistical brief #196. HCUP Stat Br #196 1–17

  55. Silverstein M D, Qin H, Mercer S Q et al (2008) Risk factors for 30-day hospital readmission in patients ≥ 65 years of age. Proc (Bayl Univ Med Cent) 21:363–372

  56. HCUP State Inpatient Databases (SID). Healthcare Cost and Utilization Project (HCUP). (2009-2011). Agency for Healthcare Research and Quality, Rockville, MD. http://www.hcup-us.ahrq.gov/sidoverview.jsp

  57. Schmutte T, Dunn C L, Sledge W H (2010) Predicting time to readmission in patients with recent histories of recurrent psychiatric hospitalization. J Nerv Ment Dis 198:860–863

  58. Van Walraven C, Dhalla I A, Bell C et al (2010) Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ 182:551–557

  59. Tan P N, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Addison Wesley

  60. Fung G M, Mangasarian O L (2004) A feature selection Newton method for support vector machine classification. Comput Optim Appl 28:185–202

  61. Kruse R L, Hays H D, Madsen R W et al (2013) Risk factors for all-cause hospital readmission within 30 days of hospital discharge. J Clin Outcomes Manag 21:203–214

  62. García-Pérez L, Linertová R, Lorenzo-Riera A et al (2011) Risk factors for hospital readmissions in elderly patients: a systematic review. QJM 104:639–651

Acknowledgments

This work was supported in part by University of South Florida Research & Innovation Internal Awards Program under Grant No. 0114783.

Corresponding author

Correspondence to Mingyang Li.

Appendix: E-M algorithms derivations

Maximizing Eq. 3 consists of two steps repeated iteratively, namely the expectation step (E-step) and the maximization step (M-step). In the E-step of iteration r, the conditional expectation of Eq. 3 is computed, i.e., \(\text{Q}(\boldsymbol{\beta},\lambda_{0}(t),\boldsymbol{\gamma})=\text{E}_{\boldsymbol{\gamma}|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}}(\text{L}(\boldsymbol{\beta},\lambda_{0}(t)|\mathbf{D},\boldsymbol{\gamma}))\), which can be explicitly expressed as

$$ \text{Q}(\boldsymbol{\beta},\lambda_{0}(t),\boldsymbol{{\gamma}}) =\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))+\text{Q}_{2}(\boldsymbol{{\gamma}}), $$
(6)

where \(\boldsymbol{\beta}^{(r)}\) and \(\lambda_{0}(t)^{(r)}\) are the updated values of \(\boldsymbol{\beta}\) and \(\lambda_{0}(t)\) at iteration r. The two separate parts can be expressed as

$$\begin{array}{@{}rcl@{}} \text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t)) &=& \sum_{j=1}^{N_{\text{J}}}\sum_{i=1}^{n_{j}}\big[\delta_{ij}\{\log\lambda_{0}(t_{ij})+\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij}+\text{E}(\gamma_{j}|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\} \\ && -\,\Lambda_{0}(t_{ij})\exp(\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij}+\log\text{E}(\exp(\gamma_{j})|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}))\big] \end{array} $$
(7)

and

$$ \text{Q}_{2}(\boldsymbol{{\gamma}})=\sum_{j = 1}^{N_{\text{J}}}\text{E}(\log(p({\gamma}_{j}))|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}). $$
(8)

To evaluate \(\text{Q}(\boldsymbol{\beta},\lambda_{0}(t),\boldsymbol{\gamma})\), the conditional expectation \(\text{E}(g(\gamma_{j})|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}) = \int g(\gamma_{j})\,p(\gamma_{j}|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\,d\gamma_{j}\) needs to be computed, where \(g(\gamma_{j}) \in \{\gamma_{j}, \exp(\gamma_{j}), \log(p(\gamma_{j}))\}\). Monte Carlo simulation based on the conditional distribution \(p(\gamma_{j}|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\) can be performed to compute these expectations numerically [39].
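As a rough illustration of such a Monte Carlo E-step, the sketch below estimates \(\text{E}(\gamma_{j}|\mathbf{D},\cdot)\) and \(\text{E}(\exp(\gamma_{j})|\mathbf{D},\cdot)\) for one group by self-normalized importance sampling. It is not taken from the paper: it assumes a \(N(0,\sigma^{2})\) prior on \(\gamma_{j}\) (a lognormal frailty), uses that prior as the proposal, and relies on the fact that, by the structure of Eq. 7, group j contributes \(\gamma_{j}\sum_{i}\delta_{ij}-\exp(\gamma_{j})\sum_{i}\Lambda_{0}(t_{ij})\exp(\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij})\) to the log-likelihood. The function and argument names are hypothetical.

```python
import math
import random


def e_step_expectations(n_events, cum_risk, sigma2, n_draws=50_000, seed=0):
    """Estimate E[gamma_j | D] and E[exp(gamma_j) | D] for one group j.

    n_events : sum_i delta_ij, number of observed readmissions in group j
    cum_risk : sum_i Lambda0(t_ij) * exp(beta^T X_ij) at the current iterate
    sigma2   : variance of the ASSUMED N(0, sigma2) prior on gamma_j
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    # Proposal = prior, so each importance weight is the group-j likelihood.
    draws = [rng.gauss(0.0, sd) for _ in range(n_draws)]
    log_w = [n_events * g - cum_risk * math.exp(g) for g in draws]
    shift = max(log_w)  # subtract the maximum to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    e_gamma = sum(wi * g for wi, g in zip(w, draws)) / total
    e_exp_gamma = sum(wi * math.exp(g) for wi, g in zip(w, draws)) / total
    return e_gamma, e_exp_gamma
```

With no data for a group (`n_events=0`, `cum_risk=0.0`) the weights are all equal and the estimates fall back to the prior moments, which gives a simple sanity check. A production implementation would more likely sample the conditional distribution directly or use a better-matched proposal.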

In the M-step of iteration r, the coefficient vector \(\boldsymbol{\beta}\) and the baseline readmission rate function \(\lambda_{0}(t)\) are updated by maximizing \(\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))\). When \(\lambda_{0}(t)\) is modeled parametrically (e.g., Weibull or exponential) with parameters \(\boldsymbol{\theta}_{\lambda}\), \(\text{Q}_{1}(\boldsymbol{\beta},\boldsymbol{\theta}_{\lambda})\) can be maximized directly with numerical optimization methods such as the Newton-Raphson method, which is readily available in many computing packages and routines. When \(\lambda_{0}(t)\) is modeled non-parametrically with the Cox baseline specification, directly maximizing \(\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))\) is not straightforward. Instead, a partial likelihood approach is used to update \(\boldsymbol{\beta}\) by treating \(\lambda_{0}(t)\) as a nuisance parameter and maximizing the partial likelihood function \(\text{Q}_{1}^{\prime}(\boldsymbol{\beta})\), which is explicitly given by

$$\begin{array}{@{}rcl@{}} \text{Q}_{1}^{\prime}(\boldsymbol{\beta}) &=& \sum_{j=1}^{N_{\text{J}}}\sum_{i=1}^{n_{j}}\delta_{ij}\big[\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij} \\ && -\log\sum_{t_{lj}\geq t_{ij}}\exp\{\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{lj}+\log\text{E}(\exp(\gamma_{j})|\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\}\big]. \end{array} $$
(9)

Equation 9 resembles the partial log-likelihood of the Cox proportional hazards model and can thus be maximized with readily available packages for estimating that model [40]. After maximizing \(\text{Q}_{1}^{\prime}(\boldsymbol{\beta})\), \(\boldsymbol{\beta}\) at iteration r is updated to \(\boldsymbol{\beta}^{(r+1)}\). In the meantime, \(\lambda_{0}(t)^{(r+1)}\) and its cumulative counterpart \(\Lambda_{0}(t)^{(r+1)}\) can be calculated non-parametrically with the Nelson-Aalen estimator at iteration r [41].
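The two quantities in this step can be sketched in a few lines of Python. The code below is an illustrative reading rather than the authors' implementation: it evaluates a partial log-likelihood of the form of Eq. 9 with a fixed per-subject offset standing in for \(\log\text{E}(\exp(\gamma_{j})|\cdot)\), and a Nelson-Aalen (Breslow-type) cumulative baseline with the same offset. It assumes that the risk set pools all subjects still at risk across groups, as a standard Cox routine with an offset would, and it applies no tie correction. All function and argument names are hypothetical.

```python
import math


def _lin_pred(beta, x):
    """beta^T x for each subject (x is a list of covariate lists)."""
    return [sum(b * v for b, v in zip(beta, xi)) for xi in x]


def partial_loglik(beta, times, events, x, offsets):
    """Partial log-likelihood with offsets; risk set pooled over all subjects."""
    lin = _lin_pred(beta, x)
    risk_score = [math.exp(l + o) for l, o in zip(lin, offsets)]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            at_risk = sum(r for r, t_l in zip(risk_score, times) if t_l >= t_i)
            ll += lin[i] - math.log(at_risk)
    return ll


def cumulative_baseline(beta, times, events, x, offsets):
    """Nelson-Aalen (Breslow-type) estimate of Lambda0 at each event time."""
    lin = _lin_pred(beta, x)
    risk_score = [math.exp(l + o) for l, o in zip(lin, offsets)]
    event_times = sorted({t for t, d in zip(times, events) if d})
    steps, cum = [], 0.0
    for t in event_times:
        n_events = sum(1 for tt, d in zip(times, events) if d and tt == t)
        at_risk = sum(r for r, tt in zip(risk_score, times) if tt >= t)
        cum += n_events / at_risk
        steps.append((t, cum))
    return steps
```

In practice `partial_loglik` would be handed to a numerical optimizer, or the offsets would be passed to an existing Cox fitting routine, and `cumulative_baseline` would then be evaluated at the updated coefficients.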

Owing to the data augmentation used in the E-M method, the latent variables \(\gamma_{j}\) can be estimated directly by maximizing \(\text{Q}_{2}(\boldsymbol{\gamma})\). Different distributions, such as the Gamma and Lognormal distributions, can be assumed for the latent variables. For instance, when a Gamma distribution is assumed, each latent variable \(\gamma_{j}^{(r+1)}\) at iteration r can be updated as

$$ {\gamma}_{j}^{(r + 1)} =\frac{1/({\sigma}^{2})^{(r)}+\sum_{i = 1}^{n_{j}}\delta_{ij}}{1/({\sigma}^{2})^{(r)}+\sum_{i = 1}^{n_{j}}{\Lambda}_{0}(t_{ij})^{(r + 1)}\exp((\hat{\boldsymbol{\beta}}^{\text{T}})^{(r)}\mathbf{X}_{ij})}, $$
(10)

where \(\Lambda_{0}(t)^{(r+1)}\) and \((\sigma^{2})^{(r)}\) are the updated cumulative baseline function and the variance of the latent variable, respectively.
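Equation 10 is a closed-form ratio and can be computed directly. The short sketch below evaluates it for one group; the function and argument names are hypothetical, and the inputs (event indicators, cumulative baseline values at each \(t_{ij}\), and linear predictors \(\hat{\boldsymbol{\beta}}^{\text{T}}\mathbf{X}_{ij}\)) are assumed to come from the preceding steps.

```python
import math


def gamma_latent_update(sigma2, events, cum_baseline, lin_pred):
    """Evaluate the right-hand side of Eq. 10 for a single group j.

    sigma2       : current variance of the latent variable, (sigma^2)^(r)
    events       : delta_ij for each subject i in group j
    cum_baseline : Lambda0(t_ij)^(r+1) for each subject i in group j
    lin_pred     : beta_hat^T X_ij for each subject i in group j
    """
    inv_var = 1.0 / sigma2
    numerator = inv_var + sum(events)
    denominator = inv_var + sum(
        lam * math.exp(eta) for lam, eta in zip(cum_baseline, lin_pred)
    )
    return numerator / denominator
```

For example, with \(\sigma^{2}=1\), two observed readmissions, and cumulative hazards summing to 1, the update is (1 + 2)/(1 + 1) = 1.5. A group with more readmissions than its covariates predict is pushed above 1, and one with fewer is pulled below 1.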

Cite this article

Chen, S., Kong, N., Sun, X. et al. Claims data-driven modeling of hospital time-to-readmission risk with latent heterogeneity. Health Care Manag Sci 22, 156–179 (2019). https://doi.org/10.1007/s10729-018-9431-0
