Claims data-driven modeling of hospital time-to-readmission risk with latent heterogeneity

Chen, Suiyao; Kong, Nan; Sun, Xuxue; Meng, Hongdao; Li, Mingyang

doi:10.1007/s10729-018-9431-0


Published: 25 January 2018

Volume 22, pages 156–179, (2019)

Health Care Management Science



Abstract

Hospital readmission risk modeling is of great interest to both hospital administrators and health care policy makers, for reducing preventable readmissions and advancing care service quality. To accommodate the needs of both stakeholders, a readmission risk model is preferable if it (i) exhibits superior prediction performance; (ii) identifies risk factors to help target the most at-risk individuals; and (iii) constructs composite metrics to evaluate multiple hospitals, hospital networks, and geographic regions. Existing work has mainly addressed the first two features; the third is challenging because available medical data are fragmented across hospitals. To address all three features simultaneously, this paper proposes readmission risk models that incorporate latent heterogeneity, and it takes advantage of administrative claims data, which are less fragmented and involve larger patient cohorts. Different levels of latent heterogeneity are considered to quantify the effects of unobserved factors, to provide composite measures for performance evaluation at various aggregate levels, and to compensate for the less informative claims data. To demonstrate the prediction performance of the proposed models, a real case study is conducted on a state-wide heart failure patient cohort. A systematic comparison study is then carried out to evaluate the performance of 49 risk models and their variants.


References

1. Jencks SF, Williams MV, Coleman EA (2009) Rehospitalizations among Patients in the Medicare Fee-for-Service Program. N Engl J Med 360:1418–1428
2. Shams I, Ajorlou S, Yang K (2015) A predictive analytics approach to reducing 30-day avoidable readmissions among patients with heart failure, acute myocardial infarction, pneumonia, or COPD. Health Care Manag Sci 18:19–34
3. Centers for Medicare and Medicaid Services (CMS) (2013) Medicare and Medicaid Statistical Supplement. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Archives/MMSS/2013.html
4. Gu Q, Koenig L, Faerberg J et al (2014) The medicare hospital readmissions reduction program: potential unintended consequences for hospitals serving vulnerable populations. Health Serv Res 49:818–837
5. Barrett ML, Wier LM, Jiang J, Steiner CA (2015) All-cause readmissions by payer and age, 2009–2013: table 2. HCUP Stat Br #199 166:1–14
6. Council FL (2017) Demystifying hospital readmissions penalties commonly asked questions from hospital CFOs. Advis Board Co 1–8
7. Zheng B, Zhang J, Yoon SW et al (2015) Predictive modeling of hospital readmissions using metaheuristics and data mining. Expert Syst Appl 42:7110–7120
8. Betihavas V, Davidson PM, Newton PJ et al (2012) What are the factors in risk prediction models for rehospitalisation for adults with chronic heart failure? Aust Crit Care 25:31–40
9. Nijhawan AE, Kitchell E, Etherton SS et al (2015) Half of 30-Day Hospital Readmissions Among HIV-Infected Patients Are Potentially Preventable. AIDS Patient Care STDS 29:465–473
10. Tran T, Luo W, Phung D et al (2014) A framework for feature extraction from hospital medical data with applications in risk prediction. BMC Bioinforma 15:65–96
11. Maddipatla RM, Hadzikadic M, Misra DP, Yao L (2015) 30 Day hospital readmission analysis. In: Proc. 2015 IEEE Int. Conf. Big Data (IEEE Big Data 2015). IEEE, pp 2922–2924
12. Shulan M, Gao K, Moore CD (2013) Predicting 30-day all-cause hospital readmissions. Health Care Manag Sci 16:167–175
13. Futoma J, Morris J, Lucas J (2015) A comparison of models for predicting early hospital readmissions. J Biomed Inform 56:229–238
14. Ross JS, Mulvey GK, Stauer B, Patlolla V, Bernheim SM, Keenan PS, Krumholz HM (2008) Statistical models and patient predictors of readmission for heart failure: a systematic review. Arch Intern Med 168:1371–1386
15. Wan H, Zhang L, Witz S et al (2016) A literature review of preventable hospital readmissions: preceding the readmissions reduction act. IIE Trans Healthc Syst Eng 6:193–211
16. Kansagara D, Englander H, Salanitro A et al (2011) Risk prediction models for hospital readmission. JAMA 306:1688–1698
17. McGinnis JM, Olsen L, Goolsby WA, Grossmann C (eds) (2011) Clinical data as the basic staple of health learning. National Academies Press, Washington, DC
18. Houchens RL, Ross DN, Elixhauser A, Jiang J (2014) U.S. Agency for Healthcare Research and Quality. HCUP NIS Related Reports ONLINE. Nationwide Inpatient Sample Redesign Final Report. http://www.hcup-us.ahrq.gov/db/nation/nis/nisrelatedreports.jsp
19. He D, Mathews SC, Kalloo AN, Hutfless S (2014) Mining high-dimensional administrative claims data to predict early hospital readmissions. J Am Med Informatics Assoc 21:272–279
20. Wallmann R, Llorca J, Gómez-Acebo I et al (2013) Prediction of 30-day cardiac-related-emergency-readmissions using simple administrative hospital data. Int J Cardiol 164:193–200
21. Chin DL, Bang H, Manickam RN, Romano PS (2016) Rethinking thirty-day hospital readmissions: shorter intervals might be better indicators of quality of care. Health Aff 35:1867–1875
22. Helm JE, Alaeddini A, Stauffer JM et al (2016) Reducing hospital readmissions by integrating empirical prediction with resource optimization. Prod Oper Manag 25:233–257
23. Lin CH, Lin SC, Chen MC, Wang SY (2006) Comparison of time to rehospitalization among schizophrenic patients discharged on typical antipsychotics, clozapine or risperidone. J Chinese Med Assoc 69:264–269
24. Lin CH, Lin KS, Lin CY et al (2008) Time to rehospitalization in patients with major depressive disorder taking venlafaxine or fluoxetine. J Clin Psychiatry 69:54–59
25. Omurlu IK, Ture M, Tokatli F (2009) The comparisons of random survival forests and Cox regression analysis with simulation and an application related to breast cancer. Expert Syst Appl 36:8582–8588
26. Ture M, Tokatli F, Kurt I (2009) Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients. Expert Syst Appl 36:2017–2026
27. Miller RG Jr (2011) Survival analysis, vol 66. Wiley
28. Li M, Hu Q, Liu J (2014) Proportional hazard modeling for hierarchical systems with multi-level information aggregation. IIE Trans 46:149–163
29. Cox DR, Johnson NI (1992) Regression models and life-tables. In: Kotz S (ed) Breakthroughs in statistics. Springer, New York, pp 527–541
30. Shapiro SP (2005) Agency theory. Annu Rev Sociol 31:263–284
31. Kiser E (1999) Comparing varieties of agency theory in economics, political science, and sociology: an illustration from state policy implementation. Sociol Theory 17:146–170
32. Eisenhardt KM (1989) Agency theory: an assessment and review. Acad Manag Rev 14:57–74
33. Anwar AM (2016) Presenting traveller preference heterogeneity in the context of agency theory: understanding and minimising the agency problem. Urban, Plan Transp Res 4:26–45
34. Anwar AHMM, Tieu K, Gibson P et al (2014) Analysing the heterogeneity of traveller mode choice preference using a random parameter logit model from the perspective of principal-agent theory. Int J Logist Syst Manag 17:447–471
35. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
36. Nielsen GG, Gill RD, Andersen PK, Sorensen TIA (1992) A counting process approach to maximum likelihood estimation in frailty models. Scand J Stat 19:25–43
37. Cortiñas Abrahantes J, Burzykowski T (2005) A version of the EM algorithm for proportional hazard model with random effects. Biometrical J 47:847–862
38. Harrell FE (2015) Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis. Springer
39. Carlin BP, Louis TA (2000) Bayes and empirical Bayes methods for data analysis. Chapman & Hall/CRC, Boca Raton
40. Vaida F, Xu R (2000) Proportional hazards model with random effects. Stat Med 19:3309–3324
41. Klein JP (1992) Semiparametric estimation of random effects using the Cox model based on the EM algorithm. Biometrics 48:795–806
42. Zhang HH, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94:691–703
43. Wu Y (2012) Elastic net for Cox’s proportional hazards model with a solution path algorithm. Stat Sin 22:27–294
44. Schnedler W (2005) Likelihood estimation for censored random vectors. Econom Rev 24:195–217
45. Hastie TJ, Tibshirani RJ, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer
46. Harrell FE, Califf RM, Pryor DB et al (1982) Evaluating the yield of medical tests. JAMA 247:2543–2546
47. Kremers WK (2007) Concordance for survival time data: fixed and time-dependent covariates and possible ties in predictor and time. Mayo Foundation. https://www.semanticscholar.org/paper/Concordance-for-Survival-Time-Data-Fixed-and-Time-Kremers-Liebig/06ad5dc66f40f1f2a7be3cb068bbd619ce06e3d4
48. Uno H, Cai T, Pencina MJ et al (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30:1105–1117
49. Schmid M, Wright MN, Ziegler A (2016) On the use of Harrell’s C for clinical risk prediction via random survival forests. Expert Syst Appl 63:450–459
50. Collins TC, Daley J, Henderson WH, Khuri SF (1999) Risk factors for prolonged length of stay after major elective surgery. Ann Surg 230:251–259
51. Pencina MJ, D’Agostino RB, Song L (2012) Quantifying discrimination of Framingham risk functions with different survival C statistics. Stat Med 31:1543–1553
52. Hosmer DW, Lemeshow S (2000) Applied logistic regression. Wiley Ser Probab Statistics
53. Zhu K, Lou Z, Zhou J et al (2015) Predicting 30-day hospital readmission with publicly available administrative database: a conditional logistic regression modeling approach. Methods Inf Med 54:560–567
54. Fingar K, Washington R (2006) Trends in hospital readmissions for four high-volume conditions, 2009–2013: statistical brief #196. HCUP Stat Br #196 1–17
55. Silverstein MD, Qin H, Mercer SQ et al (2008) Risk factors for 30-day hospital readmission in patients ≥ 65 years of age. Proc (Bayl Univ Med Cent) 21:363–372
56. HCUP State Inpatient Databases (SID). Healthcare Cost and Utilization Project (HCUP). (2009–2011). Agency for Healthcare Research and Quality, Rockville, MD. http://www.hcup-us.ahrq.gov/sidoverview.jsp
57. Schmutte T, Dunn CL, Sledge WH (2010) Predicting time to readmission in patients with recent histories of recurrent psychiatric hospitalization. J Nerv Ment Dis 198:860–863
58. Van Walraven C, Dhalla IA, Bell C et al (2010) Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ 182:551–557
59. Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Addison Wesley
60. Fung GM, Mangasarian OL (2004) A feature selection Newton method for support vector machine classification. Comput Optim Appl 28:185–202
61. Kruse RL, Hays HD, Madsen RW et al (2013) Risk factors for all-cause hospital readmission within 30 days of hospital discharge. J Clin Outcomes Manag 21:203–214
62. García-Pérez L, Linertová R, Lorenzo-Riera A et al (2011) Risk factors for hospital readmissions in elderly patients: a systematic review. QJM 104:639–651


Acknowledgments

This work was supported in part by University of South Florida Research & Innovation Internal Awards Program under Grant No. 0114783.

Author information

Authors and Affiliations

Department of Industrial and Management Systems Engineering, University of South Florida, 4202 E. Fowler Avenue, Tampa, FL, 33620, USA
Suiyao Chen, Xuxue Sun & Mingyang Li
Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, 47907, USA
Nan Kong
School of Aging Studies, University of South Florida, Tampa, FL, 33620, USA
Hongdao Meng


Corresponding author

Correspondence to Mingyang Li.

Appendix: E-M algorithm derivations

Maximizing Eq. 3 involves two steps that are repeated iteratively: the expectation step (E-step) and the maximization step (M-step). In the E-step of iteration r, the conditional expectation of Eq. 3 is computed, i.e., $\text{Q}(\boldsymbol{\beta},\lambda_{0}(t),\boldsymbol{\gamma})=\text{E}_{\boldsymbol{\gamma}\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}}\big(\text{L}(\boldsymbol{\beta},\lambda_{0}(t)\mid\mathbf{D},\boldsymbol{\gamma})\big)$, which can be expressed explicitly as

$$ \text{Q}(\boldsymbol{\beta},\lambda_{0}(t),\boldsymbol{\gamma}) = \text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t))+\text{Q}_{2}(\boldsymbol{\gamma}), \tag{6} $$

where β^(r) and λ₀(t)^(r) are the values of β and λ₀(t) updated at iteration r. The two parts are given by

$$
\begin{aligned}
\text{Q}_{1}(\boldsymbol{\beta},\lambda_{0}(t)) = \sum_{j=1}^{N_{\text{J}}}\sum_{i=1}^{n_{j}}\Big[&\delta_{ij}\big\{\log\lambda_{0}(t_{ij})+\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij}+\text{E}(\gamma_{j}\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\big\}\\
&-\Lambda_{0}(t_{ij})\exp\big(\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij}+\log\text{E}(\exp(\gamma_{j})\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\big)\Big]
\end{aligned}
\tag{7}
$$

and

$$ \text{Q}_{2}(\boldsymbol{\gamma})=\sum_{j=1}^{N_{\text{J}}}\text{E}\big(\log p(\gamma_{j})\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}\big). \tag{8} $$

To evaluate Q(β, λ₀(t), γ), the conditional expectations $\text{E}(g(\gamma_{j})\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)}) = \int g(\gamma_{j})\,p(\gamma_{j}\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\,d\gamma_{j}$ need to be computed for g(γ_j) ∈ {γ_j, exp(γ_j), log p(γ_j)}. These expectations can be computed numerically by Monte Carlo simulation based on the conditional distribution p(γ_j | D, β^(r), λ₀(t)^(r)) [39].
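A minimal sketch of this Monte Carlo step follows. It assumes, for illustration only, a normal prior γ_j ~ N(0, σ²) (a lognormal frailty exp(γ_j)) and uses self-normalized importance sampling with the prior as the proposal. The conditional density p(γ_j | D, ·) is proportional to the prior times the hospital-j part of the likelihood in Eq. 7, which, after dropping terms free of γ_j, is exp(γ_j Σ_i δ_ij − e^{γ_j} Σ_i Λ₀(t_ij) exp(βᵀX_ij)). The function and argument names are hypothetical.

```python
import numpy as np


def mc_conditional_expectations(delta, cum_base, lin_pred, sigma2, n_draws=20000, seed=0):
    """Monte Carlo estimates of E(g(gamma_j) | D, beta^(r), lambda_0^(r)) for one hospital j.

    Hypothetical helper. Assumes gamma_j ~ Normal(0, sigma2) (a lognormal frailty) and uses
    self-normalized importance sampling with the prior as the proposal distribution.
    delta    : event indicators delta_ij of the patients in hospital j
    cum_base : Lambda_0(t_ij) at the current iterate
    lin_pred : beta^T X_ij at the current iterate
    """
    rng = np.random.default_rng(seed)
    gamma = rng.normal(0.0, np.sqrt(sigma2), size=n_draws)  # draws from p(gamma_j)

    # Hospital-j log-likelihood as a function of gamma_j (terms free of gamma_j dropped)
    n_events = np.sum(delta)
    expected = np.sum(cum_base * np.exp(lin_pred))
    log_w = gamma * n_events - np.exp(gamma) * expected

    # Normalized importance weights, computed stably on the log scale
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    log_prior = -0.5 * np.log(2.0 * np.pi * sigma2) - gamma**2 / (2.0 * sigma2)
    return {
        "E_gamma": float(np.sum(w * gamma)),
        "E_exp_gamma": float(np.sum(w * np.exp(gamma))),
        "E_log_p": float(np.sum(w * log_prior)),
    }
```

Calling the function once per hospital gives the E(γ_j | ·) and log E(exp(γ_j) | ·) terms of Eq. 7 and the E(log p(γ_j) | ·) terms of Eq. 8. Sampling from the prior is simple, but it becomes inefficient when a hospital has many patients and the conditional distribution concentrates far from the prior; an MCMC sampler is a common alternative in that case.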

In the M-step of iteration r, the coefficient vector β and the baseline readmission rate function λ₀(t) are updated by maximizing Q₁(β, λ₀(t)). When λ₀(t) is modeled parametrically (e.g., Weibull or exponential) with parameters θ_λ, Q₁(β, θ_λ) can be maximized directly with numerical optimization methods such as the Newton-Raphson method, which is available in many computing packages and routines. When λ₀(t) is modeled non-parametrically with the Cox baseline specification, directly maximizing Q₁(β, λ₀(t)) is not straightforward. Instead, a partial likelihood approach is used: λ₀(t) is treated as a nuisance parameter, and β is updated by maximizing the partial likelihood function Q₁′(β), which is given explicitly by

$$
\begin{aligned}
\text{Q}_{1}^{\prime}(\boldsymbol{\beta}) = \sum_{j=1}^{N_{\text{J}}}\sum_{i=1}^{n_{j}}\delta_{ij}\Big[&\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{ij}\\
&-\log\sum_{t_{lj}\geq t_{ij}}\exp\big\{\boldsymbol{\beta}^{\text{T}}\mathbf{X}_{lj}+\log\text{E}(\exp(\gamma_{j})\mid\mathbf{D},\boldsymbol{\beta}^{(r)},\lambda_{0}(t)^{(r)})\big\}\Big].
\end{aligned}
\tag{9}
$$
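Because log E(exp(γ_j) | ·) is fixed during the M-step, Eq. 9 can be evaluated and maximized directly as a Cox partial log-likelihood with a per-patient offset o_ij = log E(exp(γ_j) | ·). The sketch below is a minimal illustration under stated assumptions: the risk set pools patients across all hospitals (as offset-capable Cox routines do), ties are handled by the Breslow convention, the offset also appears in the numerator (a term constant in β, so it does not change the maximizer), and SciPy's BFGS optimizer stands in for whatever routine the authors used. Helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp


def cox_neg_partial_loglik(beta, t, delta, X, offset):
    """Negative Cox partial log-likelihood with a fixed per-patient offset (Breslow ties)."""
    eta = X @ beta + offset
    # at_risk[i, l] is True when patient l is still at risk at time t_i (t_l >= t_i)
    at_risk = t[None, :] >= t[:, None]
    log_denom = logsumexp(np.where(at_risk, eta[None, :], -np.inf), axis=1)
    return -np.sum(delta * (eta - log_denom))


def cox_m_step(t, delta, X, offset, beta0=None):
    """Update beta by maximizing the offset partial likelihood (hypothetical helper)."""
    beta0 = np.zeros(X.shape[1]) if beta0 is None else beta0
    res = minimize(cox_neg_partial_loglik, beta0, args=(t, delta, X, offset), method="BFGS")
    return res.x
```

The dense n × n risk-set mask keeps the code close to the formula but uses O(n²) memory; for state-wide cohorts, a sort-and-cumulative-sum implementation or an existing Cox package that accepts offsets would be the practical choice.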

Equation 9 has the form of the partial log-likelihood of the Cox proportional hazards model, with log E(exp(γ_j) | D, β^(r), λ₀(t)^(r)) acting as a fixed offset, so it can be maximized with readily available packages for estimating the Cox proportional hazards model [40]. Maximizing Q₁′(β) yields the updated coefficient vector β^(r+1). The baseline function λ₀(t)^(r+1) and its cumulative counterpart Λ₀(t)^(r+1) can then be calculated non-parametrically with the Nelson-Aalen estimator at iteration r [41].

Because the E-M method augments the data with the latent variables, the γ_j's can be estimated directly by maximizing Q₂(γ). Different distributions, such as the Gamma and Lognormal distributions, can be assumed for the latent variables γ_j. For instance, when a Gamma distribution is assumed, each latent variable can be updated at iteration r as

$$ \gamma_{j}^{(r+1)} = \frac{1/(\sigma^{2})^{(r)}+\sum_{i=1}^{n_{j}}\delta_{ij}}{1/(\sigma^{2})^{(r)}+\sum_{i=1}^{n_{j}}\Lambda_{0}(t_{ij})^{(r+1)}\exp\big((\hat{\boldsymbol{\beta}}^{\text{T}})^{(r)}\mathbf{X}_{ij}\big)}, \tag{10} $$

where Λ₀(t)^(r+1) and (σ²)^(r) are the updated cumulative baseline function and the variance of the latent variable, respectively.
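Equation 10 is a closed form that vectorizes easily across hospitals. It coincides with the posterior mean of a multiplicative Gamma frailty with mean 1 and variance σ², so in this update γ_j plays the role of the frailty multiplier itself. The sketch below assumes hypothetical argument names: cum_base holds Λ₀(t_ij)^(r+1), lin_pred holds (β̂ᵀ)^(r) X_ij, and cluster holds each patient's hospital index j.

```python
import numpy as np


def gamma_frailty_update(delta, cum_base, lin_pred, cluster, sigma2, n_clusters):
    """Eq. 10 for all hospitals at once (hypothetical helper).

    delta    : event indicators delta_ij, one entry per patient
    cum_base : Lambda_0(t_ij)^(r+1) per patient
    lin_pred : (beta_hat^T)^(r) X_ij per patient
    cluster  : integer hospital index j (0 .. n_clusters-1) per patient
    sigma2   : current latent-variable variance (sigma^2)^(r)
    """
    observed = np.bincount(cluster, weights=delta, minlength=n_clusters)
    expected = np.bincount(cluster, weights=cum_base * np.exp(lin_pred), minlength=n_clusters)
    return (1.0 / sigma2 + observed) / (1.0 / sigma2 + expected)
```

The update shrinks each hospital toward 1: for large σ² it approaches the ratio of observed to expected readmissions, Σ_i δ_ij / Σ_i Λ₀(t_ij) exp(βᵀX_ij), while for small σ² it stays close to 1.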


About this article

Cite this article

Chen, S., Kong, N., Sun, X. et al. Claims data-driven modeling of hospital time-to-readmission risk with latent heterogeneity. Health Care Manag Sci 22, 156–179 (2019). https://doi.org/10.1007/s10729-018-9431-0


Received: 31 July 2017
Accepted: 09 January 2018
Published: 25 January 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10729-018-9431-0
