Driven by quality outcomes and economic incentives, the prediction of 30-day hospital readmissions remains an important focus for healthcare systems. Several models to predict hospital readmission have been published.1,2,3,4,5,6,7 The ultimate goal of predicting readmission risk is to improve the quality of care for patients at high risk of worsening health after discharge from the hospital, and efforts to reduce 30-day readmissions are also linked to hospital payment.8 However, readmissions may vary by context, such as hospital or patient characteristics,9,10,11,12,13,14 and risk prediction models that are initially validated and used in a health system may need to be reevaluated after implementation to assess longer-term performance,15 particularly across different groups. The Cleveland Clinic Health System (CCHS) created and validated a 30-day readmission risk score and implemented reporting of the score in our electronic medical record (EMR). The goals of this study were (1) to confirm that the readmission risk score model continued to perform acceptably overall over time and (2) to assess whether its performance varied in particular contexts. We therefore report the performance of the implemented readmission risk score for predicting all-cause readmissions over 3 years across diverse CCHS hospitals; medical and surgical specialties; primary discharge diagnosis categories, including the 2020 COVID-19 pandemic period; and race and ethnicity.

METHODS

Initial Readmission Risk Score Development

The CCHS is a large, integrated health system with national and international locations. For the current analysis, Northeast Ohio and Florida hospitals were included. A model to predict 30-day hospital readmissions was created in the CCHS in 2017 for operational purposes, using EMR data from all CCHS hospital admissions in 2015, and was validated on a cohort of admissions from January to June 2016. Observation admissions were included, whereas Oncology and Labor and Delivery admissions were excluded given their unique context.

Model Variables

Model variables were chosen based upon evidence from published literature and input from clinical experts. The final model contains 18 variables available in the EMR (Appendix Table 3) including previous healthcare utilization in the CCHS (number of admissions in the past year, total number of emergency department [ED] visits in the past 6 months, and whether there was an ED visit in the past 6 months); admission class (whether admission was an observation type); discharge disposition (home, home with home care, hospice, other, skilled nursing facility); comorbidities including history of chronic kidney disease, chronic lung disease, and anemia; drug abuse; medications (number of active medications, whether there were any anticoagulation medications); laboratory values (hemoglobin and calcium at admission; blood urea nitrogen, albumin, and sodium at discharge); insurance status (Medicaid, Medicare, private, self-pay, other); and a variable to capture language barrier, poor health literacy, or impaired cognition.

Model Development

The model was created using multiple logistic regression and was reduced to find the best-fitting model with the fewest variables. Starting with over 200 candidate variables, model reduction was performed using a “stepdown” process that ranks each variable by the reduction in R² when it is removed, from smallest to largest. Variables were removed from the model in rank order, and the process was stopped when the change in the area under the receiver operating characteristic curve (AUC), or c-statistic, was less than 0.01. For validation on the 2016 data, missing values were imputed using the cohort mean of each variable, a simple approach that could also be automated when the model was implemented in the medical record. The odds ratios for the initial validation are shown in Appendix Table 3. The model obtained a c-statistic of 0.723 when validated on the 2016 cohort. This performance compared favorably with that of the previously published HOSPITAL2 and LACE7 models (c-statistics of 0.65 and 0.69, respectively).
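The stepdown reduction can be sketched as a loop over pre-ranked variables. This is a minimal sketch, not the CCHS code: it assumes the ranking by R² reduction has already been computed, takes a hypothetical `evaluate_auc` callable that would refit the logistic model on a variable subset and return its validation AUC, and reads the stopping rule as "keep removing variables while the AUC loss relative to the full model stays below 0.01", which is one possible interpretation of the description above.

```python
from typing import Callable, List, Sequence


def stepdown(
    ranked_vars: Sequence[str],
    evaluate_auc: Callable[[List[str]], float],
    tol: float = 0.01,
) -> List[str]:
    """Drop variables in rank order (smallest R^2 contribution first).

    Assumption: removal stops once the AUC of the reduced model falls
    `tol` or more below the AUC of the full model.
    """
    kept = list(ranked_vars)
    full_auc = evaluate_auc(kept)
    for var in ranked_vars:
        candidate = [v for v in kept if v != var]
        if not candidate:
            break  # never return an empty model
        if full_auc - evaluate_auc(candidate) >= tol:
            break  # this removal costs too much discrimination
        kept = candidate
    return kept


# Toy stand-in for refitting the model: each hypothetical variable adds a
# fixed amount of AUC above 0.5.
contribution = {"a": 0.002, "b": 0.003, "c": 0.008, "d": 0.15}


def toy_auc(variables: List[str]) -> float:
    return 0.5 + sum(contribution[v] for v in variables)


print(stepdown(["a", "b", "c", "d"], toy_auc))  # ['c', 'd']
```

In the toy run, dropping `a` and `b` costs less than 0.01 of AUC, but dropping `c` as well would cost 0.013, so the loop stops with `c` and `d` retained.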

Identification of High-Risk Readmission Score

The model output was expressed as a risk score from 0 to 100, and a high-risk threshold of ≥ 40 was chosen to capture the top 5% of patients at risk for 30-day readmission. Restricting the high-risk group to the top 5% also made it feasible to deliver transitional care interventions. On the 2016 validation cohort, this cutoff had a sensitivity of 16%, specificity of 97%, positive predictive value of 52%, and negative predictive value of 85%.
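The four cutoff statistics follow from a 2×2 table of flagged (score ≥ 40) versus readmitted discharges. The sketch below is illustrative only: the function names are hypothetical, the scores and outcomes are made up, and the tie handling in `cutoff_for_top_fraction` (every score equal to the cutoff is flagged, so slightly more than 5% may be flagged) is an assumption rather than the documented CCHS procedure.

```python
from typing import Dict, Sequence


def cutoff_for_top_fraction(scores: Sequence[float], fraction: float = 0.05) -> float:
    """Lowest score still inside the top `fraction` of patients."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def threshold_metrics(
    scores: Sequence[float], readmitted: Sequence[bool], cutoff: float = 40
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV for the rule score >= cutoff."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, readmitted):
        high_risk = score >= cutoff
        if high_risk and outcome:
            tp += 1
        elif high_risk:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Made-up scores and outcomes, for illustration only.
scores = [55, 42, 38, 20, 10, 45, 5, 30]
readmitted = [True, False, True, False, False, True, False, True]
print(threshold_metrics(scores, readmitted))
# {'sensitivity': 0.5, 'specificity': 0.75, 'ppv': 0.6666666666666666, 'npv': 0.6}
```

A high cutoff trades sensitivity for specificity, which is the same pattern seen in the validation figures (16% sensitivity, 97% specificity).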

Model Implementation in EMR

The readmission risk score model was implemented in the Epic EMR system16 in April 2017 using configurable rule logic records and assigned a score to all applicable patients daily during the hospital admission with a final score on the day of discharge.

Some variable definitions were modified to accommodate system performance considerations, such as limiting the lookback to 1 year of history, and to use existing fields and categories available as inputs to Epic rules. Additional modifications were made to handle missing data or data not yet available at admission. The population mean (continuous variables) or population mode (categorical variables) was imputed for missing data. If a variable became available during the admission (e.g., a new laboratory result), the daily score was updated using the patient’s specific data. The score can also be calculated in real time when viewing a patient record, and the score and the patient variables contributing to it can be included as columns in patient lists. A readmission report available on each patient record shows a line chart of the patient’s daily risk scores, a description of how the score is calculated, and the specific variables contributing to the score, including both factors that may increase the patient’s readmission risk and factors protective against readmission. The data for these contributing variables are stored as variable names with the numeric value of each named variable. For patients with a risk score of at least 40, a red banner identifies the patient as high risk.
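The missing-data rule can be sketched as follows. This is a hedged illustration, not the Epic rule logic: the variable names, cohort, and population values are hypothetical, and the daily re-scoring step is represented only by re-running `impute` once a patient value (e.g., a new laboratory result) arrives.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[Any]]


def population_fill_values(
    cohort: List[Record], continuous: Iterable[str], categorical: Iterable[str]
) -> Tuple[Dict[str, float], Dict[str, Any]]:
    """Population mean for continuous variables, mode for categorical ones."""
    means = {v: mean(r[v] for r in cohort if r.get(v) is not None) for v in continuous}
    modes = {
        v: Counter(r[v] for r in cohort if r.get(v) is not None).most_common(1)[0][0]
        for v in categorical
    }
    return means, modes


def impute(record: Record, means: Dict[str, float], modes: Dict[str, Any]) -> Dict[str, Any]:
    """Fill missing values; patient-specific data always takes precedence."""
    filled = {}
    for name, value in record.items():
        if value is not None:
            filled[name] = value
        elif name in means:
            filled[name] = means[name]
        elif name in modes:
            filled[name] = modes[name]
        else:
            raise KeyError(f"no imputation value for {name!r}")
    return filled


# Hypothetical cohort used only to derive fill values.
cohort = [
    {"albumin": 3.0, "insurance": "Medicare"},
    {"albumin": 4.0, "insurance": "Medicare"},
    {"albumin": None, "insurance": "Private"},
]
means, modes = population_fill_values(cohort, ["albumin"], ["insurance"])

day1 = impute({"albumin": None, "insurance": None}, means, modes)
day2 = impute({"albumin": 2.8, "insurance": None}, means, modes)  # lab now resulted
print(day1)  # {'albumin': 3.5, 'insurance': 'Medicare'}
print(day2)  # {'albumin': 2.8, 'insurance': 'Medicare'}
```

Keeping the fill values fixed at the population level is what lets the same score be computed for every patient every day, with each day's score becoming more patient-specific as results arrive.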

Implementation Analysis

This retrospective cohort study used data obtained from CCHS EMR and billing databases from April 1, 2017, through September 3, 2020. All patient demographic information, including race and ethnicity, was obtained as recorded in the EMR. All patients aged ≥ 18 years discharged from a CCHS hospital were included, except patients discharged from Oncology or Labor and Delivery departments or with attending providers specializing in cancer care, patients with hospice orders, and patients who died during the admission. Discharges were categorized as surgical when the record identified a surgical department or when surgery was performed during the admission; all other discharges were categorized as medical. Primary discharge diagnoses, available as International Classification of Diseases, 10th Revision (ICD-10) codes in each patient’s billing data, were categorized according to the Agency for Healthcare Research and Quality (AHRQ) Clinical Classifications Software (CCS) ICD-10-CM beta version level 1 categories.17 The beta version of the CCS includes ICD-10-CM codes valid through September 2019. Because our study period extended to September 3, 2020, ICD-10 codes introduced after September 2019 were assigned by our research team to the same level 1 category as pre-existing codes from the same ICD-10 range when possible, or were categorized manually otherwise.

Each admission in this analysis has a binary outcome indicating whether the patient was readmitted to any CCHS hospital within 30 days. The final recorded risk score was used to compare observed and expected readmissions, assessing variation separately by hospital site, medical vs. surgical specialty, and primary discharge diagnosis category. For each category, the risk score’s discrimination in predicting all-cause 30-day readmission is reported using the c-statistic, and its prediction accuracy using the Brier score. A 95% confidence interval for the c-statistic was calculated from 1000 bootstrap samples. Variation in performance was assessed by comparing each category’s c-statistic and confidence interval with the overall cohort c-statistic. In addition, because model performance may differ by race or ethnicity,18,19 we performed subgroup analyses of the readmission risk score by race and ethnicity. Calibration is presented in plots created by dividing the observations into bins containing an equal number of data points. The mean predicted risk in each bin is compared with the observed proportion of readmissions; for a perfectly calibrated model, predicted and observed readmissions would be equal, falling on a 45° line. A sensitivity analysis assessed readmission rates and model performance after excluding the COVID-19 pandemic period from March 1 to September 3, 2020.
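The discrimination, accuracy, and calibration summaries described above can be sketched in plain Python. Several details are assumptions not stated in the text: the 0–100 score is divided by 100 to obtain a predicted probability for the Brier score and calibration bins, the c-statistic is computed from average ranks (the Mann–Whitney form of the AUC), and the bootstrap interval uses the percentile method, skipping resamples that lack either outcome.

```python
import random
from typing import List, Sequence, Tuple


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC via average ranks; tied predictions share the mean rank."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need both readmitted and non-readmitted discharges")
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def bootstrap_c_ci(y, p, n_boot=1000, alpha=0.05, seed=0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the c-statistic."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples with only one outcome class
            stats.append(c_statistic(ys, [p[i] for i in idx]))
    stats.sort()
    lo_idx = round(alpha / 2 * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


def calibration_bins(y, p, n_bins=10) -> List[Tuple[float, float]]:
    """(mean predicted, observed rate) for equal-count bins sorted by risk."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    out = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            out.append((sum(q for q, _ in chunk) / len(chunk),
                        sum(o for _, o in chunk) / len(chunk)))
    return out


# Hypothetical scores (0-100) converted to probabilities by dividing by 100.
scores = [10, 40, 35, 80]
y = [0, 0, 1, 1]
p = [s / 100 for s in scores]
print(c_statistic(y, p))                 # 0.75
print(brier_score(y, p))                 # about 0.158
print(calibration_bins(y, p, n_bins=2))  # two bins, observed rate 0.5 in each
```

Plotting the `calibration_bins` pairs against the 45° line gives the kind of calibration plot described above; points below the line indicate over-prediction.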
To assess how often each variable contributed to the readmission risk score, we extracted from 2019 discharge summaries (2019 being the last full year of data in our study period before the COVID-19 pandemic) the section listing the individual variables under the header “Patient variables contributing to increased readmission risk.” Regular expressions were used to extract the variable names and the numeric value of each named variable. Python version 3.7 was used for all analyses. This project was reviewed and classified as exempt by the CCHS Institutional Review Board.
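The extraction step can be sketched with Python’s `re` module. The layout of the discharge summary section is an assumption here: the sketch supposes that, after the header, each contributing variable appears on its own line as `name: value` and that the section ends at the first blank line. The example summary text is invented for illustration. Because one discharge can list several variables, tallies over many summaries can exceed the number of discharges.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

HEADER = "Patient variables contributing to increased readmission risk"
# Assumed layout: one "name: number" pair per line, optionally bulleted.
ITEM = re.compile(
    r"^\s*[-*]?\s*(?P<name>[^:\n]+?)\s*:\s*(?P<value>-?\d+(?:\.\d+)?)\s*$",
    re.MULTILINE,
)


def extract_variables(summary: str) -> List[Tuple[str, float]]:
    """Return (variable name, numeric value) pairs listed under HEADER."""
    start = summary.find(HEADER)
    if start == -1:
        return []
    section = summary[start + len(HEADER):].lstrip(":\r\n")
    section = re.split(r"\n\s*\n", section, maxsplit=1)[0]  # stop at blank line
    return [(m.group("name"), float(m.group("value"))) for m in ITEM.finditer(section)]


def tally(summaries: Iterable[str]) -> Counter:
    """Count how often each variable name appears across summaries."""
    return Counter(name for s in summaries for name, _ in extract_variables(s))


example = """Discharge Summary
Patient variables contributing to increased readmission risk:
- Admissions in past year: 3
- Number of active medications: 14
- Albumin at discharge: 2.9

Follow-up: PCP in 7 days
"""
print(extract_variables(example))
# [('Admissions in past year', 3.0), ('Number of active medications', 14.0), ('Albumin at discharge', 2.9)]
```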

RESULTS

The final retrospective cohort included 600,872 discharges for 321,470 unique patients from 11 acute care hospital sites in the CCHS, 10 in Northeast Ohio and 1 in Florida (Appendix Fig. 4, Table 1). The mean readmission score was 18.2 [standard deviation (SD) 11.0]. The mean age of the patients was 62.8 years (SD 17.7), and 52.5% were female. By race, 21.8% of patients were Black, 63.5% White, 1.5% multiracial, and 12.4% of unknown race; American Indian or Alaskan Native or Asian, Native Hawaiian or Pacific Islander, and other race each accounted for less than 1%. By ethnicity, 4.5% were Hispanic and 1.8% were of unknown ethnicity. Medical discharges accounted for 60.2% of the cohort. The most frequent CCS diagnosis categories were diseases of the circulatory system (25.3%), diseases of the digestive system (12.9%), injury and poisoning (11.1%), and diseases of the respiratory system (9.7%).

Table 1 Characteristics of All Cleveland Clinic Health System Discharges by Readmission Status April 2017–September 2020

The overall readmission rate was 15.9% (N = 95,497). Yearly readmission rates were 16.1% in both 2017 (N = 19,708) and 2018 (N = 28,494), 15.1% in 2019 (N = 27,862), and 16.5% in 2020 through September (N = 19,433). Readmission rates were similar, approximately 16%, for medical and surgical discharges, but varied by hospital (range 12.6–19.3%) and by CCS diagnosis category, from 8.3% for diseases of the musculoskeletal system and connective tissue to 24.92% for diseases of the blood and blood-forming organs. The readmission rate for the COVID-19 diagnosis group in 2020 was 14.41% (306/2123) (Appendix Table 4).

The risk score had a c-statistic of 0.6875 for the entire 2017–2020 cohort. Medical and surgical specialties showed similar performance in discrimination and calibration, with c-statistics of 0.6923, 95% CI [0.6807, 0.7045], and 0.6802, 95% CI [0.6681, 0.6925], respectively; both 95% confidence intervals contained the overall c-statistic of 0.6875. Calibration was best at lower readmission risk scores (Table 2, Fig. 1), with a trend toward over-prediction at higher scores. The c-statistic for hospital sites ranged from 0.6762, 95% CI [0.6634, 0.6876], at hospital G to 0.7023, 95% CI [0.6903, 0.7132], at hospital D. Except for hospital D, the 95% confidence intervals for all hospital sites included the overall c-statistic of 0.6875 (Table 2). Over-prediction of risk for patients with higher risk scores was seen across sites, but some sites, such as hospital E, showed moderate over-prediction across their entire cohort (Fig. 2). Performance varied more across CCS level 1 categories of the primary discharge diagnosis, with several categories having 95% confidence intervals that did not include the overall cohort c-statistic. Lower-performing categories with a c-statistic < 0.65 included COVID-19 (0.6387), infectious and parasitic diseases (0.6398), benign neoplasms (0.6153), and congenital anomalies (0.5403) (Table 2, Fig. 3). In subgroup analyses, c-statistics were > 0.65 for all race and ethnicity categories (Appendix Table 5). Calibration plots (Appendix Figs. 5 and 6) showed over-prediction for patients of White and unknown race compared with Black and multiracial patients, and over-prediction for all ethnicity categories (Hispanic, non-Hispanic, and unknown ethnicity).
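The screening rule from the Methods, which flags a subgroup when its 95% confidence interval excludes the overall c-statistic, can be reproduced from the values reported above. This is an informal check that mirrors the text, not a formal hypothesis test (subgroup and overall estimates share data); the dictionary simply restates four of the reported estimates, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

OVERALL_C = 0.6875

# (c-statistic, lower 95% CI, upper 95% CI) as reported in the text above.
reported: Dict[str, Tuple[float, float, float]] = {
    "Medical": (0.6923, 0.6807, 0.7045),
    "Surgical": (0.6802, 0.6681, 0.6925),
    "Hospital G": (0.6762, 0.6634, 0.6876),
    "Hospital D": (0.7023, 0.6903, 0.7132),
}


def ci_excludes_overall(
    rows: Dict[str, Tuple[float, float, float]], overall: float
) -> List[str]:
    """Subgroups whose 95% CI does not contain the overall c-statistic."""
    return [name for name, (_, lo, hi) in rows.items() if not lo <= overall <= hi]


print(ci_excludes_overall(reported, OVERALL_C))  # ['Hospital D']
```

Hospital G passes only narrowly (its upper bound, 0.6876, sits just above 0.6875), which illustrates how sensitive this informal screen is to the fourth decimal place.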
In the sensitivity analysis excluding the COVID-19 pandemic period from March 1 through September 3, 2020, the cohort included 513,862 discharges; the overall c-statistic was 0.688, 95% CI [0.6766, 0.6997], and the Brier score was 0.12. There were 80,987 readmissions (15.8%), compared with 15.9% when the pandemic period was included. The 2020 readmission rate was 16.0% with the pandemic period excluded, similar to 16.5% with it included.

Table 2 Risk Score Performance by Cleveland Clinic Health System Hospital Site, Medical/Surgical, and Diagnosis Categories
Figure 1

Calibration plot: observed vs. mean predicted readmission risk for Cleveland Clinic Health System medical and surgical discharges April 2017–September 2020.

Figure 2

Calibration plot: observed vs. mean predicted readmission risk for Cleveland Clinic Health System hospital sites April 2017–September 2020.

Figure 3

Calibration plot: observed vs. mean predicted readmission risk for Cleveland Clinic Health System discharge diagnosis categories April 2017–September 2020.

The subgroup analysis of readmission risk score variables identified as contributing to an increased score included 126,269 discharges in 2019. Because a discharge could list more than one variable, category counts can exceed the number of discharges. The following categories of variables were listed (Appendix Table 6): utilization (N = 261,204), laboratory results (N = 246,679), comorbidities and medications (N = 244,799), insurance (N = 125,142), discharge disposition to home (N = 67,093), health literacy (N = 14,025), and history of drug abuse (N = 7120).

DISCUSSION

Our work highlights the importance of continued study of risk models implemented in practice to determine their sustained relevance, to identify opportunities to capture additional contextual data for model improvement, and to determine the appropriate role of model-based risk assessment after initial implementation. It is critical to regularly evaluate the performance of risk models used in healthcare, both to ensure acceptable performance in potentially evolving patient populations, which may differ from the population on which the model was initially developed, and to assess performance across diverse populations to ensure equity.20

Our retrospective analysis of the implementation of the CCHS readmission risk score shows good overall performance across medical and surgical patients and diverse health system hospital sites over 3 years of implementation in our health system EMR. The model is generally well calibrated but over-predicts risk for patients with higher readmission risk scores, both overall and in certain diagnosis groups. COVID-19 is a new diagnosis category that could not have been present in the training data for the risk score, yet the readmission risk score performed acceptably for it.

Readmission risk prediction performed generally consistently across hospital sites, but with clear outliers (hospitals B and E) showing under- and over-prediction. These sites require evaluation of their implementation context to better understand the variable performance, including possible differences in patient characteristics not captured by the prediction model or other site-specific issues. Performance also varied across primary diagnosis categories, which is consistent with previous work at our health system on the impact of the primary diagnosis on hospitalization outcomes;21 this variation may have implications for diagnosis-specific post-discharge planning.

The value of readmission risk prediction lies in its potential to improve care delivery.22,23 Our subgroup analysis of variables contributing to a higher risk score identified potentially modifiable variables, including discharge disposition to home and health literacy. Future transitional care efforts could focus on these variables in addition to addressing variation by hospital site and primary diagnosis. For example, we previously reported the association with subsequent healthcare utilization of a CCHS transitional care program of post-discharge home visits, which was successfully implemented using a high CCHS readmission risk score as one eligibility criterion.24 Further efforts to provide support at home after discharge may thus be promising.

Regarding transitional care program eligibility, an interesting finding of our study is that while model performance is similar across race categories overall, readmission risk is over-predicted for White patients compared with Black patients at higher readmission risk scores. Because the readmission risk score is used for risk stratification when offering transitional care programs, this could lead to programs being offered to patients whose actual readmission risk is lower than predicted. The same applies to the over-prediction at higher risk scores overall and for certain hospital sites and diagnosis categories. Notably, clinician assessment of readmission risk may be a valuable addition to readmission risk prediction,25,26 and the readmission risk score is currently only one criterion for program offering: our health system programs are also open to referral by our clinicians and nurse care coordinators regardless of risk score. This mitigates to some extent the concern that using the risk score alone could exacerbate disparities in program eligibility or offering by race, site of care, or diagnosis category.

Even when programs are appropriately offered and reimbursement is available, with utilization increasing nationally in recent years,27 acceptance of transitional care programs may vary,24,28,29 and addressing both medical and social needs30 after discharge will be required to affect longer-term outcomes. Future work should focus not only on using the risk score to inform resource allocation for post-discharge initiatives and on assessing subsequent clinical outcomes, but also on implementation strategies to improve uptake of the programs offered. For example, recent experience suggests that telehealth may be a promising way to deliver transitional care interventions.31

Limitations of our analysis include that the readmission risk score is calculated from available CCHS EMR data, and 30-day readmissions are captured only if they occurred within the CCHS. In addition, changes to the patient’s medical record, such as the discharge disposition (one of the variables in the risk score), could be made after the final readmission risk score was filed. While our subgroup analysis captured how often each predictive variable contributed to an elevated risk score displayed to the provider, we could not assess the effect of these variables on provider actions at the time of discharge.

Our results suggest that our readmission risk score continues to perform well overall, and they call for further study of variation in performance by hospital site and primary discharge diagnosis and for continued assessment of performance by race and ethnicity in order to optimally inform our ongoing transitional care efforts.