Background

Hospital readmissions represent a significant expense within the US health system, accounting for $17 billion of preventable healthcare costs [1]. The Hospital Readmissions Reduction Program (HRRP) penalizes hospitals with higher excess 30-day readmission rates by reducing Medicare reimbursement. Consequently, many groups have published predictive models to identify which patients are at high risk for readmission.

The HOSPITAL Risk Score (HRS), introduced by Donzé et al. in 2013, is a commonly used predictive model for readmissions that has been internationally validated [2, 3]. The score consists of 7 factors whose initials spell “HOSPITAL”: Hemoglobin, discharge from an Oncology service, Sodium level at discharge, any coded Procedure during the hospital stay, Index admission Type, number of previous Admissions in the prior year, and Length of stay. However, the score’s ability to predict readmission risk in the real world has been questioned because it uses exclusively clinical factors and does not include social determinants of health (SDOH), which are known contributors to readmission risk [4]. Similarly, when developing metrics for expected readmission rates for individual hospitals, the Centers for Medicare and Medicaid Services (CMS) only included age and gender as non-clinical risk factors for readmission [5].

Authors have recommended adjustment of readmission algorithms to include SDOH to improve their predictive accuracy [6]. By ignoring SDOH in the determination of readmission rates, they argue, CMS may unfairly penalize hospitals that care for the most vulnerable Americans [7]. To ameliorate this, CMS recently adjusted its algorithm to compare readmission rates only between hospitals with similar proportions of low-income patients [8]. This effort resulted in fewer safety-net hospitals being penalized. While this is a step in the right direction, hospitals that serve patients with unfavorable SDOH still lack a tool to reliably predict which patients are at highest risk for readmission.

The purpose of this study was to assess whether the addition of SDOH to the HRS improves its predictive ability. We hypothesized that the HRS may be improved by integrating more data, specifically pertaining to non-clinical SDOH intrinsic to patients or their communities.

Methods

Patient population and study design

We queried a dataset containing all adult patients admitted to our center from 2014 to 2016. As this study measured readmission back to our center, we sought to avoid confounding by excluding patients who lived outside of the city of Chicago, those who were discharged to any location aside from home, and those who lived in very sparsely populated areas of the city.

Data sources and measures

We calculated the HRS for each patient using data available in our electronic health record (EHR) according to the method described by Donzé [3]. Points were assigned and summed for the following components of the HRS: Hemoglobin < 12 g/dL (1 point), discharge from an Oncology service (2 points), Sodium level at discharge < 135 mEq/L (1 point), any coded Procedure during the hospital stay (1 point), urgent or emergent Index admission Type (1 point), number of previous Admissions in the prior year (0–1: 0 points; 2–5: 2 points; > 5: 5 points), and Length of stay ≥ 5 days (2 points). Additional variables extracted from the EHR included age, gender, race, ethnicity, laboratory values, vital signs, number of prior readmissions and emergency department (ED) visits, and comorbidities. Fifteen census tract-level SDOH variables which comprise the Social Vulnerability Index (SVI) were obtained from the CDC, and the census tract-level violent crime rate was obtained from the City of Chicago Data Portal. Neighborhoods were then grouped into sociodemographic clusters using classifications obtained from a published cross-sectional spatial analysis using data from the US Census Bureau [9]. This method provided an objective and evidence-based way to group neighborhoods by common SDOH. The Census block group-level Area Deprivation Index (ADI) was obtained from the University of Wisconsin Neighborhood Atlas [10], and the Census tract-level Hardship Index (HI) was obtained from the City of Chicago Data Portal.
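The point assignments listed above can be sketched as a small scoring function. This is an illustrative sketch only, not the authors' code: the `Admission` record and its field names are hypothetical, and the cut-offs are copied from the description in this section.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical field names; the thresholds below follow the text above.
    hemoglobin_g_dl: float
    oncology_discharge: bool
    sodium_meq_l: float
    any_procedure: bool
    urgent_or_emergent: bool
    prior_year_admissions: int
    length_of_stay_days: int


def hospital_score(a: Admission) -> int:
    """Sum the HRS points for one index admission."""
    score = 0
    if a.hemoglobin_g_dl < 12:        # Hemoglobin < 12 g/dL
        score += 1
    if a.oncology_discharge:          # discharge from an Oncology service
        score += 2
    if a.sodium_meq_l < 135:          # Sodium < 135 mEq/L at discharge
        score += 1
    if a.any_procedure:               # any coded Procedure during the stay
        score += 1
    if a.urgent_or_emergent:          # urgent or emergent Index admission Type
        score += 1
    if a.prior_year_admissions > 5:   # previous Admissions: > 5 -> 5 points
        score += 5
    elif a.prior_year_admissions >= 2:  # 2-5 -> 2 points; 0-1 -> 0 points
        score += 2
    if a.length_of_stay_days >= 5:    # Length of stay >= 5 days
        score += 2
    return score


example = Admission(
    hemoglobin_g_dl=11.0,
    oncology_discharge=False,
    sodium_meq_l=140.0,
    any_procedure=True,
    urgent_or_emergent=True,
    prior_year_admissions=3,
    length_of_stay_days=6,
)
example_score = hospital_score(example)
```

With these cut-offs the score ranges from 0 to 13. The example patient scores 1 (hemoglobin) + 1 (procedure) + 1 (admission type) + 2 (prior admissions) + 2 (length of stay) = 7.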

We considered all admissions to our center within the study period as index admissions. Readmissions were defined as any additional admission to our center within 30 days of an index admission. As such, some admissions served as both an index themselves and a readmission for a prior index. The outcome of 30-day readmission was defined dichotomously as the presence or absence of one or more readmissions within 30 days of an index admission.
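The outcome definition above can be illustrated with a short sketch that flags each index admission followed by another admission for the same patient within 30 days. The record layout (`patient_id`, `admit`, `discharge`) is hypothetical. Measuring the window from the discharge date of the index admission is an assumption, since the text specifies only "within 30 days of an index admission".

```python
from collections import defaultdict
from datetime import date


def flag_30_day_readmissions(admissions, window_days=30):
    """Return one boolean per admission: True if the same patient was
    admitted again within `window_days` of this admission's discharge.

    Assumption: the window starts at discharge of the index admission.
    """
    by_patient = defaultdict(list)
    for idx, adm in enumerate(admissions):
        by_patient[adm["patient_id"]].append(idx)

    flags = [False] * len(admissions)
    for indices in by_patient.values():
        # Order each patient's admissions chronologically.
        indices.sort(key=lambda i: admissions[i]["admit"])
        # Every admission except the last can be an index for the next one,
        # so one stay may be both a readmission and an index admission.
        for pos, i in enumerate(indices[:-1]):
            next_adm = admissions[indices[pos + 1]]
            gap = (next_adm["admit"] - admissions[i]["discharge"]).days
            flags[i] = 0 <= gap <= window_days
    return flags


records = [
    {"patient_id": "A", "admit": date(2015, 1, 1), "discharge": date(2015, 1, 5)},
    {"patient_id": "A", "admit": date(2015, 1, 20), "discharge": date(2015, 1, 25)},
    {"patient_id": "A", "admit": date(2015, 4, 1), "discharge": date(2015, 4, 3)},
    {"patient_id": "B", "admit": date(2015, 2, 10), "discharge": date(2015, 2, 12)},
]
readmitted = flag_30_day_readmissions(records)
```

In this toy data, patient A's first stay is flagged because the second stay begins 15 days after discharge. The second stay is both a readmission for the first and an index admission in its own right, and it is not flagged because the third stay begins 66 days later.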

Statistical analysis

Continuous variables were summarized as medians with interquartile ranges (IQR), a choice made after visualizing their distributions, while categorical variables were expressed as frequencies and percentages. A Spearman rank correlation test was used to assess multicollinearity among clinical and social variables (Fig. S1). Since social variables were the variables of interest for the study and they showed multicollinearity, they were grouped into components using Principal Component Analysis (PCA). Cronbach alpha tests were performed to confirm the internal consistency of each component; each component had a Cronbach alpha value above 0.57. Bartlett’s test of sphericity (p < 0.001) and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (MSA = 0.85) were used to further confirm adequate sample size and correlation between the social variables. Variables included in the PCA were percentages for poverty, unemployment, per capita income, disability, single parent households, minority, no vehicle, no high school diploma, age 65 years and above, limited English, crowding, multiunit living, and violent crime rate. After scaling the data, PCA with varimax rotation was performed on unsampled, randomly sampled, gender-stratified, and disease-stratified datasets. A scree plot was used to determine the optimal number of components needed to explain the total variance. A four-component PCA solution containing social variables with a PCA loading cutoff of 0.62 was found to explain 80% of the total variance. In addition, a parallel analysis was used to confirm the use of a four-component PCA. Spearman correlation analysis was performed on the components to confirm their independence. Based upon the items in each component, the components were named as follows: low income, no high school diploma, no vehicle and multiunit living, and age 65 years and above or disabled.
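As an illustration of the internal-consistency check mentioned above, Cronbach's alpha for a component with k items is k/(k − 1) × (1 − Σ item variances / variance of the summed score). The sketch below computes it in plain Python. It is not the authors' implementation, and the text does not state whether alpha was computed on raw or standardized items; this version uses the values as given.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` is a list of k columns, each holding one value per observation
    (for example, one value per census tract). Sample variances are used.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items must have the same number of observations")

    # Summed score per observation across all items.
    totals = [sum(col[row] for col in items) for row in range(n)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: two perfectly correlated items measured on different scales.
alpha_two_items = cronbach_alpha([[2, 4, 6, 8], [1, 2, 3, 4]])
```

Identical items give an alpha of 1. In the toy example the two items are perfectly correlated but have unequal variances, so alpha is 8/9 rather than 1, which shows that unstandardized alpha is sensitive to item scale.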

Since most patients at our center reside in the extreme poverty and suburban affluent clusters, we randomly sampled 600 patients from each of those clusters to create a balanced dataset. A multivariable linear regression was used to test for significant associations with the HOSPITAL score. The independent variables were the four PCA-derived social components (low income, no high school diploma, no vehicle and multiunit living, and age 65 years and above or disabled) and the comorbidities congestive heart failure (CHF), valvular heart diseases, hypertension, diabetes mellitus, renal diseases, liver diseases, chronic obstructive pulmonary disease (COPD), atrial fibrillation (AF), dyslipidemia, and coronary artery disease. Receiver operating characteristic (ROC) curve analysis was used to determine the c-statistic for models with HRS and social variables and HRS without social variables. Statistical significance was defined as a p-value < 0.05 for two-tailed tests. Data were analyzed using RStudio version 3.5.1 (RStudio: Integrated Development for R, RStudio, Inc. 2015, Boston, MA). Statistical models were fitted using these packages in R: psych (version 2.0.7), corrplot (version 0.84), FactoMineR (version 2.3), and ade4 (version 1.7).
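The c-statistic used to compare models equals the area under the ROC curve: the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison on made-up data. It only illustrates the definition and is not the authors' R workflow.

```python
def c_statistic(scores, outcomes):
    """Concordance (c) statistic for a risk score against a binary outcome.

    `scores` are predicted risks or point scores; `outcomes` are 1 for
    readmitted and 0 for not readmitted. Ties in score count as 0.5.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Made-up scores and outcomes, for illustration only.
toy_scores = [1, 3, 5, 7, 7, 2]
toy_outcomes = [0, 0, 1, 1, 0, 1]
toy_c = c_statistic(toy_scores, toy_outcomes)
```

Here 5.5 of the 9 readmitted/non-readmitted pairs are concordant, giving a c-statistic of about 0.61. The pairwise loop is quadratic in the number of patients, so a rank-based computation would be preferable for a dataset of the size analyzed in this study.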

Multiple sensitivity analyses were performed by repeating our analysis after replacing the SVI with the HI and again with the ADI. Variables within the HI include crowding, poverty, unemployment (age 16 and above), no high school diploma (age 25 and above), age 18 and under or 64 and above, and per capita income. A two-component PCA solution containing social variables with a PCA loading cutoff of 0.6 was found to explain 83% of the total variance. Based upon the variables in each component, the components were named low income and no high school diploma. A multivariable linear regression was used to test for significant associations with the HOSPITAL score using low income, no high school diploma, CHF, valvular diseases, hypertension, diabetes mellitus, renal diseases, liver diseases, COPD, AF, dyslipidemia, and coronary artery disease as independent variables. ROC analysis was used to determine the c-statistic for each model.

For the ADI, a multivariable linear regression was used to test for significant associations with the HOSPITAL score using the state ADI ranking, CHF, valvular diseases, hypertension, diabetes mellitus, renal diseases, liver diseases, COPD, AF, dyslipidemia, and coronary artery disease as independent variables. ROC analysis was used to determine the c-statistic for each model.

Results

Participant characteristics

A total of 54,215 records were queried and 37,105 participants met the inclusion criteria (Fig. 1). Median age of patients was 53 years (IQR 33–67 years). The majority of patients were female (63.8%), African-American (80.3%), not Hispanic or Latino (94.0%), and resided in the extreme poverty cluster (63.2%). The median household income was $32,401 (IQR $27,091 - $40,587). Clinical and social variables are summarized in Table 1.

Fig. 1

Patient Inclusion: Schema depicting inclusion and exclusion criteria for our study. All patients treated at our center 2014–2016 were queried. Patients living outside Chicago, patients discharged to any place other than home, and patients living in very sparsely populated areas of the city were excluded

Table 1 Participant Characteristics

Outcomes

The c-statistic for the HRS predicting 30-day readmission in our dataset was 0.735, which is similar to the published value (0.72) [2]. However, the addition of SDOH to create the “social HRS” did not improve the predictive power (c-statistic = 0.713, Fig. 2). This finding persisted in the balanced dataset as well (0.721), suggesting that over-representation of patients living in extreme poverty at our center was not the cause of the negative results. This finding further persisted when patients were stratified by presence or absence of a recent admission within 30 days prior to the index admission (Fig. S2a-b) and when the ADI or HI were substituted for the SVI (Fig. S3a-b). Rather, several SDOH components and comorbidities were significantly associated with the HRS: no high school diploma (β = 0.062, p < 0.001), no vehicle and multiunit living (β = −0.060, p < 0.001), CHF (β = 0.142, p < 0.001), valvular disease (β = 0.480, p < 0.001), diabetes mellitus (β = 0.093, p < 0.005), renal disease (β = 0.740, p < 0.001), liver disease (β = 0.688, p < 0.001), COPD (β = 0.345, p < 0.001), AF (β = 0.169, p < 0.001), and dyslipidemia (β = −0.278, p < 0.001). PCA component scores are shown in Tables S1 through S12.

Fig. 2

ROC Analysis for HRS and Social HRS: Addition of SVI to the HRS did not improve predictive performance

Using components from PCA in three separate subanalyses, we found that patients who were disabled or over 65 years of age had a higher HRS than those who were younger and not disabled if they had coronary artery disease (β = 0.31, p < 0.001), liver disease (β = 0.32, p < 0.05), or pulmonary disease (β = 0.18, p < 0.001). Similarly, low-income patients with cardiac valvular disease (β = 0.37, p < 0.005) and obesity (β = 0.12, p < 0.05) also had a higher HRS than similar higher earners. Among females, those who were low income (β = 0.037, p < 0.05), those with no vehicle and living in multiunit housing (β = −0.033, p < 0.05), and those with CHF (β = 0.136, p < 0.005), valvular disease (β = 0.493, p < 0.001), renal disease (β = 0.819, p < 0.001), liver disease (β = 0.619, p < 0.001), COPD (β = 0.251, p < 0.001), or dyslipidemia (β = −0.262, p < 0.001) had a higher HRS. Linear regression estimates are shown in Tables S13-S24. Overall, SDOH explained 0.2% of the HRS.

Analysis using the HI showed that patients with no high school diploma (β = 0.069, p < 0.001), CHF (β = 0.139, p < 0.001), valvular disease (β = 0.481, p < 0.001), diabetes mellitus (β = 0.096, p < 0.005), renal disease (β = 0.1740, p < 0.001), liver disease (β = 0.697, p < 0.001), COPD (β = 0.399, p < 0.001), AF (β = 0.163, p < 0.001), and dyslipidemia (β = −0.277, p < 0.001) had a higher HRS. In another subanalysis, low-income patients had a higher HRS if they were obese (β = 0.129, p < 0.005) and had valvular heart disease (β = 0.372, p < 0.005). Among females, those who were low income (β = 0.025, p < 0.05), those with no high school diploma (β = 0.041, p < 0.005), and those with valvular disease (β = 0.4935, p < 0.001), diabetes mellitus (β = 0.045, p < 0.001), renal disease (β = 0.819, p < 0.001), liver disease (β = 0.620, p < 0.001), COPD (β = 0.250, p < 0.001), or dyslipidemia (β = −0.265, p < 0.001) had a higher HRS. PCA component scores for the HI are shown in Tables S25-S36. Linear regression estimates are shown in Tables S37-S48. Similar to the analysis using SDOH, the HI explained 0.2% of the HRS.

Discussion

In this study, we sought to determine if the predictive performance of the HRS could be improved by integrating SDOH into its structure (Social HRS). Surprisingly, we found that adding SDOH as variables did not improve the HRS’ performance. Rather, it appears that patients with poor SDOH are clinically more ill and this increased illness is already captured in the HRS.

In support of this conclusion, we found that patients who had unfavorable SDOH, such as older age, disability, low socioeconomic status (SES), lack of a vehicle, and multiunit living, together with chronic diseases such as coronary artery disease, liver disease, and pulmonary disease, had a significantly higher HRS. These conditions have high morbidity and mortality at baseline, both of which may be exacerbated by unfavorable SDOH, leading to more frequent readmissions. However, even in these populations, SDOH only explained 0.2% of the HRS. SDOH, by definition, are independently associated with health outcomes and life expectancy [11]. Patients with unfavorable SDOH tend to have more chronic medical conditions and present to the hospital with more advanced disease [12, 13]. Thus, the clinical factors included in the HOSPITAL score, such as hemoglobin, number of admissions in the last year, and length of stay, likely already reflect the effects of SDOH. Therefore, addition of SDOH to the HRS does not appear to improve its predictive power.

These findings are consistent with a study by Bernheim et al. in which adjusting for SES did not affect estimated readmission rates [14]. Similarly, a study out of Ontario found no link between SES and readmission [15]. Our study builds on this prior work by demonstrating for the first time that the HRS is objectively higher in patients with poor SDOH and that addition of SDOH to the HRS is not necessary for predictive accuracy.

Notably, programs such as the Coalition project that attempted to reduce admissions among high utilizers with interventions targeting SDOH have had limited impact on readmission rates [16]. The Ontario results [15] were obtained in the context of a universal health care system, which may have mitigated issues with access to healthcare. While there are disparities in healthcare access in the United States and within the population our institution serves, our study was specifically focused on patients who were admitted to the hospital and therefore did have access to healthcare. Further analysis could include patients without insurance or with otherwise limited access to healthcare.

These results do not imply that SDOH do not influence readmission rates. Multiple studies have demonstrated that SDOH such as race, socioeconomic status, and education contribute to a higher risk of readmission [17, 18, 19, 20]. A study by Barnett et al. found that half of the difference in readmission rates between hospitals with the highest and lowest rates of readmission could be explained by patient characteristics outside of the hospital’s control [6]. Additionally, a meta-analysis by Van Walraven et al. found that predictive models for readmission that included SDOH in their algorithms were able to identify twice as many avoidable readmissions as those that used only clinical factors [18, 21]. These models have been found to be weaker when applied to patient populations with poor SDOH, which potentially makes models like the HRS less useful in safety-net hospitals [22].

The mechanisms by which SDOH influence readmissions are complex and difficult to define. For example, for the cross-section of unmarried men with low incomes, Social HRS was lower than HRS. Thus, even though having a low SES is considered an unfavorable SDOH, within this intersection, patients were less likely to be readmitted within 30 days. This may be because unmarried men are less likely to interface with doctors. The 2017 MENtion it Survey by Cleveland Clinic showed that only 61% of men go to their doctor even after developing symptoms that they describe as “unbearable,” and that 83% of married women remind their husbands to attend annual checkups [23]. A qualitative study that interviewed physicians at seven hospitals with high readmissions rates found that most physicians asserted that readmissions were influenced by factors such as patient trust and willingness to participate as well as other social factors [24]. Additional patient attributes such as social support and personal resilience factors such as patient adaptability and biologic stress mechanisms also influence disease severity, which in turn influences readmission rates [25].

Our study has several limitations. First, we utilized census tract-level SDOH in this analysis. Individual-level SDOH are influenced but not entirely explained by neighborhood factors. Patient-level data may more accurately encapsulate resilience factors and lead to a different conclusion. The SVI was used in this manuscript because it is easily available at the census-tract level, well validated, and included in other community-level tools. Our findings were also similar when the ADI or HI were substituted for the SVI. The authors acknowledge that other factors such as legal status and environmental factors may alter the results and we believe further studies exploring these factors’ potential contribution to readmission risk should be undertaken.

Additionally, participants were studied at a single tertiary care center that serves a large population of urban poor as well as patients with advanced illnesses. Patients seen at our institution who have more favorable SDOH likely traveled a longer distance to our center and may have been self-selected due to the severity of their illness. These patients may have been on a trajectory toward frequent readmissions and similarly would have a higher HRS. To address this, we sampled a balanced dataset and found similar results. However, our dataset does not include patients from outside a metropolitan area, so our findings would likely not be generalizable to hospitals that serve more rural populations. This could be an area for further research.

While we have tested for multicollinearity among variables, pairwise correlation does not establish linear dependence among a set of variables, because such dependence often involves more than two variables at once. Nor does the correlation of two variables provide information about the relative importance of each variable. The authors acknowledge these limitations of our models. This study is further limited by the lookback period length (30 days). While similar results were obtained when the analysis was stratified by the presence or absence of an admission in the prior 30 days, it is possible that other lookback period lengths may produce different results.

Finally, this study examined patients admitted to our center and readmitted back to our center. We were not able to determine if patients were admitted to a different center and then readmitted here, or admitted here and then readmitted elsewhere. However, we have previously found that 95% of patients discharged from our center who require readmission are readmitted back to our center with only 5% readmitted elsewhere [26]. This ratio has been stable for many years at our center, including the time of the present study.

Conclusion

The addition of SDOH does not improve the predictive accuracy of the HRS. Rather, the effects of unfavorable SDOH manifest as overall worse health which is already captured in the HRS.