Hospital readmission shortly after discharge is increasingly recognized as a marker of inpatient quality of care and a significant contributor to rising healthcare costs [1,2]. Nearly one fifth of Medicare beneficiaries discharged from acute care hospitals are readmitted within 30 days, incurring additional costs of several billion dollars annually [3]. Although it remains unclear whether such readmissions are entirely preventable [4–6], there is good evidence that targeted interventions initiated before and/or shortly after discharge can decrease the likelihood of readmission [7–11]. Identifying patients at risk of readmission can guide efficient resource utilization and permit valid comparisons of hospital quality across institutions.

Previous studies that have examined risk factors for early hospital readmission have focused primarily on a single disease or condition [12–15], a single hospital site [16–18], or a specific patient population [19–22]. Of four large multi-hospital studies that modeled readmission risk in a diverse patient population, one studied patients discharged from Veterans Affairs hospitals [23], two were conducted in England and employed extensive information technology resources unavailable in most other countries [24,25], and the fourth utilized Medicare data to derive a highly predictive but difficult-to-use model incorporating 20 variables [26]. These studies yielded complex prediction models that used patient information not currently easily available in most hospitals and did not adequately assess the impact of patients’ social supports on readmission risk.

To address these gaps, we used data from the Multicenter Hospitalist (MCH) Study to identify patient-level factors significantly associated with early hospital readmission among general medicine patients hospitalized in six large academic medical centers. In addition, we aimed to create and internally validate a simple score-based prediction model to identify patients with significantly elevated readmission risk. We limited our analysis to patient information that could be easily collected within the first 48 hours of admission.



The MCH Study was a prospective multi-center trial designed to assess the impact of hospitalist care on patients admitted to the general medicine services of six academic medical centers [27–29]. Patients were enrolled from July 1, 2001 through June 30, 2003 at the following six sites: University of Chicago, University of California San Francisco, University of Iowa, University of Wisconsin, University of New Mexico, and Brigham and Women’s Hospital in Boston. The study was approved by each site’s institutional review board.

Patients were eligible for inclusion if they were 18 years of age or older and were admitted by a hospitalist or other internist to a general medicine service. Patients admitted specifically under the care of their primary care physician were excluded.

Data Collection

Detailed sociodemographic and health information was collected during a 15–20 minute intake interview conducted by a research assistant, generally within 48 hours of admission. Additional data were obtained from each site’s administrative records and a telephone interview of patients or their proxies conducted 30 days after discharge. These data were matched with the National Death Index to ascertain 30-day mortality from the date of hospital discharge.

Administrative data were used to estimate length of stay and to ascertain age, sex, and insurance status. Intake interviews were used to administer the adult lifestyles and function interview mini-mental state exam (ALFI-MMSE) [30,31] and the Medical Outcomes Study Short Form 12 (SF12) questionnaire [32], and to gather data on social supports, prior healthcare utilization, and health condition, including comorbidities for calculating a self-reported Charlson index [33].


We retrospectively selected a subset of enrolled MCH Study patients for our analysis. First, we included only patients who could be interviewed in the hospital, either directly or through a proxy, and who could therefore provide timely data for our predictive models. We then excluded patients with a length of stay greater than 30 days to reduce bias from outlier effects. Next, we excluded patients not discharged to home, i.e., patients who died during hospitalization, were transferred to another healthcare facility, or left against medical advice. Lastly, we excluded patients who died within 30 days of discharge.
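The selection steps above can be read as a sequence of filters applied to each enrolled hospital stay. The following is a minimal sketch under stated assumptions: the `Stay` record, its field names, and the disposition codes are hypothetical and are not taken from the MCH Study data dictionary.

```python
from dataclasses import dataclass


@dataclass
class Stay:
    """Hypothetical record for one index hospitalization."""
    interviewed_in_hospital: bool        # patient or proxy completed the intake interview
    length_of_stay_days: int
    discharge_disposition: str           # assumed codes: "home", "died", "facility", "ama"
    died_within_30d_of_discharge: bool


def eligible_for_analysis(stay: Stay) -> bool:
    """Apply the four selection criteria in the order described in the text."""
    return (
        stay.interviewed_in_hospital
        and stay.length_of_stay_days <= 30          # exclude stays longer than 30 days
        and stay.discharge_disposition == "home"    # exclude deaths, transfers, AMA
        and not stay.died_within_30d_of_discharge
    )
```

Because the criteria are combined with a logical AND, the order of the filters affects only the counts reported at each step of a flow diagram, not the final cohort.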

Outcome Variable

We defined hospital readmission as all-cause admission to an acute care hospital within 30 days of discharge from the index hospitalization. We identified readmissions in two ways: using administrative data from the study sites and from patient response to a specific question regarding hospital readmission included in the 30-day telephone follow-up. To minimize recall bias, administrative data were used to identify readmissions to each index hospital, while self-reported data were only used to identify readmissions to non-index hospitals.
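The two-source outcome definition can be sketched as follows. The function and argument names are hypothetical, and treating day 30 as inclusive is an assumption; the text specifies only "within 30 days".

```python
from typing import Optional


def readmitted_within_30_days(
    days_to_index_hospital_readmission: Optional[int],
    self_reported_other_hospital_readmission: bool,
) -> bool:
    """Combine administrative and self-reported sources into one outcome.

    Administrative data decide readmissions to the index hospital;
    self-report is used only for readmissions to non-index hospitals.
    """
    admin_hit = (
        days_to_index_hospital_readmission is not None
        and 0 <= days_to_index_hospital_readmission <= 30
    )
    return admin_hit or self_reported_other_hospital_readmission
```

Under this rule, a patient's own report of returning to the index hospital is never consulted, which is how the design limits recall bias for the largest share of readmissions.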

Predictor Variables

We identified candidate patient factors likely to be associated with high readmission risk a priori from a survey of the relevant literature and grouped them into four natural categories as follows: (1) sociodemographic factors, including age, sex, self-reported race/ethnicity, self-reported total household income, education, and insurance status; (2) social support, including marital status, number of people living with the patient, having someone to help at home, and having a regular physician; (3) health condition, including self-reported 0–9 Charlson comorbidity index, self-reported 0–100 health rating, 0–100 SF12 physical and mental component scores, 0–22 ALFI-MMSE score, and limitations in activities of daily living (ADLs) and/or instrumental activities of daily living (IADLs); and (4) healthcare utilization, including number of admissions in the last year, length of stay of the current hospital admission, and whether, given the choice, the patient would stay an extra day in the hospital even if their doctor told them they were well enough to go home.

Statistical Analysis

The patient was the unit of analysis. Because of our large sample size, we chose a split-sample design to derive and internally validate our prediction model. We randomly selected two thirds of patients from each site and combined them to create a derivation cohort and subsequently combined the remaining one third of patients from each site to create a validation cohort [34].
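A site-stratified two-thirds split of this kind might be sketched as below. This is an illustration only; the function name, the `(patient_id, site)` input format, and the fixed seed are assumptions, and the analysis itself was performed in SAS.

```python
import random
from collections import defaultdict


def split_by_site(patients, derivation_fraction=2 / 3, seed=0):
    """Randomly split patients into derivation and validation cohorts within each site.

    `patients` is an iterable of (patient_id, site) pairs (hypothetical format).
    """
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    by_site = defaultdict(list)
    for patient_id, site in patients:
        by_site[site].append(patient_id)

    derivation, validation = [], []
    for site in sorted(by_site):         # deterministic site order
        ids = list(by_site[site])
        rng.shuffle(ids)
        cut = round(len(ids) * derivation_fraction)
        derivation.extend(ids[:cut])     # about two thirds of this site
        validation.extend(ids[cut:])     # the remaining third
    return derivation, validation
```

Splitting within each site, rather than across the pooled sample, keeps the site mix of the two cohorts nearly identical.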

To assess whether the candidate patient factors were significantly associated with hospital readmission, we fitted separate multivariable logistic regression models for each of the four categories of patient factors using data from the derivation cohort. We used P < 0.10 as the cutoff for assessing significance. Only factors noted to be significantly associated with readmission within their respective categories were included in the final regression model. Generalized estimating equations (GEE) were used to account for clustering by discharging physician, and hospital site was entered as a fixed effect in each of the models to minimize confounding [35].

When constructing the final model, factors that became non-significant at P > 0.05 were removed if their presence did not change the beta-coefficient for any other factor by more than 20%. We derived a scoring system by multiplying each beta coefficient by ten and rounding to the nearest integer; the integer values from all applicable factors were then added together to estimate a total score for each patient. We subsequently obtained score-based predicted probabilities of readmission by entering each patient’s risk score into a single-predictor logistic regression model and used the output from this model to determine score cutoffs for identifying patients within selected readmission risk levels (0–9%, 10–19%, 20–29%, and 30% or higher).
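The scoring procedure can be illustrated with a short sketch. All coefficient values, factor names, and the intercept and slope of the single-predictor model below are hypothetical placeholders, not the estimates reported in this study. Note also that Python's `round` resolves exact ties to the nearest even integer, which may differ from the tie-breaking rule used in the original analysis.

```python
import math


def points_from_betas(betas):
    """Multiply each beta coefficient by ten and round to the nearest integer."""
    return {name: round(beta * 10) for name, beta in betas.items()}


def total_score(points, present_factors):
    """Add the points for every factor that applies to the patient."""
    return sum(points[name] for name in present_factors)


def predicted_probability(score, intercept, slope):
    """Single-predictor logistic model: P(readmission) as a function of the score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def risk_band(probability):
    """Map a predicted probability to the four risk levels used in the text."""
    if probability < 0.10:
        return "0-9%"
    if probability < 0.20:
        return "10-19%"
    if probability < 0.30:
        return "20-29%"
    return "30% or higher"


# Hypothetical coefficients for illustration only.
example_betas = {"factor_a": 0.18, "factor_b": 0.42, "factor_c": 0.31}
example_points = points_from_betas(example_betas)
example_score = total_score(example_points, ["factor_a", "factor_c"])
```

Score cutoffs then follow by finding the smallest score whose predicted probability crosses each band boundary.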

We tested the performance of our model using data from the validation cohort. We assessed goodness of fit using the Hosmer–Lemeshow chi-square test [36] and model discrimination by measuring the C statistic, which is the area under the receiver operating characteristic (ROC) curve [37]. Because patients discharged to sub-acute or long-term care facilities are an important patient population but might have different predictors of readmission, we repeated our methodology in this population. We used SAS statistical software (Version 9.1; SAS Inc, North Carolina) to perform all analyses.
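For readers unfamiliar with these two measures, the sketch below computes them directly from their definitions. It is an illustration written for clarity rather than speed and is not the SAS procedure used in the analysis; the function names and the equal-size grouping rule are assumptions.

```python
def c_statistic(scores, outcomes):
    """Probability that a readmitted patient scores higher than a non-readmitted one.

    Ties count as one half. This equals the area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one readmitted and one non-readmitted patient")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


def hosmer_lemeshow_statistic(probabilities, outcomes, groups=10):
    """Chi-square statistic comparing observed and expected events per risk group."""
    # Sort by predicted probability only, so tied predictions keep their input order.
    pairs = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        variance = expected * (1.0 - expected / len(chunk))
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

A C statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. The Hosmer–Lemeshow statistic is conventionally referred to a chi-square distribution with the number of groups minus two degrees of freedom; the P-value computation is omitted here.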


Patient selection is described in Fig. 1. Of 13,903 patients who failed to complete the intake interview, 28% refused to sign informed consent, 52% were discharged before they or their proxies could be interviewed, 17% had been admitted during the previous month and were not re-interviewed, and 3% died in the hospital. The 10,946 patients selected for our analysis were randomly assigned to a derivation cohort of 7,287 patients and a validation cohort of 3,659 patients. There were no statistically significant differences in patient characteristics between the two cohorts (Table 1).

Figure 1. Patient selection.

Table 1 Patient Characteristics

Approximately 20% of patients in each cohort were older than 75 years, over 60% were not currently married, approximately one-quarter needed at least some help with their ADLs, and half had been hospitalized at least once in the preceding year. The mean rating for self-rated health (one month prior to admission) was 55 (standard deviation 25), and the median Charlson comorbidity index was 1 (inter-quartile range 0–2).

Of the 7,287 patients in the derivation cohort, 1,274 (17.5%) were readmitted within 30 days; 79% of these readmissions could be confirmed with administrative data from the index hospitals. Readmission rates varied from 16.1% to 17.9% among the different sites. Table 2 compares readmitted and non-readmitted patients and shows the results of each of the four sub-models used to derive the final model. The significant predictors included sociodemographic factors (age, income, insurance status), social support factors (marital status, having a regular physician), markers of health (Charlson comorbidity index, SF12 physical component score), and healthcare utilization factors (number of admissions in the last year, current length of stay greater than two days).

Table 2 Association of Patient Characteristics with 30-Day Hospital Readmission in the Derivation Cohort

In the final model, two previously significant predictors—age and income—lost statistical significance (Table 3). Removing either of these predictors did not change the beta coefficients of other predictors by more than 20%, and so both were removed from the final model. As a consequence of removing these two, SF12 physical component score became non-significant; however, removing this predictor caused a substantial change in the beta coefficients of remaining predictors and it was retained in the final model. Study site was retained as an obligatory confounder. The odds ratio for the site with the highest adjusted readmission rate was 1.40 (95% CI 1.09–1.79) compared with the site with the lowest rate; no other differences among sites were statistically significant.
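The removal rule applied here (drop a predictor that is non-significant at P > 0.05 only if doing so changes no other beta coefficient by more than 20%) can be expressed as a small check. The function names and all numeric values are hypothetical, and the sketch assumes that no retained coefficient is exactly zero.

```python
def max_relative_change(betas_with, betas_without):
    """Largest relative change in any remaining coefficient after dropping a predictor."""
    return max(
        abs(betas_without[name] - betas_with[name]) / abs(betas_with[name])
        for name in betas_without
    )


def can_remove(p_value, betas_with, betas_without, alpha=0.05, max_change=0.20):
    """True if the predictor is non-significant and its removal leaves other betas stable."""
    return (
        p_value > alpha
        and max_relative_change(betas_with, betas_without) <= max_change
    )


# Hypothetical coefficients from a model fitted with and without one candidate predictor.
with_candidate = {"candidate": 0.05, "other_a": 0.30, "other_b": 0.50}
stable_refit = {"other_a": 0.31, "other_b": 0.52}      # small shifts: removal allowed
unstable_refit = {"other_a": 0.40, "other_b": 0.52}    # 33% shift: predictor retained
```

In the first refit the candidate can be dropped; in the second it is retained despite being non-significant, which mirrors the handling of the SF12 physical component score.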

Table 3 Final Logistic Regression Model of Predictors of 30-Day Hospital Readmission

Points were assigned to each predictor as described in the Methods section, except four points were assigned for ‘3 admissions in last one year’ to allow for a monotonic function. Using a regression model based only on the scoring system, we were able to assign score cutoffs based on predicted readmission rates of 0–9%, 10–19%, 20–29%, and 30% or higher (Table 4). The 5.1% of patients with a score of 25 or higher had 30-day readmission rates of 32.6% and 28.9% in the derivation and validation sets, respectively, compared with a 30-day readmission rate of 16.4% in patients with a score below 25 (same rate in both cohorts).

Table 4 Comparison of Score Predicted and Observed Readmission Rates

The Hosmer–Lemeshow goodness of fit test yielded P-values of 0.44 and 0.23 for the derivation and validation cohorts, respectively, indicating good model fit. Discrimination of the model was only fair: the area under the ROC curve (AUC) was 0.65 in the derivation cohort and 0.61 in the validation cohort (Fig. 2).

Figure 2. Comparison of the receiver operating characteristic (ROC) curves for the derivation and validation cohorts. A. ROC curve for derivation cohort. B. ROC curve for validation cohort.

In a similar analysis of patients discharged to sub-acute or long-term care facilities, the only significant predictors of readmission were the number of hospital admissions in the preceding year and patient age (results not shown). The effects of Charlson comorbidity index and SF12 physical component score on readmission were similar in magnitude to those in patients discharged home but were not significant predictors because of wider confidence intervals. Hospital length of stay, marital status, having a regular physician, and insurance status were much less predictive of readmission.


Using data from the MCH Study, we were able to identify key patient-level predictors of early hospital readmission and derive and internally validate a parsimonious and easy-to-use model for assessing readmission risk in general medicine patients hospitalized for a variety of medical conditions and discharged home. Using seven easily available predictors, our model was able to identify 5% of patients with an approximately 30% risk of readmission within 30 days of discharge. Although the discriminative ability of our model is only fair, it still provides a useful and easily applicable tool for identifying high-risk patients who may require more intensive use of hospital resources designed to reduce readmission rates.

Several patient-level factors identified as significant predictors were known from the published literature, such as the number of hospital admissions in the preceding year and the Charlson comorbidity index [16–26]. It was somewhat surprising that marital status and having a regular physician were both positively associated with readmission risk. It is possible that the presence of social supports, such as a spouse, allows some frail patients to be discharged home who would otherwise be transferred to a subacute care facility (of note, when our model was applied to patients discharged to facilities, being married was a negligible predictor of readmission). Similarly, having a regular physician may be a marker of illness severity not captured by other predictors in our model. Another possibility is that having a spouse or regular physician may lead to earlier detection of clinical deterioration and/or a lower threshold for readmission, although the lack of a significant association between having help at home or living with someone and early readmission appears to argue against this. Nevertheless, these explanations do not diminish the usefulness of these factors as predictors in our model. Future studies should validate these findings and compare model discrimination with and without these social factors.

Age is one predictor that has been noted to have a significant association with readmission risk in several studies but was non-significant in our final model. However, Medicare as primary insurance was a significant predictor in our final model and this variable incorporates age, which may partly explain these findings. Also, other predictive factors more commonly found in the elderly may already have been captured in our model, including Charlson comorbidity index and hospitalizations during the preceding one year.

In a supplementary analysis of patients discharged to sub-acute and long-term care facilities, we found that sociodemographic and social support variables were much less predictive of readmission. This may be because these factors mediate readmission through access to care and performance of self-care activities, and these factors are much less important when access to care is essentially continuous.

The performance characteristics of our model are only fair. However, they are comparable to the discriminative ability of other commonly cited readmission risk prediction models. Our AUC of 0.61 in the validation cohort is identical to the AUC of the Pra (probability of repeated admission) model [21,38]. Similar to our model, the Pra model was able to identify a small group (7.2%) of patients at high risk of readmission (41.8% with two or more admissions over 4 years). The one well-known model with a high AUC (0.83) included 20 variables with eight interaction terms, raising the possibility that it was overfit for the population used to derive it and limiting its usefulness as a practical clinical tool [26]. Two other high-performing but similarly complicated models with AUC ranging from 0.68 to 0.75 were derived from the United Kingdom’s National Health Service database and are not usable in the U.S. due to their reliance on data from national electronic medical records [24,25].

Why is it that so few statistical models derived to date are capable of reliably predicting readmission risk in a diverse population of medical inpatients? There are several possibilities. First, several important and previously unknown predictors may be missing from existing models. For example, we now know that adverse drug events (ADEs) are an important patient safety problem following hospital discharge, and existing models, including the one derived for this study, do not include many of the recently identified predictors of post-discharge ADEs (such as the number and classes of preadmission medications and patients’ knowledge of their medications) [39,40]. Second, generic markers of illness severity may be less predictive when evaluating populations with diverse medical conditions in contrast to disease-specific markers such as those for congestive heart failure [12–14]. Third, it is plausible that readmission risk has a weaker correlation with patients’ clinical characteristics and social circumstances than it does with the processes of care during hospitalization and discharge and with post-discharge care [41]. That adjusted readmission rate varied by site is one piece of evidence in favor of this hypothesis. Thus, rather than identifying a single group of patients at high risk of readmission and focusing interventions on them, it may be more efficient to ensure that all patients receive a standardized set of discharge processes [11]. Alternatively, it may be worth identifying different types of high-risk patients and customizing interventions accordingly (e.g., a focused pharmacist intervention for patients at high risk for ADEs; close follow-up for patients with certain high-risk medical conditions).

Our study has several limitations. Although it was conducted at six academic medical centers in different states and included detailed information on a sizeable and diverse patient population, caution should be exercised in generalizing its findings to small, rural, and/or community hospitals. A sizeable proportion of screened patients could not be included in our study, further limiting generalizability; analyzing readmissions in this population, especially in patients who were discharged before they could be interviewed, may yield additional insights into the reasons behind early hospital readmission in patients with short lengths of stay. Furthermore, we excluded patients who died within 30 days of discharge because predictors of death may be somewhat different than predictors of readmission. Since we did not adjudicate whether each readmission was elective versus unplanned, we could not exclude purely elective readmissions; however, based on our collective experience we would expect the rate of elective readmissions on general medicine services to be low. Lastly, we were unable to confirm readmissions to non-study hospitals, and patients are known to underreport hospital readmissions [42]. However, the short time-frame (only 30 days) for measuring readmissions and use of administrative data to confirm readmissions to study hospitals minimizes the potential impact of recall bias.

In summary, a prediction model derived and internally validated in a large multi-center cohort of general medicine inpatients successfully identified a small proportion of patients at elevated risk of hospital readmission within 30 days of home discharge. While interventions could be designed and tested on this population, more work is needed to identify additional factors that impact post-discharge health outcomes, optimize the discharge process for all patients, and create interventions tailored to patients’ needs in order to prevent potentially avoidable readmissions.