Bundled Payments for Care Improvement (BPCI) was a five-year initiative of the Centers for Medicare & Medicaid Services (CMS), conducted from 2013 to 2018 under the authority of the CMS Innovation Center. It tested whether incentives to reduce Medicare payments for an episode of care, triggered by a hospitalization and extending up to 90 days after hospital discharge, could lower Medicare payments while maintaining quality of care.1,2 Under Model 2 of the initiative, hospitals and physician group practices (PGPs) could participate in any of 48 clinical episodes. An episode was attributed to a participating PGP when a physician in the practice was the attending physician or operating surgeon listed on the inpatient claim; starting in 2016, a physician from the same practice also had to treat the patient at least once after hospital discharge during the episode. If Medicare payments for the episode exceeded a target price, the BPCI participant was responsible for paying Medicare the difference; if payments were below the target price, the participant received the difference. Participants therefore had incentives to reduce service use to lower episode payments, raising concerns that such reductions could jeopardize quality of care or patient experience. Assessing the impact of bundled payment models is important given the continued industry-wide shift toward value-based care.3,4,5,6 A 2018 survey of commercial payers indicated that 20% of their business was covered under bundled payments, a proportion expected to increase.7 In 2018, CMS introduced the BPCI Advanced model, which builds on BPCI Model 2.8
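The attribution and reconciliation rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CMS's actual methodology: the `Episode` record and its field names are hypothetical, we assume the 2016 follow-up criterion applies by the year of the triggering admission, and the reconciliation ignores any discounts, risk adjustment, or gain and loss limits the real model may have applied.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical, simplified record of one BPCI Model 2 episode."""
    admission_year: int        # year of the triggering hospitalization (assumed basis for the 2016 rule)
    inpatient_claim_pgp: str   # PGP of the attending physician or operating surgeon on the inpatient claim
    post_discharge_pgps: set = field(default_factory=set)  # PGPs that treated the patient after discharge
    actual_payment: float = 0.0  # total Medicare payments for the episode
    target_price: float = 0.0    # target price for the episode


def is_attributed(episode: Episode, pgp: str) -> bool:
    """Apply the two attribution criteria described in the text."""
    on_inpatient_claim = episode.inpatient_claim_pgp == pgp
    if episode.admission_year < 2016:
        return on_inpatient_claim
    # From 2016 on, the same practice must also treat the patient after discharge.
    return on_inpatient_claim and pgp in episode.post_discharge_pgps


def reconciliation_amount(episode: Episode) -> float:
    """Positive: Medicare pays the participant; negative: the participant repays Medicare."""
    return episode.target_price - episode.actual_payment


# Hypothetical 2016 episode: the practice is on the inpatient claim but has no post-discharge visit.
ep = Episode(2016, "PGP-A", {"PGP-B"}, actual_payment=26_500.0, target_price=25_000.0)
print(is_attributed(ep, "PGP-A"), reconciliation_amount(ep))
```

In this simplified version the sign convention carries the incentive: spending below the target yields a positive amount paid to the participant, and spending above it yields a negative amount owed to Medicare.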

Most research to date on BPCI has focused on hospitals and post-acute care providers.9,10,11,12,13,14 Only two peer-reviewed studies have reported findings on PGP participation in BPCI, and both focused on total joint arthroplasty.15,16 Under BPCI Model 2, 272 PGPs initiated nearly 300,000 episodes of care in the first four years of the initiative.17 BPCI-participating PGPs reduced average payments for 16 of the 21 clinical episodes with sufficient sample size for evaluation; seven of these reductions were statistically significant.17 Claims-based quality measures indicated that quality of care was maintained for PGP-initiated episodes relative to comparison episodes.17 Such measures, however, do not capture quality of care holistically and are especially limited in reflecting patient perspectives.18,19,20 Patient-reported outcomes can help identify effects of bundled payment initiatives, such as those arising from premature discharge or inadequate post-acute care, that cannot be measured using claims data.21

For hospital-initiated episodes in BPCI Model 2, there were few differences in patient-reported functional status or care experiences between BPCI and comparison survey respondents,12 but the impact of BPCI on these outcomes may differ for patients with PGP-initiated episodes. Participating PGPs may have different resources and respond differently to bundled payments than participating hospitals. For example, PGPs may be more likely to have ongoing relationships with patients, which could promote continuity of care.22,23,24 Hospitals, in contrast, may have greater resources to devote to enhanced care coordination.13,25 Additionally, hospitals and PGPs that are part of a health system may need to weigh the financial impact of their responses across the broader enterprise when redesigning clinical pathways for bundled payments.26,27

This study compares patient-reported functional outcomes and care experiences for Medicare fee-for-service beneficiaries in one of 18 clinical episodes initiated by BPCI Model 2 PGP participants, with the outcomes and experiences of a similar group of comparison beneficiaries. We separately analyzed the five highest-volume clinical episodes.

METHODS

Data

Survey Instrument

The BPCI beneficiary survey instrument contained 36 closed-ended, multiple-choice questions in five domains: changes in functional status, overall mental and physical health, care experiences and discharge planning, overall satisfaction with recovery, and personal characteristics. It included items adapted from validated survey instruments, including the CARE Tool,28 the National Health Interview Survey,29 the Short Form 36 Health Survey,30 and the Care Transitions Measure®.31 The survey underwent cognitive testing with a convenience sample of Medicare beneficiaries with recent hospital and post-acute care (PAC) experience. Because beneficiaries with BPCI episodes could be identified only after their hospitalization, it was not possible to survey them before the episode. Instead, we surveyed beneficiaries once, at the end of their episode, and asked them to report their functional status at two points in time: the week before their hospitalization and the day they completed the survey (which they received roughly 90 days after hospital discharge). Patients may not have recalled their pre-hospitalization functional status with complete accuracy,32,33 but some studies have found minimal recall bias up to three months after a major health event,34,35,36 and we would expect any recall bias to be similar for the intervention and comparison groups. The survey instrument is available in the supplementary appendix.

Survey Sample

We surveyed beneficiaries in the 18 highest-volume clinical episodes, which represented 89% of all BPCI Model 2 PGP episodes; we selected these episodes because we projected they would yield sufficient sample sizes. For each clinical episode, we drew a stratified random sample of intervention-group beneficiaries, defined as those with an episode attributed to a physician associated with a BPCI-participating PGP. Within each clinical episode, sampling cells were defined by unique combinations of the presence or absence of a major complication or comorbidity, beneficiary age (<65, 65–74, 75–84, 85+), hospital size (above or below the median number of beds), and hospital academic affiliation. We used coarsened exact matching to identify comparison-group beneficiaries, randomly selecting them in the same proportions across cells as the intervention group.37 From February to September 2017, the survey was mailed to 37,998 beneficiaries with a clinical episode initiated by a BPCI-participating PGP and 31,707 beneficiaries with a clinical episode initiated at comparison hospitals. Appendix A has additional information about the survey sample.
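A simplified sketch of how the sampling cells and the proportional comparison draw might be implemented is shown below. The record layout (`mcc`, `age`, `beds`, `academic` keys), the single `median_beds` threshold, and treating a hospital at exactly the median as "below" are assumptions made for illustration; the full coarsened exact matching procedure is described in the cited work and Appendix A.

```python
import random
from collections import Counter, defaultdict


def age_band(age: int) -> str:
    """Coarsen age into the four bands used to define sampling cells."""
    if age < 65:
        return "<65"
    if age < 75:
        return "65-74"
    if age < 85:
        return "75-84"
    return "85+"


def sampling_cell(b: dict, median_beds: int) -> tuple:
    """Cell = MCC flag x age band x above-median bed count (assumed strict) x academic affiliation."""
    return (b["mcc"], age_band(b["age"]), b["beds"] > median_beds, b["academic"])


def sample_comparison(intervention, pool, n_total, median_beds, seed=0):
    """Draw comparison beneficiaries so each cell's share matches the intervention group's share."""
    rng = random.Random(seed)
    target_counts = Counter(sampling_cell(b, median_beds) for b in intervention)
    by_cell = defaultdict(list)
    for b in pool:
        by_cell[sampling_cell(b, median_beds)].append(b)
    sample = []
    for cell, k in target_counts.items():
        # Rounding means the total drawn can differ slightly from n_total.
        n_cell = round(n_total * k / len(intervention))
        candidates = by_cell.get(cell, [])  # cells with no comparison candidates stay unmatched
        sample.extend(rng.sample(candidates, min(n_cell, len(candidates))))
    return sample


# Hypothetical usage: three intervention beneficiaries in one cell, one in another.
intervention = [{"mcc": True, "age": 70, "beds": 400, "academic": False}] * 3 + [
    {"mcc": False, "age": 88, "beds": 100, "academic": True}]
pool = ([{"mcc": True, "age": 72, "beds": 500, "academic": False} for _ in range(10)]
        + [{"mcc": False, "age": 90, "beds": 50, "academic": True} for _ in range(10)])
comparison = sample_comparison(intervention, pool, n_total=8, median_beds=200)
print(Counter(sampling_cell(b, 200) for b in comparison))
```

Exact matching on the coarsened variables is what keeps the comparison group balanced on these characteristics by construction; within a cell, beneficiaries are drawn at random.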

Survey Administration

Approximately 90 days after hospital discharge, we mailed sampled beneficiaries a survey and introductory cover letter, followed by a reminder postcard and a follow-up invitation letter with a second survey. We called the remaining non-respondents to complete the survey by phone. Survey data collection materials were approved by the Abt Associates IRB. The data generated during this study are not currently publicly available.

Patient-Reported Measures

For each functional status component, we created binary measures of improvement or maintenance of the highest functional status between respondents’ recalled status and reported status at the time of the survey. The variable had a value of one if beneficiaries reported improved functional status from before their episode to the time of the survey or if beneficiaries recalled having the highest functional status in the week prior to hospitalization and also at the time of the survey. The variable had a value of zero otherwise. We also constructed binary measures indicating affirmative outcomes for measures of care experience and overall satisfaction with recovery. Appendix B provides greater detail about these measures.
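The coding rule for each functional status item can be written as a small function. We assume, hypothetically, an ordinal scale on which larger values mean better function and `4` is the highest category; the actual item scales are described in Appendix B.

```python
HIGHEST_LEVEL = 4  # hypothetical top category of an ordinal item (larger = better function)


def improved_or_maintained_highest(recalled_before: int, at_survey: int,
                                   highest: int = HIGHEST_LEVEL) -> int:
    """Return 1 if function improved, or was at the highest level both times; otherwise 0."""
    improved = at_survey > recalled_before
    kept_highest = recalled_before == highest and at_survey == highest
    return int(improved or kept_highest)


# Hypothetical respondents: (recalled status the week before admission, status at survey)
responses = [(2, 3), (4, 4), (3, 3), (4, 3)]
print([improved_or_maintained_highest(pre, post) for pre, post in responses])
```

Under this rule, a respondent who stayed at an intermediate level, or who declined from the highest level, is coded zero.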

Analysis

We estimated the difference in outcomes between the BPCI and comparison samples using logistic regression, weighted to adjust for sampling and nonresponse, with standard errors clustered at the level of the discharging hospital. The association of BPCI with outcomes was estimated as the percentage point difference between the BPCI and comparison respondents. Outcomes were risk-adjusted for age, sex, Medicaid eligibility, hierarchical condition category (HCC) index, health care use in the 90 days before the episode, Medicare Severity-Diagnosis Related Group (MS-DRG), fracture (for major joint replacement of the lower extremity [MJRLE] episodes), characteristics of the discharging hospital (bed size, ownership type, academic status, participation in the Comprehensive Care for Joint Replacement model), the respondent’s recalled functional status prior to hospitalization, whether a proxy responded to the survey, and survey wave.38 For each outcome measure, we used F-tests to assess model fit. Although this study focuses on PGP-attributed episodes, we adjusted for characteristics of the discharging hospital, because all BPCI clinical episodes were initiated by a hospitalization, and patient-reported functional status and care experiences partly depend on attributes of the hospital where the episode began.
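The text reports the BPCI association as a percentage point difference but does not give the post-estimation formula. One common approach, assumed here, is to average the model's predicted probabilities with the BPCI indicator set to one and then to zero for every respondent, using the survey weights. The coefficients and covariates below are hypothetical, and the sketch omits the hospital-clustered standard errors, which would require the full fitted model.

```python
import math


def predicted_probability(beta, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-x'beta))."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-linear))


def weighted_pp_difference(beta, rows, weights, bpci_col):
    """Weighted mean of P(y=1 | BPCI=1) - P(y=1 | BPCI=0) across respondents, in percentage points."""
    total = 0.0
    for x, w in zip(rows, weights):
        treated = list(x)
        treated[bpci_col] = 1.0
        untreated = list(x)
        untreated[bpci_col] = 0.0
        total += w * (predicted_probability(beta, treated) - predicted_probability(beta, untreated))
    return 100.0 * total / sum(weights)


# Hypothetical example: columns are [intercept, BPCI indicator, age / 10].
beta = [-0.5, -0.1, 0.2]
rows = [[1.0, 1.0, 7.0], [1.0, 0.0, 8.2], [1.0, 1.0, 6.6]]
weights = [1.3, 0.8, 1.1]
print(round(weighted_pp_difference(beta, rows, weights, bpci_col=1), 2))
```

Averaging over every respondent's own covariates, rather than evaluating at mean covariate values, keeps the estimate on the probability scale of the weighted sample.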

The primary analysis pooled all survey responses for the 18 available clinical episodes.12 We also estimated differences in outcomes for each of the five largest clinical episodes, which accounted for approximately half of all Model 2 BPCI episodes initiated by PGPs: MJRLE; chronic obstructive pulmonary disease, bronchitis, and asthma (COPD); congestive heart failure (CHF); simple pneumonia and respiratory infections (pneumonia); and sepsis. Appendix C has additional information about the analytic methods.

Although this study includes 16 patient-reported measures in total, we did not adjust for multiple comparisons39 because this would decrease the chance of identifying negative consequences associated with BPCI.40
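To make the trade-off concrete, consider a Bonferroni correction (used here only as an example; the text does not name a specific method). With 16 measures and a family-wise significance level of 0.05, the per-test threshold and the corresponding two-sided critical value would be:

```latex
\alpha_{\text{per test}} = \frac{0.05}{16} = 0.003125,
\qquad
z_{1 - \alpha_{\text{per test}}/2} \approx 2.96
```

Each test would then need |z| of about 2.96 rather than 1.96, widening confidence intervals by roughly half and reducing the power to detect adverse effects.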

Sensitivity Analyses

We assessed the robustness of our empirical approach in several ways. First, not all BPCI beneficiaries in the survey sample received post-hospitalization outpatient care from the PGP that initiated their episode, and beneficiaries without such a follow-up visit may have had different experiences from those who continued to receive care from physicians in the same PGP. To assess this possibility, we conducted analyses including only beneficiaries who received follow-up during the episode from BPCI-participating practices (Appendix F). Second, we assessed the sensitivity of estimates to alternative empirical specifications: regressions without weights, weighted differences with no regression adjustment, and raw differences with no weights or adjustment (Appendix G). Third, the survey was designed to assess both functional improvement and patient experiences over a 90-day episode of care; because beneficiaries with BPCI episodes could not be identified before the episode began, we measured functional improvement by comparing respondents' current self-reported functional status with their recalled pre-hospitalization functional status.33 Given potential concerns about bias in recalled measures of functional status, we assessed the robustness of findings when using only functional status reported at the time of the survey (Appendix H). Lastly, we estimated changes in functional status stratified by pre-hospital function, to account for the possibility that the association between BPCI and the probability of functional improvement varied with respondents' function before the triggering hospital admission (Appendix I).
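The two unadjusted specifications in the second check can be sketched as a raw difference in proportions and a difference in survey-weighted proportions. Binary outcomes coded 0/1, a parallel list of per-respondent weights, and the small data set in the example are assumptions of this illustration; the unweighted regression specification is not reproduced here.

```python
def mean(values):
    return sum(values) / len(values)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def raw_difference(y_bpci, y_comp):
    """Unweighted, unadjusted difference in proportions, in percentage points."""
    return 100.0 * (mean(y_bpci) - mean(y_comp))


def weighted_difference(y_bpci, w_bpci, y_comp, w_comp):
    """Survey-weighted difference in proportions with no regression adjustment, in percentage points."""
    return 100.0 * (weighted_mean(y_bpci, w_bpci) - weighted_mean(y_comp, w_comp))


# Hypothetical binary outcomes and sampling/nonresponse weights.
y_bpci, w_bpci = [1, 1, 0, 0], [3.0, 1.0, 1.0, 1.0]
y_comp, w_comp = [1, 0, 0, 0], [1.0, 1.0, 1.0, 1.0]
print(raw_difference(y_bpci, y_comp), weighted_difference(y_bpci, w_bpci, y_comp, w_comp))
```

When the weights are all equal, the two functions return the same value; large gaps between them indicate that weighting, rather than the BPCI contrast itself, is driving an estimate.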

RESULTS

The overall survey response rate was 45.3% (44.5% for the BPCI group and 46.2% for the comparison group, p<0.01). Although the response rates differed significantly between the BPCI and comparison groups, the difference was small (1.7 percentage points), and BPCI and comparison survey respondents were well balanced on all demographic characteristics (Table 1). Respondents tended to be healthier than non-respondents, as indicated by differences in rates of hip fracture among MJRLE patients and in average HCC score (Appendix D). However, differences in patient characteristics between respondents and non-respondents were similar in the BPCI and comparison groups.
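As an arithmetic check, the group-level response rates can be combined with the mailed sample sizes from the Survey Sample section. The respondent counts are reconstructed from rounded published rates, so they are approximate, and the unweighted two-proportion z-test is an assumption; the text does not state which test produced p<0.01.

```python
import math

# Mailed sample sizes and group response rates reported in the text.
mailed = {"bpci": 37_998, "comparison": 31_707}
response_rate = {"bpci": 0.445, "comparison": 0.462}

# Approximate respondent counts (the rates are rounded, so the counts are too).
respondents = {g: round(mailed[g] * response_rate[g]) for g in mailed}
overall_rate = sum(respondents.values()) / sum(mailed.values())

# Unweighted two-proportion z-test using the pooled response rate.
pooled_se = math.sqrt(overall_rate * (1 - overall_rate)
                      * (1 / mailed["bpci"] + 1 / mailed["comparison"]))
z = (response_rate["comparison"] - response_rate["bpci"]) / pooled_se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(respondents, f"overall = {overall_rate:.1%}", f"z = {z:.2f}", f"p = {p_value:.1e}")
```

The reconstructed counts are consistent with the more than 16,000 BPCI and 14,000 comparison responses reported in the Discussion, and with samples this large even a 1.7 percentage point gap is statistically significant.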

Table 1 Characteristics of Survey Respondents*

For the 18 combined clinical episodes, we did not detect significant differences between BPCI and comparison respondents on any of the seven functional status change measures (Table 2). For example, the same proportion of BPCI and comparison respondents reported improvement or maintenance of the ability to walk without resting from before their hospitalization to the time of the survey. In addition, a majority of BPCI and comparison respondents reported positive care experiences and overall satisfaction with recovery. For example, 71% of respondents in both groups reported never receiving conflicting medical advice, and 95% of respondents in both groups reported having a good understanding of how to take care of themselves before going home. However, BPCI respondents were less likely to report positive experiences on 3 of the 9 care experience and satisfaction measures: being discharged at the right time (−1.2 percentage points [pp], 95% confidence interval [CI]: −2.1 to −0.3), receiving an appropriate level of care (−1.8 pp, 95% CI: −3.1 to −0.5), and having their preferences for post-discharge care taken into account (−0.9 pp, 95% CI: −1.7 to −0.1). Null results were precisely estimated: the 95% CIs for our nonsignificant estimates did not exceed 2.1 percentage points in absolute magnitude, suggesting that any true differences we failed to detect were small. These findings were broadly consistent across the sensitivity analyses (Appendices F–I).
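Assuming the reported 95% CIs are symmetric, normal-based intervals, the standard error and an approximate two-sided p-value can be backed out of each interval as SE ≈ (upper − lower) / (2 × 1.96). The sketch applies this to the three significant aggregate estimates; because the published figures are rounded to one decimal place, the recovered values are approximate.

```python
import math


def z_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover a z statistic from a point estimate and a symmetric normal-based CI."""
    se = (upper - lower) / (2 * z_crit)
    return estimate / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Aggregate estimates reported in the text: (pp difference, CI lower, CI upper).
reported = {
    "discharged at the right time": (-1.2, -2.1, -0.3),
    "appropriate level of care": (-1.8, -3.1, -0.5),
    "post-discharge preferences taken into account": (-0.9, -1.7, -0.1),
}
for name, (est, lo, hi) in reported.items():
    z = z_from_ci(est, lo, hi)
    print(f"{name}: SE ~ {(hi - lo) / 3.92:.2f} pp, z ~ {z:.2f}, p ~ {two_sided_p(z):.3f}")
```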

Table 2 Differences in Survey-based Quality Outcomes between BPCI and Comparison Respondents, Aggregate Model 2 Physician Group Practices, February 2017–September 2017*

Functional improvement, care experience, and overall satisfaction for each of the five largest clinical episodes were similar to the aggregated results, with a few notable exceptions (Table 3). BPCI respondents with episodes for MJRLE were more likely to report improvement in use of stairs from before the surgery to after the episode than were comparison respondents (3.2 pp, 95% CI: 0.1 to 6.2). There were no other statistically significant differences in functional improvement between BPCI and comparison respondents for any of the five largest clinical episodes. BPCI respondents with COPD or pneumonia episodes were significantly less likely than comparison respondents to report having a good understanding of how to take care of themselves before going home (COPD: −2.1 pp, 95% CI: −4.1 to −0.1; pneumonia: −1.9 pp, 95% CI: −3.8 to 0.0), and less likely to report that medical staff clearly explained how to take medications before going home (COPD: −2.2 pp, 95% CI: −4.1 to −0.2; pneumonia: −3.3 pp, 95% CI: −5.5 to −1.1). Other statistically significant differences were consistent with the aggregate findings.

Table 3 Differences in Survey-based Quality Outcomes between BPCI and Comparison Respondents, Five Largest Clinical Episodes in Model 2 Physician Group Practices*

DISCUSSION

This study evaluated patient-reported experiences with BPCI Model 2 PGP-attributed episodes and included over 16,000 responses from beneficiaries treated by PGPs participating in the BPCI initiative and over 14,000 responses from a matched comparison group. Over 60% of both BPCI and comparison respondents (and as many as 96% for some measures) reported favorable care experiences and the highest levels of overall satisfaction with recovery. BPCI and comparison respondents reported similar care experiences on five of eight measures and similar overall satisfaction with recovery. BPCI respondents, however, were less likely than comparison respondents to report being discharged at the right time, receiving an appropriate level of care, and having their preferences for post-discharge care taken into account, with differences of one to two percentage points. Because we used binary measures of care experience, these findings suggest substantive differences in care experience associated with BPCI that affected relatively few patients, rather than small differences that affected all patients.

We did not observe differences between the two groups in self-reported functional outcomes about 90 days after hospital discharge under multiple specifications, indicating that less favorable care experiences among BPCI respondents did not coincide with worse functional recovery relative to comparison respondents. These results were also largely consistent across the five highest-volume clinical episodes: apart from greater improvement in stair use among BPCI respondents with MJRLE episodes, BPCI was not associated with differences in patient-reported functional improvement for any episode type, and it was associated with worse care experiences on only a few measures.

Our results are similar to findings among beneficiaries with BPCI episodes initiated by acute care hospitals under Model 2. Prior analyses of beneficiaries with hospital-initiated episodes did not find differences in functional improvement between BPCI respondents and a matched comparison group, but found that BPCI respondents were significantly less likely than comparison respondents to report positive care experiences and high overall satisfaction with recovery.12 Differences in care experiences between BPCI and comparison respondents were slightly larger and more often significant for hospital-initiated episodes than for PGP-initiated episodes, and differences in satisfaction with recovery were not significant for PGP-initiated episodes. This suggests that participating PGPs may have been more effective in communicating with patients, preparing them for hospital discharge, and setting expectations about post-hospital care. Together, the two studies suggest that BPCI Model 2 did not adversely affect patient functional status. However, both PGPs and hospitals participating in bundled payment initiatives may have room to improve the patient experience of care.

This study has limitations. First, because data collection began after the start of the BPCI initiative, this study relies on a post-only design with a comparison group, and we cannot know whether patient-reported outcomes differed between BPCI and comparison participants before the initiative. Pre-existing differences may have contributed to the estimated differences between BPCI and comparison respondents, and the lack of significant differences for most outcomes does not preclude the possibility that BPCI affected these outcomes for better or worse. Second, because BPCI was a voluntary initiative, these results might not generalize to all Medicare fee-for-service beneficiaries treated by all physician groups.41,42,43 Third, approximately half of the sampled beneficiaries did not complete the survey. Response rates and beneficiary characteristics were similar for the BPCI and comparison groups, and weights and risk adjustment helped account for nonresponse. However, both BPCI and comparison respondents tended to be somewhat healthier than non-respondents on average. Although we adjusted for health status, to the extent that non-respondents differed from respondents on other unobservable factors, findings may not generalize to all BPCI beneficiaries. Fourth, PGPs participating in BPCI carefully verified the National Provider Identifiers (NPIs) attributed to their practice, but it was not possible to have non-participants verify NPI lists in the same way; for this reason, we were not able to report or control for the characteristics of PGPs. However, physician specialty and number of physicians per BPCI-participating PGP were similar between BPCI survey respondents and non-respondents, except that beneficiaries with episodes attributed to surgeons were more likely to respond to the survey than beneficiaries with episodes attributed to other specialists, primary care physicians, or hospitalists (Appendix E). We adjusted for this difference using nonresponse weights and clinical episode fixed effects. Lastly, survey data collection covered only approximately one of the five years of BPCI. Given the possibility of PGP attrition before our survey, as well as changes to the PGP assignment algorithm made after it, our results may not generalize to the entire period covered by the model.

In conclusion, survey respondents with bundled payment episodes attributed to BPCI Model 2 PGP participants and a matched comparison group reported similar changes in functional status from before to after their care episodes, but a smaller proportion of BPCI respondents reported positive care experiences on three of eight measures, relative to the comparison group. These results help alleviate concerns that episode-based payment models could adversely affect patient-reported functional outcomes among Medicare beneficiaries, but they highlight the importance patients place on clear and consistent communication with providers.