INTRODUCTION

In pay-for-performance (P4P) programs, a portion of payments to providers, administrators, or health systems is linked to achievement of specific benchmarks in access to care, process of care, or patient outcomes. This strategy has become widespread in the Veterans Health Administration (VHA) after being codified by law over a decade ago.1 The Centers for Medicare and Medicaid Services’ (CMS) Merit-Based Incentive Payment System (MIPS) under the Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) has established P4P as a foundational strategy for health reform in the community.2

Although P4P aims to increase health care value, the empiric data are far from clear. A recent systematic review found that while P4P programs may be associated with improved processes of care in ambulatory settings over the short term, there was no consistent evidence of an effect on patient health. The review also found that P4P was associated with potential unintended consequences, and that ultimately, P4P’s balance of benefits and harms depends heavily on the nuances of program implementation.3,4,5

In 2016, there were 25.5 million Veteran appointments with non-VHA providers in the community,6 and this number is expected to rise with the recent extension of the Veterans Choice Program (VCP). The VA Commission on Care recommended that payments to community providers be based on P4P incentives on quality and appropriate utilization.7 Yet, how to integrate payment and care from the nation’s largest health care system to a broad and diverse patchwork of community providers and health systems in a transparent and clinically meaningful way, without encouraging unintended consequences, remains largely unknown.

This report, which was part of a larger report commissioned by the VHA, presents the results of a systematic review and key informant interviews on (1) the effects of P4P programs on the quality of care and health of Veterans, (2) potential unintended consequences of P4P targeting Veteran health, and (3) program design features and implementation factors that might modify the effectiveness of P4P targeting Veterans, in both VHA and community settings.

METHODS

Data Sources and Strategy

We searched PubMed, PsycINFO©, and CINAHL© (January 2014 to March 2017) for studies examining P4P in Veteran populations (search strategy in online Appendix 1), updating our previous P4P review.3, 5 We identified additional studies through a targeted search of known VA P4P and quality improvement researchers and through a search of the VHA’s website for unpublished studies.

Study Selection

We included studies examining direct P4P programs targeting healthcare providers in VHA and VCP settings (study selection criteria in online Appendices 2 and 3). We excluded studies examining other payment models and patient-targeted incentives. To assess the effectiveness of P4P, we included trials and observational studies that either (a) had a comparison group, (b) had three or more time points and reported a trend (e.g., interrupted time series), or (c) included 10,000+ participants. All study designs were included for questions related to unintended consequences and community care. We included studies examining processes occurring both upstream (e.g., performance measures) and downstream (e.g., audit and feedback) of P4P. Two independent reviewers assessed studies, and all discordant results were resolved through consensus or consultation with a third reviewer.
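
The effectiveness inclusion rule described above can be read as a simple predicate. The following is a minimal sketch for illustration only, not the screening procedure actually used in the review; the `Study` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical record; field names are illustrative, not taken from the review.
    has_comparison_group: bool
    time_points: int
    reports_trend: bool
    participants: int


def eligible_for_effectiveness(study: Study) -> bool:
    """Return True if a study meets any of criteria (a), (b), or (c)."""
    meets_a = study.has_comparison_group
    meets_b = study.time_points >= 3 and study.reports_trend
    meets_c = study.participants >= 10_000
    return meets_a or meets_b or meets_c
```

Under this reading, a study qualifies by meeting any one criterion, so an uncontrolled study with two time points would still qualify if it included at least 10,000 participants.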

Data Extraction and Quality Assessment

Data from each study were abstracted by one investigator and confirmed by a second. We abstracted information on study design, sample size, observation period, program focus, incentive (target, size, timing), comparator, implementation factors, unintended consequences, and findings. Two investigators independently assessed study quality using the Cochrane Risk-of-Bias tool8 for RCTs and the Newcastle-Ottawa Scale9, 10 for observational studies (see online Appendix 4). We did not assess the quality of qualitative studies.

Key Informants

We engaged VHA stakeholders and technical experts experienced with P4P as key informants to better understand the program features and implementation factors that might contribute to successful P4P programs in VHA and community settings. We identified key informants through snowball sampling. We used conventional content analysis11 and developed a semi-structured interview that probed previously identified themes3, 4 and explored emerging themes (see online Appendix 5). Two investigators co-led telephone interviews (June–August 2017). Interviews were approximately 60 min and were recorded and transcribed verbatim. Four investigators reviewed each transcript and identified emergent themes and categories. Key themes were determined by consensus, and two investigators compared all related quotes across and within interviews.

Data Synthesis and Analysis

We qualitatively synthesized the results of included studies and interviews according to an implementation framework developed for our previous P4P review.3 The framework describes the relationship between P4P program features, external and implementation factors, and provider cognitive/affective and behavioral responses on processes of care and patient outcomes (see Fig. 1). Table 1 describes each of these categories. Due to heterogeneity among the studies, we did not perform meta-analysis.

Fig. 1 Conceptual framework. P4P pay-for-performance.

Table 1 Description of Implementation Framework Categories

We assessed the overall strength of evidence for the effectiveness of P4P on Veteran care using a method developed by the Agency for Healthcare Research and Quality.12

RESULTS

We reviewed 1031 titles and abstracts and selected 74 articles for full-text review. Thirty met inclusion criteria and provided evidence addressing the key questions (Fig. 2). We invited 29 individuals for key informant interviews. Seventeen participated (see online Appendix 6). Tables 2 and 3 provide quotes from the interviews.

Fig. 2 Literature flowchart. EHR electronic health record, ESP evidence-based synthesis program, P4P pay-for-performance, VA Veterans Administration, VCP Veterans Choice Program, VHA Veterans Health Administration.

Fig. 3 Key informant interviews: themes—unintended consequences.

Fig. 4 Key informant interviews: themes—program design features and implementation factors in VHA settings.

Fig. 5 Key informant interviews: themes—program design features and implementation factors in non-VHA/community settings.

Table 2 Evidence and Policy Implications—P4P in VHA Settings
Table 3 Evidence and Policy Implications—P4P in Non-VHA/Community Settings

P4P in VHA Settings

Effectiveness of P4P

Four articles13, 23, 29, 30 from three studies13, 23, 29 provide data on the effectiveness of P4P in VHA settings (see online Appendix 7 for detail). Overall, the evidence is insufficient to determine whether P4P results in durable improvements in the quality of care or health of Veterans. The sole RCT found that the combination of audit and feedback and physician-directed incentives resulted in a small, short-term positive effect on blood pressure control.13 Two observational studies reported evidence of positive effects of P4P on processes of care. However, these findings may have been influenced by concomitant public reporting23 and denominator management (i.e., a decrease in the number of patients eligible for a performance measure that may be positive, resulting from improvements in identification; or negative, resulting from gaming).29

Unintended Consequences

Eleven studies published in 13 articles13,14,15, 24, 29,30,31,32,33,34,35,36, 42 examined potential unintended consequences in VHA settings (see online Appendices 8 and 9). In general, qualitative studies and those using administrative data identify the potential for overtreatment associated with performance measures.14, 31, 32, 34 However, one RCT found that P4P for hypertension did not increase the risk of hypotension despite findings from a sub-study that suggested subjects were concerned about the risk of overtreatment.13, 42 Other studies examining unintended consequences reported findings congruent with denominator management,29 but no evidence of risk selection.30, 33 Qualitative studies found that participants felt performance measures may lead to negative unintended consequences such as reduced focus on patient needs/concerns, unincentivized areas of care, and/or healthier patient populations (teaching to the test/attention shift),15, 24, 35 and that they may negatively affect team dynamics, particularly if metrics are incentivized.35

Findings from Key Informant Interviews (See Figure 3 for Themes, and Table 2 for Quotes)

Consistent with the literature, key informants voiced concern about potential overtreatment, particularly in facilities with metric-driven cultures, and more commonly with metrics that vary (e.g., blood pressure). Other concerns included denominator management, gaming, risk selection, and health disparities, particularly for Veterans of low socioeconomic status (SES). Key informants also noted the need to guard against teaching to the test/attention shift by having a variety of actively monitored, valid metrics covering different aspects of care.

Implementation of P4P in VHA Settings

Thirteen studies reported in 16 articles13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 provide data examining program design features or implementation factors and/or provider cognitive or affective responses related to pay-for-performance programs in VHA settings (see online Appendices 10 and 11 for detail). In general, studies found physician-targeted incentives to be more effective than those targeting groups or practices13; the agreement between EHR data and manual review varied by metric14, 22; the relationship between access and patient satisfaction varied by the access metric used20 as well as whether the patient was new or returning21; and the difficulty of achieving multi-tasked metrics was not directly related to the number of tasks involved.17 Studies also found no difference in the achievement of actively vs. passively monitored metrics,25 and were mixed on the impact of the removal of incentives on performance.13, 23 Studies also suggested areas for improvement in implementing performance measures at the local level.15, 24 One study examined provider affective/cognitive responses and found that P4P had no impact on goal commitment.26

Findings from Key Informant Interviews (See Figure 4 for Themes, and Table 2 for Quotes)

Program Design Features

Key informants consistently stressed the need for larger and more frequent incentives attached to clinically meaningful metrics that are within provider control. Other key themes included the potential benefit of incentivizing teams or front-line staff, placing greater emphasis on patient evaluation metrics, establishing the validity of performance measures, the feasibility of achieving performance measures at the local level, and the importance of de-intensification metrics.

Implementation Factors

Common among key informants was a belief that the implementation of P4P and performance measures in the VHA needs improvement. Key informants felt that VHA physicians are not able to identify their P4P-linked metrics; that the implementation of metrics has historically lacked interpretation, documentation, and support; and that implementation should include the resources necessary to ensure success and take into consideration facility-level contextual factors.

Provider Cognitive, Affective, and Behavioral Responses

Key informants expressed the belief that the intrinsic motivation of physicians is the driving factor in achieving evidence-based performance metrics that make clinical sense.

P4P at the Intersection of VHA and Community Care

Implementation of Pay-for-Performance in Community Settings

Findings from the Literature

We identified five studies examining P4P or related implementation factors in Veteran populations in community settings.37,38,39,40,41 One study identified published survey instruments examining cross-system access and coordination.37 Across studies, findings suggest that Veterans, providers, and VHA administrators are concerned that VCP has already resulted in, and will continue to result in, fragmented care and poor communication and coordination among providers, and that it places an additional burden on VHA providers and on Veterans.37, 40, 41 Other concerns include barriers to sharing medical records,39,40,41 and differences between providers who are interested in VCP participation and those who are not (see online Appendix 12 for detail).38

Findings from Key Informant Interviews (See Figure 5 for Themes, and Table 3 for Quotes)

Program Design Features

Key informants stressed the importance of considering the overarching goal of the VCP in decisions about the metrics to incentivize. Although key informants recognized the need for increased access to healthcare for Veterans, they also suggested goals including the receipt of quality care, coordination of care, cost effectiveness, and “conservative care” (e.g., restrictive selection of surgical patients). Some key informants suggested that known differences between VHA and community care be used to guide metric selection.

Several key informants suggested that incentives might help to address known challenges related to the receipt of documentation and the overall quality of records received from community providers—particularly early in the program. Key informants also suggested the possibility of pooled population guideline-based metrics to compare the outcomes of Veterans receiving care in VHA to VCP, acknowledging that population-based incentives are unlikely to motivate provider behavior.

Key informants stressed the importance of building relationships between VHA and community providers at both national and local levels, and raised the question of how to select high-quality providers. Suggestions included contracting with established networks and/or only with board certified physicians; as well as using providers’ performance on established metrics (e.g., Centers for Medicare and Medicaid Services) for selection.

Implementation Factors

Key informants suggested a number of quality improvement strategies to accompany P4P in the community and stressed the importance of transparency and public reporting. To improve coordination of care, they suggested implementing systems that would provide community providers with the pop-up reminders available in the VHA and VHA formulary lists by adapting existing tools (e.g., Epocrates®) or creating new ones.

Key informants discussed differences between Veterans and the general population, largely noting lower socioeconomic status (SES) among Veterans, as well as greater mental health needs, higher rates of substance use, and a large rural population. Key informants felt it important to account for SES when implementing P4P and expressed concern for the limited availability of quality care for Veterans living in rural areas.

Key informants identified potential challenges the VHA might face in implementing P4P in community settings. Most commonly, key informants worried that because Veterans accessing care through VCP would be dispersed widely (comprising a small percentage of a provider’s patient population), community providers would view VCP as just one of many insurers—and for many providers, the smallest. This may inhibit the potential impact of P4P in community care, particularly if incentivized metrics do not align with those of other insurers. Furthermore, if providers have only a handful of VCP patients, their measured performance may vary widely and result in unreliable measures of quality. Key informants reiterated the potential for incentives related to access or data, as well as population-based incentives, and suggested aligning incentivized metrics with larger P4P programs. Other key informants discussed the potential tradeoffs of using narrow networks to increase the percentage of VCP patients per provider and access to high quality care, particularly for rural Veterans.
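
The reliability concern raised above can be illustrated with a standard binomial calculation. The sketch below assumes, purely for illustration, a provider whose true performance rate on a metric is 0.80, and it uses the normal approximation to the binomial (which is itself rough for very small panels). The rate and the panel sizes are hypothetical and are not data from this review.

```python
import math
from typing import Tuple


def standard_error(true_rate: float, n_patients: int) -> float:
    """Standard error of an observed proportion among n_patients."""
    return math.sqrt(true_rate * (1.0 - true_rate) / n_patients)


def approx_95_interval(true_rate: float, n_patients: int) -> Tuple[float, float]:
    """Approximate 95% range of observed rates, clipped to [0, 1]."""
    half_width = 1.96 * standard_error(true_rate, n_patients)
    return max(0.0, true_rate - half_width), min(1.0, true_rate + half_width)


if __name__ == "__main__":
    # Hypothetical panel sizes: a handful of VCP patients vs. larger panels.
    for n in (5, 20, 100, 500):
        low, high = approx_95_interval(0.80, n)
        print(f"n={n:>3}: observed rate plausibly between {low:.2f} and {high:.2f}")
```

Under these assumptions, a provider with five eligible patients could plausibly show an observed rate anywhere from about 0.45 to 1.00, whereas with 100 patients the range narrows to roughly 0.72 to 0.88. This is one way to see why pooling patients or aligning metrics with larger programs was suggested.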

There was concern among key informants that the VHA may have already developed a fragile relationship with community providers due to slow payment, with providers refusing to accept Veteran patients. They advised that the VHA pay providers in a timely fashion and reiterated that P4P metrics must be achievable; otherwise, additional providers may opt out, resulting in even poorer access for Veterans.

Concerns related to mental health treatment were raised frequently. Key informants cautioned that sending Veterans to community mental health providers will likely reduce the quality of care and coordination Veterans receive, especially for those with combat-related PTSD or substance use disorders and those experiencing homelessness. Key informants were also concerned that implementing P4P metrics would present a barrier to entry for providers, as the use of performance metrics is uncommon in community mental health. In addition, they felt strongly that providers would resist sharing treatment notes and other records.

Finally, key informants were concerned about the impact of VCP on current patients and VHA providers—that in time, resources could be diverted from Veterans receiving care in VHA settings, and that VCP may influence the ability of VHA providers to maximize their own performance pay.

Provider Cognitive, Affective, and Behavioral Responses

Key informants voiced concern for unintended consequences resulting from P4P in community settings, particularly overtreatment and overuse. They felt that overtreatment may be more common in the community than in VHA settings, and that the lack of integration and coordination with VCP might place Veterans at increased risk.

DISCUSSION

We examined 30 articles and conducted interviews with 17 key informants to help inform the implementation of pay-for-performance programs for Veterans in VHA and community settings. Although we found insufficient evidence to determine the degree to which P4P affects Veteran outcomes, we identified information in the literature and through key informant interviews that may help guide the implementation of P4P and maximize potential benefits while minimizing negative unintended consequences.

Several themes emerged from the interviews related to general issues with P4P in VHA that are consistent with the findings from published literature (see Table 2).3, 4 First, key informants felt that performance measures should be valid and well-designed and cited a need for further research evaluating alternate validation methods. Second, findings from a handful of included studies14, 31, 32, 34, 35 combined with concerns voiced by key informants suggest that potential overtreatment and overuse may be an unintended consequence of performance metrics, regardless of whether they are incentivized. Third, consistent with qualitative findings,24 provider key informants consistently stated that they did not know which metrics were incentivized and did not feel that the current P4P structure influences their behavior. Fourth, despite previous research stressing the importance of bottom-up, realistic metrics,3, 4 qualitative findings illustrate VHA staff are frustrated with current implementation practices.15, 24 There was strong consensus among key informants that incentivized metrics need to be achievable, that local resources are necessary for achievement, that incentivization decisions need to be perceived as equitable, and that incentive payments need to be predictable and reliable. Fifth, included studies found that metric-driven cultures were more prone to potential overtreatment,31, 32 and that overtreatment may be mitigated by incentivizing appropriate care rather than treatment or targets.13

Several themes related to P4P in community settings also emerged (see Table 3). First, key informants expressed that, given known challenges related to receipt of documentation,40, 41 data and care coordination may be an initial area for P4P to target. Second, they stressed the importance of establishing relationships with local providers and suggested ways to select providers with demonstrated records of quality care. Third, there was concern about the VA’s ability to influence provider behavior using P4P and to accurately estimate quality at the provider level, given that Veterans may comprise a small percentage of an individual provider’s patient population. Fourth, consistent with the findings from previous research,3 key informants stressed that P4P is only one part of a quality improvement strategy. Fifth, along with findings from included studies,39, 40 key informants cited ongoing challenges in coordinating care with community providers, and suggested the development of tools to facilitate coordination. Sixth, there was concern for and uncertainty about how VCP may affect Veterans who continue to receive care in VHA. Seventh, key informants noted that there may be Veterans who receive care both in the community and in VHA settings, and voiced concern for the potential impact on the achievement of VHA performance metrics and VHA provider metrics and performance pay. Finally, key informants stressed that a fundamental difference between VHA and community care is that the VHA tends to be more conservative. They felt that despite evidence of potential overtreatment in VHA settings,31, 32 overtreatment is even more common in community settings and community providers may be more prone to prescribing opioids than VHA providers.39

Our approach to the topic of P4P and Veteran health has several strengths. To our knowledge, this is the first paper to examine P4P specific to Veteran care. The VHA is a large integrated system that differs significantly from others in the United States, and the recent expansion of community care adds complexity. We recognized early that much of the information we sought related to the implementation of P4P would not be found in published research, particularly information related to the intersection of VHA and community care. We interviewed VHA stakeholders with P4P expertise as researchers, clinicians, and administrators to provide informed insight into the implementation factors and program design features important to P4P success in the community.

Our review is limited by the paucity of research directly assessing the effectiveness of P4P in VHA settings, and the heterogeneity in the way that P4P is implemented in VHA settings. We therefore focused primarily on examining program design features, implementation factors, and unintended consequences. As research examining VCP is just beginning to emerge, our findings regarding P4P in community settings are influenced heavily by our key informant interviews. The breadth of topics and outcomes made it difficult to apply strict study design criteria. Thus, we included studies with less-rigorous methodology, some of which had small samples. We conducted 17 interviews to gain insight into factors important to the design and implementation of P4P in VHA and community settings. Although we aimed for a broad range of stakeholders, we recognize that a larger sample or different mix of key informants could yield a different subset of themes.

Although performance pay has been a part of the VHA for more than a decade, little research has evaluated its effectiveness, and no research has explored alternatives. The nature of the VHA as an integrated yet closed system provides a unique opportunity for research comparing P4P program design and implementation.

Although Veterans seeking care in the community is not a new phenomenon, continued funding for VCP necessitates more comprehensive evaluation. Current research, programs, and initiatives funded largely by QUERI are evaluating metrics, quality, and P4P programs directly within the context of community care. More research is needed to identify how expanded care in the community may impact Veterans receiving care in VHA settings, in particular vulnerable populations such as Veterans of color, low-income Veterans, and Veterans living in rural areas, for whom even community providers may be limited.

CONCLUSION

While the effectiveness of P4P in VHA settings is understudied, we highlight key lessons learned from the implementation of programs that may help guide future P4P program improvements in VHA. In P4P programs targeting Veteran health in community settings, care should be taken to establish relationships with providers with records of quality; consideration should be given to the impact of the small number of Veterans per community provider; efforts should be made to develop resources and tools to better enable coordination of care, data-sharing, and record transfer; and special attention should be paid to mitigate the potential for overtreatment and ensure quality care for all Veterans.