We reviewed 1363 studies, with 509 examined at the full-text level. Forty studies met inclusion criteria, with an additional study identified by a peer reviewer, for a total of 41 (Fig. 2; see Table 2 for study characteristics; study details provided in Appendices 4 and 5, available online). Of 45 individuals invited, 14 participated in KI interviews (Appendix 6, available online).
Table 2. Study Characteristics
Program Design Features (13 Studies)
We identified one prospective cohort study,9 two retrospective cohort studies,10,11 one pre-post study,12 six cross-sectional surveys,13–17 one economic analysis,18 and two simulation studies.19,20 With respect to measure development, studies found that an emphasis on clinical quality and patient experience criteria was associated with increased coordination of care, improved office staff interaction, and greater provider confidence in delivering high-quality care.11,14 Conversely, an emphasis on productivity and efficiency measures was associated with poorer provider and office staff communication.11 In addition, one study that surveyed administrators and managers found that the perceived effectiveness of a P4P program was predicted by both the communication of goal alignment and the alignment of individual goals with institutional goals, while another found that providers believed the P4P program increased clinicians' focus on issues related to quality of care.12,15 Finally, one study compared statistical methods of constructing composite measures and found latent variable methods to be more reliable than raw sum scores.19
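The cited simulation study19 is not reproduced here; as a purely hypothetical illustration of the comparison it describes, the sketch below simulates provider scores on several quality indicators driven by a single latent quality factor and contrasts an equal-weight (raw sum) composite with a one-factor latent-variable composite. The sample size, loadings, noise levels, and the use of scikit-learn's FactorAnalysis are our own assumptions, not details of the cited study.

```python
# Hypothetical illustration only (not the cited study's method): contrast a
# raw-sum composite with a latent-variable (one-factor) composite of indicators.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_providers, n_indicators = 500, 8

# True (unobserved) provider quality on a standardized scale.
true_quality = rng.normal(size=n_providers)

# Assumed loadings and noise: indicators reflect true quality unevenly.
loadings = rng.uniform(0.3, 0.9, size=n_indicators)
noise_sd = rng.uniform(0.5, 1.5, size=n_indicators)
indicators = (true_quality[:, None] * loadings
              + rng.normal(scale=noise_sd, size=(n_providers, n_indicators)))

# Composite 1: raw sum (equal weights across indicators).
raw_sum = indicators.sum(axis=1)

# Composite 2: latent-variable score from a one-factor model.
factor_score = FactorAnalysis(n_components=1, random_state=0).fit_transform(indicators).ravel()

# The sign of an extracted factor is arbitrary, so compare absolute correlations
# of each composite with the true latent quality.
print(f"|corr(raw-sum composite, true quality)|:      {abs(np.corrcoef(raw_sum, true_quality)[0, 1]):.3f}")
print(f"|corr(factor-score composite, true quality)|: {abs(np.corrcoef(factor_score, true_quality)[0, 1]):.3f}")
```

In this toy setup, the latent-variable composite weights each indicator by how strongly it reflects the common factor, down-weighting noisier indicators, which is the intuition behind the reliability advantage reported in that study.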
Related to incentive structures, one study examined how incentive size related to the decision to participate in P4P programs and found that no clear amount determined participation; rather, participation was positively related to the potential size of the reward.10 Similarly, another study found that, after controlling for covariates, perceived financial salience was significantly related to higher performance.13 Another study found that the underlying payment structure influenced performance and that higher incentives may be necessary when the degree of cost sharing is lower.9 Finally, a study examining the relationship between P4P and patient experience in California over a 3-year period found that, compared with larger incentives (>10%), smaller incentives were associated with greater improvement in provider communication and office staff interaction measures.11 These findings were contrary to the authors' hypotheses; they concluded that the results may have been influenced by the tendency of practices with smaller incentives to incentivize clinical quality and patient experience measures (vs. productivity measures), which were also associated with improvements in office staff interaction.
Findings from Key Informant Interviews
Key informants stressed that P4P programs should include a combination of measures addressing processes of care and patient outcomes, and that while measures should cover a broad range, having too many measures increases the likelihood of negative unintended consequences. KIs also agreed that measures should reflect organizational priorities and should be realistically attainable, evidence-based, clear, simple, and linked to clinically significant rather than data-driven outcomes, with systems in place for evaluation and modification as needed. In addition, they suggested that improvements should be incentivized; that incentives should be large enough to provide motivation but not so large as to encourage gaming; that penalties may be more effective than rewards; and that team-based incentives may be effective for increasing buy-in and professionalism among both clinical and non-clinical staff. Similarly, the timing of payments should be frequent enough to reinforce the link between measure achievement and the reward; however, this must be balanced against payment size, as the reward must be sufficient to reinforce behavior.
Implementation Processes (8 Studies)
We identified seven cohort studies, one prospective21 and six retrospective,22–27 and one simulation study.28 Three included studies25,26,28 examined threshold changes in the QOF and found that quality continued to increase after maximum thresholds were raised, with lower-performing providers improving significantly more than those who had performed at a high level under the previous threshold.25,26 In addition, we identified three studies examining clinical process and patient outcome measures after incentives were removed. One study, of the QOF, found that the level of performance achieved prior to incentive withdrawal was generally maintained, with some differences by indicator and disease condition.27 Two studies examined changes in incentives within the VHA. Benzer et al. (2013) evaluated the effect of incentive removal and found that all improvements were sustained for up to 3 years.22 Similarly, Hysong et al. (2011) evaluated changes in measure status, that is, the effect on performance when measures shift from being passively monitored (i.e., no incentive) to actively monitored (i.e., incentivized), and vice versa.23 Regardless of whether a measure was incentivized, all measures remained stable or improved over time. Quality did not deteriorate for any of the measures from which incentives were removed, and of the six measures that changed from passive to active monitoring, only two improved significantly after the change (HbA1c and colorectal cancer screening).
Findings from Key Informant Interviews
Similar to the findings reported in the literature, key informants believed that measures should be evaluated regularly (e.g., yearly) to enable continued increases in quality. Once achievement rates are high, those measures should be evaluated, with the possibility of increasing thresholds, if relevant, or replacing them with others representing areas in need of quality improvement.
KIs stressed that implementation processes should be transparent and should provide resources that encourage and enable provider buy-in, including information that allows providers to link each measure to clinical quality and guidance on how to achieve success. To achieve buy-in, KIs urged the engagement of stakeholders at all levels, recommended a “bottom-up” approach to program development, and strongly supported clear performance feedback to providers at regular intervals, accompanied by suggestions for and examples of how to achieve high levels of performance.
Outer Setting (6 Studies)
We identified five retrospective cohort studies29–33 and one cross-sectional survey17 related to the outer setting. Studies provided no clear evidence related to factors associated with region, population density, or patient population. One short-term study of the QOF reported better performance associated with a larger proportion of older patients.33 Findings related to performance in urban compared with rural settings were inconsistent, with two studies reporting better performance by providers in rural settings,29,32 and one finding no difference.31
Findings from Key Informant Interviews
Key informants discussed the importance of taking patient populations into account when designing P4P programs, stressing the need for flexibility in larger multi-site programs to allow for targets that are realistic and that meet the needs of local patient populations.
Inner Setting (18 Studies)
We identified 15 retrospective cohort studies30,32–45 and three cross-sectional surveys15,46,47 related to the inner setting. Studies of the QOF found that larger practices in the UK performed better in the short term,33–35 particularly when examining total QOF points;37 however, results varied when examining subgroups by condition or location and by indicator.36,44,45 In addition, two studies found that group practice and training practice status were associated with higher quality of care,33,34 while two others found no significant effect of training practice status after controlling for covariates.35,44 Studies in the United States and other countries indicate that factors associated with higher quality or greater quality improvement include culture change interventions introduced along with P4P46 and clinical support tools.42 Results were mixed regarding quality improvement visits/groups and training.15,47 Contrary to findings related to the QOF, however, differences in quality associated with P4P within independent versus group practices,48 by type of hospital (e.g., training, public, private),30 and by patient panel size/volume are less clear, with studies reporting conflicting results.30,43
Findings from Key Informant Interviews
KIs stressed that P4P is just one piece of an overall quality improvement program, operating alongside other important factors such as a strong infrastructure and ongoing infrastructure support (particularly with regard to information technology and electronic medical records), organizational culture around P4P and associated measures, alignment and allocation of resources with P4P measures, and public reporting. Public reporting was described by many of our KIs as a strong motivator, particularly for hospital administrators, but also for individual providers operating within systems in which quality achievement scores are shared publicly.
Provider Characteristics (5 Studies)
We identified three retrospective cohort studies29,34,43 and two cross-sectional surveys.13,49 Studies examining provider characteristics (e.g., gender, age) found no strong evidence that these characteristics were related to performance in P4P programs.13,29,34,43,45
Table 3. Evidence and Policy Implications by Implementation Framework Category