Description of Studies
Searching the databases yielded 1106 results after duplicates were removed. An additional six reviews were identified from the gray literature and the authors’ personal collections. After title and abstract screening, potentially relevant reviews were assessed against the predefined inclusion criteria. The final evaluation comprised 21 reviews. Table 1 shows the geographic setting of the studies included in the reviews.
Six reviews [17,18,19,20,21,22] focused on ACOs or HMOs, so the studies they covered were already set in a group context. The other reviews explicitly included “provider groups” in their analyses, but only some differentiated between individual- and group-specific incentives, which makes their results less reliable to interpret. Petersen et al., Van Herck et al., and Kondo et al. reported the incentive level (individual vs. group).
Within the reviews, we found seven different incentive schemes/initiatives (Table 2); Appendix F in the ESM provides a detailed overview.
Risk of Bias
To assess the risk of bias among the included reviews, we applied the validated AMSTAR checklist. Information about the scores for each review can be found in Appendix F. A higher score in the AMSTAR quality rating indicates a lower risk of bias. Figure 1 shows a summary of the results.
High-quality reviews were identified for P4P, salaried payment, and bundled payment, but no trend in review quality by incentive type was apparent. We also analyzed whether the quality of reviews was associated with the geographic areas covered (USA only vs. various countries). Six reviews included only studies from the USA; five of them were of moderate quality and one was of low quality. All high-quality reviews comprised studies from various countries, whereas two of the four reviews in the low-quality category did not provide information about the geographical origin of the included studies.
Only one review performed a meta-analysis of results for P4P. Many of the other reviews reported that heterogeneous settings or outcomes were major obstacles that prevented them from aggregating results using meta-analysis.
Reviews with low AMSTAR quality scores lacked a priori planning and gray literature searches, did not address potential publication bias, and did not provide information about included and excluded studies. Often, no comprehensive literature search was performed; it remained unclear how studies were selected, how data were extracted, and how the scientific quality of studies was assessed; and conflicts of interest were not sufficiently addressed. The major concerns that led to a quality assessment of “moderate” were a lack of a priori planning, incomplete information about included or excluded studies, or missing consideration of publication bias.
Effects of Interventions
Salary
For salaries, the unit of remuneration is working time. Reimbursement does not depend on the type and number of patients treated and is, therefore, relatively easy to apply. This reimbursement scheme provides no incentives for (unnecessary) extension of services, for quality improvements, or for cost consciousness.
Chaix-Couturier et al. and Scott et al. examined, among other things, the effects of paying physicians a fixed salary. These reviews were of low and high quality, respectively. Both reviews focused on measures of process and outcome quality. Chaix-Couturier et al. used process and outcome indicators as well as costs, whereas Scott et al. used patient-reported outcomes, changes in physician behavior, and physiological indicators. Chaix-Couturier et al. concluded that salaries were associated with a lower referral rate and fewer activities than FFS. In contrast, Scott et al. did not report any statistically significant changes in patient-reported outcomes. The heterogeneity of indicators precluded drawing overall conclusions from these two reviews.
Fee for Service
In FFS, remuneration is based on the services actually provided. This direct link between services and reimbursement can provide an incentive to reduce costs but might also induce an extension or selection of certain low-cost/high-margin services.
Five reviews analyzed the effects of FFS reimbursement using structure, process, and outcome indicators [21, 22, 27, 29, 30]. Four were of moderate quality and one was of low quality. Chaix-Couturier et al. found a higher level of activities in FFS, e.g., a higher fee for visits led to an increase in the number of visits made by the physicians themselves instead of deputies. Another study reported that FFS resulted in more elective procedures. Steiner and Robinson focused on managed care, with FFS forming the main comparator. Overall, the results varied among the indicators analyzed. For example, compared with managed care, FFS showed lower preventive screening, higher hospital admission rates, and virtually identical health outcomes. For a more detailed description, see Sect. 3.3.6. Generally, FFS seemed to be less favorable than managed care. However, a closer look at the medical indication is warranted: results suggested that specific conditions, such as depression/mental health, were treated better in FFS. Keyhani et al. analyzed the effects of two types of reimbursement on oversupply of services and found only a slight difference between FFS and managed care. Nejati et al. compared FFS with per-diem reimbursement and bundled payment. They reported that FFS performed less favorably than per-diem payments regarding length of stay and costs and less favorably than bundled payments regarding 5-year cost and quality outcomes. Wranik et al. assumed that team characteristics influenced outcomes and found some evidence that FFS had a negative effect on teamwork.
Overall, FFS seemed to have neither a clearly positive nor a clearly negative impact on structure or outcome of care. Process quality, on the other hand, might be negatively affected by FFS compared with other reimbursement types.
Bonus Payments
Bonus payments are intended to incentivize certain services and are, therefore, paid in addition to the overall reimbursement system.
Two reviews of moderate quality [32, 33] examined whether bonus payments had a positive impact on the provision of certain services. Hamilton et al.  focused on smoking cessation, especially process indicators, e.g., recording of smoking status or referral to smoking-cessation services. Those studies that provided detailed information about the bonus payments reported bonuses of $US24–152 per patient advised or referred. Most of the studies showed improvements regarding process indicators. On the other hand, the results of studies evaluating the quit rate did not allow clear deductions regarding the effects of bonus payments. Sabatino et al.  focused on screening for breast, cervical, and colorectal cancer. The bonuses varied from a practice bonus paid per quarter of approximately 5% of capitation through to year-end physician bonuses, for which no further details about bonus potential were provided. The inconsistency of study results meant the authors could not draw any clear conclusions.
Overall, the available evidence does not allow the impact of bonus payments to be clearly classified.
Bundled Payments
Bundled payments are much more sophisticated than salaried and FFS payments. They define cases based on diagnosis or therapy and provide a single payment for an episode of care or multiple services. Because they facilitate comparison between payments received and costs, they increase transparency and might incentivize efficiency. However, bundled payments also bear some risks, e.g., cost shifting to other sectors or complete omission of services.
Within the system of bundled payments, providers receive predetermined payments based on expected costs for a defined episode of care. Three reviews, two moderate quality and one high quality, reported results on this type of payment. Aviki et al.  focused on oncological care, and their review indicated positive effects but did not provide sufficient evidence. For example, one of the studies  showed an increase in guideline adherence but only for two of the five types of cancer analyzed. Another study  discovered a reduction in hospitalization and radiotherapy, but the cost of chemotherapy drugs increased. Hussey et al.  analyzed the effects of bundled payments on costs and quality of care. The authors identified 20 different designs of bundled payments and concluded that the effects were weak but consistent: bundled payments led to cost reductions but did not show significant effects on quality. Nejati et al.  focused on cancer care. Results showed significant improvements regarding 5-year costs for bundled payments compared with FFS but were heterogeneous regarding outcome quality.
The results of these three reviews reporting on bundled payments are mixed regarding process and outcome quality.
Pay for Performance
With the application of P4P, a new unit of remuneration was introduced: treatment success. Success is determined by the achievement of defined quality indicators, which sets incentives for quality improvement. One of the challenges of P4P is the selection of valid indicators.
The concept of P4P has recently attracted widespread interest. It was examined in ten reviews [22, 23, 25, 32, 33, 35, 65,66,67,68], of which three were of high, six were of moderate, and one was of low quality.
Huang et al. conducted an indication-based review to analyze the effects of P4P on management of diabetes using meta-analysis. Study heterogeneity accounted for some limitations of the analysis. Physician behavior, mainly measured by process indicators, and outcomes were both positively influenced by P4P. Mendelson et al. focused on the effects of P4P regarding the process of care, utilization of services, and outcomes. No clear results emerged for ambulatory care. Methodologically sound controlled before–after studies assessing the effects of P4P in the process of care did not show improvements, whereas six other studies, of which three were at high risk of selection bias, found positive results. A randomized controlled study reported appropriate management of blood pressure, though it was not accompanied by guideline adherence in terms of medication. Available studies were inconsistent about the utilization of services: Mendelson et al. noted that studies with a higher-quality design found no effects. For blood pressure control and cholesterol levels as intermediate outcomes, no statistically significant effects were reported. Petersen et al. and Schatz considered the levels at which incentives were provided, with Schatz’s work in the ambulatory setting showing contradictory results and Petersen et al. drawing more positive conclusions: most studies found at least a partially positive impact for both single- and group-level P4P. Christianson et al. reviewed evaluations of P4P plans. Incentive size varied strongly, from approximately 0.5 to 12% of a physician’s total compensation. Most of the studies reported process quality measures, and few contained outcome quality measures. Overall, each study found at least partial quality improvements. Van Herck et al. studied the impact of P4P on clinical effectiveness and equity of care. As in the review by Christianson et al., most included studies applied process indicators; outcome indicators were less frequently used. The high-quality review reported weak evidence regarding coordination, patient centeredness, continuity, and cost effectiveness. Kondo et al. analyzed P4P in veteran care and community settings, where the evidence for effectiveness of P4P was limited and insufficient for clear conclusions.
Scott et al.  conducted a very detailed analysis of the effects of blended payment schemes, including schemes that directly rewarded performance and quality. They identified three different schemes that followed P4P thinking: tournament-based pay, threshold target payments, and a fixed fee for a patient achieving a certain outcome. Tournament-based pay is a system rewarding medical groups according to their relative performance. The Cochrane review by Scott et al.  included one study  examining the effects of tournament-based pay on the provision of diabetes-related services (glycated hemoglobin testing, urinalysis, lipoprotein density level, and eye examination). Approximately 5% of each physician’s annual fee was covered by the tournament-based pay, which depended on clinical quality, patient satisfaction, and practice efficiency. The results of the study showed better rates of adherence to eye examination guidelines only. Single-threshold target payments are conditional on reaching certain targets. The studies included by Scott et al.  measured effects by process indicators [43,44,45]. Results were mixed, so no conclusion regarding the effects of these payment methods could be drawn. Mullen et al.  evaluated the effects of a combination of tournament-based pay and single-threshold target payments. Indicators included screening rates and appropriate asthma medication. Only one of the indicators (increased screening rate for cervical cancer) showed statistically significant change. Another study  from the review by Scott et al.  examined the effects of paying a fixed fee for a patient achieving an outcome, which in this case was defined as the rate of smokers being “smoke free” at 12-month follow-up. This type of incentive did not have an effect.
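The three P4P-type schemes distinguished by Scott et al. differ mainly in how a payout is triggered. The following Python sketch illustrates that difference only. All names, scores, and amounts are hypothetical and are not taken from any reviewed study, and the equal split of the tournament pool among the top-ranked groups is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A provider group with hypothetical performance data."""
    name: str
    quality_score: float           # assumed composite score, 0-100
    patients_meeting_outcome: int  # e.g., patients smoke free at follow-up


def tournament_bonus(groups, pool, top_n):
    """Tournament-based pay: reward depends on relative performance.

    Assumption: a fixed pool is split equally among the top_n groups.
    """
    ranked = sorted(groups, key=lambda g: g.quality_score, reverse=True)
    winners = {g.name for g in ranked[:top_n]}
    return {g.name: (pool / top_n if g.name in winners else 0.0) for g in groups}


def threshold_bonus(groups, target, bonus):
    """Single-threshold target payment: paid only if the target is reached."""
    return {g.name: (bonus if g.quality_score >= target else 0.0) for g in groups}


def fixed_fee_per_outcome(groups, fee):
    """Fixed fee for each patient achieving a defined outcome."""
    return {g.name: fee * g.patients_meeting_outcome for g in groups}


if __name__ == "__main__":
    groups = [Group("A", 90.0, 10), Group("B", 75.0, 4), Group("C", 60.0, 0)]
    print(tournament_bonus(groups, pool=1000.0, top_n=1))
    print(threshold_bonus(groups, target=70.0, bonus=500.0))
    print(fixed_fee_per_outcome(groups, fee=50.0))
```

In this sketch, a group that improves but stays below the top rank earns nothing under tournament pay, whereas under a threshold payment every group reaching the target is paid regardless of how the others perform.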
The idea of P4P has also been applied in England, Wales, Scotland, and Northern Ireland, where the NHS introduced the Quality and Outcomes Framework (QOF). Hamilton et al.  focused on evaluating monetary incentive systems in the field of smoking cessation. They reported the following impacts of QOF: increased recording of smoking status, provision of cessation advice, and referrals to smoking cessation services, whereas no effect on reduced smoking rates could be proved. The review by Mendelson et al.  did not have an indication-specific focus. The authors’ conclusions regarding the effects of QOF in ambulatory care were ambiguous: Although the included studies showed a tendency for improved process and outcome indicators, this tendency could not be found in methodologically stronger studies. Mendelson et al.  reported that incentive payments accounted for up to 30% of practice income. Forbes et al.  analyzed the effects of QOF in the context of long-term conditions. They reported the amount of payments depending on incentives as 10–15% of practice income. The five studies reported modest improvements regarding emergency admissions and consultations in severe mental illness. Process quality of diabetes care was also positively affected. No clear results were found regarding mortality.
Overall, P4P seemed to have a positive impact on process quality. Outcome quality may also partially benefit, but results were inconclusive and dependent on the outcome measure applied.
Capitation
In capitation, a cross-sector lump-sum reimbursement is paid for a patient’s expected healthcare utilization. This is supposed to incentivize continuity of care and lead to service provision by the most efficient provider. However, it also carries a risk of risk selection.
Many different forms of capitated payments exist. Five reviews dealt with this type of payment, three of low and two of moderate quality. For example, Chaix-Couturier et al.  differentiated capitation, managed care initiatives, and fund-holding models, in which capitated payments were made for each patient registered. For capitation, the authors’ conclusions referred to gynecology patients and reported a reduction in elective procedures in this setting. For managed care, Chaix-Couturier et al.  found a reduction in resource spending due to shorter hospital stays, a lower number of diagnostic services, and higher-quality decision making. Additionally, guideline adherence improved significantly. On the other hand, outcomes of care did not show significant overall improvements. Both positive effects (reduced prescribing costs, decreased number of drugs per prescription, and reduced referral rates for elective surgery and to private clinics) and negative effects (no reduction in physician workload) were observed with fund holding.
Steiner and Robinson conducted a very detailed review on the effects of managed care. The authors analyzed the effects of managed care, mainly compared with FFS, in seven categories. In terms of utilization, they found less use of hospital care in managed care, mostly due to lower admission rates, and more frequent visits to physicians, at least for patients outside mental health care. In mental health, studies recognized fewer physician office visits and less specialized treatment in managed care. Results regarding the use of prescription medication were mixed. For the second category, charges and expenditures, the type of payment seemed to have no significant effects. Regarding preventive screening and health promotion, Steiner and Robinson reported higher activity rates for managed care. Quality of care was measured in terms of structure, process, and outcome quality. Results for process quality were inconsistent, and those for outcome quality did not differ. However, the authors found that access to treatment (structural quality) was more difficult for enrollees of managed care. When it came to enrollee satisfaction, rates were mostly lower for managed care. The sixth category, equity of care, required a differentiated view: children seemed to at least partially benefit from managed care. The care they received within managed care was reported to be as good as or even better than that in an FFS environment. This was shown in particular by an increase in doctor visits, specialist referrals, laboratory tests, and preventive screening. Preventive screening for low-income women was similar in managed care and FFS, but antenatal care was worse, although no differences in childbirth outcomes were observed. Regarding care for elderly people, findings were mixed. The last category contained specific conditions, such as cancer care and chronic disease management. The studies analyzed by Steiner and Robinson indicated that cancer care in managed care was mainly better than or similar to that in FFS. On the other hand, the treatment of depression was either worse or no different. Chronic disease management showed equivalent results for managed care and FFS, and results were mixed for myocardial infarction but did not result in differences regarding mortality. Overall, there seemed to be favorable tendencies for managed care, but results were too inconclusive to determine an overall benefit.
The moderate-quality review by Wranik et al.  aimed to determine the effect of capitation on team expansion. Compared with FFS, team expansion tended to increase under capitated payments. However, the evidence level was weak.
Hodgson et al.  analyzed, among others, how FFS and HMO reimbursement affected the treatment and outcomes of patients with colorectal cancer. HMOs are a special care model wherein coverage of care is usually limited to physicians who work for or contract with the HMO. Regarding the medical treatment, Hodgson et al.  found only little evidence for a statistically significant impact of the reimbursement type, and outcomes did not differ substantially. Johri et al.  focused on social HMOs (S/HMOs), a special type of HMO that aims at care for elderly patients. The S/HMOs assessed within the review put the financial risk for provision of care at a single organizational structure. Payments were provided as capitated payments in advance. For this kind of S/HMO, Johri et al.  drew negative conclusions: analyzed studies showed negative results regarding costs, utilization of services, and outcomes.
When summarized at the level of quality dimensions, the results did not provide clear evidence regarding the effect of capitation.
Accountable Care Organizations
ACOs were introduced to the US healthcare system in 2010 with the Patient Protection and Affordable Care Act. They have been implemented in the Medicare and Medicaid system as well as by private care providers. A key element of ACOs is the assumption of responsibility for medical care by provider networks. Payments are based on FFS and supported by additional elements to ensure quality and efficiency of care. Within “shared savings” programs, an ACO can participate in savings by receiving a certain proportion of the savings as a bonus payment, whereas within “shared risk” programs, the ACO also participates in losses in return for a higher share of participation in savings. In addition, ACOs must meet certain quality criteria.
Three reviews, all of moderate quality, dealt with ACOs. Aviki et al. analyzed the value of care per dollar spent in cancer care. Two of three studies found a reduction in inpatient hospital treatment, especially for the length of stay [47, 48]. The third study, focusing on 30-day mortality, readmission and complication rates, and inpatient length of stay, did not report any effects caused by ACO participation. Kaufman et al. examined the effect of ACOs on the utilization of services, the process of care itself, and outcomes while differentiating between Medicare, Medicaid, and private payer ACOs. Overall, the authors reported a correlation between ACO participation and a reduction of both inpatient care and emergency department visits. The process of care itself was improved, especially for chronic diseases and regarding preventive care services. Regarding outcomes, no generally valid conclusions on the effect of ACO participation could be drawn. Some of the studies reported partially positive effects for patient experience [51, 52] and mortality, whereas others did not find any effects [49, 50, 54, 55]. Nejati et al. evaluated the impact of ACOs in cancer care and reported mixed results but did find some improvements in process quality due to decreased utilization of low-value services within the Medicare Pioneer ACO.
On the level of Donabedian’s quality dimensions, ACOs seemed to have a positive impact on process quality. However, this effect did not result in better outcome quality. Table 3 provides an overview of the results reported within this section.
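The difference between the “shared savings” and “shared risk” arrangements described for ACOs can be illustrated with a small calculation. The Python sketch below is a simplified illustration under stated assumptions, not the settlement rule of any actual program: the function name, the benchmark concept, the share rates, and the rule that savings are paid out only when the quality criteria are met are all hypothetical.

```python
def aco_settlement(benchmark, actual, savings_share, loss_share=0.0, quality_met=True):
    """Return the year-end settlement for an ACO (hypothetical rule).

    benchmark     -- expected spending for the ACO's patients (assumed concept)
    actual        -- observed spending
    savings_share -- proportion of savings paid to the ACO as a bonus
    loss_share    -- proportion of losses borne by the ACO
                     (0.0 models a pure shared savings program)
    quality_met   -- assumption: savings are paid only if quality criteria are met

    A positive result is a bonus to the ACO; a negative result is a repayment.
    """
    difference = benchmark - actual
    if difference >= 0:
        # Spending below benchmark: the ACO receives part of the savings.
        return difference * savings_share if quality_met else 0.0
    # Spending above benchmark: only shared risk programs pass on losses.
    return difference * loss_share


if __name__ == "__main__":
    # Shared savings only: lower share of savings, no downside.
    print(aco_settlement(1_000_000, 900_000, savings_share=0.5))
    # Shared risk: higher share of savings in return for bearing part of losses.
    print(aco_settlement(1_000_000, 900_000, savings_share=0.6, loss_share=0.4))
    print(aco_settlement(1_000_000, 1_100_000, savings_share=0.6, loss_share=0.4))
```

The sketch makes the trade-off explicit: a shared risk ACO earns more when spending falls below the benchmark but must repay part of any overspending.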
Influence of Application Level (Group vs. Individual)
The included reviews did not sufficiently address our initially defined research question on this aspect. Only three reviews in the field of P4P [23,24,25] provided more detailed information about the difference between group and individual incentives. Petersen et al. and Van Herck et al. differentiated between studies of physician-level and group-level incentives, and both reported that most studies showed positive results. However, Petersen et al. found that effects for group-level incentives were weaker. This was supported by Kondo et al., who also stated that physician-level incentives were more effective. Petersen et al. offered a possible explanation: the link between individual performance and the incentive is less direct in the group-level context.