Audit and feedback is widely used as a strategy to improve professional practice, either on its own or as a key component of multifaceted quality improvement (QI) interventions. Providing data regarding clinical performance may overcome health professionals’ limited abilities to accurately self-assess their performance.1 It is posited that when well-designed feedback demonstrates suboptimal performance for important and actionable targets, recipients are more likely to respond with efforts to improve quality of care.2

The most recent Cochrane systematic review and meta-analysis of audit and feedback included 140 randomized clinical trials (RCTs),3 making audit and feedback one of the most studied healthcare QI interventions. Three Cochrane reviews over the course of 10 years came to the same conclusion: audit and feedback generally leads to small but potentially important improvements in professional practice (Table 1).3–5 Yet despite the increasing number of audit and feedback trials (Fig. 1), uncertainty remains regarding when audit and feedback is likely to be most helpful and how best to optimize the intervention.6

Table 1 Findings from Cochrane Systematic Reviews and Meta-Analyses of Audit and Feedback Over Time
Figure 1 Cumulative number of randomized trials featuring audit and feedback as a core component of a quality improvement intervention.

In some instances, audit and feedback is highly effective; learning from such examples is necessary to optimize the effectiveness of the intervention across different contexts. The Cochrane review and associated re-analyses have found that the effectiveness of audit and feedback depends to some extent on how the intervention is designed and delivered, suggesting an opportunity to maximize the impact of this QI strategy on quality of care.3,7,8 However, there is evidence that many audit and feedback interventions are developed and tested without an explicit attempt to consider relevant theories or to build upon extant knowledge.9 Ideally, results of early studies would inform the design of future interventions, and through this process, cumulative knowledge would lead to more effective QI. Given the continuing human and financial capital invested in audit and feedback interventions in health care, it is important to examine whether newer trials of audit and feedback have contributed new knowledge to the field.

The purpose of this paper is to extend the results of the Cochrane review of audit and feedback to explore the evolution of evidence supporting this QI intervention over time. In particular, we examined whether effect estimates, and the precision around those estimates, changed over time. To do this, we undertook a cumulative analysis of trials by year of publication and conducted a series of meta-regressions to understand how the literature has developed with respect to determining factors that could explain why audit and feedback is more or less effective.


This is a secondary analysis of data from the previously published Cochrane systematic review of audit and feedback. Complete methodological details are available elsewhere3 and are summarized below. Ethics approval was not required for this study.

Eligibility Criteria

Audit and feedback was defined as a “summary of clinical performance of health care over a specified period of time.” This secondary analysis only included RCTs that directly compared audit and feedback (either alone or as the core, essential feature of a multifaceted intervention) to usual care. Furthermore, only RCTs that evaluated effects on provider practice as a primary outcome were included. For ease of interpretation of the meta-regression and cumulative meta-analysis, we further limited studies to those that reported dichotomous outcomes (i.e., compliance with intended professional practice).

Information Sources, Search, and Study Selection

A search strategy sensitive for RCTs involving audit and feedback was applied in December 2010 to the Cochrane Central Register of Controlled Trials, MEDLINE, EMBASE, and CINAHL. As previously described,3 we developed a MEDLINE search strategy that identified 89 % of all MEDLINE-indexed studies from the previous version of the review and then translated this strategy into the other databases using the appropriate controlled vocabulary as applicable. Search terms included audit, benchmarking, feedback, utilization review, and health care quality, among others, plus typical search terms to focus on RCTs. Two reviewers independently screened the titles, abstracts, and full texts to apply inclusion criteria.

Data Collection Process

Two reviewers independently abstracted data from included studies. Studies included in the previous version of the Cochrane review of audit and feedback were reassessed due to changes in the data abstraction form and methods. Discrepancies were resolved through discussion. For studies lacking extractable data or without baseline information, we contacted investigators via email. Risk of bias for the primary outcome(s) in each study was assessed according to the Cochrane Effective Practice and Organization of Care group criteria10 (sequence generation, allocation concealment, blinding, incomplete outcome data, selective reporting, baseline similarity, lack of contamination, and other). We assigned an overall assessment of the risk of bias for each study as high, unclear, or low, following the recommendations in the Cochrane Handbook.11 Studies with a high risk of bias in at least one domain that decreased the certainty of the effect size of the primary outcome were considered to have a high risk of bias. Conversely, when a study had low risk of bias for each domain, it was deemed low risk of bias overall. All other studies were considered to have unclear risk of bias.

Measure of Treatment Effect

We only extracted results for the primary outcome. When the primary outcome was not specified, we used the variable described in the sample size calculation as the primary outcome. When the primary outcome was still unclear or when the manuscript described several primary process outcomes, we calculated the median value. We calculated the treatment effect as an adjusted risk difference (RD) by subtracting baseline differences from post-intervention differences. Thus, an adjusted RD of +10 % indicates that after accounting for baseline differences, health professionals receiving the intervention adhered to the desired practice 10 % more often than those not receiving the intervention.
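As an illustration of the adjusted RD described above, the following minimal Python sketch subtracts the baseline difference between arms from the post-intervention difference. The function name and the trial numbers are hypothetical and are not taken from any included study.

```python
def adjusted_risk_difference(post_intervention, post_control,
                             base_intervention, base_control):
    """Return the adjusted risk difference in percentage points.

    All four inputs are compliance percentages (0-100).
    """
    post_diff = post_intervention - post_control   # difference after the intervention
    base_diff = base_intervention - base_control   # difference already present at baseline
    return post_diff - base_diff


# Hypothetical trial: compliance of 50 % vs 48 % at baseline
# and 65 % vs 53 % after feedback (intervention vs control).
print(adjusted_risk_difference(65, 53, 50, 48))
```

In this made-up example, the post-intervention difference is 12 points and the baseline difference is 2 points, so the adjusted RD is +10 %.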


Across multiple studies, we weighted the median effect by the number of health care providers. The ‘median of medians’ technique has been used in many similar reviews evaluating the effect of QI interventions on health professional performance,12 owing to the frequency of unit of analysis errors in the literature and the great variety of clinical contexts covered in the studies. For the cumulative analysis, the median adjusted RD and interquartile range (IQR) were recalculated at each time point as studies were added.

The meta-regression examined how the adjusted RD was related to explanatory variables, weighted according to study size (number of health care professionals). Unlike the meta-regression from the Cochrane review of audit and feedback,3 high risk of bias studies were included. The meta-regression also tested the following potential sources of heterogeneity to explain variation in the results of the included studies: format (verbal, written, both, unclear); source (supervisor or senior colleague, professional standards review organization or representative of employer/purchaser, investigators, unclear); frequency (weekly, monthly, less than monthly, one-time); instruction for improvement (explicit measurable target or specific goal but no action plan, action plan with suggestions or advice given to help participants improve but no goal/target, both, neither); direction of change required (increase current behavior, decrease current behavior, mix or unclear); recipient (physician, other health professional); and study risk of bias (high, unclear, low). Meta-regression was conducted for all published trials as of 2010, 2006 and 2002. Finally, we added year of publication as a continuous variable to the meta-regression of all studies as an additional approach to assess whether this variable accounted for a significant portion of the heterogeneity. We conducted a multivariable linear regression using main effects only. Baseline compliance and year of publication were treated as continuous explanatory variables and the others as categorical. The analyses were conducted using the GLIMMIX procedure in SAS Version 9.2 (SAS Institute Inc., Cary, NC, USA), accounting for the dependency between comparisons from the same trial.
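The cumulative analysis described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the study: the function name, the quartile convention (the "inclusive" method of the standard library) and the data are assumptions, and the sketch is unweighted.

```python
from statistics import median, quantiles


def cumulative_summary(comparisons):
    """Summarize adjusted RDs cumulatively by publication year.

    `comparisons` is a list of (publication_year, adjusted_rd) pairs.
    Returns one (year, n_comparisons, median, q1, q3) tuple per year,
    each computed over all comparisons published up to that year.
    """
    results = []
    seen = []
    for year in sorted({y for y, _ in comparisons}):
        # Add the comparisons published in this year to the running pool.
        seen.extend(rd for y, rd in comparisons if y == year)
        if len(seen) >= 2:
            q1, _, q3 = quantiles(seen, n=4, method="inclusive")
        else:
            q1 = q3 = seen[0]  # quartiles are undefined for a single value
        results.append((year, len(seen), median(seen), q1, q3))
    return results


# Hypothetical adjusted RDs (percentage points), not data from the review.
example = [(1990, 2.0), (1990, 8.0), (1995, 5.0), (2000, 3.0), (2000, 1.0)]
for row in cumulative_summary(example):
    print(row)
```

Each output row corresponds to one point on a cumulative plot: the median and IQR of all comparisons available by that year.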


Of the 140 RCTs included in the Cochrane review, 98 comparisons from 62 studies met the criteria for this study (Fig. 2). These studies included over 2,300 groups of healthcare providers (e.g., clinics or hospitals) from 38 trials allocating clusters of professionals, and more than 2,000 professionals from 24 trials allocating individual healthcare providers. Characteristics of these studies are described in Table 2. Most studies took place in the USA or Canada (55 %), and outpatient care was the most common setting (69 %). The feedback was delivered by either the investigators or by an unclear source in 85 % of studies. In 47 % of the studies, the feedback was only delivered once, and in 61 % the feedback did not include an explicit goal or action plan.

Figure 2 Study flow diagram.

Table 2 Characteristics of Studies

The cumulative analysis revealed little change in the median effect or the interquartile range over the course of 25 years (Fig. 3). The median improvement in adherence to intended practice in 2002, after 51 comparisons had been published, was 5.7 % (IQR = 1.65–10.85 %); the effect in 2006, after 86 comparisons, was 3.5 % (IQR = 0.65–9.00 %); and the effect after including all 98 comparisons was 4.4 % (IQR = 1.04–10.90 %).

Figure 3 Cumulative analysis–effect size* of audit and feedback interventions over time (AF: audit and feedback; *absolute difference in compliance with intended professional behaviors).

The meta-regression revealed that heterogeneity in effect sizes could be explained in part by feedback characteristics, but year of publication did not explain a significant portion of the variability in effect size (Table 3). Feedback seemed most effective when it was delivered by a supervisor or respected colleague, was presented more than once, featured both specific goals and action plans, and aimed to decrease the targeted behavior; when baseline performance was lower; and when recipients were non-physicians. Studies published after 2006 did not change the meta-regression results statistically; differences in the estimated effect for most feedback characteristics have been apparent qualitatively since 2002. For example, although the p value was not significant for source of feedback in 2002, the estimated adjusted risk difference for feedback delivered by a supervisor or respected colleague (24.5) was higher than that for feedback delivered by study investigators (17.9) or feedback from a representative of a regulatory agency or employer (0.9).
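To make concrete what a size-weighted regression of effect on a continuous variable such as baseline compliance estimates, the sketch below fits a weighted least-squares line in plain Python. It is a deliberately simplified stand-in, not the model used here: it has a single explanatory variable and ignores the dependency between comparisons from the same trial that the GLIMMIX analysis accounted for. The function name and all numbers are hypothetical.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x.

    x: explanatory variable (e.g., baseline compliance, %)
    y: outcome (e.g., adjusted RD, percentage points)
    weights: study size (e.g., number of health care professionals)
    """
    total_w = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical comparisons constructed to lie exactly on a line,
# so the fitted values are easy to check by hand.
baseline = [20, 40, 60, 80]      # baseline compliance (%)
effect = [12, 8, 4, 0]           # adjusted RD (percentage points)
providers = [10, 20, 30, 40]     # weights

intercept, slope = weighted_linear_fit(baseline, effect, providers)
print(round(intercept, 6), round(slope, 6))
```

A negative slope in such a fit would correspond to the pattern reported above, in which feedback appeared more effective when baseline performance was lower.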

Table 3 Factors Explaining Variability in Effectiveness of Feedback: Serial Meta-Regressions


Audit and feedback works; the median effect is small though still potentially important at the population level, and 27/98 comparisons (28 %) resulted in an improvement of at least 10 % in quality of care.3 Small differences in the results seen in these re-analyses compared to the results of the Cochrane review are due to the lack of weighting in the cumulative analysis and the inclusion of high risk of bias studies in the meta-regression. Nevertheless, the expected effect of an intervention comparing audit and feedback to usual care has changed very little over the last two decades. Furthermore, new trials have provided little new knowledge regarding key effect modifiers. Given the lack of equipoise, it may no longer be ethically appropriate to continue to direct human and financial resources toward trials comparing audit and feedback against usual care, especially for common conditions in common settings. At this point, the appropriate question is not, ‘can audit and feedback improve professional practice?’ but ‘how can the effect of audit and feedback interventions be optimized?’

Based on our analyses, feedback seems most effective when it is delivered by a supervisor or respected colleague; is presented frequently; includes both specific goals and action plans; aims to decrease the targeted behavior; focuses on a problem where there is larger scope for improvement; and is directed at non-physician recipients. Unfortunately, relatively few trials feature these components. Furthermore, our findings suggest that investigators are not building upon best practices. For example, despite evidence that repeated feedback is more effective, studies that evaluate interventions after only one cycle of feedback continue to be performed. Moreover, of the 32 studies conducted after 2002 considered in this analysis, feedback was delivered by a supervisor or respected colleague only six times, and no studies included feedback with both explicit goals and action plans. As a result, even after 140 randomized trials of audit and feedback, it remains difficult to identify how to optimize audit and feedback.6 For instance, although a ‘supervisor or respected colleague’ appears to be the most effective source to deliver feedback, precise strategies to reliably identify and leverage such sources are not well known.13 In addition, while it is advisable for action plans to accompany feedback since the downside is minimal, the best way to operationalize this is unknown.7,14 It is noteworthy that explicit targets without action plans do not seem to be particularly helpful. To achieve performance targets, recipients of feedback benefit from correct solution information8 that can focus their attention on the targeted behavior(s).

Cumulative meta-analyses have previously been used to investigate whether future trials would be likely to change the conclusions regarding the effectiveness of QI or health services interventions.15,16 For audit and feedback, it is plausible that further studies comparing the intervention against control may be informative if they are conducted for settings, professional groups or behaviors not well targeted in the current review (although relatively few additional trials should be needed to confirm whether observed effects are broadly aligned with observed effects across the body of literature). We recognize the risks of cumulative meta-analysis with respect to multiple testing and escalating type I error.17 However, since the Cochrane review did not include a variance around the intervention effect, the figures showing the results of our cumulative analysis do not feature error bars as in the seminal examples of Lau et al.18 Additionally, the number of characteristics tested in the meta-regression was limited by statistical and pragmatic concerns. Variables were only chosen for abstraction if there was an a priori directional hypothesis and a belief that data would be available in published reports. Confidence in the results of the meta-regression is limited by reliance upon indirect comparisons and risk of ecological fallacy. In other words, relationships identified across studies through meta-regression may not reflect relationships evident within studies; this is also known as aggregation bias. Finally, as with any review, the limitations of the primary studies must be considered.

We acknowledge that many other potential variables, including the clinical topic and context, likely impact the effectiveness of the intervention.19,20 Amongst the 98 comparisons, there were 41 comparisons testing audit and feedback alone and 57 comparisons testing audit and feedback as the core, essential part of a multifaceted intervention. It is plausible that co-interventions may interact with the effect modifiers tested in the meta-regressions. A recent international meeting was conducted to identify high-yield research questions for understanding how to enhance the effectiveness of audit and feedback. Stakeholders suggested a need for more research to better understand how contextual and recipient characteristics moderate audit and feedback effectiveness, characteristics of the desired behavior change that make a good target for audit and feedback, and how the specific design of the audit and feedback intervention interacts with these factors.21

Given the importance of audit and feedback as a key component of many QI interventions, there is a need to identify opportunities to sequentially and systematically test various approaches to the design and development of audit and feedback. Researchers can continue to conduct uncoordinated trials of audit and feedback versus usual care and rely upon periodically conducted meta-regressions across studies to explore effect modifiers. But the results will be at risk of ecological fallacies, and as demonstrated here, this approach has resulted in minimal advances over time. Alternatively, researchers could achieve greater confidence in causal inference regarding more effective intervention design through a limited number of multi-arm trials with direct, head-to-head comparisons testing different approaches for designing and delivering audit and feedback. Another approach that could help advance cumulative knowledge regarding audit and feedback and other QI strategies would be to consider engineering-based methodological options that enable testing of multiple potential effect modifiers, such as theory-driven factorial and/or sequential adaptive trials.22 Future audit and feedback interventions should feature the aspects known to be associated with greater effectiveness and future trials should be powered to find relatively small effect sizes, especially in the case of head-to-head trials. This proposed shift in direction for QI trials parallels the movement to limit placebo-controlled trials of clinical interventions and to increase focus on comparative effectiveness research.23

The findings of this review suggest that QI trialists have failed to cumulatively learn from previous studies (or from systematic reviews). Rather, it would appear that the norm for those testing audit and feedback interventions is to ‘re-invent the wheel’, repeating rather than learning from and contributing to extant knowledge.24 As highlighted in the recent series on increasing value and reducing waste in research,25 the opportunity cost of continuing in the current manner is large for patients, providers, and health systems. A coordinated approach toward building upon previous literature and relevant theory to identify the key, active ingredients of interventions would help QI stakeholders achieve greater impact with their interventions and produce outcomes that are more generalizable.26,27 In particular, QI trialists could benefit from adapting the model of the Children’s Oncology Group, which has successfully shared resources to accelerate progress.28 At a minimum, for stakeholders involved in the funding and conduct of QI trials, this analysis emphasizes the need for trials of carefully planned interventions with explicitly justified components to ensure that the field of QI in healthcare can move forward.