Background

Laboratory testing is an important, high-volume medical resource that facilitates disease detection and monitoring of patient status [1]. However, lab testing is prone to inappropriate use [1], with estimates suggesting that 20-30% of tests ordered are low-value, i.e., unnecessary, not indicated, or potentially harmful [1, 2]. While the tests themselves directly comprise only 4% of overall hospital expenditure, they are thought to be important in up to 70% of subsequent healthcare decisions and their related expenditures, and thus represent an important area for quality improvement [3, 4].

Critical care is one setting where tests are ordered often [5] and where there is concern that overuse contributes to clinically important poor outcomes in vulnerable patients [6,7,8,9,10,11,12,13,14]. Blood loss from repeated sampling can contribute to iatrogenic anemia [5, 15]. Subsequent red blood cell (RBC) transfusions [5, 15] can be associated with non-trivial risks such as transfusion-associated circulatory overload (TACO), transfusion-related acute lung injury (TRALI), and transfusion-related immunomodulation (TRIM) [15, 16]. Similar to laboratory testing, transfusion ordering has been flagged as an important area for quality improvement due to inappropriate use [17,18,19,20,21,22]. A call to improve both practices was made by the Critical Care Societies Collaborative within their “Five Things Physicians and Patients Should Question” list, as part of the Choosing Wisely initiative [23]. The potential risks and downstream consequences associated with laboratory testing and transfusion ordering, in addition to increased expenditure and limited blood resources, all provide motivation to reduce inappropriate use [5, 15, 24, 25].

Audit and Feedback (A&F), the collection and provision of clinical performance data to healthcare providers, represents a potentially low-cost and sustainable class of intervention [26, 27] for improvement of test and transfusion ordering in the critical care setting. A Cochrane review has demonstrated that A&F is effective across a wide range of clinical behaviors [28]. It is a broadly used intervention, familiar to most healthcare providers. We hypothesize that this class of intervention may be particularly well suited to the critical care setting, as A&F can be provided at the individual or group level through a variety of different modalities. Furthermore, test and transfusion ordering is increasingly documented electronically, providing accessible data to produce feedback reports at a reasonable cost [27, 29,30,31,32]. A&F interventions in the context of test ordering in various clinical settings show a 22% relative risk reduction in test volume [33]. To date, however, no review has examined the effectiveness of A&F interventions to modify these behaviors in the complex, team-based critical care setting.

Objectives

  • To review how A&F interventions targeting healthcare professionals have been implemented in the critical care setting to improve the appropriateness of laboratory test and transfusion ordering.

  • To summarize the effectiveness of these interventions as compared to usual care or other interventions in modifying laboratory test and transfusion ordering.

Methods

Protocol and registration

We used the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) checklist [34] to draft our protocol, which was registered with the International Prospective Register of Systematic Reviews (PROSPERO: CRD42016051941) [35, 36]. All deviations from the protocol were minor and were implemented prior to the start of data extraction.

Eligibility criteria

Inclusion

Studies with the following PICOS characteristics were included in the review.

Population

Studies that targeted healthcare professionals (physicians, nurses, phlebotomists, or respiratory therapists) ordering laboratory tests or blood transfusion components (red blood cells (RBCs), platelets, plasma, or cryoprecipitate) for patients in an intensive care unit (ICU). Articles targeting healthcare professionals ordering laboratory tests or blood transfusion components for patients in a non-ICU setting were excluded.

Intervention

Studies assessing Audit and Feedback (A&F) interventions, defined as “Any summary of clinical performance of health care over a specified period of time. The summary may also have included recommendations for clinical action. The information may have been obtained from medical records, computerized databases, or observations from patients” [37]. We also included multifaceted interventions that included an A&F component (e.g., A&F paired with educational sessions).

Comparator

Studies that compared A&F interventions to usual care (no intervention; historical or concurrent), or any other single or multifaceted behavioral intervention that did not involve A&F (e.g., education, incentives, reminders, or systems-based changes).

Outcomes

Primary outcomes included the number of laboratory tests or transfusions ordered. Secondary outcomes included the appropriateness of ordered laboratory tests or transfusions (for example as judged by the clinical context, or as compared to specified guidelines), length of stay (LOS), mortality, infection, and laboratory test or blood product expenditure.

Study design

We included randomized controlled trials (RCTs), controlled clinical trials (CCTs), and observational studies (controlled before-after studies (CBAs), interrupted time series studies (ITSs), and uncontrolled before-after studies (UBAs)).

Setting

We assessed studies that implemented interventions in an intensive care setting. All types of hospitals (e.g., academic, community) and ICUs (e.g., surgical, medical, pediatric, neonatal) were included. Studies implementing interventions across multiple settings (e.g., hospital-wide) were only included if ICU-specific data were reported for the primary outcome.

Exclusion

No date or language filters were applied to the database searches. However, we excluded conference abstracts, commentaries, and letters to the editor, as well as studies not published in English, to maintain feasibility. Previous literature suggests such language restrictions do not greatly affect review conclusions [38]. Studies implementing interventions across multiple settings, but not reporting ICU-specific data for the primary outcome, were excluded.

Search strategy development and information sources

Our Medline (database inception: 1946) search strategy (Additional File 1) was developed with help from an information specialist. The strategy was then peer reviewed by a second, independent information specialist, as recommended by the Centre for Reviews and Dissemination [39,40,41]. Medical Subject Headings (MeSH terms) and title and abstract terms (“.tw”) were chosen for the general categories “Laboratory Tests,” “Transfusions,” “Intensive Care,” and “Audit and Feedback.” This template strategy was translated for use in the remaining databases: Embase (1947), EBM Reviews-Cochrane Central Register of Controlled Trials, CINAHL (1981), and PsycINFO (1806). These searches were run on October 28th, 2016, starting from database inception. The trial registries “ClinicalTrials.gov” and International Standard Registered Clinical/soCial sTudy Number (ISRCTN) were additionally searched on December 23rd, 2016 to identify any relevant ongoing trials, using the search terms “intensive care” and “feedback.” The bibliographies of included articles and relevant systematic reviews [28, 33, 42,43,44] were also hand searched to identify any further articles meeting the inclusion criteria.

Study records

Data management

Citations retrieved from the search were imported into the reference manager software program Mendeley Desktop 1.17.12 (Mendeley Ltd., London, UK) for de-duplication, then imported into Covidence [45] for screening.

Selection process

The titles and abstracts of unique citations identified from electronic database searches were screened by two independent reviewers (MF and KC), and registry citations were screened by one reviewer (MF). Conflicts were resolved through discussion or reference to a third independent reviewer (JCB, JP). Full text articles were screened by one reviewer (MF), and justifications for inclusion or exclusion were confirmed by a second member of the research team (KC).

Data collection process

Data were extracted by two independent reviewers (MF and NM) using a standardized data extraction form implemented in Microsoft Excel 2011. One reviewer piloted the form on the first five articles and only minor refinements were required. Conflicts between data extraction forms were identified by one reviewer (MF), and consensus was reached between reviewers through discussion. If reviewers were not able to come to an agreement, a third reviewer (JCB, JP) was consulted to reach consensus.

Data extracted

We extracted several A&F intervention details based on characteristics described in the most recent Cochrane review [28] (format type, interval between reports (frequency)) and recently published guidance for the optimization of A&F [27] (type of data, specificity of data, number of reports, mode of delivery). We also extracted details about study design, type of control (e.g., historical, concurrent), type of ICU, type of patient (if applicable), type of laboratory test or blood component targeted, study participants (e.g., healthcare provider type), number of participants, follow-up time points, study country, funding, year of publication, and each study’s definition for an appropriate test or transfusion (if applicable). We also extracted other intervention components (for multifaceted interventions) according to the following categories adapted from the Effective Practice and Organisation of Care (EPOC) Taxonomy [46] and a review by Kobewka et al. [33]: Education, Guidelines, Opinion Leader, Administrative Intervention, Financial Incentive, or “Other.”

Risk of bias

Two independent reviewers (NM and MF) assessed the methodological quality of studies using a modified version of the EPOC Review Group’s quality criteria [37] used by Kobewka et al. [33] (Additional File 2). At present, there is not enough evidence to select an appropriate cut-off to differentiate between high- and low-quality studies. Furthermore, Cochrane recommends that researchers avoid a scaled approach, and instead advocates for complete reporting of quality criteria [47]. We have thus presented results for each criterion, and have not excluded any studies from our qualitative review. Reviewers were not blinded during data extraction or quality assessment. Cohen’s Kappa [48] was calculated manually to evaluate inter-rater reliability for extraction of the quality assessment criteria.
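As an illustration of the inter-rater statistic named above, the following is a minimal sketch of Cohen's Kappa for two raters, computed as (observed agreement − chance agreement) / (1 − chance agreement). The function name and the ratings are hypothetical and are not taken from the review's extraction data; the review states only that the statistic was calculated manually.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_chance = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical quality-criterion ratings ("yes" = criterion met) for ten items.
reviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
reviewer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 items, and chance agreement is 0.52 because both raters answer "yes" 60% of the time.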

Data synthesis and analysis

Because of high heterogeneity in study designs, methods, outcomes, and variable reporting formats, we deemed meta-analysis to be inappropriate. Tables of study characteristics, intervention characteristics, and intervention effects were prepared to describe the set of included studies; absolute differences have been calculated for study outcomes. Our results have otherwise been reported as per the PRISMA guidelines, and a PRISMA checklist has been completed to document the inclusion of all critical elements of this review (Additional File 3) [49].
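The absolute differences mentioned above can be sketched as post-intervention value minus baseline value, kept in the outcome's own units. The outcome names and numbers below are hypothetical placeholders, not data from the included studies.

```python
def absolute_difference(baseline: float, post_intervention: float) -> float:
    """Post-intervention value minus baseline value, in the outcome's own units."""
    return post_intervention - baseline


# Hypothetical (baseline, post-intervention) pairs for two outcome types.
outcomes = {
    "tests per patient day": (8.0, 6.5),
    "bundle compliance (%)": (60.0, 75.0),
}
differences = {
    name: absolute_difference(baseline, post)
    for name, (baseline, post) in outcomes.items()
}
for name, diff in differences.items():
    print(f"{name}: {diff:+.1f}")
```

A negative difference is the desired direction for volume outcomes (fewer tests), whereas a positive difference is the desired direction for compliance outcomes, so the sign must be read against each study's aim.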

Results

Study selection

Figure 1 describes our screening process. Starting from 2364 citations (extracted from electronic databases on October 28th, 2016 and registries December 23rd, 2016), after removal of duplicates and two rounds of screening, 16 unique studies (described within a set of 17 publications) [16, 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65] were identified for inclusion (Note: Merlani et al. [60] and Diby et al. [61] are publications assessing different aspects of the same study). A list of the excluded full text articles, sorted by reason for exclusion, can be found in Additional File 4.

Fig. 1 PRISMA flow diagram outlining the selection of citations for inclusion in the qualitative analysis [49]

Study characteristics

Table 1 describes characteristics of the included studies (n = 16). Ten of the 16 studies (63%) included transfusion outcomes [16, 50,51,52,53,54,55,56,57, 64], eight studies (50%) included test ordering outcomes [51, 57,58,59,60,61,62,63, 65], while two studies included both [51, 57]. Of the studies including test ordering outcomes, six aimed to reduce overall test ordering [57, 58, 60,61,62,63, 65], and four aimed to improve the appropriateness of tests; one aimed to increase compliance with a sepsis bundle [51], one aimed to improve compliance with arterial blood gas guidelines (an algorithm) [60, 61], one aimed to improve compliance with standards for practice in the ICU [59], and one aimed to reduce “unordered” tests (tests with no written order) [62]. Of the studies including transfusion ordering outcomes, three aimed to reduce the overall number of transfusions [50, 52, 57], while seven aimed to improve the appropriateness of transfusions [16, 51, 53,54,55,56, 64]. Of those assessing appropriateness, two aimed to improve compliance with a bundle [16, 51], three assessed appropriateness as per guidelines or a protocol involving a transfusion “trigger” (defined level(s) at which to transfuse) and sometimes other patient factors [54,55,56], and one study assessed appropriateness as per guidelines but included an additional category based on clinical context, “inconsistent with guidelines yet appropriate for ICU” [53]. The remaining study used a combination of transfusion “triggers” and an audit of clinical factors; however, several transfusion triggers were noted in the publication and it was not entirely clear which were used to specify appropriateness [64]. Further details on the criteria used to assess appropriateness can be found in Additional File 5.

Table 1 Summary of study characteristics (n = 16 studies)a

Most studies (81%) used an uncontrolled before-after design [50, 51, 53,54,55,56,57,58, 60,61,62,63,64,65]. Only one RCT [59], one controlled clinical trial (CCT) with a quasi-experimental comparative design [16], and one controlled before-after study [52] were identified. Most (56%) were conducted in the United States [50,51,52, 54,55,56,57, 59, 62] and two (13%) were conducted in Canada [53, 58]. Half (50%) of the included studies did not report their source of funding [16, 50, 55, 56, 58, 62, 63, 65]; four studies (25%) reported government grant funding [51, 54, 57, 59]. Most studies (56%) were conducted in a single ICU [51,52,53,54,55, 58, 60,61,62, 64], while four studies (25%) were conducted in multiple ICUs at a single centre [16, 50, 56, 65]. Most (69%) took place at academic hospitals [16, 51,52,53, 55, 57, 60,61,62,63,64,65]. The year of publication ranged from 1988 to 2016, and study duration ranged from 25 weeks to 4 years.

Assessment of study quality

A Cohen’s Kappa of 0.67 was computed for inter-rater reliability (Additional File 6), representing “substantial agreement” as per Landis and Koch, but just meeting the cut-off for “suggesting that … conclusions tentatively be made” as per Krippendorff [48]. As such, reviewers discussed all disagreements to reach a consensus. Additional File 2 describes the quality of included studies (n = 16). Overall quality of the studies was judged to be poor; 94% of studies [16, 50,51,52,53,54,55,56,57,58, 60,61,62,63,64,65] scored 4 or lower on the 8–9 criteria (risk of contamination was often not applicable). Most studies reported similar providers between groups (94%) [16, 51,52,53,54,55,56,57,58,59,60,61,62,63, 65], and used an objective primary outcome measure or blinded for the primary outcome assessment (88%; 13 studies [16, 50,51,52, 54, 56,57,58,59, 62,63,64,65] and one study [53] respectively). However, most studies lacked a concurrent control group (88%) [50,51,52,53,54,55,56,57,58, 60,61,62,63,64,65], did not use time series analysis (100%), provided an insufficient amount of detail to allow for replication (100%), and did not report the number of tests per patient (56%) [16, 50, 53, 54, 56, 59, 62, 64, 65].
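To make the two interpretive scales cited above concrete, the sketch below maps an agreement coefficient onto the commonly quoted Landis and Koch bands and onto Krippendorff's conventional cut-offs (0.800 and 0.667). The function names are hypothetical. Krippendorff's thresholds were proposed for his alpha coefficient, so applying them to Kappa, as the text does, is an approximation.

```python
def landis_koch_label(kappa: float) -> str:
    """Descriptive agreement band per the commonly cited Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if kappa <= upper_bound:
            return label
    raise ValueError("kappa cannot exceed 1")


def krippendorff_guidance(value: float) -> str:
    """Conventional cut-offs: >= 0.800 reliable, >= 0.667 tentative conclusions."""
    if value >= 0.800:
        return "reliable"
    if value >= 0.667:
        return "tentative"
    return "below threshold"


reported_kappa = 0.67  # value reported in the text
print(landis_koch_label(reported_kappa), "/", krippendorff_guidance(reported_kappa))
```

The two scales disagree in tone for the same value, which is why the reviewers resolved every disagreement by discussion rather than relying on the coefficient alone.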

Range of A&F interventions

There was a range of A&F interventions (n = 17) used in the 16 included studies.a As shown in Table 2, most interventions were multifaceted (88%) [16, 50,51,52,53, 55,56,57,58,59,60,61,62, 64, 65], including A&F and one or more additional components (i.e., education, guidelines, opinion leaders, financial incentives, checklists, or administrative interventions). Seven interventions (41%) reported providing feedback in a written format only [16, 52, 56, 57, 60, 61, 63, 65], four (24%) provided at least verbal feedback [54, 55, 58, 64], and three (18%) reported providing both written and verbal feedback [16, 59, 62]. Four interventions (24%) provided feedback only once [50, 59, 64, 65], nine (53%) provided feedback more than once [16, 51, 52, 54, 56, 57, 60, 61, 63], and in four cases it was unclear or the feedback was provided variably (24%) [53, 55, 58, 62]. Where reported, feedback was provided daily in one study [63], weekly in two (12%) [51, 54], monthly in three (18%) [16, 57, 60, 61], and at various instances in four (24%) [16, 52, 55, 56]. Feedback most often provided data on group performance only, in seven of the interventions (41%) [16, 50, 53, 57, 59,60,61, 65], three interventions provided both group and individual feedback (18%) [16, 54, 56], and one intervention only clearly reported providing individual feedback (unclear if group data was provided) [58]. Feedback recipients were most commonly multiple groups of healthcare providers (HCPs) (29%) [53, 55, 59,60,61,62], or physicians only (24%) [54, 56, 63, 64].

Table 2 Description of audit and feedback interventions

Summary of studies on improving test ordering

Table 3 summarizes test and transfusion ordering (or appropriateness) outcome data from the included studies. Six of the 16 studies aimed to reduce test ordering [57, 58, 60,61,62,63, 65]. Five of these six studies reported decreases (range − 1.6 mean tests per encounter, − 1.72 to − 8 tests per patient, − 1.7 to − 3.4 median tests per patient day, − 613.1 tests per 100 hospital days) [57, 58, 60, 61, 63, 65]. All three of the studies that tested significance found these reductions to be statistically significant [57, 60, 61, 65].

Table 3 Summary of effect outcomes for ‘placebo’ A&F studies (RCTs, controlled and uncontrolled before after studies)

Four studies aimed to improve the appropriateness of test orders (as per compliance with a bundle [51], guidelines (an algorithm) [60, 61], standards for practice [59], or whether the test had a written order [62]). Three of these four studies reported statistically significant increases in compliance (range + 5.3 to + 27%) [51, 59,60,61]. The remaining study reported a decrease in the proportion of inappropriate tests; however, upon assessing the number of overall tests per patient and inappropriate tests per patient, we noted undesired increases in both outcomes (range + 144 to + 214 total tests/patient; + 6 to + 15 unordered tests per patient). No statistical test was reported [62].

Summary of studies on improving transfusion ordering

Three studies sought to reduce transfusion orders. All three reported decreases (range − 0.1 mean RBC unit orders/encounter, − 79 FFP use/month [units not reported], − 6 to − 15.9% of patients receiving transfusion [all products], − 2288 units [all products]/year); one reported a statistically significant difference [57], one reported a statistically significant decrease for a subset of patients (overall significance not reported) [52], and one did not report a statistical test [50].

Seven studies [16, 51, 53,54,55,56, 64] aimed to improve the appropriateness of transfusion orders (as per compliance with a bundle [16, 51], a protocol/guideline [54,55,56], guidelines plus clinical context [53], and a combination of transfusion triggers and audit of patient factors [specifics unclear] [64]). Outcomes included the over-transfusion rate, the odds of an inappropriate transfusion, the proportion of patients receiving inappropriate orders, the threshold at which a transfusion was given, the proportion of transfusions with an inappropriate threshold, or compliance with a bundle. Two studies saw significant decreases (range: OR of inappropriate transfusion 0.37–0.52; proportion of patients receiving unnecessary transfusion − 6.6%) [54, 55]; one saw significant reductions during the intervention period and non-significant reductions at follow-up (range − 8 to − 23% inappropriate transfusions; − 5 to − 8% over-transfusion rate; − 0.3 to − 0.5 g/dL mean pre-transfusion trigger) [56]; one saw a significant reduction for one transfusion outcome, but no significant difference for another (− 6.9% to − 17% in proportion of transfusions over specific triggers; distribution of pre-transfusion platelet counts: p = 0.452) [64]; and one saw a non-significant increase in compliance (range + 3.1 to + 3.8% compliant episodes of transfusion) [51]. Another study saw non-significant decreases for both inappropriate transfusions and transfusions consistent with guidelines (− 14% and − 1% respectively) [53]. As described in Table 4, the final included study was a head-to-head comparison of different types of A&F and found the enhanced intervention (timely individual + monthly group feedback) to significantly improve compliance of transfusions as compared to the monthly, group A&F (range + 31 to + 36% bundle compliance) [16].

Table 4 Summary of effect outcomes for comparative A&F study

Table 3 also describes A&F in light of different comparators. Fourteen studies (88%) compared multifaceted interventions to usual care [16, 50,51,52,53, 55,56,57,58,59,60,61,62, 64,65,66]. In most cases, data were only reported for the baseline and post-intervention periods, thus not enabling direct assessment of A&F components only. Nine of these studies [51, 52, 55,56,57, 59,60,61, 64, 65] saw a statistically significant change in the hypothesized direction for at least one of the outcomes (range + 15 to + 27% in compliance, + 5.3 to + 21.6% compliant episodes, − 0.1 to − 1.6 orders/encounter, − 1.7 to − 3.4 median tests per patient day, − 613.1 tests/ 100 hospital days, − 6.9 to − 17% in proportion of transfusions over specific triggers, − 23% in inappropriate transfusions, − 8% in over-transfusion rate, − 0.5 g/dL mean pre-transfusion trigger, − 6.6% in patients receiving unnecessary transfusion, − 15.9% of patients receiving transfusion); three [50, 58, 63] reported changes in the hypothesized direction but did not report the significance (range − 1.72 to − 8 tests per patient; − 79 FFP use/month [units not reported]), and one [53] saw a statistically significant increase in transfusions “inconsistent with guidelines yet appropriate for the ICU” (+ 15% in requests), but non-significant decreases in both inappropriate (− 14% in requests) and “consistent with guidelines” transfusions (− 1% in requests). One study [62] did however provide a comparison of A&F alone versus usual care prior to implementing additional intervention components; undesired increases were seen for both overall (+ 144 tests per patient) and inappropriate tests per patient (+ 15 unordered tests per patient) (significance not reported). The only study [54] to implement a sole A&F intervention saw a significant decrease in the odds and proportion of inappropriate transfusion (OR 0.37–0.52).

Additional outcomes

Additional outcomes of interest, including length of stay, mortality, infection, and expenditure, are summarized in Tables 5 and 6. Length of stay (ICU or hospital) was reported in 11 studies [16, 51, 52, 54,55,56,57, 59,60,61,62, 64] and mortality (ICU or hospital) in ten studies [16, 51,52,53,54,55,56,57, 59,60,61]. A statistically significant reduction in a LOS measure was reported in only one of the seven studies in which it was tested [51]. Statistically significant decreases in mortality were found in three of the eight studies in which it was tested [51, 54, 57]. Of the two studies that reported infection rates, one saw no statistically significant difference [59], and the other did not report statistical tests [52]. Savings or expenditure was reported in five studies [52, 57, 58, 60,61,62]; however, no statistical tests were reported.

Table 5 Summary of secondary outcomes for ‘placebo’ A&F studies (RCTs, controlled and uncontrolled before after studies)
Table 6 Summary of secondary outcomes for comparative A&F study

Discussion

A&F is known to be an effective component of interventions to improve practice [28], and it is suggested to be a feasible intervention due to the availability of electronic health data [27, 29, 30, 32]. However, relatively little work has explored how this behaviour change intervention can be effectively implemented in the complex, team-based critical care setting. Our systematic review yielded 16 studies, the majority of which showed positive effects, though their overall quality and rigour of design were assessed to be relatively weak.

Of the 16 included studies, only one [54] assessed A&F alone as the sole intervention; the remaining studies assessed the effects of A&F alongside a range of intervention components (and in one case it was unclear if there were additional components). That most studies used a multifaceted intervention was reasonable, as previous literature has suggested that these interventions are more effective than single component interventions [33, 66,67,68]. While the lack of simple comparison studies would seem to prevent us from directly assessing the effectiveness of A&F, some investigators have argued that the substantial literature (the latest Cochrane review included 140 trials [28]) demonstrates A&F’s effectiveness, and negates the need for further testing of this intervention on its own [69]. Instead, the assessment of the conditions and mechanisms under which A&F is most effective is argued to be more likely to improve effectiveness of interventions [28, 69, 70]. Future primary studies may therefore consider the application of theory, process evaluations, and methods to compare different intervention component combinations to facilitate identification of those that are most effective and to better understand the potential mechanisms [71, 72]. Syntheses of the literature of the sort we report here are another way to advance work in this field.

Our review points to some mechanisms by which A&F might be made more effective in the critical care context. Two studies in our review [16, 54] suggest that enhancing group feedback with individual feedback may improve intervention effectiveness. This is in line with a previous meta-analysis which found that combined group and individual feedback yielded a larger effect size than either type of feedback alone [32]. Recent guidance around A&F [27] also suggests that provision of individualized feedback whenever possible is more likely to be effective, as group-level feedback is easier for an individual to discount. In the critical care context, providing both levels of feedback may be preferable, in that doing so addresses the team-based nature of critical care [73, 74] but still provides specific data for individual practitioners.

In eight of the 17 interventions, feedback was either presented only once, it was not clearly specified how often feedback was provided, or the feedback was provided variably (only when an inappropriate order was placed) [50, 53, 55, 58, 59, 62, 64, 65]. The finding that not all A&F interventions provide iterative feedback suggests that the important notion of the feedback loop [27] is overlooked in some cases. Recent guidance [27] recommends that feedback be provided multiple times, in order to close the feedback loop (i.e., a provider identifies a practice gap(s) based on the first instance of feedback, makes a change, and then needs subsequent instances of feedback to understand whether the practice change has resulted in improved outcomes).

While we were primarily interested in studies that aimed to reduce inappropriate tests and transfusions, it can be difficult to both define and adjudicate whether these resources are used appropriately [4, 44]. Thus, some studies aim to reduce inappropriate orders, but simply measure the overall reduction in tests or blood components. For instance, in our small sample, six studies (37.5%) did not assess appropriateness. Clear definitions of appropriate use are needed to ensure that the tests and transfusions reduced are in fact unnecessary, and that underuse and patient harm do not occur, especially in the context of the ICU. The remaining ten studies (62.5%) assessed appropriateness, with the majority identifying “appropriateness” as compliance with guidelines or protocols. Across studies, there was great variation in definitions of appropriateness, study aim, and outcomes measured. While it is plausible that varying definitions of appropriateness may have impacted the effectiveness of A&F, the small number of studies identified limited our ability to detect any differences and precluded statistical analysis.

The limited evidence we could find pertaining to patient length of stay (LOS) and mortality showed few significant differences. In part, this may be due to a lack of reporting on patient outcomes, an issue that has also been identified in other reviews [33].

We found studies in this area lacking on important quality indicators. Many studies lacked a concurrent control group, and only one study used randomization. No time-series analyses were identified. Interventions were rarely described adequately to allow for replication. Lack of an appropriate control group and time-series analysis makes interpretation of study results difficult, as any effect seen may simply be due to coincidence, Hawthorne effects, seasonal differences, or another undocumented change [75,76,77]. Non-randomized studies are at risk of introducing selection bias [47]. Furthermore, poor reporting of intervention details makes synthesis and replication more difficult.

A&F interventions for laboratory test and transfusion ordering exhibited differences that may be important but that we were unable to test statistically due to the low number of studies available to us. They differed substantially in terms of the outcomes reported for the two types of studies (e.g., number of tests ordered per 100 hospital days versus number of blood component units ordered per year; unordered “blood work” tests per patient versus proportion of patients receiving an unnecessary transfusion). We noted that a greater proportion of studies assessing transfusion practices (7/10) reported measures of appropriateness as compared to studies assessing laboratory test ordering (4/8), which more often focused on reduction alone. These findings may warrant further investigation when more studies are available.

Strengths and limitations

We conducted the first comprehensive review of A&F interventions for improvement of test and transfusion ordering in critical care. Our search strategy was developed and peer reviewed with guidance from library information specialists, and screening, data extraction, and the risk of bias assessment were completed by two independent reviewers. Furthermore, in addition to summarizing the effectiveness of these interventions, our review is the first to assess characteristics of the A&F interventions in light of recent best practice guidance [27].

Our study has limitations that warrant consideration. Inconsistency in reporting and differences in intervention component nomenclature complicated our categorization of intervention types. Using standard intervention categories and terms (such as those outlined by the EPOC taxonomy [46] or the Expert Recommendations for Implementing Change (ERIC) project [78]), reporting guidelines (such as the Template for Intervention Description and Replication (TIDieR) checklist [79]), and online access to more detailed descriptions of the interventions may facilitate comparisons between studies in future reviews. Our use of an unvalidated subset of quality items also precluded us from computing an overall quality score for each study. While we worked hard to be comprehensive, some relevant studies may not have been included in our review, as not all publications provide the relevant information in the abstract. Considerable work aiming to improve test and transfusion ordering may be conducted as quality improvement initiatives, and thus be less frequently published or more difficult to identify in electronic searches [80, 81]. Finally, there is the potential for publication bias; we note that many of the included studies showed desired, albeit weak, effects, which may suggest that studies with positive and/or significant findings are more likely to be submitted and published. Due to the heterogeneity in outcomes, we were not able to assess the potential for publication bias by funnel plot, as Cochrane suggests that statistical tests for asymmetry be conducted with no fewer than ten studies [82]. Future updates to this review, however, may be able to address this issue.

Guidance for future research

Our research identifies several ways to advance this literature. Use of more rigorous study designs, such as randomized controlled trials or cluster randomized controlled trials, would help to produce a higher quality evidence base around A&F interventions in the critical care setting. Head-to-head trials of different types of A&F would allow study of potential mechanisms of action and of whether theory-informed suggestions for best practice help to optimize this intervention [27, 28, 69]. To allow for more robust and conclusive synthesis techniques such as meta-analysis and network meta-analysis, primary studies should employ comparative designs that measure and report on common outcomes (e.g., the number of laboratory tests per patient). Furthermore, adoption of consistent [46, 78] and thorough reporting practices [79], improved access to feedback templates, and development of core outcome sets would enable research teams to produce cumulative knowledge. Measurement and reporting of core patient outcomes and cost data will also help to assess whether these interventions are safe and sustainable. In future updates of this review, it may be of interest to describe intervention components in light of established frameworks (e.g., Consolidated Framework for Implementation Research [83], TIDieR [79]), and to describe intervention implementation outcomes (e.g., acceptability, adoption, feasibility) [84].

Conclusions

This study showed that A&F is potentially effective in the critical care setting, but interventions are typically inconsistent with best practice recommendations for A&F, and the studies evaluating them lack important indicators of study quality. In the majority of cases, A&F was implemented as one part of a multi-component intervention, limiting our ability to determine which components were contributing to the overall success. Additionally, the majority of studies in our sample were uncontrolled, leaving the results prone to bias [76].

More research focused on the optimization of A&F in critical care is warranted; initial signals of efficacy, and the lack of consistency with best practices, suggest that these types of intervention can be improved. Future work should focus on understanding the mechanisms by which this intervention works [27, 85], particularly in this team-based environment. Assessment of whether interventions designed with more best practice recommendations [27] in place are more effective would help to advance this literature. Further work to develop a tool enabling assessment of A&F interventions in terms of these best practice recommendations would be valuable. Such work will help us determine how A&F interventions may optimally improve test and transfusion ordering in the critical care setting.