Background

Implementation strategies are “methods or techniques used to enhance the adoption, implementation, and sustainment of evidence-based practices or programs” (EBPs) [1]. In 2015, the Expert Recommendations for Implementing Change (ERIC) study organized a panel of implementation scientists to compile a standardized set of implementation strategy terms and definitions [2,3,4]. These 73 strategies were then organized into nine “clusters” [5]. The ERIC taxonomy has been widely adopted and further refined [6,7,8,9,10,11,12,13]. However, much of the evidence for individual ERIC strategies or groups of strategies remains narrowly focused. Prior systematic reviews and meta-analyses have assessed strategy effectiveness, but have generally focused on a specific strategy (e.g., Audit and Provide Feedback) [14,15,16], subpopulation, disease (e.g., individuals living with dementia) [16], outcome [15], service setting (e.g., primary care clinics) [17,18,19], or geography [20]. Given that these strategies are intended to have broad applicability, there remains a need to understand how well implementation strategies work across EBPs and settings and the extent to which implementation knowledge is generalizable.

There are challenges in assessing the evidence of implementation strategies across many EBPs, populations, and settings. Heterogeneity in population characteristics, study designs, methods, and outcomes has made it difficult to quantitatively compare which strategies work and under which conditions [21]. Moreover, there remains significant variability in how researchers operationalize, apply, and report strategies (individually or in combination) and outcomes [21, 22]. Still, synthesizing data related to using individual strategies would help researchers replicate findings and better understand possible mediating factors including the cost, timing, and delivery by specific types of health providers or key partners [23,24,25]. Such an evidence base would also aid practitioners with implementation planning such as when and how to deploy a strategy for optimal impact.

Building upon previous efforts, we therefore conducted a systematic review to evaluate the level of evidence supporting the ERIC implementation strategies across a broad array of health and human service settings and outcomes, as organized by the evaluation framework, RE-AIM (Reach, Effectiveness, Adoption, Implementation, Maintenance) [26,27,28]. A secondary aim of this work was to identify patterns in scientific reporting of strategy use that could inform not only reporting standards for strategies but also the methods employed in future studies. The current study was guided by the following research questions (Footnote 1):

  1. What implementation strategies have been most commonly and rigorously tested in health and human service settings?

  2. Which implementation strategies were commonly paired?

  3. What is the evidence supporting commonly tested implementation strategies?

Methods

We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA-P) model [29,30,31] to develop and report on the methods for this systematic review (Additional File 1). This study was considered to be non-human subjects research by the RAND institutional review board.

Registration

The protocol was registered with PROSPERO (PROSPERO 2021 CRD42021235592).

Eligibility criteria

This review sought to synthesize evidence for implementation strategies from research studies conducted across a wide range of health-related settings and populations. Inclusion criteria required that studies: 1) were available in English; 2) were published between January 1, 2010 and September 20, 2022; 3) were based on experimental research (protocols, commentaries, conference abstracts, and proposed frameworks were excluded); 4) were set in a health or human service context (described below); 5) tested at least one quantitative outcome that could be mapped to the RE-AIM evaluation framework [26,27,28]; and 6) evaluated the impact of an implementation strategy that could be classified using the ERIC taxonomy [2, 32]. We defined health and human service setting broadly, including inpatient and outpatient healthcare settings, specialty clinics, mental health treatment centers, long-term care facilities, group homes, correctional facilities, child welfare or youth services, aging services, and schools, and required that the focus be on a health outcome. We excluded hybrid type I trials that primarily focused on establishing EBP effectiveness, qualitative studies, studies that described implementation barriers and facilitators without assessing implementation strategy impact on an outcome, and studies not meeting standardized rigor criteria defined below.

Information sources

Our three-pronged search strategy included searching academic databases (i.e., CINAHL, PubMed, and Web of Science for replicability and transparency), seeking recommendations from expert implementation scientists, and assessing existing, relevant systematic reviews and meta-analyses.

Search strategy

Search terms included “implementation strateg*” OR “implementation intervention*” OR “implementation bundl*” OR “implementation support*.” The search, conducted on September 20, 2022, was limited to English language and publication between 2010 and 2022, similar to other recent implementation science reviews [22]. This timeframe was selected to coincide with the advent of Implementation Science and when the term “implementation strategy” became conventionally used [2, 4, 33]. A full search strategy can be found in Additional File 2.

Title and abstract screening process

Each study’s title and abstract were read by two reviewers, who dichotomously scored studies on each of the six eligibility criteria described above as yes=1 or no=0, resulting in a score ranging from 1 to 6. Abstracts receiving a six from both reviewers were included in the full text review. Those with only one score of six were adjudicated by a senior member of the team (MJC, SSR, DEG). The study team held weekly meetings to troubleshoot and resolve any ongoing issues noted through the abstract screening process.
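
The dual-reviewer screening rule described above can be sketched in a few lines of Python. This is only an illustration and not the study team's actual tooling: the function names and the decision labels are hypothetical, and the handling of abstracts where neither reviewer assigned a six is an assumption (labeled here as "not_advanced").

```python
from typing import Sequence

N_CRITERIA = 6  # the six eligibility criteria listed under "Eligibility criteria"


def screening_score(criteria_met: Sequence[bool]) -> int:
    """Sum one reviewer's yes=1 / no=0 ratings across the six criteria."""
    if len(criteria_met) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} ratings, got {len(criteria_met)}")
    return sum(1 for met in criteria_met if met)


def screening_decision(score_a: int, score_b: int) -> str:
    """Route an abstract based on the two reviewers' total scores."""
    sixes = int(score_a == N_CRITERIA) + int(score_b == N_CRITERIA)
    if sixes == 2:
        return "full_text_review"   # both reviewers scored six
    if sixes == 1:
        return "adjudicate"         # a senior team member decides
    return "not_advanced"           # assumption: neither reviewer scored six
```

For example, an abstract scored six by one reviewer and five by the other would be routed to adjudication rather than directly to full text review.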

Full text screening

During the full text screening process, we reviewed, in pairs, each article that had progressed through abstract screening. Conflicts between reviewers were adjudicated by a senior member of the team for a final inclusion decision (MJC, SSR, DEG).

Review of study rigor

After reviewing published rigor screening tools [34,35,36], we developed an assessment of study rigor that was appropriate for the broad range of reviewed implementation studies. Reviewers evaluated studies on the following: 1) presence of a concurrent comparison or control group (=2 for traditional randomized controlled trial or stepped wedge cluster randomized trial and =1 for pseudo-randomized and other studies with concurrent control); 2) EBP standardization by protocol or manual (=1 if present); 3) EBP fidelity tracking (=1 if present); 4) implementation strategy standardization by operational description, standard training, or manual (=1 if present); 5) length of follow-up from full implementation of intervention (=2 for twelve months or longer, =1 for six to eleven months, or =0 for less than six months); and 6) number of sites (=1 for more than one site). Rigor scores ranged from 0 to 8, with 8 indicating the most rigorous. Articles were included if they 1) included a concurrent control group, 2) had an experimental design, and 3) received a score of 7 or 8 from two independent reviewers.
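
As a worked illustration of the scoring rubric above, the sketch below totals the six components into the 0 to 8 rigor score. The class, field names, and design labels are hypothetical. Reading "a score of 7 or 8 from two independent reviewers" as "both reviewers' scores are at least 7" is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class RigorRatings:
    design: str                  # "rct", "stepped_wedge", "other_concurrent_control", or "none"
    ebp_standardized: bool       # EBP protocol or manual present
    ebp_fidelity_tracked: bool   # EBP fidelity tracking present
    strategy_standardized: bool  # operational description, standard training, or manual
    followup_months: float       # follow-up from full implementation
    n_sites: int


def rigor_score(r: RigorRatings) -> int:
    """Total the six rubric components (maximum 2 + 1 + 1 + 1 + 2 + 1 = 8)."""
    if r.design in ("rct", "stepped_wedge"):
        control_points = 2
    elif r.design == "other_concurrent_control":
        control_points = 1
    else:
        control_points = 0
    if r.followup_months >= 12:
        followup_points = 2
    elif r.followup_months >= 6:
        followup_points = 1
    else:
        followup_points = 0
    return (control_points + int(r.ebp_standardized) + int(r.ebp_fidelity_tracked)
            + int(r.strategy_standardized) + followup_points + int(r.n_sites > 1))


def passes_rigor(score_a: int, score_b: int,
                 has_concurrent_control: bool, is_experimental: bool) -> bool:
    """Apply the three inclusion conditions; both scores must be 7 or 8 (assumed reading)."""
    return has_concurrent_control and is_experimental and min(score_a, score_b) >= 7
```

Under this rubric, a multi-site randomized trial with a manualized EBP, fidelity tracking, a standardized strategy, and twelve months of follow-up receives the maximum score of 8.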

Outside expert consultation

We contacted 37 global implementation science experts who were recognized by our study team as leaders in the field or who were commonly represented among first or senior authors in the included abstracts. We asked each expert for recommendations of publications meeting study inclusion criteria (i.e., quantitatively evaluating the effectiveness of an implementation strategy). Recommendations were recorded and compared to the full abstract list.

Systematic reviews

Eighty-four systematic reviews were identified through the initial search strategy (See Additional File 3). Systematic reviews that examined the effectiveness of implementation strategies were reviewed in pairs for studies that were not found through our initial literature search.

Data abstraction and coding

Data from the full text review were abstracted in pairs, with conflicts resolved by senior team members (DEG, MJC) using a standard Qualtrics abstraction form. The form captured the setting, number of sites and participants studied, evidence-based practice/program of focus, outcomes assessed (based on RE-AIM), strategies used in each study arm, whether the study took place in the U.S. or outside of the U.S., and the findings (i.e., was there significant improvement in the outcome(s)?). We coded implementation strategies used in the Control and Experimental Arms. We defined the Control Arm as receiving the lowest number of strategies (which could mean zero strategies or care as usual) and the Experimental Arm as the most intensive arm (i.e., receiving the highest number of strategies). When studies included multiple Experimental Arms, the Experimental Arm with the least intensive implementation strategy(ies) was classified as “Control” and the Experimental Arm with the most intensive implementation strategy(ies) was classified as the “Experimental” Arm.
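
The arm-labeling rule above (fewest strategies is "Control", most strategies is "Experimental") can be expressed as a small helper. This is a minimal sketch with hypothetical arm names and an illustrative data layout; it is not the abstraction form itself, and the text does not say how ties in strategy counts were resolved.

```python
def classify_arms(arms: dict[str, set[str]]) -> tuple[str, str]:
    """Return (control_arm, experimental_arm) names based on strategy counts.

    `arms` maps an arm name to the set of coded strategies it received.
    Ties are broken by insertion order here, which is an assumption.
    """
    if len(arms) < 2:
        raise ValueError("need at least two study arms")
    control = min(arms, key=lambda name: len(arms[name]))
    experimental = max(arms, key=lambda name: len(arms[name]))
    return control, experimental


# Hypothetical three-arm study: the middle arm is neither Control nor Experimental.
example = {
    "usual care": set(),
    "low intensity": {"Distribute Educational Materials"},
    "high intensity": {
        "Distribute Educational Materials",
        "External Facilitation",
        "Audit and Provide Feedback",
    },
}
control_arm, experimental_arm = classify_arms(example)
```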

Implementation strategies were classified using standard definitions (MJC, SSR, DEG), based on minor modifications to the ERIC taxonomy [2,3,4]. Modifications resulted in 70 named strategies and were made to decrease redundancy and improve clarity. These modifications were based on input from experts, cognitive interview data, and team consensus [37] (See Additional File 4). Outcomes were then coded into RE-AIM outcome domains following best practices as recommended by framework experts [26,27,28]. We coded the RE-AIM domain of Effectiveness as either an assessment of the effectiveness of the EBP or the implementation strategy. We did not assess implementation strategy fidelity or effects on health disparities as these are recently adopted reporting standards [27, 28] and not yet widely implemented in current publications. Further, we did not include implementation costs as an outcome because reporting guidelines have not been standardized [38, 39].

Assessment and minimization of bias

Assessment and minimization of bias is an important component of high-quality systematic reviews. The Cochrane Collaboration guidance for conducting high-quality systematic reviews recommends including a specific assessment of bias for individual studies by assessing the domains of randomization, deviations of intended intervention, missing data, measurement of the outcome, and selection of the reported results (e.g., following a pre-specified analysis plan) [40, 41]. One way we addressed bias was by consolidating multiple publications from the same study into a single finding (i.e., N=1), so-as to avoid inflating estimates due to multiple publications on different aspects of a single trial. We also included high-quality studies only, as described above. However, it was not feasible to consistently apply an assessment of bias tool due to implementation science’s broad scope and the heterogeneity of study design, context, outcomes, and variable measurement, etc. For example, most implementation studies reviewed had many outcomes across the RE-AIM framework, with no one outcome designated as primary, precluding assignment of a single score across studies.

Analysis

We used descriptive statistics to present the distribution of health or healthcare area, settings, outcomes, and the median number of included patients and sites per study, overall and by country (classified as U.S. vs. non-U.S.). Implementation strategies were described individually, using descriptive statistics to summarize the frequency of strategy use “overall” (in any study arm), and the mean number of strategies reported in the Control and Experimental Arms. We additionally described the strategies that were only in the experimental (and not control) arm, defining these as strategies that were “tested” and may have accounted for differences in outcomes between arms.

We described frequencies of pair-wise combinations of implementation strategies in the Experimental Arm. To assess the strength of the evidence supporting implementation strategies that were used in the Experimental Arm, study outcomes were categorized by RE-AIM and coded by whether use of the strategies was associated with a significantly positive effect (yes=1; no=0). We then created an indicator variable for whether at least one RE-AIM outcome in the study was significantly positive (yes=1; no=0). We plotted strategies on a graph with quadrants based on the combination of the median number of studies in which a strategy appears and the median percent of studies in which a strategy was associated with at least one positive RE-AIM outcome. The upper right quadrant (higher number of studies overall and higher percent of studies with a significant RE-AIM outcome) represents a superior level of evidence. For implementation strategies in the upper right quadrant, we describe each RE-AIM outcome and the proportion of studies that have a significant outcome.
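
Two of the steps above, counting pair-wise strategy combinations and deriving the "at least one positive RE-AIM outcome" indicator, lend themselves to a short illustration. The sketch below assumes each study's Experimental Arm is stored as a set of strategy names and each study's outcomes as a mapping from RE-AIM domain to 1 or 0; these data structures and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(experimental_arms: list[set[str]]) -> Counter:
    """Count how many studies used each unordered pair of strategies together."""
    counts: Counter = Counter()
    for strategies in experimental_arms:
        # Sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(strategies), 2):
            counts[pair] += 1
    return counts


def any_positive(outcomes: dict[str, int]) -> int:
    """Return 1 if at least one assessed RE-AIM outcome was coded 1, else 0."""
    return int(any(value == 1 for value in outcomes.values()))
```

Calling `pair_frequencies(arms).most_common(10)` would then list the ten most frequent pairings across the supplied studies.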

Results

Search results

We identified 14,646 articles through the initial literature search, 17 articles through expert recommendation (three of which were not included in the initial search), and 1,942 articles through reviewing prior systematic reviews (Fig. 1). After removing duplicates, 9,399 articles were included in the initial abstract screening. Of those, 48% (n=4,075) abstracts were reviewed in pairs for inclusion. Articles with a score of five or six were reviewed a second time (n=2,859). One quarter of abstracts that scored lower than five were reviewed for a second time at random. We screened the full text of 1,426 articles in pairs. Common reasons for exclusion were 1) study rigor, including no clear delineation between the EBP and implementation strategy, 2) not testing an implementation strategy, and 3) article type that did not meet inclusion criteria (e.g., commentary, protocol, etc.). Six hundred seventeen articles were reviewed for study rigor with 385 excluded for reasons related to study design and rigor, and 86 removed for other reasons (e.g., not a research article). Among the three additional expert-recommended articles, one met inclusion criteria and was added to the analysis. The final number of studies abstracted was 129 representing 143 publications.

Fig. 1

Expanded PRISMA Flow Diagram

The expanded PRISMA flow diagram provides a description of each step in the review and abstraction process for the systematic review

Descriptive results

Of 129 included studies (Table 1; see also Additional File 5 for Summary of Included Studies), 103 (79%) were conducted in a healthcare setting. EBP health care setting varied and included primary care (n=46; 36%), specialty care (n=27; 21%), mental health (n=11; 9%), and public health (n=30; 23%), with 64 studies (50%) occurring in an outpatient health care setting. Studies included a median of 29 sites and a median target population of 1,419 (e.g., patients or students). The number of strategies varied widely across studies, with Control Arms averaging approximately two strategies (Range = 0-20, including studies with no strategy in the comparison group) and Experimental Arms averaging eight strategies (Range = 1-21). Non-U.S. studies (n=73) included more sites and larger target populations on average, with an overall median of 32 sites and 1,531 patients assessed in each study.

Table 1 Study characteristics

Organized by RE-AIM, the most evaluated outcomes were Effectiveness (n = 82, 64%) and Implementation (n = 73, 56%); followed by Maintenance (n=40; 31%), Adoption (n=33; 26%), and Reach (n=31; 24%). Most studies (n = 98, 76%) reported at least one significantly positive outcome. Adoption and Implementation outcomes showed positive change in three-quarters of studies (n=78), while Reach (n=18; 58%), Effectiveness (n=44; 54%), and Maintenance (n=23; 58%) outcomes evidenced positive change in approximately half of studies.

The following describes the results for each research question.

  1. What implementation strategies have been most commonly and rigorously tested in health and human service settings?

Table 2 shows the frequency of studies within which an implementation strategy was used in the Control Arm, Experimental Arm(s), and tested strategies (those used exclusively in the Experimental Arm) grouped by strategy type, as specified by previous ERIC reports [2, 6].

Table 2 Frequency of ERIC implementation strategy use

Control arm

In about half the studies (53%; n=69), the Control Arms were “active controls” that included at least one strategy, with an average of 1.64 (and up to 20) strategies reported in control arms. The two most common strategies used in Control Arms were: Distribute Educational Materials (n=52) and Conduct Educational Meetings (n=30).

Experimental arm

Experimental conditions included an average of 8.33 implementation strategies per study (Range = 1-21). Figure 2 shows a heat map of the strategies that were used in the Experimental Arms in each study. The most common strategies in the Experimental Arm were Distribute Educational Materials (n=99), Conduct Educational Meetings (n=96), Audit and Provide Feedback (n=76), and External Facilitation (n=59).

Fig. 2

Implementation strategies used in the Experimental Arm of included studies. Explore more here: https://public.tableau.com/views/Figure2_16947070561090/Figure2?:language=en-US&:display_count=n&:origin=viz_share_link

Tested strategies

The average number of implementation strategies that were included in the Experimental Arm only (and not in the Control Arm) was 6.73 (Range = 0-20) (Footnote 2). Overall, the top 10% of tested strategies included Conduct Educational Meetings (n=68), Audit and Provide Feedback (n=63), External Facilitation (n=54), Distribute Educational Materials (n=49), Tailor Strategies (n=41), Assess for Readiness and Identify Barriers and Facilitators (n=38), and Organize Clinician Implementation Team Meetings (n=37). Few studies tested a single strategy (n=9). These strategies included Audit and Provide Feedback, Conduct Educational Meetings, Conduct Ongoing Training, Create a Learning Collaborative, External Facilitation (n=2), Facilitate Relay of Clinical Data To Providers, Prepare Patients/Consumers to be Active Participants, and Use Other Payment Schemes. Three implementation strategies were included in the Control or Experimental Arms but were not tested: Use Mass Media, Stage Implementation Scale Up, and Fund and Contract for the Clinical Innovation.

  2. Which implementation strategies were commonly paired?

Table 3 shows the five most used strategies in Experimental Arms with their top ten most frequent pairings, excluding Distribute Educational Materials and Conduct Educational Meetings, as these strategies were included in almost all Experimental and half of Control Arms. The five most used strategies in the Experimental Arm included Audit and Provide Feedback (n=76), External Facilitation (n=59), Tailor Strategies (n=43), Assess for Readiness and Identify Barriers and Facilitators (n=43), and Organize Implementation Teams (n=42).

Table 3 Top 5 commonly used strategies in the Experimental Arm and their 10 most common pairings, organized by cluster †

Strategies frequently paired with these five strategies included two educational strategies: Distribute Educational Materials and Conduct Educational Meetings. Other commonly paired strategies included Develop a Formal Implementation Blueprint, Promote Adaptability, Conduct Ongoing Training, Purposefully Reexamine the Implementation, and Develop and Implement Tools for Quality Monitoring.

  3. What is the evidence supporting commonly tested implementation strategies?

We classified the strength of evidence for each strategy by evaluating both the number of studies in which each strategy appeared in the Experimental Arm and the percentage of times there was at least one significantly positive RE-AIM outcome. Using these factors, Fig. 3 shows the number of studies in which individual strategies were evaluated (on the y axis) compared to the percentage of times that studies including those strategies had at least one positive outcome (on the x axis). Due to the non-normal distribution of both factors, we used the median (rather than the mean) to create four quadrants. Strategies in the lower left quadrant were tested in fewer than the median number of studies (8.5) and were associated with a significant RE-AIM outcome in fewer than the median percent of studies (75%). The upper right quadrant included strategies that occurred in more than the median number of studies (8.5) and had more than the median percent of studies with a significant RE-AIM outcome (75%); thus, those 19 strategies were viewed as having stronger evidence. Of those 19 implementation strategies, Conduct Educational Meetings, Distribute Educational Materials, External Facilitation, and Audit and Provide Feedback continued to occur frequently, appearing in 59-99 studies.
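
The median-split quadrant assignment described above can be sketched as follows. The input values in the example are invented purely for illustration and are not data from this review. How strategies falling exactly on a median were handled is not stated in the text, so placing them in the "lower" or "left" half is an assumption.

```python
from statistics import median


def assign_quadrants(stats: dict[str, tuple[int, float]]) -> dict[str, str]:
    """Assign each strategy to a quadrant using median splits.

    `stats` maps strategy name -> (number of studies, percent of studies
    with at least one significantly positive RE-AIM outcome).
    """
    n_median = median(n for n, _ in stats.values())
    pct_median = median(pct for _, pct in stats.values())
    quadrants = {}
    for name, (n, pct) in stats.items():
        vertical = "upper" if n > n_median else "lower"       # y axis: number of studies
        horizontal = "right" if pct > pct_median else "left"  # x axis: percent positive
        quadrants[name] = f"{vertical} {horizontal}"
    return quadrants


# Invented example values (not study data).
example_stats = {
    "Strategy A": (10, 80.0),
    "Strategy B": (2, 50.0),
    "Strategy C": (9, 60.0),
    "Strategy D": (3, 90.0),
}
example_quadrants = assign_quadrants(example_stats)
```

In this invented example the medians are 6 studies and 70%, so only "Strategy A" lands in the upper right quadrant.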

Fig. 3

Experimental Arm Implementation Strategies with significant RE-AIM outcome. Explore more here: https://public.tableau.com/views/Figure3_16947017936500/Figure3?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link

Figure 4 graphically illustrates the proportion of significant outcomes for each RE-AIM outcome for the 19 commonly used and evidence-based implementation strategies in the upper right quadrant. These findings again show the widespread use of Conduct Educational Meetings and Distribute Educational Materials. Implementation and Effectiveness outcomes were assessed most frequently, with Implementation being the most commonly reported significantly positive outcome.

Fig. 4

RE-AIM outcomes for the 19 Top-Right Quadrant Implementation Strategies. The y-axis is the number of studies and the x-axis is a stacked bar chart for each RE-AIM outcome with R=Reach, E=Effectiveness, A=Adoption, I=Implementation, M=Maintenance. Blue denotes at least one significant RE-AIM outcome; light blue denotes studies that used the given implementation strategy and did not have a significant RE-AIM outcome. Explore more here: https://public.tableau.com/views/Figure4_16947017112150/Figure4?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link

Discussion

This systematic review identified 129 experimental studies examining the effectiveness of implementation strategies across a broad range of health and human service studies. Overall, we found that evidence is lacking for most ERIC implementation strategies, that most studies employed combinations of strategies, and that implementation outcomes, categorized by RE-AIM dimensions, have not been universally defined or applied. Accordingly, other researchers have described the need for universal outcomes definitions and descriptions across implementation research studies [28, 42]. Our findings have important implications not only for the current state of the field but also for creating guidance to help investigators determine which strategies and in what context to examine.

The four most evaluated strategies were Distribute Educational Materials, Conduct Educational Meetings, External Facilitation, and Audit and Provide Feedback. Conducting Educational Meetings and Distributing Educational Materials were surprisingly the most common. This may reflect the fact that education strategies are generally considered to be “necessary but not sufficient” for successful implementation [43, 44]. Because education is often embedded in interventions, it is critical to define the boundary between the innovation and the implementation strategies used to support the innovation. Further specification as to when these strategies are EBP core components or implementation strategies (e.g., booster trainings or remediation) is needed [45, 46].

We identified 19 implementation strategies that were tested in at least 8 studies (more than the median) and were associated with positive results at least 75% of the time. These strategies can be further categorized as being used in early or pre-implementation versus later in implementation. Preparatory, or pre-implementation, strategies that had strong evidence included educational activities (Meetings, Materials, Outreach visits, Train for Leadership, Use Train the Trainer Strategies) and site diagnostic activities (Assess for Readiness, Identify Barriers and Facilitators, Conduct Local Needs Assessment, Identify and Prepare Champions, and Assess and Redesign Workflows). Strategies that target the implementation phase include those that provide coaching and support (External and Internal Facilitation), involve additional key partners (Intervene with Patients to Enhance Uptake and Adherence), and engage in quality improvement activities (Audit and Provide Feedback, Facilitate the Relay of Clinical Data to Providers, Purposefully Reexamine the Implementation, Conduct Cyclical Small Tests of Change, Develop and Implement Tools for Quality Monitoring).

There were many ERIC strategies that were not represented in the reviewed studies, specifically the financial and policy strategies. Ten strategies were not used in any studies, including: Alter Patient/Consumer Fees, Change Liability Laws, Change Service Sites, Develop Disincentives, Develop Resource Sharing Agreements, Identify Early Adopters, Make Billing Easier, Start a Dissemination Organization, Use Capitated Payments, and Use Data Experts. One of the limitations of this investigation was that not all individual strategies or combinations were investigated. Reasons for the absence of these strategies in our review may include challenges with testing certain strategies experimentally (e.g., changing liability laws), limitations in our search terms, and the relative paucity of implementation strategy trials compared to clinical trials. Many “untested” strategies require large-scale structural changes with leadership support (see [47] for policy experiment example). Recent preliminary work has assessed the feasibility of applying policy strategies and described the challenges with doing so [48,49,50]. While not impossible in large systems like VA (for example, the randomized evaluation of the VA Stratification Tool for Opioid Risk Management), the large size, structure, and organizational imperative make these initiatives challenging to experimentally evaluate. Likewise, the absence of these ten strategies may have been the result of our inclusion criteria, which required an experimental design. Thus, creative study designs may be needed to test high-level policy or financial strategies experimentally.

Some strategies that were likely under-represented in our search strategy included electronic medical record reminders and clinical decision support tools and systems. These are often considered “interventions” when used by clinical trialists and may not be indexed as studies involving ‘implementation strategies’ (these tools have been reviewed elsewhere [51,52,53]). Thus, strategies that are also considered interventions in the literature (e.g., education interventions) were not sought or captured. Our findings do not imply that these strategies are ineffective, but rather that more study is needed. Consistent with prior investigations [54], few studies meeting inclusion criteria tested financial strategies. Accordingly, there are increasing calls to track and monitor the effects of financial strategies within implementation science to understand their effectiveness in practice [55, 56]. However, experts have noted that the study of financial strategies can be a challenge given that they are typically implemented at the system-level and necessitate research designs for studying policy-effects (e.g., quasi-experimental methods, systems-science modeling methods) [57]. Yet, there have been some recent efforts to use financial strategies to support EBPs that appear promising [58] and could be a model for the field moving forward.

The relationship between the number of strategies used and improved outcomes has been described inconsistently in the literature. While some studies have found improved outcomes with a bundle of strategies that were uniquely combined or a standardized package of strategies (e.g., Replicating Effective Programs [59, 60] and Getting To Outcomes [61, 62]), others have found that “more is not always better” [63,64,65]. For example, Rogal and colleagues documented that VA hospitals implementing a new evidence-based hepatitis C treatment chose >20 strategies, when multiple years of data linking strategies to outcomes showed that 1-3 specific strategies would have yielded the same outcome [39]. Considering that most studies employed multiple or multifaceted strategies, it seems that there is a benefit of using a targeted bundle of strategies that purposefully aligns with site/clinic/population norms, rather than simply adding more strategies [66].

It is difficult to assess the effectiveness of any one implementation strategy in bundles where multiple strategies are used simultaneously. Even a ‘single’ strategy like External Facilitation is, in actuality, a bundle of narrowly constructed strategies (e.g., Conduct Educational Meetings, Identify and Prepare Champions, and Develop a Formal Implementation Blueprint). Thus, studying External Facilitation does not allow for a test of the individual strategies that comprise it, potentially masking the effectiveness of any individual strategy. While we cannot easily disaggregate the effects of multifaceted strategies, doing so may not yield meaningful results. Because strategies often synergize, disaggregated results could either underestimate the true impact of individual strategies or, conversely, actually undermine their effectiveness (i.e., when their effectiveness comes from their combination with other strategies). The complexity of health and human service settings, the imperative to improve public health outcomes, and engagement with community partners often require the use of multiple strategies simultaneously. Therefore, the need to improve real-world implementation may outweigh the theoretical need to identify individual strategy effectiveness. In situations where it would be useful to isolate the impact of single strategies, we suggest that the same methods for documenting and analyzing the critical components (or core functions) of complex interventions [67,68,69,70] may help to identify core components of multifaceted implementation strategies [71,72,73,74].

In addition, to truly assess the impacts of strategies on outcomes, it may be necessary to track fidelity to implementation strategies (not just the EBPs they support). While this can be challenging, without some degree of tracking and fidelity checks, one cannot determine whether a strategy’s apparent failure to work was because it 1) was ineffective or 2) was not applied well. To facilitate this tracking there are pragmatic tools to support researchers. For example, the Longitudinal Implementation Strategy Tracking System (LISTS) offers a pragmatic and feasible means to assess fidelity to and adaptations of strategies [75].

Implications for implementation science: four recommendations

Based on our findings, we offer four recommended “best practices” for implementation studies.

  1. Prespecify strategies using standard nomenclature. This study reaffirmed the need to apply not only a standard naming convention (e.g., ERIC) but also standard reporting of implementation strategies. While reporting systems like those by Proctor [1] or Pinnock [75] would optimize learning across studies, few manuscripts specify strategies as recommended [76, 77]. Pre-specification allows planners and evaluators to assess the feasibility and acceptability of strategies with partners and community members [24, 78, 79] and allows evaluators and implementers to monitor and measure the fidelity, dose, and adaptations to strategies delivered over the course of implementation [27]. In turn, these data can be used to assess the costs, analyze their effectiveness [38, 80, 81], and ensure more accurate reporting [82,83,84,85]. This specification should include, among other data, the intensity, stage of implementation, and justification for the selection. Information regarding why strategies were selected for specific settings would further the field and be of great use to practitioners [63, 65, 69, 79, 86].

  2. Ensure that standards for measuring and reporting implementation outcomes are consistently applied and account for the complexity of implementation studies. Part of improving standardized reporting must include clearly defining outcomes and linking each outcome to particular implementation strategies. It was challenging in the present review to disentangle the impact of the intervention(s) (i.e., the EBP) from the impact of the implementation strategy(ies) for each RE-AIM dimension. For example, fidelity was often reported for the EBP but not for the implementation strategies. Similarly, Reach and Adoption of the intervention would be reported for the Experimental Arm but not for the Control Arm, precluding statistical comparisons of strategies on the relative impact of the EBP between study arms. Moreover, many studies evaluated numerous outcomes, risking data dredging. Further, the significant heterogeneity in the ways in which implementation outcomes are operationalized and reported is a substantial barrier to conducting large-scale meta-analyses that synthesize evidence for implementation strategies [67]. The field could look to others in the social and health sciences for examples of how to test, validate, and promote a common set of outcome measures to aid in bringing consistency across studies and real-world practice (e.g., the NIH-funded Patient-Reported Outcomes Measurement Information System [PROMIS], https://www.healthmeasures.net/explore-measurement-systems/promis).

  3. Develop infrastructure to learn cross-study lessons in implementation science. Data repositories, like those developed by NCI for rare diseases, the U.S. HIV Implementation Science Coordination Initiative [87], and the Behavior Change Technique Ontology [88], could allow implementation scientists to report their findings in a more standardized manner, which would promote ease of communication and contextualization of findings across studies. For example, the HIV Implementation Science Coordination Initiative requested that all implementation projects use common frameworks, developed user-friendly databases to enable practitioners to match strategies to determinants, and developed a dashboard of studies that assessed implementation determinants [89,90,91,92,93,94].

  4. Develop and apply methods to rigorously study common strategies and bundles. These findings support prior recommendations for improved empirical rigor in implementation studies [46, 95]. Many studies were excluded from our review because they did not meet standards for methodological rigor. Understanding the effectiveness of discrete strategies deployed alone or in combination requires reliable and low-burden tracking methods to collect information about strategy use and outcomes. For example, frameworks like the Implementation Replication Framework [96] could help interpret findings across studies using the same strategy bundle. Other tracking approaches may leverage technology (e.g., cell phones, tablets, EMR templates) [78, 97] or find novel, pragmatic approaches to collect recommended strategy specifications over time (e.g., dose, deliverer, and mechanism) [1, 9, 27, 98, 99]. Rigorous reporting standards could inform more robust analyses and conclusions (e.g., moving toward the goal of understanding causality, microcosting efforts) [24, 38, 100, 101]. Such detailed tracking is also required to understand how site-level factors moderate implementation strategy effects [102]. In some cases, adaptive trial designs like sequential multiple assignment randomized trials (SMARTs) and just-in-time adaptive interventions (JITAIs) can be helpful for planning strategy escalation.
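As an illustration only, the kind of prespecified, low-burden strategy record described in these recommendations could be sketched as a small data structure. The field names, the use of contact counts as the unit of dose, and the delivered-to-planned ratio as a crude fidelity proxy are all assumptions made for this sketch; they are not the schema of LISTS or of any published reporting guideline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StrategySpec:
    """Prespecified description of one strategy (hypothetical schema)."""
    eric_name: str        # standard name, e.g., an ERIC strategy label
    stage: str            # stage of implementation in which it is used
    deliverer: str        # who delivers the strategy
    planned_dose: int     # planned number of contacts (illustrative unit)
    justification: str    # why the strategy was selected for this setting
    mechanism: str = ""   # hypothesized mechanism, if one is specified


@dataclass
class StrategyLog:
    """Running record of what was actually delivered for one strategy."""
    spec: StrategySpec
    delivered: int = 0
    adaptations: List[str] = field(default_factory=list)

    def record_contact(self, adaptation: str = "") -> None:
        """Count one delivered contact and note any adaptation made."""
        self.delivered += 1
        if adaptation:
            self.adaptations.append(adaptation)

    def dose_delivered_ratio(self) -> float:
        """Delivered contacts divided by planned contacts (assumed proxy)."""
        if self.spec.planned_dose <= 0:
            raise ValueError("planned_dose must be positive")
        return self.delivered / self.spec.planned_dose
```

A study team might create one `StrategySpec` per strategy at the protocol stage and call `record_contact` each time the strategy is delivered, so that dose and adaptations can be reported alongside outcomes.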

Limitations

Despite its strengths, this review has notable limitations. First, we only included experimental studies, omitting many informative observational investigations that cover the range of implementation strategies. Second, our study period was centered on the creation of the journal Implementation Science and not on the standardization and operationalization of implementation strategies in the publication of the ERIC taxonomy (which came later). This, in conjunction with latency in reporting study results and funding cycles, means that the employed taxonomy was not applied in earlier studies. To address this limitation, we retroactively mapped strategies to ERIC, but it is possible that some studies were missed. Additionally, indexing approaches used by academic databases may have missed relevant studies. We addressed this particular concern by reviewing other systematic reviews of implementation strategies and soliciting recommendations from global implementation science experts.

Another potential limitation comes from the ERIC taxonomy itself: strategy listings like ERIC are only useful when they are widely adopted and used in conjunction with guidelines for specifying and reporting strategies [1] in protocol and outcome papers. Although the ERIC paper has been widely cited (over three thousand times, accessed about 186 thousand times), it is still not universally applied, making it more difficult to track the impact of specific strategies. However, our experience with this review suggested that ERIC’s use was increasing over time. Also, some have commented that ERIC strategies can be unclear and are missing key domains. In response, researchers are making definitions clearer for lay users [37, 103], increasing the number of discrete strategies for specific domains like HIV treatment, acknowledging strategies for new functions (e.g., de-implementation [104], local capacity building), accounting for phases of implementation (dissemination, sustainment [13], scale-up), addressing settings [12, 20] and actors’ roles in the process, and making it easier to select strategies by mechanism of change through searchable databases [9, 10, 54, 73, 104,105,106]. In sum, we found the utility of the ERIC taxonomy to outweigh its current limitations.

As with all reviews, the search terms influenced our findings. As such, the broad terms for implementation strategies (e.g., “evidence-based interventions” [7] or “behavior change techniques” [107]) may have led to inadvertent omissions of studies of specific strategies. For example, the search terms may not have captured tests of policies, financial strategies, community health promotion initiatives, or electronic medical record reminders, due to differences in terminology used in corresponding subfields of research (e.g., health economics, business, health information technology, and health policy). To manage this, we asked experts to inform us about any studies that they would include and cross-checked their lists against the studies identified through our search terms, which yielded very few additional studies. Our use of standard coding based on the ERIC taxonomy was a strength, but future work should consider including the additional strategies that have been recommended to augment ERIC, around sustainment [13, 79, 106, 108], community and public health research [12, 109,110,111], consumer or service user engagement [112], de-implementation [104, 113,114,115,116,117] and related terms [118].

We were unable to assess the bias of studies due to non-standard reporting across the papers and the heterogeneity of study designs, measurement of implementation strategies and outcomes, and analytic approaches. This could have resulted in over- or underestimating the results of our synthesis. We addressed this limitation by being cautious in our reporting of findings, specifically in identifying “effective” implementation strategies. Further, we were not able to gather primary data to evaluate effect sizes across studies in order to systematically evaluate bias, which would be fruitful for future study.

Conclusions

This novel review of 129 studies summarized the body of evidence supporting the use of ERIC-defined implementation strategies to improve health or healthcare. We identified commonly occurring implementation strategies, frequently used bundles, and the strategies with the highest degree of supportive evidence, while simultaneously identifying gaps in the literature. Additionally, we identified several key areas for future growth and operationalization across the field of implementation science with the goal of improved reporting and assessment of implementation strategies and related outcomes.