Background

Implementation outcomes reflect the progress towards success of efforts to implement evidence-based innovations. More specifically, they help disentangle the complex process of implementation so we can identify and target intermediate outcomes that may influence an intervention’s success in context. Robust conceptualization and rigorous measurement of these outcomes are essential for precision in implementation research, including understanding and testing the effectiveness of implementation strategies and explaining their mechanisms of action.

A 2011 paper by Proctor and colleagues advanced the concept of implementation outcomes, identified their critical role in implementation evaluation, and distinguished them from other traditionally measured outcomes (service system and clinical outcomes) [1]. The authors proposed a heuristic taxonomy of implementation outcomes and challenged the field to address a two-pronged research agenda: advance conceptualization and measurement and build theory including the identification and testing of change mechanisms. Ten years since the taxonomy’s publication, this paper maps the field’s progress in response to the originally proposed research agenda.

Conceptualization of implementation outcomes in the 2011 paper

Proctor and colleagues identified eight implementation outcomes [1]. Acceptability is defined as stakeholders’ perceptions that an implementation target is agreeable, palatable, or satisfactory. Adoption (also called uptake) is the intent, initial decision, or action to employ an implementation target. Appropriateness is the perceived fit, relevance, or compatibility of an implementation target for a given context, or its perceived fit for a problem. Feasibility is the extent to which an implementation target can be successfully used or deployed within a given setting. Fidelity is the degree to which an intervention was implemented as prescribed or intended. Implementation cost is the financial impact of an implementation effort; costs must be bearable for implementation to proceed. Penetration—the integration or saturation of an intervention within a service setting and its subsystems—is calculated as the number of people to whom the intervention is delivered divided by the number of eligible or potential recipients. Sustainability is the extent to which an implementation target is maintained or institutionalized within a service setting. The 2011 paper encouraged further scholarship on this initial conceptualization, both in terms of the number of outcomes and in further refinements to their operationalization [1]. Cautioning that the original taxonomy included “only the more obvious” outcomes, that paper projected that new concepts would emerge as newly defined implementation outcomes [1].
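As a minimal illustration of the penetration ratio described above (using hypothetical numbers, not data from any study in this review), the calculation can be sketched as:

```python
def penetration(n_delivered: int, n_eligible: int) -> float:
    """Penetration per the 2011 taxonomy: the number of recipients to whom
    the intervention is delivered divided by the number of eligible or
    potential recipients in the service setting."""
    if n_eligible <= 0:
        raise ValueError("eligible population must be positive")
    return n_delivered / n_eligible

# Hypothetical clinic: 250 eligible patients, 85 of whom received the intervention.
print(penetration(85, 250))  # 0.34
```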

Fig. 1 PRISMA diagram

Impact of the 2011 implementation outcomes paper

The 2011 paper spurred several critical developments in implementation science. Research funding announcements began to note the taxonomy’s importance for study conceptualization and design, including the U.S. National Institutes of Health’s PAR-22-105 for Dissemination and Implementation Research in Health, which identified these implementation outcomes as important for inclusion in investigator-initiated research applications [2]. Eighteen institutes and centers signed onto this crosscutting PAR.

The Implementation Outcomes Framework joined the ever-growing list of implementation research frameworks [3, 4], with unique contributions. First, the taxonomy signaled to researchers, policymakers, practitioners, and system leaders that implementation science has distinctive, legitimate outcomes warranting study alongside the outcomes traditionally prioritized in intervention trials. Second, the taxonomy provided direction for treating implementation outcomes as key targets of change, spurring the testing of implementation strategies designed to improve this new outcomes category. Third, the taxonomy raised widespread awareness around the lack of tools, instruments, and designs (e.g., hybrids II and III [5, 6]) that support the measurement of implementation outcomes either as standalone research aims or in conjunction with other outcomes and/or variables capturing contextual determinants.

The 2011 call for advances in the conceptualization and measurement

The first prong of the 2011 research agenda [1] called for advancing the conceptualization and measurement of implementation outcomes through consistent terminology, a call recently echoed by Murrell et al. [7]. The 2011 paper challenged researchers to report the referent for all implementation outcomes and to specify measurement levels and methods [5]. Subsequently, many scholars have helped refine implementation outcome conceptualization.

For example, Lyon and Bruns [70] distinguished two types of implementation outcomes. They proposed that acceptability, appropriateness, and feasibility comprise perceptual implementation outcomes, while adoption, fidelity, and reach/penetration are behavioral implementation outcomes. An updated Consolidated Framework for Implementation Research (CFIR) distinguished between anticipated (forward-looking) and actual (backward-looking) implementation outcomes [8]. An Implementation Science editorial indicated that observable implementation outcomes such as adoption, fidelity, penetration, and sustainability are of most interest to the journal [9].

Significant measurement advances also occurred in response to the 2011 research agenda. The Grid-Enabled Measures Project [10] and the Society for Implementation Research Collaborative (SIRC) Instrument Review Project [11] are organized around the Proctor 2011 taxonomy. Weiner and colleagues studied the psychometric properties of measures for three key implementation outcomes [12]. Moullin and colleagues further refined pragmatic measurement via the PRESS measure for provider-rated sustainment in inner contexts [13], while the NoMAD instrument, based on normalization process theory, may also enhance the measurement of implementation outcomes [14]. We now have systematic reviews of implementation outcomes [15] and their measurement properties in behavioral health [16], public policy [17], stroke care [7], and physical healthcare [18]. Given these advancements in measurement tools, the field needs to examine commensurate progress toward conceptual precision and linguistic harmony.

The 2011 call for theory building

Improved conceptualization and measurement positions researchers to move from asking descriptive questions about implementation outcomes to causal mechanistic ones [9], which is essential for building testable theory that describes, explains, and predicts how and why the implementation process worked (or not). Accordingly, the second prong of the 2011 research agenda called for theory-building research focused on employing implementation outcomes as key constructs in efforts to model successful implementation [1]. Researchers were challenged to explore the salience of implementation outcomes to different stakeholders and to investigate the importance of various implementation outcomes by phase in the implementation process—both of which can help researchers detect modifiable indicators of successful implementation [1].

Proctor and colleagues also called for research that tests and models various roles that implementation outcomes can play and research that illuminates how different implementation outcomes are associated with one another [1]. Their paper called for researchers to test several types of hypotheses related to how implementation outcomes are associated with each other, how the attainment of implementation outcomes influences service system and clinical outcomes, and how the effectiveness of implementation strategies affects implementation outcome attainment [20]. This call for hypothesis testing in implementation outcomes research has been echoed by a number of recent papers [21,22,23,24]. Current literature also reflects an increasing number of studies testing the effectiveness of implementation strategies and the mechanisms that explain how these strategies may influence implementation outcomes [25,26,27,28,29,30,31]. A 2021 scoping review paper [7] of adult stroke rehabilitation research using the Proctor 2011 framework revealed that adoption was the most frequently measured implementation outcome. No studies examined implementation cost, and fewer than half found that implementation strategies were effective in attaining implementation outcomes [7].

The 2011 paper also noted that measuring and empirically testing implementation outcomes can help specify the mechanisms and causal relationships within implementation processes and advance an evidence base around successful implementation. Since then, the field has responded. Recent publications raise awareness of mechanisms and advance their conceptualization in the context of implementation research. Emerging topics include prospectively building mechanism-focused hypotheses into research designs, developing approaches for identifying and prioritizing mechanisms, and advancing mechanisms measurement [19, 27]. Overall, the field still lacks conclusive evidence about interrelationships, particularly causal relationships, among implementation outcomes, strategies, subsequent outcomes, and their contextual and strategy determinants.

Study purpose

This review was designed to examine advances in (1) conceptualization of implementation outcomes (including the outcomes that have received empirical attention, the contexts for their study, and the methods employed) and (2) theory building around implementation outcomes (interrelationships among implementation outcomes and their relationship to implementation strategies). We synthesize progress against the challenges posed in the 2011 paper and propose directions for the next 10 years of implementation outcomes research.

Methods

The first five steps of Arksey and O’Malley’s methodological framework for conducting scoping reviews guided our approach [32]. We also replicated the iterative and reflexive approach modeled by Marchand et al. [33] and Kim et al. [34] during each step of our scoping review process. Our published review protocol describes our methods in detail [35]; here, we summarize key steps and note refinements to the protocol.

Stage 1: Defining the research questions

This review addressed three questions about advances in implementation outcomes conceptualization and measurement:

  1. To what extent has each of the eight implementation outcomes been examined empirically in the literature? What other implementation outcomes did these studies identify?

  2. What research designs and methods have been used to study each outcome?

  3. In what contexts have implementation outcomes been studied? What service settings, populations, health conditions, and innovation types are represented?

To understand advances in theory-building around implementation outcomes, we addressed two additional questions:

  4. Which implementation outcomes have been studied as dependent variables in tests of implementation strategy effectiveness?

  5. What interrelationships between implementation outcomes have been studied empirically? This includes relationships among implementation outcomes and other outcome types, specifically service and clinical outcomes.

Stage 2: Identifying relevant literature

Using forward citation tracking, we identified all literature that cited the 2011 paper and was published between October 2010 (the date of online publication) and October 30, 2020. We conducted our search in the Web of Science (WOS) database in July 2020. To account for delays in archiving more recent publications in WOS, we also located articles using citation alerts sent to the first author from the publisher over a 6-month period coinciding with the end of the WOS citation search (February to July 2020). In May 2023, we repeated the forward citation tracking procedures in WOS to capture, despite archiving lags, all articles that cited the 2011 paper and were published through October 2020, yielding a full 10 years of implementation outcomes papers. Citations were managed in Mendeley and then exported to Covidence.

Stage 3: Article selection

As reported in our protocol paper [35], we screened articles and included them if they (a) reported results of an empirical study, (b) were published in a peer-reviewed journal, and (c) were designed to assess at least one of the identified implementation outcomes or their synonyms as specified in the original implementation outcome taxonomy.

Stage 4: Data charting

Data were charted using a customized Google Form, depicted in Table 1 of the study protocol paper [35]. Since protocol publication, we added two variables: health condition, defined as the primary health, disease, or problem targeted by the intervention or prevention effort, and funding source, defined as the first listed funder of the study.

Table 1 Number and percent of studies by funding source and regional setting (n = 400)

Stage 5: Collating, summarizing, and reporting the results

We calculated and report frequencies, averages, and trends over time to identify the extent to which implementation outcomes are studied empirically in the 400 included manuscripts. To identify progress in research on implementation outcomes, we examined the role of implementation outcomes in analyses—as correlates of contextual factors and other implementation outcomes, and as dependent variables in relation to implementation strategies.

Results

Our identification process generated 1346 abstracts for screening, which yielded 479 manuscripts for full-text review. After full-text review, we excluded 79 manuscripts; a total of 400 manuscripts met the inclusion criteria (Fig. 1). Among the manuscripts qualifying for full-text review, 82% were published in or after 2017 (Fig. 2). A wide range of funders supported implementation outcomes research globally and domestically. The National Institutes of Health (NIH)—especially the National Institute of Mental Health (NIMH)—was the most frequent funding source (24.5%). We found little evidence of funding from foundations, states, or the Patient-Centered Outcomes Research Institute (PCORI) (Table 1).

Fig. 2 Number of included records and study types by year of publication (n = 400)

Question 1: To what extent has each of the eight implementation outcomes been examined empirically in the literature? What additional implementation outcomes were identified?

More than half (52%) of the included manuscripts examined acceptability, followed by fidelity (38.8%), feasibility (36.9%), adoption (24.0%), and appropriateness (20.1%). Penetration (15.4%), sustainability (15.1%), and cost (7.5%) were examined less frequently (Table 2). Most manuscripts indicated the stage or phase of implementation investigated, which we coded using the EPIS framework (exploration, adoption/preparation, implementation, sustainment). Focus on implementation outcomes varied by stage or phase, bearing out projections in the 2011 paper. In studies conducted during the exploration phase, appropriateness, feasibility, acceptability, and adoption were most frequently examined. Adoption, cost, and feasibility were addressed most frequently in studies conducted during the preparation phase. As hypothesized in 2011, sustainability was the outcome examined most during sustainment phase studies.

Table 2 Coverage of implementation outcomes (n = 400)

Eight percent (n = 32) of manuscripts identified implementation outcomes that were not in the original taxonomy. Our free-text coding field captured 24 unique alternative implementation outcome constructs, including evidence of delivery (e.g., use, provision, or receipt of an intervention; n = 4), usefulness (e.g., usability, utility; n = 14), clients’ responsiveness/engagement (n = 4), features of the intervention (e.g., adaptability, effectiveness; n = 7), clinician features (e.g., efficacy, competence; n = 8), level of implementation (n = 1), scale-up (n = 1), and timely initiation (n = 1). Some of these terms (e.g., provider skill) may reflect determinants of implementation. Others—notably usefulness, usability, and utility—were identified in the 2011 paper as “other terms in the literature.”

Question 2: What research designs and methods have been used to study each outcome?

As Table 3 shows, most analyses of implementation outcomes were descriptive, with two-thirds employing observational designs (n = 266). Experimental (n = 86, 21.5%) and quasi-experimental (n = 27, 6.8%) studies were less common; together these designs accounted for about 30% of manuscripts each year, a proportion that did not fluctuate greatly over time (Fig. 2). Acceptability, adoption, and fidelity were most likely to be studied through experimental designs. Appropriateness was most likely to be studied qualitatively. Quantitative methods were used primarily for assessing adoption, cost, fidelity, and penetration. Fewer than a third of manuscripts presented mixed or multiple methods.

Table 3 Design and methodological approaches identified in implementation outcomes research (n = 400)

Question 3: In what contexts have implementation outcomes been studied? What service settings, populations, health conditions, and innovation types are represented in the studies?

To describe the context in which implementation outcomes have been studied, we captured study settings and populations, the innovations (implementation objects [36]) studied, and the health conditions addressed by the study (Table 4). Most manuscripts were situated in healthcare (n = 183, 45.8%) or behavioral health (n = 90, 22.5%) organizations—both inpatient and outpatient—with an additional 50 manuscripts (12.5%) set in schools. Studies predominantly addressed mental health (n = 129, 32.3%) or medical (n = 103, 25.8%) concerns. Manuscripts varied in the age groups studied, with some including more than one. Nearly two-thirds of studies addressed adults, and over 40% included children. The most common implementation object studied was a single evidence-based practice (n = 161, 40.3%). Implementation outcomes were studied in relation to screening and technological innovations in fewer than 22% of the manuscripts.

Table 4 Service context features in implementation outcomes research (n = 400)

Question 4: Which outcomes have been studied as dependent variables in tests of implementation strategy effectiveness—a theory-building question?

Despite being conceptualized as outcomes (i.e., results of exposure to different conditions and strategies), implementation outcomes were treated as dependent variables in only one-quarter (n = 97) of included manuscripts. Only 56 (14.0%) manuscripts examined implementation outcomes in relation to implementation strategies. Fidelity was most frequently studied as an outcome of implementation strategies (7.0%) (Fig. 3). Although over half of the manuscripts examined acceptability, only 5.0% assessed its role as an outcome of implementation strategies. Similarly, few manuscripts tested implementation strategies for their ability to attain fidelity, feasibility, or appropriateness, or to address cost barriers. Most manuscripts examining implementation strategies presented experimental (n = 24) or quasi-experimental (n = 22) designs (Fig. 4).

Fig. 3 Percentage of included records that examined implementation strategies, by implementation outcome (n = 400)

Fig. 4 Designs used to examine implementation strategies and outcomes over time (n = 56)

Question 5: What interrelationships between implementation outcomes have been studied empirically? This theory-building question includes relationships among implementation outcomes and other outcome types, specifically service and clinical outcomes

Finally, we examined the role of each implementation outcome in the analysis (Tables 5 and 6). Fifteen percent of included manuscripts examined relationships between implementation outcomes and other outcomes. Only 5.0% (n = 21) tested relationships among different implementation outcomes. As Tables 5 and 6 show, cost was not examined in relation to other implementation outcomes. Sustainability was examined most often, particularly in relation to fidelity (n = 3), penetration (n = 3), and adoption (n = 2).

Table 5 Percentage of studies that examined implementation outcomes relative to other outcomes (by implementation outcome) (n = 400)
Table 6 Number of studies that examined interrelationships among implementation outcomes (n = 21)

As shown in Table 7, only 23 manuscripts (5.8%) examined implementation outcomes in relation to service outcomes. Among implementation outcomes, feasibility (n = 9) was most often correlated with service outcomes. Effectiveness (n = 15) was the service outcome most frequently tested in relation to implementation outcomes. No studies of implementation outcomes in our sample addressed service outcomes of safety or equity. We also coded whether each manuscript examined implementation outcomes in relation to clinical outcomes, although given the wide heterogeneity in clinical outcomes of interest and in the absence of a corresponding taxonomy, we did not categorize specific clinical outcomes in this review. Only 22 studies (5.5%) examined implementation outcomes in relation to clinical outcomes. Fidelity was the implementation outcome most examined relative to clinical outcomes (10.2% of the manuscripts).

Table 7 Implementation outcomes and service system outcomes (n = 22)

Discussion

One decade later, this scoping review assessed the field’s response to the 2011 paper’s research agenda calling for advances in conceptualization, measurement, and theory building around implementation outcomes. Our results show a proliferation of literature on implementation outcomes. However, empirical investigations accounted for less than one-third of the manuscripts citing the 2011 paper. While descriptive work can enrich our conceptual understanding of implementation outcomes, more work remains to advance theory that explains the attainment and effects of implementation outcomes.

How has research on implementation outcomes advanced over the past 10 years?

Implementation outcomes research is supported by a range of funding sources and is conducted in many settings and disciplines. Most included studies were conducted in health and behavioral health organizations. Similar research is needed in less frequently studied settings where health and other social care interventions are delivered (e.g., schools, social service organizations, and home-based services) [37,38,39,40,41,42] to diverse communities and consumers with a range of intersecting needs. The context for implementation, often varying by setting, has been shown to affect certain implementation outcomes [43]. Building knowledge in varying settings can help advance conceptualization and theory building around implementation outcomes like penetration (or reach), propel incorporation of equity in the study of implementation outcomes, and provide unique opportunities to further articulate the relationships between implementation outcomes and other service outcomes, particularly equity.

Most included studies examined the implementation of a single evidence-based intervention or implementation object, failing to capture the reality of organizations and systems that typically work to introduce, implement, and sustain the simultaneous delivery of multiple interventions. Studying the implementation of multiple interventions carries logistic, resource, and design challenges but can enable scientific leaps, particularly regarding external validity. Future research should examine how service system directors weigh acceptability, feasibility, and cost when selecting interventions and strategies, and how they juggle simultaneous implementation efforts, stagger their timing, and sustain them in dynamic and unpredictable environments.

Our results reflected considerable variation in the degree to which different implementation outcomes have been studied, with a heavy emphasis on acceptability, echoing other recent reports. In a systematic review of quantitative measures assessing health policy implementation determinants and outcomes, Allen and colleagues found that acceptability, feasibility, appropriateness, and compliance were most frequently measured [17]. Moreover, Mettert and colleagues reported that acceptability had the greatest number of measurement options [15]. Other implementation outcomes like cost, penetration, and sustainability (the observable implementation outcomes prioritized by Implementation Science [9]) were measured less frequently in our review sample.

This suggests that, currently, implementation outcomes research reveals more about which interventions and strategies people like (important for refining interventions, improving patient-centeredness, and supporting initial uptake), but less about the degree to which interventions reach and benefit communities. Insufficient attention to outcomes like penetration and cost (those highly prioritized in real-world decision making) limits our field’s ability to take evidence-based practices to scale for public health impact. Building strong evidence about these more observable implementation outcomes is critical for supporting policymakers and program leaders as they make decisions about strategic priorities and resource allocation to deploy, scale, and sustain interventions that will reach an adequate number of consumers equitably.

Our review explored the field’s progress toward conceptual and linguistic harmony and the promise of uncovering new implementation outcomes. Some manuscripts cited the 2011 paper but employed alternative concepts and terminology for implementation outcomes despite their close alignment with the 2011 taxonomy. For example, terms such as “evidence of delivery,” “use,” “provision,” or “receipt of services” could be more precisely operationalized by adoption or penetration. Similarly, outcomes such as “client response,” “participant responsiveness,” and “engagement” align closely with the term acceptability. Where authors discover granular distinctions between more commonly used terms, a rationale for proposing new terms is welcome and necessary. Nonetheless, we reiterate the importance of common implementation outcome terminology, where possible, so that the field can continue to build and harmonize knowledge across studies. Moreover, some of the alternative terms may be more accurately labeled as determinants of implementation outcomes rather than new outcomes (e.g., client and provider factors).

The results of our review also identified emerging implementation outcomes that are distinct from those proposed in the 2011 taxonomy. For example, there has been widespread attention to scale-up [44,45,46,47,48,49]. Although the 2011 paper conceptualized actual or perceived utility as a synonym for feasibility and usefulness as a synonym for appropriateness, the number of studies using these terms as distinct outcomes suggests that perceived usefulness, usability, and utility may be conceptually distinct from constructs in the 2011 outcome taxonomy. Expansion of the implementation outcomes taxonomy was encouraged by Proctor et al. in the 2011 manuscript. For such outcomes, we encourage the provision of common-use and operational definitions, psychometric research to refine measurement, and clear reporting and justification of how they are conceptually distinct from the original taxonomy.

Reflecting the phased nature of implementation, Proctor et al. 2011 proposed that some implementation outcomes might be most salient—and thus likely to be measured—at different times [1]. Although all outcomes were likely to be studied during active implementation phases, outcomes like appropriateness, feasibility, acceptability, and adoption were especially common in studies conducted during the early phases of exploration and preparation. Outcomes like cost, fidelity, penetration, and sustainability were more common during later implementation and sustainment phases. This may reflect the importance of different implementation outcomes for decision making over time and at certain points in the implementation lifecycle. However, we found little evidence of testing hypotheses about the optimal order of attaining specific implementation outcomes. We hope this can be improved as methods such as causal pathway diagrams, causal loop diagrams, and directed acyclic graphs gain traction in mechanistic implementation research [19, 30, 50,51,52,53].

More theory-building work and more experimental studies are needed

Our results suggested limited progress toward theory development. Few manuscripts focused on explaining, testing, or modeling the processes that reveal how implementation outcomes can change, be intervened upon, or affect other outcomes. Few studies treated implementation outcomes as dependent variables when investigating associations or causal relationships between determinants and implementation outcomes. We also found few studies testing the relationships between implementation strategies and implementation outcomes—a key part of the 2011 proposed agenda. This gap is concerning given the purpose of implementation science, that is, to advance strategies for integrating innovations into everyday practice. Our results suggested that implementation scholars are still in the early stages of building evidence for the causal effects of implementation determinants and strategies and still do not know how to achieve implementation outcomes. We hope this can be ameliorated by a continued increase in study designs that include prospective theorizing about the mechanisms explaining strategy effectiveness and precise measurement of these mechanisms in relation to the attainment of specific implementation outcomes [19, 27, 54].

Although some have questioned testing implementation outcomes as dependent variables [55], rigorous trials of implementation strategies are important for learning how to achieve acceptability, feasibility, adoption, and sustainment. For example, randomly assigning train-the-trainer or coaching strategies to clinics can reveal the most effective approach to promoting provider adoption. Debate also surrounds whether implementation outcomes are ever sufficient as “endpoint” dependent variables and whether they should always be tested in relation to more distal service system and clinical outcomes (as discussed below). While we argue for more research testing the intermediate role of implementation outcomes, testing their role as endpoint dependent variables seems warranted as we continue to advance knowledge about how to most effectively attain them, and which implementation strategies to prioritize and invest in to do so.

Though correlational studies suggest variables for further testing and thus reveal building blocks for theory, scientific leaps require a shift away from the descriptive work that, as our findings show, dominates the field. Observational research is important for laying a foundation, particularly as implementation research moves into newer fields and settings (e.g., large-scale policy implementation), but theoretical advances are necessary to understand how contextual factors such as organizational leadership [24] and implementation strategies affect outcome attainment. More work is needed to specify and test mechanistic pathways and empirical hypotheses about drivers, moderators, and mediators of implementation outcomes in a replicable way, so that it is clear what knowledge is generalizable across settings and what must be learned and assessed locally. Furthermore, finer-grained identification of the measurable proximal outcomes that precede implementation outcome attainment can clarify how exactly a strategy works to improve the implementation outcome(s) it targets (and thus what is core versus adaptable about the strategy itself), and can isolate the factors a strategy does not address that require additional attention to achieve the desired implementation outcome(s). Notably, the frequency with which mixed methods were employed in our sample suggests that rich data are available to pursue the theoretical advances we encourage here.

Studies in our reviews rarely addressed relationships among implementation outcomes. Given our finding that various implementation outcomes might be more salient at different phases, studies should examine the optimal temporal ordering of their pursuit. For instance, clinician perceptions about the acceptability, appropriateness, and feasibility of an intervention might predict adoption [56]. Longitudinal studies that measure and test relationships among multiple implementation outcomes before, during, and after implementation can generate new insights about phasing implementation efforts and the potential additive and interactive effects of thoughtful sequencing.

Few studies tested hypothesized impacts of implementation outcomes on other important outcome types, such as service system changes and improved individual or population health, limiting both theory building and tests of the impact of implementation outcomes. This finding echoes recent reflections on the promises and pitfalls of implementation science [54] and suggests that our field has yet to empirically demonstrate the value of implementation science for improving health and healthcare quality.

Such inquiry is critical in work to reduce health disparities [40, 57,58,59,60,61]. Equity is a key service system outcome [1, 62]. Delivering interventions that are unacceptable to clients will surely block equitable care. Data on acceptability and feasibility can be used to adapt interventions and the associated implementation processes to build local capacity. Using implementation outcomes, equity in service delivery may be modeled and tested as follows:

$$\text{Equity} = f(\text{service acceptability} + \text{feasibility} + \text{appropriateness})$$

Similarly, penetration and sustainment of evidence-based care to the entirety of a system’s service recipients or a community’s under-resourced populations can serve as measurable indicators of equitable access and reach [63, 64], consistent with calls to prioritize structural racism in contextual analyses [65]. We hypothesize the following:

$$\text{Equitable access} = f(\text{fidelity} + \text{penetration} + \text{sustainment of evidence-based care})$$
$$\text{Adoption} = f(\text{feasibility} + \text{appropriateness})$$

Future studies that investigate relationships among different outcome types are necessary for achieving the system and population health impact that motivates the field of implementation science and are essential for demonstrating tangible impact and the value of investing in implementation work.

Strengths and limitations of our review

Our paper complies with recommendations that review articles in implementation science be rigorous, comprehensive with respect to the questions asked, and accurate in their attributions [66]. Given our review’s aims, we included only articles that cited the 2011 Proctor et al. implementation outcomes paper. Our results therefore likely underestimate advances in the field from studies anchored in alternative theories and taxonomies (e.g., those anchored by the RE-AIM framework), particularly those in adjacent disciplines or those focused on alternative implementation outcomes. Our rigorous calibration process to ensure reliability in the screening and data charting phases, together with the iterative adaptation of our data charting procedures, strengthened our review; for example, when coding revealed the need for new variables, we re-reviewed all articles. The reviewed articles nonetheless presented many coding challenges, particularly around the precision of reporting, which could have introduced errors during data charting. See Lengnick-Hall et al. [67] for detail on the coding challenges we encountered, along with recommendations to improve reporting.

When juxtaposed with Proctor et al.’s 2011 recommendations and a recent paper on recommendations for reporting implementation outcomes [67], our data provide a basis for projecting priorities for a “next stage agenda” on implementation outcomes for 2022–2032. As summarized in Table 8, the field must further harmonize implementation outcome terminology. Beyond observational measurement of implementation outcomes, studies should specify their role in analyses, test how to achieve them, and demonstrate their impact on clinical, system, and public health improvements. Especially pressing is understanding how implementation outcomes (particularly acceptability, feasibility, and sustainability) can advance equitable health service delivery. Testing hypothesized sequencing, impact, and efficiency of attaining implementation outcomes via strategies is essential to understanding and accelerating implementation processes [68, 69].

Table 8 Agenda for implementation outcomes research: 2022–2032

Conclusion

This review illustrated growth in implementation outcomes research, but empirical evaluation reflected only a small subset of publications (30%). Over the past 10 years, manuscripts described implementation outcomes across many settings, emphasizing perceived outcomes such as acceptability, feasibility, and appropriateness. We continue to lack robust evidence about the strategies that help attain these outcomes. Advancing the field demands that the next 10 years further both aims of the 2022 research agenda: building strong theory, more objective measurement, and evidence about how to achieve implementation outcomes. Moreover, we must empirically demonstrate that successful implementation matters for the improvement of clinical services, service systems, and public health outcomes.