Introduction

In 2012, a Cochrane systematic review found that audit and feedback (A&F) can have a small, yet potentially meaningful impact in professional clinical practice [1]. Given this impact, sustainability is important to consider to ensure positive benefits are continued. Efforts to ensure sustainability are also important so research funding is not wasted, and the trust of the community is maintained [2,3,4,5,6,7,8]. To extend benefits outside the initial trial context, there is also a need to actively consider how A&F might be applied in other settings and contexts (spread) [9] and across a wider area (scale) [10].

Given the potential for beneficial impact and use at a large scale, such as throughout a geographic region or healthcare system, a deeper understanding of how trial teams plan for the A&F to be continued (if effective) in other settings or contexts is needed. In the past 10 years, there has been an influx of A&F trials and an update of the Cochrane review is underway in 2023 [11]. This update provided an opportunity to explore the understudied areas of sustainability, spread, and scale of A&F trials. Understanding the sustained effectiveness of A&F trials, including specifying whether the A&F strategy or the effect on clinical practice was sustained, will be crucial and is the subject of future research. However, given the heterogeneity of definitions of sustainability, spread, and scale, and the lack of a standardized sustainability duration period [2, 3], there is a need to first explore how sustainability, spread, and scale are described in A&F studies before focusing on effectiveness. As sustainability of beneficial effects could be considered in all studies, yet is not typically the focus of many implementation trials, a broad approach was taken to inform and provide a basis for future work. The objectives of this study were to determine how A&F trials describe and plan for 1) sustainability and 2) spread and scale.

Methods

Study design

This is a secondary analysis of a Cochrane systematic review using qualitative synthesis methods informed by relevant theory. The focus was on keywords used to describe the three concepts, the timeframe used to claim the impact or overall intervention, including A&F, was sustained, the determinants of sustainability, and the sequence, mechanisms, and underlying factors for spread and scale.

Operational definitions and theoretical frameworks

For this review, we used the Moore et al. definition of sustainability: after a defined period of time, a program, clinical intervention, and/or implementation strategies continue to be delivered and/or individual behavior change (i.e., clinician, patient) is maintained; the program and individual behavior change may evolve or adapt while continuing to produce benefits for individuals/systems [12]. Within A&F trials, sustainability can be viewed as having the A&F continue to be delivered while measuring for continued impact on health or behavioral outcomes of interest, or stopping the A&F delivery and measuring for continued impact. Although trials sometimes refer to A&F as an evidence-based intervention or as an implementation strategy, the term A&F process or strategy is used throughout to distinguish implementation strategies from the clinical interventions that those strategies sought to encourage.

To explore determinants of A&F sustainability, the Integrated Sustainability Framework (ISF) was selected as it is theoretically and empirically informed, and identifies common determinants across key levels and domains that have been found to influence sustainability across a range of types of settings and populations [7]. Key domains in the ISF include outer/policy context, inner/organizational context, implementation processes, provider/implementer characteristics, and characteristics of the intervention [7], with determinants that are important to consider within each of those domains (e.g., staffing turnover, cost).

The terms “spread” and “scale” are often used interchangeably; however, for this work, they are defined separately. Spread is defined as “replicating an initiative somewhere else (i.e. one site to another)” [9]. Scale is defined as “deliberate efforts to increase the impact of successfully tested health innovations so as to benefit more people and to foster policy and program development on a lasting basis” [10]. As included studies are all trials, the number of sites included may be due to study design requirements, rather than purposefully spreading or scaling the A&F process. As there are still important learnings regarding spread/scale from implementing trials at multiple sites, the reason for the number of sites should be kept in mind while interpreting these results. To gain a deeper understanding of factors to consider when planning for scale, the Framework for Going to Full Scale (FGFS) was used, which includes the phases of scale-up, adoption mechanisms, and support structures (infrastructure) [13].

Search strategy and information sources

The updated Cochrane review includes trials from the previously published version of the review (n = 140 originally, with n = 117 included in the updated review) [1, 11], as well as (n = 161) trials identified from electronic searches of the following databases: Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE, EMBASE, CINAHL, clinicaltrials.gov, and WHO International Clinical Trials Registry Platform. The initial search was limited to trials published from 2010 to June 2020 (n = 121), with an updated search from June 2020 to January 2022 (n = 40 additional studies). Details on the search strategy for the Cochrane review are provided in the protocol [11].

Eligibility criteria

Trials with A&F as the core strategy or as part of a multi-component intervention were considered eligible for the updated review [11]. All trials included in the updated review published between 2011 and January 2022 were included. The 2011 cut-off was selected to align with the seminal paper by Scheirer and Dearing which increased the focus on sustainability considerations in research [2].

Data screening and extraction process

Data extraction included identification of keywords (yes/no); study duration (months); sustainability period (months, if relevant); author mention of measuring sustainability (yes/no); and the copying of relevant text from the main paper and supplemental files relevant to sustainability, spread, and scale. Location (abstract, introduction, etc.) of relevant text in the main file was included. Extraction was piloted in two rounds by four researchers (CL, ZL, AH, and NS), using feedback from each pilot to refine our strategy.

Duplicate extraction of included studies was completed independently by six researchers (CL, ZL, AH, NN, NS, and JC). Sustainability keywords included sustain*, maint*, institutional*, integrat*, normal*, embed*, durabil*, longitudinal*, long*-term, routine*, and standard*. Spread and scale keywords included spread*, scal*, roll* out, reach, and generali#e*. Keywords were initially identified from reviews with relevant search strategies for sustainability [14] and spread/scale [15]. Extractors could list additional relevant words identified. Only keywords used with the appropriate meaning were included (e.g., mention of approval from the “institutional” review board would not be included). Negative instances (e.g., no focus on sustainability) were included as our focus was on all mentions of these terms in the context of A&F trials. Discrepancies were resolved by CL. A full list of keywords is included in Additional file 1: Full list of keywords.
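The keyword stems listed above can be read as simple text patterns. The following is a minimal sketch of how such stems might be screened automatically, not the procedure used in this review, where extraction was done by researchers. It assumes that `*` stands for any run of word characters and `#` for exactly one word character; the function names `stem_to_regex` and `find_keywords` are hypothetical.

```python
import re

# Keyword stems as listed in the text above.
SUSTAINABILITY_STEMS = [
    "sustain*", "maint*", "institutional*", "integrat*", "normal*", "embed*",
    "durabil*", "longitudinal*", "long*-term", "routine*", "standard*",
]
SPREAD_SCALE_STEMS = ["spread*", "scal*", "roll* out", "reach", "generali#e*"]


def stem_to_regex(stem):
    """Translate a wildcard stem into a case-insensitive regular expression.

    Assumption: '*' means zero or more word characters and '#' means
    exactly one word character; spaces match any whitespace run.
    """
    parts = []
    for ch in stem:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "#":
            parts.append(r"\w")
        elif ch == " ":
            parts.append(r"\s+")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def find_keywords(text, stems):
    """Return the stems (in list order) that occur at least once in text."""
    return [stem for stem in stems if stem_to_regex(stem).search(text)]


sentence = "The intervention was sustained and embedded in routine care."
print(find_keywords(sentence, SUSTAINABILITY_STEMS))
```

A pattern match only flags candidate text. Judging whether a keyword is used with the appropriate meaning (for example, excluding the “institutional” review board) still requires a human reader.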

Extraction only continued for studies with at least one keyword for either search (sustainability or spread/scale), while studies without a keyword were removed. For studies with a keyword, each relevant passage of text was copied along with the location of the text. For sustainability studies, total study duration (including baseline data) was extracted along with duration of the period over which sustainability was assessed, which was defined as occurring after the intervention period and was referred to by trial authors by multiple names (e.g., follow-up, maintenance phase). Studies needed a minimum of three data collection points to qualify as having a sustainability period (i.e., (1) pre-intervention or strategy; (2) post-intervention or strategy; (3) sustainability). Whether or not the author claimed to be measuring sustainability was also extracted as this did not always align with inclusion of a sustainability period based on our definition. For supplemental files, relevant text was copied and included separately. When merging the duplicate coding, all relevant text copied by each extractor was included for analysis.

Forward citation search

One researcher (CL) conducted a forward citation search between July and December 2022 for each included study following methods suggested by Brown University [16]. Publications which cited the included study were identified through PubMed Central using the “Cited By” feature which produced a list of studies that was screened by title and abstract, followed by full text review of relevant studies. Studies that directly connected to the original study and considered sustainability or spread/scale were included. For example, a brief report publishing the 12-month results after a 6-month study would be included, or a study that applied the same intervention, including A&F, in a new setting. Forward citation studies were not included in the keyword search; however, text related to sustainability, spread, and scale was extracted.

Data analysis

Results from the keyword searches were analyzed descriptively, along with sustainability phase durations, and information on whether the authors claim to be measuring sustainability. Descriptive results per trial (year of publication etc.) are based on extraction from the wider updated Cochrane review (in press).

Due to the variation in the amount of focus each study placed on sustainability and spread/scale, there was a need to group studies prior to analysis. Based on pilot data extraction and analysis of 15 studies, we differentiated between “frequent” and “occasional” mentions of relevant text. Frequent sustainability includes all studies that had sustainability-related text extracted from three or more locations (abstract, introduction etc.). Occasional sustainability includes all studies that had sustainability-related text extracted from one to two locations. Frequent spread/scale includes all studies that had spread/scale-related text extracted from two or more locations. Occasional spread/scale includes all studies that had spread/scale-related text extracted from one location.
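The grouping rule described above can be written as a short function. This is an illustrative sketch only; the function name `classify_focus` and the concept labels are hypothetical, and the thresholds are taken directly from the definitions in the preceding paragraph.

```python
# Minimum number of distinct locations (abstract, introduction, etc.)
# with relevant text for a study to count as "frequent".
FREQUENT_THRESHOLD = {"sustainability": 3, "spread_scale": 2}


def classify_focus(n_locations, concept):
    """Group a study by how many locations contained relevant text.

    n_locations: number of distinct locations with extracted text.
    concept: "sustainability" or "spread_scale" (hypothetical labels).
    """
    if n_locations < 1:
        return "none"  # no relevant text, so the study is not grouped
    if n_locations >= FREQUENT_THRESHOLD[concept]:
        return "frequent"
    return "occasional"


print(classify_focus(2, "sustainability"))  # two locations: occasional
print(classify_focus(2, "spread_scale"))    # two locations: frequent
```

The same count of two locations falls into different groups for the two concepts, because the threshold for frequent spread/scale is lower than for frequent sustainability.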

Studies defined as “frequent” underwent comprehensive inductive content analysis and deductive analysis to the ISF or FGFS. Studies with “occasional” mentions underwent content analysis only and were not mapped to a framework. As the keyword “generalizabl*” was deemed to have a relevant but unique meaning, studies that were only included because of this keyword were grouped separately. See Additional file 2: Methods for grouping studies.

All qualitative analysis was conducted by two researchers (CL and ZL) using NVivo 12.

Piloting of the codebook (Additional file 3: Codebook) was conducted by CL and ZL for five studies each in frequent sustainability and frequent spread/scale. The codebook for frequent sustainability was based on definitions adapted from Shoesmith et al., which were designed with the original developers of the ISF [17]. The codebook for frequent spread/scale was based on the FGFS descriptions provided by Barker et al. [13].

As no differences in the content analysis were found between studies with the occasional sustainability and spread/scale groupings, results were merged with the frequent groupings. Text extracted from supplemental files (protocols, theses, appendices etc.) and the forward citation search was analyzed by one coder (CL).

Results

There were 161 included studies. Thirty percent (n = 49) were published in the USA, 85% (n = 137) were parallel cluster randomized controlled trials (RCTs), and 46% (n = 74) were conducted in a primary care setting (Table 1).

Table 1 Summary of trial descriptives for all studies and separated by sustainability and spread/scale groupings

For sustainability, within the 78% (n = 126) of studies with at least one keyword, 49% (n = 62/126; 39% overall) qualified as frequent sustainability. For trials grouped as occasional sustainability, 28% (n = 35/126; 22% overall) had text in two locations, and 23% (n = 29/126; 18% overall) had text in only one location. For spread/scale, within the 62% (n = 100) of studies with at least one keyword, 51% (n = 51/100; 32% overall) qualified as frequent spread/scale. For trials grouped as occasional spread/scale, 14% (n = 14/100; 9% overall) had text in one location. Thirty-five percent (n = 35/100; 22% overall) of trials only mentioned generalizability.
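The shares in this paragraph use two denominators: the number of trials with at least one keyword and all 161 included trials. As a minimal arithmetic sketch, assuming percentages are rounded to the nearest whole number, they can be recomputed from the reported counts; the names `share` and `summarize` are hypothetical.

```python
TOTAL_TRIALS = 161  # all included studies

# Counts per grouping as reported in the text.
sustainability = {
    "frequent": 62,
    "occasional (two locations)": 35,
    "occasional (one location)": 29,
}
spread_scale = {
    "frequent": 51,
    "occasional": 14,
    "generalizability only": 35,
}


def share(count, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / denominator)


def summarize(groups):
    """Return {group: (% of trials with a keyword, % of all trials)}."""
    with_keyword = sum(groups.values())
    return {
        name: (share(n, with_keyword), share(n, TOTAL_TRIALS))
        for name, n in groups.items()
    }


for name, (within, overall) in summarize(sustainability).items():
    print(f"{name}: {within}% of keyword trials, {overall}% overall")
```

Summing each dictionary recovers the reported denominators: 126 trials with a sustainability keyword and 100 with a spread/scale keyword.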

The forward citation search yielded n = 2698 studies; n = 122 for title/abstract review, n = 46 for full text review, for a total of n = 38 included. For sustainability, n = 28 new studies were included and linked to n = 19 original studies (n = 15 frequent sustainability). For spread/scale, n = 18 new studies were linked to n = 12 original studies (n = 7 frequent spread/scale; n = 3 generalizability only). Supplemental files were included for sustainability studies (n = 18) and spread/scale studies (n = 14). No new themes were identified from the supplemental files and extracted text was merged with the overall results. Although forward citation studies provided valuable information on sustained results, application of implementation theories, and protocols for future studies to sustain or scale-up the original results, no new themes were identified.

A summary of study inclusion is provided in Fig. 1. Descriptives of the trials are provided by groupings (Table 1) and by year of publication (Fig. 2). Figure 2 shows no trend regarding the number of keywords found for sustainability, spread, or scale over the past 10 years.

Fig. 1

PRISMA statement of included and excluded studies separated by sustainability and spread/scale. *Generalizability only refers to studies that were only included for mentioning the term “generalizability” and were therefore removed. +Frequent sustainability includes all studies that had sustainability-related text extracted from three or more locations (abstract, introduction etc.). ++Occasional sustainability includes all studies that had sustainability-related text extracted from 1 to 2 locations (abstract, introduction etc.). +++Frequent spread/scale includes all studies that had spread/scale-related text extracted from two or more locations (abstract, introduction etc.). ++++Occasional spread/scale includes all studies that had spread/scale-related text extracted from one location (abstract, introduction etc.)

Fig. 2

Summary of publication year for all trials, and those with frequent mentions of sustainability and spread/scale. (2022 is excluded as only January data is available.)

Extracted text for sustainability fit within the broader ISF determinants (organizational context etc.); however, lack of details specific to A&F made it difficult to identify determinants (barriers and facilitators) directly impacting sustainability. For spread/scale, strong alignment was found with the FGFS for phases of scale-up, and support systems (infrastructure), but not for adoption mechanisms. Three new themes were identified including aligning affordability and scalability; balancing fidelity and scalability; and balancing effect size and scalability.

Keywords

For sustainability, the most frequent keyword mentioned was “sustain*” (n = 142), followed by “integrat*” (n = 67) and “long*-term” (n = 64). For spread/scale, the most frequent was “scal*” (n = 85), with only n = 12 mentions of “spread.” Word counts include negative instances, such as when studies did not measure sustainability. The full keyword count is included in Fig. 3.

Fig. 3

Keyword counts for sustainability and spread/scale across all studies (n = 161). This count includes multiple keywords per study. The dark/black bars represent the sustainability keywords, and the lighter/gray bars represent the spread/scale keywords. *Word stem. Full list of words is provided in Additional file 1: Appendix 1

Sustainability

Trial durations

The total duration of all trials that included at least one keyword regarding sustainability (n = 126) ranged from 2 to 75 months, for an average of 21 months, with 24 months being the most frequent total duration. Of those with a sustainability period mentioned (n = 37 based on our definition), duration ranged from 2 to 24 months, for an average of 10.4 months. Multiple study types were included. Twelve months was the most frequent sustainability duration. Although n = 37 trials claimed to measure sustainability, two of the studies did not report a timeframe. Two separate studies did not claim to measure sustainability, but had at least two time points measured after the intervention period, which may be due to a need for multiple time points for analysis rather than a focus on sustainability.

Key themes

Most studies that mentioned sustainability indicated they needed a longer trial duration and/or that more research was needed to determine sustainability of their overall intervention, which would include A&F. In several studies, there were inconsistencies in how studies reported whether or not results were sustained. Explanations of sustained effect were typically predictions or interpretations in the discussion, rather than direct results, such as from a process evaluation. Most studies indicated the overall intervention, including A&F, stopped after the trial ended, some continued, and others did not mention either way. Some trials determined the need for ongoing A&F, while others thought occasional “booster” sessions could encourage sustained change. Multi-component interventions rarely discussed sustainability determinants for individual components of the intervention, and typically provided more generic statements.

Integrated Sustainability Framework

Determinants of the ISF were used for deductive analysis. Determinant descriptions, ISF factors, and supporting quotes are provided in Table 2. Not all determinants described within the ISF were identified.

Table 2 Domains and determinants adapted from the Integrated Sustainability Framework (ISF), along with key quotes from included audit and feedback trials

Outer/policy context

The ISF determinant of outer/policy context represents the impact of the external landscape (policies, funding availability, partnerships, fit with national values etc.) on sustainability. There was minimal mention of how this external context impacted A&F trials. When mentioned, focus was on implementing new guidelines, and how external partners facilitate long-term implementation. One study saw potential for “embedment in a national quality assurance cycle” [39] to support sustainability. Access to external funding was a barrier, yet the focus was on the cost of the intervention rather than the broader funding landscape. Any mention of alignment with national or regional values was about the need to consider these values, not how they should be considered, as shown by this study: “We would suggest this includes due attention to influencing the institutional culture and context of rural hospitals although willingness to invest in more integrated approaches often seems lacking” [35].

Inner/organizational context

Inner/organizational context represents the impact of the organizational structure, leadership, and support, as well as readiness to change, access to resources, and organizational stability, including staff turnover. Some trials designed their interventions for “real-world” conditions, with the intent to be sustainable. “Interventions need to fit with the ‘bigger picture’ of the organisation” [23]. Access to existing organizational infrastructure was mentioned in plans for long-term implementation and was predicted to impact future sustainability; however, this was rarely actioned or followed up with empirical data, with most studies only providing the recommendation. Access to an electronic medical record (EMR) to generate local data, the need to involve local staff, and access to existing resources were all suggested to impact sustained integration into the organization. “Translation of the trial results is readily feasible because the interventions are delivered using the practice systems that are employed in delivering routine care” [34].

There were many concerns about an organization’s ability to keep trials going long-term. “Although managers were pleased with the improvements in prescribing performance, they were in agreement that the intervention program was too labour- and resource-intensive for long-term implementation” [40]. Concerns included lack of supportive infrastructure or an organization’s ability to continue without researchers. “Many hospitals lack the resources or expertise to organise and lead an implementation effort or to manage the changes needed, collect data, and initiate improvement teams” [20].

Implementation processes

Implementation processes consider how the intervention is implemented (decision maker involvement, implementation team training and support, program evaluation, adaptation, strategic planning etc.). Within trials that planned for sustainability, focus was on how to embed the intervention into routine practice. This embedding was thought to be supported by involvement of key decision makers and local staff, mainly in the design process, and connected to ongoing adaptation. “Our [intervention] consisted of comparable standardized elements, but more strongly involved local professionals in the design and performance of the locally tailored interventions” [41]. The ability to tailor the intervention (including A&F) to changing patient and organizational processes was said to support embedding, but mainly how to tailor in the future, as changes were not typically made during the trial. “The stepped-wedge design did not allow us to anticipate in a flexible manner to all types of circumstances that hindered the implementation. In retrospect, it is fair to say that we expected too much change in a too short time frame” [20]. In studies that did include tailoring, the ability to adapt was generally reported as a facilitator to sustainability. “Allowing participants to develop tailored systems changes to address barriers may have promoted sustainability by building engagement and aligning efforts with existing clinical processes” [37].

There was little mention regarding team training for A&F. Strategic planning typically focused on recommendations for what should happen next for effective interventions (including, but not limited to A&F), rather than experience with strategic planning. Program evaluation and access to data focused on the infrastructure for access to audit data, not on data to evaluate the ongoing impact of the A&F strategy.

A new factor was the use of implementation theories, models, and frameworks, and behavior change theory, to strengthen the implementation process and support sustainability potential. “The principal strength of the study is that it met the requirements of systematic reviews calling for large well-designed long-term trials of hand-hygiene interventions which apply behavioural theory to intervention design” [42].

Provider/implementer characteristics

Specific provider or implementer characteristics, such as roles, benefits, stressors, skills, and expertise, were rarely mentioned. When characteristics were discussed, focus was on embedding with existing staffing models and capacity, as well as motivation of implementers, including champions, to stay involved. Aligning with organizational capacity, the reliance on existing staff was suggested to be beneficial when planning for real-world implementation. “Using existing staff is important for understanding whether a model is feasible and sustainable regardless of externally funded interventionists” [43]. Other studies found that what they were asking of local staff was infeasible. “It appeared that large-scale uptake of evidence-based but complex implementation strategies with a minimum of influence of external researchers, but with the stakeholders in healthcare themselves being responsible for the work that comes with integrating this intervention into their own groups, was not feasible” [44].

Motivation to stay involved was described as both a barrier and a facilitator to sustainability. Multiple delays in the implementation process and a lack of time decreased initial implementation effectiveness and sustainability potential. “The operational delays in preparing the Dashboard in the latter months left supervisors with less time to perform their duties and may have reduced the quality of supervision. Second, supervisors could have lost motivation over time, which might have reduced the effectiveness of their supervision” [45]. Motivation could also be beneficial if implementers, particularly supervisors or champions, maintained enthusiasm and continued to apply and promote the changes. “An enthusiastic motivator who used her or his time and energy to provide feedback, encourage competition and energize the staff to keep up the efforts throughout the season” [46].

Population characteristics are typically included in this ISF domain; however, this information would not have been extracted from trials, so it was removed.

Characteristics of the intervention

Characteristics of the intervention include the ability of the intervention, including A&F, to be adapted (not how it is adapted), fit within the context, perceived benefit, need for this benefit, burden and complexity of the intervention, and the cost. The A&F trials focused on challenges of working with complex interventions and systems. “Delivering a complex intervention into a complex system, … is challenging with many barriers to achieving intended outcomes. There was no simple reality” [20].

Cost was mentioned as a key characteristic impacting sustainability, including comparison between research costs and sustained implementation. “Although the added costs of such resource-intensive support can be maintained during research evaluations, it is challenging to incorporate these costs into a business model that enables sustainable, scalable provision of the service” [47].

The fit with the context, population, or organization, as well as the need for the intervention, was mainly covered in the descriptions of the need for the trial itself, not connected to sustainability. Perceived benefits were mainly covered in the results regarding whether or not the intervention, including A&F, was effective, only speculating on the potential for sustained benefit in the discussion.

Spread and scale

Key themes

Most studies made generic statements regarding the need for more studies to consider scale for their specific clinical area and more generally. Within studies that mentioned conducting the trial at scale, many were reported as “first of their kind” and provided some strategies for how they planned for scalability. Strategies were mainly focused on keeping costs low and using existing infrastructure. Many of these same trials recommended that more preparation work was needed and provided suggestions on why the intervention did or did not have the desired effect at scale.

Framework for Going to Full Scale

Results of the deductive analysis to the FGFS, specific themes related to A&F, definitions of the FGFS determinants, and supporting quotes are included in Table 3. Additional themes and supporting quotes are provided in Table 4.

Table 3 Results from the deductive analysis for spread/scale text to the Framework for Going to Full Scale
Table 4 Results from inductive analysis for themes related to spread/scale

Phase of scale-up: what phase of the scale-up process is the trial working at?

For phase 1: set-up, trials discussed how they prepared the groundwork for the trial to scale, including designing materials and training that could be easily scaled. “The goal-setting and action-planning worksheet was designed to be readily scalable and was delivered with minimal supports” [63]. Some studies generically mentioned how the trial was “designed for scale”; however, this mainly focused on keeping costs low and some acknowledgment of tailoring for site-specific needs. Not all aspects of the FGFS definitions were addressed, as there was limited mention about how decisions were made about what would be considered “full scale” or how early adopters were brought on board.

In phase 2: develop the scalable unit, the trials mentioned moving beyond initial design to conduct small pilots to inform what would be taken to the next level. A scalable unit is defined as a small administrative unit (e.g., clinical unit, district) that includes key infrastructural components and relationship architecture that are likely to be encountered in the system at full scale [13]. As an example, one trial discussed their aim to “pilot test the systems consultation strategy in a small set of primary care clinics to see if the strategy demonstrated feasibility, acceptability, and preliminary effectiveness in improving clinician adherence” [31]. If effective, a follow-up study was planned for a large-scale RCT, followed by a population-level intervention.

Many of the trials that discussed scale frequently were focused on phase 3: test of scale up, as they conducted the trial across multiple sites/settings with the intention of going to full scale. The main focus was on conducting the trials under usual conditions across a large area. The approach taken in one study was mentioned to increase “confidence in the wider applicability of trial findings as it replicates guideline implementation activities under standard conditions. We paid close attention to ensuring that the evaluated intervention was embedded in real world practice, and the trial itself involved more than 94% of primary care practices in three geographical areas” [22]. In this phase, testing of infrastructure, as discussed in support systems (infrastructure), was mentioned regularly, particularly regarding the benefits of having the same data systems (i.e., EMRs) used across sites to facilitate scalability, while acknowledging the challenges of adapting to different site needs. Many trials concluded that they should have done more during phase 1 and phase 2.

For phase 4: going to full scale, there was no standardized way to determine what qualified as “full scale”; however, descriptions such as “across all of Australia,” “across the province,” or “on a national scale” were all treated as “full scale.” Trials at this level typically mentioned work from previous phases first, and although the FGFS suggests less emphasis on learning during this phase, as anticipated for a trial, these trials still focused on learning and results.

FGFS: adoption mechanisms

Within the adoption mechanisms, determinants include better ideas, leadership, communication, policy, and a culture of urgency and persistence. Included trials mentioned use of more scalable, or “better” ideas before phase 1, as the emphasis was on learning from the literature, and a need for simple ideas or principles that could improve scalability. For example, some studies focused on use of “nudges,” as they aim to be low-cost, innovative behavioral approaches that have potential to be scalable and align well with A&F [26, 62, 64]. There was little mention of leadership or policy, beyond identifying that leaders were involved, or the trial was conducted in a “live policy context,” rather than the impact of leaders or policies. There was no mention of how communication strategies impacted the scale-up process, and when communication was mentioned, it was more about the intervention itself (i.e., an e-mail intervention). The culture of urgency and persistence was mainly mentioned in study introductions, highlighting the need for the intervention, not about the impact of this urgency.

FGFS: support systems (infrastructure)

Within support systems (infrastructure), determinants include human capability for scale-up, infrastructure for scale-up, data collection and reporting systems, learning systems, and design for sustainability. Human capability for scale-up focused on implementing the trial in “usual circumstances,” on the benefits of needing as little implementation support as possible, and on avoiding labor-intensive processes. The focus in this determinant was on how to make it feasible for people to engage with the A&F; however, as with the ISF analysis, there was minimal mention of specific skills to enable scalable A&F processes.

Infrastructure for scale-up was the most frequently mentioned determinant, particularly with the emphasis on using existing data structures for audit results, and a standardized way to share feedback. Scaling across sites/settings that have the same systems was seen as a significant facilitator for scaling up, such as working in systems with the same EMR, or when data were already collected and accessible. However, embedding the A&F process into the EMR alone was not enough, and some trials acknowledged they still needed strong design and implementation processes with some adaptation to local settings and processes.

Data collection and reporting systems were directly linked to infrastructure for scale-up, as both focused on using existing data collection and reporting systems, including EMRs and open data reporting systems. This overlap is likely unique to A&F, where the audit data are themselves part of the intervention or strategy, whereas other intervention types would use the data for monitoring and evaluation. Some studies mentioned learning systems, mainly focused on the benefits of implementation laboratories, clinical networks, or taking a learning health systems approach. Design for sustainability is the FGFS domain focused on planning for sustainability, so it is covered by the ISF results.

Three new themes were identified:

  • Aligning affordability and scalability: keeping costs low was a main way trials planned for future scalability. Studies mentioned how the high cost and high resource use common in these trials were barriers to scale, with some studies mentioning strategies to keep costs down. “Brief interventions likely need repeating at regular intervals to achieve sustained improvement, balancing affordability and scalability” [65]. How to align the need for an affordable intervention with the plan for the intervention to be scaled was a frequently mentioned concern. “Although it was designed with wide reach and scaling up in mind, our budget for Website development and implementation likely exceeded that available… raising concerns about sponsorship of such programs” [48]. Using existing infrastructure and data reporting systems was a key strategy to reduce costs. “Routinely collected, accumulating data in administrative data sets offers a cost-effective opportunity to implement and evaluate antimicrobial stewardship interventions at scale across large populations” [60].

  • Balancing fidelity and scalability: there were strong concerns about how to maintain fidelity to previous trials while delivering the intervention at scale, particularly for complex interventions. “Although an all encompassing intervention is likely to achieve impact, complex interventions can be impractical to scale up” [66]. Some trials selected key elements of a previous trial to scale, while others tried to maintain fidelity, yet typically indicated more preparation work was needed.

  • Balancing effect size and scalability: although studies had concerns about smaller effect sizes than anticipated based on a pilot study, some trials acknowledged how this small effect at a large scale led to greater impact overall. “Although this is a small change for an individual prescriber, our study demonstrates how this can lead to large impacts on antibiotic use over a broad jurisdiction” [60]. The recognition of this impact potential was a driving force for trials that aimed to be implemented at scale. “Scalable and effective systems that require minimal support to implement could make major improvements in primary healthcare system performance and health outcomes globally” [25].

Discussion

A&F trials should plan for sustainability, spread, and scale so that if the trial is effective, the intended benefit can continue and reach a wider audience, which also reduces research waste and increases trust from the community [2,3,4,5,6,7,8]. Sustainability periods ranged from 2 to 24 months, with 12 months used most frequently. Although 78% of included studies mentioned a keyword related to sustainability, only 38% mentioned it frequently, and this was usually in vague statements in the discussion with suggestions for how it could be sustained, if effective, not how it was sustained. Similar findings applied to spread and scale. This lack of experience, specificity, and detail makes it difficult to recommend concrete strategies related to barriers and facilitators to A&F sustainability, which is a concern because sustainability planning benefits from careful consideration of sustainability determinants [7]. Mapping to the ISF provided some insight into the broader domains and determinants that shape sustainability of A&F as tested in trials, which are vital for planning for their sustainability. Planning for scale mainly focused on keeping costs down and using existing infrastructure, without acknowledging the role of other mechanisms, such as policy, leadership, and communication, that support scale.

Twelve months was the most frequent sustainability duration reported, but total study durations and sustainability periods were not clearly reported in many studies. As different terminology was used across studies, with many not explicitly calling it a sustainability period, some of these time periods were included even though the trial authors may not have considered them to be measuring sustainability. There is currently no recommended time for claiming an intervention is sustained; however, 12 months may not be long enough to truly understand whether an intervention, implementation strategy, and/or impact are sustained. Authors are encouraged to report clearer sustainability durations, publish follow-up studies, and indicate whether the intervention, including implementation strategies, continued during that time.

The ISF determinants provided a useful structure to explore what may impact sustainability of A&F-based interventions, although it was difficult to directly connect ISF determinants to A&F, rather than to other components of the intervention (e.g., education, champions). Using the ISF is recommended to design suitable and appropriate sustainability strategies for future A&F trials, alongside tools such as the Expert Recommendations for Implementing Change (ERIC) sustainability glossary [67], which may be useful for determining specific strategies when planning for A&F sustainability. Our difficulty differentiating between implementation and sustainability characteristics is common within sustainability research [4, 7] and demonstrates the interconnected nature of these characteristics. This interconnectedness may also reiterate the need to consider and plan for sustainability early, during initial implementation [8]. The FGFS was useful to categorize phases of scale-up and to highlight what was, and was not, discussed within trial descriptions. The FGFS may be a useful overarching guide for planning ongoing scale-up of A&F processes, particularly because trials commonly reported that more planning was needed when the effect was not seen at scale.

As limited work has been conducted regarding sustainability of A&F, this qualitative review was important to conduct before asking questions about sustained effectiveness of A&F. With confusion around the definition and timeline of sustainability (range from 2 to 24 months), lack of clarity on whether the intervention was continued during the sustainability period, and generally inconsistent reporting, clear criteria, informed by this review, will be needed going forward when exploring sustained effectiveness of A&F trials. Trials will likely need to report results for at least three time points (baseline, end of intervention, and post-intervention) and meet a minimum amount of time that qualifies as “sustained,” and there will need to be a clear differentiation between trials that continued the intervention and implementation strategies, including A&F, after the intervention phase and those that did not. Further exploration of scale will also need more consistency regarding the scalability phase of the trial, particularly what is meant by “full scale.” Improved reporting of intervention timelines and increased descriptions of how sustainability and scalability were planned (in the original or subsequent publications) will help increase our understanding of this impactful topic.

Limitations

We limited eligibility to more recent trials given the more recent focus in the literature on sustainability, spread, and scale, but recognize that in doing so, some insights from older studies may have been missed.

Results are based on A&F trials designed to look at effectiveness within clear time limits, so the lack of detail regarding sustainability and spread/scale planning was unsurprising. We mitigated this limitation through the forward citation search. As included trials often used multiple intervention components and implementation strategies, not limited to A&F, it is not possible to attribute results solely to A&F. Although our initial inclusion criteria based on keywords aimed to be as inclusive as possible, some studies were excluded due to lack of use of specific words. For example, one study always used “12 months” to refer to continuation of the trial and was excluded [68]. As many studies were cluster trials that may need multiple sites, these trials do not necessarily reflect spread/scale; however, given the focus on keywords regarding spread/scale, valuable information was learned about sustainability, spread, and scale from trials conducted at multiple sites. Cluster trials were also conducted at the level of sub-team, ward, or even clinician. With the limited focus on sustainability within these trials, we chose to focus on all mentions of the topic rather than differentiating between sustainability of the intervention post-trial and sustainability of the effect of the intervention on behavior change, or outcomes. As more focus is placed on how to sustain A&F processes and subsequent behavior change, further distinction should be made between these sustainability indicators and time periods.

We also acknowledge that these studies were not necessarily solely or explicitly designed to study sustainability, spread, or scale, and future work could focus on studies with this explicit focus.

Our initial aim was to extract text directly to the ISF and FGFS; however, there was a large discrepancy between reviewers during the first pilot due to an inability to distinguish between text explaining the initial implementation and information specific to sustainability/spread/scale. For this reason, the broader strategy for text extraction was used, as it yielded more consistent extraction during the second pilot. This change meant that potentially relevant text for the frameworks may not have been extracted if it was not directly referring to sustainability, spread, or scale. This method may explain why limited information was found for factors of the ISF and adoption mechanisms of the FGFS; however, the general lack of detail regarding these planning strategies indicates that a different extraction process would likely have led to the same results.

Conclusion

A&F trials should plan for sustainability, spread, and scale so that, if effective, the benefit can continue and impact a wider audience. Many studies lacked detail on whether or how they planned for any aspect of the intervention, including A&F, to be continued. Scalability planning must go beyond keeping costs low and using existing infrastructure, to considering other strategies that support scalability. Future research should explore whether the effect of an A&F trial is continued, for how long, and whether this is with or without continuation of the A&F process. Careful planning for sustainability, spread, and scale is needed to ensure that the changes can have a positive, sustainable impact for a wide audience across different contexts.