Background

In implementation science, theories, models, and frameworks are foundational for generalizing implementation efforts and research findings across diverse settings and for building a cumulative evidence base [1]. Although the three terms are often used interchangeably, Nilsen offered useful definitions for distinguishing among theories, models, and frameworks in implementation science. Unlike theories, which typically posit causal relationships, models and frameworks tend to be “more like checklists of factors relevant to various aspects of implementation”; models are more commonly used to describe the translation of research findings into practice (i.e., in implementation practice), while frameworks are often used to identify implementation determinants (i.e., in implementation research) [2]. Theories, models, and frameworks, collectively referred to hereafter as “TMF,” promote generalization of findings by providing a common language and constructs that enable consistently articulated explanations of implementation-related phenomena, thus promoting progress and facilitating shared understanding [3]. More specifically, TMF guide the process and evaluation of implementation, facilitate the identification of implementation determinants, and aid in the selection of implementation strategies. They also inform research stages by framing study questions and motivating hypotheses, anchoring background literature, clarifying constructs, depicting relationships among constructs, and contextualizing results [4]. Conversely, atheoretical approaches to implementation science may delay or inhibit the field’s advancement by limiting shared understanding.

The benefits of TMF often go unrealized, due in part to the challenge of selecting from among the many that exist in the field; the result is superficial use of TMF, use of inappropriate TMF, or TMF going unused altogether [5]. As a first step toward developing guidance to help researchers select appropriate TMF [6], we recently conducted a survey to identify which TMF implementation scientists report using, how they report using them, and the criteria that they used to select them. The 223 implementation researchers and practitioners from 12 countries who responded to our survey reported using more than 100 different TMF spanning several disciplines to inform their work; the most commonly reported included the Consolidated Framework for Implementation Research (CFIR) [7], the Theoretical Domains Framework (TDF) [8], the Promoting Action on Research Implementation in Health Services (PARIHS) framework [9], Diffusion of Innovations [10], RE-AIM [11], the Quality Implementation Framework [12], and the Interactive Systems Framework [13]. These implementation scientists reported using an average of 7 criteria to select TMF, including analytic level, logical consistency/plausibility, empirical support, and description of a change process. Despite the many criteria that implementation scientists used to select TMF, there was little consensus on which were most important. Instead, the selection of implementation TMF was often haphazard or driven by convenience or prior exposure. Similarly, in a recent scoping review, Strifler et al. identified 159 implementation TMF, noting that scholars seldom provided sufficient justification for their use [14].

The results of our survey, bolstered by Strifler et al.’s review, suggest that implementation scientists may benefit from a refined, manageable set of criteria for selecting TMF [6, 14]. The guidance that such a set of criteria would offer may also promote theory testing and help identify needs for TMF development, contributing to the advancement of the science. Specifically, such a set of criteria would facilitate the meaningful application of TMF by making explicit assumptions about relationships that are otherwise left implicit; by providing an opportunity to test, report, and enhance a TMF’s utility and validity; and by providing evidence to support TMF adaptation or replacement [15, 16]. In this paper, we used results from our survey as a starting point to develop a user-friendly tool to guide TMF selection.

Methods

The study involved three stages. First, in a concept mapping exercise, implementation practitioners and researchers reviewed the criteria identified in our recent survey (described above) and engaged in a sorting and rating task that yielded conceptually distinct categories of criteria and ratings of their clarity and importance. Second, we used concept mapping results to develop a tool to guide TMF selection. Third, we assessed the tool’s usefulness through expert consensus, cognitive interviews, and semi-structured interviews with implementation practitioners and researchers who tested the tool.

Concept mapping recruitment, procedure, and analysis

Concept mapping is a mixed-method procedure in which stakeholders organize concepts into categories and generate ratings of specified dimensions [17, 18, 19]. It is useful for structuring the ideas of diverse groups and has been used in implementation research for multiple purposes, such as identifying and prioritizing barriers and facilitators [20, 21], organizing implementation strategies [22], generating dimensions of pragmatic measurement [23], and identifying training needs.

We used a purposive sampling approach to recruit 18 implementation practitioners (i.e., professionals who systematically apply lessons and findings from implementation science within human service settings to develop capacity and support performance for the full and effective use of innovative programs and practices) and 19 implementation researchers (i.e., individuals who study “the use of strategies to adopt and integrate evidence-based health interventions into clinical and community settings in order to improve patient outcomes and benefit population health” [24]) to participate in an online concept mapping exercise via the Concept Systems Global MAX™ [25] web platform. Implementation practitioners and researchers on the study team identified potential participants from their respective professional networks in Canada, the UK, and the USA. We sent up to three emails offering potential participants a $50 incentive to engage in the concept mapping exercise.

To identify conceptually distinct categories of criteria, we asked participants to sort virtual cards, each presenting one of the 22 criteria identified in our recent survey along with its definition, into piles as they deemed appropriate. We then asked participants to name each pile. We also asked participants to rate the importance and clarity of each criterion on a three-point scale (“not important/not clear,” “moderately important/clear,” “very important/clear”). Participants could engage in the activities in the order of their choosing and could do so over multiple online sessions, at their convenience, until their responses were complete.

Data analysis involved the use of multidimensional scaling and hierarchical cluster analysis to produce visual representations of the relationships among the criteria [18]. Specifically, multidimensional scaling was used to generate a point map depicting each of the TMF selection criteria and the relationships among them based upon a summed square similarity matrix; criteria frequently sorted together were placed closer together on the point map [18]. Hierarchical cluster analysis was used to partition the point map into non-overlapping clusters [18]. The investigative team, joined by one visiting implementation scientist from Australia (HK) and one from Ireland (SM; see the “Acknowledgements” section), considered potential solutions ranging from 2 to 10 clusters to determine which best suited the purposes of the current study. Each individual identified the cluster map that they deemed most conceptually clear based on their knowledge of the field. The group then convened to discuss their choices and worked to reach consensus on the map that was most conceptually clear. The group also labeled each cluster, a process aided by Concept Systems Global MAX™, which suggested potential cluster labels based upon participant responses. In two cases, individual items were moved from one cluster to another to improve the clarity and consistency of the clusters. Model fit was assessed using the stress value, an indicator of goodness of fit between the point map and the total similarity matrix; higher stress values indicate poorer representation of the data. Cross-study syntheses of concept mapping studies have found mean stress values of approximately 0.28 [18, 19, 26].
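As an illustration only, the following sketch approximates these analytic steps with simulated sorting data; the variable names, random sorts, and the particular stress formula (a normalized stress computed on raw dissimilarities rather than the non-metric stress used by concept mapping software) are assumptions for demonstration, not the procedures implemented in Concept Systems Global MAX™.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

n_criteria, n_participants = 22, 37
rng = np.random.default_rng(0)

# Hypothetical sorting data: each participant's sort is a list of piles,
# and each pile is a list of criterion indices.
sorts = []
for _ in range(n_participants):
    order = rng.permutation(n_criteria)
    piles = np.array_split(order, rng.integers(3, 8))
    sorts.append([list(p) for p in piles])

# Summed square similarity matrix: how often each pair of criteria was sorted together.
similarity = np.zeros((n_criteria, n_criteria))
for piles in sorts:
    for pile in piles:
        for i in pile:
            for j in pile:
                similarity[i, j] += 1

# Convert similarities to dissimilarities and scale to a two-dimensional point map.
dissimilarity = similarity.max() - similarity
np.fill_diagonal(dissimilarity, 0)
points = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dissimilarity)

# Normalized stress between the point map and the dissimilarity matrix,
# analogous to the fit index reported in the text; lower values indicate better representation.
fitted = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
stress = np.sqrt(((dissimilarity - fitted) ** 2).sum() / (dissimilarity ** 2).sum())

# Hierarchical cluster analysis of the point map, examining 2- to 10-cluster solutions.
tree = linkage(points, method="ward")
solutions = {k: fcluster(tree, t=k, criterion="maxclust") for k in range(2, 11)}
print(f"stress = {stress:.2f}; 4-cluster assignments: {solutions[4]}")
```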

We calculated descriptive statistics for the importance and clarity ratings and plotted them for each criterion. Using the mean of each dimension, we divided the resulting scatterplot into four quadrants to create a “go zone” diagram. For example, quadrant I in Fig. 2 contains criteria that have high importance and high clarity, indicated by values that were above the mean for both dimensions.
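As a further illustration (the criterion names and rating values below are hypothetical, not our study data), the quadrant assignment underlying the “go zone” diagram can be sketched as follows:

```python
import numpy as np

# Hypothetical mean ratings per criterion on the 1-3 scale used in the study.
criteria = ["empirical support", "analytic level", "description of a change process", "logical consistency"]
importance = np.array([2.8, 2.1, 2.6, 1.9])
clarity = np.array([2.7, 2.4, 1.8, 2.0])

# Quadrant cut points are the means of each dimension across criteria.
imp_cut, clar_cut = importance.mean(), clarity.mean()

for name, imp, clar in zip(criteria, importance, clarity):
    if imp >= imp_cut and clar >= clar_cut:
        zone = "quadrant I (go zone: high importance, high clarity)"
    elif imp >= imp_cut:
        zone = "high importance, low clarity"
    elif clar >= clar_cut:
        zone = "low importance, high clarity"
    else:
        zone = "low importance, low clarity"
    print(f"{name}: {zone}")
```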

Tool development

A study team member with expertise in visual design optimization (JS) developed a prototype tool based on the clustered criteria derived from concept mapping. The prototype included the list of the criteria with their definitions, organized by cluster. We developed an example project about the role of electronic health records in the implementation of cancer survivorship care plans and described how the prototype tool could be used to identify an appropriate TMF.

Usefulness assessment recruitment, procedure, and analysis

We refined and assessed the usefulness of the prototype in two stages. First, we conducted cognitive interviews to assess the extent to which the tool conveyed its content to potential users as intended. We recruited two implementation researchers and two implementation practitioners via phone and email to participate in cognitive interviews. An experienced cognitive interviewer asked participants to “think aloud” as they read and reflected on criteria in the prototype (see Additional file 1 for the cognitive interview guide). In particular, we solicited feedback on criteria that participants found ambiguous or confusing. Cognitive interviews lasted 30–45 min and were digitally recorded.

Second, we recruited two implementation researchers and two implementation practitioners via phone and email to pilot test the prototype with a specific project and provide feedback on the prototype in semi-structured interviews. We began by sending the prototype to individuals who consented to participate with a request for them to use the prototype for a project at some point during the subsequent 2 weeks. We then conducted semi-structured phone interviews in which we asked participants to reflect on their experience using the prototype and provide suggestions for improving the prototype (see Additional file 2 for the semi-structured interview guide). Semi-structured interviews lasted 30–45 min and were digitally recorded.

Because the primary purpose of the cognitive and semi-structured interviews was to identify concerns related to the interpretability and appropriateness of the prototype’s content, qualitative researchers (RT, MV; see the “Acknowledgements” section) listened to the recordings following each of these two stages and inductively identified themes, noting concerns related to the prototype’s wording, ordering, and format. These themes were then summarized in a table that organized participants’ concerns within each of the identified themes. We revised the prototype iteratively to address interview participants’ concerns.

Results

Concept mapping

Thirty-seven implementation scientists (19 researchers and 18 practitioners) participated in the concept mapping exercise. Participant demographics are described in Table 1. Participants were located in the USA (n = 30), the UK (n = 6), and Canada (n = 1). The majority had a doctoral degree (n = 29), were affiliated with an academic institution (n = 21), and had been a principal investigator (n = 21).

Table 1 Concept mapping participant characteristics (n = 37)

All 37 participants completed the sorting exercise. We confirmed that sorts were valid by checking 5 participants’ responses to ensure that criteria were sorted into generally logical categories. All participants rated the importance and clarity of the criteria, although 4 participants each failed to rate the clarity of one criterion; overall, 99.5% of clarity ratings (810/814) and 100% of importance ratings (814/814 ratings of the 22 criteria across 37 participants) were provided.

The final concept map included four clusters: usability, testability, applicability, and familiarity. To conceptually distinguish clusters from each other, we moved two criteria from their original clusters to an adjacent cluster: we moved inclusion of change strategies from usability to applicability and degree of specificity from applicability to testability. The stress value was 0.26, below the cross-study mean, indicating acceptable fit [18, 19, 26]. Figure 1 shows the final concept map. Table 2 displays importance and clarity ratings for the criteria, organized by cluster. Figure 2 shows the “go zone” graph, depicting quadrants of importance and clarity ratings for each criterion.

Fig. 1 Concept map

Table 2 Summary of 22 theory, model, and framework selection criteria, organized by cluster with mean clarity and importance ratings
Fig. 2 Importance and clarity

Tool development

We iteratively refined the prototype based on feedback and reactions during the cognitive and semi-structured interviews. Suggestions identified in cognitive interviews and the changes made in response are shown in Table 3. For example, practitioners suggested that it would be ideal to have separate tools tailored to practitioners and researchers, respectively. To address this feedback, we created separate paper versions of the tool for practitioners and researchers, each including a different list of example applications in the instructions for use. Cognitive interview participants also suggested that the tool should be as succinct, intuitive, and self-explanatory as possible. To address this, we eliminated redundancies, including the conceptual names of each criterion, and shortened the criteria’s descriptions.

Table 3 Suggestions for T-CaST improvement identified during phase 1

Suggestions for improving the refined tools identified in semi-structured interviews and the subsequent changes made are displayed in Table 4. For example, participants indicated that the tool’s usefulness was limited if they did not already have a TMF in mind. Based on this feedback, we reframed the purpose of the tool from identifying a TMF to evaluating or comparing one or more pre-defined TMF. To facilitate comparing, scoring, and ranking TMF, we added columns allowing users to select characteristics most important to their project, a scoring system, and space for assessing two TMF on the same tool.
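As an illustration of the comparison logic that these additions support (the rating scale, criterion wording, and candidate TMF below are placeholders rather than the published T-CaST layout), a minimal sketch is:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CriterionRating:
    criterion: str
    relevant: bool  # user marked this criterion as relevant to the project
    fit: Dict[str, int] = field(default_factory=dict)  # TMF name -> fit rating (0 poor, 1 moderate, 2 good)

# Hypothetical ratings for two candidate TMF on three criteria.
ratings = [
    CriterionRating("Provides an explanation of how constructs influence implementation", True, {"CFIR": 1, "TDF": 2}),
    CriterionRating("Has been used in empirical studies", True, {"CFIR": 2, "TDF": 2}),
    CriterionRating("Depicts relationships among constructs", False, {"CFIR": 2, "TDF": 1}),
]

# Only criteria selected as relevant contribute to each TMF's total fit score.
totals: Dict[str, int] = {}
for r in ratings:
    if not r.relevant:
        continue
    for tmf, score in r.fit.items():
        totals[tmf] = totals.get(tmf, 0) + score

ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # e.g., [('TDF', 4), ('CFIR', 3)]
```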

Table 4 Suggestions for T-CaST improvement identified during phase 2

Notably, cognitive and semi-structured interview participants identified several strengths of the tool. Cognitive interview participants confirmed the importance of various domains in the tool and highlighted ways in which such a tool may enhance the work of implementation researchers and practitioners, such as by helping to bridge research and practice. Semi-structured interview participants emphasized that the tool offered them a way to clarify their priorities with respect to criteria for a TMF under consideration for a project; to be explicit about the criteria that they used to select a TMF; and to compare, select from among, and/or consider the usefulness of combining multiple TMF.

The first version of the Theory Comparison and Selection Tool (T-CaST) resulting from our efforts is displayed in Additional file 3 (tailored to practitioners) and Additional file 4 (tailored to researchers). T-CaST includes hyperlinks to descriptions of the purpose of T-CaST, how T-CaST was developed, and where users can find TMF to use with T-CaST. T-CaST provides instructions for use, examples of its application by practitioners and researchers to multiple implementation projects, fields for describing the project, and a table in which users may select criteria that are relevant to their project, note TMF under consideration for the project, and rate the fit of the potential TMF to their project with respect to each relevant criterion. T-CaST allows users to compare the fit of multiple TMF to their project based on their ratings and to compare ratings across team members. T-CaST also allows users to report how they will apply the information from the completed T-CaST to their project.

Discussion

In this study, we sought to develop a user-friendly tool to facilitate TMF selection and encourage the appropriate use of TMF in implementation science. Our efforts yielded the first version of T-CaST. After implementation practitioners and researchers have specified their research questions and identified candidate TMF, T-CaST can guide them through the process of considering the relevance of TMF criteria for their project and rating the extent to which one or more TMF exhibit those criteria. T-CaST also features examples from other practitioners and researchers who have used the tool in several disciplines (e.g., education, health care) and settings (e.g., schools, public health agencies).

Our goal in developing T-CaST was to help implementation scientists select a TMF. However, cognitive and semi-structured interview participants found that the tool was helpful when they already had one or more TMF in mind. In particular, they found the tool helpful for deciding whether a specific TMF was relevant for their project or for deciding which of several TMF was most relevant. Thus, the first version of T-CaST aids in the selection of TMF from among a candidate list; its usefulness for identifying TMF in the absence of a candidate list is limited by the lack of comprehensive lists of implementation TMF with defined characteristics that can be mapped onto the criteria in T-CaST.

To achieve the goal of helping implementation scientists select a TMF without any candidate TMF in mind, T-CaST would need to be linked to a comprehensive list of candidate TMF. The Dissemination & Implementation Models in Health Research & Practice website (dissemination-implementation.org) is intended to help implementation scientists select TMF from a list of the TMF identified by Tabak et al. and Mitchell et al. [27] (additional TMF are added based on expert recommendations). Users may browse the included TMF or search the list by specifying whether they are interested in dissemination, implementation, or both; the socio-ecological level in which they are interested; and up to 45 constructs of interest. These functions represent a substantial contribution to the field. However, the website has three key limitations. First, the criteria that the website includes may be too circumscribed to yield relevant TMF. The tool that we have developed could be used to augment the website’s criteria. Second, dichotomous evaluations of each criterion (e.g., dissemination focus: yes/no) may be insufficient to capture the nuance that implementation scientists consider when selecting a TMF. T-CaST has the potential to improve upon this feature by suggesting a tiered evaluation approach (e.g., poor, moderate, or good fit). Third, the list of TMF that Tabak et al. and Mitchell et al. identified does not contain every TMF available to implementation scientists, as evidenced by the 159 TMF identified by Strifler et al. [14]. A more comprehensive approach is needed to ensure that implementation scientists consider all relevant TMF that pertain to their research question(s). Such a list may help users avoid defaulting to only the most commonly used TMF, even the most comprehensive of which do not cover all implementation determinants. Many existing references can guide implementation scientists in selecting from among these TMF, including Nilsen’s “Making sense of implementation theories, models and frameworks” [2] and Grol et al.’s “Planning and studying improvement in patient care: the use of theoretical perspectives” [28].

Current efforts to develop a decision support tool to help researchers and practitioners select knowledge translation TMF may address some of the aforementioned challenges (personal communication, Lisa Strifler, January 20, 2018). Strifler et al. conducted a scoping review to identify knowledge translation TMF used in practice [14]. The study team is also conducting semi-structured interviews with researchers and practitioners to identify barriers to the use of TMF. They will then use the barriers that they identify to create a decision support tool, followed by heuristic usability testing, individual usability testing, and pilot testing with practitioners. Future studies should compare and contrast our respective tools in terms of usability, appropriateness for diverse end users, and influence on the use of TMF in the field.

The criteria included in T-CaST overlap somewhat with the criteria for assessing TMF quality proposed by Davis et al. [29] (see Table 5). In some cases, the criteria are extremely similar (e.g., testability). In other cases, however, the relationship is less clear. For example, Davis et al.’s criteria include measurability (“Is an explicit methodology for measuring the constructs given?”), which differs slightly but importantly from our applicability sub-criterion (“A particular method [e.g., interviews; surveys; focus groups; chart review] can be used with the TMF”), with the former referring to clear guidance for measurement and the latter referring to a preferred method of measurement. And, in contrast to Davis et al.’s criteria, our criteria exclude parsimony, which concept mapping participants in our study deemed of insufficient importance for inclusion. Also, in some cases, one of our criteria addressed several of Davis et al.’s criteria (e.g., Davis et al.’s “having an evidence base” and “being explanatory” both mapped onto our criterion of “TMF contributes to an evidence base and/or TMF development because it has been used in empirical studies”). Our criteria may be more parsimonious because our study fulfilled Davis et al.’s call for efforts to “transform the[ir] nine quality criteria into forms, such as reliable scales or response options that can be used in evaluating theories.”

Table 5 Comparison of Davis et al.’s [29] criteria for assessing theory, model, and framework (TMF) quality and T-CaST criteria

Some limitations of our study should be noted. Criteria that were unclear may have been eliminated from T-CaST not because they were fundamentally unimportant but because their lack of clarity made their importance challenging to assess. However, the participation of 37 implementation scientists in concept mapping may have guarded against this risk, particularly given that there was some variation in participants’ evaluation of the criteria’s clarity. Also, the relevance of criteria included in T-CaST likely depends upon a TMF’s intended use. For example, the extent to which a TMF explains how included constructs influence implementation and/or each other may be more relevant for determinant frameworks than for TMF that describe implementation processes [2]. Relatedly, T-CaST users may rate criteria without weighting them by their relative importance. Consequently, high ratings of several potentially less important criteria may outweigh low ratings of potentially more important criteria. Future research should improve upon this feature, perhaps by weighting criteria’s importance. For now, researchers and practitioners should determine which criteria are most important for their study or project. For example, researchers and practitioners who seek to describe, rather than explain, implementation may choose to omit the “TMF provides an explanation of how included constructs influence implementation” criterion. In addition, we developed and tested paper versions of T-CaST, which limits the tool’s interactive functionality; for example, the paper format limits the number of case examples that we can provide. Our web-based version of T-CaST, now available at https://impsci.tracs.unc.edu/tcast/, will address many of these and other challenges. In the web-based version, with users’ permission, we will crowdsource examples of the tool completed for various projects in research and practice. Notably, crowdsourcing will also allow us to identify the TMF that implementation scientists consider when using T-CaST, which TMF they decide to use, and which TMF they decide not to use.

Conclusion

T-CaST has several potential benefits. First, by helping implementation scientists to select a TMF, T-CaST has the potential to reduce fragmentation in the literature and promote the use of TMF in the field, which to date has been insufficient [30]. Second, T-CaST may limit the misuse of TMF in implementation science, which has been found to be widespread [30, 31, 32, 33]. Semi-structured interview participants noted that T-CaST helped them to be explicit about the criteria that they used to select a TMF. Indeed, we recommend that T-CaST be used to facilitate transparent reporting of the criteria used to select TMF whenever a TMF is used in an implementation-related study (see Fig. 3 for a checklist). This recommendation stems from our finding that implementation scientists’ selection of TMF was often haphazard or driven by convenience or prior exposure [6], and it may apply beyond implementation science, since the challenges of TMF selection are unlikely to be unique to our field. Transparent reporting of the criteria used to select TMF may limit the often superficial use of TMF [34]. Third, T-CaST has the potential to curb the proliferation of TMF by encouraging users to consider that a TMF (or multiple TMF in combination) may already exist that meets their needs [35].

Fig. 3 Checklist of criteria for selecting theories, models, and frameworks (TMF)