Background

Measurement issues in implementation science are among the most critical barriers to advancing the field [7, 9, 21,22,23, 30]. Measures developed and tested in efficacy trials may not be feasible in service systems, and the widespread use of “homegrown” implementation measures limits generalizability of study findings [12, 25]. Implementation science is especially vulnerable to measurement issues given the rapid growth of the field and the need for multi-level measurement in diverse health contexts (e.g., community mental health treatment, medicine, etc.) [31].

Measure development involves conceptualization (identifying measurement gaps and relevant constructs for a target population); development (generating measure content and administration procedures); and testing (assessing psychometric properties) [5]. Psychometric testing has received the most attention in the implementation science literature [20, 26]. However, implementation partners—treatment developers, implementation researchers, community leaders—are unlikely to select measures based on psychometric evidence alone [13, 14, 29]. Emphasis must also be placed on a measure’s pragmatic qualities, goals for use, and translatability to clinical practice [34].

Glasgow and colleagues [13] recommended guidelines for pragmatic implementation measures. Based on a review of the literature, the authors noted that pragmatic measures have four key characteristics: importance to partners; low burden for respondents; actionability; and sensitivity to change. Extending this work, Stanick and colleagues [34] interviewed implementation science experts and identified the following three characteristics as priorities: integration with an electronic health record, facilitation of guided action (e.g., selection of an intervention), and low cost. This work contributed to the development of the Psychometric and Pragmatic Evidence Rating Scale (PAPERS) for evaluating implementation measures [21, 22]. However, there remains limited guidance on methods for developing pragmatic implementation measures that can be used across different contexts.

Implementation measures must balance both psychometric and pragmatic quality. To attain this balance, we advocate that implementation scientists routinely use cognitive interviewing, a qualitative method that collects partner feedback throughout measure development [40]. Cognitive interviewing is uniquely suited to address measurement concerns in implementation science for four key reasons. First, implementation measures often evaluate efforts that engage diverse partners across multiple levels (patient, provider, organization) [1, 35]. Cognitive interviewing can reveal whether measure content is relevant across partner groups and inform tailoring as needed. Second, cognitive interviews can help assess psychometric and pragmatic characteristics, including a measure's construct validity, training burden, relevance, and usefulness across different contexts. Third, and uniquely relevant to implementation research, in which context is paramount [4, 11, 28], cognitive interviews can be used to collect partner feedback on measure administration procedures. Cognitive interviews can assess partner preferences for a measure delivery platform (e.g., electronic or paper), measure format (e.g., time, length, multiple choice versus free response), and strategies to integrate the measure with a clinical setting's workflow (e.g., when and how often to administer a measure), all of which can enhance a measure's utility and scalability. Finally, collaborative research techniques like cognitive interviewing can be used to center partner perspectives, which can promote equitable partnership-building and increase buy-in [36].

To advance the development of psychometrically and pragmatically valid tools, we advocate for the widespread use of cognitive interviewing in implementation science studies. We first provide a detailed overview of cognitive interviewing theory and the stages of cognitive interviewing. We then provide a case example from an ongoing implementation trial to demonstrate how cognitive interviewing can be used to develop a pragmatic measure and to design a measure administration protocol [32]. We conclude with reflections on how cognitive interviewing can be used to improve measurement in implementation science.

Cognitive interviewing: overview of theory and techniques for use in implementation science

During a cognitive interview, implementation partners verbalize their thoughts as they evaluate measure questions and provide responses [2, 40]. As the partner reads a measure aloud, an interviewer uses intermittent verbal probes to elicit their response process (concurrent interviewing) or has the partner verbalize their thoughts after completion (retrospective interviewing). Interviews may be used to identify constructs that partners value and consider important to assess (concept elicitation) or to revise an existing measure (debriefing). This method is used widely in other areas such as survey methodology and health outcome measurement (e.g., patient-reported outcomes in clinical trials), and by organizations like the United States Census Bureau [6, 16, 27] for measure development.

Cognitive interviews can be tailored to the goals of an implementation study. Given that implementation research often includes a broad range of academic and community partners, interviews can be tailored to specific partner groups, to assess specific parts of a measure (e.g., instructions, terms, response options), to examine the relevance of the measure, or to evaluate administration procedures. In addition to its flexibility, cognitive interviewing can produce informative data even with small sample sizes (e.g., 5–10 interviews and a 15–30-min interview period) [40], which is particularly useful for resource-constrained implementation efforts.

Cognitive interviewing theory

Drawing on cognitive psychology, cognitive interviewing frameworks propose that a partner follows a four-stage mental model: (1) comprehension; (2) memory retrieval; (3) judgement; and (4) response [10, 17, 37]. At the comprehension stage, the goal is for the partner to interpret measure content (e.g., instructions, items, response options) as intended by the developer [39]. Misunderstandings may result from confusing or complex wording, missing information, inattention, and unfamiliarity with terminology. Measurement error due to comprehension issues [40] is especially likely in implementation science, where it is well documented that users are often unfamiliar with key constructs [3, 8]. For example, the question, "Recently, how many days have you participated in a training on evidence-based practice?" presumes the partner comprehends key terms about time reference ("recently"), implementation strategy ("training"), and a construct ("evidence-based practice"). If the partner is unfamiliar with these terms, they may not understand what types of training activities and interventions to include when responding to the question, which contributes to measurement error.

Next, to recall an answer, the partner must draw on information in memory. Several factors influence the memory retrieval process, including a partner's past experiences and the number and quality of memory cues provided, such as the time anchor (e.g., "recently") and examples (e.g., participation in a workshop versus ongoing training) [10]. Third, the partner must integrate the information presented and form a judgement [40]. Previous studies indicate that decreasing item complexity (e.g., length, vocabulary) may facilitate decision-making, leading to more accurate self-reports [18]. In the example provided, researchers could consider changing the time anchor, replacing the general term "evidence-based practice" with a specific intervention, and simplifying the question ("Over the past month, did you attend a workshop on cognitive behavioral therapy?").

In the final stage, the partner selects an answer and communicates it to the interviewer [17, 40]. It is important to consider how response options are provided, specifically the type of scale used (e.g., Likert scale, rank order, multiple choice, open-ended), the direction of response options (e.g., “Strongly Disagree to Strongly Agree” versus “Strongly Agree to Strongly Disagree”), and whether the partner can meaningfully differentiate among the response options. In sum, cognitive processes involved in recall and recognition are affected by how measure content is presented, and these factors warrant consideration in measure development.

Cognitive interviewing techniques

Several cognitive interviewing techniques, generally categorized as think aloud and verbal probing [10, 40], may be used. In think aloud, the interviewer takes an observer role and asks a partner to spontaneously verbalize their thoughts as they respond to questions. In verbal probing, the interviewer takes a more active role by asking a partner pointed follow-up questions after each response. Probes may be general (Does this question make sense?) or item-specific (What do you think the term "evidence-based practice" means?). Probes may also be standardized and pre-planned or applied flexibly in response to the partner (You hesitated to answer, can you tell me why?). The goals of the implementation study will guide probe selection. Table 1 presents key goals of cognitive interviewing and probes to elicit implementation-relevant feedback.

Table 1 Four-stage cognitive interviewing model and example verbal probes for implementation studies

Cognitive interviewing experts recommend using a structured or semi-structured protocol to guide data collection (see [40]). The protocol typically includes study-specific interview techniques (e.g., standardized probes) and administration information (e.g., use of technical equipment). For implementation studies, the cognitive interview protocol may also include several key additions: (1) probes to elicit multi-level partner perspectives (e.g., asking a clinical provider, What factors may affect how a patient would answer this question?; asking a clinical supervisor, Do you think clinicians would need additional training to administer this question?); (2) definitions of terms to facilitate shared understanding between partners (e.g., Can you describe what evidence-based practice means in your own words?); and (3) instructions on how to tailor probes for specific partner groups (e.g., clinic supervisors versus front-line providers). Given the multi-level nature of implementation studies, analyzing data at the item and partner levels may reveal important patterns in terms of conceptual themes, informant discrepancies, targeted revision areas, and measurement feasibility barriers. These patterns can inform subsequent refinements to the measure and measure administration protocol, enhancing their usability and scalability in real-world contexts.
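To make these additions concrete, the sketch below shows one way a semi-structured protocol could be organized as structured data so that interviewers apply consistent probes across partner groups. This is a minimal, hypothetical Python illustration: the stage names follow the four-stage model described above, but the specific probes, partner groups, definitions, and the helper function probes_for are our own assumptions rather than a published protocol.

```python
# Minimal sketch of a semi-structured cognitive interview protocol represented
# as data. Stage names follow the four-stage model; the probes, partner groups,
# definitions, and helper name (probes_for) are illustrative assumptions rather
# than a published protocol.

PROTOCOL = {
    "definitions": {
        # Shared definitions read aloud to establish a common language.
        "evidence-based practice": (
            "An intervention supported by research evidence; partners are first "
            "asked to describe the term in their own words."
        ),
    },
    "stages": {
        "comprehension": ["What do you think this question is asking?"],
        "memory_retrieval": ["How did you remember your answer?"],
        "judgement": ["How did you decide between the response options?"],
        "response": ["Was there an answer you wanted to give but could not?"],
    },
    "partner_probes": {
        # Probes tailored to specific partner groups (multi-level perspectives).
        "provider": ["What factors may affect how a patient would answer this question?"],
        "supervisor": ["Do you think clinicians would need additional training to administer this question?"],
    },
}


def probes_for(stage: str, partner_group: str) -> list[str]:
    """Return standardized probes for a stage plus any group-specific probes."""
    return PROTOCOL["stages"][stage] + PROTOCOL["partner_probes"].get(partner_group, [])


# Example: probes an interviewer would use with a clinical supervisor
# at the comprehension stage.
for probe in probes_for("comprehension", "supervisor"):
    print(probe)
```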

Cognitive interviewing case example in ongoing implementation science project

Our team is currently employing cognitive interviewing to develop a pragmatic measurement-based care (MBC) tool. MBC is an evidence-based practice that involves the systematic administration, review, and discussion of patient assessment data to inform treatment decisions [19, 33]. Few measures exist to assess patient progress in opioid use disorder treatment [24]. To address this need, the Director of the National Institute on Drug Abuse (NIDA) put forth a call to develop pragmatic measures of opioid use disorder symptoms and overdose risk. In response to this call, the NIDA-funded Measurement-Based Care to Opioid Treatment Programs (MBC2OTP) Project (K23DA050729) aims to develop a pragmatic overdose risk measure and measure administration protocol [32]. A preliminary 22-item measure was drafted by members of our study team based on published recommendations from the NIDA Director and colleagues and the DSM-5 diagnostic criteria for opioid use disorder [24]. Cognitive interviews are being used to collect partner feedback on measure content (symptoms, impairment, frequency of opioid use), format (open-ended questions versus multiple choice, preferred length, scoring), and administration procedures to inform implementation in community opioid treatment programs (OTPs).

Multi-level partners are being recruited via email for cognitive interviews in two rounds. In the first round, relevant partners include program leaders who would decide whether to introduce the measure at an opioid treatment program, clinical supervisors who would oversee the training and supervision of counselors in measure administration, and front-line counselors who would deliver the measure to a patient. The second round of interviews focuses on patients who would complete the measure in treatment. Eligibility requirements include English fluency and, for staff, employment at the opioid treatment program for at least 3 months. No other exclusion criteria are used; criteria are purposefully minimal to capture a diverse range of partner perspectives.

During the interview, three female researchers trained in cognitive interviewing present partners with the measure draft and ask them to answer each question aloud. We then apply the four-stage cognitive model to assess participant comprehension, memory retrieval, judgement, and response. First, in the comprehension stage, we assess whether partners understand the question and all of the embedded constructs. For instance, our draft tool contains the item, "What typical dose of opioids do you take?" Ensuring comprehension requires us to assess whether a patient understands what opioids are and whether they are aware of their average level of opioid use.

Next, we assess the partner's ability to recall an answer by drawing on information in memory. For example, we assess whether a patient's response to the question about typical opioid use may differ depending on whether they are experiencing withdrawal symptoms, and whether they would value examples of opioids in the item wording.

Third, we ask the partner to think aloud and describe how they are answering the question, so that we can assess how they form a judgement [40]. We also assess whether item complexity (e.g., length, vocabulary) seems appropriate or whether the item can be simplified to facilitate more accurate self-reports [18]. In the example provided, we ask whether participants might prefer a different time anchor or simpler wording of the question ("Over the past month, did you use more opioids than usual?").

In the final stage, we ask the partner to communicate their final response to the question to the interviewer [17, 40]. In our cognitive interviews, after a partner provides a response to one of the MBC items, we elicit their feedback on how the question is presented using verbal probes, which are outlined in a semi-structured protocol [10, 40]. We use both general probes (Does this question make sense?) and item-specific probes (What do you think the term "dose" means?) that are applied flexibly in response to the partner (You hesitated to answer, can you tell me why?). Importantly, our cognitive interview protocol uses supplemental open-ended questions to collect feedback on the ideal measure administration procedures to facilitate implementation of the protocol into the organizational workflow. Specifically, we elicit feedback on assessment frequency (how often the measure should be administered), administration context (group versus individual counseling; in-person versus telehealth sessions), and preferred administration method (electronic health record versus tablet versus pen and paper). Additionally, as an extension of typical cognitive interviewing, partners are asked to reflect on the types of implementation supports likely needed. Table 2 presents the four stages of cognitive interviewing currently being applied in the MBC2OTP study. Additional file 1 presents the full cognitive interview script used in the MBC2OTP study.

Table 2 Cognitive interviewing applied to the development of a pragmatic measure and administration protocol: The MBC2OTP case example (K23DA050729)
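For teams that wish to capture partner feedback of this kind in a form that supports item-level and partner-level analysis, each interview can be stored as a structured record. The following Python sketch is hypothetical and uses field names and example values of our own choosing; it is not the MBC2OTP instrument or interview script, which is provided in Additional file 1.

```python
# Hypothetical sketch of how a single cognitive interview could be recorded so
# that item-level feedback and administration preferences can later be compared
# across partner groups. Field names and example values are assumptions for
# illustration; the actual MBC2OTP interview script appears in Additional file 1.
from dataclasses import dataclass, field


@dataclass
class ItemFeedback:
    item_id: str                  # e.g., "item_05"
    stage: str                    # comprehension, memory_retrieval, judgement, or response
    issue_code: str               # e.g., "unclear_term", "response_options_too_similar"
    suggested_revision: str = ""  # partner's suggested rewording, if any


@dataclass
class InterviewRecord:
    partner_group: str            # program leader, clinical supervisor, counselor, or patient
    site: str                     # opioid treatment program identifier
    item_feedback: list[ItemFeedback] = field(default_factory=list)
    # Supplemental administration-preference questions.
    assessment_frequency: str = ""     # e.g., "before each counseling session"
    administration_context: str = ""   # e.g., "individual, in-person"
    administration_method: str = ""    # e.g., "pen and paper"
    implementation_supports: list[str] = field(default_factory=list)


# Example record for a counselor interview (values are illustrative only).
record = InterviewRecord(partner_group="counselor", site="OTP_A")
record.item_feedback.append(
    ItemFeedback(
        item_id="item_05",
        stage="comprehension",
        issue_code="unclear_term",
        suggested_revision="List example opioids in the item wording.",
    )
)
record.administration_method = "pen and paper"
```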

One-on-one partner interviews are currently being conducted via videoconference, audio-recorded, and transcribed. Transcripts are being analyzed in NVivo by three independent coders (ZPS, HR, and KS) to thematically identify areas for revision. Using a reflexive team analysis approach [15], the study team meets weekly to establish consensus and resolve coding discrepancies. Reflexivity in qualitative analysis refers to the process by which researchers identify and reflect on the impact their own assumptions and biases may have on the data being collected and analyzed in a study. The reflexive team analysis approach was selected to enable the coding team to iteratively reflect on their roles as researchers who are unfamiliar with the OTP context, as well as on how this outsider role may affect data collection, analysis, and interpretation.

Suggested revisions are being analyzed by item and partner background. Cognitive interviews will continue until a representative sample is obtained from each participating OTP, defined as interview completion with all eligible partners who consent at each site. Data from these initial interviews will inform iterative development of the pragmatic MBC measure and measure administration protocol. Following interview completion, discrepancies and conflicting views across different partner groups (e.g., leaders and patients) will be resolved via collaborative co-design meetings with representatives from each OTP and the research team. Results from the qualitative data analysis will be presented to OTP representatives, and consensus discussions will be held to make final decisions about conflicting feedback on each measure item.
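As a complement to this consensus process, coded suggestions can be tallied by item and partner group to flag candidates for the co-design discussions. The sketch below is a minimal Python illustration with hypothetical item identifiers and issue codes; in the study itself, coding and consensus occur in NVivo through the reflexive team approach described above rather than through a script.

```python
# Minimal sketch of an item-by-partner-group tally of coded feedback, used to
# flag items where partner groups diverge. Item identifiers, issue codes, and
# counts are hypothetical; in the MBC2OTP study, coding and consensus occur in
# NVivo through the reflexive team approach, not through a script.
from collections import Counter, defaultdict

# (item_id, partner_group, issue_code) triples produced by qualitative coding.
coded_feedback = [
    ("item_03", "counselor", "unclear_term"),
    ("item_03", "patient", "unclear_term"),
    ("item_07", "leader", "remove_item"),
    ("item_07", "patient", "keep_item"),
    ("item_07", "counselor", "remove_item"),
]

# Tally issue codes for each item, broken out by partner group.
tallies = defaultdict(lambda: defaultdict(Counter))
for item_id, group, code in coded_feedback:
    tallies[item_id][group][code] += 1

# Flag items whose partner groups do not share a most common code, marking
# them for discussion at the collaborative co-design meetings.
for item_id, by_group in tallies.items():
    top_codes = {counts.most_common(1)[0][0] for counts in by_group.values()}
    if len(top_codes) > 1:
        print(f"{item_id}: conflicting feedback across partner groups -> {sorted(top_codes)}")
```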

To date, we have conducted 13 first-round cognitive interviews, each lasting 30 to 60 min, with participants from three opioid treatment programs (n = 6 opioid program leaders; n = 3 clinical supervisors; n = 4 front-line counselors). Data collection is ongoing, and an additional five opioid treatment programs will be recruited to participate in the MBC2OTP study. Table 3 presents illustrative data gathered from the multi-level partners thus far to highlight how cognitive interviewing can be used to elicit feedback on potential measure refinements as well as on the administration workflow.

Table 3 Illustrative multi-level partner feedback to inform revisions to a pragmatic measure and measure administration protocol for community opioid treatment programs

The interviews have identified specific items, instructions, and response options that may require modification to enhance clarity. Specifically, partners have suggested shortening items with confusing clinical wording to improve readability, rephrasing instructions using simpler language, and including a mix of open-ended and multiple-choice response options. Additionally, interviews have identified questions that can likely be removed due to limited perceived utility, conceptual overlap with other items, or poor fit with counseling procedures at opioid treatment programs. Perhaps most valuably, the interviews conducted thus far have elucidated partner preferences regarding ideal measure administration procedures. Specific administration advice elicited by the interviews has included: administration of the measure prior to individual or group counseling sessions, review of the measure at the start of a clinical encounter to guide service provision, and use of paper and pencil to facilitate administration offline or in group contexts. The interviews have also provided encouraging preliminary data suggesting the measure is viewed as low enough in burden to be pragmatic within the standard opioid treatment program workflow. Final decisions about which items to eliminate, add, or modify, as well as how to administer the measure in the usual opioid treatment program workflow, will be made once data collection is complete to ensure responsiveness to the full range of partner feedback.

Reflections on use of cognitive interviewing

Methods to develop pragmatic measures are critical to advance implementation science [23]. As the field evolves, ensuring that partners share a common understanding of implementation constructs is essential to further the study of implementation strategies and outcomes [38]. Although cognitive interviews can be time and labor intensive, involving partners in measure development incorporates the perspectives of end-users, which can increase a measure's relevance, increase the buy-in of front-line staff and administrators, and optimize a measure's fit within a specific organizational context. Additionally, when partners hold conflicting views on measure quality and fit, cognitive interviews allow researchers to capture these discrepant viewpoints qualitatively. Increased buy-in may, in turn, result in measures that are more pragmatic, easily implemented, and sustained in community-based settings.

Cognitive interviewing can facilitate a shared understanding of implementation constructs between partners and measure developers, which, with time, can reduce the field's reliance on homegrown implementation measures developed for single use. We assert that using cognitive interviewing to engage partners is complementary to psychometric testing because it increases measure utility and, thus, urge implementation researchers to routinely adopt this method. We believe that cognitive interviewing has the potential to improve the rigor of implementation measures and facilitate a greater common language for the field.

Measurement concerns in implementation science are among the most significant barriers to advancing the field. There is an immense need for pragmatic and psychometrically sound measures, but there remains limited guidance on methods to develop them. We hope that the overview of the four-stage approach to cognitive interviewing provided in this manuscript, along with a case example of how these stages are actively being applied in an ongoing implementation study, can help to advance the development of pragmatic measures and address measurement issues in the field.