Introduction

Social, emotional, and behavioral (SEB) problems are prevalent among youth and have been exacerbated by the COVID-19 pandemic (Sun et al., 2022). Universal evidence-based SEB interventions can effectively improve a broad array of outcomes integral to students’ success in school (Cipriano et al., 2023). There is an increased need for schools to implement population-based practices to prevent SEB struggles that result in negative school and life outcomes (Greenberg & Abenavoli, 2017). Despite their ubiquity (Schwartz et al., 2022), inconsistent implementation has resulted in many SEB interventions—even the most efficacious programs—failing to deliver their intended impact (Dowling & Barry, 2020; Durlak & DuPre, 2008). Common problems in school evidence-based practice (EBP) implementation that attenuate the benefits students might otherwise experience (McLeod et al., 2023) include low fidelity and poor sustainment—the maintenance of established contextual structures, processes, and supports for EBP delivery (Moullin et al., 2019). In real-world settings, EBP implementation is frequently suboptimal, omitting active ingredients (e.g., reviewing student responsiveness data, holding coaching sessions) or including reactive adaptations that may detract from core components (Dusenbury et al., 2005; Stirman et al., 2013). As a result, there have been increasing calls for school mental health research to directly address implementation issues and develop effective solutions (Lyon & Bruns, 2019a; Sanetti & Collier-Meek, 2019).

Organizational Determinants

Implementation determinants are the contextual factors that act as barriers to or facilitators of high-quality EBP implementation (Nilsen & Bernhardsson, 2019). Across contexts, such determinants have been compiled and categorized by level of influence (e.g., individual, organizational; Aarons et al., 2011; Damschroder et al., 2009, 2020), with organizational factors emerging as particularly important for successful EBP implementation and sustainment (Aarons et al., 2016; Beidas et al., 2014; Bonham et al., 2014; Hunter et al., 2017; Novins et al., 2013). Among organizational factors, strategic forms of leadership and climate that target a specific goal or initiative, such as improved delivery of Tier 1 services (Schneider et al., 2005), are particularly important to consider. Implementation leadership and climate, more specified versions of broader constructs in schools (Lyon et al., 2022; Thayer et al., 2022), strategically focus on supporting effective EBP implementation (Aarons et al., 2014; Ehrhart et al., 2014), which is frequently a goal among educators. Implementation leadership is a set of influential behaviors that facilitate a strategic approach to EBP implementation (Castiglione, 2020), while implementation climate reflects employees’ shared perceptions of their school’s prioritization and direct support of EBP use (Thayer et al., 2022).

School leaders can influence EBP implementation by allocating resources, communicating the importance of and expectations surrounding EBP implementation, and recognizing and rewarding EBP use, among other behaviors (Lyon et al., 2022). Leaders who routinely engage in these types of behaviors create the conditions that give rise to a favorable implementation climate (Aarons et al., 2017; Williams et al., 2022). Implementation leadership and climate have been shown to enhance educators’ attitudes toward EBP (Zhang et al., 2023), perceptions of feasibility (Corbin et al., 2022; Proctor et al., 2011), and EBP use (Williams et al., 2022), making them promising leverage points for facilitating the successful implementation of EBP in schools.

One way to enhance EBP implementation is to use implementation strategies, which are methods and techniques that facilitate the adoption, high-fidelity use, and sustainment of EBP (Proctor et al., 2013). Implementation strategies exert their influence by targeting multilevel mechanisms (e.g., individual, contextual) known to modify implementation processes and outcomes (e.g., acceptability, reach, fidelity; Lewis et al., 2018). An abundance of implementation strategies and compilations exists, such as the Expert Recommendations for Implementing Change (ERIC; Powell et al., 2015) and, for the education sector, School Implementation Strategies, Translating ERIC Resources (SISTER; Cook et al., 2019; Lyon et al., 2019a). For implementation strategies to be effective, they must address contextually specific barriers to high-quality EBP implementation, which often requires tailoring the strategy itself (Powell et al., 2017). Tailoring implementation strategies typically involves an assessment of determinants likely to influence EBP implementation, including EBP features such as usability (Lu et al., 2022), individual implementer attitudes (Aarons, 2004; Merle et al., 2022), and supportive leadership in the service setting (Aarons et al., 2017). The effectiveness of an implementation strategy is inextricably linked to implementation determinants, meaning it is critical to accurately identify which determinants are being targeted and at which level they reside (Aarons et al., 2011).

To target organizational determinants within school contexts, inner setting implementation strategies can be used to address gaps in EBP implementation. The inner setting refers to the immediate organizational context in which implementation occurs (Damschroder et al., 2009, 2020) and includes structural, political, and cultural features of that context (Pettigrew et al., 2001). In schools, the inner setting tends to be defined as building-level leadership, which includes principals, teacher leaders, and administrative distributed leadership teams (Lyon et al., 2018a, 2018b). Implementation strategies targeting these inner setting actors, who have school-wide influence, can impact EBP implementation through the establishment of a positive implementation climate in their schools. One such organizationally focused implementation strategy is Leadership and Organizational Change for Implementation (LOCI; Aarons et al., 2015, 2017), designed to improve implementation leadership and climate to support EBP implementation (Skar et al., 2022). Studies have demonstrated strong evidence for the feasibility, acceptability, and utility of LOCI in mental health and substance use treatment settings based on leaders’ quantitative and qualitative feedback (Aarons et al., 2015) and on improvements in implementation leadership and climate at a four-month follow-up after leader training (Aarons et al., 2017).

Implementation Strategies

Prior work has applied LOCI in schools through the Translating Evidence-based Interventions (EBI) for ASD: Multi-Level Implementation Strategy (TEAMS; Brookman-Frazee & Stahmer, 2018) model, which provided training and support to providers implementing an autism-specific intervention. However, this adaptation was not systematically or iteratively designed for use in schools, a potential limitation given that the educational sector is a unique setting with particular nomenclature and contextual constraints (Cook et al., 2019). To date, leadership-oriented implementation strategies for schools remain quite limited beyond the TEAMS adaptation. Adaptation is critical to improving a strategy’s appropriateness—the perceived fit, relevance, or compatibility of an EBP (or strategy) for the user and setting (Proctor et al., 2011)—or contextual fit (Proctor et al., 2013). Even following a strategy’s initial adaptation, iterative redesign is often needed to ensure multicomponent implementation strategies are tailored to the specific context. In an initial adaptation process, LOCI underwent two redesign phases to create Helping Educational Leaders Mobilize Evidence (HELM), which was developed to support elementary schools in increasing their adoption and delivery of Tier 1 EBP (Institute of Education Sciences, 2020; Locke et al., under review). The first phase included focus groups of district administrators, principals, and teachers who identified necessary modifications to LOCI’s content and delivery. The second phase convened a National Expert Summit with 15 research and practice experts who participated in a nominal group process, which found most recommendations to be actionable, impactful/effective, and feasible, and a hackathon, which revealed two novel ideas and areas of alignment with LOCI components. As a result, the HELM strategy comprised seven components: (1) 360° Assessment and Feedback (leadership and climate surveys administered to principals and their staff, with 360° Feedback Reports shared with the principals), (2) Leadership Development Plan (a document, generated using 360° Feedback Report data, outlining principals’ goals for improving their implementation leadership and their school’s implementation climate), (3) HELM Training (two days during which principals interactively learned how implementation leadership and climate impact their schools’ EBP implementation), (4) Organizational Strategy Development (meetings with district administrators and principals to develop and refine a Climate Development Plan to support building-level EBP implementation), (5) Individual Coaching (monthly one-on-one meetings during which principals discuss their schools’ EBP implementation progress and problem-solve), (6) Group Coaching (optional monthly meetings during which principals and leadership teams review their progress and share strategies across schools for idea generation and implementation support), and (7) Graduation (an event to review final 360° Feedback Reports and celebrate progress over the past year). This paper summarizes the third iterative study, which further developed the HELM strategy through user testing.

Strategy Design and Usability

Although multifaceted and multilevel strategies are common (Aarons et al., 2017; Glisson & Schoenwald, 2005; Kilbourne et al., 2007), implementation strategies vary in complexity (Proctor et al., 2013), and many are bulky, expensive, and/or not usable by implementation practitioners or other invested parties, interfering with their widespread application to improving EBP use in schools. Implementation strategy development and adaptation are enhanced by methods that provide insight into the challenges users experience when interacting with strategies and that identify avenues for redesign. Human-centered design is an approach that aligns product development with the needs of end users and the settings in which products will be used (International Standards Organization, 2019). Usability, or the extent to which a product can be used by specified individuals to achieve specified goals in a specified context (International Standards Organization, 2019), is a key outcome of a successful design process and is needed to ensure that a product aligns with user needs, resources, expectations, and contextual constraints (International Standards Organization, 2018; Lyon et al., 2021a). Usability is a critical factor driving the adoption and delivery of new innovations such as implementation strategies (Dopp et al., 2019; Eisman et al., 2020; Lyon & Bruns, 2019b). Human-centered design methods are frequently used to identify and evaluate usability issues, or “aspects of an intervention and/or demand on end users which makes it unpleasant, inefficient, onerous, or impossible for them to achieve their goal in typical usage situations” (Lavery et al., 1997). By conducting usability testing, strategy developers can evaluate feasibility and fit with end users’ needs, yielding information that facilitates the selection or tailoring of strategies (Powell et al., 2017; Wensing et al., 2009).

Cognitive walkthroughs are a low-cost assessment method commonly used in usability evaluations (Lyon et al., 2021a) that simulate the thought processes of a user by having them respond to questions that surface their thoughts, perceptions, and anticipated behaviors in the context of specific scenarios and tasks (Mahatody et al., 2010). Several variants exist (Bligard & Osvalder, 2013) and can be conducted individually or in group settings (Gutwin & Greenberg, 2000). The Cognitive Walkthrough for Implementation Strategies (CWIS; pronounced “swiss;” Lyon et al., 2021a) method is an implementation-specific operationalization of the generic cognitive walkthrough concept. CWIS is a pragmatic, generalizable (across implementation strategies, settings, system levels, users), and streamlined method to evaluate complex, socially mediated implementation strategies. CWIS’s six steps are subsequently described in the Methods as they relate to this study’s application of the method.

Current Study

The current study describes a rigorous usability test, conducted via the CWIS method, as a component of the iterative development of the Helping Educational Leaders Mobilize Evidence (HELM) implementation strategy. HELM addresses the need for leadership-oriented school improvement strategies, none of which have targeted implementation leadership and implementation climate, the factors most proximal to EBP implementation (Aarons et al., 2015). This study sought to evaluate (1) the extent to which users understand HELM tasks presented to them, (2) which HELM tasks are likely to be most problematic/difficult for users to complete, (3) which HELM usability issues different user groups encounter, and (4) which HELM components can be simplified/streamlined to maximize usability.

Method

Participants

Principals were the targeted population for recruitment because they are a co-primary user of HELM and meet the principalship experience criterion required of coaches (see Step 1: Determine Necessary Strategy Preconditions). Recruitment for the group testing phase of the CWIS process, intentionally balancing the inclusion of early career and seasoned individuals, was conducted by an educational professional with principalship experience. Eligibility criteria included a minimum one-year tenure in a principalship role, a current principalship at a public elementary school, and access to Zoom. A total of 21 current elementary school (K-5) principals were recruited across one Pacific Northwestern state. Of the 19 participants who were scheduled to participate, the final sample included 15 principals across nine districts (four participants became unavailable during their scheduled group testing session and were unable to reschedule). See Table 1 for sample demographics.

Table 1 Participant demographics

Procedure

Step 1: Determine Necessary Strategy Preconditions

Following an Institutional Review Board (IRB) determination of exempt status, the research team began the first step of the CWIS process (Lyon et al., 2021a): identifying the preconditions necessary for the HELM strategy to be effective. Although the strategy’s target users are principals and coaches (see Step 3: Task Prioritization Ratings), CWIS begins with the identification of preconditions for each of the strategy’s user groups (i.e., principals, teachers, district administrators, and coaches). Three Principal Investigators (PIs), all developers of the HELM strategy with expertise in the implementation of Tier 1 programs in education settings, along with a co-developer of LOCI, collectively determined preconditions for the user groups. Principals serve as the recipients of and participants in the HELM strategy; their precondition was a minimum of one year of tenure in a principalship role (including assistant principal). Teachers complete the 360° Assessments, which provide critical data needed to generate 360° Feedback Reports for progress monitoring; therefore, willingness to complete 360° Assessments was a precondition. School-level preconditions included the desire and capacity of teachers/school staff, principals, and administrators to implement a new Tier 1 EBP in their building and access to resources that support EBP (e.g., training or education, funding). Preconditions for district administrators included their commitment to schools implementing a new Tier 1 EBP and their individual capacity to participate in HELM activities as the district develops its own organizational implementation strategy plan to align school and district efforts. Lastly, coaches play a critical role in the implementation of HELM components by directly supporting principals in their schools’ implementation of Tier 1 EBPs. Their preconditions included previous principalship experience, previous experience implementing Tier 1 EBPs in schools, understanding of and commitment to the HELM strategy, and the desire and capacity to attend HELM activities and conduct coaching sessions with principals.

Step 2: Hierarchical Task Analysis

Next, the research team conducted a hierarchical task analysis by identifying all tasks and subtasks that comprised HELM’s components for each of its identified user groups. Individual and Group Coaching were combined into one component for analysis because their tasks were the same. Graduation was de-prioritized because it is the one component that takes place post-implementation and thus is not expected to directly impact SEB intervention implementation. Some tasks were user-specific; for example, HELM coaches adjust the focus of coaching sessions based on 360° Feedback Report data and/or principal needs. The development of an initial list began the iterative process of articulating, reviewing, and revising tasks (e.g., adding details, combining, parsing out, determining user applicability). The finalized hierarchical tasks were divided into a school/district task list with 48 unique tasks across three end user types (i.e., principals, teachers, district administrators) and a coaching task list with 21 unique tasks across one end user type (i.e., coaches). See Figure 1 in the Online Resource for tasks.

Step 3: Task Prioritization Ratings

Further synthesis was needed to prioritize tasks for testing. The research team prioritized tasks to be completed by HELM principals and coaches, as these user groups were determined to be the co-primary users of HELM who were anticipated to regularly interact with HELM components (teachers and district administrators were identified as secondary users and de-prioritized; Lyon et al., 2021a). Prioritizing the co-primary users of the HELM strategy in turn determined which populations were targeted for recruitment. Following the process described by Lyon et al. (2021a), the research team independently reviewed and rated the school/district and coaching task lists on (1) the anticipated likelihood that participants might experience an issue or commit an error when completing a task and (2) the importance of completing a task correctly. Both items were rated separately on a Likert-type scale ranging from “1” (unlikely to make an error/unimportant to complete correctly) to “5” (extremely likely to make an error/extremely important to complete correctly). Mean ratings across raters were calculated for likelihood of issue/error and for completion importance (see Table 2). After consulting with a former principal with coaching experience for appropriateness, the highest rated tasks were selected and prioritized for user testing based on completion importance followed by likelihood of issue/error. While the CWIS method articulates no task selection cutoffs, raters made the final task selections based on the resources (e.g., time, incentives) available for user-testing sessions. The research team considered the feasibility of sufficiently testing coaching tasks and de-prioritized 18 of the 21 tasks as a result. Of the initial school/district and coach task lists, a total of 15 tasks (principal, n = 12; coach, n = 3) were chosen for conversion to scenarios and subtasks, displayed in Table 2.
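To make the prioritization arithmetic concrete, the following minimal sketch (illustrative only, not study code; the task names and rater scores are hypothetical) averages independent rater scores per task and orders tasks by mean completion importance and then by mean likelihood of issue/error, mirroring the ordering described above.

```python
# Illustrative sketch: ranking candidate tasks for user testing from
# independent rater scores on two 1-5 dimensions (likelihood of issue/error
# and importance of correct completion). Task names and scores are hypothetical.
from statistics import mean

ratings = {
    "review_360_feedback": {"error_likelihood": [4, 5, 4], "importance": [5, 5, 4]},
    "draft_development_plan": {"error_likelihood": [3, 4, 4], "importance": [5, 4, 5]},
    "schedule_coaching": {"error_likelihood": [2, 2, 3], "importance": [3, 3, 4]},
}

# Average each dimension across raters.
summary = {
    task: {dim: mean(scores) for dim, scores in dims.items()}
    for task, dims in ratings.items()
}

# Prioritize by completion importance first, then by likelihood of issue/error.
prioritized = sorted(
    summary.items(),
    key=lambda kv: (kv[1]["importance"], kv[1]["error_likelihood"]),
    reverse=True,
)

for task, score in prioritized:
    print(f"{task}: importance={score['importance']:.2f}, "
          f"error_likelihood={score['error_likelihood']:.2f}")
```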

Table 2 Prioritization of principal and coach tasks

Step 4: Top Tasks Converted to Scenarios and Subtasks

Based on the tasks selected in the previous step, the research team developed a series of scenarios and subtasks for the principal and coach user types. Overarching scenarios were crafted from task themes to provide background context on settings and activities and to illustrate how, when, where, and with whom the subtasks were to occur. Subtasks within scenarios were then created around related prioritized tasks. Revisions were made through an iterative process to ensure that scenarios were independent of one another; that subtasks were discrete and were achieved through the expansion, combination, and operationalization of prioritized tasks; and that subtasks aligned with the principalship role, the latter verified through consultation with and substantial edits from a former principal with coaching experience. Notably, some subtasks were not part of the prioritization process and were crafted to maintain necessary sequencing, take into account critical considerations of the scenarios, and align with the ability of average users (Hutchins et al., 1985). The principal user type consisted of four scenarios and 13 subtasks, and the coach user type consisted of three scenarios and eight subtasks (see Figure 2 in the Online Resource). Because implementation science concepts may be unfamiliar to participants and the volume of materials unique to the HELM strategy is substantial, the materials most necessary to complete the subtasks were identified and adapted or developed.

Step 5: Group Testing with Representative Users

Group testing was conducted across three virtual sessions in which participants (4–7 per session) assumed the HELM strategy’s principal (two groups; n = 10) or coach (one group; n = 5) user type. During the recruitment phase, interested individuals completed an electronic demographics questionnaire to confirm their eligibility. Participants were assigned to a two-hour session with an associated user type based on their availability. One week prior to their session, participants received digital materials associated with their user type’s scenarios and subtasks to review, along with a disclosure form to complete (the requirement for informed consent was waived by the University of Washington IRB).

Three research team members attended each session to serve as the facilitator, notetaker, and technology assistant. Following a standard script, the facilitator began each session with an orientation to the project that included an overview of the HELM strategy and feedback expectations. The cognitive walkthrough process consisted of the presentation of a scenario and its subtasks, quantitative ratings and qualitative rationales from participants, and open-ended discussion among participants. For each subtask, a written description was presented on a PowerPoint slide along with an image that depicted the subtask’s intent. Participants were then given time to reflect on the subtask and asked to rate their anticipated likelihood of success in (1) knowing what to do (“discovering that the correct action is an option”), (2) doing it (“performing the correct action or response”), and (3) knowing they did it successfully (“receiving sufficient feedback to understand that they have performed the correct action”) on a scale of “1” (No, a very small chance of success) to “4” (Yes, a very good chance of success; Lyon et al., 2021a). A unique link to the previously shared digital materials necessary to complete the subtask was provided each time. The facilitator called upon participants, who were asked to share their ratings of how successful they would be in completing the presented task in terms of “knowing what to do,” “doing it,” and “knowing you did it successfully,” along with their rationale for each rating. Following the recommendation of Lyon et al. (2021a), the notetaker documented each participant’s three quantitative ratings and qualitative rationales in real time. The facilitator reoriented participants to the purpose of CWIS and/or the specified task or subtask as misunderstandings arose (e.g., a participant began to give feedback about the evidence-based practice instead of the HELM component). After each participant provided their ratings and justifications, the facilitator encouraged open discussion about potential barriers and facilitators for the subtask. Following the completion of all scenarios and subtasks, the facilitator presented three open-ended questions to the group (i.e., overall impression of the HELM strategy; comparison of the HELM strategy to other implementation strategies or supports that promote adoption of new Tier 1 EBPs; anything else to share about the HELM strategy or the cognitive walkthrough experience) designed to capture additional comments about potential usability issues. Lastly, participants completed a modified version of the System Usability Scale (Brooke, 1996)—the Implementation Strategy Usability Scale (Lyon et al., 2021a)—and received a $300 gift card for their participation.

Measures

The 10-item Implementation Strategy Usability Scale—modified from the System Usability Scale (Brooke, 1996)—is the default instrument of the CWIS method for assessing and comparing usability across implementation strategies or iterations of the same strategy (Lyon et al., 2021a). The original System Usability Scale was designed to evaluate digital interventions along two subscales (Learnable, Usable) and was modified by replacing the term “system” with “implementation strategy” in each item to fit the context of what was being tested. The Likert-type scale ranged from “0” (strongly disagree) to “4” (strongly agree), with half of the items reverse-scored. The total score was calculated by multiplying the sum of the item scores by 2.5, yielding a range of 0–100. Scores were interpreted using Bangor et al. (2008): scores at or above 70 are considered “acceptable” (with the best products scoring over 90), and scores of 50 or below are considered “unacceptable” and likely to indicate serious usability issues.
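As a concrete illustration of this scoring procedure, the minimal sketch below (not study code) computes a total from ten 0–4 item responses. Which items are reverse-scored is an assumption here, following the original System Usability Scale convention of reverse-scoring the even-numbered items, and the interpretation bands simply restate the Bangor et al. (2008) benchmarks summarized above.

```python
# Minimal sketch of the Implementation Strategy Usability Scale scoring described
# above: ten items rated 0-4, half reverse-scored, and the summed item scores
# multiplied by 2.5 to give a 0-100 total. ASSUMPTION: the even-numbered items
# are the reverse-scored (negatively worded) ones, as in the original SUS.

def isus_total(responses):
    """Return a 0-100 usability score from ten item responses rated 0-4."""
    if len(responses) != 10 or any(not 0 <= r <= 4 for r in responses):
        raise ValueError("Expected ten item responses, each rated 0-4.")
    adjusted = [
        (4 - r) if i % 2 == 1 else r  # reverse-score items 2, 4, 6, 8, 10
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) * 2.5


def interpret(score):
    """Acceptability benchmarks from Bangor et al. (2008) as summarized above."""
    if score >= 70:
        return "acceptable"
    if score <= 50:
        return "unacceptable (likely serious usability issues)"
    return "marginal"


# A respondent who strongly agrees with every positively worded item and
# strongly disagrees with every negatively worded item scores the maximum.
print(isus_total([4, 0, 4, 0, 4, 0, 4, 0, 4, 0]))  # 100.0
print(interpret(77.8))  # "acceptable"
```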

Data Analysis

Step 6: Problem Identification, Prioritization, and Classification

The identification and prioritization of usability issues, two parts of the final CWIS step, were completed by one research team member and subsequently refined via group consensus with the entire study team. A recent study of 13 projects that reported a total of 90 usability issues developed the following format for describing usability issues: When [PRECURSOR], the [COMPONENT] is/has/is experienced as/results in/etc. [PROBLEM] with [CONSEQUENCE] (Munson et al., 2022). This approach to articulating usability issues in complex psychosocial interventions and strategies emphasizes problem descriptions that place responsibility for problems on the implementation strategy itself rather than on the user. One research team member employed this framework to code the qualitative rationales that participants provided to support their quantitative ratings of anticipated likelihood of subtask success. Qualitative rationales across sessions were reviewed by user type, and the descriptive format from Munson et al. (2022) was followed to articulate usability issues. These rationales often mentioned participants’ familiarity with similar tasks in their principalship roles and specific aspects of the presented subtask or materials that were helpful or hindering. The framework provided a template to clearly articulate each usability issue with its supporting evidence and then determine its severity, scope, and complexity. The evidence supporting each usability issue consisted of the qualitative rationales taken from testing session transcripts. The severity rating assigned to each usability issue ranged from 0 to 4 (“0”—catastrophic or dangerous/causes harm/high risk, “1”—prevents completion of task, “2”—creates a significant delay and frustration, “3”—has a minor effect on usability, “4”—subtle problem/points to future enhancement). Scope was determined by the number of users who encountered the issue, and a complexity rating was assigned as low (“solutions are clear and feasible”), medium (“solutions are somewhat unclear”), or high (“solutions are unclear”). The resulting usability issues provided insight into the feasibility and contextual appropriateness of the principal and coach user type subtasks selected from the HELM strategy prototype.
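For readers who wish to operationalize this coding template, the sketch below (illustrative only; the field names and example issue are hypothetical rather than taken from the study data) shows one way to record an issue in the Munson et al. (2022) format together with the severity, scope, and complexity attributes used for prioritization.

```python
# Illustrative record structure for documenting a usability issue in the
# "When [PRECURSOR], the [COMPONENT] ... [PROBLEM] with [CONSEQUENCE]" format,
# plus the severity (0-4), scope, and complexity attributes described above.
from dataclasses import dataclass, field

SEVERITY_LABELS = {
    0: "catastrophic or dangerous/causes harm/high risk",
    1: "prevents completion of task",
    2: "creates a significant delay and frustration",
    3: "has a minor effect on usability",
    4: "subtle problem/points to future enhancement",
}

@dataclass
class UsabilityIssue:
    precursor: str      # "When [PRECURSOR], ..."
    component: str      # "... the [COMPONENT] ..."
    problem: str        # "... is/has/results in [PROBLEM] ..."
    consequence: str    # "... with [CONSEQUENCE]."
    severity: int       # 0-4, see SEVERITY_LABELS
    scope: int          # number of users who encountered the issue
    complexity: str     # "low", "medium", or "high"
    evidence: list = field(default_factory=list)  # supporting rationales (quotes)

    def describe(self) -> str:
        return (f"When {self.precursor}, the {self.component} "
                f"{self.problem} with {self.consequence}.")

# Hypothetical example, loosely patterned on usability issue #3 in the Results:
issue = UsabilityIssue(
    precursor="principals first review their 360° Feedback Report",
    component="report",
    problem="presents an overwhelming volume of data",
    consequence="delays in identifying priority areas",
    severity=3,
    scope=4,
    complexity="low",
    evidence=["Participant noted difficulty locating the most relevant sections."],
)
print(issue.describe())
print(f"Severity {issue.severity}: {SEVERITY_LABELS[issue.severity]}")
```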

Classification ratings help determine the overarching reason for the problems a usability issue surfaces (e.g., aspects of the strategy are misaligned with user knowledge and capability; information is hidden). Instead of formal ratings, our research team discussed classifications for each usability issue during a shared review of all usability issues. During this shared review, the research team prioritized for redesign the usability issues with the highest severity and scope scores and low to moderate complexity scores. The goal was to prioritize issues that were likely to have the largest impact on overall usability and were feasible for our team to redesign given available resources (e.g., time, funding). The research team collectively developed redesign solutions based on the usability issue descriptions and supporting evidence.

Average success ratings and factor-specific success ratings were calculated by user type from participants’ anticipated likelihood of success ratings on subtasks. Descriptive statistics were calculated by user type from the Implementation Strategy Usability Scale following the System Usability Scale scoring instructions from Brooke (1996).
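The descriptive summary described here amounts to simple averaging by user type and factor; a minimal sketch is shown below with hypothetical ratings (the actual study data are reported in the Results and in Figure 3 of the Online Resource).

```python
# Illustrative sketch: averaging anticipated-likelihood-of-success ratings (1-4)
# by user type and factor. The ratings below are hypothetical placeholders.
from statistics import mean, stdev

# (user_type, factor) -> ratings pooled across participants and subtasks
ratings = {
    ("principal", "knowing what to do"): [4, 4, 3, 4, 4, 3],
    ("principal", "doing it"): [4, 3, 4, 4, 3, 4],
    ("principal", "knowing you did it successfully"): [4, 4, 4, 3, 4, 3],
    ("coach", "knowing what to do"): [4, 4, 3, 4],
    ("coach", "doing it"): [4, 4, 4, 4],
    ("coach", "knowing you did it successfully"): [3, 4, 4, 3],
}

# Factor-specific means and SDs by user type
for (user_type, factor), values in ratings.items():
    print(f"{user_type:9s} | {factor:33s} "
          f"M = {mean(values):.2f}, SD = {stdev(values):.2f}")

# Overall average per user type across all factors and subtasks
for user_type in ("principal", "coach"):
    pooled = [v for (u, _), vals in ratings.items() if u == user_type for v in vals]
    print(f"{user_type}: overall M = {mean(pooled):.2f}")
```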

Results

HELM Task Clarity

Participants’ individual anticipated likelihood of success ratings for each subtask are presented in Figure 3 in the Online Resource, color coded to represent scale ratings (“1”—No, a very small chance of success [red]; “2”—No, probably not [orange]; “3”—Yes, probably [yellow]; “4”—Yes, a very good chance of success [green]) along with the percentage of confidence by factor. Participants who completed the testing session as the principal user type tended to rate their anticipated success of knowing what to do (M = 3.70, SD = 0.5), doing it (M = 3.66, SD = 0.5), and knowing they did it successfully (M = 3.70, SD = 0.5) between “Yes, probably” and “Yes, a very good chance of success,” with an overall average across subtasks and factors of 3.69 (SD = 0.5). The consistent ratings across the three factors did not suggest a difference in confidence between identifying the correct action, performing the action, and receiving sufficient feedback to know the correct action was performed. Principals gave the lowest confidence ratings on the first task of the first scenario (i.e., “Following review of the survey data, describe to the group the importance of the feedback you have received as it relates to your own leadership”), with knowing what to do at 72%, doing it at 72%, and knowing you did it successfully at 73%.

Participants of the coach user type similarly tended to rate their anticipated success of knowing what to do (M = 3.75, SD = 0.4), doing it (M = 3.83, SD = 0.4), and knowing they did it successfully (M = 3.59, SD = 0.5) between “Yes, probably” and “Yes, a very good chance of success,” with an overall average of 3.72 (SD = 0.5). While these factor ratings reflect slightly more variation than those of the principal user type participants, they remain within the 3.5–4.0 range. The lowest confidence ratings were given on the first subtask of the first coach scenario (i.e., “Collaborate to identify mission, responsibilities, and timeline for the distributed leadership team as it relates to EBP implementation”) for knowing what to do (80%) and knowing they did it successfully (81%), along with the second subtask (i.e., “Work with the principal to determine the potential composition of their distributed leadership team”) for knowing they did it successfully (80%).

HELM Usability Ratings

Scoring of the Implementation Strategy Usability Scale provided strategy usability ratings for each user type. On a scale of 0–100, participants’ ratings of the principal user type ranged from 52.2 to 100 with a mean of 77.8 (SD = 15.5; Mdn = 82.5), and participants’ ratings of the coach user type ranged from 80 to 100 with a mean of 87.5 (SD = 7.9; Mdn = 85.0). According to the standards put forth by Bangor et al. (2008), the average principal user type rating (Table 3) fell between “good” (2nd quartile) and “excellent” (3rd quartile), and the average coach user type rating was “excellent” (3rd quartile); both fall into the “acceptable” range.

Table 3 Implementation strategy usability scale scores by user type

HELM Usability Problems

A total of five usability problems were identified, four for the principal user type (#1–#4) and one for the coach user type (#5): (1) communication barriers in online training, (2) inability to track goals and barriers in the provided materials, (3) overwhelming volume of 360° Feedback Report data, (4) principals’ struggle to interpret 360° Feedback Report data, and (5) coaches’ lack of knowledge about a specific school building’s climate impacting their ability to advise. Table 4 includes these issues along with their severity rating, scope count, and complexity level. Severity ratings ranged from 3 to 4 (“has a minor effect on usability” to “subtle problem/points to future enhancement”), and complexity ranged from low to medium (“solutions are clear and feasible” to “solutions are somewhat unclear”).

Table 4 Usability issue prioritization and redesign solutions

HELM Redesign Solutions

Redesign solutions (Table 4) were identified for all usability issues, and four of the five were incorporated into the next iteration of the HELM prototype strategy. These solutions included (1) conducting the HELM Training in an in-person format, (2) co-developing a coaching model with former principals who have coaching experience, (3) including prompts and log space in the HELM Training materials and the Leadership Development Plan for identifying barriers and documenting progress in overcoming them, (4) adding a table of contents to aid navigation of the 360° Feedback Reports, and (5) providing data summaries within 360° Feedback Reports that highlight areas of change (or lack thereof) and possible recommendations. Due to ongoing COVID-19 constraints, redesign solution #1 could not be accommodated. See Figures 4 and 5 in the Online Resource for an example of how a redesign solution was implemented.

Discussion

By centering the expertise of principals who educate the youth of our communities, this study applied an implementation strategy-specific cognitive walkthrough method to evaluate the usability of an organizationally focused implementation strategy prototype and improve its feasibility and usability for end users. The HELM strategy was designed to address gaps in leadership-oriented strategies for school administrators, and results from their participation indicated that the strategy’s usability was “good” to “excellent” for the principal user type and “excellent” for the coach user type. These ratings reflect the work of the first two iterative phases of this study, which adapted LOCI to school settings. While these designations are considered “acceptable,” the threshold for acceptability is 70 and the best products score over 90 (interpreted using Bangor et al., 2008). The five usability issues identified through the cognitive walkthrough sessions describe the difficulties experienced by representative users and provide a starting place for generating solutions to inform the next iteration of HELM for a pilot study. Most usability issues could be addressed, as their severity ratings reflected subtle problems/points to future enhancement and the complexity of potential solutions ranged from “clear and feasible” to “somewhat unclear.”

In preparation for a HELM pilot, these barriers to implementation led the research team to co-design a coaching model with former principals who have coaching experience; include prompts and log space for identifying barriers and progress in overcoming them in the Leadership Development Plan and coaching session notes; and scaffold 360° Feedback Reports via a hyperlinked table of contents along with summaries that highlight areas of change (or lack thereof) and possible recommendations. The systematic incorporation of elementary school principals’ feedback surfaced underdeveloped, unclear, and potentially daunting strategy components and critically informed the iterative development of the HELM strategy, promoting a broadly applicable strategy tailored to that user population. Coaching, consultation, and facilitation processes are common to emerging multifaceted implementation strategies (Lyon et al., 2021b) and are routinely used in schools (Merle et al., 2022). These approaches are themselves complex and require clear and high-quality feedback (e.g., between coaches and principals) to be effective (Lefroy et al., 2015).

Findings from this study suggest that CWIS can be an invaluable method for surfacing, prioritizing, and addressing implementation strategy usability issues prior to active implementation, which is likely to increase the impact of the strategy itself. Although the contexts and leadership responsibilities of principals in elementary and secondary schools are similar, there are key differences that render this study generalizable only to elementary settings. Principals, administrators, and coaches in secondary schools may identify different usability issues than their elementary counterparts. As such, other settings would require HELM to differ in scope and sequence, and its use in secondary settings would need to be tested, perhaps via the methods described in this study.

Limitations

Several limitations impacted this usability evaluation study. Recruitment was limited to one Pacific Northwestern state, and the participant pool was generated by an educational professional with networks throughout the state; including additional districts, and districts in other states, may elicit additional feedback. The research team found that it was not feasible to fully engage principals in the task prioritization ratings when preparing for the user-testing sessions, as doing so would have required training them in HELM or developing an elaborate orientation process. Instead, the research team obtained feedback on the tasks selected for prioritization, as well as the scenarios and subtasks, from a former teacher, principal, district administrator, and coach, which allowed us to develop a more usable and contextually appropriate HELM strategy. Additionally, we could have sought user input on the redesign solutions, which could have enhanced the likelihood of a solution’s utility, but the opportunity to examine them remains possible through a pilot study.

All group testing sessions were conducted via Zoom as a COVID-19 pandemic precaution, which likely increased the availability of participants (i.e., eliminated travel time from different, often more rural regions of the state). Despite the CWIS method’s clear participation protocol for group testing sessions, this virtual environment did not always permit the natural flow of open-ended discussion that the method encourages. Usability testing was also conducted only once during the larger iterative development process used to develop and test the HELM strategy prototype (Locke et al., under review), although the findings indicated acceptable usability and suggested that the prototype could be moved forward to the pilot stage. Finally, the CWIS method relies on participants’ ability to mentally project themselves into the scenarios presented. This requirement could explain the lower confidence ratings on the first subtasks presented in group testing sessions, as principals initially expressed difficulty understanding the cognitive aspect, and it may have posed challenges for participants who struggled to distinguish the instruction to “imagine” from “doing.”

Because CWIS is designed to be a pragmatic assessment method for gathering information from the highest priority user groups, the usability issues and contextual appropriateness of HELM for secondary users are unknown. Prioritization of users is critical to feasibly conducting user testing; in this case, district administrators may be appropriate targets of HELM, whereas teachers are not. Unlike teachers, who are unlikely to be aware of HELM, district administrators directly interface with some components of HELM and would therefore be the next user group to test.

Future Directions

User testing an implementation strategy prior to piloting can help tailor it to its intended end users and implementation context, thus saving valuable resources (e.g., funds for a feasibility study to learn the same thing). CWIS is a resource-efficient way to user test complex implementation strategies and is intended to be implementation strategy agnostic, making it widely applicable for the development and tailoring of various kinds of strategies. Its first application was to a consultation strategy supporting mental health providers’ use of measurement-based care (Lyon et al., 2021a), and ongoing work is applying it to digital tools that help health organizations self-direct the implementation of evidence-based innovations (Barwick et al., 2023). Additional multilevel human-centered design techniques, such as live prototyping, artifact analysis, and heuristic evaluation, are also available (Dopp et al., 2019) and are increasingly being utilized in strategy redesign (Lyon et al., 2019b; Mohr et al., 2017). More broadly, school mental health implementation strategies and interventions can benefit from human-centered design processes such as cognitive walkthroughs, of which many variants exist (Bligard & Osvalder, 2013); these also have relevance to the redesign of client-facing interventions (Lyon et al., 2020), though that is outside the scope of this paper. Broader applications of the CWIS method will facilitate examining its appropriateness for strategies intended for fields outside of education and mental health.

Conclusion

Usability evaluation of complex implementation strategies in schools is essential to building evidence that informs the creation of streamlined and pragmatic approaches to improve EBP efficacy. This is particularly true for organizationally focused strategies, which tend to be elaborate and time-intensive in order to respond to the inherent complexities of the inner setting. Employing human-centered design methodologies allows researchers to proactively examine implementation strategy usability and ensure that strategies meet the needs of end users and their environments. Partnering with school principals allowed their administrative expertise—their knowledge, experience, school buildings, and workflows—to inform the scope and severity of educational contextual constraints, as well as the specific needs for successful implementation and sustainment, that can be addressed in the subsequent iteration of the HELM strategy. This application of the CWIS methodology yielded a more usable HELM strategy and simultaneously offers broadly applicable information about the development of psychosocial implementation strategies in real-world settings.