Introduction

School settings represent an ideal opportunity to provide support to youth at risk for anxiety disorders by offering effective interventions at school, during school hours (Lyon & Bruns, 2019). Subclinical levels of anxiety are common among youth (Balazs et al., 2013), are associated with significant impairment (Angold et al., 1999), and increase the likelihood that youth will develop anxiety disorders (Pine, 2007). Most studies of indicated school-based prevention for youth with anxiety have examined cognitive behavioral therapy (CBT), and these studies have shown small to moderate effects (e.g., Haugland et al., 2020; Stoll et al., 2020; Werner-Seidler et al., 2017). However, barriers to implementing CBT programs in school settings exist. For example, schools may have no or too few mental health professionals, may lack the time and resources required to implement a program, or may be reluctant to invest time in training and consultation (Rasmussen et al., 2019; Weist et al., 2017). Addressing these barriers may help improve the implementation and effectiveness of indicated CBT programs in school settings.

Design of CBT Programs

Many CBT programs are designed to be delivered by mental health professionals (Muggeo et al., 2017; Werner-Seidler et al., 2017). Yet the extent to which mental health professionals are available to deliver CBT programs in schools varies across school districts (Langley et al., 2010). If treatments can be delivered by personnel with less mental health training in schools (e.g., school health nurses), then this may increase the number of individuals available to deliver mental health treatments (Kakuma et al., 2011). However, if a mismatch between the design of CBT programs and the training background of school personnel exists, then it may serve to undermine the effectiveness by impacting the delivery of these programs (Lyon & Koerner, 2016).

CBT programs can be designed to make them easy to deliver for school personnel with minimal prior CBT training (i.e., novice CBT providers; Bennett-Levy et al., 2010). First, program materials can be made simple (e.g., minimize the level of possible tailoring of the interventions to each client), because program protocols with higher complexity may require more clinical experience to deliver with treatment fidelity (Lyon & Koerner, 2016). Second, more structure can be introduced (e.g., more detailed instructions, assigning number of minutes that should be spent on each component in the sessions) to increase support for providers. A high degree of flexibility may create uncertainty in how to deliver a program (Chorpita & Daleiden, 2014). It is possible that increased structure and simple presentation materials for program content may make it easier for novice CBT providers to deliver CBT programs in school settings, but few studies have tested this question.

Treatment Fidelity

Measurement of treatment fidelity can be used to determine if efforts to create simplified and more structured CBT programs are successful (Proctor et al., 2011). Treatment fidelity concerns the extent to which a program is delivered as designed (Schoenwald et al., 2011). In mental health research, the term often includes three components: (a) adherence, the degree to which the core elements of a program are delivered; (b) competence, how skillfully the core elements of a program are delivered; and (c) differentiation, the extent to which non-prescribed program content is avoided (Perepletchikova et al., 2007; Southam-Gerow et al., 2021). In this paper, we focus on two of these components, adherence and competence, because our aim is to characterize the delivery of two interventions.

One way to gauge if efforts to make CBT programs easy to deliver by novice CBT providers are successful is to determine if treatment fidelity is higher in a brief, simple, and structured program than in a standard-length program, which may be more complex and leave more room for flexibility. We tested this in a study in which both programs were delivered by the same providers, training and supervision were held constant across groups, and both programs were effective in reducing anxiety symptoms (Haugland et al., 2020).

Beyond understanding the impact of program design on treatment fidelity, examining if treatment fidelity predicts clinical outcomes in school settings can inform implementation efforts. In the treatment evaluation literature, findings have been mixed regarding the relation between treatment fidelity and clinical outcomes (Collyer et al., 2019; Rapley & Loades, 2018; Webb et al., 2010). Some studies have found that higher adherence (Hogue et al., 2008; Podell et al., 2013) or higher competence (Podell et al., 2013) predict improvements in clinical outcomes. For example, a study of CBT for anxiety in community mental health clinics (hereafter called community clinics) found that higher adherence predicted improved clinical outcomes, and that competence among clinicians with two years of CBT training predicted improved clinical outcomes (Bjaastad et al., 2018). Yet most studies have found no significant prediction by adherence (Hartnett et al., 2016; Heywood & Fergusson, 2016; Liber et al., 2010; Overbeek et al., 2013; Southam-Gerow et al., 2021) or competence (Garner et al., 2012; Hogue et al., 2008; Southam-Gerow et al., 2021) on clinical outcomes. Most studies that have investigated whether fidelity predicts outcome focus on mental health professionals delivering care in university or outpatient mental health settings (see Collyer et al., 2019; Webb et al., 2010). Thus, it is an open question whether these findings will generalize to novice CBT providers delivering CBT programs in school settings (Mellin et al., 2011).

To date, only two studies have looked at whether fidelity predicts clinical outcomes in CBT for anxiety in school-based programs. The first study, with two published articles, indicated that greater adherence in the use of CBT session structure components (e.g., agenda-setting and homework assignment) and higher competence in administering these components predicted better clinical outcomes (i.e., symptom reduction and diagnostic recovery; Ginsburg et al., 2012). In contrast, greater adherence in delivering specific CBT modules (e.g., psychoeducation, exposure) did not predict better clinical outcomes (Becker et al., 2012). The authors hypothesized that the lack of prediction was due to providers not having enough training and supervision to deliver the program with optimal skills (Becker et al., 2012; Ginsburg et al., 2012). The second study found that provider-rated adherence predicted anxiety symptom improvement at 3-month follow-up (Caron et al., 2020). However, both studies were limited by small sample sizes (N = 16 and N = 54), making it difficult to draw conclusions about the relation between treatment fidelity and clinical outcomes.

Aims of the Study

The current study investigated competence and adherence in two school-based CBT programs and how treatment fidelity predicts outcomes. Three research questions were investigated: (a) do adherence and competence differ by CBT program (brief and standard-length), (b) do adherence and competence predict change in anxiety symptoms and impairment at post-intervention and at 1-year follow-up, and (c) do the brief and standard-length CBT programs differ in the prediction of outcome by adherence and competence? Investigating these questions can help determine what programs are most suitable to implement in schools by novice providers and ascertain whether the association between fidelity and clinical outcomes differs across CBT programs with different designs. To investigate these questions, the current study presents secondary analyses from a multisite, school-based randomized controlled trial (RCT) evaluating the effectiveness of two group-based indicated CBT programs for youth with elevated anxiety symptoms compared to a waitlist (WL) control: a 5-session CBT program called Vaag (Raknes et al., 2015) and a 10-session CBT program called Cool Kids (CK; Rapee et al., 2006).

Methods

In the RCT, randomization was determined prior to inclusion, and groups of five to eight youth from each school were randomized to one of three conditions: Vaag (15 groups; n = 91), CK (19 groups; n = 118), or WL (18 groups; n = 104). Youth and parents completed assessments at pre-intervention, post-intervention, and at 1-year follow-up. The study was approved by the Regional Committee for Medical and Health Research Ethics, Western Norway (approval no. 2013/2331). Participants, including youth, parents, and facilitators, provided written informed assent/consent. See Haugland et al. (2017) and Haugland et al. (2020) for details about procedures and outcomes (NCT02279251, clinicaltrials.gov).

The RCT showed that school-based CBT was effective in reducing anxiety symptoms and impairment relative to WL (Haugland et al., 2020). The RCT involved 52 CBT groups (24 Vaag, 28 CK) delivered at school, during school hours. Within-group (pre-post) effect sizes (Cohen’s d) for youth- and parent-reported anxiety ranged from 0.41 to 0.53 in Vaag and from 0.62 to 0.67 in CK. Thus, within-group effect sizes were generally larger for CK than for Vaag. Furthermore, non-inferiority of Vaag relative to CK could not be established. Outcomes were maintained or improved at 1-year follow-up for both interventions (Haugland et al., 2020). Facilitators achieved adequate adherence and competence in both programs. Fidelity was measured by the Competence and Adherence Scale for CBT for Anxiety Disorders in Youth (CAS-CBT; Bjaastad et al., 2016). Adherence and competence scores for each group (mean of 2 rated sessions) ranged from 3.17 to 5.75 (M = 4.41, SD = 0.56) for adherence and 2.75 to 5.88 (M = 4.18, SD = 0.66) for competence.

Participants and Procedure

Youth Participants

The RCT included 313 youth (M age = 14.0 years, SD = 0.8; range 12–15 years, 84% female, 16% male). They were from high (29.7%), medium (62.3%), and low (8.0%) social class families, defined by the Registrar General Social Class coding scheme (Currie et al., 2008). The youth were recruited from 18 schools (17 public, 1 private) in urban and rural areas, from October 2014 to November 2016. Youth were invited to participate in the study if (a) either self- or parent-reported youth anxiety symptoms were ≥ 25 on the Spence Children’s Anxiety Scale (SCAS; Spence, 1998), and (b) interference in daily life was indicated by a score of ≥ 1 on the first question of the Child Anxiety Life Interference Scale (CALIS; self- and/or parent-report; Lyneham et al., 2013). See Table 1 for baseline youth and family characteristics.

Table 1 Youth- and parent-reported baseline youth and family characteristics

Youth participants in the RCT who attended at least one CBT session (N = 295; M age = 14.0, SD = 0.8; range 12–15 years; 85% female, 15% male) were included in the present study. Of these, 252 were defined as treatment completers (attending ≥ 4 sessions of Vaag or ≥ 7 sessions of CK), with the remaining defined as non-completers (n = 43). Retention rates were not different between interventions (p = 0.68; Haugland et al., 2020).

Facilitators

Thirty-two facilitators (M age = 43.2 years, SD = 8.1, range 32–62 years; 93.8% female, 6.2% male) delivered the CBT programs. The facilitators were recruited from school and community services (school nurses n = 21, community psychologists n = 5, family therapist n = 1) or from community clinics (social workers n = 3, psychiatric nurse n = 1, special education teachers n = 2). All delivered the interventions as part of their regular job. Among the facilitators, 83.9% had no prior CBT training, 75.0% had no prior experience working with anxiety, and they had an average of 6.70 years of experience working with youth (SD = 6.74, range 0–27; Haugland et al., 2020). Each facilitator delivered 1–8 groups (M = 3.25; SD = 1.80).

Interventions

Vaag and CK are manualized and include basic CBT components for anxiety (e.g., psychoeducation, cognitive restructuring, exposure). The components are introduced in the same order in both interventions. Skills are taught through goal setting, exposure plans, problem solving, and homework. Participants receive booklets with illustrative descriptions and fill-in tasks to use in and between sessions. These materials differ between interventions in graphic design and terminology. The interventions also differ in the number of sessions, degree of structure, and the level of possible tailoring to each group member. Compared to the standard-length program, the treatment protocol for the brief program has more detailed instructions describing how much time should be used for each exercise and less room for adjustments to fit the individual youth. To reduce systematic bias, all facilitators received the same training, both interventions were delivered at each school involved in the RCT, and the majority of facilitators (75.0%) administered both interventions. To control for treatment differentiation, we used a 3-item measure indicating whether materials from one program were applied in the other (e.g., self-help material in Vaag or realistic thinking schema in CK). There was no overlap in use of materials and concepts between the programs.

Vaag

Vaag (Raknes et al., 2015) is a 5-session CBT group intervention with sessions varying between 45 and 90 min (total 5.5 h). The first four sessions are delivered weekly, with the fifth session delivered five weeks later. Session two is a joint youth-parent session. Vaag was developed for the RCT (Haugland et al., 2017).

Cool Kids

CK (Rapee et al., 2006) is a 10-session CBT intervention, with weekly sessions of 90 min (total 15 h, plus two 90-min parent-only sessions). The adolescent group-based school version was applied (Rapee et al., 2006). CK has previously demonstrated efficacy in the treatment of youth with anxiety disorders (Mychailyszyn, 2017) and as an indicated school-based prevention program (Mifsud & Rapee, 2005).

Training, Supervision, and Treatment Fidelity

Each group was delivered by two facilitators. Each facilitator received a four-day training workshop covering basic CBT principles for anxiety, assessment procedures, and both intervention manuals. Two additional two-day workshops were provided during the study to ensure cross-site consistency and prevent intervention drift. The facilitators received regular supervision from CBT experts (N = 10). Eight supervisors were certified by the national CBT association by completing a CBT training program. Supervision was 3–4.5 h for each Vaag group and 6–10.5 h for each CK group. The difference in number of hours is due to the different number of sessions in the two interventions. Thus, the supervision was held constant relative to program duration. Supervision was delivered according to a plan specifying the duration, structure, and content of supervision, including feedback on video recordings of sessions (Haugland et al., 2020).

Anxiety Measures

Spence Children’s Anxiety Scale

The SCAS-C/P (Spence, 1998) is a 38-item questionnaire assessing youth anxiety symptoms. Items are rated on a 4-point scale (0–3; never to always) yielding a maximum score of 114. The SCAS-C/P has demonstrated sound psychometric properties (Nauta et al., 2004; Spence, 1998), which have been replicated for youth in Scandinavian samples (Arendt et al., 2014; Olofsdotter et al., 2016). Studies using the Norwegian translation have been published (e.g., Fjermestad et al., 2020; Wergeland et al., 2014). Internal consistency in the total sample was good to excellent (SCAS-C, α = 0.91; SCAS-P Mothers, α = 0.89; SCAS-P Fathers, α = 0.87).

Child Anxiety Life Interference Scale

The CALIS-C/P (Lyneham et al., 2013) is a 9-item measure assessing impairment associated with anxiety in school, home, social life, and activities. Items are rated on a 5-point scale (0–4; not at all to a great deal) yielding a maximum score of 36. The CALIS has demonstrated satisfactory psychometric properties (Lyneham et al., 2013), which have been replicated in Scandinavian samples (Johnsen et al., 2019; Kilburn et al., 2019). In addition to the current RCT, one other study using the Norwegian translation has been published (Raknes et al., 2017). Internal consistency for the CALIS in the total sample was adequate to good (Youth α = 0.86; Mothers α = 0.79; Fathers α = 0.84).

Adherence and Competence Measure

Competence and Adherence Scale for CBT

The Competence and Adherence Scale for CBT for Anxiety Disorders in Youth (CAS-CBT; Bjaastad et al., 2016) is an 11-item observational instrument designed to assess adherence and competence in the delivery of manual-based CBT for youth anxiety. The CAS-CBT is based on the structure of the Cognitive Therapy Adherence and Competence Scale (Barber et al., 2003; Liese, Barber, & Beck, 1995), but adapted for CBT delivered to youth. The adherence items include (a) review of homework and presentation of new homework, (b) structure and progress, (c) parental involvement, (d) positive reinforcement, (e) facilitation of collaboration with the youths, (f) facilitation and completion of session goal 1 (e.g., facilitator helping the youths do exposure exercises), and (g) facilitation and completion of session goal 2. Parental involvement was not scored because parents did not attend any session coded for the present study. This resulted in six adherence items. The competence items include (a) skill level in CBT structure, (b) flexibility in adjusting the intervention to the youths, (c) skill level on process and relational skills, and (d) skill level in administering the session goals. The two adherence items and the one competence item focusing on the two main goals of each session were operationalized by consulting the treatment manual. The Adherence subscale was calculated by adding the six item scores and averaging them. Likewise, the Competence subscale was calculated by adding the four item scores and averaging them.
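The subscale scoring described above can be sketched in a few lines of Python. This is only an illustration: the item labels and the example ratings are hypothetical placeholders, not the wording of the CAS-CBT manual or data from the study.

```python
from statistics import mean

# Hypothetical item labels; the CAS-CBT manual defines the actual items.
ADHERENCE_ITEMS = ["homework", "structure", "reinforcement",
                   "collaboration", "goal_1", "goal_2"]
COMPETENCE_ITEMS = ["cbt_structure", "flexibility",
                    "process_relational", "goal_skill"]


def subscale_score(ratings, items):
    """Average the 0-6 ratings for the given items of one coded session."""
    values = [ratings[item] for item in items]
    if any(not 0 <= v <= 6 for v in values):
        raise ValueError("CAS-CBT items are rated on a 0-6 scale")
    return mean(values)


# Made-up ratings for a single coded session, for illustration only.
session = {"homework": 5, "structure": 4, "reinforcement": 5,
           "collaboration": 4, "goal_1": 5, "goal_2": 4,
           "cbt_structure": 4, "flexibility": 3,
           "process_relational": 5, "goal_skill": 4}

adherence = subscale_score(session, ADHERENCE_ITEMS)    # mean of six items
competence = subscale_score(session, COMPETENCE_ITEMS)  # mean of four items
```

A group-level score would then be the mean of the two session-level scores for that group.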

Coders watch entire sessions and rate each item on a 7-point Likert-type extensiveness scale (Hogue et al., 1996). Adherence items are scored on a 0–6 scale reflecting the degree to which the facilitator delivers the intervention: 0 (None) to 6 (Thorough). Competence items are scored on a 0 to 6 scale reflecting the level of skills: 0 (Poor skills) to 6 (Excellent skills). Consistent with prior use of the CAS-CBT with group programs, both facilitators’ behaviors were considered when producing scores on each item (see Bjaastad et al., 2018). For each session, an Adherence subscale score was produced by averaging the six adherence items, and a Competence subscale score was produced by averaging the four competence items. Bjaastad et al. (2016) and Harstad et al. (2021) found that the CAS-CBT total scale demonstrated good internal consistency (α = 0.87 and α = 0.87, respectively), and good to excellent interrater reliability (Adherence subscale ICC = 0.83; Competence subscale ICC = 0.64 and Adherence subscale ICC = 0.87; Competence subscale ICC = 0.63, respectively). Interrater reliability was calculated based on 23 sessions for the Adherence and Competence subscales using ICC (Shrout & Fleiss, 1979). The reliability coefficients represent the model ICC (2,1) based on a two-way random effects model. The reliability coefficients showed good agreement for adherence (ICC = 0.63) and competence (ICC = 0.69; Cicchetti, 1994). In the present study, internal consistency for the CAS-CBT was excellent (Adherence α = 0.81; Competence α = 0.87; Total scale α = 0.91).
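For readers unfamiliar with ICC(2,1), the following Python sketch computes the two-way random effects, single-rater, absolute-agreement coefficient from the mean squares defined by Shrout and Fleiss (1979). It is not the software routine used in the study, and the double-coded scores shown are made up for illustration.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_targets, k_raters) array with no missing values.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for targets (rows), raters (columns), and residual.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    bms = ss_rows / (n - 1)             # between-targets mean square
    jms = ss_cols / (k - 1)             # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical double-coded adherence scores: rows = sessions, columns = coders.
double_coded = [[4.5, 4.2], [3.8, 4.0], [5.2, 4.7], [4.0, 3.3], [4.8, 5.0]]
agreement = icc_2_1(double_coded)
```

Because ICC(2,1) penalizes systematic differences between coders as well as random disagreement, it is typically lower than a consistency-type coefficient computed on the same ratings.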

CAS-CBT Coding and Session Sampling Procedures

Seven coders (n = 6 clinical psychologists, n = 1 child psychiatrist; 28.6% male; M age = 48.7 years, range = 31–66 years; all Norwegian) comprised the coding team. Each coder had clinical experience delivering CBT for anxiety and was trained in both interventions. Six of the coders were supervisors in the trial but did not rate their supervisees. Training of coders involved didactic instruction, discussions of the scoring manual, and reviews of sessions with developers of the CAS-CBT. Each item was reviewed with discussion and examples from videotapes. Coders coded three videos, and results were discussed to reach consensus ratings. Each coder then independently scored four sessions for certification. The study principal investigator was one of the CAS-CBT developers and scored master codes for the certification sessions. A coder was certified if no more than two of their 40 item ratings (4 sessions × 10 items) deviated by plus or minus 2 points from the master codes. All coders reached this criterion.
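The certification criterion can be expressed as a simple check. The sketch below assumes that a rating counts as deviant when it differs from the master code by 2 points or more; that reading, the function name, and the example scores are assumptions made for illustration.

```python
def is_certified(coder_scores, master_scores, max_deviant_items=2, tolerance=2):
    """Return True if at most `max_deviant_items` ratings differ from the
    master codes by `tolerance` points or more (assumed reading of the rule)."""
    if len(coder_scores) != len(master_scores):
        raise ValueError("score lists must have the same length")
    deviant = sum(abs(c - m) >= tolerance
                  for c, m in zip(coder_scores, master_scores))
    return deviant <= max_deviant_items


# Made-up example: 40 item ratings (4 sessions x 10 items).
master = [4] * 40
coder = [4] * 36 + [5, 3, 6, 2]  # two ratings are off by 2 points
certified = is_certified(coder, master)
```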

After coders were certified, recordings were randomly assigned to each coder while ensuring that no one rated their supervisee. All coders evaluated both Vaag and CK. During coding, the coding team remained in regular contact to prevent coder drift. All group sessions were videotaped (except out-of-office exposure tasks). Two sessions from each group were coded (N = 104 sessions), representing 20% of the sessions in CK (sessions 6 and 7), and 40% of the sessions in Vaag (sessions 3 and 4). These specific sessions were selected due to similar duration and content across interventions. Tapes of seven sessions of CK (12.5%) were missing. These were replaced by subsequent or preceding sessions. For interrater reliability, the expert coders double-coded ≥ 2 randomly selected tapes evaluated by each of the other coders (n = 23; 23.9%).

Data Analyses

Preliminary Analyses

Analyses were performed in SPSS 25 and Stata 15.1. Sample bias analyses were conducted to determine whether the included 295 participants differed from the parent sample on demographic variables and outcome measures. Pre-intervention differences for youth participants on demographic and clinical variables were analyzed by t-tests (continuous variables) and χ2 tests (categorical variables). In addition, missing data patterns were examined.

Group Differences in Adherence and Competence Scores

To determine how to generate CAS-CBT Adherence and Competence subscale scores, descriptive statistics for each subscale within the coded intervention sessions were evaluated along with the magnitude of the correlations for the subscales within and across the two coded sessions for each group. The correlations were interpreted following Rosenthal and Rosnow (1984), where r is “small” if 0.10–0.23, “medium” if 0.24–0.36, and “large” if > 0.36. To analyze whether adherence (CAS-CBT Adherence subscale) or competence (CAS-CBT Competence subscale) differed between Vaag and CK, generalized linear models (GLM) with robust standard error estimates to account for clustering of facilitators were used. Mean CAS-CBT Adherence and Competence subscale scores with 95% confidence intervals (CI) were calculated.
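The cut-offs for interpreting correlations can be written as a small helper. This is a sketch of the thresholds as stated above; the label "negligible" for values below 0.10 is an assumption, since the text does not name that range.

```python
def correlation_magnitude(r):
    """Label |r| using the cut-offs cited from Rosenthal and Rosnow (1984)."""
    size = abs(r)
    if size > 0.36:
        return "large"
    if size >= 0.24:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label; the source does not name this range


labels = [correlation_magnitude(r) for r in (0.36, 0.41, 0.86)]
```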

Treatment Fidelity Predicting Outcome

To account for multiple testing (four analyses, one per outcome), predictors were considered significant at a Bonferroni-corrected significance level of α = 0.0125. To evaluate whether treatment fidelity predicted youth clinical outcomes, linear mixed effect models (LMM) were fitted for adherence and competence. LMMs were used to analyze if adherence (CAS-CBT Adherence subscale) predicted change over time (pre-, post-, and follow-up) in each of the clinical outcomes (SCAS-C/P; CALIS-C/P), and to examine if the relation varied by CBT program. The LMMs included random intercepts to account for the data’s hierarchical structure, with three levels of nesting: individual, intervention group, and school. Separate analyses were conducted for each anxiety measure. The model included time (pre-intervention, post-intervention, and follow-up), adherence, and an interaction term of adherence and time as fixed effects to examine if adherence was related to change in average level of anxiety over time. A model with the three-way interaction of intervention, adherence, and time was included to examine if the effect of adherence on outcome varied by intervention type. A likelihood-ratio test was applied to compare the different models and test if the interaction was significant. The same analyses were conducted for competence (CAS-CBT Competence subscale) as a predictor of outcome. All analyses were repeated with the treatment completers only, to investigate if outcomes in youth completing the interventions were impacted by treatment fidelity.
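One way to write the adherence-by-time model is sketched below. The notation is an assumption introduced for illustration: time is treated as two dummy indicators (Post and FU, with pre-intervention as reference), A is the group-level adherence score, and s, g, and u are random intercepts for school, intervention group, and individual. The exact parameterization used in the analyses may differ.

```latex
% Assumed notation: t = time point, i = youth, j = intervention group, k = school.
% Requires the amsmath package.
\begin{align*}
y_{tijk} &= \beta_0 + \beta_1\,\mathrm{Post}_t + \beta_2\,\mathrm{FU}_t + \beta_3\,A_{jk}
            + \beta_4\,(\mathrm{Post}_t \times A_{jk})
            + \beta_5\,(\mathrm{FU}_t \times A_{jk}) \\
         &\quad + s_k + g_{jk} + u_{ijk} + \varepsilon_{tijk}, \\[4pt]
\alpha_{\text{corrected}} &= \frac{0.05}{4} = 0.0125 .
\end{align*}
```

Under this sketch, the coefficients on the two interaction terms carry the question of interest: whether adherence is related to change in the outcome from pre-intervention to each later time point.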

Effect sizes were generated by standardizing the adherence, competence and clinical outcome measures (i.e., M = 0; SD = 1; Ferron et al., 2008; Lorah, 2018). Clinical outcome measures were standardized within each time point. All analyses were repeated with the standardized variables. The resulting values are interpreted as the expected change in the number of standard deviations in the dependent variable, given a one standard deviation change in the independent variable (Lorah, 2018). For the current study, that means a negative effect size is interpreted as higher fidelity predicting better outcomes. The value of these effect sizes can be interpreted as smaller, more conservative correlation coefficients (r; Ferguson, 2009), with an effect size of 0.2 being the recommended minimum representing a “practically” significant effect for social sciences, i.e., a small effect, ≥ 0.5 is a moderate effect and ≥ 0.8 is a strong effect (Ferguson, 2009).
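To illustrate why standardized coefficients read like correlation coefficients, the sketch below standardizes a predictor and an outcome and fits a simple one-predictor regression. This is a simplified case, not the mixed model used in the study, and the scores are made up. In this simple case the slope equals Pearson's r.

```python
import numpy as np


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (sample SD, ddof=1)."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


# Made-up fidelity and anxiety scores, for illustration only.
fidelity = np.array([3.2, 4.1, 4.4, 4.8, 5.0, 5.6])
outcome = np.array([30.0, 28.0, 31.0, 25.0, 27.0, 22.0])

z_fidelity = standardize(fidelity)
z_outcome = standardize(outcome)

# Slope = expected change in outcome (in SD units) per 1 SD change in fidelity.
slope, intercept = np.polyfit(z_fidelity, z_outcome, deg=1)
pearson_r = np.corrcoef(fidelity, outcome)[0, 1]
```

A negative slope here corresponds to higher fidelity going with lower anxiety scores, which matches the sign convention described above.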

Residual Analyses

Because the adherence and competence scores were highly correlated, the two treatment fidelity components were not included in the same model. However, the unique contribution of each variable was investigated by providing residual analyses. We generated a residual competence variable from a linear regression with adherence as independent variable and competence as dependent variable. The residual competence variable was used as an independent variable in addition to adherence in additional analyses with LMMs using the same strategy as when the two treatment fidelity components were analyzed separately.
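The residualization step can be sketched as follows. The example regresses competence on adherence with ordinary least squares and keeps the residuals, which are by construction uncorrelated with adherence. The scores are hypothetical and the function name is an illustrative choice.

```python
import numpy as np


def residualize(dependent, independent):
    """Residuals from an OLS regression of `dependent` on `independent`."""
    y = np.asarray(dependent, dtype=float)
    x = np.asarray(independent, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)


# Made-up group-level fidelity scores, for illustration only.
adherence = np.array([3.5, 4.0, 4.2, 4.6, 5.0, 5.4])
competence = np.array([3.1, 3.9, 3.8, 4.5, 4.6, 5.3])

# The part of competence that is not linearly explained by adherence.
residual_competence = residualize(competence, adherence)
```

Entering adherence and the residual competence variable together avoids the collinearity that would arise from entering the two raw, highly correlated scores in the same model.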

Results

Preliminary Analyses

There were no differences between the current study sample (n = 295) and the remaining participants in the Haugland et al. (2020) study (n = 18). Further, there were no differences between Vaag (n = 142) and Cool Kids (n = 160) on youth demographic and baseline clinical characteristics (see Table 1). All facilitators received training in both interventions and 75% delivered both interventions. There were no missing values in the adherence and competence data. We evaluated patterns of missing data in youth- and parent-reported outcomes. Missing data ranged from 0.7 to 1.0% at pre-intervention, 12.9–14.6% at post-intervention, and 31.5–35.9% at follow-up. Missing data occurred completely at random for all time points (Little’s missing-completely-at-random test: Pre-intervention χ2 = 1.674, df = 4, p = 0.795; Post-intervention χ2 = 6.095, df = 10, p = 0.807; Follow-up χ2 = 14.707, df = 10, p = 0.143). In the LMMs, we used full information maximum likelihood (FIML) missing data methodology (Wothke, 2000) in Stata, which is considered a state-of-the-art method for handling missing data (Schafer & Graham, 2002).

Group Differences in Adherence and Competence Scores

The CAS-CBT Adherence subscale score for each group ranged from 3.17 to 5.75 (M = 4.41, SD = 0.56), and the Competence subscale score ranged from 2.75 to 5.88 (M = 4.18, SD = 0.66). A medium correlation was found for adherence between the two coded sessions (r = 0.36, p = 0.008), whereas a large correlation was found for competence across the two coded sessions (r = 0.41, p = 0.002; Rosenthal & Rosnow, 1984). Within each session, correlations between adherence and competence scores were large (session 1 r = 0.83, p < 0.001; session 2 r = 0.86, p < 0.001; Rosenthal & Rosnow, 1984). Based on the pattern and magnitude of the correlations, we decided to create a single CAS-CBT Adherence and a single Competence subscale score for each group by averaging scores across the two coded sessions. The correlation between the CAS-CBT Adherence subscale score and CAS-CBT Competence subscale score was large (r = 0.86, p < 0.001).

Group comparisons indicated that the CAS-CBT Adherence scores were significantly higher in Vaag (M = 4.71, 95% CI [4.51, 4.91]) than in CK (M = 4.15, 95% CI [3.97, 4.34]; p < 0.001). Also, CAS-CBT Competence scores were significantly higher in Vaag (M = 4.42, 95% CI [4.17, 4.67]) than in CK (M = 3.97, 95% CI [3.74, 4.20]; p = 0.010).

Treatment Fidelity Predicting Outcome

We examined if treatment fidelity (CAS-CBT Adherence and Competence subscales) predicted change in anxiety symptoms or impairment (SCAS-C/P, CALIS-C/P; see Table 2). At the Bonferroni-corrected significance level (α = 0.0125), the CAS-CBT Adherence subscale and CAS-CBT Competence subscale did not significantly predict change in anxiety symptoms, all p ≥ 0.043. All the effect sizes were lower than the recommended minimum for practical significance (Ferguson, 2009).

Table 2 Estimates of adherence and competence as predictors of change in anxiety symptoms and change in impairment from anxiety

When the interaction between the CAS-CBT Adherence subscale and CBT program was examined, no significant interaction was found for any of the outcomes, all p ≥ 0.424. Similarly, no interaction between the CAS-CBT Competence subscale and CBT program was found for any of the outcomes, all p ≥ 0.528. Overall, these results indicate that treatment fidelity does not predict youth anxiety outcomes, nor does treatment fidelity predict youth anxiety outcomes differently across Vaag and CK.

The main model was also tested for whether adherence predicted outcome and whether competence predicted outcome for those defined as treatment completers. No significant predictions were found.

Residual Analyses

As the CAS-CBT Adherence and Competence subscales were highly correlated (r = 0.86), LMMs with both adherence and a residual competence variable were conducted (see Table 3). No significant predictions by treatment fidelity were found for any of the youth anxiety outcomes, all p ≥ 0.055.

Table 3 Estimates of adherence and residualized competence as predictors of change in anxiety symptoms and change in impairment from anxiety

Discussion

In this study, we evaluated treatment fidelity across brief and standard-length school-based CBT delivered by novice CBT providers, and the prediction of treatment fidelity on outcome. We found that both adherence and competence were significantly higher in brief CBT than in standard-length CBT. However, neither adherence nor competence predicted youth anxiety outcomes across measures, informants, or time points. There was no difference in prediction of outcome by treatment fidelity across brief and standard-length CBT.

Our findings suggest that it may have been easier for novice CBT providers in school health services to establish treatment fidelity in the brief program than in the standard-length program, whose manual was more complex and had more room for flexibility. The simple and structured design of the brief program might explain the difference in adherence and competence across the brief and standard-length CBT programs (Lyon & Koerner, 2016). Our findings are in line with a recent study, where school nurses delivered a brief anxiety intervention with adequate adherence, and the intervention was deemed feasible for the school setting (Ginsburg et al., 2019).

Having available school personnel, who may lack previous mental health experience or knowledge of CBT, deliver mental health interventions may help increase the availability of evidence-based interventions. This may be especially beneficial for youth with anxiety symptoms and anxiety disorders, who often do not receive help due to lack of available providers and other barriers to treatment such as stigma (Reardon et al., 2018). Training novice CBT providers in delivering anxiety interventions will increase the number of available providers. Further, one could speculate that stigma may be a less prominent barrier to treatment when providers are not highly specialized mental health professionals (e.g., psychologist/psychiatrist) and the interventions are easily accessible in schools rather than in clinics. In light of a recent study indicating school health services may reach youth with elevated anxiety symptoms who otherwise do not receive help (Husabo et al., 2020), there are potential benefits of delivering indicated anxiety prevention in schools. Because group leaders in both interventions achieved adherence and competence levels comparable to those observed in previous studies (Bjaastad et al., 2018; Harstad et al., 2021), novice CBT providers appear able to achieve treatment fidelity when training and supervision are provided. Our findings further indicate that such efforts may be even more successful when the design of the program matches available providers’ training and experience (Lyon & Koerner, 2016).

Treatment fidelity did not predict any clinical outcome at any time point. These findings are in line with a recent clinical study of adherence and competence that also found no significant relationship between adherence or competence and youth outcomes (Southam-Gerow et al., 2021). Given the accumulation of findings suggesting that fidelity does not predict outcomes, it may be appropriate to consider other approaches to investigating this relation. As one example, researchers could investigate the role that other components of treatment fidelity play in promoting positive client outcomes (e.g., participant responsiveness). As another example, researchers could use statistical approaches that investigate how the fidelity-outcome association unfolds over treatment (e.g., using random intercept cross-lagged panel models; Selig & Little, 2012). These approaches may help further elucidate the relation between fidelity and outcome in mental health treatment.

Further, effect sizes in our study varied in direction and magnitude, but all were below what can be interpreted as a small effect (Ferguson, 2009). Our effect size estimates are similar to results in a previous meta-analysis on treatment fidelity and clinical outcomes in adults, where non-significant effect sizes were reported for adherence (r = 0.02) and competence (r = 0.07) (Webb et al., 2010). However, our effect sizes are lower than the effect size reported for adherence in a meta-analysis of mental health treatments for youth (r = 0.096; Collyer et al., 2019). The same meta-analysis reported a non-significant effect size for competence (r = 0.026; Collyer et al., 2019). This suggests that our findings are in line with the results for competence in the meta-analysis (Collyer et al., 2019). The studies included in this meta-analysis differed in setting (e.g., school, community clinics, and university clinics), with only one of the included studies conducted within schools. There is clearly a need for more methodologically rigorous studies of treatment fidelity and outcome in the school setting.

The lack of associations between fidelity and outcomes in our study could have several explanations. One explanation could be that a threshold level may exist, where adherence and competence are needed to achieve positive outcomes but have no additional effect above a certain point (Collyer et al., 2019; Durlak & DuPre, 2008). As both programs were delivered with adequate treatment fidelity, adherence and competence may have been above such a threshold level. We did not evaluate the adherence and competence of specific intervention components (e.g., exposure or cognitive restructuring) separately. It might be that some, but not all, components must be delivered with adherence and competence in order to attain positive outcomes. Thus, as long as adequate adherence and competence are present for these core components, higher levels of adherence and competence may not lead to better outcomes (Durlak & DuPre, 2008). That said, in a previous school-based anxiety intervention study, the adherence to specific components did not predict outcome (Becker et al., 2012).

The seemingly contradictory finding of no association between treatment fidelity and outcomes in a study of interventions showing effect underscores the limited knowledge in the field of what exact mechanisms account for change in clinical outcomes. Non-specific factors, such as the quality of the alliance, could be investigated, but these factors have not consistently predicted effects in previous studies (Chiu et al., 2009; Liber et al., 2010; Southam-Gerow et al., 2021). Group cohesion, defined as a client’s sense of belonging and bonding toward other members in group therapy, may also play an important role in predicting outcomes in group CBT (Lerner et al., 2013; see Luong et al., 2021), but this has not been investigated in group CBT for youth anxiety. More research is needed to identify the factors that account for improvement within CBT for youth anxiety.

We found significant overlap between adherence and competence, which is consistent with previous studies with this instrument (Bjaastad et al., 2018; Harstad et al., 2021). However, other studies focused on CBT for youth anxiety have found less overlap (rs 0.55–0.65; McLeod et al., 2018). An open question in the field is the degree to which adherence and competence represent distinct treatment fidelity components. Previous research has found correlations between adherence and competence ranging from 0.30 (Carroll et al., 2000) to 0.96 (Barber et al., 2003). The correlations tend to be higher when the same coders score adherence and competence (e.g., Barber et al., 2003; Bjaastad et al., 2018) as opposed to when independent coders score the two components (e.g., McLeod et al., 2018). Our findings suggest that it may be difficult, even for expert coders, to distinguish between adherence and competence.

Strengths and Limitations

Strengths of our study include a large sample, documentation of effectiveness of the CBT programs (Haugland et al., 2020), robust methodology for assessing adherence and competence with the use of independent coders, and a treatment fidelity measure with evidence of score reliability (Bjaastad et al., 2016). To date, few studies have used treatment fidelity measures with established psychometric properties, as there are currently no consistent guidelines for implementation measures (Collyer et al., 2019). Further, the brief and standard-length CBT programs cover broadly the same CBT modules (e.g., cognitive restructuring, exposure), but the brief CBT program is shorter, more structured, and allows less flexibility. Novice CBT providers trained in both interventions delivered the programs with no overlap between them. The same training and supervision procedures were used for both interventions, and both interventions were delivered in all schools.

The study also has limitations. Though evidence supports the score reliability of the CAS-CBT Adherence and Competence subscales, no studies have provided support for the convergent validity of these subscales. The coding of adherence and competence by the same coders may have inflated the correlation between the two variables, because the coders may not have been able to judge them separately (McLeod et al., 2016). For a better understanding of treatment fidelity, separate measures for adherence and competence may be beneficial. Limited variability in the Adherence and Competence subscales may have reduced our ability to find an association between treatment fidelity and outcomes. We used average scores of two sessions to generate the CAS-CBT Adherence and Competence subscale scores. This approach assumes that adherence and competence scores are stable over the course of treatment, which may not be the case (McLeod et al., 2018; Smith et al., 2017). It is possible that a measure of treatment fidelity more specifically targeted to the intervention manuals and/or the core components of the program would give a different result. The instrument we used may not tap sequencing and adaptation of program-specific components such as cognitive restructuring and exposure in sufficient detail (see, e.g., Marques et al., 2019; Park et al., 2015). Further, including other aspects of treatment fidelity, such as participant responsiveness, could have added information. By obtaining information on the participants' responses to the way the providers administered the program content, we might have gained more understanding of the lack of association between treatment fidelity and outcome. In addition, we could have tested whether participant responsiveness (a) was influenced by adherence and/or competence, and (b) influenced clinical outcomes.

Treatment fidelity of novice CBT providers may change over time with increasing experience. Distinguishing between treatment fidelity when delivering the first groups compared to later groups could be a way of testing this assumption. The youth and facilitators were primarily female and Norwegian, which could limit the generalizability of the findings.

Only two sessions from each group were coded, which may not be enough to provide an accurate estimate of treatment fidelity (Southam-Gerow et al., 2020). Moreover, because the two programs differ in their total number of sessions, this sampling plan resulted in a higher proportion of program content being coded for the brief program (i.e., 40% of the brief program vs. 20% of the standard-length program). This may have resulted in lower treatment fidelity ratings in the standard-length intervention.

Conclusion

This study contributes to the limited understanding of treatment fidelity and its relation to outcomes, specifically in school-based interventions. Both the brief and the standard-length intervention were delivered with adequate fidelity by novice CBT providers. Higher levels of adherence and competence were found in the brief than in the standard-length CBT program, suggesting that it may be easier for novice CBT providers to establish fidelity in a program designed to be short and easy to deliver by non-mental health practitioners. Neither adherence nor competence predicted outcome in either of the two school-based interventions. As both fidelity and outcomes must be considered, further studies are needed to decide which interventions to implement in school settings.