The high prevalence of depression and the burden it places on the individual and society have long been recognized (Cuijpers et al. 2007; Fried and Nesse 2014; Greenberg, Fournier, Sisitsky, Pike, and Kessler 2015; Kessler et al. 2005; Mathers, Fat, and Boerma 2008; Mehta, Mittal, and Swami 2014; Trivedi 2004). Next to antidepressant medication (ADM), several psychotherapeutic approaches to the treatment of major depression exist, among which cognitive therapy (CT; Beck 1964) is one of the most extensively researched (Cuijpers et al. 2013b; Hofmann, Asnaani, Vonk, Sawyer, and Fang 2012). Evidence has accumulated showing that CT can be as effective as ADM (DeRubeis, Gelfand, Tang, and Simons 1999). DeRubeis et al. (2005) conducted one of the first large trials in which they compared ADM, CT, and a pill placebo for patients with moderate to severe depression. Their overall results did not show significant differences between the active conditions at the end of treatment. However, both outperformed the pill placebo group. A meta-analysis by Cuijpers et al. (2013a) points in a similar direction. The authors found no significant differences between CT during the acute phase and continued ADM. This held for both short- and long-term effects. CT is now recommended as a first-line choice in the treatment guidelines for depression in the Netherlands (Spijker et al. 2013).

The study reported herein is a pragmatic pilot trial that was conducted as part of a series of Dutch trials in outpatient settings. Although the original CT manual (Beck, Rush, Shaw, and Emery 1979) recommends two sessions per week at the beginning of therapy, most therapists in Europe generally see their patients only once per week (Bruijniks et al. 2015). Therefore, the first aim of this trial was to test the implementation of a regimen of two psychotherapy sessions per week. Secondly, we wanted to inform a large ongoing study described in Bruijniks et al. (2015) on the effects of session frequency and other mechanisms of change in psychotherapy for depression. Thirdly, we intended to provide a benchmark for the effectiveness and mechanism trial by Lemmens et al. (2015), which compared one session per week of CT to interpersonal psychotherapy (IPT). In addition to session frequency, we investigated specific and nonspecific processes that are believed to underlie symptom change. To do this, we compared a group of patients receiving CT to a group receiving behavioral activation (BA), while also measuring factors presumed to underlie symptom change in depression at multiple time points, in an attempt to understand how these treatments work. BA was chosen as a comparator due to its ease of application and renewed interest in the approach (Richards et al. 2016). Earlier research has indicated that BA is as effective as CT (Barth et al. 2013; Cuijpers, Andersson, Donker, and van Straten 2011; Dimidjian et al. 2006; Jacobson et al. 1996).

With respect to the purported mechanisms of change, CT posits that resolution of negative cognitions and schemas leads to improvement (Beck et al. 1979; Butler and Beck 1995), an assumption that has received considerable validation (DeRubeis et al. 1990; Renner, Lobbestael, Peeters, Arntz, and Huibers 2012; Warmerdam, van Straten, Jongsma, Twisk, and Cuijpers 2010). However, it has been shown that such cognitions co-vary with the disorder (Beevers and Miller 2005; Teasdale 1983). Moreover, though CT places unique importance on negative thoughts, a reduction in such cognitions seems to accompany most interventions (Fava, Bless, Otto, Pava, and Rosenbaum 1994; Garratt, Ingram, Rand, and Sawalani 2007; Kovacs, Rush, Beck, and Hollon 1981), making it difficult to clearly identify them as the underlying change mechanism. An analysis of the temporal sequencing of the outcome (e.g., depression) and the potential mechanism or mediator (e.g., negative cognitions) of a specific treatment approach is necessary (Kraemer, Wilson, Fairburn, and Agras 2002).

In contrast to CT, the BA model proposed by Lewinsohn (1974) makes no assumption about clients’ thinking patterns. Instead, it presumes a lack of positive reinforcement as the maintaining factor of depression, and emphasizes the importance of behavioral activation. Therefore, we assessed this latter factor as well. Finally, the therapeutic relationship has long been hypothesized as a nonspecific driving force in therapy success (Horvath and Symonds 1991; Martin, Garske, and Davis 2000); we therefore also assessed participants’ perception of the relationship with their therapist. To our knowledge, this is the first study that directly compared CT and BA in a Dutch healthcare setting. In this report, we provide the results of the pragmatic pilot trial.

Method

Design

Participants were recruited from a Dutch outpatient treatment center (Hendriks & Roosenboom, Arnhem; now Dr. Bosman, Arnhem). They provided informed consent before being included in the study. Inclusion criteria were a diagnosis of depression based on the results of the Structured Clinical Interview for DSM Disorders (SCID-I; First, Spitzer, Gibbon, and Williams 2002) and a score above 20 on the Beck Depression Inventory, Second Edition (BDI-II; Beck, Steer, Ball, and Ranieri 1996). Participants were excluded if they had severe comorbid conditions, such as substance dependence or schizophrenia. Moreover, participants were also excluded if they had received CT or BA within 12 months preceding the study, were taking antidepressant medication that could not be discontinued, or reported any level of suicidal ideation. The last criterion was applied due to the specific care that would be called for, in particular, the need for medication. Since the treatment center only provides specialist care for anxiety and mood disorders, incoming patients with (severe) personality or bipolar disorders are always referred elsewhere. Eligible patients were allocated to therapists who were in turn randomized to provide either BA or CT. Although a specified protocol detailing the allocation procedure to avoid confounding by therapist availability (e.g., more than one therapist available) was not employed, such confounding is highly unlikely. Thus, our trial approached quasi-randomization. Following assessment, patients were registered at the secured online research platform, which was used to collect the outcome measures at predefined time intervals.

Interventions

We scheduled two therapy sessions per week during the first 8 weeks of the study. The remaining sessions were held on a weekly basis, the number of which depended on the progress of the individual patient. Although all therapists underwent formal CT training, most were relatively inexperienced. To optimize treatment and increase motivation to participate in this trial, they received additional training. This was provided in the form of separate two-day courses by two experienced clinicians in the field of CT and BA (S.D. Hollon and C. Martell). In addition, they received frequent peer supervision. Though a formal integrity check was not performed, treatment integrity was monitored by means of weekly peer consultation and occasional videoconferences with the two trainers. The CT approach was based on the original treatment manual developed by Beck and his colleagues (Beck et al. 1979) using adaptations described elsewhere (Lemmens et al. 2011). Behavioral components were thus not specifically excluded in this approach. BA was provided according to the manuals developed by Martell and colleagues (Martell, Addis, and Jacobson 2001; Martell, Dimidjian, and Herman-Dunn 2010) as applied in a previous study on the effectiveness of BA (Moradveisi, Huibers, Renner, Arasteh, and Arntz 2013).

Measurements

Participants completed all measures using the online platform and personal log-in codes. Following baseline assessment, measures were taken during the active treatment phase at 4, 8, 12, and 16 weeks. As described before, by week eight most participants had received 16 therapy sessions. Follow-up measures were taken at weeks 20 and 24. Dutch translations of the original instruments were used.

Depressive Symptoms

Severity of depression was measured using the BDI-II, a 21-item tool with demonstrable reliability and construct validity for the original and Dutch version (Beck et al. 1996; Kuhner, Burger, Keller, and Hautzinger 2007; Nolen and Dingemans 2004). Values on the inventory range from 0 to 63 with higher scores indicating more severe depressive symptoms. More recent studies suggest a score below 12 to indicate at most minimal depression or the point of remission (Riedel et al. 2010).

Negative Cognitions

Maladaptive negative beliefs were assessed using two instruments: the 17-item version of the Dysfunctional Attitude Scale (DAS-17; de Graaf, Roelofs, and Huibers 2009) and the revised version of the Automatic Thought Questionnaire (ATQ-R; Kendall, Howard, and Hays 1989; Raes and Hermans 2011). The ATQ-R was used to complement the DAS-17 as a measure of more superficial cognitions. Higher scores on both instruments indicate more negative cognitions.

Behavioral Activation

The Behavioral Activation for Depression Scale (BADS; Kanter, Mulick, Busch, Berlin, and Martell 2007; Raes, Hoes, Van Gucht, Kanter, and Hermans 2010) consists of four factors, each of which comprises a subscale. The activation subscale (BADSa, 7 items) measures the degree of activation as perceived by the patient. Here, higher scores refer to more activation. The avoidance/rumination subscale (BADSar, 8 items) assesses patients’ ruminative and avoidant behaviors with higher scores relating to more of these instances. The school/work impairment and social impairment subscales (BADSws and BADSs, each 5 items) examine problems in the respective areas and higher scores indicate more impairment.

Therapeutic Alliance

The therapeutic alliance was examined with the 12-item form of the patient-rated version of the Working Alliance Inventory (WAI-S-C; Hatcher and Gillaspy 2006). The instrument incorporates three subscales, each consisting of four items. These refer to the perceived congruence between therapist and client on (a) the goals of therapy (WAI-S-Cg), (b) the assumption that the tasks employed in therapy will aid in problem resolution (WAI-S-Ct), and (c) the quality of the relationship within the therapeutic dyad (WAI-S-Cb). Higher scores on any subscale indicate a better relationship between practitioner and patient.

Statistical Analysis

Analyses were conducted at significance level α = .05 and based on the intention-to-treat (ITT) sample. Approaches that are insensitive to missing data (such as multilevel modeling) were not feasible in this study due to the size of our sample. Accordingly, initial attempts to use such analyses failed. In order to enable the use of other analyses, multiple regression imputation was used to replace incomplete information on the outcome and process measures. The investigation of missing data revealed no systematic pattern, which supports the use of multiple regression imputation. Random numbers were generated using the Mersenne Twister option and the results from a total of five imputations were aggregated to compose the final values. To preserve the explanatory power of the multiple assessment points, repeated measures analyses of variance (RM-ANOVAs) were employed. BDI-II scores measured at all seven time points (i.e., including both follow-up assessments) were used as within-subjects variables to assess the effectiveness of the treatment modalities. Treatment condition was coded as −.5 for CT and .5 for BA and then added as the between-subjects variable (as suggested by Kraemer, Kiernan, Essex, and Kupfer 2008). Violations of sphericity were countered by using the Greenhouse-Geisser correction. Similarly, the change and temporal sequence of each individual process measure over the course of the treatment phase (baseline to week 16) was subjected to an RM-ANOVA with treatment condition as between-subjects factor (to assess differential process change in the two treatment conditions). Significant interactions between time and treatment condition were further investigated using simple contrasts for differences between baseline and each following assessment point per condition. Between- and within-group effect sizes (Cohen’s d) were computed for two time points per condition. The former were based on the mean scores at the end of the treatment phase (week 16) and last mean scores at follow-up (week 24). The latter were calculated from the difference between mean scores at baseline and week 16 and baseline and follow-up at week 24 (within the CT or BA group) and corrected for correlations between means. Next, exploratory analyses were conducted to investigate the predictive power of early change in process measures on later change in depression. To do this, mean-centered change scores from baseline to week 4 were computed for all process measures to represent early change on these variables. Late change in depression was defined as the difference between BDI-II scores at week 4 and the last assessment during treatment at week 16 (before follow-up). Using multiple linear regressions, late change in depression was then regressed on early change in process measures, controlling for baseline BDI-II scores. To investigate differential effects of BA and CT, treatment condition and its interaction with change on the mean-centered process measures were included in the equations. Bonferroni corrections were used to counter inflation of type-I error rates (i.e., corrected α = .05/10 comparisons = .005). As outlined before, it is important to assess the direction of change and therefore to look at its temporal sequencing (see also Kraemer, Stice, Kazdin, Offord, and Kupfer 2001; Kraemer et al. 2002). To provide a basic analysis of this, reverse regression analyses were conducted: early change in depression severity from baseline to week 4 was used to predict late change in the process variables, i.e., from week 4 to week 16. Casewise diagnostics with a criterion of two standard deviations (sd) were used to identify potential outliers and analyses were rerun after these were controlled for.
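The change-score definitions and the Bonferroni threshold described above can be illustrated with a minimal Python sketch. The scores are invented purely for illustration, the function names are hypothetical and not taken from the analysis scripts of this trial, and the sign convention (later score minus earlier score) is an assumption.

```python
from statistics import mean


def early_change_centered(baseline, week4):
    """Mean-centered change from baseline to week 4 on one process measure.

    Assumed sign convention: week 4 minus baseline.
    """
    raw = [w4 - b for b, w4 in zip(baseline, week4)]
    grand_mean = mean(raw)
    return [r - grand_mean for r in raw]


def late_change(week4, week16):
    """Change in depression from week 4 to week 16 (week 16 minus week 4)."""
    return [w16 - w4 for w4, w16 in zip(week4, week16)]


def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Invented scores for four hypothetical participants (not study data).
das_baseline = [60, 72, 55, 80]
das_week4 = [55, 70, 50, 65]
bdi_week4 = [28, 30, 22, 35]
bdi_week16 = [18, 25, 15, 20]

early = early_change_centered(das_baseline, das_week4)
late = late_change(bdi_week4, bdi_week16)
alpha_corrected = bonferroni_alpha(0.05, 10)

print(early, late, alpha_corrected)
```

In a regression of `late` on `early`, baseline BDI-II scores, treatment condition (coded −.5 and .5), and the product of condition and `early` would enter as additional predictors; the sketch stops at preparing the variables.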

Results

Participants

In total, 44 patients were eligible for participation. Only one (allocated to BA) dropped out of the study before the start of the first session. Forty-three participants were therefore left for assessment (CT = 23, BA = 20). Their age ranged from 19 to 72 with a mean of 39 years (sd = 14.5). Baseline scores on the BDI-II identified them as a highly depressed sample with mean scores of 30.2 and 32.1 in the CT and BA group respectively. Randomization proved successful in that demographic characteristics and baseline scores on process measures and depression were equally distributed across conditions. A slight majority of participants was female (51.2%) and most were of Dutch origin (95.3%). Table 1 provides an overview of demographic data per treatment condition. Inconsistent provision of information on depressive symptoms and processes was substantial, making imputation necessary. In the CT condition, 8.7% of data on all outcome and process measures were missing in week 4, further increasing to 34.8% in week 8. During weeks 12 to 20, the proportion fell slightly to 30.4%, only to increase at the last assessment in week 24 to 39.1%. In the BA condition, 5% of information on the ATQ-R-NL was missing at baseline assessment. For the remaining time points, the proportion was equal for all variables: it rose from 20% in week 4 to 35% in week 8 and then declined to 25% and 20% in weeks 12 and 16 respectively. At the first follow-up assessment, 35% of participants did not provide information on outcome measures, though only 30% of this information was missing at the second follow-up. χ2 tests indicated no differences in these percentages between the two conditions at any time point (all p ≥ .28).
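As an illustration of the comparison of missing-data proportions, the following Python sketch computes a χ2 test for the week 4 assessment. The counts are reconstructed from the reported percentages (8.7% of 23 is about 2 participants in CT; 20% of 20 is 4 participants in BA), and the use of a Pearson test without continuity correction is an assumption, as the exact variant is not stated in the text. The function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With df = 1, the upper tail of chi-square equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Week 4 counts reconstructed from the reported percentages (assumption).
ct_missing, ct_total = 2, 23
ba_missing, ba_total = 4, 20

chi2, p_value = pearson_chi2_2x2(
    ct_missing, ct_total - ct_missing,
    ba_missing, ba_total - ba_missing,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value is roughly .29, which is compatible with the reported bound of p ≥ .28.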

Table 1 Summary of demographic variables per condition

At the end of the study, some patients were still in treatment. In these cases, the focus of therapy generally shifted towards issues not covered by the initial diagnosis of depression. Table 2 depicts a summary of descriptive statistics of depression and process measures after missing data were imputed. Based on the available data, the mean number of sessions received over the whole assessment period (i.e., including follow-up) was 21.3 (sd = 20.1, N = 19) in the CT condition and 19.6 (sd = 13.7, N = 18) in BA.

Table 2 Descriptive statistics at baseline, end of treatment, and follow-up, as well as within- and between-groups effect sizes for these time points

Symptom Reduction

The results of the RM-ANOVA showed a significant reduction in depression scores over time, F(4.4, 179.8) = 39.7, p < .001. However, no differences between conditions emerged, F(4.4, 179.8) = .96, p = .44. The mean score on the BDI-II at the end of the study treatment phase in week 16 was 18.4 (sd = 11.6) for CT and 20.2 (sd = 10.6) for BA. At the last follow-up assessment in week 24, these scores were 15.4 (sd = 9.1) for participants receiving CT and 14.4 (sd = 7.9) for those in the BA group. The associated within-groups effect sizes at the last assessment were large with a Cohen’s d of 1.89 for CT and 1.69 for BA (see Table 2 for all effect sizes). Figure 1a shows the rate of change on depression scores over the entire course of the study for both conditions.
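To make the size of the group difference concrete, the sketch below computes a between-groups Cohen’s d from the means and standard deviations reported above, using the pooled standard deviation as the standardizer. This is an illustration only: the pooling formula and the sign convention (BA minus CT) are assumptions, the function name is hypothetical, and the resulting values need not match those in Table 2.

```python
import math


def cohens_d_between(m1, sd1, n1, m2, sd2, n2):
    """Between-groups Cohen's d with the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# BDI-II means and sds as reported in the text (BA: n = 20, CT: n = 23).
d_week16 = cohens_d_between(20.2, 10.6, 20, 18.4, 11.6, 23)  # BA minus CT
d_week24 = cohens_d_between(14.4, 7.9, 20, 15.4, 9.1, 23)    # BA minus CT

print(f"week 16: d = {d_week16:.2f}; week 24: d = {d_week24:.2f}")
```

Both values are small in absolute terms, in keeping with the absence of a significant difference between conditions.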

Fig. 1 Change in scores on (a) the BDI-II, (b) the DAS-17, and (c) the BADS activation subscale for both conditions

Change in Process Measures over Time

The results of the RM-ANOVA showed that all process measures underwent significant change over time. Change in the direction that indicates improvement was found for the DAS-17, F(4, 164) = 6.7, p < .001; the BADS total score, F(4, 164) = 11.6, p < .001; the BADS activation subscale, F(4, 164) = 6.0, p < .001; the WAI-S-C goal subscale, F(3.4, 139.1) = 3.5, p = .01; the WAI-S-C task subscale, F(4, 164) = 2.4, p = .05; the WAI-S-C bond subscale, F(3.2, 131.2) = 6.5, p < .001; and scores on the ATQ-R-NL, F(3.1, 128.5) = 12.1, p < .001. Change in a negative direction (e.g., more ruminative behavior) was found on the BADS avoidance/rumination subscale, F(4, 164) = 7.3, p < .001; the BADS work/school impairment subscale, F(2.9, 118.7) = 7.6, p < .001; and the BADS social impairment subscale, F(3.1, 127.4) = 6.2, p < .001.
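The fractional degrees of freedom reported above reflect the Greenhouse-Geisser correction, which multiplies both degrees of freedom by an estimate ε of the departure from sphericity. The Python sketch below shows this arithmetic. The ε value is not reported in the text; it is back-calculated here from the reported error degrees of freedom for the WAI-S-C goal subscale and should be read as an assumption. The function name is hypothetical.

```python
def gg_corrected_df(epsilon, n_timepoints, n_participants, n_groups):
    """Degrees of freedom of the time effect in a mixed RM-ANOVA.

    Both the effect and the error df are multiplied by epsilon.
    """
    df_effect = n_timepoints - 1
    df_error = (n_timepoints - 1) * (n_participants - n_groups)
    return epsilon * df_effect, epsilon * df_error


# Uncorrected case: 5 time points (baseline to week 16), 43 participants,
# 2 treatment conditions.
df1, df2 = gg_corrected_df(1.0, 5, 43, 2)

# Epsilon back-calculated from the reported error df of 139.1 (assumption).
eps_goal = 139.1 / 164
df1_goal, df2_goal = gg_corrected_df(eps_goal, 5, 43, 2)

print(df1, df2, round(df1_goal, 1), round(df2_goal, 1))
```

The uncorrected values correspond to the F(4, 164) tests, and the corrected pair rounds to the F(3.4, 139.1) reported for the goal subscale.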

Interactions with Treatment

Two significant interactions emerged between treatment condition and time predicting scores on process measures. The first involved the DAS-17, F(4, 164) = 2.5, p = .04: negative cognitions decreased in both conditions, but this change occurred earlier in BA compared to CT (see also Fig. 1b). This was confirmed by the simple contrasts comparing change on the DAS-17 from baseline to each subsequent assessment point. For the CT condition, the planned comparisons became significant at week 16 and remained so at both follow-up assessments (all p < .02), whereas in the BA group, these contrasts were significant throughout the course of the study (p ≤ .04) and still marginally significant at the second follow-up in week 24 (p = .06). The second significant interaction was found for the BADS activation subscale, F(4, 164) = 2.6, p = .04: activation increased more steeply in CT, whereas it remained relatively stable in BA (see also Fig. 1c). The simple contrasts for the BADS activation subscale showed that the planned comparisons from baseline to each subsequent assessment point were significant throughout all measurement points for CT (all p ≤ .01). In BA, the contrasts were only significant for the two follow-up measures (all p ≤ .04).

As shown in Table 2, within-group effect sizes of the process measures were generally moderate to large according to cut-points suggested by Cohen (1988). Only the effect of the WAI-S-C task subscale in week 24 almost disappeared in the BA condition, confirming the results of the analysis of change over time.

The effect sizes of all measures between the two groups were largely small (see Table 2). Taking into account the orientation of each instrument (i.e., whether higher scores refer to improvement or impairment), a moderate effect was found on the BADS total score at week 16 (d = .56), indicating more overall activation in the CT condition at that point, which is confirmed by the results on the activation subscale at the same time (d = .34). The effect on the avoidance/rumination subscale (d = .54) shows fewer of these instances and therefore more favorable outcomes for the BA condition at week 16, and the same was true for the social impairment subscale (d = .7). However, both of these were diminished or disappeared at follow-up. At week 24, one moderate effect emerged on the WAI-S-C bond scale (d = .45), indicating a better relationship between patient and therapist in the CT condition. This effect reversed from the time of the last assessment during the study treatment phase in week 16 (d = −.36).

Prediction of Late Change in Depression

Early change on process measures was not predictive of late change in depression for any of the variables. This was true even before Bonferroni corrections were applied (all p > .36). Removing two outliers (one from each treatment condition, identical cases for all measures) did not significantly change these results. Moreover, treatment condition and its interaction with early change on each process measure did not predict late change in depression (all p > .11). Similar results were obtained when reverse regressions were conducted in order to predict late change in process measures from early change in BDI-II scores (all p > .34).

Discussion

Main Findings

We observed significant reductions over time on the BDI-II as a measure of depression in both conditions (although in the absence of differences relative to any control condition we cannot rule out the occurrence of spontaneous remission). At the end of the study, many participants still evidenced residual complaints and were still in treatment, which suggests that patients in routine settings need more and continued treatment to recover from depression. There was no statistically significant difference in BDI-II scores between the CT and BA group. Although this is in line with meta-analytic evidence (Barth et al. 2013; Cuijpers et al. 2013b), the twice-weekly session frequency could have led to an accelerated symptom reduction, distinguishing the effects from other studies that used different or similar frequency regimens. In this regard, the study presented herein served as a benchmark for the randomized controlled trial by Lemmens et al. (2015). We observed similar reductions in depression scores during a shorter period of time using twice-weekly sessions compared to the weekly sessions of either CT or IPT in the trial by Lemmens et al. (2015).

The promising results for the early stages of treatment should nonetheless be considered with caution. In their seminal randomized controlled trial, Dimidjian et al. (2006) compared the effects of CT, BA, and ADM for the treatment of depression. They found that for the highly depressed participants, BA outperformed CT. Moreover, on average, their participants experienced a reduction of around 13 points on the BDI-II in the CT and around 20 in the BA condition during the early therapy phase, therefore significantly exceeding the reductions found in our trial. A potential reason could be different levels of therapist experience and skill between the trials. Moreover, adherence to the treatment manual could not be monitored continuously since there was no formal integrity check.

Process Measures

All processes that were hypothesized to be associated with symptom improvement changed over the course of treatment but mostly in a manner independent of treatment modality. It is conceivable that these processes were influenced directly by the relatively early changes in symptomatology and could therefore not be reliably distinguished as mechanisms of treatment.

Only negative cognitions measured by the DAS-17 and behavioral activation measured by the respective BADS subscale interacted significantly with treatment condition in the RM-ANOVA, pointing to a differential change path in BA and CT. However, as shown in Fig. 1b, the effect on the DAS-17 was largely a consequence of a sudden drop in maladaptive cognitions during the initial phase of BA. The post-hoc contrasts confirmed this relatively early reduction in negative cognitions in the BA group. This association is not what would have been predicted by cognitive theory. However, as Lorenzo-Luaces, German, and DeRubeis (2014) suggest in their extensive review, noncognitive interventions can produce changes in cognitions. This therefore does not disqualify cognitive change as a potential process underlying symptom reduction. In addition, depression is a complicated condition that involves a web of intertwined behavioral and cognitive factors. However, as is also shown in the figure, a comparable amount of change occurred on the DAS-17 in the CT condition at a later stage. Thus, the two treatment modalities were associated with comparable change in the DAS-17 over time, with the bulk of the observed change in BA happening over the first half of the study and the bulk of the change in CT happening over the second half. These findings are in line with previous research, showing that maladaptive thinking patterns “wax and wane” with the condition (Beevers and Miller 2005). Similarly, overall behavioral activation measured by the BADS activation subscale increased more in the CT group, as indicated by the RM-ANOVA and effect sizes. This finding seems to counter the theoretical underpinnings of BA. Though the reasons for this are unknown, it is noteworthy that the CT approach did not specifically exclude behavioral techniques. In addition, cognitive restructuring techniques can lead to behavioral change. A general increase in activity was therefore not surprising.

An unexpected finding was that most of the other constructs measured by the BADS changed significantly, but in a way that represents more problems in areas such as work, school, or other social contexts, and an increase in avoidant and ruminative behavior. Relatively more issues in the social domain were experienced by patients in the CT group. One explanation for this could be that they reported a larger increase in overall behavioral activation. More specifically, withdrawal from social situations has long been noted as a concurrent symptom in depression (Heinrich and Gullone 2006). Breaking out of this withdrawal is therefore certainly an uncomfortable process. In addition, healthy individuals often react rather negatively to depressed persons and their behavior (Sacco and Vaughan 2006), making the step out of isolation even more difficult and confronting. The perception of an increase in problems in the social domain for patients in the CT group could therefore be explained by the fact that they encountered social situations more often. However, the between-groups effects on the BADS avoidance/rumination, work/school and social impairment subscales were either diminished or near zero at the follow-up assessment, showing that negative outcomes of such confrontation are not necessarily persistent.

The moderate effect size associated with the bond between therapist and patients suggests that at week 16, patients in the BA condition rated the quality of the relationship relatively better than those receiving CT, but that this effect was reversed at follow-up. One possible interpretation is that the intensive consideration of thoughts and feedback by the therapist in CT increased the connection with the patient in the long term. On the other hand, the behavioral approach, using more applied exercises together with the therapist, could have enhanced the rating of the quality of the dyad particularly during the active treatment phase of the study, relative to the CT condition. Moreover, the bond between patient and therapist has been considered as a result of symptom improvement, rather than a mechanism in itself (Webb et al. 2011). The moderate size of the effect of this relationship at follow-up could therefore reflect a retrospective judgment by some patients (as many of them were no longer receiving treatment at this point), which was more positive the better their treatment outcomes were. However, as outlined before, others were still in therapy at this last assessment. For these patients, the WAI-S-Cb was a more valid measure of the bond with the therapist.

Strengths and Limitations

There were several limitations to this study that we tried to counter as far as possible. Although the design we used prevented us from drawing causal inferences, the finding that several of the processes that we observed (all derived from theory) changed in the predicted direction simultaneously with symptom change has exploratory value. Moreover, the allocation procedure we used approximated quasi-randomization, an appropriate approach in this routine care setting. To counter usual criticisms of such pragmatic contexts, we specifically targeted and reduced the potential for the provision of lower quality therapy in effectiveness compared to efficacy trials. We did so by assuring good training and supervision by experts, as well as frequent peer meetings, although a formal integrity check was not performed. Despite these precautions, we cannot state that the level of therapists’ experience was equivalent to the strict requirements normally applied in efficacy studies. Furthermore, the small sample size we recruited affects the inferences drawn. Attempts to implement more complex multilevel approaches for effectiveness and mediation analyses failed due to unsatisfactory model fit. However, the number of participants is a general issue in clinical research and our sample falls within a range comparable to some other studies (e.g., Troeung, Egan, and Gasson 2014; Weiss et al. 2012) and the chosen statistical approach is a sophisticated alternative in this context. Finally, missing data is often problematic and this trial suffered from a high degree of it. We used the best possible analytic approach in this situation, namely multiple imputation. Although analyses indicated no differences between the two conditions in the percentage of missing data, missing data remain a major concern. The results need to be interpreted with caution, as the imputation may have inflated results.

Conclusion

Both CT and BA led to significant reductions in depressive symptomatology that were comparable to those in other studies, and there is basic evidence that increasing the session frequency can have beneficial effects when compared to a benchmark trial. Moreover, the comparable effectiveness of the BA and CT approaches adds to the literature, which finds BA to be potentially easier to implement and more cost effective (Gilbody et al. 2017; Richards et al. 2016). However, there was no clear evidence of differential change with respect to purported underlying mechanisms. Although negative cognitions showed a time-lagged decrease in favor of BA, other analyses did not indicate differential change in the two conditions. The same was true for perceived activation (i.e., measured by the BADS activation subscale). Such differential change could have been expected if the respective theories are correct, even in the absence of differences in outcomes between the treatment conditions. The overall finding that problems in the social domain as measured by the respective BADS subscales (i.e., avoidance/rumination, work/school, and social impairment) increased instead of decreased over the course of the study could be a result of overall activation.

To our knowledge, this is the first time a study on the comparison between both modalities employed multiple assessments of process measures in a pragmatic sample in routine practice. However, a limitation of this study is the relatively small sample we recruited, and we encourage other researchers to replicate our approach with a larger sample.