Is Behavioral Activation (BA) More Effective than Cognitive Therapy (CT) in Severe Depression? A Reanalysis of a Landmark Trial

A landmark study (Jacobson et al., JCCP, 64:295–304, 1996) suggested that behavioral activation (BA) is as effective as cognitive therapy (CT) in the treatment of major depression. A conceptual replication supported the efficacy of BA and suggested that BA is more effective than CT for severe depression (Dimidjian et al., JCCP, 74:658–670, 2006), although these findings have never been replicated. Outcome data from participants in the BA and CT conditions of the Jacobson et al. (JCCP, 64:295–304, 1996) trial were analyzed with the same analytic approach used by Dimidjian et al. (JCCP, 74:658–670, 2006). The sample was stratified by scores on the Hamilton Rating Scale for Depression (HRSD) into higher-severity (HRSD ≥ 20) and lower-severity (HRSD ≤ 19) groups. Treatment differences in change over time on the HRSD and the Beck Depression Inventory (BDI), as well as in response (≥ 50% reduction) and remission on each scale, were examined. Tests of moderation with severity as a categorical or continuous variable were conducted, and we explored the effect of severity by treatment on relapse. The results of Dimidjian et al. (JCCP, 74:658–670, 2006) were not replicated. Tests of moderation with severity as a continuously measured variable (with the BDI or HRSD) also failed to find that BA was more effective than CT for more severe depression. No differences by severity emerged over the follow-up period. BA and CT may be roughly equivalent in the treatment of mild, moderate, and severe depression.


Introduction
Contemporary behavioral activation (BA) emerged as a treatment for depression following a landmark dismantling study (Jacobson et al. 1996). Jacobson et al. conjectured that the full CT treatment for depression could be divided into three broadly defined components: (1) behavioral activation (BA); (2) challenging automatic thoughts (ATs); and (3) modifying core beliefs (CBs). They randomized 151 participants to three conditions lasting a maximum of 20 sessions: (1) BA alone; (2) BA plus AT work; and (3) “full CT,” including BA, AT, and a minimum of 8 sessions devoted to CBs. Across various metrics, no statistically or clinically significant differences were observed during acute treatment or over the 2-year follow-up period (Gortner et al. 1998).¹ The BA condition of the Jacobson trial was later expanded to feature functional analysis of behavior as a way of understanding and explaining depression and of countering depressotypic patterns of behavior (Martell et al. 2001). This form of BA was tested against CT, paroxetine, and a placebo control group in a sample of 240 patients (Dimidjian et al. 2006). In contrast to the earlier trial, however, participants in that study were recruited and stratified on the Hamilton Rating Scale for Depression (HRSD; Hamilton 1960) into “higher-severity” (HRSD ≥ 20) and “lower-severity” (HRSD ≤ 19) groups. Within the lower-severity group, there were no differences among the three active treatments. Within the higher-severity group, BA outperformed CT, especially with regard to response (i.e., ≥ 50% reduction) on the Beck Depression Inventory (BDI; Beck et al. 1961): 76% of higher-severity patients met criteria for response in BA, compared with 48% in CT and 49% with paroxetine. A subsequent trial also found BA to be superior to antidepressant medication (Moradveisi et al. 2013) and more effective in treating more severe depression.
Despite optimism regarding the promise of BA for severe depression, the data do not unequivocally indicate that BA should be preferred to CT for more severe depression. In the recent COBRA trial, a large (N = 440) study that examined the efficacy of BA in primary care, Richards et al. (2016) did not find BA to be more efficacious than CT for severe depression. Additionally, process research does not support a differential effect of cognitive vs. behavioral interventions by severity. In a study of 60 patients with moderate to severe depression, Sasso et al. (2015) did not find behavioral interventions to be superior to cognitive ones among patients with more severe depression. Similarly, Hawley et al. (2017) reported that patients' use of cognitive skills predicted subsequent symptom change irrespective of symptom severity. Unexpectedly, in that study, patients' use of behavioral skills predicted greater symptom change among patients with milder, rather than more severe, symptoms of depression.
The current study examined the hypothesis that BA is more efficacious than CT in severe depression using data from the Jacobson et al. (1996) trial. We sought to expand on the analyses by Dimidjian et al. (2006) by conducting a formal test of moderation, which is more appropriate than the stratified analyses Dimidjian et al. reported (Pocock et al. 2002). Stratified or subgroup analyses (e.g., splitting the sample by severity and comparing treatment effects across levels of severity) are highly susceptible to spurious findings (Assmann et al. 2000). If one divides a dataset according to a specific variable and conducts statistical tests within each subgroup, one subgroup will, by chance, yield a higher p value than another, and that p value may cross the p < 0.05 threshold even when the true treatment effect is the same in every subgroup. Stratified subgroup analyses are also limited by smaller sample sizes and by the possibility of unequal variances across subsamples. Moderation analyses, in which treatment condition and the putative moderator are entered as an interaction term to predict outcome, are preferable for testing subgroup questions (Pocock et al. 2002), in part because they set a more demanding standard: detecting an interaction requires more statistical power than detecting a main effect of the same size. In addition to testing moderation during the acute phase of treatment, we also explored whether BA and CT differed over the long term according to severity, a question that was not examined in the follow-up study to the Dimidjian et al. trial (Dobson et al. 2008).

¹ There is confusion in the literature regarding the use of “cognitive therapy (CT)” versus “cognitive behavioral therapy (CBT).” CBT can refer to a family of interventions that are cognitive, behavioral, or both. CBT can also refer to a specific intervention package that combines behavioral and cognitive interventions. CT focuses on challenging depressogenic cognitions using a set of strategies that may be cognitive or behavioral. We use the term CT for consistency, as it was the term employed by Jacobson et al. (1996).
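The spuriousness of subgroup-wise testing described above can be illustrated with a small simulation. This is a minimal sketch under invented settings, not an analysis of the trial data: it assumes two equally sized severity subgroups, 25 patients per arm in each, unit-variance normal outcomes, and a treatment effect of 0.5 SD that is identical in both subgroups, so there is no true moderation. All function names and parameter values are hypothetical, and p values use a simple z (normal) approximation. The sketch counts how often the subgroup tests disagree (significant in one subgroup only) and how often a direct test of the difference between the two subgroup effects, which is the interaction test in this two-by-two design, is significant.

```python
import math
import random


def two_sided_p(estimate, se):
    """Two-sided p-value from a normal (z) approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def treatment_effect(rng, n_per_arm, true_effect):
    """Simulate one subgroup; return (BA minus CT mean difference, its SE)."""
    ba = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    ct = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    m_ba, v_ba = mean_and_var(ba)
    m_ct, v_ct = mean_and_var(ct)
    return m_ba - m_ct, math.sqrt(v_ba / n_per_arm + v_ct / n_per_arm)


def simulate(n_per_arm=25, true_effect=0.5, n_reps=2000, seed=1):
    """The treatment effect is identical in both subgroups: no true moderation."""
    rng = random.Random(seed)
    discordant = 0        # significant in one subgroup but not the other
    interaction_hits = 0  # difference between subgroup effects significant
    for _ in range(n_reps):
        eff_low, se_low = treatment_effect(rng, n_per_arm, true_effect)
        eff_high, se_high = treatment_effect(rng, n_per_arm, true_effect)
        sig_low = two_sided_p(eff_low, se_low) < 0.05
        sig_high = two_sided_p(eff_high, se_high) < 0.05
        discordant += sig_low != sig_high
        se_diff = math.sqrt(se_low ** 2 + se_high ** 2)
        interaction_hits += two_sided_p(eff_high - eff_low, se_diff) < 0.05
    return discordant / n_reps, interaction_hits / n_reps


if __name__ == "__main__":
    discordance, interaction_rate = simulate()
    print(f"subgroup results disagree: {discordance:.2f}")
    print(f"interaction test false positives: {interaction_rate:.2f}")
```

Under these settings each subgroup test has only about 40% power, so the tempting "significant in one subgroup only" pattern should appear in close to half of the simulated trials, whereas the interaction test should reject at roughly its nominal 5% rate.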

Sample
The sample for the present analyses consisted of 107 of the 151 participants described by Jacobson et al. (1996). We included only the “pure” BA and “full” CT conditions from that study because they mirror the standard forms in which BA and CT are conducted. Participants met the Diagnostic and Statistical Manual of Mental Disorders, 3rd edition, revised (DSM-III-R; American Psychiatric Association 1987) definition of major depression as assessed by the Structured Clinical Interview for DSM-III-R (Spitzer et al. 1992). Additionally, participants were required to have both a score ≥ 20 on the BDI (Beck et al. 1961) and a score ≥ 14 on the HRSD (Hamilton 1960). Exclusion criteria included bipolar disorder, past or present psychosis, panic disorder, current substance abuse, organic brain syndrome, mental retardation, imminent suicidal risk, and active outpatient treatment. The study was approved by the University of Washington Institutional Review Board.

Treatment
Randomization was based on matching for prior depressive episodes, depression severity, comorbid dysthymia, sex, and marital status. The treatment conditions in the original trial were BA, AT, and CT. Our analyses focused on the BA and CT conditions, but all results reported for the BA vs. CT contrast also apply to the contrast of BA vs. AT/CT. The treatment conditions can be briefly described as follows.
BA: The aim of the behavioral activation condition was to foster meaningful engagement with the environment. Interventions included activity monitoring, assessment of pleasure and mastery, graded task assignment, problem-solving, and social skills training.
CT: The full CT condition included the elements of the BA condition as well as those of the AT condition, which focused on identifying cognitive distortions and modifying them by completing thought records to assess the validity of beliefs, responding in more functional ways to negative thoughts, and conducting behavioral experiments. The full CT condition also added a focus on schemas, or core beliefs. These interventions aimed to reveal underlying assumptions that cut across specific situations (e.g., “I am unlovable”) and to explore their pros and cons and possible alternatives. Therapists in the trial were required to focus on this kind of work for at least eight sessions.

Outcome Measures and Analytic Strategy
All participants were administered the BDI and HRSD before therapy, at termination, and at the 6-, 12-, 18-, and 24-month follow-ups. The BDI was also administered before every treatment session. The timing of BDI and HRSD assessments differs somewhat between the current trial and the Dimidjian et al. (2006) trial: the latter had considerably fewer BDI assessments (i.e., only pre-, mid-, and post-treatment, as well as at early termination or as clinically indicated) but included an additional mid-treatment HRSD assessment. Beyond these minor differences, we closely followed the analytic plan outlined by Dimidjian et al. (2006; see page 662). For example, gender was controlled for in all of our analyses because it was differentially represented, and controlled for, in that trial.
Response on the BDI and HRSD was defined as a reduction of 50% or more from pretreatment scores. Remission was defined as scores ≤ 7 on the HRSD and ≤ 10 on the BDI. As in Dimidjian et al. (2006), separate analyses were conducted for each outcome metric within the higher-severity (HRSD ≥ 20) and lower-severity (HRSD ≤ 19) subgroups. A hierarchical linear model (HLM) was used to investigate treatment differences in change over time on the BDI in the full intent-to-treat (ITT) sample. This HLM included the mixed effect of time (session number, with session 1 coded 0, session 20 coded 1, and other sessions as fractions) as well as treatment condition (BA vs. CT, coded ± 0.5), gender, baseline BDI, and their interactions with time. Random effects for the intercept and slopes were included. Because the HRSD was assessed only twice during acute treatment, a general linear model was used in which raw change on the HRSD was regressed on treatment condition and gender. Because end-of-treatment HRSD scores were skewed, we transformed them for these analyses using the two-step procedure proposed by Templeton (2011), which retains the mean and standard deviation of the variable. Treatment differences in categorical rates of response, remission, and their combination were examined using Cochran-Mantel-Haenszel (CMH) tests. Missing data were handled with last observation carried forward (LOCF) in all analyses.
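The outcome definitions and data preparation steps above can be sketched as a handful of Python helpers. This is a minimal illustration, not the authors' code: all function names are hypothetical, the linear session-to-time mapping (session s becomes (s − 1)/19) is our reading of "other sessions as fractions," and the percentile step of the two-step transformation uses (rank − 0.5)/n, an adjustment we assume here so that the inverse normal CDF is never evaluated at exactly 0 or 1. The second step uses a normal distribution parameterized by the original mean and SD, which is how the procedure retains the variable's metric.

```python
import statistics

# Remission cut-offs as stated in the text.
REMISSION_CUTOFF = {"HRSD": 7, "BDI": 10}


def is_response(pre, post):
    """Response: a reduction of at least 50% from the pretreatment score."""
    return (pre - post) >= 0.5 * pre


def is_remission(post, scale):
    """Remission: HRSD <= 7 or BDI <= 10 at the assessment in question."""
    return post <= REMISSION_CUTOFF[scale]


def session_to_time(session, last_session=20):
    """Assumed linear coding: session 1 -> 0.0, session 20 -> 1.0."""
    return (session - 1) / (last_session - 1)


def locf(scores):
    """Last observation carried forward; None marks a missed assessment."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


def two_step_normal(values):
    """Sketch of a two-step (rank, then inverse-normal) transformation.

    Step 1: average 1-based ranks (ties share a rank) -> percentiles (rank - 0.5) / n.
    Step 2: inverse normal CDF using the original mean and SD as parameters.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of this run of ties
        i = j + 1
    target = statistics.NormalDist(statistics.fmean(values), statistics.stdev(values))
    return [target.inv_cdf((r - 0.5) / n) for r in ranks]


if __name__ == "__main__":
    print(is_response(30, 15), is_remission(8, "HRSD"))  # True False
    print(locf([28, None, 20, None]))  # [28, 28, 20, 20]
```

The transformation preserves the rank order of scores, so it changes the shape of a skewed distribution without reordering patients.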
Formal tests of moderation by severity, assessed categorically and continuously, were also conducted. In these HLMs, the BDI at each session was regressed on the mixed effects of time, treatment condition, baseline severity (coded ± 0.5 when categorical and mean-centered when continuous), and the full factorial of time, treatment condition, and severity. The time × treatment condition × severity interaction indicates whether the treatment difference in the rate of change varied across levels of severity. We also evaluated the effects of treatment by severity over the follow-up period, using Cox regression to model depressive relapse (defined as meeting criteria for major depressive disorder) among HRSD responders who were followed for 2 years. Effect sizes included Cohen's d for continuous differences, the d-type effect size described by Feingold (2013) for HLMs, and odds ratios (ORs) for categorical differences. Because treatment condition was coded 0.5 for BA and −0.5 for CT, positive d-type effect sizes and ORs above 1 indicate superiority of BA over CT, whereas negative d-type effect sizes and ORs below 1 indicate superiority of CT over BA.
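The coding scheme and the effect-size indices above, together with the CMH tests mentioned earlier, can be sketched as follows. This is a hedged illustration, not the authors' code: function names and the example counts are hypothetical, the Mantel-Haenszel common OR and CMH chi-square (1 df, no continuity correction) stand in for whatever software options were actually used, and the growth-model d follows the general form (slope difference × study duration) / SD associated with Feingold's approach, assuming the baseline SD as the standardizer. Because time runs from 0 to 1, duration is 1. The raw sign of d follows the BA-minus-CT slope difference, so on the BDI (where lower is better) a negative raw value means BA declined faster; how the text's "positive favours BA" orientation was applied is not stated, so the sketch leaves the sign as computed.

```python
import math


def design_row(time, treatment, severity):
    """Fixed-effect predictors for one BDI observation in the moderation HLM.

    time:      0.0 at session 1 up to 1.0 at session 20
    treatment: +0.5 for BA, -0.5 for CT
    severity:  +/-0.5 if categorical, or a mean-centred baseline score
    """
    return {
        "intercept": 1.0,
        "time": time,
        "treatment": treatment,
        "severity": severity,
        "time:treatment": time * treatment,
        "time:severity": time * severity,
        "treatment:severity": treatment * severity,
        # Focal moderation term: does the BA-CT slope difference vary with severity?
        "time:treatment:severity": time * treatment * severity,
    }


def feingold_d(slope_difference, duration, baseline_sd):
    """Growth-model d-type effect size: (slope difference * duration) / SD."""
    return slope_difference * duration / baseline_sd


def mantel_haenszel(tables):
    """Mantel-Haenszel common OR and CMH chi-square (1 df, no continuity correction).

    Each stratum is a 2x2 table (a, b, c, d):
    a = BA responders, b = BA non-responders, c = CT responders, d = CT non-responders.
    """
    or_num = or_den = 0.0
    obs_minus_exp = var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        obs_minus_exp += a - (a + b) * (a + c) / n
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    chi2 = obs_minus_exp ** 2 / var_sum
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return or_num / or_den, chi2, p_value


if __name__ == "__main__":
    # Invented counts, stratified by gender, for illustration only.
    strata = [(20, 10, 15, 15), (8, 6, 7, 9)]
    odds_ratio, chi2, p = mantel_haenszel(strata)
    print(round(odds_ratio, 3))  # 1.897
```

With the ±0.5 treatment coding and BA responders in cell a, a common OR above 1 favours BA on response, matching the convention stated in the text.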

Discussion
The current study examined the hypothesis that BA is more effective than CT for severe depression by employing an analytic strategy very similar to that of the study that originally reported this finding (Dimidjian et al. 2006). In line with the findings of the COBRA trial (Richards et al. 2016), we found no evidence that BA was superior to CT for more severe depression during acute treatment or over the 2-year follow-up.
Although our results were clear, several limitations of the study warrant consideration. First, this was a secondary analysis of relatively old data. The BA condition from the Jacobson trial was expanded by Dimidjian et al. (2006), so the two versions of BA are not fully comparable. Moreover, the sample size is relatively small. Several strengths of the study mitigate these limitations. Although the two trials did not follow the same manual, Jacobson and Gortner (2000) deemphasized the differences between the two treatments when they stated that they “kept the same set of treatment options, but created a behavior analytic theoretical framework,” which guided therapists' selection of interventions and was also taught to patients. Moreover, there is no evidence that BA with a focus on behavioral chain analysis is more effective than BA following a different rationale (Nyström et al. 2017). The Jacobson et al. (1996) and Dimidjian et al. (2006) studies were designed and conducted by similar teams, which suggests that the trials are more comparable than randomized controlled trials usually are. Additionally, we closely matched our analytic strategy to that of Dimidjian et al. to minimize the possibility that differences in findings are attributable to differences in analytic approach. With respect to sample size, the numbers of participants randomized to BA (n = 50) and CT (n = 57) in the current study are larger than the numbers randomized in the Dimidjian et al. study (43 and 45, respectively), and studies with 50 or more participants per treatment arm, as in the current case, are rare in depression treatment research (Barth et al. 2013).
Statistical and methodological artifacts cannot be ruled out as an explanation for either the current results or those of Dimidjian et al. (2006) and Moradveisi et al. (2013). As the debates over the issue of reproducibility in psychology have illuminated, spurious effects are not uncommon or unexpected (Pashler and Harris 2012). Effects resembling statistical moderation are especially sensitive to study and analysis design features and may be less likely to replicate than other results (Aguinis and Stone-Romero 1997).
Other factors may also explain why the effects reported by Dimidjian et al. (2006) and Moradveisi et al. (2013) were not replicated in the current study or in the larger COBRA trial. The optimal matching of patients to BA vs. CT or medications may depend not on a single variable such as severity but on the interaction among several variables that were differentially represented across the trials. Driessen et al. (2016) illustrated how this pattern of results could occur in a study that modeled non-linear interactions among multiple baseline variables. In their study, a small advantage of psychodynamic therapy over CT was evident for patients whose depression was both severe and chronic, whereas a large advantage of CT was observed when depression was both severe and non-chronic. In mild to moderate depression, neither treatment had an advantage unless anxiety levels were low, in which case psychodynamic therapy had an advantage over CT. It is possible that the superiority of BA over medications and CT for severe depression in the Dimidjian et al. and Moradveisi et al. trials reflected interactions with unmeasured third variables. Another possibility is that BA is superior to CT for severe anxiety but not for severe depression (Sasso et al. 2015): in the Dimidjian et al. trial, the severity grouping was based on the HRSD, which captures symptoms of both depression and anxiety (Porter et al. 2017).
In summary, BA and CT appear to be efficacious treatments for depression. While CT has more evidence for its efficacy, BA may be easier for paraprofessionals to implement (Richards et al. 2016). Our results suggest that severity, at least by itself, does not moderate their efficacy. More research is therefore needed on multiple moderators of response to either treatment, as well as on the most effective and cost-effective methods of delivery. In lieu of another large RCT like the COBRA trial, an individual patient data meta-analysis could identify moderators of response to BA vs. CT.

Compliance with Ethical Standards
Conflict of Interest: On behalf of all authors, the corresponding author states that there is no conflict of interest.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.