Major depressive disorder (MDD) is the most common mental health disorder worldwide (Sartorius, 2001). Antidepressant medication is a standard treatment; yet, treatment failures are frequent—over one third of patients do not fully recover after two or more courses of antidepressants, a condition known as treatment-resistant depression (TRD) (Janicak & Dowd, 2009; Souery, Papakostas, & Trivedi, 2006; Trivedi et al., 2014). As compared to depression that responds to treatment, TRD is associated with greater rates of relapse, prolonged disability, higher medical costs, and lower life quality (Fekadu et al., 2009; Judd et al., 2000; Russell et al., 2004). Consequently, understanding the pathophysiology of TRD is critical for developing effective treatments and reducing its social and personal burden.

MDD has been characterized by enhanced attention to (Gotlib, Krasnoperova, Yue, & Joormann, 2004; Koster, De Raedt, Goeleven, Franck, & Crombez, 2005) and processing of (Leppänen, 2006) emotional information. At the neural level, enhanced processing is often reflected by greater amygdala responses to emotional stimuli (Anand et al., 2005; Sheline et al., 2001), although this is not always the case (Thomas et al., 2001). Individuals with MDD also report difficulty regulating emotions (Beauregard, Paquette, & Levesque, 2006), and demonstrate smaller reductions in amygdala activation than healthy controls (HCs) during emotion regulation (Beauregard et al., 2006; Kanske, Heissler, Schönfelder, & Wessa, 2012). Greater depression severity has been related to increased amygdala activation during passive viewing of emotional stimuli (Lee et al., 2007) and during emotion regulation (Erk et al., 2010). These findings suggest that MDD is characterized by abnormal amygdala activation to affective stimuli that is influenced by depression severity.

Amygdala activation to emotional stimuli can attenuate following the treatment of depression using selective serotonin reuptake inhibitors (SSRIs; Delaveau et al., 2011; Sheline et al., 2001) or cognitive behavior therapy (CBT; Fu et al., 2008). Depressed individuals with the greatest amygdala reactivity prior to treatment appear to benefit the most, at least in the short term. In one study, depressed individuals with the greatest sustained amygdala responses to emotional words demonstrated the greatest depression reductions directly after CBT treatment (Siegle, Carter, & Thase, 2006). Fewer studies have investigated how baseline neural activation impacts the long-term effects of antidepressant treatments; however, one study suggested that depressed individuals with the highest amygdala activation during implicit emotional face processing had the lowest depression severity eight months after initial assessment, regardless of treatment received (Canli et al., 2005). These findings suggest that when amygdala hyperreactivity characterizes MDD, SSRI and CBT treatments can successfully reduce this reactivity and facilitate long-term recovery. However, the long-term impact of baseline neural activation on response to treatment and recovery in MDD remains relatively unexplored.

Studies have largely focused on MDD generally; consequently, little is known about amygdala reactivity to affective stimuli or its relation to depressive symptoms specifically in TRD. One recent study reported no baseline differences between TRD patients and HCs in amygdala activation during the processing of sad faces (Murrough et al., 2015). Another study reported that TRD patients did not have differential resting-state amygdala connectivity relative to controls; however, treatment-responsive depressed patients did demonstrate disrupted resting-state amygdala connectivity (Lui et al., 2011). More recently, Jacobs and colleagues reported that individuals with multiple episodes of depression demonstrate differential resting-state connectivity between the amygdala and prefrontal regions as compared to patients who have only experienced one episode (Jacobs et al., 2016). These limited investigations suggest that TRD may be associated with distinct amygdala reactivity and connectivity; however, additional research is needed. Since TRD patients fail to respond to medications that target neural circuitry associated with affective processing, it is critical to establish if TRD patients alone demonstrate the same pattern of hyperactive amygdala response as is seen in MDD patients in general.

In the present study, we examined amygdala reactivity and its relation to concurrent and longer-term depression severity in TRD patients who were part of a larger randomized controlled trial (RCT) demonstrating that eight weeks of mindfulness-based cognitive therapy (MBCT), relative to a structurally equivalent comparator condition, the health-enhancement program (HEP), produced greater improvements in depressive symptoms (Eisendrath et al., 2014; Eisendrath et al., 2016). To probe amygdala reactivity, we used a common emotional face processing task. In healthy participants, labeling the emotion of faces in this task has produced the greatest reduction in amygdala activation relative to observing emotional faces, whereas labeling gender showed an intermediate effect (Critchley et al., 2000; Lieberman et al., 2007); this result has been attributed to labeling serving as a form of emotion regulation. The preponderance of MDD studies suggests that depressed patients are characterized by exaggerated amygdala activation and difficulties regulating affect (Beauregard et al., 2006; Sheline et al., 2001), and therefore would demonstrate amygdala hyperreactivity during the observation of emotional faces, as well as a failure to downregulate amygdala activation during gender and affect labeling. However, the limited number of studies specifically investigating TRD suggests that amygdala activation in these patients may differ from that of MDD generally. Accordingly, our hypotheses about baseline amygdala activation in TRD are necessarily nondirectional. However, given the previous literature suggesting that heightened amygdala reactivity during the labeling of emotional faces is predictive of recovery regardless of treatment received (Canli et al., 2005), we predicted that patients with greater amygdala reactivity at baseline would show greater treatment response and better longer-term clinical outcomes.

Method and materials

Participants

Eighty-four TRD patients and 37 HCs participated in this study. Participants were recruited from outpatient psychiatry and general medicine clinics at the University of California San Francisco (UCSF), at the outpatient psychiatry clinic at Kaiser Permanente in San Francisco and through clinical referrals, flyers, and advertisements in newspapers and on municipal buses and trains. To qualify, TRD patients needed to meet the criteria for unipolar MDD based on the Structured Clinical Interview for DSM-IV Disorders (First, 1995), a Hamilton Rating Scale for Depression (HAM-D17; Hamilton, 1967) score greater than or equal to 14, and to be taking antidepressant medication with evidence of two or more adequate trials prescribed during the current episode as assessed with the Antidepressant Treatment History Form (ATHF; Sackeim, 2001). Patients were excluded for the following: lifetime history of bipolar disorder, schizophrenia, or any psychotic disorder; substance abuse or dependence within three months of study onset; currently suicidal, dangerous to others or self-injurious; psychotherapy that they were unwilling to discontinue during the 8-week treatment portion of the study; or a score of <25 on the Mini Mental Status Exam (Folstein, Folstein, & McHugh, 1975).

The HC group was matched to the TRD group on age, gender, and handedness, and had no history of a major Axis I psychiatric disorder, neurological illness, or current use of psychotropic medication. All participants were required to be at least 18 years of age, to be fluent in English, to have no MRI contraindications, and to have normal or corrected-to-normal vision.

Participants were excluded from analysis for mean amygdala activation values greater than three times the interquartile range (n = 2), missing behavioral data (n = 1), or task accuracy below 70% for any condition (n = 1). The final sample included 80 TRD and 37 HC participants. Written informed consent, approved by the institutional review board at UCSF, was obtained from participants. Their demographic data are presented in Table 1; information on their medications and treatments at each time point is presented in Table 2.

Table 1 Demographic data for treatment-resistant depression patients and healthy controls (at baseline)
Table 2 Treatment information for treatment-resistant depression patients at baseline and Weeks 8, 24, 36, and 52

Protocol

TRD participants were part of an RCT comparing MBCT to HEP as adjunctive treatments to antidepressant medication. Details regarding treatment programs and the randomization procedure are presented in the published protocol (Eisendrath et al., 2014).

MBCT treatment involved guided meditations and CBT exercises intended to help participants identify cognitive distortions, disengage from rumination, and use nonjudgmental present-moment awareness (Segal, Williams, & Teasdale, 2012). HEP treatment involved exercise, functional movement, music therapy, diet education, and guided imagery intended to promote health and improve mood (MacCoon et al., 2012). Both treatments met for eight weeks in groups of 6–12 once a week for 2 h 15 min. Participants were assigned 45 min of homework six days per week. Importantly, HEP treatment was designed to match MBCT treatment on group support, reduction of stigma, improved morale, facilitator attention, treatment duration, and time spent on at-home practice to carefully isolate the effects of MBCT.

Participants completed the emotional face processing task at baseline and following treatment; only baseline fMRI data are analyzed in the present report. TRD patients underwent assessments at baseline and at Weeks 8, 24, 36, and 52 (for a detailed description of the full battery of measures, see Eisendrath et al., 2014). Changes in depression severity were expressed as the percent change from baseline in HAM-D17 total scores ([baseline – posttreatment]/baseline × 100%). Anxiety was assessed using the self-report State and Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970).
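
The percent-change measure can be illustrated with a minimal Python sketch. The function name and the example scores are hypothetical and are not study data; the sketch only restates the formula given above, in which positive values indicate improvement.

```python
def percent_change_from_baseline(baseline: float, followup: float) -> float:
    """Percent reduction in a HAM-D17 total score relative to baseline.

    Positive values mean the score dropped (symptoms improved);
    negative values mean the score rose.
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - followup) / baseline * 100.0


# Hypothetical scores for illustration only (not study data).
improved = percent_change_from_baseline(20, 10)   # score halved
worsened = percent_change_from_baseline(20, 25)   # score increased
print(improved, worsened)
```

Because the change is scaled by each patient's own baseline, the same absolute drop counts for more in a patient who started with a lower score.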

Emotional face-processing task

During shape matching, participants selected one of two shapes from the bottom of the screen that matched the target shape. During affect labeling, participants selected one of two emotional words (e.g., happy, angry) from the bottom of the screen that matched the emotion depicted on the target face. During gender labeling, participants selected one of two names (e.g., Sylvia, Allen) from the bottom of the screen that matched the gender of the target face. During observing, participants passively viewed an emotional face. The stimuli were selected from the NIMSTIM Face Stimulus Set (Tottenham et al., 2009). The faces (half female, half male) depicted negative emotions 80% of the time (fear, anger) and positive emotions 20% of the time (happiness, surprise) to prevent amygdala habituation to negative expressions.

Each block began with a 10-s fixation period, followed by 3 s of instruction (identify emotion, identify gender, observe, match shape), followed by ten images presented for 5 s each. Four blocks were presented per condition per run. Two runs were presented, each with a run time of 5 min 46 s.

Neuroimaging acquisition methods

Images were acquired on the Siemens 3-T TIM TRIO scanner at the UCSF Neuroimaging Center. Acquisition parameters for functional scans were as follows: TR = 2 s, TE = 30 ms, FOV = 220 mm, flip angle = 77°, bandwidth = 2298 Hz/pixel, matrix = 64 × 64. Thirty slices (3 mm thick, 1-mm gap) were acquired in an axial-oblique plane, parallel to the anterior–posterior commissure (AC–PC) line. Acquisition parameters for the high-resolution anatomical scan were as follows: 3-D MP-RAGE sequence, scan time = 5 min 17 s, flip angle = 9°, FOV = 220 mm, 160 slices per slab, 1.2 mm thick, no gap, TR = 2.30 s, TE = 2.94 ms.

Image processing

Preprocessing was achieved with Statistical Parametric Mapping 8 (SPM8; www.fil.ion.ucl.ac.uk/spm/software/spm8/). Image preprocessing entailed motion correction via affine registration; the first image of each run was realigned to the first image of the first run, and then realignment proceeded within each run. Images were slice-time-corrected to the middle slice. To further denoise the data, we implemented aCompCor (Behzadi, Restom, Liau, & Liu, 2007), a principal component analysis (PCA) based approach to noise reduction of fMRI time-series data. ACompCor derives principal components from the time series of voxels within noise regions of interest (ROIs) defined on eroded white matter and cerebrospinal fluid (CSF) parcels from participants’ segmented high-resolution T1-weighted anatomical images coregistered to their functional data. A binary union mask of noise ROIs was generated and co-registered to the mean functional scan. Voxels in the mask that showed even a weak relationship with the task regressors (p < .2) were excluded. Time series data for the remaining voxels in the noise ROI mask were then subjected to a PCA, and a number of noise components comprising weighted averages of white matter and CSF voxel time series were identified using a bootstrap procedure.

SPM’s canonical hemodynamic response function was convolved with task event vectors to create first-level task regressors representing affect labeling, gender labeling, observing, and shape matching. Head motion regressors (six realignment parameters and their derivatives) were included with aCompCor noise regressors in the first-level model. Parameters (i.e., beta coefficients) representing the fit of each regressor to a voxel’s time series were estimated using the general linear model after applying a high-pass temporal filter (128-s cutoff) to remove low-frequency noise. Mean beta images were calculated across runs, and contrast images were created by subtracting the shape-matching beta image from each condition of interest (affect labeling, gender labeling, and observing). The mean functional image from the motion correction preprocessing step was normalized to standard neuroanatomical space (the Montreal Neurological Institute’s MNI EPI template; www.bic.mni.mcgill.ca), resulting in isotropic 3-mm voxels, and the normalization parameters were applied to first-level beta and contrast images, which were then spatially smoothed with a 6-mm full-width-at-half-maximum Gaussian kernel.
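
To make the construction of a task regressor concrete, the following Python sketch convolves a block boxcar with a double-gamma hemodynamic response. It is an illustration under stated assumptions, not the SPM8 implementation: the function names are hypothetical, the shape parameters (6 and 16, undershoot ratio 1/6) are commonly cited defaults assumed here, the TR is assumed to be 2 s, and the response is sampled once per TR rather than on a finer time grid.

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=6.0):
    """Double-gamma response at time t (seconds). Parameters are assumed defaults."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - gamma_pdf(t, undershoot_shape) / undershoot_ratio


def block_regressor(onsets_s, block_dur_s, n_scans, tr_s=2.0, hrf_len_s=32.0):
    """Boxcar (1 during task blocks, 0 elsewhere) convolved with the HRF, one value per scan."""
    boxcar = [0.0] * n_scans
    for onset in onsets_s:
        for i in range(n_scans):
            if onset <= i * tr_s < onset + block_dur_s:
                boxcar[i] = 1.0
    hrf = [double_gamma_hrf(k * tr_s) for k in range(int(hrf_len_s / tr_s) + 1)]
    regressor = []
    for i in range(n_scans):
        # Discrete convolution, truncated at the start of the run.
        regressor.append(sum(boxcar[i - k] * hrf[k] for k in range(len(hrf)) if i - k >= 0))
    return regressor


# Hypothetical single block: ten 5-s images starting 13 s into the run
# (after 10 s of fixation and 3 s of instruction).
reg = block_regressor(onsets_s=[13.0], block_dur_s=50.0, n_scans=40)
```

The regressor rises gradually after block onset and plateaus during the block, which is why block designs are fitted with a convolved boxcar rather than the raw on/off vector.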

Data analysis

Group-based (second-level) random-effects analyses were conducted on contrast images to test for statistically significant differences between groups and task conditions. A bilateral amygdala ROI was created from the Automated Anatomical Labeling atlas (Tzourio-Mazoyer et al., 2002) using WFU PickAtlas (Maldjian, Laurienti, Kraft, & Burdette, 2003). Mean amygdala activations were extracted and imported into IBM SPSS Statistics 22 (SPSS Inc., Chicago IL).

Task accuracy and reaction time (RT; median across trials for each condition) data were analyzed using a mixed-design analysis of variance (ANOVA) with group as the between-subjects and condition as the within-subjects variable. Amygdala activation data were analyzed using a mixed-design ANOVA to test for group (TRD, HC), condition (affect labeling, gender labeling, observing), and hemisphere effects. If there were no significant hemisphere effects in the overall ANOVA model, bilateral amygdala ROIs were used for subsequent analyses. Significant interaction effects were followed by repeated measures ANOVAs examining condition effects within each group, and independent t tests examining group effects within each condition. Greenhouse–Geisser-corrected results are presented if Mauchly’s test of sphericity indicated a violation. The relationships between amygdala activation across the three conditions (affect labeling, gender labeling, observing) were examined using Pearson correlations. To assess the relationship between amygdala activation across the three task conditions and baseline depression severity without the confounding influence of anxiety, HAM-D17 depression scores were first residualized on anxiety severity scores (STAI–Trait Anxiety). Next, these residualized depression scores were regressed on the three task condition-evoked amygdala activation measures in a multiple regression model in order to determine whether any of them were uniquely associated with depression severity while accounting for the covariation between them. If the three amygdala activation measures were significantly correlated and failed to uniquely account for variance in baseline depression, the relationship with depression was reassessed using the mean amygdala activation across the three conditions. The effect of MBCT versus HEP treatment on percent change in depression severity was assessed using independent t tests.
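
The residualization step can be sketched in plain Python as follows. This is an illustration only: the analyses were run in SPSS, the function names are hypothetical, and the scores are made-up values rather than study data. The sketch regresses depression scores on anxiety scores and keeps the residuals, which by construction are uncorrelated with anxiety.

```python
import math


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residualize(y, x):
    """Residuals of y after removing its linear relationship with x."""
    intercept, slope = ols_simple(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up scores for illustration only (not study data).
stai = [40, 45, 50, 55, 60]
hamd = [15, 18, 17, 22, 21]
hamd_resid = residualize(hamd, stai)
print(round(pearson_r(hamd, stai), 3), round(abs(pearson_r(hamd_resid, stai)), 3))
```

The residualized scores would then serve as the outcome in the multiple regression on the three amygdala activation measures.
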
To assess whether baseline amygdala activation differentially predicted treatment response in the MBCT versus HEP groups, HAM-D17 change scores at each follow-up assessment (percent change from baseline) were separately regressed on the three task condition-evoked amygdala activation measures, group, and the three Group × Amygdala interaction terms in a multiple regression model. The increment in variance accounted for by adding the three interaction terms was tested as a block in order to determine whether the slopes of the depression change–amygdala relationships significantly differed between the groups. If the addition of the interaction terms did not result in a significant improvement in the fit of the model, the interaction terms were omitted and a common slope across the groups was assumed. The unique contributions of each of the three task-condition-evoked amygdala activation measures were then evaluated in the common slope model, accounting for the covariation between them. If the three amygdala activation measures were significantly correlated and did not uniquely contribute to the prediction of depression change within the treatment groups, the prediction of depression change was reassessed using the mean amygdala activation across the three conditions. For each separate family of tests, the family-wise error rate was set to .05.
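
The block test of the interaction terms corresponds to the standard F test for an increment in R². A minimal Python sketch follows; the function name and all numeric inputs are hypothetical and are not values from this study.

```python
def f_change(r2_reduced, r2_full, n_obs, n_pred_full, n_added):
    """F statistic for the R-squared increment when n_added predictors are added.

    r2_reduced  : R-squared of the model without the added block
    r2_full     : R-squared of the model with the added block
    n_obs       : number of observations
    n_pred_full : number of predictors in the full model (intercept excluded)
    n_added     : number of predictors in the added block
    Returns (F, numerator df, denominator df).
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    df_num = n_added
    df_den = n_obs - n_pred_full - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    f_stat = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_stat, df_num, df_den


# Hypothetical values: four main-effect predictors (three amygdala measures
# plus group), then three Group x Amygdala interaction terms added as a block.
f_stat, df_num, df_den = f_change(0.10, 0.16, n_obs=60, n_pred_full=7, n_added=3)
print(round(f_stat, 3), df_num, df_den)
```

The p value would come from the upper tail of the F distribution with these degrees of freedom, for example via `scipy.stats.f.sf`. A nonsignificant result supports dropping the block and fitting a common slope.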

Medication strength was quantified for the TRD patients at each time point according to the procedure described in the ATHF (Sackeim, 2001). Briefly, this strength index takes various sources of information into account, including drug type, dose, duration of treatment, and compliance. Estimates of medication strength at each time point are presented in Table 2.

Results

Task accuracy and RT

Task accuracy

The mean accuracy exceeded 95% for each condition. Accuracy did not differ between HCs and TRD patients [F(1, 115) = 1.13, p = .29, ηp² = .01], and the Group × Condition interaction was not significant [F(2, 230) = 1.10, p = .33, ηp² = .01]. However, we found a significant main effect of condition [F(2, 230) = 15.60, p < .001, ηp² = .12]. Follow-up tests revealed that participants were more accurate during gender labeling than during both affect labeling [t(116) = –5.79, p < .001, d = 0.69] and shape matching [t(116) = 5.75, p < .001, d = 0.69]; there were no differences in accuracy between affect labeling and shape matching [t(116) = –0.75, p = .46, d = 0.09]. The condition means by group are reported in Table 3.

Table 3 Task accuracy and reaction time means (and standard deviations) for each condition of the emotional face-processing task for treatment-resistant depression patients and healthy controls

Task RT

No significant Group × Condition interaction emerged [F(2, 230) = 0.66, p = .52, ηp² = .01]. However, we did find a significant main effect of condition [F(2, 230) = 226.39, p < .001, ηp² = .66], with faster RTs during gender labeling than during both affect labeling [t(116) = 14.51, p < .001, d = 1.11] and shape matching [t(116) = 9.47, p < .001, d = 0.76], and faster RTs during shape matching than during affect labeling [t(116) = 19.23, p < .001, d = 1.77]. There was also a significant main effect of group [F(1, 115) = 4.64, p = .03, ηp² = .04]; TRD patients (mean RT = 1,459.13 ms) were slower across all conditions than HCs (mean RT = 1,361.25 ms). The condition means by group are reported in Table 3.

Amygdala activation at baseline

Overall mixed-design ANOVA

None of the main or interaction effects on amygdala activation involving hemisphere were significant. Accordingly, analyses were conducted using a bilateral amygdala ROI. The Group × Condition interaction was significant after Greenhouse–Geisser correction [F(2, 230) = 6.31, p = .003, ηp² = .05]; the TRD patients had less amygdala activation than HCs during affect labeling [t(115) = 2.21, p = .03, d = 0.41]. TRD patients also had marginally less amygdala activation during gender labeling [t(115) = 1.74, p = .09, d = 0.29], but there were no significant differences between TRD patients and HCs during observing [t(115) = –0.81, p = .42, d = 0.21]. The main effects of condition [F(2, 230) = 1.75, p = .18, ηp² = .02] and group [F(1, 115) = 1.67, p = .20, ηp² = .01] were not significant. The mean amygdala activation for each condition is presented in Fig. 1 (left panel).

Fig. 1

(Left) A bar graph depicting mean amygdala activation during each condition (affect labeling, gender labeling, and observing) for healthy controls (HCs) and treatment-resistant depression patients (MD). Error bars indicate ±1 SE. (Middle) A plot depicting the inverse relationship between anxiety-residualized scores on the Hamilton Rating Scale for Depression (HAM-D17) and affect-labeling-evoked amygdala activation (residualized on gender-labeling- and observing-evoked amygdala activation) in the treatment-resistant depression patients at baseline (n = 80). (Right) A plot depicting the relationship between percent reduction in Hamilton Rating Scale for Depression Scores (HAM-D17) severity and the mean amygdala activation averaged across conditions (affect labeling, gender labeling, and observing) among treatment-resistant depression patients at Week 52 (n = 47) following completion of either mindfulness-based cognitive therapy (MBCT) or a health enhancement program (HEP)

Repeated measures ANOVA in HCs

The condition effect was significant [F(2, 72) = 3.86, p = .03, ηp² = .10], with HCs demonstrating greater amygdala activation during both affect labeling [t(36) = 2.34, p = .03, d = 0.37] and gender labeling [t(36) = 2.85, p = .007, d = 0.40], relative to observing. No significant differences were apparent between affect labeling and gender labeling [t(36) = –0.05, p = .96, d = 0.01].

Repeated measures ANOVA in TRD patients

For TRD patients, the condition effect was marginally significant after Greenhouse–Geisser correction [F(2, 158) = 3.06, p = .06, ηp² = .037], with TRD patients having significantly less activation during affect labeling than during observing [t(79) = –2.06, p = .04, d = 0.24], and marginally less activation during affect labeling than during gender labeling [t(79) = –1.78, p = .08, d = 0.18]. We found no significant differences between gender labeling and observing [t(79) = –0.74, p = .46, d = 0.07].

Relationships between amygdala activation during each condition

Despite the ANOVA results suggesting that group differences in amygdala activation were relatively specific to affect labeling, the levels of amygdala activation evoked by the three task conditions were significantly intercorrelated within the TRD patients (affect labeling vs. gender labeling: r = .604, p < .001; affect labeling vs. observing: r = .478, p < .001; gender labeling vs. observing: r = .662, p < .001). Accordingly, in subsequent analyses examining the clinical correlates of amygdala activation, unique contributions of each task condition-evoked amygdala activation were evaluated by first taking the covariation among them into account. In the absence of evidence supporting unique contributions, the clinical correlation analyses were repeated using the mean amygdala activation across the three conditions.

Relationships between amygdala activation and depression at baseline

The relationship between baseline depression severity and task-evoked amygdala activation in the TRD patients was assessed after regressing HAM-D17 scores on STAI–Trait Anxiety scores and saving the residuals, thereby deriving depression scores from which shared variation with anxiety had been removed (HAM-D17 vs. STAI–Trait Anxiety: r = .23, p = .045). The residual HAM-D17 scores were then regressed on the amygdala activation measures from the three task conditions in a multiple regression model. Although the overall model was significant [multiple R² = .15; F(4, 69) = 3.06, p = .02], only amygdala activation during affect labeling emerged as a significant unique predictor of depression [b = –.39; t(69) = –2.76, p = .007], with the gender labeling [b = .14; t(69) = 0.83, p = .41] and observing [b = .07; t(69) = 0.47, p = .64] conditions failing to make unique contributions. A plot depicting the inverse relationship between HAM-D17 depression scores (anxiety-residualized) and affect-labeling-evoked amygdala activation (residualized on gender-labeling- and observing-evoked amygdala activation) in TRD patients is presented in Fig. 1 (middle panel).

Across task conditions, blunted amygdala activation was not significantly related to number of prior depressive episodes or to length of illness in the TRD patients (all ps > .05). A table of the uncorrected correlations between amygdala activation during each task condition and the clinical variables is presented in Table S1 of the supplemental materials.

Treatment response

Following treatment at Week 8, independent t tests indicated greater percent reductions in depression severity among MBCT participants than among HEP participants [t(63) = –2.67, p = .01, d = –0.67]. We observed no significant differences in depression reduction between treatment groups at Week 24 [t(60) = –1.57, p = .12, d = 0.41], 36 [t(59) = –0.70, p = .49, d = 0.18], or 52 [t(45) = –1.03, p = .31, d = 0.31]. The percent reduction in depression severity for each group at each time point is depicted in Fig. 2.
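
For readers who want to see how a t statistic and Cohen's d relate in a two-group comparison of this kind, here is a minimal Python sketch. It assumes a pooled-variance (Student) t test, which is consistent with the reported degrees of freedom (n − 2) but is an assumption about the analysis; the function name and the data are hypothetical.

```python
import math


def pooled_t_and_d(group_a, group_b):
    """Pooled-variance t statistic, Cohen's d, and df for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((v - mean_a) ** 2 for v in group_a) / (n_a - 1)
    var_b = sum((v - mean_b) ** 2 for v in group_b) / (n_b - 1)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    d = (mean_a - mean_b) / math.sqrt(pooled_var)
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, d, df


# Made-up percent-reduction scores for illustration only (not study data).
group_one = [10.0, 20.0, 30.0, 40.0]
group_two = [30.0, 40.0, 50.0, 60.0]
t_stat, cohens_d, df = pooled_t_and_d(group_one, group_two)
print(round(t_stat, 3), round(cohens_d, 3), df)
```

Under this formulation t and d always share the same sign, because both are the same mean difference scaled by a positive quantity.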

Fig. 2

Percent reduction in Hamilton Rating Scale for Depression (HAM-D17) scores following completion of mindfulness-based cognitive therapy (MBCT; solid black line) or the health enhancement program (HEP; dashed gray line) at Weeks 8 (MBCT n = 33, HEP n = 32), 24 (MBCT n = 32, HEP n = 30), 36 (MBCT n = 31, HEP n = 30), and 52 (MBCT n = 24, HEP n = 23). Error bars indicate ±1 SE

Predictors of treatment response

For each follow-up assessment, the HAM-D17 depression percent change scores were regressed on the three task-condition-evoked amygdala activation measures (affect labeling, gender labeling, and observing), treatment group (MBCT vs. HEP), and their interactions. In no case did the addition of the interaction terms significantly improve the fits of the regression models (all ps > .05), indicating no significant differences in the slopes of the regression lines between the treatment groups. After dropping the interaction terms and assuming common slopes between the groups, tests of the unique contributions of each of the task-condition-evoked amygdala activation measures failed to show any significant unique contributions to the prediction of depression change at any follow-up assessment (all ps > .05). Accordingly, in order to examine whether amygdala activation more generally predicted treatment response within the treatment groups, the regression models were repeated using mean amygdala activation across the three task conditions. Again, significant Group × Amygdala Activation interactions were not found at any follow-up assessment (all ps > .05), supporting the evaluation of common slopes across the groups. The only significant common slope relationship between mean amygdala activation and the HAM-D17 percent change scores was observed at the 52-week follow-up assessment [b = .19; t(44) = 2.57, p = .014]. As is shown in Fig. 1 (right panel), the greater the mean amygdala activation at baseline, the greater the improvement in depression severity at the 52-week follow-up.

Medication effects

Estimates of medication strength did not change significantly over the assessment time points in the TRD patients (p = .27). In addition, for all affective face conditions, amygdala activation in the TRD patients was unrelated to variation in medication strength at any time point (all ps > .05).

Discussion

The present study investigated amygdala reactivity during emotional face processing and its relation to concurrent and long-term depression severity in a sample of TRD patients who received MBCT or HEP treatment. As compared to HCs, TRD patients had blunted amygdala activation during affect labeling that was related to greater depression severity. Immediately following treatment, MBCT participants showed greater reductions in depression severity than HEP participants, but this difference was not present at any later time point. Whereas amygdala activation during affect labeling, relative to gender labeling and passive observing of emotional faces, was not uniquely predictive of treatment outcomes, greater mean amygdala activation across task conditions was predictive of improvement in depression severity across treatment groups at 52 weeks. However, this predictive relationship was not evident at earlier follow-up assessments (8, 24, or 36 weeks). These results suggest that treatment modality differentially influenced short-term changes in depression severity, but that baseline amygdala reactivity during affect labeling was predictive of concurrent depressive symptom severity, whereas amygdala reactivity to emotional faces more generally, irrespective of the specific task instructions, was predictive of long-term clinical outcomes.

As compared to HCs, TRD patients exhibited blunted amygdala activation during affect labeling that was associated with greater concurrent depression severity. These findings contrast with literature on MDD suggesting that depressed individuals demonstrate enhanced amygdala activation to emotional stimuli, even in largely medicated samples (Anand et al., 2005; Fournier et al., 2013; Palmer, Crewther, & Carey, 2015; Sheline et al., 2001), and that this activation increases with greater depression severity (Lee et al., 2007). However, there have also been reports of normal amygdala activation in MDD patients (Van den Bulk et al., 2014), as well as blunted amygdala reactivity in depressed adults (Drevets, 2001), depressed children (Thomas et al., 2001), and children with severe mood dysregulation (Brotman et al., 2010). Heterogeneity in emotional reactivity in MDD has been attributed to moderating factors including subtypes of depression, number of previous episodes, and concurrent diagnoses that are not consistently reported (Bylsma, Morris, & Rottenberg, 2008). Very few neuroimaging studies have investigated TRD specifically; those that have done so reported no differences in amygdala activation during the processing of sad faces (Murrough et al., 2015) and no differences in amygdala resting-state connectivity (Lui et al., 2011). The present study adds to these limited reports by also demonstrating no differences in amygdala activation during passive viewing of emotional faces between HCs and TRD patients, but blunted amygdala activation during affect labeling. Together, these conflicting findings suggest that, although the average pattern of activation in MDD may be one of amygdala hyperactivation to emotional stimuli, there is clinically relevant variation in amygdala reactivity. In particular, some forms of depression, including TRD, may be characterized by blunted amygdala activation during explicit emotional processing that becomes more pronounced with greater depression severity.

Consistent with results from the RCT investigating the effects of MBCT versus HEP on depression outcomes (Eisendrath et al., 2016), the MBCT participants in the present study demonstrated greater percent reductions in depression severity than did the HEP participants at Week 8 (45% vs. 28%). Treatment differences were not observed in longer-term clinical outcomes, although both groups demonstrated sustained long-term reductions in depression severity. At Week 52, greater baseline mean amygdala activation across task conditions significantly predicted the degree of improvement in depression severity across both treatment groups. Together, these findings suggest that the differential effect of treatment may be relatively transient, and that baseline amygdala reactivity during emotional face processing, irrespective of the task instructions, may better predict long-term depression outcomes.

Our findings are consistent with reports suggesting that depressed individuals who demonstrate blunted amygdala reactivity to emotional stimuli prior to treatment have worse long-term clinical outcomes (Phillips et al., 2015). In depressed patients, enhanced amygdala activation during emotional face processing predicted greater depression improvement nearly a year later, independent of treatment received (Canli et al., 2005). Furthermore, exaggerated amygdala response to emotional words predicted greater depression improvement following CBT (Siegle et al., 2006). SSRIs and CBT are thought to target common neural mechanisms, including amygdala activation, and to influence emotional processing (DeRubeis, Siegle, & Hollon, 2008; Sheline et al., 2001). If these treatments function by regulating exaggerated amygdala response to emotional stimuli (DeRubeis et al., 2008), then they may not be particularly effective for TRD patients, who demonstrate blunted responses to emotional stimuli.

In healthy individuals, affect labeling has been associated with reduced amygdala activation relative to observing or gender labeling, and has been considered a form of emotion regulation (Lieberman et al., 2007). In contrast, HCs in our study were characterized by greater amygdala activation during affect and gender labeling than during observing. Decreased amygdala activation during affect as compared to gender labeling has not consistently been reported in healthy individuals (Gee et al., 2015; Gorno-Tempini et al., 2001; Lange et al., 2003), and some studies have reported enhanced amygdala activation during affect labeling relative to implicit processing (Gur et al., 2002; Habel et al., 2007). These discrepant findings suggest that although affect labeling may down-regulate amygdala activity in some cases, it may also increase amygdala activation. Indeed, a meta-analysis of fMRI face processing studies concluded that explicit, relative to implicit, processing of emotional faces was associated with greater amygdala activation (Fusar-Poli et al., 2009). In our sample, enhanced amygdala activation during affect and gender labeling represented the normal response to the task, whereas blunted amygdala activation not only characterized TRD patients but was also systematically predictive of greater depression severity.

One limitation of this study is that positive and negative faces were intermixed in a block design. This design was chosen to optimize power, to mirror previous studies using this paradigm, and to prevent amygdala habituation to the repeated presentation of negative expressions; however, it also limits our ability to isolate the amygdala effects to either positive or negative facial expressions. Rather than being valence specific, these results reflect emotional processing in general. However, the amygdala is responsive to both pleasant and unpleasant facial expressions, and the bilateral amygdala is responsive to directing attention to the emotional features of the faces (explicit face processing), regardless of the valence of the emotional expression (Costafreda, Brammer, David, & Fu, 2008; Fusar-Poli et al., 2009). Future studies may wish to examine positive and negative faces independently in order to clarify whether blunted amygdala activation in TRD is specific to negatively valenced faces, which comprised the majority of our trials. A second limitation is the lack of a treatment-responsive group of depressed patients, preventing us from directly testing whether these findings are unique to TRD patients or are characteristic of MDD patients in general. In addition, although these findings suggest that TRD is associated with blunted amygdala activation during affect labeling, in the absence of longitudinal data we are unable to determine whether abnormal amygdala activation predates TRD, or whether it occurs as a function of illness course or of variables that are secondary to illness, including prolonged exposure to antidepressant medication. In particular, we cannot rule out the possibility that the antidepressant medications taken by all of the TRD patients at the time of the baseline fMRI assessment contributed to dampened emotional responding and blunted amygdala activation. Finally, whereas both HEP and MBCT groups demonstrated sustained improvements in depression throughout the follow-up period, it is possible that a beneficial effect of MBCT over HEP beyond eight weeks might have emerged if MBCT treatment had been continued for a longer period or if the implemented treatment had been augmented with booster sessions over the subsequent year.

In summary, TRD appears to be characterized by blunted amygdala activation during explicit emotional face processing that worsens with greater depression severity. MBCT is associated with greater reductions in depression severity than HEP directly after treatment; however, after 52 weeks, treatment no longer differentially impacts depression severity, and greater pre-treatment amygdala activation in response to emotional faces, irrespective of specific task conditions, predicts depression improvement in both treatment groups.

TRD is characterized by a lack of response to SSRI medications, which are thought to target abnormal neural circuitry related to affective processing, including amygdala hyperactivation (DeRubeis et al., 2008). The present study provides evidence that TRD patients do not demonstrate the same affective processing abnormalities as MDD patients in general, which may provide some mechanistic explanation for why medications that target this circuitry are not effective at improving symptoms in TRD patients. Because TRD patients account for a major share of the social and economic burden of depression as a whole, understanding specific neurobiological features of this group and their relation to clinical outcomes will add necessary nuance to depression models and may ultimately result in better diagnosis and treatment of TRD.