Introduction

Anxiety and depressive disorders account for more than 70 million years lived with disability globally each year [1]. Several effective treatments exist, but outcomes following psychological therapy are variable and many individuals do not experience improvement [2,3,4,5]. Development of a statistical model that predicts the likelihood of success following each available treatment, for each patient, on their first assessment is a major research goal. Identifying clinical, sociodemographic and genetic predictors of treatment outcome could facilitate this goal. Well-replicated associations with poor outcomes following treatment include clinical factors related to symptom severity, chronicity and comorbidity, such as pre-treatment symptom severity [6, 7], duration of illness [8,9,10], age of onset [11], history of previous treatments for major depressive disorder [12], functional impairment [13, 14] and comorbid anxiety disorders [15], dysthymia [16] or personality disorder [17,18,19,20]. Associations with sociodemographic factors have been less consistent. In large meta-analyses, no effects of age or sex on therapy outcomes were observed [11]. Lower cognitive ability [12] and social support [11], and stressful life events have been associated with unfavourable treatment outcomes [11]. Furthermore, lower levels of patient engagement and poor therapeutic alliance are associated with less favourable outcomes [6, 21].

Despite evidence for many associated factors, few statistical models can prospectively predict mental health treatment outcomes outside of the settings used to develop the models [22, 23]. A common explanation for the inconsistent associations across studies is low statistical power due to aetiological heterogeneity, small sample sizes and small effect sizes [23]. Efforts to increase statistical power have typically focused on either increasing sample size, or restricting analyses to more homogeneous subgroups. Cohort studies often collect detailed social, demographic and clinical data and may provide opportunities for further data linkage with national registries [24]. As such, observational cohorts may provide increased sample sizes for testing associations across a broad range of variables with outcomes following therapy. While such associations may only have small individual effects, they may eventually be combined with a range of factors to produce clinically meaningful prediction models. Furthermore, larger samples could allow for analyses to be stratified into more homogeneous subgroups and for interactions between predictive factors to be tested [23]. Linkage between cohort studies and medical records is the optimal strategy to achieve this aim; however, there are various challenges associated with data linkage [25]. Supplementary approaches such as retrospective data collection in existing cohorts may also help to provide the boost in sample size necessary for analyses of therapy outcomes. For example, some cohorts have made use of a single, self-report item to measure depression, which has been described as a minimal-phenotyping approach [26, 27]. In a UK Biobank study, a single item: “self-reported past treatment-seeking for problems with nerves, anxiety, tension or depression”, was used as a “broad depression” measure [28]. This measure increased the number of “depression cases” from 8,276 to 113,769 and 14 novel genetic associations were discovered [28]. While minimal phenotyping will likely measure somewhat different constructs from clinical assessments [26], the benefit of such strategies is that data can be collected on a larger scale, faster and at a substantially lower cost than clinically ascertained outcome measurements [23, 27]. Thus, such approaches should not replace traditional methods of measurement but can be used in large population-based cohorts to collect data on therapy outcomes [23]. Exploratory analyses in observational cohorts could be used to generate hypotheses about new prognostic factors that can subsequently be tested in prospective clinical studies. However, it will also be necessary to assess the validity of phenotypes derived from minimal approaches, by comparison with gold-standard approaches.

We used the Global Rating of Change (GRC) to retrospectively measure patient-perceived outcomes following psychological therapy in a large observational cohort. The patient-rated GRC has high test-retest reliability [29], high face validity [30] and has been used in research to calculate the minimal clinically important change on symptom questionnaires [31]. For example, the minimum change in depression symptom scores (measured using the Patient Health Questionnaire 9-item version; PHQ-9 [32]) that was associated with reporting feeling “better” on the GRC was a reduction of ~1.7 points (~21%) [33]. We aimed to replicate associations from the literature between psychological therapy outcomes measured using clinical scales and sociodemographic, clinical and therapy-related factors. Based on the literature, we hypothesised that individuals reporting less favourable outcomes would have higher symptom severity, chronicity and comorbidity (more episodes, earlier age of onset, more comorbid diagnoses, higher personality disorder symptom scores) and individuals reporting more favourable outcomes would have higher educational attainment. Given that evidence is less clear for other sociodemographic factors, we did not hypothesise about the effects of age, sex or ethnicity. Additional therapy-related factors were drawn from the therapy questionnaire and were included as covariates rather than primary analysis variables. We sought to demonstrate the utility of minimal phenotyping to generate a large sample for detecting expected associations with treatment outcomes.

Methods

Participants

Participants were from the Genetic Links to Anxiety and Depression (GLAD) Study, an online study of UK residents recruited via a nationwide advertising campaign and through NHS services [24]. Participants were recruited if they met criteria for a lifetime diagnosis of an anxiety or depressive disorder based on the Composite International Diagnostic Interview Short Form (CIDI-SF) [34, 35]. At GLAD sign-up, data were collected on demographics, mental and physical health symptoms and disorders, and a range of psychological and behavioural phenotypes relevant to anxiety and depression [24]. Following sign-up, participants are re-contactable and have the opportunity to participate in additional research studies and phases of data collection. At the time of this analysis, there were 37,413 participants who had completed the GLAD sign-up questionnaire (GSU-Q) between 09/2018 and 10/2020 (79.6% female; aged: 16–80 years, mean = 39, SD = 14).

Between 08/2019 and 10/2020, GLAD participants were invited to take part in a therapy history and outcomes questionnaire (THO-Q) (S. Figure 2). Analyses were restricted to respondents who had received psychological therapy to treat major depressive disorder (MDD), generalized anxiety disorder (GAD), specific phobia (SpP), social phobia (SoP) or panic disorder (PD). Those who reported treatment for other primary psychiatric diagnoses (e.g. personality disorder, bipolar disorder) were excluded from analyses. Psychological treatment included one-to-one cognitive-behavioral therapy (CBT), counseling or other one-to-one therapy and group CBT, counseling, or other group therapy. Those who reported use of other one-to-one or group therapy types were excluded from analyses, because these were mostly non-traditional therapies (i.e. those not typically offered in primary care; e.g. acceptance and commitment therapy, eye movement desensitization and reprocessing). We further restricted analyses to individuals who were at least 18 years old during their therapy and who had received their most recent course of therapy within the last ten years (2010–2020). This timeframe was chosen to increase likelihood of reliable recall and also to account for likely changes in therapy protocols and guidelines.
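The inclusion criteria above can be sketched as a simple filter. This is only an illustration, not the authors' code: the field names, the response labels and the use of age differences to approximate "within the last ten years" are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical labels standing in for the questionnaire response options.
ELIGIBLE_DIAGNOSES = {"MDD", "GAD", "SpP", "SoP", "PD"}
ELIGIBLE_THERAPIES = {
    "one-to-one CBT",
    "one-to-one counseling",
    "group CBT",
    "group counseling",
}


@dataclass
class Respondent:
    """One THO-Q respondent (hypothetical field names)."""
    primary_diagnosis: str
    therapy_type: str
    age_at_therapy: int
    age_at_questionnaire: int


def is_eligible(r: Respondent, max_years_since_therapy: int = 10) -> bool:
    """Apply the stated restrictions to a single respondent.

    Assumption: time since therapy is approximated by the difference
    between age at questionnaire completion and age at therapy.
    """
    years_since = r.age_at_questionnaire - r.age_at_therapy
    return (
        r.primary_diagnosis in ELIGIBLE_DIAGNOSES
        and r.therapy_type in ELIGIBLE_THERAPIES
        and r.age_at_therapy >= 18
        and 0 <= years_since <= max_years_since_therapy
    )


# Example: an adult who received one-to-one CBT for MDD five years ago.
example = Respondent("MDD", "one-to-one CBT", 30, 35)
print(is_eligible(example))
```

Passing a different `max_years_since_therapy` (for example 5 or 15) mirrors the alternative cut-offs used in the sensitivity analyses.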

Measures

Outcome: global rating of change following the most recent course of therapy

Self-rated change in symptoms following therapy was retrospectively reported by participants on the therapy history and outcomes questionnaire. Participants were asked how much their symptoms and day-to-day functioning had improved following their course of talking therapy. There were five response options for this outcome, which was taken from the Global Rating of Change (GRC; [31]): much worse (-2), a little worse (-1), no change (0), a little better (+1), much better (+2; Table 1).

Analysis variables: sociodemographic, clinical and therapy-related factors

We aimed to cover a breadth of factors for which there is some evidence, taking an inclusive approach to ensure a broad base of factors were covered. The participants in this study reported on their history of therapy and outcomes retrospectively. Therefore, we selected only core sociodemographic and clinical factors that reflected their overall course of illness, and therapy factors that related to the specific time that they were receiving their therapy.

Sociodemographic variables were mostly collected during the sign-up questionnaire. For our analysis, these included self-reported biological sex (male/female), ethnic background and education level. Given that the majority of the sample (95.3%) reported their ethnicity to be White British, we dichotomised ethnicity into two groups for comparison: (1) White British and (2) a collective UK ethnic minority group. Education level was dichotomised from six categorical responses, using the median category (University degree: yes/no). Age (at time of therapy in years) was retrospectively reported at THO-Q assessment.

Four clinical factors were included in analyses: (1) self-reported age of onset of the first episode of either depression or anxiety disorder, (2) the number of episodes of illness, (3) a personality disorder symptom score measured using the Standardized Assessment of Severity of Personality Disorder (SASPD [36]), and (4) number of psychiatric comorbidities during therapy. The first three factors were assessed at entry into the study and the fourth was assessed in the therapy history questionnaire.

Several ‘therapy factors’ derived from the therapy history questionnaire were included as explanatory variables or as covariates. Years since therapy was calculated as self-reported age at the time of treatment subtracted from age at time of completing the questionnaire. Primary diagnosis was self-reported MDD, GAD or a combined phobic/panic disorder group (PPD; SpP, SoP or PD). Type of therapy was either one-to-one CBT, one-to-one counseling or group therapy. Given that participants who received any group therapy were a minority, we combined both group CBT and group counseling into one category for analysis. Concurrent medications asked whether the participant was taking antidepressant or anti-anxiety medications during the course of therapy (yes/no). First-time therapy was whether the participant had only ever received therapy once (yes/no). Use of therapeutic activities described whether the participant reported using an activity (e.g. yoga, mindfulness or meditation) as a form of self-help (yes/no).

Statistical analysis

Regression analyses

Univariable and multivariable ordinal logistic regression models were fitted by maximum likelihood estimation using the lrm function from the rms package [37] in R version 3.6.1 [38]. Effect sizes estimated from multivariable and univariable models were compared using two-sample z-tests (S.Table 12). A Brant test [39] was performed to assess the assumption of proportionality in the proportional odds model (S.Table 10). A Bonferroni-corrected p-value threshold was calculated to account for multiple testing, based on the number of effectively independent tests. This number was computed as the number of principal components that explained 99.5% of the variance in the correlation matrix of all explanatory variables (S.Table 9; S. Figure 3; see: [40, 41]). The correlation matrix of explanatory variables also showed that none of the variables selected for the regression analyses were strongly correlated (all absolute r < 0.37). Post-hoc variance inflation factors (VIF) were also computed to assess multicollinearity in the multivariable regression models, using the rms package [37]. The maximum VIF calculated across all analyses was for age of onset (VIF = 1.29, i.e. a variance inflation of 29%, which is moderate but not considered problematic; S.Table 11).
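The multiple-testing correction described above can be illustrated with a short sketch. It assumes the eigenvalues of the correlation matrix of explanatory variables have already been obtained; the eigenvalues and the nominal alpha of 0.05 below are hypothetical and are not taken from the study. The analysis itself was run in R, so this Python version is only meant to show the logic.

```python
def effective_number_of_tests(eigenvalues, variance_threshold=0.995):
    """Count the principal components needed to explain the given
    share of total variance (components taken in decreasing order)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= variance_threshold:
            return count
    return len(ordered)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical eigenvalues for six explanatory variables
# (eigenvalues of a 6 x 6 correlation matrix sum to 6).
example_eigenvalues = [2.4, 1.6, 1.0, 0.7, 0.28, 0.02]

n_effective = effective_number_of_tests(example_eigenvalues)
print(n_effective, bonferroni_threshold(0.05, n_effective))
```

Because correlated predictors share variance, the effective number of tests is at most the number of variables, which makes this threshold no stricter than a naive Bonferroni correction over all variables.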

Sensitivity analyses

Prior to analysis, the characteristics of participants with complete data were compared with participants with missing data (S.Tables 1 and 3). To reduce the impact of missing data on our findings, missing-data indicators were used so that all participants were retained for analysis. For categorical variables, a “missing data” category was coded to replace missing values. For continuous variables, missing values were mean imputed and a dummy-coded “missing data” indicator was included as a covariate. This approach was taken because data were unlikely to be missing at random and thus multiple imputation was not appropriate. As a sensitivity analysis, effect sizes from models with missing data indicators were compared with those estimated in complete-case analyses (S.Table 13).
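A minimal sketch of the missing-indicator approach described above is given here. The function names, the use of `None` for a missing value and the `"missing"` label are assumptions for illustration only.

```python
from statistics import mean


def mean_impute_with_indicator(values):
    """Replace missing continuous values (None) with the observed mean
    and return a 0/1 indicator marking which entries were imputed."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


def add_missing_category(values, label="missing"):
    """Recode missing categorical values (None) as their own category."""
    return [label if v is None else v for v in values]


# Hypothetical data: age of onset with one missing value,
# and therapy type with one missing value.
ages, age_missing = mean_impute_with_indicator([20, None, 30])
therapy = add_missing_category(["one-to-one CBT", None, "group therapy"])
print(ages, age_missing, therapy)
```

Both the imputed variable and its indicator would then enter the regression model, so that no participant is dropped for missing a single covariate.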

We also tested whether time-varying confounding had an impact on our results. Participants were stratified into three groups. Group 1 included participants who received their most recent course of therapy before completing the GSU-Q. Group 2 included participants who were receiving therapy approximately concurrently with completing the GSU-Q. Group 3 included participants who received therapy after signing up and completing the sign-up questionnaire. Results are presented in S.Table 7 and S. Figure 4. Effect size estimates were compared across strata and with the primary analysis using two-sample z-tests (S.Table 14). Finally, to assess the impact of length of time between THO-Q completion and time of therapy, analyses were repeated using a more restrictive cut-off (5 years) and a less restrictive cut-off (15 years) compared with the primary analysis (10 year cut-off). Effect size estimates were compared using two-sample z-tests (S.Table 15).
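The two-sample z-test used to compare effect size estimates can be sketched as follows. This is the standard form for two estimates with known standard errors and assumes the estimates are independent, which is only approximately true when the compared samples overlap. The coefficients and standard errors in the example are hypothetical.

```python
import math


def two_sample_z_test(beta_a, se_a, beta_b, se_b):
    """Compare two regression coefficients (e.g. log odds ratios).

    Returns the z statistic and the two-sided p-value under a
    standard normal reference distribution.
    """
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical log odds ratios from two strata.
z, p = two_sample_z_test(beta_a=-0.30, se_a=0.10, beta_b=-0.10, se_b=0.12)
print(round(z, 3), round(p, 3))
```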

Results

Characteristics of the analysis sample

At the time of analysis, 4814 GLAD participants (12.9% of 37,413; S. Figure 1) responded to the therapy history and outcomes questionnaire and 4380 respondents reported receiving psychological therapy at least once (90.1% of THO-Q respondents). Within these therapy receivers (79.8% female), 632 (14.4%) had received one course of therapy and 3727 (85.6%) had received more than one course of therapy in their lifetime. All subsequent analyses related to each participant’s most recent course of therapy. There were 2890 GLAD participants who reported receiving psychological therapy for a disorder of interest as an adult (i.e. aged 18+ at the time of receiving therapy), in the previous ten years (2010–2020). Further details on all analysis variables are provided in Table 1 and S.Tables 1, 2 and 3.

Table 1 Analysis variables in a subsample of the Genetic Links to Anxiety and Depression (GLAD) study participants (n = 2890) who received psychological therapy (cognitive behavioural therapy or counseling) for major depressive disorder, generalised anxiety disorder, or phobic/panic disorders

Factors associated with retrospectively self-reported therapy outcomes

For a summary of our findings see Table 2. Overall, the findings for univariable and multivariable analyses were consistent. Brant tests showed that the assumption of proportionality in the proportional odds model was not violated (Omnibus test: p = 0.31; S.Table 10). Factors associated with less favourable outcomes were: number of episodes, personality disorder symptoms, being male, and receiving therapy for the first time (versus having had a previous course of therapy). Factors associated with favourable outcomes were: having university-level education, and use of an additional therapeutic activity. After adjustment for all other factors and multiple testing, only four factors had statistically significant associations: more illness episodes and greater personality disorder symptom severity were associated with poor outcomes, and higher educational attainment and reported regular use of a therapeutic activity were associated with favourable outcomes.
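To make the proportionality assumption concrete, the sketch below writes out a proportional odds model for a five-level outcome such as the GRC. It uses the parameterisation P(Y ≥ j) = expit(α_j + βx), with one intercept per threshold and a single shared slope. All numerical values are hypothetical and are not estimates from this study.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def odds(p):
    return p / (1.0 - p)


def category_probabilities(intercepts, linear_predictor):
    """Probability of each ordered category, lowest to highest.

    `intercepts` must be decreasing: one per threshold
    (here GRC >= -1, >= 0, >= +1, >= +2).
    """
    at_or_above = [expit(a + linear_predictor) for a in intercepts]
    bounds = [1.0] + at_or_above + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(bounds) - 1)]


# Hypothetical threshold intercepts and a hypothetical log odds ratio
# for a one-unit increase in a predictor (negative = poorer outcome).
INTERCEPTS = [3.0, 2.0, 0.5, -1.0]
BETA = -0.4

print(category_probabilities(INTERCEPTS, 0.0))   # predictor = 0
print(category_probabilities(INTERCEPTS, BETA))  # predictor = 1

# Under proportional odds, the cumulative odds ratio is the same
# at every threshold and equals exp(BETA).
for a in INTERCEPTS:
    print(odds(expit(a + BETA)) / odds(expit(a)))
```

The Brant test asks whether the data are compatible with this single shared slope, as opposed to a separate slope at each threshold.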

Table 2 Summary statistics from multivariable (MV) and univariable (UV) proportional odds ordinal logistic regression models using maximum likelihood estimation to test for associations between self-rated therapy outcomes (global rating of improvement) and sociodemographic, clinical and therapy factors self-reported in a subsample of the Genetic Links to Anxiety and Depression (GLAD) study participants (n = 2890) who received psychological therapy (cognitive behavioural therapy or counseling) for major depressive disorder, generalised anxiety disorder, or phobic/panic disorders

Sensitivity analyses

Sensitivity analyses showed that there were no statistically significant differences between effect sizes estimated in the primary analyses, which used missing data indicators to retain all participants for analysis (n = 2890), and analyses restricted to participants with complete data (n = 2783; S.Table 13).

When analyses were stratified by timing of therapy relative to GSU-Q, only one notable difference was observed. For participants who received their course of therapy after completing the GSU-Q (n = 1437), concurrent medication use was significantly associated with poorer outcomes, whereas for individuals who received therapy prior to (n = 944), or concurrently with (n = 701), the GSU-Q, non-significant effects towards more favourable outcomes were observed (S.Tables 7 and 14).

Finally, we also compared analyses based on different intervals of time (5 years, n = 2511; 10 years, n = 2890; 15 years, n = 3082) between receiving therapy and response to the THO-Q. We observed no statistically significant differences between effect sizes estimated using these different cut-offs (S.Table 15). This suggests that reliability of recall or changes in therapy protocols have not had a large impact on responses over these time frames.

Discussion

We tested associations between self-reported therapy outcome and patient characteristics, replicating four associations with therapy outcomes identified using traditional measures of symptom change. Personality disorder symptoms [19, 20] and the number of recurrent episodes [11] were associated with poorer self-reported outcomes. Higher educational attainment (i.e. obtaining a university degree) and reported regular use of a therapeutic activity were associated with more favourable outcomes.

These findings are in keeping with the previous literature which suggests that more complex cases may require multimodal or long-term support [13]. In previous studies, these markers of case complexity have included personality disorder symptoms, comorbidity and chronicity, of which we found strong evidence only for personality disorder symptoms. Notably, high burden from personality disorder symptoms has been reported in primary care samples in the UK [42]. Whilst there was an association between psychiatric comorbidity and poor treatment outcomes in the univariable model, this association did not remain significant in the multivariable model.

The positive association detected between educational attainment and favourable treatment outcomes is also consistent with prior studies which have found lower cognitive ability to be associated with poorer outcomes following psychological therapy [12]. Reporting use of an additional therapeutic activity was also associated with better outcomes. It makes sense that individuals who used an additional therapeutic activity as well as psychological therapy would experience a greater improvement in symptoms, especially as some of these are evidence-based interventions for depression or anxiety [43]. Additionally, some may have been related to interventions discussed in therapy or part of a relapse-prevention plan agreed at the end of the course of therapy. For example, exercise is an efficacious treatment for mild-to-moderate depression [44] and mindfulness-based cognitive therapy is recommended in the NICE guidelines as an intervention for recurrent depression [45].

Overall, the findings from this study are as expected. All factors tested have small effect sizes and have the expected direction of effect based on the prior literature. While far from conclusive, this pattern of findings suggests that retrospective measures of therapy outcomes may have some value as a supplementary approach towards increasing sample sizes and identifying new predictive factors. Larger samples may provide additional statistical power such that analyses can be stratified into more homogeneous subgroups and allow for interactions between predictive factors to be assessed [23].

Moving forward, it will be particularly important to formally assess the validity of retrospective self-report approaches for measuring therapy outcomes, by collecting concurrent data on symptom severity. The GRC has high test-retest reliability [29] and high face validity [30] in studies of chronic pain. However, qualitative research has identified discrepancies between the GRC and PHQ-9 in patients with MDD [46]. That study found that participants attributed discrepancies to (1) differences in accuracy between the measures (citing the GRC as more accurate); (2) impact of recent life events at time of measurement; (3) influence of self-motivation (desire for improvement on PHQ-9); and (4) poor recall when reporting the GRC.

We acknowledge that retrospective self-report data collected from a volunteer cohort is not a gold-standard approach and is likely to be noisier than clinician-rated data, where greater specificity can be achieved. For example, self-reported data is less reliable than clinically ascertained data [47], and selection bias is a well-known limitation of such cohort studies [48]. However, we would argue that a range of approaches to data collection is needed to provide the huge sample sizes that will be required to detect interactions associated with treatment outcomes, in order to develop predictive tools to deliver stratified mental health care.

Another limitation of the study is the time-varying nature of data collection. Ideally, the ordering and intervals between data collection time-points would be the same for all participants. However, we aimed to use a “minimal” broad-brush approach with the ultimate goal of increasing sample size. Notably, when we stratified analyses by approximate timings, we found no striking differences between strata.

In sum, we used a minimal phenotyping approach, a single questionnaire item collected retrospectively, to assess therapy outcomes in a large volunteer cohort of individuals with anxiety or depression. We tested for associations with sociodemographic, clinical and therapy-related factors and replicated four previously reported associations. When traditional approaches are impractical, this approach provides an alternative to collect therapy outcome data quickly and inexpensively on a large scale. However, more work is required to assess the validity of retrospectively self-reported therapy outcome measures. If such measures are highly correlated with traditional outcomes, they may hold promise towards providing the sample sizes that will likely be required to develop predictive tools for stratified mental health care.