Introduction

Depression is currently the fourth leading cause of illness and disability among adolescents aged 15–19 years and fifteenth for those aged 10–14 years globally [1]. Different treatment approaches, including watchful waiting, medication, cognitive behaviour therapy (CBT), interpersonal therapy (IPT) and non-directive supportive therapy (NDST), are recommended for young people with depression in the UK [2]. These are predominantly delivered through Child and Adolescent Mental Health Services (CAMHS), the UK NHS-based services responsible for providing mental health support to children and young people. However, with increasing demand and limited resources, CAMHS have long waiting lists and high thresholds for accepted referrals [3], resulting in many young people with depression not receiving timely clinical interventions. Lack of, or delays in, treatment often lead to more severe presentations of depressive illness over time [4]. In response, more emphasis has been placed on expanding therapy provision into non-NHS services, including schools [5]. It is therefore opportune to examine options that may be effectively delivered across non-NHS services and by professionals not formally trained in providing mental health support.

One treatment that can be delivered within various settings and by professionals of different levels of expertise [6], including non-specialists outside clinical services [7,8,9], is Behavioural Activation (BA). The fundamental aim of BA is to increase positive reinforcement through restoring and increasing engagement in purposeful and rewarding activities. The hope is that increased engagement has a positive emotional impact on an individual’s mood, relationships and energy [10]. This brief psychological treatment requires fewer sessions and shorter training than several more established therapies (e.g., CBT, IPT), making it a less resource-intensive alternative. Furthermore, BA’s focus upon withdrawal, inactivity, and avoidance, which are common symptoms in young people with depression [11], may make it better suited for this group.

The National Institute for Health and Care Excellence (NICE) in the UK recommends BA for adults with depression [12], but less is known about its delivery with young people. BA does not currently feature in any national [2] or international [13] recommendations for depression in young people. A 2017 systematic review examined the effectiveness of BA in the treatment of depression in young people [14]. The review included ten studies, of which three were Randomised Controlled Trials (RCTs) that were pooled in a meta-analysis. The results demonstrated an effect in favour of BA over its comparators (CDRS-R: -4.2; 95% CI -8.25, -0.09). Whilst these findings suggested that BA may be effective in treating depression in young people, the paucity of studies highlighted the necessity of further research. A later meta-analysis [15] including four BA-focused RCTs, two of which were included in the meta-analysis by Tindall et al. [14], reported an effect in favour of BA vs. controls (1 active intervention, 1 signposting and 2 no treatment) with a pooled standardised mean difference of -0.7 (95% CI -1.20 to -0.20).

Since we completed our literature search in 2015 for our original review [14], increased focus has been placed on BA for young people. In 2021, the first known European feasibility RCT of BA for young people with depression [16] was conducted. The acceptability of BA and its promising outcomes when delivered in CAMHS or in school settings were shown in case reports [17,18,19] and pilot and feasibility studies [20, 21].

To broaden the provision of mental health interventions in the community, online delivery can be used to increase anonymity [22] and accessibility [23] and to reduce stigmatisation [24]. While Tindall et al. [14] did not identify any online versions of BA, the general shift to online therapy was expedited by the COVID-19 pandemic, urging further examination in this area.

Aims and objectives

As recent years have seen an increased number of studies on BA with children and young people, we updated our earlier review by Tindall et al. [14] and we have included additional searches for economic evidence and for remote delivery by phone or online. The current review summarises the most up-to-date study-level evidence to answer four questions: (1) Is BA effective in treating young people with depression? (2) Does BA for depression improve comorbid symptoms of anxiety and quality of life? (3) Is BA for young people with depression cost-effective? (4) Can BA be delivered online or by telephone rather than in-person?

Methods

We registered the review protocol on the International Prospective Register of Systematic Reviews (PROSPERO) (reference: CRD42023410806) and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [25] to guide our methods.

Information sources and screening

In March 2023 and February 2024, the following electronic databases were searched: Cochrane Library, EMBASE, MEDLINE, CINAHL Ultimate (EBSCO), PsycINFO, Scopus, and the ISRCTN registry. To cover peer-reviewed and grey literature sources, the Health Management Information Consortium, Open Grey, the Networked Digital Library of Theses and Dissertations and Web of Science Conference Proceedings were searched. The reference lists of all included studies were examined and reverse citation searching was completed in Google Scholar. We did not impose any restrictions on publication status or language.

All titles and abstracts identified were double screened by three reviewers (LT, PK, EH) against the pre-defined eligibility criteria. Where there was any uncertainty regarding a study’s inclusion, it was retained for full text screening. The same three reviewers conducted the full text screening with two reviewers independently screening each paper. Any disagreements were discussed until a consensus was reached. All screening was undertaken using Rayyan Software [26].

The search strategy (Supplementary Information S1) was based on three main constructs: behavioural interventions (i.e. behavioural activation, behavioural therapy, behavioural interventions, self-monitoring, activity scheduling), depression (i.e. depressive disorder, depressive, depression, depressed) and young people (i.e. adolescents, children, teen, youth, juvenile, pre-pubescent, student). The search period was set from 2015 to the present.

Inclusion/exclusion criteria

Studies were included if at least 90% of their sample consisted of ≤ 18-year-olds with a diagnosis of depression, or with symptoms likely to be of diagnosable depression, as established by a validated screening tool or diagnostic manual. We included studies in which BA was (a) based on a schedule of enjoyable, purposeful and rewarding activities; (b) designed for depression; (c) offered as a standalone intervention, or as the core/dominant intervention, rather than as an equal part of a multicomponent intervention; and (d) delivered in any setting (e.g. schools, health services, community) and in any mode (e.g. face-to-face, online, by phone).

We included all types of quantitative study designs: RCTs, observational studies, pre-post evaluations and case studies. The primary outcome was depression/depressive symptoms measured by validated instruments, including self-report questionnaires and clinician/researcher administered measures. We were also interested in comorbid anxiety symptoms (measured by validated assessments), cost-effectiveness data and quality of life (QoL) outcomes. No restrictions were placed on the length of follow-up (when outcomes were measured) but we only included studies with at least two assessment points, one of which was at baseline.

Data extraction

Three reviewers used a data extraction proforma, which was first piloted for consistency with three papers, to record the following information as reported in the included studies: study characteristics (study name, author(s), year of publication/production, location, and setting), study design, study populations (basic participant demographics, depression diagnosis methods), intervention and comparator details (intervention/comparator type, duration, session number), and relevant outcome data for effect size calculations (depression severity, unit of measurement).

Quality assessments

The original review [14] assessed the quality of all included studies using the Moncrieff Scale [27] plus the Cochrane Risk of Bias (RoB) tool [28] for the RCTs only. In this review, we used the Cochrane RoB tool (newest version, RoB-2) [29] to assess the quality of the included RCTs. Based on this tool, RCTs were graded in terms of their ‘bias’ as either ‘low risk’, ‘some concerns’ or ‘high risk’. We used the Moncrieff Scale to assess the quality of both RCTs and pre-post evaluations, by attributing a score of 0, 1 or 2 to 23 risk items, with higher scores denoting higher study quality.

Data synthesis

We carried out a narrative synthesis of the results of all 24 studies and a meta-analysis of RCTs that reported outcomes based on the Children’s Depression Rating Scale – Revised (CDRS-R) [30], using a random-effects model and displaying the results in forest plots. All analyses were undertaken in Stata version 18 [31].

Two RCTs [16, 32] reported complete cases whilst four reported imputed results [23, 33,34,35]. For our analyses, we included the primary results as reported by authors, irrespective of whether the results were based on complete cases or imputed outcomes. We defined our primary outcome as the data reported by a study at end of treatment or at the earliest follow-up point, although this may not have necessarily been the primary follow-up point for the study. Where studies had three treatment arms, and one of these was BA, this was taken as the intervention and the other treatment arm that was a placebo/waiting list (rather than another active treatment) was selected as the comparator.

In one study [34] two forms of BA were compared with usual care. To address this within the meta-analysis, and in alignment with the Cochrane handbook for systematic reviews of interventions recommendations (version 6.3) [36], the pooled mean and standard deviation (SD) were calculated based on the two intervention groups and formed a single BA group. The meta-analysis included only self-reported outcomes by young people and not parent-reported outcomes.
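The group-combining step described above can be illustrated with a short Python sketch. It applies the standard formula for merging the sample sizes, means and SDs of two groups into a single group. The function name and the input values are hypothetical and are not data from Grudin et al. [34]; the review's own analyses were run in Stata.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two groups' summary statistics into one group.

    Returns (n, mean, sd) of the combined sample. The SD formula
    accounts for the distance between the two group means, so the
    result equals the SD of the pooled raw data.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (m1 - m2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(var)


# Hypothetical check: raw scores 1, 2, 3 (mean 2, SD 1) and
# 4, 5, 6, 7 (mean 5.5, SD sqrt(5/3)) combine to the scores 1..7.
n, mean, sd = combine_groups(3, 2.0, 1.0, 4, 5.5, math.sqrt(5 / 3))
print(n, round(mean, 2), round(sd, 2))  # 7 4.0 2.16
```

Because the combined SD includes the between-group term, it is generally larger than a simple average of the two group SDs whenever the group means differ.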

Statistical heterogeneity was assessed using the I² statistic, with a value of 25% regarded as low, 50% as moderate, and 75% as high [37]. Publication bias was assessed using funnel plots.
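For readers unfamiliar with the heterogeneity statistic, the following minimal Python sketch shows how I² is commonly derived from Cochran's Q under inverse-variance weighting. The function names and the example inputs are hypothetical; this is an illustration of the general calculation and not a reproduction of the Stata output used in this review.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(q, k):
    """I² as a percentage, truncated at zero; k is the number of studies."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical inputs: two study effects with equal variances.
q = cochran_q([0.0, 1.0], [0.25, 0.25])
print(q, i_squared(q, k=2))  # 2.0 50.0
```

When Q is no larger than its degrees of freedom, I² is reported as 0%, which is the situation described for the pooled analyses in the Results.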

Results

Retrieved and selected studies

We conducted our original searches in July and August 2015, identifying 5,931 records, of which 5,495 were screened, after removing duplicates (n = 436). Title and abstract screening identified 42 full-text articles of which ten were eligible for inclusion. We updated the same searches in March 2023 and February 2024 and identified 25,414 records, of which 17,024 were screened, after removing duplicates (n = 8,390). Reverse citation searching identified one additional eligible paper [38] that had not been identified in the original review.

Title and abstract screening identified 62 articles for full-text review, of which 14 were eligible and 48 were excluded for the following reasons: the sample was predominantly over 18 years old (fewer than 90% aged ≤ 18 years) (n = 27), the sample included participants without depression at baseline (n = 11), the intervention was mixed and not standalone BA (n = 5), the BA was not developed for depression (n = 1), depression was not included as an outcome (n = 1), the inclusion criteria were not specified (n = 1), pre/post data were not included or were missing (n = 1), and the reporting paper was inaccessible (n = 1). (Reasons for exclusion can be seen in Supplementary Information S2). Where necessary, authors were contacted to request additional information during data extraction.

The PRISMA diagram of the updated searches is presented in Fig. 1; the PRISMA diagram of the earlier review can be found in Tindall et al. [14].

Fig. 1 PRISMA Diagram. From: Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(7): e1000097. doi:10.1371/journal.pmed.1000097

Characteristics of included studies

Study design

A total of 24 studies, published between 1985 and 2024, met our inclusion criteria. Six were RCTs [16, 23, 32,33,34,35] and 18 were pre-post evaluations [17, 18, 38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53] in which outcomes were measured at baseline and at a minimum of one follow-up point. The pre-post evaluations included single case studies, one-group within-participant designs and multiple non-randomised groups.

Sample

Participants were between 8 and 18 years old. The largest study by Schleider et al. [23] included 2,452 participants, whereas the total sample size across the remaining 23 studies was 306 participants, ranging from single case studies to 60 participants [35]. Most studies (n = 15) included a higher proportion (> 50%) of girls/young women (n = 8) or a female-only sample (n = 7), and only 5 studies had a gender balance or included more boys/young men.

Study settings

More than half of the studies (n = 14) took place in the USA [23, 32, 33, 35, 38, 40, 42, 43, 45,46,47,48,49, 52], and the rest were carried out in the UK (n = 7) [16,17,18, 39, 42, 44, 53], Australia [51], Sweden [34] and the United Arab Emirates [50]. BA was delivered in clinics/treatment centres (n = 15), schools (n = 8) and the community (n = 1).

Interventions and comparators

In 20 of 24 studies BA was delivered in-person by professionals including clinical psychologists, students (graduate, doctoral and post-doctoral), mental health clinic staff, social workers, school counsellors, psychological wellbeing practitioners and study therapists. BA was delivered online in two studies [23, 34] and via videocall in two studies [41, 53], one of these due to the COVID-19 pandemic.

The number of BA sessions ranged from 1 to 22, typically delivered weekly and lasting 20–30 min or 1 h. Most studies delivered BA in one-to-one sessions, although two studies [33, 45] delivered BA in groups. Twenty studies followed a standardised treatment manual for professionals, and three of those [17, 41, 51] also used a workbook for young people.

Comparators in RCTs included a waiting list, usual care (e.g., CBT, medication, Psychodynamic Therapy, IPT) and/or alternative psychological and supportive therapies (e.g., Self-Control Therapy, Supportive-Control Therapy, Growth Mindset Single Session Intervention [GM-SSI]) or a psychological placebo.

Outcome measures

CDRS-R [30] and the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS) [54] were the most frequently used depression outcome measures. Eight studies also assessed anxiety. Only two studies [34, 42] reported QoL data, using the Family Quality of Life Scale (FQOLS) and KIDSCREEN-10 index, respectively. The outcome measures are detailed in Table 1.

Table 1 Characteristics of included studies by study design

Quality assessments

Quality of RCTs

Using RoB-2, only Chu et al. [33] was found to have low risk of bias across all 5 assessed domains. There were ‘some concerns’ of bias in Grudin et al. [34], whereas the remaining four RCTs [16, 23, 32, 35] had a high risk of bias overall (Fig. 2). The domains with lowest scores for risk were ‘randomisation process’ and ‘bias in outcome measurements’; the domain with the highest scores was ‘bias due to missing outcome data’.

Fig. 2 RoB-2 – included RCTs (n = 6)

The Moncrieff Scale scores for the RCTs (Table 2) ranged from 26 [16] to 38 [23]. All studies received two points for method of allocation, use of diagnostic criteria, recording exclusion criteria, describing outcome measures clearly, presenting results with inclusion of data for re-analysis of main outcomes and providing justified conclusions. All but one study [35] attained two points for providing objectives, specifications and main outcomes a priori and employing a representative sample. The domains with the highest risk of bias were ‘conducting assessments of treatment compliance’ and ‘providing information on comparability and adjustment for difference in analysis’.

Table 2 Moncrieff Scale – included RCTs (n = 6)

Quality of pre-post evaluations

Moncrieff Scale scores for the pre-post evaluations ranged from 8 to 28 (Table 3). Only one domain, ‘outcome measures described clearly’, attained two points across all studies. Most studies (15/18) attained two points for providing clear descriptions about treatments and sample demographics. None conducted power calculations, blinded assessors, or provided information about side-effects. Furthermore, the Moncrieff Scale guidelines stipulate that samples of < 50 participants receive a score of zero [27]. As the maximum sample size employed in the pre-post evaluations was 36, all received zero on this domain. Across the remaining domains, most studies attained a score of one or two, suggesting measures had been taken to minimise bias.

Table 3 Moncrieff Scale – included pre-post evaluations (n = 18)

Depression outcomes

Supplementary information S3 and S4 give details of individual study results. Below we summarise results across studies for each outcome measure.

Depression outcomes from RCTs

CDRS-R: Reductions in CDRS-R scores were demonstrated in four RCTs [32,33,34,35]. In Grudin et al. [34] scores reduced from pre-to-post-treatment, and then again at 3-months follow-up, in both the guided and self-guided BA groups. In the usual care group, although CDRS-R scores reduced from pre-to-post-treatment, little further reduction was seen at 3-months follow-up. Across time points these reductions were significant for the guided BA group (B = −11.3, p < 0.001, 95% CI −14.9 to −7.7) and the self-guided BA group (B = −10.38, p < 0.001, 95% CI −13.93 to −6.82), but not usual care (B = −4.40, p = 0.077, 95% CI −9.33 to 0.52).

In McCauley et al. [35] 76% of those randomised to BA scored ≤ 40 at post-treatment, indicating a depression diagnosis to be either ‘unlikely’ or ‘possible’, compared to 42% of the usual care group. Chu et al. [33] reported CDRS-R scores reduced from pre-to-post-treatment in the BA group, and increased in the wait-list group but statistical analyses were not performed. Finally, in Stark [32], CDRS-R scores reduced across time-points in all groups (BA, Self-Control Therapy, wait-list). Reductions were the greatest in the Self-Control Therapy group, followed by the BA group. The difference between groups at post-treatment was not significant (p < 0.30).

MFQ/SMFQ: Kitchen et al. [16] administered the Mood and Feelings Questionnaire (MFQ), which generates scores between 0 and 66 with higher scores representing worse mood. Larger mean reductions were reported from baseline to 6-months follow-up in the BA group (-18.11: n = 11, M = 33.91, SD = 11.80 to n = 5, M = 15.8, SD = 6.22) compared to usual care (-8.8: n = 11, M = 35.55, SD = 11.09 to n = 6, M = 26.67, SD = 12.6) on this measure. Grudin et al. [34] reported significant reductions on the Short Mood and Feelings Questionnaire (SMFQ) scores across all groups (guided BA: B = −4.4, p < 0.001, 95% CI −6.2 to −2.6; self-guided BA: B = −3.39, p < 0.05, 95% CI −6.48 to −0.30; usual care: B = −4.04, p = 0.001, 95% CI −6.22 to −1.86). In McCauley et al. [35] SMFQ scores reduced for both BA and usual care from pre-to-post-treatment, but without statistical significance (p = 0.53).

CDI: In Stark [32] CDI scores reduced across all groups from pre-to-post-treatment and then to follow-up, with greater reductions seen in the BA and Self-Control Therapy groups (ANCOVA test p < 0.01). Similar results were found by Schleider et al. [23], who administered the short-form CDI (CDI-SF) and reported reductions across all groups from baseline to 3-months follow-up. Compared to the control group, those in the BA group and the active comparator group (Growth Mindset: GM) demonstrated significant decreases in depression from baseline to follow-up (BA: t(1,673) = −3.62; Padj < 0.001; d = 0.18; 95% CI 0.08 to 0.28; GM: t(1,629) = −3.53; Padj < 0.001; d = 0.18; 95% CI 0.08 to 0.27). No significant differences from baseline to follow-up were found between the two active treatment conditions.

CDS and CES-D: Stark [32] also reported reductions in CDS scores across all groups from pre-to-post-treatment and then to follow-up, with no statistical significance found between groups (p < 0.07). Chu et al. [33] reported larger reductions in CES-D scores for the BA group than the wait-list group, but statistical tests were not performed. Figure 3 provides a graphical display of all depression outcome measures across RCTs.

Fig. 3 Forest plot of outcome measures for depression, by study

Except for Chu et al. [33], all RCTs reported the number of participants and mean (SD) of outcome measures by group. We therefore calculated the standardised mean differences (SMD) for continuous measures using Hedges’ adjusted g [74]. For the studies employing the CDRS-R, this measure was selected for estimating the SMD. The remaining two studies [16, 23] employed only one depression measure each (MFQ and CDI-SF, respectively), which were used to calculate the SMD. The overall effect size was -0.24 (95% CI -0.33 to -0.15), suggesting a significant effect favouring the intervention (Fig. 4). According to Cohen’s conventions [75], an effect size of 0.5 could be considered moderate and 0.2 small. The I² statistic was 0% (p = 0.49), indicating no statistical heterogeneity was present.
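To make the SMD calculation concrete, the sketch below computes Hedges' adjusted g from group sizes, means and SDs, using the widely used approximate small-sample correction factor. The function name and the input values are hypothetical and do not correspond to any included study; the exact correction applied by the software used in this review may differ slightly.

```python
import math


def hedges_g(n1, m1, sd1, n2, m2, sd2):
    """Hedges' adjusted g and its approximate variance.

    Group 1 is taken as the intervention and group 2 as the comparator,
    so a negative g favours the intervention when lower scores mean
    fewer depressive symptoms.
    """
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    s_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / s_pooled
    # Approximate small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return g, j ** 2 * var_d


# Hypothetical example: two groups of 10 with a 2-point difference.
g, var_g = hedges_g(10, 10.0, 2.0, 10, 12.0, 2.0)
print(round(g, 3), round(var_g, 3))
```

The correction factor shrinks the uncorrected standardised difference slightly, which matters most for the small feasibility trials that make up much of this evidence base.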

Fig. 4 Forest plot of standardised mean difference (excluding Chu et al., 2016)

A random-effects meta-analysis of the CDRS-R included four studies [32,33,34,35] (Fig. 5). The overall mean difference was -4.99 (95% CI -8.67 to -1.31) in favour of the intervention (Z = 2.66, p = 0.008). The estimated between-study variance was τ² = 0.0000. The I² statistic was 0% (p = 0.802), suggesting no statistical heterogeneity.
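As a rough illustration of how a random-effects pooled estimate, its confidence interval and Z test fit together, the following Python sketch uses the DerSimonian-Laird estimator of τ². This estimator is an assumption chosen for simplicity: the source states only that a random-effects model was fitted in Stata, not which estimator was used. The function name and inputs are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Random-effects pooling with a DerSimonian-Laird tau² (assumed here)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # tau² is truncated at zero; with fewer than two studies it is set to zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = pooled / se
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "z": z,
        # Two-sided p-value from the standard normal distribution.
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
    }


# Hypothetical mean differences and variances for three studies.
result = random_effects_dl([-6.0, -4.0, -5.0], [9.0, 16.0, 12.0])
print(result)
```

When the estimated τ² is zero, as reported above, the random-effects weights reduce to the fixed-effect inverse-variance weights and both models give the same pooled estimate.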

Fig. 5 Random effects meta-analysis of the CDRS-R across included RCTs (n = 4)

Publication bias was examined using funnel plots. Given the limited number of studies, the conclusions are indicative but uncertain. There was no evidence of publication bias when using CDRS-R as the outcome measure (Supplementary Information S5), but there was some indication of it when examining the outcome measures using SMDs (Supplementary Information S6).

Depression outcomes from pre-post evaluations

CDRS-R: Six of the pre-post evaluations [40, 42, 43, 45, 48, 49] administered the CDRS-R. In the two studies employing a stepped-care approach [40, 42] CDRS-R scores reduced for those receiving BA. Riley and Gaynor [42] found a clinically significant change on CDRS-R at the end of treatment as well as from post-non-directive therapy (NDT) to post-BA (M = 41.57, SD:11.79; Z = 2.37, p = 0.02). In Douleh [40] only one participant received BA and demonstrated reduced CDRS-R scores from pre-to-post-treatment and then follow-up where scores indicated a depression diagnosis to be ‘unlikely’ or ‘possible’. Although the participant in Nabors et al. [48] had reduced CDRS-R scores from pre-to-post-treatment, their post-treatment score still suggested diagnosable depression. Significant decreases in CDRS-R scores were seen from pre-to-post-treatment in the studies by Ritschel et al. [43] (F(2, 40) = 33.60, p < 0.001) and Ritschel et al. [49] (F(1,5) = 19.94, p < 0.01). The remaining pre-post evaluation to administer the CDRS-R [46], also reported reductions in scores from pre-to-post-treatment.

BDI-II: The studies which administered the BDI-II [38, 40, 46, 49, 51, 53] reported reductions following BA, with statistical significance found in Ritschel et al. [49] (F(1,5) = 330.00, p < 0.001) and Ritschel et al. (2016) (F(2,40) = 34.14, p < 0.001).

RCADS depression subscale: Statistically significant reductions from pre-treatment to follow-up were reported in two studies [17, 18].

K-SADS: Three studies [35, 46, 49] collected the K-SADS at pre-treatment and follow-up. McCauley et al. [35] found that 77% of BA participants no longer met diagnostic criteria for depression on the K-SADS post-treatment compared to 25% of those who received usual care. In Ritschel et al. [49] and Jacob et al. [46] four out of six participants and two out of three participants (respectively) no longer met criteria for MDD.

Across other depression measures completed by the pre-post evaluations, reductions in depression were reported from pre-treatment to follow-up. However, in Arnott et al. [39], while mean SMFQ scores reduced from pre-to-post-treatment, they increased at follow-up.

Parent-reported depression outcomes

Two RCTs [16, 34] and five pre-post evaluations [17, 18, 45, 52, 53] administered depression measures to parents/guardians which provided a proxy outcome for their children.

MFQ/SMFQ: In Kitchen et al. [16] the MFQ was administered to parents/guardians where a young person was ≤ 15 years. Reductions were reported from baseline to 6-months follow-up in both groups (BA group: -18.83; usual care: -2). These reductions were of a similar level to the child-reported outcomes. Grudin et al. [34] reported significant reductions in parent-reported SMFQ scores across all groups (guided BA: B = −2.83, p < 0.01, 95% CI −4.31 to −1.34; self-guided BA: B = −3.75, p < 0.01, 95% CI −5.65 to −1.85; usual care: B = −3.29, p < 0.01, 95% CI −5.17 to −1.42), mirroring the trend across the equivalent child-reported outcome scores.

RCADS depression subscale: Two studies [17, 18] demonstrated reduced scores between pre-treatment and follow-up and one [53] from pre-to-post treatment. In two studies [17, 45] reductions were larger overall than for the equivalent child-reported measures.

CES-D: In Chu et al. [45], the mean reduction in parent-reported CES-D score was almost twice as large (8.83) as for the child-reported score (4.55).

CDI: In Weersing et al. [52] parent-reported CDI scores for one participant decreased from pre-to-post-treatment and increased at follow-up, whereas the child-reported score decreased at follow-up.

Comorbid anxiety outcomes

Three RCTs [23, 32, 33] and five pre-post evaluations [45, 47, 50, 52, 53] assessed anxiety. Using the Screen for Anxiety and Related Emotional Disorders (SCARED), Chu et al. [33] reported greater reductions in scores from pre-to-post-treatment for the BA group than the waiting list control but with no statistical analyses. Furthermore, Jenness et al. [47] reported a reduction from pre-to-post-treatment, whilst both participants in Weersing et al. [52] had reductions in scores from pre-to-post-treatment and 6-months follow-up, with one attaining a score of zero at follow-up.

One RCT [33] and two pre-post evaluations [50, 52] also administered SCARED to parents/guardians. In Chu et al. [33] the results, as with the child-reported outcomes, showed greater reductions in anxiety scores in the BA group from pre-to-post-treatment compared to the waiting list group, with no statistical analyses performed. The mean pre-treatment score for the parent-reported SCARED (29.67) was nearly twice as high as the child-reported SCARED (16.89), with this gap persisting through to post-treatment. Shadan et al. [50] found reduced SCARED scores from baseline to week four and week six of treatment. Whilst Weersing et al. [52] reported reductions from pre-to-post-treatment, one participant attained an increased score at 6-months follow-up, in contrast to the child-reported score which reduced to ‘0’.

Schleider et al. [23] reported reductions on the Generalized Anxiety Disorder-7 (GAD-7) across all groups from pre-treatment to 3-months follow-up. Compared to the control, the BA group did not show significant decreases in generalized anxiety (t(1,637) = −0.37; p = 0.72; d = 0.02; 95% CI −0.08 to 0.12) from baseline to follow-up, but the active comparator group (GM) did (t(1,629) = −2.08; p = 0.038; d = 0.10; 95% CI 0.006 to 0.20).

Stark [32] reported statistically significant reductions on Revised Children’s Manifest Anxiety Scale (RCMAS) scores from pre-to-post-treatment for those receiving BA or Self-Control Therapy (p < 0.01) and no improvement for the wait-list group. Individuals who received Self-Control Therapy demonstrated the highest reductions in anxiety at post-treatment.

Several other anxiety measures showed mixed results following BA. Jenness et al. [47] administered the Liebowitz Social Anxiety Scale (LSAS) and found that although there was a reduction in scores from pre-to-post-treatment, scores remained in the clinical range. In Chu et al. [45] two of the five included participants demonstrated reduced anxiety scores following BA using the Multi-dimensional Anxiety Scale for Children (MASC). Of the remaining three participants, one demonstrated increased anxiety, one remained the same and one withdrew from treatment and did not complete follow-up. When the MASC was administered to parents/guardians, Chu et al. [45] reported a mean reduction from pre-to-post-treatment which was considerably larger (12.58) than that for the child MASC (1.40).

Chu et al. [33] found the Anxiety Disorders Interview Schedule for DSM-IV child interview Clinician Severity Rating (ADIS-IV CSR) scores to be significantly lower in the BA group than the waiting list when anxiety was a secondary diagnosis (B = 2.09 (0.82), p = 0.01). Using the same measure, in Chu et al. [45] 75% of participants no longer met anxiety diagnosis criteria following BA.

Mohamed et al. [53] found that parent-reported scores on the RCADS anxiety subscale reduced from pre-to-post-treatment; however, little change was seen in the child-reported scores.

Combined depression and anxiety outcomes

One study [34] reported depression and anxiety scores combined using the RCADS. Scores reduced from pre-to-post-treatment in both the guided and self-guided BA groups, with increased scores then seen from post-treatment to follow-up. However, follow-up scores were lower in both groups than their respective values at pre-treatment. For the usual care group, RCADS scores increased from pre-to-post-treatment, reducing to around baseline levels at follow-up. These changes were significant in both BA groups (guided: B = -2.12, p < 0.001, 95% CI -3.32 to -0.92; self-guided: B = -2.48, p < 0.05, 95% CI -4.85 to -0.10) but not in the usual care group (B = -0.29, p = NS, 95% CI -2.75 to 2.16). On the parent-completed RCADS, scores reduced from pre-to-post-treatment and then, unlike the child-reported scores, continued to decrease through to follow-up across all groups. These improvements were significant in both BA groups (guided BA group: B = -1.59, p < 0.05, 95% CI -2.94 to -0.24; self-guided BA group: B = -2.20, p < 0.01, 95% CI -3.58 to -0.82) but not the usual care group (B = -1.39, p = NS, 95% CI -3.17 to 0.39).

Quality of Life (QoL) outcomes

Three studies [34, 40, 42] assessed QoL with two of these [34, 42] reporting their findings. Significant improvements in Health-Related QoL (HRQoL) were reported across both studies from baseline to end-of-treatment [42] and at 3-months follow-up [34] (see also Supplementary Information S7).

Cost-effectiveness

We found no economic evaluations that met our inclusion criteria.

Online and telephone-facilitated BA

Four included studies delivered BA remotely: two RCTs used an exclusively online format [23, 34] and two pre-post evaluations used videocall with some online materials [41, 53]. Three of the studies [34, 41, 53] assessed feasibility and one RCT assessed effectiveness [23] with a very large sample (n = 2,452). The collective findings support remotely delivered BA as an acceptable alternative to in-person BA (see also Supplementary Information S8).

The large RCT [23] found a small effect of a single online session. The authors concluded that the programme's brevity and flexibility may have reached young people who might not otherwise have accessed help at all. The study was conducted with a self-selected group of young people from the general public, and the intervention was not intended as a replacement for routine care in clinical populations.

Discussion

This systematic review updated our earlier one [14] summarising study-level evidence on the effectiveness of BA for depression, comorbid anxiety and quality of life for children and young people. BA was delivered in-person, by phone or online, and was supported by a diverse group of trained staff in clinics, schools and the community. We looked for evidence on the cost-effectiveness of BA, but no economic evaluations met our inclusion criteria.

Across a total of 24 studies, 11 more than in our earlier review 7 years ago [14], the overall conclusion from 6 RCTs and 18 pre-post evaluations was that BA can reduce symptoms of depression in children and young people over time. The pre-post evaluations did not have a comparator, so we cannot say with certainty that this reduction was due to the intervention. In the RCTs, the reduction in depression symptoms was greater with BA than with waiting list controls, usual care or other therapeutic and supportive interventions. This was based on the CDRS-R, which was used in 4/6 RCTs and 6/18 pre-post evaluations.

Our meta-analysis of 4 RCTs found a small effect of BA compared to both active and passive controls. Only 1 of those RCTs was intended as a large clinical trial, whereas the other 3 were feasibility RCTs not designed or powered to test effectiveness. The large clinical trial was unusual in that it used a single-session, self-guided online BA during the pandemic as a public health intervention. Although the majority of the sample had elevated symptoms above a diagnostic cut-off, this BA was not designed as a clinical intervention; it is therefore unlikely to be fit for the purpose of producing clinically meaningful change. Still, if we consider the small effect in the context of offering BA to the general population (e.g. all school children), then even a small difference per individual young person can shift the population distribution. Studies therefore need to be clear from the outset whether they intend to use BA as a clinical or a public health intervention.
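The population-level argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that symptom scores are normally distributed and that an intervention lowers everyone's score by the same small amount. The effect size (0.2 SD) and the cut-off (1 SD above the population mean) are hypothetical values chosen for the example; they are not estimates from this review or from any included study.

```python
from statistics import NormalDist


def proportion_above_cutoff(mean_shift_sd: float, cutoff_sd: float) -> float:
    """Share of a population scoring above `cutoff_sd` after the whole
    score distribution is shifted down by `mean_shift_sd` standard deviations.

    Assumes standardised, normally distributed scores (an illustrative
    simplification, not a property of any specific depression measure).
    """
    shifted = NormalDist(mu=-mean_shift_sd, sigma=1.0)
    return 1.0 - shifted.cdf(cutoff_sd)


# Hypothetical values for illustration only (not taken from the review).
HYPOTHETICAL_EFFECT_SD = 0.2   # a "small" standardised mean difference
HYPOTHETICAL_CUTOFF_SD = 1.0   # cut-off placed 1 SD above the population mean

before = proportion_above_cutoff(0.0, HYPOTHETICAL_CUTOFF_SD)
after = proportion_above_cutoff(HYPOTHETICAL_EFFECT_SD, HYPOTHETICAL_CUTOFF_SD)

print(f"Above cut-off before shift: {before:.1%}")
print(f"Above cut-off after shift:  {after:.1%}")
```

Under these assumptions, a shift that is small for any one individual still moves a noticeable share of the population below the cut-off, which is the sense in which a small effect may matter for a public health intervention.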

In relation to whether BA for depression can also improve anxiety, the results were mixed and similar to findings reported in other studies [76]. Only a few studies measured and reported improvements in QoL. It is, however, challenging to detect significant changes using QoL measures; moreover, the guidelines for assessing QoL are less well developed for child and adolescent populations than for adult ones [77].

We identified three studies that used an online platform to facilitate BA delivery, which suggested that these methods can help improve access to support for young people with depression. These findings are consistent with those of other studies that have found the remote delivery of BA to be acceptable and to enable the intervention to be delivered at scale [78]. Besides increasing accessibility, delivering therapy to young people via an online platform has also been shown to increase anonymity [22], reduce stigmatisation and enable those reluctant to engage one-to-one with a therapist to access care [24].

Within the last 7 years, the number of young people participating in BA studies that meet our eligibility criteria increased from 170 participants recruited from two countries (USA and Australia) to 2,758 participants, with three additional countries represented (UK, Sweden and UAE). There have been studies in low- and middle-income countries (LMICs) that evaluated BA for depression in children and adolescents [79, 80], but none met our inclusion criteria on account of their sample (not all participants had diagnosable depression) or their intervention (BA was only one component of a multicomponent programme).

The depression outcomes reported within this review support the consideration of BA as a promising treatment option for young people experiencing depression. Across the RCTs, improvements in depression were greater for those who received BA than for controls, or equivalent to those in groups who received an active comparator. These findings are similar to those reported in adult studies [81, 82].

Many of the included studies (n = 9/24) delivered BA in educational or community settings—rather than a clinic—supported by a wide range of professionals, including social workers, graduate students and school counsellors. This is important evidence to support policy initiatives that aim to expand evidence-based mental health interventions for young people beyond health services [4]. The findings are consistent with research in adult populations that found BA to be effective and cost-saving when delivered by non-specialists or by junior workers [6, 9].

One final significant observation was the discrepancy between parent-reported and child-reported outcomes in several of the included studies. These findings are consistent with other literature [83,84,85] in which parents either over-reported or under-reported depression symptoms by proxy for their children. This underscores the importance of researchers and clinicians prioritising child self-reported outcomes for depression and anxiety, rather than relying exclusively or heavily on parental proxy report. Still, the parent's view of their child's depression symptoms is valuable, as it can influence help-seeking in the first place and engagement in therapy thereafter.

Limitations

The 6 included RCTs did not rate well on RoB-2, with only one [33] scoring 'low' risk of bias overall. Only 3/6 RCTs [23, 34, 35] conducted power calculations. Sample sizes were small in 5 RCTs (22–60 participants), whereas the sample of the sixth RCT was nearly 14 times larger than the combined sample of the rest. Among the pre-post evaluations, no studies reported power calculations, blind assessment, or adverse effects. Across all 24 studies, only 5 had a follow-up period long enough for assessment of long-term outcomes as defined by the Moncrieff scale (i.e., ≥ 6 months); therefore, we cannot tell whether the effect of BA is durable.
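To make concrete what a power calculation involves, the following minimal sketch uses the standard normal-approximation formula for comparing two group means, n per group = 2(z_{1-α/2} + z_{power})² / d². The function name and the effect sizes shown are hypothetical illustrations; they do not reproduce the calculations of any included trial, and a real trial would usually also adjust for attrition and for the planned analysis.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm in a two-arm trial.

    Normal-approximation formula for a two-sided test comparing two means,
    with `effect_size_d` the standardised mean difference to be detected.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


# Hypothetical effect sizes, for illustration only.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under this approximation the required sample grows with the inverse square of the effect size, so detecting a small difference takes many more participants than detecting a moderate one.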

We have included a narrative synthesis of findings relating to anxiety and mixed anxiety and depression, but these were considered as the “secondary effect” of BA, which is designed for depression. To answer the question of whether BA is effective for anxiety, a review needs to include studies in which participants were recruited based on diagnosable anxiety – with or without depression, or studies with mixed samples in which anxiety scores were reported at baseline and followed-up separately for a sub-sample of participants who scored above a clinical cut-off. In addition, we need a plausible theoretical or clinical framework that explains how BA may work for anxiety, given that its mechanism of action hinges on depression.

Finally, we were stringent in our inclusion criteria, selecting only studies in which participants had diagnosable depression established by a validated screening tool or diagnostic manual. This meant, however, that we excluded many studies that used a mixed sample of participants from the general population that included some young people with depression. This is because these studies (e.g. Lynch et al. [86]) did not analyse and report the findings for those with depression separately.

Conclusion

BA is a promising enough intervention for reducing depression symptoms in children and adolescents to justify further RCTs, provided that five conditions are met: studies are powered to detect a minimal clinically important difference; BA materials are fit-for-purpose to produce clinically meaningful change; follow-ups are longer than 6 months; primary outcomes are child-reported; and intervention costs, resource use and adverse events are reported.

Clinical relevance and implications of systematic review and meta-analysis

Our review is a comprehensive synthesis of all currently available research on behavioural activation, a brief psychological intervention, for diagnosable depression in children and adolescents. It has two important implications. Firstly, it justifies future clinical trials and economic evaluations of behavioural activation for people younger than 18 years old. Secondly, it identifies five requirements that the design of future clinical trials must meet in order to generate evidence about the intervention's clinical utility and value for money in this population.

Conflicts of interest

None.

Ethics approval

Not applicable.

Informed consent

Not applicable.

Consent for publication

Not applicable.