Introduction

Anorexia Nervosa (AN), Bulimia Nervosa (BN), Binge-Eating Disorder (BED), and Other Specified Feeding and Eating Disorder (OSFED) represent the main eating disorders (ED) defined in the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) [1]. They are characterized by persistent disturbances of eating behaviors and a core psychopathology centered on food, eating, and excessive concerns about weight and shape. The age of onset for ED is most frequently during mid to late adolescence [2, 3]. The prevalence of ED in children and adolescents aged 11–19 years has been reported to be 1.2% in boys and 5.7% in girls [4, 5], with an increasing incidence during the Covid-19 pandemic [6]. ED can be life-threatening illnesses and are associated with significant impairments in psychiatric and somatic health and quality of life, and with delays in development [7, 8]. Among adolescents with an ED, co-occurrence with other psychiatric disorders is common, particularly depression (up to 50%) and anxiety disorders (up to 35%) [9, 10]. Due to the serious nature of ED and their somatic and mental health consequences, early identification and treatment are important.

Clinical practice guidelines and consensus statements have been developed for the management of ED in children and adolescents up to 18 years [11, 12, 13, 14]. Both commonalities and differences are found among the international clinical practice guidelines in their recommendations on psychological therapies for youth with ED [15]. For children and adolescents with AN and BN, eating disorder-focused family therapy is the psychological treatment with the strongest evidence base [10] and is considered the first choice of treatment across the guidelines, whereas cognitive behavior therapy (CBT) for eating disorders could be considered if family therapy for ED is not feasible or accessible, has been ineffective, or is undesired due to patient or family preferences [e.g., 12, 13]. For children and adolescents with BED, treatment with CBT for eating disorders is recommended, and for youth with OSFED it is recommended to offer the treatment for the ED that the OSFED most closely resembles [12].

Involvement of the family in the treatment of ED in children and adolescents is based on models and techniques from several schools of family therapy, as well as systemic and behavioral therapeutic approaches. The two main models of family therapy for ED in children and adolescents are Maudsley family therapy [16] and family-based treatment (FBT) [17]. While some differences exist, both models are phased, and a major aspect of both approaches is empowering parents to mobilize the family system and the family resources they possessed prior to the onset of the disorder, to re-implement them in the family system, and to encourage behavioral change in their child [16, 18]. The treatment can be implemented in routine clinical care [19], be adapted for higher levels of care [20], and be delivered in different formats, including multi-family therapy [21]. Following family therapy for ED, the remission rates among children and adolescents with ED are reported to be 40–50% [10, 22]. These numbers indicate that for a significant proportion of children and adolescents with eating disorders, high levels of eating pathology persist at the end of treatment. Whereas Maudsley family therapy and FBT are well-established treatments [10] and the leading psychological therapies for children and adolescents with ED, they may not be effective or suitable for all. A second-line therapy that could then be considered is cognitive behavior therapy for eating disorders (CBT-ED) [12].

CBT-ED is an individual therapy that addresses the core psychopathology of eating as well as weight and body shape concerns through behavioral reduction of restraint, establishing regular eating and normalization of weight among underweight patients, and cognitive interventions to address dysfunctional beliefs and practices [23, 24]. Enhanced CBT (CBT-E) is a manual-based and trans-diagnostic version of CBT, where an individualized formulation of the patient’s difficulties guides the treatment [25, 26]. CBT for eating disorders is widely recognized as the treatment of choice for ED in adults, with good recovery rates particularly for non-underweight individuals [24]. CBT-E has been adapted for use with children from the age of 11 and may include the involvement of parents given the child’s age and circumstances [27, 28], and for use with those who require more intensive care such as daycare or inpatient treatment [29]. CBT has been found effective for children and adolescents with a remission rate of up to 50% [22].

Data on the effectiveness of family therapy and CBT for ED when delivered in routine clinical care are emerging; however, the data are scarce compared to the empirical support from efficacy studies [10]. Examining effectiveness is important because evidence-based therapies may perform differently in routine clinical care than in research settings [30]. Studies conducted for the purpose of establishing efficacy are designed to have high internal validity, e.g., by using rigorous inclusion and exclusion criteria, randomizing participants to conditions, and having highly trained therapists. This methodological rigor of efficacy trials, aimed at maximizing experimental control, may reduce external validity. Questions have been raised about the transferability of results to routine clinical care, where patients, therapists, and treatment context may differ from those in efficacy studies [31, 32]. Therefore, studies in less controlled routine clinical care settings, at sites beyond those where the initial evidence was derived, have been called for [33]. Also, the routine clinical care setting is a crucial service site since the majority of children and adolescents with ED will seek and receive their treatment there [34]. Thus, it is important for clinicians to know what outcome to expect from the recommended treatments of ED when delivered in routine clinical care, and how the results compare with outcomes in specialized university research settings.

Previous meta-analyses focusing on the effectiveness of evidence-based treatments for children and adolescents when delivered in routine clinical care have reported treatment outcomes comparable to outcomes from efficacy studies conducted in university research settings. These meta-analyses have examined effectiveness studies for children and adolescents with internalizing disorders [32], externalizing disorders [35], and for children with autism spectrum disorders [36]. To the best of our knowledge, no meta-analysis of effectiveness studies of family therapy and CBT for ED in children and adolescents has been published. In the most recent evidence-based update on psychosocial treatments for ED in children and adolescents, results across 31 studies were examined for various interventions [10]. Thus, a meta-analysis on the current state of the effectiveness of family therapy and CBT for children and adolescents with ED in routine clinical care is warranted.

Previous meta-analyses of psychological treatments have found different moderators of the effect size (ES). In the present meta-analysis we will use five categorical variables. The first is design. A meta-analysis by Hilbert et al. [37] reported that study design was not a significant moderator of the primary outcome. Since our meta-analysis uses pre-post ES and includes both RCTs and Non-RCTs it is important to investigate this variable. The second is statistical analysis. Some meta-analyses have found no difference in ES between intent-to-treat (ITT) and completer analysis [e.g., 38, 39], and others that completer analysis yielded a higher ES [e.g., 40]. Thus, from a methodological point of view, this is an important moderator to assess. The third is risk of bias. A meta-analysis on AN [38] described that studies with low RoB yielded higher effect size than studies with high RoB, whereas other meta-analyses have reported that RoB was not a significant moderator [39, 40]. Thus, RoB is included as a moderator. The fourth is treatment format. Meta-analyses by Davey et al. [41], Hilbert et al. [39], and Linardon and Wade [42] reported treatment format to be a significant moderator of outcome. The fifth is continent. Previous meta-analyses investigating this variable have reported different results. For example, Cuijpers et al. [43] found that studies from North America yielded higher ES than studies from Europe, whereas Öst et al. [44] and Wergeland et al. [32] reported that studies from Europe yielded higher ES than studies from other continents.

There are also some continuous variables of interest as potential moderators. We considered five. The first is mean age. Hilbert et al. [39] found that lower age of the sample was associated with better outcome, Linardon et al. [45] and Murray et al. [40] that age was unrelated to outcome, whereas Svaldi et al. [46] and van den Berg et al. [38] reported larger effect size for samples with older patients. The second is the percent of females. Hilbert et al. [39] found that higher proportion of females in the study was associated with better outcome, whereas Linardon [47] reported that sex was not related to the outcome. The third is pre-treatment severity. Hilbert et al. [39] found that the lower the BMI and the higher the number of binge-eating episodes at pre-treatment the better was the outcome, whereas Öst et al. [48] did not find that pre-treatment severity of ED psychopathology was a significant moderator. The fourth is the methodological quality of the included studies. Quality has in previous meta-analyses been found to be associated with lower ES [49] as well as with higher ES [50]. The fifth is the amount of therapy, measured as months of therapy and hours of treatment. Amount has in some meta-analyses been found to be a positive moderator [36, 51], but Svaldi et al. [46] did not find that the duration of treatment was related to the outcome.

The present study aimed to add information to the literature by providing a meta-analysis on the effectiveness of ED-focused family therapy and CBT for ED in children and adolescents, i.e., treatments that are recommended according to international guidelines for ED, when these are carried out in routine clinical care. Studies were included in which patients were referred for treatment through usual clinical routes and treatments were delivered by practicing clinicians as part of routine clinical care. Both non-randomized and randomized trials were included to ensure comprehensive coverage and because both designs are commonly used in effectiveness studies [31]. The specific aims were: first, to examine the effectiveness of ED-focused family therapy and CBT for ED in children and adolescents; second, to evaluate methodological stringency and risk of bias in the effectiveness studies and investigate potential moderators of treatment outcome; third, to compare the effectiveness of family therapy and CBT for ED; and fourth, to compare the outcome of these treatments delivered in routine clinical care with that reported in efficacy studies for ED.

Methods

The meta-analysis was pre-registered at PROSPERO [CRD42023441794], and was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines (PRISMA) [52]. For details see Supplementary information (SI) 1. The meta-analysis was designed according to the PICOS acronym in the following way:

  • Population: children and adolescents with an ED diagnosis.

  • Intervention: any format or variations of family treatment (FBT, FT-AN, multi-family treatment), or any format (Individual, Group, Self-help, Guided self-help) of CBT (CBT-ED, CBT-BN, or Transdiagnostic CBT).

  • Comparison: within-group change, i.e., pre vs. post/follow-up data.

  • Outcome: primary (ED-psychopathology symptoms, weight) and secondary (depression, remission).

  • Study design: RCTs and pre-post/Non-randomized studies of intervention (NRSI).

Literature search

Studies were identified by systematic and comprehensive literature searches of the electronic databases Ovid MEDLINE, Embase OVID, and PsycINFO from the start of the databases to April 14, 2023, with an updated search on December 8, 2023. The list of search terms was generated by the authors in collaboration with a university librarian. Both subject headings and free-text words were used for the following search terms: Family therapy and variations thereof; Cognitive behavior therapy and variations thereof; ED (including the different eating disorders); the design of the study; age group (up to 18 years). For details on the electronic search see SI 2.

The abstracts were read by four pairs of authors independently of each other to decide whether a study warranted a more detailed reading. Full-text articles were retrieved if there was any indication of a target group of patients receiving family therapy for ED or CBT in a routine clinical care setting. Additionally, a manual search was performed by reviewing the reference lists of potentially eligible articles. In total, 339 full-text articles were considered for inclusion. The final decision for article inclusion was made using a stricter set of inclusion and exclusion criteria detailed below. Disagreements were resolved by consensus discussion among the authors and/or consultation with the last author. In cases where there was insufficient information for the inclusion criteria to be applied, or insufficient details were reported on the outcomes, the corresponding author was contacted to request the information needed for inclusion in the meta-analysis.

Inclusion criteria

To be included a study had to:

  1. Be published, or in press, in an English language journal.

  2. Have participants diagnosed with an eating disorder according to DSM (III and later) or ICD (10 or 11) with focus on AN, BN, BED, Other specified ED (atypical AN, BN and BED, and purging disorder) and avoidant restrictive food intake disorder (ARFID). Samples selected based on a transdiagnostic perspective were also included.

  3. Be testing any format of family therapy or CBT.

  4. Have participants referred for treatment through usual clinical routes.

  5. Be an effectiveness study, i.e., carried out in a routine clinical care setting such as a community mental health center, at patients’ homes, etc.

  6. Have therapists who are practicing clinicians for whom provision of service is a substantial part of their job.

  7. Have a treated sample consisting of at least 10 participants.

  8. Have a maximum mean sample age of 18 years, and a maximum participant age of 20.

  9. Provide a continuous or dichotomous measure of the principal disorder treated, with data making it possible to calculate effect size.

Exclusion criteria

  1. The study is a secondary analysis of a previously published study. Separate follow-up studies to the basic study are included to provide follow-up data.

  2. The study is an evaluation of a service where the results for individual disorders cannot be extracted.

  3. The study is testing a combination of family therapy for ED/CBT and another, e.g., pharmacological treatment and all participants in that condition receive both treatments. Also, studies selecting a sample of patients fulfilling criteria for an eating disorder and another psychiatric disorder, e.g., depression, are excluded.

Categorization of studies

To be categorized as an effectiveness study, participants had to be referred through ordinary clinical channels (or self-referred), the treatment had to be carried out in routine clinical care settings (or in patients’ homes for internet-based treatment), and the therapists had to be ordinary clinicians who work with a caseload of patients with different diagnoses.

Studies with children and adolescents diagnosed with AN, BN, BED, EDNOS or OSFED were included. In addition, we included a number of studies that had a mix of the above eating disorders and presented combined results. For the comparison between treatment methods we combined CBT, CBT + App, and CBT-Enhanced (CBT-E) to the category CBT, and FBT, FT-AN/BN, and Multi-family therapy (MFT) to the category family therapy for ED.

Potential categorical moderators

An a priori requirement for including any potential categorical or continuous moderator in the analysis was that at least 70% of the studies provided information on that variable, as lower rates would probably lead to questionable representativity. Design was categorized as either RCT or NRSI/Pre-post studies. Statistical analysis was categorized as completers (if dropouts were deleted) or as intent-to-treat (ITT, if all randomized or starting participants were included in the statistical analysis). Risk of bias (RoB) was based on a summary evaluation of the domains rated for the different designs (see below) and the studies were categorized as low, moderate, or high RoB. Treatment format could either be individual, family, or a combination of individual and family. The location in which the study was carried out was categorized as Africa, Asia, Australia, Europe, North America, or South America.

Potential continuous moderators

The following variables were used as potential continuous moderators: mean age, percent females, pre-treatment severity (calculated as a percentage by dividing the sample mean with the maximum score possible of the rating scale applied), methodology score (see below), and amount of therapy. The data were extracted using a pre-designed coding scheme and a scoring manual including the variables of interest. The data were extracted and categorized independently by the pairs of authors and any disagreements were solved after consensus discussion.

Methodological quality

The psychotherapy outcome study methodology rating scale (POMRS)

The POMRS consists of 22 items covering various important aspects of the methodology in psychotherapy outcome research [53]. Each item is rated on a 3-point scale (0 = poor, 1 = fair, and 2 = good), and each step has a written description. The total score can vary from 0 to 44 points. Since all items do not apply to all studies, the total score was recalculated as a percentage of the maximum score possible for the individual study. The internal consistency of the scale was good with a McDonald’s ω of 0.80. The inter-rater reliability of the scale (between GJW and LGÖ), based on 20% randomly selected and blindly rated studies, was ICC = 0.98 (95% CI 0.94–0.99, p = 0.0001), which according to Cicchetti [54] is excellent.
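The rescaling of the POMRS total to a percentage can be illustrated with a minimal sketch. The function name, the use of None to mark a non-applicable item, and the example ratings are assumptions made for illustration; they are not taken from the scoring manual.

```python
from typing import Optional, Sequence


def pomrs_percent(item_scores: Sequence[Optional[int]]) -> float:
    """Total score as a percentage of the maximum attainable for one study.

    Each applicable item is rated 0, 1, or 2. None marks an item that does
    not apply to the study and is left out of the denominator (assumption
    about how non-applicable items are coded).
    """
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("each applicable item must be rated 0, 1, or 2")
    return 100.0 * sum(applicable) / (2 * len(applicable))


# Hypothetical ratings: 20 applicable items and 2 non-applicable items
ratings = [2, 1, 1, 0, 2, 1, 1, 0, 1, 1, 2, 0, 1, 1, 0, 1, 2, 1, 0, 1, None, None]
print(pomrs_percent(ratings))  # 19 of 40 attainable points
```

With all 22 items applicable, the denominator is the full 44 points.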

Risk of bias

The Cochrane Collaboration tool for assessing RoB [55] was used for RCTs and the RoB in NRSI (ROBINS-I) [56] was used for NRSI and pre-post studies. An overall classification of the studies was done for RCTs into the categories high, moderate (some concerns), or low RoB. For the NRSI and pre-post studies the categories low, moderate, and high (serious or critical) RoB were used. The rating of the studies was done by two authors (GJW and LGÖ) and differences were discussed to reach consensus.

Effect size measures

Primary outcome measures

The first primary measure was scores on a validated semi-structured interview, or rating scale of eating disorder psychopathology. The Eating Disorders Examination (EDE) [57] is an interview-based assessor rating and different versions were used in 13 studies. The Eating Disorders Examination-Questionnaire (EDE-Q) [58] was applied in nine studies. The Eating Disorder Inventory-2 (EDI-2) [59] was used in seven studies, and EDI-3 [60] in four studies.

In studies on AN, weight is a commonly used primary outcome measure. In 21 studies this was assessed as percent expected body weight (% EBW), defined as the percentage of the expected weight corresponding to the 50th percentile for gender, age, and height according to the Centers for Disease Control and Prevention growth charts. Body Mass Index (BMI), defined as kg/m², was used in 14 studies, and % median BMI, calculated using charts from the World Health Organization (WHO) for age, height, and gender, was used in eight studies. In addition, we planned to assess binge eating episodes and compensatory behaviors, but since very few studies provided this information it is questionable whether the information extracted would be representative of the entire body of studies.

Secondary outcome measure

Depressive symptoms were assessed in 15 studies: ten used the Children’s Depression Inventory [61], three used the Beck Depression Inventory (BDI or BDI-II) [62, 63], and two used the Mood and Feelings Questionnaire [64].

Another planned secondary measure was remission. However, only 15 studies (28%) provided such data, and this was not considered representative for all included studies.

Meta-analysis

To obtain as many effectiveness studies as possible, both RCTs and pre-post/ NRSI trials were included in the meta-analysis since within-group ES can be calculated from both types of studies. ES were calculated as (Mpre – Mpost)/SDpre according to a recommendation by Lakens [65], since there is good reason to assume that the interventions influence not only the means but also the standard deviations. The mean ES was computed by weighting each ES by the inverse of its variance. ITT data were used when a study provided those, otherwise completer data were used.
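A minimal sketch of the within-group ES and the inverse-variance weighted mean described above follows. The variance expression is a common large-sample approximation for a standardized mean change using the pre-treatment SD and is an assumption here; the software used for the analyses may compute the variance differently. The assumed pre-post correlation r, the function names, and the numbers are illustrative only. For measures on which improvement is an increase, such as weight, the sign of the numerator would presumably be reversed.

```python
def within_group_g(m_pre, m_post, sd_pre, n, r=0.5):
    """Return (g, variance) for a pre-post change standardized by SD_pre.

    r is the assumed pre-post correlation (not reported in most studies).
    """
    d = (m_pre - m_post) / sd_pre
    j = 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)  # small-sample correction factor
    # Large-sample variance approximation (assumption, see lead-in)
    var_d = 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)
    return j * d, j ** 2 * var_d


def inverse_variance_mean(effects, variances):
    """Mean ES with each ES weighted by the inverse of its variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


# Hypothetical studies: (M_pre, M_post, SD_pre, n)
studies = [(4.0, 3.0, 2.0, 21), (3.5, 2.0, 1.5, 40), (4.2, 3.6, 1.2, 15)]
results = [within_group_g(*s) for s in studies]
effects = [g for g, _ in results]
variances = [v for _, v in results]
print(inverse_variance_mean(effects, variances))
```

Under a random effects model the weights would additionally include the between-study variance; this sketch shows only the basic weighting step.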

Before pooling, the effect sizes were screened for statistical outliers, defined as being outside M ± 2SD. On the ED-psychopathology measures there was one outlier at the post-treatment assessment and one at follow-up. On the weight measures there was one outlier at follow-up. For these ESs, winsorizing [66] was used by reducing outliers to the exact value of M + 2SD. The Comprehensive Meta-Analysis v.4 (CMA) [67] software was used for the analyses and Hedges’ g was calculated to correct for small sample sizes. A random effects model was used since it cannot be assumed that the ESs come from the same population. Lipsey [68] described an empirically developed rule-of-thumb for considering an ES as small (≤ 0.32), moderate (0.33–0.55), and large (0.56–1.20). Also, Sawilowsky [69] denoted ESs as very large (1.20–1.99) and huge (≥ 2.00).
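The outlier handling and the verbal labels for ES magnitude can be sketched as follows. Whether M and SD were computed with the outlier included, and how values exactly on a boundary (e.g., 1.20) are labeled, are not stated above, so the choices in the code are assumptions. The effect sizes in the example are hypothetical.

```python
import statistics


def winsorize_2sd(effects):
    """Pull values outside M +/- 2SD back to the nearest limit.

    M and SD are computed from all values, outlier included (assumption).
    """
    m = statistics.mean(effects)
    sd = statistics.stdev(effects)
    lower, upper = m - 2.0 * sd, m + 2.0 * sd
    return [min(max(e, lower), upper) for e in effects]


def magnitude_label(es):
    """Label an ES using the Lipsey and Sawilowsky cut-offs cited in the text."""
    a = abs(es)
    if a <= 0.32:
        return "small"
    if a <= 0.55:
        return "moderate"
    if a < 1.20:  # exactly 1.20 is labeled "very large" here (arbitrary choice)
        return "large"
    if a < 2.00:
        return "very large"
    return "huge"


# Hypothetical effect sizes with one high outlier
effects = [0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.1, 3.5]
print(winsorize_2sd(effects))
print(magnitude_label(0.80), magnitude_label(1.64))
```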

Sensitivity analysis was done for the primary outcome measure ED-psychopathology in three ways to test the robustness of the pooled ES. First, the pre-post correlation was varied from 0.1 to 0.9. Second, the effect of the different ED-psychopathology measures was tested by removing, one at a time, each measure other than the EDE or EDE-Q. Third, the pooled ES was recalculated after removing one study at a time.
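The remove-one-study-at-a-time check can be sketched as below. For brevity the pooling step is a plain inverse-variance weighted mean rather than the full random effects model, which is a simplification; the function names and data are hypothetical.

```python
def pooled_mean(effects, variances):
    """Inverse-variance weighted mean (simplified pooling step)."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Pooled ES recomputed with each study removed in turn."""
    pooled = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pooled.append(pooled_mean(rest_e, rest_v))
    return pooled


# Hypothetical effect sizes and variances
effects = [0.6, 0.8, 1.0, 0.7, 0.9]
variances = [0.04, 0.03, 0.05, 0.04, 0.02]
estimates = leave_one_out(effects, variances)
print(min(estimates), max(estimates))  # range of the pooled ES across removals
```

A narrow range across removals suggests that no single study drives the pooled estimate.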

Proportions were analyzed in CMA. The values of the individual studies were transformed using logit transformation and the statistical analysis was done on the transformed proportions using the random effects model. Then the pooled proportion and its 95% confidence interval were back-transformed to proportions.
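The logit approach to pooling proportions can be sketched as follows. The between-study variance is estimated here with the DerSimonian-Laird method, which is an assumption about the estimator; the function name and the counts are hypothetical, and studies with zero events or zero non-events are not handled.

```python
import math


def pool_logit_proportions(events, totals, z=1.96):
    """Pool proportions on the logit scale, then back-transform.

    Returns (pooled proportion, lower limit, upper limit).
    """
    if any(e <= 0 or e >= n for e, n in zip(events, totals)):
        raise ValueError("each study needs 0 < events < total")
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]    # logit
    v = [1.0 / e + 1.0 / (n - e) for e, n in zip(events, totals)]  # its variance
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(mu), expit(mu - z * se), expit(mu + z * se)


# Hypothetical dropout counts and sample sizes
print(pool_logit_proportions([8, 12, 5, 20], [60, 70, 45, 110]))
```

Because the interval is computed on the logit scale, the back-transformed limits stay within 0 and 1 and are not symmetric around the pooled proportion.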

Heterogeneity among ESs was assessed with the Q-statistic and the prediction interval, i.e., the interval within which the true effect size in 95% of all comparable populations is expected to fall [70]. Publication bias was assessed with funnel plots, Egger’s regression intercept [71], and the trim and fill method described by Duval and Tweedie [72]. Moderator analyses of categorical variables were done with subgroup analysis using the mixed-effects model, and of continuous variables with meta-regression using the random effects model.
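A minimal sketch of the Q-statistic and a 95% prediction interval is given below. It assumes the DerSimonian-Laird estimate of the between-study variance and the usual t-based interval with k − 2 degrees of freedom; the critical t value is passed in by the caller (e.g., from a t table) so that only the standard library is needed. Names and data are hypothetical.

```python
import math


def heterogeneity_summary(effects, variances, t_crit):
    """Return the random effects mean, Q, tau^2, and prediction interval.

    t_crit is the two-sided 95% critical value of t with k - 2 df.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 effect sizes")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # DerSimonian-Laird estimate
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    half = t_crit * math.sqrt(tau2 + se ** 2)
    return {"mean": mean, "Q": q, "tau2": tau2,
            "prediction_interval": (mean - half, mean + half)}


# Hypothetical data: five effect sizes, so df = 3 and t_crit = 3.182
summary = heterogeneity_summary([0.2, 0.5, 0.8, 1.1, 1.4], [0.01] * 5, 3.182)
print(summary)
```

Q is compared against a chi-square distribution with k − 1 degrees of freedom to judge whether the heterogeneity is significant.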

Efficacy studies for comparison

The recent comprehensive evidence-based update by Datta et al. [10] was consulted to obtain the efficacy studies to be used in a comparison with effectiveness studies. From this update, the RCTs of family therapy for ED and CBT recommended by the treatment guidelines reviewed in the introduction were listed. Since the update included both efficacy and effectiveness studies, those RCTs we had already included in the body of effectiveness studies were deleted. This resulted in 15 RCTs for our comparison; the references are listed in SI 3. This type of benchmarking, in which ES for effectiveness and efficacy studies are statistically compared using meta-analysis software, has previously been done in three similar meta-analyses on effectiveness studies in children and adolescents [32, 35, 36] and five in adults [44, 48, 73, 74, 75].

As for the effectiveness studies, data were extracted for the type of primary outcome measure most frequently used in both types of studies (some ED-psychopathology measure and weight in AN), at post-treatment and follow-up assessment separately. To compare the two categories of studies on background and treatment variables, data were also extracted on mean age, proportion of females, pre-treatment severity, comorbidity (% of the sample having at least one comorbid disorder), medication (% of the sample that at pre-treatment was prescribed a psychotropic drug), treatment time (in 60 min units), and attrition rate (% dropout of patients who participated in at least one session). Other variables were not reported systematically (or not at all) in a large enough proportion of studies, which precluded inclusion as a background variable. Since the result tables will entail many statistical tests, the Holm-Bonferroni correction was used to control the family-wise error rate [76].
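The Holm-Bonferroni step-down procedure mentioned above can be sketched as follows. The function name and the p-values are hypothetical and serve only to illustrate the procedure.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank);
    testing stops at the first non-significant result.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject


# Hypothetical p-values from four comparisons
print(holm_bonferroni([0.001, 0.04, 0.03, 0.2]))  # [True, False, False, False]
```

In this example only the smallest p-value survives: 0.001 is below 0.05/4, but 0.03 exceeds 0.05/3, so testing stops there.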

Power analysis

The numbers of studies and treatment conditions (the treatment condition being the unit of analysis in the overall comparison of effectiveness and efficacy studies) were as follows: effectiveness studies 44/53 and efficacy studies 15/20. This yields a total of 59 studies and 73 treatment conditions, with an average of 54 participants per condition. According to the formulas for power analysis in meta-analyses by Valentine et al. [77], these figures give 99% power to detect an ES of 0.20, assuming high heterogeneity.

Results

Description of the effectiveness studies

A total of 44 studies comprising 53 treatment conditions were included. A flow-chart of the study inclusion is shown in Fig. 1. References to the included studies are provided in SI 4.

Fig. 1 Flow-chart of the inclusion of studies

Background data

Background data for the included studies are displayed in Table 1. The conditions came from the following continents: Europe 24, North America 16, Australia 12, and Asia 1. The number of conditions for the different ED were: AN 32, BN 1, EDNOS 1, and Mixed 19. The total number of participants receiving treatment in the studies was 3251 (range 10–290), with 94.1% on average being females. The mean age across the studies was 15.4 years (SD 0.5; range 14.1–17.9). The prevalence of comorbid psychiatric disorders was reported in only 58% of the conditions, with a mean of 42.9% (SD 7.7) in these studies, and use of psychotropic medication in only 42% of the conditions, with a mean of 27.4% (SD 39.2). The pre-treatment severity on an ED-psychopathology measure could be calculated for 77% of the conditions and the mean was 49.3% (SD 29.1).

Table 1 Background data for the included studies

Treatment data

The treatment data are presented in Table 2. The treatment setting was outpatient care in 37 conditions, daycare in five, daycare followed by outpatient care in one, inpatient care in five, inpatient followed by daycare in one, and inpatient followed by outpatient care in four conditions. Treatment format was family in 24 conditions, individual in nine, a combination of formats in 19, and parent only treatment in one. The treatment was carried out over a mean of 7.1 months (SD 3.5; range 1.5–13.4) and calculated as hours of treatment the mean was 35.3 (SD 41.0; range 6–180). Follow-up assessment was done in 24 conditions (45%) and on average 17.7 months (SD 22.9; range 5–83) after the end of treatment. ITT statistical analysis was provided for 38 conditions (72%) and completer analysis for 15.

Table 2 Treatment data for the included studies

Methodological data

The research methodology score had a mean of 43.5% (SD 10.8), which corresponds to a raw score of 19.1 points. The RoB classification is presented in SI 5. Among the 12 RCT-conditions 11 had a low and 1 had a moderate RoB. Regarding the 41 NRSI/pre-post conditions 22 had a moderate RoB and 19 had a high RoB.

Meta-analysis

Attrition

Data on attrition were provided for 47 of the conditions (88.7%) and the mean rate was 15.3% (95% CI 12.8–18.2). AN-studies had a mean of 14.0% and Mixed ED-studies 17.8%. The Q between studies (Qb; 1 df) was not significant (1.77, p = 0.18).
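The p-value reported for the between-groups Q can be checked with a short calculation, since Qb with 1 df follows a chi-square distribution with one degree of freedom under the null hypothesis. The function name is an illustrative assumption; the sketch uses only the standard library.

```python
import math


def chi2_p_value_1df(q: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is a squared standard normal deviate, so the
    tail probability equals erfc(sqrt(q / 2)).
    """
    if q < 0:
        raise ValueError("Q must be non-negative")
    return math.erfc(math.sqrt(q / 2.0))


print(round(chi2_p_value_1df(1.77), 2))  # 0.18
```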

Primary outcome measures

Data on the primary measure of ED-psychopathology were provided for 77% of the conditions, and the results are displayed in Table 3. Sixty-eight percent of the studies that provided such data used the EDE or the EDE-Q, whereas the EDI-2 was used in 22%, and the EDI-3 in 10%. Since they all measure ED-psychopathology pooling within this category was considered to be acceptable. At post-treatment the mean ES across all disorders was large (0.80) and significantly heterogeneous. A subgroup analysis comparing AN and Mixed ED did not yield a significant difference. At follow-up, the mean ES was still large (0.97) and heterogeneous with no significant differences between the ED-disorders. Thus, it seems that the effects of treatment were maintained at follow-up. However, only 45% of the conditions had follow-up data. Regarding publication bias for the ED-psychopathology, Egger’s regression intercept yielded a non-significant t-value (0.41, p = 0.68). Thus, publication bias does not seem to be a problem for the ED-psychopathology measure.

Table 3 Results on ED-psychopathology measures at post and follow-up assessment

The sensitivity analysis with different pre-post correlation (0.1, 0.3, 0.5, 0.7, and 0.9) yielded the following results at post-treatment; 0.798, 0.799, 0.800, 0.801, and 0.805. Thus, the mean ES changed very little due to the various estimates of the pre-post correlation. Regarding the effect of the various ED-psychopathology measures the overall ES at post was 0.80, when EDI-2 was removed 0.87, and when EDI-3 was removed 0.88. Thus, the overall ES was robust across measures of ED-psychopathology. The method of removing one study at a time was used and the mean ES fell between 0.78 and 0.82, indicating that none of the studies impacted the mean ES unduly.

The results for the primary outcome measure weight in AN are presented in Table 4. Some type of weight measure was provided by all of the conditions with AN-participants and yielded a very large ES (1.64) at post-assessment, which was significantly heterogeneous. The subgroup analysis of the type of weight measure did not show a significant difference between them. At follow-up the ES was even higher (2.07), significantly heterogeneous, and with no significant difference between the measures. The analysis of publication bias yielded a significant Egger’s regression intercept (t = 4.21, p < 0.001).

Table 4 Results on weight measures for AN at post and follow-up assessment

Moderator analyses

Regarding the ED-psychopathology measure, the subgroup analyses of categorical variables are displayed in Table 5, left hand side. Using Holm-Bonferroni correction only the RoB-variable yielded a significant difference (Q = 27.7, p = 0.001) between the included categories. Subsequent pair-wise comparisons showed that studies with moderate RoB had significantly higher ES than studies with either high (Q = 19.87, p = 0.001) or low RoB (Q = 13.14, p = 0.001). Regarding the five continuous variables, the meta-regression analyses yielded a significant point estimate (0.235, z = 2.73, p = 0.006) for mean age of the sample; studies with higher mean age were associated with a higher ES.

For the weight measures (Table 5, right hand side) two variables significantly moderated the ES. First, the type of statistical analysis showed a significant difference (Q = 10.11, p = 0.001), with higher ES for studies using completer analyses (2.21) than for those using ITT analysis (1.37). Second, the continent in which the study was carried out also differed significantly (Q = 9.30, p = 0.01), and the subsequent pair-wise comparisons showed that studies from Europe had significantly higher ES (2.17; Q = 5.47, p = 0.02) than studies from North America (1.40) and studies from Australia (1.33; Q = 9.29, p = 0.002). None of the continuous variables acted as a significant moderator of the ES for weight measures.
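The logic of the subgroup Q tests reported in this section can be illustrated as follows. In this sketch the between-subgroup Q is the weighted sum of squared deviations of each subgroup's pooled ES from the overall weighted mean, and it is referred to a chi-square distribution with (number of subgroups - 1) degrees of freedom. The function names and the input values are hypothetical, the closed-form p-values cover only 1 or 2 degrees of freedom, and the model actually used in the analysis may differ.

```python
import math
from typing import List, Tuple


def q_between(subgroup_es: List[float], subgroup_var: List[float]) -> Tuple[float, int]:
    """Return (Q_between, df) for subgroup pooled ESs and the variances
    of those pooled ESs (inverse-variance weights)."""
    weights = [1.0 / v for v in subgroup_var]
    grand_mean = sum(w * es for w, es in zip(weights, subgroup_es)) / sum(weights)
    q = sum(w * (es - grand_mean) ** 2 for w, es in zip(weights, subgroup_es))
    return q, len(subgroup_es) - 1


def chi2_sf(q: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(q / 2.0))
    if df == 2:
        return math.exp(-q / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# Hypothetical pooled ESs and variances for two subgroups.
q, df = q_between([1.2, 0.7], [0.02, 0.03])
print(f"Q = {q:.2f}, df = {df}, p = {chi2_sf(q, df):.4f}")
```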

Table 5 Subgroup analyses of categorical variables in ED-psychopathology measures and weight measures in AN at post-treatment

Secondary outcome measure

Depression was assessed in 40% of the conditions. At post-treatment the mean ES was 0.61 (95% CI 0.47–0.75, z = 8.55, p = 0.0001) and heterogeneous (Q = 70.1, p = 0.0001), and at follow-up it was 0.67 (95% CI 0.53–0.82, z = 9.13, p = 0.0001) but not heterogeneous. Regarding publication bias, Egger’s regression intercept was not significant (t = 1.17, p = 0.25).

Comparison between treatment methods

The results for the comparison between treatment conditions using family therapy for ED and those using CBT are presented in Table 6. On the ED-psychopathology measure the mean ES for CBT (1.05) at post-treatment was significantly higher (p = 0.001) than that for family therapy (0.68). At follow-up assessment there was also a higher (p = 0.022) ES for CBT (1.20) than for family therapy (0.84). On the weight measures for AN both treatments showed very large ESs at post-treatment and even larger at follow-up assessment, without being significantly different.

Table 6 Comparison of CBT and Family therapy (FT) on ED-psychopathology measures and weight (AN only)

Family therapy for ED and CBT were compared on the background variables for which data were available from most studies. The mean age of the samples was 15.8 and 15.3 years, the proportion of females was 96.5% and 95.2%, the proportion who declined participation in the studies was 15.8% and 12.5%, the mean duration of the eating disorders was 21.3 and 12.3 months, the POMRS scores were 45.1% and 44.8%, and the mean severity on the ED-psychopathology measure was 55.3% and 46.9% for CBT and family therapy, respectively. None of the differences between the treatments were significant. Regarding categorical variables there was no significant difference on number of RCTs versus open trials (p = 1.0), ITT vs. completer statistical analysis (p = 0.29), low vs. moderate/high RoB (p = 0.23), and inpatient vs. outpatient care (p = 0.71). However, CBT studies were done in Europe (84.6%) to a larger extent than in other continents, whereas family therapy studies were carried out in North America (39.3%), Europe (35.7%), and Australia (25%).

Effectiveness-efficacy comparison

Background and treatment variables

The comparisons of effectiveness and efficacy studies on some background and treatment variables are displayed in Table 7. Applying the Holm-Bonferroni correction on the 7 t-tests in this table yielded no significant difference between the two types of studies. This makes for a fair comparison regarding effect sizes.

Table 7 Some background and treatment data (M and SD) for effectiveness and efficacy studies

Effect size on primary outcome measure

The comparison between effectiveness and efficacy studies on eating psychopathology and weight is presented in Table 8. On the ED-psychopathology measure at post-treatment there were large ESs for both types of studies with a small difference between them (0.80 vs. 0.84) with all disorders combined. Regarding AN there was a tendency for effectiveness studies to yield a higher ES than efficacy studies (0.85 vs. 0.63), and for the Other category there was a tendency for efficacy studies to yield a higher ES than effectiveness studies (1.16 vs. 0.72). However, when applying the Holm-Bonferroni correction none of these differences was significant. At follow-up assessment for both types of studies the mean ES across disorders was maintained with a small difference between them (0.97 vs. 1.09). For the individual disorders there was no significant difference between the types of studies in AN, but for Other disorders efficacy studies yielded a significantly higher ES than effectiveness studies. However, this difference must be interpreted with caution since there were only three efficacy studies.

The results for weight in AN-conditions are shown in the lower part of Table 8. There were very large ESs for both types of studies at post-treatment and even higher at follow-up assessment. However, the difference between them was not significant.

Table 8 Effect sizes on ED-psychopathology measures and weight (for AN only) for effectiveness and efficacy studies at post-assessment and follow-up assessment

Discussion

The current meta-analysis aimed to investigate how family therapy and CBT work in the treatment of ED in children and adolescents when delivered in routine clinical care. The first aim was to examine the effectiveness of ED-focused family therapy and CBT for ED in children and adolescents. Across the various measures of ED-psychopathology and weight measures for AN, the overall within-group effect size was large to very large for all disorders combined at post-treatment, with no difference between the AN and mixed ED group. Direct comparisons of the effect size estimates to the outcomes reported in the evidence-based updates on psychosocial treatments for ED in children and adolescents [10, 78] and in systematic reviews [22, 79, 80] are challenged by the use of various outcome measures across the included studies. However, among specific studies that have reported similar effect size calculations of ED-psychopathology and weight measures for AN, our finding of large to very large ESs compares favorably to the ESs reported (but not pooled) by Datta et al. [10].

The overall attrition rate across the studies was only 15%, with no difference between the AN-studies and Mixed ED-studies. This figure is comparable with the reported dropout rate in adolescent RCTs for AN, which falls between 10 and 20% [81], but considerably lower than the 50% reported for adolescents being treated for AN in a recent review of psychotherapies for ED [82]. As dropout from treatments for ED is reported to be a significant problem, the results indicate that family therapy and CBT for ED were acceptable to youth and their caregivers. To broaden the view, outcome data for the common comorbidity of depression were extracted. However, only 40% of the conditions provided these data, and although a moderate effect size was found, the results may not be representative of the entire body of studies.

The second aim was to evaluate methodological quality and RoB in the effectiveness studies and examine potential moderators of treatment outcome. The result of the methodological quality assessment was encouraging given the high proportion of open trials and is comparable to recent meta-analyses on the effectiveness of evidence-based treatments for externalizing disorders [35], and autism spectrum disorders [36], but somewhat lower compared to internalizing disorders in children [32]. Methodological flaws were noted in several of the studies, with RCTs having a lower RoB. Overall, results showed that the majority of the studies had a moderate or high RoB.

Characteristics of the patient sample and study variables were examined as potential moderators influencing treatment outcome. Moderator analyses of the categorical variables did not provide support for a difference in ESs between RCTs and NRSI/pre-post studies across the outcomes of ED-psychopathology measures and weight measures in AN at post-treatment. Furthermore, treatment format did not moderate outcome. These findings corroborate the results from three previous meta-analyses on the effectiveness of evidence-based treatments for internalizing, externalizing, and autism spectrum disorders in children and adolescents [32, 35, 36]. Moderator analyses also showed that studies with moderate RoB produced larger effects on the ED psychopathology outcome measure compared to studies with high or low RoB. This result is similar to the finding in a meta-analysis of CBT effectiveness studies in adult ED [48]. There was also a difference in ES of weight measures outcome in AN between continents, with studies conducted in Europe reporting a higher ES compared to those from North America and Australia. Similar results were found for internalizing disorders in children [32] and obsessive compulsive disorder in adults [44]. For type of statistical analysis, the use of treatment completers data moderated the weight measures outcome in AN. The use of completer analyses may inflate results of treatment, as it may be that patients who drop out from treatment more often are not benefitting or find the treatment unacceptable [83]. It is therefore encouraging that 70% of the studies reported ITT data.

Of the continuous variables, only age moderated the outcome of ED-psychopathology. The finding that higher mean age was associated with a higher ES was in line with the meta-analyses by Svaldi [46] and van der Berg [38]. Although it has not been possible to draw firm conclusions regarding moderators of treatment outcome for ED in children and adolescents [84], one study found that older age negatively impacted outcome [85], whereas another found that age was not related to change in ED-symptoms [86]. As such, our results regarding age need to be interpreted with caution. Moderator analyses did not provide support for the other continuous variables as moderators of ED treatment effects.

The third aim was to compare the effectiveness of family therapy and CBT for ED when delivered in routine clinical care. There was a difference on the ED-psychopathology measure in favor of CBT, with a large compared to a moderate ES at post-treatment. At follow-up the difference between CBT and family therapy remained, with a very large compared to a large ES. No significant differences between family therapy for ED and CBT were found for the weight outcome in AN. Family therapy for ED and CBT studies were compared on 11 background and treatment variables. The only variable that showed a significant difference was the proportion of family therapy and CBT studies carried out in different continents, where CBT studies were more often conducted in Europe.

Whereas family therapy for ED has the strongest evidence-base for children and adolescents with ED, no therapy is effective for everyone. Data suggest that the best evidence-based approach for adolescent ED leaves about 50% of the patients not fully remitted following treatment [87]. To the best of our knowledge, there are only three RCTs comparing CBT and family therapy for ED in adolescents. Ball and Mitchell [88] found no significant differences on EDE or BMI in AN-patients; Schmidt et al. [89] studied BN-patients and reported a significantly larger reduction of binge eating for guided self-help CBT than for family therapy for ED at post-treatment but no difference at follow-up; and Le Grange et al. [90], also working with BN, found a significant difference in abstinence rate in favor of family therapy for ED, which disappeared at follow-up. Thus, these RCTs do not show a consistently higher effect for either of these treatments, and more RCTs comparing different treatments are warranted.

Our findings do not address whether one of the treatments is superior to the other for children and adolescents with ED, or for whom these therapies may be most suitable. The therapies differ in their conceptualizations, levels of parental involvement, strategies, procedures, and postulated mechanisms of action adopted to produce change. For the CBT studies the treatment format was individual in 54% of the studies and a combination of individual and family or group in 46%. In comparison, family involvement has been the cornerstone in all studies of family therapy for ED (which was combined with individual treatment in 39% of the studies). Taken together, our results lend support for the use of CBT for ED among children and adolescents and suggest that it may be considered an option when treating children and adolescents with ED in routine clinical care, and not only when family therapy for ED fails or is not feasible.

The fourth aim was to compare the outcomes of family therapy and CBT for ED when delivered in routine clinical care to efficacy studies for the same disorders. As an initial step, effectiveness and efficacy studies were compared on relevant background and treatment variables, and no differences were found. The effectiveness studies of the ED combined, AN, or the Other ED generated post-treatment ESs for ED-psychopathology outcomes that were similar to the ESs from efficacy studies. The ESs in both settings were in the large to very large range at post and follow-up assessments. The only difference between the studies was for the Other ED group at follow-up, where the efficacy studies produced a very large compared to a large ES in the effectiveness studies. However, only three efficacy conditions could be included in this comparison. For the weight measure outcome in the AN-studies the ESs at post and follow-up were in the very large range with no differences between the efficacy and effectiveness studies. This pattern of findings with no significant differences in outcomes across efficacy and effectiveness studies replicates the findings in other meta-analyses of evidence-based treatments for various mental health disorders in children [32, 35, 36] and adults [44, 48, 73,74,75].

Strong methodological elements of the current meta-analysis included a power analysis indicating a high power to detect a small effect size. Furthermore, pairs of researchers screened abstracts and extracted information from the included studies with disparities solved in consensus, and ratings of methodological quality and RoB were done by one of the authors and independently by another.

Study limitations and future directions

Although our classification criteria of effectiveness studies were predefined and assessment could be made reliably by trained raters, studies differed in the quality of reporting of the needed information. Thus, judgment was based on the sometimes limited and ambiguous information available, and some studies that should have been included may have been missed. This meta-analysis attempted to include studies on all EDs; however, for BED, OSFED, and ARFID there were few or no clinical trials to include, limiting the generalizability of our findings for these disorders. Furthermore, as only five studies included children younger than 11 years, the age range should be considered when interpreting the findings. Also, pooling the outcome based on different measures of ED-psychopathology and weight measures for AN is a limitation, and international consensus in the assessment of ED is needed. However, the sensitivity analysis of ED-psychopathology measures and the subgroup analysis of weight measures showed that the different measures did not influence the pooled ES significantly. Finally, the majority of the studies had a moderate or high RoB.

There are several areas in which the field of ED in children and adolescents can be improved. The lack of consensus definitions of response and remission in the assessment of ED calls for efforts to establish such standards that can be applied consistently across studies. Furthermore, reporting outcomes separately for each diagnostic group in studies with transdiagnostic samples, and at long-term follow-up assessments, is recommended.

Conclusion

Our findings support the effectiveness of family therapy and CBT for ED in children and adolescents. Adequately trained clinicians who provide these treatments in their work with children and adolescents and their families in routine clinical care can achieve outcomes comparable to those in research clinic settings, and the treatments have a low attrition rate. Whereas family therapy for ED has the strongest evidence-base, our results suggest that CBT could be considered an option when treating children and adolescents with ED in routine clinical care. At the same time, the results also suggest there is room for improvement as a substantial number of children and adolescents with ED do not respond to the treatments currently available.