Short-term effects of an elimination diet and healthy diet in children with attention-deficit/hyperactivity disorder: a randomized-controlled trial

An Elimination Diet (ED) may be effective in reducing symptoms of Attention-Deficit/Hyperactivity Disorder (ADHD), but has never been compared to an active control condition [i.e., Healthy Diet (HD)]. In a two-armed RCT, a total of N = 165 children (5–12 years) with ADHD were randomized by means of minimization (1:1) to either an ED (N = 84) or HD (N = 81) within two Dutch child and adolescent psychiatry centers. The design included a non-randomized comparator arm of N = 58 children treated with Care as Usual (CAU). Treatment allocation was unblinded. The primary outcome was a 5-point ordinal measure of respondership based on a combination of parent and teacher ratings of ADHD and emotion regulation, determined after 5 weeks of treatment. Ordinal regression analyses were done on an intention-to-treat basis. Fewer ED (35%) than HD (51%) participants showed a partial to full response, despite overall good-to-excellent treatment adherence (> 88%) and comparably high parental prior beliefs. Younger age and higher problem severity predicted better respondership. CAU-preferring participants more often responded favorably (56%) than ED participants, but not more often than HD participants. Small-to-medium improvements in physical health (blood pressure, heart rate, and somatic complaints) were found in response to ED/HD versus decrements in response to CAU (74% received psychostimulants). The lack of superiority of the ED versus HD suggests that for the majority of children, dietary treatment response is not rooted in food allergies or sensitivities. The comparable results for treatment with HD and CAU are remarkable given that CAU participants were probably ‘easier to treat’ than HD (and ED) participants, as proportionally fewer of them had (a suboptimal response or non-response to) prior treatment with medication (4% versus 20%). Further assessment of long-term effects is needed to evaluate the potential place of dietary treatment within clinical guidelines.
The trial is closed and registered in the Dutch trial registry, number NL5324 (https://www.onderzoekmetmensen.nl/en/trial/25997). Supplementary Information: The online version contains supplementary material available at 10.1007/s00787-023-02256-y.


Supplement S1: inclusion & exclusion criteria and research diagnosis
For eligibility, participants had to meet the following criteria: clinical and research ADHD diagnosis according to the DSM-5 (any presentation) and 5-12 years old at the inclusion date.
Comorbidities were allowed except for eating disorders (i.e. anorexia or bulimia nervosa) and diabetes mellitus. Exclusion criteria were insufficient mastery of the Dutch language in parents or children; current treatment for ADHD that could not be discontinued or was not stabilized; severe parent-child relationship problems requiring family therapy; and unwillingness to have meat or animal food products in the diet (without these products it is impossible to achieve nutritional adequacy of the overall diet for ED participants). Two participants (one HD and one ED participant) continued using a stable dosage of risperidone during the diet. Discontinuation of risperidone was not advisable for these participants.
Next to the clinical ADHD diagnosis, an ADHD research diagnosis was established (Supplement A) based on the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS) [1] in combination with teacher reports on the Strengths and Weaknesses of ADHD-symptoms and Normal-behaviors rating scale (SWAN) [2,3]. For six participants, the screening was incomplete (e.g. K-SADS and/or SWAN missing). For these participants, the clinical ADHD DSM-5 diagnosis was used to classify the ADHD presentation.

Demographics
Descriptive measures at T0 included 1) total IQ estimated using five subtests of the Wechsler Intelligence Scale for Children (WISC-III) [4]; 2) demographics (e.g. SES); and 3) self-reported parental psychopathology using the ADHD rating scale (46 items assessing ADHD symptoms of the past six months and of the ages between 0 and 12 years old) [5] and the General Health Questionnaire (GHQ-12; 12 items assessing general health of the past week) [6].

Secondary outcomes
Body Mass Index Standard Deviation Scores (BMI-SDS) were calculated using the Dutch growth reference data [7,8]. Sleep problems over the past week were examined using a 5-item questionnaire assessing problems with falling asleep, sleeping through the night, and total amount of sleep compared to peers. Somatic complaints over the past week were assessed using the Pittsburgh side-effects rating scale (18 items) [9].
Other secondary outcomes included emotional symptoms, conduct problems, peer relationship problems, and social behaviors of the past week: parents and teachers were asked to complete the Children's Social Behavior Questionnaire (CSBQ; 40 items) [10] and the SDQ (25 items) [11]. These were all completed at both T0 and T1. Family functioning and parenting styles over the past week were assessed using the Family Functioning Questionnaire including 28 items (FFQ: English translation of 'Vragenlijst Gezinsfunctioneren Ouders (VGFO)') [12] and the Brief Scale of Parental Behavior including 25 items (BSPB: English translation of 'Verkorte Schaal voor Ouderlijk Gedrag (VSOG)') [13]. The CarerQol instrument (7 items) [14] and the Parenting Stress Questionnaire including 34 items (PSQ: English translation of 'Opvoedingsbelasting Vragenlijst (OBVL)') [15] measured carer-related quality of life in caregivers in the current situation. These were all completed at both T0 and T1.

Measurements that were taken into account to interpret the results of respondership
Food consumption (all treatment conditions) was measured through an online tool ('Eetmeter', Dutch Nutrition Center) available at the website of the Dutch Nutrition Center or as a mobile app (free of charge) [3]. The validity of this tool is sufficient [16]. Parents were asked to report all food consumed by the child for two weekdays and one weekend day before T0 and to send this information to the research staff by email (an export function is part of the online tool). The exported files included macro- and micronutrient values originating from the Dutch Food Composition Database (NEVO) [17] and the Dutch Nutrition Center Database for all reported foods. Based on this information, mean daily nutrient intake was calculated. Adherence to treatment was assessed every week on a 10-point scale (ranging from 1, no adherence, to 10, perfect adherence to the diet) by dieticians and parents separately.
An aggregated adherence measure was created: 1) excellent (i.e. every week scores of eight or higher); 2) good (i.e. scores not lower than six) and 3) insufficient (i.e. at least once a score lower than six).
Adherence was only calculated for participants with at least two weeks of adherence data. Parents' prior beliefs about the success and burden of treatment were evaluated using a 5-item questionnaire at T0 and T1. The time in weeks between the start of treatment and T1 was calculated, as were the total amount of time and the number of consults needed during the dietician supervision. Parents rated their overall treatment trajectory experience on a scale of 0 to 10 using the GGZ-Thermometer, with higher scores reflecting more satisfaction with the treatment trajectory (http://www.ggznederland.nl/leden/thermometer/handleiding.html). Adverse events (AE) were assessed as described in the TRACE protocol paper [3].
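The aggregation rule for adherence can be illustrated with a minimal Python sketch. It assumes the rule is applied to the weekly scores of a single rater; the text does not say how dietician and parent ratings were combined, and the function name `aggregate_adherence` is hypothetical.

```python
from typing import Optional, Sequence


def aggregate_adherence(weekly_scores: Sequence[int]) -> Optional[str]:
    """Collapse weekly 1-10 adherence ratings from one rater into a category.

    Returns None when fewer than two weeks of data are available, because
    adherence was only calculated with at least two weeks of adherence data.
    """
    if len(weekly_scores) < 2:
        return None
    lowest = min(weekly_scores)
    if lowest >= 8:   # every weekly score is eight or higher
        return "excellent"
    if lowest >= 6:   # no weekly score is lower than six
        return "good"
    return "insufficient"  # at least one weekly score is lower than six


print(aggregate_adherence([9, 8, 10, 8, 9]))
print(aggregate_adherence([9, 5, 8]))
```

Because each category depends only on the lowest weekly score, a single low week is enough to move a participant into a lower category.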

Supplement S4: respondership
Response to treatment was evaluated by assessing the percentage change in ADHD and emotion regulation problems of the past week between T0 and T1 (i.e. (T0 - T1) / T0 × 100) [18]. Two exceptions to this formula were included: 1) if the T0 score was zero, no change score could be computed; therefore, a value of one was added to both the T0 and the T1 score to be able to compute a change score; 2) absolute values were used to ensure that an improvement was not coded as a deterioration or vice versa.
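The change score can be sketched in Python as follows. This is an illustration, not the authors' implementation: the function name `percent_change` is hypothetical, and the second exception is read here as taking the absolute value of the T0 score in the denominator, so that a negative baseline score does not flip the sign of the change. The text does not spell out this detail.

```python
def percent_change(t0: float, t1: float) -> float:
    """Percentage change from T0 to T1; positive values mean fewer problems at T1."""
    if t0 == 0:
        # Exception 1: shift both scores by one so the change can be computed.
        t0, t1 = t0 + 1, t1 + 1
    # Exception 2 (assumed reading): absolute value of the baseline in the denominator.
    return 100 * (t0 - t1) / abs(t0)


print(percent_change(10, 7))   # a drop from 10 to 7
print(percent_change(0, 1))    # baseline of zero, handled by the shift
```

Under this reading, a drop from 10 to 7 gives a change of 30%, which is exactly the threshold used for a significant response.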
A 30% or more symptom decrease was regarded as a significant response to treatment and a 30% or more symptom increase was regarded as a significant deterioration of symptoms. The primary outcome variable 'respondership' is divided into five categories:
1. Full responder (significant response on both parent and teacher rated scales):
   a. ≥ 30% improvement on at least one of three parent rated scales AND ≥ 30% improvement on at least one of three teacher rated scales AND on none of the parent and teacher scales ≥ 20% deterioration
   b. OR: ≥ 30% improvement including one teacher rated scale and two parent rated scales or vice versa AND on maximally one scale a deterioration between 20% and 25% AND on all other scales a maximum deterioration of ≤ 20%
2. Partial responder (significant response on parent or teacher rated scale):
   a. ≥ 30% improvement on at least one of three parent rated scales AND on all three teacher scales no improvement of ≥ 30% AND all scales a maximum deterioration of < 30%
   b. OR: ≥ 30% improvement on at least one of three teacher rated scales AND on all three parent scales no improvement of ≥ 30% AND all scales a maximum deterioration of < 30%
   c. OR: improvement between 20% and 30% on at least one of three parent rated scales AND improvement between 20% and 30% on at least one of three teacher rated scales AND all scales a maximum deterioration of < 30%
   d. OR: ≥ 30% improvement on at least one of three parent rated scales AND ≥ 30% improvement on at least one of three teacher rated scales AND one scale a deterioration between 25% and 30% AND all other scales a maximum deterioration of < 30%
3. Mixed responder (significant response on at least one parent rated scale and significant deterioration on at least one teacher rated scale or vice versa, or a significant difference within rater):
   a. ≥ 30% improvement on at least one of three parent rated scales AND ≥ 30% deterioration on at least one of three teacher scales
   b. OR: ≥ 30% improvement on at least one of three teacher rated scales AND ≥ 30% deterioration on at least one of three parent scales
   c. OR: ≥ 30% improvement on at least one of three parent rated scales AND ≥ 30% deterioration on at least one of three parent scales
   d. OR: ≥ 30% improvement on at least one of three teacher rated scales AND ≥ 30% deterioration on at least one of three teacher scales
4. Non-responder (no significant response): all six scales show no ≥ 30% improvement or ≥ 30% deterioration
5. Deterioration (significant deterioration on at least one parent or teacher rated scale): ≥ 30% deterioration on at least one of three parent rated scales OR ≥ 30% deterioration on at least one of three teacher rated scales AND a maximum improvement of < 30% on all scales

Supplement S5: (un)planned missing data

Planned missing data
For 13 CAU participants, data were missing because these participants chose to participate only in the measures that could be taken from home. This resulted in planned missing data for IQ and physical measurements (blood pressure, heart rate, weight, height). Analyses were performed to examine whether this subsample of CAU participants differed on demographic data (see Table 1 for an overview of the demographic data) from the other CAU participants. One statistically significant between-group difference was found: parents in the former group had higher prior beliefs about the success of treatment (M = 3.33, SD = 0.61) compared to parents in the latter group (M = 3.81, SD = 0.50), t(219) = -2.76, p = 0.006.

Unplanned missing data
For 23 participants (N = 7 ED participants; N = 10 HD participants; N = 6 CAU participants), there were unplanned missing data from parents or teachers on the primary outcome measures.
Consequently, these participants could not be categorized in respondership categories for which data from both raters were needed (i.e. full or mixed responders). Analyses were performed to examine whether this subsample differed on demographic data from participants without these missing data. Two statistically significant between-group differences were found. First, fathers in the former group more often had a country of birth other than the Netherlands (21.7%) compared to fathers in the latter group (8.0%), χ²(1) = 4.52, p = 0.034. Second, parents in the former group less often experienced clinical levels of stress (9.1%; based on the GHQ-12) compared to parents in the latter group (36.0%), χ²(1) = 6.46, p = 0.011.

Supplement S6: sample size calculation
The sample size was justified based on the assumption of superiority, i.e. that the ED was more effective than the HD on the ordinal primary outcome respondership (i.e. five categories). A clinically relevant outcome was defined as detecting twice as many full responders in the ED group as in the HD group. Each dietary group included 81 children. With this sample size and using ordinal regression, the power was 0.99 (α = 0.05, two-sided test) to detect twice as many full responders in the ED as in the HD (Table S1: scenario 1). In addition, the power to detect one and a half times as many full responders in the ED as in the HD was 0.64 (α = 0.05, two-sided test) (Table S1: scenario 2).

To evaluate whether the criteria for the ADHD research diagnosis were appropriate to determine the ADD presentation, we compared baseline characteristics of the ADD presentation group to all other presentations (see Table S3). Table S3 includes only child characteristics, because no differences were found on parent characteristics (see Table 1 for parent characteristics). Results revealed differences between groups, such as more girls and fewer comorbid ODD problems in the ADD presentation group compared to the other presentations, which is in line with previous studies [19,20].

Supplement S8: assessment of comorbidity
ODD was based on the K-SADS filled out by parents and the conduct problem subscale of the SDQ filled out by teachers. Specifically, ODD was defined by a total score on the K-SADS of eight or higher, including at least three items with a score of three ('severe problems'), and a score of four or higher (range 0-10) on the SDQ subscale.
Probable ASD was based on the CSBQ and the prosocial behavior subscale of the SDQ, both filled out by parents and teachers. Specifically, probable ASD was defined by 1) a score equal to or higher than the cut-off score on the CSBQ filled out by parents [10] or a score equal to or lower than four (range 0-10) on the SDQ subscale filled out by parents, and 2) a score equal to or higher than the cut-off score on the CSBQ filled out by teachers based on age and sex [10] or a score equal to or lower than four on the SDQ subscale filled out by teachers.
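The ODD and probable ASD definitions above amount to simple decision rules, sketched here in Python. The function and parameter names are hypothetical. The sketch assumes that the K-SADS total is the sum of the ODD item scores, and that the CSBQ cut-off (which depends on age and sex [10]) is supplied by the caller; the cut-off values in the example calls are placeholders, not the published norms.

```python
from typing import Sequence


def meets_odd(ksads_odd_items: Sequence[int], sdq_conduct_teacher: int) -> bool:
    """ODD: K-SADS total >= 8 with at least three items scored 3, and teacher SDQ conduct >= 4."""
    total = sum(ksads_odd_items)  # assumption: total score is the sum of item scores
    severe_items = sum(1 for score in ksads_odd_items if score >= 3)
    return total >= 8 and severe_items >= 3 and sdq_conduct_teacher >= 4


def rater_indicates_asd(csbq_total: int, csbq_cutoff: int, sdq_prosocial: int) -> bool:
    """One rater indicates ASD if CSBQ is at or above the cut-off or SDQ prosocial is 4 or lower."""
    return csbq_total >= csbq_cutoff or sdq_prosocial <= 4


def meets_probable_asd(parent_csbq: int, parent_cutoff: int, parent_prosocial: int,
                       teacher_csbq: int, teacher_cutoff: int, teacher_prosocial: int) -> bool:
    """Probable ASD requires an indication from both the parent and the teacher."""
    return (rater_indicates_asd(parent_csbq, parent_cutoff, parent_prosocial)
            and rater_indicates_asd(teacher_csbq, teacher_cutoff, teacher_prosocial))


# Placeholder scores and cut-offs, for illustration only.
print(meets_odd([3, 3, 3, 0, 0, 0, 0, 0], sdq_conduct_teacher=4))
print(meets_probable_asd(30, 25, 6, 10, 25, 4))
```

Both rules combine information across informants: ODD needs the parent interview and the teacher questionnaire to agree, and probable ASD needs an indication from each rater on at least one of the two instruments.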
Clinically elevated internalizing problems were based on the emotional symptoms subscale of the SDQ filled out by parents and teachers. Specifically, this was defined by a score equal to or higher than five (range 0-10) on the subscale filled out by parents and/or a score equal to or higher than six on the subscale filled out by teachers.

Results showed a significant difference in adherence between ED participants who followed the diet until T1 and ED participants who quit the diet before T1. The latter group more often showed insufficient adherence to treatment before quitting, χ²(2, N = 147) = 8.06, p = 0.018. For HD participants, a trend was found, χ²(2, N = 147) = 5.29, p = 0.071. Therefore, proportions of adherence were compared in the HD group using a z-test with Bonferroni corrections. Results showed a difference of 26.2% (95% CI [3.87, 48.61]) in the category insufficient adherence: more HD participants who quit the diet before T1 were categorized in the insufficient adherence group compared to participants who followed the diet until T1.

Supplement S9: adherence to dietary treatments
Logistic regression analyses using the backward step method were run to determine which factors predicted good to excellent adherence to the dietary treatments. Analyses including child characteristics as predictors showed that older children were less likely to show good or excellent adherence to the diets, as rated by parents (OR: 0.57, 95% CI [0.34, 0.95], p = 0.030). The interaction term with treatment was significant (OR: 0.

Note. SD = Standard Deviation; en% = energy percent; g = grams; n.a. = not applicable. a higher scores reflect stronger beliefs in successful effects of treatment; b a complete overview of all assessed micronutrients can be found in Appendix S10; c no significant difference after correcting for energy (kcal); d based on international cut-off points for BMI for overweight [21]; e after one week, three and four weeks of the Elimination Diet and after two weeks in the Healthy Diet; f after three and five weeks in the Elimination Diet and after five days in the Healthy Diet.

CAU participants had lower phosphorus intake compared to HD participants (p = 0.042), which was non-significant after correcting for energy intake (p = 0.326).

Supplement S11: assumptions statistical analyses
Most assumptions of the cumulative odds ordinal logistic regression were met: there were proportional odds, as assessed by a full likelihood ratio test comparing the fitted model to a model with varying location parameters, χ²(3) = 5.53, p = 0.137. The final model did not significantly predict the dependent variable over and above the intercept-only model, χ²(1) = 1.82, p = .18. In addition, the deviance goodness-of-fit test indicated that the model was a good fit to the observed data, χ²(3) = 5.53, p = 0.137.
Most assumptions of ANCOVA were met: there was no significant interaction between the treatment arm and any of the T0 variables, suggesting that the assumption of homogeneity of the regression slopes was met. The Levene F test was not significant for most dependent variables (except for two), which supports the assumption of homogeneity of variance. For BMI (p = 0.001) and parent rated ER (p = 0.036) this assumption was violated. However, ANCOVA is quite robust when this assumption is violated, if sample sizes do not differ from each other by more than a factor of three. This is the case in the present study with group sample sizes of 84, 81 and 58. In addition, the two dietary treatments did not differ on baseline ADHD and emotion regulation problems (Table 3). Standardized residuals for the interventions and for the overall model were normally distributed for the majority of variables, as assessed by Shapiro-Wilk's test (p > .05). For diastolic and systolic blood pressure, heart rate, parental quality of life, somatic complaints and positive parental engagement, this assumption was violated.
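The sample-size condition mentioned above can be checked with a few lines of Python. This is only an arithmetic illustration of the stated rule of thumb; the helper name `max_group_size_ratio` is hypothetical.

```python
def max_group_size_ratio(group_sizes):
    """Ratio of the largest to the smallest group size."""
    return max(group_sizes) / min(group_sizes)


ratio = max_group_size_ratio([84, 81, 58])  # ED, HD and CAU group sizes
print(round(ratio, 2))  # 1.45
print(ratio < 3)        # True: the groups differ by less than a factor of three
```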
Inspection of histograms and of skewness and kurtosis values showed no significant skewness or kurtosis for positive engagement, parental quality of life and somatic complaints. Transforming these variables did not improve normality. Given that ANCOVA is fairly robust when this assumption is violated, we did not transform the variables and used the original variables in the analyses. Using a Van der Waerden transformation for blood pressure and heart rate did improve normality in these variables. However, clinical interpretation deteriorated when using this transformation. Given this fact and given that ANCOVA is fairly robust for non-normally distributed data, we did not transform these variables.

ADHD behaviors. In the partial responder group, improvement in behaviors at school and at home was found for children receiving HD or CAU; for children improving partially after receiving ED, this was completely attributable to parental ratings. In addition, compared to HD and CAU, the mixed responders in the ED group seem to consist more often of children whose parents report improvement and whose teachers report deterioration.

Table S14. Logistic Regression Analyses using Baseline Measurements to predict specific Respondership Categories versus all other Respondership Categories for Dietary Treatments
Note. The respondership category of interest (full, partial, mixed, non, or deterioration) is coded as 1 and all other categories are coded as 0; bold numbers depict significant odds ratios (confidence interval). a higher scores reflect more problems or lower parental quality of life; b ADD presentation is coded as 1 and other presentations as 0.
Table S13 displays only the baseline characteristics where a significant predictive effect was found. Results demonstrate that more inattention problems at baseline predicted higher chances of partial and non-respondership and lower chances of mixed respondership. Moreover, more emotion regulation problems (rated by teacher) at baseline predicted higher odds of partial respondership and lower odds of mixed respondership. ADD presentation predicted higher odds of mixed respondership. More internalizing problems (rated by parent) at baseline predicted higher odds of full respondership. Lower parental quality of life at baseline predicted higher odds of non-respondership. Finally, children of mothers with secondary education (i.e. junior general secondary, senior secondary vocational, senior general secondary, pre-university) and of mothers with a country of birth other than the Netherlands had higher odds of being categorized as mixed responders.

Supplement S15: results of ANCOVA of secondary outcomes
Note. d = Cohen's d; ηp² = partial eta squared; n.a. = not applicable. a values represent N=T0/N=T1; b SDS = Standard Deviation Score (how many SDs a measure deviates from the median); c higher scores reflect more problems; d higher scores reflect more engagement in this parenting style; e not applicable for CAU group, because parents did not fill out this questionnaire; f higher scores reflect lower quality of life; g higher scores reflect higher happiness

Supplement S16: different parental raters at T0 and T1
For 17 participants (N = 6 ED participants; N = 9 HD participants; N = 2 CAU participants), mothers filled out questionnaires at T0 and fathers at T1, or vice versa. A sensitivity analysis without these participants was performed to examine whether the results of the ordinal regression analysis changed.
Results showed the same pattern, at trend level: the odds ratio of being in a better response category for ED participants versus HD participants was 0.59, 95% CI [0.33, 1.11], p = 0.078.
Running the ordinal regression analysis without these 17 participants showed the same pattern when comparing the ED to the CAU group: the odds ratio of being in a better response category for ED participants versus the CAU group was 0.39, 95% CI [0.20, 0.73], p = 0.004. The odds ratio of being in a better response category for HD participants versus the CAU group also showed the same pattern: 0.66, 95% CI [0.35, 1.24], p = 0.196.
Results of secondary outcome measurements also showed the same pattern when this subsample of participants was excluded from the analyses. All in all, the results of almost all sensitivity analyses did not differ from the original analyses; therefore, we chose to include all participants in the analyses.

Supplement 13: Figure S1. Percentage Change in T0 versus T1 in ADHD and ER Problems per Respondership Category

Table S2: Hypothetical Distribution of Participants for Power Calculation

Table S3. Baseline Descriptive Demographics of the ADD Presentation compared to other

Table S4. Adherence to Elimination Diet

Table S7. Micronutrient Intake at Baseline

Table S15. Results of ANCOVA of Secondary Outcomes