Introduction

The neuro-developmental disorder Attention-Deficit Hyperactivity Disorder (ADHD) is characterized by inattention, hyperactivity and impulsivity [1]. These symptoms are associated with academic problems such as lower grades, grade repetition and increased school drop-out [2,3,4,5]. Academic improvement is a common treatment target for children with ADHD, with stimulant medication being the most commonly prescribed treatment [6]. Stimulants are clinically effective in reducing ADHD symptoms in the short and medium term [7, 8]. Moreover, there is evidence that their benefits extend to improvements in cognition relevant for academic performance [9].

In the past, there have been several reviews of the effects of stimulant medication on academic performance [10,11,12,13,14,15,16]. These reviews report little evidence for positive effects [10, 11, 16, 17]. However, the first meta-analysis [13] demonstrated 9.7–14.4% (p < .001) improvements with stimulant medication compared to placebo in seatwork productivity (number of assignments completed) and on-task behavior (amount of time actively spent on seatwork). The effect of stimulant medication on the quality of academic performance (academic accuracy) was less clear-cut: in one analysis, only a third of the studies reported effects of medication on academic accuracy and the pooled effect was not significant [13].

The existing reviews [10,11,12,13,14,15,16] and meta-analysis by Prasad et al. [13] have a number of limitations. First, because the negative association between ADHD symptoms and both reading and math is stronger than the association between ADHD symptoms and spelling [5] and there is evidence from recent medication trials that medication efficacy differs between academic subjects [18, 19], it is important to independently assess the effects of medication on different academic subjects. The meta-analytic results of Prasad and colleagues [13] only applied to seatwork assigned by participants’ teachers (independent of the academic subject) and, therefore, tell us little about the possible differential effects of stimulants on specific academic subjects. Second, Prasad and colleagues reported on the number of items answered correctly, a measure that might be confounded by the number of items attempted (task productivity), rather than on percentage correct. Therefore, improvements in items completed correctly may merely reflect increased productivity. It is thus important to distinguish between improvements in accuracy and productivity, especially because long-term studies suggest that improvements in test scores with medication are often not accompanied by improvements in longer term academic outcomes, such as grades and grade repetition [14, 15].

Ultimately, prior reviews have not resolved the key question of whether there are improvements in core academic skills or just improvements in academic productivity. Here we conducted a meta-analysis to address this issue. Our meta-analysis improved upon previous work in several respects. First, in contrast to the previous reviews, we quantified both accuracy and productivity while distinguishing between the core academic subjects (math, reading and spelling). Second, we added 6 years of literature to the prior meta-analysis [13], which more than doubled the number of studies included. Third, these more recent studies allowed for exploration of the moderating effects of demographic and disorder-related variables (age, gender, ADHD subtype and severity, and commonly reported comorbidity with oppositional defiant disorder, conduct disorder and learning disorders) and study characteristics [medication release system, dosage, titration method, time of measurement (i.e., hours after intake) and trial duration] on medication efficacy. Fourth, we followed up on recent studies suggesting that academic improvements due to stimulant medication were partly mediated by behavioral improvements [19, 20]. Therefore, the current meta-analysis included symptom improvements and on-task behavior (% time on task) as potential mediators in the analyses.

Methods

This systematic review conformed to PRISMA [21].

Study selection

We included studies published in the English language in peer-reviewed journals that (1) evaluated the effects of stimulant medication on academic functioning; (2) included mostly (at least 80% of the sample) primary school children (male and female) with a primary diagnosis of ADHD [established using the DSM-III, DSM-III-Revised (DSM-III-R) or DSM-IV/DSM-IV-Text Revisions (DSM-IV-TR) or ICD-10 criteria]; (3) evaluated the effects of methylphenidate (MPH) (immediate or extended release formulations or transdermal) on standardized achievement tests for math, reading or spelling; and (4) used a placebo-controlled crossover design or between-subject design. This review focused on primary school children as it is at this age that school teachers, concerned about academic performance, often drive the referral process, advising parents to seek help for their children’s ADHD. Further, there is a big difference in medication use between primary school age students and high school students (e.g., non-compliance rates are much higher in high school, see for example [22]). The computerized databases PubMed, EMBASE, ERIC and PsycINFO were used to identify relevant studies up to October 2017. The following search terms and all possible equivalents were used to search article title and abstract: (1) disorder terms, e.g., ‘ADHD’; (2) treatment terms, e.g., ‘methylphenidate’, including all brand names; (3) outcome terms, e.g., ‘academic’, ‘school’, ‘classroom’, ‘math’, ‘reading’, ‘spelling’, ‘writing’, ‘on-task’, ‘off-task’. In case of missing or incomplete data, authors were contacted twice for additional data. When data were presented in graphs only, we used GetData Graph Digitizer version 2.26 [23] to extract the exact numbers, which was done successfully for one study [24]. In cases where multiple articles were based on the same sample, we selected the original, most comprehensive report on that study, resulting in the exclusion of four studies [25,26,27,28] as these data were originally described elsewhere [18, 29, 30]. See Fig. 1 for a flow diagram of the meta-analytic search and study selection. The first author (AK) and a second independent investigator reviewed titles and abstracts for eligibility. Full texts were also reviewed by the first author as well as by an independent investigator. A third independent investigator resolved discrepancies. Reference lists of included articles were searched for additional articles meeting the inclusion criteria.

Fig. 1 Flow diagram of the meta-analytic search and study selection

A total of 3084 records were identified corresponding to 2594 unique articles. Thirty-four articles met the inclusion criteria for meta-analysis (Fig. 1). Study characteristics, including design, medication titration, dependent variables, mediators and moderators obtained from each study are displayed in supplementary material Table E1.

Measures and data extraction

Table E2 in supplementary material gives an overview of tasks and questionnaires used to assess academic outcomes, an overview of the selected mediator and moderator variables, as well as the measures derived from the academic tasks.

Academic outcomes

Articles were included if they provided information about either accuracy or productivity scores for math, reading or spelling, or a combination of these. When mean accuracy and productivity scores were not reported as dependent variables, they were calculated by hand. Accuracy was calculated by dividing the mean number of correct responses by the mean number of items completed. Productivity was calculated by dividing the mean number of items completed by the total number of items.
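The two derived measures can be illustrated with a minimal sketch. The function names and the condition means below are hypothetical and serve only to show the arithmetic described above; they are not taken from any included study.

```python
def accuracy(mean_correct: float, mean_completed: float) -> float:
    """Proportion of completed items that were answered correctly."""
    if mean_completed <= 0:
        raise ValueError("mean_completed must be positive")
    return mean_correct / mean_completed


def productivity(mean_completed: float, total_items: float) -> float:
    """Proportion of the available items that were completed."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return mean_completed / total_items


# Hypothetical condition means for a single study (illustrative values only).
placebo = {"correct": 18.0, "completed": 24.0}
mph = {"correct": 24.0, "completed": 30.0}
total_items = 40.0

acc_placebo = accuracy(placebo["correct"], placebo["completed"])
acc_mph = accuracy(mph["correct"], mph["completed"])
prod_placebo = productivity(placebo["completed"], total_items)
prod_mph = productivity(mph["completed"], total_items)
```

In this made-up example the number of correct items rises by a third (18 to 24), yet accuracy rises by only 5 percentage points (75% to 80%) while productivity rises by 15 percentage points (60% to 75%), which is the kind of confound that separating the two measures is meant to expose.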

Math and reading tasks were always speeded tasks requiring participants to complete as many items as possible in a limited amount of time. Math tasks always consisted of simple math problems (addition, subtraction, multiplication and division) generally presented in ascending order of difficulty over a fixed period of time. Reading tasks consisted of a short text passage followed by multiple choice questions. Reading paragraphs were adapted to the student’s reading level. Meta-analyses were performed for math accuracy, math productivity, reading accuracy and reading number attempted. The latter was chosen as an outcome because reading productivity could not be calculated as the total number of reading items differed per study and was not reported in combination with reading number attempted. Reading number attempted is an informative measure as the included studies used identical tasks and time limits. Spelling was measured by spelling lists assigned by teachers or taken from local school district lists. Only two out of three studies from our search reported spelling accuracy [31, 32]; the third study only reported standardized means related to baseline scores [33]. As only two studies met inclusion criteria and the minimum number of studies to perform a meta-analysis is three [34], we limited our analysis to a narrative description and qualitative synthesis.

Mediators

ADHD symptom improvements were included as mediators if means and standard deviations were available. Because the number of studies reporting on parent-rated symptom improvements or on-task behavior was limited (n = 2 and n = 8, respectively) and at least ten studies are recommended for reliable meta-regression [34], we only included teacher-rated symptom improvements in our mediator analysis and performed meta-regression for math accuracy (n = 17) and math productivity (n = 11). Teacher-rated symptom improvements were measured with standardized questionnaires, which were either derivatives from the Conners Rating Scale [35], the Strength and Weakness of ADHD symptoms and Normal Behavior (SWAN) rating scale [36, 37] or the Swanson, Kotkin, Agler, M-Flynn, and Pelham (SKAMP) rating scale. SKAMP ratings show high correlations (r = .50–.84) with Conners rating scales [38]. Supplementary material Table E2 gives an overview of all questionnaires used in the studies included in this meta-analysis. Reliability and validity of all questionnaires used have been established [38, 39]. Scores were standardized (mean difference between conditions divided by SD of the placebo condition) for inclusion in the meta-regression. Mediators were investigated using meta-regression, using difference scores (MPH minus placebo).
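The standardization of symptom ratings described above can be sketched as follows. The function name and the rating values are hypothetical, and the sign convention (MPH minus placebo) follows the difference scores used for the mediator analysis.

```python
def standardized_difference(mean_mph: float, mean_placebo: float,
                            sd_placebo: float) -> float:
    """Mean difference (MPH minus placebo) in placebo-SD units."""
    if sd_placebo <= 0:
        raise ValueError("sd_placebo must be positive")
    return (mean_mph - mean_placebo) / sd_placebo


# Hypothetical teacher ratings where lower scores mean fewer symptoms.
d_symptoms = standardized_difference(mean_mph=8.0, mean_placebo=14.0,
                                     sd_placebo=6.0)
```

With these illustrative numbers the rating drops by one placebo standard deviation, so the standardized score is negative; whether improvement appears as a negative or positive value depends on the scoring direction of the questionnaire.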

Moderators

Because the number of studies was limited and at least ten studies are recommended for reliable meta-regression [34], we performed meta-regression analyses only for math outcomes. For math accuracy we tested the following moderators: (1) demographic moderators: age (years) and gender (percent male); (2) disorder-related moderators: percent of children diagnosed with ADHD-inattentive subtype, percent diagnosed with comorbid ODD or CD, and parent-rated ADHD severity [standardized (mean divided by SD) baseline ADHD symptom ratings on standardized questionnaires; for an overview of questionnaires used to assess severity of ADHD symptoms, see supplementary material Table E2]; and (3) study characteristics: release system (immediate versus extended release, the latter including transdermal), duration of the study conditions (days), time of measurement (post-dose, in hours), medication dosage (mg) and titration method (clinical titration versus fixed dosages). For math productivity we tested demographic and disorder-related moderators (age, gender and percent diagnosed with ADHD-inattentive subtype) and study characteristics (release system, trial duration and titration method). An insufficient number of studies reported on comorbid learning disorders. In case of doubt, authors were contacted.

Moderators were explored using meta-regression between the study samples’ effect sizes for academic performance and the selected moderators. Mediator and moderator effects were studied separately for math accuracy and math productivity using meta-regression with a random-effects model (method of moments) [34].
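As a rough illustration of what a single-moderator meta-regression computes, the sketch below fits a weighted least-squares line with weights 1 / (within-study variance + tau²). It is not the routine of the software used in this review: the between-study variance `tau2` is simply passed in as an assumed value rather than estimated by the method of moments, and all names and data are hypothetical.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares regression of effect size on one moderator.

    Weights are 1 / (within-study variance + tau2), with tau2 supplied by
    the caller. Returns intercept, slope, the slope's standard error,
    its Z value and a two-sided p value from the normal distribution.
    """
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator has no variation across studies")
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, moderator, effects))
    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)  # variance of the slope with known weights
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return {"intercept": y_bar - slope * x_bar, "slope": slope,
            "se": se, "z": z, "p": p}


# Hypothetical data: risk differences, their variances and mean sample age.
fit = meta_regression(effects=[0.02, 0.03, 0.04, 0.05],
                      variances=[0.0004, 0.0004, 0.0004, 0.0004],
                      moderator=[8.0, 9.0, 10.0, 11.0])
```

A categorical moderator such as release system could be entered the same way by coding it 0/1, in which case the slope is the difference in pooled effect between the two groups of studies.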

Statistical analysis

Statistical analyses were performed using SPSS version 21.0 [40] and Comprehensive Meta-Analysis software V3.0 [41]. Because accuracy and productivity measures are proportional measures that require effect sizes for binary data [34], risk differences (MPH minus placebo) were calculated. The standard errors of the risk difference were calculated [42] because included articles commonly reported on the number correct and the number completed and, therefore, the reported p values and standard deviations were not applicable to the calculated risk differences. Effect sizes were calculated for math accuracy, math productivity, reading accuracy and reading number attempted. In supplementary material Table E1, we provide a narrative description of the studies reporting on spelling. The derived effect sizes were weighted by their inverse variance to account for differences in sample size and error of measurement [34]. As heterogeneity may have been introduced using data from studies with different designs (i.e., differences in treatment duration, dosages) and different participants (e.g., differences in comorbidity), all meta-analytic effect sizes were calculated using a random effects model. The I² statistic was used to assess heterogeneity of effect sizes, where values of 25, 50 and 75% indicate low, moderate and high heterogeneity, respectively [43].
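The pipeline described above (risk difference, inverse-variance weighting, random-effects pooling and heterogeneity) can be sketched in a few lines. Several parts are assumptions made for illustration: the variance formula is the textbook one for two independent proportions and may differ from the procedure in [42] (in particular, it ignores the within-subject correlation of crossover designs); the between-study variance uses the DerSimonian-Laird moment estimator; and the study values are invented.

```python
import math


def risk_difference(p_mph, n_mph, p_placebo, n_placebo):
    """Return the risk difference (MPH minus placebo) and its variance.

    Assumption: textbook variance for two independent proportions.
    """
    rd = p_mph - p_placebo
    var = (p_mph * (1 - p_mph) / n_mph
           + p_placebo * (1 - p_placebo) / n_placebo)
    return rd, var


def random_effects_pool(effects, variances):
    """Inverse-variance pooling with a moment estimate of tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Hypothetical studies: (accuracy on MPH, n, accuracy on placebo, n).
studies = [(0.80, 40, 0.75, 40), (0.78, 60, 0.74, 60), (0.85, 30, 0.83, 30)]
pairs = [risk_difference(*s) for s in studies]
result = random_effects_pool([rd for rd, _ in pairs],
                             [var for _, var in pairs])
```

When Q does not exceed its degrees of freedom, both tau² and I² are truncated at zero and the random-effects estimate coincides with the fixed-effect estimate.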

Rosenthal’s fail-safe n was calculated to determine the number of studies with a null effect necessary to cancel out significant effect sizes, where fail-safe n values > 5k + 10 were considered robust and k refers to the number of samples on which the relevant effect size was calculated [44]. Further, Egger funnel plot asymmetry was used to assess publication bias [45]. Associations between effect size and sample size were investigated to assess the possibility that studies with small samples and large effect sizes were more easily published than studies reporting non-significant findings. All tests of significance were two sided with α = .05. Risk of bias was estimated for each study based on Cochrane guidelines [46].
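A minimal sketch of the two publication-bias checks follows. The fail-safe n uses Rosenthal's formula with a one-tailed critical value of 1.645 as the default, which is an assumption here because the critical value used by the software is not stated. The Egger function returns only the regression intercept; its significance test (a t test with k − 2 degrees of freedom) is omitted. All Z values and effect sizes are hypothetical.

```python
def failsafe_n(z_values, z_alpha=1.645):
    """Rosenthal's fail-safe n: (sum of Z)^2 / z_alpha^2 - k."""
    k = len(z_values)
    return max(0.0, sum(z_values) ** 2 / z_alpha ** 2 - k)


def is_robust(n_fs, k):
    """Apply the 5k + 10 tolerance criterion."""
    return n_fs > 5 * k + 10


def egger_intercept(effects, standard_errors):
    """Intercept of (effect / SE) regressed on precision (1 / SE).

    An intercept far from zero suggests funnel plot asymmetry.
    """
    x = [1.0 / se for se in standard_errors]
    y = [e / se for e, se in zip(effects, standard_errors)]
    k = len(x)
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must differ between studies")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar


# Hypothetical per-study Z values.
z_values = [2.0, 2.5, 1.8, 3.0, 2.2]
n_fs = failsafe_n(z_values)
robust = is_robust(n_fs, len(z_values))

# Hypothetical studies with identical effects but different precision.
intercept = egger_intercept([0.03, 0.03, 0.03], [0.01, 0.02, 0.03])
```

In the example, five studies give a fail-safe n of roughly 44, which exceeds the tolerance level of 5 × 5 + 10 = 35, and identical effects across studies of differing precision yield an Egger intercept of zero.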

Results

A combined total of 1777 children from 34 different studies were included in the meta-analyses. Another 425 children from seven studies were included in the qualitative synthesis because results were either reported in figures only or exact values were not reported, including six studies on math performance and three studies on spelling accuracy. Table E1 provides an overview of which studies qualified for meta-analysis and gives a narrative description of the results of those studies that did not qualify. Meta-analyses were conducted for math accuracy (29 studies, N = 1528) and math productivity (17 studies, N = 912). For reading, meta-analyses were conducted for reading accuracy (nine studies, N = 207) and number of items attempted (five studies, N = 100). Most studies (88.2%) used a placebo-controlled crossover design. Four studies (11.8%) used a between-subject design. In 73.5% of the studies, medication dosage was clinically titrated on symptom improvement before start of the trial. In the other 26.5% of the studies, dosages were fixed. When multiple dosages were used in randomized order, we included results from the dosage showing greatest effects on academic outcomes to optimize MPH efficacy (please see Table E1 for details). While all studies predominantly involved primary school children, one study also included children from middle school (aged 12–16, 16.5%).

Effects of MPH on academic performance

Table 1 provides an overview of all meta-analytic results, heterogeneity statistics and the results of the publication bias analyses.

Table 1 Meta-analytic results for the effects of MPH on academic performance

Math

The meta-analytic results showed that MPH significantly improved math accuracy by 3.0% (p = .001) and math productivity by 7.8% (p < .001), see Figs. 2 and 3, respectively. Results from the four studies not qualifying for meta-analysis and reviewed in our qualitative synthesis (see Table E1) corroborate our meta-analytic findings.

Fig. 2 Forest plot of the effects of MPH on math accuracy

Fig. 3 Forest plot of the effects of MPH on math productivity

Reading

For reading, meta-analytic results showed that improvements in accuracy with MPH were not significant (improved by 6.2%, p = .089), see Fig. 4. In contrast, MPH increased the number of reading items attempted (d = .47, p < .001), see Fig. 5.

Fig. 4 Forest plot of the effects of MPH on reading accuracy

Fig. 5 Forest plot of the effects of MPH on reading number attempted

Spelling

The results from our qualitative synthesis were inconclusive with only one out of three studies reporting significant improvements in spelling with MPH compared to placebo, see Table E1.

All effect sizes reflecting the effects of MPH on math and reading performance showed low heterogeneity, see Table 1.

Publication bias

Inspection of Egger funnel plots for publication bias indicated no asymmetry for math productivity, reading accuracy and reading attempted. Egger’s test was significant for math accuracy indicating a risk for publication bias. Fail-safe n values indicated that the effects of MPH on math accuracy, math productivity and reading attempted were quite robust, whereas the effect of MPH on reading accuracy was not robust (Table 1). There was no significant relation between sample size and effect size for any of the dependent variables entered in the meta-analysis. Therefore, it is unlikely that publication bias meaningfully influenced results, with the exception of MPH effects on reading accuracy. Risk of bias of individual studies according to the Cochrane index for crossover trials was generally low, for details see supplementary Table E3.

Mediation and moderation

None of the potential mediators or moderators significantly interacted with the effects of MPH on math accuracy or productivity (all ps > .09). Supplementary Table E4 reports on the number of studies included in the meta-regression, Z values, 95% CI and p values.

Discussion

The current meta-analysis and systematic review summarized more than three decades of research on the effects of MPH on academic performance in ADHD. Our analysis focused in particular on the question of whether MPH improves academic accuracy or just academic productivity.

There were small to medium-sized, positive effects of MPH on math accuracy, math productivity and the number of reading items attempted. Math accuracy increased by 3.0%, whereas math productivity increased by 7.8%. MPH did not improve reading accuracy, but did improve the number of items attempted in reading (medium effect). Results from our qualitative synthesis regarding math performance corroborated the findings from our meta-analyses. The qualitative synthesis of the studies reporting on spelling accuracy was inconclusive and more studies on this topic are needed. Our main results underline the importance of assessing a full range of outcome measures (accuracy and productivity) and different academic subjects when studying the effects of MPH on academic performance. Moreover, these results underline the contrast between the large symptom improvements obtained with MPH and the small- to medium-sized improvements in school performance: improvements are restricted to certain academic subjects and are small or absent for measures of accuracy.

However, it is important to realize that the short-term tests of academic performance used in the studies analyzed may not be sensitive to potential longer term benefits of MPH. Thus, it may be that although positive effects of MPH on academic accuracy are small or absent in the short term, MPH-related behavioral improvements and increased productivity may result in better long-term school performance. Currently, evidence for long-term effects of MPH on academic performance is lacking [14], but such effects cannot be ruled out because of methodological issues (i.e., the lack of long-term RCTs).

None of our mediators or moderators influenced MPH effects on math accuracy or productivity. This may be because most variance is due to random error as indicated by the low I² values obtained in our meta-analyses [34]. However, as there is large uncertainty (large confidence intervals) in heterogeneity estimates such as I², we deemed our meta-regression relevant [47]. In particular, we hypothesized that teacher-rated symptom improvements would mediate MPH effects on academic performance, but the results from our meta-analysis did not confirm this hypothesis for MPH effects on math accuracy. Unfortunately, the number of studies reporting on teacher-rated symptom improvements and math productivity as well as reading was insufficient for meta-regression. Further, most studies reporting on on-task behavior and academic performance measures indicate simultaneous improvements on both, with improvements in on-task behavior ranging from 2.9 to 12.0% [20, 24, 32, 48,49,50,51,52,53,54,55]. Unfortunately, the number of studies in the current study was too small to test the mediating effects of on-task behavior using meta-regression. Taken together, our results do not support a mediating role for classroom-expressed ADHD symptoms in the relationship between MPH and math accuracy, but this may be different for productivity measures as behavioral improvements in the classroom are generally seen as a prerequisite for academic improvements, especially academic productivity. Furthermore, cognitive improvements may be more relevant here than symptom improvements, as deficits of children with ADHD are apparent for those cognitive functions that are especially important for academic performance, e.g., attention, working memory and response inhibition [56,57,58,59]. Possibly, MPH-related improvements in cognition play a larger role in academic improvement than behavioral improvements do, and these act through different pathways than those driving symptoms. This is consistent with a recent study by Coghill et al. [60] showing that, while MPH improves both symptoms and some aspects of cognition, these effects seem to be independent.

Demographic and disorder-related variables included in our analysis (age, gender, ADHD subtype and ADHD severity) did not moderate MPH efficacy on math performance either. The absence of a moderating effect of age and comorbid disorders is in line with the results of a recent meta-analysis on behavioral improvements with MPH [61]. In the meta-analysis by Storebø and colleagues, some evidence was found for a moderating effect of ADHD subtype on behavioral improvements, with highest MPH efficacy for the inattentive subtype.

Similarly, study characteristics also did not moderate efficacy of MPH on math performance, at least not for release system, trial duration, time of measurement, dosage and titration method. Thirteen studies used ER formulations, two studies reported on the effects of transdermal MPH and 19 studies reported on IR formulations. Results from our meta-regression suggest that ER formulations are as effective as IR formulations in improving academic performance, which is in line with findings from individual studies comparing ER and IR formulations [52, 62, 63]. We found no effects of titration method (clinical titration prior to the trial or fixed dosages), which is also in line with findings from Storebø and colleagues [61]. Possibly, to optimize the effects of MPH on academic outcome, titration should be based on academic outcomes instead of on symptom improvements. The absence of an effect of dose on academic performance is in line with the findings from Prasad et al. [13], who found no difference between studies comparing the effect of 0.3 mg/kg or 10 mg fixed dose to 0.6 mg/kg or 17.5–20 mg fixed dose on percentage seatwork completed. Also in line with this are the results from Storebø and colleagues [61], who found no effects of dose on symptom improvements.

Strengths of the current review included a separate consideration of academic accuracy and productivity, a distinction between academic subjects (math, reading and spelling) and the inclusion of randomized controlled trials only. There were, however, also some potential limitations. First, because we focused on effects with the optimal dose, we did not include dose–response analyses. Second, trial duration was generally short (between 1 and 7 days), limiting our conclusions to short-term effects of MPH on academic performance. Evidence for the longer term benefits of MPH on academic performance is lacking thus far [14]. For obvious ethical and practical reasons, evidence for such effects is unlikely to be generated from placebo-controlled trials and is, therefore, outside the scope of the current review. It remains possible that short-term effects of MPH on both behavior and academic performance (i.e., productivity) summarized here may translate into longer term benefits. Furthermore, it should be investigated whether long-term benefits may be seen even where short-term effects are not evident.

On the basis of this review, we make a number of recommendations for future research. Some studies in the review relied on self-developed math or spelling test sheets. In the future, these should be replaced by validated tests (depending on the design of the study, with relevant norms). Further, researchers should always report both accuracy and productivity measures to allow for separate estimation of MPH effects on quantity and quality of academic performance. Moreover, by increasing the trial duration, more relevant measures like school grades can be included while still using randomized, placebo-controlled designs. Finally, more research on moderators and mediators of MPH efficacy is useful to isolate groups of patients who may benefit more or less from MPH and to reveal its mechanism of action. Although many studies have attempted to do so, measures of mediators and moderators are not uniform and generally not standardized, impeding (meta-analytic) aggregation of relevant results.

In summary, our results indicated that MPH results in robust improvements in the number of reading items attempted and small- to medium-sized improvements in math productivity and accuracy. Improvements in academic quality (accuracy) were small (3.0%) and limited to math. The effects of MPH on math accuracy were not mediated by teacher-rated ADHD symptom improvements, and MPH effects on math accuracy and productivity were not influenced by demographic variables, disorder-related variables or study characteristics. The discrepancy between the large behavioral improvements seen with MPH and these smaller and selective improvements in academic performance is important for treatment guidelines. As academic improvement is often one of the main treatment goals, parents and teachers should be advised about the specificity and limited size of MPH effects on academic performance.