Introduction

First-line medication therapy for children diagnosed with attention-deficit/hyperactivity disorder (ADHD) is one of the two classes of psychostimulants: methylphenidate or amphetamine [1, 2]. Although stimulant medication leads to a clinically significant reduction of ADHD symptoms in approximately 70% of cases, not every child responds to a given stimulant medication [1]. Factors that contribute to an inadequate response include poor adherence, the severity and/or complexity of the ADHD, inadequate dosing, and dose-limiting adverse events. It is also recognized that some children respond preferentially to particular types of stimulant medication [3]. Guidelines of the American Academy of Child and Adolescent Psychiatry and the American Academy of Pediatrics have therefore recommended switching to another stimulant medication, particularly one of a different class, if the expected reduction in ADHD symptoms does not occur or if adverse events are intolerable [1, 2, 4].

Because response to any particular stimulant medication is not certain, understanding possible predictors of clinical response can be useful for clinicians. Gender is a potential moderator of stimulant response in ADHD. Females differ from males in terms of ADHD prevalence, clinical presentation, and symptom profiles [5]. In the classroom, males exhibit more rule breaking and externalizing behaviors [6]. Gender differences in the density of dopamine transporters and clearance of extracellular dopamine [7, 8] have been suggested as possible explanations for why ADHD prevalence rates are higher in males than in females [9]. These factors could also be relevant to medication response.

Varying gender effects have been observed with the application of the ADHD rating scales often used for diagnosis and monitoring response to treatment in clinical trials and clinical practice, but these effects have rarely been explored or compared on a measure-by-measure basis. Male subjects score higher (greater symptom severity) than female subjects on the commonly used ADHD Rating Scale IV (ADHD-RS-IV) [10] when age and ADHD subtype are considered [11]. Although no consistent pattern of gender effects has been found with the commonly used Swanson, Kotkin, Agler, M-Flynn and Pelham (SKAMP) Rating Scale [12,13,14], male subjects did have significantly higher SKAMP Combined (SKAMP-C) and SKAMP Deportment (SKAMP-D) scores (higher scores signifying greater impairment) in a laboratory classroom study of lisdexamfetamine in 6–12 year olds with ADHD [13]. In that same study, however, no gender differences were seen with the Permanent Product Measure of Performance (PERMP), an individualized 400-problem math test used in laboratory classroom studies of ADHD drugs [13, 15].

Amphetamine has a single chiral center that gives rise to two active enantiomers, dextro (d)-amphetamine and levo (l)-amphetamine. The two isomers have different pharmacokinetic properties (mean elimination half-life of 9.77–11 h for d-amphetamine versus 11.5–13.8 h for l-amphetamine) and different neuropharmacological properties [16]. In animal models, d-amphetamine has been found to reduce overactivity and impulsiveness more efficiently, while l-amphetamine appears to be relatively more specific for sustaining attention [17]. Several amphetamine products with different proportions of each isomer have been developed for the treatment of ADHD. While some of these consist entirely of d-amphetamine, one currently available mixed-enantiomer formulation contains d-amphetamine and l-amphetamine in a 3:1 ratio.
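
As a rough illustration of what the half-life difference implies, the following Python sketch assumes simple first-order elimination (an assumption not stated in the source, which also ignores absorption and distribution) and uses the midpoints of the half-life ranges quoted above. It compares how much of each isomer would remain 10 h after absorption for a 1:1 versus a 3:1 d:l split of the same nominal dose; all function names are hypothetical.

```python
# Midpoints of the mean elimination half-life ranges quoted above (hours).
# Using midpoints and a first-order elimination model are illustrative assumptions.
HALF_LIFE_H = {
    "d": (9.77 + 11.0) / 2,  # d-amphetamine, about 10.4 h
    "l": (11.5 + 13.8) / 2,  # l-amphetamine, about 12.65 h
}


def fraction_remaining(hours: float, half_life_h: float) -> float:
    """Fraction of an absorbed amount still present after `hours` (first-order elimination)."""
    return 0.5 ** (hours / half_life_h)


def isomer_split(dose_mg: float, d_parts: int, l_parts: int) -> dict:
    """Divide a nominal dose into d- and l-isomer amounts for a d:l ratio."""
    total = d_parts + l_parts
    return {"d": dose_mg * d_parts / total, "l": dose_mg * l_parts / total}


def amount_remaining(dose_mg: float, d_parts: int, l_parts: int, hours: float) -> dict:
    """Amount of each isomer left `hours` after absorption, ignoring absorption kinetics."""
    split = isomer_split(dose_mg, d_parts, l_parts)
    return {iso: mg * fraction_remaining(hours, HALF_LIFE_H[iso]) for iso, mg in split.items()}


if __name__ == "__main__":
    # Compare a 1:1 racemic split with a 3:1 split of the same nominal 10-mg dose at 10 h.
    for label, (d_parts, l_parts) in (("1:1", (1, 1)), ("3:1", (3, 1))):
        remaining = amount_remaining(10.0, d_parts, l_parts, hours=10.0)
        print(label, {iso: round(mg, 2) for iso, mg in remaining.items()})
```

Under these assumptions the l-isomer decays more slowly, so a racemic product carries a relatively larger l-amphetamine share late in the day.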

Evekeo (RA-AMPH) (Arbor Pharmaceuticals, Atlanta, GA, USA) is an immediate-release racemic formulation of amphetamine that by definition consists of a 1:1 ratio of the d-amphetamine and l-amphetamine isomers. The efficacy and safety of RA-AMPH have been evaluated in a multicenter, dose-optimized, double-blind, randomized, placebo-controlled crossover laboratory classroom study [18]. The current paper reports a preliminary secondary analysis of that study, conducted to estimate effect sizes and facilitate power calculations for future studies of possible gender effects. A particular focus of the analysis is whether findings on the ADHD-RS-IV and SKAMP scales differ as a function of gender, and how any such differences relate to findings on the laboratory-classroom-specific PERMP test.
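
Because one stated aim is to supply effect sizes for power calculations, a minimal sketch of how such an estimate could be used follows. It assumes a two-sided comparison of two independent groups (for example, boys versus girls on a standardized treatment effect) and the textbook normal-approximation formula; the α, power, and function name are illustrative choices, and a formal calculation for a crossover design with an interaction term would require a model-specific method.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sided comparison of two independent means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardized difference (difference / common SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)
```

For instance, a hypothetical standardized difference of 0.5 gives `n_per_group(0.5) == 63` per group; methods based on the t distribution give slightly larger numbers.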

Methods

Study Design

Full details of the methodological design and conduct of the study (NCT01986062) have been previously published [18]. Briefly, a laboratory school protocol [15] was used to measure efficacy, including onset and duration of effect. The study was conducted at 7 sites in the United States in accordance with Good Clinical Practice guidelines, and the study protocol was approved by an institutional review board at each study site. The parent/guardian of each subject provided voluntary written informed consent for the subject to participate in this study, and the subject gave written assent.

Study inclusion criteria were a severity score of at least 3 (mildly ill) on the clinician-administered 7-point Clinical Global Impressions–Severity (CGI-S) scale and a score on the ADHD-RS-IV scale [19] at screening or baseline ≥ 90th percentile normative values for gender and age in at least one of the following categories: hyperactive-impulsive subscale, inattention subscale, or total score. A primary psychiatric diagnosis other than ADHD excluded a subject from participating in the study, as did secondary or comorbid diagnoses with the exception of simple phobias, oppositional defiant disorder (ODD), motor skill disorders, communication disorders, learning disorders, adjustment disorders, and sleep disorders. Subjects could be treatment-naïve or previously treated with an ADHD medication.

The study design consisted of a screening period, an 8-week open-label dose optimization phase, and a 2-week randomized double-blind crossover phase. Subjects took RA-AMPH twice daily (once in the morning and again 4–6 h later). The starting dose of RA-AMPH was 10 mg per day (5 mg bid), which was titrated weekly in 5-mg increments until an optimal dose based on clinical response and tolerability was achieved. A single down-titration for tolerability reasons was also permitted. To assess global changes in the severity of ADHD over the course of the open-label phase of the study, clinicians used the ADHD-RS-IV, the CGI-S, and the Clinical Global Impressions–Improvement (CGI-I) scale, which is scored from 1 (very much improved) to 7 (very much worse). Safety and tolerability were assessed via weekly monitoring of blood pressure and pulse and the tracking of spontaneously reported adverse events. The Columbia-Suicide Severity Rating Scale was also used to assess changes from baseline in suicidal ideation and behavior.
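
The titration rules described above (start at 10 mg/day given as 5 mg bid, weekly 5-mg steps, at most one down-titration) can be sketched as a small state update. The weekly decision labels and the functions are hypothetical, and applying each 5-mg step to the total daily dose is an assumption about how the increments were counted.

```python
START_DAILY_MG = 10  # 5 mg bid
STEP_MG = 5          # weekly increment, assumed to apply to the total daily dose
MIN_RANDOMIZATION_MG = 10  # stable dose needed to enter the double-blind phase


def titrate(weekly_decisions):
    """Apply weekly decisions ('up', 'hold', 'down') to the daily dose.

    Returns the week-1 daily dose followed by the dose after each decision.
    Only a single down-titration is permitted, as in the protocol described.
    """
    dose = START_DAILY_MG
    doses = [dose]
    downs_used = 0
    for decision in weekly_decisions:
        if decision == "up":
            dose += STEP_MG
        elif decision == "down":
            if downs_used >= 1:
                raise ValueError("only a single down-titration is permitted")
            downs_used += 1
            dose -= STEP_MG
        elif decision != "hold":
            raise ValueError(f"unknown decision: {decision!r}")
        doses.append(dose)
    return doses


def eligible_for_randomization(daily_mg: float) -> bool:
    """Whether a tolerated stable daily dose meets the at-least-10-mg criterion."""
    return daily_mg >= MIN_RANDOMIZATION_MG
```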

At the end of the open-label phase, subjects able to tolerate a stable dose of at least 10 mg per day were randomized on a 1:1 basis to a treatment sequence of 1-week RA-AMPH/1-week placebo or 1-week placebo/1-week RA-AMPH. No adjustment of RA-AMPH dose was permitted during the double-blind phase. The subjects continued to take the medication twice daily until the 7th day of each crossover week, when a final single dose was administered at the laboratory school site by study staff on the morning of each full-length laboratory classroom day. On the two laboratory classroom days, only the single morning dose was administered in order to establish duration of effect of the racemic formulation. At the end of the first laboratory classroom day, the subjects were dispensed their double-blind crossover medication for the following week leading up to the second laboratory classroom day. One week following the second laboratory classroom day, a final safety assessment was conducted during a post-withdrawal follow-up phone call or visit to the study site.
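
The source states only that allocation to the two sequences was 1:1; it does not describe the randomization procedure. Permuted-block randomization is one common way to implement such an allocation, and the sketch below assumes it, with a hypothetical block size of 4.

```python
import random
from typing import List, Optional

SEQUENCES = ("RA-AMPH/placebo", "placebo/RA-AMPH")


def permuted_block_allocation(n_subjects: int, block_size: int = 4,
                              seed: Optional[int] = None) -> List[str]:
    """Allocate subjects 1:1 to the two crossover sequences in shuffled blocks.

    Each complete block contains equal numbers of both sequences, so the arms
    never differ by more than block_size // 2 at any point during enrollment.
    """
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)  # seeded generator gives a reproducible list
    allocation: List[str] = []
    while len(allocation) < n_subjects:
        block = list(SEQUENCES) * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]
```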

Efficacy Measures

The primary objective of the study was to establish that an optimal dose of RA-AMPH would result in a significant reduction in ADHD signs and symptoms compared with placebo at 2 h post-dose, as measured by the SKAMP-C scale. The ADHD-RS-IV was used during the open-label dose-optimization period to guide RA-AMPH dose changes. The ADHD-RS has since been updated to be consistent with the 5th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), but this newer version (ADHD-RS-5) was not available at the time this study was conducted. The secondary efficacy objective was to determine the onset and duration of clinical effect of RA-AMPH after a single morning dose. The secondary efficacy variables included SKAMP-C scores, SKAMP Attention (SKAMP-A) subscale scores, SKAMP-D subscale scores, and PERMP scores, each measured before dosing and at 0.75, 2, 4, 6, 8, and 10 h post-dose on the two full-length laboratory classroom days. For the PERMP, an objective individualized 400-problem mathematics test, subjects were instructed by site staff to work at their seats and complete as many problems as possible in 10 min.

The safety and tolerability of RA-AMPH compared with placebo were assessed and reported in the primary outcomes paper [18] and will not be repeated here.

Description of the ADHD Scales

The ADHD-RS-IV includes all 18 symptoms described in the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, Text Revision (DSM-IV-TR) [20] and measures the presence and severity of these symptoms. The 18 symptoms assessed by the ADHD-RS-IV are each rated on a 4-point scale; total scores range from 0 to 54, with higher scores indicating greater ADHD symptom severity. Normative data for age and gender have been published for the ADHD-RS-IV. Scores ≥ 90th percentile for age and gender are often used for clinical trial entry [10]. The ADHD-RS-IV was originally designed to be completed by a teacher or parent. However, an investigator interview version of the ADHD-RS-IV was later developed specifically for use in ADHD clinical trials and regulatory documents [21]. In the current trial (as in most clinical trials), the ADHD-RS-IV was completed by an investigator while interviewing a parent or caregiver at each visit. Since parents are not with subjects during school hours, the ratings often describe behaviors that are seen outside of school, but they may also reflect information that teachers have reported to caregivers. Completion of the ADHD-RS-IV is dependent on subjective recall and reporting of symptoms over the previous week, which can be inconsistent and imprecise.
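
The scoring arithmetic described above can be expressed compactly. The sketch assumes the 18 items are supplied as two lists of nine ratings each, matching the nine inattention and nine hyperactive-impulsive symptoms in DSM-IV-TR, with each item rated 0–3 (the range implied by a 0–54 total); the function name is hypothetical and the normative percentile tables are not reproduced.

```python
def adhd_rs_iv_scores(inattention, hyperactive_impulsive):
    """Score the ADHD-RS-IV from two lists of nine item ratings (0-3 each).

    Returns subscale totals (each 0-27) and the total score (0-54);
    higher scores indicate greater symptom severity.
    """
    for name, items in (("inattention", inattention),
                        ("hyperactive_impulsive", hyperactive_impulsive)):
        if len(items) != 9:
            raise ValueError(f"{name} needs 9 items, got {len(items)}")
        if any(r not in (0, 1, 2, 3) for r in items):
            raise ValueError(f"{name} ratings must be integers from 0 to 3")
    ia = sum(inattention)
    hi = sum(hyperactive_impulsive)
    return {"inattention": ia, "hyperactivity_impulsivity": hi, "total": ia + hi}
```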

The SKAMP Rating Scale differs from the ADHD-RS-IV in that ratings are based on direct observation in real time. The SKAMP scale is completed by trained raters observing subject behaviors during several 20-min sessions over the course of a day in a simulated classroom. The SKAMP measures core ADHD behaviors as well as associated features such as oppositional symptoms and conduct disturbances (e.g., frustration, lying, and relationships with staff and peers). The SKAMP scale consists of 13 items (grouped under the subcategories of attention, deportment, quality of work, and compliance) on each of which subjects are rated according to a 7-point scale (0 = normal to 6 = maximal impairment) by trained study personnel [15]. The SKAMP Combined score is obtained by summing the rating values for all of the 13 items. Higher SKAMP scores signify greater impairment.

Multiple behaviors are coded to each SKAMP item (Table 1) [22]. Two of the four items on the Deportment subscale and one of the two items on the Cooperation subscale do not describe core ADHD symptoms, while all four of the items on the Attention subscale and all three of the items on the Quality of Work (SKAMP-QOW) subscale describe core ADHD symptoms (although the items on the SKAMP-QOW subscale are scored in relation to the number of problems attempted and completed correctly during a programmed academic exercise and neatness of handwriting).
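
A minimal scoring sketch for the 13-item SKAMP follows. The item-to-subscale assignment is an assumption: items are taken to be numbered consecutively by subscale (Attention 1–4, Deportment 5–8, Quality of Work 9–11, Cooperation 12–13), which agrees with the item numbers cited elsewhere in this paper but should be checked against Table 1. The names are hypothetical.

```python
from typing import Dict

# Assumed item numbering (consecutive by subscale); verify against Table 1.
SKAMP_SUBSCALES = {
    "attention": range(1, 5),         # items 1-4
    "deportment": range(5, 9),        # items 5-8
    "quality_of_work": range(9, 12),  # items 9-11 (items 9 and 10 derived from PERMP)
    "cooperation": range(12, 14),     # items 12-13
}


def skamp_scores(ratings: Dict[int, int]) -> Dict[str, int]:
    """Score the SKAMP from {item_number: rating}, each 0 (normal) to 6 (maximal impairment).

    Returns subscale sums plus the Combined score (SKAMP-C, range 0-78);
    higher scores indicate greater impairment.
    """
    if set(ratings) != set(range(1, 14)):
        raise ValueError("expected ratings for items 1-13")
    if any(not 0 <= r <= 6 for r in ratings.values()):
        raise ValueError("each rating must be between 0 and 6")
    scores = {name: sum(ratings[i] for i in items) for name, items in SKAMP_SUBSCALES.items()}
    scores["combined"] = sum(ratings.values())
    return scores
```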

Table 1 Items in the SKAMP Combined scale and in the SKAMP subscales

The SKAMP scale was originally completed by teachers after a classroom period and was validated using the Conners, Loney, and Milich (CLAM) scale and the Inattention/Overactivity with Aggression (IOWA) Conners Rating Scale, which measure both ADHD and oppositional defiant symptoms [12, 23, 24]. ODD is more common in males than in females, with a prevalence ratio estimated at 1.59:1 [25]. Because ODD is present in 40–70% of subjects who have ADHD, oppositional behaviors are common in the laboratory classroom, even among patients who do not meet full criteria for ODD.

The 10-min, 400-question PERMP test was developed to be administered throughout a laboratory school day and has become a secondary outcome measure in most laboratory school studies [15]. The appropriate level of math difficulty for each student was determined based on results from an initial administration of the PERMP at the baseline visit at the beginning of the open-label phase. Performance on the PERMP test is evaluated using two scores, indicating the number of problems attempted (PERMP-A) and the number of problems correct (PERMP-C). PERMP scores at each laboratory classroom time point were also translated into a score of 0–6 and were entered as items 9 and 10 under Quality of Work on the SKAMP (Table 1).
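
Counting the two PERMP scores is straightforward. The sketch below assumes that a problem counts as attempted whenever any answer was written and as correct when that answer matches the key; the data layout and function name are hypothetical. The conversion of PERMP scores to the 0–6 SKAMP items is not specified in the text and is therefore not reproduced.

```python
from typing import Dict, Optional, Sequence


def permp_scores(responses: Sequence[Optional[int]], answer_key: Sequence[int]) -> Dict[str, int]:
    """Return PERMP-A (problems attempted) and PERMP-C (problems correct).

    responses[i] is the answer written for problem i, or None if the problem
    was left blank when the 10-min session ended (an assumed data layout).
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    attempted = sum(r is not None for r in responses)
    correct = sum(r is not None and r == k for r, k in zip(responses, answer_key))
    return {"attempted": attempted, "correct": correct}


# Toy example with 5 problems (a real worksheet has 400).
print(permp_scores([12, 7, None, 30, None], [12, 8, 15, 30, 4]))  # {'attempted': 3, 'correct': 2}
```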

Statistical Analyses

Primary and secondary efficacy analyses were performed on the intent-to-treat (ITT) population, consisting of all randomized subjects who received at least one dose of double-blind study medication and had at least one post-baseline assessment of the primary efficacy variable. Descriptive statistics for SKAMP-C, SKAMP-A, SKAMP-D, PERMP problems attempted, and PERMP problems correct scores were calculated for each time point on the laboratory classroom days. Parameter estimates and effect sizes were generated from a cross-sectional fixed-effects model that included terms for treatment sequence (RA-AMPH/placebo and placebo/RA-AMPH), period (laboratory classroom day 1 or 2), treatment, gender, and treatment-by-gender interaction. All available data were used in the model, and missing data were not imputed. No corrections for multiple testing were applied across time points or subgroup comparisons. Secondary analyses evaluated ADHD-RS-IV, SKAMP, and PERMP subscale values for males and females separately, as well as the differences between them. Treatment comparisons were conducted as 2-sided tests at the 5% level of significance. Standard errors (SEs) and 95% confidence intervals (CIs) were calculated. Statistical analyses were performed by Rho (Chapel Hill, NC, USA).
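
To make the model terms concrete, the following sketch fits an ordinary least-squares model with the listed fixed effects at a single time point using NumPy. The 0/1 coding, column order, and function names are assumptions; the study's actual software and parameterization are not described here, and the sketch omits any subject-level term. Under this coding, the `treatment` coefficient is the RA-AMPH effect in males and `treatment_x_gender` is the difference between the female and male treatment effects.

```python
import numpy as np

# Column order of the design matrix (names are illustrative).
TERMS = ["intercept", "sequence", "period", "treatment", "gender", "treatment_x_gender"]


def design_matrix(sequence, period, treatment, gender):
    """Dummy-coded design matrix for the fixed effects listed in the text.

    Assumed 0/1 coding per observation: sequence 1 = placebo/RA-AMPH,
    period 1 = second classroom day, treatment 1 = RA-AMPH, gender 1 = female.
    """
    seq, per, trt, fem = (np.asarray(a, dtype=float) for a in (sequence, period, treatment, gender))
    return np.column_stack([np.ones_like(trt), seq, per, trt, fem, trt * fem])


def fit_fixed_effects(y, sequence, period, treatment, gender):
    """OLS fit at one time point; observations with missing y (NaN) are dropped, not imputed."""
    y = np.asarray(y, dtype=float)
    X = design_matrix(sequence, period, treatment, gender)
    keep = ~np.isnan(y)
    coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return dict(zip(TERMS, coef))


def gender_effects(est):
    """Treatment effect (RA-AMPH minus placebo) implied for each gender under this coding."""
    return {"male": est["treatment"], "female": est["treatment"] + est["treatment_x_gender"]}
```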

Results

Subgroup Demographics

Of 107 initially enrolled subjects, 10 withdrew prior to randomization at the end of the open-label phase, leaving 97 subjects for the ITT population: 47 subjects randomized to the treatment sequence of RA-AMPH followed by placebo and 50 randomized to the sequence of placebo followed by RA-AMPH.

Of the 97 subjects in the ITT population, 59 (60.8%) were male and 38 (39.2%) female. The mean (SD) subject age was 9.5 (1.87) years for males and 9.7 (1.88) years for females, each group with a range of 6–12 years. There were no significant differences between the male and female subjects in demographic and baseline characteristics, including ADHD type, except for weight: female subjects were on average heavier than male subjects—93.12 (33.54) pounds versus 79.12 (28.32) pounds (p = 0.0295)—a not unexpected difference between preadolescent boys and girls (Table 2). The mean (SD) daily dose of RA-AMPH was 17.2 (4.55) mg; the mean (SD) and median final doses were 23.4 (8.18) mg and 20.0 mg, respectively. The mean final dose did not differ for males and females (23.5 mg for males and 23.3 mg for females).

Table 2 Demographic and baseline characteristics by gender

Eight-Week Open-Label Dose-Optimization Phase

ADHD-RS-IV Scores by Gender

For the entire ITT population, during the open-label dose-optimization phase, mean (SD) ADHD-RS-IV total scores decreased (improved) by 27.8 points, from 40.8 (7.64) at baseline to 13.0 (7.32) at the end of the 8-week study period [18]. All subjects were either drug-naïve or washed out of their ADHD medication prior to baseline ratings. Figure 1 shows the ADHD-RS-IV total scores and the inattention and hyperactivity/impulsivity subscale scores by gender during the open-label dose-optimization phase. At baseline, male and female subjects were comparable in ADHD-RS-IV total scores (p = 0.065) and inattention subscale scores (p = 0.523), but male subjects had higher hyperactivity/impulsivity subscale scores than female subjects (20.8 [5.16] versus 18.4 [5.91]; p = 0.038), a difference that disappeared by the end of the optimization period. For both males and females, the improvement in ADHD symptoms was consistent and comparable at each visit through at least week 7 of the open-label phase. ADHD-RS-IV total scores decreased by 25.9 points, from 39.0 (7.97) at baseline to 13.1 (6.81) at week 8, for females and by 29.0 points, from 41.9 (7.25) to 12.9 (7.69), for males. There were no significant differences between the genders in the optimized week-8 total scores (p = 0.876) or in total score improvement from baseline to week 8 (p = 0.097), but the level of improvement showed a trend favoring boys, a trend related to the baseline difference between the genders in hyperactivity/impulsivity subscale scores. At week 8 of the open-label dose-optimization phase, the proportion of responders to treatment (≥ 50% improvement in the ADHD-RS-IV total score) was 86.8% (33/38) for females and 88.1% (52/59) for males. For both males and females, improvements on the ADHD-RS-IV hyperactivity/impulsivity and inattention subscales were also consistent and comparable.
Mean (SE) ADHD-RS-IV hyperactivity/impulsivity scores changed from 18.4 (1.0) at baseline to 5.6 (0.7) at week 8 for females and from 20.8 (0.7) at baseline to 5.9 (0.5) at week 8 for males. On this subscale, there was no significant difference between the genders in optimized week-8 scores (p = 0.710), but the level of improvement at week 8 showed a trend favoring boys (p = 0.069). Mean (SE) ADHD-RS-IV inattention scores changed from 20.7 (0.7) at baseline to 7.5 (0.6) at week 8 for females and from 21.2 (0.5) at baseline to 7.1 (0.6) at week 8 for males. On this subscale, there were no significant differences between the genders in optimized week-8 scores (p = 0.664) or in level of improvement at week 8 (p = 0.342).
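
The responder definition and change scores used in this subsection reduce to simple arithmetic. The sketch below (function names hypothetical) classifies a subject as a responder when the ADHD-RS-IV total score falls by at least 50% from baseline, and its closing lines recheck the reported percentages and group mean changes from the stated counts and means.

```python
def percent_improvement(baseline: float, endpoint: float) -> float:
    """Percentage reduction in ADHD-RS-IV total score from baseline (positive = better)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - endpoint) / baseline


def is_responder(baseline: float, endpoint: float, threshold_pct: float = 50.0) -> bool:
    """Responder definition used in the text: at least 50% improvement in the total score."""
    return percent_improvement(baseline, endpoint) >= threshold_pct


def responder_rate(pairs) -> float:
    """Percentage of (baseline, endpoint) pairs that meet the responder definition."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no subjects supplied")
    return 100.0 * sum(is_responder(b, e) for b, e in pairs) / len(pairs)


if __name__ == "__main__":
    # Re-derive the reported week-8 responder percentages from the stated counts.
    print(f"females: {100 * 33 / 38:.1f}%")  # 86.8%
    print(f"males:   {100 * 52 / 59:.1f}%")  # 88.1%
    # Group-level mean change in total score, from the reported means.
    print(f"female change: {39.0 - 13.1:.1f}, male change: {41.9 - 12.9:.1f}")  # 25.9, 29.0
```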

Fig. 1

ADHD-RS-IV total scores, ADHD-RS-IV inattention subscale scores, and ADHD-RS-IV hyperactivity/impulsivity subscale scores by gender during the 8-week open-label dose-optimization phase

Two-Week Double-Blind Randomized Crossover Phase

Analysis of Post-dose SKAMP Scores by Gender

On the SKAMP-C score at 2 h post-dose on the 2 full-length laboratory classroom days, RA-AMPH was superior to placebo: LS mean (SE) scores were 10.3 (1.09) for RA-AMPH versus 18.1 (1.09) for placebo (p < 0.0001) [18]. When SKAMP-C scores at 2 h post-dose for the ITT population were examined by gender, the treatment effect was greater for male subjects, with statistically significant differences in LS means favoring RA-AMPH over placebo of −9.9 points for male subjects (n = 57, p < 0.0001) and −4.8 points for female subjects (n = 38, p = 0.0002).
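
In a model that includes a treatment-by-gender term, that term corresponds to the difference between the gender-specific treatment effects. The sketch below computes this contrast from the LS mean differences reported above (about 5.1 points, that is, a smaller reduction in girls); the function names are hypothetical, and the model-based standard error needed for a formal test is not reproduced.

```python
from typing import Dict, Tuple


def gender_specific_effects(ls_means: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Turn {(gender, treatment): LS mean} into RA-AMPH minus placebo for each gender."""
    genders = {g for g, _ in ls_means}
    return {g: ls_means[(g, "RA-AMPH")] - ls_means[(g, "placebo")] for g in genders}


def interaction_contrast(effects: Dict[str, float], reference: str = "male",
                         other: str = "female") -> float:
    """Difference-in-differences: treatment effect in `other` minus effect in `reference`."""
    return effects[other] - effects[reference]


# LS mean differences (RA-AMPH minus placebo) for SKAMP-C at 2 h, as reported in the text.
reported_effects = {"male": -9.9, "female": -4.8}
contrast = interaction_contrast(reported_effects)
print(f"treatment-by-gender contrast: {contrast:.1f} points")  # 5.1
```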

For the ITT population, significant separation from placebo was observed for RA-AMPH on the SKAMP-C scale and the SKAMP-D and SKAMP-A subscales at each time point tested (0.75, 2, 4, 6, 8, and 10 h) over the course of the 2 laboratory classroom days, one with RA-AMPH and one with placebo for each subject [18]. There were no significant or consistent differences in effect size (treatment difference divided by standard deviation) between the genders, and the magnitude of the effect sizes appeared typical for laboratory classroom studies. Table 3 presents the cross-sectional fixed-effects analysis by treatment, gender, and treatment by gender for predose and post-dose time points for the SKAMP-C, SKAMP-A, and SKAMP-D scales. A significant treatment-by-gender interaction occurred only at the 2-h time point on the SKAMP-C scale (F = 5.86, p = 0.0174) and on the SKAMP-A subscale (F = 4.71, p = 0.0326). Subject scores on the SKAMP-C scale and the SKAMP-D and SKAMP-A subscales over the course of the two laboratory classroom days are broken out by gender in Fig. 2. Male subjects started at the predose time point with higher scores on all three SKAMP scales, and these higher scores persisted throughout the day. Significant separation from placebo was observed for both genders for RA-AMPH on all three SKAMP scales at each time point tested (average p < 0.0001 over all post-dose time points), with the exception of the 10-h time point on the SKAMP-A subscale for female subjects (LS mean treatment effect at that time point was −0.6 points, p = 0.2233).
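
The text defines effect size as the treatment difference divided by a standard deviation but does not say which standard deviation. The sketch below therefore offers two common choices, the pooled SD of the two conditions and the SD of within-subject differences (natural for a crossover); both are assumptions, the function name is hypothetical, and the two options generally give different values for the same data. On SKAMP scales, negative values indicate improvement with RA-AMPH.

```python
import math
from statistics import mean, stdev


def effect_size(active, placebo, sd="pooled"):
    """Standardized treatment difference: (mean active - mean placebo) / SD.

    sd="pooled": pooled SD of the two conditions (Cohen's d style).
    sd="diff":   SD of within-subject differences (paired data, as in a crossover).
    """
    if sd == "pooled":
        n1, n2 = len(active), len(placebo)
        pooled_var = ((n1 - 1) * stdev(active) ** 2 + (n2 - 1) * stdev(placebo) ** 2) / (n1 + n2 - 2)
        return (mean(active) - mean(placebo)) / math.sqrt(pooled_var)
    if sd == "diff":
        if len(active) != len(placebo):
            raise ValueError("paired data must have equal length")
        diffs = [a - p for a, p in zip(active, placebo)]
        return mean(diffs) / stdev(diffs)
    raise ValueError("sd must be 'pooled' or 'diff'")
```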

Table 3 Cross-sectional fixed-effects analysis by treatment, gender, and treatment by gender for predose and post-dose time points

Fig. 2

Analysis of laboratory-classroom SKAMP Deportment, SKAMP Attention, and SKAMP Combined scores by treatment and gender. *p < 0.0001 vs. placebo. †p < 0.001 vs. placebo. ‡p < 0.01 vs. placebo. §p < 0.05 vs. placebo

Analysis of Post-Dose PERMP Scores by Gender

For PERMP-A and PERMP-C, there were no significant or consistent differences in effect size between the genders, and the magnitude of change appeared typical for this type of study. In the cross-sectional fixed-effects analysis, there were no significant treatment-by-gender interactions on PERMP-A or PERMP-C (Table 3). Subject scores on the PERMP test over the course of the two laboratory classroom days are broken out by gender in Fig. 3. Significant separation from placebo was observed for both genders for RA-AMPH in terms of both problems attempted and problems correct at each time point tested, with the greatest RA-AMPH effect seen at 2 h post-dose for male subjects and at 4 h post-dose for female subjects, although effect levels were similar throughout the laboratory classroom days for both genders. Scores for females receiving placebo were consistently numerically higher than those for males receiving placebo, and scores for females receiving RA-AMPH were consistently numerically higher than those for males receiving RA-AMPH, but no statistically significant differences were seen between males and females at any time point (Table 3).

Fig. 3

Analysis of laboratory-classroom PERMP number of problems attempted and number of problems correct by treatment and gender. *p < 0.0001 vs. placebo. †p < 0.001 vs. placebo. ‡p < 0.01 vs. placebo

Discussion

In the first classroom study of 1:1 racemic amphetamine, an optimal dose of RA-AMPH resulted in significant reduction in symptoms of ADHD and related problems in children aged 6–12 years as measured by the SKAMP. Scores on all three SKAMP scales were significantly improved for all children combined from the first time point tested (0.75 h post-dose) through the end of the day at 10 h [18].

Although RA-AMPH separated significantly from placebo on SKAMP-C scores for both males and females at every post-dose time point, the treatment-by-gender interaction was not significant at any time point except 2 h post-dose (p = 0.0174). Since both genders entered the study with similar baseline ADHD-RS-IV scores and optimized ADHD-RS-IV total scores were also comparable, one might expect SKAMP ratings to be similar as well. However, SKAMP-C scores were significantly higher (worse) for males than for females at all time points except 10 h post-dose (Table 3). These differences were driven primarily by the SKAMP-D scores, which differed significantly between the genders at all time points except 8 h (p = 0.0614) and 10 h (p = 0.0581) post-dose (Table 3). Since there were no significant gender differences on the SKAMP-A subscale or on the PERMP, the ADHD-RS-IV subscales were examined to determine whether there were gender differences on the inattention or the hyperactivity/impulsivity subscale. There were no significant differences by gender on either subscale at the end of open-label dose optimization.

So why did males have significantly higher (worse) SKAMP scores but not significantly higher (worse) ADHD-RS-IV or significantly lower (worse) PERMP scores than females in the trial? The answer might be related to the following fact, which is not frequently highlighted in studies of ADHD medication effects: namely, in addition to ADHD symptoms, the SKAMP scale also measures associated behaviors such as frustration, lying, and relationships with staff and peers, which are more characteristic of ODD and conduct disorder than of ADHD (Table 1). Two of the four items on the SKAMP Deportment subscale (items 5 and 6) and one of the two items on the SKAMP Cooperation subscale (item 13) do not describe core ADHD symptoms, while all seven of the items on the SKAMP-A and SKAMP-QOW subscales assess inattention directly or by evaluating functional outcomes (PERMP scores and handwriting neatness).

In the current study, an ODD diagnosis was not an exclusion criterion, except when ODD symptoms were severe enough to cause problems with study participation. In the actual enrolled population, however, only 7 (7.2%) of the randomized subjects met DSM-IV-TR criteria for ODD, a percentage far lower than the range of 39–84% meeting ODD criteria in other studies [26, 27]. Even in the Multimodal Treatment Study of Children with ADHD (MTA), 40% of the subjects with ADHD met ODD criteria [28]. Consequently, the presence of ODD cannot account for the gender difference in the current study. Since stimulant treatment is not only effective for treatment of core ADHD symptoms but also improves behavioral noncompliance and aggression [29], which are characteristic of youth with ODD and high levels of impulsivity, it is not surprising to find a lack of difference between genders in treatment effects, as all SKAMP symptoms tend to improve. However, the reason for improvement is less clear. Is it a direct effect of medication on associated symptoms or secondary to reduced ADHD symptoms? If core ADHD symptoms were only evaluated using the SKAMP, it is possible that the effect sizes of stimulants would be lower.

Another difference between the SKAMP and the ADHD-RS scales is that the SKAMP scale measures behaviors using direct observation by skilled raters in real time in a classroom setting, where subjects interact with peers, while the ADHD-RS scale is based on historical information about attention and hyperactive-impulsive behavior (typically, behavior during the past week in most trials, including this one). Although the ADHD-RS ratings are completed by a trained investigator, symptoms are based primarily on parent/caregiver reports, which are imprecise and most heavily influenced by behavior witnessed at home for only parts of most days. In addition, the SKAMP scale is useful in determining the onset, duration, and peak effects of stimulant medications because it measures real-time effects, which is not possible with the ADHD-RS scale. Thus, it is clear that the two scales are not identical, although both scales are accepted as study outcome measures by the US Food and Drug Administration for drug registration studies in ADHD.

Differences between the SKAMP and ADHD-RS scales are important for practitioners to understand. The ADHD-RS scale can be used to evaluate treatment effects during clinic visits, but even when ADHD symptoms improve substantially, patients may still have significant impairment due to associated behaviors not measured by the ADHD-RS scale. The SKAMP is better suited to assessing a range of ADHD and non-ADHD symptoms; however, it is not practical to administer in a typical classroom or clinic and is useful only when assessments are obtained in a controlled environment by trained raters.

Differences for females versus males are reported in the literature in terms of ADHD prevalence and type of presentation [30], patterns of referral/diagnosis and prescription [5, 31], neurobiology and pharmacokinetics [7, 8], and the timing or rhythm of response. Although most of the studies on the moderating effects of gender are based on populations with relatively limited female enrollment (in contrast to the nearly 40% female enrollment in the randomized cohort of the study on which the current subgroup analysis is based), the studies converge in finding that gender does not moderate overall outcomes and effect sizes [5, 13, 32,33,34,35]. In an analysis of results from the MTA study, investigators found that gender did not affect treatment outcomes, although 80% of that cohort were male [32, 36, 37]. In a reanalysis of data from the Comparison of Methylphenidates in an Analog Classroom Setting study, Sonuga-Barke et al. compared 136 boys (74%) and 48 girls (26%) aged 6–12 years in a double-blind crossover trial and found no main effect of gender on measures of overall response to treatment [33]. In the same trial, the authors also compared laboratory classroom measures with parent Swanson, Nolan, and Pelham IV (SNAP-IV) ratings [38] and found modest correlations at certain time points during the laboratory classroom day. Interestingly, after correction for multiple comparisons, there were no significant correlations at the end of the day (12 h after dosing), and the only significant correlation with the SKAMP-A subscale was prior to dosing. More significant correlations were noted during the period 1.5–7.5 h after dosing than before dosing or at the end of the day. When the measures were analyzed by gender, only the SKAMP-C scale was significantly correlated with the SNAP-IV overall rating, for females 1.5–4.5 h after dosing (−0.18, p < 0.001).

To summarize, males and females both have significant improvement in core ADHD symptoms and a range of other behavior problems when treated with stimulants. However, as supported by data from this RA-AMPH study, associated behavioral features and oppositional symptoms are more prevalent in boys (though clearly not exclusively so), are often a presenting complaint, and should be a target of treatment. It is therefore important to evaluate associated behavioral symptoms in youth with ADHD during the initial assessment and over the course of treatment. Since the ADHD-RS rating scale is not sufficient to evaluate associated behavioral symptoms, clinicians may use other rating scales to assess disruptive behaviors such as the IOWA Conners scale [23], the NICHQ Vanderbilt Assessment Scale [39], or the Conners 3rd Edition rating scale (Conners 3) [40].

The study results reported here should be considered in the context of several limitations. Because this secondary analysis was applied to the results of a laboratory classroom study of RA-AMPH that was not specifically designed to assess gender differences in treatment response, it was underpowered. The current paper presents the results of an exploratory post hoc analysis. As stated, no corrections were made for multiple testing, and the rate of false detection was not determined, which could, for example, have affected the finding of no significant differences between males and females on the PERMP scales. Children with significant psychiatric and medical comorbidities, including conduct disorder and severe ODD, were excluded from the laboratory classroom study, which may limit generalizability. Subjects who were randomized and completed the double-blind phase of the laboratory classroom study, including the classroom sessions, also had to both respond to and tolerate RA-AMPH, which is not true of all patients with ADHD. Further, dosing in this study did not fully replicate clinical practice. During the 8-week open-label dose-optimization phase, RA-AMPH was given twice daily. This was also the case during the double-blind period, except on the two laboratory classroom days, when only the single morning dose was administered in order to test the duration of action of RA-AMPH. Consequently, notwithstanding the possibility of amphetamine tachyphylaxis, the SKAMP results reported for the laboratory classroom days (especially the results for the afternoon sessions), which show what might be considered a “fatigue effect,” likely underrepresent the level of symptom control that can be achieved in clinical settings. Also, the SKAMP and ADHD-RS-IV data were not obtained under the same dosing regimen: the SKAMP data reflect once-daily (QD) dosing, whereas the ADHD-RS-IV data reflect twice-daily (BID) dosing.
The recently published ADHD-RS-5 reflects the revised ADHD criteria appearing in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition, but that scale was not available at the time the RA-AMPH laboratory classroom study described in this manuscript was conducted.

Conclusion

In this secondary analysis of a laboratory classroom study of children with ADHD treated with RA-AMPH, both genders responded equally well to treatment, with comparable onset and duration of effect in the laboratory classroom, as evidenced by the ADHD-RS-IV and SKAMP scales and by the laboratory classroom-specific PERMP. However, behavioral differences between boys and girls with ADHD were found in the laboratory classroom setting: boys demonstrated significantly more oppositional and conduct-disordered behaviors than girls both before and after treatment with RA-AMPH.

Although the ADHD-RS-IV and SKAMP scales measure changes in attention and behavior with drug treatment, the SKAMP scale also measures associated disruptive behaviors, such as frustration, lying, and interpersonal conflict, that are more characteristic of oppositional and conduct disorders and that are more prevalent in boys than in girls with ADHD. The SKAMP scale might thus be more sensitive for measuring the range of symptoms of boys with ADHD than the ADHD-RS-IV.