1 Introduction

Attention-deficit/hyperactivity disorder (ADHD) is a common psychiatric disorder in childhood, affecting an estimated 3–7 % of school-age children in the USA [1]. Although stimulants have proven to be safe and effective first-line therapy for most patients with ADHD [2], some patients may have a contraindication to stimulants or may be intolerant of, unresponsive to, or only partially responsive to stimulant treatment. In addition, patients or their families may prefer not to use stimulant therapy [2, 3]. For these individuals, nonstimulant treatments may be considered as alternatives [2]. Atomoxetine (ATX; Strattera®, Eli Lilly and Company), the first nonstimulant approved in the USA for the treatment of ADHD, is a selective norepinephrine reuptake inhibitor indicated for use as monotherapy in children aged 6 years and older and in adults [4]. ATX has also been approved in Canada, Mexico, and several other countries throughout the world, including in Europe, Asia, Australia, Africa, and Latin America. An extended-release formulation of the selective α2A-adrenergic receptor agonist guanfacine (GXR; Intuniv®, Shire Pharmaceuticals Inc.) became the second nonstimulant approved by the Food and Drug Administration (FDA) for the treatment of ADHD. GXR is indicated both as monotherapy and as an adjunct to stimulant medications in children and adolescents aged 6–17 years [5]. Recently, a third nonstimulant, clonidine extended release (Kapvay™; Shionogi Inc.), has also been approved by the FDA for monotherapy and adjunctive therapy in the treatment of ADHD. At present, GXR and clonidine extended release are licensed for the treatment of ADHD only in the USA.

The objective of the current study was to compare the relative efficacy of GXR and ATX as nonstimulant monotherapies for the treatment of ADHD in children and adolescents. Although a head-to-head randomized controlled study is generally considered the most robust method for comparing two competing interventions, such data are rarely available, given the time and expense involved and the dependence on a manufacturer’s incentive to conduct such a study [6, 7]. Indeed, no head-to-head randomized controlled study comparing GXR and ATX has been conducted to date. However, US healthcare payers increasingly demand comparative evidence, or comparative effectiveness research (CER), on competing treatment interventions to inform treatment decisions and shape best practices [8]. In the absence of direct comparisons from head-to-head randomized controlled studies, other CER methods can be used, such as an indirect treatment comparison (ITC), a mixed treatment comparison, or meta-regression [7]. While CER typically refers to comparative ‘effectiveness’ based on real-world data, this study examines comparative ‘efficacy’ as measured in a controlled trial setting. Because neither comparative effectiveness nor comparative efficacy data exist for GXR and ATX, this comparative efficacy analysis was conducted as part of CER; in this publication, the term CER is used to include ‘efficacy.’

Indirect treatment comparisons can be made in a number of ways using data from separate clinical trials of two different treatments. Traditional ITCs have well-recognized limitations. Comparisons based solely on published summary data (e.g., means), such as a naïve ITC, are subject to potential bias when only a small number of trials are available, and the summary data being compared may not be anchored to a common comparator or adjusted for placebo effects. Other ITC methods that can account for baseline differences may yield inconsistent conclusions depending on the measure of effect used (e.g., odds ratio, relative risk) [9]. Meta-regression can adjust for trial-level factors that differ between trials (e.g., mean age), but it can be unreliable when few trials are available and may be subject to ecological bias (i.e., false inferences arising from the use of group-level rather than individual patient-level data) [9, 10].
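
The difference between a naïve comparison and one anchored to a common comparator can be shown in a few lines. The following is a minimal Python sketch of a Bucher-style adjusted indirect comparison, a standard anchored ITC that is not the method used in this study. All numbers are hypothetical and were chosen to show how a difference in placebo response can look like a treatment difference when active arms are compared naïvely.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Arm:
    """Summary data for one trial arm (hypothetical values, for illustration only)."""
    mean_change: float  # mean change from baseline in a symptom score (negative = improvement)
    se: float           # standard error of that mean


def naive_difference(active_a: Arm, active_b: Arm) -> float:
    """Naive ITC: compare the two active arms directly, ignoring each trial's placebo arm."""
    return active_a.mean_change - active_b.mean_change


def anchored_difference(active_a: Arm, placebo_a: Arm,
                        active_b: Arm, placebo_b: Arm) -> tuple[float, float]:
    """Bucher-style anchored ITC: compare the placebo-adjusted effects of A and B.

    Returns (estimate, standard error), assuming the four arm means are independent.
    """
    effect_a = active_a.mean_change - placebo_a.mean_change
    effect_b = active_b.mean_change - placebo_b.mean_change
    se = math.sqrt(active_a.se**2 + placebo_a.se**2 + active_b.se**2 + placebo_b.se**2)
    return effect_a - effect_b, se


# Hypothetical trials: trial A has a much larger placebo response than trial B.
a_active, a_placebo = Arm(-20.0, 1.0), Arm(-12.0, 1.0)
b_active, b_placebo = Arm(-15.0, 1.0), Arm(-7.0, 1.0)

print(naive_difference(a_active, b_active))                          # -5.0
print(anchored_difference(a_active, a_placebo, b_active, b_placebo))  # (0.0, 2.0)
```

The naïve comparison suggests a 5-point advantage for A. Once each active arm is anchored to its own placebo arm, that advantage disappears. An anchored comparison of summary data alone still does not balance baseline characteristics, which is the gap MAIC is designed to address.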

When only a few studies are available for an ITC, many of its limitations can be overcome if individual patient data (IPD) are available for just one of the treatments being compared. Matching-adjusted indirect comparison (MAIC) [9, 11], a recent CER methodology, uses IPD from one treatment to adjust for observed baseline differences between the populations studied for two treatments. IPD from clinical trials of a given treatment are re-weighted so that their average baseline characteristics and placebo outcomes exactly match the published summary values from trials of another treatment for which only published data are available [9, 11]. After adjusting for the imbalance in baseline characteristics, study outcomes can be meaningfully compared across trials. These results can stand on their own or inform the design of efficient prospective trials, for example through power calculations. In the current analysis, available IPD from GXR studies were re-weighted to match published summary data from ATX studies, allowing a more rapid, comparable, and likely less biased comparison of efficacy between GXR and ATX.
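
The core idea of re-weighting IPD to reproduce published means can be illustrated with the method-of-moments estimator commonly described for MAIC. In that estimator, each patient's weight is exp(z_i·α), with covariates centred on the published means, and α is chosen so that the weighted covariate means equal the published values. This is a substitute illustration, not this study's estimator: the present analysis minimized the sum of squared weights instead (Sect. 2.4). The covariates, targets, and simulated data below are hypothetical.

```python
import numpy as np


def mom_weights(x: np.ndarray, target: np.ndarray, max_iter: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Method-of-moments MAIC weights w_i proportional to exp(z_i @ alpha), z_i = x_i - target.

    Minimising the convex function Q(alpha) = sum_i exp(z_i @ alpha) drives its gradient,
    sum_i w_i z_i, to zero, i.e. the weighted covariate means equal `target`.
    Returns weights normalised to sum to 1.
    """
    z = x - target                       # centre IPD covariates on the published means
    alpha = np.zeros(z.shape[1])
    for _ in range(max_iter):
        w = np.exp(z @ alpha)
        grad = z.T @ w                   # gradient of Q(alpha)
        if np.linalg.norm(grad) <= tol * w.sum():
            break                        # weighted means match target to within tol
        hess = (z * w[:, None]).T @ z    # Hessian of Q(alpha), positive definite
        step = np.linalg.solve(hess, grad)
        # Backtracking (Armijo) line search keeps each Newton step a descent step.
        t, q = 1.0, w.sum()
        while np.exp(z @ (alpha - t * step)).sum() > q - 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5
        alpha = alpha - t * step
    w = np.exp(z @ alpha)
    return w / w.sum()


# Hypothetical IPD: age (years) and a female indicator for 200 patients.
rng = np.random.default_rng(0)
x = np.column_stack([rng.normal(11.0, 2.5, 200), (rng.random(200) < 0.25).astype(float)])
target = np.array([10.5, 0.30])          # hypothetical published means to match

w = mom_weights(x, target)
print(x.mean(axis=0))  # unweighted means
print(w @ x)           # weighted means, approximately equal to target
```

The weights are positive by construction. The target must lie inside the range spanned by the IPD, or no set of weights can reproduce it.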

2 Methods

2.1 Trial Selection and Outcomes

A systematic literature search was conducted to identify controlled clinical trials enrolling children and adolescents (age ≤18 years) with ADHD treated with GXR or ATX in Europe, North America, or Australia. The search covered MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, EMBASE, PsycINFO, and the Cochrane Central Register of Controlled Trials. In addition, online documents (clinicaltrials.gov), published systematic reviews, meta-analyses, and post hoc analyses were reviewed to identify additional studies. Primary publications written in English and published through December 2012 were included [keywords or Medical Subject Headings (MeSH) keywords: (ADHD or attention-deficit hyperactivity disorder) AND (guanfacine extended release or atomoxetine)]. Studies were selected for data comparison if they had comparable characteristics in terms of study design (multicenter, double-blind, randomized, placebo-controlled trials), length of treatment, and commonly reported efficacy outcome [ADHD Rating Scale IV (ADHD-RS-IV) scores]. Trials were excluded on the basis of the following criteria: open-label design; focus on ADHD subgroups (e.g., requiring certain comorbidities, requiring the study drug to be used as a specific line of treatment, or requiring symptom severity measured on scales other than the ADHD-RS-IV [e.g., Clinical Global Impressions-Severity (CGI-S) ≥4 at baseline]); combination/adjunctive therapy with GXR or ATX (including combination with behavioral or educational programs); ADHD symptoms reported by teachers; age younger than 6 years; no outcome of interest (e.g., ADHD-RS-IV score change from baseline); no baseline ADHD-RS-IV values; or no outcome of interest reported for individual study arms. A total of six trials (two GXR [12, 13] and four ATX trials [14–17]) met the inclusion criteria (Fig. 1a, b).
Study design characteristics of the selected trials are available in the Electronic Supplementary Material (Appendix 1).

Fig. 1

a Guanfacine extended-release trial selection. b Atomoxetine trial selection. ADHD attention-deficit/hyperactivity disorder, ADHD-RS-IV ADHD Rating Scale IV, MAIC matching-adjusted indirect comparison, RCT randomized controlled trial

The efficacy outcome analyzed was the mean change from baseline in the ADHD-RS-IV score at the final on-treatment assessment prior to down-titration (i.e., drug tapering). The ADHD-RS-IV total score was evaluated as the primary endpoint; secondary endpoints included the ADHD-RS-IV hyperactivity/impulsivity and inattention subscale scores.
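
Selecting the final on-treatment assessment amounts to taking the last observed score before down-titration begins and subtracting the baseline score. The sketch below assumes a hypothetical per-patient list of (week, phase, score) records. How the original trials handled missing visits is not described here, so using the last observed on-treatment score is an assumption.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record layout: (study week, phase, ADHD-RS-IV total score or None if missing).
# Phase is one of "baseline", "treatment", or "taper" (down-titration).
Visit = Tuple[int, str, Optional[float]]


def change_at_final_on_treatment(visits: Sequence[Visit]) -> Optional[float]:
    """Change from baseline at the last observed visit before down-titration."""
    baseline = next((s for _, phase, s in visits if phase == "baseline" and s is not None), None)
    on_treatment = [(week, s) for week, phase, s in visits if phase == "treatment" and s is not None]
    if baseline is None or not on_treatment:
        return None                      # outcome not evaluable for this patient
    _, final_score = max(on_treatment, key=lambda v: v[0])  # latest on-treatment week
    return final_score - baseline


visits = [(0, "baseline", 40), (1, "treatment", 35), (4, "treatment", 28),
          (6, "treatment", None), (7, "taper", 30)]
print(change_at_final_on_treatment(visits))  # -12
```

The week-7 taper visit is ignored even though it is the latest visit. The missing week-6 score falls back to the week-4 score.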

2.2 Dose Selection

For the primary base case comparison of GXR and ATX, target doses were selected on the basis of the maximum recommended effective dosages in the respective FDA-approved labels [4, 5]. For ATX, a body weight-based daily dose of 1.2 mg/kg is the target for children weighing up to 70 kg [4], so 1.2 mg/kg was chosen for the base case analysis; although the maximum recommended daily dose is 1.4 mg/kg, no additional benefit was demonstrated at doses >1.2 mg/kg/day in children and adolescents weighing up to 70 kg. In children and adolescents weighing more than 70 kg, the daily dose should not exceed 100 mg [4]. The FDA-approved dose range of GXR is 1–4 mg once daily [5]. To match the body weight-based ATX trial data, the randomized fixed doses received by individual patients in the selected GXR trials were converted to body weight-based doses (as also reported in the package insert and in the published study results) [5, 12, 13]. In GXR monotherapy clinical trials, clinically relevant improvements (e.g., in ADHD-RS-IV scores) were observed beginning in the 0.05–0.08 mg/kg/day dose range, with additional benefit observed at doses up to 0.12 mg/kg/day [5]. Because too few patients in the selected GXR studies received exactly 0.12 mg/kg/day (based on their weight) to allow adequate comparisons, the base case dosage range was expanded to 0.09–0.12 mg/kg/day.
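
The conversion from fixed GXR doses to body weight-based doses, and the base-case range check, can be sketched as follows. The function names are hypothetical. Treating both ends of the range as inclusive, with no rounding, is an assumption, because the exact rounding rule is not described.

```python
BASE_CASE_RANGE = (0.09, 0.12)  # GXR base-case target range, mg/kg/day


def mg_per_kg_per_day(daily_dose_mg: float, weight_kg: float) -> float:
    """Convert a fixed daily dose (mg/day) to a body weight-based dose (mg/kg/day)."""
    if weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return daily_dose_mg / weight_kg


def in_dose_range(daily_dose_mg: float, weight_kg: float,
                  dose_range: tuple = BASE_CASE_RANGE) -> bool:
    """True if the weight-adjusted dose falls inside the closed interval dose_range."""
    lo, hi = dose_range
    return lo <= mg_per_kg_per_day(daily_dose_mg, weight_kg) <= hi


# Hypothetical patients: (randomized GXR dose in mg/day, baseline weight in kg)
for dose, weight in [(3, 30.0), (4, 40.0), (4, 25.0), (2, 45.0)]:
    print(dose, weight, round(mg_per_kg_per_day(dose, weight), 3), in_dose_range(dose, weight))
```

The first two hypothetical patients receive 0.10 mg/kg/day and fall in the base-case range. The 25-kg patient on 4 mg (0.16 mg/kg/day) and the 45-kg patient on 2 mg (about 0.044 mg/kg/day) do not.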

In both GXR monotherapy studies meeting the initial trial selection criteria [12, 13], the target dose range of GXR 0.09–0.12 mg/kg/day was used. Of the four ATX studies meeting the initial trial selection criteria, only one [15] included a treatment arm at the target dose of 1.2 mg/kg/day; therefore, only this study met the additional inclusion criterion of a treatment dosage not exceeding the maximum dose at which efficacy was observed. Thus, the base case analysis drew on three trials: two GXR trials (GXR 0.09–0.12 mg/kg/day) [12, 13] and one ATX trial (ATX 1.2 mg/kg/day) [15].

2.3 Patient Selection

While only published summary data were available from the ATX trials, IPD were available from both GXR trials. To be included in the base case analyses, individual patients in the GXR trials [12, 13] had to meet the published inclusion/exclusion criteria of the Michelson et al. 2001 ATX trial [15]: a baseline symptom severity score ≥1.5 standard deviations (SDs) above age- and gender-specific normative values on the ADHD-RS-IV total score or on the hyperactivity/impulsivity or inattention subscale score (Fig. 2). Patients from the GXR trials were further selected into the base case analysis cohorts on the basis of their expected body weight-based dosing. In the original GXR study designs, patients were randomized to fixed-dose cohorts of GXR (1–4 mg) or placebo [12, 13]. For patients randomized to GXR, expected body weight-based dosing was calculated from the assigned GXR dosage and baseline body weight. Patients randomized to placebo were not assigned any ‘strength’ of placebo; to match them to active arms of different strengths, their expected body weight-based dosing was evaluated to determine the probability that each would have been allocated to each active dosing arm (1–4 mg/day). Placebo patients whose body weight would not place them in any of the body weight-adjusted dosing arms were excluded. Placebo groups across the GXR trials were then collapsed into one group. For the base case analysis, this selection process made the comparison groups as homogeneous as possible within the specified dosage ranges.
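
The two patient-level selection steps can be sketched as shown below: the severity criterion, and the allocation of placebo patients to weight-adjusted dose arms. Everything here is illustrative. The normative means and SDs are placeholders, not published ADHD-RS-IV norms. The fixed-dose arms are a parameter because they may differ between trials, and equal allocation across those arms is an assumption about how the allocation probability was computed.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence, Tuple

SCALES = ("total", "hyperactivity_impulsivity", "inattention")


@dataclass(frozen=True)
class Norm:
    mean: float  # age- and gender-specific normative mean (placeholder values below)
    sd: float    # corresponding normative SD


def meets_severity(scores: Mapping[str, float], norms: Mapping[str, Norm],
                   threshold_sd: float = 1.5) -> bool:
    """True if the total score or either subscale is >= threshold_sd SDs above its norm."""
    return any((scores[k] - norms[k].mean) / norms[k].sd >= threshold_sd for k in SCALES)


def placebo_arm_probability(weight_kg: float,
                            arms_mg: Sequence[float] = (1, 2, 3, 4),
                            dose_range: Tuple[float, float] = (0.09, 0.12)) -> float:
    """Share of fixed-dose arms whose weight-adjusted dose would fall in dose_range.

    Assumes equal allocation across arms; a result of 0.0 means the placebo patient
    fits no weight-adjusted arm and would be excluded from that dose cohort.
    """
    lo, hi = dose_range
    fitting = [d for d in arms_mg if lo <= d / weight_kg <= hi]
    return len(fitting) / len(arms_mg)


norms = {"total": Norm(10.0, 6.0), "hyperactivity_impulsivity": Norm(5.0, 4.0),
         "inattention": Norm(5.0, 3.5)}  # placeholders, NOT published norms
print(meets_severity({"total": 18, "hyperactivity_impulsivity": 12, "inattention": 6}, norms))
print(placebo_arm_probability(30.0), placebo_arm_probability(15.0))
```

A 30-kg placebo patient fits only the 3-mg arm (0.10 mg/kg/day), giving a probability of 0.25. A 15-kg placebo patient fits no arm in the 0.09–0.12 mg/kg/day range and would be excluded from that cohort.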

Fig. 2

Patient selection: matching inclusion/exclusion criteria. ᵃIndividual patients could have been included in more than one dose cohort where the dose ranges overlap. ADHD attention-deficit/hyperactivity disorder, ADHD-RS-IV ADHD Rating Scale IV, GXR guanfacine extended release, ITT intention-to-treat, SD standard deviation

2.4 Statistical Analysis

Comparative efficacy was analyzed before and after matching with IPD. In the unadjusted comparison (i.e., before matching), IPD from patients who met all inclusion and exclusion criteria were pooled across both GXR trials and compared with the summary results from the ATX trial [15] included in the base case analysis. Baseline characteristics and efficacy were compared between GXR- and ATX-treated patients: continuous baseline variables and efficacy results were compared using Student’s t tests, and categorical baseline variables were compared using Chi-squared tests.
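
Because only summary statistics are published for the ATX trial, the unadjusted tests compare statistics computed from GXR IPD with published means, SDs, and percentages. Below is a minimal sketch using SciPy's `ttest_ind_from_stats` and `chi2_contingency`. Two details are assumptions not stated in the text: reconstructing published counts from percentages, and omitting the continuity correction. The data are simulated.

```python
import numpy as np
from scipy import stats


def compare_continuous(ipd_values, published_mean, published_sd, published_n):
    """Student's (pooled-variance) t test: GXR IPD vs published ATX summary statistics."""
    x = np.asarray(ipd_values, dtype=float)
    return stats.ttest_ind_from_stats(
        mean1=x.mean(), std1=x.std(ddof=1), nobs1=x.size,
        mean2=published_mean, std2=published_sd, nobs2=published_n,
        equal_var=True,
    )


def compare_proportion(ipd_flags, published_pct, published_n):
    """Chi-squared test of a binary characteristic (e.g., female sex) on a 2x2 table."""
    flags = np.asarray(ipd_flags, dtype=bool)
    k = int(flags.sum())
    published_k = round(published_pct / 100 * published_n)  # count rebuilt from a percentage
    table = [[k, flags.size - k], [published_k, published_n - published_k]]
    res = stats.chi2_contingency(table, correction=False)   # no Yates correction (assumption)
    return res[0], res[1]                                    # statistic, p-value


rng = np.random.default_rng(1)
age_gxr = rng.normal(10.5, 2.6, 150)                 # simulated GXR ages
print(compare_continuous(age_gxr, 11.2, 2.5, 129))   # hypothetical ATX mean, SD, n
female_gxr = rng.random(150) < 0.25                  # simulated female indicators
print(compare_proportion(female_gxr, 28.0, 129))     # hypothetical ATX % female, n
```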

For the MAIC analyses (after matching with IPD), patients across GXR and ATX trials were matched for age, percentage of female patients, baseline ADHD-RS-IV hyperactivity/impulsivity and inattention subscale scores, percentage of inattentive and hyperactive/impulsive subtypes, and corresponding placebo response. Individual patients in the GXR trials were assigned weights such that their weighted baseline characteristics and average placebo outcomes (either ADHD-RS-IV total score or subscale scores) exactly matched those reported for the ATX trial. Weights were modeled as a linear combination of all reported baseline characteristics (age, percentage of female patients, race, weight, height, baseline ADHD-RS-IV inattention and hyperactivity/impulsivity subscale scores, ADHD subtype, and disease duration), together with pairwise interactions and quadratic and cubic terms of these measures in the GXR trials. The weights were then estimated by constrained optimization (a quadratic program with linear constraints) to minimize the sum of the squared weights (i.e., their deviation from uniform weighting), subject to the constraints that all weights were non-negative and that all weighted mean baseline characteristics and placebo arm outcomes were exactly balanced between the GXR and ATX trials. After matching, efficacy outcomes for the re-weighted GXR-treated patients were compared with those from the ATX trial in comparable trial populations. To assess statistical significance, a bootstrap procedure was applied [18]: patients in the GXR trials were randomly sampled with replacement to generate 1,000 bootstrap replicates, and the weight estimation and weighted comparisons between GXR and ATX were repeated for each replicate. Statistical testing was then based on two-sample t tests with unequal variance, using the bootstrap standard errors (SEs) for GXR outcomes and the reported SEs for ATX outcomes. Two-sided statistical significance was assessed at α = 0.05.
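
The weighting and testing steps described above can be sketched as follows, assuming NumPy and SciPy. The weights minimize the sum of squared weights subject to non-negativity and exact balance. That problem is solved here with SciPy's general-purpose SLSQP routine. The study's own solver is not specified beyond the description above. For brevity, only covariate means are balanced. The placebo-outcome match could be added as one more linear equality row, Σ w_i (y_i − y_target) = 0 over placebo patients. The Welch–Satterthwaite degrees of freedom and the sample sizes fed into them are assumptions, and the data are simulated.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize


def min_sq_weights(x: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Non-negative weights summing to 1 that minimise sum(w**2) subject to
    weighted covariate means == target (x: n patients by p covariates)."""
    n = x.shape[0]
    a = np.vstack([x.T, np.ones(n)])       # balance rows plus the sum-to-one row
    b = np.append(target, 1.0)
    res = minimize(
        fun=lambda w: w @ w,
        x0=np.full(n, 1.0 / n),             # start from uniform weights
        jac=lambda w: 2.0 * w,
        method="SLSQP",
        bounds=[(0.0, None)] * n,
        constraints=[{"type": "eq", "fun": lambda w: a @ w - b, "jac": lambda w: a}],
        options={"maxiter": 500, "ftol": 1e-10},
    )
    if not res.success:
        raise RuntimeError(f"weight estimation failed: {res.message}")
    return res.x


def bootstrap_se(x, y, target, n_boot=1000, seed=0):
    """Bootstrap SE of the matched (weighted) mean outcome; weights re-estimated per replicate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)    # resample patients with replacement
        w = min_sq_weights(x[idx], target)
        reps.append(w @ y[idx])
    return float(np.std(reps, ddof=1))


def unequal_variance_test(mean_a, se_a, n_a, mean_b, se_b, n_b):
    """Two-sided t test on a difference in means with unequal variances (Welch df)."""
    se = np.hypot(se_a, se_b)
    df = se**4 / (se_a**4 / (n_a - 1) + se_b**4 / (n_b - 1))
    t = (mean_a - mean_b) / se
    return t, 2.0 * stats.t.sf(abs(t), df)


# Simulated GXR IPD: covariates (age, female indicator) and change-from-baseline outcome.
rng = np.random.default_rng(2)
n = 80
x = np.column_stack([rng.normal(11.0, 2.5, n), (rng.random(n) < 0.3).astype(float)])
y = -20.0 + 0.5 * (x[:, 0] - 11.0) + rng.normal(0.0, 8.0, n)
target = np.array([10.6, 0.28])              # hypothetical published ATX means

w = min_sq_weights(x, target)
gxr_mean, gxr_se = w @ y, bootstrap_se(x, y, target, n_boot=200)
print(unequal_variance_test(gxr_mean, gxr_se, n, -15.0, 1.5, 129))  # hypothetical ATX mean, SE, n
```

Because the weights are constrained to sum to 1, minimizing their sum of squares is equivalent to minimizing their squared distance from uniform weights of 1/n. Re-estimating the weights inside every bootstrap replicate propagates the uncertainty of the matching step into the SE.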

2.5 Sensitivity Analysis

To test the robustness of the base case results, sensitivity analyses were conducted. From the two GXR studies used in the base case comparison, cohorts of patients receiving GXR at doses below the target range (0.075–0.090 and 0.046–0.075 mg/kg/day) were selected and compared with the same base case target dose of ATX (1.2 mg/kg/day) in the three-trial sensitivity analysis [12, 13, 15].

To determine whether the base case results were generalizable to a broader, more heterogeneous trial population, a second MAIC analysis was conducted in which ATX doses ≥1.2 mg/kg/day were compared with three GXR dose cohorts (0.09–0.12, 0.075–0.090, and 0.046–0.075 mg/kg/day). For this analysis, three additional ATX trials met the selection criteria and were added to the three trials included in the base case (the ‘six-trial’ analysis). These three additional ATX trials did not report results for a fixed 1.2 mg/kg/day ATX dosage but rather for varied ATX dosages (titrated up to 2 mg/kg/day) [14, 16, 17].
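
Because adjacent GXR dose ranges share their endpoints (0.075 and 0.090 mg/kg/day), a patient can belong to more than one cohort, as the Fig. 2 caption notes. Below is a minimal sketch of that assignment. It assumes closed intervals at both ends, because the exact boundary rule is not stated.

```python
# GXR dose cohorts used in the base case and sensitivity analyses (mg/kg/day).
COHORTS = {
    "0.09-0.12": (0.09, 0.12),
    "0.075-0.090": (0.075, 0.090),
    "0.046-0.075": (0.046, 0.075),
}


def assign_cohorts(mg_per_kg: float) -> list[str]:
    """Return every cohort whose closed dose interval contains the weight-adjusted dose."""
    return [name for name, (lo, hi) in COHORTS.items() if lo <= mg_per_kg <= hi]


for dose in (0.10, 0.09, 0.075, 0.06, 0.03):
    print(dose, assign_cohorts(dose))
```

Doses of exactly 0.09 or 0.075 mg/kg/day land in two cohorts, and 0.03 mg/kg/day lands in none. In practice, the weight-adjusted dose is a computed floating-point value, so doses that should sit exactly on a boundary may need explicit rounding. How the study handled this is not reported.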

Statistical evaluations for the sensitivity analyses were conducted in a manner similar to that described for the base case comparison.

3 Results

Baseline pooled patient characteristics for the base case and three-trial sensitivity analyses are shown in Table 1. Before matching, mean age differed significantly between patients in the GXR trials and those in the ATX trial. After matching, the means and SDs of all continuous baseline variables and the percentages for all categorical variables were identical between each cohort from the two pooled GXR trials and the ATX trial; in addition, placebo group efficacy results were matched exactly between the GXR and ATX trials.
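
SDs can be matched along with means because an SD constraint can be rewritten as a linear constraint on the weighted mean of the squared variable. This is consistent with the quadratic terms in the weighting model (Sect. 2.4). The sketch below shows that conversion and a weighted mean/SD helper. It uses the population (divide-by-total-weight) variance convention, which is an assumption, since published SDs usually use n − 1.

```python
import math


def second_moment_target(mean: float, sd: float) -> float:
    """Target for the weighted mean of x**2 so that the weighted SD equals sd.

    Uses Var(x) = E[x**2] - E[x]**2 with the population (not n - 1) convention.
    """
    return sd**2 + mean**2


def weighted_mean_sd(values, weights):
    """Weighted mean and (population-convention) weighted SD."""
    total = sum(weights)
    m = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - m) ** 2 for w, v in zip(weights, values)) / total
    return m, math.sqrt(var)


# If weights reproduce mean 10 and mean-of-squares 104, the weighted SD is 2.
print(second_moment_target(10.0, 2.0))            # 104.0
print(weighted_mean_sd([8.0, 12.0], [1.0, 1.0]))  # (10.0, 2.0)
```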

Table 1 Baseline characteristics before and after matching in the three-trial base case and sensitivity analyses

Results from the unadjusted comparisons of the base case and associated three-trial sensitivity analyses are shown in Table 2. The unadjusted comparisons show that placebo outcomes differed between the GXR and ATX groups. Compared with patients on placebo in the ATX trial, patients on placebo in the pooled GXR trials showed significantly greater reductions in ADHD-RS-IV total score [mean (SE) difference −4.8 (1.6); p < 0.01] and subscale scores [hyperactivity/impulsivity: −1.8 (0.8); p < 0.05; inattention: −3.1 (1.0); p < 0.01]. This suggests that unobserved variables (e.g., trial conduct, baseline characteristics) differed between the ATX and GXR studies and cohorts. Differences in placebo response arising from such cross-trial differences in unobserved variables were therefore adjusted for in the MAIC model. In the unadjusted base case comparison before matching [target doses of GXR (0.09–0.12 mg/kg/day) and ATX (1.2 mg/kg/day)], GXR-treated patients showed better efficacy than ATX-treated patients, with statistically significantly greater mean reductions from baseline in ADHD-RS-IV total score and in the hyperactivity/impulsivity and inattention subscale scores.

Table 2 Unadjusted comparison of change from baseline at final on-treatment assessment in ADHD-RS-IV scores: three-trial base case and dosage range sensitivity analyses

After matching for baseline characteristics and differences in placebo response across treatments, the MAIC base case analysis confirmed significantly greater reductions for patients receiving GXR 0.09–0.12 mg/kg/day than for ATX-treated patients in ADHD-RS-IV total score [mean (SE) difference −7.0 (2.2); p < 0.01] and in the hyperactivity/impulsivity [−3.8 (1.2); p < 0.01] and inattention [−3.2 (1.3); p < 0.05] subscale scores (Fig. 3). Patients receiving a lower GXR dose range (0.075–0.090 mg/kg/day) showed significantly greater reductions than ATX-treated patients in ADHD-RS-IV total score and in the hyperactivity/impulsivity subscale score; patients receiving GXR 0.046–0.075 mg/kg/day showed a significantly greater reduction than ATX-treated patients in the ADHD-RS-IV inattention subscale score (Fig. 3).

Fig. 3

Matching-adjusted indirect comparison of change from baseline at final on-treatment assessment in ADHD-RS-IV scores: three-trial base case and sensitivity analyses. ADHD attention-deficit/hyperactivity disorder, ADHD-RS-IV ADHD Rating Scale IV, ATX atomoxetine, GXR guanfacine extended release, SE standard error. *p < 0.05 compared with ATX. **p < 0.01 compared with ATX

In the six-trial sensitivity analysis, before matching, patients receiving the target GXR dose (0.09–0.12 mg/kg/day) showed significantly greater reductions than ATX-treated patients in mean ADHD-RS-IV total score and in the hyperactivity/impulsivity and inattention subscale scores (Table 3). As in the base case analysis, patients receiving placebo in the pooled GXR trials also showed significantly greater reductions in ADHD-RS-IV total and subscale scores than patients receiving placebo in the pooled ATX trials. After matching, GXR patients in the target dose range (0.09–0.12 mg/kg/day) showed significantly greater reductions than ATX-treated patients in ADHD-RS-IV total score [mean (SE) difference −7.6 (1.4); p < 0.01] and in the hyperactivity/impulsivity [−4.0 (0.8); p < 0.01] and inattention [−3.7 (0.8); p < 0.01] subscale scores, as did patients in the lowest GXR dose group (0.046–0.075 mg/kg/day). Differences between GXR 0.075–0.090 mg/kg/day and ATX in mean ADHD-RS-IV total and subscale scores were not statistically significant (Fig. 4).

Table 3 Unadjusted comparison of change from baseline at final on-treatment assessment in ADHD-RS-IV scores: six-trial sensitivity analysis

Fig. 4

Matching-adjusted indirect comparison of change from baseline at final on-treatment assessment in ADHD-RS-IV scores: six-trial sensitivity analysis. ADHD attention-deficit/hyperactivity disorder, ADHD-RS-IV ADHD Rating Scale IV, ATX atomoxetine, GXR guanfacine extended release, SE standard error. *p < 0.05 compared with ATX. **p < 0.01 compared with ATX

4 Discussion

The purpose of CER is “to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances” [19]. Such evidence-based information is important to clinicians, patients, payers, and policymakers in making treatment decisions and establishing best practices. In 2009, the American Recovery and Reinvestment Act (ARRA) set aside more than US$1 billion to fund CER projects [19], highlighting the overall need for comparative evidence. Funding for CER appears likely to continue: the 2010 Patient Protection and Affordable Care Act built on the funding initiated by the ARRA by providing new and increased CER funding through 2019, reaching an annual total of US$600 million during this period [20]. Given this, leveraging available clinical trial data at the individual patient level would be a cost-effective approach to generating CER data.

MAIC overcomes limitations of unadjusted ITC methods by using IPD to compare treatment outcomes across balanced populations. The utility of MAIC has been demonstrated in a number of other therapeutic areas, such as psoriasis, newly diagnosed chronic myelogenous leukemia, and type 2 diabetes [9].

With IPD available from GXR clinical trials, it was possible to select a specific group of patients given equipotent (mg/kg/day-based) doses and compare them with patients receiving the target (maximum effective) dose in the ATX trials. In the present analysis, the unadjusted comparisons showed significantly larger decreases in ADHD-RS-IV scores with GXR at the target dose than with ATX. However, patients in the placebo arms of the GXR trials also showed significantly larger decreases in ADHD symptom scores than patients receiving placebo in the ATX trials. The apparent advantage in ADHD-RS-IV total and subscale scores in this unadjusted analysis may therefore reflect baseline differences between the GXR and ATX cohorts as well as differences in placebo response. Re-weighting IPD using the MAIC methodology allowed exact matching of average baseline characteristics, including age, sex, and baseline ADHD-RS-IV scores, across cohorts. Although not all baseline variables differed significantly between cohorts before matching, even differences that are not statistically significant can produce substantial confounding. In the base case analysis, after adjusting for baseline differences and placebo arm response rates, patients on the target dose of GXR showed significantly greater ADHD symptom reduction than patients on the target dose of ATX. Results of the sensitivity analyses, which compared a range of GXR doses below the target dose with ATX doses at or above the target dose, were generally consistent with the base case, supporting its generalizability to a broader and more heterogeneous trial population that may be clinically relevant.

Our results demonstrate that, on average, GXR is more efficacious than ATX for the treatment of ADHD in children and adolescents, with consistent, significant symptom reductions favoring GXR over ATX in the base case analysis and in several of the sensitivity analyses. In the MAIC base case analysis, the average reduction in ADHD-RS-IV total score was 7.0 points greater with GXR than with ATX; in the six-trial sensitivity analysis, it was a significant 7.6 points greater. An analysis by Zhang et al. [21] defined a between-treatment minimal clinically important difference (MCID) for ADHD-RS-IV total scores of 6.6 points; this suggests that the ~7-point greater reduction in ADHD-RS-IV total score with GXR than with ATX in the MAIC analyses is not only statistically significant but may also be clinically meaningful. The potential clinical impact of these statistically significant differences should be explored further. In addition, some researchers have attempted to relate ADHD-RS-IV scores to scores on the Clinical Global Impressions-Improvement (CGI-I) and CGI-S scales, which reflect clinicians’ global judgments of improvement and disease severity, respectively, in practice [22]. In an evaluation of ADHD clinical trials of lisdexamfetamine dimesylate (Vyvanse®; Shire LLC, Wayne, PA, USA), Goodman et al. [22] found that differences in ADHD-RS-IV total score of 8–10 points corresponded approximately to a 1-level difference in CGI-S severity [e.g., from 4 (moderately ill) to 3 (mildly ill)]. However, whether those results generalize to the current analysis of GXR and ATX is not known.
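
To set the ~7-point differences alongside the 6.6-point threshold, approximate 95 % confidence intervals can be computed from the reported mean differences and SEs. This sketch uses a normal approximation, which is an assumption. The study's tests used t distributions whose degrees of freedom are not reported, so exact intervals would be somewhat wider.

```python
from statistics import NormalDist

MCID = 6.6  # between-treatment MCID for ADHD-RS-IV total score (Zhang et al. [21])


def normal_ci(estimate: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


# Reported MAIC mean (SE) differences in ADHD-RS-IV total score, GXR minus ATX.
for label, est, se in [("base case", -7.0, 2.2), ("six-trial", -7.6, 1.4)]:
    lo, hi = normal_ci(est, se)
    print(f"{label}: {est} (95% CI {lo:.1f} to {hi:.1f}); |estimate| >= MCID: {abs(est) >= MCID}")
```

This gives roughly −11.3 to −2.7 points for the base case and −10.3 to −4.9 points for the six-trial analysis. Both point estimates exceed 6.6 points in magnitude, but both intervals also include smaller differences.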

Our analysis has several limitations. As shown in Fig. 1, 92 trials (82 ATX and ten GXR) were assessed for eligibility, but most had to be excluded for a variety of reasons. For example, one trial [23] enrolled a study population that met our inclusion criteria, but the paper did not report mean baseline ADHD-RS-IV total scores (required for MAIC), so it could not be used. Had more clinical trials been available for inclusion, different results might have been obtained. As in any comparison of nonrandomized treatment groups, including MAIC, cross-trial differences in unobserved characteristics could have biased the comparison of outcomes. Only a direct treatment comparison in a randomized trial could potentially avoid bias due to unobserved characteristics, through randomization and blinding. However, the fact that the average placebo effect was also matched reduces this concern. Another potential limitation is that MAIC, like any adjustment procedure, reduces the effective sample size relative to the unmatched trial populations; this is an unavoidable consequence of reducing variation between treatment groups. In addition, although the MAIC used IPD, the comparison between GXR and ATX was made at the population level. While our results show that GXR is more efficacious than ATX on average in the studied populations, particular individuals or subgroups may be more or less responsive to, or intolerant of, either medication. Individual responses do vary, and some patients may have better outcomes with ATX. Further patient-centered research is needed to better define patient characteristics associated with medication efficacy, including who may respond to GXR but not ATX and vice versa.

As patient-centered medicine evolves toward a more personalized approach, the MAIC method could potentially be applied to subgroups to help address such issues of patient heterogeneity. Also, an analysis of safety and tolerability with GXR versus ATX, although an important clinical consideration for treatment selection, was outside the scope of this CER. Additionally, the current study focuses on the short-term efficacy of GXR versus ATX. As ADHD is a chronic condition that may require long-term treatment, future research is needed to assess the long-term impact of these treatments. Lastly, our analysis compared only GXR and ATX; to our knowledge, CER studies using MAIC to compare clonidine extended release with either GXR or ATX have not yet been conducted.
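
The loss of effective sample size noted among the limitations above is commonly quantified with Kish's approximation, ESS = (Σw)² / Σw². This equals n for uniform weights and falls as the weights become more unequal. The text does not report which measure, if any, was used, so the sketch below is illustrative only.

```python
def effective_sample_size(weights) -> float:
    """Kish's approximate effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


print(effective_sample_size([1.0] * 100))               # 100.0 (uniform weights)
print(effective_sample_size([1.0] * 50 + [3.0] * 50))   # 80.0
```

The measure is unaffected by rescaling all weights by a constant, so it can be applied to normalized or unnormalized MAIC weights alike.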

5 Conclusions

Using the MAIC method, IPD were used to adjust for observable cross-trial differences at baseline. Compared with patients administered ATX at the target dose of 1.2 mg/kg/day, patients administered GXR at the target dose range of 0.09–0.12 mg/kg/day showed statistically significant and clinically meaningful greater mean reductions from baseline in ADHD-RS-IV total and subscale scores [21]. Results were consistent across several sensitivity analyses, supporting the generalizability of these results to a wider, more heterogeneous trial population.