Introduction

Attention-deficit hyperactivity disorder (ADHD), a childhood-onset neurodevelopmental disorder,1 is highly heritable.2, 3 Despite this strongly consistent finding, some remain sceptical about the diagnosis of ADHD, the biological validity of such a construct, and its neurodevelopmental origins.4 Public misunderstanding is further fuelled by the relative scarcity of knowledge regarding its pathophysiology. The knowledge gap on pathophysiology is not straightforward to address given practical limitations to directly assess biological and molecular systems in those who are affected. Genetic findings offer one non-invasive approach to providing clues about the biology and pathogenesis of neuropsychiatric disorders.5, 6, 7

In the last 5 years, one class of genetic variant (large, rare chromosomal deletions and duplications (copy number variants; CNVs)) has been found to contribute to ADHD risk across multiple studies.8, 9, 10, 11, 12 Common genetic variants also contribute to ADHD when considered ‘en masse’ as a composite risk.13, 14, 15, 16, 17, 18, 19 One important observation from the original ADHD CNV studies was that the CNVs spanned chromosomal and gene regions that overlapped with schizophrenia and autism CNV loci.9, 11 Since then, exome-sequencing investigations of schizophrenia and autism have been published and have highlighted additional novel genomic regions that harbour neurodevelopmental disorder risk variants (single-nucleotide variants; SNVs).20, 21, 22, 23 Exome-sequencing results for ADHD are awaited. Unlike CNVs, which generally span multiple genes, SNVs are located in individual genes and thus offer improved resolution for testing cross-disorder pathogenic mechanisms.

Schizophrenia and autism genetic findings have also indicated specific biological mechanisms; these involve targets of the Fragile X Mental Retardation protein (FMRP) in schizophrenia and autism20, 21, 22, 23, 24 and in schizophrenia, genes involved in glutamatergic post-synaptic processes: ARC (activity-regulated cytoskeleton-associated protein) and NMDAR (N-methyl-D-aspartate receptor) complexes.20, 21, 25 Additional pathways have been reported to be enriched for SNVs (for example, calcium channel complexes in schizophrenia21), but such findings are not yet replicated across multiple studies and research designs.

In the current study, we assembled five ADHD and control CNV data sets from the United Kingdom, Ireland, Northern Europe, United States of America and Canada.10 Our aim was to test the hypothesis that ADHD-associated CNVs show enrichment for gene sets that have been: (a) implicated by SNVs in the most recent, largest schizophrenia and autism de novo exome-sequencing studies20, 21, 22, 23 and (b) selected for function involving ARC, NMDAR and FMRP targets. We also undertook a hypothesis-free meta-analysis of all biological pathways across the five ADHD data sets.

Materials and methods

Participant and control data sets

The recruitment, assessment processes and clinical description of ADHD case subjects and controls have been described in detail previously.10 Case–control CNV data sets were provided by the CNV lead for each study (SS, NW, JG, BF, EM). The five data sets came from: (a) Canada,11 (b) Cardiff, UK,9, 17 (c) the Children's Hospital of Philadelphia, USA (CHOP),8 (d) the International Multi-Center ADHD Genetics (IMAGE) 2 Project10, 16 and (e) the Pfizer-funded study from UCLA, Washington University and Massachusetts General Hospital (PUWMa).26 All affected subjects from each of the five studies were children aged 5–18 years, had an IQ⩾70, were of European descent, were free of psychosis, epilepsy and serious neurological impairment, and met Diagnostic and Statistical Manual of Mental Disorders (DSM)-III-R or DSM-IV criteria for ADHD as confirmed by semi-structured research diagnostic interviews. Collection and analysis of case–control data had been approved at each site by each Institutional Review Board or ethics committee. Case data were collected with informed consent from parents and assent from children.

After quality control exclusions (see later), the Canadian study included 247 DSM-IV ADHD cases and 2357 comparison subjects genotyped on the Affymetrix 6.0 array (Affymetrix, Santa Clara, CA, USA). The UK Cardiff study (excluding those that were included in IMAGE 2) included 603 DSM-IV ADHD cases genotyped on Illumina Human 660Q-Quad Beadchip (Illumina, San Diego, CA, USA) and 1047 comparison subjects genotyped on HumanHap550Beadchip. The CHOP study included 1013 participants with DSM-IV ADHD and 4105 comparison children genotyped on the Illumina Infinium HumanHap550K Beadchip. IMAGE 2 included data from 732 affected children collected in the United Kingdom, Ireland, Germany, Switzerland, the Netherlands and the United States. Control data came from 2010 subjects collected for a genome-wide association study of schizophrenia, described elsewhere.27 A total of 692 ADHD cases from PUWMa and 1101 controls were genotyped on the Illumina 1M BeadChip.

Quality control and procedures for CNV calling are described in each of the original manuscripts (see Williams et al.10 for a summary). CNVs were filtered for frequency (<1%), and also for overlapping (>50% length) sites of known common CNVs defined by the Genome Structural Variation Consortium (http://projects.tcag.ca/variation/ng42m_cnv.php) or segmental duplications present in the March, 2006, human reference sequence (National Center for Biotechnology Information reference build 36.1, hg18). To allow for differences in CNV detection between the arrays, we used controls genotyped on platforms that matched the platforms used for the cases. The primary analyses used all CNVs >500 kb as these are the most reliably called across different genotyping platforms, have consistently been found to show increased burden in ADHD cases vs controls and have withstood experimental validation.10 Pathways highlighted by the primary analysis were also tested for enrichment with all CNVs >100 kb (results available from last author).
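The filtering criteria above (frequency <1%, no more than 50% of length overlapping known common CNVs or segmental duplications, and length >500 kb for the primary analysis) can be illustrated with a minimal Python sketch. The `Region` type, the function names and the choice to measure overlap against the union of known regions are assumptions made for illustration only; they are not taken from the original pipelines.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical genomic interval (half-open, in base pairs)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap_fraction(cnv: Region, known: Sequence[Region]) -> float:
    """Fraction of the CNV's length covered by the union of known regions."""
    # Clip every overlapping known region to the CNV boundaries.
    spans = sorted(
        (max(cnv.start, k.start), min(cnv.end, k.end))
        for k in known
        if k.chrom == cnv.chrom and k.start < cnv.end and k.end > cnv.start
    )
    covered, cursor = 0, cnv.start
    for start, end in spans:
        start = max(start, cursor)  # skip bases that were already counted
        if end > start:
            covered += end - start
            cursor = end
    return covered / cnv.length


def keep_cnv(cnv: Region, frequency: float, known_common: Sequence[Region],
             min_length: int = 500_000, max_frequency: float = 0.01,
             max_overlap: float = 0.5) -> bool:
    """Apply the size, frequency and overlap filters (thresholds from the text)."""
    return (cnv.length > min_length
            and frequency < max_frequency
            and overlap_fraction(cnv, known_common) <= max_overlap)
```

Setting `min_length=100_000` would correspond to the secondary analysis of CNVs >100 kb.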

Statistical method for enrichment testing

The method is adapted from that proposed by Raychaudhuri et al.28 and has been successfully applied to de novo schizophrenia CNVs.25 Briefly, each CNV was labelled ‘case/control’ according to whether it occurs in a case or a control. A ‘study’ covariate for each data set was included as this corrects for variation in the genotyping assays and CNV estimation algorithms used across the different studies.

The following two logistic regression models were fitted to the sample of CNVs:

  1) Case–control ~ study + CNV length + number of genes hit outside pathway

  2) Case–control ~ study + CNV length + number of genes hit outside pathway + hit gene in pathway (yes/no)

and the deviances of the two models compared. If there is no enrichment of case CNV hits on pathway genes, then the difference in deviances should be distributed asymptotically as a χ² on one degree of freedom.29 CNV length was fitted in the model because long CNVs are more likely to hit any set of genes than small ones, and CNV length may differ systematically between cases and controls. The ‘number of genes hit outside pathway’ was fitted to allow for case CNVs influencing disease status by hitting genes other than those in the pathway being tested. A binary variable (yes/no) is used for whether a CNV ‘hits gene(s) in a pathway’ rather than the number of genes in the pathway hit by the CNV (which is also a possibility) to allow for some pathways having several genes that are physically close together (thus, likely to be hit by the same CNV). The same analysis method was used to determine gene-specific enrichments, by defining the ‘pathway’ as the gene.
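As an illustration of the comparison described above, the following Python sketch shows how the two pathway-related covariates could be coded for a single CNV, and how a P-value follows from the difference in deviances under the one-degree-of-freedom χ² approximation. The function names, gene labels and deviance values are hypothetical. The model fitting itself is not shown, so the deviances are assumed to come from a separately fitted pair of logistic regression models.

```python
import math
from typing import Set, Tuple


def codes_for_cnv(genes_hit: Set[str], pathway: Set[str]) -> Tuple[int, int]:
    """Return (hit gene in pathway as 0/1, number of genes hit outside pathway)."""
    hit_in_pathway = 1 if genes_hit & pathway else 0
    n_outside = len(genes_hit - pathway)
    return hit_in_pathway, n_outside


def deviance_test_p(deviance_model_1: float, deviance_model_2: float) -> float:
    """P-value for the drop in deviance when the pathway term is added.

    For a chi-square variable with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    diff = max(deviance_model_1 - deviance_model_2, 0.0)
    return math.erfc(math.sqrt(diff / 2.0))


# Hypothetical CNV spanning three genes, one of which is in the tested pathway.
hit, outside = codes_for_cnv({"GENE_A", "GENE_B", "GENE_C"}, {"GENE_A", "GENE_X"})

# Hypothetical deviances from models 1 and 2.
p_value = deviance_test_p(1250.0, 1240.0)
```

Coding the pathway hit as 0/1 means a CNV spanning several neighbouring pathway genes contributes once, which matches the rationale given for the binary variable.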

For all analyses, we tested pathways showing significant enrichment in the combined data set of five studies and then undertook tests of sensitivity by examining each of the five data sets separately and rerunning the analysis after excluding the most significant single sample. Additional analyses were stratified by CNV type (deletions and duplications). As the number of duplications was greater than the number of deletions,10 we tested for differential strengths of enrichment by comparing case duplications to case deletions in the logistic regression framework described above.

Specific gene sets

Genes selected by location

Four sets of genes were defined: (a) non-synonymous and (b) loss-of-function de novo SNVs in schizophrenia were taken from the most recent published de novo exome-sequencing study of schizophrenia, which also catalogued and annotated all SNVs in previous studies in a consistent way.20 There were a total of 611 genes containing at least one non-synonymous de novo SNV, with 87 of these containing a de novo loss-of-function SNV. We also took sets of genes containing: (c) non-synonymous and (d) loss-of-function de novo SNVs from the two largest, recent exome-sequencing studies of autism.22, 23 Combining the autism SNV sets from these studies, there were 2726 unique genes containing at least one non-synonymous de novo SNV, of which 538 contained a loss-of-function de novo SNV.

Genes selected by function: FMRP targets, ARC and NMDAR

Next, we examined a further three gene sets by function, selecting those implicated in schizophrenia and autism by CNV and more recent exome-sequencing studies. These were genes involved in FMRP targets (840 genes) as defined previously30 and the ARC (28 genes) and NMDAR (61 genes) complexes, as defined by Kirov et al.25 Bonferroni corrections were used to adjust for testing seven gene sets.
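For seven gene sets and a nominal significance level of 0.05 (the level is an assumption here; it is not stated explicitly in the text), the Bonferroni threshold is 0.05/7 ≈ 0.0071. A minimal Python sketch follows; the function name is illustrative and the two P-values are those reported in the Results for the schizophrenia non-synonymous and FMRP target gene sets.

```python
from typing import Dict, Tuple


def bonferroni(p_values: Dict[str, float], n_tests: int,
               alpha: float = 0.05) -> Dict[str, Tuple[float, bool]]:
    """Return (adjusted P-value, significant?) for each named test."""
    threshold = alpha / n_tests
    return {
        name: (min(p * n_tests, 1.0), p < threshold)
        for name, p in p_values.items()
    }


reported = {
    "schizophrenia non-synonymous de novo SNV genes": 5.4e-4,
    "FMRP targets": 0.0018,
}
adjusted = bonferroni(reported, n_tests=7)
```

Both reported P-values fall below the assumed threshold, consistent with the statement in the Results that they remain significant after correction.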

Hypothesis-free analysis of all pathways

This final analysis was undertaken using a large, unbiased, general set of pathways comprising:

  a) Gene Ontology31 (http://www.geneontology.org/, accessed on July 2013)

  b) KEGG32 (http://www.genome.jp/kegg/, accessed on June 2013)

  c) PANTHER pathways version 3.1 (http://www.pantherdb.org/pathway/)

  d) Mouse Genome Informatics database33 (http://www.informatics.jax.org/, accessed on August 2013)

  e) BioCarta (http://www.biocarta.com, accessed on June 2013)

  f) Reactome34 (http://www.reactome.org, accessed on June 2013)

  g) NCI35 (http://pid.nci.nih.gov, accessed on June 2013)

Pathways containing between 3 and 1500 genes were used in the analysis (15 111 in total). To increase the accuracy of the asymptotic P-values described above and to reduce the chance of a small pathway being falsely declared to be enriched based on a few gene hits, analysis was restricted to pathways with at least 10 gene hits in the total sample (7034 in total). Correction for multiple testing of pathways was performed by calculating q-values.36
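The pathway selection and multiple-testing steps can be sketched in Python as follows. This is a simplified illustration, not the cited q-value procedure: it fixes the proportion of true null hypotheses at 1, which makes the q-values equal to Benjamini-Hochberg adjusted P-values and therefore conservative. Treating the size bounds as inclusive is also an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def filter_pathways(pathways: Dict[str, Set[str]], gene_hits: Dict[str, int],
                    min_genes: int = 3, max_genes: int = 1500,
                    min_hits: int = 10) -> List[str]:
    """Keep pathways within the size bounds that have enough gene hits."""
    return [
        name for name, genes in pathways.items()
        if min_genes <= len(genes) <= max_genes
        and gene_hits.get(name, 0) >= min_hits
    ]


def q_values_pi0_one(p_values: Sequence[float]) -> List[float]:
    """Step-up q-values with the null proportion fixed at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A pathway's q-value estimates the false discovery rate incurred if that pathway and all pathways with smaller P-values are declared enriched.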

Results

Table 1 shows the number of large, rare CNVs (>500 kb) for each of the five previously published case–control samples and the total number of deletions and duplications. The rate and burden of CNVs for each sample have been published previously.9, 10, 11, 26, 37

Table 1 Burden and type of CNVs >500 kb in cases and controls from each study

Gene sets selected by location of schizophrenia and autism de novo SNVs

The genes containing non-synonymous schizophrenia de novo SNVs20 were significantly enriched for case ADHD CNV hits (P=5.4 × 10⁻⁴ for CNVs >500 kb, see Table 2), which is significant after Bonferroni correction. Findings remained significant even after the most significant single sample was removed from the analysis (Table 2). The significant enrichment was observed for duplications (P=5.6 × 10⁻⁴) but not for deletions (P=0.37). However, there was no significant difference between ADHD duplications and deletions in terms of the rate of hits for genes previously found to carry non-synonymous schizophrenia SNVs (P=0.142). For sample-specific enrichments, see Supplementary Table 1, and see Supplementary Table 2 for gene-wide P-values. Restricting analysis to genes hit by loss-of-function de novo schizophrenia SNVs showed no significant enrichment for case CNV hits.

Table 2 Enrichment of ADHD case CNV hits in genes containing schizophrenia and autism de novo SNVs (non-synonymous and loss-of-function)

Table 2 shows nominally significant enrichment of CNVs among autism de novo SNV genes (non-synonymous or loss-of-function), although this does not survive Bonferroni correction.

Gene sets selected by function

There was significant enrichment of ADHD case CNVs in FMRP targets (P=0.0018), which remained significant after Bonferroni correction (see Table 3 for details). The enrichments also remained significant when the most significant single sample was removed. Significant enrichment was observed for duplications (P=0.005) but not for deletions (P=0.18), although there was no significant difference between the rate of case duplication and deletion hits for FMRP target genes (P=0.247). See Supplementary Table 3 for sample-specific enrichments and Supplementary Table 4 for FMRP target genes that showed significant enrichment. There was no evidence of enrichment of ARC complex or NMDAR gene sets.

Table 3 Enrichment of FMRP, ARC and NMDAR gene sets for ADHD case CNV hits

Hypothesis-free analysis of all pathways

The most significantly enriched pathways are shown in Table 4, with those remaining significant in the sensitivity analysis highlighted in bold. As can be seen from the q-values, many of these were highly significant even after correction for multiple testing of pathways. Much of the enrichment appeared to come from duplications, with the exception of the ion channel pathways where enrichment was observed in both deletions and duplications but the strength of enrichment did not differ significantly for duplications and deletions (see Supplementary Table 5 for details). Ion channel pathways (ligand gated ion channel activity and ion gated channel activity), transmembrane transport and organonitrogen compound catabolic process were robust to sensitivity analysis (see Supplementary Table 6 for enrichment effect sizes and the number of gene hits for pathways listed in Table 4; see Supplementary Tables 7-9 for most significant ion channel pathway genes, organonitrogen compound catabolic process and carbohydrate derivative catabolic process genes and transmembrane transport genes).

Table 4 Top pathways in pooled meta-analysis with most significant enrichment for ADHD case CNV hits among CNVs >500 kb

Secondary analyses of CNVs >100 kb showed that the observed pathways were generally also significantly enriched in these analyses (results available from last author).

Discussion

In these analyses of ADHD case–control CNV data, the largest to date, we found highly significant enrichment of CNVs in many, although not all, hypothesised neurodevelopmental gene sets. Hypothesis-free testing revealed additional biological pathways enriched for CNVs in those with ADHD. Genes spanned by the ADHD CNVs were enriched for those that have recently been found to harbour schizophrenia-associated de novo SNVs. Previous work demonstrated overlap of ADHD CNVs with schizophrenia and autism CNVs;9, 10, 11 however, because we were now able to define our gene sets using SNVs, which impact on single genes rather than chromosomal regions encompassing multiple genes, the current findings more precisely suggest overlap at the level of genes. These findings therefore extend, refine and are independent of previous studies that suggest biological overlap across these clinically very different disorders.

The clinical co-morbidity and co-heritability of ADHD is much more strongly established with autism than with schizophrenia.38 Although previous studies, using some of the data sets in the present paper, found ADHD and autism overlap at the level of CNVs9, 10, 11 and CNV-associated biological pathways,39 we did not observe this extended to autism de novo SNV genes.

Targets of the FMRP have been implicated previously in both schizophrenia and autism.21, 22, 23 Complete expression failure of FMRP itself, which characterises Fragile X syndrome, is known to be associated with elevated rates of ADHD, autism and other neurodevelopmental disorders.40 Indeed, the majority of males with Fragile X syndrome show ADHD. Previous work has shown that the protein FMRP regulates activity of 842 biological targets and suggested that some of these likely underlie the manifestation of autistic features in Fragile X syndrome.30 Our findings show that genes encoding FMRP targets are also enriched in ADHD CNVs, which could help explain why children with Fragile X syndrome show such high rates of ADHD. The results suggest that FMRP-mediated biology may be relevant across multiple neuropsychiatric disorders that include ADHD, as well as autism and schizophrenia. Although a recent report suggests that what has been considered as the specific contribution of FMRP targets to the pathogenesis of autism might simply reflect involvement of long, highly brain-expressed genes,41 here large gene size is controlled for. Nevertheless, further work is needed to understand how FMR1 is involved in the pathogenesis of different neurodevelopmental phenotypes, that is, ADHD, autism and schizophrenia, each of which has very different phenotype features and treatments.

Although schizophrenia and autism genetic findings have all strongly implicated the involvement of synaptic functions in pathogenesis,20, 21, 23 we did not observe that to be the case for ADHD. Nor did we replicate prior pathway analysis implicating neurite outgrowth at synapses in ADHD, although previous analyses have used SNP or candidate gene data or smaller data sets.42, 43, 44, 45 We cannot be certain if this is a genuine point of difference or whether larger studies of ADHD, which consider additional types of mutations, for example, through exome sequencing, will yield a different pattern of findings.

The hypothesis-free analysis of all ADHD CNV pathways yielded strongly significant enrichment for multiple biological pathways. Many but not all of these were robust to analysis after excluding the most significant study. As expected for this final, hypothesis-free set of analyses, the significance of the enrichment was reduced by the removal of the most significant study, even when there was no significant difference in the strength of the enrichment between that study and the remainder of the sample, as was the case for most of the pathways highlighted here (see Supplementary Tables 1 for details). Thus, this analysis should be regarded as a sensitivity analysis testing the robustness of the pathway enrichment, rather than a measure of its overall significance. The fact that the enrichment strengths were not significantly different for most of the pathways of interest gives further evidence for consistency of enrichment across studies.

Ion channel pathways (ligand gated ion channel activity and ion gated channel activity) and especially involvement of CHRNA7 and associated pathways have been implicated in previous analyses of a subset of the present pooled analysis.10, 17, 39 Although some enrichment of immune-related pathways was also observed in the current study, this finding was not robust to the sensitivity analysis and thus requires cautious interpretation until more data become available. Regardless, the present study demonstrates that most findings were robust to sensitivity testing, whereby we excluded the most significant sample, despite potential variation in case mix severity and ascertainment across research centres and concerns from some quarters about the validity of ADHD.

Although our study involves analysis of the largest ADHD CNV data set to date, there are several limitations. First, analyses of this type for any disorder are restricted by the quality of gene and pathway annotations and the fact that the same genes belong to multiple different functional gene sets. Second, genotyped ADHD sample sizes have markedly lagged behind those of many other neurodevelopmental and psychiatric disorders and results of exome sequencing are awaited. Third, ADHD-associated CNVs do not capture all possible forms of genetic risk, or risk loci, although previous findings from smaller, individual studies do suggest biological convergence of CNVs and common gene variants.7, 17, 42, 44, 45 However, the biology and validity of ADHD remain poorly understood and it is still widely considered to be primarily a catecholaminergic disorder.43 Findings from the present study highlight the consistent and robust neurodevelopmental nature of ADHD and provide novel insights about its biological underpinnings.

In conclusion, in an international, cross-centre analysis of ADHD CNV data, we find evidence of biological overlap with schizophrenia at multiple levels; previous findings of overlap with schizophrenia CNVs have now been extended to SNVs. Furthermore, FMRP target enrichment appears to characterise ADHD as well as schizophrenia and autism. Ion channel pathway involvement in ADHD was robust to type of CNV and across samples. The findings reveal that CNVs in children with ADHD, from across multiple centres, converge on biologically meaningful gene clusters that are now robustly established as involved in neurodevelopmental disorder risk.