Introduction

Autism is characterized by impaired social interaction and communication, and by restricted/repetitive behavior typically presenting in early development. Autism affects neural development and information processing in the brain by altering how nerve cells and synapses connect and organize. Per the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), the autism spectrum encompasses previous diagnoses of autistic disorder, Asperger syndrome, pervasive developmental disorder not otherwise specified (PDD-NOS), and childhood disintegrative disorder. Autism has a strong genetic component based on very high heritability in families, although the genetics of autism are complex. It is unclear whether autism spectrum disorders (ASDs) are explained more by rare mutations, or by rare combinations of common genetic variants. Debate remains about the genetic architecture of autism, specifically the extent to which common variant–common disease and rare variant–common disease models describe the disorder [1].

Epidemiology

Autism is approximately four times more common in males than in females [2–4]. Additionally, there is a twofold increase in risk if an older sibling is affected by autism [5]. The prevalence increased from 0.05 % of individuals throughout the 1980s to 2 % of school-age children in the USA [6], in part due to a change in the practices of diagnosis and ascertainment. Evidence that autism concordance in monozygotic twins approaches 92 %, in contrast to 10 % in dizygotic twins, underlines a strong genetic component [7–9].

Treatments

Antidepressants, stimulants, and antipsychotics are commonly prescribed, but no known medication relieves autism's core symptoms of social and communication impairments. However, aripiprazole and risperidone are effective for treating irritability in children with autistic disorders [10]. With new genetic insights offered by genome-wide association studies (GWAS) of autism, we can envision new medication development [11].

Families and the educational system are currently the main resources for treatment. Intensive, sustained special education programs and behavior therapy early in life can help children acquire self-care, social, and job skills, and often improve function and decrease symptom severity and maladaptive behaviors [12]. Available approaches include applied behavior analysis (ABA), developmental models, structured teaching, speech and language therapy, social skills therapy, and occupational therapy [13].

Genomic Studies

Large-scale collections of samples have been genotyped, including the Autism Genetic Resource Exchange (AGRE), Simons Simplex Collection (SSC), Autism Genome Project (AGP), Autism Case–control (ACC), and Autism Center of Excellence (ACE).

Evidence from different cases supports the understanding that chromosomal mutation contributes to autism susceptibility [14]. Linkage studies and cytogenetic analysis have led to identification of several novel candidate genes, including neurexins (NRXNs) and neuroligins (NLGNs). Significant genomic linkages have been reported on 2q31, 3q, 7q22, and 7q34 [15, 16].

Several regions of interest, including 1p13.2, 1q31.1, 5p13, 8q24, 13q, 15q, 16p, 17q, 19p, and Xq have been implicated by multiple studies. The majority of these regions, except 1p13.2 and 2q32, have been reported with chromosomal mutations [16].

Several genome-wide studies have been conducted to identify the genetic variants associated with risk for autism. Copy number variations (CNVs), including several large recurrent deletions or duplications, have been found. The best established autism-associated CNVs include 7q11.23, 15q11–13, 16p11.2, and 22q11.2 loci, and NRXN1, CNTN4, NLGNs, and SHANK3 genes [17–19]. CNVs have proven a critical source of genetic burden contributing to the autism phenotype [18–22, 23••, 24, 25, 26••, 27–29].

Recently conducted exome sequencing studies suggest that hundreds of de novo mutations have some role in the development of autism, and clear evidence implicates a few specific genes (CHD8, KATANAL2, SCN2A, NTNG1) [30–33]. Together, these structural variants or de novo mutations, many of which are highly penetrant or protein-altering but individually rare, account for a limited proportion of the genetic risk for autism. In contrast, only a modest number of common variants have been reported at CDH9-CDH10, SEMA5A, and MACROD2 loci through GWAS.

Autism is marked with phenotypic heterogeneity and is etiologically multifactorial. The hypothesis that multiple common variants collectively, or through interaction with environmental factors, account for a certain proportion of risk for autism motivates the rationale for GWAS.

Major GWAS

Major GWAS of autism have been conducted (Tables 1 and 2).

Table 1 Genome-wide association studies of autism: studies
Table 2 Genome-wide association studies of autism: genome-wide significant findings

The first successful GWAS was by Wang et al. [34] and surveyed 780 families (3,101 subjects) with affected children, a second cohort of 1,204 affected subjects, and 6,491 control subjects of European ancestry. Six single nucleotide polymorphisms (SNPs) between cadherin 10 (CDH10) and cadherin 9 (CDH9), two genes encoding neuronal cell-adhesion molecules, revealed strong association signals, with the most significant SNP being rs4307059 (P = 3.4 × 10⁻⁸, odds ratio 1.19). Top hits replicated in two independent cohorts, with combined P values ranging from 7.4 × 10⁻⁸ to 2.1 × 10⁻¹⁰.

Later, Kerin et al. [35] used a tiling array within 100-kb linkage disequilibrium (LD) of the GWAS-identified peak and examined relevant expressed sequence tags (ESTs) and RNA. They identified only one functional element, a single non-coding RNA that was found to correspond to the moesin pseudogene 1 (MSNP1). The 3.9 kb RNA had 94 % sequence identity with the mature messenger RNA (mRNA) of the protein-coding gene MSN, located on the X chromosome. Interestingly, the noncoding RNA at 5p14.1 was encoded by the opposite (antisense) strand of MSNP1, and was named moesin pseudogene 1 antisense (MSNP1AS).

Second, Ma et al. [36] used a discovery dataset of 438 autistic Caucasian families and the Illumina Human 1M BeadChip. A total of 96 SNPs demonstrated strong association with autism risk (P < 0.0001). Validation of the top 96 SNPs was performed using an independent dataset of 487 autism families of European ancestry genotyped on the Illumina 550K BeadChip. The same region on chromosome 5p14.1 as reported by Wang et al. showed significance in both the discovery and the validation datasets. Joint analysis of all SNPs in this region identified eight SNPs having lower P-values (3.24 × 10⁻⁴ to 3.40 × 10⁻⁶) than in either dataset alone.

Third, Weiss et al. [37] analyzed 1,031 multiplex autism families, including 1,553 affected children, and identified regions of suggestive and significant linkage on chromosomes 6q27 and 20p13, respectively. Initial analysis did not yield genome-wide significant associations. Genotyping of top hits in additional families revealed an SNP on chromosome 5p15 (between SEMA5A and TAS2R1) that was significantly associated with autism (P = 2 × 10⁻⁷).

Fourth, Anney et al. [38] found that rs4141463, located within MACROD2, surpassed the genome-wide association significance threshold of P < 5 × 10⁻⁸. When a smaller replication sample was analyzed, the risk allele at rs4141463 was again over-transmitted. However, the effect size was much smaller and, combined, barely met the P < 5 × 10⁻⁸ threshold. Exploratory analyses of phenotypic subtypes yielded no significant associations after correction for multiple testing. Best signals were within KIAA0564, PLD5, POU6F2, ST8SIA2, and TAF1C.

Fifth, Cho et al. [39] analyzed 42 Korean ASD patients genotyped with Affymetrix SNP Array 5.0 and, using the transmission disequilibrium test and multifactor dimensionality reduction test, detected two candidate SNPs on chromosome 11, rs11212733 (P = 9.76 × 10⁻⁶) and rs7125479 (P = 1.48 × 10⁻⁴), as markers of language delay in ASD.
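The transmission disequilibrium test (TDT) used in family-based studies such as this one can be illustrated with a minimal sketch. It assumes the standard McNemar-type statistic for a biallelic marker, counted over heterozygous parents only; the function name and the transmission counts are hypothetical and are not taken from Cho et al.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return (chi-square, P) for the transmission disequilibrium test.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the candidate allele to an affected child (hypothetical).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt(transmitted=60, untransmitted=40)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Under the null hypothesis of no linkage or association, each heterozygous parent transmits either allele with probability one half, so an excess of transmissions of one allele to affected children is the evidence the test measures.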

Sixth, Anney et al. [40] found no single SNP showing significant association with ASD or selected phenotypes at a genome-wide level. The SNP that achieved the smallest P-value from secondary analyses was rs1718101, which falls in CNTNAP2, a gene previously implicated in susceptibility for ASD. rs1718101 also shows modest association with age of word/phrase acquisition in ASD subjects.

Seventh, Connolly et al. [41•] studied 2,165 participants (mean age 8.95 years) and examined associations between genomic loci and individual assessment items from the Autism Diagnostic Interview-Revised, Autism Diagnostic Observation Schedule, and Social Responsiveness Scale. Significant associations with a number of loci were identified, including KCND2 (P = 3.05 × 10⁻⁸, overly serious facial expressions), NOS2A (P = 8.12 × 10⁻⁷, loss of motor skills), and NELL1 (P = 2.91 × 10⁻⁷, faints, fits, or blackouts).

Eighth, Smoller et al. [42••] represents the Psychiatric Genomics Consortium, which analyzed multiple major psychiatric disorders including ASD, attention deficit–hyperactivity disorder, bipolar disorder, major depressive disorder, and schizophrenia. SNP data for the five disorders were analyzed in 33,332 cases and 27,888 controls of European ancestry. SNPs at four loci surpassed the cutoff for genome-wide significance (P < 5 × 10⁻⁸) in the primary analysis: regions on chromosomes 3p21 and 10q24, and SNPs within two L-type voltage-gated calcium channel subunits, CACNA1C and CACNB2.

In an earlier CNV association study of schizophrenia, CACNA1B (P = 8.68 × 10⁻⁴) and DOC2A (P = 5.69 × 10⁻⁶), both calcium-signaling genes responsible for neuronal excitation, were deleted in 16 cases and duplicated in ten cases, respectively [43].

Most recently, Xia et al. [44••] utilized two Chinese cohorts as discovery (n = 2,150) and three data sets of European ancestry populations for replication. Meta-analysis identified three SNPs, rs936938 (P = 4.49 × 10⁻⁸), non-synonymous rs6537835 (P = 3.26 × 10⁻⁸) and rs1877455 (P = 8.70 × 10⁻⁸); and related haplotypes, AMPD1-NRAS-CSDE1, TRIM33, and TRIM33-BCAS2, all associated with autism. All were mapped to a linkage region (1p13.2) previously reported as associated with autism. TRIM33 is an E3 ubiquitin ligase, further supporting the role of ubiquitin mutations in autism.
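To show how combining cohorts can strengthen a signal that is modest in each cohort alone, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis. This is a generic textbook approach and an assumption on our part, not necessarily the method used by Xia et al.; the function name and the per-cohort effect sizes and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-cohort log odds ratios with inverse-variance weights.

    betas: per-cohort log odds ratios (hypothetical inputs)
    ses:   matching standard errors
    Returns (pooled log odds ratio, pooled standard error, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Three hypothetical cohorts with similar, individually modest effects.
beta, se, p = fixed_effect_meta(betas=[0.18, 0.22, 0.20], ses=[0.06, 0.07, 0.05])
print(f"pooled OR = {math.exp(beta):.2f}, SE = {se:.3f}, P = {p:.2e}")
```

Because the pooled standard error shrinks as cohorts are added, consistent small effects can cross a stringent threshold only after combination, which is one reason large collaborations matter for autism GWAS.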

Scope of Autism GWAS and Future

Effect sizes for individual common variants are small. Although high odds ratios can be detected by GWAS, few greater than 1.5 have been found. Therefore, widespread epistasis likely exists between multiple common variants with small effect sizes. So-called unaffected relatives may have a sub-threshold genetic burden of variants; thus, we need better classification of endophenotypes and intermediate phenotypes. Epistatic interactions are important biologically but computationally intractable with current methods.
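For readers less familiar with the effect-size scale discussed here, a minimal sketch of an allelic odds ratio with a 95 % Wald confidence interval follows. The function name and the allele counts are hypothetical and chosen only to give an odds ratio in the range typical of common-variant GWAS hits.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio and 95 % Wald confidence interval from a 2 x 2 allele table.

    case_risk / case_other: risk and non-risk allele counts in cases
    ctrl_risk / ctrl_other: risk and non-risk allele counts in controls
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts for 1,000 cases and 1,000 controls.
or_, lower, upper = allelic_odds_ratio(1300, 700, 1200, 800)
print(f"OR = {or_:.2f} (95 % CI {lower:.2f} to {upper:.2f})")
```

With odds ratios this close to 1, the confidence interval excludes 1 only when allele counts are large, which is consistent with the sample sizes of the studies summarized above.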

Postsynaptic components of excitatory synapses and L-type calcium channels have been major biological categories emerging from neuropsychiatric GWAS. Specifically, NRXN, CNTN, GRM, ubiquitin, and SLIT gene members show multiple gene family members with variants associated with autism and related neurodevelopmental disorders.

Larger population samples are required for dissecting the genetics of psychiatric disease compared with other common diseases, most likely due to the complex genetic architecture of psychiatric disorders. Genetic risk is the culmination of many different genes, within individuals and across families, and both rare and common variants working in concert, making psychiatric disorders polygenic.

Progress is being achieved in the GWAS of autism due to improved understanding of the human genome sequence and the tremendous variation among individuals. Functional variants abound in the genome and provide the mode for polygenic inheritance. Technical innovation of genomic assays is also accelerating progress. The large-scale sequencing effort of the 1000 Genomes Project informs new content for SNP microarrays, which can be affordably genotyped across large populations to gain necessary statistical power to meet the multiple testing correction bar of 5 × 10⁻⁸. The large-scale worldwide collaborations are vital to enable sharing of data to allow for meta-analysis and mega-analysis.
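The multiple testing bar can be made concrete with a short sketch. It assumes the conventional rationale that a genome-wide scan corresponds to roughly one million independent common-variant tests, so that a Bonferroni correction of a 0.05 family-wise error rate gives 5 × 10⁻⁸; the function name is hypothetical, and the P-values are those quoted earlier in this review.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: about one million independent common-variant tests genome-wide.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)

# P-values quoted earlier in this review.
reported = {
    "rs4307059 (Wang et al.)": 3.4e-8,
    "5p15 SNP (Weiss et al.)": 2e-7,
    "rs11212733 (Cho et al.)": 9.76e-6,
}
passing = [name for name, p in reported.items() if p < threshold]

print(f"threshold = {threshold:.1e}")
print("genome-wide significant:", passing)
```

Under this assumption only the first of the three signals clears the bar, which illustrates why sample size, and hence collaboration, is the limiting factor for common-variant discovery.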

Many common variants identified by GWAS have modest effects. Specifically, they act as genetic risk factors within a genetic background of other variants, increasing risk or protection against autism, and acting in a range of magnitudes on the protein sequence or the expression levels [45]. Klei et al. [46] used simulations to examine the additive model of autism susceptibility, whereby autism is underpinned by a large number of variants that cumulatively increase ASD risk, but individually have very small effects. Their model suggests that, in multiplex families, approximately 60 % of susceptibility is attributable to additive effects, whereas, in simplex families, the proportion is closer to 40 %.
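The additive idea can be sketched as a toy simulation in which each individual's score is the sum of many small per-allele effects. This is not the model of Klei et al.; the function names, number of variants, allele frequency, and effect size are all hypothetical and chosen only to show how small effects accumulate into a continuous distribution of burden.

```python
import random


def additive_score(genotypes, effects):
    """Sum of risk-allele counts (0, 1, or 2) weighted by per-variant effect."""
    return sum(g * b for g, b in zip(genotypes, effects))


def simulate_scores(n_people, n_variants, freq, effect, seed=0):
    """Draw genotypes at independent variants and return each person's score."""
    rng = random.Random(seed)
    effects = [effect] * n_variants
    scores = []
    for _ in range(n_people):
        # Two independent allele draws per variant give a count of 0, 1, or 2.
        genotypes = [
            (rng.random() < freq) + (rng.random() < freq)
            for _ in range(n_variants)
        ]
        scores.append(additive_score(genotypes, effects))
    return scores


# Hypothetical setting: 200 variants, risk-allele frequency 0.3, effect 0.05 each.
scores = simulate_scores(n_people=500, n_variants=200, freq=0.3, effect=0.05)
mean_score = sum(scores) / len(scores)
print(f"mean = {mean_score:.2f}, min = {min(scores):.2f}, max = {max(scores):.2f}")
```

In a liability-threshold reading of such a model, individuals in the upper tail of the score distribution would be at highest risk, while relatives just below the threshold would carry a sub-threshold burden.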

We can learn some lessons from syndromic forms of autism, but large-effect single variants may radically disrupt brain development and function, such that the psychiatric phenotype is obfuscated. Risk variants typically represent milder, tissue-specific alterations in expression rather than the knockouts seen in severe syndromes.

Ubiquitin

TRIM33, an E3 ubiquitin ligase, is the latest significant GWAS hit in autism. Mutations in ubiquitin E3 ligases are reported as associated with other neuropsychiatric disorders, such as Angelman syndrome, which overlaps phenotypically with autism (UBE3A) [47]; Charcot–Marie–Tooth disease (LRSAM1) [48]; juvenile-onset Parkinson’s disease (PARK2) [49]; and X-linked mental retardation (CUL4B) [50, 51]. Also, CNVs affecting ubiquitin pathways that are not found in controls, including UBE3A, PARK2, RFWD2, and FBXO40 [18], are enriched in autism patients. The SNP in TRIM33, rs6537825, which displays the strongest association with autism, is nonsynonymous, causes an isoleucine-to-threonine substitution, and has a significant cis-acting regulatory effect on TRIM33 expression in the brain. Functional polymorphisms may slightly alter the function of the ubiquitin-proteasome system, decreasing its efficiency. TRIM33 was differentially expressed in human brains between autism patients and normal subjects, and was reported as one of the top loci associated with differential expression across the whole genome in an independent sample of post-mortem human brains [52]. TRIM32/ASTN2 [53] was found disrupted in males with autism and associated conditions.

Each human genome contains its own unique combination of alleles that gives rise to a unique, finely interwoven phenotype. There are limits to how far isogenic backgrounds and controlled environments can yield clear, specific molecular and cellular phenotypes for particular alleles. The technology of precision genome editing within cell lines or model organisms also needs development in order to assess precise variant effects.

The tagging SNPs associated in GWAS implicate a haplotype, but not the specific alleles therein that give rise to the phenotype of autism. Low-frequency alleles may segregate on the haplotype, or multiple genetic effects may exist in the same locus.

GWAS signals typically reside in tissue-specific enhancers distal from the protein-altering region. Sometimes, the effect is not a dysfunctional protein product, but rather a quantitative variation in expression or in the cell type-specific expression pattern. Noncoding regions show less evolutionary conservation, complicating their detection. Advances in epigenetics are also helping the scientific community understand the chromatin states in diverse cell types.

Rare Variants

Rare variants are likely very important, as the alleles that strongly predispose individuals to autism are likely kept at rare frequencies by purifying selection. Each genome carries on the order of thousands of variants that alter or terminate protein sequence, complicating assignment of causality to particular variants. Rare variants may have only partial penetrance for the autism phenotype, resulting in subclinical impact on intelligence quotient (IQ) or cognition, and contaminating control populations that may not be as carefully screened as case populations for GWAS [54•].

De novo variants also play a strong role, although they are exceedingly rare to find recurrently and must be placed in context of the gene-based mutation rate.

Common and rare alleles from autism GWAS have limited penetrance, act in concert with multiple other alleles in a genetic background, and are pleiotropic by contributing to multiple phenotypes.

Conclusions

We need to progress from narrow pathophysiological hypotheses drawn from small, disconnected pieces of literature toward a more holistic and systematic view of the information, in order to derive insights into neurobiological systems [55•].

We seek common biological themes and pathways to converge on successful therapeutic targets (Fig. 1) [56]. AutDB (gene.sfari.org/autdb/) keeps a database of CNV reports associated with autism. Major recent CNV GWAS are listed in Table 3.

Fig. 1

Results of genome-wide association studies (GWAS) of autism. (a) STRING [57] protein–protein interaction network of genome-wide significant GWAS genes; (b) the complex puzzle of autism is partly assembled based on GWAS; (c) count of copy number variation (CNV) associations made at each genomic locus. Red indicates deletions, blue indicates duplications, and black indicates both deletions and duplications

Table 3 Major copy number variation genome-wide association studies of autism in the last 3 years

The unbiased genome-wide search for genetic variants predisposing individuals to autism still remains the mainstay approach to uncover relevant biology. However, we must embrace the reality that hundreds of modest, common, and rare/strong impact/partially penetrant variants abound in each genome.