Autism spectrum disorders (ASDs) are a group of neuropsychiatric disorders that include autism, pervasive developmental disorder not otherwise specified (PDD-NOS), and Asperger's syndrome [1]. First described in 1943, their diagnostic features continue to evolve based on an expanding clinical and biological understanding [2]. A child is diagnosed with an ASD if he or she shows early childhood impairments in three areas: social communication and interaction, involving social reciprocity, non-verbal communication, and maintenance of relationships; language development, such as delay of language onset and maintenance of conversation; and restrictive and repetitive behaviors, including in speech, motor movements, routines, and interests [3]. Classic autism, formally known as autistic disorder, is the most severe of the ASDs, with patients showing impairments in all three domains (social interaction, communication, and restrictive and repetitive behavior) before the age of three. Additional features that are often comorbid with ASDs include sensory and motor abnormalities, attention deficit hyperactivity disorder (ADHD), epilepsy, and developmental regression [4, 5]. People with ASDs range from being intellectually disabled to having above-average intelligence [6]. ASDs are common, and males are affected more often than females, especially in high-functioning cases, including what is currently termed Asperger's syndrome. It is currently estimated that one in 88 children has an ASD, a 78% increase over the past 6 years [7]. This sharp increase is most likely due to sociocultural rather than biological factors, such as changes in age at diagnosis, changing diagnostic criteria, and broader inclusion rates, although a contribution from genetic and environmental factors cannot be ruled out [8-11].
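The prevalence figures above can be checked with simple arithmetic. The sketch below assumes that the 78% figure is a relative increase in prevalence over the period; under that assumption, the implied earlier estimate is roughly 1 in 157 children.

```python
# Back-of-the-envelope check of the prevalence figures quoted above.
# Assumption: "78% increase" means prevalence rose by a factor of 1.78.
current_prevalence = 1 / 88     # one in 88 children
relative_increase = 0.78        # 78% increase over ~6 years

# Earlier prevalence implied by the relative increase.
earlier_prevalence = current_prevalence / (1 + relative_increase)

print(f"current: {current_prevalence:.2%}, about 1 in {1 / current_prevalence:.0f}")
print(f"implied earlier: {earlier_prevalence:.2%}, about 1 in {1 / earlier_prevalence:.0f}")
```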

ASDs have a large genetic component. Concordance rates among monozygotic twins, dizygotic twins, and siblings are 50-90%, 0-30%, and 3-26%, respectively, supporting a major genetic contribution [12-14]. Interestingly, the risk of ASD in second-born male siblings is threefold that in second-born females, supporting models of reduced penetrance in females [14, 15]. Moreover, a recent study found roughly twofold greater ASD concordance among full siblings than among half siblings, further supporting a genetic contribution and a heritability of greater than 50% [16]. Multiple converging research strategies aimed at explaining ASD genetic liability have identified a variety of genetic causes that together account for roughly 20% of ASD cases. These include copy number variation (CNV; duplicated or deleted regions of the genome greater than 1 kb [17]), syndromic forms of autism (ASD that occurs within a defined syndrome, such as fragile X syndrome), and single gene and metabolic disorders [18, 19]. Recent studies based on CNV and single nucleotide variant (SNV) data put the number of ASD-implicated genes at between 200 and 1,000 [20-25], and multiple modes of inheritance have been proposed [26-28]. In addition, many ASD-implicated genes are also associated with other neuropsychiatric disorders, including schizophrenia, ADHD, epilepsy, and intellectual disability [22, 29-40], and none is specific to autism, suggesting that additional modifying factors determine the clinical outcome of disrupting a specific gene.
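Twin comparisons like these are classically converted into a heritability estimate with Falconer's formula, h² = 2(r_MZ - r_DZ). The sketch below is illustrative only: Falconer's formula takes trait (or liability-scale) correlations, not the raw concordance rates quoted above, and the correlations used here are hypothetical values rather than estimates from the cited studies. It also assumes an additive model in which MZ and DZ pairs share their environments equally.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Split trait variance into additive genetic (A), shared environment (C),
    and unique environment (E) components from twin correlations.

    Assumes an additive model: MZ pairs share all segregating alleles,
    DZ pairs share half on average, and both share environments equally.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared-environment share
    e2 = 1.0 - r_mz            # unique environment plus measurement error
    return {"A": h2, "C": c2, "E": e2}


# Hypothetical liability-scale correlations, for illustration only.
print(falconer_ace(r_mz=0.8, r_dz=0.45))
```

The three components sum to 1 by construction; a negative C estimate is a common sign that the additive assumptions do not hold.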

The genetic complexity of ASDs mirrors their phenotypic complexity. The core domains of ASD phenotypes (social, language, and restrictive and repetitive behavior) also exist as a spectrum, with a distribution that overlaps extreme forms of normal behavior [41]. These subclasses of impairment, or 'endophenotypes', are also observed to some degree in unaffected family members, but below the threshold for clinical diagnosis [42].

Here, we first provide an overview of our most recent understanding of the genetics of ASDs and then highlight convergent pathways and biological mechanisms emerging from gene finding and expression studies. The areas in which molecular mechanisms intersect have great potential to guide future genetic discoveries and to aid in therapeutic design.

The current state of autism genetics

ASD-associated variants have been identified over the past three decades using various techniques; recently, next-generation sequencing of large cohorts has ushered in a wave of gene discovery that has greatly enhanced our understanding of the inheritance of ASDs. Previous work involved cataloging ASD-associated major gene disorders, such as fragile X syndrome and tuberous sclerosis [43, 44]; cytogenetic analysis, which identified large structural genomic rearrangements; and genetic linkage studies [45]. Over the past several years, genome-wide association studies (GWAS) have revealed a handful of common alleles of modest effect size likely to contribute to ASD [46-48]. Analysis of CNV has additionally implicated rare genomic structural changes, both de novo and inherited, of large effect size [20, 21, 49-52]. Most recently, exome sequencing has lent insight into the contribution of de novo SNVs [22-25]. In this section we review the major studies that have identified both common variants (CVs) and rare variants (RVs) associated with ASDs and discuss models for how these variants may contribute to ASD pathology.

The contribution of common alleles versus rare alleles

The contribution of both common and rare alleles to ASD has been assessed using GWAS and CNV/exome sequencing studies. Given that ASD is highly prevalent, it was initially thought (consistent with the prevailing common variant-common disease model [53]) that common genetic single nucleotide polymorphism (SNP) variants (those occurring in at least 5% [54] of the population) would lead to this common disorder.

An alternative model is that RVs with moderate to large effect size lead to ASD (the rare variant-common disease model [55]). This is supported by mathematical modeling based on recurrence in multiplex families, which posits a relatively large contribution from spontaneous de novo mutations with lower penetrance in females [15]. The contribution of RVs has been tested by measuring the frequency of rare CNVs and SNVs in cases and controls and is emerging as an exciting area in ASD genetics. Both types of study have been aided by the availability of large cohorts of ASD and control participants, specifically the Autism Genetic Resource Exchange (AGRE), Simons Simplex Collection (SSC), Autism Center of Excellence (ACE), and the Autism Genome Project (AGP). Findings from these studies, outlined in Tables 1, 2, and 3, are discussed below.

Table 1 Whole-exome gene finding sequencing studies that reveal common and rare variants associated with ASD
Table 2 Large-scale CNV studies that reveal common and rare variants associated with ASD
Table 3 Large-scale GWAS that reveal common and rare variants associated with ASD

Three large-scale GWAS have been conducted so far [46-48] that are adequately powered to detect CVs of modest effect size (Table 3). Only two variants reached genome-wide significance: an intergenic variant, rs4307059, between cadherin 9 (CDH9) and cadherin 10 (CDH10) [46] and rs4141463 in an intronic region of the MACRO domain containing 2 (MACROD2) gene [48]. An additional intergenic variant, rs10513025, between SEMA5A and TAS2R1, had a p-value suggestive of genome-wide significance (p = 2.1 × 10⁻⁷) [47].
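The conventional genome-wide significance threshold of 5 × 10⁻⁸ corresponds to a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests. The sketch below classifies a p-value against that threshold; the 1 × 10⁻⁵ "suggestive" cutoff is a widely used convention rather than a fixed standard, and the function name is hypothetical.

```python
from math import log10

# Conventional genome-wide threshold: Bonferroni correction of alpha = 0.05
# for roughly one million independent common-variant tests.
GENOME_WIDE = 0.05 / 1_000_000
# A widely used, but arbitrary, "suggestive" cutoff (assumption).
SUGGESTIVE = 1e-5


def classify_gwas_p(p_value: float) -> str:
    """Label a single-SNP association p-value against the two cutoffs."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


p = 2.1e-7  # the SEMA5A-TAS2R1 intergenic variant discussed above
print(classify_gwas_p(p), f"-log10(p) = {-log10(p):.2f}",
      f"({p / GENOME_WIDE:.1f} times the genome-wide threshold)")
```

For this variant, the p-value is about 4.2 times larger than the genome-wide threshold, which is why it is described as suggestive rather than significant.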

What conclusions can be drawn from these GWAS? First, the effect size of any single CV is rather small: the studies had the power to detect odds ratios (ORs) greater than 1.5 but found no such variants. This suggests widespread epistasis, that multiple CVs of small effect size are needed for disease, or, alternatively, that the role of CVs is limited (Figure 1). Second, using unaffected relatives as controls, who under some models may harbor a sub-threshold genetic load of associated variants, would decrease the association signal. Studies of endophenotypes or intermediate phenotypes are one strategy that may help in this regard [29]. Third, epistatic interactions among combinations of CVs, rather than single variants, may confer disease risk, prompting the need for bioinformatic tools capable of testing combinatorial models. In sum, GWAS have not provided evidence that single CVs of modest to large effect contribute significantly to ASD risk. At the same time, however, the cohorts tested have been relatively small compared with the tens of thousands of patients tested for other common diseases [56, 57].
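The point about power can be made concrete with a standard normal approximation for a two-sided allelic test at the genome-wide threshold. In the sketch below, the cohort sizes, risk-allele frequency, and odds ratios are hypothetical, and the function name is illustrative; dedicated power calculators also account for genotype models, linkage disequilibrium with the causal variant, and unequal case/control ratios.

```python
from statistics import NormalDist


def allelic_test_power(n_cases: int, n_controls: int, control_freq: float,
                       odds_ratio: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic case-control test.

    Normal approximation to the difference in risk-allele frequency between
    cases and controls; each person contributes two alleles.
    """
    if not 0.0 < control_freq < 1.0 or odds_ratio <= 0.0:
        raise ValueError("need 0 < control_freq < 1 and odds_ratio > 0")
    # Case allele frequency implied by the allelic odds ratio.
    control_odds = control_freq / (1.0 - control_freq)
    case_freq = odds_ratio * control_odds / (1.0 + odds_ratio * control_odds)
    se = (case_freq * (1.0 - case_freq) / (2 * n_cases)
          + control_freq * (1.0 - control_freq) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(case_freq - control_freq) / se
    # The chance of significance in the wrong direction is negligible and ignored.
    return NormalDist().cdf(z_effect - z_crit)


for n in (1_000, 3_000, 10_000, 30_000):
    powers = [round(allelic_test_power(n, n, 0.3, orr), 3) for orr in (1.1, 1.2, 1.5)]
    print(f"{n:>6} cases and controls: power at OR 1.1, 1.2, 1.5 = {powers}")
```

Under these assumptions, ORs of about 1.5 are detectable with a few thousand cases, whereas ORs near 1.1 require cohorts in the tens of thousands.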

Figure 1

Genetic models of ASD risk. Schematic representations of Mendelian and polygenic models of ASD risk are depicted, with evidence for and against each model listed below. In the diagram at the top, the rows represent the type of individual: those with ASD, and those with some risk factors but not enough to manifest the clinical syndrome, such as unaffected relatives. The columns represent the basic categories of genetic models under consideration. The size of each variant symbol represents its effect size, with larger symbols indicating larger effects. For simplicity, these models are presented as distinct categories, whereas in reality ASD risk is likely to be represented by a more continuous distribution of risk architecture. A single asterisk indicates that there is evidence to suggest that de novo CNVs in unaffected controls are smaller [21, 51, 52] and less gene-rich [20, 21] than in people with ASD. A double asterisk indicates that there is conflicting evidence for increased oligogenic heterozygosity [25, 156].

This has led many to favor a model in which RVs (either CNVs or rare SNVs) of moderate to large effect explain a large proportion of ASD heritability [15]. Over the past 5 years, six major studies have conducted refined screens of the genome to identify rare CNVs, both inherited and de novo, in ASD participants and matched controls (Table 2). These studies have shed light on the contribution of rare CNVs to ASD pathophysiology, and several themes have emerged. First, in all five studies that examined inherited CNVs, inherited CNVs were as prevalent in individuals with ASD as in controls [20, 21, 50, 51]. Although one study reports a 1.19-fold higher number of CNVs (de novo and inherited) in cases than in controls, this signal is driven by rare de novo CNVs, as removing these CNVs from the analysis results in an equal distribution of CNVs between cases and controls [52]. Second, the emerging consensus from multiple studies is that larger CNVs, containing more genes, are observed in probands than in controls [20, 21, 50, 51]. Third, these studies do not consistently find that simplex families (those with only one member with an ASD) harbor many more large de novo mutations than multiplex families (those with more than one). For example, whereas two studies report a higher frequency of de novo events in simplex than in multiplex families (10% simplex versus 3% multiplex [49] and 7% simplex versus 2% multiplex [51]), another reports an even distribution of de novo events across the two types of families (5.6% simplex versus 5.5% multiplex [52]). Lastly, many CNVs are multigenic, especially in the genomes of people with ASD, making it difficult to determine the putative causative gene. Determination of the pathogenicity of specific genes or pathways may be aided by modeling in animals [58], intersection with other functional data such as gene expression [59], and systems biology approaches, as discussed below.
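The difficulty of pinning down a causative gene within a multigenic CNV follows directly from interval overlap: the larger the CNV, the more genes it intersects. The sketch below uses half-open genomic intervals; the gene names and coordinates are hypothetical placeholders, not real annotations, and the 1 kb minimum follows the CNV definition given earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A half-open genomic interval [start, end) on one chromosome."""
    name: str
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_in_cnv(cnv: Interval, genes: list[Interval], min_cnv_bp: int = 1_000) -> list[str]:
    """Names of genes overlapped by a CNV; regions of 1 kb or less are rejected."""
    if cnv.length <= min_cnv_bp:
        raise ValueError("region is too small to count as a CNV")
    return [g.name for g in genes if overlaps(cnv, g)]


# Hypothetical gene models on a single chromosome.
GENES = [
    Interval("GENE_A", "chr1", 10_000, 25_000),
    Interval("GENE_B", "chr1", 40_000, 90_000),
    Interval("GENE_C", "chr1", 120_000, 160_000),
    Interval("GENE_D", "chr1", 200_000, 260_000),
]

small_cnv = Interval("del_1", "chr1", 50_000, 80_000)    # 30 kb, inside one gene
large_cnv = Interval("dup_1", "chr1", 20_000, 210_000)   # 190 kb, spans several genes

print(genes_in_cnv(small_cnv, GENES))
print(genes_in_cnv(large_cnv, GENES))
```

A real analysis would intersect CNV calls with a full gene annotation (for example with an interval tree or a tool such as BEDTools) rather than scanning a list.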
In any case, these large-scale CNV studies have generated the following list of intriguing ASD candidate genes disrupted by rare de novo CNVs in ASD participants: A2BP1, ANKRD11, C16orf72, CDH13, CDH18, DDX53, DLGAP2 [51, 52], DPP6, DPYD, FHIT, FLJ16237, NLGN4, NRXN1, SHANK2, SHANK3, SLC4A10, SYNGAP1, and USP7 [20, 21] (Table 2).

Advances in next-generation sequencing now enable the most powerful approach to finding de novo RVs. Four independent groups have recently conducted whole-exome sequencing projects using non-overlapping samples [22-25] (Table 1). Strikingly, across all four studies, the frequency of de novo mutation was equal between ASD and control participants. Another commonality across studies was the correlation between older paternal age and an increased number of de novo point mutations, which could help explain the paternal-age-dependent risk for ASD [60-63]. In addition, two studies report an increase in gene-disrupting SNVs in ASD individuals versus unaffected siblings, although the overall SNV mutation rate is equal between probands and siblings [23, 25]. In one study [25], ASD individuals carried significantly more non-synonymous and nonsense de novo SNVs than their unaffected siblings, both across all genes (OR 1.93 for all non-synonymous versus silent SNVs; OR 4.03 for nonsense/splice-site versus silent SNVs) and in brain-expressed genes only (OR 2.22 for all non-synonymous versus silent SNVs; OR 5.65 for nonsense/splice-site versus silent SNVs), whereas silent SNVs showed an equal mutation rate in cases and controls. The other study [23] reported a twofold higher number of frame-shift, splice-site, and nonsense de novo mutations in cases than in controls, although de novo missense mutations were equally distributed. By combining genes that harbor frame-shift, splice-site, or nonsense de novo variants in cases across all four studies [22-25], five high-priority genes were identified, each disrupted in two independent probands: DYRK1A, POGZ, SCN2A, KATNAL2, and CHD8 (Table 1). There are several interesting lessons from these studies, including the utility of having data from other family members, which can help prioritize variants.
One example is that the Wnt/β-catenin signaling pathway was implicated in one study [22], but another that included a larger cohort of unaffected siblings [25] found that this pathway was over-represented in the unaffected siblings. These data suggest that more detailed pathway analysis is needed to understand the precise balance of signaling in this complex pathway [64] and its relationship to disease.
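The ORs reported for study [25] compare the ratio of damaging to silent de novo SNVs in probands with the same ratio in unaffected siblings; silent mutations act as an internal control for the underlying mutation rate. The sketch below computes such a ratio-of-ratios OR with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical and are not taken from the cited studies.

```python
from math import exp, log, sqrt


def class_vs_silent_or(case_class: int, case_silent: int,
                       sib_class: int, sib_silent: int, z: float = 1.96):
    """Odds ratio for a mutation class (e.g. nonsense/splice-site) relative to
    silent de novo SNVs, probands versus unaffected siblings.

    Returns (OR, lower, upper) using Woolf's log-scale standard error.
    """
    counts = (case_class, case_silent, sib_class, sib_silent)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive for the Woolf interval")
    odds_ratio = (case_class / case_silent) / (sib_class / sib_silent)
    se_log = sqrt(sum(1 / c for c in counts))
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 nonsense/splice-site and 50 silent de novo SNVs in
# probands; 5 and 50 in siblings.
print(class_vs_silent_or(20, 50, 5, 50))
```

Small counts in the sibling group dominate the width of the interval, which is one reason pooling across studies matters for rare de novo events.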

The study of RVs as ASD risk factors poses some challenges. Rarity does not indicate pathogenicity: rare events are seen in controls as well as in ASD participants, and inherited CNVs, by nature, will be present in the transmitting unaffected parent. In addition, a variant may be so rare as to be unique in the sample sizes currently being studied, making causation difficult to establish and increasing the number of false negatives. Given these challenges, it is hard to determine which RVs are risk factors, which modulate risk, and which are unrelated to phenotype. The rarity of these events may preclude the use of traditional statistical techniques, which require much larger samples to demonstrate statistical association with disease [65]. Some reasonable statistical solutions are being developed [25].
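To see why rarity undermines conventional association testing, consider a one-sided Fisher's exact test, which gives the probability, under random allocation of carriers between groups, of observing at least as many carriers among cases. The sketch below implements it directly from the hypergeometric distribution; the cohort sizes and carrier counts are hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact p-value for excess carriers among cases.

    With table margins fixed, the number of carriers falling among cases is
    hypergeometric; the p-value is P(X >= observed case carriers).
    """
    if not (0 <= case_carriers <= n_cases and 0 <= control_carriers <= n_controls):
        raise ValueError("carrier counts must fit within group sizes")
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return tail / comb(total, carriers)


# Hypothetical: a variant seen only in cases, in cohorts of 1,000 each.
for seen_in_cases in (2, 5):
    p = fisher_one_sided(seen_in_cases, 1_000, 0, 1_000)
    print(f"{seen_in_cases} case carriers, 0 control carriers: p = {p:.3f}")
```

Under these assumptions, a variant seen in 2 of 1,000 cases and no controls gives p of about 0.25, and even 5 case-only carriers give p of about 0.03, far from any genome-wide threshold.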

One approach to elucidate the intersection of large candidate gene lists is to use systems biology techniques to incorporate our knowledge of protein interactomes. Towards this end, one group conducted network-based analysis of genetic associations (NETBAG) from a list of genes found to harbor de novo CNVs in individuals with ASD [20] and found a preponderance of network genes involved in neuronal motility, targeting of axons, and synapse development [66]. In addition, exome sequencing studies have found that proteins encoded by genes harboring de novo missense or nonsense mutations have a significantly enriched number of protein interactions [24] and form protein networks enriched for ASD candidate proteins that have specific molecular functions [22]. Another approach is to integrate genetic data with gene expression to identify CNVs that perturb gene expression, thus validating a functional effect. Such a study recently demonstrated the power of this method and identified several new potential ASD risk CNVs [59]. To fully understand the wealth of genomics data currently being generated, we will need both appropriate statistical techniques and bioinformatics approaches to identify significant points of convergence among candidate genes.
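A simple way to ask whether candidate genes are more interconnected than expected by chance is a permutation test on a protein interaction network: count interactions within the candidate set, then compare with random gene sets of the same size. This is a deliberately simplified stand-in, not the NETBAG method itself, which is considerably more elaborate; real analyses also need to control for node degree. The toy network and protein names below are hypothetical.

```python
import random


def within_set_edges(genes, edges):
    """Number of interactions whose two partners are both in `genes`."""
    members = set(genes)
    return sum(1 for a, b in edges if a in members and b in members)


def connectivity_permutation_test(candidates, edges, n_perm=10_000, seed=0):
    """Empirical one-sided p-value for within-set connectivity.

    Null: the candidate set is a random sample (same size) of network nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    rng = random.Random(seed)
    observed = within_set_edges(candidates, edges)
    k = len(candidates)
    as_extreme = sum(
        within_set_edges(rng.sample(nodes, k), edges) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical network: P1-P4 form a fully connected cluster; P4-P12 form a chain.
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P2", "P4"), ("P3", "P4")]
EDGES += [(f"P{i}", f"P{i + 1}") for i in range(4, 12)]

print(connectivity_permutation_test(["P1", "P2", "P3", "P4"], EDGES))
```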

Integrating genetic findings into a picture of ASD genetic architecture

How do these findings inform our genetic models of disease? Several models have been put forth to explain the inheritance of ASDs. We discuss here the 'major effect model' and several polygenic models: a combination of CVs, a major effect RV in a background of CVs, a combination of RVs and CVs, and an oligogenic 'two hit' model (Figure 1). None of these models is absolute, and we expect that a wide range of genetic models will explain ASD across individuals [41].

The major effect model proposes that one major insult to the genome is sufficient to cause the disorder. This scenario is supported by the observation that disruptions of single genes can lead to ASD in an apparently Mendelian manner with reduced penetrance, as is seen in several syndromic forms of ASD. For example, mutations in FMR1 (fragile X syndrome [43]), MECP2 (Rett syndrome [67]), TSC1 and TSC2 (tuberous sclerosis [67]), CNTNAP2 (cortical dysplasia-focal epilepsy syndrome [68]), DHCR7 (Smith-Lemli-Opitz syndrome [69]), CACNA1C (Timothy syndrome [70]), and PTEN [71] all result in syndromes with phenotypes overlapping those of ASDs [17]. However, each of these syndromes shows incomplete penetrance for ASD and variable expressivity. For example, 10% of people with FMR1 mutations do not show any ASD phenotype [23], and those who do express a wide range of phenotypes, with no more than 30% crossing the threshold for clinical diagnosis of ASD [72]. This incomplete penetrance and variable expressivity suggest that additional genetic, epigenetic, and environmental factors modulate the presence of ASD in someone with a major genetic disruption [41]. Such highly variable expressivity is not unexpected even with major effect alleles, as it is frequently observed in dominantly inherited neurologic diseases, including a wide range of neurodegenerative diseases [73]. Additional examples of 'major hits' come from early cytogenetic studies, such as maternally inherited duplications of 15q11-15q13, deletions of 22q13 and 2q37, and disruptions in 5p15, 17p11, and Xp22 [74].

An alternative to the major effect model is the polygenic model, in which various combinations of genetic variants in an individual lead to disease. Here, we highlight four non-exclusive polygenic models to illustrate the range of likely possibilities (Figure 1). In the first model, ASD results from a combination of CVs that exceed a tolerance threshold. In this model, relatives of ASD participants carry a subclinical genetic load of ASD-associated CVs. Evidence to support this model is that ASD endophenotypes are sometimes observed in relatives, suggesting that subsets of CV combinations are sufficient for endophenotypes [17]. In addition, several ASD endophenotypes have a normal distribution in the population, which would be predicted by multiple contributory factors of modest to low effect [41].
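The first polygenic model corresponds to a classical liability-threshold model. The sketch below simulates it under hypothetical assumptions (100 biallelic loci, each risk allele adding the same small effect, Gaussian non-genetic noise, and a threshold set so that about 1 in 88 individuals is affected). Liability is then approximately normally distributed, and affected individuals are simply those in the upper tail, carrying on average a higher polygenic load. All parameter values and the function name are illustrative.

```python
import random
from statistics import mean


def simulate_liability_threshold(n_people=2000, n_loci=100, risk_freq=0.3,
                                 effect=0.15, prevalence=1 / 88, seed=1):
    """Simulate an additive liability-threshold model.

    Each person's liability is the sum of many small, equal-sized risk-allele
    effects plus Gaussian non-genetic noise; the top `prevalence` fraction of
    liabilities is classed as affected. Returns (fraction affected, mean
    genetic load in affected, mean genetic load in unaffected).
    """
    rng = random.Random(seed)
    genetic, liability = [], []
    for _ in range(n_people):
        # Risk alleles carried across n_loci biallelic loci (two copies each).
        risk_alleles = sum(rng.random() < risk_freq for _ in range(2 * n_loci))
        load = effect * risk_alleles
        genetic.append(load)
        liability.append(load + rng.gauss(0.0, 1.0))
    # Threshold placed so that roughly `prevalence` of people exceed it.
    cutoff = sorted(liability)[int((1 - prevalence) * n_people)]
    affected = [value >= cutoff for value in liability]
    load_affected = mean(g for g, a in zip(genetic, affected) if a)
    load_unaffected = mean(g for g, a in zip(genetic, affected) if not a)
    return sum(affected) / n_people, load_affected, load_unaffected


frac, load_aff, load_unaff = simulate_liability_threshold()
print(f"affected: {frac:.2%}; mean load {load_aff:.2f} (affected) "
      f"vs {load_unaff:.2f} (unaffected)")
```

The simulation does not model families, but in such a model unaffected relatives would typically sit below the threshold while still carrying an elevated load.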

The second and third polygenic models (Figure 1) are, respectively, an RV in a genetic milieu of CVs, which results in ASD when the load of CVs is sufficient to exceed an arbitrary threshold, and a combination of RVs and CVs of various effect sizes that together exceed a threshold of tolerance. Shared lines of support for both models are that (i) rare inherited disruptions that are ASD risk factors, such as 15q11-15q13 [75] and 16p11.2 [76], are present in both the unaffected parent and the affected offspring, suggesting that additional genetic modifiers are needed to confer disease risk; (ii) de novo CNVs are found in both cases and unaffected controls, again suggesting that additional genetic modifiers are needed for the disease state or that some of these variants do not contribute to it; (iii) neuronal networks identified by bioinformatic analysis of transcriptome data are enriched for ASD-associated common and rare variants [77]; and (iv) ASD-related component phenotypes are present in relatives owing to sub-threshold loading of common and rare variants. Additional support for the polygenic models comes from the observation that even rare de novo nonsense and splice-site mutations increase the odds of ASD by an average of only sixfold [23, 25]. This average probably spans a large range of genotype risk, but it suggests that many rare deleterious mutations are not sufficient on their own to cause ASD.
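To see why a sixfold increase in odds still leaves most carriers unaffected, the odds ratio can be converted to an absolute risk. The sketch below assumes, for illustration only, that the average sixfold OR applies to population-level odds and uses the 1 in 88 prevalence quoted earlier as the baseline risk; the cited studies did not estimate carrier risk this way.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk in carriers, given baseline risk and an odds ratio."""
    if not 0.0 < baseline_risk < 1.0 or odds_ratio <= 0.0:
        raise ValueError("need 0 < baseline_risk < 1 and odds_ratio > 0")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    carrier_odds = odds_ratio * baseline_odds
    return carrier_odds / (1.0 + carrier_odds)


carrier_risk = risk_from_odds_ratio(baseline_risk=1 / 88, odds_ratio=6.0)
print(f"carrier risk: {carrier_risk:.1%}")
```

Under these assumptions, carriers would have a risk of roughly 6.5%, meaning more than nine in ten would not meet diagnostic criteria.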

A fourth form of the polygenic model (Figure 1) involves two hits, wherein one RV is tolerated but two hits lead to a disease state, similar to cancer [78]. Some examples of this model have been presented [27, 79], and the model is consistent with inherited RVs being present in the transmitting parent (discussed above), de novo CNVs being found in unaffected controls, and relatives manifesting sub-threshold ASD traits. However, a two hit model is probably not the predominant cause based on recent exome data [22-25], and even in cancer, where this model originated, a more continuous model of genetic contribution is now supported [78]. Taken together, the evidence most strongly supports a more continuous and highly heterogeneous polygenic model, in which ASD results from a combination of RVs and CVs that build to exceed a clinical threshold in many different combinations across the population.

Emerging biological themes

ASD genes fall into many potential functional classes; this heterogeneity raises the question of how such diverse mechanisms lead to ASD. To answer this question, it is critical to identify the points of potential convergence among autism candidate genes in developmental and anatomical terms. Toward this end, expression patterns of ASD genes have been annotated using whole-genome transcriptome profiling in blood and brain from ASD and control participants [54]. At the same time, large efforts have been made to build proteomic interactomes of autism candidate genes [22, 24, 80] to understand how these molecules functionally intersect. These efforts have been concurrent with the development of large protein and RNA expression databases that provide genome-wide spatial and temporal expression information (the Allen Brain Atlas [81], GenePaint [82], the Cerebellar Development Transcriptome Database [83], the Ref-Seq Atlas [84], the Human Protein Reference Database [85, 86], the NIA mouse protein-protein interaction database [87], and the Genes to Cognition database [88]).

Definitive demonstration of convergence will require experiments testing causality in model systems. Several vertebrate and invertebrate systems, including Drosophila [89-91], zebrafish [58], and the mouse, currently provide tractable genetic and neurobiological systems for understanding the biological impact of specific susceptibility variants from the molecular to the complex behavioral level. Most modeling has been done in the mouse, in which many of the complex behaviors involved in autism can be tested, including social responsiveness [92]. However, given that the mouse and human lineages are separated from their common ancestor by 60 million years of evolution, it is not a foregone conclusion that disruption of a gene or genes that cause ASD in humans will lead to similar behaviors in mouse. Little is known about the parallels between the neural systems serving social cognition and communication in mouse and human. It is therefore reasonable to start without many preconceived assumptions and to view the mouse, like the fly or zebrafish, as a genetically sensitized system for exploring the molecular, cellular, and circuit-level mechanisms of ASD-related genetic variation.

Crawley and colleagues [92] have elegantly outlined three basic levels of model validity: (i) construct validity (the model contains the same biological perturbation as the human disorder, for instance genetic or anatomical); (ii) face validity (the model displays endophenotypes or phenotypes that mirror the human disorder); and (iii) predictive validity (the model has a similar response to treatments effective in humans). Using this framework, it is remarkable that several ASD-associated genetic variants have recapitulated many human ASD endophenotypes when modeled in the mouse, including Cntnap2 knockout (language, restrictive/repetitive, and social domains) [93], Nlgn4 knockout (language and social) [94], En2 knockout (restrictive/repetitive and social) [95, 96], 15q11-13 duplication (chromosome 7 in mouse; language, restrictive/repetitive, and social) [97], Gabrb3 knockout (restrictive/repetitive and social) [98], Oxt knockout (language and social) [99-101], Avpr1b knockout (language and social) [102, 103], and Fgf17 knockout (language and social) [104]. Inbred strains of mice, such as BTBR, BALB, and C58/J, also show ASD endophenotypes [92]. However, it is unclear exactly how a behavior in mouse, such as deficits in ultrasonic vocalization, translates into a human phenotype, such as language delay. Indeed, differences in molecular, anatomical, and neuronal circuitry between mouse and human are likely, so such cross-species comparisons must be interpreted with caution. Keeping these caveats in mind, modeling of ASD variants in mouse is proving to be an exceptionally useful tool for understanding potential ASD mechanisms. It is hoped that combining mouse models and in vitro models will facilitate finding convergence points, especially at the molecular level, and will provide a tractable avenue for pharmaceutical intervention. Here, we touch on these areas of intersection at the molecular, cellular, systems, and neuroanatomical levels and discuss progress toward integration across levels.

Neuronal activity and ASDs

One potential point of convergence emerging from gene finding studies is that autism pathophysiology involves proteins that both modulate neuronal activity and show activity-dependent expression (Figure 2f). Of the handful of proteins identified by the whole-exome sequencing studies reviewed above, SCN2A, SCN1A, and GRIN2B all code for subunits of synaptic ion channels, with SCN2A and SCN1A coding for α subunits of voltage-gated sodium channels [22, 25]. GRIN2A, an N-methyl-D-aspartate (NMDA) receptor subunit mapping within the 16p11-13 region, was additionally identified in a large-scale ASD association study [105]. NMDA receptors are ionotropic glutamate receptors that are critical regulators of activity-dependent synaptic plasticity. Other notable ASD candidate genes that code for ion channels are the ionotropic glutamate receptors GRIK2 [106] and GRIA3 [107] and the voltage-dependent calcium channel subunits CACNA1C [70] and CACNA1H [108].

Figure 2

Emerging biological themes in ASD. (a,b) Predominant areas of neuroanatomical convergence in ASD. (a) Aberrant brain growth trajectories, with the size of ASD brains outlined in red against a background of normal brains [144-146] (images adapted from [157]); (b) abnormal cortical columns [151]. (c,d) Systems-level convergence in ASD. (c) White matter tract and functional connectivity abnormalities [126, 147-150, 152, 153] (images reproduced with permission from Mark Bastin, University of Edinburgh, UK); (d) excitation/inhibition network imbalances [93, 132, 136-141]. (e-g) Genetic convergence at the cellular and molecular levels. ASD-associated genes implicated in (e) activity-dependent protein synthesis [17, 21, 23, 79, 109, 113-123], (f) neuronal activity [21, 22, 25, 70, 105-112], and (g) neuronal cell adhesion [20-22, 34-37, 49-52, 68, 75, 79, 93, 109, 126-129, 131-137].

ASD candidate genes are also enriched in sets of transcripts regulated by neuronal activity (Figure 2f). For example, UBE3A [21, 109], DIA1 [110], and PCDH10 [110] are all regulated by MEF2A/D, a transcription factor that has a major role in activity-dependent development of the synapse [111]. Moreover, the autism candidate gene NHE9 is regulated by NPAS4, a transcription factor regulated by neuronal activity [110]. Lastly, a recent study identified ASD candidate genes UBE3B, CLTCL1, NCKAP5L, and ZNF18 by whole-exome sequencing and found their expression to be regulated by neuronal depolarization [112]. In sum, these results point to a potential contribution of genes regulated by or regulating neuronal activity to autism pathophysiology.

Post-synaptic translational regulation

Another potential point of molecular convergence in autism genetics is activity-dependent protein metabolism at the postsynaptic density (PSD), a protein-rich specialization at the postsynaptic membrane critical for effective neural transmission (Figure 2e). Single gene disorders that intersect with ASD gave the first clues that this process is important in the pathophysiology of autism. Mutations in FMR1, the leading inherited cause of ASD [113], result in the absence of fragile X mental retardation protein (FMRP), a key regulator of activity-dependent protein synthesis at the synapse [114]. FMRP-mediated translation is regulated in an activity-dependent manner by the autism candidate gene CYFIP1, located within the 15q11-13 duplication region [115]. Recently, whole-exome studies have reported an enrichment of FMRP-associated genes in the lists of genes disrupted by RVs in ASD participants [23]. FMRP is associated with the autism candidate genes MET [116], PTEN, TSC1, TSC2, and NF1 [117], which are also located within the PSD [118-120]. These genes are part of the phosphatidylinositol 3-kinase (PI3K)-AKT-mTOR pathway, which is activated by metabotropic glutamate receptor signaling [119, 121], is an upstream effector of translation regulation, and is involved in cellular proliferation [122]. Individuals with RVs in several of these genes have been found in the large gene finding studies outlined above (PTEN [22], TSC [22], MET [21], NF1 [21]), and additional regulators of protein translation have been identified (RPL10 [21]).
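Enrichment claims of this kind are typically tested by asking how surprising the overlap between a gene list and a reference gene set is under random sampling, for example with a one-sided hypergeometric test. The sketch below assumes that framing; the gene-universe size, reference-set size, list size, and overlap are hypothetical, and the cited study's exact statistical procedure may differ.

```python
from math import comb


def hypergeometric_enrichment_p(overlap: int, list_size: int,
                                set_size: int, universe: int) -> float:
    """P(overlap >= observed) when `list_size` genes are drawn at random from
    a universe of `universe` genes, `set_size` of which are in the reference set."""
    if not (0 <= overlap <= min(list_size, set_size)
            and max(list_size, set_size) <= universe):
        raise ValueError("inconsistent counts")
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, list_size - k)
        for k in range(overlap, min(list_size, set_size) + 1)
    )
    return tail / comb(universe, list_size)


# Hypothetical numbers: 18,000 genes in the universe, 800 in the reference
# (e.g. FMRP-target) set, 100 genes hit by RVs in cases, 14 of them in the set.
universe, set_size, list_size, overlap = 18_000, 800, 100, 14
expected = list_size * set_size / universe
p = hypergeometric_enrichment_p(overlap, list_size, set_size, universe)
print(f"expected overlap {expected:.1f}, observed {overlap}, p = {p:.2e}")
```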

Ubiquitination pathways, which regulate protein metabolism at the PSD, are also associated with autism (Figure 2e). Most notably, UBE3A, a protein implicated in the ASD-associated disorder Angelman's syndrome [17], is involved in ubiquitination of its target proteins, such as the FMRP translational target ARC [123], which leads to their degradation at excitatory postsynaptic densities. RVs in UBE3A and genes encoding associated proteins have been found in recent large-scale CNV studies (UBE3A, PARK2, RFWD2, and FBXO40 [109]; USP7 and UBE3A [21]).

Although not directly involved in protein metabolism, another large group of ASD proteins converges at excitatory postsynaptic densities. The most notable are the synaptic scaffolding proteins SHANK2 and SHANK3, identified as ASD risk factors in several studies [27, 52, 124, 125]. Recently, an autism protein interactome built using a human yeast two-hybrid screen with 35 ASD-implicated proteins as bait found that a large group of PSD-localized ASD-associated proteins interact. This study additionally confirmed the SHANK3-PSD95 interaction, identified nine further binding partners of this complex, and identified novel PSD interactions such as the SHANK3-TSC1-ACTN1-HOMER3 interaction [80]. In sum, these data point to the excitatory PSD as a hot spot for ASD-associated molecules, making it a potential target for drug discovery.

Neuronal cell adhesion

ASD-associated mutations have been found in several proteins involved in cell adhesion, including CNTNAP2, CNTN4, CNTN6, NLGN1-4, NRXN1, PCDH9, and CHL1 (Figure 2g). Multiple converging lines of evidence implicate CNTNAP2 in ASD pathology, including its role in a syndromic form of autism [68], variants found in linkage and association studies [35-37], the presence of RVs [79], its impact on functional magnetic resonance imaging (MRI) readouts in humans [126], and evidence from mouse models that its knockout leads to behavioral manifestations in all three core domains of autism as well as neuronal migration abnormalities [93]. A member of the neurexin superfamily, CNTNAP2 is involved in cell-cell adhesion, clustering of potassium channels at the juxtaparanode [127], neuronal migration, and regulation of GABAergic interneuron numbers [93]. There are data to support a role for an additional contactin family member, CNTN4, in autism pathophysiology [109, 128, 129], although this has recently been challenged [130]. CNTN6 has also been implicated by CNV studies [20, 49-52, 75, 109, 131]. Neurexins and neuroligins have both been heavily implicated in ASD pathophysiology. Neurexins are located presynaptically and bind to postsynaptically localized neuroligins. These molecules modulate both excitatory and inhibitory synaptic function [132]. NRXN1 has been identified as an ASD risk factor by cytogenetic analysis [133], large-scale CNV studies [21, 50, 109], and case reports [34]. NLGN1, NLGN3, and NLGN4 have also been identified in several studies [21, 22, 109, 134, 135], and CNTNAP2 is homologous to Drosophila Neurexin 4 [89]. Additional evidence for the role of NLGNs and NRXN1 in ASD comes from mouse models in which these genes carry ASD-associated variants, are knocked out, or are overexpressed. These studies have recapitulated various aspects of the ASD phenotype [132, 136, 137] and have additionally implicated NLGN2. PCDH9 and CHL1 may also contribute to ASD based on CNV studies [20, 49-52, 75, 109, 131].

Balancing excitation and inhibition

Functional studies in mouse models suggest that some ASD candidate genes affect network dynamics by altering the balance of excitation and inhibition (Figure 2d). For example, a slight increase in NLGN2 levels in mice reduces the excitation-to-inhibition ratio: it decreases the ratio of excitatory to inhibitory synapses, increases inhibitory synaptic contacts, and increases the frequency of miniature inhibitory postsynaptic currents in the frontal cortex [132]. Likewise, introducing an ASD-associated NLGN3 missense mutation into mice increases inhibitory function in the cortex [136], and Nrxn1a knockout mice exhibit decreased hippocampal excitatory function [137]. Knocking out Cntnap2 in mice reduces the number of cortical GABAergic interneurons, potentially altering the balance of excitation and inhibition [93]. In addition, Shank3 knockout decreases cortical excitatory transmission [138]. Fmr1 knockout mice show several excitatory/inhibitory imbalances, including impaired inhibitory transmission in the amygdala [139], decreased excitatory input onto inhibitory neurons in the cortex [140], and increased inhibitory transmission in the striatum [141].

Whole-transcriptome studies of human postmortem brain provide corroborating evidence for the role of excitation and inhibition in autism. One recent study used a systems biology approach, weighted gene co-expression network analysis (WGCNA), to build transcriptome networks from ASD and control postmortem brain samples [142]. The top autism-associated WGCNA module, which was enriched for ASD-associated GWAS signals, overlapped substantially with a previously identified interneuron-related module [143]. Understanding how perturbations in this delicate balance of excitation and inhibition lead to disease will be crucial for understanding ASD pathophysiology. This endeavor will require a clear understanding of how deficits affect both local microcircuits and long-distance connectivity.
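To make the network-construction step more concrete, the following is a minimal Python sketch of the core WGCNA idea, assuming NumPy and SciPy are available. Gene-gene correlations are raised to a soft-thresholding power β to form a weighted adjacency matrix, which is converted into a topological overlap matrix (TOM) that rewards shared neighbours; genes are then grouped into modules by average-linkage hierarchical clustering of 1 - TOM. This is not the pipeline used in [142]: the function names, the default β = 6, and the fixed cut height are illustrative assumptions, and the published WGCNA R package typically chooses β with a scale-free fit criterion and cuts the tree dynamically rather than at a fixed height.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned weighted adjacency |cor(x_i, x_j)| ** beta.

    expr: array of shape (n_samples, n_genes).
    """
    corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adj = np.abs(corr) ** beta              # soft threshold: weak links shrink toward 0
    np.fill_diagonal(adj, 0.0)              # drop self-connections
    return adj


def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    shared = adj @ adj                      # l_ij: weighted sum over shared neighbours
    k = adj.sum(axis=1)                     # connectivity of each gene
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)              # a gene overlaps fully with itself
    return tom


def find_modules(expr: np.ndarray, beta: float = 6.0,
                 cut_height: float = 0.9) -> np.ndarray:
    """Assign each gene a module label by clustering on 1 - TOM."""
    dissimilarity = 1.0 - topological_overlap(soft_threshold_adjacency(expr, beta))
    condensed = squareform(dissimilarity, checks=False)  # upper triangle as a vector
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=cut_height, criterion="distance")


if __name__ == "__main__":
    # Hypothetical toy data: 50 samples, two latent drivers, 5 genes per driver.
    rng = np.random.default_rng(0)
    n_samples = 50
    driver_a, driver_b = rng.normal(size=(2, n_samples))
    genes = [driver_a + 0.3 * rng.normal(size=n_samples) for _ in range(5)]
    genes += [driver_b + 0.3 * rng.normal(size=n_samples) for _ in range(5)]
    print(find_modules(np.column_stack(genes)))
```

With the synthetic data in the `__main__` block, which assumes two independent groups of five strongly co-expressed genes, each group should fall into its own module. Soft thresholding is the design choice that distinguishes WGCNA from hard-thresholded networks: rather than declaring an edge present or absent, it preserves a continuous connection strength while suppressing weak correlations.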

Connecting convergent molecular pathways with higher-order ASD phenotypes

Effective drug design would be facilitated by convergence at the level of molecular pathways. However, convergence at higher levels is also plausible. In fact, some of the most reproducible clinical signatures have been found at the level of brain structure and function. For example, the trajectory of head growth, which reflects brain size, appears reproducibly abnormal in children with ASD: they have smaller head circumferences at birth, followed by a burst of postnatal head growth, with head size eventually normalizing around adolescence (Figure 2a) [144–146]. Studies have also repeatedly shown decreases in white matter tracts in autism (Figure 2c) [147, 148]. Specifically, long-range connections seem to be weakened, whereas local connections are strengthened [149, 150]. Cortical structural abnormalities, specifically denser and narrower cortical columns, have also been reported (Figure 2b) [151], and functional MRI neural signatures of autism are being defined [126, 152, 153].

Even if the point of convergence is at the molecular level, how do we connect these findings with the macroscopic findings described here? Some salient examples are worth noting. As discussed above, the PI3K-AKT-mTOR pathway is strongly enriched for ASD candidate genes. This pathway affects cellular proliferation, which could, in theory, lead to the abnormal brain growth reported in autism (Figure 2a). However, elucidating the 'dark matter' between this molecular pathway and brain size will not be trivial. Another example involves the link between activity-dependent specialization of the brain during early postnatal development and the molecular pathways that rely heavily on neuronal activity, identified above as a point of molecular convergence. A recent study reported a failure of frontal and temporal cortical specialization, as defined by transcriptome signatures, in brains from individuals with autism [142]. This could result from disruptions in activity-dependent molecular pathways needed at critical developmental times. Even so, connecting the dots between different levels of analysis will be a formidable task.

CNTNAP2 provides one proof-of-principle model [154]. The ramifications of genetic perturbations in this gene have been studied at multiple levels, spanning molecular studies, mouse models, and functional MRI studies. Similarly thorough examinations of other implicated pathways, from molecules to brain structure, will be needed to integrate our understanding of autism pathophysiology across levels.

Future directions

The combination of worldwide collaborative data and sample sharing with advanced genomic techniques and bioinformatic strategies has provided the essential foundation for uncovering the genetic and molecular underpinnings of ASD. The contributory genes uncovered in the past 5 years have led to a revolution in our understanding of the disorder. Not surprisingly, near-term efforts are focused heavily on whole-genome and whole-exome sequencing of large patient cohorts, facilitated by continuing technological advances that reduce cost barriers.

The obvious questions raised by this approach are: what degree of insight will be obtained, and what advantages will whole-genome sequencing provide over whole-exome sequencing? Given the role of gene dosage changes implicated by CNV studies [59], and evidence for splicing dysregulation in ASD [142], one should expect a significant contribution of non-coding, regulatory changes to ASD susceptibility. Thus, we envision a significant advance once whole-genome sequencing can be performed cost-effectively in large cohorts. At the same time, exome sequencing is predicted to yield dozens of new ASD genes, so it remains a productive short-term approach [22–25]. Large population cohorts, perhaps assembled through clinical sequencing rather than investigator-organized research studies, offer one avenue for efficiently performing comprehensive genetic evaluation in the necessary number of participants, despite many potential barriers [155].

One notable absence in this discussion has been linkage analysis, perhaps raising the question: is genetic linkage dead in the age of genome sequencing? Few linkage peaks have been identified and replicated, and dense SNP analysis of linkage peaks has not revealed common variation accounting for the linkage signal [17]. Thus, replicated linkage peaks probably reflect the aggregation of RVs. Given the emergence of RVs as factors in ASD susceptibility, genetic linkage, especially using quantitative trait approaches [29], probably provides a reasonable means of restricting the search space for ASD risk variants and assessing their segregation in families.

The next crucial issue is how to validate the pathogenicity of identified variants, especially non-coding SNVs. We envision that associated variants from these studies will be prioritized on the basis of whether they can be translated into tractable models of disease. A clear limitation is that associated variants may be found in poorly annotated non-coding regions. Non-coding variants have often been thought harder to annotate functionally, but in some ways they may prove more tractable to assess in high throughput. For example, understanding the effect of a missense mutation in a protein of known or unknown function can be a very long road. In contrast, many variants in poorly annotated non-coding regions can be tested for cis or trans effects on gene expression, first in expression quantitative trait locus datasets and then in neuronal cell culture or mouse models. As genome function becomes more densely annotated, such analyses will become easier still. Thus, although major challenges remain in variant identification and initial assessment of pathogenicity, these can largely be overcome by technology and larger sample sizes. However, phenotype definition, and understanding which specific aspects of the broad ASD phenotype relate to individual genetic risk factors, remain only superficially explored and will continue to be a major roadblock for those interested in understanding biological mechanisms of disease.
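The first screening step for a non-coding variant, checking in expression quantitative trait locus (eQTL) data whether allele dosage at the variant is associated with expression of a nearby gene, can be sketched as a simple linear regression of expression on dosage (0, 1, or 2 copies of the alternative allele). The following is a minimal Python sketch assuming NumPy and SciPy, built on `scipy.stats.linregress`; `cis_eqtl_test` is a hypothetical helper name, and real eQTL analyses additionally adjust for covariates (for example, ancestry and technical factors) and correct for the very large number of variant-gene pairs tested.

```python
import numpy as np
from scipy.stats import linregress


def cis_eqtl_test(dosage, expression):
    """Regress one gene's expression on one variant's allele dosage (0, 1, 2).

    Hypothetical helper: returns the per-allele effect (slope) and its
    two-sided p-value, without covariates or multiple-testing correction.
    """
    dosage = np.asarray(dosage, dtype=float)
    expression = np.asarray(expression, dtype=float)
    if dosage.min() == dosage.max():
        # No genotype variation means the slope is undefined.
        raise ValueError("variant is monomorphic in this sample")
    fit = linregress(dosage, expression)
    return fit.slope, fit.pvalue


if __name__ == "__main__":
    # Hypothetical simulation: 200 individuals, allele frequency 0.3,
    # true effect of 1.0 expression unit per allele plus unit-variance noise.
    rng = np.random.default_rng(1)
    dosage = rng.binomial(2, 0.3, size=200)
    expression = 1.0 * dosage + rng.normal(size=200)
    effect, p = cis_eqtl_test(dosage, expression)
    print(f"effect per allele = {effect:.2f}, p = {p:.1e}")
```

Under the simulated setting, the estimated effect should land near the true value of 1.0 with a very small p-value. In practice the test would be repeated for every variant-gene pair within a cis window, and only variants that pass this screen would move on to the more costly cell-culture or mouse experiments.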

Now that significant contributions to genetic risk for ASD have been uncovered, it behooves us to perform parallel phenotypic analyses at multiple levels in humans and model systems to understand the mechanisms of diverse forms of major contributory mutations. For example, it would be informative to understand what a dozen syndromic forms of ASD have in common, and what distinguishes their phenotypes, from a molecular, cellular, and cognitive standpoint. Furthermore, combining information on chromatin structure and epigenetic modification with sequence data may reveal environmental contributions and their potential intersection with known genetic risks. In this manner, integrating various forms of high-throughput data and pathway analyses with multiple levels of phenotype data in well-studied cohorts is likely to be necessary to deepen our understanding of ASD pathophysiology. Despite the extraordinary genetic heterogeneity revealed by recent studies, the approaches discussed here have already provided evidence of biological convergence. As our understanding of genetic contributions to ASD expands from the current dozens of genes into the hundreds from ongoing human genetic studies, the notion of biological convergence can be tested more rigorously. Furthermore, because even RVs on average have intermediate effects on ASD risk, exploring potential epistatic interactions between loci may yield a clearer picture of the landscape of ASD genetics. In the meantime, the genetic findings of the last few years provide a starting point for exploring the first generation of genetically targeted therapeutics in ASD.