Introduction

Despite the complexity of language and learning disorders, individual genes are being defined which appear to influence the development of abilities that are necessary in speech, language, and reading. Most of the identified candidate genes involve reading disability, and although the evidence supporting some of these genes is still somewhat tenuous due to small sample sizes and limited replication, most are known to be involved in early development, particularly neuronal migration (Galaburda 2005; Gabel et al. 2010; Poelmans et al. 2011). As will be discussed below, most of these candidate genes have been associated with several learning and language phenotypes, suggesting that they facilitate processes that are fundamental to the acquisition of reading and language. Similar pleiotropic effects are seen for several genes that primarily affect autism or language but have also shown effects on reading, including CNTNAP2 and ATP2C2 (Vernes et al. 2008; Newbury et al. 2011). However, despite replicated evidence for association of single nucleotide polymorphisms (SNPs) within and around the genes, very few coding mutations have been reported to account for their influence on these disorders. This has led to the hypothesis that mutations affecting reading and related disorders are likely to be in regulatory regions, controlling the quantity rather than the quality of the gene product (Bates et al. 2011). Alterations of gene expression can be caused by mutations in gene promoters and enhancers located near the gene, but mutations in genes that mediate epigenetic control of gene expression have also been found to affect developmental learning disorders. These mutations may be in regions located farther from the target gene, making it more difficult to recognize their significance.

Regulatory regions of genes influencing language and learning disorders

Of all of the genes that have been proposed as candidates for reading disability and language impairment, six have been well characterized with respect to their influence on reading and language disorders, the regions within and around the genes that appear to contain causal mutations, and the effects of the putative mutations or risk alleles on gene transcription: DYX1C1, DCDC2, KIAA0319, ROBO1, and the co-regulated genes MRPL19 and C2ORF3.

DYX1C1

The 15q21 region was identified as a candidate region for a gene or genes influencing reading disability (RD) through linkage studies (Fulker et al. 1991; Grigorenko et al. 1997), defining the DYX1 (DYsleXia-1st reported) locus. The DYX1C1 (DYX1-Candidate 1) gene was specifically targeted after a translocation t(2;15) (q11;q21) disrupting the previously uncharacterized gene was observed in a family with RD (Taipale et al. 2003). Since then, the DYX1C1 protein has been found to contain estrogen-receptor-binding sites (Massinen et al. 2009) and knockdown of the gene in embryonic rat brain produces delays in neuronal migration (Wang et al. 2006).

Sequence analysis of the DYX1C1 coding regions identified a nonsense mutation in some RD families: rs57809907, 1249G>T, which results in Glu417X and truncates the protein by four amino acids (Taipale et al. 2003), but this variant has not been consistently associated with reading disability in other studies (Scerri et al. 2004; Wigg et al. 2004; Marino et al. 2005; Meng et al. 2005a; Dahdouh et al. 2009; Bates et al. 2010). A missense mutation, rs17819126 (271G>A, Val91Ile), has been associated with reading ability (Bates et al. 2010), but analyses of its putative effect on protein function by the SIFT (Ng and Henikoff 2003) and PolyPhen (Adzhubei et al. 2010) algorithms indicate that this should be a benign change for the protein (Ensembl release 63: www.ensembl.org). Since studies have found association of RD with other SNPs within the gene, it seems likely that mutations affecting RD are in the regulatory rather than coding regions. A possible candidate is the –3G>A SNP rs3743205 in the 5′ untranslated region (UTR). In the original report by Taipale et al. (2003), the A allele was associated with RD, but subsequent reports found association with the common G allele (Scerri et al. 2004; Wigg et al. 2004; Dahdouh et al. 2009). Interestingly, a study of RD in Chinese children also found strong association with the G allele (Lim et al. 2011). Studies of transcription factor binding in the promoter region have shown that the A allele shows decreased binding to a repressive transcription factor, resulting in increased DYX1C1 expression (Tapia-Paez et al. 2008), leading Lim et al. to hypothesize that the A allele is actually protective compared to the downregulating G allele, and that instances where the A allele appeared to be associated with RD could be secondary to linkage disequilibrium with a second causal variant nearby. Two other SNPs in the promoter region, rs12899331 and rs16787, were also found to be involved in transcription factor binding (Tapia-Paez et al. 2008), but these were not found to be associated with RD in later studies (Dahdouh et al. 2009). Thus, further studies are needed to define the role of particular regulatory regions of DYX1C1 in the cause of RD.
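The relationship between the nucleotide and amino acid designations of the two coding variants discussed above can be checked with simple codon arithmetic. The following is a minimal sketch, assuming that positions 1249 and 271 are numbered from the first base of the coding sequence; the function names are illustrative only, and the codon table is limited to the entries needed here.

```python
# Subset of the standard genetic code, restricted to codons used in this example.
CODONS = {
    "GAA": "Glu", "GAG": "Glu",
    "TAA": "Stop", "TAG": "Stop",
    "GTA": "Val", "GTC": "Val", "GTG": "Val", "GTT": "Val",
    "ATA": "Ile", "ATC": "Ile", "ATT": "Ile", "ATG": "Met",
}


def codon_location(cds_position):
    """Map a 1-based coding-sequence position to (codon number, base within codon)."""
    codon_number = (cds_position - 1) // 3 + 1
    offset = (cds_position - 1) % 3 + 1
    return codon_number, offset


def substitute(codon, offset, new_base):
    """Return the codon after replacing the base at the 1-based offset."""
    return codon[:offset - 1] + new_base + codon[offset:]


# 1249G>T: first base of codon 417; either Glu codon becomes a stop codon.
print(codon_location(1249))
for glu in ("GAA", "GAG"):
    mutant = substitute(glu, 1, "T")
    print(glu, "->", mutant, CODONS[mutant])

# 271G>A: first base of codon 91; three of the four Val codons become Ile.
print(codon_location(271))
for val in ("GTA", "GTC", "GTG", "GTT"):
    mutant = substitute(val, 1, "A")
    print(val, "->", mutant, CODONS[mutant])
```

Under this numbering assumption, both substitutions fall on the first base of their codons, which is consistent with the reported Glu417X and Val91Ile changes.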

There is also evidence for pleiotropic effects of DYX1C1 on short-term memory and mental calculation, a mathematics measure (Marino et al. 2011a). Linkage to the DYX1C1 region was found with speech sound disorder phenotypes (SSD) in one study (Smith et al. 2005) but not in a subsequent study (Stein et al. 2006), which located the SSD region more centromerically. Further studies are needed to determine whether the linkage signal for SSD was actually related to DYX1C1.

DCDC2

Initial linkage and association analysis defined the DYX2 locus on chromosome 6p22 (Grigorenko et al. 1997; Cardon et al. 1994; Cardon et al. 1995; Fisher et al. 1999; Gayan et al. 1999; Kaplan et al. 2002; Deffenbacher et al. 2004), and several subsequent studies have reported associations with the DCDC2 (doublecortin-2) gene within the region (Newbury et al. 2011; Meng et al. 2005b; Schumacher et al. 2006; Scerri et al. 2011). The structure of the gene is analogous to the X-linked DCX gene which is known to be involved in microtubular structure and influences neuronal migration. Mutation of the DCX gene produces lissencephaly in males and cortical abnormalities in females (des Portes et al. 1998; Sossey-Alaoui et al. 1998); accordingly, knockdown of DCDC2 produces delays in neuronal migration in embryonic rat brain (Meng et al. 2005b).

Sequence analysis of coding regions of DCDC2 in RD families has not identified causal mutations; however, association of RD was reported with a deletion in intron 2, termed BV677278, which appeared to contain transcription factor binding sites (Meng et al. 2005b). One study failed to replicate this association (Ludwig et al. 2008), but other studies have replicated it (Brkanac et al. 2007; Harold et al. 2006; Wilcke et al. 2009; Marino et al. 2011b), although the statistical significances in some of these studies were weak. Still, the importance of this region in gene regulation has been demonstrated by in vitro studies showing that sequences in the region act as enhancers for DCDC2 expression (Meng et al. 2011). Furthermore, these differences in gene expression may have a measurable phenotypic effect on brain structure in that BV677278 variants have been associated with differences in gray matter volume in unselected individuals (Meda et al. 2008), so this deletion appears to be an example of a mutation in a regulatory region that affects RD.

In addition to influencing RD, SNPs in DCDC2 have been associated with both hyperactive and inattentive forms of ADHD, indicating that this gene can affect both disorders (Couto et al. 2009). More recently, evidence has been presented that DCDC2 contributes to the risk for autism in families with both dyslexia and autism (Cuccaro et al. 2011).

KIAA0319

DCDC2 and KIAA0319 coding regions are separated by only 160 kb and were both included in the candidate region defined by linkage and association analyses that identified the DYX2 locus. Association of KIAA0319 with RD phenotypes as well as reading in the normal range has been supported by numerous studies (Newbury et al. 2011; Dennis et al. 2009; Scerri et al. 2011; Harold et al. 2006; Cope et al. 2005; Paracchini et al. 2006; Luciano et al. 2007; Paracchini et al. 2008). Although the function of the gene is not clear, knockdown of expression in embryonic rat brain results in delayed neuronal migration (Paracchini et al. 2006), similar to the knockdowns of DYX1C1 and DCDC2 noted above.

The SNPs showing association with RD tend to be located in the 5′ UTR, the first untranslated exon, and the first intron (Elbert et al. 2011), suggesting regulatory functions. Expression of the allele containing the RD-associated SNP haplotype in this region of KIAA0319 was shown to be decreased in cell lines from individuals with RD (Paracchini et al. 2006). Moreover, one associated SNP, rs9461045, has been shown to have a regulatory function. Reporter assays showed that the risk allele, which was hypothesized to create a binding site for the repressor OCT-1, resulted in decreased expression of KIAA0319 in vitro, and knockdown of OCT-1 restored expression (Dennis et al. 2009).

In recognition of the likely influence of epigenetic mechanisms on KIAA0319 expression in the etiology of RD, regions of acetylated histones were mapped in and around the gene in a neuroblastoma cell line to identify promoter regions (Couto et al. 2010). A 2.7-kb acetylated region was found spanning the 5′ UTR, first exon and first intron of KIAA0319 which corresponded to the location of five SNPs that had been associated with RD phenotypes in other studies. In addition, SNPs within or very near the acetylated region have been associated with language impairment phenotypes (Newbury et al. 2011; Rice et al. 2009) and linkage to the DCDC2/KIAA0319 region has been reported for SSD (Smith et al. 2005). Studies in an unselected population did not show effects on language in the normal range, suggesting that this gene has more of an effect on language impairment (Scerri et al. 2011).

ROBO1

Linkage analysis of a large family localized RD to a region on chromosome 3 (3p12-q13) which was designated DYX5 (Nopola-Hemmi et al. 2001). A translocation within this region, t(3;8)(p12;q11), was found in an individual with RD and it was determined that this disrupted the ROBO1 gene (Hannula-Jouppi et al. 2005), making it a candidate for RD. This gene is the human homologue of the roundabout gene in Drosophila and mice and is known to affect axonal guidance through the midline of the CNS and spinal cord. The coding regions of the ROBO1 gene were sequenced in the original DYX5 family, but no causal mutations were found. Association was found with a haplotype of SNPs in the gene in this family, and transcription of the allele containing the risk haplotype was decreased in lymphoblasts from individuals with RD. The individual SNPs were not felt to have a regulatory function since they were also noted in unaffected individuals, pointing to an unknown regulatory mutation in the individuals with the risk haplotype. Although subsequent studies have not replicated the association of ROBO1 SNPs with RD, linkage has been found with SSD (Stein and Schick 2004) and SNP association has been found with phonological buffer deficits in an unselected population (Bates et al. 2011), indicating that the gene’s primary effects could be on language abilities related to RD.

MRPL19 and C2ORF3

The designation of these two genes as candidates in influencing RD rests on the assumption that the causal mutation is in a regulatory region that is about 34 kb from the genes. The 2p16–p12 region was first highlighted by a genome-wide microsatellite linkage study in an extended family (Fagerheim et al. 1999), and subsequent linkage studies replicated these results across the region (Petryshen et al. 2002; Fisher et al. 2002; Francks et al. 2002; Kaminen et al. 2003). SNP association studies focused on the 2p12 region, with results indicating a region that did not contain recognizable genes (Peyrard-Janvid et al. 2004; Anthoni et al. 2007). The transcription products of three nearby genes, FLJ13391, MRPL19, and C2ORF3, were examined to determine if the risk haplotype of SNPs in that region had an effect on gene expression. There was no effect on the transcription of FLJ13391, but transcripts of one allele of MRPL19 and of C2ORF3 were decreased in individuals who carried the risk haplotypes in the adjacent region (as determined by heterozygous SNPs within the coding regions of the two genes). This suggested that an unknown mutation in the region of SNP association has an effect on the expression of both genes. The MRPL19 protein is a component of the mitochondrial ribosome, but the function of the C2ORF3 gene is unknown.

Overall, there is substantial evidence for involvement of mutations in regulatory regions of the primary candidate genes influencing RD, and several of these genes also affect related language and learning disorders. Further investigation of epigenetic mechanisms of gene regulation is likely to be profitable, including elements that may be quite distant from the genes they affect, or factors that regulate more than one gene.

Mechanisms of epigenetic gene regulation

The term “epigenetics” refers to the controls of gene expression that are maintained through somatic cell division (and occasionally in germline cells) but do not involve change in the DNA code itself. Stable epigenetic controls are applied and subsequently maintained in cell lineages during differentiation and cell proliferation, and reversible epigenetic changes in gene expression can occur in differentiated cells in response to external signals (Jaenisch and Bird 2003; Ptashne 2007; Day and Sweatt 2011). The two major methods of epigenetic regulation involve changes in methylation of cytosines in regulatory regions of DNA or modification of histone proteins, primarily through acetylation and methylation.

Epigenetic modifications act to control the accessibility of DNA to transcription. Methylation of cytosines in regulatory elements or the complexing of DNA around nucleosomes can block gene expression, while removal of DNA methylation or relaxing of histone complexing can make DNA more accessible. Methylation often acts on CpG islands or shores, which are regions of cytosine–guanine dinucleotide sequences in promoters of genes, thus inhibiting the binding of transcription factors. One or both strands may be methylated, which can fine-tune the degree of expression. In addition, methylation of these regions can recruit histone modifications that also block transcription machinery. In contrast, methylation within the gene exons and introns is correlated with gene expression. DNA methylation is mediated by a family of DNMT enzymes which apply and maintain methylation tags (Day and Sweatt 2011; Portela and Esteller 2010; Gropman and Batshaw 2010).
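As a rough illustration of what makes a promoter sequence CpG-rich, the sketch below counts CpG dinucleotides and computes the ratio of observed to expected CpG frequency, a heuristic commonly used when scanning for CpG islands. The heuristic, the function name, and the toy sequence are assumptions introduced for illustration and are not taken from the studies cited above.

```python
def cpg_stats(sequence):
    """Summarize CpG content of a DNA sequence given 5' to 3' on one strand."""
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap itself, so a plain substring count finds every CpG site.
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpG count if C and G were placed independently along the sequence.
    expected = (c_count * g_count) / length if length else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return {"cpg": cpg_count, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


# Toy sequence, invented for this example.
stats = cpg_stats("CGCGCGATAT")
print(stats)
```

Each CpG site counted in this way is a potential target for cytosine methylation; real analyses would apply such a calculation in sliding windows across genomic sequence rather than to a single short string.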

Histone modification affects the wrapping of DNA around nucleosomes, which are octamers composed of two each of four different histone proteins: H2A, H2B, H3, and H4. DNA complexing with nucleosomes is part of chromatin compaction into heterochromatin, which generally is less transcriptionally active. Each histone protein has multiple sites that are subject to modification (methylation, acetylation, phosphorylation, ubiquitination, ADP ribosylation, or sumoylation) (Kouzarides 2007). Specific sites are designated by the histone type and the amino acid number, such that H4K12 designates the 12th amino acid in an H4 protein, which is a lysine (K). These modifications are reversible, mediated by families of enzymes such as histone acetyltransferases (HATs), histone deacetylases (HDACs), histone methyltransferases (HMTs), histone demethylases (HDMs), and so on. The combination of histone modifications at different histone sites appears to constitute a “code” or “language” that determines when, where, and how much a particular gene is expressed (Day and Sweatt 2011; Portela and Esteller 2010; Lee et al. 2010).
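The site nomenclature described above (histone type, one-letter amino acid code, residue number) is regular enough to be decomposed mechanically. The following sketch shows one way to do so; the function name and the small amino acid lookup table are illustrative assumptions, and only the four core histones named in the text are recognized.

```python
import re

# One-letter codes for residues that commonly carry histone modifications.
AMINO_ACIDS = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}

# Histone type, then a one-letter residue code, then the residue number.
SITE_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)$")


def parse_histone_site(label):
    """Split a label such as 'H4K12' into histone, residue, and position."""
    match = SITE_PATTERN.match(label)
    if match is None:
        raise ValueError(f"Unrecognized histone site label: {label!r}")
    histone, residue, position = match.groups()
    return {
        "histone": histone,
        "residue": AMINO_ACIDS.get(residue, residue),
        "position": int(position),
    }


for site in ("H4K12", "H3K9", "H3K14"):
    print(site, parse_histone_site(site))
```

For example, H4K12 is read as histone H4, lysine, position 12. The label alone does not specify the type of modification (acetylation, methylation, and so on), which must be stated separately.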

Mutations of genes affecting epigenetic mechanisms in humans and animal models

Since epigenetic mechanisms regulate the differential expression of genes in developing tissues, gene mutations that interfere with DNA methylation or histone modification may disrupt multiple organ systems. Table 1 gives several examples of developmental cognitive disorders caused by mutations in genes that disrupt epigenetic processes resulting in varying degrees of motor, craniofacial, and skeletal problems in addition to their effects on cognitive abilities. Other cognitive disorders such as Alzheimer Disease and Huntington Disease develop in adulthood through gradual neurodegeneration secondary to deregulated genes.

Table 1 Developmental disorders resulting from disruption of epigenetic mechanisms (Galaburda 2005)

While the effects of mutations of genes that affect epigenetic processes can be severe and disrupt multiple systems, other genetic effects on epigenetic modification can be much more circumscribed. The “language” of methylated DNA and specific histone modifications can precisely control gene expression to produce and maintain tissue-specific and region-specific cellular differentiation. Once differentiation is completed, the same regulatory mechanisms appear to be involved in the changes in gene expression that result from learning and memory in the hippocampus (Day and Sweatt 2011). For example, certain types of learning are correlated with specific patterns of histone modification in chromatin of hippocampal cells, e.g., the learning of contextual fear responses in mice is associated with acetylation at H3K9, H3K14, H4K5, H4K8, and H4K12, as well as changes in methylation and phosphorylation at other sites. Moreover, loss of acetylation at H4K12 interferes with learning, which is normalized by introduction of an HDAC inhibitor which restores acetylation at that site (Day and Sweatt 2011; Peleg et al. 2010). Similarly, interference with the machinery that applies histone modifications or DNA methylation, such as HATs, HDACs, HDMs, or DNMTs, also causes learning problems; for example, mutation of the Cbp gene in mice, or blockage of DNA methylation through inhibition of Dnmts, will both interfere with memory and long-term potentiation in the hippocampus (Day and Sweatt 2011; Alarcon et al. 2004; Lubin et al. 2008; Levenson et al. 2006; Miller and Sweatt 2007). The EHMT1 gene in humans encodes a histone methyltransferase, and heterozygous deletion of this telomeric gene causes Kleefstra syndrome, a condition with severe intellectual disability, dysmorphic features, and behavioral problems such as autistic features, aggression, and bipolar disorder that can change in expression and severity over time (Kleefstra et al. 2009). In Drosophila, mutation of the EHMT homologue results in disruption of a jumping reflex and courtship memory. These deficits were rescued by expression of EHMT in adult flies (Kramer et al. 2011). Additional studies of mouse models of Alzheimer disease and other neurodegenerative diseases have also shown rescue of learning deficits with treatment by HDAC inhibitors (Fischer et al. 2007; Guan et al. 2009).

Most recently, there have been several reports of alterations of methylation in autism spectrum disorders. Alterations in methylation of CpG islands associated with the oxytocin receptor gene OXTR have been reported in brain tissues of individuals with autism spectrum disorders (Gregory et al. 2009). Abstracts at the International Congress of Human Genetics/American Society of Human Genetics meeting in Montreal in October 2011 reported that identical twins discordant for autism had significantly different genome-wide methylation patterns (Wong et al. 2011), and siblings discordant for autism had differences in 5-hydroxymethylcytosine across exonic sequences. Finally, DNA methylation was altered in CpG islands associated with the candidate gene SHANK3 in brain samples from individuals with autism spectrum disorders, resulting in an altered pattern of isoform expression (Zhu et al. 2011). The influence of more remote regulatory regions was noted in the downregulation of the CHRNA7 gene in autism by the Prader-Willi imprinting center at 15q11.2–13.3 (Yasui et al. 2011).

Mouse models of human epigenetic syndromes, such as those listed in Table 1, can show severe phenotypic effects similar to their human counterparts (unless the models are constructed such that the mutations are only expressed in selected tissues); however, many of these disorders are caused by null mutations that have a significant effect on function. Other models of mutations of genes affecting epigenetic regulation can show much milder changes in hippocampal neurons or dendritic spines (Lagali et al. 2010). It seems possible, then, that less disruptive mutations or mutations of other genes may have much more focused effects on development and thus may be much more analogous to deficits that affect reading and language disorders. Thus, while mutations affecting epigenetic mechanisms have not been described in reading disability or language impairments, the role of epigenetic changes in learning and autism and the hints of potential therapy make it especially worthwhile to look for mutations in such genes in individuals with language and learning problems.

Approaches to the identification of epigenetic mechanisms in humans

Although candidate genes have been identified for reading disability and language impairment, the SNPs in these genes appear to account for a small portion of the phenotypic variability. In contrast, fairly substantial heritabilities have been claimed for these disorders, between 0.45 and 0.85 depending on population and definitions (Gayan and Olson 2001; Hawke et al. 2006; Astrom et al. 2007; Spinath et al. 2004; Tomblin and Buckwalter 1998; Dale et al. 1998; Bishop and Hayiou-Thomas 2008). There are several possible explanations for this “missing heritability,” but one of the primary reasons appears to be inherent in the current studies of SNPs, particularly in the large panels that are used for genome-wide studies. The SNPs selected for such panels are generally common in the population, which makes them more informative in comparisons between affected and unaffected individuals, but assessment of individual common SNPs ignores rare variants which are likely to have more impact, and also ignores epistatic interactions between loci (Manolio et al. 2009). There are approaches that enhance the identification of causal genes from genome-wide association study (GWAS) data, such as the simultaneous analysis of multiple variants associated with a gene (Neale and Sham 2004; Huang et al. 2011; Li et al. 2011) or a focus on SNPs associated with loci which show phenotype-based differences in expression (eQTLs or eSNPs) (Innocenti et al. 2011; Majewski and Pastinen 2011), and “next generation sequencing” allows the analysis of rare as well as common variants around a gene. However, sites involved in epigenetic control of gene expression may not be included in the set of loci in gene-based analysis, and the variation in expression of eQTLs may be due in part to mutations in epigenetic regions which may be somewhat distant from the gene itself (Ernst et al. 2011). The influence of remote regulatory elements is likely to be missed in targeted screening approaches which focus on candidate genes, whether through SNP analysis or sequencing. This is due in part to the lack of information on where these regions are located, a gap that is being addressed by initiatives such as the NIH Epigenomics Roadmap Program (http://nihroadmap.nih.gov/epigenomics/initiatives.asp) and the International Human Epigenome Consortium (http://www.ihec-epigenomes.org/). These are large collaborative efforts to map regions in the genome that are involved in epigenetic regulation, and the results will assist investigators in identifying regions for evaluation.

Heritable mutations that influence reading and language disorders could be in the genes that regulate epigenetic processes, analogous to the mutations in HDACs or DNMTs, or in genes such as MeCP2 or those in the MAPK signaling pathway (Day and Sweatt 2011). Alternatively, mutations could be in the DNA binding regions themselves. Genome-wide association studies or even targeted SNP analysis might be able to detect such mutations, provided that the sample size is large enough, the variants are not rare, and the adjacent SNPs are in linkage disequilibrium. Knowledge of the location of epigenetic regions could help prioritize the follow-up of SNPs in a GWAS that otherwise might be ignored because of lack of apparent functional relevance (Ernst et al. 2011), and location information would also guide the placement of SNPs in a targeted array. Sequence analysis would detect rare variants, but until whole genome sequencing of large populations is financially feasible, targeted sequencing studies are also dependent upon the selection of candidate genes and regulatory regions. Studies of epigenetic mechanisms in animal models should produce additional candidate genes for examination in cognitive disorders in humans.
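The prioritization step described above, in which GWAS SNPs are followed up because they fall within mapped epigenetic regions, amounts to an interval-overlap annotation. The following is a minimal sketch under stated assumptions: the SNP identifiers, coordinates, p-values, and region label are all invented for illustration, and ranking annotated SNPs ahead of unannotated ones is only one of several plausible schemes.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    label: str


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    position: int
    p_value: float


def annotate_snps(snps, regions):
    """Pair each SNP with labels of overlapping regions; annotated SNPs sort first."""
    annotated = []
    for snp in snps:
        hits = [
            region.label
            for region in regions
            if region.chrom == snp.chrom and region.start <= snp.position <= region.end
        ]
        annotated.append((snp, hits))
    # Sort key: SNPs with at least one hit first, then by ascending p-value.
    annotated.sort(key=lambda item: (len(item[1]) == 0, item[0].p_value))
    return annotated


# Hypothetical inputs, not real coordinates or association results.
regions = [Region("chr6", 1000, 3700, "acetylated_histone_region")]
snps = [
    Snp("snp_a", "chr6", 500, 1e-4),
    Snp("snp_b", "chr6", 2500, 1e-3),
    Snp("snp_c", "chr2", 2500, 5e-3),
]

ranked = annotate_snps(snps, regions)
for snp, hits in ranked:
    print(snp.snp_id, snp.p_value, hits)
```

In this toy example, the SNP lying inside the annotated region is ranked first even though another SNP has a smaller p-value, which illustrates how positional information could bring forward a variant that would otherwise be set aside.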

Another approach would be to look for genomic regions of abnormal methylation or histone modification in individuals with specific forms of language or learning disorders. However, epigenetic patterns are likely to be different in different tissues, and histone modifications especially may change over time. Fortunately, there are studies which indicate that methylation patterns affecting disorders can be consistent across tissues, such as lymphocyte and brain methylation patterns in individuals with psychiatric disorders, suggesting that lymphocyte tissues can be a good proxy for brain (Dempster et al. 2011). An abnormal methylation pattern in a region of DNA from human tissues such as lymphocytes or fibroblasts could indicate an epigenetic process that could be pursued further by determination of the effects of that abnormality on gene expression and the impact on learning in animal models. Such studies could be valuable in identifying important genes and signaling pathways involved in learning. Further genetic studies such as association and sequencing could assess the influence of these new candidates at the population level. Conversely, though, lack of a methylation abnormality in “proxy” tissues would not rule out the involvement of an epigenetic mechanism that is confined to a region of the brain.

There are many approaches to the identification of genes that affect quantitative traits such as language and learning disorders, and the most effective will take advantage of simultaneous genomic and expression analyses (Charlesworth et al. 2009). The inclusion of information on epigenetic mechanisms of gene regulation may turn out to be an important consideration in gene identification and possibly even in therapy.