Main text

There has been major progress in identifying genetic risk variants for complex brain traits including neurodegenerative diseases (for example Alzheimer’s disease and amyotrophic lateral sclerosis [1, 2, 3]) and neuropsychiatric illnesses (for example schizophrenia and major depressive disorder [4, 5, 6, 7]). A key challenge is to understand the biological effects of these genetic risk factors, especially because the actual gene(s) involved in mediating phenotypic variation are not necessarily the closest to the most significant genetic variant in genome-wide association studies (GWAS). The majority of GWAS variants do not directly index or tag coding changes affecting protein structure. Instead, common variants associated with disease are preferentially located in regulatory domains such as active enhancers and regions of open chromatin [8, 9], and therefore are hypothesized to act by influencing gene regulation [10]. There is, therefore, much interest in using epigenomic data to improve our understanding of how genetic variants associated with complex disease mediate differences in gene activity and regulation. Given the tissue-specific nature of gene regulation, it is critical that these relationships are explored in relevant tissues; existing epigenomic annotation data have largely been generated in easily accessible tissues and cells, or commercially available cell lines. In particular, datasets based on the human brain are lacking, limiting the downstream interpretation of GWAS findings for brain traits. Recently, we quantified genome-wide patterns of histone H3 lysine 27 acetylation (H3K27ac), a robust mark of active enhancers and promoters that is strongly correlated with gene expression and transcription factor binding, using ChIP-seq in an extensive collection of entorhinal cortex samples (n = 47) [11].
In this study, we used these data to perform enrichment analyses of GWAS variants for a range of brain traits (attention-deficit hyperactivity disorder (ADHD), Alzheimer’s disease, autism, amyotrophic lateral sclerosis (ALS), major depressive disorder, bipolar disorder and schizophrenia) using linkage disequilibrium (LD) score regression [12] to test the hypothesis that the majority of these variants act by influencing gene regulation in the brain.

Detailed methods on the experimental procedures and informatics pipeline used to derive the set of cortical H3K27ac peaks have been previously described [11]. Briefly, post-mortem entorhinal cortex samples from 47 donors were provided by the MRC London Neurodegenerative Disease Brain Bank (https://www.kcl.ac.uk/ioppn/depts/bcn/index.aspx). The entorhinal cortex, which is located in the medial temporal lobe, has an important role in memory formation and has been implicated in a range of neuropsychiatric and neurological phenotypes [13]. We annotated genome-wide patterns of H3K27ac in the entorhinal cortex using chromatin immunoprecipitation (ChIP) followed by highly parallel sequencing (ChIP-seq). After stringent quality control of the raw H3K27ac ChIP-seq data, we obtained a mean of 30,032,623 (SD = 10,638,091) sequencing reads per sample, representing the most extensive analysis of H3K27ac in the human entorhinal cortex yet undertaken. H3K27ac peaks were called from the combined set of high-quality mapped reads across all samples using MACS2 [14], and filtered to exclude those located on sex chromosomes, on unmapped contigs or in mitochondrial DNA. In total, we generated a final dataset of 178,454 autosomal entorhinal cortex H3K27ac peaks, which were used in the analyses presented here.
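
The peak-filtering step can be illustrated with a minimal sketch. It assumes peaks are held as (chromosome, start, end) tuples with UCSC-style chromosome names (chr1 to chr22, chrX, chrY, chrM and contig names such as chrUn_gl000220); the function name, the naming convention and the example coordinates are hypothetical, and this is not the pipeline described in [11].

```python
import re

# Assumption: UCSC-style names; only chr1..chr22 count as autosomes.
AUTOSOME = re.compile(r"^chr([1-9]|1[0-9]|2[0-2])$")


def keep_autosomal(peaks):
    """Return only peaks on autosomes.

    Drops sex chromosomes, mitochondrial DNA and unplaced contigs,
    because none of their names match the autosome pattern.
    """
    return [peak for peak in peaks if AUTOSOME.match(peak[0])]


# Hypothetical example peaks (chromosome, start, end)
peaks = [
    ("chr1", 1000, 1500),
    ("chrX", 200, 900),
    ("chrM", 10, 300),
    ("chrUn_gl000220", 50, 400),
    ("chr22", 5000, 5600),
]

autosomal = keep_autosomal(peaks)
print(autosomal)
```

Anchoring the pattern at both ends means that contig names that merely begin with an autosome label (for example chr1_gl000191_random) are also excluded.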

To test for enrichment of GWAS variants in H3K27ac peaks from adult cortex, we performed partitioned heritability analysis using the LD score regression software (https://github.com/bulik/ldsc) [12, 15]. Briefly, this method assumes that the test statistic for a given genetic variant also captures the effect of all other variants in LD with it; the number of additional variants tagged by the particular variant under consideration is measured by its ‘LD score’. Genuine polygenic effects are present, therefore, if the test statistics positively correlate with the LD scores. The method can be applied either across the genome to derive an estimate of total heritability or to subsets of genetic variants annotated to genomic features, so-called partitioned heritability. Enrichment is inferred if there is a stronger positive correlation between the test statistics and LD scores for variants within a category relative to other categories. LD scores were generated based on custom annotations derived from our H3K27ac peaks and 1000 Genomes reference data (downloaded alongside the software from https://data.broadinstitute.org/alkesgroup/LDSCORE/). The baseline model proposed by Finucane et al. [15] was taken as the background for enrichment testing; it is based on the union of non-specific functional annotation categories including coding, UTR, promoters, introns, histone marks (H3K4me1, H3K4me3, H3K9ac, H3K27ac), DNase I hypersensitivity site (DHS) regions, chromHMM/Segway predictions of underlying chromatin states derived from ENCODE annotations, regions that are conserved in mammals, super-enhancers and active enhancers. Genetic variants were annotated to two non-overlapping categories defined as follows: 1) entorhinal cortex H3K27ac peaks and 2) any other functional annotation category included in the baseline model.
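
The relationship between test statistics and LD scores can be illustrated with a toy calculation. The sketch below assumes the standard single-component form of LD score regression with no confounding, in which the expected chi-square statistic of a variant is 1 + N * h2 * l / M (N = sample size, M = number of variants, l = LD score). All values are hypothetical and noise-free, and the fit is an unweighted least-squares regression, whereas the ldsc software uses a weighted regression with block-jackknife standard errors.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy values, chosen only for illustration
N = 50_000       # GWAS sample size
M = 1_000_000    # number of variants in the regression
h2_true = 0.2    # heritability used to generate the toy statistics

ld_scores = [10.0, 50.0, 100.0, 200.0, 400.0]
# Noise-free expected chi-square statistics under the assumed model
chi2 = [1.0 + N * h2_true * l / M for l in ld_scores]

intercept, slope = ols(ld_scores, chi2)
h2_est = slope * M / N  # rescale the slope to a heritability estimate
print(intercept, h2_est)
```

Because the toy statistics contain no noise, the slope recovers the heritability used to generate them; with real GWAS summary statistics the estimate carries sampling error, which is why confidence intervals are reported.
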
Heritability statistics for each annotation category were then calculated using publicly available GWAS results for seven psychiatric and neurodegenerative traits (ADHD [16], Alzheimer’s disease [1], autism [17], ALS [2], major depressive disorder [7], bipolar disorder [5] and schizophrenia [4, 6, 18]) and 14 non-brain phenotypes (birth length [19], body mass index (BMI) [20, 21], height [21, 22], cigarettes per day [23], ever smoked [23], coronary artery disease [24], Crohn’s disease [25], inflammatory bowel disease [25], ulcerative colitis [25], high density lipoprotein (HDL) [26], low density lipoprotein (LDL) [26], total cholesterol [26], triglycerides [26] and type 2 diabetes [27]) (see Additional file 1: Table S1). Enrichment statistics for each GWAS trait were calculated as the proportion of heritability attributed to that category divided by the proportion of SNPs annotated to that category, with 95% confidence intervals used to identify significant enrichment statistics. These represent the enrichment relative to the set of more broadly defined functional elements derived from cross-tissue datasets included in the baseline model.
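
The enrichment statistic defined above is a ratio of two proportions, and can be sketched as follows. The function names and the heritability and SNP counts are hypothetical; the significance check assumes that an enrichment of 1 is the null value and that an estimate is significant when its 95% confidence interval excludes 1.

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Proportion of heritability divided by proportion of SNPs."""
    prop_h2 = h2_category / h2_total
    prop_snps = snps_category / snps_total
    return prop_h2 / prop_snps


def ci_excludes_null(lower, upper, null=1.0):
    """True if the confidence interval lies entirely on one side of the null."""
    return lower > null or upper < null


# Hypothetical example: a category holding 25% of SNPs explains 50% of heritability
example = enrichment(h2_category=0.12, h2_total=0.24,
                     snps_category=250_000, snps_total=1_000_000)
print(example)

# An interval of (1.05, 1.15) excludes the null; (0.90, 1.20) does not
print(ci_excludes_null(1.05, 1.15), ci_excludes_null(0.90, 1.20))
```

An enrichment above 1 indicates that a category explains more heritability than expected from the number of variants it contains.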

We first estimated the total heritability of each trait using variants annotated to any functional genomic annotation category to confirm that the included GWAS had sufficient power to quantify heritability with enough precision to permit downstream enrichment analyses. Across the seven brain traits, the total heritability estimates ranged from 0.0535 for ALS (95% confidence interval (0.0321, 0.0749)) to 0.237 for schizophrenia (95% confidence interval (0.214, 0.260)) (Fig. 1a). Next, we estimated the partitioned heritability attributable to variants located within entorhinal cortex H3K27ac peaks. This ranged from 0.0302 for Alzheimer’s disease (95% confidence interval (0.013, 0.0478)) to 0.146 for schizophrenia (95% confidence interval (0.121, 0.170)); all seven brain traits had significantly non-zero estimates of heritability within H3K27ac peaks (Table 1). Finally, we compared partitioned heritability estimates between entorhinal cortex H3K27ac peaks and more broadly defined functionally active regions of the genome identified across multiple cell types. For all seven brain traits, heritability was enriched within the entorhinal cortex H3K27ac peaks (Fig. 1b). The strongest enrichment was for ALS (enrichment = 2.20; 95% confidence interval (2.12, 2.27)), followed by autism (enrichment = 2.11; 95% confidence interval (2.05, 2.16)) and major depressive disorder (enrichment = 2.04; 95% confidence interval (1.92, 2.16)); the lowest enrichment was for Alzheimer’s disease (enrichment = 1.10; 95% confidence interval (1.05, 1.15)). Enrichments for all seven brain traits remained significant when correcting for the number of independent tests performed (Additional file 2: Table S2).
We next compared these results to those for the 14 non-brain phenotypes; although most were found to have non-zero heritability estimates for variants located within entorhinal cortex H3K27ac peaks, these were generally not enriched relative to functional elements defined across multiple tissue types. The exceptions were body mass index (BMI) (enrichment = 1.16; 95% confidence interval (1.13, 1.19)), ever smoked (enrichment = 2.07; 95% confidence interval (2.04, 2.10)), high density lipoprotein (HDL) (enrichment = 1.53; 95% confidence interval (1.45, 1.62)) and triglycerides (enrichment = 1.33; 95% confidence interval (1.24, 1.42)). These results are interesting given that both BMI and smoking are known to have a neurobiological component, and it is plausible that genetic variation associated with these traits may have mechanistic effects in the cortex.

Fig. 1

Enrichment of heritability within entorhinal cortex H3K27ac peaks. a Bar plot of total heritability estimates calculated across genetic variants located within any functional element. b Bar plot of cortical H3K27ac enrichment statistics. Enrichment was calculated as the proportion of heritability divided by the proportion of variants within autosomal H3K27ac peaks in the entorhinal cortex, relative to values for the set of more broadly defined functional elements derived from cross-tissue datasets. Error bars represent 95% confidence intervals; dashed horizontal lines indicate null values.

Table 1 Enrichment of heritability within entorhinal cortex H3K27ac peaks

In summary, we report an enrichment of heritability within active regions of regulatory and enhancer function in the adult entorhinal cortex for seven brain disorders. This augments an existing body of evidence that genetic variants identified in GWAS are involved in gene regulation [10]. Furthermore, our analysis uses regulatory domains defined in the relevant tissue and demonstrates that these regions are more informative than functional elements defined across a panel of tissues and cell types, highlighting the importance of generating cell-type and tissue-specific epigenomic annotation datasets. Although our data represent the largest entorhinal cortex H3K27ac dataset generated to date, we were restricted to performing a global enrichment analysis. Future analyses in larger numbers of samples should aim to undertake a genetic analysis of each peak and align these results with GWAS results in order to identify the specific peaks, and ultimately genes, associated with genetic variants identified in genetic studies of brain traits. There are a number of limitations to our study. First, although one of the strengths of our study is the use of cortical H3K27ac data, our ChIP-seq analyses were performed on bulk tissue, and future studies should aim to generate epigenomic annotation data for specific neural cell types [28]. Second, we have only considered one specific epigenetic mark, H3K27ac; future studies exploring a more comprehensive set of marks may yield insights into the exact mechanism by which genetic variants influence gene regulation. Third, the H3K27ac data were generated in elderly adult post-mortem brain, which may be less relevant for neurodevelopmental brain phenotypes such as autism, ADHD and schizophrenia. In conclusion, our results support the hypothesis that genetic variants associated with brain disorders exert their effect through gene regulation in the brain.
Future studies should aim to identify the specific regulatory elements affected by genetic variants associated with brain disorders and the genes that are transcriptionally altered by these differences.