Introduction

Mitochondria are unique organelles in that they have their own circular genome, approximately 16.6 kb in size [1]. Mitochondrial DNA (mtDNA) consists of 37 genes: 22 encoding transfer RNAs (tRNAs), two encoding ribosomal RNAs (rRNAs) and 13 encoding proteins important in the electron transport chain. Each of these 13 proteins is directly involved in the regulation of cellular respiration, generating the majority of ATP required for the process. However, mitochondria have an array of other important cellular roles such as calcium homeostasis [2] and neural stem cell differentiation [3]. As such, abnormal mitochondrial function, dynamics and trafficking have been associated with a number of brain disorders including Alzheimer’s disease [4, 5], schizophrenia [6], bipolar disorder [7] and major depressive disorder [8].

Epigenetic processes mediate the reversible regulation of gene expression, occurring independently of DNA sequence variation, acting principally through chemical modifications to DNA and nucleosomal histone proteins, and orchestrate a diverse range of important physiological functions. DNA methylation is the best characterized and most stable epigenetic modification modulating the transcription of mammalian genomes and, because it can be robustly assessed using existing genomic DNA resources, is the focus of most human epidemiological epigenetic research to date [9]. The most widely used method for epigenome-wide analysis of DNA methylation is the Illumina 450K methylation array, and a number of studies have recently shown differential DNA methylation of the nuclear genome (ncDNA) between different tissue types [10–12] and also in a range of complex diseases, from brain disorders such as Alzheimer’s disease [13–15] and schizophrenia [16, 17], to systemic diseases such as type 2 diabetes [18] and Crohn’s disease [19]. However, with no representation of the mitochondrial genome on this platform, as well as a lack of analysis on other genome-wide platforms, the role of mtDNA methylation has been largely neglected [20, 21].

Since the identification of 5-methylcytosine (5-mC) in mitochondria, research into mtDNA methylation as an independent and potentially relevant mark has received more regular attention [22, 23]. However, most research is focussed either on low-resolution, global DNA methylation or on candidate gene DNA methylation changes using techniques such as bisulfite pyrosequencing [20]. These recent publications have indicated that differences in mtDNA methylation are present in a variety of different phenotypes [24–29] and may have potential utility as a biomarker [30]. In addition, a recent study has explored the use of Methylated DNA Immunoprecipitation Sequencing (MeDIP-seq) to investigate changes in mtDNA methylation across 39 cell lines and tissues from publicly available data [31]. At present, genome-wide sequencing technologies have not yet been used to interrogate alterations in the mtDNA methylome across tissues in the same individuals.

A high proportion of current, publicly available, genome-wide DNA methylation data has been generated through the use of MeDIP-seq, a method designed to interrogate genome-wide changes in methylation at high throughput and low cost [32]. However, given the presence of nuclear-mitochondrial pseudogenes (NUMTs), regions of the nuclear genome that share a high sequence homology with their mitochondrial paralogue [33, 34], mitochondrial reads are often discarded from further analysis. The development of bioinformatic pipelines to investigate regions of differential mtDNA methylation from whole genome data would provide a novel way in which to interrogate the mtDNA methylome in publicly available data. Here, we control for the presence of NUMTs in a previously published MeDIP-seq dataset, to investigate differential DNA methylation across the mitochondrial genome in human post-mortem brain samples.

Results

MtDNA methylation patterns are correlated between the cortex, cerebellum and blood

To date, no study has investigated differences in mtDNA methylation across different matched regions of human brain and blood samples. Our sample (Table 1) consisted of MeDIP-seq data from three individuals, free of any neuropathology and neuropsychiatric disease, for five different regions of the cortex (Brodmann areas (BA) 8, 9 and 10, superior temporal gyrus (STG) and entorhinal cortex (ECX)), the cerebellum (CER) and pre-mortem blood [35]. Given that MeDIP-seq data has been generated from standardly extracted total genomic DNA and thus contains a mixture of ncDNA and mtDNA [36], we initially controlled for regions of high sequence homology between the two genomes within our data by realigning mtDNA reads to a series of custom reference genomes using an in-house pipeline (see the Methods section) to specifically analyze mtDNA methylation (Fig. 1). Briefly, after an initial alignment to the GRCh37 reference genome using BWA, uniquely mapped reads were extracted and aligned to a custom GRCh37 reference genome not containing the mitochondrial sequence. Reads that did not map to this custom genome were found to share less homology with the nuclear genome and were taken forward and realigned once more to the full reference genome. Initially, we were interested to investigate whether mtDNA methylation patterns across the mitochondrial genome were highly correlated between different tissue types. Using principal component analysis (PCA), we found that mtDNA methylation patterns are highly correlated between different cortical regions (r > 0.99, p < 2.2E−16), with a slightly weaker correlation between the cerebellum and cortex (r > 0.97, p < 2.2E−16) (Fig. 2). Due to the small number of blood samples available, a significance level for the correlations between the cerebellum and blood could not be derived. Instead, in an attempt to explore the similarity between matched blood and cerebellum samples, the direction of differential methylation with respect to the cortex was used. Here, we found that 93.1% of the windows analyzed in the cerebellum and blood had the same direction of methylation difference with respect to the cortex, further suggesting a strong correlation between the two tissue types.
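The direction-of-difference comparison described above can be illustrated with a minimal sketch. It assumes one value per window for each tissue (for example, a mean log2 RPKM); the function name `direction_concordance` and the toy values are hypothetical and are not taken from the original analysis.

```python
def direction_concordance(cortex, cerebellum, blood):
    """Fraction of windows in which cerebellum and blood differ from the
    cortex in the same direction (both higher or both lower).

    Each argument is a list with one value per window, in the same order.
    Windows where either difference is exactly zero have no direction and
    are skipped (an assumption made for this sketch).
    """
    if not (len(cortex) == len(cerebellum) == len(blood)):
        raise ValueError("all tissues must cover the same windows")

    compared = 0
    same_direction = 0
    for ctx, cer, bld in zip(cortex, cerebellum, blood):
        diff_cer = cer - ctx
        diff_bld = bld - ctx
        if diff_cer == 0 or diff_bld == 0:
            continue
        compared += 1
        if (diff_cer > 0) == (diff_bld > 0):
            same_direction += 1

    if compared == 0:
        raise ValueError("no windows with a defined direction")
    return same_direction / compared


# Toy example with invented values for four windows.
toy_cortex = [1.0, 2.0, 3.0, 4.0]
toy_cerebellum = [1.5, 1.0, 3.5, 4.5]
toy_blood = [2.0, 1.5, 2.5, 5.0]
toy_fraction = direction_concordance(toy_cortex, toy_cerebellum, toy_blood)
```

In the toy data, three of the four windows agree in direction, so the function returns 0.75; the 93.1% reported above would correspond to a returned value of about 0.931 on the real windows.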

Table 1 Demographic information
Fig. 1 Overview of the analysis pipeline

Fig. 2 MtDNA methylation patterns are correlated between the cortex, cerebellum and blood. Samples were ordered based upon the similarity of their principal components, RPKM values, with r calculated for the correlations between each tissue. BLD blood, BA8 Brodmann area 8, BA9 Brodmann area 9, BA10 Brodmann area 10, CER cerebellum, CTX cortex, ECX entorhinal cortex, STG superior temporal gyrus

Differentially methylated regions of the mitochondrial genome can be identified between anatomically distinct cortical regions and the cerebellum

Having identified correlated mtDNA methylation patterns across different brain regions, we were interested to investigate whether we could identify differentially methylated regions (DMRs) in the mitochondrial genome between different regions of the cortex and cerebellum. To identify such tissue-specific DMRs within the mitochondrial genome, paired t tests were performed across matched cortical and cerebellum samples at 100 bp windows across the mitochondrial genome (see the Methods section). In total, we identified 74 nominally significant DMRs (p < 0.05) between the five individual cortical regions and the cerebellum (Table 2; Fig. 3). Of these DMRs, seven (Table 2, bold face) were found to be present across all prefrontal cortex areas (BA8, BA9, BA10). Furthermore, the direction of methylation difference was maintained in all Brodmann area regions, with three conserved regions of hypomethylation and four conserved regions of hypermethylation, with respect to the cerebellum. Furthermore, four of the seven conserved regions were adjacent to each other within the mitochondrial displacement loop (D-Loop) (16201–16600 bp), a region associated with gene transcription and DNA replication.

Table 2 List of DMRs identified between five anatomically discrete cortical regions and cerebellum

Fig. 3 DNA methylation differences are seen in the mitochondrial genome between brain regions and blood. Average raw RPKM values across the mitochondrial genome for each individual cortical brain region alongside matched blood and cerebellum samples are shown in the top panel, with gene positions downloaded from GENCODE shown in the middle panel. For each 100 bp window, paired t tests were performed to compare each cortical brain region and the cerebellum, with -log10 (p) shown in the bottom panel. BLD blood, BA8 Brodmann area 8, BA9 Brodmann area 9, BA10 Brodmann area 10, CER cerebellum, CTX cortex, ECX entorhinal cortex, RPKM reads per kilobase of transcript per million mapped reads, STG superior temporal gyrus. Red dashed line denotes the Bonferroni significance, whilst blue dashed line denotes p < 0.05 in the lower panel

A number of differentially methylated regions in mtDNA can be observed between the cortex and cerebellum

We were also interested to see whether total cortical tissue was significantly different to matched cerebellum samples. Given the paired nature of the different anatomical regions of the cortex, we used a multilevel mixed effects model to compare total cortex to cerebellum (see the Methods section). This analysis revealed 48 nominally significant (p < 0.05) windows (Table 3; Fig. 4), of which eight passed the Bonferroni correction (Table 3, bold face). Interestingly, three of these eight were adjacent to each other, lying between 10301 and 10600 bp and covering MT-ND3/MT-ND4L and MT-TR. We also saw a Bonferroni significant difference in DNA methylation in the D-Loop, where we earlier noted DNA methylation changes across all three Brodmann area regions.

Table 3 List of DMRs identified between total cortex and cerebellum
Fig. 4 DNA methylation differences are seen in the mitochondrial genome between the cerebellum and cortex. RPKM values in the total cortex and cerebellum across the mitochondrial genome are shown in the top panel, with gene positions downloaded from GENCODE shown in the middle panel. For each 100 bp window, paired t tests were performed to compare the cortex to the cerebellum, with -log10 (p) shown in the bottom panel. BLD blood, BA8 Brodmann area 8, BA9 Brodmann area 9, BA10 Brodmann area 10, CER cerebellum, CTX cortex, ECX entorhinal cortex, RPKM reads per kilobase of transcript per million mapped reads, STG superior temporal gyrus. Red dashed line denotes the Bonferroni significance, whilst blue dashed line denotes p < 0.05 in the lower panel

MtDNA methylation patterns can distinguish between tissue types

Although we have shown that mtDNA methylation patterns are highly similar between distinct anatomical regions of the human brain and blood, we were also interested to identify whether mtDNA methylation patterns could distinguish between these tissue types. Through unsupervised hierarchical clustering, we showed that average mtDNA methylation patterns can segregate these tissues (Fig. 5a). Importantly, ncDNA methylation profiles in the same samples have also been previously shown to separate the cortex, cerebellum and blood [35]. Interestingly, when we performed unsupervised hierarchical clustering on the individual samples, we found that, in most cases, intra-individual differences across tissue types are greater than inter-individual differences within each tissue type, as the cortex, cerebellum and blood samples clustered with their own tissue type, respectively (Fig. 5b).
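To make the clustering step concrete, the following is a minimal sketch of agglomerative clustering on Euclidean distances. The text does not state which linkage was used; complete linkage is assumed here because it is the default of the R function hclust. The function names and the two-window toy profiles are hypothetical and are not values from this study.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Naive agglomerative clustering with complete linkage.

    profiles maps a sample label to its list of per-window values.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of labels.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Complete linkage: distance between the two farthest members.
            d = max(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
        merges.append((c1, c2, d))
    return merges


# Invented profiles: two cortical regions, cerebellum and blood.
toy_profiles = {
    "BA9": [5.0, 3.0],
    "BA10": [5.1, 3.1],
    "CER": [2.0, 6.0],
    "BLD": [2.5, 6.5],
}
toy_merges = complete_linkage(toy_profiles)
```

With these invented values the two cortical profiles merge first, cerebellum and blood merge second, and the two groups join last. A dendrogram such as Fig. 5 is a drawing of this merge order and the associated distances.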

Fig. 5 MtDNA methylation patterns can distinguish between tissue types. a Average RPKM values for each cortical brain region, cerebellum and blood samples were clustered based upon the Euclidean distance, identifying two major clusters; the cortex and blood-cerebellum. b When clustering RPKM values in the individual samples from the cortex, cerebellum and blood, we observed that individual cortex samples clustered together, whilst cerebellum and blood samples formed separate clusters. This highlights that tissue-specific differences between the cortex, cerebellum and blood are greater than intra-individual variability within a tissue. BLD blood, BA8 Brodmann area 8, BA9 Brodmann area 9, BA10 Brodmann area 10, CER cerebellum, CTX cortex, ECX entorhinal cortex, RPKM reads per kilobase of transcript per million mapped reads, STG superior temporal gyrus

Discussion

Publicly available epigenomic data provide a great resource for mitochondrial epigenetics, a field that is relatively nascent and has yet to be thoroughly investigated in a range of complex diseases. Here, we present evidence that methylation patterns across mtDNA are brain region specific. However, data such as those presented here are confounded by a lack of isolation of mtDNA prior to antibody enrichment and sequencing. As such, the potential inclusion of NUMTs in datasets generated from total genomic DNA could lead to misleading results. Here, we controlled for regions of high sequence homology between the nuclear and mitochondrial genomes. However, this approach is likely over-conservative and does lead to the generation of a somewhat truncated consensus sequence. PCA of the mitochondrial epigenome after corrections for nuclear homology was able to separate individuals belonging to the three main tissue types, the blood, cortex and cerebellum, based on mtDNA methylation variation among tissue types. This tissue specificity is further highlighted by the identification of eight DMRs that pass the Bonferroni correction for multiple testing between total cortex and cerebellum. MtDNA methylation has previously been shown to be cell line dependent [31]. Although overall DNA methylation levels were low in all tissues, it is worth noting that the study was performed on non-bisulfite-treated DNA. As such, the low percentage of mtDNA methylation cannot be attributed to incomplete bisulfite conversion. One limitation of the current study is the lack of publicly available MeDIP-seq datasets of matched cortical and cerebellum tissue from other cohorts for validation purposes. Future work would aim to replicate our findings in additional study cohorts and also to investigate the relationship between mitochondrial DNA methylation and gene expression.

Despite a number of nominally significant windows being identified between each individual cortical region and the cerebellum, these did not pass the Bonferroni correction, although it is likely that this method is too stringent. Nevertheless, the conservation of seven nominally significant windows across each Brodmann area is interesting to note. Four of these windows lie adjacent to each other and correspond to the mitochondrial D-Loop, a region containing the only two mitochondrial promoters, which is typically associated with gene transcription and DNA replication. However, one limitation of this study stems from the use of antibody-based enrichment, which restricts the analysis to a window-based approach. Despite this, studies of the nuclear genome have shown high correlation between window-based approaches and more sensitive single-site assays such as the Illumina 450K beadarray [32]. However, given the small size of the mitochondrial genome and that 23 of the 37 genes present in the genome are below 100 bp in size, this window-based approach may not be the most appropriate for future studies designed to specifically assess mtDNA methylation, as it can result in a window intersecting two genes in the polycistronic transcript.

Conclusions

This method provides a conservative approach to determine mtDNA methylation across the genome for data previously generated using next-generation sequencing approaches such as MeDIP-seq. Its conservative nature reduces the risk of including NUMTs in the final analysis of whole genome data but may also introduce false negatives as well as potential gaps in the reference sequence. As such, it is best suited to analyzing previously generated whole genome data and is not a replacement for the isolation of mitochondrial DNA [36] prior to targeted methylation studies, which would be the optimal approach for investigating mitochondrial epigenomics. However, our method has allowed the identification of novel brain-region-specific DMRs in a previously generated publicly available dataset. Furthermore, the identification of brain-region-specific mtDNA methylation patterns across the mitochondrial epigenome suggests the importance of a focussed, tissue-specific study design when investigating mtDNA methylation. As previously discussed, one caveat when utilizing MeDIP-seq data is the segregation of data into neighbouring windows, meaning that determining the exact corresponding gene of a DMR is difficult. As such, future studies should aim to sequence the mitochondrial DNA methylome at single-base resolution to address this.

Methods

Data collection

We utilized publicly available MeDIP-seq data from Davies et al. [35]. In brief, this data was generated using 5 μg fragmented gDNA, which, following end repair, <A> base addition and adaptor ligation, was immunoprecipitated using an anti-5-mC antibody (Diagenode, Liège, Belgium). MeDIP DNA was purified and then amplified using adaptor-mediated PCR, with DNA fragments between 220 and 320 bp subjected to highly parallel 50 bp paired-end sequencing on the Illumina HiSeq platform. The paired-end, raw fastq files were provided by the authors and quality checked using FastQC. Sample information is provided in Table 1.

Quality control and NUMT exclusion

Fastq files were subjected to adaptor trimming and Phred score trimming (q < 20). In an attempt to remove any potential contamination by NUMTs, multiple alignments to the reference genome were undertaken. Paired fastq files were aligned to GRCh37 using BWA. Uniquely mapped reads aligning to the mitochondrial genome were then re-mapped to a custom GRCh37 reference without the mitochondrial chromosome. Reads not mapping to the custom reference were then taken forward and realigned to the full GRCh37 reference to eliminate the possibility of homologous regions mapping falsely to the mitochondrial genome (Fig. 1). All alignments were carried out using BWA mem with default settings. Reads per kilobase of transcript per million mapped reads (RPKM) values for each sample were calculated using the MEDIPS package [37]. Methylation was averaged across 100 bp non-overlapping windows (default parameter setting in MEDIPS), and only windows with read counts >10 were considered for analysis. Due to the non-normal distribution of all cohorts, RPKM values were log2 transformed before statistical analysis.
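The read-selection logic and the window-level quantification described above can be sketched as follows. This is an illustration only: the alignments themselves were run with BWA, whereas this sketch operates on sets of read identifiers assumed to have been extracted from each alignment. All function and parameter names are hypothetical, the final intersection with reads that map to the mitochondrial genome in the last alignment is our reading of the last step, and the RPKM function uses the standard definition, which may differ in detail from the MEDIPS implementation.

```python
import math


def select_mito_reads(first_pass_mito_unique, mapped_to_nuclear_only, final_pass_mito):
    """Return read IDs retained as mitochondrial after NUMT filtering.

    first_pass_mito_unique: reads uniquely mapped to the mitochondrial
        genome in the initial alignment to the full reference.
    mapped_to_nuclear_only: those reads that still map when the
        mitochondrial chromosome is removed from the reference.
    final_pass_mito: reads mapping to the mitochondrial genome after
        realignment to the full reference (assumed final filter).
    """
    candidates = set(first_pass_mito_unique) - set(mapped_to_nuclear_only)
    return candidates & set(final_pass_mito)


def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def log2_rpkm_per_window(window_counts, total_mapped_reads, window_bp=100, min_count=10):
    """log2 RPKM for each window whose read count exceeds min_count.

    window_counts maps a window start position to its read count.
    """
    result = {}
    for start, count in window_counts.items():
        if count > min_count:
            result[start] = math.log2(rpkm(count, window_bp, total_mapped_reads))
    return result


# Toy example with invented read IDs and counts.
kept = select_mito_reads({"r1", "r2", "r3", "r4"}, {"r2"}, {"r1", "r3", "r5"})
toy_windows = log2_rpkm_per_window({1: 32, 101: 10, 201: 4}, total_mapped_reads=10_000_000)
```

In the toy data, reads r1 and r3 are retained, and only the first window is quantified because the other two do not exceed ten reads. Whether `total_mapped_reads` should count all mapped reads or only retained mitochondrial reads is not specified in the text and is left as a parameter.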

Statistical analyses

All analyses were performed in the R statistical environment version 3.2.1 [38]. For all analyses, a nominally significant threshold of p < 0.05 and a Bonferroni significant threshold of p < 7.04E−04 were used. Given the matched sample nature of this cohort, two-tailed, paired t tests were performed at each window along the mitochondrial genome to identify DMRs between the individual cortical regions and cerebellum. To compare the total cortex to cerebellum, we performed a multilevel mixed effects model in the lme4 package in R [39], using the brain region as the random effect and individual as the fixed effect. To assess the similarity of the brain regions, we used the R function “hclust” to cluster average RPKM values for the brain regions using the Euclidean distance. We used the R function “corrgram” within the corrgram package [40] to order samples based upon the similarity of their principal components.
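As a worked illustration of the thresholds and the per-window test, the sketch below computes a Bonferroni threshold and a two-tailed paired t test. Two assumptions are made and should be read as such: the number of tests (71) is inferred from 0.05 / 7.04E−04 and is not stated in the text, and the p value uses the closed-form Student t distribution for two degrees of freedom, which applies only when there are three matched pairs, as with the three individuals here. The original analysis was run in R; the function names below are hypothetical.

```python
import math
from statistics import mean, stdev


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def two_sided_p_df2(t):
    """Two-tailed p value for Student's t with exactly 2 degrees of freedom.

    Uses the closed form F(t) = 1/2 + t / (2 * sqrt(2 + t^2)),
    which is valid only for df = 2 (three matched pairs).
    """
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)


# 71 tests is an inferred value, not one reported in the text.
threshold = bonferroni_threshold(0.05, 71)

# Invented log2 RPKM values for one window in three matched individuals.
toy_cortex = [5.0, 6.0, 7.0]
toy_cerebellum = [4.0, 4.0, 4.0]
t_stat, df = paired_t_statistic(toy_cortex, toy_cerebellum)
p_value = two_sided_p_df2(t_stat)
```

With 71 tests the threshold is approximately 7.04E−04, matching the value used above. The toy window gives t of about 3.46 on 2 degrees of freedom and a two-tailed p of about 0.074, which would not reach nominal significance; with only three pairs, a very large t statistic is needed to pass the Bonferroni threshold.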