Background

Aging is characterized by the gradual decline of physiological function over time. The human aging process is a major risk factor for cancer, diabetes, cardiovascular diseases and neurodegenerative disorders [1]. Therefore, there is growing interest in understanding the cellular and molecular mechanisms underlying aging. In recent years, studies have described several hallmarks of aging including cellular senescence [2], telomere attrition [3, 4], gene expression changes [5, 6], dysregulation of nutrient sensing [7] and epigenetic modifications [8]. Specifically, DNA methylation (DNAm) is proposed to play an important role in the aging process, with DNAm-based biomarkers commonly used as predictors of age and age-related health outcomes [9, 10].

Previous epigenome-wide association studies (EWAS) in whole blood [11,12,13,14,15,16,17,18,19,20], saliva [11], adipose [21], brain [22, 23], and breast [24] have shown that age is associated with DNAm across thousands of cytosine-guanine (CpG) dinucleotides in the human genome. Specifically, DNAm at the promoter of ELOVL2, an enzyme involved in fatty acid elongation, has been found to be significantly associated with chronological age across different populations, tissue types, and studies [11, 12, 17, 21, 25]. While this is an example of an age-associated change that occurs in many tissue types, studies have also identified age-associated CpG sites that are differentially methylated only in specific tissue types [25,26,27]. Nonetheless, the patterns of enrichment of age-associated DNAm changes within specific genomic features, such as CpG islands (CGI) and chromatin states, are found to be similar across most tissue types [25, 28]. The effects of aging on gene expression have also been well-characterized, with discovery of age-altered genes involved in inflammation, metabolism, cancer, and mitochondrial activity [29, 30]. Therefore, in addition to characterizing the tissue-specific and non-specific variation in age-associated DNAm patterns, understanding the functional consequences of epigenetic aging, via regulation of gene expression, remains an area of interest. However, such studies are often limited by the lack of both DNAm and gene expression data collected from the same tissue samples.

In this study, we assessed the association between age and genome-wide measures of DNAm for 961 tissue samples representing 9 tissue types (lung, colon, ovary, prostate, whole blood, testis, kidney, muscle and breast) from the Genotype-Tissue Expression (GTEx) project. Additionally, to gain further insights into the consequences of epigenetic aging, we investigated the correlation between age-associated DNAm and local gene expression changes to identify age-associated expression quantitative trait methylation (age-eQTMs).

Materials and methods

The genotype-tissue expression project (GTEx)

The GTEx project is a publicly available biobank that has collected multiple unique tissue types (up to 54 types) from ~ 960 post-mortem donors. Medical history of donors and characteristics of tissue samples were extensively reviewed, and any sample with suggestive findings (e.g., cancer) was excluded from the GTEx normal database [31]. The GTEx v8 database consists of RNA sequencing and genotyping data from 838 donors (17,382 samples from 52 tissue types) [32]. The database additionally provides metadata collected through questionnaires (e.g., sex, age, race/ethnicity) as well as measurements of ischemic time for all samples.

We collected DNAm measurements for 961 tissue samples representing 9 tissue types (colon, lung, ovary, prostate, skeletal muscle, kidney, whole blood, breast, and testis). Tissue types were selected based on several criteria including relevance to cancer (colon, lung, prostate, kidney, breast, testis), unique aging biology (breast, testis, muscle), and common use in epidemiological studies (whole blood). With resources to collect DNAm for ~ 1000 samples, we selected a larger number of samples for tissue types of greater public health interest (lung, colon, ovary), which also allowed us to assess the effect of sample size on the power to detect DNAm quantitative trait loci (mQTLs), as previously reported [33].

DNA methylation measurement and quality control

DNA extraction from 1000 unique GTEx tissue samples was performed using the Qiagen Gentra Puregene method at GTEx Laboratory Data, Analysis and Coordinating Center (LDACC). The extracted DNA was shipped to the University of Chicago. The 1000 samples represent 424 GTEx donors and 9 tissue types. For each tissue type, all samples were obtained from distinct donors.

For the 1000 unique DNA samples, DNAm at > 850,000 CpG sites was measured using the Infinium MethylationEPIC array (Illumina, San Diego, CA, USA) at the University of Chicago. All DNA samples were prepared and analyzed following the manufacturer’s guidelines and protocols. For quality control, we excluded 3 samples with undetectable methylation values (detection P > 0.01) in ≥ 5% of CpG sites, 6 samples with mismatched sex, and 14 samples that did not clearly cluster with their tissue type. Using the measurements of 59 high-frequency SNPs in the EPIC array, we identified one sample that did not match the donor’s existing genotype data and excluded it. We also excluded 15 breast tissue samples obtained from men. Following quality control, 961 samples remained for analysis (representing 9 tissue types and 417 GTEx donors).
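The sample-level detection filter and the exclusion tally can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and toy detection P-values are hypothetical, and the tally assumes the exclusion categories do not overlap (which is what makes the counts sum to 961).

```python
def failed_fraction(detection_p, p_cutoff=0.01):
    """Fraction of CpG sites in one sample with detection P above the cutoff."""
    failed = sum(1 for p in detection_p if p > p_cutoff)
    return failed / len(detection_p)


def keep_sample(detection_p, p_cutoff=0.01, max_failed=0.05):
    """Keep a sample only if fewer than 5% of its sites are undetected."""
    return failed_fraction(detection_p, p_cutoff) < max_failed


# Toy data (hypothetical): 100 sites per sample.
good_sample = [0.001] * 98 + [0.2] * 2   # 2% of sites undetected
bad_sample = [0.001] * 90 + [0.2] * 10   # 10% of sites undetected

# Exclusion counts as reported in the text, assumed non-overlapping.
exclusions = {
    "undetectable_methylation": 3,
    "mismatched_sex": 6,
    "tissue_cluster_outlier": 14,
    "genotype_mismatch": 1,
    "male_breast_tissue": 15,
}
remaining = 1000 - sum(exclusions.values())
```

A sample with exactly 5% undetected sites is dropped under this rule, matching the "≥ 5%" criterion stated above.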

For quality control of CpG sites, we followed guidelines from Pidsley et al. [34]. CpG sites measured by probes with potential non-specific binding (43,254 sites), sites overlapping genetic variants or variants that overlap single-base extension sites for Type 1 probes (7708 sites), sites mapping to the sex chromosomes (16,037 sites), and poorly performing sites based on guidance from Illumina (167 sites) were excluded. We also excluded CpGs that had detection P > 0.01 in at least one sample (44,135 sites). A total of 754,119 CpGs passed QC and were retained for analyses. Genomic positions for all CpGs (and for all SNP and gene expression analyses described below) are mapped to human reference genome build hg19/GRCh37.

GTEx gene expression data

Gene expression data, collected via RNA-sequencing, from GTEx v8 were obtained from the GTEx portal. The expression values for each gene were estimated as reads per kilobase of transcript per million mapped reads (RPKM) using RNA-SeQC. The GTEx v8 dataset provides expression levels recorded as both read counts and transcripts per million (TPM) [35].

Read counts from these genes were normalized across samples using the Trimmed Mean of M-values (TMM) normalization method in edgeR to generate TMM-normalized TPM for each gene [36]. Following TMM normalization, genes were selected based on the expression threshold of > 0.1 TPM in at least 20% of samples and ≥ 6 reads in at least 20% of the samples. We restricted analyses to the fully processed, filtered, and normalized autosomal genes from the GTEx v8 dataset, which resulted in 26,095 genes expressed in lung (n = 546), 25,379 genes expressed in colon (n = 406), 25,026 genes expressed in breast (n = 459), 24,028 genes expressed in kidney (n = 90), 20,356 genes expressed in muscle (n = 803), 24,472 genes expressed in ovary (n = 180), 25,680 genes expressed in prostate (n = 245), 33,923 genes expressed in testis (n = 361), and 20,315 genes expressed in whole blood (n = 755).
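The expression threshold above can be written as a simple per-gene predicate. The sketch below is illustrative only; the function name and the toy vectors are hypothetical, and the real filtering was done as part of the GTEx v8 processing rather than with this code.

```python
def passes_expression_filter(tpm, counts, min_tpm=0.1, min_reads=6, min_frac=0.20):
    """Return True if a gene meets both thresholds in at least 20% of samples.

    tpm    -- per-sample TPM values for one gene
    counts -- per-sample read counts for the same gene, in the same sample order
    """
    n = len(tpm)
    frac_tpm = sum(1 for v in tpm if v > min_tpm) / n
    frac_reads = sum(1 for c in counts if c >= min_reads) / n
    return frac_tpm >= min_frac and frac_reads >= min_frac


# Toy genes measured in 10 samples (hypothetical values).
gene_kept = passes_expression_filter(
    tpm=[0.5, 0.5] + [0.0] * 8, counts=[10, 10] + [0] * 8
)  # expressed in 2 of 10 samples (20%)
gene_dropped = passes_expression_filter(
    tpm=[0.5] + [0.0] * 9, counts=[10] + [0] * 9
)  # expressed in 1 of 10 samples (10%)
```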

Association of age with DNAm and gene expression

Beta values for each CpG were logit-transformed into M-values prior to analyses using the formula: log2[beta/(1 − beta)]. The association between chronological age and DNAm at each CpG site was assessed using a linear model implemented by the R (R/4.2.1) package limma (3.54.2) [37]. Sex, BMI, race/ethnicity, ischemic time, batch/place, and surrogate variables (SVs) were included as covariates in our model. The R (R/4.2.1) sva (3.46.0) package [38] was used to generate the SVs for each tissue type. We included the age variable in the full model matrix but omitted the age variable from the null model matrix to prevent the effects of age from being captured by SVs. The resulting SVs were used to control for unknown sources of variability (e.g., technical variation and cell type composition). As a rule, we adjusted for 10 SVs for tissue types with n > 100 and 5 SVs for tissue types with n < 100.
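The beta-to-M transformation given above, and its inverse, can be sketched as follows. The clamping constant `eps` is an assumption added here to keep the logarithm finite when beta is exactly 0 or 1; the text does not say how such values were handled.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M-value = log2(beta / (1 - beta)); beta is clamped to [eps, 1 - eps]."""
    b = min(max(beta, eps), 1 - eps)
    return math.log2(b / (1 - b))


def m_to_beta(m):
    """Inverse transformation: beta = 2^M / (2^M + 1)."""
    return 2 ** m / (2 ** m + 1)


# A half-methylated site maps to M = 0; M grows without bound as beta nears 1.
m_half = beta_to_m(0.5)
m_high = beta_to_m(0.8)
```

Because M-values are unbounded and less heteroscedastic at the extremes than beta values, they are a common choice as the response in linear models of methylation.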

Similarly, association between age and expression for each gene was estimated using a linear model implemented in limma, adjusting for sex, BMI, race/ethnicity, ischemic time, and 10 SVs (created using expression data).

Enrichment and pathway analyses for age-associated CpG sites

We first selected age-associated CpG sites (false discovery rate [FDR] 0.05) found in more than one tissue type, without considering directionality. We then compared the percentage of clock CpGs from Horvath [39] (353 CpGs) and AltumAge [40] (20,318 CpGs) that are age-associated to the percentage of CpGs from our EWAS (754,119 CpGs) that we identified as age-associated in multiple tissue types (n > 1). We computed Fisher’s exact P-values for these comparisons.

For each tissue type, we compared the distribution of age-associated CpG sites (FDR 0.05) assigned to CpG island, shore, shelf, and open sea (based on Illumina annotations) to the distribution in the entire Infinium MethylationEPIC array (754,119 CpG sites) using chi-square tests. We assessed enrichment of age-associated CpG sites (FDR 0.05) among chromatin segmentation features. CpG sites were assigned to chromatin segmentation features using reference data from the Roadmap Epigenomics project database [41]. Background CpGs in this analysis were all CpGs assayed in the Infinium MethylationEPIC array (754,119 CpG sites). We performed this analysis for GTEx tissue types that have a closely matched reference dataset in Roadmap Epigenomics (primary tissue colonic mucosa, primary tissue lung, primary culture vHMEC mammary epithelial, and primary tissue ovary). We used the R (R/4.2.1) package ‘oddsratio’ to calculate enrichment and Fisher’s exact P values.
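To make the enrichment statistic concrete, the sketch below computes an odds ratio and a two-sided Fisher’s exact P value for a 2 × 2 table (age-associated or not, by inside or outside a feature). It is a plain-Python illustration, not the R code used in the study; the function names and the toy counts are hypothetical, and the two-sided convention (summing all tables no more probable than the observed one) is an assumption.

```python
import math


def _log_comb(a, b):
    """log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def _log_hypergeom_pmf(k, total, row1, col1):
    """log P(X = k) for the top-left cell given fixed table margins."""
    return (_log_comb(row1, k) + _log_comb(total - row1, col1 - k)
            - _log_comb(total, col1))


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); undefined if b or c is zero."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P for the table [[a, b], [c, d]]."""
    total, row1, col1 = a + b + c + d, a + b, a + c
    lo, hi = max(0, col1 - (total - row1)), min(row1, col1)
    log_obs = _log_hypergeom_pmf(a, total, row1, col1)
    p = 0.0
    for k in range(lo, hi + 1):
        lp = _log_hypergeom_pmf(k, total, row1, col1)
        if lp <= log_obs + 1e-7:  # small tolerance for floating-point ties
            p += math.exp(lp)
    return min(p, 1.0)


# Toy table (hypothetical counts):
#                      in feature   not in feature
# age-associated            30             70
# not age-associated       100            800
toy_or = odds_ratio(30, 70, 100, 800)
toy_p = fisher_exact_two_sided(30, 70, 100, 800)
```

An odds ratio above 1 indicates that age-associated CpGs fall in the feature more often than background CpGs do.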

CpGs were assigned to genes (based on Illumina annotations), and genes were assigned to pathways and biologic processes using the KEGG pathways (~ 330 pathways) [42]. We conducted gene set enrichment analysis (GSEA) using the “gsameth” function in the R package (R/4.2.1) missMethyl [43] for all tissue types using age-associated CpG sites (FDR 0.05). This function accounts for the potential bias in GSEA due to the number of CpGs per gene by computing prior probabilities and evaluates enrichment using a hypergeometric test. Enriched gene sets were defined as those passing FDR 0.05.
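For intuition, an unadjusted hypergeometric over-representation test is sketched below. It deliberately omits the per-gene CpG-count bias correction that gsameth applies, so it should be read as the uncorrected baseline rather than a reimplementation; the function name and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_background, n_in_pathway, n_selected):
    """P(X >= k): chance of drawing at least k pathway genes when
    n_selected genes are sampled without replacement from the background."""
    top = min(n_in_pathway, n_selected)
    numerator = sum(
        comb(n_in_pathway, i) * comb(n_background - n_in_pathway, n_selected - i)
        for i in range(k, top + 1)
    )
    return numerator / comb(n_background, n_selected)


# Toy example (hypothetical): 8 background genes, 4 in the pathway,
# 4 genes carry age-associated CpGs, and 3 of those are in the pathway.
p_enriched = hypergeom_upper_tail(3, 8, 4, 4)
```

Genes with many assayed CpGs are more likely to contain at least one significant site by chance, which is why a correction of the kind described above is needed in practice.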

Identification of age-related expression quantitative trait methylation (eQTMs)

For age-associated CpG sites, we investigated the association of DNAm with expression of nearby genes (with age-associated expression). Using bedtools intersect, we assigned CpG-gene pairs if a CpG site overlapped a gene region ± 10 Kb [44]. The 10 Kb flanking regions of genes were assigned using Galaxy.org’s get flanks tool [45]. Gene body annotations are based on GENCODE version 26 (https://www.gencodegenes.org/human/release_26.html). Note that each CpG site could be assigned to multiple genes and each gene could have multiple CpG sites assigned to it. We then assessed the association between DNAm and gene expression levels using R’s lm function, adjusting for sex, BMI, EPISCORE cell-type estimates (breast, colon, lung, and prostate), ischemic time, sample group, and race/ethnicity. EPISCORE is a methylation-based method to estimate cell-type composition [46]. We used the “wRPC” function, with the pan-tissue DNAm atlas as input for the reference dataset. The pan-tissue DNAm atlas includes reference data for breast, colon, lung, kidney, and prostate. Kidney showed a lack of age-associated signals and was not analyzed further; the other four tissue types were analyzed to detect age-eQTMs. eQTMs (P < 0.05) where both the CpG site (P < 1e−3, FDR < 0.05) and expression of the associated gene (P < 1e−3, FDR < 0.05) are strongly associated with age were considered age-eQTMs. In total, 149 lung, 86 colon, 50 prostate, and 30 breast samples had both DNAm and expression data.
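The ± 10 Kb assignment and the age-eQTM criteria can be sketched as below. This is a simplified stand-in for the bedtools and Galaxy steps: coordinates are treated as 1-based and inclusive, strand is ignored, and all identifiers (`cpgA`, `GENE1`, and so on) and function names are hypothetical.

```python
def assign_cpgs_to_genes(cpgs, genes, flank=10_000):
    """Pair each CpG with every gene whose body +/- flank contains it.

    cpgs  -- list of (cpg_id, chrom, position)
    genes -- list of (gene_id, chrom, start, end)
    """
    pairs = []
    for cpg_id, cpg_chrom, pos in cpgs:
        for gene_id, gene_chrom, start, end in genes:
            if cpg_chrom == gene_chrom and start - flank <= pos <= end + flank:
                pairs.append((cpg_id, gene_id))
    return pairs


def is_age_eqtm(eqtm_p, cpg_age_p, cpg_age_fdr, gene_age_p, gene_age_fdr):
    """Apply the thresholds stated in the text to one CpG-gene pair."""
    cpg_ok = cpg_age_p < 1e-3 and cpg_age_fdr < 0.05
    gene_ok = gene_age_p < 1e-3 and gene_age_fdr < 0.05
    return eqtm_p < 0.05 and cpg_ok and gene_ok


# Toy annotation (hypothetical coordinates).
toy_cpgs = [("cpgA", "chr1", 5_000), ("cpgB", "chr1", 25_000), ("cpgC", "chr2", 5_000)]
toy_genes = [("GENE1", "chr1", 10_000, 20_000), ("GENE2", "chr1", 28_000, 50_000)]
toy_pairs = assign_cpgs_to_genes(toy_cpgs, toy_genes)
```

In the toy data, `cpgB` lies within 10 Kb of both genes and is therefore paired with each, illustrating the many-to-many mapping noted above. The nested loop is quadratic and only suitable for small inputs; interval-based tools are the practical choice at array scale.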

Results

Summary of GTEx tissue samples

We generated DNAm data for 961 unique methylomes from 417 donors, spanning 9 tissue types (Table 1). The sample sizes for the tissues ranged from 38 (breast) to 223 (lung and colon). The number of tissue samples collected per donor ranged from 1 to 6. The age distribution for each tissue type ranged from 20 to 70 years (Supplementary Fig. 1), with the mean age of sample donors being 53.68 years (SD 12.67). For tissues that are not sex-specific, approximately 70% of samples collected were from male donors, and 86.3% of donors self-reported as white.

Table 1 Characteristics of GTEx tissue samples used for DNA methylation analysis

Identification of age-associated differentially methylated CpG sites

We performed an EWAS examining the relationship between age and genome-wide DNAm levels, adjusting for sex, race/ethnicity, SVs, and other covariates. Our analysis identified age-associated differentially methylated CpG sites passing the Bonferroni threshold, FDR 0.01 (Supplementary Table 1), and FDR 0.05 (Table 2) in all tissues except skeletal muscle. The tissue type with the highest number of age-associated CpG sites was ovary (n = 157), with 134,986 CpG sites passing FDR 0.05 (P < 0.009). The signals identified in ovary account for over half of the tissue-specific hypermethylated sites (60.2%) and 39.5% of the tissue-specific hypomethylated sites. The lowest number of age-associated CpG sites was found for kidney (n = 50), with 41 CpG sites passing FDR 0.05 (P < 2.7e−6) (Supplementary Data Files 1–8; Supplementary Fig. 2). For subsequent analyses, we focused on CpG sites passing FDR 0.05. The alternative, Bonferroni correction, is conservative and would miss many true signals that are useful for pathway analyses. FDR control provides better sensitivity to detect true associations while allowing only a small percentage of detected associations to be false positives.
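The trade-off between Bonferroni and FDR control can be illustrated with a short sketch. The Benjamini-Hochberg procedure is assumed here as the FDR method (the text does not name one), and the P values are toy numbers, not results from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


toy_p = [0.001, 0.012, 0.02, 0.03, 0.6]  # hypothetical P values
toy_q = benjamini_hochberg(toy_p)
n_fdr = sum(1 for q in toy_q if q < 0.05)
n_bonf = sum(1 for p in toy_p if p < bonferroni_threshold(0.05, len(toy_p)))
```

In this toy set, more tests pass FDR 0.05 than pass the Bonferroni threshold, which is the sensitivity gain described above.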

Table 2 Number of age-associated CpG sites identified in each tissue type passing false-discovery rate (FDR) 0.05

The age-associated CpGs identified were classified as either showing increased methylation levels with age (hypermethylated CpG sites) or showing decreased methylation levels with age (hypomethylated CpG sites). In most tissue types, we observed that hypermethylated CpG sites are more abundant than hypomethylated CpG sites (Fig. 1A). We further examined the distribution of effect size estimates, calculated as log2 fold-change, for age-associated CpG sites passing FDR 0.05 (Fig. 1B, C). For both hypermethylated and hypomethylated CpG sites, tissues with larger sample sizes tend to exhibit lower median effect size estimates, except for breast (n = 38), where hypermethylated CpGs have a lower median effect size estimate than estimates in kidney (n = 50), testis (n = 50), and whole blood (n = 54). Nevertheless, across all tissue types, those with smaller sample sizes (i.e., n⪅50) tend to have limited statistical power to detect weaker associations relative to tissues with larger sample sizes (n > 100) (Fig. 1B, C).

Fig. 1

Identification of age-associated CpG sites. A Number of age-associated CpG sites identified in each tissue type passing false discovery rate (FDR) 0.05, stratified by hypermethylated and hypomethylated status. B Boxplot showing the effect size (log2 fold-change in DNA methylation per year increase in age) distribution of hypermethylated age-associated CpG sites in each tissue type. C Boxplot showing the effect size (absolute value of log2 fold-change in DNA methylation per year increase in age) distribution of hypomethylated age-associated CpG sites in each tissue type

Next, we investigated the tissue-specificity and non-specificity of differentially methylated CpG sites. We observed some evidence of overlap between pairs of tissues (Fig. 2A, B), with a few age-associated CpG sites shared across most tissues (n > 5), such as ELOVL2 (cg16867657 and cg21572722 hypermethylated in 8 tissue types) and ZNF549 (cg06458239 hypermethylated in 7 tissue types) (Supplementary Table 2). Due to the large differences in sample sizes across tissues, it is difficult to rigorously assess the extent of overlap between all tissue pairs. However, for tissues with relatively large (n > 100) and similar sample sizes (lung, colon, ovary, and prostate), we observed that 46.8% of hypermethylated CpG sites in lung (FDR 0.05) are also hypermethylated in colon (FDR 0.05), while 34% of hypermethylated CpG sites (FDR 0.05) in colon are also hypermethylated in lung (FDR 0.05). Nearly half of the hypermethylated CpG sites in prostate (FDR 0.05) are also hypermethylated in lung (49.2%) and colon (49.4%), and about a third are also hypermethylated in ovary (32.8%) (FDR 0.05). Comparatively, the proportion of hypomethylated CpG sites shared between these pairs of tissues is lower than that of hypermethylated CpG sites. Nonetheless, the majority of the hypermethylated (~ 80.3%) and hypomethylated (~ 82.4%) CpG sites are present in a single tissue, suggesting clear differences in DNAm aging across tissue types (Fig. 2C, D; Supplementary Table 3).
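The pairwise percentages above are directional: the share of lung sites found in colon differs from the share of colon sites found in lung because the two tissues have different numbers of significant sites. A minimal sketch with hypothetical site identifiers:

```python
def percent_shared(sites_a, sites_b):
    """Percentage of the sites in A that are also present in B."""
    a, b = set(sites_a), set(sites_b)
    return 100.0 * len(a & b) / len(a)


# Toy site sets (hypothetical identifiers): B is twice the size of A.
tissue_a = {"s1", "s2", "s3", "s4"}
tissue_b = {"s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10"}

a_in_b = percent_shared(tissue_a, tissue_b)  # denominator is |A|
b_in_a = percent_shared(tissue_b, tissue_a)  # denominator is |B|
```

The same two shared sites give different percentages depending on which tissue supplies the denominator.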

Fig. 2

Pairwise overlap of age-associated CpG sites between tissues, including (A) hypermethylated and (B) hypomethylated CpG sites. Overlapping CpG sites passed false discovery rate (FDR) 0.05 in both tissue types. The intensity of the color (red shade for hypermethylated and blue shade for hypomethylated) reflects the number of CpG sites that overlap. Number of (C) hypermethylated and (D) hypomethylated CpG sites passing FDR 0.05 against the number of tissues in which the CpG was identified

Additionally, given that DNAm-based biomarkers are commonly used as predictors of age, we reasoned that CpGs used in pan-tissue aging clocks should be among the age-associated CpGs found in multiple tissue types (n > 1) in our analysis. Focusing on two well-known pan-tissue aging clocks, Horvath [39] (353 CpGs) and AltumAge [40] (20,318 CpGs), we tested enrichment of clock CpGs within our set of multi-tissue age-associated CpGs (60,665 CpGs). Of the clock CpGs, 1,764 from AltumAge and 98 from Horvath are among our set of multi-tissue age-associated CpGs. The percentage of clock CpGs from Horvath (27.7%; P < 2.2e−16) and AltumAge (8.6%; P = 8.098e−06) that are age-associated is larger than the percentage of CpGs in our EWAS identified as multi-tissue age-associated (8.04%). These results highlight that our analysis captures previously characterized DNAm age predictors.
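The percentages being compared follow directly from the counts reported above, as the short calculation below shows. The variable names are illustrative; the denominators assume that each percentage is taken over the full clock size (353 and 20,318) and over all 754,119 tested CpGs.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Counts as reported in the text.
horvath_pct = percent(98, 353)            # Horvath clock CpGs that are multi-tissue age-associated
altumage_pct = percent(1_764, 20_318)     # AltumAge clock CpGs that are multi-tissue age-associated
background_pct = percent(60_665, 754_119) # all tested CpGs that are multi-tissue age-associated

print(f"Horvath:    {horvath_pct:.2f}%")
print(f"AltumAge:   {altumage_pct:.2f}%")
print(f"Background: {background_pct:.2f}%")
```

Both clock percentages exceed the background percentage, with the Horvath clock showing the larger excess; the computed values agree with the reported ones up to the last reported digit.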

Enrichment of CpG sites within genomic features

We examined the distribution of age-associated CpG sites within genomic features, observing distinct patterns for hypermethylated and hypomethylated CpG sites. We found that hypermethylated CpG sites were enriched in CGIs (P < 0.0001) in 5 tissue types (Fig. 3A). However, this pattern of enrichment was not observed in the sex-specific tissues, ovary, testis, and breast. For hypomethylated CpG sites, we observed enrichment in non-CGIs (P < 0.0001), specifically “open sea” for all tissues except testis, where we observed enrichment in CGIs (P < 0.0001) (Fig. 3B). Differences in sample sizes and the underlying biology of the tissue could contribute to the observed differences in patterns of enrichment.

Fig. 3

Locational distribution of age-associated CpG sites (FDR 0.05) in relation to CpG islands (CGI) stratified by (A) hypermethylated and (B) hypomethylated CpG sites. Colors represent location of CpGs with respect to CGIs. Background CpGs are defined as all CpGs assayed in the Infinium MethylationEPIC array included in our analyses (754,119 CpGs)

To further characterize the genomic context of age-associated CpG sites, we examined enrichment within chromatin states. We performed this analysis for breast, colon, lung, and ovary only, because for these tissue types we were able to identify a closely matched reference tissue dataset from the Roadmap Epigenomics project database [41]. We observed distinct patterns of enrichment for hypermethylated and hypomethylated CpG sites. Consistent with prior studies, hypermethylated CpG sites were enriched in the polycomb repressive complex epigenomic signature (enrichment in repressed polycomb present in colon and lung), while hypomethylated CpG sites were enriched in active regions, such as enhancers (enrichment in enhancers and genic enhancers present in breast, colon, lung, and ovary) and active transcription (enrichment in flanking transcription and flanking active TSS present in ovary) (Fig. 4A, B) [25]. We observed weaker enrichment among hypomethylated CpG sites compared to hypermethylated CpG sites in all tissue types. Colon and lung tissue demonstrated more similar patterns of enrichment compared to ovary and breast.

Fig. 4

Enrichment of age-associated CpG sites (FDR < 0.05) among chromatin segmentation features stratified by (A) hypermethylated and (B) hypomethylated CpG sites. Enrichment expressed as odds ratio. Background CpGs are defined as all CpGs assayed in the Infinium MethylationEPIC array included in our analyses (754,119 CpGs). Fisher’s exact P value * < 0.05, ** < 0.01, *** < 0.001. Active chromatin states: active transcription start site (TssA), flanking active TSS (TssAFlnk), transcription at gene 5′ and 3′ showing both promoter and enhancer (TxFlnk), strong transcription (Tx), weak transcription (TxWk), genic enhancers (EnhG), enhancers (Enh), zinc finger protein genes and repeats (ZNF/Rpts ZNF). Inactive chromatin states: heterochromatin (Het), bivalent/poised TSS (TssBiv), flanking bivalent TSS/Enh (BivFlnk), bivalent enhancer (EnhBiv), repressed polycomb (ReprPC), weak repressed polycomb (ReprPCWk), quiescent/low (Quies)

Pathway enrichment of age-associated CpG sites

To assess pathways related to age-related epigenetic changes, we first assigned CpGs to genes based on Illumina annotations (544,631 CpGs assigned to 25,604 UCSC RefSeq annotations). We then conducted a pathway enrichment analysis of age-associated CpG sites (FDR 0.05) assigned to annotated genes from all 8 tissue types. We observed significantly enriched biological pathways (FDR < 0.05) using the KEGG database for 5 tissue types (colon, lung, prostate, ovary, and breast) (Fig. 5). Seven pathways showed evidence of enrichment in all five tissue types (FDR 0.05): arrhythmogenic right ventricular cardiomyopathy, axon guidance, focal adhesion, Hippo signaling pathway, proteoglycans in cancer, Rap1 signaling pathway, and Wnt signaling pathway.

Fig. 5

Pathway analysis of age-associated CpG sites detected in breast, prostate, ovary, lung, and colon. Venn diagram showing the overlap of enriched pathways between tissue types. ‘n’ corresponds to the number of pathways identified at FDR < 0.05

Several additional pathways, such as breast cancer, circadian entrainment, gastric cancer, dilated cardiomyopathy, and the MAPK signaling pathway, showed evidence of enrichment in at least two tissue types. While the breast cancer pathway was significantly enriched (FDR < 0.05) in all other tissue types, it was only nominally significant in breast tissue (unadjusted P < 0.05). Similarly, although gastric cancer was enriched in colon tissue (FDR < 0.05), it showed stronger enrichment in lung and ovary tissues. Nevertheless, cell signaling pathways were consistently enriched across all five tissue types, in line with their role in cellular function throughout the human body. Many of these signaling pathways, including Wnt and MAPK signaling, are associated with cancer development [47,48,49]. Ovary, with the highest number of enriched KEGG pathways, showed evidence of enrichment in pathways associated with hormone production and secretion, such as estrogen, thyroid, prolactin, aldosterone, and GnRH. Colon, prostate, and lung displayed similar numbers of enriched pathways, whereas breast showed the fewest enriched pathways (Fig. 5; Supplementary Tables 4–8).

Functional characterization of age-associated CpG sites

We investigated the association between age-associated CpG sites and expression of nearby genes using linear regression. Age-eQTMs were defined as CpG sites associated with gene expression (P < 0.05), and where both the CpG site (P < 1e−3, FDR < 0.05) and expression of the associated gene (P < 1e−3, FDR < 0.05) were strongly associated with age. We identified age-eQTMs that were unique to specific tissue types as well as age-eQTMs found in multiple tissue types (Supplementary Table 9). For example, we identified the CDKN2A region as an age-eQTM locus that is shared across 3 tissue types (colon, lung, and prostate). Four age-associated CpGs in that region were associated with CDKN2A expression (which was also associated with age) (Fig. 6). We found that CDKN2A expression is positively associated with age in all 3 tissue types (FDR < 0.05). We also observed greater variability in gene expression with increasing age (Fig. 6). Similarly, hypermethylation at the associated CpG sites (FDR < 0.05), cg1811914, cg26349275, cg08686553, and cg2422208, is observed with increasing age in all three tissue types. Examples of age-eQTMs found in a single tissue include CpG sites annotated to ZNF518B (lung), HENMT1 (colon), ZNF154 (prostate), and HAPLN3 (breast) (Supplementary Figs. 3–6, respectively). We observed a consistent relationship for these age-eQTMs, whereby there is a negative correlation between DNAm and gene expression. Additionally, for these CpG-gene pairs, we observed that the CpG sites (hypermethylated with age) are clustered at CGIs near the gene start, consistent with the downregulation of these genes with increasing age.

Fig. 6

Association of age with DNA methylation and expression of CDKN2A in three tissue types. Pearson’s correlation coefficient (R) and Pearson’s correlation P value reported for expression scatterplots (right). Red dot indicates hypermethylation at CpG site with increasing age. The vertical, shaded, red rectangles denote regions of age-associated CpG sites found in all three tissue types. The y-axis represents the -log10 of the P value (left) 

Discussion

In this study, we collected DNAm data for 961 tissue samples, representing 9 human tissue types (lung, colon, ovary, prostate, testis, kidney, muscle, whole blood and breast) from the GTEx project. We tested the association of age with genome-wide measures of DNAm, identifying differentially methylated CpG sites (FDR 0.05) in 8 tissue types (all except skeletal muscle). We identified age-associated CpG sites that were tissue-specific (Supplementary Table 3) as well as sites shared across multiple tissue types (e.g., ELOVL2 and CDKN2A). While the majority of the CpG sites identified appear tissue-specific, the patterns of enrichment within genomic features, such as CGIs and chromatin states, are largely shared across tissue types. We performed pathway enrichment analysis to identify pathways related to epigenetic aging, and our results showed clear enrichment of aging-relevant pathways, such as cancer and cell-signaling. To gain insights regarding the functional consequences of age-related DNAm changes, we assessed the correlation of age-associated sites with local gene expression. We identified several regions showing correlation in multiple tissue types (> 1 tissue), including age-eQTMs in the CDKN2A region, the HENMT1 region, and the VCWE region.

For tissue types with larger sample sizes (n > 100), including lung, colon, ovary, and prostate, increased power enabled detection of more age-associated CpG sites compared to tissue types with smaller sample sizes (n ~ 50), except in the case of breast tissue (n = 38), where we observed a relatively large number of age-associated CpG sites. The abundance of associations observed for breast tissue could be explained by the underlying biology of this tissue type. It is well established that breast development and risk of breast-related diseases (e.g., breast cancer) are tightly linked to age, with breast tissue undergoing various biological changes, including regression of terminal duct lobular units, increased breast density and fat pads, hormonal fluctuations (perimenopause, menopause), breast milk composition, and cellular transformation with age [24, 50, 51]. However, as expected, for the tissue types with smaller sample sizes, we were only able to capture age-associated CpG sites with relatively large effect sizes. The differences in sample sizes limited our ability to characterize the extent of shared age-associated effects between tissue types. However, consistent with prior studies, we did identify that hypermethylation of CpG sites (cg16867657 and cg21572722) in the ELOVL2 region is shared across all 8 tissue types.

Our results show that patterns of enrichment of hypermethylated and hypomethylated age-associated CpG sites are generally consistent across tissue types. For most tissue types, we observed that, for CpGs measured in the EPIC array, hypermethylation with increasing age is more common than hypomethylation. Additionally, hypermethylated CpG sites tend to show enrichment in CGIs while hypomethylated CpG sites tend to show enrichment in non-CGIs (“open sea”). Prior studies have shown that hypermethylation in promoter regions overlapping CGIs is often observed in aging and aging-related diseases (e.g., cancer). This hypermethylation is associated with gene silencing and is proposed to interfere with proper regulation of genes [52]. The EPIC array has extensive coverage of CGIs, genic, and enhancer regions, which likely contributes to the greater number of hypermethylated age-associated CpG sites identified in our study. Interestingly, the patterns of enrichment observed in testicular tissue differ strikingly from the patterns observed in the other 7 tissues, which may be attributable to the unique aging biology and gene expression patterns of the testis [53, 54].

Next, we assessed enrichment of age-associated CpG sites in chromatin segmentation features. Consistent with multiple prior studies, we observed differing patterns of enrichment for hypermethylated sites compared to hypomethylated sites across tissues. Hypermethylated CpGs (colon and lung) showed enrichment in “repressed” polycomb and “poised” transcription states (bivalent domains including bivalent enhancers, flanking bivalent TSS/enhancers, and bivalent TSS). Hypomethylated CpGs (colon, lung, ovary, and breast) showed enrichment in enhancers and genic enhancers [13, 16, 25, 28, 55]. The polycomb group proteins (PcG) are chromatin-associated, multimeric proteins that regulate a diverse set of genes. PRC2 (one of the enzymatic forms of PcG) is a histone methyltransferase, which targets developmental genes and is involved in the silencing of bivalent genes, marked by repressive (H3K27me3 and H3K9me3) and activating (H3K4me1, H3K4me2, H3K4me3) histone modifications [56, 57]. As such, it is plausible that PRC2 contributes to age-related transcriptomic and epigenomic changes. Future studies using single-cell chromatin immunoprecipitation sequencing (ChIP-seq) and assay for transposase-accessible chromatin with sequencing (ATAC-seq) data will provide tissue- and cell-type-specific chromatin states, further improving our understanding of the distribution of age-associated CpG sites in the epigenome.

Pathway enrichment analysis of age-associated CpG sites assigned to annotated genes showed clear enrichment of cancer-related and cell-signaling pathways in multiple tissue types. Hypermethylation at bivalent domains is also a signature found in various cancers [55, 58,59,60]. Aging is one of the major risk factors for cancer development [61,62,63]. Alterations in the epigenomic landscape with increasing age can lead to chromatin conformations primed with oncogenic potential (e.g., aberrant silencing of tumor suppressor genes and activation of pro-oncogenes). This increased plasticity of the chromatin can additionally disrupt other processes, including DNA repair mechanisms and telomere maintenance, which taken together contribute to the development of cancer [64].

To provide functional insights into the consequences of epigenetic aging, we assessed the correlation between age-associated DNAm and local gene expression changes using a subset of samples with both gene expression and DNAm data. Our analysis identified several new multi-tissue signals, most interestingly, age-eQTMs in the CDKN2A region (colon, prostate, and lung). CDKN2A, cyclin dependent kinase inhibitor 2A, encodes the cell-cycle inhibitor p16. This gene has been extensively studied in the context of cancer and cellular senescence, with cellular damage and stress leading to activation of CDKN2A. An exponential increase in expression of CDKN2A has been observed with aging [65,66,67]. In our work, we find an increase in expression of CDKN2A and hypermethylation of the associated CpG sites (located close to the gene and within the gene body) with increasing age. Although these results highlight a link between age-related DNAm and age-related gene expression changes, they do not address the mechanistic or causal relationships driving these changes.

While our study identified and characterized the effects of age on DNAm patterns, the DNAm measures used were obtained using the EPIC array, which covers a small proportion (~ 2%) of CpG sites in the human genome. Additionally, we were limited by small sample sizes (n ~ 50) for some tissues and thus were unable to detect many associations in these tissue types or to draw conclusions on the extent of shared age effects between tissue types. Another limitation of this study is the skewed age distribution, with generally fewer samples from younger donors than from older ones. This potentially limited our ability to detect age effects on DNAm that are more prominent at younger ages. Some of our analyses, namely enrichment among chromatin segmentation features and identification of age-eQTMs, could only be performed in a select number of tissues due to the lack of matching reference datasets.

Conclusion

Our work highlights the importance of multi-tissue analyses for gaining insights into the effects of age on the epigenome. Future studies should use whole-genome data on DNAm, larger sample sizes of diverse tissue/cell types, and additional epigenetic features to validate and expand our findings and to better understand the effects of aging on the human genome.