Background

Health disparities are marked differences or inequalities in health measures and indicators, such as morbidity and mortality, between two or more population groups. Health disparities disproportionately affect African Americans (AAs), other racial minorities, and the socioeconomically disadvantaged. Disparities in overall longevity, as measured by life expectancy, are particularly striking in the USA, where the most pronounced gap was observed for AA men, whose life expectancy was 20.7 years shorter than that of Asian women, the group with the best overall survival [1]. Nearly two decades later, this troubling pattern of lower life expectancy persists [2]. AAs manifest age-related phenotypes and develop chronic diseases such as cardiovascular disease, diabetes, and cognitive disorders at younger ages than other demographic groups, which suggests that AAs experience premature biological aging. A study of participants in the National Health and Nutrition Examination Survey found that AAs age significantly faster than whites and that biological aging was associated with all-cause mortality [3].

The causes of health disparities and their premature aging phenotype are multifactorial and include, but are not limited to, socioeconomic status, psychosocial stress, genetics, poor access to health care, education, and toxic environmental exposures. However, how social determinants of health lead to the premature aging phenotype, poor health outcomes, and reduced overall survival has not been fully explained. The transduction of social determinants of health into biology may involve genomic and epigenomic processes that remain undetermined. Among the various epigenetic processes, DNA methylation is one potential mechanism that may mediate this observed disparity, given that DNA methylation is influenced by age, lifestyle, environmental, and host factors [4, 5]. Evidence from targeted methylation studies indicates that DNA methylation changes are associated with socioeconomic status and age-related diseases [6, 7].

DNA methylation regulates gene expression and maintains genome stability. It is a dynamic process that changes over an individual’s lifespan and is influenced by age and by environmental and genetic factors [4]. Further, altered patterns of DNA methylation are considered one of the hallmarks of aging and lifespan [8]. Identifying age-associated DNA methylation changes in diverse population groups could provide clues to the epigenetic basis of aging and of age-related health disparities. However, AAs and other racial minorities are underrepresented in epigenetic studies of age-related diseases. Previous studies have identified several age-associated differentially methylated CpG positions (aDMPs) located in genes implicated in chronic diseases and aging [9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. However, most of these studies were limited by low genome-wide coverage of CpG sites and consisted mostly of participants of European ancestry, precluding study of the role of DNA methylation in the biology of age-related health disparities among minority populations. Recently developed algorithms predict biological age from methylation levels at selected CpG sites across the genome using elastic net regularized regression; the resulting estimate is referred to as DNA methylation age (DNAm age), epigenetic age, or the “epigenetic clock.” These algorithms, together with the demonstration of robust correlations between DNAm age and chronological age, provide a valuable research tool to study the social determinants of biological age acceleration [24, 25]. Epigenetic age acceleration has been associated with overall and cause-specific mortality, physical and cognitive decline, and other aging-related diseases; thus, epigenetic age has been suggested as a marker of biological age [29,30,31,32,33].
Although sociodemographic and lifestyle factors have been shown to accelerate epigenetic aging [32, 34, 35], the interplay among these factors is poorly understood. Specifically, it is not known whether race, socioeconomic status, and sex interact to influence epigenetic age acceleration.

The objectives of the present study were (1) to identify novel aDMPs among AAs and whites and (2) to assess the associations of race, sex, and poverty status, and their interactions, with epigenetic age acceleration. We conducted the present study in samples drawn from the Healthy Aging in Neighborhoods of Diversity across the Life Span (HANDLS) study [36]. HANDLS is a population-based longitudinal study of community-dwelling urban AAs and whites aged 30–65 years. Using the Illumina Infinium MethylationEPIC BeadChip, we quantified genome-wide DNA methylation at single-CpG resolution in blood DNA collected from AA and white men and women above and below poverty status. We performed an epigenome-wide association analysis of chronological age. We also assessed the main effects of, and interactions between, sex, race, and poverty status on measures of epigenetic age acceleration. We found that chronological age was associated with widespread DNA methylation changes at many CpG positions and that AAs had more aDMPs than whites. These aDMPs were enriched in important genomic regulatory regions, in cellular pathways involved in growth and development, and at age-related chronic disease susceptibility loci identified by genome-wide association studies. We also found that AA men showed faster epigenetic aging, corroborating the epidemiologic observation that AA men have a shorter life expectancy.

Results

MethylationEPIC array methylation data preprocessing and normalization

We measured DNA methylation using the Illumina Infinium MethylationEPIC BeadChip in blood DNA from 487 participants (244 AAs, 243 whites, 248 men, and 239 women), together with 12 technical replicates for quality control. Four hundred seventy participants (50.4% AAs, 50.6% men, and 49.8% above poverty status) passed quality control (Table 1). Participants ranged in age from 30.2 to 65.2 years, with a mean age of 48.7 years (standard deviation (sd) = 8.7). The mean (sd) age was 48.4 (8.8) in AAs and 49.0 (8.7) in whites (p value = 0.48), indicating no significant difference in age distributions. Because the EPIC array was a new technology, we compared the performance of different methylation data normalization and preprocessing approaches, using methylation levels of technical replicates, to identify an optimal preprocessing method. After excluding probes with detection p value ≥ 0.01, cross-hybridizing probes, and probes containing single nucleotide polymorphisms (minor allele frequency cutoff = 0.05), using the probe annotations available in the DMRcate package [37], we calculated the correlation and probe variance of methylation beta values between technical replicates. Additional file 1: Figure S1a and S1b show the relative performance (correlation and probe variance) of the different normalization and preprocessing methods: Illumina GenomeStudio (Illumina), normal-exponential out-of-band (NOOB), stratified quantile normalization (quantile), subset-quantile within array normalization (SWAN), and no normalization (raw). The NOOB method yielded a higher correlation between technical replicates than the other methods.
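As a rough illustration of the replicate-based comparison described above, the Python sketch below filters probes by detection p value and scores each candidate normalization by the Pearson correlation and the mean per-probe variance between a pair of technical replicates. It assumes beta values have already been produced by each method (the normalizations themselves, such as NOOB, are not reimplemented), and all function names are hypothetical.

```python
import numpy as np


def filter_probes(betas: np.ndarray, det_p: np.ndarray, threshold: float = 0.01):
    """Keep probes (rows) whose detection p value is below `threshold` in every sample.

    betas, det_p: arrays of shape (n_probes, n_samples).
    Returns the filtered beta matrix and the boolean row mask.
    """
    keep = (det_p < threshold).all(axis=1)
    return betas[keep], keep


def replicate_concordance(rep_a: np.ndarray, rep_b: np.ndarray) -> tuple[float, float]:
    """Pearson r and mean per-probe variance between two technical replicates."""
    ok = ~(np.isnan(rep_a) | np.isnan(rep_b))  # drop probes missing in either replicate
    a, b = rep_a[ok], rep_b[ok]
    r = float(np.corrcoef(a, b)[0, 1])
    # Sample variance (ddof=1) of the two measurements at each probe, averaged over probes.
    per_probe_var = np.var(np.stack([a, b]), axis=0, ddof=1)
    return r, float(per_probe_var.mean())


def best_method(pairs: dict) -> str:
    """Name of the normalization method whose replicate pair correlates best."""
    return max(pairs, key=lambda name: replicate_concordance(*pairs[name])[0])


if __name__ == "__main__":
    # Simulated data only: the same "true" betas measured twice with different noise levels.
    rng = np.random.default_rng(0)
    truth = rng.uniform(0, 1, 1000)
    pairs = {
        "raw": (truth + rng.normal(0, 0.05, 1000), truth + rng.normal(0, 0.05, 1000)),
        "noob": (truth + rng.normal(0, 0.01, 1000), truth + rng.normal(0, 0.01, 1000)),
    }
    print(best_method(pairs))  # "noob", the lower-noise simulated method
```

Ranking by correlation mirrors the comparison reported here; in practice one would also inspect the variance metric and average over all replicate pairs rather than a single pair.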

Table 1 Demographic characteristics of the HANDLS study participants with complete DNA methylation data

Identification of age-associated differentially methylated CpG positions in African Americans and whites

We hypothesized that genome-wide age-associated DNA methylation changes would differ between AAs and whites. To identify age-associated differentially methylated CpG positions (aDMPs), we performed an epigenome-wide association analysis of chronological age, examining the association of baseline chronological age with each of the 765,808 CpG positions that passed quality control, separately for AAs (N = 237) and whites (N = 233). Linear regression models were adjusted for sex, race, poverty status, estimated white blood cell composition (granulocytes, monocytes, natural killer cells, B cells, CD4+ T cells, and CD8+ T cells), and the first two principal components to account for population stratification. Using a stringent Bonferroni-corrected significance threshold and excluding CpGs with effect sizes between −0.01 and 0.01, we found significant associations with chronological age for 4930 aDMPs in AAs and 469 aDMPs in whites, with effect sizes ranging from −0.039 to 0.051 in AAs and from −0.050 to 0.047 in whites. Figure 1a and b show the distributions of p values for the association between individual CpG positions and chronological age in AAs and whites, respectively. Of these significant aDMPs, 4343 in AAs and 166 in whites gained methylation (hypermethylated) with age (Fig. 1c, d). Although significant aDMPs overlapped substantially between AAs and whites, age-associated changes (hyper- and hypomethylated) were more widespread in AAs than in whites (Fig. 2a–c).
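The per-CpG model can be sketched as ordinary least squares fitted to all CpGs at once with NumPy, followed by the Bonferroni and effect-size filters. This is a minimal sketch under stated assumptions, not the authors' pipeline: covariates are assumed to be numeric and already dummy-coded, the effect-size cutoff is applied to the raw age coefficient, and `ewas_age` is a hypothetical name. With 765,808 CpGs, the Bonferroni threshold is 0.05 / 765,808 ≈ 6.53E−08, the same value used for replication in the next subsection.

```python
import numpy as np
from scipy import stats


def ewas_age(betas, age, covariates, alpha=0.05, min_abs_effect=0.01):
    """Per-CpG linear regression of methylation on chronological age (sketch).

    betas:      (n_samples, n_cpgs) methylation beta values
    age:        (n_samples,) chronological age
    covariates: (n_samples, k) numeric, already dummy-coded adjustment variables
                (e.g., sex, cell-type estimates, genetic principal components)
    Returns the age coefficient and two-sided p value for every CpG, plus a mask
    of CpGs passing Bonferroni (alpha / n_cpgs) and |coef| >= min_abs_effect.
    """
    betas = np.asarray(betas, dtype=float)
    n, m = betas.shape
    X = np.column_stack([np.ones(n), age, covariates])
    # A single least-squares solve fits every CpG column at once.
    coef, _, rank, _ = np.linalg.lstsq(X, betas, rcond=None)
    resid = betas - X @ coef
    df = n - rank
    sigma2 = (resid ** 2).sum(axis=0) / df  # residual variance per CpG
    se_age = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t_age = coef[1] / se_age
    p_age = 2 * stats.t.sf(np.abs(t_age), df)  # two-sided t test
    bonferroni = alpha / m
    passed = (p_age < bonferroni) & (np.abs(coef[1]) >= min_abs_effect)
    return coef[1], p_age, passed
```

Splitting `passed` by the sign of the age coefficient (positive for hypermethylated, negative for hypomethylated) then gives the gain/loss counts.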

Fig. 1

Distribution of age-associated differentially methylated CpG positions (aDMPs) with their effect size in beta values and significance p value in the African American (AA) and white participants of the HANDLS study: a Manhattan plot in AAs, b Manhattan plot in whites, c volcano plot in AAs, and d volcano plot in whites

Fig. 2

Venn diagrams of significantly age-associated differentially methylated CpG positions (aDMPs) in African Americans (AAs) and whites: a overlap of aDMPs between AAs and whites, b overlap of aDMPs that were hypermethylated with age, and c overlap of aDMPs that were hypomethylated with age

To date, studies of DNA methylation and chronological age have identified 5321 unique aDMPs in blood-derived DNA using the earlier 27K and 450K methylation BeadChips [10, 12,13,14, 17, 19, 21, 22, 25, 27, 28]. We replicated a number of these previously reported aDMPs in AAs (1204 aDMPs) and in whites (220 aDMPs) at p value < 6.53E−08. The previously reported aDMPs replicated in our study are listed in Additional file 2: Table S1. Of these, 174 CpG positions, including those located in ELOVL2, replicated in both AAs and whites in the HANDLS study. Previous studies of DNAm and age were conducted in predominantly European ancestry samples and were limited by their genome-wide CpG coverage. Using the EPIC array and balanced samples of AAs and whites, we found novel aDMPs in both groups. Of the 4930 aDMPs identified in AAs, 3726 (75.6%) were novel, compared with 249 (53.1%) of the 469 aDMPs identified in whites. Table 2 shows the top 50 aDMPs that have not been previously reported. The beta coefficients of these top-ranking aDMPs ranged from −0.030 to 0.041 in AAs and from −0.034 to 0.044 in whites. Additional file 3: Figure S2a and b show scatter plots of the top ten aDMPs and their corresponding Pearson’s correlation coefficients with age in AAs and whites, respectively. Some of the top novel age-methylation associations identified in both AAs and whites include CpG positions located in FGF14, FHL2, C1QC, CELF6, NEFM, and LHFPL4. The top age-methylation associations unique to AAs were in C21orf91, JAZF1, NEURL1, and ADGRB2, and those unique to whites were in SLC25A21, CPED1, NRXN3, and OTUD7A (Table 2).
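The replicated/novel split is a set operation on CpG identifiers. The sketch below, with hypothetical function names and toy probe IDs, also reproduces the percentages reported above from the counts in the text (4930 − 1204 = 3726 novel in AAs; 469 − 220 = 249 novel in whites).

```python
def split_replicated_novel(significant: set, reported: set) -> tuple:
    """Split significant aDMP IDs into previously reported vs. novel.

    Returns (replicated, novel, percent_novel rounded to one decimal place).
    """
    replicated = significant & reported
    novel = significant - reported
    pct_novel = round(100 * len(novel) / len(significant), 1) if significant else 0.0
    return replicated, novel, pct_novel


def shared_between_groups(rep_aa: set, rep_white: set) -> set:
    """Previously reported aDMPs that replicated in both groups."""
    return rep_aa & rep_white


# Percentages from the counts quoted in the text.
pct_novel_aa = round(100 * (4930 - 1204) / 4930, 1)   # 75.6
pct_novel_white = round(100 * (469 - 220) / 469, 1)   # 53.1
```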

Table 2 Novel age-associated differentially methylated CpG positions (aDMPs) identified using the MethylationEPIC BeadChip

Genomic feature enrichment and functional annotations

We then tested the significant aDMPs for enrichment in various regulatory regions of the genome, as defined by the ENCODE and FANTOM projects, using Fisher’s exact test. Enrichment analysis was performed separately for aDMPs that gained (hypermethylated) and lost (hypomethylated) methylation with chronological age. As expected, aDMPs that gained methylation with age were enriched at CpG islands (Additional file 4: Table S2). Hypermethylated aDMPs were also enriched at 5′ UTRs, DNase I hypersensitive sites (DHS), first exon regions, and reprogramming-specific genomic regions (Additional file 4: Table S2). Hypomethylated aDMPs were enriched at Open Sea regions, transcription factor binding sites, CpG island shores, and open chromatin regions (Additional file 5: Table S3).
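For a single genomic feature, the enrichment test reduces to a 2 × 2 table of aDMPs versus all other tested CpGs, inside versus outside the feature. The text does not state whether a one- or two-sided test was used; the sketch below assumes the one-sided (enrichment) alternative and computes the hypergeometric upper tail with exact integer arithmetic from the Python standard library. The function name is hypothetical, and at array scale this exact summation is slow; `scipy.stats.fisher_exact(table, alternative="greater")` is the usual library route.

```python
from math import comb


def fisher_enrichment(a: int, b: int, c: int, d: int) -> tuple:
    """One-sided (enrichment) Fisher's exact test on a 2 x 2 table.

                       in feature   not in feature
        aDMPs              a              b
        other CpGs         c              d

    Returns (sample odds ratio, P(X >= a)), where X is hypergeometric:
    drawing a + b CpGs from a + b + c + d, of which a + c lie in the feature.
    """
    total, in_feature, drawn = a + b + c + d, a + c, a + b
    upper = min(in_feature, drawn)
    # Sum exact integer counts first and divide once, avoiding per-term float underflow.
    numerator = sum(comb(in_feature, k) * comb(total - in_feature, drawn - k)
                    for k in range(a, upper + 1))
    p_value = numerator / comb(total, drawn)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value
```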

To further understand the functional significance of aDMPs and to identify canonical pathways overrepresented among them, we performed Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, accounting for differences in the number of CpG positions per gene on the MethylationEPIC BeadChip. aDMPs in both AAs and whites were enriched for GO terms related to system and organismal development and morphogenesis. The top terms included central nervous system development, multicellular organism development, and cell-cell signaling (Additional file 6: Table S4 and Additional file 7: Table S5). These results suggest that aging may affect the methylation status of genes and pathways important for growth and development of the nervous system and other organ systems.

Age-related disease gene enrichment analysis

To characterize the relevance of hyper- and hypomethylated aDMPs to age-related diseases and phenotypes, we performed enrichment analysis using genes identified by genome-wide association studies (GWAS). We focused on GWAS-identified genes implicated in age-related diseases, quantitative traits, and longevity and survival (overall and disease-specific) [38]. After accounting for multiple testing, genes containing hypermethylated aDMPs were enriched for genes linked with visceral fat distribution, lung function, cognitive ability, blood pressure, and IgG glycosylation (Additional file 8: Table S6). Genes containing hypomethylated aDMPs were enriched for genes linked with iron homeostasis and breast cancer (Additional file 9: Table S7).
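The text says only that multiple testing was accounted for, without naming the method. Assuming a Benjamini–Hochberg false discovery rate correction across the tested GWAS traits (an assumption, not stated in the source), the adjustment can be sketched in plain Python as follows; the function name is hypothetical.

```python
def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Traits whose adjusted values fall below a chosen level (for example 0.05) would then be reported as enriched.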

Assessment of DNA methylation age using the epigenetic clock

We calculated DNA methylation age (DNAm age) for each participant using the Horvath [24] and Hannum [25] algorithms implemented in the online DNAm age calculator [24]. DNAm age predicted by both the Horvath and the Hannum clocks was strongly correlated with chronological age (Pearson’s r = 0.85) (Fig. 3). In subgroup analyses stratified by sex, race, and poverty status, similarly strong correlations between chronological age and DNAm age were observed (Pearson’s r range 0.83–0.89), indicating that the epigenetic clocks are robust estimators of chronological age and that the prediction algorithms performed well in our cohort.

Fig. 3

Correlation between DNA methylation-predicted age based on the Horvath and the Hannum clocks, and chronological age in the HANDLS study. Abbreviation: AAs: African Americans

Determinants of epigenetic age acceleration

Epigenetic age acceleration, generally defined as the difference between DNAm age and chronological age, has been suggested as a promising marker of biological age [31]. We hypothesized that biological age would differ across population groups, such that men compared with women, AAs compared with whites, and individuals below poverty status compared with those above it would be fast agers. To test this hypothesis, we first computed a universal measure of age acceleration (AgeAccel) as the residuals from regressing DNAm age, predicted by the Horvath method (which is independent of cell and tissue type), on chronological age. The mean (± standard error) AgeAccel (in years) was −1.03 (± 0.47) in AA women, +1.01 (± 0.40) in AA men, −0.20 (± 0.19) in white women, and +0.19 (± 0.43) in white men. Positive AgeAccel residuals indicate faster aging than expected from chronological age, and negative residuals indicate slower aging. Two additional measures of epigenetic age acceleration were derived: intrinsic epigenetic age acceleration (IEAA) and extrinsic epigenetic age acceleration (EEAA). IEAA is meant to capture cell-intrinsic properties of the aging process and is independent of estimated white blood cell type proportions, whereas EEAA incorporates white blood cell estimates and may additionally capture the age of immune system cells (immunosenescence) [31]. Men were faster agers than women by AgeAccel and EEAA (Fig. 4a). No differences between AAs and whites were observed in AgeAccel or IEAA, but EEAA differed significantly by race (p = 4.8E−17) (Fig. 4b). Poverty status was not associated with any of the epigenetic age acceleration measures (Fig. 4c).
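A minimal sketch of the residual-based definitions, assuming DNAm age has already been obtained (for example, from the online calculator): AgeAccel is the residual from regressing DNAm age on chronological age, and an IEAA-style variant additionally regresses out estimated blood cell proportions. The exact cell-type covariates the calculator uses for IEAA may differ from whatever is passed as `cell_props`, EEAA (which weights in cell-type estimates rather than removing them) is not sketched, and all function names are hypothetical.

```python
import numpy as np


def residualize(y, *predictors):
    """Residuals from an OLS fit of y on an intercept plus the given predictors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def age_accel(dnam_age, chron_age):
    """AgeAccel: residual of DNAm age regressed on chronological age (years)."""
    return residualize(dnam_age, chron_age)


def ieaa_sketch(dnam_age, chron_age, cell_props):
    """IEAA-style residual: also regress out estimated blood cell proportions.

    cell_props: (n_samples, k) array of cell-type estimates (assumed input).
    """
    return residualize(dnam_age, chron_age, cell_props)


def group_mean_se(values, labels):
    """Mean and standard error of `values` within each group label."""
    values, labels = np.asarray(values, dtype=float), np.asarray(labels)
    return {g: (values[labels == g].mean(),
                values[labels == g].std(ddof=1) / np.sqrt((labels == g).sum()))
            for g in np.unique(labels)}
```

Because OLS residuals have mean zero, positive group means of AgeAccel (as for men here) indicate faster-than-expected aging relative to the whole sample.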

Fig. 4

Associations between epigenetic age acceleration measures and sex, race, and poverty status. a Sex, b race, and c poverty status. Abbreviations: AAs: African Americans; AgeAccel: universal age acceleration measures; IEAA: intrinsic epigenetic age acceleration; and EEAA: extrinsic epigenetic age acceleration

We used linear regression models to assess two-way interactions between sex, race, and poverty status on the three epigenetic age acceleration measures. A nominally significant sex × race interaction was observed for AgeAccel (p for interaction = 0.049) but not for IEAA (p for interaction = 0.058) or EEAA (p for interaction = 0.6). Specifically, compared with AA women, AA men (+2.04 years; p = 6.07E−04) and white men (+1.24 years; p = 0.038) were fast agers by AgeAccel (Table 3). However, after accounting for multiple testing (nine tests: three age acceleration measures × three two-way interaction terms), these results were no longer significant. Figure 5a–c shows interaction plots of the association between sex, race, and AgeAccel, IEAA, and EEAA and highlights the AgeAccel differences between AA women, AA men, and white men.
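The two-way model can be sketched with indicator coding in which AA women form the reference group, so the fitted coefficients translate directly into the contrasts reported in Table 3. This sketch covers only the sex × race term (poverty interactions would be added the same way), the function names are hypothetical, and the Bonferroni helper is an assumption about how multiple testing was handled: with nine tests, 0.05 / 9 ≈ 0.0056, a level the nominal interaction p value of 0.049 does not reach.

```python
import numpy as np


def sex_race_design(male, white):
    """Intercept, male, white, and male x white columns (0/1 indicators).

    With this coding the reference group (all indicators 0) is AA women.
    """
    male = np.asarray(male, dtype=float)
    white = np.asarray(white, dtype=float)
    return np.column_stack([np.ones_like(male), male, white, male * white])


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def contrast_vs_reference(coef):
    """Group differences relative to AA women implied by the fitted model."""
    _, b_male, b_white, b_inter = coef
    return {"AA men": b_male,
            "white women": b_white,
            "white men": b_male + b_white + b_inter}


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests
```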

Table 3 Association between sex, race, poverty status, and measures of epigenetic age acceleration
Fig. 5

Interaction plots of the association between sex, race, and three measures of epigenetic age acceleration. a AgeAccel, b IEAA, and c EEAA. Abbreviations: AAs: African Americans; AgeAccel: universal age acceleration measures; IEAA: intrinsic epigenetic age acceleration; and EEAA: extrinsic epigenetic age acceleration

Discussion

To the best of our knowledge, this is the first study to apply the MethylationEPIC BeadChip, with its expanded genome-wide CpG coverage, to assess the genome-wide distribution of age-associated DNA methylation changes and to compare aDMPs between socioeconomically diverse, urban, community-dwelling AAs and whites. We replicated several aDMPs previously discovered in blood DNA using the 27K and 450K arrays. In addition, using stringent significance criteria, we identified novel CpGs (75.6% of significant aDMPs in AAs and 53.1% in whites) with small effect sizes that were either hyper- or hypomethylated with age. Interestingly, AAs displayed more widespread age-associated DNA methylation changes than whites. Our results also indicate that, compared with AA women, white men and AA men were fast agers as indicated by AgeAccel and IEAA. On the other hand, AA women had a slower immune system cell age as indicated by EEAA. Contrary to our hypothesis, poverty status was not associated with any of the epigenetic age acceleration measures.

Age and the social determinants of health (sociodemographic characteristics, lifestyle, and environmental factors) are important risk factors for most chronic diseases. In many ways, these risk factors disproportionately affect racial minorities and socioeconomically disadvantaged and marginalized population groups. Ample evidence in the literature supports the notion of premature aging, or “weathering,” among AAs in particular [39]. How these social determinants of health bring about the molecular and cellular changes that lead to chronic disease and aging-related health disparities is poorly understood. DNAm is an epigenetic modification that plays an important role in regulating various cellular processes, including development and imprinting, gene expression, and maintenance of genome stability [4]. DNAm is dynamic: methyl marks are continuously added to and removed from the genome over an individual’s lifespan, and they are affected by both environmental exposures and external stresses. DNAm could therefore play a role in age-related health disparities. Aging is generally characterized by DNAm changes, specifically a gain of methylation in CpG islands, bivalent chromatin domains [21], and polycomb-group target genes [22], and a loss of methylation predominantly in non-CpG island regions and in regions carrying the active chromatin mark H3K4me1 [11]. Regarding the CpGs used in epigenetic clocks, there is evidence in AAs that a third of the CpGs in Horvath’s epigenetic clock respond to glucocorticoid receptor activation and influence the expression of stress-responsive genes, which are enriched for associations with aging-related diseases [40]. Further, some of the CpGs in the mouse epigenetic clock have been shown to be involved in development, differentiation, and tissue morphogenesis, consistent with program-like behavior [41].
Although the exact mechanisms that drive changes in DNAm during aging are not fully understood, by virtue of its links with age-related diseases and their risk factors, DNAm is a promising molecular factor that could link health disparities and their risk factors. Therefore, understanding age-related DNAm changes and identifying differential associations among racial groups could shed light on aging and aging-related health disparities.

Our findings of aDMPs and their enrichment for genomic regulatory elements and for developmental and morphogenesis processes are broadly consistent with previously published epigenetic association studies of chronological age, which showed that age is associated with extensive DNA methylation changes (both hyper- and hypomethylation) that overlap with functional genomic regulatory regions [11, 21, 22]. We observed that genes containing aDMPs overlap with genes implicated in several age-related diseases and traits in genome-wide association studies [38]. This overlap suggests that common factors and underlying mechanisms may govern both age-related changes in DNAm and the development of age-related diseases. Understanding these factors and the underlying molecular mechanisms will have implications for efforts to narrow the age-related disease disparity gap between population groups. We replicated several aDMPs, including those located in ELOVL2, PENK, KLF14, and SLC12A5. Interestingly, methylation changes in ELOVL2 and PENK are associated with age in various tissues in addition to blood-derived DNA [25, 27], suggesting that some age-associated methylation changes are tissue independent and that blood DNA could be used to further investigate the underlying biological mechanisms and downstream functional alterations in large population-based studies, where blood is the most feasible source of DNA for methylation profiling. ELOVL2 has been linked to changes in human plasma metabolite levels, such as glycerophospholipids [42], and to the photoaging response in the epidermis [43]. It remains to be seen how age-associated DNAm changes in ELOVL2 and the other consistently replicated genes mechanistically contribute to aging-related disease development and accelerated biological aging phenotypes. Epigenome-wide association results based on methylation M values are reported in Additional file 11: Figure S4.

As in previous studies, the aDMPs we identified have small beta coefficients, consistent with observations for other traits and exposures. What is the biological meaning of these large numbers of aDMPs with small effect sizes? First, one function of DNA methylation is to regulate and fine-tune transcription by transducing external and internal cues; hence, large effects may not be observed. Second, age-related common diseases are multifactorial in origin and involve genetic and epigenetic factors that act in concert with each other and with other risk factors. It is therefore possible that significantly associated aDMPs interact and act in synergy (among themselves or with other epigenetic modifications such as histone acetylation and methylation) to bring about changes in transcription, thereby contributing multiplicatively to age-related disease risk. Third, accumulating evidence indicates that age-related chronic diseases are associated with a large number of genetic sites, mostly with small effects. Epigenetics and epigenetic inheritance have been put forward as a potential explanation for the missing heritability of complex diseases, i.e., inherited risk factors of common complex diseases and traits that are yet to be identified [44]. CpGs with small effect sizes are therefore consistent with the observation that common complex age-related diseases are associated with a large number of sites with small effect sizes that could have a cumulative effect on disease risk.
Finally, analogous to the variety of molecular and cellular changes, such as mutations, that accrue over an individual’s life course, not all DNA methylation changes linked with age may have discernible biological consequences; some might be “passengers” rather than key “drivers” of the expression of a given aging-related trait or disorder [45]. While these explanations require empirical support, it is important to note that large numbers of CpGs with small effect sizes linked with age, several environmental exposures, and other traits have been reported in diverse study designs and settings and have been consistently replicated, suggesting that these small-effect associations are robust and could have true biological significance [46].

Our finding that men are generally faster agers, as indicated by AgeAccel and EEAA, is consistent with previous age acceleration studies using DNA extracted from blood, brain, and saliva [34, 35, 47]. This difference in biological age acceleration between men and women has been postulated to explain the sex morbidity-mortality paradox. Several factors have been put forward to explain this difference, including differences in health-seeking behavior and lifestyle factors [34]. The slower immune system cell age seen in AAs compared with whites is consistent with studies reporting longer leukocyte telomere length in AAs than in whites [48, 49].

It should be noted that not all aging-related diseases are associated with the premature aging phenotype. For example, cancer tissues display variable epigenetic aging rates that depend on the site of cancer origin, cancer stage and histology, and the type and number of driver somatic mutations. Compared with adjacent normal tissue, slower epigenetic aging was observed in basal-like breast cancer and in glioblastoma multiforme with H3F3A mutations [24]. By contrast, faster epigenetic aging was observed in tumors carrying fewer somatic mutations, tumors with TP53 mutations, acute myeloid leukemia, hormone receptor-positive (luminal) breast cancer, and BRAF-positive colorectal cancer [24, 50]. In addition, faster epigenetic aging in various cancer tissue samples (lung, skin, breast, and kidney) compared with matched normal tissue samples has also been reported [25].

The observation of faster epigenetic aging in AA men is consistent with the epidemiologic literature reporting higher chronic disease risk earlier in the life course of AA men and premature mortality [1, 2, 51]. This finding suggests that the biological aging clock of AA men ticks faster before they succumb to age-related diseases. What is driving this age acceleration in AA men? Stressors (psychosocial and socioeconomic stress and their correlates) are one potential explanation. Although we did not observe an association between poverty status and any of the epigenetic age acceleration measures, cumulative lifetime stress has been shown to accelerate epigenetic aging in AAs, possibly through glucocorticoid-induced epigenetic changes [40].

One implication of our study is that some of the aDMPs and the epigenetic clock could be used to identify at-risk groups or to assess the efficacy of clinical and public health interventions intended to extend lifespan and reverse the accelerated aging process. Recent studies in animal models illustrate the potential usefulness of these age-associated DNAm changes in identifying interventions that extend longevity. Using mouse models and non-human primates, Maegawa et al. showed that age-associated DNAm changes are amenable to longevity interventions. Specifically, mice and rhesus monkeys exposed to caloric restriction showed attenuated age-associated methylation changes compared with ad libitum-fed controls, such that their blood DNAm age appeared younger than their chronological age, and these effects were detectable across different tissues [52]. Another study, by Petkovich et al. in mouse models, showed that the epigenetic clock accurately estimates the biological age of various mouse models and could be used to evaluate the longevity effects of caloric restriction, pharmacological interventions, and genetic longevity interventions such as growth hormone receptor knockout [41]. These results suggest that anti-aging interventions can affect the epigenetic clock in mice and non-human primates; whether these findings translate to humans remains to be determined.

Our study has a number of strengths: (1) a large sample size; (2) use of the MethylationEPIC array, which roughly doubles the number of genome-wide CpG sites interrogated and thereby addresses one limitation of previous epigenome-wide association studies of age; and (3) study participants with diverse characteristics and balanced sample sizes. The limitations of our study include the use of DNA derived from mixed peripheral blood cells, which could confound the observed associations between CpG positions and chronological age. To address confounding due to cellular heterogeneity, we included DNAm-based white blood cell estimates in our regression models. While studying DNAm changes in DNA from sorted white blood cells is ideal, it is not scalable in large population-based studies like ours. Other limitations include the use of cross-sectional data and the lack of a validation sample to confirm our race-specific aDMPs. Because of the cross-sectional design, we cannot determine whether all the reported aDMPs were directly driven by age. In addition, the high number of significant sites could also be influenced by population substratification and other unmeasured covariates. Functional studies in experimental model systems of aging and age-related diseases could provide further biological insight and help interpret these results.

Conclusions

In summary, we found that age differences are associated with DNAm changes at many genes, enriched for predicted functional genomic regulatory regions, and that AAs have more aDMPs than whites, the majority of which are novel CpG sites. The identification of novel aDMPs has the potential to expand our knowledge of the effect of age on DNAm and its differential effect among racial groups. Our study also suggests that sex and race may interact to influence epigenetic age acceleration, although this interaction was only nominally significant and did not survive correction for multiple testing. These age-associated genes could provide insight into the epigenetic basis of aging and age-related health disparities and may help explain the observed differences in disease incidence and lifespan between AAs and whites. Future larger studies with longitudinal data are required to replicate our findings. The results of the present study also provide a valuable resource for the study and prioritization of genes and gene networks that might be implicated in aging and age-related diseases and for advancing the nascent field of the epigenetics of health disparities.

Methods

Study aim, design, and population

Participants were drawn from the HANDLS study (https://handls.nih.gov/) [36], a large population-based prospective longitudinal study of middle-aged AA and white men and women living above and below the poverty line in Baltimore, Maryland. HANDLS was designed to explore the interplay among sex, race, poverty status, and biological and environmental factors in the development of aging-related diseases and health disparities in community-dwelling adults vulnerable to health disparities. Participants were eligible for this study if DNA had been isolated from blood collected at their enrollment. We randomly sampled 508 participants using a factorial design across sex, race, and poverty status, oversampling AA men below the poverty line, to test for interactions among these sociodemographic factors. Of these, 487 had DNA methylation measures. The age range at baseline was 30.2 to 65.2 years [mean age (standard deviation) = 47.8 (8.7)]. Poverty status in the HANDLS study was defined as a household income above or below 125% of the 2004 US Federal Poverty Guidelines, and race was self-identified. We used blood DNA and sociodemographic data collected during the enrollment period, from 2004 to 2009. DNA was extracted from peripheral blood mononuclear cells using standard methods.
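The poverty definition and the sex × race × poverty factorial design can be made concrete with a short Python sketch. The guideline amounts are hypothetical placeholders, not the official 2004 US Federal Poverty Guidelines, and the names `poverty_threshold`, `poverty_status`, `Participant`, and `factorial_cell` are illustrative; treating an income exactly at the cutoff as "above" is also an assumption.

```python
from dataclasses import dataclass

# HYPOTHETICAL placeholder guideline values (USD), NOT the official 2004 figures.
BASE_GUIDELINE = 10_000.0     # assumed guideline for a one-person household
PER_EXTRA_PERSON = 3_000.0    # assumed increment per additional household member
POVERTY_MULTIPLIER = 1.25     # 125% of the guideline, as stated in the text


def poverty_threshold(household_size: int) -> float:
    """Income cutoff at 125% of the (assumed) poverty guideline for a household size."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    guideline = BASE_GUIDELINE + PER_EXTRA_PERSON * (household_size - 1)
    return POVERTY_MULTIPLIER * guideline


def poverty_status(income: float, household_size: int) -> str:
    """'below' if income is under the cutoff; income at or above it counts as 'above' (assumed)."""
    return "below" if income < poverty_threshold(household_size) else "above"


@dataclass(frozen=True)
class Participant:
    sex: str            # e.g. "men" / "women"
    race: str           # self-identified, e.g. "AA" / "white"
    income: float       # annual household income (USD)
    household_size: int


def factorial_cell(p: Participant) -> tuple:
    """One of the 2 x 2 x 2 = 8 sex x race x poverty cells used for stratified sampling."""
    return (p.sex, p.race, poverty_status(p.income, p.household_size))
```

Grouping participants by `factorial_cell` gives the eight strata from which a sample can be drawn, with larger draws from cells that are deliberately oversampled.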

Bisulfite treatment and DNA methylation quantification using the EPIC BeadChip

Two hundred fifty nanograms of DNA were treated with sodium bisulfite using the Zymo EZ-96 DNA Methylation kit according to the manufacturer's protocol (Zymo Research, Orange, CA, USA). Following bisulfite treatment, we measured genome-wide DNAm using the Illumina Infinium MethylationEPIC BeadChip, which contains 866,836 CpG sites; of these, 142,262 (16.4%) are assayed by type I probes and the rest by type II probes. The EPIC array uses assay design chemistry similar to that of the earlier Infinium 450K array but has twice the CpG coverage, particularly enriched for promoter and enhancer regulatory regions [53], providing increased power and genome coverage to identify novel loci relevant to aging and health disparities. The EPIC array contains 92% and 94% of the CpG sites found on the 27K and 450K arrays, respectively. We included 12 technical replicates (two per plate) for quality control.

Quality control and preprocessing of the DNA methylation data

We performed extensive quality control of the DNAm data at the sample and probe levels to ensure high-quality methylation data. Because the EPIC array was a new technology, we used DNA methylation measured in the technical replicates to assess how well different normalization and preprocessing algorithms reduce technical variation, in order to identify a suitable method. The methods we compared were Illumina GenomeStudio, normal-exponential out-of-band (NOOB) [54], stratified quantile normalization (quantile) [55], and subset-quantile within array normalization (SWAN) [56]. At the sample level, we excluded 17 samples that were multidimensional scaling outliers, had low-quality methylation values (mean detection p value ≥ 0.01), or showed a mismatch between self-reported sex and methylation-predicted sex. At the probe level, we excluded low-quality probes (mean detection p value ≥ 0.01), probes overlapping single-nucleotide polymorphisms (SNPs) (minor allele frequency cutoff = 0.05), cross-hybridizing probes [37], and probes mapping to the sex chromosomes, leaving 765,808 CpG positions for the current analysis.
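The sample- and probe-level exclusion rules above can be expressed as boolean masks. This is a minimal Python sketch, not the minfi/ENmix code used in the study; it assumes detection p values are supplied as a probes × samples matrix, that each probe carries the highest minor allele frequency of any overlapping SNP (NaN when none overlaps), and that cross-hybridizing probes and MDS outliers have already been flagged. Function names are hypothetical.

```python
import numpy as np


def sample_qc_mask(det_p, mds_outlier, reported_sex, predicted_sex, max_mean_p=0.01):
    """Boolean mask of samples to KEEP.

    det_p: (n_probes, n_samples) detection p values.
    A sample is dropped if it is an MDS outlier, its mean detection p value is
    >= max_mean_p, or its self-reported sex differs from the predicted sex.
    """
    low_quality = np.asarray(det_p, dtype=float).mean(axis=0) >= max_mean_p
    sex_mismatch = np.asarray(reported_sex) != np.asarray(predicted_sex)
    outlier = np.asarray(mds_outlier, dtype=bool)
    return ~(outlier | low_quality | sex_mismatch)


def probe_qc_mask(det_p, snp_maf, cross_reactive, chrom, max_mean_p=0.01, maf_cutoff=0.05):
    """Boolean mask of probes (rows of det_p) to KEEP.

    snp_maf: highest MAF of a SNP overlapping each probe (NaN if none, assumed encoding).
    """
    low_quality = np.asarray(det_p, dtype=float).mean(axis=1) >= max_mean_p
    maf = np.nan_to_num(np.asarray(snp_maf, dtype=float), nan=0.0)
    snp_probe = maf >= maf_cutoff
    cross = np.asarray(cross_reactive, dtype=bool)
    sex_chrom = np.isin(np.asarray(chrom), ["chrX", "chrY"])
    return ~(low_quality | snp_probe | cross | sex_chrom)
```

Applying both masks to the methylation matrix (rows by the probe mask, columns by the sample mask) yields the filtered data passed to normalization.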

To identify an optimal normalization method, we compared the performance of the above methods using the correlations and probe variances of methylation β values between technical replicates. The NOOB method yielded the lowest variance and the highest correlation between technical replicates and was therefore used to normalize the EPIC methylation data in our cohort. The regression on correlated probes (RCP) method was used to correct for type I and type II probe design bias [57]. We performed principal component analysis to identify experimental batch effects and beadchip position effects, and then applied the ComBat method to adjust for batch effects [58]. We estimated white blood cell proportions from the DNAm data using the Houseman and Horvath methods [24, 59]. The Houseman method estimated the proportions of CD8+ and CD4+ T lymphocytes, natural killer cells, B lymphocytes, monocytes, and granulocytes. The Horvath method was used to estimate the percentage of exhausted CD8+ T cells (CD8+CD28−CD45RA−), plasmablasts, and the number of naïve CD8+ T cells (CD8+CD45RA+CCR7+).
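The replicate-based comparison of normalization methods can be sketched as follows. This is an illustrative Python sketch, assuming each method's output is available as two probes × pairs matrices of β values (one per member of each replicate pair); the function names and the tie-breaking rule (variance first, then correlation) are assumptions, since the text reports that NOOB was best on both criteria.

```python
import numpy as np


def replicate_concordance(rep_a, rep_b):
    """rep_a, rep_b: (n_probes, n_pairs) beta values for the two members of each pair.

    Returns (mean Pearson correlation across pairs, mean within-pair probe variance).
    """
    rep_a = np.asarray(rep_a, dtype=float)
    rep_b = np.asarray(rep_b, dtype=float)
    corrs = [np.corrcoef(rep_a[:, j], rep_b[:, j])[0, 1] for j in range(rep_a.shape[1])]
    # Sample variance (ddof=1) of two values a and b equals (a - b)**2 / 2.
    within_var = (rep_a - rep_b) ** 2 / 2.0
    return float(np.mean(corrs)), float(within_var.mean())


def rank_methods(results):
    """results: {method_name: (rep_a, rep_b)}.

    Best first: lowest within-pair variance, ties broken by highest correlation.
    """
    scored = {m: replicate_concordance(a, b) for m, (a, b) in results.items()}
    return sorted(scored, key=lambda m: (scored[m][1], -scored[m][0]))
```

A method that leaves less technical noise between replicates ranks first under either criterion.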

Principal component analysis and control for population stratification

To correct for population stratification, we calculated methylation-based principal components. Barfield et al. showed that principal components calculated from CpG positions that overlap SNPs serve as proxies for genetic ancestry and provide a powerful and computationally efficient approach to account for population stratification in the absence of genetic data [60]. We used the methylation levels of 7905 CpG positions that passed quality control and overlapped SNPs to calculate principal components. The first two principal components were then included in the regression models to control for population stratification. Once these CpG positions had been used to calculate the principal components, they were dropped from downstream analyses.
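A minimal Python sketch of the ancestry-proxy principal components: centre the SNP-overlapping CpGs across samples, take the leading components by singular value decomposition, then drop those CpGs from the matrix used downstream. Centring without scaling, and the name `methylation_pcs`, are assumptions rather than details given in the text.

```python
import numpy as np


def methylation_pcs(m_values, snp_cpg_mask, n_pcs=2):
    """m_values: (n_cpgs, n_samples); snp_cpg_mask: bool (n_cpgs,) for SNP-overlapping CpGs.

    Returns (PC scores of shape (n_samples, n_pcs), m_values without the SNP-overlapping CpGs).
    """
    m = np.asarray(m_values, dtype=float)
    mask = np.asarray(snp_cpg_mask, dtype=bool)
    x = m[mask].T                              # samples x SNP-overlapping CpGs
    x = x - x.mean(axis=0)                     # centre each CpG across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]             # PC scores, largest variance first
    remaining = m[~mask]                       # CpGs kept for the association analysis
    return pcs, remaining
```

The returned `pcs` columns would then enter each regression model as covariates.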

DNA methylation age prediction and epigenetic age acceleration measures

DNAm age was calculated with the Horvath and Hannum epigenetic clock algorithms, based on the methylation levels of 353 and 71 CpG sites, respectively. The algorithms were trained and validated on participants of varied genetic ancestries and on DNA derived from various tissues, including blood. A detailed description of the estimation of DNAm age and epigenetic age acceleration is available at https://labs.genetics.ucla.edu/horvath/dnamage/. Briefly, the Horvath method predicts age irrespective of the tissue or cell source of the DNA and is therefore tissue and cell type agnostic, whereas the Hannum method was developed from blood DNAm. Universal epigenetic age acceleration (AgeAccel) is defined as the residual from regressing DNAm age predicted by the Horvath algorithm on chronological age. A positive residual suggests faster aging, and a negative residual suggests slower aging. In addition to AgeAccel, we used two measures that reflect intrinsic and extrinsic epigenetic age acceleration, IEAA and EEAA, respectively. IEAA is thought to measure a cell's epigenetic age acceleration independent of estimated white blood cell type proportions. It is defined as the residual from regressing DNAm age (predicted by the Horvath algorithm) on chronological age and white blood cell proportions (naive CD8+ T cells, exhausted CD8+ T cells, plasmablasts, CD4+ T cells, natural killer cells, monocytes, and granulocytes). EEAA, which is based on DNAm age predicted by the Hannum algorithm, is thought to capture epigenetic age acceleration together with changes in white blood cell proportions, and it might be indicative of immune system cell aging (immunosenescence) [31]. Chen et al. showed that EEAA predicts mortality more strongly than the other measures of epigenetic age acceleration [31].
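AgeAccel and IEAA are both residuals from ordinary least squares fits, which the following Python sketch illustrates. It is not the online calculator's code; it assumes DNAm age has already been predicted and that the seven cell-proportion estimates are supplied as columns of a samples × cell types array. The helper names are hypothetical.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of an OLS fit of y on an intercept plus the given covariate columns."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def age_accel(dnam_age, chrono_age):
    """AgeAccel: residual of (Horvath) DNAm age regressed on chronological age."""
    return residualize(dnam_age, [chrono_age])


def ieaa(dnam_age, chrono_age, cell_props):
    """IEAA: residual after also adjusting for the cell-proportion estimates.

    cell_props: (n_samples, n_cell_types) array, e.g. the seven types listed in the text.
    """
    cells = np.asarray(cell_props, dtype=float)
    return residualize(dnam_age, [chrono_age] + [cells[:, j] for j in range(cells.shape[1])])
```

For example, DNAm ages of 32, 38, 52, and 58 years at chronological ages of 30, 40, 50, and 60 give AgeAccel values of 0.8, −2.4, 2.4, and −0.8 years: the second participant appears epigenetically younger than expected for their age.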

Epigenome-wide association study of age

To identify CpG positions associated with chronological age in AAs and whites, we fitted linear regression models adjusted for sex, race, poverty status, white blood cell composition (granulocytes, monocytes, natural killer cells, B lymphocytes, and CD4+ and CD8+ T lymphocytes), and principal components to account for population stratification. The resulting epigenome-wide test statistics were adjusted using an estimated empirical null distribution, a recently described method for controlling genomic inflation and bias in epigenome-wide association studies [61].
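Fitting one linear model per CpG with a shared design matrix can be vectorized, as in this Python sketch. It assumes M values are arranged as samples × CpGs and that the covariates listed above have already been coded numerically; the function name `ewas_age` is hypothetical, and the raw statistics it returns would still need the empirical-null inflation correction described in the text.

```python
import numpy as np
from scipy import stats


def ewas_age(m_values, age, covariates):
    """Fit M ~ age + covariates separately for every CpG (shared design, vectorized OLS).

    m_values: (n_samples, n_cpgs); age: (n_samples,); covariates: (n_samples, k).
    Returns per-CpG age coefficient, standard error, t statistic, two-sided p value.
    """
    Y = np.asarray(m_values, dtype=float)
    n = Y.shape[0]
    X = np.column_stack([np.ones(n), np.asarray(age, dtype=float),
                         np.asarray(covariates, dtype=float)])
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ Y                      # (n_params, n_cpgs) coefficients
    resid = Y - X @ B
    df = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / df     # residual variance per CpG
    se = np.sqrt(sigma2 * XtX_inv[1, 1])       # SE of the age coefficient (column 1)
    t = B[1] / se
    p = 2 * stats.t.sf(np.abs(t), df)
    return B[1], se, t, p
```

Because every CpG shares the same design matrix, `(X'X)^-1` is computed once, which keeps the analysis of hundreds of thousands of CpGs fast.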

The quantile-quantile plots, histograms and density plots of p values, and the corresponding inflation factors before (AAs, 1.040; whites, 1.263) and after (AAs, 0.998; whites, 0.807) correction for inflation are shown in Additional file 10: Figure S3. To confirm that the approximately 5000 differences found after correcting for genomic inflation were valid, we performed a sensitivity analysis testing associations in winsorized DNA methylation data. To improve power estimation, extreme values beyond 3 × IQR were winsorized.
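The winsorization step can be sketched in Python as clipping each CpG to fences set at a multiple of its interquartile range. The text does not say exactly where the 3 × IQR is measured from; this sketch assumes the fences are Q1 − 3 × IQR and Q3 + 3 × IQR, computed separately for each CpG, and the name `winsorize_iqr` is hypothetical.

```python
import numpy as np


def winsorize_iqr(m_values, k=3.0):
    """Clip each CpG (row) to [Q1 - k*IQR, Q3 + k*IQR] (assumed fence definition)."""
    x = np.asarray(m_values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75], axis=1, keepdims=True)  # per-CpG quartiles
    iqr = q3 - q1
    return np.clip(x, q1 - k * iqr, q3 + k * iqr)
```

For one CpG with values 1, 2, 3, 4, and 100, the quartiles are 2 and 4, so the upper fence is 4 + 3 × 2 = 10 and the outlier 100 is replaced by 10 while the other values are unchanged.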

The inflation correction was performed using the R/Bioconductor package bacon [61], which constructs an empirical null distribution by fitting a three-component normal mixture to the z-scores with a Gibbs sampling algorithm. The genomic inflation factor (lambda) values before and after correction are provided in Additional file 12: Table S8. We used the DNAm beta (β) and M values of the 765,808 CpG positions that passed quality control in the regression analysis. DNAm β values were calculated from the methylated (M) and unmethylated (U) fluorescence intensity signals as β = M/(M + U + 100), where 0 indicates an unmethylated and 1 a fully methylated status. The methylation M value was defined as log2[β/(1 − β)], the base-2 logit of β, which has been shown to approximate a normal distribution and provide good model fit [62]. A Bonferroni-corrected p value of 6.53E−08 was used as the significance threshold. We excluded CpGs with effect sizes between −0.01 and 0.01. The summary statistics are available at the HANDLS study website (https://handls.nih.gov/). After inflation correction, the lambda estimates for M values ranged from 0.99 to 1.39 for AAs and from 0.80 to 1.21 for whites (Additional file 12: Table S8).
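The quantities in this paragraph reduce to a few formulas, sketched here in Python. The β and M-value conversions follow the definitions above. The threshold assumes a family-wise α of 0.05: dividing 0.05 by the 765,808 CpGs tested gives about 6.53 × 10^−8, which appears consistent with the stated cutoff. The inflation factor shown is the classical median-based lambda, a common alternative and not the bacon estimator itself. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def beta_value(meth, unmeth, offset=100.0):
    """beta = M / (M + U + 100), with M, U the methylated/unmethylated intensities."""
    meth = np.asarray(meth, dtype=float)
    unmeth = np.asarray(unmeth, dtype=float)
    return meth / (meth + unmeth + offset)


def m_value(beta):
    """M value = log2(beta / (1 - beta)), the base-2 logit of beta."""
    beta = np.asarray(beta, dtype=float)
    return np.log2(beta / (1.0 - beta))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def genomic_inflation(p_values):
    """Classical lambda: median 1-df chi-square statistic divided by its null median (~0.455)."""
    chi2 = stats.chi2.isf(np.asarray(p_values, dtype=float), df=1)
    return float(np.median(chi2) / stats.chi2.ppf(0.5, df=1))
```

A β of 0.8 corresponds to an M value of 2 (a 4:1 methylated-to-unmethylated ratio), and uniformly distributed null p values give a lambda close to 1.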

Genomic feature enrichment and functional annotations

Enrichment of significant aDMPs in genomic regulatory features was assessed using a one-sided Fisher's exact test. The genomic regulatory regions compared included CpG islands, shelves, shores, DNase I hypersensitivity sites, open chromatin states, exon boundaries, transcription factor binding sites, and Phantom enhancer regions. Gene ontology and canonical pathway analyses accounted for the differing number of methylation probes per gene on the EPIC array, because gene set analysis of genome-wide methylation data is severely biased by the differing numbers of CpG sites profiled for each gene [63].
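The feature enrichment test compares, across all tested CpGs, how often aDMPs and non-aDMPs fall inside a given regulatory feature. This Python sketch builds that 2 × 2 table and applies `scipy.stats.fisher_exact` with `alternative="greater"` for over-representation; it assumes boolean per-CpG annotations are already available (for example from the array manifest), and `feature_enrichment` is a hypothetical name. It does not cover the probe-number-aware gene ontology analysis.

```python
import numpy as np
from scipy.stats import fisher_exact


def feature_enrichment(is_admp, in_feature):
    """One-sided Fisher's exact test for over-representation of aDMPs in a feature.

    is_admp, in_feature: boolean arrays over all tested CpGs.
    Returns (sample odds ratio, one-sided p value).
    """
    a = np.asarray(is_admp, dtype=bool)
    f = np.asarray(in_feature, dtype=bool)
    table = [[np.sum(a & f), np.sum(a & ~f)],      # aDMPs: in feature / not in feature
             [np.sum(~a & f), np.sum(~a & ~f)]]    # other CpGs: in feature / not
    odds, p = fisher_exact(table, alternative="greater")
    return float(odds), float(p)
```

For instance, if 8 of 10 aDMPs but only 10 of 90 other CpGs lie in a feature, the odds ratio is (8 × 80)/(2 × 10) = 32.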

Age-related disease gene enrichment analysis

To assess whether genes containing aDMPs are enriched for genes implicated in age-related diseases and quantitative phenotypes, we leveraged the catalog of genome-wide association study (GWAS) summary association statistics for age-related diseases and traits available from the National Human Genome Research Institute (NHGRI) and the European Molecular Biology Laboratory (EMBL) [38]. Using this catalog (accessed on July 17, 2018, https://www.ebi.ac.uk/gwas/), which includes gene names, ontology annotations, and diseases/traits, we tested whether genes containing aDMPs were overrepresented among GWAS-identified age-related disease genes. Although age is associated with many diseases, their risk factors, and quantitative traits, the enrichment analysis focused on GWAS-identified genes implicated in age-related diseases and traits, longevity, and survival (overall and disease-specific). Enrichment analysis was performed separately for hyper- and hypomethylated aDMPs using a one-sided Fisher's exact test.
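At the gene level, the same one-sided Fisher's exact test can be applied to set overlaps, run once for genes with hypermethylated aDMPs and once for genes with hypomethylated aDMPs. This Python sketch assumes a background of all genes annotated to tested CpGs and restricts both gene sets to it; that background choice and the name `gwas_gene_enrichment` are assumptions, not details from the text.

```python
from scipy.stats import fisher_exact


def gwas_gene_enrichment(admp_genes, gwas_genes, background_genes):
    """One-sided Fisher's exact test: are aDMP genes over-represented among GWAS genes?

    All sets are restricted to the background (assumed: genes annotated to tested CpGs).
    Returns (overlap size, sample odds ratio, one-sided p value).
    """
    bg = set(background_genes)
    hit = set(admp_genes) & bg
    gw = set(gwas_genes) & bg
    a = len(hit & gw)             # aDMP genes that are GWAS genes
    b = len(hit - gw)             # aDMP genes that are not GWAS genes
    c = len(gw - hit)             # GWAS genes without aDMPs
    d = len(bg) - a - b - c       # all remaining background genes
    odds, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return a, float(odds), float(p)
```

Calling it separately with the hypermethylated and the hypomethylated gene lists mirrors the two analyses described above.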

Association between epigenetic age acceleration and demographic characteristics

Linear regression models including sex, race, and poverty status were used to assess associations between these factors and the epigenetic age acceleration measures (AgeAccel, IEAA, and EEAA). We also assessed all two-way interactions among sex, race, and poverty status on the epigenetic age acceleration measures. Data quality control, preprocessing, and analyses were conducted using the R/Bioconductor packages minfi, ENmix, DMRcate, sva, and missMethyl (https://www.bioconductor.org/).
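The interaction model can be written as a design matrix containing the three main effects and all three two-way products. This Python sketch returns only point estimates from least squares (no standard errors or p values); it assumes sex, race, and poverty status are coded as 0/1 indicators, and the coding and the name `fit_interaction_model` are hypothetical.

```python
import numpy as np


def fit_interaction_model(y, sex, race, poverty):
    """OLS of an age-acceleration measure on sex, race, poverty status and all
    two-way interactions. Factors are 0/1 indicators (assumed coding, e.g.
    1 = male, 1 = AA, 1 = below poverty). Returns coefficient estimates by name."""
    s, r, pv = (np.asarray(v, dtype=float) for v in (sex, race, poverty))
    names = ["intercept", "sex", "race", "poverty",
             "sex:race", "sex:poverty", "race:poverty"]
    X = np.column_stack([np.ones_like(s), s, r, pv, s * r, s * pv, r * pv])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(names, coef))
```

A nonzero `sex:race` coefficient means the sex difference in age acceleration differs between racial groups, which is the kind of interaction reported in the Conclusions.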