Introduction

Abnormal blood lipid levels are important risk factors for various diseases including cardiovascular disease1, diabetes2, renal disease3, Alzheimer’s disease4, and cancers5,6. Several large-scale genome-wide7,8 and exome-wide association studies of lipids9,10 have identified single nucleotide polymorphisms (SNPs) involved in lipid metabolism. However, the biological mechanisms behind abnormalities in lipid metabolism are not fully understood.

Complex traits are a manifestation of not only genetic but also environmental factors which in part express themselves through the epigenetic modification of DNA in all cell types. Epigenetic modifications can explain differences in a phenotype between monozygotic twins11 as well as changes in a phenotype within an individual over time12. Epigenome-wide association studies (EWAS) provide an opportunity to document differences in epigenetic marks between individuals through the quantification of the degree of methylation at thousands of CpG sites across the genome. To date, such studies have identified methylation signatures associated with a number of cardiometabolic traits including cigarette smoking13, BMI14, hepatic fat15, fasting insulin or HOMA-IR16, incident type2 diabetes17, renal function18, blood pressure19, and C-reactive protein20. In addition, several EWAS identified CpGs significantly associated with blood lipid levels in populations of European ancestry21,22,23,24. However, large-scale multi-ethic studies to identify epigenetic determinants of blood lipid levels are lacking25.

Here, we present a racial/ethnic groups specific meta-analysis of 15 EWAS of DNA from circulating white blood cells involving a total 16,265 participants from 3 racial/ethnic groups to identify CpGs with DNA methylation levels that are significantly associated with blood lipid levels. We identify 187 racial/ethnic specific novel CpG associations among Europeans, African Americans, and Hispanics and an additional 186 novel association through a trans-ethnic meta-analysis. To aid in the interpretation of our results, we quantify the consistency of associations across racial/ethnic groups, determine overlap between our findings and previously published relevant genome wide and epigenome-wide association studies, explore for the presence of cis-methylation quantitative trait loci (cis-mQTL) and/or cis-expression quantitative trait methylation (cis-eQTM) for CpG associations found to be significant in at least 2 racial/ethnic groups, and attempt to provide evidence on the direction of these associations using bi-directional Mendelian randomization.

Results

Study population

We analyzed 12 cohorts of Europeans (EA) involving 11,114 participants, 7 cohorts of African Americans (AA) involving 4,425 participants, and 2 cohorts of Hispanics (HISP) involving 699 participants (Table 1, Supplementary methods). The TwinsUK, WHI-BA23, and WHI-EMPC cohorts were composed of female participants only while NAS was composed of male participants only. The range of mean age, body mass index (BMI), high-density lipoprotein (HDL) levels, low-density lipoprotein (LDL) levels, and triglyceride (TG) levels was 42.7 to 76.0 year, 26.6 to 32.6 kg per m2, 45.5 to 59.3 mg per dl, 104.9 to 152.6 mg per dl, 74.1 to 168.5 mg per dl, respectively (Table 1). The percentage of study participants taking any lipid control medication at time of blood lipid measurement ranged between 0% in the Amish to 44% in FHS.

Table 1 Descriptive of the participated cohorts (N = 16,265).

EWAS stratified by racial/ethnic group

We measured DNA methylation levels using the Illumina Infinium HumanMethylation 450 K Beadchip in peripheral blood leukocytes or whole blood, except in GOLDN where CD4 + T cells exclusively were used (Supplementary Data 1). We performed an EWAS on HDL, LDL, and TG using four linear mixed effects models in each cohort, stratified by racial/ethnic group and a random effects meta-analysis26 with genomic control (GC) and Bonferroni correction for the number of probes tested (P < 1.09 × 10−7) (Methods, Supplementary Table 1).

We identified 447, 25, and 496 CpGs for HDL, LDL, and TG, respectively, among EA using the basic set of covariates (model 1 adjusted for age, sex, smoking, lipid medication, four SNP PCs, estimated cell proportions, plate, row, and column of plate) (Methods, Supplementary Data 2). When we further adjusted for BMI (model 2 additionally adjusted for BMI), the numbers of significant CpGs decreased substantially for HDL (146) and TG (206) but increased modestly for LDL (30). Among AA, we identified 34, 7, and 76 CpGs in model 1 for HDL, LDL, and TG, respectively, and the numbers decreased to 9, 7, and 55 with a further adjustment with BMI (model 2). For HISP, we identified 2, 0, and 6 CpGs in model 1 for HDL, LDL, and TG, respectively, and the number decreased to 0 for HDL. Excluding participants taking any lipid lowering medication decreased the sample size by 18% and decreased power but the effect estimates remained similar (models 3 and 4 adjusted for the same set of covariates of models 1 and 2, respectively, with the exception of adjustment for the use of lipid medications, Methods). Among EA, we identified 74, 15, and 86 CpGs significantly associated (P < 1.09 × 10−7) with HDL, LDL, and TG, respectively, using this most conservative model 4 that excluded statin users and adjusted for BMI (Fig. 1, Supplementary Data 2, Supplementary Fig. 1). For AA, these numbers were 7, 5, and 43 and, for HISP, they were 2, 2, and 4 CpGs, respectively. Through trans-ethnic meta-analyses of the same model, we additionally identified 49, 24, and 119 significant (P < 1.09 × 10−7) CpG-lipid level associations for HDL, LDL, and TG, respectively, of which 46, 22, and 118 were novel when compared to both our racial/ethnic specific analyses and the literature.

Fig. 1: Manhattan plots for the meta-analyses of the epigenome-wide association studies.
figure 1

Manhattan plots for (a) high-density lipoprotein (HDL), (b) low-density lipoprotein (LDL), and (c) triglycerides (TG) in European (N = 11,114), African (N = 4,452), and Hispanic (N = 699) populations. Results are plotted as negative log-transformed P values (y-axis) across the genome (x-axis). Odd chromosomes are in green and even chromosomes are in orange. The red horizontal line represents the epigenome-wide significance threshold of 1.09 × 10−7. Linear mixed effects models were implemented adjusting for age, sex (reference = male), smoking variable (never/previous/current, reference = never), lipid medication (Yes or No, reference = No), the top four principal components from genotypes (SNPs), the proportion of 5 types of cells estimated with the Houseman method (CD8 T lymphocytes, CD4 T lymphocytes, natural killer cells, B cells, and monocytes), and random effects for plate, row, and column, and BMI (model 4). The top CpGs of each chromosome were annotated with a gene name (in blue font: identified in a racial/ethnic group; red: identified in multiple racial/ethnic groups; bold: significantly associated with multiple lipid measures).

Comparison of results across racial/ethnic groups

A majority of significant CpG-lipid level associations in European population did not reach statistical significance in other racial/ethnic groups. For CpGs found to be significant in at least one racial/ethnic group (model 4), we found a high rate of concordance (88 to 100%) in the direction of effect observed in Europeans versus that observed in African Americans and, separately, Hispanics for all three lipid fractions (Fig. 2). We found a high correlation between effect sizes (Pearson correlation coefficient = 0.69 to 0.93) for HDL and TG but not for LDL (−0.20 to 0.20). Lastly, regression slopes between any 2 sets of betas were close to 1 for HDL (0.75, 1.03) and TG (0.68 to 1.12) but not for LDL (−0.28 to 0.26) (Fig. 2). We calculated the correlations coefficients and estimated regression slopes for a higher number of CpGs (122 CpGs with P < 1.09 × 10−5) before and after natural log transformation to further explore whether the differences in correlations and regression slopes between LDL and HDL/TG could be a consequence of having much smaller number of significant CpGs and/or the use of a log transformed lipid measure. For these analyses, we found both higher correlation of betas (0.21 to 0.49) and regression slopes (0.34 to 0.47) for LDL although still not as high as those observed for HDL and TG (Supplementary Fig. 2).

Fig. 2: Scatter plots and regression lines of beta estimate pairs observed in two racial/ethnic groups for CpG-lipid trait associations.
figure 2

Plotted are betas from CpGs found to be significant (P < 1.09 × 10−7) in one or more racial/ethnic group. Plots and lines are shown between Europeans (EA) and African Americans (AA) (blue arrowheads/lines), Europeans (EA) and Hispanics (HA) (red triangles/lines), and African Americans (AA) and Hispanics (HA) (green diamonds/lines) for (a) high-density lipoprotein (HDL), (b) low-density lipoprotein (LDL), and (c) triglycerides (TG). Numerical regression slope and Pearson correlation coefficients are presented in the bottom right corner of each plot.

When comparing results across all 3 racial/ethnic groups, we identified 4, 1, and 26 CpGs associated with HDL, LDL, and TG, respectively, in more than one racial/ethnic group (Table 2). Of these, 1 CpG, cg06500161, in ABCG1 was associated with both HDL and TG in opposing directions. Consistent with our findings for significant CpG-lipid trait associations overall, we found high rate of concordance of the direction of the associations of the 30 CpGs across all 3 racial/ethnic group but variable effect sizes (Supplementary Figs. 3-5). The 4 CpGs in the ABCG1, carnitine palmitoyltransferase 1 A (CPT1A), sterol regulatory element binding transcription factor 1 (SREBF1), and thioredoxin interacting protein (TXNIP) genes identified in HISP for TG were also significant in the EA and AA populations. We found multiple CpGs to be significantly associated with TG levels located within or near the phosphoglycerate dehydrogenase (PHGDH), CPT1A, SREBF1, and ABCG1 genes. Twelve out of the 30 replicated CpGs significant in at least 2 racial/ethnic groups have not been previously reported21,22,23,24 (Table 2).

Table 2 CpGs significantly associated with lipids levels in more than one racial/ethnic group.

Overlap with prior related GWAS and EWAS

We examined whether the CpGs identified in our lipid EWAS are located in or near the genes implicated in previous lipid GWAS7,27,28. Among EA, a total of 4 out of 106 genes had both CpGs and SNPs significantly associated with a blood lipid: apolipoprotein A5 (APOA5), apolipoprotein B (APOB), myosin regulatory light chain interacting protein (MYLIP), and PARP9 (Fig. 3). Among AA, 4 out of 32 genes had been identified in both lipid EWAS and GWAS: PARP9, Chromosome 7 Open Reading Frame 50 (C7orf50), ATP Binding Cassette Subfamily A Member 1 (ABCA1), and MYLIP. We further identified 192 CpG-SNP pairs within 10Mbp at an additional 19 loci of which 30 pairs at 10 loci were within 1Mbp (Supplementary Data 5).

Fig. 3: Venn diagram of genes identified through epigenome-wide association study of lipids and their overlap with other EWAS as well as genome-wide association studies of lipids.
figure 3

Genes identified in the racial/ethnic specific (European, African, and Hispanic) stratified meta-analysis of CpG-lipid associations involving either high-density lipoprotein (HDL), low-density lipoprotein (LDL), or triglycerides (TG) levels after adjusting for BMI and excluding subjects on statins. Genes identified in more than one racial/ethnic population are in bold font. Genes identified in EWAS of related cardiometabolic traits are marked with special characters: BMI (#), hepatic fat (*), fasting insulin or HOMA-IR ($), incident type2 diabetes (&), eGFR (^), blood pressure (%), C-reactive protein (CRP) (@), and cigarette smoking (+). Genes identified in previous GWAS are underlined.

Many of the CpGs identified in our lipid EWAS have been previously identified in EWAS of BMI14, hepatic fat15, fasting insulin or HOMA-IR16, incident type2 diabetes17, estimated glomerular filtration rate (eGFR)18, blood pressure19, C-reactive protein20, and/or cigarette smoking13 (Fig. 3, Supplementary Table 2). About one half of the genes (33/192 = 17% of CpGs) identified in our BMI adjusted lipid EWAS have also been implicated in previously published BMI EWAS14. There were also twelve genes identified in more than two EWAS: ABCG1, CPT1A, 24-dehydrocholesterol reductase (DHCR24), PHGDH, phosphoethanolamine/phosphocholine phosphatase (PHOSPHO1), seryl-tRNA synthetase (SARS), SKI proto-oncogene (SKI), solute carrier family 1 member 5 (SLC1A5), solute carrier family 43 member 1 (SLC43A1), SLC7A11, SREBF1, and TXNIP. Among EA, 37 out of 106 genes (39 out of 164 CpGs) have been identified in EWAS of other phenotypes. There were 14 out of 32 genes (14 out of 54 CpGs) and 2 out of 6 genes (4 out of 8 CpGs) have been identified in EWAS of other phenotypes for AA and HISP, respectively.

Methylation quantitative trait loci analysis

We searched for methylation quantitative trait loci (mQTL) influencing methylation levels of the 30 CpGs listed in Table 2. Five out of the 15 cohorts provided genetic data for this analysis including ARIC (NAA = 1,717), GOLDN (NEA = 713), KORA (NEA = 1,379), WHI-BA23 (NEA = 790, NAA = 540, and NHISP = 324), and WHI-EMPC (NEA = 494, NAA = 424, and NHISP = 221). We restricted our analysis to SNPs located within 25 kilobases up- or downstream of the CpGs with a minor allele frequency (MAF) > 0.01 in each cohort and implemented a fixed effects meta-analysis within each of the three racial/ethnic groups. A total of 11, 18, and 5 CpGs had at least one significant mQTL in EA (number of tests = 5549, Bonferroni corrected P = 9.01 × 10−6), AA (number of tests=8316, Bonferroni corrected P = 6.01 × 10−6), and HISP (number of tests=4,713, Bonferroni corrected P = 1.06 × 10−5), respectively (Supplementary Data 3). We found 7 out of our 11 CpGs in EA to also be listed to have at least one mQTLs in the mQTL DB (http://www.mqtldb.org/)29 (Supplementary Data 4). For the 7 CpGs, 95% of the significant mQTL SNPs in mQTL DB were also significant in our EA population and had consistent direction of effect. Out of the 190 significant mQTLs (SNP-CpG pairs) common to both our study and the mQTL DB, 51 (27%) were found to be mQTLs in datasets spanning across the life course in the mQTL DB including birth, childhood, adolescence, pregnancy, and middle age.

Expression quantitative trait methylation analysis

The association between DNA methylation and gene expression was investigated in the Framingham Heart Study (NEA = 4,278 including 2,726 offspring cohort participants and 1552 third generation cohort participants)30. The DNA methylation levels of 7 out of the 30 CpGs listed in Table 2 were negatively associated with the expression of their respective genes: phosphoglycerate dehydrogenase (PHGDH) (cg14476101: P = 4.09 × 10−9), poly(ADP-ribose) polymerase family member 9 (PARP9) (cg22930808 with a transcript Chr3:122398047–122449684: P = 5.58 × 10−10; with a transcript Chr3:122246779–122283503: P = 3.27 × 10−6), solute carrier family 7 member 11 (SLC7A11) (cg06690548: P = 3.34 × 10−12), CPT1A (cg09737197: P = 2.67 × 10−10; cg17058475: P = 1.18 × 10−11), ABCG1 (cg07397296: P = 6.26 × 10−8; cg06500161: P = 4.44 × 10−53). Three of these CpGs are in the 5′UTR region while the remaining four are in the gene body. Aside from one CpG in a CpG island, six were located in either the north or south shore regions (Supplementary Table 3).

Mendelian randomization approach

We explored the causal relationships between methylation and blood lipid levels for the 30 CpGs in EA (Table 2) using a bi-directional Mendelian Randomization (MR) study design31,32. First, we used genetic risk scores (GRS) for HDL, LDL, and TG constructed from established susceptibility loci for these phenotypes as instruments to examine the relationship between blood lipids and methylation (Supplementary Table 4). We found the GRSs to be significantly associated with their respective lipid levels in the 4 cohorts (GOLDN (NEA = 713), KORA (NEA = 1,379), WHI-BA23 (NEA = 790), and WHI-EMPC (NEA = 494) participating in the MR follow-up analysis (HDL: Pmeta = 1.86 × 10−37, LDL: Pmeta = 1.13 × 10−22, TG: Pmeta = 1.13 × 10−8). Our Mendelian randomization analysis suggested that the DNA methylation levels of three CpGs, cg00574958 (P = 4.23 × 10−6), cg17058475 (P = 4.72 × 10−4), and cg09737197 (P = 3.33 × 10−3), located in the 5′UTR region of the CPT1A were influenced by blood TG levels (Supplementary Table 5). Next, we investigated the effect of DNA methylation on blood lipid levels. A total of 7 out of the 30 CpGs had at least one significant mQTL with available GWAS results from the Global Lipids Genetics Consortium results (Supplementary Table 6). We implemented inverse-weighted MR method and MR-egger when >2 mQTLs were available for a given CpG (3 CpGs out of 7). None of the CpGs were significantly associated with lipid levels using MR-egger.

Discussion

We report the first large-scale multi-ethnic epigenome-wide association study (EWAS) of blood lipids. Our population specific meta-analyses revealed 187 novel CpG-lipid trait associations and identified a high concordance of the direction of effects across racial/ethnic groups for all 3 lipid traits and a high correlation of effects sizes for associations with HDL and TG. A majority of our significant CpG-lipid associations do not implicate genes previously identified through GWAS of lipids7,8,9,10,27,28, but many of our associations overlap with those identified in EWAS to date of related cardiometabolic traits especially for TG and HDL14,15,16,17,18,19,33. Thirty CpG-lipid trait associations were significant in at least 2 racial/ethnic groups with ~1/3 of these being novel. In subgroup analyses, 19 significant CpGs also harbored mQTLs and 7 were inversely associated with levels of expression of the annotated gene. Lastly, our Mendelian randomization analyses suggested that DNA methylation levels at one locus appeared to be influenced by blood TG levels.

The numbers of statistically significant CpGs decreased dramatically for HDL and TG after adjusting for BMI. These findings suggest that BMI serves as either a strong confounder or a strong mediator of a large fraction of our CpG-lipid associations for these traits. Even after adjusting for BMI, we found ~1/3 of CpGs to be linked to BMI in other studies14. In addition to the potential of residual confounding, we hypothesize that many CpGs may be independently influenced by both BMI and blood lipid levels (akin to how diet and exercise have an independent effect on weight loss).

We found a relatively high overlap of findings from our study with previous EWAS findings for cardiometabolic traits and a relative paucity of overlap with lipid loci implicated through GWAS. We hypothesize that the influence of important upstream environmental determinants of the metabolic syndrome such as diet and exercise may be responsible for these patterns although substantial additional research is needed to prove this hypothesis including studies measuring changes in epigenetic profiles of multiple relevant cell types after dietary and physical activity interventions.

We could not detect methylation QTLs (mQTLs) for many of our 30 CpGs-lipid trait associations replicating across 2 racial/ethnic groups. Among mQTLs identified, a majority were not consistently associated with the CpG over the life course29. Whether this observation reflects direct changes in methylation levels of CpGs associated with lipid levels that occur over a lifetime of accumulated environmental exposures, such as diet and exercise, remains unknown. Other environmental exposures or time-dependent events leading to subtle changes in white blood cell proportions may also be responsible for these observations.

We highlight 3 EWAS lipid loci uncovered through our meta-analysis among the plethora of both novel and known findings. First, we found a CpG to be associated with PARP9 among our few LDL findings. PARP9 is homologous to PARP10 and both are ADP-ribosyltransferases with 30% of their catalytic domains being identical34,35. Both also have been previously identified as either LDL or total cholesterol loci through GWAS28. Second, we identified CpG associations in ABCG1 for both HDL and TG. ABCG1 is a member of the superfamily of ATP-binding cassette transporters that plays an important role in macrophage cholesterol efflux. Notably, ABCA1, another member of this gene family, has also been robustly linked to the control of HDL and TC levels through GWAS36. We replicate the previously reported association between ABCG1 methylation and ABCG1 expression37 and note that expression levels of ABCG1 have also been found to be positively correlated with obesity38. Lastly, the methylation status of cg06500161 in ABCG1 is associated with an elevated risk of developing type 2 diabetes39 while genetic variants mapped to this locus are linked to atherosclerosis40,41. These constellations of findings suggest that ABCG1 may play a role in predisposing to or mediating the effects of the metabolic syndrome. Third, we found the methylation status of CpGs in CPT1A, a gene that initiates the oxidation of long-chain fatty acids, to be influenced by blood levels of TG through our MR analysis. The same CpG (cg00574958) in CPT1A was also found to be influenced by blood pressure levels in another EWAS19. In other human and animal studies, DNA methylation levels of the CpGs in CPT1A have been associated with CPT1A expression21, plasma adiponectin levels42, the metabolic syndrome43, BMI14, hepatic fat15, and high fructose consumption44,45. Collectively, these findings suggest that methylation status of CPT1A may mediate the downstream effects of the metabolic syndrome.

The two major strengths of our study are its size and ethnic diversity. These strengths allowed for improved power to detect novel CpG-blood lipid trait associations and to robustly explore the generalizability of the findings across multiple racial/ethnic groups. Our study has limitations in other respects some of which are common to all EWAS studies of blood. First, it remains unclear whether epigenetic changes in blood cells serve as a good surrogate of changes in the most relevant tissues controlling blood lipid levels46. To establish a reliable surrogate tissue, interindividual epigenetic differences must not only correlate between blood and the tissue of interest but also exposures must induce similar changes to both tissues46. Such evidence either does not yet exist or is incomplete for most trait-exposure combinations47,48,49,50. While circulating leucocytes are likely to exert at least partial direct control over blood lipid levels51,52, other relevant tissues/cells that we did not examine include hepatocytes, adipocytes, and enterocytes. A second limitation common to all EWAS of circulating leucocytes includes the potential for findings to not be truly reflective of a chronic alteration of transcriptional regulation from environmental perturbations but rather residual confounding due to persistent cell subtype proportional heterogeneity despite the application of statistical deconvolution techniques46,53,54. Lastly, the cross-sectional design of our study makes it difficult to determine the directionality of our associations54. We attempted to provide additional evidence of directionality using Mendelian randomization techniques32, but the power of our MR analyses was limited due to the lack of availability of genetic data for many cohorts limiting sample size combined with a lack of strong genetic instruments for many of CpGs examined55. In addition, the known shared genetic background of HDL, LDL, and TG introduces the possibility of biases due to pleiotropy in our MR analysis.

In conclusion, we identified 373 novel CpG-lipid traits associations through the largest multi-ethnic EWAS to date. We found a high concordance of the direction of effects for all 3 lipids traits across racial/ethnic groups and a high correlation of effects for HDL and TG with 30 CpGs—including 12 novel CpGs—reaching stringent statistical significance in at least 2 racial/ethnic groups. A large majority of implicated genes do not overlap with lipid loci identified to date through GWAS although many loci associated with HDL and TG in >2 racial/ethnic groups have been associated with related cardio-metabolic traits in previous EWAS. We provide some limited insights on mechanism of association through our mQTL, eQTM, and MR analyses but additional studies are needed before firm conclusions can be made on the causality and mechanisms behind a large majority of the associations we observed.

Methods

Study populations

A total of 15 cohorts (N = 16,265) from the epigenetics working group in the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium participated in this study. These included the Old Order Amish (OOA), Atherosclerosis Risk in Communities (ARIC), Bogalusa Heart Study (BHS), Cardiovascular Health Study (CHS), Framingham Heart Study (FHS), Genetic Epidemiology Network of Arteriopathy (GENOA), Genetics of Lipid Lowering Drugs and Diet Network (GOLDN), Hypertension Genetic Epidemiology Network (HyperGEN), Cooperative health research in the Region of Augsburg (KORA), Normative Aging Study (NAS), Prospective Investigation of Vascularity of Uppsala Elders Study (PIVUS), Rotterdam Study (RS), UK Adult Twin Registry (TwinsUK), Women’s Health Initiative Broad Agency Announcement 23 (WHI-BA23), and the Women’s Health Initiative Epigenetic Mechanisms of PM-Mediated CVD (WHI-EMPC) cohorts. Four cohorts, BHS, CHS, WHI-BA23, and WHI-EMPC, examined more than one racial/ethnic group. The total number of cohorts in the European, African, and Hispanic study populations is 12 (N = 11,114), 9 (N = 4,452), and 2 (N = 699), respectively. The participating cohorts are described in the Supplementary materials. All studies obtained written informed consent from participants and were approved by local institutional review boards and ethics committees.

Lipid measurements

High-density-lipoprotein (HDL, mg per dl) and triglycerides (TG, mg per dl) were directly measured in blood samples taken from participants after at least an 8 h fast. Low-density-lipoprotein (LDL, mg per dl) was inferred using the Friedewald’s formula56 in all cohorts except for GOLDN, HyperGEN, and KORA where LDL was measured directly. We did not infer LDL in subjects with triglycerides >400 mg per dL and we excluded lipid measure from subjects who did not fast for at least 8 h. We also excluded outliers as defined by >5 standard deviations from the mean of blood lipid in each cohort. To reduce skewness, HDL and triglycerides were natural log-transformed.

DNA methylation measurement, QC, and normalization

DNA methylation was produced by investigators from each cohort independently. Levels were measured from peripheral blood leukocytes isolated from whole blood in all studies except GOLDN where only CD4 + T cells were examined. The EZ DNA Methylation Gold Kit (Zymo Research, Orange CA) was used for bisulfite conversion. The Illumina® Infinium HumanMethylation450 BeadChip and the Illumina BeadXpress reader were used to perform the methylation assays. Either the SWAN57 method in the minfi58 R package, the Beta Mixture Quantile method (BMIQ)59, the DASEN method in the wateRmelon R package60, or the GenomeStudio® Methylation Module was used for pre-processing and normalization of the data in each cohort (Supplementary Data 1). For each CpG site, a beta-value was calculated representing the percent methylation at that CpG site. We used an annotation file provided on the Illumina website to annotate CpGs to genes. CpGs were annotated to genes by Illumina using the following rules: those located within 1500 bp upstream of transcription start site (TSS1500), TSS200, 5′UTR, 1st exon, gene body, or 3′UTR of a gene were annotated to that gene. All other intergenic CpGs were not annotated to a gene. To reduce technical batch effects, plate, row, and column information were added as random effects in the association analyses. To reduce confounding from cellular heterogeneity61, we estimated cell proportions using Houseman’s method53 in each subject and used these proportions as covariates in the association analyses.

Any single value with a detection p-value > 0.01 was set to missing. In each cohort, we excluded probes with a detection p-value > 0.01 in greater than 5% of samples. In addition, we excluded samples with a detection p-value > 0.01 in greater than 5% of probes. To avoid spurious signals in DNA methylation data, we excluded 29,233 CpGs that co-hybridize to alternate genomic sequences (highly homologous to the intended targets)62.

Epigenome-wide association study

Epigenome-wide association analyses (EWAS) were performed in each cohort stratified by racial/ethnic group (European, African, and Hispanic). For Model 1, a linear mixed effects model was used to study the association between the DNA methylation level of a CpG (dependent variable) and each of the lipid measures (independent variable; HDL, LDL, or TG), adjusting for age, sex (reference = male), smoking variable (never/previous/current, reference = never), lipid medication (Yes or No, reference = No), the top four principal components from genotypes (SNPs), and the proportion of 5 types of cells estimated with the Houseman method (CD8 T lymphocytes, CD4 T lymphocytes, natural killer cells, B cells, and monocytes)53. We added random effects for plate, row, and column. We also included family structure as a random effect among family-based studies. For Model 2, we further adjusted for BMI in addition to Model 1 covariates. We also ran Model 3 and Model 4 which were analogous to Model 1 and 2, respectively, in the subset of individuals not taking lipid-lowering medication.

Meta-analysis

We performed meta-analyses of all the participating cohorts (N = 16,265) and also stratified by racial/ethnic group: European Americans (12 cohorts, N = 11,114), African Americans (7 cohorts, N = 4,452), and Hispanics (2 cohorts, N = 699). These meta-analyses were performed for each of the 4 models, respectively. We used a random effects meta-analysis implemented in METASOFT26 to take into account the heterogeneity of the effect sizes of different cohorts while achieving a higher or comparable statistical power compared to fixed effects meta-analysis. To avoid spurious findings from population substructure, we applied genomic control63. We considered a p < 1.09 × 10−7 to be significant, equivalent to a Bonferroni correction for the number of CpG probes. We then compared results for all CpGs across all three racial/ethnic groups to identify the subset of CpGs with significant associations across two or more racial/ethnic groups. These CpGs were prioritized and followed-up with mQTL analysis, eQTM analysis, and a Mendelian randomization approach for causal inference.

Overlap with prior related genome-wide association studies

To identify the overlap between EWAS CpGs and genome-wide association studies (GWAS) SNPs of lipids, we tried three approaches: (1) Identify a GWAS SNP and a EWAS CpG pair located within 1 Mbp; (2) Identify a GWAS SNP and a EWAS CpG pair located within 10 Mbp; and (3) Identify a GWAS SNP and a EWAS CpG pair annotated to the same gene. SNPs were annotated to a gene if it is located within the transcript boundary of a protein-coding gene or a nearest gene if it is located outside of genes. The CpGs were annotated to a gene if it is located within 1500 base pairs of the transcription start site, 5′-UTR, gene body, or 3′-UTR using the Illumina annotation file. Intergenic CpGs were not compared for this 3rd approach.

Methylation quantitative trait loci

We investigated the association between either imputed or genotyped SNPs located within 25 kb upstream or downstream of each CpG and DNA methylation levels to identify cis-acting methylation quantitative trait loci (mQTL). For imputed SNP data, we restricted the mQTL analysis to SNPs with a good quality imputation (IMPUTE info > =0.4 or MACH r^2 > =0.3). Subjects taking lipid lowering medications were excluded from this analysis. Beta-values of DNA methylation levels were inverse-normal transformed and regressed on age, sex, smoking (current/former/never), BMI, at least 4 SNP PCs, cell proportions (WBC count and/or estimated WBC proportions (granulocytes as a reference)), and technical covariates (plate, row, and column as random effects) (two-sided test). Family information was also included as a random effect if a cohort was a family-based study. We then regressed the residuals on each SNP of interests stratified by racial/ethnic group.

Expression quantitative trait methylation

We extracted associations between DNA methylation levels and gene expression in blood (expression quantitative trait methylation or eQTM) from a pre-existing investigation involving 4278 participants of the FHS (2,726 offspring cohort participants and 1552 third-generation cohort participants)30. For gene expression, whole blood was collected in PAXgene™ tubes (PreAnalytiX, Hombrechtikon, Switzerland) and frozen at −80 °C. RNA was extracted using the whole blood RNA System Kit (Qiagen, Venlo, Netherlands) and mRNA expression profiling was assessed using the Affymetrix Human Exon 1.0 ST GeneChip platform (Affymetrix Inc, Santa Clara, CA), which contains more than 5.5 million probes targeting the expression of 17,873 genes. The Robust Multi-array Average (RMA) package64 as used to normalize the gene expression values and remove any technical or spurious background variation.

Linear mixed effects model was used to assess associations between residuals of DNA methylation levels and residuals of gene expression levels (two-sided test). We first regressed out the fixed effects of age, sex, white blood cell type proportions as estimated through the Houseman method53, technical variables (sample storage time, RNA integrity number), and the first component of a principal component analysis and the random effects of amplification batch in the gene expression levels. Next, we regressed out the fixed effects of age, sex, Houseman’s white blood cell type proportions, and the first two PCs of a principal component analysis and the random effect of batch (plate) from the DNA methylation levels. We then applied an additional adjustment for 25 surrogate variables to both the gene expression and DNA methylation data before the methylation-gene expression association analysis. Surrogate variables were calculated using the gene expression data (residualized for PC1, and technical variables as fixed effects and amplification batch as random effect) and the DNA methylation data (residualized for PC1 and PC2 as fixed effects and batch as random effect) adjusting for age, sex, and Houseman’s blood cell type prediction.

Mendelian randomization approach

We applied a bi-directional Mendelian randomization approach to shed light on the causal nature of associations we identified between blood lipid levels and CpGs across ≥ 2 racial/ethnic groups31,32,65,66,67. First, we created genetic risk scores (GRS) for increasing lipid levels using either imputed (where IMPUTE info > =0.4 or MACH r^2 > =0.3) or genotyped SNPs known to be robustly associated with lipids through prior large scale GWAS7,8. The GRSs for each subject were calculated as the sum of the lipid (HDL, LDL, or TG) increasing allele dosages of the SNPs listed in Supplementary Table 4 divided by the number of SNPs used to calculate the specific GRS. We performed a Mendelian randomization analysis in each study among participants not taking lipid lowering medication followed by a meta-analysis of analogous results in each study. Each lipid measure was regressed on the corresponding GRS adjusting for age, sex, smoking, BMI, at least 4 SNP PCs, and a family effect if a cohort was a family-based study. The DNA methylation level of each CpG was also regressed on the corresponding GRS adjusting for age, sex, smoking, BMI, at least 4 SNP PCs, cell proportions, technical batch effects, and a family effect if a cohort was family-based study. The study-specific estimates and standard errors of the GRS-lipid and the GRS-CpG associations were used as input for a meta-analysis to obtain overall estimates for GRS-lipid and GRS-CpG associations. The overall estimates were then used to assess the causal effect of each lipid measure to a CpG with a Mendelian randomization approach.

Secondly, we implemented Mendelian randomization65,66 method to study the effect of DNA methylation on blood lipids levels. The identified significant mQTL SNPs were used as an instrument for the CpGs of interest. Study participants taking any lipid lowering medication were excluded in the Mendelian randomization analysis. Beta-values of DNA methylation levels were inverse-normal transformed and regressed on age, sex, smoking (current, former, or never), BMI, at least 4 SNP PCs, cell proportions (WBC count and/or estimated WBC proportions (gran as a reference)), and technical covariates (plate, row, and column as random effects). Family information was also included as a random effect if a cohort is a family-based study. Then the residuals were regressed on each SNP located 25 kb up and downstream of the CpG site. The mQTL analyses were performed separately for each racial/ethnic group. Each SNPs were also searched to identify associations with lipid levels in the previous GWAS results from Global Lipids Genetics Consortium (GLGC, http://csg.sph.umich.edu/willer/public/lipids2013/)7. The estimates and standard errors of the SNP-CPG (obtained from our samples) and SNP-lipid associations (obtained from the GLGC GWAS results) were used as input for inverse-weighted MR and MR-egger methods to assess the causal effect of each CpG to a lipid measure.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.