Background

Dyslipidemia, an abnormal blood lipid profile, mainly manifests as elevated plasma triglyceride (TG), total cholesterol (TC), and low-density lipoprotein-cholesterol (LDL-C) levels and decreased high-density lipoprotein-cholesterol (HDL-C) levels, and it is a major risk factor for atherosclerotic cardiovascular diseases such as myocardial infarction and stroke [1, 2]. Dyslipidemia contributes substantially to the global burden of disease. For example, elevated LDL-C levels were one of the top 10 risk factors for all-cause mortality and disability-adjusted life-years worldwide in 2019 [3].

Blood lipid levels are a complex trait regulated by various genetic and environmental factors [4,5,6,7,8,9], with DNA methylation being a potential regulatory mechanism. DNA methylation acts as an "annotation system" for gene sequences and affects gene expression through dynamic methylation and demethylation of CpG sites. The methylation status of a given CpG site in the same individual may change over time. DNA methylation is related to the physiological effects of environmental factors on the human body and is also involved in the pathological processes of disease occurrence and progression [10]. Therefore, DNA methylation may play an important role in maintaining blood lipid homeostasis.

Several epigenome-wide association studies (EWAS) have been conducted in predominantly European populations to explore the association between DNA methylation and blood lipid levels and have identified CpG sites on genes such as ABCG1, CPT1A, and SREBF1 [11,12,13,14,15]. However, most existing studies have analyzed the association between DNA methylation and blood lipids using cross-sectional data, which makes it difficult to assess the causal relationship between them. Mendelian randomization (MR) is an effective method for causal inference with cross-sectional data [16], and some studies have used MR to estimate the causal relationships between DNA methylation levels and blood lipid levels [17,18,19]. However, the application of MR is limited by sample size requirements and strict assumptions. Because DNA methylation varies over time, studies based on longitudinal data are a more suitable approach for causal inference.

As one of the risk factors for dyslipidemia, body mass index (BMI) is also associated with DNA methylation, and several CpG sites are associated with both BMI and blood lipids [20]. Previous studies showed that BMI could influence the methylation level of certain CpG sites [21,22,23], but whether DNA methylation mediates the association between BMI and blood lipids remains unclear.

Thus, we performed an EWAS of blood lipids with a discovery stage and independent replication in participants from the Chinese National Twin Registry (CNTR) and China Kadoorie Biobank (CKB). We assessed the temporal association between replicated CpG sites and lipids using the cross-lagged panel model (CLPM) in 308 participants with follow-up data from CNTR. For the CpG sites influencing blood lipids, we further conducted a mediation analysis with the CLPM to examine the role of DNA methylation in the association between BMI and blood lipids.

Results

Demographic

After quality control, 1060 participants in CNTR and 948 participants in CKB were retained in the EWAS (see Additional file 1: Table S1 for further details). The characteristics of the EWAS participants are shown in Table 1. The mean age of both populations was approximately 50 years (49.90 ± 12.15 and 50.56 ± 7.48). Males and never smokers each accounted for more than half of the participants. In the discovery stage, approximately half of the participants had never consumed alcohol, while in the replication stage, approximately 60% of the participants were current drinkers. Participants in the replication stage tended to have lower BMI, higher TG levels, and lower TC, HDL-C, and LDL-C levels than those in the discovery stage.

Table 1 Characteristics of participants in the EWAS phase

EWAS

For Model 1, the genomic inflation factors (λ) ranged between 1.009 and 1.049. We identified 26 CpG-lipid associations in the discovery stage, with methylation levels of 17, 3, 5, and 1 CpG sites associated with TG, TC, HDL-C, and LDL-C, respectively (Additional file 1: Figure S1). A total of 23 associations involving 19 CpG sites were replicated in the replication stage, with 15, 3, 5, and 0 CpG sites remaining significant for the four lipid measures, respectively (Table 2). The direction of the effect was consistent in the two stages. cg06500161 (ABCG1), cg11024682 (SREBF1), cg19693031 (TXNIP) and cg27243685 (ABCG1) were each associated with two lipid measures. In addition, we identified 4 novel CpG sites located in 2 genes (cg12054453 and cg18942579 in TMEM49 and cg17507897 in SNX5/SNORD17) and 1 intergenic region (cg05176551 adjacent to CCDC7).
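As background for the λ values reported above, the following is a minimal sketch of how a genomic inflation factor is commonly computed from a set of EWAS P values. It assumes two-sided P values from 1-degree-of-freedom tests; the function name and the toy input are illustrative and are not taken from the study's analysis code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda: median observed 1-df chi-square divided by its expected median."""
    norm = NormalDist()
    # A two-sided P value p corresponds to |z| = Phi^-1(1 - p/2), and chi-square = z^2.
    # P values so small that 1 - p/2 rounds to 1.0 are not handled in this sketch.
    chi2 = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Toy input (hypothetical): evenly spaced P values mimic a null distribution.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
lam = genomic_inflation(null_like)
```

A value of λ close to 1, as in the toy input, suggests little systematic inflation of the test statistics.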

Table 2 Associations between DNA methylation and lipid measures (Model 1)

When we further adjusted for BMI in Model 2, only about one third (8/23) of the associations remained robust. The number of significant CpG sites fell to 7 of 15 for TG and 1 of 3 for TC, and no CpG site remained associated with HDL-C (Additional file 1: Table S2). The effect estimates were attenuated except for cg05176551. Two of the four novel CpG sites (cg05176551 for TG and cg12054453 for TC) remained significant.

Enrichment analysis

The results of the enrichment analyses based on the EWAS from Model 1 are shown in Additional file 1: Tables S3–S6. GO enrichment analysis for TG revealed lipid and cellular ketone metabolic pathways. Insulin and glucose metabolism, peptide synthesis, and regulatory processes were also enriched in the analysis and appeared to be related to TG homeostasis. Sterol and alcohol signaling pathways were enriched in the GO enrichment analysis for HDL-C. KEGG and Reactome analyses suggested enrichment for pathways associated with liver disease, lipid metabolism, and antioxidants such as PPARalpha and HMOX1, although these pathways did not remain significant after correction for multiple comparisons (FDR > 0.05).

Cross-lagged analysis

Our cross-lagged analysis showed that two CpG sites had significant temporal associations with TG and two had significant temporal associations with HDL-C in all twins (Table 3). The paths of TG at baseline to the methylation levels of cg27243685 and cg05778424 at follow-up (ρ1) were significant, suggesting that the TG level had an effect on these CpG sites. We observed the same direction of association between HDL-C and the methylation level of cg11024682. The reverse path (ρ2) was significant for cg06500161 and HDL-C, suggesting that this CpG site could influence HDL-C levels. We did not detect any temporal associations between CpG sites and TC.

Table 3 Cross-lagged association between lipid measures and DNA methylation

We then performed a stratification analysis and revealed some new temporal associations. In monozygotic (MZ) twins, the methylation level of cg11024682 at baseline showed a significant association with TG at follow-up (ρ2). In dizygotic (DZ) twins, cg17507897 also showed an association between HDL-C at baseline and methylation level at follow-up (ρ1). The associations between cg05778424, cg06500161, and cg11024682 with TG or HDL-C could still be observed in only DZ twins (Additional file 1: Table S8).

Mediation analysis

According to the results of the cross-lagged analysis, cg06500161 and cg11024682, which had effects on HDL-C or TG, were selected as potential mediators in the mediation analysis. We first estimated the temporal association between BMI and lipid or CpG sites using the CLPM. In all twins, BMI at baseline was associated with TG, HDL-C, and the methylation level of the two CpG sites at follow-up (Table 4). We then performed a mediation analysis with the methylation level of CpG sites as the mediator, BMI as exposure, and lipid measures as the outcome, and only the mediating effect of cg06500161 was observed. The results are shown in Fig. 1. BMI was associated with HDL-C with a significant direct effect, indirect effect, and total effect (all P < 0.05). The methylation level of cg06500161 explained 10.1% (P < 0.05) of the association between BMI and HDL-C.

Table 4 Cross-lagged association between BMI and lipid measures or between BMI and DNA methylation
Fig. 1 Cross-lagged panel model of the mediation effect of cg06500161 between BMI and HDL-C. The subscript t indicates traits at baseline, and the subscript t + 1 indicates traits at follow-up. Model fit: SRMR = 0.004, CFI = 1.000. *P < 0.05; **P < 0.01; ***P < 0.001

Discussion

In the current study, we reported the associations between lipid measures and DNA methylation and inferred the potential causal direction of the associations. To our knowledge, this is the first EWAS of blood lipids reported in the Chinese population. We identified 23 associations of 19 CpG sites, and 4 CpG sites located in or adjacent to 3 genes were novel. With the cross-lagged panel model, we found potential causal relationships of 2 CpG sites with TG and 2 CpG sites with HDL-C among all the significant associations and a relationship of cg11024682 with TG in only MZ twins. In addition, we conducted a mediation analysis and found that BMI has an effect on HDL-C, and the methylation level of cg06500161 partially mediates this effect.

In our EWAS, we confirmed that CpG sites widely validated in European populations, such as cg06500161 in ABCG1, cg00574958 in CPT1A, and cg11024682 in SREBF1, were also robustly associated with lipid measures in East Asian populations, and the direction of the associations was consistent. This complemented the findings of Jhun et al. [24] that the associations between blood lipids and DNA methylation are consistent across ethnic groups, including Europeans, African Americans, and Hispanics. Another study compared differences in DNA methylation patterns of multiple traits related to non-communicable diseases between black South Africans and Europeans and found that 95% confidence intervals for effect estimates overlapped for more than 85% of the shared lipid-related CpG sites [25]. However, we still identified 3 novel genes associated with TG (CCDC7), TC (TMEM49), and HDL-C (SNX5/SNORD17) with a moderate sample size, and the former two remained robust after adjustment for BMI. The differences in the methylation patterns may be associated with the genetic background of different ethnic groups [25], but the exact mechanisms need to be further explored.

Two CpG sites located in TMEM49 were related to TC. TMEM49, also known as VMP1, encodes an endoplasmic reticulum (ER) transmembrane protein that regulates the formation of autophagosomes, lipid droplets, lipoproteins, and other ER-derived structures [26, 27]. TMEM49 has been reported to affect the activity of lipoprotein-associated phospholipase A2 (Lp-PLA2) [28], an inflammatory enzyme that is a risk factor for coronary heart disease and in which lipids may play a role [29, 30]. Lp-PLA2 is bound to serum cholesterol, and in vitro experiments showed that elevated LDL-C levels result in upregulation of Lp-PLA2 [31]. All these findings indicate a potential role for TMEM49 in TC homeostasis. In addition, the methylation level of TMEM49 was also associated with waist circumference [32], overall cancer survival [33], and chemotherapy in breast cancer patients [34].

SNX5, short for sorting nexin 5, is a key regulator of endosomal trafficking [35], and SNX5 has been identified as an LDL-C-associated gene [36]. Elevated or decreased expression of SNX5 was found in multiple cancers [37, 38], and renal SNX5 was reported to positively regulate insulin-degrading enzyme expression and function [39]. Noncoding small nucleolar RNA SNORD17 is also related to carcinogenesis, including hepatocellular carcinoma [40], cervical cancer [41], and colon adenocarcinoma [42]. The association of SNX5/SNORD17 with HDL-C needs to be further investigated.

Previous studies have reported an association between BMI and some lipid-related CpG sites [20, 43, 44], which was validated to some extent in our Model 2 results. The magnitude of the association between lipid measures and methylation was attenuated when adjusting for BMI, indicating that BMI may partially influence the association. None of the HDL-C-related CpG sites remained significant. Given that these CpG sites were also BMI-related, we hypothesized that BMI might affect HDL-C through the CpG sites. This hypothesis was partially confirmed by the subsequent mediation analysis.

Based on the results of the cross-lagged analysis, we found that the methylation levels of ABCG1 could influence HDL-C levels and be influenced by TG levels. Previous studies have explored the causal relationship between ABCG1 methylation and TG and HDL-C using MR and gene expression analysis methods and reached opposite conclusions about its direction [17, 45, 46]. Min et al. and Dekkers et al. [17] both found that the methylation level of ABCG1 was influenced by TG or HDL-C using MR analysis, while Pfeiffer et al. [46] suggested that the methylation level of ABCG1 could regulate TG and HDL-C levels through gene expression. Since our study used longitudinal data while the three studies above were based on cross-sectional data, we believe that our results are more credible. ATP-binding cassette transporter G1 (ABCG1) is a critical mediator of reverse cholesterol transport (RCT) and mediates cellular cholesterol efflux to HDL particles [47,48,49,50]. The methylation level of cg06500161 (ABCG1) was correlated with the ABCG1 transcript level, which in turn had an impact on HDL-C [46].

Our results also showed that the methylation level of AKAP1 might be influenced by TG. A-kinase anchoring protein 1 (AKAP1, also known as AKAP121) binds protein kinase A and anchors it to the mitochondrial outer membrane to maintain mitochondrial function [51]. Animal experiments have demonstrated that AKAP1 is involved in the regulation of endothelial cell behavior [52], oxidative stress, and apoptosis [53] and plays a role in multiple metabolism-related diseases. For instance, researchers observed that energy expenditure and thermogenesis were significantly enhanced in brown adipose tissue of AKAP1 knockout obese mice, which could attenuate diet-induced obesity and insulin resistance [54]. DNA methylation may be an intermediate link in the association of AKAP1 with these diseases, and TG, as one of the important metabolic indicators, may have a similar regulatory mechanism.

In the stratification analysis, a noteworthy result was that cg11024682, located in SREBF1, had a potential influence on TG levels only in MZ twins. The direction of the association between TG and SREBF1 in our results was not completely consistent with a previous study, which found that the methylation level of SREBF1 was influenced by TG using MR analysis [17]. We hypothesize that the inconsistent results are due to genetic and early-life environmental factors. MR analysis used genetic variants as instrumental variables and could not exclude confounding from genetic factors. Since MZ twins share almost 100% of their genetic background and their early-life environment, our result, which was found only in MZ twins, suggests that genetic and early-life environmental factors may influence the effects of the CpG site on TG. SREBF1 encodes sterol regulatory element-binding protein 1, which activates the transcription of genes involved in cholesterol and fatty acid synthesis [55]. MicroRNA (miR)-33b, located in the intron region of SREBF1, can also act as an important regulator of lipid metabolism [56, 57]. Antagonism of miR-33 inhibited the expression of genes involved in fatty acid synthesis, such as SREBF1, and thus reduced plasma very low-density lipoprotein triglyceride levels [58].

Our mediation analysis supported that cg06500161 (ABCG1) partially mediated the effect of BMI on HDL-C. Previous studies have reported that the methylation level of cg06500161 could be influenced by BMI [21,22,23] and mediate the effect of BMI on the expression of ABCG1 [59]. Johansson and his colleagues found that ABCG1 and CETP were the most upregulated genes that were differentially expressed in obese patients during weight loss and weight maintenance after weight loss, and the prevailing HDL concentration was correlated with the expression of ABCG1 [60]. Similar results were observed in morbidly obese women [61]. However, the above two studies only analyzed ABCG1 expression levels in adipose tissue rather than blood samples. Therefore, the mediating role of ABCG1 and its methylation level between BMI and HDL-C remains to be further explored.

The strength of our study is that we explored the temporal association between CpG sites and blood lipids using longitudinal data and corresponding analyses, circumventing the limitations of cross-sectional data in causal inference. In addition, the analysis in twins, especially MZ twins, provides a naturally matched design that supports causal inference while controlling for genetic and early family environmental factors [62].

Our study also has some limitations. In the EWAS phase, to maximize the sample size, only probes common to both the 450 K and EPIC methylation arrays were included in our study, while probes present in only one methylation array were excluded. Some potential lipid-associated CpG sites may therefore not have been identified. In addition, the CKB cohort did not require participants to fast prior to blood sample collection, which may have affected the validation results. We therefore adjusted for fasting status in our analysis and used SmartSVA to control for potential confounding, which reduces the interference of fasting time on the results to some extent. In the cross-lagged analysis phase, the power of the stratification analysis was limited due to the moderate sample size. However, we still found some evidence that the association between TG and SREBF1 may be influenced by genetic and early-life environmental factors.

Conclusions

We identified 4 novel CpG sites related to lipid measures in the Chinese population. Based on the longitudinal twin data, we found the temporal sequence of DNA methylation levels of ABCG1, AKAP1, and SREBF1 with TG and HDL-C. In addition, we observed a potential mediation role of cg06500161 (ABCG1) in the temporal association between BMI and HDL-C. Our study provides evidence to elucidate the underlying biological mechanisms of DNA methylation in lipid metabolism, and future studies should continue to explore the biological role of DNA methylation in lipid metabolism and downstream effects on disease.

Methods

Study population

The discovery stage of the EWAS phase and the cross-lagged analysis phase were based on CNTR. Established in 2001, CNTR is a population-based twin registry, and the details of CNTR have been reported previously [63]. The current study included participants who took part in the follow-up surveys in 2013–2014 and 2017–2018. A total of 1392 blood specimens were collected for methylation measurements from 1084 participants, including 308 participants with repeated measurements. The replication stage of the EWAS was based on CKB, a prospective cohort of more than 0.5 million adults recruited from 10 geographically defined regions across China in 2004–2008 [64, 65]. DNA methylation data were available for 988 participants selected for a case-control study nested within CKB [66]. Information from the two cohorts was collected via standardized questionnaires, physical examinations, and blood biochemical examinations.

Data measurements

Lipid measures, including TG, TC, HDL-C, and LDL-C, were measured in blood samples. Participants were asked to fast for at least 8 h in CNTR, while fasting was not mandatory in CKB. In CKB, the fasting time of each participant was recorded and adjusted for in the subsequent analysis. Other covariates, such as age, sex, smoking status, and alcohol consumption, were collected using an interviewer-administered questionnaire. Smoking status and alcohol consumption were divided into three categories: never, former and current. Height and weight were measured in the physical examination to calculate BMI, which was defined as weight in kilograms divided by the square of height in meters. Medication history was also recorded, and participants taking lipid-lowering medicine were excluded. Outliers more than three standard deviations from the mean of each lipid measure in each cohort were removed. Blood samples with missing biochemical parameters or moderate to severe lipemia were also excluded. All lipid measures were natural log-transformed to approximate a normal distribution.
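The BMI definition, outlier rule, and log transformation described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: it assumes that the three-standard-deviation rule is applied to the raw values before log transformation and that the sample standard deviation is used, neither of which is specified in the text. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    return weight_kg / height_m ** 2


def clean_and_log(values, n_sd=3.0):
    """Drop values more than n_sd sample SDs from the mean, then natural-log the rest.

    Assumption: outliers are defined on the raw scale, before transformation.
    """
    m, s = mean(values), stdev(values)
    kept = [v for v in values if abs(v - m) <= n_sd * s]
    return [math.log(v) for v in kept]
```

In practice the rule would be applied separately to each lipid measure within each cohort, as stated above.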

The zygosity of twins was determined based on the correlation of 59 SNPs in both the 450 K and EPIC methylation arrays. It was shown that the possible cutoff point was between 0.84 and 0.90 [67], which was set to 0.90 in the current study. Twin pairs with a correlation coefficient higher than 0.90 were considered MZ twins; otherwise, they were considered DZ twins.
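A minimal sketch of this classification rule is given below. It assumes that the correlation in question is the Pearson correlation between the two co-twins' β-values at the SNP probes; the function names and input layout are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_zygosity(twin1_snp_betas, twin2_snp_betas, cutoff=0.90):
    """Return 'MZ' if the co-twins' SNP-probe correlation exceeds the cutoff, else 'DZ'."""
    r = pearson(twin1_snp_betas, twin2_snp_betas)
    return "MZ" if r > cutoff else "DZ"
```

Each argument would hold one twin's β-values at the 59 SNP probes, in the same probe order for both twins.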

DNA methylation and quality control

Genomic DNA was extracted from peripheral blood leukocytes and bisulfite-converted using the EZ DNA methylation kit (Zymo Research, Orange, CA, USA). Epigenome-wide DNA methylation levels were measured using the Infinium HumanMethylation450 BeadChip assay (Illumina, San Diego, CA, USA) or Infinium HumanMethylationEPIC BeadChip assay (Illumina, San Diego, CA, USA), and only the overlapping CpG sites of the two assays were included in the subsequent procedure.

DNA methylation measurement and preprocessing were conducted independently in the two cohorts. In CNTR, we applied the R package minfi [68] to process and combine the raw methylation data of the two assays and obtained the β-value of each CpG site to represent its methylation level. For quality control, we removed probes if they (1) had detection P values > 0.05 in more than 1% of samples or bead counts < 3 in more than 5% of samples; (2) were non-CpG or multi-hit probes; (3) were related to SNPs with MAF > 0.05 in the 1000 Genomes Project East Asian population; or (4) were located on sex chromosomes. We excluded samples if they (1) were sex mismatched or (2) had a detection P value greater than 0.01. The details of quality control of the raw methylation data in CKB can be found in a previous study [66]. The stratified quantile normalization method [69] in the minfi package was used for preprocessing and normalization. The proportions of each cell type were estimated using Houseman’s method [70] and adjusted for using the champ.refbase function in the R package ChAMP. To correct for batch effects, we performed an optimized surrogate variable analysis (SVA) with the package SmartSVA [71] in the EWAS phase. SmartSVA, which is based on the linear model, provides a fast and robust method to remove potential confounding factors in epigenetic and other genomic studies. In the cross-lagged analysis phase, batch effects were corrected using the ComBat method instead.
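Probe criterion (1) above can be illustrated with a short sketch. The study applied its filters through R packages; the standalone functions below are hypothetical and only show the thresholding logic for a single probe, given its detection P values and bead counts across samples.

```python
def failing_fraction(values, is_failure):
    """Fraction of samples for which is_failure(value) is true."""
    return sum(1 for v in values if is_failure(v)) / len(values)


def probe_passes_qc(det_p, bead_counts,
                    p_cut=0.05, max_p_fail=0.01,
                    bead_cut=3, max_bead_fail=0.05):
    """Keep a probe unless too many samples fail on detection P value or bead count."""
    # Share of samples with detection P value above 0.05 (limit: 1% of samples).
    p_fail = failing_fraction(det_p, lambda p: p > p_cut)
    # Share of samples with fewer than 3 beads (limit: 5% of samples).
    bead_fail = failing_fraction(bead_counts, lambda b: b < bead_cut)
    return p_fail <= max_p_fail and bead_fail <= max_bead_fail
```

Criteria (2) to (4) depend on probe annotation rather than on measured values and are not covered by this sketch.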

Finally, 378,654 CpG sites were retained in the discovery stage of the EWAS. We annotated CpG sites to genes with the manifest file provided on the Illumina website. CpG sites located in intergenic regions were annotated to the nearest gene using the matchGenes function (R package bumphunter) or the UCSC Genome Browser (https://genome.ucsc.edu/).

Statistical analysis

EWAS

We implemented a two-stage EWAS with participants in CNTR and CKB. For Model 1, linear mixed regression models were fitted with β-value at a CpG site as the dependent variable and each lipid measure (TG, TC, HDL-C, and LDL-C) as the independent variable using the R package nlme, adjusting for age, sex, smoking status, alcohol consumption, and all surrogate variables generated above as fixed effects. Due to the correlation within twin pairs, the unique ID of each twin pair was added to the model as a random effect in the discovery stage. The fasting status was further adjusted as a fixed effect (< 8 or ≥ 8 h) in the replication stage. BMI was further adjusted in Model 2.

Enrichment analysis

For all CpG sites associated with lipid measures, we then performed Gene Ontology (GO) term analyses, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses, and Reactome pathway enrichment analysis with the R package "methylGSA" [72].

Cross-lagged analysis

We conducted a cross-lagged analysis based on participants with repeated measurements in CNTR, whose average follow-up duration was 4.67 ± 0.22 years. A residual analysis was performed first. Using a linear mixed regression model, lipid measures were adjusted for age, sex, smoking status, and alcohol consumption at the corresponding time point as fixed effects and for twin number as a random effect. The β-values of significant CpG sites were further adjusted for blood cell proportions and batch effects. Residuals from the regression models were used in the subsequent cross-lagged analysis, and residuals of CpG sites were normalized using the Z-transform. The CLPM simultaneously estimated the autoregressive and cross-lagged regressive effects of the lipid measure and methylation data at two time points, including (1) autoregression of the lipid measure at follow-up on the lipid measure at baseline, (2) autoregression of the β-value at follow-up on the β-value at baseline, (3) cross-lagged regression of the β-value at follow-up on the lipid measure at baseline (ρ1), and (4) cross-lagged regression of the lipid measure at follow-up on the β-value at baseline (ρ2). The significance and magnitude of ρ1 and ρ2 reflected the temporal associations of the two variables. We fitted a structural equation model to estimate all the parameters and statistics above using the R package lavaan and set the cluster argument to adjust for the correlation within twin pairs. We also reported model fit indexes, including the standardized root mean squared residual (SRMR) and comparative fit index (CFI), and models with SRMR < 0.08 and CFI > 0.95 were considered good fits [73]. We repeated the steps above separately in MZ and DZ twins, and only twin pairs with complete data at both time points were included.
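The four paths listed above can be illustrated with a simplified sketch. It is not the lavaan model used in the study: it assumes the inputs are already covariate-adjusted residuals, estimates each follow-up variable by ordinary least squares on the two standardized baseline variables, and returns point estimates only, with no cluster adjustment, standard errors, or fit indices. All function names are hypothetical.

```python
import math


def _corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _std_betas(x1, x2, y):
    """Standardized coefficients of y regressed jointly on x1 and x2."""
    r12, r1y, r2y = _corr(x1, x2), _corr(x1, y), _corr(x2, y)
    denom = 1 - r12 ** 2
    return (r1y - r12 * r2y) / denom, (r2y - r12 * r1y) / denom


def cross_lagged(lipid_b, meth_b, lipid_f, meth_f):
    """Return autoregressive and cross-lagged standardized paths for two waves."""
    # Methylation at follow-up on baseline lipid (rho1) and baseline methylation.
    rho1, auto_meth = _std_betas(lipid_b, meth_b, meth_f)
    # Lipid at follow-up on baseline lipid and baseline methylation (rho2).
    auto_lipid, rho2 = _std_betas(lipid_b, meth_b, lipid_f)
    return {"auto_lipid": auto_lipid, "auto_meth": auto_meth,
            "rho1": rho1, "rho2": rho2}
```

A clearly nonzero rho1 with rho2 near zero would point to the lipid measure preceding the methylation change, and the reverse pattern to methylation preceding the lipid change.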

Mediation analysis

Since BMI was associated with DNA methylation and could affect blood lipids, we performed mediation analysis to assess whether CpG sites were the mediator of the effect of BMI on lipids. Only CpG sites that showed a potential effect on lipid measures in the cross-lagged analysis were included.

The temporal associations between BMI and lipid measures and between BMI and CpG sites were assessed with the CLPM first to deduce the potential mediator. The relationships among exposure X, mediator M and outcome Y at baseline (b) and follow-up (f) were as follows:

$$X_{f} = \beta_{X} X_{b} + \varepsilon_{Xf} \tag{1}$$
$$M_{f} = \beta_{M} M_{b} + aX_{b} + \varepsilon_{Mf} \tag{2}$$
$$Y_{f} = \beta_{Y} Y_{b} + bM_{b} + c^{\prime } X_{b} + \varepsilon_{Yf} \tag{3}$$

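where β is the autoregressive coefficient, ε is the residual term, a is the effect of the baseline exposure on the mediator at follow-up, b is the effect of the baseline mediator on the outcome at follow-up, ab is the indirect effect, and \(c^{\prime }\) is the direct effect. The model fitting and parameter estimation methods were similar to those of the cross-lagged analysis.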
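Given estimates of the paths a, b, and c′ from the equations above, the indirect effect (ab), the total effect, and the proportion mediated follow by simple arithmetic, sketched below. The function name and the numeric values in the usage note are hypothetical and are not estimates from this study. The sketch assumes the conventional decomposition in which the total effect equals c′ + ab.

```python
def mediation_summary(a, b, c_prime):
    """Decompose an exposure-outcome effect given path estimates a, b and c'."""
    indirect = a * b              # effect transmitted through the mediator
    total = c_prime + indirect    # direct plus indirect effect
    return {
        "direct": c_prime,
        "indirect": indirect,
        "total": total,
        "proportion_mediated": indirect / total,
    }
```

For example, with the hypothetical values a = 0.2, b = 0.5, and c′ = 0.9, the indirect effect is 0.1, the total effect is 1.0, and the proportion mediated is 10%. The proportion is interpretable only when the direct and indirect effects have the same sign.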

The false discovery rate (FDR) was used to correct for multiple comparisons, and FDR < 0.05 was considered statistically significant. Statistical analysis was performed using R version 4.0.2 or 4.0.3.
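The text does not state which FDR procedure was used. Assuming the widely used Benjamini-Hochberg method, adjusted P values can be computed as in the sketch below; the function name is hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

An association would then be declared significant when its adjusted value is below 0.05.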