Background

Tobacco smoking is a major modifiable risk factor for premature death and non-communicable diseases worldwide [1]. With almost 18 million deaths in 2017, cardiovascular diseases (CVD) account for the largest number of deaths of non-communicable diseases [2]. Smoking is also associated with cardio-metabolic traits, such as dyslipidemia, hypertension, insulin resistance, and obesity, which are major risk factors leading to CVD [3, 4]. Furthermore, persistent smoking has an excessive impact on DNA methylation [5,6,7] and gene expression [8,9,10], which their alterations are also linked to cardio-metabolic traits and risk of CVD [11,12,13,14,15,16].

Extensive studies have shown the independent association of smoking with DNA methylation, gene expression levels, and disease risk. In this context, smoking is associated with alteration in DNA methylation levels of several genes related to type 2 diabetes [17] and coronary artery disease [18]. Additionally, smoking-related CpGs have a strong association with all-cause and cardiovascular mortality [19]. Nevertheless, much less research has investigated smoking-related changes in DNA methylation and gene expression concurrently and in relation to health outcomes. A recent study identified a link between smoking-related DNA methylation and gene expression changes with metabolic health [20]. Their results indicate possible molecular pathways in which smoking affects disease development.

In this study, we hypothesized that smoking-related modifications in DNA methylation and gene expression are associated with each other and, additionally, with cardio-metabolic traits. Hence, we first determined three-way associations, including CpGs versus gene expression, cardio-metabolic traits versus CpGs, cardio-metabolic traits versus gene expression. To this end, we selected CpGs and gene expression probes associated with current versus never smokers using the largest published epigenome-wide association study (EWAS) [6] and transcriptome-wide association study (TWAS) [8] to date. Next, we used data from the Rotterdam Study to test the expression quantitative trait methylation (eQTM) association between the selected CpGs and gene expression probes. Subsequently, we tested the association for these CpGs and genes with different cardio-metabolic traits, including lipids, glycemic indices, blood pressure, and obesity-related traits. Moreover, we performed mediation analysis to test the mediating effect of; (1) DNA methylation in the association between smoking and cardio-metabolic traits, (2) gene expression in the association between smoking and cardio-metabolic traits, and (3) DNA methylation in the association between smoking and expression levels of smoking-related genes. To test the validity of our findings, we further replicated our results in an independent cohort, The Cooperative Health Research in the Region of Augsburg (KORA) study.

Results

An overview of our study design is illustrated in Fig. 1. The discovery dataset consisted of 1412 participants with DNA methylation data from the two sub-cohorts of the Rotterdam study; RS-II and RS-III. Of these, 716 participants from RS-III had also gene expression data [21]. The replication dataset comprises 1727 participants with DNA methylation data, of whom 687 also had gene expression data, from the KORA study (F4) [22]. Both the discovery and replication cohorts consisted of both males and females (53.3%) and current, former and never smokers. In the current study, the former and never smokers are combined in the non-smoker category (83.6%). General characteristics of the study population are listed in Table 1.

Fig. 1
figure 1

Schematic overview of the study design. In the current paper, previously identified CpGs by the largest available EWAS [6] and genes by the largest available TWAS [8] associated with current versus never smokers were used to test the link between smoking and cardio-metabolic traits. To this end, we first examined the association between smoking and alterations in gene expression (a). Second, we checked the association between the smoking-related CpGs and the smoking-related gene expression probes (a). Third, the smoking-related CpGs and gene expression probes that were in eQTM with each other were tested for their association with cardio-metabolic traits (b)

Table 1 Population characteristics

Correlation between smoking-related changes in DNA methylation and gene expression

We selected 2623 CpGs previously reported as being significantly (P < 1 × 10–7) differentially methylated between smokers and never smokers [6]. Of these, 2549 CpGs passed the quality control in the Rotterdam Study. Furthermore, we selected 502 gene expression probes that were differently expressed between smokers and never smokers (FDR < 0.05) and replicated in an independent dataset as part of the same study [8]. Of these, 443 gene expression probes passed quality control in the Rotterdam Study. Then, we investigated the eQTM associations to test the possible impact of smoking-related DNA methylation changes on the smoking-related genes, or vice versa. To this end, we computed the residuals for both the CpGs and gene expression probes. Then, we tested the association between all the smoking-related CpGs with all the smoking-related gene expression probes. Here, we investigated cis-eQTMs in which the CpG regulates transcription of a neighboring gene (≤ 250 Kb from each side of the transcription start site). Also, we studied the trans-eQTM association in which a CpG regulates distant genes located > 250 Kb of the transcription start site [23]. Notably, out of the 2549 smoking-related CpGs, 1224 were associated with at least one of the gene expression probes at the significance threshold of P < 4.4 × 10–8 (0.05/443 × 2549). Of the 443 tested gene expression probes, 200 probes were significantly associated with at least one of the 2549 CpGs, after correcting for multiple testing (Additional file 1: Table S1). The R code to generate the residuals for the CpGs and gene expression probes, and for the eQTM analysis are included in Additional file 2.

To examine the possible enrichment due to the smoking effect, we further tested if the number of significant eQTM associations is higher while using smoking-related CpGs and genes, compared to the number of eQTM associations while using randomly selected CpGs and genes. When testing the association between the 2549 smoking-related CpGs with 443 randomly selected gene expression probes, we found that only 325 CpGs are associated with at least one of these gene expression probes and 186 gene expression probes with at least one smoking-related CpG. Using the chi-square test of independence to compare the use of smoking-related gene expression probes versus randomly selected gene expression probes, we obtained for the CpGs (1224 vs 325, respectively) P < 1.0 × 10–5 and for the genes expression probes (200 vs 186, respectively) a P value of 0.38. Similarly, when testing the association between 2549 randomly selected CpGs with the 443 smoking-related gene expression probes, we found only 465 CpGs associated with at least one smoking-related gene expression probe, and 19 gene expression probes with at least one smoking-related CpG. Using the chi-square test of independence, comparing the use of smoking-related CpGs versus randomly selected CpGs, we found a significant difference (P < 1.0 × 10–5) for both the CpGs (1224 vs 465, respectively) and the gene expression probes (200 vs 19, respectively). These results indicate enrichment of smoking-related genes in smoking-related DNA methylation sites and vice versa.

The replication in the KORA study confirmed the association of 134 smoking-related CpGs with at least one gene expression probe and 50 smoking-related gene expression probes with at least one smoking-related CpG, after correcting for multiple testing, at the significance threshold of P < 2.04 × 10–7 (0.05/200 × 1224).

Association of smoking-related changes in DNA methylation and gene expression with cardio-metabolic traits

We tested the association of the 1224 CpGs and the 200 gene expression probes with cardio-metabolic traits, including high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TG) and serum cholesterol, fasting glucose and insulin levels, systolic blood pressure (SBP) and diastolic blood pressure (DBP), waist to hip ratio (WHR) and body mass index (BMI) in the Rotterdam Study. After adjusting for age, sex, blood cell count, and technical covariates (model 1), we found significant associations between 202 out of the 1224 smoking-related CpGs and any cardio-metabolic trait at P < 4.08 × 10–5 (0.05/1224) (n = 1412 participants) (Additional file 3: Table S2). Among these, we observed associations with HDL (126 CpGs), TG (84 CpGs), glucose (2 CpGs), insulin (10 CpGs), DBP (1 CpG), WHR (21 CpGs), and BMI (16 CpGs). After further adjustment for BMI and relevant medication in the model 2, associations with 109 CpGs remained significant, including HDL (58 CpGs), TG (35 CpGs), DBP (1 CpG), WHR (6 CpG), and BMI (16 CpG same as model 1) (Additional file 3: Table S3). The R code to test the association between cardio-metabolic traits and the CpGs are included in Additional file 4. We pursued replication in the KORA study for the CpGs reaching significance in the model 2 and found that 26 CpGs surpassed the nominal significance (P < 0.05, n = 1727 participants), including 8 CpGs for HDL, 8 CpGs for TG, 4 CpGs for WHR, and 7 CpGs for BMI (Table 2, Additional file 3: Table S3). The direction of associations with cardio-metabolic traits was consistent in all 26 replicated CpGs. Based on the stringent Bonferroni-adjusted P value threshold, the replication signals were significant at 2 CpGs for TG (P < 0.05/35 = 1.43 × 10–3), 3 CpGs for WHR (P < 0.05/6 = 8.33 × 10–3), and 4 CpGs with BMI (P < 0.05/16 = 3.13 × 103) (Table 2 and Fig. 2).

Table 2 CpG sites associated with cardio-metabolic traits in DNA methylation analysis
Fig. 2
figure 2

The overlap of smoking-related CpGs and genes in association with cardio-metabolic traits. In the current study, 2549 smoking-related CpGs and 443 smoking-related gene expression probes were included. Of these, 1224 CpGs and 200 gene expression probes showed eQTM association. We found the association for 26 CpGs and 19 genes (21 expression probes) with at least one cardio-metabolic traits, which surpassing the nominal significance (P < 0.05) in the KORA replication study

Furthermore, out of the 200 smoking-related gene expression probes 39 (35 genes) were significantly associated with at least one cardio-metabolic trait at P < 2.5 × 10–4 (0.05/200) in the Rotterdam Study (n = 716 participants) (Additional file 2: Table S4). In the Illumina HumanHT-12 Expression BeadChip array, some of the annotated genes have more than one probe. Therefore, we adjusted the analysis for the number of probes we tested and provided both the probe ID and annotated gene in the tables. Of the 39 probes, we found associations with HDL (15 genes), LDL (1 gene), TG (18 genes), cholesterol (1 gene), glucose (3 genes), insulin (13 genes), WHR (6 genes), and BMI (14 genes). After further adjustments in model 2, the associations of 29 probes (27 genes) remained significant, including HDL (5 genes), LDL (1 gene), TG (14 genes), cholesterol (1 gene), and insulin (2 genes), for the association with BMI nothing changed (14 genes) (Additional file 3: Table S5). The R code to test the association between cardio-metabolic traits and the gene expression probes is included in Additional file 5. Replication in the KORA study for the gene expression probes that reached significance in model 2 showed 21 probes (19 genes) that passed the nominal significance (P < 0.05, n = 687 participants). These include 2 genes for HDL, 13 genes for TG, 1 gene for insulin, and 10 genes for BMI (Table 3, Additional file 3: Table S5). The direction of the association between gene expression and cardio-metabolic traits was consistent for all these genes. Based on the stringent Bonferroni-adjusted P-value in which we adjusted for the number of probes, the replication signal was significant at 2 genes for HDL (P < 0.05/5 = 0.01), 11 genes for TG (P < 0.05/15 = 3.33 × 10–3), 1 gene for insulin (P < 0.05/2 = 0.025), and 4 genes for BMI (P < 0.05/16 = 3.13 × 10–3). Several of these genes were associated in model 2 with more than one cardio-metabolic trait and were replicated at least at the nominal significance (P < 0.05). For example, KLRB1 (ILMN_2079655), ITM2C (ILMN_2366041), and CD3D (ILMN_2261416) were associated with both TG and BMI, and OCIAD2 (ILMN_1700306) was associated with both HDL and TG, and EFHD2 (ILMN_1761463) was associated with HDL, TG, and BMI (Table 3 and Fig. 2).

Table 3 Gene expression probes associated with cardio-metabolic traits

Next, we explored whether there is an overlap in the results obtained with DNA methylation and gene expression data, which possibly explain the link between smoking and cardio-metabolic traits. Table 4 shows the overlap of the replicated association of cardio-metabolic traits with gene expression, which were both also associated with the smoking-related CpGs, indicating a three-way association (Fig. 1). Additional file 3: Table S6 displays the three-way association as obtained in our discovery dataset. For example, we found in the Rotterdam Study overlapping association of serum HDL levels with four CpGs (cg01305745, cg06177555, cg07990556, and cg16448702) and expression levels of three genes (EFHD2, PRF1, and OSBPL5). Likewise, we found the association of TG levels with 18 CpGs and six genes (ICOS, GZMA, C12orf57, CD3D, CLDND1, and EFHD2). Finally, we found BMI to be associated with 16 CpGs and five genes (LRRN3, EFHD2, PID1, STOM, and CD3D) (Additional file 3: Table S6). Of these, we were able to replicate the three-way association of TG with DNA methylation levels of cg04716530 and expression levels of GZMA, and DNA methylation levels of cg21566642 and expression levels of CLDND1 in the KORA study. Furthermore, we found BMI to be associated with DNA methylation levels of 6 CpGs and expression of two genes (LRRN3 and PID1) (Table 4).

Table 4 The DNA methylation sites associated with gene

In the three-way association (Table 4), we also identified CpGs associated with expression levels of genes far approximate from their annotated gene/loci. We did a lookup for the identified CpGs for eQTM association using data from the BIOS-BBMRI database (https://www.genenetwork.nl/biosqtlbrowser/). Here, we found cis-eQTMs between cg17287155 and expression of EXOC3 and between cg03636183 and expression of F2RL3. In the Rotterdam Study, both EXOC3 and F2RL3 gene expression probes did not pass the QC. Hence, we could not test the influence of these genes in the identified eQTM associations in a three-way analysis.

Mediation analysis for smoking-related CpGs and genes associated with cardio-metabolic traits

As shown in Fig. 3, we used mediation analysis to investigate the effect of DNA methylation and gene expression, independently, in the association between smoking and cardio-metabolic traits. Also, we tested the mediating effect of DNA methylation in the association between smoking and gene expression. In total, we conducted three different models; first, gene expression as a mediator in the observed association between smoking and cardio-metabolic traits (A1 and A2 in Fig. 3); second, DNA methylation as a mediator in the observed association between smoking and gene expression (B1 and B2 in Fig. 3); and third, DNA methylation as the mediator in the association between smoking and cardio-metabolic traits (C1 and C2 in Fig. 3). We conducted the average causal mediation effect (ACME), average direct effect (ADE), and the proportion mediated (Prop. med.), which are illustrated in Table 5 (and Additional file 3: Table S7). The ADE reflects the effect of smoking on the tested outcome that does not depend on the mediator. The R code for the mediation analyses is included in Additional file 6, and an example input file is provided in Additional file 7.

Fig. 3
figure 3

Schematic overview of the mediation analyses. We used mediation analysis to test the mediation effect of gene expression in the association between smoking and cardio-metabolic traits (A1 and A2). Furthermore, we tested the mediation effect of DNA methylation in the associations between smoking and gene expression (B1 and B2) and the mediation effect of DNA methylation in the association between smoking and cardio-metabolic traits (C1 and C2)

Table 5 Mediation effect of DNA methylation and gene expression in the association between smoking and cardio-metabolic traits

The mediation effect of the three-way associations as obtained in our discovery dataset (Additional file 3: Table S6) is provided in Additional file 3: Table S7. Out of the 124 mediation analysis conducted, there was significant mediation effect in 69 of them in the Rotterdam Study (Additional file 3: Table S7). Of these, we were able to replicate the mediating effect of cg01305745 (VKORC1) and cg16448702 (INPP5D) in the association between smoking and PRF1 expression (ILMN_1740633). Also, we identified the mediating effect of cg16448702 (INPP5D) in the association between smoking and OSBPL5 (ILMN_1802151). Furthermore, we replicated the mediation effect of 9 CpGs in the association between smoking and LRRN3 expression (ILMN_1773650 and ILMN_2048591) (Additional file 3: Table S7). Finally, of the replicated three-way associations as shown in Table 4, we were able to replicate the mediation effect of cg03636183 (F2RL3), cg06096336 (PSMD1; HTR2B), cg13708645 (KDM2B), and cg17287155 (AHRR) in the association between smoking and LRRN3 expression (Table 5). We conducted the ρ at which ACME is 0, to test the models' sensitivity. Here, we obtained ρ's in the range between − 0.1 and − 0.5, and 0.1 and 0.4. A value of ρ close to 0 indicates that the assumption we made is sensitive to violations [24].

Discussion

The associations of smoking, gene expression, and DNA methylation with cardio-metabolic traits have been studied independently and reviewed in great detail [11, 25,26,27,28]; however, the overlap between epigenetics and transcriptomics in the association between smoking and cardio-metabolic traits has been studied much less. This study investigated the relationship between previously identified smoking-related changes in DNA methylation [6] and gene expression [8], followed by their associations with cardio-metabolic traits within two population-based cohort studies. In this line, we first showed several significant cis- and trans-eQTM associations between smoking-related CpGs and gene expression probes. Furthermore, we replicated 26 smoking-related CpGs and 19 smoking-related genes (21 probes) associated with cardio-metabolic traits. Moreover, we showed three-way association of TG with two CpGs and two genes (GZMA and CLDND1), and BMI with six CpGs and two genes (PID1 and LRRN). Finally, our study demonstrated a mediating effect of 4 CpGs (cg03636183, cg06096336, cg13708645, and cg17287155) in the association between smoking and the BMI-related gene LRRN3.

Our results showed a three-way association between TG with the decrease in DNA methylation levels of cg21566642 and the increase in expression levels of CLDND1. In this line, smoking was associated with an increase in the expression of CLDND1 [8] and a decrease in cg21566642 DNA methylation levels [6]; and here, we showed the inverse relation between CLDND1 expression and methylation levels at cg21566642. The expression of CLDND1, a tight junction protein, is shown to be highly increased in human Colon cancer samples and cell lines, and also positively correlated with tumor growth and disease progression [29]. The inverse association between DNA methylation levels at cg21566642 and smoking was previously shown in blood samples with cross-tissue replications in adipose tissue and skin tissue [20]. Additionally, cg21566642 is inversely associated with CVD risk [30], all-cause mortality [31], and with left ventricular mass (LVM) index in young adults [32]. LVM index is an important cardiac remodeling trait that is an intermediate phenotype for heart failure. In line with this, an increased LVM index is associated with high levels of TG [33, 34] and with an increased risk of depressed left ventricular ejection fraction, coronary heart disease, congestive heart failure, and stroke [35, 36].

In the three-way association for BMI, we found that smoking is associated with lower BMI, indicating that current smokers are less likely to be obese than never smoker, which has been reported in several previous studies as well [37,38,39]. Our results further showed that cg03636183 (F2RL3) was positively associated with BMI and negatively associated with the expression of PID1 and LRRN3. Smoking was inversely associated with cg03636183 [6] and positively with PID1 and LRRN3 expression [8]. Here, we found an inverse relation between cg03636183 and expression levels of PID1 and LRRN3. Due to the quality control implemented within the Rotterdam Study gene expression profiling data, gene expression data on F2RL3 was not available. Therefore, we could not test if the association of cg03636183 with PID1 and LRRN3 expression levels was independent or via a downstream effect of F2RL3 expression. Nonetheless, the inverse correlation between DNA methylation levels at cg03636183 and expression of F2RL3 was previously shown [20]. This might indicate that the identified eQTM associations are, at least partly, via F2RL3 expression. F2RL3 encodes the protease-activated receptor-4 (PAR-4), a protein expressed in various tissues that introduce platelet activation, intimal hyperplasia, and inflammation [40]. Furthermore, the expression of F2RL3 was associated with metabolic disease risk phenotypes, including a negative association with visceral fat mass and a positive association with total fat mass and android-to-gynoid fat ratio [20]. Additionally, the inverse association between DNA methylation levels at cg03636183 and smoking has been shown in blood samples with cross-tissue replications in adipose and skin tissues [20]. The inverse relation between DNA methylation levels at cg03636183 and TG [41], all-cause mortality [31], lung cancer incidence and mortality [42], as well as total mortality and cardiovascular mortality [43] was also previously identified. Also, a smoking-related decrease in cg03636183 methylation levels appears to increase serum levels of IL-18 [44]. IL-18 promotes the synthesis of IL-6, which stimulates the production of serum CRP [45, 46]. The increase in IL-18 and IL-6 leads to a higher risk ratio for CHD development [47]. Moreover, the increase in serum CRP concentrations results in increased risk ratios for CHD, ischemic stroke, vascular mortality, and non-vascular mortality [48].

Two of the CpGs, cg26361535 (ZC3H3) and cg25649826 (USP22), for which we found a three-way association with BMI and LRRN3, have been reported to be positively associated with BMI [49]. Both CpGs are cross-tissue replicated in adipose tissue and in isolated adipocytes for obese cases versus normal-weight controls. The association with cg26361535 was in the same direction and for cg25649826 in the opposite direction as obtained in our results [49]. Additionally, both CpGs were positively associated in blood with weight, WHR, glucose, insulin, TG, and CRP, and negatively with HDL. Furthermore, cg26361535 was positively associated with SBP and DBP [49] and all-cause mortality [31].

Finally, we identified a three-way association between BMI, an increase in methylation levels at cg17287155 (AHRR), and LRRN3 expression. Smoking is negatively associated with DNA methylation levels at cg17287155 [6] and, as we replicated here, positively associated with BMI [50]. Notably, in the eQTM look-up we found a cis-eQTM for cg17287155 with the expression of EXOC3, instead of with its annotated gene (AHRR). AHRR is a well-studied gene in relation to smoking [5] and is a key regulator of the Xenobiotic metabolism pathway responsible for detoxification of polyaromatic hydrocarbons (PAHs) in tobacco smoke [51, 52]. Nevertheless, EXOC3 overexpression increases insulin-induced glucose uptake in adipocytes [53], indicating a possible link for EXOC3 with CVD-related risk factors. Further research is needed to verify the eQTM-associations for cg17287155 with EXOC3 and its impact on the eQTM-associations identified in the current study.

The identified associations and mediating effects in our study indicate a possible regulatory effect of DNA methylation on the expression levels of genes far from the neighboring methylation site, which so-called trans-regulatory effect of methylated CpG sites on gene expression [54]. So far, most previous studies have limited their research to the correlation between gene expression and DNA methylation at CpGs located in the nearby regions and in the gene body, or the cis-regulatory effect. In this line, a recent study has shown the trans-regulatory effect of DNA methylation in the associations with gene expression and chronic obstructive pulmonary disease [54]. Therefore, future research is needed with a broader methodological approach, including examining possible trans-regulatory effects to gain more insight into the epigenetic regulatory effects in disease studies.

This study has strengths as well as limitations that should be considered when interpreting the results. The main strengths of this study include the availability of DNA methylation data in a large sample of adults from the general population overlapping with transcriptomic and clinical data. Another strength is the use of the largest available EWAS [6] and TWAS [8] to date for selecting the CpGs and genes of interest associated with smoking. A limitation of the current study could be that data on smoking habits are retrieved from questionnaires, which might be underestimating actual smoking levels possibly leading to information bias [55,56,57]. This self-reporting bias can arise due to several reasons, such as recall bias in which a participant might not remember the true exposure or social desirability bias in which participants deliberately underestimate due to the socially stigmatized nature [57]. However, we expect the underestimation to be primarily quantitative and should not significantly impact the current versus non-smoker categorization we used in this study. Also, the questionnaires used for smoking data-collection did not include information regarding passive smoking, which is a risk factor for CVD [58]. As a result, we were not able to adjust for the passive-smoking effect in our analysis. As these participants are included in the non-smoker group, this might have underestimated the true effect.

Furthermore, due to the nature of the current study we have included the same participants in all mediation analyses and have used the mediator and exposure measurements on the same time-point; therefore, we cannot rule out reverse causality. Another limitation is that DNA methylation and gene expression levels were only measured at baseline; hence, we have no access to pre-measurement covariates. Consequently, we could not further adjust our models without risking the adjustment of a mediator, which could explain the ρ values close to 0 we obtained in a subset of our models in the sensitivity analysis. However, we did include additional adjustments (e.g., BMI and relevant medication) in the association analysis between cardio-metabolic traits with DNA methylation and gene expression, indicating the robustness of the identified three-way associations. Also, due to the stringent quality control in the Rotterdam Study, we were not able to test the impact of the cis-eQTM genes in the identified eQTMs. Finally, the use of whole-blood for the quantification of DNA methylation and transcriptomics associated with smoking and cardio-metabolic traits could be a limitation, since DNA methylation and gene expression are tissue-specific. Nonetheless, these data from other tissues are currently not available in the majority of population-based studies including the two participating cohorts in this study.

Conclusions

In this study, we tested the association of smoking-related changes in DNA methylation and gene expression with cardio-metabolic traits. We found a three-way association of TG and BMI with CVD-relevant CpG sites and genes. Our results may provide further insight into the possible molecular cascades linking smoking to metabolic risk factors leading to CVD. Further research is warranted to conduct experimental research on the molecular mechanisms of the impact of smoking on cardiovascular disease and its risk factors through changes in DNA methylation and gene expression levels.

Methods

Study population

The discovery data set comprised a total of 1412 participants included in the Rotterdam Study; the design from the Rotterdam Study has been described elsewhere [21]. Briefly, in 1990 all residents of Ommoord, a district in Rotterdam, aged 55 years and older, were invited for participation (RS-I). In 2000, the cohort was extended with participants who had reached the age of 55 years or who had moved into the district (RS-II). An additional group was invited in 2006, from the age of 45 years and older (RS-III). Participants have been re-examined every 3–4 years. In the current study, we used data from the third visit from RS-II (RS-II-3) and the first and second visit of RS-III (RS-III-1 and RS-III-2). In total, DNA methylation measurements of 1412 participants from RS-III-1, RS-II-3, and RS-III-2 were included in our analysis. Additionally, gene expression data were available for 716 participants included in RS-III-1. Smoking information was collected via self-reported questionnaires; additional data collection details are described in Additional file 8.

The replication data comprised a total of 1717 participants included in The Cooperative Health Research in the Region of Augsburg (KORA) study. The KORA study is a series of independent population-based epidemiological surveys and follow-up studies of participants living in the region of Augsburg, Southern Germany. The KORA F4 study, a 7-year follow-up study of the KORA S4 survey (examined 1999–2001), was conducted between 2006 and 2008. The standardized examinations applied in the survey have been described in detail elsewhere [21]. A total of 3080 subjects with ages ranging from 32 to 81 years participated in the examination. In a random subgroup of 1802 KORA F4 subjects, the genome-wide DNA methylation patterns were analyzed as described in Additional file 3. Smoking information was collected via self-reported questionnaires; additional data collection details are described in Additional file 8.

DNA methylation data

DNA methylation in the Rotterdam Study and KORA study was extracted from whole peripheral blood and DNA methylation measurements were obtained using the Illumina Infinium Human Methylation 450K BeadChip (Illumina Inc, San Diego, CA, USA). The DNA methylation pre-processing procedures are described in Additional file 3. The methylation proportion of a CpG site was reported as a methylation β-value in the range of 0 to 1. Genome coordinates provided by Illumina (GRCh37/hg19) were used to identify independent loci.

In the current study, CpGs of interest were selected using a recent EWAS [6] investigating the association between tobacco smoking and changes in DNA methylation values in the epigenome. In total, 2623 CpG sites were identified as being significantly (P < 1 × 10–7) differentially methylated between smokers and never smokers. In the Rotterdam Study, 2549 out of the 2623 CpGs passed the quality control and are included in this study (Additional file 3: Table S8).

RNA expression data

In the Rotterdam Study, RNA was isolated from whole blood and gene expression profiling was performed using the IlluminaHumanHT-12v4 Expression Beadchips (Illumina, San Diego, CA, USA). The expression dataset is available at Gene Expression Omnibus (GEO) public repository under the accession GSE33828: 881 samples are available for analysis. In KORA F4, total RNA was extracted from whole blood and the Illumina Human HT-12 v3 Expression BeadChip (Illumina, San Diego, CA, USA) was used for gene expression profiling [59]. A more detailed description is implemented in Additional file 8.

In the current study, genes of interest were selected using a previous TWAS testing the association between gene expression and current versus never-smoking status [8]. In this TWAS, the meta-analysis was performed on all transcripts with matching gene Entrez IDs. Employing a significance threshold of FDR < 0.05, 886 significant gene Entrez IDs were identified, of which 387 replicated in an independent dataset. Employing the annotation file provided by the Illumina (HumanHT-12_V4), we found 502 gene expression probes to be annotated to these gene Entrez IDs out of which 443 were present in the Rotterdam Study and were included in the current study (Additional file 8: Table S9).

Correlation between DNA methylation and gene expression

Since DNA methylation and gene expression may affect each other (i.e., eQTMs), we tested the association between 2549 CpGs and 443 gene expression probes linked to smoking in participants who had both methylation and gene expression data available in the Rotterdam Study (N = 716). We regressed out age, sex, blood cell counts (fixed effect), and technical covariates (random effect) on the normalized beta-values of the CpGs and separately on the mRNA expression levels using a linear mixed model analysis. The association between the residuals of DNA methylation (independent variable) and gene expression (dependent variable) was examined using a linear regression model. The robust Bonferroni-corrected P value threshold for a significant association was P < 4.4 × 10–8 (0.05/443 × 2549).

Additionally, we randomly selected 443 gene expression probes from the IlluminaHumanHT12v4 Expression Beadchips, and 2549 CpGs from the Illumina Human 450K array, that were available in the Rotterdam Study. Using the same methods mentioned above, we tested the association between the 2549 smoking-related CpGs with the 443 randomly selected gene expression probes, and the association between 2549 randomly selected CpGs with the 443 smoking-associated gene expression probes. The chi-square test of independence was used to test possible enrichment for the smoking effect.

Association of DNA methylation and gene expression with cardio-metabolic traits

We studied the relationship of cardio-metabolic traits with (1) smoking-CpGs associated with at least one smoking-gene probe, and (2) smoking-gene probes associated with at least one smoking-CpG. We included the following cardio-metabolic-related phenotypes: HDL, LDL, TG, serum cholesterol, fasting glucose and insulin levels, SBP, DBP, WHR, and BMI.

First, we tested the association between the smoking-related CpGs (dependent variable) with the cardio-metabolic traits (exposure variable) using linear mixed effects models (LME4 package in R). The selected covariates in model 1 with fixed effects were age, sex, and cell counts for granulocytes, lymphocytes and monocytes. Array number and position number on array were added in the model as covariates with random effect to correct for batch effect. In model 2, we additionally adjusted for BMI and relevant medication, including for lipid exposures (lipid-lowering medication), for glycemic traits (glucose-lowering medication), for SBP and DBP (lipid-lowering medication and anti-hypertensives, diuretics, beta-blockers, calcium channel blockers, and RAAS modifying agents).

Second, we tested the association between gene expression (dependent variable) and the cardio-metabolic traits (exposure variable) using linear mixed-effects models (LME4 package in R), adjusting for age, sex, blood cell counts (granulocytes, lymphocytes, and monocytes), RNA quality score and batch effect. In model 2, we additionally adjusted for BMI and relevant medication (as described for DNA methylation).

Third, we combined our EWAS and TWAS results and showed the obtained three-way association; CpG versus gene expression; cardio-metabolic trait versus CpG; cardio-metabolic trait versus gene expression. For the CpG versus gene expression, we did a lookup for the identified CpGs to identify possible cis- eQTM associations using data from five Dutch biobanks (BIOS-BBMRI database) in a total of 3841 whole blood samples (https://www.genenetwork.nl/biosqtlbrowser/).

Mediation analysis

CpGs and gene expression probes associated with each other and associated with the same cardio-metabolic trait were reviewed in three mediation analyses (Fig. 3); (1) the mediation of gene expression in the association between smoking status and the cardio-metabolic trait, (2) the mediation of DNA methylation in the association between smoking status and gene expression changes, and (3) the mediation of DNA methylation in the association between smoking status and the cardio-metabolic trait. In all three analyses, we included the same participants, current versus non-smokers as exposure and all models are corrected for age and sex. In the first analysis, we used the gene expression as potential mediator and the cardio-metabolic trait as outcome. In the second analysis, we used DNA methylation as possible mediator and the gene expression as outcome. In the third analysis, we used DNA methylation as possible mediator and the cardio-metabolic trait as outcome. We used the “mediate” function in the mediation package in R [60], using the bootstrap method including 1000 simulations and confidence intervals using the BCa method [61]. The proportion mediated describes the average magnitude of indirect association between smoking status and the gene expression or cardio-metabolic trait attributed through changes in DNA methylation or gene expression relative to the average total association, and it is calculated by dividing the average causal mediation effect by the average total effect [62]. Asymptotic 95% confidence intervals (CI) were obtained from nonparametric bootstrapping with 1000 iterations. These mediation analyses assumed no additional unmeasured confounding; however, if unobserved variables confound the models, the unmeasured confounding assumption is violated. Therefore, we used the sensitivity analysis included in the mediation package using the “medsens” function conducted by varying the values of ρ and determine the ρ at which ACME is 0 per model. Obtaining a value of ρ close to 0 indicates that the assumption is sensitive to violations, meaning that having a confounder with a higher correlation than the value of ρ, the assumption of no additional unmeasured confounding likely does not hold [24].

Replication in the KORA study

The identified associations in the Rotterdam Study were replicated using the same models in the KORA study. The adjustment for blood cell counts (monocytes, granulocytes, and lymphocytes) was based on Houseman estimates rather than laboratory measurements [63]. Furthermore, principal components were used to adjust for technical covariates rather than plate number and position on array.

Statistical analysis

All analyses were performed using the statistical package R. The eQTM analysis, and the associations of the cardio-metabolic traits with smoking-related CpGs and genes were conducted in R (version 3.2.0) under a Linux operating system, using the “LME4” package (version 1.1-16) and the “parallel” package (version 3.2.0). The mediation analyses were conducted in R studio Desktop (version 3.2.0) under Windows operating system using the “mediation” package (version 4.4.6.). Data collection and related statistical methods are provided in Additional file 8.