Introduction

DNA methylation is a crucial epigenetic mechanism involved in regulating important cellular processes, including gene expression, cell differentiation, genomic imprinting, and preservation of chromosome stability. DNA methylation refers to the addition of methyl groups (–CH3) to the carbon-5 position of cytosine residues in a cytosine-guanine DNA sequence (CpG) by DNA methyltransferases. DNA methylation changes can be influenced by many factors including aging [17, 19] and environmental exposures such as smoking [1, 24] or specific dietary factors [35]. Experimental evidence suggests a link between B vitamins, including folate (vitamin B9), and epigenetic modifications [3]. B vitamins, especially folate, are essential components of one-carbon metabolism (OCM), the network of interrelated biochemical reactions in which a one-carbon unit is received from methyl donor nutrients and transferred into biochemical and molecular pathways essential for DNA replication and repair. Modifications in OCM can significantly impact gene expression and thereby cellular function [53].

Absorbed folate, circulating in the bloodstream, enters the OCM cycle in the liver where it is metabolized to 5-methyltetrahydrofolate (5-methylTHF) and converted into S-adenosylmethionine (SAM) after several successive transformation steps (Fig. 1). SAM is the methyl donor for numerous methylation reactions including the methylation of DNA, RNA, and proteins. The potential role of specific dietary factors including micronutrients such as folate, alcohol, and soya intake, in modifying breast cancer risk via epigenetic mechanisms, has been proposed [54], although evidence is still scarce and inconsistent.

Fig. 1 Diagram of the one-carbon metabolism pathway. MS methionine synthase, MTHFR methylenetetrahydrofolate reductase, THF tetrahydrofolate, SAH S-adenosylhomocysteine, SAM S-adenosylmethionine

Alcohol intake affects epigenetic profiles [32]. Ethanol metabolism generates toxins that may directly lead to OCM dysfunction by reducing folate absorption, increasing renal excretion of folate, and inhibiting methionine synthase, the key enzyme in the generation of the methyl donor in the OCM [32, 33]. This antagonistic effect of alcohol on folate could plausibly increase the need for folate intake. Inadequate folate levels may result in abnormal DNA synthesis due to a reduced availability of SAM [27] and in disrupted DNA repair and may, hence, influence cancer risk, including breast cancer risk [4, 60].

The epidemiological evidence linking dietary folate, alcohol intake, and epigenome modifications is, however, not well documented. Therefore, we investigated the relationships of dietary folate and alcohol intake with leukocyte DNA methylation patterns in the controls from the European Prospective Investigation into Cancer and Nutrition (EPIC) study on breast cancer. We complemented standard regression analysis with techniques for the identification of relevant methylated regions.

Methods

Study population

EPIC is a multicenter study that recruited over 521,000 participants between 1992 and 2000 in 23 regional or national centers in 10 European countries (Denmark, France, Germany, Greece, Italy, The Netherlands, Norway, Spain, Sweden, and the UK) [43]. Among the 367,903 women recruited in EPIC, and after exclusion of 19,583 participants with prevalent cancers at recruitment (except non-melanoma skin cancer), a first malignant primary breast cancer (BC) occurred in 10,713 women during follow-up between 1992 and 2010. Within a nested case-control study that included 2491 invasive BC cases [34], a subsample of 960 women who completed dietary and lifestyle questionnaires and provided blood samples at recruitment (480 cases and 480 matched controls) from Germany, Greece, Italy, The Netherlands, Spain, and the UK was selected for the DNA methylation analyses [2]. The present study analyzed 450 controls only, originally enrolled in this case-control study on BC nested within the EPIC study.

Methylation acquisition

Genome-wide DNA methylation profiles in buffy coat samples were quantified using the Illumina Infinium HumanMethylation 450K (HM450K) BeadChip assay [5] in 960 biospecimens from women included in the BC nested case-control study. A total of 20 biospecimens with replicates were used to compare technical inter- and intra-assay batch effects and were then excluded from the main analysis, together with 19 matched pairs (i.e., 38 samples) in which at least one of the two samples had a low bisulfite conversion efficiency (intensity signal < 4000) or did not pass all of the Illumina GenomeStudio quality control steps, which were based on built-in control probes for staining, hybridization, extension, and specificity [23]. To prevent collider bias [11], as alcohol intake, folate intake, and DNA methylation profiles are all potentially associated with causes of BC, only cancer-free women were selected for the present study among the 902 remaining samples from the original case-control study on BC nested within the EPIC study. For the 451 control samples, probes with detection p values higher than 0.05 were assigned a “missing” value. After the exclusion of 14,548 cross-reactive probes [10], 47,963 probes overlapping known SNPs with minor allele frequency (MAF) greater than 5% in the overall population (European ancestry) [10], and 1483 low-quality probes (i.e., missing in more than 5% of the samples), 421,583 probes were left for the statistical analyses [2].

For each probe, β values were calculated as the ratio of methylated intensity over the overall intensity, defined as the sum of methylated and unmethylated intensities. The following preliminary adjustment steps were applied to β values: (i) color bias normalization using smooth quantile normalization [13], (ii) quantile normalization [6], and (iii) type I and type II bias correction using the beta-mixture quantile normalization (BMIQ) [56]. Then, M values, defined as \( {M}_{\mathrm{values}}={\log}_2\left(\frac{\beta_{\mathrm{values}}}{1-{\beta}_{\mathrm{values}}}\right) \), were computed [14]. Surrogate variable analysis (SVA) [30, 31] was used to remove systematic variation due to the processing of the biospecimens during methylation acquisition such as batch, indicating groups of samples processed at the same time, and the position of the samples within the chip [40]. Then M values were standardized to have an identical variance of 1.
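As a minimal illustration of the β-to-M transformation defined above, the Python sketch below computes a β value from methylated and unmethylated intensities and converts it to an M value. The function names and the intensity values are hypothetical and chosen for illustration only; the normalization steps (color bias, quantile, BMIQ) and SVA are not reproduced here.

```python
import math


def beta_value(methylated, unmethylated):
    """Methylated intensity divided by the overall (methylated + unmethylated) intensity."""
    return methylated / (methylated + unmethylated)


def m_value(beta):
    """Base-2 logit of a beta value; only defined for 0 < beta < 1."""
    return math.log2(beta / (1.0 - beta))


# Hypothetical probe intensities (assumption, not study data).
beta = beta_value(methylated=3000.0, unmethylated=1000.0)
m = m_value(beta)
print(beta, m)
```

A β value of 0.5 maps to an M value of 0, and the transformation stretches values close to 0 or 1, which is one reason M values are often preferred for regression modeling.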

The proportions of white blood cell types, i.e., T cells (CD8+T and CD4+T), natural killer (NK) cells, B cells, monocytes, and granulocytes, were estimated using Houseman’s estimation method [20] and included as covariates in the analysis.

Lifestyle and dietary exposures

Data on dietary habits were collected at recruitment through validated center- or country-specific dietary questionnaires (DQ) [43]. Northern Italy (Florence, Turin, and Varese), UK, Germany, and The Netherlands used self-administered extensive quantitative food-frequency questionnaires (FFQs), whereas Southern Italy (Naples and Ragusa), Spain, and Greece’s centers used interview methods. Usual consumption of alcoholic beverages (number of glasses per day or week) per type of alcoholic beverage (wine, beer, spirits, and liquors) during the 12 months before the administration of dietary questionnaires was collected at recruitment. In addition, 24-h dietary recall (R) harmonized across EPIC countries was collected from a random sample (n = 36,900) in each center to be used as reference measurements [50]. R measurements were used to improve estimation of alcohol content per specific alcoholic beverages using a country-specific estimation of average of glass volume [48]. Dietary folate intake (μg/day) was estimated using the updated EPIC Nutrient Data Base (ENDB) [49], obtained after harmonization from country-specific food composition tables [7]. No specific information on the use of folate supplements was available.

Statistical analyses

After exclusion of one outlier value of dietary folate (value larger than the third quartile plus 10 times the inter-quartile range of the distribution), a total of 450 observations from controls only were retained for statistical analyses.
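The exclusion rule above (a value larger than the third quartile plus 10 times the inter-quartile range) can be sketched in Python as follows. The quartile definition used here (`statistics.quantiles` with the inclusive method) and the intake values are assumptions for illustration; the source does not state which quartile estimator was applied.

```python
import statistics


def extreme_outliers(values, k=10.0):
    """Return values larger than Q3 + k * IQR (k = 10 in the rule described above)."""
    # Quartile estimator is an assumption; other definitions give slightly different cut-offs.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper_limit = q3 + k * (q3 - q1)
    return [v for v in values if v > upper_limit]


# Hypothetical folate intakes in micrograms per day (not study data).
intakes = [150, 200, 220, 250, 270, 290, 310, 350, 400, 5000]
print(extreme_outliers(intakes))
```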

The association between dietary folate, alcohol intake, and methylation levels was evaluated via (i) CpG site-specific analysis, (ii) identification of differentially methylated regions (DMRs) [41], and (iii) fused lasso (FL) regression [57].

CpG site-specific models

M values expressing methylation levels at each CpG were linearly regressed on dietary folate (log-transformed to reduce skewness) and alcohol intake. Models were adjusted for recruitment center, age at recruitment (years), menopausal status (pre- or post-menopause), and white blood cell counts (proportions of T cells, natural killer cells, B cells, and monocytes in blood). The false discovery rate (FDR) was controlled to account for multiple testing.
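To make the multiple-testing step concrete, the sketch below converts a list of p values into q values. The source does not name the FDR procedure, so the Benjamini-Hochberg step-up adjustment is assumed here; the function name and the p values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p values from four site-specific tests (not study data).
print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

A site would then be declared significant when its q value falls below the nominal level.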

For the two CpG sites that were associated with alcohol intake, based on q values, the percentage of methylation change for 1 standard deviation (SD) increase of alcohol intake was calculated as follows:

Methylation values in site j were log-transformed and regressed on alcohol intake (Ai), for each site j, and for i = 1, … , n, as:

$$ \log \left({M}_{ij}\right)={\alpha}_{0j}+{\alpha}_{1j}{A}_i+{\gamma_j}^T{Z}_i $$

where α1j is the regression coefficient for alcohol intake and Zi is a vector of confounding factors related to methylation levels through a vector of regression coefficients γj. The difference between any two log-transformed methylation values log(Mij1) and log(Mij0) corresponding to a difference in alcohol intake of 1 SD (\( {\widehat{\sigma}}_{\mathrm{alc}} \)) was predicted as \( {\widehat{\alpha}}_{1j}{\widehat{\sigma}}_{\mathrm{alc}} \), so that the ratio Mij1/Mij0 was predicted as \( \exp \left({\widehat{\alpha}}_{1j}{\widehat{\sigma}}_{\mathrm{alc}}\right) \). Therefore, the average percentage of methylation change for an increase of 1 SD in alcohol intake was estimated as:

$$ \left(\frac{M_{ij1}}{M_{ij0}}-1\right)\times 100=\left(\exp \left({\widehat{\alpha}}_{1j}{\widehat{\sigma}}_{\mathrm{alc}}\right)-1\right)\times 100 $$
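As a small worked example of this calculation, the Python sketch below turns a regression coefficient from the log-linear model and the SD of the exposure into a percentage change. The function name and the numerical inputs are hypothetical; they are not estimates from this study.

```python
import math


def percent_change_per_sd(alpha_hat, sd_exposure):
    """Percentage change in methylation for a 1-SD increase in the exposure,
    given the coefficient alpha_hat from a model fitted on log-transformed methylation."""
    return (math.exp(alpha_hat * sd_exposure) - 1.0) * 100.0


# Hypothetical coefficient (per g/day) and SD of alcohol intake (g/day).
print(percent_change_per_sd(alpha_hat=0.002, sd_exposure=12.0))
```

For small values of the product of coefficient and SD, the result is close to 100 times that product, which is why log-scale coefficients are often read directly as approximate relative changes.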

DMR models

Differentially methylated regions (DMRs) were identified with the DMRcate package [41]. The rationale of this method is to use kernel smoothing to replace the t test statistic at a given CpG site by a weighted average of t test statistics across its neighboring sites on the same chromosome. More precisely, let pc denote the number of sites located on a given chromosome c with c ∈ {1,  … , 23} (the 23rd chromosome is chromosome X). For any site k on this chromosome, with k = 1, … , pc, the term \( {t_k}^2 \) indicates the square of the t test statistic obtained in site-specific analyses. For each site j on chromosome c, \( {t_j}^2 \) is replaced by the term \( {{\widehat{t}}_j}^2 \), defined as \( {{\widehat{t}}_j}^2=\sum \limits_{k=1}^{p_c}{K}_{jk}{t_k}^2 \),

where the terms Kjk express weights, with larger values for sites k closer to j. Letting xk denote the position of site k on the chromosome, i.e., its chromosomal coordinate in base pairs, these weights are defined using a Gaussian kernel as

$$ {K}_{jk}=\exp \left(\frac{-{\left|{x}_j-{x}_k\right|}^2}{2{\left(\lambda /C\right)}^2}\right) $$

where parameters λ and C represent the bandwidth and the scaling factor, respectively. Here, we used λ = 1000 and C = 2, as recommended in [41].
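The kernel smoothing step can be sketched as follows. This is a plain-Python illustration of the two formulas above, not the DMRcate implementation: the function names, positions, and t statistics are hypothetical, and the double loop is written for readability rather than speed.

```python
import math


def gaussian_weight(x_j, x_k, bandwidth=1000.0, scaling=2.0):
    """Gaussian kernel weight K_jk with standard deviation bandwidth / scaling."""
    sigma = bandwidth / scaling
    return math.exp(-((x_j - x_k) ** 2) / (2.0 * sigma ** 2))


def smoothed_t2(positions, t_stats, bandwidth=1000.0, scaling=2.0):
    """For each site j, the kernel-weighted sum of squared t statistics on one chromosome."""
    smoothed = []
    for x_j in positions:
        total = 0.0
        for x_k, t_k in zip(positions, t_stats):
            total += gaussian_weight(x_j, x_k, bandwidth, scaling) * t_k ** 2
        smoothed.append(total)
    return smoothed


# Hypothetical coordinates (base pairs) and site-specific t statistics.
positions = [1000.0, 1200.0, 1500.0, 9000.0]
t_stats = [2.5, 3.0, 2.0, 0.5]
print(smoothed_t2(positions, t_stats))
```

With λ = 1000 and C = 2, the kernel standard deviation is 500 base pairs, so a site several kilobases away (the last one in the example) contributes almost nothing to its neighbors.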

Under the null hypothesis of no association between site j and alcohol (or folate), the distribution of \( \frac{{{\widehat{t}}_j}^2{\sum}_{k=1}^{p_c}{K}_{jk}}{\sum_{k=1}^{p_c}{K_{jk}}^2} \) can be approximated by a χ2 distribution [41] with \( {\left({\sum}_{k=1}^{p_c}{K}_{jk}\right)}^2/{\sum}_{k=1}^{p_c}{K_{jk}}^2 \) degrees of freedom [45]. Accordingly, p values were obtained for each site separately in each chromosome, and q values were computed using FDR correction on all the p values to control for multiple testing. Then, DMRs were defined as regions with at least two significant sites separated by a maximal distance λ of 1000 base pairs. In line with [41], t statistics tk were obtained from regression models using an empirical Bayes method to shrink the CpG site variance [51], as implemented in the limma package [52]. For each DMR, the minimum q value and the minimum and maximum coefficients (in absolute value) of the sites included in the region were presented as qDMR, βmin, DMR, and βmax, DMR.
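The region-building rule (at least two significant sites separated by at most 1000 base pairs) can be illustrated with the simplified sketch below. It is one possible reading of that rule and not the DMRcate code: the function name is hypothetical, non-significant sites are simply skipped, and each region is reported by its first and last significant site and the number of significant sites.

```python
def group_into_regions(positions, significant, max_gap=1000, min_sites=2):
    """Chain significant sites whose consecutive distance is at most max_gap base pairs."""
    regions, current = [], []
    for pos, is_sig in sorted(zip(positions, significant)):
        if not is_sig:
            continue
        # A gap larger than max_gap closes the current chain.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        regions.append((current[0], current[-1], len(current)))
    return regions


# Hypothetical positions and significance flags on one chromosome.
positions = [100, 600, 1500, 5000, 9000, 9500]
significant = [True, True, True, True, True, True]
print(group_into_regions(positions, significant))
```

In this example the isolated site at position 5000 does not form a region because it has no significant neighbor within 1000 base pairs.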

Fused lasso regression

Multivariate penalized regression provides an alternative to DMRs. We implemented a fused lasso (FL) regression [57], which is better suited than the standard lasso when covariates (CpGs) are naturally ordered and the objective is to identify regions on the chromosome of differentially methylated CpG sites. FL is particularly useful when the number of features (p) is much larger than the sample size (n), a situation classically known as p ≫ n.

FL is a multivariable regression method combining two penalties: (i) the lasso penalty, which introduces sparsity of the parameter vector, i.e., many elements of the estimated vector are encouraged to be set to zero, and (ii) the fused penalty, which encourages sparsity of the difference between two consecutive components in the parameter vector, thus introducing smoothness of parameter estimates in adjacent CpG sites [57].

To mimic the DMR analysis, a FL analysis was implemented where dietary folate and alcohol were, in turn, regressed on CpG methylation levels within each chromosome. The vector of methylation coefficient estimates \( \widehat{\beta} \) obtained by fused lasso regression was defined as

$$ \widehat{\beta}=\arg \min \left\{{\sum}_i{\left({y}_i-{\sum}_j{M}_{ij}{\beta}_j-{\gamma}^T{Z}_i\right)}^2+{\widehat{\lambda}}_1{\sum}_{j=1}^{p_c}{\omega}_j\left|{\beta}_j\right|+{\widehat{\lambda}}_2{\sum}_{j=2}^{p_c}{\nu}_j\left|{\beta}_j-{\beta}_{j-1}\right|\right\}, $$

where yi indicates, in turn, alcohol and dietary folate values for sample i = 1, … , n, Mij is the methylation levels at CpG site j, βj is the associated regression coefficient, Zi is a vector of confounding factors, consistently with linear regression and DMR analyses described above, γ is the corresponding non-penalized vector of coefficients, and ωj and νj are the weights associated with lasso penalty and fused penalty, respectively.
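To make the structure of the penalized criterion explicit, the sketch below evaluates the fused lasso objective for a given coefficient vector. It only computes the value being minimized and is not a solver; for brevity the unpenalized confounder term is omitted, which is a simplifying assumption. All names and numbers are hypothetical.

```python
def fused_lasso_objective(y, M, beta, lam1, lam2, omega=None, nu=None):
    """Residual sum of squares + weighted lasso penalty + weighted fused penalty.

    y: exposure values (one per sample); M: methylation matrix (samples x ordered CpGs);
    omega[j] weights |beta[j]|; nu[j] weights |beta[j] - beta[j-1]| for j >= 1 (nu[0] unused).
    """
    p = len(beta)
    if omega is None:
        omega = [1.0] * p
    if nu is None:
        nu = [1.0] * p
    rss = sum(
        (y_i - sum(m_ij * b_j for m_ij, b_j in zip(row, beta))) ** 2
        for y_i, row in zip(y, M)
    )
    lasso_penalty = sum(w * abs(b) for w, b in zip(omega, beta))
    fused_penalty = sum(nu[j] * abs(beta[j] - beta[j - 1]) for j in range(1, p))
    return rss + lam1 * lasso_penalty + lam2 * fused_penalty


# Toy data: two samples, two adjacent CpG sites.
y = [1.0, 2.0]
M = [[1.0, 0.0], [0.0, 1.0]]
print(fused_lasso_objective(y, M, beta=[1.0, 1.0], lam1=0.5, lam2=2.0))
print(fused_lasso_objective(y, M, beta=[1.0, 2.0], lam1=0.5, lam2=2.0))
```

The second coefficient vector fits the toy data perfectly but pays for the difference between adjacent coefficients, which is how the fused penalty favors blocks of neighboring sites with equal coefficients.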

Following the rationale of the adaptive lasso [61] and the iterated lasso [8], the FL procedure was first run with weights ωj and νj set to 1, which returned \( {\widehat{\beta}}_0 \), an initial estimate of \( \widehat{\beta} \). The final estimates \( \widehat{\beta} \) were obtained after running a second FL procedure with weights defined as \( {\omega}_j=\frac{1}{\left|{\widehat{\beta}}_{0,j}\right|+\varepsilon } \) and \( {\nu}_j=\frac{1}{\left|{\widehat{\beta}}_{0,j}-{\widehat{\beta}}_{0,j-1}\ \right|+\varepsilon } \), with \( \varepsilon ={10}^{-4} \).
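The second-stage weights can be computed from the initial estimates as in the sketch below. The function name and the initial coefficient vector are hypothetical; the fused weights are returned as a list of length p − 1, one per pair of consecutive sites, which is an indexing choice made here for convenience.

```python
def adaptive_weights(beta0, eps=1e-4):
    """Weights for the second fused lasso run, derived from initial estimates beta0."""
    # Lasso weights: large when the initial coefficient is close to zero.
    omega = [1.0 / (abs(b) + eps) for b in beta0]
    # Fused weights: large when two consecutive initial coefficients are close.
    nu = [1.0 / (abs(beta0[j] - beta0[j - 1]) + eps) for j in range(1, len(beta0))]
    return omega, nu


# Hypothetical first-stage estimates for three consecutive CpG sites.
omega, nu = adaptive_weights([0.0, 0.0, 0.5])
print(omega)
print(nu)
```

Coefficients that were zero in the first run receive a weight close to 1/ε and are therefore strongly penalized in the second run, while clearly non-zero coefficients are penalized much less.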

The FL procedure was implemented on a predefined grid of 50 × 50 = 2500 values for the pair of parameters (λ1, λ2). More precisely, the grid for λ1 consisted of 50 equally spaced values (on a log scale) between \( \frac{\lambda_{1,\max }}{1000} \) and λ1, max, where λ1, max was the lowest λ1 value for which FL returned a null \( \widehat{\beta} \) vector for λ2 = 0, a situation where FL reduces to a standard lasso. For each value λ1 on this grid, the grid for λ2 consisted of 50 equally spaced values (on a log scale) between \( \frac{\lambda_{2,\max}\left({\lambda}_1\right)}{1000} \) and λ2, max(λ1), where λ2, max(λ1) was the lowest λ2 value for which FL returned a vector \( \widehat{\beta} \) with all components equal. The optimal pair of tuning parameters (λ1, λ2) was selected as the one minimizing the prediction error estimated by 5-fold cross-validation [16], whose principle can be summarized as follows. The original sample is first partitioned into 5 equally sized subsamples. One subsample is held out as the test set while the other 4 are used as a training set, on which FL estimates are computed for the 2500 values of (λ1, λ2). The prediction error is computed on the test set, and the process is repeated 5 times so that each subsample serves once as the test set. For each of the 2500 values of (λ1, λ2), the prediction error is defined as the average of the prediction errors on the 5 test sets. FL analysis was implemented using the FusedLasso package.
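The two ingredients of this tuning scheme, a log-spaced grid and a 5-fold partition, can be sketched as follows. The sketch does not fit any model and does not use the FusedLasso package; the function names, the value of the maximal penalty, and the random seed are assumptions for illustration.

```python
import math
import random


def log_grid(lam_max, n_values=50, ratio=1000.0):
    """n_values points equally spaced on a log scale between lam_max / ratio and lam_max."""
    low, high = math.log(lam_max / ratio), math.log(lam_max)
    step = (high - low) / (n_values - 1)
    return [math.exp(low + i * step) for i in range(n_values)]


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition sample indices 0..n-1 into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


grid = log_grid(lam_max=10.0)      # hypothetical maximal penalty value
folds = k_fold_indices(n=450)      # 450 samples, as in this study
print(len(grid), grid[0], grid[-1])
print([len(fold) for fold in folds])
```

In a full analysis, each fold would serve once as the test set, and the cross-validated error would be averaged over the 5 folds for every pair on the 50 × 50 grid.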

Preprocessing steps and statistical analyses were carried out using the R software (https://www.r-project.org/) and the Bioconductor packages [21], including lumi, wateRmelon, and sva [29] for the preprocessing steps. The nominal level of statistical significance was set to 5%.

Results

Study population characteristics

Detailed characteristics of the 450 women included in the study are shown in Table 1. The average age at blood collection was 52 years (range 26–73). Participants had an average body mass index (BMI) of 26 kg/m2 (range 16–43) and were mostly post-menopausal (59%), never-smokers (56%), and moderately physically inactive (42%). The average daily intake of dietary folate was 270 μg/day (range 91–1012), and alcohol daily intake was 8 g/day (range 0–72). Non-alcohol consumers, defined as participants consuming less than 0.1 g/day of alcohol at recruitment, represented 15% of the population. Most participants were from the Italian and the German EPIC centers (Additional file 1: Figure S1).

Table 1 Characteristics of the study population (n = 450)

CpG site-specific models

After FDR correction, dietary folate intake was not significantly associated with methylation levels at any CpG sites (data not shown). Alcohol intake was inversely associated with the cg07382687 CpG site (qval = 0.048) and positively associated with the cg03199996 site (qval = 0.029) (Table 2). Both sites were located in an open sea region, i.e., a genomic region of isolated CpGs. cg07382687 was within the body region of gene CREB3L2, and cg03199996 was within the body region of gene FAM65C.

Table 2 CpG site-specific model results for the significant CpG sites for alcohol intake (adjusted for recruitment center, age at recruitment, menopausal status, and level of different lymphocyte subtypes)

DMR analysis

A total of 24 regions associated with dietary folate were identified, which included 190 CpG sites over-represented in the TSS1500 and 1st exon regions and under-represented in the body regions and regions outside any gene regions (Fig. 2a). The 15 most significant regions are described in Table 3 and the whole list is provided in Additional file 2: Table S1. Among the 24 DMRs, 54% showed an inverse association with dietary folate, i.e., had a βmax, DMR < 0. The DMR most significantly associated with dietary folate (qDMR = 1.3E−13, βmax, DMR = 0.019) was DMR.F1 in chromosome 7, including 49 CpG sites, related to HOXA5 and HOXA6 genes. DMR.F5 was associated with HOXA4, another gene of the homeobox family (qDMR = 5.8E−4, βmax, DMR = −0.016).

Fig. 2 Distribution of gene regions among DMRs compared to their distribution within the Illumina 450K array (computed among the 421,583 CpG sites included in this study). Gene region feature categories describe the CpG position, from UCSC: TSS200, 200 bases upstream of the transcriptional start site (TSS); TSS1500, 1500 bases upstream of the TSS; 5′UTR, within the 5′ untranslated region, between the TSS and the ATG start site; body, between the ATG and stop codon, irrespective of the presence of introns, exons, TSS, or promoters; 3′UTR, between the stop codon and poly A signal. a DMRs significant for folate. b DMRs significant for alcohol. c Illumina 450K

Table 3 The 15 most significant DMRs associated with dietary folate out of 24 significant DMRs (adjusted for recruitment center, age at recruitment, menopausal status, and level of different lymphocyte subtypes)

Alcohol intake was associated with methylation levels in 90 DMRs, including 550 CpG sites over-represented in TSS200, 1st exon, and 5′ untranslated regions (5′UTR) and under-represented in the body regions and the regions outside any gene regions (Fig. 2b). The 15 most significant DMRs are detailed in Table 4, and the full list is described in Additional file 3: Table S2. Alcohol intake was positively associated with methylation levels in 66% of the 90 DMRs. The two sites associated with alcohol intake in the CpG site-specific analyses were not included in any DMRs. The most significant DMR associated with alcohol consumption was DMR.A1, 9 sites within the GSDMD gene, (qDMR = 4.7E−14, βmax, DMR = 0.020).

Table 4 The 15 most significant DMRs associated with alcohol out of 90 significant DMRs (adjusted for recruitment center, age at recruitment, menopausal status, and level of different lymphocyte subtypes)

Methylation levels of each CpG site located in the two most significant DMRs for folate and alcohol, i.e., DMR.F1, DMR.F2, DMR.A1, and DMR.A2, are presented in Additional file 4: Figure S2 by tertiles of dietary folate and alcohol intake, respectively. Correlation heatmaps of CpG sites in DMR.A1, DMR.A2, DMR.F1, and DMR.F2 are displayed in Additional file 5: Figure S3, showing high levels of correlation among methylation levels within the DMR.F2 of dietary folate and the DMR.A2 of alcohol. Other regions showed less correlation, including the DMR.A1 of alcohol intake.

Fused lasso regression

For dietary folate, we identified 71 FL regions, 50 presenting a positive association and 21 an inverse association. Three FL regions were overlapping the 15 most significant DMRs (Table 3). Seven out of 8 sites from a FL region within the GDF7 gene were included in the DMR.F2 (βFL =  0.0029). All sites from a FL region associated with the PRSS50 gene were part of the DMR.F4 (βFL =  0.0069). Six out of 7 sites from the FL region within the GPR19 gene were within the DMR.F9 (βFL = 0.0076). None of the 68 other FL regions were overlapping any folate-related DMRs.

For alcohol consumption, we identified 133 FL regions, 71 regions presenting a positive association and 62 an inverse association. Twenty-one regions were included in alcohol-related DMRs. Among them, 9 were overlapping 6 of the 15 most significant DMRs (Table 4). The situation where two close FL regions were part of the same DMR was observed 3 times in the 15 most significant alcohol-related DMRs. In particular, four and three sites from two FL regions located in chromosome 22 were included in DMR.A11, associated with genes SMC1B and RIBC2. All the 9 sites from a FL region were included in DMR.A9 (βFL =  0.474).

Graphical representations of the DMRs, the FL regions, and their overlap are provided for each chromosome in Additional file 6: Figure S4 for dietary folate and Additional file 7: Figure S5 for alcohol intake. For dietary folate, most FL regions were located in chromosome 3, chromosome 22, and chromosome X. A maximum of four DMRs located in the same chromosome was observed for chromosomes 2 and 3. As for alcohol intake, DMR and FL showed overlap mostly in chromosomes 6 and 22, with, respectively, 4 and 3 DMRs overlapping FL regions.

Discussion

In this study of women from a large prospective cohort, we investigated the association of dietary folate and alcohol intake with leukocyte DNA methylation via three different approaches. The site-specific analysis aimed at identifying single CpG sites independently from each other, whereas DMR and FL analyses aimed at identifying regions of CpG sites using the inter-correlation between methylation levels in close sites, thus exploiting the potential of specific regions of the epigenome to show methylation activity related to lifestyle factors.

While site-specific analysis showed no association between dietary folate and individual CpG sites and identified only two CpG sites associated with alcohol intake, DMR and FL analyses identified regions of the epigenome associated with dietary folate or alcohol intake. The two alcohol-related sites are located within the body region of the genes FAM65C and CREB3L2. The FAM65C gene, also named RIPOR3, is a non-annotated gene. The CREB3L2 gene encodes a transcriptional activator protein and plays a critical role in cartilage development by activating the transcription of SEC23A [18]. A translocation involving the CREB3L2 gene, located on chromosome 7, and the FUS gene (fused in sarcoma), located on chromosome 16, has been found in some tumors, including skin cancer and soft tissue sarcoma [37, 38].

Alcohol is known to alter DNA methylation, mostly because it contributes to deregulation of folate absorption, which can lead to a dysfunction of OCM [27]. In our study, alcohol intake was associated with 90 DMRs, some of which may have a role in specific carcinogenesis processes. For example, alcohol intake was inversely associated with methylation levels in DMR.A64 related to the MLH1 gene, which is frequently mutated in hereditary nonpolyposis colon cancer (HNPCC) [39]. Alcohol intake was positively associated with methylation in DMR.A79, related to the TSPAN32 (tetraspanin 32) gene, also known as the TSSC6 gene, which is one of the several tumor suppressor genes located at locus 11p15.5 in the imprinted gene domain of chromosome 11 [28]. This locus has been associated with adrenocortical carcinoma and with lung, ovarian, and breast cancers. Methylation levels within DMR.A1 were positively associated with alcohol intake, and the related GSDMD gene has also been suggested to act as a tumor suppressor [44]. Alcohol intake was also positively associated with DMR.A6 related to the gene ADAM32, which encodes a protein involved in diverse biological processes, such as brain development, fertilization, tumor development, and inflammation [36].

Several genes, associated with the 24 DMRs identified in our study for dietary folate, were possibly involved in biological processes leading to carcinogenesis. For example, dietary folate was positively associated with methylation in DMR.F16 related to the RTKN (rhotekin) gene, which interacts with GTP-bound Rho proteins. Rho proteins regulate many important cellular processes, including cell growth and transformation, cytokinesis, transcription, and smooth muscle contraction. Dysregulation of the Rho signal transduction pathway has been implicated in many forms of cancer such as bladder cancer, gastric cancer, and breast cancer [9, 15]. Dietary folate was also associated with methylation levels in DMR.F1 and DMR.F5 within the HOXA4, HOXA5, and HOXA6 genes, members of the HOX family, known to be associated with cellular differentiation [46]. Perturbed HOX gene expression has been implicated in multiple cancer types [47]. In addition, HOXA5 may also regulate gene expression and morphogenesis. Methylation of this gene may result in the loss of its expression and, since the encoded protein upregulates the tumor suppressor p53, may play an important role in tumorigenesis [55].

Results from site-specific and DMR analyses were generated with different analytical strategies: methylation levels in different sites were assumed independent in the former, with linear regression models fitted separately in each CpG site, while in the latter, the physical proximity of CpGs was exploited to identify specific regions of the epigenome with similar methylation activity, under the assumption that neighboring CpG sites may share relevant epigenetic information. FL analysis revealed some overlaps with DMRs, particularly for alcohol intake, where 9 FL regions were observed within the 15 most significant DMRs. Yet, the overlap between DMR and FL analyses is relatively low, and their results deserve cautious interpretation because the two methods rely on different analytical strategies. Unlike DMRs, FL does not take into account the physical distance between consecutive sites, but rather introduces smoothness of parameters estimated in adjacent, mutually adjusted CpG sites. Methylation levels within a chromosome were mutually adjusted in FL regression, while in DMR analysis t test statistics were based on independent associations of methylation levels with folate and alcohol.

The association between folate and DNA methylation has been investigated at different stages of human life, in particular during fetal development and in the elderly, when folate is especially needed. A meta-analysis of mother-offspring pairs estimated the association between maternal plasma folate during pregnancy and DNA methylation in cord blood [25]. After FDR correction, maternal plasma folate was positively associated with methylation level at 27 CpG sites and inversely associated with methylation level at 416 CpG sites. None of these sites was observed in any of the 24 DMRs related to dietary folate in the present study. This might be explained by the lack of power to identify specific sites due to the sample size: over 2000 samples were included in Joubert’s meta-analysis against 450 in our study. In addition, different methods were used to assess folate status, i.e., plasma folate against dietary folate.

An intervention study was conducted to evaluate the effects of long-term supplementation with folic acid and vitamin B12 on white blood cell DNA methylation in elderly subjects [26]. After the intervention of 2 years, 162 sites were significantly differentially methylated compared to baseline, versus 6 sites only for the placebo group. Folate and vitamin B12 were not significantly associated with methylation level in any CpG sites. Within the same study, 173 and 425 DMRs were identified for folate and vitamin B12, respectively. The gene HOXA4, which was inversely associated with dietary folate in our study in DMR.F5, was the only region overlapping with the first 10 DMRs found in the intervention study [26]. However, a higher level of folic acid was observed in the intervention study: average blood folate of 52 and 23 nmol/L in the intervention and placebo groups, respectively, compared to an average blood folate of 15 nmol/L in our study, which might partly explain the different findings.

Within a recent meta-analysis including 9643 participants of European ancestry, aged 42 to 76 years with 54% women [32], 363 CpG sites were significantly associated with alcohol consumption, with 87% of these sites showing inverse associations. In our study, site cg02711608 was part of the 363 identified sites and was also included in DMR.A25 associated with gene SLC1A5. The SLC1A5 gene encodes a sodium-dependent amino acid transporter [42]. The important difference in the number of significant sites between the meta-analysis and the present study might mostly be explained by the larger study population size and the larger levels of alcohol intake observed in the meta-analysis [32]. Indeed, in the meta-analysis, in which 46% of participants were men, the medians of alcohol intake ranged from 0 to 14 g/day in the 10 European cohorts, while with a median of 3.5 g/day, alcohol intake was quite low in our study, which included only women. Lastly, cohort-specific approaches were used in the meta-analysis to remove technical variability, while the SVA approach was used in our study, which was shown to produce conservative findings compared to other normalizing techniques [40].

In our study, the sample size was relatively low (n = 450), and only women were included. With a median value of 3.5 g/day, a 95th percentile equal to 31 g/day, and a percentage of non-consumers equal to 15%, alcohol intake displayed limited variability, which potentially constrained the power of the study. In addition, questionnaire measurements used to assess dietary folate and alcohol intake are prone to exposure misclassification, which likely attenuated associations between lifestyle exposures and methylation levels. These elements alone may explain the lack of significant associations in our study. Further studies including men and women, possibly with larger sample size, are needed to further investigate the relationship between dietary folate, alcohol intake, and DNA methylation.

A major strength of this study was the use of ad hoc methodology for normalization of methylation data. Technical management of samples likely introduces systematic technical variability in methylation measurements that might compromise the accuracy of the acquisition process and, if not properly taken into account, could introduce bias in the estimation of the association of interest. The population used in this study included European women from the UK, Germany, Italy, Greece, The Netherlands, and Spain, implying a diversity of diet and lifestyle habits. Three approaches were used to evaluate the relationship between dietary folate, alcohol intake, and DNA methylation. The comparison between DMR and FL analyses was particularly relevant to identify regions of the genome associated with dietary folate and alcohol intake.

Alcohol was classified as a group 1 carcinogen in 2012 by the IARC Monograph [22] and was associated with cancer of the upper aero-digestive tract, female breast, liver, and colorectum. Dietary folate has been recently inversely associated with the risk of breast cancer in EPIC [12], although the evidence is not conclusive [59]. Among the DMRs identified in this study for dietary folate or alcohol intake, several regions were associated with genes potentially implicated in cancer development, such as RTKN, the HOX family of genes, and the two tumor suppressor genes GSDMD and TSPAN32. Our study provides some evidence that dietary folate and alcohol intake may be associated with carcinogenesis through a deregulation of epigenetic mechanisms, although our findings need to be replicated in future evaluations.

In this study, site-specific analyses served as a basis to explore more complex evaluations. By addressing the high dimensionality and complexity of DNA methylation, the statistical techniques used in this work may prove useful for future epigenetic studies focusing on the relationship between lifestyle exposures, DNA methylation, and the occurrence of disease outcomes. The tools presented here may be adapted to suit specific features of other -omics data.

Conclusion

Weak associations between alcohol intake and methylation levels at two CpG sites were observed. DMR and FL analyses provided evidence that specific regions of CpG sites were associated with dietary folate and alcohol intake, assuming that neighboring features share relevant epigenetic information. Folate and alcohol are known not only to be associated with breast cancer but also to have a mutually antagonistic role in the one-carbon metabolism. In some regions identified by DMRs or FL analysis, mapped genes are known to act as tumor suppressors such as the GSDMD and HOXA5 genes. These results were in line with the hypothesis that folate- and alcohol-deregulated epigenetic mechanisms might have a role in the pathogenesis of cancer.