Introduction

Type 2 Diabetes Mellitus (T2DM) affects 10.5% of the global population in the age group 20 to 79 years. T2DM threatens health of the individuals and healthcare systems due to its numerous complications and high healthcare cost and is among the top ten causes of death in the world [1].

During pregnancy, insulin resistance increases to maintain adequate glucose flow to the offspring. However, if the pancreatic beta cells cannot compensate with sufficiently high insulin secretion, this can result in hyperglycaemia and Gestational Diabetes Mellitus (GDM), which is defined by hyperglycaemia with first onset during pregnancy [2]. GDM increases the risk of pregnancy complications such as pre-eclampsia, caesarean section, neonatal hypoglycaemia, preterm birth, and foetal macrosomia. Moreover, GDM severely increases the future T2DM risk in the mother [3]. Although the prevalence of GDM is increasing globally, it varies depending on population characteristics and the diagnostic criteria used [4].

Epigenetics is the study of changes in gene expression that are not caused by changes in the DNA sequence. Some epigenome-wide association studies (EWAS) suggest that epigenetic mechanisms contribute to the pathogenesis of T2DM [5,6,7]. Genetic risk scores (GRS) have been increasingly used to assess disease risk, such as for T2DM [8, 9]. Similarly, methylation risk scores (MRS) are increasingly studied, to assess associations with outcomes of interest such as some cancers and kidney disease [10], and show promising potential as a tool to aid the prediction of T2DM and to understand gene-environment interactions. Schrader et al. showed that MRS separated T2DM subjects into different groups and were associated with diabetic complications like cardiovascular disease, chronic kidney disease and retinopathy [11].

Despite similar genetics and pathophysiology between GDM and T2DM [12], genetics only explain a small proportion of overall T2DM risk, and environment is known to play an important role for epigenetics. However, whether epigenetic marks are common between GDM and T2D has not been reported previously. Epigenetics marks common for GDM and T2DM may help us understand these pathological mechanisms better in order to prevent, diagnose or improve treatment of GDM and T2DM. We aimed to test whether an MRS for T2DM is associated with a GDM across ancestry and GDM criteria.

Methods

Study population

The EPIPREG cohort

A subgroup of 480 women was selected from the STORK Groruddalen (STORK G) pregnancy cohort. STORK G included 823 healthy women from different ethnic origins (European, South Asian, African, Middle Eastern and South American) who attended three different Child Health Clinics in the area of Groruddalen, Oslo, Norway, during the 2008–2010 period [13]. Ethnic origin was defined by either the individual’s country of birth or their mother’s country of birth if the latter was born outside of Europe [17]. The EPIPREG subgroup has a total of 312 European subjects (EPIPREG_EU), whereof 73 were diagnosed with GDM, and 168 South Asians (EPIPREG_SA), whereof 68 were diagnosed with GDM. European and South Asian ancestry was determined by genetic principal components [14]. Fasting blood samples were collected, and a 75 g oral glucose tolerance test (OGTT) was offered to all women (universal testing) at week 28 ± 2 of pregnancy. For the present study GDM was classified according to the slightly modified International Association of the Diabetes and Pregnancy (IADPSG) criteria (fasting glucose ≥ 5.1 mmol/l and/or 2-hour glucose ≥ 8.5 mmol/l, as 1-h glucose values were not available).

The Norwegian Regional Committee for Medical Health Research Ethics South East approved the STORK-G study including genetic and epigenetic data (ref. number 2015/1035). Written informed consent from all participants was obtained before any study-related procedure.

EPIDG cohort

A total of 32 (16 GDM, 16 non-GDM) pregnant women were selected from EPIDG cohort, which started in 2019 and is still recruiting participants. The EPIDG cohort has a total of 230 Mediterranean (South European) pregnant women who attended the Unit of Diabetes and Pregnancy at University Hospital Virgen de la Victoria, Málaga, Spain. All participants gave their consent to participate in the study. GDM criteria followed a two-step strategy from the National Diabetes Data Group (NDDG) guideline [15]. The first step was a screening using the O’Sullivan test (50 g glucose overload) between 24 and 28 weeks. Then an oral glucose tolerance test (OGTT)-100gr was performed on those women with a positive result in the O’Sullivan test (≥ 0.7.8 mmol/L). GDM was diagnosed if glucose values were for the OGTT-100 higher than the threshold, at least at two points: fasting > 5.8 mmol/l, after 1 h > 10.6 mmol/l, after 2 h > 9.2 mmol/L, after 3 h > 8.0mmol/l. Pregnant women with normal OGTT-100gr were considered as controls (non-GDM). The 32 samples were selected based on the availability of blood samples and matched by age, gestational age and pre-pregnant BMI. The study was approved by the Institutional review board at the Hospital Universitario Virgen de la Victoria, Málaga.

Samples extraction, DNA isolation and bisulfite conversion

In both cohorts, samples were collected in gestational week 28 ± 2. The samples were either aliquoted and biobanked or subject to routine laboratory analyses. In the EPIPREG sample, DNA from peripheral blood leukocytes was extracted subsequently throughout the data collection, at the Hormone Laboratory, Oslo University Hospital, using a salting out procedure [16], and stored at -80ºC. EZ DNA methylation TM Kit (ZYMO Research, Tustin, CA, USA) was used for the bisulphite conversion of DNA. In EPIDG cohort, DNA from peripheral blood leukocytes was extracted using DNA Blood Mini Kit (Quiagen, Hiden, Germany). Epitect Bisulfite Kit (Qiagen, Germany) was used for bisulphite conversion. DNA methylation in both cohorts was quantified with Infinium MethylationEPIC BeadChip (Illumina, San Diego, USA).

Methylation values extraction

Raw data from both studies were analysed in R. In EPIPREG, the Meffil R package (https://cran.r-project.org/) was used for quality control (QC), normalization and reporting of B-values [14]. During the QC procedure implemented in Meffil R package, we removed six samples who were considered outliers from the methylated to unmethylated ratio comparison (> 3SD), one sample that was an outlier in control probes bisulfite 1 and 2 (> 5SD), and one sample due to sex discrepancy (predicted sex outlier > 5 SD). Probes with a bead count fewer than three and dectection p-value < 0.01. In total 472 of the 480 samples and 864,560 probes passed the QC. Functional normalization was implemented in Meffil R, which takes into account potential batch effects such as slide, row and columns. Proportions of blood cells (namely, CD8T, CD4T, NK, β-cells, monocytes, and neutrophils) were calculated using the Houseman method [27] during the QC [17].

After processing, 472 of the 480 available individuals remained and 810.266 in the EPIPREG sample passed QC. In EPIDG, the full sample and 741.479 probes passed QC (Supplementary Fig. 1).

Methylation risk score

To construct the weighted DNA MRS for T2DM in our sample we used summary data for the CpGs sites discovered in an EWAS of T2DM in five prospective European cohorts (n = 5.859) [7, 18,19,20,21]. The regression coefficients for the EWAS of T2DM was based in β-values. We calculated MRS for each of the discovered CpG sites available in our two cohorts. Due the QC filters of CpGs in each cohort only 42 of 72 CpG could be included in the MRS (Supplementary Table 1). MRS was constructed by multiplying the regression coefficient from the EWAS of T2DM with individual β-values in EPIPREG and EPIDG (EPIPREG-EU, EPIPREG-SA and EPIDG). Thereafter we summarized the individual score for each of the 42 CpGs sites to obtain the MRS. Figure 1 illustrates the workflow, including development of the MRS and the statistical analysis.

Fig. 1
figure 1

Workflow of the methodology followed to construct the MRS and the rest of the statistical analysis. preBMI: previous pregnancy BMI, Fam his diabetes: family history of diabetes. Blue Line indicate the in-house data vs. the public data used in our study

Statistical analysis

All the statistical analyses were performed using Rstudio (3.4.4). Clinical variables were compared between GDM and non-GDM in each group separately (EPIPRE_EU, EPIPREG_SA and EPIDG), depending on if the variable was normally distributed or not, t-test or Whitney U test were used for the continuous variable. A logistic mixed model regression was used to elucidate if MRS was associated with GDM in our three groups. Linear mixed model regressions were performed to test the association between MRS and continuous variables such as HOMA-IR, C-peptide, fasting glucose and 2 h glucose level, using a Bonferroni corrected threshold p < 0.0125 (0.05/4). We used the R packages lme4 and lmerTest [22] to perform mixed models regression. In these models, ancestry (European, South Asian, Mediterranean (EPIDG)) was treated as a random intercept to overcome potential ancestry-related differences in DNA methylation. We performed an unadjusted and an adjusted model by age, pre-pregnant BMI, family history of diabetes and smoking status, since these are covariates that are associated with both epigenetic marks and gestational diabetes.

To identify individual CpG sites associated with GDM in the studied populations, we performed logistic regressions for each CpGs used in the MRS across the three groups, followed by a fixed effects meta-analysis of the three groups for each CpG site using METAL [23]. Since random-effect models may bias the results if smaller studies have a large effect compared to well-performed, larger studies [24], we decided to use a fixed-effect model for a conservative approach, where the case-control design in EPIDG would not count more than the population-based design in EPIPREG.

mQTLs and phenotypes associated

Due to low statistical power for mQTL analysis in EPIPREG, we performed look-ups in GoDMC of the significant CpGs obtained in the meta-analysis to identify methylation quantitative trait loci (mQTL) to see if these methylation sites were influenced by genetic variants. GoDMC include Cis and trans meta-analysis results from genome-wide scans of 420.509 DNA methylation sites. This information come from several projects, with the aim to share data on genetic basis of DNA methylation variation. GoDMC provides a list of SNPs that have been associated with the CpG of interest. We used the LDlink-SNPclip tool [25] to identify SNPs in linkage disequilibrium (r2 > 0.8, MAF = 0.01) and the selected populations were European and South Asians. To assess relevance of the mQTLs to diabetes related phenotypes, we performed explorative look-ups in Phenoscanner [26] of phenotypes nominally associated with the mQTLs. Phenoscanner is a database holding publicly available results from large-scale GWAS. Although the default threshold used by Phenoscanner is p < 1 × 10− 5, we also looked up phenotypes with p-value < 0.05 due to the explorative nature of this look-up [27].

Results

Characteristics of the samples

The clinical characteristics of EPIPREG and EPIDG are represented in Table 1. In the EPIPREG cohort, non-GDM and GDM women differed significantly in pre-pregnant BMI, fasting glucose (Gluc0), 2 h glucose after OGTT (Gluc2), systolic blood pressure (SBP), glycosylated hemoglobin A1C (HbA1C), Homeostatic Model Assessment for insulin resistance (HOMA-IR), triglycerides (TAG), cholesterol and HDL across ethnicities. Only fasting glucose levels differed significantly between GDM cases and controls in the EPIDG cohort (Table 1).

Table 1 Characteristics of the study subjects in each cohort

Methylation risk score

In the mixed models regression, MRS for T2D levels were higher in women with GDM compared to non-GDM (adjusted model O.R: 1.4, 95%C.I: 1.10–1.74, P.val = 0.00014). For continuous traits, the MRS for T2D was significantly associated only with fasting glucose in the adjusted linear mixed models (P.val = 0.0086) (Table 2). When assessing EPIDG, EPIPREG Europeans and EPIPREG South Asians separately, MRS for T2D was associated with higher OR for GDM in EPIPREG South Asians, and followed the same direction of effect in EPIPREG Europeans and EPIDG although not statistically significant (Fig. 2; Supplementary Table 2).

Fig. 2
figure 2

Boxplot MRS for each group: red (SA_nonGDM)- EPIPREG south Asians non-GDM, Yellow (SA_GDM)- EPIPREG South Asians GDM, Green (EU_nonGDM)- EPIPREG Europeans non-GDM, Light Blue (EU_GDM): EPIPREG Europeans GDM, Dark Blue (M_nonGDM)- EPIDG (Mediterraneas) non-GDM, Pink (M_GDM)- EPIDG GDM

mQTLs for CpG sites common between T2DM and GDM

In the meta-analysis of the 42 CpG sites included in the MRS for T2DM, six CpGs were significantly associated with GDM (FDR < 0.05) (Table 3). All the identified CpG sites, except cg25130381, had significant mQTLs from lookups in GoDMC. A total of 23 mQTLs were found (supplementary Table 3). According to lookups in Phenoscanner, only two CpGs had mQTLs associated at p-value < 1 × 10− 5 with phenotypes related to immunoglobulin G, Rheumatoid arthritis, and body composition (Supplementary Table 4). However, when using a liberal threshold (p-value < 0.05), we observed some mQTLs possibly related to diabetes related phenotypes (Table 4).

Table 2 Association between the MRS for T2D and phenotypes in EPIPREG and EPIDG
Table 3 List of the 6 CpGs significantly associated with GDM across the three samples (EPIPREG_EU, EPIPREG_SA, EPIDG)
Table 4 mQTLs and disease associated, according to phenoscanner

Discussion

This is the first study to show that an MRS for T2DM is associated with GDM across our two cohorts, diagnostic criteria and ancestry. In addition, the MRS for T2DM was associated with higher fasting glucose levels. Six CpGs of the 42 included in the MRS, were significantly associated with GDM across EPIPREG Europeans, EPIPREG South Asians and EPIDG Mediterranean. We identified 23 mQTLs linked to the 6 CpG sites. Some of them were associated with T2DM and GDM with nominal significance in lookups of GWAS summary data.

GDM and T2DM share many pathophysiological factors, but the exact underlying mechanisms are largely unknown. There is a need to know how T2DM and GDM are linked to improve the prevention and thus avoid metabolic complications in the future. Genome-wide association studies (GWAS) have been used to investigate the potential link between GDM and T2DM [12], but none have so far tested whether GDM and T2DM share epigenetic marks. An increasing number of studies suggest that environmental factors are associated with epigenetic marks and that those marks may be important to understand interactions between genetics and environmental factors. A few EWAS of GDM in maternal peripheral blood are published so far, but the data is not sufficient for an MRS for GDM due to heterogeneity in diagnosis criteria used as well as their small sample sizes [28,29,30,31].

Our MRS was constructed from a meta-analysis of five European cohorts that identified CpGs associated with incident T2DM [18]. According to the bibliography of the genes included in the MRS, some of them were associated with glucose metabolism [32, 33], T2DM, type 1 diabetes mellitus and GDM [34]. ATP binding cassette subfamily G member 1 (ABCG1) is a gene involved in macrophage cholesterol and phospholipid transport and may regulate cellular lipid homeostasis. ABCG1 has been associated with diabetes and metabolic syndrome [35]. Furthermore, some studies show that GDM affects cholesterol homeostasis through this gene, therefore it could have a role in GDM and cardiovascular events [36].

The mQTLs were nominally associated with T2DM and GDM in lookups, but were interestingly also associated with phenotypes related to body composition, immunoglobulin G and autoimmune diseases associated with inflammation, such as rheumatoid arthritis and Crohn’s disease. These observations may suggest a pleiotropic relationship between these mQTLs for diabetes related CpG sites, but further exploration is necessary to assess the relevance and understand potential mechanisms.

The strengths of this study include the combination of two independent cohorts with case-control design in EPIDG which maximizes differences in GDM and non-GDM versus the population-based design in EPIPREG which includes the full range of values including less severe GDM. This combined with the two ancestries in EPIPREG, strengthen the evidence that these CpG sites may be important to understand the common epigenetic grounds of GDM and T2D. Limitations include limited statistical power in analysis of each sample separately and in associations between separate CpG sites and GDM. Still, the MRS for T2D was associated with GDM in EPIPREG South Asians, with the same direction of effect in EPIPREG Europeans and EPIDG. Further, an MRS for GDM based on an EWAS of GDM instead could have given a more precise prediction of GDM. Unfortunately published EWAS of GDM are still few and have small sample sizes which would result in a less robust MRS [40,41,42]. Also, our cross-sectional study design cannot entangle whether GDM status influence DNA methylation or vice versa. Hence this MRS should be tested in early pregnancy to be considered as potential predictor of GDM.

Genetic risk score reflects the inherited risk but have shown poor performance in complex disease such as diabetes. MRS is thought to reflect environmental triggers of the disease or phenotype and help to understand disease mechanism. Recent studies have shown the utility of both GRS and MRS for clinical prediction of different phenotypes [43]. Although GRS is being used in multiple diseases, their use present some limitations [44]. For example, GRS doesn’t reflect the effect of the environment on the phenotype whereas MRS incorporate both genetic factors and environmental exposures and their variation over time. Undoubtedly, combining both scores would enhance their clinical application for predicting or stratification of risk subjects. The MRS level differed between EPIPREG and EPIDG cohort, suggesting that the MRS may be population specific. Although we cannot rule out that part of these differences may be due to the batch or handling effect, they could also be a result of the EPIDG control group not having negative O’sullivan test, while non-GDM women in EPIPREG had normal fasting and 2 h glucose values (1 h glucose OGTT missing in EPIPREG). Further, the populations in EPIPREG and EPIDG differ largely in many aspects, including the severity of GDM.

To conclude, we are the first to show that MRS for T2DM is significantly associated with GDM, suggesting shared epigenetic mechanisms between GDM and T2DM. This may help explain some of the molecular mechanisms mediating the increased risk of developing T2DM after GDM. Future research should compare whether GRS and MRS or a combination of the two provide better predictions of GDM and T2DM and explore how transcription of the identified genes impact methylation on nearby genes. Furthermore, understanding the role of the genetic variants in disease development may help to improve prevention and management of both GDM and T2DM in the future.