Type 2 diabetes mellitus is a complex metabolic disease whose worldwide prevalence is growing rapidly. According to recent data, an estimated 415 million people globally have type 2 diabetes [1]. Hallmarks of type 2 diabetes include chronically elevated blood glucose levels due to decreased insulin secretion from pancreatic beta cells and insulin resistance in different tissues [2].

In addition to well-known risk factors for type 2 diabetes such as being overweight, unhealthy lifestyle, metabolic alterations, previous diagnosis of gestational diabetes, or a family history of cardiovascular disease (CVD) or type 2 diabetes [3], genetic susceptibility to the disease is also important, with heritability estimates ranging from 20% to 80% [4, 5]. To date, genome-wide association studies (GWASs) have identified at least 75 loci associated with type 2 diabetes [6]. However, these genetic variants explain only 10–15% of disease heritability, suggesting a major role for environmental and lifestyle factors [6, 7].

To identify the missing components of type 2 diabetes pathogenesis, researchers have started to examine the role of epigenetics in the disease aetiology. Epigenetics refers to modifications of DNA and chromatin, such as DNA methylation, that lead to differences in gene expression without changing the DNA sequence. These epigenetic changes can be influenced by the environment and may cause differences in disease susceptibility between individuals [8].

Initially, epigenetic studies used a candidate gene approach to identify DNA methylation changes in known type 2 diabetes susceptibility genes. With advances in measurement technology, approaches have shifted towards epigenome-wide association studies (EWASs), allowing novel biomarkers for complex diseases to be found. Development of type 2 diabetes requires perturbation of multiple biological mechanisms in different organs, including pancreas, liver, skeletal muscle and adipose tissue [9]. EWASs in those tissues would provide comprehensive insight into the disease aetiology; however, such samples cannot be obtained on a large scale. Therefore, most EWASs have been conducted using whole blood [10].

Here, we present an overview of recent human EWASs investigating DNA methylation changes associated with type 2 diabetes and/or glycaemic traits, represented by fasting glucose and HbA1c levels. Moreover, we discuss the EWAS findings and the strengths and limitations of different approaches. To validate methylation loci identified in the reviewed EWASs, we also performed a replication study in blood samples of 100 diabetic and 100 control individuals selected from the Dutch population-based Lifelines study [11]. Next, we investigated whether differential DNA methylation patterns previously identified in pancreas, liver and adipose tissue are also reflected in blood.


Literature search

The systematic review was conducted according to the PRISMA and MOOSE guidelines. We searched PubMed and EMBASE for relevant studies investigating DNA methylation associated with type 2 diabetes or fasting glucose and HbA1c levels, up to 26 April 2017. The search strategy and inclusion and exclusion criteria are provided in the electronic supplementary material (ESM Methods). Ultimately, 22 publications were selected for full-text evaluation. Three studies were excluded (Fig. 1), resulting in a total of 19 studies included in the review.

Fig. 1

PRISMA 2009 flow chart of the literature search performed up to 26 April 2017

Replication analyses: selection of CpG sites

For the replication analyses, four additional studies were excluded: one that only indirectly investigated the association with type 2 diabetes [12] and three that used a platform other than the Illumina arrays [13,14,15]. Thus, 15 studies were included in the replication analysis (Fig. 1). To select CpG sites (CpGs) for replication, we applied a study-specific Bonferroni correction for multiple testing to each EWAS (p < 0.05/number of CpGs analysed in that study), even if the authors of the original manuscript had used a different multiple-testing correction. This was done to avoid false-positive results from studies that used lenient significance thresholds.
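The study-specific selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the CpG names and p values in the example are made up, and the number of CpGs analysed in the example study (450,000) is an assumed figure chosen only to show the arithmetic.

```python
from typing import Dict, List


def bonferroni_threshold(n_cpgs_analysed: int, alpha: float = 0.05) -> float:
    """Study-specific Bonferroni threshold: alpha divided by the number of CpGs tested."""
    if n_cpgs_analysed <= 0:
        raise ValueError("number of CpGs analysed must be positive")
    return alpha / n_cpgs_analysed


def select_cpgs(reported: Dict[str, float], n_cpgs_analysed: int,
                alpha: float = 0.05) -> List[str]:
    """Keep CpGs whose reported p value passes the study-specific threshold.

    `reported` maps CpG identifiers to the p values published by one study;
    `n_cpgs_analysed` is the number of CpGs that study tested after its own QC.
    The threshold is applied regardless of the correction the authors used.
    """
    threshold = bonferroni_threshold(n_cpgs_analysed, alpha)
    return sorted(cpg for cpg, p in reported.items() if p < threshold)


# Hypothetical CpGs and p values, for illustration only
example = {"cpg_a": 2.0e-9, "cpg_b": 5.0e-6, "cpg_c": 8.0e-8}
print(select_cpgs(example, 450_000))
```

With 450,000 CpGs the threshold is about 1.1 × 10⁻⁷, so only `cpg_a` and `cpg_c` are kept. Applied to the 52 blood CpGs tested in the replication, the same formula gives 0.05/52 ≈ 0.00096, which the Results report as p < 0.0009.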

Lifelines case–control sample

Lifelines is a prospective population-based cohort study of the health and health-related behaviours of 167,729 individuals living in the north of the Netherlands [16]. Details on clinical examination and biochemical measurements have been described elsewhere [16]. In short, a standardised protocol was used to obtain blood pressure and anthropometric measurements such as height, weight and waist circumference. Blood was collected in the fasting state between 08:00 and 10:00 h, and fasting blood glucose and HbA1c were measured on the same day.

For this study we used a case–control sample selected from the baseline of the Lifelines study (all unrelated individuals of European ancestry, n = 13,436) [11]. Four groups were selected based on the following criteria (n = 50 for each group):

  (1) type 2 diabetes patients without CVD complications;

  (2) type 2 diabetes patients with CVD complications;

  (3) non-diabetic control participants with no history of CVD risk factors, age- and sex-matched to groups 1 and 2;

  (4) healthy, normal-weight control participants (BMI < 25 kg/m²), additionally obtained from an available methylation dataset to increase the power of the study.

In total, we included 100 type 2 diabetic individuals and 100 control individuals. Type 2 diabetes was defined by self-reported disease and/or use of blood glucose-lowering medication, or by fasting blood glucose ≥ 7.0 mmol/l at examination. Individuals with CVD complications had a CVD history, defined as self-reported myocardial infarction, stroke, angina pectoris or vascular intervention.

DNA methylation methodology

DNA was isolated from fasting whole blood samples. Next, 500 ng of genomic DNA was bisulphite modified using the EZ DNA Methylation kit (Zymo Research, Irvine, CA, USA) and hybridised to Illumina 450K arrays (Illumina, San Diego, CA, USA) according to the manufacturer’s protocols. Data were generated by the Genome Analysis Facility of UMCG, the Netherlands. Quality control (QC) and normalisation steps are described in detail elsewhere [17] and in ESM Methods. In short, the QC pipeline developed by Touleimat and Tost was used with background correction and probe type normalisation [18]. Normalised β values were then logit-transformed into M values for downstream analysis, because M values have been shown to perform better in studies with smaller sample sizes [19].
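The logit transformation from β to M values is M = log2(β/(1 − β)), with the inverse β = 2^M/(2^M + 1). A minimal Python sketch follows. The clipping constant `eps`, which keeps probes with β of exactly 0 or 1 finite, is an assumption and not part of the described pipeline; some implementations instead add a small offset to the methylated and unmethylated intensities.

```python
import math


def beta_to_m(beta: float, eps: float = 1e-6) -> float:
    """Convert a methylation beta value (0..1) to an M value: log2(beta / (1 - beta)).

    Beta is clipped to [eps, 1 - eps] so that fully (un)methylated probes do not
    give infinite M values; eps is an assumed choice, not taken from the study.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m: float) -> float:
    """Inverse transform: beta = 2**M / (2**M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


print(beta_to_m(0.5), beta_to_m(0.8), m_to_beta(2.0))
```

For example, β = 0.5 maps to M = 0 and β = 0.8 maps to M ≈ 2. M values are unbounded and closer to homoscedastic, which suits linear modelling, whereas β values remain easier to interpret biologically, which is why the correlation analyses below use β values.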

Statistical analysis

All analyses were performed using R (version 3.3.0) in RStudio with the limma package. Linear regression model 1 included age, sex, measured blood cell composition (percentages of basophilic granulocytes, eosinophilic granulocytes, neutrophilic granulocytes, lymphocytes and monocytes), plate number and position on the plate as covariates. In models 2–6 we additionally adjusted for other covariates: (2) model 1 + BMI; (3) model 1 + medication use and newly diagnosed diabetes; (4) model 1 + smoking status and education level; (5) model 1 + presence of cardiovascular complications; (6) model 2 + education level. In addition to adjusting for measured cell type composition, we estimated cell type proportions with the Houseman method [20] and compared the results. We also performed sensitivity analyses using model 1 in smaller subgroups: (1) 50 type 2 diabetic individuals without complications compared only with 50 age- and sex-matched control individuals; and (2) 100 type 2 diabetic individuals with and without complications compared only with 50 age- and sex-matched control individuals. To determine whether methylation levels at the replicated top hits were correlated with type 2 diabetes risk factors, we calculated Pearson correlation coefficients based on methylation β values. We used a strict analysis-specific Bonferroni correction for multiple testing (p < 0.05/number of CpGs selected for replication).
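Conceptually, each model fits one linear regression per CpG with the M value as outcome and disease status plus covariates as predictors. The Python sketch below shows ordinary least squares for a single CpG using numpy. It is a simplification: limma (`lmFit` followed by `eBayes`) additionally moderates the per-CpG variance estimates by borrowing information across all probes. The function name and argument layout are illustrative assumptions, and categorical covariates such as plate number would have to be dummy-coded before being passed in.

```python
import numpy as np


def fit_cpg_model(m_values: np.ndarray, status: np.ndarray,
                  covariates: np.ndarray) -> float:
    """Ordinary least squares fit for one CpG (no empirical Bayes moderation).

    m_values:   shape (n,), M values of one CpG
    status:     shape (n,), 1 = type 2 diabetes, 0 = control
    covariates: shape (n, k), e.g. age, sex, cell fractions, dummy-coded plate
    Returns the coefficient for disease status: the covariate-adjusted
    difference in mean M value between cases and controls.
    """
    n = m_values.shape[0]
    # Design matrix: intercept, disease status, then the covariates
    design = np.column_stack([np.ones(n), status, covariates])
    coef, *_ = np.linalg.lstsq(design, m_values, rcond=None)
    return float(coef[1])
```

Model 2 would correspond to appending a BMI column to `covariates`; comparing the status coefficient between the two fits shows how much of the association is absorbed by BMI.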


Recent discoveries

Our search strategy retrieved 19 EWASs investigating DNA methylation associated with type 2 diabetes or glycaemic traits (Fig. 1): 16 studies with type 2 diabetes as outcome (Table 1) and four studies with glycaemic traits as outcome (Table 2), with one study listed in both tables [25]. We assessed the quality of the included studies using the Newcastle–Ottawa scale for observational studies (details in ESM Methods) [36]. Seventeen of the 20 entries (one study listed twice) were assessed to have a low or medium risk of bias, and only three were evaluated to have a high risk of bias (data not shown). In the majority of the reviewed studies, an array-based method was used in the discovery phase: two studies used the Illumina 27K array and 13 used the Illumina 450K array. Only one study used whole-genome bisulphite sequencing, which is considered the gold standard in methylation studies [14]. Most of the blood-based studies (ten out of 19) were performed in larger samples (n = 6 – Z2000) than the studies in pancreas, liver, skeletal muscle and adipose tissue (n = 12–100). The EWASs were conducted in different ethnic groups: Europeans, Indian Asians, Mexican Americans and Ashkenazi Jews [21, 24, 25, 28]. Despite the differences in ethnicity and study design, some CpGs, such as those in the ABCG1, TXNIP and SREBF1 genes, were reported in multiple blood-based studies [21, 23,24,25, 33, 34]. There was no clear overlap in significant CpGs across tissues, but some studies reported a significant correlation between methylation levels at specific CpGs in blood and liver [21] or in blood and pancreas [12].

Table 1  Characteristics of EWASs associated with type 2 diabetes
Table 2  Characteristics of EWASs associated with glycaemic traits

Study design

The majority of the reviewed EWASs (18 out of 19) used a cross-sectional design, in which phenotype and DNA methylation profile were measured at the same time point, either in unrelated individuals (type 2 diabetic and healthy control participants, 15 studies) or in twin pairs discordant for type 2 diabetes (three studies) (Tables 1, 2). Strengths of this approach typically include a large study population selected from ongoing cohorts and the possibility of adjusting for known confounders such as BMI or smoking. However, a cross-sectional approach cannot establish whether the difference in methylation preceded the onset of type 2 diabetes.


  (1) Blood: The interpretation of blood-based EWAS results can be difficult, because many top hits are known genes from immune response and inflammatory pathways; such associations may be driven by blood cell composition and thus may not reflect true associations with type 2 diabetes. Six of the ten blood-based studies used the reference-based estimation methods of Houseman [20] or Jaffe [37] to adjust for confounding by cell composition. Results from the majority of those studies indicate that differentially methylated sites in the TXNIP, ABCG1, CPT1A and SREBF1 genes are associated with type 2 diabetes and glycaemic traits [21, 23,24,25, 33, 34].

  (2) Pancreas: The pancreas plays a key role in maintaining normoglycaemia through insulin secretion in response to elevated blood glucose [9]. In addition to the ten EWASs performed in blood, four of the included studies examined the association between DNA methylation in pancreas and type 2 diabetes. These studies were conducted in limited numbers of individuals (n = 16 to 87) [27, 28], and no overlap in identified CpGs was found between the studies when the study-specific multiple-testing corrections applied by the authors were used (FDR < 5% [12, 27]; p < 0.01 and a 15% group-wise difference in methylation [28]). Interestingly, one study used whole-genome bisulphite sequencing (WGBS) and identified over 25,000 differentially methylated regions across the genome, suggesting widespread methylome changes associated with type 2 diabetes [14].

  (3) Liver: Another important organ in glucose metabolism is the liver; in diabetic individuals, suppression of hepatic glucose output by insulin is reduced, contributing to hyperglycaemia [38]. The exact pathophysiology of hepatic insulin resistance is still unknown, leaving room for a role of epigenetic mechanisms. We found two EWASs performed in liver tissue (Table 1), both with rather small sample sizes (n = 15 [32] and 95 [31]). Most CpGs showing a significant methylation difference in these two studies were hypomethylated in individuals with type 2 diabetes compared with control individuals (92% and 94%, at FDR < 25% and FDR < 5%, respectively). No overlap was found between the liver and blood-based EWAS results, suggesting that significant CpGs from liver EWASs might be tissue specific.

  (4) Adipose tissue: The pathogenesis of glucose intolerance is also associated with adipocyte metabolism and altered fat topography [39]. Among the reviewed studies, three EWASs were performed in adipose tissue: two investigated an association with type 2 diabetes (one study with five twin pairs and another with unrelated individuals, n = 95) and one investigated an association with HbA1c level (96 healthy male and 94 healthy female participants) [29, 30, 35]. We observed no overlap (checked manually) between the top 100 CpGs from the two studies focusing on type 2 diabetes [29, 30].


In 2013, the highest diabetes prevalence was observed in the North America and Caribbean region (around 11%) and the lowest in the African region (around 5.7%) [40], suggesting differences in prevalence between ethnic groups. In a recent EWAS, the total risk of developing type 2 diabetes was three times higher in Indian Asians than in Europeans, regardless of differences in adiposity, physical activity and glycaemic values [21]. The authors estimated that 32% of the unexplained risk of future type 2 diabetes among Indian Asians compared with controls was associated with a higher methylation score based on the top five markers at TXNIP, ABCG1, SREBF1, SOCS3 and PHOSPHO1 [21]. A family-based study of 859 Mexican Americans showed that methylation at the top regions, including the TXNIP, ABCG1 and SAMD12 genes and two intragenic regions, accounted for 7.8% of the heritability of type 2 diabetes in this population [25]. An EWAS in an Arab population showed that around 10% of methylation sites with FDR < 1% had a median heritability of 0.7, supporting previous findings [22, 41]. Differences in DNA methylation between ethnic groups can be partly explained by genetic ancestry, but environmental and lifestyle factors may also contribute to the variation; notably, some methylation loci (TXNIP and ABCG1) were found in populations with divergent ethnic backgrounds [21, 23,24,25].
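A methylation score of the kind described above combines several CpGs into a single number per individual. The exact weighting used in [21] is not described here, so the Python sketch below is an assumption: each marker's methylation is standardised across individuals and multiplied by a weight whose sign reflects the direction of association. The marker labels and weights in the test are hypothetical placeholders (in practice the keys would be CpG probe identifiers), not values from the original study.

```python
from statistics import mean, pstdev
from typing import Dict, List


def methylation_score(samples: List[Dict[str, float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted sum of standardised methylation levels at selected markers.

    samples: one dict per individual, mapping marker -> beta value
    weights: marker -> weight; the sign encodes hyper- (+) or hypomethylation (-)
             in cases. Any weights supplied here are hypothetical.
    """
    # z-standardise each marker across individuals so markers share a common scale
    centre_scale = {}
    for marker in weights:
        values = [s[marker] for s in samples]
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"marker {marker} has no variation")
        centre_scale[marker] = (mean(values), sd)
    return [
        sum(w * (s[m] - centre_scale[m][0]) / centre_scale[m][1]
            for m, w in weights.items())
        for s in samples
    ]
```

Standardising first keeps a marker with a wide β range from dominating the score simply because of its scale.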

Replication study

Selected CpGs

From the 19 studies included in the review, we selected 15 (Fig. 1). A list of CpGs robustly associated with type 2 diabetes or glycaemic traits was compiled by applying a stringent study-specific multiple-testing correction threshold to avoid false-positive results (see Methods). After removing duplicates, we obtained a list of 100 unique CpGs (ESM Table 1) identified in peripheral blood (52 for type 2 diabetes and 21 for fasting glucose), pancreas (15 for type 2 diabetes), adipose tissue (ten for HbA1c level) and liver (two for type 2 diabetes).

Study population

We investigated which of the above-mentioned EWAS findings, both in blood and in other tissues, could be replicated in blood samples from the Lifelines case–control sample (for clinical characteristics see Table 3 and ESM Table 2). Individuals with type 2 diabetes were older, had significantly higher BMI, waist–hip ratio and blood pressure, and had higher levels of HbA1c, fasting glucose and triacylglycerols than control individuals. We observed no differences in socioeconomic status, as represented by educational level, between type 2 diabetic and control participants (Table 3).

Table 3  Baseline characteristics of the study sample of type 2 diabetic individuals and healthy individuals from the Lifelines cohort (n = 198)

Association with type 2 diabetes: blood-specific CpGs

First, we analysed the 52 CpGs associated with type 2 diabetes in blood (ESM Table 1). In our Lifelines sample, five of the 52 CpGs showed significant associations with type 2 diabetes at the Bonferroni-corrected threshold (p < 0.0009, i.e. 0.05/52 CpGs), namely loci in the ABCG1, LOXL2, TXNIP, SLC1A5 and SREBF1 genes (see a short description in ESM Box 1). This number increased to 15 CpGs at the nominal significance level (p < 0.05) (Table 4). In agreement with previous studies, we observed hypermethylation at the ABCG1 and SREBF1 loci and hypomethylation at TXNIP, LOXL2 and SLC1A5 in type 2 diabetic compared with control individuals. All nominally significant associations also showed the same direction of effect as in the original EWASs. After adjustment for BMI, only the CpG site in ABCG1 remained significantly associated with type 2 diabetes, while for all other CpGs effect sizes became smaller and were no longer significant (ESM Fig. 1). Based on the regression coefficients (β) from these analyses, we concluded that the associations between significant CpGs and type 2 diabetes are partly explained by BMI (BMI accounted for 5–70% of variance, data not shown). Additional adjustment for other factors (see Methods) had only a relatively small impact on effect sizes and p values (ESM Table 3). Furthermore, in the sensitivity analyses on subsamples (see Methods), only the CpGs in TXNIP (50 vs 49) and ABCG1 (100 vs 49) reached the significance threshold (p < 0.0009), suggesting a lack of power compared with the full sample of 198 individuals (data not shown). We also examined, for the 15 nominally significant CpGs, whether the differences in methylation were influenced by the occurrence of complications in diabetic individuals, and found no significant difference between individuals with and without complications (ESM Table 4).
Finally, to check the effect of inflammation, we additionally adjusted the analysis for C-reactive protein (CRP) level; the results did not change (data not shown).
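The text does not state exactly how the contribution of BMI was quantified. One common approach, assumed here, is the percentage attenuation of the case–control regression coefficient after BMI is added to the model. The coefficients used below are made-up numbers that only illustrate the arithmetic.

```python
def percent_attenuation(beta_crude: float, beta_adjusted: float) -> float:
    """Percentage of the crude case-control effect removed by adding a covariate (e.g. BMI).

    Computed as 100 * (beta_crude - beta_adjusted) / beta_crude, so it works for
    both hypermethylated (positive) and hypomethylated (negative) effects.
    """
    if beta_crude == 0:
        raise ValueError("crude effect must be non-zero")
    return 100.0 * (beta_crude - beta_adjusted) / beta_crude


# Hypothetical coefficients (difference in M value, cases vs controls)
print(percent_attenuation(0.20, 0.12))
print(percent_attenuation(-0.10, -0.05))
```

A hypermethylated CpG whose coefficient drops from 0.20 to 0.12 is attenuated by 40%; a hypomethylated CpG moving from −0.10 to −0.05 is attenuated by 50%.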

Table 4  Significant differentially methylated CpGs for type 2 diabetes as originally identified in blood and replicated in the Lifelines type 2 diabetes EWAS sample in blood (n = 198)

Next, we investigated whether the five replicated type 2 diabetes-associated CpGs were also correlated with glycaemic and lipid phenotypes in healthy individuals (n = 98, Table 5). Methylation at the ABCG1 site was significantly and positively correlated with age, fasting glucose and triacylglycerols, while methylation at the TXNIP and SLC1A5 CpGs was negatively correlated with age. Methylation at SREBF1 was positively correlated with both fasting glucose and lipid levels. No significant correlation with BMI was found in healthy individuals.
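The correlations in Table 5 are Pearson coefficients computed on β values. A minimal pure-Python sketch of r and of the t statistic used to test it (on n − 2 degrees of freedom) follows; in practice a library routine such as `scipy.stats.pearsonr` returns both r and the p value. The function names are illustrative, and the resulting p value would then be compared with the analysis-specific Bonferroni threshold described in Methods.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)); undefined for |r| = 1."""
    if abs(r) >= 1.0:
        raise ValueError("t statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

With n = 98 healthy individuals, for example, r = 0.5 gives t = 0.5 × √128 ≈ 5.66 on 96 degrees of freedom.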

Table 5  Correlations between DNA methylation (β values) of five replicated CpGs with type 2 diabetes risk factors in healthy individuals in Lifelines sample (n = 98)

Associations with type 2 diabetes: other tissue-specific CpGs

In addition to the 52 CpGs associated with type 2 diabetes in blood, we also analysed 17 CpGs that were associated with type 2 diabetes in pancreas and liver to test whether DNA methylation in metabolically active tissues may be reflected in DNA methylation in blood. No significant associations were found for any of these CpGs in blood samples (all p > 0.1).

Associations with glycaemic traits

Finally, we tested the CpGs previously shown to be associated with fasting glucose and HbA1c levels. In blood samples from the 98 healthy individuals, we replicated the associations of CpGs in the CCDC57 and ABCG1 genes with fasting glucose level at nominal significance (p < 0.05, Table 6). Interestingly, after adjustment for BMI, two more CpGs, located in the MDN1 and FLAD1 genes, reached nominal significance (Table 6). We found no significant association between HbA1c level and DNA methylation at any of the ten CpGs identified in adipose tissue.

Table 6  Significant differentially methylated CpGs for fasting glucose replicated in healthy control individuals from the Lifelines type 2 diabetes EWAS subsample (n = 98)

The EWASs for other metabolically relevant traits

Since high BMI and dyslipidaemia are well-known risk factors for type 2 diabetes and are commonly observed in diabetic individuals [43], we compared the results from our replication study with the results from recent EWASs studying DNA methylation related to adiposity and blood lipids [42, 44,45,46]. We found a large overlap between CpGs that are significantly associated with BMI and triacylglycerol levels, and those that are associated with type 2 diabetes and fasting glucose (ESM Table 5).


In this study, we first comprehensively reviewed recently published EWASs of DNA methylation patterns associated with type 2 diabetes and glycaemic traits. The potential use of DNA methylation as a biomarker for type 2 diabetes is frequently reported in the literature, mostly on the basis of cross-sectional studies. A better setting for testing biomarkers would be to capture changes in the methylation profile before disease onset. A longitudinal design allows this, as it provides methylation measurements at multiple time points in the same individuals, thereby capturing epigenetic dynamics over the life course. However, because of higher costs and longer study duration, such EWASs are scarce, especially for complex diseases. To date, only one longitudinal EWAS of type 2 diabetes has been published, identifying five CpGs associated with disease onset in Indian Asians during follow-up [21], two of which (the CpGs in ABCG1 and PHOSPHO1) were replicated in a prospective study [47]. In our analysis we replicated three CpGs from the longitudinal study (in ABCG1, TXNIP and SREBF1), indicating that these methylation differences can also be captured in a cross-sectional study, possibly because methylation levels remain stable after disease onset. These CpGs represent potential predictive biomarkers for type 2 diabetes susceptibility.

Another issue concerns the inconsistency of EWAS findings across tissues and whether blood can serve as a proxy tissue to capture these patterns. Changes in DNA methylation have been reported in tissues relevant to type 2 diabetes, such as pancreas, liver, skeletal muscle and adipose tissue (ESM Table 1) [27, 31, 32, 48, 49]. The overlap between these results is limited, suggesting that most of the identified DNA methylation loci are tissue specific. However, some studies reported an overlap in disease-specific and age-related differentially methylated CpGs between blood and other relevant tissues. In a recent EWAS, around 60% of the methylation changes associated with age in pancreatic islets also occurred in blood, including at FHL2, KLF14, FAM123C and GNPNAT1, all genes known to be associated with type 2 diabetes or insulin secretion [12]. Chambers et al reported that two of five tested CpGs (in TXNIP and SOCS3) were differentially methylated in liver and reflected in blood [21]. Interestingly, another recent study showed hypermethylation at a CpG in the SREBF1 gene in pancreatic cells and blood from type 2 diabetic individuals, and hypomethylation at the TXNIP locus in pancreatic islets, skeletal muscle and blood, which is directionally consistent with our findings in blood [47]. Taken together, these data indicate that some methylation changes found in other tissues can be mirrored in blood. However, in our study we did not replicate the CpGs from the liver, pancreas and adipose tissue EWASs. This may reflect the small discovery sample sizes, the relatively small size of our replication sample and/or tissue-specific methylation patterns.

Epigenetic changes can be a cause or a consequence of disease, or an indirect contributing factor through environmental exposures that affect both the epigenome and type 2 diabetes risk [50]. Multiple factors can affect DNA methylation, such as environmental exposures [51], psychosocial factors [52] and genetic factors [53], which together contribute to the variation in DNA methylation levels between individuals. Also, accumulating data indicate that interactions between genetics and epigenetics influence gene expression levels in relevant metabolic traits, leading to the development of complex diseases [54, 55]. Recently, genetic ancestry and ethnicity have also been shown to influence methylation levels [41]. Among the EWASs reviewed, we observed an overlap for a number of CpGs (TXNIP, ABCG1, SOCS3, SREBF1 and CPT1A) from EWASs performed in blood samples from Europeans, Indian Asians, Mexican Americans and Arabs, suggesting an association of DNA methylation with type 2 diabetes at these sites irrespective of ethnic, social and environmental differences. Moreover, this finding highlights the usefulness of data sharing to enable meta-analyses, as is common practice for genome-wide association studies (GWASs).

In this study, we replicated five CpGs in blood, four of which reside in genes previously shown to be associated with type 2 diabetes (ABCG1, LOXL2, SLC1A5, SREBF1) (ESM Box 1). The fifth replicated CpG site (cg19693031) lies in TXNIP and has been shown to be hypomethylated in type 2 diabetes [21, 23,24,25]. TXNIP expression has been linked to glucose levels (ESM Box 1). Despite its important function in type 2 diabetes pathogenesis, TXNIP was not identified as a susceptibility gene in recent GWASs of type 2 diabetes [6]. These data suggest that DNA methylation may be an important mechanism controlling TXNIP expression, thereby affecting glucose homeostasis.

Blood cell composition can influence EWAS analyses and outcomes. Potential confounding by cell composition can be addressed by adjusting for directly measured cell counts or for reference-based estimates of cell proportions (e.g. the Houseman method [20]). In our analysis, we observed no difference in effect sizes for the significantly associated CpGs when adjusting with either the Houseman method or measured cell counts, suggesting that the two approaches may be used interchangeably (data not shown). In studies in which information on blood cell composition is not available, reference-based methods such as the Houseman approach are therefore essential.

It has recently been shown that methylation changes at CpGs located in SREBF1, ABCG1 and CPT1A are associated not only with type 2 diabetes but also with BMI [42, 44, 46]. Therefore, we compared our results with those from recent EWASs for adiposity and other relevant metabolic phenotypes [42, 44, 46]. We observed a substantial overlap between BMI- and triacylglycerol-related CpGs and CpGs associated with type 2 diabetes and glycaemic traits. Approximately 60% to 70% of diabetic individuals show some lipid abnormalities, which are associated with insulin resistance. The observed overlap in EWAS results could be explained by hypertriacylglycerolaemia leading to elevated non-esterified fatty acid levels, which in turn could induce insulin resistance and beta cell dysfunction [56]. Furthermore, a recent EWAS of adiposity indicated that adiposity determines methylation levels at the majority of the identified loci [42] and that methylation changes in blood might in part be a consequence of the alterations in lipid and glucose metabolism associated with BMI. In this EWAS, 62 of the 187 BMI-associated methylation loci were associated with incident type 2 diabetes, and a BMI methylation risk score calculated from those CpGs predicted future development of type 2 diabetes [42]. Together, this supports the hypothesis that BMI partly accounts for the association between DNA methylation and type 2 diabetes.

Overall, we conclude that a number of differentially methylated CpGs associated with type 2 diabetes in published EWASs can be replicated in blood and show promise as disease biomarkers. Our data indicate that BMI partly explains the associations between DNA methylation and type 2 diabetes (i.e. only five out of 15 CpGs remained significant after adjustment for BMI). Whether these markers can be used as biomarkers for type 2 diabetes in clinical practice requires further investigation. We recommend that more longitudinal studies be performed to confirm the robustness of these markers and to identify additional potential markers.