Introduction

The escalating prevalence of type 2 diabetes is a major health problem in industrialised as well as developing countries. Its numerous complications, such as diabetic retinopathy, nephropathy, neuropathy, atherosclerosis, stroke and hypertension, lead to reduced quality of life and increased mortality. Type 2 diabetes is a heterogeneous disease whose onset and progression depend on both genetic and environmental factors. Epigenetic mechanisms, which can mediate environmental influences on gene expression, may therefore also play a key role in the pathology of diabetes and its complications. DNA methylation (DNAm), the most studied epigenetic mechanism, has been associated with stable alterations of gene expression and implicated in the pathogenesis of type 2 diabetes and other age-related diseases [1–4].

The study of epigenetic mechanisms can provide novel insights into the pathophysiology of diabetes and its complications, which may lead to the identification of new drug targets. In addition, investigating associations with DNAm in peripheral blood may help identify novel biomarkers for noninvasive early disease detection, since peripheral blood is the most widely available DNA source in large-scale epidemiological studies [5].

So far, the association of DNAm with type 2 diabetes or insulin-related traits has been investigated in human pancreatic islets [6], CD4+ T cells [7] and human adipose tissue [8, 9]. Given the potential of peripheral blood as a source of biomarkers, we aimed to identify type 2 diabetes-associated CpG sites in whole blood DNA. In our population-based cohort of 1,515 older adults with a type 2 diabetes prevalence of 16% (n = 240), we investigated the association of whole blood DNAm with type 2 diabetes at more than 360,000 CpG sites measured with the Illumina Infinium HumanMethylation450 BeadChip.

Methods

Study population

The cross-sectional data used for this analysis are from the baseline examination of the prospective ESTHER cohort study. ESTHER (n = 9,949) is a population-based epidemiological study of older adults (aged 50–75 years at baseline) who were recruited by their general practitioners during a routine health check-up between 2000 and 2002 in Saarland, a state in southwest Germany; it has been described in detail previously [10]. Two non-overlapping subcohorts were selected from the participants. Subcohort 1 comprised 988 ESTHER participants who were consecutively recruited at the start of the study between July and October 2000. Subcohort 2 comprised 527 participants randomly selected from the 3,499 ESTHER participants with available DNA who were recruited between October 2000 and March 2001. Baseline sociodemographic, lifestyle, health and diet information was obtained by a comprehensive questionnaire. Height, weight and history of several diseases, such as hypertension, stroke, diabetes mellitus and cardiovascular disease, were obtained either from the participant questionnaire or from the general practitioner’s health check-up report at baseline. Only information obtained at baseline was used for the analyses presented here. The study was approved by the ethics committees of the University of Heidelberg and of the physicians’ board of Saarland. Informed consent was obtained from all participants.

Laboratory methods

Blood samples were taken at recruitment. DNA was extracted from whole blood samples using a salting-out procedure [11]. DNAm profiles were measured in 2012 for subcohort 1 and in 2014 for subcohort 2 using the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA, USA), which enables simultaneous quantitative measurement of the methylation status at 485,577 CpG sites [12]. The laboratory work was done in the Genomics and Proteomics Core Facility at the German Cancer Research Center (DKFZ), Heidelberg, Germany, as previously described [13]. Methylation levels at each CpG were quantified as average beta values, ranging from 0 (no methylation) to 1 (complete methylation), and were calculated with Illumina’s GenomeStudio 2011.1 (Module M Version 1.9.0). Because different normalisation approaches, including β-mixture quantile normalisation (BMIQ), the Illumina normalisation and preprocessing method implemented in GenomeStudio, and subset-quantile within array normalisation (SWAN), were all comparable in both the number of discovered CpGs and the number that were validated [14], data were processed according to the manufacturer’s protocol: no background correction was applied, and data were normalised to the internal controls provided by Illumina. All controls were checked for inconsistencies on each measured plate. Each batch contained samples from participants with and without type 2 diabetes; samples in the same batch were measured on the same day, with the same BeadChip and on the same plate. Signals of probes with a detection p value >0.01 were excluded from the analysis. We also excluded probes affected by single nucleotide polymorphisms (SNPs), cross-reactive probes and CpG sites located on the sex chromosomes, leaving 361,922 CpG sites for analysis.
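The two filtering steps described above (masking individual signals with a detection p value >0.01 and dropping whole CpG sites flagged in the annotation) can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the GenomeStudio-based pipeline used in the study: the dictionary layout, the function name `mask_and_filter`, the `excluded_probes` set and the probe IDs `cg_demo1` to `cg_demo3` are all hypothetical placeholders.

```python
from typing import Dict, List, Optional, Set

# Detection p value threshold from the text: signals with p > 0.01 are excluded.
DETECTION_P_MAX = 0.01


def mask_and_filter(
    beta: Dict[str, List[Optional[float]]],
    detection_p: Dict[str, List[float]],
    excluded_probes: Set[str],
) -> Dict[str, List[Optional[float]]]:
    """Drop excluded probes and set unreliable signals to None.

    beta, detection_p: probe ID -> one value per sample, in the same sample order.
    excluded_probes: hypothetical set of probe IDs annotated as SNP-affected,
    cross-reactive or located on a sex chromosome.
    """
    cleaned: Dict[str, List[Optional[float]]] = {}
    for probe, values in beta.items():
        if probe in excluded_probes:
            continue  # the whole CpG site is removed from the analysis
        pvals = detection_p[probe]
        # keep a beta value only if its detection p value passes the threshold
        cleaned[probe] = [
            None if (v is None or p > DETECTION_P_MAX) else v
            for v, p in zip(values, pvals)
        ]
    return cleaned


if __name__ == "__main__":
    beta = {"cg_demo1": [0.81, 0.77], "cg_demo2": [0.12, 0.15], "cg_demo3": [0.50, 0.52]}
    det_p = {"cg_demo1": [0.001, 0.05], "cg_demo2": [0.0, 0.002], "cg_demo3": [0.0, 0.0]}
    print(mask_and_filter(beta, det_p, excluded_probes={"cg_demo3"}))
    # {'cg_demo1': [0.81, None], 'cg_demo2': [0.12, 0.15]}
```

Masked signals become missing values rather than removing the sample, which matches the small fraction of missing DNAm data reported in the statistical analysis.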

Plasma glucose levels were measured by general practitioners during the preventive health check-up offered to people older than 35 years in the German healthcare system. Information on whether the participants had fasted overnight was documented on a standardised report sheet together with the measured values. HbA1c levels were measured in EDTA blood at the central laboratory of the University Clinic Heidelberg using standard high-performance liquid chromatography methods.

Statistical analysis

Prevalent diabetes was defined by a physician’s diagnosis, obtained from the preventive health check-up report or from the validation questionnaire at follow-up, and by the use of glucose-lowering drugs. Participants without prevalent diabetes but with an HbA1c concentration ≥6.5% (48 mmol/mol) were considered to have undiagnosed diabetes [15]. Poorly controlled diabetes was defined as an HbA1c concentration ≥7% (53 mmol/mol) in participants with prevalent diabetes [16].
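The case definitions above amount to a small decision rule. The sketch below is a hedged illustration, assuming that either a physician’s diagnosis or the use of glucose-lowering drugs suffices for prevalent diabetes (the text does not spell out how the two criteria combine); the `Participant` fields and the returned label strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

UNDIAGNOSED_HBA1C = 6.5   # % (48 mmol/mol), applied to participants without prevalent diabetes
POOR_CONTROL_HBA1C = 7.0  # % (53 mmol/mol), applied to participants with prevalent diabetes


@dataclass
class Participant:
    physician_diagnosis: bool        # hypothetical field: diagnosis on report or questionnaire
    glucose_lowering_drugs: bool     # hypothetical field: current use of glucose-lowering drugs
    hba1c_percent: Optional[float]   # HbA1c in DCCT %, None if missing


def classify(p: Participant) -> str:
    # Assumption: either criterion alone defines prevalent diabetes.
    prevalent = p.physician_diagnosis or p.glucose_lowering_drugs
    if prevalent:
        if p.hba1c_percent is not None and p.hba1c_percent >= POOR_CONTROL_HBA1C:
            return "poorly controlled diabetes"
        return "controlled diabetes"
    if p.hba1c_percent is not None and p.hba1c_percent >= UNDIAGNOSED_HBA1C:
        return "undiagnosed diabetes"
    return "no diabetes"
```

Both thresholds are inclusive (≥), as in the text; since HbA1c had no missing values in this cohort, the `None` branches only guard the sketch against incomplete input.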

The following potential confounders of the association of DNAm with type 2 diabetes were considered: age, sex, BMI, smoking status (never, former, current) and batch. For CpG sites at which the association of DNAm with type 2 diabetes was significant in both cohorts, restricted cubic spline regression was used to model the shape of the dose–response relationships between whole blood DNAm and fasting glucose or HbA1c concentration, controlling for the same potential confounders (sex, age, BMI, smoking status and batch) [17, 18]. Knots were set at the 5th, 35th, 65th and 95th percentiles [17]. Because DNAm was measured in whole blood and methylation may vary between leucocyte subtypes [19], we additionally adjusted for white blood cell composition using the method proposed by Houseman and colleagues [20–23].
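Knot placement at the 5th, 35th, 65th and 95th percentiles and the resulting spline terms can be sketched as below. This is a hedged plain-Python illustration of the standard truncated-power construction of a restricted cubic spline, not the SAS code used in the study: the function names are hypothetical, the basis is left unnormalised, and the percentile definition of `statistics.quantiles` may differ slightly from the one used in the original analysis. The returned basis columns would enter a regression model alongside the covariates.

```python
import statistics
from typing import List, Sequence


def percentile_knots(x: Sequence[float], percentiles=(5, 35, 65, 95)) -> List[float]:
    # statistics.quantiles(n=100) returns the 1st..99th percentiles
    cuts = statistics.quantiles(x, n=100, method="inclusive")
    return [cuts[p - 1] for p in percentiles]


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Unnormalised restricted cubic spline basis [x, s_1(x), ..., s_{k-2}(x)].

    The fitted curve is cubic between knots and linear below the first
    and above the last knot.
    """
    t = list(knots)
    k = len(t)

    def pos3(u: float) -> float:
        return max(u, 0.0) ** 3  # truncated cubic (u)_+^3

    span = t[k - 1] - t[k - 2]
    basis = [x]
    for j in range(k - 2):
        basis.append(
            pos3(x - t[j])
            - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / span
            + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / span
        )
    return basis
```

With four knots, the basis has one linear and two nonlinear columns; the coefficients on the truncated cubic terms are chosen so that the cubic and quadratic parts cancel beyond the last knot, which is what makes the tails linear.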

No values were missing for age, sex and HbA1c. Fewer than 2% of values were missing for BMI and smoking status. Blood samples were nonfasting in 8% of the participants; these participants were excluded from analyses of fasting glucose. Overall, overnight fasting glucose levels were missing for 13% of participants, and dose–response relationships were calculated for nonmissing values only. Less than 1% of the DNAm data were missing. Although neighbouring CpG sites are often highly correlated, methylation levels can also differ considerably between neighbouring sites, one being hypomethylated and the other hypermethylated. Because multiple imputation relies on functional relationships between variables, which therefore cannot be assumed for neighbouring CpG sites, we instead replaced missing values with the median for each CpG site. The same approach was used for missing BMI values. Participants with missing smoking status were classified as nonsmokers.
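The single-value replacement rules above can be sketched in a few lines. This is a hedged plain-Python illustration, not the SAS code used in the study; the function names are hypothetical, and coding missing smoking status as the category `"never"` is an assumption about how "nonsmoker" maps onto the never/former/current categories.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace None by the median of the observed values of the same variable.

    Applied separately to each CpG site (and to BMI), so no information is
    borrowed from neighbouring CpG sites.
    """
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def impute_smoking(status: List[Optional[str]]) -> List[str]:
    # Missing smoking status is classified as nonsmoker (coded here as "never", an assumption).
    return ["never" if s is None else s for s in status]
```

Applied per site, e.g. `{probe: impute_median(v) for probe, v in beta.items()}`, each CpG’s missing values are filled only from that CpG’s own distribution, which is the rationale the text gives for avoiding multiple imputation.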

In subcohort 1 (discovery cohort), we investigated the association of whole blood DNAm with type 2 diabetes at 361,922 CpG sites. Because DNAm levels quantified by beta values are not normally distributed at most CpG sites, we used nonparametric median regression models in which the dependent variable was the beta value at each of the 361,922 CpG sites and the independent variable was type 2 diabetes at recruitment (yes/no), adjusted for sex, age, BMI, smoking status and batch. We corrected for the 361,922 tests using the Benjamini–Hochberg approach with a false discovery rate (FDR) of 5%. The q values were calculated according to Storey and colleagues [24, 25]. In subcohort 2 (replication cohort), we examined the association of whole blood DNAm with type 2 diabetes at the CpG sites at which the association was significant in subcohort 1, again using median regression adjusted for sex, age, BMI, smoking status and batch and, in an additional model, for leucocyte composition. The conservative Bonferroni method was used to correct for 39 tests (p < 0.0012).
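The two multiple-testing corrections can be sketched as follows. This is a hedged plain-Python illustration of the Benjamini–Hochberg step-up procedure and the Bonferroni threshold, not the SAS PROC MULTTEST call used in the study. Note that Storey’s q values additionally estimate the proportion of true null hypotheses and are not implemented here. The Bonferroni threshold 0.05/39 ≈ 0.00128 is slightly above the p < 0.0012 used in the text, which is therefore a marginally stricter rounding.

```python
from typing import List, Sequence


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini–Hochberg adjusted p values (step-up), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_significant(pvals: Sequence[float], fdr: float = 0.05) -> List[bool]:
    # A test is declared significant if its adjusted p value is at most the FDR.
    return [q <= fdr for q in bh_adjust(pvals)]


# Bonferroni threshold for the 39 CpG sites carried forward to replication.
BONFERRONI_ALPHA = 0.05 / 39
```

In the discovery analysis, `bh_significant` would be applied to the 361,922 site-wise p values; the Bonferroni threshold then applies only to the much smaller replication set, which is why a conservative correction is affordable there.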

Medians and interquartile ranges (continuous data) or numbers and percentages (categorical data) were calculated to describe population characteristics. Differences between diabetic and nondiabetic individuals were assessed using the Mann–Whitney U (Wilcoxon) test for continuous variables and the χ² test for categorical variables. A two-sided p value <0.05 was considered significant. Data were analysed with the SAS software package (Version 9.2 and Enterprise Guide 4.2; SAS Institute, Cary, NC, USA) using the procedures PROC CORR, PROC FREQ, PROC UNIVARIATE, PROC NPAR1WAY, PROC QUANTREG, PROC MULTTEST and PROC GPLOT.
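The Mann–Whitney U test used for continuous characteristics can be sketched as below. This is a hedged plain-Python illustration using average ranks for ties and the normal approximation with a tie-corrected variance; it omits a continuity correction, which SAS PROC NPAR1WAY may apply by default, so p values can differ slightly from the published ones. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])  # 0 = group x, 1 = group y
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p
```

Because the test works on ranks, it is suitable for skewed variables such as HbA1c and fasting glucose, consistent with the use of medians and interquartile ranges for description.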

Results

In the discovery cohort, type 2 diabetes was prevalent in 153 of 988 individuals (15%), 2% of the participants had undiagnosed type 2 diabetes (n = 23 with HbA1c ≥6.5% [48 mmol/mol]) and 45% (n = 69/153) of the diabetic patients had poorly controlled diabetes (HbA1c ≥7% [53 mmol/mol]). In the replication cohort, the prevalence of diabetes was 16% (n = 87/527), 4% of the participants had undiagnosed diabetes (n = 24) and 45% of diabetic patients had poorly controlled diabetes (n = 39/87). Characteristics of the study population are presented in Table 1. Diabetic patients were on average older, had higher levels of HbA1c and fasting glucose, and higher BMI. Smoking behaviour was not significantly different between individuals with and without diagnosed type 2 diabetes.

Table 1 Characteristics of the study population in the discovery (A) and replication (B) cohorts stratified by prevalent diabetes

In the discovery cohort, we analysed the association of whole blood DNAm at 361,922 CpG sites with prevalent type 2 diabetes at recruitment. The p values from the site-by-site median regression models in the discovery cohort are shown in the Manhattan plot in Fig. 1. After correction for multiple testing using the Benjamini–Hochberg approach with an FDR of 5%, we found 39 CpG sites at which DNAm was significantly associated with type 2 diabetes. The regression coefficients and p values for these 39 type 2 diabetes-associated CpG sites are shown in Table 2. Compared with individuals free of type 2 diabetes, diabetic patients had lower median methylation at 20 CpGs and higher median methylation at 19 CpGs.

Fig. 1

Manhattan plot of p values for the association of DNAm with prevalent diabetes at 361,922 CpG sites in the discovery cohort of 988 older German adults

Table 2 Regression coefficients, p values and q values in the discovery and replication cohorts for the 39 CpG sites at which DNAm was significantly associated with prevalent type 2 diabetes in the discovery cohort

We examined the association of DNAm with prevalent type 2 diabetes at these 39 CpGs in the replication cohort. After Bonferroni correction for 39 tests, DNAm was significantly associated with type 2 diabetes at cg19693031, located within the 3′-untranslated region (3′-UTR) of the gene TXNIP (Table 2). After additional adjustment for leucocyte composition, the estimates and p values changed only marginally, and the association of DNAm with type 2 diabetes at cg19693031 remained significant (Table 2).

Dose–response relationships between DNAm levels at cg19693031 and fasting glucose or HbA1c concentrations, based on restricted cubic spline models adjusted for sex, BMI, age, smoking status, leucocyte composition and batch, are shown in Fig. 2. DNAm levels decreased with increasing glucose and HbA1c concentrations in both subcohorts. Because 45% of the diabetic patients had poorly controlled diabetes and DNAm levels decreased with increasing HbA1c, we used median regression models adjusted for sex, BMI, age, smoking status, leucocyte composition and batch to estimate the difference in DNAm at cg19693031 between patients with controlled or poorly controlled type 2 diabetes and individuals free of diagnosed type 2 diabetes (Table 3). In both cohorts, DNAm was about 5% lower in patients with poorly controlled type 2 diabetes than in individuals free of diagnosed type 2 diabetes.
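To show what a median regression coefficient for a categorical diabetes status represents, the sketch below covers the simplest, unadjusted case. With only group indicators in the model, the least-absolute-deviation fit separates by group, so each coefficient equals that group’s median minus the reference group’s median. This is a hedged illustration only: the models in Table 3 were additionally adjusted for sex, BMI, age, smoking status, leucocyte composition and batch, which requires a linear-programming solver such as the one behind SAS PROC QUANTREG. The function name and group labels are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def group_median_differences(
    beta: Sequence[float], group: Sequence[str], reference: str
) -> Dict[str, float]:
    """Coefficients of an unadjusted median regression on a categorical group.

    Each coefficient is the group median minus the reference-group median.
    For groups of even size, statistics.median averages the two middle values,
    which is one of several equally valid minimisers in that case.
    """
    by_group: Dict[str, List[float]] = {}
    for b, g in zip(beta, group):
        by_group.setdefault(g, []).append(b)
    ref_median = statistics.median(by_group[reference])
    return {
        g: statistics.median(values) - ref_median
        for g, values in by_group.items()
        if g != reference
    }
```

Covariate adjustment changes these coefficients into conditional median differences, which is why the published estimates need not equal simple differences in group medians.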

Fig. 2

Dose–response relationships between DNAm at cg19693031 (mapping to the gene TXNIP) and fasting glucose and HbA1c concentrations. Dose–response relationship between DNAm and fasting glucose concentration where the reference value for fasting glucose is (a) a median value of 5.2 mmol/l in the discovery cohort (n = 832) and (b) a median value of 5.1 mmol/l in the replication cohort (n = 451). Dose–response relationship between DNAm and HbA1c concentration where the reference value for HbA1c is (c) a median value of 5.6% (38 mmol/mol) in the discovery cohort and (d) a median value of 5.7% (39 mmol/mol) in the replication cohort. Solid line, estimate; grey dashed lines, confidence interval limits; points, knots. Ref., reference value. To convert values for HbA1c in DCCT % into mmol/mol, subtract 2.15 and multiply by 10.929
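The HbA1c conversion given in the Fig. 2 caption can be written as a one-line function. This is a minimal sketch of that stated formula, rounding to the nearest whole mmol/mol as the paper reports values; the function name is hypothetical.

```python
def dcct_to_ifcc(hba1c_percent: float) -> int:
    """Convert HbA1c from DCCT % to IFCC mmol/mol: (% - 2.15) * 10.929, rounded."""
    return round((hba1c_percent - 2.15) * 10.929)


if __name__ == "__main__":
    # Thresholds and reference values used in this paper
    for value in (5.6, 5.7, 6.5, 7.0):
        print(value, dcct_to_ifcc(value))
```

Applied to the values in the text, this reproduces the paired units quoted throughout: 5.6% → 38, 5.7% → 39, 6.5% → 48 and 7% → 53 mmol/mol.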

Table 3 Estimated change in DNAm at cg19693031 in patients with controlled and poorly controlled type 2 diabetes in comparison with individuals free of diagnosed type 2 diabetes

Discussion

In two independent subcohorts drawn from a population-based cohort of older German adults, we identified and replicated an association of type 2 diabetes with methylation within the 3′-UTR of TXNIP (cg19693031). Methylation at this CpG site decreased significantly with increasing fasting glucose and HbA1c concentrations. In patients with poorly controlled diabetes, DNAm was about 5% lower than in individuals free of diagnosed type 2 diabetes.

Thioredoxin-interacting protein (TXNIP), also known as vitamin D-upregulated protein or thioredoxin-binding protein 2, has been linked to diabetes in multiple previous studies [26–30]. It plays a particularly critical role in pancreatic beta cell biology and glucose homeostasis [26–30]. TXNIP is upregulated in hyperglycaemic animals and in human adipose tissue treated with 25 mmol/l glucose [29], and it regulates insulin-dependent and insulin-independent pathways of glucose uptake in human skeletal muscle [30].

The CpG site cg19693031 is located within the 3′-UTR, a region containing regulatory elements that influence gene expression post-transcriptionally [31, 32]. It therefore appears biologically plausible that methylation within the 3′-UTR of TXNIP might be involved in the regulation of TXNIP expression and might play a role in the defective glucose homeostasis that precedes type 2 diabetes. That we discovered and replicated an association of DNAm with type 2 diabetes at only one CpG site (cg19693031) is also plausible: of the 19 CpGs within the TXNIP gene region measured with the Illumina Infinium HumanMethylation450 BeadChip, cg19693031 is the only one located within the 3′-UTR (one CpG is located in the gene body and 17 in the promoter region).

An epigenome-wide study using the Illumina Infinium HumanMethylation450 BeadChip reported an association of DNAm in CD4+ T cells with insulin and HOMA-IR in individuals free of type 2 diabetes at two CpG sites, cg01881899 and cg06500161, located within the gene ABCG1 [7]. In our study in whole blood, the p value for the association of DNAm with type 2 diabetes at cg01881899 was 0.36. The CpG site cg06500161 was not analysed because we excluded all probes containing SNPs according to the annotation files, as the presence of SNPs in the target CpG can bias probe signals on the Illumina Infinium HumanMethylation450 BeadChip [33]. However, these apparent differences should be interpreted with caution because the study designs differed considerably. While we were finalising this manuscript, two studies were published that found an association of DNAm within the gene region of TXNIP with type 2 diabetes in whole blood [34, 35]. Kulkarni et al reported an association of DNAm at cg19693031 with prevalent type 2 diabetes, fasting blood glucose and insulin resistance in 850 pedigreed Mexican American individuals using the 450k BeadChip, which was validated by pyrosequencing [34]. Chambers et al performed epigenome-wide association analyses using blood samples from Indian Asian individuals with incident type 2 diabetes and age- and sex-matched Indian Asian controls, followed by replication of top-ranking signals in Europeans, and observed an association of DNAm at five loci, including TXNIP, with incident type 2 diabetes [35]. In addition, TXNIP was identified as one of seven loci with methylome–metabotype associations that were independent of genetic variation and potentially driven by common environmental and lifestyle-dependent factors [36].

A genome-wide study in skeletal muscle tissue from 28 men with or without a family history of type 2 diabetes and from nine monozygotic twin pairs discordant for type 2 diabetes observed differential DNAm at 26 CpG sites [37]. Differential non-CpG methylation of the PGC-1α (also known as PPARGC1A) promoter had already been found in skeletal muscle tissue of nondiabetic and diabetic individuals in 2009 [38]. In human pancreatic islets from 15 type 2 diabetic and 34 nondiabetic donors, 1,649 CpGs with differential DNAm were identified after FDR correction for multiple testing [6]. In human adipose tissue from 28 type 2 diabetic donors and 28 age- and sex-matched controls, 15,627 CpG sites with differential DNAm were identified, a large number reflecting the permissive FDR threshold of 15% [8]. In adipose tissue of 96 nondiabetic men, HbA1c levels correlated significantly with DNAm at 711 CpG sites after correction for multiple testing (FDR of 5%), but these findings were not replicated in the female validation cohort of 94 nondiabetic women [9]. Apart from the use of different tissues, the discrepancies between these results may be explained by differences in study design. In the current study, we tested the CpG sites identified in the discovery cohort for replication in an independent sample, and we additionally adjusted for smoking behaviour and BMI, as both factors have previously been associated with DNAm [39–41].

Our study has specific strengths and limitations. Strengths include the large study population of more than 1,500 individuals, the replication of significant associations in an independent sample and the adjustment of effect estimates for potential confounders, including smoking behaviour and BMI. Besides the limitations resulting from the use of the Illumina Infinium HumanMethylation450 BeadChip (e.g. selection bias because the available probes were chosen by the manufacturer and a consortium of experts), a potential limitation is the measurement of DNAm in whole blood rather than in specific cell types. To account for differences in leucocyte subtype composition, we used the method introduced by Houseman and colleagues [20–23].

In summary, in this large cohort of older adults from Germany, we found a novel association of type 2 diabetes with DNAm within the gene region of TXNIP, which others confirmed while we were finalising this manuscript. Given that overexpression of TXNIP has been reported in diabetic animals and humans, and that the regulatory role of 3′-UTRs in gene expression is well known, it appears biologically plausible that methylation at cg19693031 might play a role in the pathophysiology of type 2 diabetes.