Epigenome-wide association study of incident type 2 diabetes: a meta-analysis of five prospective European cohorts

Aims/hypothesis Type 2 diabetes is a complex metabolic disease with increasing prevalence worldwide. Improving the prediction of incident type 2 diabetes using epigenetic markers could help tailor prevention efforts to those at the highest risk. The aim of this study was to identify predictive methylation markers for incident type 2 diabetes by combining epigenome-wide association study (EWAS) results from five prospective European cohorts.
Methods We conducted a meta-analysis of EWASs in blood collected 7–10 years prior to type 2 diabetes diagnosis. DNA methylation was measured with Illumina Infinium Methylation arrays. A total of 1250 cases and 1950 controls from five longitudinal cohorts were included: Doetinchem, ESTHER, KORA1, KORA2 and EPIC-Norfolk. Associations between DNA methylation and incident type 2 diabetes were examined using robust linear regression with adjustment for potential confounders. Inverse-variance fixed-effects meta-analysis of cohort-level individual CpG EWAS estimates was performed using METAL. The methylGSA R package was used for gene set enrichment analysis. Confirmation of genome-wide significant CpG sites was performed in a cohort of Indian Asians (LOLIPOP, UK).
Results The meta-analysis identified 76 CpG sites that were differentially methylated in individuals with incident type 2 diabetes compared with control individuals (p values <1.1 × 10^−7). Sixty-four out of 76 (84.2%) CpG sites were confirmed by directionally consistent effects and p values <0.05 in an independent cohort of Indian Asians. However, on adjustment for baseline BMI only four CpG sites remained genome-wide significant, and addition of the 76 CpG methylation risk score to a prediction model including established predictors of type 2 diabetes (age, sex, BMI and HbA1c) showed no improvement (AUC 0.757 vs 0.753). Gene set enrichment analysis of the full epigenome-wide results clearly showed enrichment of processes linked to insulin signalling, lipid homeostasis and inflammation.
Conclusions/interpretation By combining results from five European cohorts, and thus significantly increasing study sample size, we identified 76 CpG sites associated with incident type 2 diabetes. Replication of 64 CpGs in an independent cohort of Indian Asians suggests that the association between DNA methylation levels and incident type 2 diabetes is robust and independent of ethnicity. Our data also indicate that BMI partly explains the association between DNA methylation and incident type 2 diabetes. Further studies are required to elucidate the underlying biological mechanisms and to determine potential causal roles of the differentially methylated CpG sites in type 2 diabetes development.
Supplementary Information The online version contains peer-reviewed but unedited supplementary material available at 10.1007/s00125-022-05652-2.


General description of the cohorts and definitions of type 2 diabetes cases and controls
The Doetinchem Cohort Study is an ongoing, prospective, population-based study from Doetinchem, the Netherlands [1]. In brief, a random sample of women and men aged 20 to 59 years from the general population was selected for the first measurement round in 1987-1991. Adults who participated in the first round were invited for follow-up examinations in 1993-1997 (round 2, n=6117, mean age: 46 years), 1998-2002 (round 3, n=4918, mean age: 51 years), 2003-2007 (round 4, n=4520, mean age: 56 years), 2008-2012 (round 5, n=4018, mean age: 60 years), and 2013-2017 (round 6, n=3438, mean age: 64 years). Response rates were 75% or higher in all rounds. All participants provided informed consent and the study was approved by the Medical Ethics Committee of the University Medical Center Utrecht.
The ESTHER study is an ongoing population-based cohort study conducted in the federal state of Saarland, Germany [2]. In brief, 9,949 older adults (50-75 years) were recruited by their general practitioners (GPs) during routine health check-ups (offered every two years to people older than 35 years in the German healthcare system) between 2000 and 2002, and followed up thereafter. During the baseline enrolment, epidemiological data were collected via a standardized self-administered questionnaire completed by participants and via additional reports from participants' GPs, and biological samples were obtained. Three subsets of ESTHER participants were selected for DNA methylation assessment in the baseline blood samples: subset I consists of 1,000 participants consecutively enrolled during the first 3 months of recruitment; subset II consists of 864 participants selected for a case-cohort design for mortality analysis [3]; and subset III, consisting of 471 participants, was primarily selected to address cancer-related methylation signatures [4]. All participants provided informed consent and the study was approved by the ethics committees of the University of Heidelberg and of the Medical Association of Saarland.
KORA (Cooperative Health Research in the Region of Augsburg) is a population-based cohort study conducted in Southern Germany [5]. In brief, the baseline surveys S3 and S4 were conducted in 1994/1995 and 1999-2001, respectively, and comprised independent samples of 4856 and 4261 subjects aged 25 to 74 years. Both cohorts were reinvestigated in the follow-up examinations F3 and F4 in 2004/2005 and 2006-2008, respectively, with 2974 and 3080 participants. Finally, there was another follow-up of F4, named FF4, in 2013/14 with 2279 participants. Two independent sub-cohorts from KORA were selected for EWAS analyses, designated as KORA1 (including KORA F4 and its FF4 follow-up study) and KORA2 (including KORA S3 and S4 and their F3 and F4 follow-up studies). Anthropometric variables and clinical parameters were determined at all examinations. All participants provided informed consent and the study was approved by the ethics committee of the Bavarian Medical Association.
EPIC-Norfolk is a prospective cohort study that recruited 25,639 individuals aged between 40 and 79 years at baseline in 1993-1997 [5]. The cohort was representative of the general population of England and Wales but differed in that 99.7% of the cohort were of European descent. Follow-up was censored at the date of diagnosis of T2D, 31 July 2006, or the date of death, whichever came first. All participants provided informed consent and the study was approved by the local research ethics committee.
[...] study of incident T2D have been described previously [6]. The LOLIPOP study is approved by the National Research Ethics Service (07/H0712/150), and all participants gave written informed consent at enrolment.

DNA methylation quality control and normalization
In the Doetinchem Cohort Study, DNA was isolated from whole blood samples with the commonly used salting out method [7]. Next, 500 ng of genomic DNA of each sample was bisulfite converted using the EZ DNA Methylation kit (Zymo Research, Irvine, California, USA) and hybridized to Illumina Infinium Methylation EPIC arrays according to the manufacturer's protocols. The original IDAT files were generated by the Illumina iScan BeadChip scanner. Data were generated by the Genome Analysis Facility of the UMCG, the Netherlands (www.rug.nl/research/genetics/genomeanalysisfacility/). Preprocessing and quality control of the methylation data were performed according to the minfi tutorial [8]. Probes containing a SNP in the sequence, sex probes and probes with a detection p-value >0.001 in more than 5% of samples were removed. Subsequently, quantile normalization was applied using limma [9]. Adjustment for batch effects was done using ComBat [10,11]. The Illumina Infinium Methylation EPIC array includes approximately 90% of the probes from the Infinium HumanMethylation450K array. We used the HumanMethylation450 v1.2 Manifest File [12] to select the 450K probes from the EPIC array, which resulted in 424,748 probes to be used for this particular study.
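The detection p-value filter described above can be illustrated with a minimal Python sketch. This is not the minfi-based code used in the study; the function name `failed_probes`, the dictionary input layout and the probe IDs are assumptions made for illustration, and only the two thresholds (0.001 and 5%) come from the text.

```python
def failed_probes(detection_p, p_cutoff=0.001, max_fail_fraction=0.05):
    """Return IDs of probes that fail detection in too many samples.

    detection_p: dict mapping probe ID -> list of detection p-values,
    one per sample (hypothetical layout, chosen for illustration).
    A probe is flagged when its detection p-value exceeds p_cutoff in
    more than max_fail_fraction of the samples.
    """
    failed = []
    for probe_id, p_values in detection_p.items():
        n_fail = sum(1 for p in p_values if p > p_cutoff)
        if n_fail / len(p_values) > max_fail_fraction:
            failed.append(probe_id)
    return failed


# Hypothetical example with 20 samples per probe.
example = {
    "cg_good": [1e-6] * 20,               # detected in every sample
    "cg_bad": [1e-6] * 17 + [0.05] * 3,   # fails in 3 of 20 samples (15%)
}
print(failed_probes(example))
```

Probes returned by the function would be dropped before normalization; the SNP and sex-chromosome filters mentioned in the text need annotation data and are not covered here.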
In ESTHER, DNA methylation in whole blood was quantified using the Infinium HumanMethylation450K BeadChip (Illumina, Inc., San Diego, CA, USA). In brief, 1.5 mg DNA (allocated in 96-well format with three random duplicate samples in each format as quality controls) was bisulfite converted, and 200 ng bisulfite-treated DNA was applied to the 450K BeadChips following the manufacturer's instructions. Raw data pre-processing and initial quality control were carried out following the CPACOR pipeline [13]. Probes with a detection p-value >0.01 were removed before quantile normalization, which was applied after stratification of the probes into 6 categories according to probe type and color channel, using the R package limma [9]. The sample call rate and CpG call rate thresholds were both 95%. A principal component analysis (PCA) was performed for the positive control probes, and the first 30 control probe principal components (PCs) were included in the regression model for batch correction.
In the KORA1 and KORA2 studies, methylation was quantified in bisulfite converted genomic DNA from whole blood, using the Illumina Infinium HumanMethylation450 array in all samples. Quality control was performed using the minfi package [8] and included removal of probes with a detection p-value higher than 0.01; the sample and marker call rate thresholds were set at 95%. Quantile normalization of intensity values separated into 6 categories was applied. A principal component analysis (PCA) was performed for the positive control probes, and the first 30 control probe principal components (PCs) were included in the regression model for batch correction.
In EPIC-Norfolk, epigenome-wide DNA methylation data were analyzed in R (version 3.2.2). Methylation intensity values were corrected using the Illumina background correction algorithm as implemented in minfi [8]; methylation intensities with a detection p-value ≥ 0.01 were set to "missing", and methylation intensity beta values were calculated for each methylation marker per sample. Details of the quality control were previously described [14].
In LOLIPOP, DNA methylation data were analyzed in R (version 2.15) using minfi [8] and other R scripts. Marker intensities were normalized by quantile normalization. Details of the quality control were previously described [6].
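For readers unfamiliar with quantile normalization, the following is a minimal Python sketch of the basic idea: every sample is forced onto a common reference distribution, taken as the mean of the sorted values across samples. It is a simplified illustration, not the limma or minfi implementation used by the cohorts. It assumes complete data (no missing values), breaks ties by position, and does not reproduce the stratification into 6 probe categories; the function name and input layout are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists.

    columns: list of samples, each a list of intensities for the same
    markers in the same order (hypothetical layout for illustration).
    Returns a new list of samples sharing one reference distribution.
    """
    n_markers = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n_markers)
    ]
    normalized = []
    for col in columns:
        # Marker indices ordered from smallest to largest intensity.
        order = sorted(range(n_markers), key=lambda i: col[i])
        out = [0.0] * n_markers
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical samples measured on three markers.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization each sample contains the same set of values, so between-sample differences in overall intensity distribution are removed while the within-sample ranking of markers is preserved.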
Since extreme outlying values in DNA methylation data (< 25th percentile − 3 × IQR or > 75th percentile + 3 × IQR, where IQR is the interquartile range) may have a large impact on EWAS results, probes (i.e., CpGs) were removed prior to analysis if such outliers were identified in more than 20% of samples. All other extreme outliers identified in this way were set to "missing" [15]. The leukocyte composition (CD4 and CD8 T-cell subtypes, natural killer cells, monocytes, granulocytes, and B cells) of the samples was estimated using Houseman's algorithm [11]. The resulting cell count estimates and cohort-specific DNA methylation batches were then used as covariates in all regression models for incident type 2 diabetes.
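The outlier rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code distributed to the cohorts: the function names, the dictionary layout and the use of `None` for "missing" are hypothetical, and the quartile definition (`method="inclusive"` in the standard library) is an assumption that may differ slightly from the quantile type used in the original analysis.

```python
import statistics


def trim_outliers(values, k=3.0):
    """Set values outside [Q1 - k*IQR, Q3 + k*IQR] to None.

    Returns the trimmed list and the fraction of observed values
    that were flagged as outliers.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    trimmed = [
        v if v is not None and low <= v <= high else None for v in values
    ]
    n_outliers = sum(1 for v in observed if v < low or v > high)
    return trimmed, n_outliers / len(observed)


def trim_probes(betas, k=3.0, max_outlier_fraction=0.20):
    """Apply the trimming rule per probe.

    betas: dict mapping probe ID -> list of beta values, one per sample.
    Probes with outliers in more than max_outlier_fraction of samples
    are removed; in all other probes the outliers are set to None.
    """
    kept, removed = {}, []
    for probe_id, values in betas.items():
        trimmed, fraction = trim_outliers(values, k)
        if fraction > max_outlier_fraction:
            removed.append(probe_id)
        else:
            kept[probe_id] = trimmed
    return kept, removed
```

With this layout, `kept` holds the probes that enter the regression models (with isolated outliers masked) and `removed` lists the probes excluded by the 20% rule.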

Analysis plan for all cohorts

DNA methylation quality control and normalization
Pre-processing may differ per cohort, and should be done before the EWAS. Specific details regarding pre-processing, excluded samples and probes, method of normalization and a short description of the cohort itself (cohort design, inclusion and exclusion criteria, etc.) should be included in the Study_Summary Excel file and uploaded together with other results.
Normalized beta-values will be used as the outcome variable. Please use the trimming method to remove outliers, as proposed by the Pregnancy And Childhood Epigenetics (PACE) consortium (see code below). Remove probes if outliers were detected in >20% of samples (see code below).

Prediction models in Doetinchem Cohort Study
To calculate methylation risk scores (MRS) based on different p-value thresholds, we first performed a leave-one-out meta-analysis without the Doetinchem Cohort Study. We then used the beta coefficients from this meta-analysis as weights in calculating the MRS. We included 4 p-value thresholds to investigate the predictive ability of CpG sites identified at less stringent p-values (1 × 10^−7, 1 × 10^−6, 1 × 10^−5 and 1 × 10^−4).
Table 4. Attenuation of effect sizes (% change) between different models in discovery meta-EWAS and between discovery and replication for 76 significant CpG sites associated with incident T2D.

[Table columns: Illumina ID, Gene name]
[...] sites for incident T2D in the Doetinchem cohort.
ESM Figure 6. Predictive ability of methylation risk score based on CpG sites at 4 increasingly lenient p-value thresholds for incident T2D in the Doetinchem cohort.
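The leave-one-out weighting and the methylation risk score described in this section can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the METAL-based workflow used in the study: the function names, the nested-dictionary layout of cohort results, and the normal-approximation p-value are hypothetical choices, and the score is assumed to be a simple weighted sum of methylation values.

```python
import math


def fixed_effects_meta(estimates):
    """Inverse-variance fixed-effects pooling of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se


def two_sided_p(beta, se):
    """Two-sided p-value from a normal approximation to beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def leave_one_out_weights(cohort_results, held_out, p_threshold):
    """Pool all cohorts except held_out and keep CpGs with p < p_threshold.

    cohort_results: {cohort name: {CpG ID: (beta, se)}} (hypothetical layout).
    Returns {CpG ID: pooled beta}, to be used as score weights.
    """
    others = [res for name, res in cohort_results.items() if name != held_out]
    shared = set.intersection(*(set(res) for res in others))
    weights = {}
    for cpg in sorted(shared):
        beta, se = fixed_effects_meta([res[cpg] for res in others])
        if two_sided_p(beta, se) < p_threshold:
            weights[cpg] = beta
    return weights


def methylation_risk_score(sample_betas, weights):
    """Weighted sum of one sample's methylation values over selected CpGs."""
    return sum(w * sample_betas[cpg] for cpg, w in weights.items())
```

Under these assumptions, calling `leave_one_out_weights` once per threshold with the Doetinchem results held out yields one set of weights per threshold, and `methylation_risk_score` is then evaluated for each Doetinchem participant. Keeping the held-out cohort out of the weight estimation avoids evaluating the score in the same data that produced its weights.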