Study population
The design and methods of the InterAct study, nested within the EPIC cohorts, are described in detail elsewhere [29]. All participants provided written informed consent, and the study was approved by the local ethics committees in the participating countries and the Internal Review Board of the International Agency for Research on Cancer. Briefly, the study population included participants from 26 centres in eight of the ten countries participating in EPIC who had available blood samples and information on diabetes (France, Italy, Spain, the UK, the Netherlands, Germany, Denmark and Sweden). All ascertained and verified incident type 2 diabetes cases between 1991 and 2007 (3.99 million person-years at risk, n = 12,403) were included in the case group. A centre-stratified, representative subcohort of 16,835 individuals was selected as the control group. Prevalent diabetes cases (n = 548) and individuals with uncertain diabetes status (n = 133) were excluded from the subcohort, leaving 16,154 individuals for analysis. Because the subcohort was selected at random, 778 incident type 2 diabetes cases were also part of the subcohort, leaving a total study sample of 27,779 individuals.
For the current analysis, we excluded participants with abnormal estimated energy intake (top 1% and bottom 1% of the distribution of the ratio of reported energy intake over estimated energy requirements, assessed by the basal metabolic rate) (n = 619) and those with missing information on dietary intake (n = 117). Participants with no data on any of the SNPs of interest (including all Danish participants) (n = 7617) were also excluded. Because the models in our interaction analysis were adjusted for demographic and lifestyle factors, participants with no information on these key covariates were also excluded. Therefore, 18,638 individuals were included in the analysis, comprising 8086 cases and 11,035 subcohort participants (including 483 cases in the subcohort). For some of these participants, information on specific SNPs was missing; thus, the sample size for the analysis of each SNP differs slightly.
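The sample sizes reported in this section follow from the case-cohort design, in which incident cases that happen to fall within the random subcohort are counted only once. The sketch below simply reproduces that arithmetic from the published counts; the function and variable names are illustrative and are not part of the study's analysis code.

```python
def case_cohort_total(n_cases: int, n_subcohort: int, n_overlap: int) -> int:
    """Number of unique participants in a case-cohort sample.

    Cases that were also drawn into the subcohort would otherwise be
    counted twice, so the overlap is subtracted once.
    """
    return n_cases + n_subcohort - n_overlap


# Subcohort after removing prevalent cases and uncertain diabetes status
subcohort_eligible = 16_835 - 548 - 133

# Full InterAct sample and the sample used in the current analysis
full_sample = case_cohort_total(12_403, subcohort_eligible, 778)
analysis_sample = case_cohort_total(8_086, 11_035, 483)

print(subcohort_eligible, full_sample, analysis_sample)
# prints: 16154 27779 18638
```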
Case ascertainment
Ascertaining incident type 2 diabetes involved a review of the existing EPIC datasets at each centre using multiple sources of evidence including self-report, linkage to disease and drug registers, hospital admissions and mortality data. Information from any follow-up visit or external evidence with a date later than the baseline visit was used. To increase the specificity of the definition for these cases, we sought further evidence including individual medical records review in some centres. Follow-up was censored at the date of diagnosis, 31 December 2007 or the date of death, whichever occurred first. In total, 12,403 verified incident type 2 diabetes cases were identified.
Choice of dietary factors and dietary assessment
After a detailed literature review, we identified four dietary factors (whey-containing dairy, olive oil, coffee and cereal fibre) for which there was evidence for increasing postprandial incretin levels [13–15]. Self- or interviewer-administered country-specific validated dietary questionnaires and/or diet records (Sweden only) were used to assess the usual food intake of participants. Nutrient intake was estimated using the standardised EPIC nutrient database (ENDB) [30]. Whey-containing dairy was calculated by subtracting the intake of cheese from the total intake of dairy products including milk, yoghurt, milk-based puddings and cream desserts. Olive oil consumption was reported on its own or as an ingredient in recipes, depending on the country. In the Swedish study centre Umeå, olive oil was not assessed in the questionnaire, and in the UK, it was only reported as an ingredient in recipes. Therefore, these centres had to be excluded from the olive oil interaction analysis. Coffee intake included both caffeinated and decaffeinated coffee. Cereal fibre was derived from the ENDB by adding the fibre content of cereal-based products, including bread, rice, wheat-based pasta, crisp bread, rusks and breakfast cereals [31].
Covariate assessment
Questionnaires were used to collect information on lifestyle factors and socioeconomic status at baseline [32]. For the current analysis, we used a four-category physical activity index reflecting occupational and recreational physical activity [33]. Educational attainment was categorised as: none; primary school; technical school; secondary school; and further education including university degree. Smoking status was categorised as: never; former; and current smoker. Alcohol consumption was categorised as: 0 g/day; >0–6 g/day; >6–12 g/day; >12–24 g/day; and >24 g/day. Total energy intake was assessed as kcal/day (converted to MJ/day), and intake of specific foods and nutrients of interest as g/day. Anthropometric measures including weight, height and waist circumference were collected at baseline by standardised procedures and adjusted for clothing [34]. Information on prevalent diseases, including stroke, myocardial infarction, hypertension and hyperlipidaemia, was obtained at baseline.
DNA extraction, genotyping and SNP selection
DNA was extracted from up to 1 ml of buffy coat for each individual from a citrated blood sample. A detailed account of the DNA extraction and genotyping procedures has been published previously [35]. Briefly, a total of 10,027 participants (4644 cases) were randomly selected across all centres (except Denmark) for genome-wide genotyping using the Illumina 660W-Quad BeadChip (Illumina, San Diego, CA, USA). In addition, 9794 EPIC-InterAct participants with available DNA who were not selected for genome-wide measurement were genotyped using the Illumina Cardio-Metabochip (Illumina, San Diego, CA, USA) [35].
Seven SNPs (GIPR: rs10423928; TCF7L2: rs7903146 and rs12255372; WFS1: rs10010131; KCNQ1: rs151290, rs2237892 and rs163184) associated with type 2 diabetes in GWAS [36] were selected based on evidence from small-scale experimental studies that had revealed major roles in the regulation of the release and functioning of incretin hormones [2, 3, 7, 8]. As TCF7L2 has several point mutations implicated in type 2 diabetes, we chose to use the ones for which there is evidence of involvement in the incretin system, based on small-scale experimental studies in humans. More specifically, two independent studies have collectively shown that TCF7L2 rs7903146, rs7901695 and rs12255372 influence the second phase of GLP-1-induced insulin secretion [7, 8]. Since TCF7L2 rs7903146 and TCF7L2 rs7901695 are in strong linkage disequilibrium (LD) (r2 = 0.98, D’ = 0.88), it is likely that the two SNPs capture the same genetic information; therefore, of the two, we included only rs7903146 in our analysis. The KCNQ1 SNPs included in our analysis were in low LD (r2 < 0.25).
Four of the aforementioned seven SNPs (GIPR rs10423928, TCF7L2 rs7903146, WFS1 rs10010131 and KCNQ1 rs163184) were available from direct genotyping within the entire EPIC-InterAct study on Sequenom or Taqman platforms. Information on two of the remaining SNPs (TCF7L2 rs12255372 and KCNQ1 rs2237892) was available from genotyping with the GWAS Illumina 660W-Quad Chip and Illumina Cardio-Metabochip. The SNP KCNQ1 rs151290 was not available on any of the genotyping arrays, so KCNQ1 rs163171 was chosen as a proxy SNP (r2 = 0.94, D’ = 1.0 in the CEU [Utah Residents with Northern and Western European Ancestry] population of the 1000 Genomes phase 3 dataset, assessed via www.ensembl.org, accessed 21 January 2015). No significant deviation from Hardy–Weinberg equilibrium was observed (p > 0.01).
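For readers unfamiliar with the Hardy–Weinberg check, the following is a minimal sketch of a one-degree-of-freedom chi-square test on genotype counts. The text does not state which test was used (a chi-square or an exact test), so the choice of the chi-square test, the function name and the example counts are all assumptions made for illustration.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    Returns the test statistic and its p value. Assumes a biallelic SNP
    with both alleles observed in the sample.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, not taken from the study
chi2, p_value = hwe_chi_square(250, 500, 250)
deviates = p_value <= 0.01  # threshold reported in the text
```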
An unweighted genetic risk score was constructed based on all unlinked genetic variants (TCF7L2 rs7903146, KCNQ1 rs163184, KCNQ1 rs2237892, GIPR rs10423928 and WFS1 rs10010131) that were significantly related to type 2 diabetes within the study population. Minor allele frequencies for all SNPs can be found in the electronic supplementary material (ESM) Table 1.
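As an illustration, an unweighted score of this kind can be computed by summing allele counts across the five variants, which gives the 0–10 range. The sketch below assumes that the counted allele at each SNP is the diabetes risk-increasing allele and that participants with a missing genotype are not scored; neither detail is specified in the text, and the function name is hypothetical.

```python
GRS_SNPS = (
    "rs7903146",   # TCF7L2
    "rs163184",    # KCNQ1
    "rs2237892",   # KCNQ1
    "rs10423928",  # GIPR
    "rs10010131",  # WFS1
)


def unweighted_grs(risk_allele_counts: dict) -> int:
    """Sum of risk-allele counts (0, 1 or 2 per SNP) across the score SNPs.

    Every SNP contributes equally, which is what makes the score
    unweighted. A missing SNP raises KeyError (assumed handling).
    """
    total = 0
    for snp in GRS_SNPS:
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
        total += count
    return total


# Hypothetical participant
example = {
    "rs7903146": 1,
    "rs163184": 2,
    "rs2237892": 2,
    "rs10423928": 0,
    "rs10010131": 1,
}
score = unweighted_grs(example)
```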
Statistical analyses
Prentice-weighted Cox regression was used with age as the underlying time-scale, incident type 2 diabetes as the outcome and each SNP (GIPR: rs10423928; TCF7L2: rs7903146 and rs12255372; WFS1: rs10010131; KCNQ1: rs151290, rs2237892 and rs163184) as well as the genetic risk score, in turn, as exposure variables, stratified by centre and age at baseline (rounded to the nearest integer) and adjusted for sex. For each SNP, per-genotype and additive genetic effects were modelled using the minor allele as the effect allele. Gene–diet interactions for each SNP (as well as the genetic risk score) and intake of each dietary factor (whey-containing dairy, cereal fibre, coffee and olive oil) were modelled assuming an additive genetic effect by inclusion of a multiplicative interaction term in the model. This model was further adjusted for physical activity, education, BMI, smoking status, total energy intake, and intake of fruit and vegetables, meat, soft drinks and alcohol, with mutual adjustment for the four dietary factors of interest. All dietary factors, except coffee, were energy-adjusted by the residual method [37]. Dietary factors were scaled to represent one serving. Calculation of one serving for each food item was based on previous publications from the EPIC-InterAct study and main EPIC studies: dairy 150 g/day; cereal fibre 10 g/day; olive oil 10 g/day; and coffee 125 g/day [26, 38, 39].
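The energy adjustment and serving-size scaling can be sketched as follows. In the residual method, intake is regressed on total energy intake and each participant's residual is added to the intake predicted at the mean energy intake. This is an illustrative outline under the assumption of a simple linear regression on untransformed values; the function names and example data are hypothetical, and the study's own implementation in SAS may differ in detail.

```python
def energy_adjust(intake, energy):
    """Residual-method energy adjustment of a dietary intake variable.

    Fits intake = a + b * energy by ordinary least squares and returns
    residual + predicted intake at the mean energy intake.
    """
    n = len(intake)
    mean_e = sum(energy) / n
    mean_y = sum(intake) / n
    sxx = sum((e - mean_e) ** 2 for e in energy)
    if sxx == 0:
        raise ValueError("energy intake has no variation")
    sxy = sum((e - mean_e) * (y - mean_y) for e, y in zip(energy, intake))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_e
    at_mean = intercept + slope * mean_e  # predicted intake at mean energy
    return [(y - (intercept + slope * e)) + at_mean
            for e, y in zip(energy, intake)]


# Serving sizes (g/day) as stated in the text
SERVING_G = {"dairy": 150.0, "cereal_fibre": 10.0,
             "olive_oil": 10.0, "coffee": 125.0}


def to_servings(grams_per_day, factor):
    """Rescale intake so that one unit corresponds to one serving."""
    return [g / SERVING_G[factor] for g in grams_per_day]


# Hypothetical data: cereal fibre (g/day) and total energy (kcal/day)
fibre = [18.0, 22.0, 30.0, 25.0]
kcal = [1800.0, 2100.0, 2900.0, 2400.0]
fibre_servings = to_servings(energy_adjust(fibre, kcal), "cereal_fibre")
```

After adjustment, the variable keeps its original mean but is uncorrelated with total energy intake, so the coefficient for a dietary factor reflects its composition in the diet rather than overall food quantity.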
The country-specific Cox regression coefficients in the genetic main effects analysis and gene–diet interaction analyses were estimated and combined using random-effects meta-analysis. Between-study heterogeneity was estimated using I2.
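A minimal sketch of how country-specific log hazard ratios can be pooled is given below. It assumes the DerSimonian–Laird estimator of the between-study variance and inverse-variance weights; the text does not state which estimator was used, and the function name and input values are hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Random-effects pooling (DerSimonian-Laird, assumed) with I2.

    estimates: per-country log hazard ratios
    std_errors: their standard errors
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("at least two studies are required")
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q and the method-of-moments between-study variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    # I2: percentage of variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se_pooled, "tau2": tau2, "i2": i2}


# Hypothetical country-specific interaction coefficients
result = random_effects_meta([0.02, -0.01, 0.05, 0.00],
                             [0.03, 0.04, 0.05, 0.03])
hazard_ratio = math.exp(result["pooled"])
```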
In the interaction analysis, we established the number of independent genetic variables by spectral decomposition [40] of a correlation matrix (Pearson’s r) of all genetic variables tested (seven SNPs and a genetic risk score [range 0–10]). Based on the number of independent genetic variables (n = 7) and dietary factors (n = 4) tested, p values below the multiple testing corrected significance threshold of 0.0018 (=0.05/28) were considered to be statistically significant.
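The threshold calculation can be illustrated as below. The text cites [40] for the spectral decomposition without naming the formula, so the use of Nyholt's estimator of the effective number of variables here is an assumption, and the function names are hypothetical. Because the eigenvalues of a correlation matrix have a mean of 1 and their sum of squares equals the sum of squared matrix entries, their variance can be obtained without an explicit eigendecomposition.

```python
def effective_number_nyholt(corr):
    """Effective number of independent variables (Nyholt's formula, assumed).

    corr: symmetric correlation matrix (list of lists), size M x M.
    M_eff = 1 + (M - 1) * (1 - Var(eigenvalues) / M)
    """
    m = len(corr)
    if m < 2:
        return float(m)
    # Sum of squared eigenvalues equals the sum of squared entries
    sum_sq = sum(r * r for row in corr for r in row)
    var_eig = (sum_sq - m) / (m - 1)  # sample variance, eigenvalue mean is 1
    return 1.0 + (m - 1) * (1.0 - var_eig / m)


def corrected_threshold(alpha, n_genetic, n_dietary):
    """Bonferroni-type threshold over all gene-diet combinations."""
    return alpha / (n_genetic * n_dietary)


# Values reported in the text: 7 independent genetic variables, 4 dietary factors
threshold = corrected_threshold(0.05, 7, 4)
print(round(threshold, 4))
# prints: 0.0018
```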
All statistical analyses were performed using SAS Enterprise Guide 6.1 (SAS Institute, Cary, NC, USA) and SAS 9.4 (SAS Institute). Meta-analyses were performed using R (version 3.1.2, www.r-project.org) and the R function ‘metagen’ available from the R package ‘meta’ (version 4.3-0), called from PROC IML.