Study populations
The NHS, NHSII and HPFS are ongoing prospective cohort studies. The NHS includes 121,700 female registered nurses aged 30–55 years enrolled in 1976 [11], the NHSII includes 116,671 female registered nurses aged 24–44 years enrolled in 1989 and the HPFS consists of 51,529 male health professionals aged 40–75 years enrolled in 1986 [12]. Participants in all studies have been followed through posted biennial questionnaires to collect and update information on lifestyles, health-related behaviours and medical histories. The study protocol was approved by the institutional review boards of the Brigham and Women’s Hospital and the Harvard T.H. Chan School of Public Health. The completion of the self-administered questionnaire was considered to imply informed consent.
Of the participants who completed a baseline food frequency questionnaire (FFQ; NHS 1984, n = 81,757; NHSII 1991, n = 97,605; and HPFS 1986, n = 51,530), we excluded individuals if they: (1) reported a diagnosis of diabetes, cardiovascular disease or cancer at baseline (n = 9392 in NHS, n = 6155 in NHSII and n = 6933 in HPFS); (2) had daily energy intake outside the normal range (<2092 [500] or >14,644 [3500] kJ/day [kcal/day] for NHS and NHSII; <3347 [800] or >17,572 [4200] kJ/day [kcal/day] for HPFS) or missing gluten data (n = 2164 in NHSII and n = 1275 in HPFS) [13]; (3) had a missing date of type 2 diabetes diagnosis (n = 262 in HPFS); or (4) completed only the baseline questionnaire or had missing age at baseline (n = 763 in NHS, n = 682 in NHSII and n = 1152 in HPFS). This left 71,602 participants in NHS, 88,604 in NHSII and 41,908 in HPFS for the analysis.
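The participant flow can be checked with simple arithmetic. The following Python sketch uses only the counts reported above; the dictionary layout and names are illustrative, and where no count is reported for a cohort the sketch assumes zero exclusions at that step, which is an assumption rather than something stated in the text.

```python
# Participant flow, using only counts reported in the text.
# Where the text gives no count for a cohort, 0 is assumed (illustrative).
baseline = {"NHS": 81757, "NHSII": 97605, "HPFS": 51530}

exclusions = {
    "prevalent diabetes, CVD or cancer": {"NHS": 9392, "NHSII": 6155, "HPFS": 6933},
    "implausible energy or missing gluten": {"NHS": 0, "NHSII": 2164, "HPFS": 1275},
    "missing diagnosis date": {"NHS": 0, "NHSII": 0, "HPFS": 262},
    "baseline only or missing age": {"NHS": 763, "NHSII": 682, "HPFS": 1152},
}


def analytic_sample(cohort):
    """Baseline FFQ completers minus all reported exclusions for one cohort."""
    return baseline[cohort] - sum(step[cohort] for step in exclusions.values())


remaining = {cohort: analytic_sample(cohort) for cohort in baseline}
for cohort, n in remaining.items():
    print(f"{cohort}: {n}")
```

Under that assumption, the computed totals agree with the analytic sample sizes reported for all three cohorts.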
Ascertainment of diet and gluten intake
In 1984, NHS participants completed a validated 118-item FFQ to assess their habitual diet over the past year. Starting in 1986 in NHS and HPFS and in 1991 in NHSII, a similar but expanded questionnaire was sent to participants every 4 years to collect and update their dietary information [14]. We calculated the cumulative averages of diet based on valid assessments from baseline to the end of follow-up, and stopped updating dietary information if participants reported a diagnosis of diabetes, cardiovascular disease or cancer [14]. Nutrient intakes were adjusted for total energy using the residual method [15]. The Alternative Healthy Eating Index (AHEI) was calculated as described previously without the alcohol component [14].
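The two data-processing steps described above, cumulative averaging and residual-method energy adjustment, can be illustrated with a short Python sketch. This is not the study code: the function names and the `stop_after` parameter are hypothetical, and adding back the intake predicted at mean energy is a common convention for the residual method that is assumed here rather than stated in the text.

```python
def cumulative_averages(intakes, stop_after=None):
    """Running mean of valid (non-None) assessments up to each cycle.

    stop_after (hypothetical parameter): index of the last cycle whose
    assessment is still used, e.g. the cycle before a reported diagnosis;
    later cycles carry the last average forward.
    """
    out, total, n = [], 0.0, 0
    for i, value in enumerate(intakes):
        updating = stop_after is None or i <= stop_after
        if updating and value is not None:
            total += value
            n += 1
        out.append(total / n if n else None)
    return out


def energy_adjust(nutrient, energy):
    """Residual method: regress nutrient on energy by least squares, then add
    each residual to the intake predicted at the mean energy intake."""
    n = len(nutrient)
    mean_x = sum(energy) / n
    mean_y = sum(nutrient) / n
    sxx = sum((x - mean_x) ** 2 for x in energy)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted_at_mean = intercept + slope * mean_x
    return [y - (intercept + slope * x) + predicted_at_mean
            for x, y in zip(energy, nutrient)]
```

By construction, the adjusted values keep the original mean intake but are uncorrelated with total energy.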
Gluten intake was estimated based on gluten-containing ingredients of food items in the FFQ [15]. Gluten-containing ingredients included wheat, wheat flour, wheat bran, wheat germ, wheat berries, wheat cream, wheat gluten, rye and rye flour, barley and barley malt flour, cooked cereal, bulgur, couscous, farina, beer and pasta. We identified these ingredients from food items according to product labels and ingredient information provided by manufacturers for commercially prepared foods, and recipes from cookbooks for home-prepared items. Gluten-containing ingredients in each food source were quantified by multiplying the serving size by the amounts of gluten-containing ingredients in each serving of food. The proportion of gluten in the protein portion of the ingredients was estimated to be 75–80% in previous studies, and we used a conservative estimate of 75% when calculating the gluten content of these ingredients [10, 16, 17]. We used the same conversion factor for all three grains, although the proportion of gluten in total protein may be more variable in rye and barley than in wheat [18]. We did not account for trace amounts of gluten that may be present in oats and in condiments (for example, soy sauce), as the contribution to total gluten intake would be negligible [19]. Finally, gluten from all ingredient sources was summed to estimate total gluten consumption. Of note, the FFQ assessments of some major sources of gluten were reasonably correlated with those from 7-day diet records: deattenuated correlation coefficients ranged from 0.57 (pie) to 0.79 (cold breakfast cereal) [20]. In a more recent validation study conducted in 2010–2012, the deattenuated correlation coefficients between dietary assessments by FFQs and 7-day dietary records ranged from 0.54 to 0.69 for protein, dietary fibre and carbohydrates [21].
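The calculation described above can be sketched in Python as follows. Only the 75% conversion factor comes from the text; the function name, the data layout and the example values for servings and protein content are made up for illustration and are not taken from the study's food composition data.

```python
GLUTEN_FRACTION_OF_PROTEIN = 0.75  # conservative factor stated in the text


def daily_gluten_g(servings_per_day, grain_protein_g_per_serving):
    """Sum gluten (g/day) over food items.

    grain_protein_g_per_serving: protein (g) contributed by wheat, rye or
    barley ingredients in one serving; foods absent from this mapping are
    treated as containing no gluten.
    """
    total = 0.0
    for food, servings in servings_per_day.items():
        protein = grain_protein_g_per_serving.get(food, 0.0)
        total += servings * protein * GLUTEN_FRACTION_OF_PROTEIN
    return total


# Illustrative values only, not study data.
example_servings = {"bread": 2.0, "pasta": 0.5, "rice": 1.0}
example_protein = {"bread": 3.0, "pasta": 6.0}
example_total = daily_gluten_g(example_servings, example_protein)
print(example_total)
```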
Ascertainment of incident type 2 diabetes
Participants who reported a physician diagnosis of diabetes were posted a supplementary questionnaire regarding symptoms, diagnostic tests and hypoglycaemic therapy. The diagnosis of type 2 diabetes was considered confirmed if at least one of the following was reported on the supplementary questionnaire according to the National Diabetes Data Group criteria [22]: (1) one or more classic symptoms (excessive thirst, polyuria or frequent urination, weight loss, hunger) plus fasting plasma glucose ≥7.8 mmol/l or random plasma glucose ≥11.1 mmol/l; (2) ≥2 elevated plasma glucose concentrations on different occasions (fasting glucose ≥7.8 mmol/l, random plasma glucose ≥11.1 mmol/l and/or plasma glucose ≥11.1 mmol/l after ≥2 h shown by oral glucose tolerance testing) in the absence of symptoms; or (3) treatment with hypoglycaemic medication. From June 1998, the fasting plasma glucose threshold for the diagnosis of diabetes was lowered from 7.8 mmol/l to 7.0 mmol/l, according to the American Diabetes Association criteria [22]. In validation studies, 61 of 62 self-reported cases of type 2 diabetes (98%) confirmed by the supplementary questionnaire were re-confirmed after a blinded endocrinologist reviewed medical records [23]; in the HPFS, 57 of 59 cases (97%) were re-confirmed [24].
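The confirmation rules can be expressed as a small decision function. The Python sketch below follows the three criteria literally; the function and parameter names are hypothetical, each glucose value is assumed to come from a separate occasion, the first day of June 1998 is assumed as the change date, and applying the threshold by diagnosis date is likewise an assumption.

```python
from datetime import date

CRITERIA_CHANGE = date(1998, 6, 1)  # text says "June 1998"; the day is assumed
RANDOM_OR_OGTT_THRESHOLD = 11.1     # mmol/l


def fasting_threshold(diagnosis_date):
    """Fasting plasma glucose threshold (mmol/l) in force at diagnosis."""
    return 7.0 if diagnosis_date >= CRITERIA_CHANGE else 7.8


def is_confirmed(diagnosis_date, has_symptoms, fasting=(), random_glucose=(),
                 ogtt_2h=(), on_medication=False):
    """Return True if at least one confirmation criterion is met.

    fasting, random_glucose, ogtt_2h: glucose values in mmol/l, each assumed
    to have been measured on a different occasion.
    """
    threshold = fasting_threshold(diagnosis_date)
    if on_medication:  # criterion 3
        return True
    high_fasting = sum(v >= threshold for v in fasting)
    high_random = sum(v >= RANDOM_OR_OGTT_THRESHOLD for v in random_glucose)
    high_ogtt = sum(v >= RANDOM_OR_OGTT_THRESHOLD for v in ogtt_2h)
    if has_symptoms:  # criterion 1
        return high_fasting + high_random >= 1
    return high_fasting + high_random + high_ogtt >= 2  # criterion 2
```

For example, a symptomatic participant with a single fasting glucose of 7.2 mmol/l would be confirmed under the later threshold but not under the earlier one.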
Assessment of covariates
Information on family history of diabetes, smoking status, physical activity, menopause status and menopausal hormone use, oral contraceptive use, multivitamin use and body weight was collected in the biennial follow-up questionnaires. Physical activity was estimated by multiplying the energy expenditure of each activity, in metabolic equivalent tasks (METs), by the time spent on that activity (h/week); the values for all activities were then summed to derive total physical activity in MET-h/week. BMI was calculated as self-reported weight in kg divided by the square of baseline height in m (kg/m²).
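Both derived covariates are simple formulas, shown here as a minimal Python sketch. The function names are hypothetical and the MET values and measurements in the example are illustrative, not values used in the study.

```python
def total_activity_met_h_per_week(activities):
    """Sum of MET value x hours per week over all reported activities.

    activities: iterable of (met_value, hours_per_week) pairs.
    """
    return sum(met * hours for met, hours in activities)


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


# Illustrative values only: 2 h/week at 3.0 METs plus 1.5 h/week at 7.0 METs.
example_activity = total_activity_met_h_per_week([(3.0, 2.0), (7.0, 1.5)])
example_bmi = bmi(70.0, 1.75)
print(example_activity, round(example_bmi, 1))
```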
Statistical analysis
Person-years were calculated from the return of the baseline FFQ to the diagnosis of type 2 diabetes, the last return of a follow-up questionnaire, death or the end of follow-up (2012 for NHS and HPFS and 2013 for NHSII), whichever came first. Cox proportional hazards models were used to calculate HRs and 95% CIs for the association between quintiles of gluten intake and type 2 diabetes risk in each cohort and after pooling data from the three cohorts. Multivariable-adjusted models were stratified jointly by age in months and calendar year, and adjusted for ethnicity (white, African-American, Asian and other ethnicity), family history of diabetes (yes/no), smoking status (never, former, current [1–14, 15–24, or ≥25 cigarettes/day], or missing), alcohol intake (g/day: 0, 0.1–4.9, 5.0–14.9, or ≥15.0 in women; 0, 0.1–4.9, 5.0–29.9, or ≥30.0 in men; or missing), physical activity (MET-h/week: <3.0, 3.0–8.9, 9.0–17.9, 18.0–26.9, ≥27.0, or missing), menopause status and menopausal hormone use (pre-menopause, postmenopause [never, former, or current hormone use], or missing, for women), oral contraceptive use (yes, no, or missing, for NHSII), multivitamin use (yes/no), BMI (kg/m²: <23.0, 23.0–24.9, 25.0–29.9, 30.0–34.9, ≥35.0, or missing), total energy intake, AHEI (in quintiles) and intakes of magnesium, folic acid and cereal fibre (in quintiles). Linear trend was tested by modelling the median gluten value of each category as a continuous variable. We performed restricted cubic spline analysis with five knots in the pooled sample of the three cohorts, restricted to participants between the 1st and 99th percentiles of the gluten distribution to minimise the possible influence of extreme values on the curve. We repeated the categorical and dose–response analyses using gluten intake as a percentage of total energy.
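Two of the steps above, person-time calculation and the construction of the trend variable, can be illustrated in Python. This is a sketch under stated assumptions and not the SAS code used for the analysis: the function names are hypothetical, exact censoring dates are not given in the text, the 365.25-day year is an assumption, and quintiles are formed by rank with ties broken by sort order.

```python
from datetime import date
from statistics import median


def person_years(baseline_return, end_of_follow_up, diagnosis=None,
                 last_questionnaire=None, death=None):
    """Follow-up time from baseline FFQ return to the earliest exit event."""
    exits = [d for d in (diagnosis, last_questionnaire, death, end_of_follow_up)
             if d is not None]
    exit_date = min(exits)
    return max((exit_date - baseline_return).days, 0) / 365.25


def quintile_medians(values):
    """Replace each value with the median of its rank-based quintile,
    giving the continuous variable used in a test for linear trend."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    trend = [None] * n
    for q in range(5):
        members = order[q * n // 5:(q + 1) * n // 5]
        if members:
            group_median = median(values[i] for i in members)
            for i in members:
                trend[i] = group_median
    return trend
```

The trend variable would then enter the Cox model as a single continuous term in place of the quintile indicators.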
We calculated Spearman correlation coefficients of gluten intake with whole grains, refined grains, bran, germ, starch, glycaemic index and glycaemic load, in addition to cereal fibre. We also controlled for carbohydrate variables other than cereal fibre in a secondary analysis, including: (1) refined grains; (2) whole grains; (3) glycaemic index and glycaemic load; and (4) bran, germ and starch. We also calculated gluten intake adjusted for whole grains and refined grains using the residual method for energy adjustment [15], and repeated the analysis using adjusted residuals of the gluten variable. Finally, joint analysis was performed to test potential interactions of gluten with intake of bran, added bran, cereal fibre and whole grain on risk of type 2 diabetes.
Analyses were also stratified by age (<65 years, ≥65 years), BMI (<30 kg/m², ≥30 kg/m²), physical activity (<18 MET-h/week, ≥18 MET-h/week) and smoking status (current smoking, or not) to determine whether any interactions existed. To assess the robustness of the findings, we conducted the following sensitivity analyses: (1) adjusting for individual diet components, including trans fats, polyunsaturated fat to saturated fat ratio, fruits, vegetables and red meats (in quintiles), instead of AHEI; (2) using baseline gluten data as the exposure; and (3) using baseline BMI instead of updated BMI. Data on prevalent coeliac disease were available for NHS and HPFS, and we performed a sensitivity analysis after excluding participants who reported coeliac disease in 2014 in these two studies.
All statistical analyses were conducted in SAS 9.4 (SAS Institute, Cary, NC, USA), and p values were two-sided with a significance level of 0.05.