Study design and population
The EPIC-Norfolk study, described in detail previously, is a UK population-based cohort of 25,639 men and women aged 40–79 years at baseline, recruited in 1993–1997. All volunteers gave written informed consent, and the study was approved by the Norfolk Research Ethics Committee. Participants attended a baseline health examination at their general practitioner’s clinic, after which follow-up data collection points included a postal questionnaire at 18 months, a second health examination visit in 1998–2000 and a postal questionnaire in 2002–2004.
A nested case-cohort study was designed, comprising 4,000 subcohort participants selected at random from the entire cohort and 892 ascertained incident diabetes cases. Because the subcohort was selected at random, 143 of these cases also fell within the subcohort, which the case-cohort design allows and accounts for in the analysis.
For the current analyses we excluded participants with prevalent or uncertain diabetes status (n = 83), those with missing food diary data (n = 18) or missing data on other covariates (n = 3), and those with an implausible ratio of energy intake to basal metabolic rate as defined using published equations (n = 82; top and bottom 1% of the distribution). Individuals with prevalent myocardial infarction, stroke or cancer were also excluded (n = 436) to account for possible post-diagnosis changes in diet. A total of 4,127 participants (753 cases and 3,502 subcohort members, including 128 cases within the subcohort) therefore remained for analysis.
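The participant flow above can be checked arithmetically. The following is a minimal sketch using only the counts reported in the text; it assumes the exclusion categories are mutually exclusive (each participant counted under one reason only), which the text does not state explicitly.

```python
# Counts as reported in the text.
subcohort = 4000
incident_cases = 892
cases_in_subcohort = 143

# Cases that fall inside the subcohort are counted once.
case_cohort_total = subcohort + incident_cases - cases_in_subcohort

# Assumption: each excluded participant appears under one reason only.
exclusions = {
    "prevalent or uncertain diabetes": 83,
    "missing food diary data": 18,
    "missing other covariates": 3,
    "implausible energy intake to BMR ratio": 82,
    "prevalent MI, stroke or cancer": 436,
}
analysed = case_cohort_total - sum(exclusions.values())

# Cross-check against the composition of the final analytic sample.
final_cases, final_subcohort, final_overlap = 753, 3502, 128
assert analysed == final_cases + final_subcohort - final_overlap

print(case_cohort_total, analysed)
```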
Case ascertainment and verification
Incident type 2 diabetes cases occurring up to 31 July 2006 were ascertained using multiple data sources: self-report of doctor-diagnosed diabetes at the second health check or in the follow-up health and lifestyle questionnaires, self-report of diabetes-specific medication in either of the two follow-up questionnaires, or medication brought to the follow-up health check. These were verified through record linkage with the general practice diabetes register, local hospital diabetes register, hospital admissions data and Office for National Statistics mortality data with coding for diabetes. Participants who self-reported a history of diabetes that could not be verified against any other source of ascertainment were not included as confirmed cases of diabetes.
Baseline dietary intake data were collected using a 7-day food diary. Food weights were estimated using photographs representing portion sizes, household measures and standard units. At the health check, nurses trained to standardised protocols instructed participants on how to complete the diary and asked them to recall the previous day’s intake, which formed day one of the diary. Participants prospectively completed the remaining 6 days and returned the diary to the study centre. Food intake data were entered using the Data into Nutrients for Epidemiologic Research (DINER) entry system and converted into food weights and nutrient intakes by DINERMO.
A pragmatic approach was applied in estimating dairy product intake. Total dairy intake was estimated as the sum of food items consisting solely of dairy and composite dishes in which dairy was the main ingredient. Ice cream, chocolate, butter used in cooking and dairy included as a minor ingredient in composite dishes were not included. Intakes were categorised into high- and low-fat dairy using 3.9% fat (the fat content of whole milk in the UK) as a cut-off point. Intakes were also categorised by subtype into yoghurt, cheese and milk intakes. A non-exclusive group, total fermented dairy products, was created and subdivided into high- and low-fat fermented dairy products using 3.9% fat as a cut-off point. Category descriptions are detailed in Table 1.
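The grouping rules described above can be illustrated with a short sketch. This is not the DINERMO implementation: the `DairyItem` record, the example foods, their fat percentages and their fermented flags are hypothetical, and the choice to treat an item at exactly 3.9% fat as high-fat (so that whole milk counts as high-fat) is an assumption about how the cut-off was applied.

```python
from dataclasses import dataclass

FAT_CUTOFF_PCT = 3.9  # fat content of UK whole milk, as stated in the text


@dataclass
class DairyItem:
    """Hypothetical record for one dairy food reported in a diary."""
    name: str
    subtype: str          # "milk", "yoghurt" or "cheese"
    fat_pct: float        # g fat per 100 g (illustrative values)
    fermented: bool
    grams_per_day: float


def fat_category(item: DairyItem) -> str:
    # Assumption: items at exactly the cut-off are classed as high-fat.
    return "high-fat" if item.fat_pct >= FAT_CUTOFF_PCT else "low-fat"


def summarise(items):
    """Sum intake (g/day) into the overlapping groups described in the text."""
    totals = {}

    def add(key, grams):
        totals[key] = totals.get(key, 0.0) + grams

    for item in items:
        fat = fat_category(item)
        add("total dairy", item.grams_per_day)
        add(f"{fat} dairy", item.grams_per_day)
        add(item.subtype, item.grams_per_day)
        # Fermented dairy is a non-exclusive group: items count here as well.
        if item.fermented:
            add("total fermented dairy", item.grams_per_day)
            add(f"{fat} fermented dairy", item.grams_per_day)
    return totals


# Illustrative diary entries only; not study data.
diary = [
    DairyItem("semi-skimmed milk", "milk", 1.7, False, 200.0),
    DairyItem("low-fat yoghurt", "yoghurt", 1.0, True, 50.0),
    DairyItem("cheddar cheese", "cheese", 34.9, True, 20.0),
]
totals = summarise(diary)
for group, grams in sorted(totals.items()):
    print(f"{group}: {grams:.0f} g/day")
```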
Baseline demographic, lifestyle and health characteristics were collected using a self-administered questionnaire. A validated four-point physical activity index was used to categorise participants as active, moderately active, moderately inactive or inactive. Height, weight, waist circumference and systolic and diastolic BP were measured, BMI was calculated and blood samples were collected using standardised procedures. The questionnaires, physical activity index and anthropometric measurement methods have been previously described in detail. Dietary covariates were estimated using data from the 7-day food diary. Plasma vitamin C is a marker of recent fruit and vegetable intake and provides an indication of dietary quality. To determine plasma vitamin C levels, venous blood was drawn from non-fasting participants into citrate tubes and stored overnight in a dark container at 4–7°C. Samples were centrifuged, and plasma was stabilised using a standardised volume of metaphosphoric acid and measured using a fluorometric assay.
Dairy product intakes were divided into tertiles according to the subcohort intake distribution. Baseline characteristics and dietary intakes were examined across tertiles of total dairy intake in the subcohort. Dairy product intake was adjusted for energy intake using the residual method. The residuals from the regression of dairy intake on total energy intake were rescaled by adding the expected dairy intake for a person with mean total energy intake. By design, 128 incident diabetes cases were included in the random subcohort. To account for this case-cohort design, Prentice-weighted Cox regression models [21, 22] were used to calculate HRs and 95% CIs for the association between dairy intake and incident type 2 diabetes. Age was included as the underlying timescale in the Cox models, with entry time defined as age at recruitment and exit time as age at diagnosis of diabetes, death, loss to follow-up or censoring at the end of follow-up, whichever came first.
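The residual method and the subcohort-based tertiles can be sketched as follows. This is an illustration only, not the Stata code used in the study: the function names and example values are hypothetical, the regression is fitted on all supplied participants (the text does not say whether it was fitted in the subcohort only), and the quantile definition and boundary handling may differ from those used by the authors.

```python
from statistics import mean, quantiles


def energy_adjust(dairy, energy):
    """Residual method: regress dairy on energy by ordinary least squares,
    then add the predicted dairy intake at mean energy to each residual."""
    mean_d, mean_e = mean(dairy), mean(energy)
    sxx = sum((e - mean_e) ** 2 for e in energy)
    sxy = sum((e - mean_e) * (d - mean_d) for e, d in zip(energy, dairy))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_e
    expected_at_mean = intercept + slope * mean_e
    return [d - (intercept + slope * e) + expected_at_mean
            for d, e in zip(dairy, energy)]


def tertile_cutpoints(subcohort_values):
    """Two cut-points dividing the subcohort distribution into thirds."""
    return quantiles(subcohort_values, n=3)


def assign_tertile(value, cuts):
    """Return 0, 1 or 2. Assumption: values equal to a cut-point
    fall in the lower tertile."""
    return sum(value > c for c in cuts)


# Hypothetical data: dairy in g/day, energy in kJ/day.
dairy = [150.0, 300.0, 220.0, 410.0, 90.0, 260.0]
energy = [7000.0, 9500.0, 8000.0, 11000.0, 6500.0, 8800.0]
in_subcohort = [True, True, False, True, True, True]

adjusted = energy_adjust(dairy, energy)
cuts = tertile_cutpoints([a for a, s in zip(adjusted, in_subcohort) if s])
# Cut-points come from the subcohort but are applied to everyone,
# including cases outside the subcohort.
tertiles = [assign_tertile(a, cuts) for a in adjusted]
print(tertiles)
```

Because the regression includes an intercept, the energy-adjusted intakes keep the same mean as the raw intakes while being uncorrelated with total energy intake.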
Model 1 adjusted for age (continuous, as underlying timescale) and sex. Model 2 additionally adjusted for BMI (continuous), family history of diabetes (yes or no), smoking status (current, former, never), usual alcohol consumption (continuous units/week) estimated from a health questionnaire, physical activity index (inactive, moderately inactive, moderately active, active), social class (professional, managerial, skilled, semi-skilled, unskilled) and education level (no qualification, O level, A level, degree or higher). Model 3 additionally adjusted for dietary covariates, including energy intake (continuous kJ/day), and intake of fibre, fruit, vegetables, red meat, processed meat and coffee (all continuous g/day). To test for a linear trend, the median intake value of each tertile of dairy intake was entered into the Cox regression model as a continuous variable. The assumption of proportional hazards, checked by including time-dependent covariates in the model, was not violated. Possible interactions with sex, BMI, physical activity index and smoking status were examined by including the interaction terms in the most adjusted models.
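Constructing the variable used for the trend test can be sketched as below. The function name and the example intakes are hypothetical, and the sketch computes medians from all supplied participants; whether the study took medians from the subcohort only is not stated in the text.

```python
from statistics import median


def trend_variable(intakes, tertiles):
    """Replace each participant's intake with the median intake of
    their tertile, giving one ordered variable for the trend test."""
    medians = {
        t: median(x for x, tt in zip(intakes, tertiles) if tt == t)
        for t in set(tertiles)
    }
    return [medians[t] for t in tertiles], medians


# Hypothetical intakes (g/day) already assigned to tertiles 0, 1, 2.
intakes = [50, 80, 110, 200, 230, 260, 400, 450, 520]
tertiles = [0, 0, 0, 1, 1, 1, 2, 2, 2]

trend, medians = trend_variable(intakes, tertiles)
print(medians)
```

The resulting variable would then enter the regression model as a single continuous term in place of the tertile indicators, and the p value for its coefficient serves as the test for trend.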
The independence of the associations of specific dairy subtypes was tested by mutually adjusting for other dairy subtypes and separately for all other food groups. In a further model, we included hypertension (dichotomous >140 mmHg systolic BP or >90 mmHg diastolic BP, or on hypertension medication), hypercholesterolaemia (dichotomous >6.2 mmol/l or on lipid-lowering medication), sugar-sweetened beverage intake and trans fat intake. The potential mediating roles of saturated fat, vitamin D, calcium and magnesium in the association between dairy intake and type 2 diabetes were examined by entering these into further models. To test the effect of substituting dairy products for alternative foods, we set a priori criteria of including only those dairy products that were associated with type 2 diabetes and considering foods that would be likely replacements. The effect of substituting a portion of one food for another was examined by including both as continuous variables in a multivariable model (model 2 plus energy [kJ]). The difference in their beta coefficients, together with their variances and covariance, was used to estimate the beta coefficient and variance for the substitution effect, which in turn were used to calculate HRs and 95% CIs. Portion sizes are means of those used in DINERMO.
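The substitution calculation can be written out as a short sketch. It uses the standard result that the variance of a difference of two estimates is the sum of their variances minus twice their covariance. The function name, the normal-approximation multiplier of 1.96 and all numerical inputs are illustrative assumptions, not values from the study.

```python
import math


def substitution_hr(beta_in, beta_out, var_in, var_out, cov, z=1.96):
    """HR and 95% CI for replacing one portion of the 'out' food with
    one portion of the 'in' food, both modelled per portion.

    beta_in, beta_out : log-HR coefficients from the same model
    var_in, var_out   : variances of those coefficients
    cov               : covariance between the two coefficients
    """
    beta_sub = beta_in - beta_out
    var_sub = var_in + var_out - 2.0 * cov
    se_sub = math.sqrt(var_sub)
    hr = math.exp(beta_sub)
    lower = math.exp(beta_sub - z * se_sub)
    upper = math.exp(beta_sub + z * se_sub)
    return hr, lower, upper


# Hypothetical coefficients for illustration only.
hr, lower, upper = substitution_hr(
    beta_in=-0.10, beta_out=0.05,
    var_in=0.0025, var_out=0.0036, cov=0.0005,
)
print(f"HR {hr:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```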
Sensitivity analyses included repeating the models without the residual method for energy adjustment (i.e. using absolute intakes), restricting analyses to dairy product consumers only and including plasma vitamin C levels. In addition, the analyses were repeated after excluding participants diagnosed with incident diabetes in the first 2 years of follow-up (to minimise the possibility of reverse causality), after excluding those classified as energy misreporters according to published cut-offs for the ratio of energy intake to basal metabolic rate, and after including those with prevalent chronic diseases at baseline. Participants with high dairy product intakes were not excluded as the intakes were deemed plausible and, when examined in consumers only, no intakes were more than 1 SD higher than the median.
The analyses were performed using Stata (version 12; StataCorp, College Station, TX, USA).