The Finnish Twin Cohort comprises virtually all the same-sex twin pairs born in Finland before 1958 and with both co-twins alive in 1967. In 1975, a baseline questionnaire (described below in detail) was sent to twin pairs with both members alive. The response rate was 89%. After excluding the participants with diagnosed diabetes at baseline, those of undefined zygosity and those who had moved abroad before 1976, the cohort consisted of 23,585 individuals with self-reported baseline data on education, social and occupational class, alcohol consumption, physical activity and BMI. The final cohort for the present study included 20,487 individuals, with 8,182 complete twin pairs, who had complete physical activity information available for metabolic equivalent (MET) index calculations (see explanation below). Of the total sample, 9,842 were male and 10,645 female, and 6,399 were monozygotic twin individuals and 14,087 were dizygotic twin individuals. Determination of zygosity was based on an accurate and validated questionnaire method.
To remove confounding due to disease, we studied a subgroup of 13,291 presumably healthy individuals. Participants with chronic diseases (such as angina pectoris, myocardial infarction, stroke, diabetes, cardiovascular disease, chronic obstructive pulmonary disease and malignant cancer) affecting weight and ability to engage in leisure physical activity prior to 1982 had been identified by a questionnaire in 1981 and by medical records as described in detail by Kujala et al. Type 2 diabetes and some other diseases can remain subclinical and undiagnosed for some time after the onset of symptoms. Therefore, we set a 6-year period to ensure that any cases undiagnosed in 1975 would have been diagnosed by 1981. Thus, we obtained a cohort of participants free of clinical co-morbidities.
The participants were informed about the purposes of the overall cohort study when given the baseline questionnaire in 1975. In responding to the questionnaire, participants also gave informed consent. The record linkages were also approved by the appropriate authorities responsible for the registers and the Ethics Committee of the Department of Public Health, University of Helsinki.
Baseline physical activity and covariate assessment
The 1975 questionnaire included questions on medical history, education, occupation, physical activity and other health habits. Assessment of leisure-time physical activity volume (MET index) was based on a series of structured questions on leisure-time physical activity (monthly frequency, mean duration and mean intensity of sessions) and commuting physical activity. The index was calculated by first assigning a multiple of resting metabolic rate (MET value) to one of four categories defined according to the strenuousness of the activity. After assigning the MET value, the product of the activity was calculated as follows: MET value × duration × frequency. The MET index was expressed as the sum score of leisure MET h/day (1 MET h/day corresponds to about 30 min walking every other day). The MET index thus established was then divided into quintiles. The same quintiles were used as in our earlier study on mortality. For cut-off points see Table 1. For further analyses the index was dichotomised as sedentary <0.59 MET h/day (QI) and active ≥0.59 MET h/day (combined QII–V).
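The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the cohort's actual scoring code: the MET values assigned to the four intensity categories and the 30-day month used to convert monthly frequency to a daily rate are assumptions, since neither is stated here. Only the product MET value × duration × frequency, the summation and the 0.59 MET h/day cut-off come from the text.

```python
# Hypothetical MET values for the four intensity categories. The text does
# not list them, so these numbers are illustrative assumptions only.
ASSUMED_MET_BY_INTENSITY = {1: 4.0, 2: 6.0, 3: 10.0, 4: 13.0}

# Assumed conversion from sessions per month to sessions per day.
DAYS_PER_MONTH = 30

# Cut-off between quintile I and quintiles II-V (MET h/day), from the text.
SEDENTARY_CUTOFF = 0.59


def met_hours_per_day(intensity, duration_h, sessions_per_month):
    """MET value x duration x frequency for one activity, in MET h/day."""
    met_value = ASSUMED_MET_BY_INTENSITY[intensity]
    return met_value * duration_h * sessions_per_month / DAYS_PER_MONTH


def met_index(activities):
    """Sum the MET h/day contributions of all reported activities.

    Each activity is a tuple (intensity, duration_h, sessions_per_month).
    """
    return sum(met_hours_per_day(*activity) for activity in activities)


def activity_group(index):
    """Dichotomise the MET index as in the text: <0.59 is sedentary."""
    return "sedentary" if index < SEDENTARY_CUTOFF else "active"


# Walking 30 min every other day (15 sessions per 30-day month), with
# walking assumed to fall in the lowest intensity category.
walker = [(1, 0.5, 15)]
print(met_index(walker), activity_group(met_index(walker)))
```

Under these assumed values, 30 min of walking every other day gives 1 MET h/day, which agrees with the equivalence quoted in the text.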
The MET index was validated in a previous study by our group by comparing the MET index with a 12-month detailed physical activity questionnaire conducted by telephone interview. The intraclass correlation between the MET index and the detailed 12-month physical activity MET index was 0.68 (p < 0.001) for leisure-time physical activity and 0.93 (p < 0.001) for commuting.
Baseline self-reported weight and height were used to calculate BMI, which was used as a covariate in the study. In another study of Finnish twins the correlation between self-reported and measured BMI was very high.
Self-reported smoking status, use of alcohol, work-related physical activity and social class at baseline in 1975 were also used as covariates. Smoking status was coded into four categories, determined from responses to detailed smoking history questions: never smoked; former smoker; occasional smoker; and current daily smoker. Alcohol use was coded as a dichotomous index of binge drinking, defined by whether the participant had drunk at least five drinks on a single occasion at least monthly. Alcohol was also used as a continuous variable expressed as grams consumed daily, as described in detail earlier. Six categories were used to describe social class and the classification was based on self-reported job titles according to the criteria used by the Central Statistical Office of Finland. Work-related physical activity was used as a categorical variable with a four-point ordinal scale.
Type 2 diabetes information for 1976–1996 was collected from death certificates, the National Hospital Discharge Register and the Medication Register of the Social Insurance Institution by linking this information to the personal identification assigned to all residents of Finland. The Social Insurance Institution of Finland (KELA) is the agency responsible for the provision of basic social security [19, 29]. KELA reimburses whole or part of the cost of necessary medications to patients who are certified by a physician as having a diagnosed severe chronic disease. Although the register is not sensitive to cases of mild disease, it has very high validity and false-positive cases are unlikely. The relevant medical records for 1976–1996 were reviewed and cases classified as type 2 diabetes, type 1 diabetes, gestational diabetes, secondary diabetes or other diagnoses as described by Kaprio et al. The date of onset of disease symptoms was determined and used in the analyses. The diabetes information for 1996–2004 was collected solely from the Medication Register and individuals were presumed to have type 2 diabetes, given their age. For this period the date of being granted the right to reimbursable medication was used in the analysis as the date of disease onset. We have not yet extended the data collection to 2005–2009, partly because the national programme of screening for pre-diabetes and diabetes, followed by preventive interventions (for example, dietary modification, physical activity), was intensive during those years and could bias our prospective long-term follow-up if they were included.
Cox proportional hazards regression was used to estimate the hazard ratios, with 95% CI, for the incidence of type 2 diabetes by MET quintile. The inactive category (QI: <0.59 MET h/day) was used as the reference group. Follow-up ended at the time of diagnosis for participants who developed type 2 diabetes and, for all others, at the time of death, emigration from Finland or end of follow-up (31 December 2004). The Cox regression model was fitted first as an individual-level analysis and second as a pairwise analysis, in which the data were stratified by pair so that the risk estimates were within-pair estimates. For the individual analysis, the Cox regression model was adjusted for age and sex, and additionally for BMI. The pairwise analyses controlled by design for age and sex (co-twin-control design), but the models were also adjusted for BMI and were run separately for monozygotic (MZ) and dizygotic (DZ) pairs if the numbers permitted. The basic individual analysis was additionally adjusted for work-related physical activity, social class, alcohol use and smoking. In the individual-level analyses, lack of statistical independence of co-twins was taken into account by computing robust variance estimators for cluster-corrected data to yield correct standard errors and p values. Data management and analysis were performed using the Stata statistical software, version 9.0.
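The follow-up rule described above determines the two quantities a Cox model takes per participant: time at risk and an event indicator. The Python sketch below shows one way to derive them. It is not the authors' Stata code. The function name, the start date of 1 January 1976 and the use of years as the time scale are assumptions made for the example; only the censoring events and the end date of 31 December 2004 are taken from the text.

```python
from datetime import date

# End of follow-up, as stated in the text.
END_OF_FOLLOW_UP = date(2004, 12, 31)

# Assumed start of follow-up; the text does not give an exact date.
ASSUMED_START = date(1976, 1, 1)


def follow_up(diagnosis=None, death=None, emigration=None,
              start=ASSUMED_START, end=END_OF_FOLLOW_UP):
    """Return (years_at_risk, event) for one participant.

    event is 1 if type 2 diabetes was diagnosed before any censoring
    date, otherwise 0. Censoring occurs at the earliest of death,
    emigration and the end of follow-up.
    """
    censor_date = min(d for d in (death, emigration, end) if d is not None)
    if diagnosis is not None and diagnosis <= censor_date:
        exit_date, event = diagnosis, 1
    else:
        exit_date, event = censor_date, 0
    years_at_risk = (exit_date - start).days / 365.25
    return years_at_risk, event


# A participant diagnosed in 1990 contributes an event; one who emigrated
# in 1985 is censored at emigration and contributes no event.
print(follow_up(diagnosis=date(1990, 1, 1)))
print(follow_up(emigration=date(1985, 6, 1)))
```

The resulting pairs would then be passed to a Cox routine, with the twin-pair identifier used as the stratum in the pairwise analysis or as the cluster for robust variance estimation in the individual-level analysis.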