Study cohorts
These analyses were conducted in two parallel cross-sectional cohorts of adults of European ancestry from northern Europe: the first cohort (cohort 1) comprised participants with blood glucose concentrations within the normal glucose control or prediabetes (impaired HbA1c, fasting glucose or 2 h glucose according to ADA criteria [19]) ranges and the second cohort (cohort 2) comprised individuals with recently diagnosed type 2 diabetes (within 6–36 months of study enrolment). Participants underwent detailed physical examinations, including MRI scans and carbohydrate challenge tests, diet assessment and objective assessment of habitual physical activity. Approval for the study protocol was obtained separately from each of the regional research ethics review boards and all participants provided written informed consent at enrolment. The research conformed to the ethical principles for medical research involving human participants outlined in the Declaration of Helsinki.
The study rationale and design and core characteristics of the IMI DIRECT cohorts are reported in detail elsewhere [17, 18]. Below, we provide a summary and describe the methods most relevant for the present analyses.
Cohort 1 (prediabetes) was drawn from a sampling frame of 24,196 participants nested within prospective cohorts from Denmark (Copenhagen), Finland (Kuopio), the Netherlands (Hoorn) and Sweden (Malmö); 2127 participants at varying risk of glycaemic deterioration were enrolled into the study. To determine the risk of rapid glycaemic deterioration, we used the DIRECT-DETECT algorithm [20]. For cohort 2 (diabetes), 789 participants were recruited from health registries and primary care practices in Denmark (Copenhagen), the UK (Dundee, Exeter, Newcastle), the Netherlands (Hoorn) and Sweden (Lund). As neither of the Swedish study centres undertook MRI scans, data from these centres were not included in the current analysis.
Of these participants, 920 (cohort 1) and 435 (cohort 2) had all the necessary variables for a complete case analysis of the twin-cycle hypothesis (TC) model and 725 (cohort 1) and 361 (cohort 2) had all the necessary variables for the complete case analyses fitting the twin-cycle plus physical activity (TC-PA) model. The following variables were included in these models: fasting plasma glucose, 2 h glucose, oral glucose insulin sensitivity (OGIS), liver fat, pancreatic fat, fasting insulin secretion rate, glucose sensitivity (insulin secretion per glucose), age, sex, centre, metformin use, total daily energy intake, total daily carbohydrate, protein and fat intake, and mean daily physical activity intensity. The characteristics of these subcohorts are shown in Table 1.
Table 1 Characteristics of cohort subset used in each model
Measures
Fasting glucose was assessed from venous plasma samples drawn in the morning following an overnight fast. Frequently sampled 75 g oral glucose tolerance tests (fsOGTTs) and mixed-meal tolerance tests (MMTTs) were carried out in cohorts 1 and 2, respectively. Mixed meals (250 ml Fortisip liquid drink [18.4 g carbohydrate per 100 ml]) rather than 75 g oral glucose loads were used in cohort 2 to minimise the risk of severe hyperglycaemia, as participants had type 2 diabetes. The 2 h glucose, OGIS, fasting insulin secretion rate and glucose sensitivity (dose–response slope of insulin secretion in response to glucose) were calculated from the fsOGTT and MMTT data, as described elsewhere [21, 22].
Liver and pancreatic fat were measured by MRI and quantified using a multi-echo technique described in detail elsewhere [17, 18, 23, 24]. Briefly, prone 1.5 T to 3 T images (depending on availability at each study centre) were acquired. T1-weighted images were obtained for the abdominal region (between the diaphragm and acetabulum) with maximum field of view and 10 mm slice thickness with a 10 mm slice gap. A three-dimensional scan using 50–80 images at slice thicknesses of 1.2–2 mm (depending on equipment) was acquired to image the pancreas. Further axial single-slice multi-echo images were acquired of the liver and pancreas (10 mm slice thickness). Whole-organ pancreatic and liver fat estimates were then inferred from these images, with organ boundaries determined manually by experienced radiographers.
Physical activity was objectively assessed using triaxial accelerometry (ActiGraph GT3X+, ActiGraph, Pensacola, FL, USA) on the non-dominant wrist over 10 days. Physical activity intensity was characterised by the high-pass filtered vector magnitude (hpfVM) of the triaxial acceleration signal [25, 26], and its mean was used to describe overall physical activity level. Non-wear was inferred as a vector magnitude SD of less than 4 mg for a consecutive period greater than 60 min. To account for bias introduced by removal of non-wear time in combination with differential diurnal non-wear patterns between individuals, adjustments were made for diurnal rhythm [27].
Dietary intake was assessed using a validated multi-pass food habit questionnaire and 24 h diet record, as previously described [17, 18].
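The non-wear rule stated above (vector magnitude SD below 4 mg for a consecutive period longer than 60 min) can be illustrated with a minimal Python sketch. It is not the processing pipeline used in the study: it assumes that an SD value has already been computed for each 1 min epoch, and the function name nonwear_mask and its parameters are hypothetical.

```python
def nonwear_mask(sd_mg_per_min, sd_threshold=4.0, min_run=60):
    """Flag minutes belonging to runs where SD < sd_threshold (mg)
    for more than min_run consecutive minutes.

    Assumption: one SD value per 1 min epoch; the source does not
    state the epoch length used to compute the SD.
    """
    n = len(sd_mg_per_min)
    mask = [False] * n
    start = None  # index where the current low-SD run began
    for i in range(n + 1):
        low = i < n and sd_mg_per_min[i] < sd_threshold
        if low and start is None:
            start = i
        elif not low and start is not None:
            # Run ended at i - 1; flag it only if longer than min_run
            if i - start > min_run:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask


# Hypothetical series: a 61 min still period (flagged) and a 60 min one (not flagged)
series = [10.0] * 5 + [1.0] * 61 + [10.0] * 5 + [1.0] * 60 + [10.0] * 5
mask = nonwear_mask(series)
```

Minutes flagged in the mask would be excluded before the mean hpfVM is calculated; the subsequent adjustment for diurnal rhythm [27] is not shown.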
Statistical analysis
All continuous variables were standardised by rank-normal transformation (mean 0, SD 1) by sex (and by lifestyle vs metformin + lifestyle in cohort 2). Adjustment for putative confounders was performed by two-step residual regression: in the first step, residuals were extracted from general linear models fitted to the transformed continuous variables or binary categorical variables; in the second step, these residuals were used in subsequent models as either outcome or predictor variables. Regression models for residual extraction included age, study centre, total daily energy intake and total daily intake of dietary carbohydrates, fats and proteins as covariates. Pearson correlation coefficients were generated and plotted in a matrix to illustrate simple pairwise relationships between all model variables.
Structural equation modelling
We used structural equation modelling to test the overall model fit and relationships between sets of variables within the hypothesised twin-cycle model. Structural equation modelling is a multivariate statistical method that can be thought of as a combination of regression analysis, factor analysis and pathway analysis. In a structural equation model, a structure (a pathway network) of relationships between variables can be defined according to a prespecified hypothesis (such as the twin-cycle hypothesis). Based on the observed covariance between the variables in the model, the fit of the defined model can then be tested. In addition, pathway (mediation) effects within the defined model can be tested. We defined models using measured variables only (manifest nodes) and fitted them under a maximum likelihood framework using covariance matrices. The model definition (see Figs 1a, 2a) reflects the hypothesised twin-cycle model, proposed elsewhere [15, 16]. Here, relationships (edge estimates) between variables are adjusted for putative confounders (through two-step residual regression, described above; hence, models were fitted on covariance matrices of the extracted residuals). Direct (non-mediated) edge estimates also account for covariance of the other edges pointing to the same outcome. In other words, edges (see arrows in Figs 1, 2) in the model represent regression coefficients, which co-vary with edges from other variables pointing to the same outcome node (see Text box for node and edge abbreviations). Pathway (indirect) effects were estimated for mediated associations of physical activity with glycaemic control using the coefficient product method [28], where mediation is defined using the approach described by Baron and Kenny [29]. Pathways were tested only where statistically significant direct associations (individual, non-mediated, edge estimates) were observed along the whole pathway.
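As an illustration of the coefficient product method, the indirect effect along a two-edge pathway (exposure to mediator with coefficient a, mediator to outcome with coefficient b) is the product a × b. The Python sketch below uses the first-order (Sobel) standard error for that product; this choice of standard error is an assumption made for illustration and is not stated in the text above, and the numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def indirect_effect(a, se_a, b, se_b):
    """Product-of-coefficients estimate for a two-edge pathway.

    a, se_a: estimate and standard error for exposure -> mediator
    b, se_b: estimate and standard error for mediator -> outcome
    Assumption: first-order (Sobel) standard error and a normal
    reference distribution for the test statistic.
    """
    estimate = a * b
    se = sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p value
    return estimate, se, z, p


# Hypothetical edge estimates, e.g. physical activity -> liver fat -> fasting glucose
est, se, z, p = indirect_effect(a=-0.5, se_a=0.1, b=0.4, se_b=0.1)
```

For longer pathways the same idea extends to the product of all edge estimates along the path, with the standard error obtained by the delta method or resampling.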
Relative model fit was assessed using the comparative fit index (CFI) and the Tucker–Lewis index (TLI), with values ranging from 0 (no fit) to 1 (perfect fit) [30]; a model with a ‘good’ fit typically requires both indices to exceed 0.95 [31, 32]. Absolute fit was assessed using root mean square error of approximation (RMSEA). This ranges from 0 to 1, with 0 indicating a perfect fit [30]. A poorly fitting model is typically defined by RMSEA >0.06 [33, 34]. CFI, TLI and RMSEA were not used to formally determine adequacy of fit, as their use in this context is controversial and there is limited consensus on appropriate cut-off values because each index is affected differently by degrees of freedom, model complexity and sample size; it is, however, standard practice to report these along with the χ2. Instead, we formally tested model fit by comparing the χ2 of the tested model with χ2 values obtained from variable-randomised null models with identical structures (in other words, the variables were randomly assigned to other nodes in the same structural equation model definition), each applied to the same covariance matrix used for the tested model. This process was iterated 10,000 times. One-sample t tests were used to determine whether the tested model χ2 value was lower (indicating a better fit) than the mean χ2 value of the null models. We also calculated the empirical probability of the χ2 value from a null model being lower than the χ2 value from the tested model by expressing the tested model χ2 value as a quantile within the iterated χ2 values.
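The randomisation test can be sketched in Python as follows. The sketch covers only the steps around the model fitting: reassigning variables to nodes (here done by permuting the rows and columns of the covariance matrix together, which is one possible implementation and an assumption) and summarising the resulting null χ2 values. Fitting the structural equation model itself is not shown, and the function names and numeric values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def permute_covariance(cov, rng):
    """Reassign variables to nodes by applying one random permutation
    to both the rows and the columns of the covariance matrix."""
    n = len(cov)
    perm = list(range(n))
    rng.shuffle(perm)
    return [[cov[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


def empirical_p(tested_chi2, null_chi2):
    """Proportion of null-model chi-square values lower than the tested value."""
    return sum(v < tested_chi2 for v in null_chi2) / len(null_chi2)


def one_sample_t(null_chi2, tested_chi2):
    """t statistic for the mean null chi-square against the tested value;
    positive when the tested model fits better than the null mean."""
    n = len(null_chi2)
    return (mean(null_chi2) - tested_chi2) / (stdev(null_chi2) / sqrt(n))


# Hypothetical 3 x 3 covariance matrix and one randomised version of it
rng = random.Random(0)
cov = [[1.0, 0.3, 0.2],
       [0.3, 1.5, 0.4],
       [0.2, 0.4, 2.0]]
shuffled = permute_covariance(cov, rng)

# Hypothetical chi-square values standing in for fitted null models
null_values = [10.0, 20.0, 30.0, 40.0]
p_emp = empirical_p(25.0, null_values)
t_stat = one_sample_t(null_values, 5.0)
```

In practice, each permuted matrix would be passed to the model-fitting routine to obtain one null χ2 value, and the loop would be repeated 10,000 times as described above.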
Multiple testing adjustments were not undertaken, as this analysis sought to validate a single previously hypothesised model where direct effect estimates were nested within this single model, reflecting a single overarching hypothesis. Moreover, where results are consistent across the two subcohorts used here, one might regard this as replication. However, the absence of replication may reflect real differences in diseased and non-diseased states, as opposed to providing evidence of type 1 error.
All statistics were computed using R version 3.5.0 [35]. Structural equation models were fitted using lavaan version 0.6-5 [36]. Models were plotted using semPlot version 1.1.2 (CRAN repository or https://github.com/SachaEpskamp/semPlot, accessed 20 August 2019). The IMI DIRECT data release version used for these analyses was ‘direct_29-03-2019’.