Contextualisation of risk variables for developing type 2 diabetes
In total, 96,534 participants were included in the study. Study population characteristics are reported in ESM Table 1. In short, the population included fewer men than women (41% men) and had a mean age of 45.2 years. A total of 1494 individuals developed type 2 diabetes. Of the 134 variables, 53 (40%) were replicated in both directions (i.e. screened in dataset A and replicated in dataset B, and vice versa) and ten (7%) were replicated in a single direction (Fig. 2). The p values, the number of individuals with complete data and whether replication occurred in one or two directions are documented in ESM Table 2 and ESM Fig. 3.
We identified categorical risk variables, including a borderline or pathological vs normal ECG (HR: 1.37 and 1.40, respectively), being a current smoker (HR: 1.62) or ex-smoker (HR: 1.11) vs non-smoker, and having a prescription for hydrochlorothiazide, metoprolol, atorvastatin, enalapril, simvastatin, omeprazole, pantoprazole, salmeterol-fluticasone or salbutamol (HRs ranging from 1.77 to 2.46). Further, low or medium vs high education (HR: 1.87 and 1.27, respectively), having a family history of diabetes (HR: 1.81 for mother, 2.28 for sibling) and several health-related quality-of-life variables were associated with a higher diabetes risk.
Of all risk variables, HbA1c attained the highest HR (3.65 per 1 SD increase). Next, we ‘contextualised’ the individual HRs with respect to HbA1c; that is, we estimated the change in each risk factor that is equivalent in risk to a 1 SD increase in HbA1c. The number of SDs of the other continuous risk variables that is equivalent to a 1 SD increase in HbA1c is presented in Fig. 3a, and the population mean and the value corresponding to that SD increase are presented in Fig. 3b. The HRs adjusted for HbA1c are depicted in Fig. 3c. Recall that a 1 SD increase in HbA1c equates to 3.39 mmol/mol (0.31%). First, we observed that serum glucose is on par with HbA1c. Specifically, the HR for a 1 SD increase in HbA1c is equivalent to the HR for a 1.08 SD increase in glucose (0.53 mmol/l; adjusted HR: 3.01). Adiposity-related variables and HDL-cholesterol required a change of at least 1.5 SD, which is a large shift relative to the population distribution. For example, the HbA1c equivalence for waist circumference was an increase of 1.66 SD (19.8 cm; adjusted HR: 1.60) and that for HDL-cholesterol was a decrease of 1.67 SD (0.67 mmol/l). Of note, the other adiposity-related anthropometrics (i.e. body weight, WHR and BMI) needed respective increases of 1.87, 1.88 and 2.03 SDs (27.6 kg, 0.15 units and 8.34 kg/m²) to be equivalent to a 1 SD change in HbA1c. Apart from uric acid (+1.95 SD; 0.14 mmol/l), all other replicated risk variables were required to increase by at least 3 SDs to be equivalent to the HR for a 1 SD change in HbA1c. For example, a 3.04 SD increase in leucocyte count has an HR equivalent to that of a 1 SD increase in HbA1c. Only 418 individuals (0.43% of the Lifelines population) had a leucocyte count this high.
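The equivalence described above can be illustrated with a short calculation. This is a minimal sketch, assuming a log-linear (Cox-type) model in which the log HR scales linearly with the number of SDs, so that k SDs of a variable match 1 SD of HbA1c when HR_variable^k = HR_HbA1c. The function names and the example per-SD HR of 1.55 are hypothetical; only the HbA1c HR of 3.65, the 1.08 SD equivalence for glucose and the counts 418 and 96,534 come from the text.

```python
import math


def equivalent_sd(hr_reference: float, hr_variable: float) -> float:
    """Number of SDs of a variable whose HR equals hr_reference.

    Assumes log(HR) is linear in the number of SDs. A negative result
    means the variable has to decrease (protective variable, HR < 1).
    """
    if hr_reference <= 0 or hr_variable <= 0:
        raise ValueError("hazard ratios must be positive")
    if hr_variable == 1.0:
        raise ValueError("a variable with HR = 1 can never match the reference")
    return math.log(hr_reference) / math.log(hr_variable)


def implied_hr_per_sd(hr_reference: float, k_sd: float) -> float:
    """Per-SD HR implied by an equivalence of k_sd SDs (inverse of equivalent_sd)."""
    return hr_reference ** (1.0 / k_sd)


HR_HBA1C = 3.65  # per 1 SD increase, as reported in the text

# Hypothetical variable with an HR of 1.55 per SD (illustration only).
k_example = equivalent_sd(HR_HBA1C, 1.55)

# Per-SD HR implied by the reported 1.08 SD equivalence for glucose,
# under the log-linearity assumption stated above.
hr_glucose_implied = implied_hr_per_sd(HR_HBA1C, 1.08)

# Share of the population at or above the leucocyte threshold, from the text.
share_leucocytes = 100 * 418 / 96534  # in percent

print(f"hypothetical variable: {k_example:.2f} SD")
print(f"implied glucose HR per SD: {hr_glucose_implied:.2f}")
print(f"share above leucocyte threshold: {share_leucocytes:.2f}%")
```

Under this assumption, the closer a variable's per-SD HR is to 1, the more SDs are needed to match HbA1c, which is why weakly associated variables require shifts that few individuals attain.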
Impact of impaired fasting glucose on the risk factors for developing type 2 diabetes
After we excluded individuals with IFG (n = 3510, 586 complete cases), all initially replicated risk factors remained nominally significant (ESM Fig. 4a), although sublevels of ECG (pathological) and smoking (ex-smoker) lost significance. HRs decreased for family history of diabetes (−14%) and HbA1c (−15%) and increased for omeprazole and individual levels of three quality-of-life indicators (+11% to +22%). When correcting for IFG status, we found that HRs weakened for 22 risk variables (ESM Fig. 4b). HRs decreased by more than 10% for glycaemic traits, erythrocyte indicators, uric acid, adiposity-related variables, pathological ECG, family history of diabetes, eight medications and social functioning. HRs, p values and changes (%) relative to the main analysis are reported in ESM Table 3.
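The percentage changes reported above can be read as relative changes in HR between a sensitivity analysis and the main analysis. The sketch below assumes the change is computed on the HR scale as (HR_sensitivity − HR_main) / HR_main; the text does not state the exact formula, and a change on the log-HR scale would be a reasonable alternative. The function name and the example values are hypothetical.

```python
def percent_change(hr_main: float, hr_sensitivity: float) -> float:
    """Relative change (%) of an HR in a sensitivity analysis vs the main analysis.

    Assumption: the change is expressed on the HR scale, not the log-HR scale.
    """
    if hr_main <= 0 or hr_sensitivity <= 0:
        raise ValueError("hazard ratios must be positive")
    return 100.0 * (hr_sensitivity - hr_main) / hr_main


# Hypothetical example: an HR of 2.00 in the main analysis that falls to 1.70
# after exclusion of individuals with IFG corresponds to a 15% decrease.
change = percent_change(2.00, 1.70)
print(f"{change:+.0f}%")
```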
Correlation patterns between risk factors
Correlation patterns between replicated variables are presented in ESM Fig. 5. Correlated variables clustered within white blood cells, red blood cells, liver enzymes, adiposity-related anthropometrics, BP, dietary and smoking variables and quality of life (rho: >0.5). HDL-cholesterol showed weak to moderate inverse correlations with adiposity-related anthropometrics and triacylglycerols (rho: −0.27 to −0.45). All correlations remained stable across age groups and sexes. Sex-specific negative correlations were found for medications and differed between age groups. The number of effective variables decreased by at least one variable for the quality-of-life, anthropometric and lifestyle groups (ESM Table 4).
Risk prediction and interchangeability of variables in clinical contexts
The number of times each variable was selected, the cumulative number of variables added to the model and the model’s corresponding c-index and HR are shown in Fig. 4a and reported in ESM Table 5. The impact of individual variables on the model’s c-index is depicted in Fig. 4b and reported in ESM Table 6, and HR trajectories are shown in ESM Fig. 6.
When we included all replicated risk variables, HbA1c, HDL-cholesterol and work-related activities were selected in all bootstrapped lasso regression models (c-index: 0.834). The next increase in c-index was observed after glucose was included (selected in 81% of the models, c-index: 0.886), after which the model saturated (c-index after all variables were included: 0.892). The model’s c-index decreased when glucose (1.9%) or HbA1c (1.3%) was removed. The inclusion of glucose attenuated the HRs of HDL-cholesterol (from 0.65 [0.60; 0.71] to 0.74 [0.68; 0.81]) and HbA1c (from 3.44 [3.22; 3.68] to 2.05 [1.91; 2.20]). In contrast, the association for male sex strengthened (HR from 0.93 [0.80; 1.07] to 0.73 [0.63; 0.84]).
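For readers less familiar with the c-index used here as the measure of discrimination, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration under stated assumptions, not the implementation used in the study: tied event times are ignored, tied risk scores count as half-concordant, and the function name and the toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Harrell's c-index: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had the event and did so
    before subject j's follow-up ended. The pair is concordant when the
    model assigned subject i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up time in years, event indicator, risk score.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [True, True, False, True]
toy_risks = [0.9, 0.7, 0.8, 0.1]

# Four of the five comparable pairs are ranked correctly.
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.8
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so differences such as 0.834 vs 0.886 reflect a larger share of correctly ordered pairs.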
To test the interchangeability of glucose and HbA1c, we excluded them as potential risk variables. This led the algorithm to select a more complex model with lower discrimination (c-index: 0.813 vs 0.886) that included age, sex, BMI, HDL-cholesterol, triacylglycerols, the number of pack years and omeprazole, each of which was selected in at least 99% of the models. The full model attained a c-index of 0.843 (vs 0.892) and was not impacted by individual variables (data not shown). We observed a similar strengthening of the HR trajectory for sex (HR after inclusion of ten variables: from 0.97 [0.85; 1.11] to 0.67 [0.56; 0.79]).
Next, we excluded all invasive variables and ECG. We found the most robust scores (retained in 99% of the models) for age, BMI, WHR and omeprazole (c-index: 0.802). The full model attained a c-index of 0.831 and was only marginally affected by family history of diabetes (0.8%). The HR for age gradually weakened over the inclusion of the first ten variables (from 2.09 [1.96; 2.22] to 1.70 [1.57; 1.83]), whereas the HRs of the other included variables remained stable over the inclusion process.
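The selection scores quoted in this section (e.g. retained in 99% of the models) can be understood as the share of bootstrap replicates in which a variable received a non-zero lasso coefficient. The sketch below only shows this tallying step and assumes that the set of selected variables per bootstrapped model is already available; the lasso fit itself is not shown. The function names, the threshold argument and the example sets are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_frequency(selected_sets: List[Iterable[str]]) -> Dict[str, float]:
    """Share of bootstrapped models in which each variable was selected."""
    n_models = len(selected_sets)
    if n_models == 0:
        raise ValueError("at least one bootstrapped model is required")
    # set() guards against a variable being listed twice within one model
    counts = Counter(var for selected in selected_sets for var in set(selected))
    return {var: count / n_models for var, count in counts.items()}


def robust_variables(freqs: Dict[str, float], threshold: float = 0.99) -> List[str]:
    """Variables selected in at least `threshold` of the models, sorted by name."""
    return sorted(var for var, freq in freqs.items() if freq >= threshold)


# Hypothetical selections from four bootstrapped lasso models.
example_models = [
    {"age", "BMI", "WHR", "omeprazole"},
    {"age", "BMI", "WHR", "omeprazole", "heart rate"},
    {"age", "BMI", "WHR", "omeprazole"},
    {"age", "BMI", "WHR", "omeprazole", "education"},
]

freqs = selection_frequency(example_models)
print(robust_variables(freqs))  # variables present in every example model
print(freqs["heart rate"])      # selected in one of four models
```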
To investigate the interchangeability of the key variables BMI and WHR, we excluded both variables from the model. As a result, the algorithm selected more variables, including age, work-related activities, use of pantoprazole, omeprazole or simvastatin, heart rate, family history of diabetes and waist circumference. This resulted in similar discrimination (c-index: 0.812), which remained stable as further variables were included (all variables: 0.828). The inclusion of waist circumference attenuated the HRs of age (from 2.03 [1.89; 2.19] to 1.84 [1.71; 1.99]), family history of diabetes (from 2.31 [2.04; 2.62] to 2.04 [1.80; 2.31]) and simvastatin, pantoprazole and omeprazole (range from 2.00–2.16 to 1.62–1.87).
When we considered only questionnaire-based variables (including age), omeprazole, work-related activities and vigorous intensity activities were selected in 96% of the bootstrapped models (c-index: 0.729). After sex, vitality, education and pantoprazole were added (score ≥68%), the c-index increased to 0.749. The full model reached a c-index of 0.796, which was impacted by age (1.3%) and family history of diabetes (1.4%). When more variables were added to the model, HRs declined for age (from 2.17 [2.07; 2.30] to 1.59 [1.46; 1.75]) and omeprazole (from 2.09 [1.76; 2.50] to 1.63 [1.35; 1.97]).