Literature review
We included 15 publications reporting 22 different risk score models that predicted the 10 year risk of CVD. Only two of the scores were published before 2000 (Framingham 1991 [22], Framingham 1998 [23]) (ESM Results, ESM Tables 8, 9).
Out of the 22 identified CVD risk prediction models, nine were derived in individuals with type 2 diabetes alone (Risk Equations for Complications Of type 2 Diabetes [RECODE] [24], Diabetes Audit and Research in Tayside [DARTS] [25], UK Prospective Diabetes Study [UKPDS] 56 [26], UKPDS 68 Congestive Heart Failure [C-HF] and Stroke [27], UKPDS 82 C-HF and CHD [28], and CHS Basic and Advanced [29]), and 13 scores enrolled both non-diabetic individuals and individuals with type 2 diabetes (SCORE CHD and CVD [30]; Finrisk Stroke, CHD and CVD [31]; Framingham 1991 CHD, CVD and Stroke [22]; Framingham 1998 [23]; QRISK2 [32]; QRISK3 [33]; ASCVD [1]; and Reynolds Risk [34, 35]), and these were considered general population samples. Ten rules were designed to predict CVD, seven predicted CHD, three predicted stroke and two HF (ESM Table 8 [Type of predicted CVD reported]).
All the risk scores incorporated classic CVD risk factors, such as age, sex, blood pressure and smoking status. Twenty risk scores included information about lipids. The scores that included a proportion of individuals with diabetes typically included type 2 diabetes (presence/absence) as a predictor, but did not include diabetes-specific risk factors such as diabetes duration and glycaemic status (which are often used in diabetes-specific scores). The total number of predictors in these risk prediction models ranged from six (SCORE [30]) to 19 (QRISK3 [33]) (ESM Fig. 2, ESM Table 10).
Individual characteristics and 10 year CVD outcomes
The baseline characteristics of the individuals are presented in Table 1 and ESM Tables 2, 6, 7. The mean age was 59.3 years (SD: 13.9), 78,204 (46%) participants were women and 43,102 (26%) individuals were on statins.
During a median follow-up time of 10 years since type 2 diabetes diagnosis, 38,335 (22.70%) individuals suffered a CVD, AF or HF event. Of these, 29,025 (17.19%) had a CVD event, 20,628 (12.22%) CHD, 13,826 (8.19%) AF, 9465 (5.6%) HF and 6727 (3.98%) stroke (see Kaplan-Meier estimates in Fig. 1, ESM Table 11).
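For readers unfamiliar with the Kaplan-Meier estimates referred to above, the following is a minimal sketch of the product-limit estimator. The function name and the toy follow-up data are hypothetical and are not taken from the study; the sketch assumes that events tied with censorings at the same time are counted while the censored individuals are still at risk.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time per individual (years)
    events : 1 if the event occurred at that time, 0 if censored
    """
    # Count events and total exits (events + censorings) at each distinct time.
    exits = {}
    for t, e in zip(times, events):
        d, n = exits.get(t, (0, 0))
        exits[t] = (d + e, n + 1)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d, n = exits[t]
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= n  # everyone who exited at t leaves the risk set
    return curve


# Hypothetical toy data: six individuals, follow-up in years.
times = [2.0, 3.5, 3.5, 6.0, 10.0, 10.0]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}, cumulative incidence={1 - s:.3f}")
```

The cumulative incidence at 10 years is one minus the last survival value, which is the quantity a 10 year risk score attempts to predict.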
Predicting cardiovascular risk in individuals with type 2 diabetes
Results obtained from the complete-case analyses (see ESM Results) were similar to results from the multiple-imputation analysis. Nevertheless, because the complete-case analysis slightly overestimated model performance (ESM Figs. 3, 4, ESM Tables 12, 13), we present the latter, more conservative, results in the main text (Fig. 2, ESM Figs. 5, 6, ESM Tables 14, 15).
Most models achieved similar calibration in CVD prediction (CS: from 0.38 to 0.74; CIL: from −1.89 to 2.26) (Fig. 2, ESM Table 14). Models designed to predict stroke and/or HF did not substantially underperform compared with CVD-derived models. The scores almost uniformly underestimated the risk of CVD+ (CS: from 0.41 to 0.88, CIL: from −1.50 to 2.69) (ESM Table 14), the exceptions being the Framingham 1991 CVD and DARTS scores which systematically overestimated risk.
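The calibration slope (CS) and calibration-in-the-large (CIL) reported above are defined in the paper's methods, which are not reproduced in this section. As an illustration only, the sketch below assumes the common logistic-recalibration definitions: CS is the slope b in logit P(event) = a + b × logit(predicted risk), and CIL is the intercept a when b is fixed at 1. The study may have used survival-based equivalents. The function names and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_calibration(pred, outcome, fix_slope=False, iters=50):
    """Fit logit P(y=1) = a + b * logit(pred) by Newton-Raphson.

    With fix_slope=True, b is held at 1 and only the intercept a is
    estimated (calibration-in-the-large under the assumed definition).
    """
    lp = [logit(p) for p in pred]
    a, b = 0.0, 1.0
    for _ in range(iters):
        # Gradient (ga, gb) and information matrix (haa, hab, hbb).
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(lp, outcome):
            mu = expit(a + b * x)
            w = mu * (1.0 - mu)
            ga += y - mu
            gb += (y - mu) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        if fix_slope:
            a += ga / haa
        else:
            det = haa * hbb - hab * hab
            a, b = (a + (hbb * ga - hab * gb) / det,
                    b + (haa * gb - hab * ga) / det)
    return a, b


# Hypothetical predicted 10 year risks and observed outcomes (1 = event).
pred = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.40, 0.40, 0.60, 0.70]
outcome = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

_, slope = fit_calibration(pred, outcome)
intercept, _ = fit_calibration(pred, outcome, fix_slope=True)
print(f"calibration slope: {slope:.2f}")
print(f"calibration-in-the-large: {intercept:.2f}")
```

Under these assumed definitions, perfect calibration corresponds to a slope of 1 and a calibration-in-the-large of 0.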
The CHS Basic (CS: 0.86; CIL: −0.22), ASCVD (CS: 0.46; CIL: −0.19), QRISK2 (CS: 0.69; CIL: −0.25) and QRISK3 (CS: 0.72; CIL: −0.05) models (originally derived to predict any CVD) generally showed near-perfect calibration for CVD+. Focusing on scores not originally intended to predict CVD, we found that the Framingham 1998 score (a CHD score) could accurately predict both CVD (CS: 0.74 [95% CI 0.72, 0.76]; CIL: −0.15 [95% CI −0.16, −0.13]) and CVD+ (CS: 0.88 [95% CI 0.86, 0.90]; CIL: 0.23 [95% CI 0.22, 0.25]). For the ‘other’ group (including stroke and HF-derived scores), we found that RECODE for CVD (CS: 0.73 [95% CI 0.70, 0.76]; CIL: −0.2 [95% CI −0.21, −0.19]) and for CVD+ (CS: 0.85 [95% CI 0.82, 0.87]; CIL: 0.17 [95% CI 0.16, 0.18]) calibrated well (Fig. 2).
Despite observing reasonable external calibration, models had more difficulty discriminating between individuals who experienced an event within 10 years of follow-up and those who remained event free: the C statistic ranged from 0.62 to 0.67, the highest being that of SCORE CVD (0.67 [95% CI 0.67, 0.67]) (Fig. 3). Similar patterns of discrimination were observed when predicting CVD+, with this combined endpoint showing a minimally improved C statistic (from 0.64 to 0.69) compared with CVD, and again with SCORE CVD having the largest C statistic (0.69 [95% CI 0.69, 0.70]). Testing for the pairwise difference in C statistics (ESM Fig. 7) indicated that SCORE CVD outperformed all other scores aside from the ASCVD, Finrisk CVD and SCORE CHD. SCORE CVD also performed better than the nine diabetes-specific scores (Fig. 3, ESM Fig. 7). A net reclassification comparison (applied after model recalibration, see below) showed that SCORE CVD performed slightly better than QRISK2 and QRISK3 by assigning a lower risk to individuals who did not experience CVD in the available 10 years of follow-up (Table 2, ESM Table 16).
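The net reclassification comparison can be illustrated with a small sketch. It assumes a category-based net reclassification index computed separately for individuals with and without an event; the risk cut-offs (10% and 20%), the function names and the toy predictions are hypothetical and are not the values used in the study.

```python
from bisect import bisect_right


def risk_category(risk, cutoffs):
    """Index of the risk band that `risk` falls into (0 = lowest band)."""
    return bisect_right(cutoffs, risk)


def net_reclassification(risk_ref, risk_new, outcome, cutoffs):
    """Return (NRI among events, NRI among non-events).

    Events: proportion moved to a higher band minus proportion moved lower.
    Non-events: proportion moved to a lower band minus proportion moved higher.
    """
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for r_ref, r_new, y in zip(risk_ref, risk_new, outcome):
        ref_cat = risk_category(r_ref, cutoffs)
        new_cat = risk_category(r_new, cutoffs)
        total[y] += 1
        if new_cat > ref_cat:
            up[y] += 1
        elif new_cat < ref_cat:
            down[y] += 1
    nri_events = (up[1] - down[1]) / total[1]
    nri_nonevents = (down[0] - up[0]) / total[0]
    return nri_events, nri_nonevents


# Hypothetical predicted risks from a reference score and a comparison score.
risk_ref = [0.25, 0.15, 0.08, 0.22, 0.12, 0.30]
risk_new = [0.18, 0.08, 0.05, 0.25, 0.22, 0.15]
outcome = [0, 0, 0, 1, 1, 1]  # 1 = CVD event within follow-up

nri_events, nri_nonevents = net_reclassification(
    risk_ref, risk_new, outcome, cutoffs=[0.10, 0.20]
)
print(f"NRI (events): {nri_events:.2f}")
print(f"NRI (non-events): {nri_nonevents:.2f}")
```

A positive value among non-events means the comparison score tends to assign a lower risk band to individuals who remained event free, which is the pattern described above for SCORE CVD.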
Table 2 A net reclassification table comparing the predicted CVD risk distributions of QRISK3 and SCORE CVD, among individuals with type 2 diabetes without and with a CVD event, during the available 10 years of follow-up
We observed that scores with more than ten predictors did not necessarily outperform scores with fewer variables: QRISK3 (19 variables) for CVD+ had a C statistic of 0.68 (95% CI 0.68, 0.69), compared with a C statistic of 0.69 (95% CI 0.69, 0.70) for SCORE CVD (six variables) and a C statistic of 0.69 (95% CI 0.69, 0.69) for the Framingham 1998 score (seven variables). Similar results were obtained for the CVD-only outcome. The scores derived from individuals with diabetes did not outperform scores derived in a population of non-diabetic individuals (Fig. 3).
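To make the C statistic concrete, the sketch below computes Harrell's concordance index for right-censored data. This is one common definition and is an assumption here; the estimator actually used is specified in the paper's methods. The function name and the toy data are hypothetical.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of usable pairs ranked correctly by the score.

    A pair (i, j) is usable when individual i had an observed event
    strictly before individual j's follow-up time. It is concordant when
    i also has the higher predicted risk; tied risks count as one half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical toy data: follow-up (years), event indicator, predicted risk.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risks = [0.30, 0.10, 0.25, 0.20, 0.05]

c_stat = harrell_c(times, events, risks)
print(f"C statistic: {c_stat:.2f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. The double loop is quadratic in the number of individuals, so a cohort of the size analysed here would need a more efficient implementation.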
Predicting individual CVD endpoints and model recalibration
We additionally evaluated the ability of these 22 rules to predict individual CVD components: CHD, stroke, AF and HF. Here, we observed that, similar to CVD, predictions for CHD were slightly overestimated (ESM Figs. 3, 5). This overestimation was more severe when using these models to predict stroke, AF and HF (ESM Figs. 4, 6). Nevertheless, when considering AF and HF, the discriminative ability of these models slightly improved (C statistic ≥0.70) relative to CHD and CVD (ESM Figs. 8–11).
Recalibrating the 22 models using the 10% training dataset considerably improved calibration (CS: from 0.96 to 1.04) (ESM Figs. 12–14, ESM Tables 13, 15), with most rules showing near-perfect calibration in the test data. Given that most of these 22 rules were not designed to predict stroke, AF or HF, it was somewhat surprising to see that recalibration markedly improved performance for these endpoints as well (Fig. 4). For example, after recalibration, QRISK3 could predict HF risk (CS: 1.09 [95% CI 1.02, 1.17]) and AF risk (CS: 1.08 [95% CI 1.00, 1.16]) remarkably well (Fig. 5).
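The recalibration step can be sketched as follows. This is an illustration under stated assumptions rather than the study's procedure: it assumes logistic recalibration (re-estimating an intercept and slope on the logit of the original predicted risk) fitted in a random 10% training subset and then applied to the remaining 90%. The simulated cohort, the seed and all function names are hypothetical.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_intercept_slope(pred, outcome, iters=50):
    """Newton-Raphson fit of logit P(y=1) = a + b * logit(pred)."""
    xs = [logit(p) for p in pred]
    a, b = 0.0, 1.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, outcome):
            mu = expit(a + b * x)
            w = mu * (1.0 - mu)
            ga, gb = ga + (y - mu), gb + (y - mu) * x
            haa, hab, hbb = haa + w, hab + w * x, hbb + w * x * x
        det = haa * hbb - hab * hab
        a, b = (a + (hbb * ga - hab * gb) / det,
                b + (haa * gb - hab * ga) / det)
    return a, b


def recalibrate(pred, a, b):
    """Apply the fitted intercept and slope to the original predictions."""
    return [expit(a + b * logit(p)) for p in pred]


# Hypothetical simulated cohort in which the original score is miscalibrated.
rng = random.Random(0)
pred = [rng.uniform(0.05, 0.60) for _ in range(2000)]
true_risk = [expit(-0.5 + 0.7 * logit(p)) for p in pred]
outcome = [1 if rng.random() < r else 0 for r in true_risk]

# Random 10% training / 90% test split.
idx = list(range(len(pred)))
rng.shuffle(idx)
cut = len(idx) // 10
train, test = idx[:cut], idx[cut:]

# Estimate the recalibration coefficients in the training subset only.
a, b = fit_intercept_slope([pred[i] for i in train],
                           [outcome[i] for i in train])

# Compare the calibration slope in the test subset before and after.
test_y = [outcome[i] for i in test]
test_pred = [pred[i] for i in test]
_, slope_before = fit_intercept_slope(test_pred, test_y)
_, slope_after = fit_intercept_slope(recalibrate(test_pred, a, b), test_y)
print(f"test calibration slope before: {slope_before:.2f}")
print(f"test calibration slope after:  {slope_after:.2f}")
```

Recalibration of this kind rescales the predicted risks but preserves their ordering when the slope is positive, so it changes calibration without changing the C statistic.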
Subgroup analyses
Next, in individuals with type 2 diabetes without CVD+ at baseline, we explored the discriminative ability of these CVD scores in subgroup analyses of age, sex and statin usage (Fig. 6, ESM Fig. 15). Subgroup changes in performance were shared across the various scores: discriminative ability was lower for men, statin-naive individuals and older individuals (significant interaction tests are indicated by a solid line in Fig. 6; ESM Tables 17–19).
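One simple way to compare discrimination between two non-overlapping subgroups is a Wald-type z test on the difference in C statistics, with standard errors recovered from the reported 95% CIs. The sketch below assumes this approach purely for illustration; the interaction tests used in the study are described in its methods and may differ. The subgroup values are hypothetical.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Recover a standard error from a symmetric normal-theory CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)


def compare_c_statistics(c1, ci1, c2, ci2):
    """Two-sided z test for the difference between two independent C statistics."""
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    diff = c1 - c2
    z = diff / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical C statistics (95% CI) for one score in women and in men.
diff, z, p_value = compare_c_statistics(
    0.70, (0.69, 0.71),
    0.66, (0.65, 0.67),
)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p_value:.2g}")
```

The test treats the two estimates as independent, which holds for disjoint subgroups such as men and women but not for two scores evaluated in the same individuals.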
We additionally performed similar subgroup analyses for individuals with type 2 diabetes irrespective of their baseline CVD+ status (see Clinical characteristics in ESM Table 20), finding patterns of discrimination similar to those for individuals without CVD+ at baseline (Fig. 7, ESM Fig. 16, ESM Tables 21–24). The results showed that score performance was significantly worse for people with pre-existing CVD+ at the time of type 2 diabetes diagnosis. Finally, we observed that RECODE performed best (a C statistic of 0.73 [95% CI 0.73, 0.74] for CVD+) in a sample of people with type 2 diabetes including individuals with and without CVD+ history at the time of diagnosis.