Clustering in three large cohorts based on clinical measures
In this cross-sectional study, 15,940 individuals from three cohorts were included, for which baseline characteristics are given in Table 1. The characteristics of the three cohorts were generally comparable, with a majority of male participants and an average age of around 60 years. Individuals were clustered based on age, BMI, HbA1c, C-peptide and HDL-cholesterol. The optimal number of clusters was based on the gap statistic across the three cohorts. In GoDARTS the optimal number of clusters was five, with lower gap statistics from six onwards. In DCS and ANDIS, the increase in gap statistic showed a clear stabilisation after five clusters. Therefore, we considered five to be the optimal number of clusters (ESM Fig. 1a). The first cluster comprised 13–17% of the individuals included. It was characterised by high HbA1c, but, compared with the other clusters, participants were younger with lower BMI, C-peptide and HDL-cholesterol levels. When compared with the original clusters in ANDIS, this cluster was most similar to the SIDD cluster with a sensitivity (SEM) of 90.7% (CI 88.4%, 92.6%; Fig. 1, ESM Fig. 1b). Between 9% and 22% of individuals were assigned to a cluster with high C-peptide levels and older age, but relatively lower HbA1c and HDL-cholesterol levels, suggestive of insulin resistance. Indeed, compared with the ANDIS clusters, this cluster most closely resembled the SIRD cluster with an SEM of 92.4% (CI 89.7%, 94.6%; Fig. 1, ESM Fig. 1b). The third cluster comprised participants with high BMI and the youngest age and relatively lower levels of HbA1c and HDL-cholesterol. It was most similar to the originally described MOD cluster with an SEM of 80.6% (CI 78.4%, 82.7%) and comprised 18–23% of the individuals included in the study. The fourth and fifth clusters were most similar to the MARD cluster and showed a combined sensitivity of 79.1% (CI 77.5%, 80.6%) against the MARD cluster in ANDIS (Fig. 1, ESM Fig. 1b).
The fourth cluster, which was also the largest, encompassing 29–35% of the individuals, showed no extreme characteristics and was termed mild diabetes (MD). The fifth cluster was characterised by higher age and HDL-cholesterol and was termed mild diabetes with high HDL-cholesterol (MDH), and comprised 16–19% of the individuals (Fig. 1). Between male and female participants there were small differences in characteristics, but the overall differences between clusters were similar across both sexes (ESM Fig. 2).
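The sensitivities quoted above are proportions with a confidence interval. As a minimal sketch of how such a figure can be computed, the snippet below treats sensitivity as the fraction of individuals in a reference cluster who are assigned to the matching new cluster, and attaches a Wilson score interval. The interval method, the 95% level and the counts (90 of 100) are assumptions for illustration only; the text does not state which method or level was used, and the counts are not taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def sensitivity_with_ci(true_positives, false_negatives, level=0.95):
    """Return (sensitivity, lower, upper) using a Wilson score interval.

    The Wilson interval is an assumed choice; the source does not name
    the interval method behind its reported CIs.
    """
    n = true_positives + false_negatives
    if n == 0:
        raise ValueError("at least one individual is required")
    p = true_positives / n
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical counts: 90 of 100 reference-cluster members matched.
sens, lower, upper = sensitivity_with_ci(90, 10)
print(f"sensitivity {sens:.1%} (CI {lower:.1%}, {upper:.1%})")
```

With larger clusters the interval narrows, which is consistent with the tight intervals reported for clusters containing several thousand individuals.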
Clusters cross-validate between the three cohorts
To assess their stability, clusters were cross-validated between the three cohorts, and they generally cross-validated well (ESM Fig. 3, ESM Table 2). The SIDD and MDH clusters showed the highest sensitivity of the five clusters identified, ranging from 85.6% (CI 83.5%, 87.6%) to 97.1% (CI 94.8%, 98.5%) in SIDD and from 73.3% (CI 69.5%, 77.0%) to 92.9% (CI 91.3%, 94.3%) in MDH (ESM Fig. 3, ESM Table 2). The SIRD and MD clusters generally performed worst, with sensitivities ranging from 36.1% (CI 32.3%, 39.9%) to 92.3% (CI 90.1%, 94.2%) in SIRD and from 40.8% (CI 38.9%, 42.7%) to 78.1% (CI 75.9%, 80.2%) in MD. Where these clusters disagreed, individuals clustered to SIRD were classified as MD and vice versa (ESM Fig. 3, ESM Table 2). The sensitivity of the MOD cluster ranged from 55.0% (CI 52.6%, 57.3%) to 93.2% (CI 91.5%, 94.7%).
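The cross-validation described above can be illustrated with a small sketch. It assumes that cluster centres derived in one cohort are applied to individuals of another cohort by nearest-centroid assignment (Euclidean distance on standardised measures), and that per-cluster sensitivity is the fraction of individuals whose externally assigned label matches their own-cohort label. The assignment rule, the two-dimensional toy values and the function names are hypothetical and are not taken from the study.

```python
from math import dist


def assign_to_nearest(point, centroids):
    """Return the label of the centroid closest to `point` (Euclidean)."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def per_cluster_sensitivity(points, own_labels, external_centroids):
    """Fraction of each own-cohort cluster recovered by external centroids."""
    hits, totals = {}, {}
    for point, own in zip(points, own_labels):
        totals[own] = totals.get(own, 0) + 1
        if assign_to_nearest(point, external_centroids) == own:
            hits[own] = hits.get(own, 0) + 1
    return {label: hits.get(label, 0) / totals[label] for label in totals}


# Hypothetical centroids from cohort A (two standardised measures).
centroids_a = {"SIDD": (-1.0, 1.5), "MD": (0.0, 0.0)}

# Hypothetical individuals from cohort B with their own-cohort labels.
points_b = [(-0.9, 1.4), (-0.2, 0.4), (0.1, -0.1), (0.3, 0.2)]
labels_b = ["SIDD", "SIDD", "MD", "MD"]

print(per_cluster_sensitivity(points_b, labels_b, centroids_a))
```

In this toy example the second individual lies closer to the MD centroid than to the SIDD centroid, so only half of the SIDD cluster is recovered. Neighbouring clusters with overlapping characteristics lose sensitivity in the same way.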
Clusters are different in their progression to insulin requirement
Next, we assessed differences between clusters in terms of progression towards insulin initiation or requirement. As expected, the SIDD cluster showed the fastest progression (HR 3.40 [CI 1.72, 6.72]) compared with the other clusters (Table 2, ESM Fig. 4). The SIRD cluster showed slower progression (HR 0.59 [CI 0.46, 0.76]). The MD and MDH clusters also differed in their progression: MDH showed the slowest progression compared with the other clusters (HR 0.44 [CI 0.33, 0.59]), also slower than MD (HR 0.81 [CI 0.63, 1.06]).
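Hazard ratios are usually estimated on the log scale, so a Wald-type confidence interval is symmetric around log(HR) and the point estimate should sit near the geometric mean of the two limits. The sketch below applies this check to the values reported above and back-calculates an approximate standard error of log(HR). It assumes the intervals are 95% Wald intervals, which the text does not state; the function names are illustrative only.

```python
from math import log, sqrt
from statistics import NormalDist


def log_scale_midpoint(lower, upper):
    """Geometric mean of the CI limits, i.e. the midpoint on the log scale."""
    return sqrt(lower * upper)


def log_hr_standard_error(lower, upper, level=0.95):
    """Approximate SE of log(HR), assuming a Wald interval at `level`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (log(upper) - log(lower)) / (2 * z)


# Hazard ratios and CI limits as reported in the text: (HR, lower, upper).
reported = {
    "SIDD": (3.40, 1.72, 6.72),
    "SIRD": (0.59, 0.46, 0.76),
    "MDH": (0.44, 0.33, 0.59),
    "MD": (0.81, 0.63, 1.06),
}

for cluster, (hr, lower, upper) in reported.items():
    midpoint = log_scale_midpoint(lower, upper)
    se = log_hr_standard_error(lower, upper)
    print(f"{cluster}: HR {hr:.2f}, log-scale midpoint {midpoint:.2f}, SE(log HR) {se:.3f}")
```

All four reported hazard ratios agree with the log-scale midpoint of their interval to within rounding. The interval for MD includes 1, whereas the intervals for the other three clusters do not.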