Trajectories of clinical characteristics, complications and treatment choices in data-driven subgroups of type 2 diabetes

Aims/hypothesis This study aimed to explore the added value of subgroups that categorise individuals with type 2 diabetes by k-means clustering for two primary care registries (the Netherlands and Scotland), inspired by Ahlqvist’s novel diabetes subgroups and previously analysed by Slieker et al.
Methods We used a Dutch and a Scottish diabetes cohort (N=3054 and 6145; median follow-up=11.2 and 12.3 years, respectively) and defined five subgroups by k-means clustering with age at baseline, BMI, HbA1c, HDL-cholesterol and C-peptide. We investigated differences between subgroups by trajectories of risk factor values (random intercept models), time to diabetes-related complications (logrank tests and Cox models) and medication patterns (multinomial logistic models). We also directly compared the clustering indicators with the discrete k-means subgroups as predictors of progression. Cluster consistency over follow-up was assessed.
Results Subgroups’ risk factors were significantly different, and these differences remained generally consistent over follow-up. Among all subgroups, individuals with severe insulin resistance faced a significantly higher risk of myocardial infarction both before (HR 1.65; 95% CI 1.40, 1.94) and after adjusting for age (HR 1.72; 95% CI 1.46, 2.02) compared with mild diabetes with high HDL-cholesterol. Individuals with severe insulin-deficient diabetes were most intensively treated, with more than 25% prescribed insulin at 10 years after diagnosis. For severe insulin-deficient diabetes relative to mild diabetes, the relative risk of using insulin vs no common treatment would be expected to increase by a factor of 3.07 (95% CI 2.73, 3.44), holding other factors constant. Clustering indicators were better predictors of progression variation than subgroups, but prediction accuracy may improve after combining both. Clusters were consistent over 8 years with an accuracy ranging from 59% to 72%.
Conclusions/interpretation Data-driven subgroup allocations were generally consistent over follow-up and captured significant differences in risk factor trajectories, medication patterns and complication risks. Subgroups serve better as a complement rather than as a basis for compressing clustering indicators.
Supplementary Information The online version of this article (10.1007/s00125-024-06147-y) contains peer-reviewed but unedited supplementary material.


Introduction
Data-driven clustering analysis has been proposed for categorising type 2 diabetes based on six clinical parameters: age, BMI, HbA1c, GAD antibodies and HOMA-2 estimates of beta cell function and insulin resistance [1]. In the study by Ahlqvist et al [1], Swedish individuals with diabetes were stratified into five subgroups: severe autoimmune diabetes, severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD) [1]. These subgroups were reproduced in other countries and cohorts, and their risk profiles studied in both the short and medium term (5 to 15 years) [2-7]. The findings suggest distinct risks of complications and molecular profiles across the subgroups [1-6,8]. For example, SIRD had a higher frequency of non-alcoholic fatty liver disease and higher risk of developing chronic kidney disease (CKD) [1], and subgroups may help to identify underlying molecular mechanisms related to the liver [8], which may provide insights into the diverse aetiology of diabetes.
As part of the Risk Assessment and ProgreSsiOn of Diabetes project (RHAPSODY, https://imi-rhapsody.eu), a new set of risk subgroups clustered based on clinical parameters was defined using Dutch and Scottish diabetes registry data and the original Swedish cohort of individuals with type 2 diabetes [9]. Given that the data originated from routine care, some clinical parameters were slightly modified due to their availability [9]. Replication analyses showed good resemblance between cohorts and also compared with the original Swedish subgroups (developed by Ahlqvist et al [1]) [9,10], except for the refinement of the original MARD cluster into two new clusters, the mild diabetes subgroup developed by RHAPSODY (RHAP-MD) and the mild diabetes with high HDL-cholesterol subgroup developed by RHAPSODY (RHAP-MDH), following the addition of HDL-cholesterol. Both RHAP-MD and RHAP-MDH exhibited slow glycaemic deterioration, but they showed significantly different molecular signatures [8].
Hence, following up on prior RHAPSODY subgroup research, the current study aims to gain more insight into the clinical relevance of subgroups by studying up to 23 years of follow-up data in two of the original RHAPSODY cohorts. Using contemporary cohorts and a significantly longer follow-up than previous studies, we wanted to: (1) estimate risk factor progression, time to macrovascular complications and treatment patterns by baseline subgroup over at least 15 years; (2) explore the added value of using data-driven subgroups compared with clustering indicators in predicting the progression of risk factors, risk of complications or treatment patterns; and (3) examine the consistency of membership to the data-driven diabetes subgroups over time. Using two distinct cohorts allowed us to validate our findings.

Study design and participants
This retrospective study investigated 9199 individuals with type 2 diabetes in two distinct cohorts: the Hoorn Diabetes Care System (DCS, the Netherlands) and the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS, Scotland). The reporting of study findings followed the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines [11], as listed in the electronic supplementary material (ESM) Appendix 1.
Our study's inclusion criteria were an age at diagnosis of ≥35 years, GAD antibody negativity and the availability of complete data for each of the five clustering indicators within 2 years of diagnosis. By omitting the requirement for genome-wide association data used in the previous RHAPSODY clustering study [9], we employed more lenient criteria, yielding a slightly larger sample size than that of Slieker et al [9].
The DCS cohort consisted of 3054 individuals (median follow-up=11.2 years) observed over the period 1998-2019 and the GoDARTS cohort consisted of 6145 individuals (median follow-up=12.3 years) over the period 2003-2018 that matched the inclusion criteria (ESM Fig. 2.1). All results were produced for both cohorts, separately.
DCS is a comprehensive dynamic prospective cohort of the natural course of type 2 diabetes from 103 general practitioners (GPs) in the West-Friesland region of the Netherlands, with over 90% of its participants being of European ancestry [12]. At baseline, 52.3% of the participants were men, with a mean age of 63 years. Educational levels varied among participants: 43.3% had a low educational level, 42.1% had a middle educational level and 14.6% had a high educational level [12]. DCS generally represents a Western European, semiurban population [12]. GoDARTS is a longitudinal cohort that includes individuals with diabetes from the Tayside region of Scotland, with more than 99% of its participants being white [13]. At baseline, 53.3% of the participants were men, with a mean age of 64 years [13]. GoDARTS generally represents a predominantly white population with diabetes in the East of Scotland [13]. Pseudonymised data were collected through electronic record linkage from primary and secondary care data sources [13]. Laboratory measurements of both cohorts have been described in detail in previous studies [9,12-14] (ESM Appendix 2).
Medication use was categorised into treatment steps (ESM Table 2.2). These were defined according to the management steps described in the Dutch GP primary care guideline [15], as the relevant guidance for DCS practitioners at the time of data collection, adding information regarding the use of statins and other medication for CVD prevention.
Clustering
Clustering was done on scaled clustering indicators at baseline, including age at baseline, BMI, HbA1c, C-peptide (as a proxy of HOMA-2 estimates of beta cell function and insulin resistance in the absence of fasting glucose in GoDARTS [9]) and HDL-cholesterol (as a risk factor for time to insulin requirement [16]). The baseline for each individual was defined as the observation nearest to diabetes diagnosis. Therefore, it should largely reflect individuals who were either untreated or who only received first line treatment for a brief period (details in Table 1). Men and women were clustered separately and then pooled to avoid sex-dependent differences. Cluster centres were defined as the arithmetic mean of all the values belonging to the cluster. Once clusters were defined, we assigned the same cluster names as those in the original study [1,9], based on the distribution of cluster characteristics and the lowest Euclidean distance from the previous study [9], including severe insulin-deficient diabetes developed by RHAPSODY (RHAP-SIDD; characterised by high HbA1c), severe insulin-resistant diabetes developed by RHAPSODY (RHAP-SIRD; characterised by high C-peptide and age at baseline), mild obesity-related diabetes developed by RHAPSODY (RHAP-MOD; characterised by high BMI), RHAP-MD (characterised by moderate risk factors) and RHAP-MDH (characterised by high HDL-cholesterol) [9].
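The clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: it assumes that scaling means standardisation to zero mean and unit SD, uses a plain Lloyd's k-means with randomly chosen initial centres, omits the sex-stratified clustering, and all function names and input values are hypothetical.

```python
import math
import random


def standardise(rows):
    """Scale every column to mean 0 and SD 1 (assumed scaling step)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assign each individual to the centre with the lowest Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centres[j])) for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Cluster centre = arithmetic mean of all values belonging to the cluster.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical toy input: [age at baseline, BMI] for six individuals.
toy = [[50, 25.0], [52, 26.0], [51, 24.5], [70, 35.0], [72, 36.0], [71, 34.5]]
labels, centres = kmeans(standardise(toy), k=2)
```

In the study setting the input would hold all five clustering indicators per individual and k would be 5; the two-column toy input is only there to keep the sketch short.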
Statistical analysis
Subgroups identified at baseline were compared with the previously published RHAPSODY subgroups [9], considering the latter as the reference. The agreement was assessed based on sensitivity, specificity, specific agreement [17], overall accuracy rate along with a 95% CI and overall κ indices of agreement [18]. Missing data (mean of 0.6% in DCS and 8.1% in GoDARTS; ESM Table 2.3) were omitted in their respective analyses to avoid excessive use of imputed data as observational evidence.
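As an illustration of the agreement measures named above, the sketch below computes overall accuracy, Cohen's κ and per-subgroup sensitivity and specificity from two label lists. It assumes an unweighted Cohen's κ and one-vs-rest sensitivity and specificity; the function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy_and_kappa(reference, predicted):
    """Overall accuracy and unweighted Cohen's kappa for two label lists."""
    n = len(reference)
    accuracy = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, kappa


def sensitivity_specificity(reference, predicted, label):
    """One-vs-rest sensitivity and specificity for a single subgroup label."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == label and p == label for r, p in pairs)
    fn = sum(r == label and p != label for r, p in pairs)
    tn = sum(r != label and p != label for r, p in pairs)
    fp = sum(r != label and p == label for r, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels (reference = published subgroups, predicted = current).
reference = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
acc, kappa = accuracy_and_kappa(reference, predicted)
sens_a, spec_a = sensitivity_specificity(reference, predicted, "A")
```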
We reported baseline characteristics for each subgroup using frequencies (%) for categorical variables or mean (SD) for continuous variables. Trajectories of related clinical parameters (BMI, HbA1c, HDL-cholesterol, systolic BP [SBP], diastolic BP [DBP], total cholesterol, LDL-cholesterol, blood creatinine and triglycerides) were visualised by plotting subgroup annual means, along with 1 SD boundaries based on observed variance within subgroups. The random intercept model was used to analyse longitudinal trajectory data with discrete subgroup membership, sex and diabetes duration as covariates.
Kaplan-Meier methods were applied to plot cumulative incidence for first events of each outcome since diagnosis of diabetes by subgroups. Group comparisons and pairwise comparisons were conducted by logrank tests, applying Benjamini-Hochberg correction [19] to adjust for multiple comparisons. A Cox regression model with diabetes duration as the time scale, left truncated at each individual's diagnosis of diabetes, was conducted to calculate the HR (95% CI). The Cox model was also adjusted for age at baseline and sex. Schoenfeld tests were applied to evaluate the proportional hazard assumption, and violation was indicated by p<0.05 [20].
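The Benjamini-Hochberg correction mentioned above for the pairwise logrank tests can be sketched as below. This is a generic implementation of the step-up procedure that returns adjusted p values; the function name and the example p values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values from four pairwise subgroup comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

A comparison would then be called significant when its adjusted p value falls below the chosen threshold.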
We visualised medication patterns reflecting the proportion of individuals within each subgroup in each treatment step over the follow-up period by area graphs. Multinomial logistic regression, in which treatment steps were dependent variables, with discrete subgroup membership, diabetes duration and sex as covariates, was conducted to compare the proportion in each treatment step between subgroups.
The models described above were re-estimated using clustering indicators at baseline (HbA1c, C-peptide, HDL-cholesterol, age and BMI), with and without discrete subgroup membership data, to analyse the longitudinal risk factor trajectories, risk of complications and medication patterns. Akaike's information criterion (AIC) and relative likelihood (RL) were applied to compare the information loss and fitting of models [21]. Smaller AIC values indicate better goodness of fit. The p value for the comparison of AIC differences was then indicated by $\mathrm{RL} = \exp\left((\mathrm{AIC}_{\min} - \mathrm{AIC}_i)/2\right)$, where $\mathrm{AIC}_{\min}$ is the lowest AIC among the compared models and $\mathrm{AIC}_i$ is the AIC of model $i$. We visualised the results on a heatmap, using colours to indicate scaled AIC and text to indicate RL.
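To make the model comparison concrete, the sketch below computes the relative likelihood of each candidate model from its AIC. It assumes the standard definition, RL = exp((AIC_min - AIC_i)/2), where AIC_min is the lowest AIC among the compared models; the model names and AIC values are invented for illustration.

```python
import math


def relative_likelihoods(aic_by_model):
    """Map each model name to exp((AIC_min - AIC_i) / 2)."""
    best = min(aic_by_model.values())
    return {name: math.exp((best - aic) / 2) for name, aic in aic_by_model.items()}


# Invented AIC values for three hypothetical model specifications.
aics = {"subgroups only": 104.0, "indicators only": 100.0, "indicators + subgroups": 98.0}
rl = relative_likelihoods(aics)
```

The best-fitting model always has RL equal to 1, and values close to 0 indicate models that lose substantially more information than the best one.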
Two clustering algorithms were repeated with durations of 2-4, 4-6 and 4-8 years from diagnosis to assess the cluster consistency over time as follows: (1) de novo clustering (i.e. repeating k-means clustering); or (2) centre-based reallocation (i.e. assigning individuals to the subgroup with the lowest Euclidean distance to cluster centres identified at baseline). The agreement between estimated subgroups over time and subgroups identified at baseline was assessed, and the cluster migration pattern was presented graphically for individuals with available clustering indicators in all four 2 year intervals (GoDARTS n=4914; DCS n=2756), along with the top ten transition trajectories. An analysis of the associated risk factors and treatment patterns was visualised in the same manner for the most representative movements. We used the Cox regression model to compare the risk of complications for those who moved between severe subgroups (including RHAP-SIRD and RHAP-SIDD) and mild subgroups (including RHAP-MD, RHAP-MOD and RHAP-MDH).
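Centre-based reallocation, as described above, amounts to a nearest-centre lookup. The sketch below is a minimal illustration: it assumes that follow-up indicators have been scaled in the same way as at baseline, and the centre coordinates, the follow-up values and the function names are all invented rather than taken from the study.

```python
import math


def reallocate(indicators, centres):
    """Return the subgroup whose baseline centre has the lowest Euclidean distance."""
    return min(centres, key=lambda name: math.dist(indicators, centres[name]))


def proportion_unchanged(baseline_labels, follow_up_labels):
    """Share of individuals whose subgroup is the same at both time points."""
    same = sum(b == f for b, f in zip(baseline_labels, follow_up_labels))
    return same / len(baseline_labels)


# Invented baseline centres in scaled units, ordered as
# [age, BMI, HbA1c, HDL-cholesterol, C-peptide]; not the study's centroids.
centres = {
    "RHAP-SIDD": [-0.5, 0.0, 2.0, -0.2, -0.3],
    "RHAP-MD": [0.0, -0.3, -0.4, -0.3, -0.3],
}

# Invented follow-up measurements for one individual, scaled like the baseline data.
follow_up = [-0.4, 0.1, -0.2, -0.2, -0.3]
new_label = reallocate(follow_up, centres)
```

Because only the stored centres are needed, a single individual can be assigned without access to the rest of the cohort.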

Baseline characteristics and the progression of clinical parameters over time
Our current subgroups identified at baseline, which were based on a larger sample size of individuals than previously published RHAPSODY subgroups [9] (2953 individuals in DCS), showed a good resemblance with an accuracy of 0.92 (95% CI 0.91, 0.93) (ESM Table 4.1), despite a slight change in clustering centroids (ESM Table 4.2).
Significant differences in baseline clustering indicators, treatment patterns and other clinical parameters were observed among subgroups identified at baseline (Table 1, ESM Figs 4.1-4.4).
Figure 1 and ESM Fig. 4.5 show that the ranking of risk factors across baseline subgroups remained relatively unchanged throughout follow-up for those risk factors used to characterise specific subgroups (e.g. the subgroup characterised by high HDL-cholesterol at baseline recorded the highest mean HDL-cholesterol during follow-up). The exception was for the trajectory of HbA1c as observed in GoDARTS (Fig. 1b), where the RHAP-MOD subgroup crossed with the RHAP-SIDD subgroup after 4 years from diagnosis and became the subgroup with the highest mean HbA1c. Random intercept models (ESM Table 4.3) indicated that subgroups' properties over time are not only visually distinct but also statistically significantly different. Specifically, compared with RHAP-SIDD, RHAP-SIRD had significantly higher creatinine (an average difference of 12.65 μmol/l across the two cohorts) and RHAP-MDH had significantly lower triglyceride (an average difference of 0.48 mmol/l).
Multiple comparisons of survival rate curves indicated that CHF and CKD incidence was significantly higher in the RHAP-SIRD subgroup than in RHAP-MOD and RHAP-MD (ESM Tables 5.4, 5.5) in both cohorts. Although these higher risks of complication might be driven mainly by the higher age of the RHAP-SIRD and RHAP-MDH subgroups, Cox models adjusted for age and sex still indicated significantly higher HRs of AMI and CKD in RHAP-SIRD compared with RHAP-MDH for both GoDARTS (AMI HR [...]
[...] cotransporter 2 inhibitors and thiazolidinediones (on average 23.83% in GoDARTS and 3.55% in DCS). Among these, the prescriptions for thiazolidinediones (6.16%) and dipeptidyl peptidase-4 inhibitors (5.15%) were the highest in GoDARTS, whereas both were less than 1% in DCS, reflecting differences in prescribing practices between the two countries.
[...] a moderate to substantial agreement over time compared with subgroups identified at baseline. The specificity (ESM Table 9.3) was 0.91 on average, while the sensitivity and specific agreement were around 0.65 and 0.64 (lowest for RHAP-SIDD with average values of 0.25 and 0.28, respectively). By the centre-based reallocation method, accuracy (0.61-0.72), κ (0.51-0.64) and the proportion of individuals staying in the same cluster (0.46-0.72) improved by an average of 4.22%, 6.39% and 5.73% compared with the de novo clustering method. By the centre-based reallocation method, in GoDARTS, the RHAP-SIRD subgroup displayed the highest stability, with 77% of individuals remaining in the same cluster for 8 years. In contrast, the RHAP-SIDD subgroup was the least stable, with only 8% of individuals staying in the same cluster (Fig. 4). The most common transitions for RHAP-SIDD were to RHAP-MD (17%) and RHAP-MDH (7%) within the initial 2 years, with individuals maintaining their position in that subgroup for the subsequent 6 years. These individuals had a higher proportion receiving insulin-based control treatment and a greater decrease in HbA1c levels than individuals who were assigned to RHAP-MD or RHAP-MDH initially and stayed for the [...]

Discussion
Using a much longer follow-up, we confirm previous findings [2] that data-driven subgroups effectively recognised individual phenotype heterogeneity, as reflected by significant differences in risk factor progression, complication risks and treatment patterns. Integrating subgroup information with clustering indicators may offer improved prediction of progression variation compared with either approach alone, emphasising the complementary role of subgroups rather than replacing continuous indicators. While most subgroups remain generally consistent over time, the RHAP-SIDD subgroup is notably volatile, indicating the necessity to expand insights from baseline subgroups to longitudinal status.
Significant differences in clinical parameters were observed not only at baseline but also over time among the subgroups. Prior research has demonstrated that the SIRD subgroup exhibited higher risks of liver disease, macroalbuminuria, nephropathy, CKD and ESRD [1,2,6,23]. Our analysis also revealed that RHAP-SIRD presented a higher risk of AMI, CHF, PVD, CKD and ESRD compared with other subgroups. By definition, subgroups varied in clustering indicators, such as age at baseline, which are among the risk factors for these complications. Upon adjusting for age, RHAP-SIRD maintained a significantly higher risk of AMI and CKD compared with other subgroups.
Treatment patterns varied significantly among subgroups, with the highest proportions of other OADs and overall glucose control treatment observed in RHAP-MOD and RHAP-SIDD subgroups, respectively. This suggests that physicians' treatment decisions for individuals within these subgroups differed, likely due to variations in age and other clustering indicators, as they were unaware of the individuals' subgroup membership. The significant differences in disease progression, complication risks and treatment patterns among subgroups highlight their utility in understanding the underlying pathways of disease progression. Slieker et al [8] demonstrated that diabetes subgroups reveal distinct molecular mechanisms in key metabolic tissues, uncovering varied causes of the disease that are not apparent when it is viewed uniformly. Beyond aiding in aetiological understanding, subgroups may also be useful for predictive purposes. However, data-driven subgroups have been criticised for their unsuitability in predicting outcomes, such as drug response or complications [4,24]. Our study partially supports this critique, as we found that using the clustering indicators may perform better than solely using subgroups for prediction. This is due to subgroups compressing data from several individual indicators, leading to information loss. However, we found that combining subgroup membership (e.g. SIDD) with the clustering indicators (e.g. age, BMI, etc.) often enhanced the performance of the progression models, indicating a potential predictive benefit from including subgroup information.
As expected, allocating individuals with long diabetes duration based on the lowest distance to baseline centroid leads to higher consistency of baseline subgroups. Practically, using cluster centres enables easy assignment of individuals to subgroups without requiring information about other individuals. To enhance accuracy, cluster centres can be periodically updated according to the latest cohort characteristics, similar to routine updates in risk prediction models. Furthermore, our study revealed RHAP-SIRD to be the most consistent subgroup over time, with over 70% of individuals remaining for over 8 years, signifying its distinct, partially divergent aetiology. This aligns with prior research identifying SIRD as the most genetically unique subgroup [25], exhibiting an insulin resistance molecular signature [8] and lacking associations with the type 2 diabetes locus in the TCF7L2 gene or insulin secretion risk scores, contrary to SIDD and MOD [1,25-27].
Ahlqvist's original study was designed to deepen the understanding of diabetes heterogeneity and enhance individualised treatment by identifying baseline phenotypes [1]. To fully benefit from the long follow-up information available, we expanded this concept to include more than just baseline subgroups, attempting to explore the dynamics of disease. As expected, we observed changes in subgroup memberships over time, reflecting the combination of treatment effects and underlying phenotypes. For example, we found that more than 28% of individuals transitioned to other subgroups after 2 years. These temporal dynamics might be shaped by interactions between disease heterogeneity, adherence to treatment and treatment efficacy. Diabetes heterogeneity, such as distinct molecular signatures and genetic characteristics [1,8], may result in individuals consistently belonging to specific subgroups with unique phenotypes. However, the treatment meanwhile aims to shift individuals toward milder subgroups. For example, newly diagnosed individuals who subsequently meet guideline-based treatment targets (53 mmol/mol (7%) HbA1c [28,29], 0.9 mmol/l HDL-cholesterol [30], 25 kg/m2 BMI [31]) will either remain or progress to the RHAP-MD subgroup over time, whereas insufficient risk factor control could result in increased progression to severe subgroups.
The longitudinal nature of our data allowed us to estimate the impact of changes in subgroup membership over time. We found that complication risks were more closely associated with individuals' current subgroups rather than the initial subgroups they were assigned at baseline. The risks of complications for individuals progressing from mild to severe subgroups were similar to those for individuals initially allocated to and remaining in severe subgroups. Also, individuals progressing from severe to mild subgroups showed complication risks lower than for those who remained in severe subgroups. Thus, an initial allocation to a mild subgroup did not necessarily translate into mild progression, and efforts should aim at achieving or maintaining mild subgroup status. This might suggest the importance of periodically re-clustering with changing risk factors as the disease progresses to capture the evolving dynamics and guide more informed decision-making.
Our study is not without limitations. First, C-peptide, one of the five clustering indicators, was assumed to be constant, due to the lack of follow-up data. This might overestimate subgroup consistency, but its impact is likely limited due to C-peptide's stability [1]. Second, we estimated the treatment pattern from observed data and ignored censoring (ESM Figs 10.1, 10.2), which might underestimate the proportion of individuals taking the most intensive treatment steps. Third, due to the unavailability of fasting glucose data in GoDARTS, we were unable to replicate Ahlqvist's subgroups within this registry. Ahlqvist's method captures two key pathogenic mechanisms: insulin deficiency and resistance, indicated by HOMA-IR and HOMA-B. We used C-peptide instead, which may obscure the pathology link with type 2 diabetes. Nevertheless, considering the high sensitivity and specificity of RHAP-SIDD (72% and 100%) and RHAP-SIRD (67% and 89%) in relation to Ahlqvist's subgroups (ESM Fig. 11.1), our findings for RHAP-SIDD and RHAP-SIRD may offer insights for Ahlqvist's subgroups. Of note, SIDD had worse beta cell function than other subgroups described by Ahlqvist et al [1], and this was partially conveyed by the lower C-peptide of RHAP-SIDD among the RHAPSODY subgroups. Since C-peptide is generally stable over time [9], but beta cell function progressively declines [32], we might expect even worse stability for SIDD in Ahlqvist's subgroups. Fourth, DCS registered events were based on self-report, which could lead to an underestimation of events. However, a validation study found events to be well reported, with 86% sensitivity and 90% specificity [12]. Finally, our cohorts, predominantly consisting of white individuals, may limit the generalisability of findings to other settings.
In conclusion, the significant differences observed in subgroups' trajectories raise the possibility of identifying and understanding different phenotypes of type 2 diabetes. Also, subgroup information may improve prediction when added as a predictor. This lays the foundation for considering diabetes subgroups as complementary to, rather than replacements for, individual indicators.