Rates of diabetic retinopathy among cluster analysis—identified type 2 diabetic mellitus subgroups

Purpose To determine whether phenotypic clustering of patients with diabetes mellitus (DM) is associated with more advanced diabetic retinopathy (DR). Methods Retrospective cohort study of 495 patients with no prior DR treatment seen at a tertiary care clinic 2014–2020. Four previously identified clusters from Ahlqvist’s 2018 paper were reproduced utilizing baseline hemoglobin A1c, body mass index, and age at DM diagnosis. Age-adjusted Cox proportional hazard ratios were used to compare clusters with reference as the lowest risk cluster. Results All four type 2 DM clusters were replicated with our cohort. There was a significant difference in racial distribution among clusters (p = 0.018) with severe insulin-resistant diabetes (SIRD) having the higher percentage of Caucasians and lower percentage of Hispanics compared to other groups and a higher percentage of African Americans comprising the severe insulin-deficient diabetes (SIDD) cluster than other groups. Rates of proliferative diabetic retinopathy were higher in mild obesity-related diabetes (MOD) (28%), SIDD (24%), mild age-related diabetes (MARD) (20%), and lowest in SIRD (7.9%), overall p = 0.004. Rates of vitreous hemorrhage were higher in MOD (p = 0.032) and MARD (0.005) compared to SIRD. Conclusion Baseline clinical measures may be useful in risk stratifying patients for progression to retinopathy requiring intervention.


Introduction
As more is understood about the underlying pathophysiology of diabetes mellitus (DM), it has become clear that type 1 and type 2 are not sufficient in distinguishing the different phenotypes that present due to the complexity of the disease.An emerging body of literature in endocrinology has set out to define the nuances within the catch-all diagnosis that is type 2 diabetes mellitus (T2DM).
Ahlqvist's seminal 2018 Lancet study, utilized 8980 newly diagnosed diabetic patients from the Swedish All New Diabetics in Scania (ANDIS) cohort followed over 10 years, demonstrated novel subgroups of adult-onset diabetes using datadriven cluster analysis of baseline characteristics [1].The six variables of age at diagnosis, body mass index (BMI), hemoglobin A1c (HbA1c), glutamate decarboxylase antibodies, and homeostatic model assessment 2 estimates of beta-cell function and insulin resistance (HOMA-2B, HOMA-2IR) were used to create five clusters.These include severe autoimmune diabetes (SAID), severe insulin deficient diabetes (SIDD), severe insulin resistant diabetes (SIRD), mild obesity-related diabetes (MOD), and mild age-related diabetes (MARD).
SAID was defined by the presence of glutamic acid decarboxylase antibodies (GADA) as present in type 1 diabetes and latent autoimmune diabetes as well as characterized by low insulin secretion and poor metabolic control.SIDD, while lacking auto-antibodies, was also identified by low insulin secretion and poor metabolic control with the associated findings of increased risk of the microvascular processes of retinopathy and neuropathy.Interestingly, SIRDdefined by its insulin resistance, late onset, and association with obesity-was associated with nephropathy, another microvascular condition, as well as fatty liver disease.MOD was associated with obesity and early age of onset, while MARD was defined by its late age of onset, mild metabolic derangements, and low risk of complications [1].
These clusters have since been reproduced in several studies which have included more diverse populations [2DDI.Of note, Kahkoska and colleagues were able to reproduce the clusters retrospectively from three global cardiovascular outcomes trials using just three readily attainable variables: age at diagnosis, BMI, and HbA1c [2].These cited studies assessed numerous outcome variables including risk of diabetic complications.However, one shared limitation of assessment of diabetic complications was treating diabetic retinopathy (DR) as a binary outcome: present or absent.The severity of and complications related to DR are on a spectrum, secondary to the degree of capillary leakage, capillary occlusion, and retinal ischemia that underlie the disease process [3].Retinal ischemia in DR ultimately leads to sequelae including retinal neovascularization, vitreous hemorrhage (VH), tractional retinal detachment (TRD), and neovascular glaucoma (NVG).This project sought to assess whether clusters were reproducible in a retrospective cohort of patients with DR and if these clusters were associated with risk of advanced DR.

Methods
This retrospective cohort study was approved by the Colorado Multiple Institutional Review Board and the study complied with the Health Insurance Portability and Accountability Act of 1996 and the Declaration of Helsinki.Patients seen at the Denver Health Eye Clinic at Denver Health Medical Center (DHMC) (Denver, Colorado) between January 2014 and December 2020 with any stage of DR were considered eligible for inclusion.Inclusion criteria were a complete electronic medical record (EMR) and DR treatment-naïve status.Data was captured for initial entry into the county hospital ophthalmology clinic as well as annual follow-up points as available.
Exclusion criteria were deceased status at time of data collection, type 1 DM diagnosis, other significant ocular pathology, correctional facility status, and/or absence of the three required cluster variables: HbA1c, body mass index (BMI), and age at DM diagnosis.Because type 1 DM diagnosis was an exclusion criteria and GADA antibodies were not available, the SAID cluster was not identified or analyzed in this study.
For each patient who met inclusion criteria, the following variables were abstracted into a secure database: age, age at time of diabetes diagnosis, sex, race/ethnicity, diagnosis of hypertension, diagnosis of chronic kidney disease, HbA1c, BMI, microalbumin, albumin: creatinine ratio, lipid panel, alanine transaminase (ALT), DR severity at initial and final visits, NVG, VH, diabetic macular edema (DME), and DR treatment received.Both eyes of each patient were included in the study.When an event occurred in one eye, the patient remained in the study and contributed data from the fellow eye.
Following Kahkoska and colleagues' method in validation of T2DM clusters in the DEVOTE, LEADER, and SUSTAIN cardiovascular outcomes trials [2], we utilized the techniques described below.Scaled and centered values of the baseline HbA1c, baseline BMI, and age at the time of T2DM diagnosis were used for clustering [2].Males and females were assigned to clusters separately based on the shortest Euclidean distance from values of each individual's clustering variables to cluster centroids previously identified by Ahlqvist, who used a K-means clustering algorithm to find centroids [1].Cluster performance was assessed by computing the subject-specific ratio of the Euclidean distance to the assigned cluster centroid to the Euclidean distance to the next closest cluster centroid.A ratio closer to 0 indicates a strong association between a subject and the assigned cluster, and a ratio closer to 1 indicates a weak association [2].
Basic frequencies and percentages are presented for categorical variables.Means, standard deviations (SD), and medians are presented for continuous variables.The summary measures are presented for the entire cohort and for each cluster group.Demographic and clinical data were analyzed for differences across clusters using the Kruskal-Wallace rank sum test for continuous variables, and Fisher's exact test for categorical variables.Post-hoc pairwise comparisons to assess differences in rates of proliferative diabetic retinopathy (PDR) and VH between clusters were performed using Fisher's exact test with Bonferroni-corrected p-values.To determine the association between PDR and variables of interest, unadjusted logistic regressions and logistic regressions adjusted for assigned cluster were fitted.Differences in baseline HbA1c and the mean of 1-3 years of follow-up HbA1c were assessed using Welch's two-sample t-test.Statistical analysis was performed using R version 4.1.3[4].p-values less than 0.05 indicated statistical significance for this study.

Results
Four hundred sixteen DR treatment-naïve patients were seen for DR between January 2014 and December 2020 and met inclusion criteria.One hundred three subjects were assigned to SIDD, 101 subjects were assigned to SIRD, 78 subjects were assigned to MOD, and 134 subjects were assigned to MARD. Figure 1 provides a visualization of clustering by combinations of clustering variables.The mean and median of the distance ratio of assigned cluster centroid to next closest cluster centroid (described in more detail above) was 0.685 (SD = 0.205), and 0.710 (IQR = 0.306).The distribution of the three clustering variables among the four clusters is further illustrated by box plot of Fig. 2.
Baseline characteristics stratified by cluster are presented in Table 1.While 55% of the cohort was male, gender did not significantly vary between cohorts (p = 0.590).The mean age of study patients at study enrollment was 57.6 (SD = 10.2) years and varied significantly across clusters in an all group comparison (p < 0.001) with the MOD cluster having the lowest average age at 49.7 (SD = 8.9) years.There was a high percentage of patients reporting Hispanic ethnicity Albumin:creatinine ratio (ACR) was significantly different across cohorts (p = 0.019).SIRD had the lowest ACR, 219.7 (SD = 506.5)and also had the lowest proportion of diagnosis of chronic kidney disease at 29%, which differed across cohorts (p = 0.033).LDL was significantly different across clusters (p = 0.009) and was highest in SIDD with an average of 106.9 (SD = 36.1)mmol/L.ALT was also significantly different across clusters (p = 0.043) and was highest in SIRD at 43.8 (SD = 63.7)IU/L.SIDD had the greatest reduction in HbA1c (p < 0.001) when comparing HbA1c at time of establishing care at the county hospital to the average over the following 3 years (shown in Table 3).This corresponds to SIDD having the highest A1c at presentation.
The incidence of stages of DR and its sequelae including VH, DME, and NVG is shown by the cluster in Table 2.A significant difference across clusters was found for severity of DR (p = 0.004) and incidence of VH (p = 0.007).Post hoc pairwise comparisons of the proportion of PDR by cluster showed that SIDD (24%) and MOD (28%) had significantly higher rates of PDR compared to the cluster with the lowest rate of 7.9%, SIRD (Bonferroni adjusted p-values 0.012 and 0.003, respectively) (Table 4).When adjusted for cluster assignment, age at DM diagnosis (OR = 0.969, 95% CI: 0.942-0.995),baseline microalbumin (OR = 1.003, 95% CI: 1.001-1.006),and ACR (OR = 1.0003, 95% CI: 1.0001-1.0004)were significantly associated with the odds of PDR (p = 0.022, p = 0.002, p < 0.001, respectively) (Table 5).Increasing age at DM diagnosis was associated with reduction in the odds of PDR.Higher levels of baseline microalbumin and higher albumin-creatinine ratio were associated with increase in the odds of PDR.Table 6 shows post hoc pairwise comparisons of the proportion of VH by cluster in which MOD and MARD had significantly higher proportions of VH compared to SIRD (Bonferroni adjusted p-values 0.032 and 0.005).Rates of DME did not vary across clusters (Table 2).

Discussion
To the best of our knowledge, this is the first study to utilize cluster analysis in evaluating ophthalmic complications in T2DM.Prior cluster studies in diabetes analyzed  primarily whether DR was present or not.In contrast, our focus was determining whether certain clusters were associated with DR complications such as PDR, VH, DME, or NVG.Ultimately, being able to determine those that are most at risk of progression or certain complications will allow for targeted prevention and early treatment.
In Ahlqvist and colleagues' work, they found that SIDD had a significantly higher risk of DR compared with MARD in the ANDIS cohort and that SIDD and MOD had a significantly higher risk of DR than MARD in the ANDIU cohort [1].Anjana and colleagues studying an Indian population similarly showed that SIDD had the highest risk of DR and that it was significantly higher than MARD [5].Our analysis showed more specifically that both SIDD and MOD had a higher risk of PDR compared to SIRD.While age at study enrollment did significantly vary across cohorts, SIDD and MOD had the lowest mean ages.This observation was  substantiated by the results of logistic regression analysis, which showed that when controlling for cluster assignment, increasing age is associated with a decrease in the odds of PDR.In addition, we found that increases in baseline microalbumin and ACR are associated with an increase in the odds of PDR.Despite this higher risk of PDR in SIDD, it was the MOD and MARD groups which had significantly higher risk of VH than SIRD.No clear risk factors inherent to the MOD/ MARD groups were identified to explain the higher risk VH in these patients.A 2017 study examining levels of VEGF in primary vitrectomy for late VH found significantly higher levels of VEGF in eyes that developed late VH and identified iris neovascularization, hypertension, and proteinuria as risk factors [6].Hypertension did not significantly vary across clusters in our study to indicate that blood pressure was driving the risk of VH in the MOD and MARD groups.
SIDD appears to have poorer metabolic control as this cluster has the highest average HbA1c on presentation at 12.8 mmol/mol, but also had a significant decrease in HbA1c after establishing care as evidenced by the mean HbA1c of 9.7 mmol/mol over the following 3 years (p < 0.001)-a finding which was not observed in the other cohorts.SIDD cluster patients were severely insulin deficient compared to other clusters likely leading to higher elevations of HbA1c when noncompliant compared to other groups that had mild metabolic derangements.The decrease in subsequent HbA1c in these patients was likely due to starting insulin therapy after presenting to medical provider in our system and reflects the importance of initiation of insulin therapy in this insulin deficient group.
While Ahlqvist et al. identified SIRD as having the highest risk of nephropathy, our analysis found that SIRD had the lowest creatinine:microalbumin ratio of the groups and lowest rate of known kidney damage; SIDD meanwhile had the highest.Population differences as well as differences in diabetes duration between the two studies likely contribute to the contradictory findings regarding nephropathy in our study.Given that diabetic kidney disease is uncommon if disease duration is less than one decade [7].In Ahlqvist's study, patients were followed for only 10 years after diabetes mellitus diagnosis and could have had different rates of nephropathy compared to our population.Furthermore, there are marked differences in epidemiology of diabetic kidney disease between different ethnic/racial groups with African Americans, Hispanics, and Native Americans having a much higher risk of developing end-stage kidney disease than non-Hispanic Whites [8].SIRD did have the highest ALT of the clusters in our analysis, which was consistent with both Ahlqvist's findings regarding SIRD having the highest rate of fatty liver disease and Zaharia's study demonstrating that SIRD had the highest hepatocellular lipid content at diagnosis and highest rate of hepatic fibrosis at 5-year follow-up [1,8].Some of this variability could be explained by different study populations.Our study population was predominantly safety-net hospital population with end stage diseases.Furthermore, significant differences in demographics should be noted between prior studies and our study population.
One of the limitations of our study was that our study population only included patients with DR.Patients without a diagnosis of DR were not included in our cluster analysis as they were not recorded in our database.Therefore, some of the characteristics of our clusters are likely skewed to the severe spectrum of diabetic disease.Associations with other diabetic complications such as nephropathy should therefore be interpreted with caution given the selective cohort.Another limitation of our study was the retrospective nature of our data collection.The longitudinal follow-up of our patients started after diagnosis of DR rather than their initial diabetes mellitus diagnosis.Consequently, some of our patients were either newly diagnosed diabetes mellitus patients or patients with longstanding history of systemic diabetic disease.Also, our study was conducted at a large metropolitan safety-net hospital serving mostly lower socioeconomic patients who were either uninsured or underinsured.Cluster analysis of this population might not be representative of other populations.Furthermore, we observed a relatively large number of subjects with high cluster distance ratio.The distribution of distance ratio in our cohort was similar to what was observed by Kahkoska [2] and may indicate the importance of HOMA2-B and HOMA2-IR (the additional clustering variables used by Ahlqvist) in cluster performance [1].

Conclusion
Type 2 diabetes mellitus appears to have significant variability in severity, complication risk, and response to treatment.The pathophysiologic cause of this variability is still under study and not yet well understood.A big data approach with cluster analysis of demographic and clinical biomarkers has been used by Alhqvist et al. to help distinguish these phenotypic variants.In our study, we evaluated these clusters for risk of advanced DR and its complications.We found that rates of PDR were higher in MOD, SIDD, and MARD clusters relative to SIRD and that rates of VH were higher in MOD and MARD compared to SIRD.Our study population was selective for patients with DR and thus does not represent the full spectrum of disease of DM, limiting its generalizability to other study populations and potential application to new diabetics.Nevertheless, the diverse population presenting with more advanced stages of diabetes is reflective of many communities within the USA and thus may have certain advantages in terms of generalizability over other international studies.Further research is needed to further define these diabetic variants so physicians can tailor treatment strategies and practice personalized medicine.

Fig. 1
Fig. 1 Cluster plot.Visualization of clustering by combinations of clustering variables.Subjects were assigned clusters based on the smallest Euclidean distance from a subject to the nearest cluster centroid.Clustering variables include age at DM diagnosis, baseline BMI, and baseline A1C.Values for each variable were centered around their mean and scaled to a standard deviation of 1. Squares in each panel represent cluster centroids identified by Ahlqvist, 2018.Sizes of shapes in each panel represent larger values of the clustering variable not included in the axes of the panel

Fig. 2
Fig. 2 Box plot.Box plots showing distribution of clustering variables among clusters.Boxes represent the 25th, median, and 75th percentiles.Whiskers extend 1.5 times the interquartile range

Table 1
Demographic and clinical factors

Table 1 (
Bold highlight statistical significance p <0.051Kruskal-Wallis rank sum test2Fisher's exact test for count data with simulated p-value (based on 2000 replicates) Abbreviations for the clusters above refer to SIDD severe insulin deficient diabetes, SIRD severe insulin resistant diabetes, MOD mild obesityrelated diabetes, and MARD mild age-related diabetes

Table 5
Logistic regression analysis of proliferative diabetic retinopathy