Cohort descriptions
Data from 15,940 individuals with type 2 diabetes from three cohorts, DCS (Netherlands), GoDARTS (Scotland) and ANDIS (Sweden), were used in this cross-sectional study within the RHAPSODY consortium. RHAPSODY (Risk Assessment and ProgreSsiOn of Diabetes, https://imi-rhapsody.eu) is an Innovative Medicines Initiative project. One of its aims is to improve the segmentation of people with type 2 diabetes, supporting the implementation of novel strategies for diabetes prevention and treatment. Inclusion criteria for RHAPSODY were: age at diagnosis ≥35 years; clinical data available within 2 years after diagnosis; GAD antibody negativity; no missing data for any of the five clinical measures used for clustering; and availability of genome-wide association study (GWAS) data.
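The inclusion criteria amount to a per-participant filter. The Python sketch below is illustrative only: the `Participant` record, its field names and the reading of "within 2 years after diagnosis" as at most 2 years are hypothetical assumptions, not taken from the RHAPSODY data dictionary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names for the five clustering variables
CLUSTER_VARS = ("age", "bmi", "hba1c", "hdl", "c_peptide")


@dataclass
class Participant:
    age_at_diagnosis: float       # years
    years_since_diagnosis: float  # time from diagnosis to the clinical measurements
    gad_positive: bool            # GAD antibody status
    has_gwas: bool                # genome-wide genotype data available
    age: Optional[float] = None
    bmi: Optional[float] = None
    hba1c: Optional[float] = None
    hdl: Optional[float] = None
    c_peptide: Optional[float] = None


def meets_inclusion(p: Participant) -> bool:
    """Apply the five inclusion criteria to a single (hypothetical) participant record."""
    return (
        p.age_at_diagnosis >= 35
        and 0 <= p.years_since_diagnosis <= 2  # assumption: "within 2 years" read as <= 2 years
        and not p.gad_positive
        and all(getattr(p, v) is not None for v in CLUSTER_VARS)  # no missing clustering variable
        and p.has_gwas
    )
```

A cohort-level count, such as the per-cohort totals reported below, would then be `sum(meets_inclusion(p) for p in records)` over that cohort's records.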
Hoorn DCS cohort
The Hoorn Diabetes Care System (DCS) cohort is an open prospective cohort, started in 1998, that currently includes over 14,000 individuals with type 2 diabetes from the north-west of the Netherlands [10]. The study was approved by the Ethical Review Committee of the Vrije Universiteit University Medical Center, Amsterdam. Participants visit the DCS annually for monitoring of their diabetes. During this visit, multiple measurements, including anthropometric and laboratory measurements, are collected as part of routine care. Data were used anonymously; individuals were informed about the use of their data and were offered an opt-out. All laboratory measurements were performed on fasting samples. HbA1c was measured with a turbidimetric inhibition immunoassay on haemolysed whole EDTA blood (Cobas c501, Roche Diagnostics, Mannheim, Germany; run CV 1.6%) [10]. HDL-cholesterol (mmol/l) was measured enzymatically (Cobas c501, Roche Diagnostics). C-peptide was measured on a DiaSorin Liaison analyser (DiaSorin, Saluggia, Italy). In total, 2953 individuals met the inclusion criteria.
GoDARTS
For clinical purposes, individuals with diabetes mellitus from the Tayside region of Scotland (population 391,274 in January 1996) were entered into the Diabetes Audit and Research Tayside Study (DARTS) register [11]. Retrospective and prospective longitudinal anonymised data were collected, including prescribing, biochemical and clinical data. All laboratory measurements were performed on non-fasting samples. People with type 2 diabetes were invited to participate in the Genetics of DARTS study (GoDARTS), which currently includes over 10,000 individuals with type 2 diabetes [11]. The GoDARTS study was approved by the Tayside Medical Ethics Committee, and informed consent was obtained from all participants. C-peptide was measured on a DiaSorin Liaison analyser. In total, 5509 individuals met the inclusion criteria.
ANDIS
The ANDIS (All New Diabetics In Scania) cohort aims to recruit all people with incident diabetes in Scania County, Sweden. Recruitment ran from January 2008 to November 2016. Participants were included close to diagnosis (median 40 days after diagnosis, IQR 12–99). All laboratory measurements were performed on fasting samples. HbA1c values were obtained from the Clinical Chemistry database. C-peptide was determined with an electrochemiluminescence immunoassay on a Cobas e411 analyser (Roche Diagnostics) or by radioimmunoassay (Human C-peptide radioimmunoassay; Linco, St Charles, MO, USA; or Peninsula Laboratories, Belmont, CA, USA). In total, 7478 individuals met the inclusion criteria.
Statistical analysis
Clustering was performed on five risk factors for type 2 diabetes progression [12]: age at first visit (years), BMI (kg/m2), HbA1c (mmol/mol), HDL-cholesterol (mmol/l) and C-peptide (nmol/l). C-peptide was included as a proxy for insulin resistance and, to some extent, beta cell function (electronic supplementary material [ESM] Table 1), because fasting glucose was not available in GoDARTS, which precluded the use of HOMA. HDL-cholesterol was included because lower HDL-cholesterol has previously been recognised as a risk factor for earlier insulin requirement [12].
Clustering was performed separately in each cohort and stratified by sex. Clusters were defined with k-means using the kmeansruns function in the R package fpc (https://cran.r-project.org/web/packages/fpc/index.html). The optimal number of clusters was determined with the gap statistic across the three cohorts [13], taken as the point at which the curve of the gap statistic against the number of clusters flattened, i.e. where further increasing the number of clusters added little.
The stability of the clusters was assessed in two ways. First, the clusters identified here in ANDIS using C-peptide instead of HOMA2 were compared with the previously published ANDIS clusters based on HOMA2 [1]. Second, the identified clusters were cross-validated between cohorts. For this, individuals from cohort A were assigned to clusters based on the cluster centres identified in cohort B. This quantifies the probability that an individual in cohort A is assigned to the same cluster under the clustering model of cohort B. These predicted clusters were then compared with the ‘real’ clusters of cohort A. This was done for each of the three pairwise comparisons (DCS–GoDARTS, DCS–ANDIS and GoDARTS–ANDIS). Agreement between clusters was assessed using sensitivity and specificity.
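A minimal pure-Python sketch of the clustering step is given below. It assumes the five variables are already on comparable scales; the text does not say whether they were standardised. k-means is run from several starts and the solution with the lowest within-cluster sum of squares (W_k) is kept, separately for each sex. This only approximates what fpc's kmeansruns does. The k-means++ seeding, the helper names and the formal rule in `choose_k` are our own choices. That rule is the criterion of Tibshirani et al. (smallest k with Gap(k) ≥ Gap(k+1) − s_{k+1}), not the visual flattening criterion used in the study.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _sqdist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two feature vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans_once(data: List[Point], k: int, rng: random.Random, max_iter: int = 100):
    # k-means++ seeding: each new centre is drawn with probability proportional
    # to its squared distance from the nearest centre chosen so far
    centres = [rng.choice(data)]
    while len(centres) < k:
        d2 = [min(_sqdist(p, c) for c in centres) for p in data]
        centres.append(rng.choices(data, weights=d2)[0])
    for _ in range(max_iter):
        # Lloyd step: assign each point to its nearest centre...
        labels = [min(range(k), key=lambda j: _sqdist(p, centres[j])) for p in data]
        # ...then move each centre to the mean of its members (keep it if empty)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[j])
        if new == centres:
            break
        centres = new
    wss = sum(_sqdist(p, centres[lab]) for p, lab in zip(data, labels))
    return wss, labels, centres


def kmeans(data: Sequence[Sequence[float]], k: int, n_starts: int = 10, seed: int = 0):
    """Run k-means from n_starts seeds; return (wss, labels, centres) of the best run."""
    rng = random.Random(seed)
    pts = [tuple(map(float, p)) for p in data]
    runs = [_kmeans_once(pts, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[0])


def cluster_by_sex(records: Sequence[Tuple[str, Sequence[float]]], k: int) -> Dict[str, tuple]:
    """Cluster each sex stratum separately; records are (sex, features) pairs."""
    return {sex: kmeans([x for s, x in records if s == sex], k)
            for sex in sorted({s for s, _ in records})}


def gap_statistic(data: Sequence[Sequence[float]], k_max: int, n_refs: int = 10, seed: int = 0):
    """Gap(k) = mean over references of log W*_k minus log W_k, for k = 1..k_max.

    Reference data sets are drawn uniformly over each variable's observed range."""
    rng = random.Random(seed)
    pts = [tuple(map(float, p)) for p in data]
    lows = [min(col) for col in zip(*pts)]
    highs = [max(col) for col in zip(*pts)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans(pts, k, seed=seed)[0])
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in pts]
            ref_logs.append(math.log(kmeans(ref, k, seed=rng.randrange(2**31))[0]))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        s.append(sd * math.sqrt(1 + 1 / n_refs))  # simulation-error adjustment
    return gaps, s


def choose_k(gaps: List[float], s: List[float]) -> int:
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); falls back to k_max."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)
```

Running several starts and keeping the best W_k mirrors the motivation for kmeansruns: a single k-means run can stop in a poor local optimum. The gap statistic compares W_k with its expectation under a structureless reference distribution, so it rises while added clusters capture real structure and levels off afterwards.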
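The cross-cohort validation can be sketched as nearest-centre assignment followed by per-cluster sensitivity and specificity. This sketch assumes that cluster labels have already been matched between cohorts (here by shared names) and that cohort A's data are on the same scale as cohort B's centres. The text specifies neither detail, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def assign_to_centres(data: Sequence[Sequence[float]],
                      centres: Dict[str, Sequence[float]]) -> List[str]:
    """Label each individual in cohort A with the nearest cohort-B centre (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centres, key=lambda name: sqdist(p, centres[name])) for p in data]


def sensitivity_specificity(real: Sequence[str],
                            predicted: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Per cluster c: sensitivity = P(pred = c | real = c), specificity = P(pred != c | real != c)."""
    result = {}
    pairs = list(zip(real, predicted))
    for c in sorted(set(real)):
        tp = sum(r == c and p == c for r, p in pairs)
        fn = sum(r == c and p != c for r, p in pairs)
        tn = sum(r != c and p != c for r, p in pairs)
        fp = sum(r != c and p == c for r, p in pairs)
        spec = tn / (tn + fp) if tn + fp else float("nan")
        result[c] = (tp / (tp + fn), spec)
    return result


# Toy example with made-up centres and individuals
centres_b = {"X": (0.0, 0.0), "Y": (10.0, 10.0)}
cohort_a = [(1.0, 1.0), (9.0, 9.0), (2.0, 2.0), (6.0, 6.0)]
real_a = ["X", "Y", "X", "X"]
pred_a = assign_to_centres(cohort_a, centres_b)  # ['X', 'Y', 'X', 'Y']
print(sensitivity_specificity(real_a, pred_a))
```

In the toy example, the fourth individual lies closer to centre Y and is therefore misassigned. This lowers the sensitivity for X and the specificity for Y to 2/3.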
Time to insulin requirement was defined as the time until an individual either started sustained insulin treatment (lasting more than 6 months) or met the criterion for insulin requirement, defined as ≥2 HbA1c measurements >69 mmol/mol (8.5%) at least 3 months apart while on ≥2 non-insulin glucose-lowering drugs. In each cohort, Cox proportional hazards models were used to compare each cluster against all other clusters combined as the reference group. The cohort-specific results were then combined by random-effects meta-analysis using the metagen function from the R package meta (https://cran.r-project.org/web/packages/meta/index.html). Analyses were performed in R (version 3.6.2; https://www.r-project.org/). Figures were produced using the R packages ggplot2 (v3.3.0) (https://cran.r-project.org/web/packages/ggplot2/index.html) and OmicCircos (v1.22.0) (http://www.bioconductor.org/packages/release/bioc/html/OmicCircos.html).
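The endpoint definition can be expressed as a small event-detection routine. The text does not fix several details, so this hypothetical sketch makes the following assumptions. "More than 6 months" is taken as >183 days and "at least 3 months apart" as ≥91 days. For the HbA1c criterion, the event date is the date of the second qualifying measurement. Follow-up starts at a given baseline date.

```python
from datetime import date
from typing import List, Optional, Tuple

# Assumed operationalisations (not fixed by the text):
SUSTAINED_INSULIN_DAYS = 183   # "more than 6 months" of insulin treatment
MIN_APART_DAYS = 91            # "at least 3 months apart"
HBA1C_THRESHOLD = 69.0         # mmol/mol (8.5%); values must exceed this
MIN_NON_INSULIN_DRUGS = 2


def time_to_insulin_requirement(
    baseline: date,
    insulin_periods: List[Tuple[date, date]],
    hba1c: List[Tuple[date, float, int]],
) -> Optional[int]:
    """Days from baseline to the first qualifying event, or None if no event (censored).

    insulin_periods: (start, end) of each insulin treatment episode.
    hba1c: (date, HbA1c in mmol/mol, number of non-insulin glucose-lowering drugs in use).
    """
    event_dates = []
    # Criterion 1: start of a sustained insulin episode
    for start, end in insulin_periods:
        if (end - start).days > SUSTAINED_INSULIN_DAYS:
            event_dates.append(start)
    # Criterion 2: two HbA1c values above threshold, >= 3 months apart, on >= 2 drugs
    high = sorted(d for d, value, n_drugs in hba1c
                  if value > HBA1C_THRESHOLD and n_drugs >= MIN_NON_INSULIN_DRUGS)
    for later in high:
        # high is sorted, so comparing with the earliest value finds the first valid pair
        if (later - high[0]).days >= MIN_APART_DAYS:
            event_dates.append(later)  # date of the second qualifying measurement
            break
    return (min(event_dates) - baseline).days if event_dates else None
```

The returned durations, together with a censoring flag (None meaning no event), are the inputs a Cox proportional hazards model would need.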
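To illustrate the pooling step, the sketch below performs random-effects meta-analysis of per-cohort log hazard ratios using the DerSimonian–Laird estimator of between-cohort variance (tau²). It is not a reimplementation of metagen. metagen supports several between-study variance estimators, its default has changed across versions of meta, and the text does not state which estimator was used. The input numbers are invented.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(estimates: Sequence[float],
                      std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooling; returns (pooled estimate, its SE, between-cohort variance tau^2)."""
    w = [1.0 / se ** 2 for se in std_errors]                       # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)    # fixed-effect mean
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    df = len(estimates) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0                # DL moment estimator
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]           # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


# Hypothetical per-cohort log hazard ratios (one cluster vs the rest) and their SEs
log_hr, se, tau2 = dersimonian_laird([math.log(1.8), math.log(2.1), math.log(1.5)],
                                     [0.10, 0.12, 0.09])
print(f"HR {math.exp(log_hr):.2f} "
      f"(95% CI {math.exp(log_hr - 1.96 * se):.2f}-{math.exp(log_hr + 1.96 * se):.2f}), "
      f"tau^2 {tau2:.3f}")
```

When the cohorts agree beyond what sampling error explains, tau² is truncated at zero and the result reduces to the fixed-effect estimate. Larger tau² spreads the weights more evenly across cohorts and widens the confidence interval.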