Introduction

Type 1 diabetes is a chronic autoimmune disease with increasing incidence worldwide [1,2,3]. Individuals with type 1 diabetes experience significant disease burden [4,5,6], making prevention a priority. Current strategies for type 1 diabetes prevention focus on modifying the presymptomatic disease course in individuals with significant disease risk. Presymptomatic type 1 diabetes is characterised by progressive immune-mediated destruction of insulin-producing pancreatic islet cells [7]. Autoantibodies against islet cells, including GAD autoantibodies (GADA), tyrosine phosphatase islet antigen-2 autoantibodies (IA-2A) and insulin autoantibodies (IAA), are important biomarkers for type 1 diabetes that can be detected during this period [8, 9].

Current risk stratification models for type 1 diabetes rely on positivity status of islet autoantibodies in genetically susceptible individuals [7]; however, not all individuals who develop islet autoantibody positivity go on to develop type 1 diabetes and the time to disease onset varies considerably. Islet autoantibody characteristics, including age at appearance [10,11,12,13,14], the type and combinations [8, 12, 15, 16], and the titre of islet autoantibodies [17, 18], have been individually shown to stratify risk for type 1 diabetes. While these studies provide insight into islet autoantibody development, few studies have considered the longitudinal effects of these characteristics in unison, owing to a lack of suitable analytical methods.

Unsupervised machine learning methods are data-driven and well equipped to characterise these complex interactions [19, 20]. Unsupervised learning algorithms aim to identify patterns in data without making any a priori assumptions about disease status. Previous studies leveraging unsupervised machine learning identified clusters on the basis of timing and type [21, 22] or timing and titre of islet autoantibodies [23, 24]. However, no studies have evaluated the combined effects of timing, type and titre of islet autoantibody development on type 1 diabetes risk. Unsupervised machine learning methods may provide novel insights into the relationship between multiple autoantibody characteristics and may improve risk stratification models for disease onset.

Precise stratification of type 1 diabetes risk is needed to identify at-risk individuals, ultimately improving aetiologic studies, disease screening and enrolment in prevention studies. Therefore, this work aimed to determine if a data-driven model incorporating islet autoantibody timing, type and titre could stratify risk for type 1 diabetes in children with high-risk HLA genotypes in The Environmental Determinants of Diabetes in the Young (TEDDY) study.

Methods

The methods are summarised in Fig. 1. This research was approved by the University of Utah Institutional Review Board. All analyses were performed in the Health Insurance Portability and Accountability Act (HIPAA)-compliant protected environment at the Center for High Performance Computing at the University of Utah. Additional details on the participants and data collection, inclusion/exclusion criteria, preprocessing, cohort extraction, identification of optimal models, and analysis of identified clusters are described in the electronic supplementary material (ESM) Methods.

Fig. 1 Methods workflow. Workflow of methods used to include/exclude participants, preprocess data, develop 1-year cohorts, perform unsupervised clustering, analyse identified clusters and calculate cluster transitions

Software

Unsupervised clustering was performed in R (v4.0.2) [25] using kml3d (v2.4.2) [26]. All other analyses were performed with Python (v3.8.12; www.python.org): statistical analysis was performed using SciPy (v1.7.2) [27], time-to-event analysis was performed using lifelines (v0.26.4) [28], and the network diagram was generated using NetworkX (v2.8.0) [29].

Participants and data collection

Data from the TEDDY study were obtained from the National Institute of Diabetes and Digestive and Kidney Diseases Central Repository. The TEDDY study enrolled 8,677 children with high-risk HLA genotypes for type 1 diabetes across six clinical centres in the USA and Europe [30]. Participants were followed every 3 months from 3 months of age until 4 years, and then every 3 or 6 months until 15 years, depending on autoantibody positivity status [30, 31]. Additional descriptions of the TEDDY study can be found elsewhere [30,31,32,33,34] (ESM Methods).

Inclusion/exclusion criteria

Participants with at least one positive GADA, IA-2A or IAA measurement, as determined by the thresholds defined by the testing laboratory for the specific autoantibody and collection assay, were included for analysis (ESM Methods). Concordance rates of measurements are summarised in ESM Table 1. Participants with too few islet autoantibody measurements were excluded (n=238) to limit the amount of data imputation required during analysis and to remove participants with only occasional or episodic measurements. Too few measurements was defined as any of the following: fewer than four autoantibody measurements in total, a last study visit before 12 months of age, a first study visit after 3 months of age, or fewer than 50% of autoantibody measurements available. These criteria led to the inclusion of 1,415 participants.
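The exclusion rules above can be sketched as a simple filter. This is a minimal illustration, not the study's code: the `Participant` record, its field names and the simplification to one titre value per visit are assumptions made here for readability.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical record layout; field names are not taken from TEDDY.
    visit_ages_months: List[float]   # age at each study visit
    titres: List[Optional[float]]    # one value per visit, None if missing


def is_excluded(p: Participant) -> bool:
    """Return True if the participant has too few autoantibody measurements."""
    measured = [t for t in p.titres if t is not None]
    if len(measured) < 4:                    # fewer than four measurements
        return True
    if max(p.visit_ages_months) < 12:        # last visit before 12 months
        return True
    if min(p.visit_ages_months) > 3:         # first visit after 3 months
        return True
    if len(measured) / len(p.titres) < 0.5:  # under 50% of measurements available
        return True
    return False
```

A participant is dropped as soon as any one rule applies, which matches the "or" in the stated criteria.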

Preprocessing

GADA, IA-2A and IAA titres were extracted for included participants. Measurements were harmonised across the type of autoantibody, collection assay and processing laboratories, and log-scaled z-scores of GADA, IA-2A and IAA titres were calculated as described in previous literature (ESM Methods) [11, 23]. To address the multiplicity of measurements at the same time point due to different assays, processing laboratories and sample retesting [31, 33], a modified measurement selection procedure was adapted from previous literature [11, 23] and is described in the ESM Methods. Missing data are summarised in ESM Table 2 and were linearly imputed (ESM Methods).
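As a rough illustration of the two preprocessing steps named above, the sketch below computes log-scaled z-scores and fills gaps by linear interpolation. The exact harmonisation procedure is given only in the ESM, so the natural logarithm, the population standard deviation and the assumption of evenly spaced visits are choices made here, not details from the study.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional


def log_z_scores(titres: List[float]) -> List[float]:
    """Log-transform positive titres, then standardise to zero mean and unit SD."""
    logs = [math.log(t) for t in titres]
    mu, sd = mean(logs), pstdev(logs)
    return [(x - mu) / sd for x in logs]


def impute_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation between observed neighbours.

    Visits are assumed to be evenly spaced; leading and trailing gaps stay None.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out
```

For example, `impute_linear([1.0, None, None, 4.0])` fills the two missing visits with values on the straight line between 1.0 and 4.0.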

Year cohort extraction

To address varying autoantibody trajectory lengths due to loss-to-follow-up or reaching the study endpoint, measurements for all participants were divided into 1-year cohorts from 3 months to 15 years of age, as defined in the ESM Methods. To assess adequacy of sample size for unsupervised clustering, the feature-to-observation ratio was calculated and cohorts were excluded if they did not meet the required sample size [35]. Sex, clinical centre, high-risk HLA genotype group and islet autoantibody positivity status were extracted as covariates, and the status and age at type 1 diabetes diagnosis were extracted as outcome measures (ESM Methods). Differences in the distribution of covariates were assessed using a χ² test with Yates’s continuity correction [27] and evaluated at the 0.05 significance level.
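For readers unfamiliar with the continuity correction, the following sketch computes a Yates-corrected χ² statistic and p value for a single 2×2 table in plain Python. It is an illustration only: the study cites SciPy [27] for its statistical analysis, and covariates with more than two levels (for example clinical centre) need a general r×c test, to which the correction does not apply. The function name is hypothetical.

```python
import math
from typing import Tuple


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Yates-corrected chi-square statistic and p value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

With the illustrative table `[[10, 20], [20, 10]]` the corrected statistic is 5.4, which falls below the 0.05 significance level used in this work.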

Unsupervised clustering and evaluation

For each year cohort, unsupervised machine learning was performed to identify clusters of GADA, IA-2A and IAA development. Non-parametric algorithms for clustering joint trajectories were developed (using the R package kml3d [26]) with pre-specified clusters ranging from two to ten, Euclidean distance and the k-means++ algorithm with the centroid method. These parameters were tested across 100 different initialisations to determine if clustering results were consistent across different starting conditions set by k-means++. The Calinski–Harabasz score was used to assess clustering performance and identify the optimal number of clusters [36]. The optimal numbers of clusters for each year cohort were identified as described in ESM Methods.
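The Calinski–Harabasz score used for model selection is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The sketch below is a plain-Python illustration of that definition, assuming each participant's joint trajectory has been flattened into one numeric vector. It is not the kml3d implementation, which computes its own quality criteria in R.

```python
from typing import Sequence


def calinski_harabasz(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Between-cluster over within-cluster dispersion, scaled by degrees of freedom."""
    n, dim = len(points), len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = [sum(p[j] for p in points) / n for j in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        # Cluster centre: arithmetic mean of all points in the cluster.
        centre = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        between += len(members) * sum((centre[j] - overall[j]) ** 2 for j in range(dim))
        within += sum((p[j] - centre[j]) ** 2 for p in members for j in range(dim))
    return (between / (k - 1)) / (within / (n - k))
```

Higher scores indicate tighter, better separated clusters, so the candidate number of clusters (two to ten here) with the highest and most consistent score across initialisations would be preferred.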

Analysis of identified clusters

Cluster centres, defined as the arithmetic mean of all points within a cluster, and standard deviations were calculated for each cluster. The log-scaled z-score of GADA, IA-2A and IAA autoantibody cluster centres and standard deviations were plotted for each year cohort.

Time-to-event analysis [28, 37] was used to examine risk of progression to type 1 diabetes for each cluster in each year. The period from last autoantibody measurement in each year to age at type 1 diabetes diagnosis or age at last autoantibody measurement was used as the event time. Kaplan–Meier survival estimates were generated for each cluster in each year. A logrank test was used to compare the overall difference in survival curves between clusters each year and pairwise logrank tests were used to compare progression to type 1 diabetes between each pair of clusters each year. Results were evaluated at the 0.05 significance level. Adjusted p values were calculated for pairwise logrank tests with more than one comparison using a Benjamini–Hochberg correction for multiple comparisons. For each cluster, the 1-year and 5-year risks for type 1 diabetes were calculated with a 95% CI when applicable, and the titre percentile for each autoantibody at each time point was calculated.
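To make the link between the Kaplan–Meier estimate and the reported 1-year and 5-year risks concrete, here is a hand-rolled sketch in plain Python. The study used the lifelines package [28]; this version is only an illustration of the estimator, omits confidence intervals, and uses function names invented here.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs at each distinct event time."""
    curve, survival = [], 1.0
    for t in sorted({d for d, e in zip(durations, events) if e}):
        at_risk = sum(1 for d in durations if d >= t)
        diagnosed = sum(1 for d, e in zip(durations, events) if e and d == t)
        survival *= 1 - diagnosed / at_risk
        curve.append((t, survival))
    return curve


def risk_at(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Cumulative risk by `horizon`, i.e. 1 - S(horizon)."""
    survival = 1.0
    for t, s in curve:
        if t <= horizon:
            survival = s
    return 1 - survival
```

Here `durations` holds the time in years from the last measurement in the cohort year to diagnosis or censoring, and `events` marks whether a diagnosis occurred; `risk_at(curve, 1)` and `risk_at(curve, 5)` then give the 1-year and 5-year risks for one cluster.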

For each cluster, the distribution of participants by sex, HLA genotype group, clinical centre and islet autoantibody positivity status was calculated. Differences in the distribution of covariates for each cluster in each year were assessed using a χ² test with Yates’s continuity correction and evaluated at the 0.05 significance level.

To determine whether cluster membership was stable or varied across each year, the number and percentage of individuals that transitioned from a given cluster each year to a different cluster in the subsequent year were calculated. Transitions of cluster membership across years 1–12 were visualised using a network diagram [29, 38] (ESM Methods).
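The transition counts described above amount to a cross-tabulation of cluster labels in consecutive years. A minimal sketch follows, assuming cluster assignments are stored as dictionaries keyed by a participant identifier and that percentages use participants present in both years as the denominator; both are assumptions made here, since the exact definition is in the ESM.

```python
from collections import Counter
from typing import Dict, Tuple


def transition_table(
    this_year: Dict[str, int], next_year: Dict[str, int]
) -> Dict[Tuple[int, int], Tuple[int, float]]:
    """Map (from_cluster, to_cluster) to (count, percentage of the from_cluster)."""
    # Only participants with an assignment in both years can transition.
    shared = [pid for pid in this_year if pid in next_year]
    pairs = Counter((this_year[pid], next_year[pid]) for pid in shared)
    totals = Counter(this_year[pid] for pid in shared)
    return {pair: (n, 100 * n / totals[pair[0]]) for pair, n in pairs.items()}
```

Each entry of the result could serve as a weighted edge between two nodes in a network diagram such as the one described above.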

Results

Year cohort characteristics

Overall, 1,415 participants were included for analysis. Based on the feature-to-observation ratio, the minimum sample size for clustering was calculated to be 280 (70×k, where k is the number of variables, i.e. k=4 visits per year). All year cohorts met the sample size inclusion threshold except years 13 (n=136), 14 (n=18) and 15 (n=0). Therefore, years 1–12 were included for further analysis. Year cohorts did not differ significantly by sex (p=1.000), clinical centre (p=0.996), high-risk HLA genotypes (p=1.000) or islet autoantibody positivity status (p=0.719) (ESM Table 3), indicating that covariates were similarly distributed across all years.
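The sample size rule can be checked with a few lines of arithmetic. The sketch below applies the 70×k threshold to the three cohort sizes reported in the text; the function and variable names are illustrative only.

```python
def minimum_sample_size(n_variables: int, per_variable: int = 70) -> int:
    """Smallest cohort size accepted for clustering: per_variable * n_variables."""
    return per_variable * n_variables


threshold = minimum_sample_size(4)        # k = 4 visits per year
cohort_sizes = {13: 136, 14: 18, 15: 0}   # cohort sizes reported in the text
excluded_years = [year for year, n in cohort_sizes.items() if n < threshold]
```

All three of these cohorts fall below the threshold of 280, which is why the analysis stops at year 12.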

Unsupervised clustering

kml3d identified two to four clusters of GADA, IA-2A and IAA development across year cohorts. Years 1, 4, 5, 6, 7 and 8 cohorts contained three clusters, years 2 and 3 cohorts contained two clusters, and years 9, 10, 11 and 12 cohorts contained four clusters with the highest and most consistent Calinski–Harabasz scores (ESM Table 4; ESM Fig. 1). These models were selected for further analysis.

Analysis of identified clusters

Years were categorised into four groups based on the similarity of visualised cluster centres. The first year in each group was selected as a representative image (Figs 2, 3, 4 and 5). Cluster centres (ESM Fig. 2), time-to-event analysis (ESM Fig. 3, ESM Table 5, Table 1) and covariates (ESM Table 6) are discussed for each cluster in each year. Cluster transitions between select years are described (Fig. 6, ESM Table 7).

Fig. 2 Year 1 representative clusters. In the first year of life, three clusters were identified. Clusters are coloured according to the autoantibody cluster centre: baseline (blue), all declining (orange) and all increasing (green). Log-scaled z-scores of cluster centres identified through kml3d respective to (a) GADA, (b) IA-2A and (c) IAA. (d) Survival curves for each identified cluster. T1DM, type 1 diabetes mellitus

Fig. 3 Years 2–3 representative clusters. From ages 2 to 3, two clusters were identified and year 2 was selected as representative for this age group. Clusters are coloured according to the autoantibody cluster centre: baseline (blue) and all increasing (green). Log-scaled z-scores of cluster centres identified through kml3d respective to (a) GADA, (b) IA-2A and (c) IAA. (d) Survival curves for each identified cluster. T1DM, type 1 diabetes mellitus

Fig. 4 Years 4–8 representative clusters. From ages 4 to 8, three clusters were identified and year 4 was selected as representative for this age group. Clusters are coloured according to the autoantibody cluster centre: baseline (blue), IA-2A dominant (red) and GADA dominant (purple). Log-scaled z-scores of cluster centres identified through kml3d respective to (a) GADA, (b) IA-2A and (c) IAA. (d) Survival curves for each identified cluster. T1DM, type 1 diabetes mellitus

Fig. 5 Years 9–12 representative clusters. From ages 9 to 12, four clusters were identified and year 9 was selected as representative for this age group. Clusters are coloured according to the autoantibody cluster centre: baseline (blue), IA-2A dominant (red), GADA dominant (purple) and GADA & IA-2A dominant (brown). Log-scaled z-scores of cluster centres identified through kml3d respective to (a) GADA, (b) IA-2A and (c) IAA. (d) Survival curves for each identified cluster. T1DM, type 1 diabetes mellitus

Table 1 Type 1 diabetes risk and titre percentiles for each islet autoantibody cluster

Fig. 6 Transitions in cluster membership across years 1–12. Network diagram of transitions in cluster membership across years 1–12. The nodes represent cluster membership in each year and are coloured according to the autoantibody cluster centre: baseline (blue), all declining (orange), all increasing (green), IA-2A dominant (red), GADA dominant (purple) or GADA & IA-2A dominant (brown). The numbers in the nodes represent cluster number. The size of the node correlates to the scaled number of individuals assigned to that cluster. The black arrows represent the scaled number of participants that transitioned from a given cluster to a different cluster in the subsequent year

Three clusters were identified during the first year of life that stratified risk for type 1 diabetes (logrank p=4.951×10⁻⁴², Fig. 2). The first cluster centre showed a baseline pattern with titres below the 50th percentile in all three autoantibodies (baseline; Fig. 2a–c; Table 1). Survival analysis of participants in the baseline cluster revealed minimal progression to type 1 diabetes (Fig. 2d), with a 1-year risk of 1.80% and a 5-year risk of 10.07% (Table 1). The second cluster centre showed elevated GADA titres at 3 months with minimal elevations in IA-2A and IAA that declined over time (all declining; Fig. 2a–c). The survival curve of participants in the all-declining cluster was not distinguishable from the baseline cluster (pairwise logrank adjusted p=0.059, ESM Table 5), indicating a low risk for diabetes (1-year risk: 3.12%; 5-year risk: 3.12%, Table 1). Mothers of individuals in the all-declining cluster were GADA positive at 0 or 9 months (ESM Results). The third cluster centre demonstrated an increase in titres of all islet autoantibodies (all increasing; Fig. 2a–c), with an increase to the 99th percentile at 9 months noted in IAA (Table 1). Individuals in the all-increasing cluster rapidly progressed to type 1 diabetes (Fig. 2d), with a 56.25% 1-year and 68.75% 5-year risk (Table 1), and had a higher proportion of participants with the DR3/4 genotype (71.88%, ESM Table 6). The distribution of individuals who developed islet autoantibody positivity varied significantly across clusters, with 94.12% and 100.00% of individuals in the all-declining and all-increasing clusters, respectively, being positive (χ² test p=4.729×10⁻¹⁰, ESM Table 6). These findings suggest that, in the first year of life, islet autoantibody development among genetically susceptible individuals is characterised by joint increases in all three autoantibodies, with a subset of individuals showing declining titres.

During years 2 to 3, two clusters of islet autoantibody development were identified that effectively stratified risk for type 1 diabetes (logrank p=1.132×10⁻¹¹³ and 1.912×10⁻¹³⁶ for years 2 and 3, respectively, ESM Fig. 2b, c; ESM Fig. 3b, c). Similar to clusters identified in year 1, clusters in years 2–3 showed either baseline patterns, with titres below the 50th percentile in all autoantibodies, or all-increasing patterns (Table 1; Fig. 3a–c). The all-increasing clusters demonstrated increased risk for progression to type 1 diabetes (1-year risk: 27.96% and 20.87%, 5-year risk: 69.19% and 67.73% in years 2 and 3, respectively, Table 1; Fig. 3d) compared with the baseline clusters (1-year risk: 1.21% and 0.52%, 5-year risk: 6.10% and 4.80% in years 2 and 3, respectively, Table 1; Fig. 3d). Though most participants remained in the same cluster they were previously assigned to, participants in the all-declining cluster in year 1 transitioned to the baseline cluster in year 2 (Fig. 6; ESM Table 7). Most individuals in the all-increasing clusters were islet autoantibody positive, while individuals in the baseline cluster were split (ESM Table 6). Together, these findings indicate that risk for type 1 diabetes is characterised by all-increasing patterns of islet autoantibody development during the first 3 years of life, with few individuals returning to baseline thereafter.

During years 4 to 8, three clusters of islet autoantibody development were identified that effectively stratified risk for type 1 diabetes (logrank p=4.804×10⁻¹⁵⁰, 3.518×10⁻¹⁴⁶, 1.899×10⁻⁹¹, 5.892×10⁻⁸², 2.716×10⁻⁵⁷ for years 4, 5, 6, 7 and 8, respectively, ESM Fig. 2d–h, ESM Fig. 3d–h). Similar to the first 3 years, baseline clusters (Fig. 4a–c) with minimal progression to type 1 diabetes (1-year risk: 1.03%, 0.60%, 0.63%, 0.66% and 0.70% for years 4, 5, 6, 7 and 8, respectively, 5-year risk: 4.50%, 4.34%, 5.41% and 5.05% for years 4, 5, 6 and 7, respectively, Table 1; Fig. 4d) were identified. Individuals previously assigned to all-increasing clusters in year 3 primarily transitioned to one of two novel clusters in years 4–8 (Fig. 6, ESM Table 7): 49.57% transitioned to a cluster with IA-2A titres above the 90th percentile (IA-2A dominant; Table 1; Fig. 4b) while 29.57% transitioned to a cluster with GADA titres above the 90th percentile (GADA dominant; Table 1; Fig. 4a). Though both IA-2A- and GADA-dominant clusters demonstrated greater risk for diabetes compared with baseline clusters (Fig. 4d), individuals assigned to IA-2A-dominant clusters progressed to type 1 diabetes faster than individuals assigned to the GADA-dominant clusters (ESM Table 5). On average, the 1-year diabetes risk for IA-2A-dominant clusters was 4.8 times higher than for GADA-dominant clusters (IA-2A: 28.93%, 25.49%, 20.88%, 24.42% and 24.08% vs GADA: 5.93%, 6.11%, 4.86%, 4.38% and 4.81% in years 4, 5, 6, 7 and 8, respectively, Table 1) and the 5-year risk was 2.4 times higher than for GADA-dominant clusters (IA-2A: 78.78%, 74.06%, 63.40%, 62.73% vs GADA: 30.45%, 28.49%, 25.06%, 31.44% for years 4, 5, 6 and 7, respectively, Table 1). Individuals assigned to IA-2A-dominant clusters in years 6–8 were more likely to be male and islet autoantibody positive (ESM Table 6). Together, these findings suggest that during ages 4 to 8, diabetes risk transitions to autoantibody- and titre-specific clusters, with IA-2A-dominant clusters having faster progression to type 1 diabetes.
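The fold differences quoted for years 4 to 8 can be reproduced from the listed risks. The text does not state how the averages were formed; the sketch below assumes they are the mean of the per-year risk ratios, which matches the reported 4.8 and 2.4 after rounding. Variable names are illustrative.

```python
from statistics import mean

# Risks (%) as listed in the text for IA-2A- and GADA-dominant clusters.
ia2a_1yr = [28.93, 25.49, 20.88, 24.42, 24.08]   # years 4-8
gada_1yr = [5.93, 6.11, 4.86, 4.38, 4.81]
ia2a_5yr = [78.78, 74.06, 63.40, 62.73]          # years 4-7
gada_5yr = [30.45, 28.49, 25.06, 31.44]


def mean_fold(high, low):
    """Average of the per-year risk ratios high/low (an assumed definition)."""
    return mean(h / l for h, l in zip(high, low))


fold_1yr = round(mean_fold(ia2a_1yr, gada_1yr), 1)
fold_5yr = round(mean_fold(ia2a_5yr, gada_5yr), 1)
```

Dividing the mean IA-2A risk by the mean GADA risk instead gives roughly 4.7 for the 1-year comparison, so the choice of averaging method affects the last digit.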

During years 9 to 12, four clusters of islet autoantibody development were identified that effectively stratified risk for type 1 diabetes (logrank p=8.614×10⁻³³, 5.347×10⁻³¹, 8.673×10⁻¹³ and 4.024×10⁻¹⁴ for years 9, 10, 11 and 12, respectively, ESM Fig. 2i–l, ESM Fig. 3i–l). Similar to previous years, baseline clusters (Fig. 5a–c) with minimal progression to type 1 diabetes (1-year risk: 1.05%, 1.39%, 1.25%, 0.00% in years 9, 10, 11 and 12, respectively, Table 1; Fig. 5d) were identified. GADA-dominant clusters and IA-2A-dominant clusters were also identified in years 9–12 (Fig. 5a–c). The importance of GADA dominance in diabetes risk diminished (1-year risk: 6.09%, 3.74%, 4.18%, 3.85% in years 9, 10, 11 and 12, respectively, Table 1; Fig. 5d) compared with IA-2A dominance, which remained equally important (1-year risk: 23.71%, 21.65%, 14.82%, 30.93% for IA-2A in years 9, 10, 11 and 12, respectively, Table 1; Fig. 5d). Though most individuals assigned to GADA- or IA-2A-dominant clusters in years 4–8 remained in the same clusters in years 9–12 (Fig. 6, ESM Table 7), a subset of individuals transitioned to novel clusters that demonstrated elevated titres above the 90th percentile in both GADA and IA-2A (GADA & IA-2A dominant; Table 1; Fig. 5a–c). Notably, the survival curves of GADA & IA-2A-dominant clusters were not significantly different from IA-2A-dominant clusters (pairwise logrank adjusted p=0.613, 0.947, 0.538 and 0.865 for years 9, 10, 11 and 12, respectively, ESM Table 5; Fig. 5d). Both IA-2A-dominant and GADA & IA-2A-dominant clusters demonstrated faster progression to type 1 diabetes compared with baseline and GADA-dominant clusters (1-year diabetes risk for GADA & IA-2A-dominant clusters: 8.27%, 13.5%, 25.00%, 23.61% for years 9, 10, 11 and 12, respectively, Table 1). All participants in IA-2A-dominant clusters and GADA & IA-2A-dominant clusters were islet autoantibody positive (ESM Table 6). Together, these findings indicate that IA-2A clusters play more important roles in type 1 diabetes risk stratification at older ages.

Discussion

Type 1 diabetes risk varies substantially according to islet autoantibody characteristics. A more precise stratification of type 1 diabetes risk is needed to better identify at-risk individuals for aetiologic studies, disease screening and enrolment in prevention trials. In the present study, we leveraged data-driven methods to identify clusters of GADA, IA-2A and IAA islet autoantibody development that stratified the risk for type 1 diabetes in children with genetic susceptibility enrolled in the TEDDY study.

During the first 3 years of life, risk for type 1 diabetes was characterised by clusters with increasing titres of all three autoantibodies (Figs 2 and 3). Though increased diabetes risk associated with multiple islet autoantibody positivity in early life is well-established [10, 39], these findings are the first to provide insight into the joint development of GADA, IA-2A and IAA over time. Notably, our findings suggest that increases in all three autoantibodies above the 70th percentile can be detected as early as 12 months of age and continue to increase until 3 years (Table 1). By providing a more precise estimate of early type 1 diabetes risk and characterising increases in multiple autoantibody titres, these findings may aid in planning type 1 diabetes prevention trials.

We also identified a subgroup of participants with a decline in GADA titres from the 99th percentile to the 50th percentile during the first year of life that likely resulted from elevated maternal autoantibodies [40] (ESM Results). Participants in this cluster demonstrated a decreased 5-year risk for type 1 diabetes compared with the baseline cluster (Table 1). This finding supports previous observations that maternal GADA may serve a protective role against type 1 diabetes onset [41]; further studies are needed to assess the role of GADA in slowing disease progression. Overall, the all-declining cluster (cluster 2) reflects a heterogeneous group in which autoantibodies generated from mothers with type 1 diabetes were likely transmitted to participants. These mothers are additionally likely to transmit insulin antibodies that may be generated as a consequence of maternal insulin therapy [42]. The findings from this analysis support a reduced risk for type 1 diabetes in infants born to mothers with type 1 diabetes.

Type 1 diabetes risk transitioned to type-specific titres during ages 4 to 8, as clusters with titres of IA-2A above the 98th percentile demonstrated faster progression to diabetes compared with clusters with titres of GADA above the 98th percentile (Fig. 4). These findings suggest that while titres of GADA and IA-2A may both provide insights into the lifetime risk of type 1 diabetes, higher titres of IA-2A may indicate an earlier onset of type 1 diabetes. Type 1 diabetes is a highly heterogeneous disease, and recent studies have proposed autoantibody-driven subgroups of presymptomatic disease [43,44,45]. Our findings may reflect a novel age-related endotype driven by IA-2A dominance with faster progression to diabetes. The importance of GADA decreased during ages 9–12, with clusters containing IA-2A or both GADA and IA-2A demonstrating increased risk for type 1 diabetes (Fig. 5). Together, these findings suggest that IA-2A plays a more important role in type 1 diabetes risk stratification later in life. Previous studies have also found that higher titres of IA-2A but not GADA increased the risk for type 1 diabetes [11, 46], and our findings add temporal context by highlighting the utility of GADA in years 4–8, but diminished effects after that.

This analysis detected single baseline clusters in each year cohort (Figs 2, 3, 4 and 5). We also found that individuals who did not meet the criteria for confirmed persistently positive islet autoantibody status but had at least one positive islet autoantibody were most prevalent in the baseline clusters (ESM Table 6). These findings suggest that subclinical variations in islet autoantibody titres are minimal and do not confer significant risk for future onset of type 1 diabetes.

Though HLA genotype groups significantly differed between clusters in most year cohorts, no individual genotype accounted for cluster membership (ESM Table 6). This suggests that factors beyond genotype influence autoantibody and type 1 diabetes development. Environmental exposures are implicated in the aetiology of type 1 diabetes [47,48,49] and the clusters identified in this work may provide a useful framework to investigate aetiologic factors in type 1 diabetes. We also noted a significant association between IA-2A dominant clusters in years 6–10 and male sex (ESM Table 6). Further research is needed to corroborate these findings in other diverse datasets and explore why boys are more likely to demonstrate dominance of IA-2A in years 6–10 of life, especially given our finding of increased type 1 diabetes risk in this group (Table 1). Future studies should explore underlying aetiopathologic mechanisms associated with the data-driven clusters identified in this work.

Strategies to improve the prediction of risk for development of type 1 diabetes are needed to improve enrolment in prevention and aetiologic studies [50]. Overall, this work offers evidence for including the timing, type and titre of islet autoantibody measurements in risk stratification for type 1 diabetes. This is the first work to include all three autoantibody characteristics in a risk stratification model. These findings may also provide insights into optimal times and types of autoantibodies to guide screening programmes during presymptomatic stages of type 1 diabetes. In addition, the importance of IA-2A in risk stratification after 4 years of age was established, which may guide strategies for recruitment and enrolment in prevention studies.

Strengths of this study include a robust longitudinal cohort and a novel data-driven clustering method. The TEDDY study is the largest prospective birth cohort study that monitors longitudinal changes in islet autoantibodies [30], allowing unprecedented opportunities for big-data analytics to elucidate the presymptomatic features of type 1 diabetes. Though several studies have evaluated select autoantibody characteristics in type 1 diabetes risk [21,22,23,24], this is the first data-driven study that considers the combined role of the timing, type and titre of islet autoantibodies in one model. The unsupervised machine learning algorithm used in this analysis facilitates the evaluation of joint trajectories across multiple variables [26]. We presented a year-based model of multiple islet autoantibody development that captured changes in the timing, type and titre of autoantibody development that correlated with type 1 diabetes risk. The clusters identified in this study can serve as a computational framework for investigating other factors in presymptomatic type 1 diabetes development.

There are some limitations to this study. Only children with genetic susceptibility to type 1 diabetes were enrolled in the TEDDY study [30], limiting the generalisability of these findings. Autoantibody titres were measured using different assays across two different laboratory sites and the visit schedule changed after 4 years of age based on autoantibody status. Though analytical measures were taken to address these limitations (ESM Methods), residual measurement bias may remain, and our findings may be skewed towards individuals who developed eventual islet autoantibody positivity. Future studies should evaluate these findings in more consistently collected datasets.

In this study, we included participants with at least one positive autoantibody measurement to limit bias towards baseline clusters and obtain a sufficient sample size for analysis. We also excluded individuals with few autoantibody measurements, but some individuals with episodic or occasional measurements may remain. Though we were able to identify distinct clusters, the number of participants assigned to baseline clusters was much larger than in non-baseline clusters. We also divided participants into 1-year cohorts to account for loss to follow-up or reaching the study endpoint. Future studies in larger cohorts should seek to evaluate clusters among participants with persistently positive islet autoantibodies using alternate temporal intervals.

The unsupervised clustering method used in this analysis was limited to temporal islet autoantibody measurements. Future studies should seek to use more complex machine learning methods to evaluate other characteristics, including other novel islet autoantigens [51], other genetic risk markers, environmental exposures, dietary intake and socioeconomic factors.

In conclusion, this study leveraged data-driven techniques to assess the role of multiple islet autoantibody characteristics in type 1 diabetes risk. These findings highlight the importance of islet autoantibody timing, type and titre in type 1 diabetes risk stratification. The identification of age-dependent percentiles for each autoantibody and associated 1-year and 5-year risk for type 1 diabetes onset may help to improve screening and prediction strategies for prevention studies. Future work should validate these findings in independent cohorts with diverse populations over longer follow-up periods and further characterise phenotypic features.