IAA assay characteristics in DASP proficiency testing
The results of IAA determination in DASP 2000, 2002, 2003 and 2005 are summarised in Table 1. Twenty-three laboratories reported the results of 23 assays in 2000, 32 reported results of 35 assays in 2002, 28 reported results of 28 assays in 2003, and 26 reported results of 30 assays in 2005. Data-reporting errors resulted in poor performance for two laboratories in DASP 2003, and incomplete results (<90%) were reported by two laboratories in 2002, two in 2003 and one in 2005. In DASP 2005, 25 laboratories used competitive assays, in which IAA binding is displaced with unlabelled insulin, and five laboratories used non-competitive assays. As shown in Table 1, the median AUC improved progressively from DASP 2000 to DASP 2005, both for all participating laboratories (p = 0.001; Fig. 1a) and for laboratories participating in three or four workshops (p = 0.011; Table 1). There was no overall difference in AS95 between the 2002, 2003 and 2005 workshops (p = 0.268; Fig. 1b). Laboratory-assigned sensitivity, based on each laboratory's own threshold, also improved between workshops (p < 0.0001). In particular, in laboratories that participated in three or four workshops, the median sensitivity was nearly fourfold higher in 2005 than in 2000 (53% [IQR 33–58%] vs 14% [IQR 9–31%]; p = 0.0001). In contrast, the median laboratory-assigned specificity decreased from 2000 to 2005 (p < 0.0001), and this also occurred in the subset of laboratories that participated in three or four workshops (p = 0.0009). Full results for individual laboratories are given in Table 2.
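The two summary statistics used throughout, AUC and AS95, can both be computed from each assay's patient and control results. The sketch below assumes that AS95 denotes the sensitivity at the lowest threshold that keeps specificity at or above 95%, that a sample counts as positive when its value lies strictly above the threshold, and that ties contribute one half to the AUC. The function names `roc_auc` and `as95` are hypothetical, and the DASP analysis may have used different software and tie-handling conventions.

```python
from typing import Sequence


def roc_auc(patients: Sequence[float], controls: Sequence[float]) -> float:
    """AUC as the probability that a random patient value exceeds a random
    control value, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in patients:
        for c in controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patients) * len(controls))


def as95(patients: Sequence[float], controls: Sequence[float],
         min_specificity: float = 0.95) -> float:
    """Highest sensitivity over thresholds that keep specificity >= min_specificity.

    A sample is called positive when its value is strictly above the threshold.
    Specificity only changes at control values and sensitivity can only fall
    as the threshold rises, so trying each observed control value is enough.
    """
    best = 0.0
    for t in sorted(set(controls)):
        specificity = sum(c <= t for c in controls) / len(controls)
        if specificity >= min_specificity:
            sensitivity = sum(p > t for p in patients) / len(patients)
            best = max(best, sensitivity)
    return best


print(roc_auc([3, 5, 7, 9], [1, 2, 4, 6]))  # 0.8125
print(as95([3, 5, 7, 9], [1, 2, 4, 6]))     # 0.5
```

The pairwise AUC is quadratic in the number of samples, which is unproblematic at proficiency-panel sizes; a rank-based formula gives the same value faster for large panels.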
Of 22 laboratories with assay performance below the median AUC in 2002 and/or 2003, ten did not register for DASP 2005 (five had participated in 2002, one in 2003 and four in both 2002 and 2003), and the performance of a further five laboratories remained below the median AUC in DASP 2005.
In-house radioimmunoassays vs commercial kits
In every DASP workshop, the highest laboratory-assigned sensitivity, specificity, AUC and AS95 for IAA were achieved by laboratories using in-house radioimmunoassays. In DASP 2002, two commercial RIA kits, one time-resolved immunofluorometric assay and one ELISA kit were tested in five laboratories, but achieved lower sensitivity, specificity, AUC and AS95 than the best-performing in-house RIAs (Table 2). In DASP 2003, three commercial RIA kits and one time-resolved immunofluorometric assay were tested in four laboratories, and in DASP 2005, six laboratories tested commercial RIA kits. The results obtained with the six commercial kits are shown together with those of the 26 in-house RIAs in Fig. 2.
Variation between commercial kits
The performance of the kits was variable. In DASP 2005, the median laboratory-assigned sensitivity for assays using kits was 33% (IQR 18–49%) vs 52% (IQR 25–58%; p = 0.147) for in-house RIAs, median specificity was 96% (IQR 58.5–99%) vs 98% (IQR 96–99%; p = 0.35), median AUC was 0.78 (IQR 0.48–0.86) vs 0.81 (IQR 0.72–0.83; p = 0.539) and median AS95 was 37% (IQR 14–65%) vs 47% (IQR 33–63%; p = 0.351). In DASP 2002–2005, only one commercial RIA kit (laboratory 132) achieved sensitivity, specificity, AUC and/or AS95 above the median values of all participating laboratories. One RIA kit (laboratory 209) achieved the highest AUC and AS95 of all assays in DASP 2005, but its laboratory-assigned sensitivity was only 22%. Of note, the two RIA kits with the lowest AUC and AS95 values used the non-competitive assay format without displacement of IAA binding with unlabelled insulin (Fig. 2a, b; white circles). In DASP 2005, four assays (laboratories 121, 150, 153 and 209) had both AUC and AS95 values in the upper quartile.
Concordance of laboratory-reported measurements
In DASP 2005, serum samples from nine patients and one healthy control were reported positive in ≥75% of assays. A further 12 patient samples, but none of the control samples, were reported positive in ≥50% of assays, and a further nine patient samples and two more control samples were positive in ≥25% of assays. There was agreement on positive/negative status in ≥75% of assays for 108 samples (nine patient samples and 99 control samples; ESM Fig. 1a, b). In three of the four laboratories whose assays had both AUC and AS95 in the upper quartile, results agreed on positivity or negativity for 127 samples (27 patient samples and 100 control samples; data not shown).
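The agreement counts above can be derived from a table of positive/negative calls per sample. This sketch assumes the calls are held as a mapping from a hypothetical sample identifier to one boolean per assay; a sample is counted as concordant when the number of assays reporting the same status reaches the chosen fraction (75% here) of all assays that tested it. Both function names are illustrative.

```python
from typing import Dict, List


def concordant_samples(calls: Dict[str, List[bool]], level: float = 0.75) -> List[str]:
    """Samples for which at least `level` of assays agree on positive or negative."""
    agreed = []
    for sample, sample_calls in calls.items():
        n = len(sample_calls)
        n_pos = sum(sample_calls)  # True counts as 1
        n_neg = n - n_pos
        if n_pos >= level * n or n_neg >= level * n:
            agreed.append(sample)
    return agreed


def positive_in_at_least(calls: Dict[str, List[bool]], level: float) -> List[str]:
    """Samples reported positive by at least `level` of assays."""
    return [s for s, c in calls.items() if sum(c) >= level * len(c)]


# Hypothetical calls from four assays.
calls = {
    "P1": [True, True, True, False],
    "P2": [True, False, True, False],
    "C1": [False, False, False, False],
    "C2": [True, False, False, False],
}
print(concordant_samples(calls))         # ['P1', 'C1', 'C2']
print(positive_in_at_least(calls, 0.5))  # ['P1', 'P2']
```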
The concordance of ranking of the IAA level in the patient samples between all laboratories by linear regression analysis was highly significant (r² = 0.642, variance = 73.7, p < 0.0001; Fig. 3). As expected, concordance in ranking of patient samples was lower between the assays with both AUC and AS95 below the 25th centile (n = 5 assays, r² = 0.392, variance = 126) than between the assays with AUC and AS95 between the 25th and 75th centiles (n = 21 assays, r² = 0.669, variance = 67.9; p < 0.0001) and between assays with AUC and AS95 above the 75th centile (n = 4 assays, r² = 0.861, variance = 29.3; p < 0.0001 vs lower 25th centile, and p < 0.0001 vs 25th–75th centiles using the F test).
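The section does not spell out how the regression-based concordance of ranking was computed. One plausible reading, sketched here purely as an assumption, is to rank the patient samples within each assay, take the median rank across assays as a consensus, pool all (consensus rank, assay rank) pairs, fit an ordinary least-squares line, and report r² with the residual variance. Two groups of assays can then be compared by the ratio of their residual variances (an F statistic); a p-value would come from the F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf`. All function names are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_concordance(assays: Sequence[Sequence[float]]) -> Tuple[float, float, int]:
    """Regress each assay's within-assay ranks on the median (consensus) rank.

    `assays` holds one list of patient-sample levels per assay, all in the
    same sample order. Returns (r_squared, residual_variance, residual_df).
    """
    per_assay = [ranks(a) for a in assays]
    n_samples = len(per_assay[0])
    consensus = [median(r[s] for r in per_assay) for s in range(n_samples)]
    xs = [consensus[s] for r in per_assay for s in range(n_samples)]
    ys = [r[s] for r in per_assay for s in range(n_samples)]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    df = len(xs) - 2
    return 1.0 - sse / syy, sse / df, df


def f_ratio(var_a: float, df_a: int, var_b: float, df_b: int) -> Tuple[float, int, int]:
    """Variance-ratio F statistic with the larger variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a
```

Ranking first makes the comparison insensitive to each laboratory's units, which is why assays reporting on very different scales can still be pooled.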
Concordance of laboratory-reported IAA levels, common IAA index and common IA index
IAA and IA indices were calculated for 27 of the 30 assays in DASP 2005; the remaining three laboratories did not include the standards in their measurements. The DASP IAA standard was reported positive in all assays. The median IAA index of the IB4.4 IA standard was 56.1 (IQR 42.5–82.1) and that of the IC9.3 IA standard was 43.3 (IQR 22.4–53.1; p = 0.003); IB4.4 was reported positive in all 27 assays and IC9.3 in 26 assays. Further analyses were therefore based on units derived from the IB4.4 standard curve.
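Expressing each result relative to a common standard measured in the same run removes the laboratory-specific scale. The exact DASP index formula is not given in this section; the sketch below assumes a simple linear index, the laboratory-reported level of a sample divided by the level the same assay reported for the common standard, multiplied by a scale factor (100 here, an arbitrary choice). `index_vs_standard` is a hypothetical name, and an assay without a reported standard yields no index, as for the three laboratories that could not be included.

```python
from typing import Dict, Optional


def index_vs_standard(levels: Dict[str, float], standard_level: Optional[float],
                      scale: float = 100.0) -> Optional[Dict[str, float]]:
    """Convert laboratory-reported levels to an index relative to a common standard.

    Returns None when the assay did not report the standard (no index possible).
    """
    if standard_level is None:
        return None
    if standard_level <= 0:
        raise ValueError("standard level must be positive")
    return {sample: scale * level / standard_level for sample, level in levels.items()}


# Two hypothetical laboratories reporting in very different units.
lab_a = index_vs_standard({"S1": 25.0, "S2": 5.0}, standard_level=50.0)
lab_b = index_vs_standard({"S1": 1000.0, "S2": 200.0}, standard_level=2000.0)
print(lab_a == lab_b)  # True: both give {'S1': 50.0, 'S2': 10.0}
```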
The ranking of patient samples by laboratory-reported IAA levels varied greatly between the 27 assays (r² = 0.088, variance 128,000; p < 0.0001) and also between the four assays with AUC and AS95 performances in the upper quartile (r² = 0.467, variance 468,000; p < 0.0001). The overall concordance of ranking was markedly improved by expressing results as an index relative to either the IAA or the IA common standard (r² = 0.779, variance 385, p < 0.0001, and r² = 0.747, variance 1,100, p < 0.0001, respectively; F test IAA index and IA index vs laboratory-reported IAA level, p < 0.0001; Fig. 4a, c). This was particularly apparent in the four laboratories with AUC and AS95 performances above the 75th centile (IAA index: r² = 0.904, variance 173, p < 0.0001; IA index: r² = 0.918, variance 356, p < 0.0001; F test, IAA index and IA index vs laboratory-reported IAA level, p < 0.0001; Fig. 4b, d). In all assays, the variance of ranking was lower with the IAA index than with the IA index (F test, p < 0.0001). Ranking by units derived from the complete IB4 standard curve did not improve the inter-laboratory concordance, either for all assays or for the four assays with AUC and AS95 in the upper quartile (r² = 0.147, variance 204, p < 0.0001, and r² = 0.786, variance 711, p < 0.0001, respectively; F test IAA index and IA index vs IB4-IA units, p < 0.0001; Fig. 4e, f).
Combined ROC curve
The median IAA index of each patient and control sample, compiled from the 27 assays that included the standards provided in DASP 2005, was used to construct a combined ROC curve (AUC 0.89, 95% CI 0.824–0.957, p < 0.0001; Fig. 5). The AS95 derived from this combined curve was 70%. An IAA index cut-off of 1.5 corresponded to a specificity of 98% and a sensitivity of 54%. In comparison, for autoantibodies to GAD (GADA), the AUC was 0.95 (95% CI 0.91–1.0), with a sensitivity of 88% at 98% specificity. For IA-2A, the AUC was 0.86 (95% CI 0.78–0.94), with a sensitivity of 74% at 98% specificity.
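The combined analysis rests on two steps: taking, for every sample, the median index across the assays that reported it, and reading sensitivity and specificity off those medians at a chosen cut-off. A minimal sketch, assuming that a sample is called positive when its median index is at or above the cut-off (the boundary convention is not stated in the text) and using hypothetical sample identifiers and values; the resulting medians could equally be passed to any standard ROC routine to obtain the AUC.

```python
from statistics import median
from typing import Dict, List, Tuple


def median_indices(per_assay: List[Dict[str, float]]) -> Dict[str, float]:
    """Median index per sample over all assays that reported that sample."""
    collected: Dict[str, List[float]] = {}
    for assay in per_assay:
        for sample, index in assay.items():
            collected.setdefault(sample, []).append(index)
    return {sample: median(values) for sample, values in collected.items()}


def sens_spec_at(cutoff: float, medians: Dict[str, float],
                 is_patient: Dict[str, bool]) -> Tuple[float, float]:
    """Sensitivity and specificity when a median index >= cutoff is called positive."""
    patients = [m for s, m in medians.items() if is_patient[s]]
    controls = [m for s, m in medians.items() if not is_patient[s]]
    sensitivity = sum(m >= cutoff for m in patients) / len(patients)
    specificity = sum(m < cutoff for m in controls) / len(controls)
    return sensitivity, specificity


# Hypothetical indices from three assays for two patients and one control.
per_assay = [
    {"P1": 3.0, "P2": 1.0, "C1": 0.5},
    {"P1": 2.0, "P2": 2.0, "C1": 1.0},
    {"P1": 4.0, "P2": 0.8, "C1": 2.0},
]
medians = median_indices(per_assay)
print(sens_spec_at(1.5, medians, {"P1": True, "P2": True, "C1": False}))  # (0.5, 1.0)
```

Taking the median rather than the mean keeps a single outlying assay from shifting a sample's consensus value.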