The Scottish Diabetes Research Network Type 1 Bioresource (SDRNT1BIO) is a prospective cohort study comprising 6127 people with a clinical diagnosis of type 1 diabetes mellitus, representing 25% of all adults with type 1 diabetes in Scotland, recruited between December 2010 and November 2013. On the day of recruitment (which we refer to as the study day or biosample date), clinical measurements were taken and blood and urine samples were obtained, in which serum creatinine and urinary ACR were measured. From electronic healthcare records we extracted routine health-related data retrospective and prospective to study day, as described previously. For this study we selected 1629 individuals with eGFR of 30 ml min−1 [1.73 m]−2 or above at the biosample date and with at least three prospective eGFR determinations over a period of at least 2 years or incident ESRD. These were a random sample of 50% and 25% of those with starting eGFR below and above 75 ml min−1 [1.73 m]−2, respectively.
The study was performed in accordance with the Declaration of Helsinki; all participants gave their written consent and the study protocol was approved by the local ethics and data governance committee.
Renal outcomes and covariate data
eGFR was calculated with the CKD-EPI equation using serum creatinine values that were directly measured or retrieved retrospectively and prospectively from medical records. Readings concurrent with hospital admissions were excluded. A summary measure of historical eGFR was obtained by computing a weighted average of all retrospective eGFR records for each person, with weights inversely related to the amount of time leading to the biosample date. For participants with no retrospective eGFR data, historical eGFR was set to the study day eGFR. Final eGFR was defined as the median eGFR reading of the last 6 months of follow-up. Initiation of renal replacement therapy was considered to indicate an achieved eGFR of 10 ml min−1 [1.73 m]−2 and all subsequent readings were censored. The decline of renal function was estimated by fitting a simple linear regression model to the serial prospective eGFR determinations of each person. We also defined two binary outcomes for clinically significant progression: progression to eGFR <30 and to eGFR <45 ml min−1 [1.73 m]−2.
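The two per-person summaries described above can be sketched as follows. This is an illustration only: the exact weighting function is not given in the text, so the weight 1 / (1 + years before biosample date) is an assumption, and the function names are hypothetical.

```python
from datetime import date


def historical_egfr(readings, biosample_date):
    """Weighted mean of retrospective eGFR readings.

    ASSUMPTION: weight = 1 / (1 + years before the biosample date). The text
    only states that weights are inversely related to elapsed time.
    `readings` is a list of (date, eGFR) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for reading_date, egfr in readings:
        years_before = (biosample_date - reading_date).days / 365.25
        weight = 1.0 / (1.0 + years_before)
        weighted_sum += weight * egfr
        weight_total += weight
    return weighted_sum / weight_total


def egfr_slope(times_years, egfr_values):
    """Ordinary least-squares slope of eGFR on time (eGFR units per year)."""
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_y = sum(egfr_values) / n
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(times_years, egfr_values))
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    return sxy / sxx
```

With this weighting, a reading taken shortly before the biosample date contributes more to the historical eGFR than one taken several years earlier.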
ACR was measured in paired urine samples, the first taken at study day and the second several days later, using the ADVIA 2400 immunoturbidimetric method for albumin and the ADVIA 2400 enzymatic method for creatinine (Siemens, Munich, Germany). These data were used to adjust for ACR in the analyses reported here. In addition, longitudinal urinary ACR data were captured from routine clinical laboratory data. Clinical record data close to study day were highly correlated (r = 0.73) with the direct measurement. At any time point, albuminuric status was defined from the most recent available albuminuria measurement, provided there was no contradictory record of that stage in the preceding or subsequent 90 days. For example, someone who moved from normo- to microalbuminuria but then had another normoalbuminuria measurement within 90 days was classed as normoalbuminuric across that period, so that transient changes in albuminuria readings were ignored. Participants were classified as normo-, micro- or macroalbuminuric according to their ACR falling in the intervals 0–3.39, 3.39–33.9 or above 33.9 mg/mmol, based on two out of three consecutive measurements before baseline.
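A minimal sketch of the baseline classification rule follows. The handling of boundary values is an assumption (3.39 mg/mmol is treated as microalbuminuric, 33.9 mg/mmol as microalbuminuric), as is returning no category when all three measurements disagree; the function names are hypothetical.

```python
from collections import Counter


def acr_category(acr_mg_per_mmol):
    """Map a single ACR value (mg/mmol) to an albuminuria category.

    ASSUMPTION: values of exactly 3.39 and 33.9 fall in the 'micro' band.
    """
    if acr_mg_per_mmol < 3.39:
        return "normo"
    if acr_mg_per_mmol <= 33.9:
        return "micro"
    return "macro"


def baseline_albuminuria(last_three_acr):
    """Category supported by at least two of three consecutive measurements.

    ASSUMPTION: returns None when the three measurements fall in three
    different categories, a case the text does not address.
    """
    counts = Counter(acr_category(value) for value in last_three_acr)
    category, count = counts.most_common(1)[0]
    return category if count >= 2 else None
```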
Retrospective and prospective covariate data including drug exposures were obtained from the electronic healthcare record SCI-Diabetes as described previously.
Biomarkers measured and analysed
Serum and urine biomarkers were measured on samples stored at −80°C with no prior freeze/thaw cycles. The assay methods used and the quality control performance are summarised in electronic supplementary material (ESM) Table 1. The candidate biomarkers were chosen either because we had already shown these proteins to be predictive of eGFR decline at chronic kidney disease (CKD) stage 3 or worse (serum KIM-1, CD27, α-1-microglobulin, thrombomodulin) or because they were reported from other well-conducted studies as predictive of renal disease progression (EGF and its ratio to monocyte chemotactic protein 1 [MCP-1], EGF receptor, cystatin C, macrophage inhibitory protein, matrix metalloproteinase 8, TNF receptor 1 [TNFR1]) or as strong candidates from known biology of DKD (syndecan-1), or because they are routinely measured on the same multiplex panel as a candidate (the remainder).
We used a combination of assays: (1) the Luminex platform (Austin, TX, USA) at the Clinical Laboratory Improvement Amendments (CLIA)-certified Myriad RBM laboratory (Austin, TX, USA); (2) a high-sensitivity SIMOA assay for KIM-1 at Myriad RBM, which we had found detected KIM-1 in samples that were KIM-1 negative on their standard Luminex assay; (3) an R&D Systems (Minneapolis, MN, USA) Luminex assay at the Immunoassay Biomarker Core Laboratory, University of Dundee (Dundee, UK).
Intraclass correlations were computed over 48 blinded duplicate aliquots to evaluate the reproducibility of the measurements obtained. Biomarkers were excluded from the analysis if their intraclass correlation was less than 0.4, or if over 99% of the readings in the dataset were identical due to falling below the detection threshold. Accordingly, nine serum and 13 urinary biomarkers were included in the final analyses.
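The reproducibility filter can be illustrated with the sketch below. The form of intraclass correlation used is not stated in the text; a one-way random-effects ICC for duplicate measurements is assumed here, and the check for identical readings simply uses the share of the most frequent value. Function names are hypothetical.

```python
from collections import Counter


def one_way_icc(pairs):
    """One-way random-effects ICC for duplicate (k = 2) measurements.

    ASSUMPTION: the text does not say which ICC variant was used.
    `pairs` is a list of (first_aliquot, second_aliquot) values.
    """
    n, k = len(pairs), 2
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / k for a, b in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def passes_quality_control(icc, readings):
    """Keep a biomarker unless ICC < 0.4 or more than 99% of readings are identical."""
    most_common_count = Counter(readings).most_common(1)[0][1]
    share_identical = most_common_count / len(readings)
    return icc >= 0.4 and share_identical <= 0.99
```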
Values below the detection limit were imputed to half the detection threshold. All urine biomarkers were normalised to urinary creatinine.
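These two preprocessing steps might look as follows. The order of operations (imputation before normalisation) and the representation of missing low readings as `None` are assumptions; the function names are hypothetical.

```python
def impute_below_detection(values, detection_limit):
    """Replace readings below the detection limit with half that limit.

    ASSUMPTION: readings reported as undetectable are encoded as None.
    """
    return [
        detection_limit / 2.0 if value is None or value < detection_limit else value
        for value in values
    ]


def normalise_to_creatinine(concentrations, urinary_creatinine):
    """Divide each urinary biomarker concentration by the paired urinary creatinine."""
    return [c / cr for c, cr in zip(concentrations, urinary_creatinine)]
```

For example, with a detection limit of 0.5, readings of 0.1 and `None` would both become 0.25 before division by urinary creatinine.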
To test for associations with renal outcomes, biomarkers were evaluated independently in linear models for final eGFR and logistic regression models for progression adjusted for age, sex, diabetes duration, study day eGFR and length of follow-up (basic covariates). To allow comparison with ACR, we reran the basic and biomarker models including ACR at biosample date. We further adjusted models for BMI, systolic BP, diastolic BP, HbA1c, HDL-cholesterol, total cholesterol, smoking status and a weighted summary of prior eGFR from retrospective records (full covariates). Prior to fitting models, all continuous covariates and biomarkers were Gaussianised and standardised to zero mean and unit standard deviation. Associations were declared significant at Bonferroni-corrected p = 0.05/22 = 2.3 × 10−3.
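The transformation and the significance threshold can be sketched as below. The text does not state how variables were Gaussianised; a rank-based inverse normal transform is assumed here, followed by standardisation using the population standard deviation. The function name is hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def gaussianise(values):
    """Rank-based inverse normal transform followed by z-scoring.

    ASSUMPTION: the exact Gaussianisation method is not given in the text.
    Ties receive their average rank. Fails if all values are identical.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for position in range(i, j + 1):
            ranks[order[position]] = average_rank
        i = j + 1
    normal = NormalDist()
    scores = [normal.inv_cdf((rank - 0.5) / n) for rank in ranks]
    centre, spread = mean(scores), pstdev(scores)
    return [(score - centre) / spread for score in scores]


# Bonferroni threshold for 22 biomarkers (9 serum + 13 urinary)
bonferroni_threshold = 0.05 / 22
```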
Construction of parsimonious panels of biomarkers
Urine and serum biomarker sets were modelled independently from each other. As previously described, we adopted a Bayesian modelling approach based on hierarchical shrinkage priors, in which the clinical covariates used to control for confounding in the models were assigned a weakly informative Gaussian prior (which induces some regularisation as in ridge regression), while biomarkers were penalised through the regularised horseshoe prior (which heavily shrinks regression coefficients toward zero unless they are informative) to promote sparsity. A similar approach was also adopted elsewhere for biomarker selection in the context of type 2 diabetes mellitus. The hierarchical shrinkage approach was implemented using the Stan Bayesian inference framework, which uses Markov chain Monte Carlo (MCMC) to sample the posterior distribution of the parameters given the data and the model. The specific models implemented are available in the R package hsstan (version 0.6: https://CRAN.R-project.org/package=hsstan).
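To illustrate how the regularised horseshoe behaves, the sketch below computes the prior standard deviation of a single coefficient under the general formulation of that prior in the literature, in which the local scale is softly truncated by a slab of finite width. It is not taken from the hsstan source, and the function and parameter names are hypothetical.

```python
import math


def regularised_prior_sd(local_scale, global_scale, slab_scale):
    """Prior standard deviation of one coefficient under the regularised horseshoe.

    Uses lambda_tilde^2 = c^2 * lambda^2 / (c^2 + tau^2 * lambda^2), so that
    the prior sd is tau * lambda_tilde. Names are illustrative only.
    """
    unregularised = global_scale * local_scale  # tau * lambda
    return slab_scale * unregularised / math.sqrt(slab_scale ** 2 + unregularised ** 2)
```

When the product of global and local scales is small, the prior standard deviation is close to that product, so the coefficient is shrunk strongly toward zero. When the product is very large, the standard deviation approaches the slab scale, so that even informative biomarkers receive some regularisation.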
We evaluated the predictive performance of models on withdrawn data using tenfold cross-validation. For each set of baseline covariates used, we reported the difference in log-likelihood (computed on the observations withdrawn for testing from tenfold cross-validation) between the model with baseline covariates and biomarkers and the model including only the clinical covariates. For models of achieved eGFR we computed the r2 as the squared Pearson correlation coefficient between observed and predicted outcome on the test folds. For models of progression, we reported the area under the receiver operating characteristic curve (AUC) and the expected information for discrimination, Λ, expressed in bits. This measure quantifies the gain in information that a set of biomarkers provides over and beyond the baseline clinical covariates. The expected information for discrimination was computed with the R package wevid (version 0.6.2: https://CRAN.R-project.org/package=wevid).
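Three of the test-fold metrics named above can be sketched in a few lines. This illustration assumes that predictions for all withdrawn observations have already been pooled across folds; it does not cover the expected information for discrimination, and the function names are hypothetical.

```python
import math


def squared_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted outcomes."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)


def auc(labels, scores):
    """Probability that a random case scores higher than a random non-case.

    Ties count as one half; labels are 1 for progressors and 0 otherwise.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if c > d else 0.5 if c == d else 0.0 for c in cases for d in controls
    )
    return wins / (len(cases) * len(controls))


def bernoulli_log_lik(labels, probabilities):
    """Test log-likelihood of binary outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probabilities)
    )
```

The reported log-likelihood difference would then be `bernoulli_log_lik` for the model with biomarkers minus the same quantity for the model with clinical covariates only, both evaluated on the withdrawn observations.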
To recover a parsimonious model, we then applied projection predictive variable selection. This approach is based on projecting the high-dimensional draws from the posterior of the model containing all biomarkers (full model) onto lower-dimensional subspaces corresponding to sparse candidate submodels [19, 20]. Predictions made by each candidate submodel are compared with those obtained by the full model: their discrepancy is evaluated using the Kullback–Leibler divergence, which measures the information lost when a smaller submodel is used to approximate a more complex one. By recursively choosing in a forward-selection fashion the submodel that minimises the Kullback–Leibler divergence from the full model, one can construct a series of candidate models of growing complexity. We evaluated each candidate model in terms of its relative contribution towards the performance of the full model, and plotted the relative explanatory power obtained by biomarker panels of different sizes.
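The forward search can be illustrated with a much simplified sketch. It assumes a Gaussian outcome with the noise variance held fixed, in which case the Kullback–Leibler divergence between full-model and submodel predictions is proportional to the squared distance between their predicted means, so each projection reduces to a least-squares fit. It also projects a single vector of full-model predictions rather than individual posterior draws, ignores the clinical covariates, and assumes centred inputs. All names are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_projection(columns, reference_fit):
    """Greedy forward selection of predictors by projection of the full-model fit.

    `columns` maps biomarker name -> centred values; `reference_fit` holds the
    centred predictions of the full model. Returns a list of
    (name, residual_sum_of_squares) in order of selection. Under the stated
    Gaussian assumption the residual sum of squares is proportional to the
    KL divergence from the full model.
    """
    residual = list(reference_fit)
    remaining = {name: list(col) for name, col in columns.items()}
    path = []
    while remaining:
        best_name, best_gain = None, 0.0
        for name, col in remaining.items():
            norm2 = _dot(col, col)
            if norm2 < 1e-12:  # column already explained by selected ones
                continue
            gain = _dot(residual, col) ** 2 / norm2
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no candidate reduces the discrepancy further
            break
        chosen = remaining.pop(best_name)
        chosen_norm2 = _dot(chosen, chosen)
        coefficient = _dot(residual, chosen) / chosen_norm2
        residual = [r - coefficient * c for r, c in zip(residual, chosen)]
        # Orthogonalise the remaining candidates against the chosen column
        for name in list(remaining):
            weight = _dot(remaining[name], chosen) / chosen_norm2
            remaining[name] = [
                x - weight * c for x, c in zip(remaining[name], chosen)
            ]
        path.append((best_name, _dot(residual, residual)))
    return path
```

Dividing each residual sum of squares by that of the empty model and subtracting from one gives a rough analogue of the relative explanatory power of panels of increasing size.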