Protein markers and risk of type 2 diabetes and prediabetes: a targeted proteomics approach in the KORA F4/FF4 study

The objective of the present study was to identify proteins that contribute to pathophysiology and allow prediction of incident type 2 diabetes or incident prediabetes. We quantified 14 candidate proteins using targeted mass spectrometry in plasma samples of the prospective, population-based German KORA F4/FF4 study (6.5-year follow-up). 892 participants aged 42–81 years were selected using a case-cohort design, including 123 persons with incident type 2 diabetes and 255 persons with incident WHO-defined prediabetes. Prospective associations between protein levels and diabetes, prediabetes as well as continuous fasting and 2 h glucose, fasting insulin and insulin resistance were investigated using regression models adjusted for established risk factors. The best predictive panel of proteins on top of a non-invasive risk factor model or on top of HbA1c, age, and sex was selected. Mannan-binding lectin serine peptidase (MASP) levels were positively associated with both incident type 2 diabetes and prediabetes. Adiponectin was inversely associated with incident type 2 diabetes. MASP, adiponectin, apolipoprotein A-IV, apolipoprotein C-II, C-reactive protein, and glycosylphosphatidylinositol specific phospholipase D1 were associated with individual continuous outcomes. The combination of MASP, apolipoprotein E (apoE) and adiponectin improved diabetes prediction on top of both reference models, while prediabetes prediction was improved by MASP plus CRP on top of the HbA1c model. In conclusion, our mass spectrometric approach revealed a novel association of MASP with incident type 2 diabetes and incident prediabetes. In combination, MASP, adiponectin and apoE improved type 2 diabetes prediction beyond non-invasive risk factors or HbA1c, age and sex. Electronic supplementary material The online version of this article (10.1007/s10654-018-0475-8) contains supplementary material, which is available to authorized users.


Introduction
Type 2 diabetes causes an enormous burden for the individual as well as for societies of many countries worldwide [1]. Therefore, improved understanding of disease pathophysiology and the development of preventive measures are particularly important, as well as tools for optimal prediction of future disease occurrence enabling targeted preventive measures. Biomarker data may contribute to these aims [2][3][4][5].
Research on type 2 diabetes risk stratification either aims at diagnosis of insulin resistance or prediabetes [6,7] or attempts to directly predict the future risk of diabetes [7][8][9][10]. Ideally, already the development of prediabetes is prevented as not only diabetic patients but also prediabetic persons may suffer from complications caused by hyperglycemia [11]. Yet, only few studies have developed algorithms to predict incident prediabetes [12].
An important issue for new prediction algorithms or devices is the selection of an appropriate benchmark model. Typical decision points are whether the new algorithm shall add benefit on top of or replace mostly questionnaire-based non-invasive tools or other benchmark biomarkers [13]. Often a combination of both is relevant. Type 2 diabetes is usually diagnosed by fasting glucose or HbA1c concentrations [14] and several other blood markers (most prominently insulin) are known to play important roles in the pathophysiology [3,15]. Therefore, several biomarkers are already available and a deliberate choice is needed.
The German Diabetes Risk Score (GDRS) is a non-invasive tool and currently recommended for type 2 diabetes screening and risk prediction in Germany [16,17]. Since there is no international consensus regarding prediction algorithms in the field of type 2 diabetes, we used the established risk factors included in the GDRS for benchmarking in our German study. Additionally, we used HbA1c concentrations, which have been proposed to be evaluated in combination with the GDRS for screening and prediction [17]. The main advantage of HbA1c in contrast to glucose is its spontaneous availability, avoiding fasting, which is no longer required in routine clinical practice [18], and avoiding oral glucose tolerance test (OGTT) burden [19].
Our study aimed to identify novel protein associations with incident type 2 diabetes or incident prediabetes in order to further elucidate pathophysiological processes underlying diabetes development. 14 candidate proteins selected based on previous results from a mouse model on type 2 diabetes (5), unpublished shotgun discovery proteomic analyses, and literature mining were quantified by a targeted selection reaction monitoring (SRM) mass spectrometry (MS) approach. In addition to the dichotomous outcomes, we investigated whether baseline plasma protein levels were prospectively associated with the diabetes-related continuous outcomes fasting glucose, OGTT 2-h-glucose, fasting insulin, and insulin resistance. Furthermore, our study aimed to assess whether our protein panel improved the prediction of incident type 2 diabetes or incident prediabetes on top of established risk factors and to select the protein subsets with the best predictive power.

Study population, definitions of incidence outcomes and selection criteria
The Cooperative Health Research in the Region of Augsburg (KORA) F4 (2006)(2007)(2008) and FF4 studies (2013-2014) are follow-up examinations of the population-representative KORA S4 study (1999)(2000)(2001), which was conducted in Augsburg (Germany) and two surrounding counties. The study design has been described previously in detail [20].
Previously known type 2 diabetes was defined as selfreport that could be validated by the responsible physician or medical chart review, or as current use of glucose-lowering medication. All participants without known diabetes were assigned to receive a standard 75 g oral glucose tolerance test (OGTT). Blood samples were taken without stasis after an overnight fast of at least 8 h and 2 h after glucose solution intake. Normoglycemia (i.e. fasting glucose < 6.1 mmol/l and 2-h-glucose < 7.8 mmol/l), prediabetes (fasting glucose ≥ 6.1 mmol/l but < 7.0 mmol/l, and 2-h-glucose < 7.8 mmol/l [isolated impaired fasting glucose (IFG)] or fasting glucose < 6.1 mmol/l and 2-h-glucose ≥ 7.8 mmol/l but < 11.1 mmol/l [isolated impaired glucose tolerance (IGT)], or both [IFG and IGT]), and newly diagnosed diabetes (fasting glucose ≥ 7.0 mmol/l or 2-h-glucose ≥ 11.1 mmol/l) were defined according to the 1999/2006 WHO criteria [21,22]. Newly diagnosed and known diabetic participants for whom the diabetes type could not be validated and for whom no contradictory information was given were assumed to have type 2 diabetes. The outcome diabetes included known and newly diagnosed diabetes.
The KORA F4 study included 3080 participants aged 32-81 years, of whom 2161 also participated in KORA FF4 (Fig. 1). For the current prospective KORA F4/FF4 investigation we excluded 189 KORA F4 participants with prevalent diabetes, 112 participants with unclear diabetes status or missing/invalid OGTT at KORA F4 or FF4, 407 participants younger than 42 years at KORA F4 (due to the low incidence of type 2 diabetes in the lowest 10-year agecategory), and six participants with missing covariate data. The remaining 1447 participants qualified for SRM-MS protein measurements. Out of these, we randomly selected a subcohort of 728 participants plus all additional incident type 2 diabetes or incident prediabetes cases. Two participants were excluded due to outliers in the SRM-MS data. The final study comprised 890 participants. The type 2 diabetes analysis sample included 660 non-cases from the subcohort and 123 incident cases. The (pre)diabetes analysis sample contained 446 non-cases from the subcohort and 255 incident cases; these 255 cases comprised 223 prediabetic and 32 diabetic cases who directly progressed from normoglycemia to diabetes. The case-cohort sampling is illustrated in Supplemental Fig. 1. The longitudinal analyses of the continuous diabetes-related outcomes were restricted to 831-855 (depending on outcome) participants with complete data who were not taking glucose-lowering medication.

Clinical measurements and assessments of risk factors
All participants underwent standard physical and medical examinations at KORA F4 and FF4.
HbA1c in KORA F4 was assessed in hemolyzed whole blood using a cation-exchange high performance liquid chromatographic, photometric assay on an Adams HA 8160 Hemoglobin Analysis System (Arkray Inc., distributed by A. Menarini Diagnostics, Florence, Italy). Insulin levels in KORA F4 were measured in thawed serum by an electrochemiluminescence immunoassay on a Cobas e602 instrument (Roche Diagnostics GmbH, Mannheim, Germany). The assessment of the other metabolic, anthropometric, and lifestyle variables was performed as described [23]. In KORA FF4, glucose concentrations were measured in fresh serum by an enzymatic, colorimetric method using the GLU assay on a Dimension Vista 1500 instrument (Siemens Healthcare Diagnostics Inc., Newark, USA) or using the GLUC3 assay, on a Cobas c702 instrument (Roche). KORA FF4 serum insulin concentrations were assessed by a solidphase enzyme-labeled chemiluminescent immunometric assay on an Immulite 2000 systems analyzer (Siemens) or by an electrochemiluminescence immunoassay on a Cobas e602 instrument (Roche). The measurement instrument and assays changed in KORA FF4 from Siemens to Roche halfway during the study. Calibration formulas were developed  No calibration was needed for glucose, because the double measurements were very similar. The Siemens insulin results were calibrated to the Roche measurements using the following formula: Insu-lin_Roche = 7.842 pmol/L + Insulin_Siemens × 1.016. The homeostasis model assessment insulin resistance (HOMA-IR) was calculated as fasting insulin (in pmol/l) × fasting glucose (in mmol/l) ÷ 135. Parental history of diabetes was defined as positive (at least one parent with diabetes), negative (both parents without diabetes) or unknown diabetes status (else). Sibling history of diabetes was defined as positive (at least one sibling with diabetes) or negative (no siblings or all siblings without diabetes or with unknown diabetes status). Because the KORA F4 participants older than 72 years had not been asked for their parental and sibling history of diabetes, missing values were replaced using the participants' data from the KORA/MAGiC-Control, KORA S4 or FF4 studies where available. One sibling history value was imputed using covariate data with the package mice in R [24].

Targeted SRM-MS protein measurements
The plasma samples of 271 persons of the subcohort had already been measured previously (lot 1) [23]. The remaining subcohort and all additional incident cases were newly measured in 2016 when the FF4 data became available (lot 2, n = 621). These measurements were performed similarly as published for lot 1 and are described together with the combined data preprocessing in detail in the Supplemental Material.
In short, the 621 lot 2 samples were randomly distributed into thirteen processing batches and quality control was performed as described for lot 1 [23]. All lot 2 sample preparation batches passed quality control. All measured peptides were proteotypic with the exception of the two 'MASP' peptides TGVITSPDFPNPYPK and AAGNECPELQP-PVHGK which together with the SLPTCLPVCGLPK peptide are transcribed from the MASP1 gene (according to Ensembl human database, release 90, August 2017). While the 'SLPT' peptide only translates into the MASP-1 isoform, the 'TGVI' and 'AAGN' peptides translate into the protein isoforms MASP-1, MASP-3 and MAP44 [25]. Accordingly, all results for the protein originally termed MASP-1 [23] are addressed in this manuscript as MASP. Isotope-labelled synthetic peptides were used for each peptide as internal control to correct signal integration and for relative quantification as described [23]. Liquid chromatography MSMS analysis was performed on an Ultimate3000-HPLC system (Thermo Fischer Scientific, Dreieich, Germany) coupled online to a QTrap4000 mass-spectrometer (ABSCIEX, Framingham, MS, USA) by a nanospray ion source. This technique measures the area under the curve (AUC) signals of pre-specified collision-induced peptide dissociation products, the so-called transitions.
Coefficients of variation (CV) of pooled samples were calculated for all transition signals based on five replicate measurements per lot using the software AuDIT [26]. Transitions with a CV ≥ 30% were excluded. Light (endogenous) to heavy (synthetic) ratios of the AUC values were calculated, log 2 -transformed, and averaged for all transitions of each peptide. Peptide-level light-to-heavy-ratios were averaged per protein to yield relative protein levels. Within this process, the data was corrected for technical covariates. Quality control of signals was based on CV-results of lot 1 and lot 2 pools and 29 duplicate measurements of lot 2 samples. Only peptides demonstrating reliable reproducibility were included in the combined analysis, leaving a total of 30 peptides in lot 1 and 31 peptides in lot 2, and representing 14 candidate proteins (Supplemental Table 1

Statistical analysis
Statistical analyses were performed using R version 3.4.2 [27]. All SRM-MS protein-level light-to-heavy-ratios were divided by their sex-specific SDs. Associations between standardized protein light-to-heavy-ratios and incident type 2 diabetes/incident (pre)diabetes were analyzed by logistic regression. For the prospective analyses between protein light-to-heavy-ratios and continuous diabetes-related outcomes, all diabetic participants using glucose-lowering medication were excluded; the glucose and insulin variables were log e -transformed and z-standardized and analyzed by linear regression, adjusting the follow-up outcome for the respective baseline variable.
The association analyses focusing on pathophysiological mechanisms were adjusted for important baseline type 2 diabetes risk factors, i.e. for sex (male/female), age, waist circumference, and height (all continuous) (model 1), plus smoking (current/former/never), physical inactivity (inactive/active), actual hypertension (yes/no), triglyceride concentration, and total cholesterol/HDL-cholesterol ratio (model 2a). Model 3a was adjusted for all variables available in KORA F4 of the primary prediction reference model, the non-invasive GDRS [16] (GDRS adapted , the adaptations are described in the Supplemental Material). Compared to model 2a, the two lipid variables were replaced with parental (both parents/one parent/unknown/no) and sibling (at least one sibling/no sibling) history of diabetes. The models 2b and 3b were additionally adjusted for baseline HbA1c concentrations.
Interactions between sex and plasma protein levels were assessed in the main pathophysiological model 2a. Because the MASP protein signal comprised peptides derived from different protein isoforms, we conducted sensitivity analyses on the association between the individual MASP peptides and incident type 2 diabetes. In another sensitivity analysis, we tested whether exclusion of the participants who converted directly from normoglycemia to incident type 2 diabetes affected the association estimates for incident prediabetes.
In the prediction analyses, we first selected the protein subsets which best predicted incident diabetes or incident (pre)diabetes according to the Akaike Information Criterion using logistic regression with stepwise variable selection on top of the following basic models: (1) GDRS adapted , (2) age + sex + HbA1c, (3) GDRS adapted + HbA1c. For assessment of the predictive performance, we calculated the following metrics with 95% CIs using 10,000 bootstrapsamples as internal validation approach: (1) area under the receiver operating characteristic (ROC) curve (AUC) of the basic and protein-extended models and their difference (DeltaAUC), (2) integrated discrimination improvement (IDI) and (3) category-free net reclassification improvement (cfNRI) [28]. ROC-plots and risk assessment plots, which separately show the sensitivity of case prediction and the false positive rate of non-case-prediction over all possible risk cut-off values [29], were drawn.
Test results with two-sided p value < 0.05 were considered statistically significant. Table 1 shows the characteristics of the study participants. The cases comprised more men than women and were on average older. At baseline, they were more likely to be physically inactive, to suffer from actual hypertension, and to have a parental and sibling history of diabetes. Furthermore, they had a higher waist circumference, a higher total cholesterol/ HDL-cholesterol ratio, and higher triglyceride concentrations. At baseline and at follow-up cases had higher levels of fasting glucose, 2-h-glucose, fasting insulin, and HOMA-IR.

Analyses focused on pathophysiological mechanisms
After adjustment for age, sex, waist and height (model 1), the levels of adiponectin were inversely and of apolipoprotein C-II (apoC-II), apoC-III, apoE, and MASP positively associated with incident type 2 diabetes (Supplemental Table 2). MASP levels were also positively associated with incident (pre)diabetes.  (Fig. 2); MASP also remained significantly associated with incident (pre)diabetes (1.241 [1.036, 1.486]). Adiponectin association estimates for incident diabetes and incident (pre)diabetes tended to be stronger in men than in women (p value sex-interaction = 0.053 and 0.067, respectively), and the apoC-II diabetes association tended to be stronger in women (p value sex-interaction = 0.077).
In the sensitivity analysis of the outcome incident prediabetes, in which we excluded the 32 study participants having progressed directly from normoglycemia to incident diabetes, the results remained essentially the same (Supplemental Table 3). In the sensitivity analysis investigating separate associations between the three measured MASP peptides and incident diabetes, the 'TGVI' peptide showed the strongest association (OR per sex-specific SD = In the prospective analyses of the continuous outcomes, adiponectin levels were inversely associated with fasting insulin and HOMA-IR levels at follow-up (Fig. 3). Positive associations were observed for apoC-II with fasting glucose, fasting insulin and HOMA-IR, for C-reactive protein (CRP) as well as for glycosylphosphatidylinositol specific phospholipase D1 (GPLD1) with fasting insulin and HOMA-IR, and for apoA-IV and MASP with 2-h-glucose.

Prediction analyses
The non-invasive GDRS-variables predicted incident type 2 diabetes with an AUC of 0.749 [0.687, 0.807] ( Table 2). A subset of three out of all 14 proteins, namely MASP, adiponectin and apoE, was found to best improve the prediction on top of the non-invasive risk factors, with an AUC of 0.772 [0.712, 0.828]. Figure 4a illustrates the AUC values estimated using the complete study data without bootstrapping because they cannot be adequately drawn for the bootstrap-approach.
The same protein panel also best improved diabetes prediction compared to our second reference model consisting of age, sex, and HbA1c.  Table 2). While the cfNRI describes the proportion of individuals for whom the change in calculated risk was in the desired direction (higher for cases, lower for non-cases), the IDI quantifies the actual change in calculated risk for each individual Table 1 Baseline characteristics of the study population Percentages are given for categorical variables, arithmetic means ± SDs for approximately normally distributed, and median (25 th ; 75 th percentile) for skewed continuous variables a Nondiabetic (fasting glucose < 7.0 mmol/l and 2-h-glucose ≤ 11.1 mmol/l) at baseline and follow-up b Nondiabetic at baseline and known clinically diagnosed (n = 56) or newly OGTT diagnosed (n = 67) type 2 diabetes (fasting glucose ≥ 7.0 mmol/l and/or 2-h-glucose ≥ 11.1 mmol/l) at follow-up c Normoglycemia (fasting glucose < 6.1 mmol/l and 2-h-glucose < 7.8 mmol/l) at baseline and follow-up d Normoglycemia at baseline and prediabetes (n = 223) or known (n = 16) or newly diagnosed (n = 16) type 2 diabetes at follow-up (fasting glucose ≥ 6.1 mmol/l and/or 2-h-glucose ≥ 7.8 mmol/l) e For differences between groups: Kruskal-Wallis test for continuous variables; χ 2 test for categorical variables f Skewed, continuous variables g Descriptive statistics for the continuous type 2 diabetes related traits are only given for the study participants who were included in the linear regression analyses of these traits; number of non-cases/cases incident type 2 diabetes: n = 660/88 for fasting glucose, n = 660/64 for 2-h-glucose, n = 657/87 for fasting insulin and HOMA-IR; non-cases/cases (pre)diabetes: n = 446/247 for fasting glucose, n = 446/239 for 2-h-glucose, n = 445/246 for fasting insulin and HOMA-IR (sensitivity for cases and 1-specificity for non-cases). For both reference models the gain in prediction performance mainly consisted of an improved sensitivity to identify diabetes cases (Fig. 4b, c)

Discussion
This large SRM-MS-based cohort study discovered a novel association between MASP protein levels and both incident type 2 diabetes as well as incident (pre)diabetes even after adjustment for established risk factors and biomarkers. This means that MASP levels are not only elevated relatively shortly before the onset of type 2 diabetes but already in those normoglycemic individuals who will progress to (pre) diabetes during the next 6.5 years. The results of this study also clearly show that MASP together with apoE and adiponectin improves the prediction of type 2 diabetes on top of non-invasive risk factor variables and on top of age, sex, and HbA1c concentrations, which are well-known to have a high predictive power [30].

MASP signal
Our previous cross-sectional investigation of the KORA F4 study showed that, compared to normoglycemic persons, prediabetic and diabetic cases had higher MASP plasma levels. Moreover, higher MASP levels were associated with both higher fasting and 2-h-glucose levels [23]. Krogh and coworkers have recently confirmed this cross-sectional association by reporting higher MASP-1 levels in persons with type 2 diabetes compared to controls [31].
The current prospective investigation, in which we newly measured protein data for more than two thirds of the studied persons, adds that MASP plasma levels are already elevated years before type 2 diabetes or (pre)diabetes developed. The potential mechanistic link is currently unclear, but non-traditional roles of complement proteins, such as involvement in type 2 diabetes associated inflammation, beta-cell secretory function and maintaining homeostasis of the pancreatic islets have been suggested [32]. In our data MASP was specifically associated with elevated 2-h-glucose concentrations suggesting that particularly insulin secretion after glucose stimulation may be impaired. Since our MASP signal stems from peptides which are proteotypic for the three isoforms MASP-1, MASP-3, and MAP44, a clear discrimination between these isoforms is currently not possible. MASP-1 is the most abundant serine protease of the complement lectin pathway and thus a major player in the complement cascade which is initiated when a complex comprising mannose-binding lectin (MBL), MBL-associated serine proteases (MASPs: MASP-1, MASP-2, MASP-3) and MBL-associated proteins (MAP19 and MAP44) binds to its target carbohydrate-containing  ligands, primarily derived from pathogens or damaged tissues [33,34]. In our peptide-specific sensitivity analysis, the 'TGVI' and 'AAGN' peptides were most strongly associated with incident diabetes. In contrast to the 'SLPT' peptide which is proteotypic solely for the MASP-1 isoform, these two peptides can derive from MASP-1, MASP-3 or MAP44, suggesting that not MASP-1 but rather MASP-3 or MAP44 are responsible for the observed association and should also be most useful for prediction purposes. As compared to MASP-1 and MASP-2, MASP-3 has a distinct substrate specificity and inhibitor profile and for instance selectively cleaves insulin-like growth factor (IGF) binding protein 5 (IGFBP-5) which binds to IGFs such as IGF-1. MASP-3 may thus modulate the interaction between IGFs and their receptors on cell surfaces [35]. Circulating IGF-1 has been shown to correlate negatively with HOMA-IR in the Framingham Heart Study [36].

Further mechanistic implications
While it has been known for a long time that triglycerides, total cholesterol and HDL cholesterol are associated with type 2 diabetes risk, the association between different apolipoprotein components and type 2 diabetes has only recently been addressed. In the present study apoE, apoC-II and apoC-III levels were higher in those study participants who later developed incident type 2 diabetes, when adjusted for age, sex, and anthropometric measures. This confirms previous studies for apoE [37] and apoC-III [37,38]. All three apolipoproteins correlate positively with the total cholesterol/HDL-cholesterol ratio and triglycerides [23] (although this fact is not completely understood for apoC-II, because this apolipoprotein is an essential cofactor for the lipoprotein lipase which mediates triglyceride hydrolysis [39]). In order to assess whether the apolipoproteins are associated with incident diabetes independently of these known risk factors, we adjusted for these lipids as a next step. The Fig. 3 Estimated difference in continuous outcomes at follow-up for study participants not taking glucose-lowering medication expressed as the SD change in the continuous outcome (standardized z-score β estimate with 95% CI) per one sex-specific SD increase in the respective protein, adjusted for age, sex, waist circumference, height, smoking, physical inactivity, actual hypertension, triglyceride level, total cholesterol/HDL-cholesterol ratio (model 2a) and the baseline value of the investigated outcome variable. FG fasting glucose (n = 855); 2hG 2-h-glucose (n = 831); FI fasting insulin (n = 851), IR HOMAinsulin resistance (n = 851). Bars and diamonds of proteins associated statistically significantly are printed in bold. apoA-IV apolipoprotein A-IV; apoC-II apolipoprotein C-II; apoC-III apolipoprotein C-III; apoE apolipoprotein E; CD5L CD5 molecule-like; CRP C-reactive protein; GPLD1 glycosylphosphatidylinositol-specific phospholipase D1; MASP mannan-binding lectin serine peptidase; MBL2 mannosebinding lectin 2; PZP pregnancy-zone protein; RBP4 retinol-binding protein 4; SHBG sex hormone-binding globulin; THBS1 thrombospondin 1 ◂ adjustment attenuated all associations which corresponds to previous results observed for apoE but not for apoC-III [37]; the reason might be that in contrast to the previous study, we adjusted not only for triglycerides but also for the total cholesterol/HDL-cholesterol ratio. Interestingly, apoC-II, which to our knowledge has not been investigated for association with incident type 2 diabetes to date, showed the strongest positive associations with fasting glucose, fasting insulin and HOMA-IR of all investigated apolipoproteins in the present study. Several pathways have been suggested to link apolipoproteins with diabetes risk. Apart from the pathogenic role of higher triglyceride levels associated with higher apolipoprotein levels [39], and their possible role in inflammatory pathways [38], it was shown in animal and cell studies that apoC-III may promote the development of diabetes directly, by interfering with both function and survival of pancreatic beta-cells [40].
The inverse association between concentrations of the adipose-tissue derived hormone adiponectin and incident type 2 diabetes observed in the present study is well established and confirms our own previous investigations using a different measurement technology [4] and work from other groups [41].

Predictive value of proteins
Regarding our prediction results, several issues deserve to be highlighted. First, the HbA1c reference model predicted type 2 diabetes substantially better than the GDRS adapted -variables alone. Nevertheless, MASP, apoE, and adiponectin improved the prediction on top of HbA1c, age and sex to a similar extent in terms of the IDI-and NRI-metrics as on top of the GDRS adapted reference model. Second, the combination of the circulating biomarkers MASP, apoE, adiponectin, HbA1c, age and sex predicted diabetes equally well as the GDRS adapted -variables and HbA1c. We therefore conclude that while the non-invasive GDRS was mainly developed for  Fig. 4 a Receiver operating characteristic (ROC) curves comparing main prediction models for incident type 2 diabetes. b Risk assessment plot for the GDRS adapted prediction model, without (dashed lines) and with (solid lines) protein-extension. Lines in the lower left part of the figure represent 1-specificity for all possible risk cutoffs for non-cases; lines in the upper right part represent sensitivity for type 2 diabetes cases. The grey area represents the integrated discrimination improvement (IDI). c Risk assessment plot for the 'Age + Sex + HbA1c' prediction model, without (dashed lines) and with (solid lines) protein-extension. The non-case data was grossed up to represent the complete study cohort for the parts B and C of this figure in order to illustrate the relationship between risk of type 2 diabetes, sensitivity and specificity correctly. The ROC-and risk assessment plots were drawn using the complete study data without bootstrapping. Therefore, the AUC values displayed here deviate from the AUC bootstrap values given in the text. All basic, extended and DeltaAUC values computed based on the complete study data are supplied in the Supplemental Table 4 ▸ self-assessment of one's own risk of type 2 diabetes development [42], biomarker information may strongly improve the predictive power. HbA1c combined with information on age and sex is highly predictive, and the additional assessment of the circulating proteins MASP, apoE and adiponectin may improve the prediction even further. Other recent unbiased biomarker prediction studies using different measurement platforms have also identified promising candidates for type 2 diabetes prediction such as ferritin, α-hydroxybutyrate, or α-tocopherol [9,10]. Combining these markers with the novel markers identified in the present study may further improve the prediction. However, this needs to be investigated in additional studies as the comparison of the predictive performance across different published studies is hampered by use of different methodology, especially by use of different study populations, reference models and prediction measures [43]. Moreover, the clinical relevance of the biomarkers' gain in predictive power will depend on the availability of cost-efficient measurement devices.
Concerning (pre)diabetes, both reference models had similar predictive power in terms of AUCs, but overall the AUCs for the prediction of incident (pre)diabetes were substantially lower as compared to incident diabetes. The proteins' predictive power was also lower. On top of the non-invasive GDRS adapted -variables, there was no evidence of improved prediction. However, on top of HbA1c, age, and sex, MASP combined with CRP improved (pre)diabetes prediction significantly in terms of both IDI-and cfNRI-metrics. We are aware of only one other study that has assessed the utility of biomarkers to predict incident prediabetes [12]. This study investigated electronic health record data and also found that CRP levels (besides HDL cholesterol and alanine aminotransferase) predicted the progression from normoglycemia to prediabetes, though no information on statistical significance of the prediction improvement was reported.

Strengths and limitations
To our knowledge, this study is the largest SRM-MS-based proteomics biomarker study in the field of type 2 diabetes research. An additional strength is the prospective design with OGTT data available at baseline and follow-up. This enabled us to investigate (1) who progressed to type 2 diabetes (including newly diagnosed diabetes) among all participants who were nondiabetic at baseline, (2) who converted to (pre)diabetes among all participants who were normoglycemic at baseline, and (3) prospective associations with continuous glucose and insulin outcomes. The availability of many established type 2 diabetes risk factors and HbA1c concentrations made adjustment of our association analyses for the most relevant confounders and selection of appropriate reference models for prediction benchmarking possible.  Finally, we assessed the predictive power of the proteins thoroughly, using several metrics and plots as recommended [43]. A limitation is that our approach did not provide absolute protein concentrations, which, however, should not affect the reported associations. Moreover, it was unclear to which protein isoform the MASP signal belonged. Nevertheless, the signal can be used for prediction purposes, but future studies should clarify which protein isoform or combination thereof is physiologically most relevant. Another limitation is the use of an adapted version of the GDRS as reference model, which most probably lead to a slightly lower basic predictive performance compared to the originally proposed model. In our second reference model, we used HbA1c together with age and sex information amongst others because HbA1c concentrations are not affected by food intake. However, because our study only comprised fasting blood sampling, we could not investigate whether the predictive power of our selected proteins would be equally high using non-fasting samples. Finally, although this study used a state-of-the-art statistical technique for internal validation of the prediction performance, no replication in an independent study has been conducted. Therefore, and because we did not adjust our analyses for multiple testing due to our relatively small sample size (compared to other epidemiological studies investigating single or only few markers), corroboration of our findings will be necessary.

Conclusion
In summary, we report a novel association of increased MASP plasma protein levels with incident type 2 diabetes and incident prediabetes, independent of established type 2 diabetes risk factors. In combination with apoE and adiponectin, MASP improved the prediction of type 2 diabetes beyond non-invasive risk factor variables and beyond HbA1c, age, and sex. External replication and cost-effectiveness studies will need to assess the clinical relevance of the proteins' gain in predictive power.