Multiplex proteomics for prediction of major cardiovascular events in type 2 diabetes

Aims/hypothesis Multiplex proteomics could improve understanding and risk prediction of major adverse cardiovascular events (MACE) in type 2 diabetes. This study assessed 80 cardiovascular and inflammatory proteins for biomarker discovery and prediction of MACE in type 2 diabetes.
Methods We combined data from six prospective epidemiological studies of 30–77-year-old individuals with type 2 diabetes in whom 80 circulating proteins were measured by proximity extension assay. Multivariable-adjusted Cox regression was used in a discovery/replication design to identify biomarkers for incident MACE. We used gradient-boosted machine learning and lasso regularised Cox regression in a random 75% training subsample to assess whether adding proteins to risk factors included in the Swedish National Diabetes Register risk model would improve the prediction of MACE in the separate 25% test subsample.
Results Of 1211 adults with type 2 diabetes (32% women), 211 experienced a MACE over a mean (±SD) of 6.4 ± 2.3 years. We replicated associations (<5% false discovery rate) between risk of MACE and eight proteins: matrix metalloproteinase (MMP)-12, IL-27 subunit α (IL-27a), kidney injury molecule (KIM)-1, fibroblast growth factor (FGF)-23, protein S100-A12, TNF receptor (TNFR)-1, TNFR-2 and TNF-related apoptosis-inducing ligand receptor (TRAIL-R)2. Addition of the 80-protein assay to established risk factors improved discrimination in the separate test sample from 0.686 (95% CI 0.682, 0.689) to 0.748 (95% CI 0.746, 0.751). A sparse model of 20 added proteins achieved a C statistic of 0.747 (95% CI 0.653, 0.842) in the test sample.
Conclusions/interpretation We identified eight protein biomarkers, four of which are novel, for risk of MACE in community residents with type 2 diabetes, and found improved risk prediction by combining multiplex proteomics with an established risk model. Multiprotein arrays could be useful in identifying individuals with type 2 diabetes who are at highest risk of a cardiovascular event.
Electronic supplementary material The online version of this article (10.1007/s00125-018-4641-z) contains peer-reviewed but unedited supplementary material, which is available to authorised users.


Introduction
The prevalence of type 2 diabetes is increasing worldwide, with currently over 400 million individuals diagnosed and over 190 million undiagnosed as having diabetes [1]. Up to 40% of the US population will develop type 2 diabetes during their lifetime, and type 2 diabetes is an important contributor to major adverse cardiovascular events (MACE) such as myocardial infarction and stroke, the leading causes of morbidity and mortality in Western countries [2]. Diabetes is one of the strongest risk factors for MACE [3], and one major treatment goal in type 2 diabetes is to prevent MACE. However, compared with those without diabetes, most individuals with type 2 diabetes remain at increased risk of MACE despite optimal treatment according to current guidelines [4,5]. Cardiovascular prevention is further complicated by increased rates of drug side effects in people with diabetes, including potential adverse glycaemic effects of lipid-modifying agents [6,7].
Newer glucose-lowering drugs such as sodium-glucose co-transporter 2 inhibitors, and cholesterol-modifying agents such as proprotein convertase subtilisin/kexin type 9 (PCSK9) inhibitors, reduce cardiovascular risk in high-risk individuals when added to standard treatment [8,9]. However, their high treatment costs and potential side effects currently prohibit their prescription in the majority of individuals with diabetes [7,10,11]. Identification of high-risk individuals in whom the benefits of aggressive prevention outweigh the costs and side effects is therefore crucial. Clinical decision-making based on overall cardiovascular risk in addition to individual risk factors can improve outcomes, as demonstrated for antihypertensive treatment [12]. Available risk models for MACE in type 2 diabetes are, however, only moderately accurate [13,14], and there is a need for better prediction tools to guide healthcare.
Measuring circulating proteins with presumed roles in cardiovascular pathology by targeted proteomics is a promising approach for biomarker discovery [15]. The translation of proteomics into the clinic, however, has so far been hampered by the resource-demanding technology. Multiplex protein arrays that rely on common methods such as PCR, require small sample volumes and are available at a fraction of the cost of large-scale platforms may provide a clinically applicable method for individualised treatment based on biomarker profiles. One such technique, the proximity extension assay, has been shown to be useful for biomarker discovery in cardiometabolic disease [16-18]. Multiprotein assays have been used to discover new risk markers for cardiovascular disease in type 2 diabetes [19], but the proximity extension method has not been tested to predict risk of MACE in type 2 diabetes.
Here, we used a proximity extension assay to measure the abundance of 80 cardiovascular and inflammatory proteins in plasma and serum from six prospective community cohorts of middle-aged people (30-77 years of age) with type 2 diabetes. We aimed to identify markers of future risk of MACE and to assess the assay's performance against an established risk model in the Swedish National Diabetes Register for the prediction of MACE.

Participating cohorts
The CARDIPP study [20] recruited outpatients aged 55-65 years with type 2 diabetes from 25 primary healthcare centres in the counties of Östergötland and Jönköping, Sweden, between November 2005 and December 2008. Counties were selected to represent different demographic, rural and urban, small- and large-intake areas. Specialist diabetes nurses performed annual assessments [20]. Out of 761 consecutively enrolled participants, 708 with available outcome data and plasma samples were included in the present investigation.

Prospective Investigation of the Vasculature in Uppsala Seniors
In 2001, a non-selective sample of Uppsala community residents aged 70 years was recruited to participate in the longitudinal Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) study [21] to evaluate measures of endothelial function; 1016 (50.2%) out of 2025 invited individuals enrolled. Follow-up biomedical assessments have been performed at 5-yearly intervals (for more information, please see www.medsci.uu.se/pivus/). All 98 participants with type 2 diabetes at baseline were included in the present study.

Uppsala Longitudinal Study of Adult Men
In 1970, all 2841 male residents of Uppsala county, Sweden, who had been born between 1920 and 1924 were invited to participate in the Uppsala Longitudinal Study of Adult Men (ULSAM) [22], and 2322 (81.7%) were enrolled. Health assessments have been performed regularly since then (for details, please see www.pubcare.uu.se/ulsam/), and the current study includes all 86 participants with type 2 diabetes at an assessment age of 77 years.
(2) claudication symptoms with an ankle-brachial pressure index ≤0.90; or (3) claudication symptoms with signs of arterial occlusive disease in the ipsilateral extremity on ultrasound examination. Out of 614 eligible individuals, 452 (73.6%) enrolled. Control participants (n = 692) were recruited from Swedish residents in the Swedish population register who were matched by age, sex and municipality to participants enrolled in VaMIS. The current study includes all 80 individuals in SAVa-control and 99 in PADVa who were diagnosed with type 2 diabetes at baseline.

Malnutrition, Inflammation and Vascular Calcification cohort
The aim of the Malnutrition, Inflammation and Vascular Calcification (MIVC) cohort [24] is to study risk factors in kidney disease. Between March 2010 and March 2013, the study enrolled 300 consecutive outpatients with stage 3-5 chronic kidney disease who were not undergoing dialysis at the Dante Pazzanese Institute of Cardiology, São Paulo, Brazil. The current study includes all 140 participants with type 2 diabetes.

Ethical permission
Participants provided written informed consent, and the study was conducted according to the Declaration of Helsinki. Ethical permission was granted by the ethics committees of Linköping University (Dnr. 26

Inclusion criteria and outcome definition
In CARDIPP, MIVC, SAVa-control and PADVa, type 2 diabetes was defined as a physician diagnosis of type 2 diabetes according to national guidelines (at least two separate fasting glucose levels ≥7.0 mmol/l, or at least two separate HbA1c concentrations >48 mmol/mol [>6.5%; in MIVC], or prescription of diabetes medication). In ULSAM, type 2 diabetes was defined as HbA1c >48 mmol/mol (>6.5%), prescription of diabetes medication or a fasting plasma glucose level ≥7.0 mmol/l. In 25 out of 86 participants included in ULSAM, diabetes was diagnosed by elevated fasting glucose alone. In PIVUS, type 2 diabetes was defined as a physician diagnosis, prescription of glucose-lowering medication or a fasting plasma glucose level ≥7.0 mmol/l. In the PIVUS group, diabetes was diagnosed by elevated fasting glucose alone in 21 out of the 98 included participants. Individuals without available fasting frozen plasma or serum samples, or with missing outcome data, were excluded. MACE was defined as a new episode of fatal or non-fatal myocardial infarction (I21 in ICD-10; www.who.int/classifications/icd/en/) or fatal/non-fatal stroke (I60-I63), whichever occurred first, and was obtained from hospital and death register linkage.

Covariate definitions
To adjust for established risk factors, we selected all variables included in the Swedish National Diabetes Register (NDR) calculator for 5 year risk of MACE in individuals with type 2 diabetes [13]: sex, systolic blood pressure (mmHg), BMI (kg/m²), current smoking, diagnosis of atrial fibrillation, history of myocardial infarction or stroke, HbA1c (mmol/mol, %), HDL-cholesterol and total cholesterol (mmol/l), duration of type 2 diabetes (days), microalbuminuria (3-30 mg/mmol urinary creatinine) and macroalbuminuria (>30 mg/mmol urinary creatinine). Additional covariates included current antihypertensive, statin or diabetes medication, LDL-cholesterol (mmol/l) and eGFR (ml min⁻¹ [1.73 m]⁻², calculated from plasma creatinine according to sex, age and ethnicity). Missing values in covariates were imputed by multivariate imputation by chained equations with predictive mean matching using all other covariates and averaged across five iterations. Imputed values were compared against recorded values to assess for aberrations.

Multiplex protein assay
Blood samples were obtained from individuals instructed to fast overnight, and were then spun down and stored as serum (ULSAM) or EDTA plasma samples (all other cohorts) at −70°C until analysis. The Proseek CVD Multiplex 96×96 (Olink, Uppsala, Sweden) measures 92 cardiovascular or inflammatory proteins and four internal control samples using the proximity extension assay method (details on quality control, validation and content of the assay are available in electronic supplementary material [ESM] Table 1 and ESM Methods). It has previously been applied to discovering biomarkers for cardiometabolic traits [16-18]. In brief, approximately 10 μl of sample were assayed on a 96-well plate, and protein abundance was measured by PCR based on the binding of two specific antibodies for each protein. Log2-scaled abundance values adjusted for technical variation with internal controls were transformed to a mean of zero and an SD of 1. Proteins with >15% missing values were excluded. Other missing values were imputed as the lower limit of detection divided by two. The numbers of missing values are given in ESM Table 2. A total of 12 proteins had >15% missing values in at least one cohort and were excluded, leaving 80 proteins for inclusion in the study.
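The preprocessing rules above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the order of the steps and the assumption that the lower limit of detection is expressed on the same scale as the abundance values are ours, and the function handles the values of one protein within one cohort only.

```python
from statistics import mean, stdev

def preprocess_protein(values, lod, max_missing=0.15):
    """Filter, impute and standardise one protein (hypothetical helper).

    values: abundance values, with None marking a missing measurement.
    lod: lower limit of detection (assumed to be on the same scale as values).
    Returns None if more than max_missing of the values are missing.
    """
    n_missing = sum(v is None for v in values)
    if n_missing / len(values) > max_missing:
        return None  # protein fails the >15% missingness filter
    # Impute the remaining missing values as half the detection limit.
    filled = [lod / 2 if v is None else v for v in values]
    # Scale to a mean of zero and an SD of 1 (fails if all values are equal).
    m, s = mean(filled), stdev(filled)
    return [(v - m) / s for v in filled]
```

In the study a protein was excluded if it exceeded the threshold in at least one cohort, so a caller would keep a protein only when this function returns a result for every cohort.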

Statistical analysis
Design The study was divided into two parts, one aimed at biomarker discovery and one at risk prediction (Fig. 1). In part 1, the largest sample, from CARDIPP, was used for discovery, and all other cohorts, combined at the individual level, were used for replication. In part 2, the combined discovery and replication cohorts were randomly split into a 75% training and a 25% test set to assess whether the different proteins would improve the prediction of MACE.
Part 1: biomarker discovery Cox proportional hazards regression adjusted for age and sex was used for each protein, with time-to-MACE as outcome. Participants were considered to be at risk until the occurrence of MACE or until the last day of follow-up. An inverse Gaussian frailty effect was included to adjust for heterogeneity between cohorts. The linearity of associations with risk of MACE was assessed by adding a spline term to the linear model (using the pspline function in R with defaults, and retaining the linearity assumption if the regression β coefficient's p value exceeded 0.05). Proportional hazards assumptions were assessed in Schoenfeld residual plots and tests of weighted residuals (threshold p < 0.05). The protein assay does not provide standard concentration units, and values were scaled to a mean of zero and an SD of 1. Proteins associated with MACE at a <5% false discovery rate (FDR) in the CARDIPP discovery sample were tested in the replication sample, and associations at <5% FDR at the replication stage were considered significant. To test for independent associations with MACE, we additionally adjusted for the following established cardiovascular risk factors [25] that were available in the cohorts: atrial fibrillation, BMI, HbA1c, LDL-cholesterol, microalbuminuria, systolic blood pressure, sex, smoking, statins, duration of type 2 diabetes, history of cardiovascular disease and eGFR.
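The two-stage FDR filter can be sketched in Python as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up method used here is an assumption, as are the function names and the dictionary layout of the p values.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Assumption: the paper's FDR control is not specified; BH is used here.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p value lies under the BH line.
        if p_values[i] <= fdr * rank / m:
            k_max = rank
    return sorted(order[:k_max])

def discovery_replication(p_discovery, p_replication, fdr=0.05):
    """Proteins passing FDR in discovery and then again in replication.

    Both arguments map a protein name to its Cox regression p value.
    """
    names = list(p_discovery)
    hits = benjamini_hochberg([p_discovery[n] for n in names], fdr)
    carried = [names[i] for i in hits]
    # Only proteins carried forward are corrected at the replication stage.
    rep_hits = benjamini_hochberg([p_replication[n] for n in carried], fdr)
    return [carried[i] for i in rep_hits]
```

Restricting the second correction to the proteins carried forward keeps the multiple-testing burden at the replication stage proportional to the number of discovery hits rather than to the full panel.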
Part 2: risk prediction To assess whether adding proteins to established risk factors improved prediction, we tested the performance of the variables included in the NDR risk calculator with and without the protein values. The NDR model (https://www.ndr.nu/IFrameRisk/) [13] was developed in the Swedish NDR to predict 5 year risk of MACE in 30-75-year-olds with type 2 diabetes and comprises age of onset and duration of type 2 diabetes, logₑ(total cholesterol/HDL-cholesterol), logₑ(HbA1c), logₑ(systolic blood pressure), logₑ(BMI), sex, current smoker, microalbuminuria, macroalbuminuria, atrial fibrillation and history of cardiovascular disease. This model is recommended for evaluating cardiovascular risk in adults with type 2 diabetes by the Swedish National Board of Health and Welfare [26].
We selected the NDR variables for our baseline risk model but used a different statistical approach from that used by Zethelius et al, who developed the NDR calculator [13]. The combined cohorts were randomly split into training (75%) and test (25%) datasets. Cox gradient boosted machine (GBM) learning [27] was applied to the training sample. A baseline model with NDR variables and a baseline-plus-protein model were derived. GBM variables were optimised with regard to model performance (AUC) and complexity as explained in ESM Methods. C statistic, sensitivity and specificity were estimated in the separate test sample. Performance measures and CIs were obtained by bootstrapping in 1000 random iterations. In order to identify a sparse selection of proteins that need to be added to the NDR risk factors to achieve risk discrimination comparable to that of the whole assay, we used L1-regularised lasso Cox regression. We forced the NDR risk factors into the model by setting their penalty factor (the penalty.factor argument passed through the cv.glmnet function in R) to zero, and trained the model by tenfold bootstrapped cross-validation in a random 75% training sample. The optimum sparse model that minimised the prediction error (selected by lambda.min) was evaluated in the separate 25% holdout test sample. Analyses were performed in R software version 3.3.2 (https://www.r-project.org/) using the packages survival, nephro, mice, powerSurvEpi, gbm, glmnet, pROC and ggplot2.
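To make the evaluation step concrete, the following Python sketch computes a C statistic for censored data with a bootstrapped interval in a held-out sample. It is an illustration under stated assumptions rather than the analysis code: it uses Harrell's definition of concordance and a percentile bootstrap, neither of which is specified in the text, and all function names are hypothetical.

```python
import random

def c_statistic(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair is comparable when the person with the shorter follow-up had an
    event; pairs with tied follow-up times are ignored in this sketch.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable if comparable else float("nan")

def bootstrap_ci(times, events, risks, n_boot=1000, seed=0):
    """Percentile 95% interval for C from resampling the test set."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = c_statistic([times[k] for k in idx],
                        [events[k] for k in idx],
                        [risks[k] for k in idx])
        if c == c:  # drop resamples without comparable pairs (NaN)
            stats.append(c)
    stats.sort()
    lo = stats[int(0.025 * len(stats))]
    hi = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return lo, hi
```

Here `risks` stands for the predictions of a model fitted on the training sample only; comparing the interval for the baseline model with that for the baseline-plus-protein model mirrors the comparison described above.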

Results
Sample characteristics Figure 1 illustrates the study flow chart, and

Protein biomarkers associated with risk of MACE
In the discovery sample, 35 out of 80 proteins were associated with prospective risk of MACE at a 5% FDR after adjustment for age and sex (ESM Table 3). Eight associations were replicated at <5% FDR in the separate replication sample (ESM Table 4). In order to test for associations between biomarkers and MACE independent of established risk factors, we combined all cohorts and tested the eight replicated biomarkers in models adjusted for cardiovascular risk factors. Figure 2 shows the results for the eight biomarkers. In the fully adjusted models, increased levels of the following were associated with incident MACE: matrix metalloproteinase (MMP)-12 ( Tables 5-7). A sensitivity analysis with additional adjustment for circulating levels of N-terminal pro-brain natriuretic peptide in those cohorts with available measurements resulted in somewhat increased p values, but essentially left the associations between biomarkers and risk of MACE unchanged (ESM Results, ESM

Discussion
In this prospective multicohort study of adults with type 2 diabetes, we used multiplex proteomics to identify four novel biomarkers associated with prospective risk of a major cardiovascular event independent of potential confounders.
Addition of proteomics data to established risk factors improved the 6 year risk prediction of cardiovascular events.

Novel biomarkers for cardiovascular risk in diabetes
We identified eight circulating biomarkers, including four novel ones, for incident cardiovascular events after adjustment for established risk factors. Our results replicate previous findings in individuals with type 2 diabetes of associations of increased levels of MMP-12 [17], FGF-23 [28], TNFR-1 and TNFR-2 [29] with incident MACE. For the other four biomarkers, we found no previous studies of prospective associations with MACE in type 2 diabetes, although all have been implicated in cardiometabolic disease in other settings. Protein S100-A12 (EN-RAGE), the ligand for RAGE, has been associated with incident type 2 diabetes [30] and risk of coronary heart disease [31]. Interaction between RAGE and EN-RAGE triggers an inflammatory cascade, and it has been shown that expression of protein S100-A12 in vascular smooth muscle cells induces oxidative stress, inflammation and vascular remodelling [32].
KIM-1 is mainly expressed in the apical membrane of the renal proximal tubule, and raised circulating levels of KIM-1 are associated with progressive stages of chronic kidney disease in individuals with type 2 diabetes [33,34]. Associations between raised plasma levels of KIM-1 and adverse cardiovascular risk factors in the general population have recently been reported [35]. Our results in analyses adjusted for kidney function support a potential role of circulating KIM-1 as a cardiovascular risk marker independent of its association with renal function. Our study cannot address the pathogenic mechanisms or potential causality linking KIM-1 to cardiovascular risk in type 2 diabetes, and future experimental studies are indicated. TRAIL-R2 is a cell surface receptor for TNF-related apoptosis-inducing ligand (TRAIL), involved in apoptosis. Raised circulating TRAIL-R2 levels have been linked with cerebral atherosclerosis [36] and increased mortality in acute myocardial infarction [37]. Possible mechanisms linking the TRAIL/TRAIL-R2 pathway to atherosclerotic disease involve the endothelial response to cholesterol deposits [37,38] and the composition of circulating fatty acids, as a study in an Alaskan Inuit population found an association between plasma fatty acid levels and genetic variants of the TRAIL-R2 gene TNFRSF10B [39].
IL-27 has complex pro- and anti-inflammatory effects that include direct modification of CD4+ and CD8+ T cells, as well as roles in both innate and antibody-mediated immunity [40]. It has been linked, for instance, to type 1 diabetes [41] and improved atherosclerosis in mice [42], yet functional genetic variants of IL27 were not associated with cardiovascular outcomes in a sample of Chinese individuals [43]. The roles of the four new biomarkers in inflammatory pathways point to an important role of the immune system in cardiovascular pathology in type 2 diabetes. Whether the novel biomarkers might serve as treatment targets remains to be assessed in future studies.

Multiplex proteomics improves prediction beyond established risk factors
The addition of proteins to the variables included in the NDR risk model significantly improved cardiovascular risk prediction. In our test sample, added biomarkers improved discrimination from 68.6% to 74.8%, compared with 72.0% reported in the original publication of the NDR model [13]. The model containing the NDR risk factors plus proteins also improved sensitivity and specificity for the upper half (79.1% and 55.8%, compared with 76.2% and 52.9%, respectively, in the original NDR model [13]) and the upper quarter of predicted risk (53.6% and 80.5%, compared with 51.2% and 77.9%, respectively). Importantly, direct comparisons with the NDR calculator are not possible as we used a different statistical method and study design, as well as a smaller test sample and a somewhat longer follow-up of approximately 6 years. The crucial comparisons are therefore the test set performances in our own sample. Predictor selection with lasso regression retained a subset of 20 proteins in addition to risk factors and achieved a discrimination performance (C = 0.747) near-identical to that of the model including all 80 proteins. Our results demonstrate that adding proteomics data to known risk factors might aid decision-making for cardiovascular prevention in individuals affected by type 2 diabetes. The protein assay used in this study analyses small sample volumes in under 48 h, making it potentially useful for clinical practice. The accessibility of proteomics platforms is likely to increase in the coming years, and a number of studies have demonstrated how proteomics can yield new biological insights [16-18].
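Sensitivity and specificity for the upper half or upper quarter of predicted risk can be computed as in the following Python sketch. It is a simplified illustration with hypothetical names: it treats MACE as a binary outcome, ignores censoring and assumes at least one participant with and one without an event.

```python
def sens_spec_at_top_fraction(risks, had_event, fraction):
    """Sensitivity and specificity when the top fraction of risk is flagged.

    risks: predicted risk per participant; had_event: True if MACE occurred.
    fraction: 0.5 for the upper half, 0.25 for the upper quarter.
    """
    n = len(risks)
    n_flagged = round(n * fraction)
    # Rank participants from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: risks[i], reverse=True)
    flagged = set(order[:n_flagged])
    tp = sum(1 for i in flagged if had_event[i])
    fp = len(flagged) - tp
    fn = sum(1 for i in range(n) if had_event[i] and i not in flagged)
    tn = n - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)
```

Flagging a smaller fraction trades sensitivity for specificity, which matches the pattern of the reported values for the upper half and the upper quarter.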
Clinical decisions about whether more aggressive cardiovascular prevention with newer drugs will benefit individuals with type 2 diabetes are difficult, given the progressively smaller benefits, risk of side effects and treatment costs [7-9, 11, 14]. In this study, we demonstrate how a multibiomarker assay can improve risk prediction; future studies in an embedded healthcare setting are indicated to assess the value of '-omics' methods in day-to-day practice. Targeted cardiovascular proteomics might also be useful for streamlining clinical trials of cardiovascular prevention by risk-stratifying participants, which could improve power to detect clinically meaningful effects and limit expenses [44]. Any application of proteomics with clinical consequences, however, first requires careful validation in future studies.

Strengths and limitations
Strengths of our study include the prospective community samples, a discovery/replication design and the use of a low sample-volume assay with high-specificity antibody doublets. We limited the risk of overfitting by replicating results in a separate random test subsample and averaging across 1000 iterations, but the bootstrapped CIs have to be interpreted with caution, and our model should be replicated in an independent study. The C statistic of the baseline model was somewhat lower than expected, which may have led to overoptimistic results after adding proteins. On the other hand, the C statistic is usually rather insensitive to added predictors, and we showed convincing improvement [45]. Limitations include the moderate sample size, lack of power to assess the components of MACE as separate outcomes, and failure of 12 proteins in quality control because of missing values. Generalisability is limited to middle-aged to elderly adults (30-77 years of age). Analyses accounted for heterogeneity between cohorts and, rather than limiting variability and effective sample size by tightening the inclusion criteria, we attempted to increase external validity by including a broader range of individuals with type 2 diabetes that would reflect clinical reality.

Conclusion
We found that a high-throughput multiprotein assay for presumptive disease markers can identify novel biomarkers and improve the identification of individuals with type 2 diabetes at highest risk of a cardiovascular event. Larger clinic-based studies are needed to assess the value of multiplex proteomics in a healthcare context.
Data availability The authors report that, for approved reasons, some access restrictions apply to the data underlying this study. Phenotypes from ULSAM, PIVUS, CARDIPP, MIVC, SAVa-control and PADVa are not publicly available for ethical reasons, as agreed upon by participating volunteers in their informed consent. Data are available on request for researchers who meet the criteria for confidential data access. Data from the ULSAM study are available from the ULSAM steering committee (http://www.pubcare.uu.se/ulsam/Database; contact: V. Giedraitis, vilmantas.giedraitis@pubcare.uu.se). Data from the PIVUS study are available from the PIVUS steering committee (http://www.medsci.uu.se/pivus/; contact: lars.lind@medsci.uu.se). Data from the MIVC study are available from the MIVC steering committee (contact: A. Cordeiro, accordeirojr@uol.com.br). Data requests for the PADVa/SAVa study should be addressed to the steering group (https://savastudy.se/coworkers/; contact: P. Hedberg, par.o.hedberg@regionvastmanland.se). Data requests in the CARDIPP study should be addressed to the steering committee (details: https://clinicaltrials.gov/ct2/show/NCT01049737; contact: C. J. Östgren, carl.johan.ostgren@liu.se).
Duality of interest EI is a scientific advisor for Olink Proteomics for projects unrelated to the present study. The company had no influence over the design, analysis or interpretation of data in the present study, and did not provide any funding for the study. JÄ has received lecturing fees from AstraZeneca unrelated to the present project. The authors report that there are no other dualities of interest associated with their contribution to this manuscript.