Background

The continued need for tools to identify risk for future cardiovascular disease (CVD) and Type 2 diabetes mellitus (T2DM) has driven the design of risk-prediction scoring systems, which can then be used to identify and motivate high-risk patients toward preventative treatments and lifestyle improvement. These predictive scoring systems include the Framingham calculator from D’Agostino et al. [1, 2] and the more recent atherosclerotic cardiovascular disease (ASCVD) Pooled Cohort Equations scoring system by Goff et al. [3] (Table 1). These systems can be used clinically to detect future CVD risk, with respectable c statistics of 0.76–0.79 for the Framingham calculator [1] and 0.71–0.82 for the ASCVD [3].

Table 1 Predictors included in existing CVD and T2DM risk scores

Scoring systems for risk of future T2DM include that of Schmidt et al. [4], based on prospective data from the Atherosclerosis Risk in Communities Study (ARIC). A predictive score for risk of current T2DM by Bang et al. [5] has been endorsed by the American Diabetes Association (ADA) [6], potentially because of its reliance on a relatively small number of variables and no need for blood testing (Table 1). These T2DM scoring systems detect T2DM risk with an area under the curve of 0.80 for the Schmidt equation [4] and 0.74 for the Bang equation [6].

Another notable cardiovascular and diabetes risk predictor is the metabolic syndrome (MetS), a cluster of CVD risk factors including central obesity, high blood pressure, high fasting triglycerides, low HDL cholesterol, and elevated fasting glucose [7]. These factors are associated with insulin resistance and appear to be driven by underlying processes of adipocyte dysfunction, systemic inflammation and oxidative stress [8]. Using traditional criteria, MetS is classified according to the presence of at least three abnormalities in the MetS components [7]. The value of MetS as a concept has been questioned, with several studies demonstrating that the presence of MetS by traditional criteria did not provide additional CVD or T2DM prediction beyond that conveyed by the individual MetS components [9,10,11]. We recently formulated a sex- and race/ethnicity-specific MetS severity Z-score with differential weighting of the individual MetS components based on how these components correlated together by sex and racial/ethnic sub-group [12, 13]. While this MetS severity score was not specifically formulated to be a risk predictor, we demonstrated that this score remained significantly associated with long-term risk for coronary heart disease (CHD) [14, 15] and T2DM [16, 17], even in models that included the individual MetS components [15, 17]. This finding contributes to the notion that the presence of the underlying processes driving MetS may confer additional risk for CHD and T2DM.

Current risk scores do not incorporate a component estimating the presence of MetS, beyond incorporation of some of its individual risk factors. We hypothesized that adding MetS severity to common risk scores for CVD and T2DM would increase their power to predict risk. The goal of this study was to assess (1) whether MetS severity remained a significant predictor of long-term CHD and T2DM risk, even when assessed alongside existing risk scores, and (2) whether the addition of MetS to these scores improved their ability to predict disease risk. We utilized longitudinal data from black and white participants of the Atherosclerosis Risk in Communities (ARIC) study and the Jackson Heart Study (JHS), with up to 20 years of follow-up, to assess whether risk prediction could be improved by adding MetS severity.

Methods

Study sample

The study sample consisted of participants from two large cohort studies: ARIC and the Jackson Heart Study (JHS). This study and/or its analysis was approved by the Institutional Review Boards of the University of Florida, the University of Virginia, and the study sites for the ARIC; all participants provided informed consent. ARIC started in 1987 as a large community-based epidemiological cohort study of mostly white and African–American participants. A total of 15,792 individuals aged 45–64 years old were recruited for four visits across four US sites. JHS started in 2000 as an expansion of the ARIC study site in Jackson, MS, with three visits, focusing on African Americans. JHS recruited 5301 African Americans aged 21–95 years old, among which 1626 participants had been followed as part of ARIC. For these 1626 participants, we used data from their ARIC follow-up and not their JHS follow-up. For the purposes of this analysis, we excluded the participants who self-identified as a race other than white or African American (n = 46). We also excluded participants with baseline (Visit 1) T2DM (n = 2485), CHD (n = 973), or stroke (n = 393), and participants who had missing baseline data on MetS components (n = 792), who had non-fasting laboratory studies (n = 507), and/or those without follow-up data regarding T2DM outcomes (n = 2992). Eventually, data from 13,141 participants were used for this study. Previous reports have published details of procedures for blood collection and analysis for lipids [18] and serum glucose [19]. Briefly, participants fasted overnight for 12 h before the examination. Phlebotomy was performed, blood sample was centrifuged and serum was sent to a central laboratory for examination. Triglycerides were measured by enzymatic methods, and HDL cholesterol was measured after dextran-magnesium precipitation. LDL cholesterol was calculated using the Friedewald equation. 
Serum glucose was measured by the hexokinase/glucose-6-phosphate dehydrogenase method [9]. BP was examined in the sitting position with a random-zero sphygmomanometer; of the three measurements performed, the average of the last two was used for analysis.
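The two derived baseline measures described above can be sketched as follows. This is a minimal illustration, not the study's own code: the function and parameter names are hypothetical, total cholesterol is assumed to be available in mg/dL, and the 400 mg/dL triglyceride cutoff is the conventional limit for the Friedewald estimate rather than something stated in this paper.

```python
def friedewald_ldl(total_chol_mg_dl, hdl_mg_dl, triglycerides_mg_dl):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald equation.

    LDL = total cholesterol - HDL - triglycerides / 5, all in mg/dL.
    Assumption: return None when triglycerides are >= 400 mg/dL, where the
    estimate is conventionally considered unreliable.
    """
    if triglycerides_mg_dl >= 400:
        return None
    return total_chol_mg_dl - hdl_mg_dl - triglycerides_mg_dl / 5.0


def analysis_bp(readings):
    """Average the last two of three seated blood pressure readings."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    return (readings[1] + readings[2]) / 2.0
```

For example, a participant with total cholesterol 200, HDL 50 and triglycerides 150 mg/dL would have an estimated LDL of 120 mg/dL.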

Study outcomes

Time to incident CHD

Incident CHD was determined from adjudicated outcomes using standard ARIC and JHS protocols and included fatal or nonfatal hospitalized myocardial infarction, fatal CHD, silent myocardial infarction identified by electrocardiography, or coronary revascularization [19, 20]. The study outcome time to incident CHD events was defined as the minimum number of days between the baseline visit and either the first event, death from other causes, last contact, or Dec 31, 2011. Given the focus on evaluating the performance of risk scores that were specifically designed to estimate 10-year risk of CHD, we calculated prediction statistics (described below) of these overall survival models with respect to 10-year risk.
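As a rough sketch of how the time-to-event outcome defined above could be derived per participant, the snippet below takes the earliest of the candidate end dates and records whether that date was a CHD event. The function name, argument names, and the tie-breaking rule (an event on the same day as a censoring date counts as an event) are assumptions made for illustration.

```python
from datetime import date

# Administrative end of follow-up stated in the text.
ADMIN_CENSOR_DATE = date(2011, 12, 31)


def time_to_chd(baseline, chd_event=None, other_death=None, last_contact=None):
    """Return (days_from_baseline, event_flag) for one participant.

    event_flag is 1 if follow-up ended with a CHD event and 0 if the
    participant was censored (other death, last contact, or Dec 31, 2011).
    """
    candidates = [(ADMIN_CENSOR_DATE, 0)]
    if chd_event is not None:
        candidates.append((chd_event, 1))
    if other_death is not None:
        candidates.append((other_death, 0))
    if last_contact is not None:
        candidates.append((last_contact, 0))
    # Earliest date wins; on a tie, prefer the event (assumption).
    end_date, flag = min(candidates, key=lambda c: (c[0], -c[1]))
    return (end_date - baseline).days, flag
```

A participant enrolled on Jan 1, 1988 with a first CHD event on Jan 1, 1990 would contribute 731 days and an event flag of 1.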

T2DM

Incident T2DM was determined slightly differently for the ARIC and JHS participants due to differences in variable specifications. In ARIC, participants were defined as having T2DM if they reported that a physician had told them they had diabetes, had a fasting glucose ≥ 126 mg/dL or a non-fasting glucose ≥ 200 mg/dL, or if they reported they were taking insulin or oral hypoglycemic medications [4]. In JHS, participants were defined as having T2DM if they had a fasting glucose ≥ 126 mg/dL or an HbA1c ≥ 6.5% or if they took a diabetic medication within 2 weeks prior to the clinic visit [21]. Incident T2DM was determined for Visits 2–4 separately for ARIC participants and for Visits 2–3 separately for JHS participants (as we excluded those with T2DM at Visit 1). The study outcome T2DM was defined as a dichotomized variable, with “Yes” indicating T2DM at any visit and “No” indicating no T2DM at all visits, considering a follow-up time of 10 years.
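The cohort-specific definitions above can be written as simple per-visit rules. The sketch below is illustrative only; the function and argument names are hypothetical, and missing laboratory values are assumed to be passed as None.

```python
def has_t2dm_aric(self_reported_dx, glucose_mg_dl, fasting, on_diabetes_medication):
    """ARIC rule: physician diagnosis, glucose threshold, or medication use."""
    threshold = 126 if fasting else 200  # fasting vs. non-fasting cutoff (mg/dL)
    glucose_high = glucose_mg_dl is not None and glucose_mg_dl >= threshold
    return bool(self_reported_dx or on_diabetes_medication or glucose_high)


def has_t2dm_jhs(fasting_glucose_mg_dl, hba1c_percent, medication_past_2_weeks):
    """JHS rule: fasting glucose, HbA1c, or recent diabetic medication."""
    glucose_high = fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126
    hba1c_high = hba1c_percent is not None and hba1c_percent >= 6.5
    return bool(glucose_high or hba1c_high or medication_past_2_weeks)


def incident_t2dm(visit_statuses):
    """Dichotomized outcome: True if T2DM was present at any follow-up visit."""
    return any(visit_statuses)
```

Note that a JHS participant with normal fasting glucose but an HbA1c of 6.5% is classified as having T2DM, whereas the ARIC rule has no HbA1c criterion.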

Predictors: risk scores

Existing CVD and T2DM risk scores

We utilized two existing CVD risk scoring algorithms based on Cox regressions for model fitting (Table 1). Using data from the Framingham Heart Study, D’Agostino et al. derived a sex-specific multivariable risk factor algorithm for assessing 10-year general CVD risk [1, 2]. The 2013 American College of Cardiology/American Heart Association Guideline on the Assessment of Cardiovascular Risk (Goff et al. [3]) subsequently published the ASCVD, a sex- and race-specific 10-year ASCVD risk estimation algorithm derived using extensive data from several large, racially and geographically diverse cohort studies, including the Framingham Heart Study, ARIC, the Cardiovascular Health Study, and the Coronary Artery Risk Development in Young Adults (CARDIA) study.

We also utilized two existing T2DM risk scoring algorithms based on logistic regression (Table 1). Bang et al. published a risk scoring algorithm for undiagnosed diabetes developed using the National Health and Nutrition Examination Survey (NHANES) data [5]. The Bang risk algorithm was later adopted by the ADA as the Type 2 diabetes risk test [6]. The other T2DM risk algorithm was developed using ARIC data by Schmidt et al. [4]. For each of these four existing CVD and T2DM risk scores, we converted the scores in the analytic sample to Z-scores for use in the final predictive models.
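The conversion of a risk score to Z-scores within the analytic sample can be sketched as below. This is a minimal example that assumes the sample standard deviation (n − 1 denominator) was used; the text does not specify which denominator the authors chose, and the function name is hypothetical.

```python
from statistics import mean, stdev


def to_z_scores(scores):
    """Standardize scores to mean 0 and SD 1 within the analytic sample."""
    m = mean(scores)
    s = stdev(scores)  # sample SD (n - 1 denominator); an assumption
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(x - m) / s for x in scores]
```

After this step, a hazard ratio or odds ratio "per SD unit" for a risk score is on the same scale as that reported for the MetS severity Z-score.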

MetS severity score

We calculated the MetS severity Z-scores at baseline for the study participants using sex- and race/ethnicity-based formulas [12]. The MetS severity score was derived from the five traditional MetS components (waist circumference, triglycerides, HDL cholesterol, systolic BP, fasting glucose) using a factor analysis approach. Because of differences in traditional MetS criteria by race/ethnicity [22,23,24], confirmatory factor analysis was performed as previously described [12] to determine the weighted contribution of each component to a latent MetS factor on a sex- and race/ethnicity-specific basis, using the National Health and Nutrition Examination Survey (NHANES) data for adults aged 20–64 years. For each of the six sub-groups based on sex and race/ethnicity (non-Hispanic white, non-Hispanic black and Hispanic), factor loadings from the five MetS components were determined and used to generate equations for computing a standardized MetS severity score for each sub-group (http://mets.health-outcomes-policy.ufl.edu/calculator/). The MetS severity score was shown to correlate with other MetS risk markers, such as insulin [25] and adiponectin [25], and to be predictive of long-term risk of CVD [14, 15] and T2DM [16, 17]. We recently demonstrated that the MetS severity score was predictive of future CHD and T2DM events above and beyond the individual MetS components alone [15, 17]. Because of the importance of insulin resistance in T2DM and CVD, we additionally assessed the homeostasis model assessment of insulin resistance (HOMA-IR) as a risk predictor. HOMA-IR was calculated as: HOMA-IR = (fasting insulin × fasting glucose)/405, where insulin is measured in mU/L and glucose is in mg/dL.
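The HOMA-IR formula given above translates directly into code. The function name is hypothetical; the units follow the text (insulin in mU/L, glucose in mg/dL).

```python
def homa_ir(fasting_insulin_mu_l, fasting_glucose_mg_dl):
    """HOMA-IR = (fasting insulin [mU/L] x fasting glucose [mg/dL]) / 405."""
    return fasting_insulin_mu_l * fasting_glucose_mg_dl / 405.0
```

For instance, a fasting insulin of 10 mU/L with a fasting glucose of 81 mg/dL gives a HOMA-IR of 2.0.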

Statistical analysis

Using data from the combined ARIC and JHS cohorts, we used Cox proportional hazards models to assess the association of existing CVD risk scores and MetS with the time to incident CVD. We used logistic regression models to assess the association of the existing T2DM risk scores and MetS with the incidence of T2DM. To explore the effect of adding MetS on model performance, we used a series of nested models for both Cox and logistic regressions. The predictors in the nested models were: (1) risk score only (Model A); (2) MetS severity only (Model B); (3) risk score and MetS severity (Model C); and (4) risk score, MetS severity, and risk score by MetS severity interaction (Model D). For both outcomes, we also fitted sex- and race-specific models and reported associated odds ratios or hazard ratios. We controlled for study site in all models (four ARIC sites plus JHS). When presenting hazard ratios (HR’s) and odds ratios (OR’s), we standardized the Framingham, ASCVD, and Schmidt risk scores to facilitate comparability with the HR’s and OR’s associated with the MetS severity Z-score. Given the ordinal nature of the Bang diabetes score, we did not standardize for this comparison. Our primary interest was in the model prediction statistics described below and how these risk scores would perform in clinical settings; for these evaluations we used the risk scores on their original scale. All statistical analyses were performed using SAS version 9.4 (SAS, Cary, North Carolina, USA).
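The structure of the four nested models can be laid out as lists of model terms. The sketch below is only a schematic of Models A through D as described above: the variable names are placeholders, the `a:b` notation for an interaction is borrowed from common formula syntax, and the AIC helper uses the standard definition (for Cox models the log-likelihood would be the partial log-likelihood). It does not reproduce the SAS procedures used in the study.

```python
def nested_model_terms(risk_score="risk_score", mets="mets_z", site="site"):
    """Return the predictor terms for Models A-D; site is in every model."""
    return {
        "A": [risk_score, site],
        "B": [mets, site],
        "C": [risk_score, mets, site],
        "D": [risk_score, mets, f"{risk_score}:{mets}", site],
    }


def is_nested(smaller, larger):
    """True if every term of the smaller model also appears in the larger one."""
    return set(smaller) <= set(larger)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood
```

Models A and B are each nested in Model C, and Model C is nested in Model D, which is what allows the comparisons of Model C and Model D against Model A.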

Model performance was evaluated using the following statistics: Akaike information criterion (AIC), c statistic, integrated discrimination improvement (IDI), and continuous net reclassification improvement (NRI). The AIC and c statistic were computed for all models. The IDI and continuous NRI were computed for comparing Model C to Model A, and Model D to Model A. The c statistic and IDI are measures of discrimination, which is a model’s ability to distinguish between subjects with and without the disease. The c statistic is the estimated area under the receiver operating characteristic (ROC) curve. The IDI equals the difference in discrimination slopes between the model with the additional predictor and the model without [26], or the difference in the proportion of variance explained by the two different models [27]. The continuous NRI is a measure of improvement in reclassification, defined as the sum of two differences in proportions resulting from the addition of a new predictor: (1) proportion of individuals with events who have an increase in predicted risks minus the proportion with a decrease (event NRI), and (2) proportion of individuals without events who have a decrease in predicted risks minus the proportion with an increase (non-event NRI). Extensions of the c statistic, IDI, and NRI in the context of survival data have been previously reported [28,29,30]. While all available follow-up data for CVD were used in our models, we calculated these statistics to evaluate predictive performance at 10 years. Performance statistics, except for the AIC, were computed using validated SAS macros available from: http://ncook.bwh.harvard.edu/sas-macros.html. Finally, variance inflation factors (VIFs) were computed to assess the degree of collinearity when including MetS and other predictive scores in the same model, with VIFs greater than 10 representing severe collinearity.
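To make the definitions above concrete, the following sketch computes the c statistic, IDI, and continuous NRI for a binary outcome from predicted risks. It is a plain illustration of the definitions, not the SAS macros used in the study, and it does not implement the survival-data extensions cited above. Function names are hypothetical; `y` holds 0/1 outcomes and `p_old`/`p_new` hold predicted risks from the models without and with the added predictor.

```python
def c_statistic(y, p):
    """Share of event/non-event pairs where the event has the higher risk."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5  # ties count as half
    return concordant / (len(events) * len(nonevents))


def discrimination_slope(y, p):
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


def idi(y, p_old, p_new):
    """Difference in discrimination slopes (new model minus old model)."""
    return discrimination_slope(y, p_new) - discrimination_slope(y, p_old)


def continuous_nri(y, p_old, p_new):
    """Return (total NRI, event NRI, non-event NRI)."""
    ev_up = ev_down = ne_up = ne_down = n_ev = n_ne = 0
    for yi, po, pn in zip(y, p_old, p_new):
        if yi == 1:
            n_ev += 1
            ev_up += pn > po
            ev_down += pn < po
        else:
            n_ne += 1
            ne_up += pn > po
            ne_down += pn < po
    event_nri = (ev_up - ev_down) / n_ev
    nonevent_nri = (ne_down - ne_up) / n_ne
    return event_nri + nonevent_nri, event_nri, nonevent_nri
```

Because the continuous NRI counts only the direction of change in predicted risk, not its size, it can be significantly positive even when the c statistic and IDI barely move.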

Results

Participant characteristics

We summarized the characteristics of the 13,141 study participants at baseline by sex and race in Table 2. The average age of the participants was 53.0 (SD = 7.1) years old. The incidence of CVD at 10 years was the highest among white men (24.6%), compared to 9.4% among white women, 10.3% among black men, and 5.7% among black women. Overall, the incidence of CVD at 10 years was 13.2% for all the participants. The incidence of T2DM at 10 years was 12.0% overall and differed by sex and race. Black men (16.4%) and women (16.5%) had a higher incidence of T2DM, while the rate was 11.7% among white men and 7.9% among white women.

Table 2 Characteristics of study participants

CVD risk prediction with CVD risk scores and MetS

Results from adding the MetS severity score to the Framingham and ASCVD risk scores for predicting future CHD are summarized in Table 3. The CVD risk scores (Model A) and the MetS severity score (Model B) by themselves were significant predictors of future CVD across the sex and race groups, with overall HRs of 2.38 per normalized SD unit increase in Framingham score, 2.68 per normalized SD unit increase in ASCVD and 1.77 per SD unit increase in MetS severity. When included in the same model (Model C), the Framingham score but not MetS severity was a significant predictor of future CHD. Both the ASCVD score and MetS severity were significant predictors of future CHD when included in the same model.

Table 3 Cox proportional hazards models: time to incident CVD, overall and by sex and race: risk scores and MetS severity

Regarding model performance, the indicators of added distinguishing ability were mixed when adding the MetS severity score to Framingham. In moving from Model A to Model C, there was no change in the c statistic and the IDI was non-significant, suggesting that discrimination remained unchanged. Conversely, the continuous NRI was 0.16 (95% CI 0.08, 0.23), indicating a significant increase in the model’s ability to correctly classify individuals without CVD events (non-event NRI = 0.10; 95% CI 0.08, 0.11) when adding MetS severity. Similar statistics were observed comparing Model D (which included the interaction between the risk score and MetS) to Model A. No major differences in patterns of discrimination and reclassification performance were observed between sex and race groups.

We observed similar results when adding MetS severity score to the ASCVD prediction model. There was no change in c statistic from Model A to Models C and D. Comparing Model C to Model A, the IDI was not significant, while the continuous NRI was 0.15 (95% CI 0.05, 0.23), suggesting a significant increase in reclassification performance. Comparing Model D to Model A, the IDI was not significant, while the continuous NRI was 0.38 (95% CI 0.30, 0.45), again indicating an increase in reclassification performance. Among the sex and race groups, we observed no major differences in patterns of discrimination and reclassification performance. Again, the continuous NRI revealed that adding MetS severity improved the ability to correctly classify individuals without CHD events.

T2DM risk prediction with T2DM risk scores and MetS

Results from adding the MetS severity score to the Bang and Schmidt risk scores for predicting incident T2DM are summarized in Table 4. The T2DM risk scores (Model A) and MetS severity score (Model B) were significant predictors of T2DM by themselves, across the sex and race groups. MetS severity was a stronger predictor than the Bang score. When included in the same model (Model C), both the Bang score and MetS severity were significant predictors of T2DM, except in models for women, where the Bang score was not significant. When the Schmidt risk score and MetS severity were included in the same model, the Schmidt score but not MetS severity was a significant predictor of future T2DM overall; the exceptions were the models for black men and women, in which MetS was more strongly associated, and the model for white men, in which the MetS severity score was protective.

Table 4 Logistic models for predicting type 2 diabetes, overall and by sex and race: risk scores and MetS severity

When adding the MetS severity score as a new predictor in addition to the Bang score (from Model A to Model C), there was a significant increase in the c statistic, suggesting added discrimination. The c statistics for Models A and C were 0.69 (95% CI 0.68, 0.71) and 0.78 (95% CI 0.77, 0.79), respectively. Similarly, the IDI comparing Model C with Model A was 0.07 (95% CI 0.06, 0.08), suggesting that adding MetS severity substantially increased the ability to distinguish participants with and without T2DM relative to the Bang score alone. The continuous NRI was 0.68 (95% CI 0.63, 0.72), suggesting a significant increase in the model’s ability to correctly classify individuals with or without T2DM. The performance of Model D was similar to that of Model C.

We observed smaller increases in discrimination and reclassification performance when adding the MetS severity score to the Schmidt score for predicting incident T2DM (Table 4). Comparing Model C to Model A, the c statistic remained unchanged, and the IDI was statistically non-significant (−0.00; 95% CI −0.00, 0.00). However, the continuous NRI was 0.16 (95% CI 0.12, 0.22), indicating a significant increase in the model’s ability to correctly classify individuals with or without T2DM when adding MetS severity. Model D had a better performance than Model C, with the continuous NRI being 0.47 (95% CI 0.41, 0.52). Across all analyses of T2DM models (Table 4), there were no major differences in model performance among the sex and race groups.

HOMA-IR and risk prediction

As a risk predictor, HOMA-IR was not as strongly linked to future CVD or T2DM as was MetS in individual models (Additional file 1: Tables S1 and S2). In the combined models, HOMA-IR remained linked to future CVD to a similar extent as seen for MetS-Z. For T2DM models, HOMA-IR remained linked to future T2DM when assessed alongside both diabetes scores.

Interactions between Risk Scores and MetS

In the overall and sex/race-specific models for predicting CHD and T2DM, we also fitted the interaction term (i.e., how the relationship between MetS severity and future disease diagnosis varied by the level of the comparator risk score) between the respective disease risk score (by quintiles) and MetS severity score (Model D). In each of the four overall prediction models (2 for CVD and 2 for T2DM), the risk score by MetS interaction was statistically significant, demonstrating that MetS severity performed differently in risk assessment depending on the underlying score. We summarized the interaction plots from the four models in Fig. 1. As seen in Fig. 1a, b, the hazard ratios of MetS assumed an s-shape across the CVD risk quintiles based on either the Framingham or ASCVD score. MetS was a stronger predictor of CHD among individuals with the lowest CVD risk (quintile 1). It then became a relatively weaker predictor of CHD among individuals with higher CVD risk (quintiles 2 and 3), before becoming a stronger predictor among individuals with higher CVD risk in quintile 4. As seen in Fig. 1c, d, the odds ratios of MetS differed across the T2DM risk quintiles. In the model with the Bang score, MetS was a stronger predictor of T2DM for the middle risk quintile, but a weaker predictor among individuals with the highest T2DM risk. In the model with the Schmidt score, MetS was a stronger predictor of T2DM for the middle risk quintile, but a weaker predictor among individuals with higher T2DM risk (quintiles 4 and 5).
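The grouping step behind the quintile-based interaction can be sketched as a rank-based assignment of each participant to a risk-score quintile. The text does not describe how quintile cut points were computed, so the approach below (equal-sized groups by rank, ties broken by input order) is an assumption, as is the function name.

```python
def assign_quintiles(scores):
    """Assign each score to a quintile from 1 (lowest) to 5 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest first
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n + 1
    return quintiles
```

The association of MetS severity with the outcome can then be estimated separately within each quintile group, which is what the quintile-specific hazard ratios and odds ratios in Fig. 1 represent.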

Fig. 1 Interaction between risk scores and MetS severity score by disease risk score quintiles. Hazard ratios (HR) or odds ratios (OR) for each MetS severity Z-score standard deviation unit by quintiles of risk score for cardiovascular disease (CVD) (a Framingham risk score; b ASCVD score) and Type 2 diabetes mellitus (T2DM) (c Bang score (Ref. [5]); d Schmidt score (Ref. [4])). For each of the comparator risk scores, there was a significant interaction between MetS severity and future disease risk depending on the quintile of the comparator score, with increasing MetS severity exhibiting higher HRs for future CHD among individuals in the lowest quintile of CVD risk according to the Framingham and ASCVD risk scores and with increasing MetS severity exhibiting higher ORs for future diabetes among individuals in the middle quintiles of diabetes risk according to the Bang and Schmidt risk scores. All models controlled for study site.

Discussion

We found that a MetS severity Z-score, when added to predictive models alongside existing T2DM risk scores, consistently improved the models’ discrimination performance for future T2DM. This contrasted with the performance of MetS severity when applied to CHD events: while the MetS severity score remained significantly associated with future CHD when added to models with the ASCVD risk score, this addition did not result in a consistent improvement in the prediction models’ discrimination performance and overall performed best at reclassifying individuals without CHD events. We had previously demonstrated that a MetS severity Z-score was associated with CHD and T2DM outcomes, even in models that included the individual MetS components [15, 17]; however, when tested in models without the individual components, this association was much stronger when comparing the 4th vs. the 1st quartile of MetS severity for future T2DM (HR = 17.4 over 8 years) than for CHD (HR = 4.0 over 25 years of follow-up) [15, 17]. The current results reveal that the same hierarchy exists when adding MetS severity to existing scores: the utility of adding the MetS severity score to improve clinical accuracy is much clearer for T2DM than for CVD.

It is important to note that the existing CVD and T2DM risk scoring systems that we assessed in the current study were formulated specifically to predict long-term CVD events or diabetes diagnosis using known risk factors (but without incorporating an estimate of MetS beyond its individual components) [1, 3,4,5]. By contrast, the MetS severity Z-score was formulated not based on CVD or T2DM risk per se but on how the individual MetS variables cluster together, potentially as an estimate of the pathological processes underlying MetS, such as adipocyte cellular dysfunction, systemic inflammation, and oxidative stress [8]. Nevertheless, adding the MetS severity score to existing T2DM risk scores in incident diabetes models revealed that this estimate of MetS severity increased the discrimination performance of models, especially for the score from Bang et al. In the prediction models for both CVD and T2DM outcomes, adding the MetS severity score to existing risk scores increased the NRI assessment of the models’ ability to correctly reclassify individuals with or without the disease. This suggests that the MetS score identifies risk associated with the way that these individual components are clustered, and that this MetS factor is distinct from risk associated with the existing T2DM scores.

Both of the CVD scoring systems that we evaluated had similar risk prediction for CHD that exceeded that of the MetS severity Z-score. This is not surprising given that both of these CVD scores include smoking and LDL cholesterol, clearly important non-MetS-related risk factors that themselves carry strong associations with future CHD. Nevertheless, the predictive ability of MetS severity was strongest among individuals who were in the lowest risk category based on the CVD scoring systems, suggesting a role for MetS severity as a follow-up test among individuals previously identified as low risk by the scoring systems themselves. This was also true for the Bang T2DM risk score, as well as for those in the second and third quintiles of the Schmidt score.

Despite the availability of web-based calculators, clinical use of scoring systems such as these remains limited by the time required to calculate the score on a per-patient basis. Thus, addition of an extra layer of complexity of an additional factor such as MetS severity is at first intimidating. Nevertheless, automated calculation of such scores using the electronic health record (EHR) could facilitate wider use toward identification of high-risk patients. In addition to laboratory values, smoking status is widely available as a codified item through meaningful-use programs. Similar risk-identification algorithms are already widely utilized in EHR systems. Our data suggest the potential that MetS severity could be calculated automatically for use in such systems.

MetS has strong associations with insulin resistance, which itself is linked to risk for T2DM [31] and CVD [32]. It is thus perhaps not surprising that HOMA-IR [33] also had associations with these outcomes. Unfortunately, HOMA-IR has had difficulties in clinical application, as it relies on measurement of insulin. Insulin still does not have a standardized laboratory approach, with measured values varying significantly between laboratory assays. However, in research studies using a single insulin assessment technique, levels of insulin are associated with these outcomes of interest. None of the risk scores that we assessed here included an insulin measure, potentially explaining the persistent association of HOMA-IR here.

This study had multiple limitations. The cohorts represented here were initially enrolled 12–29 years ago, at a time when many current CVD treatments and management approaches were not available or not in common use. Thus, the precise predictive ability represented here is likely not generalizable to modern populations. In addition, the definition of Type 2 diabetes differed between the two cohorts, with JHS including elevated HbA1c as an indicator of diabetes, potentially identifying more individuals who had previously unidentified diabetes but may have had normal fasting blood glucose. However, we adjusted all of these analyses for study site; thus, if there were any differences in the relationship between risk score and outcome according to the method of diabetes diagnosis, this would have been accounted for. In this analysis we did not examine the relative performance of MetS severity in predicting future disease among the myriad other risk scores available for CVD and T2DM outcomes. To address our primary aims of examining the utility of MetS severity in predicting future disease above and beyond existing risk scores, we selected some of the more prominently studied and used risk score algorithms. Some of these risk equations (the ASCVD and the Schmidt prediction of T2DM) utilized ARIC in their development; thus, comparisons between these and other scores themselves may not be appropriate. However, our primary goal was to evaluate the added value of risk prediction associated with MetS severity, and using popular equations was an initial step in this process. While it is plausible that major differences in our conclusions would result if we examined other algorithms, given the similarity in the composition of most of the algorithms, we suspect that findings regarding our hypotheses about MetS severity and prediction of future disease would be similar.
In addition, there are participants with less than 10 years of follow-up, particularly with respect to reliable T2DM diagnosis, which would impact the predictive ability of the 10-year risk scores. Nevertheless, any associated bias would not differ among the scores themselves, nor would the ability to determine if MetS severity adds any predictive benefit.

Conclusions

We found that adding MetS severity, potentially as a marker of underlying metabolic disarray, improved the predictive performance of risk scores for future T2DM much more consistently than for CVD. These data suggest the potential to strengthen current scoring systems by adding an estimate of MetS severity, which may improve the accuracy of identifying individuals at high disease risk, who can then be more appropriately targeted for treatment.