1 Introduction

Early lifestyle intervention is a cost-effective recommendation to reduce the incidence of type 2 diabetes (Knowler et al. 2002; Li et al. 2010; Nanditha et al. 2014), asking for informative, sensitive and specific markers. Although the standard laboratory tests, such as fasting glucose, 2-h postprandial glucose, and glycated hemoglobin A1c (HbA1c), provide strong evidence for the risk of type 2 diabetes(Haffner et al. 1990; Shaw et al. 1999; Droumaguet et al. 2006), these predictors emerge after years of subclinical metabolic dysfunction (Tabak et al. 2009). Traditional risk factors (TRFs) such as age, sex, body mass index (BMI), and waist circumference also explain considerable part of future risk (Gray et al. 2010; Wilson et al. 2007), but fail to capture the full complexity of the etiology and their predictive performance vary between different risk groups (Kengne et al. 2014). BMI has been put forward as the modifiable risk factor but, there are also metabolically unhealthy normal weight (MUHNW) as well as metabolically healthy obese (MHO) individuals, raising the question to what extent BMI explain the mechanisms of the underlying metabolic disease (Mathew et al. 2016). Therefore, there is an increasing interest in finding informative markers that indicate the particular metabolic dysfunctions before the manifestation of the disease. Hence, people identified at high risk would be able to take preventive lifestyle interventions or treatments targeted to their individual molecular profile, eventually personalizing their health care.

High throughput metabolomics offers an opportunity to test multiple metabolic markers in large settings. Such approach led to the discovery of five amino acids by the prospective Framingham Heart Study (FHS) using a 12-year follow-up (Wang et al. 2011). Branched chain amino acids (BCAA) from this panel were previously pointed out in a case-control setting (Suhre et al. 2010) and later in a follow-up study of limited size (Lu et al. 2016; Yu et al. 2016). Other metabolites including phospholipids, triglycerides, acyl-carnitines, organic acids and small molecular weight compounds were also added to the list of metabolomics based predictors (Floegel et al. 2013; Walford et al. 2014; Wang-Sattler et al. 2012; Lu et al. 2016; Yu et al. 2016; Suhre et al. 2010), covering the glucose and phospholipid metabolism. However, lipoprotein metabolism, which is one of the key components of metabolic dysfunction, has not been addressed.

In the present study, we aimed to identify novel metabolic markers using a total of 261 metabolic features measured by either targeted mass spectrometry (MS) or by targeted nuclear magnetic resonance (NMR). The chemical classes of tested molecules include sub-fractions of lipoproteins, triglycerides, phospholipids, amino acids, and small intermediate compounds. We estimated the predictive performance of the selected marker set in comparison to other well-known predictors, including fasting glucose, TRFs, and the validated panel of amino acids.

2 Research design and methods

2.1 Study population

The Erasmus Rucphen Family genetic isolate study (ERF) is a prospective family based study located in Southwest of the Netherlands. This young genetic isolate was founded in the mid-eighteenth century and minimal immigration and marriages occurred between surrounding settlements due to social and religious reasons. The ERF study population includes 3465 individuals that are living descendants of 22 couples with at least six children baptized. Informed consent has been obtained from patients where appropriate. The study protocol was approved by the medical ethics board of the Erasmus Medical Center Rotterdam, the Netherlands (Santos et al. 2006).

The baseline demographic data and measurements of the ERF participants were collected around 2002–2006. All the participants filled out questionnaires on socio-demographics, diseases and medical history and lifestyle factors, and were invited to the research center for an interview and blood collection for biochemistry and physical examinations including blood pressure and anthropometric measurements have been performed. The participants were asked to bring all their current medications for registration during the interview. Venous blood samples were collected after at least 8 h fasting. Hypertension was defined as systolic blood pressure ≥140 mmHg or diastolic blood pressure ≥90 mmHg or treatment for hypertension. The family history was coded as 0, 1, 2 based on no first-degree relatives has type 2 diabetes, one has type 2 diabetes and more than one have type 2 diabetes. Baseline type 2 diabetes was defined according to the fasting plasma glucose ≥7.0 mmol/L and/or anti-diabetic treatment, yielding 212 cases and 2564 controls, totaling up to 2776. The follow-up data collection of the ERF study took place from March 2015 to May 2016 (9–14 years after baseline visit). During the follow up a total of 1935 participants’ records were scanned for incidence of type 2 diabetes in general practitioner’s databases. Additionally, a questionnaire on type 2 diabetes medication surveyed on 1232 participants in June 2010 (4–8 years after baseline visit) was referred if a participant were not included in May 2016 follow-up. This effort yielded the inclusion of 18 otherwise missed extra cases. To summarize, out of the 2564 controls at baseline, 1571 were followed-up for a mean 11.3 years (inter quartile range 11.0–12.2). Among those, 137 developed type 2 diabetes, whereas 1434 did not, comprising together the analytical sample for prediction analysis.

2.2 Metabolomics measurements

In total 261 metabolic marker molecules including sub-fractions of lipoproteins, triglycerides, phospholipids, amino acids and small intermediate compounds, which throughout this article will be referred as “metabolites”, were measured by three different targeted platforms, either by NMR spectrometry or MS at baseline. The samples included in metabolomics measurements were not selected based on any disease. The platforms used in this research are: (1) Liquid Chromatography-MS (LC-MS, 116 positively charged lipids, comprising of 39 triglycerides (TG), 47 phosphatidylcholines (PC), 8 phosphatidylethanolamines (PE), 20 sphingolipids (SM), and 2 ceramides (Cer), available in up to 2638 participants) measured in Netherlands Metabolomics Center, Leiden using the method described before (Gonzalez-Covarrubias et al. 2013), (2) small molecular compounds window based NMR spectroscopy (NMR-COMP, 41 molecules comprising of 29 low-molecular weight molecules and 12 amino acids available in up to 2639 participants) measured in Center for Proteomics and Metabolomics, Leiden University Medical Center (Demirkan et al. 2015; Verhoeven et al. 2017), (3) lipoprotein window based NMR spectroscopy (NMR-LIPO, 104 lipoprotein particles size sub-fractions comprising of 28 very low-density lipoprotein (VLDL) components, 30 high-density lipoprotein (HDL) components, 35 low-density lipoprotein (LDL) components, 5 IDL components and 6 plasma totals, available in up to 2609 participants) measured in Proteomics and Metabolomics, Leiden University Medical Center and lipoprotein sub-fraction concentrations were determined by the Bruker algorithm (Bruker BioSpin GmbH, Germany) details were given previously (Kettunen et al. 2016). Details over the quality control of samples in these platforms can be found in the Supplementary Information. The laboratories had no access to phenotype information and the data pre-filtering and quality control for measurement errors were based on internal controls and duplicates.

2.3 Metabolite identification

The compounds measured by LC-MS and NMR-COMP were identified according to the metabolomics standards initiative (MSI) level 1 using information coming from at least 2 different sources (Sansone et al. 2007). The available ChEBI ID were shown in Supplementary Table 1.

For metabolites measured by LC-MS, the identities of the lipids were assigned on the basis of accurate mass, fragmentation pattern, and retention times matched to authentic standards where available. The detail of the metabolite identification can be found in previous publications (Hu et al. 2008).

For metabolites measured by NMR-COMP, the identities of the small components and molecules were assigned by the peaks which are annotated using the combined information from chemical shift databases, spiking experiments, and correlation behaviors. The detail of the metabolite identification can be found in the methodological paper (Verhoeven et al. 2017).

For lipoproteins measured by NMR-LIPO, the method is based on the analysis of signals in the 1H-NMR spectrum which are related to the lipoproteins. Differences in lipoprotein composition, size and density translate into respective signal line shape differences, which can be used to extract information on lipoprotein main- and subclasses. As these are not real metabolites, the MSI criteria do not apply.

2.4 Statistical methods

The distributions of individual metabolites were checked for non-normality by eye and outlying values that were more than four times standard deviation away from the mean were excluded from analysis. Non-normally distributed measurements were natural logarithm transformed, or rank transformed. Figure 1 shows the procedure that we followed for the selection of metabolites. Firstly, we tested the association between the 261 individual metabolites and prevalent type 2 diabetes using a logistic regression model adjusting for age, sex, and lipid-lowering medication. Residuals from the polygenic model (using “polygenic” function in the R package GenABEL), were used in all analysis to account for family relations among the ERF participants (Aulchenko et al. 2007). To control for multiple testing, we applied a Bonferroni correction based on the effect number of independent vectors in the data which were estimated to be 81 independent equivalents using Matrix Spectral Decomposition (MSD) (Li and Ji 2005). Thus, a p-value less than 6.18 × 10− 4 (0.05/81) was used as the threshold for metabolome-wide significance. We repeated analysis stratifying the cases into medicated and non-medicated cases to test if the associations were attributed to the effect of anti-diabetic medication. Metabolites that did not differentiate (p-value >0.05) between non-medicated diabetics (n = 68) versus controls (n = 2564) were not taken forward. These metabolite levels were assumed to be different due to the post medication metabolic changes in the diabetics. The remaining metabolites (n = 88, the list is given in Supplementary Table 1) and the TRFs (age, sex, family history, BMI, waist circumference, hypertension, HDL-cholesterol, and triglycerides) with scaled around 0 and standard deviation as 1 were included in the prior to LASSO (Least Absolute Shrinkage and Selection Operator) regression to select the set of predictors that maximize the prediction performance. The LASSO regression was performed using glmnet package in R (Friedman et al. 2010). We imputed these missing data points (i.e. 9.6–18.5% missing values) before selecting the independent predictors by LASSO regression which requires all the variables to be complete measurements. In order to select the best imputation method suitable for our data, we first generated a training dataset with 20% missing values at random and compared three methods: (1) deterministic imputation, (2) random regression imputation, (3) multiple imputation with R package “mice” (Andrew and Jennifer 2006). After comparing the results to the initial correlation with glucose and the means between the imputed values and real values for each method, multiple imputation was selected. The sum of predicted values from the multiple random regression model divided by the number of imputations (n = 20) was used to replace the missing data. The outliers more or less than four times standard deviation were removed after imputation. With the selected independent type 2 diabetes metabolic predictors, we assessed their associations with fasting glucose by linear regression analysis in the non-diabetic participants at baseline. To account for multiple testing in these 24 linear regression sets, a p-value <0.003 (0.05/16) was used as the threshold after MSD of the 24 metabolites that yielded 16 independent components.

Fig. 1
figure 1

Flow chart of the metabolite selection

2.5 Prediction of incident type 2 diabetes

The metabolites selected from the baseline population were tested to predict the incidence of type 2 diabetes during the follow-up time. Area under the receiver operator characteristics (ROC) curve (AUC) of logistic regression together with continuous Net Reclassification Improvement (NRI) was performed to estimate the discrimination and reclassification in different models (Pencina et al. 2012). The models compared were: ERF metabolite model (the metabolites those selected in the current study only), FHS metabolite model, (the amino acids reported by the FHS research: isoleucine, leucine, valine, tyrosine, and phenylalanine), and the TRF model (age, sex, family history, BMI, waist circumference, hypertension, HDL-cholesterol, and triglycerides) and glucose only model (fasting plasma glucose measured at baseline) and combination of those. As some of the previous studies showed the association between metabolites and covariates, i.e. age, sex and BMI (Dunn et al. 2015; Lawton et al. 2008), we also tested the models in subgroups of sex, age (<50 vs. ≥50 years), and BMI (<25 vs. ≥25 kg/m2). A p-value <0.05 here was used as a cut off for significance improvement across the models. Meanwhile, the specificity with fixed 80% sensitivity in different prediction models is compared. Analyses were conducted using R (version 3.2.3).

3 Results

Table 1 displays the baseline characteristics of the participants stratified by prevalent cases at baseline and incident cases in the follow-up. Compared to the participants who did not develop type 2 diabetes, those with type 2 diabetes were older, more often had a family history of the disease, suffered from hypertension, and have been using lipid-lowering medication. They had higher levels of BMI, waist circumference, blood pressure, triglycerides, fasting glucose, and lower levels of HDL-cholesterol. The participants with incident type 2 diabetes had higher fasting glucose at baseline compared to the individuals who did not develop type 2 diabetes during the follow up.

Table 1 Characteristics of the study population

3.1 Metabolites associated with type 2 diabetes at baseline

We identified 24 independent metabolites together with five TRFs (age, sex, family history, waist circumference, and HDL-cholesterol) from LASSO regression maximizing the discrimination at baseline. These metabolites and their associations with prevalent and incident type 2 diabetes, as well as fasting glucose at baseline are listed in Table 2. Four of them (i.e. PC(O-34:2), L-HDL-free cholesterol, XXL-LDL-phospholipids and L-LDL-cholesterol) are associated with decreased risk of type 2 diabetes, whereas twenty of them associated with increased risk; including three triglycerides, seven lipoprotein particles, three amino acids, and seven small intermediate compounds. Among the seven lipoprotein particles, two are sub-fractions of HDL, two are of LDL, and three are of VLDL (See details in Table 2). Out of the 24 metabolites, PC(O-34:2), XXL-LDL-triglycerides, HDL-triglycerides, L-HDL-ApoA2, and M-HDL-ApoA2 are not associated with fasting glucose in the non-diabetic population at baseline and incident type 2 diabetes.

Table 2 Association of LASSO regression selected metabolites with type 2 diabetes and fasting glucose

3.2 Predicting incident type 2 diabetes

Figure 2 shows the AUC comparisons across the different prediction models. The ERF metabolites discriminate future type 2 diabetes with an AUC [95% confidence interval] of 0.81 [0.77, 0.85]. The AUC of the ERF metabolite model was significantly higher than of the FHS metabolite model [AUC 0.81 (0.77, 0.85) vs. 0.77 (0.73, 0.81), NRI = 0.42, p-value <0.0001]. It is of note that tyrosine and isoleucine, which were previously selected by FHS, were also selected in the ERF metabolite model. The AUC for the model including both ERF and FHS metabolites together was significantly higher than the AUC for models with either set of predictors [AUC 0.83 (0.79, 0.86) vs. 0.77 (0.73, 0.81), NRI = 0.67, p-value <0.0001 for ERF and FHS metabolites vs. only FHS metabolites; AUC 0.83 (0.79, 0.86) vs. 0.81 (0.77, 0.85), NRI = 0.29, p-value =0.0015 for ERF and FHS metabolites vs. only ERF metabolites]. The AUC of the ERF and FHS combined metabolite model did not differ from that of fasting glucose [AUC 0.83 (0.79, 0.86) vs. 0.84 (0.81, 0.88), p-value =0.45]. However, combining the ERF metabolites and fasting glucose together in a model improved the predictive performance significantly over the performance of fasting glucose [AUC 0.88 (0.84, 0.91) vs. 0.84 (0.81, 0.88), NRI = 0.66, p-value <0.0001]. Adding TRFs to fasting glucose and metabolite model maximized the AUC to 0.89 [0.86, 0.92]. The specificity with fixed 80% sensitivity increases from 70 to 80% when metabolites are added to the glucose only model (Supplementary Fig. 1).

Fig. 2
figure 2

AUC comparisons in different prediction models. Continuous Net Reclassification Improvement (NRI) indices were performed to compare different prediction models. FG fasting glucose, TRFs all traditional risk factors—age, sex, family history, BMI, waist circumference, hypertension, HDL-cholesterol, triglycerides

3.3 Predicting incident type 2 diabetes in different baseline risk groups

The AUC of the combined ERF, and FHS metabolite models and of fasting glucose model in subpopulations stratified by age, sex, and BMI is shown in Fig. 3. In the group with age < 50 years, the AUC of the combined metabolite model is significantly higher than that of fasting glucose model [AUC 0.86 (0.78, 0.94) vs. 0.77 (0.67, 0.87), NRI = 0.72, p-value <0.0001], whereas the AUCs of these two models are not statistically different in the elderly group [AUC 0.83 (0.78, 0.87) vs. 0.84 (0.80, 0.88), p-value =0.06]. The AUC of the metabolite model is significantly higher than that of fasting glucose in the female group [AUC 0.88 (0.83, 0.92) vs. 0.84 (0.79, 0.90), NRI = 0.44, p-value =0.001], whereas in the male group there is an opposite trend (0.78 [0.72, 0.84] vs. 0.83 [0.79, 0.88], NRI = −0.40, p-value =0.001). Similarly, in the group with normal BMI, the AUC of metabolite model is significantly higher than that of fasting glucose model [AUC 0.85 (0.75, 0.95) vs. 0.80 (0.66, 0.93), NRI = 0.49, p-value =0.04]. In the overweight and obese group, the trend is opposite but not significantly different [AUC 0.81 (0.76, 0.85) vs. 0.83 (0.79, 0.87), p-value =0.13]. When the sensitivity is fixed to 80%, the specificity rises from 59% (glucose only model) to 87% (glucose and metabolite model) in the young (age < 50 years), which is much higher increase than in the old (age ≥ 50 years, from 66 to 82%). The specificity also grows when we add metabolites or TRFs to the prediction model. (Supplementary Fig. 1) The ROC curves for the models and subgroups are given in the Supplementary Fig. 2 and Supplementary Fig. 3. The separation shown by time to event curves across different risk groups are given in Supplementary Fig. 4.

Fig. 3
figure 3

AUC comparisons in different subgroups. Continuous Net Reclassification Improvement (NRI) indices were performed to compare different prediction models. Black bars metabolite model; white bars fasting glucose model. (/): Number of controls and incident cases analyzed in the follow-up

4 Discussion

In the present study, we showed that the combined effect of 24 metabolites including ten lipoprotein sub-fractions yield a powerful discrimination model for predicting future type 2 diabetes. The ERF metabolite model significantly improved the prediction performance of FHS metabolite model and fasting glucose. We showed that combined metabolite model predicts future type 2 diabetes better than fasting glucose in the population who are female, younger than 50 years, or those with normal weight. In addition, we confirmed the conclusion from the FHS that isoleucine and tyrosine are predictors of type 2 diabetes independent of other factors (Wang et al. 2011).

The ERF metabolite model includes molecules from five classes: triglycerides, amino acids, lipoproteins, phospholipids and small intermediate compounds. Among those, metabolites such as 1,5-anhydro-d-glucitol (1,5-AG), 2-hydroxybutyrate, pyruvate, phosphatidylcholines, betaine, some triglycerides, and BCAA have been previously reported to be potential predictive and diagnostic markers for type 2 diabetes. (Wang et al. 2011; Nanditha et al. 2014; Wang-Sattler et al. 2012; Kim et al. 2016; Park et al. 2015; Yousri et al. 2015). Despite the fact that LASSO regression method is used to select independent components of our model, various metabolites from the same biochemical class were selected, supporting the view that the sub-fractions of some classical measurements play independent functions in the pathogenesis of type 2 diabetes (Kotronen et al. 2009). In line with this, the ERF metabolite model points out lipid perturbations evident in the very early stage of the disease. For example, levels of different triglycerides [e.g. TG (48:0), TG (48:1)] show independent effects. Our results on HDL and LDL sub-fractions are particularly interesting. We found them associated with both increased and decreased risk. L-HDL-ApoA2, M-HDL-ApoA2, XS-LDL-ApoB and XXL-LDL-triglycerides are associated with increased risk of type 2 diabetes, whereas L-HDL-free cholesterol, XXL-LDL-phospholipids and L-LDL-cholesterol are associated with decreased risk of type 2 diabetes. This suggests different roles for HDL and LDL particles and their content. Our results highlight the importance of reclassifying lipoproteins of clinical value into sub-fractions of HDL, LDL and VLDL, as the measurement techniques develop in the coming decade.

We also demonstrated that PC(O-34:2) is inversely associated with type 2 diabetes, which is in line with a recent study performed in the population based KORA study that showed decrease in PC(O-34:2) levels in patients with impaired glucose tolerance (Wang-Sattler et al. 2012). Phosphatidylcholine is a key element in lipoproteins (Park et al. 2015). Elevated plasma levels of choline and betaine mark cardiovascular risk in diabetes (Lever et al. 2014), while increased level of isoleucine was significantly associated with an increased risk of hypertriglyceridemia (Mook-Kanamori et al. 2014). 2-hydroxybutyrate appears to be useful as an early indicator of insulin resistance in non-diabetic subjects (Gall et al. 2010), and its elevated serum levels have recently been indicated to predict worsening of glucose tolerance (Ferrannini et al. 2013).

Among the other ERF metabolites, our results on two (1,5-AG and glycerol) are inconsistent with the previous studies in terms of direction of association: Suhre et al. studied on 40 diabetes cases and 60 healthy male controls in the German population (Suhre et al. 2010); Lu J. et al.’s study included 22 Chinese cases and 22 healthy controls (Lu et al. 2012), and the study by Shaham O. et al. was done in 47 healthy academic students (Shaham et al. 2008). Considering the larger sample size, our study should have yielded more reliable estimates compared to the above studies. It has been shown that levels of 1,5-AG metabolite reflect glycemic changes, and recent clinical studies demonstrated significant differences in 1,5-AG levels between diabetic patients receiving different treatments, consistent with their individual glucose profiles (McGill et al. 2004; Moses et al. 2008).

As shown in Table 2, all metabolites are associated with prevalent diabetes, but some are not associated with incident case control status. We kept those in the ERF prediction model as this could be due to their small effect sizes which need more sample size (power) to be detected since the effect estimates were in the expected direction. Another explanation could be that some metabolite levels change depending on the duration and progression of the disease that we cannot control for in the statistical model. A third explanation is that it could as well be due to anti-diabetic medication effect but we have already filtered the associations controlling for that. The addition of ERF metabolites can complement the type 2 diabetes prediction by fasting glucose and TRFs, yielding the best model when combined. This is partly a result of our selection approach performed independently of TRFs but may also be due to the assumption that high resolution metabolites reflect different possible etiologies of type 2 diabetes. Thus, improvement of predictive performance with additional metabolites implies that potential metabolic ramifications may extend far beyond and prior to impaired glucose metabolism. It is of note that each metabolite contributed equally to the improvement of the AUC except tyrosine, exclusion of which dropped the AUC significantly. The AUC of the model without tyrosine is 0.79, and is significantly lower than the AUC of ERF metabolite model which is 0.81 (NRI = 0.38, p-value <0.0001), suggesting that tyrosine is an important component of the model.

In the present study, we found a higher AUC of the metabolite model in lower risk population as female, younger, or leaner subgroups. For the optimum cut-off value of the ROC curve, we observed the biggest difference in specificity especially in the young age group, such that if the sensitivity of the prediction model is set to 80%, the metabolite only model yielded a specificity of 0.82, whereas the glucose model is as low as 0.59. This suggests that the metabolomics information may have better utility for type 2 diabetes prediction specifically in those without the risk condition, which is in agreement with previous study from Walford et al.(Walford et al. 2014). Interestingly, low risk population that develop type 2 diabetes were reported to have higher risk of mortality (Carnethon et al. 2012), raising the importance of more specific predictors suited for different underlying mechanisms. Markers which reflect the metabolic condition both dependent and independent of BMI that may partially help to address the different active pathways underlying to the MUHNW and MHO phenotypes (Mathew et al. 2016).

Two additional platforms measured among subsets of the ERF population which were not included in our main analysis due to sample size restrictions gave us the opportunity to compare some of the associations using these different measurement methods. These were electrospray-Ionization MS, measured in 878 participants, using the method described before (Demirkan et al. 2012) and AbsoluteIDQTM p150 Kit of Biocrates Life Sciences AG measured in 989 participants as details mentioned before (Draisma et al. 2015). Supplementary Fig. 5 shows the x–y plots of the effect estimates per standard error (i.e. Z score) in the 62 lipids and 9 amino acids that were measured in duplication. The Z scores between these platforms are strongly correlated with correlation coefficients ranging from 0.74 to 0.87.

The present study has a strong design such that the new cases develop among the control group in the baseline. However, due to the wide metabolite spectrum in the present study, validation of the full model in an external sample is not available yet. One limitation can be that in the present study, 46.7% of the type 2 diabetes patients at baseline took lipid-lowering medication compared 10.3% in the non-diabetics. To reduce the bias, all the participants were fasted overnight before taking the blood sample and we adjusted for lipid-lowering medication in each step of statistical analysis. It also needs mentioning that the metabolite set that predicts type 2 diabetes is assumed to point out the biochemical pathways disrupted before the disease onset. However, these metabolites may not be necessarily in the causal pathway. We have previously shown that most these metabolites are partially heritable (Demirkan et al. 2015; Kettunen et al. 2016; Draisma et al. 2015) and our increasing knowledge about their genetic determinants opens up new opportunities for testing causal inference using Mendelian randomization (Kettunen et al. 2016).

Conducting a 14-years prospective study with comparably large sample size and wide metabolite spectrum, we developed a novel prediction model which includes informative markers of dyslipidemia, and which also increases the specificity for the young individuals. Importantly, this model has a high potential to result with better understanding of the biological mechanisms leading to glycemic deterioration in prediabetes and diabetes.