
Introduction

Type 2 diabetes is a complex and common metabolic disorder resulting from the body’s ineffective use of insulin. It is characterised by hyperglycaemia (high blood sugar) due to impaired insulin secretion and insulin resistance, and most affected people are overweight or obese [1]. Impaired glucose tolerance (IGT) and impaired fasting glucose, together known as impaired glucose regulation (IGR) or prediabetes, characterise an intermediate condition that precedes progression to diabetes. Recent studies show that a complex interplay of genetic susceptibility, environmental factors, lifestyle (including diet, physical activity, smoking and alcohol consumption), clinical heterogeneity, drugs and the gut microbiome orchestrates the development of type 2 diabetes [2]. Over time, individuals with type 2 diabetes are at higher risk of heart attack, stroke [3], neuropathy (nerve damage), retinopathy (which can cause blindness) and kidney failure, as well as several infectious diseases including COVID-19, which reduces quality of life and imposes a social burden [4, 5].

Metabolomic profiles comprise a set of low-molecular-weight biochemicals (metabolites) that includes sugars, amino acids, organic acids, nucleotides, lipids, xenobiotics and other compound classes. Identifying the biochemical changes that occur between prediabetes and diabetes can improve risk prediction for better-targeted prevention [6, 7]. In addition, genetic composition can be used to make predictions regarding disease susceptibility. Genome-wide association studies (GWAS) show that more than 400 loci influence the risk of type 2 diabetes [8] and that 900 genetic variants have been associated with BMI [9]. Linking metabolites with genetics therefore reveals the influence of genetic variation on metabolic composition [10,11,12,13] and provides a more comprehensive molecular understanding of the disease.

In the Innovative Medicines Initiative - Diabetes Research on Patient Stratification (IMI-DIRECT), we characterised 132 metabolites from targeted measurements and 779 metabolites from untargeted measurements profiled in 3000 individuals at baseline. The study population was stratified according to the ADA 2011 glycaemic categories as follows: 23.89% (n=692) had normal glucose regulation (NGR) with fasting glucose 5.23 (SD=0.39) mmol/l; 48.91% (n=1418) had IGR with fasting glucose 5.90 (SD=0.51) mmol/l; and 27.2% (n=890) had type 2 diabetes with fasting glucose 7.15 (SD=1.39) mmol/l [14]. To integrate non-omics data such as health status, lifestyle and medication with the metabolomics data, we applied multivariable statistical models (see Methods). Beyond multivariate and association analyses, we performed causal mediation analysis to evaluate the potential causal roles of mediators on the outcome [15, 16]. A study on drug–omics associations in type 2 diabetes [17] used an unsupervised deep learning framework of multi-omics variational autoencoders (MOVE) to extract significant drug response patterns from 789 individuals newly diagnosed with type 2 diabetes in the IMI-DIRECT cohort. We took the polypharmacy effects on metabolomic profiles reported by MOVE and compared them with our molecular findings in this study.

Our aims in this study were as follows: (1) to characterise 911 small-molecule features (132 from targeted and 779 from untargeted metabolomics) associated with prediabetes/IGR and type 2 diabetes; (2) to identify baseline metabolites associated with progression rate estimated from longitudinal data; (3) to investigate potential mediation effects of metabolites from baseline glycaemic status to follow-up using mediation analysis; and (4) to identify causal relationships between metabolites and type 2 diabetes with genetic instruments, using two-sample Mendelian randomisation (2SMR) tests.

Methods

DIRECT cohort

The Diabetes Research on Patient Stratification (DIRECT) cohort encompasses 24,682 European participants at varying risk of glycaemic deterioration, identified and enrolled into a prospective cohort (study 1) of prediabetes (n=2235) and type 2 diabetes (n=830). Using ADA 2011 glycaemic categories in study 1, 33% (n=692) of cohort 1 (prediabetes risk) had NGR, 67% (n=1418) had IGR and 108 were excluded. In study 2, 789 samples were included and 41 samples were excluded. From study 1, 101 excluded samples entered study 2 (n=890). The ratio of self-reported sex varied in each study. Detailed characteristics on inclusion and exclusion criteria, along with the protocol timeline for visits and tests for both studies, have been described elsewhere [14, 18]. In summary, fasting venous blood samples were obtained, followed by DNA extraction and additional biochemical analyses. Metabolomics measurements for distinct samples at baseline are considered in this study.

Targeted metabolomics (AbsoluteIDQ p150 Kit)

Blood samples in the study were analysed with the AbsoluteIDQ p150 Kit (BIOCRATES Life Sciences, Innsbruck, Austria) (see electronic supplementary material [ESM] Methods for details) [19]. After data export, outlier samples were defined as those with >33% of metabolite concentrations below the 25% quantile − 1.5 × IQR or above the 75% quantile + 1.5 × IQR. Metabolite traits with too many zero-concentration samples or missing values (NAs, >50%) were excluded (none met this criterion). The CV was calculated in reference samples for each metabolite over all plates, and metabolite traits with CV>0.25 were excluded. After quality control, 132 metabolites were included in this study (ESM Table 1). Metabolite concentrations were natural-log-transformed and scaled (mean=0, SD=1) to ensure comparability between the metabolites.
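
The quality-control steps above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, the fences assume the usual Tukey rule (25% quantile − 1.5 × IQR and 75% quantile + 1.5 × IQR), and the quantile method is the Python standard-library default rather than whatever the original analysis used.

```python
import math
import statistics

def coefficient_of_variation(values):
    """CV = SD / mean of one metabolite measured in the reference samples."""
    return statistics.stdev(values) / statistics.mean(values)

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outlier_samples(matrix, max_fraction=0.33):
    """matrix[i][j] is the concentration of metabolite j in sample i.

    A sample is flagged when more than max_fraction of its metabolites
    fall outside that metabolite's fences.
    """
    n_metabolites = len(matrix[0])
    fences = [iqr_fences([row[j] for row in matrix]) for j in range(n_metabolites)]
    flags = []
    for row in matrix:
        outside = sum(
            1 for j, value in enumerate(row)
            if value < fences[j][0] or value > fences[j][1]
        )
        flags.append(outside / n_metabolites > max_fraction)
    return flags

def log_scale(values):
    """Natural-log transform, then centre to mean 0 and scale to SD 1."""
    logged = [math.log(v) for v in values]
    mu = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mu) / sd for x in logged]
```

A metabolite would be dropped when `coefficient_of_variation` on its reference-sample values exceeds 0.25, and `log_scale` would be applied per metabolite after sample-level filtering.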

Untargeted metabolomics (Metabolon platform)

Untargeted LC/MS-based techniques cover a broad spectrum of metabolites, in contrast to targeted techniques, in which metabolites are limited to a predefined set of molecules. For details on sample preparation, measurement and identification of metabolites, see ESM Methods. Metabolites whose identification is uncertain because of incomplete databases, or because they are unknown or novel, are marked with an asterisk (*) after the metabolite name. The dataset contained 12% missing values. We screened for and removed outliers (see ESM Fig. 1 for an example), which added a further 4% of missing values (ESM Table 2). Peaks were quantified using AUC. For studies spanning multiple days, a data normalisation step was performed to correct variation resulting from instrument inter-day tuning differences. Essentially, each compound was corrected in run-day blocks by setting the block medians equal to one and normalising each data point proportionately (termed the ‘block correction’; ESM Fig. 2). Principal component analysis was performed on the metabolite dataset to check for technical effects such as centre and sex (see ESM Fig. 3). The missing-data pattern was tested using logistic regression, coding missing as 0 and non-missing as 1; there was no significant association between missingness and the regressors, consistent with a missing-at-random pattern. K-nearest neighbour (KNN)-based imputation was applied using K=10, as suggested and optimised in the German KORA F4 cohort [20].
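
The block correction and the KNN imputation described above can be sketched as follows. This is an illustration under stated assumptions rather than the platform's implementation: missing values are represented as `None`, distances are root-mean-square differences over the metabolites two samples share, and both function names are hypothetical.

```python
import math
import statistics

def block_correct(values, run_days):
    """Divide each value of one compound by the median of its run-day block.

    After correction every run-day block has a median of one.
    Raises statistics.StatisticsError if a block has no observed values.
    """
    medians = {}
    for day in set(run_days):
        observed = [v for v, d in zip(values, run_days) if d == day and v is not None]
        medians[day] = statistics.median(observed)
    return [None if v is None else v / medians[d] for v, d in zip(values, run_days)]

def knn_impute(matrix, k=10):
    """Fill None entries with the mean of the k nearest samples that observed
    that metabolite. matrix[i][j] is metabolite j in sample i."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                (distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            ]
            donors.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in donors[:k]]
            imputed[i][j] = statistics.mean(nearest)  # errors if no donor exists
    return imputed
```

Distances are computed on the original matrix, so imputed values never serve as neighbours for other imputations; a production implementation may make a different choice.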

Statistics

Multivariable logistic regression and linear regression

To identify metabolites specifically associated with the presence of IGR and type 2 diabetes, we ran logistic regression with adjustment for age, sex, BMI and centre as the basic model, and additionally adjusted for alcohol consumption, smoking, BP, fasting HDL-cholesterol and fasting triacylglycerol as the full model. The concentration of each metabolite was natural-log-transformed and scaled to have a mean of zero and an SD of 1. Each metabolite was taken as the exposure and a binary NGR-IGR, NGR-type 2 diabetes (NGR-T2D) or IGR-type 2 diabetes (IGR-T2D) variable as the outcome. The OR was calculated by exponentiating the β coefficient from the logistic regression, where OR>1 indicates higher odds of the outcome and OR<1 indicates lower odds. To account for multiple testing, the p values from the regression analyses were adjusted using the Bonferroni correction (pfdr values). To identify sex-dependent metabolites, the full logistic regression models were also fitted separately in men and women.
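
As a worked illustration of the two calculations in this paragraph, the sketch below converts a logistic-regression coefficient into an OR with a Wald 95% CI and applies a Bonferroni adjustment to a list of p values. The function names are hypothetical and the Wald interval is an assumption; the text does not state how the CIs were obtained.

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Return (OR, lower, upper) from a coefficient and its standard error.

    OR = exp(beta); the Wald 95% CI is exp(beta -/+ z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def bonferroni_adjust(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A coefficient of zero gives an OR of exactly 1, positive coefficients give OR>1 and negative coefficients give OR<1, so an OR can never fall below zero.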

For the analysis of incident IGR and type 2 diabetes, a binary NGR-IGR, NGR-T2D or IGR-T2D variable at the 18 month and 48 month follow-up was taken as the outcome; transformed metabolites and the same risk factors as in the full model were taken as exposure and covariates, respectively. The same multiple-testing correction was applied.

A linear regression model was used to explore the association between HbA1c progression rate and metabolites at baseline. HbA1c progression rate was computed with a conditional linear mixed effect model and adjusted for changes in BMI and diabetes medications [21]. Each transformed metabolite was taken as the independent variable and HbA1c progression rate as the dependent variable, with adjustment for age and sex. Bonferroni correction was applied to the p values.

Mediation analysis

Mediation analysis followed the basic steps suggested by Baron and Kenny [22], and the significance of the mediation effect was tested with a non-parametric causal mediation analysis [22, 23]. Each identified metabolite was taken as the mediator, glycaemic category at baseline as the independent variable and glycaemic category at follow-up (18 months and 48 months) as the dependent variable. The R package ‘mediation’ (version 4.5.0) was used to calculate the p value and the proportion of the mediation effect by bootstrapping with 1000 resamples.
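
The logic of the mediation step can be sketched in Python as a simplified stand-in for the R ‘mediation’ package. The sketch assumes continuous variables and two ordinary least-squares models (mediator on exposure, outcome on exposure plus mediator), whereas the study used glycaemic categories; all function names are hypothetical. The proportion mediated is the indirect effect divided by the total effect, and a percentile bootstrap gives an interval for the indirect effect.

```python
import random
import statistics

def _cross(u, v):
    """Sum of cross-products of two variables after centring."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mediation_effects(x, m, y):
    """Return (indirect, direct, proportion_mediated).

    a is the slope of m ~ x; direct and b are the coefficients of x and m
    in y ~ x + m; indirect = a * b; total = direct + indirect.
    """
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    det = sxx * smm - sxm ** 2
    direct = (sxy * smm - smy * sxm) / det
    b = (smy * sxx - sxy * sxm) / det
    indirect = a * b
    return indirect, direct, indirect / (indirect + direct)

def bootstrap_indirect_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile 95% interval for the indirect effect from n_boot resamples."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            effect = mediation_effects(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
            )[0]
        except ZeroDivisionError:
            continue  # degenerate resample with no usable variation
        draws.append(effect)
    draws.sort()
    return draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws)) - 1]
```

An interval that excludes zero would be read as evidence of mediation under these simplifying assumptions.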

Mendelian randomisation

We used 2SMR approaches implemented in the MRInstruments (v0.3.2) and TwoSampleMR (v0.5.6) R packages to assess causality [24]. The 2SMR technique allows a causal relationship between an exposure and an outcome to be estimated from two separate studies (ESM Fig. 4), relying solely on summary statistics obtained from GWAS [24, 25]. To evaluate the influence of type 2 diabetes on metabolite levels, we conducted a 2SMR analysis. Type 2 diabetes instruments were obtained from the genome-wide genotyping study [26] and the corresponding SNP estimates on metabolites were extracted from the metabolite-GWAS [10, 27]. Prior to performing Mendelian randomisation (MR) analysis, exposure and outcome data were harmonised by aligning the SNPs on the same effect allele. We employed the inverse-variance weighted method [10, 26, 27] to estimate the causal effect.
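
The harmonisation and inverse-variance weighting steps can be written out in a few lines. The sketch below is a plain-Python illustration of the standard fixed-effect estimator, not a reproduction of the TwoSampleMR code; the function names and the boolean allele-match input are assumptions made for the example. Each SNP contributes a Wald ratio (outcome effect divided by exposure effect), weighted by the squared exposure effect over the outcome variance.

```python
import math

def harmonise(beta_outcome, same_effect_allele):
    """Flip the sign of outcome effects reported on the opposite allele."""
    return [b if same else -b for b, same in zip(beta_outcome, same_effect_allele)]

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate and its standard error.

    weight_i = beta_exposure_i**2 / se_outcome_i**2
    estimate = sum(weight_i * ratio_i) / sum(weight_i)
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)
```

The estimator assumes that every instrument is valid and that the SNPs are independent; pleiotropy-robust alternatives are outside the scope of this sketch.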

Results

Study populations

After stringent quality control (see ESM Methods), we identified 132 (ESM Table 1) and 779 (ESM Table 2) metabolites from targeted and untargeted metabolomics measurements, respectively, that were profiled in 3000 samples (ESM Table 3) [28]. Baseline characteristics (Table 1) showed significant differences in BMI, fasting variables and health status between the NGR, IGR and type 2 diabetes groups. No significant differences in age or smoking status were observed between these three groups. In addition, the study was conducted across seven countries; participants with type 2 diabetes were recruited in all centres, while participants with NGR or IGR were recruited only in the Amsterdam, Copenhagen, Kuopio and Lund centres.

Table 1 Baseline characteristics of the DIRECT participants based on their glycaemic category

Metabolites associated with prediabetes and diabetes from targeted metabolomics measurements

A multivariable logistic regression model was used with known diabetes-related variables as covariates to identify significant metabolites. Study centre, sex, age and BMI were covariates in the basic model, while the additional variables systolic BP, fasting HDL-cholesterol, fasting triacylglycerol, smoking status, alcohol status and health status were added in the full model. Based on the full model, four metabolites differed significantly between the NGR and IGR groups (Fig. 1a). Of these, hexoses (H1) showed the strongest association (OR 1.81 [95% CI 1.59, 2.06], pfdr=3.97×10⁻¹⁷) and served as a positive control throughout our analysis. Thirty-four metabolites differed significantly between NGR and type 2 diabetes, and 50 between IGR and type 2 diabetes (Fig. 1b,c). As a general pattern, phosphatidylcholines (PCs) and lysophosphatidylcholines (lysoPCs) were negatively associated with progression to type 2 diabetes, while branched-chain and aromatic amino acids as well as valeryl/glutaryl-related acylcarnitines were positively associated with type 2 diabetes.

Fig. 1

Flag plots representing the results of the multivariable logistic regression models for NGR vs IGR (a), NGR vs type 2 diabetes (b) and IGR vs type 2 diabetes (c) as dependent variables and the metabolites as independent variables, adjusted for study centre, sex, age, BMI, BP, fasting HDL-cholesterol, fasting triacylglycerol, smoking status, alcohol status and health status. The x-axis shows OR (95% CI) and the y-axis shows each significant metabolite; metabolite classes are represented by different colours. SM, sphingomyelin

H1 (OR 9.67 [95% CI 6.54, 14.32], pfdr=1.13×10⁻²⁷) also had the strongest association in the NGR-T2D comparison, while C5-M-DC (OR 5.31 [95% CI 4.16, 6.77], pfdr=1.07×10⁻³⁸) had the strongest association in the IGR-T2D comparison. Three metabolites (H1, lysoPC a C17:0 and lysoPC a C18:0) differed significantly in all comparisons (NGR-IGR, NGR-T2D and IGR-T2D), suggesting important roles as indicators of diabetes and its severity. Detailed statistics for the basic model and full model are shown in ESM Tables 3–8. As there were many more male than female participants enrolled in the study, a sensitivity analysis stratified by sex was conducted, and is reported in ESM Results, ESM Tables 9–14 and ESM Fig. 5.

Metabolites associated with prediabetes and diabetes from untargeted metabolomics measurements

Fifteen metabolites differed significantly between NGR and IGR based on the logistic regression analyses in the full model (Fig. 2a). Fructosyl lysine had the strongest association with progression to IGR (OR 1.53 [95% CI 1.37, 1.71], pfdr=8.64×10⁻¹²). Similarly, 99 metabolites differed significantly between NGR and type 2 diabetes, and 108 between IGR and type 2 diabetes (Fig. 2b,c). As a general pattern, lipids were negatively associated and amino acids were positively associated with progression to type 2 diabetes. 1-(1-Enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1)* (OR 0.23 [95% CI 0.17, 0.31], pfdr=3.48×10⁻¹⁸) had the strongest association for the NGR-T2D comparison, while cysteine-S-sulphate (OR 3.25 [95% CI 2.55, 4.15], pfdr=3.11×10⁻¹⁸) was significantly associated in the IGR-T2D comparison. Seven metabolites (fructosyl lysine, glutamate, 1-stearoyl-GPC (18:0), N-lactoylphenylalanine, N-lactoylvaline, picolinoylglycine and mannonate) were significant in all comparison groups, suggesting important roles as diabetes risk indicators. Detailed statistics are presented in ESM Tables 15–20. A sex-based sensitivity analysis of metabolomics data from the untargeted measurements is reported in ESM Results, ESM Tables 21–26 and ESM Fig. 6.

Fig. 2

Flag plots representing the results of the multivariable logistic regression models for NGR vs IGR (a), NGR vs type 2 diabetes (b) and IGR vs type 2 diabetes (c) as dependent variables and the metabolites as independent variables, adjusted for study centre, sex, age, BMI, BP, fasting HDL-cholesterol, fasting triacylglycerol, smoking status, alcohol status and health status. The x-axis shows OR (95% CI) and the y-axis shows each significant metabolite; metabolite classes are represented by different colours. Asterisks (*) indicate the presence of unknown or novel metabolites

Metabolites associated with HbA1c progression rate

HbA1c progression rate was computed with a conditional linear mixed effect model and adjusted for changes in BMI and diabetes medications [21]. In multivariable linear regression analysis of the targeted metabolites, lysoPC a C17:0 (β −0.0535 [95% CI −0.08, −0.0269], pfdr=0.0109), glycine (Gly) (β −0.0509 [95% CI −0.0782, −0.0236], pfdr=0.0347) and H1 (β 0.0481 [95% CI 0.0218, 0.0745], pfdr=0.0452) were significantly associated with HbA1c progression rate, and all three were also related to glycaemic deterioration traits. In untargeted metabolomic profiling, 20 metabolites were significantly related to HbA1c progression rate, with pyruvate (β 0.0877 [95% CI 0.0609, 0.114], pfdr=1.28×10⁻⁷) showing the strongest association. Besides pyruvate, N-lactoylleucine, lactate, N-lactoylphenylalanine, X-15245, N-lactoylisoleucine, N-lactoylvaline, 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1)*, cortolone glucuronide, X-24295, formiminoglutamate and N-lactoyltyrosine were also significantly associated with glycaemic categories. Tables 2 and 3 show the metabolites with significant associations, while the complete results are reported in ESM Tables 27 and 28.

Table 2 Metabolites from targeted measurements significantly associated with HbA1c progression rate from a linear regression model
Table 3 Metabolites from untargeted metabolomics measurements significantly associated with HbA1c progression rate from a linear regression model

Metabolite association with incident diabetes (IGR/type 2 diabetes)

Several metabolites were significantly associated with both HbA1c progression rate and glycaemic category: three targeted metabolites (lysoPC a C17:0, glycine and H1) and 12 untargeted metabolites (pyruvate, N-lactoylleucine, lactate, N-lactoylphenylalanine, X-15245, N-lactoylisoleucine, N-lactoylvaline, 1-(1-enyl-palmitoyl)-2-oleoyl-GPC* [PC(P-16:0/18:1)], cortolone glucuronide, X-24295, formiminoglutamate and N-lactoyltyrosine). Next, we investigated their predictive value for IGR and type 2 diabetes by relating baseline metabolite concentrations to incident IGR or type 2 diabetes at follow-up in multivariable logistic regression. As shown in Table 4, baseline lysoPC a C17:0 concentration differed significantly between 244 individuals with incident IGR and 398 NGR control individuals after 18 months. Baseline concentrations of the sum of hexoses (H1) differed significantly between incident IGR (at the 48 month follow-up) and NGR, and between incident type 2 diabetes and IGR at both the 18 month and the 48 month follow-up.

Table 4 Metabolites from targeted measurements that were significantly associated with incidence of IGR and type 2 diabetes in different pairwise comparisons

In untargeted metabolomic profiling, lactate and X-24295 baseline concentrations were significantly associated with IGR or type 2 diabetes incidence at the 18 month and 48 month follow-up (Table 5). Formiminoglutamate, N-lactoylleucine and N-lactoylvaline differed significantly between 244 individuals with incident IGR and 398 NGR control individuals after 18 months. We did not find any metabolites from untargeted measurements that significantly predicted the incidence of IGR from NGR at 48 months.

Table 5 Metabolites from untargeted measurements that were significantly associated with incidence of IGR and type 2 diabetes in different pairwise comparisons

Mediation analysis

Causal mediation analysis was employed to explore the potential mediation effects of the identified metabolites from baseline glycaemic status to follow-up. Consistent with the incidence results, lysoPC a C17:0 showed a significant mediation effect (proportion of mediation 13%, mediation effect p=0.034; Fig. 3a), indicating that this metabolite partially mediated the glycaemic deterioration from NGR to IGR at 18 months. The positive control H1 exhibited significant mediation effects in all groups (between 6% and 9%), as H1 consists mainly of blood glucose.

Fig. 3

Schematic overview of mediation analysis with lysoPC a C17:0 and hexoses (a) or N-lactoylvaline, lactate, N-lactoylleucine, formiminoglutamate and X-24295 (b) as mediators. Numbers above the red arrows indicate the percentage and significance of mediation effects. T2D, type 2 diabetes

N-Lactoylvaline (proportion of mediation 24%, mediation effect p<2×10⁻¹⁶), lactate (proportion of mediation 22%, mediation effect p=0.002), N-lactoylleucine (proportion of mediation 20%, mediation effect p=0.006), formiminoglutamate (proportion of mediation 11%, mediation effect p=0.034) and X-24295 (proportion of mediation 11%, mediation effect p=0.042) all showed significant mediation effects from baseline NGR to IGR at the 18 month follow-up (Fig. 3b). Furthermore, formiminoglutamate (proportion of mediation 23%, mediation effect p=0.006) showed a significant mediation effect from NGR to IGR at 48 months. These results suggest that these metabolites exert a significant mediation effect on glycaemic deterioration.

MR

The availability of genetic data on type 2 diabetes makes the use of MR particularly compelling. To assess bidirectional causal relationships between type 2 diabetes and metabolites (Fig. 4), we employed 2SMR tests. After multiple testing correction, only the concentration of the sum of hexoses (H1) was found to be influenced by type 2 diabetes (p<0.05/117=0.00042). For untargeted metabolites we found instruments for only 19% of the metabolites (i.e. 151 out of 779). The instruments include variants in genes such as TCF7L2, IGF2BP2, NOTCH2, CDKAL1, PABPC4, FTO and JAZF1, which are known to be associated with diabetes and were further significantly associated with the metabolites. Following multiple testing correction, the results suggest that changes in an amino acid (glutamate) and a lipid (caproate, FA C6:0) were caused by type 2 diabetes status (p<0.05/151=0.000331). Conversely, metabolites that were causal for type 2 diabetes (meaning that a change in the metabolite caused a change in disease status) included several phosphatidylcholines, namely PC aa C36:2, PC aa C36:5, PC ae C36:3 and PC ae C34:3, from the targeted metabolomics dataset. From the untargeted metabolomics dataset, two n-3 fatty acids, namely stearidonate (18:4n3) and docosapentaenoate (n3 DPA; 22:5n3), were identified as causal for type 2 diabetes. Detailed statistics of our MR analysis are presented in ESM Tables 29–32.

Fig. 4

Forest plot representing causal estimates of type 2 diabetes on targeted and untargeted metabolites in the two-sample MR test. T2D, type 2 diabetes

Discussion

In this study, we used untargeted metabolomics to provide semi-quantitative global screening of metabolites in the development of disease, whereas targeted metabolomics was used to quantify a pre-selected subset of metabolites with absolute concentrations. The overlap between the two metabolomic techniques was limited to a few amino acids and lipids. In the current study we report 18 metabolites (three from targeted and 14 from global profiling, plus one in common, lysoPC a C18:0 / 1-stearoyl-GPC [18:0]) that were significantly associated with prediabetes in the DIRECT cohort. The advantages of global profiling become evident as it allows for the identification of a broader spectrum of metabolites. A few notable examples are given here. First, picolinoylglycine (HMDB0059766), which is potentially a phase II product of picolinic acid, a degradation product of tryptophan [29], and glycine [30], shows potential as a novel marker for glycaemic deterioration. Prediabetes is often associated with dyslipidaemia, marked by an imbalanced lipid profile compared with individuals with NGR [24]. Second, N-lactoyl amino acids are not infrequently observed in metabolomic datasets. In fact, it has come to light that N-lactoyl amino acids were misidentified in some metabolomic studies and erroneously reported as 1-carboxyethyl amino acids. In particular, N-lactoyl-phenylalanine (Lac-Phe) is known to act as an appetite suppressant when given to obese mice [31], and in humans Lac-Phe concentrations were observed to rise following vigorous exercise [32]. The most recent studies show that Lac-Phe facilitates the impact of metformin on both food intake and body weight [33, 34]. The exact role of Lac-Phe in the human body and in downstream pathways, such as energy metabolism, insulin signalling and exercise-induced pathways, remains unclear and needs further research.

We are aware of several limitations to our study. Although metabolomics screening has many valuable attributes in health science, challenges inherent to this approach remain, especially the accurate identification of metabolites, which is crucial for the biological interpretation and validation of metabolomics data [35]. Variability in sample collection, preparation and analytical techniques can affect the reproducibility and comparability of results across different studies. Standardisation efforts are ongoing but may not fully address all sources of variation. The identification of metabolites, especially in untargeted metabolomics, can be challenging; metabolites whose identification is uncertain because of incomplete databases, or because they are unknown or novel, are marked with an asterisk (*) after the metabolite name. However, ongoing advancements in technology, methodology and standardisation aim to enhance the robustness and applicability of metabolomics studies [35]. The current study is predominantly based on White male participants from the Kuopio region of Europe, and for this reason an additional sex-based sensitivity analysis has been performed and reported separately (ESM Results 1 and 2). Challenges in MR studies include limited statistical power, potential reverse causation, confounding and pleiotropy [36]. Caution is therefore advised in interpreting the causal inferences; as precautionary measures, we used valid MR instruments and report Bonferroni-corrected significance.

A drug–metabolomics associations study [17] was examined to determine whether or not metabolites linked to type 2 diabetes from the DIRECT study were also associated with a particular drug. Looking at our results and those of Allesøe et al [17], we found that 44% (15 out of 34) of targeted metabolites and 3% (three out of 99) of non-targeted metabolites that were significantly associated with type 2 diabetes also showed a significant association with at least one of the 20 drugs. This suggests that metabolites linked to type 2 diabetes may be confounded by polypharmacy.

However, metabolite association with incident prediabetes or diabetes (IGR-T2D) showed that lysoPC a C17:0 could predict the risk of developing IGR at 18 months and 48 months. It has already been shown that lysoPCs differ significantly between individuals with incident IGT or type 2 diabetes and individuals with NGR in the KORA study [37]. LysoPC a C17:0 was negatively associated with diabetes, a finding that has been confirmed in several studies [38, 39]. The aforementioned drug–metabolomics association study [17] showed that lysoPC a C17:0 was not associated with the drugs. However, the origin of odd-chain fatty acids (mainly C15:0 and C17:0) remains elusive. Jenkins et al [40] investigated the origin of circulating odd-chain fatty acids (C17:0, C15:0) through a combination of animal and human studies to determine possible contributions of fatty acids from the gut microbiota, diet and novel endogenous biosynthesis [41]. The findings suggested that C15:0 was linked to dietary intake, while C17:0 was predominantly biosynthesised, indicating independent origins and non-homologous roles in disease causation.

Causal mediation analysis indicated that plasma lactate strongly mediates the effects of identified metabolites in the transition from baseline glycaemic status to follow-up [42]. In a longitudinal study of Swedish men, elevated serum lactate was independently linked to a higher incidence of type 2 diabetes, irrespective of obesity measures [43]. Formiminoglutamate was confirmed to be associated with a higher risk of incident type 2 diabetes in older Puerto Ricans [44]. N-lactoylleucine and N-lactoylvaline, derivatives of leucine and valine, respectively, are ubiquitous pseudodipeptides of lactic acid and amino acids that are formed by reverse proteolysis [32] and are correlated with underivatised amino acids in human plasma. The Microbiome and Insulin Longitudinal Evaluation Study (MILES) [45] investigated the association between ABO haplotypes and insulin-related characteristics, and explored possible pathways that could mediate these associations. The study showed that the A1 haplotype potentially enhances favourable insulin sensitivity in non-Hispanic White individuals, with lactate likely influencing this mechanism, while gut bacteria are not believed to be a contributing factor.

In MR, causality signifies that modifying the exposure leads to a predictable change in the outcome. Our 2SMR analysis suggests that the metabolites causal for type 2 diabetes are PC aa C36:2, PC aa C36:5, PC ae C34:3 and PC ae C36:3, all of which are significantly associated with drugs in the drug–metabolomics association study. In contrast, the two n-3 fatty acids identified from untargeted metabolomics, namely stearidonate (18:4n3) and docosapentaenoate (n3 DPA; 22:5n3), are not associated with drugs. In 2012, Banz et al [46] explored the therapeutic implications of stearidonic acid in preventing or managing type 2 diabetes. The Fatty Acids and Outcomes Research Consortium (FORCE) [47] found that higher circulating biomarkers of seafood-derived n-3 fatty acids were associated with lower type 2 diabetes risk. By contrast, branched-chain amino acids [48] and sphingomyelin [15] have been shown to have a causal role in type 2 diabetes development, a relationship not observed in the DIRECT study.

Conclusions

Our study demonstrates that alteration in blood plasma metabolites is associated with glycaemic deterioration. The progression from prediabetes to diabetes is mediated by novel metabolites such as picolinoylglycine and N-lactoyl-amino acids, as demonstrated by evidence from the DIRECT study. N-lactoyl-amino acids are known to be exercise-induced metabolites that suppress food intake and influence glucose homeostasis. Additional functional research and quantification are needed to advance the identification of early metabolic biomarkers such as N-lactoyl-amino acids, which have the potential to forecast the onset of type 2 diabetes. Collectively, these findings direct attention towards novel metabolic signatures associated with glycaemic deterioration.