
Introduction

Type 2 diabetes, constituting over 90% of diabetes cases, poses a major threat to global health. The worldwide prevalence of type 2 diabetes in adults has increased from ~150 million in 2000 to >450 million in 2019 and is projected to rise to ~700 million by 2045 [1]. The complications of type 2 diabetes, both macrovascular (for example, CVD) and microvascular (for example, complications affecting the kidney, the retina and the peripheral nervous system), further increase the global burden of type 2 diabetes [2]. The major drivers of the global diabetes epidemic are unhealthy diet, overweight/obesity, genetic predisposition and other lifestyle factors [3]. Plant-based diets, especially healthy plant-based diets rich in high-quality plant foods, such as whole grains, fruits and vegetables, have been associated with a lower risk of type 2 diabetes [4, 5]. However, the underlying mechanisms are not fully understood.

Recent advances in high-throughput metabolomic profiling have opened a new era for nutritional research. Metabolomics measures downstream components and metabolic products of foods, so identifying metabolites associated with plant-based diets may provide new insights into the biological processes underlying the association between plant-based diets and a lower risk of type 2 diabetes. Many previous studies have identified metabolites correlated with other dietary patterns, such as the Alternate Mediterranean Diet [6], the Dietary Approaches to Stop Hypertension diet [7], and western and prudent diets [8]. However, most studies analysed metabolites individually; only a few accounted for the high correlations among metabolites [9, 10] and derived multi-metabolite profiles of dietary patterns [11].

We therefore analysed plasma metabolites among 10,684 participants from the Nurses’ Health Study (NHS), NHSII and Health Professionals Follow-up Study (HPFS) and, applying a training and testing approach, identified multi-metabolite profiles associated with plant-based diets. The identified metabolite profiles may serve not only as potential biomarkers of plant-based diets but also as markers of complex metabolic responses to these dietary exposures [12, 13]. We then prospectively examined the associations of the multi-metabolite profiles with incident type 2 diabetes and explored potential mediating metabolites.

Methods

Study population

Our analyses were performed within three prospective cohort studies: NHS, NHSII and HPFS. The NHS was initiated in 1976, enrolling 121,700 female nurses aged 30–55 years [14]. The NHSII was established in 1989 and recruited 116,429 female nurses aged 25–42 years [14]. The HPFS began in 1986 and enrolled 51,529 male health professionals aged 40–75 years [15]. Using a similar protocol, we collected blood samples from 32,826 participants in the NHS between 1989 and 1990, from 29,611 participants in the NHSII between 1996 and 1999, and from 18,225 participants in the HPFS between 1993 and 1995. Participants were mailed blood collection kits and, after collection, samples were shipped by overnight courier with an icepack to our laboratory. Whole blood samples were then processed, separated into plasma, buffy coat and red blood cells, and stored in liquid nitrogen freezers [16, 17].

In the present study, we included participants who had been selected for 14 prior nested case–control sub-studies on metabolomics (electronic supplementary material [ESM] Table 1). A total of 10,684 participants with available dietary intake data were included in the metabolomics analysis to identify multi-metabolite profiles correlated with plant-based diets. For the analysis of metabolite profiles and incident type 2 diabetes, we excluded participants who were lost to follow-up after blood collection or who reported a history of cancer, CVD or type 2 diabetes at blood draw. We also excluded participants from the gestational diabetes sub-study, which oversampled type 2 diabetes cases. In all, 8827 participants were included in the prospective analysis (Fig. 1). The study protocols were approved by the institutional review boards of Brigham and Women’s Hospital and Harvard T.H. Chan School of Public Health. Participants’ completion of questionnaires was considered as implied consent.

Fig. 1

Schematic of the study design. To make full use of the data, metabolite profile scores in the training set were calculated using a leave-one-out approach: each participant was left out in turn, an elastic net model was trained on the remaining participants (with tenfold cross-validation), and the trained model was applied to the left-out participant to calculate their metabolite profile score. This step was repeated for every participant in the training set.
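The leave-one-out scoring described in the caption can be illustrated with a minimal sketch. This is not the authors' R/glmnet code: it is a plain-Python coordinate-descent elastic net that uses a glmnet-style objective and assumes metabolites are already z-scored. For brevity it fixes the penalty `lam` and mixing parameter `alpha`, whereas the study tuned the penalty by tenfold cross-validation. All function names are hypothetical.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator for the L1 (lasso) part of the penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate-descent elastic net minimising
    (1/2n)*||y - b0 - X b||^2 + lam*((1 - alpha)/2*||b||^2 + alpha*||b||_1).
    X: list of rows (participants) of z-scored metabolites; y: diet index values.
    Returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    b = [0.0] * p
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of metabolite j with the partial residual that excludes j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * b[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
        # Update the unpenalised intercept so that residuals stay centred
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, b


def profile_score(b, x):
    """Metabolite profile score: weighted sum of metabolites, weights = coefficients."""
    return sum(bj * xj for bj, xj in zip(b, x))


def loo_scores(X, y, **enet_kwargs):
    """Leave-one-out scores: refit without participant k, then score participant k."""
    scores = []
    for k in range(len(X)):
        _, b = fit_elastic_net(X[:k] + X[k + 1:], y[:k] + y[k + 1:], **enet_kwargs)
        scores.append(profile_score(b, X[k]))
    return scores
```

Each participant's score comes from a model that never saw that participant. This is why the leave-one-out training-set scores are protected against the optimistic bias that in-sample scoring would introduce.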

Dietary assessment and plant-based diet indices

Self-reported dietary data were ascertained using semiquantitative food frequency questionnaires (FFQs). The validity and reproducibility of the questionnaires have been well documented [18, 19]. To better reflect habitual diets, we calculated the mean of the dietary intakes from the two FFQs closest to the time of blood draw. Three indices were calculated to estimate the adherence to three plant-based diets: an overall Plant-based Diet Index (PDI), a Healthy Plant-based Diet Index (hPDI), and an Unhealthy Plant-based Diet Index (uPDI). The development of these three indices has been described previously [5, 20]. In brief, based on nutrient and culinary similarities, 18 food groups were created within three large categories: healthy plant foods (n = 7; whole grains, fruits, vegetables, nuts, legumes, vegetable oils and tea/coffee); unhealthy plant foods (n = 5; refined grains, fruit juices, potatoes, sugar-sweetened beverages and sweets/desserts); and animal foods (n = 6; animal fats, dairy, eggs, fish/seafood, meat and miscellaneous animal-based foods). We distinguished between healthy and unhealthy plant foods using existing knowledge of associations of the foods with type 2 diabetes, CVD, certain cancers, and intermediate conditions, including obesity, hypertension, hyperlipidaemia and inflammation [5]. Alcoholic beverages were not included in the indices but were adjusted for in the multivariable analyses.

Each of the 18 food groups was divided into quintiles of consumption and given either positive or reverse scores. For positive scores, a score of 5 was assigned to the highest quintile and a score of 1 to the lowest; for reverse scores, the pattern was reversed, with 5 assigned to the lowest quintile and 1 to the highest. To calculate PDI, all plant food groups were given positive scores and animal food groups were given reverse scores. To calculate hPDI, healthy plant foods were given positive scores, whereas unhealthy plant foods and animal foods were given reverse scores. For uPDI, unhealthy plant foods were given positive scores, whereas healthy plant foods and animal foods were given reverse scores. Finally, the 18 food group scores for each participant were summed to obtain each index, with a possible range of 18 to 90.
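The quintile scoring rules can be sketched as follows. This illustration makes several assumptions: the short food-group labels are hypothetical, quintile cut points are computed over whatever set of participants is passed in, and `statistics.quantiles` with a right-closed `bisect` is used to assign quintiles. The original analyses may have handled ties and cut points differently.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical short labels for the 18 food groups
HEALTHY_PLANT = ["whole_grains", "fruits", "vegetables", "nuts", "legumes",
                 "vegetable_oils", "tea_coffee"]
UNHEALTHY_PLANT = ["refined_grains", "fruit_juices", "potatoes", "ssb", "sweets_desserts"]
ANIMAL = ["animal_fats", "dairy", "eggs", "fish_seafood", "meat", "misc_animal"]
ALL_GROUPS = HEALTHY_PLANT + UNHEALTHY_PLANT + ANIMAL

# Food groups given positive scores for each index; all other groups are reverse-scored
POSITIVE = {
    "PDI": set(HEALTHY_PLANT + UNHEALTHY_PLANT),
    "hPDI": set(HEALTHY_PLANT),
    "uPDI": set(UNHEALTHY_PLANT),
}


def plant_based_indices(intakes):
    """intakes: list of dicts mapping food group -> intake.
    Returns one dict with PDI, hPDI and uPDI (each 18-90) per participant."""
    # Four cut points per food group split the intakes into quintiles
    cuts = {g: quantiles([person[g] for person in intakes], n=5) for g in ALL_GROUPS}
    results = []
    for person in intakes:
        # Quintile 1 (lowest intake) to 5 (highest intake)
        q = {g: 1 + bisect_right(cuts[g], person[g]) for g in ALL_GROUPS}
        # Positive score = quintile; reverse score = 6 - quintile
        results.append({
            index: sum(q[g] if g in pos else 6 - q[g] for g in ALL_GROUPS)
            for index, pos in POSITIVE.items()
        })
    return results
```

For example, a participant in the highest quintile of every healthy plant food group and the lowest quintile of every other group would score 7 × 5 + 11 × 5 = 90 on hPDI, the maximum.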

Metabolomics measurement

The plasma metabolomics profiling data were obtained using high-throughput LC-MS techniques at the Broad Institute of MIT and Harvard (Cambridge, MA, USA) [21]; details of the measurement are given in the ESM Methods. Polar metabolites were separated using hydrophilic interaction LC with positive ionisation mode MS detection (HILIC-pos), whereas lipids were profiled using C8 chromatography with positive ionisation mode detection (C8-pos). Targeted raw data were processed using TraceFinder software (Thermo Fisher Scientific, Waltham, MA, USA), and non-targeted data were processed using Progenesis QI (Nonlinear Dynamics, Newcastle upon Tyne, UK). Metabolite identification was reported according to the standards proposed by the Metabolomics Standards Initiative [22]; requirements for ‘level 1’ and ‘level 2’ identification are described in the ESM Methods.

We excluded metabolites with an intraclass r <0.3 across blinded quality control replicates (10% of study samples; n = 7) and metabolites with an overall detection rate <75% (n = 139). Missing values for each metabolite were imputed using random forest imputation, an approach previously recommended for metabolomics analysis [23]. We also excluded metabolites with poor stability due to delayed processing (Spearman correlation or intraclass r <0.75), as observed in our pilot study (n = 61) [24]. Finally, a total of 264 known metabolites were included in the current analysis (ESM Table 2). These were primarily lipids (n = 178, including 75 glycerolipids, 28 glycerophospholipids, 23 plasmalogens, 21 carnitines, 19 lysophospholipids and 12 cholesteryl esters) but also included amino acid-related metabolites (n = 37) and other metabolites (n = 49) (Fig. 2a). Metabolites were highly correlated within each category (ESM Fig. 1).
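The three exclusion rules can be written as a simple filter. This sketch reads the stability threshold as <0.75 and assumes that a metabolite counts as unstable if either stability measure falls below it. The dictionary keys and metabolite names are hypothetical, and the random forest imputation step is not shown.

```python
# Thresholds as stated in the text
MIN_QC_ICC = 0.3           # intraclass r across blinded QC replicates
MIN_DETECTION_RATE = 0.75  # fraction of samples with a detected value
MIN_STABILITY = 0.75       # Spearman r / intraclass r under delayed processing


def passes_qc(metabolite):
    """metabolite: dict with hypothetical keys 'qc_icc', 'detection_rate',
    'stability_spearman' and 'stability_icc'."""
    return (
        metabolite["qc_icc"] >= MIN_QC_ICC
        and metabolite["detection_rate"] >= MIN_DETECTION_RATE
        # Assumption: unstable if either stability measure is below 0.75
        and min(metabolite["stability_spearman"], metabolite["stability_icc"]) >= MIN_STABILITY
    )


def filter_metabolites(metabolites):
    """Return the names of metabolites that pass all three exclusion criteria."""
    return [name for name, m in metabolites.items() if passes_qc(m)]
```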

Fig. 2

Known metabolites measured and the metabolite profile score for three plant-based diet indices. (a) The distributions of categories for the 264 assayed known metabolites. (b) Venn diagram of selected metabolites for the metabolite profile score of PDI, hPDI and uPDI. (c) Correlation between PDI and the corresponding metabolite profile score

Ascertainment of type 2 diabetes

Participants who reported physician-diagnosed diabetes on the biennial questionnaire were sent a validated supplementary questionnaire on symptoms, diagnostic tests and medication use to confirm the diagnosis [25, 26]. We included only confirmed type 2 diabetes cases that met at least one of the following National Diabetes Data Group criteria: (1) one or more classic symptoms (excessive thirst, polyuria, weight loss and hunger) plus fasting blood glucose ≥7.8 mmol/l (≥140 mg/dl) or random blood glucose ≥11.1 mmol/l (≥200 mg/dl); (2) no symptoms but elevated blood glucose on two separate occasions (fasting blood glucose ≥7.8 mmol/l [≥140 mg/dl], random blood glucose ≥11.1 mmol/l [≥200 mg/dl] or 2 h blood glucose ≥11.1 mmol/l [≥200 mg/dl] after OGTT); (3) any treatment with insulin or other glucose-lowering medications for diabetes [27]. For cases diagnosed after 1998, the fasting blood glucose threshold was lowered to ≥7.0 mmol/l (≥126 mg/dl) in accordance with the ADA diagnostic criteria [28].
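The case-confirmation rules can be sketched as a small decision function. This is a hypothetical illustration, not the cohorts' validation procedure. It interprets "after 1998" as a diagnosis year later than 1998 and treats each glucose test as a separate occasion. The class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

RANDOM_OR_2H_CUTOFF = 11.1  # mmol/l, random or 2 h OGTT glucose


def fasting_cutoff(diagnosis_year):
    """NDDG fasting threshold; ADA threshold assumed for diagnoses after 1998."""
    return 7.0 if diagnosis_year > 1998 else 7.8


@dataclass
class SupplementaryQuestionnaire:
    diagnosis_year: int
    classic_symptoms: bool
    # One (test, value in mmol/l) tuple per separate occasion;
    # test is "fasting", "random" or "ogtt_2h"
    glucose_tests: List[Tuple[str, float]] = field(default_factory=list)
    glucose_lowering_treatment: bool = False


def is_elevated(test, value, diagnosis_year):
    if test == "fasting":
        return value >= fasting_cutoff(diagnosis_year)
    if test in ("random", "ogtt_2h"):
        return value >= RANDOM_OR_2H_CUTOFF
    raise ValueError(f"unknown glucose test: {test}")


def confirmed_type2_diabetes(q):
    elevated = [(t, is_elevated(t, v, q.diagnosis_year)) for t, v in q.glucose_tests]
    # (1) classic symptoms plus an elevated fasting or random glucose
    crit1 = q.classic_symptoms and any(e for t, e in elevated if t in ("fasting", "random"))
    # (2) no symptoms but elevated glucose on two separate occasions
    crit2 = not q.classic_symptoms and sum(e for _, e in elevated) >= 2
    # (3) any insulin or other glucose-lowering medication
    crit3 = q.glucose_lowering_treatment
    return crit1 or crit2 or crit3
```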

Non-dietary covariates

Information on body weight, physical activity, smoking status, multivitamin use, ethnicity, family history of type 2 diabetes, hypertension, antihypertensive medication use, hypercholesterolaemia and lipid-lowering medication use was collected from self-reported biennial questionnaires preceding blood collection. BMI was derived using height reported at cohort baseline and body weight reported closest to the blood draw. We collected age and fasting status through questionnaires completed at blood collection.
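The BMI derivation is simple arithmetic: weight (kg) divided by the square of height (m). The text also specifies which weight report to use. The sketch below assumes weights are stored by questionnaire date and height is already in metres. The function name is hypothetical.

```python
from datetime import date


def bmi_at_blood_draw(height_m, weight_reports, blood_draw):
    """height_m: height (m) reported at cohort baseline.
    weight_reports: dict mapping questionnaire date -> weight (kg).
    Uses the weight reported closest in time to the blood draw."""
    closest = min(weight_reports, key=lambda d: abs((d - blood_draw).days))
    return weight_reports[closest] / height_m ** 2
```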

Statistical analyses

We log-transformed highly skewed metabolites (absolute skewness ≥2) [29]. All metabolites were then converted to z scores (mean 0, SD 1) within each sub-study. To identify metabolite profiles correlated with plant-based diets, all participants were randomised to either a training set or a testing set in a 7:3 ratio (Fig. 1). In the training set, we used an elastic net model within a tenfold cross-validation framework to regress PDI, hPDI or uPDI on the 264 known metabolites [30]. We then applied the trained model to the testing set to calculate a metabolite profile score, defined as the weighted sum of the selected metabolites with weights equal to their elastic net regression coefficients. To avoid overfitting, scores in the training set were obtained using a leave-one-out approach. We then calculated r between each diet index and the corresponding metabolite profile score. To evaluate the robustness of the metabolites selected in each profile score, we examined the associations between these metabolites and the corresponding diet index stratified by training/testing set, cohort, fasting status and original case–control status. Associations of selected metabolites with diet indices and dietary components were examined using multivariable linear regression; plant-based diet indices and intake of each component were standardised before these analyses.
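The two preprocessing steps (a skewness-based log transform, then z-scoring within each sub-study) can be sketched as follows. The text does not specify the estimator, so the sketch assumes the moment-based sample skewness g1, computed on the pooled values of each metabolite, and the sample SD for z-scoring. It also assumes abundances are strictly positive so that the logarithm is defined.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def preprocess_metabolite(values, substudy):
    """values: positive abundances of one metabolite; substudy: sub-study label per sample.
    Log-transforms highly skewed metabolites, then z-scores within each sub-study."""
    if abs(skewness(values)) >= 2:
        values = [math.log(v) for v in values]
    members = defaultdict(list)
    for i, label in enumerate(substudy):
        members[label].append(i)
    z = [0.0] * len(values)
    for idx in members.values():
        group = [values[i] for i in idx]
        m, sd = mean(group), stdev(group)
        for i in idx:
            z[i] = (values[i] - m) / sd
    return z
```

Standardising within sub-study removes batch-level differences in mean and scale between the 14 original sub-studies before the metabolites are pooled.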

Associations of plant-based diet indices, metabolite profile scores and selected metabolites with type 2 diabetes risk were assessed by Cox regression using combined data from the training set and testing set (Fig. 1). The Cox regression was stratified by cohort, original sub-study and case–control status and adjusted for age in months, BMI, ethnicity, fasting status, multivitamin use, smoking status, physical activity, hypertension, antihypertensive medication use, hypercholesterolaemia, lipid-lowering medication use, family history of diabetes, total energy intake, alcohol intake and margarine intake. The person-time for each participant was calculated from the blood collection date until the diagnosis of type 2 diabetes, death, loss to follow-up, or end of follow-up (June 2016 in the NHS, June 2017 in the NHSII, and January 2016 in the HPFS), whichever came first.
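The person-time rule ("whichever came first") can be sketched as follows. The text gives only the month in which follow-up ended, so this sketch assumes the last day of that month. The function name and the use of 365.25 days per year are also assumptions.

```python
from datetime import date

# Assumption: follow-up ended on the last day of the month stated in the text
END_OF_FOLLOW_UP = {
    "NHS": date(2016, 6, 30),
    "NHSII": date(2017, 6, 30),
    "HPFS": date(2016, 1, 31),
}


def follow_up(cohort, blood_draw, diabetes=None, death=None, lost=None):
    """Return (person-years, event) from blood draw to the first of type 2 diabetes
    diagnosis, death, loss to follow-up or end of follow-up."""
    candidates = [d for d in (diabetes, death, lost) if d is not None]
    end = min(candidates + [END_OF_FOLLOW_UP[cohort]])
    # An event is counted only if diabetes diagnosis is the first censoring date
    event = diabetes is not None and diabetes == end
    return (end - blood_draw).days / 365.25, event
```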

To test whether an association between a metabolite profile score and risk of type 2 diabetes was explained by its correlation with the diet index, we further included the diet index and the corresponding score simultaneously in the multivariable model. Among metabolites selected in the profile scores, we identified potential mediators of the association between plant-based diets and type 2 diabetes risk according to four predefined criteria [31]: (1) an association between PDI/hPDI/uPDI and type 2 diabetes; (2) an association between PDI/hPDI/uPDI and the potential mediating metabolite in all participants and in each stratum mentioned above; (3) an association between the potential mediating metabolite and type 2 diabetes in the opposite direction to the metabolite’s association with PDI/hPDI, or in the same direction as its association with uPDI; and (4) attenuation of the diet–type 2 diabetes association after adjustment for the potential mediating metabolite. Mediation analyses were further performed to estimate the proportion of the diet–type 2 diabetes association explained by each intermediate metabolite and by the metabolite profile score [32, 33]. All statistical tests were two-sided, with statistical significance set at p<0.05, and multiple testing was corrected using the Bonferroni method. All analyses were performed in R version 4.0 [34]. The main R packages used were ‘missRanger’ for random forest imputation, ‘glmnet’ for elastic net regression and ‘survival’ for Cox regression.
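The proportion of an association explained by a mediator is often estimated with the difference method: one minus the ratio of the adjusted to the unadjusted log HR. The sketch below assumes this method. The cited methods [32, 33] may differ in details such as how CIs are obtained, and CIs are not shown here. The HRs used in the check (0.80 and 0.90) and the 264-test Bonferroni example are hypothetical illustrations, not study results.

```python
import math


def proportion_explained(hr_total, hr_adjusted):
    """Difference-method proportion of a diet-diabetes association explained by a
    mediator: 1 - beta_adjusted / beta_total, with betas as log hazard ratios."""
    beta_total = math.log(hr_total)
    beta_adjusted = math.log(hr_adjusted)
    return 1 - beta_adjusted / beta_total


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests
```

With a hypothetical HR of 0.80 before and 0.90 after adjustment for a metabolite, `proportion_explained(0.80, 0.90)` is about 0.528, meaning the metabolite would account for roughly 53% of the association.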

Results

Characteristics of the study participants

The participants included in the metabolomics analysis were predominantly white and middle-aged (mean age 54 ± 10 years), with a mean BMI of 25.6 ± 4.8 kg/m2 (Table 1). Of these, 8827 participants free of diabetes, CVD and cancer at blood collection were included in the association analysis of type 2 diabetes (Fig. 1). Compared with participants who did not develop type 2 diabetes after the blood draw, those who were diagnosed with type 2 diabetes had a higher BMI; were more likely to have hypertension and hypercholesterolaemia, to use antihypertensive and lipid-lowering medications and to report a family history of diabetes; and were less likely to be physically active (Table 2). They also had lower intakes of healthy plant foods and lower PDI and hPDI scores.

Table 1 Characteristics of the study participants included in the metabolomics analysis
Table 2 Characteristics of the study participants included in the association analysis of type 2 diabetes

Metabolite profiles related to plant-based diet indices

A panel of 55 metabolites for PDI, a panel of 93 metabolites for hPDI and a panel of 75 metabolites for uPDI were selected by the elastic net regression (Fig. 2b). The metabolite profile score based on the selected metabolites was significantly correlated with each corresponding diet index (Pearson r = 0.33 for PDI, 0.41 for hPDI and 0.37 for uPDI in the training set; Pearson r = 0.35 for PDI, 0.45 for hPDI and 0.38 for uPDI in the testing set; all p values <0.001) (Fig. 2c). The correlations between each diet index and the corresponding score were consistent across the three cohorts (ESM Fig. 2). Among the selected metabolites, 24 of the 55 for PDI (ESM Fig. 3), 37 of the 93 for hPDI (ESM Fig. 4) and 26 of the 75 for uPDI (ESM Fig. 5) were consistently associated with the corresponding diet index in all participants and across training/testing sets, the three cohorts, fasting/non-fasting participants and case/non-case participants.

Seventeen metabolites were selected only in the metabolite profile for PDI (ESM Fig. 6a). Metabolites inversely correlated with PDI, such as hydroxyproline, C5 carnitine, C38:5 PE plasmalogen, C40:7 PE plasmalogen and three triacylglycerols (TAGs), were inversely associated with intake of healthy plant foods but positively associated with intake of animal foods, especially animal fats, dairy and total meat (Fig. 3). hPDI and uPDI exhibited distinct metabolite profiles (ESM Fig. 6b and 6c), and several metabolites were selected in the profiles for both hPDI and uPDI but with coefficients in opposite directions (ESM Fig. 6c). Metabolites including hippurate, trigonelline, betaine, pipecolic acid, pantothenic acid, N-acetylornithine, C22:0 LPE and C58:11 TAG were positively associated with hPDI but inversely associated with uPDI. Conversely, metabolites such as C16:1 LPC, C38:4 PC plasmalogen, N2,N2-dimethylguanosine, C54:1 TAG and myristoleic acid were inversely associated with hPDI but positively associated with uPDI. Metabolites positively associated with hPDI were also positively associated with intakes of whole grains, fruits, vegetables, legumes and tea/coffee, whereas metabolites positively associated with uPDI were positively associated with intakes of potatoes, sugar-sweetened beverages and sweets/desserts (Fig. 3).

Fig. 3

Associations of the 142 metabolites selected in the metabolite profiles for three plant-based diet indices with food groups. Associations with food groups were examined among 10,684 participants with available metabolomics and dietary intake data. Presented from left to right are the metabolites’ coefficients obtained from elastic net regression (weights in the calculation of the metabolite profile score) and their associations with food groups. Coefficients for associations with food groups indicate the SD change in metabolite level per SD increment in dietary intake. Colours denote the direction (red, positive; blue, inverse) and magnitude (the darker the colour, the stronger the association) of associations; asterisks in coloured cells denote statistical significance (* p<0.05; ** Bonferroni-corrected p<0.05). aLevel 2 identification defined by the Metabolomics Standards Initiative

Associations of metabolite profile score with the risk of type 2 diabetes

In the prospective analyses of diet indices and type 2 diabetes risk, PDI and hPDI were inversely associated with risk after adjustment for BMI and other potential confounders (HRs per 1 SD increment, here and below: 0.88 [95% CI 0.81, 0.95] for PDI and 0.85 [95% CI 0.79, 0.93] for hPDI), whereas no significant association was observed for uPDI (HR 1.07 [95% CI 0.99, 1.16]). The metabolite profile scores for PDI and hPDI were also inversely associated with type 2 diabetes incidence after multivariable adjustment (HR 0.83 [95% CI 0.76, 0.90] for the PDI score and 0.80 [95% CI 0.73, 0.87] for the hPDI score). No association was observed for the metabolite profile score for uPDI (HR 1.03 [95% CI 0.95, 1.11]) (Table 3). The inverse associations of the metabolite profile scores for PDI and hPDI with type 2 diabetes risk were only slightly attenuated and remained statistically significant after further adjustment for the corresponding diet indices (Table 3). In contrast, the inverse associations for PDI and hPDI were markedly weakened and became statistically non-significant after adjustment for the corresponding metabolite profile score (Table 3). The metabolite profile score for PDI explained 47.8% of the PDI-type 2 diabetes association (Fig. 4a), and the metabolite profile score for hPDI explained 51.0% of the hPDI-type 2 diabetes association (Fig. 4b).

Table 3 Associations of plant-based diet indices and metabolite profile scores with type 2 diabetes risk
Fig. 4

Associations of PDI (a) and hPDI (b) with the risk of type 2 diabetes, adjusting for intermediate metabolites. All models were stratified by study cohorts, original sub-studies, and the case–control status in the original sub-study, and adjusted for age in months (continuous), fasting status (yes or no), BMI (continuous), race (white or non-white), family history of diabetes (yes or no), multivitamin use (yes or no), smoking status (never, past or current), physical activity (continuous), hypertension (yes or no), antihypertensive medication use (yes or no), hypercholesterolaemia (yes or no), lipid-lowering medication use (yes or no), total energy intake (continuous), alcohol intake (continuous) and margarine intake (continuous). aLevel 2 identification defined by the Metabolomics Standards Initiative. GABA, γ-aminobutyric acid

Metabolites mediating the associations of plant-based diets with type 2 diabetes

Following the four predefined criteria, we identified eight potential intermediate metabolites of the PDI-type 2 diabetes association (ESM Fig. 7a). The metabolites γ-aminobutyric acid (GABA), C5 carnitine and three TAGs (C48:0 TAG, C51:0 TAG and C52:0 TAG) were inversely associated with PDI but positively associated with type 2 diabetes, whereas trigonelline, betaine and glycine were positively associated with PDI but inversely associated with type 2 diabetes (ESM Table 3). The proportion of the PDI-type 2 diabetes association explained by individual metabolites ranged from 8.5% to 37.2% (Fig. 4a). Similarly, 11 metabolites were identified as potential intermediates of the hPDI-type 2 diabetes association (ESM Fig. 7b). Isoleucine, C22:0 Ceramide (d18:1) and six TAGs (C49:1 TAG, C49:2 TAG, C50:3 TAG, C51:1 TAG, C51:3 TAG and C54:1 TAG) were inversely associated with hPDI but positively associated with type 2 diabetes. In contrast, trigonelline, hippurate and C22:6 CE were positively associated with hPDI but inversely associated with type 2 diabetes (ESM Table 4). The proportion of the hPDI-type 2 diabetes association explained by these metabolites ranged from 10.6% to 33.9% (Fig. 4b).

Discussion

Leveraging metabolomics data from 10,684 participants, we identified multi-metabolite profiles associated with plant-based diets, showing differential profiles for healthy vs unhealthy plant-based diets. The metabolite profile scores for both the overall plant-based diet and a healthy plant-based diet were inversely associated with incident type 2 diabetes, independent of BMI, the corresponding diet index, and other diabetes risk factors among a generally healthy population. We further identified several potential intermediate metabolites (trigonelline, hippurate, isoleucine and a subset of TAGs) of the association of overall and healthy plant-based diets with type 2 diabetes risk.

Plant-based diets, by definition, are higher in plant foods and lower in animal foods, and this is reflected in our metabolomics readouts. For example, hydroxyproline, an amino acid highly abundant in meat and negligible in many plant foods [35], was inversely correlated with PDI and positively associated with total meat intake. However, a general plant-based diet can also include unhealthy plant foods, such as refined grains, potatoes and sugar-sweetened beverages, which are associated with higher cardiometabolic risk [36, 37]. An overall plant-based dietary pattern therefore does not appear sufficient to differentiate diet quality. Accordingly, hPDI and uPDI were correlated with distinct patterns of metabolites: of the 93 metabolites selected in the profile for hPDI, 50 were unique to hPDI, and 32 of the 75 metabolites selected for uPDI were unique to uPDI. hPDI assigns higher scores to diets rich in whole grains, fruits, vegetables, nuts, legumes, vegetable oils and tea/coffee, whereas uPDI assigns higher scores to diets low in these foods. Thus, metabolites related to coffee intake, such as trigonelline and hippurate [38], metabolites related to legume intake, such as pipecolic acid [39], and N-acetylornithine, a potential novel marker of vegetable intake [6], were positively correlated with hPDI but inversely correlated with uPDI.

Consistent with our previous findings for the diet indices PDI and hPDI [5], the metabolite profile scores for PDI and hPDI were also inversely associated with the risk of type 2 diabetes. Although our previous study observed a higher type 2 diabetes risk among participants with a higher uPDI than among those with a lower uPDI, we did not observe an association between the metabolite profile score for uPDI and type 2 diabetes risk. One explanation is that the association between uPDI and type 2 diabetes risk was weaker than those for PDI and hPDI [5]. Because the present study had a smaller sample size than our previous study, which included participants from the whole cohorts, the association between uPDI and type 2 diabetes risk was not significant here. It is also possible that our metabolomics platforms did not adequately capture the deleterious metabolic effects of unhealthy plant foods, such as high-glycaemic starchy foods and sugary beverages.

Our metabolomics results depict the links between healthy plant foods, potential intermediate metabolites, and subsequent risk of type 2 diabetes (Fig. 5). The attenuation of the association after adjustment for these metabolites suggests their potential mediating role in linking plant-based diets to type 2 diabetes risk. Consumption of healthy plant foods, especially nuts, was associated with lower levels of TAGs with ≤56 carbon atoms and ≤3 double bonds. These TAGs have been associated with an increased risk of type 2 diabetes [40, 41]. Physiologic and pharmacologic experiments also showed that levels of these TAGs decreased in response to insulin action and were elevated in the setting of insulin resistance [41]. Trigonelline and hippurate were two other intermediate metabolites. Trigonelline is a phytochemical found in coffee [42], and its favourable effects on insulin resistance have been reported in animal studies [43]. Hippurate is typically increased after consuming phenolic compounds [44], and an increased level of hippurate was associated with better glycaemic control and enhanced insulin secretion [45, 46]. An association between reduced level of hippurate and higher risk of type 2 diabetes has been reported as well [47]. Another notable intermediate metabolite was the branched-chain amino acid isoleucine, which was inversely correlated with tea/coffee but positively associated with the risk of type 2 diabetes. Branched-chain amino acids are consistently associated with increased insulin resistance [48], and multiple mechanisms have been proposed to explain how the elevations of these amino acids promote insulin resistance [49].
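The TAG subset above is defined by two numbers that can be read directly from the lipid names used in this paper (e.g. "C52:0 TAG"). The sketch below assumes that the "C<carbons>:<double bonds> <class>" notation gives the total acyl-chain carbons and double bonds, and that the class is the first word after the numbers. The regular expression and function names are hypothetical.

```python
import re

# Matches names such as "C52:0 TAG" or "C38:5 PE plasmalogen":
# total carbons, total double bonds, then the lipid class
LIPID_NAME = re.compile(r"^C(\d+):(\d+) (\S+)")


def parse_lipid(name):
    match = LIPID_NAME.match(name)
    if match is None:
        raise ValueError(f"not of the form 'C<carbons>:<double bonds> <class>': {name}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def is_low_carbon_low_db_tag(name, max_carbons=56, max_double_bonds=3):
    """True for TAGs with <= max_carbons carbons and <= max_double_bonds double bonds."""
    carbons, double_bonds, lipid_class = parse_lipid(name)
    return lipid_class == "TAG" and carbons <= max_carbons and double_bonds <= max_double_bonds
```

By this rule, every TAG listed among the intermediate metabolites (from C48:0 to C54:1 TAG) falls inside the subset. C58:11 TAG, which was positively associated with hPDI, falls outside it.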

Fig. 5

Association between healthy plant foods, potential intermediate metabolites and type 2 diabetes risk. Associations between healthy plant foods and intermediate metabolites were examined by multivariable linear regression. Associations between the intermediate metabolites and type 2 diabetes risk were examined by Cox regression. Only associations significant at Bonferroni-corrected p<0.05 are shown. The lines between healthy plant foods, metabolites and type 2 diabetes represent regression coefficients; line thickness indicates the strength of the association and colour indicates its direction (red, positive; blue, negative). aLevel 2 identification defined by the Metabolomics Standards Initiative. GABA, γ-aminobutyric acid

The main strengths of our study include the prospective examination of metabolomics in relation to incident type 2 diabetes, the large sample size, and the use of the mean of two validated FFQs to minimise within-person variability in diet. In addition, we used elastic net regression, which performs well with high-dimensional data in which predictors are highly correlated [50]. Nevertheless, several limitations should be considered. First, the metabolomics platform used in the present study could not measure all established biomarkers of food intake; more comprehensive approaches [51] will help in understanding the metabolomic link between diet and health. Second, among the metabolites selected in the metabolite profiles, it is difficult to distinguish those derived directly from the diet from those reflecting metabolic responses to dietary intake or other metabolic influences. Future studies, especially human feeding trials, could benefit from measuring both the food metabolome and the human metabolome [12, 13]. Third, our findings regarding the intermediate metabolites are hypothesis-generating; we cannot confirm their causal role in the associations of plant-based diets with type 2 diabetes risk. In addition, blood samples for metabolomics measurement were collected at only one time point. Long-term repeated metabolomics data are needed to understand how dietary changes relate to changes in the metabolome and, in turn, to type 2 diabetes risk. Furthermore, for quality control we excluded metabolites detected in fewer than 75% of samples, so we may have omitted some metabolites related to less commonly consumed foods. Finally, further studies are needed to evaluate external validity and to replicate our findings in other racial and ethnic groups.

In conclusion, we identified multi-metabolite profiles of plant-based diets. Scores based on identified metabolites for an overall plant-based diet and a healthy plant-based diet showed an inverse association with the risk of type 2 diabetes among a generally healthy population. The large attenuation of inverse associations between plant-based diets and type 2 diabetes after adjustment for trigonelline, hippurate, isoleucine, a small set of TAGs, and several other intermediate metabolites suggests that these metabolites could be further investigated to provide mechanistic explanations for the benefits of plant-based diets on diabetes risk.