Introduction

Over the past decades, nutritional epidemiology has generated a large body of evidence indicating that maternal nutrition plays a critical role in fetal growth1,2,3,4 and pregnancy outcome5. Research on the phenomenon termed “fetal programming” and the theory of “fetal origins of adult disease” initially focused on the impact of maternal overnutrition or undernutrition on growth potential in utero1,2,3,4. However, emerging data6,7,8,9 indicate that even small alterations in dietary quality or quantity may be associated with significant shifts in the fetal environment, which are probably related to increased vulnerability to chronic diseases in adult life.

Amniotic fluid (AF) provides a rational compartment for studies on fetal nutrition and metabolism, since its composition reflects both maternal health and fetal status10,11,12,13. Scientific evidence10 indicates that this biofluid is a complex and dynamic milieu containing nutrients essential for fetal growth. Its composition changes as pregnancy progresses and, in the second half of gestation, its content and volume are affected by several factors, including fetal urination and swallowing, as well as fetal skin keratinisation10.

The potential effect of maternal diet on the nutrient composition of AF has been demonstrated in animal studies14,15,16,17,18,19. In particular, pregnant rats exhibited a significant increase in AF glucose and a decrease in uric acid as the level of carbohydrate in the maternal diet increased14. Conversely, maternal dietary glucose restriction in rats resulted in a reduction of AF methionine and phenylalanine15. Similarly, maternal nutrient restriction in ewes markedly reduced total amino acid and polyamine concentrations in AF16, while a “famine diet” in rats also influenced AF composition17. Furthermore, Friesen and Innis18 demonstrated that maternal fat intake alters AF and fetal intestinal membrane essential n-6 and n-3 fatty acids in rats. In addition, in a very recent study in sows19, chitosan oligosaccharide supplementation induced modifications of the AF metabolic profile.

To the best of our knowledge, published data on the effect of maternal nutrition on human AF composition are limited to the study by Felig et al.20, which reported changes in AF after 84–90 hours of fasting. The effect of maternal habitual diet on the composition of human AF has not yet been explored.

Metabolomics is a bio-analytical approach that allows the identification of a large number of metabolites in biological matrices, essentially reflecting biological processes of the organism21,22,23. In prenatal medicine, metabolomics of human blood, urine, and AF have been used for the evaluation/prognostication of fetal malformations, preterm delivery, and other pregnancy complications12,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40. Furthermore, within the same research field, Wan et al.19 reported that AF metabolomics provides novel insights into the diet-regulated fetal survival and growth in a pig model study.

Hence, the challenge for us was to explore whether maternal habitual dietary patterns influence the composition of human AF. To accomplish this task, we used a validated food-frequency questionnaire41 and applied 1H NMR-based metabolomics. It is of interest to note that, although amniocentesis is an invasive procedure employed only under specific indications, the realization that AF content may be influenced by maternal diet would advance the knowledge on the importance of maternal nutrition during pregnancy.

Results

Identification of dietary clusters

Sixty-five women were included in the present study, as shown in the flow diagram (Fig. 1). Two interpretable and statistically significant (upper tail rule: t = 39.85, df = 63, P < 0.001) dietary patterns were identified through Hierarchical Cluster Analysis (HCA). Thirty-three women were grouped in cluster 1 (C1), while 32 in cluster 2 (C2). The Discriminant Analysis indicated good classification ability of the selected cluster solution, since the agreement between actual and predicted cluster allocation was 93.8%. C1 and C2 differed (P < 0.05) in the percentages of energy contributed by 10 out of the 20 predefined food groups (Table 1). C1 had higher intakes of refined cereals, yellow cheese, red meat, poultry, and “ready-to-eat” foods (P < 0.05). The macro- and micro- nutrient intakes, as well as selected dietary indices, reflecting these dietary preferences, are given in Table 2. As indicated, C1 had significantly higher energy contributions from total protein, animal protein, and saturated fatty acids. Additionally, the intake of heme iron was elevated compared to that of C2 (P < 0.05). Dietary glycaemic index (GI) was also higher in C1 (P < 0.05). Instead, C2 was characterized by significantly higher percentages of energy derived from plant protein, monounsaturated and polyunsaturated fatty acids (P < 0.05) (Table 2). These differences in energy generating nutrients, in combination with the higher intake of fibre, folate, vitamin C, vitamin E, magnesium, potassium, and non heme iron (P < 0.05) in C2 (Table 2) may ensue from the significantly higher energy contributions from whole cereals, vegetables, fruits, legumes, and nuts (Table 1).

Figure 1

Flow diagram of the study.

Table 1 Percentages of energy contribution of food groups between the two dietary clusters, cluster 1 (C1) and cluster 2 (C2). *P-value < 0.05 represents significant differences in mean values according to the results of t-test or Mann-Whitney test indicated by †. SD: standard deviation.
Table 2 Nutritional profile (energy, macro- and micro- nutrient intakes, dietary indices) of the two dietary clusters, cluster 1 (C1) and cluster 2 (C2). *P-value < 0.05 represents significant differences in mean values according to the results of t-test. SD: standard deviation; SFA: Saturated fatty acids; MUFA: Monounsaturated fatty acids; PUFA: Polyunsaturated fatty acids.

The demographic/anthropometric and clinical characteristics of the two dietary clusters are presented in Table 3; a difference approaching, but not reaching, the predetermined significance level (P < 0.05) was recorded for ponderal index (P = 0.076).

Table 3 Demographic/anthropometric and clinical characteristics of the 65 participants and their offspring for the two dietary clusters, cluster 1 (C1) and cluster 2 (C2). *P-value < 0.05 represents significant differences in mean values according to the results of t-test. SD: standard deviation; BMI: body mass index.

Analysis of 1H NMR Spectroscopic Data

Typical standard 1H NMR spectra of human AF, urine, and serum with annotations on the identified metabolites are depicted in Fig. 2, Supplementary Figs S1 and S2, respectively. Principal Component Analysis (PCA) was implemented to provide an overview on the samples’ clustering (Fig. 3 for AF, Supplementary Fig. S3 for urine and serum). Interestingly, a clear trend for clustering of the samples was observed along the first component, which explained 58.1% of the metabolic variance in AF, 60% in urine, and 48% in serum. This clustering indicates that these unsupervised models highlighted metabolic differences in relation to the dietary patterns.

Figure 2

1H NMR spectra of amniotic fluid sample with annotation on the identified metabolites. 1:valine; 2:leucine; 3:isoleucine; 4:isobutyrate; 5:2-hydroxy-3-methylbutyrate; 6:2-hydroxybutyrate; 7:lactate; 8:3-hydroxybutyrate; 9:alanine; 10:lysine; 11:arginine; 12:acetate; 13:acetone; 14:acetoacetate; 15:glutamine; 16:glutamate; 17:citrate; 18:methylamine; 19:aspartate; 20:dimethylamine; 21:creatine; 22:creatinine; 23:choline; 24:phosphocholine; 25:betaine; 26:methanol; 27:α-D-glucose; 28:β-D-glucose; 29:glycine; 30:glycerol; 31:myo-inositol; 32:threonine; 33:tyrosine; 34:histidine; 35:phenylalanine; 36:formate.

Figure 3

PCA model of amniotic fluid samples. A = 5; N = 58; R2(cum) = 0.82; Q2(cum) = 0.70. Cluster 1 (C1) Cluster 2 (C2).

We then embedded the class information from the dietary clusters into Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) models in order to pinpoint the metabolites responsible for the discrimination. The extracted OPLS-DA models correctly classified 89% of the AF samples, 79% of the urine samples, and 83% of the serum samples.

For AF, the discrimination between the two clusters was evident along the first component (Fig. 4a), and the key metabolites that exhibited a strong correlation with C1, as depicted in the S-line plot, are presented in Fig. 4b. We extracted Receiver Operating Characteristic (ROC) curves for each metabolite in order to elucidate the markers that express the impact of habitual diet between the two clusters and to avoid false selection. Glucose, alanine, tyrosine, valine, citrate, cis-aconitate, and formate exhibited an area under the ROC curve (AUROC) > 0.7 (Table 4). These should be considered the most fitting markers of habitual diet in the AF samples; their trends are presented as box plots in Supplementary Fig. S4. Metabolites exhibiting 0.5 < AUROC < 0.7 and AUROC < 0.5 are also presented in Table 4.
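The AUROC screening described above can be illustrated with a minimal sketch. It assumes that the AUROC of a single metabolite is computed directly from its levels in the two clusters, as the probability that a randomly chosen C1 sample has a higher level than a randomly chosen C2 sample (ties counted as one half). The function names and the intensity values are hypothetical and serve illustration only; they are not the study data or the software used by the authors.

```python
from typing import Dict, Sequence


def auroc(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Probability that a random value from group_a exceeds one from group_b.

    Ties count as one half. This equals the area under the ROC curve when
    the metabolite level is used as the score for membership of group_a.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def select_markers(
    levels_c1: Dict[str, Sequence[float]],
    levels_c2: Dict[str, Sequence[float]],
    threshold: float = 0.7,
) -> Dict[str, float]:
    """Return the metabolites whose AUROC (C1 versus C2) exceeds the threshold."""
    selected = {}
    for name in levels_c1:
        score = auroc(levels_c1[name], levels_c2[name])
        if score > threshold:
            selected[name] = score
    return selected


# Hypothetical, made-up intensities (arbitrary units), not study data.
c1 = {"glucose": [5.1, 5.4, 5.0, 5.6], "acetate": [1.0, 1.2, 0.9, 1.1]}
c2 = {"glucose": [4.6, 5.2, 4.8, 4.9], "acetate": [1.1, 1.0, 1.2, 0.9]}
markers = select_markers(c1, c2)
print(markers)
```

In this toy example only the hypothetical glucose signal passes the 0.7 cut-off, whereas the acetate signal sits at 0.5, the value expected when the two clusters are indistinguishable.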

Figure 4

(a) OPLS-DA model of amniotic fluid samples. A = 1 + 1; N = 54; R2X(cum) = 0.66; R2Y(cum) = 0.76; Q2(cum) = 0.64. Cluster 1 (C1) Cluster 2 (C2). (b) S-Line plot (1:valine; 2:alanine; 3:acetate; 4:citrate; 5:glutamine; 6:cis-aconitate; 7:glucose; 8:tyrosine; 9:formate, higher in C1).

Table 4 List of metabolite changes in amniotic fluid, maternal urine, and maternal serum corresponding to the two dietary clusters, cluster 1 (C1) and cluster 2 (C2). δ (1H shift) ppm corresponds to signals used for integration; s: singlet; d: doublet; t: triplet; dd: doublet of doublets; m: multiplet; AUROC: Area under the curve of the receiver operating characteristic.

For urine samples, the clear separation along the first component (Supplementary Fig. S5a), based on the corresponding S-line plot (Supplementary Fig. S5b), was attributed to the metabolites presented in Table 4.

For maternal serum, the extracted OPLS-DA model (Supplementary Fig. S5c) clearly discriminated the samples along the first component and indicated that the samples belonging to C2 were characterized by higher levels of lipoproteins, as depicted in the corresponding S-line plot (Supplementary Fig. S5d). These lipoproteins are presented in Table 4.

Finally, the use of validation steps (P < 0.05, Permutation testing, and ROC curves) confirmed that the results of all OPLS-DA models for each substrate were unbiased and reliable as described in the Supplementary Figs S6, S7, and S8.

Metabolite pathway analysis

After feature selection, metabolites exhibiting AUROC > 0.7 in AF samples were subjected to pathway analysis in order to relate the observed metabolic patterns to the most relevant pathways. The result of the pathway analysis for AF samples is depicted in Supplementary Fig. S9. Specifically, the pathways of importance containing at least 2 compounds are aminoacyl-tRNA biosynthesis and the citric acid cycle.

Metabolite Set Enrichment Analysis (MSEA), using Metaboanalyst 3.0 (ref. 42), was performed for the metabolites in AF exhibiting AUROC > 0.7. MSEA tests whether the members of a predefined metabolite set are represented among these metabolites more often than expected by chance, in an attempt to identify biologically meaningful patterns. The results pointed to protein biosynthesis as the only statistically significant pathway (P < 0.05) (Supplementary Fig. S10).

Discussion

The present study is the first report attempting to probe the effects of maternal habitual diet on human AF composition, suggesting that the nutritional environment of AF is sensitive to female diet in the 2nd trimester of pregnancy. The metabolic modifications in AF induced by different maternal dietary habits could be linked to amino acid metabolism, glucose metabolism, and citric acid cycle.

A detailed comparative analysis of our results against published literature is not feasible due to the limited data available in this area. To the best of our knowledge, there is only one relevant study in humans by Felig et al.20, where paralleling changes were reported in maternal plasma and AF, i.e. increase in branched-chain amino acids and decrease in alanine levels, after 84–90 hours of fasting, at 16–22 weeks of gestation. Evidence from animal models indicates, also, that maternal diet can affect the complex nutrient matrix of AF14,15,16,17,18,19.

To facilitate the interpretation of our results, perturbed metabolites identified in AF, as well as their associated metabolic pathways, are depicted in Fig. 5. As shown in Fig. 5a, higher AF glucose levels were recorded for C1, although no difference in maternal serum glucose was found between the two dietary clusters. Koski and Fergusson14 reported – in rats being in a post-absorptive (fed) state – no significant changes in maternal blood glucose concentrations, but increases in AF glucose with increases in maternal dietary carbohydrate intake levels. Considering that there was no statistically significant difference between C1 and C2 either for the time since last meal, or for carbohydrate intake, a plausible explanation for our finding may be related to the quality of carbohydrate. The latter can be linked to the higher dietary GI of C1 which may, in turn, alter the rate of glucose flux. It is important to note that glucose is the major energy substrate for fetal development and may be utilized, through conversion to other compounds, for protein synthesis and new tissue growth43. Commensurate with the higher AF glucose in C1, the higher levels of the essential amino acids histidine, phenylalanine, valine, and of the non-essential ones, alanine and tyrosine (Fig. 5a), may simply indicate either a differential rate to meet the requirements for elementary building blocks or a comparative under-use in gluconeogenesis. Furthermore, the increased levels of valine in AF of C1 may contribute to the balance between the branched chain amino acids44, known to be the major source of nitrogen for the ureogenic amino acids, alanine and glutamine45. At this point it is tempting to hypothesize that the relative increases in AF metabolites of C1 might reflect an increased energy availability ensuing from the increased fluxes of substrates, as echoed by the different combination of dietary factors characterizing this cluster. 
This is further supported by the fact that citrate is also elevated in C1, exhibiting a trend similar to that of glucose. This finding is in agreement with the results of Wan et al.19, who reported that citrate fluctuation in AF corresponds to glucose level fluctuations. The relative abundance of citrate in tandem with cis-aconitate in C1 may thus also suggest a differential management of the metabolic pool, since the citric acid cycle may also provide building blocks for important biomolecules (Fig. 5a). Whether these changes direct/promote a metabolic switch that affects fetal development/growth, as well as the risk of developing chronic diseases in adult life, remains an open question. Accordingly, the relatively higher AF levels of fumarate observed in C1 could be related to a distinct intermediary metabolic rate, given that fumarate is situated at an important metabolic junction and performs key physiological functions: (i) its synthesis links the urea and the citric acid cycles (Fig. 5a); (ii) fumarate is involved in the cataplerotic pathway of phenylalanine and tyrosine (Fig. 5a); (iii) fumarate is generated during purine biosynthesis (Fig. 5b), where formate (also increased in AF of C1) acts as a potential alternative single-carbon source. We dare to speculate that formate in AF may be a marker of the biological consequences of the quality of dietary intake, since it is suggested in the literature46 that formate is excreted as a secondary metabolite in the case of high GI diets. The above speculation is further supported by the fact that, in the last decade, important evidence has shown that during pregnancy, maternal gut microbiota or its metabolic products may be transferred to the fetus through the placenta47,48.

Figure 5

Schematic diagram illustrating the metabolic pathways that are possibly influenced by maternal habitual diet: (a) energy metabolism, amino acids metabolism, and urea cycle; (b) fumarate generation during purine biosynthesis; (c) choline metabolism.

With respect to pregnancy and fetal nutrition, it was of interest to explore how the habitual dietary patterns would be reflected in the metabolomic data of maternal compartments, i.e. urine and serum. As expected, 1H NMR spectra of maternal urine allowed the identification of metabolites associated with the two dietary patterns. Urine is the biofluid most frequently used to study nutrient intake49,50,51, since it is the body’s liquid waste repository21. At this point it is worth mentioning that during pregnancy the urine metabolome is also influenced by the remarkable physiological forces set in motion by conception52. Within this frame, the excretion of alanine increases rapidly in early pregnancy and continues to increase as pregnancy proceeds53. However, since the two clusters did not differ in gestational age, the increased alanine excretion in C1 could be attributed to dietary intake. Holmes et al.54 reported that urinary excretion of alanine is higher in people consuming a predominantly animal diet, proposing a direct association between excreted alanine and blood pressure. Furthermore, Bertram et al.55 and Dragsted56 ascribed the higher levels of excreted urea to higher red meat consumption and higher animal protein intake. In line with our observation, in the study conducted by O’Sullivan et al.57, higher urinary dimethylglycine and trimethylamine N-oxide (TMAO) were identified in a dietary cluster characterized by higher habitual intakes of white bread, sugars/preserves, red meat, red-meat dishes, and meat products, and a lower contribution from vegetables. Interestingly, dimethylglycine, TMAO, creatine, and creatinine, as well as choline, betaine, and formate (all increased in C1), are metabolically linked in two different pathways of “choline metabolism” (Fig. 5c): (i) choline oxidation into betaine and (ii) bacterial degradation of choline into TMAO by the gut microbiome.
Regarding formate excretion, it has been reported to be elevated in a group of adults following a high GI diet46. Moreover, the presence of bile acids in maternal urine during pregnancy has been suggested in the literature31,38; however, the higher levels of these important signalling biomolecules in C1 merit further investigation, preferably by quantitative LC-MS/MS analysis.

Regarding the maternal circulatory metabolome, it was dominated by signals from lipids and lipoproteins. Hyperlipidemia of normal pregnancy results in high blood HDL, LDL, VLDL, and triglycerides, accompanied by increases in the length of fatty acid chain and the degree of unsaturation38,58,59,60. The present study showed that this expected increase in maternal lipids was, further, promoted in women of C2, whose dietary preferences were associated with higher total lipid, monounsaturated and polyunsaturated fatty acids intakes, compared to C1. It is of interest to mention that in a very recent study61, higher blood total cholesterol levels were recorded in pregnant women following a dietary pattern characterized, among others, by higher intakes of fruits, vegetables, whole grains, and low-fat dairy. However, due to the strong influence of pre-pregnancy lipid levels and maternal hormonal status during pregnancy on lipid metabolism59,62, no clear biochemical interpretation may be advanced at least at this stage.

Our results express the potential prospects of using metabolomics in the quest for habitual diet induced metabolic signals in AF, in spite of existing limitations related to genetic background information. Furthermore, to obtain a more accurate picture of the overall metabolic changes, confounding factors, such as maternal hormonal status have to be assessed. Nevertheless, the results of the current study have to be interpreted in the light of its strengths, concerning the experimental approaches undertaken. Firstly, the fact that in the present study we analysed AF after excluding samples from pregnancies that (i) were complicated by structural malformations and/or chromosomal abnormalities of the fetus, (ii) were characterised by obstetrical or medical disorders, or (iii) ended in delivering a small or large for gestational age infant, eliminated the potential overlapping with metabolic effects attributable to these aforementioned fetal/maternal disturbances12,24,25,26,27,29,30,33,34,35,40. Moreover, the parallel analyses of the three biological specimens (obtained at the time of genetic amniocentesis) provide complementary information of fetal metabolism, through AF analysis, and maternal metabolism, by the excretive and circulating characteristics of the mother. The great advantage of using untargeted metabolomics is that all metabolites (those present in detectable concentrations) are measured simultaneously. Thus, metabolic profiling of AF, as well as of maternal urine and serum, in conjunction with detailed recording of the maternal complex dietary preference background, does offer a more holistic approach that leads to a better description of the metabolic trajectory of the fetus, with respect to maternal nutrition.

In conclusion, our data provide the first evidence to suggest that maternal habitual dietary patterns influence the metabolic profile of human AF. Notably, very recently, Kermack et al.63 reported that differences in women’s diet quality can alter the amino acid concentration of human uterine fluid. Taken together, these results highlight the need to raise nutritional awareness and provide a framework for further research on the effect of maternal nutrition on pregnancy evolution and outcome, using a combination of biological matrices and analytical platforms.

Methods

Study population

The present study was part of the Embryometabolomics project64. Women in the second trimester of pregnancy were invited to participate in the Embryometabolomics project while visiting the 1st Department of Obstetrics and Gynecology, Papageorgiou General Hospital, Thessaloniki, Greece, to undergo amniocentesis for prenatal diagnosis. Indications for amniocentesis included maternal age, ultrasound markers, family history of genetic disorders, previous fetal aneuploidy, and maternal anxiety. Women were informed about the objectives of the Embryometabolomics project and gave signed consent; those who agreed to participate completed a structured interview concerning maternal demographic/anthropometric characteristics, while the respective AF samples were stored at −80 °C until further analysis.

The methodological strategy of the present study is depicted in Fig. 1. Among the women enrolled in the Embryometabolomics project, dietary information was available for 72 (Fig. 1), who were therefore recruited for the present study. Of these, 65 were finally included, as they met the following criteria: (a) singleton pregnancy, (b) absence of structural malformations and/or chromosomal abnormalities of the fetus, (c) delivery of an appropriate for gestational age infant (birth weight between the 10th and 90th centile), (d) absence of obstetrical or medical complications, such as preeclampsia or gestational diabetes mellitus, and (e) dietary energy intake within the allowable range for pregnant women65,66.

Ethical approval was obtained from the Bioethics Committee of the Medical School of the Aristotle University in Thessaloniki, Greece (A19479–26/2/08). All methods were performed in accordance with the relevant guidelines and regulations.

Biofluid collection

All biological specimens were collected under non-fasting conditions, due to medical restrictions in controlling/limiting pregnant women’s diet. AF specimens were retrieved using a 20 G spinal needle under ultrasound guidance. Blood samples were collected, allowed to clot, and centrifuged at 3500 g for 5 min; serum was, then, aliquoted. Spot urine samples were collected in sterile containers. Biofluids were stored at −80 °C until further preparation and analysis.

Dietary assessment

Dietary assessment was carried out using a semi-quantitative Food Frequency Questionnaire (FFQ) validated for pregnant women41. All dietary information was collected prior to the antenatal appointment via personal interview by a registered dietician or a well-trained interviewer (food scientist-nutritionist). Women’s responses were converted into dietary data using the Microsoft Excel database described by Athanasiadou et al.41.

Statistical Analyses for identification of dietary patterns

HCA67,68 was used to identify groups of women consuming a similar dietary pattern. Prior to cluster analysis, the individual food items were categorized into 20 predefined food groups – as shown in Table 1 – based on similarities in their nutrient profiles and culinary usage/parameters with potential relevance to food culture69,70,71,72,73,74,75,76,77,78,79.

For entry into the cluster analysis, the percentage of energy contributed by each of the 20 food groups was selected as the input variable. Cluster construction was based on Ward’s minimum variance criterion80, while the squared Euclidean distance was used as the dissimilarity measure67. The food-group data were transformed into standardized z scores before clustering, so that they had equal weights when distances were computed72. The theoretical background for adopting the above-mentioned methodological scheme for HCA is reported by Taxidis et al.81.
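The clustering scheme described above (z-score standardization followed by agglomeration under Ward's minimum variance criterion with squared Euclidean distances) can be sketched as follows. This is a naive, standard-library illustration and not the SPSS/Clustan procedure used in the study; the function names, the use of the sample standard deviation for the z scores, and the six rows of percentage-of-energy values are assumptions made for the example.

```python
from itertools import combinations
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample SD 1 (assumed convention)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["size"] * b["size"] / (a["size"] + b["size"]) * sq_dist


def ward_clusters(rows, n_clusters):
    """Merge the cheapest pair (Ward criterion) until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [
        {"members": [i], "size": 1, "centroid": list(row)}
        for i, row in enumerate(rows)
    ]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        merged = {
            "members": a["members"] + b["members"],
            "size": size,
            # Size-weighted mean of the two centroids.
            "centroid": [
                (a["size"] * x + b["size"] * y) / size
                for x, y in zip(a["centroid"], b["centroid"])
            ],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c["members"]) for c in clusters)


# Hypothetical % of energy from three food groups for six women (not study data).
energy_pct = [
    [30, 5, 10], [28, 6, 12], [32, 4, 9],
    [12, 15, 2], [10, 17, 3], [14, 14, 1],
]
standardized = zscore_columns(energy_pct)
groups = ward_clusters(standardized, n_clusters=2)
print(groups)
```

Standardizing first prevents food groups with large energy contributions from dominating the distances, which is the rationale given in the text for the z-score transformation.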

Runs of cluster formation were performed to establish the best cluster configuration. Criteria for cluster solutions were nutritional meaningfulness and a reasonable sample size. The solution was confirmed by the tree diagram resulting from the Ward method of cluster analysis. Furthermore, Discriminant Analysis was carried out to examine the classification ability of the cluster solution82. The statistical significance of the final cluster solution was evaluated with the upper tail rule, using Clustan ver. 5.27 (ref. 83).

In order to compare normally and non-normally distributed parameters between the clusters, Student’s t test for independent samples and the Mann-Whitney test were used, respectively. For the Mann-Whitney test, the observed significance level (P-value) was computed with the Monte-Carlo simulation method84 utilizing 10,000 random samples. All statistical analyses were performed with SPSS v.15.0 (SPSS Inc., Chicago, IL). The significance level was predetermined at P < 0.05.
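A Monte-Carlo estimate of the Mann-Whitney significance level can be sketched as below. The sketch assumes a two-sided test in which group labels are randomly reassigned 10,000 times and the P-value is the fraction of reassignments whose U statistic lies at least as far from its expected value as the observed one. The exact resampling scheme implemented in SPSS may differ; the function names, the seed, and the example values are hypothetical.

```python
import random


def mann_whitney_u(x, y):
    """U statistic for x: number of pairs where x exceeds y, ties count 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def monte_carlo_p(x, y, n_samples=10000, seed=0):
    """Two-sided Monte-Carlo P-value from random relabelling of the pooled data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    expected = len(x) * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - expected)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)
        u = mann_whitney_u(pooled[: len(x)], pooled[len(x):])
        if abs(u - expected) >= observed:
            hits += 1
    return hits / n_samples


# Hypothetical values, not study data.
p_separated = monte_carlo_p([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12])
p_overlapping = monte_carlo_p([1, 3, 5, 7], [2, 4, 6, 8])
print(p_separated, p_overlapping)
```

With completely separated groups the estimated P-value falls well below 0.05, whereas interleaved groups give a large P-value, as expected.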

NMR spectroscopy

Sample preparation

All NMR spectra were acquired at 25 °C on a Varian 600 MHz NMR spectrometer equipped with a triple resonance {HCN} probe. The Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence was applied to AF, urine, and serum samples, with 128 transients collected into 64 K data points. The samples were thawed at room temperature 60 min before performing the NMR experiments.

AF: 400 μL D2O and 150 μL phosphate buffer in D2O were added to lyophilized samples. After centrifugation (4500 g, 15 °C, 5 min), 50 μL sodium maleate was added as internal standard to 500 μL of the supernatant and the sample was transferred to 5 mm NMR tubes.

Urine: Samples were prepared by adding 150 μL phosphate buffer in D2O to 400 μL urine. After centrifugation (10000 g, 4 °C, 10 min), 50 μL sodium trimethylsilyl propionate (TSP) was added as internal standard to 500 μL of the supernatant and transferred to 5 mm NMR tubes.

Serum: Samples were prepared by adding 140 μL phosphate buffer in D2O to 400 μL serum. After centrifugation (10000 g, 4 °C, 10 min), 50 μL sodium maleate was added as internal standard to 500 μL of the supernatant and transferred to 5 mm NMR tubes.

Sodium maleate was chosen as reference standard for serum and AF since it is suitable for CPMG pulse sequence and provides a distinct peak in the 1H NMR spectrum85. Relaxation delay was set to 6 s. Proton spectra were referenced at the resonance peak of sodium maleate (5.95 ppm). Receiver Gain was kept constant for all acquisitions.
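Chemical-shift referencing to the maleate resonance can be illustrated with a short sketch. It assumes that the reference signal is the tallest point within a search window around 5.95 ppm and that the whole axis is shifted by a constant offset; the window width, the function name, and the toy spectrum are hypothetical, and the processing software used in the study may locate the peak differently.

```python
def reference_to_peak(ppm, intensity, target_ppm=5.95, window=0.2):
    """Shift the ppm axis so the tallest point near target_ppm lies on it.

    `window` (in ppm) is an assumed search half-width, not a study parameter.
    """
    candidates = [i for i, p in enumerate(ppm) if abs(p - target_ppm) <= window]
    if not candidates:
        raise ValueError("no data points within the search window")
    peak = max(candidates, key=lambda i: intensity[i])
    offset = target_ppm - ppm[peak]
    return [p + offset for p in ppm]


# Toy spectrum whose reference peak sits 0.05 ppm too far downfield.
ppm_axis = [6.10, 6.05, 6.00, 5.95, 5.90]
signal = [0.1, 0.2, 3.0, 0.3, 0.1]
referenced = reference_to_peak(ppm_axis, signal)
print(referenced)
```

After referencing, the tallest point of the toy spectrum is placed at 5.95 ppm and every other point moves by the same offset.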

A series of 2D experiments, gCOSY, zTOCSY, gHMBCad, gHSQCad were recorded at 25 °C and permitted the assignment of metabolites. The acquisition parameters for 2D NMR experiments are described in the Supplementary Material. The interpretation of 2D spectra was performed with the use of MestReNova v.10.1 software. The identification procedure was also assisted by literature data12,24,28,31,38, a reference metabolite 1H NMR database (Chenomx NMR Suite 7.0) and an in-house fully automated metabolite identification platform86.

All 1H NMR spectra were phase and baseline corrected.

Data reduction and spectral alignment

The 1H NMR spectra were reduced into buckets of 0.0001 ppm and the D2O (4.6–4.8 ppm) region was removed. The spectra were aligned, normalized to the standardized area of the reference compound and converted to ASCII format using the Mnova processing template.
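The data-reduction steps above can be sketched in a few lines: points in the excluded 4.6–4.8 ppm region are dropped, the remaining intensities are summed into fixed-width buckets, and every bucket is divided by the area of the reference signal. The sketch is only an approximation of the Mnova template used in the study. The reference integration window (5.90–6.00 ppm), the function name, and the toy data are assumptions, and a bucket width of 0.1 ppm is used in the example purely for readability.

```python
import math
from collections import defaultdict


def bucket_and_normalize(ppm, intensity, width,
                         exclude=(4.6, 4.8), reference=(5.90, 6.00)):
    """Bucket a spectrum and scale it by the reference-peak area.

    `reference` is an assumed integration window around the maleate signal.
    Returns a dict mapping the lower edge of each bucket to its summed,
    normalized intensity.
    """
    ref_area = sum(
        v for p, v in zip(ppm, intensity) if reference[0] <= p <= reference[1]
    )
    if ref_area <= 0:
        raise ValueError("reference peak area must be positive")
    buckets = defaultdict(float)
    for p, v in zip(ppm, intensity):
        if exclude[0] <= p <= exclude[1]:
            continue  # excluded solvent region
        index = math.floor(p / width + 1e-9)  # small guard against float error
        buckets[index] += v / ref_area
    return {round(i * width, 6): total for i, total in sorted(buckets.items())}


# Toy spectrum (hypothetical values).
ppm_axis = [1.32, 1.34, 4.70, 5.95, 5.96]
signal = [2.0, 4.0, 50.0, 1.0, 1.0]
reduced = bucket_and_normalize(ppm_axis, signal, width=0.1)
print(reduced)
```

Normalizing to the reference area makes bucket values comparable across samples, provided that the same amount of internal standard was added to each sample, as described in the sample preparation.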

Statistical Analyses for 1H NMR data

SIMCA-P version 14.0 (Umetrics, Umeå, Sweden) was used. The spectral data were mean-centered and Pareto scaled (Par), and the PCA and OPLS-DA models were extracted at a confidence level of 95%. The mathematical background and applications of these methods have been extensively discussed elsewhere87.
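Pareto scaling divides each mean-centered variable by the square root of its standard deviation, which reduces the dominance of intense signals without inflating noise as strongly as unit-variance scaling does. A minimal sketch follows, assuming the sample standard deviation is used; the function name and the toy matrix are hypothetical, and this is not the SIMCA-P implementation.

```python
from statistics import mean, stdev


def pareto_scale(rows):
    """Mean-center each column and divide by the square root of its SD."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    roots = [stdev(col) ** 0.5 for col in columns]
    return [
        # Constant columns (SD of zero) are mapped to 0.0 to avoid division by zero.
        [(v - m) / r if r > 0 else 0.0 for v, m, r in zip(row, means, roots)]
        for row in rows
    ]


# Rows are samples, columns are spectral buckets (hypothetical values).
spectra = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
scaled = pareto_scale(spectra)
print(scaled)
```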

The online software Metaboanalyst 3.0 was utilized42 for biomarker discovery, classification and pathway mapping. A hypergeometric test using over-representation analysis and pathway topology analysis related these metabolites to metabolic pathways.
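The over-representation test mentioned above can be written down with the hypergeometric distribution: given a reference set of N metabolites, of which K belong to a pathway, it gives the probability that a list of n selected metabolites contains at least k pathway members by chance. The sketch below is a generic illustration and not the Metaboanalyst code; the function name and all counts are hypothetical.

```python
from math import comb


def ora_p_value(total, in_pathway, selected, hits):
    """P(X >= hits) for X ~ Hypergeometric(total, in_pathway, selected)."""
    if not 0 <= hits <= min(in_pathway, selected):
        raise ValueError("hits must lie between 0 and min(in_pathway, selected)")
    upper = min(in_pathway, selected)
    favourable = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, selected)


# Hypothetical counts: 100 reference metabolites, 10 in the pathway,
# 7 selected metabolites, 2 of which fall in the pathway.
p_value = ora_p_value(total=100, in_pathway=10, selected=7, hits=2)
print(p_value)
```

A small P-value indicates that the pathway contains more of the selected metabolites than random selection from the reference set would be expected to produce.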

Identification of important Features in the OPLS-DA models

Feature selection for the OPLS-DA models was based on variable importance in projection (VIP) scores larger than 0.7 and P(corr) > 0.2, to reveal the variables that bear class-discriminating power. S-line plots were used to pinpoint the metabolites that contribute to the samples’ discrimination.
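The two thresholds stated above amount to a simple filter over the per-variable model statistics. The sketch below applies them literally, using the signed P(corr) value; whether the threshold was applied to the signed or the absolute value is not stated in the text, so this is an assumption. The function name and the VIP and P(corr) values are hypothetical.

```python
def select_features(stats, vip_min=0.7, pcorr_min=0.2):
    """Keep variables with VIP > vip_min and P(corr) > pcorr_min.

    `stats` maps a variable name to a (VIP, P(corr)) tuple. The signed
    P(corr) is compared with the threshold (an assumption of this sketch).
    """
    return [
        name
        for name, (vip, pcorr) in stats.items()
        if vip > vip_min and pcorr > pcorr_min
    ]


# Hypothetical model statistics, not values from the study.
model_stats = {
    "glucose": (1.8, 0.65),
    "acetate": (0.9, 0.10),
    "lactate": (0.4, 0.30),
}
kept = select_features(model_stats)
print(kept)
```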

Model Validation

The validation steps followed by Fotakis et al.85 were implemented in this work, as described in the Supplementary Material.

Data availability

All data generated or analyzed during this study are included in this published article (and its Supplementary Information files).