Metabolic phenotyping of 1,200 subjects from a UK population
More than 3,000 serum samples were collected at three separate UK locations across a 4-year period, applying the same standard operating procedure (SOP) at all sites. Of these, 1,200 serum samples were selected for a first pass of data acquisition. GC–MS and UPLC–MS in positive (UPLC–MS(+)) and negative (UPLC–MS(−)) ion modes were applied as complementary analytical platforms to profile a diverse range of hydrophilic and lipophilic metabolites present in the serum of 1,200 adult subjects from the UK in the age range of 19–81 years; at the time of sampling all subjects were defined as ‘healthy’, with no diagnosis of any disease. Data were acquired across 11 months in 10 different analytical experimental batches; each batch was composed of a single serum sample from each of 120 subjects and was analysed across a five-day period. Each batch included the periodic analysis of a pooled quality control (QC) sample to allow analytical variation to be measured quantitatively within and between these analytical experiments (Dunn et al. 2012). The same pooled QC sample was applied for all analytical experimental runs.
Following data pre-processing to construct a robust dataset, 126, 2,178 and 2,280 metabolite features were detected by GC–MS, UPLC–MS(+) and UPLC–MS(−), respectively; because electrospray ionisation generates multiple adducts and fragments for a single metabolite (Brown et al. 2009), more than 1,500 metabolites are estimated to have been detected. All of these metabolites were detected reproducibly across all analytical experimental batches in a single pooled QC sample analysed periodically (every 5th injection); this quantifies the variation introduced by sample preparation, data acquisition and data pre-processing. The criterion applied to define reproducible detection was a relative standard deviation (RSD) of less than 20 % for UPLC–MS and less than 30 % for GC–MS, calculated after signal correction [see Dunn et al. (2011)]. Classes of metabolites detected included amino acids (GC–MS), organic acids (GC–MS), carbohydrates (GC–MS), fatty acids (GC–MS and UPLC–MS), peptides (UPLC–MS), acyl glycerides (UPLC–MS), sphingolipids (UPLC–MS), steroids including vitamin D metabolites (UPLC–MS) and glycerophospholipids (UPLC–MS), representing a diverse set of metabolic pathways and regulatory processes. This allowed many different areas of metabolism and biological function to be investigated simultaneously, so as to identify their importance with regard to the human ‘healthy’ population phenotype. This approach is in contrast to targeted studies that focus on small segments of metabolism or just a few metabolite classes. A variety of exogenous metabolites were also detected, including drugs and their metabolic products [e.g. paracetamol (acetaminophen)]. By applying linear discriminant analysis, we concluded that no metabolic differences were observed that could be related to time differences in acquiring the analytical data (see Supplementary Fig. 1), showing for the first time that large sample sets derived from the human population could be profiled reproducibly in a metabolome-wide study via chromatography-mass spectrometry platforms over a period of 11 months. A range of standard clinical chemistry measurements was also performed for all 1,200 subjects [23 assays in total including lipids (LDLC, CHOL, HDLC, TRIG), enzyme concentrations (PLT, ALK, AST, ALT, GGT, ALP, LDH), metabolites (glucose, creatinine, urea), ions (Ca, K, Na, phosphate), blood components (WBC, RBC, HAEM, TBIL) and total protein and albumin]; this provided the ability to relate changes in these assays, which are applied in routine clinical use, to metabolic pathways and associated mechanisms. All metabolite data and associated demographic/clinical metadata are available at the publicly available metabolomics data repository MetaboLights (http://www.ebi.ac.uk/metabolights/; study identifier MTBLS97). The clinical characteristics of the cohort discussed here are provided in Table 1.
Table 1 Clinical characteristics of the cohort studied defining median and inter-quartile range
Variability in relative metabolite concentrations
The relative concentrations of metabolites were investigated to derive the cumulative variation associated with background/baseline genetic and environmental influences. The distribution of inter-subject variability [calculated as the relative standard deviation (RSD) for all 1,200 subjects following signal correction] is shown in Fig. 1. The distribution is skewed to lower RSD values; one interpretation of this is that the serum metabolome is comparatively tightly regulated in “healthy” populations (i.e., subjects with no diagnosed disease at the time of sampling). This could reasonably be expected; greater variation is observed in the human urine metabolome (Bouatra et al. 2013), a biofluid composed of metabolites that are being excreted from the body. Of course, if the inter-subject variability is equivalent to the technical variability measured by replicate analysis of the same quality control (QC) sample, then the metabolite feature contains no biological information. For GC–MS, UPLC–MS(+) and UPLC–MS(−), respectively, 7 of 126, 71 of 2,181 and 42 of 2,283 metabolic features were observed to have a ratio of inter-subject RSD to QC RSD of less than 1.5. The overwhelming majority of metabolite features reported therefore contain biological information; those metabolite features with a ratio of less than 1.5 were removed from further analyses.
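The two RSD-based filters described above (the QC reproducibility criterion and the inter-subject RSD to QC RSD ratio) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names and the peak-area values are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)

def passes_qc(qc_values, platform):
    """Reproducibility criterion from the text:
    QC RSD < 30 % for GC-MS, QC RSD < 20 % for UPLC-MS."""
    limit = 30.0 if platform == "GC-MS" else 20.0
    return rsd(qc_values) < limit

def carries_biological_information(subject_values, qc_values, min_ratio=1.5):
    """Keep a feature only if inter-subject RSD / QC RSD is at least min_ratio;
    features with a ratio below 1.5 were removed in the text."""
    return rsd(subject_values) / rsd(qc_values) >= min_ratio

# Hypothetical signal-corrected peak areas for one feature (illustrative only).
qc_areas = [100.0, 104.0, 98.0, 102.0, 96.0]
subject_areas = [60.0, 150.0, 90.0, 210.0, 120.0, 75.0]

keep = passes_qc(qc_areas, "UPLC-MS") and carries_biological_information(
    subject_areas, qc_areas
)
```

A feature whose subject values vary no more than the pooled QC replicates gives a ratio close to 1 and is discarded by the second filter.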
In this distribution (see Fig. 1), those metabolites showing a high variability between subjects describe inter-subject variability which is likely to be caused by environmental and genetic variation. Caffeine showed an RSD greater than 200 % and salicylic acid, probably derived as a result of aspirin use but possibly also via tobacco, had an RSD >800 %, whilst N-methylpyrrolidinine (used in the formulation of drug vehicles) had an RSD of 550 %; such analytes thus show variation related to consumption of pharmaceutical drugs and/or specific food components. Trehalose varied by greater than 200 %, suggesting significant variation in glucose usage and storage properties, albeit trehalose is also used as a food additive. Tetradecanoic, hexadecenoic and eicosanoic acids all had variances greater than 100 %. Oxidised longer chain fatty acids and acyl carnitines also showed higher variations—potentially a sign of oxidative stress or changes in energy production in the body. Glycerophosphoethanolamines and a small peptide (γ-glutamyl-l-isoleucine or γ-glutamyl-l-leucine) also show high variation. By contrast, the aromatic amino acids (tryptophan, phenylalanine and tyrosine) all showed a low degree of inter-subject variability.
Metabolite-metabolite correlations
Metabolites do not operate in isolation but through a complex network of interactions, with metabolism being one network, though other networks are observed in biological systems (Camacho et al. 2005), especially through correlation of non-neighbour metabolites indicating their involvement in regulatory pathways [see e.g. Kotze et al. (2013)]. We note also that as reported in Camacho et al. (2005) without clear metabolite linkage, correlations should be treated with caution as correlation does not necessarily equate to causation. To highlight these complex networks we illustrate the 20 metabolites for GC–MS that show the highest pairwise Pearson’s correlations. Where a metabolite was detected as more than one ‘metabolic feature’, only one ‘feature’ has been included in Fig. 2, the feature with the higher correlation coefficient. The data show the expected correlations between leucine and valine (both involved in branched chain amino acid metabolism) and between different fatty acids and glycerol (related to glycerolipid and glycerophospholipid metabolism). However, and unexpectedly, proline was also correlated with leucine and valine, and phosphate with fatty acids. Assessing the UPLC–MS data (Supplementary Fig. 2) we detected expected correlations between fatty acids and oxidized fatty acids, between different sphingolipids, between fatty acids and sphingolipids, between different lyso-glycerophospholipids, between different diacylglycerides and between diacylglycerides and sphingolipids.
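The ranking of metabolite pairs by Pearson correlation can be sketched as below. This is an illustrative sketch only: the feature names and values are hypothetical, and ranking by the absolute value of the coefficient is an assumption, since the text states only that the highest pairwise correlations were selected.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def top_correlated_pairs(features, k):
    """features maps a feature name to its values across subjects.
    Returns the k pairs (name_a, name_b, r) with the largest |r| (assumed ranking)."""
    scored = [
        (a, b, pearson(features[a], features[b]))
        for a, b in combinations(sorted(features), 2)
    ]
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return scored[:k]

# Hypothetical relative concentrations for five subjects (illustrative only).
features = {
    "leucine": [1.0, 2.0, 3.0, 4.0, 5.0],
    "valine": [2.1, 3.9, 6.2, 8.0, 9.9],
    "urate": [5.0, 3.0, 4.0, 1.0, 2.0],
}
top = top_correlated_pairs(features, 1)
```

As noted in the text, a high coefficient between two features does not by itself establish a mechanistic link between the metabolites.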
The effect of sample size
It is becoming increasingly evident that many biological studies are underpowered with regard to their ability to come to a robust and statistically significant and justifiable biological conclusion (Broadhurst and Kell 2006; Button et al. 2013; Dunn et al. 2011; Dunn et al. 2012; Ioannidis 2005; Ioannidis and Panagiotou 2011). It is obvious that sample size in metabolomic studies is an important aspect of experimental design, especially in terms of applying metabolites as predictive biomarkers. Although these issues have been addressed in theory [see Xia et al. (2013) for a detailed discussion], to our knowledge, no previous large-scale studies have assessed the influence of sample size. Thus, we studied the effect of sample size in terms of the prediction power of classification and the consistency of feature selection. The experimental design was to divide the whole sample population into several subsets for classification and feature selection. The results of these subsets are used to select the smallest subset which has an acceptable performance, comparing this with the whole sample population in both classification and feature selection. Three groups, viz. age, gender and BMI for the three analytical platforms, have been used to evaluate the effects of sample size. Sample size is defined as the sum of samples in both classes in a binary classification and in this study the number of samples in each class was not equivalent (see Supplementary Table 1). In an ideal study the number of samples in each class would be balanced. Figure 3 shows the prediction accuracy using Random Forests (RF) with a 95 % confidence interval in the three groups (age, gender and BMI) for UPLC–MS positive ion mode. At low sample sizes the prediction accuracy was variable, but as the sample size was increased the median accuracy also increased with concomitant decrease in variation. 
These data showed that a sample size of 600 was appropriate to achieve similar results to those of the whole sample population with the current dataset where we are looking for general (i.e. not disease-specific) changes and where the variation is expected to be lower than that for the comparison of two populations such as ones that are ‘healthy’ and ‘diseased’. A previous study based on NMR data has shown that sample sizes of low thousands of subjects offer sufficient statistical precision to detect biomarkers quantifying predisposition to disease, a different assessment to the one we have performed above (Nicholson et al. 2011). We emphasise that this highlights the requirement to include hundreds of samples in these types of studies but does not suggest that a sample size of 600 is appropriate for all studies [for detailed discussions on this subject see Xia et al. (2013)]. However, the trends observed for all analytical platforms suggested a higher sample size would still slightly increase the prediction accuracy. The same trends were also seen with UPLC–MS(−) as well as for GC–MS. Classification results with RF and Support Vector Machine (SVM) classifiers for all three platforms and the effects of sample size on feature selection are shown in Supplementary Figs. 3 and 4.
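The subsampling design described above (draw subsets of increasing size, classify, and compare the spread of prediction accuracy) can be illustrated with a small simulation. This sketch makes several assumptions that are not taken from the study: the data are synthetic, a single feature is used, a nearest-centroid rule stands in for the Random Forests and SVM classifiers, the train/test split is two thirds to one third, and the interval is a simple percentile range over repeats.

```python
import random
from statistics import mean, median

def nearest_centroid_accuracy(train, test):
    """train/test are lists of (value, label) with labels 0 or 1.
    Stand-in classifier (not Random Forests): assign to the closer class mean."""
    c0 = mean(v for v, y in train if y == 0)
    c1 = mean(v for v, y in train if y == 1)
    hits = sum((abs(v - c1) < abs(v - c0)) == (y == 1) for v, y in test)
    return hits / len(test)

def accuracy_by_sample_size(data, sizes, repeats=200, seed=0):
    """For each total sample size n (both classes combined), repeatedly draw n
    samples, train on two thirds, test on the rest, and summarise accuracy as
    (median, 2.5th percentile, 97.5th percentile)."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        accuracies = []
        for _ in range(repeats):
            subset = rng.sample(data, n)  # returned in random order
            cut = (2 * n) // 3
            train, test = subset[:cut], subset[cut:]
            if len({y for _, y in train}) < 2:
                continue  # both classes are needed to compute centroids
            accuracies.append(nearest_centroid_accuracy(train, test))
        accuracies.sort()
        low = accuracies[int(0.025 * (len(accuracies) - 1))]
        high = accuracies[int(0.975 * (len(accuracies) - 1))]
        summary[n] = (median(accuracies), low, high)
    return summary

# Synthetic two-class data: one feature, class means separated by one SD.
gen = random.Random(1)
data = [(gen.gauss(0.0, 1.0), 0) for _ in range(600)]
data += [(gen.gauss(1.0, 1.0), 1) for _ in range(600)]

summary = accuracy_by_sample_size(data, sizes=[30, 120, 600])
```

In a simulation of this kind the percentile range narrows as the sample size grows, which is the qualitative behaviour reported in Fig. 3; the numerical values carry no meaning for the real dataset.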
Metabolic characteristics of this UK population
Metabolic characteristics of this subset of the UK population are discussed below. Results of data analysis performed applying consensus feature selection as described in the methods section and associated with the discussions related to gender, age, BMI, blood pressure and smoking are available. The results of two-way analysis of variance (ANOVA) and their post hoc analysis by Tukey’s HSD (“honestly significant difference”) test are available and are summarized, where appropriate, below. All data analysis results are available in supplementary material files 2–4. Where no results for two-way ANOVA are included, metabolites have been defined as biologically important by applying consensus feature selection protocol, but two-way ANOVA has shown no statistical significance with a ‘critical’ p < 0.05 [cf. Broadhurst and Kell (2006)]. Similarly interactions between main effects are only discussed if significant.
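To make the reported F statistics easier to read, the sketch below computes the two main-effect and interaction F values of a two-way ANOVA for the simplest case, a balanced design with equal cell sizes. This is an assumption made for brevity: class sizes in this study were not equal, so the real analysis requires a method suitable for unbalanced data, and the function name and example values are hypothetical. A p value can be obtained from each F value and its degrees of freedom with an F-distribution survival function (for example `scipy.stats.f.sf`), which is omitted here to keep the example dependency-free.

```python
from statistics import mean

def two_way_anova_balanced(cells):
    """cells[i][j] holds the observations for level i of factor A (e.g. gender)
    and level j of factor B (e.g. age category). Every cell must have the same
    number of observations n > 1. Returns {effect: (df_effect, df_error, F)}."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(v for row in cells for cell in row for v in cell)
    mean_a = [mean(v for cell in row for v in cell) for row in cells]
    mean_b = [mean(v for row in cells for v in row[j]) for j in range(b)]
    mean_ab = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for the main effects, the interaction and the residual.
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum(
        (mean_ab[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_e = sum(
        (v - mean_ab[i][j]) ** 2
        for i in range(a) for j in range(b) for v in cells[i][j]
    )

    df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_e = ss_e / df_e
    return {
        "A": (df_a, df_e, (ss_a / df_a) / ms_e),
        "B": (df_b, df_e, (ss_b / df_b) / ms_e),
        "A:B": (df_ab, df_e, (ss_ab / df_ab) / ms_e),
    }

# Hypothetical 2 x 2 example with two observations per cell.
cells = [[[1, 3], [2, 4]], [[5, 7], [10, 12]]]
result = two_way_anova_balanced(cells)
```

In the notation used in this section, the entry for factor A in this toy example would be written F(1,4) = 36.0.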
Gender
Two-way ANOVA was performed using Gender (male, female) and Age (four grouped categories: <40, 40–49, 50–64 and >64 years) as the main effects. Many differences in the serum metabolome were observed when comparing the metabolic profiles of males and females. A number of these had been observed previously highlighting the robustness of our study; these included 4-hydroxyphenyllactic acid [F(1,1123) = 245.1, p = 3.9 × 10−50], creatinine, citrate, urate [F(1,1092) = 512.3, p = 2.6 × 10−93], glycerol [F(1,1081) = 93.7, p = 2.6 × 10−21], hexadecenoic acid [F(1,1097) = 62.8, p = 5.5 × 10−15] and tyrosine (Kochhar et al. 2006; Lawton et al. 2008; Slupsky et al. 2007). For glycerol, there was also a significant difference between age categories [F(3,1081) = 3.1, p = 1.1 × 10−12]. Tukey post hoc test showed that, independent of gender, comparisons of age categories <40 vs. 40–49 (p = 0.0005), <40 vs. 50–64 (p = 9.1 × 10−12), <40 vs. 65–81 (p = 1.8 × 10−8), 40–49 vs. 50–64 (p = 0.004) and 40–49 vs. 65–81 (p = 0.03) were significant using a critical p value of 0.05. There was also a significant interaction between gender and age categories for urate [F(3,1092) = 4.8, p = 0.002], glycerol [F(3,1081) = 2.8, p = 0.039] and hexadecenoic acid [F(3,1097) = 4.7, p = 0.003]. In our study, 4-hydroxyphenyllactic acid was found to be higher and tyrosine lower in males. Both of these metabolites are structurally related and these differences may reflect differences in gut microfloral co-metabolism, or the effects of alcohol consumption (Liebich and Pickert 1985). However, we observed a multitude of other robust changes related to gender also. Eight diacylglycerides were observed to be higher in relative concentration in the serum of women compared to men including DG(44:6) [F(1,808) = 276.5, p = 1.3 × 10−53] and DG(46:2) [F(1,848) = 206.1, p = 5.3 × 10−42]). 
For DG(46:2) there was also a significant difference between age categories [F(3,848) = 5.8, p = 0.0006] and a significant interaction between gender and age categories [F(3,848) = 7.5, p = 6.0 × 10−5]. Tukey post hoc test showed that, independent of gender, comparisons of age categories <40 vs. 65–81 (p = 0.002) and 50–64 vs. 65–81 (p = 0.0009) were significant using a critical p-value of 0.05. Four fatty acids (for example, hexadecenoic acid as shown above) and thirteen glycerophospholipids (for example, PC(36:2) [F(1,1103) = 224.8, p = 2.2 × 10−46]) showed the same trend as diacylglycerides. PC(36:2) also showed a significant difference between age categories [F(3,1103) = 3.4, p = 0.02] and a significant interaction between gender and age categories [F(3,1103) = 4.5, p = 0.004]. Tukey post hoc test showed that, independent of gender, comparisons of age categories <40 vs. 40–49 (p = 0.02) was significant using a critical p-value of 0.05. Serum creatinine relative concentrations were observed to be higher in females than males and, when integrated with higher phosphate levels, might suggest greater breakdown of creatine phosphate in muscles in females. Caffeine relative concentrations were higher in women [F(1,847) = 38.3, p = 9.6 × 10−10] perhaps reflecting coffee/tea/chocolate consumption, as was 2-aminomalonic acid [F(1,1048) = 87.6, p = 4.8 × 10−20] which has been associated with atherosclerotic plaques (Rupérez et al. 2012) and renal failure (Mao et al. 2008). For caffeine [F(3,847) = 9.3, p = 5.0 × 10−6] and 2-aminomalonic acid [F(3,1048) = 3.6, p = 0.01] there was also a significant difference between age categories and a significant interaction between gender and age categories for caffeine [F(3,847) = 6.3, p = 0.0003] and 2-aminomalonic acid [F(3,1048) = 24.3, p = 3.5 × 10−15]. Tukey post hoc test for caffeine showed that, independent of gender, comparisons of age categories <40 vs. 40–49 (p = 8.2 × 10−5), <40 vs. 50–64 (p = 0.0002) and <40 vs. 
65–81 (p = 1.4 × 10−5) were significant using a critical p-value of 0.05. Tukey post hoc test for 2-aminomalonic acid showed that, independent of gender, comparisons of age categories <40 vs. 50–64 (p = 0.03) and 40–49 vs. 50–64 (p = 0.03) were significant using a critical p-value of 0.05. Three glycerol-like metabolites (glyceric acid [F(1,1107) = 9.1, p = 0.003], glycerol [F(1,1081) = 93.7, p = 2.6 × 10−21] and glycerol-3-phosphate [F(1,1127) = 11.8, p = 0.0006]) were present in greater amounts in the serum of women compared to men, suggesting differences in glycerol metabolism and potentially related to differences in the rate of glycerolipid and glycerophospholipid synthesis. For glycerol [F(3,1081) = 20.1, p = 1.1 × 10−12] and glyceric acid [F(3,1107) = 6.8, p = 0.0001] there was also a significant difference between age categories. There was also a significant interaction between gender and age categories for glycerol [F(3,1081) = 2.8, p = 0.04] and glycerol-3-phosphate [F(3,1127) = 8.7, p = 1.1 × 10−5]. Tukey post hoc tests showed that, independent of gender, comparisons of age categories for glycerol [<40 vs. 40–49 (p = 0.0005), <40 vs. 50–64 (p = 9.1 × 10−12), <40 vs. 65–81 (p = 1.8 × 10−8), 40–49 vs. 50–64 (p = 0.004) and 40–49 vs. 65–81 (p = 0.03)], glycerol-3-phosphate [<40 vs. 50–64 (p = 0.04)] and glyceric acid [<40 vs. 40–49 (p = 0.006), <40 vs. 50–64 (p = 0.0002), <40 vs. 65–81 (p = 0.005)] were significant using a critical p-value of 0.05. Methionine sulfoxide, also present in greater amounts in the serum of women [F(1,901) = 20.3, p = 7.7 × 10−6], is an oxidation product of methionine and is considered to be a marker of oxidative stress (Bachi et al. 2013) (Fig. 4). Other gender-specific changes in the metabolome as a function of age, BMI and BP were also observed and are discussed below.
Age
We assessed age-related changes through the comparison of all subjects below the age of 50 years with all subjects older than 64 years. Two-way ANOVA was performed using Gender and Age (two categories: <50 years and >64 years) as the main effects. Different classes of metabolites showed changes related to age, with some changes not being gender-related and others being specific to one gender. For example, citric acid showed a general increase with age for both males and females [F(1,779) = 79.8, p = 3.1 × 10−18] and is therefore probably not a biomarker for pancreatic cancer (Bathe et al. 2011); visually the rate of increase was greater in females than in males (Fig. 5). Citrate has previously been shown to be related to age, along with other metabolites also observed in our study. These include serine [F(1,755) = 6.5, p = 0.011], phosphate, aspartate, erythritol/threitol [F(1,743) = 171.0, p = 2.6 × 10−35], caffeine [F(1,565) = 8.8, p = 0.0032], hexadecenoic acid, glycerol-3-phosphate, histidine, tryptophan [F(1,778) = 11.7, p = 0.0007], tyrosine [F(1,788) = 39.1, p = 6.8 × 10−10] and threonine [F(1,778) = 3.9, p = 0.05] (Lawton et al. 2008; Menni et al. 2013). There was a significant difference between gender categories for serine [F(1,755) = 7.4, p = 0.007], erythritol/threitol [F(1,743) = 10.5, p = 0.001], caffeine [F(1,565) = 24.3, p = 1.1 × 10−6] and tryptophan [F(1,778) = 55.4, p = 2.6 × 10−13]. There was also a significant interaction between gender and age categories for caffeine [F(1,565) = 17.6, p = 3.2 × 10−5].
Age-related changes in amino acids were also observed. These changes included tryptophan [F(1,778) = 11.7, p = 0.0007], which decreased with age and also showed a significant difference between gender categories [F(1,778) = 55.4, p = 2.6 × 10−13]; tyrosine [F(1,788) = 39.1, p = 6.8 × 10−10], which increased with age (as shown in Fig. 6); threonine and serine, which both decreased with age; and methionine and cysteine [F(1,785) = 16.0, p = 7.1 × 10−5], which also both decreased with age. Cysteine also showed a significant difference between gender categories [F(1,785) = 12.9, p = 0.0003] and a significant interaction between gender and age categories [F(1,785) = 4.8, p = 0.03]. Vitamin D metabolites also showed decreases with age in both males and females. These metabolites have been related to the onset of the metabolic syndrome [e.g. Lee et al. (2009), Lu et al. (2009)], and this observation might argue for the benefits of vitamin supplementation in older people. For example, 24-Hydroxygeminivitamin D3 showed a difference between age categories [F(1,703) = 52.2, p = 1.3 × 10−12] and between gender categories [F(1,703) = 36.8, p = 2.2 × 10−9], and a significant interaction between age and gender categories [F(1,703) = 5.7, p = 0.02]. Different fatty acids showed either increases or decreases with age (e.g. octadecadienoic acid increased with age [F(1,763) = 8.6, p = 0.003]), but no correlation between age and either carbon number or degree of saturation was observed for fatty acids. Erythritol and/or threitol showed an increase with age (as shown above), as did inositol [F(1,779) = 151.8, p = 5.5 × 10−32], which also showed a significant interaction between age and gender categories [F(1,779) = 11.3, p = 0.0008]. These two changes are consistent with the age-dependent increases in classes of carbohydrates that underpin diabetic complications (Brownlee 2001).
BMI
While gender and age are independent variables, the body mass index (BMI) is not (although is taken as such for the purposes of this study where one class is BMI <25 and the other class is BMI >30). Nonetheless, with obesity becoming a growing problem in developed and developing countries, even in children (Friend et al. 2013), the measurement of BMI and its relationship to the serum metabolome has become of increasing importance. As is well known, increased BMI is correlated to increases in body fat, greater risk of insulin resistance and metabolic disorders including diabetes and cardiovascular diseases [e.g. Pradhan (2007)]. It should be remembered that BMI is linked to excess weight and the associated risk of insulin resistance and metabolic disorders. BMI is not directly correlated to adiposity as a higher BMI can be related to excess bone, muscle or fat and does not take into account the distribution of the latter and its influence on metabolic diseases. However, BMI provides a readily available surrogate measure of overall body fatness in large-scale studies and was therefore chosen as an appropriate surrogate marker in this study. Two-way ANOVA was performed using BMI (<25 vs. >30) and gender as the main effects.
In this study a range of amino acids showed either an increase (cysteine [F(1,690) = 18.8, p = 1.6 × 10−5], cystine [F(1,686) = 16.9, p = 4.4 × 10−5], glutamine, tyrosine [F(1,695) = 62.6, p = 9.9 × 10−15], phenylalanine [F(1,687) = 28.4, p = 1.4 × 10−7] and valine [F(1,685) = 32.0, p = 2.2 × 10−8]) or a decrease (asparagine [F(1,687) = 12.8, p = 0.0004], histidine, serine [F(1,670) = 4.1, p = 0.04] and phosphoserine [F(1,498) = 29.6, p = 8.3 × 10−8]) in relative amounts as BMI increased in one or both genders. Cysteine [F(1,690) = 11.6, p = 0.0007], valine [F(1,685) = 53.9, p = 6.0 × 10−13], serine [F(1,670) = 6.9, p = 0.009] and phosphoserine [F(1,498) = 6.1, p = 0.01] also showed a significant difference between gender categories and there was a significant interaction between gender and BMI categories for tyrosine [F(1,695) = 4.3, p = 0.04] and phosphoserine [F(1,498) = 5.7, p = 0.02]. Valine, tyrosine and phenylalanine have been strongly linked as early markers of insulin resistance and markers of risk for the development of diabetes (Newgard et al. 2009; Wang et al. 2011). Phosphoserine can be associated with cysteine production or serine metabolism, or can arise as a byproduct of protein degradation. Short-chain organic acids (including acetate [F(1,645) = 38.4, p = 1.1 × 10−9], 2-aminobutanoic acid [F(1,637) = 8.9, p = 0.003] and 2-aminomalonic acid [F(1,642) = 57.2, p = 1.4 × 10−13]) showed a decrease in relative concentration with increasing BMI. 2-Aminomalonic acid also showed a significant difference between gender categories [F(1,642) = 34.3, p = 7.4 × 10−9]. Four diacylglycerides showed a decrease as BMI increased; for example, DG(44:6) showed a statistically significant difference applying two-way ANOVA [F(1,489) = 57.0, p = 2.1 × 10−13] and also showed a significant difference between gender categories [F(1,489) = 143.5, p = 3.6 × 10−29].
Five sphingolipids showed a decrease as BMI increased; for example, SM(d18:1/24:1) showed a statistically significant difference applying two-way ANOVA [F(1,518) = 36.0, p = 3.8 × 10−9] and also showed a significant difference between gender categories [F(1,518) = 88.5, p = 1.6 × 10−19]. Four lyso-glycerophospholipids showed a decrease as BMI increased; for example, lysoPC(18:2) showed a statistically significant difference applying two-way ANOVA [F(1,693) = 88.4, p = 7.6 × 10−20] and also showed a significant difference between gender categories [F(1,693) = 27.1, p = 2.6 × 10−7]. Three fatty acids showed a decrease as BMI increased; for example, dodecanoic acid showed a statistically significant difference applying two-way ANOVA [F(1,658) = 20.4, p = 7.4 × 10−6] and also showed a significant difference between gender categories [F(1,658) = 34.9, p = 5.7 × 10−9]. Citrate and fructoselysine-3-phosphate showed female-specific decreases as a function of BMI. The latter is observed in increased concentrations in tissue and biofluids of diabetic subjects as an advanced glycation end product (AGE) (Delpierre and Van Schaftingen 2003). Glycerol [F(1,655) = 43.9, p = 7.1 × 10−11] and glycerol-3-phosphate showed male-specific increases in amounts (the two-way ANOVA result for the comparison of gender for glycerol was F(1,655) = 91.2, p = 2.6 × 10−20). Glutamine and glutamate showed an increase and a decrease, respectively, and threonine showed a decrease as BMI increased. Correlation analysis showed that diglycerides, glycerophosphocholines, sphingomyelins, tyrosine, tyrosyl-arginine and urate also correlated with BMI (Supplementary Fig. 5).
Blood pressure
Elevated blood pressure (BP) is associated with an increased risk of cardiovascular diseases [e.g. He and Whelton (1999)]. In the UK, up to 38 % of the population is considered hypertensive at one stage or another of their lives, with a greater prevalence of high blood pressure in men. Here we found that a range of metabolic classes in serum were altered in relation to increasing blood pressure when comparing normal blood pressure (systolic = 90–120 mmHg) versus hypertension (systolic >140 mmHg). Two-way ANOVA was performed using Blood Pressure (Normal; Hypertension) and Gender as the main effects.
Methionine sulfoxide was negatively correlated with BP, in both males and females, and methionine showed an increase in relative concentration with blood pressure. One interpretation is that reactive oxygen species that do not oxidize methionine may damage other tissues, leading to a range of disorders (Kell 2009).
Multiple amino acids showed changes, including a decrease in cysteine [F(1,589) = 11.5, p = 0.0007] and lysine in both males and females, whilst other changes were gender specific (e.g., decreased alanine [F(1,567) = 13.3, p = 0.0003] and increased tryptophan in males only, and increased histidine and decreased threonine in females only). Cysteine also showed a significant difference between gender categories [F(1,589) = 10.2, p = 0.001] and there was a significant interaction between gender and BP categories for cysteine [F(1,589) = 8.6, p = 0.003]. Lactate relative concentrations were increased in both genders [F(1,587) = 9.3, p = 0.002] whilst acetate [F(1,543) = 10.1, p = 0.002] decreased in females only. Citrulline increased in both sexes as BP increased [F(1,592) = 5.7, p = 0.02] and showed a significant difference between gender categories [F(1,592) = 3.9, p = 0.05]; erythritol/threitol [F(1,554) = 10.1, p = 0.002] showed a statistically significant interaction between gender and BP [F(1,554) = 5.8, p = 0.02]; and erythronic acid/threonic acid decreased in both genders. Glyceric acid and glycerol-3-phosphate both increased and sucrose decreased in both males and females. Other changes included decreases in indole-acetate [F(1,561) = 9.7, p = 0.002] in males only. Correlation analysis linked elevated BP to urate, triacylglycerides, dipeptides, glycerophosphocholines and 4-hydroxyphenyllactic acid (Supplementary Fig. 5).
Smoking
Smoking is an important risk factor in cancer and cardiovascular diseases; the metabolic disturbances associated with smoking can have important roles in the onset and progression of these diseases. Two-way ANOVA was performed using smoking (non-smoker, ex-smoker, smoker) and gender as the main effects. All Tukey post hoc comparisons of smoking categories reported in this section are independent of gender. Correlation analysis showed links between smoking status and salicylic acid, presumably derived from aspirin and reflecting lifestyle influences on the metabolic phenotype. Smoking was also correlated with the two aromatic amino acids tyrosine [F(2,796) = 3.7, p = 0.02; Tukey post hoc test significant for smokers vs. non-smokers (p = 0.02)] and tryptophan (elevated in smokers). Tryptophan has been associated with smoking initiation and nicotine dependence previously (Wang and Li 2010) and our data show decreases in the metabolically related indole-acetate and indole-propionate [F(2,757) = 1.4, p = 1.3 × 10−5; Tukey post hoc tests significant for smokers vs. non-smokers (p = 7.7 × 10−6) and smokers vs. ex-smokers (p = 0.04)]. Indole-propionate also showed a significant difference between gender categories [F(1,757) = 4.7, p = 0.03] and a significant interaction between gender and smoking categories [F(2,757) = 6.2, p = 0.002]. Statistical analysis also showed decreases in other amino acids including aspartate, histidine and lysine in smokers. Glycerol [F(2,759) = 3.3, p = 0.04; Tukey post hoc test significant for smokers vs. non-smokers (p = 0.03)] and glycerol-3-phosphate were decreased in smokers; glycerol also showed a significant difference between gender categories [F(1,759) = 40.8, p = 2.9 × 10−10]. A number of fatty acids were also decreased in smokers [for example, octadecenoic acid, F(2,759) = 3.3, p = 0.04; Tukey post hoc test significant for non-smokers vs. ex-smokers (p = 0.02)]. Lactate [F(2,778) = 3.5, p = 0.03; Tukey post hoc test significant for non-smokers vs. smokers (p = 0.03)] and citrate [F(2,800) = 3.9, p = 0.02; Tukey post hoc test significant for non-smokers vs. smokers (p = 0.01)] were also decreased in smokers, as was inositol [F(2,784) = 15.7, p = 2.0 × 10−7; Tukey post hoc tests significant for non-smokers vs. smokers (p = 9.2 × 10−8) and for ex-smokers vs. smokers (p = 0.0006)]. Biotin was decreased in smokers [F(2,814) = 20.0, p = 3.2 × 10−9; Tukey post hoc tests significant for non-smokers vs. smokers (p = 1.3 × 10−9) and for ex-smokers vs. smokers (p = 0.001)], as has been shown previously in women (Sealey et al. 2004). Finally, caffeine was present at lower relative concentrations in smokers [F(2,655) = 8.1, p = 0.0003; Tukey post hoc tests significant for smokers vs. non-smokers (p = 0.001) and smokers vs. ex-smokers (p = 0.0006)] and also showed a significant difference between gender categories [F(1,655) = 32.5, p = 1.8 × 10−8]. This is unexpected, as there is a logical lifestyle link between coffee drinking and smoking; however, it may reflect a change in the rate of caffeine metabolism in smokers.
Correlations between clinical chemistry and metabolic profiling data
In addition to metabolite profiling, each sample was also subjected to a panel of conventional clinical chemistry assays. This enabled positive and negative correlations (if any) between these standard clinical diagnostics and the broader metabolic phenotypes to be determined. This ability to anchor newer methods of volunteer/patient phenotyping, in this case metabotyping, to currently used “best practice” represents an important step towards obtaining wider acceptance of the utility of the metabolite profiling approach. The correlations of the clinical chemistry data with the GC–MS data are illustrated in Fig. 7 (UPLC–MS correlations in Supplementary Fig. 5).
An obvious area where such correlations would be expected is lipid (and particularly cholesterol) metabolism. As might be expected, correlations emerged from the metabotypes determined here between total cholesterol concentrations in serum and the amounts of monoglycerides and diglycerides present. There were also positive correlations between circulating high density lipoprotein cholesterol (HDLC) and relative concentrations of fatty acids, diglycerides, phosphatidylcholines, sphingomyelins and triglycerides, although we were unable to find any correlations for low density lipoprotein cholesterol (LDLC). Triglycerides as determined by standard clinical chemistry assays were associated with raised di- and monoglycerides, phosphatidylcholines, sphingomyelins and urate. As discussed above, there was also a correlation of diastolic blood pressure with urate, triglycerides and phosphatidylcholines.
Another set of interesting correlations relating to organ function was seen when some of the clinical markers for liver function were examined. For example, amongst a range of other correlations, both AST and ALT were associated with relative concentrations of urate and 4-hydroxyphenyllactic acid. ALT also covaried with acylglycerides and the PC/PE ratio. As observed above, systolic blood pressure (SBP) was also associated with 4-hydroxyphenyllactic acid. Another liver enzyme, GGT, varied with diglycerides, glycerophosphocholines, urate, tyrosyl-arginine, aspartate and glutamate, whilst no correlations were seen for LDH. In the case of renal function, creatinine and urea concentrations were both associated with circulating dipeptides and hexanoylglycine, with creatinine also covarying with phosphatidylcholines, sphingomyelins, urate, erythritol/threitol and triglycerides. Correlations with circulating metabolites were also found for many other clinical chemistry markers, e.g. serum glucose.
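The kind of pairwise screen described in this section, in which each clinical chemistry variable is correlated against each metabolite feature, can be sketched as follows. This is an illustration under stated assumptions rather than the method used in the study: the text here does not name the correlation coefficient, so Spearman rank correlation is an assumption, and the function names (`ranks`, `spearman`, `correlate`), variable names and all values are hypothetical.

```python
from math import sqrt
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlate(clinical, metabolites):
    """Correlate every clinical variable with every metabolite feature.

    Both arguments map a name to a list of per-subject values, with
    subjects in the same order in every list.
    """
    return {
        (c_name, m_name): spearman(c_values, m_values)
        for c_name, c_values in clinical.items()
        for m_name, m_values in metabolites.items()
    }


# Made-up values for five subjects, for illustration only.
clinical = {"triglycerides": [0.8, 1.1, 1.9, 2.4, 1.5]}
metabolites = {
    "urate": [210, 260, 330, 390, 300],
    "feature_x": [5.0, 4.1, 2.2, 3.0, 3.9],
}

for pair, rho in correlate(clinical, metabolites).items():
    print(pair, round(rho, 2))
```

With several thousand metabolite features tested against each clinical variable, some correction for multiple comparisons would normally be applied before interpreting individual coefficients; that step is not shown here.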
Concluding statement and future roles for The Husermet Project
The importance of the Husermet project is that it has developed the tools and resources to collect and provide metabolic profiles based on chromatography-mass spectrometry for a large human population (Dunn et al. 2011). This is a vital prerequisite to well-powered studies that can complement the large-scale but necessarily qualitative studies of genome sequence variation now appearing. Here we describe how these tools have been applied to profile a sample of the ‘normal’ UK population and, for these 1,200 healthy individuals, to define biologically important metabolic changes associated with age and gender as well as to link metabolic changes with disease risk factors including BMI, blood pressure and smoking. It was noteworthy that a significant number of metabolites known to be associated with insulin resistance and the metabolic syndrome did indeed increase with age, indicating the great dangers of a diabesity epidemic in the UK. Additionally, we have correlated metabolic variations with clinical chemistry measurements to indicate metabolic disturbances associated with differences in these variables. ‘Omics’ measurements are normally hypothesis-generating rather than hypothesis-testing (Kell and Oliver 2004), although it is always gratifying to be able to reproduce known and published data, as many of the examples illustrated above have done.
Most importantly, the Husermet protocol has developed a dataset made publicly available through MetaboLights (Haug et al. 2013) so that this large resource can be applied as required by the scientific community. An obvious next step is the integration of our data with those for the recently published human metabolic network reconstruction (Swainston et al. 2013) and the other small molecules with which it interacts (Kell 2013; Kell et al. 2013).