Scope of metabolomics

The complete set of metabolites in a living system or a given sample is termed the metabolome [1]. Metabolites analysed by metabolomics typically lie in the molecular mass range of 80–1200 Da. Metabolomics aims to identify a multitude (ideally all) of the metabolites in a given biological sample and thereby provides a snapshot of the metabolites involved in distinct processes.

Whereas many functional features can be computed bioinformatically from the genome (e.g. RNA variants, splicing, protein sequences), the metabolome has to be analysed empirically and cannot be predicted from the genome. This is mainly because the metabolome reflects both input from the genome and highly dynamic environmental interactions with biochemical homeostasis. Furthermore, the metabolome is very closely linked to the functional phenotype, since metabolites mirror dynamic processes that have already taken place or were occurring at the moment of sample collection.

Metabolomics has been used successfully in the search for biomarkers of disease prediction and progression [2, 3], for analyses of drug action [4, 5] and for the development of companion diagnostics [6, 7]. Furthermore, metabolomics has been instrumental in discovering the impact of the genome on metabolic subtypes of human physiology [8, 9]. The human metabolome is influenced by sex [10], age [11], BMI [12], hormonal status [13], medication [14], nutrition [15], lifestyle (alcohol [16], smoking [17], coffee [18]) and diurnal rhythm [19], to name only the most penetrant confounders. Nevertheless, an individual's metabolome is very stable over months [20] and even years [21], so deviations from these conserved patterns may reflect disease, an environmental challenge or a lifestyle change.
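The stability argument can be made quantitative. One common way to express how reproducible a metabolite is within individuals over time is the intraclass correlation coefficient (ICC); the sketch below assumes that measure and is an illustration, not the method used in [20, 21]. It computes a one-way random-effects ICC from repeated samples and a z-score of a new measurement against a person's own baseline. The function names and toy data are hypothetical.

```python
import numpy as np


def icc_oneway(data):
    """One-way random-effects ICC(1,1).

    data: 2-D array, rows = individuals, columns = repeated samples (k >= 2)
    of one metabolite. Values near 1 mean most variance lies between people,
    i.e. each person's level is stable over time.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    subject_means = x.mean(axis=1)
    # Between-subject and within-subject mean squares
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def personal_z(baseline, new_value):
    """z-score of a new measurement relative to the same person's baseline samples."""
    b = np.asarray(baseline, dtype=float)
    return (new_value - b.mean()) / b.std(ddof=1)


# Hypothetical toy data: 3 people x 3 visits for one metabolite (arbitrary units)
visits = [[10.0, 11.0, 10.5], [20.0, 19.0, 20.5], [15.0, 15.5, 14.5]]
print(round(icc_oneway(visits), 3))
print(personal_z([10, 12, 14], 18))  # 3.0: a marked deviation from this person's pattern
```

A high ICC means a single sample characterises a person reasonably well, which is what makes a large personal z-score interpretable as a change rather than noise; any alert threshold would have to be chosen per study.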

Several metabolites or metabolite classes have gained much attention in diabetes research as candidates for early disease detection, differentiation of progressor types and monitoring of medication compliance. Among these, branched-chain amino acids [22], lysophosphatidylcholines, acylcarnitines and glycine [17], 2-aminoadipic acid [23] and 1,5-anhydroglucitol [24] are being investigated for diagnostic clinical use. Several excellent reviews are dedicated to metabolic biomarkers in diabetes discovered by metabolomics [25–28].

Generic processes in metabolomics

Metabolomics profits from experience with other omics approaches, especially with respect to study design, biostatistics and bioinformatics. Table 1 summarises the major elements of contemporary metabolomics. Metabolomics projects place several requirements on study design, since they generate large amounts of data [29]. Detailed documentation of the phenotypes associated with the samples therefore has to be prepared and maintained. Human studies involve samples of urine, serum, plasma and saliva, which have very different metabolite spectra [25, 30–32]. Identifying controls appropriate to the aims of the project poses a special challenge, because the human metabolome is influenced by many confounders with a known impact. Ideally, all samples in a large metabolomics experiment should be matched for confounders. As this is often not possible, sample randomisation may improve the outcome of analytical procedures, and confounder effects must be corrected for during data analysis. Preparing and adhering to standard operating procedures (SOPs) is essential to maintain a cohort dataset as a sustainable resource. Pre-analytical issues can introduce a substantial component of variance that prevents reliable biostatistical processing and interpretation of the data [33]. The common issues (e.g. multicentre studies with different collection procedures, confused sample identity, insufficient sample volume) are known and should be avoided [34]. Liquid handling should use robots to improve both throughput and precision. Separating metabolites prior to their identification has been shown to increase the resolution and sensitivity of the analysis. Gas chromatography (GC) or liquid chromatography (LC) can be coupled to analytical instruments to improve overall performance.
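Two of the steps named above, randomising the analytical run order and correcting for confounders, can be sketched in a few lines. The example below assumes a simple case/control design and uses balanced round-robin randomisation into batches and ordinary least-squares residualisation; both are common choices rather than a prescribed protocol, and the helper names and toy data are hypothetical.

```python
import random

import numpy as np


def randomise_run_order(sample_ids, groups, batch_size, seed=0):
    """Hypothetical helper: build analytical batches with a balanced group mix.

    Samples are shuffled within each group, groups are interleaved round-robin,
    and the resulting sequence is cut into batches that are shuffled again, so
    that group membership is not confounded with batch or injection position.
    """
    rng = random.Random(seed)
    by_group = {}
    for sample_id, group in zip(sample_ids, groups):
        by_group.setdefault(group, []).append(sample_id)
    queues = list(by_group.values())
    for queue in queues:
        rng.shuffle(queue)
    order = []
    while any(queues):  # continue until every group queue is empty
        for queue in queues:
            if queue:
                order.append(queue.pop())
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    for batch in batches:
        rng.shuffle(batch)
    return batches


def adjust_for_confounders(y, confounders):
    """Residuals of y after ordinary least-squares regression on the confounders.

    y: metabolite values, shape (n,); confounders: shape (n,) or (n, p),
    e.g. age and BMI. An intercept column is added automatically.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(confounders, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Hypothetical study: 8 cases and 8 controls, 4 samples per batch
ids = [f"S{i:02d}" for i in range(16)]
groups = ["case"] * 8 + ["control"] * 8
for batch in randomise_run_order(ids, groups, batch_size=4, seed=1):
    print(batch)
```

With unequal group sizes the last batches of this sketch would contain only the larger group, so real designs may need proper stratified block randomisation; likewise, residualisation is only one way to handle confounders, and including them as covariates in the final model is an equally common alternative.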
Progress in analytical methods (especially in mass spectrometry) has made metabolomics possible and efficient in the last decade. Data evaluation is an integral part of metabolomics, and sophisticated biostatistical models are needed to interpret the collected data. Because the metabolomes of different species (e.g. human, plant or bacteria) are in part distinct but equally easily accessible in databases, care has to be taken when assigning metabolites to pathways of health and disease. Quality control and quality management are essential in all steps of metabolomics, because without these procedures it is impossible to make decisions on outliers, metabolite identity, concentration bias and distribution, and confidence intervals.
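As an illustration of quality-control decisions on reproducibility and outliers, the sketch below assumes that a pooled QC sample was injected repeatedly during the analytical run. It computes the coefficient of variation (CV) of each metabolite across the QC injections and flags outlying samples with a median/MAD-based robust z-score. The 30 % CV cut-off and the 3.5 outlier threshold are widely used conventions assumed here, not values taken from this article; the data are hypothetical.

```python
import numpy as np


def qc_cv_percent(qc_intensities):
    """CV (%) per metabolite across repeated pooled-QC injections.

    qc_intensities: 2-D array, rows = QC injections, columns = metabolites.
    """
    qc = np.asarray(qc_intensities, dtype=float)
    return 100.0 * qc.std(axis=0, ddof=1) / qc.mean(axis=0)


def robust_z(values):
    """Robust z-scores from the median and the median absolute deviation (MAD).

    The factor 1.4826 makes the MAD comparable to a standard deviation for
    normally distributed data; unlike mean and SD, the median and MAD are not
    pulled towards the outliers being searched for.
    """
    v = np.asarray(values, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    return (v - med) / (1.4826 * mad)


# Hypothetical QC data: 3 injections x 2 metabolites
qc = [[100, 10], [102, 20], [98, 30]]
cv = qc_cv_percent(qc)
keep = np.flatnonzero(cv <= 30.0)  # metabolites passing the assumed CV cut-off

# Hypothetical per-sample total signal used to screen for outlying samples
totals = [1.0, 2.0, 3.0, 4.0, 100.0]
outliers = np.flatnonzero(np.abs(robust_z(totals)) > 3.5)
print(keep, outliers)
```

Whether a flagged sample is excluded or re-measured remains a documented decision for the study team; the score only makes the decision reproducible.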

Table 1 Essential elements of metabolomics experiments

Common metabolomics methods

A variety of analytical approaches can be used successfully for metabolomics analysis [35, 36] (Table 2). The key element in the success of population-based metabolomics over the last decade has been the availability of the quadrupole tandem mass spectrometer (MS/MS). Contemporary instruments are very robust, fast and sensitive, although their mass resolution is lower than that of the high-resolution instruments used for non-targeted analysis. An MS/MS unit can be coupled to GC or LC to increase metabolite coverage. GC-MS/MS requires laborious on-the-fly chemical derivatisation of metabolites prior to analysis. Because of the high temperatures in the GC unit, thermolabile metabolites cannot be identified, and highly polar metabolites are amenable to GC-MS/MS only after derivatisation. Nevertheless, GC-MS/MS is very popular in diagnostic laboratories for analyses of drugs and steroids, and in discovery laboratories for fatty acids, sugars or tricarboxylic acid cycle metabolites. LC-MS/MS does not require metabolite derivatisation (although some classes, such as amino acids, benefit from it) and allows the detection of a broad range of molecules (molecular mass <2000 Da) not covered by GC-MS/MS, such as amino acids, biogenic amines, organic acids and lipids. NMR and LC-NMR do not require any metabolite derivatisation, and samples can be re-used for other analyses after NMR measurement. Other advantages of NMR are its very high measurement stability and its resolution of lipids. However, NMR still has a major drawback in sensitivity, as only a few hundred metabolites can be quantified.

Table 2 Analytics for metabolomics

Two analytical approaches can be used for metabolomics: targeted and non-targeted. Their features are compared in Table 3. GC-MS/MS experiments are mostly targeted, whereas LC-MS/MS can be performed in either targeted or non-targeted mode. The two approaches require distinct sample preparation and instrument tuning.

Table 3 Comparison of features of targeted and non-targeted analytics

In the targeted mode only a preselected set of metabolites (often a complete metabolite family, e.g. eicosanoids) is quantified during the MS/MS analysis. The simplest version of a targeted assay is flow injection analysis (FIA), in which the sample is injected directly into the mass spectrometer without chromatographic separation. FIA may work for many applications, including quantification of amino acids and lipids, but cannot resolve certain isobaric compounds, such as leucine and isoleucine or lipids with the same total chain length. Therefore, LC-MS/MS approaches, which separate such compounds chromatographically before detection, are also popular in addition to FIA. The analysis of the preselected metabolites is based on characteristic fragmentation patterns that allow their unequivocal identification and quantification. For absolute quantification, known concentrations of internal standards with identical or similar chemical properties to the metabolites of interest (often isotopically labelled metabolites) are added to the sample and analysed together with it. A properly tuned instrument operating in the targeted mode can be very fast, robust and automated (Fig. 1).
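The internal-standard principle can be illustrated with a calibration curve. The sketch below assumes a stable-isotope-labelled internal standard added at the same concentration to calibrators and samples: the analyte/internal-standard peak-area ratio is regressed on known calibrator concentrations, and the fitted line is inverted for an unknown sample. The numbers are invented for illustration, and real assays may use weighted fits, several standards or kit-specific procedures.

```python
import numpy as np


def fit_calibration(conc, analyte_area, is_area):
    """Fit area ratio (analyte / internal standard) = slope * conc + intercept."""
    ratio = np.asarray(analyte_area, dtype=float) / np.asarray(is_area, dtype=float)
    slope, intercept = np.polyfit(np.asarray(conc, dtype=float), ratio, deg=1)
    return slope, intercept


def quantify(analyte_area, is_area, slope, intercept):
    """Convert a sample's area ratio back to concentration via the calibration line."""
    ratio = analyte_area / is_area
    return (ratio - intercept) / slope


# Hypothetical calibrators (µM) with a constant internal-standard area
cal_conc = [0.0, 10.0, 20.0, 40.0]
is_areas = [2.0e5, 2.0e5, 2.0e5, 2.0e5]
analyte_areas = [2.0e3, 1.02e5, 2.02e5, 4.02e5]  # ratio = 0.05 * conc + 0.01
slope, intercept = fit_calibration(cal_conc, analyte_areas, is_areas)

# In this sample the internal-standard signal is higher than in the calibrators,
# but the ratio still maps back to the right concentration.
print(quantify(1.53e5, 3.0e5, slope, intercept))
```

Working with the ratio rather than the raw analyte area is the point of the design: effects such as ion suppression or injection-volume variation change the labelled standard and the analyte alike, so they largely cancel.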

Fig. 1

Comparison of throughput and coverage in targeted and non-targeted metabolomics. Specialised targeted approaches can be very fast but cannot provide comprehensive metabolome coverage. Non-targeted approaches, on the other hand, provide broad coverage at lower throughput.

In the non-targeted approach the analytical procedures are optimised to cover the entire metabolome present in the sample without focusing on a specific metabolite class. Absolute quantification is therefore difficult, as internal standards cannot be provided for all molecules. Instead, differences in ion counts for each metabolite are used for semi-quantitative comparison. Non-targeted metabolomics requires instruments with high or very high mass accuracy to allow identification of the measured metabolites. Suitable instruments may include quadrupole time-of-flight (Q-TOF), orbitrap, Fourier transform ion cyclotron resonance (FTICR) or quadrupole linear ion trap (Q-TRAP) instruments. Throughput and sensitivity in the non-targeted mode are lower than in the targeted mode. A common problem with LC-MS/MS-based metabolomics is that the measured features are hardware-dependent, so the same metabolite may show different retention times or fragmentation spectra on different systems. As a consequence, it remains a challenge to compare data from different sources that contain unknown (not annotated) metabolites.
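High mass accuracy matters because candidate identities are usually assigned by comparing a measured m/z with theoretical values within a narrow tolerance, commonly expressed in parts per million (ppm). The sketch below computes monoisotopic masses from elemental formulas, the theoretical [M+H]+ ion and the ppm error, and returns every candidate within an assumed 5 ppm window. The candidate list is a hypothetical stand-in for a database query. The example also shows why accurate mass alone cannot separate isomers such as leucine and isoleucine.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da)
ELEMENT_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276467


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += ELEMENT_MASS[element] * (int(count) if count else 1)
    return mass


def mz_protonated(formula):
    """Theoretical m/z of the singly charged [M+H]+ ion."""
    return monoisotopic_mass(formula) + PROTON_MASS


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def annotate(measured_mz, candidates, tol_ppm=5.0):
    """Return candidate names whose [M+H]+ lies within tol_ppm of measured_mz."""
    return [
        name
        for name, formula in candidates.items()
        if abs(ppm_error(measured_mz, mz_protonated(formula))) <= tol_ppm
    ]


# Hypothetical candidate list standing in for a database lookup
CANDIDATES = {
    "glucose": "C6H12O6",
    "leucine": "C6H13NO2",
    "isoleucine": "C6H13NO2",
    "alanine": "C3H7NO2",
}

measured = 132.1020  # hypothetical measured m/z
print(annotate(measured, CANDIDATES))  # ['leucine', 'isoleucine']: same formula, same mass
```

Narrowing the tolerance reduces false candidates only as far as the instrument's real accuracy allows; distinguishing isomers still requires retention time or fragmentation spectra.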

For the quantification of a small number of metabolites, an ELISA or RIA would typically be superior to mass spectrometry or NMR analyses in terms of sensitivity. However, antibody-based quantification of metabolites has low throughput and is limited to a few metabolites. Furthermore, antibody cross-reactivity limits the selectivity of such assays.

If the same sample is subjected to different metabolomics analyses, some metabolites are detected by only one approach, whereas others are detected by more than one. In the example shown in Fig. 2 a serum sample was analysed by targeted LC-MS/MS, non-targeted LC-MS/MS and LC-NMR, and altogether 482 metabolites were detected [37]. Targeted and non-targeted LC-MS/MS shared 39 metabolites, and only nine metabolites (glucose, proline, alanine, valine, tyrosine, methionine, phenylalanine, histidine, lysine) were detected by all three approaches. This example clearly shows that combining multiple approaches increases coverage of the metabolome. Recently, human serum and urine metabolomes have been analysed with different mass spectrometry methods applied in parallel, revealing 4229 and 2206 metabolites, respectively [38, 39].
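Coverage comparisons like the one in Fig. 2 reduce to set operations on the metabolite lists reported by each platform. The sketch below uses small, hypothetical lists (not the data of [37]) to compute the total number of distinct metabolites, the metabolites shared by all methods, the pairwise overlaps and the metabolites unique to each method. It assumes that identifiers have already been harmonised across platforms (e.g. to a common database ID); otherwise synonyms would be counted as distinct metabolites.

```python
from itertools import combinations


def coverage_summary(detected):
    """Summarise overlap between analytical platforms.

    detected: dict mapping method name -> set of metabolite identifiers.
    """
    all_sets = list(detected.values())
    union = set().union(*all_sets)
    shared_by_all = set.intersection(*all_sets)
    pairwise = {
        (a, b): detected[a] & detected[b] for a, b in combinations(detected, 2)
    }
    unique = {}
    for method, found in detected.items():
        others = set().union(*(s for m, s in detected.items() if m != method))
        unique[method] = found - others
    return union, shared_by_all, pairwise, unique


# Hypothetical detection lists for one serum sample
detected = {
    "targeted_lcms": {"glucose", "valine", "lysine", "carnitine"},
    "nontargeted_lcms": {"glucose", "valine", "lysine", "urate", "hippurate"},
    "nmr": {"glucose", "valine", "lactate"},
}
union, shared, pairs, unique = coverage_summary(detected)
print(len(union), sorted(shared))  # 7 ['glucose', 'valine']
```

The size of the union relative to each single method is a simple way to express how much a combined strategy adds.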

Fig. 2

Comparison of the coverage of the metabolome with different methods. The number of all detected metabolites is given in the circle with the dotted line. Distinct metabolomic analytical methods reveal different unique but also common metabolites as indicated in the Venn diagram. Modified from [37]

Future developments

Metabolomics is developing very fast, and several limitations have already been identified. Study design may benefit from the procedural rules already defined for clinical trials. The same applies to requirements for pre-analytical procedures, including collection, storage and transport. SOPs for pre-analytical procedures are publicly available, but compliance across laboratories is low, because there is no binding agreement on their use or because individual SOP elements cannot be implemented in the same way everywhere. Standardisation remains a major issue. Approaches to standardisation currently being investigated by many laboratories include the provision of reference substances and their mass spectra, as well as formats for data deposition in public metabolomics repositories. In contrast to genomics or transcriptomics, metabolomics does not yet cover the whole metabolome. Therefore, technological approaches, freely accessible mass spectrometry analysis algorithms and data analysis resources (Table 4) are being developed worldwide to increase coverage of the metabolome.

Table 4 Useful resources for metabolite analyses