Introduction

Biomarker discovery has traditionally played an important role in medicine, dating back to medieval times when urine was used to diagnose the health of people based on markers such as colour, smell and taste. Similar to drug discovery, methods and applications of biomarkers have slowly evolved. The discovery and qualification of new biomarkers has usually been the domain of clinical chemists, with a range of metabolic biomarkers being routinely measured today. More recently genomics and proteomics have promised biomarkers for improving the diagnosis and prognosis of pathologies [1].

The apparent usefulness of new biomarkers has made biomarker discovery a tangible aim in the life sciences. Metabolomics and metabolic profiling have recently become very popular in many research projects and often promise discovery of useful novel biomarkers. The majority of these projects, aided by advanced bioinformatics techniques, reveal one or more metabolites at differential levels between sample sets believed to be indicative of a disease, a physiological condition or a similar health-related target. Currently, many research reports and publications designate such metabolites as biomarkers. Most of these studies, however, take the research no further than the discovery stage. Very rarely are these new “biomarkers” validated, leading to an untimely and sometimes undeserved “death” of the biomarker. This lack of validation and translation (called “biomarker qualification” by the US Food and Drug Administration [2, 3]) is undoubtedly detrimental to metabolomics as a science. Indeed, this is exactly where the major scientific challenges now lie, and lack of progress may ultimately lead to a loss of interest in metabolic biomarkers among funding agencies and industry.

This “black cloud” is not only hanging over metabolomics; proteomics has long been promising new biomarkers, but has so far failed to deliver anything clinically applicable [4]. As metabolomics is the newcomer, it should learn from the “mistakes” and shortcomings of proteomics. Importantly, it must be understood that there are significant differences between metabolomics and proteomics. Still, there is little doubt that metabolic biomarkers will follow, just like proteins, a “long and uncertain path to clinical utility” [5].

The aim of this short perspective is to critically discuss the current state-of-the-art of metabolic biomarker discovery, with highlights and shortcomings, and to suggest a pathway to clinical usefulness.

Cynics will argue that discovering differentiating metabolites is the same as spotting the differences between two cartoons commonly found in puzzle books. The main difference is that the data-mining tools involved make this a somewhat more sophisticated and expensive exercise, with the cartoons being generated by multimillion-dollar instruments. Noticing differences between the metabolic profiles in urine samples of a severe diabetic and a healthy mouse is still roughly the same exercise; and it is not much more scientifically interesting or useful. In reality the discovery of differentiating metabolites between sample sets is as far from establishing a qualified and useful biomarker as is finding a compound that is active in an in vitro bioassay from this compound becoming the basis of a new drug.

Humans or animals showing clear, easy-to-diagnose pathophysiological changes are usually very sick creatures suffering from derailed homeostasis: a complex set of problems that may be secondary or tertiary effects of the original cause. Most of these changes are side effects of the disease, or processes by which the metabolism tries to compensate for the perturbation caused by the underlying illness. Metabolic differences between healthy and sick individuals can be significant but are likely to relate to the consequences rather than the causes of the pathology, and to be of little predictive relevance biologically or medically. More importantly, the measurement of these differences by “comprehensive” metabolomics methods will probably lack the precision and accuracy that are needed in a clinical setting. In almost all cases, differentiating metabolites can only become diagnostic biomarkers after a multistep qualification process that rigorously tests precision, accuracy and diagnostic/prognostic value.

The ideal design of an experiment for biomarker discovery incorporates one or more steps towards qualification. At the very least, it should clearly foresee the next required steps. At the same time, editors and referees of scientific journals as well as funding agencies should feel obliged to scrutinize these projects, to make sure that biomarkers can and will go through qualification steps.

In light of this situation, we suggest a different terminology for the biomarker discovery, qualification and application stages (Fig. 1). During the first stage, differentiating metabolites are obtained, in particular for two-way comparisons (sick versus healthy, before and after dosage etc.). More sophisticated projects will use multiple sampling points (over different time points or from different tissues) or a range of stages of disease, concentrations of dosage etc. In such cases, it may be better to call the discovered compounds indicating metabolites; in this article, however, we will continue to use the more general phrase differentiating metabolite. After the differentiating or indicating metabolites are unambiguously identified, the biological relevance to the underlying pathology should become apparent. We suggest that a differentiating metabolite should be moved to the status of candidate biomarker only once there is sufficient evidence regarding its precision and accuracy, possible chemical and biological interferences etc. When the candidate biomarkers are subsequently tested and clearly show established utility, they reach the application stage. In our opinion, they can only genuinely be called biomarkers at that stage (Fig. 1).

Fig. 1 The process of biomarker discovery covers more than finding differentiating metabolites

Differentiating metabolites

The quality of the initial sample set determines the specificity, accuracy and application range of discovered metabolites and determines the chance of successful translation to clinical applications. It is therefore best to use the term differentiating metabolites at this stage, as there is no proof that differentiating metabolites from the initial study will show the same treatment differences in subsequent experiments. The fact that a number of metabolites show a differentiating behaviour may simply be the result of the profiling or sampling method used, or particular circumstances of the experiment rather than a real biological phenomenon. Many biomarker discovery projects focus on health versus disease. Diseases will often result in general perturbations of homeostasis, with many general metabolic pathways affected. Although this can lead to significant changes of some metabolites, it may not be specific to the disease or pathology, resulting only in a marker for general illness rather than a marker for a specific disease.

Experimental design

The experimental design offers two possible strategies, neither of which is error-proof. A highly controlled experiment can be designed, differing only in the single parameter of interest. This will yield a highly specific differentiating metabolite, but may show very little robustness; in other words, it may only be valid under the particular circumstances of the experiment. When that experimental control is relaxed, the accuracy and precision of the differentiating metabolites become less certain. It will be much more difficult to guarantee that other factors are evenly and randomly distributed across the samples, introducing the possibility that metabolic differentiation derives from unknown biases present in the sample set. This will demand larger sample numbers, to gain enough statistical power to determine relevant differentiating metabolites. In the end, it is less important which experimental design is followed, as long as the scientists involved are aware of the possible shortcomings and develop their qualification accordingly.

Biological or analytical differences

Most biomarker discovery projects use unbiased metabolic profiling methods, allowing rapid analysis of large sample sets for multivariate statistics with a sufficient base. Scientists often design highly controlled experiments based on 50 to 100 samples (Table 1). The methods can easily yield over 1,000 variables. Analysing such a large number of analytes in a short analysis time can compromise precision and accuracy compared with traditional targeted analysis. In these statistically underpowered experiments, there will be a high rate of false positive discoveries unless very stringent significance thresholds are applied. Importantly, it is impractical to validate an analytical method before knowing which metabolite is of interest as a biomarker. The first and possibly most essential step towards validating differentiating metabolites is therefore reanalysis of a subset of the samples by using a different method targeted specifically towards the differentiating metabolites, to confirm the initial findings and reduce the possibility of analytical bias of the first method in the discovery phase. This is by far the easiest step in the qualification process and, importantly, does not require new biological experiments. It does, however, leave a chance of unwanted biases in the sample set. It is advisable to re-extract the samples. This procedure is standard practice in other fields of analytical sciences, to assess the accuracy of any new analytical method. Unfortunately, scientists often seem overly confident in their methodologies, and results are often presented without this step. Even the most carefully performed, targeted GC-MS analysis, using isotopically labelled internal standards for the analysed metabolites, can result in CVs of 30% [6], and a less targeted approach will almost certainly introduce a high chance of machine bias in metabolomic analyses.
For scientific publications, journal editors and referees should not accept manuscripts presenting the discovery of differentiating metabolites without a proper qualification using complementary analytical methods. Even for NMR profiling, the use of an alternative analysis to prove both the identity and the quantification should be mandatory, unless differentiating metabolites can be identified and quantified with multiple signals.
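
The scale of the false positive problem described above can be illustrated with a short calculation. The following Python sketch is an illustration only and is not taken from any of the cited studies. It assumes 1,000 variables tested at a conventional significance level of 0.05 (an assumption, as are the function names and the example p-values). It gives the number of false positives expected by chance alone, the Bonferroni-corrected threshold, and the Benjamini-Hochberg procedure as one commonly used, less conservative alternative.

```python
def expected_false_positives(n_variables, alpha):
    """Expected number of false positives if no variable truly differs."""
    return n_variables * alpha


def bonferroni_threshold(n_variables, family_alpha=0.05):
    """Per-variable significance threshold after Bonferroni correction."""
    return family_alpha / n_variables


def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of p-values accepted at the given false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical example: 1,000 variables, none of which truly differs
chance_hits = expected_false_positives(1000, 0.05)
strict_alpha = bonferroni_threshold(1000)

# Hypothetical p-values for ten metabolites
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
accepted = benjamini_hochberg(example_p, fdr=0.05)
```

Under these assumptions, about 50 variables would appear significant by chance at an uncorrected threshold of 0.05, which is why a hit from the discovery phase needs confirmation by an independent targeted method.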

Table 1 Selected, recent metabolic biomarker discovery projects published in peer-reviewed journals

Unambiguous identification

The identification of metabolites remains a major bottleneck for metabolomics. A significant part of the human metabolome remains unidentified. These unknowns might be genuinely unreported or they may not yet be known from human samples. This lack of knowledge seriously hampers the rapid assimilation of metabolomics data in systems biology approaches and obstructs many biomarker projects. The first challenge is recognition of known components of the metabolome. Current initiatives to describe the human metabolome, based either on literature (e.g. Human Metabolome Database (HMDB) [7]) or on chemical data (MassBank [8], Metlin [9]), are extremely helpful and enable rapid dereplication of identification endeavours. International collaborations to consolidate these databases into one comprehensive standardized database are highly desirable. Such a standard database would encourage submission of data from those who currently are unsure about the choice of database. One universal database would not only improve the dereplication process but also assist in the interpretation of metabolomics results and help in understanding the biological relevance of differentiating metabolites. Journals publishing metabolomics data should strive to include the data on structural characterisation in detail, so that this information reaches the public domain. The submission of data should discriminate between unambiguously identified novel metabolites and unknowns or partially identified compounds, but give a similar level of detail in both cases, which could help in identification by other groups.

The process of unambiguous identification is tedious. Many scientific papers content themselves with reporting accurate masses and retention times to “identify” metabolites (Table 1). In many studies, the metabolites are annotated as unknown. Where putative identifications are made, repeat analyses, including standards to verify the identifications, are often omitted.

It almost appears as if the significant amount of effort required to unambiguously identify a metabolite has made it acceptable to simply report differentiating metabolites as unknowns. This should be discouraged, as the usefulness of unidentified metabolites is very limited. In contrast to genomics and proteomics, where pieces of DNA or unknown peptides can at least be sequenced, it is impossible to link unknowns from a metabolomics study to those of other studies, or to metabolic pathways and metabolites within the current study. Furthermore, chemical identification is important for devising new efficient measurement techniques for later stages of qualification, which may be different from the initial discovery.

The subsequent fate of published unknowns is obscure. There are several scientific and practical problems with publishing unknowns. First, it is usually not clear to what extent the authors have tried to identify the unknowns; i.e. an unknown in one paper may be reported as an identified compound in another paper. The second problem is that spectral or chromatographic data are not always adequate or reliable. For example, Table 2 lists a few m/z ratios and retention times from published metabolomics studies. Yin et al. describe a phosphatidylcholine moiety at m/z 184.422 [10]. This number may be precise but it is not very accurate, as the calculated exact mass is 184.1507. Such a lack of accuracy is difficult for a reader to detect and makes database searching impossible. Furthermore, Table 2 gives three different reports for an unknown at m/z 235. All three studies, however, have determined these ratios to a different number of significant figures. It remains unclear whether these three compounds are the same or whether they are three different unknown compounds. Sun et al. propose a molecular formula, but fail to report that an aromatic heterocyclic compound of C8H11N8O is probably not of human origin, weakening their assumption. There is currently no established path to relating unknowns from different studies, but listing only retention time and m/z is clearly not sufficient. The definition of the minimal amount of information needed to identify unknowns across different studies has been the subject of considerable discussion, and an expert group has suggested that a combination of MS/MS spectrum, accurate mass and retention time relative to a standard might be sufficient [11]. These criteria have not been consistently tested yet and more evaluation is required. In the interim, authors, referees and journal editors should apply these minimal criteria.
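
The distinction between a precise and an accurate mass can be made concrete by expressing the deviation in parts per million (ppm). The Python sketch below is a minimal illustration that uses the two m/z values quoted above as inputs. The function names and the 5 ppm matching tolerance are assumptions chosen for illustration, not criteria proposed in the text or in [11].

```python
def mass_error_ppm(observed_mz, reference_mz):
    """Relative deviation of an observed m/z from a reference value, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def matches_reference(observed_mz, reference_mz, tolerance_ppm=5.0):
    """True if the observed m/z lies within the tolerance (assumed: 5 ppm)."""
    return abs(mass_error_ppm(observed_mz, reference_mz)) <= tolerance_ppm


# Values as quoted in the text: reported m/z and calculated exact mass
reported_mz = 184.422
calculated_mz = 184.1507

error_ppm = mass_error_ppm(reported_mz, calculated_mz)
is_searchable = matches_reference(reported_mz, calculated_mz)
```

With these inputs the deviation is well over 1,000 ppm, so a database search with any realistic accurate-mass tolerance would not retrieve the intended compound.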

Table 2 Various markers observed in different studies showing difficulty of comparing retention times (Rt) and m/z of unknowns

The first step in identification is usually classification. Some classes of metabolites are less useful as biomarkers than others, e.g. phospholipids (vide infra), and knowing the class may be sufficient to decide whether further identification is worthwhile. Unambiguous identification of target metabolites means that the chromatographic properties and spectral data of the differentiating metabolite are identical to those of authentic standards analysed in the same laboratory. A less rigorous identification can be based on close similarity to published data. If identifying data for the metabolite are not available from the literature or databases, isolation of the compound followed by multidimensional NMR, X-ray analysis or total synthesis are currently the only practical options to fully elucidate the structure of the metabolite. Obviously, this requires a significant amount of work, especially when the metabolite is present at very low levels (micromolar or less) or only limited tissue amounts are available for isolation. If this identification procedure is impossible, authors should be very reluctant to emphasise their discovery, but more importantly, journal editors should scrutinise the reproducibility and usefulness of the data.

Biomarkers and biology

The unambiguous identification of a differentiating biomarker is important from a biological point of view. For example, phospholipids (phosphatidylcholines) and ceramides might differ between sample groups, but what does that really mean? In five of the studies listed in Table 1, glycerophosphatidylcholines (C16:0 and C18:2) are reported as differentiating metabolites. However, all five studies cover different pathologies in different organisms, ranging from chronic acute deterioration of liver function [12] to intestinal fistula [10], obese rats [13], viral infections of rhesus macaques [14] and abnormal savda (a traditional Uighur medicine diagnosis [15]). Wikoff et al. [14] confirmed the upregulation of phospholipid biosynthesis by additional experiments focusing on regulation of phospholipase A2 isoenzyme. In a further unrelated study on prostate cancer in mice, phospholipid metabolism was also shown to change, as measured by NMR [16]. These six totally unrelated studies imply that several major phospholipids are the first class of compounds to change when the homeostasis of a biological system is perturbed, clearly showing that scientists must be very careful when assigning phospholipids as specific biomarkers. Careful scrutiny of the biochemistry of differentiating metabolites is an essential part of the biomarker discovery process. Understanding the biology is not merely an interesting exercise; it is vital to the next step in biomarker discovery, as it is the best source of information on the likely relevance, specificity and robustness. A biomarker must not merely show a difference between healthy controls and individuals with a certain pathophysiology; it must be a distinctive difference. That is, conceptually the control group is not just a group of healthy individuals, but also a group of subjects with all other pathophysiologies.
The extensive body of physiological literature may provide indications whether other pathophysiological processes as well as normal biological fluctuations of that particular metabolite affect the level of a particular differentiating metabolite, and metabolomics databases such as HMDB can play an important role here. Although the abovementioned example seems to paint a somewhat grim picture of metabolomics-based biomarker discovery, there are several studies that do find new metabolites showing clear biological relations to the underlying pathology [17], and that have been moved towards the qualification process [18, 32].

Finishing stage 1

At this point of the research, it becomes necessary to carry out initial qualification studies to decide whether it is worth advancing a differentiating metabolite to the status of a candidate biomarker. Based on identity, literature data and the initial qualification study, it should be clear whether the differentiating metabolite is worth pursuing. A first step towards initial qualification will be the development of a fast and inexpensive method for biomarker detection. This method has to be precise, accurate and compatible with very large sample sets. The method also needs to be carefully validated and readily transferred to a routine environment. This new fully validated method can then be applied to a completely new sample set, preferably from an experiment specially designed for this purpose. It can be of similar size or even smaller than the original experiment. In this study, fewer variables are being monitored (i.e. only the already identified differentiating metabolites), thus the statistical power of the experiment will be higher. Preferably, the experiment will cover a wider population and a wider set of sampling conditions. The purpose of this experiment is to confirm the differential behaviour of the metabolite(s) and exclude any chance that the differential behaviour originates from unwanted or unexpected biases in the initial sampling (Fig. 2). A positive result in this experiment successfully closes stage 1 of the qualification and promotes the differentiating metabolite to a candidate biomarker.
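
The gain in statistical power from monitoring fewer variables can be sketched numerically. The Python example below is a rough approximation under stated assumptions, not a substitute for a proper power analysis. It assumes a two-sided two-sample z-test, a Bonferroni correction, 25 samples per group and a standardised effect size of 1.0. All of these values, and the function name, are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, n_variables, family_alpha=0.05):
    """Approximate power to detect one metabolite with a two-sided,
    two-sample z-test after Bonferroni correction for n_variables tests."""
    std_normal = NormalDist()
    alpha_per_test = family_alpha / n_variables
    z_critical = std_normal.inv_cdf(1 - alpha_per_test / 2)
    # Expected value of the test statistic under the alternative hypothesis
    shift = effect_size * sqrt(n_per_group / 2)
    # The contribution of the opposite tail is negligible and is ignored
    return std_normal.cdf(shift - z_critical)


# Discovery phase: about 1,000 profiled variables (hypothetical)
power_discovery = approximate_power(1.0, 25, n_variables=1000)

# Initial qualification: only five targeted metabolites (hypothetical)
power_targeted = approximate_power(1.0, 25, n_variables=5)
```

Under these assumptions the approximate power rises from about 0.3 in the discovery setting to above 0.8 in the targeted setting, with the same number of samples.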

Fig. 2 The design of biomarker discovery experiments determines specificity and robustness of biomarkers; more controlled experiments are more specific, but decrease the robustness; less controlled experiments will give more robust biomarkers, but may introduce unexpected biases

Stage 2: Now the hard work begins…

All the preceding work was focused on the question:

Are there any metabolites that differ in concentration levels between samples from the test population indicative of a specific pathophysiology?

At the end of stage 1, there should be a validated positive answer to this question.

In stage 2, the same question will be pursued from a different angle:

Does an abnormal level of this specific metabolite clearly indicate a specific pathophysiology when the other symptoms are considered?

Although the two questions are very similar, the work required to answer the latter question is much more complicated, because to show that a biomarker is specific, it becomes necessary to exclude all other possibilities of false positive or false negative results. It is necessary to generate a sample set significantly more elaborate than the previous experiments, reflecting the expected population to which the biomarker will be applied. The sample set should cover a large range of time points and a large range of severities of the pathophysiology of interest. Care should be taken that factors such as diet, sampling time etc. do not introduce similar biases to those that may have driven metabolic differentiation in the discovery phase. The results will again be the basis for a decision on the efficacy of the candidate biomarker. Using a much larger sample set in combination with a validated measurement will reveal the detailed behaviour of the candidate biomarker with respect to viability in a clinical application. The factors chosen at this stage can significantly affect the applicability of any biomarker-based diagnostic test and should be clearly stated in any report. Subsequently, it is necessary to test the behaviour of the candidate biomarker extensively in healthy individuals, which will show the likelihood of false positives. While database and literature investigations should already have excluded generic biomarkers of ill health (vide supra), it is now necessary to analyse samples from patients with related pathophysiologies, to determine the specificity of the biomarker. At this stage, a full understanding of the behaviour of the candidate biomarker under a wide range of circumstances should be reached. That is, analysis of the biomarker in one sample should give a clear diagnosis, with a known probability for both false positive and false negative results.
It would also be highly desirable to have additional scientific evidence on the mechanisms underlying the relation between candidate biomarker and pathophysiology of interest.
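
The requirement for a known probability of false positive and false negative results can be illustrated with the standard diagnostic test calculations. The Python sketch below uses invented counts and an assumed disease prevalence of 1%. None of these numbers or function names come from the studies discussed here. It shows how error rates are derived from a qualification sample set, and how the chance that a positive result is correct depends on the population to which the test is applied.

```python
def diagnostic_rates(true_pos, false_pos, true_neg, false_neg):
    """Sensitivity, specificity and error rates from classification counts."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
    }


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result reflects true disease."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical qualification study: 100 patients and 500 controls
rates = diagnostic_rates(true_pos=90, false_pos=50, true_neg=450, false_neg=10)

# Hypothetical application to a population with 1% prevalence
ppv_screening = positive_predictive_value(
    rates["sensitivity"], rates["specificity"], prevalence=0.01
)
```

In this invented example both sensitivity and specificity are 0.9, yet fewer than one in ten positive results would be correct at 1% prevalence. This is one reason why the sample set at this stage should reflect the expected population.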

The final stage of the qualification process is a test phase, where biomarker analysis is used routinely next to conventional methods. When the biomarker shows clear advantages over the conventional methodology (and is economically viable), the candidate biomarker can be moved to the stage of a genuine biomarker. This last stage is well removed from the initial discovery phase in terms of processes and time, and publication of interim findings can be warranted. At the current stage of development of metabolomics, innovative research directed at the early stages of the pathway to qualified biomarkers is of wide interest and worth putting on public record. However, it would be very desirable to see the focus of research efforts and publications move from the detection of differentiating metabolites to the process of progressing these initial leads to candidate and qualified biomarkers. Progress in biomarker development will also be advanced if characterising data for the identification of metabolites and evidence for the perturbation of metabolites under different pathophysiologies are not only published but also logged in readily accessible public databases such as HMDB.

Outlook to the future of biomarker-based diagnostics

Metabolomics is still in its infancy but has already made great strides to prove its efficacy in the medical field. A number of reported metabolomic experiments have clearly shown its usefulness in finding diagnostic aids for the medical profession [17, 18, 25, 32]. It is also clear, however, that for metabolomics to advance in this field, a greater degree of scientific rigour will be needed in the future. This will occur either through self-regulation or through legislation, and probably through both. We hope that metabolomics scientists will be at the forefront of this debate rather than trailing it. With the advent of more sensitive and advanced instrumentation and the data handling tools to cope with the large datasets, metabolomics is expected to grow significantly in importance, as long as the practitioners remember to apply good scientific practice in their experiments.