Untargeted Metabolomics Strategies—Challenges and Emerging Directions

Schrimpe-Rutledge, Alexandra C.; Codreanu, Simona G.; Sherrod, Stacy D.; McLean, John A.

doi:10.1007/s13361-016-1469-y

Critical Insight
Published: 13 September 2016

Volume 27, pages 1897–1905 (2016)
Journal of The American Society for Mass Spectrometry

Abstract

Metabolites are the building blocks of cellular function: they participate in enzyme-catalyzed chemical reactions and are essential to the cell. Upstream biological disruptions result in a series of metabolomic changes and, as such, the metabolome holds a wealth of information that is thought to be most predictive of phenotype. Uncovering this knowledge is a work in progress. The field of metabolomics is still maturing; the community has leveraged proteomics experience when applicable and developed a range of sample preparation and instrument methodology along with myriad data processing and analysis approaches. Research focus has now shifted toward a fundamental understanding of the biology responsible for metabolomic changes. There are several types of metabolomics experiments, including both targeted and untargeted analyses. While untargeted, hypothesis-generating workflows exhibit many valuable attributes, challenges inherent to the approach remain. This Critical Insight comments on these challenges, focusing on the identification process in LC-MS-based untargeted metabolomics studies, specifically in mammalian systems. Biological interpretation of metabolomics data hinges on the ability to accurately identify metabolites. The range of confidence associated with identifications, which is often overlooked, is reviewed, and opportunities for advancing the metabolomics field are described.

Introduction

The ultimate goal of metabolomics is the comprehensive study of the low molecular weight molecules within an organism. Metabolites are the result of both biological and environmental factors and, as such, provide great potential to bridge knowledge of genotype and phenotype. Metabolomics is often likened to its proteomics sibling and has leveraged proteomics experience, but the field has evolved with inherently different challenges, including the identification process. Peptides and proteins are typically linear polymers and can be sequenced. Proteins are inferred by matching identified experimental peptides against in silico fragmentation spectra. Metabolites are more challenging to annotate. These small molecules often lack a common building block, although they commonly draw on the elements C, H, O, N, S, P, and potentially other heteroatoms. It is often assumed that untargeted mass spectrometry (MS)-based metabolomics analysis will result in a large list of ‘identified’ small molecules that can be mapped to networks and pathways, yet high confidence analyte assignments/identifications may not be possible owing to the fundamental challenges of the metabolomic identification process. For example, features (i.e., mass-to-charge ratio and retention time pairs) can be assigned to a vast number of tentative or preliminary structures, or there may be no candidate matches in curated databases. Because metabolites lack a genetic template such as that for proteins, metabolomics databases are currently considered incomplete. In silico metabolite databases can provide guidance, but validation of retention times and MS/MS fragmentation data with a reference standard is nearly always required for confident metabolite identification.

Since its inception, the metabolomics field focus has shifted from detecting changes to understanding the biology leading to the changes [1], and thus the accuracy of metabolite assignments is extremely important. In this Critical Insight, we will discuss various challenges inherent to LC-MS-based metabolomics and describe the ranges of confidence for small molecule annotations when performing global metabolomic analyses, a concept essential for applying metabolomic data toward a better understanding of the mechanisms of human health and disease.

LC-MS-Based Metabolomics: Strengths and Challenges

Metabolomics experiments aim to characterize diverse classes of small molecules from a variety of sample types (e.g., cell extracts, culture media, urine, serum, etc.). The metabolomics community has been influenced by numerous aspects of proteomics methodology such as separation technologies, state-of-the-art instrumentation, and data processing approaches. However, there are fundamental differences in MS-based metabolomics versus proteomics that are important to recognize. Table 1 outlines the strengths and challenges in contemporary metabolomics relative to proteomic analyses.

Table 1 Current Scale, Strengths, and Challenges of MS-Based Metabolomics and MS-Based Proteomics

The annotated human metabolome is considered to be less complex than the proteome [2, 3], yet the diverse chemical structures exhibit a wide range of concentration, solubility, polarity, and volatility [4]. Proteomics samples often require a multi-step preparation that may involve cell lysis, purification, enzymatic digestion, and solid phase extraction [5]. Sample preparation for metabolomics involves cell lysis and metabolite extraction [6], although purification and fractionation can also be performed. Metabolomics analyses are challenged by the rapid temporal dynamics of analytes and by sample composition that reflects both endogenous and exogenous species (e.g., drugs, toxins, microorganisms, and nutrients) [1]. While proteomic analyses can often differentiate organism species based on protein sequence [6] (which is particularly useful in microbiome studies), species determination in metabolomics is challenging because many small molecules are common across different organisms [7]. This can, however, be advantageous for metabolomics animal model studies, as knowledge of the physical properties guiding identifications can be shared across species. Another major difference between proteomic and metabolomic technologies involves the interpretation of fragmentation data. Known protein sequences and enzyme cleavage patterns enable predictable peptide sequences and fragmentation spectra. Further, the large size of protein molecules often results in multiple peptides being observed, thus increasing confidence of protein identification. In metabolomics studies, by contrast, the small size and wide array of molecular structures of metabolites mean that each metabolite is observed as a single species with no consensus fragmentation pattern.

Targeted and Untargeted Metabolomic Studies

Figure 1 outlines the goals and the types of data sets that are generated in targeted and untargeted/global metabolomic studies. In general, targeted approaches are aimed at identifying and quantifying a limited number (tens to hundreds) of known metabolites, such as those commonly encountered in clinical analyses. Many untargeted or hypothesis-generating approaches focus on acquiring data for as many species as possible, annotating metabolites, and reviewing both known and unknown metabolic changes. Data can be used for relative quantification across sample groups and to provide hypotheses that can be further studied with targeted approaches. There are two broad approaches for data acquisition in untargeted metabolomics studies. The first method uses full scan MS1 to generate accurate mass measurements for individual molecules (i.e., features) to permit statistical calculations, followed by data-dependent acquisition (DDA) of a subset of samples to guide identifications. Similar to conventional proteomics techniques, metabolomics DDA methods generate fragmentation patterns for the metabolites exhibiting the highest signal intensity. A second untargeted metabolomics approach is based on data-independent acquisition (DIA), where workflows integrate full MS1 with MS/MS fragmentation for all precursor ions either simultaneously (MS^E [8]) or in finite mass ranges (SWATH [9]). DIA methods produce complicated fragmentation spectra, and the link between precursor and product can be difficult to decipher. In downstream data analysis steps, fragment ions are matched with precursor ions based on retention time, mass, and drift time (when applicable). DIA allows fragmentation data to be acquired regardless of metabolite signal intensity. Both DDA and DIA approaches ultimately define features with mass-to-charge ratio (m/z), retention time (RT), and, when ion mobility is used, drift time (DT) descriptors. In the identification step, precursor ions and corresponding fragment ions are searched against databases for metabolite assignments.
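The intensity bias of DDA described above can be illustrated with a minimal sketch. It assumes a simple "top-N" rule in which the N most intense MS1 peaks of a scan are chosen for fragmentation; the function name, the `top_n` and `min_intensity` parameters, and the peak values are hypothetical and are not taken from any instrument vendor's software.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_dda_precursors(ms1_peaks: List[Peak],
                          top_n: int = 3,
                          min_intensity: float = 0.0) -> List[float]:
    """Return the m/z values of the top_n most intense MS1 peaks.

    Illustrative top-N rule only: peaks below min_intensity are ignored,
    and the remainder are ranked by intensity (highest first).
    """
    eligible = [p for p in ms1_peaks if p[1] >= min_intensity]
    eligible.sort(key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in eligible[:top_n]]


# Hypothetical MS1 scan: four peaks with made-up intensities.
scan = [(180.0634, 5.0e5), (132.1019, 2.0e6),
        (205.0972, 8.0e4), (166.0863, 9.5e5)]
print(select_dda_precursors(scan, top_n=2))  # [132.1019, 166.0863]
```

With `top_n=2`, the two low-abundance peaks never receive an MS/MS spectrum, which is the limitation that DIA workflows are meant to avoid.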

One major advantage of untargeted metabolomics is the collection of data without preexisting knowledge; however, this comes with the caveat that sample preparation and analytical methods directly affect the qualitative results that are obtained. Owing to the diverse composition of the metabolome [10], sample preparation steps, separation methods, and instrument platform and parameters will influence the subset of metabolites detected.

Analytical Platforms

A variety of separation [liquid chromatography (LC), gas chromatography (GC), and capillary electrophoresis (CE)] and detection [MS and nuclear magnetic resonance (NMR)] methods are used for metabolomics experiments. We focus on LC-MS-based metabolomics as it has become a leading technology for both polar and nonpolar small molecule analyses and draws many parallels with LC-MS-based proteomics analyses referenced herein. LC methods are time-consuming (minutes to hours) compared with direct infusion or flow injection analyses (seconds to minutes) [6]. However, the ability of LC to increase both selectivity and data content makes it invaluable [11], particularly for complex metabolomics samples such as human blood where an average of three isomers or isobars per nominal mass are estimated [12].

The coupling of ion mobility (IM) separations with LC-MS-based analyses represents an emerging technology (LC-IM-MS) for metabolomics research. Ion mobility resolves gas phase ions based on their size-to-charge ratio or gas phase packing efficiency, complementing polarity and mass separations. The addition of ion mobility separation offers increased peak capacity [2], the ability to decrease chromatography time without sacrificing resolution, and opportunities to separate co-eluting precursors [12]. Rapid (millisecond) IM separations fit well within the time scales of most MS platforms; multiple IM spectra are acquired for each LC peak, and multiple mass spectra (microsecond time scale) are acquired for each IM spectrum [13]. In addition to improved mass spectra quality and increased selectivity, IM measurements can be used to determine collision cross sections (CCS) for individual metabolites. Unlike RT measurements, which vary with column chemistry, mobile phase, and elution gradient, CCS values are physical properties and are not influenced by MS or LC settings. Inter-laboratory CCS values are reported to agree to within 5% for a broad range of molecules assayed [14]. This precision is improving rapidly with the development of more standardized protocols for CCS measurements.

Analysis and Identification

Untargeted metabolomics data processing workflows incorporate several defined steps including noise filtering, peak detection, peak deconvolution, retention time alignment, and finally feature annotation. Importantly, features are not always metabolites; related species (e.g., isotopes, neutral losses, adducts) of a single metabolite may be present with different m/z values. Metabolite identification is necessary to draw biological conclusions from untargeted metabolomics data. Analyte identification can be performed by searching the experimental MS1 or MS/MS data through databases available to the public for free (e.g., ChemSpider (http://www.chemspider.com), METLIN [15], Human Metabolome DataBase (HMDB) [16], MassBank [17], mzCloud (https://www.mzcloud.org), GNPS (http://gnps.ucsd.edu/), and LipidBlast [18]), or for a nominal fee [e.g., NIST Mass Spectral Library (http://chemdata.nist.gov)]. Batch searching MS/MS fragmentation spectra within these databases, however, is often not possible without commercial software. Given that numerous libraries are generally queried to maximize metabolome coverage, bioinformatics efforts are necessary to remove or reduce match redundancy. This process can be complicated since metabolite nomenclature is not entirely standardized and varies greatly by database.
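Because several features can arise from one metabolite, an MS1 database lookup usually has to consider more than one adduct form. The sketch below shows one way this might be done, assuming positive-mode adducts, a tolerance expressed in ppm, and a toy in-memory dictionary standing in for a real database. The function names, the 5 ppm default, and the three-entry database are illustrative assumptions; none of the public databases listed above is actually queried.

```python
# Mass shifts (Da) of common singly charged positive-mode adducts.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
}


def neutral_mass(mz: float, adduct: str) -> float:
    """Infer the neutral monoisotopic mass from an m/z and an assumed adduct."""
    return mz - ADDUCT_SHIFTS[adduct]


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidate_matches(mz, database, tol_ppm=5.0):
    """List (name, adduct, ppm error) for every entry within tol_ppm."""
    hits = []
    for adduct in ADDUCT_SHIFTS:
        mass = neutral_mass(mz, adduct)
        for name, mono_mass in database.items():
            err = ppm_error(mass, mono_mass)
            if abs(err) <= tol_ppm:
                hits.append((name, adduct, round(err, 2)))
    return hits


# Toy database (monoisotopic masses); glucose and fructose share C6H12O6.
toy_db = {"glucose": 180.06339, "fructose": 180.06339, "citric acid": 192.02700}

for hit in candidate_matches(203.05261, toy_db):
    print(hit)
```

In this toy case the query m/z matches both hexoses as a sodium adduct, a reminder that an accurate mass match alone cannot separate isomers.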

Feature annotation is performed by comparing an experimental mass measurement to a database of known metabolites within a mass tolerance window to generate potential candidates. Thus, the development of high-resolution, high-mass-accuracy instruments has proven to be invaluable for discovery (MS1) and heuristic validation (MS2) metabolomics efforts. As illustrated in Figure 2, it is difficult for MS mass measurement alone to provide metabolite information beyond molecular formula. Kind and Fiehn demonstrated that high mass accuracy measurements (<1 ppm error) were inadequate for determining the elemental composition of numerous metabolites [19]; notably, the authors later showed that isotope ratio measurements were more important than mass accuracy for determining the most probable elemental composition for small molecules [20]. Additional information, such as fragmentation data, is essential for structure elucidation of a mass measurement. Putative identifications require matching an experimental MS/MS spectrum with a reference fragmentation spectrum [21, 22]. Metabolomics spectral libraries have been created with experimental data from commercially available or synthesized standards. Significant efforts are being made to routinely update content as new compounds are analyzed; nevertheless, these libraries are still considered incomplete [23]. MS/MS data are often insufficient to differentiate structural and stereoisomers. Orthogonal evidence is needed in these cases and when experimental MS/MS data are non-discriminating. LC and IM can be used to generate retention time and collision cross section information, respectively. Both of these separation methods are capable of resolving some isomeric/isobaric species. IM has even shown utility for differentiating lipids based on the position of a double bond, which often cannot be accomplished by LC [24]. MS-based metabolomic studies are performed on numerous different instrument platforms; ion intensities and fragmentation patterns vary based on analytical conditions including instrument, ionization source, and collision energy [17]. For small molecules, a collision energy that depletes some precursors may have little effect on others. The selection of isobaric co-eluting precursor ions for fragmentation may further complicate experimental MS/MS data. MS/MS matching can be subjective. Scores are generated to represent similarities between experimental data from an unknown versus that from the standard, and assignments are often made using the best match. False positives and false negatives may be the result of low quality spectra and incomplete databases, respectively. Many opportunities exist for the development of methods to estimate these error rates, as well as a confidence metric for scoring MS/MS matches [11].
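The text does not specify which similarity score is used for MS/MS matching. As an illustration only, the sketch below uses a cosine score, one widely used choice, computed over fragment peaks paired within an m/z tolerance. The function name, the 0.01 Da tolerance, the greedy one-to-one pairing, and the example spectra are all assumptions.

```python
import math
from typing import List, Tuple

Spectrum = List[Tuple[float, float]]  # (fragment m/z, intensity)


def cosine_score(query: Spectrum, reference: Spectrum, tol: float = 0.01) -> float:
    """Cosine similarity between two centroided MS/MS spectra.

    Each query peak is paired with the closest unused reference peak
    within tol Da; unpaired peaks contribute only to the norms.
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        best = None
        for i, (r_mz, _) in enumerate(reference):
            if i in used or abs(q_mz - r_mz) > tol:
                continue
            if best is None or abs(q_mz - r_mz) < abs(q_mz - reference[best][0]):
                best = i
        if best is not None:
            used.add(best)
            dot += q_int * reference[best][1]
    q_norm = math.sqrt(sum(i * i for _, i in query))
    r_norm = math.sqrt(sum(i * i for _, i in reference))
    if q_norm == 0.0 or r_norm == 0.0:
        return 0.0
    return dot / (q_norm * r_norm)


# Hypothetical spectra: one shared fragment, one unshared fragment each.
experimental = [(100.0, 1.0), (200.0, 1.0)]
library = [(100.0, 1.0), (300.0, 1.0)]
print(cosine_score(experimental, library))
```

A score like this ranks candidates but does not by itself say how often the top-ranked match is wrong, which is why a separate confidence or error-rate estimate is still needed.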

A subset of experimental metabolomics data does not match any database entry. Null matches may represent truly new metabolites or simply known metabolites that are missing from or do not match the spectral database (e.g., in-source fragments or metabolites modified by enzyme activity) [21]. Characterization of these “unknown” unknowns requires significant effort, such as that often encountered in natural product discovery of secondary metabolites [25, 26]. Algorithms geared toward predicting and comparing small molecule in silico and experimental MS/MS data are also currently available (e.g., MetFrag [22]); however, significant opportunities exist for the refinement and further development of these tools. The addition of IM data is informative; mobility-mass correlations as well as CCS/mass ratios can guide unknown identifications by indicating the likely molecular class and by excluding unlikely candidates on the basis of structure. Established metabolomics labs and metabolomics centers have fixed chromatography methods that are robust, reliable, and yield stable retention times. High quality RT and MS/MS fragmentation data of pure reference standards have been acquired for in-house libraries. These efforts certainly increase identification confidence, but are not feasible for most small research groups. Thus, sharing methodologies and data will benefit the entire metabolomics community.

Confidence Levels

Metabolite annotation is the crucial link between acquired data and meaningful biological information. It is essential that the confidence of metabolite assignments is transparent. In 2007, the Chemical Analysis Working Group (CAWG) of the Metabolomics Standards Initiative (MSI) published a first stage of guidelines for reporting the minimum metadata relative to metabolite identification as a means to communicate the confidence of identifications [27]. Recently, revisions to these levels have been proposed to cover special cases where level determination may be unclear [28, 29]. We propose modest changes to include orthogonal IM-MS data as evidence for metabolite identification (Figure 3).

The highest confidence identification, a validated identification (Level 1), confirms a structure with a minimum of two orthogonal properties from a pure reference standard acquired under identical analytical conditions. A putative identification (Level 2) lacks data from a reference standard but is supported by predictive or externally acquired structure evidence, namely MS/MS data exhibiting diagnostic fragments or neutral losses consistent with a specific structure. Preliminary identifications (Level 3) arise when accurate mass and isotopic distribution patterns produce tentative structures from database searches. Note that a single molecular formula typically yields multiple candidate structures. Molecular formula candidates (Level 4) and a deconvoluted experimental m/z (Level 5) complete the less confident annotation classifications.
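The five levels described above can be written as a simple decision rule. The sketch below is one possible encoding of that paragraph, not an official MSI or CAWG implementation; the `Evidence` fields and their names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    # Orthogonal properties (e.g. RT, MS/MS, CCS) matched to a pure reference
    # standard acquired under identical analytical conditions.
    standard_matched_properties: int = 0
    # Predicted or externally acquired MS/MS consistent with one structure.
    msms_consistent_with_structure: bool = False
    # Tentative structures from accurate mass + isotope pattern searches.
    database_structure_candidates: int = 0
    # Molecular formula candidates for the feature.
    formula_candidates: int = 0


def confidence_level(ev: Evidence) -> int:
    """Map available evidence to a confidence level (1 = highest, 5 = lowest)."""
    if ev.standard_matched_properties >= 2:
        return 1  # validated identification
    if ev.msms_consistent_with_structure:
        return 2  # putative identification
    if ev.database_structure_candidates > 0:
        return 3  # preliminary identification
    if ev.formula_candidates > 0:
        return 4  # molecular formula candidate(s)
    return 5      # deconvoluted experimental m/z only


print(confidence_level(Evidence(standard_matched_properties=2)))    # 1
print(confidence_level(Evidence(database_structure_candidates=4)))  # 3
```

Reporting the level alongside each annotation keeps the confidence of every assignment visible to downstream users.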

Suggestions have been made to clarify the set of reporting standards with the inclusion of an evidence-based quantitative score [30]. With either a score- or level-based system, the future of annotation is likely to be influenced by multiplexed technologies. Recently, Pacini et al. obtained five levels of small molecule data in a single DIA acquisition (LC, UV, IM, MS, MS/MS) [31]. Advances of multidimensional analytical approaches are inherently the most promising for the broadest metabolome coverage. Orthogonal in-line data can provide the needed evidence to meet minimum data requirements for confident identifications. At the present time, LC and IM are successfully multiplexed with MS providing RT and CCS data, respectively. As IM-MS becomes more widespread and CCS data is populated in searchable metabolomic databases and libraries, identifications using this knowledge will increase metabolite assignment confidence. For metabolite candidates that lack an available reference standard, a quantitative structure retention relationship (QSRR) model can predict retention times [32] and computational calculations can estimate CCS values [33]. The class-specific relationship in IM is also valuable evidence to support both annotations of metabolites and exclusion of unlikely candidates. For example, only correlated molecular classes based on IM trend lines or retention times based on polarity may be considered for identification purposes.
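As a rough illustration of how RT and CCS could be used to exclude unlikely candidates, the sketch below filters a candidate list against a measured feature. The tolerances (0.5 min for RT, 2% for CCS), the dictionary layout, and all numeric values are hypothetical. Candidates without a library value for a property are kept rather than rejected, which is a design choice made here and not a recommendation from the text.

```python
def within_percent(observed: float, reference: float, tol_percent: float) -> bool:
    """True if observed deviates from reference by at most tol_percent."""
    return abs(observed - reference) / reference * 100.0 <= tol_percent


def filter_candidates(feature, candidates, rt_tol_min=0.5, ccs_tol_percent=2.0):
    """Keep the names of candidates whose known RT and CCS agree with the feature."""
    kept = []
    for cand in candidates:
        rt_ok = (cand.get("rt") is None
                 or abs(feature["rt"] - cand["rt"]) <= rt_tol_min)
        ccs_ok = (cand.get("ccs") is None
                  or within_percent(feature["ccs"], cand["ccs"], ccs_tol_percent))
        if rt_ok and ccs_ok:
            kept.append(cand["name"])
    return kept


# Hypothetical feature (RT in min, CCS in square angstroms) and candidates.
feature = {"rt": 5.2, "ccs": 150.0}
candidates = [
    {"name": "A", "rt": 5.3, "ccs": 151.0},    # agrees on both properties
    {"name": "B", "rt": 5.25, "ccs": 160.0},   # CCS differs by more than 2%
    {"name": "C", "rt": 9.0, "ccs": 150.5},    # RT differs by more than 0.5 min
    {"name": "D", "rt": None, "ccs": None},    # no library values: not excluded
]
print(filter_candidates(feature, candidates))  # ['A', 'D']
```

The reference values could equally come from predicted RT or computed CCS, in which case wider tolerances would probably be appropriate.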

False Discovery Rate

False positive identifications are a significant challenge for metabolomics. As described above, annotations arise by querying neutral mass against a database of candidate small molecule masses. Neutral masses are inferred from experimental m/z, and thus the presence of related isotope and adduct features may complicate neutral mass determination and potentially lead to false positive identifications. False positives can also arise during MS/MS spectrum matching. Statistical tools for estimating the error of metabolite-spectrum matches are necessary for evaluating the confidence of annotation results. The inclusion of experimental orthogonal data such as RT and CCS data will decrease false positives, but there is currently no agreed upon metric to assess false discovery rate (FDR) of metabolite identifications. In MS-based proteomic studies, target-decoy search-based FDR calculations are widely accepted [34, 35]. Briefly, predicted peptide MS/MS spectra are used to create a reverse decoy database, and experimental data matches are used to estimate FDR. In principle, a target-decoy strategy could be utilized for metabolomics using a small molecule set exclusive of the experimental species, but since metabolomics databases are incomplete, this approach is currently challenging [23]. There are reports of novel FDR methods exclusive of decoy approaches, though none has yet gained widespread acceptance. For example, a simulation model that uses the rate of a match for elemental composition search queries [36], the incorporation of a spectrum similarity score with a competition score for GC × GC/TOF-MS data [37], and a mixture modeling method coined GREAZY for phospholipids [38] have all been suggested. Querying the predicted molecular formula of experimental data against a decoy set of theoretically possible candidates has been proposed; however, the inflated search space increases the chance of a false positive identification, and it is difficult to distinguish artificial compositions from legitimate candidate metabolites [36]. Potential inclusion of the Seven Golden Rules [20] may offer an approach to classify the legitimate candidate list for exclusion from the decoy elemental composition list.
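The target-decoy idea borrowed from proteomics can be sketched in a few lines. This assumes separate target and decoy searches of equal size and uses the simple estimate FDR ≈ (decoy matches above threshold) / (target matches above threshold). The function names and scores are made up for illustration, and the sketch does not address the harder problem raised above of constructing a valid decoy set for small molecules.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as decoy hits / target hits."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr=0.05):
    """Lowest observed target score whose estimated FDR is <= max_fdr."""
    for t in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None  # no threshold satisfies the requested FDR


# Hypothetical match scores from a target search and a decoy search.
targets = [0.95, 0.90, 0.85, 0.80, 0.60, 0.50, 0.40, 0.30]
decoys = [0.55, 0.45, 0.35, 0.20]

print(estimate_fdr(targets, decoys, 0.5))          # 1 decoy / 6 targets
print(threshold_for_fdr(targets, decoys, 0.05))    # 0.6
```

The estimate is only as good as the decoys: if decoy compounds are systematically easier or harder to match than real false candidates, the reported FDR will be biased.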

Biological Analysis of Metabolites

Biological interpretation of metabolomics data and, ultimately, systems biology studies hinges on the ability to accurately identify metabolites so they can be mapped to pathways and networks. Data from an untargeted metabolomics experiment are challenging to visualize and interpret due to the high density of measurements. This challenge is amplified by the fact that numerous features are identified with varying levels of confidence. Table 2 outlines several open source options for analyzing metabolomics data depending on identification confidence level (unique feature to validated identification, described in the Confidence Levels section above). Subsets of data may be analyzed using different tools based on the data obtained (e.g., MS/MS fragmentation spectra are often only available for higher abundance ions from DDA analyses). Statistical and multivariate analyses are applied to prioritize data; multiple hypothesis testing, data dimension reduction (e.g., principal component analysis (PCA) scores or loadings plots and partial least squares (PLS) modeling), data visualization (e.g., cloud plots [39]), and clustering (e.g., self-organizing map (SOM) [40]) can reveal altered ion abundances and patterns that may be characteristic of the phenotype.
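Because thousands of features are tested at once, per-feature p-values need a multiple testing correction before features are prioritized. The text does not name a specific procedure; the sketch below uses the Benjamini-Hochberg adjustment as one common example, with made-up p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four features.
p_vals = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_vals))
```

Features whose adjusted value falls below the chosen cutoff would then be carried forward to annotation and pathway analysis.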

Table 2 Confidence Annotation, Statistical Evaluation, and Selected Bioinformatics Tools

Most of the existing analysis tools require a list of identified metabolites to integrate biological knowledge [42–45]. New techniques for placing small molecules in a biological context are now being presented, relying on the integration of systems biology tools. For example, genomic and metabolomic data have been combined in a mining workflow to identify pharmaceutical candidates [26]. Another innovative approach utilizes the fact that single upstream biological disruptions result in a cascade of metabolomic changes. The creation of informatic strategies, such as mummichog [41], which predict biological activity from MS1 data rather than formal MS2-dependent identifications, is an attractive concept as it circumvents identification challenges. Importantly, a high level of agreement between identifications from mummichog results and conventional identification pipelines is found. This software uses the accurate mass of m/z features to map candidate metabolites to genome-scale metabolic networks and calculates local enrichment of metabolites to distinguish those networks from a stochastic distribution of metabolites [41]. Modules are generated to reveal areas of network activity, as illustrated in Figure 4, which presents a comparison of metabolomic profiles of glucose 6-phosphate dehydrogenase deficient (G6PDd) and normal human erythrocytes. These data are then used to focus additional efforts on validating the prioritized metabolites from the multitude of possibilities, including isomeric species, from database searching.
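The notion of testing whether candidate metabolites cluster in a pathway more than expected by chance can be illustrated with a hypergeometric tail probability. This is a simplified stand-in and not the mummichog algorithm, which works on genome-scale networks and compares against a null distribution built from the data; the function name and the counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(pathway_hits, pathway_size, total_hits, network_size):
    """P(X >= pathway_hits) when total_hits metabolites are drawn at random
    from a network of network_size, of which pathway_size are in the pathway."""
    upper = min(pathway_size, total_hits)
    denom = comb(network_size, total_hits)
    tail = sum(
        comb(pathway_size, k) * comb(network_size - pathway_size, total_hits - k)
        for k in range(pathway_hits, upper + 1)
    )
    return tail / denom


# Hypothetical: 10 metabolites in the network, 3 in the pathway,
# 3 significant candidates, all 3 of which fall in the pathway.
print(hypergeom_enrichment_p(3, 3, 3, 10))  # 1/120
```

In an MS1-based setting each m/z feature may map to several candidate metabolites, so any real enrichment calculation has to account for that ambiguity; this sketch does not.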

Summary

This is an exciting time for metabolomics research. Tremendous progress has been made in establishing the foundation the field needs to mature. The metabolomics community now has the opportunity to address the high-reward challenges associated with MS/MS data interpretation, database content, isomer resolution, identification confidence, and FDR estimation. Innovative research and development is essential, particularly at the interface of the biomedical, cheminformatics, and bioinformatics fields. The metabolome is thought to be most predictive of phenotype; thus, novel ideas that address these challenges will allow the field to better understand mechanisms underlying health and disease.

References

Johnson, C.H., Ivanisevic, J., Siuzdak, G.: Metabolomics: beyond biomarkers and towards mechanisms. Nat. Rev. Mol. Cell Biol. 17, 451–459 (2016)
May, J.C., McLean, J.A.: Advanced multidimensional separations in mass spectrometry: navigating the big data deluge. Annu. Rev. Anal. Chem. 9, 387–409 (2016)
Kim, M.-S., Pinto, S.M., Getnet, D., Nirujogi, R.S., Manda, S.S., Chaerkady, R.: A draft map of the human proteome. Nature 509, 575–581 (2014)
Beisken, S., Eiden, M., Salek, R.M.: Getting the right answers: understanding metabolomics challenges. Expert. Rev. Mol. Diagn. 15, 97–109 (2015)
Aebersold, R., Mann, M.: Mass spectrometry-based proteomics. Nature 422, 198–207 (2003)
Fuhrer, T., Zamboni, N.: High-throughput discovery metabolomics. Curr. Opin. Biotechnol. 31, 73–78 (2015)
Goodacre, R., Vaidyanathan, S., Dunn, W.B., Harrigan, G.G., Kell, D.B.: Metabolomics by numbers: acquiring and understanding global metabolite data. Trends Biotechnol. 22, 245–252 (2004)
Plumb, R.S., Johnson, K.A., Rainville, P., Smith, B.W., Wilson, I.D., Castro-Perez, J.M.: UPLC/MSE; a new approach for generating molecular fragment information for biomarker structure elucidation. Rapid Commun. Mass Spectrom. 20, 1989–1994 (2006)
Gillet, L.C., Navarro, P., Tate, S., Röst, H., Selevsek, N., Reiter, L.: Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteom. 11, 1–17 (2012)
Sana, T.R., Waddell, K., Fischer, S.M.: A sample extraction and chromatographic strategy for increasing LC/MS detection coverage of the erythrocyte metabolome. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 871, 314–321 (2008)
Cajka, T., Fiehn, O.: Toward merging untargeted and targeted methods in mass spectrometry-based metabolomics and lipidomics. Anal. Chem. 88, 524–545 (2016)
Kaplan, K.A., Hill, H.H.: Metabolomics using ion mobility mass spectrometry. In: Lutz, N.W., Sweedler, J.V., Wevers, R.A. (eds.) Methodologies for metabolomics-experimental strategies and techniques, p. 185–204. Cambridge University Press, Cambridge (2013)
May, J.C., Goodwin, C.R., McLean, J.A., Lyubimov, A.V.: Gas-phase ion mobility-mass spectrometry (IM-MS) and tandem IM-MS/MS strategies for metabolism studies and metabolomics. In: Lyubimov, A. (ed.) Encyclopedia of drug metabolism and drug interactions, p.1–29. John Wiley and Sons, Inc., Hoboken (2012)
Paglia, G., Williams, J.P., Menikarachchi, L., Thompson, J.W., Tyldesley-Worster, R., Halldórsson, S.: Ion mobility derived collision cross sections to support metabolomics applications. Anal. Chem. 86, 3985–3993 (2014)
Smith, C.A., O'Maille, G., Want, E.J., Qin, C., Trauger, S.A., Brandon, T.R.: METLIN: a metabolite mass spectral database. Ther. Drug Monit. 27, 747–751 (2005)
Wishart, D.S., Jewison, T., Guo, A.C., Wilson, M., Knox, C., Liu, Y.: HMDB 3.0—The Human Metabolome Database in 2013. Nucleic Acids Res. 41, D801–D807 (2013)
Horai, H., Arita, M., Kanaya, S., Nihei, Y., Ikeda, T., Suwa, K.: MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703–714 (2010)
Kind, T., Liu, K.-H., Yup Lee, D., DeFelice, B., Meissen, J.K., Fiehn, O.: LipidBlast—in-silico tandem mass spectrometry database for lipid identification. Nat. Methods 10, 755–758 (2013)
Kind, T., Fiehn, O.: Metabolomic database annotations via query of elemental compositions: mass accuracy is insufficient even at less than 1 ppm. BMC Bioinformatics 7, 234–243 (2006)
Kind, T., Fiehn, O.: Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 8, 105–124 (2007)
Tachibana, C.: What's next in ‘omics’: the metabolome. Science 345, 1519–1521 (2014)
Wolf, S., Schmidt, S., Müller-Hannemann, M., Neumann, S.: In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinformatics 11, 148–159 (2010)
Matsuda, F.: Rethinking mass spectrometry-based small molecule identification strategies in metabolomics. Mass Spectrom. (Tokyo) 3, S0038 (2014)
Groessl, M., Graf, S., Knochenmuss, R.: High resolution ion mobility-mass spectrometry for separation and identification of isomeric lipids. Analyst 140, 6904–6911 (2015)
Sherrod, S.D., McLean, J.A.: Systems-wide high-dimensional data acquisition and informatics using structural mass spectrometry strategies. Clin. Chem. 62, 77–83 (2016)
Maansson, M., Vynne, N.G., Klitgaard, A., Nybo, J.L., Melchiorsen, J., Nguyen, D.D.: An integrated metabolomic and genomic mining workflow to uncover the biosynthetic potential of bacteria. mSystems. 1, e00028–15 (2016)
Sumner, L.W., Amberg, A., Barrett, D., Beale, M.H., Beger, R., Daykin, C.A.: Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics 3, 211–221 (2007)
Jeon, J., Kurth, D., Hollender, J.: Biotransformation pathways of biocides and pharmaceuticals in freshwater crustaceans based on structure elucidation of metabolites using high resolution mass spectrometry. Chem. Res. Toxicol. 26, 313–324 (2013)
Schymanski, E.L., Jeon, J., Gulde, R., Fenner, K., Ruff, M., Singer, H.P.: Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014)
Creek, D.J., Dunn, W.B., Fiehn, O., Griffin, J.L., Hall, R.D., Lei, Z.: Metabolite identification: are you sure? And how do your peers gauge your confidence? Metabolomics 10, 350–353 (2014)
Pacini, T., Fu, W., Gudmundsson, S., Chiaravalle, A.E., Brynjolfson, S., Palsson, B.O.: Multidimensional analytical approach based on UHPLC-UV-ion mobility-MS for the screening of natural pigments. Anal. Chem. 87, 2593–2599 (2015)
Creek, D.J., Jankevics, A., Breitling, R., Watson, D.G., Barrett, M.P., Burgess, K.E.: Toward global metabolomics analysis with hydrophilic interaction liquid chromatography-mass spectrometry: improved metabolite identification by retention time prediction. Anal. Chem. 83, 8703–8710 (2011)
Lanucara, F., Holman, S.W., Gray, C.J., Eyers, C.E.: The power of ion mobility-mass spectrometry for structural characterization and the study of conformational dynamics. Nat. Chem. 6, 281–294 (2014)
Käll, L., Storey, J.D., MacCoss, M.J., Noble, W.S.: Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J. Proteome Res. 7, 29–34 (2008)
Elias, J.E., Gygi, S.P.: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007)
Matsuda, F., Shinbo, Y., Oikawa, A., Hirai, M.Y., Fiehn, O., Kanaya, S.: Assessment of metabolome annotation quality: a method for evaluating the false discovery rate of elemental composition searches. PLoS One 4, e7490 (2009)
Jeong, J., Shi, X., Zhang, X., Kim, S., Shen, C.: An empirical Bayes model using a competition score for metabolite identification in gas chromatography mass spectrometry. BMC Bioinformatics 12, 392–401 (2011)
Kochen, M.A., Chambers, M.C., Holman, J.D., Nesvizhskii, A.I., Weintraub, S.T., Belisle, J.T.: GREAZY—open-source software for automated phospholipid tandem mass spectrometry identification. Anal. Chem. 88, 5733–5741 (2016)
Gowda, H., Ivanisevic, J., Johnson, C.H., Kurczy, M.E., Benton, H.P., Rinehart, D.: Interactive XCMS online: simplifying advanced metabolomic data processing and subsequent statistical analyses. Anal. Chem. 86, 6931–6939 (2014)
Goodwin, C.R., Sherrod, S.D., Marasco, C.C., Bachmann, B.O., Schramm-Sapyta, N., Wikswo, J.P.: Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data. Anal. Chem. 86, 6563–6571 (2014)
Li, S., Park, Y., Duraisingham, S., Strobel, F.H., Khan, N., Soltow, Q.A.: Predicting network activity from high throughput metabolomics. PLoS Comput. Biol. 9, e1003123 (2013)
Xia, J., Sinelnikov, I.V., Han, B., Wishart, D.S.: MetaboAnalyst 3.0—making metabolomics more meaningful. Nucleic Acids Res. 43, W251–W257 (2015)
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., Tanabe, M.: KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016)
López-Ibáñez, J., Pazos, F., Chagoyen, M.: MBROLE 2.0—functional enrichment of chemical compounds. Nucleic Acids Res. 44, W201–W204 (2016)
Caspi, R., Billington, R., Ferrer, L., Foerster, H., Fulcher, C.A., Keseler, I.M.: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44, D471–D480 (2016)

Acknowledgments

The authors thank the many students, colleagues, and collaborators who have influenced their thinking in this area. Specific thanks are given to Jody C. May, who conceptualized and constructed Figure 2 and elements of Figure 3, and to Jerry D. Holman for adaptation and presentation of algorithms in Figure 4. The authors are also grateful to their collaborators Babu Tekwani (Research Institute of Pharmaceutical Sciences, the National Center for Natural Products Research, and the Department of Biomolecular Sciences at the University of Mississippi), Jagrati Jain (Department of Biomolecular Sciences at the University of Mississippi), and David W. Wright (Department of Chemistry at Vanderbilt University and Laboratories for Innovations in Global Health Technologies) for the erythrocyte studies represented in Figure 4. Support for aspects of this work was provided by the National Institutes of Health/National Institute for General Medical Sciences (R01GM092218); the National Institutes of Health/National Center for Advancing Translational Sciences (4UH3TR000491); the Vanderbilt University Center for Innovative Technology; the Vanderbilt Institute of Chemical Biology; the Vanderbilt Institute for Integrative Biosystems Research and Education; and Vanderbilt University.

Author information

Authors and Affiliations

Department of Chemistry, Vanderbilt University, Nashville, TN, 37235, USA
Alexandra C. Schrimpe-Rutledge, Simona G. Codreanu, Stacy D. Sherrod & John A. McLean
Center for Innovative Technology, Vanderbilt University, Nashville, TN, 37235, USA
Alexandra C. Schrimpe-Rutledge, Simona G. Codreanu, Stacy D. Sherrod & John A. McLean
Vanderbilt Institute of Chemical Biology, Vanderbilt University, Nashville, TN, 37235, USA
Alexandra C. Schrimpe-Rutledge, Simona G. Codreanu, Stacy D. Sherrod & John A. McLean
Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN, 37235, USA
Alexandra C. Schrimpe-Rutledge, Simona G. Codreanu, Stacy D. Sherrod & John A. McLean

Corresponding author

Correspondence to John A. McLean.

About this article

Cite this article

Schrimpe-Rutledge, A.C., Codreanu, S.G., Sherrod, S.D. et al. Untargeted Metabolomics Strategies—Challenges and Emerging Directions. J. Am. Soc. Mass Spectrom. 27, 1897–1905 (2016). https://doi.org/10.1007/s13361-016-1469-y

Received: 03 June 2016
Revised: 27 July 2016
Accepted: 29 July 2016
Published: 13 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s13361-016-1469-y
