Abstract
Introduction
LC–MS/MS based untargeted metabolomics is evoking high interests in the metabolomics and broader biology community for its potential to uncover the contribution of unanticipated metabolic pathways to phenotypic observations. The major challenge for this methodology is making the computational metabolite identification as reliable as possible in order to reduce subsequent target candidate validation to a minimum. Metabolite library matching techniques based on precise masses and fragment mass patterns have become the de facto method in the field. However, in the literature the original methods are often under-validated, making it complicated to judge their intrinsic value.
Objectives
We aimed to demonstrate that large MS/MS metabolite spectral libraries can be used not only to validate and compare, but also to improve the methods.
Methods
Several computational tools for metabolite identification (MAGMa, CFM-ID, MetFrag, MIDAS) were applied on a large MS/MS dataset derived from Metlin. Their performance was first compared and for the two best-performing tools (MAGMa and MIDAS), the performance was then improved by applying a parameter fine-tuning procedure.
Results
We confirmed MIDAS and MAGMa as the state-of-the-art freely available tools for metabolite identification. Moreover, we were able to identify optimized working parameters, engendering an improvement in their performance. For MAGMa, dynamic, metabolite-dependent optimized parameters were obtained using machine learning techniques.
Conclusion
We were able to achieve an incremental increase in the identification accuracy of MIDAS and MAGMa. A wrapper script (MAGMa+) capable of calling MAGMa with tailored parameters is made available for download.
Similar content being viewed by others
References
Allen, F., Greiner, R., & Wishart, D. (2014). Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification. Metabolomics, pp. 1–13.
CASMI (2015). Critical Assessment of Small Molecule Identification. http://www.casmi-contest.org2015.
Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., et al. (2008). ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Research, 36, D344–D350. doi:10.1093/nar/gkm791.
Duhrkop, K., Shen, H., Meusel, M., Rousu, J., & Bocker, S. (2015). Searching molecular structure databases with tandem mass spectra using CSI:fingerID. Proceedings of the National Academy of Sciences,. doi:10.1073/pnas.1509788112.
Dunn, W. B., Erban, A., Weber, R. J. M., Creek, D. J., Brown, M., Breitling, R., et al. (2013). Mass appeal: metabolite identification in mass spectrometry-focused untargeted metabolomics. Metabolomics, 9(1), S44–S66. doi:10.1007/s11306-012-0434-4.
Durant, J. L., Leland, B. A., Henry, D. R., & Nourse, J. G. (2002). Reoptimization of MDL keys for use in drug discovery. Journal of Chemical Information and Computer Sciences, 42(6), 1273–1280.
Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46(1–3), 389–422. doi:10.1023/A:1012487302797.
Haga, S. W., & Wu, H. F. (2014). Overview of software options for processing, analysis and interpretation of mass spectrometric proteomic data. Journal of Mass Spectrometry, 49(10), 959–969. doi:10.1002/jms.3414.
Heinonen, M., Shen, H., Zamboni, N., & Rousu, J. (2012). Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics, 28(18), 2333–2341. doi:10.1093/bioinformatics/bts437.
Horai, H., Arita, M., Kanaya, S., Nihei, Y., Ikeda, T., Suwa, K., et al. (2010). MassBank: a public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry, 45(7), 703–714. doi:10.1002/jms.1777.
Hufsky, F., Scheubert, K., & Böcker, S. (2014). Computational mass spectrometry for small-molecule fragmentation. TrAC Trends in Analytical Chemistry, 53, 41–48.
Ihlenfeldt, W. D., Voigt, J. H., Bienfait, B., Oellien, F., & Nicklaus, M. C. (2002). Enhanced CACTVS browser of the Open NCI Database. Journal of Chemical Information and Computer Sciences, 42(1), 46–57.
Jeffryes, J. G., Colastani, R. L., Elbadawi-Sidhu, M., Kind, T., Niehaus, T. D., Broadbelt, L. J., et al. (2015). MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics. Journal of Cheminformatics, 7, 44. doi:10.1186/s13321-015-0087-1.
Klekota, J., & Roth, F. P. (2008). Chemical substructures that enrich for biological activity. Bioinformatics, 24(21), 2518–2525. doi:10.1093/bioinformatics/btn479.
Neumann, S., & Bocker, S. (2010). Computational mass spectrometry for metabolomics: identification of metabolites and small molecules. Analytical and Bioanalytical Chemistry, 398(7–8), 2779–2788. doi:10.1007/s00216-010-4142-5.
O’Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., & Hutchison, G. R. (2011). Open Babel: an open chemical toolbox. Journal of Cheminformatics, 3, 33. doi:10.1186/1758-2946-3-33.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Ridder, L., van der Hooft, J. J., Verhoeven, S., de Vos, R. C., van Schaik, R., & Vervoort, J. (2012). Substructure-based annotation of high-resolution multistage MS(n) spectral trees. Rapid Communications in Mass Spectrometry, 26(20), 2461–2471. doi:10.1002/rcm.6364.
Smith, C. A., O’Maille, G., Want, E. J., Qin, C., Trauger, S. A., Brandon, T. R., et al. (2005). METLIN: a metabolite mass spectral database. Therapeutic Drug Monitoring, 27(6), 747–751.
Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., & Willighagen, E. (2003). The chemistry development kit (CDK): an open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences, 43(2), 493–500. doi:10.1021/ci025584y.
Tautenhahn, R., Cho, K., Uritboonthai, W., Zhu, Z., Patti, G. J., & Siuzdak, G. (2012). An accelerated workflow for untargeted metabolomics using the METLIN database. Nature Biotechnology, 30(9), 826–828. doi:10.1038/nbt.2348.
Vaniya, A., & Fiehn, O. (2015). Using fragmentation trees and mass spectral trees for identifying unknown compounds in metabolomics. Trends in Analytical Chemistry, 69, 52–61. doi:10.1016/j.trac.2015.04.002.
Wang, Y., Kora, G., Bowen, B. P., & Pan, C. (2014). MIDAS: a database-searching algorithm for metabolite identification in metabolomics. Analytical Chemistry, 86(19), 9496–9503. doi:10.1021/ac5014783.
Wishart, D. S., Jewison, T., Guo, A. C., Wilson, M., Knox, C., Liu, Y., et al. (2013). HMDB 3.0-the human metabolome database in 2013. Nucleic Acids Res, 41, D801–D807. doi:10.1093/nar/gks1065.
Wishart, D. S., Knox, C., Guo, A. C., Eisner, R., Young, N., Gautam, B., et al. (2009). HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res, 37, D603–D610. doi:10.1093/nar/gkn810.
Wolf, S., Schmidt, S., Muller-Hannemann, M., & Neumann, S. (2010). In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinformatics, 11, 148. doi:10.1186/1471-2105-11-148.
Acknowledgments
The authors wish to thank Marco Saerens and Pascal Francq (UCLouvain) and Yorick Poels, Matthieu Moisse and Bram Boeckx (VIB – KU Leuven) for providing computing power and technical support for this research.
Funding
This study was supported by a Federal Government Belgium grant (IUAP P7/03), long-term structural Methusalem funding by the Flemish Government, grants from the Research Foundation Flanders (FWO), the Foundation Leducq Transatlantic Network (ARTEMIS), Foundation against Cancer, an ERC Advanced Research Grant (EU-ERC269073), an ERC Consolidator Grant (RCN:191, 995), an AXA Research Fund, and by VIB.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Verdegem, D., Lambrechts, D., Carmeliet, P. et al. Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization. Metabolomics 12, 98 (2016). https://doi.org/10.1007/s11306-016-1036-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11306-016-1036-3