Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization

  • Original Article
Metabolomics

Abstract

Introduction

LC–MS/MS-based untargeted metabolomics is attracting considerable interest in the metabolomics and broader biology communities for its potential to uncover the contribution of unanticipated metabolic pathways to phenotypic observations. The major challenge for this methodology is to make computational metabolite identification reliable enough that subsequent validation of candidate targets is kept to a minimum. Metabolite library matching techniques based on precise masses and fragment mass patterns have become the de facto method in the field. However, the original methods are often under-validated in the literature, which makes it difficult to judge their intrinsic value.

Objectives

We aimed to demonstrate that large MS/MS metabolite spectral libraries can be used not only to validate and compare, but also to improve the methods.

Methods

Several computational tools for metabolite identification (MAGMa, CFM-ID, MetFrag, MIDAS) were applied to a large MS/MS dataset derived from Metlin. Their performance was first compared; for the two best-performing tools (MAGMa and MIDAS), performance was then improved by applying a parameter fine-tuning procedure.
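As a rough illustration of what dataset-driven parameter fine-tuning can look like, the sketch below runs a plain grid search that keeps the parameter combination giving the highest top-1 identification accuracy on a set of spectra with known answers. It is not the procedure used in the article: the scoring function `toy_score`, the parameter names `bond_penalty` and `peak_weight`, and the two-entry dataset are all hypothetical stand-ins, and no real MAGMa or MIDAS parameter or API is used.

```python
from itertools import product

# Hypothetical toy dataset: each entry lists candidate structures with two
# made-up features and names the known correct candidate.
DATASET = [
    {"correct": "B", "candidates": {
        "A": {"explained_peaks": 5, "bonds_broken": 4},
        "B": {"explained_peaks": 4, "bonds_broken": 1}}},
    {"correct": "C", "candidates": {
        "C": {"explained_peaks": 6, "bonds_broken": 2},
        "D": {"explained_peaks": 5, "bonds_broken": 3}}},
]

# Hypothetical parameter grid (names are illustrative only).
GRID = {"bond_penalty": [0.0, 0.5, 1.0], "peak_weight": [1.0]}


def toy_score(candidate, bond_penalty, peak_weight):
    """Stand-in scorer: reward explained peaks, penalize broken bonds."""
    return (peak_weight * candidate["explained_peaks"]
            - bond_penalty * candidate["bonds_broken"])


def rank_of_correct(scores, correct):
    """1-based rank of the correct candidate; only strictly higher scores outrank it."""
    target = scores[correct]
    return 1 + sum(s > target for name, s in scores.items() if name != correct)


def top1_accuracy(ranks):
    """Fraction of spectra whose correct candidate is ranked first."""
    return sum(r == 1 for r in ranks) / len(ranks)


def grid_search(dataset, grid):
    """Return (params, accuracy) for the first grid point with the best top-1 accuracy."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        ranks = []
        for entry in dataset:
            scores = {name: toy_score(c, **params)
                      for name, c in entry["candidates"].items()}
            ranks.append(rank_of_correct(scores, entry["correct"]))
        acc = top1_accuracy(ranks)
        if best is None or acc > best[1]:
            best = (params, acc)
    return best


best_params, best_acc = grid_search(DATASET, GRID)
```

In a realistic setting the tuned parameters would be checked on spectra held out from the search, so that the reported improvement is not an artifact of fitting the tuning set.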

Results

We confirmed MIDAS and MAGMa as the state-of-the-art freely available tools for metabolite identification. Moreover, we identified optimized working parameters that improved their performance. For MAGMa, dynamic, metabolite-dependent optimized parameters were obtained using machine learning techniques.

Conclusion

We were able to achieve an incremental increase in the identification accuracy of MIDAS and MAGMa. A wrapper script (MAGMa+) capable of calling MAGMa with tailored parameters is made available for download.

Acknowledgments

The authors wish to thank Marco Saerens and Pascal Francq (UCLouvain) and Yorick Poels, Matthieu Moisse and Bram Boeckx (VIB – KU Leuven) for providing computing power and technical support for this research.

Funding

This study was supported by a Federal Government Belgium grant (IUAP P7/03), long-term structural Methusalem funding by the Flemish Government, grants from the Research Foundation Flanders (FWO), the Foundation Leducq Transatlantic Network (ARTEMIS), Foundation against Cancer, an ERC Advanced Research Grant (EU-ERC269073), an ERC Consolidator Grant (RCN:191, 995), the AXA Research Fund, and by VIB.

Author information

Corresponding author

Correspondence to Bart Ghesquière.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 89 kb)

About this article

Cite this article

Verdegem, D., Lambrechts, D., Carmeliet, P. et al. Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization. Metabolomics 12, 98 (2016). https://doi.org/10.1007/s11306-016-1036-3

  • DOI: https://doi.org/10.1007/s11306-016-1036-3
