12:98

Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization

  • Dries Verdegem
  • Diether Lambrechts
  • Peter Carmeliet
  • Bart Ghesquière
Original Article



LC–MS/MS-based untargeted metabolomics is attracting considerable interest in the metabolomics and broader biology communities for its potential to uncover the contribution of unanticipated metabolic pathways to phenotypic observations. The major challenge for this methodology is making computational metabolite identification as reliable as possible, so that subsequent validation of candidate targets is kept to a minimum. Metabolite library matching techniques based on precise masses and fragment mass patterns have become the de facto method in the field. However, the original methods are often under-validated in the literature, making it difficult to judge their intrinsic value.


We aimed to demonstrate that large MS/MS metabolite spectral libraries can be used not only to validate and compare, but also to improve the methods.


Several computational tools for metabolite identification (MAGMa, CFM-ID, MetFrag, MIDAS) were applied to a large MS/MS dataset derived from Metlin. Their performance was first compared; for the two best-performing tools (MAGMa and MIDAS), performance was then improved through a parameter fine-tuning procedure.
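A dataset-driven parameter fine-tuning of this kind can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the exhaustive grid search, the top-1 accuracy metric, the toy scoring function `toy_score` and its `penalty` parameter are hypothetical and are not the scoring functions or parameters of MAGMa or MIDAS. The sketch only shows the general idea of scoring candidate structures against reference spectra with known identities and keeping the parameter values that rank the correct structure first most often.

```python
from itertools import product


def rank_of_correct(scores, correct_id):
    """1-based rank of the correct candidate (higher score is better).

    Ties are handled pessimistically: a tied candidate ranks ahead.
    """
    target = scores[correct_id]
    return 1 + sum(
        1 for cid, s in scores.items() if cid != correct_id and s >= target
    )


def top1_accuracy(dataset, score_fn, params):
    """Fraction of spectra whose correct candidate is ranked first."""
    hits = 0
    for spectrum, candidates, correct_id in dataset:
        scores = {
            cid: score_fn(spectrum, cand, **params)
            for cid, cand in candidates.items()
        }
        if rank_of_correct(scores, correct_id) == 1:
            hits += 1
    return hits / len(dataset)


def grid_search(dataset, score_fn, grid):
    """Try every parameter combination and keep the most accurate one."""
    names = sorted(grid)
    best_params, best_acc = None, -1.0
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        acc = top1_accuracy(dataset, score_fn, params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


def toy_score(spectrum, candidate, penalty):
    """Hypothetical scorer: matched peaks minus a penalty per unexplained
    predicted fragment. Not the MAGMa or MIDAS scoring function."""
    matched = len(spectrum & candidate)
    extra = len(candidate - spectrum)
    return matched - penalty * extra


# Toy reference set: (observed peak masses, candidate fragment sets, true id).
DATASET = [
    ({50, 77, 105}, {"A": {50, 77, 105}, "B": {50, 77, 105, 120, 135}}, "A"),
    ({41, 69}, {"C": {41, 69, 83}, "D": {41}}, "C"),
]
GRID = {"penalty": [0.0, 0.5, 1.0]}

best_params, best_acc = grid_search(DATASET, toy_score, GRID)
print(best_params, best_acc)
```

In practice the tuned parameters would be evaluated on spectra held out from the tuning step, so that the reported accuracy is not inflated by overfitting to the reference set.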


We confirmed MIDAS and MAGMa as the state-of-the-art freely available tools for metabolite identification. Moreover, we identified optimized working parameters that improved their performance. For MAGMa, dynamic, metabolite-dependent optimized parameters were obtained using machine learning techniques.
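The idea of metabolite-dependent parameters can be sketched as a small classification step that picks a parameter set from molecular features. The abstract does not state which features or learning method were used, so the sketch below rests entirely on assumptions: molecules are represented as sets of "on" fingerprint bits, the classifier is a plain 1-nearest-neighbour rule with Tanimoto similarity, and the labels `set_a` and `set_b` and their contents are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def choose_parameter_set(fingerprint, training, default):
    """Return the label of the most similar training molecule.

    Falls back to `default` when no training molecule shares any bit.
    """
    best_label, best_sim = default, 0.0
    for train_fp, label in training:
        sim = tanimoto(fingerprint, train_fp)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical parameter sets and labelled training fingerprints.
PARAMETER_SETS = {"set_a": {"penalty": 0.5}, "set_b": {"penalty": 1.0}}
TRAINING = [
    (frozenset({1, 4, 9}), "set_a"),
    (frozenset({2, 3, 7, 8}), "set_b"),
]

query = frozenset({1, 4, 8})
label = choose_parameter_set(query, TRAINING, default="set_a")
print(label, PARAMETER_SETS[label])
```

A wrapper built on this idea would compute the features for each candidate molecule, look up the predicted parameter set, and pass those values to the identification tool.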


We were able to achieve an incremental increase in the identification accuracy of MIDAS and MAGMa. A wrapper script (MAGMa+) capable of calling MAGMa with tailored parameters is made available for download.


Untargeted metabolomics; Metabolite identification; MAGMa; Method comparison; Method optimization; Machine learning



The authors wish to thank Marco Saerens and Pascal Francq (UCLouvain) and Yorick Poels, Matthieu Moisse and Bram Boeckx (VIB – KU Leuven) for providing computing power and technical support for this research.


This study was supported by a Federal Government Belgium grant (IUAP P7/03), long-term structural Methusalem funding by the Flemish Government, grants from the Research Foundation Flanders (FWO), the Foundation Leducq Transatlantic Network (ARTEMIS), Foundation against Cancer, an ERC Advanced Research Grant (EU-ERC269073), an ERC Consolidator Grant (RCN:191, 995), an AXA Research Fund, and by VIB.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

11306_2016_1036_MOESM1_ESM.docx (90 kb)
Supplementary material 1 (DOCX 89 kb)



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Dries Verdegem (1, 2)
  • Diether Lambrechts (3)
  • Peter Carmeliet (2)
  • Bart Ghesquière (1)
  1. Metabolomics Expertise Center, Vesalius Research Center (VRC), VIB, KU Leuven - University of Leuven, Louvain, Belgium
  2. Laboratory of Angiogenesis and Neurovascular Link, Vesalius Research Center (VRC), Department of Oncology, VIB, KU Leuven - University of Leuven, Louvain, Belgium
  3. Laboratory for Translational Genetics, Vesalius Research Center (VRC), Department of Oncology, VIB, KU Leuven - University of Leuven, Louvain, Belgium
