Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization
LC–MS/MS-based untargeted metabolomics is attracting considerable interest in the metabolomics and broader biology communities for its potential to uncover the contribution of unanticipated metabolic pathways to phenotypic observations. The major challenge for this methodology is making computational metabolite identification as reliable as possible, so that subsequent validation of candidate targets is kept to a minimum. Metabolite library matching techniques based on precise masses and fragment mass patterns have become the de facto method in the field. However, the original methods are often under-validated in the literature, making it difficult to judge their intrinsic value.
We aimed to demonstrate that large MS/MS metabolite spectral libraries can be used not only to validate and compare, but also to improve the methods.
Several computational tools for metabolite identification (MAGMa, CFM-ID, MetFrag, MIDAS) were applied to a large MS/MS dataset derived from Metlin. Their performance was first compared; for the two best-performing tools (MAGMa and MIDAS), performance was then improved through a parameter fine-tuning procedure.
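A parameter fine-tuning procedure of this kind can be pictured as a search over a tool's scoring parameters, keeping the setting that most often ranks the known structure first across the library spectra. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the toy scoring function, the parameter name `bond_penalty`, the candidate features, and the data are all hypothetical and do not correspond to actual MAGMa or MIDAS options.

```python
from itertools import product


def rank_of_correct(scores, correct_id):
    """Return the 1-based rank of the correct candidate (higher score is better)."""
    ordered = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return ordered.index(correct_id) + 1


def top1_accuracy(dataset, score_fn, params):
    """Fraction of spectra for which the correct candidate is ranked first."""
    hits = 0
    for spectrum in dataset:
        scores = {
            cid: score_fn(features, params)
            for cid, features in spectrum["candidates"].items()
        }
        if rank_of_correct(scores, spectrum["correct"]) == 1:
            hits += 1
    return hits / len(dataset)


def grid_search(dataset, score_fn, grid):
    """Try every parameter combination and return the best one with its accuracy."""
    names = sorted(grid)
    best_params, best_acc = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        acc = top1_accuracy(dataset, score_fn, params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


def toy_score(features, params):
    """Hypothetical score: explained peaks minus a penalty per broken bond."""
    explained_peaks, broken_bonds = features
    return explained_peaks - params["bond_penalty"] * broken_bonds


# Hypothetical library: each spectrum has candidate structures described by
# (explained_peaks, broken_bonds) and the identifier of the known structure.
DATASET = [
    {"candidates": {"A": (5, 4), "B": (4, 1)}, "correct": "B"},
    {"candidates": {"C": (6, 5), "D": (3, 1)}, "correct": "C"},
]
GRID = {"bond_penalty": [0.0, 0.5, 1.0, 2.0]}

best_params, best_acc = grid_search(DATASET, toy_score, GRID)
print(best_params, best_acc)
```

In this toy setup only an intermediate penalty ranks both known structures first, which illustrates why a spectral library with known answers is needed to choose the setting. In practice the tuned parameters should be evaluated on spectra held out from the search to avoid an optimistic estimate.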
We confirmed MIDAS and MAGMa as the state-of-the-art freely available tools for metabolite identification. Moreover, we identified optimized working parameters that improved their performance. For MAGMa, dynamic, metabolite-dependent optimized parameters were obtained using machine learning techniques.
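The idea of metabolite-dependent parameters can be illustrated with a small classifier that maps descriptors of a query to one of several pre-optimized parameter sets. The abstract does not state which machine learning technique was used, so the nearest-centroid classifier below is only a stand-in; the two descriptors, the training data, and the names `set_A` and `set_B` are hypothetical.

```python
import math


def fit_centroids(features, labels):
    """Compute the mean feature vector for each parameter-set label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical training data: two descriptors per metabolite, labeled with the
# parameter set that gave the best ranking for that metabolite during tuning.
TRAIN_X = [(1.5, 0.1), (1.8, 0.2), (4.0, 0.5), (4.4, 0.4)]
TRAIN_Y = ["set_A", "set_A", "set_B", "set_B"]

centroids = fit_centroids(TRAIN_X, TRAIN_Y)
print(predict(centroids, (2.0, 0.2)))
print(predict(centroids, (3.9, 0.5)))
```

A wrapper would then look up the predicted label in a table of parameter values and pass those values to the identification tool for that query.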
We were able to achieve an incremental increase in the identification accuracy of MIDAS and MAGMa. A wrapper script (MAGMa+) capable of calling MAGMa with tailored parameters is made available for download.
Keywords: Untargeted metabolomics, Metabolite identification, MAGMa, Method comparison, Method optimization, Machine learning
The authors wish to thank Marco Saerens and Pascal Francq (UCLouvain) and Yorick Poels, Matthieu Moisse and Bram Boeckx (VIB – KU Leuven) for providing computing power and technical support for this research.
This study was supported by a Federal Government Belgium grant (IUAP P7/03), long-term structural Methusalem funding by the Flemish Government, grants from the Research Foundation Flanders (FWO), the Foundation Leducq Transatlantic Network (ARTEMIS), Foundation against Cancer, an ERC Advanced Research Grant (EU-ERC269073), an ERC Consolidator Grant (RCN:191, 995), an AXA Research Fund, and by VIB.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.