Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification
Electrospray tandem mass spectrometry (ESI-MS/MS) is commonly used in high-throughput metabolomics. A key obstacle to the effective use of this technology is the difficulty of interpreting measured spectra to identify metabolites accurately and efficiently. Traditional methods for automated metabolite identification compare the target MS or MS/MS spectrum to the spectra in a reference database, ranking candidates by the closeness of the match. However, the limited coverage of available databases has led to interest in computational methods for predicting reference MS/MS spectra from chemical structures. This work proposes a probabilistic generative model for the MS/MS fragmentation process, which we call competitive fragmentation modeling (CFM), and a machine learning approach for learning the parameters of this model from MS/MS data. We show that CFM can be used both in an MS/MS spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and in a putative metabolite identification task (ranking possible structures for a target MS/MS spectrum). In the MS/MS spectrum prediction task, CFM shows significantly improved performance compared to a full enumeration of all peaks corresponding to substructures of the molecule. In the metabolite identification task, CFM obtains substantially better rankings for the correct candidate than existing methods (MetFrag and FingerID) on tripeptide and metabolite data, when querying PubChem or KEGG for candidate structures of similar mass.
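The ranking step in the identification task can be illustrated with a minimal sketch: given a target spectrum and one predicted spectrum per candidate structure, score each candidate by how closely its spectrum matches the target and sort by score. The abstract does not state which similarity measure is used, so the cosine score over tolerance-matched peaks below is an assumption chosen for illustration, as are the function names `cosine_score` and `rank_candidates`, the 0.01 Da tolerance, and the representation of a spectrum as a list of `(m/z, intensity)` pairs.

```python
from math import sqrt


def cosine_score(target, predicted, tol=0.01):
    """Cosine similarity between two spectra, each a list of (mz, intensity).

    Illustrative assumption, not the scoring function of the paper.
    Peaks are paired greedily: target peaks are visited from most to least
    intense, and each takes the closest unused predicted peak within `tol`.
    """
    used = set()
    dot = 0.0
    for mz_t, int_t in sorted(target, key=lambda p: -p[1]):
        best = None  # (mass difference, index, intensity) of best match so far
        for j, (mz_p, int_p) in enumerate(predicted):
            if j in used:
                continue
            diff = abs(mz_p - mz_t)
            if diff <= tol and (best is None or diff < best[0]):
                best = (diff, j, int_p)
        if best is not None:
            used.add(best[1])
            dot += int_t * best[2]

    norm_t = sqrt(sum(i * i for _, i in target))
    norm_p = sqrt(sum(i * i for _, i in predicted))
    if norm_t == 0 or norm_p == 0:
        return 0.0
    return dot / (norm_t * norm_p)


def rank_candidates(target, predicted_by_candidate, tol=0.01):
    """Return (candidate, score) pairs sorted from best to worst match.

    `predicted_by_candidate` maps a candidate identifier to its predicted
    spectrum; ties are broken by identifier for a deterministic order.
    """
    scored = [
        (cosine_score(target, spectrum, tol), name)
        for name, spectrum in predicted_by_candidate.items()
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored]
```

In this sketch, the predicted spectra would come from a spectrum prediction model applied to each candidate structure retrieved from a database query by mass; the position of the correct structure in the returned list is the quantity that an identification benchmark would compare across methods.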
Keywords: Tandem mass spectrometry, MS/MS, Metabolite identification, Machine learning
Many thanks to Dale Schuurmans, Liang Li, and Jun Peng at the University of Alberta, as well as to the Steinbeck Group at the European Bioinformatics Institute (EMBL-EBI), for invaluable discussions and advice. This work was supported by the Natural Sciences and Engineering Research Council of Canada; Alberta Innovates Technology Futures; and Alberta Innovates Health Solutions and made possible by the Compute Canada Westgrid facility.