12:113

Regular expressions of MS/MS spectra for partial annotation of metabolite features

  • Fumio Matsuda
Original Article



Partial annotation and characterization of metabolite structures on the basis of data from tandem mass spectrometry (MS/MS) spectra are technical bottlenecks in metabolomics. Novel approaches should be explored for evaluation of spectral similarities among structurally related compounds as well as for description of fragmentation motifs commonly observed in MS/MS spectra.


A regular expression of MS/MS data was developed to search for structurally similar metabolites and to describe spectral motifs for partial annotation and characterization of metabolite structures.


After definition of an MS/MS string as a text representation of an MS/MS spectrum, a regular expression of MS/MS strings involving meta characters, anchors, and quantifiers was introduced. Here it was also demonstrated that spectral motifs can be described by a regular expression to define a common fragmentation pattern observed among structurally related metabolites.
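The idea of matching a text form of a spectrum against a pattern can be illustrated with a minimal sketch. The encoding below is an assumption made for illustration only and is not the MS/MS string format defined in the article: each spectrum becomes a space-separated list of nominal m/z values in ascending order, after weak peaks are dropped. The helper names (`to_msms_string`, `search_library`), the intensity threshold, and the example peak values are all hypothetical. The pattern uses an anchor (`$`) and a quantifier (`*`) from Python's standard `re` module.

```python
import re


def to_msms_string(peaks, min_rel_intensity=0.05):
    """Encode (m/z, intensity) pairs as a space-separated string of nominal m/z values.

    Hypothetical encoding for illustration; not the format used in the article.
    """
    if not peaks:
        return ""
    base = max(intensity for _, intensity in peaks)
    # Keep peaks above the relative-intensity threshold, round to nominal mass,
    # and use a set so that two peaks rounding to the same value appear once.
    kept = {round(mz) for mz, intensity in peaks if intensity / base >= min_rel_intensity}
    return " ".join(str(mz) for mz in sorted(kept))


def search_library(library, pattern):
    """Return the names of all spectra whose encoded string matches the pattern."""
    regex = re.compile(pattern)
    return [name for name, peaks in library.items() if regex.search(to_msms_string(peaks))]


# Illustrative motif: a token 153, then any number of other tokens,
# then a token 287 as the last (largest) peak in the string.
MOTIF = r"(?:^| )153(?: \d+)* 287$"

# Made-up example spectra (m/z, intensity).
library = {
    "spec_a": [(121.03, 2.0), (153.02, 30.0), (287.05, 100.0)],
    "spec_b": [(85.03, 40.0), (153.02, 15.0), (229.05, 20.0), (287.06, 100.0)],
    "spec_c": [(153.02, 100.0), (303.05, 80.0)],
}

hits = search_library(library, MOTIF)
```

With these made-up spectra, `spec_a` and `spec_b` match the motif, while `spec_c` does not because its string does not end in 287. Writing the motif as a single pattern string means it can be stored and shared as plain text, which is the property the abstract emphasizes.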


The regular expression was applied to a search for similar MS/MS spectra. Analysis of MassBank data with fragment assignment information (fragment ion and neutral loss matrix) suggested that the regular expression of MS/MS spectra can detect spectral similarities among structurally related metabolites. Analysis of MS/MS spectral libraries of Arabidopsis and rice revealed that the metabolite features can be partially annotated or characterized by the spectral motifs and can be assigned the corresponding ontology codes produced by Chemical Entities of Biological Interest (ChEBI).


The MS/MS spectral motifs represent a method for partial annotation or characterization of metabolite features. A regular expression of MS/MS data holds promise for further enrichment of metabolite annotations and for easy sharing of ambiguous annotation data among metabolomic studies.


MS/MS spectrum; Regular expression; Small molecule identification; Mass spectral motif; Fragmentation



I am highly grateful to Prof. Takaaki Nishioka (Nara Institute of Science and Technology), Prof. Masanori Arita (National Institute of Genetics), and Mr. Yuya Ojima (MassBank) for providing the MS/MS spectra dataset. I also thank Dr. Yuji Sawada, Dr. Yutaka Yamada (RIKEN CSRS), Dr. Nozomu Sakurai, and Dr. Nayumi Akimoto (Kazusa DNA Research Institute) for their helpful comments on this manuscript and databasing work.

Financial support

This work was partially supported by the JST, Strategic International Collaborative Research Program, SICORP for JP-US Metabolomics, and a Grant-in-Aid for Scientific Research (B) No. 25820400.

Compliance with ethical standards

Conflict of interest

The authors declared that they have no conflict of interest in the submission of this manuscript.

Ethical approval

All institutional and national guidelines for the care and use of laboratory animals were followed.

Supplementary material

11306_2016_1052_MOESM1_ESM.xlsx (700 kb)
Supplementary material 1 (XLSX 701 kb)
11306_2016_1052_MOESM2_ESM.doc (738 kb)
Supplementary material 2 (DOC 737 kb)
Supplementary Data 1: The tandem mass spectrometry (MS/MS) string datasets of MassBank data (ZIP 1687 kb)



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Bioinformatics Engineering, Graduate School of Information Science and Technology, Osaka University, Suita, Japan
  2. RIKEN Center for Sustainable Resource Science, Yokohama, Japan
