Speedy Colorful Subtrees

  • W. Timothy J. White (email author)
  • Stephan Beyer
  • Kai Dührkop
  • Markus Chimani
  • Sebastian Böcker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9198)


Fragmentation trees are a technique for identifying molecular formulas and deriving some chemical properties of metabolites—small organic molecules—solely from mass spectral data. Computing these trees involves finding exact solutions to the NP-hard Maximum Colorful Subtree problem. Existing solvers struggle to solve the large instances involved fast enough to keep up with instrument throughput, and their performance remains a hindrance to adoption in practice.
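To make the optimization problem concrete, here is a minimal brute-force sketch in Python. It is not the authors' method. It assumes the usual statement of Maximum Colorful Subtree: given a rooted, edge-weighted directed acyclic graph whose vertices carry colors, find a subtree containing the root that uses each color at most once and has maximum total edge weight. The function name `max_colorful_subtree` and the toy graph are hypothetical, and the exhaustive search over vertex subsets is only practical for tiny instances.

```python
from itertools import combinations


def max_colorful_subtree(root, color, edges):
    """Exhaustive search; assumes `edges` describes a DAG rooted at `root`.

    color: dict mapping vertex -> color
    edges: dict mapping (parent, child) -> weight
    Returns (best_weight, parent_map) for the best colorful subtree found.
    """
    others = [v for v in color if v != root]
    # The root alone is always a valid (empty) tree of weight 0.
    best_weight, best_parent = 0.0, {}
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            chosen = {root, *subset}
            # Colorful: no two chosen vertices may share a color.
            if len({color[v] for v in chosen}) != len(chosen):
                continue
            total, parent = 0.0, {}
            for v in subset:
                # Candidate parents are chosen vertices with an edge into v.
                incoming = [(w, u) for (u, t), w in edges.items()
                            if t == v and u in chosen]
                if not incoming:
                    break  # v cannot be attached, so this subset is infeasible
                w, u = max(incoming)  # heaviest incoming edge from the subset
                total += w
                parent[v] = u
            else:
                # In a DAG, one parent per non-root vertex yields a tree.
                if total > best_weight:
                    best_weight, best_parent = total, parent
    return best_weight, best_parent


# Hypothetical toy instance: vertices "a" and "b" share color 1.
color = {"r": 0, "a": 1, "b": 1, "c": 2}
edges = {("r", "a"): 3.0, ("r", "b"): 4.0, ("a", "c"): 5.0,
         ("b", "c"): 1.0, ("r", "c"): 2.0}
weight, parent = max_colorful_subtree("r", color, edges)
```

In this toy instance, greedily taking the heavier edge into `b` blocks `a`, whose child edge is worth more. The search therefore returns the tree r → a → c with weight 8.0 rather than the weight-6.0 tree through `b`.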

We attack this problem on two fronts: by combining fast and effective reduction algorithms with a strong integer linear program (ILP) formulation of the problem, we achieve overall speedups of 9.4-fold and 8.8-fold on two sets of real-world problems, without sacrificing optimality. Both approaches are, to our knowledge, the first of their kind for this problem. We also evaluate the strategy of solving global problem instances directly, rather than first subdividing them into many candidate instances as has been done in the past. Software (C++ source for our reduction program and our CPLEX/Gurobi driver program) is available under LGPL at


Molecular Formula, Integer Linear Program, Full Version, Reduction Rule, Incoming Edge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • W. Timothy J. White (1), email author
  • Stephan Beyer (2)
  • Kai Dührkop (1)
  • Markus Chimani (2)
  • Sebastian Böcker (1)

  1. Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
  2. Institute of Computer Science, University of Osnabrück, Osnabrück, Germany
