Analytical and Bioanalytical Chemistry

Volume 410, Issue 7, pp 1931–1941

Performance of combined fragmentation and retention prediction for the identification of organic micropollutants by LC-HRMS

  • Meng Hu
  • Erik Müller
  • Emma L. Schymanski
  • Christoph Ruttkies
  • Tobias Schulze
  • Werner Brack
  • Martin Krauss (Email author)
Research Paper


In nontarget screening, structure elucidation of small molecules from high resolution mass spectrometry (HRMS) data is challenging, particularly the selection of the most likely candidate structure among the many retrieved from compound databases. Several fragmentation and retention prediction methods have been developed to improve this candidate selection. To evaluate their performance, we compared two in silico fragmenters (MetFrag and CFM-ID) and two retention time prediction models (based on the chromatographic hydrophobicity index (CHI) and on log D). A set of 78 known organic micropollutants was analyzed by liquid chromatography coupled to an LTQ Orbitrap HRMS with electrospray ionization (ESI) in positive and negative mode using two fragmentation techniques with different collision energies. Both fragmenters performed well for most compounds, on average ranking the correct candidate structure within the top 25% in ESI+ mode and within the top 22 to 37% in ESI− mode. The rank of the correct candidate structure improved slightly when MetFrag and CFM-ID were combined. For unknown compounds detected in both ESI+ and ESI−, positive mode mass spectra were generally better for further structure elucidation. Both retention prediction models performed reasonably well for more hydrophobic compounds but not for early eluting hydrophilic substances. The log D prediction was more accurate than the CHI model. Although the two fragmentation prediction methods are more diagnostic and sensitive for candidate selection, including retention prediction by calculating a consensus score with optimized weighting can improve the ranking of correct candidates compared to the individual methods.
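The idea of a weighted consensus score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the max-normalization, the function names (`normalize`, `consensus_scores`, `relative_rank`), the candidate labels, the scores, and the weights are all assumptions chosen for illustration, and the weights are not the optimized values from the study.

```python
from typing import Dict, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1] by dividing by the maximum score."""
    top = max(scores.values())
    if top <= 0:
        return {name: 0.0 for name in scores}
    return {name: value / top for name, value in scores.items()}


def consensus_scores(
    frag_a: Dict[str, float],
    frag_b: Dict[str, float],
    retention: Dict[str, float],
    weights: Tuple[float, float, float],
) -> Dict[str, float]:
    """Weighted mean of three normalized per-candidate scores."""
    a, b, r = normalize(frag_a), normalize(frag_b), normalize(retention)
    w_a, w_b, w_r = weights
    total = w_a + w_b + w_r
    return {c: (w_a * a[c] + w_b * b[c] + w_r * r[c]) / total for c in a}


def relative_rank(scores: Dict[str, float], correct: str) -> float:
    """Rank of the correct candidate as a percentage of all candidates.

    Ties are resolved pessimistically (worst-case rank).
    """
    target = scores[correct]
    rank = sum(1 for value in scores.values() if value >= target)
    return 100.0 * rank / len(scores)


# Hypothetical scores for four candidate structures; "A" is the correct one.
fragmenter_1 = {"A": 80.0, "B": 100.0, "C": 40.0, "D": 20.0}
fragmenter_2 = {"A": 0.9, "B": 0.6, "C": 0.3, "D": 0.3}
retention_fit = {"A": 1.0, "B": 0.2, "C": 0.5, "D": 0.0}

# Illustrative weights only (fragmenter 1, fragmenter 2, retention).
combined = consensus_scores(fragmenter_1, fragmenter_2, retention_fit, (0.4, 0.4, 0.2))

print(relative_rank(fragmenter_1, "A"))  # correct candidate ranked 2nd of 4
print(relative_rank(combined, "A"))      # correct candidate ranked 1st of 4
```

In this made-up case the correct candidate moves from the top 50% with one fragmenter alone to the top 25% with the consensus score, which mirrors the kind of improvement the abstract describes without reproducing any of its data.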

Graphical abstract

Consensus workflow for combining fragmentation and retention prediction in LC-HRMS-based micropollutant identification


Keywords: LC-HRMS; Environmental contaminants; Structure elucidation; Fragmentation prediction; Retention prediction; Micropollutants



We gratefully acknowledge the support of the European Marie Curie Initial Training Network EDA-EMERGE (Grant Agreement no. 290100) and the European FP7 Collaborative Project SOLUTIONS (Grant Agreement no. 603437). MH was supported by EDA-EMERGE (ESR09) and the HIGRADE Graduate School of Helmholtz Centre for Environmental Research-UFZ. MH thanks Juliane Hollender for hosting her exchange at Eawag within EDA-EMERGE. A free academic license for Instant JChem, JChem for Excel, Marvin, and the Calculator Plugins was kindly provided by ChemAxon (Budapest, Hungary). We thank Steffen Neumann for fruitful discussions and comments on the manuscript, Felicity Allen for her support in implementing and applying CFM-ID, Nadin Ulrich and Christine Hug for their support for application of the CHI model, and Janek Paul Dann for providing the data used for the development of the log D model.

Compliance with ethical standards

Conflict of interest

The authors declare no conflicts of interest.

Supplementary material

ESM 1 (PDF 447 kb): 216_2018_857_MOESM1_ESM.pdf
ESM 2 (XLSX 66 kb): 216_2018_857_MOESM2_ESM.xlsx
ESM 3 (RAR 3981 kb): 216_2018_857_MOESM3_ESM.rar
ESM 4 (RAR 61 kb): 216_2018_857_MOESM4_ESM.rar



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Meng Hu (1, 2)
  • Erik Müller (1)
  • Emma L. Schymanski (3, 4)
  • Christoph Ruttkies (5)
  • Tobias Schulze (1)
  • Werner Brack (1, 2)
  • Martin Krauss (1), Email author

  1. Department Effect-Directed Analysis, Helmholtz Centre for Environmental Research – UFZ, Leipzig, Germany
  2. Department of Ecosystem Analyses, Institute for Environmental Research, RWTH Aachen University, Aachen, Germany
  3. Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland
  4. Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
  5. Department of Stress and Developmental Biology, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
