Performance of combined fragmentation and retention prediction for the identification of organic micropollutants by LC-HRMS


In nontarget screening, structure elucidation of small molecules from high resolution mass spectrometry (HRMS) data is challenging, particularly the selection of the most likely candidate structure among the many retrieved from compound databases. Several fragmentation and retention prediction methods have been developed to improve this candidate selection. To evaluate their performance, we compared two in silico fragmenters (MetFrag and CFM-ID) and two retention time prediction models (based on the chromatographic hydrophobicity index (CHI) and on log D). A set of 78 known organic micropollutants was analyzed by liquid chromatography coupled to an LTQ Orbitrap HRMS with electrospray ionization (ESI) in positive and negative mode using two fragmentation techniques with different collision energies. Both fragmenters performed well for most compounds, ranking the correct candidate structure on average within the top 25% in ESI+ mode and within the top 22 to 37% in ESI− mode. The rank of the correct candidate structure improved slightly when MetFrag and CFM-ID were combined. For unknown compounds detected in both ESI+ and ESI−, positive mode mass spectra were generally better for further structure elucidation. Both retention prediction models performed reasonably well for more hydrophobic compounds but not for early eluting hydrophilic substances. The log D prediction showed better accuracy than the CHI model. Although the two fragmentation prediction methods are more diagnostic and sensitive for candidate selection, including retention prediction by calculating a consensus score with optimized weighting can improve the ranking of correct candidates compared to the individual methods.
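The idea of a weighted consensus score and of reporting the correct candidate's rank as a percentage of the candidate list can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the candidate labels, the example scores, the normalization by the maximum score, and the weights are all assumptions chosen for illustration, and the weights shown are not the optimized values from the study.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to [0, 1] by dividing by the maximum (an assumed normalization)."""
    top = max(scores.values())
    if top <= 0:
        return {cand: 0.0 for cand in scores}
    return {cand: value / top for cand, value in scores.items()}


def consensus_scores(
    score_sets: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted average of the normalized scores of each method, per candidate."""
    normalized = {method: normalize(s) for method, s in score_sets.items()}
    total_weight = sum(weights[method] for method in score_sets)
    candidates = next(iter(score_sets.values())).keys()
    return {
        cand: sum(weights[m] * normalized[m][cand] for m in score_sets) / total_weight
        for cand in candidates
    }


def relative_rank(scores: Dict[str, float], correct: str) -> float:
    """Position of the correct candidate as a percentage of the list length.

    Ties are resolved pessimistically: every candidate scoring at least as
    high as the correct one counts toward its rank.
    """
    target = scores[correct]
    rank = sum(1 for value in scores.values() if value >= target)
    return 100.0 * rank / len(scores)


# Hypothetical scores for four candidate structures; "A" is the correct one.
metfrag = {"A": 0.9, "B": 1.0, "C": 0.4, "D": 0.2}
cfmid = {"A": 0.8, "B": 0.6, "C": 0.5, "D": 0.1}
retention = {"A": 1.0, "B": 0.3, "C": 0.9, "D": 0.5}

# Illustrative weights only; the study optimized its weighting.
weights = {"metfrag": 0.4, "cfmid": 0.4, "retention": 0.2}

consensus = consensus_scores(
    {"metfrag": metfrag, "cfmid": cfmid, "retention": retention}, weights
)

print("MetFrag only:", relative_rank(metfrag, "A"), "%")
print("Consensus:   ", relative_rank(consensus, "A"), "%")
```

In this made-up example the correct candidate sits in second place on the fragmenter score alone and moves to first place once the other two scores are included, which is the kind of rank improvement the consensus approach aims for.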

Consensus workflow for combining fragmentation and retention prediction in LC-HRMS-based micropollutant identification




We gratefully acknowledge the support of the European Marie Curie Initial Training Network EDA-EMERGE (Grant Agreement no. 290100) and the European FP7 Collaborative Project SOLUTIONS (Grant Agreement no. 603437). MH was supported by EDA-EMERGE (ESR09) and the HIGRADE Graduate School of Helmholtz Centre for Environmental Research-UFZ. MH thanks Juliane Hollender for hosting her exchange at Eawag within EDA-EMERGE. A free academic license for Instant JChem, JChem for Excel, Marvin, and the Calculator Plugins was kindly provided by ChemAxon (Budapest, Hungary). We thank Steffen Neumann for fruitful discussions and comments on the manuscript, Felicity Allen for her support in implementing and applying CFM-ID, Nadin Ulrich and Christine Hug for their support for application of the CHI model, and Janek Paul Dann for providing the data used for the development of the log D model.

Author information



Corresponding author

Correspondence to Martin Krauss.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

About this article

Cite this article

Hu, M., Müller, E., Schymanski, E.L. et al. Performance of combined fragmentation and retention prediction for the identification of organic micropollutants by LC-HRMS. Anal Bioanal Chem 410, 1931–1941 (2018).


  • Environmental contaminants
  • Structure elucidation
  • Fragmentation prediction
  • Retention prediction
  • Micropollutants