Abstract
In nontarget screening, structure elucidation of small molecules from high-resolution mass spectrometry (HRMS) data is challenging, particularly the selection of the most likely candidate structure among the many retrieved from compound databases. Several fragmentation and retention prediction methods have been developed to improve this candidate selection. To evaluate their performance, we compared two in silico fragmenters (MetFrag and CFM-ID) and two retention time prediction models (based on the chromatographic hydrophobicity index (CHI) and on log D). A set of 78 known organic micropollutants was analyzed by liquid chromatography coupled to an LTQ Orbitrap HRMS with electrospray ionization (ESI) in positive and negative mode, using two fragmentation techniques with different collision energies. Both fragmenters performed well for most compounds, on average ranking the correct candidate structure within the top 25% for ESI+ mode and within the top 22 to 37% for ESI− mode. The rank of the correct candidate structure improved slightly when MetFrag and CFM-ID were combined. For unknown compounds detected in both ESI+ and ESI−, positive mode mass spectra were generally better suited for further structure elucidation. Both retention prediction models performed reasonably well for more hydrophobic compounds but not for early-eluting hydrophilic substances. The log D prediction was more accurate than the CHI model. Although the two fragmentation prediction methods are more diagnostic and sensitive for candidate selection, including retention prediction by calculating a consensus score with optimized weighting can improve the ranking of correct candidates compared to the individual methods.
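The idea of a weighted consensus score and a relative rank can be illustrated with a minimal Python sketch. It is not the scoring procedure used in the study: the normalization scheme, the function names, the weights, and the candidate values below are all hypothetical and chosen only to show how adding a retention term to two fragmenter scores could change the position of the correct candidate.

```python
def normalize(values):
    """Scale values to [0, 1] by dividing by the maximum (all-zero lists stay zero)."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


def consensus_scores(frag_a, frag_b, rt_error, weights):
    """Weighted sum of normalized scores, one value per candidate.

    frag_a, frag_b: scores from two fragmenters (higher is better).
    rt_error: absolute retention prediction error (lower is better).
    weights: (w_a, w_b, w_rt); hypothetical, not optimized values.
    """
    a = normalize(frag_a)
    b = normalize(frag_b)
    # Turn the error into a score so the smallest error scores highest.
    rt = [1.0 - e for e in normalize(rt_error)]
    w_a, w_b, w_rt = weights
    return [w_a * x + w_b * y + w_rt * z for x, y, z in zip(a, b, rt)]


def relative_rank(scores, correct_index):
    """Worst-case rank of the correct candidate as a percentage of the list length."""
    target = scores[correct_index]
    rank = sum(1 for s in scores if s >= target)  # ties count against the candidate
    return 100.0 * rank / len(scores)


# Made-up candidate list; candidate 0 is assumed to be the correct structure.
frag_a = [0.8, 1.0, 0.2, 0.4]
frag_b = [0.9, 0.6, 0.3, 0.3]
rt_error = [0.5, 4.0, 1.0, 2.0]

combined = consensus_scores(frag_a, frag_b, rt_error, weights=(0.4, 0.4, 0.2))
rank_single = relative_rank(frag_a, 0)
rank_combined = relative_rank(combined, 0)
```

In this made-up list, the correct candidate is ranked second by the first fragmenter alone and first by the consensus score. In practice the weights would have to be tuned on compounds with known identity rather than fixed by hand.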
Acknowledgments
We gratefully acknowledge the support of the European Marie Curie Initial Training Network EDA-EMERGE (Grant Agreement no. 290100) and the European FP7 Collaborative Project SOLUTIONS (Grant Agreement no. 603437). MH was supported by EDA-EMERGE (ESR09) and the HIGRADE Graduate School of Helmholtz Centre for Environmental Research-UFZ. MH thanks Juliane Hollender for hosting her exchange at Eawag within EDA-EMERGE. A free academic license for Instant JChem, JChem for Excel, Marvin, and the Calculator Plugins was kindly provided by ChemAxon (Budapest, Hungary). We thank Steffen Neumann for fruitful discussions and comments on the manuscript, Felicity Allen for her support in implementing and applying CFM-ID, Nadin Ulrich and Christine Hug for their support for application of the CHI model, and Janek Paul Dann for providing the data used for the development of the log D model.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
About this article
Cite this article
Hu, M., Müller, E., Schymanski, E.L. et al. Performance of combined fragmentation and retention prediction for the identification of organic micropollutants by LC-HRMS. Anal Bioanal Chem 410, 1931–1941 (2018). https://doi.org/10.1007/s00216-018-0857-5
DOI: https://doi.org/10.1007/s00216-018-0857-5