Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard
Chemical features observed using high-resolution mass spectrometry can be tentatively identified using online chemical reference databases by searching molecular formulae and monoisotopic masses and then rank-ordering the hits using appropriate relevance criteria. The most likely candidates, or "known unknowns" (chemicals unknown to an investigator but contained within a reference database or literature source), rise to the top of a candidate list when it is rank-ordered by the number of associated data sources. The U.S. EPA’s CompTox Chemistry Dashboard is a curated and freely available resource for chemistry and computational toxicology research, containing more than 720,000 chemicals of relevance to environmental health science. In this research, the performance of the Dashboard for identifying known unknowns was evaluated against that of the online ChemSpider database, one of the primary resources used by mass spectrometrists, using multiple previously studied datasets from the peer-reviewed literature totaling 162 chemicals. These chemicals were searched in both applications by molecular formula and by monoisotopic mass, and the candidate compounds returned were then rank-ordered by their number of associated references or data sources. A greater percentage of chemicals ranked in the top position when using the Dashboard than when using ChemSpider, indicating an advantage of the Dashboard for identifying known unknowns by data source ranking. Additional approaches are being developed for inclusion in a non-targeted analysis workflow as part of the CompTox Chemistry Dashboard. This work shows the potential of the Dashboard to support exposure assessment and risk decision-making through significant improvements in non-targeted chemical identification.
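The search-then-rank idea described above can be illustrated with a minimal Python sketch. It does not call the Dashboard or ChemSpider. The candidate records, their identifiers (`isomer_A` and so on), and their data-source counts are hypothetical toy values, and the helper names (`parse_formula`, `search_by_mass`, `rank_by_sources`) are illustrative. The sketch also assumes that the neutral monoisotopic mass has already been derived from the observed ion, that formulas are flat (no parentheses or charges), and that a 5 ppm mass tolerance applies. None of these choices comes from the study itself.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Monoisotopic masses (Da) of the most abundant isotope of a few common elements.
# Elements outside this table raise KeyError in monoisotopic_mass().
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
    "Cl": 34.968852682,
}


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a flat molecular formula such as 'C9H10O2' into element counts."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Dict[str, int] = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses of all atoms in a neutral molecule."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


@dataclass(frozen=True)
class Candidate:
    identifier: str         # hypothetical record ID
    formula: str
    data_source_count: int  # number of associated data sources (toy values)


def rank_by_sources(hits: List[Candidate]) -> List[Candidate]:
    """Most-referenced candidates first; ties broken by identifier for stable output."""
    return sorted(hits, key=lambda c: (-c.data_source_count, c.identifier))


def search_by_formula(db: List[Candidate], formula: str) -> List[Candidate]:
    """Exact formula search (compared as element counts), ranked by data sources."""
    target = parse_formula(formula)
    return rank_by_sources([c for c in db if parse_formula(c.formula) == target])


def search_by_mass(db: List[Candidate], mass: float, ppm_tol: float = 5.0) -> List[Candidate]:
    """Keep candidates within ppm_tol of the query mass, ranked by data sources."""
    hits = [c for c in db if abs(ppm_error(mass, monoisotopic_mass(c.formula))) <= ppm_tol]
    return rank_by_sources(hits)


# Toy reference "database": identifiers and counts are invented for illustration.
CANDIDATES = [
    Candidate("isomer_A", "C9H10O2", 12),
    Candidate("isomer_B", "C9H10O2", 87),
    Candidate("isomer_C", "C9H10O2", 3),
    Candidate("other_D", "C10H14O", 40),
]

if __name__ == "__main__":
    for c in search_by_mass(CANDIDATES, 150.0681):
        print(c.identifier, c.formula, c.data_source_count)
```

With a query mass of 150.0681 Da, the three C9H10O2 isomers (theoretical mass about 150.06808 Da) fall well inside the 5 ppm window and print in the order `isomer_B`, `isomer_A`, `isomer_C`. `other_D` (C10H14O, about 150.10447 Da) lies roughly 240 ppm away and is excluded. Sorting on the data-source count mirrors the relevance criterion named in the abstract. A real workflow would take these counts from the reference database rather than from a hard-coded list.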
Keywords: Non-targeted analysis; Suspect screening; DSSTox; High-resolution mass spectrometry
The authors would like to thank Jim Little for graciously providing the dataset used in Little et al. (2012). This work was supported in part by an appointment to the ORISE participant research program supported by an interagency agreement between the US EPA and DOE. This work has been internally reviewed at the US EPA and has been approved for publication. The views expressed in this paper are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Compliance with ethical standards
Conflicts of interest
The authors declare that they have no conflicts of interest.