Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard



Chemical features observed using high-resolution mass spectrometry can be tentatively identified by searching online chemical reference databases for molecular formulae and monoisotopic masses and then rank-ordering the hits using appropriate relevance criteria. The most likely candidate “known unknowns,” chemicals that are unknown to an investigator but contained within a reference database or literature source, rise to the top of a candidate list when it is rank-ordered by the number of associated data sources. The U.S. EPA’s CompTox Chemistry Dashboard is a curated and freely available resource for chemistry and computational toxicology research, containing more than 720,000 chemicals of relevance to environmental health science. In this research, the performance of the Dashboard for identifying known unknowns was evaluated against that of the online ChemSpider database, one of the primary resources used by mass spectrometrists, using multiple previously studied datasets from the peer-reviewed literature totaling 162 chemicals. These chemicals were examined in both applications via molecular formula and monoisotopic mass searches, followed by rank-ordering of the candidate compounds by associated references or data sources. A greater percentage of chemicals ranked in the top position when using the Dashboard, indicating an advantage of this application over ChemSpider for identifying known unknowns using data source ranking. Additional approaches are being developed for inclusion in a non-targeted analysis workflow as part of the CompTox Chemistry Dashboard. This work demonstrates the Dashboard’s potential to support exposure assessment and risk decision-making through significant improvements in non-targeted chemical identification.
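The search-and-rank procedure described above can be illustrated with a minimal Python sketch. It is not the Dashboard's or ChemSpider's implementation: the `Candidate` record, the function names, the 5 ppm tolerance, and the in-memory `LIBRARY` are all assumptions made for illustration, and the data source counts are invented placeholders rather than values from either database. Only the formula and monoisotopic mass of caffeine are real.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical database record; field names are illustrative only."""
    name: str
    formula: str
    monoisotopic_mass: float  # Da
    data_sources: int         # number of associated data sources


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def rank_by_sources(hits: List[Candidate]) -> List[Candidate]:
    """Rank-order hits so the candidate with the most data sources is first."""
    return sorted(hits, key=lambda c: c.data_sources, reverse=True)


def search_by_mass(library, observed_mass, tol_ppm=5.0):
    """Monoisotopic mass search within an assumed ppm tolerance, then rank."""
    hits = [c for c in library
            if abs(ppm_error(observed_mass, c.monoisotopic_mass)) <= tol_ppm]
    return rank_by_sources(hits)


def search_by_formula(library, formula):
    """Exact molecular formula search, then rank."""
    return rank_by_sources([c for c in library if c.formula == formula])


def rank_of(ranked, name) -> Optional[int]:
    """1-based position of the expected chemical, or None if not retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate.name == name:
            return position
    return None


def percent_top_ranked(ranks) -> float:
    """Share of query chemicals whose correct structure ranked first."""
    return 100.0 * sum(1 for r in ranks if r == 1) / len(ranks)


# Toy library: source counts are made up; "isomer A/B" are placeholders.
LIBRARY = [
    Candidate("isomer A", "C8H10N4O2", 194.08038, 3),
    Candidate("caffeine", "C8H10N4O2", 194.08038, 120),
    Candidate("isomer B", "C8H10N4O2", 194.08038, 1),
    Candidate("unrelated compound", "C15H12N2O", 236.09496, 90),
]

if __name__ == "__main__":
    ranked = search_by_mass(LIBRARY, observed_mass=194.0804)
    print([c.name for c in ranked], rank_of(ranked, "caffeine"))
```

Repeating the search for every chemical in a test set and passing the resulting ranks to `percent_top_ranked` gives the kind of "percentage ranked in the top position" figure used to compare the two applications.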

Identifying known unknowns in the US EPA's CompTox Chemistry Dashboard from molecular formula and monoisotopic mass inputs


Fig. 1


  1. Rager JE, Strynar MJ, Liang S, McMahen RL, Richard AM, Grulke CM, et al. Linking high resolution mass spectrometry data with exposure and toxicity forecasts to advance high-throughput environmental monitoring. Environ Int. 2016;88:269–80.

  2. Schymanski EL, Singer HP, Slobodnik J, Ipolyi IM, Oswald P, Krauss M, et al. Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis. Anal Bioanal Chem. 2015;407(21):6237–55.

  3. Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol. 2014;48(4):2097–8.

  4. Letzel T, Bayer A, Schulz W, Heermann A, Lucke T, Greco G, et al. LC–MS screening techniques for wastewater analysis and analytical data handling strategies: Sartans and their transformation products as an example. Chemosphere. 2015;137:198–206.

  5. Letzel T, Lucke T, Schulz W, Sengl M, Letzel M. OMI (Organic Molecule Identification) in water using LC-MS (/MS): steps from “unknown” to “identified”: a contribution to the discussion In a class of its own. Lab More. 2014;4:24–28.

  6. Little JL, Cleven CD, Brown SD. Identification of “known unknowns” utilizing accurate mass data and chemical abstracts service databases. J Am Soc Mass Spectr. 2011;22(2):348–59.

  7. Little JL, Williams AJ, Pshenichnov A, Tkachenko V. Identification of “known unknowns” utilizing accurate mass data and ChemSpider. J Am Soc Mass Spectr. 2012;23(1):179–85.

  8. Pence HE, Williams A. ChemSpider: an online chemical information resource. J Chem Educ. 2010;87(11):1123–4.

  9. Royal Society of Chemistry. ChemSpider. 2016.

  10. Schymanski EL, Singer HP, Longrée P, Loos M, Ruff M, Stravs MA, et al. Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry. Environ Sci Technol. 2014;48(3):1811–8. doi:10.1021/es4044374.

  11. Godfrey AR, Brenton AG. Accurate mass measurements and their appropriate use for reliable analyte identification. Anal Bioanal Chem. 2012;404(4):1159–64. doi:10.1007/s00216-012-6136-y.

  12. Ruttkies C, Schymanski EL, Wolf S, Hollender J, Neumann S. MetFrag relaunched: incorporating strategies beyond in silico fragmentation. J Cheminform. 2016;8(1):1–16. doi:10.1186/s13321-016-0115-9.

  13. Bade R, Causanilles A, Emke E, Bijlsma L, Sancho JV, Hernandez F, et al. Facilitating high resolution mass spectrometry data processing for screening of environmental water samples: an evaluation of two deconvolution tools. Sci Total Environ. 2016;569:434–41.

  14. Zedda M, Zwiener C. Is nontarget screening of emerging contaminants by LC-HRMS successful? A plea for compound libraries and computer tools. Anal Bioanal Chem. 2012;403(9):2493–502. doi:10.1007/s00216-012-5893-y.

  15. Richard AM, Williams CR. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat Res-Fund Mol M. 2002;499(1):27–52. doi:10.1016/S0027-5107(01)00289-5.

  16. McEachran AD, Shea D, Bodnar W, Nichols EG. Pharmaceutical occurrence in groundwater and surface waters in forests land-applied with municipal wastewater. Environ Toxicol Chem. 2016;35(4):898–905. doi:10.1002/etc.3216.

  17. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2016.

  18. Kolpin DW, Furlong ET, Meyer MT, Thurman EM, Zaugg SD, Barber LB, et al. Pharmaceuticals, hormones, and other organic wastewater contaminants in U.S. streams, 1999-2000: a national reconnaissance. Environ Sci Technol. 2002;36(6):1202–11.

  19. Klosterhaus SL, Grace R, Hamilton MC, Yee D. Method validation and reconnaissance of pharmaceuticals, personal care products, and alkylphenols in surface waters, sediments, and mussels in an urban estuary. Environ Int. 2013;54:92–9.

  20. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, et al. PubChem substance and compound databases. Nucleic Acids Res. 2016;44(D1):D1202–13. doi:10.1093/nar/gkv951.

  21. Dionisio KL, Frame AM, Goldsmith M-R, Wambaugh JF, Liddell A, Cathey T, et al. Exploring consumer exposure pathways and patterns of use for chemicals in the environment. Toxicol Rep. 2015;2:228–37.

  22. Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, et al. CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Persp. 2016. doi:10.1289/ehp.1510267.

  23.

  24. Horai H, Arita M, Kanaya S, Nihei Y, Ikeda T, Suwa K, et al. MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom. 2010;45(7):703–14. doi:10.1002/jms.1777.

  25. HighChem. mzCloud. 2016. 16 August 2016.

  26. Wolf S, Schmidt S, Müller-Hannemann M, Neumann S. In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinformat. 2010;11(1):1.

  27. Smith CA, O’Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, et al. METLIN: a metabolite mass spectral database. Ther Drug Monit. 2005;27(6):747–51.



The authors would like to thank Jim Little for graciously providing the dataset used in Little et al. (2012). This work was supported in part by an appointment to the ORISE participant research program supported by an interagency agreement between the US EPA and DOE. This work has been internally reviewed at the US EPA and has been approved for publication. The views expressed in this paper are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency.

Author information



Corresponding authors

Correspondence to Andrew D. McEachran or Antony J. Williams.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.


(PDF 694 kb)

Table S1

Identifiers, masses, and formulae of the 162 chemicals in the study. DTXSID is the DSSTox substance identifier, the unique identifier in the US EPA’s DSSTox Database. (XLSX 32 kb)


About this article


Cite this article

McEachran, A.D., Sobus, J.R. & Williams, A.J. Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard. Anal Bioanal Chem 409, 1729–1735 (2017).



  • Non-targeted analysis
  • Suspect screening
  • DSSTox
  • High-resolution mass spectrometry