Challenge

Non-target screening (NTS) workflows are a powerful method for the large-scale analysis of environmental samples. They consist of wide-scope target, suspect, and non-target analysis. NTS has developed rapidly in recent years with advances in high-resolution mass spectrometry (HR-MS) techniques, as reviewed elsewhere [1]. Smart monitoring that combines cost-effective methods for wide-scope target and suspect screening with a battery of well-established high-throughput bioassays could be used routinely to reduce the risk of overlooking toxic chemicals in the environment [2, 3].

Continental-scale wide-scope target and non-target screening, which is required for appropriate monitoring of complex chemical contamination, is developing rapidly in many monitoring laboratories, as recommended in [4]. This will provide an amount of information unprecedented in environmental monitoring. Currently, monitoring data are typically stored and evaluated in a closed and decentralised way, using non-harmonised formats and without substantial data exchange between the scientists and agencies involved. These deficiencies hamper the recognition of newly emerging contaminants and mixtures, the prioritisation and identification of the newly recognised chemicals, and the efficient exploitation of these data for quality assessment and management on a European and even global scale. So far, the infrastructure for storage, long-term archiving, open exchange, processing and analysis of these data is largely lacking, although the required technology for ‘big data’ repositories is already available [1, 5].

Any LC-HR-MS or GC-HR-MS technique needed for the detection of suspect and non-target chemicals generates large amounts of data, up to tens of GB per analysis. This brings environmental monitoring into the arena of ‘big data’. Currently, only a fraction of the information from HR-MS measurements is extracted and the rest is discarded. The challenge is (i) to extract the minimum information necessary for a quick overview of the presence or absence of a large number of suspects in the samples and (ii) to save all information from HR-MS (raw data) in a format harmonised at the European (and possibly global) level, so that environmental samples can be screened retrospectively for currently known and future pollutants.

Dealing with tens of thousands of substances, their transformation products, technical mixtures, salts, isomers, etc. may lead to great confusion if not coordinated. Neither the CAS No. nor the name is a sufficiently unique identifier for a compound of interest. At present, the US EPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard; > 875,000 chemicals, [6]) is used as a reference for extracting quality-checked information. Still, many of the chemicals with high production volumes and their transformation products are not found in this or any other database.

The identification of compounds with experimentally obtained mass spectra is more reliable than just exact mass matching of compound databases [7]. To ensure this, community-based databases containing measured mass spectra need to grow considerably. In addition, the mass spectra of ‘unknowns’ frequently recorded in environmental samples should be stored for future identification, as done in prototype form in the European (NORMAN) MassBank (https://massbank.eu/MassBank/).
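Why exact mass matching alone is less reliable can be illustrated with a minimal sketch. The suspect list, the function names, and the 5 ppm tolerance below are assumptions chosen for illustration and are not taken from any NORMAN tool. Two herbicides that share the molecular formula C9H16ClN5 cannot be separated by exact mass, so a measured mass returns both as candidates, and only a measured fragmentation spectrum (or retention information) could distinguish them.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "Cl": 34.9688527}

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C9H16ClN5'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

# Illustrative suspect list (name -> molecular formula); an assumption,
# not an extract from SusDat.
SUSPECTS = {
    "caffeine": "C8H10N4O2",
    "atrazine": "C8H14ClN5",
    "propazine": "C9H16ClN5",
    "terbuthylazine": "C9H16ClN5",
}

def match_exact_mass(measured, tolerance_ppm=5.0):
    """Return all suspects whose neutral mass lies within the ppm window."""
    hits = []
    for name, formula in SUSPECTS.items():
        theoretical = monoisotopic_mass(formula)
        error_ppm = (measured - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append(name)
    return sorted(hits)

# A single measured neutral mass yields two indistinguishable candidates.
print(match_exact_mass(229.1094))  # ['propazine', 'terbuthylazine']
```

The sketch works on neutral masses for simplicity; real workflows would also account for adducts, isotope patterns, and retention behaviour.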

Complex mixtures of chemicals should be considered together with their complex effects and ecosystem impacts. Technical developments now allow extensive chemical fingerprints from NTS, toxicity profiles, and omics responses in laboratory test systems and wildlife to be recorded, along with environmental DNA to address biodiversity, and these are delivering enormous amounts of data. The challenge is to establish the infrastructure needed for data storage and the tools for multivariate biological and chemical analysis to facilitate the use of such data.

Recommendations

  • Establish a federated European infrastructure storing raw non-target screening data converted into a common (open) format allowing for ‘on demand’ accessibility for retrospective screening

  • Establish a central platform/database storing regularly updated information on available data sets Europe-wide and, eventually, at a global scale

  • Establish a common European platform where the unique identifiers of newly discovered environmental pollutants can be shared in a harmonised format

  • Apply commonly agreed workflow(s) for retrospective analysis to identify and prioritise pollutants frequently detected in environmental samples.

Requirements

Establishing the data infrastructure for compilation and exchange of screening data on a European scale requires:

  • Recognising the need for screening data within the framework of European water policy, air and soil pollution, and waste management

  • Providing incentives by the European Commission to scientists, monitoring agencies, and Member States to share the screening data

  • Providing incentives by the scientific journals to scientists to share the raw screening data in a harmonised format as supplementary information to the publications using these data

  • Securing European and national scale funding for establishment of the interoperable infrastructure

  • Supporting the European MassBank for systematic storage of mass spectral information of environmentally relevant substances (https://massbank.eu/MassBank)

  • Further harmonisation of wide-scope target and suspect screening techniques in Europe

  • Further development of HR-MS data processing workflows.

Achievements

SOLUTIONS/NORMAN database system

The NORMAN network (https://www.norman-network.net; a network of more than 80 reference laboratories, research centres and other organisations for monitoring of emerging environmental substances in Europe and North America; [8]) and the SOLUTIONS project (https://www.solutions-project.eu; [9]) have pushed the limits of NTS further using European case studies. It is now possible to screen more than 2000 target compounds and more than 40,000 suspect substances in environmental samples. An online database for wide-scope target and non-target screening data was developed as a part of the NORMAN Database System (https://www.norman-network.com/nds) and the SOLUTIONS Database System (https://www.norman-network.com/solutions/norman.php). The latter also contains a unique list of modelling-based prioritised substances, whose presence in the environment is inferred not from actual occurrence measurements but from predictions based on their production volumes, use patterns, and how easily they can be released into the environment.

NORMAN suspect list exchange

A collaborative trial organised by the NORMAN network on a surface water sample from the Danube river basin revealed that suspect screening using specific lists of chemicals to find “known unknowns” was a very common and efficient way to expedite non-target screening [10]. As a result, the NORMAN Suspect List Exchange was founded (https://www.norman-network.com/nds/SLE/) and members were encouraged to submit their suspect lists. To date, more than 50 lists containing widely varying numbers of substances have been uploaded. Over 40,000 substances are available in the correspondingly merged SusDat database (https://www.norman-network.com/nds/susdat). This database contains harmonised names, CAS Nos., SMILES, InChIKeys, “MS-ready structure forms” with chemical substances provided in the form observed by the mass spectrometer (e.g., desalted, as separate components of mixtures [11]), exact masses, retention indices, and modelling-based predicted ecotoxicity threshold values. A further > 40,000 substances are in the pipeline. The curation was done within the network using open-access cheminformatics toolkits. Starting in 2017, the NORMAN Suspect List Exchange and the US EPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) pooled resources in curating and uploading these lists to the Dashboard (https://comptox.epa.gov/dashboard/chemical_lists).
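The idea of merging independently submitted suspect lists on a structure-based identifier, rather than on names or CAS Nos., can be sketched as follows. This is a minimal illustration under stated assumptions, not the curation procedure used for SusDat: the two input lists, the field names, and the function `merge_suspect_lists` are hypothetical, and the identifiers shown are included for illustration only and should be checked against a curated source.

```python
from collections import defaultdict

# Two hypothetical submitted lists; field names are illustrative assumptions.
LIST_A = [
    {"name": "Caffeine", "cas": "58-08-2",
     "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
    {"name": "Carbamazepine", "cas": "298-46-4",
     "inchikey": "FFGPTBGBLSHEPO-UHFFFAOYSA-N"},
]
LIST_B = [
    # Same compound as the first entry of LIST_A, submitted under a synonym.
    {"name": "1,3,7-Trimethylxanthine", "cas": "58-08-2",
     "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
]

def merge_suspect_lists(named_lists):
    """Group entries by InChIKey, collecting every name, CAS No. and source."""
    merged = defaultdict(lambda: {"names": set(), "cas": set(), "sources": set()})
    for source, entries in named_lists.items():
        for entry in entries:
            record = merged[entry["inchikey"]]
            record["names"].add(entry["name"])
            record["cas"].add(entry["cas"])
            record["sources"].add(source)
    return dict(merged)

merged = merge_suspect_lists({"list_a": LIST_A, "list_b": LIST_B})
for key, record in sorted(merged.items()):
    print(key, sorted(record["names"]), sorted(record["sources"]))
```

Matching on names alone would have kept the two caffeine entries apart; keying on the InChIKey collapses them into one record while retaining both synonyms and both source lists.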

NORMAN digital sample freezing platform (DSFP)

A retrospective screening platform for hosting mass spectrometric data obtained by LC-HR-MS was created in 2017 (https://norman-data.net), with the ambition of becoming a European and possibly global standard for retrospective suspect screening of environmental pollutants [5; Fig. 1]. This platform enables a quick and effective overview of the potential presence of thousands of substances either known or suspected to be present in the environment (based on the SusDat database), including a wide range of contaminants of emerging concern, their transformation products and unknowns, across a large number of samples and different matrices. A tool for semi-quantitative estimation of the concentration of any detected compound based on structural similarity is being tested.
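The kind of presence/absence overview described above can be sketched in a few lines. The code below is an illustrative simplification and does not reflect how the DSFP is implemented: the archived peak lists, sample names, function name, and 5 ppm tolerance are all hypothetical, only [M+H]+ ions are considered, and matching uses m/z alone without retention time or fragment information.

```python
PROTON = 1.007276  # mass of a proton (u), added to form the [M+H]+ ion

# Neutral monoisotopic masses (u) of three illustrative suspects.
SUSPECT_MASSES = {
    "caffeine": 194.08038,
    "carbamazepine": 236.09496,
    "atrazine": 215.09377,
}

# Hypothetical archived peak lists: sample name -> detected m/z values.
ARCHIVED_PEAKS = {
    "river_2016": [195.0877, 301.1410],
    "effluent_2017": [195.0876, 237.1022, 412.2001],
}

def screen(peaks_by_sample, suspects, tolerance_ppm=5.0):
    """Return {sample: {suspect: True/False}} for [M+H]+ matches by m/z."""
    table = {}
    for sample, peaks in peaks_by_sample.items():
        row = {}
        for name, neutral_mass in suspects.items():
            target = neutral_mass + PROTON
            row[name] = any(
                abs(mz - target) / target * 1e6 <= tolerance_ppm for mz in peaks
            )
        table[sample] = row
    return table

table = screen(ARCHIVED_PEAKS, SUSPECT_MASSES)
for sample, row in table.items():
    print(sample, row)
```

Because the peak lists are archived, the same function can be rerun whenever the suspect dictionary grows, which is the essence of retrospective screening: no new sampling or measurement is needed to ask whether a newly recognised substance was already present.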

Fig. 1

Adopted workflow for obtaining harmonised raw screening monitoring data through the Digital Sample Freezing Platform (DSFP) interface [5]

European (NORMAN) MassBank

A database for MS (mainly high-resolution) spectra of substances of environmental and metabolomic relevance was created in Europe in 2011, using a format developed previously in Japan. European (NORMAN) MassBank (https://massbank.eu/MassBank/) now contains 57,472 unique mass spectra of 14,667 substances (accessed on 10 May 2019). The exact mass, fragmentation, and measurement information on all substances feed into the NORMAN DSFP. In SOLUTIONS, the joint efforts of the environmental and metabolomics communities on MassBank development were strengthened and a developer consortium was founded (https://github.com/MassBank/).

Demonstration and evaluation in case studies

The databases developed within NORMAN/SOLUTIONS presented above have already been applied in several case studies related to SOLUTIONS. In the Joint Danube Survey 3 (2013; [12]), wide-scope target and suspect screening using comprehensive substance lists was tested by several laboratories. Wide-scope target screening tools combined with bioassays were used systematically in the assessment of abatement options in the River Rhine catchment [13]. The NormaNEWS study, carried out in 2017, established a global emerging contaminant early warning network to rapidly assess the spatial and temporal distribution of contaminants of emerging concern in environmental samples through retrospective analysis of HR-MS data. The effectiveness of such a network was demonstrated through a pilot study, in which eight reference laboratories with available archived HR-MS data retrospectively screened data acquired from aqueous environmental samples collected in 14 countries on 3 different continents [14]. Wide-scope target (> 2100 substances) and suspect screening (NORMAN SusDat; > 40,000 substances) were performed in water, sediment, and biota samples in the Joint Black Sea Surveys (2016, 2017; [15]). Waste water treatment plant effluents in the Danube River Basin were thoroughly analysed in 2017 using wide-scope target and suspect screening together with a battery of SOLUTIONS/NORMAN bioassays, in cooperation with the International Commission for the Protection of the Danube River (ICPDR) [16]. The outcomes of the case studies support further development of harmonised databases for archiving ‘big data’ from NTS.