Highly polar trace organic compounds, which are persistent, mobile, and toxic (PMT) or are very persistent and very mobile (vPvM) in the aquatic environment, may pose a risk to surface water, ground water, and drinking water supplies. Despite the advances in liquid chromatography-mass spectrometry, there often exists an analytical blind spot when it comes to very polar chemicals. This study seeks to make a broad polarity range analytically accessible by means of serially coupling reversed-phase liquid chromatography (RPLC) and hydrophilic interaction liquid chromatography (HILIC) to high-resolution mass spectrometry (HRMS). Moreover, a workflow is presented using optimized data processing of nontarget screening (NTS) data and subsequently generating candidate lists for the identification of very polar molecules via an open-access NTS platform and implemented compound database. First, key input parameters and filters of the so-called feature extraction algorithms were identified, and numerical performance indicators were defined to systematically optimize the data processing method. Second, all features from the very polar HILIC elution window were uploaded to the STOFF-IDENT database as part of the FOR-IDENT open-access NTS platform, which contains additional physicochemical information, and the features matched with potential compounds by their accurate mass. The hit list was filtered for compounds with a negative log D value, indicating that they were (very) polar. For instance, 46 features were assigned to 64 candidate compounds originating from a set of 33 samples from the Isar river in Germany. Three PMT candidates (e.g., guanylurea, melamine, and 1,3-dimethylimidazolidin-2-one) were illustratively validated using the respective reference standards. In conclusion, these findings demonstrate that polarity-extended chromatography reproducibly retards and separates (very) polar compounds from surface waters. These findings further indicate that a transparent and robust data processing workflow for nontarget screening data is available for addressing new (very) polar substances in the aqueous environment.
Trace organic compounds (TOrCs) are widely known for being ubiquitous in the aquatic environment . Recently, attention was drawn to one specific group of very polar and poorly degradable TOrCs—classified as “persistent in the environment, mobile in the aquatic environment, and toxic” (PMT) or “very persistent in the environment and very mobile in the aquatic environment” (vPvM)—because they give rise to environmental concerns [2, 3]. PMTs and vPvMs originate from household or industrial chemicals, pharmaceuticals, or pesticides, and enter the aquatic environment through point sources like municipal and industrial wastewater discharge  or diffuse sources such as runoff from agricultural fields or urban surfaces. Additionally, biotic or abiotic reactions from both metabolites and transformation products tend to be more polar than the precursor molecule . Whereas nonpolar chemical compounds can often be efficiently eliminated by sorption to sludge or biotransformation/biodegradation, conventional wastewater treatment fails to significantly remove PMTs and vPvMs . Being highly polar and thus mobile in water, these compounds spread along partially closed water cycles and are able to overcome technical and biological barriers. The result is that PMTs and vPvMs are high-risk candidates for eventually reaching drinking water supplies.
The advent of liquid chromatography (LC) coupled to mass spectrometry (MS) has created new opportunities for detecting polar substances in complex samples. Nevertheless, a large fraction of PMTs and vPvMs still constitute a blind spot in terms of analysis, monitoring, and regulation . Recognition of this “analytical gap” led to an increasing demand for chromatographic techniques which complement classical reversed-phase liquid chromatography (RPLC). Supercritical fluid chromatography (SFC) appears to be a promising technology because it is able to separate compounds of a broad polarity range within a relatively small elution window . Furthermore, hydrophilic interaction liquid chromatography (HILIC) seems to be predestined for coupled separations because of its orthogonality to RPLC [9, 10]. Alpert originally postulated a partitioning mechanism between a hydrated layer partially immobilized on a hydrophilic stationary phase and a relatively hydrophobic bulk eluent . Later on, a theory of a multimodal separation was put forward in which retention occurs through partitioning interactions as well as hydrogen bonding and ion exchange [12, 13]. Despite having a complex retention mechanism, applications already exist which prove that HILIC can effectively separate very polar and environmentally relevant molecules [8, 14, 15]. In 2013, Greco et al. developed a serial coupling of RPLC and HILIC that takes advantage of the complementarity of both techniques and enables both highly polar and nonpolar compounds to be separated within a single run . Injecting large volumes of aqueous sample on a HILIC column might render elution patterns. However, the injection on the RPLC column of the polarity-extended chromatographic system presented here overcomes this issue, and the final eluent containing a high acetonitrile content is very suitable for mass spectrometric electrospray ionization (ESI).
In order to widen the analytical window in favor of more polar molecules, the overall approach regarding how to handle a sample and the respective data requires adjustment. The nontarget screening strategy as initially introduced by Hérnandez et al.  and further refined by Krauss et al.  is well-suited for revealing unknown molecules of interest in complex matrices without prior information or reference standards. The German Water Chemical Society recently described it as a “[…] procedure without limitation to pre-select substances” . The nontarget approach was essentially driven by the evolution of high-accuracy and high-resolution mass spectrometers (HRMS) able to acquire full scan data within a remarkably large mass range. However, even though high mass accuracy reduces the chance of erroneously assigning a molecular formula to a detected mass, further constraints need to be applied for an unambiguous allocation. Kind and Fiehn generated a comprehensive in silico test set of molecular formulae using the elements C, H, N, O, S, and P while complying with chemical and mathematical rules. They showed that the upper mass limit for unique formula assignment is still as low as 185.9760 Da at a theoretical accuracy of 0.1 parts per million (ppm) . It becomes obvious that the results of a nontargeted measurement need to undergo further filtering steps. Criteria can be derived from instrumental boundary conditions and physicochemical properties specific to a compound group. In addition to the predefined mass range, chromatographic information like a retention time (RT) window and/or retention time indexing (RTI) also constitutes instrumental constraints beneficial to data filtering. In parallel with a suspect screening approach, the gathered data may be compared with entries of potential PMTs/vPvMs in a compound database. The search is facilitated by polarity indicators like the logarithmic distribution coefficient between octanol and water (log D). Substances which are known in chemical databases but unknown to the investigator are referred to as “known unknowns”  or “hidden targets” . In contrast to the hidden target strategy, a suspect screening commonly targets the suspected ions already at the data acquisition stage, for instance, by conducting “multiple reaction monitoring” experiments.
Alongside the hidden target screening workflow—from planning an experiment to a final list of tentative PMTs/vPvMs found in a sample—several issues typically arise that require some form of compromise:
Introducing statistical value to the conducted study versus maintaining feasibility in terms of total measurement time and HRMS data volume
Reducing false positive allocations by applying a set of restrictive parameters and filters to the data processing method versus avoiding information loss by choosing overly rigid threshold values
Optimizing the data processing method according to individual needs versus keeping the process transparent and reproducible
These conflicts pose a classical optimization problem and become apparent when composing a method for extracting features from nontarget screening data. Such a feature extraction (FE) method has several quantitative parameters and filters upon which the quality of the resulting feature list depends. Despite the complexity of setting these parameters, the common approaches include either using default settings or optimizing by “trial and error,” or changing “one factor at a time.” Not only do these approaches strongly rely on the investigator’s personal experience, they frequently also disregard statistical interactions between parameters. A more efficient and systematic approach is the statistical design of experiment (DoE): The maximum amount of information is extracted from complex nontarget screening data while simultaneously reducing experimental effort. The idea is to vary relevant peak picking parameters simultaneously over a reduced set of experiments. Cause-and-effect correlations are described by a mathematical model that can in turn be used for the interpretation, prediction, and optimization of these parameters. Since the pioneering work of Fisher in 1926 , the field has developed several methodological instruments such as the Plackett-Burman screening design , upon which the experimental plan used in this study was based.
In the following, an analytical workflow is suggested for facilitating the identification of PMTs and vPvMs in aqueous environments. Data was acquired by an RPLC-HILIC-ESI-time of flight (TOF)-MS coupling. Features were extracted by means of a systematically optimized method, then followed by a database inquiry and a filtering step for polar and very polar substances.
Material and methods
In the following, polarity is approximated by the pH-dependent octanol-water partition coefficient log D. Substances are referred to as either “‘nonpolar’ for a log D > + 2, ‘polar’ for values between − 2.5 and + 2, or ‘very polar’ for a log D < − 2.5” at pH 7 as described elsewhere . It should be kept in mind that a one-parameter coefficient like the log D can only describe the compound variability within a single substance class and does not consider interactions involved in partitioning .
Acetonitrile and water (both LC-MS grade) were purchased from Fluka (Buchs, Switzerland). Reference compounds with a purity of > 99% were purchased from Sigma-Aldrich (Seelze, Germany). First, they were dissolved individually into stock solutions of 1 mM in either acetonitrile (nonpolar compounds) or in acetonitrile/water (50/50, v/v) (polar or very polar compounds). Afterwards, they were combined into a standard working mixture of 10 μM per compound.
Water sampling and sample preparation
The samples used in this study were collected from the Isar river in Germany during March, May, and July 2015, respectively. Grab samples were taken at eleven locations between the Austrian-German border and the Bavarian city of Dingolfing, respectively (exact coordinates and descriptions of the sites are given in Table S1 in the Electronic Supplementary Material, ESM). Each monthly sampling pass and subsequent filtration were completed within a single day.
The samples were enriched using an offline polarity-extended solid-phase extraction (SPE) method. Therefore, two different extraction steps were sequentially conducted: (a) Reversed Phase (RP) Strata-X C18-endcapped from Phenomenex (Aschaffenburg, Germany) for nonpolar compounds and (b) ZIC-HILIC from Dichrom GmbH (Haltern am See, Germany) for (very) polar compounds. Firstly, the preconditioned RP cartridges were loaded with 150 mL of water sample. After passing through the column, it was freeze-dried with an Alpha 1 – 4 LSC freeze dryer (Christ, Osterode am Harz, Germany). The nonpolar fraction of the sample was eluted from the RP cartridge with 3 mL of acetonitrile/water (80/20, v/v), followed by 3 mL of pure acetonitrile. The eluate was dried in a vacuum until the solvents were completely evaporated. The remaining freeze-dried water sample was reconstituted in 12 mL acetonitrile/water (95/5, v/v) and centrifuged. Afterwards, the supernatant was loaded onto the preconditioned ZIC-HILIC cartridge. Three milliliters of acetonitrile/water (50/50, v/v) was used to elute the sample’s polar fraction from the ZIC-HILIC column into the tube containing the same dried nonpolar fraction of the sample. The combined extracts were dried again and finally reconstituted in 0.5 mL acetonitrile/water (50/50, v/v). The resulting solution was filtered through a 22 μM PVDF filter into a glass vial and stored at 4 °C prior to analysis.
Pure water (LC-MS grade) was processed the same way and served as a blank sample.
In short, two binary pumps and two online degassers were used (Agilent Technologies, Waldbronn, Germany). The RP separation was performed on a Poroshell 120 EC-C18 (50.0 × 3.0 mm, 2.7 μm; Agilent Technologies) column and the HILIC separation on a ZIC-HILIC (150 × 2.1 mm, 5 μm, 200 Å; Merck Sequant, Umea, Sweden). The two columns were connected in series via a T-piece with a mixing frit (Upchurch, IDEX Europe GmbH, Erlangen, Germany). The third port of the T-piece was connected to a second binary pump. The following solvents were used as mobile phases: Solvent A was 10 mM ammonium acetate in water/acetonitrile (90/10, v/v) and solvent B was 10 mM ammonium acetate in acetonitrile/water (10/90, v/v) for the RPLC separation. For HILIC, acetonitrile (solvent C) and water (solvent D) were used. Information on the gradients is provided in Table S2 of the ESM. The injection volume was 10 μL.
The chromatographic system was coupled to an Agilent 6230 TOF-MS equipped with a Jet Stream ESI interface (Agilent Technologies, Santa Clara, CA, USA). The ESI source was operated in positive mode under the following conditions: gas temperature 325 °C, drying gas flow 8.0 L min−1, nebulizer gas pressure 45 psi, sheath gas temperature 250 °C, sheath gas flow 5.5 L min−1, capillary voltage 3 kV, fragmentor voltage 175 V. Nitrogen was used as both the drying and the sheath gas. During the analysis, a mass range up to 1700 m/z was scanned in high-resolution mode. The instrument was continuously calibrated on 125 nM purine and 6.25 nM HP-921 MS tuning mix (Agilent Technologies, Waldbronn, Germany) . The instrument’s resolution is specified at 22,000 (full width at half maximum, FWHM) at m/z 1522 after automatic tuning procedure . The accuracy was below 10 ppm.
Each sample was measured once. Interday repeatability of the chromatographic system was investigated on 68 standard substances by measuring them four times over the course of the campaign (ESM Table S3). For each compound, the mean value and standard deviations were calculated for RT, mass error, and the peak’s full width at half maximum (FWHM) and its height, respectively.
Data processing and (tentative) molecule identification
The software that acquired the raw mass spectrometric data and controlled the system was MassHunter Workstation LC/MS Data Acquisition (version B.05.01, Agilent Technologies).
Chromatographic peaks were binned together across all samples by RT and accurate mass. This compound bin is referred to as a feature which is characterized by the median values of RT, accurate mass, and signal intensity. Had MS/MS data been available, the spectrum would have been a further part of the feature.
The mass spectrometric raw data was processed by MassHunter Workstation Profinder (version B.06.00, Agilent Technologies) software. The feature extraction (FE) method is composed of two consecutive algorithms: The first which processes the HRMS data is an untargeted matter, deconvoluting all peaks that exceed an intensity limit defined by the investigator. Subsequently, ion species (molecular ions, isotopes, and adducts) that display the same chromatographic behavior are grouped together and aligned across all samples. Median values for masses, RTs, and composite spectra are calculated and fed into the second, recursive algorithm. This second algorithm uses these values to perform a targeted extraction and thereby improves reliability in features .
Within the above program, a “batch recursive FE” method for small molecules was built in order to generate a list of features. Each feature contains peaks of defined mass and RT aligned across the respective number of samples. Isotope grouping was performed on the basis of the common organic molecule model with a peak spacing tolerance of 0.0025 m/z + 7.0 ppm. The critical FE parameters thereby were optimized by means of DoE (see the “Optimization of the FE parameter using the design of experiment strategy” section), and the results are shown in Table 1.
Optimization of the FE parameter using the design of experiment strategy
Prior to the FE, critical method parameters referred to as quantitative factors (F) were identified and optimized by means of DoE using MODDE Pro software (version 126.96.36.19991; Sartorius Stedim Biotech GmbH, Göttingen, Germany). The optimization was based on the HRMS raw data from a subset of six river water samples (due to extensive data processing times): The March and July samples were each taken at locations 1, 7, and 10 (ESM Table S1), respectively. The following critical quantitative FE factors were optimized and are illustrated in Fig. 1:
Absolute threshold for peak height prior to the FE
Charge state of the ions
RT tolerance for the binning and alignment of features
Mass tolerance for the binning and alignment of features
Absolute peak height filter applied after the untargeted algorithm
Absolute peak height filter applied after the targeted algorithm
Absolute peak height filter after integration
Symmetric m/z tolerance for the recursive chromatogram extraction
These eight FE factors were varied according to a strategically designed experimental plan of 28 runs which is provided in Table S4 of the ESM. Optimization was based on six response factors (R) in the form of numerical descriptors which were chosen to assess the performance of the FE method:
The RT span of all peaks binned into a compound group across all samples. After every experimental run, the median value was calculated for the final feature list
The relative standard deviation (RSD) of masses within a compound group across all samples. Again, the median value was calculated
The number of single ions relative to the total number of extracted features
The mass difference between the median mass within a compound group and the target mass used by the recursive algorithm
The number of erroneously integrated chromatographic peaks out of a sample of ten features, which covered broad mass and RT ranges and appeared in more than ten experiments (ESM Table S5). The extracted ion chromatograms (EICs) were checked manually
The number of missed chromatographic peaks out of the same sample of ten features
For each experiment, the response variables were calculated from the extracted feature list out of the sample subset. A model was fitted using multiple linear regression. The optimization was based on desirability functions with the objective of minimizing all responses. These desirability functions ran on the specification limits for each response variable given in Table S6 (see ESM) along with the experimental results. To search for the robust setpoint, a design space was generated with a resolution of 8, with 1000 iterations and a 1% acceptance limit.
(Tentative) molecule identification
The exact masses of the features retained by the HILIC column were uploaded and processed by the FOR-IDENT platform  using the STOFF-IDENT compound database . The transition from the HILIC to the RPLC retention window was between 16 and 21 min (Fig. 2). A hard limit was defined at the RT of metformin 17.0 min since it was the last standard compound eluting from the HILIC column (ESM Table S3). A pH level of 7 and a 10.0 ppm mass tolerance for the molecular ion were selected as input parameters. The resulting list of compounds, including their physiochemical properties, was downloaded. That list was then filtered for compounds with a log D value (at pH 7) below 0. Afterwards, the underlying chromatographic peaks of successfully matched features were manually checked. Compound lists were processed in Microsoft Excel (version 1902).
Results and discussion
This study sought to combine a polarity-extended chromatographic method, i.e., the serial coupling of RPLC and HILIC, using a hidden target screening approach for tentatively identifying PMTs/PvMs.
A set of environmental data was used in order to exemplify a novel data processing and molecule identification strategy for very polar anthropogenic compounds. It is being presented with an emphasis on key parameters for ensuring transparency and reproducibility. Consequently, the study focused on data processing and tried to explicitly highlight the importance of transparently communicating the FE method. Since a large variety of data analysis software tools exist, the key peak picking parameters need to be identified and clearly stated. Therefore, a data-specific optimization is suggested for parameters and filters which are critical to extracting features from HRMS raw data. Subsequently, a compound database was consulted in order to tentatively identify (very) polar molecules by comparing physiochemical properties of the observed analytical data with compounds stored in the database. The data processing strategy for going from mass spectrometric raw data to a list of potential compounds of the very polar fraction of the sample is illustrated in Fig. 1. Every step along the workflow is characterized by several input parameters (orange boxes) and filters (blue boxes) acting as variables influencing the outcome in the form of a feature list.
Features are characterized by the median RT, accurate mass, and signal intensity, as described in the experimental section.
Environmental samples studied
The workflow was built on 33 environmental samples that were taken during early 2015 from the Isar river in southern Germany (Bavaria). The eleven sites sampled started at the German-Austrian border close to the river’s source, then covered the urban area of Munich and finished in the town of Dingolfing. The exact locations are specified and commented on in Table S1 (see ESM). Each sampling pass was performed during a single day and repeated during the months of March, May, and July, respectively. Karakurt et al.  calculated the relative wastewater effluent contribution at different gauging stations along the Isar river from the accumulated discharge of upstream wastewater treatment plants under mean minimum annual discharge conditions. Their results showed that the wastewater effluent contribution was < 1% in Mittenwald, near the source. It jumped from 5 to > 50% after the plants treating the wastewater of Munich and recovered to 11% at the gauging station of the town of Landshut, 40 km downstream (probably due to dilution) . Judging from the substantial portions of treated wastewater in densely populated areas, an elevated amount of anthropogenic PMTs and vPvMs was expected in the samples taken shortly downstream of Munich. Furthermore, this campaign was designed to unravel seasonal differences between the spring and summer samples.
The hidden target screening approach is well-suited for such global surveys of surface waters because it creates options for comparing samples by their molecular fingerprint. The aqueous samples were analyzed using the polarity-extended chromatographic separation technique RPLC-HILIC coupled with accurate TOF-MS.
Robust operation of the analytical setup is a prerequisite for successfully handling the observed nontarget screening data and the evaluation thereof.
As a result, the data on the interday repeatability of the RPLC-HILIC-TOF coupling is presented in Table S3 of the ESM.
For the 43 standard compounds eluting from the RPLC column between 23.7 and 35.1 min, the mean standard deviation of RT was 0.3 min (RSD 0.9%). Norfluoxetine showed the largest variability at 0.6 min (2.1%). The mass spectrometric accuracy can be expressed by the mass error which corresponds to the relative deviation of the detected mass from the monoisotopic mass. All RPLC standard compounds displayed a mean mass error of < 7 ppm.
Twenty-five compounds were retained on the HILIC column and eluted between 5.9 and 17.0 min. The mean standard deviation of RT was 0.2 min (1.9%) whereas gabapentin and metformin exhibited the highest values at 0.5 min (4.6%) and 0.8 min (4.4%), respectively. The mean mass error was below 9 ppm for all HILIC compounds, which is in good agreement with an expected accuracy of 10 ppm for the applied time of flight mass spectrometer.
The overall scattering of the RTs for the standard compounds is considerably low, which indicates a reliable and robust separation of both the RPLC and the HILIC parts. Retention time stability is an important parameter when dealing with nontarget screening data since it directly factors into the componentization during the feature-finding process. Accordingly, for all features detected in the real samples during this 5-month project, the RSD of RTs was below 4% (Table 2). Moreover, the mass errors observed in the test set justify adopting a maximum mass difference of 10 ppm as a criterion for matching features with database compounds by accurate mass. More details on stability testing of the coupling can be found in the previous work published by Bieber et al. .
Data processing strategy and optimization
Optimization of FE parameters and filters
The two FE algorithms applied have differing objectives:
The nontargeted one that should allow a less restricted search in order to reduce the number of missed chromatographic peaks and
the targeted one that should be more precise in order to introduce statistical confidence to the features. Since the target criteria of the second algorithm depend on the componentization of the first one, the parameter settings need to be coordinated.
Therefore, critical input parameters and filters (called factors) were identified and optimized before applying the FE method to real samples. The optimization was based on six responses which characterize the performance of a certain combination of factors: R1, R2, and R3 (see the “Optimization of the FE parameter using the design of experiment strategy” section for a definition) assess the quality of the feature componentization. Consequently, given closer binning and alignment tolerances, the percentage of single ions (R3) is likely to increase. Nevertheless, a fraction of observed single ions might also derive from a low signal intensity. R4 provides information on the accuracy of the recursive algorithm. R5 and R6 indicate whether integrated peaks are correctly allocated to a compound bin.
Based on the results of all 28 experiments included in the strategically designed plan, a model was fitted for each individual response by means of multiple linear regression. For model diagnostics, the coefficient of determination R2 was considered. The latter represents the fraction of a response that can be explained by the model. The predictive ability of a model can be estimated by Q2 . For each response, R2 was > 0.7 and Q2 > 0.6. Furthermore, the FE algorithms generate the same feature lists for repeated processing attempts given that the same data set was used, and the same parameter values were set. Considering all three performance characteristics (R2, Q2, and reproducibility), the model was considered to be sufficiently significant. The formula for R2 and Q2, along with the individual values for each response, can be found in Table S7 (see ESM). Moreover, the model terms are also listed therein along with their respective coefficients.
A robust setpoint for the critical FE input parameters and filters was iteratively calculated with the objective of minimizing the six response variables. The resulting factor settings are presented in Table 1. Some results can be transferred to other data analysis workflows applied for HRMS, e.g., setting the initial threshold for the signal intensity (F1) to zero in order to decrease the number of missed chromatographic peaks while simultaneously applying a rigid post-processing filter (F6, F7) in order to reduce the number of falsely integrated peaks and single ions. Further findings are specific to the data set at hand, the measurement campaign which was conducted over a period of 5 months. The chromatographic and mass spectrometric conditions shifted slightly as a result. It is generally accepted that tolerance windows need to be implemented in feature-building algorithms which account for these RT and mass drifts . Accordingly, an optimized EIC tolerance (F8) of 10 ppm was calculated from the model and is in good agreement with the mass errors found for the HILIC standard compounds in the TOF system (see “Environmental samples studied” section and ESM Table S3). Even though the average interday RT variability over the 25 standard compounds was < 2%, in some rare cases (such as for gabapentin and metformin), the RT shifted over an absolute span of > 1.0 min. As a result, an optimal and robust binning and alignment tolerance of 0.9 min was determined for RT (F3). Another way of coping with shifting instrumental conditions would be to split data sets into homogeneous blocks, with each block containing data from successively measured samples. On the one hand, this approach could lead to smaller tolerance windows for RT and accurate mass. On the other hand, however, it would mean an inability to handle the data from a consecutive campaign as one batch.
These findings prove that FE parameter settings need to adapt to fluctuating data quality. In this context, the concept of optimizing these parameters by DoE introduces flexibility into a workflow that processes nontarget data. A DoE approach is inherently accompanied by a thorough statistical evaluation of a parameter’s significance, effect, or potential interactions with other parameters and, as a result, leads to a better understanding of the algorithm in general. Examples of DoE are already being used for the purpose of optimizing peak picking parameters for XCMS software  in metabolomics [34, 35]. The present study shows that the concept is not limited to specific software and could serve as an add-on for improving a preexisting nontarget or hidden target screening strategy in environmental analysis. Moreover, it has the potential to become a fully automated element within an FE workflow.
Polarity-dependent feature extraction
The final FE method was applied twice: First, on the raw HRMS data of the blank sample and afterwards to the real sample set. The masses found in the blank sample were excluded from the feature list of the real river water samples. That way, 179 features were eliminated from the sample results table. One would anticipate more lost features after a blank exclusion compared with a blank subtraction, but there actually appears to be no significant differences between both methods . One other option would be to define a sample-to-blank intensity ratio. However, blank exclusion is easier to implement into automated workflows. As depicted in the RT-mass plot in Fig. 2, the remaining 1739 features can be separated into two groups: The first group represents compounds retained by the HILIC column and the second those retained by the RPLC column. Thus, the features eluting in the first section of the run were expected to be polar or very polar . Accordingly, all features with RTs > 17.0 min were eliminated from the feature list. Preliminary tests showed that the last very polar reference compound eluted from the HILIC column at that time (ESM Table S3).
The RPLC fraction of features that were detected by the serial coupling was not further considered in this study. Consequently, the instrumental setup implicitly set boundary conditions that could be used as a filter criterion with respect to the research question at hand.
Features able to be found in fewer than three different water samples were excluded in a further filter step prior to the database search. Although triplicates are in contemporary nontarget analysis considered to be the optimal with regard to replicate injections , each sample was measured only once. An elevated number of samples in addition to the already high data output of a nontargeted HRMS screening requires measures for increasing the feasibility of studies while simultaneously minimizing the number of false positive allocations. In the aforementioned study from 2015, the statistical need for triplicate injections of the same sample was not clearly proven and, further, single measurements reduce the expenditure of measurement time and resources. Instead, a threshold for features was introduced specifying the minimum number of appearances across all samples. The approach of filtering by “detection frequency” in real samples has been successfully applied in order to prioritize features for further investigations under different research questions [37, 38]. It works well for large sample sets, as was the case in this study, because eleven locations were sampled at several dates over 1 year (see ESM Table S1). In addition to the binning and alignment of peaks within a sample group, such a detection frequency threshold introduces more statistical confidence to a feature. However, this approach is not applicable in all situations since, for example, short-term emission of TOrCs from point sources will be missed. The number of eligible features was reduced to 408 after removing those eluting later than 17.0 min and those that were detected in fewer than three samples.
(Tentative) compound identification in water samples
Inquiry of a compound database
These features were uploaded to the open-access platform FOR-IDENT handling the compound database STOFF-IDENT. Providing physicochemical properties of more than 11,000 anthropogenic compounds, this database was developed to support identification of pollutants and emerging contaminants relevant to the aquatic environment . The database search was performed based on accurate mass ± 10 ppm on the platform. Thus, 132 features were successfully located in the database, and 287 anthropogenic trace organic compounds suggested. The difference derives from the given mass window, so multiple compounds were matched with a single feature. In some cases, it was the other way around: Various features were assigned to one compound. A list of compounds was proposed together with information on their polarity expressed by pH-dependent log D value. The list served as a basis for further filtering and investigation of the underlying peaks.
Polarity filtering of the FOR-IDENT results
Due to the complex retention mechanism of HILIC, the relationship between the analytes’ RTs and the related log D values has yet to be determined in detail. However, Bieber et al. tested a mix of 262 standard compounds on the RPLC-HILIC coupling and found that all of the 136 compounds that eluted from the HILIC column have a negative log D value . In the current investigation, this was also true for the 25 reference compounds that eluted earlier than 17.0 min (ESM Table S3). Knowing that, 287 database hits were filtered for compounds that have a log D value < 0 at a pH value 7. One hundred nine remaining compound candidates had polar to very polar properties as indicated by their negative log D values and consequently were expected to elute from the HILIC column.
Finally, the 68 features that led to the 109 candidates were manually reevaluated in order to ensure that they originated from chromatographically acceptable peaks. For a feature to be classified as “acceptable,” it had to fulfill the following criteria:
The spread of relative mass differences within the compound bin of the feature is lower than ± 10 ppm.
The chromatographic peaks are approximately Gaussian shaped.
Each peak’s mass spectrum comprises at least two ions to increase the reliability of single features.
For at least one chromatographic peak in the compound bin, the peak’s mass spectrum displays an isotopic pattern.
Our approach assigned 64 PMT/vPvM candidates to 46 features (Table 2) detected in Isar river samples from March, May, and July of 2015. The numerous filtering steps for feature lists throughout this workflow reduced the effort of manually evaluating chromatographic peaks. Nevertheless, 18 features had multiple components allocated to them. That number could probably have been reduced by comparing MS/MS spectra. A validation with reference standards is still pending at this time.
The spatial and temporal distribution of these features across all 33 samples is illustrated in Fig. 3. The study indicates that the overall number of features in the Isar river was significantly higher in March than in May or July of 2015. This result could be attributed to a concentration effect due to a relatively low precipitation (26 mm) in February of the same year. In contrast, there were 80 mm and 58 mm of rainfall in April and May, respectively (measured in Garmisch by the Bavarian State Ministry for the Environment) . However, in all campaigns, the number of features sharply increased downstream of the city of Munich. As an example, the dot plot in Fig. 4 shows the polar features detected in March around Munich. It indicates whether the feature’s absolute abundance increases or decreases downstream of Munich. Even though no definite quantitative statement can be made, comparing the signal intensities at different locations or dates could help in prioritizing features. For example, the peak height of the feature with the mass/RT coordinate (102.0547 Da/14.62 min) increased by a factor of 34. The compound proposed for that feature (ID 2 in Table 2) by the STOFF-IDENT database was guanylurea, which is an aerobic bacterial degradation product of the antidiabetic drug metformin . Since guanylurea is stable against further photo- and biodegradation, it distributes over the entire regional water cycle, to such an extent that it has already been detected in the North Sea .
Confirmation using a reference standard
By using the feature’s signal intensity for rudimentary prioritization, we uncovered guanylurea (ID 2) as a compound of interest because its signal intensity increased significantly downstream of the city of Munich. In order to achieve a higher level of confidence in compound identification, a reference standard was measured under the same chromatographic and mass spectrometric conditions. Additionally, melamine (ID 11) and 1,3-dimethylimidazolidin-2-one (DMI, ID 6) were exemplarily confirmed by matching their RT and accurate mass matching with the respective reference standards. The respective EICs (Figs. S1–S3) and isotopic ratios (Table S8) are provided in the ESM. The information is summarized in Table 3. In each case, the RT deviation was below 5%, and the mass difference was below 5 ppm. MS/MS matching would be additionally required for a level 1 identification according to Schymanski et al. .
However, validation by means of RT and MS matching could resolve ambiguous allocations, as the example of melamine shows: For the three features A (126.0655 Da/6.9 min), B (126.0651 Da/7.2 min), and C (126.0665 Da/8.4 min), the STOFF-IDENT database output was melamine. The RT of the reference standard of 7.0 min closely matched the RT of feature A. In addition, the absolute mass deviation of 0.0001 Da was the lowest of the three.
The feature of guanylurea was detected 13 times in total at locations 2, 7, 8, 9, and 10 (ESM Table S1).
The feature allocated to melamine was found in four samples at locations 9, 10, and 11. The findings are in line with the expectation that pharmaceuticals and their transformation products primarily occur downstream of urban areas.
One newfound polar substance of emerging interest is 1,3-dimethylimidazolidin-2-one (DMI). The feature of DMI was detected in all eleven samples from March 2015 and compared with the reference standard. It is a high-boiling aprotic solvent that, according to the European Chemicals Agency (ECHA), is used in pH regulators, water treatment products, and laboratory chemicals. It is registered by six active suppliers under REACH and 100–1000 t is brought into the European Economic Area per year. The Predicted No-Effect Concentration in freshwater is 100 μg L−1 . To the best of our knowledge, there is currently no detailed information available regarding the occurrence and distribution of DMI in the aquatic environment.
Schwarzenbach RP, Escher BI, Fenner K, Hofstetter TB, Johnson CA, von Gunten U, Wherli B. The challenge of micropollutants in aquatic systems. Science. 2006;313(80-):1072–1077.
German Environment Agency (UBA) Protecting the sources of our drinking water - the criteria for identifying persistent, mobile, and toxic (PMT) substances and very persistent, and very mobile (vPvM) substances under EU REACH Regulation (EC) No 1907. 2006.
Schmidt TC. Recent trends in water analysis triggering future monitoring of organic micropollutants. Anal Bioanal Chem. 2018;410:3933–41.
Comero S, Loos R, Carvalho R, Anto DC, Locoro G, Tavazzi S, et al. EU-wide monitoring survey on emerging polar organic contaminants in wastewater treatment plant effluents. Water Res. 2013;47:6475–87. https://doi.org/10.1016/j.watres.2013.08.024.
Boxall A, Sinclair C, Fenner K, Koplin D, Maund S. When synthetic chemicals degrade in the environment. Environ Sci Technol. 2004;38:368–75. https://doi.org/10.1117/1.3589100.
Reemtsma T, Weiss S, Mueller J, Petrovic M, Gonzalez S, Barcelo D, et al. Polar pollutants entry into the water cycle by municipal wastewater: a European perspective. Environ Sci Technol. 2006;40:5451–8. https://doi.org/10.1021/es060908a.
Reemtsma T, Berger U, Arp HPH, Gallard H, Knepper TP, Neumann M, et al. Mind the gap: persistent and mobile organic compounds - water contaminants that slip through. Environ Sci Technol. 2016;50:10308–15. https://doi.org/10.1021/acs.est.6b03338.
Bieber S, Greco G, Grosse S, Letzel T. RPLC-HILIC and SFC with mass spectrometry: polarity-extended organic molecule screening in environmental (water) samples. Anal Chem. 2017;89:7907–14. https://doi.org/10.1021/acs.analchem.7b00859.
Chalcraft KR, McCarry BE. Tandem LC columns for the simultaneous retention of polar and nonpolar molecules in comprehensive metabolomics analysis. J Sep Sci. 2013;36:3478–85. https://doi.org/10.1002/jssc.201300779.
Hemström P, Irgum K. Hydrophilic interaction chromatography. J Sep Sci. 2006;29:1784–821. https://doi.org/10.1002/jssc.200600199.
Alpert AJ. Hydrophilic-interaction chromatography for the separation of peptides, nucleic acids and other polar compounds. J Chromatogr A. 1990;499:177–96. https://doi.org/10.1016/S0021-9673(00)96972-3.
Berthod A, Chang SSC, Kullman JPS, Armstrong DW. Practice and mechanism of HPLC oligosaccharide separation with a cyclodextrin bonded phase. Talanta. 1998;47:1001–12. https://doi.org/10.1016/S0039-9140(98)00179-9.
McCalley DV. Study of the selectivity, retention mechanisms and performance of alternative silica-based stationary phases for separation of ionised solutes in hydrophilic interaction chromatography. J Chromatogr A. 2010;1217:3408–17. https://doi.org/10.1016/j.chroma.2010.03.011.
Tang Y, Xu Y, Li F, Jmaiff L, Hrudey SE, Li X. Nontargeted identification of peptides and disinfection byproducts in water. J Environ Sci. 2016;42:259–66.
Greco G, Grosse S, Letzel T. Robustness of a method based on the serial coupling of reversed-phase and zwitterionic hydrophilic interaction LC – MS for the analysis of phenols. J Sep Sci. 2014;37:630–4. https://doi.org/10.1002/jssc.201301112.
Greco G, Grosse S, Letzel T. Serial coupling of reversed-phase and zwitterionic hydrophilic interaction LC/MS for the analysis of polar and nonpolar phenols in wine. J Sep Sci. 2013;36:1379–88. https://doi.org/10.1002/jssc.201200920.
Hernández F, Pozo JÓ, Sancho JV, López FJ, Harín JM, Ibánez M. Strategies for quantification and confirmation of multi-class polar pesticides and transformation products in water by LC – MS 2 using triple quadrupole and hybrid quadrupole time-of-flight analyzers. Trends Anal Chem. 2005;24:596–612. https://doi.org/10.1016/j.trac.2005.04.007.
Krauss M, Singer H, Hollender J. LC – high resolution MS in environmental analysis: from target screening to the identification of unknowns. 2010:943–51. https://doi.org/10.1007/s00216-010-3608-9.
“Non-Target Screening” expert committee of the German Water Chemistry Society. Use of non-target screening by means of LC-ESI-HRMS in water analysis, 1.0. 2019.
Kind T, Fiehn O. Metabolomic database annotations via query of elemental compositions : mass accuracy is insufficient even at less than 1 ppm. BMC Bioinformatics. 2006:1–10. https://doi.org/10.1186/1471-2105-7-234.
Little JL, Williams AJ, Pshenichnov A, Tkachenko V. Identification of “known unknowns” utilizing accurate mass data and chemspider. J Am Soc Mass Spectrom. 2012;23:179–85. https://doi.org/10.1007/s13361-011-0265-y.
Letzel T, Bayer A, Schulz W, Heermann A, Lucke T, Greco G, et al. LC – MS screening techniques for wastewater analysis and analytical data handling strategies: Sartans and their transformation products as an example. Chemosphere. 2015;137:198–206. https://doi.org/10.1016/j.chemosphere.2015.06.083.
Fisher RA. The arrangement of field experiments. J Minist Agric Gt Britain. 1926:505–13. https://doi.org/10.1007/978-1-4612-4380-9_8.
Plackett RL, Burman JP. The design of optimum multifactorial experiments. Biometrika. 1946;33:305–25.
Goss K-U, Schwarzenbach RP. Linear free energy relationships used to evaluate equilibrium partitioning of organic compounds. 2001;35:1–9. https://doi.org/10.1021/es000996d.
Agilent Technologies. Agilent 6230B Accurate-Mass TOF LC / MS Data Sheet. 5991-1347EN. 2012.
Agilent Technologies. MassHunter Profinder Software, Quick Start Guide. G3835-90027 31–32. 2016.
Open Access Platform FOR-IDENT. https://www.for-ident.org/. Accessed 28 Oct 2019.
Compound Database STOFF-IDENT. https://www.lfu.bayern.de/stoffident/#!home. Accessed 28 Oct 2019.
Karakurt S, Schmid L, Hu U, Drewes JE. Dynamics of wastewater effluent contributions in streams and impacts on drinking water supply via riverbank filtration in Germany - a national reconnaissance. Environ Sci Technol. 2019;53:6154–61. https://doi.org/10.1021/acs.est.8b07216.
Golub GH, Heath M, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics. 1979;21:215–23. https://doi.org/10.1080/00401706.1979.10489751.
Schollée JE, Schymanski EL, Hollender J. Statistical approaches for LC-HRMS data to characterize, prioritize, and identify transformation products from water treatment processes. In: ACS Symposium Series; 2016. p. 45–65.
Smith CA, Want EJ, O’Maille G, Abagyan R, Siuzdak G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem. 2006;78:779–87. https://doi.org/10.1021/ac051437y.
Eliasson M, Rännar S, Madsen R, Donten MA, Marsden-Edwards E, Moritz T, et al. Strategy for optimizing LC-MS data processing in metabolomics: a design of experiments approach. Anal Chem. 2012;84:6869–79. https://doi.org/10.1016/j.ijpharm.2011.11.009.
Zheng H, Clausen MR, Dalsgaard TK, Mortensen G, Bertram HC. Time-saving design of experiment protocol for optimization of LC-MS data processing in metabolomic approaches. Anal Chem. 2013;85:7109–16. https://doi.org/10.1021/ac4020325.
Nürenberg G, Schulz M, Kunkel U, Ternes TA. Development and validation of a generic nontarget method based on liquid chromatography - high resolution mass spectrometry analysis for the evaluation of different wastewater treatment options. J Chromatogr A. 2015;1426:77–90. https://doi.org/10.1016/j.chroma.2015.11.014.
Zonja B, Delgado A, Pérez S, Barceló D. LC-HRMS suspect screening for detection-based prioritization of iodinated contrast media photodegradates in surface waters. Environ Sci Technol. 2015;49:3464–72. https://doi.org/10.1021/es505250q.
Müller A, Schulz W, Ruck WKL, Weber WH. A new approach to data evaluation in the non-target screening of organic trace substances in water analysis. Chemosphere. 2011;85:1211–9. https://doi.org/10.1016/j.chemosphere.2011.07.009.
Huckele S, Track T. Risk management of emerging compounds and pathogens in the water cycle (RiSKWa). Environ Sci Eur. 2013;25:1.
Bavarian State Ministry for the Environment Precipitation Garmisch-Partenkirchen-Griesen. https://www.gkd.bayern.de/en/meteo/precipitation/isar/garmisch-partenkirchen-griesen-6307/download. Accessed 29 Oct 2019.
Trautwein C, Kümmerer K. Incomplete aerobic degradation of the antidiabetic drug metformin and identification of the bacterial dead-end transformation product guanylurea. Chemosphere. 2011;85:765–73. https://doi.org/10.1016/j.chemosphere.2011.06.057.
Trautwein C, Berset J, Wolschke H, Kümmerer K. Occurrence of the antidiabetic drug metformin and its ultimate transformation product guanylurea in several compartments of the aquatic cycle. Environ Int. 2014;70:203–12. https://doi.org/10.1016/j.envint.2014.05.008 0160–4120/©.
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol. 2014;48:2097–8. https://doi.org/10.1021/es5002105.
European Chemicals Agency Brief Profile of 1,3-dimethylimidazolidin-2-one. https://echa.europa.eu/brief-profile/-/briefprofile/100.001.187. Accessed 29 Oct 2019.
We would like to thank the “Westfälische Wasser- und Umweltanalytik GmbH” for financial support and Dr. Andre Liesener for fruitful discussions. Moreover, the authors are thankful to Stefan Moser for software support and scientific solutions. Particular thanks go to Prof. Dr.-Ing. Jörg E. Drewes, Dr. Giorgia Greco, and various colleagues at the “Chair of Urban Water Systems Engineering” for their contributions and support. Finally, we appreciate the reviewers’ detailed comments and suggestions.
Open Access funding enabled and organized by Projekt DEAL. This study received financial support from the German Federal Ministry of Education and Research for funding FOR-IDENT (02WRS1354A) and FOR-IDENT - Nachhaltigkeit (02WRS1456).
Conflict of interest
The authors declare that they have no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Published in the topical collection Persistent and Mobile Organic Compounds – An Environmental Challenge with guest editors Torsten C. Schmidt, Thomas P. Knepper, and Thorsten Reemtsma.
Electronic supplementary material
About this article
Cite this article
Minkus, S., Grosse, S., Bieber, S. et al. Optimized hidden target screening for very polar molecules in surface waters including a compound database inquiry. Anal Bioanal Chem 412, 4953–4966 (2020). https://doi.org/10.1007/s00216-020-02743-0
- Hydrophilic interaction liquid chromatography
- High-resolution mass spectrometry