Non-target time trend screening: a data reduction strategy for detecting emerging contaminants in biological samples

Plassmann, Merle M.; Tengstrand, Erik; Åberg, K. Magnus; Benskin, Jonathan P.

doi:10.1007/s00216-016-9563-3

Rapid Communication
Open access
Published: 27 April 2016

Analytical and Bioanalytical Chemistry, Volume 408, pages 4203–4208 (2016)

Abstract

Non-targeted mass spectrometry-based approaches for detecting novel xenobiotics in biological samples are hampered by the occurrence of naturally fluctuating endogenous substances, which are difficult to distinguish from environmental contaminants. Here, we investigate a data reduction strategy for datasets derived from a biological time series. The objective is to flag reoccurring peaks in the time series based on increasing peak intensities, thereby reducing peak lists to only those which may be associated with emerging bioaccumulative contaminants. As a result, compounds with increasing concentrations are flagged while compounds displaying random, decreasing, or steady-state time trends are removed. As an initial proof of concept, we created artificial time trends by fortifying human whole blood samples with isotopically labelled standards. Different scenarios were investigated: eight model compounds had a continuously increasing trend in the last two to nine time points, and four model compounds had a trend that reached steady state after an initial increase. Each time series was investigated at three fortification levels and one unfortified series. Following extraction, analysis by ultra performance liquid chromatography high-resolution mass spectrometry, and data processing, a total of 21,700 aligned peaks were obtained. Peaks displaying an increasing trend were filtered from randomly fluctuating peaks using time trend ratios and Spearman’s rank correlation coefficients. The first approach was successful in flagging model compounds spiked at only two to three time points, while the latter approach resulted in all model compounds ranking in the top 11 % of the peak lists. Compared to initial peak lists, a combination of both approaches reduced the size of datasets by 80–85 %. Overall, non-target time trend screening represents a promising data reduction strategy for identifying emerging bioaccumulative contaminants in biological samples.

Introduction

Most approaches for screening environmental contaminants target individual chemicals or chemical classes using highly specific analytical methods. Despite their utility for low-level detection and quantification, these methods often overlook novel contaminants or transformation products which may pose a risk to humans and wildlife. Recent advances in mass spectrometry and chemometrics have addressed this limitation through development of non-targeted screening approaches, in which samples are analyzed without a priori knowledge of the contaminants of interest [1]. Non-targeted methods involve broad sample extraction procedures combined with gas or liquid chromatography high-resolution mass spectrometry (GC- or LC-HRMS, respectively), advanced data processing tools, and identification by comparison with mass spectral libraries and structure elucidation.

Among the principal challenges of a purely non-targeted screening approach is the number of peaks present in datasets, which can be on the order of several thousands per sample [2, 3]. Therefore, processing of non-targeted datasets is time intensive, making it advantageous to reduce the number of relevant peaks prior to attempting compound identification. Current strategies for data reduction include flagging peaks with chlorine and bromine isotopes or retention time homologues, and statistical comparisons, in which peaks absent in controls are selected for further investigation. While demonstrating great potential for screening water [2, 4–7] and sediment samples [8], applications to biological matrices are less common [9–11]. This is likely due to the ubiquity of endogenous substances (i.e., metabolites), which are not easily differentiated from xenobiotics [3] and are present at substantially higher concentrations in blood samples compared to xenobiotics [12].

An alternative data filtering strategy—specific to chronological datasets—involves flagging important chromatographic peaks based on their systematic (i.e., non-random) fluctuation over time. This approach was applied for narrowing down transformation products and metabolites in river sediments [13], and a similar feature is offered through the software package SIEVE (Thermo Fisher Scientific Inc., USA), which can allow the user to assess data based on intensity or trend ratios. Despite showing considerable potential for identifying important features in a chronological dataset, these approaches rely on visual assessment of trends, which is not practical for filtering the thousands of peaks obtained from non-targeted analysis of biological samples. A more automated, statistically based approach was reported by Peters et al. [14] in which curve fitting and autocorrelation algorithms were applied to detect non-random variation in metabolite levels, resulting in a >98 % data reduction. While showing great promise for metabolomics (where both increasing and decreasing trends are important), this approach may not be appropriate for emerging bioaccumulative contaminants, which are expected to only display increasing time trends.

In the present work, we investigated an automated, statistically based data reduction strategy for identifying emerging bioaccumulative contaminants using increasing peak intensities over time. Decreasing trends were not included since they are less relevant to emerging contaminants, but could be investigated simply by reversing the sample order. As an initial proof of principle, human whole blood samples were fortified with isotopically labelled xenobiotics to create different time trends. Following extraction, analysis by ultra performance liquid chromatography (UPLC)-HRMS, and peak alignment, two statistical approaches were employed. The extent of data reduction was assessed, as well as the efficacy of each method for filtering model compounds.

Materials and methods

Standards and reagents

Standards of caffeine-d₉, sulfamethoxazole-d₄, bezafibrate-d₅, diflufenican-d₃, metoprolol-d₇, sotalol-d₆, propranolol-d₇, fluoxetine-d₅, diatrizoic acid-d₆, glimepiride-d₅, ranitidine-d₆, and acetaminophen-d₄ were obtained from Toronto Research Chemicals (Toronto, Canada). Labelled standards were chosen due to availability and their suitability towards the analytical method. Human whole blood samples from nine anonymous individuals were obtained from Karolinska Institutet (Stockholm, Sweden) in accordance with ethical guidelines set by the Swedish ethics committee.

Sample preparation

Spiking scenarios represented either increasing trends starting at different time points using caffeine-d₉, sulfamethoxazole-d₄, bezafibrate-d₅, diflufenican-d₃, metoprolol-d₇, sotalol-d₆, propranolol-d₇, and fluoxetine-d₅ or trends which increased initially and then plateaued using diatrizoic acid-d₆, glimepiride-d₅, ranitidine-d₆, and acetaminophen-d₄. Concentrations increased by a factor of two to ten over the course of the time trend. In order to introduce variability into the dataset, each of the nine individual blood samples was used as a different time point (arbitrary time trend increments of 1 to 9 [unitless]). The number of samples in the time series was selected in order to have a sufficient number of time points to generate a robust time trend, but so as not to generate an excessively large peak list. From the nine samples, three artificial time trends were prepared by fortifying blood samples (1 mL each) with labelled standards at high (10–100 ng/mL), medium (2–20 ng/mL), and low (0.2–2 ng/mL) concentration ranges. An additional series was prepared without fortification with labelled standards (blank series). Spiked trends normalized to 100 % of the highest concentration are presented in Fig. 1, and exact fortification levels can be found in Table S1 in the Electronic Supplementary Material (ESM).

The blood samples were extracted according to a previously tested method [15] which involved liquid-liquid extraction with 2 mL of acetonitrile (ACN), 0.4 g of MgSO₄, and 0.1 g of NaCl. Three stainless steel beads (3.2 mm diameter) were added, and the mixture was placed into a bead blender (1600 MiniG®, SPEX SamplePrep, USA) for 30 s at 1500 rpm, followed by centrifugation at 2500 rpm. An aliquot of the supernatant (1.6 mL) was concentrated to dryness by N₂ and reconstituted in 80 μL of ACN/H₂O (1/1).

Instrumental analysis

Analysis was performed using an Acquity UPLC coupled to a Xevo G2-S quadrupole time-of-flight (QTOF) mass spectrometer (Waters) via an electrospray ionization source operated in positive mode. The instrumental analysis method was adapted from methods previously used in a collaborative trial on non-target screening of water [2]. Five microliters of extract was injected onto an Acquity UPLC HSS C18 SB column (2.1 × 100 mm, 1.8 μm) maintained at room temperature. Separation was achieved using a 19-min gradient from 95 % H₂O (5 mM ammonium formate, 0.01 % formic acid) to 99 % ACN (0.01 % formic acid) with a flow of 0.5 mL/min (plus a 2-min equilibration time). The mass spectrometer was operated in full scan (100–1000 Da) with a scan time of 0.25 s and a collision energy of 4 eV.

Data processing

Data processing was conducted using the software TracMass2 [16], running under MATLAB (MathWorks®, USA). Parameters used for peak detection and alignment are listed in Table S2 (ESM). Aligned peak lists were created for each spike level, along with one list containing all 36 samples. To reduce the number of false positives, peaks detected in only a single sample were excluded. Statistical analysis was conducted in MATLAB and Microsoft Excel.

Two statistical approaches were tested, one based on comparison of average intensities in two sample sets and one testing the increasing trend by application of Spearman’s rank correlation coefficient. For each peak, the following calculations were performed: First, the average detected intensities at time points 7–9 were divided by the average detected intensities at time points 1–6 (+1 to avoid dividing by 0). We defined this value as the “time trend ratio (TTR).” A high TTR—representing a possible emerging bioaccumulative contaminant—is produced by peaks with low intensities in early samples and high intensities in later samples of the time trend. Second, Spearman’s rank correlation coefficient was calculated for all peaks with detections in at least three samples in the time trend. This results in a value close to 1 for peaks with a monotonically increasing time trend. Peaks in the full peak lists were subsequently ranked according to calculated TTR and Spearman’s rank correlation coefficients (ρ).
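The two calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB/Excel workflow used in the study. It assumes that non-detects are stored as an intensity of 0, that averages are taken over all time points in each window, and that the +1 is added to the denominator; the function names and the example series are hypothetical.

```python
from statistics import mean

def time_trend_ratio(intensities, n_late=3):
    """TTR: mean of the last n_late points over (mean of the earlier points + 1)."""
    early, late = intensities[:-n_late], intensities[-n_late:]
    return mean(late) / (mean(early) + 1.0)

def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # undefined for a constant series
    return cov / (var_x * var_y) ** 0.5

def trend_scores(intensities, min_detections=3):
    """Return (TTR, rho) for one aligned peak; rho is None below min_detections."""
    ttr = time_trend_ratio(intensities)
    detected = sum(1 for v in intensities if v > 0)
    rho = None
    if detected >= min_detections:
        time_points = list(range(1, len(intensities) + 1))
        rho = spearman_rho(time_points, intensities)
    return ttr, rho

# Hypothetical peaks over nine time points (0 = not detected)
emerging = [0, 0, 0, 0, 0, 0, 0, 500, 1000]
increasing = [0, 0, 0, 10, 20, 30, 40, 50, 60]
print(trend_scores(emerging))
print(trend_scores(increasing))
```

In this sketch, a peak detected at only the last two time points receives a high TTR but no ρ, whereas a peak rising over six time points receives a modest TTR and a ρ close to 1.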

Results and discussion

Detection

The nine blood samples were extracted and analyzed four times each, for a total of 36 analyses. The number of total aligned peaks detected in each artificial time trend series (detection in ≥2 of 9 samples) was 11,800, 11,400, 12,600, and 12,200 for the high, medium, low, and blank levels, respectively. When aligning all time trend series in one list, a total of 21,700 aligned peaks (detection in ≥2 of 36 samples) were obtained. The consistency in number of peaks arises from using the same blood samples for each time trend series. Following analysis by TracMass2, 11 of 12 spiked compounds (all except diatrizoic acid-d₆) were detected at the high and medium spike levels, and 8 were detected at the low spike level (not detected: diatrizoic acid-d₆, acetaminophen-d₄, caffeine-d₉, and diflufenican-d₃). Spiked compounds were not detected in the blank series. The spiked and measured time trend scenarios are plotted in Fig. 1, showing reasonable consistency even at the low spike level.

To assess the distribution of replicate and biological variation in the dataset, Bayesian ANOVAs [17] were performed. The relative standard deviation (RSD) of the four replicates (three spike levels and one blank) was compared to the RSD of the nine individual samples (biological RSD; see Fig. S2, ESM). The replicate RSD was on average 21 %, with 90 % of peaks displaying RSDs of 6–47 %. In contrast, the average biological RSD was 24 % but showed a much broader range (1–77 %), indicating that replicate variation in the data is about the same as the variation between samples from different persons. Therefore, for the analysis of real time trend samples, several replicate analyses should be conducted to reduce uncertainty in the detected intensities. The use of quality control samples and internal standards has been described as another means of reducing analytical variability [18]. Additionally, the repeated analysis of pooled samples could reduce the variability of both the endogenous and exogenous compounds present at each time point. On the other hand, pooling samples in a longitudinal study should be conducted with caution as this can result in a loss of information.
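For readers who want a quick descriptive check of the two sources of variation, the sketch below computes plain RSDs for a single aligned peak. It is a simplified stand-in and not the Bayesian ANOVA [17] used here: it assumes the four series act as replicates of each blood sample (reasonable only for unspiked peaks), and the function names and data layout are hypothetical.

```python
from statistics import mean, stdev

def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    m = mean(values)
    if m == 0:
        return None  # undefined when the mean intensity is zero
    return 100.0 * stdev(values) / m

def rsd_summary(peak):
    """peak[s][t]: intensity of one aligned peak in series s at time point t.

    Returns (replicate RSD, biological RSD):
    - replicate RSD: RSD across series for each blood sample, averaged
    - biological RSD: RSD of the per-sample mean intensities
    """
    n_points = len(peak[0])
    per_sample = [[series[t] for series in peak] for t in range(n_points)]
    replicate = [r for r in (rsd_percent(v) for v in per_sample) if r is not None]
    replicate_rsd = mean(replicate) if replicate else None
    biological_rsd = rsd_percent([mean(v) for v in per_sample])
    return replicate_rsd, biological_rsd

# Hypothetical endogenous peak: 4 series (rows) x 9 blood samples (columns)
peak = [
    [100, 150, 90, 200, 120, 80, 160, 110, 140],
    [110, 140, 95, 190, 130, 85, 150, 100, 150],
    [ 90, 160, 85, 210, 110, 75, 170, 120, 130],
    [105, 145, 100, 205, 125, 90, 155, 105, 145],
]
print(rsd_summary(peak))
```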

Ranking

Ranking peaks for the entire spiked time trend series according to the TTR or ρ values resulted in a high rank for each of the spiked compounds using at least one of the two methods. The calculated TTR and ρ values and the resulting ranks for the spiked compounds at the high spike level can be found in Table 1, while the data for the other two spike levels are listed in Table S3 (ESM).

Table 1 Time trend ratios (TTR), Spearman’s ρ, and resulting ranks of spiked compounds in the peak list of the high-spike-level artificial time trend. The colored names represent the scenarios in the same colors of Fig. 1

The TTR calculated by comparison of average intensities was particularly effective at ranking high those spiked compounds present only at the two to three latest time points in the time trend, i.e., caffeine-d₉ and sulfamethoxazole-d₄. These two compounds showed substantially higher ratios than spiked compounds present at more than three time points. This calculation is therefore an efficient way to flag substances that appeared in recent years (i.e., emerging bioaccumulative contaminants) and may not yet have been discovered. The TTR comparing the latest three with the first six time points ranks compounds present only at one to three of the latest time points high on the list. When the TTR was changed to compare the latest four time points with the first five, the ranks of the compounds present at the latest four time points increased, whereas the rank of caffeine-d₉ (present at the two latest time points only) decreased. This variant thus captures compounds appearing over a wider time span. Which TTR to use in future applications therefore depends on the number of time points and the span of years covered.
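The effect of moving the split point can be illustrated with two made-up series. The sketch below assumes non-detects are stored as 0 and that the +1 is added to the denominator; the series and the `n_late` parameter name are hypothetical, and only TTR values (not ranks within a real peak list) are shown.

```python
from statistics import mean

def time_trend_ratio(intensities, n_late):
    """Mean of the last n_late points over (mean of the earlier points + 1)."""
    early, late = intensities[:-n_late], intensities[-n_late:]
    return mean(late) / (mean(early) + 1.0)

# Hypothetical intensities over nine time points
last_two = [0, 0, 0, 0, 0, 0, 0, 400, 800]       # present at the last 2 points
last_four = [0, 0, 0, 0, 0, 200, 400, 600, 800]  # present at the last 4 points

for name, series in (("last two", last_two), ("last four", last_four)):
    ttr_3_vs_6 = time_trend_ratio(series, n_late=3)
    ttr_4_vs_5 = time_trend_ratio(series, n_late=4)
    print(name, round(ttr_3_vs_6, 1), round(ttr_4_vs_5, 1))
```

For these example values, widening the late window from three to four points lowers the TTR of the series present at two time points (400 to 300) and raises that of the series present at four time points (about 17.5 to 500), which mirrors the direction of the rank changes described above.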

Using Spearman’s rank correlation coefficient, ten spiked compounds received ρ values of ≥0.76 at the high spike level (≥0.76 and ≥0.72 at the medium and low spike levels, respectively); no value was calculated for caffeine-d₉ as it was only detected in two samples. The magnitude of the concentration increase over the time course did not affect the ρ value since it was calculated based on ranks. Thus, ρ was solely affected by how well the increase showed a monotonic trend, as can be seen by comparing glimepiride-d₅ and ranitidine-d₆ with acetaminophen-d₄. Glimepiride-d₅ and ranitidine-d₆ displayed random variation over the last five time points resulting in ρ values of 0.77 and 0.76, respectively, while acetaminophen-d₄ displayed only one time point breaking a monotonically increasing trend, resulting in a ρ value of 0.98. When ranking according to ρ values, all ten compounds were in the top 8 % of the entire peak list at the high spike level (and in the top 6 and 11 % at the medium and low levels, respectively). Thus, this rank test appears well suited to flagging compounds when a general increasing trend is present at three or more time points.
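The rank-based nature of ρ can be checked with a small sketch. It uses the textbook formula ρ = 1 − 6Σd²/(n(n² − 1)), which is valid only when there are no tied values; the three series are hypothetical and are not the measured intensities of any spiked compound.

```python
def ranks(values):
    """Ranks 1..n for a list without tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho_no_ties(x, y):
    """Spearman's rho via the sum of squared rank differences (no ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

time_points = [1, 2, 3, 4, 5, 6, 7, 8, 9]
twofold = [10, 11, 12, 13, 14, 15, 16, 18, 20]        # gentle increase
thousandfold = [1, 2, 4, 8, 16, 64, 256, 512, 1000]   # steep increase
one_break = [10, 20, 30, 40, 60, 50, 70, 80, 90]      # one adjacent swap

print(spearman_rho_no_ties(time_points, twofold))
print(spearman_rho_no_ties(time_points, thousandfold))
print(spearman_rho_no_ties(time_points, one_break))
```

Both monotonic series give ρ = 1 regardless of how steeply they rise, while a single swap of two adjacent points in a nine-point series gives ρ = 1 − 12/720, or roughly 0.98.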

Peak list reduction

The two tested approaches filter out two different types of trends in the data: those associated with compounds which are predominantly present at some of the latest time points (using the TTR) or compounds with a general increasing trend (ρ). Thus, an assessment including both tests was conducted, which included all spiked compounds. The extent of data reduction based on the rankings using both TTR and ρ was assessed. All peaks with either a TTR of ≥10 or a ρ ≥ 0.7 were combined in a separate peak list and duplicates were removed. This resulted in combined peak lists of 1800, 1700, and 2600 peaks for the high, medium, and low spike levels, respectively. Compared to the full peak lists, this represented a data reduction of 85, 85, and 80 %, for high, medium, and low spike levels, respectively.
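The combination step amounts to a set union of the peaks passing either cutoff. The sketch below assumes each peak already has a TTR and a ρ (or `None` when ρ was not calculated); the peak identifiers, scores, and function names are hypothetical.

```python
def flag_peaks(scores, ttr_cutoff=10.0, rho_cutoff=0.7):
    """scores: dict peak_id -> (ttr, rho or None). Returns the flagged ids."""
    by_ttr = {pid for pid, (ttr, _) in scores.items() if ttr >= ttr_cutoff}
    by_rho = {
        pid
        for pid, (_, rho) in scores.items()
        if rho is not None and rho >= rho_cutoff
    }
    # A set union keeps each peak once, so duplicates are removed implicitly.
    return by_ttr | by_rho

def reduction_percent(n_total, n_flagged):
    """Share of the original peak list that is discarded, in percent."""
    return 100.0 * (1.0 - n_flagged / n_total)

# Hypothetical scores for five aligned peaks
scores = {
    "peak_a": (500.0, None),  # late-appearing, too few detections for rho
    "peak_b": (4.5, 0.98),    # steady increase
    "peak_c": (25.0, 0.85),   # passes both cutoffs, counted once
    "peak_d": (1.0, 0.10),    # random fluctuation
    "peak_e": (0.8, -0.60),   # decreasing
}
flagged = flag_peaks(scores)
print(sorted(flagged))
print(reduction_percent(len(scores), len(flagged)))
```

Applied to the rounded peak counts reported here, `reduction_percent(11800, 1800)` evaluates to roughly 85 %, consistent with the value given for the high spike level.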

Clearly, in a scenario involving real (i.e., unfortified) time trend samples, greater variability in the dataset may be expected. However, this variability can be reduced through inclusion of multiple samples per time point or a pooled sample. Even with a larger margin of safety, we expect that the number of peaks in a peak list could be reduced substantially using non-targeted time trend screening. Future work will apply this approach to real time trend samples, where pollutants with known increasing time trends (e.g., perfluoroalkyl acids [19]) can be used to assess the TTR and ρ values for peak list cutoff.

Despite the peak list reduction, the number of peaks left after using the non-target time trend approach is still too large to be identified. Thus, on top of the approach tested here, peak lists need to be further reduced by assessing isotopic ratios, adducts, and in-source fragmentation [7], along with checking peaks against known metabolite databases (e.g., the human metabolome database [20]) to exclude endogenous compounds from the peak lists [3]. These approaches, combined with non-targeted time trend screening, have the potential to significantly reduce non-targeted datasets, allowing greater resources to be placed on identification using suspect lists, isotopic or homologue pattern, mass defects, fragmentation spectra, and finally the comparison with reference standards.

References

1. Krauss M, Singer H, Hollender J. LC–high resolution MS in environmental analysis: from target screening to the identification of unknowns. Anal Bioanal Chem. 2010;397:943–51.
2. Schymanski E, Singer H, Slobodnik J, Ipolyi I, Oswald P, Krauss M, et al. Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis. Anal Bioanal Chem. 2015;407:6237–55.
3. Plassmann MM, Brack W, Krauss M. Extending analysis of environmental pollutants in human urine towards screening for suspected compounds. J Chromatogr A. 2015;1394:18–25.
4. Ibanez M, Sancho JV, McMillan D, Rao R, Hernandez F. Rapid non-target screening of organic pollutants in water by ultraperformance liquid chromatography coupled to time-of-flight mass spectrometry. Trac-Trends Anal Chem. 2008;27:481–9.
5. Moschet C, Wittmer I, Simovic J, Junghans M, Piazzoli A, Singer H, et al. How a complete pesticide screening changes the assessment of surface water quality. Environ Sci Technol. 2014;48:5423–32.
6. Hug C, Ulrich N, Schulze T, Brack W, Krauss M. Identification of novel micropollutants in wastewater by a combination of suspect and nontarget screening. Environ Pollut. 2014;184:25–32.
7. Schymanski EL, Singer HP, Longrée P, Loos M, Ruff M, Stravs MA, et al. Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry. Environ Sci Technol. 2014;48:1811–8.
8. Chiaia-Hernandez A, Schymanski E, Kumar P, Singer H, Hollender J. Suspect and nontarget screening approaches to identify organic contaminant records in lake sediments. Anal Bioanal Chem. 2014;406:7323–35.
9. Hernández F, Portolés T, Pitarch E, López FJ. Searching for anthropogenic contaminants in human breast adipose tissues using gas chromatography-time-of-flight mass spectrometry. J Mass Spectrom. 2009;44:1–11.
10. Liotta E, Gottardo R, Bertaso A, Polettini A. Screening for pharmaco-toxicologically relevant compounds in biosamples using high-resolution mass spectrometry: a ‘metabolomic’ approach to the discrimination between isomers. J Mass Spectrom. 2010;45:261–71.
11. Rotander A, Kärrman A, Toms L-ML, Kay M, Mueller JF, Gómez Ramos MJ. Novel fluorinated surfactants tentatively identified in firefighters using liquid chromatography quadrupole time-of-flight tandem mass spectrometry and a case-control approach. Environ Sci Technol. 2015;49:2434–42.
12. Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. The blood exposome and its role in discovering causes of disease. Environ Health Perspect. 2014;122:769–74.
13. Li Z, Maier MP, Radke M. Screening for pharmaceutical transformation products formed in river sediment by combining ultrahigh performance liquid chromatography/high resolution mass spectrometry with a rapid data-processing method. Anal Chim Acta. 2014;810:61–70.
14. Peters S, Janssen H-G, Vivó-Truyols G. Trend analysis of time-series data: a novel method for untargeted metabolite discovery. Anal Chim Acta. 2010;663:98–104.
15. Plassmann M, Schmidt M, Brack W, Krauss M. Detecting a wide range of environmental contaminants in human blood samples—combining QuEChERS with LC-MS and GC-MS methods. Anal Bioanal Chem. 2015;407:7047–54.
16. Tengstrand E, Lindberg J, Åberg KM. TracMass 2—a modular suite of tools for processing chromatography-full scan mass spectrometry data. Anal Chem. 2014;86:3435–42.
17. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. Boca Raton: Chapman & Hall/CRC; 1995.
18. van der Kloet FM, Bobeldijk I, Verheij ER, Jellema RH. Analytical error reduction using single point calibration for accurate and precise metabolomic phenotyping. J Proteome Res. 2009;8:5132–41.
19. Roos A, Berger U, Järnberg U, van Dijk J, Bignert A. Increasing concentrations of perfluoroalkyl acids in Scandinavian otters (Lutra lutra) between 1972 and 2011: a new threat to the otter population? Environ Sci Technol. 2013;47:11757–65.
20. Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y, et al. HMDB 3.0—the Human Metabolome Database in 2013. Nucleic Acids Res. 2013;41:D801–7.

Acknowledgments

We gratefully acknowledge Rikard Tröger and the Section for Organic Environmental Chemistry and Ecotoxicology at the Swedish University of Agricultural Sciences for their help and for measurement time on the LC-QTOF instrument.

Author information

Authors and Affiliations

Department of Environmental Science and Analytical Chemistry (ACES), Stockholm University, Svante Arrhenius Väg 8, 10691, Stockholm, Sweden
Merle M. Plassmann, Erik Tengstrand, K. Magnus Åberg & Jonathan P. Benskin

Corresponding author

Correspondence to Merle M. Plassmann.

Ethics declarations

Human whole blood samples from nine anonymous individuals were obtained from Karolinska Institutet (Stockholm, Sweden) in accordance with ethical guidelines set by the Swedish ethics committee.

Conflict of interest

The authors declare that they have no competing interests.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(PDF 443 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Plassmann, M.M., Tengstrand, E., Åberg, K.M. et al. Non-target time trend screening: a data reduction strategy for detecting emerging contaminants in biological samples. Anal Bioanal Chem 408, 4203–4208 (2016). https://doi.org/10.1007/s00216-016-9563-3

Received: 09 December 2015
Revised: 08 April 2016
Accepted: 12 April 2016
Published: 27 April 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s00216-016-9563-3
