Introduction

Microplastics have been found in a wide array of aquatic environments, from pristine mountain streams to the Arctic [23] to deep undersea habitats [1]. Toxicological studies have determined microplastics can cause adverse effects, such as tissue inflammation [50], impaired growth [71], feeding disruption [62], developmental anomalies [21], and changes in gene expression [69]. However, the prevalence of those biotic effects in natural aquatic ecosystems is not well understood [24].

Quantifying the risk of microplastics in aquatic ecosystems is challenging for two reasons. First, the concentrations at which those effects manifest in biota are not well understood. That uncertainty arises from shortcomings in existing toxicological studies [16]: fewer than half of the studies conducted to date have included more than two exposure concentrations, and many of those exposures were at extreme concentrations well beyond what is typically encountered in the natural environment [9]. Although this testing provides useful insights into potential effects and mechanisms of toxicity, testing at multiple relevant concentrations to establish a dose-response relationship is necessary to quantitatively characterize risk. Exacerbating this problem, microplastics have a diversity of properties, such as size, shape, and polymer type, that can affect toxicity, and few studies have quantified the relative importance of these factors [8]. Most (72% out of 163) toxicity studies have been conducted using single-sized beads of a single polymer type [42], which is a poor representation of mixtures encountered in the ambient environment [54].

The second challenge is a lack of holistic exposure data to compare directly to toxicologically derived response thresholds. Most ambient microplastics data are collected by towing ~330 μm mesh nets, which underestimates the abundance of microplastics smaller than the mesh size [7]. Studies that have sampled from the environment and report broader size distributions find that the smaller sized particles are exponentially more abundant [13, 37], suggesting the need for sampling regimes and/or estimation methods that capture a more complete size range of particles. Additionally, field monitoring particle data often suffer from unquantified biases due to self-contamination [55], difficulties associated with sampling and analyzing fibers [43], spectroscopic interferences and library mischaracterizations [14, 65], spatial and temporal heterogeneity [32], as well as spectroscopy subsampling regimes performed to ensure feasibility when particles are counted manually [70].

Here we take advantage of two recent advances that address these challenges. The first is a meta-analysis in which a broad array of toxicological studies was incorporated into a single risk assessment framework [42], which produced thresholds for a range of biotic responses and recommended management actions. This meta-analysis applied critical quality criteria to screen reliable toxicity studies and integrated the results into a combined outcome that transcended shortcomings of the underlying individual studies. The second is the use of probability density functions (PDFs) to quantitatively correct exposure data for biases due to sampling methodologies [33, 36]. Size-abundance data for microplastic particles can be used to derive PDFs that allow the rescaling of field monitoring data restricted to a given size range (e.g., > 330 μm) to a more holistic size range (e.g., 1 to 5000 μm), enabling direct comparison to toxicity thresholds from laboratory studies aligned to the same size range [34].

Combining these two advances, we assess the risk to aquatic ecosystems from microplastic exposure in San Francisco Bay, California (SFB), where a comprehensive study of ambient exposure was conducted [70] (Fig. 1). After rescaling to a common size distribution, we compare the monitoring data to aligned risk and management thresholds from Mehinto et al. [42] to estimate the likelihood and pervasiveness of a local biological response. Using PDFs and Monte-Carlo modeling, we quantify the uncertainty of the rescaling methods to determine where the greatest uncertainties in this risk characterization lie, pointing to the science advancements needed to improve risk assessments in the future.

Fig. 1

Flowchart of general steps involved in microplastics risk characterization as employed in this study. Data obtained/derived from respective studies are annotated using colors; Zhu et al. [70] is green, Mehinto et al. [42] is red, and this study is blue

Materials and methods

Data quality

Crucial to assessing risks is selecting data fit for that assessment purpose. Surface water monitoring data for microplastics in SFB, California, USA reported by Zhu et al. [70] were quantitatively assessed for quality according to the criteria defined for water sampling in Koelmans et al. [32]. Data reported for other matrices (e.g., stormwater effluent, fish tissue, sediment) were not scored due to a lack of established quality criteria for such matrices. Briefly, nine criteria which relate to the reproducibility and reliability of aqueous microplastics sampling, contamination mitigation, sample processing/handling, and chemical analysis were applied. For each criterion, a score of 0, 1, or 2 was applied and a total accumulated score was calculated by adding scores for individual criteria (maximum 18 points). Samples that received a ‘zero’ value for any individual score were not considered sufficiently reliable [25].
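The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the criterion names and example scores are hypothetical placeholders, not the criteria wording or the scores from Koelmans et al. [32] or Table S1.

```python
from typing import Dict


def score_dataset(scores: Dict[str, int]) -> Dict[str, object]:
    """Sum per-criterion scores (each 0, 1, or 2) and apply the zero rule."""
    for name, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"score for {name!r} must be 0, 1, or 2")
    total = sum(scores.values())
    # A dataset is treated as insufficiently reliable if any criterion scores zero.
    reliable = all(value > 0 for value in scores.values())
    return {"total": total, "maximum": 2 * len(scores), "reliable": reliable}


# Hypothetical example with nine placeholder criteria (maximum 18 points).
example = {f"criterion_{i}": 2 for i in range(1, 10)}
example["criterion_3"] = 0
result = score_dataset(example)
```

In this hypothetical case the total is 16 of 18 points, yet the dataset is still flagged as not sufficiently reliable because one criterion scored zero.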

Microplastics characterization

Blank-corrected environmental microdebris occurrence data from SFB and outside of SFB, in National Marine Sanctuaries, were obtained from Zhu et al. [70]. Sampling details can be found in [70]. In short, sampling sites were picked to represent each region within SFB. Regions are characterized by differences in population sizes and point sources upstream, such as wastewater and stormwater. Surface water concentrations were spectroscopy corrected. Spectroscopy was only performed on a subset of particles, i.e., roughly 10% of each morphology within each size fraction (see [70] for more detail). This methodology was chosen in an attempt to be more representative. This led to spectroscopy conducted on 23% of all particles from surface manta trawls. The systematic removal of all fiber particle counts from manta trawl data in Zhu et al. [70] was corrected for using a subset of manta samples in which all fibers were counted and reported in Hung et al. [27]. While Zhu et al. [70] presented a novel method to correct manta trawl samples for size, a different size rescaling method was used here (i.e., [33]). Blank-, fiber-, and plastic polymer-corrected particle concentrations were rescaled to a common size distribution (1 to 5000 μm) to allow direct comparison to hazard thresholds [42] according to the methods in Koelmans et al. [33] using marine surface water size distribution data from Kooi et al. [37]. Additionally, a statistically significant outlier was identified based on four-times the mean Cook’s distance and was removed from the Zhu et al. [70] dataset, which was a sample collected from a tidal front and was highly contaminated with microplastics and other debris (sample identification: CB9-Manta-11 Jan 18).

Due to the time constraints of spectroscopically confirming the polymer identity of all particles in samples, Zhu et al. [70] subsampled particles from samples based on the number of particles of a particular morphology and size class within a given sample. For each site in the SFB and compartment (i.e., stormwater, wastewater, fish tissue, sediment, surface water), the number of particles that were spectroscopically determined to be a specific polymer (e.g., polyester, polyethylene, etc.) was divided by the total number of particles spectroscopically characterized for that compartment-site combination (Fig. 1). Zhu et al. [70] reported that interferences such as dyes and carbon black prevented the spectroscopic confirmation of all particles and reported some polymers using suspected terms such as “anthropogenic (synthetic)” or “anthropogenic (unknown base)”, etc. To be conservative, particles whose polymer identity could not be confirmed were excluded from the proportion of microdebris particles considered to be plastic, with a hierarchical schema developed here and employed to classify particle types (Fig. S1). To determine if a single polymer correction factor should be applied to all matrices or should be derived separately for each matrix, a one-way analysis of variance (ANOVA) was conducted (Table S3). Then, for surface water data obtained using manta trawl, a one-way ANOVA was conducted to determine if there are site-specific differences in plastic proportions of microdebris particles, with site-specific correction factors being applied only if the ANOVA determined statistically significant differences. The proportions of microdebris confirmed to be plastic were multiplied by the total number of microdebris particles reported by Zhu et al. [70] for each compartment to obtain microplastic occurrence data in a probabilistic manner as part of the Monte-Carlo modeling method (described below).

Due to the mesh size of a manta trawl net (333 μm), fibers are known to pass through the sampling apparatus, leading to a systematic shape-based undercounting bias. To remove that uncertainty, Zhu et al. [70] did not include fibers in total particle counts in their blank-corrected concentrations for samples collected using manta trawls, with the consequence that microplastic particles were systematically undercounted. Hung et al. [27] reported particle count data for all shapes - including fibers - in 9 manta trawl samples taken from various sites across the SFB and Marine Sanctuaries. This manta trawl fiber subsampling dataset was used to derive a correction factor to estimate the amount of microplastics that would be present in the other manta trawl samples from SFB if plastic fibers had been counted in those samples, with uncertainties propagated probabilistically using the Monte-Carlo method described below. For each manta trawl sample with fiber counts, the proportion of particles that were fibers was calculated and a fiber correction factor was derived as the inverse of one minus the median fraction of particles that were fibers. Due to the small sample size, site-specific differences in fiber proportions were not considered.
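As a point estimate, the fiber correction factor described above reduces to 1 / (1 − median fiber fraction). The sketch below illustrates only that arithmetic; the input fractions are hypothetical and are not the values from Hung et al. [27], and the probabilistic treatment of this factor is not shown here.

```python
import statistics
from typing import Sequence


def fiber_correction_factor(fiber_fractions: Sequence[float]) -> float:
    """Return 1 / (1 - median fiber fraction) for samples with fiber counts."""
    median_fraction = statistics.median(fiber_fractions)
    if not 0.0 <= median_fraction < 1.0:
        raise ValueError("median fiber fraction must lie in [0, 1)")
    return 1.0 / (1.0 - median_fraction)


# Hypothetical fractions of particles that were fibers in three samples.
cf = fiber_correction_factor([0.5, 0.8, 0.9])
```

With a median fiber fraction of 0.8, non-fiber counts would be multiplied by about 5 to approximate the total count including fibers.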

Rescaling of environmental concentrations

Environmental microdebris occurrence data from SFB reported in Zhu et al. [70] included various size ranges of particles based on each sampling technique (e.g., > 333 μm for manta trawl; > 50 μm for grab samples: see Table S8) which were rescaled to a common size distribution of 1-5000 μm to compare to ecotoxicity thresholds aligned to the same size distribution in Mehinto et al. [42] based on the methods described in Koelmans et al. [33, 34]. Environmental concentrations were multiplied by a correction factor derived for each sampling technique based on their particle size limits (Eq. 1) [33].

$${C}_{env}={CF}_{meas}\ast {C}_{meas}$$
(1)

In Eq. 1, Cenv is the environmentally realistic occurrence concentration in particles · L⁻¹ (adjusted for non-alignment of mesh sizes), CFmeas is a dimensionless correction factor for the environmentally monitored concentration (meas), and Cmeas is the measured environmental concentration, expressed in particles · L⁻¹ [33]. Environmental concentrations are rescaled to an upper (UL,D; μm) and lower default size range (LL,D; μm) (here 5000 and 1 μm, respectively), using the power law slope of microplastic particle abundance in the environment based on size (a, unitless), with the upper limit (UL,meas; μm) and lower limit (LL,meas; μm) defined by the size limits of quantification of the monitoring method employed (Eq. 2) [33].

$${CF}_{meas}=\frac{L_{UL,D}^{1-a}-{L}_{LL,D}^{1-a}}{L_{UL,meas}^{1-a}-{L}_{LL,meas}^{1-a}}$$
(2)
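Eqs. 1 and 2 can be evaluated directly. The following is a minimal sketch with hypothetical function and parameter names; it assumes a ≠ 1 (the expression is undefined at a = 1) and uses the 1 to 5000 μm default range stated above.

```python
def size_correction_factor(a: float, ll_meas: float, ul_meas: float,
                           ll_default: float = 1.0,
                           ul_default: float = 5000.0) -> float:
    """Eq. 2: ratio of power-law integrals over the default and measured ranges."""
    if a == 1.0:
        raise ValueError("Eq. 2 is undefined for a = 1")
    exponent = 1.0 - a
    numerator = ul_default ** exponent - ll_default ** exponent
    denominator = ul_meas ** exponent - ll_meas ** exponent
    return numerator / denominator


def rescale_concentration(c_meas: float, cf_meas: float) -> float:
    """Eq. 1: environmentally realistic concentration from a measured one."""
    return cf_meas * c_meas


# Manta trawl size limits (333 to 5000 um) with a = 2.07.
cf_manta = size_correction_factor(a=2.07, ll_meas=333.0, ul_meas=5000.0)
```

For these inputs the factor evaluates to approximately 529, and a measured range equal to the default range returns a factor of exactly one.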

Kooi et al. [37] derived power law slope exponents (a) based on size for freshwater and marine environments across several locations in Europe using individual particle datasets obtained using state-of-the-science Fourier-transform infrared imaging coupled with a focal-plane array detector and automated image analysis. When measured particle length data were pooled across all samples for each compartment across distinct locations (e.g., Rhine and Dommel rivers), power law exponent values showed low variability within each compartment, but were significantly different between compartments (e.g., a = 2.07 ± 0.03 for marine and a = 2.64 ± 0.01 for freshwater surface waters), implying microplastic size relationships are highly conserved within compartments [37]. While within-compartment particle size distributions are not expected to deviate significantly across regions, site-specific data would be preferable to rescale environmental concentrations so long as the data are high resolution and reliable [37]. Lacking site-specific high-resolution particle size distribution data, compartment-specific a values for length were used from Kooi et al. [37] to rescale environmental concentrations in SFB (Table S7). To account for within-compartment variability in rescaling environmental concentrations, uncertainties in the derived a value were propagated probabilistically using Monte-Carlo methods described below.

Zhu et al. [70] reported particle length data for SFB for all compartments (except surface water obtained using 1-L grab samples), which were measured manually and were not intended to provide high-resolution information on particle size distributions. Nonetheless, this dataset was used to derive compartment-specific a values according to the methods described in Kooi & Koelmans [36] as part of a sensitivity analysis only; these values were not used to rescale concentrations for risk characterization purposes. Briefly, all particle monitoring data from SFB reported on an online open data repository managed by the California Natural Resources Agency (https://data.cnra.ca.gov/dataset/microplastic-sf-bay) were aggregated, and only the measured lengths of individual particles spectroscopically confirmed to be plastic were used in estimating size distributions. Plastic particle data were grouped by compartment (i.e., surface water, sediment, fish tissue, wastewater effluent, stormwater runoff), and length-based abundance data were plotted on a log-log scale with relative abundance on the y-axis and particle size on the x-axis, and a linear trendline fit to the data was used to derive the a exponent value. Since power laws usually only apply to values greater than some minimum value (in this case, particle length), both the minimum applicable value and the final a value were determined using a maximum likelihood estimation method [10, 48] using the poweRlaw package [22], and bootstrapped 100 times. As part of the sensitivity analysis only, the a value derived using manta trawl data (> 333 μm) was used to rescale the manta trawl-derived surface water monitoring data.
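For readers unfamiliar with maximum likelihood fitting of power laws, the sketch below shows the standard closed-form estimator for a continuous power law with a known minimum value, a = 1 + n / Σ ln(x_i / x_min). It is not the poweRlaw workflow used in this study: here x_min is supplied by the user rather than estimated, no bootstrapping is performed, and the synthetic lengths are generated purely for illustration.

```python
import math
import random
from typing import List, Sequence


def powerlaw_exponent_mle(lengths: Sequence[float], xmin: float) -> float:
    """Continuous power-law MLE of the exponent for values >= xmin."""
    tail = [x for x in lengths if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if len(tail) < 2 or log_sum <= 0.0:
        raise ValueError("need at least two values above xmin that differ from it")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n: int, a: float, xmin: float, rng: random.Random) -> List[float]:
    """Draw synthetic lengths from a power law by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (a - 1.0)) for _ in range(n)]


# Synthetic check: recover a known exponent from simulated particle lengths (um).
rng = random.Random(1)
synthetic_lengths = sample_powerlaw(20000, a=2.5, xmin=333.0, rng=rng)
a_hat = powerlaw_exponent_mle(synthetic_lengths, xmin=333.0)
```

The estimate should fall close to the exponent used to simulate the data; with real particle lengths, the choice of x_min strongly influences the result, which is why it was estimated jointly with a in the analysis described above.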

Risk characterization

Aquatic ecotoxicological thresholds were used to characterize risk by deriving the ratio of predicted environmental concentrations (PEC) to predicted no-effect concentration (PNEC) thresholds, with ratios greater than one indicative of risk. While data were available for microplastics concentrations in various matrices (e.g., surface water, stormwater, wastewater effluent, sediment, fish tissue) in the SFB, only surface water concentration data were used for risk characterization, because undiluted stormwater and wastewater are not representative of environmental exposures and because hazard thresholds are lacking for marine sediment and fish tissue.
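The exceedance calculation can be illustrated with a short sketch. It takes the risk quotient as PEC divided by PNEC, so that values above one flag an exceedance; the concentrations and threshold below are hypothetical and are not values from Zhu et al. [70] or Mehinto et al. [42].

```python
from typing import List, Sequence


def risk_quotients(pec_values: Sequence[float], pnec: float) -> List[float]:
    """Risk quotient (PEC / PNEC) for each sample; values > 1 indicate risk."""
    if pnec <= 0.0:
        raise ValueError("PNEC must be positive")
    return [pec / pnec for pec in pec_values]


def fraction_exceeding(pec_values: Sequence[float], pnec: float) -> float:
    """Fraction of samples whose risk quotient exceeds one."""
    quotients = risk_quotients(pec_values, pnec)
    return sum(1 for q in quotients if q > 1.0) / len(quotients)


# Hypothetical rescaled concentrations and threshold (particles per litre).
hypothetical_pec = [0.5, 2.0, 10.0, 0.1]
exceedance = fraction_exceeding(hypothetical_pec, pnec=1.0)
```

Here two of the four hypothetical samples exceed the threshold, giving an exceedance fraction of 0.5.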

Surface water concentration data were compared to ecotoxicological thresholds derived by Mehinto et al. [42] using species sensitivity distributions (SSDs) based on chronic no-observed-adverse-effect concentrations for 14-16 freshwater and marine species from 6 to 7 taxonomic groups. Prior to derivation, Mehinto et al. [42] screened 162 peer-reviewed laboratory toxicity studies against a set of pre-defined quality criteria based on the standards defined by de Ruijter et al. [16]. A total of 290 threshold data points were extracted from 21 studies that met the minimum pre-defined criteria. These thresholds were aligned to a common size distribution of 1-5000 μm using environmental PDFs and based on mechanisms of action as described in Koelmans et al. [32] and Kooi et al. [36]. Following Kooi et al. [36], Mehinto et al. [42] derived thresholds for two effect mechanisms/pathways – food dilution and tissue translocation – which were aligned using the ecologically relevant metrics of particle volume and surface area, respectively. The food dilution-based effect considered particles small enough to be ingested by the organism of interest to be accessible (i.e., non-accessible particles were excluded), then aligned (both monodisperse and polydisperse) laboratory effect concentrations to environmentally realistic concentrations based on particle volume [42]. The tissue translocation mechanism of action considered particles narrow enough to translocate across tissues (83 μm) following ingestion to be accessible and aligned laboratory effect concentrations to environmentally realistic concentrations based on translocatable surface area [42]. Although the methodology used to align thresholds for the food dilution mechanism was identical, additional studies were used in Mehinto et al. [42] compared to Koelmans et al. [32], which only used effect threshold data from studies in which the authors confirmed that a food dilution mechanism was demonstrated or was plausible.
For both effect mechanisms/pathways, four PNEC thresholds were derived which correspond to different levels of confidence that microplastics can cause adverse effects to aquatic organisms and call for varying levels of management action - ranging from increasing monitoring to implementing source control measures (Table S9) [42]. At the time of writing, the risk management actions associated with thresholds from Mehinto et al. [42] carry no regulatory or legal authority in California or any other jurisdiction and are only suggestions.

Sensitivity analysis

Uncertainties were evaluated probabilistically using Monte Carlo methods based on PDFs derived for each correction factor, including: manta trawl fiber correction (shape under-counting bias), plastic fraction of total microdebris particles (spectroscopic subsampling variability), and rescaling concentrations to a common size range (a variability) (Fig. 1). To obtain the combined correction factor with probabilistic propagation of uncertainties, a data frame of 10,000 values was generated for each correction factor based on their modeled distribution, and each of these three data frames was multiplied by one another row-wise. The 50th percentile value from this combined correction factor distribution was used to correct the manta trawl surface water monitoring data, and the 5th and 95th percentile values were used to calculate uncertainty. This methodology accurately accounts for the underlying distributions of the correction factor data and is preferable to error propagation techniques that rely on assumptions of normality.
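The row-wise combination step can be sketched as follows. The function names are hypothetical and the percentile helper uses simple linear interpolation, which may differ slightly from the quantile definition used in the original R analysis; the input vectors are toy values rather than the 10,000 simulated draws.

```python
import math
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile (q in 0-100) of an ascending sequence."""
    position = q / 100.0 * (len(sorted_values) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def combine_correction_factors(fiber_cf: Sequence[float],
                               plastic_fraction: Sequence[float],
                               size_cf: Sequence[float]) -> Dict[str, float]:
    """Multiply three simulated factor vectors row-wise and summarize the product."""
    if not (len(fiber_cf) == len(plastic_fraction) == len(size_cf)) or not fiber_cf:
        raise ValueError("inputs must be non-empty and of equal length")
    combined: List[float] = sorted(
        f * p * s for f, p, s in zip(fiber_cf, plastic_fraction, size_cf)
    )
    return {"p05": percentile(combined, 5.0),
            "p50": percentile(combined, 50.0),
            "p95": percentile(combined, 95.0)}


# Toy inputs: five draws per factor instead of 10,000.
summary = combine_correction_factors([2.0] * 5, [0.5] * 5,
                                     [10.0, 20.0, 30.0, 40.0, 50.0])
```

The 50th percentile of the combined factor would then multiply each measured concentration, while the 5th and 95th percentiles bound the uncertainty.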

For each correction factor, a theoretical distribution was fit to the data based on the shape of the underlying distribution (see Table S10) - which was evaluated visually using Cullen and Frey graphs (Fig. S8) and using a maximum likelihood estimation approach with the fitdistrplus R package [17]. Due to the highly skewed distribution in the manta trawl fiber correction data (Fig. S6), these data were log-transformed prior to derivation of a PDF, then back-transformed into linear space following the Monte Carlo simulation. Following log-transformation, fiber correction data (unitless; > 1) were well-fit by a normal distribution (Fig. S8a). Plastic proportion data (unitless; 0 - 1) were well-fit by a two-shape beta distribution (Fig. S8b) that was truncated such that values greater than one were not produced during the Monte Carlo simulation due to their theoretical implausibility (i.e., more than 100% of particles cannot be plastic). Because the size-based a values used to rescale concentrations (i.e., from Kooi et al. [37]) were derived using maximum likelihood estimation based on a log-log linear distribution (Fig. S7; Table S7), the PDF was approximated using a normal distribution based on the mean and standard deviation of the linear regression slope (Table S7). Correction factors were derived from the Monte-Carlo simulated distribution of a values using Eq. 2.
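A minimal simulation of the three distributions might look like the sketch below. The fiber (log-space mean and standard deviation) and beta shape parameters are hypothetical placeholders, not the fitted values in Table S10; only the a value of 2.07 ± 0.03 and the manta trawl size limits are taken from the text. Python's `random.betavariate` already returns values between zero and one, so no explicit truncation is needed in this sketch.

```python
import math
import random
import statistics
from typing import List, Tuple


def simulate_correction_factors(
    n_draws: int, seed: int,
    fiber_log_mean: float, fiber_log_sd: float,   # hypothetical, log space
    beta_shape1: float, beta_shape2: float,       # hypothetical beta shapes
    a_mean: float, a_sd: float,
    ll_meas: float = 333.0, ul_meas: float = 5000.0,
    ll_default: float = 1.0, ul_default: float = 5000.0,
) -> Tuple[List[float], List[float], List[float]]:
    """Draw fiber, plastic-fraction, and size correction factors."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    fiber, plastic, size = [], [], []
    for _ in range(n_draws):
        # Normal in log space, back-transformed to linear space.
        fiber.append(math.exp(rng.gauss(fiber_log_mean, fiber_log_sd)))
        # Two-shape beta distribution bounded on [0, 1].
        plastic.append(rng.betavariate(beta_shape1, beta_shape2))
        # Normal draw of the exponent a, converted with Eq. 2 (assumes a != 1).
        exponent = 1.0 - rng.gauss(a_mean, a_sd)
        size.append((ul_default ** exponent - ll_default ** exponent)
                    / (ul_meas ** exponent - ll_meas ** exponent))
    return fiber, plastic, size


fiber_draws, plastic_draws, size_draws = simulate_correction_factors(
    n_draws=1000, seed=42, fiber_log_mean=2.0, fiber_log_sd=1.0,
    beta_shape1=2.0, beta_shape2=1.2, a_mean=2.07, a_sd=0.03)
median_size_cf = statistics.median(size_draws)
```

The three returned lists can then be multiplied row-wise to obtain the combined correction factor distribution.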

To quantify and compare the relative sensitivity of the exposure assessment to the correction and rescaling factors applied here, each parameter was held constant in turn while uncertainty in the remaining parameters was propagated. Finally, to assess the relative uncertainty between rescaled and corrected environmental occurrences and modeled risk thresholds, the 95th percentile of the Monte Carlo simulated surface water occurrence data was compared to the 95% confidence intervals for microplastics hazard thresholds reported in Mehinto et al. [42] based on the SSD model. An additional sensitivity analysis was performed using site-specific a values derived for SFB using manta trawl particle length data as described above.

Statistics

All statistical analyses were conducted in R (version 4.1.1; R Core Team, [53]) and figures were produced using the package ggplot2 [66]. Base maps sourced from Google were used for mapping using the ggmap package [30]. One-way ANOVAs were used to determine if plastic correction factors should be separated for each matrix and site (for manta trawl only). To determine if bias corrections (i.e., rescaling, fiber correction, plastic correction) resulted in comparable concentrations between water matrices, one-way ANOVAs were run for both raw and rescaled/corrected concentrations. For all hypothesis tests, statistical significance was determined at an alpha level of 0.05, and multiple comparisons were performed using Tukey’s Honest Significant Difference post-hoc test, when applicable. All Monte-Carlo simulations were performed with 10,000 iterations with a seed set for reproducibility.

Results

Data quality

Microplastics monitoring data reported in Zhu et al. [70] received total accumulated scores of 13, 10, and 14 for manta trawl, grab samples, and wastewater treatment plant effluent samples, respectively, according to criteria defined in Koelmans et al. [32] (Table S1). While manta trawl and wastewater treatment plant effluent data received a score of at least one for each quality criterion, grab samples received “zero” scores for two criteria (sample size and sample treatment) (Table S1). Accordingly, manta trawl and wastewater treatment plant effluent data from Zhu et al. [70] are considered sufficiently reliable for the purposes of risk characterization, while the grab sample data are not. However, only manta trawl data (surface water) were used for risk characterization due to the non-applicability of wastewater data for estimating exposure. Because blank-corrections were applied based on color-morphology combinations instead of polymer identification, all matrices received a score of “1” for negative controls instead of “2”. The blank correction procedure applied in Zhu et al. [70] may lead to an underestimation of concentrations if microplastic particles of the same color have a different polymer identity - an uncertainty which is not accounted for in this probabilistic assessment. The grab, wastewater treatment plant effluent, and manta trawl data scores were higher than the average score for surface waters (7.9; range 4 to 15; n = 55) reported in Koelmans et al. [32].
While additional quality criteria are available for biota and sediment (Bäuerlein PS, Erich MW, van Loon W, Bakker I, Mintenig SM, Koelmans AA: A monitoring and data analysis method for microplastics in marine sediments for OSPAR and MSFD, Submitted) (Redondo-Hasselerharm Paula Elisa, AR, Koelmans, AA: Risk assessment of microplastics for freshwater benthic ecosystems guided by strict quality criteria and data alignment methods, Submitted) [25], risk thresholds are unavailable for these compartments and these data were not quality scored here. Additionally, stormwater was not scored due to a lack of established quality criteria.

Microplastics characterization

The percentage of analyzed particles spectroscopically determined to be plastic was significantly different between matrices according to a one-way ANOVA (p < 1 × 10⁻¹⁶; Table S3), so matrix-specific plastic correction factors were derived accordingly using PDFs (Figs. S3 and S9; Tables S2 - S4; Table S10). Tukey’s post-hoc test for significance revealed significant differences in plastic proportions of total particles between manta trawl and sediment, fish tissue, wastewater treatment plant (WWTP) effluent, and surface water collected with 1-L grab (Table 1 and Table S4). Surface water samples collected with manta trawl contained the highest percentages of confirmed microplastics (72% ± 24%), followed by surface water collected by 1-L grab (42% ± 24%), sediment (37% ± 14%), wastewater treatment plant effluent (31% ± 18%), and fish tissue (24% ± 12%) (Table 1, Table S2; Fig. S2). Additional significant differences were found between sediment and fish tissue; sediment and 1-L grab surface water; and fish tissue and 1-L grab surface water (Table 1 and Table S4). As surface water data obtained using manta trawl were the only monitoring data used for risk characterization here, site-specific differences were tested using a one-way ANOVA, which demonstrated no significant difference in proportions of plastic particles relative to all spectroscopically characterized particles by location (p = 0.12; Table S5; Figs. S3 – S4). Accordingly, a single correction factor for plastic percentages was applied to all manta trawl data regardless of location, which was the median value of 0.63 (95% CI: 0.31 to 0.95) that was derived from a two-shape beta distribution PDF (Table 1 and Table S10; Fig. S9).

Table 1 Summary of correction factors applied probabilistically to samples from SFB

To correct for the systematic removal of fiber particle data from the blank-corrected dataset reported in Zhu et al. [70], data were used from Hung et al. [27] for 9 manta trawl samples from SFB in which all fibers were counted. On average, fibers constituted 78% (± 28% sd) of particles in the manta trawl samples in which they were counted (Table 1 and Table S6). Other aqueous matrices and sediment contained lower percentages of fibers, while fish tissue contained a higher percentage (Table 1 and Table S6, Fig. S5). Based on the PDF of the fiber proportions, a fiber correction factor of 8.87 (95% CI: 1.29 to 50.89; Table 1 and Table S10) was calculated and applied to manta trawl monitoring data as part of the Monte-Carlo analysis. Due to the relatively small sample size (n = 9) and skewed nature of the fiber proportion distribution (Fig. S6), the fiber correction factor contains relatively high uncertainty compared to the plastic spectroscopy correction factor and size rescaling correction factor (Fig. S13).

Rescaled environmental occurrence data

Size abundance distributions of microplastics in SFB were fit by linear regression on a log(10)-log(10) scale using a maximum likelihood estimation approach [37], with a exponent values ranging from 2.15 to 3.02 (Fig. S7, Table S7). Length-based power law exponent values (a) derived for microplastics in SFB were comparable to values derived from various locations in Europe reported by Kooi et al. [37] (Table 1 and Table S7). Notably, the a values for marine surface waters were 2.15 ± 0.48 and 2.07 ± 0.03 (mean ± sd) for SFB and Europe, respectively, a difference of less than 5% that is not statistically significant (Table 1 and Table S7). Additionally, power law exponents followed the same rank order by matrix between Kooi et al. [37] and those derived here for comparable matrices (i.e., marine surface water < wastewater effluent < marine sediment). The greatest difference between power law exponents was for marine sediment (2.90 ± 0.41 and 2.57 ± 0.20 for SFB and Europe, respectively) (Table 1 and Table S7). Direct comparisons of power law exponents derived for SFB to other studies/locations were not possible for stormwater runoff (which has not been reported elsewhere) or fish tissue (“biota” reported in [37] corresponds to benthic invertebrates).

Size-based correction factors for matrices ranged from 58 to 9774 depending on matrix (a value) and mesh size (Table 1 and Table S8). Fish tissue data had the smallest mesh size (25 μm) and had the smallest correction factor accordingly (58; 95% CI: 53 to 63) (Table 1 and Table S8). Manta trawl data had the largest mesh size (333 μm) and had the second largest correction factor (529; 95% CI: 401 to 704 based on an a value of 2.07 from [37]) (Table 1 and Table S8). Stormwater data had a smaller mesh size than manta trawl (106 μm); however, the correction factor was over 10x higher (9361; 95% CI: 20 to 6,120,045) due to the high a value (2.97 ± 0.83) (Table 1 and Table S8).

In theory, rescaling data and correcting for systematic biases (i.e., fiber correction, spectroscopic subsampling) should reduce the differences that arise from differing mesh sizes of sampling apparatus among monitoring concentrations taken at similar times and locations within a given matrix [33]. Before rescaling and correcting, surface water concentrations collected using manta trawl as well as effluent from wastewater treatment plants were not significantly different from one another according to one-way ANOVA with Tukey’s post-hoc (p > 0.05; Tables S11 – S12) but were both significantly lower than surface water collected through other means (stormwater, 1-L grab surface water; p < 0.001) (Table S12). Additionally, 1-L grab surface water concentrations were significantly higher than stormwater concentrations collected with a depth-integrated peristaltic pump (p = 0.03; Table S12). Following rescaling and correcting, manta trawl-collected surface water concentrations were still significantly lower than surface water concentrations collected via other methods (i.e., stormwater, 1-L grab surface water) (p < 0.001; Tables S13 – S14); however, wastewater concentrations were no longer significantly different from either 1-L grab or manta trawl-collected surface water concentrations (p > 0.05) (Fig. 2; Table S14). Despite the combined correction factor to account for systematic under-counting based on size and fibers as well as fractions of particles spectroscopically confirmed to be plastics, rescaled and corrected manta trawl surface water data were still significantly lower (p = 3.4 × 10⁻¹⁴; Table S14) than rescaled surface water 1-L grab samples, of which the majority were taken at similar locations and times. These results suggest that additional systematic biases are present in either the manta trawl (likely undercounting) or the 1-L grab samples (potentially overcounting).
Undercounting in manta trawl samples may be due in part or in whole to imprecise blank corrections based on shape-color combinations as opposed to polymer-based corrections.

Fig. 2

Unadjusted (blue) and rescaled (1 to 5000 μm; red) aqueous microplastics concentrations in SFB for A) aquatic matrices, B) sediment, and C) fish. Data are presented as box and whisker plots, with the center lines representing the median values (50th percentiles), while the box contains the 25th to 75th percentiles. The whiskers mark the 5th and 95th percentiles, and values beyond those upper and lower bounds are considered outliers, marked with dots. All monitoring data were rescaled for size using matrix-specific PDFs derived for SFB and were corrected for plastic proportions due to spectroscopic subsampling. Manta trawl data were further corrected to account for systematic removal of fibers from blank-corrected data in Zhu et al. [70]. For each matrix, sampling apparatus are defined in parentheses

Risk characterization

Risk exceedances of microplastics in SFB vary significantly depending on the postulated effect mechanism/pathway (i.e., food-dilution or tissue translocation). For all comparisons of PNECs (i.e., hazard thresholds from Mehinto et al. [42]) with PECs (i.e., corrected and rescaled surface water concentrations in SFB) stated throughout this manuscript, only those in which the 95% CI does not include ‘0%’ represent statistically significant exceedances. Accordingly, only food-dilution thresholds one, two, and three have statistically significant exceedances in the SFB, while all other thresholds (i.e., food-dilution threshold four and all tissue translocation thresholds) do not.
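The significance rule above can be illustrated with a minimal Monte Carlo sketch. It is not the authors' implementation: the function names, the lognormal shape of the combined correction factor, the choice to draw one factor per iteration for all samples, and every numeric input are assumptions made purely for illustration.

```python
import random
import statistics


def exceedance_interval(concentrations, threshold, sample_factor,
                        n_iter=10_000, seed=0):
    """Return (2.5th, 50th, 97.5th) percentiles of the % of samples above threshold.

    concentrations: unadjusted particle concentrations (particles/L).
    sample_factor:  hypothetical callable rng -> one combined correction factor.
    """
    rng = random.Random(seed)
    percents = []
    for _ in range(n_iter):
        factor = sample_factor(rng)  # one draw of the combined correction
        n_over = sum(c * factor > threshold for c in concentrations)
        percents.append(100.0 * n_over / len(concentrations))
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(percents, n=40, method="inclusive")
    return cuts[0], statistics.median(percents), cuts[-1]


def is_significant(lower_bound_percent):
    """Rule stated in the text: the 95% CI must not include 0%."""
    return lower_bound_percent > 0.0


def lognormal_factor(rng):
    # Assumed distribution; the parameters are invented, not taken from the study.
    return rng.lognormvariate(6.0, 0.8)


# Synthetic concentrations (particles/L), invented for this example only.
synthetic = [0.0005, 0.002, 0.01, 0.05, 0.2]
low, mid, high = exceedance_interval(synthetic, threshold=5.0,
                                     sample_factor=lognormal_factor, n_iter=2000)
print(f"{mid:.0f}% exceed (95% CI: {low:.0f}% to {high:.0f}%), "
      f"significant: {is_significant(low)}")
```

Under this rule, a threshold can have a non-zero median exceedance and still be reported as not statistically significant whenever the lower bound of the interval reaches 0%.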

Comparison of corrected and rescaled manta-trawl collected surface water samples with food-dilution thresholds derived by Mehinto et al. [42] resulted in 82% (95% CI: 27% to 100%) of samples exceeding the most conservative risk threshold (i.e., “Investigative monitoring” threshold one), 27% (95% CI: 3% to 73%) of samples exceeding threshold two (“Discharge monitoring”), 21% (95% CI: 3% to 58%) of samples exceeding threshold three (“Management planning”), and 3% (95% CI: 0% to 18%) of samples exceeding threshold four (“Source control measures”) (Fig. 3; Table S15).

Fig. 3

Comparison of corrected and rescaled surface water concentrations of microplastics in SFB collected using manta trawl with food-dilution thresholds derived by Mehinto et al. [42]. A Percentages of samples exceeding each threshold are shown as bar plots, with solid-line error bars reflecting the 25th and 75th percentiles and dashed-line error bars reflecting the 95th percentile confidence intervals derived using Monte Carlo simulations (n = 10,000) based on probability density functions derived from the combined variability of correction factors and rescaling. B Empirical cumulative density plot of surface water concentrations and 25th and 95th percentile confidence intervals of correction factors compared to food dilution thresholds. Exceedances are only considered statistically significant when the 95% confidence interval does not include ‘0%’

Comparison of surface water samples with tissue translocation-based thresholds derived by Mehinto et al. [42] resulted in 3% (95% CI: 0% to 9%) of samples exceeding the most conservative risk threshold (i.e., “Investigative monitoring” threshold one), 0% (95% CI: 0 to 3%) of samples exceeding threshold two (“Discharge monitoring”), 0% (95% CI: 0 to 3%) of samples exceeding threshold three (“Management planning”), and 0% (95% CI: 0 to 0%) of samples exceeding threshold four (“Source control measures”) (Table S15).

Risk exceedances were higher during the rainy season, with 94% (95% CI: 41% to 100%) of surface water samples collected following a storm event exceeding food dilution threshold one compared with 71% (95% CI: 12% to 100%) of samples collected during the dry season (Fig. S10). Rainy season samples exceeded food dilution threshold three with statistical significance (29%; 95% CI: 6% to 71%), whereas dry season samples did not (12%; 95% CI: 0% to 47%) (Fig. S10).

Risk exceedances varied by location within the SFB (Fig. 4, Table S16). The Central Bay had the highest proportion of samples exceeding risk thresholds, with 85% (95% CI: 38% to 100%) exceeding Mehinto et al.’s [42] most conservative food dilution threshold one (“Investigative monitoring”), 38% (95% CI: 8% to 85%) exceeding food-dilution threshold two (“Discharge monitoring”), 38% (95% CI: 8% to 77%) exceeding threshold three (“Management planning”), and 8% exceeding food dilution threshold four (“Source control measures”); however, exceedances of threshold four were not statistically significant (95% CI: 0 to 31%) (Table S16). Additionally, the Central Bay was the only location with any samples exceeding a tissue translocation-based threshold at the 50th percentile, with 8% exceeding threshold one; however, these exceedances were not statistically significant (95% CI: 0 to 23%) (Table S16).

Fig. 4

Map of San Francisco Bay showing food dilution threshold risk exceedances based on corrected and rescaled surface water concentrations of microplastics collected using manta trawl. Points represent approximate coordinates of manta trawl sampling locations. Colors represent risk in relation to food dilution thresholds in Mehinto et al. ([42]; reproduced in Table S9). Greater risk can be seen within SFB and San Pablo Bay relative to open-ocean waters outside of the SFB

Comparison of SFB samples to samples taken from outside of the bay demonstrated substantially higher risk within the bay. The National Marine Sanctuaries, an open-ocean location with minimal inputs from wastewater discharge or stormwater runoff that was selected as a reference location as part of the study design [70], did not have any samples exceeding the most conservative threshold (i.e., food-dilution threshold one) with statistical significance (35%; 95% CI: 0% to 91%) (Table S16; Fig. 4). Of the samples in the National Marine Sanctuaries exceeding food dilution threshold one at the 50th percentile, the three highest were at the mouth of the bay just west of the Golden Gate Bridge, suggesting rapid dilution of microplastic particle concentrations outside of the SFB (Fig. 4).

Sensitivity analysis

Comparison of the influence of factors in estimating environmental occurrence from manta trawl data reveals that the fiber correction factor carries higher relative uncertainty than either the spectroscopic sub-sampling correction factor for plastics or the size-based alignment correction factor (Fig. S13). When variability is held constant for all correction/rescaling factors except the fiber correction, the 95% confidence interval for the percentage of samples in SFB exceeding Mehinto et al.’s [42] food-dilution threshold one is 29% to 100%, compared with 76% to 88% when only size rescaling varies, 65% to 94% when only the plastic-proportion correction for spectroscopic subsampling varies, and 26% to 100% for combined rescaling and corrections (Fig. S13). If the fiber correction factor is omitted from the analysis entirely, uncertainty in the risk characterization decreases substantially, and the overall number of statistically significant risk exceedances decreases as well (Fig. S11). If the fiber correction factor is not applied, 27% of SFB samples would exceed food-dilution threshold one (95% CI: 18% to 39%) compared to 82% of samples when the fiber correction is applied (95% CI: 27% to 100%) (Fig. S11 and Table S15).
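The one-factor-at-a-time comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure used in the study: the function names, the factor names, the uniform and Gaussian distributions, and all numbers are hypothetical.

```python
import random
import statistics


def percent_exceeding(concentrations, threshold, factors):
    """% of samples whose concentration times the product of factors exceeds threshold."""
    combined = 1.0
    for value in factors.values():
        combined *= value
    n_over = sum(c * combined > threshold for c in concentrations)
    return 100.0 * n_over / len(concentrations)


def one_at_a_time(concentrations, threshold, medians, samplers,
                  n_iter=5_000, seed=1):
    """For each factor, vary only that factor and return its 95% interval.

    medians:  dict name -> fixed (median) value of each correction factor.
    samplers: dict name -> hypothetical callable(rng) returning one random draw.
    """
    rng = random.Random(seed)
    intervals = {}
    for name, draw in samplers.items():
        results = []
        for _ in range(n_iter):
            factors = dict(medians)    # all other factors held at their medians
            factors[name] = draw(rng)  # only this factor varies
            results.append(percent_exceeding(concentrations, threshold, factors))
        cuts = statistics.quantiles(results, n=40, method="inclusive")
        intervals[name] = (cuts[0], cuts[-1])  # 2.5th and 97.5th percentiles
    return intervals


# Invented inputs: a wide "fiber" factor and a narrow "size" factor.
demo = one_at_a_time(
    concentrations=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    threshold=5.5,
    medians={"fiber": 1.0, "size": 1.0},
    samplers={
        "fiber": lambda rng: rng.uniform(0.1, 10.0),
        "size": lambda rng: rng.gauss(1.0, 0.01),
    },
    n_iter=2000,
)
for name, (low, high) in demo.items():
    print(f"{name}: 95% interval {low:.0f}% to {high:.0f}%")
```

A factor with a wide input distribution produces a wide interval for the exceedance percentage, which is the sense in which one factor can dominate the overall uncertainty.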

While the size distribution value (a) has a substantial impact on the outcome of the risk characterization due to the high correction factor values derived for manta trawls (529; 95% CI: 401 to 704, Fig. S9), the site-specific values for marine surface waters in SFB differed minimally from those derived elsewhere and applied for this risk characterization (i.e., [37]) and therefore contributed limited uncertainty to this assessment (Figs. S13 and S14). However, larger mesh sizes correspond to exponentially larger correction factors (Eq. 2) and are therefore highly influential in the case of manta trawl data (333 μm mesh). For example, the correction factor for 1-L grab samples would be 66 (95% CI: 55 to 80) using the same a value and uncertainty applied for manta trawl here, indicating the higher uncertainty and influence of rescaling manta trawl data compared to grab samples.
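As a worked illustration, the manta trawl correction factor can be reproduced approximately if Eq. 2 is assumed to take the power-law rescaling form of Koelmans et al. [33]. The symbol names below and the assumed upper bound of 5000 μm for the measured range are illustrative choices, not details confirmed in this section.

```latex
% Assumed form: rescaling from the measured range [x_{1M}, x_{2M}]
% to the default range [x_{1D}, x_{2D}] with size exponent a
CF = \frac{x_{2D}^{\,1-a} - x_{1D}^{\,1-a}}{x_{2M}^{\,1-a} - x_{1M}^{\,1-a}}

% With a = 2.07, x_{1D} = 1, x_{2D} = 5000, x_{1M} = 333, x_{2M} = 5000 (all in micrometers):
CF = \frac{5000^{-1.07} - 1^{-1.07}}{5000^{-1.07} - 333^{-1.07}}
   \approx \frac{1.10 \times 10^{-4} - 1}{1.10 \times 10^{-4} - 2.00 \times 10^{-3}}
   \approx 529
```

Under these assumptions the result agrees with the reported median correction factor of 529 for manta trawls.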

Comparison of the total uncertainties associated with estimating environmental surface water concentrations with the uncertainties in risk thresholds from Mehinto et al. [42] reveals comparable levels of uncertainty, with food dilution thresholds spanning ~ 2 to 5 orders of magnitude between 95% confidence intervals depending on the tier (Table S10) while estimated environmental concentrations for manta trawl surface samples span ~ 2.5 orders of magnitude between 95th percentiles based on the combined correction and rescaling uncertainties (Table S11).

Discussion

Here, we combine occurrence data from SFB, California with a risk assessment framework to estimate the risk to local aquatic ecosystems. The risk framework includes hazard thresholds for two ecologically relevant categories of effect mechanisms: food dilution and mechanisms triggered upon tissue translocation. Based on the best available toxicological evidence and monitoring data, our results suggest that microplastic exposure in SFB in 2017 was high enough to cause biological perturbation through the food dilution effect mechanism. Eighty-two percent of SFB samples had concentrations that exceeded Mehinto et al.’s [42] tier one food-dilution threshold (“Investigative monitoring”) with statistical significance, with the highest percentages of statistically significant exceedances occurring within the Central Bay. Furthermore, the Central Bay was the only region within the SFB with any samples exceeding the third food dilution threshold (“Management planning”) with statistical significance; however, no samples exceeded the highest food dilution threshold (need of immediate source control measures) with statistical significance. Because samples were not taken with the goal of being spatially or temporally representative of the SFB [70], additional monitoring is suggested to improve confidence in risk characterizations.

Our analysis suggests that the risk associated with tissue translocation-mediated effects is substantially less than that for the food-dilution endpoint in SFB. While there were a few samples with concentrations greater than the first threshold at the 50th percentile, the exceedances were not statistically significant, and there were no samples above any of the other three thresholds at the 50th percentile. This lesser effect likely reflects tissue translocation-mediated effects being initiated by the subset of particles that are small enough (< 83 μm) to permeate the intestinal wall [28, 42, 52], whereas food-dilution is caused by a wider spectrum of particle sizes - based on ingestibility - that artificially fill the gut and lead to reduced food assimilation by blocked food passage [5, 12, 39, 47]. Still, our analysis suggests that early stages of tissue translocation-mediated toxicities (e.g., oxidative stress, inflammation [40, 67]) are possible in SFB, a finding that would not have been apparent without the rescaling procedures used to correct the underestimation of small particles captured in trawl nets [33]. Rescaling particle counts based on size to correct for sampling bias, in combination with toxicological thresholds aligned to ecologically relevant metrics, provides the opportunity to compare exposure and hazard more appropriately [34].

Based on the species sensitivity distributions used to derive the food-dilution thresholds in Mehinto et al. [42], the most sensitive species are the black-lip pearl oyster (Pinctada margaritifera), the marine medaka fish (Oryzias melastigma), and a water flea (Ceriodaphnia dubia), thus representing a diversity of taxonomic groups (mollusk, fish, and crustacea). As in many productive marine/freshwater estuarine systems, these three taxa are present in the SFB, including some similar species for which these laboratory model organisms may be suitable indicators. For instance, the Olympia oyster (Ostrea conchaphila) is native to the SFB and has experienced declining abundance, which has been primarily attributed to loss of habitat and other factors [49]. Experiments using model species within the Pinctada genus (e.g., Pinctada mazatlanica) have been used to inform risk management of the Olympia oyster in SFB [64].

The ecological risk that we found for SFB was larger than that for several previous risk characterizations conducted for other geographies (e.g., [2, 19, 33]). Key reasons for these differences are the use of different hazard threshold values and alignment procedures (or lack thereof), and that this sampling effort focused inside an urban enclosed water body with limited circulation. The SFB area has a large population of over 7 million people [45], and 39 WWTPs feeding into the bay [26]. This difference is confirmed by the application of our risk characterization methodology in the National Marine Sanctuary areas in the open ocean. Here, we did not find any samples exceeding any thresholds with statistical significance (Table S16), with the three highest samples taken at the mouth of the SFB (Fig. 4, interactive map in SI). Comparison of SFB concentrations to global marine surface water concentrations rescaled by Everaert et al. [19] to the same size range used here (1 to 5000 μm) using similar methods reveals higher concentrations within SFB than in ~ 75% of global marine locations [19]. While most marine monitoring has been conducted in the open ocean, measurements in enclosed areas near urban centers indicate higher contamination. Everaert et al. [19] reported 50 microplastics·L⁻¹ in the Yellow Sea near China - an enclosed water body adjacent to a population of ~ 600 million people [61] - which is higher than ~ 98% of rescaled surface water samples in SFB. The high concentrations in the SFB and other enclosed water bodies demonstrate the importance of targeted monitoring to protect coastal resources globally.

While our probabilistic 95th percentiles based on Monte Carlo modeling and PDFs were sufficiently small to confidently state that SFB contains microplastics at concentrations of biological concern, there is room for improvement in future risk characterizations to reduce uncertainties. The rescaling and realignment procedures developed in Koelmans et al. [33] allowed us to compare exposure data to hazard data with higher certainty than was previously possible. Even so, our risk estimates for SFB still contain substantial uncertainty, and understanding the factors that contribute to that variability will help focus the advancements needed to improve future estimates. Here we employed four key factors to calculate risk: ambient concentration measurements, critical thresholds at which biological effects manifest, size rescaling and other correction procedures used to correct for data collection biases, and the alignments of the biological thresholds [34]. Understanding the relative contribution of these factors to variability will help improve precision of future studies.

Variability associated with the fiber correction factor represented the largest uncertainty in this risk characterization (Fig. S13). The ambient concentrations used to characterize risk were collected using surface manta trawls [70], which are known to significantly undercount particles smaller than the mesh size (~ 333 μm). This undercounting extends to fibers, whose narrow widths allow them to pass through the mesh like spaghetti [18, 70]. For this reason, even after correcting for the uncounted fibers and rescaling concentrations to a common size range using compartment-specific size distributions, estimates generated by the 1-L grab samples taken at the same time as the trawl remained several orders of magnitude higher (Fig. 2), suggesting either that our SFB risk estimates based on corrected and rescaled manta trawl data were an underestimate or that the 1-L grab samples overcounted particles. Another possible cause for underestimation of number concentrations is that the blank corrections were based on color and morphology combinations rather than polymer identity, because spectroscopic subsampling was used instead of identification of all particles. This leads to underestimation of concentrations if microplastic particles of the same or similar color have a different polymer identity. We chose to rely on the manta trawl data because the 1-L grab sample volume was too small to provide accurate representation [32]: the highest percent relative standard deviation for duplicate manta samples (up to 13.3) was much lower than that for the 4 duplicate grab samples (up to 46.9 [27]). This variability was so great that the manta and grab samples were not even significantly correlated (R² = 0.04), even though they were paired closely in space and time (Fig. S12). Future risk characterization studies would be improved by applying polymer-specific data and blank corrections and using pumps with in-line filtration that include small mesh sizes (< 300 μm) to reduce small-scale spatial variability while capturing a fuller range of particle sizes than a towed net can [41, 44, 58].

The rescaling procedures described in Koelmans et al. [33] help address some of the size undercounting biases associated with the use of 333 μm mesh manta trawl nets, but these corrections introduce additional sources of uncertainty and complicate comparisons to other risk characterizations that do not perform rescaling procedures. The large mesh size of a manta trawl (333 μm) means that the size-based correction factor is relatively sensitive to the size exponent value used (see Table S8). For instance, the manta-trawl collected rescaled concentrations in SFB are nearly an order of magnitude higher than similarly rescaled data from manta-trawls reported in Everaert et al. [19] due to the larger size power exponent value used here. Everaert et al. [19] used a power exponent of 1.6 (± 0.5) corresponding to a correction factor of 40x for a 333 μm mesh, while a power exponent of 2.07 (± 0.03) was used here and corresponds to a correction factor of 530x for a 333 μm mesh. The power exponent used in Everaert et al. [19] was derived by Kooi and Koelmans [32] using the best available data at the time - which was arbitrarily size-binned data extracted from tables and graphs from other studies - and is therefore less accurate than the values derived in Kooi et al. [37], which used high-resolution datasets at the individual particle level. Using data from five studies that used state-of-the-art Fourier-transform infrared imaging and automated analysis, Kooi et al. [37] derived a length-based power exponent value of 2.07 (± 0.03) for marine surface waters, which is slightly smaller and substantially more certain than the site-specific value derived here using low-resolution manta trawl data measured using manual techniques (i.e., 2.15 ± 0.48). To reduce uncertainty due to size rescaling in SFB, small mesh-size samples could be obtained using in-line filtration and analyzed using state-of-the-science analytical techniques to derive local data (e.g., [44, 51]).
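The sensitivity of the correction factor to the power exponent can be checked with a short sketch. It assumes the power-law rescaling form of Koelmans et al. [33], a target range of 1 to 5000 μm, and a measured range running from the mesh size up to 5000 μm; the function name, parameter names, and the measured upper bound are illustrative assumptions rather than details given in this section.

```python
def correction_factor(exponent, measured_min, measured_max=5000.0,
                      target_min=1.0, target_max=5000.0):
    """Power-law factor rescaling counts from a measured size range to a target range.

    Sizes are in micrometers. The measured_max default of 5000 is an assumption.
    """
    p = 1.0 - exponent
    if p == 0.0:
        # The closed form below does not hold for an exponent of exactly 1.
        raise ValueError("exponent of exactly 1 needs the logarithmic form")
    numerator = target_max ** p - target_min ** p
    denominator = measured_max ** p - measured_min ** p
    return numerator / denominator


# Exponents quoted in the text, both applied to a 333 um manta trawl mesh.
for label, exponent in (("Everaert et al. [19]", 1.6), ("this study", 2.07)):
    factor = correction_factor(exponent, measured_min=333.0)
    print(f"{label}: exponent {exponent} gives a correction factor of about {factor:.0f}")
```

Under these assumptions the two exponents yield factors of roughly 40 and 530, consistent with the values quoted above, and the ratio between them accounts for the order-of-magnitude difference in rescaled concentrations.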

A minor point of uncertainty was the spectroscopy correction factor applied to concentration data to ensure that only plastic particles were used to characterize risk (Fig. S13). We did not consider particles that were clearly anthropogenic but were not unequivocally plastic, due to spectroscopic interferences from dyes and/or plastic additives ([70]; see Fig. S13). As such, the concentrations used here are also an underestimate. Additionally, unquantified inaccuracies in blank corrections based on color-shape combinations are expected to result in further underestimations of exposure and therefore risk. To reduce this uncertainty, future studies should use microplastic-specific spectral libraries to reduce the proportion of spectra that are less polymer-specific [15, 46] as well as automation to allow for chemical confirmation of all particles and polymer-specific blank corrections [51].

Uncertainties in this risk characterization due to the selection of concentration thresholds at which effects manifest are both clearly illustrated and robust, as Mehinto et al. [42] identified four thresholds that bracket the severity of response, whereas other risk assessments have relied on single thresholds and tested the sensitivity based on relevant factors. While the variability associated with the distribution modeling component of the SSDs in Mehinto et al. [42] spanned ~ 2 to 5 orders of magnitude, the median threshold values were insensitive to individual studies, which had only approximately a two-fold influence. Furthermore, the eight threshold values used in this study from Mehinto et al. [42] bracket the published range for microplastics (see [34]), with 95% confidence intervals based on SSDs being smaller for the food dilution thresholds in Mehinto et al. [42] than for thresholds derived in previous studies. Koelmans et al. [33] developed and applied the same rescaling and alignment methodology to obtain an SSD using studies which demonstrated ingestion and suggested food dilution as the effect mechanism, and derived a hazard concentration for 5% of species (HC5) of 76 particles/L (95% CI: 11 to 521 particles/L). Additional microplastics HC5 values which have not been rescaled or aligned span several orders of magnitude; however, six out of nine published values are within the confidence intervals of 76 particles/L [19, 20, 29, 56], with three exceeding the range due to their inclusion of nanoplastics data [2, 3, 4], as demonstrated in a review by Koelmans et al. [34]. There is room for additional studies to improve the threshold values [59], with specific attention to experimental design to assess risk more accurately [16] and additional studies using fibers (which were highly abundant in the SFB); however, the uncertainty associated with the threshold values is still smaller than that of the exposure data.

While the Zhu et al. [70] monitoring study in SFB included more than just surface water samples, here our risk assessment was limited to surface water exposure. At present, quantitative risk thresholds have not yet been developed for marine sediment exposure. However, microplastics are known to accumulate in sediments, and the SFB had relatively high concentrations of microplastics in the sediments. Moreover, we only considered the particle-induced effects of microplastics and did not account for additional potential risks resulting from pathogens [6, 68], or potential chemical-mediated effects (see [31] for summary of sorbed contaminants), including the leaching of chemical additives [71] that preferentially desorb following ingestion [11]. In particular, tire wear particles were a large portion of the microplastics found in SFB sediments and chemical derivatives from those tire products have been found to cause acute mortality in salmon [60]. Furthermore, microplastics are anticipated to interact with and exacerbate effects from additional stressors such as thermal stress due to climate change [35, 63]. Future work should seek to look holistically at risk in the SFB, including comparing relative risks from microplastics particles and other stressors such as dissolved and sorbed chemicals, for which risk assessment frameworks already exist.

Conclusion

Overall, our results indicate that SFB has regions where present exposure concentrations are above thresholds for risk with statistical confidence based on the best available ecotoxicological hazard thresholds. If we continue business as usual, inputs of microplastics to coastal environments are anticipated to triple over the next 20 years [38]. As such, the region might consider management actions now to prevent greater risk in the future. Beyond San Francisco Bay, our study can inform risk assessments in local regions across the globe. Combined, these methods can be used globally to inform management locally.