Background

For decades, water bodies have been structurally degraded to protect against flooding, gain land for agriculture and settlements, and navigate rivers [1]. To this end, rivers were straightened (i.e., channelized) [2, 3], their beds were obstructed, and monotonous transverse and longitudinal profiles were created to avoid bank erosion and enable navigation [2, 3]. Additionally, wooded banks were cleared [3], gravel banks were dredged, river connectivity was hindered by the construction of weirs, and old arms were cut off [4]. This resulted in a loss of aquatic habitats and thus in decreased biodiversity [2, 5,6,7]. With the European Water Framework Directive (2000/60/EC; EU-WFD) [8], the European Parliament and the Council of the European Union adopted a regulatory framework for measures in the field of water policy in 2000. The aim of the EU-WFD was to achieve a good ecological and chemical status for all European surface waters by 2015. This status is defined by a near-natural, water type-specific species assemblage of aquatic flora, benthic invertebrates and fish fauna as well as by specific pollutants and hydromorphological, chemical and physico-chemical components, as expected under conditions of minimal anthropogenic change and disturbance [8,9,10]. The ecological status of a given water body is then determined by the worst rated component (“one-out, all-out principle”) [8]. Accordingly, all stressors, including pollutants, can lead to a loss of species and thus to altered biocenoses, resulting in an inadequate status of water bodies according to the EU-WFD. In principle, chemical contamination can therefore be expected to cause ecological deficits in water bodies as assessed under the EU-WFD.
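The “one-out, all-out” rule described above can be illustrated with a minimal Python sketch. The five status classes follow the EU-WFD scale (1 = high to 5 = bad); the component names and ratings in the example are hypothetical and are not data from this study.

```python
from enum import IntEnum
from typing import Mapping


class Status(IntEnum):
    """EU-WFD ecological status classes (1 = best, 5 = worst)."""
    HIGH = 1
    GOOD = 2
    MODERATE = 3
    POOR = 4
    BAD = 5


def overall_status(components: Mapping[str, Status]) -> Status:
    """Return the overall status: the worst rated component decides."""
    if not components:
        raise ValueError("at least one rated component is required")
    return max(components.values())


def meets_target(components: Mapping[str, Status]) -> bool:
    """True if the water body reaches at least 'good' status."""
    return overall_status(components) <= Status.GOOD


# Hypothetical ratings for a single water body (illustration only).
example = {
    "aquatic flora": Status.GOOD,
    "benthic invertebrates": Status.POOR,
    "fish fauna": Status.MODERATE,
}
print(overall_status(example).name, meets_target(example))
```

A single poorly rated component is enough to pull the whole water body below the target, which is why one unaddressed stressor can mask improvements in all other components.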

In fact, only 8.2% of German surface waters and a minority of rivers in other European countries achieved the required good ecological status according to the EU-WFD by 2015 [9, 10]. This is mainly attributed to structural deficits (e.g., straightening, damming, embankments) [2, 11, 12] and also to chemical contamination of water bodies (e.g., by wastewater treatment plants, intensive agriculture) [6, 13,14,15,16]. However, the contribution of chemical contamination relative to that of structural deficits to the inadequate ecological status of water bodies remains unclear.

Nevertheless, hydromorphological restorations with the aim of increasing habitat and species diversity are considered a key measure to improve the ecological status of water bodies [5, 13, 17, 18]. Such morphological restorations include dismantling of bank and bed fixations [5, 19, 20], removal of weirs to restore river connectivity and enable the migration of fish and invertebrates [20, 21], purchase of land along the watercourse and thus the extensification of land use [12, 20], channel reconfiguration and reconnection of floodplains for flood protection [2, 20], alteration of structural complexity to increase habitat and species diversity [20], as well as the creation of riparian strips to reduce trophic effects by capturing nutrients and toxicants, moderating temperatures and introducing organic matter [2, 21, 22].

However, studies investigating the efficiency of restoration measures have produced conflicting results. While a few studies reported enhanced water quality [23,24,25] and improved diversity of the benthic invertebrate assemblage [17, 26,27,28], others found few signs of improvement even years after the restoration [13, 29,30,31,32,33]. The present study focuses on the prevailing chemical contamination as a possible cause for the poor success of many restoration efforts, using two model restorations in the catchment of the river Nidda in Hessen (Germany). For this purpose, we investigated the biological effects caused by chemical contamination in active biomonitoring campaigns, as they represent temporally integrated exposures to pollutants and thus offer a more holistic approach than the investigation of individual pollutants in grab samples, which only provide a snapshot of the chemical contamination in water bodies.

Therefore, we performed active biomonitoring campaigns and laboratory experiments with combined water/sediment samples from the corresponding sampling sites, using the freshwater mudsnail Potamopyrgus antipodarum and the amphipod Gammarus fossarum. Organisms were exposed at sites in restored river sections, at unrestored reference sites upstream (space-for-time substitution [16, 34]) and along a transect downstream of the restored sections to account for changes in biological responses. In addition, water and sediment samples from every sampling site were analyzed with effect-based in vitro bioassays (yeast reporter gene assays and microtox assay) to support the in vivo findings.

Methods

Sampling sites and restoration measures

The Nidda catchment, which covers almost 2000 km², is characterized by intensive agricultural and industrial use [35] and represents a typical catchment of Central Europe. Many water bodies in the catchment have been structurally degraded due to river engineering for flood protection and exhibit numerous obstacles to migration, such as weirs, dams and hydropower plants. In addition, the river Nidda and its tributary Horloff are negatively impacted by intensive agriculture as well as municipal and industrial wastewater treatment plants (WWTPs), resulting in proportions of clearwater (i.e., treated wastewater) of up to 50% at mean low discharge (MNQ) [36] and thus a potentially high level of chemical contamination. On the other hand, restoration measures have been conducted in the Nidda catchment for more than 20 years. Despite these efforts, the ecological status of the water bodies has not improved and is still deficient [35, 37]. For this reason, we compared two restoration measures with different intervention depths: the lower Horloff, whose status in the investigated river sections was assessed as ecological status class 5 (bad), and the river Nidda, which corresponds to ecological status class 4 (poor) throughout the examined sections [35, 37].

At the river Horloff, we chose three unrestored reference sites upstream of the restored section (marked in white in Fig. 1a) based on space-for-time substitution [16, 34]. Thus, the reference sites represent the unrestored condition in the same river stretch, which is subject to the same influences as the restored section. Reference site H1 is located 60 m upstream of the effluent of a WWTP with a capacity of 78,000 population equivalents (PE) [38], H2 is 160 m downstream of the WWTP effluent, and H3R, which serves as statistical reference, is about 700 m upstream of restoration site H4. Additionally, the river Horloff receives the surface runoff of the A45 motorway bridge upstream of site H4 [35]. Restoration site H4 is located within a river section that was restored in 2002/2003 and 2006/2007 and extends over a total length of 1.6 km (marked in light grey in Fig. 1a). Due to intensive agriculture, the intervention depth of this restoration measure was lower than that of the second model restoration at the river Nidda. Five transect sites follow 1.4 km (H5), 2.4 km (H6), 4.3 km (H7), 6.1 km (H8) and 8.1 km (H9) downstream of the restored section, at which possible changes in biological effects were investigated (marked in dark grey in Fig. 1a).

Fig. 1

Sampling sites at the river Horloff (a) and sampling sites at the river Nidda (b). Black: wastewater treatment plant, white: reference sites, with H3R and N2R as statistical reference sites, light grey: sites in restored section, dark grey: transect sites. Maps were modified with Adobe® Photoshop CC (Version 20.0.0, Adobe Systems Incorporated, San José, California, US) and are based on kompass.de [39]

At the river Nidda, we selected two unrestored reference sites (marked in white in Fig. 1b) on the basis of space-for-time substitution [16, 34]. N1 is located 50 m upstream of the effluent of a WWTP with a capacity of 48,000 PE [38], and N2R, which serves as statistical reference, lies 500 m downstream of the WWTP effluent. Three restoration sites, N3, N4 and N5, follow (marked in light grey in Fig. 1b). N3 represents the restoration at the so-called “Nidda-Knie” from 2001, and sampling sites N4 and N5 are located within the restoration “Gronauer Hof” from 2010, which extends over a length of 3.2 km. Four transect sites follow 1.0 km (N6), 1.7 km (N7), 3.2 km (N8) and 6.0 km (N9) downstream of the restored section (marked in dark grey in Fig. 1b).

Test organisms

Potamopyrgus antipodarum, the New Zealand mudsnail, was chosen as test organism, as it is a standard organism in the testing of chemicals according to OECD guideline 242 [40] and reacts sensitively towards reproductive toxicants including endocrine disrupting chemicals (EDCs) [41,42,43]. Besides, P. antipodarum has successfully been used in field studies to evaluate the conditions of rivers and environmental samples [44,45,46,47,48]. The snails used in the present study originated from the in-house breeding stock of the Department Aquatic Ecotoxicology at Goethe University, which was kept according to the recommendations of the OECD guideline 242, annex 2 [40].

The genus Gammarus, which includes the freshwater amphipod Gammarus fossarum, reacts sensitively to pollutants such as pesticides (e.g., terbutryn, fenoxycarb) or micropollutants from wastewater (e.g., 17α-ethinylestradiol) [49,50,51,52,53,54,55]. Furthermore, G. fossarum has already been used in field studies to assess the conditions of rivers and environmental samples [56,57,58,59,60]. The gammarids used in the present study were collected 1 day prior to the active biomonitoring campaigns and laboratory experiments from the source region of the river Nidder (N 50°29′7″, E 9°14′52″, Sichenhausen, Hessen, Germany) and kept overnight in aerated river water in a climate chamber at 10 ± 1.5 °C until the tests started.

Active biomonitoring campaigns

The active biomonitoring campaigns at the rivers Horloff and Nidda were conducted in March and May 2017, respectively, and were performed as previously described in detail [48], with minor modifications. Specifically, the number of animals used in the active biomonitoring campaigns was increased to ten gammarids and ten snails per replicate. Sexing the gammarids prior to exposure was not possible, since sex determination requires fixation of the individuals. Gammarids with a minimum size of 6.0 mm were placed in stainless steel enclosures (12.5 cm × 6 cm) with a piece of wire gauze (polytetrafluoroethylene, 8.2 cm × 3.3 cm) and conditioned black alder leaves (Alnus glutinosa) ad libitum. Snails with a size of 3.5 to 4.5 mm were placed in stainless steel tea-eggs (4.5 cm × 3.5 cm) that contained pieces of carrots from controlled biological cultivation ad libitum. At each site, two cages each containing three replicates were exposed (in total 60 snails and 60 gammarids per site). In addition, a data logger (HOBO Pendant®, Onset Computer Corporation, Bourne, USA) was deployed simultaneously at each site, measuring the water temperature every 30 min. After 28 days of exposure, snails and gammarids were recovered and checked for mortality. Snails were shock frozen in liquid nitrogen per replicate, and gammarids were fixed separately in 70% ethanol. As endpoints, the size and the number of embryos in the brood pouch of snails were measured as a reproductive parameter according to OECD guideline 242 [40], while in gammarids the size, sex, and fecundity index, which describes the number of eggs in relation to the size of the respective gammarid, were assessed.
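The exposure design (two cages with three replicates each, ten animals per replicate) yields six replicates and 60 animals per species and site. The following Python sketch shows one way the percentage mortality per site could be summarized as mean and standard error of the mean across replicates. The dead counts are hypothetical values chosen only so that the pooled mortality equals 32 of 60 animals (53.3%); the per-replicate counts of the study are not given here.

```python
from math import sqrt
from statistics import mean, stdev

ANIMALS_PER_REPLICATE = 10  # ten snails or ten gammarids per replicate


def replicate_mortality(dead_counts, n_per_replicate=ANIMALS_PER_REPLICATE):
    """Percentage mortality for each replicate."""
    return [100.0 * dead / n_per_replicate for dead in dead_counts]


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def pooled_mortality(dead_counts, n_per_replicate=ANIMALS_PER_REPLICATE):
    """Percentage of dead animals over all replicates of one site."""
    total = n_per_replicate * len(dead_counts)
    return 100.0 * sum(dead_counts) / total


# Hypothetical dead counts for 2 cages x 3 replicates at one site.
dead = [5, 6, 5, 5, 6, 5]
per_replicate = replicate_mortality(dead)
site_mean, site_sem = mean_and_sem(per_replicate)
print(round(pooled_mortality(dead), 1), round(site_mean, 1), round(site_sem, 2))
```

With equal replicate sizes, the pooled mortality and the mean of the replicate percentages coincide; they would differ if animals were lost from individual replicates.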

Associated water parameters (water temperature, pH, conductivity, oxygen concentration and saturation) were measured with a portable multimeter (HQ40d, Hach, Germany), ammonium, nitrite, nitrate, ortho-phosphate, sulfate, chloride and dissolved organic carbon (DOC) concentrations were determined via Spectroquant test kits (Merck, Darmstadt, Germany), and total hardness as well as carbonate hardness were determined with MColortest kits (Merck, Darmstadt, Germany) at the beginning of the biomonitoring campaigns.

Laboratory experiments with combined water/sediment samples

Static laboratory experiments with P. antipodarum and G. fossarum using combined water/sediment samples from each site served as a plausibility check to exclude environmental stressors such as water temperature, hydraulic pressure or stream velocity as causes of the observed in vivo effects, and were conducted in parallel to the active biomonitoring campaigns. Tests with mudsnails were conducted at 16 ± 1.5 °C in 500 mL glass beakers with 400 mL river water and 40 g sediment according to Duft et al. [41]. The negative control contained 400 mL test medium according to the OECD guideline 242 [40] and 40 g artificial sediment [95% (dw) quartz sand, 5% (dw) powdered beech leaves (Fagus sylvatica)] according to Duft et al. [41]. The test conditions followed the OECD guideline 242 [40], and every combined water/sediment sample and the control were tested in duplicate, each replicate containing 26 individuals of P. antipodarum with sizes between 3.5 and 4.5 mm. Snails were fed three times per week with 70 µg finely ground TetraPhyll® (Tetra GmbH, Melle, Germany) per snail and day.

The tests with gammarids were performed at 10 ± 1.5 °C in 250-mL glass beakers with 200 mL river water and 50 g sediment. The negative control contained 200 mL ISO test water [61] and 50 g artificial sediment according to Duft et al. [41]. In the experiments with gammarids, we used six replicates for the control and four replicates for every combined water/sediment sample, each containing 10 individuals of G. fossarum with a minimum size of 6.0 mm. Gammarids were fed ad libitum with conditioned black alder leaves (Alnus glutinosa). After 28 days of exposure, snails and gammarids were checked for mortality, snails were shock frozen in liquid nitrogen per replicate and gammarids were fixed separately in 70% ethanol and subsequently examined regarding the same endpoints as described for the active biomonitoring campaigns. Associated water parameters (water temperature, pH, conductivity, oxygen saturation and concentration) were measured once per week.

In vitro analyses of water samples

On the start day of the active biomonitoring campaigns, aqueous grab samples were collected at every site. Within 48 h after collection, anti-estrogenic and anti-androgenic activities of unfiltered native water samples were analyzed in the yeast anti-estrogen screen (YAES) and the yeast anti-androgen screen (YAAS) [62]. As antagonist screens, YAES and YAAS require background concentrations of 0.3 nmol 17β-estradiol/L and 10 nmol testosterone/L, respectively.

For the analysis with agonist screens and the microtox assay, water samples were solid-phase extracted (SPE) according to Giebner et al. [62]. For this purpose, 1000 mL of each water sample was filtered within 24 h after collection through glass microfiber filters (VWR International GmbH, No. 692, European Cat. No. 516-0885, 90 mm, particle retention: 1.0 µm, Darmstadt, Germany), and the filtrate was passed through conditioned Oasis HLB cartridges (200 mg, Waters, Milford, MA, USA) to capture mid-polar and non-polar substances [63]. Furthermore, an SPE blank was prepared by passing 1000 mL ultrapure water through conditioned cartridges. The cartridges were dried under a gentle stream of nitrogen and eluted with 4 mL methyl tert-butyl ether (MTBE) and 4 mL methanol (MeOH) according to Giebner et al. [62]. Afterwards, 0.1 mL dimethyl sulfoxide (DMSO) was added and the extracts were concentrated under a gentle stream of nitrogen to the final volume of 0.1 mL, which corresponds to a 10,000-fold enrichment. Subsequently, the extracts were analyzed in the yeast estrogen screen (YES), the yeast androgen screen (YAS) [62] and the yeast dioxin screen (YDS) [64]. The maximum DMSO concentration amounted to 0.21% in yeast assays. The measured activities were expressed as equivalent concentrations for 17β-estradiol (YES), testosterone (YAS), 4-hydroxytamoxifen (YAES), flutamide (YAAS) and β-naphthoflavone (YDS) and were corrected for dilution and enrichment so that equivalent concentrations refer back to native water samples.
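The enrichment arithmetic can be made explicit with a short Python sketch. Concentrating 1000 mL of water to 0.1 mL of extract gives the stated 10,000-fold enrichment. The back-calculation to the native sample is shown under the assumption that an equivalent concentration measured in the assay well is first multiplied by the dilution of the extract in the well and then divided by the enrichment factor. The function names, the well concentration and the dilution factor in the example are hypothetical.

```python
def enrichment_factor(sample_volume_ml: float, extract_volume_ml: float) -> float:
    """Fold enrichment achieved by reducing a sample to an extract."""
    if sample_volume_ml <= 0 or extract_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return sample_volume_ml / extract_volume_ml


def native_equivalent(conc_in_assay: float,
                      assay_dilution_factor: float,
                      enrichment: float) -> float:
    """Refer an equivalent concentration measured in the assay well
    back to the native water sample (same unit as conc_in_assay)."""
    conc_in_extract = conc_in_assay * assay_dilution_factor
    return conc_in_extract / enrichment


# 1000 mL water sample reduced to 0.1 mL extract, as in the text.
ef = enrichment_factor(1000.0, 0.1)

# Hypothetical example: 50 ng/L equivalents in the well, extract
# diluted 480-fold in the assay (both values are illustrative).
print(ef, native_equivalent(50.0, 480.0, ef))
```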

In addition, the microtox assay with Aliivibrio fischeri was conducted with water extracts [49, 65]. The maximum DMSO concentration amounted to 1% in the microtox assay. This assay measures the luminescence inhibition in A. fischeri, which is expressed as 50% effect concentration (EC50) referring to the relative enrichment factor (REF) of the respective water sample. An EC50 threshold value of 750 REF was assigned to water samples that reached less than 20% luminescence inhibition according to Harth et al. [49]. This threshold is equivalent to the lowest EC50 that a non-toxic sample can reach.
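The threshold rule for non-toxic water samples can be sketched in Python as follows. The 750 REF threshold and the 20% cut-off are taken from the text; the inhibition formula is the usual definition relative to a control and is an assumption here, as are the function names. Fitting the concentration-response curve itself is not shown, so the fitted EC50 is passed in as a given value.

```python
from typing import Optional


def luminescence_inhibition(sample_signal: float, control_signal: float) -> float:
    """Percentage inhibition of luminescence relative to the control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return (1.0 - sample_signal / control_signal) * 100.0


def reported_ec50(max_inhibition_pct: float,
                  fitted_ec50_ref: Optional[float] = None,
                  threshold_ref: float = 750.0,
                  cutoff_pct: float = 20.0) -> float:
    """EC50 (in REF) to report for a water sample.

    Samples that stay below the inhibition cut-off are treated as
    non-toxic and receive the threshold value instead of a fitted EC50.
    """
    if max_inhibition_pct < cutoff_pct:
        return threshold_ref
    if fitted_ec50_ref is None:
        raise ValueError("a fitted EC50 is required for toxic samples")
    return fitted_ec50_ref


# Hypothetical samples: one non-toxic, one with a fitted EC50 of 180 REF.
print(reported_ec50(12.0), reported_ec50(65.0, 180.0))
```

Because a lower EC50 means higher toxicity, assigning the threshold to non-toxic samples places them at the least toxic end of the scale while still allowing them to enter the statistical comparison.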

In vitro analyses of sediment samples

Sediment samples were collected from the first two centimeters of the upper sediment layer at each site on the start day of the active biomonitoring campaigns. Sediment samples were freeze-dried (Martin Christ Gefriertrocknungsanlagen GmbH, Alpha 1-4 LSC plus, Osterode, Germany), and 50 g of each sediment sample was shaken with 100 mL ultrapure water at 210 rpm for 10 min (GFL 3017, GFL Gesellschaft für Labortechnik mbH, Burgwedel, Germany), eluted by sonication for 10 min (Sonorex RK 52 H, Bandelin electronic, Berlin, Germany) and afterwards centrifuged at 4400 rpm for 5 min (Centrifuge 5702, Eppendorf AG, Hamburg, Germany). After centrifugation, the estrogenic (YES), androgenic (YAS), anti-estrogenic (YAES), anti-androgenic (YAAS) and the dioxin-like activities (YDS) of the aqueous eluates were measured within 48 h [62, 64]. These activities were expressed as equivalent concentrations per kg sediment for 17β-estradiol (YES), testosterone (YAS), 4-hydroxytamoxifen (YAES), flutamide (YAAS) and β-naphthoflavone (YDS) and corrected regarding dilution.
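As an illustration of how an eluate concentration can be referred to sediment mass, the Python sketch below uses the stated ratio of 50 g freeze-dried sediment to 100 mL ultrapure water. It assumes that the complete activity in the eluate is attributed to the extracted sediment mass and that any further dilution in the assay has already been corrected. The function name and the example value are hypothetical.

```python
def per_kg_sediment(conc_per_l_eluate: float,
                    water_volume_l: float = 0.100,
                    sediment_mass_kg: float = 0.050) -> float:
    """Convert an equivalent concentration per litre of eluate into an
    equivalent concentration per kg of dry sediment."""
    if water_volume_l <= 0 or sediment_mass_kg <= 0:
        raise ValueError("volume and mass must be positive")
    amount_in_eluate = conc_per_l_eluate * water_volume_l
    return amount_in_eluate / sediment_mass_kg


# Hypothetical eluate activity of 10 µg equivalents per litre.
print(per_kg_sediment(10.0))
```

At the 1:2 mass-to-volume ratio used here, the value per kg sediment is twice the value per litre of eluate.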

To quantify the baseline toxicity of sediment samples in the microtox assay with A. fischeri, sediment extracts were prepared. For this purpose, 20 g of each freeze-dried sediment sample was extracted with 400 mL acetone in a Soxhlet extractor (Electrothermal EME30500/CEB, Cole-Parmer Ltd., Staffordshire, UK; VWR RC-10 Digital Chiller, VWR International GmbH, Darmstadt, Germany) at 56 °C for 24 h. Sediment extracts were concentrated in a rotary evaporator (Heidolph Laborota 4000-efficient, vacubrand CVC 2000, Heidolph Instruments GmbH & Co. KG, Schwabach, Germany; VWR RC-10 Digital Chiller, VWR International GmbH, Darmstadt, Germany) at 56 °C, 0.5 mL DMSO was added, and extracts were reduced under a gentle stream of nitrogen to the final volume of 0.5 mL. These extracts were subsequently analyzed in the microtox assay [49, 65]. The inhibition of luminescence is expressed as EC50 referring to mg sediment-equivalents. An EC50 threshold value of 30 mg sediment-equivalents was assigned to non-toxic sediment samples, i.e., samples that reached less than 20% inhibition of luminescence. This threshold is equivalent to the lowest EC50 that a non-toxic sample can reach.

In addition, the mean grain size [66] and the loss on ignition [67] were determined in sediment samples from each sampling site.

Data analysis

Statistical analyses were conducted with the software Microsoft® Excel 2016 (Microsoft Corporation, Redmond, USA) and GraphPad Prism®, v.5.04 (GraphPad Software Inc., San Diego, CA, USA). Differences in mortality compared to the corresponding reference site were determined using Fisher’s exact test. Continuous data were examined for normal distribution with the D’Agostino and Pearson omnibus normality test and for variance homogeneity with Bartlett’s test for equal variances. In case of normal distribution and variance homogeneity, an unpaired t test or a one-way ANOVA with Bonferroni’s post hoc test was applied. If continuous data were not normally distributed or variances were inhomogeneous, a Mann–Whitney test or a Kruskal–Wallis test followed by Dunn’s post hoc test was applied. The level of significance was set at α = 0.05, and significant differences are illustrated in the graphs with asterisks (*p < 0.05, **p < 0.01, ***p < 0.001). To relate the explanatory variables (in vitro activity and baseline toxicity in water and sediment samples, loss on ignition, mean grain size) to the response variables (mortality and reproduction of P. antipodarum and G. fossarum), linear regression analyses were conducted.
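The mortality comparison against a reference site can be illustrated with a dependency-free Python sketch of the two-sided Fisher’s exact test for a 2 × 2 table, together with the asterisk notation used in the figures. This is not the GraphPad Prism implementation used in the study; it follows the common convention of summing all table probabilities that do not exceed the probability of the observed table. The reference-site counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]],
    e.g. rows = sites, columns = dead / alive."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def stars(p: float) -> str:
    """Asterisk notation: * p < 0.05, ** p < 0.01, *** p < 0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."


# 32 of 60 dead at a test site versus a hypothetical 15 of 60 dead
# at the reference site (reference counts are illustrative only).
p_value = fisher_exact_two_sided(32, 28, 15, 45)
print(stars(p_value))
```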

Results

Active biomonitoring campaigns and laboratory experiments with Potamopyrgus antipodarum

At the river Horloff, the reproduction of snails exposed at site H2 downstream of the wastewater discharge increased slightly compared to H1, but the difference was not significant in the active biomonitoring campaign (Fig. 2a). At sites H1 and H2, significantly fewer snails died than at reference site H3R (p < 0.05–0.01, Fig. 2c). At restoration site H4, we found the highest mortality of P. antipodarum among all sites at the Horloff (Fig. 2c). Here, 53.3% of the exposed snails died, which corresponds to a significantly higher mortality than at reference site H3R (p < 0.01). At the transect sites, the reproduction varied around the reference level (Fig. 2a), while the mortality of P. antipodarum decreased again and was, in the case of transect site H8, significantly reduced compared to reference site H3R (p < 0.01, Fig. 2c). A similar trend in the reproduction of P. antipodarum was also observed in laboratory experiments with combined water/sediment samples from the corresponding sampling sites (Additional file 1: Figure S1a).

Fig. 2

Potamopyrgus antipodarum. Mean and standard deviation of the number of embryos (a, b) and mean and standard error of the mean of the percentage mortality (c, d) after 28 days of exposure in the active biomonitoring campaigns at the river Horloff in March 2017 (a, c) and the river Nidda in May 2017 (b, d). White: reference sites, with H3R and N2R as statistical reference sites, light grey: restoration sites, dark grey: transect sites. Significant differences in the number of embryos compared to the reference site N2R (shaded) were determined via unpaired t test. Significant differences in the percentage mortality compared to the corresponding reference site H3R or N2R (shaded) were determined using Fisher’s exact test. *p < 0.05, **p < 0.01, ***p < 0.001, n = 6

In contrast to the river Horloff, snails produced considerably more embryos in the active biomonitoring campaign at the river Nidda (cf. Fig. 2a, b). At reference site N2R downstream of the wastewater discharge, snails produced slightly fewer embryos than at site N1, but this difference was not statistically significant (Fig. 2b). At the restoration sites N3 to N5, the reproduction of snails was significantly higher than at reference site N2R (p < 0.05–0.01), whereas mortality was not significantly affected (Fig. 2b, d). At the transect sites, reproduction of P. antipodarum corresponded to the reference level at N1 (Fig. 2b) and mortality rose significantly, reaching 33.3% and 31.7% at transect sites N7 and N9, respectively (p < 0.001, Fig. 2d). In laboratory experiments with P. antipodarum and combined water/sediment samples from the river Nidda, we found a different pattern for reproduction and mortality (Additional file 1: Figure S1b, d). Here, the embryo numbers in snails decreased significantly at restoration site N5 compared to N2R (p < 0.05), and mortality was not affected.

Active biomonitoring campaigns and laboratory experiments with Gammarus fossarum

At the river Horloff, the fecundity indices of gammarids slightly increased from sites H2 to H4 (Fig. 3a), whereas the mortality decreased slightly at these sites compared to H1 (Fig. 3c), but these differences were not statistically significant. At the transect sites H5 (p < 0.001) and H8 (p < 0.05), significantly fewer G. fossarum individuals died, whereas at H7 a significantly higher mortality of gammarids was observed compared to H3R (p < 0.01, Fig. 3c). At H9, the highest fecundity index was recorded but did not differ significantly from H3R (Fig. 3a). Also, in the laboratory experiment with combined water/sediment samples from the river Horloff, the fecundity index did not differ significantly between sampling sites (Additional file 2: Figure S2a). In contrast to the active biomonitoring at the river Horloff, significantly fewer gammarids died at site H1 in the laboratory experiment than at reference site H3R (p < 0.05, Additional file 2: Figure S2c). The highest mortality with 25% dead G. fossarum individuals was observed at the reference site H3R in laboratory experiments, and a significantly lower mortality was determined at the restoration site H4 (p < 0.01, Additional file 2: Figure S2c).

Fig. 3

Gammarus fossarum. Mean and standard deviation of the fecundity index (a, b) and mean and standard error of the mean of the percentage mortality (c, d) after 28 days of exposure in the active biomonitoring campaigns at the river Horloff in March 2017 (a, c) and the river Nidda in May 2017 (b, d). White: reference sites, with H3R and N2R as statistical reference sites, light grey: restoration sites, dark grey: transect sites. Significant differences in the fecundity index compared to the reference site N2R (shaded) were determined via unpaired t test. No brooding females occurred at N7 and N9, therefore, fecundity index could not be calculated. Significant differences in the percentage mortality compared to the corresponding reference site H3R or N2R (shaded) were determined using Fisher’s exact tests. *p < 0.05, **p < 0.01, ***p < 0.001, n = 6

At the river Nidda, the fecundity index of gammarids declined by 43% at reference site N2R downstream of the wastewater discharge, whereas mortality rose by 78% compared to site N1 (Fig. 3b, d), but these differences were not statistically significant. At the restoration site N3, the fecundity index increased significantly compared to the reference site N2R (p < 0.01, Fig. 3b) and the mortality remained at a level comparable to that at site N2R (Fig. 3d). At N5, the fecundity index was highest among all sites at the river Nidda (Fig. 3b) but did not differ significantly from N2R due to the high standard deviation. At the transect site N6, the fecundity index was significantly higher than at reference site N2R (p < 0.05, Fig. 3b), and at N7 and N9, no reproduction of gammarids occurred (Fig. 3b), since the mortality increased significantly to 76.7% and 83.3%, respectively (p < 0.001, Fig. 3d). In the laboratory experiment with combined water/sediment samples from the river Nidda, completely different patterns were observed for fecundity and mortality (Additional file 2: Figure S2b, d). Within the restoration sites, the fecundity indices showed a decreasing trend compared to reference site N2R and reached their lowest value at N5. Significantly fewer gammarids died at reference site N1 (p < 0.05), restoration site N3 (p < 0.001) as well as transect sites N6 (p < 0.001), N8 (p < 0.05) and N9 (p < 0.001) compared to N2R.

In vitro analyses of water samples

The microtox assay revealed already moderately toxic water samples at the Horloff reference sites H1 to H3R (Fig. 4a). At sites H1 and H2, we also found significantly increased anti-estrogenic activities with 2.21 and 1.48 mg OHT-EQ/L, respectively, compared to H3R (p < 0.01–0.001, Additional file 3: Figure S3a). Furthermore, we also found slight dioxin-like activities in water samples from sites H2 and H3R (Fig. 4c), but these were very low and close to the detection limit (Additional file 4: Table S4). Surprisingly, the baseline toxicity at restoration site H4 increased significantly by 66% compared to the reference site H3R (p < 0.05) and, therefore, represented the most toxic water sample from the river Horloff (Fig. 4a). The baseline toxicity of water samples declined within the transect sites compared to H3R, reaching a significantly increased EC50 of 750 REF at H6 and H7, which is equivalent to non-toxic water samples (p < 0.001, Fig. 4a). At site H7, we also found significantly increased dioxin-like and anti-estrogenic activities with 0.495 mg OHT-EQ/L compared to H3R (p < 0.01, Fig. 4c, Additional file 3: Figure S3a). However, these activities were comparatively low and close to the detection limits (Additional file 4: Table S4). In water samples from the river Horloff, no estrogenic activities were found at any sampling site (Fig. 4e).

Fig. 4

Mean and standard error of the mean of EC50 for baseline toxicity (a, b), dioxin-like activity (c, d) and estrogenic activity (e, f) in water samples from the river Horloff in March 2017 (a, c, e) and the river Nidda in May 2017 (b, d, f). White: reference sites, with H3R and N2R as statistical reference sites, light grey: restoration sites, dark grey: transect sites. Significant differences in baseline toxicity compared to the reference site H3R (shaded) were determined using one-way ANOVA and Bonferroni’s post hoc test. Significant differences in dioxin-like activity and estrogenic activity compared to the corresponding reference site H3R or N2R (shaded) were determined via Kruskal–Wallis test with Dunn’s post hoc test. If no bar is illustrated, the activity was below the LOQ or no activity at all was measured. *p < 0.05, **p < 0.01, ***p < 0.001, n = 3 with 8 pseudo-replicates each

At the river Nidda, we also found a moderate baseline toxicity in water samples at the reference sites N1 and N2R as well as the highest baseline toxicities in water samples from the restoration sites N3 to N5 (Fig. 4b), but these did not differ significantly from N2R. At these restoration sites, the dioxin-like activities rose significantly by up to 92% (p < 0.001, Fig. 4d) and the estrogenic activities increased significantly by up to 124% compared to N2R (p < 0.001, Fig. 4f). The baseline toxicities of water samples from transect sites N6 and N8 reached a level comparable to that at the restoration sites N3 to N5 (Fig. 4b). At these transect sites, the dioxin-like (Fig. 4d) and the estrogenic activities (Fig. 4f) were also significantly higher compared to N2R (p < 0.05–0.001). Also, the anti-estrogenic activity at N8 (24.6 mg OHT-EQ/L) was considerably higher than, but not significantly different from, that at reference site N2R (14 mg OHT-EQ/L), and it increased significantly at N9 to 28.8 mg OHT-EQ/L (p < 0.05, Additional file 3: Figure S3b).

In vitro analyses of sediment samples

The microtox assay revealed the highest baseline toxicity of sediment samples from the river Horloff at site H2 downstream of the wastewater discharge (Fig. 5a), although it did not differ significantly from H3R. At this site, we also observed an increasing trend in the dioxin-like activity compared to H1 (Fig. 5c). At the restoration site H4 and the transect sites, baseline toxicity and dioxin-like activity varied around the reference level of H3R.

Fig. 5

Mean and standard error of the mean of EC50 for baseline toxicity (a, b) and dioxin-like activity (c, d) in sediment samples from the river Horloff in March 2017 (a, c) and the river Nidda in May 2017 (b, d). White: reference sites, with H3R and N2R as statistical reference sites, light grey: restoration sites, dark grey: transect sites. Significant differences in baseline toxicity compared to the reference site N2R (shaded) were determined using one-way ANOVA and Bonferroni’s post hoc test. Significant differences in dioxin-like activity compared to reference site N2R (shaded) were determined via Kruskal–Wallis test with Dunn’s post hoc test. If no bar is illustrated, the activity was below the LOQ or no activity at all was measured. **p < 0.01, ***p < 0.001, n = 3 with 8 pseudo-replicates each

In the microtox assay with sediment samples from the river Nidda, the highest baseline toxicity was found at restoration site N3 but did not differ significantly from reference site N2R (Fig. 5b). At reference site N2R and restoration sites N3 and N4 no dioxin-like activities were found, since they were below the LOQ (Fig. 5d, Additional file 4: Table S4). The least toxic sediment from the river Nidda and, thus, a significantly higher EC50 were determined at N7 (p < 0.01, Fig. 5b). At this site, we found a significantly increased dioxin-like activity compared to reference site N2R (p < 0.001, Fig. 5d). However, the activity was comparably low and close to the detection limit (Additional file 4: Table S4). At N9, the dioxin-like activity significantly rose to 38.5 µg β-NF-EQ/kg (p < 0.01, Fig. 5d).

Anti-estrogenic activities were only found in sediments from reference site N1 and the restoration site N3 with 152 mg OHT-EQ/kg and 84.8 mg OHT-EQ/kg, respectively.

The mean grain sizes of sediment samples from the river Horloff showed that nearly all sampling sites, except restoration site H4 and transect site H9, belong to the sediment category fine sand (Table 1). The highest loss on ignition and, thus, the highest organic content was found at reference site H2 and at transect site H6.

Table 1 Mean grain size in mm and corresponding 95% confidence intervals (CI), classification according to DIN EN ISO 14688-1 [66] and mean and standard deviation of percentage loss on ignition according to DIN 38414-3 [67] of sediment samples from the reference sites (H1-H3R, N1-N2R), restoration sites (H4, N3-N5) and transect sites (H5-H9, N6-N9) at the river Horloff in March 2017 and at the river Nidda in May 2017

At the river Nidda, the mean grain size of most sediment samples lay between 0.2 and 0.6 mm, which corresponds to the sediment category medium sand. Only reference site N2R and restoration site N3 exhibited mean grain sizes between 0.06 and 0.2 mm, corresponding to the sediment category fine sand. The highest loss on ignition was measured in sediment samples from the reference site N2R, the restoration site N5 and the transect site N8.
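The assignment of a mean grain size to a sediment category can be written as a small lookup. The sketch below uses only the two size ranges quoted in the text (0.06 to 0.2 mm and 0.2 to 0.6 mm). The function name, the example values and the treatment of a value of exactly 0.2 mm as medium sand are assumptions made for illustration.

```python
def classify_sand(mean_grain_size_mm):
    """Classify a mean grain size using the class limits quoted in the text."""
    if 0.06 <= mean_grain_size_mm < 0.2:
        return "fine sand"
    # Assumption: the shared limit of 0.2 mm is counted as medium sand.
    if 0.2 <= mean_grain_size_mm <= 0.6:
        return "medium sand"
    return "outside the two categories described here"


# Hypothetical mean grain sizes in mm, not values from Table 1.
for size in (0.15, 0.35, 1.2):
    print(size, "->", classify_sand(size))
```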

Linear correlation analyses

The results of the linear correlation analyses between explanatory variables (endocrine activity and toxicity of water and sediment samples, loss on ignition, mean grain size) and response variables (reproduction and mortality of P. antipodarum and G. fossarum) are summarized in Table 2 and are illustrated in Additional file 5: Figure S4. In addition to the linear correlations of Table 2, we found a significant positive correlation between the estrogenic and the dioxin-like activity in water samples (p < 0.001, Additional file 5: Figure S4i) as well as a significant negative correlation between the loss on ignition and the dioxin-like activity in water samples (p < 0.05, Additional file 5: Figure S4f).
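For readers who want to reproduce this kind of analysis, a linear correlation between one explanatory and one response variable can be sketched as below. The data are invented for illustration and are not taken from Table 2. The permutation test is a stand-in for the significance test of the study, which is not specified in this section, and all function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical example: an in vitro activity per site and a mean embryo number per site.
activity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
embryos = [8.0, 9.5, 11.0, 10.5, 13.0, 14.5]

r = pearson_r(activity, embryos)
p = permutation_p_value(activity, embryos)
print("r =", round(r, 3), "p =", round(p, 4))
```

A positive r with a small p corresponds to what the text calls a significant positive correlation. A correlation of this kind does not establish causation.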

Table 2 Overview of the linear correlation analyses

Discussion

Horloff

Since reproduction of snails and gammarids increased or at least showed an increasing trend at H2 both in the active biomonitoring campaign and under standardized conditions in the laboratory, this cannot be explained by the higher water temperature (+ 0.78 °C) due to the discharge of the WWTP. However, the organic carbon content in sediments increased considerably from sampling site H1 with 8.45% dry weight (dw) to 14.1% dw at site H2, which is possibly due to the discharge of the WWTP (Table 1). A higher proportion of organic matter may provide additional food for the detritivorous mudsnails and gammarids, probably increasing reproduction [45, 68]. In addition, the enhanced food supply may have masked toxic effects, such as the substantially increased sediment toxicity at H2 compared to H1 (Fig. 5a) [69,70,71,72]. It is also conceivable that the increased organic content in sediments, the smaller mean particle size (Table 1) and the increased DOC (Additional file 4: Table S1) could have bound organic pollutants and thus reduced the bioavailability of toxic substances for snails and gammarids [68, 73,74,75,76,77,78,79]. Therefore, no toxic effects, i.e., no significantly increased mortality of P. antipodarum and G. fossarum, occurred despite the high sediment toxicity determined in the microtox assay at H2 following a total extraction of sediments via Soxhlet (Fig. 5a). This is in line with the observations of Schmitt et al. [80], who found significant increases in the estrogenic activity after total extraction of sediments, while the reproduction of P. antipodarum was not enhanced. Furthermore, the anti-estrogenic activity declined at site H2 compared to H1 (Additional file 3: Figure S3a), which may have also contributed to the higher reproduction at site H2, since anti-estrogens are able to reduce the reproduction of snails [41, 81, 82].

The high mortality of P. antipodarum within the restoration measure in the active biomonitoring cannot be attributed to differences in associated water parameters, since these differed only slightly between sampling sites (Additional file 4: Table S1). It is more likely due to the significantly increased baseline toxicity in water (Fig. 4a), the high sediment toxicity (Fig. 5a) and the dioxin-like activity at site H4 (Fig. 5c). The decline in water and sediment quality is likely to result from the surface runoff of the A45 motorway bridge and might be attributed to the presence of polycyclic aromatic hydrocarbons (PAHs) and metals in sediments and water phase [74, 83, 84], which would also explain the in vitro results in the present study. In contrast to the active biomonitoring, no significant increase in the mortality of P. antipodarum occurred in laboratory experiments. This may be due to the use of grab samples of river water and sediments that may have been taken before contaminants appeared and affected snails and gammarids in the active biomonitoring campaign. Moreover, it is conceivable that contaminant concentrations decreased through chemical and biological degradation, since no water renewal was conducted in laboratory experiments.

Although most invertebrates do not express the aryl hydrocarbon receptor (AhR) and are, therefore, relatively unresponsive to dioxin-like compounds [85], PAHs and dioxins as potent agonists at the AhR [85,86,87] negatively affect reproduction and mortality of invertebrates [74, 88,89,90,91,92,93]. The correlation analyses also revealed that increasing dioxin-like activities lead to a decrease in the reproduction of G. fossarum (Table 2, Additional file 5: Figure S4h) but do not have a lethal effect on G. fossarum and P. antipodarum. Therefore, an influence of dioxin-like substances on the reproduction of G. fossarum and P. antipodarum cannot be excluded in the present study and might have contributed to the biological responses.

Nidda

A possible explanation for the decreasing trend in reproduction and the increasing trend in mortality of snails and gammarids downstream of the WWTP discharger is the 22% lower DOC concentration at site N2R compared to N1 (Additional file 4: Table S1), so that fewer hydrophobic organic contaminants are bound, thus increasing the bioavailability and toxicity of these pollutants to aquatic organisms [77,78,79, 94].

The results of the river Nidda revealed a substantially higher level of endocrine activity and toxicity in restored river sections compared to the reference site upstream, although not all observed endpoints were significantly elevated. This especially refers to estrogenic and dioxin-like activities in water samples as well as baseline toxicity in water and sediment samples (Figs. 4b, d, f, 5b). The estrogenic activity in water samples correlates significantly and positively with the embryo numbers in P. antipodarum (cf., Figs. 2b, 4f, Table 2, Additional file 5: Figure S4a), for which an increase in reproduction with rising estrogenic activities has already been reported [41, 47, 80, 93, 95]. Since only reduced reproduction and increased mortality of invertebrates by dioxin-like substances are described in the literature [83, 88,89,90,91] but no increased reproduction, it can be assumed that the significant positive correlation between dioxin-like activities and the reproduction of snails only reflects the wastewater load and thus the estrogenic activity (Table 2, Additional file 5: Figure S4b). This results mainly from the correlation between estrogenic and dioxin-like compounds in water samples and is supported by the significant negative correlation between dioxin-like activities in sediment samples and the reproduction of P. antipodarum (Table 2, Additional file 5: Figure S4c, i). Moreover, the correlation analyses showed that the reproduction of snails in the active biomonitoring increases with increasing mean grain size and decreases with increasing loss on ignition (Table 2, Additional file 5: Figure S4d, e). This was to be expected as the bioavailability of estrogens increases with increasing mean grain size and decreases with increasing organic content in sediments [96,97,98], so that these could have contributed indirectly to the elevated reproduction of snails in restored river sections.

In contrast, the relation between water and sediment contamination and the reproduction of G. fossarum is less clear. On the one hand, Schneider et al. [53] observed an increase in the fecundity index of Gammarus pulex with increasing wastewater content and attributed this to the presence of EDCs, especially to estrogenic substances, which is in line with the present findings within restored river sections in the active biomonitoring; on the other hand, various studies report on decreasing fecundity indices downstream of WWTP effluents or on shifts in sex ratio in favor of females induced by estrogens but not on elevated fecundity of gammarids [50, 53, 54]. Thus, a reliable proof of an increased fecundity of G. fossarum due to estrogen exposure is not yet available and correlation analyses in the present study revealed decreasing fecundity indices with increasing estrogenic or dioxin-like activity, which is likely due to the correlation between estrogenic and dioxin-like activity in water samples (Table 2, Additional file 5: Figure S4g, i). Therefore, it is assumed that estrogenic activity represents just the wastewater load and that dioxin-like activities are responsible for a decrease in fecundity indices. However, the cause for an increased reproduction of gammarids within restored river sections remains unknown in the present study.

The effect pattern on reproductive parameters in the laboratory experiments with P. antipodarum and G. fossarum showed a completely different picture (Additional file 1: Figure S1b, Additional file 2: Figure S2b), probably due to a superimposition of estrogenic and toxic effects (cf., Figs. 4b, d, f, 5b). Snails and gammarids in the active biomonitoring campaigns are primarily exposed via the water phase, while they are more affected in laboratory experiments by substances in the sediment. As already mentioned, the biomonitoring campaigns represent temporally integrated exposures to substances in the water phase, whereas laboratory experiments were performed with grab samples of water and sediments and, therefore, only represent snapshots of the chemical contamination at a certain time. Estrogenic substances in water samples may have been degraded during the 28-day exposure in laboratory experiments as stated in Schneider et al. [53], who showed a halving of estrogenic activity within 24 h, whereas the contamination of sediments remains comparably stable. Therefore, an increased reproduction as a consequence of estrogen exposure was not observed in laboratory experiments with snails and gammarids but reduced reproduction, probably due to the baseline toxicity in sediments and the dioxin-like activities in water and sediment samples known to be highly resistant to chemical and biological degradation [99]. Correlation analyses also revealed a decreasing number of embryos in P. antipodarum with increasing dioxin-like activity in sediment samples (Table 2, Additional file 5: Figure S4c).
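The degradation argument can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that the halving within 24 h reported by Schneider et al. [53] continues as simple first-order decay over the whole 28-day exposure and that nothing is replenished. Neither assumption is stated in the text, and the function name is hypothetical.

```python
def remaining_fraction(elapsed_hours, half_life_hours=24.0):
    """Fraction of the initial activity left after first-order decay."""
    return 0.5 ** (elapsed_hours / half_life_hours)


# 28-day laboratory exposure expressed in hours.
exposure_hours = 28 * 24

print("after 24 h:", remaining_fraction(24))
print("after 7 days:", remaining_fraction(7 * 24))
print("after 28 days:", remaining_fraction(exposure_hours))
```

Under these assumptions, less than 1% of the initial estrogenic activity would remain after one week, which is consistent with the absence of an estrogen-related increase in reproduction in the laboratory experiments.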

Comparison of the rivers Horloff and Nidda

The reproduction of mudsnails at the river Nidda was much higher than at the river Horloff, which can partially be explained by the higher water temperature in May 2017 in the Nidda (N1: 18.1 ± 2.07 °C) compared to the Horloff in March 2017 (H1: 9.82 ± 1.97 °C), since reproduction of P. antipodarum is temperature-dependent [100] and snails in the river Nidda were exposed almost to their optimum temperature [101]. Besides the substantially higher water temperature, the significantly higher estrogenic activity in combination with a higher bioavailability due to lower organic carbon contents and higher mean particle sizes (Table 1) [96,97,98] in the Nidda is a likely cause for the higher embryo numbers in P. antipodarum compared to the river Horloff [42, 47, 80, 95]. The higher reproduction at the reference sites of the Nidda compared to the Horloff is likely due to higher water temperature and higher estrogen exposure as measured by the YES. But since the reproduction of snails and the estrogenic activity at the river Nidda highly correlate over the course of all sampling sites (Additional file 5: Figure S4a) and the temperatures are more or less constant, the estrogenic compounds are the likely explanation for the significantly increased embryo numbers within the restoration sites at the river Nidda. In contrast to the snails, gammarids exhibited higher fecundity indices at the river Horloff than at the river Nidda, which is probably due to the lower average water temperature at the Horloff in March compared to the river Nidda in May, considering that the optimum temperature for G. fossarum is 12.1 °C [102, 103]. In addition, a substantially higher contamination with dioxin-like substances could be detected in the Nidda than in the Horloff, which could have also contributed to the lower reproduction of the gammarids in the Nidda (Additional file 5: Figure S4h).

The in vitro assays indicated a higher level of activity and, therefore, contamination in the river Nidda, although the analyzed section was restored with a considerably higher intervention depth and is primarily surrounded by grassland and pastures, while the river Horloff is mainly surrounded by intensive agricultural land. The most probable causes for the increased in vitro activities in river water and sediments are wastewater discharges [104,105,106,107], intensive agricultural use [108,109,110,111] and surface runoff from motorways [74, 83, 84]. Since riverine sediments represent a major sink for some contaminants [96, 97, 112,113,114,115,116,117,118], the lower level of activity at the river Horloff could likely be due to the higher organic content and the smaller mean grain size of sediments so that contaminants are less bioavailable (except for the microtox assay with sediment samples for which a total extraction was performed) [68, 73, 74, 97]. This is also supported by the significant negative correlation between loss on ignition and dioxin-like activity in water samples (Additional file 5: Figure S4f).

This is the first study to report an increase in toxicity as well as estrogenic and dioxin-like activity in restored river sections, a finding that may explain the lack of success of many other in-stream restoration projects [13, 29, 31,32,33]. A possible explanation for the significantly worse results within the restored sections is the transport and deposition of polluted fine sediments within the restoration measures [18, 119,120,121]. Since the restored sections are characterized by a higher flow diversity, it is conceivable that polluted fine particulate matter, e.g., introduced by soil erosion from surrounding fields [122,123,124,125] or by WWTPs [107, 125], may settle in the flow-calmed zones [119, 126,127,128]. The reduced flow velocity might result in an increased exchange between sediment and water, so that formerly sediment-associated substances are remobilized and affect the local invertebrate fauna [96, 97, 122, 129]. Furthermore, remobilization of sediment-associated contaminants is increased by bioturbating activities of sediment-dwelling invertebrates [130,131,132,133] and dredging activities during the restoration process [18, 123, 134, 135]. This underlines that river sediments are both important sinks and sources of contaminants [85, 115, 116, 136].

Hence, the present study shows that the success of restoration measures is endangered by the prevailing chemical contamination, which was assessed via in vivo whole organism and effect-based in vitro bioassays. Thus, restoration measures on their own will not lead to the desired good ecological status according to EU-WFD unless chemical contamination of water and sediments is reduced in parallel. Our approach provides clear advantages compared to the assessment of the chemical status according to EU-WFD, which focuses only on the concentrations of 45 priority substances and completely neglects effects of metabolites, transformation products, non-regulated substitutes of priority substances and mixture effects of substances [137,138,139,140,141,142,143], whereas these can be assessed by effect-based in vitro bioassays. We therefore recommend implementing effect-based methods in the EU-WFD [137,138,139, 144] and improving water and sediment quality in conjunction with hydromorphological restoration measures to achieve the objectives of the EU-WFD.

Conclusion

The in vivo and in vitro assessments yielded the worst results for the restored sections. In addition, the measured in vitro activities and in vivo effects for the river Nidda were even worse than for the river Horloff, although the Nidda was restored with a considerably higher intervention depth. Accordingly, restoration measures do not seem to have any compensating positive influence on the organisms that are affected by chemical contamination, irrespective of the intervention depth of the restoration. Furthermore, the results revealed that the prevailing chemical contamination negatively affected snails and gammarids in the active biomonitoring campaigns and will consequently also affect the local invertebrate community, thus endangering the restoration success.