Background

River ecosystems are of great ecological and social importance: they provide, for example, habitats for animals and plants, recreational areas for humans, and resources for wastewater disposal, drinking water production and shipping. Nevertheless, they are among the most anthropogenically modified ecosystems of all [1, 2]. Therefore, the European Parliament and the Council of the European Union adopted the European Water Framework Directive (2000/60/EC, EU-WFD) [3] in 2000, which stipulated that all European surface waters should achieve at least a good ecological status by 2015. However, in 2015, only 8.2% of the German surface waters and 6.7% of German rivers had achieved a high or good ecological status [4]. As the objectives of the EU-WFD were missed by 2015, the deadline for implementing the objectives has been extended until 2027 [4].

The ecological status of a river according to the EU-WFD is mainly defined by the organisms living in the water, since the composition of the aquatic biocenosis of the respective water body reflects the totality of all influencing factors and disturbances [4, 5]. To achieve a good ecological status, the biological quality components aquatic flora, benthic invertebrate fauna, and fish fauna must, therefore, be evaluated at least as “good” [3], but are also supported by specific pollutants, hydromorphological, chemical, and physico-chemical components. The worst-rated component then determines the ecological status of the water body (“one-out, all-out-principle”) [3, 4]. Thereby, the EU-WFD defines five ecological status classes, which indicate the degree of anthropogenically induced disturbance of the respective ecosystem [3, 4]. Duft and Oehlmann [6] adapted this concept with five status classes to assess the effects of chemical contamination in in vivo tests with river sediments (Table 1).
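The "one-out, all-out" rule described above can be illustrated with a minimal Python sketch. The function name, the component names and the example values are hypothetical and not taken from the study; the sketch only assumes that status classes are coded 1 (high) to 5 (bad), so that the worst-rated component is the numerically highest class.

```python
# Status classes follow the EU-WFD convention: 1 = high ... 5 = bad.
STATUS_LABELS = {1: "high", 2: "good", 3: "moderate", 4: "poor", 5: "bad"}


def overall_status(component_classes):
    """Return the worst (numerically highest) class among all components.

    component_classes maps a quality component name to its status class.
    """
    if not component_classes:
        raise ValueError("at least one quality component is required")
    for name, status in component_classes.items():
        if status not in STATUS_LABELS:
            raise ValueError(f"invalid status class for {name}: {status}")
    return max(component_classes.values())


# Hypothetical example values, not data from this study.
example = {"aquatic flora": 2, "benthic invertebrate fauna": 3, "fish fauna": 2}
worst = overall_status(example)
print(worst, STATUS_LABELS[worst])
```

In this illustrative case a single component rated "moderate" is enough to place the whole water body in class 3, even though the other two components are rated "good".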

Table 1 Definition of ecotoxicological status classes according to Duft and Oehlmann [6]

The ecotoxicological status class 1 reflects the background level with the natural variability of effects at reference sites [4]. Status class 2 represents the protection target for river water and river sediment. From class 3 on, which is characterized by significant changes compared to the reference state, action is required. This also applies to ecotoxicological status classes 4 and 5, for which population damage is to be expected in the long term [6].

The causes of non-compliance with the objectives of the EU-WFD are manifold. Anthropogenically influenced surface waters are often characterized by structural deficits, inter alia due to flood protection measures (e.g., channelization of watercourses, river regulation), and by inadequate water and sediment quality due to chemical contamination (e.g., by wastewater treatment plants, industrial and agricultural use). Both permanently alter the species assemblage of rivers or prevent recolonization and thus result in an insufficient ecological status [7,8,9,10]. The EU-WFD requires restoration measures for rivers that do not achieve a good ecological status [11]. In this context, hydromorphological restoration measures have often been conducted to improve habitat and species diversity and, therefore, the ecological status of water bodies [7, 8, 12, 13]. However, such measures were rarely successful. Sundermann et al. [7] have shown that hydromorphological restorations resulted in a good ecological condition according to the EU-WFD in only three out of 25 analyzed river restoration projects. Further studies have confirmed that habitat improvement is not necessarily accompanied by biota enhancement and thus by an improvement of the ecological status of rivers [7, 8, 13, 14]. The authors attributed this inter alia to poor water quality caused by chemical contamination from agriculture or wastewater treatment plant effluents (e.g., micropollutants and nutrients such as phosphorus and nitrogen) [5, 7, 15, 16]. To prioritize measures, water management practice needs decision criteria that indicate whether chemicals in the water or sediment and their resulting effects, or other factors such as deficits in the structure of the water bodies, are the main cause for the failure to achieve a good ecological status according to the EU-WFD at a given water body.

As a consequence, the aim of the present study was (1) to provide evidence and guidance to assess whether chemical contamination is a relevant stress factor for aquatic biodiversity that contributes to the failure to meet the objectives of the EU-WFD [3] and (2) to develop an ecotoxicological, Water Framework Directive–compliant assessment system as a decision criterion for water management practice. To measure and assess the current ecotoxicological conditions of the rivers in the catchment of the Nidda, active biomonitoring campaigns and laboratory experiments with gammarids and snails, as well as in vitro assays with water and sediment samples, were conducted quarterly over the course of 1 year. The active biomonitoring campaigns represent temporally integrated exposures, and gammarids and snails are known to respond to a vast number of substances, as well as different types of toxicity [17,18,19,20]. Therefore, active biomonitoring campaigns provide in vivo effects that best reflect the real conditions of rivers. The laboratory experiments with combined water/sediment samples were used as a plausibility measure to exclude environmental stress factors such as stream velocity or water temperature as causes of the observed in vivo effects. In addition, the in vitro assays with water and sediment samples were performed to track local pollution sources and to gain effect-based information which could possibly explain the observed in vivo effects [21,22,23]. Based on the results of these in vivo and in vitro tests, an ecotoxicological assessment system in compliance with the EU-WFD was subsequently derived. The comparison of these derived ecotoxicological status classes with the ecological status classes allows the identification of probable causes for the failure to meet the objectives of the EU-WFD and the derivation of recommendations for action for the implementation of priority measures in water management practice.

Methods

Sampling sites

The Nidda with its tributaries represents a typical catchment of Central Europe in many respects. Here, numerous conflicts of use, resulting in multiple causes for deficits according to the EU-WFD, as well as other challenges for a sustainable management of surface waters can be examined, and system solutions can be developed as a model for other catchments. The Nidda is one of the most important surface waters in Hessen (Germany) and its catchment, with an area of almost 2000 km², is characterized by intensive agricultural and industrial use [24]. Due to river engineering for flood protection, wide sections of the Nidda resemble artificial watercourses with deficient water body structures. The water quality of the Nidda and its tributaries is influenced by municipal and industrial wastewater treatment plants (WWTP), and the percentage of clearwater (i.e., treated wastewater) reaches up to 50% in summertime [25]. Therefore, the Nidda and two of its most affected tributaries, Usa and Horloff, are investigated in the present study and compared to reference sites, which correspond to the original type of the rivers as suggested by Chovanec et al. [26] and Reyjol et al. [27]. The relevant reference and sampling sites are summarized in Table 2.

Table 2 Overview of the reference and sampling sites at the rivers Nidda, Usa, and Horloff

Test organisms

Gammarus fossarum was chosen as test species, even though it is not yet a standard organism in ecotoxicological testing, because of its wide distribution, high abundance and sensitivity towards chemicals such as micropollutants from wastewaters and pesticides [16, 28,29,30], and because it has already been used for in situ testing of environmental conditions [20, 28, 31, 32]. As G. fossarum feeds on detritus, leaves and other organic materials, it provides an important ecosystem service as shredder and serves as food for fish. Therefore, it plays a key role in lotic food webs, provides an important link between primary and secondary production and is a representative of stream invertebrates [33]. Prior to the active biomonitoring campaigns and the laboratory experiments, gammarids were collected via kick-sampling with a Surber sampler from a pure population of G. fossarum in an uncontaminated area near the source of the river Nidder (N 50°29′7″, E 9°14′52″, Sichenhausen, Hessen, Germany).

Potamopyrgus antipodarum was selected due to its sensitivity towards reproductive toxicants, including endocrine disrupting chemicals (EDCs) [34,35,36,37]. It is a standard organism in toxicity testing of chemicals according to OECD guideline 242 [38] and has already been used successfully for the assessment of environmental conditions [18, 37, 39, 40]. The mudsnail originates from New Zealand and was introduced to Europe as an invasive species via ballast water from ships in the 19th century [34]. Because of its wide distribution and abundance in European rivers, P. antipodarum is regarded as a representative of the benthic macroinvertebrate community [28]. Mudsnails used for biomonitoring and laboratory experiments originated from the in-house breeding stock of the Department Aquatic Ecotoxicology at Goethe University.

Active biomonitoring campaigns (field)

For the active biomonitoring campaigns, cylindrical stainless steel enclosures (12.5 cm × 6 cm, Fig. 1a) were equipped with polyester net bags (Netzfabrikation Renate Heberle, 15 cm × 9 cm, mesh size 1.0 mm), a piece of wire gauze (polytetrafluoroethylene, 8.2 cm × 3.3 cm, to enlarge the surface and thus avoid cannibalism between gammarids) and conditioned black alder leaves (Alnus glutinosa) ad libitum. The leaves were collected prior to the biomonitoring campaigns in spring 2015 and conditioned in unpolluted spring water for 14 days so that a biofilm could emerge that decomposes the cellulose and makes the leaf material accessible for the gammarids. Subsequently, eight G. fossarum, with a minimum size of 6.0 mm to guarantee maturity, were inserted in each enclosure and the enclosures were closed with hose clamps (50–70 mm, W4, Fig. 1b). In addition, eight P. antipodarum with a size of 3.5 to 4.5 mm were introduced into stainless steel tea–eggs (4.5 cm × 3.5 cm, mesh size 0.7 mm, Fig. 1c) containing pieces of organically grown carrots ad libitum. Two cages per sampling site (45 cm × 7.5 cm × 13 cm, perforated plate covering on one side to avoid clogging with leaf material, Fig. 1d), each containing three enclosures and three tea–eggs, were exposed in the water phase by attaching them with stainless steel rods (1.5 m length × 1.5 cm diameter) onto the riverbed (Fig. 1e), resulting in six replicates per species and sampling site. Furthermore, a data logger (HOBO Pendant®, Onset Computer Corporation, Bourne, USA) was exposed in parallel at each sampling site to measure the water temperature every 30 min. In addition, associated water parameters, such as pH, electric conductivity, oxygen concentration and oxygen saturation, were measured at the beginning of each biomonitoring campaign with a portable multimeter (HQ40d, Hach, Germany).
Concentrations of ammonium, nitrite, nitrate, ortho-phosphate, sulfate, chloride and dissolved organic carbon (DOC) were determined via Spectroquant test kits (Merck, Darmstadt, Germany) and total hardness as well as carbonate hardness were determined with MColortest kits (Merck, Darmstadt, Germany). Flow velocity was measured with a flow meter (ALMEMO® 2290-4 V5, Ahlborn Mess- und Regelungstechnik GmbH, Holzkirchen, Germany; sensor C-61702, Schiltknecht Messtechnik AG, Gossau, Switzerland), and mean grain size [41] and loss on ignition [42] were determined once for every sampling site. After an exposure time of 28 days the cages were transferred into the laboratory and gammarids were checked for mortality and fixed separately in Eppendorf tubes containing 70% ethanol until further investigation. Snails were checked for mortality (empty shells or no reaction to stimuli), subsequently shock frozen in liquid nitrogen per replicate and stored at − 80 °C until further analysis. Snails were examined under a microscope (Motic SMZ-168, Motic Electric Group Co., Xiamen, P.R. China) and shell length, mortality (snails without intact soft bodies were determined as dead) and number of embryos in the brood pouch were assessed according to OECD guideline 242 [38]. Gammarids were investigated under a stereomicroscope (Olympus SZ61 R, Olympus Corporation, Tokyo, Japan) with a digital camera (JVC Digital Camera KY-F75U, Victor Company of Japan Ltd., Yokohama, Japan) and the software Diskus (Version 4.50.1458, Carl H. Hilgers, Königswinter, Germany) for their size, sex, number of brooding females, brood size, and fecundity index. Active biomonitoring campaigns were repeated quarterly at each river over a period of 1 year (July 2015 until the end of August 2016).

Fig. 1

Setup of the active biomonitoring campaigns with Potamopyrgus antipodarum and Gammarus fossarum. a Composition of enclosure for gammarids, b stainless steel enclosure for the exposure of G. fossarum, c stainless steel tea–egg for the exposure of P. antipodarum, d cage for the attachment onto the river bed, e exposed cages [Photo: Moritz Blumer]

In vivo screens with combined water/sediment samples (lab)

Static laboratory experiments under standardized conditions (e.g., temperature, 16:8 h light:dark cycle) with P. antipodarum, G. fossarum and Lumbriculus variegatus were performed over 28 days in parallel to every active biomonitoring campaign using combined water/sediment samples from each sampling site. River water was collected in buckets and aerated until the laboratory experiments started. River sediments were collected in polyethylene vessels from the riverbed top layer and frozen for 24 h to eliminate possible predators and individuals of the species that were also used for the experiments (i.e., P. antipodarum, G. fossarum, L. variegatus).

The laboratory experiments with G. fossarum were performed at 10 ± 1 °C in 250 mL glass beakers containing 200 mL river water and 50 g of the corresponding sediment from each sampling site. The negative control consisted of 200 mL ISO test water [43] and 50 g artificial sediment [95% (dw) quartz sand and 5% (dw) powdered leaves of beech (Fagus sylvatica)] as defined in Duft et al. [34]. Every combined water/sediment sample was tested in four replicates and the negative control in six replicates, each containing ten G. fossarum and a piece of wire gauze (polytetrafluoroethylene, 8.2 cm × 3.3 cm). The gammarids were supplied ad libitum with conditioned black alder leaves (Alnus glutinosa). After 28 days of exposure, gammarids were fixed separately in Eppendorf tubes with 70% ethanol and examined regarding the same endpoints as described for the active biomonitoring campaigns.

The laboratory experiments with P. antipodarum were conducted at 16 ± 1.5 °C following Duft et al. [34] in 500 mL glass beakers containing 400 mL river water and 40 g sediment of the corresponding sampling site. The negative control consisted of 400 mL test medium according to OECD guideline 242 [38] and 40 g artificial sediment (95% (dw) quartz sand and 5% (dw) powdered leaves of beech (Fagus sylvatica)) as defined in Duft et al. [34]. Every combined water/sediment sample, as well as the negative control was tested in duplicate with 20 individuals of P. antipodarum each. Snails were fed three times a week with 70 µg finely ground TetraPhyll® (Tetra GmbH, Melle, Germany) per snail and day in accordance with OECD guideline 242 [38]. Snails were shock frozen in liquid nitrogen and examined regarding the same endpoints as described for the active biomonitoring campaigns.

In addition to the experiments with snails and gammarids, tests with Lumbriculus variegatus, an endobenthic species, were performed according to OECD guideline 225 [44]. Fourteen days before the start of exposure, the worms were artificially fragmented (synchronization) to ensure a similar physiological state. The laboratory experiments with L. variegatus were conducted at 20 ± 1 °C in 250 mL glass beakers containing 200 mL river water and 50 g sediment of the corresponding sampling site. The negative control consisted of 200 mL ISO test water and 50 g artificial sediment [75% (dw) quartz sand, 20% (dw) kaolin clay, 4.5% (dw) sphagnum peat, 0.5% (dw) mixture of powdered Urtica sp. leaves and alpha-cellulose (1:1)] in accordance with OECD guideline 225 [44]. Every combined water/sediment sample was tested in four replicates and the negative control in six replicates, with ten individuals of L. variegatus each. L. variegatus were fed three times a week with 0.40–0.60 mg finely ground TetraMin® (Tetra GmbH, Melle, Germany) per individual and day. After 28 days of exposure, worms were shock frozen in liquid nitrogen and examined regarding mortality, reproductive performance (i.e., number of worms) and dry weight per replicate.

All test vessels were aerated via glass pipettes throughout the experiments and evaporated water was replenished daily with deionized water. Associated water parameters such as pH, electric conductivity, oxygen content and oxygen saturation were measured at least once a week with a portable multimeter (HQ40d, Hach, Germany).

In vitro screens with water samples

In parallel to every biomonitoring campaign, aqueous grab samples from each sampling site were collected in amber glass vessels with a volume of 1 L. The glass vessels had previously been cleaned with acetone, ethanol and deionized water and heated to 200 °C in a drying oven (VWR VENTI-line® 115, VWR International GmbH, Darmstadt, Germany) to avoid contamination. Water samples were stored at 4 °C until further analysis.

Unfiltered native water samples were analyzed within 48 h in the yeast anti-estrogen screen (YAES) and the yeast anti-androgen screen (YAAS) according to Giebner et al. [35] to avoid a possible loss of bioactive compounds during enrichment. The antagonist screens require background concentrations of the agonistic reference substances (0.3 nmol/L 17β-estradiol, 10 nmol/L testosterone).

For measurements in the yeast estrogen screen (YES), the yeast androgen screen (YAS), the yeast dioxin screen (YDS) and the microtox assay, native water samples were solid-phase extracted (SPE) according to Giebner et al. [35]. After filtering the water samples within 24 h following collection through glass microfiber filters (VWR International GmbH, No. 692, European Cat. No. 516-0885, 90 mm, particle retention: 1.0 µm, Darmstadt, Germany), 1000 mL of the water sample from each sampling site were passed through conditioned Oasis HLB cartridges (200 mg, Waters, Milford, MA, USA). The cartridges were dried under a gentle stream of nitrogen and afterwards eluted according to Giebner et al. [35]. This resulted in a 0.1 mL DMSO extract which was 10,000-fold enriched. Extracts were analyzed in the YES and YAS as described in Giebner et al. [35], as well as in the YDS according to Stalter et al. [45]. The measured activities were expressed as equivalent concentrations for 17β-estradiol (YES), testosterone (YAS), 4-hydroxytamoxifen (YAES), flutamide (YAAS) and β-naphthoflavone (YDS) and were corrected for dilution and enrichment so that equivalent concentrations relate back to the native water sample.
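The enrichment and back-correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the assay dilution factor and the measured concentration in the example are invented placeholder values, since the actual assay dilutions follow Giebner et al. [35] and are not given here. Only the sample volume (1000 mL) and the extract volume (0.1 mL) are taken from the text.

```python
def enrichment_factor(sample_volume_ml, extract_volume_ml):
    """Enrichment achieved by concentrating a water sample into an extract."""
    if sample_volume_ml <= 0 or extract_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return sample_volume_ml / extract_volume_ml


def native_equivalent(assay_concentration, assay_dilution, enrichment):
    """Relate an equivalent concentration measured in the assay back to
    the native water sample.

    assay_dilution is the factor by which the extract was diluted in the
    assay (assumed value in the example below); enrichment is the factor
    from enrichment_factor().
    """
    if assay_dilution <= 0 or enrichment <= 0:
        raise ValueError("dilution and enrichment must be positive")
    return assay_concentration * assay_dilution / enrichment


# Volumes from the text: 1000 mL water sample, 0.1 mL DMSO extract.
ef = enrichment_factor(1000.0, 0.1)

# Hypothetical values: 5.0 ng/L measured in the assay, extract diluted 480-fold.
print(ef, native_equivalent(5.0, 480.0, ef))
```

With the volumes from the text the sketch reproduces the stated 10,000-fold enrichment; the back-calculated concentration depends entirely on the assumed assay dilution.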

The water extracts were also analyzed for baseline toxicity in the microtox assay with Aliivibrio fischeri following the procedure of Völker et al. [46] and Harth et al. [28]. The inhibition of luminescence is expressed as 50% effect concentration (EC50) referring to the relative enrichment factor (REF) of the water sample. For non-toxic water samples an EC50 threshold value of 750 (REF) was defined as described in Harth et al. [28].

Subsequently, annual average activities were calculated for estrogenic, anti-estrogenic, androgenic, anti-androgenic and dioxin-like activities, as well as for the EC50 for baseline toxicity per sampling site. The annual average activity at each sampling site is based on four independent water samples collected in four consecutive quarters. If no activity was detected in a sample, half the LOQ was considered for the respective sampling date to calculate the annual average activity. The LOQ for the YES with water samples ranged between 0.073 and 0.439 ng EEQ/L, for the YAS between 4.63 and 15.3 ng T-EQ/L, for the YAES between 0.400 and 4.98 mg OHT-EQ/L, for the YAAS between 344 and 1190 µg Flu-EQ/L and for the YDS between 0.321 and 0.355 µg β-NF-EQ/L.
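The substitution rule for non-detects can be sketched as follows. The function name and the example values are hypothetical; the sketch assumes that a non-detect is recorded as None and that each sampling date carries its own LOQ, which is consistent with the LOQ ranges reported above but is not stated explicitly in the text.

```python
from statistics import mean


def annual_average(measurements, loqs):
    """Average quarterly activities, replacing non-detects by half the LOQ.

    measurements: one value per sampling date, None if no activity was detected.
    loqs: the LOQ that applied on the respective sampling date.
    """
    if not measurements or len(measurements) != len(loqs):
        raise ValueError("need one LOQ per measurement")
    substituted = [
        loq / 2 if value is None else value
        for value, loq in zip(measurements, loqs)
    ]
    return mean(substituted)


# Hypothetical quarterly values (two non-detects), not data from this study.
print(annual_average([1.0, None, 3.0, None], [0.4, 0.4, 0.4, 0.4]))
```

One consequence of this rule is that a site with a single detect out of four samples can still end up with an annual average below the LOQ, because the three substituted values pull the mean down.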

In vitro screens with sediment samples

In parallel to every biomonitoring campaign samples from the sediment top layer were collected at each sampling site in new polyethylene vessels with a volume of 1 L. Sediment samples were freeze-dried (Martin Christ Gefriertrocknungsanlagen GmbH, Alpha 1-4 LSC plus, Osterode, Germany) and 20 g of each sediment sample was extracted with 400 mL acetone at 56 °C in a Soxhlet extractor (Electrothermal EME30500/CEB, Cole-Parmer Ltd., Staffordshire, UK; VWR RC-10 Digital Chiller, VWR International GmbH, Darmstadt, Germany) for 24 h. Subsequently, the sediment extracts were reduced with a rotary evaporator at 56 °C (Heidolph Laborota 4000-efficient, vacubrand CVC 2000, Heidolph Instruments GmbH & Co. KG, Schwabach, Germany; VWR RC-10 Digital Chiller, VWR International GmbH, Darmstadt, Germany) and in a second step under a gentle stream of nitrogen and finally transferred to 0.5 mL DMSO. These sediment extracts were analyzed for baseline toxicity in the microtox assay following Völker et al. [46] and Harth et al. [28]. The inhibition of luminescence is expressed as EC50 referring to mg sediment equivalents of the sediment extract. In contrast to the microtox assay with water samples an EC50 threshold value of 30 mg sediment equivalents was defined for non-toxic sediment samples.

Furthermore, 50 g of the freeze-dried sediments were shaken with 100 mL ultra-pure water for 10 min at 210 rpm (GFL 3017, GFL Gesellschaft für Labortechnik mbH, Burgwedel, Germany) resulting in a 1:2 dilution. In a further step, the samples were eluted by sonication for 10 min (Sonorex RK 52 H, Bandelin electronic, Berlin, Germany). The aqueous eluates were centrifuged for 5 min at 4400 rpm (Centrifuge 5702, Eppendorf AG, Hamburg, Germany) and the supernatant was analyzed within 48 h in the YES, YAS, YAES and YAAS as described in Giebner et al. [35], as well as in the YDS following Stalter et al. [45]. Measured activities were expressed as equivalent concentrations for 17β-estradiol (YES), testosterone (YAS), 4-hydroxytamoxifen (YAES), flutamide (YAAS) and β-naphthoflavone (YDS) and corrected regarding dilution.

The LOQ for the YES with sediment samples amounted to 31.5 ng EEQ/kg, for the YAS it was 528 ng T-EQ/kg, for the YAES 11.7 mg OHT-EQ/kg and for the YAAS 3.02 mg Flu-EQ/kg. The YDS assay with sediment samples was not conducted for the rivers Nidda, Usa and Horloff, as it was added later to the standard test program. Nevertheless, the YDS assay was conducted with sediment samples from smaller rivers of the Nidda catchment and is included in the further derivation of the WFD–compliant assessment system.

Data analysis

Statistical analyses were performed with the software Microsoft® Excel 2016 (Microsoft Corporation, Redmond, USA) and GraphPad Prism®, v.5.04 (GraphPad Software Inc., San Diego, CA, USA). Quantal data, such as differences in mortality to the corresponding reference site, were analyzed via Fisher’s exact tests. Continuous data, for example differences in the number of embryos, fecundity index, number of worms, dry weight and in vitro data compared to the corresponding reference site, were examined via unpaired t tests or a one-way ANOVA with Bonferroni’s post hoc tests. If continuous data were not normally distributed or in case of variance inhomogeneity, a Mann–Whitney test or a Kruskal–Wallis test with Dunn’s post hoc test was applied. The level of significance was defined as α = 0.05 and is represented in the graphs as asterisks (*p < 0.05, **p < 0.01, ***p < 0.001).
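For readers who want to retrace the comparison of quantal data, a two-sided Fisher's exact test on a 2 × 2 table (dead/alive at a sampling site versus the reference site) can be sketched in plain Python. This is an illustrative implementation, not the routine used by GraphPad Prism; the function name and the example counts are hypothetical. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, sampling site and reference site, and
    columns dead and alive. Returns the p value.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def weight(x):
        # Number of ways to obtain x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Integer arithmetic keeps the "no more likely than observed" check exact.
    extreme = sum(
        w for w in map(weight, range(lowest, highest + 1)) if w <= observed
    )
    return extreme / total_tables


# Hypothetical counts for 48 exposed animals per site (6 replicates x 8 animals):
# 20 dead / 28 alive at a sampling site versus 8 dead / 40 alive at the reference.
p_value = fisher_exact_two_sided(20, 28, 8, 40)
print(p_value < 0.05)
```

Pooling animals across replicates, as in the example, is an assumption of this sketch; the text does not specify how replicates entered the test.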

Results

In the following, only some representative results of the largest rivers of the Nidda catchment (Nidda, Usa and Horloff) are presented as examples, but the results of smaller rivers and creeks such as the Rambach, Laisbach, and Bleichenbach are also included in the further establishment of the ecotoxicological, Water Framework Directive–compliant assessment system. Figure 2 illustrates the percentage mortality and the fecundity index of G. fossarum and Fig. 3 represents the percentage mortality and the reproduction of P. antipodarum, recorded by the number of embryos in the brood pouch, in the active biomonitoring campaigns at the rivers Nidda (June 2016), Usa (January 2016) and Horloff (April 2016), as well as in the laboratory experiments with combined water/sediment samples from the corresponding sampling sites.

Fig. 2

Gammarus fossarum. Mean and standard error of percentage mortality (a, c, e) and mean and standard deviation of fecundity index (b, d, f) in active biomonitoring campaigns at the rivers Nidda, Usa and Horloff (Field) and in laboratory experiments with combined water/sediment samples (Lab) of the corresponding sampling sites. Significant differences in mortality compared to the corresponding reference site N1, U1 or H1 (shaded) were determined using Fisher’s exact test. Significant differences in fecundity index in comparison with the corresponding reference site were determined via unpaired t test. *p < 0.05, ***p < 0.001, nField = 6, nLab = 4

Fig. 3

Potamopyrgus antipodarum. Mean and standard error of percentage mortality (a, c, e) and mean and standard deviation of embryo numbers (b, d, f) in active biomonitoring campaigns at the rivers Nidda, Usa and Horloff (Field) and in laboratory experiments with combined water/sediment samples (Lab) of the corresponding sampling sites. Significant differences in mortality compared to the corresponding reference site N1, U1 or H1 (shaded) were determined using Fisher’s exact test. Significant differences in embryo numbers in the field in comparison with the corresponding reference site were determined via unpaired t test. Significant differences in reproductive output in the lab compared to the corresponding reference site were determined using Mann–Whitney test. *p < 0.05, **p < 0.01, ***p < 0.001, nField = 6, nLab = 40

At the Nidda, the mortality of G. fossarum in the laboratory experiment shows a similar trend as in the biomonitoring campaign, only in an attenuated form (Fig. 2a). This could be due to a reduced stressor profile in the laboratory (standardized temperature, oxygen saturation, no hydraulic pressure etc.) but also to the static test design (i.e., degradation of possible substances during the laboratory tests). In contrast to the comparatively high mortality of gammarids, the mortality of P. antipodarum in the active biomonitoring campaign and the laboratory experiment at the river Nidda is quite low and reaches a maximum of approximately 10% at the reference site (Fig. 3a). The fecundity index of gammarids, which describes the number of eggs depending on the size of the respective gammarid, also shows a similar pattern under field and lab conditions, except for sampling site N4 in the active biomonitoring campaign (Fig. 2b); here, the fecundity index is significantly reduced in comparison to the reference site (p < 0.05). This is probably a result of the rising water toxicity at N4 (Table 3), which could be due to the discharges of two WWTPs (30,000 and 12,000 PE) upstream of sampling site N4 and chemicals from agricultural use (e.g., spray-drift, surface run-off). The same effect was not detected in the laboratory experiment, since only grab samples of the river water and the sediment were used and no water exchange took place during the entire exposure period, so that concentrations of chemical substances could have diminished through adsorption as well as chemical and biological degradation. It is also conceivable that the elimination of stressors led to the absence of significant effects in the laboratory experiment compared to the active biomonitoring campaign, or that chemicals appeared and affected organisms in the active biomonitoring campaign after the grab samples for the laboratory experiment had been taken.
In contrast to the reproduction of gammarids, the embryo numbers of the mudsnails increase significantly downstream in comparison to the reference site in the active biomonitoring campaign at the Nidda as well as in the laboratory experiment (except for site N2, p < 0.01–p < 0.001, Fig. 3b), which could possibly be explained by rising estrogenic activities in the water phase (Table 3). The embryo number decreases significantly at N2 in laboratory experiments compared to the reference site (p < 0.05), which is possibly due to the high sediment toxicity (Fig. 4), since snails in laboratory experiments are directly exposed to the sediments, whereas snails in active biomonitoring campaigns are primarily exposed via the water phase.

Table 3 Annual average activities and standard deviations of estrogenic, anti-estrogenic and dioxin-like activities and EC50 in the microtox assay referring to the REF for baseline toxicity of water samples

Fig. 4

Microtox assay. Mean and standard error of the mean of EC50 values (in mg sediment equivalents) for baseline toxicity of sediment samples from the rivers Nidda, Usa and Horloff. Reference sites N1, U1 and H1 are shaded. n = 6 with 8 pseudo-replicates each

At the river Usa, the mortality of gammarids in the field decreases significantly at sampling site U3, which is located within a restored area, compared to that at the reference site U1 (p < 0.05, Fig. 2c). The mortality of P. antipodarum is likewise significantly reduced at sampling sites U2 and U4 (p < 0.05, Fig. 3c) in the active biomonitoring campaign compared to the reference site, while in the laboratory experiment no mortality of mudsnails occurred. The fecundity index of gammarids in the active biomonitoring campaign initially rises at sampling site U2 and then drops rapidly at sampling site U4 (Fig. 2d). This could be due to the discharges upstream of U4 and chemicals from intensive agriculture and garden plots. Here, no significant effects in comparison to the reference site U1 could be recorded, because the sample size of brooding females, and therefore the statistical power, was too small. The embryo numbers of P. antipodarum decrease significantly at sampling sites U2 and U4 in the active biomonitoring campaign in comparison to those at the reference site U1 (p < 0.05, Fig. 3d). This is most likely due to the low average water temperatures of 5 °C (Additional file 1: Table 6S1), since reproduction of P. antipodarum is known to be temperature-dependent [47]. Possible effects of chemicals (e.g., estrogenic activities) might be masked by the low water temperatures, since the snails are not in their reproductive period. In contrast, snails produced significantly more embryos under controlled water temperature in the laboratory experiment with combined water/sediment samples from U2 to U4 compared to the reference site (p < 0.05–p < 0.001, Fig. 3d). This could be a result of the discharges of the WWTPs as well as chemicals from intensively used agricultural land and garden plots around sampling sites U2 to U4.

At the river Horloff, the mortality of gammarids in the active biomonitoring campaign slightly increases downstream compared to the reference site and reaches a maximum of 70% at sampling site H4 (p < 0.01, Fig. 2e). Laboratory experiments have also shown the highest mortality of gammarids at sampling site H4 (Fig. 2e). On the contrary, no significantly increased mortality in comparison to the reference site H1 occurred in the active biomonitoring campaign with P. antipodarum (Fig. 3e). Nevertheless, significantly higher mortalities of mudsnails compared to the reference site were found in the laboratory experiment at sampling sites H2 and H3 (p < 0.001, Fig. 3e), which are surrounded by intensive agriculture and lie downstream of four small-sized WWTPs (< 1000 PE) and a large WWTP (78,000 PE), respectively. At these sampling sites we also found the lowest EC50 values in the microtox assay with sediment samples and, therefore, the most toxic sediments (Fig. 4). Snails and gammarids were primarily exposed via the water phase during the active biomonitoring campaigns and are, therefore, less affected by sediment pollutants than in the laboratory experiments. The fecundity index of gammarids decreases significantly in the active biomonitoring campaign at sampling sites H3 (p < 0.05) and H4 (p < 0.01, Fig. 2f) in comparison to that at the corresponding reference site. This could also be attributed to the rising sediment toxicity downstream (Fig. 4). In the laboratory experiment with gammarids, a decreasing trend in the fecundity index downstream compared to the reference site was also observed (Fig. 2f). In contrast, the snails reacted in the active biomonitoring campaign, as well as in the laboratory experiment, with significant increases in the number of embryos in the brood pouch downstream of the WWTPs compared to the reference site H1 (p < 0.001, Fig. 3f).

Since associated water parameters such as ammonium and nitrite concentrations as well as organic content, i.e., loss on ignition of sediments, did not differ considerably between sampling sites (except for electric conductivity and chloride concentration at U4) and thus showed no correlation with mortality or reproduction of snails and gammarids, it is assumed that these had just a minor influence on the observed biological responses. Electric conductivity and chloride concentration were always considerably higher at U4 compared to the upstream sampling sites and, therefore, may have contributed to the biological responses at site U4 (Figs. 2c, d, 3c, d). The complete measurements of physico-chemical data during the experiments are provided in Additional file 1: Tables 5S1–9S1.

Table 3 exemplifies the annual average estrogenic, anti-estrogenic and dioxin-like activities and the EC50 referring to the REF for baseline toxicity of water samples from the rivers Nidda, Usa and Horloff. The androgenic and anti-androgenic activities are not shown in Table 3, since they were always below the detection limit in the rivers Nidda, Usa and Horloff. However, the anti-androgenic activity was included in the further calculation of the ecotoxicological assessment system, as anti-androgenic activities were found in smaller rivers of the Nidda catchment. Androgenic activities, by contrast, did not occur in any water sample of the Nidda catchment and were, therefore, not considered for the assessment system.

As presented in Table 3, the Nidda exhibits a higher estrogenic activity than the rivers Usa and Horloff. The highest annual average anti-estrogenic activity was detected at U4, which results from a comparably high value of 16.1 mg OHT-EQ/L in November 2015, whereas the other measured values at U4 were comparable to those at sampling sites U1–U3. These findings are possibly due to discharges of the WWTP (43,800 PE) and of the public baths as well as chemicals from agricultural use. At sites N1, H1 and H2 only one out of four measurements showed dioxin-like activities, which resulted in annual average activities below the LOQ.

In vitro assays with sediment samples revealed considerably fewer activities compared to water samples. Estrogenic and androgenic activities were not found in any sediment sample of the Nidda catchment and were, therefore, not considered in the assessment system. Although dioxin-like activities were not measured in sediment samples from the rivers Nidda, Usa and Horloff, they were nevertheless included in the ecotoxicological assessment system, as the YDS was conducted with sediment samples from smaller rivers of the Nidda catchment. Anti-estrogenic and anti-androgenic activities in sediment samples were only measured once in early 2015. The anti-estrogenic activity in the Usa was substantially higher with 559 mg OHT-EQ/kg at sampling site U4, than in the rivers Nidda or Horloff. In contrast, anti-androgenic activities only occurred at sampling site N4 of the river Nidda with 11.1 mg Flu-EQ/kg and at sampling sites H1 and H2 at the river Horloff with up to 3.61 mg Flu-EQ/kg sediment.

Figure 4 illustrates the mean EC50 values for baseline toxicity in the microtox assay of sediment samples from the rivers Nidda, Usa and Horloff. As the EC50 values indicate, the toxicity increases downstream at each river. Already at sampling site N2, the toxicity increases rapidly compared to the reference site, which could be due to the discharges of a paper mill and two WWTPs (35,000 and 7500 PE) as well as chemicals from the surrounding intensive agricultural land. At sampling sites N3 and N4 toxicity remains at a comparable level. The sediments of the Usa are less toxic than the sediments of the Nidda and Horloff. Only the sediment from site U4 reaches a toxicity similar to that of the sediments from Horloff and Nidda. At the Horloff, the most toxic sediment was from site H2 downstream of the four small-sized WWTPs (< 1000 PE). Nevertheless, no significant differences between sampling sites and the corresponding reference site of a river were determined (one-way ANOVA with Bonferroni post hoc test, p > 0.05).

Deriving ecotoxicological status classes for in vivo data

After statistical evaluation of the in vivo and in vitro data, an ecotoxicological evaluation system in conformity with the EU-WFD was established. For the active biomonitoring campaigns and laboratory experiments with P. antipodarum and G. fossarum the endpoints mortality as an acute parameter and differences in embryo numbers or fecundity index compared to the reference site as chronic parameters were included in the derivation of the ecotoxicological status classes. For reference sites, the status class was set with mortality as the only criterion. The status class 1 (high) was not considered for the in vivo tests. Significant effects served as threshold between status classes 2 (good) and 3 (moderate), from which on action is required. The remaining status classes were derived on the basis of Duft and Oehlmann [6] and the class boundaries were adapted to the field data set. This resulted in the following classification into ecotoxicological status classes for mudsnails and gammarids (Table 4, for further details see Additional file 1).

Table 4 Overview of the classification systems for the endpoints mortality and embryo number of Potamopyrgus antipodarum or mortality and fecundity index of Gammarus fossarum

For the laboratory experiments with Lumbriculus variegatus an analogous classification system was derived. However, we decided to exclusively consider those species for the derivation of the assessment system that were also used in the biomonitoring campaigns. Therefore, the experiments with L. variegatus were not considered in the classification scheme (Additional file 2: Figure 6S2).

After a status class had been assigned for a given test and site, the mean value from the four active biomonitoring campaigns was calculated for every in vivo test, resulting in a single status class per sampling site and test organism. The same procedure was performed for the status classes of combined water/sediment samples tested in the laboratory with P. antipodarum and G. fossarum.

Deriving ecotoxicological status classes for in vitro data

The status classes for in vitro data were derived separately for water and sediment samples. For the YES results in water samples, the proposed Environmental Quality Standard (EQS) of 0.4 ng 17β-estradiol/L from the European Union watch list [48] served as threshold between the status classes 2 (good) and 3 (moderate). Five class boundaries were derived for the assignment of the YES results to ecotoxicological status classes (for more details see Additional file 1). No EQS values or proposals are available for the other in vitro assays, thus the classification systems were derived using the iterative method according to Erhardt et al. [49] and class boundaries referring to Duft and Oehlmann [6] (for more details see Additional file 1). Following the establishment of the classification systems, each sampling site and each combined water/sediment sample was assigned to a status class based on the annual mean activity in the respective in vitro assay.

Weighting and merging of status classes

To obtain a representative ecotoxicological status class for the contamination of water or sediment at each sampling site, the status classes of all endpoints were calculated with a weighted mean. For the ecotoxicological status class of water, the status classes of active biomonitoring campaigns with P. antipodarum and G. fossarum (i.e., the mean values from four active biomonitoring campaigns each) were weighted with 40% each. The remaining 20% are distributed among the in vitro assays with water samples. Thereby the status classes of yeast screens contributed with 2.5% each (YAAS, YAES, YDS, YES) and the status class of the microtox assay with 10% to the calculation of the weighted mean at a given station. The in vivo results contributed more strongly to the weighted average, since effects in intact organisms have a significantly higher relevance than results from in vitro assays, which rather represent effect potentials than real effects on organisms. The resulting ecotoxicological status class for water is shown in Fig. 5 in the box at the top left for each sampling site. For graphic representation, a color code in compliance with the EU-WFD ranging from green to red is used, which indicates an increase in anthropogenic influence and thus a higher, i.e., worse, ecological or ecotoxicological status class [3]. For the ecotoxicological status class of river sediment, the status classes of the active biomonitoring campaigns (i.e., the mean values from four active biomonitoring campaigns each) were also weighted with 40% each, while the yeast screens with sediment samples (YAAS, YAES, YDS) were weighted with 3.3% each and the microtox assay with sediment samples with 10%. The resulting ecotoxicological status class for sediment is illustrated in the box at the top right in Fig. 5. 
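The weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys, function names and example status classes are hypothetical, the 3.3% weight of each sediment yeast screen is taken as exactly one third of 10%, and the text does not state whether the weighted mean was rounded to an integer class, so no rounding is applied here.

```python
# Weights as fractions of 1, following the percentages given in the text.
# Key names are hypothetical labels chosen for this sketch.
WATER_WEIGHTS = {
    "snail_biomonitoring": 0.40,
    "gammarid_biomonitoring": 0.40,
    "YAAS": 0.025,
    "YAES": 0.025,
    "YDS": 0.025,
    "YES": 0.025,
    "microtox": 0.10,
}

SEDIMENT_WEIGHTS = {
    "snail_biomonitoring": 0.40,
    "gammarid_biomonitoring": 0.40,
    "YAAS": 0.10 / 3,  # assumption: "3.3%" read as one third of 10%
    "YAES": 0.10 / 3,
    "YDS": 0.10 / 3,
    "microtox": 0.10,
}


def campaign_mean(campaign_classes):
    """Arithmetic mean of the status classes from repeated campaigns."""
    if not campaign_classes:
        raise ValueError("at least one campaign is required")
    return sum(campaign_classes) / len(campaign_classes)


def weighted_status_class(classes, weights):
    """Weighted mean of status classes; weights are normalised by their sum."""
    missing = set(weights) - set(classes)
    if missing:
        raise KeyError(f"missing status classes for: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[k] * classes[k] for k in weights) / total


# Hypothetical example values for one sampling site (not data from the study).
water_classes = {
    "snail_biomonitoring": campaign_mean([2, 3, 3, 4]),     # mean of 4 campaigns
    "gammarid_biomonitoring": campaign_mean([4, 4, 3, 5]),  # mean of 4 campaigns
    "YAAS": 2,
    "YAES": 2,
    "YDS": 2,
    "YES": 2,
    "microtox": 3,
}

print(weighted_status_class(water_classes, WATER_WEIGHTS))
```

With these hypothetical inputs the weighted mean is approximately 3.3: the two in vivo tests dominate the result, while the five in vitro assays together shift it only slightly.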
In addition, the ecological status classes for fish according to HLNUG [50] are plotted in the box at the bottom left and the ecological status classes for macroinvertebrates (macrozoobenthos—MZB) according to HLNUG [51] are depicted in the box at the bottom right of Fig. 5. This enables the comparison between the derived ecotoxicological status classes and the ecological status classes and thus the prognosis whether the chemical contamination of water or sediment and its resulting effects or other factors, such as structural deficits and the hydromorphology of water bodies, are the main causes for the failure to meet the objective of a good ecological status of the EU-WFD. If chemical pollution is the dominant factor, the ecotoxicological status class should be identical to the ecological status class or should not deviate by more than one status class. A poor or bad ecological status class and a good ecotoxicological status class at a site indicate that it is not chemical contamination but other factors that are primarily responsible. The comparison of ecotoxicological and ecological status classes at the sampling sites allows the identification and prioritization of the best suited measures for the water management practice.
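The comparison logic described in this paragraph can be expressed as a simple decision rule. The following Python sketch is an interpretation, not the authors' procedure: the function name and return labels are hypothetical, status classes are assumed to be integers from 2 (good) to 5 (bad), and treating every site with a good ecotoxicological class but a failed ecological class as driven by other factors extends the wording of the text, which names only poor or bad ecological classes explicitly.

```python
def likely_main_cause(ecotox_class, ecological_class):
    """Suggest the probable main cause for missing a good ecological status.

    Both arguments are status classes where 2 = good and 5 = bad.
    """
    if ecological_class <= 2:
        # Good ecological status: the EU-WFD objective is met.
        return "objective met"
    if ecotox_class <= 2:
        # No relevant chemical contamination, yet the ecological status
        # is worse than good: other factors (e.g., structural deficits).
        return "other factors"
    if abs(ecotox_class - ecological_class) <= 1:
        # Classes identical or within one class of each other.
        return "chemical contamination"
    return "undetermined"


# Site N2: ecotoxicological class 4, ecological class 5 for MZB.
print(likely_main_cause(4, 5))
# Site U1: ecotoxicological class 2, ecological class 3 for MZB.
print(likely_main_cause(2, 3))
```

The two calls use class combinations reported for sites N2 and U1 and return "chemical contamination" and "other factors", respectively, which matches the interpretation given for these sites in the Discussion.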

Fig. 5

Map of the Nidda catchment with derived ecotoxicological status classes and ecological status classes for fish and macrozoobenthos (MZB) according to HLNUG [50, 51]. Green: good status (class 2), yellow: moderate status (class 3), orange: poor status (class 4), red: bad status (class 5), white: no HLNUG data for fish or MZB available. Red spots mark dischargers from WWTPs, public baths and a paper mill, green lines restoration areas. The map was modified with Adobe® Photoshop CC (Version 20.0.0, Adobe Systems Incorporated, San José, California, US) and is based on WRRL-Viewer [24]

The presented evaluation matrix for river water and for river sediment exclusively considers the status classes from the active biomonitoring campaigns and the in vitro assays. The status classes of the laboratory experiments with combined water/sediment samples were omitted, as the laboratory experiments were only performed to verify the plausibility of the in vivo experiments under standardized conditions and to eliminate environmental stressors as possible causes for observed in vivo effects. For comparison, a second evaluation matrix for river water and a second for river sediment were derived, which also include the status classes of laboratory experiments (Additional file 3: Figure 7S6).

Discussion

Nidda

At the reference site N1 of the river Nidda the ecotoxicological status class of the water and sediment was good (class 2) and moderate (class 3), respectively. The worse status class of the sediment could be due to the Nidda river dam upstream of N1 [52] and the resulting exposure to contaminated sediments. In addition, the input of contaminated sediments from agricultural land is also conceivable. At the same site, the ecological status classes for fish and MZB were evaluated as good [50, 51]. Here, the ecotoxicological status class is worse than the ecological status class. This illustrates that, although there is a relevant chemical contamination of the sediment, this does not affect the species community of fish and MZB, indicating that the good hydromorphological conditions can partially compensate for the chemical contamination. Between sites N1 and N2 a paper mill and two WWTPs (35,000 and 7500 PE) discharge into the Nidda, and both sites are surrounded by intensive agriculture. These all potentially contribute to the deterioration of the ecotoxicological condition of sediment and water at site N2 in comparison to N1. The water and sediment at N2 are only in a poor ecotoxicological condition (class 4), while the ecological status class for MZB has been assessed as bad (class 5) [51]. Since the ecological status is only slightly worse than the ecotoxicological status class for water and sediment, it can be assumed that the high biological effects in water and sediment are probably caused by chemical contaminants, which thus represent the dominant factor for the bad ecological status, while structural deficits play just a minor role. At this point, however, it has to be considered that the MZB is exposed to pollutants over the entire life cycle, whereas gammarids and snails were only exposed in the river for 28 days during active biomonitoring campaigns in the present study.
Therefore, the ecotoxicological condition is rather a conservative estimate of the actual biological effects caused by the chemical contamination and it cannot be excluded that the chemical stressors have caused the bad ecological status class for the MZB. To achieve an improvement of the ecological condition at N2, chemical contamination should be reduced as a priority, for example by a reduction of the clearwater (i.e., treated wastewater) proportion in the river or by the application of more effective wastewater treatment technologies [29, 53, 54]. In addition, measures should also be implemented to reduce the impact of intensive agriculture on water bodies (e.g., larger riparian strips, extensification of agricultural land). Between sampling sites N2 and N3 the Horloff enters the Nidda. The inflow of the river Horloff seems to have a dilution effect, so that the ecotoxicological status classes of water and sediment each improve by one status class. Nevertheless, the water and sediment quality are only moderate (class 3), so there is still a need for action to reduce chemical contamination. Since no data are available for the ecological status of MZB and fish at this site, no statement can be made regarding the relative proportion of the chemical contamination compared with other stressors. Sampling site N4 is located within a restored area below two WWTPs (30,000 and 12,000 PE) and is surrounded by intensive agricultural land. No improvement of the water and sediment quality at this site can be observed and the ecotoxicological status classes thus remain in a moderate condition (class 3). The ecological status class for MZB was also assessed as moderate (class 3), while the ecological condition of fish was poor (class 4) [50, 51]. Thus, the comparison again indicates a dominant contribution of chemical stressors to the moderate (MZB) or poor condition (fish) of the biological quality components, irrespective of the fact that site N4 is already in a restored area.

Usa

At the reference site U1 of the river Usa, water and sediment quality are in a good ecotoxicological condition (class 2). The status class for fish was also assessed as good (class 2) [50], whereas the status class for MZB was only classified as moderate (class 3) [51]. Since there seems to be no relevant chemical contamination of water and sediment at this site, structural deficits must be the major driver for the MZB’s moderate condition. To achieve a good ecological status at this reference site, hydromorphological restoration measures are recommended as a matter of priority. Between sites U1 and U2 a WWTP (49,000 PE) discharges into the Usa. This is the likely reason why the ecotoxicological status classes of sediment and water drop to 3 (moderate) at site U2. Besides, an impact of the garden plots surrounding sampling site U2 on the ecotoxicological status classes is also conceivable (e.g., private use of pesticides). The ecological status class for MZB remains moderate, while the status class for fish declines by two status classes at sampling site U2 (class 4—poor). This suggests that at this sampling site chemical contamination is a relevant factor for the failure to meet the objectives of the EU-WFD, which is possibly accompanied by relevant structural deficits for fish in this section. At this sampling site hydromorphological restoration measures may lead to a slightly improved ecological condition of fish, but since the overall assessment of the ecological status class according to EU-WFD is based on the worst-rated component [3], it seems to be impossible to reach the good ecological status for fish and MZB without the previous reduction of the chemical contamination in water and sediment. Downstream of site U2 two restored areas follow without direct dischargers. 
Nevertheless, the ecotoxicological status classes of water and sediment remain moderate (class 3) at sampling site U3, which lies within one of the restored areas, while the ecological condition of MZB even decreases by one status class (class 4, poor) compared to site U2. The deterioration of the ecological status of the MZB is likely due to the fact that hydromorphological improvements may not meet the needs of the MZB. Haase et al. [14] reported that most of the hydromorphological restorations studied improved parameters that were less relevant to the MZB. This group of taxa is more directly dependent on microscale parameters such as substrate diversity and the presence of high-quality habitats (e.g., pool frequency, wood loading) that may not have been adequately addressed by the restoration measures at site U3 [14, 55,56,57]. Nevertheless, chemical contamination has to be reduced primarily to achieve a good ecological status at this sampling site. Between sites U3 and U4 a WWTP (43,800 PE) and the public baths discharge their wastewater. In addition, sampling site U4 is surrounded by intensive agricultural land and garden plots, which might also impact water and sediment quality. While the water quality at site U4 remains moderate (class 3), the ecotoxicological status class of the sediment and the ecological status class for MZB decrease by one status class each (class 5, bad) compared to site U3. At this site we also found the highest sediment toxicity in the microtox assay within the river Usa, as well as the highest electric conductivity with up to 5700 µS/cm and the highest chloride content with up to 11.7 g/L, which probably result from the health resort’s saline wastewaters being discharged into the Usa upstream of U4.
In addition, we detected the highest anti-estrogenic activities with an annual average of 8.00 mg OHT-EQ/L at this site, which, for example, could be due to the application of breast cancer pharmaceuticals in the hospitals discharging their wastewater into the river Usa via the WWTP upstream of U4. These parameters may have influenced the biological response but are nevertheless associated with an insufficient water quality at U4. This supports the statement that hydromorphological restoration measures by themselves are not sufficient to achieve a good ecological status. A reduction of the chemical contamination thus appears indispensable to improve the ecological status class.

Horloff

At the reference site H1 of the river Horloff the ecotoxicological status classes of water and sediment, as well as the ecological status class for MZB, are good (class 2), as there are neither direct dischargers upstream of the site nor structural deficits. Thus, no measures to reduce chemical contamination or structural deficits are necessary at this site. Downstream of H1 four small-sized WWTPs (< 1000 PE) follow. As a consequence of these dischargers and the intensive agriculture surrounding H2, the ecotoxicological status classes of water and sediment and the ecological status class for MZB decline by one status class each (class 3, moderate) at sampling site H2 in comparison to the reference site H1. The ecological condition for the fish community is even assessed as poor (class 4), which indicates structural deficits relevant for fish and a relevant chemical contamination that should be reduced. Between sampling sites H2 and H3 a WWTP (78,000 PE) discharges into the Horloff, and both sampling sites H2 and H3 are surrounded by agricultural land. In consequence, the ecotoxicological status classes for water and sediment (class 4, poor) and the ecological status class for fish (class 5, bad) decline by one class, while the ecological condition of MZB decreases by two status classes at sampling site H3 (class 5, bad) compared to H2. Fish and MZB are in the worst condition possible. As the ecotoxicological status classes for water and sediment are in a poor state, hydromorphological restoration measures on their own will not lead to good ecological status classes unless chemical contamination (e.g., from WWTPs and agriculture) is also reduced. Downstream of site H3, two restoration areas follow, between which site H4 is located.
Even though H4 lies directly within a restored area, the ecotoxicological status class for water remains poor (class 4) and the ecotoxicological status class for sediment becomes even worse (class 5, bad) compared to H3. A likely explanation for the further increase of the ecotoxicological effect level in the sediment within the restored area is the increased sedimentation of contaminated fine particulate matter in the flow-calmed zones of the restorations as a result of the higher flow diversity [52, 58]. Furthermore, it is conceivable that the dredging activities within the restored areas may have stirred up contaminated sediments, which may then have a negative impact on the invertebrate fauna. Contaminated sediments are also the likely explanation for the bad ecological condition (class 5) of the MZB at site H4. Surprisingly, the ecological status class for fish is moderate (class 3). On the one hand, this could be due to the fact that fish, in contrast to MZB, are more exposed to the water phase than to the sediment and, on the other hand, the good structure or hydromorphology within the restoration might partially compensate for the chemical contamination. In contrast, MZB are more exposed to the sediment. Furthermore, the hydromorphological restoration measures probably did not address the needs of the MZB sufficiently. For example, Haase et al. [14] reported that fish responded to hydromorphological improvements such as the elongation of river length or the creation of multiple channels with a significant increase in the number of taxa (richness) and the number of endangered species (sensitivity) in restored compared to unrestored river sections, whereas improvements for the MZB depended more directly on microscale parameters such as the composition of the hyporheic zone.
Consequently, MZB did not profit from hydromorphological restoration measures to the same extent as the fish community and neither an increase in richness nor sensitivity was found. As a consequence of the bad sediment quality, structural restoration in this section of the river will not lead to an improvement of the condition of the MZB without a previous reduction of the chemical pollution.

Implementation in the EU-WFD and for water management practice

The ecotoxicological assessment system developed in the current study complies with the EU-WFD using the identical system of status classes. Furthermore, the new assessment system can be used as an extension of the EU-WFD, in which the assessment of the ecological status hitherto is based primarily on the response of aquatic communities to environmental stressors in general, although supported by specific pollutants, hydromorphological, chemical and physico-chemical elements [3, 4, 13]. By the comparison of the results of the ecotoxicological and the ecological assessment system under the EU-WFD, it is now feasible to identify the probable causes for the failure to meet the objectives of the EU-WFD and derive recommendations for action from these findings for the implementation of priority measures in water management practice. We, therefore, recommend the implementation of the ecotoxicological WFD–compliant assessment system in the EU-WFD to keep the workload for the water management practice and the expenses for implementation and realization as low as possible. As already mentioned, we recommend not to consider laboratory experiments with G. fossarum, P. antipodarum and L. variegatus for the assessment system, as these experiments are performed with grab water and sediment samples and, therefore, do not represent temporally integrated exposures. As expected, the evaluation on the sole basis of active biomonitoring campaigns results in worse, i.e., higher, status classes than the combined assessment with biomonitoring campaigns and laboratory experiments (cf. Additional file 1: Figure 7S6), since no water exchange was done during the laboratory experiments over the exposure period of 28 days and contaminants could, therefore, be volatilized, adsorbed to the vessel walls and chemically or biologically degraded and since other stressors prevailing in the field were eliminated in laboratory experiments. 
Therefore, tests with snails and gammarids under constant laboratory conditions were conducted for reasons of plausibility, to exclude non-chemical environmental stressors as causes for the observed biological effects, such as increasing or decreasing fecundity at rising temperatures [33, 40, 59]. The laboratory experiments under constant temperature revealed effects similar to those in the monitoring campaigns, but in an attenuated form. The laboratory experiments demonstrate that associated water parameters such as different water temperatures and flow velocities between sampling sites had just a minor influence on the biological response and can, therefore, be excluded as major causes for effects observed during the biomonitoring campaigns (for further details of associated parameters, see Additional file 1).

Furthermore, we found the smallest differences in the average water temperature between sampling sites of each river in spring. To minimize temperature-dependent effects on biological responses, we recommend performing active biomonitoring campaigns in spring. In addition, a worst-case estimation for the aquatic community would be possible if monitoring campaigns are performed in summertime, where the clearwater portion is highest due to low water levels. In addition, prior to performing active biomonitoring campaigns it should be ensured that snails and gammarids are applied during their reproductive period, to assess potential effects on this population-relevant endpoint. Reproduction in P. antipodarum occurs all over the year with seasonal fluctuations [40], while the reproductive period of G. fossarum extends from December to September [59]. We thus recommend performing the active biomonitoring campaigns in spring or summer.

To derive ecotoxicological status classes with minimal effort, we recommend performing at least one active biomonitoring campaign at a river per year and investigating sediment samples at least once per year in all mentioned in vitro screens. This is in line with the Austrian water quality monitoring system (AWQMS) [26]. In contrast to sediment samples, in vitro screens with water samples should be conducted more frequently as water samples represent a snapshot of the current condition of rivers, while sediment samples accumulate substances in a time-integrated manner [60, 61]. The AWQMS proposes to collect water samples from the rivers monthly or bi-monthly to check the quality of the river water [26]. To obtain a balanced cost–benefit ratio, we suggest testing water samples up to four times a year to capture environmental conditions with seasonal variations (e.g., high clearwater content due to low water levels).

For the future application of the ecotoxicological assessment system in water management practice, it has to be decided how the status classes for water and sediment should be combined so that only one status class results per sampling site. We recommend applying the worst-case principle (“one-out, all-out principle”) according to the EU-WFD [3], i.e., the worst assessed component determines the general status class, which is in line with the precautionary principle and the principles of risk assessment for industrial chemicals according to the European Technical Guidance Document (TGD) [62]. Hence, a single ecotoxicological status class would result, allowing a simplified comparison with the ecological status class under the EU-WFD.
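The recommended worst-case combination amounts to taking the higher, i.e., worse, of the two status classes. A minimal Python sketch follows; the function name is hypothetical and the status classes are assumed to be integers from 2 (good) to 5 (bad).

```python
def overall_ecotox_class(water_class, sediment_class):
    """Combine water and sediment status classes by the worst-case principle.

    Higher numbers denote a worse status, so the worst-rated component
    is simply the maximum of the two classes.
    """
    return max(water_class, sediment_class)


# Site H4: water class 4 (poor), sediment class 5 (bad).
print(overall_ecotox_class(4, 5))
```

For site H4, where water was rated poor (class 4) and sediment bad (class 5), this yields an overall ecotoxicological status class of 5.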

Bridging the gap between ecological and chemical status within the EU-WFD

Until now, the assessment of the ecological status of rivers according to the EU-WFD is primarily based on the species community, i.e., the biocenosis, and the chemical condition of a water body is determined by the assessment of the concentrations of 45 priority substances supplemented by river basin-specific pollutants [63, 64]. Therefore, EQSs have been defined for priority substances to protect both individuals and the environment [63, 64]. To achieve a good chemical status, the concentrations of priority substances must be below both the annual average and the maximum allowable concentrations [63]. Thereby, the chemical status of a water body is classified into two status classes (good and not good) [63]. As already mentioned by Brack et al. [23] the classification into only two status classes fails to consider improvements and impedes the prioritization of effective control measures. Therefore, a graded system for classifying the chemical status of water bodies, as applied in the present ecotoxicological assessment system, seems to be more appropriate. Besides, the chemical detection of substances, such as 17β-estradiol or 17α-ethinyl estradiol, for which the European Commission proposes maximum allowable EQSs of 0.4 ng/L and 0.035 ng/L [48], respectively, remains challenging, since detection and quantification limits are often too high to fulfill the required standards in routine analytical methods [65,66,67]. In contrast, effect-based methods, such as reporter-gene assays or so-called small-scale in vivo whole organism bioassays, allow the detection of, for example, estrogenic substances at sub-nanogram levels and are, therefore, recommended as a complementary screening tool of estrogenic substances for the implementation in the EU-WFD [21,22,23, 66, 67].

Thereby, it is proposed to investigate water samples by in vitro assays and to focus time-consuming and cost-intensive chemical analysis on samples with estrogenic activity [21, 67]. Kase et al. [67] further explain that in vitro screens should be used to derive a status class for water, which should then be termed an ecotoxicological status, that completes the assessment of chemical and ecological status with one of the most important modes of action of EDCs. This strategy is consistent with the approach of the present study, in which water and sediment samples were analyzed in an in vitro test battery, and in addition, effect-based in vivo bioassays were conducted within an active biomonitoring in the field.

On the basis of these results, an ecotoxicological assessment system was derived. It is very important to highlight that we also included effect-based in vivo bioassays in the present ecotoxicological assessment system, as the exclusive use of cell-based assays or chemical analytical methods does not consider organism-level toxicokinetic changes (e.g., metabolism), which might lead to an under- or overestimation [67]. In addition, effect-based bioassays can help identify the causes of impairment of ecosystems and derive mitigation options [21]. However, targeted chemical surveillance, as used to assess the chemical status of water bodies in the context of the EU-WFD, only allows monitoring of the concentrations of priority substances. The presence of unknown chemicals, metabolites, transformation products, or non-regulated substitutes of priority substances is not taken into account [22], but can be considered when using effect-based bioassays [66, 68].

In addition, chemicals are usually present in the aquatic environment as complex mixtures, and although individual chemicals may be present below their thresholds, the combined effects of many chemicals can negatively affect aquatic organisms. Nevertheless, mixture effects are completely neglected by the EU-WFD. In line with this limitation, Carvalho et al. [69] have recently shown that EQSs are not sufficiently protective: because EQSs were derived from single-substance tests under controlled laboratory conditions, they neglect the mixture effects that occur under field conditions. Therefore, the implementation of effect-based methods in the EU-WFD is gaining importance and is increasingly recommended [22, 23, 66, 67], since effect-based methods integrate biological effects of mixtures of chemicals with the same mode of action [35, 68, 70, 71], as well as antagonistic and synergistic mixture effects [67].
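The point that a mixture can be of concern although each component stays below its own threshold can be illustrated numerically. The sketch below assumes simple additivity of risk quotients (concentration divided by threshold), which is a commonly used simplification for chemicals with the same mode of action; this additivity model, the function names, and all numbers are assumptions for illustration and are not taken from the present study or from Carvalho et al. [69].

```python
def risk_quotients(concentrations, thresholds):
    """Return concentration / threshold for every substance (same unit for both)."""
    return {
        name: concentrations[name] / thresholds[name]
        for name in concentrations
    }


def mixture_risk(concentrations, thresholds):
    """Sum of the individual risk quotients, assuming additive effects."""
    return sum(risk_quotients(concentrations, thresholds).values())


# Invented values: every substance is at half of its own threshold.
example_concentrations = {"a": 0.5, "b": 1.0, "c": 2.0}
example_thresholds = {"a": 1.0, "b": 2.0, "c": 4.0}

single = risk_quotients(example_concentrations, example_thresholds)
combined = mixture_risk(example_concentrations, example_thresholds)
```

In this invented case, a substance-by-substance comparison would flag nothing, whereas the summed quotient exceeds 1. Antagonistic or synergistic interactions, which effect-based methods also integrate [67], are not covered by this additive sketch.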

This is one of the main advantages of the present ecotoxicological assessment system. The ecotoxicological assessment comprises active biomonitoring campaigns with standardized exposure times as in vivo effect-based methods (mortality and reproduction tests) as well as the results of in vitro effect-based methods (YAAS, YAES, YAS, YDS, YES, and microtox assay). In addition, the ecotoxicological assessment system includes the results of in vivo and in vitro bioassays with whole sediments, as bioassays are less matrix-dependent than analytical methods [67], whereas the chemical status according to the EU-WFD focuses mainly on water quality. As mentioned by Escher et al. [22], a more holistic approach, as pursued in the present ecotoxicological assessment system, should be applied using a bioassay test battery that covers different types of effects and bridges the gap between the chemical and ecological condition [23, 66, 67, 68].

Conclusion

The combined use of the new ecotoxicological assessment system and the ecological assessment system according to the EU-WFD makes it possible to identify the probable causes of an insufficient ecological status of rivers. On the basis of the results obtained, recommendations for action can be derived for the implementation of priority measures in water management practice. At most sampling sites at the rivers Nidda, Usa, and Horloff, the ecotoxicological and ecological status classes were identical. At some sites, the ecotoxicological status was one and, in individual cases, two classes better than the ecological status. Because the ecotoxicological status classes were rated worse than class 2, except at the reference sites, there is a need for action to improve water and sediment quality at most sampling sites of the rivers Nidda, Usa, and Horloff. We thereby provide evidence that chemical contamination is a relevant stress factor for aquatic biodiversity that contributes to the failure to meet the objectives of the EU-WFD, and that hydromorphological restorations on their own will not lead to a good ecological status in these surface waters as long as the water and sediment quality remains insufficient due to the prevailing contamination [2, 7, 14]. Therefore, we conclude, in accordance with Haase et al. [14], that hydromorphological restoration measures should be performed in conjunction with measures to reduce the exposure to chemicals to achieve a good ecological status of the rivers Nidda, Usa, and Horloff.