1 Introduction

Ecotoxicological testing as an approach to assess sediment quality was first described in the late 1970s (e.g., Anderson and Prater 1977; Swartz et al. 1979) and gained interest in the following years. Giesy et al. (1988) reasoned that chemical analyses of sediments should be complemented by toxicity tests, because the large number of potentially toxic substances would make a purely chemical assessment of potential biotic impacts time consuming and costly, and because chemical analysis accounts for neither bioavailability nor interactions among substances. Consequently, toxicity tests of elutriates, porewater, and/or sediments were introduced into management guidelines. Among the first examples, the US Army Corps of Engineers and the US Environmental Protection Agency developed a decision-making framework for the management of dredged material in 1985 that included biotesting of dredged material in addition to chemical evaluation if aquatic disposal was the preferred option (Lee and Peddicord 1988). Dutka et al. (1988) stated the importance of applying biotests in combination (“biotest battery”) in assessing sediment quality, owing to the differing sensitivity of organisms. Two years later, Chapman (1990) described the sediment quality triad approach, in which a combination of chemical and ecotoxicological data, in addition to information on the biological community, was proposed as a powerful tool to determine pollution-induced degradation.

In Europe, ecotoxicological testing started later than in the USA but gained attention in light of the large amounts of sediments that must be dredged and managed in ports such as Rotterdam, Antwerp, and Hamburg. The Port of Hamburg had ecotoxicological tests performed on samples of dredged material for the first time in the mid-1990s, and around the same time, the Federal Institute of Hydrology in Germany began gathering biotest data on sediments along the Rhine and Elbe rivers. Since then, ecotoxicological investigation of sediment and dredged material has become much more common, but inclusion in national regulatory frameworks of European states remains slow. In an overview, den Besten et al. (2003) analyzed the regulatory implementation of bioassays in sediment and dredged material assessment in Europe. They showed a large variation among European countries in decision-making frameworks, which ranged from solely chemical-based assessment to biotests playing an important role in decision support systems. During a workshop in 2018, co-organized by SedNet and the NSR-Interreg project Sullied Sediments, data on how regulations in Europe have changed over time, as well as information regarding the roles of biological effect-based data, were gathered. Although a comparison between 2003 and 2018 showed that bioassays have become established in some decision-making frameworks, the opposite trend has also been observed (Fig. 1). The Netherlands, which relied on bioassays for its dredged material assessment in 2003, has now excluded them from its regulations. Italy, in contrast, significantly increased the extent of biotesting and integrated biotesting into the assessment of marine dredged materials before management. Other countries, such as France, have included bioassays in their national guidelines, but have not yet integrated them into national regulations.

Fig. 1 Status of the inclusion of biological effect-based assessments (BEBA) into national regulatory frameworks for dredged material (DM) in European states in 2003/2007 compared to 2018 (based on den Besten et al. 2003; den Besten 2007 and the outcome of the SedNet & Sullied Sediments Workshop 2018)

Even in countries with long histories of bioassay use, such as France, The Netherlands, and Germany, and decades after the first inclusion of ecotoxicological testing in decision-making frameworks, concerns remain among decision-makers and other stakeholders. The following concerns may explain why ecotoxicological testing and data are often met with skepticism:

  • “Biotest results are much less precise and accurate than chemical data.”

  • “The low number of test organisms cannot represent the ecosystem sensitivity.”

  • “Because of altered environmental conditions, laboratory testing does not reflect natural conditions and thus cannot be related to bioavailability in situ.”

  • “Agreement on how to assess biotest data is lacking.”

  • “Biotesting significantly increases the costs of sediment management.”

In our view, misconceptions and assumptions that are mistaken for facts can lead to avoidable decisions that harm the environment. In this article, we thus scrutinize each of those concerns in light of what we have learned over the past several decades, and we hope to start a discussion on how stakeholders and scientists can work together to improve decision-making in the management of sediments and dredged material.

2 Responses to stakeholders’ concerns

2.1 “Biotest results are much less precise and accurate than chemical data”

If chemical analysis were a precise and accurate tool, different laboratories should detect very similar concentrations of compounds in the same sediment sample, and the same laboratory should detect the same concentration when measuring a sample several times. Interlaboratory proficiency-testing exercises allow participating laboratories to test their regular in-house analytical methods. For regulatory bodies, they serve as an external quality control evaluation of monitoring data.

Table 1 summarizes the outcomes of the interlaboratory exercises in the analysis of contaminants in sediments, organized by the International Atomic Energy Agency (IAEA) in 1998, 1999, and 2001. The interlaboratory coefficient of variation (CV) for organic compounds exceeded 100% for some analytes in all chemical substance groups tested. According to de Mora et al. (2007), these outcomes were in line with those of other proficiency tests, e.g., from Quality Assurance of Information in Marine Environmental Monitoring (QUASIMEME), and no improvement in analytical performance was detected between 1996 and 2006. However, for interlaboratory comparisons, laboratories perform analyses according to their in-house protocols. Thus, methods and potentially sediment pretreatment may differ. Consequently, interlaboratory differences tend to be larger than intralaboratory differences. In an exercise conducted by the National Institute of Standards and Technology (NIST) on analysis of PAHs, PCBs, chlorinated pesticides, and PBDE congeners, the CVs for three replicates, indicating the precision of in-house analyses, were below 10% for most laboratories but sometimes exceeded 50%, depending on the substance (Schantz et al. 2006).
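The distinction between intra- and interlaboratory variability can be illustrated with a short calculation. The following is a minimal sketch, assuming that the CV is computed as the sample standard deviation divided by the mean and that the interlaboratory CV is taken over laboratory means; the laboratory names and concentrations are hypothetical and are not taken from the IAEA or NIST exercises.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical data: three replicate measurements of one analyte per laboratory.
replicates_by_lab = {
    "lab_A": [10.0, 11.0, 12.0],
    "lab_B": [20.0, 21.0, 22.0],
    "lab_C": [5.0, 5.5, 6.0],
}

# Intralaboratory CV: spread of replicates within each laboratory.
intralab_cv = {
    lab: coefficient_of_variation(vals) for lab, vals in replicates_by_lab.items()
}

# Interlaboratory CV: spread of the laboratory means.
lab_means = [mean(vals) for vals in replicates_by_lab.values()]
interlab_cv = coefficient_of_variation(lab_means)

for lab, cv in intralab_cv.items():
    print(f"{lab}: intralaboratory CV = {cv:.1f}%")
print(f"interlaboratory CV = {interlab_cv:.1f}%")
```

In this invented example, each laboratory is internally consistent (CVs below 10%), yet the laboratories disagree strongly with each other, which mirrors the pattern described above.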

Table 1 Reproducibility of chemical analyses of sediment contaminants in IAEA interlaboratory assessments (coefficient of variation, CV range in % for different compounds), compiled from Villeneuve et al. (2000, 2002), de Mora et al. (2007), and Wyse et al. (2004)

For trace metals, the reproducibility and accuracy of analyses are better when outliers are removed. Table 2 shows results from the IAEA-405 intercomparison exercise for arsenic and heavy metals that are routinely analyzed in most sediment and dredged material samples. With the exception of cadmium, all CVs were below 20% and thus were considered acceptable (Wyse et al. 2004). The data also showed a wide range of results when outliers were not removed, thus demonstrating the uncertainty that can accompany chemical data, particularly if quality control procedures are omitted.

Table 2 Extracted results of the intercomparison exercise IAEA-405 for commonly regulated trace metals in sediments (Wyse et al. 2004)

With regard to sediment bioassays, Dillon (1994) stressed the necessity of intra- and interlaboratory comparisons before these bioassays are included in regulatory decision support systems. Standardization procedures, and in this context round robin tests, are usually a prerequisite before bioassays are relied upon. Interlaboratory comparisons that precede standardization of bioassays are usually conducted on spiked rather than natural sediment or water samples. For this discussion, however, the focus is on the reproducibility of results for natural (environmental) samples from selected biotests that are being or have been used for sediment and dredged material assessment in Europe (Table 3): whole sediment assays with Corophium volutator (amphipod), Echinocardium cordatum (sea urchin), and Caenorhabditis elegans (nematode); solid-phase tests with Aliivibrio fischeri (luminescent bacteria); sediment contact tests with Myriophyllum aquaticum (aquatic plant); and elutriate tests with an embryo-larvae development bioassay with Crassostrea gigas (oyster), Paracentrotus lividus (sea urchin), and Daphnia magna (water flea).

Table 3 Examples of reproducibility (coefficient of variation, CV in %) of sediment toxicity tests in interlaboratory assessments

Inter- and intralaboratory CVs differ strongly between bioassays. The variability would be expected to be highest for tests with larger organisms and/or for tests in which organisms are sampled from the field rather than cultured in a laboratory. The latter is often the case for marine test species. Test organisms collected in the field are genetically more diverse and usually have longer life cycles, and fewer of them are used per test replicate. However, the CVs for tests with marine organisms are most often within the range reported for ISO sediment toxicity tests performed with freshwater laboratory-cultured organisms (e.g., Feiler et al. 2014).

Because of the heterogeneity of sediments, lower precision might have been assumed for direct contact tests. The available data (Table 3) do not support this general assumption. Moreover, most results are well within the commonly accepted criterion of a CV of less than 30 to 40% (Environment Canada 1990; Moore et al. 2000).

On the basis of these interlaboratory comparisons, the assumption that ecotoxicological results in general are less reliable than chemical data can thus not be confirmed. Despite the more recent use of biotesting compared to chemical analytics, and although biological organisms naturally have variable phenotypes, CVs in chemical and ecotoxicological results for sediment quality assessment are in the same range, and sediment contact tests are not necessarily less reproducible than elutriate tests.

However, sampling methodology (pooling of subsamples, homogenization, and sample volume) has been shown to have a large effect on the reproducibility of solid-phase toxicity, and its effects cannot be corrected after a sample is brought to the laboratory (Ferrari et al. 1999). Similarly, sampling has a key influence on the variability of the results of chemical characterization, often to a greater extent than analytical variability (Schiavone et al. 2011). Moreover, sediment storage and pretreatment significantly affect test results (e.g., De Lange et al. 2008).

Another major source of interlaboratory variability in ecotoxicological testing, as suspected by Stronkhorst et al. (2004), is the degree of experience of laboratory technicians with bioassays. Effort should be made to provide specific training in performing ecotoxicological tests if the results are used for regulatory purposes. This aspect is particularly important when the evaluation of test endpoints has some degree of subjectivity, such as the development of sea urchin larvae (Casado-Martínez et al. 2006b). One possibility for improving the performance of ecotoxicological testing of laboratories is the initiation of frequent interlaboratory comparisons for bioassays.

Although here we compare the precision of numerical endpoint results with those from analytical techniques through CVs, the use of CVs alone for assessing the results of toxicity tests has been challenged. Whereas extremely toxic or nontoxic samples may result in very low CVs (Burton Jr. et al. 1996), good agreement in the classification of samples according to toxicity and no toxicity may also be achieved with high CVs (Thursby et al. 1997; Casado-Martínez et al. 2006a, b). As Norberg-King et al. (2006) indicated: “it is important to keep in mind that the purpose of a toxicity test is not to find statistical differences; rather, it is to decide, with an acceptable degree of uncertainty, whether a sample is toxic.”

In conclusion

The statement that biotest results are generally less precise and accurate than chemical data cannot be confirmed. Nevertheless, more intra- and interlaboratory comparisons would help to harmonize procedures (sampling, pretreatment, and standard operation procedures) and to train technicians.

2.2 “The low number of test organisms cannot represent the ecosystem sensitivity”

This statement refers to the application of biotests to assess in situ sediment quality and to protect the environment against stress from contaminants. The sensitivity and stress levels of an ecosystem can be best assessed by studying the benthic community. However, changes in diversity can also be due to noncontaminant stressors, such as temperature or light; therefore, the “triad approach” combines benthic community data with toxicity data (and chemical data) (Chapman et al. 1997). Sometimes hypothetical “most sensitive test organisms” reflecting the sensitivity of the biological community have been desired to allow for cost efficient and fast determination of the chemical stress in situ.

This statement misinterprets the purpose of ecotoxicological testing, and the search for the most sensitive organism will not be successful anyway, as Cairns (1986) has explained. Species differ in sensitivity toward chemicals with different modes of action; the same species may be very sensitive to substance A yet tolerant to substance B. Consequently, the search for a species “representative” of an ecosystem’s status is necessarily flawed. What we can expect from a biotest is information on the presence or availability of (undefined) substances that have the potential to disturb and affect organisms in the field. If basic biological functions, such as photosynthesis, reproduction, or energy metabolism, are inhibited, the probability of implications for the ecosystem rises.

For the selection of a test species for sediment toxicity test development, practical reasons will prevail (e.g., availability and handleability). The utility of such tests can be greatly improved if the proposal for a species is accompanied by appropriate information regarding its sensitivity to contamination, its ecological importance, and its exposure pathways (Dillon 1994). As an indicator of potential risk to the biological community, a given biotest must be sensitive to chemical stress. An excessive tolerance would increase the likelihood of false-negative responses. Accordingly, field validation is needed, during which reactions of biotests are compared with measurable changes in the biological community, so that regulatory agencies can assess the relevance of bioassay results. Although there has been some debate regarding the need for field validation for sediment toxicity testing (Chapman 1995), a workshop to evaluate the uncertainty of measurement endpoints used in sediment ecological risk assessment highlighted the inadequate field validation of sediment toxicity tests in 1996 (Ingersoll et al. 1997). To overcome this bottleneck, several initiatives in the USA have demonstrated the ecological relevance of amphipod sediment toxicity testing. Long et al. (2001) have studied the relationship between acute sediment toxicity tests with marine and estuarine amphipods and benthic community structure metrics (abundance and diversity) in more than 1400 samples from studies conducted in the USA. Although the authors found considerable variability among the datasets, they concluded that ecologically relevant losses in the abundance and diversity of the benthic infauna frequently corresponded to decreased amphipod survival in laboratory tests. In > 90% of the samples classified as toxic, at least one measure of benthic diversity or abundance was < 50% of the average reference value. No amphipods were found in 39% of samples classified as toxic, although amphipods were also absent from 28% of the nontoxic samples. However, the abundance of crustaceans (notably amphipods) decreased in the infauna as amphipod survival decreased in the laboratory tests in many of the studied areas. A break point in the data indicated that, in general, amphipod abundance in the field was lowest when survival in the laboratory tests decreased below 50% that of controls.

A field validation study was also completed at a PAH-contaminated Superfund site in the USA, involving 10-day toxicity tests with the marine amphipods Leptocheirus plumulosus and Rhepoxynius abronius (Ferraro and Cole 2002). Both toxicity tests were validated as indicators of changes in several macrofaunal community metrics that had low but sufficient statistical power to discriminate ecologically important effects: the percentage loss of the indices increased relative to values determined for nontoxic reference areas, as the average survival in laboratory toxicity tests decreased. Losses of benthic resources reached 50% when the survival dropped to 0%.

According to Borgmann et al. (2005), the freshwater amphipod Hyalella azteca is frequently one of the most sensitive organisms in sediment toxicity tests, as indicated by the results of risk assessments for chemical substance registration. Toxicity to H. azteca in laboratory tests has been shown to correlate closely with an abnormally low abundance of sensitive benthic species, such as amphipods, mayflies, sphaeriid clams, and tanytarsid midges, in the field, and can therefore predict effects on sensitive species in situ.

The nematode Caenorhabditis elegans has become another frequently used organism in freshwater sediment toxicity tests. Haegerbaeumer et al. (2018) have compared the sensitivity of 27 wild nematode species extracted from freshwater sediments with that of C. elegans toward metals and PAHs. Although C. elegans is more tolerant to chemical stress than the average freshwater nematode species, the sensitivity of the extracted animals varied over a wide range, and the C. elegans responses were well within that range, except for benzo[a]pyrene.

In conclusion

Single-test species should not be expected to represent the sensitivity of ecosystems but should be regarded as indicators of available and harmful substances that may affect biological communities. From this perspective, more information must be compiled and provided on the sensitivity of test species toward relevant substances in comparison to that of biological communities, to provide information on the possibility of false-negative outcomes in batteries of biotests.

2.3 “Because of altered environmental conditions, laboratory testing does not reflect natural conditions and thus cannot be related to bioavailability in situ”

As indicated above, a direct extension of laboratory results to situations in situ is certainly not possible, as also described by Ferrari et al. (2019). The above statement reveals a misunderstanding of the purpose of performing biotests. These tests are intended to show whether there is a hazard for the aquatic or benthic community.

Therefore, experimental conditions may change as long as they remain environmentally relevant, and different scenarios may be tested.

In Europe, discussions of biotests in a regulatory context apply primarily to dredged material assessment. Deciding on management options for dredged material requires deciding on treatments, during which the material undergoes several physico-chemical changes, as do sediment samples in preparation for ecotoxicological testing. Bioassays usually require oxic conditions and more water than was present in the original sediment. Resuspension in a greater water volume and oxidation of samples will also occur during relocation of dredged material, and thus preparation of samples for biotesting simulates realistic conditions. Test conditions such as pH, temperature, or salinity, however, depend on the requirements of the given test organisms and must be kept within a certain range, even if that range does not reflect the environmental situation. Ecotoxicological tests must be understood not to predict with high certainty what will happen in the environment but to characterize environmental samples on the basis of their properties under fixed conditions. The toxicity measured in the laboratory reflects the capability of the sediment to do harm under certain conditions and thus indicates toxicity potential. The information on ecotoxicity becomes meaningful in an environmental context, considering the exposure. Management decisions, e.g., to dredge or to relocate sediment, should be made on the basis of the sediment’s toxic potential to ensure that the material’s properties do not adversely affect the environment.

The same applies to chemical data for sediments. Bioassays, elution, or leaching tests are performed in standardized conditions that do not necessarily represent the in situ bioavailability of contaminants. Moreover, as reported above, sampling strategy, storage, and pretreatment may also alter contaminant bioavailability, thus affecting the results of chemical analysis. For example, De Lange et al. (2008) have reported the analysis of acid volatile sulfide (AVS) and simultaneously extracted metals (SEM) in sediments stored under different conditions: AVS increased significantly during cool storage, whereas SEM was not affected. The authors also found different AVS values according to the sediment layer (i.e., 0–2 cm vs. 2–5 cm). In addition, the choice of digestion procedure may significantly affect the results of trace element analysis (e.g., Mossop and Davidson 2003). Therefore, chemical analysis, like ecotoxicological tests, is performed to characterize environmental samples under fixed conditions, which may not reflect the in situ status.

In conclusion

Both biotesting and chemical analysis characterize sediments under standardized conditions that do not necessarily represent in situ conditions. Despite this limitation, both can reveal potential hazards to aquatic communities. However, ecotoxicological tests are more powerful in detecting the effects of pollutant mixtures and of chemicals that are not assayed.

2.4 “Agreement on how to assess biotest data is lacking”

Interpretation of individual test results within a biotest battery is performed differently depending on the laboratory and the guidelines. For whole organism tests, the results are often expressed as percentage inhibition of a certain endpoint such as mortality, photosynthesis, growth, or reproduction, compared with that of an unaffected control. Thresholds that differentiate “toxicity classes,” indicating, e.g., low, moderate, or high toxicity, often appear to be set arbitrarily and not to account for the characteristics of the test systems.
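The percentage inhibition mentioned above is simple arithmetic, sketched below under the assumption that the endpoint is one for which a lower value indicates an adverse effect (e.g., growth or reproduction). The function name and the example values are hypothetical, and individual test guidelines may prescribe their own formulas.

```python
def percent_inhibition(control_mean, sample_mean):
    """Percentage inhibition of an endpoint relative to an unaffected control.

    Positive values indicate inhibition; negative values indicate stimulation.
    """
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * (control_mean - sample_mean) / control_mean


# Hypothetical example: mean offspring per replicate in control vs. test sediment.
print(percent_inhibition(120.0, 90.0))  # 25.0
print(percent_inhibition(50.0, 60.0))   # -20.0 (stimulation)
```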

Different toxicity endpoints of different organisms have different response ranges, sensitivity, and precision, thus calling into question the use of strict threshold values in ecotoxicology (Ahlf and Heise 2005; Höss et al. 2010).

The issue becomes even more complicated in interpreting the results of biotest batteries, because the bioassays usually yield differing responses. Interpretation of multi-test results providing information on sediment or dredged material quality ranges from always considering the most sensitive organisms in a test battery (e.g., in Germany, according to GÜBAK-WSV (2009)) to integrating biotest data by more complex classification techniques, such as the Hasse diagram technique, fuzzy logic expert systems (Hollert et al. 2002), or toxicity profiling (Hamers et al. 2010). These integrating assessment approaches, although more complicated and less transparent, have the potential to improve decision-making on the basis of sound science and have found acceptance, e.g., in the Italian regulation for disposal of dredged marine sediments at sea in other than National Relevance Sites (SedNet and Sullied Sediments 2018).
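The simplest of these interpretation rules, letting the most sensitive test in a battery determine the overall result, can be sketched as follows. The class boundaries and test names are purely hypothetical and do not reproduce GÜBAK-WSV or any other guideline; as noted above, fixed thresholds of this kind are themselves contested.

```python
# Hypothetical class boundaries in percent inhibition; real guidelines define their own.
CLASS_BOUNDS = [(20.0, "low"), (50.0, "moderate")]
RANK = {"low": 0, "moderate": 1, "high": 2}


def toxicity_class(inhibition):
    """Map a percentage inhibition to a toxicity class."""
    for upper, label in CLASS_BOUNDS:
        if inhibition < upper:
            return label
    return "high"


def classify_battery(inhibitions):
    """Return (overall class, per-test classes); the worst single test decides."""
    classes = {test: toxicity_class(value) for test, value in inhibitions.items()}
    overall = max(classes.values(), key=RANK.__getitem__)
    return overall, classes


# Hypothetical battery result for one sediment sample.
battery = {"algae": 12.0, "bacteria": 35.0, "nematode": 61.0}
overall, per_test = classify_battery(battery)
print(overall, per_test)
```

The rule is transparent but discards information from all tests other than the most responsive one, which is one motivation for the integrating approaches cited above.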

Chemical analyses face a problem similar to that of interpreting single and multiple bioassay responses: their relevance for the environmental toxicity of sediments must also be assessed. Sediment quality guidelines (SQGs) are intended to relate the chemical concentrations in sediments to hazards. They have been developed to protect the biological community, to predict effects on benthic organisms, or both. Most have been derived through empirical or theoretical/mechanistic approaches.

Many (controversial) discussions have debated the design, implementation, and limitations of SQGs, thus resulting in a large variation in guidelines. DelValls et al. (2004) have reviewed SQGs from different European countries and have shown that they differ by two orders of magnitude for some substances (e.g., As, Cu, and seven PCBs). Most of the limitations listed and discussed at the Pellston Workshop on “use of sediment quality guidelines and related tools for the assessment of contaminated sediments” in 2002 have not been addressed to date for existing SQGs; e.g., they deliver no or limited information on the ecologically important aspects of chronic toxicity to sediment-dwelling organisms and cause-effect relationships, in addition to the questionable transferability of SQGs, derived from one endpoint in the laboratory, to, e.g., effects on organisms in the environment (Wenning et al. 2005). Moreover, existing SQGs cover tens of substances at best, and therefore substances of emerging concern cannot be reliably assessed with this tool. The same applies to chemical guidelines developed by countries to manage dredged material (action levels). These action levels vary substantially among countries and cover only a very limited number of substances (see Röper and Netzband 2011).

In conclusion

There is indeed no agreement yet on how to assess biotest results, although several approaches that account for test-specific characteristics have been reported. Contrary to the common perception, however, sediment quality guidelines and action levels also substantially vary among countries and, even if effect-based, have limited ability to predict adverse effects or protect benthic communities. Complementary application of chemical analyses and ecotoxicological testing still appears to be the best way to decrease the probability of false-negative results from sediment or dredged material analyses. Sediment toxicity testing with carefully selected organisms to target contaminants with a special mode of action could become a cost-effective monitoring technique.

2.5 “Biotesting significantly increases the costs of sediment management”

A brief study of testing costs was performed during drafting of a guidance document dedicated to the hazard assessment of sediments in French waterways (Stamm and Babut 2019). Two types of biotests were considered: miniaturized tests intended for a screening tier and classic biotests intended for an in-depth assessment if the screening tier did not lead to making a decision. The associated costs are shown in Table 4.

Table 4 Expected unit costs (excl. VAT) for a range of biotests. na not available

Well-known, commonly used tests, such as ostracod (ISO 14371) or Microtox™, have unit costs similar to the costs of “simple” analyses, such as those of trace elements, PCBs (except dioxin-like congeners) or PAHs, which are mostly automated in chemical laboratories. Other tests appear to be more expensive for several reasons. A longer test duration (e.g., Gammarus), which entails a higher workload, leads accordingly to a more expensive test. The cost cited by potential contractors is also influenced by the potential demand (i.e., the number of tests the contractor expects to perform), which in turn is associated with the investment needed and the number of laboratories accredited for those tests.

Thus, according to these current tariffs, the cost of the screening tier would amount to approximately 2000 € per sample when biotests are included or 1000 € when they are not, in which case the list of chemical analyses would be limited to trace elements, PCBs (except dioxin-like congeners), and PAHs. If the concentrations of more chemicals of emerging concern are required in a screening tier, only the cost of chemical analyses would increase, because the biotest battery would have the same level of response (i.e., taking into account that bioassays assess the potential effects of all chemicals in the samples).
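The cost argument can be made concrete with a small calculation. This sketch uses the two approximate screening-tier figures given above; the number of additional chemical analyses and their unit price are hypothetical placeholders, not values from Table 4.

```python
CHEM_BASE_EUR = 1000      # from the text: screening tier without biotests
WITH_BIOTESTS_EUR = 2000  # from the text: screening tier including biotests
BIOTEST_BATTERY_EUR = WITH_BIOTESTS_EUR - CHEM_BASE_EUR


def screening_cost(extra_chem_analyses, cost_per_extra_eur, include_biotests):
    """Cost per sample; only the chemical part grows with additional analytes."""
    chem = CHEM_BASE_EUR + extra_chem_analyses * cost_per_extra_eur
    bio = BIOTEST_BATTERY_EUR if include_biotests else 0
    return chem + bio


# Hypothetical scenario: four additional analyte groups at 250 EUR each.
print(screening_cost(0, 0, include_biotests=True))     # 2000
print(screening_cost(4, 250, include_biotests=False))  # 2000
print(screening_cost(4, 250, include_biotests=True))   # 3000
```

Under these assumed prices, extending the chemical list by four analyte groups costs as much as adding the whole biotest battery, while the battery's cost stays fixed regardless of how many substances are present.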

Costs can be further cut substantially with the use of smaller test organisms, such as bacteria and algae, thus enabling the test procedures to be miniaturized (Rojíčková et al. 1998; Heise and Ahlf 2005; Wadhia and Thompson 2007; Paixão et al. 2008).

More broadly, including any additional lines of evidence in the assessment framework would increase the overall expense leading to a decision. This would be true not only for biotests but also for chemicals of emerging concern beyond the current lists of priority substances.

In conclusion

In our view, the cost issue should be discussed in relation to the needs—what information is required to reach a decision—and the cost of mismanagement, that is, of making a wrong decision. With analytical and bioassay data complementing each other, the risk of false-negative results which would guide the decision in the wrong—and costly—direction would be reduced. Objecting that biotests are expensive does not make much sense: it is a simplistic argument with no rational grounds.

3 Opportunities in biotesting environmental samples

3.1 Environmentally safer decisions

The following examples demonstrate the risk, if ecotoxicological testing of sediments is not performed, of overlooking the adverse effects that chemicals that are unknown or not routinely measured might have on the environment.

  • de Baat et al. (2019) have performed sediment toxicity tests with the midge Chironomus riparius and chemical analyses in sediments from 12 areas in the Netherlands, where the major pollution sources were urban, agricultural, or wastewater treatment plants (WWTPs). They measured traditional contaminants, such as metals and PAHs, as well as emerging contaminants, which usually are not included in monitoring programs (WWTP markers such as bisphenol A and nonylphenol; pesticides such as prosulfocarb and triallate). Although the overlying water did not show toxicity to the sensitive invertebrate Daphnia magna, sediments from all sites had lethal and/or sublethal effects on the midge larvae. Chemical analyses showed that metals and PAHs were predominantly present in the pore waters of sediments at urban locations. Sediments from the WWTP and agricultural areas, however, were dominated by WWTP markers and pesticides. The effects of these sediments were more pronounced than those of urban area material. This impact would not have been detected if only the “usual” list of contaminants had been measured and no ecotoxicological testing had been performed.

  • Feiler et al. (2013) applied a test battery consisting of five sediment contact tests—a plant test (Myriophyllum aquaticum, ISO 16191), nematode test (Caenorhabditis elegans, ISO 10872), oligochaete test (Lumbriculus variegatus, OECD 225), fish embryo test (Danio rerio, on the basis of DIN 38415–6), and bacteria test (Arthrobacter globiformis, ISO/DIS 10871)—to 21 native freshwater sediments characterized by a broad variety of geochemical properties and anthropogenic contamination. On the basis of the toxicity pattern derived from the test battery, the sediments were assessed according to a classification system for sediment toxicity. For sediments with high toxicity potential, the test-derived classification agreed well with the application of consensus-based sediment quality guidelines, whereas in sediments with low to medium toxic potential, SQGs often underestimated the toxicity detected by the sediment contact tests.

These examples also demonstrate the importance of performing tests in direct contact with sediments, although elutriate or pore water tests are also meaningful, given the possibility of sediment resuspension into the water phase.

  • Claus et al. (2009) investigated the observation that the phytotoxicity of elutriates and pore water from Elbe river sediments in Germany showed an increasing trend between 1992 and 2007, although overall contamination had decreased substantially since the fall of the Iron Curtain in 1989. Through an effect-directed analytical approach, the authors were able to characterize the substance that caused high toxicity in algal growth inhibition tests with Desmodesmus subspicatus as a low-volatility, thermostable, nonpolar, and highly lipophilic compound, but were unable to structurally identify the contaminant.

3.2 Less overprotective measures

Another opportunity for ecotoxicological testing is the identification of cases in which low bioavailability of contaminants decreases the need for costly management actions.

  • Marziali et al. (2017a) reported the results of test batteries applied to sediments collected in a North Italian reservoir where arsenic was the contaminant of concern. Arsenic concentrations were up to 20 times higher than the chemical threshold (33 mg kg−1 d.w., the Probable Effect Concentration of MacDonald et al. 2000) and significantly higher than the concentrations in the downstream river. To recover water storage capacity and maintain proper functioning of the dam, desiltation was necessary. Even if derived mainly from natural weathering, such arsenic values would normally result in the mechanical removal of tons of sediments from the reservoir, with high costs and low potential for reuse. However, the application of test batteries to whole sediments and elutriates showed no or only slight toxicity to test organisms, thus demonstrating low bioavailability of the toxicant, even after sediment remobilization and mixing. On the basis of these results, sediment flushing into the downstream river stretch was considered feasible under a proper operational plan. Similar results were obtained in other North Italian reservoirs where SQGs were exceeded because of the accumulation of geogenic trace metals, atmospheric deposition, or local anthropogenic enrichment (Marziali et al. 2017b).

Few management cases are documented in which low toxicity despite high contamination has led to less strict management decisions, because the regulatory context for SQGs is clear and confidence in chemical data is high. Although the potential of information on nonbioavailable substances from combined chemical and ecotoxicological data should not be underestimated in decision-making frameworks (Ahlf et al. 2002, 2008), such analysis would require intensified biotesting with whole sediment bioassays to simulate the long-term exposure of organisms in the downstream river stretch, as well as chronic or subchronic tests, such as the whole sediment test with Heterocypris incongruens (Marziali et al. 2017a).

4 Conclusion and outlook: ecotoxicological testing of sediments—an overlooked opportunity

In the ongoing debate with stakeholders on the inclusion of bioassays in sediment management decision frameworks, tests are often considered not meaningful, unnecessary, and consequently cost inefficient. However, as shown above, their reliability is largely similar to that of chemical analysis of sediment. The examples in Section 3 demonstrate their added value in determining potential risks that would be overlooked if solely chemical data were relied upon. Whether the costs justify the outcome may be a matter of perspective, but if the aim is to perform management activities sustainably, in a way that respects and protects the environment, ecotoxicological testing provides the opportunity to do so.

Moreover, this article shows that many challenges remain, some unique to bioassays and some shared with chemical analyses. The following challenges should be addressed to facilitate future biotesting of sediments for regulatory purposes:

  • Standardization of more biotests with a strong focus on sediment contact tests should be promoted.

  • The full paradigm for sediment toxicity test development by Dillon (1994) should be followed, and this work should be supported by regulatory agencies.

  • More intra- and interlaboratory comparisons should be performed to train laboratory assistants and to reduce the variability of biotest results.

  • Research into the automation of biotests should be increased to decrease costs and improve sample turnover.

  • To facilitate the interpretation of ecotoxicity data, test-specific effect-based thresholds should be developed.

Expertise in ecotoxicology should be strengthened in industry and regulatory agencies to maximize the opportunities that ecotoxicological testing offers.