Background

In the current era of rapid global development and escalating consumption, there is a growing reliance on chemicals essential to nearly all manufacturing products [19, 51]. This has fueled population growth and advancements in sectors, such as energy and food production, yet it also poses significant risks to human and ecological health, including toxic pollution and species extinction [3, 5, 29, 51]. In European waters, organic chemicals are major contributors to biodiversity loss and ecosystem service degradation [24, 55]. Despite recognizing chemical pollution as a critical threat to sustainability, our understanding of its full impact is still emerging [10, 54]. Furthermore, this issue compounds other global challenges, such as climate change and biosphere integrity, with recent assessments indicating that the pace of chemical production and emission outstrips our capacity to manage [29, 34, 46].

Since the 1800s, the Chemical Abstracts Service (CAS) has documented over 204 million substances, with approximately 350,000 chemicals or combinations listed in national or regional inventories; notably, the European Chemicals Agency (ECHA) has over 26,500 chemical registrations [8, 11, 54]. Increasing production and use of chemicals raises concerns, as lessons from legacy pollutants such as PCBs and mercury highlight the risks of widespread chemical use [28]. Internationally, the Globally Harmonized System (GHS) has been adopted for consistent chemical classification and labeling, categorizing chemicals by properties, health impacts, and environmental hazards [32]. Moreover, the rising concern over Per- and polyfluoroalkyl substances (PFAS) has led to calls for their class-wide regulation. PubChem, one of the world's largest chemical databases, now includes over 7 million PFAS compounds following the OECD revised definition [45].

Frameworks for chemical management and assessment, such as life cycle impact assessment (LCIA), chemical alternatives assessment (CAA), comparative risk screening, and risk assessment, play a crucial role in assessing the toxicological effects of chemical exposures [4, 14, 27]. Life Cycle Assessment (LCA) tools, adhering to standards, such as ISO 14040, are increasingly recognized for quantifying potential toxicity throughout a product's lifecycle [13, 22, 35, 47]. However, comprehensive toxicity quantification relies on having characterization factors (CFs) for every chemical emission [16]. To determine CFs, modeling chemical environmental fate, exposure, and effects is crucial [36]. The USEtox model offers a fate-exposure-effect matrix framework for calculating freshwater ecotoxicity CFs, integrating fate factors (FF), exposure factors (XF), and effect factors (EF) [12]. USEtox uses chronic ecotoxicological data, typically the HC50, to calculate quantitative EFs for chemicals [12, 35].

Although numerous chemicals pose risks to ecosystems, only a small percentage have been thoroughly assessed for LCA, resulting in significant inaccuracies and gaps [37]. This is highlighted by the vast difference between the 145,299 substances listed by the ECHA (ECHA, 2023a) and the 3104 substances (3077 organic and 27 inorganic) detailed in the USEtox database (version 2.01) [12]. With USEtox providing freshwater ecotoxicity EFs for only 2499 substances, a large number of chemicals remain uncharacterized, primarily because chronic EC50 data are scarce.

In recent years, the USEtox consortium and the European Commission have made significant efforts in refining the methodology for calculating EFs and expanding the range of chemicals within the LCA framework by computing EFs for a broader array of chemicals [30, 38]. Such advancements have broadened the chemical scope within the Product Environmental Footprint (PEF), facilitating more comprehensive comparative life cycle analyses in Europe [41,42,43]. Saouter et al. [39, 40] highlighted the influence of selecting ecotoxicological data sources and toxicological endpoints on the USEtox EFs, underscoring the need for improved substance information within USEtox. Furthermore, Saouter et al. [41] incorporated aquatic toxicity hazard values from REACH, EFSA, and PPDB, yielding HC20 values from REACH/EFSA for 6764 chemicals and from PPDB for 1316 chemicals. However, this newly derived HC20 data has yet to be integrated into USEtox version 3.0, which currently relies on chronic EC50-based data.

The emergence of online experimental databases, particularly REACH and CompTox, offers access to ecotoxicity data for thousands of chemicals, which can be used to calculate EFs, as demonstrated by recent publications. However, in the absence of experimental data, QSAR models are increasingly employed to bridge data gaps [18]. These models correlate chemical structures with compound properties, using chemical descriptors for predictions [20]. Over time, their accuracy and reliability have grown, boosting their application. This trend is particularly significant as regulatory frameworks are progressively leaning towards minimizing animal testing [44]. Yet, millions of animals are still used annually for regulatory testing, a figure projected to rise given the ever-expanding use of chemicals in society [9, 23]. Prominent QSAR-based methods in use include ECOSAR, VEGA, and T.E.S.T. [6, 25, 26].

To effectively incorporate the available ecotoxicological data into CF calculations, the derivation of EFs is crucial. This study endeavors to leverage existing ecotoxicity data for such calculations. The objectives of this study are twofold: first, to develop a rigorous approach for ecotoxicity data harmonization, facilitating the determination of experimental EFs, and second, to compare ecotoxicity data from various QSAR models with experimental findings.

Materials and methods

Aquatic ecotoxicity data collection

Data on aquatic ecotoxicity were gathered from a variety of open-access sources to form an extensive data set. The primary experimental data were obtained from the REACH database, which holds substantial data on aquatic ecotoxicology [33]. This was further enriched by incorporating data from the CompTox version 2.2.1 database, which includes data from ToxValDB v9.4 [1]. A targeted approach was employed to secure experimental data specifically concerning aquatic environments from these sources. The REACH data were retrieved in August 2020, and the CompTox data were collected in July 2023. Given the regular updates to the CompTox database by the US EPA, the most recent ToxValDB version 9.4 was utilized, rather than the initial data set of ToxValDB version 9.1.1 released in 2022. However, the REACH raw data, acquired from another research project, have not been updated due to time and resource constraints.

The REACH database is structured into sections that detail specific study results. This study concentrated on the "ecotoxicological information" section, more precisely the "aquatic toxicity" subsection. In this subsection, each study provides a specific tested concentration value, supplemented with comprehensive details such as the endpoint, testing methodology, environmental conditions, species examined, and the duration of the test. The data from the CompTox database were derived from the U.S. EPA’s ToxValDB (version 9.4), a vast repository of experimental toxicity data aggregated from 49 distinct public sources, excluding any data from REACH. Within this collection, 10 sources are particularly focused on ecotoxicity data: COSMOS, DOE ECORISK, DOE Wildlife Benchmarks, ECHA IUCLID, ECOTOX, EFSA, EnviroTox_v2, HAWC Project, HEAST, and HPVIS.

Estimated aquatic ecotoxicity data were gathered from two quantitative structure–activity relationship (QSAR) tools, both accessed in August 2023: ECOSAR™ Version 1.11 through EPI Suite v4.11 [6, 50] and the US EPA Toxicity Estimation Software Tool (T.E.S.T., hereafter TEST) v5.1.2 [26, 48]. ECOSAR encompasses organic and inorganic chemicals but omits organometallic chemicals and polymers. The applicability of a QSAR model is constrained by the scope of its training set, which introduces uncertainty when the model is applied to substances beyond its intended domain. Consequently, any results generated by a QSAR tool for substances outside its domain were treated with caution. For this study, the harmonized experimental data set was processed through ECOSAR, estimating endpoints like fish LC50 96 h, daphnia LC50 48 h, and green algae EC50 96 h. More than 100 ECOSAR classes emerged from the analysis. For chemicals that were assigned to several classes for the same endpoint and species, a geometric mean was used to aggregate the estimates, resulting in a single concentration value per endpoint and species. The TEST estimates were derived from the consensus model in TEST and included fathead minnow LC50 96 h, Daphnia magna LC50 48 h, and Tetrahymena pyriformis IC50 48 h.
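The geometric-mean aggregation of multi-class ECOSAR estimates can be illustrated with a minimal sketch. The record layout, the placeholder CAS number, and the function name `aggregate_by_geometric_mean` are assumptions made for illustration and do not reproduce the processing scripts of this study.

```python
import math
from collections import defaultdict


def aggregate_by_geometric_mean(records):
    """Collapse several class-level estimates into one value per
    (chemical, species, endpoint) key using the geometric mean."""
    groups = defaultdict(list)
    for cas, species, endpoint, conc_mg_per_l in records:
        groups[(cas, species, endpoint)].append(conc_mg_per_l)
    # Geometric mean computed as exp(mean(ln(x))) for numerical stability
    return {
        key: math.exp(sum(math.log(v) for v in values) / len(values))
        for key, values in groups.items()
    }


# Hypothetical estimates: one chemical matched by two classes for fish
records = [
    ("000-00-0", "fish", "LC50 96 h", 1.0),
    ("000-00-0", "fish", "LC50 96 h", 100.0),
    ("000-00-0", "daphnia", "LC50 48 h", 4.0),
]
aggregated = aggregate_by_geometric_mean(records)
```

In this hypothetical case the two fish estimates of 1 and 100 mg/L collapse to a single value of about 10 mg/L, which is less sensitive to one extreme class estimate than an arithmetic mean would be.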

Selection and harmonization of experimental and estimated ecotoxicity data

A data harmonization strategy was formulated in this study to streamline both experimental and estimated ecotoxicological effect data. The primary objective was to harmonize the gathered ecotoxicity data, to ensure uniformity and comparability across crucial aspects like test organisms, endpoints, exposure duration, and ecotoxicity concentration values. Chemicals with uncertain or missing details were omitted. This approach was built upon four steps, as shown in Fig. 1: chemical substance identification, data quality assessment, data harmonization, and consistency verification.

Fig. 1 Decision tree for experimental and estimated ecotoxicity data harmonization strategy

Data harmonization begins with the first step of chemical identification, which ensures that each data point is distinctly linked to its corresponding chemical. Following this initial step, the second step focuses on data quality assessment, emphasizing the importance of data reliability. This involves utilizing the database metadata to filter out lower quality datapoints from the overall data set. The third step of data harmonization ensures uniformity across several key aspects, including the naming and classification of test species, endpoint classification, units of study duration, exposure duration categories, and ecotoxicity concentration units. The process concludes with the final step of consistency checking, which scrutinizes the metadata for each data point to verify its completeness and the uniqueness of the collected data, alongside the removal of duplicates. Detailed descriptions of each of these steps are provided in the Supplementary Information in Table S2.
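The four steps can be sketched as a simple filter over raw records. This is only an illustration under stated assumptions: the field names (`cas`, `reliable`, `unit`, `duration_unit`, and so on), the unit tables, and the boolean quality flag are hypothetical, and the actual criteria are those described in Table S2.

```python
# Hypothetical conversion tables to a common unit (mg/L and hours)
UNIT_TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 1e-3, "g/L": 1e3}
DURATION_TO_HOURS = {"h": 1.0, "d": 24.0}


def harmonize(records):
    harmonized, seen = [], set()
    for rec in records:
        # Step 1: chemical identification (drop records without an identifier)
        if not rec.get("cas"):
            continue
        # Step 2: data quality assessment (hypothetical boolean metadata flag)
        if not rec.get("reliable", False):
            continue
        # Step 3: harmonize species names, endpoints, durations, and units
        conc_factor = UNIT_TO_MG_PER_L.get(rec.get("unit"))
        time_factor = DURATION_TO_HOURS.get(rec.get("duration_unit"))
        if conc_factor is None or time_factor is None:
            continue
        row = (
            rec["cas"],
            rec["species"].strip().lower(),
            rec["endpoint"].strip().upper(),
            rec["duration"] * time_factor,
            rec["value"] * conc_factor,
        )
        # Step 4: consistency check (remove exact duplicates)
        if row in seen:
            continue
        seen.add(row)
        harmonized.append(row)
    return harmonized


raw = [
    {"cas": "000-00-0", "species": "Daphnia magna", "endpoint": "ec50",
     "duration": 2, "duration_unit": "d", "value": 0.5, "unit": "mg/L", "reliable": True},
    {"cas": "000-00-0", "species": "daphnia magna ", "endpoint": "EC50",
     "duration": 48, "duration_unit": "h", "value": 0.5, "unit": "mg/L", "reliable": True},
    {"cas": None, "species": "fish", "endpoint": "LC50",
     "duration": 96, "duration_unit": "h", "value": 1.0, "unit": "mg/L", "reliable": True},
    {"cas": "000-00-0", "species": "fish", "endpoint": "LC50",
     "duration": 96, "duration_unit": "h", "value": 3.0, "unit": "mg/L", "reliable": False},
    {"cas": "000-00-0", "species": "fish", "endpoint": "LC50",
     "duration": 96, "duration_unit": "h", "value": 2000.0, "unit": "ug/L", "reliable": True},
]
clean = harmonize(raw)
```

Of the five hypothetical records, two survive: the second is a duplicate of the first once durations and names are harmonized, the third lacks an identifier, and the fourth fails the quality flag.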

Ecotoxicity effect factors calculation strategy

This study focuses on the computation of ecotoxicity EFs, which are required to determine freshwater ecotoxicity CFs using the USEtox tool. In USEtox, EFs are determined as the linear slope of the concentration–response relationship up to the HC50 [kg m−3], the point where the PAF (potentially affected fraction of species) is 0.5 [12]. This relationship is quantified using the equation: EF = 0.5/HC50 [PAF m3 kg−1]. The HC50 represents the concentration at which half of the species in a freshwater ecosystem are exposed above their EC50 value and is derived from the geometric mean of chronic EC50s for various freshwater species, whereas the EC50 represents the concentration at which 50% of the exposed population of a species exhibits an effect. All equations involved in calculating USEtox EFs are detailed in the official USEtox documentation [12]. In this study, the calculation of log HC50EC50eq begins with the extrapolation of endpoints to chronic EC50 values (if needed), employing species group-specific extrapolation factors [2]. The extrapolated harmonized data are then aggregated at the species level using the geometric mean for each chemical. Finally, the species-level log values are averaged per chemical, yielding the log HC50EC50eq, which is used in the USEtox framework to calculate EFs based on HC50EC50eq for individual chemicals.
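The aggregation and the EF equation can be illustrated with a short worked example. It is a sketch only: it assumes the inputs are already chronic EC50 equivalents in mg/L (the extrapolation factors from [2] are not reproduced here), and the function names and numbers are hypothetical.

```python
import math

MG_PER_L_TO_KG_PER_M3 = 1e-3  # 1 mg/L equals 1e-3 kg/m3


def log_hc50(chronic_ec50eq_by_species):
    """Mean over species of the log10 geometric mean of each species'
    chronic EC50 equivalents (input values in mg/L)."""
    species_logs = [
        sum(math.log10(v) for v in values) / len(values)
        for values in chronic_ec50eq_by_species.values()
    ]
    return sum(species_logs) / len(species_logs)


def effect_factor_hc50(log_hc50_mg_per_l):
    """EF = 0.5 / HC50, with HC50 converted to kg/m3 (result in PAF m3/kg)."""
    hc50_kg_per_m3 = 10 ** log_hc50_mg_per_l * MG_PER_L_TO_KG_PER_M3
    return 0.5 / hc50_kg_per_m3


# Hypothetical chemical with data for two species
data = {"species A": [1.0, 100.0], "species B": [1000.0]}
log_hc50_value = log_hc50(data)              # (1 + 3) / 2 = 2, i.e. 100 mg/L
ef = effect_factor_hc50(log_hc50_value)      # 0.5 / 0.1 kg/m3
```

Here species A contributes a geometric mean of 10 mg/L and species B 1000 mg/L, so the HC50 is 100 mg/L (0.1 kg m−3) and the EF is about 5 PAF m3 kg−1. Averaging at the species level first prevents a heavily tested species from dominating the HC50.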

Building on the collaborative work of the Ecotoxicity Task Force and insights from the SETAC Pellston workshop, a new method for calculating USEtox EFs was proposed by Owsianiak et al. [30]. Traditionally, CFs relied on chronic EC50 values, but the recommendation advocated for an HC20EC10eq based approach. HC20 represents the environmental concentration affecting 20% of species, with the effect on individual species determined using chronic EC10 equivalent ecotoxicity data. This relationship is quantified using the equation: EF = 0.2/HC20EC10eq [PAF m3 kg−1]. Owsianiak et al. [30] emphasized the importance of utilizing experimentally determined chronic EC10 data for the calculation of HC20, acknowledging the scarcity of such data. To bridge this gap, extrapolation factors were suggested [41]. Following these updated recommendations, this study also employs this methodology to calculate EFs based on HC20EC10eq, thereby aligning with the latest guidelines for calculating EFs.

Comparing ecotoxicity effect factors

The calculated log10 transformed EFs are used for the comparative analysis. Whenever there was at least one chemical shared between the compared data sources, EFs were plotted for analysis. Pairwise correlations were employed to evaluate the linear relationship between these log10 transformed EFs, with the correlation coefficient, r, indicating both strength and direction of the relationship. Further evaluation of the correlation strength involved performing linear regression analyses with unrestricted slopes. The coefficient of determination (R2) was then used to determine the robustness of the observed relationships.
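As an illustration of this comparison, the sketch below computes the Pearson correlation coefficient and an ordinary least-squares fit with an unrestricted slope on log10-transformed EFs. The function name and the EF values are hypothetical and serve only to show the calculation.

```python
import math


def linear_regression(x, y):
    """Ordinary least squares of y on x with free slope and intercept.
    Returns (slope, intercept, r, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # For a simple linear regression with intercept, R2 equals r squared
    return slope, intercept, r, r * r


# Hypothetical EFs [PAF m3 kg-1] for four chemicals shared by two sources
ef_source_a = [10.0, 100.0, 1000.0, 10000.0]
ef_source_b = [20.0, 150.0, 2500.0, 9000.0]

log_a = [math.log10(v) for v in ef_source_a]
log_b = [math.log10(v) for v in ef_source_b]
slope, intercept, r, r_squared = linear_regression(log_a, log_b)
```

Working on log10-transformed values keeps chemicals with very large EFs from dominating the fit, which matters when EFs span many orders of magnitude.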

The comparisons were conducted in several stages. First, the EFs derived from the experimental data of this study were compared with the existing USEtox EFs from the USEtox organic substances database (version 2.01). For this purpose, the calculated EFs were classified into three groups: (1) EFs based on HC50EC50eq using all the toxicity data after extrapolation to chronic EC50 equivalents, denoted as EF (HC50EC50eq), All data; (2) EFs using only chronic EC50 datapoints, denoted as EF (HC50EC50eq), EC50chronic data only; and (3) EFs using both chronic EC50 and acute EC50 extrapolated to chronic, denoted as EF (HC50EC50eq), EC50 data only. Second, the EFs derived from the estimated data of this study were compared with the USEtox EFs. Third, the experimental EFs were compared with those calculated using ECOSAR and TEST. Additionally, the EFs derived from the experimental data were compared across two different methodologies for calculating EFs: one based on the USEtox 2.13 documentation, and the other following the recommendations by Owsianiak et al. [30], as described in Sect. "Ecotoxicity effect factors calculation strategy". To evaluate the consistency of each relationship, this study utilized the R2 coefficient obtained from log-transformed regression analysis.

Results and discussion

Aquatic ecotoxicity data collected

The ecotoxicity data used in this study were gathered from two types of sources: experimental data obtained from REACH and CompTox, and estimated data extracted from ECOSAR and TEST. This collected data set underwent a harmonization process to create a harmonized ecotoxicity data set for EF calculations. From the REACH database, this study retrieved 225,517 ecotoxicity datapoints spanning 12,411 chemicals, each associated with a European Commission (EC) Number. In parallel, the CompTox database contributed 517,067 ecotoxicity datapoints encompassing 8640 chemicals, each identified by its CAS number. On the estimated data side, ECOSAR provided 27,354 toxicity datapoints for 6029 chemicals, while the US EPA TEST tool accounted for 17,055 datapoints spanning 6762 chemicals. Chemical identification in these estimated data sources relied on CAS numbers.

Ecotoxicity data selection and harmonization results

The data harmonization strategy, as described in Sect. "Selection and harmonization of experimental and estimated ecotoxicity data", is applied to both experimental and estimated ecotoxicity data. The results of this process at various steps of the ecotoxicity data harmonization strategy, across different data sources, are presented in Table 1 and further detailed in the supplementary information in Table S3. Table 2 provides an overview of the distribution of harmonized ecotoxicity data points and the corresponding number of chemicals across various data sources and endpoint classifications, illustrating the number of datapoints and chemicals at different endpoints. Following harmonization, the REACH data set was refined to include 72,705 data points for 5318 chemicals, while the CompTox data set retained 364,434 datapoints for 7464 chemicals. The aggregation of these two data sets and a subsequent duplicate verification step resulted in the elimination of an additional 2485 data points from the consolidated data set. As a result, this study compiled a harmonized data set of experimental datapoints comprising 434,654 datapoints for 11,295 chemicals, as shown in Tables 1 and 2. Each datapoint in the final combined data set is distinctly identified by its CAS, species name, species group, exposure classification, and endpoint classification. The final data sets for ECOSAR and TEST stood at 19,719 datapoints for 6029 chemicals and 17,055 datapoints for 6762 chemicals, respectively, as shown in Tables 1 and 2.
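The datapoint counts reported in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the figures given in the text; the implied chemical overlap is an inference that assumes duplicate removal never eliminates a chemical entirely, and it is not a number reported by this study.

```python
# Harmonized datapoints per source, as reported in the text
reach_datapoints = 72_705
comptox_datapoints = 364_434
duplicates_removed = 2_485

combined_datapoints = reach_datapoints + comptox_datapoints - duplicates_removed

# Chemicals per source and in the combined data set, as reported in the text
reach_chemicals = 5_318
comptox_chemicals = 7_464
combined_chemicals = 11_295

# Inferred (not reported): chemicals present in both sources, assuming
# that duplicate removal never drops a chemical completely
implied_overlap = reach_chemicals + comptox_chemicals - combined_chemicals

print(combined_datapoints, implied_overlap)
```

The combined total reproduces the 434,654 datapoints stated for the harmonized experimental data set.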

Table 1 Overview of ecotoxicity data harmonization results at different steps of ecotoxicity data harmonization strategy across various data sources
Table 2 Overview of the distribution of harmonized ecotoxicity datapoints and corresponding number of chemicals across various data sources and endpoint classifications

Ecotoxicity effect factors results

The harmonized ecotoxicity data in this study underwent an aggregation process, as outlined in Sect. "Ecotoxicity effect factors calculation strategy", to ensure data integrity and relevance for EF calculations in accordance with USEtox guidelines. The calculation of log HC50EC50eq began with the extrapolation of endpoints to chronic EC50 values, employing species group-specific extrapolation factors [2]. The extrapolated harmonized data were then aggregated at the species level, producing 59,195 (n = 11,295) aggregated effect concentration data points for the experimental data set. The same step reduced the ECOSAR data set slightly, to 16,456 (n = 6029) aggregated datapoints, and left the TEST data set unchanged. Lastly, chronic EC50 data underwent further aggregation, utilizing the average of logarithmic values to calculate log HC50EC50eq [kg m−3]. This data set was then utilized in the USEtox model to compute EFs based on HC50EC50eq for each chemical.

In this study, EFs were derived by combining experimental EC50 chronic values with extrapolated endpoints to EC50 chronic values, resulting in EFs for 11,295 chemicals. While USEtox advocates for the exclusive use of EC50 chronic data for reduced uncertainty, this study also computed EFs solely from EC50 chronic data for 5047 chemicals. Furthermore, the analysis extended to encompass EFs derived from EC50 endpoints, both chronic and extrapolated acute, for 9543 chemicals, broadening the chemical coverage but introducing the uncertainty inherent in relying on a single endpoint. As for the estimated data, EFs were computed for 6029 chemicals using ECOSAR and 6762 chemicals using TEST. In contrast, USEtox provides EFs for 2499 chemicals, with 2426 chemicals overlapping with the experimental data in this study. Comprehensive statistics, including data points, species groups, log HC50 (log mg/L), and EF [PAF.m3.kg−1], are presented in Table 3.

Table 3 Overview of the summary statistics of different ecotoxicity data sources

Following recommendations proposed by Owsianiak et al. [30], this study also calculated EFs based on HC20EC10eq, as outlined in Sect. "Ecotoxicity effect factors calculation strategy", thereby aligning with the latest guidelines for calculating EFs. The calculation of log HC20EC10eq begins with the extrapolation of endpoints to chronic EC10 values, employing species group-specific extrapolation factors [2]. This step is followed by aggregating the extrapolated harmonized data at the species level, producing 59,195 (n = 11,295) aggregated effect concentration data points for the experimental data set. Finally, chronic EC10 data underwent further aggregation, utilizing the procedure given in Owsianiak et al. [30] to calculate log HC20EC10eq [kg m−3]. The equation used is $\log \mathrm{HC20}_{\mathrm{EC10eq}} = \mu_{\log \mathrm{EC10eq}} + z_{0.2} \cdot \sigma_{\log \mathrm{EC10eq}}$, where $\mu_{\log \mathrm{EC10eq}}$ and $\sigma_{\log \mathrm{EC10eq}}$ are the mean and standard deviation of the species sensitivity distribution (SSD) of log EC10eq values, and $z_{0.2}$ is the standard normal quantile for a cumulative probability of 0.2. This data set was then utilized to compute EFs based on HC20EC10eq for each chemical using the equation: EF = 0.2/HC20EC10eq [PAF m3 kg−1].
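A minimal sketch of this calculation is given below. It assumes species-level log10 EC10eq values in mg/L as input and uses the sample standard deviation, which is an assumption on our part; the function names and input values are hypothetical, and the exact procedure is the one given in Owsianiak et al. [30].

```python
from statistics import NormalDist, mean, stdev

# Standard normal quantile for a cumulative probability of 0.2 (about -0.84)
Z_20 = NormalDist().inv_cdf(0.2)

MG_PER_L_TO_KG_PER_M3 = 1e-3  # 1 mg/L equals 1e-3 kg/m3


def log_hc20(log_ec10eq_by_species):
    """log HC20 = SSD mean + z_0.2 * SSD standard deviation (log10 mg/L).
    Uses the sample standard deviation (assumption); needs >= 2 species."""
    mu = mean(log_ec10eq_by_species)
    sigma = stdev(log_ec10eq_by_species)
    return mu + Z_20 * sigma


def effect_factor_hc20(log_hc20_mg_per_l):
    """EF = 0.2 / HC20, with HC20 converted to kg/m3 (result in PAF m3/kg)."""
    hc20_kg_per_m3 = 10 ** log_hc20_mg_per_l * MG_PER_L_TO_KG_PER_M3
    return 0.2 / hc20_kg_per_m3


# Hypothetical species-level log10 EC10eq values (mg/L) for one chemical
species_logs = [0.0, 1.0, 2.0]
log_hc20_value = log_hc20(species_logs)
ef_hc20 = effect_factor_hc20(log_hc20_value)
```

Because the quantile is negative, the HC20 always lies below the SSD mean, and the gap widens as the spread in species sensitivity increases.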

For substances with toxicity data for only one species, HC20EC10eq was calculated following the procedure outlined in Saouter et al. [41]. This involves assuming that if there is only one EC10eq, then this value is equivalent to an HC50. Subsequently, an extrapolation equation is applied to convert the HC50 into an estimated HC20. In this study, the extrapolation equation was derived by comparing log-transformed values of HC50EC10eq [mg/l] and HC20EC10eq [mg/l] across chemicals with data from at least three species, encompassing 4563 chemicals. The regression analysis, as shown in Fig. 2, revealed a robust correlation, with an R2 value of 0.93, indicating a strong consistency between the compared values. The resulting extrapolation equation is log HC20EC10eq = −0.5 + 1.04 × log HC50EC10eq, with the default extrapolation factor set as HC20EC10eq = 0.31 × HC50EC10eq.
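The single-species extrapolation can be illustrated with the regression equation and the default factor reported above. The function names and the example concentrations are hypothetical.

```python
import math


def log_hc20_from_regression(log_hc50_ec10eq):
    """Regression reported in the text: log HC20 = -0.5 + 1.04 * log HC50."""
    return -0.5 + 1.04 * log_hc50_ec10eq


def hc20_from_default_factor(hc50_ec10eq):
    """Default extrapolation factor reported in the text: HC20 = 0.31 * HC50."""
    return 0.31 * hc50_ec10eq


# Hypothetical single-species EC10eq values, treated as HC50 (mg/L)
for ec10eq in (1.0, 100.0):
    hc20_regression = 10 ** log_hc20_from_regression(math.log10(ec10eq))
    hc20_default = hc20_from_default_factor(ec10eq)
    print(ec10eq, round(hc20_regression, 2), round(hc20_default, 2))
```

At 1 mg/L the two routes nearly coincide, since 10 to the power of −0.5 is about 0.32. They diverge at higher concentrations because the regression slope of 1.04 is not exactly 1.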

Fig. 2 Regression analysis of log HC20EC10eq [mg/l] versus log HC50EC10eq [mg/l] for chemicals with data from at least 3 species, exhibiting a high correlation (n = 4563, R2 = 0.93, r = 0.96)

It is important to note that not all scientists endorse the USEtox approach, and ongoing improvements are being made to the methodology. These include recommendations by Owsianiak et al. [30] to introduce additional environmental compartments and adjust ecotoxicity effect modeling to reflect more environmentally relevant conditions by shifting focus from EC50 to EC10. USEtox also faces reliability issues with specific chemical groups such as PFAS, prompting Holmquist et al. [17] to develop a PFAS-adapted version of USEtox (version 2.1), which incorporates several enhancements for assessing PFAS ecotoxicity impacts. Adjustments were also made to ecotoxicological EFs considering species richness, following the LC-Impact method [15, 53]. For nanomaterials, USEtox is not suitable because its parameters do not support multimedia fate modeling of nanomaterials [7, 31]. Additionally, the method for including transformation products in CF calculations is not well-defined; Van Zelm et al. [52] suggested adjusting the parent compound CFs in proportion to the CFs of its transformation products. Related to risk assessment, Saouter et al. [39] noted the importance of aligning the ranking of chemicals by their ecotoxicity impacts in LCA with hazard characterizations used in risk assessment. Saouter et al. [39] compared the ecotoxicity approaches in LCA and chemical risk assessment and observed that these approaches could differ significantly, with discrepancies of up to four orders of magnitude for the chemicals compared. LCA toxicity indicators cover various aspects of a chemical’s ecotoxicity potential, with USEtox aiming to assess the potential overall toxicity impact of substances on humans and ecosystems rather than focusing solely on hazard identification, which is central to risk assessment [40]. USEtox is a scientific consensus model intended for screening, not necessarily for identifying chemicals of concern in the sense of a risk-assessment approach [40]. This discrepancy underscores the critical need to establish a consensus between risk assessment methodologies and USEtox-based LCA calculations to optimize resource use and enhance the sustainable management of chemicals.

The calculated freshwater EFs exhibit a wide range, spanning 11 to 16 orders of magnitude, as shown in Table 3 and elaborated in the supplementary information in Table S1. To illustrate this variation, Fig. 3 presents box plots categorizing EFs based on the type of data considered and the EF calculation method used. These plots show the distribution of EFs and highlight the considerable diversity in values across the different data categories.

Fig. 3 Box plots of calculated EFs from experimental and estimated data in the freshwater compartment for different types of data

Comparison of ecotoxicity effect factors

The initial comparison involved contrasting the EFs based on HC50EC50eq derived from experimental data in this study with the existing USEtox EFs available in the USEtox organic substances database (version 2.01), as illustrated in Fig. 4 and Table 4. This assessment covered EFs for 2426 chemicals already present in the USEtox database, recalculated using experimental data. The results of the regression analysis were indicative of a robust correlation, with an R2 value of 0.84, demonstrating a strong alignment between recalculated values and those in USEtox.

Fig. 4 Regression analysis of pre-calculated log transformed USEtox 2.13 database EFs [PAF m3 kg−1] in freshwater ecosystem versus log transformed EFs [PAF m3 kg−1] calculated in this study based on HC50EC50eq with experimental ecotoxicity data: all (left), EC50 chronic (middle), and EC50 (right), with correlation: moderate (n = 2426, R2 = 0.84, r = 0.92), low (n = 1418, R2 = 0.64, r = 0.80), and high (n = 2406, R2 = 0.85, r = 0.92) respectively

Table 4 Overview of the regression analysis of log transformed EFs based on HC50EC50eq calculated in this study versus pre-calculated log transformed USEtox 2.13 database [PAF m3 kg−1]

In line with the USEtox guideline prioritizing EC50 chronic data, this study conducted a separate analysis comparing EFs calculated solely with this data against USEtox EFs. This yielded an R2 value of 0.64, indicating a moderate correlation. However, when EFs derived from both chronic EC50 and extrapolated acute EC50 data were compared to USEtox EFs, the R2 value substantially increased to 0.85, signaling a strong correlation. This improvement suggests that incorporating extrapolated data significantly enhances data coverage and species group representation, resulting in normalized values closely aligned with USEtox. The notably high correlation observed between recalculated and USEtox EFs provides confidence that the EFs computed in this study for chemicals absent from the USEtox database adhere to USEtox calculation criteria. This underscores the reliability of the EFs generated for chemicals not originally included in USEtox.

The second comparison involved contrasting the EFs based on HC50EC50eq derived from estimated data in this study with the pre-existing USEtox EFs available in the USEtox organic substances database (version 2.01), as illustrated in Fig. 5. In this assessment, EFs for 2165 chemicals from ECOSAR and 2098 chemicals from TEST, which were already listed in USEtox, were recalculated using their respective estimated data.

Fig. 5 Regression analysis between pre-calculated log transformed USEtox 2.13 database EFs, ECOSAR EFs, and TEST EFs [PAF m3 kg−1] in freshwater ecosystem: USEtox vs ECOSAR (left), USEtox vs TEST (middle), and ECOSAR vs TEST (right), with correlation: weak (n = 2165, R2 = 0.42, r = 0.65), weak (n = 2098, R2 = 0.55, r = 0.74), and weak (n = 4930, R2 = 0.57, r = 0.76) respectively

The regression analysis between USEtox and ECOSAR EFs revealed an R2 value of 0.42, indicating a relatively weak correlation. A similar analysis between USEtox and TEST EFs yielded an R2 value of 0.55, suggesting a slightly stronger but still weak correlation when compared to ECOSAR. However, when EFs from ECOSAR were compared against TEST EFs, the R2 value increased to 0.57, signifying a relatively stronger, yet still weak, correlation. This observed increase in the R2 value may be attributed to the inclusion of more data points, which can lead to values that align more closely with each other.
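For a simple linear regression with an intercept, R2 equals the square of the Pearson correlation coefficient r, so the reported pairs can be checked for internal consistency. The sketch below does this for the values given in the caption of Fig. 5; the rounding tolerance and the function name are assumptions made for illustration.

```python
# (R2, r) pairs as reported in the caption of Fig. 5
REPORTED = {
    "USEtox vs ECOSAR": (0.42, 0.65),
    "USEtox vs TEST": (0.55, 0.74),
    "ECOSAR vs TEST": (0.57, 0.76),
}


def consistent(r_squared, r, half_step=0.005):
    """True if r_squared can equal r**2 once both values are allowed
    to vary within rounding to two decimal places (assumed tolerance)."""
    low = (abs(r) - half_step) ** 2 - half_step
    high = (abs(r) + half_step) ** 2 + half_step
    return low <= r_squared <= high


results = {name: consistent(r2, r) for name, (r2, r) in REPORTED.items()}
print(results)
```

All three reported pairs pass this check, which indicates that the R2 and r values in the caption are mutually consistent within rounding.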

The low correlation between recalculated EFs using estimated data and USEtox EFs implies that the EFs for chemicals with estimated data do not align well with USEtox effect data. This suggests a lower degree of confidence in the EFs calculated for chemicals based on estimated data. Table 5 provides summary statistics of the regression analysis, presenting log-transformed pre-calculated USEtox 2.13 database EFs, ECOSAR EFs, and TEST EFs. Furthermore, the EFs calculated in this study with ECOSAR and TEST exhibit relatively low correlation, indicating that estimations from different QSARs can lead to differing results. It is important to note that QSAR models have a limited applicability domain, determined by their training data, making them more suitable for certain substance groups but not universally applicable. Typically, QSARs provide acute data for specific endpoints, covering a limited range of species groups. This data often requires extrapolation to the necessary endpoints and aggregation to calculate the EF for each chemical, introducing an additional layer of uncertainties. Therefore, when utilizing QSAR-based tools for calculating toxicity data to determine EFs, it is crucial to carefully understand the associated uncertainties and exercise caution. Users should take into account the applicability domain of QSARs for specific chemicals and avoid prioritizing QSAR derived EFs over experimental EFs, except in cases where experimental data are unavailable. This is supported by the findings presented in Table 5, which show a generally low correlation between experimental and estimated data based EFs.

Table 5 Overview of the regression analysis between log transformed pre-calculated USEtox 2.13 database EFs, ECOSAR EFs, and TEST EFs [PAF m3 kg−1]

The third comparison focused on contrasting the EFs based on HC50EC50eq derived from the experimental data in this study with the calculated EFs from ECOSAR, as illustrated in Fig. 6 and Table 6. In this investigation, 6029 chemicals were found in common between ECOSAR and the experimental data from this study.

Fig. 6 Regression analysis of ECOSAR EFs [PAF m3 kg−1] in freshwater ecosystem versus log transformed EFs [PAF m3 kg−1] calculated in this study with experimental ecotoxicity data: all (left), EC50 chronic (middle), and EC50 (right), with correlation: very weak (n = 6030, R2 = 0.26, r = 0.51), very weak (n = 2964, R2 = 0.26, r = 0.51), and very weak (n = 5409, R2 = 0.30, r = 0.55) respectively

Table 6 Overview of the regression analysis of log transformed EFs versus calculated log transformed ECOSAR EFs [PAF m3 kg−1]

The regression analysis between ECOSAR EFs and all the experimental data showed an R2 value of 0.26, indicating a very weak correlation. Adhering to the USEtox guideline of prioritizing EC50 chronic data, a separate analysis compared EFs calculated using only these data with ECOSAR EFs, resulting in an R2 value of 0.26, pointing to a very weak correlation. However, when EFs derived from both chronic EC50 and extrapolated acute EC50 were compared against ECOSAR EFs, the R2 value increased to 0.30, signifying a very weak correlation, but relatively stronger than the previous comparisons.

The low correlation between the experimental EFs and the ECOSAR EFs implies that the estimated values for chemicals do not align well with experimental values in general. This results in a low degree of confidence in the EFs calculated with estimated data. Table 6 provides summary statistics of the regression analysis, presenting log-transformed EFs calculated in this study versus log-transformed ECOSAR EFs.

The fourth comparison centered on contrasting the EFs based on HC50EC50eq derived from the experimental data in this study with the calculated EFs from TEST, as illustrated in Fig. 7 and Table 7. In this research, 6762 chemicals were common between TEST and the experimental data.

Fig. 7

Regression analysis of TEST EFs [PAF m3 kg−1] in freshwater ecosystem versus log transformed EFs [PAF m3 kg−1] calculated in this study with experimental ecotoxicity data: all (left), EC50 chronic (middle), and EC50 (right), with correlation: very weak (n = 6762, R2 = 0.34, r = 0.58), very weak (n = 2994, R2 = 0.31, r = 0.56), and very weak (n = 5794, R2 = 0.38, r = 0.62) respectively

Table 7 Overview of the regression analysis of experimental log transformed EFs versus calculated log transformed TEST database EFs [PAF m3 kg−1]

The regression analysis between TEST EFs and all the experimental data showed an R2 value of 0.34, indicating a very weak correlation. Following the USEtox guideline, which prioritizes EC50 chronic data, a separate analysis compared EFs calculated using only this data with TEST EFs. This yielded an R2 value of 0.31, pointing to a very weak correlation. However, when EFs derived from both chronic EC50 and extrapolated acute EC50 were compared against TEST EFs, the R2 value increased to 0.38, signifying a very weak correlation, but relatively stronger than the previous comparisons.

The low correlation between the experimental EFs and the TEST EFs implies that the estimated values for chemicals do not align well with experimental values. Consequently, there is a low degree of confidence in the EFs calculated with estimated data. Table 7 provides summary statistics of the regression analysis, presenting log-transformed EFs calculated in this study versus log-transformed TEST EFs.

The fifth comparison evaluated how the inclusion of different endpoints affects the EFs based on HC50EC50eq derived from the experimental data in this study, as illustrated in Fig. 8. The regression analysis between EFs from EC50 chronic data and EFs from all experimental data yielded an R2 value of 0.82, indicating a strong correlation. Likewise, when EFs from combined chronic EC50 and extrapolated acute EC50 were compared against EFs from all experimental data, a high R2 value of 0.94 was observed, signifying a robust correlation.

Fig. 8

Regression analysis between log transformed EFs [PAF m3 kg−1] in freshwater ecosystem calculated in this study with experimental ecotoxicity data: all vs EC50 chronic (left), all vs EC50 (middle), and EC50 chronic vs EC50 (right), with correlation: moderate (n = 5047, R2 = 0.82, r = 0.91), high (n = 9543, R2 = 0.94, r = 0.97), and high (n = 5047, R2 = 0.86, r = 0.93) respectively

When EFs derived from both chronic EC50 and extrapolated acute EC50 were compared against EFs from EC50 chronic data alone, the R2 value remained strong at 0.86, suggesting that while the inclusion of different endpoints has an effect, it is not a major determinant of the correlation strength. This implies a high degree of confidence in the EFs calculated when considering both chronic EC50 and extrapolated acute EC50 data. Table 8 provides summary statistics of the regression analysis, presenting log transformed EFs calculated in this study for all experimental data, all EC50 chronic data, and EC50 data.

Table 8 Overview of the regression analysis between log transformed EFs calculated with all experimental data, all EC50 chronic data, and EC50 data [PAF m3 kg−1]

The final comparison focused on contrasting the EFs derived from the experimental data in this study based on HC50EC50eq with the calculated EFs based on HC20EC10eq, as illustrated in Fig. 9 and Table 9.

Fig. 9

Regression analysis of log transformed EFs [PAF m3 kg−1] based on HC20EC10eq in freshwater ecosystem versus log transformed EFs [PAF m3 kg−1] based on HC50EC50eq calculated in this study with experimental ecotoxicity data: all (left), EC50 chronic (middle), and EC50 (right), with correlation: high (n = 11,295, R2 = 0.93, r = 0.97), moderate (n = 5047, R2 = 0.75, r = 0.86), and high (n = 9543, R2 = 0.88, r = 0.94) respectively

Table 9 Overview of the regression analysis between log transformed EFs calculated using two different methodologies [PAF m3 kg−1]

The regression analysis with all the experimental data showed an R2 value of 0.93, indicating a very strong correlation. Adhering to the USEtox guideline of prioritizing EC50 chronic data, a separate analysis compared EFs calculated using only this data with EFs based on HC20EC10eq, resulting in an R2 value of 0.75, pointing to a moderate correlation. However, when EFs derived from both chronic EC50 and extrapolated acute EC50 were compared against EFs based on HC20EC10eq, the R2 value increased to 0.88, signifying a strong correlation.

The strong correlation between the experimental EFs based on HC50EC50eq and the EFs based on HC20EC10eq implies that these data can be used to derive an extrapolation factor that converts EF (HC50EC50eq) to EF (HC20EC10eq) for chemicals with an available EF (HC50EC50eq). In this study, the extrapolation equation was derived by comparing log-transformed values of EF (HC50EC50eq) and EF (HC20EC10eq) across chemicals with data from at least three species groups, encompassing 3827 chemicals. The regression analysis, as shown in Fig. 10, yielded an R2 value of 0.93, indicating a strong correlation between the compared values. The resulting extrapolation equation is log EF (HC20EC10eq) = 0.75 + 0.9897*log EF (HC50EC50eq), with the default extrapolation factor as EF (HC20EC10eq) = 5.33*EF (HC50EC50eq).

Fig. 10

Regression analysis of log EF (HC20EC10eq) versus log EF (HC50EC50eq) for chemicals with data from at least 3 species groups, exhibiting a high correlation (n = 3827, R2 = 0.93, r = 0.96)
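The extrapolation from EF (HC50EC50eq) to EF (HC20EC10eq) can be sketched as follows. The intercept, slope, and default factor are the values reported in the text; the function names and the example input are hypothetical, and a base-10 logarithm is assumed. The two routes need not return identical values, because the regression slope is slightly below 1 while the default factor is a constant multiplier.

```python
import math

# Coefficients as reported in the text
INTERCEPT = 0.75
SLOPE = 0.9897
DEFAULT_FACTOR = 5.33

def ef_hc20_from_regression(ef_hc50):
    """Apply log EF(HC20EC10eq) = 0.75 + 0.9897 * log EF(HC50EC50eq).

    Assumes log is base 10 and ef_hc50 is a positive EF [PAF m3 kg-1].
    """
    if ef_hc50 <= 0:
        raise ValueError("EF must be positive to take its logarithm")
    return 10 ** (INTERCEPT + SLOPE * math.log10(ef_hc50))

def ef_hc20_from_default_factor(ef_hc50):
    """Apply the default factor: EF(HC20EC10eq) = 5.33 * EF(HC50EC50eq)."""
    if ef_hc50 <= 0:
        raise ValueError("EF must be positive")
    return DEFAULT_FACTOR * ef_hc50

# Hypothetical input value, for illustration only
example_ef_hc50 = 1000.0
print(ef_hc20_from_regression(example_ef_hc50))
print(ef_hc20_from_default_factor(example_ef_hc50))
```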

In this research, ecotoxicity data from REACH and CompTox version 2.2.1 underwent a systematic harmonization process, as discussed in the method section, aligning with USEtox guidelines to ensure data set integrity and relevance for EF calculations. This harmonization procedure led to a 37% reduction in data points. However, the study encountered limitations due to expertise constraints and the gaps and uncertainties inherent in ecotoxicity data. Challenges arose from the absence of a globally recognized standard for species naming and categorization in ecotoxicity, prompting the use of the US EPA ECOTOX knowledgebase [49]. Exposure classifications, as provided by Aurisano et al. [2], were employed as the best available estimates for algae tests, because the typical 72-h exposure lacks a clear acute/chronic differentiation. Endpoint classification complexities, especially within lower species sensitivity distributions, made distinguishing between NOEC, LOEC, and EC 1–10 values difficult [21]. Inconsistent units across endpoints resulted in the exclusion of endpoints whose units could not be converted. Additionally, the variety of effects studied for a specific endpoint, species, and exposure type introduced further complexity, given varying sensitivities and data limitations. The study aimed to offer a close approximation rather than an exact representation of the situation, acknowledging the influence of numerous identified and unidentified factors on ecotoxicity test outcomes and derived EFs.

Additionally, analyzing results for specific subgroups of chemicals could enhance our understanding. A limitation of this study is that chemicals have not been categorized into distinct subgroups such as organic, inorganic, elemental, organometallic, and petroleum products, among others. This limitation stems from the absence of extrapolation factors needed to convert various endpoints to chronic EC50 and chronic EC10 for different chemical subgroups. The main aim of this study was to provide EFs that aid in calculating CFs for chemicals, rather than to facilitate comparisons between distinct chemical subgroups. Investigating how different subgroups of chemicals exhibit varying ranges of EFs therefore falls outside the scope of the current study and represents a potential direction for future research.

In terms of applications, USEtox CFs are typically useful for a first-tier assessment of potential toxicity. However, if a substance appears to contribute significantly to the toxicity impact scores, it is recommended to verify and improve the reliability of the chemical-specific input data whenever possible. USEtox classifies ecotoxicity CFs as indicative or recommended based on data coverage across species and trophic levels. Recommended CFs require effect data from at least three different species covering at least three different trophic levels [12]. The scarcity of toxicity data points per chemical for various species at desired endpoints is a notable limitation in deriving EFs. To address this, Table S1 in the supplementary information of this study details the number of data points, species, and species groups per chemical, highlighting the uncertainties associated with each EF due to limited data availability. Moreover, uncertainty can also arise when the available data require conversion to the desired format through extrapolation methods, such as changing from acute to chronic exposure classes or changing endpoints. The available extrapolation and conversion factors are derived from limited data, contributing further to the uncertainty of the results.
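The data-coverage rule for recommended versus indicative CFs can be sketched as a simple check. This is a minimal illustration, assuming each effect record is a (species, trophic level) pair and that any chemical with data that misses the stated threshold is treated as indicative; the function name and the example records are hypothetical and do not reproduce the USEtox implementation.

```python
def classify_cf(records):
    """Classify a chemical's ecotoxicity CF by data coverage.

    records: iterable of (species, trophic_level) pairs for one chemical.
    Returns "recommended" when data cover at least three different species
    and at least three different trophic levels, otherwise "indicative"
    (assumption: every chemical below the threshold is indicative).
    """
    records = list(records)
    if not records:
        raise ValueError("no effect data: a CF cannot be classified")
    species = {name for name, _ in records}
    trophic_levels = {level for _, level in records}
    if len(species) >= 3 and len(trophic_levels) >= 3:
        return "recommended"
    return "indicative"

# Hypothetical records, for illustration only
well_covered = [
    ("Daphnia magna", "primary consumer"),
    ("Raphidocelis subcapitata", "primary producer"),
    ("Danio rerio", "secondary consumer"),
]
poorly_covered = [
    ("Daphnia magna", "primary consumer"),
    ("Ceriodaphnia dubia", "primary consumer"),
    ("Danio rerio", "secondary consumer"),
]
print(classify_cf(well_covered))
print(classify_cf(poorly_covered))
```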

Conclusions

The escalating use of chemicals presents both opportunities and dilemmas. While their benefits are evident, concerns about their potential environmental and health impacts are growing, prompting questions about the sustainability of current practices. This underscores the need to transition from a preliminary lifecycle perspective to a comprehensive LCA that quantifies the environmental concerns of chemicals. Central to this is the availability of EFs for chemical emissions. While risk assessment tools have been employed for years, generating vast data, there is a pressing need to consolidate this information for LCA applications. Drawing from the REACH and CompTox databases, this study identified 11,295 unique chemicals, of which only 2426 are currently cataloged in the USEtox database. To address this, this study calculated EFs for an additional 8869 chemicals, enhancing their representation in LCA studies. In total, this research covers 11,368 chemicals, with 2426 overlapping with USEtox. However, data scarcity posed challenges, leading to employing QSAR models like ECOSAR and TEST for estimating ecotoxicity data.

The strong correlation between the experimental EFs and those in the USEtox database underscores a high degree of confidence in the calculations for chemicals not yet included in USEtox. However, discrepancies arise when comparing USEtox EF values with derived EC50 chronic values, likely due to variations in foundational ecotoxicity data. This correlation improves when both chronic and extrapolated EC50 data are considered, emphasizing the value of extrapolated data in broadening the data set and species representation. Conversely, the weak correlation between estimated EFs and those in USEtox, as well as between EFs derived from ECOSAR and TEST, suggests that different QSAR models yield varied results. This divergence between estimated and experimental data from REACH and CompTox databases underscores the need for caution when relying on estimated data. As chemical data continue to evolve, EFs should be periodically updated so that they reflect the most recent data. There is also a recognized need to enhance consensus between risk assessment and USEtox-based LCA calculations. Achieving this alignment would optimize resources and allow for the reliable use of both methodologies in quantifying the toxicity impacts of chemicals.