Value and limitation of structure-based profilers to characterize developmental and reproductive toxicity potential

The uncertainty regarding the safety of chemicals leaching from food packaging has attracted attention. In silico models offer a means of screening these chemicals, since many are toxicologically uncharacterized. For hazard assessment, information on developmental and reproductive toxicity (DART) is needed. The possibility of applying in silico toxicology to identify and quantify DART alerts was investigated. Open-source models and profilers were applied to 195 packaging chemicals and analogues. An approach based on DART and estrogen receptor (ER) binding profilers and molecular docking identified all but one of the chemicals with documented DART properties. Twenty percent of the chemicals in the database known to be negative in experimental studies were classified as positive. The scheme was then applied to 121 untested chemicals. Alerts were identified for sixteen of them, five being packaging substances and the others structural analogues. Read-across was then developed to translate alerts into quantitative toxicological values. These values can be used to calculate margins of exposure (MoE), the size of which reflects the level of safety concern. The approach appears valuable for hazard characterization of toxicologically untested packaging migrants. It is an alternative to applying a default uncertainty factor (UF) to an animal chronic toxicity value to handle the absence of DART data in hazard characterization.


Introduction
The uncertainty regarding the safety of food contact materials (FCM) has triggered increasing public, scientific and regulatory attention. The most consistent data gap concerns migrating substances, which originate from impurities in starting raw materials and from reaction and degradation products formed during material manufacture (the so-called Non-intentionally Added Substances, NIAS). For many of these chemicals, no toxicological data are available, and assessing their potential health risk is therefore highly challenging. Schilter et al. (2014) developed an approach to establish the level of safety concern of chemicals lacking toxicity data. This approach, which allows toxicologically untested chemicals to be prioritized, requires quantitative prediction of toxicological endpoints as well as an exposure estimate to calculate a margin of exposure (MoE). Up to now it has relied only on prediction of chronic toxicity (such as from 2-year rodent studies), which in general constitutes the most sensitive endpoint (Kroes et al. 2000). However, for a complete hazard assessment, information on developmental and reproductive toxicity (DART) is also needed (such as that obtained from two-generation and teratogenicity studies).
DART includes a number of adverse outcomes, such as abnormalities in fetal growth, fetal death, structural dysmorphogenesis, reproductive impairments and cognitive dysfunctions. Importantly, the occurrence of maternal toxicity complicates the interpretation of such endpoints, since it may indirectly induce developmental effects. Experimental research has suggested that DART may be elicited by a diversity of mechanisms (Wu et al. 2013), although for most chemicals known to produce such effects, the exact mechanisms of action are not fully elucidated (National Research Council 2000). Activation and/or inhibition of hormone nuclear receptors are well-described mechanisms of action of certain endocrine disruptors and are considered highly plausible molecular initiating events in chemical-induced DART (National Research Council 2000; UNEP/WHO 2013). Indeed, hormonal systems play major roles in normal developmental processes, and interfering with these during critical windows of development is known to often result in permanent adverse effects later in life (UNEP/WHO 2013). Therefore, endocrine activity, such as through nuclear receptor interaction, can be considered a strong alert for DART.
In the absence of appropriate DART data, the application of uncertainty factors (UF) (3 or 10) to the chronic animal no observed adverse effect level (NOAEL) has been recommended by governmental bodies, i.e., the US Environmental Protection Agency (EPA). In other frameworks, the size of such a factor has been revised based on the level of incompleteness of available toxicological databases, e.g., when a chronic NOAEL is available in only one species (Gadagbui et al. 2005). More recently, Blackburn et al. (2015) proposed that the results of in silico models [i.e., the DART decision tree developed by Wu et al. (2013)] could be valuable in informing the appropriate magnitude of uncertainty regarding developmental and reproductive toxicity.
The present work investigates the possibility of applying in silico toxicology to identify and quantify alerts for DART, with the ultimate aim of avoiding the systematic application of conservative default UFs to handle the absence of experimental DART data in hazard characterization.
Publicly available predictive in silico tools exist, e.g., the models implemented in the VEGA platform (www.vegahub.eu) and the OECD QSAR Toolbox profilers (www.qsartoolbox.org/). Some of them address developmental toxicity only, others combine developmental and reproductive toxicity, while several refer to endocrine potential (i.e., estrogenic and androgenic potential) (Roncaglioni et al. 2008; Cassano et al. 2010; Marzo et al. 2016; Porta et al. 2016; Manganelli et al. 2019). However, they often give only qualitative information on the presence or absence of a DART alert. Such output is suitable for hazard identification (identification of the intrinsic properties of a chemical to cause harm), but it provides no information on hazard characterization (how much is needed to trigger a toxic effect) and even less on potential health risk. Moreover, some models have a quite restricted domain of applicability in terms of chemical diversity, while others are applicable to a broader chemical space. Because of this large variety of data and chemical diversity, integrating and applying these models is very challenging. Aside from (quantitative) structure activity relationship ((Q)SAR) models, molecular docking is an efficient means of identifying potentially endocrine active chemicals (Vedani et al. 2006; Zhang et al. 2016), although it is more time-consuming than (Q)SAR.
In this context, the present work aimed to develop and apply a fast in silico strategy to predict the potential developmental and reproductive toxicity of FCM chemicals for which experimental data are missing. It follows basic principles similar to those reported by Blackburn et al. (2015). A stepwise approach was designed, incorporating structural alerts (SA) and docking on nuclear receptors, followed by read-across to quantify the alert (i.e., defining a NOAEL/LOAEL). Before being applied to a large list of FCM chemicals, the value of this new approach was tested in a pilot study on a restricted set of toxicologically uncharacterized FCM substances. Importantly, it assigns a quantitative value (NOAEL/LOAEL) for comparison with exposure to obtain a MoE (ratio between toxic level and exposure). The size of the MoE indicates the level of safety concern. Introducing this DART prediction approach into the decision tree proposed by Schilter et al. (2014) is expected to improve the establishment of safety concern of toxicologically uncharacterized food chemicals and could replace the use of a default factor to account for the uncertainties related to the absence of experimental DART data.
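As a worked illustration of the MoE defined above (ratio between toxic level and exposure), the following minimal Python sketch computes it for hypothetical inputs. The function name, the assumed unit (mg/kg body weight/day) and the numerical values are introduced here for illustration only and are not taken from the study.

```python
def margin_of_exposure(toxic_level: float, exposure: float) -> float:
    """Return the MoE as toxic level divided by exposure.

    Both arguments must be expressed in the same unit
    (assumed here: mg/kg body weight/day).
    """
    if toxic_level < 0:
        raise ValueError("toxic level must be non-negative")
    if exposure <= 0:
        raise ValueError("exposure must be strictly positive")
    return toxic_level / exposure


# Hypothetical values, not from the study: a read-across NOAEL of
# 100 mg/kg bw/day and an estimated exposure of 0.5 mg/kg bw/day.
moe = margin_of_exposure(100.0, 0.5)
print(f"MoE = {moe:.0f}")
```

For a given toxic level, a lower exposure gives a larger MoE, which corresponds to a lower level of safety concern.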

Data set curation
The list of 195 chemicals used to develop and validate this strategy includes plastic and ink FCM migrating substances (Price and Chaudhry 2014). We extended the list of 183 chemicals used in our previous work on mutagenicity prediction by adding twelve FCM structural analogues identified by Price and Chaudhry (2014). In the curation process, experimental values reported by Price and Chaudhry (2014) were revised, and some new ones were introduced following literature and database searches, e.g., from freely available databases compiled for building (Q)SAR models:
• Experimental information from the database of the VEGA Estrogen receptor Relative Binding Affinity (ERRBA) model (Roncaglioni et al. 2008) [...] (Cassano et al. 2010), combining subsets of information from the Teratogen Information System (TERIS) and US Food and Drug Administration (FDA) guidelines. Both sources contain assessments of human and animal data on potentially teratogenic chemicals.
• The P&G data set with DART effects mediated by nuclear receptors and by other mechanisms, obtained from a detailed literature review (Wu et al. 2013). It also includes effects that occur in the presence of maternal toxicity. This data set has been used to build the developmental/reproductive models (P&G) implemented in the VEGA platform.
The final data set of 195 chemicals included 74 chemicals with experimental data: 14 substances showed at least one piece of evidence for DART and/or endocrine activity, and the remaining 60 were negative for one or more endpoints.
Curation of chemical structures was performed as described previously. The 74 toxicologically characterized chemicals were used to define our approach, which was then used to evaluate and prioritize the 121 remaining chemicals.

Profilers selected for DART hazard identification
The stepwise approach was developed on the 74 experimentally characterized chemicals with the aim of building a workflow that can be applied easily and quickly to prioritize long lists of chemicals. Therefore, predictions from several models mentioned in the data set curation section were evaluated, most of them freely available in the VEGA platform (CAESAR for developmental toxicity, IRFMN/CERAPP, ERRBA) and in the OECD QSAR Toolbox (P&G DART and ER binding profilers). From a toxicological perspective, all these models need to be used to achieve consistent coverage of potential DART. However, since experimental data were mostly collected from the training sets of these models, and given the relatively small number of chemicals of interest, the selection of only two OECD QSAR Toolbox profilers was sufficient to establish the proof of concept for this pilot study. It should be noted that Toolbox profilers are not intended to be used as models per se, owing to recognised limitations (e.g., lack of an applicability domain). We selected the following profilers, keeping these weaknesses in mind.

P&G DART v.1.0 scheme Toolbox v 4.1
This profiler of the QSAR Toolbox is an adaptation of the decision tree for identifying chemicals with structural features associated with DART potential, developed by Wu et al. (2013) on the basis of a detailed review of 716 chemicals. These chemicals were grouped into different categories and sub-categories. After a chemical is run through the decision tree, the results indicate whether the chemical of interest is associated with chemical structures known to have DART, or whether it has structural features outside the chemical domain of the decision tree. We used the P&G DART profiler to achieve broad coverage of the relevant endpoints described by Wu et al. (2013), since it was built on evidence of both reproductive and developmental effects, mediated by nuclear receptors, e.g., AR and ER, and/or other mechanisms. P&G developed an internal automated version of the DART decision tree that defines a chemical domain of the tree and considers a chemical out of domain if its scaffold does not overlap with any of the structures considered in the development of the tree. One limitation of the QSAR Toolbox version is that it does not assess the match between the underlying chemical scaffold of the target compound and the structures addressed by the DART decision tree. Consequently, whether a chemical is inside or outside the chemical domain of the tree needs to be determined manually (Blackburn et al. 2015). In our study we highlighted these limitations.

Estrogen receptor (ER) binding profiler Toolbox v 4.1
The Toolbox profiling scheme for ER binding classifies chemicals, based on their molecular weight (MW) and structural features, as very strong, strong, moderate or nonbinders, drawing on experimental data and literature evidence (Schultz et al. 2002; Gallegos Saliner et al. 2003; Hamblen et al. 2003). This profiler was selected to cover available experimental data on ER-mediated effects that are not necessarily linked to DART effects, or for which DART potential has not yet been identified.

VirtualToxLab (VTL, version 5.8)
VTL is an in silico tool for predicting the endocrine activity potential of chemicals. The technology is based on an automated protocol that simulates and quantifies the binding of small molecules towards a series of sixteen proteins, mainly nuclear receptors. The approach is not training set dependent, so the applicability domain concept does not apply; the only restriction is that the ligand should have a MW below 500. The approach has been extensively validated (Vedani et al. 2012), and in our study, the predictions were used to confirm ER and AR structural alerts highlighted by the other models. VTL is based on the three-dimensional interactions between the ligand and nuclear receptors, but it does not indicate whether binding will trigger an endocrine effect. Predicted binding affinities were converted to -log units and expressed in the following ranges: i. strong binders: binding affinity ≥ 9 (nanomolar range); ii. moderate binders: 9 > binding affinity ≥ 6 (micromolar range); iii. weak binders: 6 > binding affinity ≥ 3 (millimolar range).
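The three affinity ranges above can be expressed as a small classification function. This is a minimal sketch, not the VTL implementation: the function names are hypothetical, the input is assumed to be a binding constant in mol/L, and the label returned for values below 3 is an assumption, since the text defines no class for that range.

```python
import math


def minus_log_affinity(binding_constant_molar: float) -> float:
    """Convert a binding constant in mol/L (assumed input) to -log10 units."""
    if binding_constant_molar <= 0:
        raise ValueError("binding constant must be strictly positive")
    return -math.log10(binding_constant_molar)


def binding_class(minus_log_value: float) -> str:
    """Map a -log affinity to the ranges given in the text."""
    if minus_log_value >= 9:
        return "strong"      # nanomolar range
    if minus_log_value >= 6:
        return "moderate"    # micromolar range
    if minus_log_value >= 3:
        return "weak"        # millimolar range
    # Assumption: the text defines no class below 3.
    return "below reported ranges"


# Hypothetical predicted affinity of 2.5e-8 mol/L (25 nM),
# i.e. about 7.6 in -log units.
print(binding_class(minus_log_affinity(2.5e-8)))
```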

Read-across for DART hazard characterization
To assign a quantitative developmental/reproductive toxicity value to chemicals for which a DART alert was identified, read-across was performed. Developmental/reproductive (N)LOAEL values of the most similar chemical belonging to the relevant P&G (sub)category were used for that purpose. Similarity was based on the Tanimoto coefficient as implemented in KNIME (Berthold et al. 2008), ranging from 0 (i.e., maximum dissimilarity) to 1 (i.e., maximum similarity). Similar chemicals were selected for read-across based on expert judgment.
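For readers unfamiliar with the Tanimoto coefficient, the sketch below computes it in plain Python on two sets of structural features (for example, the "on" bit positions of a fingerprint). It is an illustration only and not the KNIME implementation used in the study; the fingerprint type is not specified in the text, and the feature sets shown are hypothetical.

```python
from typing import Hashable, Set


def tanimoto(features_a: Set[Hashable], features_b: Set[Hashable]) -> float:
    """Tanimoto coefficient: shared features / features present in either set."""
    union_size = len(features_a | features_b)
    if union_size == 0:
        # Assumption: two empty feature sets are treated as dissimilar.
        return 0.0
    return len(features_a & features_b) / union_size


# Hypothetical fingerprint bit positions for a target and a candidate analogue.
target = {1, 4, 7, 9}
analogue = {1, 4, 8, 9, 12}
print(round(tanimoto(target, analogue), 2))
```

Here three features are shared out of six distinct features overall, so the coefficient lies midway between maximum dissimilarity (0) and maximum similarity (1).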

Results
The stepwise approach (shown in Fig. 1) was developed on the 74 experimentally characterized chemicals to first identify and then characterize their DART potential. P&G DART and ER binding profilers were used for hazard identification followed by docking to confirm nuclear receptor binding mediated mechanisms. A read-across step was then introduced and applied to chemicals identified as positive for their DART potential.

Structural profiler application on experimentally positive chemicals
The P&G DART scheme correctly categorized seven of the fourteen experimentally positive chemicals, and the ER binding profiler classified five of the remaining ones as potential binders. Their combination gave alerts for twelve toxicologically positive substances (see Table 1 and Supplementary Table 1S). Overall, six of the experimentally positive chemicals were associated with ER binding. These substances were identified only by the ER binding profiler (see Table 1), except for BPA, which was associated with nuclear receptor mediated DART according to the P&G DART scheme. VTL confirmed the binding profiles of these chemicals and gave more specific details on binding affinity, including for different receptor subtypes. For example, whereas the ER binding profiler gave the same label of "very strong binder" to all bisphenol F isomers, VTL was able to discriminate between them, indicating higher (moderate) affinity of 4,4′-bisphenol F towards ER β and weak binding for the 2,2′- and 2,4′-isomers.
Fig. 1 (caption, partially recovered) (2) Chemicals assigned to nuclear receptor binding mediated effects are screened by VTL to confirm binding to nuclear receptors (NR) through the docking method. (3) For chemicals associated with a SA from the P&G scheme and/or the ER binding profiler, read-across is performed to assign an experimental quantitative (N)LOAEL value based on the most similar compound contained in the SA class
The two experimentally positive chemicals not flagged by either profiler were 2,6-ditert-butyl-4-(3,5-ditert-butyl-4-hydroxyphenyl)phenol, which was labelled as positive by Price and Chaudhry (2014), and chlorobenzilate, which has a positive AR activity label from CoMPARA. Further literature and database searches did not provide any experimental data for 2,6-ditert-butyl-4-(3,5-ditert-butyl-4-hydroxyphenyl)phenol, and we concluded that the experimental value reported by Price is highly uncertain. Chlorobenzilate was used as a pesticide and is reported as an FCM structural analogue by Price; thus, it is not specifically a food contact chemical. This chemical was evaluated as a probable human carcinogen by the US EPA. Injury to the sperm and atrophy of the testes have been observed in rats exposed to chlorobenzilate in their diet (US EPA 1999), which confirms that this chemical is a false negative.

Structural profiler application on experimentally negative chemicals
We also examined the 60 chemicals which were experimentally negative for DART/estrogenic/androgenic potential.
Twelve of them were classified as positive by at least one profiler (nine by P&G DART and three by the ER binding profiler) (see Tables 2 and 2S). We examined the experimental data for these chemicals to understand whether their positive labels indicate true misclassification. Triphenyl phosphate and triethyl phosphonoacetate were categorized by the P&G DART profiler as organophosphorus compounds. Triphenyl phosphate, a printing ink compound (Bradley et al. 2013), showed no evidence of toxicity to fertility in an old one-generation study in rats (Welsh et al. 1987) or in a recently performed prenatal developmental toxicity study in rabbits according to ECHA registration data (NOAEL = 200 mg/kg BW/day) (ECHA 2019). It was also negative for in vitro AR binding according to CoMPARA data. For triethyl phosphonoacetate, an additive with restricted use in PET (EFSA 2008), there were no direct effects on visceral and/or skeletal development of foetuses in a prenatal developmental toxicity study, and the repeated-dose toxicity study did not show any effect on fertility, as reported in ECHA registration data (ECHA 2019). Although the two chemicals falling in this category were not developmental toxicants according to our experimental data search, the alert given by the profiler remains important information. Indeed, organophosphates represent a large subcategory of chemicals that may cause neurodevelopmental toxicity, primarily through inhibition of the acetylcholinesterase (AChE) enzyme (Wu et al. 2013).
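The raw screening counts reported so far (12 of the 14 experimentally positive chemicals flagged, and 12 of the 60 experimentally negative chemicals flagged) can be summarised as simple rates. The sketch below is only this arithmetic; the helper name is hypothetical. It treats every flagged negative as a false positive, which is a simplification, since a positive profile for an experimentally negative chemical does not always indicate a true misclassification.

```python
def rate(flagged: int, total: int) -> float:
    """Fraction of chemicals in a group that received at least one alert."""
    if total <= 0:
        raise ValueError("total must be strictly positive")
    if not 0 <= flagged <= total:
        raise ValueError("flagged must lie between 0 and total")
    return flagged / total


# Counts taken from the text.
positives_total, positives_flagged = 14, 12
negatives_total, negatives_flagged = 60, 12

sensitivity = rate(positives_flagged, positives_total)
false_positive_rate = rate(negatives_flagged, negatives_total)

print(f"flagged positives: {sensitivity:.1%}")
print(f"flagged negatives: {false_positive_rate:.1%}")
```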
4-Methylbenzophenone, a photoinitiator for printing inks in the packaging field (EFSA 2009a), was classified as negative by Price and Chaudhry (2014) and falls in the P&G DART category "Toluene and small alkyl toluene derivatives (8a)". Chemicals associated with this category are primarily developmental toxicants, and reprotoxic in the presence of a nitro group (Wu et al. 2013). Unfortunately, literature searches failed to identify any experimental data that could inform on potential DART activity. Therefore, it was not possible to confirm or refute the hypothesis that it is a false positive.
Propane-1,2-diol and butane-1,3-diol fall in the P&G DART category "Di-substituted hydrocarbons (24a)". This category is mostly based on evidence of toxicity exerted by chemicals that are not reproductive or developmental toxicants per se but are converted into active metabolites (Wu et al. 2013). This class does not refer to AR and ER mediated mechanisms, in line with the available AR and ER experimental data reported in our database. For propane-1,2-diol, a food additive in the EU, no adverse effects were observed in the available oral reproductive and developmental toxicity studies (EFSA 2018). Similarly, 1,3-butanediol (1,3-butylene glycol), included in the EU register of flavouring substances used in or on foodstuffs, was considered not to be developmentally toxic (EFSA 2011).
n-Butyl acrylate is classified in the P&G category "Vinyl amide, aldehyde and ester derivatives (21a)", which is based on teratogenic chemicals acting through non-NR mediated DART mechanisms, in line with the available experimental data showing no AR or ER activity. For this chemical, adverse effects were observed only in the presence of maternal toxicity after exceeding the upper limit dose recommended by the respective OECD guidelines, according to ECHA registration data. Since the P&G scheme also considers DART effects occurring in the presence of maternal toxicity (Wu et al. 2013), the information available indicates that this chemical may not be misclassified according to the intrinsic P&G DART criteria. However, classifying a chemical as a developmental toxicant when the effect is observed at a maternally toxic dose is a matter of debate. n-Butyl methacrylate and ethyl acrylate fall in the same P&G category as butyl acrylate, and similar reasoning may be applied.
The three remaining chemicals were not classified for DART effects by P&G, but they are labelled as binders by the ER binding profiler and by VTL. 2-Hydroxybenzophenone is a false positive, since it is experimentally negative for ER binding. We did not find further experimental information on DART/endocrine activity for this chemical through database and literature searches.
Octabenzone and 2-aminoanthraquinone are structural analogues of certain FCM chemicals. Octabenzone is experimentally negative for AR and ER binding activity but is classified as a binder by the ER Toolbox profiler, which was confirmed by VTL. It cannot be considered a false positive, since the experimental negative data refer to binding related to activity, while the ER binding profiler refers solely to binding. 2-Aminoanthraquinone, profiled as a strong ER binder by the Toolbox, is classified by VTL as a weak binder towards ER and as a nonbinder towards AR, in line with available

Read-across
For each chemical associated with a P&G DART category and/or with nuclear receptor binding confirmed by VTL, the most similar compound of the relevant P&G category was identified and proposed for read-across. Experimental developmental/reproductive (N)LOAEL data of the most similar chemicals were used to fill the toxicological data gaps of these target chemicals when appropriate. Organophosphates were not considered for read-across, since a specific threshold of toxicological concern is available and can be readily applied (Kroes et al. 2004). Some chemicals [i.e., bisphenol A (BPA), glycolic acid, 2-ethylhexan-1-ol, 2-ethylhexanoic acid, 2-propylhexanoic acid] are part of the categories/data collected by Wu et al. (2013); thus, the most similar compound is the compound itself (similarity index = 1). Among the remaining chemicals, 2- and 4-methylbenzophenone, falling in the P&G (sub)category "Toluene and small alkyl toluene derivatives (8a)", have similarity indices of 0.42 and 0.48 with o-xylene and 4-tert-butyltoluene, their most similar chemicals in this P&G category (see Supplementary Tables 1S and 2S). They share only the toluene ring with these chemicals and not the rest of the methylbenzophenone structure, so similarity is insufficient to support a fair read-across comparison. Similarly, octabenzone, 3-tert-butyl-4-hydroxyanisole (3-BHA) and 2-aminoanthraquinone, profiled by the ER binding profiler, share only some functional groups with their most similar chemicals in the P&G category 2b (related to ER and AR binding), which are tamoxifen, BPA and isoproteron, with similarity values of 0.42, 0.48 and 0.32, respectively. Based on this evidence, for chemicals falling in these two categories, i.e., Toluene and small alkyl toluene derivatives (8a) and ER and AR binding (2b), we established a cut-off similarity value of 0.5 as an initial criterion to consider read-across reliable.
For most chemicals associated with ER binding confirmed by VTL (i.e., bisphenol F isomers, 2- and 4-hydroxybenzophenone, 3-BHA), the most similar chemical found in the P&G category 2b (related to ER and AR binding) was BPA (see Tables 1S and 2S). Their similarity values in comparison with BPA were greater than 0.5 except for 3-BHA, which was slightly below this threshold. 2-Aminoanthraquinone and octabenzone are structural analogues of FCM (as reported by Price) and not food contact chemicals as such. For analogues exceeding the expert-based 0.5 cut-off value in these two classes, we searched for developmental/reproductive (N)LOAEL values to be assigned to the target chemicals through read-across (see Table 3). In some cases, inhalation values were found, which may be converted into oral values (Schilter et al. 2014). Actual (N)LOAEL values for developmental/reproductive toxicity of the target chemicals, where available, are reported to test the approach. However, in most cases the experimental designs differed and the highest doses tested were too far apart to enable a fair comparison between (N)LOAEL values.
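The analogue selection described in this section (take the most similar chemical in the relevant P&G category, then accept it only above the 0.5 cut-off) can be sketched as follows. This is an illustrative simplification that leaves out the expert judgment applied in the study. The function name is hypothetical, similarity values are assumed to be precomputed, and whether a score exactly equal to the cut-off is accepted is an assumption.

```python
from typing import Dict, Optional, Tuple

SIMILARITY_CUTOFF = 0.5  # expert-based cut-off reported in the text


def pick_analogue(
    similarities: Dict[str, float],
    cutoff: float = SIMILARITY_CUTOFF,
) -> Optional[Tuple[str, float]]:
    """Return the most similar category member if it passes the cut-off."""
    if not similarities:
        return None
    name, score = max(similarities.items(), key=lambda item: item[1])
    # Assumption: a score exactly equal to the cut-off is accepted.
    return (name, score) if score >= cutoff else None


# Octabenzone: 0.42 with tamoxifen is the value reported in the text;
# the second entry is a hypothetical, less similar category member.
print(pick_analogue({"tamoxifen": 0.42, "hypothetical member": 0.30}))

# Hypothetical target whose best match is BPA with an assumed score of 0.62.
print(pick_analogue({"BPA": 0.62, "tamoxifen": 0.35}))
```

The first call yields no analogue, reflecting that read-across was not considered reliable below the cut-off; the second returns the best match together with its score.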

Design of a stepwise approach
Based on the results described above, a stepwise approach for DART hazard identification and characterization was designed, as depicted in Fig. 1. The Toolbox profilers are the first steps to identify DART hazard. Given the limited number of chemicals in this study, sequential or parallel application of these two profilers did not affect the final classification and results. Chemicals assigned to nuclear receptor binding mediated effects are screened using VTL to confirm binding to nuclear receptors through the docking method. Considering the limitations of structural profilers, positive ER binding from the Toolbox is considered only if confirmed by VTL. For each chemical associated with a SA from the P&G scheme and/or the ER binding profiler, a similarity search is conducted to identify the most similar chemical contained in the relevant P&G category. For SA not identified by the P&G DART scheme but identified by the ER binding profiler and confirmed by VTL, the similarity search is performed with the compounds contained in the P&G category related to ER and AR binding (2b) (Wu et al. 2013). Read-across is used to assign an experimental quantitative (N)LOAEL value based on the most similar compound contained in the SA class.
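The decision logic of the stepwise approach can be summarised in a few lines of code. The sketch below is a simplified reading of the paragraph above, not the authors' software: the function and argument names are hypothetical, profiler and docking outcomes are assumed to be available as plain inputs, and the VTL confirmation of P&G alerts linked to nuclear receptors is not modelled.

```python
from typing import Optional

ER_AR_BINDING_CATEGORY = "2b"  # P&G category related to ER and AR binding


def readacross_category(
    pg_dart_category: Optional[str],
    er_profiler_binder: bool,
    vtl_confirms_binding: bool,
) -> Optional[str]:
    """Return the P&G category to search for an analogue, or None if no alert.

    pg_dart_category: category label from the P&G DART profiler, or None.
    er_profiler_binder: True if the ER binding profiler flags the chemical.
    vtl_confirms_binding: True if docking confirms nuclear receptor binding.
    """
    if pg_dart_category is not None:
        # A P&G structural alert directly defines the read-across category.
        return pg_dart_category
    if er_profiler_binder and vtl_confirms_binding:
        # An ER profiler alert counts only when confirmed by docking.
        return ER_AR_BINDING_CATEGORY
    return None


print(readacross_category("8a", False, False))
print(readacross_category(None, True, True))
print(readacross_category(None, True, False))
```

The returned category label is the set in which the similarity search for a read-across analogue would then be carried out.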

Application of the stepwise approach to toxicologically uncharacterized FCM chemicals
Finally, the stepwise approach was applied to the 121 untested chemicals (for DART) starting with profilers' application: overall, alerts were identified for 18 chemicals (see Table 4). The P&G DART scheme raised alerts for 13 out of 121 substances and ER binding profiler for seven of them, two were classified positive by both.

Discussion
The proposed approach is an improved strategy for translating DART alerts into a quantitative value to be used to establish a level of safety concern according to risk assessment principles. It represents a proof-of-concept study based on a limited number of FCM substances, with the ultimate aim of being applied later to a significantly larger number of compounds. Here we applied profilers related to DART covering different mechanisms of action, represented by the SA embedded in the P&G scheme and by NR-binding prediction. The selection of models used was tested on a limited number of chemicals. It can be refined at any time if required, e.g., through inclusion of new models for hazard identification. The stepwise approach can be considered a workflow that can be applied using other (even proprietary) validated DART models, both for SA/QSAR and for docking. In addition, read-across will be more reliably applicable if based on a bigger library of food contact chemicals and/or other chemicals with appropriate toxicity data available. The Toolbox profilers are not intended to be used as independent in silico models. However, for documented DART chemicals, the combination of the P&G DART scheme and ER binding profiler followed by VTL delivered almost full coverage of positive hits on the small set of molecules we considered, leading to only one misclassification, for the FCM analogue chlorobenzilate. For this chemical, AR binding activity was not identified by the Toolbox, highlighting the need to apply at least one additional model to specifically cover AR binding activity. Consequently, the inclusion of more in silico models covering a broader range of NRs in the first screening phase is advisable. This will be addressed in the next stage, when this stepwise approach will be applied to a larger number of substances. On the other hand, for experimentally negative chemicals, positive toxicity profiling does not indicate misclassification by default. Indeed, AR or ER binding may not always be associated with developmental and/or reproductive toxicity.
The first screening phase reduced the number of chemicals to be tested with docking, which is a more sophisticated but more time-consuming technique requiring more specialized expertise. For some toxicologically characterized and uncharacterized chemicals, docking highlighted stronger binding affinities for specific receptors. This may help to define the best analogues for read-across. Additionally, docking did not confirm ER binding for two uncharacterized structural analogues of FCMs and therefore constituted the driver for their actual classification.
Most importantly, we propose a read-across approach that translates structural alerts into a quantitative (N)LOAEL to be compared with an exposure estimate to obtain a margin of exposure (MoE). The size of the MoE informs on the level of safety concern. The read-across approach is based on structural similarity and is, furthermore, backed up by mechanistic plausibility. The chemical similarity search was performed among chemicals belonging to the same P&G category as the target, mostly sharing the same mode of action (MoA). Importantly, similarity analysis highlighted some weaknesses of the P&G DART scheme profiler implemented in the QSAR Toolbox. In particular, chemicals labeled as "Toluene and small alkyl derivatives (8a)" have low similarity indices when compared to their analogues and, most importantly, contain more than 50% of other significant structural moieties in addition to the toluene structural scaffold.
In a number of cases, prioritized chemicals were in the database of the model, so their quantitative reproductive/developmental (N)LOAELs were used directly to quantify structural alerts. For most of the others, read-across could be applied using the most suitable analogue of the proper class. For the remaining chemicals, read-across by one-to-one comparison between the target and the most similar chemical in the relevant P&G category was not feasible due to a lack of appropriate experimental data for the comparator. It is important to note that read-across can be performed using a category approach, extending the comparison to relevant chemicals (with suitable DART data) falling in the same category but not contained in the P&G database. Finally, when read-across is not applicable to chemicals presenting clear DART alerts, a UF of 10 can still be applied to the chronic (N)LOAEL, as proposed by Blackburn et al. (2015). These authors showed that chemicals identified by the original DART tree (Wu et al. 2013) as being related to structures with known DART toxicity would potentially have lower DART NOAELs compared to their respective repeated-dose toxicity NOAELs than structures that lacked this association. They proposed applying a UF of 10 to chronic NOAELs for chemicals giving an exact match with structures with DART precedent according to the P&G decision tree, or a UF of three for chemicals not matching a DART alert but within the domain of the decision tree. In our case the most conservative UF of 10 would be recommended, since the P&G DART profiler implemented in the Toolbox does not provide information on the applicability domain.
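The fallback rule summarised above can be written as a short sketch. It only paraphrases the rule as described in this paragraph and is not an implementation published by Blackburn et al. (2015). The function names and the example NOAEL are hypothetical, and the factor returned for a chemical that is known to be out of domain is an assumption, since the text does not address that case.

```python
from typing import Optional


def default_uncertainty_factor(
    matches_dart_precedent: bool,
    in_domain: Optional[bool],
) -> int:
    """Choose the default UF applied to a chronic NOAEL.

    in_domain is None when no applicability-domain information exists,
    as for the P&G DART profiler implemented in the Toolbox.
    """
    if matches_dart_precedent:
        return 10
    if in_domain is True:
        return 3
    # Domain unknown: the text recommends the most conservative UF of 10.
    # Out of domain (False): not covered in the text; 10 is assumed here.
    return 10


def adjusted_noael(chronic_noael: float, uncertainty_factor: int) -> float:
    """Divide the chronic NOAEL by the selected uncertainty factor."""
    if uncertainty_factor <= 0:
        raise ValueError("uncertainty factor must be strictly positive")
    return chronic_noael / uncertainty_factor


# Hypothetical chronic NOAEL of 30 mg/kg bw/day, Toolbox profiler alert,
# no applicability-domain information available.
uf = default_uncertainty_factor(matches_dart_precedent=True, in_domain=None)
print(uf, adjusted_noael(30.0, uf))
```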
The results described in the present paper indicate that methodologies aimed at establishing the level of safety concern of chemicals without toxicological data (e.g., Schilter et al. 2014) may benefit significantly from integrating the approach proposed here. Indeed, adding an evaluation of DART potential would allow a more complete hazard assessment that complies better with risk assessment principles. However, limitations and uncertainties have to be kept in mind. The limitations of the models/profilers are highlighted in the Materials and methods section. One remaining uncertainty is the adequacy of the selected models/profilers to cover all DART endpoints/mechanisms. The P&G scheme has been documented to cover an array of alerts and mechanisms of DART. In our investigation, the addition of tools to predict interactions with the estrogen receptor was considered sufficient to correctly identify all chemicals of our dataset documented to produce DART effects, probably because of the relatively low number of active substances in our dataset.
The uncertainties linked to the application of the proposed approach are more difficult to characterize. Schilter et al. (2014) concluded that establishing safety concern based on an integration of high-quality and relevant in silico toxicology predictions was unlikely to be significantly more uncertain than establishing it based on experimental data. Such a conclusion may apply to the present approach dealing with DART hazard characterization, although the uncertainties associated with read-across may be higher. The main uncertainties associated with read-across concern the number and degree of similarity of the chemical analogues, as well as the uncertainties related to the toxicological data found for the analogues.
Recent developments in read-across have extended the concept of similarity beyond chemical structure to other features related to the toxicological profile and to toxicokinetic aspects, including metabolism (Schultz et al. 2015; Patlewicz et al. 2017). The use of SA in read-across supports this concept. Thus, read-across can be extended in this direction. Any improvement in read-across and in the set of in silico models will have an impact in two directions: (1) increasing the elements available for the evaluation, thereby covering potential gaps; and (2) allowing a refinement of the uncertainty of the overall strategy. With reference to the EFSA Guidance on weight-of-evidence (EFSA 2017), the inclusion of these models will represent additional lines of evidence. According to this guidance, the different lines of evidence must be evaluated considering their relevance, reliability and consistency. In the case of read-across, if the analogues identified are only weakly similar to a given target, as in several of the cases we discussed, their relevance is very low and they can be disregarded. We also discussed the reliability of the experimental values used for read-across. Another element to consider is the consistency between multiple lines of evidence. The acceptability of the level of uncertainty is not an absolute value. The integration of the results of multiple in silico models and read-across within the conceptual scheme of the EFSA guidance (2017) has been discussed by Benfenati et al. (2019).

Conclusions
The present approach offers the possibility to identify and assess chemicals for which hazards are driven by developmental and/or reproductive toxicity. For such chemicals, an MoE based on DART values could be used for safety assessment. The major drawback of the present approach is the scarcity of developmental/reproductive data, which may limit the possibility to conduct read-across. However, for chemicals with DART alerts that are not eligible for read-across because appropriate analogues are absent, chronic toxicity values may still be used, with an additional factor applied to account for the uncertainty regarding DART effects.