CORAL: model for no observed adverse effect level (NOAEL)

The in vivo repeated dose toxicity (RDT) test is intended to provide information on the possible risk caused by repeated exposure to a substance over a limited period of time. The measure of the RDT is the no observed adverse effect level (NOAEL) that is the dose at which no effects are observed, i.e., this endpoint indicates the safety level for a substance. The need to replace in vivo tests, as required by some European Regulations (registration, evaluation authorization and restriction of chemicals) is leading to the searching for reliable alternative methods such as quantitative structure–activity relationships (QSAR). Considering the complexity of the RDT endpoint, for which data quality is limited and depends anyway on the study design, the development of QSAR for this endpoint is an attractive task. Starting from a dataset of 140 organic compounds with NOAEL values related to oral short term toxicity in rats, we developed a QSAR model based on optimal descriptors calculated with simplified molecular input-line entry systems and the graph of atomic orbitals by the Monte Carlo method, using CORAL software. Three different splits into the training, calibration, and validation sets are studied. The mechanistic interpretation of these models in terms of molecular fragment with positive or negative contributions to the endpoint is discussed. The probabilistic definition for the domain of applicability is suggested. Electronic supplementary material The online version of this article (doi:10.1007/s11030-015-9587-1) contains supplementary material, which is available to authorized users.


Introduction
In risk assessment, repeated dose toxicity (RDT) provides information on the adverse toxicological effects which can be induced by repeated exposure to a substance over a limited period of time [1,2]. No observed adverse effect level (NOAEL) indicates the dose at which no effects are observed, and the lowest observed adverse effect level (LOAEL) is the lowest dose at which adverse effect can serve as measure of the RDT [2]. The NOAEL is a toxicological requirement imposed by registration, evaluation authorization, and restriction of chemicals (REACH). When the NOAEL is not available, LOAEL can be used for the same purpose [1,[3][4][5][6][7].
REACH is recommended development of methods such as quantitative structure-activity relationships (QSARs) to assess the potential harmful effects of chemicals [3]. The aim of QSAR models is to establish correlation between the chemical structure of a substance and biological activity [8]. QSAR for estimating NOAEL/LOAEL has been described in the scientific literature [2,9,10].
The aim of this work was to build up QSAR model for the NOAEL by means of the CORAL software (http://www. insilico.eu/coral/).

Data
Experimental data were obtained from the OECD toolbox version 3.2. (http://www.qsartoolbox.org/download), downloading HESS and Munro databases and from the US EPA's integrated risk information system (IRIS) database (http://www.epa.gov/IRIS/). For all the compounds the canonical simplified molecular input-line entry systems (SMILES) were obtained by searching through CAS numbers on the PubChem Compound website (https://pubchem. ncbi.nlm.nih.gov/). The correspondence between CAS numbers and chemical structures was further checked using ChemID Plus Advanced website (http://chem.sis.nlm.nih. gov/chemidplus/). The numerical data from various sources were compared. If several values were available for the same compound, we used the lowest.
All doubtful or inorganic compounds, salts, and mixtures were eliminated, because the relationships between molecular structure and the NOAEL are very complex. We considered only data referring to 90 days of oral administration in rats and rejected reproductive toxicity studies. It is to be noted that the exchange of the 90-day study by shorter testing is an attractive alternative [11]. Taking into account this circumstance, values for 28 days of treatment were considered but, in order to have consistent data, they were divided by a factor of 3, as specified by the scientific committee on consumer safety (SCCS) in order to approximate the 90-day NOAEL [1]. After the above selection, about four hundreds of various substances with small molecules (e.g., 2-3 atoms) and vice versa with extremely large molecules (e.g., 100 or more atoms), molecules with specific groups, such as [N+], [NH4+], [nH], etc., and substances with molecules containing many various cycles / heterocycles were remained. Under such circumstances, the following limitations were used in the selection of compounds for the work set: (i) too large and, vice versa, too small molecules were removed (practically, molecules which can be represented by SMILES with length less than 70 and larger than 10 symbols, were selected); (ii) molecules which have only one cycles or have no cycles at all were selected; and (iii) molecules with special groups (indicated by square brackets) were removed from the work set. Thus, the dataset of 140 compounds has been selected. All values were converted to decimal logarithms (lgNOAEL). These compounds were randomly split into training, calibration, and validation sets three times.

Optimal descriptors
The hybrid optimal descriptors calculated with two representation of the molecular structure by SMILES [12] and by graph of atomic orbitals (GAO) [13,14] were used for QSAR analysis: where SA k is a structural attribute extracted from SMILES (NOSP, HALO, and BOND represented in Table 1) or from the graph of atomic orbitals (EC0 k represented in Table 2); the CW(x) is so-called correlation weight for a structural attribute extracted from SMILES (i.e., NOSP, HALO, and BOND) or from GAO (i.e., EC0 k ). The correlation weights are coefficients which are used for calculation with Eq. 1: the numerical data on the correlation weights are calculated with the Monte Carlo optimization which gives maximum of the determination coefficient (r 2 ) between experimental and predicted lgNOAEL for the training set. The T is threshold, i.e., coefficient for discrimination of SA k into two categories (i) rare (if frequency of SA k in the training set is less than T ) and (ii) not rare (if frequency of SA k is larger than T in the training set). The N is the total number of the Monte Carlo method epochs (N = 1, 2, . . ., 30). The NOSP is a descriptor indicating the presence (absence) of nitrogen, oxygen, sulfur, and phosphorus; HALO is a descriptor indicating the presence (absence) of halogens (i.e., "F," "Cl," "Br," and "I"); the BOND is a descriptor indicating the presence (absence) of double, triple, and stereo chemical bonds; Table 1 contains clarifications for SMILES attributes involved in building up a model. Table 2 contains an example of representation of the molecular structure by the molecular graphs. The EC0 k are extended connectivity of zero order in the GAO [13,14]. The modeling approach examined in this study includes three steps: Step 1 Preparation of the list of attributes extracted from SMILES and from GAO (Tables 1 and 2). Thus, functions of the training set, the test set, and the validation set are considerably different ones. The substances from the training set are the basis to build up model. The substances from the test set are a tool to examine the "objectivity" of the model which is built up with the training set. In other words, it is evaluating "if the overtraining is absent". Finally, the external validation set is a tool to estimate the true predictive potential of model with data on substances which are invisible during building up the model using the above-mentioned parameters T = T * and N = N * .
The user can calculate the DCW (T *, N*) and build up the model.
The predictive potential of the model calculated with Eq. 2 should be validated with external validation set invisible during building up the model [12,15]. It is to be noted that similar nonlinear models as rule are considerable better for visible training set (i.e., for the system of training and test sets) but poorer for the external invisible validation set [16,17]. The measure of the statistical prevalence of various molecular features (SA k ) which are extracted from SMILES and graph of atomic orbitals can be calculated as the following equation: where the P TRN (SA k ) is the probability of the presence of the SA k in SMILES of the training set, i.e., The P TRN (SA k ) is the probability of the presence of the SA k in SMILES of the test set, i.e., Table 2 Example of the representation of Acetoin (CAS 513-86-0; and SMILES="O=C(C)C(O)C") by means of (i) hydrogen-suppressed graph; and (ii) Graph of atomic orbitals Hydrogen suppressed graph    The N TRN (SA k ) is the number (frequency) of SMILES which contain SA k in the training set.
The N TRN is the total number of SMILES in the training set.
The N TST (SA k ) is the number (frequency) of SMILES which contain SA k in the test set.
The N TST is the total number of SMILES in the test set. If the probability of SA k in the training set is equal to the probability of SA k in the test set, it is the ideal situation and the defect of the SA k in this case should be estimated as minimal (zero). However, this situation is not typical, i.e., the difference between the probability of SA k in the training set and the probability of SA k in the test set, as rule, is not zero. Under such circumstances, the numbers of SA k in the training set and in the test set also should be taken into account.
The small frequency or the absence of SA k in the training set most probably will lead to decrease of statistical significance of the SA k for the model. The absence (or even the small frequency) of SA k in the test set together with significant prevalence of the SA k in the training set will lead to overfitting: the improvement of the model for the training set due to the correlation weight of the SA k will be accompanied by unpredictable influence of this correlation weight for the model within the test set. The Eq. 3 gives two criteria of expedient distribution into the training set and test set: (i) the difference between probabilities of attributes be in the training and be in the test sets should be as small as possible; and (ii) the numbers of attributes in the training set and test sets should be as large as possible. If the above-mentioned two conditions take place, the split into the training and test sets should be estimated as "satisfactory" one. Finally, the mole-cular features which are absent in the test set cannot improve model, and their defect should be defined as maximum (i.e., unit).
Thus, the measure calculated with Eq. 3 can be used for the classification of the active (not blocked) attributes in accordance with their prevalence in the training and test set.
Having the numerical data on the defects of SA k , one can compare reliability of the prediction for a substance, using the following criterion (DefectSMILES): The domain of applicability can be defined as follows: a substance is fall into the domain of applicability if its DefectSMILES obeys the condition: where DefectSMILES is average for visible set (training and test sets). Thus, the DefectSMILES gives possibility to define the domain of applicability for the models.
Using the summation of the Defect SMILES calculated with Eq. 4 one can define an integral characteristic of a split into the training and test sets: The Split Defect can be a useful characteristic of the distribution into the training set and test set from heuristic point of view.

Results
The CORAL software gives for three above-mentioned splits the following models (the n is the number of compounds in a set, r 2 is determination coefficient, q 2 is cross-validated r 2 , RMSE is root-mean-square error; F is Fischer F ratio):  Table 3 contains experimental and calculated lgNOAEL with Eqs. 7-9 for the training set and test set. The distributions of compounds into the training set and test set also are represented in Table 3. Table 4 contains experimental and calculated lgNOAEL for the external validation set. Figure 2 contains the graphical representation of these models.
The additional analysis of twelve splits (including three represented by models which are calculated with Eqs. 7-9; these are the splits 1, 2, and 3; Table 5 contains the data) gives possibility to study the criterion calculated with Eq. 6. Figure 3 represents the correlation between Split Defect calculated with Eq. 6 and the determination coefficient between experimental and predicted lgNOAEL for the validation set (12 random splits). Figure 4 represents the correlation between Split Defect calculated with Eq. 6 and the root-mean-square error for the validation set (twelve random splits). Unexpectedly, the increase of the Split Defect is accompanied by increase of the determination coefficient and by decrease for root-mean-square error for the external validation set. Thus, very likely, these correlations can be useful criteria to compare different splits into the training set and test set.        tors) is characterized by n = 218, r 2 = 0.35, and q 2 = 0.21 [18]. Although the correlation between molecular structure and the NOAEL takes place, there is a considerable percentage of other factors that can influence this endpoint [8,10,[18][19][20].
Thus, the suggested approach gives quantitative models of the NOAEL for three random splits into the training set, the test set, and the validation sets, and the predictive potential of these models are comparable with the predictive potential of models for the NOAEL [21,22] described in the literature [18].
Instead of the NOAEL/LOAEL approach, the Benchmark Dose Methodology (BDM) [23] can be used. However, the BDM is more expensive approach. Besides, in some cases, the BDM cannot be used to estimate toxicological behavior of substances. Taking this into account, the NOAEL/LOAEL approach should be estimated as a useful alternative for the BDM.
Finally, we deem that the principle "a QSAR is a random event" can be useful from regulatory point of view. In other words, the reliability of a QSAR approach for any endpoint in general, and for NOAEL in particular, should be validated with a group of different splits into the visible training set and invisible validation set [15]. The use of the approach (analysis of a group of distribution into the training set, test set, and external validation set, i.e., not only one split) in the case of the lgNOAEL numerical data for other set of organic compounds [18] gave the models which are statistically characterized by [24]: n ≈ 174, r 2 ≈ 0.70, s ≈ 0.41(training set), and n ≈ 21, r 2 ≈ 0.64, s ≈ 0.39 (test set). These models [24] are based on the representation of the molecular structure by  solely SMILES (without data on the molecular graph). In fact, compounds examined here are characterized by a bigger variety of the molecular structure; therefore, QSAR mod-els for these substances which are based on solely SMILES are characterized by a poor statistical quality. Fortunately, the hybrid approach (SMILES together with the molecular graph) gives the satisfactory statistical quality of the models for these very varied compounds calculated with Eqs. 7-9. The suggested criteria for the estimation of the defect for the individual SMILES and GAO (Defect SMILES, Eq. 4) and for the estimation of the distribution into the training set and test set (Split Defect, Eq. 6) can be a convenient tool for the QSAR analysis. The Defect SMILES gives possibility to detect "suspected" compounds, thus this criterion is a tool to define the domain of applicability. The Split Defect gives possibility to compare different distribution into the training set and test set and to select preferable distribution from point of view of robustness of a QSAR. The disadvantage of these criteria is their dependence upon the distribution of available data into the training set and test set.
The Supplementary materials section (Table S1) contains the numerical data on the correlation weights for SA k calculated with three different splits into the training and test sets. There are a group of SA k which are stable promoters of the lgNOAEL increase (i.e., SA k characterized by (i) significant frequency in the training set and (ii) stable positive values of correlation weights) and group of stable promoters of the lgNOAEL decrease (i.e., SA k characterized by (i) significant frequency in the training set and (ii) stable negative values of correlation weights). The above described SA k defects are also represented in Table S1 for split 1, 2, and 3. Table  S2 contains the numerical data on the correlation weights of SMILES attributes and GAO invariants for twelve random splits into the training set and the test set. Table S3 contains an example of the DCW (1, 30) calculation for a substance represented by the SMILES and GAO (acetoin, CAS 513-86-0). Table S4 contains the list of compounds that were selected as the external validation set. Table S5 contains twelve random splits into the training and test set which are examined in this work.
Thus, the suggested models have (i) definition of the domain of applicability (Tables 3, 4); (ii) the mechanistic interpretation in terms of the promoters of increase / decrease for lgNOAEL; and (iii) unambiguous algorithm to build up model. Consequently, described models are built up in accordance with OECD principles [21,22].

Conclusions
The NOAEL can be modeled by the Monte Carlo technique using SMILES and graph of atomic orbitals for the representation of the molecular structure. The statistical quality of models for the NOAEL calculated with the 2D descriptors is comparable with the statistical quality of models based on the 3D representation of the molecular structure with additional input of the physicochemical data. There are correlations between predictive potential of the models and the Split Defect calculated with Eq. 6 ( Table 5). It should be noted that the described approach based on 2D descriptors can be used to build up predictive models for the cases of other complex endpoints, i.e., endpoints related to nanomaterials [25,26] and endpoints related to peptides [27].