Identification of potential aryl hydrocarbon receptor ligands by virtual screening of industrial chemicals

We have developed a virtual screening procedure to identify potential ligands to the aryl hydrocarbon receptor (AhR) among a set of industrial chemicals. AhR is a key target for dioxin-like compounds, which is related to these compounds’ potential to induce cancer and a wide range of endocrine and immune system-related effects. The virtual screening procedure included an initial filtration aiming at identifying chemicals with structural similarities to 66 known AhR binders, followed by 3 enrichment methods run in parallel. These include two ligand-based methods (structural fingerprints and nearest neighbor analysis) and one structure-based method using an AhR homology model. A set of 6445 commonly used industrial chemicals was processed, and each step identified unique potential ligands. Seven compounds were identified by all three enrichment methods, and these compounds included known activators and suppressors of AhR. Only approximately 0.7% (41 compounds) of the studied industrial compounds was identified as potential AhR ligands and among these, 28 compounds have to our knowledge not been tested for AhR-mediated effects or have been screened with low purity. We suggest assessment of AhR-related activities of these compounds and in particular 2-chlorotrityl chloride, 3-p-hydroxyanilino-carbazole, and 3-(2-chloro-4-nitrophenyl)-5-(1,1-dimethylethyl)-1,3,4-oxadiazol-2(3H)-one. Electronic supplementary material The online version of this article (10.1007/s11356-017-0437-9) contains supplementary material, which is available to authorized users.


In vitro data evaluation process and relative effect potency (REP) calculations
The search of compounds having shown aryl hydrocarbon receptor (AhR) mediated effects resulted in 214 compounds (Dataset S1), i.e. the AhR modulators. Out of these AhR modulators , 78 were measured in both ethoxyresorufin-O-deethylase (EROD) and luciferase reporter gene assays (dioxin-responsive Chemical Activated LUciferase gene eXpression, DR-CALUX) by a single research group (Behnisch et al. 2003 Figure   S1) and when correlating EC 20 data to EC 50 data ( Figure S2). That is, this showed the close relation in values between the two methods. Furthermore, these 78 compounds had EC 20 and EC 50 in the order of magnitude 10 pM to roughly 0.1 uM, i.e. a difference in REP value of 100 000 for the least active compound compared to 2378-TCDD (Behnisch et al. 2003).
For the investigated in vitro studies from the literature, REPs were calculated compared to the corresponding measurement of 2378-TCDD in the same study. Exceptions were the following; in the case of PCB 126 being measured instead as 2378-TCDD, the REP was made relative to PCB 126, and then extrapolated by a factor 10 to be comparable with REPs based on 2378-TCDD. The extrapolation was based on the fact that the WHO established risk assessment factors, i.e. toxic equivalency factors (TEFs) of 2378-TCDD and PCB 126 differ by one order of magnitude (Behnisch et al. 2003). In many cases, several articles had measured the same compound, i.e. multiple values were obtained. Due to that, the median REP was used as final result for the compound 's potency. The median REPs were used as a rough measurement to find patterns and correlations in the PCA based on calculated physico-chemical properties ( Figure S3, S4), that is, only for visual inspection. In addition, if 2378-TCDD was not tested for a compound, but a lower concentration (e.g. 1-10 nM) gave a significant increase in AhR-mediated effect or the compound was compared to another known AhR agonist (here, beta-napthoflavone, BNF) (Casado et al. 2006), the compound was added but its REP value put as "Missing". This was also applied in cases where 2378-TCDD had been tested in the same study, and dose-response curves were reported but no actual EC-values for both compounds. This was the case of, for instance, the known AhR ligands 3methyl cholanthrene (3-MC) and BNF (Denison et al. 2003, Garrison et al. 1996. Since the reported established K d -value of 3-MC and K i -value of BNF were 0.27 nM and 2.0 nM, respectively, the compounds were included in the data set of 66 AhR binders that induced AhR related activity as well as in the PCA of the 214 AhR modulators. In Dataset S1, REP values of 3-MC and BNF were set as missing (---), but the compounds seem to be approximately 1000 000 times less potent than 2378-TCDD, i.e. REP of 1*10 -6 , according to the doseresponse curves provided by Garrison et al. In the literature search, the compound lipoxin 4A was encountered, which dramatically differs in structure compared to typical AhR ligands, such as 2378-TCDD (Schaldach et al. 1999). The compound was tested for AhR activity in vitro, and produced a concentration-dependent response in a dioxin-responsive element (DRE)-driven chloramphenicol acyltransferase (CAT) reporter gene construct. Even though it was not a luciferase reporter gene assay, it was decided interesting enough to include in the analysis of AhR modulators. Furthermore, the compound also showed to increase the levels of cyp1a1 mRNA and to competitively bind to the AhR (half maximu m inhibition, IC 50 , = 0.1 uM) in experiments with guinea pig cells.

Additional information on the virtual screening protocol
The virtual screening protocol consisted of six steps (see Section 3.2 in the main text): pretreatment of the industrial chemicals (1), characterization based on physico-chemical properties and multivariate chemical space (2), three parallel steps (3-5) and a concluding analysis for the outcome of the parallel steps (6). Below follows additional information on the settings for some of the steps and the decision-making regarding cut-offs. The used statistical programs, methods and concepts are further described in the main text.

General data settings and procedures for the principal component analyses
Prior to all principal component analyses (PCAs), i.e. step 2, 2' (explained in next section) and 3, the descriptor data was scaled to unite variance. Significant principal components were defined as those having eigenvalues above 2, level of explained variation (R 2 X) above 4%, combined with the interpretability of the PC. The PCAs in step 2 and 2' had four significant PCs, and the one in step 3 had five significant PCs.
Step 2: Physico-chemical properties In step 2, the Hoteling T 2 values were used to detect and exclude outliers to the model (T 2 crit(95%)= 10,5681), and used as the actual filter. The remaining compounds' Distance to Model (DModX)-values were though studied -they ranged between 0-7, most of them below 5 in values -but it was concluded that the ones having high DModX values still showed high similarity in structure to the 66 AhR binders, and that it would be misfortunate to exclude them this early on in the process.
An additional step, step 2', was added due to five AhR binders being located outside the initial T 2 -Hoteling range in the PCA of the 66 AhR binders (Section 3.2.1); those compounds whose neighbors most likely were not fully accounted for in step 2. From the PCA of these five AhR binders (together with the 6,445 industrial chemicals), Euclidean distances were calculated and the cut-off was set to 1.5; higher distances did not share the same number of rings and/or similar functional groups in the right positions as the AhR binders. The descriptors for the PCAs in step 2' was, besides s caled, also log-transformed (using the auto-transform option in SIMCA 13.0) to normalize their distributions and to minimize the influence of extreme values originating from the industrial chemicals (Rannar and Andersson 2010).

Step 3: Structural fingerprints
The structural similarities between industrial chemicals and the 66 AhR binders were accounted for by calculating Tanimoto coefficients (TCs). A cut-off of 0.60 was chosen based on studying the output (and related structures) from a selection of the known AhR binders. For instance, a very structurally alike compound to one of the AhR binders (FICZ) was found having the value of 0.61; it only lacked one single bond to have the exact structural framework as FICZ. A TC of 0.50, on the other hand, showed to give compounds with very low resemblance to the tested known AhR binder, when visually inspecting the outcome (data not shown).

Step 4 : Nearest neighbor analysis based on chemical properties and chemical space
The cut-off in Euclidean distances to locate nearest neighbors was set by visually study the structures having the lowest distances and increasing it gradually. The PCA of the resulting industrial chemicals yielded five significant principal components (PCs) explaining 36%, 18%, 11%, 8%, and 5 % of the overall variation in the data, respectively. The first PC describes size and the second hydrophobicity ( Figure S8b), the latter both regarding surface characteristics and octanol-water partition coefficient. Furthermore, the third PC describes aromaticity normalized to the total number of bonds in the structure, the fourth shape, degree of halogenation and number of rings regarding highly aromatic compou nds, and the fifth describes the number of rings in relation to the compound's flexibility. The DModX values also show that the compounds are well fitted to the PCA model; the values were between 0 and 3, but all but five were below 2.
Step 5: Molecular docking In the molecular docking, the 3D-optimized structures from step 2 and 2' were docked together with the 66 known AhR binders. Then these structures were rescored to estimate the energy of binding (ΔG bind ) between the chemicals and AhR, using the MM-GBSA method to account for interaction energies and desolvation effects occurring upon complex formation. The visual analysis of the binders' top-poses showed that all ligands were present and highly clustered to the same space of the pocket. Also the analysis showed that there was a trend in ΔG bind -values rendering higher values for less potent congeners according to their IC 50 values. For instance, PCDD/Fs with less than four lateral chlorines in the structures, achieved higher ΔG bind than those with four lateral chlorines. Also PAHs received higher ΔG bind than the PCDD/Fs with four lateral chlorines, which is consistent with the trend between PAHs and PCDD/Fs according to their IC 50 values. All together this was taken as a proof of validity for the docking and rescoring procedure.
For each docked chemical, the pose with the lowest ΔG bind , i.e. the strongest binder, was extracted for further analysis. Figure S5 shows the distributions of ΔG bind -values from the industrial chemical data set and the AhR binders. The distributions from the two sets were fairly normal distributed and the distribution for the industrial chemicals was shifted to higher ΔG bind as compared to the AhR binders. That is, the distributions showed that the known AhR binders were more probable to bind than the total collection of industrial chemicals. We estimated a cut-off based on the ΔG bind values of the 65 known AhR binders (one failed in the docking procedure) which had an average ΔG bind of −112.5 kcal/mol. The industrial chemicals that had a ΔG bind within one standard deviation of the average ΔG bind (−99.3 kcal/mol) of the 65 known binders were considered more likely to be potential AhR binders than those with higher ΔG bind . This selection became the final enrichment from the molecular docking. Within this set of industrial chemicals no further ranking was made since that kind of accuracy was concluded not to exist. This conclusion was based on the fact that 2378-TCDD, the compound known to be the most potent AhR binder, only achieved a ΔG bind slightly above average among the AhR binders (-106 kcal/mol). This indicated that the recorded ΔG bind -values did not represent the exact ranking for each individual binder, and hence we could not expect the ranking to be more exact for the industrial chemicals.

Additional principal component analysis of the 214 AhR modulators
The first three PCs of the 214 AhR modulators showed four clusters: polycyclic aromatic hydrocarbons (PAHs), halogenated aromatics, natural products and flexible endogenous compounds ( Figure S3). Pharmaceuticals and other small and less flexible/more rigid aromatic compounds (among the endogenous substance s) were located at the center where all four clusters meet. The clusters were analyzed by the descriptors presented in the loading plot ( Figure S3b) and by creating contribution plots (SIMCA 13.0) from marking the clusters, two at the time, in the PCA. Table 2 includes some of the characteristics of the located clusters and differences between them (regarding PC 1-3). For instance, a difference between halogenated aromatics and PAHs was that the PAHs had higher HOMO energy than the halogenated aromatics, and the latter had higher GAP than the PAHs (Table 1).  (Larsson et al 2013). The score plot of the second and fifth PC managed to roughly capture the toxicological trend given by the median REPs ( Figure S4), at least for the halogenated aromatics and PAHs. This indicates that shape descriptors and measures of density in high extent were able to gradually discriminate between less potent halogenated aromatic compounds.

Additional analysis of the chemicals from the structural-based method
A unique feature captured by the molecular docking was the steroid ring structures. A class with steroid ring structure, the ginsenosides, recently demonstrated to include naturally occurring weak AhR agonists and antagonists based on rat, mouse, guinea pig luciferase reporter gene assays combined with a guinea pig competitive binding assay (Hu et al. 2013). As the ginsenosides, a handful of the 177 compounds had hydrophilic groups or side-chains connected to rings at both ends. One example is cholic acid (81-25-4), which is discussed in the main text. The other mentioned steroid, ethisterone (434-03-7) share similarities with another known AhR binder and inducer equilenin (Jinno et al. 2006). Both compounds have a keto-group and a hydroxyl-group at lateral positions in the structures and a methyl-group on the same ring as the keto-group (Table S4). This implies that they could interact with AhR in a similar fashion. Besides steroids, also aliphatic 3fused ring structures were promoted to bind to the AhR in the molecular docking and rescoring. An example of this is methyl tetrahydroabietate (MTHB, 19941-28-7) (cluster 6) which has three fused carbon rings and lateral substituents. To our knowledge, structures like MTHB have not earlier been tested for AhR-mediated effects.
Compared to cholic acid, the carboxylic acid feature is replaced by an ester functional group and the hydroxyl group by an iso-propyl group. There were many other compounds that we believe did not re semble any known AhR ligands. Among these were, for instance, compounds with numerous hydrogen -acceptors and one or two aromatic rings (cluster 28 and 29). One example is the tris(oxiranylmethyl) benzene-1,2,4-tricarboxylate (7237-83-4), a benzene derivative which is a triple ester with epoxide groups (cluster 28). The drug ambroxol (18683-91-5), a common in cough syrup and in other treatment of respiratory diseases, is an example of another combination of structural features (cluster 8). As endosulfan alcohol (2157-19-9) (Table S4), it has one halogenated and one hydrophilic, i.e. hydroxylated, lateral end of the structure. However, it also contains nitrogen atoms as bridging atoms in the carbon chains or as amine group. The same three features in the same order, i.e. halogens, nitrogen, hydrophilic groups, in a two ring structure are encountered in another drug, vinclozolin. Vinclozolin has shown to modulate hepatic cytochrome P450 isoforms in pregnant rats (de Oca et al. 2015). In cluster 12, three of the four benzothiazoles with an aliphatic ring structure had been tested by He et al. : N,N-dicyclohexylbenzothiazole-2-sulphenamide (4979-32-2), 2-(morpholinodithio)benzothiazole (95-32-9) and N-cyclohexyl-2-benzothiazolylsulfenamide (95-33-0) (Table S4). Although the high structural similarities the outcome of the in vitro assay were quite different. N-cyclohexyl-2-benzothiazolylsulfenamide was among the most potent tested derivatives, together with di(benzothiazol-2-yl) disulphide. Both compounds induced close to 70% luciferase activity at 10 uM concentration of the compounds, when compared to the activity of 2378-TCDD at 1 nM. However, N,N-dicyclohexylbenzothiazole-2-sulphenamide only induced close to 10 % luciferase activity in the same study, and 2-(morpholinodithio)benzothiazole showed no activity. The fourth benzothiazole with an aliphatic ring structure was 2-(morpholinothio)benzothiazole (102-77-2) which is used as a vulcanization accelerator in rubber industry (Lewis and Lewis 2016). This compound has structural similarities with the inactive 2-(morpholinodithio)benzothiazole (95-32-9), but also with the active 2-(morpholinodithio)benzothiazole. The only difference in the structures is the sulfur, double sulfur or no bridging atom between the morpholine and the benzothiazole ring. Hence, it is hard to, on beforehand, predict the potency of 2-(morpholinothio)benzothiazole in a similar AhR mediated in vitro study.

Figures
The data for Figures            , explaining 38% and 18%, respectively, of the variation in the data. The five structurally diverse AhR binders (turquoise) (according to T2 hoteling 95%), i.e. atypical binders, the rema ining AhR binders depicted as green dots (typical binders), the results from the T2 hoteling filter (step2) as blue dots, the additional industrial chemicals added from Euclidean distances from the five atypical AhR binders as yellow dots, and the remaining 148 AhR modulators as grey dots.    Tables   Table S1. The 3D descriptors used in the principal component analysis of the 214 AhR modulators . The following descriptors depend on the structure connectivity and conformation (dimensions are measured in Ångstrom) (MOE 2012.10).

polarizability AM1_HOMO
The energy (eV) of the Highest Occupied Molecular Orbital (HOMO) calculated using the AM1 Hamiltonian (Dewar et al. 1985).

reactivity AM1_LUMO
The energy (eV) of the Lowest Unoccupied Molecular Orbital (LUMO) calculated using the AM1 Hamiltonian (Dewar et al. 1985). reactivity AM1_GAP Difference in LUMO and HOMO energy (LUMO-HOMO, eV) calculated using the AM1 Hamiltonian (Dewar et al. 1985  size DistMax DistMax calculates the longest distance (Ångstrom) between two atoms in the molecule from its 3D-coordinates (AM1-optimized) (Dewar et al. 1985). size pmi2/pmi1 Second diagonal element of diagonalized moment of inertia tensor divided by the first diagonal element of diagonalized moment of inertia tensor (Dewar et al. 1985).
shape pmi3/pmi1 Third diagonal element of diagonalized moment of inertia tensor divided by the first diagonal element of diagonalized moment of inertia tensor (Dewar et al. 1985).
shape Table S2. Characteristics of the clusters found in the principal component analysis (PCA) of the 214 AhR modulators.
Compared to halogenated aromatics: lower reactivity as reflected by a larger difference between the LUMO and HOMO energies, i.e. large GAP (AM1_GAP), larger number of rings (rings).

Natural products
Example: daidzin
a Defined clusters by the PCA of the first, second and third principal component ( Figure S3a).
c Tested in Tox21 but with a purity less than 90% d Compounds not tested for AhR activity according to our scientific literature search or tested in Tox21 at a purity less than 90% or with lacking purity information.