Background

Indole is a heterocyclic aromatic organic compound consisting of a six-membered benzene ring fused to a five-membered pyrrole ring (Bouchikhi et al. 2008) with numerous physiological activities which include anti-inflammatory, anticancer, antiulcer, antimalarial, antioxidant, antibiotic, antimicrobial, and anticonvulsant (Patil and Dandagvhal 2016). Indole is broadly disseminated in the real environment and can be generated by different species of bacteria (Lee and Lee 2010). The synthesis of indole has been performed utilizing different starting materials in the literature of organic chemistry (Chadha and Silakari 2017).

β-Glucuronidase enzyme is mostly found in plant in the Escherichia, anaerobic bacteroides, and Clostridia. β-Glucuronidase enzyme is also found in human fluids, serum, blood cells, gastric juice, bile, urine, spleen, and it can also be found in organs like the kidney, liver, lung, and muscles (Ali et al. 2016; Gloux et al. 2011). It plays a vital role in detoxification of reactive metabolites that are interrelated to several illnesses and the growth of colon cancer (De Moreno de LeBlanc and Perdigón 2005). β-Glucuronidase enzyme speeds up the breaking down of β-glucuronosyl-O-bonds. Lack of β-glucuronidase enzyme leads to Sly syndrome in animals (humans) that is related to the high level of glycosaminoglycans in cells, and overexpression of this enzyme leads to many diseases which include urinary tract infections, active pyelonephritis, acute renal necrosis, and some pathological conditions including epilepsy, cancer, AIDS, renal diseases, neoplasm of the breast, larynx, bladder, and testes, and inflammation in joints and hepatic. Therefore, it becomes necessary to mediate the effect of this enzyme (Salar et al. 2015; Sharma et al. 2014; Taha et al. 2015).

Ibrahim et al. performed quantitative structure-activity relationship (QSAR) modeling, molecular docking, and pharmacokinetic study on 30 N-Arylidenequinoline-3-carbohydrazides analogs utilizing multi-linear regression analysis adopting genetic function algorithm method. Semi empirical method using PM6 basis set was used for complete geometry optimization of the data set. The molecular docking study showed that the most active chemical in the data set was better than the standard β-glucuronidase inhibitor both in terms of binding scores and the amino acid residues that interacted with the drug and β-glucuronidase enzyme. The pharmacokinetic studies indicated that none of the chemicals violated any of the condition set by the Lipinski’s Rule of five which confirm the bioavailability of these chemicals (Ibrahim et al. 2020a).

Theoretical modeling is a unique area of interest in computer-aided drug design which provides computational software and tools that are utilized in the design and development of novel compounds of medicinal benefit (Jorgensen 2004). Quantitative structure-activity relationship (QSAR) is a computational modeling technique used to study the connection between the experimental activities with different physicochemical properties associated with the structures of a particular molecule (Ibrahim et al. 2020b; Ojha Lokendra et al. 2013). Molecular docking is also a computational modeling technique used to predict the binding energy of intermolecular complexes based on their 3D structures when one bind to the other (Abdullahi et al. 2020; Kitchen et al. 2004). This study is aimed at carrying out QSAR and docking simulation on indole derivatives against β-glucuronidase enzyme and design new β-glucuronidase inhibitors using structure-based drug design.

Methods

QSAR modeling methodology

Data retrieving, activity normalization, and structure drawing

Thirty set of indole derivatives and their corresponding β-glucuronidase inhibitory activities (IC50) were retrieved from the work of (Baharudin et al. 2017) for the purpose of this study. After the data were collected from the literature, the inhibitory activities (IC50) of these compounds were transformed to their corresponding negative logarithm (pIC50) using Eq. 1. The Chemdraw software Ultra version 12.0 was employed for drawing of the structures of all the data set prior to optimization as shown in Fig. 1 (Ibrahim et al. 2019). Figure 1 shows the 2D general structure of the studied compounds.

$$ {\mathrm{pIC}}_{50}=\log\ \left(1/{\mathrm{IC}}_{50}\right) $$
(1)
Fig. 1
figure 1

The 2D general structure of the studied compounds

Searching for optimum geometry and descriptor computations

In order to search the structure of all the data set at global minima on potential energy surface (PES), semi-empirical method using PM6 basis set was used for the searching of the most optimum geometry of all the data set on the PES (Amin and Gayen 2016). For the computation of physicochemical parameters (descriptors), Pharmaceutical Data Exploration Laboratory (PaDEL) descriptor tool kit was used to generate all the descriptors used in this research (Yap 2011).

Dataset pretreatment and division

Before dataset division, the data were pre-treated using data pre-treatment software retrieved from drug theoretic and cheminformatics (DTC) laboratory to remove redundant values (Ambure et al. 2015). Data division software was further employed to separate the data into model building set (70%) and validation set (30%) based on the Kennard-Stone algorithm also gotten from DTC laboratory (Kennard and Stone 1969).

Model generation and validation

The model building set was subjected for further analysis (regression analysis) adopting genetic function algorithm (GFA) method in material studio software to generate the models.

The equation for the regression analysis is shown in Eq. (2).

$$ Y={\mathrm{B}}_1{\mathrm{Z}}_1+{B}_2{Z}_2+{B}_3{\mathrm{Z}}_3+C $$
(2)

where Y is the pIC50, “B’s” are coefficients for the corresponding “Z’s” which are the descriptors, and the last variable “C” is the regression constant (Ibrahim et al. 2020c).

After model development, the model must be subjected to validation assessments, in order to ascertain the model’s quality, reliability, and predictive ability. The most widely used methods for the assessments of QSAR models were the internal validations, external validations, and applicability domain. The square correlation coefficients (R2) of both the training and test sets are used alongside with (Tropsha 2015) cross-validation coefficient (Qcv2) as the assessment parameters. Cross-validation is the statistical theory of separating of data into subsets such that the analysis is first performed on a single subset, while the other subset(s) are retained for subsequent use to confirm and validate the first analysis (Jalali-Heravi and Kyani 2004) and are defined as:

$$ R2\mathrm{test}=1-\frac{\sum \left({Y}_{\mathrm{prd}}-{Y}_{\mathrm{obs}}\right)2}{\sum \left({Y}_{\mathrm{obs}}-{Y}_{\mathrm{mntrng}}\right){}^2} $$
(3)
$$ \mathrm{Qcv}2=1-\frac{\sum \left({Y}_{\mathrm{prd}}-{Y}_{\mathrm{obs}}\right){}^2}{\sum \left({Y}_{\mathrm{obs}}-{Y}_{\mathrm{mntrng}}\right){}^2} $$
(4)

where Yprd and Yobs are the predicted and observed activity (pIC50) respectively of the validation set, and Ymntrng is the mean activity value of the model building set.

The multi-collinearity between the descriptors can be detected using their variation inflation factors (VIF), which can be calculated as follows:

$$ \mathrm{VIF}=\frac{1}{1-{R}^2} $$
(5)

where R2 is the correlation coefficient of the model. If VIF is equal to 1, it means there is no inter-correlation between the descriptors; if it is with the range of 1–5, the model is acceptable; and if greater than 10, the model is bad and therefore rejected and must be rechecked (Beheshti et al. 2016).

In order to confirm the predictability, strength, reliability, and accuracy of the QSAR model developed and also identify influential and outlier compounds, the mechanistic space of the model must be exploited (applicability domain) (Tropsha et al. 2003). In exploiting the applicability domain of the studied model, leverage approach was employed and is given as hi:

$$ {h}_i={y}_i\ {\left({Y}^T\ Y\right)}^{-K}\ {y_i}^T\ \left(i=K,\dots, P\right) $$
(6)

where yi is the model building compound matrix I, Y is n × k descriptor matrix of the model building set compounds, and YT is the transpose matrix Y used to develop the model. As a prediction tool, the thresh-hold value (h*) which is the limit for Y values and is given as:

$$ h\ast =3\left(q+1\right)/z $$
(7)

where q is the number of descriptors in the model, and z is the number of model building compounds. For any QSAR model to be considered as valid and used, it must pass both the internal and external validations as recommended by Veerasamy et al. (Veerasamy et al. 2011) (see Table 1).

Table 1 The minimum recommended value for QSAR models

Docking methodology

Docking studies was conducted on personal Dell computer system, with Intel ® Core™ i7, 8GB of RAM using the Auto dock vina of Pyrex virtual screening software, UCSF Chimera version 1.10.2, and Discovery Studio Visualizer software.

Preparation of ligands and the enzyme β-glucuronidase

Before docking commences, the optimum geometry optimized using Spartan’14 was saved as protein data bank file format (ligand preparations) (Abdulfatai et al. 2017). Figure 2a presents the 3D structure of the prepared of the most potent ligand.

Fig. 2
figure 2

The a 3D structure of the prepared ligand and b 3D structure of the prepared β-glucuronidase

3D crystal structure of human β-glucuronidase was retrieved from protein data bank (with PDB ID 1bhg). Discovery Studio Visualizer was used to prepare the human β-glucuronidase enzyme for the docking studies; in the course of the preparation, chain B, heteroatoms, and co-ligands were deleted from the dimer structure and saved as protein data bank file format (Veerasamy et al. 2011). Figure 2b shows the prepared structure of the human β-glucuronidase.

Docking simulation

Vina of Pyrex virtual screening software was used to dock the ligands with human β-glucuronidase (Trott and Olson 2010). Vina of Pyrex virtual screening software has a short-coming of separating the receptor and the ligand after performing the docking. As a result of that, the UCSF Chimera 1.10.2 software was utilized to re-build the complexes (ligand-receptor) for further investigation. Discovery Studio Visualizer was used to visualize the ligand-receptor to show the interactions.

Design

Structure-based drug design also called direct drug design was used to design new β-glucuronidase inhibitors. Ligand 10 was chosen as the template as it has the highest binding energy of − 9.5 kcal/mol. Structural modification was carried out on the C-4 of the indole moiety by adding halo-substituted phenyl ring. Table 2 shows the structures of the design compounds

Table 2 The structures of the designed compounds

Results

The results of the QSAR modeling is given below:

Model 1

pIC50 = − 0.250653769 * AATS4i − 0.007819193 * ATSC2m + 0.001714704 * ATSC8v + 0.949838372 * SpMAD_Dzs − 0.050437194 * SpMax_Dt + 36.58388647.

R2trng = 0.954942, R2adj = 0.93885, Qcv2 = 0.925462, Ntrng = 20, R2test = 0.855393, Ntest = 10, R2 − Q2 = 0.016092, and LOF = 0.042924.

Results of docking studies

The results of the docking studies are shown below in Table 6 and Fig. 5.

Results of the design compounds

The result of the newly designed β-glucuronidase inhibitors is given in Table 7 and Fig. 6 a and b.

Discussion

QSAR modeling results

Three QSAR models were built; out of these built models, the best reported model was based on its statistical fitness with R2 value of 0.954942, R2adj value of 0.93885, Qcv2 value of 0.925462, and the R2pred value of 0.855393 and also lowest Friedman’s Lack of fit value of 0.042924. On taking the comparison of the statistical parameters of the reported model with those reported by Veerasamy et al. (Veerasamy et al. 2011) shown in Table 1, it can be seen clearly that the statistical parameters of the reported model were all greater than the minimum recommended values which confirmed the reliability of the model. Table 3 presents the symbols, descriptions, and the classes of descriptors that appear in the reported model. The physicochemical parameters responsible for the inhibitory activities of the β-glucuronidase inhibitors were identified to be AATS4i, ATSC2m, ATSC8v, SpMAD_Dzs and SpMax_Dt. Among these descriptors as seen in the model, SpMAD_Dzs has the highest coefficient of + 0.949838372 which gave the highest contribution in the model followed by ATSC8v. ATSC2m gave the lowest contribution in the model as it has the lowest coefficient of 0.007819193. Either increase or decrease in these physicochemical parameters, the inhibitory activities of these compounds may increase or decrease.

Table 3 The symbols, description, and classes of the descriptors of the selected model

Predicted activities of both the model building set and that of the validation set were plotted against the actual activities (Fig. 3a) in order to affirm the strength of the reported model. The tool used in this case is R2 value of both the plot and that of the internal validation. The strength of the reported model was confirmed by the corroboration of R2 value of the plot (0.9963) and that of the internal validation (0.954942).

Fig. 3
figure 3

a Plot of actual pIC50 against predicted pIC50. b Plot of predicted pIC50 against standardized residuals

Every good QSAR model is expected to be free from systematic error. In order to determine whether the selected model is free from systematic error, predicted activity was plotted against standardized residuals (Fig. 3b). The random propagation of the standardized residuals on either side of zero on the plot confirmed that the reported model is free from systematic error.

The observed activities were seen to have good correlations with the predicted activities. Table 4 represents the structures, IC50, pIC50, predicted pIC50, and residual values of the data set. The low residual values observed in the table further confirmed the stability and reliability of the reported model.

Table 4 The structures, pIC50, predicted pIC50, and residuals of the data set

The correlation analysis of descriptors in the model building set of the reported model in Table 5 shows the importance of descriptors in the model. From the table, no correlation exists between the descriptors that appear in the reported model. The computed VIF values for all descriptors were obtained to be less than 5 (Table 5) indicating that the reported model was statistically significant and the descriptors were orthogonal (no multi-collinearity between the descriptors).

Table 5 The correlation analysis and VIF of descriptors in the model building set

To confirm the quality of the reported model, it is subjected to applicability domain as shown in Fig. 4 (the plot of leverage value calculated for all the data set vs. standardized residuals or Williams plot), which permit a graphical identification of both the outliers and influential compounds in the chosen model (Beheshti et al. 2016). It can be seen from the plot that all the compounds of the model building set and the validation set were in the square area (mechanistic space). Also, there were no outliers or influential compounds with standard residuals > 3d for both model building set and the validation set. This confirmed the quality of the reported model which means the model can make good prediction of activities of new compounds.

Fig. 4
figure 4

Williams plot (h* = 0.9)

Results of docking analysis

Docking analysis on the four highly potent indole derivatives and the standard drug was performed to investigate the nature of interactions between them and their target receptor (β-glucuronidase). Table 6 gives the results of the docking analysis. It was seen from the docking results that ligand 10 being the most active compound has the highest binding free energy of − 9.5 kcal/mol. The most active ligand bound to the active site of β-glucuronidase enzyme by hydrophobic, electrostatic, and hydrogen bond. Furthermore, it bound via hydrophobic with TYR29, LEU300, PRO323, MET393, CYS396, ARG436, and LEU322 amino acid residues and bound through electrostatic with HIS440 amino acid residue. More so, it bound via hydrogen bond with TYR29 (1.93247 Å), HIS440 (2.3299 Å), and GLU390 (3.68037 Å) amino acid residues. Looking at ligand 2 having the docking scores of − 8.8 kcal/mol, it forms hydrophobic interactions with TYR29, PRO148, LEU322, PRO323, MET393, ARG436, and ARG436; electrostatic interactions with CYS396; and hydrogen bond interactions with ASN439 (2.20367 Å), HIS440 (2.8928 Å), and LEU28 (3.55365 Å) amino acid residues. The standard drug with docking score of − 5.7 kcal/mol formed only hydrogen bond with ASN502 (2.42729 Å) and GLN524 (3.44886 Å) amino acid residues of β-glucuronidase enzyme. On comparing the most active compound with the standard drug, it can be seen that the most active compound has higher docking score than the standard drug which may be attributed to the kinds of interaction that occur between the active compound and the binding pocket of β-glucuronidase. Figure 5 shows the 2D structure of the visualized ligand-receptor 10.

Table 6 The binding energy, amino acid residues, hydrogen bonds, and hydrogen bond distance of reported ligands
Fig. 5
figure 5

The 2D structure of visualized ligand-receptor 10

Result of the design

Four new β-glucuronidase inhibitors were designed and found to bind tight in the binding pose of β-glucuronidase enzyme (Table 7). Addition of halo-substituted phenyl ring increased the binding energy of the designed compounds. The designed compounds were found to have higher binding energy than the template (− 9.5 kcal/mole) and the standard drug (5.7 kcal/mole). A design compound with serial number ID 4 was identified to have the highest binding energy among the designed compounds. It bound to the binding site of the β-glucuronidase enzyme via halogen bond with TYR505; electrostatic bond with GLU540, ASP207, GLU451, GLU451, and GLU451; hydrophobic bonds with ASP207; and hydrogen bond with GLU540 (1.86751 Å) and TYR505 (2.71161 Å). Figure 6 a and b show the 2D structure of the visualized ID 4 and the standard drug with their target enzyme.

Table 7 The binding energy, amino acid residues, hydrogen bonds, and hydrogen bond distance of design compounds and the standard drug
Fig. 6
figure 6

The 2D structure of visualized a ID 4-Receptor, b standard drug-receptor

Conclusion

A very high predictive QSAR model with good internal and external validation parameters: R2trng = 0.954942, Qcv2 = 0.925462, R2test = 0.855393, and LOF = 0.042924 was built and reported in this study. The model was further subjected to applicability domain to confirm the quality of the model in terms of its predictions and found to make good prediction. Docking investigation reveals that the most potent compound among all the data set bound to the binding pocket of β-glucuronidase enzyme via hydrophobic, electrostatic, and hydrogen bond. The design compound with serial number ID 4 was identified to have the highest binding energy of − 9.9 kcal/mole among the designed compounds. It bound in the binding site of the β-glucuronidase enzyme via halogen, hydrophobic, electrostatic, and hydrogen bond. The design compound was discovered to be better than the template used in the designing new compound and the standard drug. The aim of this research as stated was achieved.