Introduction

Flavonoids are a class of polyphenolic compounds that possess a phenyl benzopyrone structure (C6–C3–C6) and are present in all vascular plants. They are produced as secondary plant metabolites and demonstrate broad-spectrum pharmacological activities, but the human body is unable to produce them [1,2,3]. According to their saturation level, these compounds are subdivided into flavanols, flavonols, flavones, flavanones, isoflavones, flavanonols, and chalcones [4, 5].

CYP17A1 plays an important role in the biosynthesis of dehydroepiandrosterone (DHEA), the precursor of androgens, and overexpression of this enzyme can cause prostate cancer. Abiraterone, an approved anti-prostate cancer drug, is a CYP17A1 inhibitor [6, 7]. Flavonols are characterized by a hydroxyl group at C-3 of the flavone skeleton, and there are some reports on the CYP17A1 inhibitory activity of flavonoids such as rutin, morusflavone, quercetin, kaempferol and isorhamnetin [8,9,10].

Flavonols have also attracted the attention of medicinal chemists because of their effective anti-prostate cancer properties. Prostate cancer is the most commonly diagnosed cancer among males worldwide, with an incidence of 28 cases per 100,000 and a mortality of 7 per 100,000 [11,12,13]. Normal growth and maintenance of the prostate depend on androgen hormones that act through the androgen receptor. Activation of the androgen receptor drives the development of prostate cancer. It has been reported that agents such as flavonols that down-regulate androgen receptors can inhibit the development of prostate cancer cells [14,15,16].

The influence of the chemical structure of flavonols on their anticancer activity has been investigated experimentally, and it has been shown that structural modification can further increase their anti-cancer activity and their ability to induce apoptosis in PC-3 cells. Consequently, the structure–activity relationship of flavonols as anti-prostate cancer agents has attracted attention as a means of quantitatively correlating molecular structures or properties with variation in pharmacological activity [17, 18].

The anti-prostate cancer activity is typically expressed as IC50 (half maximal inhibitory concentration) values. Quantitative structure–activity relationships (QSARs) are a powerful tool for predicting the IC50 of flavonoids in general. To date, however, no study has been reported on QSAR modeling for predicting the IC50 of flavonols against prostate cancer.

A QSAR model is a mathematical equation that is widely employed to estimate and predict the pharmacological activity or physicochemical properties of chemicals using descriptors derived from chemical structure [19,20,21,22]. The CORAL (Correlation and Logic) freeware is employed for designing quantitative structure–property/activity relationship (QSPR/QSAR) models in compliance with OECD principles [23,24,25,26]. In CORAL, the SMILES notations of the molecular structures are used as input, and the best model is produced by Monte Carlo optimization [27,28,29,30]. The optimal descriptor can be computed from SMILES alone, from molecular graph-based descriptors alone, or from a combination of both (the so-called hybrid descriptor). A literature survey reveals that the index of ideality of correlation (IIC) parameter of CORAL can be employed to build robust QSAR models [31,32,33,34].

Molecular docking simulation is a computational methodology that provides automated tools for predicting the conformation of a protein–ligand complex. The aim of molecular docking is to determine the position of the ligand in the protein. An energy-based scoring function is commonly used in docking procedures to find the energetically most favorable ligand conformation when bound to the target. Monte Carlo methodologies are also sometimes applied in molecular docking simulation [35, 36].

Since ancient times, various natural products have been used as traditional medicines against various human diseases. Moreover, natural products are easily applicable, cheap, accessible and acceptable treatments with minimal cytotoxicity [37]. As a result of the QSAR modeling, the pIC50 values of some natural flavonols as anti-proliferative agents were predicted and reported.

The goal of this report is to devise the first reliable QSAR models, using CORAL software, to predict the pIC50 of 81 flavonols against prostate cancer. In the development of the QSAR models, a hybrid optimal descriptor, a combination of SMILES and the hydrogen-suppressed graph (HSG), is employed. The index of ideality of correlation (IIC) is used to improve the predictive potential of the QSAR models. Further, the pIC50 is also calculated for a series of eight natural flavonols using the QSAR models of all splits. As mentioned above, flavonols exert their anti-prostate cancer activity through different mechanisms of action. Molecular docking is therefore also performed for the eight natural flavonol derivatives in order to evaluate their potential affinity for CYP17A1 (PDB: 3RUK).

Methods

Data

Experimental data on the anti-prostate cancer (PC-3) activities of 86 flavonols were taken from four literature reports (Additional file 1: Table S1) [11, 38,39,40]. The numerical values of activity were converted to a negative logarithmic scale, pIC50 (− log IC50, with IC50 in molar units), for QSAR modelling. The pIC50 for the PC-3 cell line ranged from 3.39 to 6.28. The current dataset has not previously been used for QSAR modeling. The molecular structures of the flavonol derivatives were sketched in BIOVIA Draw 2019 and converted to SMILES codes for modeling with the CORAL software. Three splits were made from the dataset, and in each split the compounds were randomly divided into four sets, i.e., training (≈ 35%), invisible training (≈ 25%), calibration (≈ 15%), and validation (≈ 25%) sets. In CORAL-based QSAR modeling, each set has a specific task. The training set (TRN) is used to compute the correlation weights, and the invisible training set (iTRN) checks how well the model fits data that were not used in the training set. The calibration set (CAL) is used to detect overtraining, whereas the final estimation of the predictive potential of the designed QSAR model is assigned to the validation set (VAL) [34, 41].
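The conversion and splitting steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the CORAL procedure itself: the function names, the assumption that IC50 values are supplied in µM, the random seeds, the rounding of set sizes and the 100 placeholder compound IDs are all choices made here for the example.

```python
import math
import random

def pic50_from_ic50_um(ic50_um):
    """Convert an IC50 in micromolar (assumed input unit) to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)

def split_dataset(ids, fractions=(0.35, 0.25, 0.15, 0.25), seed=0):
    """Randomly divide compound IDs into TRN, iTRN, CAL and VAL sets."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_trn = round(fractions[0] * n)
    n_itrn = round(fractions[1] * n)
    n_cal = round(fractions[2] * n)
    return {
        "TRN": shuffled[:n_trn],
        "iTRN": shuffled[n_trn:n_trn + n_itrn],
        "CAL": shuffled[n_trn + n_itrn:n_trn + n_itrn + n_cal],
        "VAL": shuffled[n_trn + n_itrn + n_cal:],  # whatever remains
    }

# Three independent random splits of 100 placeholder compound IDs
splits = [split_dataset(range(1, 101), seed=s) for s in (1, 2, 3)]
```

Each compound appears in exactly one of the four sets within a split, while the three splits differ only in the random assignment.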

Hybrid optimal descriptor

Herein, the hybrid optimal descriptor based on SMILES and HSG was employed to create QSAR models for the pIC50 of flavonol compounds. Literature reports have shown that QSPR models built with the ‘hybrid’ optimal descriptor have better statistical parameters than models built with SMILES or HSG descriptors individually [42, 43].

The QSAR model employed to predict the pIC50 of flavonol derivatives is given by the following equation:

$${pIC}_{50}={\mathrm{C}}_{0}+{\mathrm{C}}_{1}\times {}^{Hybrid}\mathrm{DCW}\left({\mathrm{T}}^{*}, {\mathrm{N}}^{*}\right).$$
(1)

Here, C0 is the intercept and C1 is the slope of the regression, both computed by the least-squares method; DCW (descriptor of correlation weights) is computed from the correlation weights of molecular features extracted from HSG and SMILES notations. The following equation is employed to compute DCW:

$$DCW\left({T}^{*},{N}^{*}\right)=\sum CW({A}_{K}),$$
(2)

where AK is an attribute of SMILES or HSG, and T* and N* are the threshold value and the number of epochs of the Monte Carlo optimization, respectively. The hybrid descriptor is the sum of the SMILES-based and graph-based descriptors:

$${}^{\mathrm{Hybrid}}\mathrm{DCW}\left({\mathrm{T}}^{*}, {\mathrm{N}}^{*}\right)={}^{\mathrm{SMILES}}\mathrm{DCW}\left({\mathrm{T}}^{*}, {\mathrm{N}}^{*}\right)+{}^{\mathrm{HSG}}\mathrm{DCW}\left({\mathrm{T}}^{*}, {\mathrm{N}}^{*}\right).$$
(3)

The DCWs of SMILES and HSG employed here are given by Eqs. (4) and (5), respectively:

$$\begin{aligned}{}^{SMILES}\mathrm{DCW}\left(\mathrm{T},\mathrm{N}\right)= &\sum \mathrm{CW}\left({\mathrm{S}}_{\mathrm{k}}\right) +\sum \mathrm{CW}\left({\mathrm{SS}}_{\mathrm{k}}\right)+\mathrm{CW}\left(\mathrm{BOND}\right)+\mathrm{CW}\left(\mathrm{NOSP}\right)+\mathrm{CW}\left(\mathrm{HARD}\right)+\mathrm{CW}\left(\mathrm{PAIR}\right)\\&+\,\mathrm{CW}\left(\mathrm{Cmax}\right)+\mathrm{CW}\left(\mathrm{Nmax}\right)+\mathrm{CW}\left(\mathrm{Omax}\right) \end{aligned}$$
(4)
$$\begin{aligned}{}^{HSG}\mathrm{DCW}\left(\mathrm{T},\mathrm{N}\right)= & \sum \mathrm{CW}\left({\mathrm{e}1}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left({\mathrm{e}2}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left({\mathrm{e}1}_{\mathrm{k}}+{\mathrm{e}2}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left(\left|{\mathrm{e}1}_{\mathrm{k}}-{\mathrm{e}2}_{\mathrm{k}}\right|\right)\\&+\sum \mathrm{CW}\left({\mathrm{pt}2}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left({\mathrm{pt}3}_{\mathrm{k}}\right)\\&+\sum \mathrm{CW}\left({\mathrm{pt}2}_{\mathrm{k}}+{\mathrm{pt}3}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left(\left|{\mathrm{pt}2}_{\mathrm{k}}-{\mathrm{pt}3}_{\mathrm{k}}\right|\right)+\sum \mathrm{CW}\left({\mathrm{S}2}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left({\mathrm{S}3}_{\mathrm{k}}\right)+\sum \mathrm{CW}\left({\mathrm{S}2}_{\mathrm{k}}+{\mathrm{S}3}_{\mathrm{k}}\right)\\&+\sum \mathrm{CW}\left(\left|{\mathrm{S}2}_{\mathrm{k}}-{\mathrm{S}3}_{\mathrm{k}}\right|\right)+\mathrm{CW}\left(\mathrm{C}5\right)+\mathrm{CW}\left(\mathrm{C}6\right)\end{aligned}$$
(5)

The SMILES attributes and HSG invariants used in Eqs. (4) and (5) are described in Table 1.
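As a rough illustration of Eqs. (2) and (3), the descriptor can be viewed as a lookup-and-sum over the attributes of one molecule. The sketch below is not CORAL's implementation: the attribute names and weights are hypothetical, extraction of attributes from a SMILES string or graph is assumed to have been done elsewhere, and treating attributes absent from the weight table as contributing zero is an assumption of this example.

```python
def dcw(attributes, cw):
    """Eq. (2): sum the correlation weights of all attributes of one molecule."""
    # Attributes missing from the table contribute 0.0 (an assumption of this sketch)
    return sum(cw.get(a, 0.0) for a in attributes)

def hybrid_dcw(smiles_attributes, hsg_attributes, cw):
    """Eq. (3): SMILES-based DCW plus graph-based DCW."""
    return dcw(smiles_attributes, cw) + dcw(hsg_attributes, cw)

# Hypothetical attribute names and correlation weights, for illustration only
cw_table = {"O...C...": 1.5, "c...(...": 0.75, "C5": -0.5, "C6": 2.0}
smiles_attrs = ["O...C...", "c...(...", "c...(..."]  # a repeated attribute is counted each time
hsg_attrs = ["C5", "C6"]

descriptor = hybrid_dcw(smiles_attrs, hsg_attrs, cw_table)
print(descriptor)
```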

Table 1 The detailed description of SMILES attributes and graph invariants for constructed models of pIC50

A flowchart of a Monte Carlo optimization cycle is presented by Sokolovic et al. [44]. In the first cycle, the correlation weights CW(x) of the features are generated randomly and are then optimized against the chosen objective function. Herein, two target functions are studied: the balance of correlation without IIC (TF1) and the balance of correlation with IIC (TF2).

TF1 and TF2 are computed with the following equations:

$${TF}_{1}={R}_{TRN}+{R}_{iTRN}-\left|{R}_{TRN}-{R}_{iTRN}\right|\times Const$$
(6)
$${TF}_{2}={TF}_{1}+{IIC}_{CAL}\times Const$$
(7)

\({R}_{TRN}\) and \({R}_{iTRN}\) are the correlation coefficients for the training and invisible training sets, respectively. The empirical constant (Const) is usually fixed [45, 46].
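The two target functions of Eqs. (6) and (7) can be written as short Python functions. This is a sketch under stated assumptions: Eq. (7) writes the weight of the IIC term as Const, whereas here it is kept as a separate parameter `w_iic` whose default of 0.2 is the value reported for TF2 in the results; the default `const=0.1` is purely illustrative, and the IIC value itself (Eq. 8) is passed in as a number.

```python
def target_function_1(r_trn, r_itrn, const=0.1):
    """Eq. (6): balance of correlation without IIC.

    const is an illustrative default, not a value taken from the text.
    """
    return r_trn + r_itrn - abs(r_trn - r_itrn) * const

def target_function_2(r_trn, r_itrn, iic_cal, w_iic=0.2, const=0.1):
    """Eq. (7): TF1 plus the weighted IIC of the calibration set."""
    return target_function_1(r_trn, r_itrn, const) + iic_cal * w_iic

# Hypothetical correlation coefficients and IIC value
tf1 = target_function_1(0.9, 0.8)
tf2 = target_function_2(0.9, 0.8, iic_cal=0.7)
```

Setting `w_iic=0.0` reduces TF2 to TF1, which is how the two families of models differ.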

\({IIC}_{CAL}\) is calculated from the data of the calibration (CAL) set as follows:

$${\mathrm{IIC}}_{\mathrm{CAL}}={\mathrm{R}}_{\mathrm{CAL}}\times \frac{\mathrm{min}({}^{-}{\mathrm{MAE}}_{\mathrm{CAL}}, {}^{+}{\mathrm{MAE}}_{\mathrm{CAL}})}{\mathrm{max}({}^{-}{\mathrm{MAE}}_{\mathrm{CAL}}, {}^{+}{\mathrm{MAE}}_{\mathrm{CAL}})}$$
(8)

\({R}_{CAL}\) is the correlation coefficient for the calibration set. The negative and positive mean absolute errors are denoted \({}^{-}{MAE}_{CAL}\) and \({}^{+}{MAE}_{CAL}\), and are computed using the following equations:

$${}^{-}{\mathrm{MAE}}_{\mathrm{CAL}}=\frac{1}{{}^{-}\mathrm{N}}\sum_{k=1}^{{}^{-}\mathrm{N}}\left|{\Delta }_{\mathrm{k}}\right| \quad {\Delta }_{\mathrm{k}} < 0, {}^{-}\mathrm{N\,is\,the\,number\,of\,}{\Delta }_{\mathrm{k}} < 0$$
(9)
$${}^{+}{\mathrm{MAE}}_{\mathrm{CAL}}=\frac{1}{{}^{+}\mathrm{N}}\sum_{k=1}^{{}^{+}\mathrm{N}}\left|{\Delta }_{\mathrm{k}}\right| \quad {\Delta }_{\mathrm{k}}\ge 0, {}^{+}\mathrm{N\,is\,the\,number\,of\,}{\Delta }_{\mathrm{k}}\ge 0$$
(10)
$${\Delta }_{\mathrm{k}}={\mathrm{Observed}}_{\mathrm{k}}-{\mathrm{Calculated}}_{\mathrm{k}}$$
(11)

Here, k is the index of a compound (1, 2, …, N), and Observedk and Calculatedk are the experimental and model-calculated values of the endpoint, respectively.

This IIC is obtained by using the correlation coefficient between the observed and predicted values of the endpoint for the calibration set, taking into account the positive and negative dispersions between the observed and calculated values [47].
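A minimal sketch of the IIC calculation for one calibration set follows. It assumes that each mean absolute error is averaged over the number of residuals of that sign, and it returns 0.0 when all residuals share one sign; both choices, and the function names, are assumptions of this example and not taken from the CORAL code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def iic(observed, calculated):
    """Index of ideality of correlation, following Eqs. (8)-(11)."""
    deltas = [o - c for o, c in zip(observed, calculated)]  # Eq. (11)
    neg = [abs(d) for d in deltas if d < 0]
    pos = [abs(d) for d in deltas if d >= 0]
    if not neg or not pos:
        return 0.0  # one-sided residuals: treated as zero here (assumption)
    mae_neg = sum(neg) / len(neg)  # Eq. (9)
    mae_pos = sum(pos) / len(pos)  # Eq. (10)
    ratio = min(mae_neg, mae_pos) / max(mae_neg, mae_pos)
    return pearson_r(observed, calculated) * ratio  # Eq. (8)

# Hypothetical calibration-set values
obs = [4.0, 5.0, 6.0, 5.5]
calc = [4.2, 4.9, 5.8, 5.6]
print(iic(obs, calc))
```

When positive and negative residuals are balanced, the ratio is 1 and the IIC equals the correlation coefficient; any imbalance lowers it.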

Applicability domain

The applicability domain (AD) is another key principle that a QSPR/QSAR model should satisfy. It was defined by the OECD as "the response and chemical structure space in which the model produces predictions with a specified reliability" [48, 49]. The CORAL-based QSAR model computes the AD from the distribution of SMILES features in the training and calibration sets [50]. The AD is defined through the statistical defect of each attribute, ‘DefectAK’, which is computed with the following equation:

$$\begin{aligned}&{\mathrm{Defect}}_{{\mathrm{A}}_{\mathrm{K}}}=\frac{\left|{\mathrm{P}}_{\mathrm{TRN}}{(\mathrm{A}}_{\mathrm{K}})-{\mathrm{P}}_{\mathrm{CAL}}{(\mathrm{A}}_{\mathrm{K}})\right|}{{\mathrm{N}}_{\mathrm{TRN}}{(\mathrm{A}}_{\mathrm{K}})+{\mathrm{N}}_{\mathrm{CAL}}{(\mathrm{A}}_{\mathrm{K}})} \quad \mathrm{ If\, }{\mathrm{A}}_{\mathrm{K}}>0\\&{\mathrm{Defect}}_{{\mathrm{A}}_{\mathrm{K}}}=1 \quad \mathrm{ If\,}{\mathrm{A}}_{\mathrm{K}}=0 \end{aligned}$$
(12)

\({P}_{TRN}({A}_{K})\) and \({P}_{CAL}({A}_{K})\) are the probabilities of an attribute 'AK' in the training and calibration sets; \({N}_{TRN}({A}_{K})\) and \({N}_{CAL}({A}_{K})\) are the numbers of occurrences of AK in the training and calibration sets, respectively.

The statistical defect of a molecule is computed using the following equation:

$${\mathrm{Defect}}_{\mathrm{Molecule}}=\sum_{\mathrm{k}=1}^{N{\mathrm{A}}}{\mathrm{Defect}}_{{\mathrm{A}}_{\mathrm{K}}}$$
(13)

NA is the number of active SMILES attributes of the given compound.

In CORAL, a substance is an outlier if inequality 14 is fulfilled:

$${\mathrm{Defect}}_{\mathrm{molecule}} >2\times {\overline{\mathrm{Defect}} }_{\mathrm{TRN}}$$
(14)

\({\overline{\mathrm{Defect}} }_{\mathrm{TRN}}\) is the average statistical defect over the compounds of the training set.
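The applicability-domain test of Eqs. (12) to (14) can be sketched as follows. Two points are assumptions of this example and not statements about CORAL: the probability of an attribute is taken as its count divided by the number of compounds in the set, and the "AK = 0" case is interpreted as the attribute occurring in neither set. All names and numbers are hypothetical.

```python
def attribute_defect(n_trn, n_cal, size_trn, size_cal):
    """Eq. (12): statistical defect of one attribute."""
    if n_trn + n_cal == 0:
        return 1.0  # attribute absent from both sets (assumed reading of A_K = 0)
    p_trn = n_trn / size_trn  # assumed definition of P_TRN(A_K)
    p_cal = n_cal / size_cal  # assumed definition of P_CAL(A_K)
    return abs(p_trn - p_cal) / (n_trn + n_cal)

def molecule_defect(attributes, counts_trn, counts_cal, size_trn, size_cal):
    """Eq. (13): sum of attribute defects over the attributes of one molecule."""
    return sum(
        attribute_defect(counts_trn.get(a, 0), counts_cal.get(a, 0), size_trn, size_cal)
        for a in attributes
    )

def is_outlier(defect_molecule, mean_defect_trn):
    """Inequality (14): outside the applicability domain if the defect is too large."""
    return defect_molecule > 2 * mean_defect_trn

# Hypothetical attribute counts in a training set of 20 and a calibration set of 10
counts_trn = {"O...C...": 10, "N...": 1}
counts_cal = {"O...C...": 2}
d = molecule_defect(["O...C...", "N..."], counts_trn, counts_cal, 20, 10)
```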

Validation of the model

It is essential to validate the predictive potential of a constructed QSAR model. In the present work, the reliability and robustness of the QSAR models were verified using the following three approaches: i) internal validation or cross-validation on the training dataset, ii) external validation on the prediction set and iii) data randomization or Y-scrambling.

Various standard statistical metrics, such as the correlation coefficient (R2), cross-validated correlation coefficient (Q2), concordance correlation coefficient (CCC), the IIC, \({Q}_{F1}^{2}\), \({Q}_{F2}^{2}\), and \({Q}_{F3}^{2}\), standard error of estimation (s), mean absolute error (MAE), Fisher ratio (F), the \({r}_{m}^{2}\) metrics and the Y-scrambling parameter (\({\mathrm{c}}_{{R}_{p}^{2}}\)), were employed to validate the developed QSAR models. The mathematical equations of the various validation metrics are shown in Table 2.

Table 2 Mathematical equations of the statistical benchmarks of predictive potential for CORAL models

The R2 statistic evaluates the goodness of fit of a regression analysis. It measures how much of the variation in the experimental data is reproduced by the predicted values, and ranges between 0 (no correlation) and 1 (perfect fit). The cross‐validated R2 (Q2) is used for internal validation. The concordance correlation coefficient (CCC) measures both precision and accuracy, i.e., the distance of the observations from the fitting line and the degree of deviation of the regression line from the line passing through the origin, respectively [51]. Lower values of MAE and s are desirable for good internal/external predictivity. Roy et al. [54] introduced the metric \({\mathrm{r}}_{\mathrm{m}}^{2}\), which penalizes the r2 value of a model when there is a large deviation between r2 and \({\mathrm{r}}_{0}^{2}\) (Table 2). For a reliable QSAR model, \(\overline{{r }_{m}^{2}}\) and \(\Delta {r}_{m}^{2}\) should be greater than 0.5 and smaller than 0.2, respectively. Y-scrambling or Y-randomization is a test to ensure that the developed QSAR model is not due to chance, thereby giving an idea of model robustness [52]. For a robust QSAR model, the Todeschini \({\mathrm{c}}_{{R}_{p}^{2}}\) parameter [55], which is also calculated, should be more than 0.5. One of the important statistical parameters for comparing different QSAR models is \(\overline{{r }_{m}^{2}}\) for the test set. Here, this parameter is used to select the best of the six proposed models.
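To make the difference between some of these metrics concrete, the sketch below computes R2, CCC and MAE for a small hypothetical set of values. The formulas are the commonly used textbook forms (squared Pearson correlation and Lin's concordance coefficient) and are assumed, not verified, to match the definitions in Table 2; the function names are illustrative.

```python
def _mean(values):
    return sum(values) / len(values)

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = _mean(observed), _mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

def ccc(observed, predicted):
    """Concordance correlation coefficient (Lin's form, assumed here)."""
    mo, mp = _mean(observed), _mean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return 2 * sxy / (sxx + syy + len(observed) * (mo - mp) ** 2)

def mae(observed, predicted):
    """Mean absolute error."""
    return _mean([abs(o - p) for o, p in zip(observed, predicted)])

# Hypothetical values: predictions are perfectly correlated but shifted by one unit
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]
print(r_squared(obs, pred), ccc(obs, pred), mae(obs, pred))
```

In this example R2 is 1 although every prediction is off by one unit, whereas CCC and MAE both reflect the systematic shift, which is why correlation alone is not sufficient for validation.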

Model interpretation

The CORAL software provides a straightforward process for the structural interpretation of QSPR/QSAR models. Three types of attributes can be identified by comparing the correlation weights (CWs) obtained in several runs of the Monte Carlo optimization. Attributes with a positive CW in every run are promoters of endpoint increase, and attributes with a negative CW in every run are promoters of endpoint decrease. Attributes whose CW changes sign between runs are unstable and are not used to identify promoters of increase or decrease [19, 56].
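The sign rule described above is simple enough to express directly. In the sketch below the attribute names and weights are hypothetical, and treating a weight of exactly zero as unstable is an assumption of this example.

```python
def classify_attribute(cws):
    """Classify one attribute from its correlation weights in several runs."""
    if all(w > 0 for w in cws):
        return "increase"
    if all(w < 0 for w in cws):
        return "decrease"
    return "unstable"  # mixed signs, or a zero weight (assumption)

# Hypothetical correlation weights from three Monte Carlo runs
runs = {
    "O...C...": [0.5, 1.2, 0.3],
    "Cl......": [-0.4, -1.0, -0.2],
    "1.......": [0.5, -0.2, 0.1],
}
labels = {attr: classify_attribute(cws) for attr, cws in runs.items()}
print(labels)
```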

Molecular docking

Molecular docking is a method commonly employed in drug discovery and development to identify protein–ligand binding configurations. This approach involves docking a molecule into a specific macromolecule and then computing the binding free energy between the ligand and the receptor [35]. The ligand structures were sketched in ChemDraw 16.0, and their energy was minimised in Chem3D using the MM2 force field [57]. The crystallographic structure of human cytochrome P450 CYP17A1 in complex with abiraterone was obtained from the Protein Data Bank (PDB: 3RUK) and used for molecular docking [58]. AutoDock Vina (Molecular Graphics Lab, CA, USA) was employed for the docking studies [59]. The exhaustiveness was set to 8, and the grid box measured 20.0 × 20.0 × 20.0 Å. The results were examined visually and illustrated using Discovery Studio Visualizer 2021.
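For readers who wish to reproduce a comparable setup, the reported box size and exhaustiveness can be written to an AutoDock Vina configuration file. The helper below is only a sketch: the file names are hypothetical, and the grid-box centre is not reported in the text, so it is a required argument and the value used in the example is a placeholder, not the centre used in this study.

```python
def vina_config(receptor, ligand, center, out,
                size=(20.0, 20.0, 20.0), exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical file names; the centre is a placeholder, not a reported value
config_text = vina_config(
    receptor="3RUK_receptor.pdbqt",
    ligand="quercetin.pdbqt",
    center=(0.0, 0.0, 0.0),
    out="quercetin_docked.pdbqt",
)
print(config_text)
```

The resulting text can be saved to a file and passed to the program with `vina --config <file>`.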

Results and discussion

QSAR modelling for pIC50

Three types of outliers affect model quality in a QSPR/QSAR study. The first type is an outlier in the dependent variable y, the second is an outlier in the direction of the independent variable X, and the third indicates a different relationship between X and y [60]. Here, based on several preliminary QSAR models, six compounds (compounds No. 31, 32, 36, 37, 67, and 80) were identified as outliers because they showed a large absolute error (> 3 s). The structures of these compounds are similar to those of the main body of the samples, so they fall into the first type of outlier. They were therefore removed from the data set before further data processing.

In this study, the balance of correlation approach was employed to generate the QSAR models. A total of six QSAR models were generated using two target functions, i.e., TF1 (WIIC = 0.0) and TF2 (WIIC = 0.2). To find the preferable threshold value (T*) and number of epochs (N*), thresholds from 1 to 10 and epochs from 1 to 50 were examined. In the case of TF1, the values of T* and N* were 1 and 10 for split 1, 1 and 3 for split 2, and 1 and 7 for split 3, respectively. In the case of TF2, the optimum (T*, N*) values for splits 1, 2, and 3 were (1, 10), (1, 10), and (1, 7), respectively.

The mathematical relationships of the developed QSAR models of pIC50 using TF1 and TF2 for the three splits are displayed below:

The Monte Carlo optimization with target function TF1

$$\mathrm{Split }1\,\, {\mathrm{pIC}}_{50}=-8.4912\left(\pm 0.2835\right)+0.0978\left(\pm 0.0021\right)\times \mathrm{DCW}(1, 10)$$
(15)
$$\mathrm{Split }2\,\, {\mathrm{pIC}}_{50}=-16.266\left(\pm 0.2769\right)+0.1309\left(\pm 0.0017\right)\times \mathrm{DCW}(1, 3)$$
(16)
$$\mathrm{Split }3\,\, {\mathrm{pIC}}_{50}=-4.2842\left(\pm 0.2158\right)+0.0626\left(\pm 0.0015\right)\times \mathrm{DCW}(1, 7)$$
(17)

The Monte Carlo optimization with target function TF2

$$\mathrm{Split }1 \,\,{\mathrm{pIC}}_{50}=-3.1689\left(\pm 0.2140\right)+0.0272\left(\pm 0.0007\right)\times \mathrm{DCW}(1, 10)$$
(18)
$$\mathrm{Split }2\,\, {\mathrm{pIC}}_{50}=-9.6171\left(\pm 0.3420\right)+0.0758\left(\pm 0.0017\right)\times \mathrm{DCW}(1, 10)$$
(19)
$$\mathrm{Split }3\,\, {\mathrm{pIC}}_{50}=-7.0645\left(\pm 0.3206\right)+0.0482\left(\pm 0.0013\right)\times \mathrm{DCW}(1, 7)$$
(20)
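Equations (15) to (20) are one-variable linear models, so applying them is a single multiply-add once the DCW of a compound is known. The sketch below stores the reported intercepts and slopes (without their standard errors) and evaluates one model; the DCW value in the example is hypothetical, and a DCW is only meaningful for the model whose correlation weights produced it.

```python
# (intercept C0, slope C1) taken from Eqs. (15)-(20)
MODELS = {
    ("TF1", 1): (-8.4912, 0.0978),
    ("TF1", 2): (-16.266, 0.1309),
    ("TF1", 3): (-4.2842, 0.0626),
    ("TF2", 1): (-3.1689, 0.0272),
    ("TF2", 2): (-9.6171, 0.0758),
    ("TF2", 3): (-7.0645, 0.0482),
}

def predict_pic50(dcw, target_function="TF2", split=3):
    """Evaluate pIC50 = C0 + C1 * DCW for the chosen model."""
    c0, c1 = MODELS[(target_function, split)]
    return c0 + c1 * dcw

# Hypothetical DCW value for a compound under the TF2, split 3 model
print(predict_pic50(250.0))
```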

The statistical results of designed QSAR models for three splits utilizing TF1 and TF2 are presented in Table 3. As can be seen, all developed QSAR models were acceptable statistically and agreed with the requirements of various validation criteria.

According to the results presented in Table 3, the models constructed using TF2 (with IIC) had better statistical results than the models constructed using TF1 (without IIC). The results for the calibration and validation sets were better for the models constructed using TF2, although the quality of the model for the training sets was lower. Hence, the models designed with the IIC are statistically more significant and robust for the present dataset. Following the validation metric study of QSPR/QSAR models by Ojha et al., the \({\overline{r} }_{m}^{2}\) value is used to judge the quality of the predictions of the different models. The QSAR model developed with TF2 for split 3 was selected as the leading model, with the highest \({\overline{r} }_{m}^{2}\) (\({\overline{r} }_{m}^{2}\)=0.615).

Table 3 The statistical characteristics of CORAL models for pIC50 generated with TF1 and TF2

The plot of observed versus predicted pIC50 for the three models designed with TF2 is displayed in Fig. 1. In QSAR models generated with the Monte Carlo method, outliers are identified by their statistical defects. In the present QSAR models created with TF2, the number of outliers was six for all splits. Table 4 displays the flavonol IDs, SMILES codes, and descriptors of correlation weights (DCWs) together with the experimental and predicted pIC50.

Fig. 1

Observed pIC50 versus predicted pIC50 values for three CORAL models constructed based on TF2

Table 4 SMILES notation, the distribution of splits, DCWs, observed and predicted pIC50 of flavonols (+, −, #, and * show the compounds located in the training, invisible training, calibration, and validation sets, respectively)

Interpretation of the QSAR model

Mechanistic interpretation of a QSAR model is the fifth OECD principle. It relates the chemical structure of the compounds to their property/activity and identifies the molecular features responsible for the increase or decrease of the endpoint, which can be derived from the QSAR models. Information on the structural features of flavonols that promote an increase or decrease of pIC50 may aid in the design and development of new flavonol derivatives.

In CORAL, the correlation weights (CWs) of structural attributes (SAk) are calculated in three or more runs, and the mechanistic interpretation is achieved by analysis of these CWs. If the CW of a structural attribute is greater than zero in all runs of the optimization, the attribute is considered a promoter of increase. Conversely, if the CW is smaller than zero in all runs, the attribute is defined as a promoter of decrease [61, 62].

The list of attributes and their correlation weights for three runs of all splits computed with TF2 is presented in Table 5. The most significant structural attributes that promote an increase of pIC50 were identified and extracted. These were an aliphatic carbon atom connected to a double bond (C…=…), an aliphatic oxygen atom connected to an aliphatic carbon (O…C…), branching on an aromatic ring (c…(…), and aliphatic nitrogen (N…). The good fingerprints obtained from the Monte Carlo optimization are shown in Fig. 2 for the two compounds with the highest pIC50 (compounds no. 60 and 64).

Table 5 Interpretation of the important features that increase the pIC50 values for the three splits
Fig. 2

Good fingerprints obtained from Monte Carlo optimization method

A series of natural flavonols with unknown pIC50 was selected, and their pIC50 was calculated with the QSAR model of the best split (split 3). The names, chemical structures and predicted pIC50 of the selected natural flavonol derivatives with pIC50 greater than 4 are given in Table 6. These compounds were also considered for the molecular docking studies.

Table 6 The chemical structures of some natural flavonols with the pIC50 predicted using the leading model (split 3), docking scores (kcal mol−1) and the amino acids of 3RUK with which they interact

Molecular docking studies

Abiraterone was redocked into the active site of human cytochrome P450 CYP17A1 (PDB: 3RUK) to validate the docking protocol. The validation gave a binding energy of − 10.3 kcal/mol for abiraterone and a root-mean-square deviation (RMSD) of 1.172 Å (Fig. 3). The active pocket consisted of amino acid residues such as Val366, Val483, Val482, Ala367, Glu305, Gly301, Leu209, Asn220, Tyr201, Ile206, Ile205, Arg239, Phe114, Ala302, Ile371, Ala113, Thr306, and Cys442, which play fundamental roles through hydrophobic interactions and H-bond formation (Fig. 4).
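The RMSD between a redocked pose and the co-crystallized ligand can be computed as sketched below. The example assumes that both coordinate lists contain the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied; the coordinates shown are hypothetical and are not those of abiraterone.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical coordinates in angstroms: the docked pose is shifted 1 A along x
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
print(rmsd(crystal, docked))
```

A redocking RMSD below about 2 Å is commonly taken as evidence that the docking protocol reproduces the experimental binding mode.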

Fig. 3

Superposition of the abiraterone output docked ligand (blue) and the co-crystallized ligand (green) of 3RUKA

Fig. 4

3D docking mode and 2D schematic interaction diagram for the best pose of abiraterone redocked into 3RUK crystal structure

In addition, docking studies were conducted for the eight natural flavonols with predicted pIC50 greater than 4.0 according to the best model (split 3), along with compound number 60, which has high experimental activity. The natural flavonols azaleatin, gossypetin, isorhamnetin, myricetin, pachypodol, quercetin, rhamnazin, and rhamnetin exhibited binding energies of − 8.1, − 8.5, − 8.0, − 8.2, − 7.9, − 8.4, − 8.3, and − 8.2 kcal/mol, respectively (Table 6). The docking outcomes were consistent with the calculated pIC50 of the flavonols. The superimposition image of the optimum binding pose for each suggested flavonol is displayed in Fig. 5. Figure 6 shows the 3D docking mode and 2D schematic depiction of interactions for some natural flavonols and the active ligand. The oxygen atoms were involved in hydrogen-bond interactions with the active-site amino acid residues, so the oxygen atoms of the flavonols are particularly significant for their anti-prostate cancer effect. The positive contribution of the oxygen atom to the pIC50 of flavonol derivatives was also seen in the mechanistic interpretation of the above-mentioned QSAR models. Thus, the present QSAR models are acceptable for a wide range of flavonol derivatives.

Fig. 5

Superimposed poses of docked molecules and the co-crystallized abiraterone (violet) into the active site of 3RUKA

Fig. 6

3D docking mode and 2D schematic interaction diagram for the best pose of some natural flavonols against 3RUK crystal structure (for interpretation of the references to color in this figure legend, the reader is referred to the web version of this article)

Conclusion

In the present study, reliable QSAR models were developed to predict the anti-prostate cancer activities of 81 flavonol derivatives using the Monte Carlo optimization technique of CORAL software. To date, no QSAR models for predicting the pIC50 of this dataset have been reported. Six QSAR models were constructed using the balance of correlation method with two target functions, TF1 (WIIC = 0.0) and TF2 (WIIC = 0.2). The IIC was employed to improve the reliability and robustness of the models. The QSAR models developed with TF2 performed better than the models developed with TF1. The predictability and robustness of the designed models were evaluated with various statistical parameters such as R2, Q2, IIC, CCC, MAE, s, \(\overline{{r }_{m}^{2}}\), Δ\({r}_{m}^{2}\), \({C}_{{R}_{p}^{2}}\), F and the Y-test. The AD was also analysed based on the ‘statistical defect’, d(A), of the SMILES attributes, and the outliers were identified. The structural attributes that promote an increase/decrease of pIC50 were identified, and the pIC50 of natural flavonols was predicted. The mechanistic interpretation was also supported by molecular docking of natural flavonols into the active site of human cytochrome P450 CYP17A1 (PDB: 3RUK).