1 Background

The emergence of new diseases, coupled with the increasing resistance of existing diseases to current therapies, has created a continual need for novel medications and therapies. This need has taken its toll on the environment, particularly on plants and micro-organisms [1, 2]. The trial-and-error approach of traditional drug design has led to wastage of precious natural resources, disruption of ecosystems, and, in some cases, environmental pollution. Additionally, obtaining a drug candidate using traditional methods usually takes a long time [3]. Computational methods, however, offer time-saving, cost-efficient, and cleaner alternatives. Computer-aided drug design (specifically ligand-based design) employs tools that model the activities of a set of compounds; the resulting model is then used to predict the activities of other compounds having a similar pharmacophore [4, 5].

Cancer refers to a collection of over 200 different diseases in which certain cells grow abnormally and have the potential to invade or spread to other parts of the body, killing healthy cells in the process [6]. Prostate cancer is one of the most common cancers diagnosed in males. It is second only to skin cancer and more prevalent than lung cancer in the UK and the USA. Prostate cancer accounts for about 4% of all cancers suffered by men annually. An estimated 1.3 million cases were reported worldwide in 2018. Prostate cancer is prevalent in older men (> 65 years), with 6 out of 10 men in the age group being diagnosed with the cancer. About 80% of all prostate cancer cases are reported in this age group [7, 8, 9]. The mortality rate of prostate cancer is not very high; the 5-year survival rate is about 98%. However, prostate cancer is still one of the leading causes of cancer-related death in males, second only to lung cancer. It is the fifth leading cause of death in males worldwide. It also accounts for about 4% of cancer-related deaths, and over three hundred thousand deaths were attributed to the cancer in 2018 in the USA [7, 8].

The DU145 cell line is one of the three most widely used cell lines in prostate cancer research, the other two being PC3 and LNCaP. DU145 cells have moderate metastatic potential compared to PC3 cells and are androgen receptor positive [10]. The growth of prostate cancer cell lines has been reported to proceed via modification of the androgen receptor [11]. Thus, modification of the androgen receptor is a viable strategy for combating early-stage prostate cancer. This study built a QSAR model and a QSTR model to predict, respectively, the activity of some phenylpiperazine derivatives against DU145 prostate cancer cell lines and their toxicity to normal prostate epithelial cells. The study also investigated the interaction between the compounds and the androgen receptor via molecular docking studies.

2 Methods

2.1 Dataset

Thirty-seven (37) phenylpiperazine derivatives reported by Chen et al. [12, 13] were employed in building a QSAR model that predicts their anti-proliferative activity against DU145 prostate cancer cell lines. Thirty (30) other derivatives having toxic (anti-proliferative) activity against normal prostate epithelial cells were employed in building the QSTR model [12, 13, 14]. Figure 1 presents the 2D structures of the compounds used to build the QSAR model while Fig. 2 presents those for the QSTR model. The anti-proliferative activity (IC50) of the compounds ranged from 0.77 to 46.24 μM while toxicity ranged from 3.87 to 49.21 μM. The activity and toxicity values were converted to the logarithmic scale using the formula pIC50 = − log10(IC50), with IC50 expressed in mol/L. This conversion reduced the skew in the data and linearized the activity of the compounds [15]. On the logarithmic scale, the activity ranged from 4.3350 to 6.1135 while toxicity ranged from 4.3079 to 5.4123.
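The conversion to the logarithmic scale can be illustrated with a minimal Python sketch. It assumes that IC50 is converted from micromolar to mol/L before the logarithm is taken; this assumption reproduces the range limits quoted in this section. The function name `pic50_from_micromolar` is hypothetical and is not part of any software used in the study.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    # 1 micromolar = 1e-6 mol/L (assumed unit convention, see lead-in)
    return -math.log10(ic50_um * 1e-6)


# Range limits quoted in the text (IC50 in micromolar).
for ic50 in (0.77, 46.24, 3.87, 49.21):
    print(f"IC50 = {ic50:6.2f} uM -> pIC50 = {pic50_from_micromolar(ic50):.4f}")
```

Rounded to four decimal places, the four values agree with the activity and toxicity limits reported above.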

Fig. 1 2D structures of the compounds used in the QSAR study

Fig. 2 2D structures of the compounds used in the QSTR study

2.2 QSAR/QSTR model

The 2D structure of each molecule was drawn using ChemDraw Ultra 12.0 and converted to its equivalent 3D structure using Spartan 14 V1.4. The ground-state equilibrium geometry of each compound was then obtained by optimization at the B3LYP/6-31G* level of density functional theory (DFT) in Spartan 14 [16, 17]. The molecular descriptors of the optimized molecules were calculated using the PaDEL-Descriptor (Pharmaceutical Data Exploration Laboratory) software version 2.21. The resulting descriptor dataset was pretreated at a correlation cut-off of 0.8 in the DTC Lab Pretreatment software version 1.2 and divided into training and test sets using the Kennard-Stone algorithm in the DTC Lab Dataset Division software version 1.2. The training set was transferred to Accelrys Materials Studio version 8.0, where the model was built, while the test set was subsequently employed for external validation of the built model [18, 19].
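The Kennard-Stone division mentioned above can be sketched in a few lines of Python. This is an illustrative implementation of the general procedure, not a reproduction of the DTC Lab tool: the use of Euclidean distance on unscaled descriptor values, the function name `kennard_stone`, and the toy descriptor rows are all assumptions made for the example.

```python
import math


def kennard_stone(rows, n_train):
    """Return the indices of n_train rows chosen by the Kennard-Stone procedure."""
    n = len(rows)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of rows")
    # Pairwise Euclidean distances between descriptor vectors.
    dist = [[math.dist(a, b) for b in rows] for a in rows]
    # Seed the training set with the two most distant compounds.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected compound is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical two-descriptor dataset (not taken from the study).
descriptors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.5, 0.5], [0.9, 1.0], [0.0, 1.0]]
train_idx = kennard_stone(descriptors, n_train=4)
test_idx = [k for k in range(len(descriptors)) if k not in train_idx]
print("training rows:", train_idx)
print("test rows:", test_idx)
```

Because each new training compound is the one farthest from those already chosen, the training set spans the descriptor space and the test compounds tend to lie inside the region it covers.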

2.3 Model validation

The validity, robustness, and predictive ability of a built model are ascertained by subjecting the model to validation tests. The internal consistency of the model was assessed using the coefficient of determination (R2 and R2adj.) and the cross-validated coefficient of determination (Q2cv). A robust model has values ≥ 0.7 for R2 and ≥ 0.6 for Q2cv [20]. R2 is a measure of the variation in the activity/toxicity of the molecules that can be explained by the model. Thus, a robust model should be able to explain at least 0.7 (70%) of the variation in the activity/toxicity of the compounds. Q2cv measures the degree to which the model can generalize to another independent dataset of phenylpiperazine derivatives [21]. R2, R2adj., and Q2cv, alongside other statistical parameters such as Friedman’s lack of fit (LOF) and the significance-of-regression F-value, were generated automatically by the Materials Studio software while the model was built. The reproducibility of the model on independent data was evaluated by applying the model to an independent test set. The external coefficient of determination (R2ext.) was determined using Eq. 1 [22].

$$ R^2_{\mathrm{ext}} = 1 - \frac{\sum {\left( Y_{\exp_{\mathrm{test}}} - Y_{\mathrm{pred}_{\mathrm{test}}} \right)}^2}{\sum {\left( Y_{\exp_{\mathrm{test}}} - \overline{Y}_{\exp_{\mathrm{train}}} \right)}^2} $$
(1)

where \( Y_{\exp_{\mathrm{test}}} \) is the experimental activity/toxicity of each test set molecule, \( Y_{\mathrm{pred}_{\mathrm{test}}} \) is the predicted activity/toxicity of each test set molecule, and \( \overline{Y}_{\exp_{\mathrm{train}}} \) is the mean activity/toxicity of the training set compounds. Both sums run over the test set molecules. A model with good predictive power has R2ext. ≥ 0.6 [20].
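As an illustration of the external validation statistic, the following Python sketch computes R2ext. with the training set mean as the reference value, following the definition of the mean given in the text. The function name `r2_ext` and the numerical values are hypothetical and are not taken from the study.

```python
def r2_ext(y_exp_test, y_pred_test, y_train_mean):
    """External R^2: 1 - SS(residual) / SS(deviation from the training-set mean)."""
    if len(y_exp_test) != len(y_pred_test):
        raise ValueError("experimental and predicted lists must have equal length")
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp_test, y_pred_test))
    ss_ref = sum((e - y_train_mean) ** 2 for e in y_exp_test)
    return 1.0 - ss_res / ss_ref


# Hypothetical test-set values (assumed for illustration only).
y_exp = [5.2, 4.8, 5.6, 4.5]
y_pred = [5.0, 4.9, 5.4, 4.7]
print(round(r2_ext(y_exp, y_pred, y_train_mean=5.0), 4))
```

Passing the reference mean as an argument keeps the choice explicit, since some authors use the test set mean in the denominator instead.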

The inter-correlation between the molecular descriptors was also evaluated using Pearson’s correlation and variance inflation factor (VIF) tests [18]. To ensure that each molecular descriptor made a unique contribution to the prediction of the activity/toxicity, the descriptors ought to be poorly correlated (R < 0.4) and have VIF values less than 10. The VIF of each molecular descriptor was calculated as \( {\left(1-R^2\right)}^{-1} \), where R2 is the coefficient of determination obtained by regressing that descriptor on the other descriptors in the model [18, 22]. The mean effect of each molecular descriptor was also calculated using Eq. 2. The mean effect reveals the descriptors which have the highest positive (or negative) impact on the activity/toxicity [21].

$$ \mathrm{Mean}\ \mathrm{effect}_j = \frac{\beta_j \sum_{i=1}^{n} D_{ij}}{\sum_{j=1}^{m} \left( \beta_j \sum_{i=1}^{n} D_{ij} \right)} $$
(2)

where \( \beta_j \) is the jth descriptor’s coefficient in the regression model, \( D_{ij} \) is the value of the jth descriptor for the ith molecule in the training set, m is the number of molecular descriptors in the regression model, and n is the size of the training set.
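The two descriptor diagnostics described above can be sketched in Python as follows. The function names `mean_effects` and `vif`, the coefficients, and the descriptor values are hypothetical; the sketch only mirrors the formulas stated in the text and assumes that the R2 used for the VIF has already been obtained elsewhere.

```python
def mean_effects(coefficients, training_rows):
    """Mean effect of each descriptor: beta_j * sum_i D_ij, divided by the total over j."""
    contributions = []
    for j, beta in enumerate(coefficients):
        column_sum = sum(row[j] for row in training_rows)
        contributions.append(beta * column_sum)
    total = sum(contributions)
    if total == 0:
        raise ZeroDivisionError("descriptor contributions sum to zero")
    return [c / total for c in contributions]


def vif(r_squared):
    """Variance inflation factor, 1 / (1 - R^2), for one descriptor."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Hypothetical two-descriptor model and three training compounds.
betas = [2.0, -1.0]
rows = [[1.0, 2.0], [3.0, 1.0], [2.0, 1.0]]
effects = mean_effects(betas, rows)
print("mean effects:", effects)
print("VIF at R^2 = 0.5:", vif(0.5))
```

By construction the mean effects sum to one, so the sign of an individual mean effect depends on the sign of the total as well as on the sign of the descriptor's own coefficient.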

The Williams plot of the applicability domain was also drawn using the leverage technique (Eq. 3) [19]. The applicability domain is the region of chemical space in which the model makes reliable predictions for the activity/toxicity of the compounds [23]. Predictions for compounds in the dataset which fall within the applicability domain can be reliably employed in further computer-aided compound design such as ligand-based design.

$$ {H}_j={x}_j{\left({X}^TX\right)}^{-1}{x}_j^T $$
(3)

where Hj is the leverage of the jth compound, xj is the 1 × m row vector of the m molecular descriptors of compound j, and X is the k × m matrix whose k rows hold the descriptor values of the k training set compounds. The boundary of the applicability domain (the warning leverage) was evaluated as h* = 3(m + 1)/k.
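A minimal Python sketch of the leverage calculation follows. It implements Eq. 3 as written, with X containing only descriptor columns; whether the software used in the study adds an intercept column is not stated, so that choice is an assumption. The helper names `solve`, `leverages`, and `warning_leverage` and the toy matrix are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def leverages(x_train, x_query):
    """h_j = x_j (X^T X)^-1 x_j^T for each query row, with X built from the training rows."""
    m = len(x_train[0])
    xtx = [[sum(row[a] * row[b] for row in x_train) for b in range(m)] for a in range(m)]
    # Solving (X^T X) z = x_j^T avoids forming the inverse explicitly.
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in x_query]


def warning_leverage(n_descriptors, n_train):
    """Boundary of the applicability domain, h* = 3(m + 1)/k."""
    return 3.0 * (n_descriptors + 1) / n_train


# Hypothetical training matrix: k = 4 compounds, m = 2 descriptors.
x_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
h = leverages(x_train, x_train)
h_star = warning_leverage(n_descriptors=2, n_train=4)
print("leverages:", [round(v, 4) for v in h])
print("h* =", h_star)
```

In a Williams plot, each compound's leverage is plotted against its standardized residual, and compounds with leverage above h* are treated as lying outside the applicability domain.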

2.4 Molecular docking

Molecular docking studies were carried out using Biovia’s Discovery Studio 2016 client and the AutoDock Vina integration of the PyRx (Python Prescription) 0.8 software. The crystal structure of the androgen receptor (PDB code: 5T8E) was downloaded from the Protein Data Bank [24]. The downloaded receptor was prepared in the Discovery Studio software by removing water molecules, heteroatoms, and cofactors. Plate 1 presents the 3D crystal structure of the prepared receptor. The 3D optimized structures of compounds 25 and 32 (Fig. 1) were prepared for molecular docking by converting them to the .pdb file format in the Spartan 14 software [18, 22]. The binding affinity between each compound and the androgen receptor was determined using the PyRx software, while the category and type of interaction between the compounds and the receptor were viewed in the Discovery Studio software.

Plate 1 Crystal structure of the prepared androgen receptor (PDB ID: 5T8E)

3 Results

3.1 QSAR

The compounds presented in Fig. 1 were employed to build a QSAR model using the Genetic Function Algorithm–Multilinear Regression (GFA-MLR) method. Four models were built, each consisting of five molecular descriptors. Model 1 was adopted as the best model because it had statistical parameters similar to those reported for a robust model [20]. The regression equation of model 1 is presented in Eq. 4. Its statistical parameters are presented in Table 1 while the definition and class of its molecular descriptors are presented in Table 2. The regression equations of the other QSAR models built are presented in Supplementary Table S1. Table 3 presents the correlation between the molecular descriptors in the model as well as their VIF and mean effect values. The external validation of the model was carried out by applying the model to the test set. The external coefficient of determination (R2ext.) was calculated as presented in Table 4.

$$ \mathrm{Model\ 1}:\quad \mathrm{pIC}_{50} = -0.298912928 \times \mathrm{VR3\_Dzp} - 0.171389530 \times \mathrm{VE3\_Dzi} + 0.961235807 \times \mathrm{Kier3} - 6.822313106 \times \mathrm{RHSA} + 0.120746172 \times \mathrm{RDF55v} + 8.757696717 $$
(4)
Table 1 Statistical parameters of model 1
Table 2 Description and class of molecular descriptors in the built model
Table 3 Correlation, VIF, and mean effect of molecular descriptors
Table 4 External validation of the built model

THSA: sum of the solvent-accessible surface areas of atoms with absolute value of partial charge less than 0.2

NB: \( \overline{Y}_{\exp_{\mathrm{train}}} \) was evaluated to be 5.0034

Equation 4 was used to predict the anti-proliferative activity of the compounds in the training and test sets. The predicted activity and residual of each compound in the training and test sets are presented in Supplementary Tables S2 and S3. Figure 3 presents a graph of the observed experimental activity against the predicted activity. Figure 4 is a graph of the observed anti-proliferative activity against the standardized residual, and Fig. 5 presents the Williams plot of the applicability domain of the built model.
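Applying the regression equation of model 1 to a compound amounts to a weighted sum of its descriptor values. The Python sketch below uses the coefficients and intercept reported for model 1; the function names `predict_pic50` and `residual` and the descriptor values in the example are hypothetical and do not correspond to any compound in the dataset.

```python
# Coefficients and intercept of QSAR model 1, as reported in the regression equation.
MODEL_1 = {
    "VR3_Dzp": -0.298912928,
    "VE3_Dzi": -0.171389530,
    "Kier3": 0.961235807,
    "RHSA": -6.822313106,
    "RDF55v": 0.120746172,
}
INTERCEPT = 8.757696717


def predict_pic50(descriptors):
    """Predicted pIC50 = intercept + sum of coefficient * descriptor value."""
    missing = MODEL_1.keys() - descriptors.keys()
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return INTERCEPT + sum(coef * descriptors[name] for name, coef in MODEL_1.items())


def residual(observed, predicted):
    """Residual defined here as observed minus predicted activity."""
    return observed - predicted


# Hypothetical descriptor values (assumed for illustration only).
example = {"VR3_Dzp": 1.0, "VE3_Dzi": 2.0, "Kier3": 3.0, "RHSA": 0.5, "RDF55v": 4.0}
predicted = predict_pic50(example)
print(f"predicted pIC50 = {predicted:.4f}")
print(f"residual for an observed pIC50 of 5.0 = {residual(5.0, predicted):.4f}")
```

The same pattern applies to the QSTR model by substituting its four coefficients and intercept.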

Fig. 3 Experimental activity against predicted activity

Fig. 4 Experimental activity against standardized residuals

Fig. 5 Williams plot of the applicability domain

3.2 QSTR

The toxicity of the compounds against normal prostate epithelial cells was also modeled by building four GFA-MLR models. Each model consisted of four molecular descriptors. The first model had statistical parameters similar to those reported for a stable, robust model [20]. Equation 5 presents the regression equation of the model. Supplementary Table S4 presents the regression equations of the other QSTR models built. Table 5 presents the statistical parameters of the built models while Table 6 presents the description and class of the molecular descriptors in model 1. Table 7 presents the results of correlation studies, VIF, and mean effect of the molecular descriptors. The model was used to predict the toxicity of the compounds in the training set (Table S5) and test set (Table S6). The external coefficient of determination (R2ext.) of the model was also calculated (Table S7) and was equal to 0.6344.

$$ \mathrm{Model\ 1}:\quad \mathrm{pIC}_{50} = -5.247227835 \times \mathrm{MATS8c} - 2.264116018 \times \mathrm{MATS3s} + 3.344208710 \times \mathrm{ETA\_EtaP\_F} - 0.082468094 \times \mathrm{RDF95m} + 2.449569658 $$
(5)
Table 5 Statistical parameters of built models
Table 6 Description and class of molecular descriptors in model 1
Table 7 Correlation, VIF, and mean effect of molecular descriptors

Figure 6 presents a graph of the experimental toxicity against the predicted toxicity of the compounds. Figure 7 is a graph of the standardized residuals against experimental toxicity while Fig. 8 presents the domain of applicability of the built model.

Fig. 6 Experimental toxicity against predicted toxicity

Fig. 7 Standardized residuals against experimental toxicity

Fig. 8 Williams plot of the applicability domain

3.3 Molecular docking

Molecular docking studies were carried out to investigate the binding affinity and types of interactions between compounds 25 and 32 (Fig. 1) and the androgen receptor. The binding affinity, category, and type of interaction between each compound and the receptor are presented in Table 8. Plate 2 presents the 2D interaction between compound 25 and the receptor while Plate 3 presents the 2D interaction between compound 32 and the receptor.

Table 8 Binding affinity and interaction between compounds 25 and 32 and the androgen receptor
Plate 2 2D structure of the interaction of compound 25 and the androgen receptor (PDB ID: 5T8E)

Plate 3 2D structure of the interaction of compound 32 and the androgen receptor (PDB ID: 5T8E)

4 Discussion

In silico methods are invaluable computer-aided techniques for obtaining and optimizing potential drug leads and candidates. QSAR is a variant of the Quantitative Structure Property Relationship (QSPR) approach. It models the activity of a set of compounds as a linear combination of molecular descriptors, which are numbers that describe particular molecular properties of a compound [5, 25]. A robust QSAR model employs molecular descriptors which significantly affect the activity of the compounds, and it can be employed in predicting the activity of other similar compounds [18]. The built QSAR model had internal validation parameters (R2 = 0.7792, R2adj. = 0.7240, Q2cv = 0.6607) which are similar to those reported for a robust model [20]. The coefficient of determination (R2 and R2adj.) is a measure of the variation in the activity of the compounds which can be explained by the model; the built model explained at least 70% (0.7) of this variation. Q2cv, as defined earlier, measures the degree to which the built model can generalize to another independent dataset of phenylpiperazine compounds [21]. The value obtained indicates that the model accounted for about 66% (0.66) of the variation in activity under cross-validation. The external coefficient of determination (R2ext.) of 0.6049 is similar to that reported for a model with good predictive capacity [20]. A robust model is characterized by poorly correlated molecular descriptors and VIF values less than 10.0 [18]. Table 3 presents the correlation between the molecular descriptors in the model as well as their VIF and mean effect values. The descriptors were observed to be poorly correlated (R < 0.42). The VIF values obtained (< 2.5) confirmed this low inter-correlation; as such, each molecular descriptor can be considered to make a unique contribution to predicting the activity of the compounds.
Mean effect studies (Table 3) revealed that the molecular descriptors RHSA and VR3_Dzp had the largest positive effect on the activity of the compounds while Kier3 and VE3_Dzi had the largest negative effect. The coefficients of determination R2train and R2test obtained (Fig. 3) highlight the robustness and predictive power of the model, while the random spread of the standardized residuals (Fig. 4) indicates the absence of systematic error [19]. The domain of applicability of the model is the region of chemical space where predictions made by the model can be reliably employed for further theoretical studies or experimental applications [23]. The Williams plot of the domain of applicability (Fig. 5) revealed the presence of four outliers (compounds 2, 8, 26, and 31).

QSTR is also a variant of the QSPR approach, and it models the toxicity of compounds as a linear combination of molecular descriptors. QSTR models are built on the same premise as QSAR models. The QSTR model built explained at least 80% (R2 and R2adj. > 0.8) of the variation in the toxicity of the compounds. The cross-validated coefficient of determination (Q2cv = 0.7788) and external coefficient of determination (R2ext. = 0.6344) obtained were similar to those reported for a robust model [20]. The molecular descriptors were observed to be poorly correlated, having VIF values less than 2.0. The molecular descriptor ETA_EtaP_F was observed to have the highest positive impact on the toxicity of the compounds. Figure 6 showed the built QSTR model to have good predictive ability, with R2train and R2test values of 0.8652 and 0.6317, respectively. The model was also observed to have no systematic error (Fig. 7), and the domain of applicability plot (Fig. 8) revealed the presence of only one outlier (compound 4). The QSAR and QSTR models built can be employed in ligand-based compound design to design novel phenylpiperazine compounds with better anti-proliferative activity and lower toxicity.

Molecular docking is an in silico approach that investigates the binding interaction between a ligand and a receptor. The receptor is a macromolecule, usually a protein such as an enzyme or a biological receptor. Molecular docking is primarily concerned with the binding affinity (or energy) and the type of interaction between the receptor and the ligand [26, 27]. Findings from molecular docking studies are employed in in silico compound design via structure-based methods. The binding affinity and types of interactions between compounds 25 and 32 and the androgen receptor are presented in Table 8. Compound 32 was observed to form more non-bonding interactions with the androgen receptor, and this accounts for its relatively higher binding affinity (−7.00 kcal/mol). It was observed to interact with the threonine (THR755), arginine (ARG752), proline (PRO801), and phenylalanine (PHE754) amino acid residues of the receptor. Compound 25, on the other hand, formed interactions with the glutamic acid (GLU772), tyrosine (TYR781), and arginine (ARG779) residues. Furthermore, it was observed that the hydrogen bond interaction (bond distance = 2.21195 Å) had a greater stabilizing effect compared to the halogen interaction (bond distance = 3.13077 Å). Shorter interaction distances hold atoms closer to each other and as such generally indicate stronger interactions than longer ones. The findings from the molecular docking studies can be employed in the in silico design of novel phenylpiperazine compounds via structure-based design.

The compounds employed in this study have been reported to show in vitro cytotoxic activity against the DU145 prostate cancer cell lines [12, 13]. The regression models built revealed the molecular descriptors which significantly affect the cytotoxicity of the compounds. Furthermore, the models provide a platform upon which novel phenylpiperazine compounds can be designed via the ligand based design approach.

5 Conclusion

This study employed variants of the Quantitative Structure Property Relationship approach for the in silico study of some phenylpiperazine compounds. The variants employed were the QSAR and QSTR techniques. The study built two robust models to predict the toxicity and anti-proliferative activity of the phenylpiperazine compounds against normal prostate epithelial cells and the DU145 prostate cancer cell lines, respectively. Both models were built using the Multilinear Regression–Genetic Function Algorithm method available in the Biovia Materials Studio version 8.0 software. Both models had statistical parameters (R2 > 0.7, R2adj. > 0.7, Q2cv > 0.6, R2ext. > 0.6, R2train > 0.7, and R2test > 0.6) which were similar to those reported for robust models. The activity of the compounds was revealed to be strongly dependent on the molecular descriptors VR3_Dzp, VE3_Dzi, Kier3, RHSA, and RDF55v. The toxicity of the compounds, on the other hand, was observed to be strongly dependent on the molecular descriptors MATS8c, MATS3s, ETA_EtaP_F, and RDF95m. Molecular docking studies were also carried out between compounds 25 and 32 and the androgen receptor. These studies revealed that compound 25 formed halogen and hydrophobic interactions with the receptor, with a binding affinity of −6.40 kcal/mol, while compound 32 formed hydrogen bond and hydrophobic interactions, with a binding affinity of −7.00 kcal/mol. This study provides insight into the activity and toxicity of phenylpiperazine compounds and also provides information that can be employed in the in silico design of other phenylpiperazine compounds via ligand-based and/or structure-based design methodology.