QSAR, molecular docking, design, and pharmacokinetic analysis of 2-(4-fluorophenyl) imidazol-5-ones as anti-breast cancer drug compounds against MCF-7 cell line

The anti-proliferative activities of a novel series of 2-(4-fluorophenyl) imidazol-5-ones against the MCF-7 breast cancer cell line were explored via in-silico studies, which included quantitative structure–activity relationship (QSAR) modelling, molecular docking, the design of new compounds, and analysis of the pharmacokinetic properties of the designed compounds. From the QSAR analysis, model number one emerged as the best, as seen from its statistical assessment: R2 = 0.6981, R2adj = 0.6433, Q2 = 0.5460 and R2pred = 0.5357. Model number one was used to design new derivative compounds with higher effectiveness against estrogen-positive breast cancer (MCF-7 cell line). The molecular docking studies between the derivatives and the Polo-like kinase 1 (Plk1) receptor showed that the derivatives of 2-(4-fluorophenyl) imidazol-5-ones bind tightly to the receptor. Ligands 24 and 27 had the strongest binding affinities, −8.8 and −9.1 kcal/mol, which were stronger than that of doxorubicin, with a docking score of −8.0 kcal/mol. These new derivatives of 2-(4-fluorophenyl) imidazol-5-ones are therefore expected to be good inhibitors of Plk1. The pharmacokinetic analysis performed on the new structures revealed that all the structures passed the test and obeyed the Lipinski rule of five, so they could proceed to pre-clinical tests. Together, these results offer a starting point for developing novel anti-breast cancer drugs against the MCF-7 cell line.


Introduction
Cancer is characterized by uncontrolled cell proliferation, which affects the surrounding tissue and often spreads throughout the body. It is a complicated disease (Bajaj et al. 2018). Despite vast technological and social advancement, cancer remains the most alarming disease and a leading cause of pain and death in humans (Bhaumik et al. 2019).
Among women, breast cancer is the second leading cause of cancer mortality after lung cancer. About 40,610 women died from breast cancer and about 252,710 new diagnoses were projected in 2017 (Bajaj et al. 2018). Breast cancer accounts for about 24% of all cancer types in females (Xiao et al. 2018). People with breast cancer often share features such as older age, lack of prolonged breastfeeding, weight gain, late age at first birth, lack of exercise, etc. (Liu et al. 2018). Detailed investigation of the pathways and mechanisms by which cancer spreads, together with the discovery of many anti-cancer agents, has led to a breakthrough in the treatment of cancer (Bhaumik et al. 2019).
The luminal type of breast cancer is the estrogen receptor (ER)/progesterone receptor (PR)-positive type, which is caused by the overexpression of estrogen receptor α (ERα). It accounts for about 70% of mammary tumor patients, who are tagged as ER-positive (ER+). The continuous stimulation of ERα by estrogens induces the multiplication of cancer cells such as the MCF-7 cell line (Jordan et al. 2007). The master mitotic regulator, Polo-like kinase 1 (Plk1), is an important gene in cell division and a known cancer drug target. It is overexpressed in a large collection of different cancer types, and this tumoral overexpression often correlates with poor patient prognosis (De Cárcer 2019).
Sanhaji et al. (2012) showed that Polo-like kinase 1 (Plk1) causes destructive proliferation in tumor cells and strongly stimulates progression of the cell cycle. Plk1 overexpression allows cells to override barriers, causing genomic instability and stimulating the transformation of mammalian cells. Plk1 has been proven to be among the most attractive receptors for breast cancer treatment. Plk1 mediates estrogen receptor (ER)-regulated gene overexpression in human breast cancer cells. Recently, Abo-Elanwar et al. (2019) reported thirty-nine novel derivatives of imidazolones connected to a chalcone moiety, which showed strong anti-breast cancer activity against the MCF-7 cell line. Imidazole and its derivatives are of great significance in both natural products and synthetic molecules. These molecules have electron-rich features that allow them to bind easily to diverse enzymes and receptors, thus showing wide antiproliferative activities (Zhang et al. 2020). Several activities, such as anticancer, antimicrobial (Premakumari et al. 2014), cardio-activity, and angiotensin II receptor antagonistic activity, have been described explicitly for compounds containing the imidazolone moiety (Abo-Elanwar et al. 2019).
Chemotherapy remains one of the best and fastest clinical options, though it is often limited by undesirable toxic effects including weight loss, fatigue, nausea, loss of appetite, and so on, making it urgent to develop more effective drug candidates with less toxicity to eradicate this disease (Iqbal et al. 2019). The computer-aided drug design approach saves time and helps ensure better effectiveness of the drug candidate. This research aims to explore the novel derivatives of imidazolone by building a mathematical model (QSAR) that predicts the anti-proliferative activities of the compounds, and by docking the derivatives to the Plk1 receptor to understand their interactions, with a view to discovering anti-breast cancer drugs with less toxicity and more effectiveness.

QSAR studies
Dataset
Thirty-nine derivatives of imidazolones connected to a chalcone moiety, with anti-proliferative activities (IC50) against the MCF-7 cancer cell line, were obtained from Abo-Elanwar et al. (2019). The anti-proliferative activities were measured as inhibitory concentrations (IC50, in micromolar, μM) and then converted to the logarithmic scale (pIC50). The IC50 and pIC50 values are shown in Table 1.
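The text does not state the exact conversion used, so the following is only a minimal sketch assuming the conventional definition pIC50 = −log10(IC50 in mol/L). The function name and the IC50 values are hypothetical and are not taken from Table 1.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L).

    Assumes the conventional definition; the paper does not spell it out.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_um * 1e-6  # 1 uM = 1e-6 mol/L
    return -math.log10(ic50_molar)

# Hypothetical IC50 values in uM (illustration only, not from Table 1)
for ic50 in (1.0, 10.0, 100.0):
    print(f"IC50 = {ic50:6.1f} uM -> pIC50 = {pic50_from_ic50_um(ic50):.4f}")
```

Under this convention a lower IC50 (a more potent compound) gives a higher pIC50, which is why the modelling is done on the pIC50 scale.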

Molecular optimization
Geometric optimization is carried out so that the calculated electronic and molecular parameters depict the original physicochemical properties of the observed molecule (Putri et al. 2019). The derivatives were sketched in 2D format using ChemDraw (V12.0.2) software. They were then converted to 3D format and further optimized using Spartan 14 (V1.1.4) software, with Density Functional Theory (DFT) at the B3LYP level and the 6-31G* basis set (Ibrahim et al. 2018; Abdullahi et al. 2019).

Molecular descriptors
PaDEL-Descriptor (Pharmaceutical Data Exploration Laboratory) software V2.20 was used to calculate molecular descriptors for the thirty-nine (39) optimized imidazolone derivatives, which had been converted to SDF format after optimization (Yap 2011).

Pretreatment and division of data set
The output obtained from the PaDEL software was pretreated to remove constant values and undesirable descriptors using Data Pre-treatment software GUI 1.2 (Abdulrahman et al. 2020). The Kennard-Stone algorithm (Kennard and Stone 1969) was used to divide the derivatives into a calibration set of 27 compounds and a validation set of 12 compounds; the calibration set was used to derive the mathematical equation.
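The Kennard-Stone division described above can be sketched as follows. This is a minimal illustration of the usual max-min selection rule using Euclidean distance, not the implementation used in the study; the function name `kennard_stone` and the toy descriptor values are hypothetical.

```python
import math

def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone max-min rule."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the calibration set with the two most distant samples.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: dist[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [i for i in range(n) if i not in (i0, j0)]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected sample is farthest away.
        nxt = max(remaining, key=lambda c: min(dist[c][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

# Hypothetical 2-D descriptor values for six compounds (illustration only).
toy = [[0, 0], [1, 0], [0, 1], [10, 10], [5, 5], [9, 9]]
train_idx, test_idx = kennard_stone(toy, n_train=4)
print(train_idx, test_idx)  # -> [0, 3, 4, 5] [1, 2]
```

For the data set in this work the same call would be made with a 39-row descriptor matrix and `n_train=27`, leaving 12 compounds for validation. In practice the descriptors are usually scaled first so that no single descriptor dominates the distance.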

QSAR equation
Materials Studio software (V8) was used to construct the model with the Genetic Function Approximation (GFA) procedure. The dependent variable is the anti-proliferative activity (pIC50) and the independent variables are the model parameters (descriptors).

Validating the equation (internal)
The validity of a built model should be tested on an external set of data that was not used during the development of the model (Tropsha et al. 2003). The built equations were evaluated using the Friedman lack-of-fit (LOF) formula (Friedman 1991):

$$\mathrm{LOF} = \frac{\mathrm{SEE}}{\left(1 - \dfrac{C + d \times p}{M}\right)^{2}}$$

where SEE is the standard error of estimation, d is a user-defined smoothing parameter, C is the number of terms in the model, M is the number of training set compounds and p is the number of descriptors in the equation (Ibrahim et al. 2018). The squared correlation coefficient (R2) accounts for the fraction of the total variation explained by the equation. The nearer the R2 value is to 1, the better the model fits. R2 is given as:

$$R^{2} = 1 - \frac{\sum \left(Y_{exp} - Y_{pred}\right)^{2}}{\sum \left(Y_{exp} - \bar{Y}_{training}\right)^{2}}$$

where Y_exp and Y_pred are the experimental and calculated activities of the calibration set and \bar{Y}_{training} is the mean experimental activity of the calibration set (Tropsha et al. 2003).
R2 rises as the number of descriptors increases. Hence R2 alone does not assure the effectiveness of the equations. R2adj is used to reconfirm the strength and effectiveness of the equation:

$$R^{2}_{adj} = 1 - \frac{\left(1 - R^{2}\right)\left(n - 1\right)}{n - p - 1}$$

where p is the number of descriptors in the equation and n is the number of calibration set compounds. The cross-validation coefficient (Q2cv) was used to assess the robustness and predictive power of the model. It is given as:

$$Q^{2}_{cv} = 1 - \frac{\sum \left(Y_{pred} - Y_{exp}\right)^{2}}{\sum \left(Y_{exp} - \bar{Y}_{training}\right)^{2}}$$

where \bar{Y}_{training}, Y_exp and Y_pred are, respectively, the mean experimental activity (pIC50) of the calibration set, the experimental activities and the calculated activities of the calibration set (Brandon and Orr 2015).
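The validation statistics described in this section can be computed with a few lines of code. The sketch below assumes the standard definitions (R2 as one minus the ratio of residual to total sum of squares about the calibration mean, and the usual adjusted R2); the function names and the activity values are hypothetical.

```python
def r_squared(y_exp, y_pred, y_mean_train=None):
    """1 - SS_res / SS_tot, with SS_tot taken about y_mean_train.

    If y_mean_train is omitted, the mean of y_exp is used (internal R2).
    Passing leave-one-out predictions as y_pred gives Q2cv; passing
    validation-set data plus the calibration mean gives R2pred.
    """
    if y_mean_train is None:
        y_mean_train = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - y_mean_train) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R2 for n calibration compounds and p descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical experimental and calculated pIC50 values (illustration only).
y_exp = [5.0, 5.5, 6.0, 6.5]
y_pred = [5.1, 5.4, 6.2, 6.3]
r2 = r_squared(y_exp, y_pred)
print(f"R2 = {r2:.4f}")
print(f"R2adj = {adjusted_r_squared(r2, n=len(y_exp), p=1):.4f}")
```

Using one function for R2, Q2cv and R2pred makes explicit that the three statistics differ only in which predictions are supplied and which mean is used as the reference.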

External model validation
Mean effect
The mean effect shows which descriptors (model parameters) most influence the generated equation. The sign of each mean effect indicates whether an increase or a decrease in that model parameter raises the activity predicted by the equation. It is expressed as:

$$ME_{j} = \frac{B_{j} \sum_{i=1}^{n} D_{ij}}{\sum_{j=1}^{m} \left(B_{j} \sum_{i=1}^{n} D_{ij}\right)}$$

where m is the number of model parameters, B_j is the coefficient of descriptor j, n is the number of molecules in the prediction set and D_ij is the matrix value of model parameter j for molecule i in the prediction set (Minovski et al. 2013). The compounds marked with (*) are the validation compounds, while the compounds without (*) are the calibration set.

Variance inflation factor (VIF)
The VIF measures the amount of collinearity amongst the descriptors in an equation. It is calculated as:

$$VIF = \frac{1}{1 - R^{2}}$$

where R2 is the correlation coefficient of the regression of one descriptor on the remaining descriptors (Myers 1990). The greater the value, the stronger the link amongst the model parameters. VIF values of less than 10 show that the equation is stable, while values above 10 indicate that the equation is not effective and cannot be used.
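As an illustration of the two diagnostics above, the sketch below computes mean effects from a set of coefficients and a descriptor matrix, and a VIF from an R2 value. It assumes the usual forms ME_j = B_j ΣD_ij / Σ_j(B_j ΣD_ij) and VIF = 1/(1 − R2); the function names, coefficients and descriptor values are hypothetical and do not come from model 1.

```python
def mean_effects(coefficients, descriptor_matrix):
    """Mean effect of each descriptor; the values sum to 1 by construction."""
    column_sums = [sum(col) for col in zip(*descriptor_matrix)]
    contributions = [b * s for b, s in zip(coefficients, column_sums)]
    total = sum(contributions)
    return [c / total for c in contributions]

def vif(r2_j):
    """VIF of descriptor j, given R2 of regressing it on the other descriptors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)

# Hypothetical coefficients and descriptor values (rows = molecules).
coeffs = [0.5, 0.3, -0.2]
D = [[1.0, 2.0, 1.0],
     [3.0, 2.0, 1.0]]
me = mean_effects(coeffs, D)
print([round(v, 4) for v in me])
print(vif(0.5))  # a descriptor half-explained by the others has VIF = 2
```

A descriptor with a negative coefficient and positive values gets a negative mean effect, which matches the reading that decreasing it raises the predicted activity.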
Applicability domain
The applicability domain approach aims to estimate, independently, the reliability of every generated equation (Eriksson et al. 2003). Model validation should be carried out within the training domain, and the compounds need to be assessed as fitting within the domain before the model's predictions can be trusted. The applicability domain is evaluated from the leverage value of each compound. Leverage defines the applicability domain of the built equation (Veerasamy et al. 2011). It is given as:

$$h_{i} = X_{i} \left(X^{T} X\right)^{-1} X_{i}^{T}$$

where X_i is the descriptor row vector of compound i, X is the n × k matrix of training set descriptors and X^T is the transpose of X used in constructing the equation. The warning leverage (h*) is the threshold used to search for outliers. It is given as:

$$h^{*} = \frac{3\left(k + 1\right)}{n}$$

where n is the number of training set compounds and k is the number of descriptors.

Quality assurance of the generated model
Table 2 shows the minimum required values for assessing the mathematical equation (Ibrahim et al. 2018). These parameters were used to confirm the effectiveness and predictive power of the derived equations.
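The leverage calculation behind the applicability domain can be sketched in plain Python as below. It assumes the usual definitions h_i = x_i (X^T X)^-1 x_i^T and warning leverage h* = 3(k + 1)/n; the helper names and the small descriptor matrix are hypothetical, and a numerical library would normally replace the hand-written matrix inverse.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[pivot] = M[pivot], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def leverages(X_train, X_query):
    """Leverage of each row of X_query relative to the training matrix."""
    xtx_inv = inverse(matmul(transpose(X_train), X_train))
    result = []
    for x in X_query:
        tmp = [sum(xj * xtx_inv[j][k] for j, xj in enumerate(x))
               for k in range(len(x))]
        result.append(sum(t * xk for t, xk in zip(tmp, x)))
    return result

def warning_leverage(n_train, k):
    """Warning leverage h* = 3(k + 1)/n."""
    return 3.0 * (k + 1) / n_train

# Hypothetical training descriptors: 3 compounds, 2 descriptors.
X_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = leverages(X_train, X_train)
print([round(v, 4) for v in h])
print(round(warning_leverage(n_train=27, k=4), 2))
```

A compound whose leverage exceeds h* lies outside the descriptor space spanned by the calibration set, so its predicted activity should be treated with caution.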

Molecular docking studies
Five compounds with high pIC50 underwent molecular docking studies with the receptor Polo-like kinase 1 (Plk1) in complex with BI 6727. The receptor was obtained from the Protein Data Bank (code: 3FC2) and prepared using Discovery Studio software; the ligands (compounds) were also converted to PDB format, as shown in Fig. 4. AutoDock Vina in the PyRx software was employed to calculate the binding affinity between ligand and receptor (Abdulfatai et al. 2018).

Pharmacokinetics (drug-likeness)
SwissADME, an online tool for investigating the ADME, physicochemical, pharmacokinetic, and medicinal chemistry friendliness properties of small compounds (Daina et al. 2017), was employed to assess the pharmacokinetic parameters of the new structures.
The designed compounds were also checked for compliance with Lipinski's rule of five (Hou et al. 2019), a widely used set of criteria for judging whether a compound can be taken orally or not, such as molecular weight (MW) ≤ 500 and octanol/water partition coefficient (log P) ≤ 5.

Table 6 Definition and class of the model 1 descriptors
GATS5e: Geary autocorrelation - lag 5 / weighted by Sanderson electronegativities (2D)
MATS4e: Moran autocorrelation - lag 4 / weighted by Sanderson electronegativities (2D)
SpMax4_Bhs: Largest absolute eigenvalue of Burden modified matrix - n 4 / weighted by relative I-state (2D)
RDF150u: Radial distribution function - 150 / unweighted (2D)
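The rule-of-five screen described in this section can be illustrated with a short sketch. It assumes the standard Lipinski thresholds (MW ≤ 500, log P ≤ 5, no more than 5 hydrogen bond donors and no more than 10 hydrogen bond acceptors); the class, function and property values are hypothetical and are not SwissADME output for the designed compounds.

```python
from dataclasses import dataclass

@dataclass
class Molecule:
    name: str
    mol_weight: float      # g/mol
    log_p: float           # octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(m: Molecule) -> list:
    """Return the names of the Lipinski rules that the molecule violates."""
    rules = {
        "MW <= 500": m.mol_weight <= 500,
        "logP <= 5": m.log_p <= 5,
        "HBD <= 5": m.h_bond_donors <= 5,
        "HBA <= 10": m.h_bond_acceptors <= 10,
    }
    return [rule for rule, ok in rules.items() if not ok]

# Hypothetical property values (illustration only).
candidates = [
    Molecule("A", 350.0, 3.2, 2, 5),
    Molecule("B", 620.0, 6.1, 2, 5),
]
for mol in candidates:
    broken = lipinski_violations(mol)
    verdict = "pass" if len(broken) <= 1 else "fail"
    print(mol.name, verdict, broken)
```

The sketch treats a compound as passing when it violates at most one rule, which is the common reading of the rule of five.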

QSAR of 2-(4-fluorophenyl) imidazol-5-ones derivatives
QSAR analysis was used to examine the relationship between the 2-(4-fluorophenyl) imidazol-5-ones derivatives and their antiproliferative activities. Using the Genetic Function Approximation (GFA) method, four QSAR equations were built to predict the anti-proliferative activities of the imidazole derivatives. On both internal and external validation parameters, model number 1 passed, with a squared correlation coefficient (R2) of 0.6981, an adjusted squared correlation coefficient (R2adj) of 0.6433, a cross-validation coefficient (Q2) of 0.5460 and an external validation coefficient (R2pred) of 0.5390. Tables 3 and 4 show how the external validation of model 1 was calculated using the validation set (test set) and the model descriptors. The biological, calculated and residual values of the 2-(4-fluorophenyl) imidazol-5-ones compounds are seen in Table 5.
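As a quick consistency check on the reported statistics, the adjusted R2 can be recomputed from R2. The worked line below assumes the standard definition of the adjusted R2 and takes n = 27 calibration compounds and p = 4 descriptors (GATS5e, MATS4e, SpMax4_Bhs and RDF150u) from the text:

```latex
R^{2}_{adj} = 1 - \frac{(1 - R^{2})(n - 1)}{n - p - 1}
            = 1 - \frac{(1 - 0.6981)(27 - 1)}{27 - 4 - 1}
            = 1 - \frac{0.3019 \times 26}{22}
            \approx 0.643
```

This agrees with the reported R2adj of 0.6433 to within the rounding of R2.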
The descriptors obtained from the mathematical model 1 were defined and classified as seen in Table 6.
Further statistical analysis was carried out on the model parameters to find the correlation between the individual descriptors and also the impact of each descriptor in the model. The results are shown in Table 7.
A graph of the calculated activities against the biological activities was drawn to show the relationship between derivatives as seen in Fig. 1. Figure 2 shows a plot of standardized residual values versus the experimental activities of the derivatives.
A graph of standardized residual against the leverages was plotted to show the derivatives that fell within the applicability domain as seen in Figs. 3 and 4.

Molecular docking analysis
A summary of the interactions between some 2-(4-fluorophenyl) imidazole-5-ones derivatives and the receptor is shown in Table 8. The docked compounds were analysed visually by assessing the hydrogen bond interactions, hydrogen bond lengths, and hydrophobic interactions. The 2D representations of the binding poses of compounds 24 and 27 in the active pocket of the protein target are shown in Figs. 5 and 6, respectively.

Ligand-based drug design
A ligand-based approach was used to design 18 new imidazole derivative compounds with higher calculated activities than those of the template compounds, as shown in Table 9.

Pharmacokinetics of designed 2-(4-fluorophenyl) imidazole-5-ones compounds
The newly designed compounds were further explored to ascertain their drug-friendliness. The results of the pharmacokinetic analysis of the new compounds are shown in Table 10; all the compounds pass the Lipinski rule of five test. The bioavailability radars of molecules 11, 13, and 17 are shown in Fig. 7.

QSAR of 2-(4-fluorophenyl) imidazol-5-ones derivatives
The results obtained from the QSAR analysis showed that both the internal and external validation of the model were in agreement with the minimum proposed values used in assessing the equation, as seen in Table 2 above. The external validation of model 1 was achieved using the model parameters of the validation compounds, as seen in Tables 3 and 4. The effectiveness of the equation was measured by the reliability of the predictions for the validation compounds and the calculated pIC50 of the calibration compounds. The biological, calculated and residual values of the 2-(4-fluorophenyl) imidazol-5-ones compounds are seen in Table 5. The residual values, obtained as the difference between the biological and calculated activities, are low, showing the predictive power of eq. 1. Both internal and external validation confirm that eq. 1 is effective, robust, and predictive. Table 6 shows the definition of the model 1 parameters. The mean effects obtained from the model parameters show that GATS5e, MATS4e, and SpMax4_Bhs carry positive coefficients, meaning that an increase in those descriptors would increase the bioactivities of the derivatives, while RDF150u carries a negative coefficient, indicating that a decrease in this descriptor would also increase the experimental activities of the 2-(4-fluorophenyl) imidazol-5-ones derivative compounds. The statistical analysis shows that there is not much collinearity amongst the model parameters, ensuring that the equation is robust, as seen in Table 7. The graph of calculated activities against experimental activities (pIC50) shown in Fig. 1 indicates that the calculated pIC50 values are in good agreement with the experimental activities, as seen in Table 3. Figure 2 shows the values of both calibration and validation compounds spread on both sides of the graph, showing no systematic error in the plot of standardized residuals versus experimental activities (Jalali-Heravi and Kyani 2004). Fig. 3 shows the Williams plot (standardized residuals against leverages), indicating that all the molecules fell within the warning leverage, calculated to be h* = 0.56.
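The reported warning leverage can be reproduced by hand, assuming the usual definition h* = 3(k + 1)/n with k = 4 descriptors in model 1 and n = 27 calibration compounds:

```latex
h^{*} = \frac{3(k + 1)}{n} = \frac{3(4 + 1)}{27} = \frac{15}{27} \approx 0.56
```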

Molecular docking analysis
The docking analysis of the 2-(4-fluorophenyl) imidazole-5-ones derivatives with the protein target, Polo-like kinase 1 (Plk1) in complex with BI 6727, was performed. Five compounds with high pIC50 were chosen for these studies. Amongst the five, compounds 24 and 27 had the best docking scores, −8.8 and −9.1 kcal/mol, as seen in Table 8.

Ligand based design
Eighteen (18) new 2-(4-fluorophenyl) imidazole-5-ones derivative compounds were designed. Their predicted activities were higher than those of the chosen templates (compound 4 with pIC50 = 5.0620 and compound 2 with pIC50 = 4.9846), as shown in Table 9. From the mean effects of the descriptors, SpMax4_Bhs had the greatest positive impact, followed by GATS5e and MATS4e, while RDF150u had the least negative impact on the model. According to the MATS4e (Moran autocorrelation - lag 4 / weighted by Sanderson electronegativities) and GATS5e (Geary autocorrelation - lag 5 / weighted by Sanderson electronegativities) descriptors, adding more electronegative elements would increase the potency of the designed compounds. The modification was therefore made by adding more electronegative elements to the templates (compounds 2 and 4).

Pharmacokinetics of designed 2-(4-fluorophenyl) imidazole-5-ones compounds
The designed compounds were assessed for their drug-friendliness. The molecules passed the drug-friendliness assessment (ADME and physicochemical properties), as shown in Table 10. None of the designed compounds violated two or more rules of the Lipinski rule of five, a well-known benchmark used in validating the drug-likeness of a molecule (as stated in the "Pharmacokinetics (drug-likeness)" section). The bioavailability radars of molecules 11, 13, and 17 are shown in Fig. 7. The bioavailability radar gives a quick and easy summary of the pharmacokinetic properties of a compound. The pink area signifies the ideal range for each property (lipophilicity: XLOGP3 from −0.7 to +5.0; size: molecular weight from 150 to 500 g/mol; polarity: TPSA from 20 to 130 Å2; solubility: log S not higher than 6; saturation: fraction of carbons in sp3 hybridization not less than 0.25; and flexibility: no more than 9 rotatable bonds) (Daina et al. 2017).
Conclusion
The 2-(4-fluorophenyl) imidazole-5-ones derivatives were shown to be promising anti-cancer drug candidates against the MCF-7 cell line using QSAR analysis, molecular docking assessment, and pharmacokinetic analysis. The model 1 parameters obtained from QSAR showed that increasing GATS5e, MATS4e and SpMax4_Bhs, and decreasing RDF150u, would increase the biological activities of the 2-(4-fluorophenyl) imidazol-5-ones derivatives as effective inhibitors for treating breast cancer. The strength and predictive capability of the generated equation were explored through both internal and external validation assessment, which conformed with the minimum approved values, indicating that the model number one parameters could be used in developing new 2-(4-fluorophenyl) imidazol-5-ones drug compounds with higher effectiveness. The model parameters MATS4e and GATS5e were the most significant and, based on their mean effects, adjustments were made to the fragments of the lead compounds (2 and 4) to design 18 new imidazole derivative compounds with higher calculated activity against the MCF-7 cell line. The molecular docking results showed that compounds 24 and 27 had the best docking scores, −8.8 and −9.1 kcal/mol. The research shows that some of the 2-(4-fluorophenyl) imidazol-5-ones derivatives bind tightly to the binding pocket of the target, stabilizing the receptor Polo-like kinase 1 (Plk1) in complex with BI 6727, as seen from the docked complexes above. The compounds could serve as capable inhibitors of Plk1, which supports the design and development of new estrogen-positive (MCF-7 cell line) breast cancer drugs.
Additionally, the pharmacokinetic analysis (drug-likeness test) carried out on the designed molecules revealed that all the compounds can move on to the next step of pre-clinical trials, because they passed the drug-friendliness analysis (ADME and other physicochemical properties) and also adhered to the rule of five, a benchmark used in assessing the drug-likeness of compounds. This is a promising step towards finding lasting treatments for breast cancer (MCF-7 cell line).
Authors' contributions HAL worked on the data set computationally, following the methodology, to obtain a mathematical model with high predictive activity and was a major contributor in drafting the manuscript; AU carried out statistical analysis on the built model to ensure its effectiveness and also participated in the write-up; and SU re-edited the work to ensure that it conforms with the manuscript guide. All the authors read and approved the final manuscript.
Data availability The data set was obtained from (Abo-Elanwar et al. 2019).

Compliance with ethical standards
Conflict of interest Not applicable.
Code availability The software used includes ChemDraw version 12.0.2, Spartan'14 (version 1.1.2), Materials Studio (V8), PyRx, PaDEL-Descriptor V2.20, DTC Lab software, Microsoft Office Word and Excel 2013, and AutoDock visualizer version 4.2.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.