QSAR, Ligand Based Design and Pharmacokinetic Studies of Parviflorons Derivatives as Anti-Breast Cancer Drug Compounds Against MCF-7 Cell Line

The anti-proliferative activities of a novel series of Parviflorons against the MCF-7 breast cancer cell line were explored through in silico studies: Quantitative Structure–Activity Relationship (QSAR) modelling, design of new compounds, and analysis of the pharmacokinetic properties of the designed compounds. In the QSAR study, model 1 emerged as the best on statistical assessment, with R² = 0.9444, R²adj = 0.9273, Q² = 0.8945 and R²pred = 0.6214. The model was used to design new derivative compounds with higher predicted effectiveness against estrogen receptor-positive breast cancer (MCF-7). The pharmacokinetic analysis of the newly designed compounds showed that all of them passed the drug-likeness test and complied with Lipinski's rule of five, so they could proceed to pre-clinical tests. The results indicate that the derivative compounds could serve as potent agents against estrogen receptor-positive breast cancer (MCF-7 cell line).


Introduction
After heart disease, cancer is the second leading cause of mortality worldwide; in 2018 about 18.1 million cases were diagnosed. Of all cancers, mammary tumor is frequent among women and is the second leading cause of death among them. Despite the collective effort towards conquering this disease, the tumor remains a big challenge globally [7]. Survival from mammary tumor varies widely across the world, with approximate 5-year survival rates ranging from 80% in developed countries to below 40% in developing countries. Developing countries face resource and infrastructure limitations that hinder successful mammary tumor outcomes through early detection, diagnosis and management [4].
Computer-aided drug discovery and design (CADD) helps identify the best possible lead compound, reduces the cost of discovering a drug and shortens the time a drug takes to pass through the later stages before it is ready for use. It is a fundamental approach in drug discovery. CADD techniques identify lead molecules by assessing and predicting their potency and probable side effects, and they also assist in improving the drug-likeness of the compounds [11]. Drug compounds with poor drug-likeness and ADMET properties will not progress to pre-clinical research, irrespective of high biological activity. ADMET profiling is one of the main tools for analyzing a drug compound, and significant progress has been made since such properties began to receive greater consideration [17].
The incidence of breast cancer has been increasing, and it remains the most substantial cause of mortality among women. Despite the headway made in managing breast cancer, the search for a curative treatment is still ongoing because the tumor often becomes resistant to treatment within a short time. Although a number of crucial studies and clinical trials have significantly contributed to the improvement of mammary tumor care, many cancer cases and pathways remain unknown to the majority of clinicians [19]. Recently, [14] reported the anti-proliferative activities of some novel Parviflorons derivatives against the MCF-7 cell line. This study aims to build a mathematical QSAR model, design new Parviflorons compounds based on the derived model, and ascertain the pharmacokinetic properties of the newly designed drug compounds.
Luminal-type breast cancer cells (MCF-7) are estrogen receptor (ER)- and progesterone receptor (PR)-positive, a type driven by the overexpression of estrogen receptor α (ERα). This type accounts for about 70% of mammary tumor patients, who are classified as ER positive (ER+). The constant activation of ERα by estrogens induces the proliferation of cancer cells [12].

Hardware and Software
The computer used in this research was a 7th-generation HP Pavilion with an Intel(R) Core i7-7500U processor and 12.00 GB of RAM, running Windows 10.

Data Gathering
Twenty-six (26) novel Parviflorons derivatives, with their anti-proliferative activities against the MCF-7 breast cancer cell line reported as inhibitory concentrations (IC50), were taken from [14].

Anti-proliferative Activities and Geometry Optimization
The IC50 values were normalized to pIC50 on a logarithmic scale, $\mathrm{pIC_{50}} = -\log_{10}(\mathrm{IC_{50}} \times 10^{-6})$, where IC50 is measured in micromolar (µM). The anti-proliferative activities (IC50) and pIC50 values of the derivatives are shown in Table 1. QSAR analysis requires close attention at every stage. Drawing the structures is a crucial first step because the molecular descriptors, which serve as the independent variables, are calculated from them. In this research, ChemDraw version 12.0.2 was used to draw the Parviflorons derivatives, which were then converted to 3D format [3]. The aim of geometry optimization is to obtain a 3-dimensional structure that is as close as possible to the true 3-dimensional molecular structure, so that the molecular parameters represent the main physicochemical properties of the molecule well [15].
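The conversion above can be illustrated with a short Python sketch. The function name and the IC50 values are hypothetical and are not taken from Table 1; the only thing assumed from the text is the formula pIC50 = −log10(IC50 × 10⁻⁶) with IC50 in µM.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    # The factor 1e-6 converts micromolar to molar before taking the logarithm.
    return -math.log10(ic50_um * 1e-6)


# Hypothetical IC50 values in µM (illustrative only, not from Table 1).
for ic50 in (0.5, 1.0, 10.0):
    print(f"IC50 = {ic50:5.1f} µM -> pIC50 = {pic50_from_ic50_um(ic50):.3f}")
```

A lower IC50 gives a higher pIC50, so more potent compounds sit higher on the modelling scale.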

Molecular Descriptors Calculations and Pretreatment
The 26 Parviflorons derivatives were converted to SDF format after optimization. PaDEL-Descriptor (Pharmaceutical Data Exploration Laboratory) software, version 2.20, was used to calculate the physicochemical descriptors [18]. The descriptors were pretreated using the Data Pre-treatment software GUI 1.2 [1] to remove irrelevant values.
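The text does not specify what the pretreatment tool removes. As a rough sketch only, pretreatment of this kind commonly drops near-constant descriptors and one descriptor from each highly inter-correlated pair; the function names, the cut-off values and the toy data below are all assumptions for illustration and do not describe the Data Pre-treatment GUI itself.

```python
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def pretreat(descriptors, var_cutoff=1e-4, corr_cutoff=0.9):
    """Keep descriptors that vary and are not highly correlated with a kept one.

    Both cut-offs are assumed values, not settings reported in the paper.
    """
    kept = {}
    for name, values in descriptors.items():
        if pvariance(values) < var_cutoff:
            continue  # near-constant column carries no information
        if any(abs(pearson(values, other)) > corr_cutoff for other in kept.values()):
            continue  # redundant with a descriptor that was already kept
        kept[name] = values
    return kept


# Hypothetical descriptor table: one column per descriptor, one value per compound.
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [5.0, 5.0, 5.0, 5.0],   # constant, removed
    "d3": [2.0, 4.0, 6.0, 8.1],   # almost a multiple of d1, removed
    "d4": [4.0, 1.0, 3.0, 2.0],
}
print(list(pretreat(toy)))
```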

Division of Data Set and Model Building
The Kennard-Stone algorithm [13] was used to divide the derivatives into a training (calibration) set and a test (validation) set [2]. The calibration set is used to develop a model that is then used to predict the bio-activities of the validation set of molecules. Materials Studio software, version 8, was used to construct the mathematical model with the Genetic Function Approximation (GFA) technique. The dependent variable is the anti-proliferative activity (pIC50) and the independent variables are the model parameters (descriptors), which were obtained using PaDEL-Descriptor software, version 2.20.
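The Kennard-Stone selection can be sketched in plain Python as follows. This is a minimal illustration of the general algorithm using Euclidean distance in descriptor space, not the implementation used in the study; the function name and the two-dimensional toy points are hypothetical.

```python
import math


def kennard_stone(X, n_train):
    """Split row indices of X into (training, test) lists by Kennard-Stone."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of rows")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two compounds that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_train:
        # Add the compound whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Hypothetical descriptor rows (two descriptors per compound).
points = [(0, 0), (1, 0), (0.1, 0.1), (5, 5), (2.5, 2.5), (5, 4.9)]
train_idx, test_idx = kennard_stone(points, 4)
print(train_idx, test_idx)
```

Because each new training compound is the one least well covered by those already chosen, the training set spans the descriptor space and the test compounds tend to lie inside it.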

Model Validation (Internal)
Internal validation uses the same compounds that were used to generate the model and checks its internal consistency. Cross-validation (CV) is the most commonly used internal validation technique. One compound is removed from the training set, and the remaining n − 1 molecules (n = the total number of training molecules) are used to rebuild the model. The anti-proliferative activity of the removed compound is then predicted, and the procedure is repeated n times so that every molecule receives a predicted activity [6]. This procedure is known as the leave-one-out (LOO) technique. The cross-validated coefficient is given as:
$$Q^2_{cv} = 1 - \frac{\sum (Y_{pred} - Y_{exp})^2}{\sum (Y_{exp} - \bar{Y}_{training})^2}$$
where $\bar{Y}_{training}$ is the average activity (pIC50) of the training set, and $Y_{exp}$ and $Y_{pred}$ are the experimental and predicted activities of the training set compounds [5]. The squared correlation coefficient $R^2$ of the fitted model is given as:
$$R^2 = 1 - \frac{\sum (Y_{exp} - Y_{pred})^2}{\sum (Y_{exp} - \bar{Y}_{training})^2}$$
where $Y_{exp}$ and $Y_{pred}$ are the actual and predicted activities of the training set [16]. Cross-validation is used to estimate the predictive power of a statistical model obtained from a regression technique.
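The leave-one-out procedure can be sketched in a few lines of Python. For brevity the sketch fits a straight line on a single descriptor, which is a simplification: the study used a multi-descriptor GFA model in Materials Studio. The function names and data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def loo_q2(x, y):
    """Leave-one-out cross-validated Q2 for the one-descriptor model."""
    press = 0.0  # predictive residual sum of squares
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    y_bar = mean(y)
    ss_total = sum((v - y_bar) ** 2 for v in y)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and pIC50 values for five training compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [5.1, 5.9, 7.2, 7.8, 9.1]
q2 = loo_q2(descriptor, activity)
print(f"Q2(LOO) = {q2:.3f}")
```

Q² is at most 1 and falls as the left-out predictions get worse, which is why it is a stricter measure than the fitted R².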

Model Validation (External)
A model with an excellent fit and acceptable internal prediction can still fail to capture the actual relationship between the predictor variables (model descriptors) and the response variable (bio-activity). The predictive power of the built model (equation) is therefore assessed by external validation on the test set. The criteria proposed by Golbraikh and Tropsha for a model with good predictive power are commonly stated as:
$$r^2 > 0.6, \qquad \frac{|r^2 - r_0^2|}{r^2} < 0.1, \qquad 0.85 \le k \le 1.15 \ \text{or} \ 0.85 \le k' \le 1.15$$
where $r^2$ is the squared correlation coefficient between the actual and calculated activities, $r_0^2$ is the squared correlation coefficient for the regression through the origin, and $k$ and $k'$ are the slopes of the regression lines passing through the origin [2].
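A hedged sketch of how such external-validation checks are often computed is given below. The thresholds (0.6, 0.1 and 0.85 to 1.15) are the commonly quoted Golbraikh and Tropsha values, and the exact definition of the through-origin statistics varies between publications, so the formulas in the code are one common convention and an assumption here. The test-set values are hypothetical.

```python
from statistics import mean


def golbraikh_tropsha(y_obs, y_pred):
    """Return (r2, r0_2, k, k_prime, checks) for observed vs predicted values."""
    mo, mp = mean(y_obs), mean(y_pred)
    s_oo = sum((o - mo) ** 2 for o in y_obs)
    s_pp = sum((p - mp) ** 2 for p in y_pred)
    s_op = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    r2 = s_op ** 2 / (s_oo * s_pp)
    dot = sum(o * p for o, p in zip(y_obs, y_pred))
    k = dot / sum(p * p for p in y_pred)        # slope of obs = k * pred
    k_prime = dot / sum(o * o for o in y_obs)   # slope of pred = k' * obs
    # r0^2: fit quality of the through-origin line obs = k * pred.
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / s_oo
    checks = {
        "r2 > 0.6": r2 > 0.6,
        "|r2 - r0_2| / r2 < 0.1": abs(r2 - r0_2) / r2 < 0.1,
        "0.85 <= k <= 1.15 or 0.85 <= k' <= 1.15": (
            0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
        ),
    }
    return r2, r0_2, k, k_prime, checks


# Hypothetical test-set pIC50 values (not from the paper's tables).
observed = [5.0, 5.5, 6.0, 6.5, 7.0]
predicted = [5.1, 5.4, 6.1, 6.4, 7.1]
r2, r0_2, k, k_prime, checks = golbraikh_tropsha(observed, predicted)
for rule, ok in checks.items():
    print(f"{rule}: {'pass' if ok else 'fail'}")
```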

QSAR Applicability Domain of Model
The goal of applicability domain methods is to estimate the reliability of each prediction made by the generated model [8]. A model should be validated within the domain of its training set, and compounds must be shown to fall within that domain before their predictions can be trusted. The applicability domain is evaluated through the leverage value of every molecule. The leverage (L) defines the applicability domain of the generated equation [20]. It is formulated as:
$$L_i = x_i (X^T X)^{-1} x_i^T$$
where $x_i$ is the descriptor row vector of compound $i$, $X$ is the n × k matrix of training set descriptors and $X^T$ is the transpose of $X$. The warning leverage ($h^*$) is the threshold used to check for outliers. It is written as:
$$h^* = \frac{3(p + 1)}{m}$$
where p is the number of model descriptors and m is the number of training set compounds. The Williams plot (a plot of standardized residuals versus leverage values) of both the training (calibration) and test (validation) sets is used to visualize the domain. Molecules whose leverage falls below the warning leverage on the plot are considered reliably predicted.
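The leverage calculation can be sketched in plain Python as below. The helper names and the toy descriptor matrix are hypothetical. In the last line, p = 4 corresponds to the four descriptors of the model (nX, GATS5e, MLFER_BO and MATS3e), while m = 18 is an assumed training-set size that is not stated in the text; it is chosen because it reproduces the reported threshold of 0.833.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Partial pivoting: bring the largest entry of the column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(X_train, X_query):
    """Leverage x (X^T X)^-1 x^T of each query row against the training matrix."""
    k = len(X_train[0])
    xtx = [[sum(row[i] * row[j] for row in X_train) for j in range(k)]
           for i in range(k)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, list(x))))
            for x in X_query]


def warning_leverage(p, m):
    """Warning leverage h* = 3(p + 1) / m."""
    return 3 * (p + 1) / m


# Hypothetical training descriptors (five compounds, two descriptors).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
print([round(h, 3) for h in leverages(X, X)])
print(round(warning_leverage(4, 18), 3))  # p = 4 descriptors, m = 18 assumed
```

A compound whose leverage exceeds the warning value lies far from the bulk of the training descriptors, so its prediction is an extrapolation.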

Computational Pharmacokinetics (Drug-Likeness)
SwissADME was used to analyze the drug-likeness of the newly designed compounds. The designed compounds were also checked for compliance with Lipinski's rule of five [10], a widely used criterion for judging whether a compound can be orally absorbed: molecular weight (MW) ≤ 500, octanol/water partition coefficient (AlogP) ≤ 5, number of hydrogen bond donors (HBDs) ≤ 5 and number of hydrogen bond acceptors (HBAs) ≤ 10. According to the rule of five, a drug compound is unlikely to be orally active if it violates two or more of the four rules [9].
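The rule-of-five check described above is simple enough to express directly. The sketch below is illustrative only: the class, function names and compound properties are hypothetical, and in the study the properties themselves came from SwissADME.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mw: float     # molecular weight
    logp: float   # octanol/water partition coefficient
    hbd: int      # hydrogen bond donors
    hba: int      # hydrogen bond acceptors


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([c.mw > 500, c.logp > 5, c.hbd > 5, c.hba > 10])


def passes_rule_of_five(c: Compound) -> bool:
    """A compound fails only if it violates two or more of the four rules."""
    return lipinski_violations(c) < 2


# Hypothetical compounds, not the designed derivatives of Table 9.
examples = [
    Compound("A", mw=450.5, logp=3.2, hbd=2, hba=6),
    Compound("B", mw=612.0, logp=5.8, hbd=3, hba=9),
]
for c in examples:
    print(c.name, lipinski_violations(c), passes_rule_of_five(c))
```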

In Silico QSAR Investigation
The in silico QSAR investigation sought a simple mathematical equation for calculating the anti-proliferative activities of Parviflorons derivatives from their structures. It correlated the molecular descriptors (model parameters) with the bio-activities of the 26 derivative compounds using statistical techniques. Using the Genetic Function Approximation (GFA) technique, four QSAR models were generated to predict the anti-proliferative activities of the Parviflorons derivatives. Model 1 passed both internal and external validation, with a squared correlation coefficient (R²) of 0.9444, an adjusted squared correlation coefficient (R²adj) of 0.9273 and a cross-validation coefficient (Q²) of 0.8945. The external validation value (R²pred) of 0.6214 for model 1 was calculated using the model descriptors of the test set, as shown in Tables 2 and 3. The robustness of the QSAR models was assessed from the reliability of the training set and the predicted pIC50 of the test set, which agrees with the criteria proposed by Golbraikh and Tropsha (R²pred > 0.6) for an effective QSAR model, as shown in Table 5.
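As a consistency check on the reported statistics, the adjusted R² can be recomputed from R² with the standard relation below. Here p = 4 is the number of descriptors in model 1, while the training-set size n = 18 is an assumption: it is not stated in the text, but it is consistent with the leverage threshold of 0.833 reported with the Williams plot for a four-descriptor model.

```latex
\begin{align*}
R^2_{adj} &= 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1} \\
          &= 1 - \frac{(1 - 0.9444)(18 - 1)}{18 - 4 - 1} \\
          &= 1 - \frac{0.0556 \times 17}{13} \approx 0.9273
\end{align*}
```

Under these assumptions the recomputed value matches the reported R²adj of 0.9273.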
The experimental, predicted and residual values of the Parvifloron derivatives are shown in Table 4. The low residual values, obtained as the difference between the experimental and calculated activities, indicate the high predictive power of the model. Both internal and external validation confirm that model 1 is stable and effective. Table 5 defines the descriptors in the model; they were used in verifying the model both internally and externally and were calculated using PaDEL-Descriptor software version 2.20, as in Abdullahi et al. [2]. The model met the minimum recommended values for an acceptable QSAR model, indicating that it can be used in designing new Parvifloron derivative compounds with better anti-breast cancer activity, as seen in Table 6.
Statistical analysis, namely the mean effect and the Variance Inflation Factor (VIF), was used to evaluate the individual contribution of each molecular descriptor in the QSAR model. The sign of the mean effect indicates whether a descriptor should be increased or decreased to raise the activity. Increasing nX, GATS5e and MLFER_BO would increase the bio-activities of the derivative compounds (positive coefficients), while decreasing MATS3e would also increase the bio-activities (negative coefficient), as shown in Table 7. The VIF measures the degree of inter-relationship among the model parameters. The VIF scores were within the accepted range of 1 to 5, indicating no significant collinearity among the descriptors of the constructed model, as shown in Table 7. Figure 1 shows a plot of the observed activities against the calculated activities for both the test set and the training set. The predicted activities were in good agreement with the experimental values shown in Table 2, confirming the effectiveness and stability of the generated model. Figure 2 shows the standardized residuals against the experimental anti-proliferative activity; the values of both the test and training sets are spread on both sides of zero, indicating no systematic error. Figure 3 shows the standardized residuals against the leverage values, also called the Williams plot. Most of the compounds fell within the applicability domain defined by the calculated warning leverage of 0.833; only 3 compounds were found outside the applicability domain, which might be due to slight differences in their molecular structures compared with the other molecules in the data set.
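The two statistics can be sketched as follows. The formulas are the commonly used definitions and are an assumption here, since the text does not state them: the mean effect of descriptor j is its coefficient times its column sum, divided by the same quantity summed over all descriptors, and VIF_j = 1/(1 − R_j²), where R_j² comes from regressing descriptor j on the other descriptors. The coefficients and descriptor values are hypothetical, not those of model 1.

```python
def mean_effects(coefficients, descriptor_columns):
    """Mean effect of each descriptor: b_j * sum(d_j) / sum over j of b_j * sum(d_j)."""
    weighted = {
        name: coefficients[name] * sum(descriptor_columns[name])
        for name in coefficients
    }
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


def vif(r2_j):
    """Variance inflation factor from the R^2 of descriptor j on the others."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must be in [0, 1)")
    return 1.0 / (1.0 - r2_j)


# Hypothetical regression coefficients and training-set descriptor values.
coefs = {"d1": 2.0, "d2": -1.0}
columns = {"d1": [1.0, 2.0, 3.0], "d2": [1.0, 1.0, 2.0]}
effects = mean_effects(coefs, columns)
print(effects)
print(vif(0.5))
```

By construction the mean effects sum to 1, so each value reads as a signed share of the model's response, and a VIF of 1 corresponds to a descriptor that is uncorrelated with the others.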

Ligand-Based Drug Design
Eight (8) new Parviflorons derivative compounds were designed using the ligand-based approach. The lead compounds (4 and 16) were chosen for their low residual values and high pIC50 values, as shown in Table 4. This approach uses the molecular descriptors obtained from the mathematical QSAR model: the lead compounds (4 and 16) were modified according to the definitions of the descriptor nX, which has a positive coefficient (meaning that halogen atoms such as F, Cl, Br or I are added at different structural positions), and the descriptor GATS5e, which also has a positive coefficient (meaning that electronegative groups such as OH or OCH3 are added), as shown in Table 5. The newly designed compounds and their calculated activities are shown in Table 8.
Table 5 Descriptor definitions (descriptor: definition, class)
MATS3e: Moran autocorrelation, lag 3, weighted by Sanderson electronegativities, 2D
GATS5e: Geary autocorrelation, lag 5, weighted by Sanderson electronegativities, 2D
MLFER_BO: Overall or summation solute hydrogen bond basicity, 2D

Physicochemical and ADME Properties (Pharmacokinetics) of Designed Parvifloron Compounds
Many designed compounds fail to become drugs. Deficiencies in efficacy and safety are the main causes of drug failure, which indicates that the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of compounds play a major role at every step of the drug discovery pipeline. It is therefore essential to discover potent compounds with favorable ADMET properties (Guan et al. [9]). All the newly designed compounds were assessed for their drug-likeness (ADME and physicochemical properties). None of the designed compounds violated two or more of Lipinski's rules, a prominent principle used in certifying the drug-likeness of a compound. This shows that all the designed compounds passed the drug-likeness test, as shown in Table 9, making them promising candidates against estrogen receptor-positive breast cancer (MCF-7). Figure 4 shows the bioavailability radar for molecules 1 and 6. The bioavailability radar gives an initial overview of the drug-likeness of a compound.

Conclusion
QSAR studies and pharmacokinetic analysis showed that Parvifloron derivatives are promising anti-breast cancer drug candidates against the MCF-7 cell line. The statistical analysis of the mathematical model obtained from the QSAR studies showed that increasing the nX, GATS5e and MLFER_BO descriptors will increase the anti-proliferative activities of Parvifloron derivatives, while decreasing MATS3e would also increase their anti-proliferative activities. The pharmacokinetic analysis of the designed compounds revealed that all of them passed the drug-likeness test (ADME and other physicochemical properties) and adhered to Lipinski's rule of five, a criterion used in evaluating the drug-likeness of compounds. The compounds can therefore move on to the next step of pre-clinical trials, a promising step towards finding lasting solutions to breast cancer (MCF-7 cell line).
Funding This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of interest No conflict of interest.
Ethics Approval and Consent to Participate Not applicable.

Consent of Publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.