Computer modeling of some anti-breast cancer compounds

This research explored the biological activities of a novel series of β-lactam derivatives against the MCF-7 breast cancer cell line through computer modeling, namely quantitative structure-activity relationship (QSAR) analysis, the design of new compounds, and analysis of the drug likeness of the designed compounds. The QSAR model was robust and met the minimum requirements for a QSAR model in the statistical assessments, with a squared correlation coefficient (R²) of 0.8706, an adjusted squared correlation coefficient (R²adj) of 0.8411, and a cross-validation coefficient (Q²) of 0.7844. The external validation coefficient (R²pred) was calculated as 0.6083 for model 4. The model parameters (MATS5i and MATS1s) were used in designing new derivative compounds with higher potency against estrogen-positive breast cancer. The pharmacokinetics test on the designed compounds revealed that all the compounds passed the drug likeness test and could proceed further to clinical trials. These results are an encouraging step in the search for breast cancer drugs with higher effectiveness against the MCF-7 cell line.


Introduction
Cancer, also referred to as malignant tumor, is a heterogeneous disease that can affect almost all parts of the body. Mammary tumor (breast cancer) is among the common diseases that cause morbidity and mortality in women [14]. The problems encountered with most anti-cancer drugs during the past three decades include drug ineffectiveness, lack of selectivity, growing side effects, and tumor resistance to the drug [14]. Despite recent diagnostic and therapeutic advances, mammary tumor is still a common cause of mortality and the leading cancer among women worldwide [2]. Therefore, more effective and safer therapeutic agents acting through multiple pathways are urgently needed to improve clinical outcomes for patients [14].
Computational approaches are regularly employed in almost all modern drug discovery efforts, and considerable success has been attained in computer-aided lead generation and optimization. These techniques are generally accurate, fast, and cost-efficient. Computer-aided drug design (CADD) represents the more recent application of software tools in the design of lead candidates [13]. CADD techniques are broadly divided into two categories: structure-based (SB) and ligand-based (LB) drug discovery. The technique employed depends on whether a crystal structure of the receptor is available. One of the computer-aided tools for drug discovery and design is the quantitative structure-activity relationship (QSAR).
QSAR is a widely used technique for optimizing template molecules and designing new drug compounds. QSAR predicts the activities, toxicities, and carcinogenicities of molecules from their molecular parameters (descriptors) by means of a derived mathematical equation [5]. It is a quantitative mathematical relationship connecting molecular structures and biological activities for a library of molecules [3]. The equation is generated from the structural information of a well-chosen set of calibration compounds and their corresponding biological activities, and the model is then validated using a set of validation compounds for which the biological activities are available [12].
Malebari et al. [10] reported 43 novel β-lactam derivative compounds as potent inhibitors against estrogen-positive MCF-7 cell line. The purpose of this research is to utilize ligand-based drug design to design new β-lactam derivative compounds based on an established QSAR mathematical model as inhibitors against estrogen-positive breast cancer (MCF-7 cell line) and, furthermore, to test for the pharmacokinetic properties of the designed compounds.

Data collection
Forty-four (44) new derivative compounds of β-lactams with their individual inhibitory concentrations (IC50) against breast cancer (MCF-7 cell line) were obtained from 14 publications.

Bio-activities
The bio-activities of the β-lactam derivative compounds were measured as half-maximal inhibitory concentrations (IC50) in micromolar (μM). The IC50 values were converted to a logarithmic scale to normalize them:
$$\mathrm{pIC}_{50} = -\log_{10}\left(\mathrm{IC}_{50} \times 10^{-6}\right)$$
The IC50 and pIC50 values of the derivatives are shown in Table 1.
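The conversion above can be illustrated with a minimal Python sketch. The function name and the example IC50 values are hypothetical and are not taken from Table 1; the only assumption is that IC50 is supplied in micromolar, as stated in the text.

```python
import math

def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # The factor 1e-6 converts micromolar to molar before taking the logarithm
    return -math.log10(ic50_um * 1e-6)

# Hypothetical IC50 values in micromolar (illustration only)
for ic50 in (0.01, 1.0, 25.0):
    print(f"IC50 = {ic50:g} uM -> pIC50 = {pic50_from_ic50_um(ic50):.3f}")
```

A lower IC50 gives a higher pIC50, so more potent compounds sit higher on the modeling scale.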

Geometry optimization
Geometry optimization was used to obtain a stable structure that is as close as possible to the initial structural condition [11]. The β-lactam derivatives were drawn in 2D using ChemDraw V (12.0.2) and then imported into Spartan 14 V (1.1.4) for geometry optimization [1]. The template molecule is shown in Fig. 1.

Model parameters
The model parameters (molecular descriptors) were calculated for all the β-lactam derivatives using the PaDEL (Pharmaceutical Data Exploration Laboratory) software V (2.20) [16].

Pretreatment and division of data set
The descriptor values from PaDEL V (2.20) were pretreated using the Data Pretreatment GUI 1.2 software to remove constant and redundant descriptor values [1]. The Kennard-Stone algorithm [8] was employed to split the derivatives into calibration (training) and validation (test) sets in order to construct the equation (model).
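The splitting step can be sketched as follows. This is a minimal pure-Python illustration of the Kennard-Stone idea (start from the two most distant samples, then repeatedly add the sample farthest from those already selected), not the implementation used in the study. The function name, the Euclidean distance metric, and the toy descriptor matrix are assumptions; in practice the descriptors would normally be scaled first.

```python
import math

def kennard_stone(X, n_train):
    """Split row indices of X into (train, test) lists by Kennard-Stone selection."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances in descriptor space
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two most distant samples
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected neighbour is farthest away
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining

# Hypothetical two-descriptor data set (illustration only)
X = [[0, 0], [1, 0], [0.5, 0.5], [10, 10], [5, 5], [9, 9]]
train_idx, test_idx = kennard_stone(X, n_train=4)
print("train:", train_idx, "test:", test_idx)
```

Because the training set is chosen to span the descriptor space, the test compounds tend to fall inside the region covered by the training compounds.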

Model building and validation
A mathematical equation was built using the descriptors of the training set as the predictor (independent) variables and the pIC50 as the predicted (dependent) variable, by means of the genetic function approximation (GFA) technique of Material Studio Software V 8. The obtained equations were evaluated using the Friedman lack-of-fit (LOF) formula [4]:
$$\mathrm{LOF} = \frac{\mathrm{SEE}}{\left(1 - \dfrac{C + d \times p}{M}\right)^{2}}$$
where SEE is the standard error of estimation, given as
$$\mathrm{SEE} = \sqrt{\frac{\sum \left(Y_{exp} - Y_{pred}\right)^{2}}{M - p - 1}}$$
C is the number of terms in the model, p is the total number of model descriptors, M is the number of compounds in the training set, and d is a user-defined smoothing parameter [9]. The fit of the model is assessed using the squared correlation coefficient (R²). The nearer the R² value is to 1.0, the better the regression fitness. R² is calculated as:
$$R^{2} = 1 - \frac{\sum \left(Y_{exp} - Y_{pred}\right)^{2}}{\sum \left(Y_{exp} - \bar{Y}_{training}\right)^{2}}$$
where Y_exp and Y_pred are the actual and calculated activities of the calibration set [1], and Ȳ_training is the average pIC50 of the training set molecules [7].
[12]. It is expressed as:

Modeling assessment
The mathematical equation (model) produced was subjected to statistical tests such as the cross-validation test (Q²), R², Fisher's test, and predicted R² (R²pred).
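A few of these statistics can be sketched in Python. The snippet below is a hedged illustration using the textbook definitions of R², adjusted R², and external R²pred; the function names are hypothetical, and the training set size of 28 used in the example is an assumption (it is not stated in the text) chosen because, with five descriptors, it gives an adjusted value of about 0.841, close to the reported 0.8411.

```python
def r_squared(y_exp, y_pred):
    """Squared correlation coefficient of fit for the calibration set."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot

def r2_adjusted(r2, n_compounds, n_descriptors):
    """Penalise R2 for the number of descriptors in the model."""
    return 1.0 - (1.0 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)

def r2_pred(y_exp_test, y_pred_test, train_mean):
    """External R2: test-set residuals relative to the training-set mean."""
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp_test, y_pred_test))
    ss_ref = sum((e - train_mean) ** 2 for e in y_exp_test)
    return 1.0 - ss_res / ss_ref

# Reported R2 with an ASSUMED training set of 28 compounds and 5 descriptors
print(f"R2adj = {r2_adjusted(0.8706, 28, 5):.3f}")
```

Under the Golbraikh and Tropsha criterion cited in this work, an R²pred above 0.6 is taken as the threshold for an acceptable external prediction.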

Applicability domain
For a model's prediction to be reliable, the compound being assessed must fall within the domain defined by the training set. The applicability domain is evaluated from the leverage value of every molecule. The leverage (L_i) defines the applicability domain of the generated equation [15]. It is formulated as:
$$L_i = X_i \left(X^{T} X\right)^{-1} X_i^{T}$$
where X is the n × k matrix of training set descriptors, X^T is the transpose of X, and X_i is the descriptor row vector of compound i. The warning leverage (L*) is a threshold used to test for outliers. It is written as:
$$L^{*} = \frac{3(h + 1)}{m}$$
where h is the total number of structural parameters (descriptors) in the model and m is the total number of compounds in the training set. William's plot is a plot of standardized residuals against leverage values for both the training (calibration) and test (validation) sets. Molecules whose leverage stays below the calculated L* on the graph lie within the applicability domain.
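The leverage calculation can be sketched with NumPy as below. This is an illustrative sketch, not the procedure used in the study: the function names and random descriptor matrix are hypothetical, no intercept column is added to X, and the training set size of 28 is an assumption (not stated in the text) that, with five descriptors, yields a warning leverage of about 0.6429.

```python
import numpy as np

def leverages(X_train, X_query):
    """Leverage of each row of X_query relative to the training descriptors."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pseudo-inverse guards against a singular X^T X
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form x_i (X^T X)^-1 x_i^T
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)

def warning_leverage(n_descriptors, n_train):
    """Threshold 3(h + 1)/m above which a compound is flagged."""
    return 3.0 * (n_descriptors + 1) / n_train

# Hypothetical descriptor matrix: 28 compounds x 5 descriptors (ASSUMED sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(28, 5))
h = leverages(X, X)
h_star = warning_leverage(n_descriptors=5, n_train=28)
print("warning leverage:", round(h_star, 4))
print("compounds above threshold:", np.flatnonzero(h > h_star).tolist())
```

Plotting standardized residuals against these leverage values, with a vertical line at the warning leverage, gives the William's plot described above.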

Drug likeness analysis
The ADME (absorption, distribution, metabolism, and excretion) properties of a molecule are an important determinant of its therapeutic potency. ADME and bio-availability tests play an important role in assessing the drug likeness of new drug molecules [17]. In this research, SwissADME was employed in evaluating the physicochemical properties, pharmacokinetics, and drug likeness of the designed compounds. Furthermore, the designed compounds were checked for compliance with Lipinski's rule of five [6].
As seen in Tables 2 and 3, the mean absolute error (MAE) was found to be close to zero, which reconfirms the strength of the equation (model) [12]. The effectiveness of the equation was measured by the reliability of the calibration set and the calculated pIC50 of the validation set, which agrees with the criterion proposed by Golbraikh and Tropsha (R²pred > 0.6) for a robust equation, as seen in Table 6. The observed, calculated, and residual values of the β-lactam compounds are shown in Table 4. The low residual values, i.e., the small differences between the experimental and calculated activities, display the effectiveness of the equation. Both internal and external validations confirm model 4 to be robust and reliable. Table 5 shows the definitions of the descriptors that were used in building the mathematical model, which was validated both internally and externally.
The mean effect of each model parameter was computed to evaluate its individual contribution to the mathematical model. ALogP2, ATSC0i, MATS5i, and MATS1s had positive mean effect coefficients, meaning that increasing these model parameters would increase the biological activities of the derivatives. ETA_Beta_ns_d had a negative coefficient, meaning that decreasing this model parameter would also increase the biological activities of the derivative compounds, as shown in Table 6. The variance inflation factor (VIF) measures the degree of inter-correlation among the model parameters. The VIF scores were within the accepted range of 1-5, indicating that there is no significant collinearity among the model parameters of the derived model, as shown in Table 6. The null hypothesis states that there is no significant relationship between the bio-activity and the model parameters of the derived equation (p > 0.05). At a 95% confidence level, the P values of the model parameters were below 0.05. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted, indicating that there is a significant relationship between the bio-activity and the model parameters of the constructed model, as shown in Table 6. Figure 2 shows a plot of the observed activities against the calculated activities of both the test set and the training set of β-lactam derivatives. The graph shows that the predicted activities were in good agreement with the experimental values given in Table 2, confirming the effectiveness and stability of the built model. Figure 3 is a plot of standardized residuals against experimental activity; the values of both the test and training sets are spread on both sides of the zero line, showing no systematic error in the model.
Figure 4 shows a plot of standardized residuals against leverage values, also called William's plot. All but one of the compounds fell within the applicability domain defined by the warning leverage of 0.6429; the one compound outside the applicability domain might differ slightly in molecular structure from the remaining molecules in the data set.
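The VIF check discussed above can be sketched as follows. This is a minimal illustration under the standard definition VIF_j = 1/(1 − R_j²), computed here through the equivalent route of taking the diagonal of the inverse descriptor correlation matrix. The function name and the synthetic descriptor values are hypothetical and do not reproduce Table 6.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each descriptor column: diagonal of the inverse correlation matrix."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical descriptor matrix with three weakly related columns
rng = np.random.default_rng(42)
X = rng.normal(size=(30, 3))
X[:, 2] += 0.5 * X[:, 0]  # introduce a mild correlation for illustration
for name, vif in zip(("d1", "d2", "d3"), variance_inflation_factors(X)):
    flag = "acceptable" if vif < 5 else "collinear"
    print(f"{name}: VIF = {vif:.2f} ({flag})")
```

A VIF of 1 means a descriptor is uncorrelated with the others, while values above the accepted range suggest that the descriptor is largely redundant.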

Ligand-based drug design
Six (6) new β-lactam derivative compounds were designed using the ligand-based approach. This approach uses the molecular descriptors obtained from the mathematical QSAR model; adjustments were made on the lead compounds (37 and 43) based on the definitions of the molecular descriptors (ATSC0i and MATS5i) shown in Table 5. The newly designed compounds with their new calculated activities are shown in Table 7.
Abbreviations: MW, molecular weight (< 500 g/mol); nAH, number of aromatic heavy atoms; nRB, rotatable bonds; HBA, hydrogen bond acceptors; HBD, hydrogen bond donors; MR, molecular refractivity; TPSA, topological polar surface area; BBB, blood-brain barrier.

Computational pharmacokinetics of the designed compounds
The physicochemical properties of the designed derivatives were explored for their drug-like properties. Compounds 37 and 43 showed the characteristics of effective drug-like template compounds, as seen in Table 8. The compounds showed no violation of Lipinski's rule of five, high GI absorption, an oral bio-availability score of 0.55, and zero PAINS (pan-assay interference compounds) alerts, supporting their suitability for further clinical trials. The bio-availability radar for compounds 37 and 43 is shown in Fig. 5; it gives a quick glance at the pharmacokinetic properties of the structures.
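The rule-of-five screen mentioned above can be sketched in a few lines. This is an illustrative check using the commonly quoted Lipinski thresholds (molecular weight ≤ 500 g/mol, log P ≤ 5, hydrogen bond donors ≤ 5, hydrogen bond acceptors ≤ 10); it is not the SwissADME implementation, and the class name, function name, and property values are hypothetical rather than taken from Table 8.

```python
from dataclasses import dataclass

@dataclass
class Properties:
    mol_weight: float       # g/mol
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = (
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    )
    return sum(checks)

# Hypothetical property values (illustration only)
candidate = Properties(mol_weight=350.4, log_p=3.2, h_bond_donors=1, h_bond_acceptors=5)
print("violations:", lipinski_violations(candidate))
```

A compound with zero violations, as reported for the designed derivatives, satisfies all four thresholds.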

Conclusion
The QSAR and pharmacokinetics analyses carried out on the β-lactam derivatives indicated that the derivative compounds are promising anti-breast cancer agents against the MCF-7 cell line. The effectiveness of the generated QSAR model was assessed using internal and external validation tests; the model conformed to the minimum accepted values, indicating that the equation could be used in designing new β-lactam derivative compounds with enhanced anti-cancer activities. The statistical analysis carried out on the QSAR model showed that ALogP2, ATSC0i, MATS5i, and MATS1s had positive coefficients, meaning that increasing these model parameters would increase the biological activities of the β-lactam derivative compounds, while ETA_Beta_ns_d had a negative coefficient, meaning that decreasing this model parameter would also increase the biological activities of the derivative compounds. Compounds 37 and 43 were chosen as template compounds in designing 6 new derivative compounds because they had higher predicted activities and low residual values. The molecular descriptors MATS5i and MATS1s were the most significant, and based on their mean effects, adjustments were made on the fragments of the template compounds. Furthermore, the pharmacokinetic analysis (drug likeness test) carried out on the newly designed compounds revealed that all the compounds passed the drug likeness test (ADME and other physicochemical properties) and had zero violations of Lipinski's rule of five, a standard measure used in assessing the drug likeness of molecules. We conclude that the compounds can move on to the next step of pre-clinical trials, which is an encouraging finding in the search for lasting treatments for estrogen-positive breast cancer (MCF-7 cell line).