1 Background

After cardiovascular diseases, cancer is the second leading cause of death worldwide [1]. Cancer accounts for about one in seven deaths worldwide and affects around 14 million people every year. Lifestyle changes, especially in developing countries, where almost 82% of the world's population lives, have raised the risk of cancer through lack of exercise and smoking, in addition to hereditary variation [2]. Breast cancer is the most common form of cancer worldwide and the second leading cause of cancer-related death among women. About 1 to 1.3 million cases of breast cancer are detected globally each year [3].

Triple-negative breast cancers (TNBCs) are aggressive mammary tumors characterized by the lack of human epidermal growth factor receptor 2 (HER2), estrogen receptor (ER), and progesterone receptor (PR) [4]. TNBCs metastasize to the central nervous system and lungs more often than non-TNBCs, which usually metastasize to the bone. Such metastatic behavior gives patients with TNBC a shorter life expectancy than patients with non-TNBC, owing to the lack of effective inhibitor compounds [3].

Recently, a novel series of 2-anilinopyrimidines was reported by Jo et al. [4] as inhibitors of the MDA-MB-468 cell line. There is also evidence that reduced thyroid hormone receptor expression and/or variations in thyroid hormone genes occur frequently in cancer [5], suggesting that the native receptors could act as tumor suppressors and that loss of expression of this receptor could provide a selective advantage for cell transformation and tumor progression [6].

Conventional drug development takes considerable time and effort, and so does not keep pace with the urgent need for comprehensive treatment. Computer-aided drug design has been successful in designing novel drugs with greater effectiveness and potency against diseases. The aim of this study is to explore the anti-proliferative activities of 2-anilinopyrimidine derivatives against the triple-negative breast cancer cell line MDA-MB-468 through in silico studies, namely QSAR and molecular docking, which can be used to further develop anti-breast cancer drug candidates.

2 Methods

2.1 Computational information

2.1.1 Hardware

The computer used in this research was an HP Pavilion with a 7th-generation Intel(R) Core i7-7500U processor and 12.00 GB of RAM, running the Windows 10 operating system.

2.1.2 Software

The software used to carry out this research included Spartan’14 (version 1.1.2), Materials Studio (version 8), AutoDock visualiser (version 4.2), PyRx, ChemDraw (version 12.0.2), PaDEL-Descriptor (version 2.20), DTC Lab software, and Microsoft Office Excel 2013.

2.2 QSAR studies

2.2.1 Data collection

Thirty novel 2-anilinopyrimidine derivatives, with their inhibitory concentrations (IC50) against the triple-negative breast cancer cell line MDA-MB-468, were taken from the report of Jo et al. [4].

2.2.2 Bioactivities

Anti-proliferative activities of the 2-anilinopyrimidine derivatives were measured as inhibitory concentrations (IC50). The IC50 (50% inhibitory concentration) of a compound is defined as the concentration of the compound required to decrease the viability of a given cell line by 50%. The anti-proliferative activities (IC50, in micromolar, μM) and the corresponding pIC50 values of the derivatives are listed in Table 1.

Table 1 2-Anilinopyrimidine derivatives with their activities

The IC50 values were converted from micromolar to molar units and normalized on a logarithmic scale to pIC50 values to reduce the skew in the activities:

$$ {\mathrm{pIC}}_{50}=-{\log}_{10}\left({\mathrm{IC}}_{50}\times {10}^{-6}\right) $$
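The conversion above can be illustrated with a minimal Python sketch. The function name `pic50_from_ic50_um` and the IC50 values in the loop are hypothetical and are not taken from Table 1; the only thing carried over from the text is the formula itself.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # The factor 1e-6 converts micromolar to molar before taking the logarithm.
    return -math.log10(ic50_um * 1e-6)


# Hypothetical IC50 values in micromolar, for illustration only.
for ic50 in (1.0, 10.0, 100.0):
    print(f"IC50 = {ic50:6.1f} uM -> pIC50 = {pic50_from_ic50_um(ic50):.2f}")
```

A tenfold increase in IC50 lowers pIC50 by one unit, so more potent compounds have larger pIC50 values.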

2.2.3 Geometry optimization

Geometry optimization aims to obtain a geometric structure that is closer to the actual geometry of the molecule [2]. The derivative compounds were sketched in 2D in ChemDraw (version 12.0.2) and converted in Spartan 14 (version 1.1.4). Density functional theory (DFT) at the B3LYP level with the 6-311G basis set was used for the geometry optimization of the compounds [7,8,9]. The parent compound is shown in Fig. 1.

Fig. 1 Parent compound of 2-anilinopyrimidine derivatives

2.2.4 Molecular descriptor

The Pharmaceutical Data Exploration Laboratory software (PaDEL-Descriptor, version 2.20) was used to calculate molecular descriptors for the 30 optimized 2-anilinopyrimidine derivatives [10].

2.2.5 Pretreatment and division of data set

The descriptors obtained from PaDEL-Descriptor were pretreated using the Data Pre-treatment GUI 1.2 software to remove constant and unwanted descriptors [9, 11]. The Kennard-Stone algorithm [12] was used to divide the derivatives into a training set of 21 compounds, used to build the model, and a test set of 9 compounds.
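The Kennard-Stone split can be sketched in Python as follows. This is a minimal illustration of the general maximin selection rule on Euclidean distances, not the implementation in the software used in this study. The descriptor matrix `X_demo` is random, hypothetical data, and the function name `kennard_stone` is chosen here for illustration.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) selected by the Kennard-Stone maximin rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all compounds.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Seed the training set with the two most distant compounds.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        # For each remaining compound, distance to its nearest selected compound.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the compound that is farthest from the current training set.
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


# Hypothetical descriptor matrix: 30 compounds x 4 descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
train_idx, test_idx = kennard_stone(X_demo, n_train=21)
print(len(train_idx), len(test_idx))  # 21 9
```

In practice the descriptors are often scaled before the distances are computed, so that no single descriptor dominates the selection; whether that was done here is not stated.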

2.2.6 Model building and model validation

The model was built on the training set (21 compounds) in Materials Studio (version 8) using the genetic function approximation (GFA) technique, and the internal validation was carried out in the same software. The resulting models were evaluated using the Friedman lack-of-fit (LOF) formula [13]:

$$ \mathrm{LOF}=\frac{SEE}{M{\left[1-\beta \left(\frac{c+d\times p}{M}\right)\right]}^2} $$

where SEE is the standard error of estimation, c is the number of terms in the model, p is the total number of descriptors in the model, M is the number of compounds in the training set, and d is a user-defined smoothing parameter [14]. A low SEE implies a better model. SEE is expressed as follows:

$$ \mathrm{SEE}=\sqrt{\frac{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{pred}}\right)}^2}{N-P-1}} $$

where N is the number of compounds in the training set and P is the number of descriptors in the model. The fit of the model is assessed using the squared correlation coefficient (R2). The closer R2 is to 1, the better the model fits the data. R2 is given as follows:

$$ {R}^2=1-\left[\frac{\sum {\left({Y}_{\mathrm{pred}}-{Y}_{\mathrm{exp}}\right)}^2}{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{training}}\right)}^2}\right] $$

where Yexp and Ypred are the experimental and predicted activities of the training set, and Ytraining is the mean experimental activity of the training set [15].

R2 increases as the number of descriptors increases; thus, R2 alone is not a reliable measure of the strength of the model. R2 is therefore adjusted for the number of descriptors, as follows:

$$ {R}_{\mathrm{adj}}^2=1-\frac{\left(1-{R}^2\right)\left(n-1\right)}{n-p-1} $$

where p is the number of descriptors in the model and n is the number of compounds in the training set. The stability of the model was assessed using the cross-validation coefficient (Q2cv), given as:

$$ {Q}^2\mathrm{cv}=1-\left[\frac{\sum {\left({Y}_{\mathrm{pred}}-{Y}_{\mathrm{exp}}\right)}^2}{\sum {\left({Y}_{\mathrm{exp}}-{Y}_{\mathrm{training}}\right)}^2}\right] $$

Here Ytraining, Yexp, and Ypred are the mean experimental activity (pIC50), the experimental activities (pIC50), and the predicted activities (pIC50) of the training set, respectively [16].
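These validation statistics can be sketched in Python as follows. The sketch makes three assumptions that are labeled in the code: the adjusted R2 uses the standard definition 1 − (1 − R2)(n − 1)/(n − p − 1); the model is an ordinary least-squares fit with an intercept; and Q2cv is computed by leave-one-out cross-validation. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def r_squared(y_exp, y_pred, y_train_mean):
    """R2 = 1 - sum((Ypred - Yexp)^2) / sum((Yexp - Ytraining)^2)."""
    y_exp = np.asarray(y_exp, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_exp) ** 2)
    ss_tot = np.sum((y_exp - y_train_mean) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Standard adjusted R2 for n training compounds and p descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def q2_loo(X, y):
    """Leave-one-out Q2 (assumption: OLS model with an intercept term)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # add intercept column
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without compound i, then predict compound i.
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        y_loo[i] = A[i] @ coef
    return r_squared(y, y_loo, y.mean())


# Adjusted R2 from the values reported for model 4 (R2 = 0.8760, n = 21, p = 4);
# the result is about 0.845, in line with the reported R2adj of 0.8451.
print(round(adjusted_r_squared(0.8760, 21, 4), 4))

# Hypothetical data: 21 compounds, 4 descriptors, noisy linear response.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(21, 4))
y_demo = X_demo @ np.array([0.5, -0.3, 0.2, 0.1]) + 4.0 + rng.normal(scale=0.1, size=21)
print(round(q2_loo(X_demo, y_demo), 3))
```

Q2cv is never larger than the fitted R2 for the same least-squares model, because each compound is predicted by a model that has not seen it.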

2.2.7 QSAR modeling evaluation

The generated models were assessed using statistical parameters such as the cross-validation test, R2, Fisher’s test, and predicted R2.

2.2.8 Mean effect

The mean effect describes the relative influence of each descriptor on the activities of the compounds in the model. The sign attached to a descriptor shows the direction in which the activity of the compounds changes when the descriptor value increases or decreases. It is defined as follows:

$$ \mathrm{Mean}\ \mathrm{effect}=\frac{\beta_j{\sum}_{i=1}^n{D}_{ij}}{\sum_{j=1}^m\left({\beta}_j{\sum}_{i=1}^n{D}_{ij}\right)} $$

where m is the number of descriptors in the model, βj is the coefficient of descriptor j, n is the number of compounds in the training set, and Dij is the value of descriptor j for compound i in the training set [17].
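A minimal Python sketch of the mean effect calculation is given here. The descriptor names `A` and `B`, their coefficients, and their values are hypothetical and only illustrate the arithmetic of the definition.

```python
def mean_effects(coefs, values):
    """Mean effect of each descriptor.

    coefs:  dict mapping descriptor name -> model coefficient (beta_j)
    values: dict mapping descriptor name -> list of training-set values (D_ij)
    """
    # beta_j times the sum of descriptor j over all training compounds.
    weighted = {name: coefs[name] * sum(values[name]) for name in coefs}
    total = sum(weighted.values())
    if total == 0:
        raise ZeroDivisionError("weighted descriptor sums cancel to zero")
    return {name: w / total for name, w in weighted.items()}


# Hypothetical two-descriptor model with three training compounds.
demo_coefs = {"A": 0.5, "B": -0.2}
demo_values = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 2.0]}
me = mean_effects(demo_coefs, demo_values)
print(me)  # {'A': 1.5, 'B': -0.5}
```

By construction the mean effects of all descriptors in a model sum to 1, so each value can be read as a signed share of the total contribution.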

2.2.9 Variance inflation factor (VIF)

The VIF measures the extent of correlation between one descriptor and the other descriptors in a model. High values indicate that it is difficult to determine accurately the contribution of a descriptor to the model. It is evaluated as follows:

$$ \mathrm{VIF}=\frac{1}{\left(1-{R}^2\right)} $$

where R2 is the squared multiple correlation coefficient obtained by regressing one descriptor on the other descriptors in the model [18].

The higher the value, the greater the correlation between the descriptors. Values of 1–7 are sometimes regarded as moderate and indicate a strong and robust model, while values above 10 show that the correlation between the descriptors is very high and that the model is therefore very unstable.
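The VIF calculation can be sketched in Python as follows, assuming that the R2 for each descriptor comes from an ordinary least-squares regression of that descriptor on all the others, with an intercept. The function name `vif` and the demonstration data are hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of the descriptor matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other descriptors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Hypothetical data: two independent descriptors plus one near-duplicate.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + 0.01 * rng.normal(size=50)  # almost collinear with x1
print(np.round(vif(np.column_stack([x1, x2])), 2))      # values close to 1
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # x1 and x3 far above 10
```

A VIF of exactly 1 corresponds to a descriptor that is uncorrelated with all the others.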

2.2.10 QSAR applicability domain of the model

Applicability domain methods estimate the reliability of the predictions of each generated model [19]. They assess the uncertainty in the prediction for a compound from its similarity to the compounds used in building the model and from the distance between the training and test sets. The leverage approach was used to define the applicability domain of the generated models [20]. It is formulated as follows:

$$ {h}_i={x}_i\ {\left({X}^TX\right)}^{-1}\ {x_i}^T $$

where hi is the leverage of compound i, X is the n × k descriptor matrix of the training set used in building the model, XT is the transpose of X, and xi is the descriptor row vector of compound i. The warning leverage (h*) is the threshold used to check for outliers. It is written as follows:

$$ {h}^{\ast }=\frac{3\left(p+1\right)}{n} $$

where n is the number of compounds in the training set and p is the number of descriptors in the generated model. The Williams plot is generated by plotting the standardized residuals against the leverages of both the training and test sets. Compounds whose leverages fall below the warning leverage on the plot lie within the applicability domain of the model. The reliability of the QSAR model was assessed against the minimum accepted values shown in Table 2 [21].
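The leverage calculation and the warning leverage can be sketched in Python as follows. The descriptor matrices are random, hypothetical data; only the values n = 21 and p = 4 come from this study. The sketch follows the text in taking X as the n × k descriptor matrix without an added intercept column, which is an assumption about how the software builds X.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage h_i = x_i (X^T X)^-1 x_i^T for each row x_i of X_query."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.pinv(X_train.T @ X_train)
    # Row-wise quadratic form x_i @ xtx_inv @ x_i.
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(n_train, n_descriptors):
    """Warning leverage h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


# Values from this study: 21 training compounds, 4 descriptors.
h_star = warning_leverage(21, 4)
print(round(h_star, 3))  # 0.714

# Hypothetical descriptor matrices for 21 training and 9 test compounds.
rng = np.random.default_rng(2)
X_train_demo = rng.normal(size=(21, 4))
X_test_demo = rng.normal(size=(9, 4))
h_train = leverages(X_train_demo, X_train_demo)
h_test = leverages(X_train_demo, X_test_demo)
print("training compounds above h*:", int(np.sum(h_train > h_star)))
print("test compounds above h*:", int(np.sum(h_test > h_star)))
```

For a full-rank descriptor matrix the training-set leverages sum to the number of columns of X, which gives a quick consistency check on the calculation.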

2.3 Molecular docking

Molecular docking studies were carried out between the 2-anilinopyrimidine derivatives (ligands) and the thyroid hormone receptor (TRβ1). The receptor structure was obtained from the Protein Data Bank (PDB code: 1Y0X). The docking scores of the ligand–receptor complexes were obtained with AutoDock Vina in PyRx [11]. The detailed interactions between the ligands and the receptor were visualized using Discovery Studio Visualizer.

3 Results

3.1 QSAR of 2-anilinopyrimidine derivatives

Four QSAR models were generated using the Genetic Function Approximation (GFA) technique to predict the anti-proliferative activities. Model 4 passed the internal validation test, meeting the minimum requirements for QSAR modeling shown in Table 2.

Table 2 Recommended values used in assessing the QSAR model

3.1.1 Model 1

pIC50 = − 0.000041993 × VR1_Dzv + 0.430022665 × C3SP3 − 0.029366849 × RDF125m − 0.013797643 × RDF105p + 4.414124338

3.1.2 Model 2

pIC50 = 0.019329185 × ALogp2 + 0.209407843 × C3SP3 + 0.013676289 × RDF40i − 0.000911095 × Vm + 3.736702488

3.1.3 Model 3

pIC50 = − 0.015741625 × VR3_Dzv + 0.385603503 × C3SP3 − 0.036977855 × RDF125m − 0.016463267 × RDF105p + 4.720613746

3.1.4 Model 4

pIC50 = − 0.000060824 × VR1_Dzv + 1.185723768 × SpMin1_Bhs + 0.378178925 × C3SP3 − 0.128667903 × MOMI-R + 3.282331294
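As an illustration of how model 4 is applied, the following Python sketch evaluates the equation above for one compound. The coefficients and intercept are those of model 4; the descriptor values in `demo_descriptors` are hypothetical and do not correspond to any compound in the data set.

```python
# Coefficients and intercept of model 4, as given in the equation above.
MODEL4_COEFS = {
    "VR1_Dzv": -0.000060824,
    "SpMin1_Bhs": 1.185723768,
    "C3SP3": 0.378178925,
    "MOMI-R": -0.128667903,
}
MODEL4_INTERCEPT = 3.282331294


def predict_pic50(descriptors):
    """Predict pIC50 from a dict of descriptor values using model 4."""
    missing = set(MODEL4_COEFS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return MODEL4_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in MODEL4_COEFS.items()
    )


# Hypothetical descriptor values, for illustration only.
demo_descriptors = {"VR1_Dzv": 1000.0, "SpMin1_Bhs": 1.0, "C3SP3": 1.0, "MOMI-R": 2.0}
pic50 = predict_pic50(demo_descriptors)
ic50_um = 10 ** (6 - pic50)  # back-conversion from pIC50 to micromolar
print(f"predicted pIC50 = {pic50:.3f}, IC50 = {ic50_um:.1f} uM")
```

The signs of the coefficients make the trend explicit: raising SpMin1_Bhs or C3SP3 raises the predicted pIC50, while raising VR1_Dzv or MOMI-R lowers it.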

Tables 3 and 4 show the calculation of the external validation of the QSAR model using the descriptors of model 4. The external validation coefficient (R2pred) was calculated as 0.5390, which conforms to the minimum required value for QSAR modeling and indicates that the model is robust and predictive. The meaning of each model parameter used in validating model 4 is given in Table 6.

Table 3 External validation of model 4
Table 4 Continuation of descriptor values of the test set used in external validation of model 4
$$ \sum {\left({Y}_{\mathrm{obs}}-{Y}_{\mathrm{pred}}\right)}^2=0.2227 $$

$$ \sum {\left({Y}_{\mathrm{obs}}-{\overline{Y}}_{\mathrm{train}}\right)}^2=0.4814 $$

$$ \therefore {R}_{\mathrm{test}}^2=1-\left(0.2227/0.4814\right)=0.5390 $$

The experimental, predicted, and residual values of the 2-anilinopyrimidine derivatives are shown in Table 5. The residual values are the differences between the experimental and predicted activities. All the derivative compounds had low residual values, indicating the effectiveness of QSAR model 4.

Table 5 Experimental activities (pIC50), predicted activities, and residuals of model 4

Table 6 defines and classifies the four model parameters (descriptors) that were used in building QSAR model 4 and in evaluating the strength of the model externally.

Table 6 Definition of descriptors (model parameters) and their classes for model 4

Table 7 shows the statistical evaluation (VIF, mean effect, and P values) of the model parameters. The VIF shows the degree of collinearity between the descriptors, and it was calculated using the following equation:

$$ \mathrm{VIF}=\frac{1}{\left(1-{R}^2\right)} $$

where R2 is the squared multiple correlation coefficient obtained by regressing one descriptor on the other descriptors in the model [18].

Table 7 Statistical analysis of the descriptors used in the QSAR model 4

The mean effect shows the contribution of each descriptor to the built model, and the sign of each value shows whether the descriptor makes a negative or a positive contribution to the model. The P values evaluate the statistical significance of the model parameters.

Figure 2 shows a plot of the calculated (predicted) activities against the experimental activities of the 2-anilinopyrimidine derivatives tabulated in Table 5. The plot shows good agreement between the experimental and predicted activities.

Fig. 2 Graph of predicted activities versus experimental activities (bioactivities)

Figure 3 shows a plot of standardized residuals against the experimental activity of both the training and test sets. The values are well distributed on both sides of the zero line, which supports the effectiveness of model 4.

Fig. 3 Graph of standardized residual against inhibition concentration (experimental activity)

Figure 4 is a plot of standardized residuals against leverage values, known as the Williams plot. The plot was used to assess how similar the derivative compounds are to those used in building the model. Compounds that fall within the warning leverage tend to be structurally similar. With p = 4 descriptors and n = 21 training compounds, the warning leverage was calculated as h* = 0.714 using the formula:

$$ {h}^{\ast }=\frac{3\left(p+1\right)}{n} $$

Fig. 4 A plot of standardized residual versus leverages (Williams plot)

3.2 Molecular docking studies

A summary of the docking results for some of the 2-anilinopyrimidine derivatives is given in Table 8. The docking scores were obtained using PyRx, while the interactions between the receptor and the ligands in the complexes, including hydrophobic interactions, hydrogen bonds, and bond distances, were visualized using Discovery Studio. The hydrogen-bond and hydrophobic interactions between the 2-anilinopyrimidine derivatives (ligands) and the active pocket of the TRβ1 receptor for complexes 15 and 18 are shown in 2D in Figs. 6 and 7, while Fig. 8 shows the same interactions in 3D.

Table 8 Binding affinity, interaction type, bond type, and distances between some compounds and the receptor

4 Discussion

4.1 QSAR of 2-anilinopyrimidine

QSAR modeling was used to quantify the relationship between the structures of the 2-anilinopyrimidine derivatives and their anti-proliferative activities. The robustness of the QSAR models was assessed by the fit to the training set and the predicted pIC50 of the test set. Four QSAR models were generated using the Genetic Function Approximation (GFA) technique to predict the anti-proliferative activities. Model 4 passed the internal validation with a squared correlation coefficient (R2) of 0.8760, an adjusted squared correlation coefficient (R2adj) of 0.8451, and a cross-validation coefficient (Q2) of 0.6141, and passed the external validation with R2pred of 0.5390. All the values met the minimum recommended values for the evaluation of a QSAR model shown in Table 2. The obtained values (R2, R2adj, Q2, and R2pred) indicate a good correlation between the predicted and the experimental pIC50 of the data set.

4.1.1 External validation of QSAR model 4

Model 4 was confirmed as the best model using the descriptor values of the test set compounds. Tables 3 and 4 show how the external validation was carried out using these values. The experimental, predicted, and residual values of the 2-anilinopyrimidine derivatives are shown in Table 5. The low residual values between the experimental (anti-proliferative) and predicted activities show the good performance of the model.

Table 6 gives the definitions of the descriptors (model parameters). The mean effect results (Table 7) show the degree of impact of each descriptor on the model. The values and coefficients of the descriptors show that decreasing MOMI-R and then VR1_Dzv (negative descriptors) would increase the anti-proliferative activities of the derivative compounds, while increasing SpMin1_Bhs followed by C3SP3 (positive descriptors) would also increase the anti-proliferative activities of the 2-anilinopyrimidine derivatives. The variance inflation factor (VIF) showed that there is little inter-correlation between the descriptors, which makes the model stable. The null hypothesis states that there is no significant relationship between the bioactivity and the model parameters of the constructed model (p > 0.05). At the 95% confidence level, the P values of the model parameters were below 0.05. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted, as shown in Table 7.

Figure 2 shows the plot of predicted activity (pIC50) versus experimental activity (pIC50) for both the test set and the training set. The plot shows that the predicted activities are in good agreement with the experimental values listed in Table 5, confirming the effectiveness and strength of the built model. Figure 3 is a plot of standardized residuals versus experimental activity for both the training set and the test set; the values are spread on both sides of the zero line, showing no systematic error. Figure 4 is a plot of standardized residuals against leverage values, known as the Williams plot. Almost all the compounds were within the calculated warning leverage (applicability domain) of h* = 0.714. Compounds 2, 8, 14, and 15 were found to be outside the warning leverage, perhaps because their structures differ slightly from those of the other molecules in the data set. Both internal and external validation confirm that model 4 is stable, robust, and predictive.

4.2 Molecular docking studies

Molecular docking studies of the 2-anilinopyrimidine derivatives with the protein target thyroid hormone receptor (TRβ1) were performed. Among all the derivatives, compounds 12, 15, 18, and 30 had high docking scores. The prepared receptor and ligand are shown in Fig. 5. Compounds 15 and 18 had the best docking scores of − 7.4 and − 7.3 kcal/mol, respectively, as shown in Table 8. The docked complexes were examined visually by evaluating the hydrogen bond interactions, hydrogen bond lengths, and hydrophobic interactions.

Fig. 5 3D structure of the ligand and prepared protein receptor

Compound 15 showed conventional backbone hydrogen bonding with ARG 429 (2.50 Å) and two hydrogen bonds with GLU311 (2.7609 Å and 2.1551 Å). In addition, VAL458 showed a carbon–hydrogen interaction with the compound at a distance of 3.3765 Å. The delocalized pi electrons of the benzene ring also interact with the alkyl groups of ILE303 (5.4379 Å), LYS306 (5.04683 Å), and ARG383 (5.3858 Å), and make three contacts with PRO384 (5.1107 Å, 4.7845 Å, and 4.7531 Å), forming hydrophobic interactions.

Compound 18 showed similar hydrogen bonding with the amino acid residues GLU311 (2.10982 Å and 2.85424 Å), ARG439 (2.68544 Å), and GLY307 (2.97669 Å), and carbon–hydrogen bonding with VAL458 (3.34145 Å). Furthermore, the benzene ring of the compound interacts with the alkyl groups of several amino acid residues to form hydrophobic interactions; these residues include ILE303 (5.28774 Å), LYS306 (4.9622 Å), ARG383 (5.40494 Å), PRO384 (4.84454 Å and 5.15235 Å), and ALA436 (4.91503 Å).

Both compounds were adequately docked, and their orientations are similar in some respects, which supports the quality of the docking results. Both compounds showed the same types of hydrogen bond and hydrophobic interactions with the amino acid residues of the receptor, at different distances. The ligands had better docking scores than the standard drug gefitinib (− 5.3 kcal/mol). The interactions of the compounds with the receptor suggest that the compounds are able to inhibit the TRβ1 receptor. Figures 6 and 7 give the detailed binding interactions of the receptor with ligands 15 and 18, while Fig. 8 shows in 3D how ligands 15 and 18 bind in the active site of the protein receptor to form complexes.

Fig. 6 Receptor–ligand interaction of complex 15 on a 2D diagram

Fig. 7 Receptor–ligand interaction of complex 18 on a 2D diagram

Fig. 8 Ligand interactions of complexes 15 and 18 in a 3D diagram

5 Conclusion

The QSAR and molecular docking studies indicate that 2-anilinopyrimidine derivatives are promising anti-cancer drug candidates against the MDA-MB-468 cell line. The studies were carried out to predict the activities of the derivatives from their experimental activities and to understand the interaction between the ligands (derivatives) and the thyroid hormone receptor (TRβ1). The coefficients and mean effect values of QSAR model 4 indicate that increasing the SpMin1_Bhs and C3SP3 descriptors, and decreasing the VR1_Dzv and MOMI-R descriptors, would increase the anti-proliferative activities of the 2-anilinopyrimidine derivatives as anti-breast cancer agents. The robustness, applicability, and predictive capacity of the generated model were assessed by both internal and external validation tests, and the results conform to the minimum recommended values. This indicates that model 4 can be used in developing new 2-anilinopyrimidine derivatives with better anti-breast cancer activity. The molecular docking results showed that compounds 15 and 18 had the best docking scores, − 7.4 and − 7.3 kcal/mol, respectively, which are better than that of the standard drug gefitinib. The receptor–ligand interactions indicate that some of the 2-anilinopyrimidine derivatives bind tightly to the TRβ1 receptor and stabilize it. These compounds could serve as promising inhibitors of TRβ1. This research could support pharmaceutical researchers in designing and developing new anti-triple-negative breast cancer drugs.