Molecular docking and MD simulation studies of 4-thiazol-N-(pyridin-2-yl)pyrimidin-2-amine derivatives as novel inhibitors targeted to CDK2/4/6

Purpose Nowadays, cyclin-dependent kinase 4/6 (CDK4/6) inhibitors have been approved for treating metastatic breast cancer and have achieved inspiring curative effects. But some discoveries have indicated that CDK 4/6 are not the requisite factors in some cell types because CDK2 partly compensates for the inhibition of CDK4/6. Thus, it is urgent to design CDK2/4/6 inhibitors for significantly enhancing their potency. This study aims to explore the mechanism of the binding of CDK2/4/6 kinases and their inhibitors to design novel CDK2/4/6 inhibitors for significantly enhancing their potency in different kinds of cancers. Materials and methods A series of 72 disparately functionalized 4-substituted N-phenylpyrimidin-2-amine derivatives exhibiting potent inhibitor activities against CDK2, CDK4 and CDK6 were collected to apply to this research. The total set of these derivatives was divided into a training set (54 compounds) and a test set (18 compounds). The derivatives were constructed through the sketch molecule module in SYBYL 6.9 software. A Powell gradient algorithm and Tripos force field were used to calculate the minimal structural energy and the minimized structure was used as the initial conformation for molecular docking. By the means of 3D-QSAR models, partial least squares (PLS) analysis, molecular dynamics (MD) simulations and binding free energy calculations, we can find the relationship between structure and biological activity. Results In this study, we used molecular docking, 3D-QSAR and molecular dynamics simulation methods to comprehensively analyze the interaction and structure–activity relationships of 72 new CDK2/4/6 inhibitors. We used detailed statistical data to reasonably verify the constructed 3D-QSAR models for three receptors (q2 of CDK2 = 0.714, R2pred = 0.764, q2 = 0.815; R2pred of CDK4 = 0.681, q2 = 0.757; R2pred of CDK6 = 0.674). MD simulations and decomposition energy analysis validated the reasonability of the docking results and identified polar interactions as crucial factors that influence the different bioactivities of the studied inhibitors of CDK2/4/6 receptors, especially the electrostatic interactions of Lys33/35/43 and Asp145/158/163. The nonpolar interaction with Ile10/12/19 was also critical for the differing potencies of the CDK2/4/6 inhibitors. We concluded that the following probably enhanced the bioactivity against CDK2/4/6 kinases: (1) electronegative groups at the N1-position and electropositive and moderate-sized groups at ring E; (2) electrogroups featured at R2; (3) carbon atoms at the X-position or ring C replaced by a benzene ring; and (4) an electrogroup as R4. Conclusion Previous studies, to our knowledge, only utilized a single approach of 3D-QSAR and did not integrate this method with other sophisticated techniques such as molecular dynamics simulations to discover new potential inhibitors of CDK2, CDK4, or CDK6. So we applied the intergenerational technology, such as 3D-QSAR technology, molecular docking simulation techniques, molecular dynamics simulations and MMPBSA19/MMGBSA20-binding free energy calculations to statistically explore the correlations between the structure with biological activities. The constructed 3D-QSAR models of the three receptors were reasonable and confirmed by the excellent statistical data. We hope the results obtained from this work will provide some useful references for the development of novel CDK2/4/6 inhibitors.


Introduction
Cyclin-dependent kinases (CDKs) which can produce some extracellular effects, are classified as serine/threonine protein kinases.In mammalian cells, some well-differentiated CDKs can dominate the sequential nature and inactivation of cell cycle progression (Bury et al. 2021;Morrison et al. 2024), which is mediated by an intricate system of regulatory factors, including the tumor depressor retinoblastoma protein 1 (RB1) (Cornwell et al. 2023).
CDK4/6 are key regulators of the cell cycle, forming a complex with cyclin, phosphorylating the retinoblastoma gene (Rb), and releasing transcription factor E2F, which can trigger the cell cycle transition from the G1 phase to the S1 phase (Ammazzalorso et al. 2021).In hormone receptor (HR) positive and human epidermal growth factor receptor 2 (HER2)-triple-negative breast cancer, the overexpression of CDK4 and 6 can lead to uncontrolled cell proliferation that may evolve into malignant tumors (Zabihi et al. 2023).CDK4 and 6 inhibitors can block CDK4 and 6 kinase activity, which reduces the phosphorylation level of the Rb protein, inhibiting transcription factors to inhibit cancer cell division and proliferation, restore cell cycle control, block tumor cell proliferation, and inhibit breast cancer cell growth (Rubin et al. 2020).
CDK4/6 inhibitors induce tumor cell cycle arrest and synergistically promote antitumor immune responses.CDK4/6 inhibitors enhance the expression of genes that control the processing and presentation of antigens on the surface of cancer cells, inhibit the Rb-EF2 pathway (Goel et al. 2022), reduce the activity of DNA methyltransferases, inhibit the proliferation of regulatory T cells, prime cytotoxic T lymphocytes for eliminating tumor cells and heighten their antitumor functions (Morrison et al. 2024).
In light of the success of cyclin-dependent kinase 4 and 6 inhibitors in advanced HR-positive and HER2-triplenegative breast cancer, researchers have been striving to find more uses under different circumstances, including for early stage HR-positive and HER2-triple-negative breast cancer (Chou et al. 2020;Haddad et al. 2023) and for clinical breast oncology in other subtypes, such as HER2-positive breast cancer, HER2-positive estrogen receptor (ER)-positive breast cancer and triple-negative breast cancer (TNBC) (Blohmer et al. 2022;Tolaney et al. 2020).
In addition to the cyclin-dependent kinases 4 and 6, CDK2 is an extensively studied target for eliminating carcinomas because of its vital functions in signal transduction in cell-cycle control (Tadesse et al. 2020;Freeman-Cook et al. 2021).For some reasons CDK4/6 does not have a key role in mediating the proliferation of different cell types, likely because CDK2 can partly compensate for the inhibition of CDK4/6 in diverse cells (Arora et al. 2023).In several in vitro models, it has been shown that cells resistant to palbociclib also exhibit resistance to ribociclib, while still demonstrating sensitivity to abemaciclib.The observed absence of resistance might perhaps be related to the simultaneous suppression of CDK2 (Morrison et al. 2024).
CDK2, CDK4 and CDK6 are folded similarly with high sequence identity, expressed diffusely in the human body (Al-Warhi et al. 2020).Additionally, despite their structural similarities and widespread expression throughout the human body, there are four key residue differences in the ATP-binding sites of CDK2 (Glu12, Lys33, Lys89 and Asp145), CDK4 (Val14, Lys35, Arg101 and Asp158) and CDK6 (Glu21, Lys43, Thr106 and Asp163).These residue differences in the binding pockets may offer opportunities for the development of selective inhibitors against CDK2, CDK4 and CDK6.
Molecular modeling, serving as an effective means of new drug synthesis, has been applied to CDK2, CDK4 and CDK6 inhibitors.Specifically, structure-activity relationships (SARs) of some known CDK2 inhibitors have been analyzed on 3D-QSAR model platforms by statistically utilizing a comparative molecular field analysis (CoMFA) method (Moussaoui et al. 2023;Abdel-Rahman et al. 2021).From another perspective, previous research of 3D-QSAR models has simply considered the advantages of the structural features, aiming only to determine the maximum activity of a specific target protein and neglecting differences among homologous proteins.Besides, other previous studies found that 4-thiazol-N-(pyridin-2-yl)pyrimidin-2-amine derivatives were reported to be promising inhibitors of these three cyclin-dependent kinases (Chohan et al. 2016;Zheng et al. 2014).However, previous studies, to our knowledge, only utilized a single approach of 3D-QSAR and did not integrate this method with other sophisticated techniques such as molecular dynamics simulations to discover new potential inhibitors of CDK2, CDK4, or CDK6.
Encouraged by the previous outcomes from other laboratories, we applied the intergenerational technology, such as 3D-QSAR technology, molecular docking simulation techniques, molecular dynamics simulations and MMPBSA19/MMGBSA20-binding free energy calculations to statistically determine relevant biological activities of dozens of compounds.These compounds contained diverse chemical properties, including steric and electrostatic fields.
In this study, we used the application technique described above to evaluate the interaction of the core structure of 4-substituted N-phenylpyrimidin-2-amines with CDK2, CDK4 and CDK6 for identifying potential lead compounds that can inhibit these three kinases.Herein, a combination of 3D-QSAR, docking and MD simulations methods was applied to verify the effectiveness of disparate interactional fields for use in fostering the development of highly potent selective CDK2/4/6 inhibitors.
The key structural forms of the investigated compounds are shown in Fig. 1a.The total set of these derivatives was divided into a training set (54 compounds) and a test set (18 compounds) (Table 3).The test compounds were selected manually based on the structural diversity and breadth of their activities.The IC 50 values for CDK2, CDK4 and CDK6 were converted to pIC 50 (−log IC 50 ) values and utilized as dependent variables for the 3D-QSAR analyses.
The 3D structures of 4-substituted N-phenylpyrimidin-2-amine derivatives were constructed through the sketch molecule module in SYBYL 6.9 software.
A Powell gradient algorithm and Tripos force field were used to calculate the minimal structural energy with a convergence criterion of 0.001 kcal/(mol-A) and as many as 1000 iterations.An MMFF94 charge was assigned to each compound.The minimized structure was used as the initial conformation for molecular docking.

Molecular docking
To ascertain possible binding consistencies and directions of the interactions between the investigated compounds and CDK2, CDK4 and CDK6, molecular docking was performed utilizing the Surflex-Dock module in SYBYL 6.9 software.The crystal structures of CDK2, CDK4 and CDK6 were obtained from the RCSB protein database (CDK2 PDB ID: 2WIH; CDK4PBD ID: 2W96; and CDK6PBD ID: 2EUF) and used in successive docking experiments without minimizing the energy.
The ligands were docked at the binding sites of the relevant proteins by an empirical scoring function and the patented Surflex-Dock search engine (Paggi et al. 2024).Before docking, we removed all the native ligands and water molecules and added polar hydrogen atoms to the corresponding ligands.Additionally, Kollman-all atom charges were allocated to the protein atoms.
During this research, self-docking was used to generate ProtoMol, which has two important parameters: the ProtoMol-bloat and the ProtoMol-threshold were used at the default values of 0 and 0.50 Å, respectively.
When other parameters were set to their default values, Surflex-Dock produced 20 conformations each ligand.Subsequently, the binding conformation with the highest docking score for the cocrystallized ligands under similar circumstances was chosen for use in the subsequent 3D-QSAR research.During the whole process, the ligand compound became increasingly flexible, and the protein was defined as a stiff species.

Molecular modeling and alignment
When operating on an SGI R2400 platform, an additional package of the SYBYL 6.9 molecular modeling program produced CoMFA models, with all the parameters set as the default values unless otherwise described.
The structural comparison rules of these compounds are pivotal for the CoMFA analysis (Yu et al. 2023).
To acquire the best 3D-QSAR model, two disparate alignment methods were used.The first was a ligandbased alignment; that is, all the compounds were compared against the most active compound 19, in the Align Database command in SYBYL 6.9 software.The other method is based on receptor alignment.
During this process, the final conformations of these compounds in the docking procedure were allocated MMFF94 charges and were arranged for the molecular comparison by 3D-QSAR analysis.The basic structure and the compounds identified above are shown in Fig. 1.

Generation of 3D-QSAR models
In the CoMFA analytical process, the spatial and electrostatic field models are based on Lennard-Jones and Coulombic potential.
To calculate space and electrostatic energy, we used the Tripos force field, which has a distance-dependent dielectric constant at all intersections in a regularly spaced (2 Å) grid with sp carbon atoms.CoMFA can effectively determine spatial and electrostatic mutual effects.In addition, it also easily calculates the corresponding fields by the proper usage of a sp 3 carbon probe atom with a charge of + 1.0 and a van der Waals radius of 1.0 Å.The Gaussian functional form is used for the calculation between probe atoms and molecular atoms.The default value should be set to 0.3.

Partial least squares (PLS) analysis and validation of the QSAR models using the partial least-square (PLS) statistical method
The arithmetic-partial least-square (PLS) statistical method is commonly used for 3D-QSAR equations [ (Chatterjee et al. 2023;Chatterjee and Roy 2023)].Based on the partial least-square method, the leave-one-out (LOO) The compounds with "*" are included in the test set  cross-validation method was applied to acquire the supreme cross-validation correlation coefficient and the most appropriate quantitative value of elements N.
We then performed the non-cross validation method and calculated the customary correlation coefficient R 2 , the standard error of estimation (SEE) and the F value.To further evaluate the robustness and statistical validity of the derived model, a 100-run bootstrap analysis was also performed.
The non-cross-validation methods were then used, and the conventional correlation coefficient R 2 , standard error of estimates (SEE), and F value were calculated.To assess robustness and statistical effectiveness of the models, we performed 100 runs in a bootstrapping analysis.
To evaluate the forecasting ability of the 3D-QSAR model, which is based on the training set, the biological activity of a test set of 18 compounds was used.
The estimated ability of the patterns is evaluated by the predictive correlation coefficient (R 2 pred ) computed using the equation: R 2 pred = (SD-PRESS)/SD.SD is the sum of the squared deviations of the biological activities of the test set compounds to the mean activity of the training set compounds, while PRESS is the numerical value of squared deviations of the discrepancies between real values and predicted values calculated with the test set compounds.
As indicated by Golbraikh and Tropsha and their associates (Golbraikh and Tropsha 2002), the 3D-QSAR model is effective if all of the following conditions are satisfied: q 2 > 0.5, R 2 > 0.6 and R 2 pred > 0.6.

Molecular dynamics (MD) simulations
We utilized AMBER 9 software for the MD simulation to further verify our results (Case et al. 2005).We regarded the 2WIH, 2W96 and 2EUF chemical compounds and the 3, 4, 19, 31, 57, 71 and 72 compounds as our initial docking structures.Our laboratory calculated the electrostatic potential (ESP) of each compound at the B3LYP/6-31G(d) level in the Gaussian 09 program and applied the RESP protocol to produce a portion of the atomic charges of these compounds.To further modify the protein, we ran this program in AMBER FF03, while using the general AMBER force field (gaff) as the ligand.Because of our previous experience, we applied Na+ as a counter ion of to neutralize the compound, and then, the TIP3P water model was utilized to immerse the whole system, extending a margin of 12 Å.Before we started the molecular dynamics simulation, a two-stage energy minimization was required to relieve possible stress.In the initial stage, the overall system was constrained by 2.0 kcal/(mol•Å 2 ), and water molecules and Na+ ions were minimized through the steepest descent of 2000 steps and a conjugate gradient of 3000 steps.In the following phase, the entire system was gradually set to a minimum of 5000 steps in the steepest descent method following a 5000-step conjugated gradient method.
Then, each complex of the system was increasingly heated from 0 to 300 K at a constant volume for less than 200 ps and equilibrated at 1 atm and 300 K for 500 ps.
The simulation lasting for 50 ns for every complex in an NPT (constant number of molecules, p = 1.0 atm and T = 300 K) ensemble.In the molecular dynamics simulation, we evaluated the long-range electrostatic interactions using a nonbonded cutoff of 8.0 Å, while overall restraining the covalent bonds through hydrogen atoms using the SHAKE algorithm with 2 fs as the time for every step.To evaluate empirically the coordinate trajectories, which were reported by Amber software for every 1 ps, we applied the root mean square deviation (RMSD) methodology to the calculation.We were then equipped to determine the integral level of each compound by analyzing their trajectory parameters.

Binding free energy calculations
With the purpose of verifying the binding stabilization of the complexes described above, we employed the MM-PBSA program in AMBER 9 to compute the binding free energy as In this equation, we define the gas-phase interaction energy as ΔE MM , which is equal to the totality of van der Waals energy ΔE vdw plus ΔE ele , representing electrostatic energy.The ΔGsol represents the solvation free energy, which is determined by polar solvation free energy, called ΔG ele,sol , as well as ΔG nonpol,sol , the nonpolar solvation free energy.Both the polar solvation free energy (ΔG ele,sol ) and the nonpolar solvation free energy (ΔG nonpol,sol ) were evaluated by both the generalized Born approximation model and the solvent-accessible surface area, calculated by AMBER 9. Eventually, through utilizing the MM-GBSA method, the free energy of binding was broken down by each residue, without calculating the entropy contribution (TΔS) due to a lack of scientific research funds.
The ΔG ele,sol and ΔG nonpol,sol were assessed by the generalized Born (GB) approximation model and the solvent-accessible surface area (SASA), which was computed with the MolSurf module in AMBER 9. Due to the expensive computational demand and the lack of satisfactory results, the TΔS was not evaluated in this study.Finally, the binding free energies were decomposed to each residue by the MM-GBSA method.

Docking results
All 72 studied inhibitors docked adequately into the binding pockets of CDK2/4/6, and their optimal conformations were superimposed on each other, as shown in Fig. 2. We chose compounds 19 and 72 as representative for the sake of the discussion.Compound 19 had relatively high activity, and its binding site with CDK2/4/6 adequately accounts for the mechanism between ligands and receptors.Compared with compound 19, compound 72 showed an obvious decrease in activity with CDK2/4/6 kinases due to the different groups on the R 2 and R 4 substituents of compound 72.To further explain why the activity was decreased for compound 72, we conducted a detailed analysis on the binding modes of compounds 19 and 72, and their binding patterns when docked into CDK2/4/6 kinases were determined, as demonstrated in Fig. 3.

For CDK2 kinase
As shown in Fig. 3a and b, compounds 19 and 72 were adequately docked into the ATP-binding sites of the CDK2 kinase with similar binding modes.In the binding models, the N 9 atom at the pyrimidine ring and the linker amino (N 12 H) formed two hydrogen bonds with Leu83, with the corresponding distances of 2.0 Å and 2.0 Å for 19 and 2.1 Å and 2.3 Å for 72.Additionally, compound 19 also formed a hydrogen bond with Tyr15, demonstrating that the hydrogen bond interactions were stronger in the CDK2-19 complex than they were in the CDK2-72 complex.The cyclopentyl ring of the R 1 substituent penetrated into the hydrophilic pocket created by Gly11, Glu12, Gly13 and Leu133, suggesting that hydrophobic substituents at this position increase activity (compounds 1-2).In addition, the R 2 substituent was near a hydrophilic pocket surrounded by Lys33, Glu51 and Asp145, where strong electrostatic interactions readily occurred when R 2 held electronegative groups.Thus, for the complex CDK2-19 (R 2 = CF 3 ), the electrostatic interaction was markedly stronger than that for CDK2-72 (R 2 = CH 3 ), which might explain the decrease in compound 72 bioactivity.R 3 was embedded in a small hydrophobic pocket made up of residues Tyr19, Ala31, Leu32 and Phe80, which were, in some instances, far from the two compounds.Obviously, introducing a moderate-sized group at this position was favorable for the activity.Moreover, the pyrimidine rings of both compounds formed an aromatic-aromatic interaction with the phenyl moiety of Tyr19.The residues His84 and Gln85 appeared close to the X-position of the pyridine ring, illustrating that a hydrophilic atom was favorable for bioactivity at this position.However, apart from His 84 and Gln85, some hydrophobic residues, i.e., Ile10, Leu83 and Leu134, were short distances from the X-position of the pyridine ring, which revealed that introducing a hydrophobic atom at the X-position or substituting a pyridine ring with a benzene ring increased bioactivity.By comparing the bioactivity of compounds 31 with 71, we found that 31 (X = C) clearly exhibited bioactivity that was more than 100-fold greater than that of 71 (X = N).This result suggests that replacing the pyridine ring with a benzene ring is advantageous for augmenting bioactivity.Finally, the R 4 substituent was extended into the solvent-accessible region at the entrance of the binding pocket encompassed by Gln85, Asp86 and Lys89, in which N 4'' atoms of compound 19 could be easily protonated, resulting in strong electrostatic interactions with the negatively charged side chains of Gln85, Asp86 and Lys89.In contrast with that of compound 19, the R 4 substituent of compound 72 was replaced by a CH 2 group, whose hydrophilic property was apparently weaker, and the electrostatic interactions between surrounding residues was also weaker, which therefore led to lower potency compared with that of compound 19.and Val96, with the approximate distances of 1.8 Å, 1.8 Å,  and 2.0 Å for 19 and 1.8 Å, 2.3 Å and 3.2 Å for 72.Due to the shorter hydrogen bond, compound 19 might form stronger hydrogen bond interactions than compound 72.In addition to the hydrogen bond interactions, the cyclopentyl moiety of the R 1 substituent established van der Waals interactions with the side chains of Gly13, Val14, Gly15 and Val16, where an overlarge group can easily collided and cause strong repulsive interactions.This unfavorable interaction with plasma enzymes caused an obvious decrease in CDK2 bioactivity (compound 11).In addition, the R 2 substituents of both compounds suitably embedded in a hydrophilic pocket including Tyr21, Lys33 and Asp158, and these residues formed the strong electrostatic interactions with compound 19 (R 2 = CF 3 ) but not with compound 72 (R 2 = CH 3 ).The R 3 substituent was buried in a small hydrophobic pocket surrounded by Tyr21, Ala33, Leu34 and Phe93, which provided sufficient space to accommodate relatively large groups at this position.In addition, the pyrimidine ring exhibited an aromatic-aromatic interaction with the phenyl moiety of Tyr21.At the N 14 -position of the pyridine ring, Ile12 and Leu147 tended to make contact with the benzene ring through hydrophobic interactions.Therefore, we can infer that a carbon atom in the X-position was beneficial for activity, which might be the reason why compound 19 did not show the highest bioactivity.Finally, we observed that Asp99, Arg101 and Thr102 constituted an charged region at the entrance of the binding pocket.These residues form electronegative interactions with the protonated N 4'' atom of compound 19 but not with compound 72.In summary, all of these analyses revealed that electrostatic interactions, hydrophobic interactions and repulsive interactions may be critical for the different activity levels of compounds 19 and 72.

For CDK6 kinase
As shown in Fig. 2e and f, the N 9 -position of the pyrimidine ring and linker amino (N 12 H group) formed two hydrogen bonds with Val101, with the corresponding distances of 2.0 Å, 2.1 Å for 19 and 2.2 Å, 2.4 Å for 72.In addition to these hydrogen bonds, compound 19 gained another two hydrogen bonds at the hinge region: one with Asn150 (Asn150 = O…NH, bond length of 2.3 Å) and the other one with Asp163 (Asp163-NH…N, bond length of 2.7 Å).The cyclopentyl moiety of the R 1 substituent lay close to the side chains of Ala23, Gly22, Glu21 and Gly20 and contacted these residues in van der Waals interactions.In addition, the CF 3 group of compound 19 was readily protonated, which probably enabled strong electrostatic interactions with surrounding residues (Lys43 and Asp163).In contrast with that of compound 19, the R 2 substituent of compound 72 was replaced by a methyl group and forming negligible electrostatic interactions with Lys43 and Asp163.We also found that the basic nitrogen atom at the pyridine ring was in the vicinity of Ile19 and Leu152, thus favorably implanting a carbon atom at the X-position.We also observed a hydrophilic pocket near the entrance of the binding pocket.This pocket involved Asp104, Thr106 and Thr107, all polar residues that readily formed electrostatic interactions with the protonated atom (N 4'' atom) of the R 4 substituent in compound 19 but not with compound 72.Regarding compound 72, two reasons explain the dramatic loss of activity.One is a lack of protonated atoms at the R 2 and R 4 substituents, and the other one is the absence of corresponding hydrogen bonds (Asn150 = O…NH and Asp163-NH…N).

Comparison of CDK2/4/6 kinases
As shown in Fig. 4, we found that the binding sites of compound 19 to CDK2/4/6 were nearly identical, but some differences were observed.First, the cyclopentyl moiety was located in the shadow of Val14 in CDK4 with favorable hydrophobic interactions, whereas no such interactions were formed with Glu12/Glu21 of CDK2/CDK6.Second, the protonated N 4'' atoms situated close to the negatively charged side chain of Arg101 in CDK4, exhibiting a stronger electrostatic interaction than Lys89/Thr106 of CDK2/CDK6.Third, compound 19 formed three hydrogen bonds for CDK2 (bond lengths of 2.0, 2.0 and 3.2 Å), three hydrogen bonds for CDK4 (bond lengths of 1.8, 1.8 and 2.8 Å) and four hydrogen bonds for CDK6 (bond lengths of 2.0, 2.1, 2.3 and 2.7 Å), suggesting that the hydrogen bond interactions became slightly stronger in the CDK6-19 complex than in CDK2-19 and CDK4-19 complexes.Nonetheless, the increase in the number of hydrogen bond interactions in the CDK6-19 complex did not counteract the impairment of hydrophobic and electrostatic interactions, leading to the lower activity of 19 for CDK6 than CDK4.

CoMFA statistical analysis results
The CoMFA analyses were utilized to create the 3D-QSAR models of CDK2, CDK4 and CDK6 kinases, and the detailed statistical parameters are represented in Table 2. Depending on the training set (54 compounds) and the test set (18 compounds), all three models exhibited good correlation coefficients (q 2 , R 2 , F and SEE), with 0.714, 0.962, 179.575 and 0.139 for CDK2; 0.815, 0.976, 313.343 and 0.116 for CDK4; and 0.757, 0.977, 330.996 and 0.125 for CDK6, respectively.The bootstrapping results (100 runs) showed R bs 2 and SD bs values of 0.973 and 0.01 for CDK2, 0.985 and 0.006 for CDK4, 0.985 and 0.005 for CDK6, respectively, demonstrating that these models had satisfactory internal predictability.
Moreover, the predictive correlation coefficients R 2 pred for the three test sets were 0.762, 0.681 and 0.674, respectively, revealing that the three models had reliable external predictability.Comparing the contributions of steric and electrostatic fields, we discovered that the electrostatic fields of the three models all exerted higher influences than steric fields on their inhibitory activity, as shown in Table 2.The predicted pIC 50 values of the three established CoMFA models are listed, and the residual values with experimental pIC 50 were computed, as shown in Table 3.We clearly observed that the experimental pIC 50 values were in line with the predicted pIC 50 values in Fig. 5, in which most points were equally distributed around the regression line, verifying good reliabilities of the three obtained CoMFA models.

CoMFA contour map for CDK2
For each enzyme, the best CoMFA models were chosen to build the 3D coefficient contour maps to observe the field effects on the target features and thus may be helpful for identifying possible interaction sites.The CoMFA steric contour maps are displayed in Fig. 6a using compound 19 as a reference structure for explanation.The green block represents the steric favorable regions, while the yellow block represents unfavorable regions.In the vicinity of ring E, there are a large green and two small yellow blocks,  suggesting that a moderate-sized group is advantageous at this position.This finding was well depicted, with some examples in which compound 2 with cyclopentyl at this position showed higher activities than compounds 14, 11 or 33 with phenyl, isopropyl and H, respectively.Moreover, there was a bulky green block around trifluoromethyl, which is embedded in a hydrophilic pocket that included Tyr21, Lys33 and Asp158 in the docking results.This finding indicates that the replacement of the R 2 substituent with a bulky group enhances bioactivity, which can reasonably explain why compounds 18-20 with -CF 3 groups had higher activity levels than compounds 2, 4 or 5.A yellow region located above the X-position of ring C demonstrates that introducing a large atomic radius at this position is favorable, which may be a reason why compound 71, with a nitrogen atom, behaves nearly 150-fold worse than compound 31, with a carbon atom at the X-position.Thus, there are reasons to believe that a carbon atom at the X-position effectively improves activity.Finally, we also observed a green and yellow region in the proximity of the R 4 substituent, which is hindered by the side chain of Lys89, revealing that a moderate-sized group is conducive to augmenting bioactivity.However, these docking results (Fig. 2a) verified that the terminal region of the R 4 substituent extends toward the solvent region.Hence, we speculate that moderate-sized groups of a certain long length were tolerated by the CDK2 kinase.
The contour map of the CoMFA electrostatic field is shown in Fig. 6b.The blue region indicates that electropositive groups were favorable for bioactivity, and the red region indicates that the electronegative groups were beneficial.Five blue blocks were distributed from ring E to the X-position of ring C, which was in line with the hydrophobic side chains of Ile10, Gly11, Gly13, Leu133 and Leu134 in the docking study (Fig. 2a), showing that introducing a high electropositive group in ring E or featuring a carbon atom in the X-position of ring C favorably enhances the bioactivity.This finding can explain why compound 31, with a carbon atom, showed a bioactivity that was 150-fold greater than that of compound 71, with a nitrogen atom at the X-position, which suggests that an X-position substituted by a carbon atom might be a significant factor to improve bioactivity.Regrettably, most of the potent compounds, namely, 17-21, retaining the cyclopentyl group in ring E, introduce a nitrogen atom at the X-position instead of a carbon atom.Therefore, we believe that these potent compounds would gain better bioactivity with a carbon atom at the X-position.Furthermore, a red block is shown on the left side of the N 1' -position, suggesting that the electrogroup was beneficial.The docking result (see Fig. 2a) also revealed that the N 1' substituent situated near Tyr15 and had the potential to form a hydrogen bond with Asp145.It was verified that compounds 32, 33 and 35, with amino groups, were more potent in bioactivity than compounds 55, 56 and 58, with methylene groups at this position.Finally, the all of R 4 was embedded in a large red block, representing Gln85, Asp86 and Lys89 in the docking study (see Fig. 2a), indicating that R 4 likely introduces groups with strong electronegativity.For example, compound 43, with a -NCOCH 3 group, was approximately 20-fold more electronegative than compound 46 with a -CHN(CH 3 ) 2 group.In addition, compound 2, with higher potency, was also such a case when compared to 72.

CoMFA contour map for CDK4
The steric and electrostatic field contours of the CDK4 CoMFA model are clearly displayed in Fig. 6c and d and were roughly similar to those of CDK2.For example, a bulky green and two little yellow contours lying near ring E indicated that a moderate-sized substituent incorporated at this situation exerted a favorable influence on CDK4 bioactivity.However, some differences were also observed between the CDK2 and CDK4 CoMFA models and mainly involved the absence of a yellow block near ring C and the appearance of a big green block between R 2 and R 3 in the CDK4 CoMFA model.Moreover, three blue and two red blocks of the CDK4 model were found to be almost the same as those of the CDK2 model, but the sizes of the former were increased.These differences may be critical for the selectivity of CDK2 and CDK4.The appearance of a big green block between R 2 and R 3 indicated the need to augment the lengths of R 2 and R 3 , a finding congruent with the docking results showing that the R 2 and R 3 substituents were buried in a bulky pocket surrounded by Tyr21, Ala33, Leu34 Lys35, Leu91, Phe93 and Glu94.This supposition can be rationalized by the examples of compounds 36-39, with -CN groups at the R 2 position, exhibiting higher bioactivity with CDK4 than compound 32-35, with hydrogen atoms at R 2 .Moreover, three blue blocks distributed from ring E to the X-position of ring C were in accord with the hydrophobic side chains of Ile12, Gly13, Val14, Gly15 and Val16 in the docking study (Fig. 2c), offering an explanation for the marked increase in the CDK4 bioactivity of 33-35, with cyclopentyl groups, compared to 2, 4 and 5, with hydrogen atoms.Finally, a red polyhedron engulfing the N 1' H group suggested that a hydrophilic group incorporated at this position was desirable for enhancing the activity to the CDK4 inhibitor, which was supported by the formation of a hydrogen bond (Asp158 = O…H N 1' , bond length 2.8 Å) with Asp158 in the docking result (Fig. 2c).Thus, we confirmed that compounds 55, 56 and 58, with amino groups, all exhibited nearly tenfold decreases compared to compounds 32, 33 and 35, with ethyl groups as R 1 .

CoMFA contour map for CDK6
The steric field of the CoMFA model is displayed in Fig. 6e and was very similar to that of the CDK2 CoMFA model, except for size differences.Thus, it is not discussed here.Figure 6f represents the electrostatic field and shows some differences compared with CDK2 and CDK4.First, two red blocks situated near the N 1' H group, while only a single red block is adjacent to this site in CDK2 and CDK4, indicating that the electronegative group featured here was necessary.Many of the potent compounds 1-3, 8-9, 12, 19-21 and 41-44 possessed electronegative groups at the N 1' -position, whereas the most inactive compounds, 57-58 and 67, carried an electropositive group (CH group) at this position.These findings were confirmed by a docking result showing that the N 1' H group formed a strong hydrogen bond with the near residue Asn150 (bond length 2.3 Å).Second, we observed two red blocks around the R 2 , while only a red block surrounded this region in CDK2 and CDK4.These findings indicated that the electrogroup incorporated in this region was required for a compound to be a potent CDK6 inhibitor.
In the docking study, we found Lys43, Glu61 and Asp163 on the left side of the R 2 substituent forming strong electrostatic interactions with this substituent.Thus, the most potent compounds, 17-21, all had -CF 3 groups rather than -CH 3 groups.Finally, a large blue block was observed stretching from the N 12 to the C 16 position, while just a small blue block was incorporated at the X-position of ring C in CDK4.This finding indicated the need to augment the electropositivity at the X-position, specifying that compound 31, with a carbon atom, exhibited more than tenfold better activity than 71, with a nitrogen atom at this location.The docking result (see Fig. 2e) also demonstrated that some residues (Ile19, Val101 and Leu152) with strong electropositivities were in close vicinity to ring C and probably established some significant hydrophobic interactions to influence the CDK6 bioactivity.
From the above analysis, we formed some conclusions for CDK2/4/6 kinases: 1. Electrogroups at the N 1 -position and electropositive and moderate-sized groups at ring E may be conducive for bioactivity.2. Bulky and electrogroups featured at the R 2 substituent can augment the bioactivity.3. Selecting a nitrogen atom rather than a carbon atom at the X-position favorably contributed to bioactivity.4.An Electrogroup at the R 4 substituent was likely to enhance bioactivity.

MD simulations, MM/GBSA calculations and free energy decomposition
According to the docking results, we selected seven compounds, 3, 4, 19, 31, 57, 71 and 72, with CDK2, CDK4 and CDK6 kinases to carry out the 50-ns molecular dynamics simulations.These selected compounds were separated into four pairs (3-57, 4-19, 31-71 and 4-72), to specify the influences of different substituent R 1 , R 2 , X and R 4 groups on CDK2/4/6 bioactivity.To elucidate the dynamic stability and reasonability of these systems, the RMSD values of the protein backbones against the initial structures were computed and are shown in Fig. 5a.As exhibited in Fig. 7, all systems displayed small RMSD fluctuations, and their average RMSD values were in an acceptable range: 2.55-3.44Å for the CDK2 systems (see Fig. 7a), 1.42-2.11Å for the CDK4 systems (see Fig. 7b) and 2.48-3.19Å for the CDK6 systems (see Fig. 7c).Obviously, the CDK4 system showed greater stability than the CDK2 and CDK6 systems.It was also revealed that the selected compounds tended to have stronger interactions with CDK4 kinase in the ATP-binding pocket than the CDK2 and CDK6 kinases, which might account for the fact that compounds 3, 4, 19, 31, 57, 71 and 72 had greater potency against the CDK4 kinase than against the CDK2 and CDK6 kinases.
To further identify the interior stability of each residue in the CDK2/4/6 kinases during the modeling dynamics process, the values of root-mean-squared fluctuation (RMSF) of the six systems were computed based on their backbone residues and are displayed in Fig. 6.In the RMSF graphs exhibited in Fig. 8a, we can easily recognize that the backbone residues Ile10, Gly11, Gly13, Leu83, Leu134 and Asp145 in CDK2-4 and -72 fluctuated to less than 0.5 Å, which indicated good stability and aligned with the crucial residues identified in our docking result (see Fig. 3a).The other residues showed similar trends, except for Gln85, Asp86 and Lys89, which had much less fluctuation with CDK2-4 than with CDK2-72; for instance, the vibration of Lys89 was up to 0.92 Å for CDK2-72, while it was only 0.46 Å for CDK2-4.As shown to Fig. 11a, we detected that the positively charged side chain of Lys89 formed strong electrostatic interactions with the protonated nitrogen atom of the R 4 substituent of compound 4 rather than the aprotic carbon atom of the R 4 substituent of compound 72.In Fig. 8b, we found that the backbone residues Ile12, Gly13, Val14 and Val96, Leu146 and Leu147 in CDK4-4 and CDK4-72 had satisfying stability due to their strong van der Waals interactions with compounds 4 and 72.Nevertheless, some differences were observed in both systems, such as for Asp99, Arg101, Thr102 and Asp158, the interactions of which were distinct between the two compounds.For instance, Asp158 served as a donor, interacting with the N 1 H group of the R 4 substituent in compound 4 instead of compound 72 (see Fig. 11b).For the CDK6-4 and CDK6-72 systems (Fig. 8c), the vibrations of some important residues, such as Ile19, Gly20, Val101, Leu152 and Asp163, were small, suggesting that the interactions of these residues were crucial and that both systems showed similar stability.Nonetheless, residues Gln103, Asp104, Thr106 and Thr107, situated at the entrance of the binding site, showed larger fluctuations with CDK6-72 than CDK6-4, specifically at Thr106, the side chain of which formed strong electrostatic interactions with the N 4'' atom in compound 4 but not in 72.Considering these findings, we speculated that the interactions formed by CDK2-4, CDK4-4 and CDK6-4 became stronger than those formed by CDK2-72, CDK4-72 and CDK6-72, which agrees with the experimental bioactivities.
In addition, we further investigated receptor-ligand interaction by monitoring the whole trajectory of the MD simulations for the abovementioned twenty-one systems (Table 4).As shown in Table 4, for three inhibitor pairs (3-57, 4-19 and 31-71), inhibitors 3, 19, and 31, with higher bioactivity, formed more hydrogen bonds with CDK2/4/6 kinases than 57, 4 and 71, with lower bioactivity.Furthermore, some hydrogen bonds that formed in the docking model were different than in the MD simulations; for instance, Asn150 C = O…N 1' H, in the docking model (see Fig. 2e) was replaced by the corresponding hydrogen bond (Asp163 C = O…N 1' H) in the CDK6-19 system due to the small movement in the conformation of compound 19.
The binding free energies of the twenty-one systems were determined by the MM-GBSA method, the corresponding statistics of which are displayed in Table 5. Obviously, their binding affinities chiefly aligned with the experimental bioactivities shown in Table 1.To explore the influences of the R 1 substituents, R 2 substituents X-positions and R 4 substituents on the binding affinities in depth four compound pairs (3-57, 4-19, 4-72 and 31-71) and the CDK2/4/6 kinases were methodically compared.
Comparing compounds 3 and 57, only their R 1 substituents were different.Compound 3 featured a -N 1' Hcyclopentyl group at the R 1 substituent, whereas in compound 57, it was replaced by a -CH 3 group, indicating that the former showed better potency than the latter with CDK2/4/6 kinases.This outcome was proven by the results of the binding free energy, as the ΔG bind values of CDK2-3 (− 41.76 kcal/mol), CDK4-3 (− 52.57kcal/mol) and CDK6-3 (− 45.52 kcal/mol) were dramatically higher than those of CDK2-57 (− 36.56 kcal/mol), CDK4-57 (− 47.44 kcal/ mol), and CDK6-57 (− 36.62 kcal/mol), which indicated that compound 3 interactions with CDK2/4/6 kinases were more stable than those of compound 57.In Fig. 8a-c, compound 3 formed three hydrogen bonds with Asp145 of CDK2, Asp158 of CDK4 and Asp163 of CDK6.However, these hydrogen bonds in CDK2-57, CDK4-57 and CDK6-57  were nonexistent, which coincided with the CoMFA results showing the introduction of an electronegative atom at the N 1' -position was favorable for inhibitor bioactivity.This finding was also supported by the contributions of polarity (ΔG ele + ΔG ele,sol ), as shown in Table 5, where the polar contributions (ΔG ele + ΔG ele,sol ) of CDK2-3 (14.72 kcal/ mol), CDK4-3 (12.06 kcal/mol) and CDK6-3 (15.93 kcal/ mol) were distinctly stronger than those of CDK2-57 (17.46 kcal/mol), CDK4-3 (13.94 kcal/mol) and CDK6-3 (20.55 kcal/mol).As shown in Fig. 8d and g, the CDK2 systems demonstrated that the key residues Gly11, Glu12, Gly13, Lys33, Lys89 and Asp145 clearly contributed more to 3 than to 57, particularly Asp145, whose difference of energy in compounds 3 and 57 reached − 1.53 kcal/mol.Moreover, the positively charged side chain of Lys89 formed a stronger electrostatic interaction with compound 3 than with 57 because of the short distance from and spatial orientation with the N 4'' atom.For the CDK4 system (Fig. 8b, e and h), the crucial residues accounting for the energy differences between compounds 3 and 57 were Gly13, Val14, Gly15, Arg101 and Asp158, and their energy differences were essentially derived from their nonpolar interactions (ΔG vdw + ΔG nonpol,sol ).Here, Val14, contacting R 1 directly , formed distinctly stronger hydrophobic interactions with compound 3 than with compound 57, further proving that the nonpolar interactions were the major contributors.The nonpolar interaction of 3 (− 64.63 kcal/mol) was notably higher than that of 57 (− 61.38 kcal/mol) in Table 5.Thus, we regarded the nonpolar interaction as an important factor affecting the different bioactivities of compound 3 and 57 against CDK4.With regard to the two CDK6 systems (Fig. 8c, f and I), we also discovered that the favorable contributions of Gly20, Glu21, Gly22, Asp104, Thr106, Thr107 and Asp163 for compound 3 were much larger than those for compound 57, but Lys43, positioned in closer vicinity to the N 4 atom of compound 57, formed a stronger polar interaction with 57 than with 3. Nonetheless, the increased polar interactions of compound 57 with Lys43 were not sufficient to make up for the weakening interactions with other residues.This finding was in accordance with their binding free energy, as shown in Table 5, where the ΔG bind of CDK6-3 was − 8.90 kcal/mol over CDK6-57.As seen in Fig. 8c, the nonpolar residues Gly20 and Gly22 were located close to the R 1 of 3, where the cyclopentyl ring easily contacted Gly20 and Gly22 through strong nonpolar interactions.In addition, the polar residues Asp104, Thr106 and Thr107 were situated around the protonated N 4'' atom and established stronger polar interactions with 3 than 57 due to their closer distance and favorable orientation to ring D. In summary, the polar and nonpolar interactions jointly influenced the bioactivity of compounds 3 and 57 against CDK2/4/6 kinases.We also concluded that possessing an electron at the N 1' position and replacing ring E with a moderately sized hydrophobic group was important for designing the R 1 substituent.The chemical structures of compounds 4 and 19 were similar except for the R 2 substituent, where -CF 3 in 19 was substituted with -CH 3 in 4. Obviously, compound 19 likely has stronger polar-interactions than 4 with CDK2/4/6 kinases.As displayed in Table 5, the polar energy was found to make the prominent contribution to the binding with three receptors, where the polar tendency (ΔG ele + ΔG ele,sol ) of CDK2-19 was − 4.20 kcal/mol, which was greater than that of CDK2-4, and that of CDK4-19 was − 4.35 kcal/ mol greater than that of CDK-4 and that of CDK4-19 was − 5.04 kcal/mol greater than that of CDK-4.For the CDK2-19 and CDK2-4 systems, Fig. 9d and g shows that five important residues, Tyr19, Lys33, Glu51, Lys89 and Asp145, showed large energy differences, and these residues are polar residues, especially Lys33, which was within the proximity of -CF 3 , whose energy difference between 19 and 4 reached − 1.52 kcal/mol.As shown in Fig. 9a and g, in contrast with compound 4, compound 19 formed a hydrogen bond with Asp145, resulting in Asp145 linking compound 19 with distinctly higher polar interaction than compound 4.For two CDK4 systems (Fig. 9b, e and h), the polar residues Tyr21, Lys35, Asp99, Arg101 and Asp158 made greater contributions for binding 19 than for 4, but Thr102 was located in closer proximity to the C = O of 4 and formed a stronger polar interaction with 4 than with 19.However, 19 still had stronger polar interactions than 4 with CDK4 kinase.As illustrated in Fig. 9b, we observed that compound 19 formed four hydrogen bonds with Lys35, Leu96 and Asp158, while compound 4 formed only three hydrogen bonds with Leu96 and Asp158.Furthermore, the nonpolar interaction of 19 (− 65.35 kcal/mol) had a similar contribution as 4 (− 65.11 kcal/mol), as shown in Table 5, which further indicated that the polar interaction primarily accounted for the difference in the bioactivities of compounds 19 and 4 against CDK4 kinase.With respect to the CDK6-19 and CDK6-4 systems, as shown in Fig. 9f and I, the significant residues Lys43, Glu61, Thr106 and Asp163 made greater beneficial contributions toward the different bioactivities between 19 and 4. Lys43 and Asp163, displayed the largest energy difference of 19 over that of 4, formed additional hydrogen bonds with 19.In addition, Lys43 was positioned close to R 2 and contacted the -CF 3 group of 19 through stable electrostatic interactions, but not the -CH 3 group of 4.Moreover, the nonpolar interaction was roughly equal in the two systems (− 60.45 kcal/mol for 19 and − 59.06 kcal/mol for 4), as shown in Table 5, implying that the polar interaction was responsible for the energy difference of CDK6-19 and CDK6-4.These data indicated that retaining the -CF 3 group as an R 2 substituent was an effective approach for increasing the inhibitor bioactivity against CDK2/4/6 kinases.
The R 3 substituent in compounds 4 and 72 was different; it was an -NCOCH 3 group in 4 and a -CH 2 group in 72.The much greater bioactivity of 4 compared to that of 72 reveals that the -NCOCH 3 group, with strong hydrophilic properties, showed favorable binding to CDK2/4/6 kinases.Accordingly, we also speculated that the electrostatic interactions between the -NCOCH 3 group and surrounding residues might be stronger.This supposition was proven by the results from the binding energy calculations (Table 5), showing that the polar interactions of CDK2-4 (17.35 kcal/mol), CDK4-4 (16.92 kcal/mol) and CDK6-4 (16.60 kcal/mol) were markedly higher than those of CDK2-72 (20.54 kcal/ mol), CDK4-72 (21.56 kcal/mol) and CDK6-72 (18.92 kcal/ mol).As displayed in Fig. 10a, d and g, for two CDK2 systems, the residues critical for the energy difference were Gln85, Asp86 and Lys89, which all surrounded the R 3 substituent and are polar residues, particularly Lys89, whose -NH group, with very close proximity to the protonated N 4'' atom of the -NCOCH 3 group, established stable electrostatic interactions with compound 4. Similar to CDK2 systems, the residues around R 3 (Gln98, Asp99, Arg101 and Thr102) also primarily contributed to the different energies of the CDK4 systems (see Fig. 10b, e and h).This was especially true for Thr102, regarded as the residue with the largest energy difference between 4 and 72; its -NH group was able to form a strong electrostatic interaction with the protonated N 4'' atom of the -NCOCH 3 group because of the small distance and dimensional orientation to the -NCOCH 3 group.In addition, Asp158 contributed obvious differences in energy up to − 0.77 kcal/mol due to its greater ability to form hydrogen bonds with the N 1' atom of compound 4 than with 72.In contrast, it is shown in Fig. 10c that the C = O moiety of Asp163 formed an additional hydrogen bond with the N 1' atom of compound 72 instead of 4 in CDK6 systems.This bond caused the polar interaction of Asp163 with compound 72 to be much higher than it was with 4 (see Fig. 10i).Moreover, we also observed that Lys43 was adjacent to compound 72 and established stronger polar interactions with 72 than with 4. Nevertheless, the increasing number of polar interactions of compound 72 with Lys43 and Asp163 were not sufficient to compensate for the impaired interactions with the residues surrounding the R 3 substituent (Gln103, Asp104, Thr106 and Thr107), which were all polar residues and made distinctly more favorable contributions to 4 than to 72.For instance, Thr107, whose C=O moiety was orientated toward the protonated N 4'' atom of compound 4, was clearly more tightly connected with 4 than with 72 because of the strong polar interaction.Through the above results, it was confirmed that the difference in polar interactions led to the higher bioactivity of 4 over 72 against the CDK2/4/6 kinases.Thus, we should preserve the protonated nitrogen atom at the N 4'' position and pay more attention to the interactions with Lys89 of CDK2, Arg101 of CDK4 and Thr107 of CDK6 when designing R 3 substituents.
The difference between compounds 31 and 71 was at the X-position of ring C, where a carbon atom in 31 was displaced by a nitrogen atom in 71.Accordingly, the ring C moieties of 31 and 71 were benzene and pyridine rings, respectively.For the benzene ring, its hydrophobic properties were comparable with those of a pyridine ring, near which the hydrophobic residues tended to form powerful nonpolar interactions.Then, as shown in Fig. 11a-c, we found that Ile10, Gly11 and Leu134 of CDK2, Ile12 and Leu147 of CDK4, and Ile19, Gly20 and Leu152 of CDK6 were located close to ring C.These residues all showed high hydrophobic properties and formed strong nonpolar interactions with benzene rings instead of pyridine rings.Therefore, Fig. 11d-i clearly demonstrates that the nonpolar residues Gly11 and Leu134 of CDK2, Ile12 and Leu147 of CDK4, and Ile19, Gly20 and Leu152 of CDK6 all contributed more to 31 than to 71.As illustrated in Table 5, the nonpolar interactions were also recognized as the major contributors of energy differences between 31 and 71.Polar interactions also partly influenced energy differences.For two CDK2 systems (Fig. 11a, d  and g), Tyr15 and Lys33 contributed more polar interactions to 31 than to 71, while Asp145 was connected more tightly with 71 than with 31 through polar interactions.Except at Val83, 31 can form two hydrogen bonds with Tyr15 and Lys33, 71 forming one hydrogen bond with Asp145.As displayed in Table 5, the polar interactions of 31 were slightly higher than those of 71.With regard to the CDK4-31 and CDK-71 systems (Fig. 11b, e and i), the polar residues Asp97 and Gln98 lay adjacent to the N 14 atom of the pyridine ring and formed substantially stronger polar interactions with 71 than with 31, which might partly explain why the bioactivity of 31 against CDK4 was nearly tenfold higher than that of 71 against CDK2 and CDK6.For CDK6 systems (Fig. 11c, f, and i), an additional strong hydrogen bond between 31 and Lys43 caused a greater number of polar interaction for 31 than for 71, which further enhanced the stability of CDK4-31 in comparison with that of CDK4-71.Based on the above analysis, we concluded that both nonpolar and polar interactions, especially nonpolar interactions, played important roles in causing the different bioactivities of compounds 31 and 71 against CDK2/4/6 kinases.Therefore, similar to compound 31, the whole series of compounds should possess a carbon atom at the X-position or the pyridine ring should be replaced with a benzene ring, changes that were very promising for increasing bioactivity (Fig. 12).

Discussion
In this present study, we analyzed the interactions and structure-activity relationships of 72 novel CDK2/4/6 inhibitors with the comprehensive approaches of molecular docking, 3D-QSAR and MD simulations.Molecular docking results demonstrated that the studied inhibitors contacted three receptors with similar modes, with some small differences.For instance, the cyclopentyl groups of inhibitors formed strong hydrophobic interactions with Val14 of CDK4 but not Glu12/Glu21 of CDK2/6.The reliability of docking modes for CDK2/4/6 and the significant substituent features (R 1 , R 2 , R 4 and X) were further verified by MD simulations.The constructed 3D-QSAR models of the three receptors were reasonable and confirmed by the excellent statistical data (q 2 = 0.714 and R 2 pred = 0.764 for CDK2; q 2 = 0.815 and R 2 pred = 0.681 for CDK4; and q 2 = 0. 757 and R 2 pred = 0.674 for CDK6).These models had dependable internal verifiability and external predictability.MD simulation and decomposition energy analysis further recognized the interesting nonpolar and polar interactions of CDK2/4/6 receptors by comparing the interactions between four selected pairs of inhibitors (3-57, 4-19, 4-72, 31-71) and the three receptors.According to the integrated results of the docking, 3D-QSAR and MD simulations, we ultimately concluded some groups featured in the corresponding locations probably enhanced the bioactivity against CDK2/4/6 kinases.

Conclusion
The constructed 3D-QSAR models of the three receptors were reasonable and confirmed by the excellent statistical data.According to the integrated results of the docking, 3D-QSAR and MD simulations, we ultimately concluded that the following groups featured in the corresponding locations probably enhanced the bioactivity against CDK2/4/6 kinases: (1) electronegative groups at the N1-position and electropositive and moderate-sized groups at ring E; (2) electrogroups featured at R2; (3) carbon atoms at the X-position or ring C replaced by a benzene ring; and (4) an electrogroup as R4.Finally, we hope the results obtained from this work will provide some useful references for the development of novel CDK2/4/6 inhibitors.

Fig. 1
Fig. 1 General structural formulas and numbering of benzamide-based derivatives (a) and the template molecule (b), compound 19

Fig. 5
Fig. 5 Scatter plots of predicted activities vs. actual activities using the training set (black square) and test set (red triangle) for the CoMFA models of CDK2 (a), CDK4 (b) and CDK6 (c)

Fig. 6
Fig. 6 CoMFA contour maps of the highly potent compound 19.Steric contours for CDK2 (a), CDK4 (c) and CDK6 (e).Green contours indicate regions where bulky groups increase activity, while yellow contours indicate regions where bulky groups decrease activ-

Table 2
Statistical analysis results of the CoMFA models a a N is the optimal number of components, q 2 is the square of LOO cross-validation coefficient, R 2 is the square of non-cross-validation coefficient, SEE is the standard error of estimation, F is the F-test value, R bs 2 is the mean R 2 of bootstrapping analysis (100 runs), SD bs is the mean standard deviation by bootstrapping analysis

Table 4
Hydrogen bond analysis from the results of MD simulations