Background

Due to the vast improvement of computing power and the rapid development of computational chemistry and biology, computer-aided drug design (CADD) technology has been successfully applied in drug discovery, accelerating the research process and reducing the associated costs and risks [1]. Virtual screening is a common CADD technology used for the identification of hit molecules from large-scale compound libraries, providing a highly efficient approach to discover new compounds in the early stage of drug development [2]. Virtual screening methods, including molecular docking [3], pharmacophore model [4], and similarity searching [5], are widely used in drug research and development. With an increasingly large number of target protein structures being resolved, the molecular docking-based virtual screening (DBVS), has become one of the most common virtual screening methods, having achieved a great success [6].

However, due to its low accuracy, DBVS suffers from constant criticism. This method involves two basic processes: positioning ligands into a protein active site, scoring and ranking docked ligands. Both steps are imperfect. In the last decade, several studies on docking and scoring methods demonstrated that most of the docking methods and scoring functions are sensitive to the receptor, and conclusions on different receptors may be inconsistent with each other [7,8,9,10]. Different docking programs may show different performances in the same target protein [8]. There is often a poor correlation between the scores given by scoring functions and experimental binding affinities. New forms of examination are necessary to maximize the success rate of docking and scoring methods, to determine the best parameters for individual protein targets.

Transforming growth factor-β (TGFβ) signaling plays a key role in cell growth, migration, differentiation, epithelial–mesenchymal transition (EMT), and extracellular matrix remodeling [11]. TGFβ is reported to be critical in the later stages of tumorigenesis by increasing immunosuppressive Treg cells and facilitating EMT [12]. TGFβ type 1 receptor (TβR1) is a key member of the TGFβ signaling pathways, and its inhibition has the potential to enhance the immune response against tumors. Thus, TβR1 has attracted wide attention, as a potential drug target [13].

Different virtual screening models can be complementary in the search for new inhibitors. In our previous studies, several inhibitors were screened and designed by the pharmacophore model and structure–activity relationship study [14, 15]. In this study, we aimed to obtain new TβR1 inhibitors using DBVS. For that, we assessed different docking methods and scoring functions for TβR1, before large-scale database screening. We conclude that the CDOCKER protocol with a successful docking rate of 88% and the PMF and Ludi 1 scoring functions with high reliability may be efficiently applied in the virtual screening of new inhibitors of TβR1.

Materials and methods

Database preparation

The Protein Data Bank (PDB; http://www.rcsb.org) funded by the National Science Foundation, the National Institutes of Health, and the US Department of Energy is the single worldwide archive of 3D structural data of biological macromolecules, such as proteins, nucleic acids, and complex assemblies [16]. Up to now, 22 crystal structures of TβR1 with various small molecule inhibitors have been reported in PDB, including 1RW8, 1PY5 [17], 1VJY [18], 2WOU, 2WOT [19], 2X7O [20], 3GXL, 3HMM [21], 3FAA [22], 3KCF [23], 3TZM [24], 4X2F, 4X2G, 4X2J [25], 5E8W, 5E8Z [26], 5FRI [27], 5QIK, 5QIL, 5QIM [28], 5USQ [29], and 6B8Y (Additional file 1: Table S1) [12]. Water molecules and co-crystallized ligands were deleted and then the receptor protein was subsequently prepared by the Prepare Protein protocol using the Discovery Studio 2017R2 software package (DS; https://www.3ds.com/). The following tasks were performed: insertion of missing atoms in incomplete residues, modeling of missing loop regions, deleting of alternate conformations (disorder), standardizing of atom names, and protonating of titratable residues.

22 X-ray structures of TβR1 deposited in the Protein Data Bank (PDB) were analyzed using some parameters to select the most suitable TβR1 structure for next studies, such as information on the resolution and organisms of crystallographic structures, as well as the similarity values and physicochemical properties of all ligands (Additional file 1: Table S2). The overlay of the crystal structure (Additional file 1: Fig. S1) and comparing the amino acid sequence showed that the 22 crystal structure has a high consistency and a complete binding pocket. Due to the Induced Fit Theory, the TβR1 will changes its conformation on binding a ligand. Therefore, there are slight differences in the conformation of 22 co-crystal structures (Fig. 1).

Fig. 1
figure 1

Overlay of crystal structures between 4X2F (gold ribbon) and 6B8Y (blue ribbon). The ligand in 6B8Y was shown with the purple stick model

The ligand in 6B8Y (resolution 1.65 Å) with the IC50 value of 0.55 nM was the most active inhibitor of 22 molecules in co-crystal structures. Therefore, 6B8Y may be more representative of the active conformation of the target protein. 6B8Y was selected for the evaluation of scoring functions from up to 22 co-crystal structures.

The root-mean-square deviation (RMSD) value (in Å) between the best docking poses of ligands and the conformations in co-crystal structures was selected as the evaluation criterion to check the accuracy of docking program in the reproduction of the binding poses of ligands in crystal structures of TβR1-ligand complexes. Inhibitors in the above-mentioned crystal structures are shown in Fig. 1. The conformations of the inhibitors confirmed by experimental methods. Thus, 22 active conformations of inhibitors in the crystal structure were selected as the reference ligands. These 22 active conformations were re-docked back to the active site of TβR1 by molecular docking. In previous studies, RMSD values within 2.0 Å were acceptable for the molecular docking-based virtual screening [30]. Thus, in this investigation, the docking results with RMSD ≤ 2.0 Å were considered successful, and the docking results with RMSD > 2.0 Å were considered to be failed.

Receiver operating characteristic (ROC) curves were used to distinguish between “active” and “inactive” samples based on the scores. For the evaluation and validation of the sensitivity (SE) and specificity (SP) of scoring functions, a test set covering active and inactive ligands was established by downloading 8958 ligands from the DUD-E database (http://dude.docking.org/). Namely, this test set includes 281 known active ligands and 8677 ligands assumed to be inactive (referred to as decoys). The decoy compounds have similar physicochemical properties but dissimilar 2D topology (Additional file 1: Table S3) [31]. The prepared active or decoy ligands were docked into the binding pocket of TβR1, respectively and the top-scoring pose of each ligand along with its docking score was used for further analysis.

Molecular docking and scoring functions

Two molecular docking methods, LibDock and CDOCKER in DS software, were performed in this study. LibDock is a docking program developed by Diller and Merz, which uses protein site features referred to as HotSpots [32], while CDOCKER is a grid-based molecular docking method based on CHARMm [33]. Before molecular docking, the ligands in the test set were prepared using the Prepare Ligands protocol to remove duplicates, generate 3D conformations, and remove compounds with undesirable properties. For LibDock, the parameter “Conformation Method” was set to “BEST”, “Max Hits to Save” was set to “1” and “Minimization Algorithm” was set to “Smart Minimizer”. All other parameters were set to default. For CDOCKER, “Top Hits” was set to “1”, “Pose Cluster Radius” was set to ‘0.5’, and others were set to default. Only one top docking pose for each molecule was reported and saved for further analysis. The RMSD values (in Å) between the best docking poses of ligands and the conformations in co-crystal structures were calculated.

The top docking pose for each molecule was re-scored using the Score Ligand Poses protocol in the DS software. The scoring functions, including CDOCKER Scores (CDOCK), LigScore1, LigScore2, PLP1, PLP2, Jain, PMF, PMF04, Ludi Energy Estimate 1 (Ludi 1), Ludi Energy Estimate 2 (Ludi 2), Ludi Energy Estimate 3 (Ludi 3), were used to calculate ligand binding in an active receptor site. The calculated scores were reported as positive values to ensure that higher scores indicated more favorable binding.

ROC curve analysis

All ligands in the test set (281 active and 8677 inactive) were docked into the binding pocket of TβR1 (PDB ID: 6B8Y). The top docking pose for each ligand was re-scored used the above-mentioned 11 scoring functions. Using a diagnostic trial approach, the SE and SP of each scoring function for distinguishing between “active” and “inactive” samples, were calculated. In the ROC curve, the SE is plotted against 100 − SP for every cut-off point of the variable (score calculated by scoring function). The SE is defined as the ratio of true positive (TP) to the sum of TP and false negative (FN): [TP/(TP + FN)]; and the SP is defined as the ratio of true negative (TN) to the sum of TN and false positive (FP): [TN/(TN + FP)] [14]. The reliability of the scoring function was estimated using the area under the curve (AUC), together with the Chi-square test. A statistically significant difference was defined when P < 0.05. Typically, a ROC curve has an AUC baseline of 0.5, which demonstrates an evenly distributed system. An AUC value closer to 1 indicates good selectivity, whereas a value close to 0.5 indicates random selection.

Enrichment factor analysis

Taking enrichment factor (EF) as the indicator, the performance of the scoring function as a virtual screening tool applied to TβR1 was evaluated. EF is defined as:

$$ {\text{EF}} = \frac{{\mathop {\text{Hits}}\nolimits_{\text{sampled}}^{X\% } }}{{\mathop N\nolimits_{\text{sampled}}^{X\% } }} \cdot \frac{{\mathop N\nolimits_{\text{total}} }}{{\mathop {\text{Hits}}\nolimits_{\text{total}} }} $$

where \( \mathop {\text{Hits}}\nolimits_{\text{sampled}}^{X\% } \) is the number of hit compounds at X% of the database, \( \mathop {\text{N}}\nolimits_{\text{sampled}}^{X\% } \) is the number of compounds screened at X% of the database, \( \mathop N\nolimits_{\text{total}} \) is the number of compounds in the database, and \( \mathop {\text{Hits}}\nolimits_{\text{total}} \) is the number of active compounds in the database. Among the top-rank screened database compounds, the enrichment ability of active compounds is the most noteworthy. Therefore, we mainly focus on the enrichment factor at 0.5%, 1%, and 2% of the ranked database, which are defined as EF0.5%, EF1%, and EF2%.

Results and discussion

Assessment of docking methods

Properly docking the ligands to the active site is the most critical step in the virtual screening. The LibDock and CDOCKER docking programs were performed in this study. The results require evaluation to determine the best docking method for the TβR1 protein target and to maximize the probability of success. The performance of the docking programs for predicting the binding mode between ligand and TβR1 was evaluated with a success rate. 22 known active conformations of TβR1 inhibitors were re-docked into the corresponding binding pockets. As virtual screening projects always involve thousands to tens of thousands of ligands, only the top-scoring pose of each ligand was considered as the possible active conformation. The RMSD values between the docked poses with the highest score and those in co-crystal structures are listed in Table 1.

Table 1 RMSD values (in Å) between the best docking poses of ligands and the conformations in co-crystal structures for all retrieved actives ligands

Results demonstrate that the accuracy of the Lidock program (59%) is much lower than that of the CDOCKER program (95%). Thus, the CDOCKER program was chosen as the docking programs in the subsequent study, based on its reliability.

Assessment of scoring functions

Scoring and ranking the docked ligands play an important role in virtual screening. The best scoring function for TβR1 should be selected to maximize the chances of success. In this study, 281 known active and 8677 inactive ligands were used to determine the best scoring functions before a larger-scale screening of unknown compounds. In this study, the ROC curves are used to compare the performance of scoring functions in attributing better scores to active than inactive ligands. The docking results showed that, among 8959 ligands, only one active molecule and 22 inactive molecules could not be docked into the active cavity. The best docking poses for each docked ligand were recalculated with 11 scoring functions, respectively. Based on the information of known active and inactive ligands, the ROCs (Fig. 2) were drawn using statistical software, and relevant statistical parameters are listed in Table 2.

Fig. 2
figure 2

ROC curves for the scoring functions

Table 2 ROC curve results for the scoring functions

As a higher score indicates a more favorable binding, the result of the CDOCK is unreliable. The ROC curves of Ludi 1, PMF, Ludi 2, Ludi 3, PMF04, PLP1, PLP2, LigScore2, Jain and LigScore1 all tend to the upper left corner, with AUCs of 0.864, 0.856, 0.842, 0.812, 0.776, 0.774, 0.769,0.762, 0.697 and 0.660, respectively (P < 0.0001). All the AUCs of scoring function are significantly larger than those of random distribution (Reference line in Fig. 2, AUC = 0.5), which indicates that the prediction capacity of these 10 scoring functions is better than the random distribution model. The AUC of Ludi 1 is the largest, and the curve is closest to the upper left corner, indicating that Ludi 1 can efficiently distinguish active and inactive molecules. Pairwise comparison of ROC curves shows a significant difference in AUCs between Ludi 1 and other scoring functions (P < 0.0001), except for PMF (P = 0.6170). Given the high accuracy of Ludi 1 and PMF for TβR1, we strongly recommend Ludi 1 and PMF as scoring functions for virtual screening of new inhibitors of TβR1.

Large-scale bioactivity testing is an extremely expensive process. Therefore, it is essential to minimize the number of virtual screening false positives, before a large database search of active molecules and their experimental validation. Therefore, achieving a high positive predictive value (PPV = TP/(TP + FP)) is mandatory. In general, large databases exhibit a low frequency of active ligands. This suggests that increasing SP is more helpful to improve the PPV than altering SE. For example, in this study, active ligands accounted for 3% (280/8958) of the test set. Based on the results form Ludi1, with the increase in SP from 80 to 90%, 95% and 99%, the PPV value increases from 12% (222/1904) to 17% (171/1028), 22% (118/547) and 28% (34/120), respectively. If PMF instead of Ludi 1 as a scoring function, the PPV value increases from 12% (219/1862) to 18% (166/948), 24% (136/569) and 43% (66/151), respectively. As illustrated in Fig. 2, the PMF ROC curve is above the Ludi1 ROC curve in the higher SP region. Furthermore, for the same SP, the PMF ROC curve has higher SE and PPV. Therefore, the PMF scoring function is suitable for this particular research objective.

To obtaining more novel molecular structures for drug design, higher SE must be considered in the virtual screening. When SE = 0.9 (255 active compounds were obtained), a total of 3772 compounds were obtained by PMF, while only 3597 compounds were screened by Ludi1, with fewer false positive ligands. As shown in Fig. 2, the Ludi 1 ROC curve is above the PMF ROC curve in the higher SE region in the upper right corner. For the same SE, the Ludi 1 ROC curve has a higher SP. Therefore, our results indicate that Ludi 1 scoring function has more advantages to this particular objective.

It can be seen from Table 3 that for TβR1, PMF all performs best at EF0.5%, EF1%, and EF2% among all scoring functions, and Ludi 1 ranks the second. Besides, Jain is the worst among the above 10 scoring functions. In line with ROC curve analysis results, EF evaluation results also support PMF and Ludi 1 as the top two scoring functions.

Table 3 EFs of the scoring functions and pharmacophore model A02

Different virtual screening models can be complementary in the search for new inhibitors. In our previous studies, based on the crystal structure of TβR1-BMS22 complex, a pharmacophore model was constructed [14]. In this study, EF0.5%, EF1%, and EF2% of the pharmacophore model A02 was also evaluated. It can be seen from Table 3 that EFs comparison indicates that the pharmacophore model A02 has very excellent performance at EF0.5%, only lower than PMF. Meanwhile, it ranks third at EF1%, slightly lower than PMF and Ludi 1. However, the low ranking of the pharmacophore model A02 at EF2% means that the screening results may not be ideal. EF evaluation results show that the pharmacophore model can also obtain excellent virtual screening results in the top-scoring area. Therefore, we recommend that molecular docking and pharmacophore cross-validation can be performed to obtain more accurate virtual screening results.

In conclusion, our results show that PMF and Ludi1 scoring functions have high reliability in the virtual screening of new inhibitors of TβR1. According to the goal of virtual screening, the scoring function can be flexibly selected. To reduce the cost of experimental activity verification, the PMF scoring function should be chosen first. The Ludi 1 scoring function can be chosen first to obtain more novel inhibitors of TβR1.

Conclusion

Drug target interaction plays an important role in drug discovery based on target [34, 35], and molecular docking can well simulate the interaction between drug and target, which is widely used [36,37,38]. Molecular docking is routinely used in virtual screening and generally involves two separate steps. The first step is to position ligands into a protein active site correctly, and the second step is scoring and ranking these docked ligands reasonably. An increasingly large number of studies reported that the docking method and the scoring functions are sensitive to receptors.

In this study, we present a thorough investigation of the performance of the LibDock and CDOCKER protocol in DS software on a test set of 22 TβR1 protein–ligand complexes. The results demonstrate that the CDOCKER protocol based on CHARMm is the optimal docking protocol for TβR1. With a successful docking rate of 95%, which means high reliability, the CDOCKER protocol can be efficiently applied for accurate docking in medium-size virtual screening experiments of TβR1.

A dataset including 281 known active and 8677 inactive ligands was used to determine the best scoring function in DS software for scoring ligands posed by molecule docking. The ROC curves are used to compare the performance of scoring functions in attributing better scores to active than inactive ligands. The ROC curves of Ludi 1, PMF, Ludi 2, Ludi 3, PMF04, PLP1, PLP2, LigScore2, Jain and LigScore1 are better than that of the random distribution model, with AUC of 0.864, 0.856, 0.842, 0.812, 0.776, 0.774, 0.769, 0.762, 0.697 and 0.660, respectively. Based on the pairwise comparison of ROC curves, we strongly recommend that Ludi 1 and PMF should be chosen as scoring functions for virtual screening of new TβR1 inhibitors. The PMF ROC curve is above the Ludi 1 ROC curve in the higher SP region in the lower left quarter, thus for the same SP, the PMF ROC curve has higher SE and PPV values. It is advantageous to reduce the number of false positive molecules and the cost of experimental activity verification. Additionally, the Ludi 1 ROC curve is above the PMF ROC curve in the higher SE region in the upper-right corner; therefore, for the same SE, the Ludi 1 ROC curve has higher SP, which is optimal for obtaining more novel molecules.

Consistent with ROC curve, EF evaluation also supports PMF and Ludi-1 as the top two scoring functions. EF evaluation results also demonstrates that the pharmacophore model can get good virtual screening results in the highest score areas. Therefore, molecular docking and pharmacophore cross validation can be used to obtain more accurate virtual screening results.