FastGrow: on-the-fly growing and its application to DYRK1A

Fragment-based drug design is an established routine approach in both the experimental and computational spheres. Growing fragment hits into viable ligands has increasingly shifted into the spotlight. FastGrow is an application based on a shape search algorithm that addresses this challenge at high speed, requiring only a few milliseconds per fragment. It further features a pharmacophoric interaction description, ensemble flexibility, and geometry optimization, making it a fully fledged structure-based modeling tool. All features were evaluated in detail on a previously reported collection of fragment growing scenarios extracted from crystallographic data. FastGrow was also shown to perform competitively versus established docking software. A case study on the DYRK1A kinase, using recently reported new chemotypes, illustrates FastGrow's features in practice and its ability to identify active fragments. FastGrow is freely available to the public as a web server at https://fastgrow.plus/ and is part of the SeeSAR 3D software package.

Supplementary Information: The online version contains supplementary material available at 10.1007/s10822-022-00469-y.

Figure S1: Pose re-prediction performance on the cross-growing set for the shape-based "Growing" and restrained JAMDA "Optimization". The error bars are 95% confidence intervals

Statistical Methods
Pose re-prediction performance was expressed as the percentage of poses a method generates that have an RMSD below 2 Å. This threshold turns the evaluation into a binary classification, whose results follow a binomial distribution. Within certain bounds a binomial distribution approaches a normal distribution, so parametric methods could be applied [1]. In a normal distribution, a 95% confidence interval covers 1.96 standard deviations. Each pose re-prediction experiment was checked for whether the normal approximation was valid; the confidence intervals were then calculated from the individual standard deviations.
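As an illustration, the confidence-interval calculation described above can be sketched in a few lines. The function name and the example counts are ours, chosen for illustration only:

```python
import math

def wald_ci(successes: int, n: int, z: float = 1.96):
    """95% Wald confidence interval for a binomial success rate.

    Valid when the normal approximation to the binomial holds,
    i.e. when n*p and n*(1-p) are both sufficiently large
    (a common rule of thumb is >= 5).
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width

# Hypothetical example: 180 of 245 poses re-predicted below 2 A RMSD
lo, hi = wald_ci(180, 245)
```

The interval is symmetric around the observed success rate, matching the "1.96 standard deviations" formulation used in the text.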
Areas under the curve (AUC) of receiver operating characteristic (ROC) curves were used to evaluate the enrichment performance in the fragment growing enrichment validation. ROC curves and their AUCs are a well-established evaluation of screening enrichment [2]. 95% confidence intervals for the AUCs were calculated by bootstrap re-sampling [3]. This too can be considered established in the literature through multiple high-profile publications [4,5].
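The bootstrap procedure for AUC confidence intervals can be sketched as follows. The rank-based AUC (Mann-Whitney interpretation) and the percentile bootstrap are standard formulations; all scores and parameter values below are invented for illustration:

```python
import random

def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random active outscores a
    random decoy, with ties counting one half (Mann-Whitney U)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample actives and decoys with
    replacement, recompute the AUC, and take the empirical
    2.5% / 97.5% quantiles of the resampled AUCs."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        bp = [rng.choice(scores_pos) for _ in scores_pos]
        bn = [rng.choice(scores_neg) for _ in scores_neg]
        aucs.append(roc_auc(bp, bn))
    aucs.sort()
    lo = aucs[int(alpha / 2 * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy screening scores, actives vs. decoys
lo, hi = bootstrap_auc_ci([0.91, 0.85, 0.40], [0.62, 0.33, 0.21])
```

With realistic set sizes the nested-loop AUC should be replaced by a rank-based O(n log n) variant, but the logic is the same.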
Performance on the Cross-growing Set
Figure S1 shows FastGrow's pose re-prediction performance on the cross-growing set using just the shape-based growing and restrained JAMDA optimization. The main publication states that movement in the core atoms was to blame for the inconclusive performance differences. Core atoms are not actually considered in any RMSD used for validation, but small movements in the core atoms may have had a large impact on the fragment atoms. Changes in angles in particular may have cascaded into large RMSDs.

Figure S2: Pose re-prediction performance on the subset of cross-growing systems with stable interactions used as search points. To the left is the shape-based "Growing". The middle bar represents shape-based growing using "Search Points". The rightmost bar represents growing with search points and subsequent restrained JAMDA optimization. The error bars are 95% confidence intervals

Maintaining Interactions
In general, pose generation with search points performed better than without. On this subset of the cross-growing set, purely shape-based growing achieved a 64.3% success rate (95% CI [58.4%, 70.2%]) in re-predicting ligand poses. The addition of search points improved on this with a 73.8% success rate (95% CI [68.4%, 79.2%]). It took subsequent restrained JAMDA optimization to significantly improve poses, with a success rate of 77.0% (95% CI [71.8%, 82.2%]).
Search points noticeably moved poses closer to the crystal pose of the reference ligand they were taken from. The main validation question for this subset, however, was whether the interactions the search points were generated from could actually be maintained. All three methods performed quite similarly when it came to hydrophobic interactions. These are the least geometric interactions and are probably determined mostly by pose quality. The main publication discusses the more relevant differences in the quality of hydrogen bond interactions. Figure S3 shows the performance of FastGrow on the water replacement subset of the cross-growing set for purely shape-based growing, for growing with the water replacement search points, and for restrained JAMDA optimization after using the water replacement search points. None of the differences had well-separated confidence intervals.

Water Replacement
Performing restrained JAMDA optimization after using the water replacement search points achieved a performance very similar to using optimization and search points in the interaction maintenance validation, further supporting those results. It was interesting to see how search points alone seemed to perform worse in this situation. A possible explanation is that, because these search points had less to do with the ligand to be grown than in the interaction maintenance section, the ligand was initially forced into somewhat more uncomfortable poses. These were then relaxed by the JAMDA optimization, which led to better poses.

Figure S3: To the left is the shape-based "Growing". The middle bar shows the performance of placing search points on water positions that are replaced by the ligand to be grown. The right bar represents water replacement by search points with restrained JAMDA optimization. The error bars are 95% confidence intervals
Ensemble Flexibility
Figure S4 shows the performance of growing with ensembles in comparison to growing with the single input protein, which was used as a query to build the ensembles. Shape-based growing with one input binding site managed to successfully generate poses for 69.3% (95% CI [63.5%, 75.0%]) of the ensemble test cases. Neither growing with ensembles at 72.8% (95% CI [67.2%, 78.3%]) of poses below 2 Å nor growing with ensembles with optimization at 75.2% (95% CI [69.8%, 80.6%]) significantly outperformed shape-based growing. Both were slightly better, but none of the error bars could be considered separated.
At first glance the results seemed to suggest that growing with ensembles does not lead to significantly better results. 202 of 245 test cases did not cross the 2 Å threshold in either direction when comparing growing with one input protein and growing with ensembles. 26 test cases that failed with just one input protein were successful when growing with an ensemble. 17 test cases that were successful with just one input protein failed with ensembles. Although the effect is small, more cases are improved than worsened by ensemble flexibility.
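One way to check whether the 26-versus-17 asymmetry in discordant test cases is statistically meaningful is an exact McNemar test on the discordant pairs. This is our illustrative suggestion, not an analysis from the paper; a stdlib sketch:

```python
from math import comb

def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on the discordant pairs.

    b: cases that succeed only with method A (e.g. ensembles),
    c: cases that succeed only with method B (single protein).
    Under the null hypothesis each discordant case favours either
    method with probability 0.5, so this reduces to a two-sided
    binomial test on min(b, c) out of b + c trials.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# 26 cases improved with ensembles, 17 worsened
p = mcnemar_exact_p(26, 17)
```

Consistent with the text, the resulting p-value is well above common significance thresholds, i.e. the improvement from ensemble flexibility is real in direction but small.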
The phenomenon that some results get worse using ensembles is usually attributed to the tolerance introduced by multiple proteins leading to false positive poses [6]. This will no doubt also be a contributing factor in this validation. The largest fraction of test cases, however, does not change significantly due to ensembles. Both of these points highlight the fact that these ensembles are generated automatically and with only minimal curation in the form of an all-atom RMSD clustering. This is bound to introduce unnecessary noise. It is, however, difficult to build a large number of curated ensembles by hand without introducing bias, which is why we chose an automated procedure.

Figure S4: To the left is shape-based "Growing" with one input binding site. In the middle is the "Ensemble" growing, which uses a binding site ensemble generated by SIENA. The right-most bar represents ensemble growing with subsequent restrained JAMDA optimization. The error bars are 95% confidence intervals

Docking and Growing failures
The performance of DOCK on the cross-growing set was lower than expected. Allen et al. determined the average cross-docking success rate for ligands in non-native pockets to be 50.8% using the protocol replicated here [7]. There were several reasons for this difference in performance. One of the most impactful is that no physico-chemical filters, such as the "Rule of Five" [8], were applied to the full ligands when generating the cross-growing test cases. The fragments had to be "Rule of Three" [9] compliant, but the ligands were often significantly more complex than either set of property rules would allow. The anchored docking, which replicated the fixed-anchor docking (FAD) protocol from the same publication, was more in line with performance expectations.
All cross-growing test cases that failed in docking produced the error message: "Could not complete growth. Confirm that the grid box is large enough to contain the ligand, and try increasing max orientations". Both of the suggested fixes were attempted. Manual verification showed that the ligand to be grown fit into the docking box. The max orientations parameter was successively increased by several orders of magnitude, unfortunately to no effect.
The error message appeared to describe a spatial sampling issue. Cross-growing test cases attempt to grow ligands into pockets they were not crystallized in, which means the binding sites may not have been in an ideal conformation for the ligand; such errors were therefore to be expected. This effect was probably the cause of some of the FastGrow failures as well. One of the FastGrow failures, growing the ligand from PDB code 5J20 into PDB code 5J64, could be solved by generating more conformations when creating the fragment databases. This failure was clearly due to undersampling of the conformational space of the ligand. 96% of test cases were, however, unaffected, which still guaranteed a sufficient statistical sample for the comparison to docking.
Allen et al. solved this problem by attempting to minimize a foreign ligand, in their case in a cross-docking scenario, in the binding site and discarding it if it changed by more than an RMSD of 2Å [7]. We chose not to do this because making decisions based on minimization may introduce bias, especially if the same or a similar minimizer is used in the evaluation as well. In their paper [7] they recognize that there are different ways to approach this problem and we hope they will not begrudge us the decision against involving minimization.
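For illustration, the minimization-based filter of Allen et al. could be sketched as follows, assuming both poses share the binding-site coordinate frame (so no alignment is needed) and a fixed one-to-one atom correspondence; the function names are ours:

```python
import math

def rmsd(coords_a, coords_b):
    """Heavy-atom RMSD between two poses with a fixed 1:1 atom
    mapping. No superposition is performed: both poses are assumed
    to be expressed in the same binding-site frame."""
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def keep_after_minimization(pose_before, pose_after, threshold=2.0):
    """Keep the foreign ligand only if minimization moved it by
    less than `threshold` angstroms RMSD, as in Allen et al.'s
    cross-docking filter."""
    return rmsd(pose_before, pose_after) < threshold
```

A ligand whose minimized pose drifts by 2 Å or more from its input pose would be discarded under this criterion.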

Property Matching in Detail
Property matching was implemented as a simple procedure that limited the active and generic fragments to a common min-max range of the "Rule of Three" properties [9]. This is a far simpler procedure than implemented, for example, by Stein et al. [5] and mostly reflects the limited amount of data available. More complex methods that fit the property distributions to each other eliminated either too many actives or too many generic fragments for the data set to remain useful. Figure S5 shows the performance of sorting by "Rule of Three" properties as ROC curves. Only the sorting direction, ascending or descending by value, that performed better was considered for each property. The discrete properties "Lipinski Acceptors", "Lipinski Donors" and "Number of Rotatable Bonds (NROT)" were not well suited to evaluation by ROC curves. The variance present for these properties in Figure S5 is generated by randomizing the order of the rows within the discrete categories of these variables, for example "Lipinski Acceptor" counts of 1 to 5. There is actually only one value for "Lipinski Donor" counts in the data set, which is zero. "Topological Polar Surface Area (TPSA)" was also an almost discrete value in this context. TPSA is calculated by summing up polar surface area contributions from fragments [10]. Because we calculated TPSA for fragments, which were comparatively small, the variance was small as well, which led to an almost categorical distribution. Molecular weight was the best performing continuous "Rule of Three" property and was used as the baseline.
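The min-max property matching described above can be sketched as follows, assuming fragments are represented as dictionaries of precomputed properties; the property names and values are illustrative, not taken from the data set:

```python
def common_range_filter(actives, decoys, properties):
    """Restrict both sets to the overlapping min-max range of each
    property, the simple matching procedure described above.

    actives / decoys: lists of dicts mapping property name -> value.
    Filtering one property can narrow the range of the next, so the
    properties are processed sequentially on the surviving fragments.
    """
    kept_a, kept_d = actives, decoys
    for prop in properties:
        lo = max(min(m[prop] for m in kept_a), min(m[prop] for m in kept_d))
        hi = min(max(m[prop] for m in kept_a), max(m[prop] for m in kept_d))
        kept_a = [m for m in kept_a if lo <= m[prop] <= hi]
        kept_d = [m for m in kept_d if lo <= m[prop] <= hi]
    return kept_a, kept_d

# Toy example with molecular weight only
actives = [{"MW": 150}, {"MW": 210}, {"MW": 290}]
decoys = [{"MW": 120}, {"MW": 200}, {"MW": 260}]
a, d = common_range_filter(actives, decoys, ["MW"])
```

Here the common range is [150, 260], so the heaviest active and the lightest decoy are removed. A full run would loop over all "Rule of Three" properties.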
The residual performance of the "Rule of Three" properties was a function of the different distributions of those properties in the active and generic sets. Figure S6 shows molecular weight histograms for the active and generic fragments. Both histograms span the same range of molecular weights but have noticeably different distributions. The decoy distribution skews slightly towards lower molecular weights and the active distribution slightly towards higher molecular weights. This results in the residual AUC of 0.65 ± 0.11 when the full fragment set is sorted by descending molecular weight.

Figure S7 shows the ROC curves and AUCs of the enrichment case study, including a repeat screening with a pose scoring function that used only its clash term, labeled "Only Clash". The pose scoring function with all terms and with only the clash term perform comparably in this scenario. "Only Clash" performs slightly better than the full pose scoring function, which was never parametrized for this task. This clearly suggests that clash is a very discriminative property in this scenario.