We report on the development and validation of a new version of DOCK. The algorithm has been rewritten in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools. We validated the sampling algorithm with a test set of 114 protein–ligand complexes. Using an optimized parameter set, we are able to reproduce the crystal ligand pose to within 2 Å of the crystal structure for 79% of the test cases using our rigid ligand docking algorithm with an average run time of 1 min per complex and for 72% of the test cases using our flexible ligand docking algorithm with an average run time of 5 min per complex. Finally, we perform an analysis of the docking failures in the test set and determine that the sampling algorithm is generally sufficient for the binding pose prediction problem for up to 7 rotatable bonds; i.e. 99% of the rigid ligand docking cases and 95% of the flexible ligand docking cases are sampled successfully. We point out that success rates could be improved through more advanced modeling of the receptor prior to docking and through improvement of the force field parameters, particularly for structures containing metal-based cofactors.
Kopec KK, Bozyczko-Coyne D, Williams M (2005) Biochem Pharmacol 69:1133
Congreve M, Murray CW, Blundell TL (2005) Drug Discovery Today 10:895
Kraljevic S, Stambrook PJ, Pavelic K (2004) EMBO Rep 5:837
Schnecke V, Bostrom J (2006) Drug Discovery Today 11:43
Hillisch A, Pineda LF, Hilgenfeld R (2004) Drug Discovery Today 9:659
Posner BA (2005) Curr Opin Drug Discovery Dev 8:487
Alvarez JC (2004) Curr Opin Chem Biol 8:365
Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD (2003) Proteins 52:609
Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) J Med Chem 47:1739
Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) J Med Chem 47:1750
Kramer B, Rarey M, Lengauer T (1999) Proteins 37:228
Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Nat Rev Drug Discovery 3:935
Shoichet BK, Bodian DL, Kuntz ID (1992) J Comput Chem 13:380
Ewing TJA, Kuntz ID (1997) J Comput Chem 18:1175
Leach AR, Kuntz ID (1992) J Comput Chem 13:730
Meng EC, Shoichet BK, Kuntz ID (1992) J Comput Chem 13:505
Lischner R (2003) C++ in a nutshell. 1st edn. O’Reilly Media, Inc, Sebastopol, CA
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) Nucleic Acids Res 28:235
Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) J Mol Biol 267:727
Pang YP, Perola E, Xu K, Prendergast FG (2001) J Comput Chem 22:1750
Perola E, Walters WP, Charifson PS (2004) Proteins 56:235
Nissink JW, Murray C, Hartshorn M, Verdonk ML, Cole JC, Taylor R (2002) Proteins 49:457
Kuhl FS, Crippen GM, Friesen DK (1984) J Comput Chem 5:24
Nelder JA, Mead R (1965) Comput J 7:308
Gropp W, Lusk E, Doss N, Skjellum A (1996) Parallel Computing 22:789
SYBYL, Tripos, Inc., St. Louis, Missouri, 63144
Case DA, Darden TA, Cheatham III, TE, Simmerling CL, Wang J, Duke RE, Luo R, Merz KM, Wang B, Pearlman DA, Crowley M, Brozell S, Tsui V, Gohlke H, Mongan J, Hornak V, Cui G, Beroza P, Schafmeister C, Caldwell JW, Ross WS, Kollman PA (2004) AMBER 8, University of California, San Francisco
Jakalian A, Bush BL, Jack DB, Bayly CI (2000) J Comput Chem 21:132
Hann MM, Oprea TI (2004) Curr Opin Chem Biol 8:255
Oprea TI (2002) J Comput-Aided Mol Des 16:325
Oprea TI, Davis AM, Teague SJ, Leeson PD (2001) J Chem Inf Model 41:1308
Brooijmans N (2003) Theoretical studies of molecular recognition, Graduate Department of Chemistry and Chemical Biology, University of California, San Francisco, San Francisco, CA
Purcell WP, Singer JA (1967) J Chem Eng Data 12:235
Gasteiger J, Marsili M (1980) Tetrahedron 36:3219
Aqvist J, Warshel A (1990) J Am Chem Soc 112:2860
Merz KM, Murcko MA, Kollman PA (1991) J Am Chem Soc 113:4484
Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA (1995) J Am Chem Soc 117:5179
Richards FM (1977) Ann Rev Biophys Bioeng 6:151
DesJarlais RL, Sheridan RP, Seibel GL, Dixon JS, Kuntz ID, Venkataraghavan R (1988) J Med Chem 31:722
Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE (1982) J Mol Biol 161:269
Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) J Comput Chem 25:1605
Meng EC, Lewis RA (1991) J Comput Chem 12:891
Mills JEJ, Dean PM (1996) J Comput-Aided Mol Des 10:607
Irwin JJ, Shoichet BK (2005) J Chem Inf Model 45:177
The results for the FlexX test set are available at http://www.biosolveit.de/FlexX/
The results for the GOLD test set are available at http://www.ccdc.cam.ac.uk/products/life_sciences/validate/gold_validation/value.html
Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Arthurs S, Colson AB, Freer ST, Larson V, Luty BA, Marrone T, Rose PW (2000) J Comput-Aided Mol Des 14:731
Kuntz ID, Agard DA (2003) Adv Protein Chem 66:1
Gschwend DA, Kuntz ID (1996) J Comput-Aided Mol Des 10:123
Gratitude is expressed to Dr. Bentley Strockbine and Sudipto Mukherjee for computational assistance with MPI calculations. Demetri Moustakas, Natasja Brooijmans, P. Therese Lang and Irwin D. Kuntz would like to thank the NIH grant GM 56531 (Paul Ortiz de Montellano, PI) for support. P. Therese Lang would also like to thank the Burroughs Wellcome Foundation and the American Foundation for Pharmaceutical Education for additional support. The authors would like to thank Scott Brozell, Mathew Jacobson, and Brian Shoichet and members of his group for helpful conversations.
D. T. Moustakas and P. T. Lang are joint first authors
Electronic Supplementary Material
The structure files for the test set and the optimized input files used to generate this data can be found at the DOCK web site (http://dock.compbio.ucsf.edu).
Rigid docking parameter optimization
The parameters listed in Appendix 1 control the sampling of ligand poses within the receptor active site during rigid ligand docking. The parameters that control the step sizes for the simplex minimizer (simplex_trans_step, simplex_rot_step, and simplex_tors_step) were optimized in a previous study and were held at those values [14, 49]. For the remaining parameters, the number of orientations (max_orientations) and the number of minimization steps (simplex_final_max_iterations), a series of rigid ligand docking experiments was performed to optimize two measures: the DOCK score of the top-ranking pose, averaged over the entire test set, and the success rate, defined as the fraction of test cases in which the top-ranking pose lies within 2 Å heavy-atom RMSD of the crystal ligand. Both measures initially improved as the number of orientations and the amount of minimization increased, and then converged (Fig. 10). We selected the lowest converged values, 1,000 orientations and 1,000 minimization steps, as optimal.
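The success criterion above can be sketched in a few lines of Python. This is an illustrative sketch, not the DOCK implementation: the function names are hypothetical, the coordinates are assumed to be heavy atoms only and listed in the same atom order for both structures, no superposition is performed (the docked pose and the crystal ligand share the receptor frame), and treating a pose at exactly the cutoff as a success is an assumption.

```python
import math

def heavy_atom_rmsd(pose, crystal):
    """RMSD between two equal-length lists of (x, y, z) heavy-atom coordinates.

    Assumes both lists use the same atom order and that hydrogens have
    already been removed. No superposition is applied.
    """
    if len(pose) != len(crystal) or not pose:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, c in zip(pose, crystal)
                  for a, b in zip(p, c))
    return math.sqrt(squared / len(pose))

def success_rate(top_poses, crystal_poses, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose is within cutoff (in Å)."""
    if not top_poses or len(top_poses) != len(crystal_poses):
        raise ValueError("need one top-ranked pose per crystal ligand")
    hits = sum(heavy_atom_rmsd(p, c) <= cutoff
               for p, c in zip(top_poses, crystal_poses))
    return hits / len(top_poses)
```

Calling success_rate once per parameter setting, with the top-ranked pose of every complex in the test set, gives one point on a convergence curve of the kind described above.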
Flexible docking parameter optimization
For the more complex flexible ligand algorithm, the parameter optimization was performed first on the anchor docking, and the best parameters were then used for optimizing the growth. The parameters that control the sampling in both these steps are listed in Appendix 2. As for rigid ligand docking, the parameters that control step sizes for the simplex minimizer were set to the previously defined optimal values.
The first step in the anchor-and-grow algorithm is ring identification or anchor segmentation. All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some amount of flexibility can exist in non-aromatic rings. To treat such phenomena as sugar puckering and chair-boat cyclohexane conformations, the user needs to supply each ring conformation as a separate input molecule. If the molecule does not have a ring, the largest rigid segment is specified as the anchor. Additional bonds may be specified as rigid by the user. For simplicity, all runs in this study used the default of largest anchor only. If the molecule had multiple anchors of the same size, the first anchor on the anchor list was used. Once the anchor had been identified, the parameters that control the number of anchor orientations (max_orientations), the number of anchor minimization steps (simplex_anchor_max_iterations), and the cutoff for the anchor pruning (num_confs_for_next_growth) were explored. Because the anchors are substructures of the ligand, parameter convergence was monitored using the RMSD between each anchor orientation and the corresponding substructure of the crystal ligand, averaged over all orientations generated before the pruning function was applied. When the number of anchor orientations and the number of minimization steps were varied systematically, the number of minimization steps converged at 500 (Fig. 11a). We expected this optimized value to be lower than that for rigid docking because anchors are typically smaller than the final ligand.
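The default anchor choice described above (largest rigid segment, with ties going to the first entry on the anchor list) can be sketched as follows. The function name and the input representation are hypothetical, and measuring segment size by the number of atom indices is an assumption made for illustration.

```python
def pick_anchor(segments):
    """Return the largest rigid segment; ties go to the earliest in the list.

    segments: list of collections of atom indices, in anchor-list order.
    Segment size is taken to be the number of atom indices (an assumption).
    """
    if not segments:
        raise ValueError("no rigid segments supplied")
    # max() returns the first maximal element it encounters, which gives
    # the "first anchor on the list" tie-break without extra code.
    return max(segments, key=len)
```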
Because the anchor orientations are pruned before the growth step, we used the optimized number of minimization steps while exploring the number of anchor orientations and the pruning cutoff. The optimal anchor pruning cutoff of 100 was chosen as a balance between convergence and the length of the calculation, and it was then held fixed for the final exploration of the number of orientations. The optimal number of orientations was selected to be 500 because the combination of these three values generated the highest number of anchors near the crystal structure (Fig. 11a). Note that when the number of orientations was increased beyond the selected value, the number of anchors near the crystal structure dropped dramatically. We hypothesized that this resulted from a combination of increased sampling and pruning. The pruning function was designed to identify a representative orientation from each energy well that the matching algorithm finds (see Introduction: DOCK background). As sampling increased, the ranked orientations began to converge toward the bottom of the deepest energy wells, and the alternative, higher-energy wells were sampled less. Because the pruning function is designed to supply the most diverse orientations, fewer orientations passed the pruning step as sampling increased. We felt that this effect was reducing the potential sampling for the algorithm and plan to explore alternatives in future studies.
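To make the hypothesized interaction between sampling and pruning concrete, the sketch below uses a generic greedy, best-first diversity filter. It is a stand-in chosen for illustration and is not claimed to be the pruning function used in DOCK: the function name, the min_dist threshold, and the user-supplied distance callable (for example an RMSD) are all assumptions.

```python
def prune_by_diversity(orientations, distance, min_dist, max_keep):
    """Greedy best-first pruning (illustrative stand-in, not DOCK's function).

    orientations: list of (score, pose) pairs; a lower score is better.
    distance: callable(pose_a, pose_b) -> float, e.g. an RMSD.
    A pose is kept only if it lies at least min_dist from every pose kept
    so far; at most max_keep poses are returned, best score first.
    """
    kept = []
    # Sort on the score alone so that poses never need to be comparable.
    for score, pose in sorted(orientations, key=lambda sp: sp[0]):
        if len(kept) >= max_keep:
            break
        if all(distance(pose, other) >= min_dist for _, other in kept):
            kept.append((score, pose))
    return kept
```

Under a filter of this kind, orientations that crowd into the same deep well are collapsed to a single representative, so a run whose extra sampling lands mostly in one well passes fewer orientations on to the growth step, which is consistent with the effect described above.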
The next step in the anchor-and-grow algorithm is flexible bond identification. Each flexible bond is associated with a label defined in an editable file, which is specified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types and chemical environment of the bonded atoms. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.
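The partition into rigid segments can be viewed as finding the connected components of the molecular graph after the flexible bonds are removed. The sketch below illustrates that idea only; the function name and the input format (atom indices, a bond list, and a set of flexible bond indices) are hypothetical, and it assumes the flexible bonds have already been identified and that none of them lies within a ring.

```python
from collections import defaultdict

def rigid_segments(n_atoms, bonds, flexible):
    """Split atoms 0..n_atoms-1 into segments joined only by non-flexible bonds.

    bonds: list of (i, j) atom-index pairs.
    flexible: set of positions in bonds that are treated as rotatable.
    Returns a list of sorted atom-index lists, ordered by lowest atom index.
    """
    # Build an adjacency list that omits every flexible bond.
    adjacency = defaultdict(list)
    for idx, (i, j) in enumerate(bonds):
        if idx not in flexible:
            adjacency[i].append(j)
            adjacency[j].append(i)

    seen, segments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first walk over rigid bonds collects one segment.
        stack, segment = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            segment.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        segments.append(sorted(segment))
    return segments
```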
Using the optimal anchor parameters, we varied the number of minimization steps for each layer of growth (simplex_grow_max_iterations) and the cutoff on the number of conformers for the growth pruning function (num_confs_for_next_growth). Because the DOCK run now creates a complete pose, we returned to monitoring convergence with a combination of the score of the top-ranking pose, averaged over the entire test set, and the success rate. As with rigid ligand docking, the success rate improved modestly with improved sampling and eventually converged (Fig. 11). However, although DOCK scores improved as the number of orientations and the amount of minimization increased, the scores did not converge. We once again attribute this phenomenon to the pruning function. Therefore, we used the success rate to select the lowest converged values, 500 minimization steps and a growth pruning cutoff of 100 conformers, as optimal.
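Selecting the "lowest converged value" from a parameter scan can be expressed as a small helper. The text does not state a numerical convergence criterion, so the one used here (the metric stays within a tolerance tol of its value at the largest setting tested) is an assumption, as is the function name.

```python
def lowest_converged(values, metric, tol):
    """Smallest parameter value from which the metric stays converged.

    values: parameter settings in ascending order.
    metric: the monitored quantity (e.g. success rate) at each setting.
    Converged is assumed to mean within tol of the metric at the largest
    setting, for that setting and every larger one.
    """
    if not values or len(values) != len(metric):
        raise ValueError("values and metric must be non-empty and equal length")
    final = metric[-1]
    chosen = values[-1]
    # Walk down from the largest setting until the metric leaves the band.
    for value, m in zip(reversed(values), reversed(metric)):
        if abs(m - final) > tol:
            break
        chosen = value
    return chosen
```

A criterion of this form returns the largest tested setting for a metric that keeps changing across the scan, which is why it suits the success rate better than the non-converging DOCK score in the flexible docking scan.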
Cite this article
Moustakas, D.T., Lang, P.T., Pegg, S. et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des 20, 601–619 (2006). https://doi.org/10.1007/s10822-006-9060-4
- Automated docking
- Scoring functions
- Structure-based drug design
- Flexible docking
- Binding mode prediction
- Incremental construction