Development and validation of a modular, extensible docking program: DOCK 5

Abstract

We report on the development and validation of a new version of DOCK. The algorithm has been rewritten in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools. We validated the sampling algorithm with a test set of 114 protein–ligand complexes. Using an optimized parameter set, we are able to reproduce the crystal ligand pose to within 2 Å for 79% of the test cases using our rigid ligand docking algorithm, with an average run time of 1 min per complex, and for 72% of the test cases using our flexible ligand docking algorithm, with an average run time of 5 min per complex. Finally, we perform an analysis of the docking failures in the test set and determine that the sampling algorithm is generally sufficient for the binding pose prediction problem for ligands with up to 7 rotatable bonds; i.e., 99% of the rigid ligand docking cases and 95% of the flexible ligand docking cases are sampled successfully. We point out that success rates could be improved through more advanced modeling of the receptor prior to docking and through improvement of the force field parameters, particularly for structures containing metal-based cofactors.

References

  1. Kopec KK, Bozyczko-Coyne D, Williams M (2005) Biochem Pharmacol 69:1133
  2. Congreve M, Murray CW, Blundell TL (2005) Drug Discovery Today 10:895
  3. Kraljevic S, Stambrook PJ, Pavelic K (2004) EMBO Rep 5:837
  4. Schnecke V, Bostrom J (2006) Drug Discovery Today 11:43
  5. Hillisch A, Pineda LF, Hilgenfeld R (2004) Drug Discovery Today 9:659
  6. Posner BA (2005) Curr Opin Drug Discovery Dev 8:487
  7. Alvarez JC (2004) Curr Opin Chem Biol 8:365
  8. Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD (2003) Proteins 52:609
  9. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) J Med Chem 47:1739
  10. Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL (2004) J Med Chem 47:1750
  11. Kramer B, Rarey M, Lengauer T (1999) Proteins 37:228
  12. Kitchen DB, Decornez H, Furr JR, Bajorath J (2004) Nat Rev Drug Discovery 3:935
  13. Shoichet BK, Bodian DL, Kuntz ID (1992) J Comput Chem 13:380
  14. Ewing TJA, Kuntz ID (1997) J Comput Chem 18:1175
  15. Leach AR, Kuntz ID (1992) J Comput Chem 13:730
  16. Meng EC, Shoichet BK, Kuntz ID (1992) J Comput Chem 13:505
  17. Lischner R (2003) C++ in a nutshell. 1st edn. O’Reilly Media, Inc, Sebastopol, CA
  18. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) Nucleic Acids Res 28:235
  19. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) J Mol Biol 267:727
  20. Pang YP, Perola E, Xu K, Prendergast FG (2001) J Comput Chem 22:1750
  21. Perola E, Walters WP, Charifson PS (2004) Proteins 56:235
  22. Nissink JW, Murray C, Hartshorn M, Verdonk ML, Cole JC, Taylor R (2002) Proteins 49:457
  23. Kuhl FS, Crippen GM, Friesen DK (1984) J Comput Chem 5:24
  24. Nelder JA, Mead R (1965) Comput J 7:308
  25. Gropp W, Lusk E, Doss N, Skjellum A (1996) Parallel Computing 22:789
  26. SYBYL, Tripos, Inc., St. Louis, Missouri, 63144
  27. Case DA, Darden TA, Cheatham TE III, Simmerling CL, Wang J, Duke RE, Luo R, Merz KM, Wang B, Pearlman DA, Crowley M, Brozell S, Tsui V, Gohlke H, Mongan J, Hornak V, Cui G, Beroza P, Schafmeister C, Caldwell JW, Ross WS, Kollman PA (2004) AMBER 8, University of California, San Francisco
  28. Jakalian A, Bush BL, Jack DB, Bayly CI (2000) J Comput Chem 21:132
  29. Hann MM, Oprea TI (2004) Curr Opin Chem Biol 8:255
  30. Oprea TI (2002) J Comput-Aided Mol Des 16:325
  31. Oprea TI, Davis AM, Teague SJ, Leeson PD (2001) J Chem Inf Model 41:1308
  32. Brooijmans N (2003) Theoretical studies of molecular recognition, Graduate Department of Chemistry and Chemical Biology, University of California, San Francisco, San Francisco, CA
  33. Purcell WP, Singer JA (1967) J Chem Eng Data 12:235
  34. Gasteiger J, Marsili M (1980) Tetrahedron 36:3219
  35. Aqvist J, Warshel A (1990) J Am Chem Soc 112:2860
  36. Merz KM, Murcko MA, Kollman PA (1991) J Am Chem Soc 113:4484
  37. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA (1995) J Am Chem Soc 117:5179
  38. Richards FM (1977) Ann Rev Biophys Bioeng 6:151
  39. DesJarlais RL, Sheridan RP, Seibel GL, Dixon JS, Kuntz ID, Venkataraghavan R (1988) J Med Chem 31:722
  40. Kuntz ID, Blaney JM, Oatley SJ, Langridge R, Ferrin TE (1982) J Mol Biol 161:269
  41. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE (2004) J Comput Chem 25:1605
  42. Meng EC, Lewis RA (1991) J Comput Chem 12:891
  43. Mills JEJ, Dean PM (1996) J Comput-Aided Mol Des 10:607
  44. Irwin JJ, Shoichet BK (2005) J Chem Inf Model 45:177
  45. The results for the FlexX test set are available at http://www.biosolveit.de/FlexX/
  46. The results for the GOLD test set are available at http://www.ccdc.cam.ac.uk/products/life_sciences/validate/gold_validation/value.html
  47. Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Arthurs S, Colson AB, Freer ST, Larson V, Luty BA, Marrone T, Rose PW (2000) J Comput-Aided Mol Des 14:731
  48. Kuntz ID, Agard DA (2003) Adv Protein Chem 66:1
  49. Gschwend DA, Kuntz ID (1996) J Comput-Aided Mol Des 10:123

Acknowledgements

Gratitude is expressed to Dr. Bentley Strockbine and Sudipto Mukherjee for computational assistance with MPI calculations. Demetri Moustakas, Natasja Brooijmans, P. Therese Lang and Irwin D. Kuntz would like to thank the NIH for support through grant GM 56531 (Paul Ortiz de Montellano, PI). P. Therese Lang would also like to thank the Burroughs Wellcome Foundation and the American Foundation for Pharmaceutical Education for additional support. The authors would like to thank Scott Brozell, Mathew Jacobson, and Brian Shoichet and members of his group for helpful conversations.

Author information

Corresponding author

Correspondence to Irwin D. Kuntz.

Additional information

D. T. Moustakas and P. T. Lang are joint first authors

Electronic Supplementary Material

The structure files for the test set and the optimized input files used to generate this data can be found at the DOCK web site (http://dock.compbio.ucsf.edu).

Appendix 1

Rigid docking parameter optimization

The parameters listed in Appendix 1 control the sampling of ligand poses within the receptor active site during rigid ligand docking. The parameters that control the step sizes for the simplex minimizer (simplex_trans_step, simplex_rot_step, and simplex_tors_step) were optimized in a previous study and were held at those values [14, 49]. For the remaining parameters, the number of orientations (max_orientations) and the number of minimization steps (simplex_final_max_iterations), a series of rigid ligand docking experiments was performed to optimize two quantities: the DOCK score of the top-ranking pose, averaged over the entire test set, and the success rate, defined as the fraction of cases in which the top-ranking pose lies within 2 Å heavy-atom RMSD of the crystal ligand. The success rate and DOCK scores initially improved as the number of orientations and the amount of minimization increased and then converged (Fig. 10). We selected the lowest converged values, 1,000 orientations and 1,000 minimization steps, as optimal.
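
The success criterion above can be illustrated with a minimal sketch. It assumes that each pose is a list of (element, x, y, z) tuples in the same atom order as the crystal ligand, that "within 2 Å" means less than or equal to 2 Å, and that no correction for symmetry-equivalent atoms is applied. The function names are hypothetical and are not part of DOCK.

```python
import math


def heavy_atom_rmsd(pose, reference):
    """RMSD over non-hydrogen atoms of two poses in the same atom order.

    No superposition is performed: the docked pose and the crystal ligand
    are assumed to share the receptor coordinate frame.
    """
    if len(pose) != len(reference):
        raise ValueError("poses must list the same atoms in the same order")
    squared = [
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (element, px, py, pz), (_, rx, ry, rz) in zip(pose, reference)
        if element != "H"
    ]
    if not squared:
        raise ValueError("no heavy atoms to compare")
    return math.sqrt(sum(squared) / len(squared))


def success_rate(top_poses, crystal_poses, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose is within `cutoff` Å."""
    hits = sum(
        heavy_atom_rmsd(pose, crystal) <= cutoff
        for pose, crystal in zip(top_poses, crystal_poses)
    )
    return hits / len(top_poses)


# Toy coordinates for illustration only; these are not from the test set.
crystal = [("C", 0.0, 0.0, 0.0), ("O", 1.0, 0.0, 0.0), ("H", 2.0, 0.0, 0.0)]
near = [("C", 1.0, 0.0, 0.0), ("O", 2.0, 0.0, 0.0), ("H", 9.0, 0.0, 0.0)]
far = [("C", 3.0, 0.0, 0.0), ("O", 4.0, 0.0, 0.0), ("H", 2.0, 0.0, 0.0)]
print(success_rate([near, far], [crystal, crystal]))  # 0.5
```

The hydrogen in `near` is displaced by 7 Å but does not affect the result, because only heavy atoms enter the RMSD.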

Appendix 1. Description of and optimized default values for parameters that affect rigid ligand docking

Fig. 10. Optimization of parameters for rigid ligand docking. Parameters of 50 (□), 100 (◯), 1,000 (▽), and 10,000 (▹) minimization steps (simplex_final_max_iterations) are examined as a function of the number of orientations (max_orientations).

Flexible docking parameter optimization

For the more complex flexible ligand algorithm, the parameter optimization was performed first on the anchor docking, and the best parameters were then used for optimizing the growth. The parameters that control the sampling in both these steps are listed in Appendix 2. As for rigid ligand docking, the parameters that control step sizes for the simplex minimizer were set to the previously defined optimal values.

Appendix 2. Description of and optimized default values for parameters that affect flexible ligand docking

The first step in the anchor-and-grow algorithm is ring identification, or anchor segmentation. All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some amount of flexibility can exist in non-aromatic rings. To treat such phenomena as sugar puckering and chair-boat cyclohexane conformations, the user needs to supply each ring conformation as a separate input molecule. If the molecule does not have a ring, the largest rigid segment is specified as the anchor. Additional bonds may be specified as rigid by the user. For simplicity, all runs in this study used the default of largest anchor only. If the molecule had multiple anchors of the same size, the first anchor on the anchor list was used.

Once the anchor had been identified, the parameters that control the number of anchor orientations (max_orientations), the number of anchor minimization steps (simplex_anchor_max_iterations), and the cutoff for the anchor pruning (num_confs_for_next_growth) were explored. Because the anchors are substructures of the ligand, the parameter convergence was monitored as a function of the RMSD between the anchor orientation and the corresponding substructure of the crystal ligand, averaged over all orientations generated before the pruning function was applied. When the number of anchor orientations and minimization steps were varied systematically, the number of minimization steps converged at 500 (Fig. 11a). We expected this optimized value to be lower than that for rigid docking because anchors are typically smaller than the complete ligand.
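
The rule that all bonds within rings are treated as rigid can be illustrated with a short sketch. It is not DOCK's implementation; it assumes a molecule given as an atom count plus a list of bonded index pairs, and uses the graph property that a bond lies in a ring exactly when its two atoms remain connected after that bond is removed. The name `ring_bonds` is hypothetical.

```python
from collections import deque


def ring_bonds(num_atoms, bonds):
    """Return the set of bonds (index pairs) that lie in at least one ring."""
    neighbors = {i: set() for i in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def connected_without(a, b):
        # Breadth-first search from a to b that may not cross the a-b bond.
        seen = {a}
        queue = deque([a])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if current == a and nxt == b:
                    continue  # skip the bond under test
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    return {(a, b) for a, b in bonds if connected_without(a, b)}


# Six-membered ring (atoms 0-5) with one substituent atom (6) on atom 0.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
rigid = ring_bonds(7, bonds)
print((0, 6) in rigid)  # False: the substituent bond is not in a ring
```

This test is purely topological, so it marks every ring bond as rigid regardless of aromaticity, which matches the first-order approximation described above.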

Fig. 11. Optimization of parameters for flexible ligand docking. (a) Parameter optimization for anchor sampling portion of flexible ligand docking. TOP: Parameters of 0 (□), 50 (◯), 100 (▵), and 500 (▽) anchor minimization steps (simplex_anchor_max_iterations) are plotted as a function of the number of orientations (max_orientations). BOTTOM: Parameters of 50 (vertical stripes), 500 (filled), and 5,000 (diagonal stripes) anchor orientations (max_orientations) are compared using an anchor pruning cutoff (num_confs_for_next_growth) of 100. (b) Parameter optimization for growth sampling portion of flexible ligand docking. Growth pruning cutoffs (num_confs_for_next_growth) of 25 (◯), 50 (▵), 100 (▽), and 200 (◊) are plotted as a function of the number of growth minimization steps (simplex_grow_max_iterations).

Because the anchor orientations are pruned before the growth step, we used the optimized number of minimization steps while exploring the number of anchor orientations and the pruning cutoff. The optimal anchor pruning cutoff of 100 was chosen as a balance between convergence and the length of the calculation; this cutoff was then held fixed for the final exploration of the number of orientations. The optimal number of orientations was selected to be 500 because the combination of these three variables generated the highest number of anchors near the crystal structure (Fig. 11a). Note that if the number of orientations was increased beyond the selected value, the number of anchors near the crystal structure dropped dramatically. We hypothesized that this resulted from a combination of increased sampling and pruning. The pruning function was designed to identify a representative orientation from each energy well that the matching algorithm finds (see Introduction: DOCK background). As sampling increased, the ranked orientations began to converge toward the bottom of the deepest energy wells, sampling less of the alternative, higher-energy wells. Because the pruning function is designed to supply the most diverse orientations, fewer orientations made it through the pruning step as the sampling was increased. We felt that this effect was reducing the potential sampling for the algorithm, and we plan to explore alternatives in future studies.
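
The idea of keeping one representative orientation per energy well can be sketched as a greedy, score-ranked filter. This is a simplified stand-in and not DOCK's actual pruning function: the RMSD-based diversity test, the names `prune`, `rmsd_cutoff` and `max_kept`, and the loose analogy between `max_kept` and num_confs_for_next_growth are all assumptions made for illustration. The sketch does not attempt to reproduce the drop in near-crystal anchors described above.

```python
import math


def rmsd(a, b):
    """RMSD between two equally ordered lists of (x, y, z) coordinates."""
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def prune(orientations, rmsd_cutoff, max_kept):
    """Keep the best-scoring orientation from each group of similar poses.

    `orientations` is a list of (score, coords) pairs; lower scores are
    better. An orientation survives only if it is farther than
    `rmsd_cutoff` from every better-scoring survivor, and at most
    `max_kept` orientations are returned.
    """
    kept = []
    for score, coords in sorted(orientations, key=lambda item: item[0]):
        if len(kept) == max_kept:
            break
        if all(rmsd(coords, other) > rmsd_cutoff for _, other in kept):
            kept.append((score, coords))
    return kept


# Toy single-atom "orientations" for illustration only.
well_a_best = (-10.0, [(0.0, 0.0, 0.0)])
well_a_other = (-9.0, [(0.5, 0.0, 0.0)])
well_b = (-5.0, [(5.0, 0.0, 0.0)])
survivors = prune([well_b, well_a_other, well_a_best], rmsd_cutoff=2.0, max_kept=100)
print([score for score, _ in survivors])  # [-10.0, -5.0]
```

In this toy case the second orientation in the first well is discarded because it lies within the cutoff of a better-scoring survivor, so each well contributes a single representative.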

The next step in the anchor-and-grow algorithm is flexible bond identification. Each flexible bond is associated with a label defined in an editable parameter file, which is specified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types and chemical environment of the bonded atoms. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.
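
The partition into rigid segments can be illustrated as a connected-components search over the bond graph once the flexible bonds are cut. The sketch assumes the flexible bonds have already been identified (in DOCK, through the labels in the flex definition file) and are passed in as index pairs; the function names are hypothetical. The anchor choice follows the default described earlier: the largest segment, with ties going to the first one listed.

```python
def rigid_segments(num_atoms, bonds, flexible_bonds):
    """Split atoms into rigid segments by cutting every flexible bond."""
    flexible = {frozenset(bond) for bond in flexible_bonds}
    neighbors = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        if frozenset((a, b)) not in flexible:
            neighbors[a].append(b)
            neighbors[b].append(a)

    segments, assigned = [], set()
    for start in range(num_atoms):
        if start in assigned:
            continue
        # Depth-first search over non-flexible bonds only.
        stack, members = [start], []
        assigned.add(start)
        while stack:
            atom = stack.pop()
            members.append(atom)
            for nxt in neighbors[atom]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        segments.append(sorted(members))
    return segments


def pick_anchor(segments):
    """Largest segment; max() returns the first maximal item on ties."""
    return max(segments, key=len)


# Ring (atoms 0-5), a two-atom linker (6-7) and a terminal atom (8).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7), (7, 8)]
segments = rigid_segments(9, bonds, flexible_bonds=[(0, 6), (7, 8)])
print(segments)               # [[0, 1, 2, 3, 4, 5], [6, 7], [8]]
print(pick_anchor(segments))  # [0, 1, 2, 3, 4, 5]
```

Each atom belongs to exactly one segment in this model, and every flexible bond connects two different segments.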

Using the optimal anchor parameters, we varied the number of minimization steps for each layer of growth (simplex_grow_max_iterations) and the cutoff on the number of conformers for the growth pruning function (num_confs_for_next_growth). Because the DOCK run now creates a complete pose, we returned to using a combination of the score of the top-ranking pose, averaged over the entire test set, and the success rate to monitor convergence. As with rigid ligand docking, the success rate improved modestly with improved sampling and eventually converged (Fig. 11b). However, although DOCK scores improved as the number of orientations and the amount of minimization increased, the values did not converge. We once again attribute this phenomenon to the pruning function. Therefore, we used the success rate to select the lowest converged values, 500 minimization steps and a growth pruning cutoff of 100 conformers, as optimal.
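
Selecting the "lowest converged value" can be made concrete with a small helper. The convergence test used here (the metric stays within a tolerance of its value at the largest setting) is an assumption chosen for illustration, not the criterion used in this study, and the numbers in the example are invented placeholders rather than results from the test set.

```python
def lowest_converged(values, metric, tolerance):
    """Smallest setting from which the metric stays within `tolerance`
    of its value at the largest setting."""
    pairs = sorted(zip(values, metric))
    final = pairs[-1][1]
    chosen = pairs[-1][0]
    # Walk down from the largest setting until the metric leaves the band.
    for value, measured in reversed(pairs):
        if abs(measured - final) > tolerance:
            break
        chosen = value
    return chosen


# Placeholder data for illustration only (not taken from the paper).
steps = [50, 100, 500, 1000, 5000]
toy_success = [0.55, 0.63, 0.71, 0.72, 0.72]
print(lowest_converged(steps, toy_success, tolerance=0.02))  # 500
```

Walking down from the largest setting avoids stopping at an early plateau that the metric later leaves.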

Cite this article

Moustakas, D.T., Lang, P.T., Pegg, S. et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des 20, 601–619 (2006). https://doi.org/10.1007/s10822-006-9060-4

Keywords

  • Automated docking
  • Scoring functions
  • Structure-based drug design
  • Flexible docking
  • Binding mode prediction
  • Incremental construction
  • Validation