Protein-Ligand Docking in Drug Design: Performance Assessment and Binding-Pose Selection
The main goal in drug discovery is the identification of drug-like compounds capable of modulating specific biological targets. Predicting reliable binding poses of candidate ligands through molecular docking simulations is therefore a key step in structure-based drug design (SBDD). Given the increasing number of resolved three-dimensional ligand-protein structures, together with the growth of computational power and software development, experimental data can be used comprehensively and systematically to validate docking performance. This makes it possible to select and refine the protocol to adopt when predicting the binding pose of trial compounds in a target. Because multiple docking programs are available, a comparative docking assessment at an early research stage is an essential step to minimize failures in molecular modeling. This chapter describes how to perform a docking assessment, using freely available tools, in a semiautomated fashion.
Key words: Drug design, Drug discovery, Molecular docking, Molecular modeling, Docking assessment, Structure-based drug design (SBDD)
Molecular recognition (generally understood as the non-covalent interaction between two or more molecules) is a key event in many biological systems, and its optimization is one of the most challenging problems in drug discovery when targeting a given protein with small molecules. Ligand-protein interactions can be predicted or analyzed with different computer-aided drug design (CADD) tools, generally classified as ligand-based (LB) methods, which depend on information about diverse molecules that bind to the biological target, and structure-based (SB) methods, which rely on three-dimensional structural information about the biological target. Molecular docking is one of the most widely used structure-based drug design (SBDD) tools in medicinal chemistry for predicting the binding pose of ligands in a protein; it allows the evaluation of molecular interactions, induced conformational changes, and binding energetics, as well as virtual screening applications. In addition, docked poses can be used effectively in classic (ligand-based) or per-residue (structure-based) three-dimensional quantitative structure-activity relationship (3D QSAR) studies [6, 7]. When ligand-protein structural data from experimental methods (such as X-ray crystallography or NMR spectroscopy) are available, the docking protocol should be assessed to estimate how reliably the designed procedure predicts, for a selected target, the binding pose of compounds lacking experimental information. Different strategies can be adopted to evaluate a docking procedure, including docking accuracy (DA) calculation, enrichment factor (EF) analysis, correlation between experimental and predicted binding affinities, and the distance between a metal ion in the active site (if present) and the metal-binding moieties of the ligands.
Performing re-docking (or self-docking) and cross-docking (or ensemble docking) simulations
Computing the relative docking accuracies based on root-mean square deviation (RMSD) values between the predicted (docked) and the experimental binding poses
Setup of working directories
Preparation of the structures and input files
Docking simulation, cluster analysis, and DA calculation
Analysis of the results
UCSF Chimera, LigandBox, and MGLTools are used to perform step 2; AutoDock Vina is used to perform molecular docking on a small set of human coagulation factor Xa (FXa) inhibitor complexes [15, 16, 17, 18, 19]; and Clusterizer-DockAccessor is used to perform cluster analysis and DA calculation (step 3). The whole protocol is intended to run in a Linux environment: a series of basic shell commands to be typed into the Linux terminal is provided, step by step, to semiautomate the process (see Note 1).
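The RMSD comparison on which the docking accuracy is based can be illustrated with a minimal Python sketch. It assumes that the docked and experimental poses are supplied as equally ordered lists of heavy-atom coordinates in the same reference frame (no superposition is applied) and that no symmetry correction is needed; the function name `pose_rmsd` is hypothetical and is not part of Clusterizer-DockAccessor.

```python
import math


def pose_rmsd(reference, docked):
    """In-place RMSD (in the units of the input, e.g. angstroms).

    Both arguments are lists of (x, y, z) tuples with atoms in the
    same order. Assumption: no fitting and no symmetry handling.
    """
    if not reference or len(reference) != len(docked):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for ref_atom, dock_atom in zip(reference, docked)
        for a, b in zip(ref_atom, dock_atom)
    )
    return math.sqrt(squared / len(reference))
```

Because the receptor is kept fixed during docking, the deviation is measured without re-fitting the ligand, so that both translation and rotation errors of the pose are penalized.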
AutoDock Vina (version 1.1.2). Download AutoDock Vina (for Linux, version 1.1.2) from http://vina.scripps.edu/download.html. For installation instructions, see http://vina.scripps.edu/manual.html#linux (see Note 2).
AutoDockTools (version 1.5.6). Go to the “MGLTools Web Portal” website (http://mgltools.scripps.edu/downloads), and download MGLTools (version 1.5.6; 32 or 64 bit, according to the Linux system in use). Install it following the instructions available from http://mgltools.scripps.edu/downloads/instructions/linux (see Note 3).
Clusterizer-DockAccessor (version 1.1). Download Clusterizer-DockAccessor from http://cheminf.com/software/clusterizer_dockaccessor/. Select “Clusterizer-DockAccessor 1.1 Software,” fill in the registration form, and click “Send.” An e-mail containing a download link will be sent. To install the software, follow the instructions in the user manual available from http://cheminf.com/software/clusterizer_dockaccessor/ (see Note 4).
DOCK 6 (version 6.8). Request a DOCK 6 license from the UCSF DOCK website (http://dock.compbio.ucsf.edu/Online_Licensing/index.htm). An e-mail will be sent when the license request is accepted. Follow the instructions in that e-mail to download the latest release of DOCK 6, and then follow the installation instructions in the DOCK 6 manual (http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm, see Note 5).
LigandBox. Download LigandBox from http://cheminf.com/software/ligandbox; fill in the registration form and click “Send.” An e-mail containing a download link will be sent. Follow the installation instructions available from http://cheminf.com/software/ligandbox (see Note 6).
UCSF Chimera. Download the latest Linux release of UCSF Chimera (32 or 64 bit, according to the Linux system in use) from https://www.cgl.ucsf.edu/chimera/download.html. Click the corresponding “Instructions” link on the same web page for installation instructions (see Note 8).
Human Coagulation Factor Xa (FXa) PDB Files. Six FXa co-crystal structures (PDB codes: 1EZQ, 1F0S, 1XKA, 2BOK, 2CJI, and 2FZZ) have been selected for this exercise (see Note 9). The complexes can be downloaded from the Protein Data Bank (PDB) (https://www.rcsb.org/). A convenient way to retrieve the structures is through the Linux command line interface (terminal): open the Linux terminal and type the code shown in Table 1 to create a parent directory (“FXA”) containing a child folder (“00_PDB”) and download the PDB files into it.
Command line sequence 1: retrieve structures from PDB
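As an alternative illustration of this retrieval step, the following Python sketch creates the “FXA/00_PDB” folders and fetches the six entries. It is not the chapter's shell sequence; the download URL pattern (`https://files.rcsb.org/download/<code>.pdb`) and the function names are assumptions made for this example.

```python
import os
import urllib.request

# PDB codes selected for this exercise.
PDB_CODES = ["1EZQ", "1F0S", "1XKA", "2BOK", "2CJI", "2FZZ"]


def pdb_url(code):
    """Build the download URL for one entry (URL pattern is an assumption)."""
    return "https://files.rcsb.org/download/{}.pdb".format(code.upper())


def download_structures(codes=PDB_CODES, target_dir=os.path.join("FXA", "00_PDB")):
    """Create the parent/child folders and save one PDB file per code."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for code in codes:
        destination = os.path.join(target_dir, code + ".pdb")
        urllib.request.urlretrieve(pdb_url(code), destination)  # needs network access
        saved.append(destination)
    return saved
```

Calling `download_structures()` from the directory that should contain “FXA” would be expected to leave six PDB files in “FXA/00_PDB”.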
Set the working directories to store the input/output files.
Prepare the PDB structures and input files for molecular docking: the PDB structures are “cleaned” of solvent molecules and non-interacting ions and then superimposed and protonated at physiological pH (using UCSF Chimera); unwanted lines are also removed from the PDB files (using Linux shell command lines). From each cleaned PDB complex, the corresponding protein (“lock”) and ligand (“key”) structures are extracted; randomized conformations of the keys are then generated (through Open Babel) for the subsequent “random conformation docking simulations” (see Note 10). The resulting PDB files are converted into PDBQT format (the input format for AutoDock Vina) using Python scripts available from MGLTools. Afterward, the grid box center coordinates and dimensions are computed using LigandBox and placed into the AutoDock Vina configuration file (see Note 9).
Run the experimental-conformation and random-conformation re-docking simulations (ECRD and RCRD) and cross-docking simulations (ECCD and RCCD) with AutoDock Vina, followed by cluster analysis and DA calculation using Clusterizer-DockAccessor.
Analysis of the results.
Before starting the protocol, it is advisable to test the system (see Notes 2–8).
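The individual tests described in Notes 2–8 could also be run in a single pass. The sketch below is one possible way to do so, assuming that all programs are expected on the system PATH under the names quoted in those notes; the helper names `missing_programs` and `dockpath_is_set` are hypothetical.

```python
import os
import shutil

# Program names as quoted in Notes 2-8 of this chapter.
REQUIRED_PROGRAMS = [
    "vina", "vina_split",
    "write_conformation_from_dlg.py", "prepare_ligand4.py", "prepare_receptor4.py",
    "Clusterizer.1.1.VINA.sh", "DockAccessor.1.1.sh",
    "LigandBox.sh", "obabel", "chimera",
]


def missing_programs(programs=REQUIRED_PROGRAMS):
    """Return the program names that cannot be found on the current PATH."""
    return [name for name in programs if shutil.which(name) is None]


def dockpath_is_set(environ=os.environ):
    """Check that the DOCKPATH variable required by Note 5 is defined."""
    return bool(environ.get("DOCKPATH"))
```

An empty list from `missing_programs()` together with `dockpath_is_set()` returning `True` only indicates that the programs can be located; it does not replace running each program once as described in the notes.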
3.1 Setting the Working Directories
3.2 Preparing PDB Structures and Input Files
3.2.1 Preparing the PDB Complexes
Command line sequence 3: launch UCSF Chimera from “00_PDB”
Removing Unnecessary Chains
Removing Solvent and Non-interacting Ions
Solvent molecules and non-interacting ions will now be removed (see Note 11).
Complex structures can be superimposed upon each other using the structure with the highest resolution and no gaps as reference.
Renaming and Saving the PDB Structures
Extracting Useful Data from PDB Files
3.2.2 Extrapolating Locks and Keys
Command line sequence 5: save LOCK and KEY PDBs
In “03_LOCK_KEY” folder, a total of six lock and six key PDB files are now available.
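The idea behind the lock/key extraction can be sketched in Python as follows. This is an illustration only, not the shell sequence used in the chapter: it assumes that the protein is stored in ATOM records, that the ligand is stored in HETATM records, and that the ligand residue name is known to the user (the name `LIG` in the usage note is a placeholder, not taken from the FXa entries).

```python
def split_lock_key(pdb_text, ligand_resname):
    """Split a cleaned PDB complex into protein (lock) and ligand (key) lines.

    Assumptions: protein atoms are ATOM records; the ligand is the set of
    HETATM records whose residue name (columns 18-20) equals ligand_resname.
    """
    lock_lines, key_lines = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ATOM":
            lock_lines.append(line)
        elif record == "HETATM" and line[17:20].strip() == ligand_resname:
            key_lines.append(line)
    return lock_lines, key_lines
```

For a complex read into the string `text`, `split_lock_key(text, "LIG")` would return the two line lists, which can then be written to separate PDB files.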
3.2.3 Preparing Ligand Random Conformation
Command line sequence 6: prepare ligands’ randomized conformations
Six new PDB files are now saved in “03_LOCK_KEY” folder.
3.2.4 Preparing PDBQT Input Files for AutoDock Vina
Command line sequence 7: prepare PDBQT files
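For illustration, the conversion can also be driven from Python by calling the MGLTools scripts named in Note 3. The sketch assumes that `prepare_ligand4.py` and `prepare_receptor4.py` are executable system-wide, as that note requires, and uses only their input (`-l`, `-r`) and output (`-o`) options; the file names in the usage note are hypothetical.

```python
import subprocess


def ligand_command(pdb_path, pdbqt_path):
    """Command that converts a ligand (key) PDB file into PDBQT."""
    return ["prepare_ligand4.py", "-l", pdb_path, "-o", pdbqt_path]


def receptor_command(pdb_path, pdbqt_path):
    """Command that converts a protein (lock) PDB file into PDBQT."""
    return ["prepare_receptor4.py", "-r", pdb_path, "-o", pdbqt_path]


def run(command):
    """Execute one conversion and raise an error if the script fails."""
    subprocess.run(command, check=True)
```

For example, `run(ligand_command("key.pdb", "key.pdbqt"))` would convert one ligand file; looping over the six locks and the experimental and randomized keys would reproduce the whole step.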
3.2.5 Setting the Grid Box
Command line sequence 8: enter the “04_PDBQTs” folder and launch LigandBox
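To make the notion of grid box center and dimensions concrete, the sketch below derives a box from the coordinates of the superimposed ligands. It is not LigandBox's algorithm, which is not described here: it simply assumes an axis-aligned box around all ligand atoms, enlarged by a hypothetical `padding` on each side.

```python
def grid_box(coords, padding=5.0):
    """Return (center, size) of an axis-aligned box enclosing all coordinates.

    coords: list of (x, y, z) tuples from all superimposed ligands.
    padding: extra margin per side, in the same units (assumed value).
    """
    if not coords:
        raise ValueError("at least one coordinate is required")
    mins = [min(point[i] for point in coords) for i in range(3)]
    maxs = [max(point[i] for point in coords) for i in range(3)]
    center = tuple((low + high) / 2.0 for low, high in zip(mins, maxs))
    size = tuple((high - low) + 2.0 * padding for low, high in zip(mins, maxs))
    return center, size
```

Using the atoms of all six superimposed ligands, rather than of a single one, keeps the same search space for re-docking and cross-docking runs.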
The resulting grid box can be explored through AutoDockTools (ADT).
To facilitate the visualization, display the proteins as ribbon and the ligands as ball-and-stick using the dashboard widget panel on the left (Fig. 9, top right). Click Grid ➔ Grid Box ➔ Set Dimensions, and then insert the previously obtained box parameters (from LigandBox, see Note 9) in the “Grid Options” panel to show the resulting grid box (Fig. 9, bottom).
Click File ➔ Exit to close ADT.
3.2.6 Setting the AutoDock Vina Configuration File
The grid box parameters can now be included in the AutoDock Vina configuration file:
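A minimal sketch of how such a configuration file could be generated is given here. The keys (`center_x`, `size_x`, `exhaustiveness`, `num_modes`, `seed`) are standard AutoDock Vina options; the numerical values in the usage note are placeholders, not the box parameters obtained in this chapter, and the function name is hypothetical.

```python
def vina_config(center, size, exhaustiveness=8, num_modes=9, seed=None):
    """Return the text of an AutoDock Vina configuration file.

    center, size: (x, y, z) tuples for the grid box.
    seed: optional fixed random seed, useful for reproducible runs.
    """
    lines = []
    for axis, value in zip("xyz", center):
        lines.append("center_{} = {:.3f}".format(axis, value))
    for axis, value in zip("xyz", size):
        lines.append("size_{} = {:.3f}".format(axis, value))
    lines.append("exhaustiveness = {}".format(exhaustiveness))
    lines.append("num_modes = {}".format(num_modes))
    if seed is not None:
        lines.append("seed = {}".format(seed))
    return "\n".join(lines) + "\n"
```

Writing `vina_config((1.0, 2.0, 3.0), (20.0, 20.0, 20.0))` to a text file gives a configuration that can be passed to AutoDock Vina; the receptor and ligand are left out so that the same file can be reused for every lock/key pair.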
3.3 Docking Simulations and Assessment
Go to the parent directory “FXA” (see Subheading 2, item 8) to start the docking simulations and assessment (using AutoDock Vina and Clusterizer-DockAccessor, respectively).
Command line sequence 9: set list files for iterative ECRD and RCRD docking simulations
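The lists that drive the iterative simulations amount to lock/key pairings, which can be sketched as follows. The sketch assumes that re-docking pairs each ligand with its cognate protein and that cross-docking pairs each ligand with every non-cognate protein; the function names are hypothetical.

```python
from itertools import product

PDB_CODES = ["1EZQ", "1F0S", "1XKA", "2BOK", "2CJI", "2FZZ"]


def redocking_pairs(codes=PDB_CODES):
    """(lock, key) pairs in which the ligand is docked into its own protein."""
    return [(code, code) for code in codes]


def crossdocking_pairs(codes=PDB_CODES):
    """(lock, key) pairs in which the ligand comes from a different complex."""
    return [(lock, key) for lock, key in product(codes, repeat=2) if lock != key]
```

With six complexes this gives 6 re-docking and 30 cross-docking pairs for each ligand starting conformation (experimental or randomized).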
3.3.1 Re-docking Simulations and Assessment
Command line sequence 10: ECRD (AutoDock Vina) and cluster analysis (Clusterizer)
Command line sequence 11: RCRD (AutoDock Vina) and cluster analysis (Clusterizer)
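Each docking run in these sequences corresponds to one AutoDock Vina call. The sketch below builds such a call with the `--config`, `--receptor`, `--ligand`, `--out`, and `--log` options of AutoDock Vina 1.1.2 (the `--log` option is not available in later 1.2.x releases). The file naming scheme is an assumption made for this example and may differ from the one used by the chapter's command lines.

```python
def vina_command(lock, key, config="config.txt", out_dir="VINA"):
    """Build the argument list for docking one key into one lock.

    lock, key: file stems of the PDBQT inputs (hypothetical naming).
    """
    stem = "{}_{}".format(lock, key)
    return [
        "vina",
        "--config", config,
        "--receptor", lock + ".pdbqt",
        "--ligand", key + ".pdbqt",
        "--out", "{}/{}_out.pdbqt".format(out_dir, stem),
        "--log", "{}/{}.log".format(out_dir, stem),
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; iterating over the re-docking pairs with experimental or randomized keys corresponds to the ECRD and RCRD runs, respectively.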
3.3.2 Cross-docking Simulations and Assessment
Command line sequence 13: ECCD (AutoDock Vina) and cluster analysis (Clusterizer)
Command line sequence 14: RCCD (AutoDock Vina) and cluster analysis (Clusterizer)
Command line sequence 15: ECCD/RCCD docking assessment (DockAccessor)
3.4 Analysis of the Results
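In the “VINA” folder, four log files can be found, reporting the docking accuracy values as well as the relevant RMSDs (see Notes 14 and 15). In what follows, BF, BD, and BC denote the best-fit pose (lowest RMSD), the best-docked pose (best score), and the best-cluster pose (best-scored pose of the most populated cluster), respectively. Consider first the results from the re-docking simulations: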
The BF DA value is 100% for both ECRD and RCRD. This indicates that, even when starting from a randomized ligand conformer (RCRD simulation), the docking sampling algorithm is able to explore the search space efficiently.
BD poses give a higher DA value (83.33%) than BC poses (66.67% and 50% from ECRD and RCRD, respectively).
The discrepancy between the BD (or BC) and the BF docking accuracy values reflects the limitations of the scoring function.
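To show how percentages of this kind arise from a set of RMSD values, the sketch below implements a commonly used definition of docking accuracy, DA = f(RMSD ≤ a) + 0.5 [f(RMSD ≤ b) − f(RMSD ≤ a)], where f is the fraction of ligands within a threshold. That DockAccessor applies exactly this formula, and the thresholds a = 2 Å and b = 3 Å, are assumptions of this example; the RMSD values in the usage note are invented.

```python
def docking_accuracy(rmsds, a=2.0, b=3.0):
    """Docking accuracy in percent for a list of RMSD values (angstroms).

    Poses within a count fully; poses between a and b count half.
    Thresholds a and b are assumed defaults, not taken from the chapter.
    """
    if not rmsds:
        raise ValueError("at least one RMSD value is required")
    total = len(rmsds)
    within_a = sum(1 for value in rmsds if value <= a) / total
    within_b = sum(1 for value in rmsds if value <= b) / total
    return 100.0 * (within_a + 0.5 * (within_b - within_a))
```

For the invented set `[0.5, 1.0, 1.5, 1.9, 2.5, 4.0]`, four poses count fully and one counts half, giving a DA of 75%.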
To analyze results from cross-docking simulations:
As expected, docking performance worsens in cross-docking, because the structures of the receptor bound to different ligands can differ considerably. This is shown by the lower docking accuracy values of the BF poses, which reflect the increased difficulty of sampling the experimental conformer of a ligand in non-cognate protein structures. Moreover, BD poses still outperform BC poses in terms of docking accuracy (Fig. 12), suggesting that, in this instance, the BD poses should be considered when docking ligands with no experimental pose information.
Command lines may be typed line by line (press enter or return at the end of each line) as reported in the tables (without the line number). Since temporary environment variables are set during the computation, the same Linux terminal window must be used throughout.
AutoDock Vina program files (“vina” and “vina_split”) must be executable throughout the whole system (e.g., copied or linked to /usr/local/bin). Test: open a terminal, write “vina” or “vina_split,” and then press enter (or return). Both programs should run.
Some Python scripts from the MGLTools “Utilities24” folder (http://autodock.scripps.edu/faqs-help/faq/where-can-i-find-the-python-scripts-for-preparing-and-analysing-autodock-dockings) must be executable throughout the whole system (e.g., copied or linked to /usr/local/bin), in particular “write_conformation_from_dlg.py,” “prepare_ligand4.py,” and “prepare_receptor4.py”. Test: open a terminal, write “write_conformation_from_dlg.py” or “prepare_ligand4.py” or “prepare_receptor4.py,” and then press enter (or return). All the programs should run.
Clusterizer-DockAccessor programs must be executable throughout the whole system (e.g., copied or linked to /usr/local/bin). Test: open a terminal, write “Clusterizer.1.1.VINA.sh” or “DockAccessor.1.1.sh,” and then press enter (or return). All the programs should run.
After installing DOCK 6.8, an environment variable called “DOCKPATH” (specifying the absolute path in which DOCK 6 is installed) must be set: for example, write “export DOCKPATH=/SOFTWARE/dock6” if DOCK 6 is installed in /SOFTWARE/dock6. The DOCKPATH variable must be set before starting the protocol.
The LigandBox program must be executable throughout the whole system (e.g., copied or linked to /usr/local/bin). Test: open a terminal, write “LigandBox.sh,” and then press enter (or return). The program should run.
The Open Babel program must be executable throughout the whole system. Test: open a terminal, write “obabel,” and then press enter (or return). The program should run.
The UCSF Chimera program should be executable throughout the whole system. Test: open a terminal, write “chimera,” and then press enter (or return). The program should run.
A reduced set of FXa co-crystal structures was selected for the purpose of this exercise. Since the crystal structures of PDB entries can be revised over time, the current atomic coordinates can differ from those used when preparing this chapter; as a consequence, the grid box center XYZ coordinates and size can differ from those reported herein.
The use of input ligand structures with randomized conformation is preferred, since it prevents biases toward the starting conformation in the sampling algorithm.
Because PDB files vary in content and format, the preparation of other PDB entries may differ from the one described herein. A preliminary inspection of the PDB files under consideration is therefore generally necessary.
UCSF Chimera assigns protonation states at physiological pH. However, a visual inspection of the protonated ligands and proteins is always recommended. If protonation at different pH is required, Open Babel is a valid alternative to be considered.
Energy minimization of 3D structures solved by X-ray crystallography is generally carried out (before docking simulation) to reduce nonphysical contacts or interactions and optimize molecular geometry. In the present exercise, energy minimization is not addressed since it is beyond the scope of this work.
Since AutoDock Vina uses a random seed for the search algorithm, a certain variability of the docking results is expected.
More extensive analyses (not discussed in this exercise) can be performed by considering the RMSD values of each docked ligand from the re-docking and cross-docking results. For example, it is possible to detect whether (1) the simulation fails when docking a certain ligand scaffold (e.g., when higher RMSD values are obtained by docking a congeneric series of compounds) or (2) a representative structure from the protein ensemble can be used effectively to dock new ligands (e.g., when cross-docking different ligands into the same protein conformer of the ensemble gives lower RMSDs). Quantifying the DA results can also help the user tune the docking parameters (e.g., AutoDock Vina’s exhaustiveness) to achieve optimal performance.
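The second kind of analysis mentioned in this note (identifying a representative protein conformer) can be sketched as follows. The example assumes that the cross-docking RMSDs have already been collected into a dictionary keyed by (lock, key) pairs; the function names and the values in the usage note are hypothetical.

```python
from statistics import mean


def mean_rmsd_per_lock(results):
    """Average the RMSD of all ligands docked into each protein conformer.

    results: dict mapping (lock, key) tuples to an RMSD value.
    """
    per_lock = {}
    for (lock, _key), value in results.items():
        per_lock.setdefault(lock, []).append(value)
    return {lock: mean(values) for lock, values in per_lock.items()}


def most_representative_lock(results):
    """Return the conformer with the lowest mean RMSD over its ligands."""
    means = mean_rmsd_per_lock(results)
    return min(means, key=means.get)
```

With invented values such as `{("A", "x"): 1.0, ("A", "y"): 2.0, ("B", "x"): 3.0, ("B", "y"): 5.0}`, conformer “A” (mean 1.5) would be preferred over “B” (mean 4.0). The mean is only one possible criterion; the median or the fraction of poses under a threshold could be used instead.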
F.B. thanks Prof. Garland R. Marshall (Washington University School of Medicine in St. Louis, MO) for supporting and funding the design and development of the Clusterizer-DockAccessor protocol; Dr. Chris M. W. Ho (Drug Design Methodologies, LLC, St. Louis, MO) and Ms. Mariama Jaiteh (Uppsala University, Uppsala, Sweden) for providing insightful comments.
5. Ballante F, Reddy DR, Zhou NJ et al (2017) Structural insights of SmKDAC8 inhibitors: targeting schistosoma epigenetics through a combined structure-based 3D QSAR, in vitro and synthesis strategy. Bioorg Med Chem 25(7):2105–2132. https://doi.org/10.1016/j.bmc.2017.02.020
6. Kubinyi H (1993) 3D QSAR in drug design. Volume 1: theory methods and applications. Three-dimensional quantitative structure activity relationships, Vol. 1. Springer, Berlin
8. Bursulaya BD, Totrov M, Abagyan R et al (2003) Comparative study of several algorithms for flexible ligand docking. J Comp Aided Molec Design 17(11):755–763. https://doi.org/10.1023/B:Jcam.0000017496.76572.6f
17. Scharer K, Morgenthaler M, Paulini R et al (2005) Quantification of cation-pi interactions in protein-ligand complexes: crystal-structure analysis of Factor Xa bound to a quaternary ammonium ion ligand. Angew Chem Int Ed Eng 44(28):4400–4404. https://doi.org/10.1002/anie.200500883
19. Pinto DJ, Orwat MJ, Quan ML et al (2006) 1-[3-Aminobenzisoxazol-5’-yl]-3-trifluoromethyl-6-[2’-(3-(R)-hydroxy-N-pyrrolidinyl)methyl-[1,1’]-biphen-4-yl]-1,4,5,6-tetrahydropyrazolo-[3,4-c]-pyridin-7-one (BMS-740808) a highly potent, selective, efficacious, and orally bioavailable inhibitor of blood coagulation factor Xa. Bioorg Med Chem Lett 16(15):4141–4147. https://doi.org/10.1016/j.bmcl.2006.02.069
23. The Open Babel Package, version 2.4.1. http://openbabel.org. Accessed June 2017