Abstract
The prediction of the structure of protein-protein complexes based on structures or structural models of the isolated partners is of increasing importance for structural biology and bioinformatics. The ATTRACT program can be used to perform systematic docking searches based on docking energy minimization. It is part of the object-oriented PTools library written in Python and C++. The library contains various routines to manipulate protein structures, to prepare and perform docking searches, and to analyze docking results. It is also intended to facilitate further methodological developments in the area of macromolecular docking that can be easily integrated. Here, we describe the application of PTools to perform systematic docking searches and to analyze the results. In addition, the possibility of performing multi-component docking is presented.
1 Introduction
The majority of biological processes involve protein-protein interactions. Since only a small fraction of real and putative protein-protein interactions in a cell can be determined experimentally, the realistic prediction of protein-protein complex structures (protein-protein docking) is of increasing importance. The ATTRACT program (1–7) employs energy minimization in the rotational and translational degrees of freedom (plus possible conformational variables) of one protein partner (the ligand) with respect to the second protein (the receptor). It can be used as a stand-alone program but has also been integrated into the PTools molecular docking library. Flexibility of the partner structures can be taken into account by representing flexible surface side chains (and also loops) as multiple conformational copies. The ATTRACT docking minimization employs a reduced or coarse-grained protein model that is intermediate between a residue-based representation and full atomic resolution. Each residue is represented by up to four pseudo atoms (two for the backbone and up to two for the side chain), approximately accounting for the dual character of some amino acid side chains (e.g., hydrophobic and hydrophilic parts of a side chain). Small amino acid side chains (Ala, Asp, Asn, Ser, Thr, Val, Pro) are represented by one pseudo atom (geometric mean of side chain heavy atoms), whereas larger and more flexible side chains are represented by two pseudo atoms (1,8).
The repulsive and attractive LJ-parameters approximately describe the size and physico-chemical character of the side chain chemical groups. Systematic tests of the model on “bound” protein partners indicate that rigid-body minimization of the experimental complex structures yields energy-minimized complex structures with an Rmsd (root mean square deviation) of the ligand protein from the experimental position of ~1–2 Å (1,5,8), which is comparable to energy minimization using atomistic models. A schematic view of the various steps of a docking search and the form of the energy function that describes the effective interactions between coarse-grained centres is given in Fig. 1. The parameters have been systematically optimized by comparing the ranking of near-native solutions with respect to non-native decoy complexes (8). The energy function consists of pair-wise soft Lennard-Jones-type functions and an electrostatic interaction term with a distance-dependent dielectric constant (ε(r) = 15r) for the interaction of charged residues. As illustrated in Fig. 1, the scoring function differs from a standard Lennard-Jones-type function in that it contains a saddle point instead of an energy minimum for certain types of pseudo atom pairs (those that are repulsive).
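The shape of such a scoring function can be illustrated with a minimal Python sketch. It assumes the soft 8-6 Lennard-Jones-type form commonly quoted for ATTRACT; this form, the function names and the conversion factor of 332 are assumptions made for illustration and are not taken from the text. For repulsive pairs, the branch below the stationary distance is shifted so that the curve stays continuous and shows a saddle point instead of a minimum.

```python
import math

# Conventional Coulomb conversion factor in kcal*Angstrom/(mol*e^2); an assumption here.
COULOMB = 332.0


def soft_lj(r, radius, eps, attractive=True):
    """Soft 8-6 Lennard-Jones-type pair energy (assumed functional form).

    Attractive pairs have an energy minimum at r_min = radius * sqrt(4/3).
    Repulsive pairs have a saddle point at the same distance instead.
    """
    base = eps * ((radius / r) ** 8 - (radius / r) ** 6)
    if attractive:
        return base
    r_min = radius * math.sqrt(4.0 / 3.0)  # stationary point of the 8-6 form
    e_min = eps * 27.0 / 256.0             # magnitude of the energy at r_min
    if r > r_min:
        return -base                       # mirrored branch: purely repulsive tail
    return 2.0 * e_min + base              # shifted branch: continuous at r_min


def electrostatic(r, q1, q2):
    """Coulomb term with the distance-dependent dielectric eps(r) = 15 r."""
    return COULOMB * q1 * q2 / (15.0 * r * r)
```

With these definitions, an attractive pair reaches its lowest energy of -27/256 times eps at r_min, whereas a repulsive pair passes through +27/256 times eps at the same distance with zero slope and keeps rising toward shorter distances.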
For systematic docking studies one of the proteins (usually the smaller protein, called the ligand protein) is used as a probe and placed at various positions on the surface of the second, fixed (receptor) protein. To select regularly spaced starting points, a probe radius that is slightly larger than the maximum distance of any ligand atom from the ligand center is used. At each starting position on the receptor protein various initial ligand protein orientations are generated. The docking from each start position consists of a series of energy minimizations of the ligand protein with respect to the receptor protein. During the first minimization a harmonic restraint between the center of the fixed protein and the closest Cα-pseudo atom of the ligand protein can be applied. This first minimization serves to generate a close contact between the two proteins. For the subsequent energy minimizations the ligand protein is typically free to move to the closest energy minimum.
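The placement of starting points can be illustrated with a simplified Python sketch. It assumes a roughly spherical receptor and a probe radius derived from the extent of the ligand, and it distributes points on a sphere with a Fibonacci lattice. This is a stand-in for, not a copy of, the surface-based procedure used by PTools; the function names and the margin value are hypothetical.

```python
import math


def probe_radius(ligand_coords, margin=2.0):
    """Largest distance of a ligand atom from the ligand center, plus a margin (Angstrom)."""
    n = len(ligand_coords)
    center = tuple(sum(c[k] for c in ligand_coords) / n for k in range(3))
    return max(math.dist(c, center) for c in ligand_coords) + margin


def sphere_points(n_points, radius, center=(0.0, 0.0, 0.0)):
    """Approximately evenly spaced points on a sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n_points):
        z = 1.0 - (2.0 * i + 1.0) / n_points  # z runs from near +1 to near -1
        rho = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        phi = i * golden_angle
        points.append((center[0] + radius * rho * math.cos(phi),
                       center[1] + radius * rho * math.sin(phi),
                       center[2] + radius * z))
    return points
```

For a real, non-spherical receptor the points would have to follow the molecular surface, which is what the dedicated PTools script takes care of.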
The original ATTRACT program was written in Fortran together with a set of auxiliary programs to set up docking simulations. The program is still used and further developments are supported. Indeed, a number of flexible docking options, such as the inclusion of soft normal mode directions as additional variables during docking, are so far only available in the Fortran version of the program. However, in order to facilitate future methodological developments and to make it sufficiently flexible for new functionalities, it was recently embedded in the docking library PTools (9), which relies on a modular, object-oriented implementation based on Python/C++ coupling. The PTools library has been designed to perform assembly tasks efficiently and to ease further development without compromising speed or correctness.
PTools can handle both coarse-grained and atomic resolution representations of biomolecular structures. It can be used for the preparation, setup, running and analysis of docking minimizations following the ATTRACT protocol. It can handle docking problems with two partners as well as the docking of multiple protein molecules. Recent extensions include the prediction of putative binding sites on proteins and the possibility of including this information during docking based on a reweighting of the interaction scoring function. It is also possible to perform protein-DNA docking searches (5,10). The workflow of using the PTools package, performing interface prediction and running a systematic protein-protein docking search is explained in the Methods section.
2 Methods
2.1 Setting Up a Docking Simulation Using PTools
PTools can be used to perform docking searches, but the library also contains several methods and scripts to load and manipulate structures (an overview is given in Fig. 2). An introduction to some of these options is given in the Notes section (see Notes 1 and 2). By default the PTools library includes the knowledge-based coarse-grained force field used by the docking program ATTRACT for protein-protein and protein-DNA docking. The coarse-grained representation of the macromolecule can be generated with the “reduce.py” script. For a docking simulation on an already known complex one can first load the PDB (Protein Data Bank) file and split it into the two partners, the receptor and ligand proteins, respectively (see Note 2 for structure preparation). It is possible to perform this process within a C++ program as a series of method calls (compare Note 3 on the PTools documentation):
These C++ commands can also be conveniently integrated into a Python script (via the Python bindings) that can be adapted for application to other protein docking cases.
In the following, only the Python code needed to set up a protein-protein docking search is described. Of course, instead of splitting a complex structure as described above, the two partner proteins can also be loaded separately. Using the “reduce.py” script the two protein structures are transformed into a coarse-grained representation.
In the following, the file extension “.py” indicates a Python script (the $ sign marks a command line on which a Python script is invoked). The “.red” filename suffix can be used to easily distinguish reduced coordinate files from regular PDB files. The format of the coarse-grained model is an extended PDB format with additional columns for pseudo-atom type, charge, conformational copy flag and re-weighting of interactions, respectively.
2.2 Inclusion of Experimental and Bioinformatics Data on Putative Binding Region
Although it is possible to perform a docking search without any knowledge of the interaction surface regions, it can be helpful to include such information. In many protein-protein docking cases some knowledge of putative binding regions on one or both protein partners is available. It is possible to include this additional data directly during the ATTRACT docking search. This is achieved by giving each interaction a weight that can be modulated by external data. The weight data is stored in an extra column in the reduced PDB file and can be generated within the PTools approach. The standard weight for each interaction is 1 and indicates that the original ATTRACT score is used. Weights of up to 2 can be used to linearly increase the contribution of selected atoms. Weights lower than 1 decrease the interaction with those atoms. It is possible to change the weights of individual pseudo atoms, for example, if there is experimental evidence for single residues participating in binding. However, it is also possible to include predictions from bioinformatics binding site prediction web servers. This option is outlined for the metaPPISP server (11), which generates a consensus prediction from several binding site prediction methods. In a comparative evaluation of binding site prediction servers the metaPPISP server was among the top performing prediction servers (11). The “metaPPISPprediction.py” Python script sends the protein files directly to the web server (an Internet connection and an installation of the wget program are required), waits for the results and maps the prediction onto the original proteins. As a result, PDB files with the suffix “_predicted.pdb” are written with binding site probabilities in the range of 0.0–1.0 included in the B-factor column of the PDB files.
The binding site prediction can then be encoded as weights in the coarse-grained protein representations:
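As a rough illustration of the idea (not the PTools script itself), the following Python sketch reads the probabilities from the B-factor column of standard PDB ATOM records and maps them linearly onto weights between 1 and 2. The linear mapping, the function names and the per-residue dictionary are assumptions made for illustration; the way PTools writes the weights into the reduced file is not reproduced here.

```python
def probability_to_weight(probability, max_weight=2.0):
    """Linear map: probability 0 -> weight 1 (original score), 1 -> max_weight."""
    p = min(max(probability, 0.0), 1.0)  # clamp to the documented 0.0-1.0 range
    return 1.0 + (max_weight - 1.0) * p


def residue_weights(pdb_lines, max_weight=2.0):
    """Read per-residue probabilities from the B-factor column of ATOM records.

    Returns {(chain_id, residue_number): weight}. If the atoms of one residue
    carry different values, the last atom read determines the weight.
    """
    weights = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]               # PDB column 22
            resseq = int(line[22:26])      # PDB columns 23-26
            bfactor = float(line[60:66])   # PDB columns 61-66
            weights[(chain, resseq)] = probability_to_weight(bfactor, max_weight)
    return weights
```

A residue predicted with probability 0.75 would thus receive a weight of 1.75, while residues without any predicted binding propensity keep the standard weight of 1.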
A third option is to directly use a binding site prediction method implemented in PTools based on electrostatic desolvation profiles (12). The method is implemented in PTools as a series of scripts to create input files and perform the necessary calculations. It finally generates interaction weights for each atom according to the prediction which can be used in the same way as described above to bias the docking towards solutions compatible with predicted binding regions.
2.3 Performing Systematic Docking Using the ATTRACT Docking Program
The ATTRACT docking program is implemented as a Python script using the PTools library. This script is also provided with the PTools package. Note that it is also possible to use the Fortran version of the ATTRACT program, which uses the same force field and input files. The Fortran version contains a few options for including side chain and global flexibility based on normal mode variables that are not yet implemented in the released Python/C++ version. ATTRACT performs a systematic docking minimization of the interaction energy, with the ligand (mobile partner) being placed at regular positions and orientations around the surface of the receptor (fixed partner) at a distance slightly larger than its largest dimension. For each starting position, about 200–400 initial ligand orientations are generated. Starting from each of these geometries, an energy minimization (quasi-Newton minimizer) is performed using the translational and rotational degrees of freedom of the ligand. Different Python scripts are provided with the ATTRACT program to set up the input files needed by the ATTRACT docking script (see Note 4 for an overview). The docking script requires a receptor and a ligand structure in coarse-grained representation (see above), an input file (called “attract.inp”; see Note 5 for further information) and a parameter file (“parmw.par”). The parameter file contains all pair-wise effective radii and the repulsive as well as attractive Lennard-Jones-type parameters needed to set up the force field for the docking search (8). Finally, the “attract.inp” file contains all the specifications required to run the docking simulation (number of minimization steps, cutoff, etc.). It is further explained in the PTools documentation and the Notes section. Several minimizations (with decreasing cutoff) are used, and the pairlist used to calculate the interactions is only generated at the beginning of each minimization.
In order to perform a systematic docking search, the Python script “translate.py” (see Note 6 for further information on the generation of starting points) needs to be invoked to generate regularly spaced starting points on the surface of one of the protein partners (typically the larger partner, which is also called the receptor protein).
The various orientations of the mobile partner protein (called the ligand) are stored in the “rotation.dat” file, which can also be modified by the user. A systematic docking search can now be started using the “Attract.py” script:
Attract docking simulations can be easily launched on distributed supercomputers since a single run option is already implemented in the PTools library. The option -t specifies which starting position (corresponding to one line in the “translation.dat” file) of the ligand should be considered for the docking simulation. Attract can then be launched in a distributed mode with selected tasks for individual docking runs. Output files can be concatenated using a simple cat command. For example, a docking search starting only from position 18 on the receptor surface (but including all starting orientations) is requested by passing the value 18 to the -t option.
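Independently of the exact command line, the distribution of starting positions over several workers can be sketched in a few lines of Python. The helper name and the round-robin scheme are hypothetical, and the sketch assumes that starting positions are numbered from 1, following the line order of the starting-point file.

```python
def partition_positions(n_positions, n_workers):
    """Assign 1-based starting-position indices to workers in round-robin order.

    Returns one list of indices per worker; each index would be passed to a
    separate single-position docking run.
    """
    if n_workers < 1:
        raise ValueError("at least one worker is required")
    return [list(range(worker + 1, n_positions + 1, n_workers))
            for worker in range(n_workers)]
```

Round-robin assignment keeps the load reasonably balanced when neighbouring starting positions need similar minimization times.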
2.4 Analysis of a Docking Simulation
A systematic docking search typically results in a large number of putative solutions, which can be ranked according to the docking score. For a search over the complete surface of the target receptor protein the program needs ~6–15 h on a single CPU, depending on the size of the protein partners and the number of starting arrangements (see Note 7 for possible failures of docking runs). Depending on the number of available CPUs, this time can be reduced dramatically by employing the distributed run option explained above. It is possible to cluster the docking solutions using the “cluster.py” script, which can group nearly identical structures without requiring a preselected number of desired clusters. In the following command, the output file of the docking simulation (“Docking.out”) and the protein ligand (“ligand.red”) in its reduced form are used for the clustering analysis.
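The grouping idea can be illustrated without PTools by a minimal Python sketch of greedy, energy-ordered clustering with an Rmsd threshold. It is an assumption that this resembles what “cluster.py” does internally; the function names, the data layout and the 1 Å default cutoff are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists."""
    squared = sum((p - q) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


def greedy_cluster(solutions, cutoff=1.0):
    """Cluster (energy, coords) solutions without fixing the number of clusters.

    Solutions are visited from lowest to highest energy. Each one joins the
    first cluster whose representative lies within `cutoff`, otherwise it
    becomes the representative of a new cluster.
    """
    clusters = []
    for energy, coords in sorted(solutions, key=lambda s: s[0]):
        for cluster in clusters:
            if rmsd(coords, cluster["coords"]) <= cutoff:
                cluster["count"] += 1
                break
        else:
            clusters.append({"energy": energy, "coords": coords, "count": 1})
    return clusters
```

Because the lowest-energy member seeds each cluster, every cluster is reported with its best score and with a count that plays the role of the cluster weight.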
Each line of the clustering output file identifies a unique structure (each solution is a unique combination of translation and rotation), its energy and a weight representing how many structures are found in this cluster. With the help of the “Extract.py” script it is possible to extract single solutions and write PDB files from the output file of a systematic search by indicating the appropriate translation and rotation number of the docking solution (Ntrans and Nrot):
If the structure of the bound complex is known the quality of the predicted complex structures can be evaluated by calculating the Rmsd of the ligand protein or the interface Rmsd and the fraction of native contacts of the docking solutions.
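One of these quality measures, the fraction of native contacts, can be sketched in Python as follows. The sketch assumes that receptor and ligand are given as lists of (x, y, z) coordinates in the same order for the native and the docked ligand, and it uses a 5 Å distance cutoff; the cutoff and the function names are assumptions for illustration.

```python
import math


def contacts(receptor, ligand, cutoff=5.0):
    """Set of (receptor_index, ligand_index) pairs closer than `cutoff` Angstrom."""
    return {(i, j)
            for i, a in enumerate(receptor)
            for j, b in enumerate(ligand)
            if math.dist(a, b) < cutoff}


def fraction_native_contacts(receptor, native_ligand, docked_ligand, cutoff=5.0):
    """Fraction of contacts of the native complex that are kept in the docked one."""
    native = contacts(receptor, native_ligand, cutoff)
    if not native:
        return 0.0
    docked = contacts(receptor, docked_ligand, cutoff)
    return len(native & docked) / len(native)
```

A value of 1.0 means that every native contact is reproduced, whereas a value close to 0 indicates that the ligand was placed at a different site.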
2.5 Multi-Protein Docking Simulation
In addition to systematic docking searches on two protein partners it is possible to perform single docking minimizations on 2 or more proteins after generating coarse-grained representations of each protein. The sequence of necessary commands is given below:
After loading the force field parameters,
the three proteins are added to the docking minimization run using the AddLigand method (it is, in principle, possible to add an arbitrary number of partner proteins):
The protein A is selected as the fixed receptor protein using,
and docking minimization is invoked by,
After minimization, the “lbfgs” object contains the energy of the minimized system as well as the final coordinates and other variables of the docking system. The minimizer also stores the different states of the system for each minimization step. The commands for performing single docking minimizations with multiple partners can be used in new scripts to implement systematic strategies for multi protein docking.
3 Notes
1. To use PTools, make sure that the PTools directory is in the PATH and PYTHONPATH of your session (e.g., set them to /my/path/to/ptools and /my/path/to/ptools/PyAttract, respectively). Remember to import the PTools library in newly created Python scripts.
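Whether a module can be found on the current search path can be checked before a docking script is started. The following minimal sketch assumes a recent Python 3 interpreter and uses only the standard library; the helper name is hypothetical, and the module name “ptools” in the usage note is an assumption about how the library is imported.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None
```

Calling is_importable("ptools") before the actual import gives a clear yes/no answer instead of a traceback when PYTHONPATH is not set correctly.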
2. Protein structure files should be inspected prior to docking and checked for completeness of the structure. Missing atoms or residues in the protein files should be added, possibly with the aid of external programs. In general, it is a prerequisite that the structure files are formatted correctly in the PDB file format.
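A simple completeness check can be sketched in Python by looking for jumps in the residue numbering of the ATOM records. The sketch relies on the fixed-column PDB layout (chain identifier in column 22, residue number in columns 23–26) and ignores insertion codes; the function name is hypothetical. A reported gap is only a hint, since some structures are legitimately numbered with jumps.

```python
def find_residue_gaps(pdb_lines):
    """Return (chain, last_residue, next_residue) for jumps in residue numbering."""
    gaps = []
    last_seen = {}  # chain identifier -> last residue number read
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]            # PDB column 22
        resseq = int(line[22:26])   # PDB columns 23-26
        previous = last_seen.get(chain)
        if previous is not None and resseq - previous > 1:
            gaps.append((chain, previous, resseq))
        last_seen[chain] = resseq
    return gaps
```

An empty result does not prove that a structure is complete (missing side chain atoms are not detected), so visual inspection remains advisable.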
3. Extensive documentation is provided for the PTools library, which goes beyond the description given above. It includes a tutorial describing every step from the compilation of the library source code to full protein-protein and also protein-DNA docking simulations. The C++ API is also automatically parsed by Doxygen (13), which generates the documentation with an exhaustive description of every class and member function within the library.
4. In order to perform a systematic docking run the following files need to be in the working directory: “attract.inp” (Attract docking input file; see Note 5); “translate.dat” (stores the starting placements of the ligand protein with respect to the receptor protein; see Note 6); “rotation.dat” (stores a set of starting orientations of the ligand protein); “parmw.par” (force field scoring parameters for docking). In addition, a ligand reference structure file, termed “standard.pdb”, can be used by the program for comparison with all docked structures (arbitrary filename in PTools with the --ref command option).
5. The ATTRACT docking input file attract.inp is explained in detail in the PTools and ATTRACT manuals. For performing a docking search the file must be present in the working directory. An example input file with detailed description is given below:
The first row in the input indicates the number of successive minimizations (four in this example); the two 0s on the first line indicate that no soft modes for receptor or ligand are used.
Second row: restraining coordinates for pushing the ligand onto the surface of the protein (usually the center coordinates of the receptor protein); the fourth term is the force constant for the restraining potential (should not be larger than 0.001 RT/Å²).
The next four lines indicate the minimization conditions for each of the four docking minimizations (the number of lines must equal the number of minimizations chosen in the first line). Each line consists of the following entries:
Column 1. number of energy minimization (EM) steps
Column 2. minimization method ((1) steepest descent (only used for testing), (2) variable metric)
Column 3. include rotational forces (if = 1)
Column 4. include translational forces (if = 1)
Column 5. include soft modes for receptor (if = 1)
Column 6. include soft modes for ligand (if = 1)
Column 7. number of ligand soft modes
Column 8. number of receptor soft modes
Column 9. add a restraining contribution using the parameters from the second input line (if = 1)
Column 10. squared cutoff radius (e.g., 100.0 corresponds to a cutoff of 10.0 Å)
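The layout described above can be read with a few lines of Python. The following sketch assumes whitespace-separated fields exactly as listed; the parser, the field names and all numerical values in the example text are hypothetical and do not reproduce the input file distributed with PTools.

```python
from collections import namedtuple

# One entry per minimization line, in the column order described above.
MinimizationStep = namedtuple(
    "MinimizationStep",
    "steps method rotate translate receptor_modes ligand_modes "
    "n_ligand_modes n_receptor_modes restrain cutoff_sq",
)

# Hypothetical example values, for illustration only.
EXAMPLE_INPUT = """\
4 0 0
0.0 0.0 0.0 0.0005
30 2 1 1 0 0 0 0 1 3000.0
50 2 1 1 0 0 0 0 0 1500.0
100 2 1 1 0 0 0 0 0 500.0
500 2 1 1 0 0 0 0 0 50.0
"""


def parse_attract_inp(text):
    """Parse the three parts of the input: header, restraint line, minimization lines."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_min = int(lines[0][0])
    x, y, z, force_constant = (float(v) for v in lines[1][:4])
    steps = []
    for fields in lines[2:2 + n_min]:
        integers = [int(v) for v in fields[:9]]
        steps.append(MinimizationStep(*integers, float(fields[9])))
    if len(steps) != n_min:
        raise ValueError("number of minimization lines differs from the first line")
    return {"restraint_center": (x, y, z),
            "force_constant": force_constant,
            "steps": steps}
```

For example, parse_attract_inp(EXAMPLE_INPUT)["steps"][0].cutoff_sq gives the squared cutoff of the first minimization, whose square root is the cutoff distance in Å.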
The selectivity of the current energy function is optimized for a short cutoff (rcut² = 50 Å²). A series of minimizations (with decreasing cutoff) is necessary because the pairlist used to calculate the interactions is only generated at the beginning of each minimization (the variable metric minimizer converges faster if the pairlist is calculated only once). Note that the option of including pre-calculated normal modes as additional variables accounting for the flexibility of the binding partners is currently only available in the Fortran version of the ATTRACT program.
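The role of the squared cutoff can be illustrated with a minimal Python sketch of a pairlist. It compares squared distances directly with the squared cutoff, so no square root is needed; the function name and the plain double loop are assumptions for illustration and say nothing about how PTools implements this step.

```python
def build_pairlist(receptor, ligand, cutoff_sq):
    """Return all (receptor_index, ligand_index) pairs within the squared cutoff."""
    pairs = []
    for i, (xa, ya, za) in enumerate(receptor):
        for j, (xb, yb, zb) in enumerate(ligand):
            dist_sq = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            if dist_sq <= cutoff_sq:
                pairs.append((i, j))
    return pairs
```

Because the list is built once per minimization, pairs that come closer than the cutoff only during the minimization are picked up in the next one, which is why several successive minimizations with decreasing cutoff are run.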
6. Starting points for systematic docking are generated with the translate.py script as described before and are stored by default in the “translate.dat” file. With the default settings, starting points are placed approximately evenly at the surface of the receptor with a distance between starting points of approximately 7–8 Å. This value can be changed using the -d option, which also changes the number of docking runs. Adjusting this parameter might be useful depending on the size of the system or the available computation time. For example, if the binding region is approximately known, one can generate starting points at increased density and subsequently eliminate those beyond a cutoff distance from the known binding region.
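The elimination step mentioned at the end of this note can be sketched in Python as a simple distance filter. The sketch assumes that the starting points and the center of the known binding region are available as (x, y, z) tuples; the function name is hypothetical, and reading or writing the starting-point file is not shown.

```python
import math


def filter_starting_points(points, site_center, max_distance):
    """Keep only starting points within `max_distance` Angstrom of the binding region."""
    return [p for p in points if math.dist(p, site_center) <= max_distance]
```

The cutoff should be chosen generously, because the starting points lie at the probe distance from the receptor surface rather than on the binding site itself.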
7. If Attract.py fails to run or stops with import error messages, first make sure that the PYTHONPATH is set correctly and that the PTools library is imported in any new Python script (see Note 1). If Attract.py still fails to run, make sure that all necessary files are in the working folder (or in the PATH of the session) (see also Note 4). Another source of errors can be an incorrect format of the PDB start structure files. It is always a good idea to look at the reduced structures with a visualization program before docking.
The PTools library has been developed and extensively tested for Python versions 2.4 and 2.5. Some special implementations of Python can lead to a “bus error” while trying to import the PTools libraries. This can be solved by using the standard Python installed by the OS or, if not available, by reinstalling a clean Python version 2.4 or 2.5.
References
1. Zacharias, M. (2003) Protein-protein docking with a reduced protein model accounting for side-chain flexibility. Protein Sci. 12, 1271–1282.
2. May, A., and Zacharias, M. (2005) Accounting for global protein deformability during protein-protein and protein-ligand docking. Biochim. Biophys. Acta 1754, 225–231.
3. Zacharias, M. (2005) ATTRACT: Protein-protein docking in CAPRI using a reduced protein model. Proteins 60, 252–256.
4. Bastard, K., Prévost, C., and Zacharias, M. (2006) Accounting for loop flexibility during protein-protein docking. Proteins 62, 956–969.
5. Poulain, P., Saladin, A., Hartmann, B., and Prévost, C. (2008) Insights on protein-DNA recognition by coarse-grain modeling. J. Comput. Chem. 29, 2582–2592.
6. May, A., and Zacharias, M. (2008) Protein-protein docking in CAPRI using ATTRACT to account for global and local flexibility. Proteins 69, 774–780.
7. Zacharias, M. (2010) Accounting for conformational changes during protein-protein docking. Curr. Opin. Struct. Biol. 20, 180–186.
8. Fiorucci, S., and Zacharias, M. (2010) Binding site prediction and improved scoring during flexible protein-protein docking with ATTRACT. Proteins 78, 3131–3139.
9. Saladin, A., Fiorucci, S., Poulain, P., Prévost, C., and Zacharias, M. (2009) PTools: an opensource molecular docking library. BMC Struct. Biol. 9, 27–38.
10. Saladin, A., Amourda, C., Poulain, P., Férey, N., Baaden, M., Zacharias, M., and Delalande, O. (2010) Modeling the early stage of DNA sequence recognition within RecA nucleoprotein filaments. Nucleic Acids Res. 38, 6313–6323.
11. Qin, S.B., and Zhou, H.-X. (2007) meta-PPISP: a meta web server for protein-protein interaction site prediction. Bioinformatics 23, 3386–338.
12. Fiorucci, S., and Zacharias, M. (2010) Prediction of protein-protein interaction sites using electrostatic desolvation profiles. Biophys. J. 98, 1921–1930.
13. van Heesch, D. (2008) Doxygen: Source code documentation generator tool. [http://www.stack.nl/~dimitri/doxygen/].
Acknowledgements
We thank the Deutsche Forschungsgemeinschaft (DFG) for financial support (grant Za-153/5-3) to MZ.
© 2012 Springer Science+Business Media, LLC
Cite this protocol
Schneider, S., Saladin, A., Fiorucci, S., Prévost, C., Zacharias, M. (2012). ATTRACT and PTOOLS: Open Source Programs for Protein–Protein Docking. In: Baron, R. (eds) Computational Drug Discovery and Design. Methods in Molecular Biology, vol 819. Springer, New York, NY. https://doi.org/10.1007/978-1-61779-465-0_15
DOI: https://doi.org/10.1007/978-1-61779-465-0_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-61779-464-3
Online ISBN: 978-1-61779-465-0
eBook Packages: Springer Protocols