Web application for studying the free energy of binding and protonation states of protein–ligand complexes based on HINT
A public web server performing computational titration at the active site in a protein–ligand complex has been implemented. This calculation is based on the Hydropathic interaction noncovalent force field. From 3D coordinate data for the protein, ligand and bridging waters (if available), the server predicts the best combination of protonation states for each ionizable residue and/or ligand functional group as well as the Gibbs free energy of binding for the ionization-optimized protein–ligand complex. The 3D structure for the modified molecules is available as output. In addition, a graph depicting how this energy changes with acidity, i.e., as a function of added protons, can be obtained. This data may prove to be of use in preparing models for virtual screening and molecular docking. A few illustrative examples are presented. In β secretase (2va7) computational titration flipped the amide groups of Gln12 and Asn37 and protonated a ligand amine yielding an improvement of 6.37 kcal mol−1 in the protein–ligand binding score. Protonation of Glu139 in mutant HIV-1 reverse transcriptase (2opq) allows a water bridge between the protein and inhibitor that increases the protein–ligand interaction score by 0.16 kcal mol−1. In human sialidase NEU2 complexed with an isobutyl ether mimetic inhibitor (2f11) computational titration suggested that protonating Glu218, deprotonating Arg237, flipping the amide bond on Tyr334, and optimizing the positions of several other polar protons would increase the protein–ligand interaction score by 0.71 kcal mol−1.
Keywords: Crystallography; Computational titration; Web application; Gibbs free energy; Protonation; Proteins; HINT
Even if one has an “atomic resolution” crystal structure of a protein–ligand complex, quantitatively modeling the Gibbs free energy of binding for the ligand can be challenging. There are several reasons for this difficulty, many of which are associated with interpretation of the crystal structure data. For example, the asparagine, glutamine and histidine functional groups may be rotated incorrectly, because in a typical (1.8 Å resolution or poorer) crystal structure hydrogen atoms are not visible and it is hard, if not impossible, to distinguish between the N and O atoms of the amide group (Asn or Gln) and between the C and N atoms in the His imidazole ring unless the interactions these functional groups make with neighboring residues are carefully considered. In fact, Weichenberger et al. argue that the average rate of Asn and Gln rotation errors found in the current Protein Data Bank (PDB) is as high as 20% [2, 3, 4, 5, 6].
An even greater challenge is the protonation state of the system. As noted above, most crystal structures do not contain information on the positions of hydrogen atoms. This means that for some (ionizable) groups, both on the protein and potentially on the ligand, it is hard to say which protons are present, which ones are absent, and if they are present, to define their orientation. These groups are by no means isolated from each other, but influence each other’s states, such that their geometries and protonation states cannot be evaluated independently of each other. The number of model possibilities in this ensemble grows exponentially with the number of ionizable groups in the active site. An active site can have several protonation states that exist at equilibrium with each other and that produce many energetically accessible models. Water molecules present in the active site further complicate the problem because hydrogen atoms on them can point in various directions. Water can mediate hydrogen bonds by acting as a Lewis base and/or as a Lewis acid (below, left) to convert a weakly repulsive polar interaction into a strongly favorable interaction. Thus, a water molecule can “buffer” the active site by rotating and changing its character from a donor to an acceptor when an interacting functional group is protonated or deprotonated (below, right).
Scoring Functions for Protein–Ligand Associations. For a protein–ligand complex where the geometry and the ionization state are known, calculating the Gibbs free energy is still non-trivial [7, 8, 9, 10, 11]. This is commonly referred to as the “scoring function problem” and is the subject of intense research in computational chemistry. Simply put, when most modeling packages report energy, they are reporting enthalpy, not Gibbs free energy. One conventional approach to predicting the Gibbs free energy of protein–ligand binding is to use simplistic scoring functions calibrated against crystal structures of protein–ligand complexes. These scoring systems are obtained by considering a set of protein–ligand crystal structures for which the experimental dissociation constant is known. Protein–ligand interactions are classified and counted for each structure, and other surface and flexibility-related properties can also be determined for each case. By assigning relative contributions to these interaction and energy components, the sum of effects for each complex results in a free energy “score” that should correlate with the energetics of protein–ligand binding as encoded in the dissociation constant for the complex. Examples of such scoring systems are SCORE1, SCORE2 and ChemScore. There are obviously a large number of assumptions inherent in this approach, including additivity of contributions [15, 16, 17] and the neglect of the radically different experimental conditions between a low-temperature crystallographic experiment and the room-temperature solutions where association and dissociation measurements are made. In addition, these scoring functions are based on fairly small sets of data, usually on the order of a hundred protein–ligand complexes, so they can be thrown off by interactions that occur in the training set but are rare in the real world, or vice versa.
Most importantly, these scoring systems often perform quite poorly for compounds that are very different from those in the training set. Knowledge-based scoring systems, such as DrugScore, focus instead on the frequency of interaction types in known crystal structures, using the assumption that the more favorable an interaction is, the more frequently it will appear. More sophisticated, and more computationally expensive, ways to find the free energy of binding in protein–ligand complexes include the free-energy-perturbation (FEP) [20, 21] and linear response [22, 23] methods, which rely on statistical mechanics analysis of molecular dynamics or Monte Carlo simulations. These methods can be somewhat compromised by errors from a variety of sources.
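The empirical scoring approach described above can be illustrated with a minimal sketch. It converts a dissociation constant into a standard free energy of binding using ΔG = RT ln Kd and evaluates an additive score as a weighted sum of interaction descriptors. The descriptor names, weights and counts are hypothetical and chosen only for illustration; they are not taken from SCORE1, SCORE2, ChemScore or HINT.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Standard free energy of binding (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def empirical_score(features, weights, intercept=0.0):
    """Additive score: intercept plus the weighted sum of per-complex descriptors."""
    return intercept + sum(weights[name] * value for name, value in features.items())


# Hypothetical weights (kcal/mol per unit) and descriptor counts for one complex.
weights = {"hbonds": -1.0, "lipophilic_contacts": -0.1, "rotatable_bonds": 0.3}
complex_features = {"hbonds": 4, "lipophilic_contacts": 30, "rotatable_bonds": 5}

predicted = empirical_score(complex_features, weights)
experimental = delta_g_from_kd(1e-9)  # a 1 nM binder, roughly -12.3 kcal/mol
print(f"predicted {predicted:.2f}, experimental {experimental:.2f} kcal/mol")
```

In a real calibration the weights would be fitted, for example by least squares, so that the predicted values track the experimental ones across the whole training set; the additivity assumption criticized in the text is visible directly in the form of `empirical_score`.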
Placing Protons and Optimizing Ionization States. The problem of placing and optimizing polar protons has been of interest for some time, and a variety of methods have been used to correct ambiguous atom placement and to assign protonation states and hydrogen orientations to ionizable residues in proteins. First, in terms of correcting X-ray protein crystal structures for problems with Asn, Gln, His, etc., there are several web applications available [25, 26, 27]. For example, MolProbity can be used both for the placement of hydrogen atoms and correction of errors in protein structures, while others, like NQ-Flipper, ignore hydrogens altogether. Solutions to the more complex problems associated with ionization state evaluation have previously not been made available in a web application, but are available in a number of diverse tools. Quantum mechanics and quantum mechanics–molecular mechanics [29, 30] approaches and methods based on molecular dynamics (MD) simulations [31, 32, 33] are available. However, most methods rely on solving the Poisson–Boltzmann equation, e.g., DelPhi, to evaluate possible protonation states and hydrogen positions [35, 36, 37, 38, 39, 40, 41]. More recently, solving the Poisson–Boltzmann equation has been combined with MD simulation [42, 43, 44, 45, 46]. Alternatively, Mehler and Guarnieri quantitatively characterize the hydrophilicity or hydrophobicity of the microenvironment around each titratable group instead of obtaining grid-based solutions to the Poisson–Boltzmann equation. However, these methods pursue a different goal than the application described in this paper: they determine the protonation state and hydrogen geometry most commonly encountered for a protein, such that the results are most often described as pKas for protein residues. Also, these approaches do not generally focus on the key residues and functional groups involved in ligand binding.
Our web application is pursuing that goal—finding the protonation state and hydrogen geometry for which the protein–ligand interaction is the strongest.
HINT (Hydropathic INTeractions) has been successfully applied to many problems: in a study of protein–protein interactions in native and mutant hemoglobins, very good correlations were found between HINT scores and the free energies of dimer–dimer association [50, 51, 52]; interactions between proteins and small molecules showed a rather good correlation between HINT score and the ΔGbinding for 53 protein–ligand complexes; studies have also indicated that HINT analysis is useful for understanding DNA–small molecule [53, 54, 55, 56] and DNA–protein [57, 58] interactions; and, recently, HINT was also used to study the effect of tyrosine nitration of IκBα on NF-κB activation.
Computational Titration. One key advantage of HINT is its speed. It can be used for evaluating large sets of data from docking, virtual screening or other sources. This is a major advantage for examining and evaluating the multitude of different protonation states and possible geometries in a protein–ligand complex even when referencing a single high quality crystal structure of a protein–ligand complex. We refer to the collection of molecular models that would fit within the experimental electron density of a crystal structure, but differ in terms of proton placement or special rotations (Asn, Gln, His, etc.), as being “isocrystallographic”. The speed of HINT thus allows optimization of these ill-defined molecular features in a reasonable time frame. This process encompasses the computational titration algorithm for which some aspects have been described previously [60, 61, 62, 63]. In this report, we are describing an online version of the HINT-based computational titration method, i.e., a free web service for studying protonation states and Gibbs free energies of binding for protein–ligand complexes. This user-friendly web service can be used to solve quite a few potential problems in protein–ligand structural models; namely: questionable group rotations, optimized rotations for H-donor protons, and multiple interacting protonation states. The algorithm calculates a HINT score-based free energy of binding for the examined protein–ligand complex models. The server can be accessed at http://hinttools.isbdd.vcu.edu/CT.
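The idea of searching a set of isocrystallographic models for the best-scoring one can be sketched as an exhaustive enumeration. This is an illustrative outline only: the group names, their allowed states and the `toy_score` function are invented stand-ins, and the actual computational titration algorithm and HINT scoring are not reproduced here. The sketch assumes that a higher score is more favorable.

```python
from itertools import product

# Hypothetical titratable or flippable groups and their allowed states.
GROUP_STATES = {
    "Asp25": ("deprotonated", "protonated"),
    "Gln12": ("as_deposited", "flipped"),
    "ligand_amine": ("neutral", "protonated"),
}


def enumerate_models(group_states):
    """Yield every combination of states as a dict {group: state}."""
    names = list(group_states)
    for combo in product(*(group_states[name] for name in names)):
        yield dict(zip(names, combo))


def toy_score(model):
    """Invented stand-in for an interaction score; higher is more favorable."""
    score = 0.0
    # Coupled term: the two groups cannot be evaluated independently.
    if model["Asp25"] == "deprotonated" and model["ligand_amine"] == "protonated":
        score += 2.0
    if model["Gln12"] == "flipped":
        score += 0.5
    return score


def best_model(group_states, score_fn):
    """Score every model and return the most favorable one."""
    return max(enumerate_models(group_states), key=score_fn)


all_models = list(enumerate_models(GROUP_STATES))
best = best_model(GROUP_STATES, toy_score)
print(len(all_models), best)
```

Because the score contains a term that couples two groups, optimizing each group in isolation would not be guaranteed to find the same answer, which is why all combinations are scored together.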
This service joins an ever expanding field of on-line tools for computational chemistry and drug discovery. It is beyond the scope of this article to fully review this phenomenon, but a few related servers of note are: (1) The Virtual Computational Chemistry Laboratory (http://www.vcclab.org) that calculates logP using a variety of algorithms and pKa; (2) The quantum mechanics-based pKa (protonation states) prediction tool of Quantum Pharmaceuticals (http://www.q-pharm.com/home/contents/drug_d/order_form/online_services/pka_prediction); (3) H++ (http://biophysics.cs.vt.edu/H++/) that computes pK values of ionizable groups in macromolecules and adds missing H atoms according to a specified pH; and (4) PHEPS (pH-dependent Protein Electrostatics Server) (http://pheps.orgchm.bas.bg/home.html) that performs local and global pH-dependent analysis of the electrostatics in proteins.
Results and discussion
To date, the only practical way to use the computational titration algorithm was through a Sybyl [70, 71] interface to HINT available in our local development platform. While Sybyl is limited to only two platforms, Irix and Linux, we have been developing the underlying HINT toolkit across a much broader set of platforms . However, providing HINT-based tools in a usable form outside the Sybyl infrastructure and user interface has been a limiting factor. In effect, very few users have had access to computational titration. Thus, we have chosen to make it available online to a broader audience in the computational molecular design community.
Implementation and Functional Usage. The computational titration service is implemented in several layers. The entire application is controlled through a web server written in Python that can display static web pages, like the front page, or pages generated by Python-based CGI scripts. These scripts serve several purposes. They provide an easy-to-use HTML interface for the HINT program. The computational titration algorithm is implemented as a binary program written in C and linked to the HINT toolkit. This program is the heart of the web application. In addition, for successful runs the Python scripts also interact with Gnuplot to make plots of the computational titration results (vide infra). Python CGI scripts are also used to catch a variety of errors in input files and to provide an intuitive interface that helps users to monitor their jobs.
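As an illustration of the plotting layer, the following sketch reduces a list of scored models to a titration curve (the most favorable energy for each number of added protons) and builds a Gnuplot script with inline data. The function names, the output file name and the example values are hypothetical, and the assumption that the curve uses the best model at each proton count is ours; the server's actual scripts are not shown in this paper.

```python
def titration_curve(models):
    """Return (protons_added, best_energy) pairs sorted by proton count.

    `models` is an iterable of (protons_added, free_energy) tuples; the most
    favorable (lowest) free energy is kept for each proton count.
    """
    best = {}
    for protons_added, energy in models:
        if protons_added not in best or energy < best[protons_added]:
            best[protons_added] = energy
    return sorted(best.items())


def gnuplot_script(points, output_png="titration.png"):
    """Build a gnuplot script that plots the curve from inline data."""
    lines = [
        "set terminal png",
        f'set output "{output_png}"',
        'set xlabel "Protons added"',
        'set ylabel "Free energy of binding (kcal/mol)"',
        "plot '-' using 1:2 with linespoints title 'best model'",
    ]
    lines += [f"{n} {energy:.2f}" for n, energy in points]
    lines.append("e")  # terminates gnuplot inline data
    return "\n".join(lines) + "\n"


# Invented example values, not server output.
example_models = [(0, -5.0), (1, -6.2), (1, -5.9), (2, -4.1)]
curve = titration_curve(example_models)
script = gnuplot_script(curve)
print(script)
```

If Gnuplot is installed, such a script could be passed to it on standard input, for example with `subprocess.run(["gnuplot"], input=script, text=True)`.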
Protein residue-specific optimization actions available in computational titration
Check amide orientation: check if amide O and NH2 atoms are correctly assigned in the structure
Find the best ionization state for acid residues
Optimize R–OH rotation: if residue is in acid form, exhaustively optimize rotation of –OH
Check ring orientation: check if ND1, CD2, CE1 and NE2 atoms are correctly assigned in the structure
Find the best ionization state for imidazole ring of His residue
Find the best ionization state for amine of Lys residue
Optimize R–NH2 rotation: exhaustively optimize the rotation of –NH2 in neutral Lys residues
Find the best ionization state for guanidine of Arg residue
Optimize R–NH rotation: if Arg is deprotonated, optimize the rotation angle of imine NH
Find the best ionization state for Tyr residues
Find the best ionization state for Cys residues
Optimize R–XH (X = O, S) rotation: exhaustively optimize the rotation of –XH for Cys, Ser, Thr and Tyr residues
Titrate C-term acid: find the best ionization state for the C-terminus acid
Optimize R–OH rotation: if C-terminus is in acid form, exhaustively optimize rotation of –OH
Find the best ionization state for the protein N-terminus amine
Optimize R–NH2 rotation: exhaustively optimize the rotation of –NH2 in neutral N-termini
Ligand-specific optimization actions available in computational titration
Find the best ionization state for ligand amine
Optimize R–XH2 rotation: exhaustively optimize the rotation of –NH2 in primary amine
Find the best ionization state for aromatic alcohol in ligand
Find the best ionization state for ligand thiol
Optimize R–XH rotation: exhaustively optimize the rotation of –XH in ligand
Find the best ionization state for carboxylic acid in ligand
Optimize R–OH rotation: if in acid form, exhaustively optimize rotation of –OH
It should be noted that the computational titration algorithm optimizes the local environment of affected residues and ligand functional groups, but does not energy-minimize the active site as a whole. Also, the tool that protonates amines and phosphines can assign unrealistic bond angles for the hydrogens it adds to these groups if the nitrogen or phosphorus is not puckered or is only poorly puckered. The user should examine the geometry of the generated models and (generally) subject them to hydrogen-only molecular mechanics minimization to finalize the model. Further revisions of the program will likely include some molecular mechanics cleanup of the output models.
The computational titration server is currently an SGI Altix 350 system with multiple-processor queuing and can thus execute many jobs at the same time. The queuing system is intuitive and allows users to view the progress of their jobs.
Computational Titration Case Studies. One of the difficulties encountered in virtual screening is that often there is a mismatch between the ionization state(s) of the active site residues and the proposed incoming ligand if both molecules are modeled as “pH 7”. For example, if a putative ligand placed a carboxylate next to a deprotonated Asp or Glu residue, it would be rejected because of highly unfavorable interactions. Protonating either the ligand carboxylate or the acid residue could in many cases produce a strong hydrogen bond between the species. Thus, it would seem that, when virtual screening is based on a crystal structure of a known protein–ligand complex, the successful identification of all potential leads would be enhanced if the hydrogen atoms in the active site were assigned correctly, both in terms of protonation state and orientation. In this section we describe a few examples where either assignment of hydrogen atom configuration is not trivial, or the orientation of Asn, Gln or His residues may have been assigned incorrectly. We applied the automated computational titration analysis described above to these structures and report the results here. It should be noted that our application simultaneously varies protonation states on the ligand and in the protein active site. Many of the results we obtained show protonation states different from those typically encountered at pH 7, anomalies caused by the very specific nanoenvironments at the protein active sites.
We should point out that a prototropic (keto-enol) tautomeric change in the ligand was not considered for this protein–ligand complex because the resulting enol would not have been isocrystallographic with the ketone. (If this tautomerization occurs, the formed double bond would force the ethyl into the plane of the ring system, which would be observed in a relatively high resolution X-ray structure).
Computational titration produced dramatic results for the crystal structure of the human sialidase NEU2 in complex with the isobutyl ether mimetic inhibitor (R)5-acetylamino-(S)4-hydroxy-(R)6-isobutoxy-5,6-dihydro-4H-pyran-2-carboxylic acid (below) (PDB code 2f11). This structure is a refinement of 1snt. If all ionizable residues within 6 Å of the ligand were chosen to define the active set, 944,784 configuration models would have to be analyzed, too many for completion in a reasonable amount of time. A number of options for reducing the scale of this problem are available, including random sampling and calculation for a fraction of models, eliminating functional groups and/or residue types from the titratable set, and reducing the volume of the titratable active set. In this case, we have adjusted the settings by reducing the contact cutoff distance from 6 to 4 Å, and eliminated the possibility of tyrosine and cysteine ionizations, i.e., these residues were not titrated. This reduced the number of models in the set to 2,187. (In fact, no cysteine residues were present in or around the active site.)
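The scale of the model set quoted above follows from multiplying the number of allowed states of each group. The sketch below shows this arithmetic and a simple random sampling of a fraction of the models. The per-group state counts are assumptions chosen only because they reproduce the quoted totals (944,784 = 3^10 × 2^4 and 2,187 = 3^7); the actual breakdown of groups in 2f11 is not given in the text, and `sample_models` is a hypothetical helper, not the server's sampling routine.

```python
import math
import random


def count_models(states_per_group):
    """Number of distinct models = product of the state counts of all groups."""
    return math.prod(states_per_group)


def sample_models(states_per_group, n_samples, seed=0):
    """Draw random models (one state index per group) without enumerating all.

    Sampling is with replacement, so duplicates are possible.
    """
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(k) for k in states_per_group)
        for _ in range(n_samples)
    ]


# Assumed decompositions that merely reproduce the totals quoted in the text.
full_set = [3] * 10 + [2] * 4   # e.g. ten three-state and four two-state groups
reduced_set = [3] * 7           # e.g. seven three-state groups

n_full = count_models(full_set)        # 944784
n_reduced = count_models(reduced_set)  # 2187
subset = sample_models(full_set, n_samples=1000)
print(n_full, n_reduced, len(subset))
```

Removing a single three-state group divides the number of models by three, which is why shrinking the cutoff distance or excluding residue types reduces the problem so sharply.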
While it is debatable whether such a change would occur because of the rather unfavorable pKb for guanidinium,1 to avoid unfavorable interactions with the hydrophobic part of the ligand, this model suggests that Arg237 is deprotonated, which in turn leads to flipping the amide group on the side chain of Gln270, so that the amide nitrogen of Gln270 can donate a hydrogen bond to the (now) deprotonated nitrogen of Arg237 (H…N, 2.25 Å) (see Fig. 4c, d). On the other side of the Gln270 amide, the oxygen forms a favorable interaction with one hydrogen of water55 (H…O, 2.50 Å). The other hydrogen of water55 forms a favorable interaction with the carboxylate group of the ligand (H…O, 2.07 Å). Thus, water55 serves as a bridge between Gln270 and the carboxylate group of the ligand. Another bridging water molecule is water3. It donates one hydrogen bond to the oxygen atom on the side chain of Asn86 and another to Glu39. Water3 also accepts a hydrogen bond from the ligand’s hydroxyl group; thus, it significantly increases the strength of favorable interactions between the protein and the ligand.
We have implemented the computational titration algorithm as a freely available web service at http://hinttools.isbdd.vcu.edu/CT. This web server is designed to be an intuitive tool that can help users improve their models of protein–ligand interactions as well as calculate the Gibbs free energies of binding for protein–ligand complexes at various acidities. The basis of free energy scoring for computational titration is the HINT program. While this scoring function is simplistic, it has been fruitfully used in a number of studies of protein–ligand interactions, and is rapid enough to make computational titration practical. It should be emphasized that absolute free energies of binding are difficult to calculate, but the relative energies and ordering between protonation models should be fairly reliable. The goal of this modeling tool, and in general most modeling tools, is to facilitate visualization of complex phenomena. The myriad of protonation ensembles in a protein–ligand complex is particularly challenging, so we feel that the computational titration tool will be of benefit to the modeling community. We should note that, because the user selects which residue types are subject to the titration protocol, the user is controlling a crucial aspect of the algorithm. In effect, this allows the user to decide which ionizations they believe are germane to the system. If ionization of tyrosine is selected, some models will be created with tyrosinate. We are exploring automation of this aspect, i.e., incorporating modeled pKas for each ionizable functional group in the computational titration algorithm as an intramolecular contribution to free energy that may be significant in some cases, but at present this is not available.
We are currently developing additional functionality for computational titration and will make it available as it is coded and validated. In particular we are considering: (1) improved optimization of protonated amines and phosphines. The current algorithm for adding protons in such cases is purely geometric, whereas more reasonable conformations would be obtained by a molecular mechanics approach; (2) support for titrating additional functional groups such as phosphates and sulfates; (3) support for titrating nucleotide backbone phosphates and sugars; and (4) implementation of stochastic optimization algorithms to find the best protonation state in cases where the number of potential models is exceedingly large. Nonetheless, even in its current implementation, computational titration should be a useful tool for creating starting ionization state models for protein–ligand complexes.
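To make item (4) concrete, the following is a minimal sketch of one possible stochastic optimizer, simulated annealing, applied to a vector of state indices. The choice of simulated annealing, the cooling schedule, the parameter values and the `toy_score` function are all assumptions made for illustration; the text does not specify which stochastic algorithm is planned. As elsewhere, a higher score is taken to be more favorable.

```python
import math
import random


def anneal(states_per_group, score_fn, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over state indices; returns (best_model, best_score)."""
    rng = random.Random(seed)
    current = [0] * len(states_per_group)  # start with every group in state 0
    current_score = score_fn(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a new state for one randomly chosen group.
        i = rng.randrange(len(states_per_group))
        trial = list(current)
        trial[i] = rng.randrange(states_per_group[i])
        trial_score = score_fn(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = trial, trial_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


def toy_score(model):
    """Invented stand-in for an interaction score with one coupled pair of groups."""
    score = float(sum(1 for state in model if state == 1))
    if model[0] == 2 and model[1] == 2:
        score += 3.0
    return score


groups = [3] * 12  # twelve hypothetical three-state groups (531,441 models)
best_model_found, best_score_found = anneal(groups, toy_score)
print(best_model_found, best_score_found)
```

Only a few thousand models are scored here instead of all 531,441, at the cost of losing the guarantee that the global optimum is found.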
Guanidinium and guanidine have surprisingly similar hydropathic properties: all of the nitrogens in either case are both H-bond donors and acceptors; the unsaturation in the neutral species compensates for the loss of the H+; and the formal charge is quite delocalized.
We gratefully acknowledge the support of the US N.I.H. Grant GM071894. In addition, the assistance of Dr. P. D. Mosier in configuring the server hardware and software is greatly appreciated.
- 7. Cozzini P, Fornabaio M, Marabotti A, Abraham DJ, Kellogg GE, Mozzarelli A (2004) Curr Med Chem 11:3093
- 17. Dill KA (1997) J Biol Chem 272:701
- 27. Hooft RW, Vriend G, Sander C, Abola EE (1996) Nature 381:272
- 48. Hansch C, Leo AJ (1979) Substituent constants for correlation analysis in chemistry and biology. Wiley, New York
- 55. Kellogg GE, Scarsdale JN, Cashman DJ (1999) Med Chem Res 9:592
- 60. Kellogg GE, Fornabaio M, Chen DL, Abraham DJ, Spyrakis F, Cozzini P, Mozzarelli A (2006) J Mol Graph Model 24:434
- 65. http://www.q-pharm.com/home/contents/drug_d/order_form/online_services/pka_prediction (Accessed October 2008)
- 70. Sybyl, version 7.3 (2006) Tripos Associates Inc., 1699 S Hanley Rd., St. Louis, MO 63144, USA
- 71. http://www.edusoft-lc.com/hint (Accessed May 2008)
- 72. Kellogg GE, Fornabaio M, Chen DL, Abraham DJ (2005) Internet Electr J Mol Design 4:194
- 73. http://www.python.org (Accessed May 2008)
- 74. http://www.gnuplot.info (Accessed May 2008)
- 76. Edwards PD, Albert JS, Sylvester M, Aharony D, Andisik D, Callaghan O, Campbell JB, Carr RA, Chessari G, Congreve M, Frederickson M, Folmer RH, Geschwindner S, Koether G, Kolmodin K, Krumrine J, Mauger RC, Murray CW, Olsson LL, Patel S, Spear N, Tian G (2007) J Med Chem 50:5912
- 77. Chavas LMG, Tringali C, Fusi P, Venerando B, Tettamanti G, Kato R, Monti E, Wakatsuki S (2005) J Biol Chem 280:469