A proper understanding of protein pKa values is essential to a proper understanding of pH-dependent characteristics of protein function. If the pKa of a particular group is known then one can determine its protonation state at a given pH, helping to determine several important properties including protein solubility, protein folding and catalytic activity. Knowledge of the pKa values of the residues of an active site can help to identify the reaction mechanism of an enzyme or aid in the interpretation of experimental results [14]. The pKa value is -log10(Ka) where Ka, is the ionization constant, a measure of a titratable group's ability to donate a proton:

K a = [ H + ] [ A ] [ H A ] ( 1 ) MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGlbWsdaWgaaWcbaGaemyyaegabeaakiabg2da9maalaaabaGaei4waSLaemisaG0aaWbaaSqabeaacqGHRaWkaaGccqGGDbqxcqGGBbWwcqWGbbqqdaahaaWcbeqaaiabgkHiTaaakiabc2faDbqaaiabcUfaBjabdIeaijabdgeabjabc2faDbaacaWLjaGaaCzcamaabmaabaGaeGymaedacaGLOaGaayzkaaaaaa@4224@

The pKa value is therefore equal to the pH when there is an equal concentration of the protonated and deprotonated groups in solution. Each residue with a titratable group has a model or 'intrinsic' pKa value, defined as the pKa value when all the other groups are fixed in their neutral state. Ionizable groups may be divided into acidic, which are neutral in their protonated state, and basic, which are positively charged in their protonated state. The protonated and the non-protonated forms of a residue can be very different chemically. In the case of His, the protonated form is hydrophilic and positively charged while the non-protonated form has a hydrophobic and aromatic character. Consequently the nature of the interaction made by an ionizable group may differ significantly at a pH above or below the pKa.

Table 1 shows the intrinsic or 'model' pKa values for all protein titratable groups [5]. However, in real protein-solvent systems, interactions between a residue and its environment will cause the titratable group's pKa value to deviate from that of the model. Hence the intrinsic pKa value, pKModel, combined with the environmental perturbation, ΔpKa, describes the real pKa value of a group [69].

Table 1 Model pKa values for all protein basic and acidic titratable groups. See reference 5.

pK a = pK Model + ΔpK a       (2)

The pKa shift caused by the environment is not easily quantified. This is especially true of ionizable residues within protein active sites as they often have markedly higher or lower values than the intrinsic pKa [5]. The three main factors that contribute towards environmental perturbation of the pKa value are inter-molecular hydrogen bonding, the desolvation effect and Coulombic interactions. Previous studies have identified hydrogen bonding as the most important determinant of pKa values [6]. The hydrogen bonding strength is both distance and angle dependent and therefore the extent of the perturbation is heavily dependent on the position of the interacting residues relative to each other. This is less of a factor with side chain hydrogen bonds than with main chain as the former is more flexible and therefore more likely to adopt an optimal orientation for hydrogen bond interactions. The desolvation effect is also important; this describes the energy that is required to move a group from a state of full solvation to a position within the folded protein. Desolvation effects within the protein interior preferentially increases the energies of the negatively-charged, base forms, which will increase the pKa value, while in the case of His, Lys and Arg, the desolvation preferentially increases the energy of the positively-charged, acid forms, which will decrease the pKa values. The extent of the shift is dependent on the degree to which the group is buried within the protein. The third of the major factors, which may cause a pKa shift, are Coulombic interactions between ionizable groups. The pair-wise interactions are dependent on the charges of the respective groups, but also on their location as only residues that are buried produce significant charge-charge interactions.

It is possible to predict the pKa value of a given protein residue from three-dimensional structural data. The pKa shift may be calculated from the difference in energy between the group's charged and neutral forms and added to the pKmodel value to estimate its true value. Several different algorithms have been developed to generate predicted pKa values based on structural data.

The majority of papers which have assessed the reliability of pKa predictive algorithms have only examined a limited number of proteins, making an evaluation of their accuracy very difficult [6, 1013]. The largest of these [13] looked at 260 experimental pKa values taken from 41 proteins. Here we use a large pKa dataset of 100 proteins, which is more than double that of the most extensive previous paper [13], to analyse the predictive capabilities of the MCCE [14, 15], MEAD [16], PROPKA [17] and UHBD [18] programs. The programs differ in their methodology and we assessed the merits of each. An enhanced approach to the problem of pKa prediction is proposed.

Results and discussion

Technical difficulties with the UHBD program, due to errors in the protonation of histidine residues, prevented the successful processing of all 100 proteins; in total only 43 could be completed. This lead to the creation of two separate datasets, the Large dataset (containing 492 residues and excluding the UHBD program) and the Small Dataset (containing 280 residues and including the UHBD program). Several of the programs produced outliers, in some cases outside of the physically possible pH 0–14 range. Outliers were removed from the dataset using a variety of different parameters to see the effect upon the overall accuracy of prediction Datasets were generated where all predicted values were lesser or greater than the intrinsic values by 3, 5, 7 and 10 pH units were removed as well a dataset where only physically possible values were included. This data is presented in Table 2 and Table 3. It may been seen from the data that in general the best results were obtained by using values within a range of 5 pH units. Using those parameters, 89 residues were removed from the Large dataset leaving 403 and 39 residues were removed from the Small dataset leaving 241 residues. It is unlikely that the pKa of a titratable residues can deviate more than 5 pH units from the residue's intrinsic value (see Table 1).

Table 2 Overview of the prediction accuracy of the Large Dataset (404 Residues) I. The table shows the RMSD values for each of the residues from the whole dataset and the dataset following removal of non-physical values and of all outliers outside of a range of 3, 5, 7 and 10 pH units. Figures marked in bold indicate significant results (P = 0.05).
Table 3 Overview of the prediction accuracy of the Small Dataset (242 Residues) I. The table shows the RMSD values for each of the residues from the whole dataset and the dataset following removal of non-physical values and of all outliers outside of a range of 3, 5, 7 and 10 pH units. Figures marked in bold indicate significant results (P = 0.05).

The majority of the outliers in both datasets were generated by the MEAD program, particularly when the PARCE force field was used. Considerably more residues are present within the +/- 1 unit bands for MCCE, UHBD and PROPKA. Thus there is a clear division between the performance of MEAD and that of the other programs. The same trend may be seen in the Root Mean Squared Deviation (RMSD) values (Table 2, 3). PROPKA is more accurate for Asp, Glu, Lys and Tyr with RMSD values of 0.934, 0.849, 0.260 and 1.001 respectively. His is more accurately predicted by MCCE with an RMSD of 1.522. With respect to the Small dataset in Table 3, PROPKA is the best predictor for all residues except Glu and His, where UHBD performs best: RMSD of 0.442 and 0.494 respectively. The overall accuracy of each program to a level of <0.5 pKa units is 27% AMBER, 34% PARSE, 42% MCCE, 40% UHBD (242 dataset) and 48% PROPKA. When the error range is increased to <1 unit, the difference between the programs is more distinct: 56% AMBER, 56% PARSE, 71% MCCE, 67% UHBD (Small dataset) and 81% PROPKA (Table 4, 5). Scatter plots for each program are shown in Figure 1.

Table 4 Overview of the prediction accuracy of the Large Dataset (404 Residues) II Three tables show the accuracy of the predictions to the measured pKexp within the ranges of <2 to <0.5. This is taken as the number of residues predicted within each range. Figure marked in bold indicate significant results (P = 0.05).
Table 5 Overview of the prediction accuracy of the Small Dataset (242 Residues) II. Three tables show the accuracy of the predictions to the measured pKexp. This is taken as the number of residues predicted within each range. Figures marked in bold indicate significant results (P = 0.05).
Figure 1
figure 1

Correlation plots for the individual programs. The bold line indicates perfect prediction (pKpred = pKexp). The outer lines indicate +/- 1 unit from the pKexp.

From a previous study [19], 39 carboxyl residues found within protein active sites were selected. These are shown in Table 6[2030]. The 27 Asp and 12 Glu residues have experimental values that differ from the model pKa value by at least 1 unit. Values for Asp and Glu range from 2.0 – 9.9 and 2.1 – 6.7 respectively. PROPKA and UHBD are distinct within the <0.5 and <1 unit error bands (Table 7), with PROPKA performing best with an accuracy of 66.67% within the <1 unit band. However a large discrepancy exists between the <1 and <0.5 bands for all of the programs, with an approximate 50% drop in accuracy. For the Asp residues, PROPKA predicts far better than the other programs, with values of 37.04%, compared to 18.52% for MCCE and 0% for UHBD at the <0.5 level. However, for Glu residues program performance is closer, with values of 25 % for PARSE, MCCE and UHBD and 33% for AMBER and PROPKA at the <0.5 level. When the error level is extended to <1, PROPKA is far better, with a value of 83.33% compared to its nearest rivals UHBD, AMBER and PARSE with values of 50%. PROPKA shows an accuracy of 85% to within 1 pKa unit for surface residues, whereas the same accuracy is limited to 53% with the MCCE program for buried residues, a considerable reduction. The accuracy values obtained for MCCE were also comparable with those that recently appeared on the program's website, which show an RMSD of 0.77 and an accuracy range of 98% within the <2 pH unit range and 84% accuracy within the <1 pH unit range [31].

Table 6 Carboxyl sites of interest. B = Buried, S = Surface. Figures marked in bold indicate predictions >2 units from the pKexp.
Table 7 Accuracy of prediction for the carboxyl sites. The accuracy was tested to the <2 to <0.5 ranges. The individual accuracy of the residues is given in the bottom two tables. Figures marked in bold indicate the greatest accuracy.

Given the capacity of the predictive programs to under or over predict the true pKa value (see Discussion), the possibility of using a consensus approach to integrate the various programs was investigated. Using the Small dataset, combinations of the prediction values were calculated and the accuracy tested as before. From this dataset, 25 combinations (Table 8) were tried. One, UHBD + PROPKA, leads to improvements in all residues except histidine. The His RMSD value of 0.955, from this combination, is far better than all of the programs except UHBD, which has a value of 0.494. The overall accuracy of this combination was not a surprise due to the individual performance of each program. A further attempt to integrate the programs was made by using Partial Least Squared (PLS) regression. The PLS model generated had a correlation coefficient (r2) of 0.9 and a cross-validated correlation coefficient (q2) of 0.89. The resulting equations were applied to the Small dataset and the accuracy results are shown in Table 9. The accuracy improved greatly in the <0.5 range to almost 58%, significantly greater than either the other programs run individually or the combination method (UHBD + PROPKA). Again, a disparity between the predictive capabilities of MEAD and the other programs may be seen. AMBER and PARSE have coefficients of -0.03129 and -0.0001836 respectively, which are comparatively small compared to the other coefficients (0.239, 0.4282 and 0.4108 for MCCE, UHBD and PROPKA respectively). The results above would indicate a more significant relationship between the PROPKA and UHBD predictions and the experimental pKa values. This data fits with the trends seen in the other analysis. However, multiple linear regression is not an ideal way to increase predictive accuracy as combining programs will cause the propagation of experimental errors within a given dataset.

Table 8 Overview of the combination methods (242 Residues). The residue RMSD values are given for all of the 25 combinations consisting of AMBER (A), PARSE (P), MCCE (M), UHBD (U) and PROPKA (P). Figures marked in bold indicate an improvement while the asterisk indicates the best score.
Table 9 Accuracy of the multiple regression. The accuracy is given as the number of predictions within a range of the pKexp. For comparison the UHBD + PROPKA combination is added. Figures marked in bold indicate improvements.

The disparity between the predictive capabilities of the program relates to the algorithm that used for the calculation. MEAD, UHBD and MCCE are all based upon an electrostatic continuum model that solves the linearised Poisson-Boltzmann equation numerically [32, 33]. The electrostatic potential φ(r)can be calculated by the Poisson-Boltzmann equation:

∇ ε (r)∇ φ (r) - κ2 (r)ε (r)φ (r) = -4πρ (r)       (3)

Whereε is the dielectric constant, r is the position vector, Φ is the electrostatic potential, ρ is the charge distribution and κ is a parameter that represents the effect of mobile ions in solution. All three programs work on the assumption that the major determinant of the pKa shift from the model values are the electrostatic effects of burying titratable groups in low dielectric medium. A model of the macromolecule-solvent system is used with dielectric constants of 80 for the solvent and 4 for the protein. The details of the atomic structure are incorporated into the placement of charges and dielectric boundaries. The calculation accounts for the desolvation energy, the titratable group's interaction with partial charges and the group's interaction with other titratable groups in the protein. MEAD consistently performs more poorly than the other two Poisson-Boltzmann-based programs. This may be because, in addition to the basic calculation, UHBD and MCCE also incorporate a Monte Carlo function to sample the multiple conformations of each titratable site. The Monte Carlo method achieves convergence by random sampling of side chain conformers. This allows the MCCE and UHBD programs to make a more realistic calculation of the charge-charge interactions than MEAD. The RMSD values of the two MEAD data sets – PARSE and AMBER – are comparable, both producing similar RMSD values and numbers of outliers. However, Table 2 shows that the PARSE force field generates outliers that deviate much further from the experimentally-determined values than those of AMBER. Although the parameters of the two force fields are similar; the atomic radii of the hydrogens for PARSE are slightly larger which may have created inaccurate charge-charge interactions that have increased the calculated pKa value (this would also account for the program's propensity to generate outliers). This is especially noticeable for Lys where the respective RMSD values of AMBER and PARSE are 1.2 and 25.8.

Although the PROPKA and MCCE programs are of comparable accuracy, the data suggests that the former tends to under-predict pKa values whilst the latter over-predicts them (Figure 1). This observation may reflect the different approaches towards the calculation of the pKa value in the two programs. PROPKA [17] takes a different approach to the other three programs, calculating the pKa shift by using empirical rules that incorporate effects from hydrogen bonds, desolvation and Coulombic interactions. The extent of the pKa shift caused by hydrogen bonding is proportional to the number of hydrogen bonds formed by the titratable group [19]. The desolvation effect is calculated from the solvent accessible surface and the 'depth of burial' (the distance of the group from the protein surface). Lastly, the strength of the Coulombic charge-charge interactions is dependent on the distance between the charges and on the state of the surrounding ionizable residues. This process, however, is only applied to buried pairs of ionizable residues. Therefore PROPKA's tendency to under-predict pKa values may be caused by the program's emphasis on the dominance of hydrogen bonding in determining the extent of the shift. Hydrogen bonds have the effect of lowering pKa values [34] and the predicted values may reflect that. Conversely, MCCE's tendency to over-predict may be the result of charge-charge interaction forcing an increase in the pKa value. The majority of the over-predicted values are surface residues and, unlike PROPKA, the MCCE program does not take into account the lessened effects of charge-charge interactions when the respective residues are not buried within the protein interior. Consequently, PROPKA and MCCE tend to be more accurate for surface and buried residues respectively.

Side chains located at active sites are of particular interest as they often have unusually high or low pKa values. In some instances, the electrostatic charge of the active site can be radically different from that the rest of the protein as a means to 'steer' a ligand towards the binding cleft [35]. For a program to work as an effective pKa prediction tool it must be able to predict unusual pKa values accurately. Generally speaking, the accuracy of prediction decreased the further the measured pKa value was from the side chain's intrinsic pKa. Again, PROPKA proved to be the most consistent of the programs. This is not surprising as the design of the model and assignment of parameters were based upon a large dataset of carboxyl pKa values. Overall, the active site data was encouraging at the <1 unit level. However, once reduced to <0.5, the accuracy of all of the programs decreased. This highlights a key area for the development of new models and programs.

An interesting correlation is seen with respect to the regression coefficients and the general performance of the programs. Coefficients are generally an indicator of the relative importance of the contributing terms in a regression equation. The comparative performance of PROPKA, the combination methods and the regression model is seen in Figure 2. PROPKA is equally effective as these additional methods, although the regression data performs better than the best combination. A Molecular Dynamics simulation of one of the proteins from the dataset (Barnase wild type ribonuclease (pdb code: 1A2P)) showed a standard deviation of ± 1.4 for the pKa value over a one-nanosecond period. This indicates that a dynamic structure has a large capacity for extreme pKa shifts. This suggests that any accurate prediction pKa method would need to incorporate conformational variability into the algorithm.

Figure 2
figure 2

Comparative performance of the prediction methods. The accuracy ranges (0.5 – 2) apply to the deviation from the measured pKa value. The percentage score represents the number of residues predicted in each range.


PROPKA is the most accurate method for all residues except Glu and His, where it is narrowly surpassed by UHBD and MCCE, respectively. Furthermore, the program also produces by far the best values for surface residues, most likely by taking sufficient account of hydrogen bonding. However, MCCE predicts buried residues far better than PROPKA, possibly by a more accurate evaluation of the charge-charge interaction with the conformers optimised by the Monte Carlo procedure. It must be noted that in all cases, the prediction of the buried residues is less accurate than for surface residues, indicating it is easier to calculate the interaction of a solvated or partially solvated residue than one densely packed within the protein interior. Overall, the best standalone program is PROPKA, which also produced the fewest outliers and is computationally much faster than the other programs. What the program lacks is a capacity to fully explore the conformational space available to the protein, which may ultimately limit its capacity to predict pKa value. The reliability of the predictive programs tends to vary with both the residue type and its spatial location. For glutamic acid residues, UHBD produced the best results while for Histidine and for all buried residues, MCCE performed well. The comparatively poor prediction of the 'unusual' pKa values by all of the programs was disappointing. Their ability to only predict a third of the residues to a high degree of accuracy highlights an area requiring further development. The variation in pKa values observed in our molecular dynamics simulation strongly suggests a complete sampling of conformations accessible to protein structures may be useful in creating accurate predictive software.


100 proteins for which pKa values had been determined experimentally were taken from PPD, a database of protein ionization constants [36, 37]. The full list of the pdb files comprising the dataset is included as an additional file [See PDB codes]. A wide range of both protein size and function was represented in the dataset. The protein structures were taken from the RCSB protein data bank [38]. In order to run the MEAD program, pdb files were protonated by using the leap program and the AMBER 94 force field (subsequent versions of the force field proved to be incompatible) and changed into pqr format using the online PDB2PQR converter [39, 40]. Separate sets of files were created based on the AMBER99 and PARSE force fields. MEAD and UHBD were run on an IBM Blade Center Cluster, which consists of 5 Blade Centers containing 67 Dual Xeon (3.06Ghz, 1Gb) Blades. The MCCE calculations were carried out on an SG Octane. The majority of the pdb files did not need any modification. However, 1D3K, 1GU8, 1HRH and 1DRH were protonated with the leap program and the AMBER 03 force field in order to remove inconsistencies in the pdb files. Additionally, 1DUK, 1NFN and 2CI2 underwent minimization with sander using a steepest descent method that continued for 20,000 1 fs time steps or until the root mean square deviation between successive time-steps had fallen below 0.01Å in order to eliminate steric clashes. The PROPKA program was run online from its server [41]; no modification was required to run the files. Values for all Asp, Glu, His, Tyr, Lys residues were predicted. Arg was excluded from the calculation due to lack of experimental data. Arginines's high pKa precludes establishing a titratable curve as the protein denatures at high pH. Cys was also excluded from the calculations due to a lack of experimental data.

The resultant data was also analysed using the Partial Least Squares (PLS) method. PLS is an extension of Multiple Linear Regression (MLR) that where a set of coefficients are developed from dependent variables, in this case the pKa prediction values, by comparison with the independent variables, the experimental pKa values. The PLS analysis was performed using the program GOLPE (Generating Optimal Linear PLS Estimations)[42].