Abstract
Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. One way to incorporate amino acid similarity into conservation measures is to group amino acids into disjoint sets. In this paper, based on the overlapping amino acid classification proposed by Taylor, we define two measures: the relative entropy of Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 identify catalytic residues better than many existing conservation measures, in particular the commonly used relative entropy (RE) and Jensen–Shannon divergence (JSD). To further improve RVD and RVD2, we obtain two new conservation measures by combining each of them with the classical JSD. Experimental results suggest that these combined measures perform very well in identifying catalytic residues.
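For orientation, the two baseline measures named above, RE and JSD, can be sketched for a single alignment column as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform background distribution, the pseudocount value, and the function names are all hypothetical choices made here, and RVD and RVD2 are not shown because their definitions are not given in this section.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies in one alignment column.

    Gaps and unknown symbols are ignored. The small pseudocount
    (an assumption of this sketch) keeps every probability positive.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) of p from q, in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between p and q, in bits (range 0 to 1)."""
    m = {a: 0.5 * (p[a] + q[a]) for a in p}
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)


# Assumed background: uniform over the 20 amino acids. Published methods
# usually use an empirical background distribution instead.
background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

conserved = column_distribution("HHHHHHHH")  # fully conserved column
variable = column_distribution("HKRDESTA")   # highly variable column

re_conserved = relative_entropy(conserved, background)
re_variable = relative_entropy(variable, background)
jsd_conserved = jensen_shannon_divergence(conserved, background)
jsd_variable = jensen_shannon_divergence(variable, background)
```

Under both measures the conserved column scores higher than the variable one, which is the behaviour a conservation score needs when ranking candidate catalytic positions.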
References
Altschul SF et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Berman H et al (2000) The protein data bank. Nucleic Acids Res 28:235–242
Bartlett G, Porter C, Borkakoti N, Thornton J (2002) Analysis of catalytic residues in enzyme active sites. J Mol Biol 324:105–121
Capra J, Singh S (2007) Predicting functionally important residues from sequence conservation. Bioinformatics 23:1875–1882
Caffrey D, Somaroo S, Hughes J, Mintseris J, Huang E (2004) Are protein–protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci 13:190–202
Cover T, Thomas J (1991) Elements of information theory. Wiley, New York
David L, Sutch B, Livesay DR (2005) Predicting protein functional sites with phylogenetic motifs. Proteins 58:309–320
Donald JS, Shakhnovich EI (2005) Determining functional specificity from protein sequence. Bioinformatics 21:2629–2635
Dukka B, Dennis R (2008) Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 24:2308–2316
del Sol Mesa A, Pazos F, Valencia A (2003) Automatic methods for predicting functionally important residues. J Mol Biol 326:1289–1302
Dou YC, Zheng XQ, Wang J (2009a) Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 262(2):317–322
Dou YC, Zheng XQ, Wang J (2009b) Prediction of catalytic residues using the variation of stereochemical properties. Protein J 28:29–33
Dodge C, Schneider R, Sander C (1998) The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res 26:313–315
Fischer JD, Mayer CE, Söding J (2008) Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24:613–620
Gribskov M, Robinson N (1996) Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput Chem 20:25–33
Gutteridge A, Bartlett GJ, Thornton JM (2003) Using a neural network and spatial clustering to predict the location of active sites in enzymes. J Mol Biol 330:719–734
Innis CA, Anand AP, Sowdhamini R, Brocchieri L (2004) Prediction of functional sites in proteins using conserved functional group analysis. J Mol Biol 337:1053–1068
Johnson RW (1979) Axiomatic characterization of the directed divergences and their linear combinations. IEEE Trans Inf Theory 6:709–716
Liu XS, Guo WL (2008) Robustness of the residue conservation score reflecting both frequencies and physicochemistries. Amino Acids 34:643–652
Lin J (1991) Divergence measure based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151
Mihalek I, Reš I, Lichtarge O (2007) Background frequencies for residue variability estimates: BLOSUM revisited. BMC Bioinformatics 8:488
Mihalek I, Reš I, Lichtarge O (2004) A family of evolution–entropy hybrid methods for ranking residues by importance. J Mol Biol 336:1265–1282
Merkl R, Zwick M (2008) H2r: Identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 9:151
Mirny L, Shakhnovich E (1999) Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J Mol Biol 291:177–196
Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol 21:1781–1791
Pande S, Raheja A, Livesay DR (2007) Prediction of enzyme catalytic sites from sequence using neural networks. IEEE Symp CIBCB 07:247–253
Panchenko A, Kondrashov F, Bryant S (2003) Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 13:884–892
Petrova N, Wu C (2006) Prediction of catalytic residues using support vector machines with selected protein sequence and structural properties. BMC Bioinformatics 7:312
Pei J, Grishin N (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics 17:700–712
Porter C, Bartlett G, Thornton J (2003) The catalytic site atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 32:D129–D133
Reva B, Antipin Y, Sander C (2007) Determinants of protein function revealed by combinatorial entropy optimization. Genome Biol 8:R232
Sterner B, Singh R, Berger B (2007) Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 14:1058–1073
Shenkin P, Erman BLM (1991) Information-theoretical entropy as a measure of sequence variability. Proteins 11:297–313
Taylor W (1986) The classification of amino acid conservation. J Theor Biol 119:205–218
Tang Y, Sheng Z, Chen Y, Zhang Z (2008) An improved prediction of catalytic residues in enzyme structures. Protein Eng Des Sel 21:295–302
Thompson J, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Valdar W (2002) Scoring residue conservation. Proteins 48:227–241
Wang K, Samudrala R (2006) Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 7:385
Williamson R (1995) Information theory analysis of the relationship between primary sequence structure and ligand recognition among a class of facilitated transporters. J Theor Biol 174:179–188
Ye K, Vriend G, IJzerman A (2008) Tracing evolutionary pressure. Bioinformatics 24:908–915
Youn E, Peters B, Radivojac P, Mooney SD (2007) Evaluation of features for catalytic residue prediction in novel folds. Protein Sci 16:216–226
Zhang T, Zhang H, Chen K, Shen SY, Ruan J, Kurgan L (2008a) Accurate sequence-based prediction of catalytic residues. Bioinformatics 24:2329–2338
Zhang SW, Zhang YL, Pan Q, Cheng YW, Chou KC (2008b) Estimating residue evolutionary conservation by introducing von Neumann entropy and a novel gap-treating approach. Amino Acids 35:495–501
Acknowledgments
This work was partially supported by the Natural Science Foundation of China (No. 10731040), the Shanghai Leading Academic Discipline Project (No. S30405), and the Innovation Program of Shanghai Municipal Education Commission (No. 09zz134).
Cite this article
Dou, Y., Zheng, X., Yang, J. et al. Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 39, 1353–1361 (2010). https://doi.org/10.1007/s00726-010-0587-2
DOI: https://doi.org/10.1007/s00726-010-0587-2