Prediction of catalytic residues based on an overlapping amino acid classification

  • Original Article
  • Amino Acids

Abstract

Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. One way to incorporate amino acid similarity into conservation measures is to group amino acids into disjoint sets. In this paper, we instead build on the overlapping amino acid classification proposed by Taylor and define two measures: the relative entropy of Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 identify catalytic residues better than many existing conservation measures, in particular the commonly used relative entropy (RE) and Jensen–Shannon divergence (JSD). To further improve RVD and RVD2, we obtain two new conservation measures by combining each of them with the classical JSD. Experimental results suggest that these combined measures perform very well in identifying catalytic residues.
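As a rough illustration of the two baseline measures named in the abstract, the following Python sketch scores one alignment column with relative entropy (RE) and Jensen–Shannon divergence (JSD). It is not the authors' implementation: the uniform background distribution, the pseudocount, and all function names are assumptions made for this example, and RVD and RVD2 are not reproduced because the abstract does not define them.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a uniform background; published measures usually use an
# empirical background distribution instead.
UNIFORM = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies in one alignment column.

    Gaps and unknown symbols are ignored; a small pseudocount
    (an assumption of this sketch) keeps every probability positive.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def jensen_shannon_divergence(p, q):
    """JSD with equal weights: mean of D(p || m) and D(q || m), m = (p + q) / 2."""
    m = {a: 0.5 * (p[a] + q[a]) for a in p}
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)


if __name__ == "__main__":
    conserved = column_distribution("CCCCCCCC")  # fully conserved column
    variable = column_distribution("ACDEFGHI")   # highly variable column
    for name, p in (("conserved", conserved), ("variable", variable)):
        print(name,
              round(relative_entropy(p, UNIFORM), 3),
              round(jensen_shannon_divergence(p, UNIFORM), 3))
```

Under both measures a fully conserved column scores higher than a variable one, which is the property that makes such scores usable for ranking candidate catalytic residues.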

References

  • Altschul SF et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

  • Berman H et al (2000) The protein data bank. Nucleic Acids Res 28:235–242

  • Bartlett G, Porter C, Borkakoti N, Thornton J (2002) Analysis of catalytic residues in enzyme active sites. J Mol Biol 324:105–121

  • Capra J, Singh S (2007) Predicting functionally important residues from sequence conservation. Bioinformatics 23:1875–1882

  • Caffrey D, Somaroo S, Hughes J, Mintseris J, Huang E (2004) Are protein–protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci 13:190–202

  • Cover T, Thomas J (1991) Elements of information theory. Wiley, New York

  • David L, Sutch B, Livesay DR (2005) Predicting protein functional sites with phylogenetic motifs. Proteins 58:309–320

  • Donald JS, Shakhnovich EI (2005) Determining functional specificity from protein sequence. Bioinformatics 21:2629–2635

  • Dukka B, Dennis R (2008) Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 24:2308–2316

  • del Sol Mesa A, Pazos F, Valencia A (2003) Automatic methods for predicting functionally important residues. J Mol Biol 326:1289–1302

  • Dou YC, Zheng XQ, Wang J (2009a) Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 262(2):317–322

  • Dou YC, Zheng XQ, Wang J (2009b) Prediction of catalytic residues using the variation of stereochemical properties. Protein J 28:29–33

  • Dodge C, Schneider R, Sander C (1998) The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res 26:313–315

  • Fischer JD, Mayer CE, Söding J (2008) Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24:613–620

  • Gribskov M, Robinson N (1996) Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput Chem 20:25–33

  • Gutteridge A, Bartlett GJ, Thornton JM (2003) Using a neural network and spatial clustering to predict the location of active sites in enzymes. J Mol Biol 330:719–734

  • Innis CA, Anand AP, Sowdhamini R, Brocchieri L (2004) Prediction of functional sites in proteins using conserved functional group analysis. J Mol Biol 337:1053–1068

  • Johnson RW (1979) Axiomatic characterization of the directed divergences and their linear combinations. IEEE Trans Inf Theory 6:709–716

  • Liu XS, Guo WL (2008) Robustness of the residue conservation score reflecting both frequencies and physicochemistries. Amino Acids 34:643–652

  • Lin J (1991) Divergence measure based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151

  • Mihalek I, Reš I, Lichtarge O (2007) Background frequencies for residue variability estimates: BLOSUM revisited. BMC Bioinformatics 8:488

  • Mihalek I, Reš I, Lichtarge O (2004) A family of evolution–entropy hybrid methods for ranking residues by importance. J Mol Biol 336:1265–1282

  • Merkl R, Zwick M (2008) H2r: Identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 9:151

  • Mirny L, Shakhnovich E (1999) Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J Mol Biol 291:177–196

  • Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol 21:1781–1791

  • Pande S, Raheja A, Livesay DR (2007) Prediction of enzyme catalytic sites from sequence using neural networks. IEEE Symp CIBCB 07:247–253

  • Panchenko A, Kondrashov F, Bryant S (2003) Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 13:884–892

  • Petrova N, Wu C (2006) Prediction of catalytic residues using support vector machines with selected protein sequence and structural properties. BMC Bioinformatics 7:312

  • Pei J, Grishin N (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics 17:700–712

  • Porter C, Bartlett G, Thornton J (2003) The catalytic site atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 32:D129–D133

  • Reva B, Antipin Y, Sander C (2007) Determinants of protein function revealed by combinatorial entropy optimization. Genome Biol 8:R232

  • Sterner B, Singh R, Berger B (2007) Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 14:1058–1073

  • Shenkin P, Erman BLM (1991) Information-theoretical entropy as a measure of sequence variability. Proteins 11:297–313

  • Taylor W (1986) The classification of amino acid conservation. J Theor Biol 119:205–218

  • Tang Y, Sheng Z, Chen Y, Zhang Z (2008) An improved prediction of catalytic residues in enzyme structures. Protein Eng Des Sel 21:295–302

  • Thompson J, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680

  • Valdar W (2002) Scoring residue conservation. Proteins 48:227–241

  • Wang K, Samudrala R (2006) Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 7:385

  • Williamson R (1995) Information theory analysis of the relationship between primary sequence structure and ligand recognition among a class of facilitated transporters. J Theor Biol 174:179–188

  • Ye K, Vriend G, IJzerman A (2008) Tracing evolutionary pressure. Bioinformatics 24:908–915

  • Youn E, Peters B, Radivojac P, Mooney SD (2007) Evaluation of features for catalytic residue prediction in novel folds. Protein Sci 16:216–226

  • Zhang T, Zhang H, Chen K, Shen SY, Ruan J, Kurgan L (2008a) Accurate sequence-based prediction of catalytic residues. Bioinformatics 24:2329–2338

  • Zhang SW, Zhang YL, Pan Q, Cheng YW, Chou KC (2008b) Estimating residue evolutionary conservation by introducing von Neumann entropy and a novel gap-treating approach. Amino Acids 35:495–501

Acknowledgments

This work was partially supported by the Natural Science Foundation of China (No. 10731040), the Shanghai Leading Academic Discipline Project (No. S30405) and the Innovation Program of Shanghai Municipal Education Commission (No. 09zz134).

Author information

Corresponding author

Correspondence to Jun Wang.

About this article

Cite this article

Dou, Y., Zheng, X., Yang, J. et al. Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 39, 1353–1361 (2010). https://doi.org/10.1007/s00726-010-0587-2

  • DOI: https://doi.org/10.1007/s00726-010-0587-2
