Prediction of catalytic residues based on an overlapping amino acid classification

Dou, Yongchao; Zheng, Xiaoqi; Yang, Jialiang; Wang, Jun

doi:10.1007/s00726-010-0587-2

Original Article
Published: 10 April 2010

Amino Acids, volume 39, pages 1353–1361 (2010)

Yongchao Dou^1,2, Xiaoqi Zheng^5, Jialiang Yang^4 & Jun Wang^3,5

Abstract

Protein sequence conservation is a powerful and widely used indicator for predicting catalytic residues from enzyme sequences. To incorporate amino acid similarity into conservation measures, one approach is to group amino acids into disjoint sets. In this paper, based on the overlapping amino acid classification proposed by Taylor, we define two measures: the relative entropy of Venn diagram (RVD) and RVD2. In large-scale testing, we demonstrate that RVD and RVD2 identify catalytic residues better than many existing conservation measures, in particular the commonly used relative entropy (RE) and Jensen–Shannon divergence (JSD). To further improve RVD and RVD2, we obtain two new conservation measures by combining them with the classical JSD. Experimental results suggest that these combined measures perform very well in identifying catalytic residues.
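As a point of reference for the two baseline measures named in the abstract, the following is a minimal sketch of how RE and JSD can be scored for a single alignment column. It is not the RVD or RVD2 definition from the paper, which the abstract does not spell out. The uniform background distribution, the small pseudocount, the equal 0.5 weighting in the JSD and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: a uniform background distribution. Published measures often
# use database-derived background frequencies instead.
BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_distribution(column, pseudocount=1e-6):
    """Estimate amino acid frequencies in one alignment column.

    Gap and unknown characters are ignored. The pseudocount (an assumed
    value) keeps every probability strictly positive.
    """
    counts = Counter(ch for ch in column.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) in bits."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS if p[aa] > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence with equal weights on p and q, in bits."""
    mixture = {aa: 0.5 * (p[aa] + q[aa]) for aa in AMINO_ACIDS}
    return 0.5 * relative_entropy(p, mixture) + 0.5 * relative_entropy(q, mixture)


# Two hypothetical columns: one fully conserved, one highly variable.
conserved = column_distribution("HHHHHHHH")
variable = column_distribution("HKRDESTA")

for name, dist in (("conserved", conserved), ("variable", variable)):
    re_score = relative_entropy(dist, BACKGROUND)
    jsd_score = jensen_shannon_divergence(dist, BACKGROUND)
    print(f"{name}: RE={re_score:.3f} bits, JSD={jsd_score:.3f} bits")
```

Both scores grow as the column distribution moves away from the background, so the conserved column scores higher than the variable one. Neither score uses any notion of amino acid similarity, which is the gap that classification-based measures such as RVD aim to address.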

References

Altschul SF et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Berman H et al (2000) The protein data bank. Nucleic Acids Res 28:235–242
Bartlett G, Porter C, Borkakoti N, Thornton J (2002) Analysis of catalytic residues in enzyme active sites. J Mol Biol 324:105–121
Capra J, Singh S (2007) Predicting functionally important residues from sequence conservation. Bioinformatics 23:1875–1882
Caffrey D, Somaroo S, Hughes J, Mintseris J, Huang E (2004) Are protein–protein interfaces more conserved in sequence than the rest of the protein surface? Protein Sci 13:190–202
Cover T, Thomas J (1991) Elements of information theory. Wiley, New York
David L, Sutch B, Livesay DR (2005) Predicting protein functional sites with phylogenetic motifs. Proteins 58:309–320
Donald JS, Shakhnovich EI (2005) Determining functional specificity from protein sequence. Bioinformatics 21:2629–2635
Dukka B, Dennis R (2008) Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 24:2308–2316
del Sol Mesa A, Pazos F, Valencia A (2003) Automatic methods for predicting functionally important residues. J Mol Biol 326:1289–1302
Dou YC, Zheng XQ, Wang J (2009a) Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 262(2):317–322
Dou YC, Zheng XQ, Wang J (2009b) Prediction of catalytic residues using the variation of stereochemical properties. Protein J 28:29–33
Dodge C, Schneider R, Sander C (1998) The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res 26:313–315
Fischer JD, Mayer CE, Söding J (2008) Prediction of protein functional residues from sequence by probability density estimation. Bioinformatics 24:613–620
Gribskov M, Robinson N (1996) Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput Chem 20:25–33
Gutteridge A, Bartlett GJ, Thornton JM (2003) Using a neural network and spatial clustering to predict the location of active sites in enzymes. J Mol Biol 330:719–734
Innis CA, Anand AP, Sowdhamini R, Brocchieri L (2004) Prediction of functional sites in proteins using conserved functional group analysis. J Mol Biol 337:1053–1068
Johnson RW (1979) Axiomatic characterization of the directed divergences and their linear combinations. IEEE Trans Inf Theory 6:709–716
Liu XS, Guo WL (2008) Robustness of the residue conservation score reflecting both frequencies and physicochemistries. Amino Acids 34:643–652
Lin J (1991) Divergence measure based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151
Mihalek I, Reš I, Lichtarge O (2007) Background frequencies for residue variability estimates: BLOSUM revisited. BMC Bioinformatics 8:488
Mihalek I, Reš I, Lichtarge O (2004) A family of evolution–entropy hybrid methods for ranking residues by importance. J Mol Biol 336:1265–1282
Merkl R, Zwick M (2008) H2r: Identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 9:151
Mirny L, Shakhnovich E (1999) Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J Mol Biol 291:177–196
Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol Biol Evol 21:1781–1791
Pande S, Raheja A, Livesay DR (2007) Prediction of enzyme catalytic sites from sequence using neural networks. IEEE Symp CIBCB 07:247–253
Panchenko A, Kondrashov F, Bryant S (2003) Prediction of functional sites by analysis of sequence and structure conservation. Protein Sci 13:884–892
Petrova N, Wu C (2006) Prediction of catalytic residues using support vector machines with selected protein sequence and structural properties. BMC Bioinformatics 7:312
Pei J, Grishin N (2001) AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics 17:700–712
Porter C, Bartlett G, Thornton J (2003) The catalytic site atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res 32:D129–D133
Reva B, Antipin Y, Sander C (2007) Determinants of protein function revealed by combinatorial entropy optimization. Genome Biol 8:R232
Sterner B, Singh R, Berger B (2007) Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 14:1058–1073
Shenkin P, Erman BLM (1991) Information-theoretical entropy as a measure of sequence variability. Proteins 11:297–313
Taylor W (1986) The classification of amino acid conservation. J Theor Biol 119:205–218
Tang Y, Sheng Z, Chen Y, Zhang Z (2008) An improved prediction of catalytic residues in enzyme structures. Protein Eng Des Sel 21:295–302
Thompson J, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
Valdar W (2002) Scoring residue conservation. Proteins 48:227–241
Wang K, Samudrala R (2006) Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 7:385
Williamson R (1995) Information theory analysis of the relationship between primary sequence structure and ligand recognition among a class of facilitated transporters. J Theor Biol 174:179–188
Ye K, Vriend G, IJzerman A (2008) Tracing evolutionary pressure. Bioinformatics 24:908–915
Youn E, Peters B, Radivojac P, Mooney SD (2007) Evaluation of features for catalytic residue prediction in novel folds. Protein Sci 16:216–226
Zhang T, Zhang H, Chen K, Shen SY, Ruan J, Kurgan L (2008a) Accurate sequence-based prediction of catalytic residues. Bioinformatics 24:2329–2338
Zhang SW, Zhang YL, Pan Q, Cheng YW, Chou KC (2008b) Estimating residue evolutionary conservation by introducing von Neumann entropy and a novel gap-treating approach. Amino Acids 35:495–501

Acknowledgments

This work was partially supported by the Natural Science Foundation of China (No. 10731040), the Shanghai Leading Academic Discipline Project (No. S30405) and the Innovation Program of Shanghai Municipal Education Commission (No. 09zz134).

Author information

Authors and Affiliations

1. School of Mathematical Science, Dalian University of Technology, Dalian, 116024, People’s Republic of China
Yongchao Dou
2. College of Advanced Science and Technology, Dalian University of Technology, Dalian, 116024, People’s Republic of China
Yongchao Dou
3. Scientific Computing Key Laboratory of Shanghai Universities, Shanghai, 200234, People’s Republic of China
Jun Wang
4. MPI-Institute of Computational Biology, CAS, Shanghai, 200031, People’s Republic of China
Jialiang Yang
5. Department of Mathematics, Shanghai Normal University, Shanghai, 200234, People’s Republic of China
Xiaoqi Zheng & Jun Wang

Corresponding author

Correspondence to Jun Wang.

About this article

Cite this article

Dou, Y., Zheng, X., Yang, J. et al. Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 39, 1353–1361 (2010). https://doi.org/10.1007/s00726-010-0587-2

Received: 18 October 2009
Accepted: 27 March 2010
Published: 10 April 2010
Issue Date: November 2010
DOI: https://doi.org/10.1007/s00726-010-0587-2
