Sequence Conservation in the Prediction of Catalytic Sites

The Protein Journal

Abstract

Predicting the catalytic sites of a given enzyme is an important open problem in bioinformatics. Many machine learning-based methods have recently been developed; their advantage is that they can account for many sequence-based and structural features. We found that, although these methods incorporate many kinds of features, protein sequence conservation supplies most of the information they use and should continue to play an important role. We therefore tested the ability of several conservation features to predict catalytic sites using a Support Vector Machine (SVM) classifier. Our results suggest that the position-specific scoring matrix (PSSM) performs better than the other features, and that incorporating conservation information from sequentially adjacent sites is more effective than incorporating it from structurally adjacent ones. Moreover, although conservation information is effective in predicting catalytic sites, optimizing the combination of conservation features with other features remains difficult.
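To make the idea of "conservation information of sequentially adjacent sites" concrete, the following is a minimal sketch and not the authors' implementation. It scores each alignment column with a Jensen-Shannon divergence against a background distribution and then builds, for every residue, a feature vector from a window of neighbouring positions. The uniform background, the pseudocount, the window half-width, the zero padding, and all function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Amino-acid frequencies of one alignment column (gaps are ignored)."""
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for residue in column:
        if residue in counts:
            counts[residue] += 1.0
    total = sum(counts.values())
    return [counts[aa] / total for aa in AMINO_ACIDS]


def jensen_shannon_divergence(p, q):
    """JSD in bits between two discrete distributions of equal length."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def conservation_scores(alignment):
    """One JSD score per column; a uniform background is assumed here."""
    background = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    n_columns = len(alignment[0])
    return [
        jensen_shannon_divergence(
            column_distribution([seq[i] for seq in alignment]), background
        )
        for i in range(n_columns)
    ]


def window_features(scores, half_width=2, pad=0.0):
    """Feature vector per site: its score plus those of sequence neighbours."""
    padded = [pad] * half_width + list(scores) + [pad] * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(scores))]


# Toy alignment: columns 0 and 1 are fully conserved, column 2 is variable.
toy_alignment = ["ACD", "ACE", "ACF"]
toy_scores = conservation_scores(toy_alignment)
toy_features = window_features(toy_scores, half_width=2)
```

Each row of `toy_features` could then serve as the input vector for one residue in a classifier such as an SVM. Structurally adjacent sites would require 3D coordinates to define the neighbourhood, which this sketch does not attempt.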

Abbreviations

SVM: Support Vector Machine
JSD: Jensen-Shannon divergence
PSSM: Position-specific scoring matrix
PCA: Principal Component Analysis
PC: Principal Component
WOP: Weighted observed percentage
CR: Contribution rate
ACR: Accumulated contribution rate
ROC: Receiver operating characteristic
P: Precision
R: Recall
FPR: False positive rate
TPR: True positive rate
MCC: Matthews correlation coefficient
RP: Recall/Precision
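Several of the abbreviations above name standard evaluation measures. As a hedged illustration, the sketch below computes them from the four confusion-matrix counts using their textbook definitions. The function name, the dictionary keys, and the convention of returning 0.0 when a denominator is zero are assumptions made here, not details taken from the article.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall equals TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "tpr": recall,
        "fpr": fpr,
        "mcc": mcc,
    }


# Hypothetical counts: 10 catalytic residues among 100, 8 of them recovered.
example = classification_metrics(tp=8, fp=2, tn=88, fn=2)
```

Because catalytic residues are a small minority of all residues, measures such as MCC and precision/recall are usually more informative here than plain accuracy.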

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 10731040), the Shanghai Leading Academic Discipline Project (No. S30405) and the Innovation Program of Shanghai Municipal Education Commission (No. 09zz134). All computations were performed on a LENOVO Shenteng 1800 COW located in the School of Mathematical Science, Dalian University of Technology. The authors thank Dr. Elisa Cilia for providing the data on the HA-superfamily data set.

Author information

Corresponding author

Correspondence to Jun Wang.

About this article

Cite this article

Dou, Y., Geng, X., Gao, H. et al. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 30, 229–239 (2011). https://doi.org/10.1007/s10930-011-9324-2

  • DOI: https://doi.org/10.1007/s10930-011-9324-2
