
Prediction of Protein Function Improving Sequence Remote Alignment Search by a Fuzzy Logic Algorithm

The Protein Journal

Abstract

The functional annotation of newly sequenced proteins is a major bottleneck in genomic science. The best way to infer the function of a protein from its sequence is to find a related protein for which biological information is available. Current alignment algorithms return a list of protein sequence stretches that show significant similarity to different target proteins, ordered by their statistical scores. However, statistical and biological significance do not always coincide; therefore, reordering the program output according to biological characteristics, rather than by mathematical score alone, would aid functional annotation. A new method is described that predicts the putative function of a protein by integrating the results of the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to rearrange a PSI-BLAST profile so that it better reflects biological function. Four of them (amino acid content, matched segment length, and hydropathy and flexibility profiles) contributed positively to the accurate prediction of a protein's function from its sequence when integrated by a fuzzy logic algorithm into a program, BYPASS.
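The abstract describes re-ranking PSI-BLAST hits by combining several sequence-derived similarities through fuzzy logic. BYPASS's actual membership functions, scales, weights and aggregation rule are not given here, so the following is only a minimal sketch under stated assumptions. It covers three of the four characteristics (amino acid composition, matched segment length and a Kyte-Doolittle hydropathy profile); a flexibility profile would plug in the same way with a flexibility scale, which is omitted here. Hits are plain `(hit_id, evalue, hit_segment)` tuples rather than parsed PSI-BLAST output, the fuzzy AND is taken as the minimum (Zadeh's operator), and every name (`window_profile`, `fuzzy_score`, `rerank`), the window of 9 residues and all membership thresholds are hypothetical.

```python
import math
from collections import Counter

# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_profile(seq, scale, window=9):
    """Sliding-window mean of a per-residue scale (e.g. hydropathy)."""
    vals = [scale.get(aa, 0.0) for aa in seq.upper()]  # unknown residues count as 0
    if not vals:
        return []
    if len(vals) < window:
        return [sum(vals) / len(vals)]  # too short for a window: one global mean
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def profile_distance(p, q):
    """Mean absolute difference over the overlapping part of two profiles."""
    n = min(len(p), len(q))
    if n == 0:
        return float("inf")
    return sum(abs(a - b) for a, b in zip(p[:n], q[:n])) / n


def composition_distance(s, t):
    """Euclidean distance between amino acid composition vectors (fractions)."""
    cs, ct = Counter(s.upper()), Counter(t.upper())
    ls, lt = max(len(s), 1), max(len(t), 1)
    return math.sqrt(sum((cs[a] / ls - ct[a] / lt) ** 2 for a in AMINO_ACIDS))


def low(x, full, zero):
    """Membership in fuzzy set 'low': 1 if x <= full, 0 if x >= zero, linear between."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def high(x, zero, full):
    """Membership in fuzzy set 'high': 0 if x <= zero, 1 if x >= full, linear between."""
    return 1.0 - low(x, zero, full)


def fuzzy_score(query_seg, hit_seg, query_len):
    """Fuzzy AND (minimum) of three 'similar' memberships; thresholds are hypothetical."""
    hyd_dist = profile_distance(window_profile(query_seg, KD), window_profile(hit_seg, KD))
    mu_hyd = low(hyd_dist, 0.3, 1.5)
    mu_comp = low(composition_distance(query_seg, hit_seg), 0.05, 0.25)
    mu_len = high(len(hit_seg) / query_len, 0.3, 0.8)  # fraction of the query covered
    return min(mu_hyd, mu_comp, mu_len)


def rerank(query_seg, query_len, hits):
    """hits: (hit_id, evalue, hit_seg) tuples; sort by fuzzy score, then by E-value."""
    scored = [(fuzzy_score(query_seg, seg, query_len), ev, hid) for hid, ev, seg in hits]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(hid, score, ev) for score, ev, hid in scored]


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    hits = [("hitB", 1e-10, "DDEEKKRRDDEEKKRR"), ("hitA", 1e-3, query)]
    for hid, score, ev in rerank(query, len(query), hits):
        print(hid, round(score, 2), ev)
```

In the example, `hitA` (a segment identical to the query) is moved ahead of `hitB` even though `hitB` has the far smaller E-value, because `hitB`'s composition and hydropathy differ strongly from the query. Taking the minimum means a hit must look similar on every criterion to score well; a weighted mean of the memberships would be a softer alternative.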





Acknowledgements

This research was supported by Grant BFU 2004-06377 and BIO 2007-67904-C02-01 from the MCYT (Ministerio de Ciencia y Tecnologia, Spain) and by the Centre de Referència de R+D de Biotecnologia de la Generalitat de Catalunya. AH is a fellowship recipient from the Generalitat de Catalunya.

Author information


Correspondence to Enrique Querol.

Additional information

Antonio Gómez and Juan Cedano contributed equally to this work.


About this article

Cite this article

Gómez, A., Cedano, J., Espadaler, J. et al. Prediction of Protein Function Improving Sequence Remote Alignment Search by a Fuzzy Logic Algorithm. Protein J 27, 130–139 (2008). https://doi.org/10.1007/s10930-007-9116-x


  • DOI: https://doi.org/10.1007/s10930-007-9116-x
