Abstract
The functional annotation of newly determined protein sequences is a major challenge for genomic science. The most reliable way to suggest the function of a protein from its sequence is to find a related protein for which biological information is available. Current alignment algorithms return a list of protein sequence stretches that show significant similarity to different target proteins, ranked by their mathematical scores. However, statistical and biological significance do not always coincide. Re-ranking the program output according to biological characteristics, rather than by mathematical score alone, would therefore help functional annotation. A new method is described that predicts the putative function of a protein by integrating the results of the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to re-rank a PSI-BLAST profile so that it better reflects biological function. Four of them (amino acid content, matched segment length, and hydropathy and flexibility profiles) improved the accuracy of function prediction from sequence when integrated by a fuzzy logic algorithm into a program, BYPASS.
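The abstract does not give BYPASS's membership functions, rule base or weights, so the following is only a minimal Python sketch of the general idea under stated assumptions. It scores each PSI-BLAST hit on three of the four features named above (amino acid content, matched segment length and a hydropathy profile, assuming the Kyte and Doolittle scale), converts each score into a fuzzy membership degree with hypothetical linear ramps, combines them with a single fuzzy AND rule (the minimum), and re-ranks the hits by that degree, breaking ties by E-value. The class `Hit`, the functions `fuzzy_score` and `rerank`, the window size and all thresholds are illustrative assumptions, not the published method. The flexibility profile is left out because its residue scale is not reproduced here; it would enter as one more profile correlation and membership term.

```python
"""Hypothetical sketch: fuzzy re-ranking of PSI-BLAST hits (not the BYPASS code)."""
from dataclasses import dataclass
from math import sqrt

# Kyte & Doolittle (1982) hydropathy values for the 20 standard residues.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(KD)


@dataclass
class Hit:
    name: str      # subject identifier (hypothetical field)
    evalue: float  # PSI-BLAST E-value, used here only to break ties
    segment: str   # matched subject segment, gaps removed


def composition(seq):
    """Fraction of each standard residue in seq."""
    n = len(seq) or 1
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def profile(seq, scale, window=5):
    """Sliding-window mean of a per-residue scale (unknown residues count as 0)."""
    vals = [scale.get(c, 0.0) for c in seq]
    if len(vals) < window:
        return [sum(vals) / len(vals)] if vals else [0.0]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def pearson(x, y):
    """Pearson correlation over the common prefix; 0.0 if either side is flat."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sqrt(sum((a - mx) ** 2 for a in x))
    syy = sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sxx * syy) if sxx and syy else 0.0


def ramp(x, lo, hi):
    """Linear membership function: 0 at or below lo, 1 at or above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def fuzzy_score(query, segment):
    """Degree of 'segment is biologically similar to query' from one fuzzy rule."""
    # 1 minus the total variation distance between the two compositions.
    comp_sim = 1.0 - 0.5 * sum(abs(a - b) for a, b in
                               zip(composition(query), composition(segment)))
    len_ratio = min(len(query), len(segment)) / max(len(query), len(segment), 1)
    hydro_r = pearson(profile(query, KD), profile(segment, KD))
    # Hypothetical thresholds; a flexibility-profile term would be added the same way.
    memberships = (
        ramp(comp_sim, 0.5, 0.9),
        ramp(len_ratio, 0.3, 0.9),
        ramp(hydro_r, 0.0, 0.8),
    )
    return min(memberships)  # Zadeh fuzzy AND


def rerank(query, hits):
    """Sort hits by fuzzy score (high first), then by E-value (low first)."""
    scored = [(hit, fuzzy_score(query, hit.segment)) for hit in hits]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0].evalue))


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    hits = [Hit("polyA", 1e-30, "A" * 12), Hit("self", 1e-5, query)]
    for hit, score in rerank(query, hits):
        print(f"{hit.name}\t{hit.evalue:g}\t{score:.2f}")
```

Using the minimum as the AND operator makes the rule conservative: a hit is promoted only when every feature agrees, so a poly-alanine segment with a far better E-value can still fall below a biologically consistent match. A compensatory aggregator such as a weighted mean is an equally plausible design choice that the abstract neither confirms nor rules out.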
Acknowledgements
This research was supported by Grants BFU 2004-06377 and BIO 2007-67904-C02-01 from the MCYT (Ministerio de Ciencia y Tecnología, Spain) and by the Centre de Referència de R+D de Biotecnologia de la Generalitat de Catalunya. AH is the recipient of a fellowship from the Generalitat de Catalunya.
Additional information
Antonio Gómez and Juan Cedano contributed equally to this work.
About this article
Gómez, A., Cedano, J., Espadaler, J. et al. Prediction of Protein Function Improving Sequence Remote Alignment Search by a Fuzzy Logic Algorithm. Protein J 27, 130–139 (2008). https://doi.org/10.1007/s10930-007-9116-x
DOI: https://doi.org/10.1007/s10930-007-9116-x