Abstract
GESTs (gene expression similarity and taxonomy similarity) is a gene function prediction approach we proposed previously. It is based on gene expression similarity and on the concept similarity of functional classes defined in the Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function in a protein interaction network. Unlike conventional methods, the proposed approach automatically selects, during the learning process, the most appropriate functional classes at the most specific level possible. It also draws on genes annotated to nearby classes to support predictions to small, specific classes in GO. Using yeast protein-protein interaction data from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the GO “biological process” ontology. We use three measures designed specifically for functional classes organized in GO. The results show that our method can broadly predict gene functions using very specific functional terms. Based on the GO database released in December 2004, we predicted functions for some proteins that were unannotated at that time, and some of these predictions have been confirmed by the SGD annotation data released in April 2006.
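The general idea of filtering interaction neighbors before transferring their annotations can be sketched as follows. This is a minimal illustration under stated assumptions, not the GESTs algorithm or the specific filters proposed in the paper. Here a neighbor is kept only if its expression profile has a Pearson correlation of at least a hypothetical threshold (`min_corr=0.5`) with the query protein. The remaining neighbors then vote for their functional terms by simple counting. The protein names, term labels, and all function names are hypothetical, and the GO hierarchy (taxonomy similarity) is not modeled.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def filter_neighbors(query, interactions, expression, min_corr=0.5):
    """Keep interaction partners of `query` whose expression correlates with it.

    Hypothetical filter: the correlation test and threshold are illustrative
    assumptions, not the filters defined in the paper.
    """
    if query not in expression:
        return set()
    return {
        p for p in interactions.get(query, set())
        if p in expression and pearson(expression[query], expression[p]) >= min_corr
    }


def predict_terms(query, interactions, expression, annotations, min_corr=0.5, top_k=2):
    """Rank functional terms by how many filtered neighbors carry them (majority vote)."""
    votes = Counter()
    for p in filter_neighbors(query, interactions, expression, min_corr):
        votes.update(annotations.get(p, ()))
    return [term for term, _ in votes.most_common(top_k)]


# Hypothetical toy data: YQ is the protein of unknown function.
INTERACTIONS = {"YQ": {"YA", "YB", "YC"}}
EXPRESSION = {
    "YQ": [1.0, 2.0, 3.0, 4.0],
    "YA": [1.1, 2.1, 2.9, 4.2],  # co-expressed with YQ, kept
    "YB": [2.0, 4.0, 6.0, 8.1],  # co-expressed with YQ, kept
    "YC": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with YQ, filtered out
}
ANNOTATIONS = {
    "YA": {"term_A", "term_B"},
    "YB": {"term_A"},
    "YC": {"term_C"},
}

if __name__ == "__main__":
    print(predict_terms("YQ", INTERACTIONS, EXPRESSION, ANNOTATIONS))
```

On the toy data, the anti-correlated partner YC is discarded, so its term never enters the vote. `term_A` ranks first because two retained neighbors carry it. A real pipeline would, as the abstract describes, also exploit relationships among GO classes, which this flat vote ignores.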
Additional information
Supported in part by the National Natural Science Foundation of China (Grant Nos. 30170515, 30370388, 30370798, 30570424 and 30571034), the National High Tech Development Project of China (Grant Nos. 2003AA2Z2051 and 2002AA2Z2052), Heilongjiang Science & Technology Key Project (Grant No. GB03C602-4), Harbin (City) Science & Technology Key Project (Grant No. 2003AA3CS113), Natural Science Foundation of Heilongjiang (Grant No. F0177), and Outstanding Overseas Scientist Foundation of Education Department of Heilongjiang Province (Grant No. 1055HG009).
About this article
Cite this article
Gao, L., Li, X., Guo, Z. et al. Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile. SCI CHINA SER C 50, 125–134 (2007). https://doi.org/10.1007/s11427-007-0009-1
DOI: https://doi.org/10.1007/s11427-007-0009-1