
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile

Science in China Series C: Life Sciences


GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach that we proposed previously, is based on gene expression similarity and the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend the method to protein-protein interaction data by introducing several methods for filtering the neighbors, in protein interaction networks, of a protein of unknown function. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions in the “biological process” ontology with three measures designed specifically for functional classes organized in GO. The results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.
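
The general idea described above (filter a protein's interaction neighbors by expression similarity, then pick the most specific functional class that the remaining neighbors support) can be illustrated with a minimal Python sketch. This is not the GESTs algorithm itself, whose details are not given in the abstract. The Pearson-correlation filter, the `min_corr` and `min_support` thresholds, the ancestor-propagation rule, and the toy ontology, proteins and profiles are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def depth(term, parents):
    """Longest path from the term up to a root (0 for a root)."""
    ps = parents.get(term, ())
    return 0 if not ps else 1 + max(depth(p, parents) for p in ps)


def predict(protein, interactions, expression, annotations, parents,
            min_corr=0.5, min_support=2):
    """Predict the deepest class supported by at least min_support
    co-expressed interaction neighbors (hypothetical rule)."""
    # Step 1: keep only neighbors whose profiles correlate with the query.
    neighbors = [
        n for n in interactions.get(protein, ())
        if n in expression
        and pearson(expression[protein], expression[n]) >= min_corr
    ]
    # Step 2: each neighbor supports its own classes and their ancestors,
    # so annotations to sibling classes jointly support a shared parent.
    support = Counter()
    for n in neighbors:
        terms = set()
        for t in annotations.get(n, ()):
            terms |= ancestors(t, parents)
        support.update(terms)
    # Step 3: among sufficiently supported classes, prefer the most specific.
    candidates = [t for t, c in support.items() if c >= min_support]
    if not candidates:
        return None
    return max(candidates, key=lambda t: (depth(t, parents), support[t], t))


# Toy data (entirely made up): child class -> list of parent classes.
parents = {
    "root": [],
    "metabolism": ["root"],
    "sugar_metabolism": ["metabolism"],
    "glycolysis": ["sugar_metabolism"],
    "gluconeogenesis": ["sugar_metabolism"],
    "transport": ["root"],
}
interactions = {"P0": ["P1", "P2", "P3"]}
expression = {
    "P0": [1, 2, 3, 4],
    "P1": [2, 4, 6, 8],   # co-expressed with P0
    "P2": [1, 2, 3, 5],   # co-expressed with P0
    "P3": [4, 3, 2, 1],   # anti-correlated, filtered out
}
annotations = {
    "P1": ["glycolysis"],
    "P2": ["gluconeogenesis"],
    "P3": ["transport"],
}

prediction = predict("P0", interactions, expression, annotations, parents)
print(prediction)
```

In this toy case neither specific class has two supporting neighbors on its own, so the sketch falls back to their shared parent class, which mirrors the trade-off between specificity and support that the abstract describes.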



Author information


Corresponding authors

Correspondence to Guo Zheng or Rao ShaoQi.

Additional information

Supported in part by the National Natural Science Foundation of China (Grant Nos. 30170515, 30370388, 30370798, 30570424 and 30571034), the National High Tech Development Project of China (Grant Nos. 2003AA2Z2051 and 2002AA2Z2052), the Heilongjiang Science & Technology Key Project (Grant No. GB03C602-4), the Harbin (City) Science & Technology Key Project (Grant No. 2003AA3CS113), the Natural Science Foundation of Heilongjiang (Grant No. F0177), and the Outstanding Overseas Scientist Foundation of the Education Department of Heilongjiang Province (Grant No. 1055HG009).


About this article

Cite this article

Gao, L., Li, X., Guo, Z. et al. Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile. SCI CHINA SER C 50, 125–134 (2007).
