Finding finer functions for partially characterized proteins by protein-protein interaction networks

Chinese Science Bulletin

Abstract

Based on high-throughput data, numerous algorithms have been designed to find functions of novel proteins. However, the effectiveness of such algorithms is currently limited by several fundamental factors, including (1) the low a priori probability that a novel protein participates in a given detailed function; (2) the large amount of false data in high-throughput datasets; (3) the incomplete data coverage of functional classes; (4) the abundant but heterogeneous negative samples available for training the algorithms; and (5) the lack of detailed functional knowledge for training the algorithms. Here, for partially characterized proteins, we suggest an approach to finding their finer functions based on protein interaction sub-networks or gene expression patterns defined in function-specific subspaces. By properly defining the prediction range and functionally filtering the noisy data, the proposed approach lessens the above-mentioned problems and can thus efficiently find novel functions of proteins. For thousands of partially characterized yeast and human proteins, it reliably finds finer functions (e.g., translational functions) with more than 90% precision. The predicted finer functions are valuable both for guiding follow-up wet-lab validation and for providing the data needed to train algorithms that predict the functions of other proteins.
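
The idea of restricting prediction to a function-specific sub-network can be sketched in a few lines of Python. This is an illustration only: the abstract does not state how sub-networks are scored, so the neighbor-majority vote, the function name `predict_finer_functions`, the `min_fraction` threshold, and the toy protein and function labels below are all assumptions rather than the authors' method.

```python
from collections import Counter


def predict_finer_functions(interactions, broad, finer, parent, min_fraction=0.5):
    """Hypothetical sketch: vote for a finer function inside one broad-function subspace.

    interactions: iterable of (protein_a, protein_b) pairs
    broad:        dict mapping protein -> set of known broad functions
    finer:        dict mapping protein -> set of known finer functions
    parent:       the broad function that defines the subspace
    """
    # Function-specific subspace: proteins already annotated to the broad function.
    members = {p for p, labels in broad.items() if parent in labels}

    # Sub-network: keep only interactions with both partners inside the subspace.
    # Dropping the rest is the "functional filtering" step in this sketch.
    neighbors = {p: set() for p in members}
    for a, b in interactions:
        if a in members and b in members and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    predictions = {}
    for protein in sorted(members):
        if finer.get(protein):
            continue  # already has a finer annotation
        votes = Counter()
        annotated = 0
        for n in sorted(neighbors[protein]):
            labels = finer.get(n)
            if labels:
                annotated += 1
                votes.update(sorted(labels))
        if annotated == 0:
            continue  # no annotated neighbors in the sub-network, so no prediction
        label, count = votes.most_common(1)[0]
        if count / annotated >= min_fraction:
            predictions[protein] = label
    return predictions


# Toy data (invented for illustration).
toy_interactions = [("A", "B"), ("A", "C"), ("A", "X"), ("D", "C")]
toy_broad = {
    "A": {"translation"}, "B": {"translation"}, "C": {"translation"},
    "D": {"translation"}, "X": {"transport"},
}
toy_finer = {
    "B": {"translational initiation"},
    "C": {"translational initiation"},
    "X": {"ion transport"},
}
toy_predictions = predict_finer_functions(
    toy_interactions, toy_broad, toy_finer, parent="translation"
)
```

In this toy network the interaction between A and X is discarded because X lies outside the translation subspace, so X's unrelated annotation cannot influence the vote for A. Only labels below the chosen broad function are considered, which narrows the prediction range in the spirit of the abstract.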

Author information

Correspondence to Guo Zheng.

Additional information

Supported in part by the National Natural Science Foundation of China (Grant Nos. 30370388 and 30670539)

About this article

Cite this article

Li, Y., Guo, Z., Ma, W. et al. Finding finer functions for partially characterized proteins by protein-protein interaction networks. Chin. Sci. Bull. 52, 3363–3370 (2007). https://doi.org/10.1007/s11434-008-0016-z

  • DOI: https://doi.org/10.1007/s11434-008-0016-z
