Abstract
Based on high-throughput data, numerous algorithms have been designed to predict the functions of novel proteins. However, the effectiveness of these algorithms is currently limited by several fundamental factors: (1) the low a priori probability that a novel protein participates in any given specific function; (2) the large amount of false data in high-throughput datasets; (3) the incomplete coverage of functional classes by the data; (4) the abundant but heterogeneous negative samples used to train the algorithms; and (5) the lack of detailed functional knowledge for training. Here, for partially characterized proteins, we propose an approach that finds their finer functions based on protein interaction sub-networks or gene expression patterns defined in function-specific subspaces. By properly restricting the prediction range and functionally filtering noisy data, the approach alleviates the problems listed above and can therefore efficiently find novel functions of proteins. For thousands of partially characterized yeast and human proteins, it reliably finds finer functions (e.g., translational functions) with more than 90% precision. The predicted finer functions are highly valuable both for guiding follow-up wet-lab validation and for providing the training data that algorithms need to predict the functions of other proteins.
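The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the two ideas it names. First, the prediction range for a protein is restricted to the finer (child) terms under its already known general function. Second, its interaction partners are filtered to the function-specific subspace, meaning only partners annotated to that same general function are kept. As the scorer it assumes simple majority voting among the remaining partners, which is a common baseline and not necessarily the authors' method. The function name `predict_finer_function`, the `min_fraction` threshold, and the toy proteins and terms are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple


def predict_finer_function(
    protein: str,
    parent_term: str,
    interactions: List[Tuple[str, str]],
    annotations: Dict[str, Set[str]],
    children: Dict[str, Set[str]],
    min_fraction: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Hypothetical sketch: predict one finer (child) term for a protein
    already annotated to parent_term, by neighbor voting inside the
    function-specific subnetwork (assumed scorer, not the paper's)."""
    # Prediction range: only the finer terms under the known general function.
    candidates = children.get(parent_term, set())
    # Function-specific subspace: proteins annotated to the parent term.
    subspace = {p for p, terms in annotations.items() if parent_term in terms}

    # Interaction partners of `protein` that lie inside the subspace;
    # partners outside it are filtered out as functionally irrelevant.
    neighbors: Set[str] = set()
    for a, b in interactions:
        if a == protein and b != protein and b in subspace:
            neighbors.add(b)
        elif b == protein and a != protein and a in subspace:
            neighbors.add(a)
    if not neighbors or not candidates:
        return None

    # Count how many subspace neighbors carry each candidate finer term.
    votes: Counter = Counter()
    for n in neighbors:
        for term in annotations[n] & candidates:
            votes[term] += 1
    if not votes:
        return None

    # Highest vote wins; ties are broken alphabetically for determinism.
    best_term, best_count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    fraction = best_count / len(neighbors)
    return (best_term, fraction) if fraction >= min_fraction else None


# Toy data (hypothetical): X is known only to act in "translation".
children = {"translation": {"translational initiation",
                            "translational elongation",
                            "translational termination"}}
annotations = {
    "X": {"translation"},
    "A": {"translation", "translational elongation"},
    "B": {"translation", "translational elongation"},
    "C": {"translation", "translational initiation"},
    "D": {"DNA repair"},
    "E": {"DNA repair"},
}
interactions = [("X", "A"), ("X", "B"), ("C", "X"), ("X", "D"), ("E", "X")]
print(predict_finer_function("X", "translation", interactions, annotations, children))
# ('translational elongation', 0.6666666666666666)
```

In this toy case the subspace filter discards partners D and E. Without it they would count as neighbors, diluting the elongation score from 2/3 to 2/5 and pushing it below the 0.5 threshold. This illustrates the kind of noise reduction the abstract attributes to functional filtering.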
Additional information
Supported in part by the National Natural Science Foundation of China (Grant Nos. 30370388 and 30670539)
Cite this article
Li, Y., Guo, Z., Ma, W. et al. Finding finer functions for partially characterized proteins by protein-protein interaction networks. Chin. Sci. Bull. 52, 3363–3370 (2007). https://doi.org/10.1007/s11434-008-0016-z