Abstract
Protein function prediction is an important problem in the post-genomic era. When protein function is deduced from protein interaction data, traditional methods treat every interaction sample equally and seldom take the quality of the samples into account. In this paper, we investigate how the quality of protein-protein interaction data affects protein function prediction. We also propose two improved methods, the weight neighbour counting method (WNC) and the weight chi-square method (WCHI), which incorporate the quality of interaction samples into the neighbour counting method (NC) and the chi-square method (CHI). Experimental results show that the quality of interaction samples seriously affects the performance of protein function prediction methods. They also demonstrate that WNC and WCHI outperform NC and CHI in protein function prediction when the weights of the interaction samples are chosen properly.
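As a rough illustration of the idea behind weighting, the sketch below contrasts plain neighbour counting with a weighted variant. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not give the WNC formula, so the rule used here (each interaction contributes its weight instead of 1 to the score of every function annotated on the neighbour) is an assumption, and the function name `neighbour_counting`, the protein names, the labels, and the weights are all hypothetical.

```python
from collections import defaultdict


def neighbour_counting(protein, interactions, annotations, weights=None, top_k=1):
    """Rank candidate functions for `protein` by votes from its neighbours.

    interactions: iterable of undirected (a, b) protein pairs
    annotations:  dict mapping protein -> set of function labels
    weights:      optional dict mapping frozenset({a, b}) -> reliability;
                  with weights=None every interaction counts as 1.0 (plain NC).
    ASSUMPTION: the weighted score is the sum of interaction weights; the
    source abstract does not specify the exact weighting scheme.
    """
    scores = defaultdict(float)
    for a, b in interactions:
        if a == b or protein not in (a, b):
            continue  # skip self-interactions and unrelated pairs
        neighbour = b if a == protein else a
        w = 1.0 if weights is None else weights.get(frozenset((a, b)), 1.0)
        for function in annotations.get(neighbour, ()):
            scores[function] += w
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy data: two low-confidence and one high-confidence interaction.
interactions = [("P", "A"), ("P", "B"), ("P", "C")]
annotations = {"A": {"transport"}, "B": {"transport"}, "C": {"kinase"}}
weights = {
    frozenset(("P", "A")): 0.2,
    frozenset(("P", "B")): 0.2,
    frozenset(("P", "C")): 0.9,
}

unweighted_top = neighbour_counting("P", interactions, annotations)
weighted_top = neighbour_counting("P", interactions, annotations, weights)
```

On this toy input the unweighted vote favours the function seen on two neighbours, while the weighted vote favours the function supported by the single high-confidence interaction, which is the kind of change in prediction that interaction quality can cause.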
Cite this article
Ni, QS., Wang, ZZ., Li, GG. et al. Effect of the quality of the interaction data on predicting protein function from protein-protein interactions. Interdiscip Sci Comput Life Sci 1, 40–45 (2009). https://doi.org/10.1007/s12539-008-0015-4
DOI: https://doi.org/10.1007/s12539-008-0015-4