Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency

Wang, Hua; Huang, Heng; Ding, Chris

doi:10.1007/978-3-319-05269-4_26


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8394)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology


Abstract

Conventional computational approaches usually predict protein function one function at a time, treating the functions as separate target classes. However, biological processes are highly correlated, so the functions assigned to a protein are not independent. It is therefore beneficial to exploit correlations among function categories when predicting protein function. We propose a novel Maximization of Data-Knowledge Consistency (MDKC) approach that exploits function category correlations for protein function prediction. Our approach rests on the assumption that two proteins are likely to have a large overlap in their annotated functions if they are highly similar according to certain experimental data. We first establish a new pairwise protein similarity from the knowledge perspective, using protein annotations. Then, by maximizing the consistency between this annotation-based knowledge similarity and the data similarity derived from biological experiments, we assign putative functions to unannotated proteins. Most importantly, function category correlations are incorporated through the knowledge similarity. Comprehensive experimental evaluations on Saccharomyces cerevisiae data demonstrate promising results that validate the performance of our method.
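
The idea of matching a knowledge similarity to a data similarity can be illustrated with a small Python sketch. This is not the MDKC formulation from the chapter, whose objective and optimization are not given in the abstract. It is a toy stand-in under stated assumptions: function correlations are taken to be cosine similarities between columns of a 0/1 annotation matrix, the knowledge similarity is a normalized bilinear form, and the annotation of an unannotated protein is found by exhaustive search. All names (`function_correlation`, `knowledge_similarity`, `predict`) and all numbers are hypothetical.

```python
from itertools import product
from math import sqrt


def function_correlation(Y):
    """Cosine similarity between the columns (functions) of a 0/1 annotation matrix.

    Assumption: this is one common way to measure label correlation,
    not necessarily the one used in the chapter.
    """
    k = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(k)]
    C = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            dot = sum(x * y for x, y in zip(cols[a], cols[b]))
            # For 0/1 entries the sum of a column equals its squared norm.
            norm = sqrt(sum(cols[a])) * sqrt(sum(cols[b]))
            C[a][b] = dot / norm if norm > 0 else 0.0
    return C


def bilinear(u, v, C):
    """Compute u^T C v for plain Python sequences."""
    return sum(u[a] * C[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))


def knowledge_similarity(u, v, C):
    """Normalized u^T C v: annotation overlap that also credits correlated functions."""
    denom = sqrt(bilinear(u, u, C) * bilinear(v, v, C))
    return bilinear(u, v, C) / denom if denom > 0 else 0.0


def predict(w, Y, C):
    """Pick the annotation vector whose knowledge similarity to the annotated
    proteins agrees best with the data similarities w (toy exhaustive search)."""
    k = len(Y[0])
    candidates = [y for y in product((0, 1), repeat=k) if any(y)]

    def score(y):
        return sum(w_j * knowledge_similarity(y, y_j, C) for w_j, y_j in zip(w, Y))

    return max(candidates, key=score)


# Hypothetical toy data: four annotated proteins, three function categories.
Y = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
# Hypothetical data similarity of one unannotated protein to each annotated protein.
w = [0.9, 0.8, 0.2, 0.1]

C = function_correlation(Y)
identity = [[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]
prediction = predict(w, Y, C)
```

On this toy input, `prediction` is `(1, 1, 0)`: the unannotated protein inherits the two functions shared by its most similar neighbors. Replacing `C` with `identity` in `knowledge_similarity([1, 0, 0], [0, 1, 0], ...)` gives zero, whereas the correlated version gives a positive value, which is the sense in which correlations enter through the knowledge similarity. Exhaustive search scales exponentially in the number of functions and is used here only to keep the sketch short.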

This work was partially supported by US NSF IIS-1117965, IIS-1302675, IIS-1344152.



Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Colorado School of Mines, Golden, Colorado, 80401, USA
Hua Wang
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, 76019, USA
Heng Huang & Chris Ding


Editor information

Editors and Affiliations

School of Computer Science, Tel Aviv University, 69978, Tel Aviv, Israel
Roded Sharan


About this paper

Cite this paper

Wang, H., Huang, H., Ding, C. (2014). Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency. In: Sharan, R. (eds) Research in Computational Molecular Biology. RECOMB 2014. Lecture Notes in Computer Science, vol 8394. Springer, Cham. https://doi.org/10.1007/978-3-319-05269-4_26


DOI: https://doi.org/10.1007/978-3-319-05269-4_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05268-7
Online ISBN: 978-3-319-05269-4
eBook Packages: Computer Science, Computer Science (R0)
