Gene Function Prediction and Functional Network: The Role of Gene Ontology

Zeng, Erliang; Ding, Chris; Mathee, Kalai; Schneper, Lisa; Narasimhan, Giri

doi:10.1007/978-3-642-23151-3_7

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 25)

Abstract

Almost every cellular process requires the interactions of pairs or larger complexes of proteins. The organization of genes into networks has played an important role in characterizing the functions of individual genes and the interplay between various cellular processes. The Gene Ontology (GO) project has integrated information from multiple data sources to annotate genes to specific biological processes. Recently, the semantic similarity (SS) between GO terms has been investigated and used to derive semantic similarity between genes. Such semantic similarity provides a new perspective for predicting protein functions and generating functional gene networks. In this chapter, we focus on investigating the semantic similarity between genes and its applications. We propose a novel method to evaluate the support for protein-protein interaction (PPI) data based on Gene Ontology information. When the semantic similarity between genes is computed from GO information using Resnik's formula, our results show that the PPI data can be modeled as a mixture model, predicated on the assumption that true protein-protein interactions have higher support than the false positives in the data. Thus, semantic similarity between genes serves as a metric of support for PPI data. Taking it one step further, we propose new function prediction approaches that use this metric of support for the PPI data. These new function prediction approaches outperform their conventional counterparts. New evaluation methods are also proposed. In another application, we present a novel approach to automatically generate a functional network of yeast genes using GO annotations. A semantic similarity (SS) score is calculated between pairs of genes. This SS score is then used to predict linkages between genes and so generate a functional network. Functional networks predicted by SS and other methods are compared. The network predicted by SS scores outperforms those generated by other methods in the following aspects: automatic removal of a functional bias in network training reference sets, improved precision and recall across the network, and higher correlation between a gene's lethality and centrality in the network. We illustrate that the resulting network can be applied to generate coherent function modules and their associations. We conclude that determination of semantic similarity between genes based upon GO information can be used to generate a functional network of yeast genes that is comparable to, or improved upon, those directly based on integrated heterogeneous genomic and proteomic data.
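
As an illustration of the semantic similarity idea summarized in the abstract, the following is a minimal sketch of Resnik-style similarity on a toy ontology. It is not the chapter's implementation. The ontology, the gene names, the annotation sets, the threshold value, and the choice of taking the maximum over term pairs to obtain a gene-level score are all assumptions made for this example. The information content of a term t is taken as IC(t) = -log p(t), where p(t) is the fraction of genes annotated to t or to one of its descendants, and the similarity of two terms is the highest IC among their common ancestors.

```python
import math
from itertools import combinations

# Hypothetical toy ontology: term -> set of parent terms (a small rooted DAG).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "glycolysis": {"metabolism"},
    "lipid_metabolism": {"metabolism"},
}

# Hypothetical direct annotations: gene -> set of ontology terms.
ANNOTATIONS = {
    "geneA": {"glycolysis"},
    "geneB": {"glycolysis"},
    "geneC": {"lipid_metabolism"},
    "geneD": {"transport"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t); p(t) is the share of genes annotated to t or below."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        # A gene annotated to a term is implicitly annotated to its ancestors.
        covered = set().union(*(ancestors(t) for t in terms))
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def term_similarity(t1, t2, ic):
    """Resnik similarity: highest IC among the common ancestors of t1 and t2."""
    common = ancestors(t1) & ancestors(t2)
    return max((ic.get(c, 0.0) for c in common), default=0.0)


def gene_similarity(g1, g2, ic):
    """Gene-level score: maximum over all term pairs (an assumed choice)."""
    return max(
        term_similarity(a, b, ic)
        for a in ANNOTATIONS[g1]
        for b in ANNOTATIONS[g2]
    )


def predict_links(threshold):
    """Keep gene pairs whose similarity reaches the (assumed) threshold."""
    ic = information_content()
    links = {}
    for g1, g2 in combinations(sorted(ANNOTATIONS), 2):
        score = gene_similarity(g1, g2, ic)
        if score >= threshold:
            links[(g1, g2)] = score
    return links


if __name__ == "__main__":
    for pair, score in predict_links(threshold=0.5).items():
        print(pair, round(score, 3))
```

In this toy setting only geneA and geneB share a specific term, so they are the only pair linked at a threshold of 0.5. Pairs whose only common ancestor is the root receive a score of zero, which reflects why shared annotation to very general terms gives little support for an interaction.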

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, 46556, USA
Erliang Zeng
Department of Computer Science and Engineering, University of Texas at Arlington, Texas, 76019, USA
Chris Ding
Department of Molecular Microbiology, College of Medicine, Florida International University, Miami, Florida, 33199, USA
Kalai Mathee & Lisa Schneper
Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, Miami, Florida, 33199, USA
Giri Narasimhan

Corresponding author

Correspondence to Erliang Zeng.

Editor information

Editors and Affiliations

College of Letters & Science, Dept. Statistics & Applied Probability, University of California, Santa Barbara, South Hall 5504, Santa Barbara, 93106-3110, California, USA
Dawn E. Holmes
School of Electrical and Information Eng, University of South Australia, Adelaide, Mawson Lakes Campus, SA 5095, Australia
Lakhmi C Jain

About this chapter

Cite this chapter

Zeng, E., Ding, C., Mathee, K., Schneper, L., Narasimhan, G. (2012). Gene Function Prediction and Functional Network: The Role of Gene Ontology. In: Holmes, D., Jain, L. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23151-3_7

DOI: https://doi.org/10.1007/978-3-642-23151-3_7
Published: 12 January 2012
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23150-6
Online ISBN: 978-3-642-23151-3
eBook Packages: Engineering, Engineering (R0)
