
Gene Function Prediction and Functional Network: The Role of Gene Ontology

  • Chapter
Data Mining: Foundations and Intelligent Paradigms

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 25)

Abstract

Almost every cellular process requires interactions between pairs of proteins or within larger protein complexes. Organizing genes into networks has played an important role in characterizing the functions of individual genes and the interplay between cellular processes. The Gene Ontology (GO) project integrates information from multiple data sources to annotate genes to specific biological processes. Recently, the semantic similarity (SS) between GO terms has been investigated and used to derive semantic similarity between genes. Such semantic similarity offers a new perspective for predicting protein functions and generating functional gene networks. In this chapter, we investigate the semantic similarity between genes and its applications.

We propose a novel method to evaluate the support for protein-protein interaction (PPI) data based on GO information. When the semantic similarity between genes is computed from GO annotations using Resnik's formula, our results show that the PPI data can be modeled as a mixture model, under the assumption that true protein-protein interactions have higher support than the false positives in the data. Semantic similarity between genes therefore serves as a metric of support for PPI data. Building on this metric, we also propose new function prediction approaches, which outperform their conventional counterparts, together with new evaluation methods.

In another application, we present a novel approach to automatically generate a functional network of yeast genes from GO annotations. An SS score is calculated for each pair of genes and then used to predict linkages between genes, yielding a functional network. We compare the functional networks predicted by SS with those generated by other methods.
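The Resnik-based gene similarity mentioned above can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the toy GO fragment, the gene names and annotations, and the rule of taking the best-scoring pair of annotated terms as the gene-level similarity (the abstract does not say which combination rule the chapter uses; averaging is another common choice). Information content follows Resnik's definition IC(t) = -log p(t), with p(t) estimated here as the fraction of annotated genes that are annotated to t or to one of its descendants.

```python
import math
from itertools import product

# Hypothetical toy GO fragment: term -> set of parent terms (is_a edges only).
PARENTS = {
    "root": set(),
    "metabolic_process": {"root"},
    "transport": {"root"},
    "carbohydrate_metabolism": {"metabolic_process"},
    "lipid_metabolism": {"metabolic_process"},
    "ion_transport": {"transport"},
}

# Hypothetical direct annotations: gene -> set of GO terms.
ANNOTATIONS = {
    "geneA": {"carbohydrate_metabolism"},
    "geneB": {"lipid_metabolism"},
    "geneC": {"ion_transport"},
    "geneD": {"carbohydrate_metabolism", "ion_transport"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the fraction of annotated genes that are
    annotated to t or to any descendant of t (annotations propagate upward)."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(ANNOTATIONS)
    # log(n / c) equals -log(c / n); terms with no annotations get no IC value.
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik_term_sim(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(IC.get(t, 0.0) for t in common)


def gene_sim(g1, g2):
    """Gene-level SS as the best-scoring pair of annotated terms (an assumed rule)."""
    return max(
        resnik_term_sim(a, b)
        for a, b in product(ANNOTATIONS[g1], ANNOTATIONS[g2])
    )


# Score some hypothetical PPI pairs; higher scores mean more GO-based support.
for g1, g2 in [("geneA", "geneD"), ("geneA", "geneB"), ("geneA", "geneC")]:
    print(g1, g2, round(gene_sim(g1, g2), 3))
```

Scoring every pair in a PPI data set this way yields the kind of per-interaction support value the abstract describes; the mixture-model step that separates likely true interactions from false positives is not sketched here.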

The network predicted by SS scores outperforms those generated by other methods in three respects: automatic removal of functional bias from network training reference sets, improved precision and recall across the network, and a higher correlation between a gene's lethality and its centrality in the network. We show that the resulting network can be used to generate coherent functional modules and their associations. We conclude that semantic similarity between genes, determined from GO information, can be used to generate a functional network of yeast genes that is comparable to, or better than, networks built directly from integrated heterogeneous genomic and proteomic data.
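The evaluation criteria listed above (precision and recall against a reference set, and the correlation between lethality and centrality) can be illustrated with a second minimal sketch. It assumes that linkages are predicted by thresholding pairwise SS scores, uses node degree as the centrality measure, and uses the Pearson correlation between degree and a binary lethality label (a point-biserial correlation); the abstract does not specify the chapter's threshold, centrality measure, or correlation statistic. All scores, reference links, and lethality labels are hypothetical.

```python
import math

# Hypothetical SS scores for gene pairs; each pair is stored in sorted order.
SS = {
    ("g1", "g2"): 2.1,
    ("g1", "g3"): 1.7,
    ("g2", "g3"): 1.5,
    ("g2", "g5"): 1.2,
    ("g3", "g4"): 0.4,
    ("g4", "g5"): 1.9,
    ("g1", "g5"): 0.2,
}
# Hypothetical reference set of known functional linkages (same pair ordering).
REFERENCE = {("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")}
# Hypothetical lethality labels: 1 = deletion is lethal, 0 = viable.
LETHAL = {"g1": 1, "g2": 1, "g3": 1, "g4": 0, "g5": 0}


def build_network(scores, threshold):
    """Predict a linkage for every gene pair whose SS score meets the threshold."""
    return {pair for pair, s in scores.items() if s >= threshold}


def precision_recall(edges, reference):
    """Precision and recall of predicted linkages against a reference set."""
    tp = len(edges & reference)
    precision = tp / len(edges) if edges else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall


def degree(edges, genes):
    """Degree centrality (number of linkages) for each gene."""
    deg = {g: 0 for g in genes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


genes = sorted(LETHAL)
edges = build_network(SS, threshold=1.0)
prec, rec = precision_recall(edges, REFERENCE)
deg = degree(edges, genes)
r = pearson([deg[g] for g in genes], [LETHAL[g] for g in genes])
print(f"precision={prec:.2f} recall={rec:.2f} degree-lethality r={r:.2f}")
```

Sweeping the threshold instead of fixing it at 1.0 would trace a precision-recall curve, which is the usual way to compare networks built by different methods.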

Author information

Corresponding author

Correspondence to Erliang Zeng.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zeng, E., Ding, C., Mathee, K., Schneper, L., Narasimhan, G. (2012). Gene Function Prediction and Functional Network: The Role of Gene Ontology. In: Holmes, D., Jain, L. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23151-3_7

  • DOI: https://doi.org/10.1007/978-3-642-23151-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23150-6

  • Online ISBN: 978-3-642-23151-3

  • eBook Packages: Engineering (R0)
