Function Annotation in Gene Networks


Abstract

Modern sequencing technology enables the discovery of new gene products in an increasing number of organisms. However, sequence alone does not provide sufficient information about cellular mechanisms and their function, so efforts need to be directed toward genome characterization at the molecular level. Wet-lab experiments in this direction are assisted by a variety of computational methods that exploit the abundance of data. The advent of high-throughput interaction detection methods has generated large amounts of gene interaction data, which has allowed the construction of genome-wide networks. Studying genomes in a networked setting has been beneficial for global annotation in two ways. First, there has been an increasing number of network-based function prediction methods. Second, networks have inspired the community to revisit the definition of gene function. In recently emerging annotation systems, the original molecular characterization of function has been extended to a multi-molecule function, termed biological process [Gene ontology: Tool for the unification of biology. Nature, 2000]. In this chapter, we present the current methods for automated annotation of protein function. We describe existing annotation prediction methods and the ontologies used to define a gene’s function at the molecular and process levels. We discuss in detail the workings of a generalized framework for network-based prediction and present an experimental accuracy comparison of several popular methods within this framework. We also discuss the use of networks from multiple species for annotation enrichment in sparsely annotated genomes.

Notes

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0917149.

References

  1. Ensembl – on-line genome database. http://www.ensembl.org/.
  2. Gene expression omnibus. http://www.ncbi.nlm.nih.gov/geo/.
  3. The gene ontology. http://geneontology.org/.
  4. Gene ontology: Tool for the unification of biology. Nature, 2000.
  5. BioGRID: General repository for interaction datasets. http://www.thebiogrid.org/, 2006.
  6. V. Arnau, S. Mars, and I. Marin. Iterative clustering analysis of protein interaction data. Bioinformatics, 2005.
  7. Petko Bogdanov and Ambuj K. Singh. Molecular Function Prediction Using Neighborhood Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 7(2), 2010.
  8. Christine Brun, Francois Chevenet, David Martin, Jerome Wojcik, Alain Guenoche, and Bernard Jacq. Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biology, 5:R6, 2003.
  9. T. Can, O. Camoglu, and A. K. Singh. Analysis of protein interaction networks using random walks. Proceedings of the 5th ACM SIGKDD Workshop on Data Mining in Bioinformatics, 2005.
  10. Jin Chen, Wynne Hsu, Mong Li Lee, and See-Kiong Ng. Labeling network motifs in protein interactomes for protein function prediction. ICDE, 2007.
  11. Kyle C. Chipman and Ambuj K. Singh. Predicting genetic interactions with random walks on biological networks. BMC Bioinformatics, 10:17, January 2009.
  12. H. Chua, W. Sung, and L. Wong. Exploiting indirect neighbors and topological weight to predict protein function from protein-protein interactions. Bioinformatics, 2006.
  13. H. Chua, W. Sung, and L. Wong. Using indirect protein interactions for the prediction of gene ontology functions. BMC Bioinformatics, 2007.
  14. C. M. Deane, Lukasz Salwinski, Ioannis Xenarios, and David Eisenberg. Protein Interactions: Two Methods for Assessment of the Reliability of High Throughput Observations. Molecular & Cellular Proteomics, 1(5):349–356, April 2002.
  15. M. Deng, Z. Tu, F. Sun, and T. Chen. Mapping Gene Ontology to proteins based on protein-protein interaction data. Bioinformatics, 20:895–902, Apr 2004.
  16. M. Deng, K. Zhang, S. Mehta, T. Chen, and F. Sun. Prediction of protein function using protein-protein interaction data. J. Comput. Biol., 10:947–960, 2003.
  17. R. Dunn, F. Dudbridge, and C. M. Sanderson. The use of edge-betweenness clustering to investigate the biological function in protein interaction networks. BMC Bioinformatics, 2005.
  18. J. E. Galagan et al. The genome sequence of the filamentous fungus Neurospora crassa. Nature, 422:859–868, 2003.
  19. J. Han, N. Bertin, T. Hao, et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 2004.
  20. John L. Hartman, Barbara Garvik, and Lee Hartwell. Principles for the Buffering of Genetic Variation. Science, 291(9):1001–1004, 2001.
  21. T. Hawkins, S. Luban, and D. Kihara. Enhanced automated function prediction using distantly related sequences and contextual association by PFP. Protein Sci., 15:1550–1556, Jun 2006.
  22. H. Hishigaki, K. Nakai, T. Ono, A. Tanigami, and T. Takagi. Assessment of prediction accuracy of protein function from protein–protein interaction data. Yeast, 2001.
  23. T. K. Jenssen, A. Laegreid, J. Komorowski, and E. Hovig. A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics, 28(1):21–28, 2001.
  24. Ulas Karaoz, T. M. Murali, Stan Letovsky, Yu Zheng, Chunming Ding, Charles R. Cantor, and Simon Kasif. Whole-genome annotation by using evidence integration in functional-linkage networks. PNAS, 101:2888–2893, 2004.
  25. S. K. Kim, J. Lund, M. Kiraly, K. Duke, M. Jiang, J. M. Stuart, A. Eizinger, B. N. Wylie, and G. S. Davidson. A gene expression map for Caenorhabditis elegans. Science, 293:2087–2092, Sep 2001.
  26. O. D. King, R. E. Foulger, S. S. Dwight, J. V. White, and F. P. Roth. Predicting gene function from patterns of annotation. Genome Res., 13:896–904, May 2003.
  27. Mustafa Kirac and Gultekin Ozsoyoglu. Protein function prediction based on patterns in biological networks. Research in Computational Molecular Biology, pages 197–213, 2008.
  28. S. Kohler, S. Bauer, D. Horn, and P. N. Robinson. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet., 82:949–958, Apr 2008.
  29. I. Lee, S. V. Date, A. T. Adai, and E. M. Marcotte. A probabilistic functional network of yeast genes. Science, 306:1555–1558, November 2004.
  30. I. Lee, B. Lehner, C. Crombie, W. Wong, A. G. Fraser, and E. Marcotte. A single network comprising the majority of genes accurately predicts the phenotypic effects of gene perturbation in Caenorhabditis elegans. Nature Genetics, 40(2):181–188, 2008.
  31. I. Lee, Z. Li, and E. M. Marcotte. An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker’s Yeast, Saccharomyces cerevisiae. PLoS ONE, 2(10):e988. doi:10.1371/journal.pone.0000988, 2007.
  32. Stanley Letovsky and Simon Kasif. Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics, 19:i197–i204, 2003.
  33. M. Pellegrini, E. M. Marcotte, M. J. Thompson, D. Eisenberg, and T. O. Yeates. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA, 96(8):4285–8, 1999.
  34. K. Maciag, S. J. Altschuler, M. D. Slack, N. J. Krogan, A. Emili, J. F. Greenblatt, T. Maniatis, and L. F. Wu. Systems-level analyses identify extensive coupling among gene expression machines. Molecular Systems Biology, 2006.
  35. Kathy Macropol, Tolga Can, and Ambuj Singh. RRW: repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics, 10:283, 2009.
  36. C. Von Mering, M. Huynen, D. Jaeggi, S. Schmidt, P. Bork, and B. Snel. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 2003.
  37. H. W. Mewes, D. Frishman, U. Guldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Munsterkotter, S. Rudd, and B. Weil. MIPS: a database for genomes and protein sequences. Nucleic Acids Res., 30:31–34, 2002.
  38. H. W. Mewes et al. Overview of the yeast genome. Nature, 387:496–512, 1997.
  39. E. Nabieva, K. Jim, A. Agarwal, B. Chazelle, and M. Singh. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 21:i302–i310, 2005.
  40. K. P. O’Brien, M. Remm, and E. L. Sonnhammer. InParanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res., 33:D476–480, Jan 2005.
  41. M. Riley. Functions of the gene products of Escherichia coli. FEMS Microbiol. Rev., 57:862–952, 1993.
  42. M. Riley. MultiFun, a multifunctional classification scheme for Escherichia coli K-12 gene products. Microb Comp Genomics, 5:205–22, 2000.
  43. A. Ruepp et al. The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature, 407:508–513, 2000.
  44. A. Ruepp et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res., 32:5539–5545, 2004.
  45. M. Salanoubat et al. Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature, 408:820–822, 2000.
  46. Manoj Pratim Samanta and Shoudan Liang. Predicting protein functions from redundancies in large-scale protein interaction networks. PNAS, 100:12579–12583, 2003.
  47. B. Schwikowski, P. Uetz, and S. Fields. A network of protein-protein interactions in yeast. Nature, 2000.
  48. R. Sharan, I. Ulitsky, and R. Shamir. Network-based prediction of protein function. Molecular Systems Biology, 2007.
  49. V. Spirin and L. Mirny. Protein complexes and functional modules in molecular networks. PNAS, 2003.
  50. Joshua M. Stuart, Eran Segal, Daphne Koller, and Stuart K. Kim. A gene-coexpression network for global discovery of conserved genetic modules. Science (New York, N.Y.), 302(5643):249–55, October 2003.
  51. A. H. Tong, G. Lesage, G. D. Bader, H. Ding, H. Xu, X. Xin, J. Young, G. F. Berriz, R. L. Brost, M. Chang, Y. Chen, X. Cheng, G. Chua, H. Friesen, D. S. Goldberg, J. Haynes, C. Humphries, G. He, S. Hussein, L. Ke, N. Krogan, Z. Li, J. N. Levinson, H. Lu, P. Menard, C. Munyana, A. B. Parsons, O. Ryan, R. Tonikian, T. Roberts, A. M. Sdicu, J. Shapiro, B. Sheikh, B. Suter, S. L. Wong, L. V. Zhang, H. Zhu, C. G. Burd, S. Munro, C. Sander, J. Rine, J. Greenblatt, M. Peter, A. Bretscher, G. Bell, F. P. Roth, G. W. Brown, B. Andrews, H. Bussey, and C. Boone. Global mapping of the yeast genetic interaction network. Science, 303:808–813, Feb 2004.
  52. Oron Vanunu and Roded Sharan. A propagation-based algorithm for inferring gene-disease associations. German Conference on Bioinformatics, 2008.
  53. Y. Wu and S. Lonardi. A linear-time algorithm for predicting functional annotations from PPI networks. J Bioinform Comput Biol, 6:1049–1065, Dec 2008.
  54. G. X. Yu, E. M. Glass, N. T. Karonis, and N. Maltsev. Knowledge-based voting algorithm for automated protein functional annotation. PROTEINS: Structure, Function, and Bioinformatics, 61:907–917, 2005.
  55. Shi-Hua Zhang, Hong-Wei Liu, Xue-Mei Ning, and Xiang-Sun Zhang. A hybrid graph-theoretic method for mining overlapping functional modules in large sparse protein interaction networks. IJDMB, 3(1):68–84, 2009.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Petko Bogdanov (1)
  • Kathy Macropol (2)
  • Ambuj K. Singh (3)

  1. Department of Computer Science, University of California, Santa Barbara, USA
  2. University of California, Santa Barbara, Santa Barbara, USA
  3. University of California, Santa Barbara, USA
