Function Annotation in Gene Networks
Modern sequencing technology enables the discovery of new gene products in an increasing number of organisms. However, sequence alone does not provide sufficient information about the cellular mechanisms that gene products take part in or about their function. Efforts therefore need to be directed toward characterizing genomes at the molecular level. Wet-lab experiments in this direction are assisted by a variety of computational methods that exploit the abundance of available data. In particular, high-throughput interaction detection methods have generated large amounts of gene interaction data, which has allowed the construction of genome-wide networks. Studying genomes in a networked setting has benefited global annotation in two ways. First, a growing number of network-based function prediction methods have been developed. Second, networks have inspired the community to revisit the definition of gene function: in recently emerging annotation systems, the original molecular characterization of function has been extended to a multi-molecule function termed biological process [4]. In this chapter, we present current methods for the automated annotation of protein function. We describe existing annotation prediction methods and the ontologies used to define a gene’s function at the molecular and process levels. We discuss in detail the workings of a generalized framework for network-based prediction and present an experimental accuracy comparison of several popular methods within this framework. We also discuss the use of networks from multiple species to enrich annotations in sparsely characterized genomes.
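To make the idea of network-based function prediction concrete, the sketch below shows its simplest instance: neighborhood majority voting, in the spirit of Schwikowski et al. [47], where an unannotated gene receives the function labels most common among its annotated interaction partners. This is an illustrative assumption, not the chapter's generalized framework; the helper `majority_vote`, the gene names, and the function labels are all hypothetical toy data.

```python
from collections import Counter


def majority_vote(edges, annotations, gene, top_k=1):
    """Hypothetical helper: predict functions for `gene` from its direct neighbours.

    edges: iterable of (gene_a, gene_b) undirected interactions.
    annotations: dict mapping gene -> set of known function labels.
    Returns up to `top_k` labels ranked by how many neighbours carry them;
    ties are broken by label string order so the output is deterministic.
    """
    # Collect the direct interaction partners of `gene`.
    neighbours = set()
    for a, b in edges:
        if a == gene:
            neighbours.add(b)
        elif b == gene:
            neighbours.add(a)

    # Each annotated neighbour casts one vote per function label it carries.
    votes = Counter()
    for n in neighbours:
        votes.update(annotations.get(n, set()))

    # Rank by vote count (descending), then by label (ascending).
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_k]]


# Hypothetical toy network in which gene "g0" is unannotated.
edges = [("g0", "g1"), ("g0", "g2"), ("g0", "g3"), ("g1", "g2"), ("g3", "g4")]
annotations = {
    "g1": {"transcription"},
    "g2": {"transcription", "DNA repair"},
    "g3": {"transport"},
    "g4": {"transport"},
}
print(majority_vote(edges, annotations, "g0", top_k=2))  # ['transcription', 'DNA repair']
```

Here "transcription" wins because two of the three neighbors of `g0` carry it, while `g4` does not vote because it is not adjacent to `g0`. Methods in the reference list refine this idea, for example by also using indirect neighbors [12, 13] or random walks [9]; the fixed tie-breaking rule is only a choice for reproducible output.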
This material is based upon work supported by the National Science Foundation under Grant No. IIS-0917149.
- 1. Ensembl – on-line genome database. http://www.ensembl.org/.
- 2. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/.
- 3. The Gene Ontology. http://geneontology.org/.
- 4. Gene ontology: Tool for the unification of biology. Nature Genetics, 2000.
- 5. BioGRID: General repository for interaction datasets. http://www.thebiogrid.org/, 2006.
- 6. V. Arnau, S. Mars, and I. Marin. Iterative clustering analysis of protein interaction data. Bioinformatics, 2005.
- 7. Petko Bogdanov and Ambuj K. Singh. Molecular Function Prediction Using Neighborhood Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 7(2), 2010.
- 8. Christine Brun, Francois Chevenet, David Martin, Jerome Wojcik, Alain Guenoche, and Bernard Jacq. Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network. Genome Biology, 5:R6, 2003.
- 9. T. Can, O. Camoglu, and A. K. Singh. Analysis of protein interaction networks using random walks. Proceedings of the 5th ACM SIGKDD Workshop on Data Mining in Bioinformatics, 2005.
- 10. Jin Chen, Wynne Hsu, Mong Li Lee, and See-Kiong Ng. Labeling network motifs in protein interactomes for protein function prediction. ICDE, 2007.
- 11. Kyle C. Chipman and Ambuj K. Singh. Predicting genetic interactions with random walks on biological networks. BMC Bioinformatics, 10:17, January 2009.
- 12. H. Chua, W. Sung, and L. Wong. Exploiting indirect neighbors and topological weight to predict protein function from protein-protein interactions. Bioinformatics, 2006.
- 13. H. Chua, W. Sung, and L. Wong. Using indirect protein interactions for the prediction of gene ontology functions. BMC Bioinformatics, 2007.
- 14. C. M. Deane, Łukasz Salwinski, Ioannis Xenarios, and David Eisenberg. Protein Interactions: Two Methods for Assessment of the Reliability of High Throughput Observations. Molecular & Cellular Proteomics, 1(5):349–356, April 2002.
- 17. R. Dunn, F. Dudbridge, and C. M. Sanderson. The use of edge-betweenness clustering to investigate the biological function in protein interaction networks. BMC Bioinformatics, 2005.
- 18. J. E. Galagan et al. The genome sequence of the filamentous fungus Neurospora crassa. Nature, 422:859–868, 2003.
- 19. J. Han, N. Bertin, T. Hao, et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 2004.
- 20. John L. Hartman, Barbara Garvik, and Lee Hartwell. Principles for the Buffering of Genetic Variation. Science, 291:1001–1004, 2001.
- 22. H. Hishigaki, K. Nakai, T. Ono, A. Tanigami, and T. Takagi. Assessment of prediction accuracy of protein function from protein–protein interaction data. Yeast, 2001.
- 24. Ulas Karaoz, T. M. Murali, Stan Letovsky, Yu Zheng, Chunming Ding, Charles R. Cantor, and Simon Kasif. Whole-genome annotation by using evidence integration in functional-linkage networks. PNAS, 101:2888–2893, 2004.
- 27. Mustafa Kirac and Gultekin Ozsoyoglu. Protein function prediction based on patterns in biological networks. Research in Computational Molecular Biology, pages 197–213, 2008.
- 30. I. Lee, B. Lehner, C. Crombie, W. Wong, A. G. Fraser, and E. Marcotte. A single network comprising the majority of genes accurately predicts the phenotypic effects of gene perturbation in Caenorhabditis elegans. Nature Genetics, 40(2):181–188, 2008.
- 31. I. Lee, Z. Li, and E. M. Marcotte. An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker’s Yeast, Saccharomyces cerevisiae. PLoS ONE, 2(10):e988, 2007. doi:10.1371/journal.pone.0000988.
- 34. K. Maciag, S. J. Altschuler, M. D. Slack, N. J. Krogan, A. Emili, J. F. Greenblatt, T. Maniatis, and L. F. Wu. Systems-level analyses identify extensive coupling among gene expression machines. Molecular Systems Biology, 2006.
- 35. Kathy Macropol, Tolga Can, and Ambuj Singh. RRW: repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics, 10:283, 2009.
- 36. C. von Mering, M. Huynen, D. Jaeggi, S. Schmidt, P. Bork, and B. Snel. STRING: a database of predicted functional associations between proteins. Nucleic Acids Res., 2003.
- 37. H. W. Mewes, D. Frishman, U. Guldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Munsterkotter, S. Rudd, and B. Weil. MIPS: a database for genomes and protein sequences. Nucleic Acids Res., 30:31–34, 2002.
- 38. H. W. Mewes et al. Overview of the yeast genome. Nature, 387:496–512, 1997.
- 41. M. Riley. Functions of the gene products of Escherichia coli. Microbiol. Rev., 57:862–952, 1993.
- 43. A. Ruepp et al. The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature, 407:508–513, 2000.
- 44. A. Ruepp et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Res., 32:5539–5545, 2004.
- 45. M. Salanoubat et al. Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature, 408:820–822, 2000.
- 46. Manoj Pratim Samanta and Shoudan Liang. Predicting protein functions from redundancies in large-scale protein interaction networks. PNAS, 100:12579–12583, 2003.
- 47. B. Schwikowski, P. Uetz, and S. Fields. A network of protein-protein interactions in yeast. Nature Biotechnology, 2000.
- 48. R. Sharan, I. Ulitsky, and R. Shamir. Network-based prediction of protein function. Molecular Systems Biology, 2007.
- 49. V. Spirin and L. Mirny. Protein complexes and functional modules in molecular networks. PNAS, 2003.
- 50. Joshua M. Stuart, Eran Segal, Daphne Koller, and Stuart K. Kim. A gene-coexpression network for global discovery of conserved genetic modules. Science, 302(5643):249–255, October 2003.
- 51. A. H. Tong, G. Lesage, G. D. Bader, H. Ding, H. Xu, X. Xin, J. Young, G. F. Berriz, R. L. Brost, M. Chang, Y. Chen, X. Cheng, G. Chua, H. Friesen, D. S. Goldberg, J. Haynes, C. Humphries, G. He, S. Hussein, L. Ke, N. Krogan, Z. Li, J. N. Levinson, H. Lu, P. Menard, C. Munyana, A. B. Parsons, O. Ryan, R. Tonikian, T. Roberts, A. M. Sdicu, J. Shapiro, B. Sheikh, B. Suter, S. L. Wong, L. V. Zhang, H. Zhu, C. G. Burd, S. Munro, C. Sander, J. Rine, J. Greenblatt, M. Peter, A. Bretscher, G. Bell, F. P. Roth, G. W. Brown, B. Andrews, H. Bussey, and C. Boone. Global mapping of the yeast genetic interaction network. Science, 303:808–813, Feb 2004.
- 52. Oron Vanunu and Roded Sharan. A propagation-based algorithm for inferring gene-disease associations. German Conference on Bioinformatics, 2008.
- 54. G. X. Yu, E. M. Glass, N. T. Karonis, and N. Maltsev. Knowledge-based voting algorithm for automated protein functional annotation. PROTEINS: Structure, Function, and Bioinformatics, 61:907–917, 2005.