Abstract
The era of genomics has opened new possibilities for the computational prediction of protein function. In particular, the comparison of fully sequenced genomes allows us to investigate the so-called genomic context of a gene, which includes its chromosomal positioning relative to other genes as well as its evolutionary record among the genomes considered. This information can be exploited to find functionally interacting partners for a protein of unknown function and thus obtain information on the biological process in which it plays a role. Such comparative genomics-based techniques are increasingly being used in the process of genome annotation and in the development of testable working hypotheses.
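As a rough illustration of how the "evolutionary record" of a gene can point to functional partners, the sketch below compares presence/absence patterns (phylogenetic profiles) across a set of genomes. It is a minimal sketch under stated assumptions, not the chapter's protocol: the gene names, the profile values, the use of Hamming distance, and the `max_distance` threshold are all hypothetical choices made for the example.

```python
# Hypothetical presence/absence profiles across six genomes:
# 1 = an ortholog of the gene is found in that genome, 0 = it is not.
# Gene names and values are invented for illustration only.
PROFILES = {
    "geneA": (1, 0, 1, 1, 0, 1),
    "geneB": (1, 0, 1, 1, 0, 1),
    "geneC": (0, 1, 0, 1, 1, 0),
    "geneD": (1, 0, 1, 0, 0, 1),
}


def hamming_distance(p, q):
    """Count the genomes in which two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(p, q))


def candidate_partners(profiles, query, max_distance=1):
    """Return (gene, distance) pairs whose profile is close to the query's.

    The threshold is an assumed, arbitrary cut-off; a real analysis would
    calibrate it against known functional associations.
    """
    query_profile = profiles[query]
    hits = []
    for gene, profile in profiles.items():
        if gene == query:
            continue
        distance = hamming_distance(query_profile, profile)
        if distance <= max_distance:
            hits.append((gene, distance))
    # Closest profiles first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    for gene, distance in candidate_partners(PROFILES, "geneA"):
        print(gene, distance)
```

Genes that are gained and lost together across genomes end up with near-identical profiles, which is why a small distance is treated here as a hint of functional association rather than as proof of it.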
Acknowledgments
Toni Gabaldón is supported by a long-term fellowship from EMBO (LTF 402–2005). He thanks Martijn A. Huynen for introducing him to the field of computational protein function prediction.
Copyright information
© 2008 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Gabaldón, T. (2008). Comparative Genomics-Based Prediction of Protein Function. In: Starkey, M., Elaswarapu, R. (eds) Genomics Protocols. Methods in Molecular Biology™, vol 439. Humana Press. https://doi.org/10.1007/978-1-59745-188-8_26
DOI: https://doi.org/10.1007/978-1-59745-188-8_26
Publisher Name: Humana Press
Print ISBN: 978-1-58829-871-3
Online ISBN: 978-1-59745-188-8
eBook Packages: Springer Protocols