Comparative Genomics-Based Prediction of Protein Function

  • Protocol

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 439)

Abstract

The era of genomics has opened new possibilities for the computational prediction of protein function. In particular, comparing fully sequenced genomes allows us to investigate the so-called genomic context of a gene, which includes its chromosomal position relative to other genes as well as its evolutionary record among the genomes considered. This information can be exploited to find functionally interacting partners for a protein of unknown function and thus to learn about the biological process in which it plays a role. Such comparative genomics-based techniques are increasingly used in genome annotation and in the development of testable working hypotheses.
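One way to picture how a gene's "evolutionary record" can point to interaction partners is to compare presence/absence patterns across genomes, an approach commonly called phylogenetic profiling. The sketch below is only an illustration of that idea, not the protocol described in the chapter: the gene names, the six-genome profiles, and the helper functions `hamming` and `rank_partners` are all hypothetical, and Hamming distance is just one simple choice of similarity measure.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical presence/absence profiles, one position per genome
# (1 = a homolog of the gene is present in that genome, 0 = absent).
PROFILES: Dict[str, Tuple[int, ...]] = {
    "query": (1, 0, 1, 1, 0, 1),
    "geneA": (1, 0, 1, 1, 0, 1),
    "geneB": (1, 1, 1, 1, 1, 1),
    "geneC": (0, 1, 0, 0, 1, 0),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the genomes in which two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(x != y for x, y in zip(a, b))


def rank_partners(
    query: str, profiles: Dict[str, Tuple[int, ...]]
) -> List[Tuple[str, int]]:
    """Rank all other genes by profile distance to the query (closest first)."""
    q = profiles[query]
    scored = [
        (name, hamming(q, profile))
        for name, profile in profiles.items()
        if name != query
    ]
    # Sort by distance, then by name so that ties are ordered deterministically.
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for name, distance in rank_partners("query", PROFILES):
        print(f"{name}\t{distance}")
```

In this toy data set the gene with the identical profile ranks first. A gene present in every genome (like the hypothetical `geneB`) carries little information, which is why real analyses typically weight or filter such profiles rather than relying on raw distance alone.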

Acknowledgments

Toni Gabaldón is supported by a long-term fellowship from EMBO (LTF 402–2005). He thanks Martijn A. Huynen for introducing him to the field of computational protein function prediction.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Gabaldón, T. (2008). Comparative Genomics-Based Prediction of Protein Function. In: Starkey, M., Elaswarapu, R. (eds) Genomics Protocols. Methods in Molecular Biology™, vol 439. Humana Press. https://doi.org/10.1007/978-1-59745-188-8_26

  • DOI: https://doi.org/10.1007/978-1-59745-188-8_26

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-871-3

  • Online ISBN: 978-1-59745-188-8

  • eBook Packages: Springer Protocols
