Phylogenomics, Protein Family Evolution, and the Tree of Life: An Integrated Approach between Molecular Evolution and Computational Intelligence

  • Laila A. Nahum
  • Sergio L. Pereira
Part of the Studies in Computational Intelligence book series (SCI, volume 122)


Information generated by genomic technologies has opened new frontiers in science by bridging a broad range of disciplines. Many tools and methods have been developed over the past several years to allow the analysis of molecular sequences. Nevertheless, the interpretation of genomic data to determine gene function and phylogenetic relationships of organisms remains challenging. Here, we focus on the application of phylogenomics (the combination of phylogenetics and genomics) to improve functional prediction of genes and gene products, to understand the evolution of protein families, and to resolve phylogenetic relationships of organisms. We point out areas that require further development, such as computational tools and methods to manipulate large and diverse data sets. The application of integrated computational and biological approaches may help to achieve a better system-based understanding of biological processes in different environments. Such approaches may in turn help to fully access valuable information regarding the evolution of genes and genomes across the wide diversity of organisms.


Molecular Evolution; Phylogenetic Inference; Functional Prediction; Phylogenomic Analysis; Syst Biol
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Laila A. Nahum (1)
  • Sergio L. Pereira (2)
  1. Marine Biological Laboratory, Bay Paul Center, Woods Hole, USA
  2. Department of Natural History, Royal Ontario Museum, Toronto, Canada
