Modeling Gene Family Evolution and Reconciling Phylogenetic Discord

  • Gergely J. Szöllősi
  • Vincent Daubin
Part of the Methods in Molecular Biology book series (MIMB, volume 856)


Large-scale databases now contain homologous gene families constructed from hundreds of complete genome sequences from across the three domains of life. Here, we discuss approaches of increasing complexity that aim to extract information on the pattern and process of gene family evolution from such datasets. In particular, we consider models that invoke processes of gene birth (duplication and transfer) and death (loss) to explain the evolution of gene families. First, we review birth-and-death models of family size evolution and their implications in light of the universal features of the family size distribution observed across different species and the three domains of life. We then turn to recent developments in models that use more of the information in the sequences of homologous gene families, through the probabilistic reconciliation of the phylogenetic histories of individual genes with the phylogenetic history of the genomes in which they have resided. To illustrate the methods and results presented, we use data from the HOGENOM database, demonstrating that the distributions of homologous gene family sizes in the genomes of Eukaryota, Archaea, and Bacteria have remarkably similar shapes. We show that these distributions are best described by models of gene family size evolution in which, for individual genes, the death (loss) rate is larger than the birth (duplication and transfer) rate, but new families are continually supplied to the genome by a process of origination. Finally, we use probabilistic reconciliation methods to take into account additional information from gene phylogenies and find that, for prokaryotes, the majority of birth events are the result of transfer.
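
The kind of family size model summarized above can be illustrated with a minimal sketch. It assumes the simplest linear birth-and-death process with origination, which may differ in detail from the models treated in the chapter: each gene copy is born (duplicated or transferred in) at a per-copy rate `birth`, lost at a per-copy rate `death`, and new families of size 1 originate at a constant rate `origination`. The function and parameter names are hypothetical. Under these assumptions, balancing the flux between neighbouring family sizes gives the expected number of families of size n at stationarity as (origination / birth) * (birth / death)^n / n, which exists only when `birth` is smaller than `death`.

```python
def stationary_family_counts(birth, death, origination, max_size):
    """Expected number of families of sizes 1..max_size at stationarity.

    Assumed model (illustrative, not taken from the chapter):
      - a family of size n gains a copy at rate birth * n,
      - a family of size n loses a copy at rate death * n,
      - new families of size 1 originate at rate `origination`,
      - a family that reaches size 0 is extinct.
    """
    if not 0 < birth < death:
        raise ValueError("a stationary state requires 0 < birth < death")

    counts = []
    # Size 1: origination of new families balances loss of the last copy.
    current = origination / death
    for n in range(1, max_size + 1):
        counts.append(current)
        # No net flux between sizes n and n + 1 at stationarity:
        #   birth * n * f_n = death * (n + 1) * f_{n+1}
        current = current * birth * n / (death * (n + 1))
    return counts


def normalised(counts):
    """Convert expected counts into a distribution over family sizes."""
    total = sum(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical rates chosen only so that death exceeds birth.
    distribution = normalised(stationary_family_counts(0.9, 1.0, 10.0, 200))
    for size in (1, 2, 5, 10, 20, 50):
        print(size, distribution[size - 1])
```

When `birth` is close to `death`, the resulting distribution decays slowly with family size and resembles a power law with an exponential cutoff, whereas a much smaller `birth` yields a rapidly decaying, nearly geometric distribution.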

Key words

Gene family evolution, Gene duplication, Gene loss, Horizontal gene transfer, Birth-and-death models, Reconciliation



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. UMR CNRS 5558, LBBE, “Biometrie et Biologie Evolutive”, UCB Lyon 1, Villeurbanne, France
