Modeling Gene Family Evolution and Reconciling Phylogenetic Discord

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 856)

Abstract

Large-scale databases now contain homologous gene families constructed from hundreds of complete genome sequences from across the three domains of life. Here, we discuss approaches of increasing complexity for extracting information on the pattern and process of gene family evolution from such datasets. In particular, we consider models that invoke processes of gene birth (duplication and transfer) and death (loss) to explain the evolution of gene families. First, we review birth-and-death models of family size evolution and their implications in light of the universal features of the family size distribution observed across different species and the three domains of life. We then turn to recent developments in models that make fuller use of the information in the sequences of homologous gene families through probabilistic reconciliation of the phylogenetic histories of individual genes with the phylogenetic history of the genomes in which they have resided. To illustrate the methods and results presented, we use data from the HOGENOM database, demonstrating that the distributions of homologous gene family sizes in the genomes of Eukaryota, Archaea, and Bacteria have remarkably similar shapes. We show that these distributions are best described by models of gene family size evolution in which, for individual genes, the death (loss) rate is larger than the birth (duplication and transfer) rate, but new families are continually supplied to the genome by a process of origination. Finally, we use probabilistic reconciliation methods to take into account additional information from gene phylogenies and find that, for prokaryotes, the majority of birth events are the result of transfer.
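
The kind of model described in the abstract can be illustrated with a minimal sketch, which is not the chapter's own implementation. It assumes the simplest linear case: every gene is copied at a constant per-gene rate `birth` (duplication and transfer lumped together), lost at a per-gene rate `death` larger than `birth`, and new single-gene families appear at a constant rate `origination`. Under these assumptions, balancing the flux between neighbouring family sizes gives an expected number of families of size n proportional to (birth/death)^n / n. The function names and all parameter values below are hypothetical and chosen only for illustration.

```python
import random


def stationary_counts(birth, death, origination, max_size):
    """Expected number of families of size 1..max_size at stationarity.

    Flux balance between sizes n and n+1 gives
    f_1 = origination / death and f_{n+1} = f_n * n * birth / ((n + 1) * death).
    """
    if not 0 < birth < death:
        raise ValueError("this sketch assumes 0 < birth < death")
    counts = [origination / death]
    for n in range(1, max_size):
        counts.append(counts[-1] * n * birth / ((n + 1) * death))
    return counts


def simulate(birth, death, origination, t_end, seed=0):
    """Gillespie-style simulation starting from an empty genome.

    Returns the sizes of the families present at time t_end.
    Assumes origination > 0 so that the total event rate is never zero.
    """
    rng = random.Random(seed)
    families = []  # sizes of the families currently present
    t = 0.0
    while True:
        total_genes = sum(families)
        rate = origination + (birth + death) * total_genes
        t += rng.expovariate(rate)  # waiting time to the next event
        if t > t_end:
            return families
        if rng.random() * rate < origination:
            families.append(1)  # origination of a new single-gene family
            continue
        # Pick a gene uniformly at random: a family is hit in proportion to its size.
        i = rng.choices(range(len(families)), weights=families)[0]
        if rng.random() < birth / (birth + death):
            families[i] += 1  # birth (duplication or transfer)
        else:
            families[i] -= 1  # death (loss)
            if families[i] == 0:
                families.pop(i)  # the family has gone extinct


if __name__ == "__main__":
    # Illustrative parameter values only; they are not estimates from the chapter.
    expected = stationary_counts(birth=0.8, death=1.0, origination=50.0, max_size=5)
    sizes = simulate(birth=0.8, death=1.0, origination=50.0, t_end=20.0, seed=1)
    print("expected counts for sizes 1-5:", expected)
    print("simulated number of families:", len(sizes))
```

Because the loss rate exceeds the birth rate, every individual family eventually goes extinct, and the size distribution is maintained only by the continual supply of new families. This is the qualitative behaviour the abstract attributes to the best-fitting models.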

Author information

Corresponding author

Correspondence to Vincent Daubin.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Szöllősi, G.J., Daubin, V. (2012). Modeling Gene Family Evolution and Reconciling Phylogenetic Discord. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 856. Humana Press. https://doi.org/10.1007/978-1-61779-585-5_2

  • DOI: https://doi.org/10.1007/978-1-61779-585-5_2

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-584-8

  • Online ISBN: 978-1-61779-585-5

  • eBook Packages: Springer Protocols
