A Probabilistic Model for Gene Content Evolution with Duplication, Loss, and Horizontal Transfer

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2006)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3909)

Abstract

We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in $O(N+hM^2)$ time and $O(N+M^2)$ space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.
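
The abstract does not spell out how the likelihood is computed. As a rough illustration only, the Python sketch below assumes that a family's size evolves along each branch as a linear birth-death-immigration chain: each gene copy duplicates at rate λ (`lam`) and is lost at rate μ (`mu`), and new copies arrive by horizontal transfer at a constant rate κ (`kappa`). The truncation of family sizes at `K`, the negative binomial root prior, the tree encoding, and every function name are hypothetical choices made for this example. Branch transition probabilities are obtained numerically by uniformization and combined with Felsenstein's pruning algorithm; this is not the authors' $O(N+hM^2)$ algorithm and does not attain that bound.

```python
import math


def matmul(A, B):
    """Plain-Python product of two square matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def rate_matrix(lam, mu, kappa, K):
    """Generator Q of an assumed linear birth-death-immigration chain on sizes 0..K.

    From size n a family gains a copy at rate n*lam + kappa (duplication plus
    transfer in) and loses one at rate n*mu. Gains out of size K are dropped,
    a hypothetical truncation of the infinite state space.
    """
    Q = [[0.0] * (K + 1) for _ in range(K + 1)]
    for n in range(K + 1):
        up = n * lam + kappa if n < K else 0.0
        down = n * mu
        if n < K:
            Q[n][n + 1] = up
        if n > 0:
            Q[n][n - 1] = down
        Q[n][n] = -(up + down)  # rows of a generator sum to zero
    return Q


def transition_matrix(Q, t, tol=1e-12, max_terms=10_000):
    """P(t) = exp(Q t) by uniformization: sum over k of Poisson(k; q t) * R^k."""
    size = len(Q)
    q = max(-Q[i][i] for i in range(size)) or 1.0
    R = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(size)]
         for i in range(size)]  # stochastic matrix of the uniformized chain
    term = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    P = [[0.0] * size for _ in range(size)]
    weight, acc = math.exp(-q * t), 0.0  # Poisson probability of 0 jumps
    for k in range(1, max_terms + 1):
        for i in range(size):
            for j in range(size):
                P[i][j] += weight * term[i][j]
        acc += weight
        if 1.0 - acc < tol:  # remaining Poisson tail is negligible
            break
        weight *= q * t / k  # now the Poisson probability of k jumps
        term = matmul(term, R)  # now R^k
    return P


def stationary_prior(lam, mu, kappa, K):
    """Negative binomial law (r = kappa/lam, p = lam/mu), requires lam < mu.

    Renormalized on 0..K it is stationary for the truncated chain too, because
    birth-death chains satisfy detailed balance. Used as a hypothetical root prior.
    """
    r, p = kappa / lam, lam / mu
    w = [math.exp(math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
                  + r * math.log(1.0 - p) + n * math.log(p))
         for n in range(K + 1)]
    total = sum(w)
    return [x / total for x in w]


def partial_likelihood(node, Q):
    """Felsenstein pruning: L[n] = Pr(leaf sizes below node | n copies at node).

    A leaf is an int (observed family size); an internal node is a list of
    (child, branch_length) pairs.
    """
    size = len(Q)
    if isinstance(node, int):
        return [1.0 if n == node else 0.0 for n in range(size)]
    L = [1.0] * size
    for child, t in node:
        P = transition_matrix(Q, t)
        Lc = partial_likelihood(child, Q)
        for n in range(size):
            L[n] *= sum(P[n][m] * Lc[m] for m in range(size))
    return L


def family_likelihood(tree, lam, mu, kappa, K=20):
    """Likelihood of one family's leaf sizes, summed over the root size."""
    Q = rate_matrix(lam, mu, kappa, K)
    L = partial_likelihood(tree, Q)
    prior = stationary_prior(lam, mu, kappa, K)
    return sum(pn * lik for pn, lik in zip(prior, L))


if __name__ == "__main__":
    # Hypothetical tree: ((2 copies, 1 copy), 0 copies) with made-up branch lengths.
    tree = [([(2, 0.3), (1, 0.3)], 0.2), (0, 0.5)]
    print(family_likelihood(tree, lam=0.3, mu=0.5, kappa=0.1))
```

Uniformization keeps the sketch within the standard library; `scipy.linalg.expm` applied to `Q * t` would give the same matrices. Under the further assumption that families evolve independently, a log-likelihood for a whole dataset would be the sum of `math.log(family_likelihood(...))` over families.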

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Csűrös, M., Miklós, I. (2006). A Probabilistic Model for Gene Content Evolution with Duplication, Loss, and Horizontal Transfer. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_18

  • DOI: https://doi.org/10.1007/11732990_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33295-4

  • Online ISBN: 978-3-540-33296-1

  • eBook Packages: Computer Science, Computer Science (R0)
