A Probabilistic Model for Gene Content Evolution with Duplication, Loss, and Horizontal Transfer

  • Miklós Csűrös
  • István Miklós
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in $O(N + hM^2)$ time and $O(N + M^2)$ space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.
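To make the idea of a likelihood over family sizes concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a linear birth-death process in which a family of size n grows at rate `gain + n * dup` and shrinks at rate `n * loss`, with the constant `gain` term standing in for horizontal transfer. It truncates family sizes at a hypothetical `max_count` and applies the generic pruning recursion on the tree, so it does not achieve the stated time bound. The function names, the nested-list tree encoding, the rate values, and the uniform root prior are all illustrative assumptions.

```python
import math

def rate_matrix(max_count, gain, dup, loss):
    """Generator matrix on family sizes 0..max_count (assumed truncation)."""
    size = max_count + 1
    q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        if n < max_count:
            q[n][n + 1] = gain + n * dup   # transfer or duplication adds a gene
        if n > 0:
            q[n][n - 1] = n * loss         # each copy is lost independently
        q[n][n] = -sum(q[n])               # rows of a generator sum to zero
    return q

def transition_matrix(q, t, tol=1e-12, max_terms=10000):
    """Transition probabilities over time t, computed by uniformization."""
    size = len(q)
    lam = max(-q[i][i] for i in range(size)) or 1.0
    # Stochastic matrix of the uniformized jump chain: I + Q / lam.
    step = [[(i == j) + q[i][j] / lam for j in range(size)] for i in range(size)]
    power = [[float(i == j) for j in range(size)] for i in range(size)]
    weight = math.exp(-lam * t)            # Poisson(lam * t) probability of 0 jumps
    total = weight
    p = [[weight * x for x in row] for row in power]
    for k in range(1, max_terms):
        if 1.0 - total < tol:              # remaining Poisson mass is negligible
            break
        power = [[sum(power[i][l] * step[l][j] for l in range(size))
                  for j in range(size)] for i in range(size)]
        weight *= lam * t / k
        total += weight
        for i in range(size):
            for j in range(size):
                p[i][j] += weight * power[i][j]
    return p

def conditional_likelihoods(node, q):
    """Pruning recursion: entry n is P(observed leaf counts below node | size n at node).

    A leaf is an int (the observed family size); an internal node is a list of
    (child, branch_length) pairs.
    """
    size = len(q)
    if isinstance(node, int):
        return [1.0 if n == node else 0.0 for n in range(size)]
    like = [1.0] * size
    for child, branch_length in node:
        p = transition_matrix(q, branch_length)
        child_like = conditional_likelihoods(child, q)
        for n in range(size):
            like[n] *= sum(p[n][m] * child_like[m] for m in range(size))
    return like

# Illustrative example: three organisms with family sizes 2, 0 and 1.
q = rate_matrix(max_count=4, gain=0.1, dup=0.2, loss=0.3)
tree = [(2, 0.3), ([(0, 0.2), (1, 0.2)], 0.1)]
root_like = conditional_likelihoods(tree, q)
root_prior = [1.0 / len(q)] * len(q)       # assumed uniform prior at the root
likelihood = sum(pr * l for pr, l in zip(root_prior, root_like))
print(likelihood)
```

Each internal node here costs on the order of `max_count` cubed operations because every branch gets a dense matrix built by repeated multiplication. The sketch is only meant to show what quantity is being computed, not how to compute it efficiently.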


Keywords (added by machine, not by the authors): Horizontal Gene Transfer; Gene Content; Horizontal Transfer; Gene Count; Conditional Likelihood





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Miklós Csűrös (1)
  • István Miklós (2)
  1. Department of Computer Science and Operations Research, Université de Montréal, Montréal, Canada
  2. Department of Plant Taxonomy and Ecology, Eötvös Loránd University, Hungary
