Abstract
We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in $O(N + hM^2)$ time and $O(N + M^2)$ space, where $N$ is the number of organisms, $h$ is the height of the phylogeny, and $M$ is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.
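The per-branch computation behind such a likelihood can be sketched as follows. This is a minimal sketch, assuming each branch is a linear birth-death process with immigration: every gene copy duplicates at rate λ (`lam`) and is lost at rate μ (`mu`), while new copies arrive by horizontal transfer at a family-level rate κ (`kappa`). It uses the standard closed-form distributions for this process and assumes λ > 0 and λ ≠ μ. The truncation bound `n_max`, the nested-list tree encoding, and the root prior are hypothetical choices for illustration. The pruning step is a plain Felsenstein-style recursion and does not attain the time bound given in the abstract.

```python
import math


def single_copy_dist(lam, mu, t, n_max):
    """Descendant count of ONE ancestral copy after time t (linear birth-death).

    Assumption: duplication rate lam and loss rate mu per copy, lam != mu.
    Returns (probabilities for n = 0..n_max, geometric ratio beta).
    """
    e = math.exp((lam - mu) * t)
    denom = lam * e - mu
    alpha = mu * (e - 1.0) / denom  # probability that the copy's lineage dies out
    beta = lam * (e - 1.0) / denom  # ratio of the geometric tail
    dist = [alpha]
    for n in range(1, n_max + 1):
        dist.append((1.0 - alpha) * (1.0 - beta) * beta ** (n - 1))
    return dist, beta


def transfer_dist(kappa, lam, beta, n_max):
    """Copies descending from transfers (immigration at rate kappa), starting from 0.

    Negative binomial with shape r = kappa / lam and ratio beta; assumes lam > 0.
    """
    r = kappa / lam
    dist = [(1.0 - beta) ** r]
    for n in range(1, n_max + 1):
        dist.append(dist[-1] * (r + n - 1) / n * beta)
    return dist


def convolve(a, b, n_max):
    """Distribution of the sum of two independent counts, truncated at n_max."""
    out = [0.0] * (n_max + 1)
    for i, x in enumerate(a):
        for j in range(n_max + 1 - i):
            out[i + j] += x * b[j]
    return out


def transition_matrix(lam, mu, kappa, t, n_max):
    """P[m][n] = Pr(n copies at the end of a branch of length t | m at its start)."""
    single, beta = single_copy_dist(lam, mu, t, n_max)
    row = transfer_dist(kappa, lam, beta, n_max)  # m = 0: only transferred copies
    P = [row]
    for _ in range(n_max):
        row = convolve(row, single, n_max)  # add one more independent ancestral copy
        P.append(row)
    return P


def partial_likelihood(node, lam, mu, kappa, n_max):
    """L[m] = Pr(observed leaf sizes below node | m copies at node).

    Hypothetical encoding: a leaf is an int (observed family size <= n_max);
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if m == node else 0.0 for m in range(n_max + 1)]
    L = [1.0] * (n_max + 1)
    for child, t in node:
        P = transition_matrix(lam, mu, kappa, t, n_max)
        Lc = partial_likelihood(child, lam, mu, kappa, n_max)
        for m in range(n_max + 1):
            L[m] *= sum(P[m][n] * Lc[n] for n in range(n_max + 1))
    return L


def tree_likelihood(tree, root_prior, lam, mu, kappa, n_max):
    """Sum over the unknown root family size, weighted by a (hypothetical) root prior."""
    L = partial_likelihood(tree, lam, mu, kappa, n_max)
    return sum(p * l for p, l in zip(root_prior, L))


if __name__ == "__main__":
    # Hypothetical three-organism tree and uniform root prior, for illustration only.
    tree = [([(3, 0.4), (1, 0.4)], 0.6), (0, 1.0)]
    n_max = 20
    prior = [1.0 / (n_max + 1)] * (n_max + 1)
    print(tree_likelihood(tree, prior, 0.2, 0.3, 0.1, n_max))
```

Each row of the branch transition matrix is obtained by convolving the per-copy distribution once more into the previous row, because in this model ancestral copies and transferred copies evolve independently of one another.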
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Csűrös, M., Miklós, I. (2006). A Probabilistic Model for Gene Content Evolution with Duplication, Loss, and Horizontal Transfer. In: Apostolico, A., Guerra, C., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2006. Lecture Notes in Computer Science, vol 3909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732990_18
DOI: https://doi.org/10.1007/11732990_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33295-4
Online ISBN: 978-3-540-33296-1
eBook Packages: Computer Science, Computer Science (R0)