Inferring gene duplications, transfers and losses can be done in a discrete framework
Abstract
In the field of phylogenetics, the evolutionary history of a set of organisms is commonly depicted by a species tree—whose internal nodes represent speciation events—while the evolutionary history of a gene family is depicted by a gene tree—whose internal nodes can also represent macro-evolutionary events such as gene duplications and transfers. As speciation events are only part of the events shaping a gene history, the topology of a gene tree can show incongruences with that of the corresponding species tree. These incongruences can be used to infer the macro-evolutionary events undergone by the gene family. This is done by embedding the gene tree inside the species tree and hence providing a reconciliation of those trees. In the past decade, several parsimony-based methods have been developed to infer such reconciliations, accounting for gene duplications (\(\mathbb {D}\)), transfers (\(\mathbb {T}\)) and losses (\(\mathbb {L}\)). The main contribution of this paper is to formally prove an important assumption implicitly made by previous works on these reconciliations, namely that solving the (maximum) parsimony \(\mathbb {DTL}\) reconciliation problem in the discrete framework is equivalent to finding a most parsimonious \(\mathbb {DTL}\) scenario in the continuous framework. In the process, we also prove several intermediate results that are useful on their own and constitute a theoretical toolbox that will likely facilitate future theoretical contributions in the field.
Keywords
Tree reconciliation, Tree embedding, Gene evolution, Phylogenetics, Parsimony, Equivalence
Mathematics Subject Classification
68R05, 68R10, 92-08, 05C30
Notes
Acknowledgments
This work was funded by the French Agence Nationale de la Recherche (ANR) through Grants ANR-09-PEXT-000 “Phylospace” and ANR-10-BINF-01-01 “Ancestrome”, and by the Institut de Biologie Computationnelle. This publication is Contribution No. 2015-098 of the Institut des Sciences de l’Evolution de Montpellier (ISEM, UMR 5554).