Journal of Mathematical Biology, Volume 72, Issue 7, pp 1811–1844

Inferring gene duplications, transfers and losses can be done in a discrete framework

  • Vincent Ranwez
  • Celine Scornavacca
  • Jean-Philippe Doyon
  • Vincent Berry


Abstract

In the field of phylogenetics, the evolutionary history of a set of organisms is commonly depicted by a species tree, whose internal nodes represent speciation events, while the evolutionary history of a gene family is depicted by a gene tree, whose internal nodes can also represent macro-evolutionary events such as gene duplications and transfers. As speciation events are only some of the events shaping a gene's history, the topology of a gene tree can be incongruent with that of the corresponding species tree. These incongruences can be used to infer the macro-evolutionary events undergone by the gene family, by embedding the gene tree inside the species tree, i.e. by computing a reconciliation of the two trees. In the past decade, several parsimony-based methods have been developed to infer such reconciliations, accounting for gene duplications (\(\mathbb {D}\)), transfers (\(\mathbb {T}\)) and losses (\(\mathbb {L}\)). The main contribution of this paper is to formally prove an important assumption implicitly made by previous works on these reconciliations, namely that solving the (maximum) parsimony \(\mathbb {DTL}\) reconciliation problem in the discrete framework is equivalent to finding a most parsimonious \(\mathbb {DTL}\) scenario in the continuous framework. In the process, we also prove several intermediate results that are useful on their own and constitute a theoretical toolbox that will likely facilitate future theoretical contributions in the field.
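The parsimony criterion mentioned above scores a reconciliation by the weighted number of events it implies, i.e. cost = δ·#D + τ·#T + λ·#L. As an illustration only, the Python sketch below computes a minimum-cost DTL reconciliation with a naive memoised dynamic program over (gene node, species node) pairs in the so-called undated setting. It ignores species divergence times altogether, so it is not an implementation of either the discrete or the continuous framework discussed in this paper. Trees are encoded as nested tuples of leaf names; the function name `dtl_cost`, the leaf-mapping dictionary and the default costs (duplication 2, transfer 3, loss 1) are hypothetical choices made for this example.

```python
import math
from functools import lru_cache

INF = math.inf


def postorder(tree):
    """All nodes of a nested-tuple binary tree, children before parents."""
    if isinstance(tree, tuple):
        return postorder(tree[0]) + postorder(tree[1]) + [tree]
    return [tree]


def dtl_cost(gene_tree, species_tree, leaf_map, dup=2.0, transfer=3.0, loss=1.0):
    """Minimum DTL cost (undated model). Default costs are illustrative assumptions."""
    s_nodes = postorder(species_tree)
    subtree = {s: set(postorder(s)) for s in s_nodes}  # s plus all its descendants
    depth = {}

    def set_depth(s, d):
        depth[s] = d
        if isinstance(s, tuple):
            set_depth(s[0], d + 1)
            set_depth(s[1], d + 1)

    set_depth(species_tree, 0)

    @lru_cache(maxsize=None)
    def inside(g, s):
        # g enters the branch above s and is mapped to some x in the subtree of s;
        # each speciation passed on the way from s down to x costs one loss.
        return min(c(g, x) + loss * (depth[x] - depth[s]) for x in subtree[s])

    @lru_cache(maxsize=None)
    def outside(g, s):
        # g is transferred from s to a branch that is neither above nor below s.
        return min((c(g, x) for x in s_nodes
                    if x not in subtree[s] and s not in subtree[x]), default=INF)

    @lru_cache(maxsize=None)
    def c(g, s):
        # Cheapest reconciliation of gene subtree g when g is mapped to species node s.
        if not isinstance(g, tuple):
            return 0.0 if leaf_map[g] == s else INF
        g1, g2 = g
        best = dup + inside(g1, s) + inside(g2, s)              # duplication
        best = min(best,
                   transfer + inside(g1, s) + outside(g2, s),   # g2 transferred
                   transfer + inside(g2, s) + outside(g1, s))   # g1 transferred
        if isinstance(s, tuple):                                # speciation
            s1, s2 = s
            best = min(best,
                       inside(g1, s1) + inside(g2, s2),
                       inside(g1, s2) + inside(g2, s1))
        return best

    return min(c(gene_tree, s) for s in s_nodes)


if __name__ == "__main__":
    species = (("A", "B"), "C")
    leaves = {"a1": "A", "b1": "B", "c1": "C"}
    print(dtl_cost((("a1", "b1"), "c1"), species, leaves))  # 0.0: congruent trees
    print(dtl_cost((("a1", "c1"), "b1"), species, leaves))  # 3.0: one transfer
```

In the second call the cheapest scenario found is a single transfer from the lineage of A to that of C. Memoisation keeps the running time polynomial, O(|G|·|S|²) with this naive scan over candidate recipient branches, which is enough for small illustrative trees.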


Keywords

Tree reconciliation, Tree embedding, Gene evolution, Phylogenetics, Parsimony, Equivalence

Mathematics Subject Classification

68R05 68R10 92-08 05C30 



This work was funded by the French Agence Nationale de la Recherche (ANR) through Grants ANR-09-PEXT-000 “Phylospace” and ANR-10-BINF-01-01 “Ancestrome”, and by the Institut de Biologie Computationnelle. This publication is Contribution No. 2015-098 of the Institut des Sciences de l’Evolution de Montpellier (ISEM, UMR 5554).


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Vincent Ranwez (1, 4)
  • Celine Scornavacca (2, 4), corresponding author
  • Jean-Philippe Doyon (2, 3)
  • Vincent Berry (3, 4)

  1. SupAgro, UMR-AGAP, Montpellier, France
  2. ISEM, UMR 5554 (Univ. Montpellier, CNRS, IRD, EPHE), Montpellier, France
  3. LIRMM, CNRS, Univ. Montpellier, Montpellier, France
  4. Institut de Biologie Computationnelle, Montpellier, France