Journal of Molecular Evolution, Volume 63, Issue 2, pp 240–250

Optimal Gene Trees from Sequences and Species Trees Using a Soft Interpretation of Parsimony

  • Ann-Charlotte Berglund-Sonnhammer
  • Pär Steffansson
  • Matthew J. Betts
  • David A. Liberles

Abstract

Gene duplication and gene loss as well as other biological events can result in multiple copies of genes in a given species. Because of these gene duplication and loss dynamics, in addition to variation in sequence evolution and other sources of uncertainty, different gene trees ultimately present different evolutionary histories. All of this together results in gene trees that give different topologies from each other, making consensus species trees ambiguous in places. Other sources of data to generate species trees are also unable to provide completely resolved binary species trees. However, in addition to gene duplication events, speciation events have provided some underlying phylogenetic signal, enabling development of algorithms to characterize these processes. Therefore, a soft parsimony algorithm has been developed that enables the mapping of gene trees onto species trees and modification of uncertain or weakly supported branches based on minimizing the number of gene duplication and loss events implied by the tree. The algorithm also allows for rooting of unrooted trees and for removal of in-paralogues (lineage-specific duplicates and redundant sequences masquerading as such). The algorithm has also been made available for download as a software package, Softparsmap.
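The mapping step described in the abstract can be illustrated with the classical least-common-ancestor (LCA) reconciliation of a rooted binary gene tree against a rooted species tree. The following is a minimal sketch under stated assumptions, not the Softparsmap implementation: the toy species tree, the gene names, and the functions `ancestors`, `lca`, and `reconcile` are all hypothetical, and only duplications are counted (losses, rooting, and the rearrangement of weakly supported branches are omitted for brevity).

```python
# Hypothetical toy species tree ((human, mouse)mammal, chicken)amniote,
# stored as a child -> parent map. Not taken from the paper.
SPECIES_PARENT = {
    "human": "mammal",
    "mouse": "mammal",
    "mammal": "amniote",
    "chicken": "amniote",
}

# Hypothetical gene -> species assignment for the gene tree leaves.
SPECIES_OF = {"h1": "human", "h2": "human", "m1": "mouse", "c1": "chicken"}


def ancestors(parent, node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, a, b):
    """Least common ancestor of species-tree nodes a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, parent, species_of):
    """Map a gene tree (nested 2-tuples, leaves are gene names) onto the
    species tree. Returns (species node the root maps to, duplications)."""
    if isinstance(gene_tree, str):
        return species_of[gene_tree], 0
    left, right = gene_tree
    map_left, dup_left = reconcile(left, parent, species_of)
    map_right, dup_right = reconcile(right, parent, species_of)
    mapped = lca(parent, map_left, map_right)
    # A gene-tree node is a duplication if it maps to the same species
    # node as at least one of its children.
    is_dup = 1 if mapped in (map_left, map_right) else 0
    return mapped, dup_left + dup_right + is_dup


# Gene tree congruent with the species tree: no duplication is implied.
print(reconcile((("h1", "m1"), "c1"), SPECIES_PARENT, SPECIES_OF))
# Gene tree with a lineage-specific duplicate (in-paralogue) in human.
print(reconcile((("h1", "h2"), "m1"), SPECIES_PARENT, SPECIES_OF))
```

A soft parsimony approach in the sense of the abstract would compare such event counts across alternative resolutions of uncertain branches and keep the arrangement with the fewest implied events; that search is not shown here.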

Keywords

Parsimony, Phylogeny, Gene duplication/gene loss

Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Ann-Charlotte Berglund-Sonnhammer (1, 2)
  • Pär Steffansson (1)
  • Matthew J. Betts (3, 4)
  • David A. Liberles (1, 3, 5)

  1. Stockholm Bioinformatics Center, Stockholm University, Sweden
  2. Linnaeus Centre for Bioinformatics, Uppsala University, Sweden
  3. Computational Biology Unit, BCCS, University of Bergen, Norway
  4. EMBL Heidelberg, Germany
  5. Department of Molecular Biology, University of Wyoming, Laramie, USA
