GTP Supertrees from Unrooted Gene Trees: Linear Time Algorithms for NNI Based Local Searches

  • Paweł Górecki
  • J. Gordon Burleigh
  • Oliver Eulenstein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7292)


Gene tree parsimony (GTP) problems infer species supertrees from a collection of rooted gene trees that are confounded by evolutionary events such as gene duplication, gene duplication and loss, and deep coalescence. These problems are NP-complete, so they are often addressed by effective local search heuristics that search the tree space stepwise, where each step is guided by an exact solution to an instance of a local search problem. However, GTP problems require rooted input gene trees, whereas in practice most phylogenetic methods infer unrooted gene trees, which may be difficult to root correctly. In this work, we (i) define the first local NNI (nearest neighbor interchange) search problems for heuristically solving the GTP equivalents for unrooted input gene trees, called unrooted GTP problems, and (ii) describe linear time algorithms for these local search problems. We implemented the first NNI based local search heuristics for unrooted GTP problems, which enable analyses of thousands of genes. Further, an analysis of a large plant data set using the unrooted NNI search provides support for an intriguing new hypothesis regarding the evolutionary relationships among major groups of flowering plants.
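
To make the idea of a stepwise NNI based local search concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes rooted binary trees stored as nested 2-tuples with string leaves, and the names `nni_neighbors`, `hill_climb`, and the `cost` callback are hypothetical. In a GTP setting, `cost` would be the reconciliation cost of a candidate species tree against the input gene trees; here it is only a placeholder supplied by the caller.

```python
def nni_neighbors(tree):
    """Yield every tree that is one NNI move away from `tree`.

    Assumption: leaves are strings, internal nodes are 2-tuples.
    """
    if isinstance(tree, str):
        return
    left, right = tree
    # NNI across the edge to an internal left child (a, b):
    # swap the sibling `right` with b, then with a.
    if not isinstance(left, str):
        a, b = left
        yield ((a, right), b)
        yield ((b, right), a)
    # NNI across the edge to an internal right child (a, b).
    if not isinstance(right, str):
        a, b = right
        yield (a, (left, b))
        yield (b, (a, left))
    # Moves located deeper inside either subtree.
    for sub in nni_neighbors(left):
        yield (sub, right)
    for sub in nni_neighbors(right):
        yield (left, sub)


def hill_climb(tree, cost):
    """Greedy local search: move to the best NNI neighbor while it improves."""
    best, best_cost = tree, cost(tree)
    while True:
        candidate = min(nni_neighbors(best), key=cost, default=None)
        if candidate is None or cost(candidate) >= best_cost:
            return best
        best, best_cost = candidate, cost(candidate)


if __name__ == "__main__":
    start = (("a", "b"), ("c", "d"))
    # A rooted binary tree with n leaves has 2 * (n - 2) NNI neighbors.
    print(len(list(nni_neighbors(start))))
```

This sketch rescores every neighbor from scratch at each step, so it only illustrates the search pattern. The linear time algorithms described in the abstract address exactly this per-step cost, which the sketch does not attempt to reproduce.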


Keywords: Species Tree; Local Search; Gene Tree; Local Search Algorithm; Optimal Edge





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Paweł Górecki (1)
  • J. Gordon Burleigh (2)
  • Oliver Eulenstein (3)
  1. Institute of Informatics, University of Warsaw, Poland
  2. Department of Biology, University of Florida, USA
  3. Dept. of Computer Science, Iowa State University, USA
