A Robinson-Foulds Measure to Compare Unrooted Trees with Rooted Trees

  • Paweł Górecki
  • Oliver Eulenstein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7292)

Abstract

The Robinson-Foulds (RF) distance is a well-established measure for the comparative analysis of phylogenetic trees, and comparing unrooted gene trees with rooted species trees is a standard application in phylogenetics. However, the RF distance is not defined to compare unrooted trees with rooted trees. Here we propose urRF a new measure based on the standard RF distance to compare an unrooted tree with a rooted tree, and describe a linear time algorithm to compute urRF. Further, we show that our measure relates to several intriguing properties that can be crucial for the in-depth comparative study and the credible rooting of trees. Finally, we present detailed empirical studies based on real data sets demonstrating the performance of our novel RF measure, the efficiency of our linear time algorithm and the quality of determining optimal rootings of unrooted gene trees.

Keywords

Gene Tree Rooted Tree Outgoing Edge Linear Time Algorithm Unrooted Tree 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Bansal, M.S., Burleigh, J.G., Eulenstein, O., Fernández-Baca, D.: Robinson-foulds supertrees. Algorithms for Molecular Biology 5, 18 (2010)CrossRefGoogle Scholar
  2. 2.
    Bender, M.A., Farach-Colton, M.: The lca Problem Revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  3. 3.
    Bordewich, M., Semple, C.: On the computational complexity of the rooted subtree prune and regraft distance. Annals of Combinatorics 8, 409–423 (2004)MathSciNetMATHCrossRefGoogle Scholar
  4. 4.
    Bourque, M.: Abres de Steiner et reseaux dont varie I’emplagement de certains sommnets. PhD thesis, Department d’Informatique, University Montréal, Montréal (1978)Google Scholar
  5. 5.
    Bryant, D., Steel, M.: Computing the distribution of a tree metric. IEEE/ACM Trans. Comput. Biol. Bioinform. 6(3), 420–426 (2009)CrossRefGoogle Scholar
  6. 6.
    Bryant, D., Tsang, J., Kearney, P.E., Li, M.: Computing the quartet distance between evolutionary trees. In: Symposium on Discrete Algorithms, pp. 285–286 (2000)Google Scholar
  7. 7.
    Chaudhary, R., Burleigh, J.G., Fernández-Baca, D.: Fast Local Search for Unrooted Robinson-Foulds Supertrees. In: Chen, J., Wang, J., Zelikovsky, A. (eds.) ISBRA 2011. LNCS, vol. 6674, pp. 184–196. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  8. 8.
    Chen, F., Mackey, A.J., Stoeckert, C.J., Roos, D.S.: Orthomcl-db: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Research 34(suppl.1), D363–D368Google Scholar
  9. 9.
    DasGupta, B., He, X., Jiang, T., Li, M., Tromp, J., Zhang, L.: On distances between phylogenetic trees. In: SODA, pp. 427–436 (1997)Google Scholar
  10. 10.
    Day, W.H.E.: Optimal algorithms for comparing trees with labeled leaves. Journal of Classification 2(1), 7–28 (1985)MathSciNetMATHCrossRefGoogle Scholar
  11. 11.
    Edgar, R.C.: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792–1797 (2004)CrossRefGoogle Scholar
  12. 12.
    Eulenstein, O., Mirkin, B., Vingron, M.: Duplication-based measures of difference between gene and species trees. J. Comput. Biol. 5(1), 135–148 (1998)CrossRefGoogle Scholar
  13. 13.
    Felsenstein, J.: Inferring phylogenies. Sinauer Associates (2004)Google Scholar
  14. 14.
    Goodman, M., Czelusniak, J., Moore, G.W., Romero-Herrera, A.E., Matsuda, G.: Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Systematic Zoology 28(2), 132–163 (1979)CrossRefGoogle Scholar
  15. 15.
    Górecki, P., Tiuryn, J.: Dls-trees: A model of evolutionary scenarios. Theor. Comput. Sci. 359(1-3), 378–399 (2006)MATHCrossRefGoogle Scholar
  16. 16.
    Górecki, P., Tiuryn, J.: Inferring phylogeny from whole genomes. Bioinformatics 23(2), e116–e222 (2007)CrossRefGoogle Scholar
  17. 17.
    Guindon, S., Delsuc, F., Dufayard, J., Gascuel, O.: Estimating maximum likelihood phylogenies with PhyML. Methods Mol. Biol. 537, 113–137 (2009)CrossRefGoogle Scholar
  18. 18.
    Mecham, C.A.: Theoretical and computational considerations of the compatibility of qualitative taxonomic characters. NATO ASI Series, vol. 1, pp. 304–314. Springer, Berlin (1983)Google Scholar
  19. 19.
    Mirkin, B., Muchnik, I.B., Smith, T.F.: A biologically consistent model for comparing molecular phylogenies. J. Comput. Biol. 2(4), 493–507 (1995)CrossRefGoogle Scholar
  20. 20.
    Page, R.D.M.: Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Systematic Biology 43(1), 58–77 (1994)Google Scholar
  21. 21.
    Pattengale, N.D., Gottlieb, E.J., Moret, B.M.E.: Efficiently computing the robinson-foulds metric. J. Comput. Biol. 14(6), 724–735 (2007)MathSciNetCrossRefGoogle Scholar
  22. 22.
    Robinson, D.F., Foulds, L.R.: Comparison of weighted labelled trees. Lecture Notes in Mathematics, vol. 748, pp. 119–126 (1979)Google Scholar
  23. 23.
    Sanderson, M., McMahon, M.: Inferring angiosperm phylogeny from est data with widespread gene duplication. BMC Evolutionary Biology 7(suppl.1) (2007)Google Scholar
  24. 24.
    Sayers, E.W., et al.: Database resources of the national center for biotechnology information. Nucleic Acids Research 37(suppl.1), D5–D15 (2009)CrossRefGoogle Scholar
  25. 25.
    Semple, C., Steel, M.A.: Phylogenetics. Oxford University Press (2003)Google Scholar
  26. 26.
    Smith, A.: Rooting molecular trees: problems and strategies. Biol. J. Linn. Soc. 51, 279–292Google Scholar
  27. 27.
    Strimmer, K., von Haeseler, A.: Quartet puzzling: A quartet maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13, 964–969 (1996)CrossRefGoogle Scholar
  28. 28.
    Wheeler, W.: Nucleic acid sequence phylogeny and random outgroups. Cladistics – The International Journal of the Willi Hennig Society 51, 363–368 (1990)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Paweł Górecki
    • 1
  • Oliver Eulenstein
    • 2
  1. 1.Institute of InformaticsUniversity of WarsawPoland
  2. 2.Dept. of Computer ScienceIowa State UniversityUSA

Personalised recommendations