A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric

  • Nicholas D. Pattengale
  • Bernard M. E. Moret
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


The Robinson-Foulds (RF) metric is the measure most widely used in comparing phylogenetic trees; it can be computed in linear time using Day’s algorithm. When faced with the need to compare large numbers of large trees, however, even linear time becomes prohibitive. We present a randomized approximation scheme that provides, with high probability, a (1+ε) approximation of the true RF metric for all pairs of trees in a given collection. Our approach is to use a sublinear-space embedding of the trees, combined with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices (in the embedding and in the approximation requirements). We also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite; in consequence, we present experimental results illustrating the precision and running-time tradeoffs as well as demonstrating the speed of our approach.
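The general idea behind the abstract can be illustrated with a minimal sketch; it is not the authors' Java implementation, and it makes no claim to reproduce their embedding. It rests on two standard facts: the RF distance between two trees is the size of the symmetric difference of their bipartition sets, which equals the squared Euclidean distance between the trees' 0/1 bipartition-indicator vectors, and a random ±1 projection onto k dimensions preserves that squared distance in expectation. The function names, the representation of a split as a `frozenset` of taxa on one canonical side, and the string-seeded generator used to derive each split's random column are all assumptions made for illustration.

```python
import random


def exact_rf(splits_a, splits_b):
    """Exact RF distance: size of the symmetric difference of the split sets."""
    return len(splits_a ^ splits_b)


def sketch(splits, k, seed=0):
    """Project a tree's implicit 0/1 split-indicator vector onto k dimensions.

    Each split owns one column of a random +1/-1 matrix. The column is
    regenerated from a seed derived from the split itself, so every tree
    that contains the split sees the same column (illustrative assumption).
    """
    vec = [0.0] * k
    for split in splits:
        key = ",".join(sorted(split))
        rng = random.Random(f"{seed}|{key}")  # deterministic per split
        for j in range(k):
            vec[j] += 1.0 if rng.getrandbits(1) else -1.0
    return vec


def approx_rf(sketch_a, sketch_b):
    """Estimate RF as the squared distance between sketches, scaled by 1/k."""
    k = len(sketch_a)
    return sum((a - b) ** 2 for a, b in zip(sketch_a, sketch_b)) / k


# Two hypothetical caterpillar trees on taxa A..H, given by their
# non-trivial splits (one canonical side of each bipartition).
T1 = {frozenset("AB"), frozenset("ABC"), frozenset("ABCD"),
      frozenset("FGH"), frozenset("FG")}
T2 = {frozenset("AB"), frozenset("ABD"), frozenset("ABCD"),
      frozenset("EFG"), frozenset("FG")}

K = 2000  # sketch dimension; larger k gives a tighter estimate
S1, S2 = sketch(T1, K), sketch(T2, K)
print("exact RF:", exact_rf(T1, T2))
print("approximate RF:", approx_rf(S1, S2))
```

Splits shared by both trees contribute identical columns and cancel exactly, so only the differing splits affect the estimate. Each tree is sketched once, and every subsequent pairwise comparison costs time proportional to k rather than to the number of taxa; this is where the savings would come from when many large trees are compared. The toy trees here are far too small for the sketch to pay off.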







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nicholas D. Pattengale (1)
  • Bernard M. E. Moret (1)
  1. Department of Computer Science, University of New Mexico, Albuquerque, USA
