Efficiently Computing Arbitrarily-Sized Robinson-Foulds Distance Matrices

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5251)


In this paper, we introduce the HashRF(p,q) algorithm for computing Robinson-Foulds (RF) distance matrices over large collections of binary evolutionary trees. The novelty of our algorithm is that it can compute arbitrarily-sized (p × q) RF matrices without running into physical memory limitations. We explore the performance of HashRF(p,q) on 20,000 biological trees of 150 taxa and 33,306 biological trees of 567 taxa, collected from a Bayesian analysis. When computing the all-to-all RF matrix, HashRF(p,q) is up to 200 times faster than PAUP* and around 40% faster than HashRF, one of the fastest all-to-all RF algorithms. We show an application of our approach by clustering large RF matrices to improve the resolution rate of consensus trees, which biologists commonly use to summarize the results of their phylogenetic analyses. Thus, our HashRF(p,q) algorithm provides scientists with a fast and efficient alternative for understanding the evolutionary relationships among a set of trees.
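To make the notion of a p × q RF matrix concrete, the following is a minimal sketch in Python. It is not the HashRF(p,q) algorithm, whose hashing scheme the abstract does not describe; it is a naive set-based computation under stated assumptions. Trees are assumed to be given as nested tuples of taxon names, all trees are assumed to share the same taxa, and the RF distance is taken as the number of non-trivial bipartitions found in exactly one of the two trees (some tools report half this value). The function names `bipartitions` and `rf_matrix` are hypothetical.

```python
def _clades(tree, out):
    """Return the leaf set under `tree`, recording the leaf set of every internal node in `out`."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def bipartitions(tree):
    """Non-trivial bipartitions of a nested-tuple tree (assumed input format).

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    out = set()
    taxa = _clades(tree, out)
    ref = min(taxa)  # arbitrary but deterministic reference taxon
    splits = set()
    for clade in out:
        side = clade if ref not in clade else taxa - clade
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
    return splits


def rf_matrix(row_trees, col_trees):
    """Naive p x q matrix: entry [i][j] counts bipartitions unique to either tree."""
    rows = [bipartitions(t) for t in row_trees]
    cols = [bipartitions(t) for t in col_trees]
    return [[len(r ^ c) for c in cols] for r in rows]


# Illustrative trees on five taxa (made-up data, not from the paper).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "D"), "B"), ("C", "E"))

matrix = rf_matrix([t1, t2], [t1, t2, t3])  # a 2 x 3 block of the RF matrix
```

Because `rf_matrix` takes separate row and column collections, a large all-to-all matrix could in principle be produced one p × q block at a time, so that only one block needs to be held in memory at once. This naive version compares every pair of bipartition sets directly and makes no claim about matching the speed reported above.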


Keywords: phylogenetic trees, Robinson-Foulds distance, clustering, performance analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  1. Department of Computer Science, Texas A&M University, College Station, USA
