A Sublinear-Time Randomized Approximation Scheme for the Robinson-Foulds Metric
The Robinson-Foulds (RF) metric is the most widely used measure for comparing phylogenetic trees; it can be computed in linear time using Day’s algorithm. When large numbers of large trees must be compared, however, even linear time becomes prohibitive. We present a randomized approximation scheme that, with high probability, provides a (1+ε) approximation of the true RF metric for all pairs of trees in a given collection. Our approach combines a sublinear-space embedding of the trees with an application of the Johnson-Lindenstrauss lemma to approximate vector norms very rapidly. We discuss the consequences of various parameter choices, both in the embedding and in the approximation requirements. We have also implemented our algorithm as a Java class that can easily be combined with popular packages such as Mesquite, and we present experimental results that illustrate the precision and running-time tradeoffs and demonstrate the speed of our approach.
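The Python sketch below is a rough illustration of the two ingredients the abstract names. It first computes the exact RF distance as the size of the symmetric difference of two trees' nontrivial bipartition sets. It then estimates that distance with a plain Johnson-Lindenstrauss random projection of each tree's 0/1 bipartition-indicator vector. This is a minimal sketch under stated assumptions. It is not the authors' Java class, not their sublinear-space embedding, and not Day's linear-time algorithm; the exact part simply hashes explicit leaf sets. The sketch assumes that trees are nested Python tuples over a shared set of string taxon labels, and that RF is the unnormalized symmetric-difference count (some authors halve it). All function names are hypothetical.

```python
import math
import random


def clusters(tree, out):
    """Collect the leaf set under every internal node of a nested-tuple tree.

    Assumption: a leaf is a string taxon label; an internal node is a tuple.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree])
    leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Nontrivial splits, each encoded as the side excluding a reference taxon."""
    found = []
    taxa = clusters(tree, found)
    ref = min(taxa)  # fixed reference taxon, so all trees share one encoding
    splits = set()
    for c in found:
        side = taxa - c if ref in c else c
        if 2 <= len(side) <= len(taxa) - 2:  # drop trivial and empty splits
            splits.add(side)
    return splits


def rf_distance(t1, t2):
    """Exact RF distance: number of splits found in exactly one tree."""
    return len(bipartitions(t1) ^ bipartitions(t2))


def column(split, k, seed):
    """Gaussian column of the implicit k x D projection matrix for one split,
    regenerated deterministically from the split so the matrix is never stored."""
    rng = random.Random(f"{seed}:{sorted(split)!r}")
    return [rng.gauss(0.0, 1.0) for _ in range(k)]


def sketch(tree, k, seed=0):
    """Project the tree's 0/1 split-indicator vector onto k random directions."""
    vec = [0.0] * k
    # Sorted order makes equal split sets produce bit-identical sketches.
    for split in sorted(bipartitions(tree), key=sorted):
        for i, x in enumerate(column(split, k, seed)):
            vec[i] += x
    return [v / math.sqrt(k) for v in vec]


def approx_rf(s1, s2):
    """Squared Euclidean distance between sketches; an unbiased RF estimate."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2))


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), ("d", ("e", "f")))
    t2 = ((("a", "c"), "b"), ("d", ("e", "f")))
    print(rf_distance(t1, t2))  # exact: 2
    print(approx_rf(sketch(t1, 2000), sketch(t2, 2000)))  # close to 2
```

The estimate works because, for 0/1 indicator vectors, the squared Euclidean distance equals the number of splits present in exactly one tree. A split shared by both trees contributes the same random column to each sketch and cancels in the difference. By the Johnson-Lindenstrauss lemma, k = O(ε⁻² log m) dimensions suffice for all pairwise distances among m trees to lie within a factor 1 ± ε with high probability. Once every tree is sketched, each pairwise estimate costs O(k) time, independent of the number of taxa.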
Keywords: Edit Distance, Rand Index, Java Class, Popular Package, Tree Vector