A Metric for Phylogenetic Trees Based on Matching

  • Yu Lin
  • Vaibhav Rajan
  • Bernard M. E. Moret
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6674)

Abstract

Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems ranging from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, similarity measures based on maximum agreement are too strict, while measures based on the elimination of rogue taxa work poorly when the proportion of rogue taxa is significant. Distance measures based on edit distances under simple tree operations (such as nearest-neighbor interchange or subtree pruning and regrafting) are NP-hard to compute. The widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, and it also lacks robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance.
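The remark about reattaching a single leaf can be checked on a small case. The sketch below is illustrative only and is not taken from the paper: it assumes unrooted caterpillar (ladder) trees, represents each tree by its set of nontrivial bipartitions, and computes the Robinson-Foulds distance as the size of the symmetric difference of those sets. The function names `caterpillar_splits` and `rf_distance` are hypothetical.

```python
def caterpillar_splits(order):
    """Nontrivial bipartitions of the caterpillar tree whose leaves
    hang off the spine in the given left-to-right order."""
    leaves = frozenset(order)
    ref = min(leaves)  # reference leaf used to pick a canonical side
    splits = set()
    # Cutting the spine after the k-th leaf gives one internal edge,
    # for k = 2 .. n-2 (n - 3 internal edges in total).
    for k in range(2, len(order) - 1):
        side = frozenset(order[:k])
        if ref in side:
            # Store the side that does NOT contain the reference leaf,
            # so that equal bipartitions compare equal as sets.
            side = leaves - side
        splits.add(side)
    return splits


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    return len(splits_a ^ splits_b)


n = 10
original = list(range(1, n + 1))    # leaves 1, 2, ..., 10 along the spine
moved = original[1:] + original[:1]  # leaf 1 reattached at the far end

d = rf_distance(caterpillar_splits(original), caterpillar_splits(moved))
print(d, "out of a maximum of", 2 * (n - 3))
```

In this construction the two trees share no nontrivial bipartition, so moving one leaf drives the distance to its maximum of 2(n - 3) even though the rest of the tree is unchanged.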

In this paper, we introduce an entirely new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.
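The abstract does not spell out the construction, so the following is only one plausible reading of "based on matching", given as a sketch under stated assumptions rather than as the authors' definition. It assumes that each tree is given as a list of nontrivial bipartitions (one side of each, as a frozenset of leaf labels), that the cost of pairing two bipartitions is the number of leaves on which they disagree (taking the better of the two side orientations), and that the distance is the cost of a minimum-weight perfect matching. The names `split_cost` and `matching_distance` are hypothetical, and the brute-force search over permutations stands in for a polynomial-time assignment algorithm and is only practical for very small trees.

```python
from itertools import permutations


def split_cost(a, b, n):
    """Leaves on which two bipartitions disagree, where each bipartition
    is given by one side and n is the total number of leaves."""
    d = len(a ^ b)
    # Comparing a with the complement of b flips every leaf's status,
    # so that orientation costs n - d; keep the cheaper one.
    return min(d, n - d)


def matching_distance(splits_a, splits_b, n):
    """Cost of a minimum-weight perfect matching between two equally
    sized lists of bipartitions (brute force, small inputs only)."""
    a, b = list(splits_a), list(splits_b)
    if len(a) != len(b):
        raise ValueError("this sketch assumes equally many bipartitions")
    return min(
        sum(split_cost(a[i], b[j], n) for i, j in enumerate(perm))
        for perm in permutations(range(len(b)))
    )


# Two six-leaf caterpillar trees: leaves in order 1..6, and the same
# tree with leaf 1 reattached at the far end (order 2,3,4,5,6,1).
tree_a = [frozenset({3, 4, 5, 6}), frozenset({4, 5, 6}), frozenset({5, 6})]
tree_b = [frozenset({2, 3}), frozenset({2, 3, 4}), frozenset({2, 3, 4, 5})]

print(matching_distance(tree_a, tree_b, 6))
```

Under these assumptions, bipartitions that differ in only one or two leaves are paired cheaply instead of being counted as entirely different, which is the kind of graded response a presence-or-absence comparison of bipartitions cannot give.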

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yu Lin (1)
  • Vaibhav Rajan (1)
  • Bernard M. E. Moret (1)
  1. Laboratory for Computational Biology and Bioinformatics, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
