A Metric for Phylogenetic Trees Based on Matching

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6674)

Abstract

Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems ranging from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, similarity measures based on maximum agreement are too strict, while measures based on the elimination of rogue taxa work poorly when the proportion of rogue taxa is significant. Distance measures based on edit distances under simple tree operations (such as nearest-neighbor interchange or subtree pruning and regrafting) are NP-hard to compute. The widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, and it also lacks robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance.
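The sensitivity of the Robinson-Foulds distance to a single moved leaf can be reproduced with a small sketch. It is only an illustration under stated assumptions: trees are written as nested tuples, the distance is taken as the unnormalized count of non-trivial bipartitions present in exactly one tree, and the helper names (`leaf_set`, `splits`, `rf_distance`, `caterpillar`) are hypothetical, not taken from the paper.

```python
def leaf_set(tree):
    """Collect the leaf labels of a tree written as nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted.

    Each bipartition is stored as the side that does NOT contain the
    smallest leaf label, so equal bipartitions compare equal.
    """
    everything = leaf_set(tree)
    ref = min(everything)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = everything - below if ref in below else below
        # Keep only bipartitions with at least two leaves on each side.
        if 2 <= len(side) <= len(everything) - 2:
            found.add(side)
        return below

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Number of bipartitions found in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


def caterpillar(order):
    """Build a caterpillar (ladder) tree whose leaves appear in this order."""
    tree = (order[0], order[1])
    for leaf in order[2:]:
        tree = (tree, leaf)
    return tree


n = 8
original = caterpillar(list(range(1, n + 1)))
# Detach leaf 1 from one end of the caterpillar and reattach it at the other.
moved = caterpillar(list(range(2, n + 1)) + [1])
print(rf_distance(original, moved), "out of a maximum of", 2 * (n - 3))
```

Two binary trees on n leaves each have n - 3 non-trivial bipartitions, so 2(n - 3) is the largest value this count can take. In this caterpillar example the two trees share no bipartition, so moving one leaf already reaches that maximum.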

In this paper, we introduce an entirely new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.
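The abstract does not define the measure, so the following is only an illustrative guess at what a matching-based tree distance could look like; it should not be read as the paper's definition. It assumes that each non-trivial bipartition of one tree is paired with one of the other tree, that the cost of a pair is the number of leaves that must switch sides to turn one bipartition into the other, and that the distance is the cheapest total over all pairings. All names are hypothetical, and the pairing is found by brute force purely to keep the sketch dependency-free; a polynomial-time assignment solver would replace it in practice.

```python
from itertools import permutations


def leaf_set(tree):
    """Collect the leaf labels of a tree written as nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions, each stored as the side without the smallest leaf."""
    everything = leaf_set(tree)
    ref = min(everything)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = everything - below if ref in below else below
        if 2 <= len(side) <= len(everything) - 2:
            found.add(side)
        return below

    walk(tree)
    return found


def split_cost(a, b, n):
    """Leaves that must change sides to turn bipartition a into bipartition b.

    Comparing a with the complement of b costs n - |a ^ b|, so take the smaller.
    """
    diff = len(a ^ b)
    return min(diff, n - diff)


def matching_distance(t1, t2):
    """Cheapest one-to-one pairing of bipartitions (assumed definition).

    Brute force over all pairings: only suitable for very small trees.
    """
    n = len(leaf_set(t1))
    s1, s2 = list(splits(t1)), list(splits(t2))
    if len(s1) != len(s2):
        raise ValueError("this sketch only handles trees with equally many splits")
    return min(
        sum(split_cost(a, b, n) for a, b in zip(s1, pairing))
        for pairing in permutations(s2)
    )


# A 7-leaf caterpillar tree, and the same tree with leaf 1 moved to the far end.
original = ((((((1, 2), 3), 4), 5), 6), 7)
moved = ((((((2, 3), 4), 5), 6), 7), 1)
print(matching_distance(original, moved))
```

Under these assumptions the cost of a pairing grows with how far each bipartition has to change rather than only whether it changed, which is one way a distance could avoid jumping to its maximum after a single leaf move.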

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, Y., Rajan, V., Moret, B.M.E. (2011). A Metric for Phylogenetic Trees Based on Matching. In: Chen, J., Wang, J., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2011. Lecture Notes in Computer Science, vol 6674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21260-4_21

  • DOI: https://doi.org/10.1007/978-3-642-21260-4_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21259-8

  • Online ISBN: 978-3-642-21260-4

  • eBook Packages: Computer Science, Computer Science (R0)