Journal of Classification, Volume 2, Issue 1, pp 7–28

Optimal algorithms for comparing trees with labeled leaves

  • William H. E. Day

Abstract

Let $R_n$ denote the set of rooted trees with $n$ leaves in which the leaves are labeled by the integers in $\{1, \ldots, n\}$, and among interior vertices only the root may have degree two. Associated with each interior vertex $v$ in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at $v$. The cluster $\{1, \ldots, n\}$ is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any $k$ trees in $R_n$, the strict consensus tree $C(T_1, \ldots, T_k)$ is the tree in $R_n$ containing exactly those clusters common to all $k$ trees. Similarity between trees $T_1$ and $T_2$ in $R_n$ is measured by the number $S(T_1, T_2)$ of nontrivial clusters in both $T_1$ and $T_2$; dissimilarity is measured by the number $D(T_1, T_2)$ of clusters in $T_1$ or $T_2$ but not in both.

Algorithms are known that compute $C(T_1, \ldots, T_k)$ in $O(kn^2)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n^2)$ time. I propose a special representation of the clusters of any tree $T \in R_n$, one that permits testing in constant time whether a given cluster exists in $T$. I describe algorithms that exploit this representation to compute $C(T_1, \ldots, T_k)$ in $O(kn)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n)$ time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in $O(n)$ time. All these results apply equally to comparable problems involving unrooted trees with labeled leaves.
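The abstract does not spell out the representation, so the following is only a minimal sketch of one way to get constant-time cluster tests, not necessarily the paper's exact construction. It assumes trees are encoded as nested Python tuples (a leaf is an int label, an interior vertex is a tuple of at least two subtrees). It numbers the leaves of a reference tree left to right, so every cluster becomes an interval of consecutive positions, and stores each interval in a single row of an $(n+1)$-entry table (row `hi` for a leftmost child or the root, row `lo` otherwise). A cluster of a second tree is then found in the table by checking contiguity and at most two rows. The names `ClusterTable`, `matched`, `similarity_dissimilarity` and `strict_consensus_clusters` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# Assumed encoding (not from the paper): a leaf is an int label in 1..n and an
# interior vertex is a tuple of >= 2 subtrees, so no vertex has one child.
Tree = Union[int, Tuple["Tree", ...]]
Interval = Tuple[int, int]


class ClusterTable:
    """Clusters of a reference tree T, stored as intervals of leaf positions,
    with at most one interval per row so that lookup takes constant time."""

    def __init__(self, tree: Tree, n: int) -> None:
        self.n = n
        self.pos: Dict[int, int] = {}                 # leaf label -> position
        self.rows: List[Optional[Interval]] = [None] * (n + 1)
        self.nontrivial = 0                           # clusters other than {1..n}
        self._walk(tree, leftmost=True)

    def _walk(self, node: Tree, leftmost: bool) -> Interval:
        if isinstance(node, int):
            # Number leaves 1, 2, ... in left-to-right order, so every
            # cluster of T becomes a run of consecutive positions.
            self.pos[node] = len(self.pos) + 1
            return self.pos[node], self.pos[node]
        spans = [self._walk(child, i == 0) for i, child in enumerate(node)]
        lo, hi = spans[0][0], spans[-1][1]
        # Leftmost children (and the root) use row hi, other vertices row lo;
        # without one-child vertices no row receives two intervals.
        self.rows[hi if leftmost else lo] = (lo, hi)
        if hi - lo + 1 < self.n:
            self.nontrivial += 1
        return lo, hi

    def has(self, lo: int, hi: int) -> bool:
        """Constant-time test: do positions lo..hi form a cluster of T?"""
        return self.rows[lo] == (lo, hi) or self.rows[hi] == (lo, hi)

    def all_nontrivial(self) -> Set[Interval]:
        return {r for r in self.rows if r is not None and r[1] - r[0] + 1 < self.n}


def matched(table: ClusterTable, tree: Tree) -> Tuple[Set[Interval], int]:
    """Walk another tree once; return its nontrivial clusters that also occur
    in the table (as intervals) and its number of nontrivial clusters."""
    found: Set[Interval] = set()
    total = 0

    def walk(node: Tree) -> Tuple[int, int, int]:   # (min pos, max pos, size)
        nonlocal total
        if isinstance(node, int):
            p = table.pos[node]
            return p, p, 1
        parts = [walk(child) for child in node]
        lo = min(part[0] for part in parts)
        hi = max(part[1] for part in parts)
        size = sum(part[2] for part in parts)
        if size < table.n:
            total += 1
            # A cluster of T must occupy exactly the positions lo..hi.
            if hi - lo + 1 == size and table.has(lo, hi):
                found.add((lo, hi))
        return lo, hi, size

    walk(tree)
    return found, total


def similarity_dissimilarity(t1: Tree, t2: Tree, n: int) -> Tuple[int, int]:
    """S(T1, T2) and D(T1, T2); the trivial cluster is in both trees, so it
    never contributes to D."""
    table = ClusterTable(t1, n)
    found, c2 = matched(table, t2)
    s = len(found)
    return s, (table.nontrivial - s) + (c2 - s)


def strict_consensus_clusters(trees: List[Tree], n: int) -> List[List[int]]:
    """Nontrivial clusters common to every tree, i.e. those of C(T1, ..., Tk)."""
    table = ClusterTable(trees[0], n)
    common = table.all_nontrivial()
    for t in trees[1:]:
        common &= matched(table, t)[0]
    label = {p: leaf for leaf, p in table.pos.items()}
    return sorted(sorted(label[p] for p in range(lo, hi + 1)) for lo, hi in common)
```

For example, with `t1 = ((1, 2), (3, (4, 5)))`, `t2 = ((1, 2, 3), (4, 5))` and `n = 5`, `similarity_dissimilarity(t1, t2, 5)` returns `(1, 3)`: the only shared nontrivial cluster is {4, 5}, while {1, 2}, {3, 4, 5} and {1, 2, 3} each occur in only one tree. The sketch recurses for brevity, so very deep trees would need an iterative traversal, and `strict_consensus_clusters` intersects Python sets, which is linear only in expectation; a worst-case $O(kn)$ version would instead mark table rows directly. Assembling the returned clusters into the consensus tree itself is omitted.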

Keywords

Algorithm complexity · Algorithm design · Comparing hierarchical classifications · Comparing phylogenetic trees · Consensus index · Strict consensus tree

Copyright information

© Springer-Verlag New York Inc 1985

Authors and Affiliations

  • William H. E. Day, Department of Computer Science, Memorial University of Newfoundland, St. John's, Canada