Skip to main content
Log in

Optimal algorithms for comparing trees with labeled leaves

  • Authors Of Articles
  • Published:
Journal of Classification Aims and scope Submit manuscript

Abstract

LetR n denote the set of rooted trees withn leaves in which: the leaves are labeled by the integers in {1, ...,n}; and among interior vertices only the root may have degree two. Associated with each interior vertexv in such a tree is the subset, orcluster, of leaf labels in the subtree rooted atv. Cluster {1, ...,n} is calledtrivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For anyk trees inR n , thestrict consensus tree C(T 1, ...,T k ) is that tree inR n containing exactly those clusters common to every one of thek trees. Similarity between treesT 1 andT 2 inR n is measured by the numberS(T 1,T 2) of nontrivial clusters in bothT 1 andT 2; dissimilarity, by the numberD(T 1,T 2) of clusters inT 1 orT 2 but not in both. Algorithms are known to computeC(T 1, ...,T k ) inO(kn 2) time, andS(T 1,T 2) andD(T 1,T 2) inO(n 2) time. I propose a special representation of the clusters of any treeT R n , one that permits testing in constant time whether a given cluster exists inT. I describe algorithms that exploit this representation to computeC(T 1, ...,T k ) inO(kn) time, andS(T 1,T 2) andD(T 1,T 2) inO(n) time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed inO(n) time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • ADAMS, E. N., III (1972), “Consensus Techniques and the Comparison of Taxonomic Trees,”Systematic Zoology, 21, 390–397.

    Google Scholar 

  • AHO, A. V., HOPCROFT, J. E., and ULLMAN, J. D. (1974),The Design and Analysis of Computer Algorithms, Reading, Massachusetts: Addison-Wesley.

    Google Scholar 

  • BOURQUE, M. (1978), “Arbres de Steiner et Réseaux dont Certains Sommets sont à Localisation Variable,” Ph.D. dissertation, Université de Montréal, Quebec, Canada.

    Google Scholar 

  • BROWN, E. K., and DAY, W. H. E. (1984), “A Computationally Efficient Approximation to the Nearest Neighbor Interchange Metric,”Journal of Classification, 1, 93–124.

    Google Scholar 

  • CAVALLI-SFORZA, L. L., and EDWARDS, A. W. F. (1967), “Phylogenetic Analysis Models and Estimation Procedures,”American Journal of Human Genetics, 19, 233–257.

    Google Scholar 

  • COLLESS, D. H. (1980), “Congruence between Morphometric and Allozyme Data forMenidia Species: A Reappraisal,”Systematic Zoology, 29, 288–299.

    Google Scholar 

  • DAY, W. H. E. (1983), “The Role of Complexity in Comparing Classifications,”Mathematical Biosciences, 66, 97–114.

    Google Scholar 

  • HARARY, F. (1969),Graph Theory, Reading, Massachusetts: Addison-Wesley.

    Google Scholar 

  • HENDY, M. D., LITTLE, C. H. C., and PENNY, D. (1984), “Comparing Trees with Pendant Vertices Labelled,”SIAM Journal on Applied Mathematics Theory, 44, 1054–1065.

    Google Scholar 

  • MARCZEWSKI, E., and STEINHAUS, H. (1958), “On a Certain Distance of Sets and the Corresponding Distance of Functions,”Colloquium Mathematicum, 6, 319–327.

    Google Scholar 

  • MARGUSH, T. (1982), “Distances Between Trees,”Discrete Applied Mathematics, 4, 281–290.

    Google Scholar 

  • MARGUSH, T., and McMORRIS, F.R. (1981), “Consensus n-Trees,”Bulletin of Mathematical Biology, 43, 239–244.

    Google Scholar 

  • McMORRIS, F.R., MERONK, D.B., and NEUMANN, D.A. (1983), “A View of some Consensus Methods for Trees,” inNumerical Taxonomy: Proceedings of a NATO Advanced Study Institute, ed. J. Felsenstein, Berlin: Springer-Verlag, 122–126.

    Google Scholar 

  • McMORRIS, F.R., and NEUMANN, D. (1983), “Consensus Functions Defined on Trees,”Mathematical Social Sciences, 4, 131–136.

    Google Scholar 

  • MICKEVICH, M.F. (1978), “Taxonomic Congruence,”Systematic Zoology, 27, 143–158.

    Google Scholar 

  • NELSON, G. (1979), “Cladistic Analysis and Synthesis: Principles and Definitions, with a Historical Note on Adanson'sFamilles des Plantes (1763–1764),”Systematic Zoology, 28, 1–21.

    Google Scholar 

  • NELSON, G., and PLATNICK, N. (1981),Systematics and Biogeography: Cladistics and Vicariance, New York: Columbia University Press.

    Google Scholar 

  • NEUMANN, D.A. (1983), “Faithful Consensus Methods for n-Trees,”Mathematical Biosciences, 63, 271–287.

    Google Scholar 

  • RESTLE, F. (1959), “A Metric and an Ordering on Sets,”Psychometrika, 24, 207–220.

    Google Scholar 

  • ROBINSON, D.F. (1971), “Comparison of Labeled Trees with Valency Three,”Journal of Combinatorial Theory, 11, 105–119.

    Google Scholar 

  • ROBINSON, D.F., and FOULDS, L.R. (1981), “Comparison of Phylogenetic Trees,”Mathematical Biosciences, 53, 131–147.

    Google Scholar 

  • ROHLF, F.J. (1982), “Consensus Indices for Comparing Classifications,”Mathematical Biosciences, 59, 131–144.

    Google Scholar 

  • ROHLF, F.J. (1983), “Numbering Binary Trees with Labeled Terminal Vertices,”Bulletin of Mathematical Biology, 45, 33–40.

    Google Scholar 

  • SCHUH, R.T., and FARRIS, J.S. (1981), “Methods for Investigating Taxonomic Congruence and Their Application to the Leptopodomorpha,”Systematic Zoology, 30, 331–351.

    Google Scholar 

  • SHAO, K. (1983), “Consensus Methods in Numerical Taxonomy,” Ph.D. dissertation, State University of New York, Stony Brook, New York.

    Google Scholar 

  • SOKAL, R.R., and ROHLF, F.J. (1981), “Taxonomic Congruence in the Leptopodomorpha Re-examined,”Systematic Zoology, 30, 309–325.

    Google Scholar 

  • STANDISH, T.A. (1980),Data Structure Techniques, Reading, Massachusetts: Addison-Wesley.

    Google Scholar 

  • STINEBRICKNER, R. (1984), “s-Consensus Trees and Indices,”Bulletin of Mathematical Biology, 46, 923–935.

    Google Scholar 

  • TATENO, Y., NEI, M., and TAJIMA, F. (1982), “Accuracy of Estimated Phylogenetic Trees from Molecular Data I. Distantly Related Species,”Journal of Molecular Evolution, 18, 387–404.

    Google Scholar 

  • WATERMAN, M.S., and SMITH, T.F. (1978), “On the Similarity of Dendrograms,”Journal of Theoretical Biology, 73, 789–800.

    Google Scholar 

  • WEIDE, B. (1977), “A Survey of Analysis Techniques for Discrete Algorithms,”Computing Surveys, 9, 291–313.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

The Natural Sciences and Engineering Research Council of Canada partially supported this work with grant A-4142.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Day, W.H.E. Optimal algorithms for comparing trees with labeled leaves. Journal of Classification 2, 7–28 (1985). https://doi.org/10.1007/BF01908061

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01908061

Keywords

Navigation