Optimal algorithms for comparing trees with labeled leaves

Day, William H. E.

doi:10.1007/BF01908061

Published: December 1985

Journal of Classification, Volume 2, pages 7–28 (1985)

William H. E. Day

Abstract

Let $R_n$ denote the set of rooted trees with $n$ leaves in which the leaves are labeled by the integers in $\{1, \ldots, n\}$ and, among interior vertices, only the root may have degree two. Associated with each interior vertex $v$ in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at $v$. Cluster $\{1, \ldots, n\}$ is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any $k$ trees in $R_n$, the strict consensus tree $C(T_1, \ldots, T_k)$ is that tree in $R_n$ containing exactly those clusters common to every one of the $k$ trees. Similarity between trees $T_1$ and $T_2$ in $R_n$ is measured by the number $S(T_1, T_2)$ of nontrivial clusters in both $T_1$ and $T_2$; dissimilarity, by the number $D(T_1, T_2)$ of clusters in $T_1$ or $T_2$ but not in both. Algorithms are known to compute $C(T_1, \ldots, T_k)$ in $O(kn^2)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n^2)$ time. I propose a special representation of the clusters of any tree $T \in R_n$, one that permits testing in constant time whether a given cluster exists in $T$. I describe algorithms that exploit this representation to compute $C(T_1, \ldots, T_k)$ in $O(kn)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n)$ time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in $O(n)$ time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.
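
The definitions in the abstract can be made concrete with a small Python sketch. It is illustrative only and is not taken from the article. The nested-tuple tree encoding and the names `clusters`, `strict_consensus_clusters` and `compare` are assumptions introduced here. The abstract does not spell out the cluster representation, so `compare` shows one plausible scheme: leaves are renumbered in the left-to-right order of the first tree, which turns each of its clusters into an interval of consecutive ranks, and a hash set of intervals (expected constant-time lookup) stands in for whatever table the article uses.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is its integer label, an interior vertex
# is a tuple of two or more subtrees.
Tree = Union[int, tuple]


def clusters(tree: Tree) -> Set[FrozenSet[int]]:
    """Clusters of all interior vertices, straight from the definition."""
    found: Set[FrozenSet[int]] = set()

    def visit(node: Tree) -> FrozenSet[int]:
        if isinstance(node, int):
            return frozenset([node])
        labels = frozenset().union(*(visit(child) for child in node))
        found.add(labels)
        return labels

    visit(tree)
    return found


def strict_consensus_clusters(*trees: Tree) -> Set[FrozenSet[int]]:
    """Clusters common to every tree (the clusters of the strict consensus tree)."""
    return set.intersection(*(clusters(t) for t in trees))


def compare(t1: Tree, t2: Tree) -> Tuple[int, int]:
    """Return (S, D) for two trees on the same leaf labels, with n >= 2."""
    rank: Dict[int, int] = {}
    intervals: Set[Tuple[int, int]] = set()

    def index(node: Tree) -> Tuple[int, int]:
        # Rank leaves of t1 left to right; each cluster becomes (low, high).
        if isinstance(node, int):
            rank[node] = len(rank)
            return rank[node], rank[node]
        spans = [index(child) for child in node]
        span = (spans[0][0], spans[-1][1])
        intervals.add(span)
        return span

    index(t1)
    shared = 0    # clusters found in both trees, trivial cluster included
    interior = 0  # number of interior vertices (clusters) of t2

    def scan(node: Tree) -> Tuple[int, int, int]:
        nonlocal shared, interior
        if isinstance(node, int):
            return rank[node], rank[node], 1
        parts = [scan(child) for child in node]
        lo = min(p[0] for p in parts)
        hi = max(p[1] for p in parts)
        size = sum(p[2] for p in parts)
        interior += 1
        # A cluster of t2 is in t1 iff its ranks are consecutive and
        # the resulting interval was recorded for t1.
        if hi - lo + 1 == size and (lo, hi) in intervals:
            shared += 1
        return lo, hi, size

    scan(t2)
    similarity = shared - 1  # drop the trivial cluster
    dissimilarity = len(intervals) + interior - 2 * shared
    return similarity, dissimilarity


t1 = (((1, 2), 3), (4, 5))
t2 = (((1, 2), 4), (3, 5))
print(compare(t1, t2))  # (1, 4)
```

Here the two trees share only the nontrivial cluster {1, 2}, so S = 1, and the four clusters {1, 2, 3}, {4, 5}, {1, 2, 4} and {3, 5} each occur in one tree only, so D = 4. Each vertex is visited once per tree, which gives expected linear time under these assumptions; the recursion is kept for brevity and would need to be made iterative for very deep trees.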

Author information

Authors and Affiliations

Department of Computer Science, Memorial University of Newfoundland, A1C 5S7, St. John's, Newfoundland, Canada
William H. E. Day

Additional information

The Natural Sciences and Engineering Research Council of Canada partially supported this work with grant A-4142.

About this article

Cite this article

Day, W.H.E. Optimal algorithms for comparing trees with labeled leaves. Journal of Classification 2, 7–28 (1985). https://doi.org/10.1007/BF01908061

Issue Date: December 1985
DOI: https://doi.org/10.1007/BF01908061
