# Optimal algorithms for comparing trees with labeled leaves


## Abstract

Let $R_n$ denote the set of rooted trees with $n$ leaves in which the leaves are labeled by the integers in $\{1, \ldots, n\}$ and, among interior vertices, only the root may have degree two. Associated with each interior vertex $v$ in such a tree is the subset, or *cluster*, of leaf labels in the subtree rooted at $v$. The cluster $\{1, \ldots, n\}$ is called *trivial*. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any $k$ trees in $R_n$, the *strict consensus tree* $C(T_1, \ldots, T_k)$ is the tree in $R_n$ containing exactly those clusters common to every one of the $k$ trees. Similarity between trees $T_1$ and $T_2$ in $R_n$ is measured by the number $S(T_1, T_2)$ of nontrivial clusters in both $T_1$ and $T_2$; dissimilarity, by the number $D(T_1, T_2)$ of clusters in $T_1$ or $T_2$ but not in both. Algorithms are known to compute $C(T_1, \ldots, T_k)$ in $O(kn^2)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n^2)$ time. I propose a special representation of the clusters of any tree $T \in R_n$, one that permits testing in constant time whether a given cluster exists in $T$. I describe algorithms that exploit this representation to compute $C(T_1, \ldots, T_k)$ in $O(kn)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n)$ time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in $O(n)$ time. All these results apply equally to comparable problems involving unrooted trees with labeled leaves.
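The abstract does not spell out the representation itself. As an illustration only, the Python sketch below shows one way to obtain a constant-time cluster test: renumber the leaves of a reference tree $T_1$ in depth-first order so that each of its clusters becomes an interval, store each interval in a table row indexed by one of its endpoints, and then check every cluster of another tree with at most two table probes. The nested-tuple tree encoding, the row-placement rule, and all names (`ClusterTable`, `similarity_dissimilarity`, `strict_consensus_clusters`) are assumptions of this sketch, not the paper's published method.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding: a leaf is an int label; an interior vertex is a tuple
# of its children. Every interior vertex is assumed to have >= 2 children,
# as in R_n, so distinct vertices of one tree have distinct clusters.
Tree = Union[int, tuple]


class ClusterTable:
    """Interval table of the clusters of a reference tree T1 (illustrative sketch).

    Leaves are renumbered 0..n-1 in depth-first, left-to-right order, so each
    cluster of T1 is an interval [lo, hi]. An interval is stored in row hi when
    its vertex is the leftmost child of its parent, otherwise in row lo; under
    the >= 2 children assumption no row is claimed twice.
    """

    def __init__(self, t1: Tree) -> None:
        self.rank: Dict[int, int] = {}                   # original label -> new label
        self.rows: List[Optional[Tuple[int, int]]] = []  # one row per leaf
        self._build(t1, leftmost=False)                  # the root is nobody's child

    def _build(self, v: Tree, leftmost: bool) -> Tuple[int, int]:
        if isinstance(v, int):
            idx = len(self.rank)
            self.rank[v] = idx
            self.rows.append(None)
            return idx, idx
        spans = [self._build(c, i == 0) for i, c in enumerate(v)]
        lo, hi = spans[0][0], spans[-1][1]
        self.rows[hi if leftmost else lo] = (lo, hi)
        return lo, hi

    def row_of(self, lo: int, hi: int) -> Optional[int]:
        """Row holding cluster [lo, hi], or None if T1 lacks it: two O(1) probes."""
        if self.rows[lo] == (lo, hi):
            return lo
        if self.rows[hi] == (lo, hi):
            return hi
        return None

    def shared_rows(self, t: Tree) -> List[int]:
        """Rows of T1's clusters that are also clusters of t, in O(n) total work."""
        found: List[int] = []

        def walk(v: Tree) -> Tuple[int, int, int]:  # (min new label, max new label, leaf count)
            if isinstance(v, int):
                r = self.rank[v]
                return r, r, 1
            parts = [walk(c) for c in v]
            lo = min(p[0] for p in parts)
            hi = max(p[1] for p in parts)
            size = sum(p[2] for p in parts)
            # The cluster can only be in T1 if its leaves fill [lo, hi] exactly.
            if size == hi - lo + 1:
                row = self.row_of(lo, hi)
                if row is not None:
                    found.append(row)
            return lo, hi, size

        walk(t)
        return found


def interior_count(t: Tree) -> int:
    return 0 if isinstance(t, int) else 1 + sum(interior_count(c) for c in t)


def similarity_dissimilarity(t1: Tree, t2: Tree) -> Tuple[int, int]:
    """Return (S, D); the trivial cluster is shared, so D = c1 + c2 - 2S."""
    table = ClusterTable(t1)
    trivial = (0, len(table.rank) - 1)
    s = sum(1 for r in table.shared_rows(t2) if table.rows[r] != trivial)
    c1 = interior_count(t1) - 1  # nontrivial clusters: interior vertices minus root
    c2 = interior_count(t2) - 1
    return s, c1 + c2 - 2 * s


def strict_consensus_clusters(trees: List[Tree]) -> List[frozenset]:
    """Clusters of C(T1, ..., Tk): rows of T1 matched by every other tree."""
    table = ClusterTable(trees[0])
    hits = [0] * len(table.rows)
    for t in trees[1:]:
        for r in table.shared_rows(t):
            hits[r] += 1  # each tree matches a given row at most once
    label = {new: old for old, new in table.rank.items()}
    return [frozenset(label[i] for i in range(iv[0], iv[1] + 1))
            for r, iv in enumerate(table.rows)
            if iv is not None and hits[r] == len(trees) - 1]
```

For example, with `t1 = ((1, 2), (3, (4, 5)))` and `t2 = ((1, 2), ((3, 4), 5))`, the shared nontrivial clusters are {1, 2} and {3, 4, 5}, so `similarity_dissimilarity(t1, t2)` returns `(2, 2)`. Two caveats on this sketch: it uses recursion, so very deep trees may hit Python's recursion limit, and materializing each consensus cluster as a `frozenset` can cost $O(n^2)$ in total, whereas the $O(kn)$ bound in the abstract refers to constructing the consensus tree itself.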

## Keywords

Algorithm complexity, Algorithm design, Comparing hierarchical classifications, Comparing phylogenetic trees, Consensus index, Strict consensus tree

