Optimal algorithms for comparing trees with labeled leaves
 William H. E. Day
Abstract
Let $R_n$ denote the set of rooted trees with $n$ leaves in which: the leaves are labeled by the integers in $\{1, \ldots, n\}$; and among interior vertices only the root may have degree two. Associated with each interior vertex $v$ in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at $v$. Cluster $\{1, \ldots, n\}$ is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any $k$ trees in $R_n$, the strict consensus tree $C(T_1, \ldots, T_k)$ is that tree in $R_n$ containing exactly those clusters common to every one of the $k$ trees. Similarity between trees $T_1$ and $T_2$ in $R_n$ is measured by the number $S(T_1, T_2)$ of nontrivial clusters in both $T_1$ and $T_2$; dissimilarity, by the number $D(T_1, T_2)$ of clusters in $T_1$ or $T_2$ but not in both. Algorithms are known to compute $C(T_1, \ldots, T_k)$ in $O(kn^2)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n^2)$ time. I propose a special representation of the clusters of any tree $T \in R_n$, one that permits testing in constant time whether a given cluster exists in $T$. I describe algorithms that exploit this representation to compute $C(T_1, \ldots, T_k)$ in $O(kn)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n)$ time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in $O(n)$ time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.
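The definitions of cluster, $S$, $D$ and the strict consensus can be made concrete with a small sketch. It is not the representation proposed in the abstract: it assumes trees are written as nested Python tuples with integer leaves, stores each cluster as a `frozenset`, and relies on ordinary hash-set operations, so it does not achieve the $O(n)$ or $O(kn)$ bounds. The function names `clusters`, `similarity`, `dissimilarity` and `strict_consensus_clusters` are hypothetical, chosen only for illustration.

```python
from functools import reduce


def clusters(tree):
    """Return the clusters of a tree as a set of frozensets.

    Assumption: a tree is a nested tuple whose leaves are integers.
    Only interior vertices contribute clusters, as in the abstract.
    """
    found = set()

    def leaves_below(node):
        if isinstance(node, int):          # a leaf: no cluster is recorded
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)                   # cluster of this interior vertex
        return below

    leaves_below(tree)
    return found


def similarity(t1, t2):
    """S(T1, T2): number of nontrivial clusters present in both trees."""
    c1, c2 = clusters(t1), clusters(t2)
    n = max(len(c) for c in c1)            # size of the trivial cluster
    return sum(1 for c in c1 & c2 if len(c) < n)


def dissimilarity(t1, t2):
    """D(T1, T2): number of clusters in T1 or T2 but not in both."""
    return len(clusters(t1) ^ clusters(t2))


def strict_consensus_clusters(trees):
    """Clusters common to every tree; these define the strict consensus tree."""
    return reduce(set.intersection, (clusters(t) for t in trees))


# Two example trees on the leaf labels 1..5 (made up for illustration).
T1 = (((1, 2), 3), (4, 5))
T2 = (((1, 2), 4), (3, 5))

print(similarity(T1, T2))
print(dissimilarity(T1, T2))
print(sorted(sorted(c) for c in strict_consensus_clusters([T1, T2])))
```

In this example the two trees share only the cluster $\{1, 2\}$ besides the trivial cluster, so the strict consensus tree has a single nontrivial cluster. Building every cluster explicitly, as done here, can copy on the order of $n^2$ labels in total, which is the kind of cost the constant-time cluster test described in the abstract is meant to avoid.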
 Journal
 Journal of Classification
 Volume 2, Issue 1, pp 7–28
 Cover Date
 1985-12-01
 DOI
 10.1007/BF01908061
 Print ISSN
 0176-4268
 Online ISSN
 1432-1343
 Publisher
 Springer-Verlag
 Keywords
 Algorithm complexity
 Algorithm design
 Comparing hierarchical classifications
 Comparing phylogenetic trees
 Consensus index
 Strict consensus tree
 Authors

 William H. E. Day (1)
 Author Affiliations

 1. Department of Computer Science, Memorial University of Newfoundland, A1C 5S7, St. John's, Newfoundland, Canada