Optimal algorithms for comparing trees with labeled leaves
DOI: 10.1007/BF01908061

Cite this article as: Day, W.H.E. Journal of Classification (1985) 2: 7. doi:10.1007/BF01908061
Abstract Let $R_n$ denote the set of rooted trees with $n$ leaves in which the leaves are labeled by the integers in $\{1, \ldots, n\}$ and, among interior vertices, only the root may have degree two. Associated with each interior vertex $v$ in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at $v$. The cluster $\{1, \ldots, n\}$ is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any $k$ trees in $R_n$, the strict consensus tree $C(T_1, \ldots, T_k)$ is that tree in $R_n$ containing exactly those clusters common to every one of the $k$ trees. Similarity between trees $T_1$ and $T_2$ in $R_n$ is measured by the number $S(T_1, T_2)$ of nontrivial clusters in both $T_1$ and $T_2$; dissimilarity, by the number $D(T_1, T_2)$ of clusters in $T_1$ or $T_2$ but not in both. Algorithms are known to compute $C(T_1, \ldots, T_k)$ in $O(kn^2)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n^2)$ time. I propose a special representation of the clusters of any tree $T \in R_n$, one that permits testing in constant time whether a given cluster exists in $T$. I describe algorithms that exploit this representation to compute $C(T_1, \ldots, T_k)$ in $O(kn)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n)$ time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in $O(n)$ time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.
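The definitions of $S$ and $D$ can be illustrated with a short sketch. The abstract does not spell out the cluster representation, so everything below is an assumption made for illustration: trees are written as nested Python tuples with integer leaves, the leaves are renumbered in the left-to-right order of $T_1$ so that each cluster of $T_1$ becomes an interval of consecutive ranks, and the intervals are kept in a hash set (expected, not worst-case, constant-time lookup). The function names `interval_table` and `similarity_and_dissimilarity` are hypothetical.

```python
def interval_table(tree):
    """Rank leaves left to right; return (rank, intervals) for `tree`.

    Assumption: a tree is a nested tuple whose leaves are ints.
    Each interior vertex contributes one (lo, hi) interval of leaf ranks.
    """
    rank = {}          # leaf label -> left-to-right position in `tree`
    intervals = set()  # one (lo, hi) pair per interior vertex

    def visit(node):
        if isinstance(node, int):
            rank[node] = len(rank)
            return rank[node], rank[node]
        spans = [visit(child) for child in node]
        lo, hi = spans[0][0], spans[-1][1]
        intervals.add((lo, hi))
        return lo, hi

    visit(tree)
    return rank, intervals


def similarity_and_dissimilarity(t1, t2):
    """Return (S, D) as defined in the abstract, for trees on the same leaves."""
    rank, intervals = interval_table(t1)
    shared = 0      # clusters of t2 that also occur in t1 (trivial one included)
    clusters2 = 0   # number of clusters (interior vertices) in t2

    def visit(node):
        nonlocal shared, clusters2
        if isinstance(node, int):
            r = rank[node]
            return r, r, 1
        lo, hi, size = len(rank), -1, 0
        for child in node:
            c_lo, c_hi, c_size = visit(child)
            lo, hi, size = min(lo, c_lo), max(hi, c_hi), size + c_size
        clusters2 += 1
        # A cluster of t2 is an interval of t1-ranks only if it has no gaps,
        # and it is a cluster of t1 only if that interval is in the table.
        if hi - lo + 1 == size and (lo, hi) in intervals:
            shared += 1
        return lo, hi, size

    visit(t2)
    s = shared - 1  # the trivial cluster is always shared and is excluded from S
    d = (len(intervals) - shared) + (clusters2 - shared)
    return s, d


T1 = ((1, 2), (3, (4, 5)))
T2 = ((1, 2), ((3, 4), 5))
print(similarity_and_dissimilarity(T1, T2))  # (2, 2)
```

Here $T_1$ and $T_2$ share the nontrivial clusters $\{1, 2\}$ and $\{3, 4, 5\}$, while $\{4, 5\}$ and $\{3, 4\}$ each occur in only one tree. Each tree is traversed once, so the sketch does work proportional to the number of vertices; the recursion is kept for brevity and would need to be made iterative for very deep trees.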

Keywords Algorithm complexity; Algorithm design; Comparing hierarchical classifications; Comparing phylogenetic trees; Consensus index; Strict consensus tree

The Natural Sciences and Engineering Research Council of Canada partially supported this work with grant A-4142.

© Springer-Verlag New York Inc 1985

Authors and Affiliations 1. Department of Computer Science, Memorial University of Newfoundland, St. John's, Canada