Journal of Classification

Volume 2, Issue 1, pp 7–28

Optimal algorithms for comparing trees with labeled leaves

Authors

  • William H. E. Day
    • Department of Computer Science, Memorial University of Newfoundland

DOI: 10.1007/BF01908061

Cite this article as:
Day, W.H.E. Journal of Classification (1985) 2: 7. doi:10.1007/BF01908061

Abstract

Let $R_n$ denote the set of rooted trees with $n$ leaves in which: the leaves are labeled by the integers in $\{1, \ldots, n\}$; and among interior vertices only the root may have degree two. Associated with each interior vertex $v$ in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at $v$. Cluster $\{1, \ldots, n\}$ is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any $k$ trees in $R_n$, the strict consensus tree $C(T_1, \ldots, T_k)$ is that tree in $R_n$ containing exactly those clusters common to every one of the $k$ trees. Similarity between trees $T_1$ and $T_2$ in $R_n$ is measured by the number $S(T_1, T_2)$ of nontrivial clusters in both $T_1$ and $T_2$; dissimilarity, by the number $D(T_1, T_2)$ of clusters in $T_1$ or $T_2$ but not in both. Algorithms are known to compute $C(T_1, \ldots, T_k)$ in $O(kn^2)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n^2)$ time. I propose a special representation of the clusters of any tree $T \in R_n$, one that permits testing in constant time whether a given cluster exists in $T$. I describe algorithms that exploit this representation to compute $C(T_1, \ldots, T_k)$ in $O(kn)$ time, and $S(T_1, T_2)$ and $D(T_1, T_2)$ in $O(n)$ time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in $O(n)$ time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.
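The abstract does not spell out the cluster representation. The Python sketch below shows one way to get constant-time cluster tests; it is an illustrative reconstruction, not the paper's code. Leaves of the first tree are ranked left to right, so every cluster of that tree becomes an interval (lo, hi) of ranks. Each nontrivial interval is stored in row hi (if its vertex is a first child) or row lo (otherwise) of a table with n + 1 rows. Because every interior vertex has at least two children, no row is used twice. A cluster of another tree is then a cluster of the first tree exactly when its ranks are contiguous and its interval sits in row lo or row hi. The tuple encoding of trees, the placement rule, and the names cluster_table, matched_rows, similarity_dissimilarity and strict_consensus_clusters are all assumptions made for this illustration. D is computed as c1 + c2 - 2S, where c1 and c2 count the nontrivial clusters of each tree; the trivial cluster is common to both trees and so does not affect D.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

# Assumed (hypothetical) tree encoding: a leaf is an int label in 1..n and an
# interior vertex is a tuple of its children, each interior vertex having at
# least two children (so only the root can have degree two).
Tree = Union[int, tuple]
Interval = Tuple[int, int]


def cluster_table(t: Tree, n: int) -> Tuple[Dict[int, int], List[Optional[Interval]]]:
    """Rank the leaves of t left to right; each nontrivial cluster becomes an
    interval (lo, hi) of ranks, stored in row hi or row lo of an (n+1)-row table."""
    rank: Dict[int, int] = {}
    table: List[Optional[Interval]] = [None] * (n + 1)

    def visit(v: Tree, first_child: bool, is_root: bool) -> Interval:
        if isinstance(v, int):
            rank[v] = len(rank) + 1  # leaves are met in left-to-right order
            return rank[v], rank[v]
        spans = [visit(c, i == 0, False) for i, c in enumerate(v)]
        lo, hi = spans[0][0], spans[-1][1]
        if not is_root:  # the root's cluster {1, ..., n} is trivial
            # A first child goes in row hi, any other child in row lo; with at
            # least two children per interior vertex no row is used twice.
            table[hi if first_child else lo] = (lo, hi)
        return lo, hi

    visit(t, False, True)
    return rank, table


def matched_rows(t: Tree, rank: Dict[int, int],
                 table: List[Optional[Interval]]) -> Tuple[int, List[int]]:
    """Return (number of nontrivial clusters of t, table rows of the clusters
    of t that also appear in the table). Each membership test is O(1)."""
    total = 0
    rows: List[int] = []

    def visit(v: Tree, is_root: bool) -> Tuple[int, int, int]:
        nonlocal total
        if isinstance(v, int):
            return rank[v], rank[v], 1  # (min rank, max rank, leaf count)
        lo, hi, size = len(rank) + 1, 0, 0
        for c in v:
            clo, chi, csize = visit(c, False)
            lo, hi, size = min(lo, clo), max(hi, chi), size + csize
        if not is_root:
            total += 1
            # The ranks are contiguous iff size == hi - lo + 1; the cluster is
            # in the first tree iff that interval is stored in row lo or row hi.
            if hi - lo + 1 == size:
                if table[lo] == (lo, hi):
                    rows.append(lo)
                elif table[hi] == (lo, hi):
                    rows.append(hi)
        return lo, hi, size

    visit(t, True)
    return total, rows


def similarity_dissimilarity(t1: Tree, t2: Tree, n: int) -> Tuple[int, int]:
    """Return (S(t1, t2), D(t1, t2)) using one table and two linear passes."""
    rank, table = cluster_table(t1, n)
    c1 = sum(row is not None for row in table)
    c2, rows = matched_rows(t2, rank, table)
    s = len(rows)
    return s, c1 + c2 - 2 * s


def strict_consensus_clusters(trees: Sequence[Tree], n: int) -> List[frozenset]:
    """Return the nontrivial clusters common to every tree in `trees`."""
    rank, table = cluster_table(trees[0], n)
    hits = [0] * (n + 1)  # hits[r]: how many later trees contain row r's cluster
    for t in trees[1:]:
        for r in matched_rows(t, rank, table)[1]:
            hits[r] += 1
    label_at = {r: x for x, r in rank.items()}
    result = []
    for r in range(1, n + 1):
        if table[r] is not None and hits[r] == len(trees) - 1:
            lo, hi = table[r]
            # Converting intervals back to label sets is only for readable output.
            result.append(frozenset(label_at[i] for i in range(lo, hi + 1)))
    return result
```

For example, with T1 = ((1, 2), (3, (4, 5))) and T2 = ((1, 2), ((3, 4), 5)), the clusters {1, 2} and {3, 4, 5} are shared and {4, 5} and {3, 4} are not, so similarity_dissimilarity(T1, T2, 5) returns (2, 2). The marking passes are linear per tree, giving O(kn) for k trees. The final conversion of intervals back to label sets is not covered by that bound, and neither is assembling the consensus tree itself from its clusters, which this sketch omits. The recursion is also only suitable for modest tree depths.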

Keywords

Algorithm complexity; Algorithm design; Comparing hierarchical classifications; Comparing phylogenetic trees; Consensus index; Strict consensus tree

Copyright information

© Springer-Verlag New York Inc 1985