Let R_{n} denote the set of rooted trees with n leaves in which the leaves are labeled by the integers in {1, ..., n} and, among interior vertices, only the root may have degree two. Associated with each interior vertex v in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at v. The cluster {1, ..., n} is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any k trees in R_{n}, the strict consensus tree C(T_{1}, ..., T_{k}) is that tree in R_{n} containing exactly those clusters common to every one of the k trees. Similarity between trees T_{1} and T_{2} in R_{n} is measured by the number S(T_{1}, T_{2}) of nontrivial clusters in both T_{1} and T_{2}; dissimilarity, by the number D(T_{1}, T_{2}) of clusters in T_{1} or T_{2} but not in both. Algorithms are known to compute C(T_{1}, ..., T_{k}) in O(kn^{2}) time, and S(T_{1}, T_{2}) and D(T_{1}, T_{2}) in O(n^{2}) time. I propose a special representation of the clusters of any tree T ∈ R_{n}, one that permits testing in constant time whether a given cluster exists in T. I describe algorithms that exploit this representation to compute C(T_{1}, ..., T_{k}) in O(kn) time, and S(T_{1}, T_{2}) and D(T_{1}, T_{2}) in O(n) time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in O(n) time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.
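
The definitions of cluster, S, D and the strict consensus can be made concrete with a short sketch. It is not the constant-time cluster representation proposed here, and it does not attain the O(kn) or O(n) bounds: it simply stores each cluster as a hashed Python set, so building the cluster sets alone can take quadratic time. The nested-tuple tree encoding and the function names (`clusters`, `similarity`, `dissimilarity`, `strict_consensus_clusters`) are assumptions made for illustration only.

```python
# Illustrative sketch only: clusters are stored as frozensets, which is NOT the
# special representation of the paper and does not give linear running time.
# Assumed encoding: a leaf is an int label, an interior vertex is a tuple of children.

def clusters(tree):
    """Return the set of clusters (one frozenset per interior vertex)."""
    found = set()

    def visit(node):
        if isinstance(node, int):          # leaf: contributes its own label
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)                  # cluster of this interior vertex
        return leaves

    visit(tree)
    return found


def similarity(t1, t2):
    """S(T1, T2): number of nontrivial clusters present in both trees."""
    c1, c2 = clusters(t1), clusters(t2)
    trivial = frozenset().union(*c1)       # the root cluster {1, ..., n}
    return len((c1 & c2) - {trivial})


def dissimilarity(t1, t2):
    """D(T1, T2): number of clusters in T1 or T2 but not in both."""
    return len(clusters(t1) ^ clusters(t2))


def strict_consensus_clusters(*trees):
    """Clusters common to every tree; they define the strict consensus tree."""
    return set.intersection(*(clusters(t) for t in trees))


# Two example trees on the leaf labels {1, ..., 5}.
t1 = (((1, 2), 3), (4, 5))
t2 = (((1, 3), 2), (4, 5))

print(similarity(t1, t2))      # 2: the shared nontrivial clusters {1,2,3} and {4,5}
print(dissimilarity(t1, t2))   # 2: {1,2} occurs only in t1, {1,3} only in t2
```

For these two trees the strict consensus contains the clusters {1, 2, 3}, {4, 5} and the trivial cluster, that is, the tree in which leaves 1, 2 and 3 hang directly from one interior vertex.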

Keywords

Algorithm complexity; Algorithm design; Comparing hierarchical classifications; Comparing phylogenetic trees; Consensus index; Strict consensus tree