# Approximation algorithms for multiple sequence alignment under a fixed evolutionary tree

## Abstract

We consider the problem of aligning sequences related by a given evolutionary tree: given a fixed tree whose leaves are labeled with sequences, find ancestral sequences to label the internal nodes so as to minimize the total cost of all the edges in the tree. The cost of an edge is the edit distance between the sequences labeling its endpoints. In this paper, we consider the case in which the given tree is a regular *d*-ary tree for some fixed *d*, and provide a $(d+1)/(d-1)$-approximation algorithm for this problem that runs in time $O(d(2kn)^{d} + n^{2}k^{2d})$, where *k* is the number of leaves in the tree and *n* is the maximum length of any of the sequences labeling the leaves.
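To make the objective concrete, the following is a minimal sketch of how the total cost of one candidate labeling could be evaluated. It only scores a given labeling and is not the approximation algorithm itself. The unit-cost (Levenshtein) edit distance, the function names `edit_distance` and `total_tree_cost`, and the toy tree are all assumptions chosen for illustration; the abstract does not fix a particular cost model.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (an assumed cost model)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def total_tree_cost(edges, labels):
    """Sum of edit distances over all tree edges for a given labeling."""
    return sum(edit_distance(labels[u], labels[v]) for u, v in edges)


# Hypothetical instance: one internal node "r" with three leaf children.
edges = [("r", "x"), ("r", "y"), ("r", "z")]
labels = {"x": "ACGT", "y": "ACGA", "z": "AGGT", "r": "ACGT"}
cost = total_tree_cost(edges, labels)
print(cost)
```

In this toy instance the leaf labels are fixed and only the label of `r` is free; the problem asks for the choice of internal labels that makes this sum as small as possible.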

We also consider a new bottleneck objective for labeling the internal nodes. In this version, we wish to find a labeling of the internal nodes that minimizes the maximum cost of any edge in the tree. For this problem we provide a simple (2*δ* + 1)-approximation algorithm, where *δ* is the depth of the given undirected tree, defined as the maximum over all internal nodes of the number of edges from the internal node to a closest leaf. For phylogenetic trees on *n* nodes that have no internal nodes of degree two, *δ* ≤ lg *n*.
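The depth *δ* and the bottleneck objective can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's algorithm: the adjacency-list representation, the names `tree_depth` and `bottleneck_cost`, and the example tree are hypothetical, and the edge cost is passed in as a function so that any edit-distance implementation can be used.

```python
from collections import deque


def tree_depth(adj):
    """delta: max over internal nodes of the edge count to a closest leaf."""
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    dist = {v: 0 for v in leaves}
    queue = deque(leaves)
    # Multi-source BFS from all leaves gives each node's distance
    # to its closest leaf.
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max((dist[v] for v in adj if len(adj[v]) > 1), default=0)


def bottleneck_cost(edges, labels, cost):
    """Maximum cost of any single edge under the given labeling."""
    return max(cost(labels[u], labels[v]) for u, v in edges)


# Hypothetical unrooted tree: center "b" joined to "a", "c", "e",
# each of which carries two leaves. No internal node has degree two.
tree_edges = [("b", "a"), ("b", "c"), ("b", "e"),
              ("a", "l1"), ("a", "l2"), ("c", "l3"),
              ("c", "l4"), ("e", "l5"), ("e", "l6")]
adj = {}
for u, v in tree_edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

delta = tree_depth(adj)
print(delta, 2 * delta + 1)  # depth and the resulting guarantee factor
```

For this example tree the closest leaf to `b` is two edges away, so the guarantee factor 2*δ* + 1 evaluates to 5.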

## Keywords

Approximation Algorithm, Internal Node, Edit Distance, Performance Guarantee, Ancestral Sequence
