Abstract
We study the following fundamental problem in computational molecular biology: Given a set of DNA sequences representing some species and a phylogenetic tree depicting the ancestral relationship among these species, compute an optimal alignment of the sequences by means of constructing a minimum-cost evolutionary tree. The problem is an important variant of multiple sequence alignment, and is widely known as tree alignment. We design an efficient approximation algorithm with performance ratio 2 for tree alignment. The algorithm is then extended to a polynomial-time approximation scheme. The construction actually works for Steiner trees in any metric space, and thus implies a polynomial-time approximation scheme for planar Steiner trees under a given topology (with any constant degree). To our knowledge, this is the first polynomial-time approximation scheme in the fields of computational biology and Steiner trees. The approximation algorithms may be useful in evolutionary genetics practice as they can provide a good initial alignment for the iterative method in [23].
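The objective described above can be made concrete with a small sketch. It assumes unit-cost edit distance as the edge cost and a tree given as a child-list dictionary; the names `edit_distance`, `tree_cost`, and `best_lifted_labeling` are hypothetical and do not come from the paper. The abstract does not spell out the ratio-2 algorithm, so the labeling routine below is only an illustration under one stated assumption: every internal node reuses the sequence of one of its children (a "lifted" labeling), and the cheapest such labeling is found by dynamic programming over the tree.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tree_cost(children, label, root):
    """Sum of edge costs of a fully labeled tree (the quantity to minimize)."""
    total, stack = 0, [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            total += edit_distance(label[v], label[c])
            stack.append(c)
    return total


def best_lifted_labeling(children, leaf_seq, root):
    """Cheapest labeling in which each internal node copies one child's label.

    Assumption for illustration only; returns (cost, labels).
    """
    def solve(v):
        # Maps a candidate label s of v to (subtree cost, labels of the subtree).
        kids = children.get(v, [])
        if not kids:
            return {leaf_seq[v]: (0, {v: leaf_seq[v]})}
        tables = [solve(c) for c in kids]
        result = {}
        for i, t in enumerate(tables):
            for s in t:  # v takes label s from child i, so that edge costs 0
                total, labels = t[s][0], dict(t[s][1])
                for j, u in enumerate(tables):
                    if j == i:
                        continue
                    # Every other child picks its own cheapest candidate label.
                    seq, (sub, sub_labels) = min(
                        u.items(),
                        key=lambda kv: kv[1][0] + edit_distance(s, kv[0]),
                    )
                    total += sub + edit_distance(s, seq)
                    labels.update(sub_labels)
                labels[v] = s
                if s not in result or total < result[s][0]:
                    result[s] = (total, labels)
        return result

    return min(solve(root).values(), key=lambda entry: entry[0])


# Toy phylogeny: root r has internal child u (leaves a, b) and leaf c.
children = {"r": ["u", "c"], "u": ["a", "b"]}
leaf_seq = {"a": "AAA", "b": "AAT", "c": "ATT"}
cost, labels = best_lifted_labeling(children, leaf_seq, "r")
```

On this toy input the returned `cost` agrees with `tree_cost(children, labels, "r")`, which is a quick sanity check that the dynamic program and the objective measure the same thing. The restriction to child labels keeps the search space small, at the price of possibly missing cheaper labelings that use sequences not present at the leaves.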
References
[1] S. Altschul and D. Lipman, Trees, stars, and multiple sequence alignment, SIAM J. Appl. Math., 49 (1989), 197–209.
[2] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof verification and hardness of approximation problems, Proc. 33rd IEEE Symp. on Foundations of Computer Science, 1992, pp. 14–23.
[3] M. Bern and P. Plassmann, The Steiner problem with edge lengths 1 and 2, Inform. Process. Lett., 32 (1989), 171–176.
[4] H. Carrillo and D. Lipman, The multiple sequence alignment problem in biology, SIAM J. Appl. Math., 48 (1988), 1073–1082.
[5] S. C. Chan, A. K. C. Wong, and D. K. T. Chiu, A survey of multiple sequence comparison methods, Bull. Math. Biol., 54(4) (1992), 563–598.
[6] D. Z. Du, Y. Zhang, and Q. Feng, On better heuristic for Euclidean Steiner minimum trees, Proc. 32nd IEEE Symp. on Foundations of Computer Science, 1991, pp. 431–439.
[7] J. S. Farris, Methods for computing Wagner trees, Systematic Zoology, 19 (1970), 83–92.
[8] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, CA, 1979.
[9] D. Gusfield, Efficient methods for multiple sequence alignment with guaranteed error bounds, Bull. Math. Biol., 55 (1993), 141–154.
[10] J. J. Hein, A tree reconstruction method that is economical in the number of pairwise comparisons used, Mol. Biol. Evol., 6(6) (1989), 669–684.
[11] J. J. Hein, A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given, Mol. Biol. Evol., 6(6) (1989), 649–668.
[12] F. K. Hwang and D. S. Richards, Steiner tree problems, Networks, 22 (1992), 55–89.
[13] F. K. Hwang and J. F. Weng, The shortest network under a given topology, J. Algorithms, 13 (1992), 468–488.
[14] R. M. Karp, Probabilistic analysis of partitioning algorithms for the traveling salesman problem in the plane, Math. Oper. Res., 2 (1977), 209–224.
[15] R. M. Karp, Mapping the genome: some combinatorial problems arising in molecular biology, Proc. ACM Symp. on Theory of Computing, 1993, pp. 278–285.
[16] E. S. Lander, R. Langridge, and D. M. Saccocio, Mapping and interpreting biological information, Comm. ACM, 34(11) (1991), 33–39.
[17] C. H. Papadimitriou and M. Yannakakis, Optimization, approximation, and complexity classes, J. Comput. System Sci., 43 (1991), 425–440.
[18] D. Penny, Criteria for optimising phylogenetic trees and the problem of determining the root of a tree, J. Mol. Evol., 8 (1976), 95–116.
[19] D. Sankoff, Minimal mutation trees of sequences, SIAM J. Appl. Math., 28(1) (1975), 35–42.
[20] D. Sankoff and P. Rousseau, Locating the vertices of a Steiner tree in an arbitrary metric space, Math. Programming, 9 (1975), 240–246.
[21] D. Sankoff and R. Cedergren, Simultaneous comparisons of three or more sequences related by a tree, in D. Sankoff and J. Kruskal (eds.), Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, pp. 253–264, Addison-Wesley, Reading, MA, 1983.
[22] D. Sankoff and J. Kruskal (eds.), Time Warps, String Edits, and Macromolecules: the Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, 1983.
[23] D. Sankoff, R. Cedergren, and G. Lapalme, Frequency of insertion-deletion, transversion, and transition in the evolution of 5S ribosomal RNA, J. Mol. Evol., 7 (1976), 133–149.
[24] N. Saitou and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol., 4(4) (1987), 406–425.
[25] L. Wang and T. Jiang, On the complexity of multiple sequence alignment, J. Computat. Biol., 1(4) (1994), 337–348.
[26] M. S. Waterman, Sequence alignments, in M. S. Waterman (ed.), Mathematical Methods for DNA Sequences, CRC, Boca Raton, FL, 1989, pp. 53–92.
[27] M. S. Waterman and M. D. Perlwitz, Line geometries for sequence comparisons, Bull. Math. Biol., 46 (1984), 567–577.
Additional information
Communicated by R. M. Karp.
Supported in part by NSERC Operating Grant OGP0046613.
Supported in part by NSERC Operating Grant OGP0046613 and a Canadian Genome Analysis and Technology Research Grant.
Supported in part by US Department of Energy Grant DE-FG03-90ER6099.
Cite this article
Wang, L., Jiang, T. & Lawler, E.L. Approximation algorithms for tree alignment with a given phylogeny. Algorithmica 16, 302–315 (1996). https://doi.org/10.1007/BF01955679