Abstract
We define a new problem in multiple sequence alignment, called maximum weight trace. The problem formalizes in a natural way the common practice of merging pairwise alignments to form multiple sequence alignments, and contains a version of the minimum sum of pairs alignment problem as a special case.
Informally, the input is a set of pairs of matched characters from the sequences; each pair has an associated weight. The output is a subset of the pairs of maximum total weight that satisfies the following property: there is a multiple alignment that places each pair of characters selected by the subset together in the same column. A set of pairs with this property is called a trace. Intuitively, a trace of maximum weight specifies a multiple alignment that agrees as much as possible with the character matches of the input.
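The trace property can be made concrete with a small feasibility check. The sketch below is an illustration under stated assumptions, not code from the paper: the function name `is_trace`, the 0-based `(sequence, position)` encoding of characters, and the use of union-find plus a topological sort are all choices made here. It merges matched characters into candidate columns and then tests whether those columns can be ordered consistently with the left-to-right order of every sequence.

```python
from collections import defaultdict


def is_trace(lengths, pairs):
    """Return True if some alignment puts every matched pair in one column.

    lengths: list with the length of each sequence (hypothetical encoding).
    pairs:   iterable of ((seq_a, pos_a), (seq_b, pos_b)), 0-based positions.
    """
    # Union-find over characters: matched characters must share a column.
    parent = {(s, p): (s, p) for s, n in enumerate(lengths) for p in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    # Consecutive characters of a sequence must fall in strictly
    # increasing columns, which gives arcs between candidate columns.
    succ = defaultdict(set)
    indeg = defaultdict(int)
    for s, n in enumerate(lengths):
        for p in range(n - 1):
            u, v = find((s, p)), find((s, p + 1))
            if u == v:
                return False  # two characters of one sequence in one column
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1

    # Kahn's algorithm: the columns can be ordered iff the arcs are acyclic.
    nodes = {find(x) for x in list(parent)}
    ready = [u for u in nodes if indeg[u] == 0]
    ordered = 0
    while ready:
        u = ready.pop()
        ordered += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return ordered == len(nodes)


# Three sequences of length 2. Each pair is consistent on its own,
# but together the three pairs force a cyclic column order.
cyclic = [((0, 0), (1, 1)), ((1, 0), (2, 1)), ((2, 0), (0, 1))]
print(is_trace([2, 2, 2], cyclic[:2]))  # True
print(is_trace([2, 2, 2], cyclic))      # False
```

The second call shows why merging pairwise alignments is not trivial: matches that are compatible two sequences at a time can still conflict when all sequences are considered together.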
We develop a branch and bound algorithm for maximum weight trace. Though the problem is NP-complete, an implementation of the algorithm shows we can solve instances on as many as 6 sequences of length 250 in a few minutes. These are among the largest instances that have been solved to optimality to date for any formulation of multiple sequence alignment.
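To illustrate the general idea of branch and bound on this problem, here is a deliberately naive sketch. It is not the algorithm of the paper, whose bounds and branching rules are not described in this abstract. The names `max_weight_trace` and `is_trace`, the input encoding, and the bound used (the sum of the weights of all undecided pairs) are assumptions made for this example, and the search is only practical for very small inputs.

```python
from collections import defaultdict


def is_trace(lengths, pairs):
    """Feasibility check: can all pairs share columns in one alignment?"""
    parent = {(s, p): (s, p) for s, n in enumerate(lengths) for p in range(n)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    succ, indeg = defaultdict(set), defaultdict(int)
    for s, n in enumerate(lengths):
        for p in range(n - 1):
            u, v = find((s, p)), find((s, p + 1))
            if u == v:
                return False
            if v not in succ[u]:
                succ[u].add(v)
                indeg[v] += 1
    nodes = {find(x) for x in list(parent)}
    ready = [u for u in nodes if indeg[u] == 0]
    ordered = 0
    while ready:
        u = ready.pop()
        ordered += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return ordered == len(nodes)


def max_weight_trace(lengths, weighted_pairs):
    """Exhaustive include/exclude search with a simple upper bound.

    weighted_pairs: list of (char_a, char_b, weight) triples, where each
    character is a 0-based (sequence, position) tuple.
    """
    items = sorted(weighted_pairs, key=lambda e: -e[2])
    # suffix[i] bounds the weight that items[i:] could still contribute.
    suffix = [0] * (len(items) + 1)
    for i in range(len(items) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + max(items[i][2], 0)
    best = {"weight": 0, "pairs": []}

    def search(i, chosen, weight):
        if weight > best["weight"]:
            best["weight"], best["pairs"] = weight, list(chosen)
        if i == len(items) or weight + suffix[i] <= best["weight"]:
            return  # bound: the remaining pairs cannot beat the incumbent
        a, b, w = items[i]
        chosen.append((a, b))
        if is_trace(lengths, chosen):  # branch 1: keep the pair if feasible
            search(i + 1, chosen, weight + w)
        chosen.pop()
        search(i + 1, chosen, weight)  # branch 2: discard the pair

    search(0, [], 0)
    return best["weight"], best["pairs"]


# Three mutually conflicting matches on three sequences of length 2.
matches = [((0, 0), (1, 1), 3), ((1, 0), (2, 1), 2), ((2, 0), (0, 1), 1)]
weight, chosen = max_weight_trace([2, 2, 2], matches)
print(weight, chosen)
```

Pruning an infeasible inclusion is safe because every subset of a trace is itself a trace, so no feasible solution lies below a branch that already fails the check.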
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kececioglu, J. (1993). The maximum weight trace problem in multiple sequence alignment. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1993. Lecture Notes in Computer Science, vol 684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0029800
DOI: https://doi.org/10.1007/BFb0029800
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56764-6
Online ISBN: 978-3-540-47732-7
eBook Packages: Springer Book Archive