Abstract
Molecular biology has raised many interesting and deep mathematical and computational questions. For example, given the DNA or protein sequences of several organisms, can we determine how closely they are related by computing an optimal alignment of the sequences or, in particular, a longest common subsequence? Is it possible to reconstruct the evolutionary process for a set of extant species from their DNA sequences? Given many small overlapping fragments of a DNA molecule, how do we recover the DNA sequence? Will the shortest common superstring of these fragments give a good estimate? How many fragments suffice to guarantee that the reconstructed sequence is within 99% of the true DNA sequence? An organism can evolve by chromosome inversions, which raises the question of how to transform one sequence into another with the smallest number of reversals.
Rather than an extensive literature survey, the purpose of this article is to introduce in depth several prominent optimization problems arising in molecular biology. We will emphasize recent developments and provide proof sketches for the results whenever possible.
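To make the first question in the abstract concrete, here is a minimal sketch of the textbook dynamic program for a longest common subsequence of two sequences. It is not taken from the chapter; the function name `lcs` and the sample strings are illustrative assumptions, and it covers only the pairwise case, not the multi-sequence problem.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (pairwise case only)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only for illustration.
    print(lcs("AGGTAB", "GXTXAYB"))
```

The table uses time and space proportional to the product of the two sequence lengths, which is workable for a pair of sequences but does not scale directly to many sequences at once.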
ArticleNote
Supported in part by NSERC Operating Grant OGP0046613.
Supported in part by NSERC Operating Grant OGP0046506.
Copyright information
© 1994 Kluwer Academic Publishers
About this chapter
Cite this chapter
Jiang, T., Li, M. (1994). Optimization Problems in Molecular Biology. In: Du, DZ., Sun, J. (eds) Advances in Optimization and Approximation. Nonconvex Optimization and Its Applications, vol 1. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-3629-7_10
DOI: https://doi.org/10.1007/978-1-4613-3629-7_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-3631-0
Online ISBN: 978-1-4613-3629-7
eBook Packages: Springer Book Archive