Optimization Problems in Molecular Biology

Advances in Optimization and Approximation

Part of the book series: Nonconvex Optimization and Its Applications (NOIA, volume 1)

Abstract

Molecular biology has raised many interesting and deep mathematical and computational questions. For example, given the DNA or protein sequences of several organisms, can we determine how closely they are related by computing an optimal alignment or, in particular, a longest common subsequence of the sequences? Is it possible to reconstruct the evolutionary process for a set of extant species from their DNA sequences? Given many small overlapping fragments of a DNA molecule, how do we recover the DNA sequence? Will the shortest common superstring of these fragments give a good estimate? How many fragments suffice to guarantee that the reconstructed sequence is within 99% of the true DNA sequence? An organism can evolve by chromosome inversions, which raises the question of how to transform one sequence into another with the smallest number of reversals.

Rather than an extensive literature survey, the purpose of this article is to introduce in depth several prominent optimization problems arising in molecular biology. We will emphasize recent developments and provide proof sketches for the results whenever possible.

Supported in part by NSERC Operating Grant OGP0046613.

Supported in part by NSERC Operating Grant OGP0046506.

Copyright information

© 1994 Kluwer Academic Publishers

About this chapter

Cite this chapter

Jiang, T., Li, M. (1994). Optimization Problems in Molecular Biology. In: Du, DZ., Sun, J. (eds) Advances in Optimization and Approximation. Nonconvex Optimization and Its Applications, vol 1. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-3629-7_10

  • DOI: https://doi.org/10.1007/978-1-4613-3629-7_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-3631-0

  • Online ISBN: 978-1-4613-3629-7

  • eBook Packages: Springer Book Archive
