Sequences II, pp 225–244

Efficient Algorithms for Sequence Analysis

  • Conference paper

Abstract

We consider new algorithms for the solution of many dynamic programming recurrences for sequence comparison and for RNA secondary structure prediction. The techniques upon which the algorithms are based effectively exploit the physical constraints of the problem to derive more efficient methods for sequence analysis.

Keywords

  • Cost Function
  • Input Sequence
  • Edit Distance
  • Edit Operation
  • Longest Common Subsequence

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Work partially supported by NSF Grant CCR-9014605.

Copyright information

© 1993 Springer-Verlag New York, Inc.

About this paper

Cite this paper

Eppstein, D., Galil, Z., Giancarlo, R., Italiano, G.F. (1993). Efficient Algorithms for Sequence Analysis. In: Capocelli, R., De Santis, A., Vaccaro, U. (eds) Sequences II. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-9323-8_17

  • DOI: https://doi.org/10.1007/978-1-4613-9323-8_17

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4613-9325-2

  • Online ISBN: 978-1-4613-9323-8

  • eBook Packages: Springer Book Archive