String Editing and Longest Common Subsequences

Handbook of Formal Languages

Summary

The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol into x, or the substitution of a symbol of x with another symbol. String editing models a variety of problems arising in such diverse areas as text and speech processing, geology and, last but not least, molecular biology. Special cases of string editing include the longest common subsequence problem, local alignment and similarity searching in DNA and protein sequences, and approximate string searching. We describe serial and parallel algorithmic solutions for the problem and some of its basic variants.
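
As a rough illustration of the problem stated above, the following Python sketch computes the minimum edit cost with the textbook quadratic-time dynamic program over prefixes of x and y. It is not taken from the chapter: the function names `edit_cost` and `lcs_length`, the cost-function parameters, and the default unit costs are all assumptions made for this example. The second function illustrates one standard way to view the longest common subsequence as a special case: when a substitution of distinct symbols costs as much as a deletion plus an insertion, the minimum cost d satisfies d = |x| + |y| - 2·LCS(x, y).

```python
from typing import Callable, Sequence


def edit_cost(
    x: Sequence[str],
    y: Sequence[str],
    delete: Callable[[str], float] = lambda a: 1,
    insert: Callable[[str], float] = lambda b: 1,
    substitute: Callable[[str, str], float] = lambda a, b: 0 if a == b else 1,
) -> float:
    """Minimum total cost of edit operations transforming x into y.

    The cost functions are hypothetical parameters for this sketch;
    the defaults give the unit-cost (Levenshtein) distance.
    """
    n = len(y)
    # Row 0: turning the empty prefix of x into y[:j] takes j insertions.
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + insert(y[j - 1])

    for i in range(1, len(x) + 1):
        # Column 0: turning x[:i] into the empty string takes i deletions.
        curr = [prev[0] + delete(x[i - 1])] + [0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + delete(x[i - 1]),                    # delete x[i-1]
                curr[j - 1] + insert(y[j - 1]),                # insert y[j-1]
                prev[j - 1] + substitute(x[i - 1], y[j - 1]),  # substitute or match
            )
        prev = curr
    return prev[n]


def lcs_length(x: Sequence[str], y: Sequence[str]) -> int:
    """Length of a longest common subsequence, derived from the edit cost.

    A mismatching substitution is priced at 2 (one deletion plus one
    insertion), so only matches, insertions, and deletions matter.
    """
    d = edit_cost(x, y, substitute=lambda a, b: 0 if a == b else 2)
    return (len(x) + len(y) - d) // 2
```

With the default unit costs, `edit_cost("kitten", "sitting")` evaluates to 3, and `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4. The sketch keeps only two rows of the table, so it uses space linear in |y| but returns the cost alone, not the sequence of edit operations.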

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Apostolico, A. (1997). String Editing and Longest Common Subsequences. In: Rozenberg, G., Salomaa, A. (eds) Handbook of Formal Languages. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-07675-0_8

  • DOI: https://doi.org/10.1007/978-3-662-07675-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-08230-6

  • Online ISBN: 978-3-662-07675-0

  • eBook Packages: Springer Book Archive
