Summary
The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol in x or the substitution of a symbol of x with another symbol. String editing models a variety of problems arising in such diverse areas as text and speech processing, geology and, last but not least, molecular biology. Special cases of string editing include the longest common subsequence problem, local alignment and similarity searching in DNA and protein sequences, and approximate string searching. We describe serial and parallel algorithmic solutions for the problem and some of its basic variants.
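As an illustration of the problem statement above, the following is a minimal sketch of the classical quadratic-time dynamic programming solution, not necessarily the formulation used in the chapter. The function names and the uniform cost parameters (delete_cost, insert_cost, substitute_cost) are assumptions made for this example; the general problem allows costs that depend on the symbols involved. The second function shows how the longest common subsequence arises as a special case: when a substitution costs as much as a deletion plus an insertion, the edit distance d satisfies d = |x| + |y| - 2 * LCS(x, y).

```python
def edit_distance(x, y, delete_cost=1, insert_cost=1, substitute_cost=1):
    """Minimum total cost of transforming x into y (illustrative sketch).

    Assumes uniform, non-negative costs per operation type; these
    parameter names are hypothetical and chosen for this example only.
    """
    m, n = len(x), len(y)
    # prev[j] holds the cost of transforming the current prefix of x into y[:j].
    # Initially the prefix of x is empty, so only insertions are needed.
    prev = [j * insert_cost for j in range(n + 1)]
    for i in range(1, m + 1):
        # Transforming x[:i] into the empty string takes i deletions.
        curr = [i * delete_cost] + [0] * n
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else substitute_cost
            curr[j] = min(
                prev[j] + delete_cost,     # delete x[i-1]
                curr[j - 1] + insert_cost, # insert y[j-1]
                prev[j - 1] + sub,         # match or substitute
            )
        prev = curr
    return prev[n]


def lcs_length(x, y):
    """Length of a longest common subsequence, via the edit-distance special case.

    With unit insertions and deletions and substitution cost 2, a substitution
    is never cheaper than a deletion followed by an insertion, so the distance
    equals len(x) + len(y) - 2 * LCS.
    """
    d = edit_distance(x, y, delete_cost=1, insert_cost=1, substitute_cost=2)
    return (len(x) + len(y) - d) // 2
```

Keeping only two rows of the table gives O(|y|) space at O(|x| |y|) time, which suffices for computing the cost but not for recovering the sequence of edit operations itself.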
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Apostolico, A. (1997). String Editing and Longest Common Subsequences. In: Rozenberg, G., Salomaa, A. (eds) Handbook of Formal Languages. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-07675-0_8
DOI: https://doi.org/10.1007/978-3-662-07675-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-08230-6
Online ISBN: 978-3-662-07675-0
eBook Packages: Springer Book Archive