On approximate string matching
An algorithm is given for computing the edit distance as well as the corresponding sequence of editing steps (insertions, deletions, changes, transpositions of adjacent symbols) between two strings a₁a₂...aₘ and b₁b₂...bₙ. The algorithm needs time O(s·min(m,n)) and space O(s²) where s is the edit distance, that is, the minimum number of editing steps needed to transform a₁a₂...aₘ to b₁b₂...bₙ. For small s this is a considerable improvement over the best previously known algorithm that needs time and space O(mn). If the editing sequence is not required, the space complexity of our algorithm reduces to O(s). Given a threshold value t, the algorithm can also be modified to test in time O(t·min(m,n)) and space O(t) whether the edit distance of the two strings is at most t.
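The threshold test can be illustrated with a minimal sketch. It is not the paper's algorithm: it is the familiar banded dynamic-programming idea, which fills in only the 2t+1 diagonals around the main diagonal of the distance table. It assumes insertions, deletions and changes only (transpositions are left out for brevity), and the function name `within_edit_distance` is hypothetical. Rows run over the shorter string and each row touches at most 2t+1 cells, so the sketch uses O(t·min(m,n)) time and O(t) space.

```python
def within_edit_distance(a: str, b: str, t: int) -> bool:
    """Return True if the edit distance of a and b is at most t.

    Assumption: only insertions, deletions and changes are counted;
    transpositions of adjacent symbols are NOT handled in this sketch.
    """
    if t < 0:
        return False
    # Let a be the shorter string so that rows run over min(m, n).
    if len(a) > len(b):
        a, b = b, a
    m, n = len(a), len(b)
    # The length difference is a lower bound on the edit distance.
    if n - m > t:
        return False

    inf = t + 1            # any value above t means "too far"
    width = 2 * t + 1      # number of diagonals kept per row

    # Band slot k of row i stores D[i][j] with j = i + k - t.
    prev = [inf] * width
    for k in range(width):
        j = k - t
        if 0 <= j <= n:
            prev[k] = j    # row 0: D[0][j] = j (j insertions)

    for i in range(1, m + 1):
        cur = [inf] * width
        for k in range(width):
            j = i + k - t
            if j < 0 or j > n:
                continue
            if j == 0:
                cur[k] = min(i, inf)   # column 0: i deletions
                continue
            # Change or match: D[i-1][j-1] lies on the same diagonal.
            best = prev[k] + (a[i - 1] != b[j - 1])
            # Deletion: D[i-1][j] lies one diagonal to the right.
            if k + 1 < width:
                best = min(best, prev[k + 1] + 1)
            # Insertion: D[i][j-1] lies one diagonal to the left.
            if k - 1 >= 0:
                best = min(best, cur[k - 1] + 1)
            cur[k] = min(best, inf)    # cap values to keep them small
        prev = cur

    # D[m][n] lies on diagonal n - m, which is band slot n - m + t.
    return prev[n - m + t] <= t
```

Cells outside the band can safely be treated as "too far" because any cell on diagonal j - i has distance at least |j - i|, which exceeds t there. For example, `within_edit_distance("kitten", "sitting", 3)` holds while the same call with threshold 2 does not.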