Abstract
An algorithm is given for computing the edit distance as well as the corresponding sequence of editing steps (insertions, deletions, changes, transpositions of adjacent symbols) between two strings $a_1 a_2 \ldots a_m$ and $b_1 b_2 \ldots b_n$. The algorithm needs time $O(s \cdot \min(m,n))$ and space $O(s^2)$, where $s$ is the edit distance, that is, the minimum number of editing steps needed to transform $a_1 a_2 \ldots a_m$ into $b_1 b_2 \ldots b_n$. For small $s$ this is a considerable improvement over the best previously known algorithm, which needs time and space $O(mn)$. If the editing sequence is not required, the space complexity of our algorithm reduces to $O(s)$. Given a threshold value $t$, the algorithm can also be modified to test in time $O(t \cdot \min(m,n))$ and space $O(t)$ whether the edit distance of the two strings is at most $t$.
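The threshold test mentioned at the end of the abstract can be illustrated with a banded dynamic-programming sketch. This is not necessarily the paper's own algorithm: it is a minimal illustration under the assumption that only insertions, deletions and changes are counted (transpositions are left out for brevity), and the function name `within_edit_distance` is hypothetical. It relies on the observation that a cell $(i, j)$ with $|i - j| > t$ can never lie on an editing sequence of cost at most $t$, so only a band of $2t + 1$ diagonals needs to be evaluated.

```python
def within_edit_distance(a: str, b: str, t: int) -> bool:
    """Return True if the edit distance of a and b is at most t.

    Assumption: only insertions, deletions and changes are counted;
    transpositions of adjacent symbols are not handled in this sketch.
    """
    if t < 0:
        return False
    if len(a) > len(b):
        a, b = b, a                  # iterate rows over the shorter string
    m, n = len(a), len(b)
    if n - m > t:
        return False                 # at least n - m insertions are needed

    inf = t + 1                      # any value above t means "too far"
    width = 2 * t + 1                # number of diagonals kept in the band

    # prev[d] holds D[i-1][j] for column j = (i - 1) + d - t.
    prev = [inf] * width
    for d in range(width):
        j = d - t
        if 0 <= j <= n:
            prev[d] = j              # row 0: j insertions (j <= t here)

    for i in range(1, m + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - t
            if j < 0 or j > n:
                continue             # outside the matrix
            if j == 0:
                cur[d] = min(i, inf) # column 0: i deletions
                continue
            # Same diagonal: change (cost 1) or match (cost 0).
            best = prev[d] + (a[i - 1] != b[j - 1])
            # Cell above, D[i-1][j], sits one slot to the right in prev.
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)
            # Cell to the left, D[i][j-1], sits one slot to the left in cur.
            if d - 1 >= 0:
                best = min(best, cur[d - 1] + 1)
            cur[d] = min(best, inf)
        if min(cur) > t:
            return False             # every cell in the band exceeds t
        prev = cur

    return prev[n - m + t] <= t


print(within_edit_distance("kitten", "sitting", 3))  # True
print(within_edit_distance("kitten", "sitting", 2))  # False
```

The sketch keeps two arrays of length $2t + 1$ and visits at most $2t + 1$ cells per row of the shorter string, which matches the $O(t \cdot \min(m,n))$ time and $O(t)$ space bounds stated above.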
Copyright information
© 1983 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ukkonen, E. (1983). On approximate string matching. In: Karpinski, M. (eds) Foundations of Computation Theory. FCT 1983. Lecture Notes in Computer Science, vol 158. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-12689-9_129
DOI: https://doi.org/10.1007/3-540-12689-9_129
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-12689-8
Online ISBN: 978-3-540-38682-7
eBook Packages: Springer Book Archive