On approximate string matching

  • Esko Ukkonen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 158)


An algorithm is given for computing the edit distance as well as the corresponding sequence of editing steps (insertions, deletions, changes, transpositions of adjacent symbols) between two strings a₁a₂...aₘ and b₁b₂...bₙ. The algorithm needs time O(s·min(m,n)) and space O(s²), where s is the edit distance, that is, the minimum number of editing steps needed to transform a₁a₂...aₘ into b₁b₂...bₙ. For small s this is a considerable improvement over the best previously known algorithm, which needs time and space O(mn). If the editing sequence is not required, the space complexity of our algorithm reduces to O(s). Given a threshold value t, the algorithm can also be modified to test in time O(t·min(m,n)) and space O(t) whether the edit distance of the two strings is at most t.
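
The threshold test described in the abstract can be illustrated with a banded dynamic program. The following is a minimal sketch under stated assumptions, not the paper's own algorithm: it handles only insertions, deletions and changes (transpositions are omitted for brevity), it returns the distance only and not the editing sequence, and the names `bounded_distance` and `edit_distance` are hypothetical. It relies on the observation that an edit path of cost at most t never leaves the diagonals with |i − j| ≤ t, so only 2t+1 cells per row need to be stored. Doubling t until the test succeeds is one common way to obtain a running time that depends on s rather than on the product mn.

```python
from typing import Optional


def bounded_distance(a: str, b: str, t: int) -> Optional[int]:
    """Return the edit distance of a and b if it is <= t, else None.

    Assumption: only insertions, deletions and changes are counted;
    transpositions of adjacent symbols are NOT handled in this sketch.
    """
    if t < 0:
        return None
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter string
    m, n = len(a), len(b)
    if n - m > t:
        return None                      # length difference alone exceeds t

    width = 2 * t + 1                    # diagonals with |i - j| <= t
    inf = t + 1                          # any value > t means "too far"

    # Cell (i, j) of the usual table is stored at offset d = j - i + t.
    prev = [inf] * width
    for j in range(min(m, t) + 1):
        prev[j + t] = j                  # row 0: j insertions

    for i in range(1, n + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - t
            if j < 0 or j > m:
                continue                 # outside the table
            if j == 0:
                cur[d] = i               # column 0: i deletions (i <= t here)
                continue
            # Diagonal neighbour (i-1, j-1) has the same offset d.
            best = prev[d] + (0 if a[j - 1] == b[i - 1] else 1)
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)   # from cell (i-1, j)
            if d > 0:
                best = min(best, cur[d - 1] + 1)    # from cell (i, j-1)
            cur[d] = min(best, inf)      # cap so values never exceed t + 1
        if min(cur) > t:
            return None                  # every band cell already exceeds t
        prev = cur

    result = prev[m - n + t]             # offset of the final cell (n, m)
    return result if result <= t else None


def edit_distance(a: str, b: str) -> int:
    """Exact distance by doubling the threshold until the test succeeds."""
    t = 1
    while True:
        d = bounded_distance(a, b, t)
        if d is not None:
            return d
        t *= 2


print(edit_distance("kitten", "sitting"))  # prints 3
```

Because the thresholds grow geometrically, the work of the final successful call dominates the total, and each call touches only about 2t+1 cells per row. This sketch keeps two band-sized rows, which matches the O(t) space figure for the distance-only case; recovering the editing sequence would need additional bookkeeping that is not shown here.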


Edit Distance, Space Requirement, Editing Operation, Genetic Application, Editing Sequence





Copyright information

© Springer-Verlag Berlin Heidelberg 1983

Authors and Affiliations

  • Esko Ukkonen (1)
  1. Department of Computer Science, University of Helsinki, Helsinki 25, Finland
