On approximate string matching

  • Conference paper
Foundations of Computation Theory (FCT 1983)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 158)

Abstract

An algorithm is given for computing the edit distance, as well as the corresponding sequence of editing steps (insertions, deletions, changes, transpositions of adjacent symbols), between two strings a₁a₂...aₘ and b₁b₂...bₙ. The algorithm needs time O(s·min(m,n)) and space O(s²), where s is the edit distance, that is, the minimum number of editing steps needed to transform a₁a₂...aₘ into b₁b₂...bₙ. For small s this is a considerable improvement over the best previously known algorithm, which needs time and space O(mn). If the editing sequence is not required, the space complexity of our algorithm reduces to O(s). Given a threshold value t, the algorithm can also be modified to test in time O(t·min(m,n)) and space O(t) whether the edit distance of the two strings is at most t.
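The threshold test described in the abstract can be illustrated with a short Python sketch. It is not the paper's own pseudocode: it is a band-limited dynamic programme, one common way to reach an O(t·min(m,n)) bound, relying on the fact that a table cell (i, j) with |i - j| > t cannot hold a value of t or less. The function names `bounded_edit_distance` and `edit_distance` are hypothetical, all editing steps are assumed to cost 1, and transpositions are handled in the restricted form where a swapped pair is not edited again. The sketch returns only the distance, not the editing sequence.

```python
from typing import Optional


def bounded_edit_distance(a: str, b: str, t: int) -> Optional[int]:
    """Return the edit distance of a and b if it is at most t, else None.

    Illustrative sketch: only cells with |i - j| <= t are evaluated, so the
    work is O(t * min(m, n)) and each stored row has O(t) entries.
    """
    if t < 0:
        return None
    if len(a) > len(b):
        a, b = b, a  # unit costs are symmetric; iterate over the shorter string
    m, n = len(a), len(b)
    if n - m > t:
        return None  # the length difference alone already exceeds t
    inf = t + 1  # stands for "more than t"
    prev2: dict = {}  # row i - 2, needed for transpositions
    prev = {j: j for j in range(min(n, t) + 1)}  # row 0: D[0][j] = j
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - t), min(n, i + t) + 1):
            if j == 0:
                best = i  # delete the first i symbols of a
            else:
                best = min(
                    prev.get(j, inf) + 1,  # deletion of a[i-1]
                    cur.get(j - 1, inf) + 1,  # insertion of b[j-1]
                    prev.get(j - 1, inf) + (a[i - 1] != b[j - 1]),  # match or change
                )
                if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                    best = min(best, prev2.get(j - 2, inf) + 1)  # transposition
            cur[j] = min(best, inf)
        prev2, prev = prev, cur
    d = prev.get(n, inf)
    return d if d <= t else None


def edit_distance(a: str, b: str) -> int:
    """Find the distance s by doubling the threshold until the test succeeds."""
    t = 1
    while True:
        d = bounded_edit_distance(a, b, t)
        if d is not None:
            return d
        t *= 2
```

With these definitions, `edit_distance("kitten", "sitting")` returns 3 and `edit_distance("ab", "ba")` returns 1 (a single transposition). Because the threshold doubles, the cost of the failed attempts forms a geometric series, so the total work stays within a constant factor of the final, successful test.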

Editor information

Marek Karpinski

Copyright information

© 1983 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ukkonen, E. (1983). On approximate string matching. In: Karpinski, M. (eds) Foundations of Computation Theory. FCT 1983. Lecture Notes in Computer Science, vol 158. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-12689-9_129

  • DOI: https://doi.org/10.1007/3-540-12689-9_129

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-12689-8

  • Online ISBN: 978-3-540-38682-7

  • eBook Packages: Springer Book Archive
