Fast approximate matching using suffix trees

  • Archie L. Cobbs
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 937)


Let T be a text of length n and P a pattern of length m, both strings over a fixed finite alphabet Σ. We wish to find all approximate occurrences of P in T having weighted edit distance at most k from P: this is the approximate substring matching problem. We focus on the case in which T is fixed and preprocessed in linear time, while P and k vary over consecutive searches. We give an O(mq + t_occ) time and O(q) space algorithm, where q ≤ n depends on the problem instance, and t_occ is the size of the output. The running time is proportional to the amount of matching, in the worst case as fast as standard dynamic programming. The algorithm uses the suffix tree representation of the text. The best previous algorithm requires O(mq log q + t_occ) time and O(mq) space.
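
For orientation, the "standard dynamic programming" baseline mentioned in the abstract can be sketched as follows. This is not the suffix-tree algorithm of the paper; it is the classical column-by-column recurrence for approximate substring matching, which scans the whole text in O(mn) time. The function name `approx_occurrences` is hypothetical, and unit costs for insertion, deletion, and substitution are assumed here in place of general weighted costs.

```python
def approx_occurrences(text, pattern, k):
    """Return 1-based end positions j in `text` such that some substring
    ending at j is within unit-cost edit distance k of `pattern`.

    Assumption: every edit operation costs 1 (the paper allows weights).
    """
    m = len(pattern)
    # col[i] holds the minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))  # before any text is read: i deletions
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value of row i-1 in the previous column
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # value of row i in the previous column
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitute
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_occurrences("abcdefg", "cde", 1))
```

Only one column of the table is kept at a time, so the sketch uses O(m) working space. It reports end positions only; recovering the start of each occurrence would need the full table or a second pass.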


Keywords: Hash Table, Edit Distance, Suffix Tree, Edit Operation, Branch Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Archie L. Cobbs (1)
  1. Computer Science Division, University of California Berkeley, Berkeley
