Fast and practical approximate string matching

  • Ricardo A. Baeza-Yates
  • Chris H. Perleberg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 644)

Abstract

We present new algorithms for approximate string matching based on simple but efficient ideas. First, we present an algorithm for string matching with mismatches, based on arithmetic operations, that runs in linear worst-case time in most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors, based on partitioning the pattern, that requires linear expected time for typical inputs.
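The abstract only names the idea behind the mismatch algorithm: counting with arithmetic instead of comparing characters one alignment at a time. A minimal sketch of one counting scheme in that spirit, assuming this is close to what the paper does: for each text character we add one to the counter of every alignment in which it matches a pattern character, then report alignments whose count reaches m - k. The function name `k_mismatch_search` and the circular array of m counters are our own illustrative choices, not taken from the paper.

```python
def k_mismatch_search(text: str, pattern: str, k: int) -> list[int]:
    """Start positions s where text[s:s+m] differs from pattern in at most k places."""
    m = len(pattern)
    if m == 0:
        return []
    # Preprocessing: for every symbol, the pattern offsets where it occurs.
    offsets: dict[str, list[int]] = {}
    for j, c in enumerate(pattern):
        offsets.setdefault(c, []).append(j)
    # count[s % m] tallies matching characters for the alignment starting at s.
    # At most m alignments are "open" at any moment, so m counters suffice.
    count = [0] * m
    hits = []
    for i, c in enumerate(text):
        # text[i] == pattern[j] is a match for the alignment starting at i - j.
        for j in offsets.get(c, ()):
            if i >= j:
                count[(i - j) % m] += 1
        s = i - m + 1  # alignment s has now seen all m of its characters
        if s >= 0:
            if count[s % m] >= m - k:
                hits.append(s)
            count[s % m] = 0  # recycle the slot for alignment s + m
    return hits
```

For example, `k_mismatch_search("abcabdabc", "abc", 1)` returns `[0, 3, 6]`. The inner loop runs once per occurrence of the current text symbol in the pattern, so the total work is O(n + number of matching character pairs); this stays linear whenever no symbol repeats often in the pattern, which is one plausible reading of "most practical cases".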
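The second algorithm is described only as "partitioning the pattern". A common form of that idea, assumed here rather than taken from the paper: split the pattern into k + 1 pieces; by the pigeonhole principle, any occurrence with at most k errors (insertions, deletions, substitutions) contains at least one piece unchanged. Exact searches for the pieces yield candidate windows, which are then verified with the standard dynamic-programming recurrence for approximate matching. The names `approx_search` and `verify_window`, the equal-length split, and the repeated `str.find` calls (instead of a single multi-pattern scan) are illustrative choices, and this sketch makes no claim to reproduce the paper's expected-time bound.

```python
def verify_window(text: str, lo: int, hi: int, pattern: str, k: int) -> list[int]:
    """End positions e in (lo, hi] such that some text[a:e] with a >= lo
    is within edit distance k of pattern (one DP column kept in memory)."""
    m = len(pattern)
    col = list(range(m + 1))  # col[i]: best distance of pattern[:i] ending here
    ends = []
    for t in range(lo, hi):
        c = text[t]
        diag = col[0]  # previous column's col[i - 1]
        col[0] = 0     # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(diag + (pattern[i - 1] != c),  # match or substitution
                         old + 1,                        # extra text character
                         col[i - 1] + 1)                 # missing text character
            diag = old
        if col[m] <= k:
            ends.append(t + 1)
    return ends


def approx_search(text: str, pattern: str, k: int) -> list[int]:
    """End positions of all substrings within edit distance k of pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    if k >= m:  # pieces would be empty, so verify the whole text
        return verify_window(text, 0, n, pattern, k)
    parts = k + 1
    base = m // parts
    windows = []
    for r in range(parts):
        off = r * base
        piece = pattern[off:m if r == parts - 1 else off + base]
        p = text.find(piece)
        while p != -1:
            # An occurrence containing this piece unchanged starts no earlier
            # than p - off - k and ends no later than p - off + m + k.
            windows.append((max(0, p - off - k), min(n, p - off + m + k)))
            p = text.find(piece, p + 1)
    # Merge overlapping windows so each text position is verified at most once.
    merged: list[list[int]] = []
    for lo, hi in sorted(windows):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    ends = []
    for lo, hi in merged:
        ends.extend(verify_window(text, lo, hi, pattern, k))
    return ends
```

For example, `approx_search("abcdefgh", "abxd", 1)` returns `[4]`, the end of `abcd`. Only the text inside candidate windows pays for the O(m) dynamic-programming step, which is where a filter of this kind gets its speed when pieces rarely occur by chance.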

Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Ricardo A. Baeza-Yates (1)
  • Chris H. Perleberg (1)

  1. Depto. de Ciencias de la Computación (Department of Computer Science), Universidad de Chile, Santiago, Chile