Approximate multiple string search

  • Robert Muth
  • Udi Manber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1075)

Abstract

This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algorithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interesting techniques, overall the algorithm is not deep theoretically. The emphasis of this paper is on the experimental side of algorithm design. We show the importance of careful design, experimentation, and utilization of current architectures. In particular, we discuss the issues of locality and cache performance, fast hash functions, and incremental hashing techniques. We introduce the notion of two-level hashing, which utilizes cache behavior to speed up hashing, especially in cases where unsuccessful searches are not uncommon. Two-level hashing may be useful for many other applications. The end result is also interesting by itself. We show that multiple search with one error is fast enough for most text applications.
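
As a rough illustration of why a two-level arrangement can help when unsuccessful searches are common, the sketch below puts a small bit-vector filter in front of a full table. The abstract does not give the paper's data layout, so everything here is an assumption made for illustration: the class name `TwoLevelSet`, the `filter_bits` parameter, and the simple multiplicative hash are all hypothetical, and this is not presented as the authors' implementation. The idea shown is only that a small first level, which is more likely to stay in cache, can reject most absent keys before the larger second level is touched.

```python
class TwoLevelSet:
    """Hypothetical sketch: a small bit-vector filter in front of a full table."""

    def __init__(self, keys, filter_bits=1 << 16):
        # filter_bits is an assumed size; it must be a multiple of 8.
        self.filter_bits = filter_bits
        self.filter = bytearray(filter_bits // 8)  # level 1: small, likely cache-resident
        self.table = set()                         # level 2: the full table
        for key in keys:
            h = self._hash(key)
            self.filter[h >> 3] |= 1 << (h & 7)    # mark the key's bit in level 1
            self.table.add(key)

    def _hash(self, key):
        # Simple multiplicative string hash, chosen for brevity only.
        h = 0
        for byte in key.encode("utf-8"):
            h = (h * 31 + byte) & 0xFFFFFFFF
        return h % self.filter_bits

    def __contains__(self, key):
        h = self._hash(key)
        # Level 1: if the bit is clear, the key is definitely absent.
        if not (self.filter[h >> 3] >> (h & 7)) & 1:
            return False
        # Level 2: the bit was set, so confirm against the full table.
        return key in self.table


patterns = TwoLevelSet(["hash", "table", "cache"])
print("cache" in patterns)
print("stack" in patterns)
```

A clear bit in the first level proves absence, so the second level is consulted only for keys that are present or that collide with a present key in the filter. Whether this pays off in practice depends on the table sizes and the cache hierarchy, which is the kind of question the paper addresses experimentally.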

Keywords

  • Hash Function
  • Hash Table
  • String Match
  • Cache Performance
  • Good Running Time
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Robert Muth (1)
  • Udi Manber (1)
  1. Department of Computer Science, University of Arizona, Tucson
