Approximate multiple string search

Muth, Robert; Manber, Udi

doi:10.1007/3-540-61258-0_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1075)

Included in the conference series: Annual Symposium on Combinatorial Pattern Matching

Abstract

This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algorithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interesting techniques, overall the algorithm is not deep theoretically. The emphasis of this paper is on the experimental side of algorithm design. We show the importance of careful design, experimentation, and utilization of current architectures. In particular, we discuss the issues of locality and cache performance, fast hash functions, and incremental hashing techniques. We introduce the notion of two-level hashing, which utilizes cache behavior to speed up hashing, especially in cases where unsuccessful searches are not uncommon. Two-level hashing may be useful for many other applications. The end result is also interesting by itself. We show that multiple search with one error is fast enough for most text applications.
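The abstract names the ingredients without giving details, so the following is only a minimal sketch of how one-error multi-pattern lookup and a two-level hash might fit together, not the authors' algorithm. It assumes the text is split into whitespace-separated words, generates candidates by hashing each string together with its one-character deletions, and verifies every candidate explicitly. The first level is a small bit filter that rejects most unsuccessful probes before the larger table is touched. All names (`OneErrorIndex`, `small_hash`, `FILTER_BITS`) and the filter size are assumptions made for illustration.

```python
FILTER_BITS = 1 << 16  # assumed first-level size; must be a power of two


def deletions(s):
    """All strings obtained by deleting exactly one character from s."""
    return {s[:i] + s[i + 1:] for i in range(len(s))}


def within_one_error(a, b):
    """True if a and b differ by at most one insertion, deletion, or substitution."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # at most one substitution at position i
    return a[i:] == b[i + 1:]          # b has one extra character at position i


def small_hash(s):
    """Cheap multiplicative hash reduced to an index into the bit filter."""
    h = 0
    for byte in s.encode("utf-8"):
        h = (h * 31 + byte) & (FILTER_BITS - 1)
    return h


class OneErrorIndex:
    def __init__(self, patterns):
        self.filter = bytearray(FILTER_BITS // 8)  # level 1: compact bit filter
        self.table = {}                            # level 2: key -> patterns
        for p in patterns:
            for key in deletions(p) | {p}:
                h = small_hash(key)
                self.filter[h >> 3] |= 1 << (h & 7)
                self.table.setdefault(key, set()).add(p)

    def lookup(self, word):
        """Return the patterns within one error of word."""
        hits = set()
        for key in deletions(word) | {word}:
            h = small_hash(key)
            if not self.filter[h >> 3] & (1 << (h & 7)):
                continue  # level-1 miss: skip the larger table entirely
            for p in self.table.get(key, ()):
                # Shared keys only yield candidates ("ab" and "ba" share "a"),
                # so each candidate is verified.
                if within_one_error(p, word):
                    hits.add(p)
        return hits


def search(text, patterns):
    """Report (word, matching patterns) for each word of text with a match."""
    index = OneErrorIndex(patterns)
    results = []
    for word in text.split():
        hits = index.lookup(word)
        if hits:
            results.append((word, sorted(hits)))
    return results


print(search("the quick brwn fox jumpd", ["brown", "jumped", "fax"]))
# [('brwn', ['brown']), ('fox', ['fax']), ('jumpd', ['jumped'])]
```

In Python the filter does not reproduce the cache effects the abstract emphasizes; the sketch only shows the control flow, where a miss in the small structure avoids the more expensive second-level probe.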

Supported in part by NSF grant CCR-9301129, and by the Advanced Research Projects Agency under contract number DABT63-93-C-0052.

The information contained in this paper does not necessarily reflect the position or the policy of the U.S. Government or other sponsors of this research. No official endorsement should be inferred.

Author information

Authors and Affiliations

Department of Computer Science, University of Arizona, 85721, Tucson, AZ
Robert Muth & Udi Manber

Editor information

Dan Hirschberg, Gene Myers

About this paper

Cite this paper

Muth, R., Manber, U. (1996). Approximate multiple string search. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_7

DOI: https://doi.org/10.1007/3-540-61258-0_7
Published: 01 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive
