Tuning String Matching for Huge Pattern Sets

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)

Abstract

We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods that apply q-grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g. different trie implementations of the Aho-Corasick algorithm. Our algorithms proved to be substantially faster than earlier solutions for sets of 1,000–100,000 patterns. The gain is due to the improved filtering efficiency provided by q-grams.
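
The abstract does not spell out the three algorithms, so the following is only a minimal sketch of the general filter-then-verify idea behind q-gram filtering, not the authors' method. It assumes every pattern has at least q characters; the names `build_qgram_index` and `search` are hypothetical, and a plain dictionary lookup stands in for the bit-parallel filtering the paper mentions.

```python
from collections import defaultdict


def build_qgram_index(patterns, q):
    """Map the first q characters of each pattern to the patterns sharing them.

    Assumption for this sketch: every pattern has length >= q.
    """
    index = defaultdict(list)
    for p in patterns:
        if len(p) < q:
            raise ValueError(f"pattern {p!r} is shorter than q={q}")
        index[p[:q]].append(p)
    return index


def search(text, patterns, q=3):
    """Return (position, pattern) pairs for every exact occurrence in text."""
    index = build_qgram_index(patterns, q)
    matches = []
    for i in range(len(text) - q + 1):
        # Filtering phase: one lookup of the q-gram starting at position i.
        candidates = index.get(text[i:i + q])
        if not candidates:
            continue  # no pattern starts with this q-gram, skip verification
        # Verification phase: compare only the surviving candidates.
        for p in candidates:
            if text.startswith(p, i):
                matches.append((i, p))
    return matches
```

With a larger q, fewer text positions share a q-gram with any pattern, so fewer positions reach the verification phase. This is the sense in which q-grams can improve filtering efficiency for large pattern sets.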

Keywords

  • Hash Table
  • Memory Usage
  • Binary Search
  • String Match
  • Single Pattern

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/3-540-44888-8_16
  • Chapter length: 14 pages

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kytöjoki, J., Salmela, L., Tarhio, J. (2003). Tuning String Matching for Huge Pattern Sets. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_16

  • DOI: https://doi.org/10.1007/3-540-44888-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
