Faster String Matching with Super-Alphabets

  • Conference paper
String Processing and Information Retrieval (SPIRE 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2476)

Abstract

Given a text \( T[1 \ldots n] \) and a pattern \( P[1 \ldots m] \) over some alphabet Σ of size σ, finding the exact occurrences of P in T requires at least \( \Omega(n \log_\sigma m / m) \) character comparisons on average, as shown in [19]. Consequently, it is commonly believed that this also implies an \( \Omega(n \log_\sigma m / m) \) lower bound on the execution time of an optimal algorithm. In this paper, however, we show how to obtain an \( \mathcal{O}(n/m) \) average time algorithm. This is achieved by slightly changing the model of computation and by modifying an existing algorithm. Our technique uses a super-alphabet to simulate the suffix automaton. The space usage of the algorithm is \( \mathcal{O}(\sigma m) \). The technique can be applied to many other string matching algorithms, including dictionary matching, which is also solved in expected time \( \mathcal{O}(n/m) \), and approximate matching allowing k edit operations (mismatches, insertions, or deletions of characters). The latter is solved in expected time \( \mathcal{O}(nk/m) \) for \( k \leq \mathcal{O}(m/\log_\sigma m) \). The known lower bound for this problem is \( \Omega(n(k + \log_\sigma m)/m) \), given in [6]. Finally, we show how to adapt a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction. This gives a speed-up by a factor of s, where s is the number of characters processed simultaneously. Some of the algorithms have been implemented, and we show that the methods work well in practice too. This is especially true for the shift-or algorithm, which in some cases runs faster than predicted by the theory. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. All the methods and analyses assume the RAM model of computation, and that each symbol is coded in \( b = \lceil \log_2 \sigma \rceil \) bits. They work for larger b too, but the speed-up is smaller.
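To make the super-alphabet idea for shift-or concrete, the following is a minimal Python sketch, not the paper's implementation (the full text is not available on this page). It assumes s = 2, that every pattern and text character belongs to a known small alphabet, and it uses hypothetical names (`shift_or`, `shift_or_pairs`, `pair_masks`). The pair table has σ² entries, each combining the masks of two consecutive characters so that one table lookup advances the bit-parallel state by two text positions.

```python
from itertools import product


def shift_or(pattern, text):
    """Plain shift-or: one text character per step (baseline for comparison)."""
    m = len(pattern)
    full = (1 << m) - 1
    masks = {}
    for i, c in enumerate(pattern):
        # Bit i is 0 exactly when pattern[i] == c.
        masks[c] = masks.get(c, full) & ~(1 << i)
    state = full  # all ones: no prefix of the pattern is matched yet
    hits = []
    for pos, c in enumerate(text):
        state = ((state << 1) | masks.get(c, full)) & full
        if state & (1 << (m - 1)) == 0:
            hits.append(pos - m + 1)
    return hits


def shift_or_pairs(pattern, text, alphabet):
    """Shift-or over a super-alphabet of character pairs (s = 2).

    Assumption: all characters of pattern and text occur in `alphabet`.
    """
    m = len(pattern)
    full = (1 << m) - 1
    wide = (1 << (m + 1)) - 1  # one extra bit keeps the mid-pair match flag
    masks = {c: full for c in alphabet}
    for i, c in enumerate(pattern):
        masks[c] &= ~(1 << i)
    # One table entry per pair of characters: sigma**2 entries in total.
    pair_masks = {
        (a, b): (masks[a] << 1) | masks[b]
        for a, b in product(alphabet, repeat=2)
    }
    state = wide
    hits = []
    n = len(text)
    for pos in range(0, n - 1, 2):
        state = ((state << 2) | pair_masks[text[pos], text[pos + 1]]) & wide
        if state & (1 << m) == 0:        # occurrence ending at text[pos]
            hits.append(pos - m + 1)
        if state & (1 << (m - 1)) == 0:  # occurrence ending at text[pos + 1]
            hits.append(pos - m + 2)
    if n % 2:  # odd length: finish with one ordinary single-character step
        state = ((state << 1) | masks[text[-1]]) & wide
        if state & (1 << (m - 1)) == 0:
            hits.append(n - m)
    return hits


print(shift_or("aba", "abababa"))              # [0, 2, 4]
print(shift_or_pairs("aba", "abababa", "ab"))  # [0, 2, 4]
```

The loop in `shift_or_pairs` runs about n/2 times instead of n, which illustrates the factor-s speed-up described above under these assumptions. The price is the larger table and one extra state bit per additional character processed per step.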

References

  1. A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Commun. ACM, 18(6):333–340, 1975.

  2. R. A. Baeza-Yates. Improved string searching. Softw. Pract. Exp., 19(3):257–271, 1989.

  3. R. A. Baeza-Yates. String searching algorithms revisited. In F. Dehne, J. R. Sack, and N. Santoro, editors, Proceedings of the 1st Workshop on Algorithms and Data Structures, number 382 in Lecture Notes in Computer Science, pages 75–96, Ottawa, Canada, 1989. Springer-Verlag, Berlin.

  4. R. A. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Commun. ACM, 35(10):74–82, 1992.

  5. R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, 1977.

  6. W. I. Chang and T. Marr. Approximate string matching with local similarity. In M. Crochemore and D. Gusfield, editors, Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching, number 807 in Lecture Notes in Computer Science, pages 259–273, Asilomar, CA, 1994. Springer-Verlag, Berlin.

  7. M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Speeding up two string matching algorithms. Algorithmica, 12(4/5):247–267, 1994.

  8. M. Crochemore, A. Czumaj, L. Gasieniec, T. Lecroq, W. Plandowski, and W. Rytter. Fast practical multi-pattern matching. Inf. Process. Lett., 71(3–4):107–113, 1999.

  9. R. N. Horspool. Practical fast searching in strings. Softw. Pract. Exp., 10(6):501–506, 1980.

  10. D. A. Huffman. A method for the construction of minimum redundancy codes. Proc. I.R.E., 40:1098–1101, 1951.

  11. D. E. Knuth, J. H. Morris, Jr., and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(1):323–350, 1977.

  12. W. J. Masek and M. S. Paterson. A faster algorithm for computing string edit distances. J. Comput. Syst. Sci., 20(1):18–31, 1980.

  13. M. Miyazaki, S. Fukamachi, M. Takeda, and T. Shinohara. Speeding up the pattern matching machine for compressed texts. Transactions of Information Processing Society of Japan, 39(9):2638–2648, 1998.

  14. E. Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 18(2):113–139, 2000.

  15. G. Navarro and M. Raffinot. A bit-parallel approach to suffix automata: Fast extended string matching. In M. Farach-Colton, editor, Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching, number 1448 in Lecture Notes in Computer Science, pages 14–33, Piscataway, NJ, 1998. Springer-Verlag, Berlin.

  16. G. Navarro and J. Tarhio. Boyer-Moore string matching over Ziv-Lempel compressed text. In R. Giancarlo and D. Sankoff, editors, Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching, number 1848 in Lecture Notes in Computer Science, pages 166–180, Montréal, Canada, 2000. Springer-Verlag, Berlin.

  17. J. Tarhio and H. Peltola. String matching in the DNA alphabet. Softw. Pract. Exp., 27(7):851–861, 1997.

  18. S. Wu and U. Manber. Fast text searching allowing errors. Commun. ACM, 35(10):83–91, 1992.

  19. A. C. Yao. The complexity of pattern matching for a random string. SIAM J. Comput., 8(3):368–387, 1979.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fredriksson, K. (2002). Faster String Matching with Super-Alphabets. In: Laender, A.H.F., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2002. Lecture Notes in Computer Science, vol 2476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45735-6_5

  • DOI: https://doi.org/10.1007/3-540-45735-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44158-8

  • Online ISBN: 978-3-540-45735-0

  • eBook Packages: Springer Book Archive
