Skip to main content

Succinct Dictionary Matching with No Slowdown

  • Conference paper
Combinatorial Pattern Matching (CPM 2010)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6129))

Included in the following conference series:

Abstract

The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over an (not necessarily constant) alphabet of size σ, build a data structure so that we can match in a any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton which finds all occ occurrences in a text T in time O(|T| + occ) using a representation that occupies O(mlogm) bits of space where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(logσ + O(1)) + O(dlog(n/d)) bits of space while still maintaining the ability to answer to queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses O(nlogσ) bits of space while answering queries in O(|T|loglogn + occ) time. In the paper we also show how the space occupancy can be reduced to m(H 0 + O(1)) + O(dlog(n/d)) where H 0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m ε for any constant 0 < ε< 1. The query time remains unchanged.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

    Article  MATH  MathSciNet  Google Scholar 

  2. Chan, H.-L., Hon, W.-K., Lam, T.W., Sadakane, K.: Dynamic dictionary matching and compressed suffix trees. In: SODA, pp. 13–22 (2005)

    Google Scholar 

  3. Clark, D.R., Munro, J.I.: Efficient suffix trees on secondary storage (extended abstract). In: SODA, pp. 383–391 (1996)

    Google Scholar 

  4. Dori, S., Landau, G.M.: Construction of aho corasick automaton in linear time for integer alphabets. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 168–177. Springer, Heidelberg (2005)

    Google Scholar 

  5. Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM 21(2), 246–260 (1974)

    Article  MATH  MathSciNet  Google Scholar 

  6. Fano, R.M.: On the number of bits required to implement an associative memory, Memorandum 61, Computer Structures Group, Project MAC. MIT, Cambridge (1971)

    Google Scholar 

  7. Farach, M.: Optimal suffix tree construction with large alphabets. In: FOCS, pp. 137–143 (1997)

    Google Scholar 

  8. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2000)

    Google Scholar 

  9. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract). In: STOC, pp. 397–406 (2000)

    Google Scholar 

  10. Hon, W.-K., Lam, T.W., Shah, R., Tam, S.-L., Vitter, J.S.: Compressed index for dictionary matching. In: DCC, pp. 23–32 (2008)

    Google Scholar 

  11. Hon, W.-K., Lam, T.W., Shah, R., Tam, S.-L., Vitter, J.S.: Succinct index for dynamic dictionary matching. In: Dong, Y., Du, D.-Z., Ibarra, O. (eds.) ISAAC 2009. LNCS, vol. 5878. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  12. Jansson, J., Sadakane, K., Sung, W.-K.: Ultra-succinct representation of ordered trees. In: SODA, pp. 575–584 (2007)

    Google Scholar 

  13. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comput. 31(3), 762–776 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  14. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233–242 (2002)

    Google Scholar 

  15. Sadakane, K.: Compressed text databases with efficient query algorithms based on the compressed suffix array. In: Lee, D.T., Teng, S.-H. (eds.) ISAAC 2000. LNCS, vol. 1969, pp. 410–421. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  16. Tam, A., Wu, E., Lam, T.W., Yiu, S.-M.: Succinct text indexing with wildcards. In: SPIRE, pp. 39–50 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Belazzougui, D. (2010). Succinct Dictionary Matching with No Slowdown. In: Amir, A., Parida, L. (eds) Combinatorial Pattern Matching. CPM 2010. Lecture Notes in Computer Science, vol 6129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13509-5_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-13509-5_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13508-8

  • Online ISBN: 978-3-642-13509-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics