Succinct Dictionary Matching with No Slowdown

Belazzougui, Djamal

doi:10.1007/978-3-642-13509-5_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6129)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure that can report all occurrences in any text T of strings belonging to S. The classical solution to this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a representation that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still answering queries in O(|T| + occ) time. To the best of our knowledge, the fastest succinct data structure currently known for the dictionary matching problem uses O(n log σ) bits of space and answers queries in O(|T| log log n + occ) time. We also show how the space occupancy can be reduced to m(H₀ + O(1)) + O(d log(n/d)) bits, where H₀ is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1. The query time remains unchanged.
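For orientation, the classical baseline mentioned in the abstract can be sketched as follows. This is a minimal pointer-based Aho-Corasick automaton, not the succinct representation proposed in the paper; the names `build_automaton` and `find_all` and the example dictionary are illustrative assumptions. Each state stores explicit child, failure, and output information, which is where the O(m log m) bits of the classical representation come from.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain (non-succinct) Aho-Corasick automaton for `patterns`."""
    goto = [{}]   # goto[s][c] -> child of state s on character c (the trie)
    fail = [0]    # fail[s]    -> state of the longest proper suffix of s in the trie
    out = [[]]    # out[s]     -> indices of patterns that end at state s
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(idx)

    # Breadth-first traversal: failure links of shallower states are final
    # before deeper states need them. Depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Simplification: copy the outputs of the failure target so that
            # reporting needs no extra link chasing (costs extra space).
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def find_all(text, patterns, automaton):
    """Return (start position, pattern) for every occurrence in `text`."""
    goto, fail, out = automaton
    s = 0
    hits = []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


patterns = ["he", "she", "his", "hers"]
automaton = build_automaton(patterns)
print(find_all("ushers", patterns, automaton))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

In this example the dictionary has d = 4 strings of total length n = 12, and the automaton has m = 10 states, consistent with m ≤ n + 1. The scan follows each failure link at most once per character consumed, which gives the O(|T| + occ) matching time quoted above.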

Author information

Authors and Affiliations

LIAFA, Univ. Paris Diderot - Paris 7, 75205, Paris Cedex 13, France
Djamal Belazzougui

Editor information

Editors and Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA, and Bar-Ilan University, 52900, Ramat-Gan, Israel
Amihood Amir
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Laxmi Parida

About this paper

Cite this paper

Belazzougui, D. (2010). Succinct Dictionary Matching with No Slowdown. In: Amir, A., Parida, L. (eds) Combinatorial Pattern Matching. CPM 2010. Lecture Notes in Computer Science, vol 6129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13509-5_9

DOI: https://doi.org/10.1007/978-3-642-13509-5_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13508-8
Online ISBN: 978-3-642-13509-5
eBook Packages: Computer Science, Computer Science (R0)
