Abstract
Given a set \({\cal D}\) of \(d\) patterns, the dictionary matching problem is to index \({\cal D}\) such that for any query text \(T\), we can locate the occurrences of any pattern within \(T\) efficiently. When \({\cal D}\) contains a total of \(n\) characters drawn from an alphabet of size \(\sigma\), Hon et al. (2008) gave an \(nH_k({\cal D}) + o(n \log \sigma)\)-bit index which supports a query in \(O(|T| (\log^{\epsilon} n + \log d) + occ)\) time, where \(\epsilon > 0\) and \(H_k({\cal D})\) denotes the \(k\)th order entropy of \({\cal D}\). Very recently, Belazzougui (2010) proposed an elegant scheme, which takes \(n \log \sigma + O(n)\) bits of index space and supports a query in optimal \(O(|T| + occ)\) time. In this paper, we provide connections between Belazzougui’s index and the XBW compression of Ferragina et al. (2005), and show that Belazzougui’s index can be slightly modified to be stored in \(nH_k({\cal D}) + O(n)\) bits, while query time remains optimal; this improves the compressed index by Hon et al. (2008) in both space and time.
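To make the problem statement concrete, the following is a minimal sketch of dictionary matching using the classical, uncompressed automaton of Aho and Corasick (1975). It is not the compressed index of this paper, nor Belazzougui's scheme; it only illustrates what a query reporting all occurrences in \(O(|T| + occ)\) time looks like when space is not a concern. The function names `build_automaton` and `find_occurrences` are illustrative choices, and a dict-based trie is assumed for the transitions.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton (trie + failure links) for the patterns."""
    goto = [{}]   # goto[s][c] -> child of state s on character c
    fail = [0]    # fail[s] -> state of the longest proper suffix that is in the trie
    out = [[]]    # out[s] -> indices of all patterns ending at state s

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)

    # Phase 2: breadth-first traversal to set failure links.
    # Depth-1 states keep the root (0) as their failure link.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            # Inherit patterns that end at the failure target.
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def find_occurrences(text, patterns, automaton):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    s = 0
    hits = []
    for i, c in enumerate(text):
        # Follow failure links until a transition on c exists (or we reach the root).
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits


patterns = ["he", "she", "his", "hers"]
automaton = build_automaton(patterns)
print(find_occurrences("ushers", patterns, automaton))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The pointer-based trie above uses far more than \(n \log \sigma\) bits in practice; the indexes discussed in the abstract aim to support the same kind of query within compressed space.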
This work is supported in part by Taiwan NSC Grant 96-2221-E-007-082 and US NSF Grants CCF-1017623 and CCF-0621457.
References
Aho, A., Corasick, M.: Efficient String Matching: An Aid to Bibliographic Search. Communications of the ACM 18(6), 333–340 (1975)
Amir, A., Farach, M., Matias, Y.: Efficient Randomized Dictionary Matching Algorithms (Extended Abstract). In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 262–275. Springer, Heidelberg (1992)
Belazzougui, D.: Succinct Dictionary Matching With No Slowdown. In: Amir, A., Parida, L. (eds.) Combinatorial Pattern Matching. LNCS, vol. 6129, pp. 88–100. Springer, Heidelberg (2010)
Burrows, M., Wheeler, D.J.: A Block-sorting Lossless Data Compression Algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, CA, USA (1994)
Chan, H.L., Hon, W.K., Lam, T.W., Sadakane, K.: Compressed Indexes for Dynamic Text Collections. ACM Transactions on Algorithms 3(2) (2007)
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Structuring Labeled Trees for Optimal Succinctness, and Beyond. In: Proceedings of Symposium on Foundations of Computer Science, pp. 184–196 (2005)
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and Indexing Labeled Trees, With Applications. Journal of the ACM 57(1) (2009)
Ferragina, P., Manzini, G.: Indexing Compressed Text. Journal of the ACM 52(4), 552–581 (2005); A preliminary version appears in FOCS 2000
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed Representations of Sequences and Full-Text Indexes. ACM Transactions on Algorithms 3(2) (2007)
Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing 35(2), 378–407 (2005); A preliminary version appears in STOC 2000
Gupta, A., Hon, W.K., Shah, R., Vitter, J.S.: A Framework for Dynamizing Succinct Data Structures. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 521–532. Springer, Heidelberg (2007)
Hon, W.-K., Lam, T.-W., Shah, R., Tam, S.-L., Vitter, J.S.: Compressed Index for Dictionary Matching. In: Proceedings of Data Compression Conference, pp. 23–32 (2008)
Jacobson, G.: Space-efficient Static Trees and Graphs. In: Proceedings of Symposium on Foundations of Computer Science, pp. 549–554 (1989)
McCreight, E.M.: A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM 23(2), 262–272 (1976)
Raman, R., Raman, V., Rao, S.S.: Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees and Multisets. In: Proceedings of Symposium on Discrete Algorithms, pp. 233–242 (2002)
Tam, A., Wu, E., Lam, T.W., Yiu, S.M.: Succinct Text Indexing With Wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) String Processing and Information Retrieval. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)
Weiner, P.: Linear Pattern Matching Algorithms. In: Proceedings of Symposium on Switching and Automata Theory, pp. 1–11 (1973)
Witten, I., Moffat, A., Bell, T.: Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, Los Altos (1999)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hon, WK., Ku, TH., Shah, R., Thankachan, S.V., Vitter, J.S. (2010). Faster Compressed Dictionary Matching. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_19
DOI: https://doi.org/10.1007/978-3-642-16321-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science, Computer Science (R0)