A Compressed Self-index Using a Ziv-Lempel Dictionary

Russo, Luís M. S.; Oliveira, Arlindo L.

doi:10.1007/11880561_14

Luís M. S. Russo¹⁹ &
Arlindo L. Oliveira¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4209))

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

599 Accesses
6 Citations

Abstract

A compressed full-text self-index for a text T, of size u, is a data structure used to search patterns P, of size m, in T that requires reduced space, i.e. that depends on the empirical entropy (H _k, H ₀) of T, and is, furthermore, able to reproduce any substring of T. In this paper we present a new compressed self-index able to locate the occurrences of P in O((m+occ)logn) time, where occ is the number of occurrences and σ the size of the alphabet of T. The fundamental improvement over previous LZ78 based indexes is the reduction of the search time dependency on m from O(m ²) to O(m). To achieve this result we point out the main obstacle to linear time algorithms based on LZ78 data compression and expose and explore the nature of a recurrent structure in LZ-indexes, the \(\mathcal{T}_{78}\) suffix tree. We show that our method is very competitive in practice by comparing it against the LZ-Index, the FM-index and a compressed suffix array.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edn. McGraw, New York (2001)
MATH Google Scholar
Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Algorithms 48(2), 294–313 (2003)
Article MATH MathSciNet Google Scholar
Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)
Article MathSciNet Google Scholar
Navarro, G.: Indexing text using the Ziv-Lempel trie. J. Discrete Algorithms 2(1), 87–114 (2004)
Article MATH MathSciNet Google Scholar
Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching. In: Proceedings of the 3rd South American Workshop on String Processing, pp. 141–155. Carleton University Press (1996)
Google Scholar
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)
Article MATH MathSciNet Google Scholar
Manzini, G.: An analysis of the burrows-wheeler transform. J. ACM 48(3), 407–430 (2001)
Article MathSciNet Google Scholar
Makinen, V., Navarro, G.: Compressed full text indexes. Technical Report TR/DCC-2006-6, Dept. of Computer Science, University of Chile (2006) 2nd version (2006)
Google Scholar
Arroyuelo, D., Navarro, G., Sadakane, K.: Reducing the space requirement of LZ-index. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 318–329. Springer, Heidelberg (2006)
Chapter Google Scholar
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: An alphabet-friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004)
Chapter Google Scholar
Grabowski, S., Mäkinen, V., Navarro, G.: First huffman, then burrows-wheeler: A simple alphabet-independent FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 210–211. Springer, Heidelberg (2004)
Chapter Google Scholar
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1999)
Google Scholar
Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Google Scholar
Geary, R.F., Raman, R., Raman, V.: Succinct ordinal trees with level-ancestor queries. In: SODA, pp. 1–10. SIAM, Philadelphia (2004)
Google Scholar
Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)
Chapter Google Scholar
Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)
Google Scholar
Chazelle, B.: A functional approach to data structures and its use in multidimensional searching. SIAM J. Comput. 17(3), 427–462 (1988)
Article MATH MathSciNet Google Scholar
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)
Article MATH MathSciNet Google Scholar
Kosaraju, S.R., Manzini, G.: Compression of low entropy strings with lempel-ziv algorithms. SIAM J. Comput. 29(3), 893–911 (1999)
Article MathSciNet Google Scholar
Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: SODA, pp. 1230–1239. ACM Press, New York (2006)
Chapter Google Scholar
http://www.dcc.uchile.cl/~gnavarro/eindex.html
http://pizzachili.dcc.uchile.cl/
Sadakane, K.: Compressed suffix trees with full functionality (to appear in Theory of Computing Systems)
Google Scholar

Download references

Author information

Authors and Affiliations

INESC-ID/IST,
Luís M. S. Russo & Arlindo L. Oliveira

Authors

Luís M. S. Russo
View author publications
You can also search for this author in PubMed Google Scholar
Arlindo L. Oliveira
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Computer and Information Science, University of Strathclyde, Scotland
Fabio Crestani
Dipartimento di Informatica, University of Pisa, Largo B. Pontecorvo 3, 56127, Pisa, Italy
Paolo Ferragina
Department of Information Studies, University of Sheffield, Sheffield, UK
Mark Sanderson

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Russo, L.M.S., Oliveira, A.L. (2006). A Compressed Self-index Using a Ziv-Lempel Dictionary. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_14

Download citation

DOI: https://doi.org/10.1007/11880561_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics