Reducing the Space Requirement of LZ-Index

  • Diego Arroyuelo
  • Gonzalo Navarro
  • Kunihiko Sadakane
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)

Abstract

The LZ-index is a compressed full-text self-index able to represent a text \(T_{1 \ldots u}\), over an alphabet of size \(\sigma = O(\textrm{polylog}(u))\) and with \(k\)-th order empirical entropy \(H_k(T)\), using \(4uH_k(T) + o(u\log\sigma)\) bits for any \(k = o(\log_\sigma u)\). It can report all the \(occ\) occurrences of a pattern \(P_{1 \ldots m}\) in \(T\) in \(O(m^3\log\sigma + (m + occ)\log u)\) worst-case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of LZ-index. In both cases we achieve \((2 + \epsilon)uH_k(T) + o(u\log\sigma)\) bits of space, for any constant \(\epsilon > 0\), and we simultaneously improve the search time to \(O(m^2\log m + (m + occ)\log u)\). Both indexes support displaying any subtext of length \(\ell\) in optimal \(O(\ell/\log_\sigma u)\) time. In addition, we show how the space can be squeezed to \((1 + \epsilon)uH_k(T) + o(u\log\sigma)\) bits to obtain a structure with \(O(m^2)\) average search time for \(m \geqslant 2\log_\sigma{u}\).
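
The abstract does not spell out how the index is built. As background, the LZ-index is based on the Ziv-Lempel (LZ78) parsing of the text, in which each new phrase is an earlier phrase extended by one character. The following is a minimal sketch of that parsing only, not of the index or of the space reductions presented in the paper. The function names `lz78_parse` and `lz78_decode` are hypothetical, and the dictionary-based trie is an illustrative assumption rather than the succinct representation the paper relies on.

```python
def lz78_parse(text):
    """Split text into LZ78 phrases, returned as (parent_phrase_id, char) pairs.

    Phrase ids start at 1; id 0 denotes the empty phrase (the trie root).
    """
    trie = {}      # (parent_phrase_id, char) -> phrase_id
    phrases = []   # phrases[i - 1] describes phrase i
    node = 0       # current trie node, i.e. longest phrase matched so far
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                  # keep extending the current match
        else:
            phrases.append((node, ch))    # new phrase = matched phrase + ch
            trie[(node, ch)] = len(phrases)
            node = 0                      # restart matching from the root
    if node != 0:
        # The text ended inside an existing phrase; emit it with no extra char.
        phrases.append((node, ""))
    return phrases


def lz78_decode(phrases):
    """Rebuild the text from the (parent_phrase_id, char) pairs."""
    expanded = [""]                       # expanded[0] is the empty phrase
    for parent, ch in phrases:
        expanded.append(expanded[parent] + ch)
    return "".join(expanded[1:])


if __name__ == "__main__":
    parsed = lz78_parse("abababab")
    print(parsed)
    print(lz78_decode(parsed))
```

On the input `"abababab"` the sketch produces the phrases `a`, `b`, `ab`, `aba`, `b`. The last phrase repeats an earlier one because the text ends mid-match; implementations commonly append a unique terminator character to avoid this.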

Keywords

Space Requirement; Navigation Scheme; Phrase Pair; Text Substring; Operation Parent
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Diego Arroyuelo (1)
  • Gonzalo Navarro (1)
  • Kunihiko Sadakane (2)
  1. Dept. of Computer Science, Universidad de Chile
  2. Dept. of Computer Science and Communication Engineering, Kyushu University, Japan
