A Compressed Self-index Using a Ziv-Lempel Dictionary

  • Luís M. S. Russo
  • Arlindo L. Oliveira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4209)


A compressed full-text self-index for a text T, of size u, is a data structure used to search patterns P, of size m, in T that requires reduced space, i.e. that depends on the empirical entropy (\(H_k\), \(H_0\)) of T, and is, furthermore, able to reproduce any substring of T. In this paper we present a new compressed self-index able to locate the occurrences of P in \(O((m+occ)\log n)\) time, where occ is the number of occurrences and σ the size of the alphabet of T. The fundamental improvement over previous LZ78-based indexes is the reduction of the search time dependency on m from \(O(m^2)\) to \(O(m)\). To achieve this result we point out the main obstacle to linear time algorithms based on LZ78 data compression and expose and explore the nature of a recurrent structure in LZ-indexes, the \(\mathcal{T}_{78}\) suffix tree. We show that our method is very competitive in practice by comparing it against the LZ-Index, the FM-index and a compressed suffix array.
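As background for the LZ78 dictionary mentioned above, the following is a minimal sketch of textbook LZ78 parsing, not the index construction of the paper. The function names `lz78_parse` and `lz78_decode` and the use of an empty character for a trailing incomplete phrase are assumptions made for illustration only.

```python
def lz78_parse(text):
    """Split text into LZ78 phrases, returned as (parent_id, char) pairs.

    Phrase ids start at 1; id 0 denotes the empty phrase (the trie root).
    """
    trie = {}      # (parent_id, char) -> id of the phrase ending at that trie node
    phrases = []   # phrase with id k is stored at phrases[k - 1]
    node = 0       # current trie node, i.e. the longest known phrase matched so far
    for ch in text:
        key = (node, ch)
        if key in trie:
            node = trie[key]              # keep extending the current match
        else:
            trie[key] = len(phrases) + 1  # new phrase = known phrase + one char
            phrases.append(key)
            node = 0                      # restart matching from the root
    if node != 0:
        # Assumption: a text that ends inside a known phrase is closed
        # with an empty extra character.
        phrases.append((node, ""))
    return phrases


def lz78_decode(phrases):
    """Rebuild the text from (parent_id, char) pairs."""
    out = [""]  # out[k] is the string of phrase k; out[0] is the empty phrase
    for parent, ch in phrases:
        out.append(out[parent] + ch)
    return "".join(out[1:])


if __name__ == "__main__":
    parsed = lz78_parse("abababab")
    print(parsed)
    print(lz78_decode(parsed))
```

Each phrase is a previously seen phrase extended by one character, so the set of phrases forms a trie. An occurrence of a pattern may lie inside one phrase or span several consecutive phrases, which is the case an LZ78-based index has to handle.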


Binary Search, Range Query, Space Requirement, Father Node, Space Expression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Luís M. S. Russo (1)
  • Arlindo L. Oliveira (1)
  1. INESC-ID/IST
