Statistical Encoding of Succinct Data Structures

  • Rodrigo González
  • Gonzalo Navarro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)

Abstract

In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any sequence \(S = s_1 s_2 \ldots s_n\), over an alphabet of size \(\sigma\), using \(nH_k(S)+O(\frac{n}{\log_\sigma n} (k \log \sigma + \log\log n))\) bits of space, where \(H_k(S)\) is the k-th order empirical entropy of S. The representation permits extracting any substring of size \(\Theta(\log_\sigma n)\) in constant time, and thus it completely replaces S under the RAM model. This is extremely important because it permits converting any succinct data structure requiring \(o(|S|) = o(n \log \sigma)\) bits in addition to S, into another requiring \(nH_k(S) + o(n \log \sigma)\) bits (overall) for any \(k = o(\log_\sigma n)\). They achieve this result by using Ziv-Lempel compression, and conjecture that the result can in particular be useful to implement compressed full-text indexes.
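To make the quantity \(H_k(S)\) concrete, the following is a minimal sketch of how the k-th order empirical entropy can be computed from symbol counts. The function names `h0` and `empirical_entropy` are illustrative and not taken from the paper, and skipping the first k symbols (which have no full context) is an assumed convention.

```python
from collections import Counter, defaultdict
from math import log2


def h0(counts):
    """Zero-order empirical entropy (bits per symbol) of a multiset of symbol counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c / total * log2(total / c) for c in counts.values())


def empirical_entropy(s, k):
    """k-th order empirical entropy H_k(s), in bits per symbol.

    Every symbol is grouped by the k symbols that precede it. The result is
    the sum over all contexts of (group size * H_0 of the group), divided by n.
    Assumption: the first k symbols have no full context and are skipped.
    """
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)  # context -> counts of the symbol that follows it
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    return sum(sum(c.values()) * h0(c) for c in followers.values()) / n


if __name__ == "__main__":
    text = "mississippi"
    for order in range(3):
        print(order, round(empirical_entropy(text, order), 4))
```

Multiplying the returned value by n gives the \(nH_k(S)\) term of the space bound; the lower-order term of the bound is not modelled here.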

In this paper we extend their result, by obtaining the same space and time complexities using a simpler scheme based on statistical encoding. We show that the scheme supports appending symbols in constant amortized time. In addition, we prove some results on the applicability of the scheme for full-text self-indexing.
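As a rough illustration of what a block-based statistical encoding has to store, the sketch below splits a sequence into fixed-length blocks and computes the ideal code length of each block under a k-th order model built from the sequence's own counts. This is not the authors' construction: the block length `b`, the function name `block_code_lengths`, and the convention that the first k symbols are stored verbatim at no cost are all assumptions made for the example, and a real statistical coder would add a small per-block overhead that is not modelled here.

```python
from collections import Counter, defaultdict
from math import log2


def block_code_lengths(s, k, b):
    """Ideal code length, in bits, of each length-b block of s under a k-th order model.

    Each symbol costs log2(1 / P(symbol | previous k symbols)), with P taken
    from the empirical counts of s itself.
    Assumption: the first k symbols are stored verbatim and cost nothing here.
    """
    followers = defaultdict(Counter)  # context -> counts of the symbol that follows it
    for i in range(k, len(s)):
        followers[s[i - k:i]][s[i]] += 1

    lengths = []
    for start in range(0, len(s), b):
        bits = 0.0
        # Skip positions below k: they have no full context.
        for i in range(max(start, k), min(start + b, len(s))):
            ctx = followers[s[i - k:i]]
            bits += log2(sum(ctx.values()) / ctx[s[i]])
        lengths.append(bits)
    return lengths


if __name__ == "__main__":
    per_block = block_code_lengths("mississippi", 1, 4)
    print(per_block, sum(per_block))
```

Summed over all blocks, these ideal lengths equal n times the k-th order empirical entropy of the sequence (under the same convention for the first k symbols), which is why the per-block overhead and the pointers to block starts are what determine the lower-order term of such a scheme.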

Keywords

Statistical Encode; Arithmetic Code; Metic Code; Wavelet Tree; Rank Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bell, T., Cleary, J., Witten, I.: Text compression. Prentice-Hall, Englewood Cliffs (1990)
  2. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
  3. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Structuring labeled trees for optimal succinctness, and beyond. In: Proc. 46th FOCS (2005)
  4. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and searching XML data via two zips. In: Proc. 15th WWW 2006 (2006)
  5. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. 41st FOCS, pp. 390–398 (2000)
  6. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: An alphabet-friendly FM-index. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 150–160. Springer, Heidelberg (2004) (Extended version to appear in ACM TALG)
  7. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)
  8. Kosaraju, R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv algorithms. SIAM Journal on Computing 29(3), 893–911 (1999)
  9. Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)
  10. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
  11. Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
  12. Munro, I., Raman, R., Raman, V., Rao, S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)
  13. Munro, I., Raman, V.: Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. 38th FOCS, pp. 118–126 (1997)
  14. Munro, I., Rao, S.S.: Succinct Representations of Functions. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 1006–1015. Springer, Heidelberg (2004)
  15. Navarro, G.: Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms (JDA) 2(1), 87–114 (2004)
  16. Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th SODA, pp. 233–242 (2002)
  17. Sadakane, K., Grossi, R.: Personal communication (2005)
  18. Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proc. 17th SODA, pp. 1230–1239 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rodrigo González (1)
  • Gonzalo Navarro (1)
  1. Department of Computer Science, University of Chile
