
Practical Rank/Select Queries over Arbitrary Sequences

  • Francisco Claude
  • Gonzalo Navarro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5280)

Abstract

We present a practical study on the compact representation of sequences supporting rank, select, and access queries. While there are several theoretical solutions to the problem, only a few have been tried out, and little is known about how the others would perform, especially on sequences over very large alphabets. We first present a new practical implementation of the compressed representation for bit sequences proposed by Raman, Raman, and Rao [SODA 2002], which is competitive with the existing ones when the sequences are not too compressible. It also has good local compression properties, and we show that this makes it an excellent tool for compressed text indexing in combination with the Burrows-Wheeler transform. This demonstrates the practicality of a recent theoretical proposal [Mäkinen and Navarro, SPIRE 2007] and achieves space usage not reached before. Second, for general sequences, we tune wavelet trees for the case of very large alphabets by removing their pointer information. We show that this gives an excellent solution for representing a sequence within zero-order entropy space in cases where the large alphabet poses a serious challenge to typical encoding methods. We also present the first implementation of Golynski et al.'s representation [SODA 2006], which offers another interesting time/space trade-off.
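To make the three queries and the idea of a pointerless wavelet tree concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes integer symbols in [0, sigma) and a level-wise layout in which all node bitmaps of one depth are concatenated, so node boundaries are recomputed with rank instead of being stored as pointers. The names `BitVector`, `LevelWaveletTree`, and the sampling rate `BLOCK = 64` are hypothetical. The bitmaps are plain with sampled rank counts, not the Raman, Raman, and Rao compressed encoding. The structure is also not Golynski et al.'s representation.

```python
# Illustrative sketch only: hypothetical names, not the authors' implementation.

class BitVector:
    """Plain (uncompressed, NOT RRR) bitmap with sampled rank counts."""

    BLOCK = 64  # hypothetical sampling rate for cumulative 1-counts

    def __init__(self, bits):
        self.bits = list(bits)
        self.n = len(self.bits)
        # samples[k] = number of 1s in bits[0 : k * BLOCK]
        self.samples = [0]
        for k in range(0, self.n, self.BLOCK):
            self.samples.append(self.samples[-1] + sum(self.bits[k:k + self.BLOCK]))

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        k = i // self.BLOCK
        return self.samples[k] + sum(self.bits[k * self.BLOCK:i])

    def rank(self, b, i):
        """Number of bits equal to b in bits[0:i]."""
        r1 = self.rank1(i)
        return r1 if b else i - r1

    def select(self, b, j):
        """Position of the j-th (1-based) b-bit; assumes 1 <= j <= rank(b, n)."""
        lo, hi = 0, self.n - 1
        while lo < hi:  # smallest p with rank(b, p + 1) >= j
            mid = (lo + hi) // 2
            if self.rank(b, mid + 1) >= j:
                hi = mid
            else:
                lo = mid + 1
        return lo


class LevelWaveletTree:
    """Pointerless wavelet tree: one concatenated bitmap per level."""

    def __init__(self, seq, sigma):
        self.n = len(seq)
        self.height = max(1, (sigma - 1).bit_length())
        self.levels = []
        for l in range(self.height):
            shift = self.height - l
            # Stable sort by the top l bits makes every node's symbols contiguous.
            cur = sorted(seq, key=lambda c: c >> shift)
            bit = self.height - 1 - l
            self.levels.append(BitVector((c >> bit) & 1 for c in cur))

    def _bit(self, c, l):
        return (c >> (self.height - 1 - l)) & 1

    def _child(self, bv, s, e, i, b):
        """Map position i of node [s, e) into its b-child; return (s', e', i')."""
        r1s, r1i, r1e = bv.rank1(s), bv.rank1(i), bv.rank1(e)
        zeros = (e - s) - (r1e - r1s)
        if b == 0:
            return s, s + zeros, s + (i - s) - (r1i - r1s)
        return s + zeros, e, s + zeros + (r1i - r1s)

    def access(self, i):
        """Symbol at position i."""
        s, e, c = 0, self.n, 0
        for bv in self.levels:
            b = bv.bits[i]
            s, e, i = self._child(bv, s, e, i, b)
            c = (c << 1) | b
        return c

    def rank(self, c, i):
        """Occurrences of symbol c in seq[0:i]."""
        s, e = 0, self.n
        for l, bv in enumerate(self.levels):
            s, e, i = self._child(bv, s, e, i, self._bit(c, l))
        return i - s

    def select(self, c, j):
        """Position of the j-th (1-based) occurrence of c, or None if absent."""
        if j < 1 or j > self.rank(c, self.n):
            return None
        path, s, e = [], 0, self.n
        for l, bv in enumerate(self.levels):
            b = self._bit(c, l)
            cs, ce, _ = self._child(bv, s, e, s, b)
            path.append((bv, b, s, cs))  # bitmap, branch bit, node start, child start
            s, e = cs, ce
        p = s + j - 1  # target position inside the leaf's interval
        for bv, b, node_start, child_start in reversed(path):
            # The k-th entry of the child is the (k+1)-th b-bit inside the node.
            p = bv.select(b, bv.rank(b, node_start) + (p - child_start) + 1)
        return p
```

Consider `seq = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]` and `wt = LevelWaveletTree(seq, 10)`. Here `wt.access(5)` should give 9, `wt.rank(1, 4)` should give 2, and `wt.select(5, 2)` should give 8. A pointer-based wavelet tree with sigma leaves has sigma - 1 internal nodes, so per-node pointers cost space that grows with the alphabet. In this sketch that cost is traded for a few extra rank calls per level.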



References

  1. Barbay, J., He, M., Munro, I., Srinivasa Rao, S.: Succinct indexes for strings, binary relations and multi-labeled trees. In: 18th SODA, pp. 680–689 (2007)
  2. Brisaboa, N., Fariña, A., Ladra, S., Navarro, G.: Reorganizing compressed text. In: SIGIR (to appear, 2008)
  3. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, December (1994)
  4. Clark, D.: Compact Pat Trees. Ph.D. thesis, University of Waterloo (1996)
  5. Claude, F., Navarro, G.: A fast and compact Web graph representation. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 118–129. Springer, Heidelberg (2007)
  6. Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice (manuscript, 2007), http://pizzachili.dcc.uchile.cl
  7. Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)
  8. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM TALG 3(2), article 20 (2007)
  9. Golynski, A., Munro, I., Rao, S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)
  10. González, R., Grabowski, S., Mäkinen, V., Navarro, G.: Practical implementation of rank and select queries. In: Posters of WEA, pp. 27–38 (2005)
  11. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: SODA, pp. 841–850 (2003)
  12. Mäkinen, V., Navarro, G.: Implicit compression boosting with applications to self-indexing. In: SPIRE, pp. 214–226 (2007)
  13. Munro, I., Raman, R., Raman, V., Srinivasa Rao, S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)
  14. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1), article 2 (2007)
  15. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: ALENEX (2007)
  16. Raman, R., Raman, V., Srinivasa Rao, S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)
  17. Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 233–242 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Francisco Claude (1)
  • Gonzalo Navarro (1)
  1. Department of Computer Science, Universidad de Chile, Chile
