Grammar Compressed Sequences with Rank/Select Support

  • Gonzalo Navarro
  • Alberto Ordóñez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8799)


Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. In several recent applications, the need to represent highly repetitive sequences arises, where statistical compression is ineffective. We introduce grammar-based representations for repetitive sequences, which use up to 10% of the space needed by representations based on statistical compression, and support direct access and rank/select operations within tens of microseconds.
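
As a rough illustration of the three operations named in the abstract, the sketch below runs access, rank, and select over a toy straight-line grammar. It is not the representation introduced in the paper: the class name `GrammarSequence`, the rule format (each nonterminal maps to a pair of symbols), and the choice to store a full symbol counter per nonterminal are all assumptions made for this example. A practical structure would sample or compress that information.

```python
from collections import Counter


class GrammarSequence:
    """Toy grammar-compressed sequence (hypothetical, for illustration only).

    rules maps each nonterminal to a (left, right) pair; any symbol that is
    not a key of rules is treated as a terminal of length 1.
    """

    def __init__(self, rules, start):
        self.rules = rules
        self.start = start
        self.length = {}   # expansion length of every symbol
        self.counts = {}   # per-symbol terminal counts of every expansion
        self._measure(start)

    def _measure(self, sym):
        if sym in self.length:
            return
        if sym not in self.rules:              # terminal
            self.length[sym] = 1
            self.counts[sym] = Counter({sym: 1})
            return
        left, right = self.rules[sym]
        self._measure(left)
        self._measure(right)
        self.length[sym] = self.length[left] + self.length[right]
        self.counts[sym] = self.counts[left] + self.counts[right]

    def access(self, i):
        """Return the symbol at 0-based position i."""
        sym = self.start
        while sym in self.rules:
            left, right = self.rules[sym]
            if i < self.length[left]:
                sym = left
            else:
                i -= self.length[left]
                sym = right
        return sym

    def rank(self, c, i):
        """Count occurrences of c in positions [0, i), for 0 <= i <= n."""
        sym, total = self.start, 0
        while sym in self.rules and i > 0:
            left, right = self.rules[sym]
            if i < self.length[left]:
                sym = left
            else:                              # skip the whole left child
                total += self.counts[left][c]
                i -= self.length[left]
                sym = right
        if sym not in self.rules and i > 0 and sym == c:
            total += 1
        return total

    def select(self, c, j):
        """Return the 0-based position of the j-th occurrence of c (j >= 1)."""
        if not 1 <= j <= self.counts[self.start][c]:
            raise ValueError("fewer than j occurrences of c")
        sym, pos = self.start, 0
        while sym in self.rules:
            left, right = self.rules[sym]
            if self.counts[left][c] >= j:
                sym = left
            else:                              # occurrence lies in the right child
                j -= self.counts[left][c]
                pos += self.length[left]
                sym = right
        return pos
```

With the rules `{"X": ("a", "b"), "Y": ("X", "X"), "S": ("Y", "X")}` and start symbol `"S"`, the grammar expands to `ababab`. Each query walks one root-to-leaf path of the grammar, so its cost depends on the grammar height and not on the length of the expanded sequence.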





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gonzalo Navarro (1)
  • Alberto Ordóñez (2)
  1. Dept. of Computer Science, Univ. of Chile, Chile
  2. Lab. de Bases de Datos, Univ. da Coruña, Spain
