Grammar Compressed Sequences with Rank/Select Support

  • Gonzalo Navarro
  • Alberto Ordóñez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8799)

Abstract

Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. In several recent applications, the need to represent highly repetitive sequences arises, where statistical compression is ineffective. We introduce grammar-based representations for repetitive sequences, which use up to 10% of the space needed by representations based on statistical compression, and support direct access and rank/select operations within tens of microseconds.
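
The abstract does not describe the data structure itself. What follows is a minimal illustrative sketch of how a grammar can answer access, rank and select queries; it is not the authors' representation. It assumes a binary straight-line grammar, in which every nonterminal expands either to one terminal or to a pair of nonterminals. For each nonterminal it stores the expansion length and the per-symbol occurrence counts. The class name `GrammarSequence` and the rule format are hypothetical. Each query descends from the start symbol to a leaf, so its cost is proportional to the height of the grammar's parse tree.

```python
from collections import Counter
from typing import Dict, Tuple, Union

# Hypothetical rule format: a nonterminal expands either to a single
# terminal symbol (str) or to a pair of nonterminals (binary rule A -> B C).
Rule = Union[str, Tuple[str, str]]


class GrammarSequence:
    """Toy grammar-compressed sequence supporting access, rank and select.

    Assumes `rules` lists every nonterminal after its children
    (bottom-up order) and that `start` expands to the whole sequence.
    """

    def __init__(self, rules: Dict[str, Rule], start: str) -> None:
        self.rules = rules
        self.start = start
        self.length: Dict[str, int] = {}      # expansion length per nonterminal
        self.counts: Dict[str, Counter] = {}  # symbol counts per nonterminal
        for name, rhs in rules.items():       # dicts preserve insertion order
            if isinstance(rhs, str):
                self.length[name] = 1
                self.counts[name] = Counter({rhs: 1})
            else:
                left, right = rhs
                self.length[name] = self.length[left] + self.length[right]
                self.counts[name] = self.counts[left] + self.counts[right]

    def __len__(self) -> int:
        return self.length[self.start]

    def access(self, i: int) -> str:
        """Symbol at 0-based position i, found by descending the parse tree."""
        node = self.start
        while not isinstance(self.rules[node], str):
            left, right = self.rules[node]
            if i < self.length[left]:
                node = left
            else:
                i -= self.length[left]
                node = right
        return self.rules[node]

    def rank(self, c: str, i: int) -> int:
        """Occurrences of c in positions [0, i), for 0 <= i <= len(self)."""
        node, result = self.start, 0
        while i > 0:
            if i == self.length[node]:
                # The whole expansion of node is covered: use its stored count.
                return result + self.counts[node][c]
            left, right = self.rules[node]  # i < length, so node is binary
            if i <= self.length[left]:
                node = left
            else:
                result += self.counts[left][c]  # skip the left child entirely
                i -= self.length[left]
                node = right
        return result

    def select(self, c: str, j: int) -> int:
        """0-based position of the j-th occurrence (j >= 1) of c."""
        if not 1 <= j <= self.counts[self.start][c]:
            raise ValueError("c occurs fewer than j times")
        node, pos = self.start, 0
        while not isinstance(self.rules[node], str):
            left, right = self.rules[node]
            if j <= self.counts[left][c]:
                node = left
            else:
                j -= self.counts[left][c]
                pos += self.length[left]
                node = right
        return pos


# Example grammar for "abababc": X -> ab, Y -> XX, Z -> Xc, S -> YZ.
g = GrammarSequence(
    {"A": "a", "B": "b", "C": "c",
     "X": ("A", "B"), "Y": ("X", "X"), "Z": ("X", "C"), "S": ("Y", "Z")},
    "S",
)
print("".join(g.access(i) for i in range(len(g))))  # abababc
print(g.rank("a", 5))    # 3
print(g.select("b", 2))  # 3
```

This sketch stores a full symbol-count table for every nonterminal. That costs space proportional to the number of rules times the alphabet size, which a practical representation would need to reduce. The sketch ignores that trade-off.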

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gonzalo Navarro (1)
  • Alberto Ordóñez (2)
  1. Dept. of Computer Science, Univ. of Chile, Chile
  2. Lab. de Bases de Datos, Univ. da Coruña, Spain

Personalised recommendations