A Self-index on Block Trees

  • Gonzalo Navarro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10508)

Abstract

The Block Tree is a recently proposed data structure that reaches compression close to Lempel-Ziv while supporting efficient direct access to text substrings. In this paper we show how a self-index can be built on top of a Block Tree so that it provides efficient pattern searches while using space proportional to that of the original data structure. More precisely, if a Lempel-Ziv parse cuts a text of length n into z non-overlapping phrases, then our index uses \(O(z\lg (n/z))\) words and finds the occ occurrences of a pattern of length m in time \(O(m^2\lg n+occ\lg ^\epsilon n)\) for any constant \(\epsilon >0\).
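To make the parameters n and z concrete, the following is a minimal sketch that counts the phrases of a greedy Lempel-Ziv parse and evaluates the quantity inside the space bound. It is not the paper's construction: the function names are hypothetical, the parse uses one common LZ77 convention (longest previously occurring prefix plus one extra character, with sources allowed to overlap the phrase), and the naive search takes far more time than a real parser would.

```python
import math


def lz77_phrase_count(text: str) -> int:
    """Count the phrases z of a greedy LZ77-style parse (naive, for illustration).

    Assumed convention: each phrase is the longest prefix of the remaining
    text that also occurs starting at an earlier position, followed by one
    extra character (omitted if the text ends first).
    """
    n = len(text)
    i = 0  # start of the current phrase
    z = 0  # number of phrases produced so far
    while i < n:
        length = 0
        # Extend while text[i : i + length + 1] has an occurrence that
        # starts before position i (the end bound i + length enforces this).
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += length + 1  # copied part plus one explicit character
        z += 1
    return z


def space_term(n: int, z: int) -> float:
    """Evaluate z * lg(n / z), the expression inside the O(.) space bound."""
    if z == 0:
        return 0.0
    return z * math.log2(n / z)


if __name__ == "__main__":
    sample = "abababab"
    z = lz77_phrase_count(sample)
    print(len(sample), z, space_term(len(sample), z))
```

On highly repetitive inputs z grows much more slowly than n, which is the regime where a bound of the form z lg(n/z) is attractive; the constant hidden by the O notation is not modeled here.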

Notes

Acknowledgements

Many thanks to Simon Puglisi and an anonymous reviewer for pointing out several fatal typos in the formulas.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, University of Chile, Santiago, Chile
