Volume 80, Issue 11, pp 3207–3224

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

  • Philip Bille
  • Anders Roy Christiansen
  • Patrick Hagge Cording
  • Inge Li Gørtz
  • Frederik Rye Skjoldjensen
  • Hjalte Wedel Vildhøj
  • Søren Vind


Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly repetitive massive data sets such as genomes and web data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near-optimal bounds for these problems. Plugging in our new results, we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.
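To make the setting concrete, the following is a minimal sketch, not the data structures of the paper. It encodes S as (start, length) references into R with a naive greedy parse, and it answers random access queries by locating the phrase that covers a position through partial sums over the phrase lengths. The partial sums are kept in a Fenwick tree, a textbook structure swapped in here purely for illustration. All names (`relative_compress`, `Fenwick`, `access`, `replace_phrase`) are hypothetical, and the sketch assumes every character of S occurs in R. It supports changing the length of an existing phrase but not inserting or deleting phrases.

```python
class Fenwick:
    """Partial sums over positive integers with point updates (illustrative only)."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for i, v in enumerate(values):
            self.update(i, v)

    def update(self, i, delta):
        """Add delta to the value at 0-based index i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def search(self, t):
        """Return (k, offset): the index k of the value covering position t
        and the offset of t inside that value."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= t:
                pos = nxt
                t -= self.tree[nxt]
            step >>= 1
        return pos, t


def relative_compress(R, S):
    """Greedily encode S as (start, length) references to substrings of R.
    Naive quadratic matching; assumes every character of S occurs in R."""
    phrases = []
    i = 0
    while i < len(S):
        best_start, best_len = -1, 0
        for start in range(len(R)):
            length = 0
            while (start + length < len(R) and i + length < len(S)
                   and R[start + length] == S[i + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len == 0:
            raise ValueError(f"character {S[i]!r} does not occur in R")
        phrases.append((best_start, best_len))
        i += best_len
    return phrases


def decompress(R, phrases):
    """Expand the references back into the source string."""
    return "".join(R[s:s + l] for s, l in phrases)


def access(R, phrases, sums, i):
    """Return S[i] without decompressing S."""
    k, offset = sums.search(i)
    start, _ = phrases[k]
    return R[start + offset]


def replace_phrase(phrases, sums, k, new_start, new_len):
    """Point phrase k at a different substring of R and fix the partial sums."""
    _, old_len = phrases[k]
    phrases[k] = (new_start, new_len)
    sums.update(k, new_len - old_len)


R = "ACGTACGGA"
S = "ACGGATACG"
phrases = relative_compress(R, S)
sums = Fenwick([length for _, length in phrases])
```

In this sketch an edit that changes a phrase length costs one Fenwick update, and a random access costs one search, both logarithmic in the number of phrases. Supporting insertions and deletions of phrases requires a partial sums structure that also allows inserting and deleting elements, which is the direction the abstract describes.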



We thank Pawel Gawrychowski for helpful discussions.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Philip Bille (1)
  • Anders Roy Christiansen (1)
  • Patrick Hagge Cording (1)
  • Inge Li Gørtz (1)
  • Frederik Rye Skjoldjensen (1)
  • Hjalte Wedel Vildhøj (1)
  • Søren Vind (1)

  1. Technical University of Denmark, Copenhagen, Denmark
