Algorithmica

pp 1–18

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

  • Philip Bille
  • Anders Roy Christiansen
  • Patrick Hagge Cording
  • Inge Li Gørtz
  • Frederik Rye Skjoldjensen
  • Hjalte Wedel Vildhøj
  • Søren Vind
Article

Abstract

Given a static reference string R and a source string S, a relative compression of S with respect to R is an encoding of S as a sequence of references to substrings of R. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly repetitive massive data sets such as genomes and web data. We initiate the study of relative compression in a dynamic setting where the compressed source string S is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem, for which we present new optimal or near-optimal bounds. Plugging in these new results, we also immediately obtain new bounds for two further problems: string indexing for patterns with wildcards, and dynamic text and static pattern matching.
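The following Python sketch illustrates, under simplifying assumptions, the two ingredients named in the abstract; it is not the data structure from the paper. `relative_compress` encodes S as (start, length) references into R using a naive greedy longest-match parse (a simple heuristic chosen for illustration, quadratic time). Random access to S[i] then locates the phrase covering position i with a Fenwick tree over the phrase lengths, a classic structure for the dynamic partial sums operations update, sum and search. All function and class names here are hypothetical.

```python
from typing import List, Tuple

Phrase = Tuple[int, int]  # (start position in R, length)


def relative_compress(S: str, R: str) -> List[Phrase]:
    """Greedy left-to-right parse of S into substrings of R (naive, illustrative only)."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(S):
        best_start, best_len = -1, 0
        for j in range(len(R)):
            # Length of the longest common prefix of S[i:] and R[j:].
            k = 0
            while i + k < len(S) and j + k < len(R) and S[i + k] == R[j + k]:
                k += 1
            if k > best_len:
                best_start, best_len = j, k
        if best_len == 0:
            raise ValueError(f"character {S[i]!r} does not occur in R")
        phrases.append((best_start, best_len))
        i += best_len
    return phrases


def decode(phrases: List[Phrase], R: str) -> str:
    """Reconstruct S by concatenating the referenced substrings of R."""
    return "".join(R[start:start + length] for start, length in phrases)


class FenwickTree:
    """Partial sums over n non-negative integers (1-based tree array internally)."""

    def __init__(self, values: List[int]) -> None:
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        for idx, v in enumerate(values):
            self.update(idx, v)

    def update(self, idx: int, delta: int) -> None:
        """Add delta to element idx (0-based)."""
        idx += 1
        while idx <= self.n:
            self.tree[idx] += delta
            idx += idx & -idx

    def prefix_sum(self, idx: int) -> int:
        """Return the sum of elements 0..idx-1."""
        total = 0
        while idx > 0:
            total += self.tree[idx]
            idx -= idx & -idx
        return total

    def search(self, target: int) -> Tuple[int, int]:
        """Return (t, before) with prefix_sum(t) <= target < prefix_sum(t + 1),
        where before = prefix_sum(t). Assumes 0 <= target < total sum."""
        pos, remaining = 0, target
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= remaining:
                pos = nxt
                remaining -= self.tree[nxt]
            step >>= 1
        return pos, target - remaining


def access(phrases: List[Phrase], R: str, sums: FenwickTree, i: int) -> str:
    """Return S[i] without decompressing: find the phrase covering position i."""
    t, before = sums.search(i)
    start, _ = phrases[t]
    return R[start + (i - before)]


if __name__ == "__main__":
    R, S = "abcabd", "abdabc"
    phrases = relative_compress(S, R)  # [(3, 3), (0, 3)]
    sums = FenwickTree([length for _, length in phrases])
    print(decode(phrases, R))  # abdabc
    print(access(phrases, R, sums, 4))  # b
```

In this sketch, an edit that only shortens or lengthens an existing phrase (for example, deleting the last character of S) is a single `update` on the partial sums. An edit inside a phrase, such as substituting a middle character, splits that phrase into up to three phrases, which requires inserting entries into the sequence of lengths; a plain Fenwick tree does not support this, so a fully dynamic version needs a partial sums structure that also allows insertions and deletions.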

Notes

Acknowledgements

We thank Pawel Gawrychowski for helpful discussions.


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • Technical University of Denmark, Copenhagen, Denmark (all authors)
