Advertisement

Approximating LZ77 via Small-Space Multiple-Pattern Matching

  • Johannes Fischer
  • Travis Gagie
  • Paweł Gawrychowski
  • Tomasz Kociumaka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9294)

Abstract

We generalize Karp-Rabin string matching to handle multiple patterns in \(\mathcal{O}(n \log n + m)\) time and \(\mathcal{O}(s)\) space, where n is the length of the text and m is the total length of the s patterns, returning correct answers with high probability. As a prime application of our algorithm, we show how to approximate the LZ77 parse of a string of length n. If the optimal parse consists of z phrases, using only \(\mathcal{O}(z)\) working space we can return a parse consisting of at most 2z phrases in \(\mathcal{O}(n\log n)\) time, and a parse of at most (1 + ε)z phrases in \(\mathcal{O}(n\log^{2}n)\) for any constant ε > 0. As previous quasilinear-time algorithms for LZ77 use Ω(n/polylogn) space, but z can be exponentially small in n, these improvements in space consumption are substantial.

Keywords

Hash Table Full Version String Match Pattern Match Algorithm Space Consumption 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Rabin-Karp algorithm — Wikipedia, The Free Encyclopedia. http://en.wikipedia.org/w/index.php?title=Rabin-Karp_algorithm&oldid=665980736
  2. 2.
    Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)MathSciNetCrossRefzbMATHGoogle Scholar
  3. 3.
    Breslauer, D., Grossi, R., Mignosi, F.: Simple real-time constant-space string matching. Theor. Comput. Sci. 483, 2–9 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  4. 4.
    Clifford, R., Fontaine, A., Porat, E., Sach, B., Starikovskaya, T.: Dictionary matching in a stream. In: Bansal, N., Finocchi, I. (eds.) ESA 2015. LNCS, vol. 8737, pp. ??–?? Springer, Heidelberg (2015)Google Scholar
  5. 5.
    Crochemore, M., Perrin, D.: Two-way string matching. J. ACM 38(3), 651–675 (1991)MathSciNetCrossRefzbMATHGoogle Scholar
  6. 6.
    Fine, N.J., Wilf, H.S.: Uniqueness theorems for periodic functions. P. Am. Math. Soc. 16(1), 109–114 (1965)MathSciNetCrossRefzbMATHGoogle Scholar
  7. 7.
    Fischer, J., Gagie, T., Gawrychowski, P., Kociumaka, T.: Approximating LZ77 via small-space multiple-pattern matching. CoRR abs/1504.06647 (2015)Google Scholar
  8. 8.
    Fischer, J., I, T., Köppl, D.: Lempel Ziv Computation in Small Space (LZ-CISS). In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 172–184. Springer, Heidelberg (2015)CrossRefGoogle Scholar
  9. 9.
    Galil, Z., Seiferas, J.I.: Time-space-optimal string matching. J. Comput. Syst. Sci. 26(3), 280–294 (1983)MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Gasieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Efficient algorithms for Lempel-Ziv encoding (extended abstract). In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 392–403. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  11. 11.
    Gum, B., Lipton, R.J.: Cheaper by the dozen: Batched algorithms. In: Kumar, V., Grossman, R.L. (eds.) SDM 2001, pp. 1–11. SIAM, Philadelphia (2001)Google Scholar
  12. 12.
    Hon, W., Ku, T., Shah, R., Thankachan, S.V., Vitter, J.S.: Faster compressed dictionary matching. Theor. Comput. Sci. 475, 113–119 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 139–150. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  14. 14.
    Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)MathSciNetCrossRefzbMATHGoogle Scholar
  15. 15.
    Kociumaka, T., Starikovskaya, T., Vildhøj, H.W.: Sublinear space algorithms for the longest common substring problem. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737, pp. 605–617. Springer, Heidelberg (2014)Google Scholar
  16. 16.
    Lohrey, M.: Algorithmics on SLP-compressed strings: A survey. Groups Complexity Cryptology 4(2), 241–299 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  17. 17.
    Ružić, M.: Constructing efficient dictionaries in close to sorting time. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 84–95. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  18. 18.
    Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)MathSciNetCrossRefzbMATHGoogle Scholar
  19. 19.
    Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory 23(3), 337–343 (1977)MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Johannes Fischer
    • 1
  • Travis Gagie
    • 2
  • Paweł Gawrychowski
    • 3
  • Tomasz Kociumaka
    • 3
  1. 1.TUDortmundGermany
  2. 2.Helsinki Institute for Information Technology (HIIT), Department of Computer ScienceUniversity of HelsinkiHelsinkiFinland
  3. 3.Institute of InformaticsUniversity of WarsawWarsawPoland

Personalised recommendations