Pattern Matching in Lempel-Ziv Compressed Strings: Fast, Simple, and Deterministic

  • Paweł Gawrychowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6942)

Abstract

Countless variants of Lempel-Ziv compression are widely used in real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern \(p[1\mathinner{\ldotp\ldotp} m]\) and a Lempel-Ziv representation of a string \(t[1\mathinner{\ldotp\ldotp} N]\), does \(p\) occur in \(t\)? Farach and Thorup [5] gave a randomized \(\mathcal{O}(n\log^2\frac{N}{n}+m)\) time solution for this problem, where \(n\) is the size of the compressed representation of \(t\). Building on the methods of [3] and [6], we improve their result by developing a faster and fully deterministic \(\mathcal{O}(n\log\frac{N}{n}+m)\) time algorithm with the same space complexity. Note that for highly compressible texts, \(\log\frac{N}{n}\) might be of order \(n\), so for such inputs the improvement is very significant. A small fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan [4].
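
To make the problem statement concrete, the following Python sketch shows the naive baseline that compressed pattern matching aims to avoid: expand the text fully and search it, which costs time proportional to \(N\) rather than \(n\). It is not the algorithm of this paper. The phrase format (a literal character, or a `(start, length)` copy from the already decoded prefix) and the names `decode_lz`, `occurs_naive`, and `doubling_phrases` are assumptions made for illustration only.

```python
def decode_lz(phrases):
    """Expand a Lempel-Ziv style factorization into the plain string.

    Assumed phrase format (hypothetical, for illustration):
      - a string literal, appended character by character, or
      - a pair (start, length): copy `length` characters starting at
        0-based position `start` of the text decoded so far.
    Copies may overlap the position being written, so we copy one
    character at a time.
    """
    out = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.extend(phrase)
        else:
            start, length = phrase
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


def occurs_naive(pattern, phrases):
    """Baseline: decompress all N characters, then search."""
    return pattern in decode_lz(phrases)


def doubling_phrases(k):
    """k + 1 phrases that decode to the string a^(2^k)."""
    return ["a"] + [(0, 2 ** i) for i in range(k)]


if __name__ == "__main__":
    phrases = ["a", "b", (0, 2), (1, 3)]
    print(decode_lz(phrases))             # the expanded text
    print(occurs_naive("bba", phrases))   # does the pattern occur?
    # A highly compressible text: n = 11 phrases, N = 1024 characters.
    print(len(doubling_phrases(10)), len(decode_lz(doubling_phrases(10))))
```

The `doubling_phrases` helper illustrates the remark about highly compressible texts: with \(n = k + 1\) phrases the decoded length is \(N = 2^k\), so \(\log\frac{N}{n}\) grows roughly linearly in \(n\), and any method that decompresses first pays time exponential in the input size.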

Keywords

pattern matching, compression, Lempel-Ziv

References

  1. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in Z-compressed files. In: SODA 1994: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 705–714. SIAM, Philadelphia (1994)
  2. Bender, M.A., Farach-Colton, M.: The LCA problem revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)
  3. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the smallest grammar: Kolmogorov complexity in natural models. In: STOC 2002: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, pp. 792–801. ACM, New York (2002)
  4. Farach, M., Muthukrishnan, S.: Perfect hashing for strings: Formalization and algorithms. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 130–140. Springer, Heidelberg (1996)
  5. Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. In: STOC 1995: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, pp. 703–712. ACM, New York (1995)
  6. Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: SODA 2011: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (2011)
  7. Iacono, J., Özkan, Ö.: Mergeable dictionaries. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS, vol. 6198, pp. 164–175. Springer, Heidelberg (2010)
  8. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53(6), 918–936 (2006)
  9. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
  10. Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theor. Comput. Sci. 298, 253–272 (2003)
  11. Kosaraju, S.R.: Pattern matching in compressed texts. In: Thiagarajan, P.S. (ed.) FSTTCS 1995. LNCS, vol. 1026, pp. 349–362. Springer, Heidelberg (1995)
  12. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)
  13. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
  14. Yao, A.C.C.: Lower bounds for algebraic computation trees with integer inputs. In: Proceedings of the 30th Annual Symposium on Foundations of Computer Science, pp. 308–313. IEEE Computer Society, Washington, DC, USA (1989)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Paweł Gawrychowski (1)
  1. Institute of Computer Science, University of Wrocław, Wrocław, Poland
