Simple and Efficient LZW-Compressed Multiple Pattern Matching

  • Paweł Gawrychowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)

Abstract

We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation of a string \(t[1\mathinner{\ldotp\ldotp} N]\) and a collection of (uncompressed) patterns \(p_1, p_2, \ldots\) with \(\sum_i |p_i| = M\), does any \(p_i\) occur in \(t\)? As shown by Kida et al. [12], extending the single pattern algorithm of Amir, Benson and Farach [2] gives a running time of \(\mathcal{O}(n+M^{2})\) for the more general case. We prove that in fact it is possible to achieve \(\mathcal{O}(n\log M+M)\) or \(\mathcal{O}(n+M^{1+\epsilon})\) complexity. While not linear, the running time of our solution matches the single pattern bounds achieved by [2] and [14] in a more structured and unified manner, and without using a lot of combinatorics on words. The only nontrivial components are the suffix array, constant time range minimum queries, and any balanced binary search tree.
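
To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline: decompress the LZW codewords and test each pattern against the plain text. It is not the algorithm of the paper. It touches every character of the decompressed string, whereas the bounds quoted above are stated in terms of \(n\) and \(M\). The function names, the explicit `alphabet` argument, and the codeword numbering are assumptions made for illustration only.

```python
def lzw_compress(text, alphabet):
    """Return the LZW codewords of `text`; every character must be in `alphabet`."""
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            # Keep extending the current phrase while it is in the dictionary.
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # new phrase = old phrase + one char
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Rebuild the text from LZW codewords produced by lzw_compress."""
    if not codes:
        return ""
    table = list(alphabet)
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The codeword refers to the phrase being defined right now,
            # which must be the previous phrase plus its own first character.
            entry = prev + prev[0]
        out.append(entry)
        table.append(prev + entry[0])
        prev = entry
    return "".join(out)


def any_pattern_occurs_naive(codes, alphabet, patterns):
    """Naive baseline (assumption, not the paper's method): decompress, then search."""
    text = lzw_decompress(codes, alphabet)
    return any(p in text for p in patterns)
```

A call such as `any_pattern_occurs_naive(lzw_compress("abababab", "ab"), "ab", ["bb", "aba"])` answers the decision question from the abstract for a toy input. A compressed-domain algorithm aims to answer the same question without materialising the decompressed text.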

Keywords

multiple pattern matching, compression, Lempel-Ziv-Welch

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18, 333–340 (1975)
  2. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in z-compressed files. In: SODA 1994: Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, pp. 705–714. Society for Industrial and Applied Mathematics (1994)
  3. Bender, M.A., Farach-Colton, M.: The LCA Problem Revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)
  4. Bender, M.A., Farach-Colton, M.: The level ancestor problem simplified. Theor. Comput. Sci. 321(1), 5–12 (2004)
  5. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
  6. Driscoll, J.R., Sarnak, N., Sleator, D.D., Tarjan, R.E.: Making data structures persistent. J. Comput. Syst. Sci. 38(1), 86–124 (1989)
  7. Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. In: STOC 1995: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, pp. 703–712. ACM, New York (1995)
  8. Galil, Z.: String matching in real time. J. ACM 28(1), 134–149 (1981)
  9. Galil, Z., Seiferas, J.: Time-space-optimal string matching (preliminary report). In: STOC 1981: Proceedings of the Thirteenth Annual ACM Symposium on Theory of Computing, pp. 106–113. ACM, New York (1981)
  10. Gawrychowski, P.: Optimal pattern matching in LZW compressed strings. In: Randall, D. (ed.) SODA, pp. 362–372. SIAM (2011)
  11. Gawrychowski, P.: Tying up the loose ends in fully LZW-compressed pattern matching. In: Dürr, C., Wilke, T. (eds.) STACS. LIPIcs, vol. 14, pp. 624–635. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2012)
  12. Kida, T., Takeda, M., Shinohara, A., Miyazaki, M., Arikawa, S.: Multiple pattern matching in LZW compressed text. In: Proceedings of Data Compression Conference, DCC 1998, pp. 103–112. IEEE (1998)
  13. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
  14. Kosaraju, S.R.: Pattern Matching in Compressed Texts. In: Thiagarajan, P.S. (ed.) FSTTCS 1995. LNCS, vol. 1026, pp. 349–362. Springer, Heidelberg (1995)
  15. Morris Jr., J.H., Pratt, V.R.: A linear pattern-matching algorithm. Technical Report 40, University of California, Berkeley (1970)
  16. Welch, T.A.: A technique for high-performance data compression. Computer 17(6), 8–19 (1984)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Paweł Gawrychowski (1, 2)
  1. Institute of Computer Science, University of Wrocław, Poland
  2. Max-Planck-Institute für Informatik, Saarbrücken, Germany