Beating \(\mathcal{O}(nm)\) in Approximate LZW-Compressed Pattern Matching

  • Paweł Gawrychowski
  • Damian Straszak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8283)

Abstract

Given an LZW/LZ78-compressed text, we want to find an approximate occurrence of a given pattern of length m. The goal is to achieve a time complexity that depends on the size n of the compressed representation of the text rather than on the length of the text itself. We consider two specific definitions of approximate matching, namely the Hamming distance and the edit distance, and show how to achieve \(\mathcal{O}(n\sqrt{m}k^{2})\) and \(\mathcal{O}(n\sqrt{m}k^{3})\) running time, respectively, where k is the bound on the distance, both in linear space. Even for very small values of k, the best previously known solutions required \(\Omega(nm)\) time. Our main contribution is applying a periodicity-based argument in a way that remains computationally effective even when operating on a compressed representation of a string, whereas the previous solutions were based either on dynamic programming or on a black-box application of tools developed for uncompressed strings.
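To make the problem statement concrete, the following is a minimal sketch of the naive baseline: decompress the text, then scan it for occurrences within Hamming or edit distance k. It is not the algorithm of the paper, whose point is to avoid exactly this dependence on the uncompressed length. All function names are hypothetical, and the sketch assumes an LZ78-style parse into (parent phrase, character) pairs rather than the LZW code stream.

```python
def lz78_compress(text):
    """Parse text into LZ78 phrases (parent_index, char); index 0 is the empty phrase."""
    trie = {}      # (parent_index, char) -> phrase index
    phrases = []
    node = 0
    for ch in text:
        nxt = trie.get((node, ch))
        if nxt is not None:
            node = nxt                      # keep extending the current phrase
        else:
            phrases.append((node, ch))      # new phrase = known phrase + one char
            trie[(node, ch)] = len(phrases)
            node = 0
    if node != 0:
        phrases.append((node, ""))          # text ended inside a known phrase
    return phrases


def lz78_decompress(phrases):
    """Expand every phrase explicitly (this is what a compressed-domain method avoids)."""
    expanded = [""]
    for parent, ch in phrases:
        expanded.append(expanded[parent] + ch)
    return "".join(expanded[1:])


def hamming_occurrences(text, pattern, k):
    """Start positions i where text[i:i+m] differs from pattern in at most k places."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        if mismatches <= k:
            result.append(i)
    return result


def edit_occurrence_ends(text, pattern, k):
    """End positions j where some substring of text ending at j is within
    edit distance k of pattern (classic column-by-column dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))    # col[i]: best distance for pattern[:i]; col[0] stays 0
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                          # skip text character c
                      col[i - 1] + 1,                      # skip pattern[i-1]
                      prev_diag + (pattern[i - 1] != c))   # match or substitute
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends


text = "abababcabab"
phrases = lz78_compress(text)
plain = lz78_decompress(phrases)
print(len(phrases), hamming_occurrences(plain, "abc", 1), edit_occurrence_ends(plain, "abc", 0))
```

Both scans take time proportional to the uncompressed length times m, which is the kind of cost the bounds stated in terms of n are meant to beat.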

Keywords

approximate pattern matching, edit distance, Lempel-Ziv

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Paweł Gawrychowski (1)
  • Damian Straszak (2)
  1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
  2. Institute of Computer Science, University of Wrocław, Poland
