Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

  • Philip Bille
  • Rolf Fagerberg
  • Inge Li Gørtz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4580)

Abstract

We study the approximate string matching and regular expression matching problems for the case where the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which is of crucial importance because space is likely to be the bottleneck in practical applications.
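To make the problem setting concrete, here is a minimal sketch of the naive baseline: parse a text with a ZL78-style dictionary scheme, decompress it fully, and then run the classical dynamic-programming scan for approximate occurrences. This is not the algorithm of the paper, which avoids exactly this full decompression; the function names and the choice of the ZL78 variant are assumptions made for illustration only.

```python
def lz78_compress(text):
    """Parse text into ZL78 phrases (index of a previous phrase, next character)."""
    dictionary = {"": 0}  # phrase -> phrase number; 0 is the empty phrase
    phrases = []
    current = ""
    for ch in text:
        if current + ch in dictionary:
            current += ch  # keep extending the longest known phrase
        else:
            phrases.append((dictionary[current], ch))
            dictionary[current + ch] = len(dictionary)
            current = ""
    if current:
        # Leftover input equals a known phrase; emit it as prefix + last character.
        phrases.append((dictionary[current[:-1]], current[-1]))
    return phrases


def lz78_decompress(phrases):
    """Rebuild the text by expanding each phrase from the phrase table."""
    table = [""]
    out = []
    for ref, ch in phrases:
        phrase = table[ref] + ch
        table.append(phrase)
        out.append(phrase)
    return "".join(out)


def approximate_matches(pattern, text, k):
    """Return 1-based end positions j such that some substring of text
    ending at j is within edit distance k of pattern (column-wise DP)."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(prev_diag + (pattern[i - 1] != tc),  # match or substitute
                         old + 1,                             # extra text character
                         col[i - 1] + 1)                      # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


# Naive use: decompress everything, then search the plain text.
phrases = lz78_compress("abracadabra")
hits = approximate_matches("cada", lz78_decompress(phrases), 1)
```

The baseline needs space proportional to the uncompressed text plus the pattern, which is the cost that searching directly on the compressed representation aims to reduce.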

Keywords

Pattern Match, Regular Expression, Match Problem, Edit Distance, Compression Scheme
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Philip Bille (1)
  • Rolf Fagerberg (2)
  • Inge Li Gørtz (3)
  1. IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S, Denmark
  2. University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
  3. Technical University of Denmark, Informatics and Mathematical Modelling, Building 322, 2800 Kgs. Lyngby, Denmark
