Skip to main content

Processing Compressed Texts: A Tractability Border

  • Conference paper
Combinatorial Pattern Matching (CPM 2007)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4580))

Included in the following conference series:

Abstract

What kind of operations can we perform effectively (without full unpacking) with compressed texts? In this paper we consider three fundamental problems: (1) check the equality of two compressed texts, (2) check whether one compressed text is a substring of another compressed text, and (3) compute the number of different symbols (Hamming distance) between two compressed texts of the same length.

We present an algorithm that solves the first problem in O(n 3) time and the second problem in O(n 2 m) time. Here n is the size of compressed representation (we consider representations by straight-line programs) of the text and m is the size of compressed representation of the pattern. Next, we prove that the third problem is actually #P-complete. Thus, we indicate a pair of similar problems (equivalence checking, Hamming distance computation) that have radically different complexity on compressed texts. Our algorithmic technique used for problems (1) and (2) helps for computing minimal periods and covers of compressed texts.

Support by grants INTAS 04-77-7173 and NSh-8464.2006.1 is gratefully acknowledged.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in Z-compressed files. In: SODA 1994 (1994)

    Google Scholar 

  2. Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. In: Dehne, F., Sack, J.-R., Smid, M. (eds.) WADS 2003. LNCS, vol. 2748, pp. 340–352. Springer, Heidelberg (2003)

    Google Scholar 

  3. Apostolico, A., Farach, M., Iliopoulos, C.S.: Optimal superprimitivity testing for strings. Inf. Process. Lett. 39(1), 17–20 (1991)

    Article  MATH  MathSciNet  Google Scholar 

  4. Berman, P., Karpinski, M., Larmore, L.L., Plandowski, W., Rytter, W.: On the complexity of pattern matching for highly compressed two-dimensional texts. Journal of Computer and Systems Science 65(2), 332–350 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  5. Cegielski, P., Guessarian, I., Lifshits, Y., Matiyasevich, Y.: Window subsequence problems for compressed texts. In: Grigoriev, D., Harrison, J., Hirsch, E.A. (eds.) CSR 2006. LNCS, vol. 3967, Springer, Heidelberg (2006)

    Google Scholar 

  6. Farach, M., Thorup, M.: String matching in lempel-ziv compressed strings. In: STOC 1995, pp. 703–712. ACM Press, New York (1995)

    Chapter  Google Scholar 

  7. Garey, M., Johnson, D.: Computers and Intractability: a Guide to the Theory of NP-completeness. Freeman (1979)

    Google Scholar 

  8. Ga̧sieniec, L., Karpinski, M., Plandowski, W., Rytter, W.: Efficient algorithms for Lempel-Ziv encoding (extended abstract). In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 392–403. Springer, Heidelberg (1996)

    Google Scholar 

  9. Genest, B., Muscholl, A.: Pattern matching and membership for hierarchical message sequence charts. In: Rajsbaum, S. (ed.) LATIN 2002. LNCS, vol. 2286, pp. 326–340. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  10. Hirao, M., Shinohara, A., Takeda, M., Arikawa, S.: Fully compressed pattern matching algorithm for balanced straight-line programs. In: SPIRE 2000, pp. 132–138. IEEE Computer Society Press, Los Alamitos (2000)

    Google Scholar 

  11. Kärkkäinen, J., Navarro, G., Ukkonen, E.: Approximate string matching over Ziv-Lempel compressed text. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 195–209. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  12. Karpinski, M., Rytter, W., Shinohara, A.: Pattern-matching for strings with short descriptions. In: Galil, Z., Ukkonen, E. (eds.) Combinatorial Pattern Matching. LNCS, vol. 937, pp. 205–214. Springer, Heidelberg (1995)

    Google Scholar 

  13. Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage system: a unifying framework for compressed pattern matching. Theor. Comput. Sci. 298(1), 253–272 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  14. Lasota, S., Rytter, W.: Faster algorithm for bisimulation equivalence of normed context-free processes. In: Královič, R., Urzyczyn, P. (eds.) MFCS 2006. LNCS, vol. 4162, pp. 646–657. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  15. Lifshits, Y.: Algorithmic properties of compressed texts. Technical Report PDMI, 23/2006 (2006)

    Google Scholar 

  16. Lifshits, Y., Lohrey, M.: Quering and embedding compressed texts. In: Královič, R., Urzyczyn, P. (eds.) MFCS 2006. LNCS, vol. 4162, pp. 681–692. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  17. Lohrey, M.: Word problems on compressed word. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 906–918. Springer, Heidelberg (2004)

    Google Scholar 

  18. Miyazaki, M., Shinohara, A., Takeda, M.: An improved pattern matching algorithm for strings in terms of straight line programs. In: Hein, J., Apostolico, A. (eds.) Combinatorial Pattern Matching. LNCS, vol. 1264, pp. 1–11. Springer, Heidelberg (1997)

    Google Scholar 

  19. Navarro, G.: Regular expression searching on compressed text. J. of Discrete Algorithms 1(5-6), 423–443 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  20. Navarro, G., Raffinot, M.: A general practical approach to pattern matching over Ziv-Lempel compressed text. In: Crochemore, M., Paterson, M.S. (eds.) Combinatorial Pattern Matching. LNCS, vol. 1645, pp. 14–36. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  21. Pagourtzis, A., Zachos, S.: The complexity of counting functions with easy decision version. In: Královič, R., Urzyczyn, P. (eds.) MFCS 2006. LNCS, vol. 4162, pp. 741–752. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  22. Plandowski, W.: Testing equivalence of morphisms on context-free languages. In: van Leeuwen, J. (ed.) ESA 1994. LNCS, vol. 855, pp. 460–470. Springer, Heidelberg (1994)

    Chapter  Google Scholar 

  23. Plandowski, W.: Satisfiability of word equations with constants is in PSPACE. J. ACM 51(3), 483–496 (2004)

    Article  MathSciNet  Google Scholar 

  24. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  25. Rytter, W.: Grammar compression, LZ-encodings, and string algorithms with implicit input. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 15–27. Springer, Heidelberg (2004)

    Google Scholar 

  26. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Bin Ma Kaizhong Zhang

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lifshits, Y. (2007). Processing Compressed Texts: A Tractability Border. In: Ma, B., Zhang, K. (eds) Combinatorial Pattern Matching. CPM 2007. Lecture Notes in Computer Science, vol 4580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73437-6_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-73437-6_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73436-9

  • Online ISBN: 978-3-540-73437-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics