Efficient algorithms for Lempel-Ziv encoding

  • Leszek Gasieniec
  • Marek Karpinski
  • Wojciech Plandowski
  • Wojciech Rytter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1097)

Abstract

We consider several basic problems for texts and show that if the input texts are given by their Lempel-Ziv codes, then the problems can be solved deterministically in polynomial time, even when the original (uncompressed) texts are of exponential size. The growing importance of massively stored information requires new approaches to algorithms that work on compressed texts without decompressing them. Denote by LZ(ω) the version of a string ω produced by the Lempel-Ziv encoding algorithm. For given compressed strings LZ(T) and LZ(P), we give the first known deterministic polynomial-time algorithms to compute compressed representations of the set of all occurrences of the pattern P in T, all periods of T, all palindromes of T, and all squares of T. Then we consider several classical language recognition problems:

  • regular language recognition: given LZ(T) and a language L described by a regular expression, test if T ∈ L,

  • extended regular language recognition: given LZ(T) and a language L described by an LZ-compressed regular expression, test if T ∈ L, where the alphabet is unary,

  • context-free language recognition: given LZ(T) and a language L described by a context-free grammar, test if T ∈ L, where the alphabet is unary.

We show that the first recognition problem has a polynomial time algorithm and the other two problems are NP-hard.
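
To illustrate why regular language recognition can stay polynomial in the size of the compressed input, the following sketch tests membership without ever expanding the text. It is not the algorithm of the paper: as a simplifying assumption, the compressed text is given as a straight-line grammar (each rule is a single letter or the concatenation of two earlier rules) rather than as an LZ code, and the language is given as a DFA rather than a regular expression. The names `accepts_compressed`, `rules`, `delta`, and the rule names `X0`, `X1`, ... are hypothetical. For every rule, the sketch stores the state-to-state map that the rule's expansion induces on the DFA, and composes these maps bottom-up.

```python
from typing import Dict, List, Set, Tuple, Union

# A rule is either one terminal character or a pair of earlier rule names.
Rule = Union[str, Tuple[str, str]]


def accepts_compressed(
    rules: Dict[str, Rule],
    start: str,
    delta: List[Dict[str, int]],
    q0: int,
    accepting: Set[int],
) -> bool:
    """Decide whether the text generated by `start` is accepted by the DFA.

    `rules` must list every rule after the rules it refers to
    (dict insertion order is used as the evaluation order).
    `delta[q][c]` is the DFA state reached from state q on letter c.
    """
    num_states = len(delta)
    effect: Dict[str, Tuple[int, ...]] = {}
    for name, rhs in rules.items():
        if isinstance(rhs, str):
            # Terminal rule: the map is one DFA step on that letter.
            effect[name] = tuple(delta[q][rhs] for q in range(num_states))
        else:
            # Concatenation: apply the left map first, then the right map.
            left, right = rhs
            f, g = effect[left], effect[right]
            effect[name] = tuple(g[f[q]] for q in range(num_states))
    return effect[start][q0] in accepting


# DFA over the unary alphabet {a} accepting words of even length.
even_length = [{"a": 1}, {"a": 0}]

# Grammar in which X_i generates the word a^(2^i).
doubling: Dict[str, Rule] = {"X0": "a"}
for i in range(1, 41):
    doubling[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")

print(accepts_compressed(doubling, "X40", even_length, 0, {0}))  # True
print(accepts_compressed(doubling, "X0", even_length, 0, {0}))   # False
```

Here the text generated by `X40` has length 2^40, yet the test performs only 41 map compositions, each of cost proportional to the number of DFA states.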

We also show that the LZ encoding can be computed on-line with polynomial-time delay and in small space (i.e. proportional to the size of the compressed text). In addition, the compressed representation of a pattern-matching automaton for the compressed pattern can be computed in polynomial time.
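
As a concrete picture of what an LZ code looks like, here is a minimal sketch of one common LZ77-style factorization, under the assumption that each factor is either a single new letter or the longest prefix of the remaining text that also starts at an earlier position (self-overlapping copies allowed). The exact variant used in the paper may differ, and this naive version keeps the whole text in memory and takes cubic time in the worst case, so it is not the on-line, small-space algorithm mentioned above. The function names `lz_factorize` and `lz_decode` are hypothetical.

```python
from typing import List, Tuple, Union

# A factor is a literal character or a (source_position, length) copy.
Factor = Union[str, Tuple[int, int]]


def lz_factorize(text: str) -> List[Factor]:
    """Greedy left-to-right factorization into literals and copies."""
    factors: List[Factor] = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position and keep the longest match.
        for src in range(i):
            length = 0
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            factors.append(text[i])  # letter not seen before
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz_decode(factors: List[Factor]) -> str:
    """Rebuild the text; copying letter by letter handles overlapping copies."""
    out: List[str] = []
    for factor in factors:
        if isinstance(factor, str):
            out.append(factor)
        else:
            src, length = factor
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


print(lz_factorize("abab"))      # ['a', 'b', (0, 2)]
print(lz_factorize("a" * 1000))  # ['a', (0, 999)]
```

The second call shows the gap between the two sizes: a text of 1000 letters is described by two factors, and the numbers inside them need only logarithmically many bits.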

Keywords

Regular Expression; Arithmetic Progression; Composition Rule; Composition System; Language Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Leszek Gasieniec (1)
  • Marek Karpinski (2)
  • Wojciech Plandowski (3)
  • Wojciech Rytter (3)
  1. Max-Planck Institut für Informatik, Im Stadtwald, Saarbrücken, Germany
  2. Dept. of Computer Science, University of Bonn, Bonn, Germany
  3. Instytut Informatyki, Uniwersytet Warszawski, Warszawa, Poland