Efficient algorithms for Lempel-Ziv encoding
Abstract
We consider several basic problems for texts and show that, if the input texts are given by their Lempel-Ziv codes, the problems can be solved deterministically in time polynomial in the size of the compressed input, even when the original (uncompressed) texts are of exponential size. The growing importance of massively stored information requires new approaches to algorithms that work on compressed texts without decompressing them. Denote by LZ(ω) the version of a string ω produced by the Lempel-Ziv encoding algorithm. For given compressed strings LZ(T) and LZ(P), we give the first known deterministic polynomial-time algorithms to compute compressed representations of the set of all occurrences of the pattern P in T, all periods of T, all palindromes of T, and all squares of T. Then we consider several classical language recognition problems:

regular language recognition: given LZ(T) and a language L described by a regular expression, test if T ∈ L,

extended regular language recognition: given LZ(T) and a language L described by an LZ-compressed regular expression, test if T ∈ L, where the alphabet is unary,

context-free language recognition: given LZ(T) and a language L described by a context-free grammar, test if T ∈ L, where the alphabet is unary.
We show that the first recognition problem has a polynomial-time algorithm and the other two problems are NP-hard.
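One standard way to see why regular language recognition can avoid decompression is to compute, for every piece of the compressed text, the relation "from state q, reading this piece, the automaton can reach state p", and to compose these relations bottom-up. The sketch below is not necessarily the authors' method. It assumes that the text is already given as a straight-line program (SLP), a grammar in which each rule is either a single letter or the concatenation of two earlier rules. Converting LZ(T) into such a program is not shown. It also assumes an ε-free NFA for L, presumably built from the regular expression by a standard construction (also not shown). The names `slp_accepts` and `doubling_slp` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical SLP encoding: a rule is a letter (str) or a pair (y, z) meaning
# "rule y followed by rule z", where y and z are indices of earlier rules.
# The last rule derives the whole text (the list is assumed non-empty).
Rule = Union[str, Tuple[int, int]]


def slp_accepts(rules: List[Rule], states: int,
                delta: Dict[Tuple[int, str], Set[int]],
                start: int, final: Set[int]) -> bool:
    """Test whether an epsilon-free NFA accepts the text derived by `rules`,
    without ever expanding the text."""
    # rel[x][q] = states reachable from q while reading the expansion of rule x
    rel: List[List[FrozenSet[int]]] = []
    for rule in rules:
        if isinstance(rule, str):
            row = [frozenset(delta.get((q, rule), set())) for q in range(states)]
        else:
            y, z = rule
            # Compose: first read expansion of y, then expansion of z.
            row = [frozenset().union(*(rel[z][p] for p in rel[y][q]))
                   for q in range(states)]
        rel.append(row)
    return bool(rel[-1][start] & final)


def doubling_slp(k: int) -> List[Rule]:
    """Hypothetical example: k + 3 rules deriving (ab)^(2^k)."""
    rules: List[Rule] = ["a", "b", (0, 1)]
    for _ in range(k):
        last = len(rules) - 1
        rules.append((last, last))  # X_{i+1} -> X_i X_i doubles the length
    return rules
```

For example, `doubling_slp(40)` has 43 rules but derives a text of length 2^41. The two-state NFA for (ab)* with `delta = {(0, "a"): {1}, (1, "b"): {0}}`, start state 0 and final set {0} accepts it after 43 small set compositions. The cost is polynomial in the number of rules and automaton states, independent of the text length.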
We also show that the LZ encoding can be computed on-line with polynomial time delay and in small space (i.e., space proportional to the size of the compressed text). In addition, the compressed representation of a pattern-matching automaton for the compressed pattern is computed in polynomial time.
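The following minimal sketch makes the notation LZ(ω) concrete. It assumes one common self-referential variant of Lempel-Ziv factorization. In this variant, each factor is either a letter not seen before, or the longest prefix of the remaining text that also begins at an earlier position, stored as (start, length). The copy may overlap the factor itself. The sketch is naive and works on the uncompressed string in polynomial time. It illustrates only what the code looks like, not the on-line, small-space computation described above. The names `lz_factorize` and `lz_decode` are hypothetical.

```python
from typing import List, Tuple, Union

# A factor is a fresh letter (str) or a copy (start, length) of text that
# begins at an earlier position; the copy may overlap the factor itself.
Factor = Union[str, Tuple[int, int]]


def lz_factorize(w: str) -> List[Factor]:
    """Naive self-referential LZ factorization of w."""
    factors: List[Factor] = []
    i, n = 0, len(w)
    while i < n:
        best_len, best_start = 0, 0
        for s in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and w[s + length] == w[i + length]:
                length += 1
            if length > best_len:
                best_len, best_start = length, s
        if best_len == 0:
            factors.append(w[i])  # w[i] does not occur earlier
            i += 1
        else:
            factors.append((best_start, best_len))
            i += best_len
    return factors


def lz_decode(factors: List[Factor]) -> str:
    """Rebuild the string; copying one letter at a time handles overlaps."""
    out: List[str] = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            start, length = f
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)
```

For instance, `lz_factorize("abababab")` gives `["a", "b", (0, 6)]`, and `"a" * 1000` becomes just `["a", (0, 999)]`. This hints at how the uncompressed texts considered here can be exponentially longer than their codes.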
Keywords
Regular Expression, Arithmetic Progression, Composition Rule, Composition System, Language Recognition