Efficient algorithms for Lempel-Ziv encoding
We consider several basic problems for texts and show that if the input texts are given by their Lempel-Ziv codes then the problems can be solved deterministically in polynomial time in the case when the original (uncompressed) texts are of exponential size. The growing importance of massively stored information requires new approaches to algorithms for compressed texts without decompressing. Denote by LZ(ω) the version of a string ω produced by Lempel-Ziv encoding algorithm. For given compressed strings LZ(T), LZ(P) we give the first known deterministic polynomial time algorithms to compute compressed representations of the set of all occurrences of the patternP in T, all periods of T, all palindromes of T, and all squares of T. Then we consider several classical language recognition problems:
regular language recognition: given LZ(T) and a language L described by a regular expression, test if T ε L,
extended regular language recognition: given LZ(T) and a language L described by a LZ-compressed regular expression, test if T ε L, the alphabet is unary,
context-free language recognition: given LZ(T) and a language L described by a context-free grammar, test if T ε L, the alphabet is unary.
We show that the first recognition problem has a polynomial time algorithm and the other two problems are NP-hard.
We show also that the LZ encoding can be computed on-line in polynomial time delay and small space (i.e. proportional to the size of the compressed text). Also the compressed representation of a patternmatching automaton for the compressed pattern is computed in polynomial time.
KeywordsRegular Expression Arithmetic Progression Composition Rule Composition System Language Recognition
Unable to display preview. Download preview PDF.
- 1.A. Amir, G. Benson and M. Farach, Let sleeping files lie: pattern-matching in Z-compressed files, in SODA '94.Google Scholar
- 2.A. Amir, G. Benson, Efficient two dimensional compressed matching, Proc. of the 2nd IEEE Data Compression Conference 279–288 (1992).Google Scholar
- 3.A. Amir, G. Benson and M. Farach, Optimal two-dimensional compressed matching, in ICALP'94.Google Scholar
- 4.A. Apostolico, D. Breslauer, Z. Galil, Optimal parallel algorithms for periods, palindromes and squares, in ICALP'92, pp. 296–307.Google Scholar
- 5.M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press, New York (1994).Google Scholar
- 6.M. Farach and M. Thorup, String matching in Lempel-Ziv compressed strings, in STOC'95, pp. 703–712.Google Scholar
- 7.M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman (1979).Google Scholar
- 8.L. Gasieniec, M. Karpiński, W. Plandowski and W. Rytter, Randomized Efficient Algorithms for Compressed Strings: the Finger-Print Approach, to appear in proceedings of the 7th Combinatorial Pattern Matching, Laguna Beach (1996).Google Scholar
- 9.M. Karpinski, W. Rytter and A. Shinohara, Pattern-matching for strings with short description, in Combinatorial Pattern Matching, 1995.Google Scholar
- 10.D. Knuth, The Art of Computing, Vol. II: Seminumerical Algorithms. Second edition. Addison-Wesley, 1981.Google Scholar
- 11.A. Lempel and J. Ziv, On the complexity of finite sequences, IEEE Trans. on Inf. Theory 22, 75–81 (1976).Google Scholar
- 12.W. Plandowski, Testing equivalence of morphisms on context-free languages, ESA'94, Lecture Notes in Computer Science 855, Springer-Verlag, 460–470 (1994).Google Scholar
- 13.J. Storer, Data compression: methods and theory, Computer Science Press, Rockville, Maryland, 1988.Google Scholar
- 14.J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. on Inf. Theory vo. IT-23(3), 337–343, 1977.Google Scholar