Randomized efficient algorithms for compressed strings: the finger-print approach
Denote by LZ(ω) the coded form of a string ω produced by the Lempel-Ziv encoding algorithm. We consider several classical algorithmic problems for texts in the compressed setting. The first of them is equality testing: given LZ(ω) and integers i, j, k, test whether ω[i...i+k] = ω[j...j+k]. We give a simple and efficient randomized algorithm for this problem using the fingerprinting idea. Equality testing is reduced to the equivalence of certain context-free grammars, each generating a single string. Equality testing is the bottleneck in other algorithms for compressed texts. We relate the time complexity of several classical problems for texts to the complexity Eq(n) of equality testing. Assume n = |LZ(T)|, m = |LZ(P)| and U = |T|. Then we can compute the compressed representations of the sets of occurrences of P in T, periods of T, palindromes of T, and squares of T in times O(n log² U · Eq(m) + n² log U), O(n log² U · Eq(n) + n² log U), O(n log² U · Eq(n) + n² log U) and O(n² log³ U · Eq(n) + n³ log² U), respectively, where Eq(n) = O(n log log n). The randomization improves considerably upon the known deterministic algorithms ( and ).
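The fingerprinting idea can be illustrated with a minimal sketch. It assumes a Karp-Rabin style polynomial fingerprint with a random base modulo a fixed prime, and a grammar in which every rule is either a single terminal or a concatenation of two nonterminals. These choices, and all names below (`MODULUS`, `grammar_fingerprints`, `probably_equal`, the example grammars), are illustrative assumptions, not the construction used in the paper. The sketch compares whole generated strings only; it does not convert LZ(ω) into a grammar or handle substrings.

```python
import random

# A large prime modulus (2^61 - 1); an illustrative choice, not from the paper.
MODULUS = (1 << 61) - 1


def grammar_fingerprints(rules, base, modulus=MODULUS):
    """Return {symbol: (length, fingerprint)} for a single-string grammar.

    `rules` maps each nonterminal either to a one-character string (terminal
    rule) or to a pair (left, right) of nonterminals. Rules must be listed so
    that both children of a rule appear before the rule itself.
    """
    info = {}
    for symbol, rhs in rules.items():
        if isinstance(rhs, str):
            info[symbol] = (1, ord(rhs) % modulus)
        else:
            left, right = rhs
            left_len, left_fp = info[left]
            right_len, right_fp = info[right]
            # fp(uv) = fp(u) * base^|v| + fp(v)  (mod modulus)
            shift = pow(base, right_len, modulus)
            info[symbol] = (left_len + right_len,
                            (left_fp * shift + right_fp) % modulus)
    return info


def probably_equal(rules_a, start_a, rules_b, start_b, rng=random):
    """Randomized test of whether two grammars generate the same string.

    Equal strings are always accepted; distinct strings of equal length are
    wrongly accepted only if the random base is a root of their difference
    polynomial modulo MODULUS.
    """
    base = rng.randrange(256, MODULUS)
    len_a, fp_a = grammar_fingerprints(rules_a, base)[start_a]
    len_b, fp_b = grammar_fingerprints(rules_b, base)[start_b]
    return len_a == len_b and fp_a == fp_b


# Hypothetical example grammars.
# FIB_RULES generates the Fibonacci word "abaab".
FIB_RULES = {"F1": "b", "F2": "a", "F3": ("F2", "F1"),
             "F4": ("F3", "F2"), "F5": ("F4", "F3")}
# ALT_RULES generates the same string "abaab" with a different derivation.
ALT_RULES = {"A": "a", "B": "b", "X1": ("A", "B"),
             "X2": ("A", "X1"), "S": ("X1", "X2")}
# OTHER_RULES generates "abaaa", which differs in the last character.
OTHER_RULES = {"A": "a", "B": "b", "X1": ("A", "B"),
               "Y2": ("A", "A"), "Y3": ("A", "Y2"), "T": ("X1", "Y3")}
```

Each rule is processed once with one modular exponentiation, so the generated string is never expanded, even though its length can be exponential in the number of rules. Calling `probably_equal(FIB_RULES, "F5", ALT_RULES, "S")` compares the two derivations of the same string without decompressing either one.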
Keywords: Arithmetic Progression, Active Point, Parse Tree, Derivation Tree, Terminal Symbol
- 1. A. Amir, G. Benson and M. Farach, Let sleeping files lie: pattern-matching in Z-compressed files, in SODA'94.
- 2. A. Amir, G. Benson, Efficient two dimensional compressed matching, Proc. of the 2nd IEEE Data Compression Conference, 279–288 (1992).
- 3. A. Amir, G. Benson and M. Farach, Optimal two-dimensional compressed matching, in ICALP'94.
- 4. A. Apostolico, D. Breslauer, Z. Galil, Optimal parallel algorithms for periods, palindromes and squares, in ICALP'92, 296–307.
- 5. M. Farach and M. Thorup, String matching in Lempel-Ziv compressed strings, in STOC'95, pp. 703–712.
- 6. R.M. Karp and M. Rabin, Efficient randomized pattern matching algorithms, IBM Journal of Research and Dev. 31, pp. 249–260 (1987).
- 7. M. Karpinski, W. Plandowski and W. Rytter, The fully compressed string matching for Lempel-Ziv encoding, Technical Report, Institute of Informatics, Bonn University (1995).
- 8. M. Karpinski, W. Rytter and A. Shinohara, Pattern-matching for strings with short description, in Combinatorial Pattern Matching, 1995.
- 9. D. Knuth, The Art of Computer Programming, Vol. II: Seminumerical Algorithms. Second edition. Addison-Wesley (1981).
- 10. A. Lempel and J. Ziv, On the complexity of finite sequences, IEEE Trans. on Inf. Theory 22, 75–81 (1976).
- 11. W. Plandowski, Testing equivalence of morphisms on context-free languages, ESA'94, Lecture Notes in Computer Science 855, Springer-Verlag, 460–470 (1994).
- 12. J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. on Inf. Theory 17, 8–19, 1984.