Randomized efficient algorithms for compressed strings: the finger-print approach

Extended abstract
  • Leszek Gasieniec
  • Marek Karpinski
  • Wojciech Plandowski
  • Wojciech Rytter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1075)

Abstract

Denote by LZ(ω) the coded form of a string ω produced by the Lempel-Ziv encoding algorithm. We consider several classical algorithmic problems for texts in the compressed setting. The first of them is equality testing: given LZ(ω) and integers i, j, k, test the equality ω[i...i+k] = ω[j...j+k]. We give a simple and efficient randomized algorithm for this problem using the finger-printing idea. Equality testing is reduced to the equivalence of certain context-free grammars generating single strings. Equality testing is the bottleneck in other algorithms for compressed texts. We relate the time complexity of several classical problems for texts to the complexity Eq(n) of equality testing. Assume n = |LZ(T)|, m = |LZ(P)| and U = |T|. Then we can compute the compressed representations of the sets of occurrences of P in T, periods of T, palindromes of T, and squares of T in times O(n log^2 U · Eq(m) + n^2 log U), O(n log^2 U · Eq(n) + n^2 log U), O(n log^2 U · Eq(n) + n^2 log U) and O(n^2 log^3 U · Eq(n) + n^3 log^2 U), respectively, where Eq(n) = O(n log log n). The randomization improves considerably upon the known deterministic algorithms ([7] and [8]).
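
The finger-printing idea can be illustrated on the grammar side of the reduction. The sketch below is not the algorithm of the paper: it is a plain Karp-Rabin style fingerprint (in the spirit of [6]) evaluated bottom-up on a grammar that generates a single string, so two such grammars can be compared without expanding them. The grammar encoding (a dict of terminal and binary rules), the function names `fingerprint_slp` and `probably_equal`, and the fixed modulus 2^61 - 1 are all assumptions made for this illustration.

```python
import random

# Fixed prime modulus (2^61 - 1); an assumption of this sketch, not from the paper.
MERSENNE_P = (1 << 61) - 1


def fingerprint_slp(rules, start, base, mod=MERSENNE_P):
    """Return (length, fingerprint) of the single string derived from `start`.

    `rules` maps each nonterminal to either a one-character string (terminal
    rule) or a pair (left, right) of nonterminals (concatenation rule).
    The derived string is never expanded; each nonterminal is visited once.
    """
    memo = {}

    def visit(sym):
        if sym in memo:
            return memo[sym]
        rhs = rules[sym]
        if isinstance(rhs, str):
            result = (1, ord(rhs) % mod)
        else:
            left_len, left_fp = visit(rhs[0])
            right_len, right_fp = visit(rhs[1])
            # fp(uv) = fp(u) * base^{|v|} + fp(v)  (mod p)
            shifted = left_fp * pow(base, right_len, mod)
            result = (left_len + right_len, (shifted + right_fp) % mod)
        memo[sym] = result
        return result

    # Recursive for brevity; very deep grammars would need an explicit stack.
    return visit(start)


def probably_equal(rules_a, start_a, rules_b, start_b, rng=random):
    """One-sided-error test: equal strings always yield True."""
    base = rng.randrange(2, MERSENNE_P - 1)  # random evaluation point
    return (fingerprint_slp(rules_a, start_a, base)
            == fingerprint_slp(rules_b, start_b, base))


# Two different grammars for "abababab", and one for "abababaa".
doubling = {"A": "a", "B": "b", "C": ("A", "B"), "D": ("C", "C"), "E": ("D", "D")}
chained = {"A": "a", "B": "b", "C": ("A", "B"),
           "T2": ("C", "C"), "T3": ("C", "T2"), "T4": ("C", "T3")}
altered = {"A": "a", "B": "b", "C": ("A", "B"), "C2": ("A", "A"),
           "D": ("C", "C"), "D2": ("C", "C2"), "E3": ("D", "D2")}

if __name__ == "__main__":
    print(probably_equal(doubling, "E", chained, "T4"))
    print(probably_equal(doubling, "E", altered, "E3"))
```

Equal strings always receive equal fingerprints, so the test can only err by declaring two different strings equal, and this happens with small probability over the random choice of `base`. Testing equality of arbitrary substrings ω[i...i+k] and ω[j...j+k] given only LZ(ω) requires additional machinery that this sketch does not attempt.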

Keywords

Arithmetic Progression, Active Point, Parse Tree, Derivation Tree, Terminal Symbol
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. A. Amir, G. Benson and M. Farach, Let sleeping files lie: pattern-matching in Z-compressed files, in SODA'94.
  2. A. Amir, G. Benson, Efficient two dimensional compressed matching, Proc. of the 2nd IEEE Data Compression Conference, 279–288 (1992).
  3. A. Amir, G. Benson and M. Farach, Optimal two-dimensional compressed matching, in ICALP'94.
  4. A. Apostolico, D. Breslauer, Z. Galil, Optimal parallel algorithms for periods, palindromes and squares, in ICALP'92, 296–307.
  5. M. Farach and M. Thorup, String matching in Lempel-Ziv compressed strings, in STOC'95, pp. 703–712.
  6. R.M. Karp and M. Rabin, Efficient randomized pattern matching algorithms, IBM Journal of Research and Dev. 31, pp. 249–260 (1987).
  7. M. Karpinski, W. Plandowski and W. Rytter, The fully compressed string matching for Lempel-Ziv encoding. Technical Report, Institute of Informatics, Bonn University (1995).
  8. M. Karpinski, W. Rytter and A. Shinohara, Pattern-matching for strings with short description, in Combinatorial Pattern Matching, 1995.
  9. D. Knuth, The Art of Computer Programming, Vol. II: Seminumerical Algorithms. Second edition. Addison-Wesley (1981).
  10. A. Lempel and J. Ziv, On the complexity of finite sequences, IEEE Trans. on Inf. Theory 22, 75–81 (1976).
  11. W. Plandowski, Testing equivalence of morphisms on context-free languages, ESA'94, Lecture Notes in Computer Science 855, Springer-Verlag, 460–470 (1994).
  12. J. Ziv and A. Lempel, A universal algorithm for sequential data compression, IEEE Trans. on Inf. Theory 17, 8–19, 1984.

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Leszek Gasieniec (1)
  • Marek Karpinski (2)
  • Wojciech Plandowski (3)
  • Wojciech Rytter (3)

  1. Max-Planck Institut für Informatik, Saarbrücken, Germany
  2. Dept. of Computer Science, University of Bonn, Bonn, Germany
  3. Instytut Informatyki, Uniwersytet Warszawski, Warszawa, Poland