Computing Lempel-Ziv Factorization Online

  • Tatiana Starikovskaya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7464)

Abstract

We present an algorithm which computes the Lempel-Ziv factorization of a word W of length n on an alphabet Σ of size σ online in the following sense: it reads W starting from the left and, after reading each r = O(log_σ n) characters of W, updates the Lempel-Ziv factorization. The algorithm requires O(n log σ) bits of space and O(n log² n) time. The basis of the algorithm is a sparse suffix tree combined with wavelet trees.
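To make the object being computed concrete, here is a minimal sketch of a naive Lempel-Ziv factorization. It is not the algorithm of the paper: it uses no sparse suffix tree or wavelet trees, it is not online, and it runs in quadratic time or worse. It assumes one common definition, in which each factor is the longest prefix of the remaining text that also occurs starting at an earlier position (overlap allowed), or a single character if no such occurrence exists. The function name `lz_factorize` is illustrative only.

```python
def lz_factorize(w: str) -> list[str]:
    """Naive Lempel-Ziv factorization, for illustration only.

    Assumed definition: each factor is the longest prefix of w[i:] that
    also starts at some earlier position j < i (the two occurrences may
    overlap), or a single fresh character if no such prefix exists.
    """
    factors = []
    n = len(w)
    i = 0
    while i < n:
        longest = 0
        # Compare the suffix starting at i with every earlier suffix.
        for j in range(i):
            k = 0
            while i + k < n and w[j + k] == w[i + k]:
                k += 1
            longest = max(longest, k)
        # A character with no earlier occurrence forms a factor of length 1.
        length = max(longest, 1)
        factors.append(w[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(lz_factorize("abababb"))
```

Under this definition, the word `abababb` factorizes as `a`, `b`, `abab`, `b`: the third factor copies from position 0 and overlaps itself.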

Keywords

IEEE Computer Society; Data Compression; Online Algorithm; Maximal Rank; Internal Vertex
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tatiana Starikovskaya (1)
  1. Lomonosov Moscow State University, Moscow, Russia
