Lempel-Ziv Factorization Revisited

  • Enno Ohlebusch
  • Simon Gog
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6661)

Abstract

For 30 years, the Lempel-Ziv factorization of a string has played an important role in data compression, and more recently it has been used as the basis of linear-time algorithms for the detection of all maximal repetitions (runs) in a string. In this paper, we present two new linear-time algorithms: the first is the fastest and the second is the most space-efficient among all LZ-factorization algorithms known so far.
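
For readers unfamiliar with the term, the following is a minimal sketch of what an LZ factorization is. It is not either of the two algorithms presented in the paper. It assumes the common variant in which each factor is either a character that has not occurred before or the longest prefix of the remaining text that also starts at an earlier position (overlap with the current position allowed). The function name `lz_factorize` is hypothetical, and the brute-force search is far from linear time; it only illustrates the definition.

```python
def lz_factorize(s: str) -> list[str]:
    """Naive LZ factorization (assumed variant, see lead-in); illustration only."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        longest = 0
        # Try every earlier start j < i; the match may run past position i.
        for j in range(i):
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            longest = max(longest, k)
        # A character with no earlier occurrence forms a factor of length 1.
        length = max(longest, 1)
        factors.append(s[i:i + length])
        i += length
    return factors


print(lz_factorize("aaaa"))  # ['a', 'aaa']
```

On the input "aaaa" the second factor "aaa" copies from position 0 and overlaps itself, which is why the inner loop compares against the whole string rather than only the prefix before position i.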

Keywords

Data Compression; Online Algorithm; Linear Time Algorithm; Maximal Repetition; Edge Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Enno Ohlebusch (1)
  • Simon Gog (1)

  1. Institute of Theoretical Computer Science, University of Ulm, Germany
