Advertisement

Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

  • Juha Kärkkäinen
  • Dominik Kempa
  • Simon J. Puglisi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7922)

Abstract

Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in many diverse applications, including data compression, text indexing, and pattern discovery. We describe new linear time LZ factorization algorithms, some of which require only 2nlogn + O(logn) bits of working space to factorize a string of length n. These are the most space efficient linear time algorithms to date, using n logn bits less space than any previous linear time algorithm. The algorithms are also simple to implement, very fast in practice, and amenable to streaming implementation.

Keywords

Linear Time External Memory Linear Time Algorithm Input String Lazy Evaluation 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Al-Hafeedh, A., Crochemore, M., Ilie, L., Kopylova, E., Smyth, W., Tischler, G., Yusufu, M.: A comparison of index-based Lempel-Ziv LZ77 factorization algorithms. ACM Comput. Surv. 45(1), 5:1–5:17 (2012)CrossRefGoogle Scholar
  2. 2.
    Charikar, M., Lehman, E., Liu, D., Panigrhy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Transactions on Information Theory 51(7), 2554–2576 (2005)CrossRefGoogle Scholar
  3. 3.
    Chen, G., Puglisi, S.J., Smyth, W.F.: Fast and practical algorithms for computing all the runs in a string. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 307–315. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  4. 4.
    Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. Information Processing Letters 106(2), 75–80 (2008)MathSciNetzbMATHCrossRefGoogle Scholar
  5. 5.
    Crochemore, M., Ilie, L., Iliopoulos, C.S., Kubica, M., Rytter, W., Waleń, T.: LPF computation revisited. In: Fiala, J., Kratochvíl, J., Miller, M. (eds.) IWOCA 2009. LNCS, vol. 5874, pp. 158–169. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  6. 6.
    Crochemore, M., Ilie, L., Smyth, W.F.: A simple algorithm for computing the Lempel-Ziv factorization. In: DCC 2008, pp. 482–488. IEEE Computer Society (2008)Google Scholar
  7. 7.
    Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  8. 8.
    Gagie, T., Gawrychowski, P., Puglisi, S.J.: Faster approximate pattern matching in compressed repetitive texts. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 653–662. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  9. 9.
    Goto, K., Bannai, H.: Simpler and faster Lempel Ziv factorization. In: DCC 2013, pp. 133–142. IEEE Computer Society (2013)Google Scholar
  10. 10.
    Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Bonifaci, V. (ed.) SEA 2013. LNCS, vol. 7933, pp. 139–150. Springer, Heidelberg (2013)Google Scholar
  11. 11.
    Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. Journal of the ACM 53(6), 918–936 (2006)MathSciNetCrossRefGoogle Scholar
  12. 12.
    Kärkkäinen, J., Manzini, G., Puglisi, S.J.: Permuted longest-common-prefix array. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 181–192. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  13. 13.
    Kempa, D., Puglisi, S.J.: Lempel-Ziv factorization: simple, fast, practical. In: Zeh, N., Sanders, P. (eds.) ALENEX 2013, pp. 103–112. SIAM (2013)Google Scholar
  14. 14.
    Kreft, S., Navarro, G.: Self-indexing based on LZ77. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 41–54. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  15. 15.
    Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), article 2 (2007)Google Scholar
  16. 16.
    Ohlebusch, E., Gog, S.: Lempel-Ziv factorization revisited. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 15–26. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  17. 17.
    Wu, F.: Sequential file prefetching in Linux. In: Wiseman, Y., Jiang, S. (eds.) Advanced Operating Systems and Kernel Applications: Techniques and Technologies, ch. 11, pp. 217–236. IGI Global (2009)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Juha Kärkkäinen
    • 1
  • Dominik Kempa
    • 1
  • Simon J. Puglisi
    • 1
  1. 1.Department of Computer ScienceUniversity of HelsinkiHelsinkiFinland

Personalised recommendations