Analyzing Relative Lempel-Ziv Reference Construction

  • Travis Gagie
  • Simon J. Puglisi
  • Daniel Valenzuela
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9954)

Abstract

Relative Lempel-Ziv is a popular algorithm designed to compress sets of strings relative to a given reference string, which acts as a kind of dictionary. It can still be applied when there is no obvious natural reference string for a dataset, by sampling substrings from the dataset and concatenating them to obtain an artificial reference. This works well in practice, but a theoretical analysis has been lacking. In this paper we provide such an analysis and verify it experimentally.
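The two ideas in the abstract, parsing a string relative to a reference and building an artificial reference from sampled substrings, can be illustrated with a small Python sketch. It is not the authors' implementation: `build_reference`, `rlz_parse`, `rlz_decode`, and the parameters `sample_len` and `num_samples` are hypothetical names, and choosing sample start positions uniformly at random is an assumption made here for simplicity. The parse is greedy, emitting at each step the longest prefix of the remaining text that occurs in the reference as a (position, length) pair, or a single literal character when no prefix matches.

```python
import random
from typing import List, Tuple

# A factor is either (position_in_reference, length) for a copied substring,
# or (character, 0) for a literal character the reference does not contain.
Factor = Tuple[object, int]


def build_reference(strings: List[str], sample_len: int, num_samples: int,
                    seed: int = 0) -> str:
    """Build an artificial reference by concatenating sampled substrings.

    Assumption: sample start positions are drawn uniformly at random from the
    concatenated dataset; sample_len and num_samples are hypothetical parameters.
    """
    rng = random.Random(seed)
    data = "".join(strings)
    if len(data) <= sample_len:
        return data  # dataset too small to sample from; use it whole
    pieces = []
    for _ in range(num_samples):
        start = rng.randrange(len(data) - sample_len + 1)
        pieces.append(data[start:start + sample_len])
    return "".join(pieces)


def rlz_parse(text: str, reference: str) -> List[Factor]:
    """Greedy RLZ parse of text relative to reference (naive search)."""
    factors: List[Factor] = []
    i = 0
    while i < len(text):
        best_pos, best_len = -1, 0
        # Any prefix of an occurring string also occurs, so extending one
        # character at a time until the search fails finds the longest match.
        length = 1
        while i + length <= len(text):
            pos = reference.find(text[i:i + length])
            if pos < 0:
                break
            best_pos, best_len = pos, length
            length += 1
        if best_len == 0:
            factors.append((text[i], 0))  # literal: character absent from reference
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(factors: List[Factor], reference: str) -> str:
    """Rebuild the text by copying each factor out of the reference."""
    out = []
    for a, length in factors:
        out.append(a if length == 0 else reference[a:a + length])
    return "".join(out)
```

For example, `rlz_parse("abracadabra", "abracad")` yields two factors, `(0, 7)` and `(0, 4)`, and `rlz_decode` recovers the original string. The repeated `str.find` calls are for readability only; practical implementations usually index the reference (for example with a suffix array) so that each extension does not rescan it.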

Keywords

Binary String, Compression Algorithm, Internal Memory, Suffix Tree, Pruning Strategy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Travis Gagie (1)
  • Simon J. Puglisi (1)
  • Daniel Valenzuela (1)
  1. Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
