LZ78 Compression in Low Main Memory Space

  • Diego Arroyuelo
  • Rodrigo Cánovas
  • Gonzalo Navarro
  • Rajeev Raman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10508)

Abstract

We present the first algorithms that perform the LZ78 compression of a text of length n over alphabet \([1..\sigma ]\), whose output is z integers, using only \(O(z\lg \sigma )\) bits of main memory. The algorithms read the input text from disk in a single pass, and write the compressed output to disk. The text can also be decompressed within the same main memory usage, which is unprecedented too. The algorithms are based on hashing and, under some simplifying assumptions, run in O(n) expected time. We experimentally verify that our algorithms use 2–9 times less time and/or space than previously implemented LZ78 compressors.
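For readers unfamiliar with the LZ78 parsing mentioned above, the following is a minimal textbook sketch, not the low-memory algorithm of the paper. It assumes the text fits in memory and stores the phrase trie in an ordinary Python dict keyed by (parent phrase id, character), which merely illustrates the idea of a hash-based trie. The function names `lz78_compress` and `lz78_decompress` are hypothetical, and the decompressor keeps every phrase as an explicit string, so it does not achieve the \(O(z\lg \sigma )\)-bit bound.

```python
def lz78_compress(text):
    """Parse text into LZ78 phrases, returned as (parent_id, char) pairs.

    Phrase ids start at 1; id 0 denotes the empty phrase (trie root).
    """
    trie = {}      # (parent_id, char) -> phrase id: a hash-based trie (illustrative only)
    output = []
    node = 0       # current trie node, starting at the root
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the current match
        else:
            trie[(node, ch)] = len(trie) + 1  # create a new phrase
            output.append((node, ch))
            node = 0                          # restart from the root
    if node != 0:
        # The text ended inside an existing phrase: emit it with no new character.
        output.append((node, ""))
    return output


def lz78_decompress(pairs):
    """Rebuild the text from (parent_id, char) pairs (stores phrases explicitly)."""
    phrases = [""]                            # phrases[0] is the empty phrase
    for parent, ch in pairs:
        phrases.append(phrases[parent] + ch)
    return "".join(phrases[1:])


if __name__ == "__main__":
    pairs = lz78_compress("abracadabra")
    print(pairs)
    print(lz78_decompress(pairs))
```

Here the number of pairs produced plays the role of z in the abstract; each pair could be packed into a single integer, although this sketch keeps them as tuples for readability.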

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Diego Arroyuelo (1)
  • Rodrigo Cánovas (2)
  • Gonzalo Navarro (3)
  • Rajeev Raman (4)

  1. Departamento de Informática, Universidad Técnica Federico Santa María, San Joaquín, Chile
  2. LIRMM and IBC, Montpellier Cedex 5, France
  3. Department of Computer Science, University of Chile, Santiago, Chile
  4. Department of Informatics, University of Leicester, Leicester, UK
