Bit-Complexity of Lempel-Ziv Compression

Compressed Data Structures for Strings

Part of the book series: Atlantis Studies in Computing ((ATLANTISCOMP,volume 4))

Abstract

One of the most famous lossless data-compression schemes is the one introduced by Lempel and Ziv in the late 1970s, and many commercial and non-commercial programs are currently based on it, including gzip, zip, pkzip, arj, and rar.

Notes

  1. Recently, Crochemore et al. (2008) showed how to achieve the optimal \(O(n)\) time and space when the alphabet has size \(O(n)\) and the window is unbounded, i.e., \(M=n\).

  2. Gzip home page http://www.gzip.org.

  3. Bzip2 home page http://www.bzip.org/.

  4. For a larger alphabet, our algorithms are still correct, but we need to add the term \(T_{sort}(n, \sigma )\) to their time complexities; this term denotes the time required to sort/remap all distinct symbols of \(T\) into the range \([n]\).

  5. Notice that the node \(u_j\) can be identified during the rightward scanning of \(T\), as is usually done in LZ77-parsing, taking \(O(n)\) time for all identified phrases.

  6. Recall that the variant of LZ77 we are considering in this chapter uses just a pair of integers per phrase, and thus drops the character following that phrase in \(T\).

  7. Notice that there may be several different candidate positions \(p\) from which we can copy the substring \(T[i:j-1]\). We can arbitrarily choose any position among the ones whose distance from \(i\) is encodable with the smallest number of bits (namely, the ones minimizing \(|f(d_{i,j})|\)).

  8. Recall that \(c(v_i,v_j) = |f(d_{i,j})| + |g(\ell _{i,j})|\) if the edge exists; otherwise we set \(c(v_i,v_j) = +\infty \).

  9. Observe that \(|FS(v)| \le Q(f,n) + Q(g,n)\) for any vertex \(v\) in \(\widetilde{\mathcal{G}}(T)\) (Lemma 4.2).

  10. Observe that there may be several leaves having these characteristics. We can arbitrarily choose one of them because they denote copies of the same phrase that can be encoded with the same number of bits for the length and for the distance (i.e., \(c(I_k)\) bits).

  11. The value of \(\mathtt{mp}[a(u)]\) can be arbitrarily set to \(\mathtt{min}(u)\) or \(\mathtt{max}(u)\) whenever both \(\mathtt{min}(u)\) and \(\mathtt{max}(u)\) belong to \(W_{a(u)}\).

  12. Observe that we obtain \(\mathtt{mp}[11] = 10\) by setting \(\mathtt{mp}[a(u)] = a(\mathtt{parent}(u))\) at the node with \(a =11\).

  13. Algorithms Rightmost-LZ77 and BitOptimal-LZ77 encode copy-distances and lengths by using a variant of Rice codes in which, rather than a single bucket size \(2^k\), there is a series of buckets of increasing size, fixed in advance.

  14. Lzma2 home page http://7-zip.org/.
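
The edge cost recalled in Note 8 and the tie-breaking rule of Note 7 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that both \(f\) and \(g\) are the Elias gamma code; the chapter leaves the encoders generic, and the function names `gamma_len`, `edge_cost`, and `cheapest_source` are hypothetical and do not come from the text.

```python
INF = float("inf")


def gamma_len(x: int) -> int:
    """Number of bits the Elias gamma code uses for an integer x >= 1."""
    if x < 1:
        raise ValueError("the gamma code is defined for positive integers only")
    # floor(log2 x) zeros followed by the (floor(log2 x) + 1)-bit binary form of x
    return 2 * (x.bit_length() - 1) + 1


def edge_cost(distance, length, f_len=gamma_len, g_len=gamma_len):
    """c(v_i, v_j) = |f(d)| + |g(l)|, or +infinity when no copy exists.

    A missing edge is modelled here by distance=None (an assumption of this sketch).
    """
    if distance is None:
        return INF
    return f_len(distance) + g_len(length)


def cheapest_source(i, candidates, f_len=gamma_len):
    """Among candidate copy positions p < i, return one whose distance i - p
    is encodable with the fewest bits (ties are broken arbitrarily)."""
    return min(candidates, key=lambda p: f_len(i - p))


if __name__ == "__main__":
    print(edge_cost(5, 3))                    # distance 5, length 3
    print(edge_cost(None, 3))                 # no edge
    print(cheapest_source(10, [2, 7, 9]))     # distances 8, 3, 1
```

Any other pair of integer encoders can be plugged in through the `f_len` and `g_len` parameters, since only the codeword lengths enter the cost.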

Author information

Corresponding author

Correspondence to Rossano Venturini.

Copyright information

© 2014 Atlantis Press and the authors

About this chapter

Cite this chapter

Venturini, R. (2014). Bit-Complexity of Lempel-Ziv Compression. In: Compressed Data Structures for Strings. Atlantis Studies in Computing, vol 4. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-033-1_4