Bit-Complexity of Lempel-Ziv Compression

Venturini, Rossano

doi:10.2991/978-94-6239-033-1_4

Part of the book series: Atlantis Studies in Computing ((ATLANTISCOMP,volume 4))

Abstract

One of the most famous lossless data-compression schemes is the one introduced by Lempel and Ziv in the late 1970s. Many commercial and non-commercial programs are currently based on it, such as gzip, zip, pkzip, arj, and rar, to cite just a few.

Notes

1. Recently, Crochemore et al. (2008) showed how to achieve the optimal \(O(n)\) time and space when the alphabet has size \(O(n)\) and the window is unbounded, i.e., \(M=n\).
2. Gzip home page: http://www.gzip.org.
3. Bzip2 home page: http://www.bzip.org/.
4. For a larger alphabet, our algorithms remain correct, but their time complexities gain an additive term \(T_{sort}(n, \sigma )\), the time required to sort/remap all distinct symbols of \(T\) into the range \([n]\).
5. Notice that the node \(u_j\) can be identified during the rightward scanning of \(T\), as is usually done in LZ77-parsing, taking \(O(n)\) time for all identified phrases.
6. Recall that the variant of LZ77 considered in this chapter uses just a pair of integers per phrase, and thus drops the char following that phrase in \(T\).
7. Notice that there may be several different candidate positions \(p\) from which we can copy the substring \(T[i:j-1]\). We can arbitrarily choose any position among the ones whose distance from \(i\) is encodable with the smallest number of bits (namely, those for which \(|f(d_{i,j})|\) is minimized).
8. Recall that \(c(v_i,v_j) = |f(d_{i,j})| + |g(\ell _{i,j})|\) if the edge exists; otherwise we set \(c(v_i,v_j) = +\infty \).
9. Observe that \(|FS(v)| \le Q(f,n) + Q(g,n)\), for any vertex \(v\) in \(\widetilde{\mathcal{G}}(T)\) (Lemma 4.2).
10. Observe that there may be several leaves having these characteristics. We can arbitrarily choose one of them because they denote copies of the same phrase that can be encoded with the same number of bits for the length and for the distance (i.e., \(c(I_k)\) bits).
11. The value of \(\mathtt{mp}[a(u)]\) can be arbitrarily set to \(\mathtt{min}(u)\) or \(\mathtt{max}(u)\) whenever both \(\mathtt{min}(u)\) and \(\mathtt{max}(u)\) belong to \(W_{a(u)}\).
12. Observe that we obtain \(\mathtt{mp}[11] = 10\) by setting \(\mathtt{mp}[a(u)] = a(\mathtt{parent}(u))\) at the node with \(a =11\).
13. Algorithms Rightmost-LZ77 and BitOptimal-LZ77 encode copy-distances and lengths with a variant of Rice codes that uses, instead of a single bucketing of size \(2^k\), a series of buckets of increasing size, fixed in advance.
14. Lzma2 home page: http://7-zip.org/.
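
Notes 7, 8, and 13 can be tied together in a small sketch: each phrase from position \(i\) to \(j\) is an edge of cost \(|f(d_{i,j})| + |g(\ell _{i,j})|\), and a parsing with the fewest bits is a shortest path from the first to the last position. The Python below is only an illustration under stated assumptions, not the chapter's algorithm: the bucket widths in `BUCKET_BITS`, the flat literal cost `LITERAL_BITS`, and the helper names `code_len`, `closest_copy`, and `min_bits_parse` are all hypothetical, the same code is used for both \(f\) and \(g\), and matches are found by brute force rather than by the efficient structures the chapter develops.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: bucket widths (in bits), invented for illustration only.
BUCKET_BITS = (3, 6, 9, 12, 16, 32)
# ASSUMPTION: flat cost, in bits, of emitting one character with no earlier copy.
LITERAL_BITS = 9


def code_len(x: int) -> int:
    """Bits used for x >= 1: a unary bucket selector plus a fixed-width offset."""
    if x < 1:
        raise ValueError("only positive integers are encodable")
    base = 1
    for idx, k in enumerate(BUCKET_BITS):
        if x < base + (1 << k):
            return (idx + 1) + k
        base += 1 << k
    raise ValueError("value too large for the assumed buckets")


def closest_copy(T: str, i: int, ell: int) -> Optional[int]:
    """Smallest distance d such that T[i-d : i-d+ell] == T[i : i+ell], or None.

    code_len never decreases as d grows, so the closest copy is one of the
    candidates with the cheapest distance (the free choice of note 7).
    """
    target = T[i:i + ell]
    for p in range(i - 1, -1, -1):
        if T[p:p + ell] == target:
            return i - p
    return None


def min_bits_parse(T: str) -> Tuple[int, List[tuple]]:
    """Shortest path over positions 0..n, with edge cost |f(d)| + |g(ell)|."""
    n = len(T)
    dist = [float("inf")] * (n + 1)
    back: List[Optional[tuple]] = [None] * (n + 1)
    dist[0] = 0

    def relax(i: int, j: int, cost: int, phrase: tuple) -> None:
        if dist[i] + cost < dist[j]:
            dist[j] = dist[i] + cost
            back[j] = (i, phrase)

    for i in range(n):  # positions are already in topological order
        relax(i, i + 1, LITERAL_BITS, ("lit", T[i]))
        ell = 1
        while i + ell <= n:
            d = closest_copy(T, i, ell)
            if d is None:
                break  # no copy of this length, hence none of any longer length
            relax(i, i + ell, code_len(d) + code_len(ell), ("copy", d, ell))
            ell += 1

    phrases = []
    j = n
    while j > 0:  # walk the back-pointers to recover the chosen phrases
        i, phrase = back[j]
        phrases.append(phrase)
        j = i
    phrases.reverse()
    return dist[n], phrases
```

With these assumed costs, `min_bits_parse("aaaa")` returns `(17, [("lit", "a"), ("copy", 1, 3)])`: one literal at 9 bits followed by a self-overlapping copy of length 3 at distance 1, costing 4 + 4 bits. The brute-force search is far slower than the bounds discussed in the chapter and is meant only to make the edge costs concrete.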

Author information

Authors and Affiliations

Department of Computer Science, University of Pisa, Pisa, Italy
Rossano Venturini

Corresponding author

Correspondence to Rossano Venturini.

About this chapter

Cite this chapter

Venturini, R. (2014). Bit-Complexity of Lempel-Ziv Compression. In: Compressed Data Structures for Strings. Atlantis Studies in Computing, vol 4. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-033-1_4

DOI: https://doi.org/10.2991/978-94-6239-033-1_4
Published: 01 November 2013
Publisher Name: Atlantis Press, Paris
Print ISBN: 978-94-6239-032-4
Online ISBN: 978-94-6239-033-1
eBook Packages: Mathematics and Statistics (R0)
