Encyclopedia of Algorithms

2016 Edition
Editor: Ming-Yang Kao

Dictionary-Based Data Compression

  • Travis Gagie
  • Giovanni Manzini
Reference work entry
DOI: https://doi.org/10.1007/978-1-4939-2864-4_108

Years and Authors of Summarized Original Work

  • 1977; Ziv, Lempel

Problem Definition

The problem of lossless data compression is the problem of compactly representing data in a format that admits the faithful recovery of the original information. Lossless data compression is achieved by taking advantage of the redundancy that is often present in data generated by either humans or machines.

Dictionary-based data compression has been “the solution” to the problem of lossless data compression for nearly 15 years. This technique originated in two theoretical papers of Ziv and Lempel [15, 16] and gained popularity in the 1980s with the introduction of the Unix tool compress (1986) and of the GIF image format (1987). Although today there are alternative solutions to the problem of lossless data compression (e.g., Burrows–Wheeler compression and Prediction by Partial Matching), dictionary-based compression is still widely used in everyday applications: consider for example the zip...
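To make the idea of a dictionary-based parse concrete, the following is a minimal sketch of a greedy LZ77-style factorization, assuming one common textbook formulation: each phrase is a triple (offset, length, next symbol), the window is unbounded, and the longest earlier match is found by brute force in quadratic time. The original 1977 scheme [15] uses a bounded sliding window and a specific codeword format, which this sketch does not attempt to reproduce; the function names `lz77_parse` and `lz77_decode` are hypothetical. Decoding the triples reproduces the input exactly, which is the lossless property described above.

```python
from typing import List, Tuple

# Hypothetical phrase format (textbook-style, not the exact 1977 encoding):
# copy `length` symbols starting `offset` positions back, then append `next_char`.
# offset == 0 and length == 0 means "no copy, just a literal symbol".
Triple = Tuple[int, int, str]


def lz77_parse(text: str) -> List[Triple]:
    """Greedy LZ77-style parse with an unbounded window (brute-force sketch)."""
    phrases: List[Triple] = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_off = 0, 0
        # Try every earlier start j; the match may run past i (self-overlap).
        for j in range(i):
            length = 0
            # Stop one symbol early so every phrase has an explicit next_char.
            while i + length < n - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        phrases.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases: List[Triple]) -> str:
    """Rebuild the text from (offset, length, next_char) triples."""
    out: List[str] = []
    for offset, length, ch in phrases:
        start = len(out) - offset
        # Copy one symbol at a time so overlapping copies (runs) work.
        for k in range(length):
            out.append(out[start + k])
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    text = "abababab"
    phrases = lz77_parse(text)
    print(phrases)  # [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]
    assert lz77_decode(phrases) == text
```

Copying one symbol at a time in the decoder lets a phrase overlap the text it is copying, so a run such as `aaaa` is encoded as a literal followed by the single self-referential phrase (1, 2, 'a').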

Keywords

LZ compression; Ziv–Lempel compression; Parsing-based compression

Recommended Reading

  1. Arroyuelo D, Navarro G, Sadakane K (2006) Reducing the space requirement of LZ-index. In: Proceedings of 17th combinatorial pattern matching conference (CPM). LNCS, vol 4009. Springer, pp 318–329
  2. Charikar M, Lehman E, Liu D, Panigrahy R, Prabhakaran M, Sahai A, Shelat A (2005) The smallest grammar problem. IEEE Trans Inf Theory 51:2554–2576
  3. Cormode G, Muthukrishnan S (2005) Substring compression problems. In: Proceedings of 16th ACM-SIAM symposium on discrete algorithms (SODA ’05), pp 321–330
  4. Crochemore M, Landau G, Ziv-Ukelson M (2003) A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J Comput 32:1654–1673
  5. Ferragina P, Manzini G (2005) Indexing compressed text. J ACM 52:552–581
  6. Kosaraju R, Manzini G (1999) Compression of low entropy strings with Lempel–Ziv algorithms. SIAM J Comput 29:893–911
  7. Krishnan P, Vitter J (1998) Optimal prediction for prefetching in the worst case. SIAM J Comput 27:1617–1636
  8. Lifshits Y, Mozes S, Weimann O, Ziv-Ukelson M (2007) Speeding up HMM decoding and training by exploiting sequence repetitions. Springer, 2007
  9. Matias Y, Sahinalp C (1999) On the optimality of parsing in dynamic dictionary based data compression. In: Proceedings of 10th annual ACM-SIAM symposium on discrete algorithms (SODA ’99), pp 943–944
  10. Navarro G (2004) Indexing text using the Ziv–Lempel trie. J Discret Algorithm 2:87–114
  11. Navarro G, Tarhio J (2005) LZgrep: a Boyer-Moore string matching tool for Ziv–Lempel compressed text. Softw Pract Exp 35:1107–1130
  12. Sahinalp C, Rajpoot N (2003) Dictionary-based data compression: an algorithmic perspective. In: Sayood K (ed) Lossless compression handbook. Academic Press, pp 153–167
  13. Salomon D (2007) Data compression: the complete reference, 4th edn. Springer, London
  14. Savari S (1997) Redundancy of the Lempel–Ziv incremental parsing rule. IEEE Trans Inf Theory 43:9–21
  15. Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE Trans Inf Theory 23:337–343
  16. Ziv J, Lempel A (1978) Compression of individual sequences via variable-length coding. IEEE Trans Inf Theory 24:530–536

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Travis Gagie (1, 2)
  • Giovanni Manzini (1, 3)
  1. Department of Computer Science, University of Eastern Piedmont, Alessandria, Italy
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
  3. Department of Science and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy