Volume 80, Issue 7, pp 2012–2047

Lempel–Ziv-78 Compressed String Dictionaries

  • Julian Arz
  • Johannes Fischer
Part of the following topical collections:
  1. Special Issue on Compact Data Structures


String dictionaries store a collection \(\left( s_i\right) _{0\le i < m}\) of m variable-length keys (strings) over an alphabet \(\varSigma \) and support the operations lookup (given a string \(s\in \varSigma ^*\), decide if \(s_i=s\) for some i, and return this i) and access (given an integer \(0\le i < m\), return the string \(s_i\)). We show how to modify the Lempel–Ziv-78 data compression algorithm to store the strings space-efficiently and support the operations lookup and access in optimal time. Our approach is validated experimentally on dictionaries of up to 1.5 GB of uncompressed text. We achieve compression ratios often outperforming the existing alternatives, especially on dictionaries containing many repeated substrings. Our query times remain competitive.
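To make the two operations concrete, here is a minimal sketch of a string dictionary built on a plain LZ78-style parse. It is not the data structure proposed in this article: it assumes each key is parsed against one shared phrase trie and stored as a list of phrase identifiers, access decodes by walking parent pointers, and lookup is a naive linear scan. The names `LZ78Dictionary`, `access`, and `lookup` are illustrative, and the sketch achieves neither the space efficiency nor the optimal query times stated above.

```python
class LZ78Dictionary:
    """Toy string dictionary: keys are LZ78-parsed against one shared phrase trie.

    Illustrative only; not the compressed representation from the article.
    """

    def __init__(self, keys):
        # Phrase 0 is the empty phrase. Phrase p > 0 equals
        # phrase parent[p] followed by the single character char[p].
        self.parent = [0]
        self.char = [""]
        self.trie = {}  # (parent phrase id, character) -> phrase id
        self.encoded = [self._encode(k) for k in keys]

    def _encode(self, key):
        out = []
        node = 0  # current position in the phrase trie
        for c in key:
            nxt = self.trie.get((node, c))
            if nxt is not None:
                node = nxt  # keep extending the longest known phrase
                continue
            # Longest known phrase plus one new character: create a new phrase.
            new_id = len(self.parent)
            self.trie[(node, c)] = new_id
            self.parent.append(node)
            self.char.append(c)
            out.append(new_id)
            node = 0
        if node != 0:
            out.append(node)  # key ended inside an already known phrase
        return out

    def _phrase(self, p):
        # Expand a phrase id by following parent pointers up to the root.
        chars = []
        while p != 0:
            chars.append(self.char[p])
            p = self.parent[p]
        return "".join(reversed(chars))

    def access(self, i):
        """Return the i-th key s_i."""
        return "".join(self._phrase(p) for p in self.encoded[i])

    def lookup(self, s):
        """Return some i with s_i == s, or None (naive linear scan)."""
        for i in range(len(self.encoded)):
            if self.access(i) == s:
                return i
        return None


keys = ["abab", "abba", "ab"]
d = LZ78Dictionary(keys)
```

Because later keys reuse phrases created while parsing earlier ones, a collection with many repeated substrings needs fewer phrase identifiers per key in this kind of parse.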


Data structures, Compression, Strings, Dictionaries, Searching



Many people helped to improve this article in different ways. First, we thank Giuseppe Ottaviano for providing his data sets, and Francisco Claude and Miguel Ángel Martínez-Prieto for the source code of their implementations. Second, we thank Paweł Gawrychowski for interesting discussions on this topic, and Giuseppe Ottaviano, Rossano Venturini, and Gonzalo Navarro for pointing out the work by Russo and Oliveira [31] during the Dagstuhl Seminar 13232 “Indexes and Computation over Compressed Structured Data” [24]. Gonzalo Navarro also brought Lemma 2.3 from Kosaraju and Manzini [22] to our attention. We further thank Simon Gog for bringing [36] to our attention, and the anonymous reviewers for their comments that helped to improve this article.


Authors and Affiliations

  1. Department of Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
  2. Department of Computer Science, TU Dortmund, Dortmund, Germany

