Abstract
We investigate two closely related LZ78-based compression schemes: LZMW (an old scheme by Miller and Wegman) and LZD (a recent variant by Goto et al.). Both LZD and LZMW naturally produce a grammar for a string of length n; we show that the size of this grammar can be larger than the size of the smallest grammar by a factor \(\varOmega (n^{\frac{1}{3}})\) but is always within a factor \(O((\frac{n}{\log n})^{\frac{2}{3}})\) of it. In addition, we show that the standard algorithms using \(\varTheta (z)\) working space to construct the LZD and LZMW parsings, where z is the size of the parsing, take \(\varOmega (n^{\frac{5}{4}})\) time in the worst case. We then describe a new Las Vegas LZD/LZMW parsing algorithm that uses \(O(z \log n)\) space and \(O(n + z \log ^2 n)\) time with high probability.
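For readers unfamiliar with the two parsings, the following is a minimal sketch, assuming the usual definitions from Goto et al. and Miller and Wegman, which are not restated on this page. In LZD, each phrase is the concatenation of two greedily chosen longest matches, each being an earlier phrase or a single letter. In LZMW, each phrase is the longest prefix of the remaining text that is a single letter or the concatenation of two adjacent earlier phrases. The function names `longest_match`, `lzd_parse`, and `lzmw_parse` are illustrative only and are not taken from the paper or its C++ supplementary code. The sketch scans the whole dictionary at every step, so it is a naive baseline for checking small inputs and not the compressed-space algorithm announced above.

```python
def longest_match(s, pos, dictionary):
    """Longest prefix of s[pos:] found in `dictionary`, else a single letter.

    Returns the empty string when pos == len(s).
    """
    best = s[pos:pos + 1]  # a single letter is always a valid match
    for phrase in dictionary:
        if len(phrase) > len(best) and s.startswith(phrase, pos):
            best = phrase
    return best


def lzd_parse(s):
    """Naive LZD: every phrase is two greedy matches glued together."""
    phrases, seen, pos = [], set(), 0
    while pos < len(s):
        first = longest_match(s, pos, seen)
        # The second half may be empty if the first half reaches the end.
        second = longest_match(s, pos + len(first), seen)
        phrase = first + second
        phrases.append(phrase)
        seen.add(phrase)
        pos += len(phrase)
    return phrases


def lzmw_parse(s):
    """Naive LZMW: every phrase is a letter or two adjacent earlier phrases."""
    phrases, pairs, pos = [], set(), 0
    while pos < len(s):
        phrase = longest_match(s, pos, pairs)
        if phrases:
            # The previous phrase followed by this one becomes a candidate.
            pairs.add(phrases[-1] + phrase)
        phrases.append(phrase)
        pos += len(phrase)
    return phrases


if __name__ == "__main__":
    text = "abababab"
    print(lzd_parse(text))   # ['ab', 'abab', 'ab']
    print(lzmw_parse(text))  # ['a', 'b', 'ab', 'ab', 'ab']
```

Because every phrase in either parsing is a letter or the concatenation of two earlier items, writing one binary rule per phrase gives the grammar whose size the abstract compares with the smallest grammar.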
G. Badkobeh—Supported by the Leverhulme Trust’s Early Career Scheme.
T. Kociumaka—Supported by Polish budget funds for science in 2013–2017 under the ‘Diamond Grant’ program.
S.J. Puglisi—Supported by the Academy of Finland via grant 294143.
Notes
1. We concern ourselves here with LZD parsing, but it should be easy for the reader to see that the algorithms are trivially adapted to instead compute LZMW.
References
Supplementary materials for the present paper: C++ code for described experiments. https://bitbucket.org/dkosolobov/lzd-lzmw
Belazzougui, D., Boldi, P., Vigna, S.: Dynamic Z-Fast tries. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 159–172. Springer, Heidelberg (2010). doi:10.1007/978-3-642-16321-0_15
Belazzougui, D., Cording, P.H., Puglisi, S.J., Tabei, Y.: Access, rank, and select in grammar-compressed strings. In: Bansal, N., Finocchi, I. (eds.) ESA 2015. LNCS, vol. 9294, pp. 142–154. Springer, Heidelberg (2015). doi:10.1007/978-3-662-48350-3_13
Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings and trees. SIAM J. Comput. 44(3), 513–539 (2015)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theor. 51(7), 2554–2576 (2005)
Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundamenta Informaticae 111(3), 313–337 (2011)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). doi:10.1007/978-3-642-28332-1_21
Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD Factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Cham (2015). doi:10.1007/978-3-319-19929-0_19
Hucke, D., Lohrey, M., Reh, C.P.: The smallest grammar problem revisited. In: Inenaga, S., Sadakane, K., Sakai, T. (eds.) SPIRE 2016. LNCS, vol. 9954, pp. 35–49. Springer, Cham (2016). doi:10.1007/978-3-319-46049-9_4
I, T., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Efficient Lyndon factorization of grammar compressed text. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 153–164. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38905-4_16
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Devel. 31(2), 249–260 (1987)
Kempa, D., Kosolobov, D.: LZ-End parsing in compressed space. In: Proceedings of Data Compression Conference (DCC), pp. 350–359. IEEE (2017)
Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theoret. Comput. Sci. 483, 115–133 (2013)
Miller, V.S., Wegman, M.N.: Variations on a theme by Ziv and Lempel. In: Apostolico, A., Galil, Z. (eds.) Proceedings of NATO Advanced Research Workshop on Combinatorial Algorithms on Words, NATO ASI, vol. 12, pp. 131–140. Springer, Heidelberg (1985)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302(1–3), 211–222 (2003)
Tanaka, T., I, T., Inenaga, S., Bannai, H., Takeda, M.: Computing convolution on grammar-compressed text. In: Proceedings of Data Compression Conference (DCC), pp. 451–460. IEEE (2013)
Westbrook, J.: Fast incremental planarity testing. In: Kuich, W. (ed.) ICALP 1992. LNCS, vol. 623, pp. 342–353. Springer, Heidelberg (1992). doi:10.1007/3-540-55719-9_86
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theor. 24(5), 530–536 (1978)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)
Acknowledgements
We thank H. Bannai, P. Cording, K. Dabrowski, D. Hücke, D. Kempa, and L. Salmela for interesting discussions on LZD at the 2016 StringMasters and Dagstuhl meetings. Thanks also go to D. Belazzougui for advice about the z-fast trie and to the anonymous referees.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Badkobeh, G., Gagie, T., Inenaga, S., Kociumaka, T., Kosolobov, D., Puglisi, S.J. (2017). On Two LZ78-style Grammars: Compression Bounds and Compressed-Space Computation. In: Fici, G., Sciortino, M., Venturini, R. (eds) String Processing and Information Retrieval. SPIRE 2017. Lecture Notes in Computer Science, vol 10508. Springer, Cham. https://doi.org/10.1007/978-3-319-67428-5_5
DOI: https://doi.org/10.1007/978-3-319-67428-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67427-8
Online ISBN: 978-3-319-67428-5
eBook Packages: Computer Science, Computer Science (R0)