LZD Factorization: Simple and Practical Online Grammar Compression with Variable-to-Fixed Encoding

  • Keisuke Goto
  • Hideo Bannai
  • Shunsuke Inenaga
  • Masayuki Takeda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9133)

Abstract

We propose a new variant of the LZ78 factorization which we call the LZ Double-factor factorization (LZD factorization). Each factor of the LZD factorization of a string is the concatenation of the two longest previous factors, while each factor of the LZ78 factorization is that of the longest previous factor and the following character. Interestingly, this simple modification drastically improves the compression ratio in practice. We propose two online algorithms to compute the LZD factorization in \(O(m (M + \min (m, M)\log \sigma ))\) time and \(O(m)\) space, or in \(O(N \log \sigma )\) time and \(O(N)\) space, where \(m\) is the number of factors to output, \(M\) is the length of the longest factor(s), \(N\) is the length of the input string, and \(\sigma \) is the alphabet size. We also show two versions of our LZD factorization with variable-to-fixed encoding, and present online algorithms to compute these versions in \(O(N + \min (m, 2^L) (M + \min (m, M, 2^L) \log \sigma ))\) time and \(O(\min (2^L, m))\) space, where \(L\) is the bit-length of each fixed-length code word. The LZD factorization and its versions with variable-to-fixed encoding are actually grammar-based compression, and our experiments show that our algorithms outperform the state-of-the-art online grammar-based compression algorithms on several data sets.
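The double-factor rule described above can be illustrated with a minimal sketch. It is a naive dictionary-based version, not either of the online algorithms whose time and space bounds are stated in the abstract. It assumes that a single character may stand in when no previous factor matches, and that the last factor may have an empty second half. The function names `lzd_factorize` and `lzd_expand` are hypothetical and chosen only for this illustration.

```python
def _longest_match(text, start, phrases, max_len):
    """Longest prefix of text[start:] that equals a previous factor.

    Falls back to the single character text[start] (an assumption of this
    sketch) when no previous factor matches.
    Returns (match_length, reference), where reference is either an int
    (index of an earlier factor) or a one-character string.
    """
    best_len, best_ref = 1, text[start]
    limit = min(max_len, len(text) - start)
    for length in range(2, limit + 1):
        idx = phrases.get(text[start:start + length])
        if idx is not None:
            best_len, best_ref = length, idx
    return best_len, best_ref


def lzd_factorize(text):
    """Naive LZD-style factorization.

    Returns a list of (left, right) pairs. Each half is an int (index of an
    earlier factor), a one-character string, or None for an empty right half
    at the very end of the input.
    """
    phrases = {}   # factor string -> factor index
    factors = []
    max_len = 0    # length of the longest factor stored so far
    pos, n = 0, len(text)
    while pos < n:
        len1, left = _longest_match(text, pos, phrases, max_len)
        mid = pos + len1
        if mid < n:
            len2, right = _longest_match(text, mid, phrases, max_len)
        else:
            len2, right = 0, None  # input ended after the first half
        end = mid + len2
        if right is not None:
            phrases[text[pos:end]] = len(factors)
            max_len = max(max_len, end - pos)
        factors.append((left, right))
        pos = end
    return factors


def lzd_expand(factors):
    """Expand each factor back into its string, reading the pairs as
    grammar rules of the form X_i -> X_left X_right."""
    expanded = []

    def resolve(ref):
        if ref is None:
            return ""
        if isinstance(ref, int):
            return expanded[ref]
        return ref

    for left, right in factors:
        expanded.append(resolve(left) + resolve(right))
    return expanded


factors = lzd_factorize("abababab")
print(factors)              # [('a', 'b'), (0, 0), (0, None)]
print(lzd_expand(factors))  # ['ab', 'abab', 'ab']
```

Because every factor is a pair of references, the output can be read directly as a grammar with one rule per factor, which is the sense in which the abstract calls the factorization grammar-based compression. The lookup here rescans candidate lengths at each step, so it is far slower than the bounds quoted above and is meant only to make the definition concrete.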

Keywords

Compression Ratio, Online Algorithm, Context Free Grammar, Suffix Tree, Input String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

We would like to thank Shirou Maruyama and Takuya Kida for providing source codes of their compression programs FOLCA and ADS.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Keisuke Goto (1)
  • Hideo Bannai (1)
  • Shunsuke Inenaga (1)
  • Masayuki Takeda (1)
  1. Department of Informatics, Kyushu University, Fukuoka, Japan
