Fully-Online Grammar Compression

Maruyama, Shirou; Tabei, Yasuo; Sakamoto, Hiroshi; Sadakane, Kunihiko

doi:10.1007/978-3-319-02432-5_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8214)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval


Abstract

We present a fully-online algorithm for constructing straight-line programs (SLPs). A naive array representation of an SLP with n variables over an alphabet of size σ requires \(2n\lg(n+\sigma)\) bits. As already shown in [Tabei et al., CPM’13], in the offline setting this size can be reduced to \(n\lg(n+\sigma) + 2n + o(n)\) bits, which is asymptotically equal to the information-theoretic lower bound. Our algorithm achieves the same size in the online setting, i.e., when the characters of the input string are given one by one and the current SLP is updated after each. With an auxiliary position array of \(n\lg(N/n) + 3n + o(n)\) bits, our representation supports substring extraction in O((m + h)t) time, where N is the length of the input string, m is the length of the extracted substring, \(h = O(\lg N)\) is the height of the SLP, t = O(1) in the offline case, and \(t=O(\lg n/\lg\lg n)\) in the online case. The working space is bounded by \((1+\alpha)n\lg(n+\sigma)+n(3+\lg(\alpha n))\) bits, where the constant α ∈ (0,1] is the load factor of the hash tables. We compare our algorithm with LZend in experiments on real-world repetitive texts.
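
To make the quantities in the abstract concrete, here is a minimal Python sketch of the naive array representation and of substring extraction guided by expansion lengths. It is an illustration under assumptions, not the paper's method: the class `SLP` and its methods `add_rule`, `naive_bits`, and `extract` are hypothetical names, the plain list `length` stands in for the succinct position array, and neither the succinct encoding nor the online construction is shown.

```python
from math import ceil, log2


class SLP:
    """Naive straight-line program: every rule has the form X -> Y Z."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)           # terminals get ids 0..sigma-1
        self.rules = []                          # rule i defines symbol sigma+i
        self.length = [1] * len(self.alphabet)   # expansion length of each symbol

    def add_rule(self, left, right):
        """Append X -> left right and return the id of the new variable X."""
        self.rules.append((left, right))
        self.length.append(self.length[left] + self.length[right])
        return len(self.alphabet) + len(self.rules) - 1

    def naive_bits(self):
        """Size of the naive array: 2n entries of ceil(lg(n + sigma)) bits."""
        n, sigma = len(self.rules), len(self.alphabet)
        return 2 * n * ceil(log2(n + sigma))

    def extract(self, symbol, start, m):
        """Return m characters of the expansion of `symbol` from offset `start`.

        Assumes 0 <= start and start + m <= self.length[symbol].
        """
        out = []
        self._extract(symbol, start, m, out)
        return "".join(out)

    def _extract(self, symbol, start, m, out):
        if m <= 0:
            return
        sigma = len(self.alphabet)
        if symbol < sigma:                       # terminal: emit one character
            out.append(self.alphabet[symbol])
            return
        left, right = self.rules[symbol - sigma]
        left_len = self.length[left]
        if start < left_len:                     # part of the range lies in the left child
            self._extract(left, start, min(m, left_len - start), out)
        if start + m > left_len:                 # part of the range lies in the right child
            taken = max(left_len - start, 0)
            self._extract(right, max(start - left_len, 0), m - taken, out)


# Example: an SLP for "abababab" over the alphabet {a, b}.
slp = SLP("ab")
x_ab = slp.add_rule(0, 1)          # X1 -> a b
x_abab = slp.add_rule(x_ab, x_ab)  # X2 -> X1 X1
root = slp.add_rule(x_abab, x_abab)  # X3 -> X2 X2
```

In this sketch the recursion follows at most two root-to-leaf paths and otherwise visits only subtrees that lie entirely inside the requested range, so it touches O(m + h) nodes. The factor t in the abstract would then account for the cost of each access to the succinct structures, which the plain Python lists used here do not model.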

This work was supported by JSPS KAKENHI (24700140, 23680016, 23240002).

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02432-5_33

References

Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundamenta Informaticae 111, 313–337 (2010)
Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram mining on SLP compressed strings. J. Discrete Algorithms 18, 89–99 (2013)
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 636–645 (2003)
Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A unified algorithm for accelerating edit-distance computation via text-compression. In: STACS, pp. 26–28 (2009)
Inenaga, S., Bannai, H.: Finding characteristic substrings from compressed texts. In: PSC, pp. 40–54 (2009)
Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554 (1989)
Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic J. Comp. 4(2), 172–186 (1997)
Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theoretical Computer Science 483, 115–133 (2013)
Maruyama, S., Sakamoto, H., Takeda, M.: An online algorithm for lightweight grammar-based compression. Algorithms 5(2), 213–235 (2012)
Matsubara, W., Inenaga, S., Ishino, A., Shinohara, A., Nakamura, T., Hashimoto, K.: Efficient algorithms to compute compressed longest common substrings and compressed palindromes. Theoretical Computer Science 410(8-10), 900–913 (2009)
Navarro, G., Providel, E.: Fast, small, simple rank/select on bitmaps. In: Proc. SEA, pp. 295–306 (2012)
Navarro, G., Sadakane, K.: Fully-functional static and dynamic succinct trees. ACM Transactions on Algorithms (2010), accepted; a preliminary version appeared in SODA 2010
Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: Workshop on Algorithm Engineering & Experiments (2007)
Raman, R., Rao, S.S., Raman, V.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms 3 (2007)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)
Tabei, Y., Takabatake, Y., Sakamoto, H.: A succinct grammar compression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 235–246. Springer, Heidelberg (2013)
Takabatake, Y., Tabei, Y., Sakamoto, H.: Variable-length codes for space-efficient grammar-based compression. In: SPIRE, pp. 398–410 (2012)
Tiskin, A.: Towards approximate matching in compressed strings: Local subsequence recognition. In: Kulikov, A., Vereshchagin, N. (eds.) CSR 2011. LNCS, vol. 6651, pp. 401–414. Springer, Heidelberg (2011)
Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and don’t-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 309–322. Springer, Heidelberg (2011)


Author information

Authors and Affiliations

Preferred Infrastructure, Inc., Japan
Shirou Maruyama
ERATO Minato Project, JST, Japan
Yasuo Tabei
Kyushu Institute of Technology, Japan
Hiroshi Sakamoto
National Institute of Informatics, Japan
Kunihiko Sadakane

Editor information

Editors and Affiliations

Faculty of Industrial Engineering and Management Technion, Technion Institute of Technology, Bloomfield Hall 308, 32000, Haifa, Israel
Oren Kurland
Bar-Ilan University, Israel
Moshe Lewenstein
Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel
Ely Porat

Cite this paper

Maruyama, S., Tabei, Y., Sakamoto, H., Sadakane, K. (2013). Fully-Online Grammar Compression. In: Kurland, O., Lewenstein, M., Porat, E. (eds) String Processing and Information Retrieval. SPIRE 2013. Lecture Notes in Computer Science, vol 8214. Springer, Cham. https://doi.org/10.1007/978-3-319-02432-5_25

DOI: https://doi.org/10.1007/978-3-319-02432-5_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02431-8
Online ISBN: 978-3-319-02432-5
eBook Packages: Computer Science, Computer Science (R0)
