
Fully-Online Grammar Compression

  • Conference paper
String Processing and Information Retrieval (SPIRE 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8214)

Abstract

We present a fully-online algorithm for constructing straight-line programs (SLPs). A naive array representation of an SLP with n variables over an alphabet of size σ requires \(2n\lg(n+\sigma)\) bits. As shown in [Tabei et al., CPM’13], in the offline setting this size can be reduced to \(n\lg(n+\sigma) + 2n + o(n)\) bits, which is asymptotically equal to the information-theoretic lower bound. Our algorithm achieves the same size in the online setting, i.e., when the characters of the input string are given one by one and the current SLP is updated after each. With an auxiliary position array of \(n\lg(N/n) + 3n + o(n)\) bits, our representation supports substring extraction in O((m + h)t) time, where N is the length of the input string, m is the length of the extracted substring, \(h = O(\lg N)\) is the height of the SLP, t = O(1) in the offline case, and \(t=O(\lg n/\lg\lg n)\) in the online case. The working space is bounded by \((1+\alpha)n\lg(n+\sigma)+n(3+\lg(\alpha n))\) bits, where the constant α ∈ (0,1] is the load factor of the hash tables. We compared our algorithm with LZend in experiments on real-world repetitive texts.
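To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the succinct representation described in the paper. It evaluates the leading terms of the stated space bounds and extracts a substring from a naive array-based SLP by descending from the root. The class name `SLP`, the symbol numbering (terminals first, then variables), the stored expansion lengths, and all function names are assumptions made for illustration only.

```python
import math


def naive_bits(n, sigma):
    """Naive array size: 2n lg(n + sigma) bits (two symbols per rule)."""
    return 2 * n * math.log2(n + sigma)


def succinct_bits(n, sigma):
    """Leading terms n lg(n + sigma) + 2n; the o(n) term is ignored here."""
    return n * math.log2(n + sigma) + 2 * n


def working_space_bits(n, sigma, alpha):
    """(1 + alpha) n lg(n + sigma) + n (3 + lg(alpha n)), alpha in (0, 1]."""
    return (1 + alpha) * n * math.log2(n + sigma) + n * (3 + math.log2(alpha * n))


class SLP:
    """Naive SLP (illustrative): symbols 0..sigma-1 are terminals and
    symbol sigma + k is variable k, with rule rules[k] = (left, right).
    Each rule may only refer to terminals or earlier variables."""

    def __init__(self, alphabet, rules):
        self.alphabet = alphabet
        self.sigma = len(alphabet)
        self.rules = rules
        # Expansion length of every symbol, filled bottom-up.
        self.length = [1] * self.sigma
        for left, right in rules:
            self.length.append(self.length[left] + self.length[right])

    def extract(self, start, m):
        """Return the m characters starting at 0-based position start
        of the string derived from the last variable."""
        root = self.sigma + len(self.rules) - 1
        out = []
        self._walk(root, start, m, out)
        return "".join(out)

    def _walk(self, symbol, start, m, out):
        if m <= 0:
            return
        if symbol < self.sigma:  # terminal: emit one character
            out.append(self.alphabet[symbol])
            return
        left, right = self.rules[symbol - self.sigma]
        left_len = self.length[left]
        if start < left_len:  # the range overlaps the left child
            self._walk(left, start, min(m, left_len - start), out)
        if start + m > left_len:  # the range overlaps the right child
            right_start = max(start - left_len, 0)
            right_m = start + m - max(start, left_len)
            self._walk(right, right_start, right_m, out)


# X0 -> ab, X1 -> X0 X0, X2 -> X1 a; the last variable derives "ababa".
example = SLP("ab", [(0, 1), (2, 2), (3, 0)])
print(example.extract(1, 3))  # prints "bab"
```

In this sketch the walk touches one root-to-leaf path of length at most h plus the nodes covering the m extracted characters. Storing an expansion length for every symbol is a simplification; the abstract instead relies on an auxiliary position array of \(n\lg(N/n) + 3n + o(n)\) bits.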

This work was supported by JSPS KAKENHI (24700140, 23680016, 23240002).

The original version of this chapter was revised: The copyright line was incorrect. This has been corrected. The Erratum to this chapter is available at DOI: 10.1007/978-3-319-02432-5_33


References

  1. Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundamenta Informaticae 111, 313–337 (2010)

  2. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram mining on SLP compressed strings. J. Discrete Algorithms 18, 89–99 (2013)

  3. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 636–645 (2003)

  4. Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A unified algorithm for accelerating edit-distance computation via text-compression. In: STACS, pp. 26–28 (2009)

  5. Inenaga, S., Bannai, H.: Finding characteristic substrings from compressed texts. In: PSC, pp. 40–54 (2009)

  6. Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554 (1989)

  7. Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic J. Comp. 4(2), 172–186 (1997)

  8. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theoretical Computer Science 483, 115–133 (2013)

  9. Maruyama, S., Sakamoto, H., Takeda, M.: An online algorithm for lightweight grammar-based compression. Algorithms 5(2), 213–235 (2012)

  10. Matsubara, W., Inenaga, S., Ishino, A., Shinohara, A., Nakamura, T., Hashimoto, K.: Efficient algorithms to compute compressed longest common substrings and compressed palindromes. Theoretical Computer Science 410(8-10), 900–913 (2009)

  11. Navarro, G., Providel, E.: Fast, small, simple rank/select on bitmaps. In: Proc. SEA, pp. 295–306 (2012)

  12. Navarro, G., Sadakane, K.: Fully-functional static and dynamic succinct trees. ACM Transactions on Algorithms (2010), accepted; a preliminary version appeared in SODA 2010

  13. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: Workshop on Algorithm Engineering & Experiments (2007)

  14. Raman, R., Rao, S.S., Raman, V.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms 3 (2007)

  15. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)

  16. Tabei, Y., Takabatake, Y., Sakamoto, H.: A succinct grammar compression. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 235–246. Springer, Heidelberg (2013)

  17. Takabatake, Y., Tabei, Y., Sakamoto, H.: Variable-length codes for space-efficient grammar-based compression. In: SPIRE, pp. 398–410 (2012)

  18. Tiskin, A.: Towards approximate matching in compressed strings: Local subsequence recognition. In: Kulikov, A., Vereshchagin, N. (eds.) CSR 2011. LNCS, vol. 6651, pp. 401–414. Springer, Heidelberg (2011)

  19. Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and don’t-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 309–322. Springer, Heidelberg (2011)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maruyama, S., Tabei, Y., Sakamoto, H., Sadakane, K. (2013). Fully-Online Grammar Compression. In: Kurland, O., Lewenstein, M., Porat, E. (eds) String Processing and Information Retrieval. SPIRE 2013. Lecture Notes in Computer Science, vol 8214. Springer, Cham. https://doi.org/10.1007/978-3-319-02432-5_25

  • DOI: https://doi.org/10.1007/978-3-319-02432-5_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02431-8

  • Online ISBN: 978-3-319-02432-5

  • eBook Packages: Computer Science, Computer Science (R0)
