Grammar-Based Compression in a Streaming Model

  • Conference paper
Language and Automata Theory and Applications (LATA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6031)

Abstract

We show that, given a string s of length n, with constant memory and logarithmic passes over a constant number of streams we can build a context-free grammar that generates s and only s and whose size is within an \({\mathcal O}\left({\min \left( g \log g, \sqrt{n / \log n} \right)}\right)\)-factor of the minimum g. This stands in contrast to our previous result that, with polylogarithmic memory and polylogarithmic passes over a single stream, we cannot build such a grammar whose size is within any polynomial of g.
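
As an illustration of the object the abstract refers to (not of the streaming construction itself), the following minimal sketch shows a context-free grammar that generates one string and only that string. It assumes every nonterminal has exactly one rule and the rules are acyclic. The names `expand`, `grammar_size`, and `toy` are hypothetical, and counting right-hand-side symbols is one common size measure, assumed here.

```python
from typing import Dict, List

# Hypothetical toy representation: each nonterminal maps to its single
# right-hand side. With one rule per nonterminal and no cycles, the
# grammar derives exactly one string.
Grammar = Dict[str, List[str]]


def expand(grammar: Grammar, symbol: str) -> str:
    """Return the unique string derived from `symbol`."""
    if symbol not in grammar:  # anything without a rule is a terminal
        return symbol
    return "".join(expand(grammar, child) for child in grammar[symbol])


def grammar_size(grammar: Grammar) -> int:
    """Total number of symbols on all right-hand sides (assumed size measure)."""
    return sum(len(rhs) for rhs in grammar.values())


# Three rules of two symbols each derive a string of length 8.
toy: Grammar = {
    "S": ["B", "B"],
    "B": ["A", "A"],
    "A": ["a", "b"],
}

if __name__ == "__main__":
    print(expand(toy, "S"), grammar_size(toy))
```

Here the derived string has length n = 8 while the grammar has size 6; for longer repetitive strings the gap widens, which is why the approximation factor in the abstract is stated relative to the minimum size g.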

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gagie, T., Gawrychowski, P. (2010). Grammar-Based Compression in a Streaming Model. In: Dediu, AH., Fernau, H., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2010. Lecture Notes in Computer Science, vol 6031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13089-2_23

  • DOI: https://doi.org/10.1007/978-3-642-13089-2_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13088-5

  • Online ISBN: 978-3-642-13089-2

  • eBook Packages: Computer Science, Computer Science (R0)
