On the Value of Multiple Read/Write Streams for Data Compression

  • Travis Gagie
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5577)

Abstract

We study whether, when restricted to using polylogarithmic memory and polylogarithmic passes, we can achieve qualitatively better data compression with multiple read/write streams than we can with only one. We first show how we can achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for us to achieve good grammar-based compression. Finally, we show that two streams are necessary and sufficient for us to achieve entropy-only bounds.

Keywords

Data Compression, Arithmetic Code, Data Compression Algorithm, Markov Source, Streaming Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Travis Gagie, University of Eastern Piedmont, Alessandria, Italy
