Grammar Precompression Speeds Up Burrows–Wheeler Compression

  • Juha Kärkkäinen
  • Pekka Mikkola
  • Dominik Kempa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7608)

Abstract

Text compression algorithms based on the Burrows–Wheeler transform (BWT) typically achieve a good compression ratio but are slow compared to Lempel–Ziv-type compression algorithms. The main culprit is the time needed to compute the BWT during compression and its inverse during decompression. We propose to speed up BWT-based compression by performing a grammar-based precompression before the transform. The idea is to reduce the amount of data that the BWT and its inverse have to process. We have developed a very fast grammar precompressor using pair replacement. Experiments show a substantial speedup in practice without a significant effect on compression ratio.
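
To make the idea of pair replacement concrete, the following is a minimal sketch of a naive variant. It is not the precompressor described in the paper, whose details are not given in this abstract: it repeatedly replaces the most frequent adjacent pair of symbols with a fresh symbol and records one grammar rule per round. The function names `precompress` and `expand`, the `min_count` parameter, and the choice of integers from 256 upward for new symbols are all assumptions made for illustration. The sketch recounts all pairs in every round, so it is far slower than a practical implementation would be.

```python
from collections import Counter


def precompress(text: bytes, min_count: int = 2):
    """Naive pair replacement (illustrative only, not the paper's algorithm).

    Returns the shortened symbol sequence and the grammar rules that
    map each new symbol to the pair of symbols it replaced.
    """
    seq = list(text)      # symbols 0..255 are the original bytes
    rules = {}            # new symbol -> (left symbol, right symbol)
    next_symbol = 256     # assumed numbering for fresh symbols
    while len(seq) >= 2:
        # Count every adjacent pair in the current sequence.
        counts = Counter(zip(seq, seq[1:]))
        pair, count = counts.most_common(1)[0]
        if count < min_count:
            break
        # Replace non-overlapping occurrences of the pair, left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        rules[next_symbol] = pair
        seq = out
        next_symbol += 1
    return seq, rules


def expand(seq, rules) -> bytes:
    """Invert precompress by expanding grammar symbols back into bytes."""
    out = []
    stack = list(reversed(seq))
    while stack:
        symbol = stack.pop()
        if symbol in rules:
            left, right = rules[symbol]
            # Push right first so that left is expanded first.
            stack.append(right)
            stack.append(left)
        else:
            out.append(symbol)
    return bytes(out)
```

In a pipeline of the kind the abstract describes, the shortened sequence, rather than the original text, would be handed to the BWT stage, and `expand` would run after the inverse BWT during decompression. How the rules are encoded and how the enlarged alphabet is handled are design choices this sketch leaves open.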

Keywords

Compression Ratio; Compression Algorithm; Compression Rate; Pair Replacement; Alphabet Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Juha Kärkkäinen (1)
  • Pekka Mikkola (1)
  • Dominik Kempa (1)
  1. Department of Computer Science, University of Helsinki, Finland
