Block Merging for Off-Line Compression

  • Raymond Wan
  • Alistair Moffat
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2373)


To bound memory consumption, most compression systems provide a facility that controls the amount of data that may be processed at once. In this work we consider the Re-Pair mechanism of Larsson and Moffat [2000], which processes large messages as disjoint blocks. We show that the blocks emitted by Re-Pair can be post-processed to yield further savings, and describe techniques that allow files of 500 MB or more to be compressed in a holistic manner using less than that much main memory. The block merging process we describe has the additional advantage of allowing new text to be appended to the end of the compressed file.
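The block-wise processing mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook formulation of Re-Pair (repeatedly replace the most frequent adjacent pair of symbols with a new phrase symbol until no pair occurs twice) and uses a naive quadratic-time loop rather than the efficient data structures of the published algorithm. The names `repair_block`, `expand`, `compress_in_blocks`, and `decompress` are hypothetical, and the block merging step itself is not reproduced here.

```python
from collections import Counter


def repair_block(symbols):
    """Naive Re-Pair on one block; returns (reduced sequence, rules).

    rules maps each new phrase symbol to the pair of symbols it replaces.
    Overlapping occurrences (as in "aaa") are counted naively, so a rule
    may occasionally be created for a pair that is replaced only once.
    """
    seq = list(symbols)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        # Tuples cannot collide with the one-character terminal symbols.
        new_sym = ("R", next_id)
        next_id += 1
        rules[new_sym] = pair
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(sym, rules):
    """Recursively expand a symbol back into the text it stands for."""
    if sym not in rules:
        return sym
    left, right = rules[sym]
    return expand(left, rules) + expand(right, rules)


def compress_in_blocks(text, block_size):
    """Compress disjoint blocks independently to bound working memory."""
    return [
        repair_block(text[i:i + block_size])
        for i in range(0, len(text), block_size)
    ]


def decompress(blocks):
    """Expand every block in order and concatenate the results."""
    return "".join(expand(s, rules) for seq, rules in blocks for s in seq)


text = "abracadabra abracadabra " * 4
blocks = compress_in_blocks(text, block_size=32)
print(decompress(blocks) == text)
```

In this sketch every block carries its own rule dictionary, so a phrase that recurs across blocks is rediscovered and stored once per block. That duplication is the kind of redundancy a post-processing pass over the emitted blocks could plausibly remove, although the sketch makes no attempt to do so.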


Directed Acyclic Graph, Sink Node, Wall Street Journal, Arithmetic Coder, Entropy Coder




  1. A. Apostolico and S. Lonardi. Off-line compression by greedy textual substitution. Proc. IEEE, 88(11):1733–1744, Nov. 2000.
  2. D. Bahle, H. E. Williams, and J. Zobel. Compaction techniques for nextword indexes. In G. Navarro, editor, Proc. 8th International Symposium on String Processing and Information Retrieval, pages 33–45. IEEE Computer Society Press, Los Alamitos, CA, Nov. 2001.
  3. J. Bentley and D. McIlroy. Data compression using long common strings. In J. A. Storer and M. Cohn, editors, Proc. 1999 IEEE Data Compression Conference, pages 287–295. IEEE Computer Society Press, Los Alamitos, CA, Mar. 1999.
  4. A. Cannane and H. E. Williams. A compression scheme for large databases. In M. E. Orlowska, editor, Proc. 11th Australasian Database Conference, pages 6–11, Canberra, Australia, 2000. IEEE Computer Society Press, Los Alamitos, CA.
  5. A. Cannane and H. E. Williams. General-purpose compression for efficient retrieval. Journal of the American Society for Information Science and Technology, 52(5):430–437, Mar. 2001.
  6. E. S. de Moura, G. Navarro, N. Ziviani, and R. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems, 18(2):113–139, 2000.
  7. J. Katajainen and T. Raita. An approximation algorithm for space-optimal encoding of a text. The Computer Journal, 32(3):228–237, 1989.
  8. S. T. Klein. Efficient optimal recompression. The Computer Journal, 40(2/3):117–126, 1997.
  9. N. J. Larsson and A. Moffat. Offline dictionary-based compression. Proc. IEEE, 88(11):1722–1732, Nov. 2000.
  10. U. Manber. A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems, 15(2):124–136, Apr. 1997.
  11. A. Moffat and A. Turpin. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA, 2002.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Raymond Wan (1)
  • Alistair Moffat (1)
  1. Department of Computer Science and Software Engineering, The University of Melbourne, Australia
