Block Merging for Off-Line Compression
To bound memory consumption, most compression systems provide a facility that controls the amount of data that may be processed at once. In this work we consider the Re-Pair mechanism of Larsson and Moffat, which processes large messages as disjoint blocks. We show that the blocks emitted by Re-Pair can be post-processed to yield further savings, and describe techniques that allow files of 500 MB or more to be compressed in a holistic manner using less than that much main memory. The block merging process we describe has the additional advantage of allowing new text to be appended to the end of the compressed file.
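As a rough illustration of block-wise processing, the following is a minimal sketch of naive Re-Pair-style pair replacement applied independently to fixed-size blocks. It is not the implementation described in this work, and it does not attempt the merging step. The function names (`repair`, `expand`, `compress_blocks`), the choice of 256 as the first phrase identifier, and the quadratic-time rescanning are all assumptions made for brevity.

```python
from collections import Counter

def repair(symbols, first_id=256):
    """Naive Re-Pair: repeatedly replace the most frequent adjacent pair.

    Returns the reduced sequence and a dict mapping each new symbol to the
    pair it replaces. Overlapping occurrences (e.g. "aaa") are counted
    naively, so this sketch is correct but not optimal.
    """
    seq = list(symbols)
    rules = {}
    next_id = first_id
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats, nothing more to gain
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            # Scan left to right, replacing non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules

def expand(seq, rules):
    """Invert repair() by expanding phrase symbols back into bytes."""
    out = []
    stack = list(reversed(seq))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so that left is popped first
            stack.append(left)
        else:
            out.append(s)
    return bytes(out)

def compress_blocks(data, block_size):
    """Compress each disjoint block on its own, with its own phrase rules."""
    return [repair(data[start:start + block_size])
            for start in range(0, len(data), block_size)]

# Two identical 16-byte blocks followed by a short tail block.
blocks = compress_blocks(b"abababab" * 4 + b"xyz", block_size=16)
```

In this sketch each block builds its own rule set from scratch, so two blocks with similar content end up declaring the same phrases twice. That duplication is one plausible example of the redundancy that a post-processing pass over the blocks could remove.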
Keywords: Directed Acyclic Graph, Sink Node, Wall Street Journal, Arithmetic Coder, Entropy Coder
- A. Apostolico and S. Lonardi. Off-line compression by greedy textual substitution. Proc. IEEE, 88(11):1733–1744, Nov. 2000.
- J. Bentley and D. McIlroy. Data compression using long common strings. In J. A. Storer and M. Cohn, editors, Proc. 1999 IEEE Data Compression Conference, pages 287–295. IEEE Computer Society Press, Los Alamitos, California, Mar. 1999.
- A. Cannane and H. E. Williams. A compression scheme for large databases. In M. E. Orlowska, editor, Proc. 11th Australasian Database Conference, pages 6–11, Canberra, Australia, 2000. IEEE Computer Society Press, Los Alamitos, CA.
- A. Cannane and H. E. Williams. General-purpose compression for efficient retrieval. Journal of the American Society for Information Science and Technology, 52(5):430–437, Mar. 2001.
- N. J. Larsson and A. Moffat. Offline dictionary-based compression. Proc. IEEE, 88(11):1722–1732, Nov. 2000.
- U. Manber. A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems, 15(2):124–136, Apr. 1997.
- A. Moffat and A. Turpin. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA, 2002.