Abstract
One of the most successful natural language compression methods is word-based Huffman. However, such a two-pass semi-static compressor is not well suited to many interesting real-time transmission scenarios. A one-pass adaptive variant of Huffman exists, but it is character-oriented and rather complex. In this paper we implement word-based adaptive Huffman compression, showing that it obtains very competitive compression ratios. Then, we show how End-Tagged Dense Code, an alternative to word-based Huffman, can be turned into a faster and much simpler adaptive compression method which obtains almost the same compression ratios.
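As a rough illustration of why End-Tagged Dense Code lends itself to a simple adaptive scheme, the sketch below assigns byte-oriented codewords directly from a word's frequency rank. It assumes the layout under which ETDC is commonly described (the final byte of each codeword has its high bit set, all earlier bytes have it clear); the abstract itself does not spell out this layout, and the function names `etdc_encode` and `etdc_decode` are hypothetical. It is not the adaptive compressor presented in the paper, only the rank-to-codeword mapping that such a compressor could build on.

```python
def etdc_encode(rank: int) -> bytes:
    """Map a 0-based frequency rank to a codeword (assumed ETDC layout)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    # Final byte: high bit set (values 128..255) marks the end of the codeword.
    out = [128 + rank % 128]
    rank = rank // 128 - 1
    # Earlier bytes: high bit clear (values 0..127).
    while rank >= 0:
        out.append(rank % 128)
        rank = rank // 128 - 1
    return bytes(reversed(out))


def etdc_decode(code: bytes) -> int:
    """Invert etdc_encode for a single, complete codeword."""
    if not code or code[-1] < 128 or any(b >= 128 for b in code[:-1]):
        raise ValueError("not a single well-formed codeword")
    rank = 0
    for b in code[:-1]:
        rank = (rank + b + 1) * 128
    return rank + (code[-1] - 128)
```

Because a codeword depends only on the rank, no code tree has to be stored or updated: under these assumptions, an adaptive sender and receiver would only need to keep their vocabularies ordered by frequency in the same way.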
This work is partially supported by the CYTED VII.19 RIBIDI Project. It is also funded in part (for the Spanish group) by MCyT (PGE and FEDER) grant (TIC2003-06593) and (for the third author) by Millennium Nucleus Center for Web Research, Grant (P01-029-F), Mideplan, Chile.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brisaboa, N.R., Fariña, A., Navarro, G., Paramá, J.R. (2004). Simple, Fast, and Efficient Natural Language Adaptive Compression. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_34
DOI: https://doi.org/10.1007/978-3-540-30213-1_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive