Simple, Fast, and Efficient Natural Language Adaptive Compression

  • Conference paper
String Processing and Information Retrieval (SPIRE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Abstract

One of the most successful natural language compression methods is word-based Huffman. However, such a two-pass semi-static compressor is not well suited to many interesting real-time transmission scenarios. A one-pass adaptive variant of Huffman exists, but it is character-oriented and rather complex. In this paper we implement word-based adaptive Huffman compression, showing that it obtains very competitive compression ratios. Then, we show how End-Tagged Dense Code, an alternative to word-based Huffman, can be turned into a faster and much simpler adaptive compression method which obtains almost the same compression ratios.
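The abstract does not spell out the coding scheme, so what follows is only a minimal sketch under stated assumptions. It uses the commonly described End-Tagged Dense Code layout: bytes with the high bit clear continue a codeword, a byte with the high bit set ends it, and a word's codeword depends only on its rank in a frequency-ordered vocabulary. The one-pass variant shown here keeps the same vocabulary on both sides and uses the first unused rank as an escape for new words. The names `encode_rank`, `decode_rank`, `Model`, `compress` and `decompress` are hypothetical, and the NUL-terminated plaintext for new words and the linear-time reordering in `bump` are simplifications chosen for brevity, not the paper's data structures.

```python
def encode_rank(rank: int) -> bytes:
    """Codeword for a 0-based rank: the last byte carries the end tag (high bit)."""
    out = [0x80 | (rank % 128)]
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)          # continuation bytes keep the high bit clear
        rank = rank // 128 - 1
    return bytes(reversed(out))


def decode_rank(code: bytes) -> int:
    """Inverse of encode_rank for one complete codeword."""
    acc = -1
    for b in code:
        acc = (acc + 1) * 128 + (b & 0x7F)
    return acc


class Model:
    """Vocabulary kept in non-increasing frequency order (assumed layout)."""

    def __init__(self):
        self.words = []   # rank -> word
        self.freq = []    # rank -> occurrence count
        self.rank = {}    # word -> rank

    def add(self, word):
        self.rank[word] = len(self.words)
        self.words.append(word)
        self.freq.append(0)

    def bump(self, word):
        """Count one occurrence and move the word up until order is restored."""
        i = self.rank[word]
        self.freq[i] += 1
        while i > 0 and self.freq[i - 1] < self.freq[i]:
            prev = self.words[i - 1]
            self.words[i - 1], self.words[i] = word, prev
            self.freq[i - 1], self.freq[i] = self.freq[i], self.freq[i - 1]
            self.rank[word], self.rank[prev] = i - 1, i
            i -= 1


def compress(words):
    model, out = Model(), bytearray()
    for w in words:
        if w in model.rank:
            out += encode_rank(model.rank[w])
        else:
            # Escape: first unused rank, then the word itself (assumes no NUL inside).
            out += encode_rank(len(model.words))
            out += w.encode("utf-8") + b"\x00"
            model.add(w)
        model.bump(w)
    return bytes(out)


def decompress(data: bytes):
    model, out, i = Model(), [], 0
    while i < len(data):
        j = i
        while data[j] < 0x80:           # scan to the tagged (final) byte
            j += 1
        rank = decode_rank(data[i:j + 1])
        i = j + 1
        if rank == len(model.words):    # escape: a new word follows in plaintext
            end = data.index(0, i)
            w = data[i:end].decode("utf-8")
            i = end + 1
            model.add(w)
        else:
            w = model.words[rank]
        out.append(w)
        model.bump(w)                   # mirror the sender's update exactly
    return out
```

Because the receiver applies the same `bump` after every word, both vocabularies stay identical without any model being transmitted, which is the property that makes a one-pass scheme usable for real-time transmission. A round trip such as `decompress(compress("to be or not to be".split()))` returns the original word list.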

This work is partially supported by the CYTED VII.19 RIBIDI Project. It is also funded in part (for the Spanish group) by MCyT (PGE and FEDER) grant TIC2003-06593 and (for the third author) by the Millennium Nucleus Center for Web Research, Grant P01-029-F, Mideplan, Chile.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brisaboa, N.R., Fariña, A., Navarro, G., Paramá, J.R. (2004). Simple, Fast, and Efficient Natural Language Adaptive Compression. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_34

  • DOI: https://doi.org/10.1007/978-3-540-30213-1_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23210-0

  • Online ISBN: 978-3-540-30213-1

  • eBook Packages: Springer Book Archive
