Compressing Integer Sequences and Sets

2000; Moffat, Stuiver

  • Reference work entry

Problem Definition

Suppose that a message \( { M=\langle s_1, s_2, \dots, s_n\rangle } \) of length \( { n=|M| } \) symbols is to be represented, where each symbol \( { s_i } \) is an integer in the range \( { 1 \leq s_i \leq U } \), for some upper limit \( { U } \) that may or may not be known, and may or may not be finite. Messages in this form are commonly the output of some kind of modeling step in a data compression system. The objective is to represent the message over a binary output alphabet \( { \{ \texttt{0},\texttt{1}\} } \) using as few output bits as possible. A special case of the problem arises when the elements of the message are strictly increasing, \( { s_i < s_{i+1} } \). In this case the message \( { M } \) can be thought of as identifying a subset of \( { \{1, 2, \dots, U\} } \). Examples include storing sets of IP addresses or product codes, and recording the destinations of hyperlinks in the graph representation of the World Wide Web....
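The problem statement can be made concrete with a small sketch. The preview text above does not name any particular code, so the choices here are assumptions made purely for illustration: the strictly increasing case is turned into a sequence of positive differences, and each integer is written with the Elias gamma code, which needs no knowledge of U. All function names are hypothetical. The last helper computes the counting bound \( { \lceil \log_2 \binom{U}{n} \rceil } \) bits that any fixed-length representation of an n-element subset of \( { \{1, 2, \dots, U\} } \) must meet, which applies only when U is finite and known.

```python
from math import ceil, comb, log2


def to_gaps(increasing):
    """Map a strictly increasing message (all s_i >= 1) to positive differences."""
    gaps, prev = [], 0
    for s in increasing:
        if s <= prev:
            raise ValueError("message must be strictly increasing with s_i >= 1")
        gaps.append(s - prev)
        prev = s
    return gaps


def from_gaps(gaps):
    """Invert to_gaps by taking running sums."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out


def gamma_encode(x):
    """Elias gamma codeword for an integer x >= 1, as a string of '0'/'1'."""
    if x < 1:
        raise ValueError("gamma code is defined for integers >= 1")
    binary = bin(x)[2:]                      # binary digits, leading bit is '1'
    return "0" * (len(binary) - 1) + binary  # length prefix in unary, then value


def encode_message(message):
    """Concatenate the gamma codewords of every symbol in the message."""
    return "".join(gamma_encode(s) for s in message)


def decode_message(bits):
    """Decode a concatenation of gamma codewords back into integers."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


def subset_bound_bits(n, universe):
    """Counting bound: bits needed to distinguish all n-subsets of {1..universe}."""
    return ceil(log2(comb(universe, n)))
```

For example, the set {3, 8, 9, 15} becomes the differences 3, 5, 1, 6, which this sketch writes in 14 bits; with U = 16 the counting bound for a 4-element subset is 11 bits. How close a practical code gets to such a bound depends on how the differences are distributed, which is why the choice of code matters.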

Recommended Reading

  1. Anh, V.N., Moffat, A.: Improved word-aligned binary compression for text indexing. IEEE Trans. Knowl. Data Eng. 18(6), 857–861 (2006)

  2. Boldi, P., Vigna, S.: Codes for the world-wide web. Internet Math. 2(4), 405–427 (2005)

  3. Brisaboa, N.R., Fariña, A., Navarro, G., Esteller, M.F.: \( { ({S},{C}) } \)-dense coding: An optimized compression code for natural language text databases. In: Nascimento, M.A. (ed.) Proc. Symp. String Processing and Information Retrieval. LNCS, vol. 2857, pp. 122–136, Manaus, Brazil, October 2003

  4. Chen, D., Chiang, Y.J., Memon, N., Wu, X.: Optimal alphabet partitioning for semi-adaptive coding of sources of unknown sparse distributions. In: Storer, J.A., Cohn, M. (eds.) Proc. 2003 IEEE Data Compression Conference, pp. 372–381, IEEE Computer Society Press, Los Alamitos, California, March 2003

  5. Cheng, C.S., Shann, J.J.J., Chung, C.P.: Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems. Inf. Process. Manag. 42(2), 407–428 (2006)

  6. Culpepper, J.S., Moffat, A.: Enhanced byte codes with restricted prefix properties. In: Consens, M.P., Navarro, G. (eds.) Proc. Symp. String Processing and Information Retrieval. LNCS, vol. 3772, pp. 1–12, Buenos Aires, November 2005

  7. de Moura, E.S., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and flexible word searching on compressed text. ACM Trans. Inf. Syst. 18(2), 113–139 (2000)

  8. Fenwick, P.: Universal codes. In: Sayood, K. (ed.) Lossless Compression Handbook, pp. 55–78, Academic Press, Boston (2003)

  9. Fraenkel, A.S., Klein, S.T.: Novel compression of sparse bit-strings – Preliminary report. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words, NATO ASI Series F, vol. 12, pp. 169–183. Springer, Berlin (1985)

  10. Gupta, A., Hon, W.K., Shah, R., Vitter, J.S.: Compressed data structures: Dictionaries and data-aware measures. In: Storer, J.A., Cohn, M. (eds.) Proc. 16th IEEE Data Compression Conference, pp. 213–222, Snowbird, Utah, March 2006. IEEE Computer Society, Los Alamitos, CA

  11. Moffat, A., Anh, V.N.: Binary codes for locally homogeneous sequences. Inf. Process. Lett. 99(5), 75–80 (2006). Source code available from www.cs.mu.oz.au/~alistair/rbuc/

  12. Moffat, A., Stuiver, L.: Binary interpolative coding for effective index compression. Inf. Retr. 3(1), 25–47 (2000)

  13. Moffat, A., Turpin, A.: Compression and Coding Algorithms. Kluwer Academic Publishers, Boston (2002)

  14. Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th ACM-SIAM Symposium on Discrete Algorithms, pp. 233–242, San Francisco, CA, January 2002, SIAM, Philadelphia, PA

  15. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edn. Morgan Kaufmann, San Francisco (1999)

Copyright information

© 2008 Springer-Verlag

About this entry

Cite this entry

Moffat, A. (2008). Compressing Integer Sequences and Sets. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30162-4_84
