Compressing Integer Sequences and Sets

Moffat, Alistair

doi:10.1007/978-0-387-30162-4_84


2000; Moffat, Stuiver


Reference work entry


Problem Definition

Suppose that a message \( M=\langle s_1, s_2, \dots, s_n\rangle \) of length \( n=|M| \) symbols is to be represented, where each symbol \( s_i \) is an integer in the range \( 1 \leq s_i \leq U \), for some upper limit \( U \) that may or may not be known, and may or may not be finite. Messages in this form are commonly the output of some kind of modeling step in a data compression system. The objective is to represent the message over the binary output alphabet \( \{ \texttt{0},\texttt{1}\} \) using as few output bits as possible. A special case of the problem arises when the elements of the message are strictly increasing, \( s_i < s_{i+1} \). In this case the message \( M \) can be thought of as identifying a subset of \( \{1, 2, \dots, U\} \). Examples include storing sets of IP addresses or product codes, and recording the destinations of hyperlinks in the graph representation of the world wide web....
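To make the problem concrete, the following is a minimal Python sketch of one simple baseline, not necessarily the method this entry goes on to develop. It assumes the Elias gamma code, a standard universal code that needs no upper limit \( U \) and spends \( 2\lfloor\log_2 x\rfloor+1 \) bits on an integer \( x \geq 1 \). For the strictly increasing case it uses the common device of coding differences between consecutive elements ("d-gaps"), which turns a set back into a message of positive integers. All function names are hypothetical, and bits are kept as a string of '0'/'1' characters for readability rather than packed into bytes.

```python
def gamma_encode(x: int) -> str:
    """Elias gamma codeword for x >= 1 (illustrative baseline, not the entry's method).

    floor(log2 x) zeros, then the binary digits of x; no upper limit U is needed.
    """
    if x < 1:
        raise ValueError("gamma codes are defined only for x >= 1")
    binary = bin(x)[2:]                      # binary digits of x; the first digit is always 1
    return "0" * (len(binary) - 1) + binary  # unary length prefix + binary value


def gamma_decode_all(bits: str) -> list[int]:
    """Split a concatenation of valid gamma codewords back into integers."""
    values, pos = [], 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":              # count the unary prefix
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2))  # read zeros + 1 binary digits
        pos += zeros + 1
    return values


def encode_message(message: list[int]) -> str:
    """Represent M = <s_1, ..., s_n>, each s_i >= 1, as a single bit string."""
    return "".join(gamma_encode(s) for s in message)


def set_to_gaps(sorted_values: list[int]) -> list[int]:
    """Map a strictly increasing sequence to its d-gaps s_i - s_{i-1}, with s_0 = 0.

    Every gap is >= 1, so the gaps are again a valid message, and dense sets
    produce small gaps that receive short codewords.
    """
    gaps, previous = [], 0
    for s in sorted_values:
        if s <= previous:
            raise ValueError("input must be strictly increasing with all values >= 1")
        gaps.append(s - previous)
        previous = s
    return gaps


def gaps_to_set(gaps: list[int]) -> list[int]:
    """Inverse of set_to_gaps: running sums recover the original elements."""
    values, total = [], 0
    for g in gaps:
        total += g
        values.append(total)
    return values
```

For example, `encode_message([3, 1, 4, 1, 5])` yields the 15-bit string `011100100100101`, and `gamma_decode_all` recovers the original message from it. For the set {3, 7, 8, 20}, gamma-coding the values directly costs 24 bits, while coding the gaps [3, 4, 1, 12] costs 16 bits, which illustrates why the strictly increasing case is usually handled through gaps or similar transformations.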


Recommended Reading

Anh, V.N., Moffat, A.: Improved word-aligned binary compression for text indexing. IEEE Trans. Knowl. Data Eng. 18(6), 857–861 (2006)
Boldi, P., Vigna, S.: Codes for the world-wide web. Internet Math. 2(4), 405–427 (2005)
Brisaboa, N.R., Fariña, A., Navarro, G., Esteller, M.F.: (S,C)-dense coding: An optimized compression code for natural language text databases. In: Nascimento, M.A. (ed.) Proc. Symp. String Processing and Information Retrieval. LNCS, vol. 2857, pp. 122–136, Manaus, Brazil, October 2003
Chen, D., Chiang, Y.J., Memon, N., Wu, X.: Optimal alphabet partitioning for semi-adaptive coding of sources of unknown sparse distributions. In: Storer, J.A., Cohn, M. (eds.) Proc. 2003 IEEE Data Compression Conference, pp. 372–381, IEEE Computer Society Press, Los Alamitos, California, March 2003
Cheng, C.S., Shann, J.J.J., Chung, C.P.: Unique-order interpolative coding for fast querying and space-efficient indexing in information retrieval systems. Inf. Process. Manag. 42(2), 407–428 (2006)
Culpepper, J.S., Moffat, A.: Enhanced byte codes with restricted prefix properties. In: Consens, M.P., Navarro, G. (eds.) Proc. Symp. String Processing and Information Retrieval. LNCS, vol. 3772, pp. 1–12, Buenos Aires, November 2005
de Moura, E.S., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and flexible word searching on compressed text. ACM Trans. Inf. Syst. 18(2), 113–139 (2000)
Fenwick, P.: Universal codes. In: Sayood, K. (ed.) Lossless Compression Handbook, pp. 55–78. Academic Press, Boston (2003)
Fraenkel, A.S., Klein, S.T.: Novel compression of sparse bit-strings – Preliminary report. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words, NATO ASI Series F, vol. 12, pp. 169–183. Springer, Berlin (1985)
Gupta, A., Hon, W.K., Shah, R., Vitter, J.S.: Compressed data structures: Dictionaries and data-aware measures. In: Storer, J.A., Cohn, M. (eds.) Proc. 16th IEEE Data Compression Conference, pp. 213–222, IEEE Computer Society, Los Alamitos, CA, Snowbird, Utah, March 2006
Moffat, A., Anh, V.N.: Binary codes for locally homogeneous sequences. Inf. Process. Lett. 99(5), 75–80 (2006). Source code available from www.cs.mu.oz.au/~alistair/rbuc/
Moffat, A., Stuiver, L.: Binary interpolative coding for effective index compression. Inf. Retr. 3(1), 25–47 (2000)
Moffat, A., Turpin, A.: Compression and Coding Algorithms. Kluwer Academic Publishers, Boston (2002)
Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th ACM-SIAM Symposium on Discrete Algorithms, pp. 233–242, San Francisco, CA, January 2002. SIAM, Philadelphia, PA
Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edn. Morgan Kaufmann, San Francisco (1999)


Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, University of Melbourne, Melbourne, VIC, Australia
Alistair Moffat


Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, McCormick School of Engineering and Applied Science, Northwestern University, Evanston, IL 60208, USA
Ming-Yang Kao, Professor of Computer Science



About this entry

Cite this entry

Moffat, A. (2008). Compressing Integer Sequences and Sets. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30162-4_84


DOI: https://doi.org/10.1007/978-0-387-30162-4_84
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30770-1
Online ISBN: 978-0-387-30162-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
