Enhanced Byte Codes with Restricted Prefix Properties

Culpepper, J. Shane; Moffat, Alistair

doi:10.1007/11575832_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3772)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

Byte codes have a number of properties that make them attractive for practical compression systems: they are relatively easy to construct; they decode quickly; and they can be searched using standard byte-aligned string matching techniques. In this paper we describe a new type of byte code in which the first byte of each codeword completely specifies the number of bytes that comprise the suffix of the codeword. Our mechanism gives more flexible coding than previous constrained byte codes, and hence better compression. The structure of the code also suggests a heuristic approximation that allows savings to be made in the prelude that describes the code. We present experimental results that compare our new method with previous approaches to byte coding, in terms of both compression effectiveness and decoding throughput speeds.
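
The defining property described above, that the first byte alone fixes how many suffix bytes follow, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the 256 possible first-byte values are partitioned into four consecutive ranges, where a first byte in the k-th range introduces a codeword with k suffix bytes (k = 0..3). The tuple `v`, the sample values `(128, 100, 20, 8)`, and the function names `encode` and `decode` are hypothetical and chosen only for illustration.

```python
R = 256  # radix: number of distinct byte values


def encode(rank, v):
    """Encode a 0-based symbol rank as a byte string.

    v is a hypothetical tuple (v0, v1, v2, v3): v[k] first-byte values
    are reserved for codewords that carry k suffix bytes.
    """
    assert sum(v) <= R, "first-byte ranges must fit in one byte"
    first_base = 0  # smallest first-byte value of the current length class
    for suffix_len, count in enumerate(v):
        capacity = count * R ** suffix_len  # symbols this class can hold
        if rank < capacity:
            digits = []
            for _ in range(suffix_len):  # peel off suffix bytes, low to high
                digits.append(rank % R)
                rank //= R
            digits.append(first_base + rank)  # what remains selects the first byte
            return bytes(reversed(digits))
        rank -= capacity
        first_base += count
    raise ValueError("rank too large for this code")


def decode(data, pos, v):
    """Decode one codeword starting at data[pos].

    Returns (rank, next_pos). Only the first byte is inspected to learn
    the codeword length; suffix bytes are never tested.
    """
    first = data[pos]
    first_base = 0
    rank_base = 0  # number of symbols covered by shorter length classes
    for suffix_len, count in enumerate(v):
        if first < first_base + count:
            value = first - first_base
            for i in range(suffix_len):
                value = value * R + data[pos + 1 + i]
            return rank_base + value, pos + 1 + suffix_len
        rank_base += count * R ** suffix_len
        first_base += count
    raise ValueError("first byte is not assigned to any length class")


if __name__ == "__main__":
    v = (128, 100, 20, 8)  # illustrative split, not taken from the paper
    stream = b"".join(encode(r, v) for r in (3, 500, 70000))
    pos = 0
    while pos < len(stream):
        rank, pos = decode(stream, pos, v)
        print(rank)
```

In this sketch the decoder branches once on the first byte and then copies a known number of suffix bytes. Changing the split in `v` trades the number of short codewords against the total number of symbols the code can represent, which is the kind of flexibility the abstract refers to.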

Author information

Authors and Affiliations

NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, 3010, Australia
J. Shane Culpepper & Alistair Moffat

Editor information

Editors and Affiliations

University of Toronto
Mariano Consens
Dept. of Computer Science, University of Chile
Gonzalo Navarro

Cite this paper

Culpepper, J.S., Moffat, A. (2005). Enhanced Byte Codes with Restricted Prefix Properties. In: Consens, M., Navarro, G. (eds) String Processing and Information Retrieval. SPIRE 2005. Lecture Notes in Computer Science, vol 3772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575832_1

DOI: https://doi.org/10.1007/11575832_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29740-6
Online ISBN: 978-3-540-32241-2
eBook Packages: Computer Science (R0)
