Abstract
In this paper we deal with compressed integer arrays that are equipped with fast random access. Our treatment improves over an earlier approach that used address-calculation coding to locate the elements and supported access and search operations in \(O(\lg (n+s))\) time for a sequence of n non-negative integers summing up to s. The idea is to complement the address-calculation method with index structures that considerably decrease access times and also enable updates. For all our structures the memory usage is \(n \lg(1 + s/n) + O(n)\) bits. First a read-only version is introduced that supports rank-based accesses to elements and retrievals of prefix sums in \(O(\lg \lg (n+s)\)) time, as well as prefix-sum searches in \(O(\lg n+ \lg \lg s)\) time, using the word RAM as the model of computation. The second version of the data structure supports accesses in \(O(\lg\lg U)\) time and changes of element values in \(O(\lg^2 U)\) time, where U is the universe size. Both versions performed quite well in practical experiments. A third extension to dynamic arrays is also described, supporting accesses and prefix-sum searches in \(O(\lg n + \lg\lg U)\) time, and insertions and deletions in \(O(\lg^2 U)\) time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. SIGMOD Rec. 22(2), 207–216 (1993)
Brisaboa, N.R., Ladra, S., Navarro, G.: Directly Addressable Variable-Length Codes. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 122–130. Springer, Heidelberg (2009)
Brodnik, A.: Computation of the least significant set bit. In: Proc. Electrotechnical and Comput. Sci. Conf., vol. B, pp. 7–10 (1993)
Culpepper, J.S., Moffat, A.: Compact Set Representation for Information Retrieval. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 137–148. Springer, Heidelberg (2007)
Delpratt, O., Rahman, N., Raman, R.: Compressed Prefix Sums. In: van Leeuwen, J., Italiano, G.F., van der Hoek, W., Meinel, C., Sack, H., Plášil, F. (eds.) SOFSEM 2007. LNCS, vol. 4362, pp. 235–247. Springer, Heidelberg (2007)
Fenwick, P.M.: A new data structure for cumulative frequency tables. Software Pract. Exper. 24(3), 327–336 (1994)
Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theoret. Comput. Sci. 372(1), 115–121 (2007)
González, R., Navarro, G.: Statistical Encoding of Succinct Data Structures. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 294–305. Springer, Heidelberg (2006)
Gupta, A., Hon, W.K., Shah, R., Vitter, J.S.: Compressed data structures: Dictionaries and data-aware measures. Theoret. Comput. Sci. 387(3), 313–331 (2007)
Hagerup, T.: Sorting and Searching on the Word RAM. In: Meinel, C., Morvan, M. (eds.) STACS 1998. LNCS, vol. 1373, pp. 366–398. Springer, Heidelberg (1998)
Jansson, J., Sadakane, K., Sung, W.K.: CRAM: Compressed random access memory. E-print arXiv:1011.1708v2, arXiv.org, Ithaca (2012)
Katajainen, J., Rao, S.S.: A compact data structure for representing a dynamic multiset. Inform. Process. Lett. 110(23), 1061–1066 (2010)
Moffat, A.: Compressing integer sequences and sets. In: Encyclopedia of Algorithms, pp. 178–182. Springer Science+Business Media, LLC, New York (2008)
Moffat, A., Stuiver, L.: Binary interpolative coding for effective index compression. Inform. Retrieval 3(1), 25–47 (2000)
Moffat, A., Zobel, J.: Self-indexing inverted files for fast text retrieval. ACM Trans. Inform. Syst. 14(4), 349–379 (1996)
Raman, R., Raman, V., Rao, S.S.: Succinct Dynamic Data Structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)
Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4), 43:1–43:25 (2007)
Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proc. 17th Annual ACM-SIAM Symp. on Discrete Algorithms, pp. 1230–1239. ACM/SIAM, New York/Philadelphia (2006)
Teuhola, J.: Interpolative coding of integer sequences supporting log-time random access. Inform. Process. Manag. 47(5), 742–761 (2011)
Transier, F., Sanders, P.: Engineering basic algorithms of an in-memory text search engine. ACM Trans. Inform. Syst. 29(1), 2:1–2:37 (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Elmasry, A., Katajainen, J., Teuhola, J. (2012). Improved Address-Calculation Coding of Integer Arrays. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_22
Download citation
DOI: https://doi.org/10.1007/978-3-642-34109-0_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer ScienceComputer Science (R0)