Dynamic Entropy-Compressed Sequences and Full-Text Indexes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

Given a sequence of n bits with binary zero-order entropy H_0, we present a dynamic data structure that requires nH_0 + o(n) bits of space and is capable of performing rank and select, as well as inserting and deleting bits at arbitrary positions, in O(log n) worst-case time. This extends previous results by Hon et al. [ISAAC 2003], which achieve O(log n / log log n) time for rank and select but \(\Theta({\textrm{polylog}}(n))\) amortized time for inserting and deleting bits, and require n + o(n) bits of space; and by Raman et al. [SODA 2002], which have constant query time but a static structure. In particular, our result becomes the first entropy-bound dynamic data structure for rank and select over bit sequences.
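
To make the four operations and the quantity H_0 concrete, the following is a minimal reference sketch of the interface only. It is not the structure of the paper: it stores the bits in a plain list, uses n bits or more, and every operation takes O(n) time rather than O(log n). The class name `NaiveDynamicBitVector`, the 0-based positions, and the half-open convention for `rank` are assumptions made for this illustration.

```python
import math


class NaiveDynamicBitVector:
    """Reference semantics only: every operation here is O(n), not O(log n)."""

    def __init__(self, bits=()):
        self.bits = list(bits)

    def insert(self, i, b):
        """Insert bit b so that it ends up at position i (0-based)."""
        self.bits.insert(i, b)

    def delete(self, i):
        """Remove the bit at position i (0-based)."""
        del self.bits[i]

    def rank(self, b, i):
        """Number of occurrences of bit b in positions [0, i)."""
        return sum(1 for x in self.bits[:i] if x == b)

    def select(self, b, j):
        """Position of the j-th occurrence of b (j >= 1), or -1 if absent."""
        seen = 0
        for pos, x in enumerate(self.bits):
            if x == b:
                seen += 1
                if seen == j:
                    return pos
        return -1

    def zero_order_entropy(self):
        """Binary zero-order entropy H_0, in bits per symbol."""
        n = len(self.bits)
        ones = sum(self.bits)
        h = 0.0
        for c in (ones, n - ones):
            # Symbols that never occur, or that fill the whole
            # sequence, contribute nothing to the entropy.
            if 0 < c < n:
                h += (c / n) * math.log2(n / c)
        return h
```

For example, on the bits 1 0 1 1 0, `rank(1, 3)` counts the ones among the first three positions and `select(1, 3)` returns the position of the third one. A sequence with as many zeros as ones has H_0 = 1, so the nH_0 + o(n) bound gives no saving over n + o(n) there; the saving appears when one of the two bit values is rare.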

We then show how the above result can be used to build a dynamic full-text self-index for a collection of texts over an alphabet of size σ, of overall length n and zero-order entropy H_0. The index requires nH_0 + o(n log σ) bits of space, and can count the number of occurrences of a pattern of length m in O(m log n log σ) time. Reporting the occ occurrences can be supported in O(occ log^2 n log σ) time, paying O(n) extra space. Insertion of text into the collection takes O(log n log σ) time per symbol, which becomes O(log^2 n log σ) for deletions. This improves a previous result by Chan et al. [CPM 2004]. As a consequence, we obtain an O(n log n log σ) time construction algorithm for a compressed self-index requiring nH_0 + o(n log σ) bits of working space during construction.
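
The abstract does not spell out how counting is performed. As a sketch under that caveat, the code below shows the standard backward search of the FM-index family (reference 6), in which each of the m pattern symbols costs a constant number of rank queries on the Burrows-Wheeler transform (reference 3). Assuming the index follows this scheme, replacing the naive `rank` below with one that costs O(log n log σ) would give the O(m log n log σ) counting time quoted above. The function names `bwt` and `count_occurrences`, the `"\0"` terminator, and the quadratic construction by sorting rotations are assumptions chosen for brevity, not the method of the paper.

```python
def bwt(text, terminator="\0"):
    """Burrows-Wheeler transform by sorting all rotations (illustration only)."""
    s = text + terminator  # terminator must be smaller than every text symbol
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def count_occurrences(bwt_string, pattern):
    """Count occurrences of pattern in the text by backward search on its BWT."""
    # C[c] = number of symbols in the text that are strictly smaller than c.
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def rank(c, i):
        # Naive O(n) rank: occurrences of c in bwt_string[0:i].
        return bwt_string[:i].count(c)

    # Half-open range [sp, ep) of sorted suffixes prefixed by the
    # part of the pattern processed so far.
    sp, ep = 0, len(bwt_string)
    for c in reversed(pattern):
        if c not in C:
            return 0
        sp = C[c] + rank(c, sp)
        ep = C[c] + rank(c, ep)
        if sp >= ep:
            return 0
    return ep - sp
```

For instance, `count_occurrences(bwt("abracadabra"), "abra")` performs four backward steps, one per pattern symbol, and the width of the final range is the number of occurrences.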

References

  1. Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ISI Series, pp. 85–96. Springer, Heidelberg (1985)

  2. Arroyuelo, D., Navarro, G.: Space-efficient construction of LZ-index. In: Deng, X., Du, D.-Z. (eds.) ISAAC 2005. LNCS, vol. 3827, pp. 1143–1152. Springer, Heidelberg (2005)

  3. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)

  4. Chan, W.-L., Hon, W.-K., Lam, T.-W.: Compressed index for a dynamic collection of texts. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 445–456. Springer, Heidelberg (2004)

  5. Dietz, P.: Optimal algorithms for list indexing and subset rank. In: Dehne, F., Santoro, N., Sack, J.-R. (eds.) WADS 1989. LNCS, vol. 382, pp. 39–46. Springer, Heidelberg (1989)

  6. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. FOCS 2000, pp. 390–398 (2000)

  7. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representation of sequences and full-text indexes. ACM Transactions on Algorithms (to appear, 2006); Preliminary versions, In: Proc. SPIRE 2004 and Tech. Rep. TR/DCC-2004-5, Dept. of Computer Science Univ. of Chile, ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/sequences.ps.gz

  8. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. SODA 2003, pp. 841–850 (2003)

  9. Hon, W.-K., Sadakane, K., Sung, W.-K.: Succinct data structures for searchable partial sums. In: Ibaraki, T., Katoh, N., Ono, H. (eds.) ISAAC 2003. LNCS, vol. 2906, pp. 505–516. Springer, Heidelberg (2003)

  10. Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)

  11. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 935–948 (1993)

  12. Navarro, G.: Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms (JDA) 2(1), 87–114 (2004)

  13. Raman, R., Raman, V., Srinivasa Rao, S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)

  14. Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. SODA 2002, pp. 233–242 (2002)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mäkinen, V., Navarro, G. (2006). Dynamic Entropy-Compressed Sequences and Full-Text Indexes. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_28

  • DOI: https://doi.org/10.1007/11780441_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer Science, Computer Science (R0)
