Implementing Efficient Updates in Compressed Big Text Databases

  • Conference paper
Database and Expert Systems Applications (DEXA 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8056))

Abstract

Text compression techniques such as bzip2 do not support inserting a string into, or deleting a string from, a given position in a compressed text without first decompressing it. We present a technique called DICIRT that supports fast insertion into and deletion from compressed texts without full decompression. For inserted fragments of up to 8% of the original text size, and for deleted fragments of up to 15% of the original text size, DICIRT is faster than decompressing the text, modifying the uncompressed text, and recompressing the result.
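For orientation, the baseline that the abstract compares against (decompress, modify, recompress) can be sketched as follows. This is not DICIRT, whose internals are not described in the abstract. It is only a minimal illustration of the full round trip, assuming Python's standard `bz2` module as the compressor. The function names `insert_baseline` and `delete_baseline` are hypothetical and chosen for this sketch.

```python
import bz2


def insert_baseline(compressed: bytes, position: int, fragment: bytes) -> bytes:
    """Insert `fragment` at byte offset `position` via a full round trip."""
    text = bz2.decompress(compressed)  # full decompression of the whole text
    if not 0 <= position <= len(text):
        raise ValueError("position out of range")
    updated = text[:position] + fragment + text[position:]
    return bz2.compress(updated)  # full recompression of the whole text


def delete_baseline(compressed: bytes, position: int, length: int) -> bytes:
    """Delete `length` bytes starting at `position` via a full round trip."""
    text = bz2.decompress(compressed)
    if position < 0 or length < 0 or position + length > len(text):
        raise ValueError("range out of bounds")
    updated = text[:position] + text[position + length:]
    return bz2.compress(updated)


if __name__ == "__main__":
    packed = bz2.compress(b"hello world")
    packed = insert_baseline(packed, 5, b",")
    print(bz2.decompress(packed))  # the text with "," inserted after "hello"
```

Both functions touch the entire text even for a one-byte change, which is the cost that an update technique working directly on the compressed representation aims to avoid.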

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Böttcher, S., Bültmann, A., Hartel, R., Schlüßler, J. (2013). Implementing Efficient Updates in Compressed Big Text Databases. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40173-2_17

  • DOI: https://doi.org/10.1007/978-3-642-40173-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40172-5

  • Online ISBN: 978-3-642-40173-2

  • eBook Packages: Computer Science, Computer Science (R0)
