Implementing Efficient Updates in Compressed Big Text Databases

Böttcher, Stefan; Bültmann, Alexander; Hartel, Rita; Schlüßler, Jonathan

doi:10.1007/978-3-642-40173-2_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8056)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Text compression techniques such as bzip2 do not allow strings to be inserted at, or deleted from, a given position in a compressed text without first decompressing that text. We present a technique called DICIRT that supports fast insertion into and deletion from compressed texts without full decompression. For inserted fragments of up to 8% of the original text size, and for deleted fragments of up to 15% of the original text size, DICIRT is faster than the alternative of decompressing the text, modifying the uncompressed text, and compressing the result again.
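For orientation, the baseline that the abstract compares against (decompress, modify, recompress) can be sketched as follows. This is a minimal illustration using Python's standard `bz2` module, not the DICIRT technique itself, whose details are not given here. The function names `insert_via_recompression` and `delete_via_recompression` are hypothetical and chosen only for this sketch.

```python
import bz2


def insert_via_recompression(compressed: bytes, position: int, fragment: bytes) -> bytes:
    """Baseline insert: fully decompress, splice in the fragment, recompress."""
    text = bz2.decompress(compressed)
    if not 0 <= position <= len(text):
        raise ValueError("position is outside the text")
    updated = text[:position] + fragment + text[position:]
    return bz2.compress(updated)


def delete_via_recompression(compressed: bytes, position: int, length: int) -> bytes:
    """Baseline delete: fully decompress, cut out the range, recompress."""
    text = bz2.decompress(compressed)
    if length < 0 or not 0 <= position <= len(text) - length:
        raise ValueError("range is outside the text")
    updated = text[:position] + text[position + length:]
    return bz2.compress(updated)


if __name__ == "__main__":
    original = b"compressed big text databases " * 100
    compressed = bz2.compress(original)
    with_insert = insert_via_recompression(compressed, 10, b"XYZ")
    restored = delete_via_recompression(with_insert, 10, 3)
    print(bz2.decompress(restored) == original)
```

Both functions touch the whole text even for a small change, which is the cost that an update technique working directly on the compressed representation aims to avoid.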

Author information

Authors and Affiliations

Computer Science, University of Paderborn, Fürstenallee 11, 33102, Paderborn, Germany
Stefan Böttcher, Alexander Bültmann, Rita Hartel & Jonathan Schlüßler

Editor information

Editors and Affiliations

Instituto Tecnológico de Informática, Valencia, Spain
Hendrik Decker
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Department of Computer Science, The University of Auckland, 1010, Auckland, New Zealand
Sebastian Link
Department of Information Technologies, University of Economics, Winston Churchill Square 4, 130 67, Prague 3, Czech Republic
Josef Basl
Institute of Software Technology, Vienna University of Technology, Favoritenstraße 9-11 / 188, 1040, Vienna, Austria
A Min Tjoa

About this paper

Cite this paper

Böttcher, S., Bültmann, A., Hartel, R., Schlüßler, J. (2013). Implementing Efficient Updates in Compressed Big Text Databases. In: Decker, H., Lhotská, L., Link, S., Basl, J., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 2013. Lecture Notes in Computer Science, vol 8056. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40173-2_17

DOI: https://doi.org/10.1007/978-3-642-40173-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40172-5
Online ISBN: 978-3-642-40173-2
eBook Packages: Computer Science, Computer Science (R0)
