Computing the Cost of Compressed Data
Compression mechanisms reduce the storage cost of retained data. In the extreme case of data that must be retained indefinitely, the initial cost of performing the compression transformation can be amortized down to zero, since the savings in storage space continue to accrue without limit, albeit at decreasing rates as time goes by and disk storage becomes cheaper. A more typical scenario arises when a fixed data retention period must be supported, after which the stored data is no longer required, and when a certain level of access to the stored data can be expected during that period as part of a regulatory or compliance environment. In this second scenario, the total cost of retention (TCR) is a function of multiple competing factors, and the compression regime that provides the most compact storage might not be the one that provides the smallest TCR. This entry summarizes recent work in the area of cost models for...
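The trade-off described above can be made concrete with a small sketch. It is not the cost model from the work this entry summarizes. It is a deliberately simplified illustration that assumes TCR is the sum of a one-off encoding cost, a storage cost proportional to compressed size and retention time, and an access cost proportional to the number of access operations. The `Regime` fields, the function names, and every number below are hypothetical, and the storage price is held constant even though disk storage becomes cheaper over time.

```python
from dataclasses import dataclass


@dataclass
class Regime:
    """Hypothetical characteristics of one compression regime."""
    name: str
    ratio: float        # compressed size / original size (smaller is more compact)
    encode_cost: float  # one-off cost to compress 1 GB of input, in dollars
    decode_cost: float  # cost of one access operation per GB of input, in dollars


def total_cost_of_retention(regime, size_gb, years, storage_price, accesses_per_year):
    """Simplified TCR: encoding + storage over the period + access operations.

    storage_price is in dollars per stored GB per year (assumed constant).
    """
    encoding = regime.encode_cost * size_gb
    storage = regime.ratio * size_gb * storage_price * years
    access = regime.decode_cost * size_gb * accesses_per_year * years
    return encoding + storage + access


def cheapest(regimes, **scenario):
    """Return the regime with the smallest simplified TCR for a scenario."""
    return min(regimes, key=lambda r: total_cost_of_retention(r, **scenario))


# Made-up figures, chosen only to expose the trade-off.
compact = Regime("compact", ratio=0.20, encode_cost=0.050, decode_cost=0.010)
fast = Regime("fast", ratio=0.30, encode_cost=0.005, decode_cost=0.001)

if __name__ == "__main__":
    scenario = dict(size_gb=1000, years=5, storage_price=0.25, accesses_per_year=2)
    for regime in (compact, fast):
        print(regime.name, round(total_cost_of_retention(regime, **scenario), 2))
    print("cheapest:", cheapest([compact, fast], **scenario).name)
```

With these assumed figures the more compact regime has the lower storage bill, but the faster regime has the lower total once encoding and access costs are included. If the number of accesses is set to zero, the ordering reverses and the more compact regime becomes the cheaper one.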
- Duda J (2009) Asymmetric numeral systems. CoRR abs/0902.0271
- Duda J (2013) Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding. CoRR abs/1311.2540
- Farruggia A, Ferragina P, Venturini R (2014) Bicriteria data compression: efficient and usable. In: Proceedings of the European symposium on algorithms (ESA), pp 406–417
- Hoobin C, Puglisi SJ, Zobel J (2011) Relative Lempel-Ziv factorization for efficient storage and retrieval of web collections. PVLDB 5(3):265–273
- Liao K, Moffat A, Petri M, Wirth A (2017) A cost model for long-term compressed data retention. In: Proceedings of the ACM international conference on web search and data mining (WSDM), pp 241–249
- Moffat A, Petri M (2017) ANS-based index compression. In: Proceedings of the ACM international conference on information and knowledge management (CIKM), pp 677–686