ADVIS 2004: Advances in Information Systems, pp. 420-429
A Preprocessor Adding Security to and Improving the Performances of Arithmetic and Huffman Codings
Abstract
Arithmetic Coding and Huffman Coding are among the most common lossless compression algorithms. Their compression performance is relatively low compared with some other tools, yet they remain widely accepted because of their high throughput rates. Both algorithms exploit symbol frequencies to achieve compression, and the compression achieved can be increased by redistributing the symbol statistics in a recoverable manner. We introduce a symbol redistribution scheme that serves as a preprocessor to improve compression. The preprocessor is itself an encryption machine that provides compression and simple security. It is followed by a conventional compression tool that offers further compression. The overall scheme is called the Secure Compressor (SeCom). The system, with either Arithmetic or Huffman Coding as the compressor, has been implemented and tested on sample texts in English and Turkish. Results show that SeCom considerably improves the compression performance of both algorithms and also introduces security to the system.
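The dependence of both coders on symbol frequencies can be illustrated with a small sketch. It is not the SeCom preprocessor, whose details are not given in the abstract. It only computes the zero-order Shannon entropy of a text and the average code length of a Huffman code built from the same symbol counts, to show that a more skewed symbol distribution costs fewer bits per symbol. The function names are hypothetical and chosen for illustration.

```python
import heapq
import math
from collections import Counter


def entropy_bits(text):
    """Zero-order Shannon entropy of text, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie breaker, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_bits_per_symbol(text):
    """Average Huffman code length over the text, in bits per symbol."""
    lengths = huffman_code_lengths(text)
    counts = Counter(text)
    return sum(counts[s] * lengths[s] for s in counts) / len(text)


if __name__ == "__main__":
    for sample in ("abcd", "aaaabbcd"):
        print(sample, entropy_bits(sample), huffman_bits_per_symbol(sample))
```

With a uniform four-symbol text both measures give 2 bits per symbol, while the skewed text "aaaabbcd" gives 1.75 bits per symbol. A preprocessor that reshapes the symbol statistics in a recoverable way aims to move the input toward the second kind of distribution.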
Keywords
English Text; Compression Performance; Lossless Compression; Arithmetic Code; Huffman Code