A bit-level text compression scheme based on the ACW algorithm

Abstract

This paper describes and evaluates a new bit-level, lossless, adaptive, and asymmetric data compression scheme based on the adaptive character wordlength (ACW(n)) algorithm. The proposed scheme improves the compression ratio of the ACW(n) algorithm by dividing the binary sequence into a number of subsequences (s), each satisfying the condition that the number of decimal values (d) of the n-bit characters is at most 256. The new scheme is therefore referred to as ACW(n, s), where n is the adaptive character wordlength and s is the number of subsequences. The scheme was used to compress a number of text files from standard corpora. The results show that ACW(n, s) achieves a higher compression ratio than many widely used compression algorithms and performs competitively with state-of-the-art compression tools.
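
The subsequence condition can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how subsequence boundaries are chosen, so the greedy left-to-right split, the zero-padding of a trailing partial character, and the names `split_into_subsequences` and `max_values` are all assumptions made for illustration. The sketch reads the bit string as n-bit characters and starts a new subsequence whenever admitting one more character would push the count of distinct decimal values (d) above 256.

```python
def split_into_subsequences(bits: str, n: int, max_values: int = 256) -> list[list[int]]:
    """Split a bit string into subsequences of n-bit characters.

    Each subsequence contains at most `max_values` distinct decimal values.
    Greedy splitting and zero-padding are illustrative assumptions, not
    details taken from the paper.
    """
    # Assumption: pad a trailing partial character with zeros.
    if len(bits) % n:
        bits += "0" * (n - len(bits) % n)

    subsequences: list[list[int]] = []
    current: list[int] = []
    seen: set[int] = set()

    for i in range(0, len(bits), n):
        value = int(bits[i:i + n], 2)  # decimal value of one n-bit character
        # A new value that would exceed the limit closes the current subsequence.
        if value not in seen and len(seen) == max_values:
            subsequences.append(current)
            current, seen = [], set()
        seen.add(value)
        current.append(value)

    if current:
        subsequences.append(current)
    return subsequences


# Usage: 300 different 9-bit characters cannot fit in one subsequence.
bits = "".join(format(v, "09b") for v in range(300))
parts = split_into_subsequences(bits, n=9)
s = len(parts)  # the "s" in ACW(n, s) under this greedy assumption
```

Under this assumption, each subsequence uses at most 256 distinct values, so each of its characters can be indexed by a single byte; how the scheme actually encodes the subsequences is not stated in the abstract.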

Keywords

Data compression; bit-level text compression; ACW(n) algorithm; Huffman coding; adaptive coding

Copyright information

© Institute of Automation, Chinese Academy of Sciences and Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Arab Academy for Banking and Financial Sciences, Amman, Jordan
  2. Faculty of Information Technology, Petra University, Amman, Jordan
