Transforming the Natural Language Text for Improving Compression Performance

Gupta, Ashutosh; Agarwal, Suneeta

doi:10.1007/978-0-387-74935-8_43

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 6)

In the last 20 years, we have seen a vast explosion of textual information flowing over the Web through electronic mail, Web browsing, information retrieval systems, and so on. The importance of data compression is likely to grow as the amount of data that must be transmitted or archived continues to increase. In the field of data compression, researchers have developed various approaches, such as Huffman encoding, arithmetic encoding, the Ziv–Lempel family, dynamic Markov compression, prediction by partial matching (PPM) [1], and Burrows–Wheeler transform (BWT) [2] based algorithms, among others. BWT permutes the symbols of a data sequence that share the same unbounded context by cyclic rotation followed by a lexicographic sort. BWT-based compressors use move-to-front coding and an entropy coder as the backend. PPM is slow and consumes a large amount of memory to store context information, but it achieves better compression than almost all existing compression algorithms.
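The BWT and move-to-front steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is the naive textbook formulation, not the transform proposed in the chapter. The function names, the `"\0"` end-of-string sentinel, and the initial move-to-front alphabet (the sorted set of input symbols) are assumptions made for illustration. Sorting every rotation costs far more than the suffix-sorting techniques practical compressors rely on.

```python
def bwt(text: str, eos: str = "\0") -> str:
    """Naive BWT: sort all cyclic rotations and keep the last column."""
    if eos in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + eos  # the sentinel makes the transform invertible
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, eos: str = "\0") -> str:
    """Naive inverse: repeatedly prepend the last column and re-sort."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(eos))
    return row[:-1]


def move_to_front(data: str) -> list[int]:
    """Replace each symbol by its index in a list that is updated
    by moving the symbol just seen to the front."""
    alphabet = sorted(set(data))  # assumed initial ordering
    out = []
    for ch in data:
        idx = alphabet.index(ch)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(move_to_front(transformed))
    print(inverse_bwt(transformed))
```

Because BWT groups symbols that share a context, the move-to-front output tends to contain many small indices and repeated zeros, which is what makes it a good input for the entropy coder at the back end.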

References

[1] A. Moffat (1990) Implementing the PPM data compression scheme. IEEE Transactions on Communications, 38(11):1917–1921.
[2] M. Burrows and D. Wheeler (1994) A block-sorting lossless data compression algorithm. Technical Report, SRC Research Report 124, Digital Systems Research Center, Palo Alto, CA.
[3] F.S. Awan and A. Mukherjee (2001) LIPT: A lossless text transform to improve compression. In Proceedings of International Conference on Information Technology: Coding and Computing, Las Vegas, Nevada, IEEE Computer Society.
[4] R. Franceschini and A. Mukherjee (1996) Data compression using encrypted text. In Proceedings of the Third Forum on Research and Technology, Advances on Digital Libraries, 130–138. ADL.
[5] J. Heaps (1978) Information Retrieval—Computational and Theoretical Aspects. Academic Press, New York.
[6] M.D. Araujo, G. Navarro, and N. Ziviani (1997) Large text searching allowing errors. In Proceedings of the 4th South American Workshop on String Processing. R. Baeza-Yates, Ed. Carleton University Press International Informatics Series, vol. 8. Carleton University Press, Ottawa, Canada, 2–20.
[7] E.S. Moura, G. Navarro, and N. Ziviani (1997) Indexing compressed text. In Proceedings of the 4th South American Workshop on String Processing. R. Baeza-Yates, Ed. Carleton University Press International Informatics Series, vol. 8. Carleton University Press, Ottawa, Canada, 95–111.

Author information

Authors and Affiliations

Computer Science & Engineering Department, Institute of Engineering and Rural Technology, Allahabad, India
Ashutosh Gupta
Computer Science & Engineering Department, Motilal Nehru National Institute of Technology, Allahabad, India
Suneeta Agarwal

Editor information

Editors and Affiliations

Department of Computer Science, Tijuana Institute of Technology, 4207, Chula Vista, CA, 91909, USA
Oscar Castillo
Department of Systems Science and Engineering Yu-Quan Campus, Zhejiang University College of Electrical Engineering, 310027, Hangzhou, People's Republic of China
Li Xu
IAENG Secretariat, 37–39 Hung To Road Unit 1, 1/F, Hong Kong, People's Republic of China
Sio-Iong Ao

About this chapter

Cite this chapter

Gupta, A., Agarwal, S. (2008). Transforming the Natural Language Text for Improving Compression Performance. In: Castillo, O., Xu, L., Ao, SI. (eds) Trends in Intelligent Systems and Computer Engineering. Lecture Notes in Electrical Engineering, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-74935-8_43

DOI: https://doi.org/10.1007/978-0-387-74935-8_43
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-74934-1
Online ISBN: 978-0-387-74935-8
eBook Packages: Engineering, Engineering (R0)
