A Graph-Based Frequent Sequence Mining Approach to Text Compression

Oswald, C.; Ajith Kumar, I.; Avinash, J.; Sivaselvan, B.

doi:10.1007/978-3-319-71928-3_35

Conference paper
First Online: 28 November 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10682)

Abstract

This paper presents a novel text compression algorithm that mines sequential patterns with a graph structure and compresses the text in multiple layers. The graph allows the frequent patterns needed for compression to be mined quickly and efficiently. These patterns are then used by a modified version of the seminal LZ78 algorithm to improve compression efficiency. Arithmetic coding is applied on top of the LZ78 output to further reduce redundancy and achieve higher compression ratios. The proposed approach has been tested on standard corpora and shows promising results compared with arithmetic coding.
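
The pipeline described above can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation: the abstract does not specify the graph structure, so `mine_frequent_patterns` is a hypothetical stand-in that builds a directed graph whose edge (a, b) counts how often symbol b directly follows symbol a, then grows patterns along edges whose weight meets `min_support`. Because every occurrence of p + c contains the bigram formed by the last symbol of p and c, an infrequent edge safely prunes that extension. The hypothetical `lz78_encode`/`lz78_decode` pair pre-loads the LZ78 dictionary with the mined patterns (and all their prefixes), which is one plausible reading of "a modification of LZ78"; the decoder must receive the same patterns. The arithmetic-coding layer is omitted. All function and parameter names here (`count_occurrences`, `seed_phrases`, `max_len`, and so on) are illustrative.

```python
from collections import Counter, defaultdict


def count_occurrences(text, pattern):
    """Count overlapping occurrences of pattern in text."""
    count, start = 0, 0
    while (idx := text.find(pattern, start)) != -1:
        count += 1
        start = idx + 1
    return count


def mine_frequent_patterns(text, min_support, max_len=4):
    """Hypothetical graph-based miner (an assumption, not the paper's method).

    Nodes are symbols; edges[a][b] counts how often b directly follows a.
    A pattern p is extended by c only if edge (p[-1], c) is frequent, since
    every occurrence of p + c contains the bigram p[-1] + c.
    """
    edges = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        edges[a][b] += 1

    # Length-2 patterns are exactly the frequent edges of the graph.
    frontier = [a + b for a, succ in edges.items()
                for b, w in succ.items() if w >= min_support]
    frequent = set(frontier)

    # Grow patterns one symbol at a time by walking outgoing edges.
    while frontier:
        next_frontier = []
        for p in frontier:
            if len(p) >= max_len:
                continue
            for c, w in edges[p[-1]].items():
                cand = p + c
                if (w >= min_support and cand not in frequent
                        and count_occurrences(text, cand) >= min_support):
                    frequent.add(cand)
                    next_frontier.append(cand)
        frontier = next_frontier
    return frequent


def seed_phrases(patterns):
    """Initial phrase list: the empty phrase plus every prefix of each pattern.

    Keeping the dictionary prefix-closed lets the encoder's symbol-by-symbol
    longest-match walk reach every seeded pattern.
    """
    phrases, seen = [""], {""}
    for pattern in sorted(patterns, key=lambda s: (len(s), s)):
        for k in range(1, len(pattern) + 1):
            prefix = pattern[:k]
            if prefix not in seen:
                seen.add(prefix)
                phrases.append(prefix)
    return phrases


def lz78_encode(text, patterns=()):
    """LZ78 encoding into (phrase index, next symbol) pairs, optionally seeded."""
    dictionary = {p: i for i, p in enumerate(seed_phrases(patterns))}
    output, phrase = [], ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest known match
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:  # input ended inside a known phrase
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs, patterns=()):
    """Invert lz78_encode; the decoder must be given the same seed patterns."""
    phrases = seed_phrases(patterns)
    out = []
    for index, ch in pairs:
        entry = phrases[index] + ch
        out.append(entry)
        phrases.append(entry)
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra abracadabra abracadabra"
    patterns = mine_frequent_patterns(text, min_support=3)
    seeded, plain = lz78_encode(text, patterns), lz78_encode(text)
    assert lz78_decode(seeded, patterns) == text
    assert lz78_decode(plain) == text
    print(len(plain), "pairs without seeds,", len(seeded), "pairs with seeds")
```

Counting pairs is only a rough proxy for compression: seeding enlarges the dictionary, so each index needs more bits, and the patterns themselves must be sent to the decoder as side information.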


Author information

Authors and Affiliations

Department of Computer Engineering, Indian Institute of Information Technology, Design and Manufacturing Kancheepuram, Chennai, India
C. Oswald & B. Sivaselvan
Department of Computer Science and Engineering, Sona College of Technology, Salem, India
I. Ajith Kumar & J. Avinash

Corresponding author

Correspondence to J. Avinash.

Editor information

Editors and Affiliations

Indian Statistical Institute, Kolkata, India
Ashish Ghosh
Institute for Development and Research in Banking Technology, Hyderabad, India
Rajarshi Pal
Indian Institute of Information Technology, Sri City, India
Rajendra Prasath

About this paper

Cite this paper

Oswald, C., Ajith Kumar, I., Avinash, J., Sivaselvan, B. (2017). A Graph-Based Frequent Sequence Mining Approach to Text Compression. In: Ghosh, A., Pal, R., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2017. Lecture Notes in Computer Science, vol 10682. Springer, Cham. https://doi.org/10.1007/978-3-319-71928-3_35

DOI: https://doi.org/10.1007/978-3-319-71928-3_35
Published: 28 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71927-6
Online ISBN: 978-3-319-71928-3
eBook Packages: Computer Science, Computer Science (R0)
