Abstract
This paper presents a novel algorithm that combines graph-based mining of sequential patterns with multi-layered compression. A graph structure is used to mine the frequent patterns needed from the text quickly and efficiently. These patterns are then used with a modification of the seminal LZ78 algorithm to improve compression efficiency. Arithmetic coding is applied on top of LZ78 to further reduce redundancy in the text and achieve higher compression. The proposed approach was tested on standard corpora and shows promising results in comparison with arithmetic coding.
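For readers unfamiliar with the baseline, the following is a minimal sketch of textbook LZ78, the algorithm the abstract says is modified. It is not the paper's method: the graph-based pattern mining, the modification to LZ78, and the arithmetic coding stage are not shown, and the function names `lz78_encode` and `lz78_decode` are illustrative choices rather than anything taken from the paper.

```python
def lz78_encode(text):
    """Encode text as (dictionary_index, next_char) pairs; index 0 is the empty phrase."""
    dictionary = {}  # maps each phrase seen so far to its 1-based index
    output = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the match while it is still a known phrase.
            phrase = candidate
        else:
            # Emit the longest known prefix plus the new character,
            # then register the extended phrase in the dictionary.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with an empty suffix.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by replaying the dictionary construction."""
    phrases = [""]  # index 0 is the empty phrase
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    sample = "abracadabra abracadabra"
    pairs = lz78_encode(sample)
    print(pairs)
    print(lz78_decode(pairs) == sample)
```

In plain LZ78 the dictionary starts empty and grows one phrase per emitted pair. The abstract's idea of using mined frequent patterns would change how that dictionary is populated, but the details are in the full paper and are not reproduced here.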
References
Calgary compression corpus datasets. corpus.canterbury.ac.nz/descriptions/. Accessed 23 July 2015
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD Record, vol. 22, pp. 207–216. ACM (1993)
Deutsch, L.P.: Deflate compressed data format specification version 1.3 (1996)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)
Huffman, D.A.: A method for the construction of minimum-redundancy codes. In: Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101 (1952)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Oswald, C., Ghosh, A.I., Sivaselvan, B.: An efficient text compression algorithm - data mining perspective. In: Prasath, R., Vuppala, A.K., Kathirvalavakumar, T. (eds.) MIKE 2015. LNCS (LNAI), vol. 9468, pp. 563–575. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26832-3_53
Oswald, C., Sivaselvan, B.: An optimal text compression algorithm based on frequent pattern mining. J. Ambient Intell. Humaniz. Comput. 1–20 (2017). Springer
Pavlov, I.: LZMA SDK (software development kit) (2007)
Ramakrishnan, N., Grama, A.Y.: Data mining: from serendipity to science. Computer 32(8), 34–37 (1999)
Salomon, D.: Data Compression: The Complete Reference. Springer Science & Business Media, Heidelberg (2004). https://doi.org/10.1007/978-1-84628-603-2
Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM 30(6), 520–540 (1987)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theor. 23(3), 337–343 (1977)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theor. 24(5), 530–536 (1978)
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Oswald, C., Ajith Kumar, I., Avinash, J., Sivaselvan, B. (2017). A Graph-Based Frequent Sequence Mining Approach to Text Compression. In: Ghosh, A., Pal, R., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2017. Lecture Notes in Computer Science, vol 10682. Springer, Cham. https://doi.org/10.1007/978-3-319-71928-3_35
DOI: https://doi.org/10.1007/978-3-319-71928-3_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71927-6
Online ISBN: 978-3-319-71928-3
eBook Packages: Computer Science, Computer Science (R0)