
A Graph-Based Frequent Sequence Mining Approach to Text Compression

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10682)

Abstract

The paper focuses on a novel algorithm that mines sequential patterns with a graph and combines them with multi-layered compression. Patterns are mined using a graph structure that allows the necessary frequent patterns to be mined from the text quickly and efficiently. These patterns are used with a modification of the seminal LZ78 algorithm to improve compression efficiency. Arithmetic coding is then applied on top of LZ78 to further reduce redundancy in the text and achieve higher compression ratios. The proposed approach has been tested on standard corpora and shows promising results compared with arithmetic coding.
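The abstract does not describe how the graph is built or traversed, so the following is only a minimal sketch under our own assumptions, not the authors' method. It treats a "frequent pattern" as a contiguous character sequence that occurs at least `min_support` times (overlapping occurrences counted). It builds a directed successor graph whose edge u -> v is weighted by how often character v follows u, and grows patterns one character at a time along the graph's edges. Because a sequence can never occur more often than any of its bigrams, low-weight edges can be pruned without losing frequent patterns. The names `build_successor_graph`, `mine_frequent_sequences`, `min_support` and `max_len` are hypothetical.

```python
from collections import defaultdict


def build_successor_graph(text: str) -> dict[str, dict[str, int]]:
    """Directed graph: edge u -> v weighted by how often character v follows u."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for u, v in zip(text, text[1:]):
        graph[u][v] += 1
    return graph


def mine_frequent_sequences(text: str, min_support: int,
                            max_len: int = 8) -> dict[str, int]:
    """Return contiguous sequences of length 2..max_len that occur at least
    min_support times (overlapping occurrences counted), mapped to their counts.

    Hypothetical illustration: patterns grow one character at a time, and an
    extension p + c is tried only if the graph edge last(p) -> c is frequent.
    """
    graph = build_successor_graph(text)

    # Start positions of every single character; keep only the frequent ones.
    starts: dict[str, list[int]] = defaultdict(list)
    for i, ch in enumerate(text):
        starts[ch].append(i)
    frontier = {ch: pos for ch, pos in starts.items() if len(pos) >= min_support}

    frequent: dict[str, int] = {}
    length = 1  # length of the patterns currently in the frontier
    while frontier and length < max_len:
        next_frontier: dict[str, list[int]] = {}
        for pattern, positions in frontier.items():
            for c, weight in graph[pattern[-1]].items():
                if weight < min_support:
                    continue  # prune: the bigram itself is infrequent
                # Keep only occurrences of `pattern` that are followed by c.
                ext = [s for s in positions
                       if s + length < len(text) and text[s + length] == c]
                if len(ext) >= min_support:
                    next_frontier[pattern + c] = ext
        frequent.update({p: len(pos) for p, pos in next_frontier.items()})
        frontier = next_frontier
        length += 1
    return frequent
```

For example, `mine_frequent_sequences("abcabcabd", min_support=2)` reports `"ab"` with support 3 and `"abcab"` with support 2, while `"bd"` is never produced because the edge b -> d has weight 1 and is pruned.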

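The abstract states only that the mined patterns are used with "a modification of the seminal LZ78 algorithm". One plausible modification, assumed here and not confirmed by the paper, is to pre-seed the LZ78 dictionary with the frequent patterns before encoding. All prefixes of each pattern are seeded too, so that LZ78's greedy longest-match parse can actually reach the full patterns, and the seeded table is sorted so that encoder and decoder build it identically. The helpers `seed_dictionary`, `lz78_encode` and `lz78_decode` are hypothetical names, and the arithmetic-coding stage that the abstract applies on top of LZ78 is not shown.

```python
def seed_dictionary(patterns: list[str]) -> list[str]:
    """Hypothetical seeding step: index 0 is the empty phrase, followed by every
    prefix of every pattern, sorted so encoder and decoder agree on indices."""
    closure = {p[:k] for p in patterns for k in range(1, len(p) + 1)}
    return [""] + sorted(closure, key=lambda s: (len(s), s))


def lz78_encode(text: str, seed: list[str]) -> list[tuple[int, str]]:
    """Classic LZ78 parse, except the phrase table starts from `seed`."""
    table = {phrase: i for i, phrase in enumerate(seed)}
    output: list[tuple[int, str]] = []
    w = ""
    for c in text:
        if w + c in table:
            w += c                         # keep extending the current match
        else:
            output.append((table[w], c))   # emit (index of longest match, next char)
            table[w + c] = len(table)      # learn the new phrase
            w = ""
    if w:
        output.append((table[w], ""))      # flush a trailing match with no next char
    return output


def lz78_decode(pairs: list[tuple[int, str]], seed: list[str]) -> str:
    """Rebuild the text, growing the phrase list exactly as the encoder did."""
    phrases = list(seed)
    out = []
    for index, c in pairs:
        phrase = phrases[index] + c
        out.append(phrase)
        if c:                              # the final flushed pair adds no phrase
            phrases.append(phrase)
    return "".join(out)
```

On `"abcabcabd"`, plain LZ78 (seed `[""]`) emits 6 (index, character) pairs, while seeding with the pattern `"abcab"` emits 2. Fewer pairs do not automatically mean fewer bits: the seeded table is larger, so each index needs more bits, and the decoder must also obtain the seed patterns.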


Author information


Correspondence to J. Avinash.


Copyright information

© 2017 Springer International Publishing AG

About this paper


Cite this paper

Oswald, C., Ajith Kumar, I., Avinash, J., Sivaselvan, B. (2017). A Graph-Based Frequent Sequence Mining Approach to Text Compression. In: Ghosh, A., Pal, R., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2017. Lecture Notes in Computer Science, vol 10682. Springer, Cham. https://doi.org/10.1007/978-3-319-71928-3_35


  • DOI: https://doi.org/10.1007/978-3-319-71928-3_35


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-71927-6

  • Online ISBN: 978-3-319-71928-3

  • eBook Packages: Computer Science, Computer Science (R0)
