Data compression with substitution

  • Maxime Crochemore
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 377)


All the data compression methods described in this paper are based on substitutions acting on characters or factors occurring inside the source texts. The average expected compression ratio is often close to 2. Most methods behave badly when errors appear in encoded texts: lose one bit and decompression is almost impossible!
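The fragility described above can be illustrated with a small sketch. It uses an LZ78-style factor substitution in the spirit of [ZL 78], chosen here only as an example; the fixed 8-bit fields for indices and characters, and all function names, are assumptions made for this illustration and are not the encoding used in the paper.

```python
INDEX_BITS = 8  # assumed field width for dictionary indices (illustrative)
CHAR_BITS = 8   # assumed field width for characters (illustrative)

def lz78_encode(text):
    """Return a list of (dictionary_index, next_char) pairs."""
    dictionary = {"": 0}
    tokens = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the current factor
        else:
            tokens.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush the last factor as (index of its prefix, its last character).
        tokens.append((dictionary[phrase[:-1]], phrase[-1]))
    return tokens

def lz78_decode(tokens):
    """Rebuild the text; the decoder grows the same dictionary as the encoder."""
    phrases = [""]
    out = []
    for index, ch in tokens:
        if index >= len(phrases):
            break  # reference to a phrase that does not exist: stop decoding
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)

def to_bits(tokens):
    """Serialize tokens as a string of '0'/'1' using fixed-width fields."""
    return "".join(f"{i:0{INDEX_BITS}b}{ord(c):0{CHAR_BITS}b}" for i, c in tokens)

def from_bits(bits):
    """Parse fixed-width tokens; an incomplete trailing chunk is ignored."""
    step = INDEX_BITS + CHAR_BITS
    tokens = []
    for pos in range(0, len(bits) - step + 1, step):
        chunk = bits[pos:pos + step]
        tokens.append((int(chunk[:INDEX_BITS], 2), chr(int(chunk[INDEX_BITS:], 2))))
    return tokens

def drop_bit(bits, position):
    """Simulate the loss of a single bit during storage or transmission."""
    return bits[:position] + bits[position + 1:]

text = "ababababab"
bits = to_bits(lz78_encode(text))
print(lz78_decode(from_bits(bits)))                       # intact stream
print(repr(lz78_decode(from_bits(drop_bit(bits, 16)))))   # one bit lost
```

Because every later token is read relative to the bit position and to the dictionary built so far, the single missing bit shifts all following fields and corrupts every factor that depends on them.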

To increase the compression ratios, other methods can be used. Arithmetic coding is one such example, and it leads to higher efficiency.
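To make the efficiency claim concrete, here is a minimal sketch of arithmetic coding with a static model. It uses exact rationals for clarity, whereas practical coders such as the one in [WNC 87] work with fixed-precision integers; the probabilities, the message, and the function names are hypothetical choices for this illustration.

```python
from fractions import Fraction
from math import ceil, log2

def build_intervals(probs):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def arith_encode(message, probs):
    """Narrow [0, 1) once per symbol; return (low, width) of the final interval."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = intervals[sym]
        low += width * s_low          # uses the width before it is narrowed
        width *= s_high - s_low
    return low, width

def arith_decode(value, length, probs):
    """Recover `length` symbols from any value inside the final interval."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)  # rescale to [0, 1)
                break
    return "".join(out)

def code_length(width):
    """Bits sufficient to name a binary fraction inside an interval of this width."""
    return ceil(-log2(float(width))) + 1

# Hypothetical skewed source: 'a' is nine times more likely than 'b'.
probs = {"a": Fraction(9, 10), "b": Fraction(1, 10)}
message = "aaaaaaaaab"
low, width = arith_encode(message, probs)
print(arith_decode(low, len(message), probs))
print(code_length(width), "bits, versus", len(message), "bits for any code that spends at least one bit per character")
```

A character-by-character substitution code cannot spend less than one bit per character, while the interval width here shrinks only slightly for each frequent symbol, which is where the gain on skewed sources comes from.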

Another way to increase the compression ratios is to give up the "lossless information" condition. These compaction methods must use semantic rules to recover the original information. Such methods cannot be applied to create archives or to communicate. A compaction example is found in [Mc 82] for the "spell" program available under the Unix operating system.


Keywords: Compression Ratio, Data Compression, Sequential Algorithm, Source Text, Arithmetic Code



7. References

  1. [AC 75]
    A.V. Aho, M.J. Corasick, Efficient string matching: An aid to bibliographic research, Commun. ACM 18,6 (1975), 333–340.
  2. [BSTW 86]
    J.L. Bentley, D.D. Sleator, R.E. Tarjan, V.K. Wei, A locally adaptive data compression scheme, Commun. ACM 29,4 (1986), 320–330.
  3. [BP 85]
    J. Berstel, D. Perrin, Theory of codes, Academic Press (1985).
  4. [Cr 86]
    M. Crochemore, Transducers and Repetitions, Theoret. Comput. Sci. 45 (1986), 63–86.
  5. [El 75]
    P. Elias, Universal Codeword Sets and Representation of the Integers, I.E.E.E. Trans. Inform. Theory IT 21,2 (1975), 194–203.
  6. [Fa 73]
    N. Faller, An adaptive system for data compression, in Record of the 7th Asilomar Conference on Circuits, Systems, and Computers (1973), 593–597.
  7. [Ga 68]
    R.G. Gallager, Information Theory and Reliable Communication, Wiley (1968).
  8. [Ga 78]
    R.G. Gallager, Variations on a theme by Huffman, I.E.E.E. Trans. Inform. Theory IT 24,6 (1978), 668–674.
  9. [GM 82]
    E.N. Gilbert, C.L. Monma, Multigram Codes, I.E.E.E. Trans. Inform. Theory IT 28,2 (1982), 346–348.
  10. [HR 84]
    A. Hartman, M. Rodeh, Optimal Parsing of Strings, in (Combinatorial Algorithms on Words, Apostolico & Galil ed., Springer-Verlag (1985)) 155–167.
  11. [Ha 80]
    R.W. Hamming, Coding and Information Theory, Prentice-Hall (1980).
  12. [He 87]
    G. Held, La compression des données, méthodes et applications, Masson (1987).
  13. [Hu 51]
    D.A. Huffman, A method for the construction of minimum redundancy codes, Proc. IRE 40 (1951), 1098–1101.
  14. [Ja 85]
    M. Jakobsson, Compression of character strings by an adaptive dictionary, BIT 25 (1985), 593–603.
  15. [Kn 85]
    D.E. Knuth, Dynamic Huffman Coding, J. Algorithms 6 (1985), 163–180.
  16. [La 83]
    G.G. Langdon Jr., A Note on the Ziv-Lempel Model for Compressing Individual Sequences, I.E.E.E. Trans. Inform. Theory IT 29,2 (1983), 284–287.
  17. [LZ 76]
    A. Lempel, J. Ziv, On the Complexity of Finite Sequences, I.E.E.E. Trans. Inform. Theory IT 22,1 (1976), 75–81.
  18. [Ll 87]
    J.A. Llewellyn, Data Compression for a Source with Markov Characteristics, Comput. J. 30,2 (1987), 149–156.
  19. [Mc 82]
    M.D. McIlroy, Development of a Spelling List, I.E.E.E. Trans. Commun. COM 30,1 (1982), 91–99.
  20. [MW 84]
    V.S. Miller, M.N. Wegman, Variations on a Theme by Ziv and Lempel, in (Combinatorial Algorithms on Words, Apostolico & Galil ed., Springer-Verlag (1985)) 131–140.
  21. [RL 79]
    J. Rissanen, G.G. Langdon Jr., Arithmetic Coding, IBM J. Res. Dev. 23,2 (1979), 149–162.
  22. [RL 81]
    J. Rissanen, G.G. Langdon Jr., Universal Modeling and Coding, I.E.E.E. Trans. Inform. Theory IT 27,1 (1981), 12–23.
  23. [RPE 81]
    M. Rodeh, V.R. Pratt, S. Even, Linear Algorithm for Data Compression via String Matching, J. Assoc. Comput. Mach. 28,1 (1981), 16–24.
  24. [St 84]
    J.A. Storer, Textual Substitution Techniques for Data Compression, in (Combinatorial Algorithms on Words, Apostolico & Galil ed., Springer-Verlag (1985)) 111–129.
  25. [SS 82]
    J.A. Storer, T.G. Szymanski, Data Compression via Textual Substitution, J. Assoc. Comput. Mach. 29,4 (1982), 928–951.
  26. [Vi 87]
    J.S. Vitter, Design and Analysis of Dynamic Huffman Codes, J. Assoc. Comput. Mach. 34,4 (1987), 825–845.
  27. [We 84]
    T.A. Welch, A Technique for High-Performance Data Compression, I.E.E.E. Computer 17,6 (1984), 8–19.
  28. [WNC 87]
    I.H. Witten, R.M. Neal, J.G. Cleary, Arithmetic coding for data compression, Commun. ACM 30,6 (1987), 520–540.
  29. [ZL 77]
    J. Ziv, A. Lempel, A Universal Algorithm for Sequential Data Compression, I.E.E.E. Trans. Inform. Theory IT 23,3 (1977), 337–343.
  30. [ZL 78]
    J. Ziv, A. Lempel, Compression of Individual Sequences via Variable-rate Coding, I.E.E.E. Trans. Inform. Theory IT 24,5 (1978), 530–536.

Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • Maxime Crochemore, Centre Scientifique et Polytechnique, Université de Paris-Nord, Villetaneuse
