Text Compression Using Antidictionaries

  • M. Crochemore
  • F. Mignosi
  • A. Restivo
  • S. Salemi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1644)


We give a new text compression scheme based on Forbidden Words (“antidictionary”). We prove that our algorithms attain the entropy for balanced binary sources. They run in linear time. Moreover, one of the main advantages of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and allows parallel compression. Our algorithms can also be presented as “compilers” that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata.


Keywords: data compression, information theory, finite automaton, forbidden word, pattern matching
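The abstract's core idea is to use forbidden words to drop bits that the decoder can predict. It can be illustrated with a minimal sketch that rests on several assumptions.

- The alphabet is binary.
- The antidictionary is taken to be the set of minimal forbidden words of the text itself, so the text is guaranteed to avoid every word in it.
- The decoder already has the antidictionary. How it is chosen, pruned, or transmitted is not modeled.

When some forbidden word v·c has v as a suffix of the text read so far, the next bit cannot be c, so it must be the other bit. The encoder omits that bit, and the decoder restores it. The function names `minimal_forbidden_words`, `forced_bit`, `compress` and `decompress` are illustrative and do not come from the paper. The sketch scans the whole antidictionary at every position. It is therefore far slower than the linear-time, automaton-based algorithms the abstract refers to.

```python
from typing import Iterable, Optional, Set, Tuple


def minimal_forbidden_words(text: str, alphabet: str = "01") -> Set[str]:
    """Naively compute the minimal forbidden words of `text` (demo sizes only).

    A word a·u·b is minimal forbidden when it is not a factor of `text`
    but both a·u and u·b are factors.
    """
    factors = {text[i:j] for i in range(len(text) + 1) for j in range(i, len(text) + 1)}
    mfw = {c for c in alphabet if c not in factors}  # absent letters (length 1)
    for u in factors:  # includes the empty word, giving length-2 words
        for a in alphabet:
            for b in alphabet:
                if a + u in factors and u + b in factors and a + u + b not in factors:
                    mfw.add(a + u + b)
    return mfw


def forced_bit(prefix: str, antidictionary: Iterable[str]) -> Optional[str]:
    """Return the bit forced after `prefix`, or None if the next bit is free.

    If v·c is forbidden and v is a suffix of `prefix`, the next bit cannot
    be c, so over a binary alphabet it must be the complement of c.
    """
    for w in antidictionary:
        if prefix.endswith(w[:-1]):
            return "1" if w[-1] == "0" else "0"
    return None


def compress(text: str, antidictionary: Set[str]) -> Tuple[int, str]:
    """Keep only unpredictable bits. The length is kept because forced bits
    would otherwise let the decoder continue past the end of the text."""
    out = [bit for i, bit in enumerate(text) if forced_bit(text[:i], antidictionary) is None]
    return len(text), "".join(out)


def decompress(n: int, code: str, antidictionary: Set[str]) -> str:
    """Rebuild the text: insert each forced bit, otherwise read one code bit."""
    text = []
    bits = iter(code)
    while len(text) < n:
        forced = forced_bit("".join(text), antidictionary)
        text.append(forced if forced is not None else next(bits))
    return "".join(text)
```

For the text `0000`, the antidictionary is {`1`, `00000`}. The forbidden word `1` forces every bit, so `compress` returns `(4, "")` and only the length survives.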





Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • M. Crochemore (1)
  • F. Mignosi (2)
  • A. Restivo (2)
  • S. Salemi (2)
  1. Institut Gaspard-Monge, France
  2. Università di Palermo, Italy
