A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1645))

Abstract

We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is highly relevant for maintaining compressed text databases on which efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks, which abstracts the essential features of Ziv-Lempel compression. We then apply the scheme to each particular type of compression. We present the first algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching it. Finally, we propose a new hybrid compression scheme that lies between LZ77 and LZ78: in practice it compresses as well as LZ77 and can be searched as fast as LZ78.
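To make the "sequence of blocks" view concrete, here is a minimal Python sketch for the LZ78 case. It is not the algorithm of the paper, whose details are not part of this preview. It is a simple stand-in that assumes each LZ78 block is a previous block plus one character, and that precomputes, for every block, how a Knuth-Morris-Pratt automaton moves across it. The names `lz78_parse`, `make_step` and `search_blocks` are hypothetical and chosen only for illustration.

```python
def lz78_parse(text):
    """Split text into LZ78 blocks (ref, char); block 0 is the empty string."""
    trie = {}      # (parent block id, char) -> block id
    blocks = []    # blocks[i] describes block id i + 1
    cur = 0
    for ch in text:
        nxt = trie.get((cur, ch))
        if nxt is None:
            blocks.append((cur, ch))
            trie[(cur, ch)] = len(blocks)
            cur = 0
        else:
            cur = nxt
    if cur:
        # Text ended inside a known phrase: emit it with an empty character.
        blocks.append((cur, ""))
    return blocks


def make_step(pattern):
    """Return the KMP transition function for a non-empty pattern."""
    m = len(pattern)
    fail = [0] * (m + 1)   # fail[i] = longest proper border of pattern[:i]
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k

    def step(state, ch):
        if state == m:
            state = fail[m]
        while state and pattern[state] != ch:
            state = fail[state]
        if pattern[state] == ch:
            state += 1
        return state

    return step


def search_blocks(blocks, pattern):
    """Report all start positions of pattern without rebuilding the text."""
    m = len(pattern)
    step = make_step(pattern)
    states = range(m + 1)
    trans = [tuple(states)]             # trans[b][s]: state after block b
    hits = [tuple(() for _ in states)]  # hits[b][s]: match end offsets in b
    length = [0]
    state, pos, out = 0, 0, []
    for ref, ch in blocks:
        if ch == "":
            t, h, ln = trans[ref], hits[ref], length[ref]
        else:
            ln = length[ref] + 1
            t = tuple(step(trans[ref][s], ch) for s in states)
            h = tuple(hits[ref][s] + ((ln,) if t[s] == m else ())
                      for s in states)
        trans.append(t)
        hits.append(h)
        length.append(ln)
        # Consume the whole block in one jump from the current state.
        out.extend(pos + end - m for end in h[state])
        state = t[state]
        pos += ln
    return out
```

For example, `search_blocks(lz78_parse("abracadabra abracadabra"), "abra")` returns `[0, 7, 12, 19]`. The work per block is proportional to the pattern length rather than to the block length, which is the kind of saving that searching over blocks aims for. The per-state match lists are kept only for clarity and would need a more compact representation in practice.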

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navarro, G., Raffinot, M. (1999). A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_2

  • DOI: https://doi.org/10.1007/3-540-48452-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66278-5

  • Online ISBN: 978-3-540-48452-3

  • eBook Packages: Springer Book Archive
