Simple and Efficient LZW-Compressed Multiple Pattern Matching

  • Paweł Gawrychowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)


We consider a natural variant of the classical multiple pattern matching problem: given a Lempel-Ziv-Welch representation of a string \(t[1\mathinner{\ldotp\ldotp} N]\) and a collection of (uncompressed) patterns \(p_1, p_2, \ldots\) with \(\sum_i |p_i| = M\), does any of the \(p_i\) occur in \(t\)? As shown by Kida et al. [12], extending the single pattern algorithm of Amir, Benson and Farach [2] gives a running time of \(\mathcal{O}(n+M^{2})\) for the more general case, where \(n\) is the size of the compressed representation. We prove that it is in fact possible to achieve \(\mathcal{O}(n\log M+M)\) or \(\mathcal{O}(n+M^{1+\epsilon})\) complexity. While not linear, the running time of our solution matches the single pattern bounds achieved by [2] and [14] in a more structured and unified manner, and without relying heavily on combinatorics on words. The only nontrivial components are the suffix array, constant time range minimum queries, and any balanced binary search tree.
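To make the problem statement concrete, the following is a minimal sketch of the naive baseline, not the algorithm of this paper: it decompresses the LZW codewords in full and then tests each pattern on the uncompressed text. Its cost therefore depends on \(N\), which can be much larger than \(n\), and this is exactly what the compressed-matching bounds above avoid. The function names, the explicit `alphabet` parameter, and the fixed-alphabet LZW variant are assumptions made for illustration only.

```python
def lzw_compress(text, alphabet):
    """Textbook LZW: returns the list of codewords for `text`.

    Assumes every character of `text` occurs in `alphabet`.
    """
    # The dictionary starts with one codeword per alphabet symbol.
    table = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # new codeword
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes, alphabet):
    """Inverse of lzw_compress for the same alphabet."""
    if not codes:
        return ""
    table = list(alphabet)
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # The codeword refers to the entry created in this very step.
            entry = previous + previous[0]
        table.append(previous + entry[0])
        out.append(entry)
        previous = entry
    return "".join(out)


def any_pattern_occurs(codes, alphabet, patterns):
    """Naive baseline: decompress fully, then search for each pattern."""
    text = lzw_decompress(codes, alphabet)
    return any(p in text for p in patterns)


codes = lzw_compress("abababab", "ab")
found = any_pattern_occurs(codes, "ab", ["bb", "aba"])
```

Here `codes` holds 5 codewords for a text of length 8. On highly repetitive inputs the gap between the number of codewords and the text length grows, which is why running times expressed in \(n\) rather than \(N\) are of interest.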


Keywords: multiple pattern matching, compression, Lempel-Ziv-Welch


