Indexed Multi-pattern Matching

  • Travis Gagie
  • Kalle Karhu
  • Juha Kärkkäinen
  • Veli Mäkinen
  • Leena Salmela
  • Jorma Tarhio
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7256)

Abstract

If we want to search sequentially for occurrences of many patterns in a given text, then we can apply any of dozens of multi-pattern matching algorithms in the literature. As far as we know, however, no one has said what to do if we are given a compressed self-index for the text instead of the text itself. In this paper we show how to take advantage of similarities between the patterns to speed up searches in an index. For example, we show how to store a string \(S[1..n]\) in \(nH_k(S) + o(n(H_k(S) + 1))\) bits such that, given the LZ77 parse of the concatenation of \(t\) patterns of total length \(\ell\) and maximum individual length \(m\), we can count the occurrences of each pattern in a total of \(\mathcal{O}((z + t) \log \ell \log m \log^{1 + \epsilon} n)\) time, where \(z\) is the number of phrases in the parse.
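The two ingredients named in the abstract, an LZ77 parse of the concatenated patterns and counting queries against a self-index, can be illustrated with a toy Python sketch. It assumes one common greedy, self-referential LZ77 variant (each phrase is either a character seen for the first time or the longest prefix of the remainder that also starts at an earlier position). It also replaces the compressed self-index with an uncompressed BWT plus plain rank tables, answering counts by standard FM-index backward search. The names `lz77_parse` and `ToyFMIndex` are hypothetical. This is not the paper's algorithm: the baseline searches each pattern independently, using one backward step per pattern character (\(\ell\) steps in total), whereas the bound above is driven by the number of phrases \(z\) plus the number of patterns \(t\).

```python
def lz77_parse(s):
    """Greedy, self-referential LZ77 parse (one common variant; an assumption here).

    Each phrase is either a single character seen for the first time, or a pair
    (pos, length) meaning "copy s[pos:pos+length]", where pos lies before the
    phrase start; the source may overlap the phrase itself.
    Quadratic-time toy implementation.
    """
    phrases, i, n = [], 0, len(s)
    while i < n:
        best_len, best_pos = 0, -1
        for j in range(i):  # try every earlier starting position
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            phrases.append(s[i])  # literal: first occurrence of this character
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases


class ToyFMIndex:
    """Uncompressed stand-in for a compressed self-index: BWT plus rank tables."""

    def __init__(self, text):
        assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
        text += "$"  # unique sentinel; sorts before letters and digits
        sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
        self.bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
        alphabet = sorted(set(text))
        # C[c] = number of characters in the text that are smaller than c
        self.C, total = {}, 0
        for c in alphabet:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i] = number of occurrences of c in bwt[:i]
        self.occ = {c: [0] for c in alphabet}
        for ch in self.bwt:
            for c in alphabet:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def count(self, pattern):
        """Backward search: one LF-mapping step per pattern character."""
        lo, hi = 0, len(self.bwt)  # half-open suffix-array interval
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    patterns = ["abra", "abracad", "cadabra"]
    z = len(lz77_parse("".join(patterns)))
    print(z)  # 10 phrases for 18 characters
    index = ToyFMIndex("abracadabra")
    print({p: index.count(p) for p in patterns})  # {'abra': 2, 'abracad': 1, 'cadabra': 1}
```

Material shared among the patterns shows up as a few long copy phrases in the parse, which is the kind of similarity the abstract proposes to exploit; the toy index above does not use the parse at all.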

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Travis Gagie (1)
  • Kalle Karhu (1)
  • Juha Kärkkäinen (2)
  • Veli Mäkinen (2)
  • Leena Salmela (2)
  • Jorma Tarhio (1)

  1. Department of Computer Science and Engineering, Aalto University, Finland
  2. Department of Computer Science, University of Helsinki, Finland

Personalised recommendations