Worst Case Efficient Single and Multiple String Matching in the RAM Model

  • Djamal Belazzougui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6460)

Abstract

In this paper, we explore worst-case solutions for the problems of single and multiple pattern matching on strings in the RAM model with word length w. In the first problem, we have a pattern p of length m over an alphabet of size σ, and given any text T of length n, where each character is encoded using log σ bits, we wish to find all occurrences of p. In the multi-pattern matching problem, we have a set S of d patterns of total length m, and a query on a text T consists of finding all occurrences in T of the patterns in S (in the following, occ denotes the number of reported occurrences). As each character of the text is encoded using log σ bits and we can read w bits in constant time in the RAM model, the best possible query time for both problems, achievable only by reading Θ(w/log σ) consecutive characters at a time, is \(O(n\frac{\log\sigma}{w}+occ)\). We present two results. The first is that, using O(m) words of space, single pattern matching queries can be answered in time \(O(n(\frac{\log m}{m}+\frac{\log \sigma}{w})+occ)\), and multiple pattern matching queries can be answered in time \(O(n(\frac{\log d+\log y+\log\log m}{y}+\frac{\log \sigma}{w})+occ)\), where y is the length of the shortest pattern. Our second result is a variant of the first that uses the Four Russians technique to remove the dependence on the shortest pattern length, at the expense of t additional words of space. It answers multi-pattern matching queries in time \(O(n\frac{\log d+\log\log_\sigma t+\log\log m}{\log_\sigma t}+occ)\) using O(m + t) words of space.
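
The packing assumption behind the \(O(n\frac{\log\sigma}{w}+occ)\) bound can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows how a text with log σ bits per character fits into roughly n log σ / w machine words, so that one word read covers Θ(w/log σ) consecutive characters. The names `pack`, `char_at`, and `bits_per_char`, and the choice w = 64, are assumptions made for illustration.

```python
W = 64  # assumed machine word length w, in bits


def bits_per_char(sigma):
    """Bits needed per character: ceil(log2(sigma)), at least 1."""
    return max(1, (sigma - 1).bit_length())


def pack(text, alphabet, w=W):
    """Pack text into a list of w-bit words, several characters per word."""
    b = bits_per_char(len(alphabet))
    code = {c: i for i, c in enumerate(alphabet)}
    per_word = w // b  # characters that fit in one word
    words = []
    for start in range(0, len(text), per_word):
        word = 0
        for j, ch in enumerate(text[start:start + per_word]):
            word |= code[ch] << (j * b)  # character j sits at bit offset j*b
        words.append(word)
    return words, b, per_word


def char_at(words, b, per_word, i):
    """Recover the code of character i with one word access, a shift and a mask."""
    return (words[i // per_word] >> ((i % per_word) * b)) & ((1 << b) - 1)


if __name__ == "__main__":
    alphabet = "ACGT"    # sigma = 4, so 2 bits per character
    text = "ACGT" * 20   # n = 80 characters
    words, b, per_word = pack(text, alphabet)
    print(len(words), "words for", len(text), "characters")
```

With σ = 4 and w = 64, each word holds 32 characters, so the 80-character text above occupies 3 words; scanning it word by word touches far fewer memory cells than scanning it character by character, which is the source of the \(\frac{\log\sigma}{w}\) factor.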

Keywords

Pattern Match, Word Length, Query Time, String Match, Short String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Djamal Belazzougui (1)
  1. LIAFA, Univ. Paris Diderot - Paris 7, Paris Cedex 13, France
