String Matching with Variable Length Gaps

  • Philip Bille
  • Inge Li Gørtz
  • Hjalte Wedel Vildhøj
  • David Kofoed Wind
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6393)

Abstract

We consider string matching with variable length gaps. Given a string T and a pattern P consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in T that match P. This problem is a basic primitive in computational biology applications. Let m and n be the lengths of P and T, respectively, and let k be the number of strings in P. We present a new algorithm achieving time O((n + m) log k + α) and space O(m + A), where A is the sum of the lower bounds of the lengths of the gaps in P and α is the total number of occurrences of the strings in P within T. Compared to the previous results, this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of m, n, k, A, and α. Our algorithm is surprisingly simple and straightforward to implement.
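
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper and does not achieve the stated bounds; it only illustrates what "ending positions of substrings in T that match P" means. The function name `vlg_end_positions` and the input encoding (a list of k non-empty strings plus a list of k − 1 inclusive `(lo, hi)` gap ranges) are assumptions made for this illustration.

```python
def vlg_end_positions(text, strings, gaps):
    """Brute-force illustration of matching with variable length gaps.

    text    -- the string T
    strings -- the k non-empty strings of the pattern P, in order
    gaps    -- k - 1 pairs (lo, hi); the gap between consecutive strings
               must have a length g with lo <= g <= hi
    Returns the sorted 0-based end positions in text of substrings matching P.
    """
    def ends(s):
        # 0-based index of the last character of every occurrence of s in text
        return [i + len(s) - 1
                for i in range(len(text) - len(s) + 1)
                if text.startswith(s, i)]

    # End positions at which the pattern prefix seen so far can finish.
    reachable = set(ends(strings[0]))
    for s, (lo, hi) in zip(strings[1:], gaps):
        extended = set()
        for e in ends(s):
            start = e - len(s) + 1
            # The previous string must end at p = start - 1 - g for an allowed g.
            if any((start - 1 - g) in reachable for g in range(lo, hi + 1)):
                extended.add(e)
        reachable = extended
    return sorted(reachable)


if __name__ == "__main__":
    # Pattern: "a", then a gap of 1 to 3 arbitrary characters, then "b"
    print(vlg_end_positions("aXbXb", ["a", "b"], [(1, 3)]))
```

Here both occurrences of "b" are reported, because each lies within an allowed gap after the single "a". The sketch rescans the text for every pattern string and tries every gap length, so it is far slower than the bounds given in the abstract.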

Keywords

Pattern Match Regular Expression String Match Trie Transition Pattern String 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Philip Bille (1)
  • Inge Li Gørtz (1)
  • Hjalte Wedel Vildhøj (1)
  • David Kofoed Wind (1)

  1. Technical University of Denmark, Denmark
