Finding Patterns with Variable Length Gaps or Don’t Cares

  • M. Sohel Rahman
  • Costas S. Iliopoulos
  • Inbok Lee
  • Manal Mohamed
  • William F. Smyth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4112)


In this paper we present new algorithms for the pattern matching problem where the pattern can contain variable length gaps. Given a pattern P with variable length gaps and a text T, our algorithm works in \(O(n + m + \alpha \log(\max_{1 \le i \le l}(b_i - a_i)))\) time, where n is the length of the text, m is the sum of the lengths of the component subpatterns, α is the total number of occurrences of the component subpatterns in the text, and \(a_i\) and \(b_i\) are, respectively, the minimum and maximum number of don’t cares allowed between the ith and (i+1)st components of the pattern. We also present another algorithm which, given a suffix array of the text, can report whether P occurs in T in \(O(m + \alpha \log\log n)\) time. Both algorithms record information to report all the occurrences of P in T. Furthermore, the techniques used in our algorithms are shown to be useful in many other contexts.
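To make the problem statement concrete, the following is a minimal Python sketch of matching a pattern with variable length gaps. It is a naive baseline and not the algorithm of the paper: it finds the occurrences of each component subpattern with repeated `str.find` calls and then chains them with a binary search, so it does not achieve the bounds stated above. The names `occurrences` and `match_with_gaps`, and the representation of the pattern as a list of components plus a list of `(a_i, b_i)` pairs, are assumptions made for illustration.

```python
from bisect import bisect_left


def occurrences(text, sub):
    """Return all start positions of sub in text, overlaps included."""
    starts = []
    i = text.find(sub)
    while i != -1:
        starts.append(i)
        i = text.find(sub, i + 1)
    return starts


def match_with_gaps(text, components, gaps):
    """Return the sorted end positions (exclusive) of all matches.

    components: non-empty subpatterns P_1 .. P_l (hypothetical input format)
    gaps: l - 1 pairs (a_i, b_i); between P_i and P_(i+1) there must be
          at least a_i and at most b_i don't care characters.
    """
    if len(gaps) != len(components) - 1:
        raise ValueError("need exactly one gap per adjacent component pair")

    # End positions of partial matches covering P_1 only.
    first = components[0]
    ends = [s + len(first) for s in occurrences(text, first)]

    for sub, (a, b) in zip(components[1:], gaps):
        next_ends = []
        for s in occurrences(text, sub):
            # A previous partial match ending at e is compatible with this
            # occurrence when a <= s - e <= b, that is s - b <= e <= s - a.
            k = bisect_left(ends, s - b)
            if k < len(ends) and ends[k] <= s - a:
                next_ends.append(s + len(sub))
        # Occurrences are scanned left to right, so next_ends stays sorted.
        ends = next_ends
    return ends


if __name__ == "__main__":
    text = "abxcdabxxxcd"
    # "ab", then 1 to 3 don't cares, then "cd"
    print(match_with_gaps(text, ["ab", "cd"], [(1, 3)]))
```

In this sketch the total number of candidate occurrences examined corresponds to α in the abstract, and each candidate costs one binary search over the previous component's surviving matches.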


Pattern Match · String Match · Valid Range · Pattern Match Algorithm · Approximate String Match
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • M. Sohel Rahman (1)
  • Costas S. Iliopoulos (1)
  • Inbok Lee (2)
  • Manal Mohamed (1)
  • William F. Smyth (3)

  1. Algorithm Design Group, Department of Computer Science, King’s College London, Strand, London, England
  2. School of Computer Science and Engineering, Seoul National University, Seoul, Korea
  3. Algorithms Research Group, Department of Computing and Software, McMaster University, Canada
