Online Dictionary Matching with Variable-Length Gaps

  • Tuukka Haapasalo
  • Panu Silvasti
  • Seppo Sippu
  • Eljas Soisalon-Soininen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6630)

Abstract

The string-matching problem with wildcards is considered in the context of online matching of multiple patterns. Our patterns consist of strings of characters from the input alphabet separated by variable-length gaps, where the width of a gap may vary between two integer bounds or from an integer lower bound to infinity. Our algorithm is based on locating the “keywords” of the patterns in the input text, that is, the maximal substrings of the patterns that contain only input characters. Matches of pattern prefixes are assembled from the keyword matches, and when a prefix that constitutes a complete pattern is found, a match is reported. While collecting these partial matches, we avoid locating keyword occurrences that cannot extend any pattern prefix found thus far. Our experiments show that the algorithm scales well as the number of patterns increases.
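The keyword-and-gap scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it checks keywords with a naive suffix comparison instead of a multi-keyword automaton, and it never discards old prefix matches. The names `GapPattern` and `match_online` are hypothetical and chosen only for this illustration. The sketch assumes that each pattern is given as a list of non-empty keywords with one `(lower, upper)` gap between consecutive keywords, where an upper bound of `None` stands for infinity.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Gap = Tuple[int, Optional[int]]  # (lower bound, upper bound); None means unbounded


class GapPattern:
    """Hypothetical container: keywords separated by variable-length gaps."""

    def __init__(self, keywords: List[str], gaps: List[Gap]) -> None:
        if not keywords or any(not k for k in keywords):
            raise ValueError("keywords must be non-empty strings")
        if len(gaps) != len(keywords) - 1:
            raise ValueError("need exactly one gap between consecutive keywords")
        self.keywords = keywords
        self.gaps = gaps


def match_online(patterns: List[GapPattern],
                 stream: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (pattern index, end position) for every complete match.

    Positions count the characters read so far, so a match reported with
    end position e ends at the e-th character of the stream.
    """
    max_len = max((len(k) for p in patterns for k in p.keywords), default=1)
    # prefix_ends[p][i]: end positions of matched prefixes of pattern p
    # that end with keyword i (the list for the last keyword stays empty).
    prefix_ends: List[List[List[int]]] = [[[] for _ in p.keywords] for p in patterns]
    window = ""  # only the last max_len characters are kept
    pos = 0
    for ch in stream:
        pos += 1
        window = (window + ch)[-max_len:]
        for p_idx, pat in enumerate(patterns):
            for i, kw in enumerate(pat.keywords):
                # Skip keywords that cannot extend any prefix found so far;
                # if prefix i-1 was never matched, no later prefix was either.
                if i > 0 and not prefix_ends[p_idx][i - 1]:
                    break
                if not window.endswith(kw):
                    continue
                start = pos - len(kw)  # characters read before this keyword
                if i > 0:
                    lo, hi = pat.gaps[i - 1]
                    if not any(start - e >= lo and (hi is None or start - e <= hi)
                               for e in prefix_ends[p_idx][i - 1]):
                        continue  # no earlier prefix lies at an allowed distance
                if i == len(pat.keywords) - 1:
                    yield (p_idx, pos)
                else:
                    prefix_ends[p_idx][i].append(pos)


patterns = [
    GapPattern(["ab", "cd"], [(1, 2)]),   # "ab", then 1 to 2 characters, then "cd"
    GapPattern(["a", "b"], [(0, None)]),  # "a", then any number of characters, then "b"
]
print(list(match_online(patterns, "abxcd")))  # [(1, 2), (0, 5)]
```

The `break` in the inner loop mirrors, in a simplified form, the idea of not searching for keywords that cannot take part in any pattern prefix seen so far. A practical implementation would also drop stored end positions once a bounded gap can no longer be satisfied.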

Keywords

Partial Match; String Match; Input Text; Pattern Occurrence; Input Character

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Tuukka Haapasalo (1)
  • Panu Silvasti (1)
  • Seppo Sippu (2)
  • Eljas Soisalon-Soininen (1)
  1. Aalto University School of Science, Finland
  2. University of Helsinki, Finland
