Advertisement

Dictionary Matching with One Gap

  • Amihood Amir
  • Avivit Levy
  • Ely Porat
  • B. Riva Shalom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8486)

Abstract

The dictionary matching with gaps problem is to preprocess a dictionary D of d gapped patterns P 1,…,P d over alphabet Σ, where each gapped pattern P i is a sequence of subpatterns separated by bounded sequences of don’t cares. Then, given a query text T of length n over alphabet Σ, the goal is to output all locations in T in which a pattern P i  ∈ D, 1 ≤ i ≤ d, ends. There is a renewed current interest in the gapped matching problem stemming from cyber security. In this paper we solve the problem where all patterns in the dictionary have one gap with at least α and at most β don’t cares, where α and β are given parameters. Specifically, we show that the dictionary matching with a single gap problem can be solved in either O(dlogd + |D|) time and O(dlog ε d + |D|) space, and query time O(n(β − α)loglogd log2 min { d, log|D| } + occ), where occ is the number of patterns found, or preprocessing time: O(d 2·ovr + |D|), where ovr is the maximal number of subpatterns including each other as a prefix or as a suffix, space: O(d 2 + |D|), and query time O(n(β − α) + occ), where occ is the number of patterns found. As far as we know, this is the best solution for this setting of the problem, where many overlaps may exist in the dictionary.

Keywords

Pattern Match Range Query Query Time Tree Path Vertical Path 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Comm. ACM 18(6), 333–340 (1975)CrossRefMATHMathSciNetGoogle Scholar
  2. 2.
    Amir, A., Calinescu, G.: Alphabet independent and dictionary scaled matching. J. of Algorithms 36, 34–62 (2000)CrossRefMATHMathSciNetGoogle Scholar
  3. 3.
    Amir, A., Farach, M., Giancarlo, R., Galil, Z., Park, K.: Dynamic dictionary matching. Journal of Computer and System Sciences 49(2), 208–222 (1994)CrossRefMATHMathSciNetGoogle Scholar
  4. 4.
    Amir, A., Farach, M., Idury, R.M., La Poutré, J.A., Schäffer, A.A.: Improved dynamic dictionary matching. Information and Computation 119(2), 258–282 (1995)CrossRefMATHMathSciNetGoogle Scholar
  5. 5.
    Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Indexing and dictionary matching with one error. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1999. LNCS, vol. 1663, pp. 181–192. Springer, Heidelberg (1999)CrossRefGoogle Scholar
  6. 6.
    Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String matching with variable length gaps. Theoretical Computer Science (443), 25–34 (2012)Google Scholar
  7. 7.
    Bille, P., Thorup, M.: Faster regular expression matching. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 171–182. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  8. 8.
    Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1297–1308 (2010)Google Scholar
  9. 9.
    Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)CrossRefGoogle Scholar
  10. 10.
    Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th Annual ACM Symposium on the Theory of Computing (STOC), pp. 91–100. ACM Press (2004)Google Scholar
  11. 11.
    Chan, T.M., Larsen, K.G., Pǎtraşcu, M.: Orthogonal range searching on the ram, revisited. In: Proc. 27th ACM Symposium on Computational Geometry (SoCG), pp. 1–10 (2011)Google Scholar
  12. 12.
    Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes and transposition invariance. Inf. Retr. 11(4), 338–349 (2008)CrossRefGoogle Scholar
  13. 13.
    Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online Dictionary Matching with Variable-Length Gaps. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 76–87. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  14. 14.
    Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database. Nucleic Acids Res. (27), 215–219 (1999)Google Scholar
  15. 15.
    Krishnamurthy, M., Seagren, E.S., Alder, R., Bayles, A.W., Burke, J., Carter, S., Faskha, E.: How to Cheat at Securing Linux. Syngress Publishing, Inc., Elsevier, Inc., 30 Corporate Dr., Burlington, MA 01803 (2008), e-edition: http://www.sciencedirect.com/science/book/9781597492072
  16. 16.
    Kucherov, G., Rusinowitch, M.: Matching a set of strings with variable length don’t cares. Theoret. Comput. Sci. 178(12), 129–154 (1997)CrossRefMATHMathSciNetGoogle Scholar
  17. 17.
    McCreight, E.M.: A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM 23(2), 262–272 (1976)CrossRefMATHMathSciNetGoogle Scholar
  18. 18.
    Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Bio. 12(8), 1065–1082 (2005)CrossRefGoogle Scholar
  19. 19.
    Myers, G.: A four-russian algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)CrossRefMATHGoogle Scholar
  20. 20.
    Myers, G., Mehldau, G.: A system for pattern matching applications on biosequences. CABIOS 9(3), 299–314 (1993)Google Scholar
  21. 21.
    Naa, J.C., Apostolico, A., Iliopoulos, C.S., Park, K.: Truncated suffix trees and their application to data compression. Theoretical Computer Science 304(3), 87–101 (2003)CrossRefMathSciNetGoogle Scholar
  22. 22.
    Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Bio. 10(6), 903–923 (2003)CrossRefGoogle Scholar
  23. 23.
    Navarro, G., Raffinot, M.: New techniques for regular expression searching. Algorithmica 41(2), 89–116 (2004)CrossRefMathSciNetGoogle Scholar
  24. 24.
    Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Chen, D.Z., Lee, D.T. (eds.) COCOON 2006. LNCS, vol. 4112, pp. 146–155. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  25. 25.
    Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)CrossRefMATHMathSciNetGoogle Scholar
  26. 26.
    van Emde Boas, P.: Preserving order in a forest in less than logarithmic time. In: Proceedings of the 16th Annual Symposium on Foundations of Computer Science, pp. 75–84 (1975)Google Scholar
  27. 27.
    Verint. Packet intrusion detection. Personal communication (2013)Google Scholar
  28. 28.
    Weiner, P.: Linear pattern matching algorithm. In: Proc. 14 IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)Google Scholar
  29. 29.
    Zhang, M., Zhang, Y., Hu, L.: A faster algorithm for matching a set of patterns with variable length don’t cares. Inform. Process. Letters 110, 216–220 (2010)CrossRefMATHGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Amihood Amir
    • 1
    • 2
  • Avivit Levy
    • 3
  • Ely Porat
    • 1
  • B. Riva Shalom
    • 3
  1. 1.Department of Computer ScienceBar-Ilan UniversityRamat-GanIsrael
  2. 2.Department of Computer ScienceJohns Hopkins UniversityBaltimoreUSA
  3. 3.Department of Software EngineeringShenkar CollegeRamat-GanIsrael

Personalised recommendations