Mind the Gap!

Online Dictionary Matching with One Gap

Algorithmica

Abstract

We examine the complexity of the online Dictionary Matching with One Gap (DMOG) problem, defined as follows. Preprocess a dictionary D of d patterns, each containing a special gap symbol that can match any string, so that when a text arrives online, one character at a time, all patterns from D that are suffixes of the text received so far can be reported before the next character arrives. In more general versions, the gap symbols are associated with bounds that determine the possible lengths of the matching strings. Online DMOG captures the difficulty of a bottleneck procedure in cyber-security, as many digital signatures of viruses manifest themselves as patterns with a single gap. In this paper, we demonstrate that the difficulty of obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. We show a conditional lower bound of \(\varOmega ({\delta }(G_D)+op)\) time per text character, where \(G_D\) is a bipartite graph that captures the structure of D, \({\delta }(G_D)\) is the degeneracy of this graph, and \(op\) is the output size. Moreover, we show a conditional lower bound in terms of the magnitude of the gaps for the bounded case, thereby showing that some known offline upper bounds are essentially optimal. We also provide upper bounds in terms of the degeneracy for the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on \({\delta }(G_D)\). Our algorithms make use of graph orientations, together with some additional techniques. These algorithms are of interest for practical cases in which \({\delta }(G_D)\) is a small constant. Since \({\delta }(G_D)\) can in general be as large as \(\sqrt{d}\), and even larger if \(G_D\) is a multi-graph, we also obtain other solutions suited to such dense cases.
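
To make the problem statement and the degeneracy parameter concrete, the following is a minimal Python sketch, not the paper's algorithm. It rests on several assumptions that the abstract does not state: each pattern is modeled as a pair (p1, p2) standing for p1, then the gap, then p2; the gap may match the empty string; and \(G_D\) has one left vertex per distinct p1, one right vertex per distinct p2, and one edge per pattern. The names naive_online_dmog, build_graph and degeneracy_orientation are hypothetical. The first function is a brute-force reference for the unbounded-gap case. The last one peels minimum-degree vertices, which yields the degeneracy of a simple graph together with an orientation whose out-degrees never exceed it (parallel edges are collapsed, so the multi-graph case is not covered).

```python
from collections import defaultdict


def naive_online_dmog(dictionary, text):
    """Brute-force reference: after each arriving character, yield the indices
    of patterns (p1, p2) for which the text so far ends with p1 + g + p2,
    where g is an arbitrary (possibly empty) string."""
    seen = ""
    for ch in text:
        seen += ch
        matches = []
        for idx, (p1, p2) in enumerate(dictionary):
            if seen.endswith(p2):
                # p1 must occur entirely before the final occurrence of p2.
                prefix = seen[:len(seen) - len(p2)]
                if p1 in prefix:
                    matches.append(idx)
        yield matches


def build_graph(dictionary):
    """Assumed shape of G_D: one edge per pattern between its two parts."""
    return [(("L", p1), ("R", p2)) for p1, p2 in dictionary]


def degeneracy_orientation(edges):
    """Repeatedly remove a minimum-degree vertex. Returns (degeneracy,
    oriented_edges), each edge directed away from the endpoint removed first."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    degeneracy, oriented = 0, []
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))
        degeneracy = max(degeneracy, len(remaining[v]))
        for w in remaining[v]:
            oriented.append((v, w))
            remaining[w].discard(v)
        del remaining[v]
    return degeneracy, oriented
```

The peeling loop here takes quadratic time for simplicity; a bucket-based implementation would run in linear time. The brute-force matcher rescans the whole text for every character, which is exactly the cost an efficient online solution has to avoid.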

Notes

  1. The closely related decision problem of determining whether a given vertex is contained in any triangle has been addressed in [7].

  2. There is no clear definition of a combinatorial algorithm. The notion accepted by the algorithmic community is that whether an algorithm is combinatorial is established simply by inspecting it.

  3. Since our final running time has a log factor, the sub-logarithmic operation costs do not carry over to the final asymptotic bound. Thus, we can also use interval trees [15].

References

  1. Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (2014)

  2. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

  3. Alon, N., Yuster, R., Zwick, U.: Color-coding. J. Assoc. Comput. Mach. (JACM) 42(4), 844–856 (1995)

  4. Amir, A., Farach, M., Idury, R.M., La Poutré, J.A., Schäffer, A.A.: Improved dynamic dictionary matching. Inf. Comput. 119(2), 258–282 (1995)

  5. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)

  6. Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Proceedings of the 25th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 11–20 (2014)

  7. Bansal, N., Williams, R.: Regularity lemmas and combinatorial algorithms. Theory Comput. 8(1), 69–94 (2012)

  8. Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String matching with variable length gaps. Theor. Comput. Sci. 443, 25–34 (2012)

  9. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1297–1308 (2010)

  10. Bjørklund, A., Pagh, R., Williams, V.V., Zwick, U.: Listing triangles. In: Proceedings of the 41st International Colloquium on Automata, Languages, and Programming (ICALP (I)), pp. 223–234 (2014)

  11. Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 65–74 (1996)

  12. Chiba, N., Nishizeki, T.: Arboricity and subgraph listing algorithms. SIAM J. Comput. (SICOMP) 14(1), 210–223 (1985)

  13. Cohen, H., Porat, E.: Fast set intersection and two-patterns matching. Theor. Comput. Sci. 411(40–42), 3795–3800 (2010)

  14. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th Annual Symposium on Theory of Computing (STOC), pp. 91–100 (2004)

  15. de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry, 2nd revised edn, Section 10.1: Interval Trees, pp. 212–217. Springer, Berlin (2000)

  16. Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)

  17. Grønlund, A., Pettie, S.: Threesomes, degenerates, and love triangles. In: Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 621–630 (2014)

  18. Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online dictionary matching with variable-length gaps. In: Proceedings of International Symposium on Experimental Algorithms (SEA), pp. 76–87 (2011)

  19. Henzinger, M., Krinninger, S., Nanongkai, D., Saranurak, T.: Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In: Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pp. 21–30 (2015)

  20. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucl. Acids Res. 27(1), 215–219 (1999)

  21. Hon, W.-K., Lam, T.-W., Shah, R., Thankachan, S.V., Ting, H.-F., Yang, Y.: Dictionary matching with uneven gaps. In: Proceedings of the 26th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 247–260 (2015)

  22. Itai, A., Rodeh, M.: Finding a minimum circuit in a graph. SIAM J. Comput. (SICOMP) 7(4), 413–423 (1978)

  23. Kopelowitz, T., Pettie, S., Porat, E.: Dynamic set intersection. In: Proceedings of the 14th International Symposium on Algorithms and Data Structures (WADS) (2015)

  24. Kopelowitz, T., Pettie, S., Porat, E.: Higher lower bounds from the 3-sum conjecture. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2016)

  25. Kucherov, G., Rusinowitch, M.: Matching a set of strings with variable length don’t cares. Theor. Comput. Sci. 178(1–2), 129–154 (1997)

  26. Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Biol. 12(8), 1065–1082 (2005)

  27. Mortensen, C.W.: Fully dynamic orthogonal range reporting on RAM. SIAM J. Comput. 35(6), 1494–1525 (2006)

  28. Myers, E.W.: A four Russians algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)

  29. Myers, G., Mehldau, G.: A system for pattern matching applications on biosequences. CABIOS 9(3), 299–314 (1993)

  30. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)

  31. Pǎtraşcu, M.: Towards polynomial lower bounds for dynamic problems. In: Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC), pp. 603–610 (2010)

  32. VerInt.: Personal communication (2013)

  33. Zhang, M., Zhang, Y., Liang, H.: A faster algorithm for matching a set of patterns with variable length don’t cares. Inf. Process. Lett. 110(6), 216–220 (2010)

Author information

Corresponding author

Correspondence to Avivit Levy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Partially supported by ISF Grant 571/14, BSF Grant 2014028, and NSF Grants CCF-1217338, CNS-1318294, and CCF-1514383. This work is supported in part by ISF Grant 1278/16. This project has received funding from the European Research Council (ERC) under the European Union Horizon 2020 research and innovation programme (Grant Agreement No. 683064). A partial version of this paper appeared in ISAAC 2016.

About this article

Cite this article

Amir, A., Kopelowitz, T., Levy, A. et al. Mind the Gap!. Algorithmica 81, 2123–2157 (2019). https://doi.org/10.1007/s00453-018-0526-2

  • DOI: https://doi.org/10.1007/s00453-018-0526-2
