Graphs Cannot Be Indexed in Polynomial Time for Sub-quadratic Time String Matching, Unless SETH Fails

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12607)

Abstract

The string matching problem on a node-labeled graph \(G=(V,E)\) asks whether a given pattern string P has an occurrence in G, in the form of a path whose concatenation of node labels equals P. This is a basic primitive in various problems in bioinformatics, graph databases, and networks, but it was only recently proven to have an \(O(|E||P|)\)-time lower bound, conditional on the Orthogonal Vectors Hypothesis (OVH). Here we consider its indexed version, in which the graph may be preprocessed into an index that supports time-efficient string queries.
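The problem definition above can be made concrete with a minimal Python sketch of the straightforward matching procedure, which tracks the set of nodes where a prefix of P can end. This is an illustration, not the authors' code: the function name `has_occurrence`, the dictionary-based graph representation, and the restriction to single-character node labels are all assumptions made here.

```python
from typing import Dict, List, Set


def has_occurrence(labels: Dict[int, str],
                   edges: Dict[int, List[int]],
                   pattern: str) -> bool:
    """Return True if some path in the graph spells `pattern`.

    Assumptions (not from the source): each node carries a one-character
    label, `edges` maps a node to its out-neighbours, and paths may
    revisit nodes. The empty pattern is treated as trivially matching.
    """
    if not pattern:
        return True
    # Nodes at which a match of the first character can end.
    frontier: Set[int] = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Extend every partial match along out-edges whose target is labeled `ch`.
        frontier = {w for v in frontier
                    for w in edges.get(v, [])
                    if labels[w] == ch}
        if not frontier:
            return False
    return bool(frontier)


# A directed 3-cycle a -> b -> c -> a.
labels = {0: "a", 1: "b", 2: "c"}
edges = {0: [1], 1: [2], 2: [0]}
print(has_occurrence(labels, edges, "abcab"))  # path wraps around the cycle
print(has_occurrence(labels, edges, "ac"))     # no edge from an 'a' node to a 'c' node
```

Each of the |P| rounds scans every edge at most once, so the sketch runs in \(O(|V| + |E||P|)\) time, matching the quadratic shape of the bound discussed in the abstract.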

We show that, under OVH, no polynomial-time indexing scheme of the graph can support querying P in time \(O(|P|+|E|^\delta |P|^\beta )\), with either \(\delta < 1\) or \(\beta < 1\). As a side-contribution, we introduce the notion of linear independent-components (lic) reduction, which allows for a simple proof of our result. As another illustration that hardness of indexing follows as a corollary of a lic reduction, we also translate the quadratic conditional lower bound of Backurs and Indyk (STOC 2015) for the problem of matching a query string inside a text, under edit distance. We obtain an analogous tight quadratic lower bound for its indexed version, improving the recent result of Cohen-Addad, Feuilloley and Starikovskaya (SODA 2019), but with a slightly different boundary condition.
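For readers unfamiliar with the hypothesis behind these bounds: the Orthogonal Vectors problem asks, given two sets of n binary vectors of dimension d, whether some pair (one vector from each set) has inner product zero, and OVH states that no algorithm solves it in \(O(n^{2-\epsilon}\,\mathrm{poly}(d))\) time for any \(\epsilon > 0\). The brute-force check below is only a sketch of that quadratic baseline; the function name `has_orthogonal_pair` and the tuple representation are assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]


def has_orthogonal_pair(xs: Sequence[Vector], ys: Sequence[Vector]) -> bool:
    """Brute-force Orthogonal Vectors check in O(|xs| * |ys| * d) time.

    Two binary vectors are orthogonal when no coordinate is 1 in both.
    """
    return any(
        all(a * b == 0 for a, b in zip(x, y))
        for x in xs
        for y in ys
    )


xs = [(1, 0, 1), (1, 1, 0)]
ys = [(0, 1, 1), (0, 0, 1)]
print(has_orthogonal_pair(xs, ys))  # (1, 1, 0) and (0, 0, 1) share no 1-coordinate
```

OVH asserts that this quadratic dependence on the number of vectors cannot be improved by a polynomial factor; the lower bounds in the abstract are conditional on that assertion.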

Keywords

  • Exact pattern matching
  • Indexing
  • Orthogonal vectors
  • Complexity theory
  • Reductions
  • Lower bounds
  • Edit distance
  • Graph query

This work was partially funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 851093, SAFEBIO) and by the Academy of Finland (grants No. 309048, 322595, 328877).

Notes

  1. Notice that if we further require that the path repeats no node (i.e., is a simple path), then SMLG (string matching in labeled graphs) becomes NP-hard, since the Hamiltonian path problem can be easily reduced to it; see, e.g., [19].

  2. The total degree of a node is the sum of its in-degree and out-degree.

  3. We implicitly assumed here that the graph G is the part of the input to be indexed. By exchanging G and P, it trivially holds that we also cannot polynomially index a pattern string P to support fast queries in the form of a labeled graph.

  4. The idea of splitting the two sets into smaller groups was also used in [3] to obtain a fast randomized algorithm for OV, based on the polynomial method; therein, the groups always had equal size.

  5. Originally, in [19], P and G were built on X and Y, respectively. Since this is immaterial for correctness, we assumed the opposite here to stay in line with the notation.

References

  1. Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: FOCS 2015, Berkeley, CA, USA, pp. 59–78 (2015)

  2. Abboud, A., Rubinstein, A., Williams, R.R.: Distributed PCP theorems for hardness of approximation in P. In: IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, pp. 25–36. IEEE (2017)

  3. Abboud, A., Williams, R., Yu, H.: More applications of the polynomial method to algorithm design. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, San Diego, California, pp. 218–230 (2015)

  4. Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: IEEE 55th Annual Symposium on Foundations of Computer Science, Philadelphia, PA, USA, pp. 434–443 (2014)

  5. Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Regular languages meet prefix sorting. In: Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, Salt Lake City, UT, USA, pp. 911–930 (2020)

  6. Alzamel, M., et al.: Degenerate string comparison and applications. In: Parida, L., Ukkonen, E. (eds.) 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), vol. 113, pp. 21:1–21:14. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018)

  7. Amir, A., Lewenstein, M., Lewenstein, N.: Pattern matching in hypertext. In: Dehne, F., Rau-Chaplin, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1997. LNCS, vol. 1272, pp. 160–173. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63307-3_56

  8. Aoyama, K., et al.: Faster online elastic degenerate string matching. In: Annual Symposium on Combinatorial Pattern Matching (CPM 2018), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2018)

  9. Backurs, A., Indyk, P.: Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH is False). In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, New York, USA, pp. 51–58 (2015)

  10. Backurs, A., Indyk, P.: Which regular expression patterns are hard to match? In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), New Brunswick, NJ, USA, pp. 457–466. IEEE (2016)

  11. Bernardini, G., Gawrychowski, P., Pisanti, N., Pissis, S.P., Rosone, G.: Even faster elastic-degenerate string matching via fast matrix multiplication. In: Baier, C., Chatzigiannakis, I., Flocchini, P., Leonardi, S. (eds.) 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9–12, 2019, Patras, Greece. LIPIcs, vol. 132, pp. 21:1–21:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2019)

  12. Bille, P.: Personal Communication at Dagstuhl Seminar on Indexes and Computation over Compressed Structured Data (2013)

  13. Bringmann, K.: Why walking the dog takes time: Fréchet distance has no strongly subquadratic algorithms unless SETH fails. In: IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 661–670. IEEE (2014)

  14. Bringmann, K., Künnemann, M.: Quadratic conditional lower bounds for string problems and dynamic time warping. In: IEEE 56th Annual Symposium on Foundations of Computer Science, Washington, USA, pp. 79–97. IEEE (2015)

  15. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)

  16. Cohen-Addad, V., Feuilloley, L., Starikovskaya, T.: Lower bounds for text indexing with mismatches and differences. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, San Diego, USA, pp. 1146–1164 (2019)

  17. Consortium, T.C.P.G.: Computational pan-genomics: status, promises and challenges. Briefings in Bioinform. 19(1), 118–135 (2018)

  18. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific (2002)

  19. Equi, M., Grossi, R., Mäkinen, V., Tomescu, A.I.: On the complexity of string matching for graphs. In: 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), Patras, Greece, pp. 55:1–55:15 (2019)

  20. Equi, M., Grossi, R., Tomescu, A.I., Mäkinen, V.: On the complexity of exact pattern matching in graphs: determinism and zig-zag matching. arXiv e-prints arXiv:1902.03560 (2019)

  21. Equi, M., Mäkinen, V., Tomescu, A.I.: Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. arXiv e-prints arXiv:2002.00629 (2020)

  22. Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)

  23. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and indexing labeled trees, with applications. J. ACM 57(1), 4:1–4:33 (2009)

  24. Gagie, T., Manzini, G., Sirén, J.: Wheeler graphs: a framework for BWT-based data structures. Theor. Comput. Sci. 698, 67–78 (2017)

  25. Garrison, E., et al.: Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875 (2018)

  26. Gibney, D.: An efficient elastic-degenerate text index? Not likely. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 76–88. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_6

  27. Gibney, D., Thankachan, S.V.: On the hardness and inapproximability of recognizing Wheeler graphs. In: ESA 2019, Munich/Garching, Germany, pp. 51:1–51:16 (2019)

  28. Goldstein, I., Lewenstein, M., Porat, E.: Orthogonal vectors indexing. In: ISAAC 2017, Dagstuhl, Germany, pp. 40:1–40:12 (2017)

  29. Goldstein, I., Lewenstein, M., Porat, E.: On the hardness of set disjointness and set intersection with bounded universe. In: ISAAC 2019, Shanghai, China. LIPIcs, vol. 149, pp. 7:1–7:22 (2019)

  30. Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2006)

  31. Grossi, R., et al.: On-line pattern matching on similar texts. In: CPM 2017. vol. 78, p. 1. Schloss Dagstuhl-Leibniz-Zentrum für Informatik GmbH (2017)

  32. Iliopoulos, C.S., Kundu, R., Pissis, S.P.: Efficient pattern matching in elastic-degenerate texts. In: Drewes, F., Martín-Vide, C., Truthe, B. (eds.) LATA 2017. LNCS, vol. 10168, pp. 131–142. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53733-7_9

  33. Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)

  34. Kim, D., Paggi, J.M., Park, C., Bennett, C., Salzberg, S.L.: Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37(8), 907–915 (2019)

  35. Mäkinen, V., Cazaux, B., Equi, M., Norri, T., Tomescu, A.I.: Linear time construction of indexable founder block graphs. In: WABI 2020, Pisa, Italy. LIPIcs, vol. 172, pp. 7:1–7:18 (2020). https://doi.org/10.4230/LIPIcs.WABI.2020.7

  36. Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)

  37. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1), 2 (2007)

  38. Patrascu, M., Roditty, L.: Distance oracles beyond the Thorup-Zwick bound. SIAM J. Comput. 43(1), 300–311 (2014)

  39. Rautiainen, M., Mäkinen, V., Marschall, T.: Bit-parallel sequence-to-graph alignment. Bioinformatics 35(19), 3599–3607 (2019)

  40. Schneeberger, K., et al.: Simultaneous alignment of short reads against multiple genomes. Genome Biol. 10, R98 (2009)

  41. Sirén, J.: Indexing variation graphs. In: ALENEX 2017, Barcelona, Spain, pp. 13–27 (2017)

  42. Sirén, J., Välimäki, N., Mäkinen, V.: Indexing graphs for path queries with applications in genome research. IEEE/ACM Trans. Comput. Biol. Bioinform. 11(2), 375–388 (2014)

  43. Williams, R.: A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci. 348(2–3), 357–365 (2005)

Author information

Corresponding author

Correspondence to Massimo Equi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Equi, M., Mäkinen, V., Tomescu, A.I. (2021). Graphs Cannot Be Indexed in Polynomial Time for Sub-quadratic Time String Matching, Unless SETH Fails. In: Bureš, T., et al. SOFSEM 2021: Theory and Practice of Computer Science. SOFSEM 2021. Lecture Notes in Computer Science, vol 12607. Springer, Cham. https://doi.org/10.1007/978-3-030-67731-2_44

  • DOI: https://doi.org/10.1007/978-3-030-67731-2_44

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67730-5

  • Online ISBN: 978-3-030-67731-2

  • eBook Packages: Computer Science, Computer Science (R0)