Abstract
The string matching problem on a node-labeled graph \(G=(V,E)\) asks whether a given pattern string \(P\) has an occurrence in \(G\), in the form of a path whose concatenation of node labels equals \(P\). This is a basic primitive in various problems in bioinformatics, graph databases, and networks, but it was only recently proven to have an \(O(|E||P|)\)-time lower bound under the Orthogonal Vectors Hypothesis (OVH). We consider here its indexed version, in which we can index the graph in order to support time-efficient string queries.
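For orientation, the problem itself can be illustrated with a minimal Python sketch of the straightforward dynamic program whose running time matches the \(O(|E||P|)\) bound. This is not an algorithm taken from the paper. It assumes that every node carries a single-character label and that a path may revisit nodes; the names `has_occurrence`, `labels`, and `edges` are illustrative only.

```python
from collections import defaultdict


def has_occurrence(labels, edges, pattern):
    """Return True if some path in the graph spells `pattern`.

    labels:  dict mapping node -> single-character label (assumption)
    edges:   iterable of directed edges (u, v)
    pattern: the query string P
    """
    if not pattern:
        return True  # the empty pattern occurs trivially

    # Successor lists, built once in O(|V| + |E|) time.
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    # frontier = nodes at which a path spelling the current prefix can end.
    frontier = {v for v, c in labels.items() if c == pattern[0]}
    for ch in pattern[1:]:
        # Each node is in the set at most once, so every edge is
        # scanned at most once per pattern character: O(|E|) per step.
        frontier = {w for v in frontier for w in succ[v] if labels[w] == ch}
        if not frontier:
            return False
    return bool(frontier)
```

On a three-node cycle labeled `a`, `b`, `c`, the call `has_occurrence({1: "a", 2: "b", 3: "c"}, [(1, 2), (2, 3), (3, 1)], "abcab")` succeeds because the path is allowed to go around the cycle.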
We show that, under OVH, no polynomial-time indexing scheme of the graph can support querying \(P\) in time \(O(|P|+|E|^\delta |P|^\beta )\), with either \(\delta < 1\) or \(\beta < 1\). As a side contribution, we introduce the notion of linear independent-components (lic) reduction, allowing for a simple proof of our result. As another illustration that hardness of indexing follows as a corollary of a lic reduction, we also translate the quadratic conditional lower bound of Backurs and Indyk (STOC 2015) for the problem of matching a query string inside a text, under edit distance. We obtain an analogous tight quadratic lower bound for its indexed version, improving the recent result of Cohen-Addad, Feuilloley and Starikovskaya (SODA 2019), but with a slightly different boundary condition.
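To make the edit-distance matching problem concrete, here is a hedged Python sketch of the classical quadratic dynamic program (commonly attributed to Sellers) that finds the smallest edit distance between a query string and any substring of a text. It illustrates the upper bound that the quadratic lower bound is measured against; it is not the reduction used in the paper, and the function name `min_edit_distance_in_text` is an assumption made for this example.

```python
def min_edit_distance_in_text(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Runs in O(|pattern| * |text|) time and O(|pattern|) space.
    """
    m = len(pattern)
    # prev[i] = best cost of aligning pattern[:i] with a substring of the
    # text ending at the previous text position. Before any text is read,
    # pattern[:i] can only be aligned with the empty string, at cost i.
    prev = list(range(m + 1))
    best = prev[m]
    for c in text:
        # cur[0] = 0 because an occurrence may start at any text position.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (pattern[i - 1] != c),  # match or substitution
                prev[i] + 1,                          # extra text character
                cur[i - 1] + 1,                       # unmatched pattern character
            )
        best = min(best, cur[m])
        prev = cur
    return best
```

A result of 0 means the query occurs exactly in the text; larger values give the number of single-character edits needed for the closest substring.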
Keywords
- Exact pattern matching
- Indexing
- Orthogonal vectors
- Complexity theory
- Reductions
- Lower bounds
- Edit distance
- Graph query
This work was partially funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 851093, SAFEBIO) and by the Academy of Finland (grants No. 309048, 322595, 328877).
Notes
1. Notice that if we further require that the path repeats no node (i.e., is a simple path), then SMLG becomes NP-hard, since the Hamiltonian path problem can be easily reduced to it; see, e.g., [19].
2. The total degree is the sum of the in-degree and out-degree of a node.
3. We implicitly assumed here that the graph G is the part of the input to be indexed. By exchanging G and P, it trivially holds that we also cannot polynomially index a pattern string P to support fast queries in the form of a labeled graph.
4. The idea of splitting the two sets into smaller groups was also used in [3] to obtain a fast randomized algorithm for OV, based on the polynomial method; therein, the groups always had equal size.
5. Originally, in [19], P and G were built on X and Y, respectively. Since this is immaterial for correctness, we assumed the opposite here to stay in line with the notation.
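Notes 4 and 5 refer to the two vector sets X and Y of the Orthogonal Vectors (OV) problem, which asks whether some binary vector in X is orthogonal to some binary vector in Y. The following Python sketch shows only the naive quadratic check, as a baseline for what OVH conjectures cannot be improved to strongly subquadratic time. It is not the group-splitting technique mentioned in note 4, and the function name `has_orthogonal_pair` is an assumption made for this example.

```python
def has_orthogonal_pair(X, Y):
    """Return True if some x in X and y in Y have inner product zero.

    X and Y are sequences of equal-length binary vectors (tuples of 0/1).
    The double loop takes O(|X| * |Y| * d) time for dimension d.
    """
    for x in X:
        for y in Y:
            # Orthogonal means no coordinate is 1 in both vectors.
            if all(a * b == 0 for a, b in zip(x, y)):
                return True
    return False
```

For instance, `(1, 0, 1)` and `(0, 1, 0)` are orthogonal, while `(1, 1)` is orthogonal to neither `(1, 0)` nor `(0, 1)`.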
References
Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: FOCS 2015, Berkeley, CA, USA, pp. 59–78 (2015)
Abboud, A., Rubinstein, A., Williams, R.R.: Distributed PCP theorems for hardness of approximation in P. In: IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA, pp. 25–36. IEEE (2017)
Abboud, A., Williams, R., Yu, H.: More applications of the polynomial method to algorithm design. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, San Diego, California, pp. 218–230 (2015)
Abboud, A., Williams, V.V.: Popular conjectures imply strong lower bounds for dynamic problems. In: IEEE 55th Annual Symposium on Foundations of Computer Science, Philadelphia, PA, USA, pp. 434–443 (2014)
Alanko, J., D’Agostino, G., Policriti, A., Prezza, N.: Regular languages meet prefix sorting. In: Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, Salt Lake City, UT, USA, pp. 911–930 (2020)
Alzamel, M., et al.: Degenerate string comparison and applications. In: Parida, L., Ukkonen, E. (eds.) 18th International Workshop on Algorithms in Bioinformatics (WABI 2018). Leibniz International Proceedings in Informatics (LIPIcs), vol. 113, pp. 21:1–21:14. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018)
Amir, A., Lewenstein, M., Lewenstein, N.: Pattern matching in hypertext. In: Dehne, F., Rau-Chaplin, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1997. LNCS, vol. 1272, pp. 160–173. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63307-3_56
Aoyama, K., et al.: Faster online elastic degenerate string matching. In: Annual Symposium on Combinatorial Pattern Matching (CPM 2018), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2018)
Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, New York, USA, pp. 51–58 (2015)
Backurs, A., Indyk, P.: Which regular expression patterns are hard to match? In: IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), New Brunswick, NJ, USA, pp. 457–466. IEEE (2016)
Bernardini, G., Gawrychowski, P., Pisanti, N., Pissis, S.P., Rosone, G.: Even faster elastic-degenerate string matching via fast matrix multiplication. In: Baier, C., Chatzigiannakis, I., Flocchini, P., Leonardi, S. (eds.) 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9–12, 2019, Patras, Greece. LIPIcs, vol. 132, pp. 21:1–21:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2019)
Bille, P.: Personal Communication at Dagstuhl Seminar on Indexes and Computation over Compressed Structured Data (2013)
Bringmann, K.: Why walking the dog takes time: Fréchet distance has no strongly subquadratic algorithms unless SETH fails. In: IEEE 55th Annual Symposium on Foundations of Computer Science, pp. 661–670. IEEE (2014)
Bringmann, K., Künnemann, M.: Quadratic conditional lower bounds for string problems and dynamic time warping. In: IEEE 56th Annual Symposium on Foundations of Computer Science, Washington, USA, pp. 79–97. IEEE (2015)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Tech. Rep. 124, Digital Equipment Corporation (1994)
Cohen-Addad, V., Feuilloley, L., Starikovskaya, T.: Lower bounds for text indexing with mismatches and differences. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, San Diego, USA, pp. 1146–1164 (2019)
The Computational Pan-Genomics Consortium: Computational pan-genomics: status, promises and challenges. Briefings in Bioinform. 19(1), 118–135 (2018)
Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific (2002)
Equi, M., Grossi, R., Mäkinen, V., Tomescu, A.I.: On the complexity of string matching for graphs. In: 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), Patras, Greece, pp. 55:1–55:15 (2019)
Equi, M., Grossi, R., Tomescu, A.I., Mäkinen, V.: On the complexity of exact pattern matching in graphs: determinism and zig-zag matching. arXiv e-prints arXiv:1902.03560 (2019)
Equi, M., Mäkinen, V., Tomescu, A.I.: Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. arXiv e-prints arXiv:2002.00629 (2020)
Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)
Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and indexing labeled trees, with applications. J. ACM 57(1), 4:1–4:33 (2009)
Gagie, T., Manzini, G., Sirén, J.: Wheeler graphs: a framework for BWT-based data structures. Theor. Comput. Sci. 698, 67–78 (2017)
Garrison, E., et al.: Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat. Biotechnol. 36, 875 (2018)
Gibney, D.: An efficient elastic-degenerate text index? not likely. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 76–88. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_6
Gibney, D., Thankachan, S.V.: On the hardness and inapproximability of recognizing Wheeler graphs. In: ESA 2019, Munich/Garching, Germany, pp. 51:1–51:16 (2019)
Goldstein, I., Lewenstein, M., Porat, E.: Orthogonal vectors indexing. In: ISAAC 2017, Dagstuhl, Germany, pp. 40:1–40:12 (2017)
Goldstein, I., Lewenstein, M., Porat, E.: On the hardness of set disjointness and set intersection with bounded universe. In: ISAAC 2019, Shanghai, China. LIPIcs, vol. 149, pp. 7:1–7:22 (2019)
Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2006)
Grossi, R., et al.: On-line pattern matching on similar texts. In: CPM 2017. vol. 78, p. 1. Schloss Dagstuhl-Leibniz-Zentrum für Informatik GmbH (2017)
Iliopoulos, C.S., Kundu, R., Pissis, S.P.: Efficient pattern matching in elastic-degenerate texts. In: Drewes, F., Martín-Vide, C., Truthe, B. (eds.) LATA 2017. LNCS, vol. 10168, pp. 131–142. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-53733-7_9
Impagliazzo, R., Paturi, R.: On the complexity of k-SAT. J. Comput. Syst. Sci. 62(2), 367–375 (2001)
Kim, D., Paggi, J.M., Park, C., Bennett, C., Salzberg, S.L.: Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37(8), 907–915 (2019)
Mäkinen, V., Cazaux, B., Equi, M., Norri, T., Tomescu, A.I.: Linear time construction of indexable founder block graphs. In: WABI 2020, Pisa, Italy. LIPIcs, vol. 172, pp. 7:1–7:18 (2020). https://doi.org/10.4230/LIPIcs.WABI.2020.7
Masek, W.J., Paterson, M.S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1), 2 (2007)
Patrascu, M., Roditty, L.: Distance oracles beyond the Thorup-Zwick bound. SIAM J. Comput. 43(1), 300–311 (2014)
Rautiainen, M., Mäkinen, V., Marschall, T.: Bit-parallel sequence-to-graph alignment. Bioinformatics 35(19), 3599–3607 (2019)
Schneeberger, K., et al.: Simultaneous alignment of short reads against multiple genomes. Genome Biol. 10, R98 (2009)
Sirén, J.: Indexing variation graphs. In: ALENEX 2017, Barcelona, Spain, pp. 13–27 (2017)
Sirén, J., Välimäki, N., Mäkinen, V.: Indexing graphs for path queries with applications in genome research. IEEE/ACM Trans. Comput. Biol. Bioinform. 11(2), 375–388 (2014)
Williams, R.: A new algorithm for optimal 2-constraint satisfaction and its implications. Theor. Comput. Sci. 348(2–3), 357–365 (2005)
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Equi, M., Mäkinen, V., Tomescu, A.I. (2021). Graphs Cannot Be Indexed in Polynomial Time for Sub-quadratic Time String Matching, Unless SETH Fails. In: Bureš, T., et al. SOFSEM 2021: Theory and Practice of Computer Science. SOFSEM 2021. Lecture Notes in Computer Science, vol. 12607. Springer, Cham. https://doi.org/10.1007/978-3-030-67731-2_44
DOI: https://doi.org/10.1007/978-3-030-67731-2_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67730-5
Online ISBN: 978-3-030-67731-2
eBook Packages: Computer Science, Computer Science (R0)